forum.alglib.net
http://forum.alglib.net/

Prepare TestData with Nonreal data fields
http://forum.alglib.net/viewtopic.php?f=2&t=3795

Author:  sagh0901 [ Thu Mar 09, 2017 5:32 am ]
Post subject:  Prepare TestData with Nonreal data fields

Hello Admin,

I am using the dataset from http://archive.ics.uci.edu/ml/datasets/Forest+Fires. It contains two non-numeric columns, month and day.
My approach was to assign the integers 0-11 to the months in order of first appearance in the column (not calendar order), and likewise 0-6 to the days. This gave me a real number for each textual value. When I use mlptraines, mlpkfoldcvlm or mlpkfoldcvlbfgs, no matter how many hidden-layer neurons I specify, I get a very large RMS error (magnitude 50 to 100). I suspect the normalization of the data, but from the documentation I understood that normalization is handled internally by the API. Please tell me if I am missing anything here. Thanks.
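
For readers following along, the first-appearance coding described above can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual code; the helper name first_seen_codes is hypothetical, and the month labels are the first six rows of the extract quoted later in the thread.

```python
def first_seen_codes(values):
    """Map each distinct label to an integer in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer
        encoded.append(codes[v])
    return encoded, codes

# Month labels of the first six rows of the dataset extract.
months = ["mar", "oct", "oct", "mar", "mar", "aug"]
encoded, codes = first_seen_codes(months)
print(encoded)  # [0, 1, 1, 0, 0, 2]
```

Note that such codes impose an arbitrary numeric order on the labels (here mar < oct < aug), which a network will treat as meaningful magnitude.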

Author:  Sergey.Bochkanov [ Thu Mar 09, 2017 4:00 pm ]
Post subject:  Re: Prepare TestData with Nonreal data fields

Hi!

It is better to use one-of-N encoding for categorical variables like month and day. BTW, what is the "reference" error for this dataset?
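
One possible reading of a "reference" error, and this is an assumption rather than something stated in the thread, is the RMS error of a trivial model that always predicts the mean of the target column. A minimal Python sketch of that baseline is below; the function names are hypothetical and the toy values are purely illustrative, not taken from the dataset.

```python
import math

def rmse(targets, predictions):
    """Root-mean-square error between two equal-length sequences."""
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)

def mean_baseline_rmse(targets):
    """RMSE of a constant predictor that always outputs the mean target."""
    mean = sum(targets) / len(targets)
    return rmse(targets, [mean] * len(targets))

# Illustrative toy targets only (not real 'area' values).
toy_area = [0.0, 0.0, 0.0, 4.0]
print(mean_baseline_rmse(toy_area))
```

If a trained network's RMS error is close to this baseline on the real area column, the network is not extracting much from the inputs, whatever the absolute size of the error.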

Author:  sagh0901 [ Sat Mar 11, 2017 6:45 pm ]
Post subject:  Re: Prepare TestData with Nonreal data fields

Hello Sergey,
Thanks for the reply, and sorry that I couldn't respond sooner; I was busy with another course. One-of-N (or one-of-(N-1)) encoding did not solve the problem.

The extract of dataset:
X Y month day FFMC DMC DC ISI temp RH wind rain area
7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0 0
7 4 oct tue 90.6 35.4 669.1 6.7 18 33 0.9 0 0
7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0 0
8 6 mar fri 91.7 33.3 77.5 9 8.3 97 4 0.2 0
8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0 0
8 6 aug sun 92.3 85.3 488 14.7 22.2 29 5.4 0 0
8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0 0
8 6 aug mon 91.5 145.4 608.2 10.7 8 86 2.2 0 0
8 6 sep tue 91 129.5 692.6 7 13.1 63 5.4 0 0
7 5 sep sat 92.5 88 698.6 7.1 22.8 40 4 0 0
7 5 sep sat 92.5 88 698.6 7.1 17.8 51 7.2 0 0
7 5 sep sat 92.8 73.2 713 22.6 19.3 38 4 0 0
6 5 aug fri 63.5 70.8 665.3 0.8 17 72 6.7 0 0
6 5 sep mon 90.9 126.5 686.5 7 21.3 42 2.2 0 0
6 5 sep wed 92.9 133.3 699.6 9.2 26.4 21 4.5 0 0
6 5 sep fri 93.3 141.2 713.9 13.9 22.9 44 5.4 0 0
5 5 mar sat 91.7 35.8 80.8 7.8 15.1 27 5.4 0 0

I converted the month column into 12 indicator columns, so the data set has N+12-1 columns so far:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1

The day column is likewise replaced by 7 indicator columns, so the total is now N+12+7-2 columns:
Sun Mon Tue Wed Thu Fri Sat
1 0 0 0 0 0 0
0 1 0 0 0 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1

The sample set now becomes:
X Y Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Sun Mon Tue Wed Thu Fri Sat FFMC DMC DC ISI temp RH wind rain area
7 5 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 86.2 26.2 94.3 5.1 8.2 51 6.7 0 0
7 4 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 90.6 35.4 669.1 6.7 18 33 0.9 0 0
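
The conversion from a raw row to the encoded row above can be sketched in Python as follows. This is a minimal illustration, not the poster's code: the names MONTHS, DAYS, one_hot and encode_row are hypothetical, and the column order (X, Y, 12 month indicators, 7 day indicators starting at Sunday, then the remaining numeric fields) is taken from the header shown above.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

def one_hot(label, categories):
    """Return a 0/1 indicator vector with a single 1 at the label's position."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return [1.0 if label == c else 0.0 for c in categories]

def encode_row(line):
    """Encode one whitespace-separated raw row into numeric columns."""
    fields = line.split()
    x, y, month, day = fields[:4]
    rest = [float(v) for v in fields[4:]]  # FFMC ... rain, area
    return [float(x), float(y)] + one_hot(month, MONTHS) + one_hot(day, DAYS) + rest

row = encode_row("7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0 0")
print(len(row))  # 30 columns: 13 original + 12 + 7 - 2
```

The last element of each encoded row is the area target, so it would need to be split off from the 29 inputs before training.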

The RMS output:
[Image: RMS error output, not preserved]

No matter how many hidden layers I use, the RMS error is nowhere near zero. I don't know how to proceed further. Please let me know if I am missing anything.

All times are UTC