forum.alglib.net http://forum.alglib.net/ |
Prepare TestData with Nonreal data fields http://forum.alglib.net/viewtopic.php?f=2&t=3795 |
Page 1 of 1 |
Author: | sagh0901 [ Thu Mar 09, 2017 5:32 am ] |
Post subject: | Prepare TestData with Nonreal data fields |
Hello Admin, I am using the dataset from http://archive.ics.uci.edu/ml/datasets/Forest+Fires. It contains two non-numeric columns, month and day. My attempt was to assign the integers 0-11 to the months, in the order in which they first appear in the column, and 0-6 to the days using the same ordering technique. I did not follow calendar order for either column. This gave me a real number for each textual value. When I use mlptraines, mlpkfoldcvlm or mlpkfoldcvlbfgs, no matter how many hidden-layer neurons I specify, I get a very large rmserror (in the range of 50 to 100). I suspect the normalization of the data, but from reading the documentation I understood that normalization is handled internally by the API. Please guide me if I am missing anything here. Thanks. |
Author: | Sergey.Bochkanov [ Thu Mar 09, 2017 4:00 pm ] |
Post subject: | Re: Prepare TestData with Nonreal data fields |
Hi! It is better to use one-of-N encoding for categorical variables like month and day. BTW, what is the "reference" error for such a dataset? |
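One way to read the question about a "reference" error: before tuning the network, it helps to know what RMS error a trivial model achieves on the same targets. The sketch below is plain Python, not ALGLIB code. The helper names `rmse` and `mean_baseline_rmse` are hypothetical, and the sample values are made up for illustration, not taken from the Forest Fires data. It computes the RMS error of always predicting the mean of the target column.

```python
import math


def rmse(targets, predictions):
    """Root-mean-square error between two equally long sequences."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline_rmse(targets):
    """RMS error of a trivial model that always predicts the target mean."""
    mean = sum(targets) / len(targets)
    return rmse(targets, [mean] * len(targets))


# Illustrative target values only (an assumption, not real dataset rows).
areas = [0.0, 0.0, 10.0, 30.0]
print(mean_baseline_rmse(areas))
```

A network whose RMS error is close to this baseline is learning little from its inputs, so whether an error of 50 to 100 is "large" depends on the spread of the target values.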
Author: | sagh0901 [ Sat Mar 11, 2017 6:45 pm ] |
Post subject: | Re: Prepare TestData with Nonreal data fields |
Hello Sergey, Thanks for the reply. Sorry I couldn't respond sooner; I was busy with my other course. I couldn't solve the problem with one-of-N / one-of-(N-1) encoding. An extract of the dataset:

X  Y  month  day  FFMC  DMC    DC     ISI   temp  RH  wind  rain  area
7  5  mar    fri  86.2  26.2   94.3   5.1   8.2   51  6.7   0     0
7  4  oct    tue  90.6  35.4   669.1  6.7   18    33  0.9   0     0
7  4  oct    sat  90.6  43.7   686.9  6.7   14.6  33  1.3   0     0
8  6  mar    fri  91.7  33.3   77.5   9     8.3   97  4     0.2   0
8  6  mar    sun  89.3  51.3   102.2  9.6   11.4  99  1.8   0     0
8  6  aug    sun  92.3  85.3   488    14.7  22.2  29  5.4   0     0
8  6  aug    mon  92.3  88.9   495.6  8.5   24.1  27  3.1   0     0
8  6  aug    mon  91.5  145.4  608.2  10.7  8     86  2.2   0     0
8  6  sep    tue  91    129.5  692.6  7     13.1  63  5.4   0     0
7  5  sep    sat  92.5  88     698.6  7.1   22.8  40  4     0     0
7  5  sep    sat  92.5  88     698.6  7.1   17.8  51  7.2   0     0
7  5  sep    sat  92.8  73.2   713    22.6  19.3  38  4     0     0
6  5  aug    fri  63.5  70.8   665.3  0.8   17    72  6.7   0     0
6  5  sep    mon  90.9  126.5  686.5  7     21.3  42  2.2   0     0
6  5  sep    wed  92.9  133.3  699.6  9.2   26.4  21  4.5   0     0
6  5  sep    fri  93.3  141.2  713.9  13.9  22.9  44  5.4   0     0
5  5  mar    sat  91.7  35.8   80.8   7.8   15.1  27  5.4   0     0

I converted the month column into 12 columns, one per month, so the dataset has N+12-1 columns so far:

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1   0   0   0   0   0   0   0   0   0   0   0
0   1   0   0   0   0   0   0   0   0   0   0
0   0   1   0   0   0   0   0   0   0   0   0
0   0   0   1   0   0   0   0   0   0   0   0
0   0   0   0   1   0   0   0   0   0   0   0
0   0   0   0   0   1   0   0   0   0   0   0
0   0   0   0   0   0   1   0   0   0   0   0
0   0   0   0   0   0   0   1   0   0   0   0
0   0   0   0   0   0   0   0   1   0   0   0
0   0   0   0   0   0   0   0   0   1   0   0
0   0   0   0   0   0   0   0   0   0   1   0
0   0   0   0   0   0   0   0   0   0   0   1

The day column was likewise replaced with 7 columns, so the total number of columns is now N+12+7-2:

Sun Mon Tue Wed Thu Fri Sat
1   0   0   0   0   0   0
0   1   0   0   0   0   0
0   0   1   0   0   0   0
0   0   0   1   0   0   0
0   0   0   0   1   0   0
0   0   0   0   0   1   0
0   0   0   0   0   0   1

The sample set now becomes:

X  Y  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec  Sun Mon Tue Wed Thu Fri Sat  FFMC  DMC   DC     ISI  temp  RH  wind  rain  area
7  5  0   0   1   0   0   0   0   0   0   0   0   0    0   0   0   0   0   1   0    86.2  26.2  94.3   5.1  8.2   51  6.7   0     0
7  4  0   0   0   0   0   0   0   0   0   1   0   0    0   0   1   0   0   0   0    90.6  35.4  669.1  6.7  18    33  0.9   0     0

The RMS output:
No matter how many hidden layers I use, the RMS error is nowhere near zero. I don't know how to proceed further. Please let me know if I am missing anything. |
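The column layout described in this post (X, Y, 12 month indicators, 7 day indicators, then the remaining numeric fields) can be sketched in plain Python as follows. This is only an illustration of one-of-N encoding, not ALGLIB code. The names `MONTHS`, `DAYS`, `one_of_n` and `encode_row` are hypothetical, and the lowercase three-letter spellings are assumed from the dataset extract above.

```python
# Category orders follow the column headers used in the post.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def one_of_n(value, categories):
    """Return a list with 1.0 at the position of `value`, 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row):
    """Encode one raw row [X, Y, month, day, FFMC, ..., area] as floats."""
    x, y, month, day, *rest = row
    return ([float(x), float(y)]
            + one_of_n(month.lower(), MONTHS)
            + one_of_n(day.lower(), DAYS)
            + [float(v) for v in rest])


# First row of the dataset extract quoted in the post.
raw = ["7", "5", "mar", "fri", "86.2", "26.2", "94.3",
       "5.1", "8.2", "51", "6.7", "0", "0"]
encoded = encode_row(raw)
print(len(encoded), encoded)
```

Each encoded row has 2 + 12 + 7 + 9 = 30 values. If the last column (area) is treated as the regression target, that leaves 29 inputs and 1 output per row.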
All times are UTC |