Post subject: Problem with random forest
Posted: Wed Jan 03, 2018 1:07 pm

Joined: Wed Jan 03, 2018 1:05 pm
Posts: 2
Hello!
I have a problem using the random forest methods.

Here is how I build the forest:
Code:
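// Each row holds 8 input variables followed by the class label (0..4).
double[,] xy = new double[,]
{{0, 0, 0, 0, 0, 255, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 1},
{0, 255, 0, 0, 0, 0, 0, 0, 2},
{0, 0, 0, 0, 255, 0, 0, 0, 3},
{0, 0, 0, 0, 0, 0, 0, 255, 4}};
// Output variables filled in by the training call.
int info;
alglib.decisionforest df;
alglib.dfreport rep;
alglib.dfbuildrandomdecisionforestx1(xy, 8, 5, 5, 50, 3, 0.6, out info, out df, out rep);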

When I try to use it like this
Code:
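double[] x = new double[]{0, 0, 0, 0, 0, 0, 0, 0};
// y receives one probability per class; it must be assigned before being passed by ref.
double[] y = new double[0];
alglib.dfprocess(df, x, ref y);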

I get the wrong classification result: {0.005, 0.49, 0, 0.505, 0}. The highest probability is the 4th value (0.505), but it should be the 2nd value, because the input is the all-zero array, which is class 1.
Thank you!

Post subject: Re: Problem with random forest
Posted: Wed Jan 03, 2018 4:59 pm

Joined: Fri May 07, 2010 7:06 am
Posts: 824
Hi!

Random forests are (no surprise!) randomized constructs. They randomly try many different classification schemes, with different variables being selected and different random datasets being generated for training. In particular, with r = 0.6 each tree is trained on roughly 60% of the samples, so it is very likely that roughly 40% of your random trees will be trained without any instance of class #2. Your toy dataset is also not well suited to randomized methods: drop just one variable (say, the last one), and you cannot reliably distinguish between instances of classes #2 and #4.
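
The 40% figure can be checked with a small simulation. This is only a sketch and does not call ALGLIB: it assumes each tree draws 3 distinct rows out of 5 (0.6 * 5) uniformly at random, which may differ from how the library actually samples. The names SubsampleDemo and FractionMissing are made up for illustration.

```csharp
using System;

static class SubsampleDemo
{
    // Fraction of trees whose training subsample does not contain row `target`,
    // ASSUMING each tree draws `take` distinct rows out of `n` uniformly at random.
    public static double FractionMissing(int n, int take, int target, int trees, int seed)
    {
        var rng = new Random(seed);
        int missing = 0;
        for (int t = 0; t < trees; t++)
        {
            // Partial Fisher-Yates shuffle: the first `take` entries form the subsample.
            int[] idx = new int[n];
            for (int i = 0; i < n; i++) idx[i] = i;
            bool found = false;
            for (int i = 0; i < take; i++)
            {
                int j = rng.Next(i, n);            // random position in [i, n)
                (idx[i], idx[j]) = (idx[j], idx[i]);
                if (idx[i] == target) found = true;
            }
            if (!found) missing++;
        }
        return (double)missing / trees;
    }

    public static void Main()
    {
        // 5 rows, 3 rows per tree, row 1 is the all-zero sample.
        double f = FractionMissing(5, 3, 1, 100000, 42);
        Console.WriteLine($"Fraction of trees that never see row 1: {f:F3}");
    }
}
```

Under that assumption the exact value is 2/5, so the printed fraction should land close to 0.4.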

So it is completely normal to get such results on such a small toy dataset. Try training on a larger dataset, with noise added to the inputs.
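
One possible way to build such a dataset is sketched below. It is an illustration, not a recommended recipe: the helper name Augment, the 40 copies per row, and the uniform noise of amplitude 10 are all arbitrary assumptions, and the code does not call ALGLIB.

```csharp
using System;

static class NoisyDataset
{
    // Returns `copies` jittered copies of every row of `xy`.
    // The last column is treated as the class label and copied unchanged.
    public static double[,] Augment(double[,] xy, int copies, double noise, int seed)
    {
        int rows = xy.GetLength(0), cols = xy.GetLength(1);
        var rng = new Random(seed);
        var result = new double[rows * copies, cols];
        for (int c = 0; c < copies; c++)
            for (int i = 0; i < rows; i++)
            {
                int r = c * rows + i;
                for (int j = 0; j < cols - 1; j++)
                    // Uniform noise in [-noise, +noise] added to each input variable.
                    result[r, j] = xy[i, j] + (2.0 * rng.NextDouble() - 1.0) * noise;
                result[r, cols - 1] = xy[i, cols - 1];
            }
        return result;
    }

    public static void Main()
    {
        double[,] xy =
        {
            {0, 0, 0, 0, 0, 255, 0, 0, 0},
            {0, 0, 0, 0, 0, 0, 0, 0, 1},
            {0, 255, 0, 0, 0, 0, 0, 0, 2},
            {0, 0, 0, 0, 255, 0, 0, 0, 3},
            {0, 0, 0, 0, 0, 0, 0, 255, 4}
        };
        double[,] big = Augment(xy, 40, 10.0, 1);
        Console.WriteLine($"{big.GetLength(0)} rows, {big.GetLength(1)} columns");
    }
}
```

The enlarged array could then be passed to the same training call in place of the original xy, with the point count set to big.GetLength(0).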

Post subject: Re: Problem with random forest
Posted: Thu Jan 04, 2018 5:28 am

Joined: Wed Jan 03, 2018 1:05 pm
Posts: 2

