Problem with random forest

Author:  antonysoldatov [ Wed Jan 03, 2018 1:07 pm ]
Post subject:  Problem with random forest

I have a problem using the random forest methods.

Here is how I build the RF:
  double[,] xy = new double[,]
  {
    {0, 0, 0, 0, 0, 255, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0, 0, 1},
    {0, 255, 0, 0, 0, 0, 0, 0, 2},
    {0, 0, 0, 0, 255, 0, 0, 0, 3},
    {0, 0, 0, 0, 0, 0, 0, 255, 4}
  };
  int info;
  alglib.decisionforest df;
  alglib.dfreport rep;
  alglib.dfbuildrandomdecisionforestx1(xy, 8, 5, 5, 50, 3, 0.6, out info, out df, out rep);

When I try to use it like this:
  double[] x = new double[]{0, 0, 0, 0, 0, 0, 0, 0};
  double[] y = new double[0];
  alglib.dfprocess(df, x, ref y);

I get a wrong classification result: {0.005, 0.49, 0, 0.505, 0}. The highest probability is the 4th value (0.505), but it should be the 2nd value (the input is an all-zero array, which is class 1).
Please help me solve this problem.
Thank you!

Author:  Sergey.Bochkanov [ Wed Jan 03, 2018 4:59 pm ]
Post subject:  Re: Problem with random forest


Random forests are (no surprise!) randomized constructs. They randomly try many different classification schemes, with different variables being selected and different random datasets being generated for training. In particular, it is very likely that roughly 40% of your random trees will be trained without instances of class #2. Your toy dataset is also poorly suited to randomized methods: drop just one variable (say, the last one), and you cannot reliably distinguish between instances of classes #2 and #4.
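
To make the "roughly 40%" figure concrete, here is a small sketch. It assumes that with r = 0.6 each tree is trained on 3 of the 5 rows, drawn without replacement; that is an assumption about the sampling scheme, not something confirmed in this thread. The class name SubsetCheck and all constants are made up for illustration. It enumerates every 3-row subset and counts those that miss the all-zero row.

```csharp
using System;

class SubsetCheck
{
    static void Main()
    {
        const int nPoints = 5;     // rows in the toy dataset
        const int sampleSize = 3;  // ASSUMPTION: 0.6 * 5 rows per tree, no replacement
        const int targetRow = 1;   // index of the all-zero row (label 1)

        int total = 0, missing = 0;
        // Enumerate every subset of the 5 rows as a bitmask.
        for (int mask = 0; mask < (1 << nPoints); mask++)
        {
            // Count selected rows and keep only subsets of the assumed size.
            int bits = 0;
            for (int i = 0; i < nPoints; i++)
                if ((mask & (1 << i)) != 0) bits++;
            if (bits != sampleSize) continue;

            total++;
            if ((mask & (1 << targetRow)) == 0) missing++;
        }
        // prints "4 of 10 subsets omit the row"
        Console.WriteLine(missing + " of " + total + " subsets omit the row");
    }
}
```

Under that assumption, 4 of the 10 possible training subsets (40%) never see the all-zero row, so trees built on them cannot learn its class.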

So it is completely normal to get such results on such a small toy dataset. Try training on a larger dataset, with noise added to the inputs.
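
One possible way to build such a dataset is sketched below. The helper Expand, the number of copies (40) and the noise amplitude (10.0) are arbitrary illustrative choices, not ALGLIB functions or recommended values. It repeats each of the five prototype rows and perturbs only the input columns, leaving the class label untouched.

```csharp
using System;

class NoisyData
{
    // Hypothetical helper: repeats each prototype row `copies` times and adds
    // uniform noise in [-noise, +noise] to every input column. The last
    // column (the class label) is copied unchanged.
    static double[,] Expand(double[,] proto, int copies, double noise, Random rng)
    {
        int rows = proto.GetLength(0), cols = proto.GetLength(1);
        var xy = new double[rows * copies, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < copies; c++)
            {
                int dst = r * copies + c;
                for (int j = 0; j < cols - 1; j++)
                    xy[dst, j] = proto[r, j] + (rng.NextDouble() * 2 - 1) * noise;
                xy[dst, cols - 1] = proto[r, cols - 1];
            }
        return xy;
    }

    static void Main()
    {
        double[,] proto =
        {
            {0, 0, 0, 0, 0, 255, 0, 0, 0},
            {0, 0, 0, 0, 0, 0, 0, 0, 1},
            {0, 255, 0, 0, 0, 0, 0, 0, 2},
            {0, 0, 0, 0, 255, 0, 0, 0, 3},
            {0, 0, 0, 0, 0, 0, 0, 255, 4}
        };
        // Fixed seed so the generated data is reproducible.
        double[,] xy = Expand(proto, 40, 10.0, new Random(1));
        // prints "200 rows, 9 columns"
        Console.WriteLine(xy.GetLength(0) + " rows, " + xy.GetLength(1) + " columns");
    }
}
```

The resulting array would take the place of xy in the training call from the first post, with the point count adjusted to the new number of rows.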

Author:  antonysoldatov [ Thu Jan 04, 2018 5:28 am ]
Post subject:  Re: Problem with random forest

Thank you for the reply and your advice!

All times are UTC