Thanks for your response. It is a little disappointing that there are some missing aspects of the random forest algorithm, but I understand completely that it is rather complex and that ALGLIB is a much larger project than just this class. If only I had the time to learn the algorithm and help out!
Sergey.Bochkanov wrote:
77? Errr... it should be impossible to get such value when solving classification problem :) Please, show me code which leads to such output so I can trace this bug.
I can give you some code. It is in the form of an add-in for another program, but you should be able to see how it works.
The relevant sections are as follows:
This builds the array (a double[,] array) in which the first column is the classification (a double value, either 1 or 0) and the other columns are double values between 0 and 1. I use a fudge here: any missing value (NODATA) is replaced with 0.5. As I mentioned, I had previously tried -9999, since the documentation recommends creating another class for NODATA values...
Code:
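//build an array to hold the parameters for the random forest operation:
//one row per feature, column 0 = classification, columns 1..nParameters = parameter values.
double[,] array = new double[inputVSG.FeatureTable.Count, nParameters + 1];
//populate every row (assumes FeatureTable and each parameter list are zero-based and the same length),
//so that all FeatureTable.Count rows later passed to ALGLIB contain data rather than zeros.
for (int i = 0; i < inputVSG.FeatureTable.Count; i++) {
    array[i, 0] = inputVSG.FeatureTable[i].classification;
    for (int param = 0; param < nParameters; param++) {
        try {
            array[i, param + 1] = inputParamListList[param][i];
        } catch (NoDataException) {
            //missing value (NODATA): substitute 0.5
            array[i, param + 1] = .5;
        }
    }
}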
After this I carry out the operation:
nParameters = around 7, nClasses = 1, nTrees = between 50 and 200, rValue = between 0.3 and 0.6.
Code:
//Create the random forest.
//number receives the return code: 1 = success, -1 = invalid parameters, -2 = a class label outside 0..nClasses-1.
int number = 0;
dforest.decisionforest df = new dforest.decisionforest();
dforest.dfreport dfreport = new dforest.dfreport();
dforest.dfbuildrandomdecisionforest(ref array, inputVSG.FeatureTable.Count, nParameters, nClasses, nTrees, rValue, ref number, ref df, ref dfreport);
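For comparison: as I read the ALGLIB documentation for dfbuildrandomdecisionforest, the training matrix should hold the NVars input columns first and the dependent value in the last column, and NClasses = 1 selects a regression forest (classification needs NClasses >= 2, with class indices 0..NClasses-1). Under that reading, the array above would train a regression on the last parameter, with the classification treated as just another input. The sketch below is a hypothetical helper, not part of ALGLIB or the add-in, that packs the parameters and 0/1 labels into that layout. The names ForestData, BuildTrainingMatrix and MissingValue are made up for illustration, and null stands in for NODATA.

```csharp
using System;

// Hypothetical helper, not part of ALGLIB: builds an NPoints x (NVars + 1)
// matrix with the input variables first and the class index in the LAST column.
public static class ForestData
{
    // Substitute for missing (NODATA) inputs, mirroring the 0.5 fudge above.
    public const double MissingValue = 0.5;

    // features[v][i] is the value of parameter v for point i (null = NODATA);
    // labels[i] is the class index of point i (0 or 1 for a two-class problem).
    public static double[,] BuildTrainingMatrix(double?[][] features, int[] labels)
    {
        int nVars = features.Length;
        int nPoints = labels.Length;
        var xy = new double[nPoints, nVars + 1];

        // Every parameter list must supply one value per point.
        for (int v = 0; v < nVars; v++)
        {
            if (features[v].Length != nPoints)
                throw new ArgumentException($"parameter {v} has {features[v].Length} values, expected {nPoints}");
        }

        for (int i = 0; i < nPoints; i++)
        {
            for (int v = 0; v < nVars; v++)
            {
                // Inputs occupy columns 0..nVars-1; NODATA becomes MissingValue.
                xy[i, v] = features[v][i] ?? MissingValue;
            }
            if (labels[i] < 0)
                throw new ArgumentException($"label at point {i} must be a non-negative class index");
            // The class index goes in the final column, index nVars.
            xy[i, nVars] = labels[i];
        }
        return xy;
    }
}
```

With a matrix in this layout, the build call would pass nVars = nParameters and nClasses = 2, e.g. dforest.dfbuildrandomdecisionforest(ref xy, nPoints, nParameters, 2, nTrees, rValue, ref number, ref df, ref dfreport). dfprocess should then return one posterior probability per class, so result[1] would be the estimated probability of class 1.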
Then I check the TEST dataset using the following:
Code:
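//fill the inputs for this row, using the same 0.5 NODATA substitute as in training.
for (int param = 0; param < nParameters; param++) {
    try {
        outputAttributesArray[param] = outputParamListList[param][row.RowIndex];
    } catch (NoDataException) {
        outputAttributesArray[param] = .5;
    }
}
//run the forest on this row; result[0] is used as the row's rank.
dforest.dfprocess(ref df, ref outputAttributesArray, ref result);
row.rank = result[0];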
This usually results in a value between 0 and 1. But when I had set the NODATA values to -9999, some strange values came out. Let me know if you need any more information. I can also give you my dataset if you like.
Cheers,
Alex