forum.alglib.net

ALGLIB forum
All times are UTC


 Post subject: Decision Forest implementation - further analysis
PostPosted: Mon Sep 12, 2011 1:37 pm 

Joined: Mon Sep 12, 2011 1:15 pm
Posts: 5
All,
 
First, hello. I have been using the ALGLIB library for an AI implementation that relies on the decision forest components, with good results so far.
 
I can currently predict data as intended, but in trying to push the accuracy further I have a few questions:
 
1. input counts - roughly how many inputs can a decision tree work on effectively? I know from past experience that I would not want to give an artificial neural network more than ~30 inputs. I know that decision trees can effectively ignore inputs that are unrelated to the outputs. In that case, can decision forests work on more (let's say 100), and how will this affect the results/accuracy?
 
2. variable results - even though decision forests are groups of decision trees, I have still noticed variable results in the models returned after training, even with the same datasets. Surely the purpose of decision forests is to exploit the power of numbers, gathering many trees and taking the majority vote. With the same datasets passed in, I can get one forest with 50% accuracy and another with 70% accuracy. I do not understand this, and it has led me to implement a further layer of grouping and majority voting, which I have named a decision master, to suppress the variability.
 
3. data normalisation - is there a certain normalisation technique that decision trees will respond to better than others?
Personally I am using 'leveled' normalisation (I made that name up, by the way), which maps inputs
 {0.1, 0.5, 5, 8, 100} into {0.2, 0.4, 0.6, 0.8, 1}
Another technique I used was 'ranged' (again, my own name), which would map the same array to something like
 {0.1, 0.5, 5, 8, 100} into {0.01, 0.07, 0.1, 0.12, 0.96}
but the results were not as strong.
What are people's experiences with this?
 
4. Extracting results - is there a way to find out which inputs are more strongly related to the outputs than others? I can see many weights being held in the internal model of the decision forests, but I'm not sure how to approach this.
 
Many thanks for any replies.


 Post subject: Re: Decision Forest implementation - further analysis
PostPosted: Tue Sep 13, 2011 8:59 am 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 907
Hello!

HotPotato wrote:
1. input counts - roughly how many inputs can a decision tree work on effectively?

There is no clear limit on the number of inputs. Decision forests can exclude redundant inputs and monitor generalization error. It is better not to feed a forest junk data, but a forest is more resistant to junk than a neural network or other non-ensemble model. I expect that in most applications 100 inputs will be handled without problems.

HotPotato wrote:
2. variable results - even though decision forests are groups of decision trees, I have still noticed variable results in the models returned after training, even with the same datasets.

Decision forests are randomized, so some amount of randomness will always be present. If forest results are too variable, increase the number of trees - that should make the results more stable.
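
To illustrate why more trees should stabilise results, here is a rough simulation. It is only a toy model, not ALGLIB code: it assumes every tree votes independently and is correct with probability 0.6, whereas real trees are correlated, so the real gain will be smaller. The function names and parameters are made up for the illustration.

```python
import random
import statistics

def forest_accuracy(n_trees, n_cases, p_correct, rng):
    """Accuracy of one simulated forest that classifies by majority vote."""
    hits = 0
    for _ in range(n_cases):
        # Assumption: each tree is independently correct with probability p_correct.
        votes = sum(rng.random() < p_correct for _ in range(n_trees))
        hits += votes * 2 > n_trees  # strict majority of trees voted correctly
    return hits / n_cases

def accuracy_spread(n_trees, n_forests=30, n_cases=200, p_correct=0.6, seed=0):
    """Mean and standard deviation of accuracy across separately built forests."""
    rng = random.Random(seed)
    scores = [forest_accuracy(n_trees, n_cases, p_correct, rng)
              for _ in range(n_forests)]
    return statistics.mean(scores), statistics.stdev(scores)

small_mean, small_sd = accuracy_spread(n_trees=5)
large_mean, large_sd = accuracy_spread(n_trees=101)
print(f"5 trees:   mean={small_mean:.3f} sd={small_sd:.3f}")
print(f"101 trees: mean={large_mean:.3f} sd={large_sd:.3f}")
```

Under these assumptions the forest-to-forest spread shrinks as the tree count grows, which is the effect described above.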

HotPotato wrote:
3. data normalisation - is there a certain normalisation technique that decision trees will respond to better than others?

Hard to tell. They are definitely not sensitive to scaling or shifting of inputs, because a decision forest does not compute linear combinations of inputs.
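
A small sketch of that insensitivity, assuming trees split on a single input with a threshold: any strictly increasing rescaling keeps the order of the values, so the best split separates the same samples. The 'leveled' function follows the rank-based example in the question; reading 'ranged' as min-max scaling is an assumption (the example values in the question do not match it exactly), and `best_split_left` is a hypothetical helper, not an ALGLIB function.

```python
def leveled(values):
    """Rank scaling: the k-th smallest of n distinct values maps to k/n."""
    order = sorted(set(values))
    rank = {v: k + 1 for k, v in enumerate(order)}
    return [rank[v] / len(order) for v in values]

def ranged(values):
    """Min-max scaling onto [0, 1] (assumed meaning of 'ranged')."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def best_split_left(values, labels):
    """Indices sent left by the single threshold with the fewest misclassifications."""
    n = len(values)
    best_err, best_left = None, None
    for t in sorted(set(values)):
        left = frozenset(i for i in range(n) if values[i] <= t)
        # Errors if the left side predicts class 0 and the right side class 1 ...
        err = sum(labels[i] if i in left else 1 - labels[i] for i in range(n))
        # ... or the reverse assignment, whichever is better.
        err = min(err, n - err)
        if best_err is None or err < best_err:
            best_err, best_left = err, left
    return best_left

raw = [0.1, 0.5, 5, 8, 100]
labels = [0, 0, 1, 1, 1]  # made-up labels for the illustration
splits = [best_split_left(xs, labels) for xs in (raw, leveled(raw), ranged(raw))]
print(splits[0] == splits[1] == splits[2])
```

Differences in accuracy between two such normalisations are therefore more likely to come from the forest's own randomness than from the scaling itself.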

HotPotato wrote:
4. Extracting results - is there a way to find out which inputs are more strongly related to the outputs than others? I can see many weights being held in the internal model of the decision forests, but I'm not sure how to approach this.

Again, hard to tell. You can't judge which inputs are more important from the weights stored in a decision forest. Some authors propose randomly permuting the values of one particular input and training a new forest on the modified sample. Comparing the generalization errors should tell you which inputs are more important :) I should say that I don't like this idea, but it is the only solution I've heard of.
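
The permutation idea can be sketched roughly as follows. This is not ALGLIB code: a one-split "stump" stands in for the forest so the example stays self-contained, and the names `train_stump`, `error_rate` and `permutation_importance`, the held-out split and the `repeats` averaging are all assumptions made for illustration. With a real forest you would pass your own training routine as `train`.

```python
import random

def train_stump(X, y):
    """Fit a one-split classifier by exhaustive search over features and thresholds."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            errs = sum((row[j] > t) != bool(label) for row, label in zip(X, y))
            # Also consider the flipped rule, whose error count is the complement.
            for flipped, e in ((False, errs), (True, len(y) - errs)):
                if best is None or e < best[0]:
                    best = (e, j, t, flipped)
    _, j, t, flipped = best
    return lambda row: int((row[j] > t) != flipped)

def error_rate(model, X, y):
    """Fraction of rows the model misclassifies."""
    return sum(model(row) != label for row, label in zip(X, y)) / len(y)

def permutation_importance(train, X_tr, y_tr, X_te, y_te, rng, repeats=5):
    """Increase in held-out error after shuffling one input column and retraining."""
    base = error_rate(train(X_tr, y_tr), X_te, y_te)
    scores = []
    for j in range(len(X_tr[0])):
        total = 0.0
        for _ in range(repeats):
            column = [row[j] for row in X_tr]
            rng.shuffle(column)  # breaks the link between input j and the output
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_tr, column)]
            total += error_rate(train(X_perm, y_tr), X_te, y_te) - base
        scores.append(total / repeats)
    return scores

rng = random.Random(1)
# Synthetic data: input 0 determines the class, input 1 is pure noise.
X = [[rng.random(), rng.random()] for _ in range(400)]
y = [int(row[0] > 0.5) for row in X]
importance = permutation_importance(train_stump, X[:200], y[:200], X[200:], y[200:], rng)
print(importance)
```

A larger score means the error grew more when that input was scrambled. Because both the shuffle and the forest are random, averaging over several repeats is advisable before drawing conclusions.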

Hope this helps, and good luck with ALGLIB :)


 Post subject: Re: Decision Forest implementation - further analysis
PostPosted: Wed Sep 14, 2011 4:56 pm 

Joined: Mon Sep 12, 2011 1:15 pm
Posts: 5
Thanks for the replies, that's very useful.

One further question -

What would be the best way to serialize one of the AI objects?

I wish to save my results (alglib.decisionforest) using binary serialisation and store them in my database, but as the objects are not marked with the [Serializable] attribute, I'm getting runtime errors when attempting serialization.

Thanks in advance,

HP


 Post subject: Re: Decision Forest implementation - further analysis
PostPosted: Wed Sep 14, 2011 6:51 pm 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 907
You can use the dfserialize/dfunserialize functions - they provide a portable serialization interface, which can be used to move serialized objects between different machines (even with different endianness) and even between different programming languages.
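
Once the forest is serialized, storing it in a database is ordinary text storage. The sketch below assumes the serialized forest is available as a text string and uses an in-memory SQLite database; the table layout and the `save_model`/`load_model` helpers are hypothetical, and the payload is a placeholder rather than real dfserialize output.

```python
import sqlite3

def save_model(conn, name, serialized):
    """Store a serialized model string under a name, replacing any earlier version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO models (name, body) VALUES (?, ?)", (name, serialized)
    )
    conn.commit()

def load_model(conn, name):
    """Return the stored string for a name, or None if it is absent."""
    row = conn.execute("SELECT body FROM models WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
# Placeholder text standing in for serializer output (not an actual ALGLIB payload).
payload = "serialized-forest-placeholder"
save_model(conn, "forest_v1", payload)
restored = load_model(conn, "forest_v1")
print(restored == payload)
```

The string read back from the database would then be handed to dfunserialize to rebuild the forest.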


 Post subject: Re: Decision Forest implementation - further analysis
PostPosted: Wed Sep 14, 2011 8:26 pm 

Joined: Mon Sep 12, 2011 1:15 pm
Posts: 5
Great - thanks for your reply.

