Hello!
HotPotato wrote:
1. input counts - what is the approximate optimal number of inputs that a decision tree should work on?
There is no clear limit on the number of inputs. Decision forests can ignore redundant inputs, and you can monitor the generalization error to see whether extra inputs hurt. It is still better not to feed the forest trash data, but a forest is more trash-resistant than a neural network or another non-ensemble model. I suppose that in most applications 100 inputs will be handled without problems.
HotPotato wrote:
2. variable results - even though decision forests are groups of decision trees, I have still noticed variable results in the models being returned after training, even with the same datasets.
Decision forests are randomized, so some amount of randomness will always be present. If the forest's results are too variable, increase the number of trees - it should make results more stable.
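A toy simulation (an assumption, not ALGLIB internals) illustrates why more trees stabilize the result: if each tree's prediction is the true value plus independent noise, the averaged forest prediction varies less and less as trees are added, roughly as 1/sqrt(n_trees).

```python
import random

def forest_prediction_std(n_trees, n_repeats=2000, noise=1.0, seed=0):
    """Run-to-run standard deviation of an averaged ensemble prediction.

    Toy model: each "tree" predicts the true value 0.0 plus independent
    Gaussian noise; the "forest" averages the trees. We repeat training
    many times and measure how much the forest's output wobbles.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        trees = [rng.gauss(0.0, noise) for _ in range(n_trees)]
        preds.append(sum(trees) / n_trees)
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return var ** 0.5

# Averaging more trees shrinks the run-to-run variability.
small_forest_std = forest_prediction_std(n_trees=5)
large_forest_std = forest_prediction_std(n_trees=100)
```

With 100 trees instead of 5, the spread of the forest's output across retrainings drops by roughly a factor of sqrt(20).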
HotPotato wrote:
3. data normalisation - is there a certain normalisation technique that decision trees will respond to better than others?
Hard to tell. They are definitely not sensitive to scaling/shifting of the inputs - that's because a decision forest does not calculate linear combinations of inputs; it only compares each input against split thresholds, and those thresholds simply move along with any rescaling of the data.
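Here is a minimal sketch of that invariance using a single regression stump (one split, two leaf means) written from scratch - an illustration of the tree mechanism, not ALGLIB code. Training on rescaled inputs moves the threshold by the same amount, so the predictions are unchanged.

```python
def best_stump(xs, ys):
    """Fit a one-split regression stump on a single input.

    Tries every midpoint between consecutive sorted xs as a threshold,
    keeps the split with the smallest squared error, and returns a
    predictor that outputs the left or right leaf mean.
    """
    best = None
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(xs)):
        thr = (xs[order[k - 1]] + xs[order[k]]) / 2.0
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x < thr else rm

xs = [0.1, 0.4, 0.9, 1.5, 2.0, 3.2]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

stump = best_stump(xs, ys)
# Same data, but the input is scaled by 1000 and shifted by 5.
scaled_stump = best_stump([1000.0 * x + 5.0 for x in xs], ys)

# The learned threshold moves with the scale, so predictions agree.
predictions_agree = all(stump(x) == scaled_stump(1000.0 * x + 5.0) for x in xs)
```

Contrast this with a neural network, where scaling an input by 1000 changes the linear combinations it computes and usually requires normalisation.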
HotPotato wrote:
4. Extracting results - is there a way to extract which inputs are more related to the outputs than others? I can see many weights being held in the internal model of the decision forests, but I'm not sure how to approach this.
Again, hard to tell. You can't decide which inputs are more important by judging from the weights stored by the decision forest. Some authors propose randomly permuting one particular input and training a new forest on the modified sample. Comparing the generalization errors should tell you which inputs are more important :) I should say that I don't like this idea, but it is the only solution I've heard of.
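The permute-and-retrain scheme can be sketched like this. Since ALGLIB's forest API isn't shown here, the sketch stands in a 1-nearest-neighbour regressor for the forest - any model with a train/predict interface would do. The target depends only on input 0, so permuting input 0 (and retraining) should hurt much more than permuting input 1.

```python
import random

def fit_1nn(train_x, train_y):
    """'Train' a 1-nearest-neighbour regressor (stand-in for a forest)."""
    def predict(row):
        best_i = min(range(len(train_x)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train_x[i], row)))
        return train_y[best_i]
    return predict

def mse(model, xs, ys):
    return sum((model(r) - y) ** 2 for r, y in zip(xs, ys)) / len(ys)

def permuted_retrain_error(train_x, train_y, test_x, test_y, feature, seed=0):
    """Randomly permute one input column, retrain, return test error."""
    rng = random.Random(seed)
    col = [row[feature] for row in train_x]
    rng.shuffle(col)
    shuffled = [list(row) for row in train_x]
    for row, v in zip(shuffled, col):
        row[feature] = v
    return mse(fit_1nn(shuffled, train_y), test_x, test_y)

rng = random.Random(42)
train_x = [[rng.random(), rng.random()] for _ in range(200)]
train_y = [x0 for x0, _ in train_x]          # target depends only on input 0
test_x = [[rng.random(), rng.random()] for _ in range(50)]
test_y = [x0 for x0, _ in test_x]

baseline = mse(fit_1nn(train_x, train_y), test_x, test_y)
err_input0 = permuted_retrain_error(train_x, train_y, test_x, test_y, 0)
err_input1 = permuted_retrain_error(train_x, train_y, test_x, test_y, 1)
```

The large jump in error for input 0 marks it as important; input 1's error stays near the baseline. (A later refinement by Breiman permutes the input on out-of-bag data without retraining, which is cheaper.)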
Hope this helps, and good luck with ALGLIB :)