Machine learning in trading: theory, models, practice and algo-trading - page 1278

 
elibrarius:

Something similar to the permutation that Maxim found. But does it make sense to replace a predictor that varies from 0.1 to 0.2 with a predictor that varies from 800 to 300000? No!

But shuffling its rows does make sense. The range of values and the probability distribution are preserved, while the value in each example becomes random.

Yes, indeed it is similar - probably that is when this idea came up. I don't see a problem with swapping in a different predictor, because each predictor in a row has its own values. Besides, you need to somehow preserve the grid quantization of those values (it may be a uniform step of 0.1, 0.2, 0.3 or any other - different developers of model builders offer different options), keeping it as it was when the tree-building algorithm ran, if that is possible.

And what is also important: you need to check not the entire sample, but only those rows where the original leaf was activated, so that the accuracy figures are comparable, i.e. you need to pre-filter the sample (with a different filter for evaluating each leaf).

 
Aleksey Vyazmikin:

Yes, indeed it is similar - probably that is when this idea came up. I don't see a problem with swapping in a different predictor, because each predictor in a row has its own values. Besides, you need to somehow preserve the grid quantization of those values (it may be a uniform step of 0.1, 0.2, 0.3 or any other - different developers of model builders offer different options), keeping it as it was when the tree-building algorithm ran, if that is possible.

Well, in place of a predictor whose maximum was 0.2, you substitute values from another predictor ranging from 800 to 300000? Then it will always go into the right branch, whereas we need to check both the right and the left branches.
Normalization would help fit the values into the range, but the probability distribution may differ, so the right branches would fire more often than the left ones, or vice versa.
Or maybe I don't understand your idea and we are talking about different things.
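A toy numeric illustration of this range problem (the threshold, the numbers and the skewed distribution are made up for the example, and numpy is assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    threshold = 0.15                            # split learned on a 0.1..0.2 predictor

    a = rng.uniform(0.1, 0.2, 10_000)           # original predictor
    print((a > threshold).mean())               # ~0.5: both branches are exercised

    d = 800 + rng.exponential(30_000, 10_000)   # substituted predictor, roughly 800..300000
    print((d > threshold).mean())               # 1.0: every row goes to the right branch

    # Min-max scaling fixes the range but not the shape of the distribution,
    # so the branch frequencies still differ from the original ~50/50.
    d_scaled = 0.1 + 0.1 * (d - d.min()) / (d.max() - d.min())
    print((d_scaled > threshold).mean())        # far below 0.5 for this skewed d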

Aleksey Vyazmikin:

And what is also important: you need to check not the entire sample, but only those rows where the original leaf was activated, so that the accuracy figures are comparable, i.e. you need to pre-filter the sample (with a different filter for evaluating each leaf).

By removing a predictor we drop the nodes that split on it (what does a separate leaf have to do with it?). For each discarded node we have to check both branches. If we discard 10 nodes, we get 11 variants (subtrees) with 11 leaves as possible answers. That needs to be averaged, and running the whole sample with one column shuffled shows roughly this, as a change in the final tree/forest error.
Read the article about permutation - everything is described there in detail.
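For reference, a minimal sketch of the column-shuffling procedure (permutation importance) being discussed, assuming a fitted classifier with a scikit-learn-style predict method; the function and its parameters are illustrative:

    import numpy as np

    def permutation_importance(model, X, y, n_repeats=5, seed=0):
        rng = np.random.default_rng(seed)
        base_err = np.mean(model.predict(X) != y)   # baseline classification error
        importances = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            errs = []
            for _ in range(n_repeats):
                X_perm = X.copy()
                # Shuffle ONE column: its range and distribution are preserved,
                # but its link to the target is destroyed.
                rng.shuffle(X_perm[:, j])
                errs.append(np.mean(model.predict(X_perm) != y))
            # How much the error grows when this predictor is scrambled.
            importances[j] = np.mean(errs) - base_err
        return importances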

 
elibrarius:

Well, in place of a predictor whose maximum was 0.2, you substitute values from another predictor ranging from 800 to 300000? Then it will always go into the right branch, whereas we need to check both the right and the left branches.
Normalization would help fit the values into the range, but the probability distribution may differ, so the right branches would fire more often than the left ones, or vice versa.
Or maybe I don't understand your idea and we are talking about different things.

By removing a predictor we drop the nodes that split on it (what does a separate leaf have to do with it?). For each discarded node we have to check both branches. If we discard 10 nodes, we get 11 variants (subtrees) with 11 leaves as possible answers. That needs to be averaged, and running the whole sample with one column shuffled shows roughly this, as a change in the final tree/forest error.
Read the article about permutation - everything is described there in detail.

My method is not permutation, so I cannot reproduce it that way.

I start from a different premise: a leaf is an already complete rule, a descriptor of some observation, and it can perfectly well exist without the tree structure. A tree is a tool for generating rules from observations.

Of course, I agree that with some predictors the modified leaf rule will be completely non-functional on the same part of the sample, but that is not a problem, since the goal is to find the best analogue and compare only against it - it is normal that some predictors which split the sample will drop out, and as a rule this only concerns the value range of an individual predictor. For example, we have a leaf built on three predictors, A>3 && B<1 && C>=20, and there are also predictors D and E which are not part of the rule. We remove the conditions one at a time, starting with A, and replace each one with D and with E, stepping through the grid of that predictor's split values and trying both inequality signs. Each new rule is checked on the same parts of the sample where the original was activated, and we collect classification-accuracy statistics for every rule. The best variant is then compared with the original and that comparison is scored. This procedure is done for all leaves, excluding duplicated leaves. The important thing is to evaluate not the tree's result as a set of rules, but each rule stated in a leaf separately from the others.

I am not trying to determine the importance of a predictor for the greedy tree, but the importance of a predictor for the stability of the rule (leaf) proposed by the model.
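A rough sketch of how such a leaf-rule substitution check could look, going only by the description above - the rule encoding as (column, operator, threshold) triples, the grids and the scoring are assumptions for illustration, not the actual implementation:

    import numpy as np
    import operator

    OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

    def rule_mask(X, rule):
        # Rows where a leaf rule such as A>3 && B<1 && C>=20 is activated;
        # a rule is a list of (column, op, threshold) conditions.
        mask = np.ones(len(X), dtype=bool)
        for col, op, thr in rule:
            mask &= OPS[op](X[:, col], thr)
        return mask

    def best_substitute(X, y, rule, drop_i, spare_cols, grids):
        # Drop condition drop_i and try replacing it with each spare predictor,
        # scanning that predictor's grid of split values with both inequality
        # signs. Every variant is scored ONLY on rows where the original rule
        # fired, so the accuracy figures are comparable.
        base_rows = rule_mask(X, rule)
        rest = [c for i, c in enumerate(rule) if i != drop_i]
        best = None
        for col in spare_cols:
            for thr in grids[col]:
                for op in (">", "<"):
                    fired = rule_mask(X, rest + [(col, op, thr)]) & base_rows
                    if fired.any():
                        acc = y[fired].mean()   # assumes a 0/1 target, leaf predicts 1
                        if best is None or acc > best[0]:
                            best = (acc, col, op, thr)
        return best  # compare against the original rule's accuracy on base_rows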
 
Aleksey Vyazmikin:

My method is not permutation, so I cannot reproduce it that way.

I start from a different premise: a leaf is an already complete rule, a descriptor of some observation, and it can perfectly well exist without the tree structure. A tree is a tool for generating rules from observations.

Of course, I agree that with some predictors the modified leaf rule will be completely non-functional on the same part of the sample, but that is not a problem, since the goal is to find the best analogue and compare only against it - it is normal that some predictors which split the sample will drop out, and as a rule this only concerns the value range of an individual predictor. For example, we have a leaf built on three predictors, A>3 && B<1 && C>=20, and there are also predictors D and E which are not part of the rule. We remove the conditions one at a time, starting with A, and replace each one with D and with E, stepping through the grid of that predictor's split values and trying both inequality signs. Each new rule is checked on the same parts of the sample where the original was activated, and we collect classification-accuracy statistics for every rule. The best variant is then compared with the original and that comparison is scored. This procedure is done for all leaves, excluding duplicated leaves. The important thing is to evaluate not the tree's result as a set of rules, but each rule stated in a leaf separately from the others.

I am not trying to determine the importance of a predictor for the greedy tree, but the importance of a predictor for the stability of the rule (leaf) proposed by the model.
I'm still working with forests for now. So in your ML direction - you know what to do)
 
elibrarius:
I'm still working with forests for now. So in your ML direction - you know what to do)

So it turns out everyone here is talking about his own thing :)

Tell me, is it realistic to implement in the same Alglib the ability to build a forest only with unique predictor values, or at least with unique splits? There will be fewer trees, of course, but they won't duplicate each other's errors/correct answers, which should give more plausibility outside the training sample.
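As a toy illustration of the unique-splits idea, independent of Alglib: a forest of depth-1 stumps in which no (column, threshold) split may be reused by a later tree. Everything here is assumed for the example, not Alglib code:

    import numpy as np

    def unique_stump_forest(X, y, n_trees=50):
        # Greedily grow depth-1 trees ("stumps"), forbidding any
        # (column, threshold) split an earlier tree already used,
        # so trees cannot duplicate each other's errors/answers.
        used, stumps = set(), []
        for _ in range(n_trees):
            best = None
            for j in range(X.shape[1]):
                for thr in np.unique(X[:, j])[:-1]:
                    if (j, thr) in used:
                        continue                  # split already taken
                    pred = X[:, j] > thr
                    acc = max((pred == y).mean(), (pred != y).mean())
                    if best is None or acc > best[0]:
                        best = (acc, j, thr)
            if best is None:
                break                             # no unused splits left
            used.add(best[1:])
            stumps.append(best[1:])
        return stumps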

 
Aleksey Vyazmikin:

Tell me, is it realistic to implement in the same Alglib the ability to build a forest only with unique predictor values, or at least with unique splits? There will be fewer trees, of course, but they won't duplicate each other's errors/correct answers, which should give more plausibility outside the training sample.

Of course. MQL allows you to program anything, like any other language. And Alglib can be rewritten beyond recognition to fit your ideas.

 
elibrarius:

Of course. MQL allows you to program anything, just like any other language. And Alglib can be rewritten beyond recognition to fit your ideas.

In the word "real" I put "as simple as that"...

But from your answer I understood the attitude - thank you, I won't bother with it.

 
Aleksey Vyazmikin:

In the word "real" I put "how easy"...

But from your answer I understood the attitude - thank you, I won't bother with it.

It's complicated, of course.
But there is a base in the form of the tree functions. Those you can modify as you wish.
 
elibrarius:
It's complicated, of course.
But there is a base in the form of the tree functions. Those you can modify as you wish.

Thanks for your reply. You have a good understanding of the code - are you planning a public release with improvements to the tree-building algorithm? Even things like tree depth control, or built-in pruning down to a configurable minimum number of observations per rule, would be very useful there. I haven't used Alglib myself, but there are those who might find it very useful.

 
Aleksey Vyazmikin:

Thanks for your reply. You have a good understanding of the code - are you planning a public release with improvements to the tree-building algorithm? Even things like tree depth control, or built-in pruning down to a configurable minimum number of observations per rule, would be very useful there. I haven't used Alglib myself, but there are those who might find it very useful.

Understanding the code and making releases are different things. I'm experimenting at the moment - mixing predictors. Maybe I will abandon this too, as I abandoned neural networks because of my inability to cope with noise.

Figuring it out is quite easy. You need to look at the code for a few hours and everything will become clear.