Machine learning in trading: theory, models, practice and algo-trading - page 1964

 
Aleksey Vyazmikin:

Earlier I probably didn't explain that at least 1% of the observations should remain in the leaf on small samples, and at least 100 on large samples, so of course the split will not go all the way to zero error in the leaf for either class.

You seem to misunderstand the last step - I see it as a statistical evaluation of the remaining 1% of the sample: within this sample we check whether the result improves with splits by different predictors, and we get information about the subspace, for example:

If A > x1, then target 1 will be true 40% of the time, covering 60% of the subsample

If B > x2, then target 1 will be correctly identified 55% of the time, covering 45% of the subsample

If A <= x1, then target 1 will be true 70% of the time, covering 50% of the subsample

Each such split has a significance coefficient (I haven't decided how to calculate it yet), and so does the last split.

and so on, let's say up to 5-10 predictors; then, when applying the model, if we reach the last split, we add up the coefficients (or use a more elaborate method of summation), and if the sum of coefficients exceeds a threshold, the leaf is classified as 1, otherwise as zero.
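A minimal sketch of that summation step in Python (all rules, coefficients and the threshold below are hypothetical placeholders, not the actual method):

def classify_in_leaf(row, rules, threshold):
    # Sum the significance coefficients of the conditions that fire on this row;
    # classify as 1 if the sum exceeds the threshold, otherwise 0.
    score = sum(coef for predicate, coef in rules if predicate(row))
    return 1 if score > threshold else 0

# Conditions from the example above, with made-up cut-offs and coefficients:
x1, x2 = 0.5, 1.2
rules = [
    (lambda r: r["A"] > x1,  0.40),
    (lambda r: r["B"] > x2,  0.55),
    (lambda r: r["A"] <= x1, 0.70),
]

print(classify_in_leaf({"A": 0.3, "B": 2.0}, rules, threshold=1.0))  # -> 1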


A simple way to implement this is to forcibly build a forest up to the penultimate split and then exclude the predictors already selected from the sample, so that new ones get selected. Or simply, after building the tree, filter the sample by leaf and go through each predictor on its own in search of the best split that meets the criteria of completeness (recall) and accuracy (precision) - roughly as in the sketch below.
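Roughly, that per-leaf search could look like this (a sketch only; the column handling and the minimum precision/recall limits are assumptions for illustration):

import numpy as np

def best_split_per_predictor(leaf_df, target_col, min_precision=0.55, min_recall=0.40):
    # leaf_df: the rows of the training sample that fall into one leaf.
    # For every predictor, scan its values as candidate thresholds and keep the
    # best split that satisfies the accuracy (precision) and completeness (recall) limits.
    results = {}
    y = leaf_df[target_col].values
    for col in leaf_df.columns.drop(target_col):
        best = None
        for thr in np.unique(leaf_df[col].values):
            mask = leaf_df[col].values > thr
            if mask.sum() == 0:
                continue
            precision = y[mask].mean()                # share of class 1 among covered rows
            recall = y[mask].sum() / max(y.sum(), 1)  # share of class 1 that is covered
            if precision >= min_precision and recall >= min_recall:
                if best is None or precision > best[1]:
                    best = (thr, precision, recall)
        if best is not None:
            results[col] = best
    return results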

Also, the result on the training sample will improve if the other class "0" means no action rather than an entry in the opposite direction; otherwise there can be either improvement or deterioration.

Anyway, the training result will be something between a tree trained to depth N and one trained to depth N+1, for example 6 and 7. If at level 6 the error is 20% and at level 7 the error is 18%, your method will give an error between them, for example 19%. Do you think a gain of 1% is worth the time?
Earlier I described a simple way - train some trees to depth 6 and some to depth 7. That would require rewriting the code.
Now I've come up with a simpler way that doesn't require rewriting anything: just build a random forest with any package to depth 6 and another forest to depth 7, then average them.
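For example, with scikit-learn (synthetic data here just to make the sketch runnable - substitute your own sample):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One forest limited to depth 6, another to depth 7.
rf6 = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=1).fit(X_train, y_train)
rf7 = RandomForestClassifier(n_estimators=100, max_depth=7, random_state=2).fit(X_train, y_train)

# Average the class-1 probabilities of the two forests.
proba = (rf6.predict_proba(X_test)[:, 1] + rf7.predict_proba(X_test)[:, 1]) / 2
pred = (proba > 0.5).astype(int)
print("test accuracy:", (pred == y_test).mean())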


The training sample should not bother us - it is always fine.

 
elibrarius:
All the same, the training result will be something between a tree trained to depth N and one trained to depth N+1, for example 6 and 7. If at level 6 the error is 20% and at level 7 the error is 18%, your method will give an error between them, for example 19%. Do you think a gain of 1% is worth the time?
Earlier I described a simple way - train some trees to depth 6 and some to depth 7. That would require rewriting the code.
Now I've come up with a simpler way that doesn't require rewriting anything: just build a random forest with any package to depth 6 and another forest to depth 7, then average them.


The training sample shouldn't worry us - it's always fine.

The gain is usually more than 1%.

Of course you can use random forests, but how do you get them to be identical up to the penultimate split? Suppose you've trained 10 trees to the 6th split, and you train 10 more trees the same way, but to the 7th.

 
Aleksey Vyazmikin:

The gain is usually more than 1%.

Of course you can use random forests, but how do you make them identical up to the penultimate split? Suppose you trained 10 trees to the 6th split, and we train the remaining 10 the same way, but to the 7th.

No way - that's why they are random: they take random columns for training. Averaging then gives good results.
You can try setting the fraction of columns = 1, so that all columns are involved in building the tree rather than a random 50% of them. Then all trees will be the same, so also set the forest to 1 tree. In total, one forest with one tree is trained to depth 6, the other to depth 7.
If you need more than 2 trees, then remove some columns from the set yourself and train additional forests on all the remaining columns.

Addendum: the fraction of rows involved in training should also be set = 1, i.e. all of them, so that training is identical. That way everything random is removed from the random forest.
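In scikit-learn terms this could look roughly as follows (a sketch under the stated assumptions; with all columns, all rows and a fixed seed, the depth-7 tree should repeat the depth-6 tree's splits and only add one more level):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# All columns (max_features=None), all rows (bootstrap=False), a single tree:
# the only difference between the two models is the depth limit.
tree6 = RandomForestClassifier(n_estimators=1, max_depth=6, max_features=None,
                               bootstrap=False, random_state=0).fit(X, y)
tree7 = RandomForestClassifier(n_estimators=1, max_depth=7, max_features=None,
                               bootstrap=False, random_state=0).fit(X, y)

avg_proba = (tree6.predict_proba(X)[:, 1] + tree7.predict_proba(X)[:, 1]) / 2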
 

It's a challenge to come up with the trading logic for that NS

so far we have these


plus the NS architecture varies over a wide range - a lot to search through

the main thing is to pick the right reward

 
Maxim Dmitrievsky:

It's a real challenge to come up with the trading logic for that NS

so far we have these


plus the NS architecture varies over a wide range - a lot to search through

the main thing is to choose the right reward

Have you taken up NS with reinforcement learning again? That's where the reward is used, it seems.

 

I suggest testing on this data, there is definitely a pattern there and it is clear what to strive for.

P.S. Remove .txt from the file names.

Files:
test.zip.001.txt  15360 kb
test.zip.002.txt  13906 kb
 

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

 
Maxim Dmitrievsky:

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

Not bad).
 
Maxim Dmitrievsky:

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

So is this trading on new data, or how should I understand it?

 
mytarmailS:

So is this trading on new data, or how should this be understood?

You just run it and it trades; it learns as it goes