Machine learning in trading: theory, models, practice and algo-trading - page 1305

 

Change in performance as the "probability" threshold for 0/1 classification is shifted from 0.45 to 0.65.

Essentially, accuracy.

The histograms show that classification accuracy grows quite smoothly as the probability threshold is shifted, which cannot be said of profit.

It turns out that we should consider not only classification efficiency, but also assess how profit is distributed across the rules (leaves) and what their sensitivity threshold is. In other words, no matter how you look at it, you have to pull out the individual rules and score them.
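To make the accuracy-versus-profit picture concrete, here is a minimal Python sketch of such a threshold sweep. Everything in it is hypothetical: proba, y_true and trade_pnl stand in for the model's outputs and per-trade results, and the synthetic data exists only to make the snippet runnable.

```python
import numpy as np

def sweep_threshold(proba, y_true, trade_pnl, thresholds=np.arange(0.45, 0.66, 0.01)):
    """Hit rate and total profit at each probability threshold."""
    rows = []
    for t in thresholds:
        taken = proba >= t                        # trades taken at this threshold
        n = int(taken.sum())
        hit = (y_true[taken] == 1).mean() if n else float("nan")
        rows.append((round(float(t), 2), n, hit, trade_pnl[taken].sum()))
    return rows

# Synthetic stand-in data, only so the sketch runs end to end:
rng = np.random.default_rng(0)
proba = rng.uniform(0, 1, 1000)                   # model's "probability" of class 1
y_true = (proba + rng.normal(0, 0.3, 1000) > 0.5).astype(int)
trade_pnl = np.where(y_true == 1, rng.exponential(1.0, 1000),
                     -rng.exponential(1.2, 1000))  # losses vary, like a non-fixed stop

for t, n, hit, profit in sweep_threshold(proba, y_true, trade_pnl):
    print(f"threshold={t:.2f}  trades={n:4d}  hit_rate={hit:.3f}  profit={profit:8.2f}")
```

On real data, a table like this shows directly whether profit tracks the smooth growth in hit rate or breaks away from it.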

 
elibrarius:
Saw it)
So there's confusion with the terminology.

What about cross-validation? All subsets are involved, so validation is more accurate than a test.

Anyway, fine. My test is the second part of the subsample, but I'll call it validation then.

 
Aleksey Vyazmikin:

Change in performance as the "probability" threshold for 0/1 classification is shifted from 0.45 to 0.65.

Essentially, accuracy.

The histograms show that classification accuracy grows quite smoothly as the probability threshold is shifted, which cannot be said of profit.

It turns out that we should consider not only classification efficiency, but also assess how profit is distributed across the rules (leaves) and what their sensitivity threshold is. In other words, no matter how you look at it, you have to pull out the individual rules and score them.

Profit is smaller at 0.65 because there are fewer trades too. For example, instead of 100 there will be 10 trades. You can increase the lot.
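A back-of-envelope illustration of this argument, with made-up numbers: if the per-trade expectancy improves enough at the higher threshold, a larger lot can make up for the missing trades. Note what it ignores: fewer, larger trades concentrate the drawdown of any single mistake.

```python
# Hypothetical numbers illustrating the lot-scaling argument.
trades_lo, edge_lo = 100, 0.10   # threshold 0.45: many trades, small edge per trade
trades_hi, edge_hi = 10, 0.60    # threshold 0.65: few trades, larger edge per trade

base_lot = 0.1
scaled_lot = base_lot * trades_lo / trades_hi     # keep total turnover roughly comparable

print(round(trades_lo * edge_lo * base_lot, 2))   # 1.0  total profit at the low threshold
print(round(trades_hi * edge_hi * scaled_lot, 2)) # 6.0  higher despite 10x fewer trades
```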

 
Aleksey Vyazmikin:

Change in performance as the "probability" threshold for 0/1 classification is shifted from 0.45 to 0.65.

Essentially, accuracy.

The histograms show that classification accuracy grows quite smoothly as the probability threshold is shifted, which cannot be said of profit.

It turns out that we should consider not only classification efficiency, but also assess how profit is distributed across the rules (leaves) and what their sensitivity threshold is. In other words, no matter how you look at it, you have to pull out the individual rules and score them.

If you raise the threshold, then as the model starts to degrade there will be fewer and fewer trades on new data and the probabilities will hover around zero; that's a good moment for retraining.

To raise the thresholds, the error has to be low; otherwise there will be no signals at all.
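A minimal sketch of that retraining trigger, assuming predictions arrive bar by bar. SignalRateMonitor and all of its parameters are hypothetical names for illustration, not anything from a real library:

```python
from collections import deque

class SignalRateMonitor:
    """Rolling share of bars whose predicted probability clears the threshold."""

    def __init__(self, threshold=0.65, window=500, min_rate=0.01):
        self.threshold = threshold      # the raised classification threshold
        self.min_rate = min_rate        # below this signal rate, flag a retrain
        self.history = deque(maxlen=window)

    def update(self, proba):
        """Feed one prediction; return True when retraining looks due."""
        self.history.append(proba >= self.threshold)
        if len(self.history) < self.history.maxlen:
            return False                # not enough history yet
        rate = sum(self.history) / len(self.history)
        return rate < self.min_rate     # signals dried up on new data
```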
 
elibrarius:

Profit is smaller at 0.65 because there are fewer trades too. For example, instead of 100 there will be 10 trades.

The number of trades, including profitable ones, actually varies quite smoothly (a deal counts as half a trade in MT's logic).

It's just that the loss per trade is not stable, because the stop loss is not fixed.

 
elibrarius:
Saw it)
So there's confusion with the terminology.

I propose my terminology (I'll stick to it for now; see the split sketch below the list):

1. Learning sample: the one on which the model is built.

2. Test sample: used to control the quality of training, including early stopping.

3. Exam sample: independent of training, used to assess the final quality of the model.
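A minimal sketch of that three-way split, assuming scikit-learn is available; the 60/20/20 proportions and the placeholder data are illustrative, not part of the proposal:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)          # placeholder features
y = (X[:, 0] > 0.5).astype(int)      # placeholder labels

# learning sample -> fit the model; test sample -> control training / early stopping;
# exam sample -> touched once, only for the final quality estimate.
X_learn, X_rest, y_learn, y_rest = train_test_split(X, y, test_size=0.4, shuffle=False)
X_test, X_exam, y_test, y_exam = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=False)
# shuffle=False keeps chronological order, which matters for market data
```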

 
Maxim Dmitrievsky:

If you raise the threshold, then as the model starts to degrade there will be fewer and fewer trades on new data and the probabilities will hover around zero; that's a good moment for retraining.

To raise the thresholds, the error has to be low; otherwise there will be no signals at all.

Yes, that's clear. It's just that the signals disappear because the relationships captured in the leaves stop reproducing on new data, especially if a large part of their total activation hovered around 0.5 and looked like the sum 0.1+0.05+0.08+0.25+0.03: one of the terms drops out and that's it, no activation occurs.
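The arithmetic of that example, spelled out: the activation only just clears 0.5, so losing any single term kills the signal.

```python
# The sum of leaf contributions sits just above the 0.5 activation threshold.
full = [0.1, 0.05, 0.08, 0.25, 0.03]
print(round(sum(full), 2), sum(full) > 0.5)          # 0.51 True: activation fires

# On new data one leaf's condition stops matching and its term drops out:
degraded = [0.1, 0.05, 0.25, 0.03]                   # the 0.08 leaf fell out
print(round(sum(degraded), 2), sum(degraded) > 0.5)  # 0.43 False: no activation
```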

 
Aleksey Vyazmikin:

Yes, that's clear. It's just that the signals disappear because the relationships captured in the leaves stop reproducing on new data, especially if a large part of their total activation hovered around 0.5 and looked like the sum 0.1+0.05+0.08+0.25+0.03: one of the terms drops out and that's it, no activation occurs.

That means the algorithm doesn't generalize well to new data, you need to keep tweaking it ) roughly speaking, it's overfitted.

 
Maxim Dmitrievsky:

10% error on test and train for ~10k examples; it grows smoothly as the sample grows.

With that kind of error the models started to work on new data.

On validation it's a different story; you have to go through the options.

I don't reveal algorithms anymore, I'm just communicating

Oh! That's the deal! Almost like mine! Told you I didn't need to listen to all kinds of ales and wizards :)

 
Maxim Dmitrievsky:

That means the algorithm doesn't generalize well to new data, you need to keep tweaking it ) roughly speaking, it's overfitted.

So I keep tweaking it, I don't want to fool myself :)

I'll add a new dose of predictors...
