Machine learning in trading: theory, models, practice and algo-trading - page 3520

 
Aleksey Vyazmikin #:

Other sampling and markup

Isn't the indicator affected by class balance?

It is, to some extent.
 
Maxim Dmitrievsky #:
Measure the entropy of labels before training. Then compare it with the results on the OOS of the trained model using your own estimates. Reducing entropy should improve trading on the OOS.

I think this requires Recall close to one, and I don't have such models.

 
Aleksey Vyazmikin #:

I think this requires Recall close to one, and I don't have such models.

Subjectively, entropy reduction should affect any data and any model. Whether there is such a dependence hasn't been proven yet, though.
ChatGPT says there is no direct correlation. But of course we trust it completely.
 
The lower the entropy, the more patterns and the less noise in the data. The ordering of the labels indirectly reflects the patterns in the original data - that's the logic.
It doesn't matter that the labels formally depend on the features rather than on each other (in reality they do depend on each other).
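
As a rough illustration of that logic (a minimal sketch, not the measure used in this thread; it assumes binary labels and plain Shannon entropy over short blocks of the label sequence): the block entropy drops when the ordering of the labels is regular and approaches the maximum for coin-flip labels.

import numpy as np
from collections import Counter

def label_block_entropy(labels, block=3):
    # Shannon entropy (bits) of sliding blocks of the label sequence;
    # lower = more regular ordering, i.e. more "patterns" and less noise
    labels = np.asarray(labels)
    blocks = [tuple(labels[i:i + block]) for i in range(len(labels) - block + 1)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
patterned = np.tile([0, 1], 500)          # 0,1,0,1,... - only two distinct blocks
noisy = rng.integers(0, 2, size=1000)     # coin flips - nearly all blocks occur
print(label_block_entropy(patterned))     # about 1 bit
print(label_block_entropy(noisy))         # close to the 3-bit maximum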
 
Maxim Dmitrievsky #:
Subjectively, entropy reduction should affect any data and any model. Whether there is such a dependence hasn't been proven yet, though.

My point is that with low Recall there will be very few ones after classification, while for a correct estimate there should probably be about the same number of ones. Or do you propose taking only the region where the model classified an example as a one and estimating the entropy from that data? Then the amount of data will be very small.

 
Aleksey Vyazmikin #:

My point is that with low Recall there will be very few ones after classification, while for a correct estimate there should probably be about the same number of ones. Or do you propose taking only the region where the model classified an example as a one and estimating the entropy from that data? Then the amount of data will be very small.

Compare relative to other similar models, i.e. similar markups against each other. Then the absolute entropy values don't matter.
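
A sketch of that kind of relative comparison (the markups and the entropy score here are placeholders; any consistent measure would do, since only the ranking is used):

import numpy as np
from scipy.stats import entropy

def markup_entropy(labels):
    # any entropy-like score of the labels will do for a *relative* comparison;
    # here: plain Shannon entropy of the class frequencies, in bits
    _, counts = np.unique(labels, return_counts=True)
    return entropy(counts, base=2)

rng = np.random.default_rng(1)
# hypothetical candidate markups of the same price series (placeholders)
markups = {
    "markup_A": rng.integers(0, 2, 10_000),
    "markup_B": (rng.random(10_000) < 0.2).astype(int),
    "markup_C": (rng.random(10_000) < 0.35).astype(int),
}
# only the ordering matters, not the absolute numbers
for name, h in sorted(((k, markup_entropy(v)) for k, v in markups.items()), key=lambda kv: kv[1]):
    print(f"{name}: H = {h:.3f} bits")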
 
Maxim Dmitrievsky #:
Subjectively, entropy reduction should affect any data and any model. Whether there is such a dependence hasn't been proven yet, though.
ChatGPT says there is no direct correlation. But of course we trust it completely.

The translator on the English forum rendered it as "should affect any melons". The word there was "data" :)

Although it can affect some melons too :)
 

Prediction horizon has a big impact on label entropy in my case.

Here is the best result so far, when predicting 7 bars ahead.

It works pretty well on new data too, but I still need to run all the tests numerically later.

Iteration: 0, Cluster: 8, PE: 0.40936608744037023
R2: 0.9870464881731833
Iteration: 0, Cluster: 0, PE: 0.4050201326242423
R2: 0.9501689078137812
Iteration: 0, Cluster: 14, PE: 0.4068050171665601
R2: 0.9424351839049473
Iteration: 0, Cluster: 6, PE: 0.4083841431533637
R2: 0.9640201773292999
Iteration: 0, Cluster: 1, PE: 0.4029598952998219
R2: 0.9647563735412019
Iteration: 0, Cluster: 10, PE: 0.42256264493226764
R2: 0.9480179359165651
Iteration: 0, Cluster: 11, PE: 0.39482397910804234
R2: 0.9544428061031717
Iteration: 0, Cluster: 4, PE: 0.4142613513859099
R2: 0.9627640645034978
Iteration: 0, Cluster: 13, PE: 0.39485729195195374
R2: 0.9681222157387659
Iteration: 0, Cluster: 2, PE: 0.4105500062917616
R2: 0.9247043823736797
Iteration: 0, Cluster: 3, PE: 0.4369346321572134
R2: 0.957083732499118
Iteration: 0, Cluster: 7, PE: 0.39147941900792577
R2: 0.9229524618732813
Iteration: 0, Cluster: 9, PE: 0.3804822442346182
R2: 0.9578936787240975
Iteration: 0, Cluster: 12, PE: 0.4128185350207254
R2: 0.90955228742045
Iteration: 0, Cluster: 5, PE: 0.37123232664087563
R2: 0.9187458031617316
 
Maxim Dmitrievsky #:

Prediction horizon has a big impact on label entropy in my case.

Here is the best result so far, when predicting 7 bars ahead

It works pretty well on new data too, but I still need to run all the tests numerically later.

How do you compare after splitting into clusters (judging by the log)? I assume they have different numbers of sample elements.

 
Aleksey Vyazmikin #:

How do you compare after splitting into clusters (judging by the log)? I assume they have different numbers of sample elements.

Apparently, the number of items doesn't affect the metric that much.

It is affected much more by how the labels are assigned.