Machine learning in trading: theory, models, practice and algo-trading - page 3523

 
Maxim Dmitrievsky #:

10 models (each is really two: a base model and a meta model).

And an immediately ready TS (trading system).


I run retraining in batches of 20-100 with different parameters. The markup has the biggest impact.

So I want to find a way to obtain the most correct markup.

What is being sought here is not the "correct" markup, but a markup that is convenient for training a model with specific settings.

But even so, it is quite realistic to go through 1kk (a million) variants as well.

 
Aleksey Vyazmikin #:

What is being sought here is not the "correct" markup, but a markup that is convenient for training a model with specific settings.

But even so, it is quite realistic to go through 1kk (a million) variants as well.

Well, the features are the same, but the markups are different, so you get different models.
 
Maxim Dmitrievsky #:
Well, the features are the same, but the markups are different, so you get different models.

Well, of course it makes sense. My method simply lets you set aside the model-settings factor.

If you claim that your CB (CatBoost) settings do not significantly affect the learning process, then share such a sample for reproduction; I would be interested to look at it.

In any case, this is all talk about randomness. As long as there is no way to quickly detect the model's "breakdown", it will drain quickly on new data.

 
Aleksey Vyazmikin #:

Well, of course it makes sense. My method simply lets you set aside the model-settings factor.

If you claim that your CB (CatBoost) settings do not significantly affect the learning process, then share such a sample for reproduction; I would be interested to look at it.

In any case, this is all talk about randomness. As long as there is no way to quickly detect the model's "breakdown", it will drain quickly on new data.

There are 2 models, not set up in the usual way, and 2 different datasets.
 
Maxim Dmitrievsky #:
There are 2 models, not set up in the usual way, and 2 different datasets.

Well, that's clear; I'm interested in the sample with the markup after the first model.

 
Aleksey Vyazmikin #:

Well, that's clear; I'm interested in the sample with the markup after the first model.

There are 2 models working at once. The second one is trained simply to trade / not to trade. I can upload it tomorrow, though I don't know what for :)
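The two-model scheme described here resembles meta-labeling: a base model predicts direction, and a second model learns only whether to act on that signal. A minimal sketch, assuming scikit-learn and synthetic data; the features, the markup rule, and the model choice are all illustrative, not the author's actual pipeline:

```python
# Sketch of a two-model (base + meta) scheme: the base model predicts
# trade direction; the meta model is trained on a second dataset, only
# on trade / don't trade, i.e. whether the base signal can be trusted.
# Synthetic data and model choice are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                              # shared features
y_dir = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # direction markup

X_base, X_meta = X[:600], X[600:]           # disjoint sets to limit leakage
base = GradientBoostingClassifier().fit(X_base, y_dir[:600])

# Second markup: 1 where the base model was right on unseen data, else 0.
y_meta = (base.predict(X_meta) == y_dir[600:]).astype(int)
meta = GradientBoostingClassifier().fit(X_meta, y_meta)

# Inference: trade in the base direction only when the meta model allows.
direction = np.where(base.predict(X_meta) == 1, 1, -1)
signal = direction * meta.predict(X_meta)   # -1 sell, 0 stay out, 1 buy
print(np.bincount(signal + 1))              # counts of sell / flat / buy
```

A walk-forward split rather than a single holdout would be closer to real use; the point here is only the shape of the two markups.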
 
Maxim Dmitrievsky #:
There are 2 models working at once. The second one is trained simply to trade / not to trade. I can upload it tomorrow, though I don't know what for :)

Still relevant. I'm really interested, as my conclusions are quite different; perhaps the sample is just that different.

 
Aleksey Vyazmikin #:

Still relevant. I'm really interested, as my conclusions are quite different; perhaps the sample is just that different.

A separate file for each cluster.
Files:
output_csvs.zip  5242 kb
 

I managed to reduce entropy (logloss) through the markup by adding some "rules", that is, by combining ML and the TS at the logic level.

For example, with random partitioning, Accuracy was normal, but logloss left much to be desired:

{'learn': {'Accuracy': 0.8438783894823336, 'Logloss': 0.4787490774779375}, 'validation': {'Accuracy': 0.7420178799489144, 'Logloss': 0.5603823600397243}}

And with the new markup it comes out like this:

{'learn': {'Accuracy': 0.9840909090909091, 'Logloss': 0.12419709401710959}, 'validation': {'Accuracy': 0.9470899470899471, 'Logloss': 0.2028722652115128}}

I'm really happy; it's a real improvement. It was not for nothing that I brought up entropy.

{'learn': {'Accuracy': 0.9907674552798615, 'Logloss': 0.09702284179278793}, 'validation': {'Accuracy': 0.955585464333782, 'Logloss': 0.15982284254600834}}

:)
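The Accuracy/Logloss pairs above can be reproduced for any classifier's predicted probabilities with standard metrics. A minimal sketch, assuming scikit-learn and toy numbers (not the author's data):

```python
# Computing the two metrics quoted above from predicted probabilities.
# Toy values; they only illustrate how a decent Accuracy can coexist
# with a mediocre Logloss when probabilities are poorly calibrated.
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([1, 0, 1, 1, 0, 1])
p = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.45])   # predicted P(class = 1)

acc = accuracy_score(y_true, (p >= 0.5).astype(int))
ll = log_loss(y_true, p)    # cross-entropy; lower is better
print({'Accuracy': acc, 'Logloss': ll})
```

Accuracy only looks at the thresholded decision, while Logloss penalizes every probability that sits close to 0.5, which is why improving the markup can move Logloss much more than Accuracy.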

 
Maxim Dmitrievsky #:

I managed to reduce entropy (logloss) through the markup by adding some "rules", that is, by combining ML and the TS at the logic level.

For example, with random partitioning, Accuracy was normal, but logloss left much to be desired.

And with the new markup it comes out like this.

I'm really happy; it's a real improvement. It was not for nothing that I brought up entropy.

:)

Of course, it is possible to get such a result with random sampling. But by rough calculation you would need at least 10,000 restarts of the markup, given the sample length and the parameter ranges. And that is the minimum; judging by the probability of the same markup falling out, it is more like in the neighbourhood of a million.

That's why I wanted to find a fast way to check a markup, but doing it directly through entropy did not work, and checking through the model takes a long time.
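One cheap reading of "checking directly through entropy" is the Shannon entropy of the markup's label distribution, computed without training any model. The sketch below is an interpretation of that idea, not the author's code:

```python
# Shannon entropy (in nats) of a markup's empirical label distribution,
# as a fast model-free check. As noted above, in practice such a direct
# check may not correlate with the quality of the trained model.
import numpy as np

def label_entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

balanced = np.array([0, 1] * 500)            # 50/50 markup
skewed = np.array([0] * 900 + [1] * 100)     # 90/10 markup
print(label_entropy(balanced))               # ln(2) ≈ 0.693
print(label_entropy(skewed))                 # ≈ 0.325
```

Label entropy only measures class balance, not how learnable the markup is, which may be why the direct check fails while the slower check through a trained model works.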