Machine learning in trading: theory, models, practice and algo-trading - page 2595

 
elibrarius #:

Then it turns out you need to train on the shortest possible window, so that after the pattern changes, the model picks up the new pattern sooner.

For example, with a 12-month training window, 6 months after the pattern changes the training data is a 50/50 mix of the old and new patterns. Only after about a year will the model be trained on, and trading, the new pattern alone. That is, for almost a whole year the model was trading on an outdated pattern and was most likely losing.

If you train on 1 month, the model will learn to work correctly again within a month.

It would be nice to train on 1 week... but that is not enough data.

It's not at all necessary to go to a short window, I'm sure. There won't be enough data for the models, and there is a risk of overfitting to the current market state. The concept of adaptation looks good, but because of the lag (while the data accumulates, the state may already have changed) it is unlikely to be a grail. You can try several models at once: one responsible for longer-term patterns, another (or others) for shorter-term, current ones, with the final decision being a function of all these models' outputs.
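A minimal sketch of the several-models idea, assuming each model emits a signal in [-1, 1] and the final decision is a thresholded weighted average. The function name, the weights and the threshold are illustrative assumptions, not something specified in this thread.

```python
def combine_signals(signals, weights, threshold=0.2):
    """Combine per-model signals in [-1, 1] into one trade decision.

    signals, weights: equal-length sequences, e.g. one entry for a
    long-horizon model and one for a short-horizon model (hypothetical).
    Returns +1 (buy), -1 (sell) or 0 (stay flat).
    """
    if len(signals) != len(weights) or not signals:
        raise ValueError("signals and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted average of the individual model signals.
    score = sum(s * w for s, w in zip(signals, weights)) / total
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0
```

When the long- and short-horizon models disagree strongly, the average lands near zero and the sketch stays flat, which is one simple way to make the decision "a function of all the models".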

 
Aleksey Nikolayev #:
There are more interesting questions about using ML in trading. For example, an algorithm for determining which interval of history to take for training. Perhaps it can be set by some meta-parameters that are optimized by cross-validation. I'll have to read Prado.)

Walk-forward is probably better: it always has the OOS after the training set. In CV only the first pass is like that; the rest use data both before and after the OOS for training.
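The difference between the two schemes can be sketched with plain index ranges. This is an illustration only: both function names are hypothetical, and the k-fold variant is the unshuffled one.

```python
def walk_forward_splits(n_rows, train_size, test_size):
    """Yield (train, test) index ranges; the test block always follows train."""
    start = 0
    while start + train_size + test_size <= n_rows:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide the whole window forward by one test block


def kfold_splits(n_rows, n_folds):
    """Yield (train, test) indices for unshuffled k-fold cross-validation."""
    fold = n_rows // n_folds
    for k in range(n_folds):
        lo = k * fold
        hi = n_rows if k == n_folds - 1 else lo + fold
        test = range(lo, hi)
        # Everything outside the test block is used for training,
        # including rows that come later in time than the test block.
        train = [i for i in range(n_rows) if i < lo or i >= hi]
        yield train, test
```

With 10 rows, `walk_forward_splits(10, 4, 2)` gives three splits, each with the test block strictly after the training rows. `kfold_splits(10, 5)` gives five splits, and in exactly one of them all training rows precede the test block; the others also train on rows from the future.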

 
Replikant_mih #:

It's not at all necessary to go to a short window, I'm sure. There won't be enough data for the models, and there is a risk of overfitting to the current market state. The concept of adaptation looks good, but because of the lag (while the data accumulates, the state may already have changed) it is unlikely to be a grail. You can try several models at once: one responsible for longer-term patterns, another (or others) for shorter-term, current ones, with the final decision being a function of all these models' outputs.

According to recent experiments, on 5000 rows of M5 (about 2 months) there is something interesting. At 3000 it is already bad. But this is for the specific features + target I took. Maybe some other set of features and target will let the model work after training on a short window. We have to experiment...
 
elibrarius #:
According to recent experiments, on 5000 rows of M5 (about 2 months) there is something interesting. At 3000 it is already bad. But this is for the specific features + target I took. Maybe some other set of features and target will let the model work after training on a short window. We have to experiment...

It depends on the number of features. I like more features, and then 5000 is usually not enough; with up to 5 features, 5000 would probably be fine.

 
elibrarius #:

Then it turns out you need to train on the shortest possible window, so that after the pattern changes, the model picks up the new pattern sooner.

For example, with a 12-month training window, 6 months after the pattern changes the training data is a 50/50 mix of the old and new patterns. Only after about a year will the model be trained on, and trading, the new pattern alone. That is, for almost a whole year the model was trading on an outdated pattern and was most likely losing.

If you train on 1 month, the model will learn to work correctly again within a month.

It would be nice to train on 1 week... but that is not enough data.

Then it may under-learn and give less profit... at that point it is probably a matter of selection. For short samples, sampling from the current distributions can sometimes help, like in the articles.
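The lag arithmetic in the quoted post can be written out explicitly. This sketch assumes a single abrupt pattern change and a rolling training window that always ends at the present; both are simplifications, and the function name is hypothetical.

```python
def new_regime_share(window_months, months_since_change):
    """Fraction of a rolling training window that comes from the new pattern."""
    if window_months <= 0:
        raise ValueError("window_months must be positive")
    # The window cannot contain more new-pattern data than its own length.
    covered = min(max(months_since_change, 0), window_months)
    return covered / window_months
```

For a 12-month window, `new_regime_share(12, 6)` is 0.5, the 50/50 mix from the example, and the share only reaches 1.0 after 12 months. For a 1-month window it reaches 1.0 after a single month, at the cost of far fewer training rows.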
 
Replikant_mih #:

About the noise, yes. Although I hadn't thought of it in terms of taking sections of history with and without noise. By the way, how do you determine that before training the model? Iteratively? Train on the whole range, see where it performs best, keep only those sections, and retrain on them alone? That raises a second question which, pending experimental confirmation, may be called philosophical: is it better for the model to see all kinds of sections at once, including noisy ones, and so train on noisier data on average, or to learn from cleaner data but never see the noisy data?


What's wrong with giant sizes? Besides the increased computation time?

Well, for example, train on the most recent history with different training lengths and compare how the models perform on earlier history. If they all stop working at about the same point, that is the "horizon of applicability in the past", given the most recent history. How it will be in the future is unknown, but you can define a criterion for stopping the bot: for example, it has started trading worse than it did on history.
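One possible form of such a stopping criterion, as a rough sketch. The rule itself (compare the live mean return per trade with a fraction of the backtest mean, after a minimum number of trades) and every parameter name are assumptions made for illustration; the thread does not specify one.

```python
from statistics import mean


def should_stop(live_returns, backtest_mean, min_trades=30, tolerance=0.5):
    """Return True when live results fall well below the backtest.

    Hypothetical rule: once at least `min_trades` live trades exist, stop if
    the live mean return per trade is below `tolerance * backtest_mean`.
    Assumes the backtest was profitable (backtest_mean > 0).
    """
    if backtest_mean <= 0:
        raise ValueError("backtest_mean must be positive for this rule")
    if len(live_returns) < min_trades:
        return False  # too few trades to judge
    return mean(live_returns) < tolerance * backtest_mean
```

The minimum trade count is there so that a few early losing trades do not switch the bot off before there is enough evidence.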

On a long sample, a general pattern that shows up on subsamples can disappear. If the regularities from the subsamples contradict each other, then we can only learn from noise, which is what successfully happens in most cases :)
 
elibrarius #:

Walk-forward is probably better, because it always has the OOS after the training set. In CV only the first pass is like that; the rest use data both before and after the OOS for training.

There is a special time-series CV, in CatBoost for example. But then the dataset cannot be shuffled. And if you don't shuffle, it's like fitting to shifting sections of equal length. If you do shuffle, the training is geared more toward local signals that, roughly speaking, don't depend on changing trends. Who knows which is better :)
 
Maxim Dmitrievsky #:
And if you don't shuffle, it's like fitting to shifting sections of equal length.
I'm going to trade that way, too. The length will be one week: a week of trading, then a weekend of training. Walk-forward does the same thing.
 
Foolishness
 
elibrarius #:

Walk-forward is probably better: it always has the OOS after the training set. In CV only the first pass is like that; the rest use data both before and after the OOS for training.

I agree with you if we are answering the question "How do we trade the next period?" If we are answering the question "Is there a pattern in the given period of history?", then CV is quite applicable.
