Discussion of article "CatBoost machine learning algorithm from Yandex with no Python or R knowledge required" - page 2

 
Andrey Dibrov:

I paid attention to the length of the test period. But a stable positive result holds only over a short period adjacent to the training period - a month or two. Say we train on a two-year history. Test on the next month. Save the result. Shift (or extend) the history by that month and retrain. Test on the following month. Save the result. And so on.

Is this a small period?

I understand your idea - I thought about it myself and even made a script - but the training will be blind and on little data; it is doubtful you can get anything out of it.
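The retraining scheme described above (train on two years, test on the next month, shift the window by a month, retrain) can be sketched as a generator of index windows over monthly data blocks. This is a minimal sketch - `walk_forward_windows` and its parameters are illustrative, not from the article; the model itself (CatBoost or anything else) would be fitted inside the loop at each step:

```python
# Sketch of the walk-forward scheme: train on a fixed history,
# test on the next month, shift the whole window forward, retrain.
# Months are represented as integer indices into monthly data blocks.

def walk_forward_windows(n_months, train_len, step=1):
    """Yield (train_range, test_range) index pairs.

    train_len -- length of the training window in months (e.g. 24)
    step      -- how far the window shifts each iteration (e.g. 1 month)
    """
    start = 0
    while start + train_len + step <= n_months:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + step)
        yield train, test
        start += step

# With 27 months of data and a 24-month window, this produces three
# train/test splits, each shifted forward by one month.
for train, test in walk_forward_windows(n_months=27, train_len=24):
    print(f"train months {train.start}-{train.stop - 1}, test month {test.start}")
```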

 
Aleksey Vyazmikin:

Is this a small period?

I understand your idea - I thought about it myself and even made a script - but the training will be blind and on little data; it is doubtful you can get anything out of it.

I'll have to test it - a sliding window means always-fresh data).

 
Valeriy Yastremskiy:

I'll have to test it - a sliding window means always-fresh data)

Who needs it? Can you determine, by any metric, that the market has changed - and changed in a way it never has before? If you can, and such an event has occurred, then yes - you need to train a new model taking the new data into account. The smaller the interval you take, the more you fit to the data, because no general regularities will be revealed.

For "luck", yes, you can do it - the script will now cut the sample, and we'll see what happens if you train on a 12-month window every month.

 
Aleksey Vyazmikin:

Who needs it? Can you determine, by any metric, that the market has changed - and changed in a way it never has before? If you can, and such an event has occurred, then yes - you need to train a new model taking the new data into account. The smaller the interval you take, the more you fit to the data, because no general regularities will be revealed.

For "luck", yes, you can do it - the script will now cut the sample, and we'll see what happens if you train on a 12-month window every month.

Me))))) I'm just trying, by hand, to describe at least some distinctly different states of the time series. I can't say it's easy) And the sliding window helps with exactly that. Of course, there is the question of its width, but screening out outliers within the window is, in my opinion, more effective than doing it with filters. Though I may be wrong)

 

Here's the sample from the article.

I took 2 years for training and retrained every new month.

I trained 400 trees - the settings are the same for all models.

And EURUSD - here I trained on one year of history, also retraining every month.


 
No, I made a mistake above - the sample in the article is different (it's in the archive) - I'll redo it now.
 
Aleksey Vyazmikin:
No, I made a mistake above - the sample in the article is different (it's in the archive) - I'll redo it now.


This is the correct version.

Look at Recall - you can see that the models lack knowledge of the market; in other words, the market is more variable than the information that falls into the window - especially closer to the present day.
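For reference, Recall = TP / (TP + FN): the share of actual positive cases the model recovers, so a low Recall on out-of-sample months is what reads here as the model "lacking knowledge" of the market. A minimal sketch of the metric - the `recall` helper and the example labels are illustrative, not from the article:

```python
# Recall measures how many of the true positive cases the model found.
# Labels are 1 (positive class, e.g. "enter a trade") and 0 (negative).

def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical example: three real positives, only one caught.
r = recall([1, 1, 1, 0], [1, 0, 0, 0])
print(f"Recall = {r:.2f}")  # 1 of 3 positives found
```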

Valeriy Yastremskiy:

Me))))) I'm just trying, by hand, to describe at least some distinctly different states of the time series. I can't say it's easy) And the sliding window helps with exactly that. Of course, there is the question of its width, but screening out outliers within the window is, in my opinion, more effective than doing it with filters. Though I may be wrong)

Above I showed what came out when taking a 12-month window.

Regarding outliers: if the model is tree-based and also uses quantisation, then on the contrary - the more information you give it, the less you will be affected by outliers, because statistically they will be few.
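A minimal illustration of the point about quantisation - plain Python, not CatBoost's actual implementation; the `quantise` helper and the borders are hypothetical. Once a feature is mapped to border-based bins, an extreme value lands in the same edge bin as a merely large one, so a tree split can no longer react to its magnitude:

```python
# Sketch of why quantisation dampens outliers: values are replaced by
# the index of the bin they fall into, so "large" and "absurdly large"
# become indistinguishable to any subsequent tree split.

def quantise(values, borders):
    """Map each value to its bin index given sorted split borders."""
    return [sum(1 for b in borders if v > b) for v in values]

borders = [0.0, 1.0, 2.0]            # hypothetical split borders
print(quantise([2.5], borders))       # a large but ordinary value
print(quantise([1e9], borders))       # an extreme outlier: same top bin
```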

 
Aleksey Vyazmikin:


This is the correct version.

Look at Recall - you can see that the models lack knowledge of the market; in other words, the market is more variable than the information that falls into the window - especially closer to the present day.

Above I showed what came out when taking a 12-month window.

Regarding outliers: if the model is tree-based and also uses quantisation, then on the contrary - the more information you give it, the less you will be affected by outliers, because statistically they will be few.

The width of the window is very important for the training result, depending on the state of the series. And the width has an optimum: too large a sliding-window period is as harmful as too small a one.

 
Valeriy Yastremskiy:

The width of the window is very important for the training result, depending on the state of the series. And the width has an optimum: too large a sliding-window period is as harmful as too small a one.

Let's move from abstractions to numbers. How effective can a small window be?

The point is that you suggest chasing market conditions, while I suggest using knowledge about different market conditions. The more that knowledge is backed by history, the more slowly the patterns built on it will change.

And then, how do you pick hyperparameters on a small sample - at the very least, how many training iterations? I set the same values everywhere.
 
Ah - try repeating the same experiment, only adding another month or two of history to the training sample, and then compare the two tests: whether the neural network remains stable, and what influence the more recent price movements have on this model...