Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 2

Aleksey Vyazmikin:

Interesting article.

I get the feeling that with this trick of random labelling and pseudo-sample generation we simply find dependencies from the training period that also turn out to be significant on the test.

What percentage of models fail the test?

It would be interesting to add a third sample: train on the first one, select the well-performing results on the test, and then check the selection result on the exam sample.

But how can we find similar dependencies if the market is random? We can't, except by mixing past data into the training, and nothing is mixed in here. You can add even a tenth sample, or you can test it in MT5 on new data.
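In code, the three-sample idea mentioned above might look something like the sketch below (not the article's code): train on the oldest part, select models on the test part, and check the selection once on an "exam" part. The synthetic data, column names and split fractions are illustrative assumptions.

    import numpy as np
    import pandas as pd
    from catboost import CatBoostClassifier

    # Synthetic placeholder data: 1000 time-ordered observations, 5 features.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f'f{i}' for i in range(5)])
    df['label'] = (df['f0'] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

    def chrono_split(data, train_frac=0.6, test_frac=0.2):
        """Split a time-ordered dataframe into train / test / exam parts."""
        n = len(data)
        i1, i2 = int(n * train_frac), int(n * (train_frac + test_frac))
        return data.iloc[:i1], data.iloc[i1:i2], data.iloc[i2:]

    train, test, exam = chrono_split(df)
    X = [c for c in df.columns if c != 'label']

    model = CatBoostClassifier(iterations=200, verbose=False)
    model.fit(train[X], train['label'], eval_set=(test[X], test['label']))

    # Models are selected by their score on `test`; `exam` is touched only once,
    # at the very end, to check whether the selection itself holds up.
    print('test accuracy:', model.score(test[X], test['label']))
    print('exam accuracy:', model.score(exam[X], exam['label']))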
Stanislav Korotky:
The main questionable point is training on the latest data and testing on older data. This is somewhat analogous to peeking into the future: the latest models incorporate something from earlier ones (market participants have memory, after all), whereas in the opposite direction the future is harder to predict. I suspect that if you rerun the algorithm the canonical way (training on old data, testing on new data, which is closer to reality), the result will not be as good.
There is no difference, you can check. I just like it better this way.
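This is easy to check empirically: fit the same model twice with the roles of the older and newer halves swapped and compare the out-of-sample score. A sketch using the same hypothetical time-ordered dataframe df (with a 'label' column) from the split example above:

    from catboost import CatBoostClassifier

    half = len(df) // 2
    old, new = df.iloc[:half], df.iloc[half:]
    X = [c for c in df.columns if c != 'label']

    def fit_and_score(train_part, test_part):
        """Train on one period and report accuracy on the other."""
        m = CatBoostClassifier(iterations=200, verbose=False)
        m.fit(train_part[X], train_part['label'])
        return m.score(test_part[X], test_part['label'])

    print('canonical (train on old, test on new):', fit_and_score(old, new))
    print('reversed  (train on new, test on old):', fit_and_score(new, old))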
 
Valeriy Yastremskiy:

It depends on what is considered a regularity. If it is the order in which increments follow one another, tied to time, then it is a seasonal regularity in the behaviour of the increments; if it is not tied to time, then it is the same sequence of increments with some tolerance in accuracy.

And it depends on what is considered a fit. If the series are known to be identical, then it is a fit; but the purpose of the test (no matter from which side) is to check the result on non-identical sections.

Training on the recent period is logical, but it comes to the same thing: if we test deep in the history, the result should be the same as when we train deep in the history and test on the recent period.

We are only confirming the hypothesis that there are regularities in both the test and the training sections.

Fitting, to me, is when a predictor (a leaf or its analogue) classifies a small number of cases, less than 1% of the observations - that is how I explain what fitting means to me.
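That 1% criterion can be made concrete for a CatBoost model by counting how many leaves cover less than 1% of the training observations. A rough sketch, assuming a fitted `model`, training data `X_train` / `y_train`, and a catboost version that provides calc_leaf_indexes:

    import numpy as np
    from catboost import Pool

    # Assumptions: `model` is a fitted CatBoost model, X_train / y_train are the
    # data it was trained on.
    train_pool = Pool(X_train, y_train)
    leaf_idx = model.calc_leaf_indexes(train_pool)   # shape: (n_samples, n_trees)
    n_samples, n_trees = leaf_idx.shape

    thin, total = 0, 0
    for t in range(n_trees):
        # Per tree, count the leaves that received fewer than 1% of the samples
        # (leaves that received no samples at all are not visible here).
        _, counts = np.unique(leaf_idx[:, t], return_counts=True)
        total += len(counts)
        thin += int(np.sum(counts < 0.01 * n_samples))

    print('share of leaves covering <1% of observations:', thin / total)

Comparing this share across models is one way to turn the "is it just memorising?" question into a number.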

 
Maxim Dmitrievsky:
But how can we find similar dependencies if the market is random? We can't, except by mixing past data into the training, and nothing is mixed in here. You can add even a tenth sample, or you can test it in MT5 on new data.

I understand that nothing is mixed in. I don't know Python, but as I understand it, the model is evaluated from 2015 to 2020, right?

I'm more concerned with the validity of the evaluation criterion: how much it can help to select a model that will keep working outside the test sample that was used to select it.

Aleksey Vyazmikin:

I understand that nothing is mixed in. I don't know Python, but as I understand it, the model is evaluated from 2015 to 2020, right?

I'm more concerned with the validity of the evaluation criterion: how much it can help to select a model that will keep working outside the test sample that was used to select it.

Everyone is free to evaluate as they wish. I think the approach in the article is quite normal. If there are any other supergalactic testing techniques, please let me know.

Without Python, unfortunately, there is almost no machine learning... You'll have to learn it sooner or later, it's very simple )

 
Maxim Dmitrievsky:

Everyone is free to evaluate as they wish. I think the approach in the article is quite normal. If there are any other supergalactic testing techniques, please let me know.

The approach in the article is interesting, no argument here.

And we will invent supergalactic technologies :)

I think one could look at the number of predictors with significance up to, say, 1% and compare this figure across models: where the number is smaller, the probability of the model working is higher, since it has generalised more information - that is the direction to think in.
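A minimal sketch of that comparison, assuming `models` is a hypothetical list of fitted CatBoost models; with the default importance type the values are, as far as I know, normalised to sum to 100, so "up to 1%" maps to a threshold of 1.0:

    import numpy as np

    def share_of_weak_predictors(model, threshold=1.0):
        """Fraction of features whose importance is below `threshold` percent."""
        imp = np.array(model.get_feature_importance())
        return float(np.mean(imp < threshold))

    # `models` is a hypothetical list of fitted CatBoost models to compare.
    for i, m in enumerate(models):
        print(f'model {i}: share of weak predictors = {share_of_weak_predictors(m):.2%}')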

 
Aleksey Vyazmikin:

I understand that nothing is mixed in. I don't know Python, but as I understand it, the model is evaluated from 2015 to 2020, right?

I'm more concerned with the validity of the evaluation criterion: how much it can help to select a model that will keep working outside the test sample that was used to select it.

To the extent that the series remain similar. There is a probability that the behaviour of the series outside the test sample will differ so much that the regularities found will disappear, but it is finite and small over a short time interval.

And it cannot help.

 
Valeriy Yastremskiy:

To the extent that the series remain similar. There is a probability that the behaviour of the series outside the test sample will differ so much that the regularities found will disappear, but it is finite and small over a short time interval.

And it cannot help.

That is why I am missing statistical information. Say we trained 1000 models and 5% of them showed a good profit since 2015; at the same time we would also need to assess how similar the models are to each other, which is more difficult but more informative.

Aleksey Vyazmikin:

That is why I am missing statistical information. Say we trained 1000 models and 5% of them showed a good profit since 2015; at the same time we would also need to assess how similar the models are to each other, which is more difficult but more informative.

You can't put everything into the article. If that's what you mean, then yes. If the conditions are chosen well, the brute-force loop yields a lot of good models and few bad ones; it's just a matter of picking the best one. So it's not just one random model.

The article shows 2 models out of a training loop of 20 or 50 models (I don't remember exactly) that pass the test, and there are actually more profitable ones.
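A minimal sketch of such a brute-force loop (not the article's code): train a batch of models that differ only by random seed, rank them on the test period, and, per the earlier point about model similarity, look at how correlated the best ones are. The split (`train`, `test`, `X`) follows the earlier hypothetical sketch.

    import numpy as np
    from catboost import CatBoostClassifier

    models, scores = [], []
    for seed in range(50):                       # 50 candidate models
        m = CatBoostClassifier(iterations=200, random_seed=seed, verbose=False)
        m.fit(train[X], train['label'])
        models.append(m)
        scores.append(m.score(test[X], test['label']))

    order = np.argsort(scores)[::-1]             # best models first
    best = order[:5]
    print('best seeds on test:', best.tolist())
    print('their test scores :', [round(scores[i], 3) for i in best])

    # Similarity of the selected models to each other: correlation of their
    # predicted probabilities on the test period.
    preds = np.array([models[i].predict_proba(test[X])[:, 1] for i in best])
    print('pairwise prediction correlations:\n', np.corrcoef(preds))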

 
Maxim Dmitrievsky
Can you drop a link to a Jupyter Notebook with this source code in Colab?