Discussion of article "Gradient Boosting (CatBoost) in the development of trading systems. A naive approach" - page 4

 
Valeriy Yastremskiy:

Even if they overlap, it's still a fairly complex topic, so different explanations of it will be on point.))))

And few questions are asked - even here, when Maxim showed the trick with partial memory loss :)

 
Rorschach:

That's funny, I thought that with the expectation so low it's a tester grail. I ran it on Saber's data, on a custom symbol - almost the same result.

I checked 2017 - there's a similar uptrend there, and it loses.

Is it such a lucky piece of history, or can you get the same picture for past years too? I know it loses on the test, but the market was completely different there.

I checked at 4 standard deviations - the result is significant. The funny thing is, I've never seen a Sharpe ratio over 3; does such a thing even exist?
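For reference, a minimal sketch of the two numbers being discussed - a t-statistic in standard errors (the "4 standard deviations") and an annualized Sharpe ratio. The returns array is a hypothetical placeholder, not data from the article:

```python
import numpy as np

# hypothetical daily strategy returns (placeholder, not the article's data)
rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.01, 1000)

# t-statistic: how many standard errors the mean return is from zero;
# a value around 4 matches the "4 standard deviations" mentioned above
t_stat = returns.mean() / (returns.std(ddof=1) / np.sqrt(len(returns)))

# annualized Sharpe ratio, assuming daily returns and 252 trading days
sharpe = returns.mean() / returns.std(ddof=1) * np.sqrt(252)

print(f"t-stat: {t_stat:.2f}, Sharpe: {sharpe:.2f}")
```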

I'm not sure I understand the question. Is it a good piece of history for training? It's like that on any of them.

There's no problem with that; it's generalisation to new data that's the problem.

About Saber's data - as far as I understand, not every currency pair is suitable, and he does optimisation, i.e. he iterates over models.

Purely in theory... if you randomly sample and retrain for a long time, you can find a good model. In practice I got about 2x OOS, i.e. on new data it worked for about as long as the train/valid period, in terms of time. Sometimes a little more.

It's best to do this somewhere in the cloud; a laptop can't handle it.

 
Maxim Dmitrievsky:

I'm not sure I understand the question. Is it a good piece of history for training? It's like that on any of them.

There's no problem with that; it's generalisation to new data that's the problem.

About Saber's data - as far as I understand, not every currency pair is suitable, and he does optimisation, i.e. he iterates over models.

Purely in theory... if you randomly sample and retrain for a long time, you can find a good model. In practice I got about 2x OOS, i.e. on new data it worked for about as long as the train/valid period, in terms of time. Sometimes a little more.

It's best to do this somewhere in the cloud; a laptop can't handle it.

What I don't understand in the terms: it randomises the TS parameters, makes runs, and tries to find the region of the best parameter sets by the TS result. That's optimisation. There is no model there. Models are in neural networks, in machine learning.

 
Valeriy Yastremskiy:

What I don't understand in the terms: it randomises the TS parameters, makes runs, and tries to find the region of the best parameter sets by the TS result. That's optimisation. There is no model there. Models are in neural networks, in machine learning.

A TS with a set of parameters is a model.

 
Aleksey Vyazmikin:

And few questions are asked - even here, when Maxim showed a trick with partial memory loss :)

What kind of memory loss?

 
elibrarius:

What kind of memory loss?

Here we create a memory of past movements with a binding to the label:

The final step is to create additional columns with rows shifted to look_back depth, i.e. adding extra (lagged) features to the model.

Then the shuffling:

Let's split the data into two datasets of equal length, having first randomly shuffled the training examples.

Assuming the shuffling is uniform, this means that in training we obtained column information, on half of the sample, about past and present returns. On a relatively short period, where volatility can be fitted this way, it works thanks to this knowledge of the market, but as soon as the market changes significantly, the model stops working. To me it looked like a pure memory effect rather than the identification of a general pattern. Maxim, correct me if you see it differently.
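For readers without the article at hand, here is a rough sketch of the two steps being quoted - lagged columns at look_back depth, then a shuffled 50/50 split. Names and data are illustrative, not the article's exact code:

```python
import numpy as np
import pandas as pd

# placeholder price series standing in for the article's quotes
rng = np.random.default_rng(42)
prices = pd.Series(100 + np.cumsum(rng.standard_normal(500)))
df = pd.DataFrame({'returns': prices.pct_change()})

# step 1: lagged columns - each row gets the previous look_back returns
look_back = 15
for i in range(1, look_back + 1):
    df[f'lag_{i}'] = df['returns'].shift(i)
df = df.dropna()

# step 2: shuffle the examples, then split into two equal datasets
df = df.sample(frac=1.0, random_state=42)
half = len(df) // 2
train, valid = df.iloc[:half], df.iloc[half:]
```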

 
Aleksey Vyazmikin:

Here we create a memory of past movements with a binding to the label:

Then the shuffling:

Assuming the shuffling is uniform, this means that in training we obtained column information, on half of the sample, about past and present returns. On a relatively short period, where volatility can be fitted this way, it works thanks to this knowledge of the market, but as soon as the market changes significantly, the model stops working. To me it looked like a pure memory effect rather than the identification of a general pattern. Maxim, correct me if you see it differently.

If you look at the features themselves, they have serial correlation (autocorrelation); if you look at the labels, the same is true. Serial correlation leads to incorrect model estimation, incorrect training. A crude example (or maybe not so crude) is overfitting to volatility, yes. Shuffling is a primitive way to break up the seriality a bit, and shuffling train and test together balances the data a bit across both sets. This issue needs to be dealt with more seriously, not in such a primitive way, which is what I wanted to dedicate the next article to, as it is a separate, rather large topic.
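The serial correlation is easy to see directly; a minimal check on a hypothetical smoothed feature and labels derived from it (not the article's data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# a rolling-mean feature is strongly autocorrelated by construction
feature = pd.Series(rng.standard_normal(1000)).rolling(20).mean().dropna()
# labels built from the feature inherit part of that seriality
labels = (feature.diff() > 0).astype(float)

for lag in (1, 5, 10):
    print(lag, round(feature.autocorr(lag), 3), round(labels.autocorr(lag), 3))
```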
 
Maxim Dmitrievsky:
If you look at the features themselves, they have serial correlation (autocorrelation); if you look at the labels, the same is true. Serial correlation leads to incorrect model estimation, incorrect training. A crude example (or maybe not so crude) is overfitting to volatility, yes. Shuffling is a primitive way to break up the seriality a bit, and shuffling train and test together balances the data a bit across both sets. This issue needs to be dealt with more seriously, not in such a primitive way, which is what I wanted to dedicate the next article to, as it is a separate, rather large topic.

It would be an interesting article if it resolves the question of whether samples may be shuffled together at all, based on their similarity.

As far as I understand, if the samples are similar, it is permissible, but if they differ significantly, it is not. In our case we are working with a changing market, so the question of whether shuffling is permissible comes down to the time interval... I would like to see a specific numerical criterion for assessing the similarity of two samples, together with a test of the theory about the admissibility of shuffling them. Food for thought.
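One off-the-shelf candidate for such a numerical criterion (a suggestion, not something proposed in the thread) is a two-sample Kolmogorov-Smirnov test on the returns of the two periods:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
old_period = rng.normal(0.0, 0.010, 2000)  # placeholder "calm market" returns
new_period = rng.normal(0.0, 0.015, 2000)  # placeholder "volatile market" returns

stat, p_value = ks_2samp(old_period, new_period)
# a small p-value says the two distributions differ,
# so shuffling the samples together is questionable
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.2e}")
```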

 
Aleksey Vyazmikin:

Here we create a memory of past movements with a binding to the label:

Then the shuffling:

Assuming the shuffling is uniform, this means that in training we obtained column information, on half of the sample, about past and present returns. On a relatively short period, where volatility can be fitted this way, it works thanks to this knowledge of the market, but as soon as the market changes significantly, the model stops working. To me it looked like a pure memory effect rather than the identification of a general pattern. Maxim, correct me if you see it differently.

It is just N increments inside a sliding window.
Maxim Dmitrievsky:
If you look at the features themselves, they have serial correlation (autocorrelation); if you look at the labels, the same is true. Serial correlation leads to incorrect model estimation, incorrect training. A crude example (or maybe not so crude) is overfitting to volatility, yes. Shuffling is a primitive way to break up the seriality a bit, and shuffling train and test together balances the data a bit across both sets. This issue needs to be dealt with more seriously, not in such a primitive way, which is what I wanted to dedicate the next article to, as it is a separate, rather large topic.
Shuffling the train set does not change the tree model in any way. The tree sorts each column, and the result of sorting shuffled and unshuffled data is the same.
Shuffling train with test is unnecessary, in my opinion, as I wrote above.
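The claim is easy to verify; a quick sketch on hypothetical data (with sklearn's deterministic 'best' splitter, row order only matters for degenerate ties):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

perm = rng.permutation(len(X))  # shuffle the rows
tree_plain = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_shuffled = DecisionTreeClassifier(random_state=0).fit(X[perm], y[perm])

# identical predictions on new data: the row shuffle changed nothing
X_new = rng.standard_normal((200, 5))
print(np.array_equal(tree_plain.predict(X_new), tree_shuffled.predict(X_new)))
```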
 
elibrarius:
It is just N increments inside a sliding window. Shuffling the train set doesn't change the tree in any way.
I know.