Discussion of article "Gradient Boosting (CatBoost) in the development of trading systems. A naive approach" - page 4

Even if they overlap, it's still a fairly complex topic, so different explanations of it are welcome. ))))
And few questions are asked - even here, when Maxim showed the trick with partial memory loss :)
That's funny, I thought that with such a low expectancy it's a tester grail. I ran it with Saber on a custom symbol; the result was almost the same.
I checked 2017: there is a similar up-trend there, and the system loses money.
Is this just a lucky piece of history, or can you get a similar picture for past years too? I know it loses on the test, but the market was completely different there.
I checked at 4 standard deviations - the result is significant. The funny thing is, I've never seen a Sharpe ratio above 3. Does such a thing even exist?
I'm not sure I understand the question. Is it a good piece of training history? It's like that on any of them.
There's no problem with that; the problem is generalisation to new data.
About the Saber data - as far as I understand, not every currency pair is suitable, and he does optimisation, i.e. he iterates over models.
Purely in theory... if you randomly sample and retrain long enough, you can find a good model. In practice I got roughly 2x out-of-sample, i.e. on new data the model kept working for about as long (in time) as the train/validation period, sometimes a little longer.
It's best to do this somewhere in the cloud; a laptop can't handle it.
What I don't understand about the terminology: it randomises the TS parameters, makes runs, and tries to find the region of the best parameter sets by the TS result. That's optimisation. There is no model there. Models are in neural networks and machine learning.
A TS with a set of parameters is a model
What kind of memory loss?
Here we create a memory of past movements tied to the label:
The final step is to create additional columns with rows shifted by look_back depth, i.e. to add lagged features to the model.
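The lag-column step described above can be sketched roughly like this (my reconstruction with pandas, not the article's exact code; the column names, the toy series, and the look_back value are illustrative):

```python
# Sketch of the lagged-feature step: shift the returns column by
# 1..look_back and attach each shift as a new column.
import pandas as pd

look_back = 3  # assumed depth; the article's value may differ

# toy price series -> simple returns
prices = pd.Series([1.0, 1.1, 1.05, 1.2, 1.15, 1.3])
df = pd.DataFrame({"ret": prices.pct_change()})

for lag in range(1, look_back + 1):
    df[f"ret_lag{lag}"] = df["ret"].shift(lag)

df = df.dropna()  # drop rows without a full lag history
print(df.columns.tolist())  # ['ret', 'ret_lag1', 'ret_lag2', 'ret_lag3']
```

The key property for the discussion below is that each return value now appears both as the current-row `ret` and inside the lag columns of the following rows.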
Next, shuffling:
Assuming the shuffle is uniform, this means that during training the model received, via the lag columns, information about past and present returns for half of the sample. On a relatively short period, where volatility can be fitted this way, it works thanks to knowledge of that market, but as soon as the market changes significantly, the model stops working. To me this looked like a memory effect rather than the identification of a general pattern. Maxim, correct me if you see it differently.
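The leakage mechanism being described can be made concrete with a small sketch (my illustration, not code from the article; the data and column names are synthetic): with lagged features, a uniform shuffle before the train/test split puts rows into training whose lag columns contain the very values that appear as the current return of test rows.

```python
# Sketch: count how many test-set returns also appear inside some
# training row's lag columns after a uniform shuffle.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({"ret": rng.normal(size=200)})
for lag in (1, 2):
    df[f"ret_lag{lag}"] = df["ret"].shift(lag)
df = df.dropna().reset_index(drop=True)

train, test = train_test_split(df, test_size=0.5, shuffle=True, random_state=0)

# Test returns that the model already "saw" inside training lag columns:
train_lag_values = set(train[["ret_lag1", "ret_lag2"]].to_numpy().ravel())
overlap = sum(r in train_lag_values for r in test["ret"])
print(f"{overlap} of {len(test)} test returns also appear in train lag columns")
```

With two lags, each return sits in the lag columns of two neighbouring rows, so most test returns end up visible to training - which is exactly the "memory" effect rather than a general pattern.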
If you look at the features themselves, they show serial correlation (autocorrelation); the labels show the same thing. Serial correlation leads to incorrect model estimation and incorrect training. A crude example (or maybe not so crude) is overfitting to volatility, yes. Shuffling is a primitive way to break up the seriality a bit, and shuffling train and test together balances the data across both sets a little. This issue needs to be handled more seriously, not in such a primitive way - that's what I wanted to dedicate the next article to, since it is a separate, rather big topic.
It would be an interesting article if it answered the question of whether samples can be mixed at all based on their similarity.
As far as I understand, if the samples are similar it's possible, but if they differ significantly it isn't. In our case we're working with a changing market, so the admissibility of mixing is determined by the time interval... I'd like to see a specific numerical criterion for assessing the similarity of two samples, with a test of the theory of the admissibility of mixing them. Food for thought.
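One candidate for such a numerical criterion - my suggestion as food for thought, not something proposed in the thread - is the two-sample Kolmogorov-Smirnov test: it compares the distribution of a feature in two samples, and a tiny p-value means the regimes differ too much to mix safely. The regimes below are synthetic stand-ins for a calm and a volatile market period.

```python
# Sketch: KS test as a similarity criterion between two samples.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
calm = rng.normal(0.0, 1.0, size=1000)      # returns in a calm regime
volatile = rng.normal(0.0, 3.0, size=1000)  # same mean, tripled volatility
calm2 = rng.normal(0.0, 1.0, size=1000)     # another draw from the calm regime

same = ks_2samp(calm, calm2)
diff = ks_2samp(calm, volatile)
print(f"calm vs calm:     p = {same.pvalue:.3f}")
print(f"calm vs volatile: p = {diff.pvalue:.3g}")
```

Note the KS test only compares marginal distributions; it says nothing about serial dependence, so it could at best be one component of an admissibility criterion.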
Mixing train with test is not necessary in my opinion, as I wrote above.
They're simply N increments within a sliding window. Shuffling the training set doesn't change the tree in any way.
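The claim that row order doesn't affect the fitted tree can be checked directly; a sketch with a plain sklearn decision tree as a stand-in (CatBoost's ordered boosting does use row permutations internally, so this is only the basic-tree version of the claim):

```python
# Sketch: a decision tree's splits depend on the set of (feature, label)
# pairs, not their order, so a shuffled training set yields the same tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=300)

perm = rng.permutation(len(X))
tree_a = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree_b = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[perm], y[perm])

X_new = rng.normal(size=(50, 4))
print(bool(np.allclose(tree_a.predict(X_new), tree_b.predict(X_new))))  # True
```

So shuffling the training set alone changes nothing; the earlier objection was specifically about shuffling train together with test, which is a different operation.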