Machine learning in trading: theory, models, practice and algo-trading - page 163

1) If you look at the early publications by the author of the random forest algorithm, he seriously claimed that RF is not prone to overtraining at all and gave plenty of examples. The randomForest package itself is built so as to rule out even the slightest suspicion of overtraining.
Meanwhile, random forest is one of the most easily overtrained algorithms. I have been burned by it personally (a sketch of this effect follows below this post).
2) The vast majority of publications on machine learning are not tested on any analog of the second file. The reason is trivial: the algorithms are NOT applied to time series, so a random split of file number one is quite sufficient. And that really is the case, for example, in handwritten-text recognition.
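A minimal sketch of the effect described above (my own illustration, not the poster's code): a random forest fitted to pure noise scores almost perfectly in-sample while staying at chance level on a fresh sample drawn the same way, i.e. an analog of the "second file". All data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# "File one": random features, random labels -- there is nothing to learn.
X_train = rng.normal(size=(1000, 20))
y_train = rng.integers(0, 2, size=1000)

# Analog of the "second file": fresh noise drawn the same way.
X_test = rng.normal(size=(1000, 20))
y_test = rng.integers(0, 2, size=1000)

model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

print("in-sample accuracy:     %.2f" % model.score(X_train, y_train))  # close to 1.0
print("out-of-sample accuracy: %.2f" % model.score(X_test, y_test))    # close to 0.5
```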
1) Random forest, GBM, and every other method overtrains. It is hardly noticeable on well-behaved data and very noticeable on heavily noisy data.
2) They do exist: there are publications discussing the introduction of nested cross-validation on additional samples from a different time range.
If it's not too much trouble, could you share a link?
One of the discussions: http://stats.stackexchange.com/questions/65128/nested-cross-validation-for-model-selection
Ibid: https://stats.stackexchange.com/questions/103828/use-of-nested-cross-validation
There are links to articles in the discussions.
One interesting article: http://www.andrewng.org/portfolio/preventing-overfitting-of-cross-validation-data/
As you can see from the title, it is about the overtraining that happens at the stage of evaluating models on the validation folds of cross-validation. Accordingly, in addition to cross-validation, we need yet another sample on which to evaluate the already selected model.
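A minimal sketch of that idea on hypothetical data (the feature matrix, labels and parameter grid below are placeholders, not anything from this thread): hyperparameters are chosen by cross-validation on the earlier part of the series, and the already selected model is then checked once on a later, deferred slice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Hypothetical feature matrix and labels, assumed to be ordered by time.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = rng.integers(0, 2, size=2000)

split = 1500                          # everything after this index is "the future"
X_cv, y_cv = X[:split], y[:split]
X_holdout, y_holdout = X[split:], y[split:]

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 20]},
    cv=TimeSeriesSplit(n_splits=5),   # folds respect the time order
)
search.fit(X_cv, y_cv)

print("best CV score:         %.2f" % search.best_score_)
print("deferred-sample score: %.2f" % search.score(X_holdout, y_holdout))
```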
To put it briefly (I already wrote about this):
A model selected via cross-validation must be revalidated on another, time-delayed sample.
Nested cross-validation implies building n k-fold cross-validations (on different data), followed by validation on n deferred samples (each time on different data).
And even that is not all. If another round of selection is performed on top of the deferred samples, e.g. a committee of models is assembled based on those deferred samples, then the committee itself must be validated on yet another deferred sample.
Ideally, this process:
k-fold cross-validation
-------------------------------- repeated n times
------------------------------------------------------------- a committee formed on the resulting data
------------------------------------------------------------------------------------------------------------------------ the committee validated on one more sample from the future
needs to be repeated not once but m times, so that at the very top level you obtain a DISTRIBUTION of results rather than a single number. This lowers the bias to a practically achievable minimum.
But in doing so the expected value of, say, FS may drop many times over... Painful.
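A minimal sketch of this layered scheme on placeholder data, with arbitrary block sizes and thresholds (none of the numbers come from the thread): for each of n blocks a model is selected by k-fold cross-validation and checked on its own deferred slice, the surviving models form a committee, the committee is validated on one final slice from "the future", and the whole loop is repeated m times over shifted windows to obtain a distribution of the top-level score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 10))        # placeholder features, assumed time-ordered
y = rng.integers(0, 2, size=3000)      # placeholder labels

def committee_run(X, y, n_blocks=3, block=600, deferred=200, final=200):
    """One pass: n models selected by CV, each checked on a deferred slice,
    then the committee is scored on one final slice from the future."""
    members, pos = [], 0
    for _ in range(n_blocks):
        X_cv, y_cv = X[pos:pos + block], y[pos:pos + block]
        X_def, y_def = X[pos + block:pos + block + deferred], y[pos + block:pos + block + deferred]
        search = GridSearchCV(RandomForestClassifier(random_state=0),
                              {"max_depth": [3, 5]}, cv=KFold(n_splits=5))
        search.fit(X_cv, y_cv)
        # keep only models that also survive their own deferred slice
        if search.score(X_def, y_def) > 0.5:
            members.append(search.best_estimator_)
        pos += block + deferred
    if not members:
        return None
    X_fin, y_fin = X[pos:pos + final], y[pos:pos + final]
    votes = np.mean([m.predict(X_fin) for m in members], axis=0) > 0.5   # majority vote
    return float(np.mean(votes == y_fin))   # committee accuracy on the future slice

# m repetitions over shifted starting points -> a distribution of top-level scores
scores = [committee_run(X[s:], y[s:]) for s in (0, 100, 200)]
print([round(s, 2) for s in scores if s is not None])
```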
introducing nested cross-validation on additional samples from a different time range.
I do something similar. Say I have a year of data to train on. I train 12 models: one on the data for January, the second on the data for February, the third on March, and so on. I select predictors and model parameters so that any of these models, trained on a small part of the data, trades well over the whole year; that gives me some hope that the predictors used have stable correlations. A decision on new data is then made by this entire ensemble of models.
Of all the cross-validation methods I have tried, this one gave the best results on new data. But plenty of questions remain unresolved: how many models should there be (I could train a hundred instead of 12, but is there any point?), and how to evaluate the trading also matters; you can choose anything, rf or Sharpe among them, and you have to experiment to find the best one.
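A minimal sketch of that month-by-month scheme on a made-up dataset (the frame, column names and vote threshold are placeholders, not the poster's setup): one model is trained per calendar month, each is scored on the whole year as a rough stability check, and new data is scored by averaging the 12 votes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical dataset: a date-indexed frame of features plus a "target" column.
idx = pd.date_range("2015-01-01", "2015-12-31", freq="D")
rng = np.random.default_rng(3)
data = pd.DataFrame(rng.normal(size=(len(idx), 5)),
                    index=idx, columns=[f"f{i}" for i in range(5)])
data["target"] = rng.integers(0, 2, size=len(idx))

features = [c for c in data.columns if c != "target"]
models, full_year_scores = [], []

for month, chunk in data.groupby(data.index.month):
    m = RandomForestClassifier(n_estimators=200, random_state=0)
    m.fit(chunk[features], chunk["target"])      # train on one month only
    models.append(m)
    # how the January-only (February-only, ...) model does on the whole year
    full_year_scores.append(m.score(data[features], data["target"]))

print("per-month models, full-year accuracy:", np.round(full_year_scores, 2))

# Decision on new data: average the 12 votes and threshold.
new_X = data[features].tail(10)                  # stand-in for unseen data
ensemble_vote = np.mean([m.predict(new_X) for m in models], axis=0) > 0.5
print(ensemble_vote.astype(int))
```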