Machine learning in trading: theory, models, practice and algo-trading - page 2551

 
Maxim Dmitrievsky #:

What is the right way to use CV results afterwards?

I optimize the hyperparameters of the model (tree depth, number of examples in a leaf, number of trees, etc.) and of the dataset (number of rows; combinations of features are also possible).

I run all these variants and then select the best combination of model and data parameters by the best total result of walking forward. In my opinion cross-validation is worse, while walking forward is a copy of what will happen in reality: trade for a week, retrain, trade another week, retrain again, and so on.

take the best found model parameters and then train them on the whole dataset

Training on the whole dataset is illogical.
Whatever depth of history turned out best is the depth you should keep training on. My model trained N times on 50,000 rows of M5 (almost a year) may show 52% on the sum of all forwards. Train it with the same parameters but a different depth of history, say 70,000 or 30,000 rows, and the result drops below 50% on the sum of all forwards.

The reason is that the leaves will not contain the same examples. The trees may end up with more or fewer leaves, and so on. I think that for datasets of different sizes you have to change the depth or the number of examples in a leaf.
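A minimal sketch of the rolling walk-forward scheme described above. Everything in it is an assumption for illustration: X and y are time-ordered feature/target arrays, the classifier is a placeholder, and the window sizes only roughly mirror the numbers in the post (about a year of M5 bars for training, about a week for the forward test).

```python
# Sketch of walk-forward evaluation: train on a sliding window, test on the
# next chunk, roll forward, repeat. Model and window sizes are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def walk_forward_accuracy(X, y, train_size=50_000, test_size=1_440, step=1_440):
    """Train on train_size rows, test on the next test_size rows (~a week of M5),
    roll forward by step, and average the scores over all forward chunks."""
    scores = []
    start = 0
    while start + train_size + test_size <= len(X):
        tr = slice(start, start + train_size)
        te = slice(start + train_size, start + train_size + test_size)
        model = GradientBoostingClassifier(max_depth=3, n_estimators=200)
        model.fit(X[tr], y[tr])
        scores.append(accuracy_score(y[te], model.predict(X[te])))
        start += step
    return float(np.mean(scores))  # the "total result" over all forwards
```

Running this for each candidate combination of model parameters and history depth, and keeping the combination with the best average, is one way to implement the selection procedure described above.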

 
elibrarius #:

I optimize the hyperparameters of the model (tree depth, number of examples in a leaf, number of trees, etc.) and of the dataset (number of rows; combinations of features are also possible).

I run all these variants and then select the best combination of model and data parameters by the best total result of walking forward. In my opinion cross-validation is worse, while walking forward is a copy of what will happen in reality: trade for a week, retrain, trade another week, retrain again, and so on.

Training on the whole dataset is illogical.
Whatever depth of history turned out best is the depth you should keep training on. My model trained N times on 50,000 rows of M5 (almost a year) may show 52% on the sum of all forwards. Train it with the same parameters but a different depth of history, say 70,000 or 30,000 rows, and the result drops below 50% on the sum of all forwards.

The reason is that the leaves will not contain the same examples. The trees may end up with more or fewer leaves, and so on. I think that for datasets of different sizes you have to change the depth or the number of examples in a leaf.

Well, in my opinion, cv is needed to evaluate the quality of the dataset, not the robustness of a particular model. If the average error on k-folds is acceptable, then we can train a model on this dataset and it will be good too. You can borrow parameters averaged from the models used for cv.
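A minimal sketch of that idea: use the average k-fold score as a dataset-quality gate, then fit one final model on the whole dataset. The model, the 0.55 threshold and plain k-fold splitting are illustrative assumptions; for time-ordered data a TimeSeriesSplit may be more appropriate.

```python
# Sketch: average k-fold accuracy as a dataset-quality check.
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

def dataset_is_good(X, y, threshold=0.55, k=5):
    model = RandomForestClassifier(n_estimators=300, min_samples_leaf=50)
    scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
    return scores.mean() >= threshold  # acceptable average result over k folds

# Usage: if dataset_is_good(X, y), train the final model on all of X, y,
# borrowing the (averaged) parameters that did well across the folds.
```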
 
Maxim Dmitrievsky #:
Well, in my opinion, cv is needed to evaluate the quality of the dataset, not the robustness of a particular model. If the average error on k-folds is acceptable, we can then train a model on this dataset and it will be good too. You can borrow parameters averaged from models used for cv.
We will take different chunks from the dataset all the time. I optimize both dataset (number of lines and features) and model parameters.
 
Aleksey Nikolayev #:

It is probably possible to check every observation from the test to see if it is an outlier in some sense relative to the exam.

That's what would be interesting to know!

My point is that the market is changeable and cyclical, and in theory any model (assuming that events repeat, otherwise there is no point in training) will have high accuracy in some periods of its existence, while on the test sections it may simply be a different market, a different wave. Training is done on the most pronounced patterns, but are we entitled to assume that they will remain just as stable? I think that the quality of a model depends on predictors that describe stable patterns, and therefore we should train on examples whose relation to the outcome is typical across different parts of the sample.
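A sketch of one way to implement the check suggested above: see whether each test observation looks like an outlier relative to another sample (the "exam" section). IsolationForest is just one possible novelty detector and the contamination level is an arbitrary assumption.

```python
# Flag test rows that look atypical relative to a reference sample.
from sklearn.ensemble import IsolationForest

def atypical_mask(X_reference, X_test, contamination=0.05):
    """True for test rows that a detector fitted on the reference sample
    considers outliers."""
    detector = IsolationForest(contamination=contamination, random_state=0)
    detector.fit(X_reference)
    return detector.predict(X_test) == -1  # -1 marks outliers
```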

 
mytarmailS #:
You can do it through tree models...
Decompose the model into rules, analyze the rules with the statistics you need (repeatability, etc.), and see whether a rule still appears on new data...

The "inTrees" package: 5 lines of code and go

I've been doing this for a long time with leaves, but it's not quite the same - it doesn't allow me to detect atypical examples in the sample.
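A rough Python analogue of the rule-extraction idea quoted above: turn a tree into explicit leaves/rules and check how often each one fires, and how its target rate holds up, on new data. This is not the inTrees package itself; X_train, y_train, X_new, y_new and feature_names are placeholders, and a 0/1 target is assumed.

```python
# Extract readable rules from a tree and compare per-leaf statistics
# between the training sample and new data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=100)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(feature_names)))  # human-readable rules

train_leaves = tree.apply(X_train)  # leaf id of every training row
new_leaves = tree.apply(X_new)      # leaf id of every new row
for leaf in np.unique(train_leaves):
    in_old = train_leaves == leaf
    in_new = new_leaves == leaf
    print(leaf,
          round(in_old.mean(), 4), round(y_train[in_old].mean(), 3),  # support / target rate, train
          round(in_new.mean(), 4),
          round(y_new[in_new].mean(), 3) if in_new.any() else None)   # the same on new data
```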

 
Vladimir Perervenko #:

The NoiseFiltersR package. Have a look at the article.

I looked at the article; as I understand it, this package does not give significant results - an improvement of about 3% - but it is interesting. Can you explain the principle of how it works?

 
elibrarius #:
We will take different chunks from the dataset all the time. I optimize both dataset (number of lines and features) and model parameters.

I forgot - is your target the color/type of the current hour's candle?

 
Aleksey Nikolayev #:

If everything is more or less clear with noise predictors, it is much less clear with noise examples. I would like to know more about the ways to define them (in the sense of theory rather than the names of the packages/functions used, although of course R always has references to articles). It is clear that there should be a "don't trade" class when classifying, since striving to be in the market all the time is considered a mistake. But it is not very clear how to describe that class correctly in a more or less formal way.

There are three options for processing noisy examples: delete them, re-label them (correct the markup), or move them into a separate class of noisy examples. In my experience, about 25% of the sample is "noise". The quality improvement is about 5%, depending on the models and the data preparation. I use it sometimes.
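A sketch of the general principle behind ensemble-style noise filters such as those in NoiseFiltersR: flag as candidate "noise" the examples that out-of-fold models keep misclassifying, then delete, relabel, or move them to a separate class. This illustrates the idea, not the package's exact algorithms; the model is a placeholder.

```python
# Candidate-noise detection via out-of-fold misclassification.
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestClassifier

def noise_mask(X, y, k=5):
    model = RandomForestClassifier(n_estimators=300, min_samples_leaf=50)
    oof = cross_val_predict(model, X, y, cv=k)  # out-of-fold predictions
    return oof != y                             # True = candidate noisy example
```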

There is one more problem when using predictors: their drift. It should be detected and taken into account both in testing and in operation. Attached is a translation of an article (look for others on the net), and there is a drifter package; it is not the only one. The point is that when selecting predictors you need to consider not only their importance but also their drift. Throw away or transform predictors with high drift; take low drift into account (correct for it) in testing and in operation.
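A minimal sketch of a per-predictor drift check between an old (training) period and a recent one, using a two-sample Kolmogorov-Smirnov test. The drifter package mentioned above addresses the same task; the 0.1 threshold here is an arbitrary assumption.

```python
# Per-feature distribution drift check between two periods.
from scipy.stats import ks_2samp

def drifting_predictors(X_old, X_new, feature_names, threshold=0.1):
    drifted = []
    for j, name in enumerate(feature_names):
        stat, _ = ks_2samp(X_old[:, j], X_new[:, j])
        if stat > threshold:              # large KS statistic = shifted distribution
            drifted.append((name, round(stat, 3)))
    return drifted                        # candidates to drop or transform
```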

Good luck

Files:
Drift1.zip  2238 kb
 
Aleksey Vyazmikin #:

I forgot - is your target the color/type of the current hour's candle?

The color of the candlestick, even with a 30% error, can still lose money. We don't know how much profit we'll get from it... As a rule the color is guessed well during slow price movements (at night), and one wrongly predicted strong daily candle may be worth 10 small night ones. I think that guessing the color of candlesticks again gives a random outcome (due to the random size of the candles).
That's why I did the classification with TP and SL. If they are equal, then 52% of successful trades is already profitable. If TP = 2*SL, then >33% of successful trades will be profitable. The best I have had is 52-53% of successful trades with TP = 2*SL over 2 years. But in general, I'm thinking of using regression with a fixed TP/SL. More precisely, of somehow building a classification on top of regression.
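For reference, the break-even share of winning trades with fixed TP and SL follows from setting the expected profit per trade to zero, which matches the figures above (the extra couple of percent covers spread and commission).

```python
# Break-even win rate for fixed TP/SL, ignoring spread and commission:
# expected profit per trade = p*TP - (1 - p)*SL = 0  =>  p = SL / (TP + SL)
def breakeven_winrate(tp, sl):
    return sl / (tp + sl)

print(breakeven_winrate(1, 1))  # 0.5   -> with TP = SL you need just over 50% winners
print(breakeven_winrate(2, 1))  # 0.333 -> with TP = 2*SL you need just over 33% winners
```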
 

I haven't been on the forum for several years, and it's still the same. As in the song: "What you were is what you have remained, steppe eagle, dashing Cossack...".

Statistics begins with an axiom, which, being an axiom, is not discussed:


"Garbage in, garbage out."


In principle, there are no mathematical methods, and there cannot be any, that will make candy out of garbage. Either there is a set of predictors that PREDICT the teacher (the target), or there isn't.

And the models play practically no role, and neither do the various cross-validations and other computationally heavy contortions.


PS.

By the way, the "importance" of predictors in a model has nothing to do with the ability to predict the teacher.
