Machine learning in trading: theory, models, practice and algo-trading - page 3210

 

The problem mentioned above is that there is a model with excellent results on both the training file and the OOS file. I understand the training file can be produced even by random sampling from the data, with the OOS being whatever remains after the training split.

But when running the model on an external file, the result is catastrophically bad.

I remembered that I ran into such a case some years ago.

I managed to find the reason. The reason was looking ahead, an extremely inconvenient one, because look-ahead is very difficult to detect.

Back then I created a model in which the teacher was the increments of ZZ (ZigZag), and there were many predictors in whose calculation ZZ was involved, for example the difference between price and ZZ. When training, I simply cut off the piece of the file that did not contain the rightmost, still unconfirmed legs of the ZZ. And when calculating the predictors, the missing ZZ values were extrapolated with the last leg.
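A minimal R sketch of how such a leak arises (a sketch of the mechanism, not the original code: smooth.spline merely stands in for the repainting ZZ, since any curve fitted on the full series, ZigZag included, gives its earlier values knowledge of later bars):

```r
set.seed(1)
price <- cumsum(rnorm(2000))   # toy price series

# Stand-in for a repainting ZigZag: a curve fitted on the FULL series,
# so its value at bar t already reflects bars that come after t.
zz <- smooth.spline(seq_along(price), price, spar = 0.4)$y

teacher   <- sign(c(diff(zz), NA))   # target: direction of the next increment
predictor <- price - zz              # predictor built from the same curve

# Teacher and predictor share the same future information, so any random
# train/test/validation split of this one file looks brilliant, while a
# chronologically separate file does not.
```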

The three files that Rattle produces by random sampling gave a classification error of about 5%. However, a fourth file, unrelated to the first three, gave an error no better than random. (Random sampling scatters the rows carrying the leaked future across all three files, so the split cannot expose the problem; only a chronologically separate file can.)

If, however, we remove all the predictors in whose calculation ZZ is involved, everything falls into place: the classification error is roughly the same on all four files.

This is looking ahead.

With overfitting it is clear: be extremely careful with optimisation in the tester, and in R clean the predictor list of rubbish. But how do you detect looking ahead?

From past experience, the following suggests itself: if the classification error on train, test and validation is less than 10%, just throw out predictors one by one until the error rises to 20-30%....
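A rough R sketch of that one-by-one removal (a sketch, not the actual code from the post: a data frame df with a factor column target is assumed, and randomForest is chosen only for its built-in OOB error; any classifier would do):

```r
library(randomForest)

# Drop each predictor in turn and watch the error: if removing one predictor
# pushes the error from under 10% towards 20-30%, that predictor was doing
# suspiciously much work and is a look-ahead suspect.
leak_suspects <- function(df, target = "target") {
  preds   <- setdiff(names(df), target)
  oob_err <- function(p) {
    fit <- randomForest(reformulate(p, target), data = df)
    tail(fit$err.rate[, "OOB"], 1)   # final out-of-bag classification error
  }
  base <- oob_err(preds)             # error with the full predictor set
  jump <- sapply(preds, function(p) oob_err(setdiff(preds, p)) - base)
  sort(jump, decreasing = TRUE)      # biggest error jump first
}
```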

 
ZigZags should be built as of the current bar, not into the future.
 
Forester #:
ZigZags should be built as of the current bar, not into the future.

For model training the ZZ was not extended, because there is no need: the pattern the model looks for is a single row - neighbouring rows are not taken into account - and the training sample is 1500 bars.

 
Where did this zig-zag crap come from anyway, who came up with it first?

Sanych, when will you remember that a teacher is a target + features?))) not the target alone

And volatility is spelt without H.
 
СанСаныч Фоменко #:

From past experience, the following suggests itself: if the classification error on train, test and validation is less than 10%, just throw out predictors one by one until the error rises to 20-30%....

Genius)))))

How do you find real predictors with an error of less than 10%?

Don't tell me they don't exist, it's a matter of faith....

 
Maxim Dmitrievsky #:
Where did this zig-zag crap come from anyway, who came up with it first?

Are you actually interested in discussing the substance of the problem, or only in showing off your beloved self?

 
СанСаныч Фоменко #:

I managed to find the reason. The reason was looking ahead, an extremely inconvenient one, because look-ahead is very difficult to detect.

Back then I created a model in which the teacher was the increments of ZZ (ZigZag), and there were many predictors in whose calculation ZZ was involved, for example the difference between price and ZZ. When training, I simply cut off the piece of the file that did not contain the rightmost, still unconfirmed legs of the ZZ. And when calculating the predictors, the missing ZZ values were extrapolated with the last leg.

To avoid peeking, Forester has it right: calculate the predictors inside a loop, at each iteration, without looking into the future....

That's the solution.
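For what it is worth, that loop looks something like this in R (a sketch under assumptions: calc_indicator is a placeholder for ZZ or any other repainting indicator):

```r
# At every bar t recompute the indicator on bars 1..t only, and keep the
# value it had at that moment; nothing computed here can see the future.
no_peek <- function(price, calc_indicator, warmup = 100) {
  out <- rep(NA_real_, length(price))
  for (t in warmup:length(price)) {
    ind    <- calc_indicator(price[1:t])  # history up to and including bar t
    out[t] <- tail(ind, 1)                # the indicator's value as of bar t
  }
  out
}

# Example with the same stand-in smoother as above, now leak-free by construction:
# zz_t <- no_peek(price, function(p) smooth.spline(seq_along(p), p, spar = 0.4)$y)
```

It is slow, since the indicator is recomputed at every bar, but that is exactly the price of reproducing what the indicator actually showed at the time.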

 
mytarmailS #:

Genius)))))

How do you find real predictors with an error of less than 10%?

Don't tell me they don't exist, it's a matter of faith....

It's easy.

I wrote above how I did it with the example of ZZ.

But this is not about ZZ as such: we let pieces of the teacher leak into the predictors and enjoy the happiness until we run on an outside file.

Or you can simply never run on the OUT-of-sample file and live happily, as Maxim does with his very beautiful pictures.

But back to the problem of looking ahead. So far brute-force removal has been suggested. Or maybe there is something else?

 
СанСаныч Фоменко #:

Are you actually interested in discussing the substance of the problem, or only in showing off your beloved self?

I am not at all interested in solving other people's mental problems.
 
mytarmailS #:

To avoid peeking, Forester has it right: calculate the predictors inside a loop, at each iteration, without looking into the future.

That's the solution.

With the ZZ example, it was obvious.

But even without ZZ I often get a classification error of less than 10%. It turns out to be rubbish. I threw it away.

Reason: