Machine learning in trading: theory, models, practice and algo-trading - page 963

 
Ivan Negreshniy:

And how do you justify to yourself the admissibility of deviations for the worse, other than by pure chance?

I have no justification for expecting the OOS to be better than the Train.
How can you expect random OOS data to produce better results than the data the model learned on? It can't, except by chance.
It was recently written in another thread that the student cannot know more than the teacher.
Example.
Part of the OOS data (say 80%) will be familiar to the model, and there it will show the same error as on the Train (say error = 30%); the other 20% of the OOS data will be new and unlearned, and will give an error of 50%. Combined, this 80% familiar and 20% new data should raise the error on the OOS section to about 34% (0.8·30% + 0.2·50%).
So I expect a worsening of the OOS results to be more likely than an improvement.
There is also a probability of improvement, if the OOS happens to get a lot of successful examples, in a greater proportion than the Train section. I can't think of any other way the error could decrease on the OOS.
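A minimal sketch of the arithmetic behind that estimate, assuming the 80/20 familiar/new split and the 30%/50% error rates stated above (the function name is just illustrative):

```python
def expected_oos_error(share_familiar=0.8, err_familiar=0.30, err_new=0.50):
    """Weighted average of the error on familiar and new OOS examples."""
    return share_familiar * err_familiar + (1 - share_familiar) * err_new

print(expected_oos_error())  # ~0.34, i.e. about 34% expected on the OOS vs 30% on the Train
```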

Ivan Negreshniy:

And then what is your main task, if not fighting this randomness? After all, it negates the meaning of validation, of OOS, and of ML as a whole. :)

The task is to keep the delta between the errors from being too big.

 
A question for Alexei.
How do you weed out the noisy ones from your 600 predictors?
 
elibrarius:

I have no justification for expecting the OOS to be better than the Train.
How can you expect random OOS data to produce better results than the data the model learned on? It can't, except by chance.
It was recently written in another thread that the student cannot know more than the teacher.
Example.
Part of the OOS data (say 80%) will be familiar to the model, and there it will show the same error as on the Train (say error = 30%); the other 20% of the OOS data will be new and unlearned, and will give an error of 50%. Combined, this 80% familiar and 20% new data should raise the error on the OOS section to about 34% (0.8·30% + 0.2·50%).
So I expect a worsening of the OOS results to be more likely than an improvement.
There is also a probability of improvement, if the OOS happens to get a lot of successful examples, in a greater proportion than the Train section. I can't think of any other way the error could decrease on the OOS.

The task is to keep the delta between the errors from being too big.

To avoid confusion, we need to define the terminology: OOS (out of sample) is, by definition, data the model is not familiar with; IS (in sample) is another matter.
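Purely for illustration, a minimal sketch of a chronological IS/OOS split in that sense; the array names and the 80/20 ratio are assumptions, not something from the thread:

```python
import numpy as np

# Toy feature matrix and labels, ordered by time (oldest first).
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

split = int(len(X) * 0.8)
X_is, y_is = X[:split], y[:split]       # IS (in sample): used for training
X_oos, y_oos = X[split:], y[split:]     # OOS (out of sample): never shown to the model
```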

 
Ivan Negreshniy:

To avoid confusion, we need to define the terminology: OOS (out of sample) is, by definition, data the model is not familiar with; IS (in sample) is another matter.

If patterns are found in the data, then the examples that correspond to them can be considered familiar.
 
elibrarius:

There is also a probability of improvement, if the OOS happens to get a lot of successful examples, in a greater proportion than the Train section. I can't think of any other way the error could decrease on the OOS.

I read in one of the ML books that, when training, the ratio of successful to unsuccessful examples should match reality. We should train on successful and unsuccessful outcomes alike.
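As a small illustration of that point, a sketch that simply checks the success/failure ratio of a training set instead of rebalancing it; the labels here are hypothetical (1 = successful outcome, 0 = unsuccessful):

```python
import numpy as np

y_train = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])  # hypothetical outcome labels

# The idea above: this share should match the share seen in real trading,
# rather than being artificially rebalanced to 50/50 before training.
print(y_train.mean())  # 0.3 -> 30% successful examples
```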

 
Yuriy Asaulenko:

I read in one of the ML books that, when training, the ratio of successful to unsuccessful examples should match reality. We should train on successful and unsuccessful outcomes alike.

Then why try to filter out unsuccessful and noisy examples, or isolate them, relabel them as "don't know", and retrain the network?
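For concreteness, one way that "relabel as 'don't know' and retrain" idea could look; this is only a sketch under assumptions (a random forest stands in for the network, and misclassification on the training set stands in for "noisy"), not anyone's actual procedure from the thread:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def relabel_and_retrain(X, y, dont_know_label=2):
    """Train, mark examples the model cannot fit as 'don't know', then retrain."""
    first = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    noisy = first.predict(X) != y                      # candidate noisy examples
    y_relabeled = np.where(noisy, dont_know_label, y)  # move them to a third class
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_relabeled)
```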
 
elibrarius:
Then why try to filter out unsuccessful and noisy examples, or isolate them, relabel them as "don't know", and retrain the network?

I don't know; that's a question for those who do it. I train as I wrote above.

 
Dr. Trader:

Yes, it's a pity your demo was lost. And all because you rely too much on the OOS, even though that same article said you cannot select a model by its OOS, and the same thing has been written here on the forum many times.

CV is not relevant for the forest... I look at OOB; I have already written that there is a direct correlation between OOB and stability on the OOS.

So far I haven't gotten the same error on the OOB as on the train set; it is always at least 2 times higher. I'll try again later, I'm sick of this nonsense for now :)

Maybe I'll have to switch to R and build a better model, because alglib doesn't let me look at anything else.
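For reference, a minimal sketch of comparing train error with out-of-bag (OOB) error for a random forest; sklearn is used only as a stand-in for alglib here, and the random data is just to make the snippet runnable:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(2000, 10)
y = np.random.randint(0, 2, size=2000)

model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

train_error = 1 - model.score(X, y)  # optimistic: every tree has seen most of these rows
oob_error = 1 - model.oob_score_     # each row scored only by trees that did not train on it
print(train_error, oob_error)        # on pure noise, OOB error stays near 0.5
```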
 
Maxim Dmitrievsky:

because alglib doesn't let me look at anything else

I told you a long time ago that it's a dead end. Modeling should be done not in MT, but in R, MATLAB or, like A_K2, in VisSim, etc.

Once the model works, you can transfer it to MT, or you might not even need to transfer it. :)

 
Yuriy Asaulenko:

I told you a long time ago that it's a dead end. Modeling should be done not in MT, but in R, MATLAB or, like A_K2, in VisSim, etc.

Once the model works, you can transfer it to MT, or you might not even need to transfer it. :)

What else interesting do you have to say?
