Machine learning in trading: theory, models, practice and algo-trading - page 470

 
Mihail Marchukajtes:

The real point, though, is this. If the separation on the reference section is poor, it does NOT matter whether it is right or wrong - the separation itself is weak. And if the model has worked for no more than 50% of the training interval, then such a model is considered overtrained.... IMHO

By the way, do you remember - in your article on sequences you suggested counting several signals in a row and reversing them... a superposition of signals.

I came up with an interesting way to implement something like this through fuzzy logic and build it into the learning process... I'll post something later :)

 
Maxim Dmitrievsky:

Sometimes the brain starts to break down... about noise in forex: it's not a radio signal, is it? So where does noise in forex come from?


I have discussed the concept of "noise" in forex quite extensively in this thread. I don't remember whether I came up with it myself or borrowed it from someone else, and it doesn't matter. In any case, I have posted links to articles on the subject in this thread.

The way I see it, "noise" is all of a predictor, or a part of one, that has no relation to the target variable - a sort of coffee grounds.


Let me explain with an example (I repeat what I wrote before).


Take the target, which consists of two classes: men and women.

We take a predictor: clothing.

The predictor takes only two values: pants and skirts. In certain countries this predictor has 100% predictive power: skirts predict women, and pants predict men. This predictor has no noise at all. Classification error = 0. There is no overtraining.

The example is contrived, but the "clothing" predictor can also contain items labeled "unisex". For us this means that both men and women can wear such clothing, i.e. for our target variable "unisex" clothing has NO predictive power at all - and that is my understanding of NOISE.

If we take a predictor with the values "pants", "skirts", and "unisex", then "unisex" will be the source of the classification error. If the share of "unisex" clothing is 30%, then theoretically you can get a model training error of 30% - but on such a predictor a 29% error means the model is overtrained by 1%!
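A minimal R sketch of this example (my own illustration, not code from the thread; it assumes a 50/50 split of men and women inside the "unisex" category, in which case the irreducible error is half the 30% noise share - the point being that the noise share sets a floor below which a lower error only signals overtraining):

```r
# Simulate the clothing example: "unisex" carries no information about sex.
library(randomForest)

set.seed(42)
n        <- 10000
clothing <- sample(c("pants", "skirts", "unisex"), n, replace = TRUE,
                   prob = c(0.35, 0.35, 0.30))
sex <- ifelse(clothing == "pants",  "man",
       ifelse(clothing == "skirts", "woman",
              sample(c("man", "woman"), n, replace = TRUE)))   # coin flip
df <- data.frame(clothing = factor(clothing), sex = factor(sex))

rf <- randomForest(sex ~ clothing, data = df)
mean(predict(rf) != df$sex)   # OOB error sits at the noise floor (~0.15 here)
```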


I use this in practice. Thanks to it, I was able to select predictors for a random forest with an error below 30%. Such a model is not overtrained: the error is about the same on the training set, on test samples within the same file, and on other, external files.
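The check itself is trivial to sketch (an illustration; df_train, df_test and df_external are placeholders for the training sample, the test sample from the same file, and a later external file):

```r
# Overtraining check: the error should be roughly the same on all three sets.
library(randomForest)

rf <- randomForest(sex ~ clothing, data = df_train)

err <- c(train    = mean(predict(rf)              != df_train$sex),   # OOB
         test     = mean(predict(rf, df_test)     != df_test$sex),
         external = mean(predict(rf, df_external) != df_external$sex))
print(err)   # large gaps between these numbers signal an overtrained model
```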

What would reducing the error mean in my example? It would mean finding predictors whose share of noise values is below that very 30%. I have not succeeded so far. Maybe someone else will.

But without this analysis, applying any ML models is an empty exercise, an intellectual game of reading numbers in coffee grounds.


PS.

The error in question usually does not depend on the type of model. I tried various forests and ada variants - all about the same. But the neural network (nnet) gives noticeably worse results.
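For instance, such a comparison could be run like this (a sketch on the synthetic data from above, not the actual experiments behind these remarks):

```r
# Different model families should land at roughly the same noise floor.
library(randomForest)
library(nnet)

rf <- randomForest(sex ~ clothing, data = df)
nn <- nnet(sex ~ clothing, data = df, size = 4, maxit = 200, trace = FALSE)

mean(predict(rf) != df$sex)                       # forest, OOB error
mean(predict(nn, df, type = "class") != df$sex)   # nnet, training error
```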

 
SanSanych Fomenko:

I have discussed the concept of "noise" in forex quite extensively in this thread. I don't remember whether I came up with it myself or borrowed it from someone else, and it doesn't matter. In any case, I have posted links to articles on the subject in this thread.

The way I see it, "noise" is all of a predictor, or a part of one, that has no relation to the target variable - a sort of coffee grounds.


Let me explain with an example (I repeat what I wrote before).


Take the target, which consists of two classes: men and women.

We take a predictor: clothing.

The predictor takes only two values: pants and skirts. In certain countries this predictor has 100% predictive power: skirts predict women, and pants predict men. This predictor has no noise at all. Classification error = 0. There is no overtraining.

The example is contrived, but the "clothing" predictor can also contain items labeled "unisex". For us this means that both men and women can wear such clothing, i.e. for our target variable "unisex" clothing has NO predictive power at all - and that is my understanding of NOISE.

If we take a predictor with the values "pants", "skirts", and "unisex", then "unisex" will be the source of the classification error. If the share of "unisex" clothing is 30%, then theoretically you can get a model training error of 30% - but on such a predictor a 29% error means the model is overtrained by 1%!


Only we don't know beforehand what minimal error such a noise predictor can give. In real conditions, when selecting predictors, the uninformative ones are simply sifted out and that's it.

but in general it seems to be clear )

 
Maxim Dmitrievsky:

... the uninformative ones are simply sifted out there


This is a profound misconception: what is at work is an error-minimization algorithm, and "error" can be understood in many different ways. Noise that contains more diversity than the non-noise turns out to be the most "useful" to it. The algorithm skims the froth off the coffee grounds.
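The effect is easy to reproduce (a sketch of my own, not from the thread): give a forest two pure-noise predictors, one "diverse" (continuous) and one not (binary), and look at the impurity-based importance:

```r
# High-diversity noise offers many candidate split points, so impurity-based
# importance rates it above low-diversity noise - the "froth from the grounds".
library(randomForest)

set.seed(1)
n <- 2000
y <- factor(sample(c("up", "down"), n, replace = TRUE))
weak_signal <- factor(ifelse(runif(n) < 0.6, as.character(y),
                             sample(c("up", "down"), n, replace = TRUE)))
noise_diverse <- rnorm(n)                                 # continuous pure noise
noise_binary  <- factor(sample(0:1, n, replace = TRUE))   # low-diversity noise

rf <- randomForest(y ~ ., importance = TRUE,
                   data = data.frame(y, weak_signal, noise_diverse, noise_binary))
importance(rf)  # noise_diverse scores well above noise_binary, though both are noise
```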

 
SanSanych Fomenko:

This is a profound misconception: what is at work is an error-minimization algorithm, and "error" can be understood in many different ways. Noise that contains more diversity than the non-noise turns out to be the most "useful" to it. The algorithm skims the froth off the coffee grounds.

I mean Jpredictor... it sort of sifts out the noise features by itself

All in all, this is more a topic for experimentation than for trying to really understand what's going on there )

I want to try the H2O platform - it does forests and boosting... maybe you've heard of it? People say it's decent, along with Microsoft's LightGBM and xgboost.

https://www.h2o.ai/

 

For those who haven't seen it, I recommend checking out this thread

 
Vizard_:

Fa and Misha are not miners)))
It'll do for playing with "parrots" (abstract metric points), though. + LightGBM, + CatBoost.
If you want a slightly better cut, install Python and put it all on the GPU...

Many R packages work fine with the GPU.

Have you run CatBoost? Just wondering.

Good luck

 
Maxim Dmitrievsky:
I mean Jpredictor... it sort of sifts out the noise features by itself

Anyway, this topic is more for experimentation than for trying to really understand what's going on there)

I want to try the H2O platform - it does forests and boosting... maybe you've heard of it? People say it's decent, along with Microsoft's LightGBM and xgboost.

https://www.h2o.ai/

It is written in Java and uses a lot of memory. It works no better and no worse than similar R packages. It has one trait that is both a drawback and an advantage: continuous development without backward compatibility.

It is fine for experimenting, but I would not recommend it for production work (IMHO).
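For reference, a minimal session looks something like this (a sketch; df stands for any data.frame with a factor target "sex"):

```r
# H2O starts a local Java server; the R package is just a client to it,
# which is where the memory appetite noted above comes from.
library(h2o)

h2o.init(max_mem_size = "2G")        # launch the local Java backend
hf  <- as.h2o(df)                    # push the data to the server
fit <- h2o.gbm(x = setdiff(names(df), "sex"), y = "sex", training_frame = hf)
h2o.performance(fit)                 # training metrics
h2o.shutdown(prompt = FALSE)
```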

Good luck

 
SanSanych Fomenko:

This is a profound misconception: what is at work is an error-minimization algorithm, and "error" can be understood in many different ways. Noise that contains more diversity than the non-noise turns out to be the most "useful" to it. The algorithm skims the froth off the coffee grounds.

"noise" and "overfitting" are jargonisms, the meaning of which everyone defines differently. Intuitively we understand the difference between "learned" and "learned" - it's difficult to translate it into the language of programs. I define it simply - the moment when the test error begins to grow and is the beginning of "overtraininga" (not the same with "overfitting"). If I find it, I will send you a link to an interesting discussion on this subject in English-language net.

Good luck

 
SanSanych Fomenko:

I have discussed the concept of "noise" in forex quite extensively in this thread. I don't remember whether I came up with it myself or borrowed it from someone else, and it doesn't matter. In any case, I have posted links to articles on the subject in this thread.

The way I see it, "noise" is all of a predictor, or a part of one, that has no relation to the target variable - a sort of coffee grounds.


Let me explain with an example (I repeat what I wrote before).


Take the target, which consists of two classes: men and women.

We take a predictor: clothing.

The predictor takes only two values: pants and skirts. In certain countries this predictor has 100% predictive power: skirts predict women, and pants predict men. This predictor has no noise at all. Classification error = 0. There is no overtraining.

The example is contrived, but the "clothing" predictor can also contain items labeled "unisex". For us this means that both men and women can wear such clothing, i.e. for our target variable "unisex" clothing has NO predictive power at all - and that is my understanding of NOISE.

If we take a predictor with the values "pants", "skirts", and "unisex", then "unisex" will be the source of the classification error. If the share of "unisex" clothing is 30%, then theoretically you can get a model training error of 30% - but on such a predictor a 29% error means the model is overtrained by 1%!


I use this in practice. Thanks to it, I was able to select predictors for a random forest with an error below 30%. Such a model is not overtrained: the error is about the same on the training set, on test samples within the same file, and on other, external files.

What would reducing the error mean in my example? It would mean finding predictors whose share of noise values is below that very 30%. I have not succeeded so far. Maybe someone else will.

But without this analysis, applying any ML models is an empty exercise, an intellectual game of reading numbers in coffee grounds.


PS.

The error in question usually does not depend on the type of model. I tried various forests and ada variants - all about the same. But the neural network (nnet) gives noticeably worse results.

You can remove irrelevant examples, extract principal or independent components, or, finally, discretize. Have you preprocessed the predictors? I hope you have removed outliers (not critical for forests).
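Roughly like this (a sketch with caret; X is a placeholder for a numeric predictor data.frame, and each step is shown independently):

```r
library(caret)

# principal components (use method = "ica" for independent components)
pp    <- preProcess(X, method = c("center", "scale", "pca"))
X_pca <- predict(pp, X)

# crude outlier removal: drop rows with any predictor beyond 4 standard deviations
keep    <- apply(scale(X), 1, function(r) all(abs(r) < 4))
X_clean <- X[keep, , drop = FALSE]

# discretization: quantile-based bins for one predictor
X$bin1 <- cut(X[[1]], breaks = quantile(X[[1]], probs = seq(0, 1, 0.25)),
              include.lowest = TRUE)
```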

You sound pessimistic.

Or is it just my impression?

Good luck
