Machine learning in trading: theory, models, practice and algo-trading - page 555

 
Maxim Dmitrievsky:

So I don't know what to believe in this life... everything has to be double-checked.


Benchmarks are the salvation)))

Various conversions and cuts. The top one is the raw data.

train = rms sampling with light sql. test = OOS. time = rms time in sec.


 
Regarding outliers in datasets, this method could be applied to market data.
 

I sometimes wonder about this forum. It's quiet, dull, and dreary, and then suddenly people like Vladimir or Vizard_ or the most mysterious podotr appear and start giving master classes. Who are they? I ask everyone to show their passports and diplomas! :))))

 
SanSanych Fomenko:

You should only use predictors that HAVE a RELATION to the target variable. In this case "linearly" or "non-linearly" is irrelevant; what matters is the very precisely worded "have a relation".

well, that and everything further in the text is clear, but what does the correlation of a feature with the target have to do with an initially non-linear model?

and I wrote why it is needed in the case of a regression model but not in classification, because there the target is not a value at all but classes... read what I'm writing about more carefully :)

 
Maxim Dmitrievsky:

well, that and everything further in the text is clear, but what does the correlation of a feature with the target have to do with an initially non-linear model?

and I wrote why it is needed in the case of a regression model but not in classification, because there the target is not a value at all but classes... read what I'm writing about more carefully :)


I do not need to read deeper - I understand you perfectly, but you do not understand me at all.

I am writing about overtraining (overfitting) - this is the main enemy of all classification models. The future behavior of an overfitted model is NOT determined.

To combat this total evil, I see two tools:

1. ridding the input set of predictors of noise

2. careful testing.
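Point 1 can be illustrated with a minimal pure-Python sketch (my own illustration, not SanSanych's actual method): a 1-nearest-neighbour classifier is evaluated once with only an informative feature and once with 20 noise predictors added, and the out-of-sample accuracy degrades because the noise dimensions dominate the distance calculation.

```python
import random

random.seed(42)

def make_data(n, n_noise):
    """One informative feature (its sign is the class) plus n_noise
    purely random features that carry no information about the label."""
    data = []
    for _ in range(n):
        signal = random.uniform(-1.0, 1.0)
        label = 1 if signal > 0 else 0
        features = [signal] + [random.uniform(-1.0, 1.0) for _ in range(n_noise)]
        data.append((features, label))
    return data

def nn_predict(train, x):
    # 1-nearest-neighbour by squared Euclidean distance
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, test):
    return sum(nn_predict(train, x) == y for x, y in test) / len(test)

# The informative feature is generated the same way in both settings;
# only the number of noise predictors differs.
acc_clean = accuracy(make_data(200, 0), make_data(100, 0))
acc_noisy = accuracy(make_data(200, 20), make_data(100, 20))
print(acc_clean, acc_noisy)
```

With no noise the nearest neighbour almost always lies on the correct side of zero; with 20 noise predictors the neighbour is chosen mostly by the noise, which is one concrete mechanism behind "noise predictors cause undefined out-of-sample behavior".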

I write all of this based on my own calculations (a very large volume of them, I assure you), which I have been carrying out for over a year.

I'm too lazy to search for them and then put together a readable post, as I have no goal of convincing anyone of anything.


PS.

You keep insisting on the innocuousness and even usefulness of noise predictors - you are not the first, there are plenty of such people, they are called astrologers.

 
SanSanych Fomenko:

You keep insisting on the innocuousness and even usefulness of noise predictors - you are not the first, there are plenty of such people, they are called astrologers.


Where did I write such a thing?

 
Maxim Dmitrievsky:

Where did I write that?

Reread your post.

well, that and everything further in the text is clear, but what does the correlation of a feature with the target have to do with an initially non-linear model

and I wrote why it (correlation) is needed in the case of a regression model, and not in classification, because there the target is not a value at all but classes



It turns out I was reading things into your words, and I think our disagreement comes down to the following:

You are against correlation, and I never wrote about correlation between the predictor and the target variable.

It's called "talk."

I have always written: the predictor must be related to the target variable. By the word "relation" I never meant correlation, or linear or non-linear regression. Moreover, none of the predictor "importance" scores that classification algorithms produce are good enough for me either.


Look at my example: target: gender, with classes male/female; predictor: clothing, with values skirt/pants.

 
SanSanych Fomenko:

Reread your post.



It turns out I was reading things into your words, and I think our disagreement comes down to the following:

You are against correlation, and I never wrote about correlation between the predictor and the target variable.

It's called "talk."

I have always written: the predictor must be related to the target variable. By the word "relation" I never meant correlation, or linear or non-linear regression. Moreover, none of the predictor "importance" scores that classification algorithms produce are good enough for me either.


See my example: target: gender, with classes male/female; predictor: clothing, with values skirt/pants.


yes, it's just that people sometimes write that the features should correlate with the target, i.e. that there should be a linear dependence

and I wrote that for regression models that may be reasonable: at least one feature should be linearly related to the target

about the "relation", of course, I agree :)
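The distinction the two posters converge on, linear correlation versus a more general "relation", is easy to demonstrate with a small sketch; the `pearson` helper below is my own illustration, not code from the thread:

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient: covariance divided by
    the product of the standard deviations (up to the common 1/n factor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]        # perfectly linear function of xs
nonlinear = [(x - 3) ** 2 for x in xs]  # deterministic relation, but symmetric
print(pearson(xs, linear))     # 1.0
print(pearson(xs, nonlinear))  # 0.0
```

The quadratic target is perfectly (deterministically) related to the feature, yet its Pearson correlation is exactly zero, which is why filtering predictors by linear correlation can discard inputs that a non-linear model would find useful.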

 
Vizard_:

Benchmarks are the salvation)))

Various conversions and cuts. The top one is the raw data.

train = r.sampling with light sq. test = OOS. time = r.sampling time in sec.



even a good result on a forward test is not always a harbinger of actually withdrawing profit to your card :)

cross-validation has already been mentioned above; I think it's the best approach
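For market data, plain shuffled k-fold cross-validation can leak future information into training, so a walk-forward (expanding-window) scheme is commonly suggested instead. A minimal sketch (the function name and parameters are my own, not from the thread):

```python
def walk_forward_splits(n, n_folds, min_train):
    """Expanding-window cross-validation splits for time-ordered data:
    each fold trains only on observations that precede its test block,
    so no future information leaks into training."""
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        start = min_train + k * fold
        end = start + fold if k < n_folds - 1 else n
        yield list(range(start)), list(range(start, end))

for train_idx, test_idx in walk_forward_splits(10, 3, 4):
    print(train_idx, test_idx)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Averaging the out-of-sample score over the folds gives a more honest estimate than a single forward test, which is one way to implement the "careful testing" SanSanych calls for above.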

 
Overfitting occurs in the presence of very large weights (~10^18), a consequence of multicollinearity, and results in an unstable model A(x, w).


Overfitting is treated by: early stopping of training, restricting the growth of weights (L1 (Lasso) and L2 regularization), dropping connections in the network (Dropout), and penalty functions (ElasticNet, Lasso).

Moreover, L1 regularization performs feature selection, since it zeroes out the weight coefficients of some features.
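The zeroing mechanism is the soft-thresholding (proximal) operator that L1-penalized solvers such as Lasso apply to each weight; a minimal sketch of the operator itself (my own illustration, with hand-picked weights):

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty with strength lam:
    shrinks a weight toward zero by lam and clips it to exactly
    zero once |w| <= lam. This is why L1 regularization drops
    weakly contributing features entirely."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [0.25, -0.75, 1.5, -0.1]
print([soft_threshold(w, 0.5) for w in weights])  # [0.0, -0.25, 1.0, 0.0]
```

Small weights are set to exactly zero rather than merely shrunk, which is the difference from L2 regularization: L2 only scales weights down, so it mitigates multicollinearity without discarding features.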

Getting rid of "noisy" features is feature selection, and there are dedicated methods for it. It does not always benefit the model, so sometimes L2 regularization is used instead (it helps with the multicollinearity problem).


SanSanych Fomenko, your statement about the relation between features and targets is a bit overconfident. How can you assert something that has not yet been proven? That is exactly what the ML model is built for: a model that is built and works gives an estimate of the relationship with such-and-such accuracy.

And the example with pants and skirts shows the researcher's limited knowledge of the study area, because such a model throws away valuable attributes: place of residence, time of year, latitude and longitude of the region, and so on.


Before building a model, one should understand the field under study, for the devil, like genius, is in the details.


PS. Arguments are a good thing. They help polish points of view, teach you to support theses with good arguments, and lead toward the truth.
