Machine learning in trading: theory, models, practice and algo-trading - page 179

 
Dr.Trader:


Are you talking about principal component analysis (PCA) or something else? I don't remember all the examples I've posted here :)

If you mean PCA, you can't make candy out of garbage anyway. You need some reasonably good predictors mixed in with the bad ones; then PCA can separate the good from the bad.

You underestimate the positive results of your own experience. There are no grails. But there is a comprehensive toolkit for fighting evils such as overfitting.

The very first step is to sift out the outright garbage, which is what has the decisive impact on overfitting. In that first step PCA is very helpful. After it, only predictors relevant to the target variable remain, and all the fantasy disappears. But don't overestimate the importance of this step: it is only the first. The following steps are still necessary:

  • rfe (recursive feature elimination) before each training run.
  • retraining the model on every new bar (ideally) or on weekends.

And your experiments with rfe will prove extremely useful.

PS.

Note that I have deliberately left out the work with the models themselves.

 
SanSanych Fomenko:

You underestimate the positive results of your own experience. There are no grails. But there is a comprehensive toolkit for fighting evils such as overfitting.

The very first step is to sift out the outright garbage, which is what has the decisive impact on overfitting. In that first step PCA is very helpful. After it, only predictors relevant to the target variable remain, and all the fantasy disappears. But don't overestimate the importance of this step: it is only the first. The following steps are still necessary:

  • rfe (recursive feature elimination) before each training run.
  • retraining the model on every new bar (ideally) or on weekends.

And your experiments with rfe will prove extremely useful.

PS.

Note that I have deliberately kept silent on the work with the models themselves.

Could you say more about PCA? How do you weed out garbage that way?
 
SanSanych Fomenko:

You underestimate the positive results of your own experience. There are no grails. But there is a comprehensive toolkit for fighting evils such as overfitting.

The very first step is to sift out the outright garbage, which is what has the decisive impact on overfitting. In that first step PCA is very helpful. After it, only predictors relevant to the target variable remain, and all the fantasy disappears. But don't overestimate the importance of this step: it is only the first. The following steps are still necessary:

  • rfe (recursive feature elimination) before each training run.
  • retraining the model on every new bar (ideally) or on weekends.

And your experiments with rfe will prove extremely useful.

PS.

Note that I have deliberately kept silent on the work with the models themselves.

Retraining the model on every bar... Does that mean a single bar affects the entire model? Given how much weight each bar carries in training, your endless struggle with overfitting becomes understandable...
 
Andrey Dik:
Retraining the model on every bar... Does that mean a single bar affects the entire model? Given how much weight each bar carries in training, your endless struggle with overfitting becomes understandable...

The model I'm currently trying to get into shape is one that I do train further on each new bar. To be honest, I don't see a big impact... Sometimes the model stays the same for dozens of bars in a row (a mechanism protects me from overfitting it). But if some banker says something wrong in the news and the price goes somewhere unexpected, there is hope that the model will catch up with the latest changes within a couple of bars. There is not much point in fitting the model to each bar, but if there is a way to react quickly to changes, why not take advantage of it?
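The idea of training further on each new bar while often keeping the old model can be sketched roughly as follows. This is only an illustration: the thread does not say what the protecting mechanism is, so the acceptance rule here (the candidate must beat the current model on the most recent bars by a 10% margin), the toy mean "model", and all names and parameters are assumptions.

```python
def fit_mean(values):
    """Toy 'model' (assumption): predict the mean of the training values."""
    return sum(values) / len(values)


def mae(prediction, values):
    """Mean absolute error of a constant prediction on the given values."""
    return sum(abs(v - prediction) for v in values) / len(values)


def walk_forward(series, train=20, val=5, margin=0.10):
    """Refit on every new bar, but replace the model only if the candidate
    is clearly better on the latest `val` bars (hypothetical guard rule)."""
    model = fit_mean(series[:train])
    updates = []  # bar counts at which the model was actually replaced
    for t in range(train + val, len(series) + 1):
        train_part = series[t - val - train:t - val]
        val_part = series[t - val:t]
        candidate = fit_mean(train_part)
        if mae(candidate, val_part) < (1 - margin) * mae(model, val_part):
            model = candidate
            updates.append(t)
    return model, updates


# Made-up series: flat at 0, then a regime change to 1 after bar 40.
series = [0.0] * 40 + [1.0] * 40
model, updates = walk_forward(series)
print(model, updates)
```

On this made-up series the model stays untouched through the flat stretch and is replaced only on a few bars after the regime change, which is the behaviour described in the post.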

Mihail Marchukajtes:
Can I have more details about PCA? How can you sift out garbage in such a way?

Somewhere around a hundred pages ago, Sannych posted a link to the "Principal component analysis" article in this thread. I wrote some code based on it and posted it here too. You would have to page through a lot of the thread to find it.

I also liked this article, although it uses neither R nor MQL, only Excel. But it explains the principle a little better: http://www.chemometrics.ru/materials/textbooks/pca.htm

 
Mihail Marchukajtes:
Could you say more about PCA? How do you sift out garbage that way?

See how the Principal Component Method works.

But there is also an interesting comment from darkAlert, which explains why this method will not work in some application tasks. I quote:

"You forgot to mention that PCA (like other classical multivariate data reduction methods) only looks for linear dependencies..."

For trading, this method is therefore unsuitable, since the predictors fed to the inputs (oscillator and indicator values) are inherently non-linear.

How principal component analysis (PCA) works, using a simple example
  • habrahabr.ru
In this article I would like to explain how exactly principal component analysis (PCA) works in terms of the intuition behind its mathematics. As simply as possible, but in detail. Mathematics is in general a very beautiful and elegant science, but sometimes its beauty is hidden behind a pile of layers of abstraction...
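The quoted point about linear dependencies can be checked with a small sketch. PCA works from the covariance (or correlation) matrix, so a dependency that produces zero linear correlation is invisible to it. The data below is made up for illustration, and the helper name `pearson` is just a plain hand-written correlation, not a library call.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [i / 10 for i in range(-50, 51)]  # symmetric grid from -5 to 5
linear = [2 * x + 1 for x in xs]       # exact linear dependency on x
squared = [x * x for x in xs]          # exact, but non-linear, dependency on x

r_linear = pearson(xs, linear)
r_squared = pearson(xs, squared)
print(round(r_linear, 6), round(r_squared, 6))
```

Both derived series are fully determined by `xs`, yet only the linear one shows up in the correlation; the squared one has a correlation of essentially zero on this symmetric grid.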
 

^GSPC comes from http://finance.yahoo.com, I think it's reliable

UNRATE, PAYEMS and GDP are taken from FRED (probably https://fred.stlouisfed.org/), so there may be a catch there; thanks for the warning.

In general I prefer to work with hourly EURUSD, and I will try it on that later.

*It was an answer to the fact that all those state indices sometimes recalculate and change their historical values.

 
Vizard_:
You can, but I won't tell)))
Significance is calculated from each component's share of the dispersion (variance). That's it. Whether to use it or not,
whether dimensionality reduction is necessary, whether you end up throwing the baby out with the bathwater,
apply to abysmally or pre-processing... another question...
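A minimal sketch of "significance from the share of the variance", for two features only so that the eigenvalues of the covariance matrix can be written in closed form. The data and function names are made up for illustration; real work would use a library PCA on many predictors.

```python
import math


def covariance_2d(xs, ys):
    """Sample variances and covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def explained_variance_ratio_2d(xs, ys):
    """Share of total variance carried by each of the two principal components."""
    sxx, sxy, syy = covariance_2d(xs, ys)
    # Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mean + radius, mean - radius
    total = lam1 + lam2
    return lam1 / total, lam2 / total


# Two made-up, almost collinear predictors.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

share1, share2 = explained_variance_ratio_2d(xs, ys)
print(share1, share2)
```

Here almost all of the variance falls on the first component. A large variance share by itself says nothing about whether a component is relevant to the target variable, which is the open question raised in the post.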

Preprocessing is not so simple for me: some of the data gets deleted, which I don't think is good... because every signal of the trading system (TS) has to be used, without deleting any. There is one idea, which is to fit the output variable to the inputs. There is an element of fitting in that :-) BUT

if we take the output variable to be governed by the amount of profit the TS makes, then by changing this parameter we can at least find out how good our input data is. Hm... let me explain. There is a philosophy to choosing the output variable. A simple example: we have two signals:

The blue one has a profit of 1 pip. My labeling conditions say that a 1 is assigned to signals that earned more than 50 pips. The blue signal will therefore be labeled 0, even though the market did tend to go up and it could have been labeled 1. By adjusting the profit parameter we thus include or exclude additional ones in the output set in order to get the maximum generalization ability... This can be done in the range from minus the spread to 100 pips. By brute force it takes a long time; even with a step of 10 the optimization has to be run at least ten times... In general, the question remains open.
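The labeling rule and the threshold sweep described above can be sketched like this. The profit values and the spread are made-up numbers, and `label_signals` is a hypothetical helper; the step that would actually matter, training a model on each labeling and comparing generalization, is left out.

```python
def label_signals(profits_pips, min_profit_pips):
    """Label a signal 1 if its profit exceeds the threshold, otherwise 0."""
    return [1 if p > min_profit_pips else 0 for p in profits_pips]


# Hypothetical per-signal profits in pips; the first one is the "blue" signal.
profits = [1, 55, -20, 12, 80, -5, 30, 49, 51, 3]
spread = 2  # assumed spread in pips

# Sweep the threshold from minus the spread up to 100 pips in steps of 10.
sweep = {}
for threshold in range(-spread, 101, 10):
    labels = label_signals(profits, threshold)
    sweep[threshold] = sum(labels)  # how many signals are labeled 1
    print(threshold, labels)
```

With a threshold of 50 the 1-pip signal gets a 0, and with a threshold of 0 it gets a 1, so the profit parameter directly changes which signals end up in the output set. Each threshold in the sweep would need its own training run, which is why the brute-force approach is slow.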

 
Mihail Marchukajtes:
Can you tell me more about the PCA? How do you sift out garbage in this way?

Not only am I too lazy to find a link for you in this thread, but I also don't need to.

Would you be so kind as to leaf through this thread yourself? PCA doesn't sift out garbage just like that; there is a nuance to it. So it makes sense to look for it.

 
Dr.Trader:

The model I'm currently trying to get into shape is one that I do train further on each new bar. To be honest, I don't see a big impact... Sometimes the model stays the same for dozens of bars in a row (a mechanism protects me from overfitting it). But if some banker says something wrong in the news and the price goes somewhere unexpected, there is hope that the model will catch up with the latest changes within a couple of bars. There is no point in fitting the model to each bar, but if there is a way to react quickly to changes, why not take advantage of it?


I've tried many times to push one idea that is obvious to me: there is no single tool that gives you a model that is not overfitted and has a small error.

You have to go grain by grain: clean out the obvious garbage, scale, maybe apply Box-Cox, select predictors, pick a model... and then it turns out that everything has to be thrown out, because the target is a complete dud...

In my practice, each step takes literally 3-5% off the error. Where the model initially gave an error of over 40% and was overfitted, I managed to bring a non-overfitted model down to 20%. That was about six months of work.

 

All right, but are there any MQL experts here? Since we are all here anyway :-)

Can you suggest ways to optimize one variable so that another one comes to 0? Or at least close to zero...

In general, optimizing one variable based on another...
