Machine learning in trading: theory, models, practice and algo-trading

 
SanSanych Fomenko:

Training, retraining, and overfitting are fundamentally different things.

All this training on every new bar has been chewed over endlessly on this forum and within TA in general.

In the fight against overtraining (overfitting) I know two methods.

1. Clearing the set of predictors of those unrelated to the target variable, i.e. clearing the input set of noise. This question was considered in detail in the first 100 pages of this thread.

2. With the predictor set cleared of noise, we fit the model on the training sample and then check it on the test and validation samples, which are random samples from the same file. The error on all three sets should be approximately the same (see the R sketch after this list).

3. Then we take a file separate from the previous one and run the model on it. The error, again, should be about the same as the previous ones.

4. If these checks are performed regularly, the question "a 20% drawdown is a signal for retraining" will not arise at all, since the first three steps establish the drawdown as a model parameter, and going beyond that value means the model is not working and everything should be started over.
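A minimal R sketch of steps 2 and 3, assuming a CSV file with a factor target column `y` and already noise-cleaned predictor columns; the file names are hypothetical and rpart stands in for whatever model is actually used:

```r
library(rpart)

d <- read.csv("history_2015.csv")   # hypothetical training file
d$y <- factor(d$y)                  # classification target

idx <- sample(rep(1:3, length.out = nrow(d)))  # random 3-way split of one file
train <- d[idx == 1, ]
test  <- d[idx == 2, ]
valid <- d[idx == 3, ]

model <- rpart(y ~ ., data = train, method = "class")

err <- function(m, s) mean(predict(m, s, type = "class") != s$y)
c(train = err(model, train), test = err(model, test), valid = err(model, valid))

# step 3: a completely separate file; the error should stay in the same range
d2 <- read.csv("history_2016.csv")  # hypothetical out-of-sample file
d2$y <- factor(d2$y, levels = levels(d$y))
err(model, d2)
```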


Well, this is all the general case, yes. In each particular case you have to work with the specific TS, and then the understanding comes of when it is trained normally, when it is overtrained, and when it has its periods. For example, the buy/sell ratio should be approximately equal, and the training sample should contain an uptrend, a downtrend and a flat, otherwise the game is one-sided. The main thing here is not to "overtrain" oneself :)

There are also the seasonality factor, the crisis factor and many other regularities that have to be taken into account.

 
SanSanych Fomenko:

Clearing the set of predictors of those unrelated to the target variable, i.e. clearing the input set of noise. This question was considered in detail in the first 100 pages of this thread.

So what was the final solution: how do you determine which predictors are not related to the target output?
 
elibrarius:
So what was the final solution: how do you determine which predictors are not related to the target output?
How? It all depends on the specifics. It took me two days to solve a simple problem. Now the solution seems obvious. ))
 
Yuriy Asaulenko:
How? It all depends on the specifics. It took me two days to solve a simple problem. Now the solution seems obvious. ))
I assume there are also general, universal principles? And it would be interesting to hear about your specific example, too.
 
elibrarius:
I assume there are also general, universal principles? And it would be interesting to hear about your specific example, too.

Check out my blog.

The principles, yes, are universal. But there are no general solutions. Except for some classes of problems.

 

A summary of the previous hundred pages :)

Neural networks and almost all other popular models are very far from artificial intelligence. They simply find combinations of predictor values that achieve the desired training accuracy, and later, when forecasting, they essentially interpolate (or extrapolate) past results to obtain a new forecast.

This means that if we take, for example, MA, RCI and Stochastic as predictors and train a neural network on zigzag reversals as the learning target, we are telling the network "these three predictors can predict reversals". The network itself will not be able to figure out whether these predictors really fit. It will memorize the data with tolerable accuracy, and in trading we will hope that the same combinations of MA, RCI and Stochastic are preserved before a reversal. But they will not be, and we will lose money.

A model trained on useless predictors will fail, no matter whether it is gbm, a neural network or a regression. You can even generate random series and use them as predictors: the network will still find recurring combinations among them and memorize them.
Selecting the predictors and the training target, using other tools, is the job of the human data miner. Training the model itself is only a tiny, penultimate step.

Predictors must keep their correlation with the target both on the training data and in the future. That is why SanSanych talks about testing the model on different files: to make sure that the dependencies found do not disappear on new data.
That is, we carefully study and select the predictors and the target, train the model and test it. Then we test it on data that is completely new to the model. If the forecast accuracy in the two cases does not match, the predictors or the target are unsuitable and we need to look for others.
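To illustrate the point about random predictors, a small sketch: all data below is synthetic noise, and rpart stands in for any learner that is allowed to fit the training set closely:

```r
library(rpart)
set.seed(1)

n <- 1000
noise <- data.frame(matrix(rnorm(n * 10), n, 10))     # 10 random "predictors"
noise$y <- factor(sample(c("buy", "sell"), n, TRUE))  # random target

m <- rpart(y ~ ., data = noise, method = "class",
           control = rpart.control(cp = 0, minsplit = 2))

mean(predict(m, noise, type = "class") == noise$y)  # near 1: memorized the noise

new <- data.frame(matrix(rnorm(n * 10), n, 10))
new$y <- factor(sample(c("buy", "sell"), n, TRUE))
mean(predict(m, new, type = "class") == new$y)      # about 0.5: no real skill
```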


SanSanych Fomenko:

Can a neural network predict non-stationary series? If so, what types of non-stationarity?

In my opinion, a neural network is completely unsuitable for working with price and predicting non-stationary time series. Price behavior changes constantly, found patterns stop working within hours, everything is chaotic. And then someone takes a neural network, feeds it a couple of months of prices and demands that it find dependencies that repeat over that whole period. But there are no repeating dependencies, and whatever the network finds and memorizes will be a 100% random coincidence.

If we do take a neural network, we can feed it only prices processed in some way (not raw OHLC), such as indicators.

 

Thanks for the summary, I didn't feel like reading 100 pages...)

Manually selecting predictors, for example in combinations of three, would take a very long time. MT5 has 38 standard technical indicators, so there are C(38,3) = 8436 three-indicator combinations, and on top of that we would also have to select periods, price types and other input parameters. And if we add interesting non-standard indicators, the number of tests grows even further.
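The combinatorics alone are easy to check in R:

```r
choose(38, 3)   # ways to pick 3 of the 38 standard indicators
# [1] 8436
```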

Therefore, we should look for an automated assessment of indicators. Vladimir Perervenko describes two common methods in his articles:

1) Removal of highly correlated variables - done there using R.

2) Selection of the most important variables - also solved in R.

But I am writing directly in MQL5 - maybe there already are some ready-made solutions for these questions? Or a method of transferring solutions from R to MT5, at least in a simplified version?

How to look for correlation between indicators seems clear: take the difference between each pair of indicators bar by bar, sum it over the bars and divide by the number of bars. (Or is there a better way? The standard measure would be the Pearson correlation coefficient - see the R sketch below.)
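For reference, a common way to do this filter in R (a sketch, assuming a data frame `preds` that holds only the numeric indicator columns; `findCorrelation()` is from the caret package):

```r
library(caret)

cm <- cor(preds, use = "pairwise.complete.obs")  # Pearson correlation matrix
drop <- findCorrelation(cm, cutoff = 0.90)       # indices of redundant columns
if (length(drop) > 0) preds <- preds[, -drop]    # keep one of each correlated group
```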

With the most important ones it is not completely solved yet... (one common approach is sketched below).
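One widely used option (not necessarily the method from the articles) is permutation importance from a random forest; a sketch assuming a data frame `d` with a factor target `y` next to the indicator columns:

```r
library(randomForest)

rf <- randomForest(y ~ ., data = d, importance = TRUE, ntree = 500)
imp <- importance(rf, type = 1)                   # mean decrease in accuracy
head(imp[order(-imp[, 1]), , drop = FALSE], 10)   # ten strongest predictors
```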

Maybe there are some other methods of clearing the predictors?

 
elibrarius:

But I am writing directly in MQL5 - maybe there already are some ready-made solutions for these questions? Or a method of transferring solutions from R to MT5, at least in a simplified version?

Writing everything in MQL right away, while still relying on R, is not the best option. It is easier to develop the strategy in R and then use this library https://www.mql5.com/ru/code/17468 to call the R code from an Expert Advisor and test it in the Strategy Tester.
Most likely, during development and testing a lot of things will be deleted and changed - packages and models get swapped, etc. - and it is easier to change and test all of that in R itself.

In the end, when you like the result and everything works, you can try to port the code to MQL by hand. Many packages used in R are actually written in C/C++, and you can find the source code of the standard packages here: https://cran.r-project.org/web/packages/available_packages_by_name.html

 
Dr. Trader:

A summary of the previous hundred pages :)

Neural networks and almost all other popular models are very far from artificial intelligence. They simply find combinations of predictor values that achieve the desired training accuracy, and later, when forecasting, they essentially interpolate (or extrapolate) past results to obtain a new forecast.

This means that if we take, for example, MA, RCI and Stochastic as predictors and train a neural network on zigzag reversals as the learning target, we are telling the network "these three predictors can predict reversals". The network itself will not be able to figure out whether these predictors really fit. It will memorize the data with tolerable accuracy, and in trading we will hope that the same combinations of MA, RCI and Stochastic are preserved before a reversal. But they will not be, and we will lose money.

A model trained on useless predictors will fail, no matter whether it is gbm, a neural network or a regression. You can even generate random series and use them as predictors: the network will still find recurring combinations among them and memorize them.
Selecting the predictors and the training target, using other tools, is the job of the human data miner. Training the model itself is only a tiny, penultimate step.

Predictors must keep their correlation with the target both on the training data and in the future. That is why SanSanych talks about testing the model on different files: to make sure that the dependencies found do not disappear on new data.
That is, we carefully study and select the predictors and the target, train the model and test it. Then we test it on data that is completely new to the model. If the forecast accuracy in the two cases does not match, the predictors or the target are unsuitable and we need to look for others.


In my opinion, a neural network is completely unsuitable for predicting non-stationary time series. Price behavior changes constantly, found patterns stop working within hours, everything is chaotic. And then someone takes a neural network, feeds it a couple of months of prices and demands that it find dependencies that repeat over that whole period. But there are no repeating dependencies, and whatever the network finds and memorizes will be a 100% random coincidence.

If we do take a neural network, we can feed it only prices processed in some way (not raw OHLC), such as indicators.

The problem is not with neural networks or anything else applicable to the markets. The problem is what is fed into the DM tool. And it is pure madness to use the naked price as an input.

The problem is the predictors, as SanSanych calls them. That is, the problem is how to represent a non-stationary series as a stationary one. Whoever comes closest to solving this problem is the best.
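For what it is worth, the textbook first step in that direction (by no means a full solution) is to move from raw prices to log returns; a toy R example with made-up numbers:

```r
close <- c(1.1050, 1.1062, 1.1048, 1.1071, 1.1065)  # hypothetical close prices
ret <- diff(log(close))  # log returns: removes the trend, one source of non-stationarity
ret
```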

 
Dr. Trader:

Writing everything in MQL right away, while still relying on R, is not the best option. It is easier to develop the strategy in R and then use this library https://www.mql5.com/ru/code/17468 to call the R code from an Expert Advisor and test it in the Strategy Tester.
Most likely, during development and testing a lot of things will be deleted and changed - packages and models get swapped, etc. - and it is easier to change and test all of that in R itself.

In the end, when you like the result and everything works, you can try to port the code to MQL by hand. Many packages used in R are actually written in C/C++, and you can find the source code of the standard packages here: https://cran.r-project.org/web/packages/available_packages_by_name.html

That is complicated... It would take more time than understanding the algorithm (as with the K-correlation above) and writing it myself. I think a function that tries all the inputs, calculates the correlations and sifts out the highly correlated ones would take a couple of hours to write.

I hope the other solutions for sifting out predictors will be just as easy to apply. )

So are there any other solutions for finding unnecessary predictors?
