Machine learning in trading: theory, models, practice and algo-trading - page 28

 
mytarmailS:
I already did this; the neural network does not learn on a larger horizon with the target I set for it

I smile at your words.

It is not that the neural network does not learn; you are, quite literally, keeping it from learning. A neural network will learn wherever there is a signal on top of the noise, and your task is to separate this signal from the noise with the help of the Great Neural Network, rather than preventing it from learning.

 
Alexey Burnakov:
This is a more correct answer.) It is necessary to try different methods.

I once suggested this approach.

We take an Expert Advisor; it can be from kodobase. The main thing is that it should be a full-fledged EA, and it does not matter if it loses. You can set preliminary requirements, for example: trend-following, three windows, no SL or TP, MM, state recovery after errors...

Then we look at the reasons for the losses and try to solve them with R: in the lower windows we solve the problems by extrapolation, and in the highest window we predict the next bar by classification.

 
Alexey Burnakov:

My article about feature selection.

https://habrahabr.ru/company/aligntechnology/blog/303750/

It was interesting to read, thanks for the hard work. I didn't understand experiment 6: why were predictors Y and Z selected in the end? Z is the target variable; it shouldn't have been in the list of predictors.

I like the idea of taking the 3D rabbit graph and training a model on it. To be honest, at first I thought the "rabbit" was some set of pseudo-random data that just looks nice on the graph and only makes sense to a person, with his ability to see a rabbit in an array of numbers. But there is no function that describes the rabbit; for any model it is just a set of points forming clusters of a special shape in 3D space. From my point of view the three coordinates of each point have no relationship or dependence between them, so I expected that no algorithm could cope. The end of the article surprised me.

I tried the same experiment with PCA, and the results are the following:
One component is enough for 95% accuracy, and it uses X and Y. But the prediction algorithm doesn't work with a single component; at least two are needed. Both of them essentially use X and Y (this is not shown in the code, but you can see it on the graph), so that's fine.
Train the model on sampleA, then predict Z for sampleA : R^2 = 0.04759303
Train the model on sampleA, then predict Z for sampleB: R^2 = 0.05325888
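
Roughly, that train/predict loop could look like this. A minimal sketch, assuming plain prcomp() in place of the y-aware scaling from the attached bunny_pca.txt, and data frames sampleA/sampleB with predictor columns x, y and target z:

# Fit PCA on sampleA, train a linear model on the first two
# components, then predict Z for sampleB and compute R^2.
pca <- prcomp(sampleA[, c("x", "y")], center = TRUE, scale. = TRUE)
componentsToUse <- 2

trainPC <- as.data.frame(pca$x[, 1:componentsToUse])
trainPC$z <- sampleA$z
model <- lm(z ~ ., data = trainPC)

testPC <- as.data.frame(predict(pca, sampleB[, c("x", "y")])[, 1:componentsToUse])
predictedZ <- predict(model, newdata = testPC)
rsq <- 1 - sum((sampleB$z - predictedZ)^2) / sum((sampleB$z - mean(sampleB$z))^2)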

I drew a 3d plot of the predicted Z's for sampleB, it came out poorly. The model didn't really find any dependencies, it just averaged all the Z's into one plane. The rabbit came out flat. Have you tried training some models, and drawing a predicted rabbit? I wonder how it works out for different models.

Attached is the code for the rabbit and the y-aware PCA. There is a small bug: the predictor-loadings graph sorts the components for some reason, i.e. they come out as 1,10,11,12,2,3,4,5,... whereas in order of importance they should be read as PC1, PC2, PC3,...
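
The bug itself is probably just string sorting of the component names. A possible fix, assuming the loadings are plotted from a data frame with a factor column named component (a hypothetical name):

# "PC1", "PC10", "PC11", ..., "PC2" is lexicographic order of the factor
# levels; reorder the levels by their numeric part to get PC1, PC2, PC3, ...
pcLevels <- levels(loadings$component)
loadings$component <- factor(loadings$component,
    levels = pcLevels[order(as.numeric(sub("PC", "", pcLevels)))])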

Files:
bunny_pca.txt  5 kb
 
mytarmailS:
I already did this; the neural network does not learn on a larger horizon with the target I set for it

I have not yet tried to train the network beyond one bar, but I assume the principle will be the same. I work with it this way:

1) I collect as much raw data as possible from mt5: ohlc, time, indicators. Since I don't know what is useful and what is not, I take everything and try to weed out the garbage. I can't feed everything to the network, because it will overfit to the garbage and then make mistakes in the fronttest. Sifting out the garbage is not easy; in fact, all 28 pages of this thread are devoted to how to do it, and so far everything is ambiguous. To the data I add a target variable, 0 or 1, depending on whether the price falls or rises over the next bar (see the first sketch after this list).

2) Next, the actual garbage removal. There should be a function that analyzes the initial data and gives it a score: the higher the score, the less garbage. We feed the evaluation function various combinations of the initial predictors (columns of the training table) and try to improve the score. As a result a certain set of predictors will be found which, in theory, will not lead to overfitting (a random-search version of this is sketched after the list).

3) Suppose the garbage is eliminated; now it's time to train the neural network. I use the nnet package in R. Its advantage is the absence of hyperparameters such as learning rate, momentum, decay, or a choice of training functions: fewer hyperparameters, fewer problems. You need to train with cross-validation. I take the original table and divide it into three parts in the ratio 70%/15%/15% (the new tables are called train/test/validate; the code can be taken from the rattle log, which is where I started). Then I do, for example, 10 iterations of training the network on the train table and calculate the prediction R^2. I predict the results for the test and validate tables and compute R^2 for them too. Then I do 10 more training iterations, repeat the predictions for all 3 tables, and look at the new R^2 values. I stop training the moment R^2 on one of the tables starts falling (it will keep growing only on the train table). Done. Now you can repeat all of this with a different number of hidden neurons and a different number of training iterations between cross-validations, hoping that the minimum R^2 of the three tables at the end will be higher than last time (the training loop is sketched below).
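
The first sketch, for the target variable from step 1. It assumes rawData is a data frame exported from mt5 with a Close column, ordered from oldest to newest:

# Target = 1 if the next bar closes above the current close, else 0.
# The last row has no "next bar", so it is dropped.
n <- nrow(rawData)
target <- as.integer(rawData$Close[2:n] > rawData$Close[1:(n - 1)])
dataset <- cbind(rawData[1:(n - 1), ], target)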
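
The second sketch, for the garbage removal in step 2, as a random search over predictor subsets. evaluateSubset() is a placeholder for whatever scoring function is chosen; the thread leaves that criterion open:

# Try random combinations of predictor columns and keep the subset
# with the highest score from the user-supplied evaluation function.
searchBestSubset <- function(data, targetName, evaluateSubset, nTries = 1000) {
  predictorNames <- setdiff(colnames(data), targetName)
  best <- list(score = -Inf, subset = NULL)
  for (i in seq_len(nTries)) {
    candidate <- sample(predictorNames, sample(2:length(predictorNames), 1))
    score <- evaluateSubset(data[, c(candidate, targetName)], targetName)
    if (score > best$score) best <- list(score = score, subset = candidate)
  }
  best
}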
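
And the training loop from step 3. It assumes train/test/validate data frames with a numeric target column; stopping on the minimum of the three R^2 values is a simplification of "R^2 on one of the tables starts falling":

library(nnet)

r2 <- function(actual, predicted)
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

hiddenNeurons <- 10
fit <- nnet(target ~ ., data = train, size = hiddenNeurons,
            linout = TRUE, maxit = 10)
prevWorst <- -Inf
repeat {
  worst <- min(r2(train$target,    predict(fit, train)),
               r2(test$target,     predict(fit, test)),
               r2(validate$target, predict(fit, validate)))
  if (worst < prevWorst) break  # R^2 started falling somewhere, stop
  prevWorst <- worst
  # 10 more iterations, continuing from the current weights
  fit <- nnet(target ~ ., data = train, size = hiddenNeurons,
              linout = TRUE, maxit = 10, Wts = fit$wts)
}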

All of this may seem a bit complicated, but it's actually even more complicated :), there are a bunch of problems in each of these steps. But it gives consistent results.

 
SanSanych Fomenko:

It's a good thing it doesn't learn, because you're teaching it on noise. But if it did learn, that would mean a grail, straight onto a real account....

We're busy here trying to eliminate the noise. That's why we take so many predictors, in the hope that at least something will remain.

If the stochastic (for example) had a magic value of 90 at which the market always fell, you wouldn't need any networks: it would be visible to the naked eye.

1) At best, what you are doing is selecting the 10 best indicators out of 100 possible by some criterion. But an indicator has a range of, say, 100 values, and out of those 100 values only 3 actually work, and even then not always but only in certain situations. Only those 3 values are not noise, and they alone should be kept; all the other values are the same noise. So all this rejection is qualitatively shallow and therefore not very effective.

2) Furthermore, there is such a notion as inconsistency of attributes. Take price: it is objective and non-contradictory. If price is rising, it is rising, and there is no second reading (if we ignore the nuances of trend strength, etc.). Now take the same stochastic: it can show a value of 90 in a flat market, in an uptrend, and in a downtrend. That is, the indicator does not help the network; on the contrary, it constantly confuses it, because its readings are contradictory. So ordinary indicators are not applicable to the market as they are. But you don't even think about that, you just dump everything in and then tell me about the noise.

And I'm not satisfied with the forecast horizon, as I wrote above.

 
Dr.Trader:

It was interesting to read, thanks for the hard work. I didn't understand experiment 6: why were predictors Y and Z selected in the end? Z is the target variable; it shouldn't have been in the list of predictors.

I like the idea of taking the 3D rabbit graph and training a model on it. To be honest, at first I thought the "rabbit" was some set of pseudo-random data that just looks nice on the graph and only makes sense to a person, with his ability to see a rabbit in an array of numbers. But there is no function that describes the rabbit; for any model it is just a set of points forming clusters of a special shape in 3D space. From my point of view the three coordinates of each point have no relationship or dependence between them, so I expected that no algorithm could cope. The end of the article surprised me.

I tried the same experiment with PCA, and the results are the following:
One component is enough for 95% accuracy, and it uses X and Y. But the prediction algorithm doesn't work with a single component; at least two are needed. Both of them essentially use X and Y (this is not shown in the code, but you can see it on the graph), so that's fine.
Train the model on sampleA, then predict Z for sampleA : R^2 = 0.04759303
Train the model on sampleA, then predict Z for sampleB: R^2 = 0.05325888

I drew a 3d plot of the predicted Z's for sampleB, it came out poorly. The model didn't really find any dependencies, it just averaged all the Z's into one plane. The rabbit came out flat. Have you tried training some models, and drawing a predicted rabbit? I wonder how it works out for different models.

Attached is the code for the rabbit and the y-aware PCA. There is a small bug: the predictor-loadings graph sorts the components for some reason, i.e. they come out as 1,10,11,12,2,3,4,5,... whereas in order of importance they should be read as PC1, PC2, PC3,...

So, did you get X and Y in the main components? That would be important to understand.

Second, about the rabbit approximation. Of course it comes out that way! It's a linear model: it will just draw a plane (if X and Y are used) with a slight slope, or a line (if there is one predictor). That's the whole linear model of the rabbit ) That's why I try to use non-linear models.

This is how to reconstruct the rabbit using another method (based on discrete values).

 
Alexey Burnakov:

So, did you get X and Y in the main components? That would be important to understand.

Yes, only X and Y. I have not yet found how to do this through code; all the articles on this subject operate with graphs. The number of components to take can be seen in the componentsToUse variable. In this case componentsToUse = 2, which means you need to take only those predictors that have wide horizontal lines on the chart for PC1 and PC2.

In the graph above you need to look at the PC1 and PC2 columns (the first and second principal components), and then at the green horizontal lines. If a line deviates from 0 (no matter whether positively or negatively), that predictor is used in the corresponding principal component. PC1 uses y_clean ("_clean" was added automatically during data scaling, so as not to be confused with the original predictor), and PC2 uses x_clean. That is the result of the PCA analysis: take x_clean and y_clean.

Moving forward, PC3 would use input_noise_3_clean. This is just an example, PC3 does not need to be taken in this case.

This actually worked out very well; X and Y clearly show up on the graph. I previously posted the same kind of chart for forex, and there everything is bad.
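
For the "through code" part, a possible starting point, assuming a standard prcomp() result named pca rather than the y-aware version from the attached file: read the rotation (loadings) matrix directly and keep predictors whose absolute loading on the first components exceeds some cutoff.

# Select predictors by loading magnitude instead of reading the graph.
# The 0.3 cutoff is an arbitrary assumption; tune it to the data.
componentsToUse <- 2
loadings <- pca$rotation[, 1:componentsToUse, drop = FALSE]
selected <- rownames(loadings)[apply(abs(loadings), 1, max) > 0.3]
print(selected)  # expected here: "x_clean" "y_clean"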

 
Dr.Trader:

Yes, only X and Y. I have not yet found how to do this through code; all the articles on this subject operate with graphs. The number of components to take can be seen in the componentsToUse variable. In this case componentsToUse = 2, which means you need to take only those predictors that have wide horizontal lines on the chart for PC1 and PC2.

In the graph above you need to look at the PC1 and PC2 columns (the first and second principal components), and then at the green horizontal lines. If a line deviates from 0 (no matter whether positively or negatively), that predictor is used in the corresponding principal component. PC1 uses y_clean ("_clean" was added automatically during data scaling, so as not to be confused with the original predictor), and PC2 uses x_clean.

Moving forward, PC3 would use input_noise_3_clean. This is just an example, PC3 does not need to be used in this case.

This actually worked out very well; X and Y clearly show up on the graph. I previously posted the same kind of chart for forex, and there everything is bad.

The selection of predictors here is obtained by a linear method. Well, I'm glad you were surprised; it means you saw something new.)
 
Dr.Trader:

I have not yet tried to train the network beyond one bar, but I assume the principle will be the same. I work with it this way:

1) I collect as much raw data as possible from mt5: ohlc, time, indicators. Since I don't know what is useful and what is not, I take everything and try to weed out the garbage. I can't feed everything to the network, because it will overfit to the garbage and then make mistakes in the fronttest. Sifting out the garbage is not easy; in fact, all 28 pages of this thread are devoted to how to do it, and so far everything is ambiguous. To the data I add a target variable, 0 or 1, depending on whether the price falls or rises over the next bar.

2) Next, the actual garbage removal. There should be a function that analyzes the initial data and gives it a score: the higher the score, the less garbage. We feed the evaluation function various combinations of the initial predictors (columns of the training table) and try to improve the score. As a result a certain set of predictors will be found which, in theory, will not lead to overfitting.

3) Suppose the garbage is eliminated; now it's time to train the neural network. I use the nnet package in R. Its advantage is the absence of hyperparameters such as learning rate, momentum, decay, or a choice of training functions: fewer hyperparameters, fewer problems. You need to train with cross-validation. I take the original table and divide it into three parts in the ratio 70%/15%/15% (the new tables are called train/test/validate; the code can be taken from the rattle log, which is where I started). Then I do, for example, 10 iterations of training the network on the train table and calculate the prediction R^2. I predict the results for the test and validate tables and compute R^2 for them too. Then I do 10 more training iterations, repeat the predictions for all 3 tables, and look at the new R^2 values. I stop training the moment R^2 on one of the tables starts falling (it will keep growing only on the train table). Done. Now you can repeat all of this with a different number of hidden neurons and a different number of training iterations between cross-validations, hoping that the minimum R^2 of the three tables at the end will be higher than last time.

All of this may seem a bit complicated, but it's actually even more complicated :), there are a bunch of problems in each of these steps. But it gives consistent results.

I have a different point of view, and you don't get it.
 

SanSanych Fomenko:
What if you take the first 10 or so (before the step) and discard the rest?

I drew a graph of how R^2 and the percentage of winning cases depend on the number of components used. The best fronttest result was with 41 components (a win rate of about 70%, very good). But you can't tell that from the backtest charts; there they just keep growing. If we had relied on component importance, we should have taken 73, and that is not the best result in the fronttest.
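
The scan behind that graph might be sketched like this; fitModel() stands in for the actual model-fitting pipeline and is an assumption:

# For each number of components, refit and record backtest and
# fronttest R^2, then plot R^2 against the number of components.
r2 <- function(a, p) 1 - sum((a - p)^2) / sum((a - mean(a))^2)
results <- data.frame()
for (nComp in 1:100) {
  model <- fitModel(backtestData, nComp)  # placeholder pipeline
  results <- rbind(results, data.frame(
    nComp   = nComp,
    r2Back  = r2(backtestData$target,  predict(model, backtestData)),
    r2Front = r2(fronttestData$target, predict(model, fronttestData))))
}
plot(results$nComp, results$r2Front, type = "l",
     xlab = "components used", ylab = "fronttest R^2")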

The fronttest R^2 can be negative even when winning more than 50% of the time, because of the unbalanced target values: the number of class "0" cases differs from class "1", so their mean is not 0.5, and R^2 comes out slightly worse because of that.
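
A tiny illustration of that point: with 70% of the targets equal to 1, a predictor that is right 60% of the time still gets a negative R^2, because the baseline mean is 0.7 rather than 0.5:

actual <- c(rep(1, 70), rep(0, 30))     # unbalanced: mean(actual) = 0.7
predicted <- actual
predicted[1:40] <- 1 - predicted[1:40]  # flip 40 answers -> 60% accuracy
mean(predicted == actual)               # 0.6
1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)  # about -0.90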
