Machine learning in trading: theory, models, practice and algo-trading - page 90

 

I have tried various self-written validation methods, including those described in these articles. My conclusions are as follows:

In forex there is no strict dependence between the target variable and the predictors; forex is not a formula that can be found once and then applied to calculate on new data. All a model can do is find some regularity and extrapolate it when trading on new data.
That is, there is a multidimensional space (its dimensionality equals the number of predictors) containing a number of points with known target values. The model constructs a hyperplane in this space that separates the points, the "buy" class from the "sell" class. There are infinitely many ways to construct this hyperplane (in a simple case, draw four points on a sheet of paper and draw a curved line between them so that two points lie to the right of the curve and two to the left; there are infinitely many ways to draw such a curve). Therefore there is no guarantee that the constructed model reflects the true dependence of the target variable on the predictors. Validation is used to check the adequacy of the model: some points are withheld from training, and you can easily find out whether the model has coped, i.e. whether it gives the correct result on those test points.

If the model fails validation, there can be many reasons for it, for example:
- the model found non-existent dependencies that are present only in the training examples;
- there was a dependency in the training data that is absent from the test data, for example when all the test data are taken later in time and the behaviour of the forex symbol has changed;
- the model itself was initialized with an unlucky seed; it often happens that a model trained on the same data gives different validation results after repeated attempts to retrain it.

It is not known what caused a bad result in any particular case. All we can do is estimate how good the model is on average: build the model dozens of times and evaluate it on validation, re-splitting the data into training/validation each time.
What I consider a valid approach is to split the data randomly 50%/50% (not by time, but so that everything is evenly mixed, e.g. rows 1, 2, 5, 7 for training and 3, 4, 6, 8 for validation), train the model on the first part, then validate it on the second; I use accuracy to evaluate the model. Repeat this 50 times (each time re-splitting the data into two random halves, training, validating). Then compute the average accuracy on the training data and the average accuracy on the validation data. Say the average training accuracy was 90% and the average validation accuracy 80%. The accuracy on the forward test will be lower still; my rule of thumb is to take the difference (90% - 80% = 10%) and subtract it from the validation result (80% - 10% = 70%). Such a model will then have an average forward-test accuracy of about 70%. Next, I genetically tune the model parameters and predictors to push this estimate above 70% (it is much harder than it seems; it is difficult even to get beyond 50%).
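For illustration, a minimal sketch of this repeated-split estimate in Python (the DataFrame `df` with a binary "target" column and the random-forest stand-in are assumptions for the example, not something from the post):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def repeated_split_estimate(df, n_repeats=50):
    """Average train/validation accuracy over repeated random 50/50 splits."""
    X = df.drop(columns=["target"]).values
    y = df["target"].values
    train_acc, val_acc = [], []
    for i in range(n_repeats):
        # random (not time-ordered) 50/50 split, reshuffled on every repeat
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=0.5, shuffle=True, random_state=i)
        model = RandomForestClassifier(n_estimators=200).fit(X_tr, y_tr)  # placeholder model
        train_acc.append(accuracy_score(y_tr, model.predict(X_tr)))
        val_acc.append(accuracy_score(y_va, model.predict(X_va)))
    mean_train, mean_val = np.mean(train_acc), np.mean(val_acc)
    # rule of thumb from the post: expected forward-test accuracy is the
    # validation accuracy minus the train/validation gap
    expected_forward = mean_val - (mean_train - mean_val)
    return mean_train, mean_val, expected_forward
```

With the figures from the example (90% train, 80% validation) this returns roughly 0.70 as the expected forward-test accuracy.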

But I don't like that this result is only an average, with no guarantees. The real accuracy in trading will be anywhere from 60% to 80%, or even 50% to 90%, depending on how unlucky you are. No matter how hard I try, I cannot single out the best model by any indicator. Probably the only solution is to build dozens of models with the best parameters and predictors found and take the result that the majority of them points to (a "congress" of models).
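A rough sketch of such a "congress" as a simple majority vote over independently trained models (the list of fitted models and the 0/1 class coding are assumptions):

```python
import numpy as np

def committee_predict(models, X):
    """Majority vote of a committee of fitted classifiers (classes coded 0/1)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# usage: train, say, 30 models with the best parameters/predictors found,
# then: y_hat = committee_predict(models, X_new)
```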

This is closely related to what SanSanych said at the beginning of the thread. You can also, as he advised, set aside the last part of the known data as a final control sample. Do not use this data for training or validation; just store it separately until model training is finished, then test the finished model, or the congress, on it. The upside is that this shows how the model performs on data that is new in time. The downside is that less data remains for training and validation, and the model will already be slightly outdated by the start of trading. There is a subtle point here: if you did not like the result on this control data and started selecting a model that performs well on it, then you have started using that data for validation. The model is then selected with it in mind, which is a small peek into the future; the control loses its meaning, and in that case it would have been simpler not to keep a control sample at all.
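A minimal sketch of setting aside that control sample, assuming the rows of `df` are already ordered by time and the 10% fraction is just an example:

```python
def split_with_control(df, control_fraction=0.1):
    """Set aside the last (most recent) part of the data as a one-shot control sample."""
    n_control = int(len(df) * control_fraction)
    work = df.iloc[:-n_control]     # used for training and validation
    control = df.iloc[-n_control:]  # stored away, evaluated exactly once at the end
    return work, control
```

The whole point is that `control` is scored once, after model selection is completely finished; going back and re-selecting models until the control score looks good turns it into just another validation set.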

 
Dr.Trader:

I tried different self-written validation methods, including the ones described in these articles. My conclusions are as follows: [...]

You got it right! ©

But there is one crucial nuance. For cross-validation, and for the top layer of nested CV, you have to take time-separated observations. Or at least, for the top layer, take sparse samples of dates that do not overlap with the training dates or with the bottom-layer CV.

The results should be worse, but truer.

If you get a correlation between the CV and nested-CV results, then the model fits the data.
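A sketch of what time-separated nested CV could look like, assuming scikit-learn's TimeSeriesSplit for both layers and a placeholder classifier and parameter grid (none of these specifics come from the post):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def nested_cv_scores(X, y, param_grid, n_outer=5, n_inner=3):
    """Nested CV where every outer test fold lies later in time than its training data."""
    outer = TimeSeriesSplit(n_splits=n_outer)
    inner_scores, outer_scores = [], []
    for tr_idx, te_idx in outer.split(X):
        # inner loop: tune parameters on the earlier (training) segment only
        search = GridSearchCV(RandomForestClassifier(), param_grid,
                              cv=TimeSeriesSplit(n_splits=n_inner))
        search.fit(X[tr_idx], y[tr_idx])
        inner_scores.append(search.best_score_)
        # outer loop: score the tuned model on the later, unseen segment
        outer_scores.append(accuracy_score(y[te_idx], search.predict(X[te_idx])))
    # correlation between the inner (CV) and outer (nested-CV) scores across folds
    corr = np.corrcoef(inner_scores, outer_scores)[0, 1]
    return inner_scores, outer_scores, corr
```

If the inner (CV) and outer (nested-CV) scores move together across the folds, that is the correlation referred to above; the outer scores are usually lower, but more honest.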
 
Vizard_:
Not yet)))
I had a look at 7. The cuts are no better than the version from half a year ago (or whenever I last looked, I don't remember exactly). The statistics shown in the window and written to the file differ. The selection of input importance is questionable: compared head-on with rf and a couple of other methods, it can give high priority to quite unimportant inputs. Even the best cut (from the window) is still not good.
On this data I get at least 92%. The gadget (as is) is still of little use for practical applications. For the effort in development and the flight of ideas - kudos.

All imho, of course. Bye for now)))


When we are dealing with a man of Reshetov's level, we can safely demand:

1. A review of analogues.

2. An indication of the shortcomings of those analogues that are supposed to be overcome.

3. An indication of the mechanism for eliminating those shortcomings (in a market economy, the specifics may be kept secret).

4. A comparison of the analogues with his own development. The comparison must prove that all the above-mentioned drawbacks of the existing analogues have been eliminated, and that the resulting tool is NOT worse than the analogues.

If a person of Reshetov's level does not do this, then all that remains is: for Reshetov's effort in development and flight of fancy - respect.

 
SanSanych Fomenko:

then you can safely demand:

))))))
 
Vizard_:
Not yet)))
I had a look at 7. The cuts are no better than the version from half a year ago... The statistics shown in the window and written to the file differ. [...]


The result in the window is that of a committee of two models. If you look at the file, the predictor saves two models; the window shows the result of the committee of those two models. So that's how it is....
 
Vizard_:
I see. Notepad compressed it, and I didn't scroll down))) But for comparison I took the figures from the window.
I deleted it right away for lack of use, though it may be useful to someone...

In general, I don't think you should treat this work that way. First, it answers a very important question: what percentage of generalizable information the input data contain with respect to the output. Second, if Yuri listens to what I suggested to him, the result will be a bomb that closes many questions. Unfortunately, I did not manage to train the model to 100% on my data (without manipulating the data to increase the generalization ability, which, as it turned out, can be done), in order to see how the model would work in the future. However, having obtained 100% generalization from the committee, we need to make sure that each of the models individually also reaches 100%, that is, that the input data fully describe the output. Then we will see.... Meanwhile, conclusions that the optimizer is unusable are premature. Another matter is that each of us is trying to fit a model to an IDEAL output, which is extremely difficult, if not impossible. But what if the output is not IDEAL, but allows small mistakes?.... That is what is interesting..... Say we mark with "1" not only the signals that made 100 pips of profit, but also those that made -30 pips or better.... Knowing this assumption, it would be enough to take a signal 30 pips better and the problem would be solved, provided we manage to build a model with a generalization level of 100%.
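A tiny sketch of that relaxed labelling, assuming an array of per-signal outcomes in pips (the names and the -30 pip threshold handling are illustrative):

```python
import numpy as np

def relaxed_labels(profit_pips, tolerance_pips=-30):
    """Label '1' every signal whose profit is above the tolerance, not only the ideal ones."""
    return (np.asarray(profit_pips) > tolerance_pips).astype(int)

# e.g. relaxed_labels([120, -10, -50, 40]) -> array([1, 1, 0, 1])
```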
 
In general, for building classification models the order of the records is not that important; what matters is that the model learns them 100% and that the market's reaction to the same events in the near future is the same - the absence of contradictory data, so to speak. With forecasting models it is another matter: for them the order of the records is important. This is one of the differences between classification and forecasting.....
 

I wonder whether this will help us.... As I understand it, the processing power of such a thing is an order of magnitude higher, if not several....

https://hi-tech.mail.ru/news/compact-quantum-computer/?frommail=1

 
SanSanych Fomenko:
Mihail Marchukajtes:
And anyone else who wishes to. The z1 archive contains two files, train and test. For Target, build a model on train, apply it to test, and post the results in % (of successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models need not be disclosed, just the numbers. Any data manipulation and mining methods are allowed.
Files:
z1.zip  43 kb
 
Vizard_:
And all comers. The z1 archive contains two files, train and test. For Target, build a model on train, apply it to test, and post the results in % for both samples (train = xx%, test = xx%). [...]

Thank you! I'll give it a try.

Let's agree not to look into the test set until the trained model has been evaluated. I used to sin with this myself.

That is, we train the best possible model on train until we are blue in the face. Maybe two or three models. Then test each of them exactly once.
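A hedged sketch of that protocol for the z1 data (the .csv extension and the classifier are assumptions; only the train/test file names and the Target column come from the post):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

train = pd.read_csv("train.csv")   # assumed file names/format from the z1 archive
test = pd.read_csv("test.csv")

X_tr, y_tr = train.drop(columns=["Target"]), train["Target"]
X_te, y_te = test.drop(columns=["Target"]), test["Target"]

# any model, tuned on train only; the random forest here is a placeholder
model = RandomForestClassifier(n_estimators=500).fit(X_tr, y_tr)

print("train = {:.0%}".format(accuracy_score(y_tr, model.predict(X_tr))))
print("test  = {:.0%}".format(accuracy_score(y_te, model.predict(X_te))))  # looked at once
```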
