Machine learning in trading: theory, models, practice and algo-trading - page 742

 

Remember I said I had a model that has been running since 01.31.2018? Here is how that model has performed over the last two weeks, from 03.05.2018 to the present day. Tester result.

Pretty good for an old lady trained on 40 points that has been running for about 1.5 months out of sample.

And this is her full OOS from 01.31.2018

And you still think it's a fit???? Let me remind you that the screenshots show the OOS section

 

Here are calculations showing that everything is idle chatter:

  • without careful justification that the predictors affect the target variable;
  • overtraining (overfitting) can only be detected on files whose TIME lies outside the training period.

Initial data:

Two files, consecutive in time, with 54 predictors and a target variable for trend reversal: short / out / long.

The calculations are performed in Rattle, which divides the first file, Rat_DF1a, into three parts: train, test, validation. The split is done by sampling, i.e. bars of the source file are selected at random.
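A minimal R sketch of this kind of split and fit, assuming a data frame Rat_DF1a with 54 predictors and a factor target trainY, and a 70/15/15 partition (the names and proportions are placeholders; Rattle does essentially the same thing internally):

```r
library(randomForest)

# Rat_DF1a: data frame with 54 predictors and a factor target trainY (placeholder names)
set.seed(42)
n   <- nrow(Rat_DF1a)
idx <- sample(n)                                          # random permutation of bar indices

train_idx <- idx[1:round(0.70 * n)]                       # 70% train
test_idx  <- idx[(round(0.70 * n) + 1):round(0.85 * n)]   # 15% test
valid_idx <- idx[(round(0.85 * n) + 1):n]                 # 15% validation

# Random forest with the same settings as the Rattle run
rf_model <- randomForest(trainY ~ .,
                         data       = Rat_DF1a[train_idx, ],
                         ntree      = 500,
                         mtry       = 7,
                         importance = TRUE,
                         replace    = FALSE,
                         na.action  = randomForest::na.roughfix)  # rough missing-value imputation
print(rf_model)   # OOB error estimate and confusion matrix
```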

RF calculation results: 500 trees, 7 predictors per node.

Number of observations used to build the model: 2491

Missing value imputation is active.


Call:

randomForest(formula = trainY ~ .,
             data = crs$dataset[crs$sample, c(crs$input, crs$target)],
             ntree = 500, mtry = 7, importance = TRUE, replace = FALSE,
             na.action = randomForest::na.roughfix)


Type of random forest: classification

Number of trees: 500

No. of variables tried at each split: 7


OOB estimate of error rate: 1.61%

Confusion matrix:

        -1     0    1  class.error
  -1   498     5    2   0.01386139
   0     3  1067   17   0.01839926
   1     1    12  886   0.01446051

What a wonderful result! The Grail! Note that the OOB error is estimated on the part of the file that was not used in training.

Looking at the training error, we see that 500 trees are not needed; 50 or 100 would be enough.
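A quick way to see this, sketched under the same assumptions as above (a fitted rf_model): the randomForest package stores the OOB and per-class error for every number of trees and can plot it directly.

```r
# Error rates as a function of the number of trees
plot(rf_model, main = "OOB and per-class error vs. number of trees")
legend("topright", legend = colnames(rf_model$err.rate),
       col = 1:ncol(rf_model$err.rate), lty = 1:ncol(rf_model$err.rate))

# The error curves usually flatten long before 500 trees
head(rf_model$err.rate)                    # error after 1, 2, 3, ... trees
rf_model$err.rate[c(50, 100, 500), "OOB"]  # OOB error at 50, 100 and 500 trees
```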



Let's check it on the test section

Error matrix for the Random Forest model on Rat_DF1a [test] (counts):

        Predicted
Actual    -1    0    1  Error
    -1   110    3    0    2.7
     0     3  221    2    2.2
     1     0    2  194    1.0


Error matrix for the Random Forest model on Rat_DF1a [test] (proportions):

        Predicted
Actual    -1     0     1  Error
    -1  20.6   0.6   0.0    2.7
     0   0.6  41.3   0.4    2.2
     1   0.0   0.4  36.3    1.0


Overall error: 1.8%, Averaged class error: 1.96667%.


Rattle timestamp: 2018-03-14 10:57:23 user


The training result is confirmed. Grail!
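For reference, a hedged sketch of how error matrices like these can be reproduced outside Rattle, reusing the rf_model and index vectors assumed in the earlier sketch (all names are placeholders):

```r
# Helper: counts matrix, per-class error, overall and averaged class error
error_report <- function(model, newdata, target) {
  pred <- predict(model, newdata = newdata)
  cm   <- table(Actual = target, Predicted = pred)
  class_err <- 100 * (1 - diag(cm) / rowSums(cm))
  overall   <- 100 * (1 - sum(diag(cm)) / sum(cm))
  list(counts      = cm,
       proportions = round(100 * cm / sum(cm), 1),
       class_error = round(class_err, 1),
       overall     = overall,
       avg_class   = mean(class_err))
}

# Test subset of Rat_DF1a (random bars held out from training),
# with rough imputation of missing values before prediction
error_report(rf_model,
             newdata = randomForest::na.roughfix(Rat_DF1a[test_idx, ]),
             target  = Rat_DF1a$trainY[test_idx])
```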


Let's double-check again in the validation section.

Error matrix for the Random Forest model on Rat_DF1a [validate] (counts):

        Predicted
Actual    -1    0    1  Error
    -1   105    1    0    0.9
     0     1  218    2    1.4
     1     0    1  205    0.5


Error matrix for the Random Forest model on Rat_DF1a [validate] (proportions):

        Predicted
Actual    -1     0     1  Error
    -1  19.7   0.2   0.0    0.9
     0   0.2  40.9   0.4    1.4
     1   0.0   0.2  38.5    0.5


Overall error: 0.9%, Averaged class error: 0.9333333%.


Rattle timestamp: 2018-03-14 10:59:52 user


Grail!!! You can run to a microfinance company and borrow as much dough as you can!


But there is one BUT: the file was split by random sampling of bars, while trading will proceed strictly in order of increasing time.

Let's check on the file where the chronology is preserved - Rat_DF1b
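Sketch of this check under the same placeholder names (Rat_DF1b being the second, strictly chronological file; error_report is the helper from the sketch above):

```r
# Evaluate the model trained on randomly sampled bars of Rat_DF1a
# on the second file, whose bars are in strict chronological order
result_oot <- error_report(rf_model,
                           newdata = randomForest::na.roughfix(Rat_DF1b),
                           target  = Rat_DF1b$trainY)
result_oot$counts
result_oot$overall   # in the author's run this jumps to ~48.5%, versus ~1-2% on the random test/validation parts
```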

And here is the result:

Error matrix for the Random Forest model on Rat_DF1b (counts):

        Predicted
Actual    -1    0    1  Error
    -1     0  324  237  100.0
     0     0  633  540   46.0
     1     0  152  697   17.9


Error matrix for the Random Forest model on Rat_DF1b (proportions):

        Predicted
Actual    -1     0     1  Error
    -1   0.0  12.5   9.2  100.0
     0   0.0  24.5  20.9   46.0
     1   0.0   5.9  27.0   17.9


Overall error: 48.5%, Averaged class error: 54.63333%.


Rattle timestamp: 2018-03-14 11:02:16 user


CATASTROPHE! THE MODEL IS OVERFITTED! RELATIVE TO THE TARGET VARIABLE THE PREDICTORS ARE JUST NOISE, AND ONLY ON NOISE CAN ML PRODUCE SUCH AMAZING RESULTS.


I have shown a normal, ordinary, university-student-level scheme of model fitting and checking. Its main drawback: it does not consider the relationship between the predictors and the target variable at all.

But the scheme should ALWAYS be at least this thorough, and it is still NOT complete - we also need a run in the tester to confirm the result on an ordinary sequential file. Well, and then off to the microfinance company.



 
SanSanych Fomenko:

Here are calculations showing that everything is idle chatter:

  • overtraining (overfitting) can only be detected on files whose TIME lies outside the training period.

It is strange that you got such good results on the test set. In my experiments it was much worse even there. With different RNG initializations before shuffling I got very different results on test and validation - very different for different seeds, both in error and in number of trades.

As a result, I came to the conclusion that test and validation sets are not needed at all; we should train on one section and evaluate on another (which you have as a separate file). That way the random "luck" factor of the shuffling is excluded.
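A hedged illustration of this point about RNG initialization, reusing the placeholder names from the earlier sketches: repeating the random split with different seeds and refitting shows how much the test error can drift purely because of the shuffle.

```r
# How sensitive is the test error to the seed used for the random split? (sketch)
test_error_for_seed <- function(seed, data, frac_train = 0.70, frac_test = 0.15) {
  set.seed(seed)
  n   <- nrow(data)
  idx <- sample(n)
  tr  <- idx[1:round(frac_train * n)]
  te  <- idx[(round(frac_train * n) + 1):round((frac_train + frac_test) * n)]
  fit <- randomForest(trainY ~ ., data = data[tr, ],
                      ntree = 100, na.action = randomForest::na.roughfix)
  pred <- predict(fit, newdata = randomForest::na.roughfix(data[te, ]))
  mean(pred != data$trainY[te])               # misclassification rate on this split's test part
}

sapply(c(1, 7, 42, 123), test_error_for_seed, data = Rat_DF1a)
```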

 

Guys, is the grail ready?

 
SanSanych Fomenko:

Here are calculations showing that everything is idle chatter:

  • without careful justification that the predictors affect the target variable;
  • overtraining (overfitting) can only be detected on files whose TIME lies outside the training period.

Initial data:

Two files, consecutive in time, with 54 predictors and a target variable for trend reversal: short / out / long.

The calculations are performed in Rattle, which divides the first file, Rat_DF1a, into three parts: train, test, validation. The split is done by sampling, i.e. bars of the source file are selected at random.



But there is one BUT: the file was split by random sampling of bars, while trading will proceed strictly in order of increasing time.

Let's check on the file where the chronology is preserved - Rat_DF1b



Overall error: 48.5%, Averaged class error: 54.63333%


Rattle timestamp: 2018-03-14 11:02:16 user


CATASTROPHE! THE MODEL IS OVERFITTED! RELATIVE TO THE TARGET VARIABLE THE PREDICTORS ARE JUST NOISE, AND ONLY ON NOISE CAN ML PRODUCE SUCH AMAZING RESULTS.


I have shown a normal, ordinary, university-student-level scheme of model fitting and checking. Its main drawback: it does not consider the relationship between the predictors and the target variable at all.

But the scheme should ALWAYS be at least this thorough, and it is still NOT complete - we also need a run in the tester to confirm the result on an ordinary sequential file. Well, and then off to the microfinance company.



This is the main mistake when dividing into subsets (train/val/test). The order should be as follows (a minimal R sketch follows the list):

  1. Split the time-ordered data set into train/val/test.
  2. During training, shuffle only the train set (never the validation or test set). I'm talking about classification, of course.
  3. Obtain all transformation parameters for the predictors on the training set only, then apply them to the val/test sets.
  4. Evaluate, select, and create predictors on the training set only.
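A minimal R sketch of this order of operations, under assumed placeholder names (a time-ordered data frame full_data with a factor target trainY); centering/scaling stands in for "transformation parameters" in general:

```r
library(randomForest)

# 1. Split the time-ordered data into train / validation / test (no shuffling yet)
n         <- nrow(full_data)
train_set <- full_data[1:round(0.6 * n), ]
val_set   <- full_data[(round(0.6 * n) + 1):round(0.8 * n), ]
test_set  <- full_data[(round(0.8 * n) + 1):n, ]

# 2. Shuffle only the training rows
set.seed(1)
train_set <- train_set[sample(nrow(train_set)), ]

# 3. Fit transformation parameters (here: centers and scales) on train only ...
num_cols  <- names(train_set)[sapply(train_set, is.numeric)]
train_mat <- scale(train_set[num_cols])
centers   <- attr(train_mat, "scaled:center")
scales    <- attr(train_mat, "scaled:scale")

# ... and apply them, unchanged, to validation and test
train_set[num_cols] <- train_mat
val_set[num_cols]   <- scale(val_set[num_cols],  center = centers, scale = scales)
test_set[num_cols]  <- scale(test_set[num_cols], center = centers, scale = scales)

# 4. Any predictor evaluation/selection likewise uses train_set only,
#    e.g. importance from a forest fitted on the training set
rf <- randomForest(trainY ~ ., data = train_set, ntree = 100,
                   na.action = randomForest::na.roughfix)
```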

Good luck

 

If we are talking about evaluating predictors with models, in my opinion the most advanced package is RandomUniformForest. It treats predictor importance in great detail and from several points of view. I recommend having a look; I described it in detail in one of my articles.
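A heavily hedged sketch of trying the package out: the package name is as given above, but the exact call signature and reporting functions below are my assumption from the CRAN randomUniformForest documentation, so verify them before relying on this.

```r
# Assumed interface of the CRAN randomUniformForest package - verify against its docs
library(randomUniformForest)

# Fit on the training set only (placeholder objects from the earlier sketches)
ruf <- randomUniformForest(trainY ~ ., data = train_set, ntree = 200)

# The summary of the fitted object reports variable importance;
# the package also exposes dedicated importance / partial-importance utilities
summary(ruf)
```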

I gave up on model-based selection of predictors and limit myself to what is specific to the model being used.

Good luck

 
Vladimir Perervenko:
  1. During training, shuffle only the train set (never the validation or test set). I'm talking about classification, of course.

From Nikolenko S., Kadurin A., Arkhangelskaya E., "Deep Learning", p. 139.


For the validation data to be of the same nature as the training data, you have to shuffle them together. Otherwise a random chunk of trend or flat may end up there. As a result, we would not be evaluating the model's ability to generalize, but fitting it to a particular chunk of continuous history at a particular section (after all, that is where we will stop training).

But, as I wrote above, after experimenting with different RNG initializations, which produced validation sets of very different quality, I came to the conclusion that the validation section is probably not needed; other regularization methods can be used instead. However, those experiments were on a small amount of data (5 days); if the number of examples grew tenfold, the shuffling might be more uniform and the data in those sections homogeneous (i.e. of the same nature) - in that case validation may be useful.

Update: if there is a lot of data and several flats and trends in both directions fall into the validation section, then mixing with the training set may not be necessary.
 
Vladimir Perervenko:

If we are talking about evaluating predictors with models, in my opinion the most advanced package is RandomUniformForest. It treats predictor importance in great detail and from several points of view. I recommend having a look; I described it in detail in one of my articles.

I gave up on model-based selection of predictors and limit myself to what is specific to the model being used.

Good luck

And I think the most advanced one is a completely different product ;-)... in which it is implemented a little differently.

Two networks, where the sample is divided into 2 subsamples, train and test, and for network B (the second polynomial) the train set serves as the test set and the test set as the train set. Only the test sample is evaluated, with half of it handled by one polynomial and half by the other. The classes are split evenly: the ones are divided equally between train and test, and likewise the zeros. And, unfortunately, time is not involved at all - the file can even be fed in the original ordering of the vectors. Maybe this is the key to reducing overfitting.


Truth be told, I don't quite understand: maybe the validation section you are talking about is what my favorite optimizer calls the test section?

And in your case the test section is the control section, where we let the network run for a while... I get confused by the terminology...

 

In any case, I believe the test section must not influence the training section and should be formed as randomly as possible - for classification tasks, where the control section, even if it runs in time order, does not depend on that time. Why? Because by shuffling all the data we are trying to extract the real potential of the set, not a lucky coincidence in the form of a particular ordering. When you shuffle the data, you see what the data can really do... Something like that...

The fact that on repeated optimization the result jumps around within 10-20% is precisely because of the ordering of the data: one time it is ordered favorably, another time a little worse, and so on... IMHO!!!

 
elibrarius:

From Nikolenko S., Kadurin A., Arkhangelskaya E., "Deep Learning", p. 139.


For the validation data to be of the same nature as the training data, you have to shuffle them together. Otherwise a random chunk of trend or flat may end up there. As a result, we would not be evaluating the model's ability to generalize, but fitting it to a particular chunk of continuous history at a particular section (after all, that is where we will stop training).

But, as I wrote above, after experimenting with different RNG initializations, which produced validation sets of very different quality, I came to the conclusion that the validation section is probably not needed; other regularization methods can be used instead. However, those experiments were on a small amount of data (5 days); if the number of examples grew tenfold, the shuffling might be more uniform and the data in those sections homogeneous (i.e. of the same nature).

Update: if there is a lot of data and the validation section includes several flats and trends in both directions, then mixing with the training set may not be necessary.

The adolescent spirit of controversy is indomitable :)

I was talking about classification of time series. For example, on M15 two weeks for training is approximately 1000 bars, and the next week for validation is about 500 bars. During training we shuffle the training set, but not the validation one.

Shuffling the entire set before splitting is needed in two cases: stratified sets and cross-validation. And in that case sampling should be done without replacement, so that the same examples do not end up in both sets.

Considering that we are not short of examples and that these are time series, it is better to split before shuffling. IMHO
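A short R sketch of the split being described, with placeholder names (an M15 data frame bars_m15 already ordered by time, factor target trainY; the 1000/500 sizes follow the example above):

```r
# Time-ordered split: first ~1000 bars (two weeks of M15) for training,
# next ~500 bars (the following week) for validation
train_set <- bars_m15[1:1000, ]
val_set   <- bars_m15[1001:1500, ]

# Shuffle only the training rows; the validation set keeps its time order
set.seed(1)
train_set <- train_set[sample(nrow(train_set)), ]

# (training a classifier on train_set and evaluating on val_set would follow here)
```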
