Machine learning in trading: theory, models, practice and algo-trading - page 948

 
SanSanych Fomenko:
Generally speaking, targeting 3 classes isn't very good. Better to split into two target classes: (-1,0) and (0.1), and then combine them when deciding on a position.

I was the one who asked for the 3 classes. A couple of pages ago there was a similar file with just two targeting. Two targeting I could not handle (one of the classes is heavily skewed by the number, so it's also difficult), trying now three.

 

I waited for randomForest, but for the training file took 10% of the original file.

Here is the result:

Number of observations used to build the model: 20276
Missing value imputation is active.

Call:
 randomForest(formula = as.factor(arr_Buy) ~ .,
              data = crs$dataset[crs$sample, c(crs$input, crs$target)],
              ntree = 500, mtry = 7, importance = TRUE, replace = FALSE, na.action = randomForest::na.roughfix)

               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 7

        OOB estimate of  error rate: 11.22%
Confusion matrix:
     -1    0    1 class.error
-1 5335  314   21  0.05908289
0    95 8712  480  0.06191450
1     2 1363 3954  0.25662719

Variable Importance
===================

                                 -1      0      1 MeanDecreaseAccuracy MeanDecreaseGini
arr_Sell                     223.88 208.20 164.24               216.84          3073.97
arr_DonProcVisota             25.86  57.39  53.05                57.46           203.17
Levl_Close_MN1                21.45  51.65  52.89                58.17           140.68
Levl_High_H4                  22.35  51.81  50.33                58.17           118.24
Levl_Close_W1                 20.11  48.85  49.94                52.90           145.82
arr_LastBarPeresekD_Down_M15  22.38  49.70  49.39                57.02           129.74
arr_Den_Nedeli                21.46  44.74  49.06                49.48           129.12
Levl_Low_H4                   20.55  47.30  47.49                50.63           126.02
arr_Regresor                  23.40  46.78  46.25                50.94           126.67
Levl_Close_D1                 21.03  42.44  44.51                47.91           164.55
Levl_Support_D1               22.47  47.09  44.35                51.17           153.52
Levl_Low_H1                   20.62  47.36  43.94                50.02           132.55
Levl_Support_W1               18.05  47.42  43.40                47.19           134.89
arr_DonProc_M15               23.81  50.52  42.83                52.10           184.02
arr_TimeH                     20.55  41.15  42.07                42.59           109.51
arr_LastBarPeresekD_Up_M15    22.46  43.85  41.64                47.63           134.78
arr_DonProc                   21.88  56.26  41.49                50.51           228.07
Levl_High_W1                  20.78  40.34  41.27                45.67            91.07
Levl_Low_D1                   18.87  39.05  40.81                42.30           116.54
Levl_Support_MN1              18.99  39.66  40.36                41.64           122.87
Levl_High_MN1                 17.25  36.97  39.62                40.44            88.72
Levl_first_H4                 18.77  38.39  38.71                42.28            70.63
arr_LastBarPeresekD_Down      19.31  42.64  37.27                44.06           150.78
Levl_Low_MN1                  17.96  37.24  36.40                37.83            92.71
Levl_first_H1                 14.43  35.64  34.62                37.45            78.99
Levl_High_D1                  19.35  34.28  34.48                36.43           106.69
Levl_Close_H4                 19.72  39.58  33.59                39.85           170.80
X_USE_Filter_MA_02            15.76  33.70  33.01                38.00            50.89
Levl_Support_H4               16.71  39.24  32.66                36.73           152.03
Levl_Low_W1                   16.72  34.05  32.47                33.22           111.06
Levl_High_H1                  17.58  33.14  31.93                33.90           117.74
Levl_Close_H1                 19.09  35.94  30.80                35.35           160.90
Levl_first_W1                 12.82  30.50  30.57                31.63            51.57
X_Use_Donchianf               13.69  28.78  30.46                31.47            59.44
Levl_first_MN1                13.94  27.58  30.02                29.14            59.58
arr_LastBarPeresekD_Up        15.96  31.32  28.67                30.59           142.72
Use_Filter_MA_Prirost         11.99  27.62  26.88                33.29            38.78
arr_Vektor_Don_M15            13.70  26.01  25.25                29.80            32.87
X_Use_BarPeresek_iMA_TF       11.40  18.48  25.17                25.30            20.57
Levl_first_D1                 14.97  28.47  24.99                29.17            46.49
X_Use_Filter_Fibo_in_Day      12.43  24.34  24.84                29.00            45.29
Levl_Support_H1               17.57  29.98  24.77                28.92           141.18
arr_Vektor_Week                7.99  19.81  24.10                21.56            27.30
arr_RSI_Open_H1               10.31  25.18  23.80                29.74            27.62
X_USE_Filter_MA               12.64  22.15  22.22                25.99            41.49
arr_Vektor_Don                10.20  19.81  15.88                17.82            41.06
arr_Vektor_Day                11.65  15.07  15.62                16.44            25.24
arr_BB_Center                 10.41  15.87  14.93                15.48            38.40
arr_BB_Up                      7.78  10.50  14.74                13.99            17.23
X_Use_ChanelEvaProc            4.30  13.36  12.15                17.63            67.45
arr_RSI_Open_M1                7.45  12.63  10.49                13.56            25.92
arr_BB_Down                    5.47  14.77   7.02                14.50            16.72
USE_Filter_MA_Donchian         0.72   0.10   4.71                 3.29             1.78

Time taken: 2.29 mins
 

And here is an interesting graph


It shows that increasing the number of trees up to about 50 gives a decrease in the error, but after increasing the number of trees over 100 is completely useless

 

And here are the results on the validation and testing chunks. They are large, 45% of the original

Error matrix for the Random Forest model on Pred_027_2016_H2_T.csv [validate] (counts):

      Predicted
Actual    -1     0     1 Error
    -1 24023  1523    70   6.2
    0    473 39706  1964   5.8
    1     10  5858 17615  25.0

Error matrix for the Random Forest model on Pred_027_2016_H2_T.csv [validate] (proportions):

      Predicted
Actual   -1    0    1 Error
    -1 26.3  1.7  0.1   6.2
    0   0.5 43.5  2.2   5.8
    1   0.0  6.4 19.3  25.0

Overall error: 10.9%, Averaged class error: 12.33333%


Error matrix for the Random Forest model on Pred_027_2016_H2_T.csv [test] (counts):

      Predicted
Actual    -1     0     1 Error
    -1 23847  1502    73   6.2
    0    455 39677  1984   5.8
    1      7  6024 17673  25.4

Error matrix for the Random Forest model on Pred_027_2016_H2_T.csv [test] (proportions):

      Predicted
Actual   -1    0    1 Error
    -1 26.1  1.6  0.1   6.2
    0   0.5 43.5  2.2   5.8
    1   0.0  6.6 19.4  25.4

Overall error: 11%, Averaged class error: 12.46667%
 

Very decent, if nothing looks ahead.

The TC on M1 is not clear: we will predict within the spread.

 
SanSanych Fomenko:

I waited for randomForest, but for the training file took 10% of the original file.

Here are the results:

Thank you, judging by the results things are not bad?

However, it seems that arr_Sell was used as a predictor, judging by the predictor importance table? If so, it's not correct.

SanSanych Fomenko:

Here is an interesting graph


It shows that increasing number of trees up to about 50 gives decrease of error, but after increasing number of trees more than 100 is absolutely useless

So it probably makes sense, the more predictors the more solutions, or am I wrong?

 
SanSanych Fomenko:

Very decent, if nothing looks ahead.

I do not understand the TS on M1: we will predict within the spread.

There is a trend strategy, and it works on MOEX on Si, i.e. the spread is not important there.

 

In order to make the final decision we need to:

  • test the predictive ability of predictors under very specific input file conditions
  • physically divide the input file into two parts: do what I did on one, and run the finished forest on the other. If the error matches, then millionaire or billionaire!

 
Aleksey Vyazmikin:

It's a trending strategy, and it works on the MOEX exchange on Si, i.e. the spread is not important there.

What other trend in the classification? The prediction errors will tear the trend - nothing is left of the trend.

 
Aleksey Vyazmikin:


However, it seems that arr_Sell was used as a predictor, judging from the predictor importance table? If so, that's not correct.


Of course it is, for fuck's sake!

What other ones?

Let me recalculate.

Reason: