Machine learning in trading: theory, models, practice and algo-trading - page 31

 
Alexey Burnakov:

Yuri, the first sample on your data:

method loss_function cv_folds bag_frac model_params AUC_cv accuracy_train accuracy_validate
GBM bernoulli 4 0.4 0.015_|_7_|_70_|_600 0.429659 0.590361 0.50501
GBM bernoulli 4 0.6 0.02_|_5_|_110_|_600 0.485727 0.586345 0.51503

Two different sets of training parameter values. It is noteworthy that the AUC on cross-validation is below rock bottom, i.e. worse than the 0.5 of random guessing.
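A cross-validated GBM run of the kind reported in the table can be sketched with scikit-learn in place of the original (unspecified) GBM implementation. Note this is an assumption-laden sketch: the parameter string "0.015_|_7_|_70_|_600" is read here as learning rate | tree depth | min samples per leaf | number of trees, and synthetic data stands in for Yuri's set.

```python
# Sketch of a 4-fold cross-validated GBM with Bernoulli-style loss.
# ASSUMPTION: "0.015_|_7_|_70_|_600" = learning_rate | max_depth |
# min_samples_leaf | n_estimators; bag_frac maps to subsample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the predictor set (not the real dataset)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = GradientBoostingClassifier(
    learning_rate=0.015,   # default loss is the Bernoulli deviance
    max_depth=7,
    min_samples_leaf=70,
    n_estimators=600,
    subsample=0.4,         # bag fraction, as in the table
    random_state=0,
)

# 4-fold cross-validated AUC, matching cv_folds=4 above
auc = cross_val_score(model, X, y, cv=4, scoring="roc_auc").mean()
print(round(auc, 3))
```

On real noisy financial data the cross-validated AUC can land at or below 0.5, which is exactly the symptom reported in the table.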

All in all, 51.5% accuracy on the test is the best we got.

I don't even know how you get about 60%.

You should throw out that set of predictors.

If we simply take the increments of everything, plus a few oscillators (over 100 predictors, with more than 5000 observations, i.e. H1), then from such a set we can select 10-15 predictors which not only give a prediction error below 40%, but, most importantly, give a model that is NOT OVERFITTED.

 
SanSanych Fomenko:

We should throw out this set of predictors.

If we naively take the increments of everything, plus a few oscillators (over 100 predictors, with more than 5000 observations, i.e. H1), then from such a set we can select 10-15 predictors which not only give a prediction error below 40%, but, most importantly, give a model that is NOT OVERFITTED.

We don't know yet which features Yuri has included in his set. He says they are all needed.
 

In general, I could not do better than 51.5% classification accuracy. Accordingly, the other metrics will also be close to random guessing.

The class balance on the test set is almost exactly 50/50.

Yuri, I look forward to your revelations.

 
I got about 50% correct predictions on test.csv, which is all unpromising. I agree that the set of predictors is not very good. Yuri, add more standard indicators: if your model really is that good, then I think you could achieve 80% or more correct predictions with good predictors.
 
Alexey Burnakov:

In general, I could not do better than 51.5% classification accuracy. Accordingly, the other metrics will also be close to random guessing.

The class balance on the test set is almost exactly 50/50.

Thanks for the information. If nobody could get a better result, and I myself ran this dataset through Weka and it failed there too, then it is time to update the libVMR version. 60% correct answers on such samples is not the limit if you apply the new version.
Alexey Burnakov:

Yuri, I am waiting for your revelations.

I am not hiding anything. For the old version, whose results I have already given above, all the information is publicly available:

Description of the method of building a binary classifier: https://sites.google.com/site/libvmr/

Java source code with comments: https://sourceforge.net/p/libvmr/code/HEAD/tree/trunk/

Builds: https://sourceforge.net/projects/libvmr/files/

Reshetov's Vector Machine
  • sites.google.com
Theory and practice of machine learning algorithms with generalization ability
 
Yuri, thank you.

There's one thing I don't understand. If the set is linearly separable, why not take the usual SVM method? How is yours better?
 
Alexey Burnakov:
Yuri, thank you.

There's one thing I don't understand. If the set is linearly separable, why not use the usual SVM method? How is yours any better?

If the set is linearly separable, then the number of potential separating hyperplanes is infinite. In that case, one needs a criterion for identifying an adequate hyperplane. One such criterion was formulated for the support vector method in the book: Vapnik V. N. and Chervonenkis A. Y. Pattern Recognition Theory. Moscow: Nauka, 1974. More precisely, many different criteria are considered in that book.

Both SVM and VMR are support vector methods.

  • SVM is a method for reconstructing dependencies from empirical data. Its criterion is the maximum distance between the support hyperplanes, provided the space is linearly separable. See Vapnik V. N. Dependence Reconstruction from Empirical Data. Moscow: Nauka, 1979.
  • VMR is a method for identifying strong dependencies and removing (reducing) weak ones. Its criterion is a minimax of the distance between the support hyperplanes, independent of linear separability. That is, VMR does not reconstruct dependencies (it adds nothing to the model that is not known to be present in the training sample); moreover, some implicit dependencies do not get into the model at all (they are sifted out). More specifically, VMR reduces the hyperspace by dropping some of the features.

Which method is better or worse can be debated endlessly. However, one can simply test the generalization ability, and then everything falls into place.
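The point about infinitely many separating hyperplanes, and SVM picking the maximum-margin one, can be illustrated with a minimal scikit-learn example. This is a generic SVM illustration on toy data, not an implementation of VMR or of any model discussed in this thread.

```python
# On a linearly separable set, infinitely many hyperplanes separate the
# classes; an SVM selects the one maximizing the margin to the support
# vectors (Vapnik's criterion mentioned above).
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2D
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

# Separating hyperplane w.x + b = 0; the margin width equals 2 / ||w||
w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)
print(len(clf.support_vectors_), round(margin, 3))
```

Only the points touching the margin end up as support vectors; every other hyperplane that also separates these points would have a smaller margin.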

 
Yury Reshetov:

If the set is linearly separable, then the number of potential separating hyperplanes is infinite. In that case, one needs a criterion for identifying an adequate hyperplane. One such criterion was formulated for the support vector method in the book: Vapnik V. N. and Chervonenkis A. Y. Pattern Recognition Theory. Moscow: Nauka, 1974. More precisely, many different criteria are considered in that book.

Both SVM and VMR are support vector methods.

  • SVM is a method for reconstructing dependencies from empirical data. Its criterion is the maximum distance between the support hyperplanes, provided the space is linearly separable. See Vapnik V. N. Dependence Reconstruction from Empirical Data. Moscow: Nauka, 1979.
  • VMR is a method for identifying strong dependencies and removing (reducing) weak ones. Its criterion is a minimax of the distance between the support hyperplanes, independent of linear separability. That is, VMR does not reconstruct dependencies (it adds nothing to the model that is not known to be present in the training sample); moreover, some implicit dependencies do not get into the model at all (they are sifted out). More specifically, VMR reduces the hyperspace by dropping some of the features.

Which method is better or worse can be debated endlessly. However, one can simply test the generalization ability, and then everything falls into place.

Problems should be solved as they arise, and putting the cart (the model) before the horse (the predictors) is an utterly futile exercise. All the more so comparing carts when it is not known what is harnessed to them, or whether anything is harnessed at all.

Before applying models of any type, it is necessary to clean the list of predictors of noise, leaving only predictors that are "related" to the target variable. If you do not, you can easily slip into building models based on Saturn's rings, coffee grounds, and other "predictors" that have been widely used in that role for several hundred years.
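One simple version of this noise-screening step can be sketched as follows. The posts do not specify which relevance measure was actually used, so mutual information is taken here purely as one plausible choice, and the data is synthetic: a few informative columns buried among pure-noise columns, mimicking the "over 100 predictors" situation described above.

```python
# Screen predictors for a relationship to the target and drop the noise
# columns before fitting any model. Mutual information is just one
# possible relevance measure (an assumption, not the method from the thread).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=(n, 3))           # 3 informative predictors
noise = rng.normal(size=(n, 97))           # 97 pure-noise predictors
X = np.hstack([signal, noise])
y = (signal.sum(axis=1) > 0).astype(int)   # target depends only on the signal

mi = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(mi)[::-1][:10]           # retain the 10 strongest predictors
print(sum(int(i) < 3 for i in keep))       # how many true signal columns survive
```

With many predictors and few observations the mutual-information estimates for noise columns fluctuate around zero, which is exactly why a small sample can make this screening unreliable, as noted below.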

Above, Dr.Trader tried to do the work of removing the noise from his set of predictors.

The result was negative.

I think the reason for the negative result is the small number of observations relative to the very large number of predictors. But this is the direction to dig in before applying ANY models.

 
Yury Reshetov:

If the set is linearly separable, then the number of potential separating hyperplanes is infinite. In that case, one needs a criterion for identifying an adequate hyperplane. One such criterion was formulated for the support vector method in the book: Vapnik V. N. and Chervonenkis A. Y. Pattern Recognition Theory. Moscow: Nauka, 1974. More precisely, many different criteria are considered in that book.

Both SVM and VMR are support vector methods.

  • SVM is a method for reconstructing dependencies from empirical data. Its criterion is the maximum distance between the support hyperplanes, provided the space is linearly separable. See Vapnik V. N. Dependence Reconstruction from Empirical Data. Moscow: Nauka, 1979.
  • VMR is a method for identifying strong dependencies and removing (reducing) weak ones. Its criterion is a minimax of the distance between the support hyperplanes, independent of linear separability. That is, VMR does not reconstruct dependencies (it adds nothing to the model that is not known to be present in the training sample); moreover, some implicit dependencies do not get into the model at all (they are sifted out). More specifically, VMR reduces the hyperspace by dropping some of the features.

Which method is better or worse can be debated endlessly. However, one can simply test the generalization ability, and then everything falls into place.

Yury, thank you. I will think about it.

I do have a question, though. How did you select the predictors?
 



Unfortunately, I can't calculate the Sharpe ratio and similar metrics in R, because I have 49 random samples that do not reconstruct the sequence of trades when put together.


R has everything you need. See fTrading::sharpeRatio.

Oh, and PerformanceAnalytics wouldn't hurt to look at.
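For readers outside R, the basic quantity those packages compute is simple enough to sketch by hand: mean excess return divided by the standard deviation of returns, usually annualized by the square root of the number of periods per year. A minimal version (a generic textbook formula, not the exact implementation of fTrading::sharpeRatio):

```python
# Annualized Sharpe ratio of a series of per-period returns.
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, annualized."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Example: a year of slightly positive simulated daily returns
rng = np.random.default_rng(1)
r = rng.normal(loc=0.0005, scale=0.01, size=252)
print(round(sharpe_ratio(r), 2))
```

Note that this needs the returns in their actual trade order only for the mean and deviation to be meaningful as a performance summary, which is why the 49 disconnected samples mentioned above are a problem.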

Good luck
