Machine learning in trading: theory, models, practice and algo-trading - page 32

 
Alexey Burnakov:
Yuri, thank you. I will think about it.

We do have a question. How did you select the predictors?

I don't cull them in any way; that is what VMR does. The 21st century is already 16 years old, and all the hard work should be done by automated systems, not people.

Yury Reshetov:

... More precisely, VMR reduces the hyperspace by discarding some of the predictors.

There is even a concrete example where VMR automatically discards one of the predictors: see Predicting Bankruptcy.

On this page you can even try a simple classic example of how the algorithm does it: Reduction of non-informative predictors and non-supporting vectors (samples) in a training set

Predicting Bankruptcies - Reshetov's Vector Machine
  • sites.google.com
Authors Myoung-Jong Kim and Ingoo Han published an article titled "The discovery of experts decision rules from qualitative bankruptcy data using genetic algorithms". Judging by the original article, the sample was intended for genetic algorithms, since it was assumed to be beyond the reach of other algorithms. However, I have to...
 
I will have to try your algorithm in practice. Offhand, I cannot imagine why it would work so well.
 
Yury Reshetov:

I don't cull them in any way; that is what VMR does. The 21st century is already 16 years old, and all the hard work should be done by automated systems, not people.


In my experience, which I do NOT claim to be universal, the predictor-selection algorithms built into models are the least effective. Nor are the numerous standalone packages and functions for predictor selection any better.

Why do I say that?

For me there is a simple test, which is fundamental for trading.

Train the model. Test it on a sample that lies outside the training sample in time, and compare the errors. First, the errors cannot be around 50%: that is not a trainable model at all. We consider errors below 40%. If these errors are roughly equal on the training and validation samples, the model is not over-trained. If they differ significantly, possibly by several times, and especially if the training error is below 10%, the model is over-trained. The cause of overtraining is the presence of noise predictors, which the model's learning algorithm does NOT weed out.
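To make the test concrete, here is a minimal R sketch of such an out-of-time check. The file name, the column name output and the use of plain logistic regression are assumptions for illustration, not part of the post:

```
# Minimal sketch of the out-of-time check described above.
# File name, column name "output" and the logistic-regression model are
# illustrative assumptions only.
dat <- read.csv("data.csv")                  # predictors plus a 0/1 column "output"

split <- floor(0.7 * nrow(dat))              # earlier 70% to train, later 30% to validate
train <- dat[1:split, ]
valid <- dat[(split + 1):nrow(dat), ]

model <- glm(output ~ ., data = train, family = binomial)

err <- function(m, d) mean((predict(m, d, type = "response") > 0.5) != d$output)
cat("training error:   ", err(model, train), "\n")
cat("out-of-time error:", err(model, valid), "\n")
# Errors near 50% mean nothing was learned; roughly equal errors below 40% on
# both samples suggest the model is not over-trained, while a training error
# far below the out-of-time error points to noise predictors and overtraining.
```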

So far, I have not come across a set of predictors that does not contain noise. And no predictor-selection algorithm has been able to cope with this noise, nor has the idea of coarsening (regularizing) models!

So your opinion is mistaken, and obtaining over-trained models is dangerous on a real account, which is always "out of sample".

 
Alexey Burnakov:

I propose problem number one. I will post its solution later. SanSanych has already seen it, please do not tell me the answer.

Introduction: to build a trading algorithm, you need to know which factors will serve as the basis for predicting the price, the trend, or the direction in which to open a trade. Selecting such factors is not an easy task, and it is infinitely complex.

Attached is an archive with an artificial csv dataset that I made.

The data contain 20 variables with the prefix input_, and one rightmost variable, output.

The output variable depends on some subset of the input variables (the subset can contain from 1 to 20 inputs).

The task: using any (machine learning) methods, select the input variables that make it possible to determine the state of the output variable on the existing data.
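Purely as an illustrative sketch (not the intended solution, and not part of the task): one possible starting point is to rank the input_* variables by random-forest importance; the file name and the use of the randomForest package are assumptions here.

```
# Sketch only: rank the candidate predictors by random-forest importance.
# The file name "task1.csv" is an assumption standing in for the attached dataset.
library(randomForest)

d <- read.csv("task1.csv")
d$output <- as.factor(d$output)

rf  <- randomForest(output ~ ., data = d, importance = TRUE, ntree = 500)
imp <- importance(rf, type = 1)              # mean decrease in accuracy per input_*
print(imp[order(imp, decreasing = TRUE), , drop = FALSE])
# Inputs with importance near zero (or negative) are candidates for noise.
```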

Will there be any more binary classification problems? I arrived at this one too late, when both the results and the sample-generation method had already been posted.

I would like to participate and test a new version of my binary classifier.

 
IvannaSvon:
Yuri, please reply to a private message.

The script is in the attachment.

The dataset was taken from EURUSD H1.

Files:
 
Yury Reshetov:

Will there be any more binary classification problems? I arrived at this one too late, when both the results and the sample-generation method had already been posted.

I would like to participate and at the same time test a new version of my binary classifier.

There are always tasks :)

There are two files in the archive: train.csv and test.csv. The last column in the files is the required result of the binary classification. The train.csv file should be used to train the model, which is then applied to test.csv. The already known target results for test.csv must not be used in advance; they are needed only for the final check. The data are taken from eurusd d1; class 0 or 1 means a price decrease or increase on the next bar. If the model predicts the result for test.csv correctly in at least 6 out of 10 cases, you can try trading it on Forex. If it predicts correctly in 7 out of 10 cases (or more), that is the right path to the grail; then repeat the training and the model test on other years and months, and if everything stays the same, very good.
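For concreteness, a minimal R sketch of how the two files could be used; the choice of logistic regression and the assumption that the last column is coded 0/1 are not part of the post:

```
# Sketch of the train/test procedure above; the model choice (logistic
# regression) is an assumption, the only given is that the last column is the class.
train <- read.csv("train.csv")
test  <- read.csv("test.csv")

target <- names(train)[ncol(train)]          # last column holds class 0 or 1
fml    <- as.formula(paste(target, "~ ."))

model <- glm(fml, data = train, family = binomial)

pred <- as.integer(predict(model, test, type = "response") > 0.5)
cat("correct on test.csv:", mean(pred == test[[target]]), "\n")
# By the criterion above, ~0.6 is worth trying on Forex; ~0.7 or more means
# repeating the same training and test on other years and months.
```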

I have taken the previous mistakes into account, and now the files are based on bar deltas rather than raw values. And everything is normalized by rows rather than by columns, since the predictors within a row are of the same type.
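One reading of that preprocessing, as a small R sketch; the matrix of raw bars and its file name are assumptions:

```
# Sketch of the preprocessing described above: take deltas between neighbouring
# bars and normalize each row by its own range, since all predictors within a
# row are of the same type. The file "bars.csv" (one example per row, one bar
# per column) is an assumption.
prices <- as.matrix(read.csv("bars.csv"))

deltas <- prices[, -1, drop = FALSE] - prices[, -ncol(prices), drop = FALSE]

rng  <- apply(deltas, 1, function(r) max(abs(r)))
norm <- sweep(deltas, 1, ifelse(rng == 0, 1, rng), "/")   # row-wise, not column-wise
```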

I myself try to use genetics to enumerate predictor variants. I train the principal-components model with two crossvalidations on a set of predictors and return the classification error as the fitness value for the genetics. When the genetics reaches its limit, I take the final set of predictors and train a neural network on them, again with two crossvalidations. The final prediction error on the test data is about 40%. Here the bad news begins: the final error depends on the parameters of the neural network (number of internal weights, number of iterations between crossvalidations) and varies randomly within 30%-50%. I was once happy to get only 30%, but as it turned out I cannot control it, and on average I simply get 40%.
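A rough sketch of such a genetic search, with plain logistic regression substituted for the principal-components model inside the fitness function; the GA package, the two folds and all file/column names are assumptions, not the exact setup described above:

```
# Rough sketch of genetic predictor selection with a crossvalidation error as
# fitness. Logistic regression stands in here for the principal-components
# model; the GA package and two CV folds are assumptions.
library(GA)

d <- read.csv("train.csv")
X <- d[, -ncol(d), drop = FALSE]
y <- d[[ncol(d)]]

cv_error <- function(bits, k = 2) {
  if (sum(bits) == 0) return(1)                       # empty subset: worst error
  folds <- sample(rep(1:k, length.out = nrow(X)))
  mean(sapply(1:k, function(f) {
    tr <- data.frame(X[folds != f, bits == 1, drop = FALSE], y = y[folds != f])
    te <- data.frame(X[folds == f, bits == 1, drop = FALSE], y = y[folds == f])
    m  <- glm(y ~ ., data = tr, family = binomial)
    mean((predict(m, te, type = "response") > 0.5) != te$y)
  }))
}

res  <- ga(type = "binary", fitness = function(b) -cv_error(b),
           nBits = ncol(X), popSize = 50, maxiter = 100, run = 20)
best <- which(res@solution[1, ] == 1)                 # indices of selected predictors
print(names(X)[best])
```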
A logical question: why use a principal-components model in the middle? Simply because it has no training parameters: feed it the data, get the model, and calculate the crossvalidation error on the test data. If the neural network were used directly to compute the fitness function for the genetics, the training time would grow enormously, and it is also unclear which training parameters to use for the neural network itself.

I also ported Yuri's VMR model to R; it is in the attachment. I took my data with noise predictors and got a 30% error on the training data and a 60% error on the test data, i.e. a miracle did not happen and the model was over-trained. I may have made mistakes while porting the code; if you see any inconsistencies, please let me know. When training the model I did not have enough memory for the large kernel machine, so I used the medium one (parameter kernelTrickMode <- 2). I could not wait for a million iterations per column, so I used 10000 (iterPerColumn <- 10000).

Files:
 
Dr.Trader:

There are always tasks :)

There are two files in the archive - train.csv and test.csv.

Unfortunately, the RAR archive does not unpack for me. IMHO it is better to pack everything into ZIP: there are ZIP unpackers on all platforms, and many users do not have RAR.



Dr.Trader:
I also ported Yuri's VMR model to R; it is in the attachment. I took my data with noise predictors and got a 30% error on the training data and a 60% error on the test data, i.e. a miracle did not happen and the model was over-trained. I may have made mistakes while porting the code; if you see any inconsistencies, please let me know.

I will definitely check it out. I don't know R well enough though.

Was the port done manually or with some kind of automated converter?

 
Maybe someone else will try the principal-components method for rejecting noise, but on predictors with thousands of observations, not like Dr.Trader's?
 
Dr.Trader:


See private message.
 
SanSanych Fomenko:
Maybe someone else will try the principal-components method for rejecting noise, but on predictors with thousands of observations, not like Dr.Trader's?

Why don't you just try it?

P.S. I tried it a long time ago and did not get anything interesting.

In the course of my research I have some modest but interesting results that I would like to share, but on this "wonderful" forum I cannot insert a picture or attach a file. Maybe someone knows what the problem is?
