Machine learning in trading: theory, models, practice and algo-trading - page 1012

 
Aleksey Panfilov:

Very interesting. Can you elaborate on how you measure predictive ability?

And, above all, what should be measured?

I've written about it, shown charts and posted code; the experts buried all of it in these 1000 pages...

I'm too lazy to repeat it. The tool most used here is vtreat; I don't use it. The main thing is to think about this topic and discard everything else.

 
Aleksey Vyazmikin:

So you didn't address the question of what settings ZZ (ZigZag) should have?

The ZZ parameter differs for each instrument and timeframe. For example, for EURUSD M15 a good starting value is 15 pips (4-digit quotes). It also depends on the predictors you use. It's a good idea to optimize the predictor parameters and the ZZ parameter together. That is why it is desirable to have non-parametric predictors: it makes life much easier. Digital filters, for example, show good results. Using ensembles and cascade combining, I got an average Accuracy = 0.83. This is a very good result. Tomorrow I will send an article for review that describes the process.
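To make "the ZZ parameter" concrete, here is a minimal sketch of a threshold ZigZag labeler, assuming the common definition: a turning point is confirmed only after price reverses from the last extreme by at least `threshold`, and each bar is labeled with the direction of the leg it lies on. The function names, the use of a single price series, and the +1/-1/0 labels are illustrative assumptions, not the poster's exact method. Prices are given in pips as integers (price × 10000 on a 4-digit quote) to avoid float rounding, so 15 pips is `threshold=15`.

```python
def zigzag_pivots(prices, threshold):
    """Indices of ZigZag turning points for a minimal-reversal threshold."""
    pivots = [0]
    direction = 0  # +1: up leg in progress, -1: down leg, 0: not decided yet
    ext = 0        # index of the extreme reached by the leg in progress
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            # Simplification: the first leg is measured from bar 0.
            if abs(p - prices[0]) >= threshold:
                direction = 1 if p > prices[0] else -1
                ext = i
        elif direction == 1:
            if p > prices[ext]:
                ext = i                       # up leg makes a new high
            elif prices[ext] - p >= threshold:
                pivots.append(ext)            # high confirmed, down leg starts
                direction, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                       # down leg makes a new low
            elif p - prices[ext] >= threshold:
                pivots.append(ext)            # low confirmed, up leg starts
                direction, ext = 1, i
    if ext != pivots[-1]:
        pivots.append(ext)                    # last extreme, not yet confirmed
    return pivots


def zigzag_labels(prices, threshold):
    """Label each bar with the direction (+1/-1) of the ZigZag leg it lies on.

    Bars from the last pivot onward stay 0 because their leg is still unknown.
    The labels look into the future by design, so use them only as targets.
    """
    pivots = zigzag_pivots(prices, threshold)
    labels = [0] * len(prices)
    for a, b in zip(pivots, pivots[1:]):
        leg = 1 if prices[b] > prices[a] else -1
        for i in range(a, b):
            labels[i] = leg
    return labels
```

For example, `zigzag_labels([100, 110, 120, 105, 90, 100, 112], 15)` gives `[1, 1, -1, -1, 1, 1, 0]`. A larger threshold gives fewer, longer legs, which is why the value has to be tuned per instrument and timeframe together with the predictors.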

Good luck

 
Vladimir Perervenko:

The ZZ parameter differs for each instrument and timeframe. For example, for EURUSD M15 a good starting value is 15 pips (4-digit quotes). It also depends on the predictors you use. It's a good idea to optimize the predictor parameters and the ZZ parameter together. That is why it is desirable to have non-parametric predictors: it makes life much easier. Digital filters, for example, show good results. Using ensembles and cascade combining, I got an average Accuracy = 0.83. This is a very good result. Tomorrow I will send an article for review that describes the process.

Good luck

Very interesting. Looking forward to it.

 
Grail:

If you share bid and ask series from 2004 to the present, I will try. I usually train on 1-3 years and test on 20-30%.

Dataset, learn and test sets, as well as raw series from Dukascopy
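A chronological split along these lines (train on a window of history, test on the segment that follows) keeps the test data strictly after the training period. A minimal sketch, assuming the rows are already sorted by time and reading "20-30%" as a fraction of the whole series (the post does not say which base is meant); the function name is illustrative:

```python
def chronological_split(rows, test_fraction=0.25):
    """Split time-ordered rows into (train, test) without shuffling.

    The test part is always the most recent `test_fraction` of the rows,
    so nothing from the test period is visible during training.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = round(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]
```

Unlike a random split, this never lets the model see bars that come after the first test bar.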

Nice curve :) But it is unlikely to intrigue anyone: it is not clear what software you used or how you calculated this curve. On your dataset I got little more than 52% accuracy. By the way, in your data the labels end before the features; I cut those rows off on my side. You should also add the price slices from which you built the learn and test sets, so that the classifier's output can then be run through a backtester.

PS: in fact, neither tester equity curves nor, as it turned out, classification/regression quality reports can prove anything to the public. Some time ago, in a closed algotrading group, there was an interesting idea: agree on an interface for exchanging ready-made models, for example as C++ DLLs (which all algotraders and ML traders use anyway). Such a model takes a batch of past rows in JSON as input, is then fed new data (candles, ticks, tick bars, etc.), and outputs forecasts. In short, the idea is to exchange standardized "black boxes" that can be checked on the tester once the future arrives and the data becomes available. Only this way can you tell whether a model works or not. You could also do it through a web API, but keeping a VPS for this is cumbersome, especially with many models. Otherwise, all these accuracy figures, Sharpe ratios, etc. mean little: there are 100500 ways to fit unintentionally and just as many to do it deliberately, and nobody is going to sort that out; stronger evidence is needed.
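The interface idea is only described in words, and the original proposal was a C++ DLL. To keep all examples in one language, here is a Python sketch of the same contract, with every name hypothetical: the model is primed with past rows as a JSON array, then receives one new row at a time and returns a forecast, so the forecasts can be collected now and checked later against data that does not exist yet. The toy model exists only to make the contract testable.

```python
import json
from abc import ABC, abstractmethod


class BlackBoxModel(ABC):
    """Hypothetical standardized contract for exchanging ready-made models."""

    @abstractmethod
    def load_history(self, rows_json: str) -> None:
        """Receive past rows (e.g. candles) as a JSON array of objects."""

    @abstractmethod
    def update(self, row_json: str) -> float:
        """Receive one new row as JSON and return a forecast for the next step."""


class LastChangeModel(BlackBoxModel):
    """Toy model: forecasts the sign of the last close-to-close change."""

    def __init__(self):
        self.closes = []

    def load_history(self, rows_json):
        self.closes = [row["close"] for row in json.loads(rows_json)]

    def update(self, row_json):
        self.closes.append(json.loads(row_json)["close"])
        if len(self.closes) < 2:
            return 0.0
        change = self.closes[-1] - self.closes[-2]
        return 1.0 if change > 0 else -1.0 if change < 0 else 0.0


def replay(model, history_json, future_rows):
    """Feed rows that arrive after the model was shared and collect forecasts."""
    model.load_history(history_json)
    return [model.update(json.dumps(row)) for row in future_rows]
```

The point of the design is that the evaluator only needs the contract, not the model's internals: forecasts are recorded as new rows arrive and scored afterwards.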

 
Maxim Dmitrievsky:

For classification, you can evaluate it with the relative classification error or logloss (cross-entropy); for regression, RMSE will do. You can also measure the difference in error between train and test and aim for the smallest gap.
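The metrics named above can be sketched in a few lines of plain Python. The function names are illustrative; `log_loss` expects predicted probabilities of class 1, and the gap is test error minus train error, so a value near zero means the model behaves the same on both sections.

```python
import math


def error_rate(y_true, y_pred):
    """Share of misclassified examples (relative classification error)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def train_test_gap(metric, train, test):
    """Test error minus train error; usually positive, smaller is better."""
    return metric(*test) - metric(*train)
```

Usage: `train_test_gap(error_rate, (y_train, pred_train), (y_test, pred_test))`, where each pair is (true labels, predictions) for that section.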

That's just how the settings are chosen: so that train, validation and test have about the same separation in the predicted classes.

The thing is, a forest can easily overfit even with shallow trees, and if the trees are grown to pure leaves, overfitting is even more likely.

So how do you avoid this? Well, here we go again with "garbage in, garbage out". Do non-"garbage" predictors even exist?

In theory, you take ROC_AUC, and if the predictor has anything worthwhile, the value should stop growing along the horizontal axis. But having gone through all of them, I haven't found any such predictor.

It draws a strictly straight line upwards.

But, of course, no indicator will drive the market up like that on historical data.)

I have tried it many times but was not so impressed.
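The post does not show how ROC_AUC was computed. For reference, a common way to get the ROC AUC of a single predictor against binary labels is the rank (Mann-Whitney) formulation: the share of (positive, negative) pairs in which the positive example has the higher predictor value, with ties counted as half. A value near 0.5 means the predictor separates nothing. This pure-Python sketch is quadratic in the sample size, which is fine for illustration but not for large datasets.

```python
def roc_auc(y_true, scores):
    """ROC AUC of one predictor against 0/1 labels via pairwise comparison."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75.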

 
forexman77:

That's just how the settings are chosen: so that train, validation and test have about the same separation in the predicted classes.

The thing is, a forest can easily overfit even with shallow trees, and if the trees are grown to pure leaves, overfitting is even more likely.

So how do you avoid this? Well, here we go again with "garbage in, garbage out". Do non-"garbage" predictors even exist?

In theory, you take ROC_AUC, and if the predictor has anything worthwhile, the value should stop growing along the horizontal axis. But having gone through all of them, I haven't found any such predictor.

It draws a strictly straight line upwards.

But, of course, no indicator will drive the market up like that on historical data.)

If the error on the validation section is the same as on the train section, everything should work. Obviously that's not the case for you.

 
Maxim Dmitrievsky:

If the error on the validation section is the same as on the train section, everything should work. Obviously that's not the case for you.

Well, not exactly identical, but close. They are fully identical with a tree of depth three; I posted the picture.

I chose depth 15, which gave a more or less acceptable result on the test.

In about 20 minutes I'll post the breakdown by class.

 

Depth 3:

[[8010 7122]
 [7312 8410]]
train (reversed)

[[8026 7105]
 [7209 8512]]
train

[[5538 5034]
 [5117 5395]]
Prediction of the model trained on train, on data that did not take part in training.
To clarify: the test data are not taken from unused samples; they are data that are not available to the algorithm at all during training (they lie outside the time span of the training section).

Depth 15:

[[7667 7464]
 [7227 8494]]
train (reversed)

[[14430   702]
 [  661 15061]]
train

[[5405 5167]
 [4958 5554]]
test
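To put the matrices above on a common scale, the sketch below computes accuracy and per-class recall/precision. It assumes the matrices follow scikit-learn's `confusion_matrix` layout (rows = true class, columns = predicted class, class 0 first), which the printed format suggests but the post does not state; accuracy does not depend on that assumption, the per-class figures do.

```python
def summarize(cm):
    """Accuracy and per-class recall/precision of a 2x2 confusion matrix.

    Assumes rows = true class, columns = predicted class, class 0 first.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall_0": tn / (tn + fp),
        "recall_1": tp / (fn + tp),
        "precision_0": tn / (tn + fn),
        "precision_1": tp / (fp + tp),
    }


# Matrices copied from the post above.
matrices = {
    "depth 3, test": [[5538, 5034], [5117, 5395]],
    "depth 15, train": [[14430, 702], [661, 15061]],
    "depth 15, test": [[5405, 5167], [4958, 5554]],
}
for name, cm in matrices.items():
    print(name, {k: round(v, 3) for k, v in summarize(cm).items()})
```

On the posted numbers, test accuracy comes out at about 0.519 for depth 3 and about 0.520 for depth 15, while depth 15 reaches about 0.956 on train.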

At the same time, although depth 15 obviously leads to overfitting, the forward test is better with it. I see the same with my other models, as long as they are not heavily overfitted.

Forward tests: [charts for depth 15 and depth 3]

It turns out that the model predicts labels of the target class 4-6% more than those of the negative class.

 
Gianni:

Nice curve :) But it is unlikely to intrigue anyone: it is not clear what software you used or how you calculated this curve. On your dataset I got little more than 52% accuracy. By the way, in your data the labels end before the features; I cut those rows off on my side. You should also add the price slices from which you built the learn and test sets, so that the classifier's output can then be run through a backtester.

PS: in fact, neither tester equity curves nor, as it turned out, classification/regression quality reports can prove anything to the public. Some time ago, in a closed algotrading group, there was an interesting idea: agree on an interface for exchanging ready-made models, for example as C++ DLLs (which all algotraders and ML traders use anyway). Such a model takes a batch of past rows in JSON as input, is then fed new data (candles, ticks, tick bars, etc.), and outputs forecasts. In short, the idea is to exchange standardized "black boxes" that can be checked on the tester once the future arrives and the data becomes available. Only this way can you tell whether a model works or not. You could also do it through a web API, but keeping a VPS for this is cumbersome, especially with many models. Otherwise, all these accuracy figures, Sharpe ratios, etc. mean little: there are 100500 ways to fit unintentionally and just as many to do it deliberately, and nobody is going to sort that out; stronger evidence is needed.

There are also empty (null) feature tuples at the beginning of the training and test samples. Probably there was not enough history to calculate them and the algorithm did not check for that, so they also need to be removed for things to work correctly.

And where is this group, if it's not a secret? Is it possible to take a look?
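Both fixes mentioned in this exchange (dropping warm-up rows whose features are empty, and cutting feature rows that have no label because the labels end earlier) can be sketched as one helper. It assumes features and labels are aligned by index from the start and that "empty" means every value is `None` or 0; both are assumptions, and a real feature that is legitimately all zeros would also be dropped. The function name is illustrative.

```python
def clean_and_align(features, labels):
    """Drop leading empty feature rows and trailing rows without a label.

    features: list of feature tuples, aligned by index with labels.
    labels: list of labels, possibly shorter than features.
    """
    n = min(len(features), len(labels))   # keep only rows that have a label
    start = 0
    while start < n and all(v is None or v == 0 for v in features[start]):
        start += 1                        # skip warm-up rows with no history
    return features[start:n], labels[start:n]
```

For example, with features `[(0, 0), (None, 0), (1.2, 0.3), (0.5, 0.0), (0.7, 0.1)]` and labels `[1, 0, 1, 0]`, the result keeps rows 2 and 3 only.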

 
forexman77:

Depth 3:

Depth 15:

At the same time, although depth 15 obviously leads to overfitting, the forward test is better with it. I see the same with my other models, as long as they are not heavily overfitted.

Forward tests: [charts for depth 15 and depth 3]


I think you need to reduce the number of trades; it looks like there is one on every bar...
