Machine learning in trading: theory, models, practice and algo-trading - page 104

 
Dr.Trader:

Vtreat is better. It evaluates each predictor statistically: how good or bad it is overall for predicting the target variable, without reference to any particular model. It is recommended to keep only predictors with a score of no more than 1/(number of predictors): for example, if you have 200 predictors, take only those whose score is less than 1/200. And if you score the predictors and every estimate turns out to be above that threshold, then instead of fruitlessly trying to train a model and predict new data, it is better to start looking for other predictors right away.

There are a couple of disadvantages: the package works with predictors one at a time and does not take their interactions into account. I also don't like that even if there are identical or highly correlated predictors, vtreat will not remove the redundant ones; sometimes this is very annoying.
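A minimal sketch of that filtering in R (the data frame d, the target column y and its positive level TRUE are assumptions for illustration):

library(vtreat)

vars <- setdiff(colnames(d), "y")  # candidate predictors
treatments <- designTreatmentsC(d, varlist = vars,
                                outcomename = "y", outcometarget = TRUE)

# scoreFrame$sig is the per-variable significance: lower is better
sf <- treatments$scoreFrame
keep <- sf$varName[sf$sig < 1/length(vars)]  # the 1/(number of predictors) rule

if (length(keep) == 0) {
  # every estimate is above the threshold: look for other predictors instead
  stop("no predictor passed the 1/nvars filter")
}

# prepare() can apply the same pruning directly
dTreated <- prepare(treatments, d, pruneSig = 1/length(vars))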

Actually, correlated predictors are evil.

Maybe the package needs general pre-processing of the predictors: scaling, centering, removing correlated ones... as in caret?

Maybe so?

 
SanSanych Fomenko:

Actually, correlated predictors are evil.

Maybe the package needs general pre-processing of the predictors: scaling, centering, removing correlated ones... as in caret?

Maybe so?

By the way, nobody has cancelled data mining. There is an excellent article on this topic here on the site, written by a colleague of ours. Unfortunately, the author does not take part in this thread.
 
Dr.Trader:

You have developed a good, solid toolkit for evaluating heuristics. You have shown that the model-training approach you developed (the committee) is not suitable for forex. But what next?


I wouldn't close the book on the method yet. It is intriguing if only because a third of the best models from training and testing pass another 5 years of validation in the plus, even when everything else loses money.

Besides, there is another thought about my graph: if a model is so good that 95% of its validation values lie above 0, then you can forget about the validation/test relationship and take any trained model.

That is the point of looking for powerful models (ones that generalize well).

 
I always read topics like this (and not only on this forum) where people try to build complex trading theories:
genetic algorithms, neural networks, intricate formulas that only the author understands, and so on.

And I always see that such systems do not work in the market. Their monitoring goes either to zero or into the minus.
Yet in a neighbouring thread someone earns with an Expert Advisor built on two moving averages. And earns good money.

The question is: does all of this make sense?
Because in my experience, the simpler and clearer a system is, the more profitable it is.
 
SanSanych Fomenko:

.... But only after sifting out the noise. And the absence of noise is established by the approximate invariance of the model's performance on different samples: not the absolute value of the prediction error, but the fact that the performance indicators are approximately equal, which (the equality) can be interpreted as proof that the model is not overtrained.

I want to respond to you as well.

Here you are looking at the equality of indicators and calling it absence of overtraining. But have you tried to really assess the absence of overtraining by testing your selected model on another large sample, a delayed (hold-out) sample? Couldn't equality on one part of the data degenerate into a fit to that data, with the model losing money in the future? I stick to this scheme in my research.
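A minimal sketch of such a check in R (the data frame d, ordered by time, with a 0/1 column class is an assumption; any classifier would do, here a plain logistic regression):

n       <- nrow(d)
train   <- d[1:floor(0.50*n), ]
test    <- d[(floor(0.50*n)+1):floor(0.75*n), ]
delayed <- d[(floor(0.75*n)+1):n, ]  # the delayed (hold-out) sample

fit <- glm(class ~ ., data = train, family = binomial)

acc <- function(df) mean((predict(fit, df, type = "response") > 0.5) == df$class)

# roughly equal accuracy on all three samples is the "no overtraining" sign;
# a large drop on the delayed sample means the model was fitted to earlier data
c(train = acc(train), test = acc(test), delayed = acc(delayed))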

 
Read about elasticnet. It is both a method and an R package: hybrid L1/L2 regularization for linear models. Correlation between predictors is handled there.
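For reference, a minimal sketch with glmnet, a common R implementation of the elastic net (the predictor matrix x and binary response y are assumptions; alpha mixes the two penalties):

library(glmnet)

# alpha = 1 is pure lasso (L1), alpha = 0 pure ridge (L2);
# intermediate values shrink groups of correlated predictors together
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

coef(cvfit, s = "lambda.min")  # coefficients at the best cross-validated lambda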
 
Does anyone train their models separately for buys and sells?
 
Alexey Burnakov:

I want to respond to you as well.

Here you are looking at the equality of indicators and calling it absence of overtraining. But have you tried to really assess the absence of overtraining by testing your selected model on another large sample, a delayed (hold-out) sample? Couldn't equality on one part of the data degenerate into a fit to that data, with the model losing money in the future? I stick to this scheme in my research.

I have my own algorithm for sifting out noise.

If it is applied and the model is then trained on the selected predictors, that model's performance is approximately equal on any sample. My thoughts about confidence intervals come from here: to get rid of the words "approximately equal".

I'll say even more.

In practice it looks different.

You have to work in a window. So, to work in the window, from my pre-selected set of predictors I select predictors with rfe from caret. For a particular window I get a subset that lowers the error by 5-7%. I do this once a week on H1; the subset changes at the next weekend. I've been doing it this way since last year.
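A minimal sketch of that selection with caret::rfe (the window data frame win with a factor column class is an assumption; the random-forest ranking functions rfFuncs are one common choice):

library(caret)

ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)

# recursive feature elimination over the current window's data
res <- rfe(x = win[, setdiff(names(win), "class")],
           y = win$class,
           sizes = c(4, 8, 16),  # candidate subset sizes to try
           rfeControl = ctrl)

predictors(res)  # the subset kept for this window; redone weekly as the window moves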

But getting rid of the noise beforehand is a must. If I don't do it, I start seeing miracles.

 
Does anyone train their models separately for buys and sells?

I predict only 2 classes, "buy" and "sell", so I always have some trade open. I work with one model; I see no reason to make two models that simply give opposite results.

But I would like to move gradually to 3 classes: "buy" / "close everything and don't trade" / "sell". That would make it possible to trade a more complex strategy. I've tried it a couple of times, but I had trouble training models on three classes, especially when the model is a regression whose result is then rounded to classes.
I think it's worth trying to create two models in which the original 1/0/-1 classes are transformed into 1/0/0 for the first model (buy only) and into 0/0/1 for the second model (sell only). This leads to unbalanced classes in each model (there are far more examples of one class than of the other), but I have found good metrics for evaluating models under such conditions: F-score and kappa. I haven't done anything in this direction yet, but such a plan looks feasible enough.
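A minimal sketch of that relabeling and of both metrics in R (the vector labels holding 1/0/-1 and the factor of predictions predictedBuy are assumptions for illustration; caret::confusionMatrix reports both Kappa and F1):

library(caret)

# two binary targets from the original 1/0/-1 labels
buyTarget  <- factor(ifelse(labels ==  1, "buy",  "none"))  # 1/0/-1 -> 1/0/0
sellTarget <- factor(ifelse(labels == -1, "sell", "none"))  # 1/0/-1 -> 0/0/1

# train one binary model per target, then evaluate each, e.g.:
cm <- confusionMatrix(data = predictedBuy, reference = buyTarget,
                      mode = "everything", positive = "buy")
cm$overall["Kappa"]  # kappa is robust to the class imbalance
cm$byClass["F1"]     # F-score for the rare "buy" class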

 
SanSanych Fomenko:

Actually, correlated predictors are evil.

Maybe the package needs general pre-processing of the predictors: scaling, centering, removing correlated ones... as in caret?

Maybe so?

No, unfortunately vtreat simply doesn't analyze predictor interactions at all; it studies them strictly one at a time. Not a grail of a package :(
I don't think scaling or centering would make any difference. And if you enable the y-aware option, the package will scale and center the data itself.

An interesting article from Vladimir, thanks for the link. The analysis of predictor interactions is right on topic.
