Machine learning in trading: theory, models, practice and algo-trading - page 496

 
Dr. Trader:

Can the forest extrapolate? Yes.
Does it do it well? No.


What is good and what is bad?

Do you have a comparative analysis of different models? Done from the beginning: checking the suitability of particular predictors for a particular target, the suitability of a particular set of predictors for a particular model, and then an evaluation with a mandatory run on a file outside the training files, with justification that the models are not over-trained?


With all this in place, it will be possible to judge what is good and what is bad for a particular set of predictors and target. At the same time it should be understood that there is most likely a different set of predictors and a different target that would give a different result.


For my particular case I did this kind of work and posted the results in this thread several times. The models in order from best to worst: ada, rf, SVM. The worst was the neural network, but that was some ancient version; I did not use modern ones. All of this is subject to the conditions above.
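As an illustration of that workflow, here is a minimal Python sketch (my own, not SanSanych's original R setup): several models are fitted on one part of a synthetic data set and then scored on a strictly later part that plays the role of the "file outside the training files". The scikit-learn classes stand in for the ada / rf / SVM models mentioned above.

```python
# A minimal sketch of the evaluation workflow described above, on synthetic
# data: fit several models, then score them on a chronologically later block
# that was never seen in training. A large gap between the two scores is the
# usual sign of over-training.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=3000) > 0).astype(int)

# Chronological split: the last third plays the role of the out-of-sample file.
X_train, X_oos = X[:2000], X[2000:]
y_train, y_oos = y[:2000], y[2000:]

models = {
    "ada": AdaBoostClassifier(n_estimators=200, random_state=0),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(kernel="rbf", C=1.0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc_train = accuracy_score(y_train, model.predict(X_train))
    acc_oos = accuracy_score(y_oos, model.predict(X_oos))
    print(f"{name}: train={acc_train:.3f}  out-of-sample={acc_oos:.3f}")
```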

 
Dr. Trader:

Here's an interesting example; I posted it in this thread some time ago.
Extrapolation in this case would be predicting outside the "cloud of known points".

If the known points are well clustered, you can see that extrapolation is not a problem for most models.
But if the known points were arranged more randomly, without obvious clusters, then the prediction itself would be worse, and the extrapolation would not be credible.

It's all about the predictors: if you feed all kinds of garbage into the model, you really won't get good extrapolation.
I don't think it's possible to find ideal predictors for forex; I would never trade with extrapolation based on financial data.


Extrapolation is prediction at unknown points; if the points lie outside the max and min of the training sample, then RF will always output the max and min of the training sample.

Or maybe you are just confusing it with approximation?
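A small sketch of that behaviour (assuming scikit-learn; my own toy data, not from this thread): a forest trained on a 1-D linear trend returns values stuck near the edge of the training target range once the input goes past the training maximum, while a linear model keeps extending the trend.

```python
# Outside the range seen in training, the forest's prediction saturates near
# the training target's maximum, while linear regression keeps extrapolating
# the trend.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

X_train = np.arange(0.0, 100.0).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + 5.0            # simple linear trend, max = 203

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

X_new = np.array([[150.0], [200.0]])             # well beyond the training range
print("forest :", rf.predict(X_new))             # both values stuck near 203
print("linear :", lin.predict(X_new))            # 305 and 405, following the trend
```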

 
Dr. Trader:

Here's an interesting example; I posted it in this thread some time ago.
Extrapolation in this case would be predicting outside the "cloud of known points".

If the known points are well clustered, you can see that extrapolation is not a problem for most models.
But if the known points were arranged more randomly, without obvious clusters, then the prediction itself would be worse, and the extrapolation would not be credible.

It's all about the predictors: if you feed all kinds of garbage into the model, you really won't get good extrapolation.
I don't think it's possible to find ideal predictors for forex; I would never trade with extrapolation based on financial data.


The question of trust in statistics in general is philosophical.

Now take classification.

Does the very notion of "extrapolation" even apply to it? To my mind, no. Classification finds patterns and then tries to distribute new data according to those patterns.


Extrapolation exists in analytical models that have some function in analytical form.


And ARIMA? Does it have extrapolation? It depends. The model itself takes the last few bars, usually just one, but parameter selection requires thousands of bars. That thousand is extrapolated, while the single bar in the last calculation is not.


I do not think that extrapolation in its mathematical sense is applicable to the financial markets.
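As a small illustration of the ARIMA remark (assuming the statsmodels library and a synthetic random-walk series): the coefficients are estimated on the whole history of a couple of thousand bars, but the forecast itself is computed from the last few observations of that history.

```python
# ARIMA on a synthetic random walk: parameters are fitted on the whole
# history, while the forecast is driven by the last few observations.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
prices = 100.0 + np.cumsum(rng.normal(size=2000))   # random-walk "quotes"

res = ARIMA(prices, order=(1, 1, 1)).fit()          # estimated on ~2000 bars
print(res.params)                                   # AR/MA coefficients, noise variance
print(res.forecast(steps=5))                        # next 5 values from the last bars
```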

 
SanSanych Fomenko:

The question of trust in statistics in general is philosophical.

Now take classification.

Does the very notion of "extrapolation" even apply to it? To my mind, no. Classification finds patterns and then tries to distribute new data according to those patterns.


Extrapolation exists in analytical models that have some function in analytical form.


And ARIMA? Does it have extrapolation? It depends. The model itself takes the last few bars, usually just one, but parameter selection requires thousands of bars. That thousand is extrapolated, while the single bar in the last calculation is not.


I don't think extrapolation in its mathematical sense is applicable to the financial markets.


Extrapolation in ML is the ability of the model to work on new data, and it is a special type of approximation. On the training sample your model APPROXIMATES; on new data not in the training sample it EXTRAPOLATES.

That's why the linear regression example is given in comparison with XGBoost, which you didn't read carefully: linear regression extrapolates perfectly, while anything built on decision trees CANNOT extrapolate, because of how decision trees are structured.

 

Linear regression in general is valid, and in particular extrapolates, ONLY on stationary series with normally distributed model residuals. There are a huge number of restrictions on its application that make this type of model useless for financial series.

Either one deals with the APPLICABILITY of the models to one's specific data, and then it is modelling; in all other cases it is a numbers game.

A huge number of posts in this thread are a numbers game, since no evidence is given to prove otherwise.

 
SanSanych Fomenko:

Linear regression in general is valid, and in particular extrapolates, ONLY on stationary series with normally distributed model residuals. There are a huge number of restrictions on its application that make this type of model useless for financial series.

Either one deals with the APPLICABILITY of the models to one's specific data, and then it is modelling; in all other cases it is a numbers game.

A huge number of posts in this thread are a numbers game, since no evidence is given to prove otherwise.


What does linear regression have to do with it? The question was how to use forests correctly, so as not to make stupid mistakes such as thinking that they can EXTRAPOLATE.

Feed the forest a time series in the form of quotes, and if the new data goes beyond the range of the studied series, the model will only ever predict the maximum and minimum values of that series.
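A sketch of that claim (scikit-learn, synthetic "quotes"): a forest is trained to predict the next price from five lagged prices on the lower part of a rising series; on the later, higher prices its predictions never exceed the maximum target seen in training.

```python
# Feed the forest raw prices on a rising series: predictions on new, higher
# prices are capped by the maximum target value seen during training.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

prices = np.linspace(100.0, 200.0, 1000)                      # steadily rising quotes
lags = np.column_stack([prices[i:i - 5] for i in range(5)])   # 5 lagged prices per row
target = prices[5:]                                           # next price

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(lags[:500], target[:500])               # train only on the lower-priced half

preds = rf.predict(lags[500:])                 # predict on the higher-priced part
print(preds.max(), "<=", target[:500].max())   # never exceeds the training maximum
```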

 
Aliosha:

What a mess, gentlemen...

a little information from KO (Captain Obvious):


In financial markets, extrapolation/interpolation is applicable and very much in demand.


If it is "applicable and in demand", then why have you piled up a successful TS for all these years?

P.S. I hear a cat screaming... Right, I think Alyosha wrote something again!

 
Aliosha:

What a mess, gentlemen...

a little information from KO (Captain Obvious):

Extrapolation and interpolation in the ML context are ONE AND THE SAME! In both cases you need to get the value (int, float[]) of a point that does NOT coincide with any point from the training dataset. Reservations about where the point lies in hyperspace relative to the training point cloud are MEANINGLESS, since it all depends on the features and the structure of the feature space: in one projection the point will be "outside" the training cloud, in another "inside". It does not matter; all that matters is WHAT IS NOT IN TRAINING, period.

To summarize: if the point is not in the training dataset, the result of its classification or regression will be both extrapolation and interpolation, depending on how the subject area interprets the result, but for the ML algorithm THEY ARE ONE AND THE SAME.

The forest extrapolates just fine! In capable hands it is better, and orders of magnitude faster, than a neural network.

In financial markets, extrapolation/interpolation is applicable and very much in demand.


Separate advice for Maxim: a clever person is wrong more often than a fool, because he runs far more tests, but only the fool is emotionally attached to his point of view and finds it hard to part with it. You choose which one you are)))


OK, give me at least one article with an example showing how well forests extrapolate. I couldn't find any.

Which, in my opinion, is not "great".

And how are you going to tell when a point is inside the cloud and when it is outside, when you have many different features? And what does that have to do with it anyway, when what matters more is the range of target values in training: once all the trees are built, the target can NEVER go outside that range.
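On the question of telling whether a point is inside or outside the cloud when there are many features, here is one crude heuristic (my own sketch, plain NumPy): compare each feature of the new point against the per-feature range seen in training. It is only a bounding-box test, so a point can pass it and still lie far from every training sample, but it catches the obvious out-of-range cases discussed above.

```python
# Crude "outside the cloud" check: flag a point if any of its features falls
# outside the per-feature min/max observed in the training set.
import numpy as np

def outside_training_box(X_train, X_new):
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)

rng = np.random.default_rng(2)
X_train = rng.normal(size=(1000, 20))
X_new = rng.normal(size=(10, 20)) * 3.0          # deliberately wider spread
print(outside_training_box(X_train, X_new))      # True for the out-of-box points
```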


 
Maxim Dmitrievsky:

Linear regression is excellent at extrapolation, while anything involving decision trees CANNOT extrapolate

Extrapolation involves predicting new data beyond the predictor values known during training.

Here is a piece of an old picture: everything shaded green is extrapolation, and judging by the picture the forest can do it, otherwise that area would be colored white (as is the case with some SVM models).


The forest, the neural network and the linear model all know how to extrapolate in some fashion. If you feed them data far away from the known values, all of these models will still give you a prediction; each has some algorithm for such cases.

But why do you think that if a linear model extrapolates with the formula y = ax + b it is doing fine, while if the forest does it with the nearest known neighbour it can't do anything? Both of these algorithms have a right to exist. As SanSanych said, for each set of predictors and target you need to do the research and compare the models; only then can you say whether a model extrapolates well.
What is written in the articles on Habr also applies to specific predictors and a specific target; it is not a truth that holds in all cases, it is a specific study of a particular case.
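As a small counterpart to the picture described above (my own sketch with scikit-learn, not the original plot): a forest, a linear model and an SVM are trained on two well-separated 2-D clusters and then asked about a point far outside the cloud of known points. All three still return a class label, each by its own rule for the unknown region.

```python
# Three classifiers queried far outside the training cloud: each one still
# returns a prediction, using its own rule for the unknown region.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=-2.0, size=(200, 2)),
               rng.normal(loc=+2.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

far_point = np.array([[50.0, -50.0]])            # far from both clusters

for model in (RandomForestClassifier(random_state=0),
              LogisticRegression(max_iter=1000),
              SVC(kernel="rbf")):
    model.fit(X, y)
    print(type(model).__name__, model.predict(far_point))
```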

 
Dr. Trader:

Extrapolation implies predicting new data beyond the predictor values known during training.

Here is a piece of an old picture: everything shaded green is extrapolation, and judging by the picture the forest can do it, otherwise that area would be colored white (as is the case with some SVM models).


The forest, the neural network and the linear model all know how to extrapolate in some fashion. If you feed them data far away from the known values, all of these models will still give you a prediction; each has some algorithm for such cases.

But why do you think that if a linear model extrapolates with the formula y = ax + b it is doing fine, while if the forest does it with the nearest known neighbour it can't do anything? Both of these algorithms have a right to exist. As SanSanych said, for each set of predictors and target you need to do the research and compare the models; only then can you say whether a model extrapolates well.
What is written in the articles on Habr also applies to specific predictors and a specific target; it is not a truth that holds in all cases, it is a specific study of a particular case.


You just have to study how the trees are constructed.

