Machine learning in trading: theory, models, practice and algo-trading - page 497

 
Dr. Trader:

But why do you think that when a linear model extrapolates by the formula y = ax + b it does so perfectly, while a forest, which extrapolates by the nearest known neighbor, can do nothing at all? Both of these algorithms have a right to exist.


I'm not claiming anything; I showed you an example and a bunch of articles. What difference does it make whether LR does it perfectly or not? The point is that RF cannot extrapolate at all, by construction, ever, under any conditions; LR is given only for comparison and clarity.

That's exactly what I'm asking: I just asked for examples that would help me UNDERSTAND why you think that isn't true :)

 
Aliosha:



What do "articles" have to do with it? Are you kidding me? I gave you the example of Minsky, who is something like Newton, only in ML, and even he screwed up that spectacularly, and you point me to posts on Habr or scripts in R (read: you did not build the algorithm yourself, you just tweaked a few parameters).


If you had built the forest yourself in C++ you would have figured out how to do "extrapolation" à la MLP, but in R... Godspeed...


I don't know any Minsky and Pozharsky, and I don't understand what is on your charts ) You need to train RF on some set with targets from 0 to 10, or up to 100, and then ask it for an answer that should obviously be above 100; RF will output only 100.

Here is what the author has in the article:

#  set up functionality for modelling down the track
library(xgboost) #  extreme gradient boosting
library(nnet) #  neural network
library(ranger) # for random forests
library(rpart) # for demo single tree
library(rpart.plot)
library(viridis) # for palette of colours
library(grid) # for annotations

#  sample data - training set
set.seed(134) # for reproducibility
x <- 1:100 + rnorm(100)
y <- 3 + 0.3 * x + rnorm(100)

#  extrapolation / test set, has historical data plus some more extreme values
extrap <- data.frame(x = c(x, 1:5 * 10 + 100))

mod_lm <- lm(y ~ x)
mod_nn <- nnet(y ~ x, size = 8, linout = TRUE)

#  XG boost.  This is a bit more complicated as we need to know how many rounds
#  of trees to use.  Best to use cross-validation to estimate this.  Note - 
#  I use a maximum depth of 2 for the trees which I identified by trial and error
#  with different values of max.depth and cross-validation, not shown
xg_params <- list(objective = "reg:linear", max.depth = 2)
mod_cv <- xgb.cv(label = y, params = xg_params, data = as.matrix(x), nrounds = 40, nfold = 10) #  choose nrounds that gives best value of root mean square error on the training set
best_nrounds <- which(mod_cv$test.rmse.mean == min(mod_cv$test.rmse.mean))
mod_xg <- xgboost(label = y, params = xg_params, data = as.matrix(x), nrounds = best_nrounds)
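#  note (added, not from the original article): newer xgboost releases rename
#  the objective "reg:linear" to "reg:squarederror", prefer the spelling
#  max_depth, and return the cross-validation metrics in
#  mod_cv$evaluation_log (column test_rmse_mean) rather than as top-level columns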

mod_rf <- ranger(y ~ x)

p <- function(title) {
    plot(x, y, xlim = c(0, 150), ylim = c(0, 50), pch = 19, cex = 0.6,
        main = title, xlab = "", ylab = "", font.main = 1)
    grid()
}

predshape <- 1

par(mfrow = c(2, 2), bty = "l", mar = c(7, 4, 4, 2) + 0.1)

p("Linear regression")
points(extrap$x, predict(mod_lm, newdata = extrap), col = "red", pch = predshape)

p("Neural network")
points(extrap$x, predict(mod_nn, newdata = extrap), col = "blue", pch = predshape)

p("Extreme gradient boosting")
points(extrap$x, predict(mod_xg, newdata = as.matrix(extrap)), col = "darkgreen", pch = predshape)

p("Random forest")
fc_rf <- predict(mod_rf, data = extrap)
points(extrap$x, fc_rf$predictions, col = "plum3", pch = predshape)

grid.text(0.5, 0.54, gp = gpar(col = "steelblue"),
          label = "Tree-based learning methods (like xgboost and random forests)\nhave a particular challenge with out-of-sample extrapolation.")
grid.text(0.5, 0.04, gp = gpar(col = "steelblue"),
          label = "In all the above plots, the black points are the original training data,\nand coloured circles are predictions.")

I understand R poorly; all I understand is that from 100 to 150 RF should have predicted an adequate result, like the other models, but that did not happen.


 
Alyosha:

It should not. It gives a local interpolation of the nearest points, like kNN (a quasi-optimal classifier), only rougher. You simply don't know how to rotate the bases in the RF trees, so it comes out looking like "cubes".


Well, in the previous post I added code with a screenshot; what exactly is "wrong" there?

 
Alyosha:

The fact that in the forest algorithm the trees split the points orthogonally, on one attribute at a time; if you rotate the basis you get the same thing as in an MLP, but for that you have to get into the forest's code and fix it, or write your own forest)))
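A small illustration (not from the thread, synthetic data) of what "splitting orthogonally on one attribute" means: a CART tree can only cut with axis-aligned thresholds, so a diagonal class boundary needs a staircase of splits, while the same boundary after a hand-made 45-degree rotation of the basis needs essentially one split. This only shows the geometry of the splits, not the forest modification being described.

library(rpart) # single decision tree, already loaded in the article code above

set.seed(42)
n   <- 500
x1  <- runif(n); x2 <- runif(n)
cls <- factor(ifelse(x1 + x2 > 1, "A", "B"))   # the class boundary is diagonal in (x1, x2)
d   <- data.frame(x1, x2, cls)

# axis-aligned thresholds only: the diagonal is approximated by a staircase of splits
tree_axis <- rpart(cls ~ x1 + x2, data = d)
nrow(tree_axis$frame)                          # many nodes

# rotate the basis by 45 degrees: the boundary becomes a single threshold on r1
d$r1 <- (x1 + x2) / sqrt(2)
d$r2 <- (x1 - x2) / sqrt(2)
tree_rot <- rpart(cls ~ r1 + r2, data = d)
nrow(tree_rot$frame)                           # essentially one split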


I'm sorry, but that would be a different kind of forest. I meant the classic version.

I'm trying to master what already exists; as for writing something of my own there...

So in the end, the classical RF simply cannot extrapolate.

 
Aliosha:

In ML there are no "classics"; there is only what works and solves the problem. Mastering other people's algorithms in all their diversity makes about as much sense as studying the code of every indicator in the CodeBase and the Market, which is to say, not much...

There are not that many basic heuristics in ML, and those you should master yourself, by hand, until they "bounce off your fingers": you wake up in the middle of the night and type out gradient boosting in C++ from memory in half an hour (just kidding). It is not as hard as it seems, and after that you can generate 100500 variations of algorithms from Habr articles on your own.


Ohoho...

Dr. Trader has repeated it twice; I'll repeat it to you a third time. They say both God and the devil want to hear things three times, so it must mean something, in a mystical context...

New points in the FEATURE SPACE are NOT, relative to physical time, located strictly outside the entire cloud of existing points. Time is time, and features are features; physical time is not linearly tied to, say, momentum or a spectrum. The "extrapolated" points can land anywhere in your feature space, both inside and outside.


And I didn't say that; I only said that the structure of the trees is such that if they branched on all the target values seen in training, the model will output strictly the values it branched on, and it cannot output any new values... at least that is what the article says, with the example. I'll make my own examples and then show what I get :) If the limit value of the target in training is 100, the output cannot be more than 100... because all values above 100 will fall into the leaf for 100; it physically has no leaves with values greater than 100.
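For illustration, a minimal sketch (not from the article, made-up data) that checks this claim directly with ranger: a forest whose training targets never exceed 100 does not predict above 100, because each leaf can only hold an average of training targets.

library(ranger) # random forest, same package as in the article code

set.seed(1)
train <- data.frame(x = 1:100)
train$y <- train$x                          # targets run from 1 to 100

fit  <- ranger(y ~ x, data = train)
test <- data.frame(x = c(150, 500, 1000))   # far beyond the training range

predict(fit, data = test)$predictions
# all three predictions stay close to 100, the largest target ever seen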

 
Maxim Dmitrievsky:

If the limit value of the target was 100 during training, it cannot output more than 100... because all values above 100 will fall into the leaf for 100.

Normalization was invented for a reason.
 
Yuriy Asaulenko:
Normalization was invented for a reason.

That is clear; the question of principle is about how trees function. No matter how you normalize, the tree will not extrapolate an outlier in new data; it will just return the extreme value it knows. That is why it is not necessary to normalize data for trees at all.
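A minimal sketch of that point (made-up data, not from the thread): z-scoring the feature is a monotonic transformation, so the trees partition the data the same way and the prediction for an out-of-range point stays capped at the training extreme either way.

library(ranger)

train <- data.frame(x = 1:100)
train$y <- 3 + 0.3 * train$x                                   # targets top out around 33

m <- mean(train$x); s <- sd(train$x)
train_norm <- data.frame(x = (train$x - m) / s, y = train$y)   # z-scored copy of the feature

fit_raw  <- ranger(y ~ x, data = train,      seed = 1)
fit_norm <- ranger(y ~ x, data = train_norm, seed = 1)

new_raw  <- data.frame(x = 500)                                # far outside the training range
new_norm <- data.frame(x = (500 - m) / s)                      # the same point, normalized

predict(fit_raw,  data = new_raw)$predictions
predict(fit_norm, data = new_norm)$predictions
# both come out near 33: normalization changes nothing for the tree splits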

 
Maxim Dmitrievsky:

That is clear; the question of principle is about how trees function. No matter how you normalize, the tree will not extrapolate an outlier in new data; it will just return the extreme value it knows. That is why it is not necessary to normalize data for trees at all.

I don't see a need for RF for myself yet, but for an MLP I not only normalize but also pass the input signal through a sigmoid, i.e. the dynamic range of the inputs is bounded, so outliers don't matter.
 
Maxim Dmitrievsky:

That is clear; the question of principle is about how trees function. No matter how you normalize, the tree will not extrapolate an outlier in new data; it will just return the extreme value it knows. That is why it is not necessary to normalize data for trees at all.

I think the solution in this case is simple: introduce feedback.

Respectfully.

 
Andrey Kisselyov:

I think the solution in this case is simple: introduce feedback.

Respectfully.


I already have that :) I don't care whether it can extrapolate or not... the model will predict within the known set... this is just for general education.

There is something odd in the library with the model errors: the smaller the set, the smaller the error. I haven't figured out the catch yet.
