Machine learning in trading: theory, models, practice and algo-trading - page 1605

 
mytarmailS:

What you are doing (the test on the "third" sample) is called, in GMDH terms, the "predictive power criterion".

I see you are a good expert. Could you please state the essence of GMDH in a few phrases, for non-mathematicians?

 
secret:

I see that you are a good specialist. Could you explain the essence of GMDH in a few phrases, for non-mathematicians?

A regression model with enumeration of features transformed by different kernels (polynomials, splines, it doesn't matter). The simplest model with the lowest error is preferred. It does not save you from overfitting on the market.

Roughly speaking, it is brute-forcing of models, where the simplest one is chosen based on external criteria.

it's like the basics of machine learning )

 
mytarmailS:

For example, GMDH regression simply makes a mockery of the regression of the modern random forest algorithm and all sorts of boosting...

Boosting is better at everything; if you prepare the features the way you do for GMDH, it will be better.

but it doesn't matter if you don't know what to teach

 
secret:

I see that you are a good specialist. Could you explain the essence of GMDH in a few phrases, for non-mathematicians?

I'm not an expert at all )) unfortunately....

Very simply, roughly and imprecisely: the principle of GMDH is self-organization...


For example, we have a set of features

x1,x2,x3.....x20...

from these attributes we create a set of candidate models

m1,m2,m3.....m10...

from these models the best ones are selected, from the best ones new models are created, then selection again... and so on, as long as the error on new data (previously unseen by the algorithm) keeps decreasing

The algorithm changes itself, complicates itself, organizes itself... Something like a genetic algorithm
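
A minimal sketch of that selection loop in R, just to illustrate the idea (this is not a faithful GMDH implementation; the pairwise polynomial candidates, the layer size of 6 and the stopping rule are arbitrary choices here):

# toy illustration of GMDH-style self-organization:
# pairwise polynomial candidate models are built from the current inputs,
# the best ones (by error on held-out data, the "external criterion") survive
# and become inputs for the next layer, until the held-out error stops improving
set.seed(1)
n  <- 300
X  <- matrix(rnorm(n * 6), n, 6)                 # 6 raw features
y  <- X[, 1] + X[, 2] * X[, 3] + rnorm(n, 0, 0.1)
tr <- 1:200 ; vl <- 201:300                      # train / validation split

inputs   <- X
best_err <- Inf
for (layer in 1:5) {
  cand <- list() ; err <- c()
  for (i in 1:(ncol(inputs) - 1)) for (j in (i + 1):ncol(inputs)) {
    d <- data.frame(y = y, a = inputs[, i], b = inputs[, j])
    m <- lm(y ~ a + b + I(a * b) + I(a^2) + I(b^2), data = d[tr, ])
    p <- predict(m, d)
    cand[[length(cand) + 1]] <- p
    err <- c(err, mean((y[vl] - p[vl])^2))        # external criterion: validation MSE
  }
  if (min(err) >= best_err) break                 # new layer no longer improves -> stop
  best_err <- min(err)
  inputs   <- do.call(cbind, cand[order(err)[1:6]])  # best models feed the next layer
}
best_err

Real GMDH implementations (for example the GMDHreg package used further down) differ in the candidate models and in the external criteria (PRESS and so on), but the selection loop is the same idea.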

 
Maxim Dmitrievsky:

A regression model with enumeration of features transformed by different kernels (polynomials, splines, it doesn't matter). The simplest model with the lowest error is preferred. It does not save you from overfitting on the market.

Roughly speaking, it is brute-forcing of models, where the simplest one is chosen based on external criteria.

Then I see nothing new and original in this methodology.

 
mytarmailS:

from these models the best ones are selected, from the best ones new models are created, then selection again... and so on, as long as the error on new data (previously unseen by the algorithm) keeps decreasing

The algorithm changes itself, complicates itself, organizes itself... Sounds a bit like a genetic algorithm.

Then I don't see any mathematics here; it's more brainwork, plus coding. A GA is a trivial thing.

Why then does everyone run around with this GMDH, writing dissertations that are impossible to understand, if inside it is something primitive, intuitively understandable since kindergarten?

 
Maxim Dmitrievsky:

Boosting is better at everything; if you prepare the features the way you do for GMDH, it will be better.

but it doesn't matter if you don't know what to teach

I disagree...

Let's make a small test, quick, by eye )


Let's create four variables (ordinary random noise) of 1000 elements each

z1 <- rnorm(1000)
z2 <- rnorm(1000)
z3 <- rnorm(1000)
z4 <- rnorm(1000)

and create the target variable y as the sum of all four

y <- z1+z2+z3+z4


let's train boosting and GMDH, not even for prediction, but simply to have them explain (fit) the data

I split the sample into three pieces: one for training and two for testing.


green is GMDH
red is Generalized Boosted Regression Modeling (GBM)
gray is the original data

remember, the target is the elementary sum of all the predictors

http://prntscr.com/rawx14

As we can see, both algorithms handled the task very well.


Now let's make the task a bit more complicated

let's add a cumulative sum, i.e. a trend, to part of the data

z1 <- cumsum(rnorm(1000))
z2 <- cumsum(rnorm(1000))
z3 <- rnorm(1000)
z4 <- rnorm(1000)

and change the target to look like

y <- z1+z2+z3

so we sum two trending predictors and one ordinary one, while z4 turns out to be noise, because it does not take part in the target y

and so we get the following result

http://prntscr.com/rax81b

Our boosting falls apart completely, while GMDH barely notices (presumably because trees cannot extrapolate beyond the range of values they saw in training, while a polynomial regression can).


I managed to "kill" MSUA only with this wild target

y <- ((z1*z2)/3)+((z3*2)/z4)

And even then not completely. And what about boosting? )))

http://prntscr.com/raxdnz


code to play with

set.seed(123)
# two trending (cumulative-sum) predictors and two plain random ones
z1 <- cumsum(rnorm(1000))
z2 <- cumsum(rnorm(1000))
z3 <- rnorm(1000)
z4 <- rnorm(1000)

# the "wild" target; z4 only enters as a divisor
y <- ((z1*z2)/3) + ((z3*2)/z4)

x <- cbind.data.frame(z1, z2, z3, z4) ; colnames(x) <- paste0("z", 1:ncol(x))

# one training piece and two test pieces
tr  <- 1:500
ts  <- 501:800
ts2 <- 801:1000

# boosting (GBM), number of trees chosen by cross-validation
library(gbm)
rf <- gbm(y[tr] ~ ., data = x[tr, ],
          distribution = "gaussian", n.trees = 1000,
          cv.folds = 5)
best.iter.max <- gbm.perf(rf, method = "cv")
prg <- predict(rf, x[c(tr, ts, ts2), ], n.trees = best.iter.max)

# GMDH regression with the PRESS external criterion
library(GMDHreg)
gmd <- gmdh.gia(X = as.matrix(x[tr, ]), y = y[tr], prune = 5,
                criteria = "PRESS")
prh <- predict(gmd, as.matrix(x[c(tr, ts, ts2), ]))

# gray = original data, red = GBM, green = GMDH
par(mfrow = c(1, 3))
plot(head(y[tr], 30), t = "l", col = 8, lwd = 10, main = "train")
lines(head(prg[tr], 30), col = 2, lwd = 2)
lines(head(prh[tr], 30), col = 3, lwd = 2)
plot(head(y[ts], 30), t = "l", col = 8, lwd = 10, main = "test")
lines(head(prg[ts], 30), col = 2, lwd = 2)
lines(head(prh[ts], 30), col = 3, lwd = 2)
plot(head(y[ts2], 30), t = "l", col = 8, lwd = 10, main = "test2")
lines(head(prg[ts2], 30), col = 2, lwd = 2)
lines(head(prh[ts2], 30), col = 3, lwd = 2)
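
To put numbers on the "by eye" comparison, a couple of extra lines (reusing y, prg, prh and the index vectors from the code above) give the out-of-sample errors directly:

# out-of-sample mean squared error for both models (continues the code above)
mse <- function(a, b) mean((a - b)^2)
round(c(gbm_test   = mse(y[ts],  prg[ts]),  gmdh_test  = mse(y[ts],  prh[ts]),
        gbm_test2  = mse(y[ts2], prg[ts2]), gmdh_test2 = mse(y[ts2], prh[ts2])), 3)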


 
secret:

Then I don't see any mathematics here; it's more brainwork, plus coding. A GA is a trivial thing.

Why then does everyone run around with this GMDH, writing dissertations that are impossible to understand, if inside it is something primitive, intuitively understandable since kindergarten?

I don't know, but it describes the data much better; the post is written, the code is posted.

 
mytarmailS:

I disagree...

Let's make a small test, quick, by eye )

I have no desire to mess around with R (I use Python); maybe the reason is that GMDH creates fake regressors of its own, so it fits. If you do the same kind of selection for boosting, there will be no difference.

Here's a GMDH-style enumeration for the forest

https://www.mql5.com/ru/code/22915

 
Maxim Dmitrievsky:

I have no desire to mess around with R (I use Python); maybe the reason is that GMDH creates fake regressors of its own, so it fits. If you do the same kind of selection for boosting, there will be no difference.

Here's a GMDH-style enumeration for the forest

https://www.mql5.com/ru/code/22915

First, what fake regressors? What nonsense; then why does GMDH hold up when the problem gets harder?

Secondly, in my example both GMDH and boosting get exactly the same data.

Thirdly, you don't need to mess around with anything: can't you make a matrix with four random variables in Python and then take the cumulative sum of them? To check your boosting?

2 lines of code ))


I'm curious myself what will come out of it.
