Machine learning in trading: theory, models, practice and algo-trading - page 136

 

And how exactly do you calculate R^2, which function do you use?

I tried training different models through rattle; it computes a "pseudo R^2" via correlation, namely cor(fitpoints[,1], fitpoints[,2])^2, but I want to compute R^2 the same way you did, for comparison.

Will this code [1 - sum((y-x)^2)/sum((y-mean(y))^2)] work?

 
Dr.Trader:

And how exactly do you calculate R^2, which function do you use?

I tried training different models through rattle; it computes a "pseudo R^2" via correlation, namely cor(fitpoints[,1], fitpoints[,2])^2, but I want to compute R^2 the same way you did, for comparison.

Will this code [1 - sum((y-x)^2)/sum((y-mean(y))^2)] work?


Exactly. x is the model's output.
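For reference, that calculation wrapped in a small R function (just a sketch; here pred stands for the model's predictions, i.e. the "x" above):

# R^2 = 1 - SS_residual / SS_total
r_squared <- function(y, pred) {
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}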

 
Dr.Trader:

The more neurons in the hidden layer, the more complex a function the network can describe; for more complex functions you need more hidden layers and more neurons in them.

But then the problem is that the network uses successive additions and multiplications (plus, for example, sigmoids as the activation function) to describe the target, so obviously you will not recover your original function, only some approximation of it. And it may well turn out that this approximation memorizes peculiarities of the training data and will not work correctly on new data. So sometimes you need to pause training, check whether the error on the test sample has decreased, and continue training if all is well. At some point the error on the test data will start to grow, and then training must be stopped for good.

Also, the output of the network is limited by the activation function. For the popular ones: sigmoid gives (0;1), relu gives [0;inf). The target values need to be rescaled into a suitable interval; your outputs in the interval (-7;7) are simply unattainable for many packages.

I scale all data to mean 0 and standard deviation 1.

You can scale the target to [-1;1], but that is only really needed if the output neuron has a tanh activation.

Similarly, [0;1] for a sigmoid output.

And if the output activation is the identity (linear), it is in principle not necessary, but you still have to take into account the real spread of the data: the weights may not be driven up to that level.

And I train while watching the error curves on the train and test sets, and I see where to stop.
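For illustration, a minimal sketch of that scaling in R (variable names are made up):

# inputs: mean 0, standard deviation 1
x_scaled <- scale(x)

# target squeezed into (-1;1) for a tanh output, remembering the factor
y_max    <- max(abs(y))
y_scaled <- y / y_max

# after prediction, convert the network output (net_output) back to the original scale
y_pred <- net_output * y_max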
 

Once again I took rattle, trained nnet, and took the ready-made code from its log. Rattle doesn't work quite correctly with nnet, so I added some more code to stop training in time.

The best R^2 on new data = 0.18. The best network configuration turned out pretty funny: one neuron in the only hidden layer. Two neurons in the hidden layer give about the same result. If you keep increasing the number of neurons, the graph shows that the network overfits very quickly and performs worse and worse on new data.

On the right graph the blue line is data new to the model, from row 20001 onwards. The rest is training and cross-validation.

The convolutional network seems to be in the lead.
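For illustration, early stopping of that kind can be sketched with nnet roughly like this (made-up names, not the actual code from rattle's log):

library(nnet)

# train in short bursts, continuing from the previous weights,
# and stop once the error on the test sample starts to grow
fit  <- nnet(x_train, y_train, size = 1, linout = TRUE, maxit = 10, trace = FALSE)
best <- Inf
for (step in 1:100) {
  err <- mean((y_test - predict(fit, x_test))^2)
  if (err > best) break          # test error started growing - stop
  best <- err
  fit  <- nnet(x_train, y_train, size = 1, linout = TRUE,
               maxit = 10, Wts = fit$wts, trace = FALSE)
}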

 
Dr.Trader:

Once again I took rattle, trained nnet, and took the ready-made code from its log. Rattle doesn't work quite correctly with nnet, so I added some more code to stop training in time.

The best R^2 on new data = 0.18. The best network configuration turned out pretty funny: one neuron in the only hidden layer. Two neurons in the hidden layer give about the same result. If you keep increasing the number of neurons, the graph shows that the network overfits very quickly and performs worse and worse on new data.

On the right graph the blue line is data new to the model, from row 20001 onwards. The rest is training and cross-validation.

The convolutional network seems to be in the lead.

The result is good! Congratulations, you beat my simple NN.

Did you engineer any features, or did you feed them in as raw lags? It seems this problem cannot be solved on pure lags; feature preparation is necessary.

Another observation: it looks like your network's output is strictly within -1;1. Either you used a tanh activation and then did not apply the inverse transformation to the output, or something else.

And just for reference: the convolutional network (deep learning) is in the mxnet package. So far only the version from git, but the basic things work.
 
Alexey Burnakov:

And just for reference: the convolutional network (deep learning) is in the mxnet package. So far only the version from git, but the basic things work.

Just an observation, off topic, but... When I asked for help with the convolutional network and pointed to the mxnet package, everyone kept silent, as if to say "figure it out yourself, man", and now everyone has suddenly become interested in what I was talking about 50 pages ago. Why does that happen? :) I wonder whether after another 100 pages someone will notice the quantstrat package, which I also drew attention to a long time ago...

You'll say: ha, well, go and do it yourself if you're so smart. The thing is, I am not that smart and don't understand much, and my English is very poor too, so your four lines of code with explanations can save me a week of trying to figure it out on my own, and even then it doesn't always work out...

 
mytarmailS:

Just an observation, off topic, but... When I asked for help with the convolutional network and pointed to the mxnet package, everyone kept silent, as if to say "figure it out yourself, man", and now everyone has suddenly become interested in what I was talking about 50 pages ago. Why does that happen? :) I wonder whether after another 100 pages someone will notice the quantstrat package, which I also drew attention to a long time ago...

You'll say: ha, well, go and do it yourself if you're so smart. The thing is, I am not that smart and don't understand much, and my English is very poor too, so your four lines of code with explanations can save me a week of trying to figure it out on my own, and even then it doesn't always work out...

What a great comment, colleague! ) Ha

Let me answer point by point as I see it:

1) 90% of the people here are driven by their own interests, their own thinking phase and their own experiments. That is why some good ideas get set aside for a long time. Since you didn't give any interesting examples or tasks, nobody got interested. Simple, isn't it?

2) There are two strategies for gaining knowledge: digging and painfully trying to get something to work yourself (we all do this to varying degrees; for example, I remember the guy in the thread "Learn to make money, Villagers!" who spent several years of his life testing the performance of publicly available Expert Advisors, and they all failed), or waiting in the hope that someone will help and post a ready-made solution. So, if by force of circumstances you have chosen the second strategy, the wait for something ready-made can be very long.

As for mxnet, now that I am into it, I don't mind posting the code, which, by the way, is available on the Internet in almost the same form:

install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")

 

 

train.x <- data.matrix(dat_ready_scale[1:(nrow(dat_ready_scale) / 2), 1:100])
train.y <- dat_ready_scale[1:(nrow(dat_ready_scale) / 2), 101]
test.x  <- data.matrix(dat_ready_scale[!rownames(dat_ready_scale) %in% rownames(train.x), 1:100])
test.y  <- dat_ready_scale[!rownames(dat_ready_scale) %in% rownames(train.x), 101]

########
# one example per column, then reshape to (width = 100, height = 1, channels = 1, batch)
train.x <- t(train.x)
test.x  <- t(test.x)

dim(train.x) <- c(100, 1, 1, ncol(train.x))
dim(test.x)  <- c(100, 1, 1, ncol(test.x))
#########

############ BUILD NET

library(mxnet)

data <- mx.symbol.Variable('data')

# first conv layer
conv1 <- mx.symbol.Convolution(data = data, kernel = c(14, 1), stride = c(1, 1), num.filter = 1)
tanh1 <- mx.symbol.Activation(data = conv1, act.type = 'relu')
pool1 <- mx.symbol.Pooling(data = tanh1, pool_type = "avg", kernel = c(5, 1), stride = c(1, 1))

# second conv layer
conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(12, 1), stride = c(1, 1), num.filter = 1)
tanh2 <- mx.symbol.Activation(data = conv2, act.type = 'relu')
pool2 <- mx.symbol.Pooling(data = tanh2, pool_type = "avg", kernel = c(5, 1), stride = c(1, 1))

# third conv layer
conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(10, 1), stride = c(1, 1), num.filter = 1)
tanh3 <- mx.symbol.Activation(data = conv3, act.type = 'relu')
pool3 <- mx.symbol.Pooling(data = tanh3, pool_type = "avg", kernel = c(2, 1), stride = c(1, 1))

# first fully connected layer
flatten <- mx.symbol.Flatten(data = pool3)
fc1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 10)
tanh4 <- mx.symbol.Activation(data = fc1, act.type = 'tanh')

# second fully connected layer with a linear regression output
fc2 <- mx.symbol.FullyConnected(data = tanh4, num_hidden = 1)
lenet <- mx.symbol.LinearRegressionOutput(data = fc2)

#### train

device <- mx.cpu()
log <- mx.metric.logger$new()

model <- mx.model.FeedForward.create(lenet,
                                     X = train.x,
                                     y = train.y,
                                     ctx = device,
                                     num.round = 100,
                                     array.batch.size = 128,
                                     learning.rate = 0.01,
                                     momentum = 0.9,
                                     eval.metric = mx.metric.rmse,
                                     eval.data = list(data = test.x, label = test.y),
                                     optimizer = 'sgd',
                                     initializer = mx.init.uniform(0.5),
                                     #array.layout = 'rowmajor',
                                     epoch.end.callback = mx.callback.log.train.metric(1, log))

# training (blue) and validation (red) error curves
plot(log$train, type = 'l', col = 'blue', ylim = c(min(c(log$train, log$eval)), max(c(log$train, log$eval))))
lines(log$eval, type = 'l', col = 'red')

# workaround so that predict() picks up a default CPU context
mx.ctx.internal.default.value = list(device = "cpu", device_id = 0, device_typeid = 1)
class(mx.ctx.internal.default.value) = "MXContext"

# R^2 on the test half
preds <- as.numeric(predict(model, test.x))
1 - sum((test.y - preds)^2) / sum((test.y - mean(test.y))^2)

Naturally, this is just a skeleton showing the basic logic.

 
Alexey Burnakov:
Did you engineer any features, or did you feed them in as raw lags? It seems this problem cannot be solved on pure lags; feature preparation is necessary.

Another observation: it looks like your network's output is strictly within -1;1. Either you used a tanh activation and then did not apply the inverse transformation to the output, or something else.

I fed everything in as it was in the original, no changes. I'm sure I could have got a better result if I had calculated the values of the right indicators and used them for prediction as well. But which indicators are the right ones? (A rhetorical question, no answer yet.)

The "-1;1" in the answer is the funny part. The output of the last neuron there is linear, with no activation function, so it is not limited by anything. I also tried scaling the target values into -1;1, but after that the network started giving results in the range (-0.2;0.2). For some reason the results always end up in a narrower range than required, probably because training is stopped early, after only 250 iterations.
If we add more neurons and don't stop training, the network will eventually fit the training data to the desired accuracy. 100 neurons in the hidden layer are almost enough for 100% accuracy on the training data. Judging by the log, the sum of residuals over all 20000*0.7 (corrected later) rows was around 200. But in that case the results on cross-validation stop correlating with the desired ones at all; they are just random values, although in the right interval.

 
Dr.Trader:

I fed everything in as it was in the original, no changes. I'm sure I could have got a better result if I had calculated the values of the right indicators and used them for prediction as well. But which indicators are the right ones? (A rhetorical question, no answer yet.)

The "-1;1" in the answer is the funny part. The output of the last neuron there is linear, with no activation function, so it is not limited by anything. I also tried scaling the target values into -1;1, but after that the network started giving results in the range (-0.2;0.2). For some reason the results always end up in a narrower range than required, probably because training is stopped early, after only 250 iterations.
If we add more neurons and don't stop training, the network will eventually fit the training data to the desired accuracy. 100 neurons in the hidden layer are almost enough for 100% accuracy on the training data. Judging by the log, the sum of residuals over all 20000*100 predictors was about 200. But in that case the results on cross-validation stop correlating with the desired ones; they are just random values, although within the required interval.

That's funny. I'll have to think about it.

HH: most likely, in this range [-1;1] the network receives the most consistent input signals, and this fragment of the function is the easiest to model (the NN learns whatever is easiest). And naturally this is the variant at which gradient descent finds its minimum. Hard to argue with that...

OK, I'll add a hint for you if you still want to practice.

First, an R^2 of 0.55 can really be achieved by applying a small functional transformation to the "metafunction". The only catch is that the resulting function looks a bit complicated.

One more thing - try to take:

rowMeans(df[, 1:10])
rowMeans(df[, 1:20])
rowMeans(df[, 1:30])
...
rowMeans(df[, 1:100])

These 10 meta-features contain the desired combination of meaningful inputs.
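A minimal sketch of building those ten meta-features, assuming the 100 inputs sit in columns 1:100 of df (names are illustrative):

# cumulative row means over the first 10, 20, ..., 100 inputs
widths <- seq(10, 100, by = 10)
meta   <- sapply(widths, function(k) rowMeans(df[, 1:k]))
colnames(meta) <- paste0("meta_", widths)
df_meta <- cbind(df, meta)   # feed these alongside (or instead of) the raw inputs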

By the way, convolutional layers let you pick this out already in the learning process, if you know where to dig.

Why give a hint at all: even if you know what to work with, you still have to try hard to reproduce the output approximation. And, as the insider here, I don't like the feeling of trying to sell people an unsolvable problem.

 
Alexey Burnakov:

Just a great comment, colleague! ) Ha

Allow me to respond point by point according to my vision:

I understand you. I expected a harsher reaction, thanks for not beating me up ))) and thanks for the code too. On the weekend I will try to figure it out; right now I am busy with another idea...

Need some help...

I want to run a cointegration test in a sliding window, but it throws an error...

here's a simple test on static data


library(tseries)
ri <- cumsum(rnorm(10000))  # pseudo-prices
si <- cumsum(rnorm(10000))  # pseudo-prices
ln <- length(ri)

data <- as.data.frame(cbind(ri, si))

# linear regression to determine the correct ratio (hedge ratio)
model <- lm(ri ~ si + 0, data)
# compute the price difference (spread)
spread <- ri - coef(model)[1] * si
# Dickey-Fuller test for stationarity
test <- adf.test(as.vector(spread), k = 0)

test$p.value

Everything works...

But when I do the same in a sliding window, I get an error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

sliding window test

ri <- cumsum(rnorm(10000))  # pseudo-prices
si <- cumsum(rnorm(10000))  # pseudo-prices
ln <- length(ri)

data <- as.data.frame(cbind(ri, si))

test_vec <- rep(0, ln)  # here we will store the test results

for (i in 151:ln) {
  print(i)
  idx <- (i - 150):i
  # linear regression to determine the correct ratio (hedge ratio)
  model <- lm(ri[idx] ~ si[idx] + 0, data[idx, ])
  # compute the price difference (spread)
  spread <- ri[idx] - coef(model)[1] * si[idx]
  # Dickey-Fuller test for stationarity
  test <- adf.test(as.vector(spread), k = 0)

  test_vec[i] <- test$p.value
}

On Stack Overflow similar problems are said to be caused by "NA" values in the data, but I definitely don't have any...

What is the problem? Please help.
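For what it's worth, a quick way to see where NA values can appear here (a sketch on the same simulated data): inside lm() the term ri[idx] is evaluated within data[idx, ], which has only 151 rows, so for large i every index in idx falls outside it and returns NA.

i   <- 500
idx <- (i - 150):i
d   <- data[idx, ]      # only 151 rows
summary(d$ri[idx])      # indices 350:500 are out of range here -> all NA

# one possible fix: don't index inside the formula, let `data` do the subsetting
model <- lm(ri ~ si + 0, data = data[idx, ])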
