Machine learning in trading: theory, models, practice and algo-trading - page 3048

 
Aleksey Vyazmikin #:

Anyway, I updated it - it no longer even throws the error, but the result is the same: almost everything goes up.

I ran it and got the same picture ))


I found the mistake - I had a peek into the future in the target... yeah... I'm losing my touch.

This line should be replaced

dp <- c(diff(close),0)

by

dp <- tail(c(diff(close),0),nrow(X))
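Quick sanity check of the alignment (my own toy example, not from the post): with the corrected line, each row of X covers ten bars and its target is the price change of the bar right after the window, not a change that is already visible inside it.

cl <- 1:15                                    # dummy prices, just to look at the indices
sw <- embed(x = cl, dimension = 10)[,10:1]    # sliding windows, 6 rows
dp <- tail(c(diff(cl), 0), nrow(sw))          # targets aligned to the bar after each window
cbind(window_end = sw[,10], next_diff = dp)   # each target follows its window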


I also rewrote the code to be a bit more readable.

close <- cumsum(rnorm(10000,sd = 0.00001))+100
par(mar=c(2,2,2,2))  ; plot(close,t="l")


D <- make_data(close)
tr <- 1:500
R <- make_rules(y = D$Y[tr] , x = D$X[tr,])
# head(R)
buy_rules <- R$condition[ R$pred==1 ]



plot(x = 1:2000,y = rep(NA,2000), ylim = c(-0.001,0.001)) # empty canvas for the equity curves
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] ) # equity curve of rule i
  lines(cum_profit,col=8,lwd=1)}
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
      
      if(length(cum_profit)>30){ # ignore rules that fire too rarely
      ccor <- cor(cum_profit, 1:length(cum_profit)) # how close the curve is to a straight line
      if(ccor>=0.95)  lines(cum_profit,col=i,lwd=2) # highlight near-linear, steadily growing curves
      }
}
abline(h = 0,col=2,lty=2)
gc(T,T)

helper functions

make_rules <- function(y, x){
  library(inTrees)  # ?inTrees::getRuleMetric()
  library(RRF)
  rf <- RRF(x = x,y = y,ntree=100)
  rule <- getRuleMetric(unique(extractRules(RF2List(rf),x)),x,y)
  rule <- data.frame(rule,stringsAsFactors = F)
  for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])
  return(rule)}
make_data <- function(close){
  sw <- embed(x = close,dimension = 10)[,10:1] #  build sliding-window data
  X <- t(apply(sw,1,scale)) #  normalise each window
  
  dp <- tail(c(diff(close),0),nrow(X)) #  diff prices
  Y <- as.factor( ifelse(dp>=0,1,-1) ) #  target for classification
  res <- list(Y=Y,X=X,diff_price=dp)
  return(res)
}


 
Maxim Dmitrievsky #:

Well, if there are no other evaluation criteria, then through the stability of the parameters

One can also represent the output of the TS as a sequence of signals in time, measure its entropy, and compare it with randomness. If the TS captures regularities that repeat with some periodicity, this will show up in the entropy.

For builders of custom FFs, it might be useful.

The best measure is time and testing in real life. Any TS will stop working.

I have already figured out why none of it works on new data, and I even roughly understand what to do about it.
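As a side note on the quoted entropy idea, a minimal sketch of how it could be prototyped (my own illustration, nothing from this thread): encode the TS output as a symbol sequence, estimate the entropy of short blocks, and compare it with the same statistic for a shuffled copy; a signal with repeated structure should come out below the random baseline.

block_entropy <- function(sig, k = 3){
  blocks <- apply(embed(sig, k), 1, paste, collapse = "")  # overlapping blocks of k symbols
  p <- table(blocks) / length(blocks)                      # empirical block frequencies
  -sum(p * log2(p))                                        # Shannon entropy in bits
}
sig <- sample(c(-1, 1), 1000, replace = TRUE)  # stand-in for real TS signals
block_entropy(sig)            # entropy of the signal itself
block_entropy(sample(sig))    # randomness baseline: same symbols, shuffled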

 
Aleksey Nikolayev #:

The question about ONNX arose simply from the juxtaposition of two statements I have encountered: 1) obtaining a model can be represented as a pipeline, 2) a pipeline can be converted to ONNX format.

It is clear that this is hardly possible in practice. What I would really like to understand is what exactly prevents implementing such a possibility, so as to grasp the fundamental limitations of this technology as a whole.

It is one thing if the limitations are of the "cannot write to a file" kind, and quite another if they are a lack of support for certain data types (dataframes, for example).

Both statements are true. Getting a model together with its preprocessing is possible, though unfortunately not in all frameworks and only for the simplest cases. TF/Keras implement preprocessing as the first layers of the NN. scikit-learn has the richest choice of pipeline+model; see skl2onnx.

It is good to see that serious contributors realise that ONNX should include the whole pipeline starting with preprocessing. New data in production must go through the same preprocessing steps as in training. Otherwise, the results of the ONNX model will be unpredictable.

This direction is rapidly developing and I think this issue will be solved soon. For now you should experiment.

Good luck

 
Comments not relevant to this thread have been moved to "Unacceptable way of communicating".
 
mytarmailS #:

I ran it and got the same picture ))


I found the mistake - I had a peek into the future in the target... yeah... I'm losing my touch.

This line should be replaced

to


I also rewrote the code to be a bit more readable.

helper functions


Trying the modified code

close <- cumsum(rnorm(10000,sd = 0.00001))+100
par(mar=c(2,2,2,2))  ; plot(close,t="l")


D <- make_data(close)
tr <- 1:500
R <- make_rules(y = D$Y[tr] , x = D$X[tr,])
#  head(R)
buy_rules <- R$condition[ R$pred==1 ]



plot(x = 1:2000,y = rep(NA,2000), ylim = c(-0.001,0.001)) 
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
  lines(cum_profit,col=8,lwd=1)}
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
  
  if(length(cum_profit)>30){
    ccor <- cor(cum_profit, 1:length(cum_profit))
    if(ccor>=0.95)  lines(cum_profit,col=i,lwd=2)
  }
}
abline(h = 0,col=2,lty=2)
gc(T,T)

make_rules <- function(y, x){
  library(inTrees)  # ?inTrees::getRuleMetric()
  library(RRF)
  rf <- RRF(x = x,y = y,ntree=100)
  rule <- getRuleMetric(unique(extractRules(RF2List(rf),x)),x,y)
  rule <- data.frame(rule,stringsAsFactors = F)
  for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])
  return(rule)}
make_data <- function(close){
  sw <- embed(x = close,dimension = 10)[,10:1] #  build sliding-window data
  X <- t(apply(sw,1,scale)) #  normalise each window
  
  dp <- tail(c(diff(close),0),nrow(X)) #  diff prices
  Y <- as.factor( ifelse(dp>=0,1,-1) ) #  target for classification
  res <- list(Y=Y,X=X,diff_price=dp)
  return(res)
}

I get an error

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> source('~/.active-rstudio-document', echo=TRUE)

> close <- cumsum(rnorm(10000,sd = 0.00001))+100

> par(mar=c(2,2,2,2))  ; plot(close,t="l")

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> source('~/.active-rstudio-document', echo=TRUE)

> close <- cumsum(rnorm(10000,sd = 0.00001))+100

> par(mar=c(2,2,2,2))  ; plot(close,t="l")

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> 
 
Aleksey Vyazmikin #:

Trying the modified code

I get an error

Because functions must be declared first and then used...

Are you trolling or what?
 
mytarmailS #:
Because functions have to be declared first and then used...

Are you trolling or something?

How was I supposed to know...

In that case, here is the error:

> D <- make_data(close)
Error in h(simpleError(msg, call)) : 
  error evaluating argument '.data' when selecting a method for function 'embed': argument ".data" is missing, with no default
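Judging by that message, some other loaded package seems to be masking stats::embed with its own embed(.data, ...) generic ('.data' is not an argument of stats::embed). If so, calling it with an explicit namespace inside make_data should sidestep the clash; this is my guess, not something confirmed here.

sw <- stats::embed(x = close, dimension = 10)[,10:1]  # explicit namespace avoids the masking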
 
Interesting article about RL: https://habr.com/ru/articles/349800/
There is also an interesting conversation in the comments between the creator of THIS thread and another member.
 
mytarmailS #:
Interesting article about RL: https://habr.com/ru/articles/349800/
There is also an interesting conversation in the comments between the creator of THIS thread and another member.

Isn't the concept of RL redundant for trading tasks? We have the influence of the environment on the agent, but is there an influence of the agent on the environment? It is probably possible to introduce this second influence artificially, but does it make sense?

Two (or three) ideas from the article are not at all superfluous for us, namely that the loss function should reflect exactly what we need and should be smooth (and monotonic). In our case it should be profit, and it should depend smoothly and monotonically on the parameters of the model.

Smoothness of some analogue of profit can probably be achieved somehow (for example, by something like kernel smoothing), but I very much doubt monotonicity can.
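On the kernel-smoothing remark, something like this could serve as a starting point (my own sketch, not from the post; the bandwidth is an arbitrary choice): smooth the equity curve with stats::ksmooth and work with the smoothed version.

profit <- cumsum(rnorm(500, mean = 0.0001))      # stand-in equity curve
sm <- ksmooth(x = seq_along(profit), y = profit,
              kernel = "normal", bandwidth = 50)  # Gaussian kernel smoothing
plot(profit, t = "l", col = 8); lines(sm, col = 2, lwd = 2)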

 
Aleksey Nikolayev #:

Isn't the concept of RL redundant for trading tasks? We have the influence of the environment on the agent, but is there an influence of the agent on the environment? It is possible to introduce this second influence artificially, but does it make sense?

Two (or three) ideas from the article are not at all superfluous for us, namely that the loss function should reflect exactly what we need and should be smooth (and monotonic). In our case it should be profit, and it should depend smoothly and monotonically on the parameters of the model.

Smoothness of some analogue of profit can probably be achieved somehow (for example, by something like kernel smoothing), but I very much doubt monotonicity can.

The basis of the financial result of trading is the price movement - a non-stationary random process.

Are we trying to turn a non-stationary random process into a smooth and monotonic one by means of some tricks? Aren't we overreaching? Especially considering that a classification error below 20%(!) outside the training set is extremely difficult to achieve. Maybe we should start by working on reducing the classification error?
