Machine learning in trading: theory, models, practice and algo-trading - page 3048

 
Aleksey Vyazmikin #:

Anyway, I updated it - it no longer even throws the error, but the result is the same: almost everything goes up.

I ran it and got the same picture ))


I found the mistake - I had a peek into the future in the target... yeah... I'm losing my touch.

This line should be replaced

dp <- c(diff(close),0)

by

dp <- tail(c(diff(close),0),nrow(X))
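Quick sanity check of the alignment (my own toy example, not from the post): with the corrected line, each row of X covers ten bars and its target is the price change of the bar right after the window, not a change that is already visible inside it.

cl <- 1:15                                    # dummy prices, just to look at the indices
sw <- embed(x = cl, dimension = 10)[,10:1]    # sliding windows, 6 rows
dp <- tail(c(diff(cl), 0), nrow(sw))          # targets aligned to the bar after each window
cbind(window_end = sw[,10], next_diff = dp)   # each target follows its window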


I also rewrote the code to be a bit more readable.

close <- cumsum(rnorm(10000,sd = 0.00001))+100
par(mar=c(2,2,2,2))  ; plot(close,t="l")


D <- make_data(close)
tr <- 1:500
R <- make_rules(y = D$Y[tr] , x = D$X[tr,])
# head(R)
buy_rules <- R$condition[ R$pred==1 ]



plot(x = 1:2000,y = rep(NA,2000), ylim = c(-0.001,0.001)) # empty canvas for the equity curves
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] ) # equity curve of rule i
  lines(cum_profit,col=8,lwd=1)}
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
      
      if(length(cum_profit)>30){ # ignore rules that fire too rarely
      ccor <- cor(cum_profit, 1:length(cum_profit)) # how close the curve is to a straight line
      if(ccor>=0.95)  lines(cum_profit,col=i,lwd=2) # highlight near-linear, steadily growing curves
      }
}
abline(h = 0,col=2,lty=2)
gc(T,T)

helper functions

make_rules <- function(y, x){
  library(inTrees)  # ?inTrees::getRuleMetric()
  library(RRF)
  rf <- RRF(x = x,y = y,ntree=100)
  rule <- getRuleMetric(unique(extractRules(RF2List(rf),x)),x,y)
  rule <- data.frame(rule,stringsAsFactors = F)
  for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])
  return(rule)}
make_data <- function(close){
  sw <- embed(x = close,dimension = 10)[,10:1] #  build sliding-window data
  X <- t(apply(sw,1,scale)) #  normalise each window
  
  dp <- tail(c(diff(close),0),nrow(X)) #  diff prices
  Y <- as.factor( ifelse(dp>=0,1,-1) ) #  target for classification
  res <- list(Y=Y,X=X,diff_price=dp)
  return(res)
}


 
Maxim Dmitrievsky #:

Well, if there are no other evaluation criteria, then through the stability of the parameters

One can also represent the output of the TS as a sequence of signals in time, measure its entropy, and compare it with randomness. If the TS captures regularities that repeat with some periodicity, this will show up in the entropy.

For builders of custom FFs, it might be useful.

The best measure is time and testing in real life. Any TS will stop working.

I have already figured out why none of it works on new data, and I even roughly understand what to do about it.
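As a side note on the quoted entropy idea, a minimal sketch of how it could be prototyped (my own illustration, nothing from this thread): encode the TS output as a symbol sequence, estimate the entropy of short blocks, and compare it with the same statistic for a shuffled copy; a signal with repeated structure should come out below the random baseline.

block_entropy <- function(sig, k = 3){
  blocks <- apply(embed(sig, k), 1, paste, collapse = "")  # overlapping blocks of k symbols
  p <- table(blocks) / length(blocks)                      # empirical block frequencies
  -sum(p * log2(p))                                        # Shannon entropy in bits
}
sig <- sample(c(-1, 1), 1000, replace = TRUE)  # stand-in for real TS signals
block_entropy(sig)            # entropy of the signal itself
block_entropy(sample(sig))    # randomness baseline: same symbols, shuffled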

 
Aleksey Nikolayev #:

The question about ONNX arose simply from the juxtaposition of two statements I have encountered: 1) obtaining a model can be represented as a pipeline, 2) a pipeline can be converted to ONNX format.

It is clear that this is hardly possible in practice. What I would really like to understand is what exactly prevents implementing such a possibility, so as to grasp the fundamental limitations of this technology as a whole.

It is one thing if the limitations are of the "cannot write to a file" kind, and quite another if they are a lack of support for certain data types (dataframes, for example).

Both statements are true. Getting a model together with its preprocessing is possible, though unfortunately not in all frameworks and only for the simplest cases. TF/Keras implement preprocessing as the first layers of the NN. scikit-learn has the richest choice of pipeline+model; see skl2onnx.

It is good to see that serious contributors realise that ONNX should include the whole pipeline starting with preprocessing. New data in production must go through the same preprocessing steps as in training. Otherwise, the results of the ONNX model will be unpredictable.

This direction is rapidly developing and I think this issue will be solved soon. For now you should experiment.

Good luck

 
Comments not relevant to this thread have been moved to "Unacceptable way of communicating".
 
mytarmailS #:

I ran it and got the same picture ))


I found the mistake - I had a peek into the future in the target... yeah... I'm losing my touch.

This line should be replaced

to


I also rewrote the code to be a bit more readable.

helper functions


Trying the modified code

close <- cumsum(rnorm(10000,sd = 0.00001))+100
par(mar=c(2,2,2,2))  ; plot(close,t="l")


D <- make_data(close)
tr <- 1:500
R <- make_rules(y = D$Y[tr] , x = D$X[tr,])
#  head(R)
buy_rules <- R$condition[ R$pred==1 ]



plot(x = 1:2000,y = rep(NA,2000), ylim = c(-0.001,0.001)) 
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
  lines(cum_profit,col=8,lwd=1)}
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( D$diff_price[  eval(str2expression(buy_rules[i]))  ] )
  
  if(length(cum_profit)>30){
    ccor <- cor(cum_profit, 1:length(cum_profit))
    if(ccor>=0.95)  lines(cum_profit,col=i,lwd=2)
  }
}
abline(h = 0,col=2,lty=2)
gc(T,T)

make_rules <- function(y, x){
  library(inTrees)  # ?inTrees::getRuleMetric()
  library(RRF)
  rf <- RRF(x = x,y = y,ntree=100)
  rule <- getRuleMetric(unique(extractRules(RF2List(rf),x)),x,y)
  rule <- data.frame(rule,stringsAsFactors = F)
  for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])
  return(rule)}
make_data <- function(close){
  sw <- embed(x = close,dimension = 10)[,10:1] #  build sliding-window data
  X <- t(apply(sw,1,scale)) #  normalise each window
  
  dp <- tail(c(diff(close),0),nrow(X)) #  diff prices
  Y <- as.factor( ifelse(dp>=0,1,-1) ) #  target for classification
  res <- list(Y=Y,X=X,diff_price=dp)
  return(res)
}

I get an error

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> source('~/.active-rstudio-document', echo=TRUE)

> close <- cumsum(rnorm(10000,sd = 0.00001))+100

> par(mar=c(2,2,2,2))  ; plot(close,t="l")

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> source('~/.active-rstudio-document', echo=TRUE)

> close <- cumsum(rnorm(10000,sd = 0.00001))+100

> par(mar=c(2,2,2,2))  ; plot(close,t="l")

> D <- make_data(close)
Error in make_data(close) : could not find function "make_data"
> 
 
Aleksey Vyazmikin #:

Trying the modified code

I get an error

Because functions must be declared first and then used...

Are you trolling or what?
 
mytarmailS #:
Because functions have to be declared first and then used...

Are you trolling or something?

How was I supposed to know...

In that case, here is the error:

> D <- make_data(close)
Error in h(simpleError(msg, call)) : 
  error evaluating argument '.data' when selecting a method for function 'embed': argument ".data" is missing, with no default
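Judging by that message, some other loaded package seems to be masking stats::embed with its own embed(.data, ...) generic ('.data' is not an argument of stats::embed). If so, calling it with an explicit namespace inside make_data should sidestep the clash; this is my guess, not something confirmed here.

sw <- stats::embed(x = close, dimension = 10)[,10:1]  # explicit namespace avoids the masking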
 
Interesting article about RL: https://habr.com/ru/articles/349800/
There is also an interesting conversation in the comments between the creator of THIS thread and another member.
 
mytarmailS #:
Interesting article about RL: https://habr.com/ru/articles/349800/
There is also an interesting conversation in the comments between the creator of THIS thread and another member.

Isn't the concept of RL redundant for trading tasks? We have the influence of the environment on the agent, but is there an influence of the agent on the environment? It is probably possible to introduce this second influence artificially, but does it make sense?

Two (or three) ideas from the article are not at all superfluous for us, namely that the loss function should reflect exactly what we need and should be smooth (and monotonic). In our case it should be profit, and it should depend smoothly and monotonically on the parameters of the model.

Smoothness of some analogue of profit can probably be achieved somehow (for example, by something like kernel smoothing), but I very much doubt monotonicity can.
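On the kernel-smoothing remark, something like this could serve as a starting point (my own sketch, not from the post; the bandwidth is an arbitrary choice): smooth the equity curve with stats::ksmooth and work with the smoothed version.

profit <- cumsum(rnorm(500, mean = 0.0001))      # stand-in equity curve
sm <- ksmooth(x = seq_along(profit), y = profit,
              kernel = "normal", bandwidth = 50)  # Gaussian kernel smoothing
plot(profit, t = "l", col = 8); lines(sm, col = 2, lwd = 2)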

 
Aleksey Nikolayev #:

Isn't the concept of RL redundant for trading tasks? We have the influence of the environment on the agent, but is there an influence of the agent on the environment? It is possible to introduce this second influence artificially, but does it make sense?

Two (or three) ideas from the article are not at all superfluous for us, namely that the loss function should reflect exactly what we need and should be smooth (and monotonic). In our case it should be profit, and it should depend smoothly and monotonically on the parameters of the model.

Smoothness of some analogue of profit can probably be achieved somehow (for example, by something like kernel smoothing), but I very much doubt monotonicity can.

The basis of the financial result of trading is the price movement - a non-stationary random process.

Are we trying to turn a non-stationary random process into a smooth and monotonic one by means of some tricks? Aren't we overreaching? Especially considering that a classification error below 20%(!) outside the training set is extremely difficult to achieve. Maybe we should start by working on reducing the classification error?
