How to train trees by clusters? - General

Vladimir Perervenko 2020.07.28 19:47 #19281

mytarmailS:

in continuation of a personal conversation

your option

plain variant

As you can see the values are quite different, you can check yourself

In my model

because I have only one column, but it doesn't really matter.

===================UPD

Man, they are different every time you run umap_transform, it shouldn't be that way

Wasn't paying attention. It's been a long time...

Forester 2020.07.28 20:50 #19282

mytarmailS:

in continuation of a personal conversation

your option

plain variant

As you can see the values are quite different, you can check yourself

In my model

because I have only one column, but it doesn't really matter.

===================UPD

They are different every time you run umap_transform, it shouldn't be that way

Usually the seed of the built-in RNG is set to some fixed value for repeatability. If it is not set, a random one is taken. Maybe this package has a seed too, check it.
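
A minimal sketch of the seed idea in base R: fixing the RNG state with set.seed() before a stochastic call makes the call repeat. The helper noisy_embed is a hypothetical stand-in for any stochastic routine, not the uwot algorithm. Whether calling set.seed() before umap() and umap_transform() is enough to make their output repeat is an assumption to check against the uwot documentation, since multi-threaded optimization may still vary.

```r
# Hypothetical stand-in for a stochastic routine (NOT the uwot algorithm):
# projects the rows of x onto random directions.
noisy_embed <- function(x, n_components = 2) {
  proj <- matrix(rnorm(ncol(x) * n_components), ncol = n_components)
  x %*% proj
}

x <- matrix(1:20, ncol = 4)

# Without a fixed seed, two runs use different random draws
run_a <- noisy_embed(x)
run_b <- noisy_embed(x)

# With the same seed set before each run, the draws repeat
set.seed(123)
run_c <- noisy_embed(x)
set.seed(123)
run_d <- noisy_embed(x)

same_without_seed <- isTRUE(all.equal(run_a, run_b))
same_with_seed    <- identical(run_c, run_d)
```

If the results still differ after fixing the seed, the remaining variation has to come from somewhere other than the R-level RNG.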

mytarmailS 2020.07.28 21:10 #19283

elibrarius:
Usually the seed of the built-in RNG is set to some fixed value for repeatability. If it is not set, a random one is taken. Maybe this package has a seed too, check it.

Yes, I think so, but the point is that even without a seed the result should always be the same; in the analogous package "umap" the result is always the same

mytarmailS 2020.07.28 21:16 #19284

Aleksey Vyazmikin:

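Especially for you, in the sole hope that you will learn R)

install.packages(c("TTR", "uwot", "car", "rgl"))  # names go in one character vector; car::scatter3d draws with rgl

# d is a data frame loaded beforehand; X.CLOSE. is its close-price column
clos <- d$X.CLOSE.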

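# Build a data frame of TTR indicators from a close-price vector x with period n
get.ind <- function(x,n=5){
  
  # Pairwise differences between all columns of a multi-column indicator
  all_to.all_colums <- function(x,names){
    cb <- combn(ncol(x),2)
    res <- matrix(ncol = 0,nrow = nrow(x))
    for(i in 1:ncol(cb)){
      j1 <- cb[1,i]
      j2 <- cb[2,i]
      
      res <- cbind(res,   x[,j1] - x[,j2]  )
    }
    colnames(res) <- paste0(names, 1:ncol(res))
    return(res)}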
  
  library(TTR)
  aroon  <- aroon(x,n)
  BBands <- BBands(x,n)            ;   BBands <- all_to.all_colums( BBands, names = "BBands")
  CCI    <- CCI(x,n)
  CMO    <- CMO(x,n)
  DEMA   <- diff(c(0,DEMA(x,n)))
  Donchian <- DonchianChannel(x,n) ;   Donchian <- all_to.all_colums( Donchian, names = "Donchian")
  MACD   <- MACD(x,n)
  moment <- momentum(x,n)
  PBands <- PBands(x,n)            ;   PBands <- all_to.all_colums( PBands, names = "PBands")
  RSI    <- RSI(x,n)
  SAR    <- diff(c(0,SAR(cbind(x,x))))   # SAR expects High/Low columns; the close is used for both
  SMA    <- diff(c(0,SMA(x,n)))
  stoch  <- stoch(x,n)
  TDI    <- TDI(x,n)
  VHF    <- VHF(x,n)
  WPR    <- WPR(x,n)
  
  ind <- cbind.data.frame(aroon,BBands,CCI,CMO,DEMA,Donchian,
                          MACD,moment,PBands,
                          RSI,SAR,SMA,stoch,TDI,VHF,WPR)
  return(ind)
}
# Target: direction of the ZigZag line from each bar to the next, 1 = up or flat, -1 = down
get.target <- function(x, change){
  zz <- TTR::ZigZag(x,change = change,percent = FALSE)
  zz <- c(diff(zz),0) ; zz[zz>=0] <- 1 ; zz[zz<0] <- -1
  return(zz)
}

X <- get.ind(clos)
Y <- as.factor(get.target(clos,change = 0.001))


library(uwot)

train.idx <- 100:8000
test.idx <- 8001:10000

UM <- umap(X = X[train.idx,],
           y = Y[train.idx], 
           approx_pow = TRUE, 
           n_components = 3, 
           ret_model = TRUE,   # keep the model so new data can be projected later
           n_threads = 4L, 
           scale = TRUE)

predict.train <- umap_transform(X = X[train.idx,], 
                                model = UM, n_threads = 4L, 
                                verbose = TRUE)

predict.test <- umap_transform(X = X[test.idx,], 
                               model = UM, n_threads = 4L, 
                               verbose = TRUE)



library(car)

scatter3d(x = predict.train[,1], 
          y = predict.train[,2], 
          z = predict.train[,3],
          groups = Y[train.idx],
          grid = FALSE, 
          surface = FALSE,
          ellipsoid = FALSE,
          bg.col = "black",surface.col = c(2,3))

There are two functions, get.ind and get.target: the first creates a dataset of indicators, the second builds the target from the zigzag.

All you need to do is load 10k closing prices and store them in the variable clos,

and you get your umap with the target.

https://github.com/jlmelville/uwot

Aleksey Vyazmikin 2020.07.28 22:51 #19285

mytarmailS:

Especially for you, in the sole hope that you will learn R)

There are two functions: the first creates a dataset of indicators, the second builds the target from the zigzag.

All you need to do is load 10k closing prices and store them in the variable clos,

and you get your umap with the target.

https://github.com/jlmelville/uwot

Very nice, thanks!

I wish there were more comments :)

The question here is how to synchronize predictors taken from a file with the resulting target?

mytarmailS 2020.07.28 22:59 #19286

Aleksey Vyazmikin:

Very nice, thank you!

I wish there were more comments :)

The question here is how to synchronize predictors taken from a file with the resulting target?

Well, since the target is built from the price, it is already synchronized, and if the predictors are built from the same price series, they are synchronized too)

Or maybe I did not understand your question?

I tried to name the variables so that they were understandable without comments
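
To illustrate the synchronization point: a target computed from the close vector has one value per bar, so row i of the target lines up with row i of any predictor table built on the same bars. The sketch below uses a made-up data frame d and a simple stand-in target (sign of the next price change) instead of the ZigZag one; the column names pred1 and pred2 are hypothetical.

```r
set.seed(1)

# Hypothetical file contents: one row per bar, close price plus ready-made predictors
n <- 200
d <- data.frame(
  close = 100 + cumsum(rnorm(n)),
  pred1 = rnorm(n),
  pred2 = rnorm(n)
)

# Stand-in target: direction of the next bar's move
# (same length as the price vector, like the output of get.target)
next_move <- c(diff(d$close), 0)
target <- ifelse(next_move >= 0, 1, -1)

# Predictors taken straight from the file instead of being generated by get.ind
X_own <- d[, c("pred1", "pred2")]
Y_own <- as.factor(target)

# Same number of rows, so the same index selects matching bars in both
train_idx <- 1:150
test_idx  <- 151:200
train <- cbind(X_own[train_idx, ], target = Y_own[train_idx])
test  <- cbind(X_own[test_idx, ],  target = Y_own[test_idx])
```

With real data, d would come from something like read.csv() on the exported file, and no rows need to be dropped as long as predictors and close price sit in the same rows.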

fxsaber 2020.07.28 23:10 #19287

A question from a beginner.

There are three variables A, B, C. A condition on them is written by hand, for example:

(A > B) && (A - B < C) && (A + 3 * C > 2 * B)

I want to reproduce this condition automatically. There is no need to find it, because I already know it. What I need is, say, a dozen weight coefficients, some combination of which reproduces this condition with high probability when I feed A, B, C into the function (a polynomial or a neural network, I don't know which, since I'm a complete beginner here).

I am interested in what form the desired function takes and how many input weight coefficients it has, so that such original conditions can be reproduced via weights.
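
One way to read the example condition, offered as an illustration rather than as the answer the thread arrives at: each of the three inequalities is linear in A, B, C, so the whole condition is an AND of three weighted sums compared with zero, i.e. nine weights. The matrix W and the helper names below are introduced here for illustration only.

```r
# Each row of W is one inequality rewritten as  w1*A + w2*B + w3*C > 0
W <- rbind(
  c( 1, -1, 0),  # A > B          <=>   A -   B       > 0
  c(-1,  1, 1),  # A - B < C      <=>  -A +   B +   C > 0
  c( 1, -2, 3)   # A + 3*C > 2*B  <=>   A - 2*B + 3*C > 0
)

# Original hand-written condition
cond_manual <- function(A, B, C) (A > B) & (A - B < C) & (A + 3 * C > 2 * B)

# Same condition through the weights: all three weighted sums must be positive
cond_weights <- function(A, B, C, W) {
  s <- cbind(A, B, C) %*% t(W)   # one column per inequality
  rowSums(s > 0) == nrow(W)
}

# Check on random integer inputs (integers keep the arithmetic exact)
set.seed(7)
n <- 1000
A <- sample(-10:10, n, replace = TRUE)
B <- sample(-10:10, n, replace = TRUE)
C <- sample(-10:10, n, replace = TRUE)

agree <- all(cond_manual(A, B, C) == cond_weights(A, B, C, W))
```

A different hand-written condition of the same shape would only change the entries of W; a condition with nonlinear terms would not fit this form.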

Aleksey Vyazmikin 2020.07.28 23:25 #19288

So, here is how to train trees by clusters; I'll tell and show.

We got the following model for cluster recognition

On history it is accurate enough, Accuracy 0.9196756, i.e. the logic of the clusters is quite reproducible.

Next, I trained a model for each cluster

Cluster 1

Cluster 2

Cluster 3

Cluster 4

All clusters have an Accuracy of about 0.53.

And this is what the model looks like without splitting into clusters

Accuracy 0.5293815, about the same as for the clusters.
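
The workflow described above can be sketched roughly as follows. This is not the author's code: the data is synthetic, k-means with 4 clusters and rpart trees are assumptions chosen for illustration, and the accuracies it produces have nothing to do with the figures quoted in the post.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)

# Synthetic stand-in data: two predictors and a noisy -1/1 target
n <- 2000
X <- data.frame(p1 = rnorm(n), p2 = rnorm(n))
target <- factor(ifelse(X$p1 * X$p2 + rnorm(n) > 0, 1, -1))

# Step 1: split the sample into 4 clusters (k-means is an assumption here)
km <- kmeans(X, centers = 4, nstart = 5)
cluster <- factor(km$cluster)

# Step 2: a tree that recognizes the cluster from the predictors
cluster_tree <- rpart(cluster ~ p1 + p2, data = cbind(X, cluster = cluster),
                      method = "class")
cluster_acc <- mean(predict(cluster_tree, X, type = "class") == cluster)

# Step 3: one target tree per cluster
dat <- cbind(X, target = target)
cluster_models <- lapply(levels(cluster), function(k) {
  rpart(target ~ p1 + p2, data = dat[cluster == k, ], method = "class")
})
names(cluster_models) <- levels(cluster)

# Step 4: a single tree on the whole sample, for comparison
single_tree <- rpart(target ~ p1 + p2, data = dat, method = "class")
single_acc <- mean(predict(single_tree, dat, type = "class") == target)

# Prediction with the cluster models: recognize the cluster, then use its tree
pred_cluster <- predict(cluster_tree, X, type = "class")
pred <- rep(NA_character_, n)
for (k in levels(cluster)) {
  rows <- which(pred_cluster == k)
  if (length(rows) > 0) {
    pred[rows] <- as.character(predict(cluster_models[[k]], X[rows, ], type = "class"))
  }
}
clustered_acc <- mean(pred == as.character(target))
```

Comparing clustered_acc with single_acc on data outside the training period is the analogue of the comparison made in the post.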

If we compare the models for clusters and one tree model with the entire sample, we see that the cluster trees have more leaves with generalized sample information with a target of 1 and -1, which is theoretically good.

Let's see what the tests show - first, let's look at the training period

Model without cluster partitioning:

Model with clustering:

We see that accuracy is better for the model without clustering, but the cluster model makes more trades, which gives a better financial result.

Now let's look at the sample outside of training.

And here are our clusters:

And the model without clusters:

The situation here seems to be reversed: the large number of trades had a detrimental effect once the market started convulsing in April.

I decided to look at the leaves from the cluster models individually, as if there were no clusters, as a histogram sorted in descending order:

There are 6 unprofitable leaves in total (the zero target is removed, since it blocks entry). Does it turn out that we end up in the wrong cluster?

Aleksey Vyazmikin 2020.07.28 23:29 #19289

mytarmailS:

Well, since the target is built from the price, it is already synchronized, and if the predictors are built from the same price series, they are synchronized too)

Or I have not understood the question.

I tried to name the variables so that they were understandable without comments

How do I take my own dataset with predictors and the close price and load it, specifying the column with the close price, instead of generating the indicators in R?

As I understand it, since the target is the ZZ tops, part of the sample gets filtered out, so to feed in my own predictors I would have to filter the predictor table in the same way, or what?

Maxim Dmitrievsky 2020.07.28 23:45 #19290

fxsaber:

A question from a beginner.

There are three variables A, B, C. A condition on them is written by hand, for example:

(A > B) && (A - B < C) && (A + 3 * C > 2 * B)

I want to reproduce this condition automatically. There is no need to find it, because I already know it. What I need is, say, a dozen weight coefficients, some combination of which reproduces this condition with high probability when I feed A, B, C into the function (a polynomial or a neural network, I don't know which, since I'm a complete beginner here).

I am interested in what form the desired function takes and how many input weight coefficients it has, so that such original conditions can be reproduced via weights.

As an option:

The input of the NN is the values A, B, C, n times (say 1000); the output is your formula's answers for those values as 0 or 1. Try it and look at the classification error to see how well the model reproduces the condition.

If you need to see exactly what form it takes and interpret it, then do it with trees.

Variant 2 (if the first one works badly): use A, B, A-B, C, A+3*C, 2*B as variables and feed them to a tree, everything else as in the first variant. Then you can view its structure as in Aleksey's pictures above.
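
Both variants can be tried with a few lines of R. This is a minimal sketch under assumptions not stated in the post: inputs are drawn uniformly from [0, 1], the tree is an rpart classification tree with default settings, and the column names AmB, Ap3C and B2 are made up here for the sub-expressions.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)

# Generate random A, B, C and label them with the hand-written condition (0/1)
make_data <- function(n) {
  A <- runif(n); B <- runif(n); C <- runif(n)
  y <- as.integer((A > B) & (A - B < C) & (A + 3 * C > 2 * B))
  data.frame(A, B, C,
             AmB  = A - B,        # variant 2: the sub-expressions as extra variables
             Ap3C = A + 3 * C,
             B2   = 2 * B,
             y    = factor(y, levels = c(0, 1)))
}
train <- make_data(5000)
test  <- make_data(2000)

# Variant 1: tree on the raw variables only
tree_raw <- rpart(y ~ A + B + C, data = train, method = "class")

# Variant 2: tree on the raw variables plus the sub-expressions
tree_ext <- rpart(y ~ A + B + C + AmB + Ap3C + B2, data = train, method = "class")

# Accuracy on unseen points shows how well the condition is reproduced
acc <- function(model) mean(predict(model, test, type = "class") == test$y)
acc_raw <- acc(tree_raw)
acc_ext <- acc(tree_ext)

# print(tree_ext) lists the learned splits for interpretation
```

Comparing acc_raw with acc_ext shows whether handing the tree the sub-expressions helps, and printing either tree gives the split structure to interpret.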

Machine learning in trading: theory, models, practice and algo-trading - page 1929