Machine learning in trading: theory, models, practice and algo-trading - page 196

 
Vizard_:
Take your time. Regardless of the clustering method: do one pass, trim a few observations, do it again, and compare the results. Work out a method for further action. Otherwise you'll build "grails" and then wonder why they don't work))))


I was thinking of doing an exhaustive search: if there were 500 predictors initially, it may turn out in the end that a good model takes only 5 of them into account, and those are all I'll keep...

 
Dr.Trader:

If you search for patterns the way mytarmailS does, with a sliding window over bars, then each pattern carries information about the interval the values on each bar can fall into. The more patterns there are, the smaller the interval allocated to each bar.

This is just my style of data input; good or bad, I don't know. I think it's much the same as everyone else's...

But it has nothing to do with the approach I explained...

That approach lets you, through clustering, extract the few percent of useful data that conventional ML methods can't.

You can take your predictors exactly as you feed them into your network, cluster them (each one), and run them through the algorithm I described
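If I read that last suggestion correctly, it means clustering every predictor column separately before feeding it onward. A rough sketch of that step in R, with kmeans as a mere stand-in for whatever clustering method is actually meant (the data, the 10-cluster count, and the helper are my own illustration):

# Cluster each predictor column on its own and keep the cluster ids.
# kmeans is only a placeholder; 10 clusters per predictor is arbitrary.
set.seed(42)
X <- matrix(rnorm(1000 * 5), ncol = 5)   # 5 hypothetical predictors
X_clustered <- apply(X, 2, function(col) kmeans(col, centers = 10)$cluster)
head(X_clustered)                        # each value is now a cluster label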

 
Dr.Trader:

Roughly speaking, for a window of new data to fall into some previously found pattern, it must satisfy the vertical constraints inherent in that pattern.
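One way to read those constraints: each pattern stores, per bar, the interval of values seen in the windows assigned to it, and a new window matches only if every bar stays inside its interval. A minimal sketch of that check (my own illustration; the helpers are hypothetical, not code from this thread):

# Per-bar min/max envelope of the windows previously assigned to a pattern
pattern_bounds <- function(windows) {      # rows = windows, cols = bars
  list(lo = apply(windows, 2, min),
       hi = apply(windows, 2, max))
}

# TRUE if every bar of the new window lies inside the pattern's envelope
matches_pattern <- function(w, b) all(w >= b$lo & w <= b$hi)

# toy usage: 20 windows of 10 bars define the pattern's vertical limits
set.seed(1)
b <- pattern_bounds(matrix(rnorm(200), ncol = 10))
matches_pattern(rnorm(10), b)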

Well, it depends on how much you want to train the model: if you train it weakly and don't make many clusters, the figures in the model's "head" will be quite smoothed out, and the defect you mention should theoretically disappear...

Here's how the model sees a price series of 50 values packed into 49 clusters:


Code:

price <- cumsum(rnorm(30000))+1000
plot(price,t="l")

# build a sliding window (50 values) via a Hankel matrix:
# each row of the result is one window of 50 consecutive prices
hankel <- function(data, r=50) {
  do.call(cbind,
          lapply(0:(r-1),function(i) { data[(i+1):(length(data)-(r-1-i))]}))}
price <- hankel(price)

# scale and center, i.e. normalize, each row of the matrix
price <- t(apply(price,1,function(x){scale(x,center = T,scale = T)}))

# load a clustering package, in this case a Kohonen self-organizing map
# a good manual:
# https://cran.r-project.org/web/packages/SOMbrero/vignettes/doc-numericSOM.html
install.packages("SOMbrero")
library(SOMbrero)

# train the model on a 7x7 grid, i.e. 49 clusters
model <- trainSOM(price, dimension=c(7,7), verbose=T)
plot(model,what="prototypes",type="lines",print.title=T)
 
mytarmailS:



There you go. I did the same thing a long time ago. Kohonen is interesting stuff. But remember that calling it a clustering algorithm is wrong: it's a convolution algorithm, and clustering is then done on the resulting two-dimensional space...
 
Alexey Burnakov:

Thank you! I did not know.

But the model's output can be used as a cluster label, can't it?

head(model$clustering , 100)
  [1]  7  7  7  7  7  7  7  7  6  6  6  5  5  4  4  3  3  2  2  1  1  1  1  1  8 15 15 22 22 29 36
[32] 43 43 43 43 43 43 43 44 44 45 45 45 46 46 46 46 47 47 47 47 48 48 48 49 42 35 35 28 28 21 21
[63] 21 21 21 21 21 21 21 21 21 14 14 14  7  7  7  7  7  7  7  6  5  4  3  3  2  1  1  1  1  1  1
[94]  8  8 15 22 22 29 36
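If I read the SOMbrero docs right, model$clustering is exactly that: the winning map unit (1..49 here) for every input row, so it can be used directly as a categorical cluster label. A minimal sketch, assuming the model and Hankel price matrix from the code above (the predict call on new data is my assumption; check ?predict.somRes):

# Use the winning SOM unit of each window as a categorical cluster label.
labels  <- model$clustering               # one unit id per window (1..49)
feature <- factor(labels, levels = 1:(7 * 7))

# For a new, normalized 50-value window, SOMbrero's predict method
# should return its winning unit (assumption - see ?predict.somRes):
new_window <- matrix(as.numeric(scale(cumsum(rnorm(50)))), nrow = 1)
predict(model, new_window)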
 
mytarmailS:


Yes, of course you can... It is essentially an aggregation of similar vectors into these cells.

And it can go like this (I've done it myself at work, so I know what I'm talking about): I have n million records, and the input vector is a couple of hundred values long. I want to cluster it, but I don't know in advance how many clusters there are, and not every algorithm can process such an array of data on an ordinary computer. So I first compress the input space by building a grid of, for example, 50*50. That gives 2500 typical representatives of the population... On an array of that size my computer can handle agnes (hierarchical clustering), which builds an all-to-all proximity matrix...

And it turns out that I cluster these 2500 into, say, 10 clusters, and the aggregation metric is good.
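To make the two-stage idea concrete, here is a toy-scale sketch of that pipeline in R: the 7x7 SOM trained above stands in for the 50*50 grid, and agnes from the cluster package does the hierarchical step (my illustration, not Alexey's actual code; k = 10 is just the example value from his post):

library(cluster)

# stage 1 output: the SOM prototypes are the "typical representatives"
protos <- model$prototypes              # 49 rows here, 2500 in the 50*50 case

# stage 2: hierarchical clustering (agnes) on the all-to-all proximities
ag    <- agnes(protos, method = "ward")
super <- cutree(as.hclust(ag), k = 10)  # cut the tree into 10 clusters

# map every original window to its super-cluster via its SOM unit
super_of_window <- super[model$clustering]
table(super_of_window)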

 
Alexey Burnakov:


In the manual https://cran.r-project.org/web/packages/SOMbrero/vignettes/doc-numericSOM.html, the section "Building super classes from the resulting SOM" is exactly what you're talking about.
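For what it's worth, SOMbrero seems to wrap that whole step in one call; a quick sketch on the trained model from above (k = 10 is illustrative):

# Build 10 super-classes on top of the trained 7x7 map
sc <- superClass(model, k = 10)
summary(sc)
plot(sc)   # dendrogram of the prototypes, if I recall the default correctly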

 

For those who want to know what the market really is: watch the videos, from the oldest to the newest....

https://www.youtube.com/channel/UCsdK6s9i_I0NEMvis6wXhnQ/featured

It seems not to be difficult, but programming it? That's beyond me for now...

I suggest we discuss how such approaches could be coded for ML, or just coded at all.

 

For information on the R language and the new MetaTrader 5 build 1467:

  • An updated version of the statistical libraries, similar to R, has been released:

    Statistical Distributions in MQL5 - take the best from R and make it faster

  • Calculations in MQL5 are 3 to 7 times faster than in R (even taking into account that the R functions are implemented in C++).
  • Some R functions contain errors caused by old optimization/simplification methods, leading to erroneous results.
  • A beta version of graphical libraries similar to R's has been added, which allows data to be visualized as in R.
  • Added the useful ArrayPrint function, which prints both standard arrays and structures, similar to R.


You can update to 1467 from the MetaQuotes-Demo server.

A lot of new mathematical and statistical functions similar to R will be added in the coming versions. This will allow more calculations and visualizations to be carried out directly in MetaTrader 5.
 
Just curious: do you use only price values and various price-based indicators as predictors here? Does anyone use real volumes and volume-based indicators?