Machine learning in trading: theory, models, practice and algo-trading - page 214

 
Vladimir Perervenko:


PS. And parallelize the calculations of lm(). This is exactly the case when you need to

Thank you.

I've seen how to parallelize an operation in a loop with foreach %dopar%. I don't know how to hook it into the hidden loop inside DT, and I don't know whether it would be faster.

 
Alexey Burnakov:

Thanks.

I've seen how to parallelize an operation in a loop with foreach %dopar%. I don't know how to hook it into the hidden loop inside DT, and I don't know whether it would be faster.

I meant this part of the code.

lm_models <- x[,
    {
        # For each group: fit 20 nested no-intercept regressions of V21 on the
        # first n predictor columns and collect each fit's F-statistic.
        lapply(1:20, function(n)
            summary(lm(data = .SD[, c(1:n, 21), with = FALSE],
                       formula = V21 ~ . - 1))$fstatistic[[1]])
    },
    by = sampling
]

Instead of lapply, use foreach().
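A possible shape for that substitution (a sketch, not tested against the real data; the toy `x` and the doParallel backend are my assumptions, and foreach's automatic variable export is relied on to get the group's `.SD` copy to the workers):

```r
library(data.table)
library(foreach)
library(doParallel)

# Toy stand-in for the real data: 21 numeric columns plus a grouping column.
set.seed(42)
x <- as.data.table(matrix(rnorm(21 * 200), ncol = 21))
x[, sampling := rep(1:2, each = 100)]

cl <- makeCluster(2)
registerDoParallel(cl)

# For each group, fit the 20 nested regressions in parallel and collect
# each fit's F-statistic (same logic as the lapply version above).
lm_models <- x[,
    {
        sd_copy <- .SD  # local copy that foreach can ship to the workers
        foreach(n = 1:20, .packages = "data.table") %dopar% {
            summary(lm(data = sd_copy[, c(1:n, 21), with = FALSE],
                       formula = V21 ~ . - 1))$fstatistic[[1]]
        }
    },
    by = sampling
]

stopCluster(cl)
print(lm_models)
```

Whether this is actually faster depends on how expensive each lm() fit is relative to the cost of shipping the group's data to the workers.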

 

There is something wrong with graphs that take several tens of seconds to build.

Check out this package ("nhstplot"). It's fast and pretty good, in my opinion.

> plotftest(f = 4, dfnum = 3, dfdenom = 5, title = "Fisher's F test")

 
Vladimir Perervenko:

There is something wrong with graphs that take several tens of seconds to build.

Check out this package ("nhstplot"). It's fast and pretty good, in my opinion.

> plotftest(f = 4, dfnum = 3, dfdenom = 5, title = "Fisher's F test")

I will have a look at it. But where is the semi-transparency here, where are the hundreds of superimposed objects? Test it under heavy conditions and then we'll see if it's faster or not.
 
Vladimir Perervenko:

Ah, I'll try that, thanks. So the lapply loop gets replaced by a parallel one. But all of this still runs inside a DT by-group loop with 1000 iterations.

Is there a way to run those 1000 iterations through foreach instead?
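One hedged way around the 1000 by-group iterations is to parallelize over the groups themselves rather than over the 20 models: split the table by `sampling` and hand each chunk to a worker, keeping the cheap inner loop serial. A sketch with toy data (the layout of `x` is my assumption):

```r
library(data.table)
library(foreach)
library(doParallel)

# Toy data: 21 numeric columns and a grouping column with several groups.
set.seed(42)
x <- as.data.table(matrix(rnorm(21 * 400), ncol = 21))
x[, sampling := rep(1:4, each = 100)]

cl <- makeCluster(2)
registerDoParallel(cl)

# Parallelize over the groups instead of over the 20 models: each worker
# receives one group's rows and runs the inner loop serially.
results <- foreach(g = split(x, by = "sampling"),
                   .packages = "data.table") %dopar% {
    sapply(1:20, function(n)
        summary(lm(data = g[, c(1:n, 21), with = FALSE],
                   formula = V21 ~ . - 1))$fstatistic[[1]])
}

stopCluster(cl)
str(results)
```

With 1000 groups this keeps the per-task payload larger, which usually amortizes the worker communication overhead better than parallelizing the 20 inner fits.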

 
Alexey Burnakov:
I'll take a look at it. But where is the semi-transparency here, where are the hundreds of superimposed objects? Test in heavy conditions and then we'll understand whether it's faster or not.

Fast drawing of translucent objects can be offloaded to the video card via OpenGL. R has the rgl package for this; it is aimed at 3D, but if you set an orthogonal projection and draw lines, it will be just what you need. I couldn't figure it out on the spot; the documentation needs reading.

Done:

It's pretty easy to draw semi-transparent lines: all you need is a table with X and Y coordinates. You can also add a third column, Z, for three dimensions.

library(rgl)
# 1000 random walks drawn as semi-transparent blue lines (alpha = 0.1)
for (i in 1:1000) {
    lines3d(cbind(1:1000, cumsum(rnorm(1000))), col = "blue", alpha = 0.1)
}

But it turned out to be slow all the same. Judging by Process Explorer, the GPU is only 5% used, while one logical processor sits at 100%. I think R is too slow at feeding data into OpenGL, much slower than OpenGL can receive it. Somehow it came out that way.

Just for fun :) run this, maximize the window to full screen, and spin the "figure" with the left mouse button.

library(rgl)
# 100 random 3D walks, semi-transparent (alpha = 0.2)
for (i in 1:100) {
    lines3d(cbind(cumsum(rnorm(100)), cumsum(rnorm(100)), cumsum(rnorm(100))), alpha = 0.2)
}
 
You can do it that way. And you could push out a hundred lines at a time, and the distributions in the full program; it would be much faster.
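A sketch of that batching idea: stack many polylines into one matrix with an all-NA row between them (as far as I know, rgl treats NA vertices as breaks in the path) and make a single lines3d() call, so R crosses into OpenGL once instead of a hundred times. The rgl.useNULL option is only so the sketch also runs without a display:

```r
options(rgl.useNULL = TRUE)  # null device; drop this line for a real window
library(rgl)

# One random-walk polyline in the XY plane (Z fixed at 0).
one_walk <- function(n) cbind(1:n, cumsum(rnorm(n)), 0)

set.seed(42)
# Stack 100 walks into one matrix; an all-NA row between walks marks a break.
batched <- do.call(rbind, lapply(1:100, function(i)
    rbind(one_walk(1000), c(NA, NA, NA))))

# Single call into OpenGL instead of 100 separate lines3d() calls.
lines3d(batched, col = "blue", alpha = 0.1)
```

If the NA-break behaviour differs in your rgl version, segments3d() with explicit vertex pairs is another single-call option.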
 

Finally made a first try of the cluster idea I voiced earlier. A test run, just to see what's up; the predictors are simple.

Moving series of 5 OHLC values + volume + volatility: 6 predictors of 5 values each.

History of 100,000 bars.

Each predictor was normalized, of course ), and then clustered into 100 clusters (sorry for the nonsense).
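A minimal sketch of that preprocessing as I read it (the use of embed() and kmeans() is my assumption; the real code may differ): each 5-value window of one predictor is normalized and then replaced by the id of the cluster it falls into.

```r
# Toy price series standing in for one of the six predictors.
set.seed(42)
series <- cumsum(rnorm(1000))

# Rolling windows of 5 values each (one row per bar, after the first 4).
windows <- embed(series, 5)

# Normalize each window, then quantize into 100 clusters: every bar is
# represented by the integer id of the cluster its window belongs to.
norm <- t(apply(windows, 1, function(w) (w - mean(w)) / sd(w)))
km <- kmeans(norm, centers = 100, iter.max = 50)
cluster_id <- km$cluster

str(cluster_id)
```

The same transform applied to all six predictors yields the rows of cluster numbers the patterns below are built from.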

The target was created on the spur of the moment (whatever came to mind as I sat down). I simply took a reversal in this form: the target is an extremum that is higher than the previous 4 candles and higher than the 10 candles that follow it.
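That target rule can be sketched like this (toy `high` series; the 4/10 window widths are taken from the description above):

```r
# Label each bar 1 when its high exceeds the highs of the previous 4 bars
# and the highs of the next 10 bars; otherwise 0. Edge bars get 0.
set.seed(42)
high <- cumsum(rnorm(500))  # toy series of candle highs

is_reversal <- sapply(seq_along(high), function(i) {
    if (i <= 4 || i + 10 > length(high)) return(0L)
    as.integer(high[i] > max(high[(i - 4):(i - 1)]) &&
               high[i] > max(high[(i + 1):(i + 10)]))
})

table(is_reversal)
```

Note the look-ahead: the label uses the next 10 bars, so it is only usable for training, never as a live signal.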

I started to look for repeating patterns...

The best I could find with such weak predictors was the following pattern:

open  high  low  close  volum  volat  target_avg  target_count
  91     6   30     41     91    100         0.4             9

In (open, high, low, close, volum, volat) are the numbers of the clusters that characterize this pattern.

target_avg is the probability that my reversal triggers in this pattern; I haven't managed to find any patterns with an 80-90% trigger probability on these predictors.

target_count is the number of times the pattern occurred in the history; I haven't found any significant patterns that occurred 30-50 times on these predictors either.

In total, the best I found on these predictors is a pattern in which the reversal (target) works in 40% of cases, with 9 observations of it in the history.

Maybe this is the only piece of useful, non-random information that can be extracted from this set of predictors, and it explains just one cause of reversals, and even then only 40% of the time; the causes of different reversals differ, and there are certainly not just 10 or 30 of them, imho.

Now think about how an algorithm could explain all market movements with such predictors. It can't: the predictors explain only 2%, and all the rest is noise...

Plus there is no control of statistical recurrence, i.e. the ML can make a decision based on one or two observations, and in fact that is what happens in 95% of cases.

Anyway, I digress... back to the pattern. Having estimated the quality of the entries on the new sample, I will say this: it is far from a Mercedes, but if the old approach is a beaten-up Zaporozhets, this one is a "nine" straight from the factory.

The quality of the entries is much better: clearer, fewer errors...

Another thing: the full pattern is...

open  high  low  close  volum  volat
  91     6   30     41     91    100

When I ran recognition on the new data of 50,000 candlesticks, the algorithm could not find a single such pattern; it simply never occurred ))

I had to cut the pattern down and keep only the prices:

open  high  low  close
  91     6   30     41

I found about 20 such patterns.

Here are the entries for the pattern. I have not selected any "best entries", just captured them as-is, in the order the deals were made; not all the deals, of course, just the first few, so you can evaluate them.

[screenshot: trade entries]

The equity is good, although the risk is greater than expected.

[screenshot: equity curve]

remember, it's only one pattern, and only the shorts

If someone needs the code, I will post it, although I doubt anyone does; everything is quite straightforward.
 

214 pages is a lot to study and digest. (And everyone is talking about different things, not always intelligibly.)

Is it possible to summarize all these pages in one post, even if not a very short one? Something like: the goal set, the methods of solution, the results, the conclusions.

Let me say right away that my model of the market is a random process (Brownian motion), or rather the sum of several (perhaps many) such motions with feedbacks. It is absolutely useless to predict it or to look for any regularities other than statistical ones. That is, meaningful predictors simply do not exist, at least for speculative purposes.
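For illustration only, here is one toy reading of that model; the "feedback" term is my own simplification (a weak mean-reverting pull), not something the post specifies:

```r
# Toy market model: price increments are the sum of k independent
# Brownian drivers plus a weak feedback on the current price level.
set.seed(42)
n <- 1000   # bars
k <- 5      # number of independent Brownian drivers

price <- numeric(n)
for (t in 2:n) {
    shock    <- sum(rnorm(k, sd = 1 / sqrt(k)))  # sum of k Brownian increments
    feedback <- -0.01 * price[t - 1]             # weak mean-reverting feedback
    price[t] <- price[t - 1] + shock + feedback
}

head(price)
```

With feedback set to zero this is an ordinary random walk; the point of the model is that any apparent "pattern" in such a series is statistical, not predictive.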

 
Of course, post the code; it's interesting.