Machine learning in trading: theory, models, practice and algo-trading - page 195

 
Mihail Marchukajtes:
Well yes, I was adding lags because in previous versions they increased generalization ability; with the improved predictor preselection algorithm that is no longer required, so I'm trying to train without them. Let's see how it goes. I will report the results later.

If the trick doesn't work, i.e. generalization ability does not improve without lags, then versions 13 and 14 are most likely some non-universal direction, tuned for a narrow range of tasks?

In that case, we'll have to roll back in Git in order to take jPrediction in a different, more universal direction.

Although there is a second hypothesis: the presence of lags in the sample is itself the narrow, non-universal direction that the previous versions were tuned for?

 
Yury Reshetov:

If the trick doesn't work, i.e. generalization ability does not improve without lags, then versions 13 and 14 are most likely some non-universal direction, tuned for a narrow range of tasks?

In that case, we'll have to roll back in Git in order to take jPrediction in a different, more universal direction.

Although there is a second hypothesis: the presence of lags in the sample is itself the narrow, non-universal direction that the previous versions were tuned for?

Well, let's see what's up... I'll report back as soon as I've trained it...
 
Dr.Trader:

I'll respond here then.

# a few rows from that table (I won't copy it all as text); then the first row is repeated twice more
dat <- data.frame(cluster1=c(24,2,13,23,6), cluster2=c(5,15,13,28,12), cluster3=c(18,12,16,22,20), cluster4=c(21,7,29,10,25), cluster5=c(16,22,24,4,11), target.label=c(1,1,0,1,0))
dat <- rbind(dat, dat[1,], dat[1,])
# the target of the last row is changed to 0 for the experiment
dat[7,"target.label"]=0

library(sqldf)
# for sqldf, column names must not contain dots
colnames(dat)[6] <- "target"

dat1 <- sqldf( "select cluster1, cluster2, cluster3, cluster4, cluster5, avg(target) as target_avg, count(target) as target_count from dat group by cluster1, cluster2, cluster3, cluster4, cluster5" )
dat1
dat1[ dat1$target_count>=10 & dat1$target_avg>0.63 , ]
dat1[ dat1$target_count>=10 & ( dat1$target_avg<0.37 | dat1$target_avg>0.63 ), ] # in case either "0" or "1" occurs more often than 70% of the time

Thank you, very compact solution!!!

Please help me with one more nuance in this line:


dat1 <- sqldf( "select cluster1, cluster2, cluster3, cluster4, cluster5, avg(target) as target_avg, count(target) as target_count from dat group by cluster1, cluster2, cluster3, cluster4, cluster5" )

How do I replace the explicitly listed cluster names with a single variable, say

colnames_dat <- colnamed(dat) [-"target"]
dat1 <- sqldf( "select colnames_dat, avg(target) as target_avg, count(target) as target_count from dat group by colnames_dat" )

because in reality there will be 500, maybe even 1000, clusters; writing each cluster name out by hand would be unrealistic, and the attempt above does not work as written

 
Mihail Marchukajtes:
Well, let's see what's up... I'll report back as soon as I've trained it...

The point is that before version 13, those predictors that were closer to the beginning of the sample were processed with higher probability. And those at the end of the sample (closer to the target variable) were processed with a lower probability. In other words, if the most significant predictors were placed to the left in the sample in advance, and the least significant ones to the right, we would get a good generalization ability. If vice versa, then poor. The problem was that in order to do this, we had to know in advance which predictors were most significant, i.e. we had to pre-rank them in the sample by significance. But in this case, the predictor selection algorithm itself was not very effective.

In version 14, the processing probability is about the same for all predictors. But this creates another problem. The predictor selection algorithm works as a gradient-style search, shifting one step at a time towards higher generalization ability, and, like other gradient methods, it carries a non-zero risk of getting stuck at a local extremum. Prior to version 12, this risk was reduced by the preliminary ranking of predictors in the sample.

In general, there are problems in both the first and the second variant of the algorithm, and they need to be analyzed somehow in order to eliminate them. For example, by introducing into the algorithm random jumps of several steps in different directions from the current state, in order to "jump" over the "ravines".
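To make the "jumping" idea concrete, here is a minimal sketch in R (this is not jPrediction's actual code; the data, the scoring function and all parameters are illustrative assumptions): a greedy predictor search that toggles one predictor at a time and occasionally takes a random multi-predictor jump away from the current subset to escape local extrema.

# Minimal sketch, not jPrediction's code: greedy predictor selection with
# occasional random multi-step "jumps" out of local extrema.
# Data, scoring function and parameters below are illustrative assumptions.
set.seed(42)
n <- 300
dd <- data.frame(matrix(rnorm(n * 10), ncol = 10))                 # 10 synthetic predictors X1..X10
dd$target <- factor(ifelse(dd$X1 + dd$X2 - dd$X3 + rnorm(n, 0, 0.5) > 0, 1, 0))
folds <- sample(rep(1:5, length.out = n))                          # fixed CV folds so scores are comparable

# score a predictor subset by cross-validated accuracy of a simple logistic model
score <- function(cols) {
  if (length(cols) == 0) return(0)
  mean(sapply(1:5, function(k) {
    fit <- glm(reformulate(cols, "target"), data = dd[folds != k, ], family = binomial)
    pred <- ifelse(predict(fit, dd[folds == k, ], type = "response") > 0.5, "1", "0")
    mean(pred == as.character(dd$target[folds == k]))
  }))
}

predictors <- setdiff(colnames(dd), "target")
current <- sample(predictors, 3)                                   # random starting subset
best <- current; best_score <- score(best)

for (step in 1:50) {
  p <- sample(predictors, 1)                                       # greedy step: toggle one predictor
  candidate <- if (p %in% current) setdiff(current, p) else union(current, p)
  if (score(candidate) > score(current)) {
    current <- candidate
  } else if (runif(1) < 0.1) {                                     # random "jump": toggle several predictors at once
    jump <- sample(predictors, 3)
    current <- union(setdiff(current, jump), setdiff(jump, current))
  }
  s <- score(current)
  if (s > best_score) { best <- current; best_score <- s }
}
best; best_score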

 
mytarmailS:

> clusternames <- paste(colnames(dat)[-ncol(dat)], collapse=",")
> clusternames
[1] "cluster1,cluster2,cluster3,cluster4,cluster5"
> sql_query <- paste0("select ", clusternames, ", avg(target) as target_avg, count(target) as target_count from dat group by ", clusternames)
> sql_query
[1] "select cluster1,cluster2,cluster3,cluster4,cluster5, avg(target) as target_avg, count(target) as target_count from dat group by cluster1,cluster2,cluster3,cluster4,cluster5"
> dat1 <- sqldf( sql_query )
> dat1

 
Yury Reshetov:

to "jump" over "ravines".

Sometimes L-BFGS optimization is built into neural networks; it lets them climb out of ravines. The nnet neural network package, for example.

There's a lot of math in there and I don't know exactly how it works, but the idea is to descend not along the gradient but along the gradient of the gradient (the derivative of the derivative).
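For reference, a minimal illustrative sketch of that point (the nnet documentation states that optimization is done via the BFGS method of optim); the data below is synthetic and only for demonstration.

# Illustrative only: nnet fits its weights with a BFGS quasi-Newton optimizer internally (see ?nnet)
library(nnet)
set.seed(1)
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y <- factor(ifelse(df$x1 + df$x2 + rnorm(200, 0, 0.3) > 0, 1, 0))   # synthetic binary target
model <- nnet(y ~ x1 + x2, data = df, size = 3, decay = 1e-3, maxit = 300, trace = FALSE)
table(predicted = predict(model, df, type = "class"), actual = df$y)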

 
Vizard_:
1) Understood correctly, and the example is a primitive one (when nets and the like have nothing to draw on, they start "making things up")))
2) But why look for 70% when you can find and use 100% (not for the price, of course).

1) Yes, yes, they have that sin )) I described it in the paragraphs about the advantages of "my" approach.

2) I will look for reversal combinations; the target is not just a direction or the color of a candle, but "reversal up", "reversal down", "no reversal".

I'll have far fewer observations than I need, but if everything works I'll be happy even with 40% accuracy; I don't even need 70%, since my target risk will be 1 to 5.

Dr.Trader:

Thank you very much. I will slowly prepare the data, then cluster it, then look for patterns, and I will report the results.

 
Dr.Trader:

Sometimes L-BFGS optimization is built into neural networks; it lets them climb out of ravines. The nnet neural network package, for example.

BFGS and its derivatives, such as L-BFGS, are designed to solve a problem that jPrediction already solves, namely finding local extrema. That is, these algorithms "climb out" of "ravines" towards the nearest extremum, rather than "jumping" over ravines in search of alternative extrema.

What we need are "jumper" algorithms. And it is desirable that they "jump" not randomly but in some potentially promising direction. In theory this could be implemented with genetics, where such "jumps" are made through mutations. But genetic algorithms are very slow and are better suited to tasks where potential offspring can be evaluated with minimal time cost. Training a neural network in order to measure its generalization ability is time-consuming, so genetics would be too slow here.

OK, for lack of anything better, I'm currently testing a variant with random "jumps".

 

Another R book. I'll post it here, since it's not clear where else it should go. Let it be.

S.E. Mastitsky, V.K. Shitikov.

STATISTICAL ANALYSIS AND DATA VISUALIZATION WITH R

 

If you search for patterns the way mytarmailS does, with a sliding window over the bars, each pattern carries information about what interval the values on each bar may fall into. The more patterns there are, the narrower the interval assigned to each bar.

Roughly speaking, for a window of new data to be assigned to a previously found pattern, it has to fall within the vertical limits specific to that pattern.

If you go and find 1000 patterns, the width of the "channel" for each of them will be small. And since new data always differs slightly from the training data, it will be hard to fall into such a narrow channel, and that will lead to errors.

I would be guided by Occam's razor: if you can reduce the number of patterns and get the same result without deterioration, then it is better to reduce them.
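As a rough illustration of those "vertical limits" (this is not mytarmailS's actual code; the series, the window normalization and the clustering step are assumptions), here is a sketch that stores per-bar min/max bounds for each pattern and checks whether a new window fits inside any of them:

# Illustrative sketch only: each "pattern" keeps per-bar min/max limits,
# and a new window matches a pattern only if every bar falls inside that channel.
set.seed(2)
window_len <- 5
series <- cumsum(rnorm(500))                                 # synthetic price-like series
windows <- t(sapply(1:(length(series) - window_len + 1),
                    function(i) series[i:(i + window_len - 1)]))
windows <- windows - windows[, 1]                            # normalize each window to its first bar

km <- kmeans(windows, centers = 20)                          # group windows into 20 "patterns"
limits <- lapply(1:20, function(k) {                         # per-bar channel of each pattern
  w <- windows[km$cluster == k, , drop = FALSE]
  list(lo = apply(w, 2, min), hi = apply(w, 2, max))
})

matches_pattern <- function(new_window, lim) {
  w <- new_window - new_window[1]
  all(w >= lim$lo & w <= lim$hi)
}

# the more patterns, the narrower each channel, and the more often new data matches none of them
new_w <- cumsum(rnorm(window_len))
which(sapply(limits, function(lim) matches_pattern(new_w, lim)))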
