Machine learning in trading: theory, models, practice and algo-trading - page 1781

 
Aleksey Vyazmikin:

I didn't delete the date - I just didn't save it so it would take up less space in the file - what do you need it for?

Well, time is an indirect indicator of volatility, which is seasonal: there are active trading hours and passive ones.
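A minimal sketch of why the timestamp can be worth keeping: it groups absolute close-to-close increments by hour of day to get a rough intraday activity profile. The function name `mean_abs_increment_by_hour` and the sample bars are hypothetical, invented only for illustration.

```python
from collections import defaultdict
from datetime import datetime

def mean_abs_increment_by_hour(bars):
    """bars: list of (timestamp, close) sorted by time.
    Returns {hour: mean absolute close-to-close increment}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (_, prev_close), (ts, close) in zip(bars, bars[1:]):
        totals[ts.hour] += abs(close - prev_close)
        counts[ts.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}

# Hypothetical bars: quiet at night, more active in the afternoon.
bars = [
    (datetime(2020, 5, 4, 2), 1.1000),
    (datetime(2020, 5, 4, 3), 1.1001),
    (datetime(2020, 5, 4, 14), 1.1020),
    (datetime(2020, 5, 4, 15), 1.1050),
]
profile = mean_abs_increment_by_hour(bars)
```

The hour itself (or such a per-hour profile) could then be fed to the model as one more predictor.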

Aleksey Vyazmikin:

Is it possible to consecutively save the predictors in a file and then just download them for training?

If you can save them, then to train the model you still have to load the whole matrix into the environment, and that is where it will end )) Or rather even earlier, at the stage of building the matrix of predictors.
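A minimal sketch of the idea being discussed: predictors are appended to a file portion by portion and read back in batches, so the full matrix is never held in memory at once. The helper names `append_predictors` and `read_in_batches` and the file name are hypothetical. Whether a given learner can actually train on batches rather than on the whole matrix is an assumption that depends on the model.

```python
import csv
import os
import tempfile
from itertools import islice

def append_predictors(path, rows):
    """Append rows of predictor values to a CSV file, one bar per row."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)

def read_in_batches(path, batch_size):
    """Yield lists of float rows, batch_size rows at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            batch = [[float(x) for x in row] for row in islice(reader, batch_size)]
            if not batch:
                return
            yield batch

path = os.path.join(tempfile.mkdtemp(), "predictors.csv")
append_predictors(path, [[0.1, 0.2], [0.3, 0.4]])  # first portion
append_predictors(path, [[0.5, 0.6]])              # portion computed later
batches = list(read_in_batches(path, batch_size=2))
```

This only moves the memory problem out of the saving stage; if the training routine needs the whole matrix at once, the objection in the reply above still applies.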

Aleksey Vyazmikin:

My training sample is about a gigabyte now - CatBoost can easily handle it, but I wouldn't risk building a genetic tree in R...

Wow, that's no small amount. I wonder how many features you have?

What kind of genetic tree?


Aleksey Vyazmikin:

1)Are you using volume, or is it just more convenient with volume?

2)Don't you need ZZ parameters to adjust predictors?

3) I don't understand the part about data distortion: do you need to shift the data so that all of the bar's data is known at bar zero? If so, aren't you peeking ahead on the zero bar?

1) just )

2) How so? And how do you adjust the predictors to the ZZ?

If you change something for your own purposes, you should always leave the original for others.

 

Now I'm finalizing the MOS on clustering. It doesn't learn well on clusters alone, so I will add increments to the features.

another example


 
Maxim Dmitrievsky:

Now I'm finalizing the MOS on clustering. It doesn't learn well on clusters alone, so I will add increments to the features.

another example

What's on the graphs?

Is that with and without increments? Is it the balance on the charts?

 
mytarmailS:

What's on the graphs?

Is that with and without increments? Is it the balance on the charts?

Train and test, balance in points.

Without increments, clusters only.

Increments add nothing.

 
Maxim Dmitrievsky:

Train and test, balance in points.

Without increments, clusters only.

Increments add nothing.

Try to "thin out" the clusters, i.e. drop the long runs where the same cluster repeats...

For example, you have a vector of clusters built on price; price is "P", cluster is "C":

price: P P P P P P P P P P P P P P

cluster number: 11111111122222222221111111111111112222222222222

Leave only the transitions from cluster to cluster:

P P P

2 1 2

If we drop all these repeated stretches, we may get rid of the noise, and the sample will shrink considerably.

Try it; I did this once with an HMM.
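A minimal sketch of the thinning described above, in plain Python: only the bars where the cluster label changes relative to the previous bar are kept. The function name `keep_transitions` and the sample data are hypothetical.

```python
def keep_transitions(prices, clusters):
    """Keep only the bars where the cluster label differs from the previous bar."""
    kept = [
        (p, c)
        for i, (p, c) in enumerate(zip(prices, clusters))
        if i > 0 and c != clusters[i - 1]
    ]
    return [p for p, _ in kept], [c for _, c in kept]

# Hypothetical data: runs of cluster 1 and cluster 2.
clusters = [1, 1, 1, 2, 2, 2, 1, 1, 2, 2]
prices = [10.0, 10.1, 10.2, 10.5, 10.4, 10.6, 10.3, 10.2, 10.7, 10.8]
thin_prices, thin_clusters = keep_transitions(prices, clusters)
```

Here ten bars collapse to three, with cluster labels 2, 1, 2, which matches the pattern in the example above. The first bar is dropped because it is not a transition; keeping it would be an equally reasonable design choice.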

 
mytarmailS:

Try to "thin out" the clusters, i.e. drop the long runs where the same cluster repeats...

For example, you have a vector of clusters built on price; price is "P", cluster is "C":

price: P P P P P P P P P P P P P P

cluster number: 1111111111122222222221111111111111112222222222222

Leave only the transitions from cluster to cluster:

P P P

2 1 2

If we drop all these repeated stretches, we may get rid of the noise, and the sample will shrink considerably.

Try it; I did this once with an HMM.

If you've already done it, there's no point

 
Maxim Dmitrievsky:

If you've already done it, there's no point.

Don't jump to conclusions out of thin air.

First, that was with an HMM, and second, I got a significant improvement, although the task was different...

 
mytarmailS:

Don't jump to conclusions out of thin air.

First, that was with an HMM, and second, I got a significant improvement, although the task was different...

Too painful and unclear )

 
Maxim Dmitrievsky:

too painful and incomprehensible)

Seriously? In R it's half a line, and in your vaunted Python it's painful?

Well...

 
Maxim Dmitrievsky:

Now I'm finalizing the MOS on clustering. It doesn't learn well on clusters alone, so I will add increments to the features.

Another example


What are you splitting up, and why aren't you happy with the increments? They are essentially rates of change over time. You can't do without averaging, though. But once you start taking averages into account, you quickly end up in a maze. There has to be a workable middle ground somewhere: the last tick alone is not enough, you need a bit more than that.
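A minimal sketch of the two quantities mentioned above, assuming "increments" means price differences over a fixed lag and "averaging" means a simple moving average of those differences. The function names, the lag, and the window are hypothetical choices for illustration, not anyone's actual settings.

```python
def increments(prices, lag=1):
    """Price change over `lag` bars: a rate of change per unit of time."""
    return [prices[i] - prices[i - lag] for i in range(lag, len(prices))]

def moving_average(values, window):
    """Simple moving average over the last `window` values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical prices.
prices = [100.0, 101.0, 103.0, 102.0, 106.0]
raw = increments(prices)                 # one-bar increments
smooth = moving_average(raw, window=2)   # averaged increments
```

The window length is where the trade-off sits: a window of 1 is just the last increment, while a long window smooths out the very changes the feature is meant to capture.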
