Machine learning in trading: theory, models, practice and algo-trading - page 3012

 
СанСаныч Фоменко #:

I've written about this a few times already.

Here you have to say the same thing a hundred times before anyone hears you.

 
Aleksey Vyazmikin #:

No, it's just that when I came to this thread and started talking about culling rules from trees and evaluating them, you laughed at the idea.

Now I have taken the next step: creating the conditions for generating potentially high-quality rules by evaluating quantized segments of a predictor, and again I am met with total incomprehension.

So simply pulling rules out of a tree didn't satisfy you? In theory it's a matter of luck there too, but given the sheer number of rules, something could be found.

It's roughly the same as searching over strategy parameters in an optimiser, only done more elegantly.
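As a rough illustration of the "quantized segments of a predictor" idea quoted above: split a single predictor into quantile bins and score each bin by how far its target rate deviates from the base rate. A minimal Python sketch, assuming a pandas DataFrame with hypothetical predictor/target columns:

```python
# Score quantile segments of one predictor by their deviation from the base rate.
# Column names and the number of bins are hypothetical placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "predictor": rng.normal(size=5000),        # stand-in for a real feature
    "target": rng.integers(0, 2, size=5000),   # 1 = profitable signal
})

df["bin"] = pd.qcut(df["predictor"], q=10, duplicates="drop")  # 10 quantile segments
base_rate = df["target"].mean()

stats = df.groupby("bin", observed=True)["target"].agg(["mean", "count"])
stats["lift"] = stats["mean"] - base_rate           # deviation from the base rate
print(stats.sort_values("lift", ascending=False))   # candidate segments on top
```

Bins with a large, stable lift would be the candidate segments worth building rules on.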
 
Maxim Dmitrievsky #:

So simply pulling rules out of a tree didn't satisfy you?

The method turned out quite well, except that there is no certainty about how long a rule will keep living, or whether it will be reincarnated later. Over long intervals, more than 50% of the selected rules showed positive performance results.

I used a genetic tree; this is very slow when the sample contains many predictors.

So I decided to reduce the amount of information fed to the tree for training, and started looking for ways to single out potentially useful data.

Another problem is that the leaves/rules are very similar in terms of their activation points, and as the base of leaves grew, uniqueness became hard to find (see the sketch after this post).

As a result, the design is interesting and there is room to improve it, but in my case it was extremely slow. In general it is not suitable for experiments, but it is worth implementing once the whole concept of the trading-system construction machinery is ready.

And of course, I don't know R; I asked the local gurus, and nobody could help me solve my tasks.

Now I would add sampling, forced selection of the root predictor (from a list), and blocking of predictors that have already been used.
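To make the "similar activation points" problem from this post concrete: represent each rule by the boolean mask of samples it fires on, measure pairwise Jaccard similarity, and keep only sufficiently distinct rules. A minimal sketch; the rules and the 0.6 threshold are hypothetical placeholders:

```python
# Filter near-duplicate rules by the overlap of their activation masks.
import numpy as np

def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))

# Each rule maps the sample matrix to a boolean activation vector.
rules = [
    lambda X: X[:, 0] > 0.5,
    lambda X: (X[:, 0] > 0.4) & (X[:, 1] < 0.0),   # likely overlaps the first rule
    lambda X: X[:, 2] < -1.0,
]
masks = [r(X) for r in rules]

kept = []
for i, m in enumerate(masks):
    if all(jaccard(m, masks[j]) < 0.6 for j in kept):  # 0.6 = arbitrary threshold
        kept.append(i)
print("distinct rules:", kept)
```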

 
Aleksey Vyazmikin #:

The method turned out quite well, except that there is no certainty about how long a rule will keep living, or whether it will be reincarnated later. Over long intervals, more than 50% of the selected rules showed positive performance results.

I used a genetic tree; this is very slow when the sample contains many predictors.

So I decided to reduce the amount of information fed to the tree for training, and started looking for ways to single out potentially useful data.

Another problem is that the leaves/rules are very similar in terms of their activation points, and as the base of leaves grew, uniqueness became hard to find.

As a result, the design is interesting and there is room to improve it, but in my case it was extremely slow. In general it is not suitable for experiments, but it is worth implementing once the whole concept of the trading-system construction machinery is ready.

And of course, I don't know R; I asked the local gurus, and nobody could help me solve my tasks.

Now I would add sampling, forced selection of the root predictor (from a list), and blocking of predictors that have already been used.

What does CatBoost have to do with it? Why do you need it, do you pull rules from it too?

Why not take a simple tree and go from the root to the leaves along the rules, giving less weight to complex rules (a penalty for rule complexity),

and run each rule in a tester on new data, throwing out the ones with a big error beforehand (sketched after this post).

P.S. I still don't like this approach intuitively; I haven't figured out why yet.
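A minimal sketch of what this post proposes, under assumed data and thresholds rather than anyone's actual code: extract the root-to-leaf rules from a plain sklearn tree, weight each rule down by its depth (the complexity penalty), and test every rule on held-out data, discarding the ones with a big error:

```python
# Walk a simple tree from root to leaves, penalise deep rules, test on new data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
t = clf.tree_

def leaf_rules(node=0, conds=()):
    """Yield (conditions, predicted_class) for every root-to-leaf path."""
    if t.children_left[node] == -1:                     # node is a leaf
        yield conds, int(np.argmax(t.value[node]))
        return
    f, thr = t.feature[node], t.threshold[node]
    yield from leaf_rules(t.children_left[node], conds + ((f, "<=", thr),))
    yield from leaf_rules(t.children_right[node], conds + ((f, ">", thr),))

for conds, cls in leaf_rules():
    mask = np.ones(len(X_te), dtype=bool)
    for f, op, thr in conds:
        mask &= (X_te[:, f] <= thr) if op == "<=" else (X_te[:, f] > thr)
    if not mask.any():                                  # rule never fires on new data
        continue
    err = float(np.mean(y_te[mask] != cls))             # error in the "tester"
    weight = 1.0 / len(conds)                           # penalty for rule complexity
    if err < 0.4:                                       # arbitrary cut-off for big errors
        print(f"depth={len(conds)} weight={weight:.2f} err={err:.2f} class={cls}")
```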
 
Aleksey Vyazmikin #:

And of course, I don't know R,

I've been hearing that for over a year.

You can learn R in a week.

 
Maxim Dmitrievsky #:

What does CatBoost have to do with it? Why do you need it, do you pull rules from it too?

CatBoost is great for speed: it lets me quickly check whether the direction of an idea is correct in the first place (a sketch follows below).

I can pull rules from its first tree, but on average they turn out much weaker (there are good ones, but very rarely), so I have set this idea aside for now. There is now an alternative way of building the trees; perhaps the rules there are stronger, but there is no way to work with such a model in MQL5 without Python.

And in general, I have my own ideas about how to build a model that is slow to create but uses the same checks that were used to select the leaves. Maybe someday I will get to implementing it in code.
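For the "quick direction check" use of CatBoost described above, a minimal sketch; the dataset and parameters are placeholders, not the actual setup:

```python
# Fast sanity check: if a quick CatBoost model is barely above chance on a
# validation set, the idea's direction is probably wrong and can be dropped early.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = CatBoostClassifier(iterations=300, depth=6, verbose=False)
model.fit(X_tr, y_tr, eval_set=(X_val, y_val))

print("validation accuracy:", model.score(X_val, y_val))
```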

 
Maxim Dmitrievsky #:

Why not take a simple tree and go from the root to the leaves along the rules, giving less weight to complex rules (a penalty for rule complexity),

and run each rule in a tester on new data, throwing out the ones with a big error beforehand.

P.S. I still don't like this approach intuitively; I haven't figured out why yet.

The difference is essentially only in the amount of data and CPU load when applying the model.

Plus, leaves are easier to ensemble by grouping them and distributing weights (I called it a herbarium :) ); a sketch follows below.

Many trees are used to create the rules, which means the signals overlap; a single tree does not give you that.
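A minimal sketch of the "herbarium" idea as described here: group the selected leaves/rules, give each a weight from its past performance, and let the overlapping signals accumulate in a weighted vote. The rules and weights are hypothetical placeholders:

```python
# Weighted vote over a group of selected leaf rules with overlapping activations.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))

# Rules collected from many trees; overlapping activations are expected.
rules = [
    (lambda X: X[:, 0] > 0.0, 0.50),   # (activation mask, weight from past results)
    (lambda X: X[:, 1] > 0.3, 0.30),
    (lambda X: X[:, 2] < -0.2, 0.20),
]

score = np.zeros(len(X))
for rule, w in rules:
    score += w * rule(X)               # overlapping signals accumulate

signal = score > 0.5                   # trade only where the weighted votes agree
print("activated samples:", int(signal.sum()))
```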

 
mytarmailS #:

I've been hearing that for over a year.

You can learn R in a week.

Apparently not everyone is so talented.

And the code is not simple; I tried to rework it, but there was not enough information on the Internet to solve the problem.

Another disadvantage of R is that there is no simple solution for parallelising calculations between computers.

 
Aleksey Vyazmikin #:

The difference is essentially only in the amount of data and CPU load when applying the model.

Plus, leaves are easier to ensemble by grouping them and distributing weights (I called it a herbarium :) ).

Many trees are used to create the rules, which means the signals overlap; a single tree does not give you that.

I've realised why I dislike this idea: association (rules, for example) != causation :)

 
Aleksey Vyazmikin #:

Another disadvantage of R is that there is no simple solution for parallelising calculations between computers.

Yeah, sure.

The professional opinion of a professional R user.
