Machine learning in trading: theory, models, practice and algo-trading - page 3163

 
Forester #:

I found another problem.
I found a good variant with training once a week on 5000 rows of M5 (about 3.5 weeks). Then I decided to shift all the data by 300 rows - as if training not on Saturdays but on Tuesdays. As a result, the model went from profitable to unprofitable on OOS.
These new 300 rows (about 8% of the total) brought out different features and different splits, which turned out better for the slightly changed data.
I repeated the 300-row shift with 50000 rows. That would seem to be only 0.8% of new rows, but the changes on OOS are significant too, though not as strong as with 5000 rows.

In general, there is overfitting not only to the size of the window but also to the start of the window. Small offsets make a big difference in the result. There are no strong features; everything sits on the edge of 50/50 ± 1-2%.
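For illustration, a minimal sketch of how such an offset-sensitivity check could be scripted (the data below is random placeholder noise, a stock RandomForestClassifier from sklearn stands in for the actual model, and the window/step sizes are taken from the post):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Placeholder data: substitute the real M5 feature matrix and class labels here.
features = rng.normal(size=(60000, 10))
labels = rng.integers(0, 2, size=60000)

window = 5000   # training window, roughly 3.5 weeks of M5
step = 300      # the "Saturday vs. Tuesday" offset
oos = 2000      # rows held out immediately after the training window

for start in range(0, 5 * step, step):
    tr = slice(start, start + window)
    te = slice(start + window, start + window + oos)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(features[tr], labels[tr])
    acc = model.score(features[te], labels[te])
    print(f"window start {start:5d}: OOS accuracy {acc:.3f}")
```

On real data, a stable setup should give roughly the same OOS score for every offset; large swings between offsets are the symptom described above.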

This seems to be a common problem for trees - lack of robustness.

There is a faint hope that some improvement is possible by moving to split rules that are more elaborate in terms of mathematical statistics. Something like the "difference trees" I linked an article about recently, or like the chi-square statistic in CHAID.

Of course, this is not a panacea, and it is not a fact that these specific split rules will work for us at all. But it is an example of how split rules can and should be treated creatively.

The main idea to take from mathematical statistics is to stop tree growth when a critical p-value is reached, rather than for some arbitrary reasons.
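For what it's worth, a minimal sketch of what a CHAID-like chi-square check for one candidate split could look like (this is only an assumption about how such a rule might be wired into a homemade tree, not the actual CHAID algorithm):

```python
import numpy as np
from scipy.stats import chi2_contingency

def split_p_value(x, y, threshold):
    """p-value of a chi-square test on the 2x2 table:
    (side of the candidate split) x (class label)."""
    left = x <= threshold
    table = np.array([
        [np.sum(left & (y == 0)), np.sum(left & (y == 1))],
        [np.sum(~left & (y == 0)), np.sum(~left & (y == 1))],
    ])
    # Degenerate splits (an empty side or a single-class node) are not worth testing.
    if (table.sum(axis=1) == 0).any() or (table.sum(axis=0) == 0).any():
        return 1.0
    return chi2_contingency(table)[1]   # [1] is the p-value

# Usage idea: accept the best candidate split only if its p-value is below a
# critical level (e.g. 0.05); otherwise stop growing this branch of the tree.
```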
 
Forester #:

I found another problem.
I found a good variant with training once a week on 5000 rows of M5 (about 3.5 weeks). Then I decided to shift all the data by 300 rows - as if training not on Saturdays but on Tuesdays. As a result, the model went from profitable to unprofitable on OOS.
These new 300 rows (about 8% of the total) brought out different features and different splits, which turned out better for the slightly changed data.
I repeated the 300-row shift with 50000 rows. That would seem to be only 0.8% of new rows, but the changes on OOS are significant too, though not as strong as with 5000 rows.

In general, there is overfitting not only to the size of the window but also to the start of the window. Small offsets make a big difference in the result. There are no strong features; everything sits on the edge of 50/50 ± 1-2%.

What model?

 
СанСаныч Фоменко #:

What model?

wooden
 
Forester #:
wooden
You need to find a coreset that contains a pattern and train only on it. It can be on any piece of the chart; it can be found by brute-force enumeration. Otherwise the noise does not let the model concentrate. The trend now is coresets - small representative subsamples. It is quite simple and gives results.
 

Interesting article about trees and reinforcement learning in them.....

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4760114/

============================

main idea

2.2 Motivation

In short, the proposed reinforcement learning trees (RLT) model is a traditional random forest model with a special type of splitting-variable selection and suppression of noise variables. These features are made possible by implementing a reinforcement learning mechanism at each internal node. Let us first consider a checkerboard example demonstrating the impact of reinforcement learning: assume that X ~ Unif[0, 1]^p and E(Y | X) = I{ I(X(1) > 0.5) = I(X(2) > 0.5) }, so that p1 = 2 and p2 = p - 2. The difficulty in estimating this structure using usual random forests is that neither of the two strong variables shows a significant marginal effect. The immediate reward, i.e. the reduction in prediction error, from splitting on either of these two variables is asymptotically identical to the reward obtained by splitting on any of the noise variables. Hence, when p is relatively large, it is unlikely that either X(1) or X(2) will be chosen as the splitting variable. However, if we knew in advance that splitting on either X(1) or X(2) would yield significant future benefits for later splits, we could confidently force a split on either variable regardless of the immediate reward.

=========================
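A quick sketch of that checkerboard effect with ordinary sklearn tools (the sizes and forest settings are arbitrary; the "forced" splits are simply hard-coded, since a stock tree offers no way to force them):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 1000, 200                     # 2 strong variables, p-2 noise variables
X = rng.uniform(size=(n, p))
y = ((X[:, 0] > 0.5) == (X[:, 1] > 0.5)).astype(int)   # checkerboard target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ordinary random forest: the immediate reward of splitting on X(1) or X(2)
# is no better than for the noise variables, so they are rarely chosen near the root.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("random forest, all variables:", round(rf.score(X_te, y_te), 3))

# Forcing the first two splits at X(1) > 0.5 and X(2) > 0.5 by hand and predicting
# the majority class in each of the four resulting cells recovers the structure:
cell_tr = 2 * (X_tr[:, 0] > 0.5) + (X_tr[:, 1] > 0.5)
cell_te = 2 * (X_te[:, 0] > 0.5) + (X_te[:, 1] > 0.5)
majority = {c: round(y_tr[cell_tr == c].mean()) for c in range(4)}
pred = np.array([majority[c] for c in cell_te])
print("forced splits on X(1), X(2):", round((pred == y_te).mean(), 3))
```

On this kind of data the plain forest usually lands noticeably below the forced four-cell partition, which is the gap the reinforcement mechanism in RLT is supposed to close, as described in the excerpt.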

And the corresponding R package:

https://cran.r-project.org/web/packages/RLT/RLT.pdf

Reinforcement Learning Trees (www.ncbi.nlm.nih.gov): In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each...
 
Forester #:
wooden

What's the exact name? Or is it homemade?

I have been using different "wooden" models for many years and have never seen anything like this.

 
mytarmailS #: However, if we know in advance that splitting on either X(1) or X(2) will yield significant future benefits for later splits, we could confidently force a split on either variable regardless of the immediate reward.

I can force it, but I don't know which feature to force the split on: X1, X2, or X157.

 
СанСаныч Фоменко #:

What's the exact name? Or is it homemade?

I have been using different "wooden" models for many years and have never seen anything like this.

Homemade. The possibilities for experiments are unlimited...
 
Maxim Dmitrievsky #:
You need to find a coreset that contains a pattern and train only on it. It can be on any piece of the chart; it can be found by brute-force enumeration. Otherwise the noise does not let the model concentrate. The trend now is coresets - small representative subsamples. It is quite simple and gives results.

How to search? Go through all the chunks (e.g. 100 chunks of 5000 rows each) and check how well a model trained on each chunk predicts the other ~500,000 rows?

 
Forester #:

How to search? Go through all the chunks (e.g. 100 chunks of 5000 rows each) and check how well a model trained on each chunk predicts the other ~500,000 rows?

Yes, and instead of contiguous chunks you can draw the samples randomly; that is more correct.
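A rough sketch of that search by enumeration (placeholder data again; a plain decision tree stands in for the actual model, and the subsample size and number of trials are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Placeholder data: substitute the real dataset here.
X = rng.normal(size=(50000, 10))
y = rng.integers(0, 2, size=50000)

n_trials, coreset_size = 100, 5000
best_score, best_idx = -np.inf, None

for _ in range(n_trials):
    idx = rng.choice(len(X), size=coreset_size, replace=False)   # random subsample
    rest = np.setdiff1d(np.arange(len(X)), idx)                  # evaluate on everything else
    model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X[idx], y[idx])
    score = model.score(X[rest], y[rest])
    if score > best_score:
        best_score, best_idx = score, idx

print("best subsample score on the rest of the data:", round(best_score, 3))
```

The subsample with the best out-of-subsample score is the candidate "coreset" to train the final model on.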
