Machine learning in trading: theory, models, practice and algo-trading - page 2386

 
Aleksey Vyazmikin:

Moreover, I have been using this approach for many years.

However, I have set it aside for now, because it is really very slow, which makes it impossible to quickly change the target (to search for a good one), and individual rules also stop working, no matter how you test them on history.

Maxim, could you help by writing a Python script that, in a loop, builds a model (or 100 models) from a CSV file, then analyzes the validity of the predictors with CatBoost's standard tools, then prohibits/allows the use of individual predictors under certain conditions (CatBoost can do this) and builds a new model again. This method would make it possible to single out the important predictors, which should improve the model. I use a similar method, but it is very inconvenient, because on every cycle, after sifting out/adding predictors, I have to restart the training manually.
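For reference, a minimal sketch of the kind of loop being asked for here, assuming a Python/CatBoost setup: the file name dataset.csv, the target column name, the 1.0 importance threshold and the ten-cycle limit are placeholders rather than anything specified in the thread; only CatBoost's standard get_feature_importance() statistics and the ignored_features parameter are relied on.

import pandas as pd
from catboost import CatBoostClassifier

# Placeholder data source and target column - adjust to the real CSV layout.
df = pd.read_csv("dataset.csv")
X, y = df.drop(columns=["target"]), df["target"]

banned = []  # indices of predictors prohibited so far
for cycle in range(10):  # arbitrary number of refinement cycles
    model = CatBoostClassifier(iterations=500, depth=6,
                               ignored_features=banned, verbose=False)
    model.fit(X, y)
    importances = model.get_feature_importance()  # standard CatBoost statistics
    # Prohibit predictors whose importance fell below an arbitrary threshold.
    weak = [i for i, v in enumerate(importances) if v < 1.0 and i not in banned]
    if not weak:
        break
    banned.extend(weak)
    print(f"cycle {cycle}: {len(banned)} predictors prohibited so far")

The banning condition here is just one possibility; any other rule on CatBoost's statistics could be substituted in its place.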

I don't have time for that yet

and I don't believe in this approach (I know better and faster options)

 
Maxim Dmitrievsky:

I don't have time for that yet

and I don't believe in this approach (I know better and faster options)

The approach works - it's not a matter of faith.

And as for which approach is faster and better, let's compare their effectiveness!

 
Aleksey Vyazmikin:

The approach works - it's not a matter of faith.

And as for which approach is faster and better, let's compare their effectiveness!

For this not to be a matter of faith, you need some proof.

Removing features from the model changes their interaction, so you can shuffle them around for as long as you want.
 
As we know, any NN, forest, or boosting model can reproduce internally any function such as an MA or other digital filters. So it would seem pointless to feed it those same MAs if you simply feed it 50-100 bars.
In deep neural networks, maybe.
But in regularized boosting and random forests, no.
For example, in CatBoost the recommended tree depth is 6, i.e. 2^6 = 64 splits. If we need MA(30), then on average each bar gets about 2 splits (in half, and then one of the halves in half again). To reproduce an MA even roughly (with ~10% error), we need at least 10 splits per bar, i.e. a depth of 9-10 is needed.
But such deep splitting will not let the tree generalize.
Thus it turns out that shallow trees can generalize, but they cannot reproduce the required features internally (such as an MA). This means that the MAs, CCIs and everything else we want to test as features should be passed in along with the bars.
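As a side note, the depth argument can be checked numerically. Below is a small illustrative sketch (mine, not from the thread): a single depth-limited regression tree is asked to reproduce the mean of 30 inputs, a stand-in for MA(30) on raw bars, and the normalized error is printed for a few depths. The synthetic data and the error metric are assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60000, 30))   # 30 stand-in "bars"
y = X.mean(axis=1)                             # the MA(30) the tree must reproduce
X_tr, X_te, y_tr, y_te = X[:50000], X[50000:], y[:50000], y[50000:]

for depth in (6, 10, 14):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X_tr, y_tr)
    pred = tree.predict(X_te)
    # RMSE relative to the spread of the target: values near 1.0 mean "no better than a constant"
    nrmse = np.sqrt(np.mean((pred - y_te) ** 2)) / y_te.std()
    print(f"depth={depth}: normalized RMSE = {nrmse:.2f}")

The printed numbers give a feel for how much of the averaging a tree of each depth can actually express.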

I was not the only one who believed that bars alone are sufficient for tree-based systems. If there are still supporters of that view, I invite them to present their arguments.
 
Maxim Dmitrievsky:

For this not to be a matter of faith, you need some proof.

Removing features from the model changes their interaction, so you can shuffle them around for as long as you want.

Let's define what needs to be proved.

Why? There can be a benefit from removing a predictor. In my view, a predictor can formally pass selection as good for the root (frequent) split simply because its indicators for that split are good; often its correspondence with other predictors improves the result - the greedy principle. But this principle works with the dataset as a single whole; there are no checks of spatial characteristics (the frequency of an event across the whole sample together with its outcome), for example a situation where event outcomes accumulate on 1/5 of the sample only because that is where they happened to occur. Or there is a similar situation, but with a different cause: the model even turns out to be stable, yet the predictor correlates well with the target precisely at the moments when the financial results of the trade outcomes are mostly too small on the plus side or too large on the minus side, and this is a very subtle point that the model cannot take into account during training.

That is why the goal is not only to improve the classification model itself over different time intervals, but also to improve it in terms of financial outcome.

 
Aleksey Vyazmikin:

Let's define what needs to be proved.

Why? There can be a benefit from removing a predictor. In my view, a predictor can formally pass selection as good for the root (frequent) split simply because its indicators for that split are good; often its correspondence with other predictors improves the result - the greedy principle. But this principle works with the dataset as a single whole; there are no checks of spatial characteristics (the frequency of an event across the whole sample together with its outcome), for example a situation where event outcomes accumulate on 1/5 of the sample only because that is where they happened to occur. Or there is a similar situation, but with a different cause: the model even turns out to be stable, yet the predictor correlates well with the target precisely at the moments when the financial results of the trade outcomes are mostly too small on the plus side or too large on the minus side, and this is a very subtle point that the model cannot take into account during training.

That is why the goal is not only to improve the classification model itself over different time intervals, but also to improve it in terms of financial outcome.

I'm not ready to code and then maintain incomprehensible ideas with an incomprehensible outcome.

 
Maxim Dmitrievsky:

I'm not ready to code and then maintain incomprehensible ideas with an incomprehensible outcome.

Then just say that only your ideas are correct and worthy of discussion.

 
Aleksey Vyazmikin:

Then just say that only your ideas are correct and worthy of discussion.

Rather, reasonable ones. And I didn't understand anything from the description.

I already wrote about the pointlessness of shuffling features around; I did that a couple of years ago.

 
Maxim Dmitrievsky:

Rather, reasonable ones. And I didn't understand anything from the description.

I already wrote about the pointlessness of shuffling features around; I did that a couple of years ago.

If something in the description is not clear, ask questions about what exactly is unclear - I will try to explain it better.

I did the same thing a couple of years ago and abandoned it because of the labor involved, not because it was pointless.

Below is a table with the results of that old experiment; the procedure goes like this (a rough Python sketch of the first steps is given after the list):

1. The set of predictors is cut into 9 chunks.

2. Combinations of the chunks are formed - 512 in total (2^9).

3. Then it is evaluated how the results on the sample behave, on average, with each chunk present/absent.

4. An assumption is made about the significance of each chunk (positive/negative).

5. Significant chunks are split into smaller chunks, while less significant ones are combined into a single chunk (they do not necessarily have to go in order).

6. A new set of 512 combinations is formed.

7. If a small chunk is found that negatively affects the result, it is excluded from further enumeration until the result stops improving; then the rejected chunks can be added back in and the result analyzed in the same way. Chunks with a positive influence, on the other hand, are merged into one group.
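A rough Python sketch of steps 1-4, under assumptions: numpy feature matrices, a simple hold-out accuracy score, and deliberately small CatBoost settings so that the 512 fits stay affordable; none of these details come from the original experiment.

import itertools
import numpy as np
from catboost import CatBoostClassifier

def score_subset(X_tr, y_tr, X_va, y_va, cols):
    # Train a small model on the selected columns and score it on the hold-out set.
    model = CatBoostClassifier(iterations=100, depth=6, verbose=False)
    model.fit(X_tr[:, cols], y_tr)
    return model.score(X_va[:, cols], y_va)

def chunk_effects(X_tr, y_tr, X_va, y_va, n_chunks=9):
    chunks = np.array_split(np.arange(X_tr.shape[1]), n_chunks)   # step 1
    results = []
    for mask in itertools.product([0, 1], repeat=n_chunks):       # step 2: 2^9 = 512 combinations
        if not any(mask):
            continue                                              # skip the empty subset
        cols = np.concatenate([c for c, m in zip(chunks, mask) if m])
        results.append((mask, score_subset(X_tr, y_tr, X_va, y_va, cols)))
    # steps 3-4: average score with vs. without each chunk as a crude significance estimate
    effects = []
    for i in range(n_chunks):
        with_i = [s for m, s in results if m[i]]
        without_i = [s for m, s in results if not m[i]]
        effects.append(np.mean(with_i) - np.mean(without_i))
    return effects

The effects list then feeds step 4: chunks with a clearly negative average effect become candidates for exclusion or merging in steps 5-7.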

Here is an example of how the metrics change over 32 such iterations.



The method can of course be improved, but that requires experiments and an analysis of their results.

Yes, the improvement is not many-fold, but the results also let you think about which predictors affect the outcome for better or for worse, and why.

And I want to try working specifically with CatBoost's statistics and with removing/adding predictors (and groups of them) precisely because it might be faster than the brute-force enumeration I used before.

Another plus is that too many predictors lead to rare splits, and leaf activations can be very rare on the sample outside of training (I showed this in a screenshot earlier), which obviously reduces the quality of training and of its evaluation.
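The rarity of leaf activations can be measured directly. A sketch under assumptions: a single sklearn decision tree stands in for one boosted tree (CatBoost stores its leaves differently), the data is synthetic with many noise predictors, and the cut-off of five activations is arbitrary.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_train = rng.normal(size=(5000, 200))   # many mostly-noise predictors
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=5000) > 0).astype(int)
X_test = rng.normal(size=(5000, 200))    # "outside of training" sample

tree = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)
train_leaves = tree.apply(X_train)       # leaf index for every training sample
test_leaves = tree.apply(X_test)         # leaf index for every out-of-sample point

n_nodes = tree.tree_.node_count
train_counts = np.bincount(train_leaves, minlength=n_nodes)
test_counts = np.bincount(test_leaves, minlength=n_nodes)
used = train_counts > 0                  # leaves that actually exist in the model
rare = (test_counts[used] < 5).mean()
print(f"share of leaves activated fewer than 5 times out of sample: {rare:.1%}")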

 
Aleksey Vyazmikin:

If something in the description is not clear, ask questions about what exactly is unclear - I will try to explain it better.

I did the same thing a couple of years ago and abandoned it because of the labor involved, not because it was pointless.

Pointless waste of time.

Reason: