Machine learning in trading: theory, models, practice and algo-trading - page 2387

 
Aleksey Vyazmikin:

If something in the description is unclear, ask questions about it and I'll try to explain it better.

I did the same thing a couple of years ago and gave up because of the labor involved, not because it was pointless.

Below is a table of results from the old experiment; the procedure goes like this (a rough code sketch follows the list):

1. The set of predictors is split into 9 chunks.

2. All combinations of chunk presence/absence are generated - 512 of them (2^9).

3. The results are evaluated to see how the models behave on average with each chunk present versus absent.

4. A conclusion is drawn about each chunk's significance (positive/negative influence).

5. Significant chunks are split into smaller chunks, and less significant ones are merged into a single chunk (the predictors do not have to be contiguous).

6. A new set of 512 combinations is formed.

7. If a small chunk is found to affect the result negatively, it is excluded from further enumeration until the result stops improving; the rejected chunks can then be added back and the result analyzed in the same way. Chunks with positive influence, on the other hand, are consolidated into one group.
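Here is a rough reconstruction in code of steps 1-4, purely as an illustration of the idea (my sketch, not the author's code). evaluate() is a placeholder for training a model on the given columns and returning its quality metric; all names and counts are assumptions.

```python
# Rough reconstruction of steps 1-4: split predictors into 9 chunks, enumerate
# all 512 presence/absence combinations, and compare average scores with and
# without each chunk. evaluate() is a placeholder that trains a model on the
# given columns and returns its quality metric.
from itertools import product
import numpy as np

def chunk_influence(columns, evaluate, n_chunks=9):
    chunks = np.array_split(np.array(columns), n_chunks)
    masks = np.array(list(product([0, 1], repeat=n_chunks)))  # 2**9 = 512 rows
    scores = np.full(len(masks), np.nan)
    for i, mask in enumerate(masks):
        cols = [c for on, chunk in zip(mask, chunks) if on for c in chunk]
        if cols:                        # skip the row where every chunk is absent
            scores[i] = evaluate(cols)
    # Step 3/4: mean score with the chunk present minus mean score with it absent.
    return [
        np.nanmean(scores[masks[:, j] == 1]) - np.nanmean(scores[masks[:, j] == 0])
        for j in range(n_chunks)
    ]
```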

Here is an example of how the metrics changed over 32 such iterations.



The method can certainly be improved, but that requires further experiments and analysis of their results.

Yes, the improvement is not severalfold, but the results also let you think about which predictors affect the outcome for better or worse, and why.

And I want to try working specifically with CatBoost statistics, removing/adding predictors (and groups of them), precisely because it might be faster than the brute force I used before.

Another plus is that too many predictors lead to rare splits, and leaf activations can be very rare on the sample outside of training (I showed that in a screenshot earlier), which clearly reduces the quality of training and of its evaluation.
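For illustration only (this is my sketch, not the exact workflow described above): pruning predictors by CatBoost's built-in feature importance. The model settings, the 0.5 importance threshold and the variable names are assumptions.

```python
# Minimal sketch of pruning predictors by CatBoost's built-in feature importance;
# the model settings and the 0.5 importance threshold are arbitrary choices.
import pandas as pd
from catboost import CatBoostClassifier

def prune_by_importance(X: pd.DataFrame, y, threshold: float = 0.5):
    model = CatBoostClassifier(iterations=500, depth=6, verbose=False)
    model.fit(X, y)
    importance = pd.Series(model.get_feature_importance(), index=X.columns)
    kept = importance[importance >= threshold].index.tolist()
    return kept, importance.sort_values(ascending=False)

# Hypothetical usage: retrain on the kept columns and compare metrics
# on a sample outside of training.
# kept_cols, imp = prune_by_importance(X_train, y_train)
```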

Seems overly complicated.
How is it better than simply testing all the features by adding them one at a time?
First you train 1000 times (with 1000 features to test) on 1 feature each and find the best one. Then 999 times on the best feature plus one of the remaining 999, and pick the second best. Then on the top 2 plus one of the 998 remaining, and so on.
In total, just 2 nested loops.
Models with a small number of features train very quickly. You can get the 20-30 best ones in a reasonable amount of time. And after 10-20 selected features, models usually stop improving; adding more features after that only worsens the results.
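As a sketch of the greedy forward selection described above (not elibrarius's actual code), assuming pandas DataFrames, a CatBoost model and a hold-out validation set for scoring; the iteration count, tree depth and accuracy metric are arbitrary assumptions.

```python
# Sketch of greedy forward feature selection: outer loop grows the selected set,
# inner loop tries every remaining candidate, stop when the score no longer improves.
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

def forward_select(X_train, y_train, X_val, y_val, max_features=20):
    selected, remaining = [], list(X_train.columns)
    best_overall = -float("inf")
    while remaining and len(selected) < max_features:
        best_feat, best_score = None, -float("inf")
        for feat in remaining:                      # inner loop over candidates
            cols = selected + [feat]
            model = CatBoostClassifier(iterations=100, depth=4, verbose=False)
            model.fit(X_train[cols], y_train)
            score = accuracy_score(y_val, model.predict(X_val[cols]))
            if score > best_score:
                best_feat, best_score = feat, score
        if best_score <= best_overall:              # adding features no longer helps
            break
        selected.append(best_feat)
        remaining.remove(best_feat)
        best_overall = best_score
    return selected
```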
 
Maxim Dmitrievsky:

Pointlessly killing time.

It is clear that there will be no constructive discussion - there is no desire to understand the essence.

 
Aleksey Vyazmikin:

Clearly, there will be no constructive discussion - there is no desire to understand the point.

There is no desire to waste time on nonsense; the point is clear (wasting time on nonsense).

 
elibrarius:
Seems overly complicated.
How is it better than simply testing all the features by adding them one at a time?
First you train 1000 times (with 1000 features to test) on 1 feature each and find the best one. Then 999 times on the best feature plus one of the remaining 999, and pick the second best. Then on the top 2 plus one of the 998 remaining, and so on.
In total, just 2 nested loops.
Models with a small number of features train very quickly. You can get the 20-30 best ones in a reasonable amount of time. And after 10-20 selected features, models usually stop improving; adding more features after that only worsens the results.

We should not look for the single best feature but for a combination of features, and that is the problem. Why is it a problem? Because it is impossible to try all the combinations, which is why a heuristic method is needed. Another problem is the potentially strong similarity of different predictors after splitting, which in ensembles leads to overestimated probabilities, because there will be many essentially correlated leaves.
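To illustrate the correlation concern (my addition, not part of the original post): a minimal sketch that drops one predictor from each highly correlated pair before any selection; the 0.95 threshold is an arbitrary assumption.

```python
# Drop one predictor from each pair whose absolute correlation exceeds the threshold.
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)
```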

 

The man has decided to reinvent boosting with boosting; let's not stop him.

Appeals to common sense did not help.

 
Maxim Dmitrievsky:

There is no desire to waste time on nonsense; the point is clear (wasting time on nonsense).

Why is it nonsense?

Does it give an improvement? Yes, it does.

Is there a theoretical basis for it? Yes, there is.

Of course, it is not an improvement by an order of magnitude.

And yes, it may be of little use for your predictors; that I can accept as a justification for declining.

 
neuro is on fire ))
forget about trading, turn the neural network into an indicator

 
Aleksey Vyazmikin:

Why is it nonsense?

Does it give an improvement? Yes, it does.

Is there a theoretical basis for it? Yes, there is.

Of course, it is not an improvement by an order of magnitude.

And yes, it may be of little use for your predictors; that I can accept as a justification for declining.

I have already said everything; I will not interfere with the exhaustive search of what cannot be exhaustively searched.

 
Aleksey Vyazmikin:

We should not look for the single best feature but for a combination of features, and that is the problem. Why is it a problem? Because it is impossible to try all the combinations, which is why a heuristic method is needed. Another problem is the potentially strong similarity of different predictors after splitting, which in ensembles leads to overestimated probabilities, because there will be many essentially correlated leaves.

After you choose the first best feature, the second one will be chosen for its best interaction together with the first, and so on. When you get to 10, the next one will be chosen for its best interaction with some of the 10 previously chosen, but most likely with all of them.
 
elibrarius:
After you choose the first best feature, the second one will be chosen for its best interaction together with the first, and so on. When you get to 10, the next one will be chosen for its best interaction with some of the 10 previously chosen, but most likely with all of them.

That is not how it works.

You remove low-importance features from the model and thereby break it, then you end up comparing apples and oranges (a different set of features), and so on in a circle.
