Machine learning in trading: theory, models, practice and algo-trading - page 2413

 
Maxim Dmitrievsky:
I don't see the full picture of why this might work.

Suppose there are random predictors in the sample, in fact just noise, and the goal is to clean that noise out.

Do you think it won't improve the result?

 
Aleksey Vyazmikin:

Suppose there are random predictors in the sample, in fact just noise, and the goal is to clean that noise out.

Do you think it won't improve the result?

It's easier to take any feature-target combination and filter its signals by time until a stable signal is found, and build a bot from such models.
 
Maxim Dmitrievsky:
It's easier to take any feature-target combination and filter its signals by time until a stable signal is found, and build a bot from such models.

Either I don't understand the point, in which case please write in more detail, or I don't see how the suggested actions differ from adding to predictor_1 an additional predictor_2 that contains information about time.

 
Aleksey Vyazmikin:

I have been thinking about how to improve the method of selecting predictors/attributes/features by analyzing the resulting model.

I've sketched out some ideas for implementing the algorithm, but decided to share them with the esteemed community: perhaps, before work on the implementation begins, there will be constructive criticism or additions/clarifications to the algorithm. Reasoned arguments that nothing will work are also interesting.


Selecting predictors by their frequency of use (Feature importance) when creating a CatBoost model

The idea is that each algorithm has its own peculiarities of tree construction, so we will select the predictors that the algorithm, in this case CatBoost, uses most often.

However, to assess uniformity over time, we will use multiple subsamples and combine their data into a single table. This approach allows us to sift out random events that strongly influence the choice of a predictor in any single model. The regularities on which the model is built should occur throughout the whole sample, which should help correct classification on new data. This property suits market data, i.e., data without completeness, including data with hidden cyclicality that is event-driven rather than time-based. At the same time, it is desirable to penalize predictors that do not make the top 30%-50% in one of the time segments, which selects the predictors most consistently in demand when building models on different time segments.

Also, to reduce the randomness factor, models with different Seed values should be used; I think there should be between 25 and 100 such models. Whether to add a coefficient depending on the quality of the resulting model, or simply to average all results across predictors, I do not know yet, but I think we should start simple, i.e., just average.
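
For illustration, a minimal sketch of the selection scheme just described, in Python with the catboost package. The segment count, the seed range, the 30% threshold and the use of CatBoost's default feature importance as a stand-in for "frequency of use" are my assumptions, not a fixed part of the idea.

import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

def importance_table(df, target_col, n_segments=5, seeds=range(25)):
    # One row of feature importances per (time segment, seed) pair.
    features = [c for c in df.columns if c != target_col]
    rows, index = [], []
    for seg_id, idx in enumerate(np.array_split(np.arange(len(df)), n_segments)):
        seg = df.iloc[idx]
        for seed in seeds:  # 25..100 models with different Seed values
            model = CatBoostClassifier(iterations=200, random_seed=seed, verbose=False)
            model.fit(seg[features], seg[target_col])
            rows.append(model.get_feature_importance())
            index.append((seg_id, seed))
    return pd.DataFrame(rows, columns=features,
                        index=pd.MultiIndex.from_tuples(index, names=["segment", "seed"]))

def select_predictors(table, top_share=0.3):
    # Average over seeds within each segment ("just average" to start),
    # then keep only the predictors that reach the top share in EVERY
    # segment -- the penalty for dropping out of the top 30%-50%.
    per_segment = table.groupby(level="segment").mean()
    in_top = per_segment.ge(per_segment.quantile(1 - top_share, axis=1), axis=0)
    keep = in_top.all()
    return keep[keep].index.tolist()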

The question of the quantization table is important; it may play a decisive role in the selection of predictors. If the table is not fixed, each model will create its own table for its subsample, which will make the results incomparable, so the table should be common to all samples.

A quantization table can be obtained in three ways:

  1. By setting CatBoost hyperparameters for the type and number of partitions into quanta on the entire training sample, and saving the result to csv.
  2. By setting the same CatBoost hyperparameters, but on one selected area of the sample, say the best one, and saving the result to csv.
  3. By obtaining the table with a separate script that selects the best variants from a set of tables.
A table obtained in advance is then applied to each sample through forced table loading during training (a sketch follows below).
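
A hedged sketch of option 1: quantize the full sample once, save the borders, and force the same borders onto every subsample. It relies on Pool.quantize, Pool.save_quantization_borders and the input_borders argument of the CatBoost Python API as I understand them (check against your CatBoost version); the data, file name and parameters are placeholders.

import numpy as np
from catboost import Pool, CatBoostClassifier

rng = np.random.default_rng(0)
X_all = rng.normal(size=(1000, 10))      # placeholder features
y_all = rng.integers(0, 2, size=1000)    # placeholder binary target

# Build the shared quantization table on the whole training sample once.
full = Pool(X_all, label=y_all)
full.quantize(border_count=254, feature_border_type="Median")
full.save_quantization_borders("borders.tsv")

# Every subsample (time segment) is then quantized with the same borders,
# so the models built on them remain comparable.
for idx in np.array_split(np.arange(len(X_all)), 5):
    pool = Pool(X_all[idx], label=y_all[idx])
    pool.quantize(input_borders="borders.tsv")  # forced table loading
    model = CatBoostClassifier(iterations=100, verbose=False)
    model.fit(pool)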

You can quantize the data yourself before feeding it into boosting; that way everything is under your control.
From 0 to 0.00005 = 0.00005; from 0.00005 to 0.00010 = 0.00010, and so on.
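
A minimal sketch of this manual pre-quantization in Python (the 0.00005 step is just the example above; numpy assumed):

import numpy as np

def quantize_up(values, step=0.00005):
    # Snap each value up to the next multiple of the step, as in the
    # example: (0, 0.00005] -> 0.00005, (0.00005, 0.00010] -> 0.00010, ...
    return np.ceil(np.asarray(values) / step) * step

print(quantize_up([0.00003, 0.00007, 0.00012]))  # 0.00005, 0.0001, 0.00015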

 
elibrarius:

You can quantize the data yourself before feeding it into boosting; that way everything is under your control.
From 0 to 0.00005 = 0.00005; from 0.00005 to 0.00010 = 0.00010, and so on.

The third way of obtaining the quantization table also includes evaluating custom quantization tables that I pre-generate. Experiments show this is not always the best option. By the way, since we are talking about numerical sequences, what other step laws can be used besides linear, Fibonacci, and exponential?
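
Some candidate border grids for such tables, as a hedged sketch: the three step laws mentioned, plus a quantile grid as my own addition (it is effectively what equal-frequency border selection does). Ranges and counts are illustrative.

import numpy as np

def linear_grid(lo, hi, n):
    return np.linspace(lo, hi, n)                # constant step

def exponential_grid(lo, hi, n):
    # Multiplicative step; needs positive bounds.
    return np.geomspace(max(lo, 1e-12), hi, n)

def fibonacci_grid(lo, hi, n):
    # Successive gaps grow like Fibonacci numbers (n >= 2).
    steps = [1.0, 1.0]
    while len(steps) < n - 1:
        steps.append(steps[-1] + steps[-2])
    pos = np.concatenate([[0.0], np.cumsum(steps[:n - 1])])
    return lo + (hi - lo) * pos / pos[-1]

def quantile_grid(values, n):
    # Data-driven alternative: equal-frequency borders.
    return np.quantile(values, np.linspace(0.0, 1.0, n))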

 
Aleksey Vyazmikin:

The third way of obtaining the quantization table also includes evaluating custom quantization tables that I pre-generate. Experiments show this is not always the best option. By the way, since we are talking about numerical sequences, what other step laws can be used besides linear, Fibonacci, and exponential?

Three isn't enough for you? The number of experiments has already tripled)) What else?
 
Aleksey Vyazmikin:

Either I don't understand the point, in which case please write in more detail, or I don't see how the suggested actions differ from adding to predictor_1 an additional predictor_2 that contains information about time.

There are reasons not to write the details yet, but they will appear someday. Parts of the scheme have already been described here. I see this as the only sensible option, one not tied to the chaff of feature selection. Ideally the features can be anything, and so can the labels. The task of the algorithm is to calibrate to them while taking the temporal component into account (filtering out the places where these features don't work). Proof of concept: Prado's meta-labeling, with some tuning of that approach. You are on a completely different track, so understanding may not come.
 
elibrarius:
Three isn't enough for you? The number of experiments has already tripled)) What else?

Of course it's not enough :) In fact, I'm selecting the optimal table for each predictor, and the more out-of-sample data, the better. The latest version of the script even selects the best intervals from all the tables and combines them into one table per predictor.

 
Maxim Dmitrievsky:
There are reasons not to write the details yet, but they will appear someday. Parts of the scheme have already been described here. I see this as the only sensible option, one not tied to the chaff of feature selection. Ideally the features can be anything, and so can the labels. The task of the algorithm is to calibrate to them while taking the temporal component into account (filtering out the places where these features don't work). Proof of concept: Prado's meta-labeling, with some tuning of that approach. You are on a completely different track, so understanding may not come.

Yes, nothing is really clear about the calibration of the algorithm. Even if you filtered out the bad places during training, it is not clear how to recognize those places in application.
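
Since Prado's meta-labeling came up above, a hedged sketch of how "recognizing those places in application" can work: a second model is trained on whether the primary model was right, and in application its probability does the recognizing. The models, placeholder data and the 0.5 threshold are all illustrative assumptions.

import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 8))
y = rng.integers(0, 2, size=2000)    # direction labels (placeholder data)
X_tr, y_tr, X_app = X[:1000], y[:1000], X[1000:]

primary = CatBoostClassifier(iterations=100, verbose=False).fit(X_tr, y_tr)

# Meta-labels: 1 where the primary model was right, 0 where it was wrong.
# (In practice these should come from out-of-fold predictions of the
# primary model, not the in-sample fit used here for brevity.)
meta_y = (primary.predict(X_tr) == y_tr).astype(int)
meta = CatBoostClassifier(iterations=100, verbose=False).fit(X_tr, meta_y)

# In application the meta model, not a lookup of training "places",
# decides which signals to keep.
signal = primary.predict(X_app)
keep = meta.predict_proba(X_app)[:, 1] > 0.5
filtered = np.where(keep, signal, 0)  # 0 = stay out of the market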

 

A fan of the movie The Matrix?

What does the Matrix have to do with it?
I read clever people: an hour of such reading gives more information than 10 years of reading all sorts of little blogs by misfits, foreign
and otherwise...