Machine learning in trading: theory, models, practice and algo-trading - page 1276

 
Maxim Dmitrievsky:

all beavers build the same dams, even though they don't know it

but each stubbornly believes that he has invented something new.

The beaver is a hardworking, decent creature. The horse's apprentice, however, is a nasty creature: chase him out with a broom from everywhere, or better yet, just ignore him.

 
Kesha Rutoff:

The beaver is a hardworking, decent creature. The horse's apprentice, however, is a nasty creature: chase him out with a broom from everywhere, or better yet, just ignore him.

"Analyst in a jar" :))) dishwasher get out

 
Aleksey Vyazmikin:

The point is that even if you took 50% of all predictors, what follows is still a deterministic selection among those 50% for the first root split (or is that not how Alglib does it?). CatBoost has not only random selection of predictors but also randomized splits (random weights are added to the split-score calculations) on the first trees.

I get different results, and my goal is not to evaluate the whole model but to obtain leaves that very likely describe a large part of the sample. Such leaves are then checked on history year by year, and a composition is assembled from them. It may not describe the whole market, but I think it is better to have accurate answers about what you do know than to guess with 50% probability most of the time.
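For reference, a minimal sketch of the two CatBoost knobs being referred to, on synthetic data (the parameter values here are illustrative assumptions, not recommendations): `rsm` controls random predictor sampling and `random_strength` scales the random weight added to split scoring.

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)

# rsm: fraction of predictors randomly sampled when choosing each split
# random_strength: amount of random noise added to the split score
model = CatBoostClassifier(iterations=200, rsm=0.5, random_strength=2.0,
                           verbose=False)
model.fit(X, y)
```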


In Alglib, all of the remaining 50% of predictors are searched: each is divided into 4 parts by quartiles, and out of all the variants the split with the best error is chosen.
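A minimal sketch of that quartile-based split search, assuming a simplified regression split scored by weighted child variance (the function, data and scoring are illustrative, not Alglib's actual code):

```python
import numpy as np

def best_quartile_split(X, y, feature_subset):
    """For each candidate predictor, try its quartile thresholds and
    keep the split with the lowest weighted variance of the children."""
    best = (None, None, np.inf)  # (feature, threshold, error)
    for j in feature_subset:
        col = X[:, j]
        # Dividing into 4 parts by quartiles = 3 candidate thresholds
        for t in np.quantile(col, [0.25, 0.5, 0.75]):
            left, right = y[col <= t], y[col > t]
            if len(left) == 0 or len(right) == 0:
                continue
            err = (len(left) * left.var() + len(right) * right.var()) / len(y)
            if err < best[2]:
                best = (j, t, err)
    return best

# Example: take a random 50% of predictors, then search them exhaustively
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 3] > 0).astype(float) + rng.normal(scale=0.1, size=500)
subset = rng.choice(10, size=5, replace=False)   # the random 50%
print(best_quartile_split(X, y, subset))
```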

In principle, randomized splitting would not be difficult to bolt on.
I haven't seen single trees with good test results (45-50%), but a forest of them is more interesting)


 
On the importance of predictors, here is what I was looking at:
the xgboost and lightGBM packages have built-in methods for estimating feature importance in tree models:

  1. Gain
    This measure shows the relative contribution of each feature to the model. To calculate it, we walk through each tree, look at which feature each node splits on, and by how much the model's uncertainty is reduced according to the chosen metric (Gini impurity, information gain).
    Each feature's contribution is summed over all trees.
  2. Cover
    Shows the number of observations covered by each feature, i.e. how many training rows pass through the nodes that split on it. For example, you have 4 features and 3 trees. Suppose feature 1 covers 10, 5 and 2 observations in the nodes of trees 1, 2 and 3 respectively. Then the importance of this feature is 17 (10 + 5 + 2).
  3. Frequency
    Shows how often a given feature occurs in tree nodes, i.e. the total number of splits on each feature across all trees is counted.
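For reference, a minimal sketch (synthetic data, names illustrative) of querying these three measures from an xgboost model in Python. In xgboost's API, Frequency is called "weight", and get_score returns per-split averages for "gain" and "cover", with "total_gain"/"total_cover" as the summed variants:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

booster = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y).get_booster()

# 'gain', 'cover' and 'weight' correspond to Gain, Cover and Frequency above
for imp_type in ("gain", "cover", "weight"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```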
In my experience they don't apportion the importance correctly.
A forest trained on 5 bars gives me a better test result than one trained on 100. But when trained on 100, the first 5 bars are not marked as important; some distant ones are instead.
When trained on 100 bars, the error of individual trees and of the forest is lower, obviously at the cost of overfitting and of granting importance to bars 30-100. But it is clear that those are not really important, both by ordinary logic and by the fact that the forest on 5 bars gives the best results.
 
By the way, I do not understand the difference between Cover and Frequency. More precisely, what are the "observations" of a feature in Cover? (The splits counted per feature in Frequency I do understand.) Trees are split on features; features are not "observed".
 
Aleksey Vyazmikin:

There's an R script with a genetic algorithm that builds a tree, selecting generations by entropy improvement, followed by some kind of final selection. I take all the trees from the final selection and pull leaves out of them for separate further evaluation in MT5. The script has not been posted publicly, so there are no detailed descriptions. Apparently it is like selecting the best tree from a forest, but with a depth limit to avoid overtraining. The process takes about 2 days on all cores on the latest sample, which contains not all bars but only entry signals; on all bars over 3 years the calculation would take 1.5 months. After the calculation I split the tree further: I remove the column with the root predictor of the best tree of the population and start all over again. It turned out that even at the 40th such iteration this procedure sometimes produces very good leaves, so I came to the conclusion that the mathematically best tree layout is not always the most effective, and one piece of information can obscure another. The same idea later appeared in CatBoost itself, where predictors are chosen randomly from the whole set to build each tree.

After briefly reviewing the code, I saw genetic selection of features for building a tree from the rpart package. That is, each tree was offered its own set of features to learn from. Thanks to the genetics, this feature-set search is faster than exhaustive brute force.
But the tree itself is no magic tree, just the one rpart offers. I think it is the standard one there.
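The approach as described can be sketched in a few lines: a genetic search over feature subsets, scoring each subset by the quality of a single shallow tree. Here sklearn's DecisionTreeClassifier stands in for rpart, and the data, fitness function and GA settings are all illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 2] - X[:, 7] > 0).astype(int)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    # Score a feature subset by cross-validated accuracy of a shallow tree
    if not mask.any():
        return 0.0
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    return cross_val_score(tree, X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_feat)) < 0.5              # random feature subsets
for _ in range(n_gen):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep the best half
    children = parents.copy()
    children ^= rng.random(children.shape) < 0.05       # mutate: flip features in/out
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```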
 
Maxim Dmitrievsky:

first train the model on all features, save the errors

then, one by one, randomize each of the predictors, say with a normal distribution, and check the error again on all the features, including the randomized (altered) one; compare it with the initial error. There is no need to retrain the model. Check each of the predictors this way. If a predictor was good, the error on the entire sample (with all the other original predictors intact) will rise sharply compared to the original. Save the error differences and sift out the best features based on them. Then, at the end, train on the best ones only and put the model into production. Bad predictors are just noise to the model, who needs them with their 1%. Usually 5-10 good ones remain, and the importance of the rest falls off exponentially (Zipf's law).
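A minimal sketch of this procedure as described, using noise substitution (a random forest on synthetic data; all names are illustrative, and a later post in this thread suggests shuffling the column instead):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
base_err = 1.0 - model.score(X, y)       # baseline error, all features intact

importances = []
for j in range(X.shape[1]):
    X_noised = X.copy()
    # Replace one predictor with normal noise; no retraining needed
    X_noised[:, j] = rng.normal(size=len(X))
    err = 1.0 - model.score(X_noised, y)
    importances.append(err - base_err)   # big increase => important feature

for j in np.argsort(importances)[::-1]:
    print(f"feature {j}: error increase {importances[j]:+.3f}")
```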

I tried training filters, but not much; I don't see much sense in it, it's better to put everything into one model at once

On the selection of predictors there is a VERY competent piece (I already posted it earlier)

Found your post about permutation.
It's an interesting variant. I have to try it.
Although I'm afraid that if I apply it to a 100-bar model and try to remove 95 bars and leave the first 5, the result will be 50%. After all, those first 5 barely participated in the splits (on average only 5% of the nodes are built on them).
 
elibrarius:
Found your post about permutation.
It's an interesting variant. I have to try it.
Although I'm afraid that if I apply it to a 100-bar model and try to remove 95 bars and leave the first 5, the result will be 50%. After all, those first 5 barely participated in the splits (on average only 5% of the nodes are built on them).

I don't know what you're doing with 100 bars; apply it properly and everything will probably be fine

 
Maxim Dmitrievsky:

I don't know what you're doing with 100 bars; apply it properly and everything will probably be fine

I want to automate the process of sifting out unimportant predictors)

 
Maxim Dmitrievsky:

first train the model on all features, save the errors

then, one by one, randomize each of the predictors, say with a normal distribution, and check the error again on all the features, including the randomized (altered) one; compare it with the initial error. There is no need to retrain the model. Check each of the predictors this way. If a predictor was good, the error on the entire sample (with all the other original predictors intact) will rise sharply compared to the original. Save the error differences and sift out the best features based on them. Then, at the end, train on the best ones only and put the model into production. Bad predictors are just noise to the model, who needs them with their 1%. Usually 5-10 good ones remain, and the importance of the rest falls off exponentially (Zipf's law).

I tried training filters, but not much; I don't see much sense in it, it's better to put everything into one model at once

On the selection of predictors there is a VERY competent piece (I already posted it earlier)

I understood this method differently.
For the predictor under study, you should not feed in random values from a normal distribution; instead, just shuffle the rows in that column.
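That shuffle variant (classic permutation importance) is a one-line change to the earlier sketch; for completeness, a self-contained hedged version on the same kind of synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
base_err = 1.0 - model.score(X, y)

for j in range(X.shape[1]):
    X_perm = X.copy()
    # Shuffle the column instead of replacing it with noise:
    # values keep their real distribution, only the order is destroyed
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    print(f"feature {j}: error increase {1.0 - model.score(X_perm, y) - base_err:+.3f}")
```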

In general, the results from the article are impressive. I have to try it in practice.