Machine learning in trading: theory, models, practice and algo-trading - page 1276
All beavers build the same dams without knowing it, yet each one stubbornly believes he has invented something new.
The beaver is a hard-working, decent creature; the Horseman's Apprentice, on the other hand, is a nasty one: chase him out with a broom, or better yet, simply ignore him.
"Analyst in a jar" :))) dishwasher get out
The point is that even if you take 50% of all predictors, there is still a deterministic selection among those 50% for the first root split (or does Alglib not work that way?). CatBoost has not only random selection of predictors but also random splits (random weights are added to the calculations) on the first trees.
I get different results, and my goal is not to evaluate the whole model but to obtain leaves that are highly likely to describe most of the sample. Such leaves are then checked on history by years and assembled into a composition, which may not describe the whole market; but in my view it is better to have accurate answers about what you know than to guess with 50% probability most of the time.
In principle, random partitioning is not difficult to add.
I haven't seen single trees with good test results (45-50%), but a forest of them is more interesting)
The xgboost and lightGBM packages have built-in methods for estimating feature importance in tree models:
Gain. This measure shows the relative contribution of each feature to the model. To compute it, we walk every tree and, at every node, note which feature produced the split and by how much the model's uncertainty was reduced according to the metric (Gini impurity, information gain). Each feature's contribution is summed over all trees.
Cover. Shows the number of observations for each feature. For example, you have 4 features and 3 trees. Suppose feature 1 covers 10, 5 and 2 observations in trees 1, 2 and 3 respectively. Then its importance is 17 (10 + 5 + 2).
Frequency. Shows how often a given feature occurs in nodes, i.e. the total number of splits on each feature is counted across all trees.
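A minimal pure-Python sketch of how these three measures could be accumulated over a forest. The toy tree structure, node fields, and feature names here are invented for illustration; real xgboost/lightGBM models expose these totals through their own APIs.

```python
from collections import defaultdict

# Toy forest: each node records the splitting feature, the impurity
# reduction achieved by the split ("gain"), and the number of
# observations reaching the node ("cover").
trees = [
    [  # tree 1
        {"feature": "f1", "gain": 0.30, "cover": 10},
        {"feature": "f2", "gain": 0.10, "cover": 6},
    ],
    [  # tree 2
        {"feature": "f1", "gain": 0.20, "cover": 5},
    ],
    [  # tree 3
        {"feature": "f1", "gain": 0.05, "cover": 2},
        {"feature": "f3", "gain": 0.15, "cover": 4},
    ],
]

gain = defaultdict(float)   # total impurity reduction per feature
cover = defaultdict(int)    # total observations per feature
freq = defaultdict(int)     # number of splits on each feature

for tree in trees:
    for node in tree:
        f = node["feature"]
        gain[f] += node["gain"]
        cover[f] += node["cover"]
        freq[f] += 1

print(dict(gain))
print(dict(cover))  # f1: 10 + 5 + 2 = 17, matching the example above
print(dict(freq))
```

Feature f1 dominates all three measures here, but the three rankings need not agree in general: a feature split on often near the leaves can have high frequency yet low gain.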
I have a forest trained on 5 bars that gives a better test result than one trained on 100. But when trained on 100, the first 5 bars are not marked as important; instead some distant ones are.
When trained on 100, the error of the individual trees and of the forest is lower, obviously at the expense of overfitting and of assigning importance to bars 30-100. But it is clear that those are not really important, both by common sense and by the fact that the forest trained on 5 bars gives the better result.
There is an R script with a genetic algorithm that builds a tree, selecting generations by entropy improvement. Then there is some kind of final selection. I take all trees from the final selection and pull leaves from them for separate further measurement in MT5. The script was never posted publicly, so there is no detailed description of it. Apparently it works like selecting the best tree from a forest, but with a depth limit to avoid overfitting. The whole process takes about 2 days on all cores on the last sample, where not all bars are used but only entry signals; if all bars over 3 years were used, the calculation would take about 1.5 months. After the calculation I split the tree: I remove the column with the root predictor of the best tree in the population and start over. It turned out that even on the 40th such iteration this procedure sometimes produces very good leaves. So I concluded that the mathematically best tree layout is not always the most effective one, and that one piece of information can prevent another from showing itself, something later used in CatBoost itself, where predictors for each tree are chosen at random from the full set.
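The "remove the root predictor and start over" loop described above can be sketched as follows. This is only an illustration of the idea on toy data: the stump-fitting helper, the data, and the median threshold are all invented here, not the actual R script's logic.

```python
import random

random.seed(0)

# Toy data: y depends strongly on feature 0, weakly on feature 1;
# feature 2 is pure noise.
n = 200
X = [[random.random() for _ in range(3)] for _ in range(n)]
y = [1 if row[0] + 0.3 * row[1] > 0.65 else 0 for row in X]

def stump_score(col, X, y):
    """Accuracy of the best one-split 'tree' on a single feature
    (threshold fixed at the median for simplicity)."""
    vals = sorted(row[col] for row in X)
    thr = vals[len(vals) // 2]
    left = [yi for row, yi in zip(X, y) if row[col] <= thr]
    right = [yi for row, yi in zip(X, y) if row[col] > thr]
    def majority(part):
        return max(sum(part), len(part) - sum(part)) if part else 0
    return (majority(left) + majority(right)) / len(y)

# Find the best root feature, exclude it, and repeat on the remaining
# features, mimicking "remove the root predictor and start over".
remaining = list(range(3))
order = []
while remaining:
    best = max(remaining, key=lambda c: stump_score(c, X, y))
    order.append(best)
    remaining.remove(best)

print(order)  # feature 0 should be picked first
```

Each pass forces the search to build around a different root, so weaker predictors get a chance to form leaves that the dominant predictor would otherwise mask.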
But the tree is nothing magical; it is the one offered by rpart. I think it is the standard one there.
First train the model on all features and save the error.
Then, one by one, randomize each predictor (say, with a normal distribution) and measure the error again on the whole feature set, including the randomized (changed) one, comparing it with the original error. There is no need to retrain the model. Check every predictor this way. If a predictor was good, the error on the whole sample (with all the other predictors left intact) will increase sharply compared to the original. Save the error differences and sift out the best features based on them. At the end, train on only the best ones and put the model into production. Bad predictors are just noise for the model; what use is their 1%. Usually 5-10 good predictors remain, and the importance of the rest falls off exponentially (Zipf's law).
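The procedure above (permutation importance) can be sketched in a few lines. Here the "trained model" is just a fixed function and the data are synthetic, since the point is the permutation loop itself; in practice `model` would be your fitted forest or boosting model and `mse` its validation error.

```python
import random

random.seed(1)

# Synthetic data: the target depends on features 0 and 1 only.
n = 500
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [2 * row[0] - row[1] + random.gauss(0, 0.1) for row in X]

def model(row):
    # Stands in for an already-trained model; it is NOT retrained below.
    return 2 * row[0] - row[1]

def mse(X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

baseline = mse(X, y)

importance = {}
for col in range(4):
    shuffled = [row[col] for row in X]
    random.shuffle(shuffled)  # break the link between this feature and y
    Xp = [row[:col] + [s] + row[col + 1:] for row, s in zip(X, shuffled)]
    importance[col] = mse(Xp, y) - baseline  # error increase, no retraining

print(importance)  # features 0 and 1 matter; 2 and 3 score ~0
```

Shuffling a useful feature destroys its information and the error jumps; shuffling a noise feature leaves the error essentially unchanged, which is exactly the sifting criterion described above.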
I tried training filters, but not much; I don't see much point in it, it's better to put everything into one model at once.
If you can, read the material on predictor selection, it is VERY competent (I posted it earlier).
It's an interesting variant. I have to try it.
Although I'm afraid that if I apply it to the 100-bar model and try to remove 95 bars and keep the first 5, the result will be 50%. After all, those first 5 barely participated in the splits (on average only 5% of the nodes are built on them).
Found your post about permutation.
I don't know what you are doing with 100 bars; apply the method properly and everything will probably be fine.
I want to automate the process of sifting out unimportant predictors)
I understood this method differently: for the predictor under study you should not feed random values from a normal distribution, but simply shuffle the rows in that column.
In general, the results from the article are impressive. I need to try it in practice.
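The distinction matters: shuffling a column preserves its marginal distribution exactly, while substituting normal noise does not. A small illustration on an invented skewed feature column:

```python
import random

random.seed(2)

# A skewed, strictly positive feature column (e.g. trade volumes).
col = [random.expovariate(1.0) for _ in range(1000)]

# Permutation: same values, new order; the distribution is untouched.
shuffled = col[:]
random.shuffle(shuffled)

# Normal-noise substitution: a completely different distribution.
noise = [random.gauss(0, 1) for _ in range(1000)]

print(sorted(col) == sorted(shuffled))  # True: the multiset is unchanged
print(min(col) >= 0 > min(noise))       # True: noise goes negative, the column cannot
```

With noise substitution the model sees values it never encountered in training (negative "volumes" here), so the error increase mixes two effects: losing the feature's information and going out of distribution. Shuffling isolates the first effect, which is what the importance estimate should measure.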