Machine learning in trading: theory, models, practice and algo-trading - page 2800
Over 30% of the examples are the first class. And, yes, maybe I don't see the problem. It's enough to find one rule/leaf that predicts "1" more often than "0", even if it fires rarely.
Besides, no one is stopping you from changing the dataset by balancing the classes. You were complaining about CatBoost, and CatBoost is not a single tree/rule/leaf.
The complaint is not about the algorithm itself (it is what it is), but about the fact that it works better when fed already pre-processed data.
Earlier you understood it somehow...
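The "find one rule/leaf" idea above can be sketched directly. A minimal illustration on synthetic data (the dataset, the depth/leaf-size settings and the 0.15 margin are my own assumptions, not from the thread): fit a shallow tree and scan its leaves for one whose local share of "1" clearly beats the base rate, however rarely it fires.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
# hypothetical target: ~30% positives, weakly linked to feature 0
y = ((X[:, 0] + rng.normal(scale=2.0, size=5000)) > 1.0).astype(int)

tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50).fit(X, y)
leaf_id = tree.apply(X)                  # leaf index for every sample
base_rate = y.mean()
for leaf in np.unique(leaf_id):
    mask = leaf_id == leaf
    precision = y[mask].mean()           # share of "1" inside this leaf
    if precision > base_rate + 0.15:     # leaf clearly beats the base rate
        print(f"leaf {leaf}: {mask.sum()} samples, P(1)={precision:.2f}")
```

Each printed leaf is exactly the kind of "rule" discussed above: it covers few samples, but inside its region "1" is much more likely than in the dataset overall.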
mytarmailS, 2016.10.29 11:22 pm.
A hypothetical situation...
We have 100 potential predictors; for simplicity, let them be indicators.
Imagine we know in advance that among all these predictors there is only one profitable situation: RSI has crossed 90 and stochastic has just dropped below zero (a made-up situation, of course). It gives a price drop with 90% probability. All the other predictors are pure noise, all the other situations in RSI and stochastic are pure noise too, and there are hundreds and hundreds of different situations...
So we have roughly 0.01% useful signal against 99.99% noise.
Suppose that by some miracle your ML weeds out the other 98 predictors and leaves only two: RSI and stochastic.
RSI alone contains hundreds of situations (RSI>0, RSI>13, RSI<85, RSI=0, RSI<145, ... and so on), and stochastic has no fewer, but the working situation is only one. Since you train the ML to recognise all price movements, it will build models that account for every possible situation in RSI and stochastic, even though the probability that those situations work is almost zero. The ML is obliged to take them into account and build something on them even though they are pure noise, so the one working situation simply gets lost among hundreds of other solutions. That is overfitting...
Well, do you finally get it???
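The hypothetical above is easy to reproduce on toy data (the sizes, thresholds and probabilities below are my own stand-ins for the described setup): one rare informative situation among 100 noise predictors, and an unconstrained tree that memorises the noise while the single working rule drowns in it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 4000
X = rng.uniform(0, 100, size=(n, 100))       # 100 "indicator" predictors
signal = (X[:, 0] > 90) & (X[:, 1] < 10)     # the one working situation (~1% of rows)
y = np.where(signal,
             rng.random(n) < 0.9,              # 90% "drop" when the rule fires
             rng.random(n) < 0.5).astype(int)  # pure coin flip everywhere else

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
deep = DecisionTreeClassifier().fit(Xtr, ytr)  # no depth limit: memorises noise
print("train acc:", deep.score(Xtr, ytr))      # near 1.0 on the training set
print("test  acc:", deep.score(Xte, yte))      # near the 0.5 coin-flip rate
```

The gap between the two scores is the overfitting described above: the model "explains" every noise situation on the training set, and the one real rule contributes almost nothing out of sample.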
Explain what model representation and target proportions have to do with it. I am saying that the model can be represented as an upgraded leaf, that is, a rule.
Only neural networks need balancing. Tree-based models do not require it.
That holds for good data; in any case, the counters inside the algorithm look at the number of examples assigned to each target and make their decisions from that...
The peculiarity here is that the CatBoost model prefers to assign all examples a probability below 0.5, so it never classifies the target as "1", and what falls between 0 and 0.5 is not distributed very well either.
If, out of 100 examples of the target, we have 5 labels "A" and 95 labels "B",
then the model cannot give label "A" a probability greater than 0.5.
An individual rule can, but the post is about CatBoost, and that is a model (a sum of rule predictions), not a single rule; the sum will not reach such a high probability.
Even if one rule is sure the label is "A", the summed probability of the "A" rules will be outweighed by the sum of the "B" rules, because there are far more "B" rules.
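A rough way to check this claim, using sklearn's GradientBoostingClassifier as a stand-in for CatBoost (the synthetic data and the 19x minority weight are my own assumptions): with 5 "A" per 95 "B", the summed model rarely pushes P(A) past 0.5, while re-weighting the minority class does.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 2000
y = (rng.random(n) < 0.05).astype(int)          # ~5% label "A" (= 1)
X = rng.normal(size=(n, 5)) + y[:, None] * 0.5  # weak, overlapping signal

plain = GradientBoostingClassifier(random_state=0).fit(X, y)
p_plain = plain.predict_proba(X)[y == 1, 1]     # P(A) on the true "A" examples

w = np.where(y == 1, 19.0, 1.0)                 # 95/5 re-weighting of "A"
weighted = GradientBoostingClassifier(random_state=0).fit(X, y, sample_weight=w)
p_weighted = weighted.predict_proba(X)[y == 1, 1]

print("share of 'A' with P>0.5, plain   :", (p_plain > 0.5).mean())
print("share of 'A' with P>0.5, weighted:", (p_weighted > 0.5).mean())
```

CatBoost itself exposes the same idea through its class-weighting parameters, which is why balancing (or re-weighting) the dataset changes which side of 0.5 the minority class lands on.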
It all depends on the predictors and the number of trees in the model.
I don't insist on CatBoost model for training.
https://stats.stackexchange.com/questions/340854/random-forest-for-imbalanced-data
https://www.mql5.com/ru/blogs/post/723619
77 out of 16,000 is too few; 77 examples are hardly representative.
The only option is to grow the tree very deep.
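The 77-in-16,000 point can be illustrated with a toy tree (the setup is mine; here the 77 labels are pure noise, the worst case): a shallow tree with a sane leaf size never predicts the rare class at all, while an unconstrained tree isolates all 77, but only by memorising them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 16000
X = rng.normal(size=(n, 20))
y = np.zeros(n, dtype=int)
y[rng.choice(n, size=77, replace=False)] = 1   # 77 rare labels, unrelated to X

# with a reasonable minimum leaf size, no leaf is ever majority-"1"
shallow = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50).fit(X, y)
# with no limits, the tree splits until every leaf is pure (memorisation)
deep = DecisionTreeClassifier().fit(X, y)

print("shallow predicts any '1':", shallow.predict(X).sum())
print("deep 'finds' on the train set:", deep.predict(X).sum())
```

So a very deep tree does carve out the 77 examples, but on noise that depth is exactly the memorisation problem: the deep tree's perfect recall exists only on the training set.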
How's the book?