Machine learning in trading: theory, models, practice and algo-trading - page 2800

 
Aleksey Vyazmikin #:

There's over 30% of the first class there. And yes, maybe I just don't see the problem. It's enough to find one rule\leaf that predicts "1" more reliably than "0", even if it fires rarely.

Besides, no one is stopping you from balancing the classes in the dataset.

You were complaining about CatBoost, and CatBoost is not a single tree\rule\leaf.

 
Only neural networks need balancing. Tree-based models do not require it.
 
mytarmailS #:

You were complaining about CatBoost, and CatBoost isn't a tree.

The complaint is not about the algorithm as such, it is what it is, but about the fact that it works better when fed already pre-chewed data.

You used to understand this...

Forum on trading, automated trading systems and testing trading strategies.

Machine learning in trading: theory, models, practice and algo-trading

mytarmailS, 2016.10.29 11:22 pm.

a hypothetical situation....

We have 100 potential predictors; for simplicity of explanation, let them be indicators.

Imagine we know in advance that among all these predictors there is only one profitable situation: RSI has crossed 90 and the stochastic has just dropped below zero (a situation off the top of my head, of course). This situation predicts a price drop with 90% probability. All the other predictors are pure noise, all the other situations in RSI and stochastic are also pure noise, and there are hundreds and hundreds of different situations....

So we have roughly 0.01% useful signal against 99.99% noise.

Suppose that by some miracle your ML weeds out the 98 useless predictors and leaves only two - RSI and stochastic.

In RSI there are hundreds of situations: RSI>0, RSI>13, RSI<85, RSI=0, RSI<145, ............. and so on, hundreds and hundreds; the stochastic has no fewer. Only one situation actually works, but since you train the ML to recognise all price movements, it will build models that take into account every possible situation in RSI and stochastic. The probability that those situations work is almost zero, yet the ML is obliged to account for them and build models on them even though they are pure noise, and the one working situation simply gets lost among hundreds of other decisions. That is overfitting.....

Well, do you finally get it???
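
That hypothetical is easy to reproduce. A minimal sketch in Python with scikit-learn; the data and the "RSI>90 & stochastic<0" rule are synthetic stand-ins of my own, not real indicators:

# Sketch of the hypothetical above: one genuine rare rule hidden among noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 5000
X = rng.uniform(0, 100, size=(n, 100))       # 100 "indicator" predictors
signal = (X[:, 0] > 90) & (X[:, 1] < 5)      # the one working situation, rare
y = rng.integers(0, 2, size=n)               # everywhere else the label is a coin flip
y[signal] = (rng.random(signal.sum()) < 0.9).astype(int)  # the rule is 90% reliable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("train accuracy:", accuracy_score(y_tr, model.predict(X_tr)))  # near 1.0: memorised noise
print("test accuracy: ", accuracy_score(y_te, model.predict(X_te)))  # near 0.5: the rule's edge is drowned out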


Justify what the model's representation and the target proportions have to do with each other. I am saying that a model can be represented as a modified leaf, i.e. a rule.

 
elibrarius #:
Only neural networks need balancing. Tree-based models do not require it.

That is true for good data; in any case, the counters inside the algorithm are still at work, making split decisions based on the number of examples allocated to each target class...
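
To unpack "counters": a tree's split criterion is computed from plain class tallies in each node. A toy illustration using the standard Gini impurity (not any particular library's internals):

# Toy illustration: the split criterion literally counts labels per node.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# A 5-vs-95 node already has low impurity (0.095), so there is little
# incentive to split it further in favour of the minority class.
print(gini([5, 95]))    # 0.095
print(gini([50, 50]))   # 0.5 - a balanced node is maximally impure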

 
Aleksey Vyazmikin #:

The peculiarity here is that the CatBoost model prefers to assign all examples a probability below 0.5 - so it never classifies anything as target "1", and what falls between 0 and 0.5 is not distributed particularly well either.

If, out of 100 examples, the target has 5 labels "A" and 95 labels "B",

then the model cannot give label "A" a probability greater than 0.5.

Some individual rule can, but the post talks about CatBoost, and that is a model (a sum of rule predictions), not a single rule, and the sum will not reach such a high probability.


Even if the model is sure that an example is label "A", the summed contribution of the "A" rules will be outweighed by the sum of the "B" rules, because there are far more "B" rules.
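
That claim is easy to test empirically. A sketch assuming the catboost Python package and synthetic data; the 5/95 split and the class_weights values are illustrative, and on noisier data the unweighted maximum may well stay below 0.5:

# Sketch: does a boosted model ever push the rare class above 0.5?
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 1.64).astype(int)              # rare label "A": roughly 5% of examples

model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(X, y)
print("max P(A), unweighted:", model.predict_proba(X)[:, 1].max())

# Counteracting the 5/95 imbalance with class weights (~1:19).
weighted = CatBoostClassifier(iterations=200, depth=4,
                              class_weights=[1.0, 19.0], verbose=False)
weighted.fit(X, y)
print("max P(A), weighted:  ", weighted.predict_proba(X)[:, 1].max())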

 
elibrarius #:
Only NS need balancing. Wooden models do not require balancing.

https://stats.stackexchange.com/questions/340854/random-forest-for-imbalanced-data

Random forest for imbalanced data? (stats.stackexchange.com, 2018.04.16)
"I have a dataset where yes=77 and no=16000, a highly imbalanced dataset. My plan was to identify the most important variables influencing the response variable using random forest and then develop a logistic regression model using the selected variable. I am planning to use..."
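
For reference, scikit-learn's random forest has a built-in class_weight option for exactly this situation. A minimal usage sketch with synthetic stand-in data (77 positives against 16,000 negatives, as in the question; the report is in-sample, so it shows usage rather than real performance):

# Sketch: random forest on a 77-vs-16000 style imbalance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
X = rng.normal(size=(16077, 20))
y = np.zeros(16077, dtype=int)
y[:77] = 1                                   # 77 "yes" against 16000 "no"

# 'balanced' reweights each class inversely to its frequency,
# so the 77 positives are not drowned out when splits are scored.
rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                            random_state=0)
rf.fit(X, y)
print(classification_report(y, rf.predict(X)))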
 
mytarmailS #:

if, out of 100 examples, the target has 5 labels "A" and 95 labels "B",

then the model cannot give label "A" a probability greater than 0.5

Some individual rule can, but the post talks about CatBoost, and that is a model (a sum of rule predictions), not a single rule, and the sum will not reach such a high probability.


Even if the model is sure that an example is label "A", the summed contribution of the "A" rules will be outweighed by the sum of the "B" rules, because there are far more "B" rules.

It all depends on the predictors and the number of trees in the model.

I'm not insisting on CatBoost as the model for training.

 

https://www.mql5.com/ru/blogs/post/723619

77 out of 16,000 is too few; 77 examples are hardly representative.
The only option is to train the tree very deep.

Do trees and forests need class balancing? (www.mql5.com)
"I'm reading Flach, P. - Machine Learning: The Art and Science of Algorithms That Make Sense of Data (2015); there are several pages devoted to this topic. Here is the summary one: ..."
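
A sketch of the "train the tree very deep" idea, assuming scikit-learn and synthetic data; the 77/16,000 proportions mirror the question above and the parameters are illustrative:

# Sketch: a fully grown tree can isolate 77 positives out of 16077 -
# at the cost of memorising them (an overfitting risk, as noted above).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(16077, 20))
y = np.zeros(16077, dtype=int)
y[:77] = 1

# max_depth=None and min_samples_leaf=1 let the tree split until every
# leaf is pure, so even the rare class gets leaves of its own.
deep = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1,
                              random_state=0)
deep.fit(X, y)
print("tree depth:", deep.get_depth())
print("positives recovered in-sample:", int(deep.predict(X).sum()))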
 
elibrarius #:

https://www.mql5.com/ru/blogs/post/723619

77 out of 16,000 is too few; 77 examples are hardly representative.
The only option is to train the tree very deep.

How's the book?
 
mytarmailS #:
How's the book?
I read it 4 years ago. I can't really say anything about it anymore. I copied one page to my blog since it was new to me and decided I should keep it as a memento. )