Machine learning in trading: theory, models, practice and algo-trading - page 2804

 
Aleksey Vyazmikin #:

I'm doing it now, including for a forum thread, to see if it makes sense for that sample.

It doesn't

 
mytarmailS #:

There's no point

You think that sample is hopeless?

 
Aleksey Vyazmikin #:

CatBoost randomly selects some of the predictors at each split or at each tree-building iteration (it depends on the settings), which means the information carried by strongly correlated predictors has a better chance of landing in that random subset - not the individual predictors themselves, but the information they share.

Yeah, and the creators of boosting don't know that...

They also don't know that it's possible to filter features by correlation))) how would they know, the method is only 50 years old))))

Do you really believe that you know more than they do?

Aleksey Vyazmikin #:

Do you think that sample is hopeless?

Sure... Boosting takes it all into account.

And don't address me with the formal "you", I'm probably younger than you.)

 
Aleksey Vyazmikin #:

You think that sample is hopeless?

https://datascience.stackexchange.com/questions/12554/does-xgboost-handle-multicollinearity-by-itself


Decision trees are by nature immune to multicollinearity. For example, if you have 2 features which are 99% correlated, when deciding upon a split the tree will choose only one of them. Other models, such as logistic regression, would use both features.

Since boosted trees use individual decision trees, they also are unaffected by multicollinearity.

========

You can use this approach: evaluate the importance of each feature and keep only the best features for your final model.


Which is actually what I was telling you earlier

Does XGBoost handle multicollinearity by itself?
  • 2016.07.02
  • datascience.stackexchange.com
I'm currently using XGBoost on a data-set with 21 features (selected from list of some 150 features), then one-hot coded them to obtain ~98 features. A few of these 98 features are somewhat redundant, for example: a variable (feature) $A$ also appears as $\frac{B}{A}$ and $\frac{C}{A}$. My questions are : From what I understand, the model is...
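
A minimal sketch (not from the thread) of the claim quoted above, using scikit-learn on synthetic data: a single decision tree concentrates its importance on one of two ~99%-correlated features, while logistic regression spreads weight over both. The data, feature layout and thresholds are assumptions made purely for illustration.

# Sketch, not from the thread: tree vs. logistic regression on two near-duplicate features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)          # ~99% correlated copy of x1
noise = rng.normal(size=n)
X = np.column_stack([x1, x2, noise])
y = (x1 + 0.1 * noise > 0).astype(int)           # target driven by the shared signal

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
logit = LogisticRegression().fit(X, y)

print("corr(x1, x2):", round(np.corrcoef(x1, x2)[0, 1], 3))
print("tree importances :", tree.feature_importances_)   # almost all mass on one of x1/x2
print("logit coefficients:", logit.coef_[0])              # weight spread over both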
 
mytarmailS #:

Yeah, and the creators of boosting don't know that...

They also don't know that it's possible to filter features by correlation)) how could they know, the method is only 50 years old)))

Do you really believe you know more than they do?

Sure... Boosting takes it all into account.

And don't address me with the formal "you", I'm probably younger than you.)

I analyse the results of the models and I see that they grab highly correlated predictors - for example, time-based predictors, even when they differ only by a small lag.
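
A rough sketch of how such a check could look, assuming a fitted CatBoost model (`model`) and the pandas DataFrame of predictors it was trained on (`X`) - both names are assumptions for illustration, not code from the thread:

import numpy as np
import pandas as pd

def correlated_top_features(model, X: pd.DataFrame, top_n: int = 20, threshold: float = 0.9) -> pd.Series:
    # Importance of each predictor as reported by the fitted CatBoost model.
    importance = pd.Series(model.get_feature_importance(), index=model.feature_names_)
    top = importance.sort_values(ascending=False).head(top_n).index
    # Pairwise |correlation| among the top-importance predictors, upper triangle only.
    corr = X[top].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().sort_values(ascending=False)
    return pairs[pairs > threshold]      # e.g. lagged copies of the same time feature

# usage (hypothetical objects): print(correlated_top_features(model, X))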

I think they know all of this perfectly well, but they are not obliged to spell out truisms that are decades old....

As for the formal and informal "you" - I think it's better to let everyone address their interlocutor however suits them, as long as it carries no offence and doesn't hinder constructive dialogue.


mytarmailS #:

https://datascience.stackexchange.com/questions/12554/does-xgboost-handle-multicollinearity-by-itself


Decision trees are by nature immune to multicollinearity. For example, if you have 2 features which are 99% correlated, when deciding upon a split the tree will choose only one of them. Other models, such as logistic regression, would use both features.

Since boosted trees use individual decision trees, they also are unaffected by multicollinearity.

========

You can use this approach: evaluate the importance of each feature and keep only the best features for your final model.


Which is actually what I was telling you earlier

That's just it - yes, it will choose one, but how many times will that choice be repeated across splits and trees....

Besides, CatBoost has some differences from XGBoost, and the results differ from sample to sample; on average CatBoost is faster and even better, but not always.

 

Plus I have my own method of grouping similar predictors and selecting the best option from each group, and I need a control group in the form of correlation-based filtering...
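
For reference, a generic sketch of grouping predictors by pairwise correlation and keeping one representative per group - this only illustrates the correlation-based control, it is not the grouping method mentioned above; the clustering choice and threshold are assumptions:

import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def correlation_groups(X: pd.DataFrame, threshold: float = 0.9) -> list:
    # Distance = 1 - |correlation|; predictors closer than (1 - threshold) land in one group.
    corr = X.corr().abs()
    dist = 1.0 - corr.values
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(z, t=1.0 - threshold, criterion="distance")
    groups = {}
    for name, lab in zip(corr.columns, labels):
        groups.setdefault(lab, []).append(name)
    return list(groups.values())

# Keep e.g. the first predictor of each group as its representative:
# representatives = [g[0] for g in correlation_groups(X, threshold=0.9)]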

 
The script is still running - I guess I'll have to leave it overnight....
 
Aleksey Vyazmikin #:

CatBoost randomly selects some of the predictors at each split or at each tree-building iteration (it depends on the settings), which means the information carried by strongly correlated predictors has a better chance of landing in that random subset - not the individual predictors themselves, but the information they share.

Are you sure it picks predictors at random? I haven't used CatBoost, but I've looked at the code of basic boosting examples. All the predictors are used there, i.e. the best one is taken. The correlated one will be right next to it, just slightly worse. But at other split levels, or in the correction trees, another of the correlated predictors may turn out to be better.
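
For what it's worth, CatBoost does expose a feature-subsampling setting, rsm (the fraction of features considered when selecting each split; by default all features are used). A back-of-the-envelope sketch of the argument quoted above, under the simplifying assumption that features are drawn independently with probability r:

# If only a fraction r of features is considered per split, k near-duplicate predictors
# carrying the same information are "available" far more often than a single one.
def p_information_available(r: float, k: int) -> float:
    """Probability that at least one of k equivalent predictors is in the random subset."""
    return 1.0 - (1.0 - r) ** k

for k in (1, 2, 5, 10):
    print(f"k={k}: {p_information_available(0.3, k):.3f}")
# k=1: 0.300, k=2: 0.510, k=5: 0.832, k=10: 0.972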

 
Aleksey Vyazmikin #:

Plus I have my own method of grouping similar predictors and selecting the best option from each group, and I need a control group in the form of correlation-based filtering...

So throw me a couple of informative formulas to try out.
 
https://habr.com/ru/post/695276/ may be useful/interesting to some people
Хитрые методики сэмплинга данных (Tricky data sampling techniques)
  • 2022.10.27
  • habr.com
Anyone who has ever trained neural networks knows that it is customary to shuffle the dataset at every epoch so that the order of the batches does not repeat. Why do that? The usual explanation is that shuffling improves the networks' generalisation, makes the gradient estimate on the batches more accurate and reduces the chance of SGD getting stuck in local minima. Here you can see...
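
A minimal illustration of the practice the article describes - reshuffling the training set on every epoch so the batch order never repeats; the array shapes and batch size are arbitrary placeholders:

import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 10))          # placeholder features
y = rng.integers(0, 2, size=1000)            # placeholder labels
batch_size = 64

for epoch in range(3):
    order = rng.permutation(len(X))          # new random batch order every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        # ... one training step on (xb, yb) would go here ...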