Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 6
I didn't see that in the textbooks. I know you can improve quality by combining good models. But not the other way round :)
Well, that's exactly what the forest does. All the good and the bad. And the forest was made by very cool mathematicians and statisticians.
And they probably tried combining only the best models too (if it occurred to you and me, it surely occurred to them).
Even the forest has a reasonable limit of 50-100 trees, empirically derived by someone; beyond that, adding more doesn't make sense.
Pooling the best ones is common practice. On Kaggle everyone likes to stack boostings; at least that was the meme.
You have just 50 clustering variants. It will be fine to average them.
Yes. There's no point. Time is wasted, and the quality gain is very little.
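A quick sketch of that plateau effect (illustrative only, not from the article: the dataset is synthetic and a random forest stands in for the models being discussed). Quality stops improving noticeably once the ensemble passes a few dozen members:

```python
# Illustrative sketch: test accuracy of a random forest as the number of
# averaged trees grows. The gain flattens out quickly, which is the
# "reasonable limit of 50-100 trees" point made above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

scores = {}
for n in (10, 50, 100, 500):
    clf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    scores[n] = clf.score(X_te, y_te)  # accuracy on the held-out half
    print(n, round(scores[n], 3))
```

Going from 100 to 500 trees typically moves accuracy by a fraction of a percent, while the time to get a signal grows fivefold.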
I see the point of only clustering the best ones.
I guess it's not the Breimans sitting on Kaggle)))) So they're experimenting...
these are the ones who won the contests )
Try both and compare the result on the exam sample.
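A sketch of that experiment (illustrative assumptions throughout: synthetic data, gradient boosting as the model pool, and a train/validation/exam three-way split). Average all models vs. only the best ones, and judge both on the exam sample:

```python
# Illustrative experiment: average ALL models vs. only the top ones,
# ranked on a validation set, then compare on a separate "exam" sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
X_val, X_exam, y_val, y_exam = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

# A pool of models of deliberately varying quality (different depths/seeds).
models = [GradientBoostingClassifier(n_estimators=50, max_depth=d, random_state=s).fit(X_tr, y_tr)
          for d in (1, 2, 3) for s in (0, 1)]
val_scores = [m.score(X_val, y_val) for m in models]

def avg_accuracy(pool):
    # Average predicted probabilities across the pool, then threshold at 0.5.
    proba = np.mean([m.predict_proba(X_exam)[:, 1] for m in pool], axis=0)
    return accuracy_score(y_exam, proba > 0.5)

# Keep the 3 models with the best validation score.
best = [m for _, m in sorted(zip(val_scores, models), key=lambda t: -t[0])[:3]]
acc_all, acc_best = avg_accuracy(models), avg_accuracy(best)
print("all:", round(acc_all, 3), "best-only:", round(acc_best, 3))
```

Whether the best-only average actually wins depends on how bad the weak models are; that is exactly what the exam sample is for.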
No, there's no point in adding bad models. By definition.
In training, averaging is one thing; averaging already-trained models is another. Here you're deliberately shooting yourself in the foot by adding bad ones: they introduce error, and that's it. And there's no such practice; I haven't seen it anywhere.
Plus, imagine the cost of getting a signal from 50 models, and the slowdown in testing.
That's what happens in any random forest.