Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 5

 
elibrarius:
Then you do need to average exactly. Otherwise it will turn out "different" on the new data.

You don't need to average exactly; the sampler already has averaging built in.

The GMM sampler can create a bad sample, with skewed classes, etc., since sampling is random. Does it make sense to take this into account?

 
Maxim Dmitrievsky:

definitely don't need to average

The GMM sampler can create a bad sample, with skewed classes, etc., since sampling is random. Does it make sense to take this into account?

A random forest similarly creates a set of successful and not-so-successful trees. Averaging all of them shows a better result on new data than the single best tree.
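
For illustration, a minimal sketch of that point (not code from the article; the synthetic dataset, splits and model settings are assumptions): train a bag of trees, pick the single tree that looks best on a validation split, and compare it with the average of all trees on a test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=6000, n_features=20, n_informative=5,
                           flip_y=0.05, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Bag of 100 trees (BaggingClassifier uses a decision tree by default).
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Pick the single tree that scores best on the validation split.
val_scores = [accuracy_score(y_val, tree.predict(X_val)) for tree in bag.estimators_]
best_tree = bag.estimators_[int(np.argmax(val_scores))]

print("best single tree, test accuracy   :", accuracy_score(y_test, best_tree.predict(X_test)))
print("average of all trees, test accuracy:", accuracy_score(y_test, bag.predict(X_test)))
# The averaged ensemble usually generalises better than the tree that
# happened to look best on the validation data.
```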

 
elibrarius:

A random forest similarly creates a set of successful and not-so-successful trees. Averaging all of them shows a better result on new data than the single best tree.

And if you combine several forests, there will be approximately zero trades; the signals will overlap each other.

 
Maxim Dmitrievsky:

And if you combine several forests, there will be approximately zero trades; the signals will overlap each other.

Several forests (e.g. 10) of 100 trees each are the same as one forest of 1000 trees. It gives plenty of signals.
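
As a quick check of that equivalence (a sketch with an assumed synthetic dataset and sklearn forests, not from the article): a random forest's predicted probability is already the mean over its trees, so averaging the probabilities of ten 100-tree forests should come out very close to one 1000-tree forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# Ten forests of 100 trees each; average their predicted probabilities.
forests = [RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_train, y_train)
           for seed in range(10)]
p_avg = np.mean([f.predict_proba(X_test)[:, 1] for f in forests], axis=0)

# One forest of 1000 trees; its probability is already the mean over 1000 trees.
big = RandomForestClassifier(n_estimators=1000, random_state=42).fit(X_train, y_train)
p_big = big.predict_proba(X_test)[:, 1]

print("mean absolute difference in predicted probability:",
      float(np.abs(p_avg - p_big).mean()))
```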

 
elibrarius:

Several forests (e.g. 10) of 100 trees each are the same as one forest of 1000 trees. It gives plenty of signals.

Any practical experience? I've done it, and the signals become scarce.
 
If classification is done around the 0.5 threshold, it will trigger at 0.51 and 0.49 instead of 0.6 and 0.4.
 
Maxim Dmitrievsky:
Any practical experience? I've done it, and the signals become scarce.
If you have set an offset from 0.5, you just need to reduce it. If there are 10 times more trees, then make the offset 10 times smaller.
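
To make the offset argument concrete, a sketch with purely hypothetical probability distributions (nothing here is measured from real models): the average of many models clusters near 0.5, so a wide offset such as 0.10 produces almost no signals, while a narrower offset restores them.

```python
import numpy as np

def signals(proba, offset):
    """Buy (1) if p > 0.5 + offset, sell (-1) if p < 0.5 - offset, else no trade (0)."""
    return np.where(proba > 0.5 + offset, 1, np.where(proba < 0.5 - offset, -1, 0))

rng = np.random.default_rng(0)
# Hypothetical probability outputs: a single model spreads further from 0.5
# than the average of many models does.
p_single = np.clip(0.5 + rng.normal(0.0, 0.10, 10_000), 0.0, 1.0)
p_averaged = np.clip(0.5 + rng.normal(0.0, 0.01, 10_000), 0.0, 1.0)

for name, p in [("single model", p_single), ("averaged models", p_averaged)]:
    for offset in (0.10, 0.01):
        n = int(np.count_nonzero(signals(p, offset)))
        print(f"{name:15s} offset {offset:.2f}: {n:5d} signals out of {len(p)}")
```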
 
elibrarius:
If you have set an offset from 0.5, you just need to reduce it.
I agree with that; there were still too few signals anyway. And I don't quite understand why you should add randomly bad models. Combining good ones that improve each other is another conversation.
 
Maxim Dmitrievsky:
I agree with that; there were still too few signals anyway. And I don't quite understand why you should add randomly bad models. Combining good ones that improve each other is another conversation.
I did this with a forest about 2 years ago: trained 1000 trees and took the best 10-50. It did not work out; apparently the result on new data was not very good.
It is averaging of everything indiscriminately that is needed. The basic descriptions of the random forest principle say so, along the lines of "the crowd knows better than a single expert."
 
elibrarius:
I did this with a forest about 2 years ago: trained 1000 trees and took the best 10-50. It did not work out; apparently the result on new data was not very good.
It is averaging of everything indiscriminately that is needed. The basic descriptions of the random forest principle say so, along the lines of "the crowd knows better than a single expert."
I haven't seen that in the textbooks. I know you can improve quality by combining good models, but not the other way round :)