Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 5

 
elibrarius:
Then you do need to average exactly. Otherwise it will turn out "different" on the new data.

You don't need to average exactly; the sampler already has averaging built in.

The GMM sampler can create a bad sample, with skewed classes, etc., since sampling is random. Does it make sense to take this into account?

 
Maxim Dmitrievsky:

definitely don't need to average

The GMM sampler can create a bad sample, with skewed classes, etc., since sampling is random. Does it make sense to take this into account?

A random forest similarly creates a set of successful and not-so-successful trees. Averaging all of them shows a better result on new data than the single best tree.
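
For illustration, a minimal sketch of that point (not code from the article; the synthetic dataset, splits and model settings are assumptions): train a bag of trees, pick the single tree that looks best on a validation split, and compare it with the average of all trees on a test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=6000, n_features=20, n_informative=5,
                           flip_y=0.05, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Bag of 100 trees (BaggingClassifier uses a decision tree by default).
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Pick the single tree that scores best on the validation split.
val_scores = [accuracy_score(y_val, tree.predict(X_val)) for tree in bag.estimators_]
best_tree = bag.estimators_[int(np.argmax(val_scores))]

print("best single tree, test accuracy   :", accuracy_score(y_test, best_tree.predict(X_test)))
print("average of all trees, test accuracy:", accuracy_score(y_test, bag.predict(X_test)))
# The averaged ensemble usually generalises better than the tree that
# happened to look best on the validation data.
```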

 
elibrarius:

A random forest similarly creates a set of successful and not-so-successful trees. Averaging all of them shows a better result on new data than the single best tree.

And if you combine several forests, there will be approximately zero trades; the signals will overlap each other.

 
Maxim Dmitrievsky:

And if you combine several forests, there will be approximately zero trades; the signals will overlap each other.

Several forests (e.g. 10) of 100 trees each are the same as one forest of 1000 trees. It gives plenty of signals.
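
As a quick check of that equivalence (a sketch with an assumed synthetic dataset and sklearn forests, not from the article): a random forest's predicted probability is already the mean over its trees, so averaging the probabilities of ten 100-tree forests should come out very close to one 1000-tree forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# Ten forests of 100 trees each; average their predicted probabilities.
forests = [RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_train, y_train)
           for seed in range(10)]
p_avg = np.mean([f.predict_proba(X_test)[:, 1] for f in forests], axis=0)

# One forest of 1000 trees; its probability is already the mean over 1000 trees.
big = RandomForestClassifier(n_estimators=1000, random_state=42).fit(X_train, y_train)
p_big = big.predict_proba(X_test)[:, 1]

print("mean absolute difference in predicted probability:",
      float(np.abs(p_avg - p_big).mean()))
```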

 
elibrarius:

Several forests (e.g. 10) of 100 trees each are the same as one forest of 1000 trees. It gives plenty of signals.

Any practical experience? I've done it, and the signals become scarce.
 
If classification is done around the 0.5 threshold, it will trigger at 0.51 and 0.49 instead of 0.6 and 0.4.
 
Maxim Dmitrievsky:
Any practical experience? I've done it, and the signals become scarce.
If you have set an offset from 0.5, you just need to reduce it. If there are 10 times more trees, then make the offset 10 times smaller.
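
To make the offset argument concrete, a sketch with purely hypothetical probability distributions (nothing here is measured from real models): the average of many models clusters near 0.5, so a wide offset such as 0.10 produces almost no signals, while a narrower offset restores them.

```python
import numpy as np

def signals(proba, offset):
    """Buy (1) if p > 0.5 + offset, sell (-1) if p < 0.5 - offset, else no trade (0)."""
    return np.where(proba > 0.5 + offset, 1, np.where(proba < 0.5 - offset, -1, 0))

rng = np.random.default_rng(0)
# Hypothetical probability outputs: a single model spreads further from 0.5
# than the average of many models does.
p_single = np.clip(0.5 + rng.normal(0.0, 0.10, 10_000), 0.0, 1.0)
p_averaged = np.clip(0.5 + rng.normal(0.0, 0.01, 10_000), 0.0, 1.0)

for name, p in [("single model", p_single), ("averaged models", p_averaged)]:
    for offset in (0.10, 0.01):
        n = int(np.count_nonzero(signals(p, offset)))
        print(f"{name:15s} offset {offset:.2f}: {n:5d} signals out of {len(p)}")
```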
 
elibrarius:
If you have set an offset from 0.5, you just need to reduce it.
I agree with that; there were still too few signals anyway. And I don't quite understand why you should add randomly bad models. Combining good ones that improve each other is another conversation.
 
Maxim Dmitrievsky:
I agree with that; there were still too few signals anyway. And I don't quite understand why you should add randomly bad models. Combining good ones that improve each other is another conversation.
I did this with a forest about 2 years ago: trained 1000 trees and took the best 10-50. It did not work out; apparently the result on new data was not very good.
It is averaging of everything indiscriminately that is needed. The basic descriptions of the random forest principle say so, along the lines of "the crowd knows better than a single expert."
 
elibrarius:
I did this with a forest about 2 years ago: trained 1000 trees and took the best 10-50. It did not work out; apparently the result on new data was not very good.
It is averaging of everything indiscriminately that is needed. The basic descriptions of the random forest principle say so, along the lines of "the crowd knows better than a single expert."
I haven't seen that in the textbooks. I know you can improve quality by combining good models, but not the other way round :)