Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 8

 
Valeriy Yastremskiy:

Apparently we have different ideas about random boosting. A decision tree is about features selected from a random set. The point is that the sets are random, but the sorting/clustering into bad and good ones was there from the start. It's like Buffon's needle: throwing a needle, measuring angles, and calculating pi)

from the wiki

  1. Build a decision tree that classifies the samples of the given subsample; when creating the next node of the tree, choose the set of features on which the split is performed not from all M features, but only from m randomly chosen ones. The selection of the best of these m features can be done in different ways. Breiman's original code uses the Gini criterion, which is also used in the CART decision-tree algorithm. Some implementations of the algorithm use the information gain criterion instead. [3]
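The per-node procedure from the wiki excerpt can be sketched roughly like this. This is only a toy illustration, not Breiman's actual code; the names `gini` and `best_split` are made up for the example:

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y, m):
    """Pick the best threshold split among m randomly chosen features
    (out of all M), scoring candidates by weighted Gini impurity."""
    M = len(X[0])
    candidates = random.sample(range(M), m)  # m of M features, at random
    best = None
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best  # (weighted impurity, feature index, threshold)

X = [[0, 5], [1, 4], [0, 3], [1, 2]]
y = [0, 0, 1, 1]
print(best_split(X, y, m=2))  # feature 1 at threshold 3 separates perfectly
```

With m < M, each node only ever inspects its random subset, which is where the saving in resources comes from.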
I don't understand you.
This is the first time I've heard of Random boosting, too.
I was talking about random forest.
 
Maxim Dmitrievsky:

Yes, there are many trees, but each one tries to train itself as well as possible on different features. This is not the same as lumping several forests (including bad ones) together.

In a random forest, the trees are averaged.
However, merging random forests built on the same features is equivalent to one forest with a number of trees equal to the total number of trees in all the merged forests. The only difference will be a different initialization of the RNG.
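That equivalence is easy to check empirically, e.g. with scikit-learn (an illustrative sketch under the stated assumptions: equal-sized forests, same features and training data, differing only in seed):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# two forests on the same features, differing only in RNG seed
a = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)
b = RandomForestRegressor(n_estimators=50, random_state=2).fit(X, y)

# averaging the two forests' predictions...
avg_of_forests = (a.predict(X) + b.predict(X)) / 2

# ...is identical to one forest containing all 100 trees
all_trees = list(a.estimators_) + list(b.estimators_)
merged = np.mean([t.predict(X) for t in all_trees], axis=0)

print(np.allclose(avg_of_forests, merged))  # True
```

A forest's prediction is just the mean over its trees, so the mean of two equal-sized forests equals the mean over the pooled trees.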
[Deleted]  
elibrarius:
In a random forest, the trees are averaged.
However, merging random forests built on the same features is equivalent to one forest with a number of trees equal to the total number of trees in all the merged forests. The only difference will be a different initialization of the RNG.

The difference is that each tree without pruning is able to memorize the dataset perfectly, which causes it to overfit. An ensemble of trees counters overfitting, because some averaging occurs. But each tree is good on its own.

If you lump classifiers together, that's a different story: averaging in a bad classifier degrades the overall result.
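Both halves of this claim can be seen in a small scikit-learn experiment (an illustrative sketch; the dataset and parameters are arbitrary):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("tree,   train R^2:", tree.score(X_tr, y_tr))    # 1.0: memorized the data
print("tree,   test  R^2:", tree.score(X_te, y_te))
print("forest, test  R^2:", forest.score(X_te, y_te))  # averaging helps out of sample
```

The unpruned tree reaches R² = 1.0 on the training set (perfect memorization) but generalizes worse than the averaged ensemble.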

 
Maxim Dmitrievsky:

The difference is that each tree without pruning is able to memorize the dataset perfectly, which causes it to overfit. An ensemble of trees counters overfitting, because some averaging occurs. But each tree is good on its own.

If you lump classifiers together, that's a different story: averaging in a bad classifier degrades the overall result.

Besides pruning, there is a limit on depth and on the number of examples in a leaf.
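Those limits map directly onto, for example, scikit-learn's tree parameters (illustrative values; CatBoost exposes analogous knobs such as `depth`):

```python
from sklearn.tree import DecisionTreeClassifier

# growth can be capped up front instead of pruning afterwards
clf = DecisionTreeClassifier(
    max_depth=6,          # limit on tree depth
    min_samples_leaf=20,  # minimum number of examples in a leaf
)
```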

A single tree is also a classifier.

I hope you will find time to compare the average and the best results on the exam sample. Not to argue theoretically, but to confirm one of the variants in practice.

 
elibrarius:
I don't understand you.
This is the first time I've heard of Random boosting, too.
I was talking about random forest.

I apologise, a typo. Forest, of course, forest. By the way, it was first implemented in Fortran 77 (with OOP) in 1986, when people here were still learning Fortran IV (without OOP).

But it doesn't change the point. Selecting the best features for the trees in an ensemble improves the result. At the same time, the split into good and bad is performed on a random subset of features rather than the full set, which reduces the required resources and, as practice has shown, does not significantly worsen the result.

[Deleted]  
elibrarius:

Besides pruning, there is a limit on depth and on the number of examples in a leaf.

A single tree is also a classifier.

I hope you will find time to compare the average and the best results on the exam sample. Not to argue theoretically, but to confirm one of the variants in practice.

Trained 20 models

Iteration:  0 R^2:  0.8235250920362135
Iteration:  1 R^2:  0.6105081195352418
Iteration:  2 R^2:  0.5999893279334669
Iteration:  3 R^2:  0.7034867465493326
Iteration:  4 R^2:  0.49771677587528107
Iteration:  5 R^2:  0.8190243407873834
Iteration:  6 R^2:  0.9160173823652586
Iteration:  7 R^2:  0.809572709204347
Iteration:  8 R^2:  0.8537940261267768
Iteration:  9 R^2:  0.7244418893207643
Iteration:  10 R^2:  0.8809333905804972
Iteration:  11 R^2:  0.7920488879746739
Iteration:  12 R^2:  0.8377299883565552
Iteration:  13 R^2:  0.8667892348319326
Iteration:  14 R^2:  0.6321639879122785
Iteration:  15 R^2:  0.7561855032577106
Iteration:  16 R^2:  0.4121119648365902
Iteration:  17 R^2:  0.7421029264382919
Iteration:  18 R^2:  0.836331050771787
Iteration:  19 R^2:  0.7477743928781102
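For reference, the best and the mean of the 20 R² values listed above can be pulled out in a few lines. Note that the mean of per-model R² scores is not the same thing as the R² of the averaged predictions ("All 20"); this is only a quick summary of the log:

```python
# R^2 values copied from the 20-model log above
scores = [
    0.8235250920362135, 0.6105081195352418, 0.5999893279334669,
    0.7034867465493326, 0.49771677587528107, 0.8190243407873834,
    0.9160173823652586, 0.809572709204347, 0.8537940261267768,
    0.7244418893207643, 0.8809333905804972, 0.7920488879746739,
    0.8377299883565552, 0.8667892348319326, 0.6321639879122785,
    0.7561855032577106, 0.4121119648365902, 0.7421029264382919,
    0.836331050771787, 0.7477743928781102,
]
print("best:", max(scores))                # iteration 6, ~0.916
print("mean:", sum(scores) / len(scores))  # ~0.743
```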

Best:

All 20:


[Deleted]  

50 models


[Deleted]  

100 models

Best:

All:


[Deleted]  

Once again, on 50 models:

Iteration:  0 R^2:  0.797041035933919
Iteration:  1 R^2:  0.6824496839528826
Iteration:  2 R^2:  -0.10034902026957526
Iteration:  3 R^2:  0.328548941268331
Iteration:  4 R^2:  0.057993335625261544
Iteration:  5 R^2:  0.43595119223755463
Iteration:  6 R^2:  -0.1461644857089356
Iteration:  7 R^2:  -0.9017316279265075
Iteration:  8 R^2:  0.0031339532771327283
Iteration:  9 R^2:  -0.6090350854501592
Iteration:  10 R^2:  -0.7554715262958651
Iteration:  11 R^2:  0.8889548573023011
Iteration:  12 R^2:  -0.6851507097155135
Iteration:  13 R^2:  -0.042098743896817226
Iteration:  14 R^2:  0.22006019984338276
Iteration:  15 R^2:  -0.4950383969975669
Iteration:  16 R^2:  0.2773014537990013
Iteration:  17 R^2:  0.4472756948107278
Iteration:  18 R^2:  0.3842534295398661
Iteration:  19 R^2:  -0.06660146376162235
Iteration:  20 R^2:  -0.13214701476491186
Iteration:  21 R^2:  -0.014549407007194204
Iteration:  22 R^2:  0.11446106552499291
Iteration:  23 R^2:  0.28201359760085487
Iteration:  24 R^2:  -0.32881820516653015
Iteration:  25 R^2:  -0.11531960758010862
Iteration:  26 R^2:  -0.22343090109420405
Iteration:  27 R^2:  -0.2359542081469308
Iteration:  28 R^2:  -0.2601186685105703
Iteration:  29 R^2:  0.7814611177095688
Iteration:  30 R^2:  -0.25351714267240644
Iteration:  31 R^2:  0.23253274050003103
Iteration:  32 R^2:  -0.06336213642832789
Iteration:  33 R^2:  0.8253438383511618
Iteration:  34 R^2:  0.2634214576140671
Iteration:  35 R^2:  0.1234251060806747
Iteration:  36 R^2:  0.5421316161448162
Iteration:  37 R^2:  0.2050233417898205
Iteration:  38 R^2:  0.4735349758266585
Iteration:  39 R^2:  -0.3067801197806268
Iteration:  40 R^2:  0.578989248941286
Iteration:  41 R^2:  0.2660816711693378
Iteration:  42 R^2:  0.19419203781618766
Iteration:  43 R^2:  -0.5900063179871913
Iteration:  44 R^2:  -0.4341693524447342
Iteration:  45 R^2:  0.593129434935225
Iteration:  46 R^2:  -0.6595885008415516
Iteration:  47 R^2:  -0.41482641919393526
Iteration:  48 R^2:  0.27611537596461266
Iteration:  49 R^2:  0.2459498592107655
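The same quick summary for the 50-model log above; this run is noticeably worse, and it is worth noting how many of the models land at negative R² on the exam sample:

```python
# R^2 values copied from the 50-model log above
scores = [
    0.797041035933919, 0.6824496839528826, -0.10034902026957526,
    0.328548941268331, 0.057993335625261544, 0.43595119223755463,
    -0.1461644857089356, -0.9017316279265075, 0.0031339532771327283,
    -0.6090350854501592, -0.7554715262958651, 0.8889548573023011,
    -0.6851507097155135, -0.042098743896817226, 0.22006019984338276,
    -0.4950383969975669, 0.2773014537990013, 0.4472756948107278,
    0.3842534295398661, -0.06660146376162235, -0.13214701476491186,
    -0.014549407007194204, 0.11446106552499291, 0.28201359760085487,
    -0.32881820516653015, -0.11531960758010862, -0.22343090109420405,
    -0.2359542081469308, -0.2601186685105703, 0.7814611177095688,
    -0.25351714267240644, 0.23253274050003103, -0.06336213642832789,
    0.8253438383511618, 0.2634214576140671, 0.1234251060806747,
    0.5421316161448162, 0.2050233417898205, 0.4735349758266585,
    -0.3067801197806268, 0.578989248941286, 0.2660816711693378,
    0.19419203781618766, -0.5900063179871913, -0.4341693524447342,
    0.593129434935225, -0.6595885008415516, -0.41482641919393526,
    0.27611537596461266, 0.2459498592107655,
]
print("best:", max(scores))                          # iteration 11, ~0.889
print("negative R^2:", sum(s < 0 for s in scores))   # 23 of 50 below zero
```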

Best:

Averages:


[Deleted]  

Once again.