Machine learning in trading: theory, models, practice and algo-trading - page 1280

 

I stand by my opinion: here we have two indisputable relatives of the venerable KsanKsanych (Fa): 1) Alyoshenka the son, who is being chased by angry investors, and 2) Kesha the grandson, who promises billions to everyone who reads his grandfather's creations.

Please do not confuse the two!

 

An interesting opinion from a professional StarCraft 2 player on what is going on, especially about the cheating in the last match. We should not forget that the organization of such spectacles by large companies is primarily a marketing move. The right thing to do would have been to buy their shares intraday around this event.


 

If you are interested, you can compare the importance tables obtained by permutation and by actual removal of a predictor.

Importance of predictors by brute force (removing 1 at a time)
rank, feature, absolute value, relative value * 100
1) 17 0.01097643069603077 99
2) 30 0.006790004907923086 61
3) 61 0.004684715336508855 42
4) 2 -0.0002692516957934765 -2
5) 59 -0.0006465367565449825 -5
6) 34 -0.0006503517167333328 -5
7) 5 -0.001340840857516234 -12
8) 41 -0.001504570905518282 -13
9) 15 -0.001971414359495396 -17
10) 49 -0.002008411960897655 -18
11) 6 -0.002027305543154334 -18
12) 55 -0.002292162160081906 -20
13) 47 -0.002398304141661728 -21
14) 29 -0.003010337993465118 -27
15) 51 -0.004160368206123241 -37
16) 45 -0.004454751375256194 -40
17) 31 -0.004888451443569572 -44
18) 0 -0.00493201061731692 -44
19) 48 -0.005610904510929521 -51
20) 3 -0.005764515487066274 -52
21) 57 -0.005965409431599886 -54
22) 10 -0.006056332510674986 -55
23) 35 -0.006367565963429744 -58
24) 58 -0.006638024809636447 -60
25) 43 -0.007371220115761079 -67
26) 9 -0.007420288551508419 -67
27) 21 -0.007838972444520739 -71
28) 4 -0.007840269966254226 -71
29) 44 -0.008004942292835771 -72
30) 16 -0.008290498838290847 -75
31) 36 -0.008995332552560964 -81
32) 50 -0.009024243316015798 -82
33) 27 -0.009105675807931257 -82
34) 24 -0.01027361001595535 -93
35) 7 -0.01052719088846928 -95
36) 26 -0.01082406611271462 -98
37) 18 -0.01155880619525071 -105
38) 60 -0.01156309946744785 -105
39) 56 -0.01203862169736691 -109
40) 1 -0.01203862169736691 -109
41) 25 -0.0122272134638268 -111
42) 38 -0.01241174339783128 -113
43) 62 -0.01249635462233889 -113
44) 28 -0.01266702047388507 -115
45) 11 -0.01359028620740281 -123
46) 39 -0.01404126970316556 -127
47) 20 -0.01439737068264699 -131
48) 52 -0.01439756725211659 -131
49) 42 -0.01444571512808378 -131
50) 22 -0.01551886866180208 -141
51) 33 -0.01615798882405024 -147
52) 12 -0.01905830020505599 -173
53) 14 -0.01926462731981513 -175
54) 37 -0.01995084300903066 -181
55) 40 -0.020510512124551 -186
56) 19 -0.021415509666178 -195
57) 63 -0.02151966963894812 -196
58) 54 -0.02355949029687353 -214
59) 64 -0.02507021252693609 -228
60) 32 -0.02702794503628224 -246
61) 8 -0.02803580711831312 -255
62) 13 -0.03090123190409769 -281
63) 46 -0.03344678821960098 -304
64) 53 -0.03558721250407129 -324
65) 23 -0.04407219798162174 -401

Importance of predictors by the permutation method
rank, feature, absolute value, relative value * 100
1) 55 0.04340158682225395 99
2) 61 0.02562763893643727 59
3) 58 0.02546470705535522 58
4) 56 0.02529445125891924 58
5) 59 0.02513377163594621 57
6) 57 0.02208166602125552 50
7) 64 0.02019285632774162 46
8) 60 0.0160907362360114 37
9) 43 0.0125324616278514 28
10) 35 0.01239249171969528 28
11) 13 0.01233138008911674 28
12) 24 0.01170363669371338 26
13) 62 0.01162424331038356 26
14) 63 0.01149019906346291 26
15) 45 0.01127777161657609 25
16) 34 0.01085020622422195 24
17) 46 0.01061844113396632 24
18) 20 0.01007598993178244 23
19) 2 0.009874770749918993 22
20) 19 0.00973881761283335 22
21) 1 0.009100774421598679 20
22) 32 0.009027289557555301 20
23) 9 0.008970631365350451 20
24) 54 0.00802484531062575 18
25) 8 0.007874015748031482 18
26) 53 0.007388216046985141 17
27) 41 0.006952887365763216 16
28) 12 0.0065631543248105 15
29) 21 0.006511968996697037 15
30) 31 0.006445981174562854 14
31) 30 0.005790682414698156 13
32) 42 0.005742446472030011 13
33) 22 0.003590654957257189 8
34) 4 0.003590358440616087 8
35) 38 0.00350243104857792 8
36) 10 0.00350243104857792 8
37) 29 0.003392223030944636 7
38) 5 0.003253553701826867 7
39) 52 0.003019071994331074 6
40) 11 0.002622140078149371 6
41) 15 0.001506974549529611 3
42) 49 0.001178236999850979 2
43) 27 0.000646877104963639 1
44) 23 0.0001088642328799794 0
45) 0 -0.0007427642973199949 -1
46) 36 -0.0008086747680855211 -1
47) 18 -0.001719116017552688 -3
48) 16 -0.003868408494392753 -8
49) 7 -0.004264601904658535 -9
50) 25 -0.004436590312574581 -10
51) 44 -0.004549722466056144 -10
52) 17 -0.005094229165450173 -11
53) 33 -0.007112771718937178 -16
54) 50 -0.008009653155771651 -18
55) 6 -0.008725562553674474 -20
56) 26 -0.01000190433609049 -23
57) 47 -0.01158648521535965 -26
58) 3 -0.01809942562041326 -41
59) 51 -0.01843159353630121 -42
60) 39 -0.02375369534904158 -54
61) 40 -0.02659139305699997 -61
62) 37 -0.02970174182772609 -68
63) 48 -0.031083105562031 -71
64) 14 -0.03323633066169551 -76
65) 28 -0.03952723165321592 -91

By permutation, the first 10 lines show that removing the predictor worsens the error by 2-6%; in the brute-force table, the first 10 worsen it by only 0.1-0.2%. That is because in practice the tree will always find another predictor that gives almost as good a split (primarily among those correlated with the removed one, but even if those are removed beforehand too, something will still be found).

What is interesting is that almost half of the predictors show negative importance when actually removed, i.e. removing them decreases the tree error, so they are clearly noise. Yet even the noisiest one only worsens the result by 0.5%.
And the fact that the two importance orderings are not at all similar suggests that it is still better to sift out the noise predictors by enumeration (actual removal).
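For readers who want to reproduce this kind of comparison, below is a minimal Python sketch of the permutation measurement (the removal-based benchmark is sketched further down the thread). The model, the synthetic data and all names are illustrative assumptions, not the poster's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Toy data just so the sketch runs end to end; replace with your own sample.
X = rng.normal(size=(2000, 65))
y = (X[:, 55] + 0.5 * X[:, 61] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
base_error = 1.0 - accuracy_score(y, model.predict(X))

perm_importance = np.empty(X.shape[1])
for j in range(X.shape[1]):
    X_shuffled = X.copy()
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])  # destroy only column j
    err = 1.0 - accuracy_score(y, model.predict(X_shuffled))
    perm_importance[j] = err - base_error                 # positive = error got worse

# Print the ten most important features, analogous to the table above.
for rank, j in enumerate(np.argsort(perm_importance)[::-1][:10], 1):
    print(f"{rank}) feature {j}: {perm_importance[j]:.5f}")
```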

 

Maybe because you have to compare against some benchmark or known example, rather than comparing apples with oranges.

Plus, speed is very important. Since alglib doesn't have feature importances built in, I think permutation is optimal for now (I've tried a bunch of brute-force methods).

 
elibrarius:

By permutation, the first 10 lines show that removing the predictor worsens the error by 2-6%; in the brute-force table, the first 10 worsen it by only 0.1-0.2%. That is because in practice the tree will always find another predictor that gives almost as good a split (primarily among those correlated with the removed one, but even if those are removed beforehand too, something will still be found).

Why do you need the overall error, do you have a balanced binary sample? I'm leaning more toward finding ways to improve class 1 accuracy.

 
Aleksey Vyazmikin:

Why do you need the overall error, do you have a balanced binary sample?

The overall error is not for an individual leaf, it is for the whole tree/forest.

Aleksey Vyazmikin:

I'm leaning more toward finding ways to improve Class 1 accuracy.

Me too)
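Since the interest here is class 1 rather than a single overall error number, a hedged sketch of how one might look at per-class quality instead (scikit-learn assumed; the toy labels and predictions are placeholders, not anyone's results):

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Toy labels/predictions just so the sketch runs; substitute the model's output.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

# Full per-class breakdown instead of one aggregate error.
print(classification_report(y_true, y_pred, digits=3))

# Or just the class-1 numbers, if that is the class being traded on.
print("class 1 precision:", precision_score(y_true, y_pred, pos_label=1))
print("class 1 recall:   ", recall_score(y_true, y_pred, pos_label=1))
```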

 
Maxim Dmitrievsky:

Maybe because you have to compare against some benchmark or known example, rather than comparing apples with oranges.

Plus, speed is very important. Since alglib doesn't have feature importances built in, I think permutation is optimal for now (I've tried a bunch of brute-force methods).

Brute force (removal/addition of one predictor at a time) is the benchmark against which all other methods should be compared. It is slow, I agree. But if it adds at least 5%, I'm willing to wait.
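A rough sketch of that benchmark, i.e. retraining with one predictor removed at a time, assuming a scikit-learn style forest (the function and parameter names are mine, not the poster's). It is slow precisely because every feature costs a full refit:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drop_column_importance(X, y, n_estimators=100, cv=3, seed=0):
    """Importance of feature j = baseline CV accuracy minus CV accuracy without j."""
    def cv_accuracy(X_):
        clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        return cross_val_score(clf, X_, y, cv=cv).mean()

    base = cv_accuracy(X)
    importance = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_drop = np.delete(X, j, axis=1)       # actually remove predictor j and refit
        importance[j] = base - cv_accuracy(X_drop)
    return importance                          # positive = removing the feature hurt

# usage: importance = drop_column_importance(X, y)
```

Negative values from such a function correspond to the "noise" predictors in the first table: removing them makes the cross-validated error smaller.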
 
Another little experiment with permutation.
With different runs on the same tree, the order of importance also changes because of the randomness of the shuffling.
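One way to damp that run-to-run reordering, sketched below, is to repeat the shuffle several times per feature and average the error deltas (again assuming a scikit-learn style model; the names are illustrative). scikit-learn's own sklearn.inspection.permutation_importance does the same thing through its n_repeats argument.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def permutation_importance_repeated(model, X, y, n_repeats=10, seed=0):
    """Average the shuffled-column error increase over several random shuffles."""
    rng = np.random.default_rng(seed)
    base_error = 1.0 - accuracy_score(y, model.predict(X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            err = 1.0 - accuracy_score(y, model.predict(X_shuffled))
            importance[j] += (err - base_error) / n_repeats
    return importance
```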
 
elibrarius:
Another little experiment with permutation.
With different runs on the same tree, the order of importance also changes because of the randomness of the shuffling.

I wanted to clarify: on which sample do you evaluate the result of the permutation method, the one the model was trained on, or the test sample?

I understand noise as something that stops working at all on a sample outside of training. But I think it's not about a single predictor, but rather about relationships/leaves. That is, there are two possibilities: the predictor is garbage, or it's just not being used correctly, i.e. the leaves are garbage.

 
Aleksey Vyazmikin:

I wanted to clarify: on which sample do you evaluate the result of the permutation method, the one the model was trained on, or the test sample?

I understand noise as something that stops working at all on a sample outside of training. But I think it's not about a single predictor, but rather about relationships/leaves. That is, there are two possibilities: the predictor is garbage, or it's just not being used correctly, i.e. the leaves are garbage.

On the training one, since the trees are undertrained. For overtrained trees it should be the test sample, because the tree would have memorized the noise.
For undertrained ones I think it doesn't matter.
But sample size is important: the larger it is, the more representative it is. And my training sample is three times larger.
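If one wants to check whether the choice of sample actually matters for a given model, a possible sketch is to compute the same permutation importance on the training part and on a held-out part and compare the rankings (synthetic data and scikit-learn's built-in permutation importance; everything here is illustrative, not the poster's setup):

```python
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=65, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

imp_train = permutation_importance(model, X_train, y_train, n_repeats=10, random_state=0)
imp_test = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# If the forest has memorized noise, the two rankings diverge; if it is not
# overtrained, they should agree reasonably well.
rho, _ = spearmanr(imp_train.importances_mean, imp_test.importances_mean)
print(f"Spearman rank correlation, train vs test importance: {rho:.2f}")
```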

---------

According to the tutorial at https://www.mql5.com/ru/blogs/post/723619 ("Do trees and forests need class balancing?"), a large representative sample makes balancing across classes unnecessary, which at the same time reduces temporal randomness. I carried this over to undertrained trees.
But I may be wrong, and I need to check the importance of the predictors on the test sample.
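A hedged way to check the balancing claim on one's own data is to train the same forest with and without class weighting and compare per-class recall. The sketch below is only an illustration: scikit-learn's class_weight option stands in for whatever balancing scheme the tutorial actually discusses, and the imbalanced data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data; substitute the real sample.
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

for cw in (None, "balanced"):
    clf = RandomForestClassifier(n_estimators=100, class_weight=cw, random_state=0)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(f"class_weight={cw}: "
          f"recall class 0 = {recall_score(y_te, pred, pos_label=0):.3f}, "
          f"recall class 1 = {recall_score(y_te, pred, pos_label=1):.3f}")
```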
