Machine learning in trading: theory, models, practice and algo-trading - page 189

 
SanSanych Fomenko:


If you take almost any book on data mining, it will invariably describe procedures for removing correlated predictors.

Interacting predictors are not necessarily correlated... They interact with respect to the target....

And the presence of an interaction yields results like this:

> summary(lm(data = train_sample_list[[1]], price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16))


Call:

lm(formula = price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16, 

    data = train_sample_list[[1]])


Residuals:

      Min        1Q    Median        3Q       Max 

-0.035970 -0.000824  0.000001  0.000847  0.027278 


Coefficients:

                                          Estimate Std. Error t value Pr(>|t|)    

(Intercept)                              3.883e-05  3.146e-05   1.234  0.21714    

price_diff_lag_11                        4.828e-02  9.092e-03   5.310 1.12e-07 ***

price_diff_min_lag_16                   -3.055e-02  1.141e-02  -2.678  0.00743 ** 

price_diff_lag_11:price_diff_min_lag_16 -3.520e+00  3.515e-01 -10.014  < 2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Residual standard error: 0.0024 on 10465 degrees of freedom

Multiple R-squared:  0.01611, Adjusted R-squared:  0.01583 

F-statistic: 57.11 on 3 and 10465 DF,  p-value: < 2.2e-16

All predictors (and their interaction) are significant. The F-statistic is impressive...
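
That interacting predictors need not be correlated with each other is easy to check directly. A minimal R sketch, assuming the same data frame train_sample_list[[1]] and the column names from the output above (the comparison via anova() is my own illustration):

# The two predictors can be almost uncorrelated with each other...
df <- train_sample_list[[1]]
cor(df$price_diff_lag_11, df$price_diff_min_lag_16)

# ...while their interaction still adds explanatory power for the target:
fit_main <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 + price_diff_min_lag_16, data = df)
fit_int  <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16, data = df)
anova(fit_main, fit_int)   # F-test: does adding the interaction term improve the fit?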

 
Alexey Burnakov:

Interacting predictors are not necessarily correlated... They interact with respect to the target....

All predictors (and their interaction) are significant. The F-statistic is impressive...

Well, maybe I just had a different idea about the meaning of the word "interaction".
 

Vizard_:

I won't give you the data...

There's no data, so there's nothing to discuss.

Rest... you!

 
Vizard_:
Have you already forgotten who told you back in '09 that your network was not configured properly and that the formula could be pulled out with PolyAnalyst))))

There are no pretensions or secrets here; I use standard DM tools, sometimes with minor tweaks. If all you want to hear is the praise, fine, but the reality is a bit different... The previous experiment was on real data. This one I did on simple artificial data. The first set was recognized absolutely correctly. Then I added ......... The answer should have been 100%, but jPrediction 11 was so poor that it spat out far too much "ii"))) In short, keep fine-tuning it;

the device doesn't work yet. I won't post the data; you're showing off, so figure it out yourself... Maybe I'll take another look at version 20, if the "advertising" is as good as today's)))


Are you aware that repeated optimizations always produce different results? That's the nature of optimization; if it always arrived at the same answer that would certainly be nice, but it would also be strange. Try optimizing a few times; I'm sure that 8 times out of 10 you'll get your 100%. So that's how it is....
 
SanSanych Fomenko:
Well, maybe I just had a different idea about the meaning of the word "interaction."


There are clear rules for interpreting interactions in linear models. They are a bit more involved than interpreting a plain linear combination: https://www.r-bloggers.com/interpreting-interaction-coefficient-in-r-part1-lm/

But you have to dig through a lot of combinations to find meaningful interactions. That is the real snag.
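
The digging can at least be automated. A rough sketch of a brute-force screen over all predictor pairs, assuming the same data frame and target as in the lm() output above and that all columns are numeric (this loop is my own illustration, not taken from the linked article):

# Screen every pairwise interaction for one target and rank by the interaction's p-value.
df     <- train_sample_list[[1]]
target <- "price_future_lag_diff_6"
preds  <- setdiff(names(df), target)
pairs  <- combn(preds, 2, simplify = FALSE)

p_int <- sapply(pairs, function(p) {
  f  <- reformulate(paste(p[1], "*", p[2]), response = target)
  cf <- summary(lm(f, data = df))$coefficients
  cf[nrow(cf), "Pr(>|t|)"]        # the interaction term is the last coefficient row
})

ranking <- data.frame(pair = sapply(pairs, paste, collapse = ":"), p_value = p_int)
head(ranking[order(ranking$p_value), ])   # "promising" pairs still need out-of-sample checks

With k predictors there are k*(k-1)/2 pairs, so the screen gets expensive quickly, and the p-values need a multiple-testing correction before being taken seriously.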

 
Mihail Marchukajtes:
Are you aware that repeated optimizations always produce different results? That's the nature of optimization; if it always arrived at the same answer that would certainly be nice, but it would also be strange. Try optimizing a few times; I'm sure that 8 times out of 10 you'll get your 100%. So that's how it is....

He doesn't even realize that before training the full sample is split into parts at random: some patterns end up in the training part, others in the test part. With such a split it may well happen that some of the patterns needed to pin down the regularities are crowded into the test part and are not represented in the training part. And since the algorithm is trained only on the training part and has no telepathic powers to know what is in the test part, errors appear when the generalization ability is computed. In other words, nothing surprising is happening.

But when the patterns that are supposed to pin down the regularities happen to be evenly distributed across the different parts of the sample, the generalization ability is higher than in the case described above.

In other words, it is a matter of luck, and any randomness can sooner or later show its undesirable side.

Could a method perhaps be found that divides the full sample into parts deterministically rather than randomly? So far, though, experience shows that any determinism in splitting a sample is fraught with curve fitting and subsequent overfitting.

 
Yury Reshetov:

Could a method perhaps be found that divides the full sample into parts deterministically rather than randomly? So far, though, experience shows that any determinism in splitting a sample is fraught with curve fitting and subsequent overfitting.

Maybe the training should be run several times, dividing the sample randomly each time? Then, from the resulting set of trained models, one can both pick a model and assess overall how good the model is.
That way the chance of ending up with a randomly fitted model can be driven toward zero, without becoming hostage to determinism.
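
A rough R sketch of that idea: repeated random splits, one model per split, and the spread of out-of-sample errors as the quality estimate. The data frame, the target, and the use of a plain lm() are my assumptions for illustration, not anyone's actual training code:

# Train on many random 70/30 splits and judge the model by the spread of test errors.
set.seed(1)
df     <- train_sample_list[[1]]
n_runs <- 40

test_rmse <- replicate(n_runs, {
  idx  <- sample(nrow(df), size = floor(0.7 * nrow(df)))
  fit  <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16,
             data = df[idx, ])
  pred <- predict(fit, newdata = df[-idx, ])
  sqrt(mean((df$price_future_lag_diff_6[-idx] - pred)^2))
})

summary(test_rmse)   # a wide spread means the "good" runs were probably just lucky splits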

 
Andrey Dik:
Maybe the training should be run several times, dividing the sample randomly each time? Then, from the resulting set of trained models, one can both pick a model and assess overall how good the model is.
That way the chance of ending up with a randomly fitted model can be driven toward zero, without becoming hostage to determinism.

This is already implemented in jPrediction: several different sample partitions are computed in parallel on different CPU cores (two binary classifiers, i.e. one ternary classifier, per free core), so the processor ends up 100% loaded. The problem is that the number of cores in a CPU is limited, so the probability of an uneven pattern distribution can only be reduced; minimizing it is very problematic. Unless you train models on supercomputers rather than on ordinary CPUs.

For example, if you compute models on the Chinese supercomputer Tianhe-2, which has 3,120,000 cores, the probability of an uneven distribution of patterns across the parts of the sample will be negligible. But if you compute models on a 4-core desktop (while also reserving a couple of cores for other tasks), it is not surprising that sooner or later you will run into unevenness.
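
In R the same scheme can be sketched with the parallel package: one independent random split per worker, with a couple of cores held back for other tasks. This is only my illustration of the idea, not how jPrediction itself is implemented:

library(parallel)

n_cores <- max(1, detectCores() - 2)    # reserve a couple of cores for other tasks
cl <- makeCluster(n_cores)
clusterExport(cl, "train_sample_list")  # assumes the list exists in the global environment

split_rmse <- parLapply(cl, seq_len(40), function(i) {
  df  <- train_sample_list[[1]]
  idx <- sample(nrow(df), floor(0.7 * nrow(df)))            # this worker's random partition
  fit <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16,
            data = df[idx, ])
  pred <- predict(fit, newdata = df[-idx, ])
  sqrt(mean((df$price_future_lag_diff_6[-idx] - pred)^2))   # out-of-sample error for this split
})
stopCluster(cl)

summary(unlist(split_rmse))   # more partitions give a more reliable picture of the spread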

 
I remind you that no one has ever asked me why I do it, but I construct the output variable so that the number of ones and zeros is equal. I do this by adjusting the signals' profit threshold, from -10 pips to +50. When the number of ones and zeros is equal, the model rarely ends up split in two. And once again I remind you: it does not matter how you split, what matters is that the split is stable.....
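
A small sketch of that balancing step. The data frame df and the signal_profit column (each signal's profit in pips) are hypothetical, since the actual construction of the output variable isn't shown here:

# Pick the profit threshold (scanned from -10 to +50 pips) that makes ones and zeros most equal.
thresholds <- seq(-10, 50, by = 1)

imbalance <- sapply(thresholds, function(th) {
  y <- as.integer(df$signal_profit > th)   # 1 = signal taken, 0 = signal skipped
  abs(mean(y) - 0.5)                       # distance from a 50/50 class split
})

best_th   <- thresholds[which.min(imbalance)]
df$target <- as.integer(df$signal_profit > best_th)
table(df$target)                           # should now be close to balanced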
 
Yury Reshetov:

This is already implemented in jPrediction: several different sample partitions are computed in parallel on different CPU cores (two binary classifiers, i.e. one ternary classifier, per free core), so the processor ends up 100% loaded. The problem is that the number of cores in a CPU is limited, so the probability of an uneven pattern distribution can only be reduced; minimizing it is very problematic. Unless you train models on supercomputers rather than on ordinary CPUs.

For example, if you compute models on the Chinese supercomputer Tianhe-2, which has 3,120,000 cores, the probability of an uneven distribution of patterns across the parts of the sample will be negligible. But if you compute models on a 4-core desktop (while also reserving a couple of cores for other tasks), it is not surprising that sooner or later you will run into unevenness.

In other words, it is useful to do this. So instead of 4 partitions, which is obviously not enough, you should do 40. On 4 cores that would take 10 times longer to compute, but I suppose time can be sacrificed in favor of robustness.

"If it can be done and if it will do any good, it should be done." (c) Papo Carlo Albertovich.
