Machine learning in trading: theory, models, practice and algo-trading - page 189

If you open almost any book on data mining, you will invariably find procedures for removing correlated predictors.
Interacting predictors are not necessarily correlated with each other... they interact with the target....
And the presence of an interaction gives results of the following form:
> summary(lm(data = train_sample_list[[1]], price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16))

Call:
lm(formula = price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16,
    data = train_sample_list[[1]])

Residuals:
      Min        1Q    Median        3Q       Max
-0.035970 -0.000824  0.000001  0.000847  0.027278

Coefficients:
                                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                              3.883e-05  3.146e-05   1.234  0.21714
price_diff_lag_11                        4.828e-02  9.092e-03   5.310 1.12e-07 ***
price_diff_min_lag_16                   -3.055e-02  1.141e-02  -2.678  0.00743 **
price_diff_lag_11:price_diff_min_lag_16 -3.520e+00  3.515e-01 -10.014  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0024 on 10465 degrees of freedom
Multiple R-squared:  0.01611,  Adjusted R-squared:  0.01583
F-statistic: 57.11 on 3 and 10465 DF,  p-value: < 2.2e-16
All the predictors are significant (and so is their interaction). The F-statistic looks great...
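For illustration only, here is a minimal sketch of how one could check whether the interaction term actually adds anything beyond the main effects, using a nested-model F-test. The data frame and column names simply mirror the post above; the data itself is not available, so this is an assumed layout, not the original code.

# Minimal sketch, assuming a data frame with the same columns as in the post;
# the actual data is not available, so this only illustrates the check.
df <- train_sample_list[[1]]

# Main effects only
fit_main <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 + price_diff_min_lag_16, data = df)

# Main effects plus their interaction (the model shown above)
fit_int <- lm(price_future_lag_diff_6 ~ price_diff_lag_11 * price_diff_min_lag_16, data = df)

# Nested-model F-test: does the interaction term improve the fit?
anova(fit_main, fit_int)

If the anova() comparison is significant, the interaction term carries information that the two main effects alone do not.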
Vizard_:
I won't give you the data...
There's no data, so there's nothing to discuss.
Give it a rest...
You've just forgotten who told you back in '09 that your network was not configured properly, and that with PolyAnalyst you could pull out the formula))))
There are no pretensions or secrets here; I use standard DM tools, sometimes with minor tweaks. If all you are interested in is credentials, then listen:
the reality is a bit different... The previous experiment was on real data; this one I ran on simple artificial data. The first set was recognized absolutely correctly. I added ......... The answer should have been 100%, but jPrediction 11 was so shoddy that it gave out far too much nonsense))) In short, it needs fine-tuning;
the device doesn't work yet. I won't post the data; you're the one showing off, so figure it out yourself... Maybe I'll take another look at version 20, if the "advertising" is like today's)))
Well, maybe I just had a different idea about the meaning of the word "interaction."
There are clear rules for handling interactions in linear models, and they are a bit more involved than interpreting a plain linear combination of terms: https://www.r-bloggers.com/interpreting-interaction-coefficient-in-r-part1-lm/
But you have to dig through a great many combinations to find meaningful interactions, and that is the real snag.
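As a rough sketch of that digging, one could loop over predictor pairs, fit a small lm with the interaction term, and rank pairs by the p-value of the interaction coefficient. The data frame df and its "target" column name are assumptions made for illustration, and with this many tests a multiple-testing correction is needed.

# Sketch: screen all pairwise interactions by the p-value of the interaction term.
# A data frame 'df' with a numeric 'target' column plus numeric predictors is assumed.
predictors <- setdiff(names(df), "target")
pairs <- t(combn(predictors, 2))                 # all predictor pairs, one per row

results <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  a <- pairs[i, 1]; b <- pairs[i, 2]
  fit <- lm(as.formula(paste("target ~", a, "*", b)), data = df)
  cf  <- summary(fit)$coefficients
  data.frame(a = a, b = b, p_value = cf[nrow(cf), "Pr(>|t|)"])  # interaction row is last
}))

results$p_adj <- p.adjust(results$p_value, method = "BH")  # correct for the many tests
head(results[order(results$p_adj), ], 10)                  # ten most promising interactions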
Are you aware that repeated optimizations always produce different results? That is the nature of optimization: if it always converged to the same answer that would be nice, but it would also be strange. Try optimizing several times; I'm sure that 8 times out of 10 you will get your 100%. So that's how it is....
He doesn't even realize that before training the total sample is divided into parts at random: some patterns end up in the training part, others in the test part. With such a split it may well happen that some of the patterns needed to pin down the regularities are crowded into the test part and are not represented in the training part. Since the algorithm is trained only on the training part and has no telepathic powers to know what sits in the test part, it will make errors when its generalization ability is measured. In other words, nothing surprising is happening.
But when the patterns needed to pin down the regularities happen to be evenly distributed across the different parts of the sample, the learning quality is higher than in the case described above.
That is, one split is not like another, and any randomness can sooner or later show its undesirable side.
Perhaps a method could be found by which the whole sample is divided into parts not randomly but deterministically? So far, though, experience shows that any determinism in splitting the sample is fraught with curve-fitting and subsequent overtraining.
Maybe training should be done several times, splitting the sample randomly each time? Then you can choose from this set of trained models, and in general you can assess how good the model is.
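A minimal sketch of that idea (repeated random hold-out, sometimes called Monte Carlo cross-validation): the data frame df with a numeric 0/1 "target" column and the use of glm() are placeholder assumptions for illustration, not jPrediction itself.

# Sketch: train on many random splits and look at the spread of out-of-sample accuracy.
# 'df' with a numeric 0/1 column 'target' and glm() are assumed placeholders.
set.seed(42)
n_splits <- 40
acc <- replicate(n_splits, {
  idx   <- sample(nrow(df), size = floor(0.7 * nrow(df)))   # random 70/30 split
  train <- df[idx, ]
  test  <- df[-idx, ]
  fit   <- glm(target ~ ., data = train, family = binomial)
  pred  <- ifelse(predict(fit, newdata = test, type = "response") > 0.5, 1, 0)
  mean(pred == test$target)                                 # test-part accuracy
})
summary(acc)   # the spread shows how much the random split alone changes the result
sd(acc)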
This is already implemented in jPrediction: several different sample partitions are computed in parallel on different CPU cores (two binary classifiers, i.e. one ternary classifier, per free core), so the processor ends up 100% loaded. The problem is that the number of cores in a CPU is limited, so the probability of an uneven pattern distribution can only be reduced; actually minimizing it is very problematic, unless the models are trained on supercomputers rather than on an ordinary PC.
For example, the Chinese supercomputer Tianhe-2 has 3,120,000 cores; computed there, the probability of an uneven distribution of patterns across the parts of the sample would be negligible. But if models are computed on a 4-core desktop (with a couple of cores also reserved for other tasks), it is not surprising that sooner or later we run into unevenness.
In other words, it is worth doing. Instead of 4 partitions, which is clearly not enough, you should do 40. On 4 cores that would take 10 times longer to compute, but I suppose time can be sacrificed in favor of robustness.
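A sketch of spreading those 40 partitions across the available cores with base R's parallel package; df, the 0/1 "target" column and glm() are the same placeholder assumptions as above. Note that mclapply does not fork on Windows, where parLapply would be needed instead.

library(parallel)

# Evaluate one random 70/30 partition; same placeholder data layout as above.
one_partition <- function(seed, df) {
  set.seed(seed)
  idx  <- sample(nrow(df), size = floor(0.7 * nrow(df)))
  fit  <- glm(target ~ ., data = df[idx, ], family = binomial)
  pred <- ifelse(predict(fit, newdata = df[-idx, ], type = "response") > 0.5, 1, 0)
  mean(pred == df$target[-idx])
}

# Run 40 partitions spread over all available cores instead of just 4.
acc <- unlist(mclapply(1:40, one_partition, df = df, mc.cores = detectCores()))
mean(acc); sd(acc)   # robustness estimate across partitions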
"If it can be done and if it will do any good, it should be done." (c) Papo Carlo Albertovich.