Machine learning in trading: theory, models, practice and algo-trading - page 295

 

The question of the correlation of variables has been discussed many times.

Of course, correlation is the most dubious measure; you very quickly start seeing correlations with the rings of Saturn, coffee grounds...

For some reason, this has been forgotten:

The Granger causality test is a procedure for testing causality ("Granger causality") between time series. The idea of the test is that the values (changes) of a time series x_t which is the cause of changes in a time series y_t should precede the changes of that series, and moreover should make a significant contribution to the forecast of its values. If each variable contributes significantly to the prediction of the other, then perhaps there is some other variable that affects both.

The Granger test sequentially tests two null hypotheses: "x is not the Granger cause of y" and "y is not the Granger cause of x". To test these hypotheses two regressions are constructed: in each regression the dependent variable is one of the variables tested for causality, and the regressors are the lags of both variables (in effect, it is a vector autoregression).

And here is the code for this case from here

# READ QUARTERLY DATA FROM CSV
library(zoo)
ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", 
FUN = as.yearqtr)
 
# CONVERT THE DATA TO STATIONARY TIME SERIES
ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi))
ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp))
ts2 <- ts1[1:(nrow(ts1) - 1), c(3, 4)]
 
# METHOD 1: LMTEST PACKAGE
library(lmtest)
grangertest(unemp_rate ~ hpi_rate, order = 1, data = ts2)
# Granger causality test
#
# Model 1: unemp_rate ~ Lags(unemp_rate, 1:1) + Lags(hpi_rate, 1:1)
# Model 2: unemp_rate ~ Lags(unemp_rate, 1:1)
#   Res.Df Df      F  Pr(>F)
# 1     55
# 2     56 -1 4.5419 0.03756 *
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
# METHOD 2: VARS PACKAGE
library(vars)
var <- VAR(ts2, p = 1, type = "const")
causality(var, cause = "hpi_rate")$Granger
#         Granger causality H0: hpi_rate do not Granger-cause unemp_rate
#
# data:  VAR object var
# F-Test = 4.5419, df1 = 1, df2 = 110, p-value = 0.0353
 
# AUTOMATICALLY SEARCH FOR THE MOST SIGNIFICANT RESULT
for (i in 1:4)
  {
  cat("LAG =", i, "\n")
  print(causality(VAR(ts2, p = i, type = "const"), cause = "hpi_rate")$Granger)
  }
 
SanSanych Fomenko:

Of course, correlation is the most dubious measure; you very quickly start seeing correlations with the rings of Saturn, coffee grounds...

For some reason, this has been forgotten:

Nobody forgot anything....

Your post will be relevant when Google Correlate asks "and by what method, esteemed user, do you want to measure the relationship?" But for now Google is not asking and will not ask; the service is six years old, and if they had wanted to do that, they would have done it already.

And another thing ...

Google has billions of time series in its database, so there will always be a hundred other series that were close purely by chance, simply because the database is huge, and it does not matter how proximity is measured, with a simple correlation or with something arcane and complicated.

The question is how to sift out the random from the non-random.

 
mytarmailS:

The question is how to sift out the random from the non-random

We can

1) divide the euro series into two parts, "1" and "2"

2) throw series "1" into Google and let it find all of the close series

3) memorize the names of all the close series

4) throw series "2" into Google and let it find all of the close series

5) memorize the names of all the close series

6) compare the names of the series from points 3) and 5) and look for a series that is present both in 3) and in 5)

This way we find series that do not correlate with the euro by accident; it is something like cross-validation in its most primitive form (a rough sketch of the idea is below).

But how to get these names I don't know, you probably need to parse the page
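
A minimal sketch of this intersection step in R; the series names below are made up purely for illustration, since the real names would have to be scraped from the Google Correlate results page:

# HYPOTHETICAL SKETCH: INTERSECT THE NAMES RETURNED FOR THE TWO HALVES
# (the names are made up for illustration)
names_half1 <- c("series_A", "series_B", "series_C")
names_half2 <- c("series_B", "series_C", "series_D")
 
# series close to BOTH halves are less likely to be random matches
stable_names <- intersect(names_half1, names_half2)
print(stable_names)  # "series_B" "series_C"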

 
mytarmailS:

Nobody forgot anything....

Your post will be relevant when Google Correlate asks "and by what method, esteemed user, do you want to measure the relationship?" But for now Google is not asking and will not ask; the service is six years old, and if they had wanted to do that, they would have done it already.

And another thing ...

Google has billions of time series in its database, so there will always be a hundred other series that were close purely by chance, simply because the database is huge, and it does not matter how proximity is measured, with a simple correlation or with something arcane and complicated.

The question is how to sift out the random from the non-random


So sift out, with a test, the garbage that Google has collected; that's what I meant (a rough sketch below).
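
A minimal sketch of that sifting with the Granger test from lmtest; the data are simulated purely for illustration (in practice the columns of cand would be the series pulled from Google and eur would be the euro increments):

# HYPOTHETICAL SKETCH: SIFT GOOGLE CORRELATE CANDIDATES WITH A GRANGER TEST
library(lmtest)

set.seed(1)
n   <- 200
b   <- rnorm(n)                       # a candidate that really leads the euro
a   <- rnorm(n)                       # a purely random candidate
eur <- 0.5 * c(0, b[-n]) + rnorm(n)   # euro increments driven by yesterday's b
cand <- data.frame(a = a, b = b)

# keep only the candidates whose lags significantly improve the forecast of eur
p_values <- sapply(cand, function(x) {
  df <- data.frame(eur = eur, x = x)
  grangertest(eur ~ x, order = 1, data = df)$`Pr(>F)`[2]
})
print(p_values)
print(names(p_values)[p_values < 0.05])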
 
SanSanych Fomenko:
The difference between the two types of correlations is that in this situation the correlations are different.

I'm not going to do it, I'm not going to do it alone :)

It seems to me that no matter how sophisticated a test you apply, they will all show a strong connection.

 
mytarmailS:

No matter how sophisticated a test you apply, they will all show a strong connection.

I have my doubts about that. A high correlation of trends only means that they generally grow and fall in roughly the same way. To start with, it would be good to look at the correlation not of the trends but of the increments: for example, you could save to csv the similar trends that Google shows, then find the lags yourself and recalculate the correlation; that would already be much more objective (a rough sketch below).

And correlation doesn't guarantee at all that one variable can predict the other. In general, selecting predictors by the principle of high correlation is an unfortunate approach. I haven't tried what SanSanych suggested before, but it looks more reliable.
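
A minimal sketch of that check: correlate the increments (log returns) rather than the raw trends and search for the lag with the strongest relationship. The two price series below are simulated purely for illustration; in practice they would be loaded from the csv saved from Google Correlate.

# HYPOTHETICAL SKETCH: CORRELATE INCREMENTS INSTEAD OF TRENDS, WITH A LAG SEARCH
set.seed(1)
eur   <- cumsum(rnorm(200, sd = 0.01)) + 100
other <- c(rep(NA, 3), eur[1:197]) + rnorm(200, sd = 0.002)   # 'other' lags 'eur' by 3 bars
ok    <- complete.cases(eur, other)

eur_ret   <- diff(log(eur[ok]))
other_ret <- diff(log(other[ok]))

# cross-correlation of the increments over lags -10..10
cc   <- ccf(eur_ret, other_ret, lag.max = 10, plot = FALSE)
best <- which.max(abs(cc$acf))
cat("best lag:", cc$lag[best], "correlation:", round(cc$acf[best], 3), "\n")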

 
mytarmailS:

We can

1) divide the euro series into two parts, "1" and "2"

2) throw series "1" into Google and let it find all of the close series

3) memorize the names of all the close series

4) throw series "2" into Google and let it find all of the close series

5) memorize the names of all the close series

6) compare the names of the series from points 3) and 5) and look for a series that is present both in 3) and in 5)

This way we find series that do not correlate with the euro by accident; it is something like cross-validation in its most primitive form.

But how to get these names I don't know, you probably need to parse the page


It's called the Chow test.

In essence, it checks for sample heterogeneity in the context of a regression model.
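
A minimal sketch of a Chow test in R using the strucchange package; the data and the break point are simulated purely for illustration:

# HYPOTHETICAL SKETCH: CHOW TEST WITH THE STRUCCHANGE PACKAGE
library(strucchange)

set.seed(1)
n  <- 60
x  <- rnorm(n)
y  <- c(1 + 2 * x[1:30], 3 - 1 * x[31:60]) + rnorm(n, sd = 0.5)  # coefficients change after obs 30
df <- data.frame(x = x, y = y)

# H0: the regression coefficients are the same before and after the break point
sctest(y ~ x, data = df, type = "Chow", point = 30)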

 
Dr.Trader:

I have my doubts about that. A high correlation of trends only means that they generally grow and fall in roughly the same way. To start with, it would be good to look at the correlation not of the trends but of the increments: for example, you could save to csv the similar trends that Google shows, then find the lags yourself and recalculate the correlation; that would already be much more objective.

Yes, I agree, but Google doesn't show us its whole database, only what correlates "by trends"; to take what correlates by trends, make increments from it and measure the correlation is probably not objective either... :) you would have to look at the whole database

Dimitri:


It's called the Chow test.

In essence, it checks for sample heterogeneity in the context of a regression model.

There we go. This Chow test can be applied, but we need to get the series out of Google, or at least their names
 

I read this pamphlet http://www.mirkin.ru/_docs/dissert065.pdf and wanted to use NeuroShell Day Trader Pro

 
Getting Started With TensorFlow | TensorFlow (www.tensorflow.org)