Zero sample correlation does not necessarily mean there is no linear relationship

 


I liked this comment from the link posted by hrenfx (22.03.2011 00:43):

There is a correlation! :)
It does not mean that one is a consequence of the other,
but the phenomena are related
(and then one may start inventing explanations).
But that's not the point.
The point is that, over some horizon, it is possible to predict one from the other, up to a point. :)
Of course, a clear understanding of the mechanism behind the connection would let you predict when the connection will end.
But...
But just by continuously monitoring the correlation, it is also possible to predict when it will end. :)

 
Just like Grove: there is a correlation, so it simply can't not be there :)
 
Neutron:

I agree in part, but by no means with everything. If you want a substantive discussion of the subject you have raised, you will first have to read a few of my posts that set out my view on it. I've had to repeat myself a lot, so I won't do that any more. I've just sent you links to two of my posts via PM.

 
hrenfx

Good afternoon, I've been following your threads and I find your logic interesting.

I have a question: have you tried porting the correlation indicator from recycle2 to MT5?

 

In my research I needed to properly assess the relationship between series, so I decided to use the correlation coefficient. The conclusions are disappointing: the methods that classical statistics suggests are practically useless for finding non-obvious relationships between series. As an example, take a weekly chart of gold futures and their open interest:

There is obviously a positive correlation. It is neither very strong nor very clear-cut, but when the gold price rises, the open interest of gold futures is higher, and when it falls, the open interest is lower.

Later we will compute the correlation coefficients between the price of gold and its OI. But first, let's look at the most common one, the Pearson correlation formula:
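The formula image from the original post is not available here. What follows is the standard textbook definition of the sample Pearson coefficient, assumed to be the one it showed, where x̄ and ȳ are the sample means of the two series of length n:

```latex
r_{xy} = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}
              {\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\;\sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}}
```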

If you look closely, it becomes clear that the formula centres the data (subtracting the sample mean, x − x̄), equalizes the volatilities by dividing by the standard deviation over the entire sample, and then measures how consistently both series deviate from their means in the same direction. Clearly, the calculation should be done on first differences, i.e. on I(0) series, because with I(1) series we are in for a trap: the series we deal with are always positive (a price is always greater than zero). More on that later.
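A minimal R sketch of the steps just described, assuming synthetic level series (the names gold and oi and all parameters are hypothetical, not the poster's data): the levels are differenced to I(0), centred, scaled by their standard deviations, and the average product of the standardised deviations is checked against base R's cor().

```r
# Hypothetical synthetic I(1) levels: a "price" around 1000 and an "open interest" around 5e5
set.seed(42)
n    <- 300
gold <- 1000 + cumsum(rnorm(n, mean = 0.5, sd = 10))
oi   <- 5e5  + cumsum(rnorm(n, mean = 100, sd = 2000))

# First differences turn the I(1) levels into I(0) series
dx <- diff(gold)
dy <- diff(oi)

# Pearson's r from its definition
pearson_manual <- function(x, y) {
  zx <- (x - mean(x)) / sd(x)    # centre on the sample mean, scale by the sample sd
  zy <- (y - mean(y)) / sd(y)
  sum(zx * zy) / (length(x) - 1) # average co-movement of the standardised deviations
}

r_manual  <- pearson_manual(dx, dy)
r_builtin <- cor(dx, dy, method = "pearson")
print(all.equal(r_manual, r_builtin))  # TRUE: same coefficient as base R's cor()
```

Dividing by n − 1 matches the n − 1 used inside sd(), which is why the hand-rolled value agrees with cor() to floating-point tolerance.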

Pearson correlation: 0.02234314

Kendall correlation: 0.002866038

Spearman correlation: 0.002046104
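A sketch of how the three coefficients can be obtained with base R's cor(). It uses hypothetical synthetic series rather than the original gold/OI data, so the numbers will differ from those reported:

```r
# Hypothetical stand-ins for the gold price and its open interest (not the real data)
set.seed(7)
n     <- 500
gold  <- 1000 + cumsum(rnorm(n, mean = 0.5, sd = 10))
oi    <- 5e5  + cumsum(rnorm(n, mean = 100, sd = 2000))
dgold <- diff(gold)   # work on I(0) first differences
doi   <- diff(oi)

# One call per method; sapply keeps the method names as vector names
coefs <- sapply(c("pearson", "kendall", "spearman"),
                function(m) cor(dgold, doi, method = m))
print(round(coefs, 4))
```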

That is, no correlation was found in any of the cases. But what about our keen eyes? Are we imagining it all? Is the correlation between gold and open interest really the same as that between banana imports from Morocco and the country's birth rate?

Perhaps the reason is that one indicator lags the other, so the series are simply misaligned. What if OI goes up first and only then gold follows? Then there could be money to be made on that :) Let's test the idea with the cross-correlation function:

Somewhat unconvincing. A couple of values stand out from the rest, but overall the picture suggests there is no relationship, and therefore the lag plays no role.
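The lag idea can be checked with base R's ccf(). Below is a minimal sketch on synthetic data where, by construction, hypothetical OI changes lead hypothetical gold changes by three steps. In R's convention the lag-k value of ccf(x, y) estimates cor(x[t+k], y[t]), so a leading x shows up at a negative lag:

```r
set.seed(1)
n <- 500
oi_chg <- rnorm(n)  # hypothetical OI changes, I(0)
# Hypothetical gold changes that repeat OI changes 3 steps later, plus noise
gold_chg <- c(rep(0, 3), oi_chg[1:(n - 3)]) + rnorm(n, sd = 0.5)

cc <- ccf(oi_chg, gold_chg, lag.max = 10, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]  # lag with the strongest cross-correlation
print(best_lag)
```

With the noise level chosen here, the peak at the built-in lag dominates all others; on real data the peak would be far less pronounced, which matches the unconvincing picture described above.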

OK. Then let's try computing the correlation on the I(1) series. Who says it must never be done? Suppose the result is overestimated; an overestimate is better than no result at all. To check this, an experiment was run: generate 100 time series and compute their correlation matrix. The average value will show how much the estimate is inflated, and whether we need to account for this when working with I(1) series.

Here's an R script that does it all:

#
# corexp - experiment revealing how correlation functions behave on I(1) series
# exp - number of experiments (generated series)
# lenght - length of each series
# cortype - correlation type (pearson - Pearson's coefficient, kendall - Kendall's coefficient, spearman - Spearman's coefficient)
# retrange - TRUE if I(1) series are to be generated (FALSE keeps the I(0) increments)
#
corexp <- function(exp = 10, lenght = 1000, cortype = 'pearson', retrange = TRUE)
{
   bp <- matrix(ncol = exp, nrow = lenght)
   for(i in 1:exp)
   {
      # I(0) increments with a small positive drift (mean > 0), like price returns
      bp[,i] <- rnorm(lenght, mean = 0.000117, sd = 0.0048)
      # Integrate into an I(1) random walk when retrange is TRUE
      if(retrange)
         bp[,i] <- cumsum(bp[,i])
   }
   # Correlation matrix of all pairs of columns, computed in one call
   mcor <- cor(bp, method = cortype)
   return(mcor)
}

# Correlation statistics
# Compute whatever you like here; by default, the mean of the off-diagonal correlations
corstat <- function(m)
{
   diag(m) <- NaN   # exclude the trivial self-correlations (always 1)
   mean(m, na.rm = TRUE)
}

# Example: 100 series, as in the experiment
# corstat(corexp(exp = 100))

Let's look at this mean: 0.153359. It seems acceptable: the estimate is inflated by only about 0.15. But there is another trap. Let's look at the distribution of the values in the correlation matrix:

In this case the mean is essentially meaningless: the values are spread almost uniformly, so any correlation value occurs about as often as any other. This is caused by the positive drift of our series, set by the mean = 0.000117 argument of rnorm() in the script. After all, all the prices we deal with are positive, i.e. they lie in the positive zone.
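The contrast between the two regimes can be sketched directly: pairwise correlations of I(0) increments versus those of their I(1) cumulative sums. The drift and volatility mirror corexp(); the choice of 50 series of length 1000 is an assumption made to keep the run short:

```r
set.seed(123)
n_series <- 50
n_obs    <- 1000
# I(0) increments with the same drift and volatility as in corexp()
incr <- matrix(rnorm(n_obs * n_series, mean = 0.000117, sd = 0.0048), ncol = n_series)
levels_i1 <- apply(incr, 2, cumsum)  # each column becomes a random walk with drift

offdiag <- function(m) m[upper.tri(m)]  # unique pairs only, no self-correlations
r_i0 <- offdiag(cor(incr))
r_i1 <- offdiag(cor(levels_i1))

print(summary(r_i0))  # clustered tightly around zero
print(summary(r_i1))  # typically spread over a wide part of [-1, 1]
# hist(r_i1, breaks = 40)  # uncomment to plot the distribution
```

For independent I(0) series the spread of r shrinks roughly like 1/sqrt(n_obs), whereas for I(1) levels it stays wide no matter how long the series are.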

1. As you can see, I(1) series cannot be used at all. For series whose relationship is not obvious and not rigidly functional, correlation coefficients are absolutely useless.

2. The choice of a particular correlation coefficient does not fundamentally change anything. None of the three common coefficients was able to reveal the relationship between gold and its open interest, even though it is obvious that such a relationship exists.

 
C-4:

Pearson correlation: 0.02234314

Kendall correlation: 0.002866038

Spearman correlation: 0.002046104

Can we have a look at the original series? Are they available in Excel, for example?
 
The original series were not saved. Here is one of the generated sets in CSV format.
Files:
bp.txt  2010 kb
 
C-4:
The original series were not saved. Here is one of the generated sets in CSV format.
What is your source for the open interest data?
 
Here is the OI data aligned with the price of gold.
Files:
gold_oi_2.txt  19 kb
 
Correlation coefficient = 0.766654