Correlation Coefficient with Two Arrays

 

Hi,

I'm a newcomer to programming and MQL4, and I was wondering if someone could help me with a piece of code?

I've looked everywhere, and the book, articles and Google haven't helped me. I may have stumbled across the answer already, but because the code I found is so complex and doesn't relate to comparing arrays, I'm having great difficulty.


My problem:

I have 2 arrays. They are both the same size.

  • Array1 contains the previous 500 Bid prices.
* As it is a custom array, I believe the newest value is in Array1[0], the second newest in Array1[1], and so forth.
This is the code I use in case anyone is interested:
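//--------------------------------------------------------------- 3 --
   // Tick Counter
{
  static double tickarray[500];   // static, so the values persist between ticks
  static int cnt=499;             // fill from the end until the array is full
  if(cnt<0) cnt=0;                // once full, the newest Bid always goes in [0]
  if(tickarray[0]!=0)             // array is full: shift every value one slot older
  {
    for(int i=499; i>0; i--)      // i>0 so that [0] is copied into [1] as well
    {
      tickarray[i]=tickarray[i-1];
    }
  }
  tickarray[cnt]=Bid;
  cnt--;
  return(0);
}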


  • Array2 contains the following sequence: 1,2,3,4,5,6..500. (I haven't made this yet, but if someone can tell me how to enter 1 to N in an array without manually doing it I would be forever grateful.)

* My coding with for() is very clumsy, but I'm sure it could be used in this context. Also, I'd only need to do it once, so I don't know whether that means it should go in a different part of the EA (e.g. before start()).
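
One way to fill Array2 without typing the values by hand is a single for loop that runs once at start-up. The following is a minimal sketch in plain C++ (so that it compiles outside MetaTrader); the name fill_sequence is made up for illustration and does not come from the thread. In an MQL4 EA the same loop would presumably live in init(), which runs once before the first tick.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the sequence {1, 2, ..., n} as doubles.
std::vector<double> fill_sequence(std::size_t n) {
    std::vector<double> seq(n);
    for (std::size_t i = 0; i < n; ++i) {
        seq[i] = static_cast<double>(i + 1);  // index 0 holds 1, index n-1 holds n
    }
    return seq;
}
```

Calling fill_sequence(500) yields the 1..500 array described above.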


My goal:

I wish to receive the correlation coefficient between the two arrays (i.e. -1 <= x <= 1).

This process is meant to be identical to that of the Excel correlation function. In Excel you simply highlight the two arrays and it spits out the result in no time.

I use this as a filter, as I find it gives a very good indication of the strength of a movement and ensures entries are made when there is not a high level of noise.

In Excel I worked with tick-by-tick data, so Array1 in effect holds the same data as the Excel range would (e.g. A1:A500).


Final note:

Also, if anyone has any suggestions on how to use iMAOnArray() on a custom array (e.g. how would you use the period parameter in this context), I - and the forex karma gods - would be forever grateful.


Thank you so much in advance if you can help, I'll appreciate it so much.



As an extra note:

I found this link with the code below, however the comment attached to the script was:


Hello Russell,

Your function was giving incorrect results. I tried it with two hard-coded numeric arrays with identical values and it doesn't return 1. Initially I thought there was something wrong with my Correlation indicator. Took me a while to discover it was actually this routine that was giving me the error.

Scott

reply 12.07.2009 14:22 apparelink



Code:

double correlation(double dArrayX[] , double dArrayY[]){
   int liX = ArraySize(dArrayX);
   if(liX != ArraySize(dArrayY)){
      Print("correlation : arrays don't match");
      return(0);
   }
   if(liX <= 1){
      return(0);
   }
   double ldSumXAVG = 0;
   double ldSumYAVG = 0;
   double ldmuT = 0;
   double ldDevX = 0;
   double ldDevY = 0;
   double ldRoX = 0;
   double ldRoY = 0;
   for(int i = 0; i < liX; i++){
      ldSumXAVG += dArrayX[i];
      ldSumXAVG += dArrayY[i];
   }
   ldSumXAVG /= liX;
   ldSumYAVG /= liX;
   for (i = 0; i < liX; i++){
      ldDevX = dArrayX[i]-ldSumXAVG;
      ldDevY = dArrayY[i]-ldSumYAVG;
      ldmuT += ldDevX*ldDevY;
      ldRoX +=MathPow(ldDevX,2);
      ldRoY +=MathPow(ldDevY,2);
   }
   double ldDivider = MathSqrt(ldRoX)*MathSqrt(ldRoY);
   if (ldDivider == 0){
      Print("correlation : can't divide by zero");
      return(0);
   }
   return( ldmuT / ldDivider);
}

 

I wish to receive the correlation coefficient between the two arrays (i.e. -1 <= x <= 1) [...]

What you seem to be calculating isn't correlation in the usual sense. If one of the arrays contains the numbers 0 to 499 (or 1 to 500), then in effect you're calculating the smoothness of the price direction. A perfect upward arithmetic progression in price, of any magnitude, will have correlation of 1; a decline in price will have correlation of -1. A geometric progression (e.g. price changes by a cumulative 5% each time) will have correlation typically close to +1/-1, but declining towards zero in relation to the size of the percentage change.
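
The behaviour described above can be checked numerically. Because the second array is just 1..N, its sums have closed forms, so the counter array does not need to be stored at all. The sketch below is plain C++ rather than MQL4 (so that it compiles outside MetaTrader), and the names trend_correlation, arithmetic_prices and geometric_prices are illustrative assumptions, not anything from the thread.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of prices[0..n-1] against the sequence 1..n.
// Returns 0 when the result is undefined (fewer than 2 points or flat prices).
double trend_correlation(const std::vector<double>& prices) {
    const std::size_t count = prices.size();
    if (count < 2) return 0.0;
    const double n = static_cast<double>(count);

    // Closed forms for Sum(t) and Sum(t^2) over t = 1..n.
    const double sum_t  = n * (n + 1.0) / 2.0;
    const double sum_tt = n * (n + 1.0) * (2.0 * n + 1.0) / 6.0;

    double sum_p = 0.0, sum_pp = 0.0, sum_tp = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double t = static_cast<double>(i + 1);
        sum_p  += prices[i];
        sum_pp += prices[i] * prices[i];
        sum_tp += t * prices[i];
    }

    const double ss_tp = n * sum_tp - sum_t * sum_p;
    const double ss_tt = n * sum_tt - sum_t * sum_t;
    const double ss_pp = n * sum_pp - sum_p * sum_p;
    const double prod  = ss_tt * ss_pp;
    if (!(prod > 0.0)) return 0.0;   // flat prices (or rounding noise)
    return ss_tp / std::sqrt(prod);
}

// Hypothetical test data: start + step * i for i = 1..n.
std::vector<double> arithmetic_prices(double start, double step, std::size_t n) {
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = start + step * static_cast<double>(i + 1);
    return out;
}

// Hypothetical test data: each value is the previous one times (1 + rate).
std::vector<double> geometric_prices(double start, double rate, std::size_t n) {
    std::vector<double> out(n);
    double value = start;
    for (std::size_t i = 0; i < n; ++i) {
        value *= (1.0 + rate);
        out[i] = value;
    }
    return out;
}
```

With these helpers, a rising arithmetic series gives 1 and a falling one gives -1 (up to rounding), regardless of the step size. A geometric series with a tiny rate gives a value just below 1, and the value drops as the rate grows.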

Leaving that aside, the correlation code you have quoted should work apart from the fact that there seems to be a simple typo in it. I think that the line "ldSumXAVG += dArrayY[i];" should instead be "ldSumYAVG += dArrayY[i];". Apart from that, it looks okay.
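
With that one-character fix applied, the routine should return 1 for two identical arrays, which is the check Scott's comment says it failed. Below is a sketch of the corrected logic in plain C++, where std::vector and std::sqrt stand in for MQL4 arrays and MathSqrt. The variable names have been simplified, so treat it as an illustration of the fix rather than Russell's exact code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation via deviations from the mean, following the structure
// of the quoted routine. Returns 0 for mismatched, too-short or constant input.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n != y.size() || n <= 1) return 0.0;

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];   // the quoted code added this to the X sum by mistake
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    double sum_dxdy = 0.0, sum_dxdx = 0.0, sum_dydy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sum_dxdy += dx * dy;
        sum_dxdx += dx * dx;
        sum_dydy += dy * dy;
    }

    const double divider = std::sqrt(sum_dxdx) * std::sqrt(sum_dydy);
    if (divider == 0.0) return 0.0;   // at least one array is constant
    return sum_dxdy / divider;
}
```

In the uncorrected version the Y mean stays at zero and the X mean absorbs both sums, so every deviation is measured from the wrong centre. That is why even identical inputs did not produce 1.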

If you want to improve performance, this is one of the areas where you can use a cyclical array rather than constantly shifting the array contents. Correlation isn't dependent on the internal sequence of each set of data. For example, the correlation of {1.4, 1.6, 1.5, 1.9, 1.8} and {1, 2, 3, 4, 5} is the same as the correlation of {1.6, 1.4, 1.5, 1.9, 1.8} and {2, 1, 3, 4, 5}. Therefore, you ought to be able to do something like the following - though this is not tested:

static double Prices[500];
static double Counters[500];
static int ArrayIndex = 0;

Prices[ArrayIndex % ArraySize(Prices)] = Bid;
Counters[ArrayIndex % ArraySize(Counters)] = ArrayIndex;
ArrayIndex++;

// calculate correlation of Prices[] and Counters[]
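
The cyclic idea can also be sketched in plain C++ so that it can be compiled and checked outside MetaTrader. The struct name TickWindow and the full() helper are assumptions added for illustration. The key property is that each slot always holds a price and the counter recorded on the same tick, so the pairs stay aligned even though the slots are no longer in chronological order.

```cpp
#include <cstddef>
#include <vector>

// Fixed-size cyclic store of (price, counter) pairs.
struct TickWindow {
    std::vector<double> prices;
    std::vector<double> counters;
    std::size_t next_index = 0;   // total number of ticks seen so far

    explicit TickWindow(std::size_t capacity)
        : prices(capacity, 0.0), counters(capacity, 0.0) {}

    // Store the newest tick, overwriting the oldest one once the window is full.
    void push(double bid) {
        const std::size_t slot = next_index % prices.size();
        prices[slot]   = bid;
        counters[slot] = static_cast<double>(next_index);  // same slot, same tick
        ++next_index;
    }

    // True once every slot holds a real tick.
    bool full() const { return next_index >= prices.size(); }
};
```

Until full() is true the unused slots still hold zeros, which would distort the result, so the correlation is best computed only after the window has filled.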
 
jjc:
What you seem to be calculating isn't correlation in the usual sense.
@jjc (or others): how would you then measure correlation?
 
schnappi:
@jjc (or others): how would you then measure correlation?

I'm basically going to leave this to someone better at statistics, and more sober, than I am. However, casting my mind back 9 months to the original query...

When measuring correlation, you'd typically assemble e.g. a line of 500 people, and then record e.g. their height versus their shoe size and plot the correlation between those two things. Whereas the original question was broadly equivalent to recording people's shoe size and plotting that against whether each person was 45th or 392nd in line. That's not meaningless in context, because the sequence in the queue has a meaning - i.e. time - but it basically translates to calculating how smoothly price increases or decreases over time. It would be more normal to calculate the correlation of something like price changes on each bar versus volume.

Putting it another way, it's not a question of how you "measure correlation". It's what you're measuring correlation between. And the standard thing to say at this point is generally "correlation is not causation".

 
Poster jjc's take is, imo, 100% useful and valid.

How do you measure correlations? Which coefficients are applicable in your case?

In case you don't know yet, there are variants that apply under certain strict criteria of sampling and distribution: Kendall's tau, rank correlations, interclass correlation, etc. I will not go into all that, but you will have to, unfortunately or fortunately.

I don't know what the one in Excel is using, but I guess it has its own unique version. Is the result accurate for your hypothesis?

To be frank, this is one of the simpler statistical exercises, imho, as long as you have determined which formula is the most accurate, does not give a divide error, and has been tested over all ranges, etc., of your hypothesis. Basically, most of them simply require tweaking the means, the deviations, or their order, and that's about it. But if you haven't had a proper hypothesis to begin with, you will surely go in circles, as it seems.

 

This is a correlation function.


double correlation(double& dArrayX[], double& dArrayY[]){
    double xy[], xx[], yy[];
    double xyTotal, xxTotal, yyTotal,xTotal, yTotal;
    double pearson = 0;
    int size = ArraySize(dArrayX);
    for(int i=0;i < size;i++){
        xy[i] = dArrayX[i]*dArrayY[i];
        xx[i] = MathPow(dArrayX[i],2);
        yy[i] = MathPow(dArrayY[i],2);
        xyTotal += xy[i];
        xxTotal += xx[i];
        yyTotal += yy[i];
        xTotal +=dArrayX[i];
        yTotal +=dArrayY[i];
    }
    pearson = [(size * xyTotal)-(xx[i]*yy[i])] / MathSqrt ( [(size * xxTotal)- MathPow(xTotal,2)][(size * yyTotal)- MathPow(yTotal,2)] );
    return pearson;
}
 
Daniel Castro:

This is a correlation function.


Your function doesn't compile, and it seems to use arrays where they are not needed.
 
  1. schnappi how would you then measure correlation?
    #define INDEX uint   // Zero based.
    #define COUNT uint   // One based.
    double correlation_coefficient(double& a[], double& b[], COUNT length, INDEX iBeg=0){
       INDEX    iEnd  = iBeg + length;
       /* https://en.wikipedia.org/wiki/Correlation_and_dependence
        * CorrelationCoefficient = ss(xy)/Sqrt( ss(xx) ss(yy) )
        * n ss(xy) = n E (xi-Ave(x))(yi-Ave(y)) = n E XiYi - E Xi E Yi
        * n ss(xx) = n E (xi-Ave(x))^2          = n E Xi**2 - (E Xi)**2
        * n ss(yy) = n E (yi-Ave(y))^2          = n E Yi**2 - (E Yi)**2
        * (E means Sum; the common factor n cancels in the ratio.)
        */
       double   Ex=0.0,  Ex2=0.0,    Ey=0.0,  Ey2=0.0,    Exy=0.0;    // Ex=Sum(x)
       for(; iBeg < iEnd; ++iBeg){
          double   x = a[iBeg],   y = b[iBeg];
          Ex += x;    Ex2 += x * x;     Ey += y;    Ey2 += y * y;  Exy += x * y;
       }
       double   ssxy  = length * Exy - Ex * Ey;
       double   ssxx  = length * Ex2 - Ex * Ex;
       double   ssyy  = length * Ey2 - Ey * Ey;
       double   deno  = MathSqrt(ssxx * ssyy);
       return (deno == 0.0) ? 0.0 : ssxy / deno;
    }

  2. JC Correlation isn't dependent on the internal sequence of each set of data
    True as long as "the internal sequence of each set of data" is the same sequence.
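
Point 2 can be checked with the numbers from jjc's earlier example. The sketch below is plain C++ using the same running-sums formula as the function above. The name correlation_from_sums is an assumption for illustration, and the guard against a non-positive product under the square root is an addition that is not in the MQL code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation from running sums: (n*Sum(xy) - Sum(x)*Sum(y)) divided by
// the square root of the two matching variance terms. Returns 0 if undefined.
double correlation_from_sums(const std::vector<double>& a,
                             const std::vector<double>& b) {
    const std::size_t count = a.size();
    if (count != b.size() || count < 2) return 0.0;
    const double n = static_cast<double>(count);

    double sx = 0.0, sxx = 0.0, sy = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double x = a[i], y = b[i];
        sx += x;  sxx += x * x;
        sy += y;  syy += y * y;
        sxy += x * y;
    }

    const double ssxy = n * sxy - sx * sy;
    const double ssxx = n * sxx - sx * sx;
    const double ssyy = n * syy - sy * sy;
    const double prod = ssxx * ssyy;
    if (!(prod > 0.0)) return 0.0;   // constant input or rounding noise
    return ssxy / std::sqrt(prod);
}
```

Passing {1.4, 1.6, 1.5, 1.9, 1.8} with {1, 2, 3, 4, 5}, and then {1.6, 1.4, 1.5, 1.9, 1.8} with {2, 1, 3, 4, 5}, gives the same value because both arrays were reordered in the same way. Reordering only the prices and leaving {1, 2, 3, 4, 5} untouched breaks the pairing and gives a different value.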
 
William Roeder #:

  1. True as long as "the internal sequence of each set of data" is the same sequence.

Thanks for the function! This is exactly what I've been looking for, and it's the exact MQL5 equivalent of Pine Script's correlation(x, y, len).

Link for anyone searching for the correlation formula: https://www.mql5.com/en/articles/5481
