I wish to receive the correlation coefficient between the two arrays (i.e. -1 <= x <= 1). [...]
Leaving that aside, the correlation code you have quoted should work apart from the fact that there seems to be a simple typo in it. I think that the line "ldSumXAVG += dArrayY[i];" should instead be "ldSumYAVG += dArrayY[i];". Apart from that, it looks okay.
If you want to improve performance, this is one of the areas where you can use a cyclical array rather than constantly shifting the array contents. Correlation isn't dependent on the internal sequence of each set of data. For example, the correlation of {1.4, 1.6, 1.5, 1.9, 1.8} and {1, 2, 3, 4, 5} is the same as the correlation of {1.6, 1.4, 1.5, 1.9, 1.8} and {2, 1, 3, 4, 5}. Therefore, you ought to be able to do something like the following - though this is not tested:
static double Prices[500];
static double Counters[500];
static int    ArrayIndex = 0;

Prices[ArrayIndex % ArraySize(Prices)]     = Bid;
Counters[ArrayIndex % ArraySize(Counters)] = ArrayIndex;
ArrayIndex++;

// calculate correlation of Prices[] and Counters[]
What you seem to be calculating isn't correlation in the usual sense.
@jjc (or others): how would you then measure correlation?
I'm basically going to leave this to someone better at statistics, and more sober, than I am. However, casting my mind back 9 months to the original query...
When measuring correlation, you'd typically take, say, a line of 500 people, record each person's height and shoe size, and then calculate the correlation between those two things. The original question, by contrast, was broadly equivalent to recording people's shoe sizes and correlating them with whether each person was 45th or 392nd in line. That's not meaningless in context, because the position in the queue has a meaning (i.e. time), but it basically translates to calculating how smoothly price increases or decreases over time. It would be more normal to calculate the correlation of something like price changes on each bar versus volume.
Putting it another way, it's not a question of how you "measure correlation". It's what you're measuring correlation between. And the standard thing to say at this point is generally "correlation is not causation".
In my opinion, JJ's take is entirely useful and valid.
How do you measure correlations? Which coefficients are applicable in your case?
In case you don't know yet, there are variants that apply only under certain strict sampling and distribution criteria: Kendall's tau, rank correlations, interclass correlations, etc. I won't go into all that, but you will have to, unfortunately or fortunately.
I don't know which one Excel uses, but I guess it has its own version. Is the result accurate for your hypothesis?
To be frank, this is one of the simpler statistical exercises, imho, as long as you have determined which formula avoids divide-by-zero errors and is the most accurate, and have tested it across all the ranges of your hypothesis. Basically, most of them simply require tweaking of means, deviations, or their order; that's about it. But if you haven't got a proper hypothesis to begin with, you will surely go in circles, as it seems.
This is a correlation function.
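double correlation(double& dArrayX[], double& dArrayY[]){
   int size = ArraySize(dArrayX);
   // Running totals must start at zero. The per-element xy[], xx[], yy[] arrays
   // were never sized with ArrayResize(), so plain sums are used instead.
   double xyTotal = 0, xxTotal = 0, yyTotal = 0, xTotal = 0, yTotal = 0;
   for(int i = 0; i < size; i++){
      xyTotal += dArrayX[i] * dArrayY[i];
      xxTotal += MathPow(dArrayX[i], 2);
      yyTotal += MathPow(dArrayY[i], 2);
      xTotal  += dArrayX[i];
      yTotal  += dArrayY[i];
   }
   // Pearson: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
   // Grouping uses ( ); [ ] is array indexing in MQL4.
   double denominator = MathSqrt(((size * xxTotal) - MathPow(xTotal, 2)) *
                                 ((size * yyTotal) - MathPow(yTotal, 2)));
   if(denominator == 0) return 0;   // zero variance: avoid dividing by zero
   // The numerator uses xTotal * yTotal, not xx[i] * yy[i].
   double pearson = ((size * xyTotal) - (xTotal * yTotal)) / denominator;
   return pearson;
}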
- schnappi: how would you then measure correlation?
#define INDEX uint // Zero based.
#define COUNT uint // One based.
double correlation_coefficient(double& a[], double& b[], COUNT length, INDEX iBeg=0){
   INDEX iEnd = iBeg + length;
   /* https://en.wikipedia.org/wiki/Correlation_and_dependence
    * CorrelationCoefficient = ss(xy)/Sqrt( ss(xx) ss(yy) )
    * ss(xy) = E (xi-Ave(x))(yi-Ave(y)), so n ss(xy) = n E XiYi - E Xi E Yi
    * ss(xx) = E (xi-Ave(x))^2,          so n ss(xx) = n E Xi**2 - (E Xi)**2
    * ss(yy) = E (yi-Ave(y))^2,          so n ss(yy) = n E Yi**2 - (E Yi)**2
    * The code uses the n ss(..) forms; the factors of n cancel in the ratio.
    */
   double Ex=0.0, Ex2=0.0, Ey=0.0, Ey2=0.0, Exy=0.0; // Ex=Sum(x)
   for(; iBeg < iEnd; ++iBeg){
      double x = a[iBeg], y = b[iBeg];
      Ex += x;  Ex2 += x * x;
      Ey += y;  Ey2 += y * y;
      Exy += x * y;
   }
   double ssxy = length * Exy - Ex * Ey;
   double ssxx = length * Ex2 - Ex * Ex;
   double ssyy = length * Ey2 - Ey * Ey;
   double deno = MathSqrt(ssxx * ssyy);
   return (deno == 0.0) ? 0.0 : ssxy / deno;
}
- JC: Correlation isn't dependent on the internal sequence of each set of data
True, as long as "the internal sequence of each set of data" is the same sequence.
Thanks for the function! This is exactly what I've been looking for, and exactly the MQL5 equivalent of Pine Script's correlation(x, y, len).
Link for anyone searching for a correlation formula: https://www.mql5.com/en/articles/5481
Hi,
I'm a newcomer to programming and MQL4, and I was wondering if someone could help me with a piece of code?
I've looked everywhere, and the book, articles and Google haven't helped. I may have stumbled across the answer, but because the code is so complex and doesn't relate to comparing arrays, I'm having great difficulty.
My problem:
I have 2 arrays. They are both the same size.
* My coding is very clumsy when using for(), and I'm sure it could be used in this context. Also, I'd only need to do it once, so I don't know if that means it should go in a different part of the EA (e.g. before start()).
My goal:
I wish to receive the correlation coefficient between the two arrays (i.e. -1 <= x <= 1).
This process is meant to be identical to that of the Excel correlation function. In Excel you simply highlight the two arrays and it spits out the result in no time.
I use this as a filter as I find it gives a very good indication of the strength of a movement and ensures entries are made when there is not a high level of noise.
In Excel I worked with tick-by-tick data, so Array1 is in effect holding this data (just as Excel would in e.g. A1:A500).
Final note:
Also, if anyone has any suggestion as to how to use iMAOnArray() on a custom array (e.g. how would you use the period parameter in this context?), I, and the forex karma gods, would be forever grateful.
Thank you so much in advance if you can help; I'll really appreciate it.
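On the iMAOnArray() question: for a simple moving average, the period is just the number of consecutive array elements averaged for each output value. The plain-C sketch below shows that calculation with a hypothetical `sma_at` helper on an oldest-first array. It does not reproduce iMAOnArray()'s own indexing, or its shift and method parameters, which are described in the MQL4 documentation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: simple moving average of the `period` elements ending
   at index `end` (inclusive) of an oldest-first array of n values.
   Returns NAN when there are fewer than `period` values up to `end`. */
double sma_at(const double *v, size_t n, size_t period, size_t end) {
    assert(v != NULL && period > 0);
    if (end >= n || end + 1 < period)
        return NAN;
    double sum = 0.0;
    for (size_t i = end + 1 - period; i <= end; i++)
        sum += v[i];
    return sum / (double)period;
}
```

With v = {1, 2, 3, 4, 5, 6} and a period of 3, the average ending at the last element is (4 + 5 + 6) / 3 = 5.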
As an extra note:
I found this link with the code below; however, the comment attached to the script was:
Hello Russell,
Your function was giving incorrect results. I tried it with two hard-coded numeric arrays with identical values and it doesn't return 1. Initially I thought there was something wrong with my Correlation indicator. Took me a while to discover it was actually this routine that was giving me the error.
Scott
reply 12.07.2009 14:22 apparelink
Code: