Coincidence or cointegrated? - page 2

 
I could not get Alglib to work, so I am attempting to build a tool that can help me in the meantime. This is where I am so far with the calculus:


//+------------------------------------------------------------------+
//| IntegrateNormalBetween                                       |
//+------------------------------------------------------------------+
/*
This function returns the integral of the normal density between X1 and X2.
"height" plays the role of Y in an equation, i.e. f(x) = Y.
I am estimating the integral with midpoint rectangles (a midpoint Riemann sum),
because the probability of a continuous density at any single point is zero.
*/
double IntegrateNormalBetween(const double Average, const double StandardDeviation, double X1_point, double X2_point)
  {
   double integral=0;
   double width_increment=0,width_midpoint=0,last_width=0,height=0,last_height=0,area=0,total_area=0;
   double width=0.01;//rectangle width: the smaller, the more accurate.
   width_increment = X1_point;
      while( (width_increment+width) <= X2_point )//step while a full rectangle still fits before X2.
      {       
       width_midpoint = width/2 + width_increment;
       width_increment = width_increment + width;//the width: add or increment by 0.01 
       height = Gaussian(width_midpoint,Average,StandardDeviation);//the height is f( (width_midpoint) ). Another function can be used here instead of the Gaussian.
       area = width * height;//area: width * midpoint_height...sum all rectangles.
       total_area = total_area + area;
      }
      if( (width_increment+width) > X2_point && (width_increment) < X2_point )//handle the final partial rectangle.
      {
       last_width = X2_point - width_increment;//the last width: the remaining distance to X2.
       width_midpoint = last_width/2 + width_increment;
       last_height = Gaussian(width_midpoint,Average,StandardDeviation);//the last height is f(midpoint of the remaining strip).
       area = last_width * last_height;//area: last width * last midpoint height.
       total_area = total_area + area;
      }
      }
   integral = total_area;
   return(integral);   
  }
//+------------------------------------------------------------------+
//| End of IntegrateNormalBetween                                |
//+------------------------------------------------------------------+  
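
As a quick sanity check of the integrator (my illustration, using the standard normal): the area within one standard deviation of the mean should come out near the textbook 68.3%, give or take the rectangle-width error.

double p = IntegrateNormalBetween(0,1,-1,1);
Print("P(-1 < X < 1) = ",DoubleToStr(p,4));//expect roughly 0.683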
 



//+------------------------------------------------------------------+
//| Gaussian function                                      |
//+------------------------------------------------------------------+
/*
 This is also called "the normal equation":
 Y = ( 1 / ( σ * sqrt(2π) ) ) * e^( -(x - μ)² / (2σ²) )
*/
double Gaussian(double X_point, double X_avg, double X_stddeviation)
  {
   double pi=3.14159265358979;
   double Y=0,E_exponent=0,pnom1=0,pnom2=0;
   
      if(X_stddeviation != 0)
      {
       //calculate e's exponent
       E_exponent = ( -1 * pow( (X_point - X_avg),2 ) )/( 2 * pow(X_stddeviation,2) );
       pnom1 = exp(E_exponent);//exp() avoids the precision loss of a truncated e constant
      
       //calculate other polynomial
       pnom2 = 1/( X_stddeviation * sqrt(2 * pi) );
      
       //calculate result
       Y = pnom2 * pnom1;
      }
      else Print("Error: Gaussian function zero divide.");

   return(Y);
  }
//+------------------------------------------------------------------+
//| End of Gaussian function                                |
//+------------------------------------------------------------------+  
 



//+------------------------------------------------------------------+
//| StudentTDistribution function                                      |
//+------------------------------------------------------------------+
/*
 This is the probability density function of Student's t-distribution.
 (Note: as written, the Gamma terms are mis-evaluated, which is why the results
 come out contrary; a corrected sketch appears further down the thread.)
*/
double StudentTDistribution(double t_point, int N_sample_size)
  {
   double pi=3.14159265;
   double gamma=sqrt(pi);//this is only Gamma(1/2), not a general Gamma function
   double v = N_sample_size - 1;
   
   double Y=0,pnom1=0,pnom2=0,pnom3=0,pnom4=0,pnom5=0,pnom1_exponent=0;
   
      if(v > 0)
      {      
       pnom1_exponent = (0 - ( (v + 1)/2 ) );//calculate pnom1's exponent
       pnom1 = pow( ( 1 + ( pow( t_point,2) )/v ),pnom1_exponent );
       
       //NOTE: the correct density needs Gamma((v+1)/2) / ( sqrt(v*pi) * Gamma(v/2) ).
       //The terms below stand in (v/2) for Gamma(v/2) and sqrt(pi)*((v+1)/2) for
       //Gamma((v+1)/2), which is where the contrary results come from.
       pnom2 = (v/2);
       
       pnom3 = sqrt( (v * pi) ) * gamma;
      
       pnom4 =  gamma * ( (v + 1)/2 );
       
       pnom5 = pnom4 / ( pnom3 * pnom2 );

       Y = pnom1 * pnom5;
      }
      }
      else Print("Error: StudentTDistribution function zero divide.");

   return(Y);
  }
//+------------------------------------------------------------------+
//| End of StudentTDistribution function                                |
//+------------------------------------------------------------------+  
 


If I estimate the true correlation (by averaging about 100 correlation values) and take this as the true population mean, then I can simply find the probability that a given figure is consistent with it by integrating under the normal curve. I am assuming that the correlation figures are normally distributed.

For example, let's say the SMA(100) of the correlation figures is 850 and one day I get 940. I now want to know if this figure is plausible, so my null hypothesis is that "x" is a false figure, and my alternate hypothesis is that "x" is a plausible figure. I just plug the mean and the standard deviation into the probability distribution (the integration) function:
double mean=0,stdv=0,y=0,x=0;
   mean = 850;
   stdv = 100;
   x = 940;
   
y = IntegrateNormalBetween(mean,stdv,0,x);//y is about 0.815 (81.5%). For me this is sufficient to reject the null hypothesis, i.e. the figure is valid.


The dilemma I have now is that the above procedure is simply my attempt at a workaround. It should really be used when we know the full population (all possible correlation figures), and we do NOT know this. That points me in the direction of the t-statistic, since it is an accepted procedure when only a sample of the population is available; using the t-distribution is the best approach for this situation. Now, I have a Student's t-distribution function, but I keep getting contrary results, so I am basically stuck with the previous method. I would like someone to present a piece of code that produces the correct t-distribution figure, because the one I have is not working.
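
For reference, here is a sketch of what I believe the correct computation looks like (untested, treat it as a starting point rather than a finished implementation; the function names are mine). Since v = n - 1 is a whole number, Gamma(v/2) and Gamma((v+1)/2) always land on integers or half-integers, where the recurrence Gamma(x+1) = x*Gamma(x), started from Gamma(1) = 1 and Gamma(1/2) = sqrt(pi), is exact. A one-tailed p-value can then be estimated with the same midpoint-rectangle idea as the integrator above:

//+------------------------------------------------------------------+
//| GammaHalfInteger: exact Gamma at integer/half-integer arguments  |
//+------------------------------------------------------------------+
double GammaHalfInteger(double x)//x must be a positive multiple of 0.5
  {
   int twice = (int)MathRound(x*2);
   double g = (twice%2==0) ? 1.0 : sqrt(3.14159265358979);//Gamma(1)=1, Gamma(0.5)=sqrt(pi)
   double k = (twice%2==0) ? 1.0 : 0.5;
      while(k < x-0.25)//climb upwards via Gamma(k+1) = k*Gamma(k)
      {
       g = g*k;
       k = k+1.0;
      }
   return(g);
  }
//+------------------------------------------------------------------+
//| StudentT_PDF: density of Student's t with n-1 degrees of freedom |
//+------------------------------------------------------------------+
double StudentT_PDF(double t_point, int N_sample_size)
  {
   double pi=3.14159265358979;
   double v=N_sample_size-1;
   if(v<=0){ Print("Error: StudentT_PDF sample size too small."); return(0); }
   //note: the Gamma terms overflow for very large n; they are fine for n around 100.
   double c=GammaHalfInteger((v+1)/2.0)/( sqrt(v*pi)*GammaHalfInteger(v/2.0) );
   return( c*pow( 1.0+(t_point*t_point)/v, -(v+1)/2.0 ) );
  }
//+------------------------------------------------------------------+
//| StudentT_PValueOneTailed: P(T >= t_score) by midpoint rectangles |
//+------------------------------------------------------------------+
double StudentT_PValueOneTailed(double t_score, int N_sample_size)
  {
   double width=0.001, x=t_score, pvalue=0;
      while(x < 10.0)//the tail mass beyond 10 is negligible here
      {
       pvalue = pvalue + width*StudentT_PDF(x+width/2.0,N_sample_size);
       x = x+width;
      }
   return(pvalue);
  }

With this, StudentT_PValueOneTailed(1.66,100) should come out near 0.05, matching the t-table values used below.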


P.S. No correlation figure will ever reach 940; it is just an illustration to show that the function works.


 

I won't need to use the entire range of the Student's t-distribution, since I am only interested in probability values of 10, 5 and 1 percent. I plan on rejecting the null hypothesis at a critical level of 5 percent, and only for positive t-scores. I can remake the Student's t-distribution function into a more specialized one that accounts for only the figures I need. This should reduce the complexity:

//+------------------------------------------------------------------+
//| StudentTDistribution function                                    |
//+------------------------------------------------------------------+
/*
 This maps a positive t-score to an approximate one-tailed p-value,
 using t-table critical values for df = 99 (sample size 100).
*/
double StudentTDistribution(double t_score, int N_sample_size)
  {   
   double pvalue = 1;//assume null is certain
   int v = N_sample_size - 1;
 
      if(v!=99)Print(" Required sample size is exactly 100. Current sample size is "+IntegerToString(N_sample_size));
      if(t_score<0)Print(" Invalid t_score. Current t_score is "+DoubleToStr(t_score));
      
      if(v==99 && t_score>=2.365 )
      {      
       pvalue = 0.01;
      }
      else if(v==99 && t_score>= 1.66 && t_score<2.365 )
      {      
       pvalue = 0.05;
      }  
      else if(v==99 && t_score>= 1.29 && t_score<1.66 )
      {      
       pvalue = 0.10;
      }          
      else Print("out of range pvalue.");

   return(pvalue);
  }
//+------------------------------------------------------------------+
//| End of StudentTDistribution function                                |
//+------------------------------------------------------------------+  
 

I know this seems very "artificial", but I think it will be much faster and easier. The values in the above function were simply copied from a real t-table. This just restricts us to significance levels of 10% or lower, along with a required sample size of 100.
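
A quick usage illustration (numbers are mine):

double p = StudentTDistribution(1.8,100);//1.66 <= 1.8 < 2.365, so p comes back as 0.05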


The hardest part is completed, so what's left to be done is to write a function to:

1) estimate the true population parameters, such as the population mean. I think the integration function above can do this for us.

2) calculate the sample parameters, such as the sample mean. The sample size must be exactly 100. We then calculate the t-score from the sample (see the worked example below).

3) conduct a hypothesis test on the estimated population parameters at a critical level of 10, 5 or 1 percent. I will use 5% from the Student's t-distribution, so if I get 10% or more I must accept the null hypothesis, i.e. the estimated population parameters are invalid. If I get 5% or less then the alternate hypothesis holds, i.e. the estimated population parameters are valid.
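
To make step 2 concrete, a worked example with made-up numbers: with a sample mean of 0.83, an estimated population mean of 0.80, a sample standard deviation of 0.15 and n = 100, the t-score is

t = (0.83 - 0.80) / (0.15 / sqrt(100)) = 0.03 / 0.015 = 2.0

which falls in the 5% bucket of the table-based function above (1.66 <= 2.0 < 2.365).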

 

Can you explain why you use a Student's t-test?

Intuitively, given the last n close prices of two currency pairs, for correlation I'd simply use Pearson's coefficient R, whereas for cointegration I'd use the Engle-Granger test, i.e. a unit root test of the residuals of linear regression.

I'm not saying that your approach is by any means incorrect; probably I just don't understand where you're going with that.
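
To sketch what I mean, a rough outline could look like this (untested; it leaves out the lag terms of a full ADF test, and the -3.37 cutoff, roughly the 5% critical value for two series, is an assumption taken from published Engle-Granger tables):

//+------------------------------------------------------------------+
//| EngleGrangerSketch: OLS of Y on X, then Dickey-Fuller on the     |
//| residuals; true means the residuals look stationary/cointegrated |
//+------------------------------------------------------------------+
bool EngleGrangerSketch(double &X[], double &Y[])
  {
   int i, n = ArrayRange(X,0);
   if(n < 20 || ArrayRange(Y,0) < n){ Print("EngleGrangerSketch: not enough data."); return(false); }
   //step 1: OLS regression Y = alpha + beta*X + e
   double sx=0, sy=0, sxx=0, sxy=0;
   for(i=0; i<n; i++){ sx+=X[i]; sy+=Y[i]; sxx+=X[i]*X[i]; sxy+=X[i]*Y[i]; }
   double beta  = (n*sxy - sx*sy)/(n*sxx - sx*sx);
   double alpha = (sy - beta*sx)/n;
   double e[]; ArrayResize(e,n);
   for(i=0; i<n; i++) e[i] = Y[i] - alpha - beta*X[i];//residuals
   //step 2: Dickey-Fuller regression without constant or lags: de(t) = gamma*e(t-1) + u(t)
   double num=0, den=0, sse=0;
   for(i=1; i<n; i++){ num += e[i-1]*(e[i]-e[i-1]); den += e[i-1]*e[i-1]; }
   double gamma = num/den;
   for(i=1; i<n; i++){ double u = (e[i]-e[i-1]) - gamma*e[i-1]; sse += u*u; }
   double se    = sqrt( (sse/(n-2))/den );//standard error of gamma
   double tstat = gamma/se;
   return(tstat < -3.37);//clearly negative t-stat -> residuals mean-revert -> cointegrated
  }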

 
Chris70:

"Can you explain why you use a Student's t-test?"

"Intuitively, given the last n close prices of two currency pairs, for correlation I'd simply use Pearson's coefficient R, whereas for cointegration I'd use the Engle-Granger test, i.e. a unit root test of the residuals of linear regression.

I'm not saying that your approach is by any means incorrect; probably I just don't understand where you're going with that."

Since we don't have all possible correlation figures (the population), we will have to estimate them from a sample. And if we are going to estimate population parameters from a sample, then a t-test is the best tool for this, in my opinion.

Calculating the correlation value (Pearson R) in this direct way is not very reliable, as you will find a drastic difference in the figures each time.

 
Sorry for my lateness in completing the task. Before I continue, I think it is prudent to review why I am taking this approach, so let's go back to what got us to this point. I have an EA that opens multiple positions on different pairs when it receives a trade signal, in this case a "BUY" signal. The EA was intended to trade multiple signals at different times in order to diversify risk; however, I subsequently discovered that currency pairs are to some extent correlated. This caused the positively correlated pairs to trigger trades simultaneously in the same direction, and that is not ideal for me.

Since no one knows what the "true" correlation value (Pearson R) of two currency pairs is, we need a proper way of estimating this value. It is my interpretation that the Law of Large Numbers as well as the Central Limit Theorem apply to ALL numbers in this universe.

The Central Limit Theorem essentially states that when you take a sufficiently large number of samples of any random variable (in this case a correlation value), the distribution of the sample means forms a nearly symmetrical bell shape regardless of the underlying distribution, with the peak at the centre, i.e. you get a normal distribution.

The Law of Large Numbers essentially states that, for a sufficiently large sample size, the average of a random variable converges to a central figure, i.e. its true expected value.


I stand corrected on the applicability of the Central Limit Theorem and the Law of Large Numbers. These are just two tools I am borrowing from mathematics and statistics because I think they can help me solve a problem. In using the above tools, I will treat the correlation value as a random variable, since we don't know the true value. I will also test the validity of the resulting figure with a hypothesis test via the Student's t-distribution. The hypothesis test will give the likelihood that the estimated value is erroneous (spurious). When the likelihood of a spurious figure is 5% or less, the figure is deemed valid based on the hypothesis test.


//+------------------------------------------------------------------+
//| PearsonR function                                    |
//+------------------------------------------------------------------+
/*
 This gives the product-moment correlation coefficient over the given pairs of
 observations (the caller below feeds it three-bar windows).
 r = Σ( (x - x̄)(y - ȳ) ) / sqrt( Σ(x - x̄)² * Σ(y - ȳ)² )
*/
double PearsonR(double& X_array[], double& Y_array[])
  {   
   int N_sample_size=ArrayRange(X_array,0);
   double xmean=0,ymean=0,xo=0,yo=0,xsquaredsum=0,ysquaredsum=0,numerator=0,denominator=0;
   double r=0;
      if(ArrayRange(Y_array,0) < N_sample_size)
      {
       Print("Error: Array X has "+IntegerToString(N_sample_size)+" elements but array Y has less. ");
       return(r); 
      }   
   xmean = iMAOnArray(X_array,N_sample_size,N_sample_size,0,MODE_SMA,0);
   ymean = iMAOnArray(Y_array,N_sample_size,N_sample_size,0,MODE_SMA,0);    
      while(N_sample_size>0)
      {
       --N_sample_size;
       xo = X_array[N_sample_size] - xmean; 
       yo = Y_array[N_sample_size] - ymean;
       numerator = numerator + (xo * yo);
       xsquaredsum = xsquaredsum + (xo * xo);
       ysquaredsum = ysquaredsum + (yo * yo);      
      }
   denominator = sqrt( xsquaredsum * ysquaredsum ); 
   //BreakPoint("","","",true,"denominator ",DoubleToStr(denominator),"numerator ",DoubleToStr(numerator));   
   
   if(denominator > 0)r = (numerator / denominator);  
   return(r);
  }
//+------------------------------------------------------------------+
//| End of PearsonR function                                |
//+------------------------------------------------------------------+  
 


//+------------------------------------------------------------------+
//| Sample_StdDev function                                    |
//+------------------------------------------------------------------+
/*
 This gives an estimate of the population standard deviation via the sample standard deviation.
*/
double Sample_StdDev(double& x[])
  {   
   int n=ArrayRange(x,0);
   int v=n-1;
   double xo=0,xsum=0,variance=0,stdv=0;
   double xmean = iMAOnArray(x,n,n,0,MODE_SMA,0);
   // s² = Σ( xi - x̄ )² / ( n - 1 )
      while(n>0)
      {
       --n;
       xo = x[n] - xmean;
       xsum = xsum + (xo * xo);
      }
   if( v > 0 )variance = (xsum / (v) );
   else Print("zero divide in Sample_StdDev.");
   stdv = sqrt(variance);
   
   return(stdv);
  }
//+------------------------------------------------------------------+
//| End of Sample_StdDev function                                |
//+------------------------------------------------------------------+  
 


//+------------------------------------------------------------------+
//| isCorrelated function                                      |
//+------------------------------------------------------------------+
/*
 This will tell if two series are correlated. Correlation R is assumed to be normally distributed.

*/
bool isCorrelated(string Symbol1, string Symbol2, ENUM_TIMEFRAMES ChartTimeframe, int ashift)
  {
   double tmparr1[],tmparr2[],r_arr[];
   double r=0,total_r=0,average_r=0,t_score=0;
   bool corr_result=false;
   int N=400,n=100,stp=3,nstp=3,h=0,cnt_r=0;
   double mean_p=0,mean_s=0,stdv_p=0,stdv_r=0,stdv_s=0,t_value=0;
   //1) estimate the true population parameters, such as the population mean and standard
   //   deviation, by averaging Pearson R over many windows (mean_p is set from average_r below).
   ArrayResize(tmparr1,stp);
   ArrayResize(tmparr2,stp);
   ArrayResize(r_arr,N);
   
      //fill the temporary windows and calculate R for each
      while(N>stp)
      {
       nstp = N - stp;
       h = 0;
          while(N>nstp)
          {
           --N;
           tmparr1[h] = iClose(Symbol1,ChartTimeframe,N);  
           tmparr2[h] = iClose(Symbol2,ChartTimeframe,N);     
          } 
       r = PearsonR(tmparr1,tmparr2);//calculate R for the two windows
       r_arr[cnt_r] = r;
       total_r = total_r + r;
       ++cnt_r;       
      }
    ArrayResize(r_arr,n);//shrink array to n number of data  
    average_r = ( total_r / cnt_r );//calculate average r 
    mean_p = average_r;//estimated population mean     
    stdv_r = Sample_StdDev(r_arr);//calculate standard deviation of r series

   //2) calculate sample parameters, such as sample mean. Sample size must be exactly 100. We then calculate t-score from sample.
   mean_s = iMAOnArray(r_arr,n,n,0,MODE_SMA,0);
   stdv_s = stdv_r;
  
  
   //3) t = [ mean_s - μ ] / [ stdv_s / sqrt( n ) ]
   t_score = (mean_s - mean_p)/(stdv_s/sqrt(n)); 
   
   //4) conduct a hypothesis test on the estimated population parameters at critical values of 10 or 5 or 1. 
   //I will use 5% from the student's tdistribution, 
   //so if I get 10% or more I must accept the null hypothesis i.e. the estimated population parameters are invalid. 
   t_value = StudentTDistribution(t_score,n);   
      if(t_value <= 0.05)
      {
       //If I get 5% or less then the alternate hypothesis is true i.e. the estimated population parameters are valid. 
       corr_result = true;
      }
      else
      {
       //I must accept the null hypothesis i.e. the estimated population parameters are invalid. 
       corr_result = false;
      }   
 //BreakPoint("","","",true,"total_r",DoubleToStr(total_r),"mean_p",DoubleToStr(mean_p),"stdv_r",DoubleToStr(stdv_r),"mean_s",DoubleToStr(mean_s),"stdv_s",DoubleToStr(stdv_s),"t_score",DoubleToStr(t_score));     
   return(corr_result);
  }
//+------------------------------------------------------------------+
//| End of isCorrelated function                                |
//+------------------------------------------------------------------+  
 


Here is the usage:

bool h = isCorrelated("EURUSD","USDCHF",0,1);//if h is true then the pairs are correlated.


Goodbye

 
Romeo Dela Cruz Jr:
Hi, hello sir. Please guide/help me; I want to earn money for my financial assistance, but I won't give up on this trading. Try and try.

Please check out YouTube for videos on trading multiple time frames. Once you learn that, you will be alright.

 
Dwaine Hinds:
Sorry for my lateness in completing the task. Before I continue, I think it is prudent to review why I am taking this approach [...] When the likelihood of a spurious figure is 5% or less, the figure is deemed valid based on the hypothesis test.

Regarding "true" correlation I see that in your example code you're already using the formula for the empirical sample correlation (which per se isn't "true", but the best we can get). You might argue that by the law of large numbers the sample average ultimately near equals the expected value, however the law of large numbers strictly speaking isn't valid for fat tail distributions like in trading. However... I guess we don't need perfection to make a trade.

Although I suggested using Pearson's correlation myself, I admit that this method comes with a weakness: it's designed to be used only for evaluating the relationship between two samples that are assumed to be linearly(!) dependent, which most likely is far from true for any relations in the complex world of price time series.

I see that you have profound knowledge of statistics and maths, so I'll be careful with recommendations ;-) . Personally, what I can think of as another alternative is Spearman's rank correlation coefficient rho. This can be used similarly to Pearson R, but is designed for non-linear monotonic dependence, which in trading is probably much closer to the truth than linearity.

Spearman's correlation is fairly easy to put into code, e.g. for the correlation betw. array X[] and Y[]:

1. for each element of X[] obtain a rank value (e.g. via a pairwise comparison method inside a while-loop until the ranking is complete) and store the ranks in an array like int X_rank[]

2. the same for Y[] --> Y_rank[]

3. double numerator=0;

    for (int n=0;n<elements;n++) {numerator+=6*pow(X_rank[n]-Y_rank[n],2);}

    double rho=1-numerator/(elements*(pow(elements,2)-1));

And: you can also derive a t-score from Spearman's rho:       t_score = rho*sqrt((elements-2)/(1-pow(rho,2)))
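
Putting those three steps together, a minimal sketch (untested; it assumes no tied values, which would otherwise need average ranks):

double SpearmanRho(double &X_array[], double &Y_array[])
  {
   int i, j, n = ArrayRange(X_array,0);
   if(n < 3 || ArrayRange(Y_array,0) < n){ Print("SpearmanRho: bad input."); return(0); }
   double d2sum = 0;
      for(i=0; i<n; i++)
      {
       int xrank=1, yrank=1;//rank = 1 + number of smaller elements (no ties assumed)
          for(j=0; j<n; j++)
          {
           if(X_array[j] < X_array[i]) xrank++;
           if(Y_array[j] < Y_array[i]) yrank++;
          }
       d2sum = d2sum + pow(xrank-yrank,2);
      }
   return( 1.0 - 6.0*d2sum/(n*(pow(n,2)-1)) );
  }

...and then the t-score as above: t_score = rho*sqrt((n-2)/(1-pow(rho,2))).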

 
Dwaine Hinds:

Hi Jean Francois Le Bas. This is what I am getting when I execute "UseAlglib.mq4":

(screenshot of the compiler errors was attached here)

Please tell what you did to make it work.

You don't need a pointer to call a function in Alglib,

just use the class name: Classname::function()
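
For example (I'm quoting from memory, so check the include path and method name against your copy of alglib.mqh):

#include <Math\Alglib\alglib.mqh>
//x_arr[] and y_arr[] are whatever price arrays you already have
double r = CAlglib::PearsonCorr2(x_arr,y_arr);//static call on the class name, no pointer needed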

 
And please show the code of the corresponding error, because I'm not a psychic.