Coincidence or cointegrated? - page 2

Dwaine Hinds
I could not get the ALGLIB library to work, so in the meantime I am attempting to build a tool that can help me. This is where I am so far with the calculus:


//+------------------------------------------------------------------+
//| IntegrateNormalBetween                                       |
//+------------------------------------------------------------------+
/*
This function returns the definite integral of the normal density between X1_point and X2_point.
"height" is the same as "Y" in an equation, so f(x) = Y.
The integral is estimated with mid-point rectangles, because evaluating the function at any single point gives an area of zero.
*/
double IntegrateNormalBetween(const double Average, const double StandardDeviation, double X1_point, double X2_point)
  {
   double integral=0;
   double width_increment=0,width_midpoint=0,last_width=0,height=0,last_height=0,area=0,total_area=0;
   double width=0.01;//smallest rectangle width is 0.01. The smaller, the more accurate.
   width_increment = X1_point;
      while( (width_increment+width) <= X2_point )//step until the next rectangle would pass X2_point.
      {
       width_midpoint = width/2 + width_increment;//mid-point of the current rectangle.
       width_increment = width_increment + width;//advance the left edge by 0.01.
       height = Gaussian(width_midpoint,Average,StandardDeviation);//the height is f(width_midpoint). Another density can be used here instead of the Gaussian.
       area = width * height;//area of one rectangle: width * mid-point height.
       total_area = total_area + area;//sum all rectangles.
      }
      if( (width_increment+width) > X2_point && (width_increment) < X2_point )//a final, narrower rectangle covers the remainder up to X2_point.
      {
       last_width = X2_point - width_increment;//the last width: the gap between the last edge and X2_point.
       width_midpoint = last_width/2 + width_increment;
       last_height = Gaussian(width_midpoint,Average,StandardDeviation);//the last height is f(mid-point of the remainder).
       area = last_width * last_height;//area of the final rectangle.
       total_area = total_area + area;
      }
   integral = total_area;
   return(integral);
  }
//+------------------------------------------------------------------+
//| End of IntegrateNormalBetween                                |
//+------------------------------------------------------------------+  
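
A quick sanity check of the integrator (this OnStart script is an assumed usage example, not part of the original code): the area of the standard normal within one standard deviation of the mean should come out near 0.6827.

void OnStart()
  {
   //standard normal, integrated from -1 to +1 standard deviations.
   double p = IntegrateNormalBetween(0.0, 1.0, -1.0, 1.0);
   Print("P(-1 <= X <= 1) for N(0,1) = ", DoubleToString(p, 4));//expect roughly 0.6827
  }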
 



//+------------------------------------------------------------------+
//| Gaussian function                                      |
//+------------------------------------------------------------------+
/*
 This is also called the normal equation:
 Y = ( 1/( σ * sqrt(2π) ) ) * e^( -(x - μ)² / (2σ²) )
*/
double Gaussian(double X_point, double X_avg, double X_stddeviation)
  {
   double Y=0,E_exponent=0,pnom1=0,pnom2=0;

      if(X_stddeviation != 0)
      {
       //calculate e's exponent: -(x - μ)² / (2σ²)
       E_exponent = ( -1 * pow( (X_point - X_avg),2 ) )/( 2 * pow(X_stddeviation,2) );
       pnom1 = MathExp(E_exponent);

       //calculate the normalising factor 1/( σ * sqrt(2π) )
       pnom2 = 1/( X_stddeviation * sqrt(2 * M_PI) );

       //calculate result
       Y = pnom2 * pnom1;
      }
      else Print("Error: Gaussian function zero divide.");

   return(Y);
  }
//+------------------------------------------------------------------+
//| End of Gaussian function                                |
//+------------------------------------------------------------------+  
 



//+------------------------------------------------------------------+
//| StudentTDistribution function                                      |
//+------------------------------------------------------------------+
/*
 This is the probability density function of Student's t-distribution:
 f(t) = Γ((v+1)/2) / ( sqrt(v*π) * Γ(v/2) ) * (1 + t²/v)^( -(v+1)/2 ), with v = N - 1 degrees of freedom.
*/
double StudentTDistribution(double t_point, int N_sample_size)
  {
   double pi=3.14159265;
   double gamma=sqrt(pi);//this is Γ(1/2) only; the formula needs Γ((v+1)/2) and Γ(v/2).
   double v = N_sample_size - 1;

   double Y=0,pnom1=0,pnom2=0,pnom3=0,pnom4=0,pnom5=0,pnom1_exponent=0;

      if(v > 0)
      {
       pnom1_exponent = (0 - ( (v + 1)/2 ) );//calculate pnom1's exponent: -(v+1)/2
       pnom1 = pow( ( 1 + ( pow( t_point,2) )/v ),pnom1_exponent );

       pnom2 = (v/2);//used in place of Γ(v/2).

       pnom3 = sqrt( (v * pi) ) * gamma;

       pnom4 =  gamma * ( (v + 1)/2 );//used in place of Γ((v+1)/2); Γ(x) is not sqrt(π)*x, so these substitutions throw the result off.

       pnom5 = pnom4 / ( pnom3 * pnom2 );

       Y = pnom1 * pnom5;
      }
      else Print("Error: StudentTDistribution function zero divide.");

   return(Y);
  }
//+------------------------------------------------------------------+
//| End of StudentTDistribution function                                |
//+------------------------------------------------------------------+  
 


If I estimate the true correlation (taken by averaging about 100 correlation values) and take this as the true population mean, then I can simply find the probability that a given figure is plausible by integrating under the normal curve. I am assuming that the correlation figures are normally distributed.

For example, let's say the SMA(100) of the correlation figures is 850. One day I get 940. I now want to know whether this is true, so my null hypothesis is that "x" is a false figure and my alternate hypothesis is that "x" is a plausible figure. I just plug the mean and the standard deviation into the probability distribution (the integration) function:
double mean=0,stdv=0,y=0,x=0;
   mean = 850;
   stdv = 100;
   x = 940;

y = IntegrateNormalBetween(mean,stdv,0,x);//y comes out around 0.815, i.e. about 81.5%. For me this is sufficient to reject the null hypothesis, i.e. the figure is valid.


The dilemma I have now is that the above procedure is simply my attempt at a workaround. It should only be used when we know the full population (all possible correlation figures), and we do NOT know this. That points me towards the t-statistic, since it is an accepted procedure to use just a sample of the population; the t-distribution is the better approach for this situation. Now, I have a Student's t-distribution function, but I keep getting contrary results, so I am basically stuck with the previous method. I would like someone to present a piece of code that returns the correct t-distribution figure, because the one I have is not working.


P.S. No correlation figure will ever reach 940; it is just an illustration to show that the function works.
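
For reference, here is a minimal sketch of the Student's t density with the gamma terms computed exactly for integer degrees of freedom (the names GammaHalfInteger and StudentTPDF are illustrative and not from the original post). Once such a density is available, the same mid-point rectangle idea used in IntegrateNormalBetween can be applied to it to estimate p-values.

//Gamma(x) for integer or half-integer x, built up with the recurrence Gamma(k+1) = k*Gamma(k),
//starting from Gamma(1) = 1 or Gamma(1/2) = sqrt(pi).
double GammaHalfInteger(double x)
  {
   bool is_integer = (MathAbs(x - MathRound(x)) < 1e-9);
   double g = is_integer ? 1.0 : MathSqrt(M_PI);
   double k = is_integer ? 1.0 : 0.5;
   while(k < x - 1e-9)
     {
      g *= k;  //Gamma(k+1) = k*Gamma(k)
      k += 1.0;
     }
   return(g);//note: the intermediate gamma values overflow a double once v exceeds roughly 340.
  }

//Student's t density: f(t) = Γ((v+1)/2) / ( sqrt(v*pi)*Γ(v/2) ) * (1 + t²/v)^(-(v+1)/2)
double StudentTPDF(double t_point, int N_sample_size)
  {
   double v = N_sample_size - 1;
   if(v <= 0){ Print("Error: StudentTPDF needs a sample size of at least 2."); return(0); }
   double numer = GammaHalfInteger( (v + 1.0)/2.0 );
   double denom = MathSqrt(v * M_PI) * GammaHalfInteger( v/2.0 );
   return( (numer/denom) * MathPow( 1.0 + (t_point*t_point)/v, -(v + 1.0)/2.0 ) );
  }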


Dwaine Hinds

I won't need to use the entire range of the Student's t-distribution, since I am only interested in probability values of 10, 5, and 1 percent. I plan on rejecting the null hypothesis at a critical level of 5 percent, for positive t-scores only. I can remake the Student's t-distribution function into a more specialized one that accounts only for the figures I need. This should reduce the complexity:

//+------------------------------------------------------------------+
//| StudentTDistribution function                                    |
//+------------------------------------------------------------------+
/*
 This returns an approximate one-tailed p-value for a positive t-score, using critical values from a t-table for 99 degrees of freedom (sample size 100).
*/
double StudentTDistribution(double t_score, int N_sample_size)
  {
   double pvalue = 1;//assume the null is certain until a critical value is reached.
   int v = N_sample_size - 1;//degrees of freedom.

      if(v != 99)Print("Required sample size is exactly 100. Current sample size is "+IntegerToString(N_sample_size));
      if(t_score < 0)Print("Invalid t_score (only positive t-scores are handled). Current t_score is "+DoubleToString(t_score));

      if(v==99 && t_score >= 2.365)//1% critical value for 99 degrees of freedom.
      {
       pvalue = 0.01;
      }
      else if(v==99 && t_score >= 1.66 && t_score < 2.365)//5% critical value.
      {
       pvalue = 0.05;
      }
      else if(v==99 && t_score >= 1.29 && t_score < 1.66)//10% critical value.
      {
       pvalue = 0.10;
      }
      else Print("p-value above 0.10 or invalid input; returning 1.");

   return(pvalue);
  }
//+------------------------------------------------------------------+
//| End of StudentTDistribution function                                |
//+------------------------------------------------------------------+  
 

I know this seems very "artificial", but I think it will be much faster and easier. The values in the above function were simply copied from a real t-table. It just restricts us to a critical value of 10 percent or lower, along with a required sample size of exactly 100.


The hardest part is completed, so what is left to be done is a function to:

1) estimate the true population parameters, such as the population mean. I think the integration function above can do this for us.

2) calculate the sample parameters, such as the sample mean. The sample size must be exactly 100. We then calculate the t-score from the sample (a sketch of this step and the test in step 3 follows this list).

3) conduct a hypothesis test on the estimated population parameters at critical values of 10, 5, or 1 percent. I will use 5% from the Student's t-distribution, so if I get 10% or more I must accept the null hypothesis, i.e. the estimated population parameters are invalid. If I get 5% or less then the alternate hypothesis is true, i.e. the estimated population parameters are valid.
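
A minimal sketch of steps 2) and 3), under these assumptions: corr[] holds the last 100 correlation readings, mu0 is the estimated population mean from step 1), and StudentTDistribution is the lookup function above. The helper names SampleMean, SampleStdDev and RejectNullAt5Percent are illustrative only.

//average of the sample.
double SampleMean(const double &data[], int n)
  {
   double sum = 0;
   for(int i = 0; i < n; i++) sum += data[i];
   return(sum/n);
  }

//sample standard deviation (n-1 in the denominator).
double SampleStdDev(const double &data[], int n, double mean)
  {
   double ss = 0;
   for(int i = 0; i < n; i++) ss += (data[i] - mean)*(data[i] - mean);
   return(MathSqrt(ss/(n - 1)));
  }

//steps 2) and 3): t-score from the sample, then the 5% test via the lookup table.
bool RejectNullAt5Percent(const double &corr[], int n, double mu0)
  {
   double mean = SampleMean(corr, n);
   double sd   = SampleStdDev(corr, n, mean);
   if(sd == 0) return(false);
   double t = (mean - mu0)/(sd/MathSqrt(n));//one-sample t-score.
   double p = StudentTDistribution(t, n);   //table lookup, needs n == 100.
   return(p <= 0.05);                       //reject the null at the 5% level.
  }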

Chris70

Can you explain why you use a Student's t-test?

Intuitively, given the last n close prices of two currency pairs, for correlation I'd simply use Pearson's coefficient R, whereas for cointegration I'd use the Engle-Granger test, i.e. a unit root test on the residuals of a linear regression.

I'm not saying that your approach is by any means incorrect; probably I just don't understand where you're going with that.
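
For what it's worth, a minimal sketch of Pearson's R over the last n bars of two symbols (the name PearsonR, the H1 timeframe and the use of iClose() are assumptions, not something posted in this thread):

double PearsonR(const string symbolA, const string symbolB, int n)
  {
   double a[], b[];
   ArrayResize(a, n);
   ArrayResize(b, n);
   for(int i = 0; i < n; i++)
     {
      a[i] = iClose(symbolA, PERIOD_H1, i);//close prices of the two series.
      b[i] = iClose(symbolB, PERIOD_H1, i);
     }
   double meanA = 0, meanB = 0;
   for(int i = 0; i < n; i++){ meanA += a[i]; meanB += b[i]; }
   meanA /= n;
   meanB /= n;
   double cov = 0, varA = 0, varB = 0;
   for(int i = 0; i < n; i++)
     {
      cov  += (a[i] - meanA)*(b[i] - meanB);
      varA += (a[i] - meanA)*(a[i] - meanA);
      varB += (b[i] - meanB)*(b[i] - meanB);
     }
   if(varA == 0 || varB == 0) return(0);
   return(cov/MathSqrt(varA*varB));//Pearson's R in [-1, 1].
  }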
