Discussion of article "R-squared as an estimation of quality of the strategy balance curve" - page 2

 
I only skimmed it, but nevertheless I drew some important conclusions for myself. Many thanks to the author.
 
Thanks for the helpful article! Reposted it! :-)
 

Figure 19: LR-Correlation distribution for 10,000 random walks


Figure 20: Distribution of R^2 for 10,000 random walks

I don't see how R^2 can take the negative values shown in the second graph. There are also questions about the first graph: if the linear regression is built correctly, it seems Pearson's correlation (LR) should not be negative, yet in the graph it is. Where am I wrong?


Got it. I'm not wrong anywhere; the graphs simply use a custom R^2 and LR: the real value is multiplied by -1 if the last element of the series is less than the first. It would be good to mention this before the graphs.
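
A minimal sketch of that convention (my own helper, only restating what the graphs do; the function name is mine):

// Sign convention used for the graphs: the computed R^2 (or LR Correlation)
// is multiplied by -1 when the series ends below where it started.
double ApplySeriesSign( const double Value, const double &Series[] )
{
  const int Size = ArraySize(Series);

  return((Size > 1 && Series[Size - 1] < Series[0]) ? -Value : Value);
}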

 

In the article, the linear regression is calculated with an error - via CLinReg::LRLine.

Proof

#include <Graphics\Graphic.mqh> 
#include <Math\Stat\Normal.mqh>
#include <Math\Alglib\Alglib.mqh>

// Returns Y-values of the line (y(x)=a*x+b)
void GetLine( const double a, const double b, const int Amount, double &Result[] )
{
  ArrayResize(Result, Amount);
  
  for (int i = 0; i < Amount; i++)
    Result[i] = a * i + b;    
}

// Returns a linear regression via CLinReg::LRLine
void GetLinearRegression( const double &Array[], double &Result[] )
{
  const int Total = ArraySize(Array);
  
  CMatrixDouble XY(Total, 2);
  
  for (int i = 0; i < Total; i++)
  {
    XY[i].Set(0, i);
    XY[i].Set(1, Array[i]);
  }
  
  int retcode;
  double a, b;
  
  CLinReg::LRLine(XY, Total, retcode, a, b);

  GetLine(a, b, Total, Result);    
}

// Returns a linear regression via CAlglib::LRBuild + CAlglib::LRUnpack
void GetLinearRegression2( const double &Array[], double &Result[] )
{
  const int Total = ArraySize(Array);
  
  CMatrixDouble XY(Total, 2);
  
  for (int i = 0; i < Total; i++)
  {
    XY[i].Set(0, i);
    XY[i].Set(1, Array[i]);
  }
  
  int retcode;
  
  CLinearModelShell lm;
  CLRReportShell    ar;
//--- arrays for storing regression results
  double lr_coeff[];
//--- calculation of linear regression coefficients
  CAlglib::LRBuild(XY, Total, 1, retcode, lm, ar);
//--- obtaining linear regression coefficients
  CAlglib::LRUnpack(lm, lr_coeff, retcode);

  GetLine(lr_coeff[0], lr_coeff[1], Total, Result);      
}

// Plots the source series and its regression line on a chart object
void ToChart( const double &Array1[], const double &Array2[], const int X = 0, const int Y = 0, const int Width = 780, const int Height = 380 )
{
  static const string Name = __FILE__;
  
  CGraphic Graphic; 

  if (ObjectFind(0, Name) < 0) 
    Graphic.Create(0, Name, 0, X, Y, Width, Height); 
  else 
    Graphic.Attach(0, Name); 

  Graphic.CurveAdd(Array1, CURVE_LINES);
  Graphic.CurveAdd(Array2, CURVE_LINES);
  
  Graphic.CurvePlotAll(); 
  Graphic.Update();  
}

// Fills Array with the cumulative sum of normally distributed increments (a random walk)
void GetRandomArray( double &Array[], const int Amount = 1000 )
{
  double Random[];
  
  MathSrand(GetTickCount()); 

  MathRandomNormal(0, 1, Amount, Random); 
  MathCumulativeSum(Random, Array);
}

#define  TOSTRING(A) #A + " = " + (string)(A) + "\n"

void OnStart() 
{   
  double Array[];
  
  GetRandomArray(Array);  
  
  double Estimate[];
  double Estimate2[];
     
  GetLinearRegression(Array, Estimate);
  GetLinearRegression2(Array, Estimate2);

  const double R = CAlglib::PearsonCorr2(Array, Estimate);
  const double R2 = CAlglib::PearsonCorr2(Array, Estimate2);
  
  Print(TOSTRING(R) +
        TOSTRING((Array[0] > Array[ArraySize(Array) - 1]) ? -R : R) +
        TOSTRING(R2));
  
  ToChart(Array, Estimate2);
}


Result

R = -0.5864718581193301
(Array[0]>Array[ArraySize(Array)-1])?-R:R = -0.5864718581193301
R2 = 0.58647185811933


The sign is incorrect. The alternative LR implementation (CAlglib::LRBuild + CAlglib::LRUnpack) computes it correctly.
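
For reference, a short note of my own (not from the article) on why the correlation between a series and a correctly fitted regression line cannot be negative. For a least-squares fit on the index x,

\[
  \hat{y} = a x + b, \qquad a = \operatorname{corr}(x,y)\,\frac{\sigma_y}{\sigma_x},
\]
\[
  \operatorname{corr}(y,\hat{y}) = \operatorname{sign}(a)\,\operatorname{corr}(y,x) = |\operatorname{corr}(x,y)| \ge 0,
\]

so (for a non-degenerate fit) a negative value indicates that the fitted line itself is wrong, not that the correlation is negative.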


 
fxsaber:

The graphs of the LR Correlation and R^2 distributions for the 10,000 independent examples that are presented in the article show that R^2 != LR^2.

I don't understand why raising the original "concave" distribution to the second power makes it "flat".

This is where I turned out to be wrong. The statement is not obvious to me at all:

What is surprising is that with a simple mathematical operation (raising to the second power) we have completely removed the unwanted edge effects of the distribution.
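
A change-of-variables sketch of why that can happen (my own illustration, under an idealised assumption about the shape of the LR Correlation distribution, not a claim from the article): suppose |R| of a long random walk has the edge-heavy density f(r) = 2r on (0,1]. Then

\[
  P\!\left(R^2 \le u\right) = P\!\left(|R| \le \sqrt{u}\right) = \int_0^{\sqrt{u}} 2r\,dr = u, \qquad u \in [0,1],
\]

i.e. for exactly this kind of density concentrated near the edges, squaring yields a uniform ("flat") distribution. Whether the real distribution matches that assumption is what the animation below checks.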

So I decided to confirm it experimentally with an animation (rather than take it on faith)

#include <Graphics\Graphic.mqh> 
#include <Math\Stat\Normal.mqh>
#include <Math\Alglib\Alglib.mqh>

// Returns Y-values of the line (y(x)=a*x+b)
void GetLine( const double a, const double b, const int Amount, double &Result[] )
{
  ArrayResize(Result, Amount);
  
  for (int i = 0; i < Amount; i++)
    Result[i] = a * i + b;    
}

// Returns a linear regression via CAlglib::LRBuild + CAlglib::LRUnpack
void GetLinearRegression( const double &Array[], double &Result[] )
{
  const int Total = ArraySize(Array);
  
  CMatrixDouble XY(Total, 2);
  
  for (int i = 0; i < Total; i++)
  {
    XY[i].Set(0, i);
    XY[i].Set(1, Array[i]);
  }
  
  int retcode;
  
  CLinearModelShell lm;
  CLRReportShell    ar;
//--- arrays for storing regression results
  double lr_coeff[];
//--- calculation of linear regression coefficients
  CAlglib::LRBuild(XY, Total, 1, retcode, lm, ar);
//--- obtaining linear regression coefficients
  CAlglib::LRUnpack(lm, lr_coeff, retcode);

  GetLine(lr_coeff[0], lr_coeff[1], Total, Result);      
}

// Calculates the sign-adjusted R of a series relative to its linear regression
double GetCustomR( const double &Array[] )
{
  double Estimate[];
   
  GetLinearRegression(Array, Estimate);
   
  const double R = CAlglib::PearsonCorr2(Array, Estimate);

  return((Array[0] > Array[ArraySize(Array) - 1]) ? -R : R);
}

// Fills Result with the sign-adjusted R of Amount random-walk vectors of length VectorSize
void GetRandomCustomR( const int Amount, const int VectorSize, double &Result[] )
{
  double Random[];
  double Sum[];
  
  MathSrand(GetTickCount()); 

  ArrayResize(Result, Amount);
  
  for (int i = 0; i < Amount; i++)
  {
    MathRandomNormal(0, 1, VectorSize, Random); 
    MathCumulativeSum(Random, Sum);
    
    Result[i] = GetCustomR(Sum);
  }  
}

// Plots the (X, Y) histogram on a chart object with the given title
void ToChart( const double &X[], const double &Y[], const string Str = NULL, const int X0 = 0, const int Y0 = 0, const int Width = 780, const int Height = 380 )
{
  static const string Name = __FILE__;
  
  CGraphic Graphic; 

  if (ObjectFind(0, Name)<0) 
    Graphic.Create(0, Name, 0, X0, Y0, Width, Height); 
  else 
    Graphic.Attach(0, Name); 

  Graphic.BackgroundMain(Str); 
  Graphic.BackgroundMainSize(16); 

  Graphic.CurveAdd(X, Y, CURVE_HISTOGRAM).HistogramWidth(6);
  
  Graphic.CurvePlotAll(); 
  Graphic.Update();  
}

// Signed power: raises |x| to the power Pow while keeping the sign of x
void MathPow( double &Result[], const double &Array[], const double Pow )
{
  const int Size = ArrayResize(Result, ArraySize(Array));
  
  for (int i = 0; i < Size; i++)
    Result[i] = (Array[i] < 0) ? -MathPow(-Array[i], Pow) : MathPow(Array[i], Pow);
}

// https://www.mql5.com/en/docs/standardlibrary/mathematics/stat/normal
//+------------------------------------------------------------------+ 
//| Calculate frequencies for data set| 
//+------------------------------------------------------------------+ 
bool CalculateHistogramArray(const double &data[],double &intervals[],double &frequency[], 
                             double &maxv,double &minv,const int cells=10) 
  { 
   if(cells<=1) return (false); 
   int size=ArraySize(data); 
   if(size<cells*10) return (false); 
   minv=data[ArrayMinimum(data)]; 
   maxv=data[ArrayMaximum(data)]; 
   double range=maxv-minv; 
   double width=range/cells; 
   if(width==0) return false; 
   ArrayResize(intervals,cells); 
   ArrayResize(frequency,cells); 
//--- set the centres of the intervals 
   for(int i=0; i<cells; i++) 
     { 
      intervals[i]=minv+(i+0.5)*width; 
      frequency[i]=0; 
     } 
//--- fill in the interval frequencies 
   for(int i=0; i<size; i++) 
     { 
      int ind=int((data[i]-minv)/width); 
      if(ind>=cells) ind=cells-1; 
      frequency[ind]++; 
     } 
   return (true); 
  } 

void DistributionToChart( const double &Array[], const string Str = NULL, const int NCells = 51 )
{
  double X[];          // histogram interval centres 
  double Y[];          // number of values from the sample that fall into the interval 
  double Max, Min;     // maximum and minimum values in the sample 
  
  CalculateHistogramArray(Array, X, Y, Max, Min, NCells);   

  ToChart(X, Y, Str);
}

void OnInit() 
{   
  double R[];
  
  GetRandomCustomR(1000, 10000, R);
  
  double Array[];

  const int Max = 50;
  
  // Animate the distribution of sign(R)*|R|^Pow as Pow sweeps from 0.1 up to 5.0 and back down
  while (!IsStopped())
    for (int i = 1; !IsStopped() && i < (Max << 1); i++)
    {
      const double Pow = (i > Max) ? ((Max << 1) - i) * 0.1 : i * 0.1;
      
      MathPow(Array, R, Pow);
      DistributionToChart(Array, "Distribution of R^" + DoubleToString(Pow, 1));      
      
      Sleep(100);
    }
}



It does seem to be the case.

 

 

Fig. 21: R^2 value as a custom optimisation criterion

Where in MQL is the LR Correlation that is shown in the picture? Or are this and many other parameters calculated only for single runs, which is why they are absent from ENUM_STATISTICS?

If so, the suggestion is to calculate this parameter yourself, following the reasonable considerations given in this article: from the equity without MM (money management), and squared.


PS I measured how long it takes to calculate GetCustomR for an array of a million values (like an equity curve): 2.5 seconds. That is a lot. Almost all of it is spent on the LR calculation (CAlglib::LRBuild + CAlglib::LRUnpack). For some reason, the flawed LR via CLinReg::LRLine is an order of magnitude faster. If it were fixed up, it would become tolerable as a custom optimisation criterion during optimisations.
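
For what it's worth, here is a sketch (my own helper, not from the article) of how the signed R^2 can be computed in a single O(n) pass without Alglib, using the fact that for a least-squares fit on the index R^2 = corr(index, value)^2:

// Sketch: signed R^2 of a series against its linear regression on the index,
// computed in one pass via R^2 = corr(index, value)^2.
// Sign convention as in the article: negative if the series ends below where it started.
double SignedR2( const double &Y[] )
{
  const int n = ArraySize(Y);

  if (n < 2)
    return(0);

  double SumX = 0, SumY = 0, SumXY = 0, SumX2 = 0, SumY2 = 0;

  for (int i = 0; i < n; i++)
  {
    const double y = Y[i] - Y[0]; // centre on the first value for numerical stability

    SumX  += i;
    SumY  += y;
    SumXY += i * y;
    SumX2 += (double)i * i;
    SumY2 += y * y;
  }

  const double CovXY = n * SumXY - SumX * SumY;
  const double VarX  = n * SumX2 - SumX * SumX;
  const double VarY  = n * SumY2 - SumY * SumY;

  if (VarX == 0 || VarY == 0)
    return(0);

  const double R2 = CovXY * CovXY / (VarX * VarY);

  return((Y[n - 1] < Y[0]) ? -R2 : R2);
}

On a million-element array this is a single loop, so it should be much cheaper than building a CMatrixDouble for LRBuild/LRUnpack; the result is worth cross-checking against the Alglib value before relying on it.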

MQL5 documentation: Standard Constants, Enumerations and Structures / Environment State / Testing Statistics (www.mql5.com)
 
Dennis Kirichenko:

Oh! I always thought it was 100. Thanks, interesting article.

Yes, that's a number I've come across in reputable books on R and statistics. Sorry, I couldn't find the reference.

Dennis Kirichenko:

It is also common to perform significance tests on the regression coefficient. Even Alglib has them :-)

Obviously, those tests are for a normal distribution, while what we obtained is a uniform one.

PearsonCorrelationSignificance(), SpearmanRankCorrelationSignificance().

Thanks for the link, I'll remember it.

 
fxsaber:

PS An incorrect statement:

R^2 is nothing but the correlation between a graph and its linear model

//-- Find R^2 and its sign
   double r2 = MathPow(corr, 2.0);

Yes indeed, a gross error in wording. I'm surprised I even wrote such a thing. I will correct it.

When you look at the rest of the MQL code, it's unclear why it is included at all, because it is completely unreadable without knowledge of CStrategy

CStrategy is needed only for collecting the required data. The main code, as was rightly noted, is the actual calculation of R^2.

Code for calculating the "equity" suitable for R^2. It is written in MT4 style; it is not difficult to port it to MT5...

Let's study it.

 
Maxim Dmitrievsky:

I agree with this; everything else will have to be ripped out of the classes to add it to your own system... it would be better to have everything in separate functions or in a separate include file.

Take your equity calculation (or the code posted by fxsaber), get it as a double array, and feed it into the R^2 calculation function. Nothing needs to be ripped out, and neither the classes nor CStrategy are needed.
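
As a rough sketch of that idea in the tester (my own wiring, not code from the article): rebuild the balance curve from the deal history and return the signed R^2 from the earlier sketch as the custom optimisation criterion.

// Rough sketch: rebuild the balance curve from the deal history and return
// its signed R^2 as the custom optimisation criterion ("Custom max" mode).
double OnTester()
{
  if (!HistorySelect(0, TimeCurrent()))
    return(0);

  const int Deals = HistoryDealsTotal();

  double Balance[];
  ArrayResize(Balance, Deals + 1);

  int Count = 0;
  Balance[Count++] = TesterStatistics(STAT_INITIAL_DEPOSIT);

  for (int i = 0; i < Deals; i++)
  {
    const ulong Ticket = HistoryDealGetTicket(i);
    const double Result = HistoryDealGetDouble(Ticket, DEAL_PROFIT) +
                          HistoryDealGetDouble(Ticket, DEAL_SWAP) +
                          HistoryDealGetDouble(Ticket, DEAL_COMMISSION);

    if (Result != 0) // keep only deals that change the balance
    {
      Balance[Count] = Balance[Count - 1] + Result;
      Count++;
    }
  }

  ArrayResize(Balance, Count);

  return(SignedR2(Balance)); // SignedR2 from the sketch above
}

A real implementation would probably need to filter deal types more carefully (balance operations, partial closes and so on); this only shows the wiring.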

 
fxsaber:

I don't see how R^2 can take the negative values shown in the second graph. There are also questions about the first graph: if the linear regression is built correctly, it seems Pearson's correlation (LR) should not be negative, yet in the graph it is. Where am I wrong?

Got it. I'm not wrong anywhere; the graphs simply use a custom R^2 and LR: the real value is multiplied by -1 if the last element of the series is less than the first. It would be good to mention this before the graphs.

It is hidden in the article:

Our script calculates both LR Correlation and R^2. We will see the difference between them a little later. There is a small addition to the script. We will multiply the resulting correlation coefficient by the final sign of the synthetic graph. If we end up with a result less than zero, the correlation will be negative, if more - positive. This is done in order to quickly and easily separate negative outcomes from positive ones without having to resort to other statistics. This is how LR Correlation works in MetaTrader 5, and R^2 will be built according to the same principle.

Perhaps I should have written about it in other places as well, and more than once.