Ensemble methods to enhance numerical predictions in MQL5

MetaTrader 5 — Statistics and analysis | 13 December 2024 at 14:46

Francis Dube

Introduction

Machine learning often produces multiple predictive models of varying quality. Practitioners typically evaluate these models and select the best-performing one for real-world applications. However, this article explores an alternative approach: repurposing seemingly inferior models by combining their outputs to potentially enhance overall predictive performance. We will examine various techniques for combining predictions and demonstrate their implementation in pure MQL5. Finally, we will compare these methods and discuss their suitability for different scenarios.

To formalize the concept of combining model predictions, let’s introduce some key notation. Consider a training set consisting of K data points, each represented as a pair (x_i, y_i), where x_i is a predictor vector and y_i is the corresponding scalar response variable we aim to predict. Suppose we have N trained models, each capable of making predictions. When presented with a predictor x, model n generates a prediction denoted as f_n(x). Our goal is to construct a consensus function f(x) that effectively combines these N individual predictions, yielding a more accurate overall prediction than any single model.

Consensus function

This consensus function, often referred to as an ensemble or meta-model, has the potential to outperform its constituent models. In this exploration, we will delve into diverse techniques for constructing effective ensemble models and assess their practical implementation and performance in MQL5.

Ensembles based on averaged predictions

One of the simplest techniques for combining numeric predictions is simple averaging. By calculating the mean of multiple predictions, we can often achieve a more accurate and robust estimate than relying on any single model. This approach is both computationally efficient and easy to implement, making it a practical choice for a wide range of applications. The simplicity of the arithmetic average is its greatest strength. Unlike more complex ensemble methods that require the estimation of multiple parameters, averaging is inherently resistant to overfitting. Overfitting occurs when a model becomes too closely tied to the specific characteristics of the training data, which compromises its ability to generalize to unseen data.

By avoiding parameter estimation entirely, the simple average sidesteps this issue, which helps it remain stable even on noisy or small datasets. In contrast, other ensemble techniques, as we will explore later, often involve parameter tuning and optimization, which can introduce a degree of susceptibility to overfitting. Thus, while averaging may lack the sophistication of advanced ensemble methods, its reliability and ease of use make it an essential tool in the ensemble learning toolkit.

A fundamental mathematical principle, rooted in the Cauchy-Schwarz inequality, provides the theoretical foundation for the function that defines an ensemble of averaged predictions. This inequality states that the square of the sum of N numbers is always less than or equal to N times the sum of their squares.

Cauchy derived inequality

Now, consider a vector of predictors x used to predict a dependent variable y. Let a_n in the inequality be the error made by model n when predicting y from x, so that a_n = f_n(x) - y. If f(x) is defined as the average of the N predictions, the sum of these errors equals N(f(x) - y). Substituting this into the left side of the Cauchy-derived inequality and then dividing both sides by N^2, we arrive at the fundamental inequality that underpins averaging as an ensemble method:

Average Ensemble

The sum on the right side of the inequality above accumulates the squared errors of the individual models; dividing it by the number of component models gives their mean squared error (MSE). Meanwhile, the left side represents the squared error of the consensus model, which is derived from the mean of the individual predictions.

Mathematically, this inequality states that, for any set of predictors and targets, the squared error of the mean prediction will never exceed the mean squared error of the individual predictions. Equality is achieved only when the prediction errors of all individual models are identical.
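Written out step by step, the argument takes the following form. It assumes, as in the text, that a_n = f_n(x) - y and that f(x) is the arithmetic mean of the N component predictions.

```latex
\begin{align*}
\Bigl(\sum_{n=1}^{N} a_n\Bigr)^{2} &\le N \sum_{n=1}^{N} a_n^{2} \\
\sum_{n=1}^{N} a_n = \sum_{n=1}^{N} \bigl(f_n(x) - y\bigr) &= N\bigl(f(x) - y\bigr),
  \qquad f(x) = \frac{1}{N}\sum_{n=1}^{N} f_n(x) \\
N^{2}\bigl(f(x) - y\bigr)^{2} &\le N \sum_{n=1}^{N} \bigl(f_n(x) - y\bigr)^{2} \\
\bigl(f(x) - y\bigr)^{2} &\le \frac{1}{N} \sum_{n=1}^{N} \bigl(f_n(x) - y\bigr)^{2}
\end{align*}
```

As a quick check with two models whose errors are +2 and -1, the averaged prediction has an error of 0.5 and a squared error of 0.25, while the mean squared error of the two models is (4 + 1) / 2 = 2.5.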

Of course, this benefit does not come without limitations. The effectiveness of averaging depends significantly on the nature of the component models. If all models have similar predictive power, averaging their predictions is often a reasonable and effective approach. However, problems can arise when the predictive power of the component models varies widely. In such cases, averaging may dilute the contributions of stronger models while overemphasizing weaker ones, potentially reducing the overall predictive performance of the ensemble.

The code that implements averaging for ensembles is encapsulated in the CAvg class, defined in ensemble.mqh. This class, along with all other classes implementing ensemble methods, relies on the user supplying a collection of pretrained models. These models must adhere to the IModel interface, which is defined as follows:

//+------------------------------------------------------------------+
//| IModel interface defining methods for manipulation of learning   |
//| algorithms                                                       |
//+------------------------------------------------------------------+
interface IModel
  {
//train a model
   bool train(matrix &predictors,matrix&targets);
//make a prediction with a trained model
   double forecast(vector &predictors);
  };

The IModel interface specifies two methods:

train(): This method contains the logic for training a model.
forecast(): This method defines the operations for making predictions based on new input data.

//+------------------------------------------------------------------+
//| Compute the simple average of the predictions                    |
//+------------------------------------------------------------------+
class CAvg
  {

public:
                     CAvg(void) ;
                    ~CAvg(void) ;
   double            predict(vector &inputs, IModel* &models[]) ;

  } ;
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
CAvg::CAvg(void)
  {
  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
CAvg::~CAvg(void)
  {
  }
//+------------------------------------------------------------------+
//|  Make a prediction by using consensus from multiple models       |
//+------------------------------------------------------------------+
double CAvg::predict(vector &inputs, IModel* &models[])
  {
   double output = 0.0 ;

   for(uint imodel=0 ; imodel<models.Size() ; imodel++)
     {
      output +=models[imodel].forecast(inputs) ;
     }

   output /= double(models.Size()) ;
   return output;
  }

The CAvg class includes a predict() method, which is invoked with a vector of input data and an array of pretrained component models. This method returns a scalar value representing the consensus prediction. In the case of the CAvg class, the consensus prediction is calculated as the mean prediction derived from the supplied array of models. By adhering to this design, the CAvg class ensures flexibility and modularity, allowing users to seamlessly integrate various model types into their ensemble methods.
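The following script is a minimal sketch of how CAvg might be called. The CConstModel class is hypothetical and exists only to make the arithmetic easy to follow, and the include path assumes that ensemble.mqh sits next to the script and provides both IModel and CAvg.

```mql5
#include "ensemble.mqh"   // assumed location of IModel and CAvg

// Hypothetical component model that always forecasts a fixed value
class CConstModel : public IModel
  {
private:
   double            m_value;
public:
                     CConstModel(double value) { m_value = value; }
   // Nothing to learn for a constant model
   bool              train(matrix &predictors,matrix &targets) { return true; }
   double            forecast(vector &predictors) { return m_value; }
  };

void OnStart()
  {
   // Three stand-in "pretrained" models
   IModel *models[];
   ArrayResize(models,3);
   models[0] = new CConstModel(1.0);
   models[1] = new CConstModel(2.0);
   models[2] = new CConstModel(6.0);

   vector inputs = vector::Zeros(2);   // ignored by the constant models

   CAvg avg;
   // Consensus is the mean of the forecasts: (1.0 + 2.0 + 6.0) / 3 = 3.0
   Print("Consensus prediction: ", avg.predict(inputs,models));

   // The caller owns the component models and must release them
   for(int i=0; i<ArraySize(models); i++)
      delete models[i];
  }
```

Because CAvg holds no state, the same object can be reused with any array of models that implement IModel.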

Unconstrained linear combinations of predictive models

When faced with a set of models with widely varying predictive qualities, one ensemble method that can be adopted is ordinary linear regression. The idea is to calculate the consensus prediction as a weighted sum of the component model predictions, including a constant term to account for any bias.

Linear regression based ensemble
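In symbols, the regression-based consensus can be written as shown here. The names w_0 for the constant term and w_n for the weight of model n are notation assumed for illustration.

```latex
f(x) = w_0 + \sum_{n=1}^{N} w_n f_n(x),
\qquad
(w_0, \dots, w_N) = \arg\min_{w} \sum_{i=1}^{K} \bigl(y_i - f(x_i)\bigr)^{2}
```

The weights are fitted by least squares over the K training cases, with the component predictions f_n(x_i) playing the role of regressors.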

This ensemble method is implemented in the CLinReg class. The constructor, destructor, and predict() methods share the same signatures as those of the CAvg class described earlier.

//+------------------------------------------------------------------+
//| Compute the linear regression of the predictions                 |
//+------------------------------------------------------------------+
class CLinReg
  {

public:

                     CLinReg(void) ;
                    ~CLinReg() ;
   bool              fit(matrix & train_vars, vector &train_targets,IModel* &models[]);

   double            predict(vector &inputs, IModel* &models[]) ;

private:
   OLS *m_linreg ;   // The linear regression object
  } ;
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
CLinReg::CLinReg(void)
  {
   m_linreg = new OLS();
  }
//+------------------------------------------------------------------+
//| Fit the consensus model from saved models                        |
//+------------------------------------------------------------------+
bool CLinReg::fit(matrix &train_vars,vector &train_targets,IModel* &models[])
  {

   matrix independent(train_vars.Rows(),models.Size()+1);

   for(ulong i=0 ; i<independent.Rows() ; i++)     // Build the design matrix
     {
      independent[i][models.Size()] = 1.0;
      vector ins = train_vars.Row(i);
      for(uint imodel=0 ; imodel<models.Size() ; imodel++)
         independent[i][imodel] = models[imodel].forecast(ins) ;

     }
   return m_linreg.Fit(train_targets,independent);
  }
//+------------------------------------------------------------------+
//|  Destructor                                                      |
//+------------------------------------------------------------------+
CLinReg::~CLinReg(void)
  {
   if(CheckPointer(m_linreg)==POINTER_DYNAMIC)
      delete m_linreg ;
  }
//+------------------------------------------------------------------+
//| Predict                                                          |
//+------------------------------------------------------------------+
double CLinReg::predict(vector &inputs, IModel* &models[])
  {
   vector args = vector::Zeros(models.Size());

   for(uint i = 0; i<models.Size(); i++)
      args[i] = models[i].forecast(inputs);

   return m_linreg.Predict(args);
  }

However, the CLinReg class introduces a fit() method, which specifies the operations for training the consensus model.

The fit() method takes as input:

A matrix of predictors.
A vector of targets.
An array of component models.

Within fit(), an instance of the OLS class is used to represent the consensus regression model. The matrix variable independent serves as the design matrix: each row holds the predictions made by the individual component models for one training case, augmented by a constant term (a column of ones). When the predict() method of CLinReg is invoked, it feeds the component models' predictions for the new input into the consensus regression model and returns the result.

Combining models as a weighted sum of component predictions works well in specific, rare scenarios. However, this approach is often overlooked in real-world applications for two main reasons:

Risk of Overfitting: The weights in the consensus model are parameters that must be optimized. If the ensemble includes many component models, the optimization process can lead to significant overfitting, reducing the model's ability to generalize to unseen data.
Collinearity: If two or more models produce similar predictions, collinearity can cause instability in the weight estimates. When the predictions are nearly identical on the training data, the fit effectively determines only the sum of the corresponding weights, so the individual weights can take large offsetting values. The fitted model then implicitly assumes that these models will keep behaving alike.

However, this assumption often breaks down in real-world scenarios. When an out-of-sample case occurs, models that previously generated similar predictions may react differently, potentially leading the consensus model to produce extreme and unreliable results.
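A small hypothetical example shows how this can happen. Suppose two models agree on every training case, so the fit only pins down the sum of their weights, and the regression happens to settle on w_1 = 10 and w_2 = -9 (illustrative values, with no constant term).

```latex
\begin{align*}
\text{in sample:}\quad     & f_1 = 5,\; f_2 = 5 && \Rightarrow\quad 10 \cdot 5 - 9 \cdot 5 = 5 \\
\text{out of sample:}\quad & f_1 = 5,\; f_2 = 6 && \Rightarrow\quad 10 \cdot 5 - 9 \cdot 6 = -4
\end{align*}
```

Any pair of weights with w_1 + w_2 = 1 fits the training data equally well, yet with this pair a one-unit disagreement between the models moves the consensus by nine units.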

Constrained linear combinations of biased models

Using unconstrained regression as a basis for combining multiple predictive models can sometimes lead to an unstable model with extreme weights. This issue typically arises when regression coefficients take opposite signs so that they offset each other while still fitting the data well. For example, one coefficient of a correlated pair of models can be driven to a large positive value only if its counterpart is driven to a correspondingly large negative value. To prevent this, we can constrain the regression coefficients to be nonnegative. This approach also effectively reduces the degrees of freedom in the optimization process, making the model more stable and less prone to overfitting.

This ensemble method is implemented in the Cbiased class. It includes the now familiar fit() and predict() methods found in other ensemble implementations.

//+------------------------------------------------------------------+
//|Compute the optimal linear combination of the predictions         |
//|subject to the constraints that the weights are all nonnegative.  |
//|A constant term is also included.                                 |
//|This is appropriate for biased predictors                         |
//+------------------------------------------------------------------+
class Cbiased:public PowellsMethod
  {

public:

                     Cbiased(void) ;
                    ~Cbiased() ;
   bool              fit(matrix & train_vars, vector &train_targets,IModel* &models[]);
   double            predict(vector &inputs,IModel* &models[]) ;

private:
   vector m_coefs ;    // Computed coefficients here
   int biased_ncases ;
   int biased_nvars ;
   matrix biased_x ;
   vector biased_y ;
   virtual double    func(vector &p,int n=0);
  } ;
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
Cbiased::Cbiased(void)
  {
  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
Cbiased::~Cbiased(void)
  {
  }
//+------------------------------------------------------------------+
//| Function to be optimized                                         |
//+------------------------------------------------------------------+
double Cbiased::func(vector &p,int n = 0)
  {

   double  err, pred,diff, penalty ;
// Compute criterion
   err = 0.0 ;
   for(int i=0 ; i<biased_ncases ; i++)
     {
      pred = p[p.Size()-1] ;                    // Will cumulate prediction
      for(int j=0 ; j<biased_nvars ; j++)       // For all model outputs
         pred += biased_x[i][j] * p[j] ;        // Weight them per call
      diff = pred - biased_y[i] ;               // Predicted minus true
      err += diff * diff ;                      // Cumulate squared error
     }

   penalty = 0.0 ;
   for(int j=0 ; j<biased_nvars ; j++)
     {
      if(p[j] < 0.0)
         penalty -= 1.e30 * p[j] ;
     }

   return err + penalty ;
  }
//+------------------------------------------------------------------+
//| Fit the consensus model                                          |
//+------------------------------------------------------------------+
bool Cbiased::fit(matrix & train_vars, vector &train_targets,IModel* &models[])
  {
   biased_ncases = int(train_vars.Rows());
   biased_nvars = int(models.Size());
   biased_x = matrix::Zeros(biased_ncases,biased_nvars);

   biased_y = train_targets;

   m_coefs = vector::Zeros(biased_nvars+1);

   for(int i = 0; i<biased_ncases; i++)
     {
      vector ins = train_vars.Row(i);
      for(int j = 0; j<biased_nvars; j++)
         biased_x[i][j] = models[j].forecast(ins);
     }

   m_coefs.Fill(1.0/double(biased_nvars));
   m_coefs[m_coefs.Size()-1] = 0.0;

   int iters = Optimize(m_coefs,int(m_coefs.Size()));

// The coefficients are kept exactly as optimized. func() imposes no
// sum-to-one constraint, so rescaling them would distort the fit.

   return true;
  }
//+------------------------------------------------------------------+
//| Make prediction with consensus model                             |
//+------------------------------------------------------------------+
double Cbiased::predict(vector &inputs,IModel* &models[])
  {
   double output=m_coefs[m_coefs.Size()-1];     // Start from the constant term
   for(uint imodel=0 ; imodel<models.Size() ; imodel++)
     {
      output += m_coefs[imodel] * models[imodel].forecast(inputs);
     }
   return output;
  }

However, the key difference in Cbiased lies in how the weights are optimized.

The optimization of weights is done using Powell’s method for function minimization. This is why the Cbiased class is a descendant of the PowellsMethod class. The criterion function is implemented in the func() method, which iterates through the training data, accumulating squared errors using the supplied weights. For each sample in the dataset:

The predictions from the component models are weighted according to the current weights and a constant term.
The squared differences between the predictions and the target values are summed.

The criterion function ends by checking whether any of the trial weights are negative. Each negative weight adds a large penalty proportional to its magnitude, which steers the optimizer back toward nonnegative weights. The function returns the total error plus any penalty from the negative weights.

This type of ensemble method is most appropriate when it is known that some component models are biased. Bias in this context refers to models that consistently produce predictions that are either too high or too low compared to their corresponding target values. The constant term absorbs such systematic offsets, while the nonnegativity constraint keeps the weights from taking extreme values, leading to a more balanced ensemble prediction. In the next section, we will present a method that is suitable for model sets that exhibit little to no bias, where the focus is on aggregating predictions from models with comparable performance.
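The criterion computed by func() can be summarized in one expression. Here w_0 denotes the constant term (stored as the last element of p in the code) and w_j the weight of model j; this notation is assumed for illustration.

```latex
E(w) = \sum_{i=1}^{K} \Bigl( w_0 + \sum_{j=1}^{N} w_j f_j(x_i) - y_i \Bigr)^{2}
       \;+\; 10^{30} \sum_{j=1}^{N} \max\bigl(0,\, -w_j\bigr)
```

The first term is the accumulated squared error over the K training cases. The second term is zero whenever all model weights are nonnegative, and the constant term is left unpenalized.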

Constrained combinations of unbiased models

When a set of component models is known to have no significant bias in their predictions, there is no need to include a constant term in the consensus model. Removing the constant term helps to reduce the model’s propensity to overfit the data. As in the previous method, the weights are constrained to be nonnegative. In addition, a further constraint is imposed on the weights: they must sum to one. This constraint provides two key benefits:

Ensuring an unbiased consensus model: As long as the component models are reasonably unbiased, requiring the weights to sum to one ensures that the consensus model remains unbiased as well.
Interpolation between predictions: Combined with nonnegative weights, the sum-to-one constraint guarantees that the consensus prediction is an interpolation between the component models' predictions. The final prediction therefore always lies between the smallest and largest individual predictions, preventing the extreme outcomes that could arise from extreme weights.

This is illustrated by the following equation:

Constrained Linear Model
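Stated in symbols, with w_n as assumed notation for the weight of model n, the constrained model and the interpolation property it implies are:

```latex
f(x) = \sum_{n=1}^{N} w_n f_n(x),
\qquad w_n \ge 0,
\qquad \sum_{n=1}^{N} w_n = 1
\quad\Longrightarrow\quad
\min_{n} f_n(x) \;\le\; f(x) \;\le\; \max_{n} f_n(x)
```

The bound follows because a weighted average with nonnegative weights that sum to one cannot fall outside the range of the values being averaged.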

The code implementing this ensemble method is largely identical to the previous implementations. The main difference with the CUnbiased class lies in the function being minimized.

//+------------------------------------------------------------------+
//|Compute the optimal linear combination of the predictions         |
//|subject to the constraints that the weights are all nonnegative   |
//|and they sum to one.  This is appropriate for unbiased predictors.|
//+------------------------------------------------------------------+
class CUnbiased:public PowellsMethod
  {

public:

                     CUnbiased(void) ;
                    ~CUnbiased() ;
   bool              fit(matrix & train_vars, vector &train_targets,IModel* &models[]);
   double            predict(vector &inputs,IModel* &models[]) ;

private:
   vector m_coefs ;    // Computed coefficients here
   int unbiased_ncases ;
   int unbiased_nvars ;
   matrix unbiased_x ;
   vector unbiased_y ;
   virtual double    func(vector &p,int n=0);
  } ;
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
CUnbiased::CUnbiased(void)
  {
  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
CUnbiased::~CUnbiased(void)
  {
  }
//+------------------------------------------------------------------+
//| Function to be optimized                                         |
//+------------------------------------------------------------------+
double CUnbiased::func(vector &p,int n = 0)
  {

   double sum, err, pred,diff, penalty ;

// Normalize weights to sum to one
   sum = p.Sum() ;

   if(sum < 1.e-60)    // Should almost never happen
      sum = 1.e-60 ;   // But be prepared to avoid division by zero

   vector   unbiased_work = p / sum ;

// Compute criterion
   err = 0.0 ;
   for(int i=0 ; i<unbiased_ncases ; i++)
     {
      pred = 0.0 ;                                       // Will cumulate prediction
      for(int j=0 ; j<unbiased_nvars ; j++)              // For all model outputs
         pred += unbiased_x[i][j] * unbiased_work[j] ;   // Weight them per call
      diff = pred - unbiased_y[i] ;                      // Predicted minus true
      err += diff * diff ;                               // Cumulate squared error
     }

   penalty = 0.0 ;
   for(int j=0 ; j<unbiased_nvars ; j++)
     {
      if(p[j] < 0.0)
         penalty -= 1.e30 * p[j] ;
     }

   return err + penalty ;
  }
//+------------------------------------------------------------------+
//| Fit the consensus model                                          |
//+------------------------------------------------------------------+
bool CUnbiased::fit(matrix & train_vars, vector &train_targets,IModel* &models[])
  {
   unbiased_ncases = int(train_vars.Rows());
   unbiased_nvars = int(models.Size());
   unbiased_x = matrix::Zeros(unbiased_ncases,unbiased_nvars);

   unbiased_y = train_targets;

   m_coefs = vector::Zeros(unbiased_nvars);

   for(int i = 0; i<unbiased_ncases; i++)
     {
      vector ins = train_vars.Row(i);
      for(int j = 0; j<unbiased_nvars; j++)
         unbiased_x[i][j] = models[j].forecast(ins);
     }

   m_coefs.Fill(1.0/double(unbiased_nvars));

   int iters = Optimize(m_coefs);

   double sum = m_coefs.Sum();

   m_coefs/=sum;

   return true;
  }
//+------------------------------------------------------------------+
//| Make prediction with consensus model                             |
//+------------------------------------------------------------------+
double CUnbiased::predict(vector &inputs,IModel* &models[])
  {
   double output=0.0;
   for(uint imodel=0 ; imodel<models.Size() ; imodel++)
     {
      output += m_coefs[imodel] * models[imodel].forecast(inputs);
     }
   return output;
  }

This function incorporates the additional constraints discussed earlier, specifically the non-negativity of the weights and the requirement that they sum to one.

Variance-weighted combinations of predictive models

Another method for combining predictions from component models is based on optimal weights that are determined by each model's prediction accuracy. This technique involves allocating smaller weights to models with larger prediction errors and larger weights to models with smaller prediction errors. This method is particularly effective when there is significant variability in the quality of the component models. However, if the models are highly correlated, this technique may not be ideal, and another ensemble method should be considered.

The idea behind weighting according to model quality is rooted in the theory that if the models are unbiased and uncorrelated, assigning weights inversely proportional to the models' error variances will minimize the expected squared error.

Variance weighted ensemble

To achieve this, the relative weight for each model is calculated as the inverse of its squared error, and then the weights are scaled to ensure they sum to one.

Weight Equation
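Under the assumptions just stated, the variance-weighted consensus and its weights can be written as follows. The symbol \sigma_n^2 is assumed notation for the error variance of model n, estimated here from the training set.

```latex
f(x) = \sum_{n=1}^{N} w_n f_n(x),
\qquad
w_n = \frac{1/\sigma_n^{2}}{\sum_{m=1}^{N} 1/\sigma_m^{2}},
\qquad
\sigma_n^{2} \approx \frac{1}{K} \sum_{i=1}^{K} \bigl(f_n(x_i) - y_i\bigr)^{2}
```

Using the summed squared error in place of the mean gives the same weights, because the common factor 1/K cancels when the weights are scaled to sum to one.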

The ensemble of variance-weighted models is implemented in the CWeighted class. In the fit() method, for each training sample:

The prediction of each component model is computed.
The squared error of each prediction is accumulated.

//+------------------------------------------------------------------+
//| Compute the variance-weighted average of the predictions         |
//+------------------------------------------------------------------+
class CWeighted
  {
public:

                     CWeighted(void) ;
                    ~CWeighted() ;
   bool              fit(matrix & train_vars, vector &train_targets,IModel* &models[]);
   double            predict(vector &inputs,IModel* &models[]) ;

private:
   vector m_coefs ;    // Computed coefficients here

  };
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
CWeighted::CWeighted(void)
  {
  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
CWeighted::~CWeighted(void)
  {
  }
//+------------------------------------------------------------------+
//| Fit a consensus model                                            |
//+------------------------------------------------------------------+
bool CWeighted::fit(matrix &train_vars,vector &train_targets,IModel* &models[])
  {
   m_coefs = vector::Zeros(models.Size());

   m_coefs.Fill(1.e-60);
   double diff = 0.0;
   for(ulong i = 0; i<train_vars.Rows(); i++)
     {
      vector ins = train_vars.Row(i);
      for(ulong j = 0; j<m_coefs.Size(); j++)
        {
         diff = models[j].forecast(ins) - train_targets[i];
         m_coefs[j] += (diff*diff);
        }
     }

   m_coefs=1.0/m_coefs;

   m_coefs/=m_coefs.Sum();

   return true;
  }
//+------------------------------------------------------------------+
//| Make a prediction with the consensus model                       |
//+------------------------------------------------------------------+
double CWeighted::predict(vector &inputs,IModel* &models[])
  {
   double output = 0.0;

   for(uint i = 0; i<models.Size(); i++)
      output+=m_coefs[i]*models[i].forecast(inputs);

   return output;
  }

Once this is done for all training samples, the reciprocal of each model's total squared error is taken as its relative weight. The weights are then divided by their sum so that they total one. This approach gives models with lower errors more influence in the final ensemble prediction, which can improve accuracy, especially in scenarios where the models exhibit varying degrees of predictive accuracy.
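The weighting can be traced by hand with a small sketch. COffsetModel is a hypothetical stand-in for a trained model, and the include path assumes that ensemble.mqh sits next to the script and provides IModel and CWeighted.

```mql5
#include "ensemble.mqh"   // assumed location of IModel and CWeighted

// Hypothetical model: forecasts the first predictor plus a fixed offset
class COffsetModel : public IModel
  {
private:
   double            m_offset;
public:
                     COffsetModel(double offset) { m_offset = offset; }
   bool              train(matrix &predictors,matrix &targets) { return true; }
   double            forecast(vector &predictors) { return predictors[0] + m_offset; }
  };

void OnStart()
  {
   // Four training cases whose target equals the single predictor
   matrix x = {{1.0},{2.0},{3.0},{4.0}};
   vector y = {1.0,2.0,3.0,4.0};

   IModel *models[];
   ArrayResize(models,2);
   models[0] = new COffsetModel(1.0);   // always off by 1: total squared error 4
   models[1] = new COffsetModel(2.0);   // always off by 2: total squared error 16

   CWeighted ensemble;
   ensemble.fit(x,y,models);

   // Relative weights 1/4 and 1/16 scale to 0.8 and 0.2, so for an input
   // of 10 the consensus is approximately 0.8*11 + 0.2*12 = 11.2
   vector query = {10.0};
   Print("Consensus prediction: ", ensemble.predict(query,models));

   // The caller owns the component models and must release them
   for(int i=0; i<ArraySize(models); i++)
      delete models[i];
  }
```

The more accurate model receives four times the weight of the weaker one, in line with the ratio of their squared errors.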

Interpolated combinations based on General regression neural networks

The ensemble methods we have discussed so far work well when the consensus model is trained with clean data. However, when the training data is noisy, the model may suffer from poor generalization. One regression method that addresses this issue is the General Regression Neural Network (GRNN). The standout advantage of the GRNN over traditional regression is its reduced susceptibility to overfitting. This is because the parameters of GRNNs have relatively less impact on the model compared to traditional regression techniques. While this gain in generalization comes at the cost of some accuracy, GRNNs can model complex, non-linear relationships, providing a useful tool when the data exhibits such characteristics.

GRNNs produce predictions that are interpolations among the target values in the training data. The interpolation is governed by weights that reflect how much an out-of-sample case differs from each known in-sample case: the more similar the two are, the higher the relative weight assigned. Although a GRNN can be described as a smoothing operation due to its interpolation of unknown samples into the known sample space, its theoretical underpinnings are grounded in statistics.

When presented with a dataset comprising predictions from component models and their corresponding targets, the consensus prediction of a GRNN is the one that minimizes the expected squared error, which is given by the conditional expectation of the target given those predictions.

Conditional expectation

Since the joint density of the training data is typically unknown, we cannot directly use the formula for conditional expectation. Instead, we rely on estimates of the joint densities derived from the training data, which leads to the form of the GRNN depicted below.

GRNN formula
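For reference, the two expressions can be sketched as follows. Here x stands for the vector of component model predictions, x_i and y_i for the i-th training case, and \sigma_j for the sigma weight of input j. The kernel-weighted form is the standard GRNN estimator and is an assumption about the exact formula used in the article.

```latex
E[\,y \mid x\,] = \frac{\int y \, p(x, y) \, dy}{\int p(x, y) \, dy}
\qquad\qquad
\hat{y}(x) = \frac{\sum_{i=1}^{K} y_i \exp\bigl(-D(x, x_i)\bigr)}
                  {\sum_{i=1}^{K} \exp\bigl(-D(x, x_i)\bigr)},
\qquad
D(x, x_i) = \sum_{j} \Bigl( \frac{x_j - x_{ij}}{\sigma_j} \Bigr)^{2}
```

Training cases close to x have a small distance D and therefore dominate the weighted average, while a larger \sigma_j makes the estimate less sensitive to differences along input j.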

Before presenting the code for the GRNN-based ensemble method, we must first discuss the implementation of the GRNN. The code for the GRNN is defined in grnn.mqh, which contains the definition of the CGrnn class.

//+------------------------------------------------------------------+
//| General regression neural network                                |
//+------------------------------------------------------------------+
class CGrnn
  {
public:
                     CGrnn(void);
                     CGrnn(int num_outer, int num_inner, double start_std);
                    ~CGrnn(void);
   bool              fit(matrix &predictors,matrix &targets);
   vector            predict(vector &predictors);
   //double            get_mse(void);
private:
   bool              train(void);
   double            execute(void);
   ulong             m_inputs,m_outputs;
   int               m_inner,m_outer;
   double            m_start_std;
   ulong             m_rows,m_cols;
   bool              m_trained;
   vector            m_sigma;
   matrix            m_targets,m_preds;
  };

The GRNN implementation for regression tasks involves several key components. The constructor initializes parameters such as the number of inner and outer iterations and the starting standard deviation for the sigma weights.

//+------------------------------------------------------------------+
//| Default constructor                                              |
//+------------------------------------------------------------------+
CGrnn::CGrnn(void)
  {
   m_inner = 100;
   m_outer = 10;
   m_start_std = 3.0;
  }
//+------------------------------------------------------------------+
//| Parametric constructor                                           |
//+------------------------------------------------------------------+
CGrnn::CGrnn(int num_outer,int num_inner,double start_std)
  {
   m_inner = num_inner;
   m_outer = num_outer;
   m_start_std = start_std;
  }

The fit method stores the training data, including input predictors and target values, and initializes the sigma weights. It then trains the GRNN model by iteratively optimizing the sigma weights using a simulated annealing approach. During training, the sigma weights are perturbed, and the cross-validation error for the perturbed weights is calculated. The perturbation is accepted or rejected based on the error and a temperature parameter, with the temperature gradually reduced to focus the search.

//+------------------------------------------------------------------+
//| Fit data to a model                                              |
//+------------------------------------------------------------------+
bool CGrnn::fit(matrix &predictors,matrix &targets)
  {
   m_targets = targets;
   m_preds = predictors;
   m_trained = false;
   m_rows = m_preds.Rows();
   m_cols = m_preds.Cols();
   m_sigma = vector::Zeros(m_preds.Cols());

   if(m_targets.Rows() != m_preds.Rows())
     {
      Print(__FUNCTION__, " invalid inputs ");
      return false;
     }

   m_trained = train();

   return m_trained;
  }
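The annealing loop described above can be sketched as a stand-alone script. This is only an illustration under stated assumptions, not the article's train method: the helper names Objective, Uniform and Anneal are hypothetical, a toy quadratic stands in for the cross-validation error, the acceptance rule is the classic Metropolis criterion, and the cooling factor of 0.8 is an arbitrary choice.

```mql5
//--- Toy objective standing in for the cross-validation error (assumption)
double Objective(vector &sigma)
  {
   double s = 0.0;
   for(ulong j = 0; j < sigma.Size(); j++)
      s += (sigma[j] - 2.0) * (sigma[j] - 2.0);
   return s;
  }
//--- Uniform random number in [0,1]
double Uniform(void)
  {
   return MathRand() / 32767.0;
  }
//--- Hypothetical annealing loop: outer iterations cool, inner iterations perturb
vector Anneal(vector &start, int num_outer, int num_inner, double start_std)
  {
   vector best = start, current = start, trial;
   double best_err = Objective(best), cur_err = best_err;
   double temperature = start_std;

   for(int outer = 0; outer < num_outer; outer++)
     {
      for(int inner = 0; inner < num_inner; inner++)
        {
         trial = current;
         for(ulong j = 0; j < trial.Size(); j++)
           {
            trial[j] += temperature * (2.0 * Uniform() - 1.0);
            if(trial[j] < 1.e-6)
               trial[j] = 1.e-6;          // keep every sigma positive
           }
         double err = Objective(trial);
         //--- always accept improvements, sometimes accept a worse point
         if(err < cur_err || Uniform() < MathExp((cur_err - err) / temperature))
           {
            current = trial;
            cur_err = err;
           }
         if(cur_err < best_err)
           {
            best = current;
            best_err = cur_err;
           }
        }
      temperature *= 0.8;                 // assumed cooling schedule
     }
   return best;
  }
//--- Script entry point
void OnStart()
  {
   MathSrand(42);
   vector start = {3.0, 3.0};
   vector best = Anneal(start, 10, 100, 3.0);
   Print("sigma after annealing: ", best, "  objective: ", Objective(best));
  }
```

In the class itself the objective would be the cross-validation error of the GRNN for the candidate sigma weights, and the loop counts would come from m_outer and m_inner.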

The predict method calculates the distance between the input vector and each training data point, with each coordinate difference divided by the sigma weight of its predictor. It converts that distance into a weight through a Gaussian-type kernel, exp(-distance), and computes the predicted output as the weighted average of the training targets. The sigma weights therefore control how quickly the influence of a training point decays as the input moves away from it along each predictor.

//+------------------------------------------------------------------+
//| Make a prediction with a trained model                           |
//+------------------------------------------------------------------+
vector CGrnn::predict(vector &predictors)
  {
   if(!m_trained)
     {
      Print(__FUNCTION__, " no trained model available for predictions ");
      return vector::Zeros(1);
     }

   if(predictors.Size() != m_cols)
     {
      Print(__FUNCTION__, " invalid inputs ");
      return vector::Zeros(1);
     }

   vector output  = vector::Zeros(m_targets.Cols());
   double diff,dist,psum=0.0;

   for(ulong i = 0; i<m_rows; i++)
     {
      dist  = 0.0;
      for(ulong j = 0; j<m_cols; j++)
        {
         diff  = predictors[j]  - m_preds[i][j];
         diff/= m_sigma[j];
         dist += (diff*diff);
        }
      dist  = exp(-dist);
      if(dist< EPS1)
         dist = EPS1;
      for(ulong k = 0; k<m_targets.Cols(); k++)
         output[k] += dist * m_targets[i][k];
      psum += dist;
     }
   output/=psum;
   return output;
  }
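To make the weighted average concrete, here is a minimal stand-alone sketch of the same calculation on two training points. KernelPredict and WEIGHT_FLOOR are hypothetical names used only for this illustration; WEIGHT_FLOOR is an assumed stand-in for EPS1, whose value is not shown in this section.

```mql5
#define WEIGHT_FLOOR 1.0e-40   // assumed stand-in for the article's EPS1

//--- Hypothetical helper: kernel-weighted average of the training targets
double KernelPredict(matrix &train_x, vector &train_y, vector &sigma, vector &x)
  {
   double wsum = 0.0, ysum = 0.0;
   for(ulong i = 0; i < train_x.Rows(); i++)
     {
      double dist = 0.0;
      for(ulong j = 0; j < train_x.Cols(); j++)
        {
         double diff = (x[j] - train_x[i][j]) / sigma[j];
         dist += diff * diff;
        }
      double w = MathExp(-dist);          // closer points receive larger weights
      if(w < WEIGHT_FLOOR)
         w = WEIGHT_FLOOR;                // avoid a zero denominator
      ysum += w * train_y[i];
      wsum += w;
     }
   return ysum / wsum;
  }
//--- Script entry point
void OnStart()
  {
   matrix train_x = {{0.0}, {1.0}};       // two training points, one predictor
   vector train_y = {0.0, 1.0};
   vector sigma   = {1.0};
   vector x       = {0.5};                // query midway between the two points
   Print("prediction = ", KernelPredict(train_x, train_y, sigma, x));
  }
```

Because the query lies exactly midway between the two training points, both receive the same weight and the prediction is the plain mean of the targets, 0.5. Moving the query toward either point, or shrinking sigma, pulls the prediction toward the target of the nearer point.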

Cross-validation is used to evaluate the model's performance for a given set of sigma weights, while simulated annealing serves as the meta-heuristic that searches for the sigma weights with the lowest cross-validation error. Ultimately, the GRNN performs kernel-based interpolation: each prediction is a weighted average of the training targets.
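One common way to score a candidate set of sigma weights is leave-one-out cross-validation, sketched below. This section does not show which cross-validation scheme the class uses, so treat the leave-one-out choice, the helper name LooError and the toy data as assumptions made for illustration.

```mql5
//--- Hypothetical helper: leave-one-out mean squared error for given sigma weights
double LooError(matrix &train_x, vector &train_y, vector &sigma)
  {
   ulong n = train_x.Rows();
   double err = 0.0;
   for(ulong held = 0; held < n; held++)
     {
      double wsum = 0.0, ysum = 0.0;
      for(ulong i = 0; i < n; i++)
        {
         if(i == held)
            continue;                     // the held-out point may not predict itself
         double dist = 0.0;
         for(ulong j = 0; j < train_x.Cols(); j++)
           {
            double diff = (train_x[held][j] - train_x[i][j]) / sigma[j];
            dist += diff * diff;
           }
         double w = MathExp(-dist);
         ysum += w * train_y[i];
         wsum += w;
        }
      if(wsum > 0.0)
        {
         double diff = ysum / wsum - train_y[held];
         err += diff * diff;
        }
     }
   return err / double(n);
  }
//--- Script entry point
void OnStart()
  {
   matrix x = {{0.0}, {0.25}, {0.5}, {0.75}, {1.0}};
   vector y = {0.0, 0.0625, 0.25, 0.5625, 1.0};   // y = x^2
   vector narrow = {0.1};
   vector wide   = {5.0};
   Print("LOO error, sigma = 0.1: ", LooError(x, y, narrow));
   Print("LOO error, sigma = 5.0: ", LooError(x, y, wide));
  }
```

Comparing the two printed errors shows how the score reacts to the kernel width, which is the signal the annealing search relies on when it accepts or rejects a perturbation.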

The ensemble based on the GRNN is implemented as the class CGenReg.

//+------------------------------------------------------------------+
//| Compute the General Regression of the predictions                |
//+------------------------------------------------------------------+
class CGenReg
  {

public:

                     CGenReg(void) ;
                    ~CGenReg(void) ;
   bool              fit(matrix & train_vars, vector &train_targets,IModel* &models[]);
   double            predict(vector &inputs,IModel* &models[]) ;

private:
   CGrnn *grnn ;       // The GRNN object
   vector m_work ;     // Work vector nmodels long
   vector            m_targs;
   matrix            m_vars;
  } ;

The CGenReg class utilizes a CGrnn object to model complex relationships between the predictions of individual models and the actual target values. In the fit method, it first stores the training data, including the target values (train_targets) and the input variables (train_vars). It then gathers the individual predictions from each model, creating a matrix (preds) where each row represents a training sample and each column holds the prediction from the corresponding model in the set. The CGrnn object is trained using the matrix of individual predictions (preds) as input and the actual target values (targ) as the output.

//+------------------------------------------------------------------+
//| Fit consensus model                                              |
//+------------------------------------------------------------------+
bool CGenReg::fit(matrix & train_vars, vector &train_targets,IModel* &models[])
  {
   m_targs = train_targets;
   m_vars = train_vars;

   m_work = vector::Zeros(models.Size());

   matrix targ = matrix::Zeros(train_targets.Size(),1);

   if(!targ.Col(train_targets,0))
     {
      Print(__FUNCSIG__, " error adding column ", GetLastError());
      return false;
     }

   matrix preds(m_vars.Rows(),models.Size());
   for(ulong i = 0; i<m_vars.Rows(); i++)
     {
      vector ins = m_vars.Row(i);
      for(uint j = 0; j< models.Size(); j++)
        {
         preds[i][j] = models[j].forecast(ins);
        }
     }

   return grnn.fit(preds,targ);
  }

In the predict method, the class gathers predictions from each model for a new input (inputs) and stores them in a work vector (m_work). The trained CGrnn is then used to predict the final output based on these individual predictions. The method returns the first element of the predicted output vector as the final prediction.

//+------------------------------------------------------------------+
//| Make a prediction                                                |
//+------------------------------------------------------------------+
double CGenReg::predict(vector &inputs,IModel* &models[])
  {
   vector output;
   for(uint i = 0; i<models.Size(); i++)
      m_work[i] = models[i].forecast(inputs);
   output = grnn.predict(m_work);
   return output[0];
  }

Conclusion: Comparing ensemble methods

Various ensemble methods have been presented, with their inherent strengths and weaknesses briefly discussed. To conclude this article, we will investigate how these methods compare when applied to actual data. This comparison is implemented as a MetaTrader 5 script named Ensemble_Demo.mq5.

The script generates several synthetic groups of datasets. The first group consists of datasets used to train benchmark models. Models trained on such data are distinguished as good models, and the data itself is considered clean. A second group of datasets is generated to train bad models, which are considered inferior to the good models trained on the clean data.

The last group of datasets is used to train models that are considered biased. These models are biased relative to the good models referenced earlier. To simulate noisy data, one dataset from each group is stacked into a single training set.
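Read directly from the data-generation code in the script listed further down, the three kinds of targets can be summarized as follows. The symbols are introduced here only for illustration: x_1 and x_2 are the two predictors, and ε and u are values filled in by Random(0.0,1.0), assumed here to be uniform draws from that interval.

```latex
\begin{aligned}
y_{\text{good}}   &= \sin(x_1) - x_2^{2} + \sigma\,\varepsilon, \qquad \sigma = \sqrt{\text{VarParam}},\\
y_{\text{bad}}    &= u,\\
y_{\text{biased}} &= y_{\text{good}} + 1 .
\end{aligned}
```

The bad and biased datasets reuse the predictors of the first good dataset, so only the target column differs: the bad targets carry no information about the predictors, while the biased targets are shifted by a constant.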

The script allows users to specify how many good, bad, and biased models are trained. The user also has control over the number of samples that make up the training data, enabling an assessment of how sample size affects the performance of the ensemble methods. Lastly, users can choose to train an ensemble model using the clean data by setting the parameter TrainCombinedModelsOnCleanData to true, or to train it using noisy data by setting it to false.

The models are feed-forward neural networks implemented by the FFNN class in mlffnn.mqh.

//+------------------------------------------------------------------+
//| Class for a basic feed-forward neural network                    |
//+------------------------------------------------------------------+
class FFNN
  {
protected:

   bool              m_trained;             // flag noting if neural net successfully trained
   matrix            m_weights[];           // layer weights
   matrix            m_outputs[];           // hidden layer outputs
   matrix            m_result;              // training result
   uint              m_epochs;              // number of epochs
   ulong             m_num_inputs;          // number of input variables for nn
   ulong             m_layers;              // number of layers of neural net
   ulong             m_hidden_layers;       // number of hidden layers
   ulong             m_hidden_layer_size[]; // node config for layers
   double            m_learn_rate;          // learning rate
   ENUM_ACTIVATION_FUNCTION m_act_fn;       // activation function
   //+------------------------------------------------------------------+
   //| Initialize the neural network structure                          |
   //+------------------------------------------------------------------+

   virtual bool      create(void)
     {
      if(m_layers - m_hidden_layers != 1)
        {
         Print(__FUNCTION__,"  Network structure misconfiguration ");
         return false;
        }

      for(ulong i = 0; i<m_layers; i++)
        {
         if(i==0)
           {
            if(!m_weights[i].Init(m_num_inputs+1,m_hidden_layer_size[i]))
              {
               Print(__FUNCTION__," ",__LINE__," ", GetLastError());
               return false;
              }
           }
         else
            if(i == m_layers-1)
              {
               if(!m_weights[i].Init(m_hidden_layer_size[i-1]+1,1))
                 {
                  Print(__FUNCTION__," ",__LINE__," ", GetLastError());
                  return false;
                 }
              }
            else
              {
               if(!m_weights[i].Init(m_hidden_layer_size[i-1]+1,m_hidden_layer_size[i]))
                 {
                  Print(__FUNCTION__," ",__LINE__," ", GetLastError());
                  return false;
                 }
              }
        }

      return true;
     }
   //+------------------------------------------------------------------+
   //| Calculate output  from all layers                                |
   //+------------------------------------------------------------------+

   virtual matrix    calculate(matrix &data)
     {
      if(data.Cols() != m_weights[0].Rows()-1)
        {
         Print(__FUNCTION__," input data not compatible with network structure ");
         return matrix::Zeros(0,0);
        }

      matrix temp = data;

      for(ulong i = 0; i<m_hidden_layers; i++)
        {
         if(!temp.Resize(temp.Rows(), m_weights[i].Rows()) ||
            !temp.Col(vector::Ones(temp.Rows()), m_weights[i].Rows() - 1))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return matrix::Zeros(0,0);
           }

         m_outputs[i]=temp.MatMul(m_weights[i]);

         if(!m_outputs[i].Activation(temp, m_act_fn))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return matrix::Zeros(0,0);
           }

        }

      if(!temp.Resize(temp.Rows(), m_weights[m_hidden_layers].Rows()) ||
         !temp.Col(vector::Ones(temp.Rows()), m_weights[m_hidden_layers].Rows() - 1))
        {
         Print(__FUNCTION__," ",__LINE__," ", GetLastError());
         return matrix::Zeros(0,0);
        }

      return temp.MatMul(m_weights[m_hidden_layers]);
     }
   //+------------------------------------------------------------------+
   //|  Backpropagation method                                          |
   //+------------------------------------------------------------------+

   virtual bool      backprop(matrix &data, matrix& targets, matrix &result)
     {
      if(targets.Rows() != result.Rows() ||
         targets.Cols() != result.Cols())
        {
         Print(__FUNCTION__," invalid function parameters ");
         return false;
        }
      matrix loss = (targets - result) * 2;
      matrix gradient = loss.MatMul(m_weights[m_hidden_layers].Transpose());
      matrix temp;

      for(long i = long(m_hidden_layers-1); i>-1; i--)
        {
         if(!m_outputs[i].Activation(temp, m_act_fn))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return false;
           }

         if(!temp.Resize(temp.Rows(), m_weights[i+1].Rows()) ||
            !temp.Col(vector::Ones(temp.Rows()), m_weights[i+1].Rows() - 1))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return false;
           }

         m_weights[i+1] = m_weights[i+1] + temp.Transpose().MatMul(loss) * m_learn_rate;

         if(!m_outputs[i].Derivative(temp, m_act_fn))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return false;
           }

         if(!gradient.Resize(gradient.Rows(), gradient.Cols() - 1))
           {
            Print(__FUNCTION__," ",__LINE__," ", GetLastError());
            return false;
           }

         loss = gradient * temp;

         gradient = (i>0)?loss.MatMul(m_weights[i].Transpose()):gradient;
        }

      temp = data;
      if(!temp.Resize(temp.Rows(), m_weights[0].Rows()) ||
         !temp.Col(vector::Ones(temp.Rows()), m_weights[0].Rows() - 1))
        {
         Print(__FUNCTION__," ",__LINE__," ", GetLastError());
         return false;
        }

      m_weights[0] = m_weights[0] + temp.Transpose().MatMul(loss) * m_learn_rate;

      return true;
     }

public:
   //+------------------------------------------------------------------+
   //| Constructor                                                      |
   //+------------------------------------------------------------------+

                     FFNN(ulong &layersizes[], ulong num_layers = 3)
     {
      m_trained = false;
      m_layers = num_layers;
      m_hidden_layers = m_layers - 1;

      ArrayCopy(m_hidden_layer_size,layersizes,0,0,int(m_hidden_layers));

      ArrayResize(m_weights,int(m_layers));

      ArrayResize(m_outputs,int(m_hidden_layers));
     }
   //+------------------------------------------------------------------+
   //| Destructor                                                       |
   //+------------------------------------------------------------------+

                    ~FFNN(void)
     {
     }
   //+------------------------------------------------------------------+
   //| Neural net training method                                       |
   //+------------------------------------------------------------------+

   bool              fit(matrix &data, matrix &targets,double learning_rate, ENUM_ACTIVATION_FUNCTION act_fn, uint num_epochs)
     {
      m_learn_rate = learning_rate;
      m_act_fn = act_fn;
      m_epochs = num_epochs;
      m_num_inputs = data.Cols();
      m_trained = false;

      if(!create())
         return false;

      for(uint ep = 0; ep < m_epochs; ep++)
        {
         m_result = calculate(data);

         if(!backprop(data, targets,m_result))
            return m_trained;
        }

      m_trained = true;
      return m_trained;
     }
   //+------------------------------------------------------------------+
   //| Predict method                                                   |
   //+------------------------------------------------------------------+

   matrix            predict(matrix &data)
     {
      if(m_trained)
         return calculate(data);
      else
         return matrix::Zeros(0,0);
     }

  };
//+------------------------------------------------------------------+

The FFNN class defines a Multi-Layer Perceptron (MLP), a type of artificial neural network used for supervised learning tasks. It contains several properties such as:

m_trained, a Boolean flag that indicates if the network has been successfully trained;
m_weights, an array of matrices that store the weights between each layer;
m_outputs, an array of matrices that hold the outputs from each hidden layer;
m_result, a matrix holding the final network output after training;
m_epochs, the number of training epochs (iterations);
m_num_inputs, the number of input variables for the network;
m_layers, the number of weight layers in the network, that is, the hidden layers plus the output layer;
m_hidden_layers, the number of hidden layers in the network;
m_hidden_layer_size, an array that defines the number of nodes in each hidden layer;
m_learn_rate, the learning rate used for weight updates during training;
m_act_fn, the activation function used in the hidden layers.

The class includes both protected and public methods. The protected methods are:

create, which initializes the network structure by sizing the weight matrix of each layer according to the specified configuration;
calculate, which propagates input data through the network, applying weights and activation functions to calculate the output;
backprop, which implements the backpropagation algorithm, adjusting the weights based on the error between the predicted and actual outputs.

The public methods include:

FFNN (constructor), which initializes the network with the specified number of layers and hidden layer sizes;
~FFNN (destructor), which is empty because the class holds no dynamically allocated objects;
fit, which trains the network on a given dataset, adjusting the weights through backpropagation over the specified number of epochs;
predict, which uses the trained network to generate predictions for new input data, effectively performing forward propagation.
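A minimal usage sketch of the public interface follows. It assumes mlffnn.mqh is on the include path, as in the article's script. The layer layout matches the one CMlfn uses, while AF_TANH, the learning rate, the epoch count and the toy target are arbitrary choices made for illustration.

```mql5
#include<mlffnn.mqh>

//--- Script entry point
void OnStart()
  {
   ulong layers[3] = {2, 2, 1};            // two hidden layers of 2 nodes, one output
   FFNN net(layers);                       // num_layers keeps its default of 3

   matrix x(20, 2);
   x.Random(0.0, 1.0);                     // 20 samples, 2 predictors
   matrix y(20, 1);
   for(ulong i = 0; i < x.Rows(); i++)
      y[i][0] = MathSin(x[i][0]) - x[i][1] * x[i][1];

   if(!net.fit(x, y, 0.01, AF_TANH, 100))
     {
      Print("training failed");
      return;
     }

   matrix out = net.predict(x);            // forward pass with the trained weights
   if(out.Rows() != y.Rows())
     {
      Print("prediction failed");
      return;
     }

   double mse = 0.0;
   for(ulong i = 0; i < y.Rows(); i++)
     {
      double diff = out[i][0] - y[i][0];
      mse += diff * diff;
     }
   Print("in-sample MSE = ", mse / double(y.Rows()));
  }
```

Note that predict returns an empty matrix when the network has not been trained or the input width does not match, which is why the sketch checks the row count before reading the output.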

In the script, the CMlfn class implements the IModel interface based on an instance of FFNN. Next is a brief exposition of running the script in different configurations.

//+------------------------------------------------------------------+
//|                                                Ensemble_Demo.mq5 |
//|                                  Copyright 2024, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2024, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property script_show_inputs
#include<mlffnn.mqh>
#include<ensemble.mqh>
#include<np.mqh>
//--- input parameters
input int      NumGoodModels=3;
input int      NumBiasedModels=7;
input int      NumBadModels=5;
input int      NumSamples=20;
input int      NumAttempts=1;
input double   VarParam=3.0;//variance parameter
input bool     TrainCombinedModelsOnCleanData = true;
//+------------------------------------------------------------------+
//| Clean up dynamic array pointers                                  |
//+------------------------------------------------------------------+
void cleanup(IModel* &array[])
  {
   for(uint i = 0; i<array.Size(); i++)
      if(CheckPointer(array[i])==POINTER_DYNAMIC)
         delete array[i];
  }
//+------------------------------------------------------------------+
//| IModel implementation based on a feed-forward neural network     |
//+------------------------------------------------------------------+
class CMlfn:public IModel
  {
private:
   FFNN              *m_mlfn;
   double             m_learningrate;
   ENUM_ACTIVATION_FUNCTION    m_actfun;
   uint               m_epochs;
   ulong              m_layer[3];

public:
                     CMlfn();
                    ~CMlfn(void);
   void              setParams(double learning_rate, ENUM_ACTIVATION_FUNCTION act_fn, uint num_epochs);
   bool              train(matrix &predictors,matrix&targets);
   double            forecast(vector &predictors);
  };
//+------------------------------------------------------------------+
//| Constructor                                                      |
//+------------------------------------------------------------------+
CMlfn::CMlfn(void)
  {
   m_learningrate=0.01;
   m_actfun=AF_SOFTMAX;
   m_epochs= 100;
   m_layer[0] = 2;
   m_layer[1] = 2;
   m_layer[2] = 1;
   m_mlfn = new FFNN(m_layer);
  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
CMlfn::~CMlfn(void)
  {
   if(CheckPointer(m_mlfn) == POINTER_DYNAMIC)
      delete m_mlfn;
  }
//+------------------------------------------------------------------+
//| Set other hyperparameters of the model                           |
//+------------------------------------------------------------------+
void CMlfn::setParams(double learning_rate, ENUM_ACTIVATION_FUNCTION act_fn, uint num_epochs)
  {
   m_learningrate=learning_rate;
   m_actfun=act_fn;
   m_epochs= num_epochs;
  }
//+------------------------------------------------------------------+
//| Fit a model to the data                                          |
//+------------------------------------------------------------------+
bool CMlfn::train(matrix &predictors,matrix &targets)
  {
   return m_mlfn.fit(predictors,targets,m_learningrate,m_actfun,m_epochs);
  }
//+------------------------------------------------------------------+
//| Make a prediction with the trained model                         |
//+------------------------------------------------------------------+
double CMlfn::forecast(vector &predictors)
  {
   matrix preds(1,predictors.Size());

   if(!preds.Row(predictors,0))
     {
      Print(__FUNCTION__, " error inserting row ", GetLastError());
      return EMPTY_VALUE;
     }
   matrix out = m_mlfn.predict(preds);
   return out[0][0];
  }

//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//---
   if(NumSamples<1 || NumAttempts<1 || VarParam<0.0 || NumBadModels<1 || NumGoodModels<1 || NumBiasedModels<1)
     {
      Print(" Invalid User inputs ");
      return;
     }

   int ndone, divisor;
   double diff, std, temp;

   double computed_err_average ;
   double computed_err_unconstrained ;
   double computed_err_unbiased ;
   double computed_err_biased ;
   double computed_err_weighted ;
   double computed_err_bagged ;
   double computed_err_genreg ;

   CAvg average;
   CLinReg unconstrained;
   CUnbiased unbiased;
   Cbiased biased;
   CWeighted weighted;
   CGenReg genreg;

   vector computed_err_raw = vector::Zeros(NumBadModels+NumGoodModels+NumBiasedModels);

   std  =  sqrt(VarParam);

   divisor = 1;

   IModel* puremodels[];
   matrix xgood[],xbad[],xbiased[],test[10];

   if(ArrayResize(puremodels,NumBadModels+NumGoodModels+NumBiasedModels)<0 ||
      ArrayResize(xgood,NumBadModels+NumGoodModels+NumBiasedModels)<0 ||
      ArrayResize(xbad,NumBadModels+NumGoodModels+NumBiasedModels)<0 ||
      ArrayResize(xbiased,NumBadModels+NumGoodModels+NumBiasedModels)<0)
     {
      Print(" failed puremodels array resize ", GetLastError());
      return;
     }

   for(uint i = 0; i<puremodels.Size(); i++)
      puremodels[i] = new CMlfn();

   for(uint i = 0; i<xgood.Size(); i++)
      xgood[i] = matrix::Zeros(NumSamples,3);

   for(uint i = 0; i<xbad.Size(); i++)
      xbad[i] = matrix::Zeros(NumSamples,3);

   for(uint i = 0; i<xbiased.Size(); i++)
      xbiased[i] = matrix::Zeros(NumSamples,3);

   for(uint i = 0; i<test.Size(); i++)
      test[i] = matrix::Zeros(NumSamples,3);

   computed_err_average = 0.0 ;
   computed_err_unconstrained = 0.0 ;
   computed_err_unbiased = 0.0 ;
   computed_err_biased = 0.0 ;
   computed_err_weighted = 0.0 ;
   computed_err_bagged = 0.0 ;
   computed_err_genreg = 0.0 ;

   vector t,v;
   matrix d;

   ndone  = 1;
   for(uint i = 0; i<xgood.Size(); i++)
     {
      xgood[i].Random(0.0,1.0);
      if(!xgood[i].Col(sin(xgood[i].Col(0)) - pow(xgood[i].Col(1),2.0) + std*xgood[i].Col(2),2))
        {
         Print(" column insertion error ", GetLastError());
         cleanup(puremodels);
         return;
        }
     }
   matrix xb(xgood[0].Rows(),1);
   for(uint i = 0; i<xbad.Size(); i++)
     {
      xbad[i] = xgood[0];
      xb.Random(0.0,1.0);
      if(!xbad[i].Col(xb.Col(0),2))
        {
         Print(" column insertion error ", GetLastError());
         cleanup(puremodels);
         return;
        }
     }
   for(uint i = 0; i<xbiased.Size(); i++)
     {
      xbiased[i] = xgood[0];
      if(!xbiased[i].Col(xgood[0].Col(2)+1.0,2))
        {
         Print(" column insertion error ", GetLastError());
         cleanup(puremodels);
         return;
        }
     }
   for(uint i = 0; i<test.Size(); i++)
     {
      test[i].Random(0.0,1.0);
      if(!test[i].Col(sin(test[i].Col(0)) - pow(test[i].Col(1),2.0) + std * test[i].Col(2),2))
        {
         Print(" column insertion error ", GetLastError());
         cleanup(puremodels);
         return;
        }
     }

   for(uint imodel=0; imodel<puremodels.Size(); imodel++)
     {
      if(imodel < xgood.Size())
        {
         t=xgood[imodel].Col(2);
         d=np::sliceMatrixCols(xgood[imodel],0,2);
        }
      else
         if(imodel >= xgood.Size() && imodel<(xgood.Size()+xbiased.Size()))
           {
            t=xbiased[imodel-xgood.Size()].Col(2);
            d=np::sliceMatrixCols(xbiased[imodel-xgood.Size()],0,2);
           }
         else
           {
            t=xbad[imodel - (xgood.Size()+xbiased.Size())].Col(2);
            d=np::sliceMatrixCols(xbad[imodel - (xgood.Size()+xbiased.Size())],0,2);
           }

      matrix tt(t.Size(),1);

      if(!tt.Col(t,0) || !puremodels[imodel].train(d,tt))
        {
         Print(" failed column insertion ", GetLastError());
         cleanup(puremodels);
         return;
        }

      temp  = 0.0;

      for(uint i = 0; i<test.Size(); i++)
        {
         for(int j = 0; j<NumSamples; j++)
           {
            t  = test[i].Row(j);
            v  = np::sliceVector(t,0,2);
            diff = puremodels[imodel].forecast(v) - t[2];
            temp += diff*diff;
           }
        }
      computed_err_raw[imodel] += temp/double(test.Size()*NumSamples);
     }
//average
   matrix tdata;
   if(TrainCombinedModelsOnCleanData)
      tdata = xgood[0];
   else
     {
      tdata = matrix::Zeros(NumSamples*3,3);
      if(!np::matrixCopyRows(tdata,xgood[0],0,NumSamples) ||
         !np::matrixCopyRows(tdata,xbad[0],NumSamples,NumSamples*2) ||
         !np::matrixCopyRows(tdata,xbiased[0],NumSamples*2))
        {
         Print(" failed to create noisy dataset");
         cleanup(puremodels);
         return;
        }
     }
   temp = 0.0;
   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = average.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_average += temp/double(test.Size()*NumSamples);
//unconstrained
   temp = 0.0;
   t = tdata.Col(2);
   d = np::sliceMatrixCols(tdata,0,2);
   if(!unconstrained.fit(d,t,puremodels))
     {
      Print(" failed to fit unconstrained model ");
      cleanup(puremodels);
      return;   // the models were just deleted, so stop here
     }

   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = unconstrained.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_unconstrained += temp/double(test.Size()*NumSamples);
//unbiased
   temp = 0.0;
   t = tdata.Col(2);
   d = np::sliceMatrixCols(tdata,0,2);
   if(!unbiased.fit(d,t,puremodels))
     {
      Print(" failed to fit unbiased model ");
      cleanup(puremodels);
      return;   // the models were just deleted, so stop here
     }

   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = unbiased.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_unbiased += temp/double(test.Size()*NumSamples);
//biased
   temp = 0.0;
   t = tdata.Col(2);
   d = np::sliceMatrixCols(tdata,0,2);
   if(!biased.fit(d,t,puremodels))
     {
      Print(" failed to fit biased model ");
      cleanup(puremodels);
      return;   // the models were just deleted, so stop here
     }

   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = biased.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_biased += temp/double(test.Size()*NumSamples);
//weighted
   temp = 0.0;
   t = tdata.Col(2);
   d = np::sliceMatrixCols(tdata,0,2);
   if(!weighted.fit(d,t,puremodels))
     {
      Print(" failed to fit weighted model ");
      cleanup(puremodels);
      return;   // the models were just deleted, so stop here
     }

   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = weighted.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_weighted += temp/double(test.Size()*NumSamples);
//genreg
   temp = 0.0;
   t = tdata.Col(2);
   d = np::sliceMatrixCols(tdata,0,2);
   if(!genreg.fit(d,t,puremodels))
     {
      Print(" failed to fit generalized regression model ");
      cleanup(puremodels);
      return;   // the models were just deleted, so stop here
     }
   for(uint i  = 0; i<test.Size(); i++)
     {
      for(int j = 0; j<NumSamples; j++)
        {
         t = test[i].Row(j);
         v = np::sliceVector(t,0,2);
         diff = genreg.predict(v,puremodels) - t[2];
         temp += diff*diff;
        }
     }
   computed_err_genreg += temp/double(test.Size()*NumSamples);

   temp = 0.0;
   PrintFormat("\n\n\nRandom DataSet%5d    Raw errors:", ndone);
   for(uint imodel  = 0; imodel<puremodels.Size() ; imodel++)
     {
      PrintFormat("  %.8lf", computed_err_raw[imodel] / ndone) ;
      temp += computed_err_raw[imodel] / ndone ;
     }
   PrintFormat("\n       Mean raw error = %8.8lf", temp / double(puremodels.Size())) ;

   PrintFormat("\n        Average error = %8.8lf", computed_err_average / ndone) ;
   PrintFormat("\n  Unconstrained error = %8.8lf", computed_err_unconstrained / ndone) ;
   PrintFormat("\n       Unbiased error = %8.8lf", computed_err_unbiased / ndone) ;
   PrintFormat("\n         Biased error = %8.8lf", computed_err_biased / ndone) ;
   PrintFormat("\n       Weighted error = %8.8lf", computed_err_weighted / ndone) ;
   PrintFormat("\n         GenReg error = %8.8lf", computed_err_genreg / ndone) ;

   cleanup(puremodels);
  }
//+------------------------------------------------------------------+

Running the script with the defaults yields the following output.

MR      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)       Random DataSet    1    Raw errors:
KI      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.38602529
HP      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.36430552
CK      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.36703202
OS      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.51205057
EJ      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.57791798
HE      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.66825953
FL      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.65051234
QD      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.57403745
EO      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.71593174
PF      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.62444495
NQ      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.77552594
KI      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.75079339
MP      0       15:56:41.914    Ensemble_Demo (BTCUSD,D1)         0.78851743
CK      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)         0.52343272
OR      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)         0.70166082
EK      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
RE      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)              Mean raw error = 0.59869651
QL      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
DE      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)               Average error = 0.55224337
ML      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
QF      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)         Unconstrained error = 10.21673109
KL      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
RI      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)              Unbiased error = 0.55224337
GL      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
PH      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)                Biased error = 0.48431477
CL      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
HH      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)              Weighted error = 0.51507522
OM      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)       
LK      0       15:56:41.915    Ensemble_Demo (BTCUSD,D1)                GenReg error = 0.33761372
KM      0       15:57:11.108    Ensemble_Demo (BTCUSD,D1)       
GG      0       15:57:11.108    Ensemble_Demo (BTCUSD,D1)       
CQ      0       15:57:11.108    Ensemble_Demo (BTCUSD,D1)

With all script parameters kept the same as in the previous run, except that the consensus models are now trained on noisy data, we observed the following output.

NL      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)       Random DataSet    1    Raw errors:
OS      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.72840629
GJ      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.63345953
PE      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.68442450
JL      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.91936106
OD      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.75230667
LO      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.88366446
PF      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.78226316
CQ      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.87140196
II      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.58672356
KP      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         1.09990815
MK      0       15:59:51.502    Ensemble_Demo (BTCUSD,D1)         0.92548778
OR      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)         1.03795716
GJ      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)         0.80684429
GE      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)         1.24041209
GL      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)         0.92169606
NF      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
CS      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)              Mean raw error = 0.85828778
RF      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
DS      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)               Average error = 0.83433599
FF      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
FP      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)         Unconstrained error = 23416285121251567120416768.00000000
DS      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
JR      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)              Unbiased error = 0.83433599
HS      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
PP      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)                Biased error = 0.74321307
LD      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
GQ      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)              Weighted error = 0.83213118
PD      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)       
FR      0       15:59:51.503    Ensemble_Demo (BTCUSD,D1)                GenReg error = 0.78697882

The key observation from these results is that combining models generally pays off: in both runs shown, every ensemble except unconstrained regression achieved a lower error than the mean raw error of the component models. However, there is no universal best method. Each has its strengths and weaknesses, and the optimal choice depends on the specific dataset and problem at hand.
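To make the comparison concrete, the short script below tabulates the ensemble errors from the noisy-data run against the mean raw error of the component models. It is only a minimal sketch: the figures are copied by hand from the log listing above, the array names are hypothetical, and the unconstrained result is left out because its magnitude dwarfs the others.

```mql5
//+------------------------------------------------------------------+
//| Illustrative script (not part of the attached files): compares   |
//| ensemble errors from the noisy-data run with the mean raw error. |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- values copied from the noisy-data log listing
   double mean_raw = 0.85828778;
   string names[]  = {"Average","Unbiased","Biased","Weighted","GenReg"};
   double errors[] = {0.83433599,0.83433599,0.74321307,0.83213118,0.78697882};

   int best = 0;
   for(int i=0; i<ArraySize(errors); i++)
     {
      //--- negative change means the ensemble beat the mean raw error
      double change = 100.0*(errors[i]-mean_raw)/mean_raw;
      PrintFormat("%-10s error = %.8f  change vs mean raw = %+.2f%%",
                  names[i],errors[i],change);
      if(errors[i]<errors[best])
         best = i;
     }
//--- report the method with the lowest error in this particular run
   PrintFormat("Lowest error in this run: %s",names[best]);
  }
```

Note that the ranking this prints applies to a single run only. Repeating it with the figures from the clean-data run gives a different winner, which is the point of the "no universal best method" remark.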

Unconstrained regression, while potentially powerful, is highly susceptible to overfitting, particularly with noisy or small datasets: its error was 10.22 in the first run and roughly 2.3 × 10^25 when trained on noisy data. GRNN, on the other hand, excels at handling small, noisy datasets by effectively smoothing the data, though it may sacrifice some fitting power on larger, cleaner datasets.
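One way to see why the two approaches behave so differently is to compare their functional forms. What follows is a sketch in standard textbook notation, not a transcription of the attached code. Here z = (f_1(x), ..., f_N(x)) is the vector of component predictions for a new case, (z_i, y_i) are the K training cases seen by the consensus model, and σ is a smoothing parameter. The single-σ Gaussian kernel is an assumption made for illustration.

```latex
% Unconstrained linear combination: N+1 coefficients, no restriction on sign or size
\[
f_{\mathrm{lin}}(x) = w_0 + \sum_{n=1}^{N} w_n \, f_n(x)
\]

% GRNN (kernel regression): a weighted average of the training responses
\[
f_{\mathrm{grnn}}(x) =
\frac{\sum_{i=1}^{K} y_i \, \exp\!\left(-\dfrac{\lVert z - z_i \rVert^{2}}{2\sigma^{2}}\right)}
     {\sum_{i=1}^{K} \exp\!\left(-\dfrac{\lVert z - z_i \rVert^{2}}{2\sigma^{2}}\right)}
\]
```

In the second form the kernel weights are non-negative and normalized, so the output always lies between the smallest and largest training response. Nothing bounds the coefficients in the first form. When the component predictions are strongly correlated, least-squares estimates of w_n can become very large, which is consistent with the extreme unconstrained errors in the logs.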

Linear regression methods can also be effective, but overfitting remains a concern, especially with noisy or small datasets. Simple averaging and variance weighting are generally robust and are good choices when the dataset is noisy or the optimal method is uncertain.

In summary, the ensemble method should be chosen based on the specific characteristics of the dataset. It is often worthwhile to experiment with several methods and compare their performance on a validation set before making a decision. All code referenced in the text is attached.

File Name	Description
MQL5/include/mlffnn.mqh	Contains the definition of the FFNN class, which implements a basic multilayer perceptron
MQL5/include/grnn.mqh	Defines the CGrnn class, which implements a generalized regression neural network that uses simulated annealing
MQL5/include/OLS.mqh	Defines the OLS class, which encapsulates ordinary least squares regression
MQL5/include/ensemble.mqh	Contains the definitions of six ensemble methods implemented as the classes CAvg, CLinReg, Cbiased, CUnbiased, CWeighted and CGenReg
MQL5/include/np.mqh	Contains various matrix and vector utility functions
MQL5/scripts/Ensemble_Demo.mq5	A script that demonstrates the ensemble classes defined in ensemble.mqh

Attached files

ensemble.mqh (50.08 KB)

grnn.mqh (6.6 KB)

mlffnn.mqh (8.39 KB)

np.mqh (74.16 KB)

OLS.mqh (13.34 KB)

Ensemble_Demo.mq5 (13.03 KB)

Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.