
A feature selection algorithm using energy based learning in pure MQL5
Introduction
In the realm of algorithmic trading, the widespread use of machine learning has prompted the adoption of data mining techniques to uncover hidden patterns in financial data. Within this landscape, practitioners often grapple with the challenge of sorting through numerous variables to identify those most likely to be beneficial in achieving specific goals or solving particular problems. In this article, we explore the implementation of a feature selection algorithm aimed at assessing the relevance of a set of candidate variables for a given prediction task.
Yun Li, Jennie Si, Guojing Zhou, Shasha Huang, and Songcan Chen co-authored a research paper titled "FREL: A Stable Feature Selection Algorithm." This paper introduces an algorithm named Feature Weighting as Regularized Energy-Based Learning (FREL), which serves as a feature selection or weighting technique designed to offer both accuracy and stability. In our discussion, we provide an overview of the theoretical underpinnings behind regularized energy-based learning and feature weighting. Furthermore, we illustrate the efficacy of the proposed approach through the implementation of an example MQL5 program, crafted as a script, to highlight the method's potential as a tool for feature selection.
Weighted nearest-neighbour classification
The concept behind FREL draws inspiration from a technique known as weighted nearest-neighbour classification, which leverages the distances between points in a dataset to make predictions. By determining appropriate weights for each feature, this method enhances prediction accuracy. Weighted nearest-neighbour classification represents a variation of the k-nearest neighbor (k-NN) algorithm, a widely used approach in machine learning for classification tasks. In the standard k-NN classification, the algorithm examines the k closest data points in the training set when classifying a new data point, ultimately assigning the majority class among these neighbors to the new data point.
In weighted nearest-neighbour classification, however, instead of merely tallying the votes of the nearest neighbors, each neighbor's vote is weighted according to its distance from the new data point. The rationale is that closer neighbors should exert a stronger influence on the classification decision than those farther away. This weighting process involves calculating the distance between the new data point and each point in the training set. Common distance metrics used include Euclidean distance, Manhattan distance, or cosine similarity, selected based on the characteristics of the data. In this context, we utilize the Manhattan distance, also referred to as city-block distance, between data points. The formula for calculating this distance is shown below. Where, w , are the weights and a testcase is being evaluated relative to other training data, given as training case
Understanding Energy-Based Models
Energy-based modeling in machine learning serves as a versatile framework applicable to both supervised and unsupervised learning tasks. It operates on the principle of assigning energy values to various data configurations and learning a model capable of distinguishing between desirable and undesirable configurations. This is achieved by minimizing the energy of observed data while maximizing the energy of unobserved or undesirable data configurations.
At the heart of energy-based models lies the definition of an energy function, denoted as E(). This function takes as input a configuration of input variables or predictors, along with a set of model parameters. The output of the energy function provides an indication of the relevance of the input variables' configuration. For instance, in the context of evaluating a regression model, the energy function might be represented as the mean square error. When relevant predictors are inputted into the mean square error equation, the output value tends to be smaller, reflecting higher relevance. Conversely, poor predictors lead to larger mean square error values. The energy function assigns a scalar value to each conceivable configuration of variables.
The objective of training an energy-based model is to learn the parameters of the energy function so that it assigns low energies to relevant input variables and high energies to irrelevant ones. This entails defining an objective function that penalizes high energies for correct variables and low energies for incorrect ones. To achieve this, the goal is to identify the configuration of incorrect variables that yields the lowest energy, representing a sample likely to cause erroneous predictions from the model. The function below represents the energy of the configuration of inputs , x , and model parameter, w , that outputs the erroneous value, y , too low to distinguish from configurations of input variables that produce accurate predictions.
Ultimately, the objective function aims to maximize the discrepancy between the incorrect configuration with the lowest energy and the nearest correct configuration of variables. The energy from such a configuration is given below.
The objective function, known as a loss functional, comprises a per-sample averaged loss function. Given below as the log loss.
Various loss criteria can serve as the per-sample loss function, such as hinge loss, log loss, square loss, and square-exponential loss, with the choice depending on the application.
In summary, these are the fundamental concepts underlying FREL. The subsequent section delves into the specifics of the algorithm itself.
The FREL algorithm
To effectively apply the FREL algorithm, certain fundamental considerations must be followed. Firstly, it's crucial to carefully assess the training data. FREL is ideally suited for datasets that map a set of candidate variables to a single target. Equally important is ensuring that the variables are similar in scale. Exposing FREL to candidate predictors with inconsistent scaling can significantly distort the final results.
Secondly, given that FREL is an energy-based learning procedure, it necessitates the definition of an energy function incorporating weighting parameters. Therefore, the model used should be configured to accept a set of candidate variables along with corresponding weightings. For instance, considering the mean square error as an energy function, where the model is a regression model, incorporating weighting parameters becomes relatively straightforward. Each weight would be paired with a candidate predictor.
Lastly, a per-sample loss function must be selected to determine the loss functional. The loss functional, which incorporates the weighting parameters of the model, is the function (of functions) minimized to obtain the optimal weights.
The core steps of the FREL algorithm are as follows:
- Begin with a training dataset comprising n observations with d candidate predictors corresponding to n target values. The goal is to ascertain the most relevant predictors from the set of d candidates to determine the target values. This results in assigning weights to each of the d predictors, indicating the importance of the variable relative to others. A larger weight signifies greater relevance in defining the target value.
- Initially, assign all weights a value of 1.
- Apply weighted nearest neighbors classification to each observation in the training data to identify the configuration of incorrect variables yielding the lowest energy and the nearest configuration of correct variables with high energy. Utilize these energy values to calculate the per-sample loss using the selected loss function.
- Finally, minimize the objective loss function, optionally with regularization, using an appropriate optimization procedure. This constitutes the core FREL algorithm.
MQL5 implementation of FREL
The implementation of FREL showcased in this text utilizes Powell's method of optimization. While the specific optimization method used is not critical, the results should be relatively consistent across methods. In this implementation, Powell's method is represented as the class "PowellsMethod," defined in Powells.mqh. The FREL algorithm is encapsulated in the class "FREL," a descendant of "PowellsMethod," specified in frel.mqh.
//+------------------------------------------------------------------+ //| constructor | //+------------------------------------------------------------------+ FREL(matrix &in_data,int numboot=1, int bootsize=0) { m_data = in_data; m_num_boot=(numboot>0)?numboot:1; m_bootsize=(bootsize>2 && bootsize<=int(m_data.Rows()) && m_num_boot>1)?bootsize:int(m_data.Rows()); if(ArrayResize(m_indices, int(m_data.Rows()))!=int(m_data.Rows()) || ArrayResize(m_target_bin, int(m_data.Rows()))!=int(m_data.Rows()) || ArrayResize(m_trial_weights, int(m_data.Cols()-1))!=int(m_data.Cols()-1) || ArrayResize(m_work_weights, int(m_data.Cols()-1))!=int(m_data.Cols()-1) ) { Print(__FUNCTION__, " error ", GetLastError()); m_memory_allocated = false; } else m_memory_allocated = true; }
Let's delve into the parametric constructor description. It is invoked with at least one parameter: a matrix of training data. It's crucial to note how the training data should be structured in the matrix. Each row represents an observation or a single sample, while the columns represent the candidate variables or predictors to be evaluated. The target is expected to reside in the last column of the matrix. The optional parameters of the constructor are further explained in the table below.
Parameter Name | Data Type | Default Value | Description |
---|---|---|---|
numboot | integer | 1 | "numboot" sets the number of bootstraps to be conducted |
bootsize | integer | 0 | "bootsize" defines the size of each bootstrap. Be careful setting this parameter. If a value larger than the number of observations in the matrix is used, "numboot" automatically fall back to 1 and "bootsize" to the number of observations. |
There's only one method users need to familiarize themselves with to utilize the "FREL" class: "WeighVars()."
//+-----------------------------------------------------------------------+ //| Find the most relevant variables from a dataset of candidate variables| //+-----------------------------------------------------------------------+ bool WeighVars(int num_bins_target, double reg_factor,int &index[],double &weights[]) { if(!m_memory_allocated) { Print(" INTERNAL ERROR "); return false; } if(num_bins_target<=1 || num_bins_target>int(m_data.Rows())) { Print(__FUNCTION__, " invalid function parameter: num_bins_target. Parameter should be >=2 "); return false; } int ret=0; double target[], target_thresholds[] ; double sum ; int n_cases = int(m_data.Rows()); m_reg_factor = MathAbs(reg_factor); m_loss = 0.0; if(ArrayResize(index,int(m_data.Cols()-1))!=int(m_data.Cols()-1) || !np::vecAsArray(m_data.Col(m_data.Cols()-1),target) ) { Print(__FUNCTION__, " error ", GetLastError()); return false; } int k = num_bins_target ; if(!bin_array(target, k, target_thresholds, m_target_bin)) return false; if(k<num_bins_target) { Print("error bins of target vector ", num_bins_target," : ", k); return false; } for(int i=0 ; i<n_cases ; i++) { if(m_target_bin[i] >= num_bins_target) { Print("error m_target_bin array at index ", i, " is ",m_target_bin[i], " should be less than ", num_bins_target); return false; } } ret = calc_wt(num_bins_target,m_loss,weights); if(ret<0) return false; sum = 0.0 ; for(ulong var=0 ; var<m_data.Cols()-1 ; var++) { weights[var] = m_data.Col(var).Std() * exp(weights[var]); sum += weights[var] ; } for(ulong var=0 ; var<m_data.Cols()-1 ; var++) { weights[var] *= 100.0 / sum ; index[var] = int(var) ; } MathQuickSortDescending(weights,index,0,int(weights.Size()-1)) ; return true; }
This method evaluates the training data specified in the constructor. It returns a boolean value, with "false" indicating that the procedure failed to complete. The parameters of this method are as follows:
- "num_bins_target": An integer defining the number of bins the target values will be partitioned into. This parameter should be set to any integer >= 2 but <= the number of observations in the training data.
- "reg_factor": A positive double value controlling the degree of regularization. A value of 0 disables regularization.
- "index[]": An integer array to which part of the operation's results will be written. It contains the original column indices, as supplied to the constructor, arranged in descending order of relevance to the target.
- "weights[]": A double array containing the optimal weightings, arranged in descending order.
When "WeighVars()" is invoked, the target values are extracted from the matrix and put into an array in preparation for a call to the private method "bin_array()." This method segments the array into approximately equal-sized categories, outputting two arrays upon successful completion. "upperbound_thresholds[]" is an array of upper thresholds for each segment, while the integer array "categories[]" contains index values representing the segment each corresponding target value belongs to. Each of these values is checked to ensure that all target values were binned correctly.
//+------------------------------------------------------------------+ //| calculates the optimal weights of candidate variables | //+------------------------------------------------------------------+ int calc_wt(int num_bins_target,double &loss_value, double &w[]) { int ret,rand_error, class_count[] ; ret = 0; if(ArrayResize(class_count,num_bins_target)!=num_bins_target || (w.Size()!=uint(m_data.Cols()-1) && ArrayResize(w,int(m_data.Cols()-1))!=int(m_data.Cols()-1))) { Print(__FUNCTION__, " error ", GetLastError()); return -1; } ArrayInitialize(w,0.0); loss_value = 0.0 ; for(ulong i=0 ; i<m_data.Rows() ; i++) m_indices[i] = int(i) ; for(int ibootstrap=0 ; ibootstrap<m_num_boot; ibootstrap++) { Comment(" Bootstrap iteration ", ibootstrap+1); ArrayInitialize(class_count,0); int ii, j, k, m; ii = int (m_data.Rows()) ; while(ii > 1) { m = int (m_data.Rows()) - ii ; if(m >= m_bootsize) break ; j = (int)(MathRandomUniform(0.0,1.0,rand_error) * ii) ; if(j >= ii) j = ii - 1 ; k = m_indices[m] ; m_indices[m] = m_indices[m+j] ; m_indices[m+j] = k ; --ii ; ++class_count[m_target_bin[m_indices[m]]] ; } for(int i=0 ; i<num_bins_target ; i++) { if(class_count[i] < 2) Print(__FUNCTION__, " class at ", i, " has less than 2 members. Consider adjusting Frel parameters. (number of partitions or bootstrap sample size)"); } ArrayInitialize(m_trial_weights,0.0); ret += Optimize(m_trial_weights); loss_value += PowellsMethod::GetFret() ; for(ulong i=0 ; i<m_data.Cols()-1 ; i++) w[i] += m_trial_weights[i] ; } for(ulong i=0 ; i<m_data.Cols()-1; i++) w[i] /= double(m_num_boot) ; return ret ; }
Weight estimation commences with a call to "calc_wt()." Here, bootstrap sampling is conducted, shuffling the data before initial weights are optimized. Optimization is performed by the parent class method "Optimize()." The optimal weights for each bootstrap are summed in "w[]," to be averaged before exiting "calc_wt()".
//+------------------------------------------------------------------+ //| function minimized by Powells optimization method | //+------------------------------------------------------------------+ virtual double func(const double& p[]) { double pen = 0.0 ; for(ulong i=0 ; i<m_data.Cols()-1 ; i++) { if(p[i] > 4.0) { m_work_weights[i] = exp(4.0) + p[i] - 4.0 ; pen += (p[i] - 4.0) * (p[i] - 4.0) ; } else if(p[i] < -3.0) { m_work_weights[i] = exp(-3.0) + p[i] + 3.0 ; pen += (p[i] + 3.0) * (p[i] + 3.0) ; } else m_work_weights[i] = exp(p[i]) ; } return (loss(m_work_weights) + pen) ; }
Remember that the function being minimized is the loss functional, represented by an overridden method of the parent class called "func()".
//+------------------------------------------------------------------+ //| calculates the loss function | //+------------------------------------------------------------------+ double loss(double &w[]) { double totaloss = total_loss(w); totaloss/=double(m_data.Rows()); if(m_reg_factor>0.0) { for(ulong i=0; i<m_data.Cols()-1;i++) totaloss+=m_reg_factor*pow(w[i],2.0); } return totaloss; }
Within "func()," the method "loss()" is engaged, triggering the calculation of the per-sample loss function implemented as the private method "total_loss()".
//+------------------------------------------------------------------+ //| loss over all data | //+------------------------------------------------------------------+ double total_loss(double &w[]) { int category,first, other ; double distance, top, bottom, loss ; loss = 0.0 ; for(int i=0; i<m_bootsize; i++) { other = m_indices[i] ; category = m_target_bin[other] ; top = bottom = DBL_MAX ; for(int iother=0 ; iother<m_bootsize; iother++) { first = m_indices[iother] ; if(first == other) continue ; distance = 0.0 ; for(ulong v=0 ; v<m_data.Cols()-1; v++) { distance += w[v] * fabs(m_data[other][v] - m_data[first][v]) ; } if(m_target_bin[first] == category) { if(distance < top) top = distance ; } else { if(distance < bottom) bottom = distance ; } } distance = top - bottom ; if(distance > 30.0) loss += distance ; else loss += log(1.0 + exp(distance)); } return loss ; }
Upon completion of all bootstraps, the averaged optimal weights are written to the user-supplied "weights[]" array. Before sorting in descending order, the weights are transformed so that they collectively add up to 100, enhancing interpretability.
An example of feature selection using FREL
To demonstrate the FREL algorithm, we provide the script "FrelExample.mq5." This script utilizes FREL to analyze a randomly generated dataset comprising candidate variables and a target to identify the best predictors. Users can adjust all parameters of the FREL algorithm and certain characteristics of the synthetic dataset. This includes the total number of observations ( num_observations ) and the number of candidate predictors ( num_candidate_predictors ). Below is a snippet illustrating the user-adjustable inputs of the script:
//---user adjustable input parameters input int number_of_partitions = 8; //Number of partitions input double regularization_factor = 0.0; //Regularization factor input int number_of_bootstraps = 1; input int bootstrap_size = 0; input ulong num_observations = 2000;// Sample size of random dataset input ulong num_candidate_predictors = 10;// Maximum number of candidate predictors in dataset
The script generates a matrix of random numbers with num_observations rows and num_candidate_predictors + 1 columns. The last column is overwritten with the sum of the columns at indices 1, 3, 5, and 7, which serves as the target variable of the dataset.
//+------------------------------------------------------------------+ //| Script program start function | //+------------------------------------------------------------------+ void OnStart() { srand(126); //---check user input parameters if(number_of_partitions<2 || num_observations<10 || num_candidate_predictors<8) { Print("Invalid input parameters"); return; } //---the data matrix dataset; //---initialize size of random dataset dataset.Init(num_observations,num_candidate_predictors+1); //---fill dataset with random data dataset.Random(1.0,10.0); //---set the target variable in the last column if(!dataset.Col(dataset.Col(1) + dataset.Col(3) + dataset.Col(5) + dataset.Col(7),num_candidate_predictors)) { Print("error ", GetLastError()); return; } //---initialize Frel object FREL frel(dataset,number_of_bootstraps,bootstrap_size); //---declare containers to recieve output from Frel operation double optimal_weights[]; int index[]; //--- ulong timeIT = GetTickCount64(); //---find the most relevant predictors if(!frel.WeighVars(number_of_partitions,regularization_factor,index,optimal_weights)) return; //---calculate runtime Print("Runtime of FREL ", GetTickCount64() - timeIT, " ms"); //---display results for(uint i = 0; i<optimal_weights.Size(); i++) Print("Predictor at Column index ", index[i], " weight ", optimal_weights[i]); } //+------------------------------------------------------------------+
The objective is to observe if FREL can appropriately weight the variables, designating columns 1, 3, 5, and 7 as having the strongest relationship with the target. Initially, we execute the script with default parameters, noting that regularization is disabled and only one bootstrap is specified.
Output.
ON 0 18:12:30.906 FrelExample (BTCUSD,D1) Runtime of FREL 273375 ms GD 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 7 weight 24.46987538756267 IH 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 3 weight 24.22319404776024 EL 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 5 weight 22.26820806768701 LP 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 1 weight 22.13748732798876 DD 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 0 weight 1.162036446785271 KK 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 8 weight 1.1532145209345603 RO 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 4 weight 1.1496286906955606 RS 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 2 weight 1.1472521997561425 NG 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 6 weight 1.14561384476096 DK 0 18:12:30.906 FrelExample (BTCUSD,D1) Predictor at Column index 9 weight 1.14348946606884
Next, we investigate the impact of regularization on the estimated weights by testing with regularization degrees of 0.1 and 1.0.
Output.
MQ 0 18:19:03.951 FrelExample (BTCUSD,D1) Runtime of FREL 331296 ms QD 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 3 weight 19.63442784832085 PK 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 5 weight 19.009699240770477 GO 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 7 weight 18.823288529399388 GQ 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 1 weight 18.18026689510982 NE 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 0 weight 4.106428447842871 KI 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 8 weight 4.075425288243113 OM 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 2 weight 4.070169243578418 MQ 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 6 weight 4.051103060690134 FE 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 9 weight 4.025271426001863 FJ 0 18:19:03.951 FrelExample (BTCUSD,D1) Predictor at Column index 4 weight 4.0239200200430805
Output.
HP 0 18:25:43.421 FrelExample (BTCUSD,D1) Runtime of FREL 362984 ms FF 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 3 weight 10.353013480731704 JJ 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 7 weight 10.227015183302557 IM 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 5 weight 10.213781888319609 KQ 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 1 weight 10.079770794877978 PF 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 0 weight 9.948300319843046 QJ 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 8 weight 9.938367489770178 KN 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 2 weight 9.897336276433514 DQ 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 6 weight 9.79559491756489 EF 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 9 weight 9.774541742551756 CI 0 18:25:43.421 FrelExample (BTCUSD,D1) Predictor at Column index 4 weight 9.77227790660475
Results from the regularization tests indicate that the weights become spread across other variables, diverging from the correct variables. It is likely that specifying a larger degree of regularization will lead to less distinct weights, making it challenging to differentiate useful variables from irrelevant ones.
Upon examining the runtime results from our tests, it's evident that FREL operates relatively slowly. The bottleneck is likely attributed to the "total_loss()" function, which must iterate through the entire dataset multiple times as the optimizer executes. To improve runtime efficiency, we conduct numerous bootstraps with a smaller sample size. The following results are obtained from a run with 100 bootstraps of 40 samples.
Output.
IN 0 18:30:55.441 FrelExample (BTCUSD,D1) Runtime of FREL 22985 ms OK 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 3 weight 18.706272752181135 OL 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 1 weight 18.32079620338284 RS 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 5 weight 18.194009676469012 HG 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 7 weight 16.298306686632337 MI 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 4 weight 5.838867272535404 LM 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 9 weight 5.249285089162589 FQ 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 8 weight 4.791606631149278 DE 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 6 weight 4.770223641360407 KI 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 0 weight 3.974977300216029 KM 0 18:30:55.441 FrelExample (BTCUSD,D1) Predictor at Column index 2 weight 3.855654746910961
Conclusion
In this article, we introduced an MQL5 implementation of feature weighting using regularized energy-based modeling. We provided a brief overview of the algorithm's theoretical foundation and showcased its effectiveness on a synthetic dataset. While the results were promising, we observed that the algorithm incurred significant computational costs, leading to slow analysis. To address this issue, we proposed the utilization of multiple bootstraps with smaller sample sizes, which notably improved the overall execution speed of the algorithm. However, our implementation could still benefit greatly from either multithreading or GPU acceleration. Nonetheless, individuals interested in the method are encouraged to customize the code according to their needs. The article includes all the code discussed, with each source file detailed in the table below.
Source File | Description |
---|---|
Mql5\include\np.mqh | A header file of various vector and matrix manipulation tools |
Mql5\include\Powells.mqh | Contains the definition of the PowellsMethod class which implements Powells method of function minimization |
Mql5\include\frel.mqh | Contains the definition of the FREL class representing the feature weighting as regularized energy based learning algorithm |
Mql5\script\FrelExample.mq5 | A script demonstrating the use of the FREL class |





- Free trading apps
- Over 8,000 signals for copying
- Economic news for exploring financial markets
You agree to website policy and terms of use