
Self Optimizing Expert Advisor With MQL5 And Python (Part IV): Stacking Models
In this series of articles, we will discuss different ways of building trading applications capable of dynamically adjusting themselves to evolving market conditions. There are potentially infinite ways we can approach this problem, but it is unlikely that all possible solutions will be valid. Therefore, our goal today is to demonstrate and empirically analyze the merits and shortcomings of different possible solutions, to help you improve your trading strategies.
Overview of The Trading Strategy
We shall turn our attention to forecasting the NZDJPY currency pair. We desire to algorithmically learn a trading strategy from the data we will collect on the symbol from our MetaTrader 5 Terminal. As humans, we may be naturally biased towards trading strategies that are aligned with our own beliefs and interests. Machine learning models are also biased. The bias of a machine learning model is the extent to which the assumptions made by the model are violated. Our trading strategy will rely on an ensemble of 2 AI models. The first model will be trained to predict the future close price of the NZDJPY pair, 20 minutes into the future. The second model will be trained to predict the amount of error in the prediction made by the first model. This technique is known as stacking. Our hope is that by stacking 2 models, we will be able to overcome our human bias and reach higher levels of performance.
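The idea can be summarized in a few lines. The sketch below is purely illustrative and uses synthetic data rather than our market data: the final forecast is simply the first model's prediction plus the second model's estimate of the first model's error.

#A minimal, purely illustrative sketch of stacking on synthetic data
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.neural_network import MLPRegressor

#Synthetic data standing in for our market features and 20-step-ahead target
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=200)
X_new = rng.normal(size=(5, 3))

#Model 1 learns to predict the target directly
price_model = SGDRegressor().fit(X_train, y_train)

#Model 2 learns to predict model 1's error (its residuals) on the training data
error_model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(
    X_train, y_train - price_model.predict(X_train)
)

#The stacked forecast corrects the base prediction with the predicted error
stacked_forecast = price_model.predict(X_new) + error_model.predict(X_new)
print(stacked_forecast)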
Overview of The Methodology
We fetched approximately 9000 rows of M1 market data on the NZDJPY pair from our MetaTrader 5 Terminal using a customized MQL5 script. We created 2D and 3D scatter plots of the market data. However, we were not able to identify any discernible relationships in the data. We also performed time-series decomposition on the data-set, and we were able to identify a clear downtrend and the presence of strong seasonal effects in the data.
Our data was then partitioned into training and test sets. A set of 15 different models were fit and evaluated on the training set. The Stochastic Gradient Descent (SGD) Regressor was the best performing model from the group.
Subsequently, when we analyzed our feature importance metrics, we found that the High price appeared to be the most informative predictor we had for forecasting the future close price of the NZDJPY pair. The High price obtained the best Mutual Information (MI) score. Furthermore, we also employed scikit-learn's implementation of the Recursive Feature Elimination (RFE) algorithm. All the predictors we had were deemed important by the RFE algorithm. However, as we shall see in our discussion, just because a relationship exists does not guarantee that we will successfully capture and model it.
After identifying our best performing model, we then proceeded to tune its parameters. Normally in our discussions, after parameter tuning, we proceed to test for overfitting by comparing the performance of our customized model against the performance of the default model. However, there are many different ways we can test for overfitting. Today, we chose to test for overfitting by analyzing our model's residuals. We observed high levels of correlation between our model's residuals and their lags. The residuals of a model that has learned sufficiently should be uncorrelated, resembling white noise. Therefore, this suggested to us that our best performing model may not have learned effectively, or that there is other data that could help us explain our target which we failed to include.
Afterward, we recorded the residuals of our model on the training and test sets. We did not fit the model on the test set at this point. We then cross validated our set of 15 models on the training residuals of our SGD Regressor. The best performing model was the Lasso Regression, however we selected the third-best model, a Deep Neural Network (DNN), as our candidate solution. Our rationale was that the flexibility of the deep neural network gives us more opportunity to tune it to the data than the Lasso, which has a limited number of tuning parameters.
We tuned our DNN Regressor to predict the residuals of our SGD Regressor in a 2-step process that produced 2 unique models. We first performed 100 iterations of a random search over the parameters of our DNN Regressor, which gave us the first model. The best continuous parameters we identified were then used as the starting point for a bounded numerical optimization attempt using the limited-memory L-BFGS-B algorithm, and this is how we obtained our second model. Both models outperformed the default DNN Regressor on unseen validation data. Furthermore, the L-BFGS-B model was the best performer of all, confirming that neither of these extra steps was wasted effort.
Finally, we exported both our models to ONNX format and proceeded to build an AI-powered Expert Advisor that has learned to correct its own errors.
Fetching The Data We Need
We will start off by fetching the data we need from our MetaTrader 5 Terminal. The script attached below will fetch as many bars of historical price data as we specify from our Terminal, before writing that data in CSV format and storing it for us in our folder “Files”.
//+------------------------------------------------------------------+
//|                                                      ProjectName |
//|                                      Copyright 2020, CompanyName |
//|                                       http://www.companyname.net |
//+------------------------------------------------------------------+
#property copyright "Gamuchirai Zororo Ndawana"
#property link      "https://www.mql5.com/en/users/gamuchiraindawa"
#property version   "1.00"
#property script_show_inputs

//+------------------------------------------------------------------+
//| Script Inputs                                                    |
//+------------------------------------------------------------------+
input int size = 100000; //How much data should we fetch?

//+------------------------------------------------------------------+
//| Global variables                                                 |
//+------------------------------------------------------------------+
int rsi_handler;
double rsi_buffer[];

//+------------------------------------------------------------------+
//| On start function                                                |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- Load indicator
   rsi_handler = iRSI(_Symbol,PERIOD_CURRENT,20,PRICE_CLOSE);
   CopyBuffer(rsi_handler,0,0,size,rsi_buffer);
   ArraySetAsSeries(rsi_buffer,true);

//--- File name
   string file_name = "Market Data " + Symbol() + ".csv";

//--- Write to file
   int file_handle=FileOpen(file_name,FILE_WRITE|FILE_ANSI|FILE_CSV,",");
   for(int i= size;i>=0;i--)
     {
      if(i == size)
        {
         FileWrite(file_handle,"Time","Open","High","Low","Close");
        }
      else
        {
         FileWrite(file_handle,iTime(Symbol(),PERIOD_CURRENT,i),
                   iOpen(Symbol(),PERIOD_CURRENT,i),
                   iHigh(Symbol(),PERIOD_CURRENT,i),
                   iLow(Symbol(),PERIOD_CURRENT,i),
                   iClose(Symbol(),PERIOD_CURRENT,i));
        }
     }
//--- Close the file
   FileClose(file_handle);
  }
//+------------------------------------------------------------------+
Cleaning The Data
We will start by formatting our data. First, load the libraries we need.
#Import the libraries we need
import pandas as pd
import numpy as np
Now read in the market data.
#Read in the market data
nzd_jpy = pd.read_csv('Market Data NZDJPY.csv')
The data is in the wrong order; let us reverse it so that it runs from the oldest date first, with the date closest to the present moment last.
#Format the data
nzd_jpy = nzd_jpy[::-1]
nzd_jpy.reset_index(inplace=True, drop=True)
Define the target.
#Labelling the data
nzd_jpy['Target'] = nzd_jpy['Close'].shift(-20)
nzd_jpy.dropna(inplace=True)
We will also add binary targets for plotting purposes.
#Add binary target for plotting
nzd_jpy['Binary Target'] = np.nan
nzd_jpy.loc[nzd_jpy['Close'] > nzd_jpy['Target'],'Binary Target'] = 1
nzd_jpy.loc[nzd_jpy['Close'] <= nzd_jpy['Target'],'Binary Target'] = 0
Let us inspect the data.
#Current state of our dataframe
nzd_jpy
Fig 1: Our current data frame
Exploratory Data Analysis
Let us create scatter-plots to determine if there are any relationships we can observe. Unfortunately, our data appears randomly distributed with no clear separations between up and down moves in the market.
#Let's perform scatter plots
#We also need seaborn for plotting
import seaborn as sns

sns.scatterplot(data=nzd_jpy,x=nzd_jpy['Open'], y=nzd_jpy['Close'],hue='Binary Target')
Fig 2: A scatter-plot of the Open and Close price
We then created box plots to summarize all the instances whereby price either rose or fell, believing that there could potentially be differences between the distribution of the data across these 2 possible targets. Regrettably, our box-plots show that there is hardly any difference between the distributions of the two possible outcomes.
#Let's create categorical box plots
sns.catplot(data=nzd_jpy,x='Binary Target',y='Close',kind='box')
Fig 3: A box-plot summarizing all the instances whereby price levels fell (0) or rose (1)
We can also decompose the time-series data into 3 components:
- Trend
- Seasonal
- Residual
The trend component represents the average long-term movement of price levels. The seasonal component accounts for cyclical patterns that are observed repeatedly in the data, while the residual component is whatever remains that could not be explained by the previous 2 components. In an additive model, these three components sum to reproduce the original series (Close = Trend + Seasonal + Residual). Since we are using M1 data on the NZDJPY pair, we set our period to 1440 minutes, in other words one complete trading day. We can observe a very clear and strong down-trend in the data even before performing the decomposition. However, by subtracting the trend from the original data, we can now clearly observe the seasonal effects in the data.
#Time series decomposition
import statsmodels.api as sm

nzd_jpy_decomposition = sm.tsa.seasonal_decompose(nzd_jpy['Close'],period=1440,model='additive')
fig = nzd_jpy_decomposition.plot()
Fig 4: Time-series decomposition of our data
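As a quick sanity check (an optional step, not part of the workflow above), the three components of an additive decomposition should sum back to the original series, apart from the NaN values at the edges where the trend cannot be estimated.

#Illustrative check: trend + seasonal + residual should reproduce the Close series
reconstructed = (nzd_jpy_decomposition.trend
                 + nzd_jpy_decomposition.seasonal
                 + nzd_jpy_decomposition.resid)

#The largest absolute difference should be effectively zero (NaN edges are skipped)
print((reconstructed - nzd_jpy['Close']).abs().max())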
Some effects may be hidden in higher dimensions. Creating 3D scatter-plots of our data may allow us to unveil effects hidden beyond the scope of a 2D scatter-plot. Unfortunately, this dataset was not one of those cases. Our data is still quite challenging to separate, and does not display any discernible relationships.
#Let's also perform 3D plots
#Visualizing our data in 3D
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111,projection='3d')
colors = ['blue' if movement == 0 else 'red' for movement in nzd_jpy.loc[:,"Binary Target"]]
ax.scatter(nzd_jpy.loc[:,"Low"],nzd_jpy.loc[:,"High"],nzd_jpy.loc[:,"Close"],c=colors)
ax.set_xlabel('NZDJPY Low')
ax.set_ylabel('NZDJPY High')
ax.set_zlabel('NZDJPY Close')
Fig 5: Visualizing our 3D scatter-plot
Preparing To Model The Data
Before we can start modeling our data, we need to first standardize and scale the data. Load the libraries we need.
#Let's prepare the data for modelling
from sklearn.preprocessing import RobustScaler
Scale the data.
#Scale the data
X = pd.DataFrame(RobustScaler().fit_transform(nzd_jpy.loc[:,['Close']]),columns=['Close'])
residuals_X = pd.DataFrame(RobustScaler().fit_transform(nzd_jpy.loc[:,['Open','High','Low']]),columns=['Open','High','Low'])
y = nzd_jpy.loc[:,'Target']
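For context, RobustScaler centers each column on its median and scales it by the interquartile range, which makes it less sensitive to outliers than ordinary z-scoring. A tiny illustration on synthetic numbers (not our market data):

#Illustrating RobustScaler on a column containing one extreme outlier
import numpy as np
from sklearn.preprocessing import RobustScaler

sample = np.array([[1.0], [2.0], [3.0], [100.0]])
#Because the scaler uses the median and IQR, the outlier barely distorts the other rows
print(RobustScaler().fit_transform(sample))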
Model Selection
Let us load the libraries we need to model the data.
#Cross validating the models
from sklearn.model_selection import cross_val_score,train_test_split
from sklearn.linear_model import Lasso,LinearRegression,Ridge,ElasticNet,SGDRegressor,HuberRegressor
from sklearn.ensemble import RandomForestRegressor,GradientBoostingRegressor,AdaBoostRegressor,ExtraTreesRegressor,BaggingRegressor
from sklearn.svm import LinearSVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
Now partition the data into 2 halves.
#Create train-test splits
train_X,test_X,train_y,test_y = train_test_split(nzd_jpy.loc[:,['Close']],y,test_size=0.5,shuffle=False)
residuals_train_X,residuals_test_X,residuals_train_y,residuals_test_y = train_test_split(nzd_jpy.loc[:,['Open','High','Low']],y,test_size=0.5,shuffle=False)
Store the models in a list and create a data frame to store our validation error levels.
#Store the models
models = [
    Lasso(),
    LinearRegression(),
    Ridge(),
    ElasticNet(),
    SGDRegressor(),
    HuberRegressor(),
    RandomForestRegressor(),
    GradientBoostingRegressor(),
    AdaBoostRegressor(),
    ExtraTreesRegressor(),
    BaggingRegressor(),
    LinearSVR(),
    KNeighborsRegressor(),
    DecisionTreeRegressor(),
    MLPRegressor(),
]

#Store the names of the models
model_names = [
    'Lasso',
    'Linear Regression',
    'Ridge',
    'Elastic Net',
    'SGD Regressor',
    'Huber Regressor',
    'Random Forest Regressor',
    'Gradient Boosting Regressor',
    'Ada Boost Regressor',
    'Extra Trees Regressor',
    'Bagging Regressor',
    'Linear SVR',
    'K Neighbors Regressor',
    'Decision Tree Regressor',
    'MLP Regressor',
]

#Create a dataframe to store our cv error
cv_error = pd.DataFrame(columns=model_names,index=np.arange(0,5))
Cross validate each model.
#Cross validate each model
for model in models:
    cv_score = cross_val_score(model,X,y,cv=5,n_jobs=-1,scoring='neg_mean_squared_error')
    for i in np.arange(0,5):
        index = models.index(model)
        cv_error.iloc[i,index] = cv_score[i]
Visualizing the results.
cv_error
Fig 6: Some of our error levels when forecasting the future close price of the NZDJPY
Fig 7: A continuation of our error levels
Fig 8: Our final model error levels
We can plot our performance levels across all 5 folds. Our neural network's poor performance was alarming when we visualized the data in this format; it would probably benefit a great deal from parameter tuning.
cv_error.plot()
Fig 9: Visualizing our error levels
Box-plots help us summarize a lot of information in a single plot. For example, in our plots below, we can clearly see how poorly the DNN performed on this task. It is the last model on the right, and it is displaying a lot more variance in its performance than the other models.
sns.boxplot(cv_error)
Fig 10: Visualizing our errors as box-plots
We can identify the best performing model as the model with the lowest mean error levels.
#Our mean validation error
cv_error.mean()
Fig 11: Visualizing our mean error levels
Feature Importance
Feature importance algorithms help us understand whether our model has learned meaningful associations, or whether it has picked up relationships we may not have known about. First, let us import the libraries we need.
#Feature importance
from sklearn.feature_selection import mutual_info_regression,RFE
We will begin by calculating mutual information (MI) scores. MI informs us about the potential each predictor possesses to help us predict the target. MI is measured on a logarithmic scale, so MI scores above 3 are rarely seen in practice.
mi_score = pd.DataFrame(mutual_info_regression(X,y),columns=['MI Score'],index=X.columns)
Plotting our MI scores clearly shows us that the High price appears to have the most potential to predict the future close price of the NZDJPY.
mi_score.plot()
Fig 12: Plotting our MI scores
The RFE algorithm is as straightforward to use as almost any other object from the scikit-learn library. We simply create an instance of the class and fit it to the data, before assessing the feature importance levels it attributes to each predictor. Our RFE algorithm deemed all the predictors equally important for predicting the NZDJPY close price.
#Select the best features
rfe = RFE(model, n_features_to_select=5, step=1)
rfe = rfe.fit(X, y)
rfe.ranking_
Parameter Tuning
Let us now extract as much performance from our SGD Regressor model as we can. We will perform a randomized search over a sample of the parameter space of the model. Let us first import the libraries we need.
#Parameter tuning
from sklearn.model_selection import RandomizedSearchCV
Create an instance of the default model.
#Initialize the model
model = SGDRegressor()
Create a tuner object and specify the possible parameter values we would like to sample.
#Define the tuner
tuner = RandomizedSearchCV(
    model,
    {
        "loss" : ['squared_error', 'huber', 'epsilon_insensitive','squared_epsilon_insensitive'],
        "penalty":['l2','l1', 'elasticnet', None],
        "alpha":[0.1,0.01,0.001,0.0001,0.00001,0.00001,0.0000001,10,100,1000,10000,100000],
        "tol":[0.1,0.01,0.001,0.0001,0.00001,0.000001,0.0000001],
        "fit_intercept": [True,False],
        "early_stopping": [True,False],
        "learning_rate":['constant','optimal','adaptive','invscaling'],
        "shuffle": [True,False]
    },
    n_iter=100,
    cv=5,
    n_jobs=-1,
    scoring="neg_mean_squared_error"
)
Fit the tuner object.
#Fit the tuner
tuner.fit(train_X,train_y)
The best parameters we found.
#Our best parameters
tuner.best_params_
{'tol': 0.001,
 'shuffle': False,
 'penalty': 'elasticnet',
 'loss': 'huber',
 'learning_rate': 'adaptive',
 'fit_intercept': True,
 'early_stopping': True,
 'alpha': 1e-05}
Testing For Overfitting
We can detect overfitting if we observe correlation in the residuals of the model. If a model has learned effectively, its residuals should be random white noise, indicating there is no predictable pattern in the errors being made by our model. However, a model that demonstrates autocorrelation in its residuals may be a cause for concern. It may signal that the regression we have performed is spurious, or that we have selected an inappropriate model for our task. To get started, we need to capture the residuals of our customized model.
#Model validation
model = SGDRegressor(
    tol            = tuner.best_params_['tol'],
    shuffle        = tuner.best_params_['shuffle'],
    penalty        = tuner.best_params_['penalty'],
    loss           = tuner.best_params_['loss'],
    learning_rate  = tuner.best_params_['learning_rate'],
    alpha          = tuner.best_params_['alpha'],
    fit_intercept  = tuner.best_params_['fit_intercept'],
    early_stopping = tuner.best_params_['early_stopping']
)

model.fit(train_X,train_y)
residuals = test_y - model.predict(test_X)
We can visualize our model's residuals in a plot. Unfortunately, we can clearly see there is autocorrelation in the model's residuals. In other words, whenever the residuals fall, they tend to continue falling, and when the residuals rise, they tend to continue rising. This means that the residuals' future values may have a relationship with their previous values, a tell-tale sign that our model may not have learned effectively, even after performing parameter tuning!
#Plot the residuals
residuals.plot()
Fig 13: Our model residuals
There are more robust ways to check for autocorrelation; for example, we can create an autocorrelation function (ACF) plot. The ACF plot has a spike at each possible lag value, and the height of each spike represents the level of correlation between the time series and that lagged copy of itself. The blue cone-like structure in the background of the plot represents our confidence interval; any correlation level that extends beyond the confidence interval is considered statistically significant.
#The residuals appear to have autocorrelation
from statsmodels.graphics.tsaplots import plot_acf
fig = plot_acf(residuals)
Fig 14: The auto-correlation of our model residuals
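If we want a numerical check in addition to the plot, one option (not used in the original workflow) is the Ljung-Box test from statsmodels, which tests the null hypothesis that the residuals are independently distributed; small p-values indicate significant autocorrelation.

#Optional numerical test for autocorrelation in the residuals
from statsmodels.stats.diagnostic import acorr_ljungbox

#Small p-values reject the null hypothesis of independent residuals
print(acorr_ljungbox(residuals, lags=[10, 20]))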
This is not a good sign; hopefully we may be able to alleviate this by training our DNN to correct our first model. Let us record our SGD Regressor's errors on the training data and then on the test data. Note, we will not fit the model on the test data at this point, we will only measure its error levels.
#Prepare the residuals for our second model
model = SGDRegressor(tol= 0.001,
                     shuffle=False,
                     penalty= 'elasticnet',
                     loss= 'huber',
                     learning_rate='adaptive',
                     fit_intercept= True,
                     early_stopping= True,
                     alpha= 1e-05)

#Store the model residuals
model.fit(train_X,train_y)
residuals_train_y = train_y - model.predict(train_X)
residuals_test_y = test_y - model.predict(test_X)
We will now cross validate our models on the task of predicting the error levels of the SGD Regressor.
#Cross validate each model
for model in models:
    cv_score = cross_val_score(model,residuals_train_X,residuals_train_y,cv=5,n_jobs=-1,scoring='neg_mean_squared_error')
    for i in np.arange(0,5):
        index = models.index(model)
        cv_error.iloc[i,index] = cv_score[i]
Let us visualize the validation error.
#Cross validaton error levels
cv_error
Fig 15: Some of model validation error levels when forecasting our first model's error levels
Fig 16: A continuation of our validation error levels
We can view our average error levels in descending order to quickly identify our best performing model.
#Store the model's performance
cv_error.mean().sort_values(ascending=False)
Fig 17: Our mean validation error levels clearly show the Lasso is the best performing model we have
Our box-plots show just how poorly the SGD Regressor performed when trying to predict its own error levels.
sns.boxplot(cv_error)
Fig 18: Box-plots of our validation error when forecasting our model's residuals
We can also create line-plots to visualize our data.
cv_error.plot()
Fig 19: Visualizing our error levels as line plots
Parameter Tuning Our Deep Neural Network
Let us now prepare to tune the parameters of our DNN Regressor. First, we will define the tuner object and a sample of the parameter space we wish to search.
#Let's tune the model
#Reinitialize the model
model = MLPRegressor()

#Define the tuner
tuner = RandomizedSearchCV(
    model,
    {
        "activation":["relu","tanh","logistic","identity"],
        "solver":["adam","sgd","lbfgs"],
        "alpha":[0.1,0.01,0.001,0.00001,0.000001],
        "learning_rate": ["constant","invscaling","adaptive"],
        "learning_rate_init":[0.1,0.01,0.001,0.0001,0.000001,0.0000001],
        "power_t":[0.1,0.5,0.9,0.01,0.001,0.0001],
        "shuffle":[True,False],
        "tol":[0.1,0.01,0.001,0.0001,0.00001],
        "hidden_layer_sizes":[(10,20),(100,200),(30,200,40),(5,20,6)],
        "max_iter":[10,50,100,200,300],
        "early_stopping":[True,False]
    },
    n_iter=100,
    cv=5,
    n_jobs=-1,
    scoring="neg_mean_squared_error"
)
Now we will fit the tuner object.
#Fit the tuner
tuner.fit(residuals_train_X,residuals_train_y)
Finally we can see the best parameters we found.
#The best parameters we found
tuner.best_params_
{'tol': 0.0001,
 'solver': 'lbfgs',
 'shuffle': False,
 'power_t': 0.5,
 'max_iter': 300,
 'learning_rate_init': 0.01,
 'learning_rate': 'constant',
 'hidden_layer_sizes': (30, 200, 40),
 'early_stopping': False,
 'alpha': 1e-05,
 'activation': 'identity'}
Deeper Parameter Tuning
Let us now see if we cannot find even better parameters still. Since we do not know exactly where the best input values lie, we will perform a bounded numerical optimization over the model's continuous parameters using the limited-memory L-BFGS-B algorithm from the SciPy library. L-BFGS-B is a quasi-Newton method that supports simple box constraints on its inputs, which makes it well suited to searching over a very wide range of parameter values. The numerical solver itself is implemented in Fortran, and the SciPy library provides a thin wrapper for easily interfacing with the routine. We will start by importing the libraries we need.
Fig 20: The developers of the original BFGS algorithm, from left to right: Broyden, Fletcher, Goldfarb and Shanno
#Deeper optimization
from scipy.optimize import minimize
Now we will define the objective function to be minimized: the cross-validation error of our DNN Regressor on the training residuals. We will fix all of the model's other inputs at the values found by our random search, since SciPy's minimizers can only handle continuous optimization problems.
#Define the objective function
def objective(x):
    #Create a dataframe to store our error levels
    cv_error = pd.DataFrame(index = np.arange(0,5),columns=["Current Error"])
    #The parameter x holds new values for the neural network's continuous settings
    #To evaluate these settings, we perform 5-fold cross validation
    #and return the average CV score (negative MSE) from all 5 tests
    model = MLPRegressor(
        tol                = x[0],
        solver             = tuner.best_params_['solver'],
        power_t            = x[1],
        max_iter           = tuner.best_params_['max_iter'],
        learning_rate      = tuner.best_params_['learning_rate'],
        learning_rate_init = x[2],
        hidden_layer_sizes = tuner.best_params_['hidden_layer_sizes'],
        alpha              = x[3],
        early_stopping     = tuner.best_params_['early_stopping'],
        activation         = tuner.best_params_['activation'],
    )
    #Cross validate the model
    cv_score = cross_val_score(model,residuals_train_X,residuals_train_y,cv=5,n_jobs=-1,scoring='neg_mean_squared_error')
    for i in np.arange(0,5):
        cv_error.iloc[i,0] = cv_score[i]
    #Return the mean CV score
    return(cv_error.iloc[:,0].mean())
We shall use the best parameters found by our random search as the starting point of our optimization procedure. We will also pass bounds to the optimizer: every value must be positive, and each parameter is allowed to range from 10 to the power -100 up to 10 to the power 100. This is a very large domain, and hopefully it contains the optimal parameters we are looking for.
#Define the starting point
pt = [tuner.best_params_['tol'],tuner.best_params_['power_t'],tuner.best_params_['learning_rate_init'],tuner.best_params_['alpha']]
bnds = ((10.0 ** -100,10.0 ** 100),
        (10.0 ** -100,10.0 ** 100),
        (10.0 ** -100,10.0 ** 100),
        (10.0 ** -100,10.0 ** 100))
Searching for the best parameters.
#Searching deeper for better parameters
result = minimize(objective,pt,method="L-BFGS-B",bounds=bnds)
Our optimization results.
result
success: True
status: 0
fun: -0.01143352283130129
x: [ 1.000e-04 5.000e-01 1.000e-02 1.000e-05]
nit: 2
jac: [-1.388e+04 -4.657e+04 5.625e+04 -1.033e+04]
nfev: 120
njev: 24
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
Testing For Overfitting
Let us now test for overfitting. This time around, we will compare our 2 customized models against the performance of the default DNN Regressor. Let us create an instance of each DNN Regressor we wish to test.
#Model validation
default_model = MLPRegressor()

customized_model = MLPRegressor(
    tol                = tuner.best_params_['tol'],
    solver             = tuner.best_params_['solver'],
    power_t            = tuner.best_params_['power_t'],
    max_iter           = tuner.best_params_['max_iter'],
    learning_rate      = tuner.best_params_['learning_rate'],
    learning_rate_init = tuner.best_params_['learning_rate_init'],
    hidden_layer_sizes = tuner.best_params_['hidden_layer_sizes'],
    alpha              = tuner.best_params_['alpha'],
    early_stopping     = tuner.best_params_['early_stopping'],
    activation         = tuner.best_params_['activation'],
)

lbfgs_customized_model = MLPRegressor(
    tol                = result.x[0],
    solver             = tuner.best_params_['solver'],
    power_t            = result.x[1],
    max_iter           = tuner.best_params_['max_iter'],
    learning_rate      = tuner.best_params_['learning_rate'],
    learning_rate_init = result.x[2],
    hidden_layer_sizes = tuner.best_params_['hidden_layer_sizes'],
    alpha              = result.x[3],
    early_stopping     = tuner.best_params_['early_stopping'],
    activation         = tuner.best_params_['activation'],
)
Now we will create a list of models and also create a data frame to store our validation error levels.
models = [default_model,customized_model,lbfgs_customized_model]

cv_error = pd.DataFrame(index=np.arange(0,5),columns=['Default NN','Random Search NN','L-BFGS-B NN'])
Cross-validating each model.
#Cross validate the model
for model in models:
    model.fit(residuals_train_X,residuals_train_y)
    cv_score = cross_val_score(model,residuals_test_X,residuals_test_y,cv=5,n_jobs=-1,scoring='neg_mean_squared_error')
    for i in np.arange(0,5):
        index = models.index(model)
        cv_error.iloc[i,index] = cv_score[i]
Our cross-validation error.
cv_error
| Default NN | Random Search NN | L-BFGS-B NN |
|---|---|---|
| -0.007735 | -0.007708 | -0.007692 |
| -0.00635 | -0.006344 | -0.006329 |
| -0.003307 | -0.003265 | -0.00328 |
| 0.005225 | -0.004803 | -0.004761 |
| -0.004469 | -0.004447 | -0.004492 |
Now let us analyze our average error levels.
cv_error.mean().sort_values(ascending=False)
| Model | Validation Error |
|---|---|
| L-BFGS-B NN | -0.005311 |
| Random Search NN | -0.005313 |
| Default NN | -0.005417 |
As we can see, our models were all performing within the same range. However, our customized models were clearly providing us with lower error levels on average. Unfortunately, the variance displayed by our models is almost the same across the board, as the box-plots below show. The variance levels help us gauge how reliably each model performs.
sns.boxplot(cv_error)
Fig 21: Our validation error levels on the held out data
Let us now see if our ensemble approach is better than just using a single model to forecast price levels. We will first prepare the models we need.
#Now that we have come this far, let's see if our ensemble approach is worth the trouble
baseline = LinearRegression()

default_nn = MLPRegressor()

#The SGD Regressor will predict the future price
sgd_regressor = SGDRegressor(tol= 0.001,
                             shuffle=False,
                             penalty= 'elasticnet',
                             loss= 'huber',
                             learning_rate='adaptive',
                             fit_intercept= True,
                             early_stopping= True,
                             alpha= 1e-05)

#The deep neural network will predict the error in the SGDRegressor's prediction
lbfgs_customized_model = MLPRegressor(
    tol                = result.x[0],
    solver             = tuner.best_params_['solver'],
    power_t            = result.x[1],
    max_iter           = tuner.best_params_['max_iter'],
    learning_rate      = tuner.best_params_['learning_rate'],
    learning_rate_init = result.x[2],
    hidden_layer_sizes = tuner.best_params_['hidden_layer_sizes'],
    alpha              = result.x[3],
    early_stopping     = tuner.best_params_['early_stopping'],
    activation         = tuner.best_params_['activation'],
)
Fitting the models on the training set.
#Fit the models on the train set
baseline.fit(train_X.loc[:,['Close']],train_y)
default_nn.fit(train_X.loc[:,['Close']],train_y)
sgd_regressor.fit(train_X.loc[:,['Close']],train_y)
lbfgs_customized_model.fit(residuals_train_X.loc[:,['Open','High','Low']],residuals_train_y)
Store the models in a list.
#Store the models in a list
models = [baseline,default_nn,sgd_regressor,lbfgs_customized_model]
Creating a data-frame to store our error levels.
#Create a dataframe to store our error
ensemble_error = pd.DataFrame(index=np.arange(0,5),columns=['Baseline','Default NN','SGD','Customized NN'])
Importing the libraries we need.
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
Create the time-series split object.
#Create the time-series object
tscv = TimeSeriesSplit(n_splits=5,gap=20)
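If you want to see what the splitter does, the short sketch below (run on a toy index rather than our data) prints the first and last index of each training and testing window; the gap of 20 bars prevents the 20-minute-ahead targets at the end of each training window from overlapping with the start of the test window.

#Illustrating how TimeSeriesSplit partitions a series when gap=20
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

toy = np.arange(200)  #A toy series standing in for our test set
splitter = TimeSeriesSplit(n_splits=5, gap=20)
for fold, (train_idx, test_idx) in enumerate(splitter.split(toy)):
    print(fold, train_idx[[0, -1]], test_idx[[0, -1]])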
We need to reset the indexes of our data.
#Reset the indexes so we can perform cross validation
test_y = test_y.reset_index()
residuals_test_y = residuals_test_y.reset_index()
test_X = test_X.reset_index()
residuals_test_X = residuals_test_X.reset_index()
Now we will perform time-series cross-validation. Note that the model predicting the residuals needs to be trained separately from the other models that are simply predicting the future close price.
#Cross validate the models
for j in np.arange(0,4):
    model = models[j]
    for i,(train,test) in enumerate(tscv.split(test_X)):
        #The model predicting the residuals
        if(j == 3):
            model.fit(residuals_test_X.loc[train[0]:train[-1],['Open','High','Low']],residuals_test_y.loc[train[0]:train[-1],'Target'])
            #Measure the loss
            ensemble_error.iloc[i,j] = mean_squared_error(residuals_test_y.loc[test[0]:test[-1],'Target'],
                                                          model.predict(residuals_test_X.loc[test[0]:test[-1],['Open','High','Low']]))
        elif(j <= 2):
            #Fit the model
            model.fit(test_X.loc[train[0]:train[-1],['Close']],test_y.loc[train[0]:train[-1],'Target'])
            #Measure the loss
            ensemble_error.iloc[i,j] = mean_squared_error(test_y.loc[test[0]:test[-1],'Target'],
                                                          model.predict(test_X.loc[test[0]:test[-1],['Close']]))
Now let us analyze our validation error levels. Unfortunately, we failed to outperform a simple linear regression model, indicating that our model may be too sensitive to the variance in the data. The good news is that we outperformed our default DNN Regressor.
ensemble_error.mean().sort_values(ascending=True)
| Model | Validation Error |
|---|---|
| Baseline | 0.004784 |
| Customized L-BFGS-B NN | 0.004891 |
| SGD | 0.005937 |
| Default NN | 35.35851 |
Exporting To ONNX Format
Open Neural Network Exchange (ONNX) is an open-source protocol for building and deploying machine learning models in a language-agnostic manner. ONNX allows us to easily integrate our scikit-learn models into our Expert Advisors by relying on the MQL5 API's support for ONNX. First, we will import the libraries we need.

#Prepare to export to ONNX
import onnx
import netron
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
Now let us fit the models on all the data we have available.
#Fit the SGD model on all the data we have
close_model = SGDRegressor(tol= 0.001,
                           shuffle=False,
                           penalty= 'elasticnet',
                           loss= 'huber',
                           learning_rate='adaptive',
                           fit_intercept= True,
                           early_stopping= True,
                           alpha= 1e-05)

close_model.fit(nzd_jpy.loc[:,['Close']],nzd_jpy.loc[:,'Target'])
Then lastly we will fit our DNN Regressor.
#Fit the deep neural network on all the data we have
residuals_model = MLPRegressor(tol=0.0001,
                               solver= 'lbfgs',
                               shuffle=False,
                               power_t= 0.5,
                               max_iter= 300,
                               learning_rate_init= 0.01,
                               learning_rate='constant',
                               hidden_layer_sizes=(30, 200, 40),
                               early_stopping=False,
                               alpha=1e-05,
                               activation='identity')

#Fit the model on the residuals
residuals_model.fit(residuals_train_X,residuals_train_y)
residuals_model.fit(residuals_test_X,residuals_test_y)
Let us define the input shapes of our 2 models.
# Define the input type
close_initial_types = [("float_input",FloatTensorType([1,1]))]
residuals_initial_types = [("float_input",FloatTensorType([1,3]))]
We need to create ONNX representations of our models.
# Create the ONNX representation
close_onnx_model = convert_sklearn(close_model,initial_types=close_initial_types,target_opset=12)
residuals_onnx_model = convert_sklearn(residuals_model,initial_types=residuals_initial_types,target_opset=12)
Lastly, we need to store our models in ONNX format.
#Save the ONNX Models
onnx.save_model(close_onnx_model,'close_model.onnx')
onnx.save_model(residuals_onnx_model,'residuals_model.onnx')
Visualizing our ONNX Models
Let us also visualize our models to ensure that they have the input and output shapes we specified. We will start by visualizing our DNN Regressor. First, we will import the library we need.
#Import netron
import netron
Now we will visualize our DNN.
#Visualizing the residuals model
netron.start("/ENTER/YOUR/PATH/residuals_model.onnx")
Fig 22: Visualizing our DNN Regressor model
Fig 23: Visualizing our DNN Regressor Model
Fig 24: The input and output shape of our DNN regressor meets our expectations
Let us also visualize our SGD Regressor model.
#Visualizing the close model
netron.start("/ENTER/YOUR/PATH/close_model.onnx")
Fig 25: Visualizing our SGD Regressor model
Fig 26: Visualizing our SGD Regressor model
Implementing in MQL5
To start building our AI-powered Expert Advisor, we first need to load the ONNX files we have just created into the application.
//+------------------------------------------------------------------+
//|                                                   NZD JPY AI.mq5 |
//|                                        Gamuchirai Zororo Ndawana |
//|                          https://www.mql5.com/en/gamuchiraindawa |
//+------------------------------------------------------------------+
#property copyright "Gamuchirai Zororo Ndawana"
#property link      "https://www.mql5.com/en/gamuchiraindawa"
#property version   "1.00"

//+------------------------------------------------------------------+
//| Load the ONNX models                                             |
//+------------------------------------------------------------------+
#resource "\\Files\\residuals_model.onnx" as const uchar residuals_onnx_buffer[];
#resource "\\Files\\close_model.onnx" as const uchar close_onnx_buffer[];
Now, we need the trade library to help us open and close our positions.
//+------------------------------------------------------------------+
//| Libraries                                                        |
//+------------------------------------------------------------------+
#include <Trade/Trade.mqh>
CTrade Trade;
Let us also create global variables that we will use throughout our program.
//+------------------------------------------------------------------+
//| Global variables                                                 |
//+------------------------------------------------------------------+
long residuals_model,close_model;
vectorf residuals_forecast = vectorf::Zeros(1);
vectorf close_forecast = vectorf::Zeros(1);
float forecast = 0;
int model_state,system_state;
int counter = 0;
double bid,ask;
In our Expert Advisor's initialization procedure, we will first load our 2 ONNX models and then validate that they are working.
//+------------------------------------------------------------------+
//| Expert initialization function                                   |
//+------------------------------------------------------------------+
int OnInit()
  {
//--- Load the ONNX models
   if(!load_onnx_models())
     {
      return(INIT_FAILED);
     }

//--- Validate the model is working
   model_predict();
   if(forecast == 0)
     {
      Comment("The ONNX models are not working correctly!");
      return(INIT_FAILED);
     }

//--- We managed to load our ONNX models
   return(INIT_SUCCEEDED);
  }
Whenever our Expert Advisor is removed from the chart, we will free up any resources we are no longer using.
//+------------------------------------------------------------------+
//| Expert deinitialization function                                 |
//+------------------------------------------------------------------+
void OnDeinit(const int reason)
  {
//--- Free up the resources we no longer need
   release_resources();
  }
If we receive any new prices, we will first update our market prices. Afterwards, we will fetch a prediction from our models. Once we have a forecast, we will check whether we have any open positions. If we have none, we will follow our model's prediction so long as the price change on the higher time frame permits us to do so. Otherwise, if we already have an open position, we will wait for 20 candles to elapse before checking whether our model is predicting a reversal. Recall that we trained the model to forecast 20 steps into the future, so we should allow some time to elapse before checking for a reversal.
//+------------------------------------------------------------------+
//| Expert tick function                                             |
//+------------------------------------------------------------------+
void OnTick()
  {
//--- Update market prices and fetch a fresh forecast
//--- (the published snippet begins inside OnTick; this wrapper follows the description above)
   update_market_data();
   model_predict();

//--- Track the time of the current bar (assumed declaration, not shown in the published snippet)
   static datetime time_stamp;
   datetime time = iTime(Symbol(),PERIOD_M1,0);

//--- If we have no positions, follow the model's forecast
   if(PositionsTotal() == 0)
     {
      //--- Our model is suggesting we should buy
      if(model_state == 1)
        {
         if(iClose(Symbol(),PERIOD_W1,0) > iClose(Symbol(),PERIOD_W1,12))
           {
            Trade.Buy(0.3,Symbol(),ask,0,0,"NZD JPY AI");
            system_state = 1;
           }
        }

      //--- Our model is suggesting we should sell
      if(model_state == -1)
        {
         if(iClose(Symbol(),PERIOD_W1,0) < iClose(Symbol(),PERIOD_W1,12))
           {
            Trade.Sell(0.3,Symbol(),bid,0,0,"NZD JPY AI");
            system_state = -1;
           }
        }
     }

//--- Otherwise, manage the position we already have
   else
     {
      //--- We want to wait 20 mins before forecasting again.
      if(time_stamp != time)
        {
         time_stamp = time;
         counter += 1;
        }

      if((system_state != model_state) && (counter >= 20))
        {
         Alert("Reversal detected by our AI system, closing all positions");
         Trade.PositionClose(Symbol());
         counter = 0;
        }
     }
  }
Let us define the function that fetches predictions from our models. Recall that we have 2 separate models that each need to be called in turn.
//+------------------------------------------------------------------+
//| Fetch a prediction from our models                               |
//+------------------------------------------------------------------+
void model_predict(void)
  {
//--- Define the inputs
   vectorf close_inputs = vectorf::Zeros(1);
   vectorf residuals_inputs = vectorf::Zeros(3);
   close_inputs[0] = (float) iClose(Symbol(),PERIOD_M1,0);
   residuals_inputs[0] = (float) iOpen(Symbol(),PERIOD_M1,0);
   residuals_inputs[1] = (float) iHigh(Symbol(),PERIOD_M1,0);
   residuals_inputs[2] = (float) iLow(Symbol(),PERIOD_M1,0);

//--- Fetch predictions
   OnnxRun(residuals_model,ONNX_DEFAULT,residuals_inputs,residuals_forecast);
   OnnxRun(close_model,ONNX_DEFAULT,close_inputs,close_forecast);

//--- Our forecast
   forecast = residuals_forecast[0] + close_forecast[0];
   Comment("Model forecast: ",forecast);

//--- Remember the model's prediction
   if(forecast > close_inputs[0])
     {
      model_state = 1;
     }
   else
     {
      model_state = -1;
     }
  }
We also need a function to update our market prices.
//+------------------------------------------------------------------+
//| Update the market data we have                                   |
//+------------------------------------------------------------------+
void update_market_data(void)
  {
   bid = SymbolInfoDouble(Symbol(),SYMBOL_BID);
   ask = SymbolInfoDouble(Symbol(),SYMBOL_ASK);
  }
This function will release the resources we are no longer using.
//+------------------------------------------------------------------+
//| Release the resources we no longer need                          |
//+------------------------------------------------------------------+
void release_resources(void)
  {
   OnnxRelease(residuals_model);
   OnnxRelease(close_model);
   ExpertRemove();
  }
Lastly, this function will prepare our ONNX models from the ONNX buffers we created in the beginning of our application.
//+------------------------------------------------------------------+
//| This function will load our ONNX models                          |
//+------------------------------------------------------------------+
bool load_onnx_models(void)
  {
//--- Load the ONNX models from the buffers we created
   residuals_model = OnnxCreateFromBuffer(residuals_onnx_buffer,ONNX_DEFAULT);
   close_model = OnnxCreateFromBuffer(close_onnx_buffer,ONNX_DEFAULT);

//--- Validate the models
   if((residuals_model == INVALID_HANDLE) || (close_model == INVALID_HANDLE))
     {
      //--- We failed to load the models
      Comment("Failed to create the ONNX models: ",GetLastError());
      return(false);
     }

//--- Set the I/O shapes of the models
   ulong residuals_inputs[] = {1,3};
   ulong close_inputs[] = {1,1};
   ulong model_output[] = {1,1};

//--- Validate the I/O shapes
   if((!OnnxSetInputShape(residuals_model,0,residuals_inputs)) || (!OnnxSetInputShape(close_model,0,close_inputs)))
     {
      //--- We failed to set the input shapes
      Comment("Failed to set model input shapes: ",GetLastError());
      return(false);
     }

   if((!OnnxSetOutputShape(residuals_model,0,model_output)) || (!OnnxSetOutputShape(close_model,0,model_output)))
     {
      //--- We failed to set the output shapes
      Comment("Failed to set model output shapes: ",GetLastError());
      return(false);
     }

//--- Everything went fine
   return(true);
  }
//+------------------------------------------------------------------+
Fig 27: Back testing our Expert Advisor
Fig 28: The results of testing our Expert Advisor
Conclusion
In this article, we have demonstrated how we can build self-correcting trading applications. Our conversation highlighted how to analyze your model's residuals to detect overfitting and bias in your machine learning models. Unfortunately, we found that our best performing model's residuals were misbehaved. We could try to remedy this by differencing the time-series data and the target until we no longer observe any autocorrelation in the residuals, however this could also make our model more challenging to interpret. While we cannot guarantee that the points we have raised in this article will generate consistent success, they are definitely worth considering if you are keen on applying AI to your trading strategies. Join us for our next discussion, in which we will try to remedy the pitfalls we observed today while balancing model interpretability.




