Data Science and ML (Part 44): Forex OHLC Time series Forecasting using Vector Autoregression (VAR)

MetaTrader 5 — Trading systems | 23 June 2025, 08:39

1 379

Omega J Msigwa

Contents

What is Vector Autoregression (VAR)?

This is a traditional and statistical time series forecasting tool used to investigate the dynamic relationships between multiple time series variables. Unlike univariate autoregressive models such as ARIMA (discussed in the previous article) which only forecast a single variable based on its previous values, VAR models investigate the interconnectivity of many variables.

They accomplish this by modeling each variable as a function of not only its previous values but also of the past values of other variables in the system. In this article, we will explore the fundamentals of Vectorautoregression and its application to trading.

Their origin
Vector Autoregression was first presented in the 1960s by economist Clive Granger. Granger's significant discoveries laid the framework for understanding and modeling the dynamic interactions that exist among economic factors. VAR models acquired significant momentum in econometrics and macroeconomics during the 1970s and 1980s.

This technique is a multivariate extension of auto-regression (AR) models. While traditional AR models such as ARIMA, analyze the relationship between a single variable and its lagged values, VAR models consider multiple variables simultaneously. In a VAR model, each variable is regressed on its own lagged values as well as the lagged values of other variables in the system.

In the previous article of this series, we discussed ARIMA and found out that it cannot incorporate multiple variables into its training and forecasting process, In this article we will discuss VAR which most people might consider to be ARIMA's predecessor as it aims to fix univariate time series forecasting issue.

To understand this simple technique (model) let us look at its mathematics.

Mathematics Behind the Vector Autoregression (VAR) model

The main difference between other autoregressive models (AR, ARMA, and ARIMA) and the VAR model is that former models are unidirectional (the predictors variable influences the target variable not vice versa) but VAR is bidirectional.

Mathematically, a VAR(p) model with 'p' lags can be represented as.

Where:

c = The constant term (intercept) of the model
= Coefficient of lags Y till order p.
= The value of the timeseries at time t
= The error term at time t.

A K-dimensional VAR model of order P, denoted as VAR(p), considering k=2, the equation will be as follows.

For the VAR model, we have multiple time series variables that influence each other; It is modeled as a system of equations with one equation per time series variable. Below is the formula in matrix form.

The final VAR equation becomes.

To ensure the validity and trustworthiness of the results obtained from the VAR model, various assumptions and requirements must be met.

Assumptions Underlying the VAR Model

Linearity
As we have seen from its formula, VAR is a linear model at its core, so all the variables deployed in this model must be linear (i.e., expressed as weighted sums of lagged values).
Stationarity
All variables deployed in this model must be stationary, i.e., the mean, variance, and covariance of each feature of the time series must be constant over time. We have to convert all non-stationary features to stationary if we have any in the dataset.
No perfect multicollinearity between features
For VAR to work effectively, no explanatory variable can be an exact linear combination of others. This matters because it helps in preventing singular matrices in OLS estimation (i.e., must be invertible). We have to drop redundant features or use a regularization technique to tackle this issue.
No autocorrelation in residuals
It assumes the residuals are not serially correlated, they should be white noise. Autocorrelation biases standard errors and invalidates statistical tests.
Sufficient observations
VAR assumes that it has received sufficient data for parameter estimation. So, we need to feed this model with as much information as possible for maximum efficiency.

Now let's see how you can implement this model in the Python programming language.

Implementing the VAR Model on OHLC values in Python

Start by installing all Python dependencies, requirements.txt file can be found in the attachments section.

pip install -r requirements.txt

Imports.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# Suppress all warnings
warnings.filterwarnings("ignore")

sns.set_style("darkgrid")

We start by importing Open, High, Low, and Close OHLC values from MetaTrader 5.

symbol = "EURUSD"
timeframe = mt5.TIMEFRAME_D1

if not mt5.symbol_select(symbol, True):
    print("Failed to select and add a symbol to the MarketWatch, Error = ",mt5.last_error)
    quit()
    
rates = mt5.copy_rates_from_pos(symbol, timeframe, 1, 10000)
df = pd.DataFrame(rates) # convert rates into a pandas dataframe

df

Outputs.

	time	open	high	low	close	tick_volume	spread	real_volume
0	611280000	1.00780	1.01050	1.00630	1.00760	821	50	0
1	611366400	0.99620	1.00580	0.99100	0.99600	2941	50	0
2	611452800	0.99180	0.99440	0.98760	0.99190	1351	50	0
3	611539200	0.99330	0.99370	0.99310	0.99310	101	50	0
4	611798400	0.97360	0.97360	0.97320	0.97360	81	50	0
...	...	...	...	...	...	...	...	...
9995	1748390400	1.13239	1.13453	1.12838	1.12910	153191	0	0
9996	1748476800	1.12918	1.13849	1.12105	1.13659	191948	0	0
9997	1748563200	1.13630	1.13901	1.13127	1.13470	186924	0	0
9998	1748822400	1.13435	1.14500	1.13412	1.14436	168697	0	0
9999	1748908800	1.14385	1.14549	1.13642	1.13708	147424	0	0

We obtained 10000 bars from the daily timeframe, which we can consider a lot because, according to the models' assumption, data must be sufficient.

Since we want to use this model on OHLC values, let's drop other columns.

ohlc_df = df.drop(columns=[
    "time",
    "tick_volume",
    "spread",
    "real_volume"
])

ohlc_df

I chose to use OHLC values only because I believe there is a strong relationship between these values that the model could help us spot, not to mention these four variables are the fundamental features we can extract from financial instruments.

Since this model assumes stationarity in its features, we can tell that OHLC values aren't stationary so, let us make them stationary by differentiating each value with from its previous value(s) once.

stationary_df = pd.DataFrame()

for col in df.columns:
    stationary_df["Diff_"+col] = df[col].diff()

stationary_df.dropna(inplace=True)
stationary_df

Outputs.

Diff_Open	Diff_High	Diff_Low	Diff_Close
1	0.00080	0.00180	-0.01670	-0.00950
2	-0.00960	-0.00840	-0.01370	-0.01880
3	-0.01870	-0.01930	-0.00350	-0.00190
4	-0.00180	-0.00210	-0.00590	-0.00870
5	-0.00890	-0.00310	-0.01300	-0.01200
...	...	...	...	...

Optionally, we can check for stationarity if unsure of the newly obtained variables.

from statsmodels.tsa.stattools import adfuller

for col in stationary_df.columns:
    
    result = adfuller(stationary_df[col])
    print(f'{col} p-value: {result[1]}')

Outputs.

Diff_Open p-value: 0.0
Diff_High p-value: 1.0471939301334604e-28
Diff_Low p-value: 1.1015540451195308e-23
Diff_Close p-value: 0.0

The P-value must be less than 0.05 (<0.05) for the data to be considered stationary. As we can see the p-value is smaller than 0.05 so, our data is good for now.

Again, according to VAR assumptions, perfect multicollinearity must not exist between the features so, let's ensure that.

stationary_df.corr()

Outputs.

	Diff_Open	Diff_High	Diff_Low	Diff_Close
Diff_Open	1.000000	0.565829	0.563516	0.036347
Diff_High	0.565829	1.000000	0.452775	0.564026
Diff_Low	0.563516	0.452775	1.000000	0.557139
Diff_Close	0.036347	0.564026	0.557139	1.000000

The correlation matrix among the features seems fine, we can even check for the mean absolute correlation coefficient of the whole matrix, ensuring |p| < 0.8.

print("Mean absolute |p|:", np.abs(np.corrcoef(stationary_df, rowvar=False).mean()))

Outputs.

Mean absolute |p|: 0.5924538886295351

Selecting the best number of lags

We've seen in the formula that the VAR model uses the past information (lags) to forecast the future, we have to know which number of lags to use yields the best outcome. Luckily, VAR offered by stats models offers the function to help us determine this value according to a couple of criteria:

Let's calculate lags for 30 days (since data was obtained from a daily timeframe), and observe the information criteria.

# Select optimal lag using AIC
lag_order = model.select_order(maxlags=30)

print(lag_order.summary())

Outputs.

VAR Order Selection (* highlights the minimums)  
==================================================
       AIC         BIC         FPE         HQIC   
--------------------------------------------------
0       -41.87      -41.87   6.537e-19      -41.87
1       -45.15      -45.14   2.457e-20      -45.15
2       -45.63      -45.60   1.530e-20      -45.62
3       -45.85      -45.81   1.225e-20      -45.84
4       -45.99      -45.94   1.065e-20      -45.97
5       -46.18      -46.12   8.805e-21      -46.16
6       -46.24      -46.17   8.256e-21      -46.22
7       -46.28      -46.20   7.951e-21      -46.25
8       -46.31      -46.22   7.708e-21      -46.28
9       -46.34      -46.24   7.471e-21      -46.31
10      -46.36      -46.24   7.368e-21      -46.32
11      -46.41      -46.28   6.979e-21      -46.37
12      -46.42      -46.28   6.890e-21      -46.38
13      -46.44      -46.28   6.806e-21      -46.38
14      -46.45      -46.28   6.730e-21      -46.39
15      -46.45      -46.28   6.697e-21      -46.39
16      -46.46      -46.28   6.628e-21      -46.40
17      -46.49     -46.29*   6.460e-21      -46.42
18      -46.50      -46.28   6.419e-21      -46.42
19      -46.50      -46.28   6.383e-21      -46.43
20      -46.50      -46.27   6.358e-21      -46.43
21      -46.51      -46.27   6.306e-21      -46.43
22      -46.52      -46.26   6.292e-21      -46.43
23      -46.53      -46.26   6.216e-21      -46.44
24      -46.53      -46.25   6.185e-21      -46.44
25      -46.54      -46.24   6.162e-21      -46.44
26      -46.54      -46.24   6.113e-21      -46.44
27      -46.55      -46.23   6.092e-21      -46.44
28      -46.55      -46.22   6.086e-21      -46.44
29     -46.56*      -46.22  6.031e-21*     -46.44*
30      -46.56      -46.21   6.033e-21      -46.44
--------------------------------------------------

Each row shows values for different lag orders, a value marked with an asterisk is the minimum value for that criterion, indicating the "best" lag order according to that criterion.

So, according to this lag order summary.

For AIC, the best model is at lag 29 (value -46.56)
For BIC, the best model is at lag 17 (value -46.29)
For FPE, the best model is at lag 29 (value -6.031e-21)
For HQIC, the best model is at lag 29 (value -46.44)

Most people use AIC and BIC information to select a model. Simply put, AIC tends to select more complex models (higher lags) while BIC penalizes complexity more strongly, often selecting simpler models.

HQIC is the middle ground between AIC and BIC while FPE focuses on prediction error.

For now, let us fit the model with the lag value according to the AIC criteria.

# Fit the model with selected lag
results = model.fit(lag_order.aic)

print(results.summary())

Outputs.

Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Wed, 04, Jun, 2025
Time:                     10:40:37
--------------------------------------------------------------------
No. of Equations:         4.00000    BIC:                   -46.2188
Nobs:                     9970.00    HQIC:                  -46.4425
Log likelihood:           175968.    FPE:                6.03280e-21
AIC:                     -46.5571    Det(Omega_mle):     5.75774e-21
--------------------------------------------------------------------
Results for equation diff_open
=================================================================================
                    coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------------
const                 -0.000002         0.000013           -0.115           0.908
L1.diff_open          -0.959329         0.010918          -87.867           0.000
L1.diff_high           0.009878         0.004957            1.993           0.046
L1.diff_low            0.006869         0.005010            1.371           0.170
L1.diff_close          0.995718         0.004583          217.244           0.000
L2.diff_open          -0.935345         0.015071          -62.062           0.000
L2.diff_high           0.007118         0.006749            1.055           0.292
L2.diff_low            0.022288         0.006819            3.268           0.001
L2.diff_close          0.939861         0.011863           79.226           0.000
L3.diff_open          -0.906595         0.018115          -50.045           0.000
L3.diff_high           0.003072         0.007954            0.386           0.699
L3.diff_low            0.018535         0.008097            2.289           0.022
L3.diff_close          0.910898         0.015703           58.006           0.000
L4.diff_open          -0.898803         0.020501          -43.841           0.000
L4.diff_high           0.003670         0.008912            0.412           0.681
L4.diff_low            0.015668         0.009103            1.721           0.085
L4.diff_close          0.886824         0.018628           47.606           0.000
L5.diff_open          -0.867308         0.022560          -38.445           0.000
L5.diff_high           0.001318         0.009676            0.136           0.892
L5.diff_low           -0.000027         0.009942           -0.003           0.998
L5.diff_close          0.884632         0.020996           42.133           0.000
...
...
...
L29.diff_open         -0.005922         0.004617           -1.283           0.200
L29.diff_high          0.007026         0.004956            1.418           0.156
L29.diff_low           0.004387         0.005005            0.876           0.381
L29.diff_close         0.035169         0.010568            3.328           0.001
=================================================================================

Results for equation diff_high
=================================================================================
                    coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------------
const                  0.000008         0.000048            0.165           0.869
L1.diff_open          -0.010294         0.038697           -0.266           0.790
L1.diff_high          -0.887555         0.017570          -50.515           0.000
L1.diff_low           -0.020634         0.017757           -1.162           0.245
L1.diff_close          0.969305         0.016245           59.667           0.000
L2.diff_open           0.006028         0.053418            0.113           0.910
L2.diff_high          -0.838250         0.023920          -35.043           0.000
L2.diff_low           -0.057396         0.024169           -2.375           0.018
L2.diff_close          0.914246         0.042047           21.744           0.000
L3.diff_open          -0.160354         0.064208           -2.497           0.013
L3.diff_high          -0.807663         0.028191          -28.650           0.000
L3.diff_low           -0.042960         0.028698           -1.497           0.134
L3.diff_close          0.869460         0.055659           15.621           0.000
L4.diff_open          -0.168775         0.072664           -2.323           0.020
L4.diff_high          -0.785399         0.031589          -24.863           0.000
L4.diff_low           -0.054113         0.032265           -1.677           0.094
L4.diff_close          1.013851         0.066026           15.355           0.000
L5.diff_open          -0.146275         0.079959           -1.829           0.067
L5.diff_high          -0.746785         0.034295          -21.775           0.000
L5.diff_low           -0.098885         0.035238           -2.806           0.005
L5.diff_close          1.012989         0.074419           13.612           0.000
...
...
...
L27.diff_open          0.020345         0.053645            0.379           0.705
L27.diff_high         -0.153391         0.028136           -5.452           0.000
L27.diff_low          -0.065690         0.028874           -2.275           0.023
L27.diff_close         0.251005         0.062004            4.048           0.000
L28.diff_open         -0.005863         0.040235           -0.146           0.884
L28.diff_high         -0.087603         0.023901           -3.665           0.000
L28.diff_low           0.008246         0.024229            0.340           0.734
L28.diff_close         0.134924         0.051754            2.607           0.009
L29.diff_open         -0.000480         0.016364           -0.029           0.977
L29.diff_high         -0.051136         0.017564           -2.911           0.004
L29.diff_low           0.035083         0.017741            1.977           0.048
L29.diff_close         0.054123         0.037457            1.445           0.148
=================================================================================

Results for equation diff_low
=================================================================================
                    coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------------
const                  0.000005         0.000047            0.101           0.920
L1.diff_open           0.024212         0.038141            0.635           0.526
L1.diff_high          -0.058570         0.017317           -3.382           0.001
L1.diff_low           -0.904567         0.017501          -51.686           0.000
L1.diff_close          0.976598         0.016012           60.993           0.000
L2.diff_open           0.067049         0.052650            1.274           0.203
L2.diff_high          -0.084679         0.023576           -3.592           0.000
L2.diff_low           -0.866233         0.023822          -36.363           0.000
L2.diff_close          0.937652         0.041442           22.626           0.000
L3.diff_open           0.065284         0.063284            1.032           0.302
L3.diff_high          -0.108128         0.027785           -3.892           0.000
L3.diff_low           -0.791679         0.028285          -27.989           0.000
L3.diff_close          0.844047         0.054858           15.386           0.000
L4.diff_open           0.018366         0.071619            0.256           0.798
L4.diff_high          -0.116216         0.031134           -3.733           0.000
L4.diff_low           -0.747223         0.031801          -23.497           0.000
L4.diff_close          0.816060         0.065076           12.540           0.000
L5.diff_open          -0.040872         0.078809           -0.519           0.604
L5.diff_high          -0.110998         0.033802           -3.284           0.001
L5.diff_low           -0.731241         0.034731          -21.054           0.000
L5.diff_close          0.832344         0.073348           11.348           0.000
...
...
...
L29.diff_open          0.024357         0.016128            1.510           0.131
L29.diff_high          0.026179         0.017312            1.512           0.130
L29.diff_low          -0.072592         0.017486           -4.151           0.000
L29.diff_close         0.051738         0.036919            1.401           0.161
=================================================================================

Results for equation diff_close
=================================================================================
                    coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------------
const                  0.000013         0.000071            0.185           0.853
L1.diff_open           0.037592         0.057827            0.650           0.516
L1.diff_high           0.007085         0.026256            0.270           0.787
L1.diff_low            0.011658         0.026535            0.439           0.660
L1.diff_close         -0.020373         0.024276           -0.839           0.401
L2.diff_open           0.150341         0.079825            1.883           0.060
L2.diff_high          -0.035345         0.035745           -0.989           0.323
L2.diff_low           -0.041114         0.036117           -1.138           0.255
L2.diff_close         -0.012920         0.062832           -0.206           0.837
L3.diff_open          -0.000054         0.095949           -0.001           1.000
L3.diff_high          -0.047439         0.042126           -1.126           0.260
L3.diff_low            0.028500         0.042884            0.665           0.506
L3.diff_close         -0.113979         0.083173           -1.370           0.171
L4.diff_open          -0.083562         0.108585           -0.770           0.442
L4.diff_high          -0.083193         0.047204           -1.762           0.078
L4.diff_low            0.055907         0.048215            1.160           0.246
L4.diff_close          0.026375         0.098665            0.267           0.789
L5.diff_open          -0.148622         0.119487           -1.244           0.214
L5.diff_high          -0.065192         0.051248           -1.272           0.203
L5.diff_low            0.011819         0.052658            0.224           0.822
L5.diff_close          0.125327         0.111207            1.127           0.260
...
...
...
L29.diff_open          0.002852         0.024453            0.117           0.907
L29.diff_high         -0.011652         0.026247           -0.444           0.657
L29.diff_low          -0.004191         0.026511           -0.158           0.874
L29.diff_close         0.070689         0.055974            1.263           0.207
=================================================================================

Correlation matrix of residuals
              diff_open  diff_high  diff_low  diff_close
diff_open      1.000000   0.223818  0.241416    0.126479
diff_high      0.223818   1.000000  0.452061    0.770309
diff_low       0.241416   0.452061  1.000000    0.765777
diff_close     0.126479   0.770309  0.765777    1.000000

The VAR model, similarly to other statistical/traditional time series models, they provide a detailed summary of the model performance and its features. This summary helps us understand the model in detail, let us briefly analyze the above model's summary.

Regression results

--------------------------------------------------------------------
No. of Equations:         4.00000    BIC:                   -46.2188
Nobs:                     9970.00    HQIC:                  -46.4425
Log likelihood:           175968.    FPE:                6.03280e-21
AIC:                     -46.5571    Det(Omega_mle):     5.75774e-21

No. of Equations: 4, means the system (model) contains 4 endogenous variables; diff_open, diff_high, diff_low, diff_close.
Nobs (Number of observations used): Since we chose the AIC criteria, which uses 29 lags, 29+1 features weren't included in the training (estimation) process, as those values before were used as initial lags.
AIC, BIC, HQIC, and FPE: All these values are negative (normal,) which is a good indication of a better fit.
Log Likelihood: A high positive value indicates a good model fit.

Each equation results

For each variable (diff_open, diff_high, diff_low, and diff_close), you see.

Coefficients

These represent the impact of each lagged variable (L1, L2, etc.) on the current value. The closer to 1 this value is the more positive impact a variable has on the current equation variable and vice versa.
For example.

Results for equation diff_open
=================================================================================
                    coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------------
const                 -0.000002         0.000013           -0.115           0.908
L1.diff_open          -0.959329         0.010918          -87.867           0.000

The coefficient of -0.959329 here means that A 1-unit increase in yesterday's (lag-1) diff_open value is associated with a decrease of 0.959329 units in today's diff_open value, holding all other variables constant.

Std. Error

These represent the precision of coefficient estimates.
t-stat

This stands for statistical significance. The greater the absolute value |t-stat| of this metric is, the more significant the variable is. A large absolute value (e.g., |t|>2) indicates statistical significance.

The value of |(-87.867)| = +87.876 is large indicating that the effect of the variable diff_open at lag-1 is highly significant (not due to random chance).
prob

This represents the p-value associated with each coefficient's t-statistic. It tells you whether a specific lagged variable has a significant effect on the current value of the dependent variable.

When the prob value is less than or equal to 0.05, a variable is statistically significant.

Residual correlation matrix

Correlation matrix of residuals
              diff_open  diff_high  diff_low  diff_close
diff_open      1.000000   0.223818  0.241416    0.126479
diff_high      0.223818   1.000000  0.452061    0.770309
diff_low       0.241416   0.452061  1.000000    0.765777
diff_close     0.126479   0.770309  0.765777    1.000000

This shows correlations between prediction errors across equations.

A high correlation between diff_high/diff_close (0.77) and diff_low/diff_close (approximately: 0.766) suggests common unexplained factors affect these pairs.

Out-of-Sample Forecasting using VAR

Similarly to the ARIMA model discussed in the previous article, forecasting out-of-sample data using VAR is quite tricky. Unlike machine learning models, these traditional models have to be updated regularly with new information.

Let's make a function for the task.

def forecast_next(model_res, symbol, timeframe):
    forecast = None
        
    # Get required lags for prediction
    rates = mt5.copy_rates_from_pos(symbol, timeframe, 0, model_res.k_ar+1) # Get rates starting at the current bar to bars=lags used during training
    if rates is None or len(rates) < model_res.k_ar+1:
        print("Failed to get copy rates Error =", mt5.last_error())
        return forecast, None
        
    # Prepare input data and make forecast
    input_data = pd.DataFrame(rates)[["open", "high", "low", "close"]].values
    stationary_input = np.diff(input_data, axis=0)[-model_res.k_ar:] # get the recent values equal to the number of lags used by the model
    
    try:
        forecast = model_res.forecast(stationary_input, steps=1) # predict the next price
    except Exception as e:
        print("Failed to forecast: ", str(e))
        return forecast, None
        
    try:
        updated_data = np.vstack([model_res.endog, stationary_input[-1]]) # concatenate new/last datapoint to the data used during previous training
        updated_model = VAR(updated_data).fit(maxlags=model_res.k_ar) # Retrain the model with new data
    except Exception as e:
        print("Failed to update the model: ", str(e))
        return forecast, None
        
    return forecast, updated_model

To get a prediction we have to equip the model with the initial trained model and re-update the new model after each prediction by reassigning a model variable to itself.

res_model = results # Initial model

forecast, res_model = forecast_next(model_res=res_model, symbol=symbol, timeframe=timeframe)

forecast_df = pd.DataFrame(forecast, columns=stationary_df.columns)
print("next forecasted:\n", forecast_df)

Outputs.

next forecasted:
    diff_open  diff_high  diff_low  diff_close
0    0.00435   0.003135  0.001032   -0.000655

We can simplify the training and prediction process by wrapping all these into a class.

File VAR.py

import pandas as pd
import numpy as np
import MetaTrader5 as mt5
from statsmodels.tsa.api import VAR

class VARForecaster:
    def __init__(self, symbol: str, timeframe: int):
        self.symbol = symbol
        self.timeframe = timeframe
        self.model = None
        
    def train(self, start_bar: int=1, total_bars: int=10000, max_lags: int=30):
        
        """Trains the VAR model using the collected OHLC from given bars from MetaTrader5
        
            start_bar:
                int: The recent bar according to copyrates_from_pos 
            total_bars:
                int: Total number of bars to use for training
            max_lags:
                int: The maximum number of lags to use 
        """
        
        self.max_lags = max_lags
        
        if not mt5.symbol_select(self.symbol, True):
            print("Failed to select and add a symbol to the MarketWatch, Error = ",mt5.last_error())
            quit()
            
        rates = mt5.copy_rates_from_pos(self.symbol, self.timeframe, start_bar, total_bars)
        
        if rates is None:
            print("Failed to get copy rates Error =", mt5.last_error())
            return
        
        if total_bars < max_lags:
            print(f"Failed to train, max_lags: {max_lags} must be > total_bars: {total_bars}")
            return
        
        train_df  = pd.DataFrame(rates) # convert rates into a pandas dataframe

        train_df = train_df[["open", "high", "low", "close"]]
        stationary_df = np.diff(train_df, axis=0) # Convert OHLC values into stationary ones by differenciating them
        
        self.model = VAR(stationary_df)
    
        # Select optimal lag using AIC
        
        lag_order = self.model.select_order(maxlags=self.max_lags)
        print(lag_order.summary())
        
        # Fit the model with selected lag
        
        self.model_results = self.model.fit(lag_order.aic)
        print(self.model_results.summary())
        
    def forecast_next(self):
        
        """Gets recent OHLC from MetaTrader5 and predicts the next differentiated prices

        Returns:
            np.array: predicted values
        """
        
        forecast = None
            
        # Get required lags for prediction
        rates = mt5.copy_rates_from_pos(self.symbol, self.timeframe, 0, self.model_results.k_ar+1) # Get rates starting at the current bar to bars=lags used during training
        
        if rates is None or len(rates) < self.model_results.k_ar+1:
            print("Failed to get copy rates Error =", mt5.last_error())
            return forecast
            
        # Prepare input data and make forecast
        input_data = pd.DataFrame(rates)[["open", "high", "low", "close"]]
        stationary_input = np.diff(input_data, axis=0)[-self.model_results.k_ar:] # get the recent values equal to the number of lags used by the model
        
        try:
            forecast = self.model_results.forecast(stationary_input, steps=1) # predict the next price
        except Exception as e:
            print("Failed to forecast: ", str(e))
            return forecast
            
        try:
            updated_data = np.vstack([self.model_results.endog, stationary_input[-1]]) # concatenate new/last datapoint to the data used during previous training
            updated_model = VAR(updated_data).fit(maxlags=self.model_results.k_ar) # Retrain the model with new data
        except Exception as e:
            print("Failed to update the model: ", str(e))
            return forecast
        
        self.model = updated_model
            
        return forecast

Let's wrap this up in a Python-based trading robot.

Making the VAR-Based Trading Robot

Given the above class, which can aid us in training and making forecasts on the next value, let us incorporate the predicted outcomes into a trading strategy.

Firstly, in the previous example, we used stationary values produced by differencing the current values from the previous one to get stationary values. At the same time, the approach works, it is not very practical when it comes to building a trading strategy.

Instead, let's find the difference between open and high value, to get how much the price moves upward from the opening price and the difference between open and low value, to get how much the price moves downward from the opening price.

By getting these two values, one for tracking the upward movement of the candle and the other for tracking the downward movement of the candle we can use the predicted outcomes for setting stop loss and take profit values.

Let us change the features used in our model.

# Prepare input data and make forecast
input_data = pd.DataFrame(rates)[["open", "high", "low", "close"]]
        
stationary_input = pd.DataFrame({
        "high_open": input_data["high"] - input_data["open"],
        "open_low": input_data["open"] - input_data["low"]
})

The resulting features obtained by differentiating are most likely a stationary variable (no need to check for now).

Inside the main robot file, let's schedule the training process and print the forecasted values.

Filename: VAR-TradingRobot.py

import MetaTrader5 as mt5
import schedule
import time
from VAR import VARForecaster

symbol = "EURUSD"
timeframe = mt5.TIMEFRAME_D1
mt5_path = r"c:\Users\Omega Joctan\AppData\Roaming\Pepperstone MetaTrader 5\terminal64.exe" # replace this with a desired MT5 path

if not mt5.initialize(mt5_path): # initialize MetaTrader5
    print("Failed to initialize MetaTrader5, error =", mt5.last_error())
    quit()

var_model = VARForecaster(symbol=symbol, timeframe=timeframe)
var_model.train(start_bar=1, total_bars=10000, max_lags=30) # Train the VAR Model

def get_next_forecast():
    
    print(var_model.forecast_next())
    
schedule.every(1).minutes.do(get_next_forecast)

while True:
    
    schedule.run_pending()
    time.sleep(60)
    
else:
    mt5.shutdown()

Outputs.

[[0.00464001 0.00439884]]

Now that we have these two separate forecasted outcomes for high_open and open_low, let us create a simple trading strategy based on a simple moving average.

Filename: VAR-TradingRobot.py

import MetaTrader5 as mt5
import schedule
import time

import ta
from VAR import VARForecaster
from Trade.Trade import CTrade
from Trade.SymbolInfo import CSymbolInfo
from Trade.PositionInfo import CPositionInfo
import numpy as np
import pandas as pd

symbol = "EURUSD"
timeframe = mt5.TIMEFRAME_D1
mt5_path = r"c:\Users\Omega Joctan\AppData\Roaming\Pepperstone MetaTrader 5\terminal64.exe" # replace this with a desired MT5 path

if not mt5.initialize(mt5_path): # initialize MetaTrader5
    print("Failed to initialize MetaTrader5, error =", mt5.last_error())
    quit()

var_model = VARForecaster(symbol=symbol, timeframe=timeframe)
var_model.train(start_bar=1, total_bars=10000, max_lags=30) # Train the VAR Model

# Initlalize the trade classes

MAGICNUMBER = 5062025
SLIPPAGE = 100

m_trade = CTrade(magic_number=MAGICNUMBER, 
                 filling_type_symbol=symbol, 
                 deviation_points=SLIPPAGE)

m_symbol = CSymbolInfo(symbol=symbol)
m_position = CPositionInfo()

#####################################################

def pos_exists(pos_type: int, magic: int, symbol: str) -> bool: 
    
    """Checks whether a position exists given a magic number, symbol, and the position type

    Returns:
        bool: True if a position is found otherwise False
    """
    
    if mt5.positions_total() < 1: # no positions whatsoever
        return False
    
    positions = mt5.positions_get()
    
    for position in positions:
        if m_position.select_position(position):
            if m_position.magic() == magic and m_position.symbol() == symbol and m_position.position_type()==pos_type:
                return True
            
    return False

def trading_strategy():
    
    forecasts_arr = var_model.forecast_next().flatten()
    
    high_open = forecasts_arr[0]
    open_low = forecasts_arr[1]
    
    print(f"high_open: ",high_open, " open_low: ",open_low)
    
    # Get the information about the market
    
    rates = mt5.copy_rates_from_pos(symbol, timeframe, 0, 50) # Get the last 50 bars information
    rates_df = pd.DataFrame(rates)
    
    if rates is None:
        print("Failed to get copy rates Error =", mt5.last_error())
        return 
    
    sma_buffer = ta.trend.sma_indicator(close=rates_df["close"], window=20)
    
    m_symbol.refresh_rates()
    
    if rates_df["close"].iloc[-1] > sma_buffer.iloc[-1]: # current closing price is above sma20
        if pos_exists(pos_type=mt5.POSITION_TYPE_BUY, symbol=symbol, magic=MAGICNUMBER) is False: # If a buy position doesn't exist
            m_trade.buy(volume=m_symbol.lots_min(),
                        symbol=symbol,
                        price=m_symbol.ask(),
                        sl=m_symbol.ask()-open_low,
                        tp=m_symbol.ask()+high_open)

    else: # if the closing price is below the moving average
        
        if pos_exists(pos_type=mt5.POSITION_TYPE_SELL, symbol=symbol, magic=MAGICNUMBER) is False: # If a buy position doesn't exist
            m_trade.sell(volume=m_symbol.lots_min(),
                        symbol=symbol,
                        price=m_symbol.bid(),
                        sl=m_symbol.bid()+high_open,
                        tp=m_symbol.bid()-open_low)
            
    
schedule.every(1).minutes.do(trading_strategy)

while True:
    
    schedule.run_pending()
    time.sleep(60)
    
else:
    mt5.shutdown()

Using the trade classes discussed in this article, we check if a position of the same kind exists; if it doesn't, we open a position of the same type. The predicted values high_open and open_low are used for setting the take profit and stop loss of a trade respectively for a buy trade and vice versa for a sell trade.

A simple moving average indicator of the period (window) = 20 is used as a confirmation signal. If the current close price is above the moving average indicator, we open a buy trade; otherwise, we do the opposite for a sell trade.

Outputs.

Final Thoughts

Vector Auto-Regression is a decent classical time series model with the ability to forecast multiple regression features, an ability that most machine learning models don't have.

These models offer a couple of advantages such as:

A flexible lag structure which allows different lag lengths for different variables,
They capture interdependencies (dynamic relationships between variables),
They have no strict exogeneity assumption, often present in traditional regression models.

Some of their drawbacks include:

Their sensitivity to stationary variables, as they only work best in stationary data.
They assume a linear relationship between a variable and its lags, something not always feasible in the financial markets.
They can also suffer from overfitting when given many variables and lags.

This article was aimed at raising awareness about this model, its composition, and how it can be applied to trading data, as I found it less documented online on this particular subject. Please don't hesitate to improve the idea to suit your needs.

Best regards.

Stay tuned and contribute to machine learning algorithms development for the MQL5 language in this GitHub repository.

Attachments Table

Filename	Descrition & Usage
Trade/*	MQL5-like trade classes in Python language.
error_description.py	Contains MetaTrader 5 error codes descriptions.
forex-ts-forecasting-using-var.ipynb	A Jupyter notebook containing examples for learning purposes.
VAR.py	Contains the class that utilizes the VAR model for training and making forecasts.
VAR-TradingRobot.py	A trading robot that opens buy and sell trades based on predictions made by the VAR model.

Attached files |

Download ZIP

Attachments.zip (54.26 KB)

Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.

This article was written by a user of the site and reflects their personal views. MetaQuotes Ltd is not responsible for the accuracy of the information presented, nor for any consequences resulting from the use of the solutions, strategies or recommendations described.

What is Vector Autoregression (VAR)?

Mathematics Behind the Vector Autoregression (VAR) model

Assumptions Underlying the VAR Model

Implementing the VAR Model on OHLC values in Python

Out-of-Sample Forecasting using VAR

Making the VAR-Based Trading Robot

Final Thoughts

Other articles by this author