Statistical Arbitrage Through Cointegrated Stocks (Part 7): Scoring System 2

Jocimar Lopes

Introduction

In the previous article, we introduced a suggested scoring system for baskets of stocks. We left it with the results of a backtest that showed relatively good numbers, but also revealed some flaws in our basket:

  • Very long position holding times on average
  • At least one position lasted more than sixteen weeks! This was the maximum position holding time.

The main goal of that backtest was not to assess the profitability of the strategy or basket, as we mentioned there, but to test how our scoring system would perform while lacking two of its suggested eliminatory criteria: the stability of the portfolio weights and the time to mean reversion, or half-life. That is, at the point where we left the last article, our scoring system was using only two of the four suggested eliminatory criteria. We simply built several baskets with the most liquid securities from the semiconductor industry and chose the most cointegrated one. We did not evaluate whether the portfolio weights indicated by the Johansen test were stable enough, and we did not evaluate the time to mean reversion. As a reminder, the four suggested eliminatory criteria are:

  • Liquidity 
  • Strength of the cointegration vectors 
  • Stability of the cointegration vector (portfolio weights) 
  • Time to mean reversion (half-time) 

In this article, we will fill this gap by covering the remaining two: stability of the portfolio weights and time to mean reversion. We will continue from that point and use the same selected baskets to check how the system can benefit from the inclusion of these two ranking factors.

For the readers who may not be following this series, we are developing a statistical arbitrage framework for the average retail trader with a consumer notebook, limited funds, and regular internet bandwidth. The project started as an informal chat with friends and resulted in a challenge that my partner and I accepted as what it is: an opportunity to learn and to improve our skills as traders. The conversation was motivated by the passing of the mathematician and hedge fund manager Jim Simons, who achieved a record of 30 years of consecutive gains with his legendary Medallion Fund, with a "66.1% average gross annual return or a 39.1% average net annual return between 1988 and 2018", using statistical arbitrage and, in his own words, “some kind of machine learning”.

Until now, we have seen how to apply and interpret the most common correlation, cointegration, and stationarity tests for pairs-trading and for portfolio groups (baskets). We implemented several Python scripts for analysis and two sample Expert Advisors (one for pairs-trading, the other for baskets), and ran some backtests with them. We also set up, and have been evolving, the required database schema to support our experiments.

If the statistical arbitrage approach is of interest to you, please check the previous articles of this series and take some time to play with them. You will see that the conversation is very trader-friendly, as we have been standing “on the shoulders of giants”: professional mathematicians and statisticians already did the hard work for us, and we benefit from it by keeping the focus on the trading side - instead of the hard math - and making extensive use of ready-made open-source libraries. Now that we are near the end of the basic part of this series, it’s a good time to review the fundamentals.

That said, let's finish our scoring system. We’ll start by modifying our coint_rank table to accommodate the missing data, the two missing ranking factors.


Fig. 1. Diagram showing the updated coint_rank table fields and datatypes



Time to mean reversion

The backtest we ran to test our (half-implemented) scoring system produced some weird numbers with respect to the average position holding times: almost four and a half weeks, on average! And at least one position remained open for more than sixteen weeks. This is not the expected risk exposure for our strategy.


Fig. 2. Screenshot of the previous article backtest report showing position holding times

Supposing that everything is fine with the EA and the backtest environment, and since ours is a mean-reversion-based strategy, the most obvious factor to explain these weird numbers is the time to mean reversion. Our spread might be taking much longer than expected to return to the mean, which is the trigger for closing all the hedged positions.

Do you remember how we chose the basket to trade? We took the “most” cointegrated one, that is, the basket with the highest cointegration strength: a four-symbol basket built from the ten most liquid symbols in the semiconductor industry. In other words, we scored by liquidity and cointegration strength… and nothing more. But once a cointegrated portfolio is identified, the next logical step is assessing how quickly deviations of the spread return to the mean. This metric was given the fancy name of half-life of mean reversion. The half-life of mean reversion quantifies the expected time for the spread (the residual from the cointegrating relationship) to revert halfway back to its mean after a deviation.

In stat arb, where profits stem from betting on temporary mispricings, quantifying the half-life is crucial for estimating position holding times, and by consequence, for assessing the overall strategy's viability. Later, when dealing with money/risk management, the half-life of mean reversion will be useful to determine the volume of our positions, the size of our trades, since shorter half-lives support larger bets due to lower uncertainty.

A short half-life (minutes to hours, in our case) indicates rapid reversion and allows quick position unwinds: we reduce our risk exposure, and our capital is freed to be invested in new opportunities. Conversely, a long half-life (days to weeks) ties up our capital, increases holding costs (like swaps in Forex and borrow fees for shorting stocks), and increases our exposure to news events or liquidity dry-ups. Without quantifying the half-life of mean reversion, we risk not only staying in the market longer than expected, but also exiting prematurely.

If you keep only a single bit of information from this introduction, engrave the following in stone: MEAN REVERSION IS NOT GUARANTEED.

Your position may remain open forever. Prolonged deviations can trigger stop losses or margin calls. Half-lives exceeding your risk tolerance (e.g., >30 days in high-frequency setups) should be filtered out. 

How to Calculate Half-Life

The math behind the calculation of the half-life of mean reversion is hairy for the uninitiated. The half-life is derived from modeling the spread as an Ornstein–Uhlenbeck process, a stochastic differential equation that describes mean-reverting behavior. In discrete time, which is our case, this approximates an AR(1) model, a first-order autoregressive model.

“The Ornstein–Uhlenbeck process is used in the Vasicek model of the interest rate. The Ornstein–Uhlenbeck process is one of several approaches used to model (with modifications) interest rates, currency exchange rates, and commodity prices stochastically. (...) One application of the process is a trading strategy known as pairs trade.” (Wikipedia page about Ornstein-Uhlenbeck process in financial mathematics)

I’m quoting the above as a reference for the math-inclined reader. Fortunately, as said above, we do not need to master the math behind this calculation because it has a closed formula, and we can count on our trusty statsmodels Python library for the AR(1) model fitting. All we need to know are the steps of the computation. (Your AI assistant can be of invaluable help here, since this is a very well-known calculation and statsmodels is a very well-known statistical library.)
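In compact form, this is the regression the code below fits, together with the usual half-life approximation derived from it (β is the OLS slope of the one-bar spread change on the lagged spread level):

Δs(t) = α + β·s(t−1) + ε(t),        half-life = −ln(2) / β,    for β < 0

A non-negative β means the spread shows no pull back toward its mean, which is why that case is treated as an infinite half-life.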

The attached Python file does it in the compute_half_life() method.

if len(spread) < 3:
    return np.nan
lag = spread.shift(1).dropna()
delta = (spread - lag).dropna()
if len(delta) < 2:
    return np.nan
# Regress the one-bar change on the lagged spread: delta = alpha + beta * lag
res = sm.OLS(delta, sm.add_constant(lag, has_constant='add')).fit()
beta = res.params.iloc[-1]  # slope (positional access avoids naming issues)
if beta >= 0:
    return np.inf  # no mean reversion
return -np.log(2) / beta

The spread must be validated for stationarity before being passed here, which we already did with an ADF test earlier in our pipeline; the data we are passing is already stationary. Also, we need to run this on out-of-sample data to avoid overfitting and confirm the predictive power, which we will be doing next. As an initial threshold, we will look for half-lives in the four-to-twenty-hour range (1-5 bars) for our strategy (remember we are swing trading on the H4 timeframe). You should adapt this range to your own needs; for example, you might look for a 5-20-day range on a daily timeframe. But again: change, backtest, change again, backtest, and eventually optimize. If the half-life is infinite (non-reverting), discard the pair or basket.
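As a minimal sketch of this eliminatory filter (the band limits and the `ranker` instance are our assumptions here, not fixed parts of the attached script):

import numpy as np

def passes_half_life_filter(half_life: float,
                            min_bars: float = 1.0,
                            max_bars: float = 5.0) -> bool:
    """True if the half-life is finite and inside our H4 strategy band
    (1-5 bars = 4-20 hours). Adapt the band to your own timeframe."""
    return np.isfinite(half_life) and min_bars <= half_life <= max_bars

# Hypothetical usage with the attached class:
# hl = ranker.compute_half_life(spread)
# keep_basket = passes_half_life_filter(hl)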

Interpretation

The half-life is expressed in the number of periods of the timeframe under test. In our example strategy, a period is four hours, since we are testing on the H4 timeframe. So, a half-life of 10 means it takes approximately 10 bars, or 40 hours, for the spread to reduce its deviation from the mean by 50%.

For example, in Figure 3, you can see the coint_rank table without any filters (except for half_life IS NOT NULL). The first two lines show the records for the same basket, timeframe, and lookback, taken at two different moments (timestamps). The measured half-life is ~6.98, meaning that this basket, when tested for cointegration by the Johansen method in the H4 timeframe with a lookback of 180 days, takes nearly 7 bars of 4 hours each, or about 28 hours, for a deviation of the spread to shrink by half. Remember that the half-life measures the time to close half of the deviation, not the full round trip back to the mean.


Fig. 3. Metaeditor database interface showing the coint_rank table with NOT NULL half_life field

In Figure 4, you can see that, in the same table, we have an example of a very long half-life; in this case, the maximum among our recorded tests. The basket in question, when tested with a lookback of 120 days, takes nearly 934 bars of 4 hours each, or nearly 155 days, to return to the mean. Obviously, you will not want to trade this basket, because it would require holding positions far longer, increasing your exposure to market risks.


Fig. 4. Metaeditor database interface showing the coint_rank table with MAX half_life field and not infinity

Note the use of the NOT IN clause to exclude ‘Infinity’ values in SQLite. Without this clause, the MAX half-life returned would be ‘Infinity’. In this case, we can assume that the spread might never return to the mean; which is to say, our positions could remain open indefinitely, unless we have some other closing/stop-loss criterion, like a position holding time threshold, for example (close by time).
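For reference, a hedged Python sketch of the same query (the database filename is hypothetical; the table and the NOT IN clause mirror the article's schema):

import sqlite3

# Fetch the maximum finite half-life, excluding the 'Infinity' rows as in Fig. 4.
with sqlite3.connect("statarb.db") as conn:  # hypothetical filename
    (max_hl,) = conn.execute(
        "SELECT MAX(half_life) FROM coint_rank "
        "WHERE half_life NOT IN ('Infinity')"
    ).fetchone()
    print(f"Max finite half-life: {max_hl} bars")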


Fig. 5. Metaeditor database interface showing the coint_rank table with MAX half_life without infinity filter

An infinite half-life indicates no mean reversion (the spread is non-stationary or random-walk-like). It is stored as `np.inf` by our code and suggests the basket is not suitable for mean-reversion trading in the H4 timeframe with a lookback of 180 days.

    def compute_half_life(self, spread: pd.Series) -> float:
        """
        Compute the half-life of mean reversion for a given spread series.
        
        Parameters
        ----------
        spread : pd.Series
            The stationary spread (residual) series from the cointegration vector.
        
        Returns
        -------
        float
            The half-life in periods (e.g., bars). Returns np.inf if no mean reversion (beta >= 0), np.nan if insufficient data.
        """
        if len(spread) < 3:
            return np.nan
        lag = spread.shift(1).dropna()
        delta = (spread - lag).dropna()
        if len(delta) < 2:
            return np.nan
        res = sm.OLS(delta, sm.add_constant(lag, has_constant='add')).fit()
        # Slope of delta on the lagged spread (positional access avoids naming issues)
        beta = res.params.iloc[-1]
        if beta >= 0:
            return np.inf
        else:
            return -np.log(2) / beta

By contrast, in the same sample data, we also have a very short half-life: the table minimum of nearly one and a half H4 bars, or approximately 6 hours to mean reversion.


Fig. 6. Metaeditor database interface showing the coint_rank table with the MIN half_life field

This is ideal for statistical arbitrage, as it suggests trading opportunities (entering a trade when the spread deviates and exiting when it reverts) can occur more frequently within a short time frame.


Fig. 7. Metaeditor database interface showing the coint_rank table with NULL half_life fields

Finally, in Figure 7, you can see that we have thousands of NaN (Not a Number) half-life values. They are stored as NULL by SQLite and occur when there is insufficient data or a numerical issue, like a constant spread, indicating the calculation failed.

To sum it up: when it comes to profitability, a shorter half-life is generally better for stat arb strategies, as it implies quicker trades, lower capital lockup, and reduced risk of the cointegration breaking down. But keep in mind that “short” and “long” are relative terms; the half-life must align with our strategic criteria for scoring. In our example, it is short or long relative to the H4 timeframe, which has been our strategic criterion since the beginning of the scoring.

Keep in mind that:

  • A finite half-life assumes the spread is stationary. Check `stability_stat` (ADF statistic) to confirm the spread remains stationary out-of-sample. A less negative `stability_stat` suggests the cointegration vector may not hold in the future.
  • Issues like constant or near-constant price series can lead to `NaN` or `inf` values. This indicates poor data or low liquidity.
  • The half-life is tied to the timeframe. On a different timeframe, the same basket might show a different half-life. Compare half-lives within the same timeframe.

The coint_ranker_auto.py script attached to this article will help you understand the half-life behavior by plotting the spread for the top baskets, so you can visually confirm that the reversion speed matches the computed half-life.
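If you want to reproduce such a plot outside the attached script, a minimal sketch could look like this (assuming `spread` is the basket spread as a pandas Series and `hl` its computed half-life):

import matplotlib.pyplot as plt

mean, std = spread.mean(), spread.std()
ax = spread.plot(figsize=(10, 4),
                 title=f"Basket spread, half-life ~ {hl:.1f} H4 bars")
ax.axhline(mean, linestyle="--", label="mean")
ax.axhline(mean + 2 * std, linestyle=":", label="+2 std dev")
ax.axhline(mean - 2 * std, linestyle=":", label="-2 std dev")
ax.legend()
plt.show()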


Fig. 8. Plot of the cointegrated spread for a basket of four Nasdaq stocks with the mean reversion half-life

In our scoring system, we’ll be using half-life alongside rank_score and stability_stat (see below) to rank the baskets. We will prioritize baskets with high rank_score (strong cointegration), short half_life (fast reversion), and more negative stability_stat (stable vector).


Stability of the portfolio weights

The portfolio weights are the cornerstone of statistical arbitrage through cointegration. They define not only the size of each simultaneously opened hedged position, but also its direction. We talked at length about them when introducing the Johansen cointegration test, and we will not repeat ourselves here, to save your time. It is enough to remember that they are the origin of the mean-reverting spread calculation.

They come from the eigenvectors produced by the Johansen test, the multivariate statistical method we’ve been using to identify cointegration among multiple stock price series (non-stationary time series). The test estimates the number of cointegrating vectors and provides the associated eigenvectors, which represent the linear combinations of assets that form a stationary, tradeable portfolio. 

For instance, in our last backtest, for a basket composed of  "NVDA", "INTC", "AVGO", "ASX", we used {1.0, -1.9598590335874817, -0.36649674991957104, 21.608207065113874} as the portfolio weights.

NVDA   INTC                  AVGO                   ASX
1.0    -1.9598590335874817   -0.36649674991957104   21.608207065113874

Table 1. A stock basket with the respective portfolio weights (the cointegration vector)

This means that for each unit of a long position in NVDA, we would be opening (see the sketch after this list):

  • One short position of ~1.96 units of INTC 
  • One short position of ~0.37 units of AVGO
  • Another long position of ~21.6 units of ASX
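A minimal sketch of how these weights translate into order sides and sizes (assuming one “unit” is one share, and ignoring lot steps, margin, and rounding; when we trade the spread in the opposite direction, all sides flip):

weights = {
    "NVDA": 1.0,
    "INTC": -1.9598590335874817,
    "AVGO": -0.36649674991957104,
    "ASX": 21.608207065113874,
}

base_shares = 10  # hypothetical size of the NVDA leg

# Positive weight = same direction as the base leg (buy); negative = sell.
for symbol, w in weights.items():
    side = "BUY" if w > 0 else "SELL"
    print(f"{side} {abs(w) * base_shares:.2f} shares of {symbol}")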

It is this weighted hedging schema that, ideally, ensures the expected stat arb market neutrality. (Ideally, because market neutrality is a never-fully-achieved goal. There will always remain residual risks, totally out of our scope here.) However, assuming these weights are fixed over time can be risky in dynamic financial markets. While scoring, we must check the stability of these Johansen eigenvectors for at least three reasons:

First, because cointegration implies a long-term equilibrium, but economic regimes shift due to factors like policy changes, market crashes, or sector rotations. If the eigenvectors fluctuate significantly across rolling estimation windows, the underlying relationship may not be stable. Unstable weights could lead to a breakdown in mean reversion, turning a profitable arbitrage strategy into one prone to persistent deviations and losses.

Second, because stable weights ensure the portfolio remains hedged against common factors, like market beta, for example. Instability might amplify our exposure to unintended risks, such as volatility spikes or cointegration breakdowns. For example, during the 2008 financial crisis, many cointegrated pairs in financial stocks decoupled, rendering prior eigenvectors obsolete. Regularly testing stability helps avoid overconfidence in historical estimates and prompts timely strategy adjustments, such as re-estimating weights or exiting positions. This is the main reason why we implemented the “database as a single source of truth” for real-time model updates.

And third, because unstable portfolio weights will almost certainly result in higher drawdowns and lower Sharpe ratios. If instability is detected, we must aggressively demote the basket in our scoring system.

What is `stability_stat`?

As its name suggests, the stability_stat field in the coint_rank table stores the portfolio weights stability statistic. Its value can be calculated by two different methods:

Rolling window eigenvector comparison - This is a direct method in which we compare the cointegration vector across successive rolling windows. For the portfolio weights to be considered stable, we expect the cosine distance between successive vectors to stay within a chosen threshold.
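Below is a sketch of this first method, assuming a `get_coint_vector` callable that takes a price window and returns the cointegration vector, as the attached class does (the window, step, and threshold values are illustrative only):

import numpy as np
import pandas as pd

def rolling_vector_drift(prices: pd.DataFrame, get_coint_vector,
                         window: int = 270, step: int = 30) -> list:
    """Cosine distance between cointegration vectors estimated on successive
    rolling windows. Small distances suggest stable portfolio weights."""
    distances, prev = [], None
    for start in range(0, len(prices) - window + 1, step):
        vec = np.asarray(get_coint_vector(prices.iloc[start:start + window]))
        vec = vec / vec[0]  # normalize the first weight to 1.0, as in the article
        if prev is not None:
            cos = np.dot(prev, vec) / (np.linalg.norm(prev) * np.linalg.norm(vec))
            distances.append(1.0 - cos)
        prev = vec
    return distances

# The weights can be considered stable if, for example, max(distances) stays
# below a chosen threshold such as 0.1 (an illustrative value, not a standard).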

In-sample/out-of-sample ADF validation - This is a four-step process. We start by splitting the price data into in-sample (IS) and out-of-sample (OOS) sets and compute the cointegration vector on the IS data. Then, using this cointegration vector, we calculate the spread on the OOS data. Finally, we evaluate the ADF statistic on this OOS spread. We expect both the IS and OOS ADF statistics to indicate stationarity before considering the portfolio weights stable.

That is:

  1. Split the data into in-sample and out-of-sample
  2. Get the in-sample coint vector, spread, and ADF stats
  3. Use the same coint vector to calculate the spread and ADF stats on out-of-sample

These ADF statistics come from the same Augmented Dickey-Fuller stationarity test we have been using since the start of this series. We have already covered it in detail, and you, the attentive reader, are kindly invited to revisit the article in which we introduced the ADF test.

There is no “better” or universally recommended method; both are useful for different purposes. The rolling window eigenvector comparison is better suited for real-time model updates and portfolio rebalancing when monitoring live trading. The in-sample/out-of-sample ADF validation fits pre-trading scenarios better, like the scoring phase, and for assessing portfolio viability in backtests. So this is the method we are using here, in our scoring system.

(Given the importance of continuously re-evaluating the portfolio weights for account protection, we will eventually propose an article with the results we got when benchmarking these two methods in different scenarios.)

In the attached Python script, you will find the compute_stability function we are using. As said, it is based on the second method above.

1. Split the price data into in-sample (first 70% by default, controlled by `split_ratio=0.7`) and out-of-sample (remaining 30%) periods.

def compute_stability(
        self,
        prices: pd.DataFrame,
        method: str,
        split_ratio: float = 0.7,
        det_order: int = 0,
        k_ar_diff: int = 1
    ) -> float:
        """
        Check the stability of the cointegration vector by computing it on in-sample data
        and evaluating the ADF statistic on the out-of-sample spread.
        
        Parameters
        ----------
        prices : pd.DataFrame
            DataFrame with columns as asset tickers and rows as time-indexed prices.
        method : str
            'Engle-Granger' for pairs or 'Johansen' for baskets.
        split_ratio : float
            Fraction of data to use as in-sample (default 0.7).
        det_order : int
            Deterministic order for Johansen (default 0).
        k_ar_diff : int
            Lag order for Johansen (default 1).
        
        Returns
        -------
        float
            The out-of-sample ADF statistic (more negative indicates stronger stability/stationarity).
            Returns np.nan if insufficient data.
        """
        n = len(prices)
        if n < 50:
            return np.nan
        split = int(n * split_ratio)
        if n - split < 30:
            return np.nan
        train = prices.iloc[:split]
        test = prices.iloc[split:]

2. Compute the cointegration vector using `get_coint_vector` on the in-sample data (using either Engle-Granger for pairs or Johansen for baskets).

try:
    vec_is = self.get_coint_vector(train, method, det_order, k_ar_diff)

3. Apply this vector to the out-of-sample prices to form the spread

    spread_oos = pd.Series(np.dot(test.values, vec_is), index=test.index)
    if spread_oos.std() < 1e-10 or spread_oos.isnull().any():
        self.logger.debug("Out-of-sample spread is effectively constant or contains NaNs")
        raise ValueError("Out-of-sample spread is effectively constant or contains NaNs")

4. Run the ADF test on this out-of-sample spread using `adfuller` from `statsmodels`, extracting the test statistic (the first element of the result, `adfuller(spread_oos)[0]`)

    adf_stat = adfuller(spread_oos)[0]

5. Return the ADF statistic as `stability_stat`, or `np.nan` if the computation fails (e.g., insufficient data or numerical issues).

    return float(adf_stat)
except Exception as e:
    self.logger.warning(f"Stability computation error: {e}")
    return np.nan

As you may already have noted, at the end of the day, we are simply measuring how robust the cointegration relationship is when applied to out-of-sample data. It is the ADF statistic computed on the out-of-sample spread that tests whether the spread remains stationary when the cointegration vector (derived from in-sample data) is applied to unseen data. We are assessing whether the cointegration relationship is likely to persist in future trading.

Remember that a more negative ADF statistic indicates stronger stationarity, that is, stronger evidence of mean reversion in the out-of-sample spread. Common critical values for ADF at 5% significance are around -2.86 to -3.5 (depending on sample size and model). If `stability_stat` is more negative than these, the spread is likely stationary at 95% confidence.

On the other hand, near-zero or positive values indicate the out-of-sample spread is likely non-stationary, suggesting the cointegration vector does not produce a mean-reverting spread in unseen data. This implies the cointegration relationship is unstable and may not be reliable for trading.
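Rather than hardcoding -2.86, you can read the sample-size-dependent critical values that `adfuller` itself returns. A short sketch, with `spread_oos` as in the attached code:

from statsmodels.tsa.stattools import adfuller

result = adfuller(spread_oos)
adf_stat, crit_values = result[0], result[4]  # test statistic and critical values
is_stable = adf_stat < crit_values["5%"]      # stationary at 95% confidence
print(f"ADF stat: {adf_stat:.3f}, 5% critical value: {crit_values['5%']:.3f}")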

The method requires sufficient out-of-sample data; here, we are enforcing at least 30 bars (returning NaN when n - split < 30). For a lookback of 180 days in the H4 timeframe (~1080 hours of effective trading in stock exchanges, ~270 bars), the out-of-sample period is ~81 bars. Shorter lookbacks may reduce reliability.

Please note that the stability_stat is specific to the timeframe (H4 in our case). A basket stable on H4 may not be on D1 or M15, so compare within the same timeframe. Also, take into account that the stability of portfolio weights alone isn’t enough. A basket with a very negative stability_stat but a long half_life may be stable, but too slow for practical trading. We should always combine it with half-life and p-value (Engle-Granger) or eigen_strength (Johansen).

In our coint_rank table, we are prioritizing baskets with stability_stat more negative than -2.86 for 5% significance, and we are combining it with the half_life and the p_value for pairs-trading, or with the half_life and the eigen_strength for baskets.
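One possible way to combine the three factors with pandas (an illustrative sketch, not the attached script's exact ranking; `conn` is assumed to be an open SQLite connection to our database):

import pandas as pd

df = pd.read_sql("SELECT * FROM coint_rank WHERE stability_stat < -2.86", conn)
df = df[df["half_life"].between(1, 12)]  # finite and within a tradeable window
df = df.sort_values(
    by=["eigen_strength", "stability_stat", "half_life"],
    ascending=[False, True, True],  # strongest, most stable, fastest first
)
print(df.head())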

Below we have these filters applied for baskets of stocks, using the eigen_strength from the Johansen cointegration test.

SELECT timeframe as tf, lookback as lb, assets as basket, 
coint_vector as weights, eigen_strength as strength, 
stability_stat as stability, half_life as rev_bars
FROM 'coint_rank'
WHERE stability_stat < -2.86
ORDER BY eigen_strength DESC;


Fig. 9. Metaeditor database interface showing the coint_rank table with selected baskets for backtest

We are choosing the baskets with the highest cointegration strength among those with stability statistics below our -2.86 threshold. In light blue, we highlighted the top five for consideration. Note that the first four of the top five have at least one weight (that is, one component of the cointegration vector) exceptionally high relative to the normalized one (1.0).

Take the third basket as an example:

Ticker (symbol)               AVGO   LAES   MRVL   ASX
Rounded weight on portfolio   1.0    47.6   -3.1   -78.4

Table 2. A stock basket with very high short-selling requirements

That means that for each AVGO share in our position, we would have to buy 47 LAES shares and simultaneously sell 3 MRVL and 78 ASX shares. As small retail traders, we should avoid this type of portfolio, because, unless we have plenty of money and fast access (think “priority” access) to our broker, we may face hard times and high transaction costs when trying to short sell this volume of relatively low liquidity shares. Again, “low liquidity” here is relative to semiconductor industry top liquidity shares, like NVDA, INTC, and AVGO, for example.
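One simple way to flag such baskets programmatically is the ratio between the largest and smallest absolute weights; the threshold you pick is your own assumption, not a standard:

import numpy as np

def max_weight_ratio(weights) -> float:
    """Ratio between the largest and smallest absolute portfolio weights.
    Large values flag baskets that are hard to execute on a retail account."""
    w = np.abs(np.asarray(weights, dtype=float))
    return float(w.max() / w.min())

print(max_weight_ratio([1.0, 47.6, -3.1, -78.4]))  # Table 2 basket: ~78
print(max_weight_ratio([1.0, -0.6, 1.46, -0.46]))  # Table 3 basket: ~3.2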

We would rather choose a more homogeneous portfolio, like that on the fifth line. 

Ticker (symbol)               INTC   WOLF   NVTS   ASX
Rounded weight on portfolio   1.0    -0.6   1.46   -0.46

Table 3. A stock basket with similar portfolio weights

The slightly weaker cointegration strength is amply compensated by the good stability statistic (-3.72). Moreover, it has a half-life of mean reversion estimated at only three H4 bars, or nearly twelve hours, which fits perfectly with the holding time we expect for our statistical arbitrage swing trading.

To avoid this imbalance in our top-ranking baskets, we can filter out the tickers that are causing it: LAES and ASX. Note that in Table 3, ASX does not cause the imbalance; the problem is that, in other baskets, it is highly imbalanced against NVDA, and we do not want to filter out NVDA, since it is part of our core strategy, our initial hypothesis.

SELECT timeframe as tf, lookback as lb, assets as basket, 
coint_vector as weights, eigen_strength as strength, 
stability_stat as stability, half_life as rev_bars
FROM 'coint_rank'
WHERE stability < -2.86 
AND basket NOT LIKE '%LAES%'
AND basket NOT LIKE '%ASX%'
ORDER BY strength DESC;


Fig. 10. Metaeditor database interface showing the coint_rank table with selected baskets for backtest excluding LAES and ASX tickers

Now we have more balanced portfolios among the top five. In particular, the second row shows a promising basket (without NVDA) with high cointegration stability (-4.29) and an estimated half_life of about 12 bars, or 48 hours, which fits our swing trading style perfectly.

Let’s backtest it and see if we have any improvement over our previous backtest done without these two additional ranking criteria. (Later, we can check that with NVDA in the third row.)


Backtest it

As we did in the previous backtest, we are hardcoding the trading parameters in our EA for backtesting. To understand why, please take a look at the previous article's backtest.

// check if we are backtesting
   if(MQLInfoInteger(MQL_TESTER))
     {
      Print("Running on Tester");
      ArrayResize(symbols, 4);
      ArrayResize(weights, 4);
      // "H4",120,"INTC,AMD,AVGO,MU"
      // "[1.0, -0.5005745550348311, 0.7458706676501435, -0.3108921739081775]"
      symbols[0] = "INTC";
      symbols[1] = "AMD";
      symbols[2] = "AVGO";
      symbols[3] = "MU";
      //---
      weights[0] = 1.0;
      weights[1] = -0.5005745550348311;
      weights[2] = 0.7458706676501435;
      weights[3] = -0.3108921739081775;
      timeframe = PERIOD_H4;
      //InpLookbackPeriod = 120;
     }
   else
     {
      // Load strategy parameters from database
      if(!LoadStrategyFromDB(InpDbFilename,
                             InpStrategyName,
                             symbols,
                             weights,
                             timeframe,
                             InpLookbackPeriod))

Note that we included the lookback period as a possible user input (InpLookbackPeriod). From now on, we are separating the lookback used for the mean spread/std dev calculation (used for entering/exiting positions) from the lookback used for assessing the cointegration. Also, the mean and std dev lookback can now be used in our optimizations, as was the case here.


Fig. 11. Screenshot showing the inputs used in the backtest

Below you see the optimization results sorted by Sharpe Ratio.


Fig. 12. Screenshot showing the backtest optimization results sorted by Sharpe Ratio

We ran a single test on the second one, highlighted in pale blue. Compared with the first row, this second place has a slightly lower Sharpe Ratio (6.87), but this is amply compensated by its higher Recovery Factor (5.72), smaller drawdown (3.36), and more than twice the Expected Payoff with nearly half the trades, which means lower transaction costs.

Note that in this optimization pass, the EA enters positions at 3.0 standard deviations from the mean and exits at 0.8 standard deviations. Also, we are backtesting with a 120-day cointegration lookback and a 90-day lookback for the mean spread and standard deviation calculations.

The backtest covered a period of four months.


Fig. 13. Screenshot showing the settings used in the backtest

The consolidated report shows promising numbers.


Fig. 14. Screenshot showing the backtest consolidated report statistics

Note the poor history quality, and focus on the fundamentals of statistical arbitrage rather than on these particular results.

Besides the stats already mentioned above, note the ratio between profit trades (51.67%) and loss trades (48.33%). This small edge is characteristic of statistical arbitrage. 

The balance/equity graph shows a sustainable evolution of the capital curve, with small drawdowns.


Fig. 15. Screenshot showing the backtest balance/equity graph

It seems we addressed the weird concentration of trades found in the previous backtest after including the stability of the portfolio weights and the mean reversion half-life in our scoring system.


Fig. 16. Screenshot showing the backtest position times, days, and months

Now our trades are well distributed across all days of the week and all four chosen months. The only concentration was in the trading session (USA), which was expected, since we were backtesting Nasdaq stocks.

The MFE and MAE profit distribution shows some room for improvement. 


Fig. 17. Screenshot showing the backtest MFE and MAE profit distribution

We can probably get better results once we implement dynamic portfolio rebalancing, which is the main topic of our next installment.

Finally, the main issue in our previous backtest: position holding times.


Fig. 18. Screenshot showing the backtest position holding times

The maximal and average position holding times suggest that the inclusion of the mean reversion half-life in our scoring system contributed to the overall improvement we see here. Previously, the average position holding time was about a month, with at least one position remaining open for more than sixteen weeks. Now we have an average of about 48 hours, with only three positions that remained open for more than a week. Classical swing trade!


Conclusion

In this article, we described, implemented, and backtested two ranking criteria for a minimal scoring system aimed at ranking baskets of stocks for statistical arbitrage through cointegration. These two criteria are part of the scoring system we started describing and developing in the previous article. That is, the two criteria described here and those described in the previous article compose a single scoring system to be applied as a whole when qualifying stock baskets for backtesting.

The first criterion described is the stability of the cointegration vector, which assesses the stability of the portfolio weights over time. Since the portfolio weights define the portfolio hedging, the desired market neutrality of stat arb strategies is highly dependent on them being stable. This criterion helps eliminate baskets whose weights fluctuate beyond a defined threshold.

The second criterion presented here is the time to mean reversion (half-life), which estimates the average position holding time. This criterion is directly tied to the strategic criteria described in the previous article, in which we defined whether we would operate on short or extended timeframes, that is, whether we want to trade intraday, daily, or longer periods. By taking the estimated time to mean reversion into account, we avoid including baskets that would require excessive time to exit the market, thus keeping our risk exposure within admissible limits.

Finally, we ran a backtest after using these two criteria, along with the three previously described, and its results suggest that we can have a much better basket selection when the time to mean reversion and the stability of the portfolio weights are part of the scoring system.

We include here all the files required to reproduce the experiment, including the backtest settings (*.ini file) and the optimization config (*.set file).

File                                                   Description
config\CointNasdaq.INTC.H4.20250701_20251031.020.ini   Backtest configuration settings
config\CointNasdaq.set                                 Optimization parameters SET file
Experts\StatArb\CointNasdaq.mq5                        Sample Expert Advisor
Files\StatArb\schema-0.5.sql                           Updated database schema (DDL)
Include\StatArb\CointNasdaq.mqh                        Sample Expert Advisor header
Python\coint_ranker_auto.py                            Python class for ranking and storing the cointegration tests
Last comments
Cyberdude | 9 Dec 2025 at 15:50

First, I appreciate your attempt to explain this topic simply.

But I think your backtests are far from reality.

A delay of 0 and modeling 'every tick' are both unrealistic because neither exists. A delay of 0 doesn't actually exist. Set it to at least 100 ms.
And 'every tick' modeling is a manufactured tick by the MT5. You need a 'real tick'.
If I were you, I would make it very clear that this is a sure-loss strategy for retail MT5 users.
Jocimar Lopes | 10 Dec 2025 at 09:05
Yes, you are right on these points: a delay of 0 is unrealistic, as is the 'every tick' choice. But there is a reason for these choices.

'Every tick' was chosen because we have to deal with very low-quality history data for stock symbols in the default demo account without an Exchange subscription. 'Every tick' provided slightly better, though still low-quality, history data.

When it comes to the 0 delay, that is because in this article I was focused on describing the proposed scoring system skeleton, not on the strategy's performance in real trading. So, I didn't even think about it.

Don't get me wrong. You are right, and **your alert to the readers is valid**. This 'strategy' must be read in quotes; it was never intended as a real-world strategy.

Thank you.