Feature Engineering for ML (Part 9): Structural Break Tests in Python
Table of Contents
- Introduction
- Two Categories of Structural Break Tests
- CUSUM Tests: Chu-Stinchcombe-White
- Explosiveness Tests: Chow-Type Dickey-Fuller
- SADF: Supremum Augmented Dickey-Fuller
- Sub- and Super-Martingale Tests
- QADF and CADF: Robustifying the Supremum
- Why the Book Snippets Do Not Run in Production
- The Production Problem: Window Selection Over Time
- Structural Breaks as Regime Selectors
- Results on Synthetic Data
- Conclusion
- References
Introduction
The preceding articles in this Feature Engineering for ML series built features from the structure of time itself: Part 1 established fractional differentiation as a way to preserve long memory in a stationary series; Part 3 embedded the trading calendar into Fourier coordinates; and Part 5 compressed tick-level order flow into bar-indexed microstructural statistics. This article builds features from a qualitatively different question: not what the current bar looks like, but whether the data-generating process that produced it has recently changed.
Chapter 17 of López de Prado's Advances in Financial Machine Learning opens with a pointed observation. Structural breaks, the transition from one market regime to another, represent some of the best risk-adjusted trading opportunities precisely because most participants are caught off guard. A mean-reverting dynamic that gives way to momentum traps traders who continue to fade breakouts; a trending market that reverts catches traders who held directional positions too long. The actors on the losing side do not immediately recognise their mistake. They hold, they average down, and eventually they are stopped out. It is this forced liquidation that creates the edge. But identifying the transition in real time, rather than in retrospect, requires a formal statistical test.
The module afml.structural_breaks implements two families of such tests. The first, CUSUM tests, measure whether cumulative forecast errors deviate significantly from white noise. The second, explosiveness tests, detect exponential growth or collapse that is inconsistent with a random walk. This article covers both families. It adds two variants not in the original implementation and documents performance problems in the book's Python snippets that make them unusable on realistic series lengths. It closes by developing the chapter's implied link to strategy selection.
Two Categories of Structural Break Tests

Figure 1. Architecture of the structural break test taxonomy from AFML Chapter 17
- Left branch (CUSUM): Tests whether cumulative forecast errors deviate significantly from white noise. Brown-Durbin-Evans (§17.3.1) is regression-based and not yet implemented; Chu-Stinchcombe-White (§17.3.2) operates on levels only and is fully implemented in afml.structural_breaks.
- Right branch (Explosiveness): Tests for exponential growth or collapse. The Chow-Type DFC tests for a single unknown break date. SADF uses a nested double loop to detect multiple regime switches. The sub/super-martingale tests (SM-Poly, SM-Exp, SM-Power) operate under alternative functional forms with a φ-penalisation for long-run bubble bias.
- New additions: QADF (§17.4.2.4) and CADF (§17.4.2.5) robustify the SADF supremum statistic against outliers in the inner ADF distribution. Both are implemented in this module and are absent from the original codebase.
- Output (green node): All tests produce bar-indexed scalar series that plug directly into the ML feature pipeline built in prior articles.
CUSUM Tests: Chu-Stinchcombe-White
The Chu-Stinchcombe-White test (Homm and Breitung, 2012) simplifies the earlier Brown-Durbin-Evans method by dropping the regressor array entirely. It assumes that the null hypothesis is no trend, E_{t-1}[Δy_t] = 0, which allows the test to work directly on the levels {y_t}. For each pair of time points (n, t) with n < t, it standardises the departure of log-price y_t from the reference level y_n:
S_{n,t} = (y_t − y_n) / (σ̂_t √(t − n)), where σ̂_t² = (t − 1)⁻¹ Σ_{i=2}^{t} (y_i − y_{i−1})²
Under H_0 : β_t = 0, S_{n,t} ~ N[0, 1]. To remove the dependence on the arbitrary reference level y_n, the test takes the supremum over all n ∈ [1, t]. The time-dependent critical value derived via Monte Carlo is:
![Critical value formula c_alpha[n,t] Critical value formula c_alpha[n,t]](https://c.mql5.com/2/224/eq2_csw_critical.png)
When the statistic exceeds its critical value, the null of no trend is rejected. The practical signal for a feature pipeline is the excess of the statistic over its critical value (stat − critical value), which measures the margin by which the rejection boundary is crossed. A positive value indicates that the series has been trending away from some earlier reference level; a negative value means the null of no trend is not rejected, which is consistent with white-noise forecast errors.
```python
from afml.structural_breaks import get_chu_stinchcombe_white_statistics

csw = get_chu_stinchcombe_white_statistics(
    log_prices,
    test_type="one_sided",  # or "two_sided"
)
# Returns pd.DataFrame with columns "stat" and "critical_value"

# Feature: csw["stat"] - csw["critical_value"] (excess over rejection boundary)
excess = csw["stat"] - csw["critical_value"]
```
The two-sided version takes the absolute value of the departure, making it symmetric with respect to rallies and sell-offs. For most feature pipelines the one-sided version is more informative because the sign of the departure (upward vs. downward deviation) is itself a regime signal.
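To make the definition concrete, here is a minimal NumPy sketch of the statistic at a single endpoint t. It is not the afml implementation: the function name csw_at_t is hypothetical, indices are 0-based, and the default b_alpha=4.6 is the constant AFML quotes for the 5% one-sided test, with the critical value taken as √(b_α + log(t − n)). The same constant is reused for the two-sided variant purely as a simplifying assumption.

```python
import numpy as np

def csw_at_t(y, t, b_alpha=4.6, two_sided=False):
    """CSW statistic and critical value at endpoint t (0-based index).

    Hypothetical helper for illustration. b_alpha=4.6 is the 5% one-sided
    constant quoted in AFML; treat it as an input, not a fixed property.
    """
    y = np.asarray(y, dtype=float)
    if t < 2 or t >= y.size:
        raise ValueError("t must satisfy 2 <= t < len(y)")
    diffs = np.diff(y[: t + 1])                  # first differences up to t
    sigma = np.sqrt(np.mean(diffs ** 2))         # sigma_hat_t
    n = np.arange(t)                             # reference points n < t
    span = t - n                                 # t - n, always >= 1
    s = (y[t] - y[n]) / (sigma * np.sqrt(span))  # S_{n,t} for every n
    if two_sided:
        s = np.abs(s)
    best = int(np.argmax(s))                     # supremum over n
    crit = np.sqrt(b_alpha + np.log(span[best]))
    return float(s[best]), float(crit)
```

On a perfectly linear series such as [0, 1, 2, 3, 4] the supremum is reached at the earliest reference point, because the departure grows linearly in the span while the normaliser grows only with its square root. A constant series would make σ̂_t zero, so a production version needs a guard for that case.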
Explosiveness Tests: Chow-Type Dickey-Fuller
The Chow-Type test (Chow, 1960) follows a first-order autoregressive specification for the null (random walk) and alternative (explosive process) hypotheses. It asks whether, at some unknown time τ*T, the series switched from a unit-root process to an explosive one. The test fits the regression:
Δy_t = δ y_{t−1} D_t[τ*] + ε_t
where D_t[τ*] is a dummy variable that equals zero before τ*T and one from τ*T onward. The test statistic is the t-ratio of δ, denoted DFC_{τ*} = δ̂ / σ̂_δ. Because τ* is unknown, Andrews (1993) proposed trying all possible break dates in [τ_0, 1−τ_0] and taking the supremum:
SDFC = sup_{τ* ∈ [τ_0, 1−τ_0]} {DFC_{τ*}}
The main limitation of this approach is that it assumes only one break date. A bubble-burst-recovery cycle appears stationary to the Chow test because the two explosive episodes cancel each other out. For that reason SDFC is best used as a complementary feature alongside SADF, not as a standalone signal. Its value as a feature is the break-point location itself: when SDFC peaks, the corresponding τ* index identifies where the market most likely switched regimes, which is useful as a conditioning variable for strategy selection.
```python
from afml.structural_breaks import get_chow_type_stat

sdfc = get_chow_type_stat(log_prices, min_length=20)
# Returns pd.Series of DFC_τ* statistics
# SDFC = sdfc.max(), the supremum over all tested τ*
```
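The single-regressor form of the Chow-type regression makes the t-ratio easy to write out by hand. The sketch below is an assumption-laden illustration, not the afml code: chow_dfc is a hypothetical name, the break candidates are row indices of the differenced series, and min_length rows are excluded at each end to play the role of τ_0.

```python
import numpy as np

def chow_dfc(y, min_length=20):
    """DFC t-ratio for each candidate break row (hypothetical helper).

    Regression: dy_t = delta * y_{t-1} * D_t + eps_t, with no intercept.
    Entries outside the tested range are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)          # dy[i] = y[i+1] - y[i]
    y_lag = y[:-1]           # y_{t-1} aligned with dy
    n = dy.size
    rows = np.arange(n)
    stats = np.full(n, np.nan)
    for k in range(min_length, n - min_length):
        x = np.where(rows >= k, y_lag, 0.0)   # y_{t-1} * D_t
        sxx = x @ x
        delta = (x @ dy) / sxx                # OLS slope
        resid = dy - delta * x
        s2 = (resid @ resid) / (n - 1)        # one estimated parameter
        stats[k] = delta / np.sqrt(s2 / sxx)  # t-ratio of delta
    return stats
```

The supremum is np.nanmax(stats) and the estimated break location is np.nanargmax(stats). Because the dummy stays switched on until the end of the sample, the regression has no way to represent a bubble that has already deflated, which is the single-break limitation described above.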
SADF: Supremum Augmented Dickey-Fuller
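Phillips, Wu and Yu (2011) showed that standard unit-root tests may not distinguish a stationary process from a periodically collapsing bubble. Their solution, SADF, fits the augmented Dickey-Fuller regression at each endpoint t, backwards expanding the start point t_0:
Δy_t = α + β y_{t−1} + Σ_{l=1}^{p} γ_l Δy_{t−l} + ε_t
Here p is the number of lagged differences. For each endpoint t, all combinations of (t_0, t) are tried. SADF takes the supremum of the resulting ADF statistics:
SADF_t = sup_{t_0 ∈ [1, t−τ]} {ADF_{t_0,t}} = sup_{t_0 ∈ [1, t−τ]} {β̂_{t_0,t} / σ̂_{β_{t_0,t}}}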
where τ is the minimum sample length used for estimation. When β > 0, the series is in an explosive regime; the SADF statistic rises. When β < 0, the series is steady, mean-reverting toward a long-run level; the statistic is negative or near zero. β = 0 is the unit-root case, where the series is non-stationary but not explosive.
The module offers six regression models for the test specification: two ADF variants and four SM forms. The linear model (constant + linear trend) is the standard ADF. The quadratic model adds t² to the trend component. Both operate on log-price differences. The SM models (sm_poly_1, sm_poly_2, sm_exp, sm_power) operate on levels or log-levels directly, testing for sub- or super-martingale behaviour under specific functional forms. Each model is a separate hypothesis about the shape of the explosive trajectory; including all of them in the feature matrix lets the downstream model select whichever is most predictive for the current asset.
```python
from afml.structural_breaks import get_sadf

# Standard linear SADF
sadf_linear = get_sadf(
    log_prices,
    model="linear",
    lags=1,
    min_length=20,
    add_const=True,
)

# Exponential sub/super-martingale test with φ=0.5
smt_exp = get_sadf(
    log_prices,
    model="sm_exp",
    lags=1,
    min_length=20,
    phi=0.5,
)
```
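For readers who want to see what sits behind a call like the one above, the following sketch computes the ADF t-ratio for one window and then takes the supremum over start points at a single endpoint. It is a deliberately naive illustration under stated assumptions: adf_tstat and sadf_at_t are hypothetical names, only a constant is included as the deterministic term, indices are 0-based, and nothing here is optimised.

```python
import numpy as np

def adf_tstat(y, lags=1):
    """t-ratio of beta in dy_t = alpha + beta*y_{t-1} + sum_l gamma_l*dy_{t-l} + eps."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                         # dy_t
    cols = [y[lags:-1]]                        # y_{t-1}
    for l in range(1, lags + 1):
        cols.append(dy[lags - l : -l])         # dy_{t-l}
    cols.append(np.ones_like(target))          # constant
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = X.shape[0] - X.shape[1]
    cov = (resid @ resid) / dof * np.linalg.inv(X.T @ X)
    return coef[0] / np.sqrt(cov[0, 0])

def sadf_at_t(y, t, min_length=20, lags=1):
    """Supremum of ADF t-ratios over windows y[t0 : t+1] with >= min_length points."""
    y = np.asarray(y, dtype=float)
    starts = range(0, t + 2 - min_length)      # t0 = 0 .. t + 1 - min_length
    vals = np.array([adf_tstat(y[t0 : t + 1], lags) for t0 in starts])
    return vals.max(), vals
```

The second return value is the inner distribution {ADF_{t0,t}} for that endpoint, which is the raw material for the quantile and conditional variants discussed later in the article.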
Log prices, not raw prices
The chapter contains a precise warning on this point that is easy to overlook. When the ADF null is rejected on raw prices, the conclusion is that prices have finite variance. The implied corollary is that return variance is not time-invariant: it must decrease as prices rise and increase as prices fall, in order to keep price variance constant. Running ADF on raw prices therefore embeds a structural heteroscedasticity assumption that is almost certainly violated over long samples with bubbles. Log-price stationarity tests are correctly interpreted as tests on return mean, not return variance. Always pass np.log(price_series), not the raw price series, to get_sadf().
Sub- and Super-Martingale Tests
The SM family replaces the ADF specification with regressions that use functional forms of time as regressors. The motivation is to detect explosive trends that do not fit the AR(1) model assumed by ADF. Under each specification, the test regresses y (or log y) on a function of t and tests whether the coefficient on that function is zero. The same backwards-expanding window logic applies, and the same supremum is taken:
SMT_t = sup_{t_0 ∈ [1, t−τ]} {|β̂_{t_0,t}| / (σ̂_{β_{t_0,t}} (t − t_0)^φ)}
The φ parameter corrects for a structural bias toward long-run bubbles. In the simple regression case, the variance of β shrinks as (t−t_0) grows, which means long samples with small but persistent trends produce artificially large t-statistics. Setting φ = 0.5 compensates for this effect; φ = 0 gives the raw SMT statistic; φ → 1 increasingly favours short-run bubbles over long-run ones. In practice, including SMT at φ = 0, 0.5, and 1.0 as separate features gives the ML algorithm a mechanism to discriminate between explosive episodes of different durations.
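The effect of the φ penalty is easiest to see on one functional form. The sketch below assumes the exponential form, in which the log series is regressed on a constant and a time index, and divides the absolute t-ratio by (t − t_0)^φ. The name smt_exp_at_t is hypothetical, and the input is assumed to be already in logs.

```python
import numpy as np

def smt_exp_at_t(log_y, t, min_length=20, phi=0.5):
    """Penalised SMT at endpoint t for the form log y = log(alpha) + beta*t + xi."""
    log_y = np.asarray(log_y, dtype=float)
    best = -np.inf
    for t0 in range(0, t + 2 - min_length):
        seg = log_y[t0 : t + 1]
        n = seg.size
        X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
        coef, *_ = np.linalg.lstsq(X, seg, rcond=None)
        resid = seg - X @ coef
        cov = (resid @ resid) / (n - 2) * np.linalg.inv(X.T @ X)
        t_ratio = abs(coef[1]) / np.sqrt(cov[1, 1])
        best = max(best, t_ratio / (t - t0) ** phi)   # penalise long windows
    return best
```

Since every window has t − t_0 ≥ 1, raising φ can only shrink each window's contribution, so the statistic is non-increasing in φ. Longer windows are shrunk more, which is how larger φ values shift the supremum toward shorter episodes.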
QADF and CADF: Robustifying the Supremum
The SADF supremum statistic is sensitive to outliers in the inner ADF distribution {ADF_{t0,t}}_{t0∈[1,t−τ]}. A single window with a near-singular design matrix can produce a spuriously large t-statistic that dominates the supremum, biasing SADF upward. The chapter proposes two alternatives.
QADF (§17.4.2.4) replaces the supremum with a high quantile. For a user-specified q ∈ [0, 1], Q_{t,q} is the q-th quantile of the inner ADF distribution at time t. A companion dispersion measure Qdot_{t,q,v} = Q_{t,q+v} − Q_{t,q−v} captures the width of the upper tail. SADF itself is the special case Q_{t,1}, but Q_{t,0.95} is far more robust to outlier windows.
CADF (§17.4.2.5) uses the conditional mean of the ADF values above the q-th quantile, which the chapter denotes C_{t,q}. This is the expected value of ADF_{t0,t} given that it exceeds the threshold Q_{t,q}. By construction C_{t,q} ≤ SADF_t. A scatter plot of SADF_t against C_{t,q} reveals an ascending line with approximately unit gradient under normal conditions; when SADF rises far above C_{t,q}, it signals that the supremum is being driven by an outlier window rather than by a broad explosive trend in the inner distribution.
```python
from afml.structural_breaks import get_qadf, get_cadf

qadf = get_qadf(log_prices, model="linear", lags=1, min_length=20, q=0.95, v=0.025)
# qadf["q_adf"]: Q_{t,0.95} (robust centrality of high ADF values)
# qadf["q_dot"]: Qdot_{t,q,v} (width of upper tail)

cadf = get_cadf(log_prices, model="linear", lags=1, min_length=20, q=0.95)
# cadf["c_adf"]: C_{t,q} (conditional mean above threshold)
# cadf["c_dot"]: conditional std above threshold

# Outlier diagnostic: when this ratio is large, SADF is driven by a single window
# (sadf_linear is the linear-model get_sadf() output from the SADF section)
outlier_ratio = (sadf_linear - cadf["c_adf"]) / cadf["c_dot"].clip(lower=0.01)
```
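Both statistics are simple summaries of the inner ADF distribution at one endpoint, so they can be sketched directly on an array of ADF values. The helper names quantile_adf and conditional_adf are hypothetical, and the conditional dispersion is taken here as the population standard deviation of the tail, which is an assumption about the convention.

```python
import numpy as np

def quantile_adf(adf_vals, q=0.95, v=0.025):
    """Return (Q_{t,q}, Qdot_{t,q,v}) from the inner ADF values at one endpoint."""
    a = np.asarray(adf_vals, dtype=float)
    q_adf = np.quantile(a, q)
    upper = np.quantile(a, min(q + v, 1.0))   # clip to valid quantile range
    lower = np.quantile(a, max(q - v, 0.0))
    return q_adf, upper - lower

def conditional_adf(adf_vals, q=0.95):
    """Return (C_{t,q}, Cdot_{t,q}): mean and std of ADF values at or above Q_{t,q}."""
    a = np.asarray(adf_vals, dtype=float)
    tail = a[a >= np.quantile(a, q)]          # never empty: the max always qualifies
    return tail.mean(), tail.std()
```

With q = 1 the quantile collapses to the maximum, which recovers SADF itself; any q below 1 averages or interpolates within the upper tail and therefore cannot exceed it.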
Why the Book Snippets Do Not Run in Production
Snippets 17.1–17.4 in the chapter are explicitly described as pedagogical: "The purpose of this code is not to estimate SADF quickly, but to clarify the steps involved in its estimation." The chapter even includes a FLOP count table showing that a single SADF update on a dollar bar series with T=356,631 bars requires approximately 2.035 TFLOPs. On a series of that length, the full SADF time series requires an estimated 242 PFLOPs — a figure that makes the algorithm's parallelization requirements explicit.
Below that scale, on the kind of daily or hourly bar series used in research pipelines, the book snippets fail in three concrete ways.

Figure 2. 2-panel illustration of book snippet timing vs. optimized implementation
- Panel (a): Absolute timing on a log scale. The green bars (optimized) are consistently 10–50× shorter than the red bars (book snippets). For the CSW test at n=150, the book takes 375 ms vs. the optimized version's 10 ms. The gap widens with n because of the O(T²) scaling.
- Panel (b): Speedup factors. The SADF inner loop at n=100 is 50× faster because the book version rebuilds a Pandas DataFrame from scratch on every call to getYX(), whereas the optimized version pre-computes X and y as NumPy arrays once and passes slices. The CSW test at n=80 is 32× faster because the optimized version precomputes σ²_t via a cumulative sum once, rather than recomputing it inside the O(T²) inner loop.
The three failure modes, in order of severity:
1. O(T²) Series.loc inside the CSW inner loop
The original CSW implementation calls series.loc[index] inside a double Python loop. Although Pandas DatetimeIndex lookup is O(1) amortised, the constant overhead from label resolution is significant relative to the arithmetic work. At n=150 bars, the book version runs in 375 ms. Because the cost grows as O(T²), extrapolating to n=5,000 bars (roughly 20 years of daily data) gives about 0.375 s × (5,000/150)² ≈ 417 s, or approximately 7 minutes. Doubling the length to 10,000 bars roughly quadruples that, to about 28 minutes.
The fix is to convert the series to a NumPy array once and precompute σ²_t via a cumulative sum np.cumsum(np.diff(vals)**2). The inner loop then works entirely with integer indices and float64 arithmetic, with no Python object overhead.
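A sketch of that fix, under the assumption that the 5% one-sided constant b_alpha=4.6 from AFML is used and that indices are 0-based. The names sigma_sq_running and csw_series are hypothetical and do not correspond to functions in afml.structural_breaks.

```python
import numpy as np

def sigma_sq_running(vals):
    """Running mean of squared first differences, one value per endpoint t >= 1."""
    vals = np.asarray(vals, dtype=float)
    sq = np.diff(vals) ** 2
    return np.cumsum(sq) / np.arange(1, sq.size + 1)

def csw_series(vals, b_alpha=4.6):
    """One-sided CSW statistic and critical value for every endpoint t >= 2."""
    vals = np.asarray(vals, dtype=float)
    sigma = np.sqrt(sigma_sq_running(vals))   # sigma[t - 1] belongs to endpoint t
    T = vals.size
    stat = np.full(T, np.nan)
    crit = np.full(T, np.nan)
    for t in range(2, T):
        span = np.arange(t, 0, -1)            # t - n for n = 0 .. t-1
        s = (vals[t] - vals[:t]) / (sigma[t - 1] * np.sqrt(span))
        k = int(np.argmax(s))                 # supremum over reference points
        stat[t] = s[k]
        crit[t] = np.sqrt(b_alpha + np.log(span[k]))
    return stat, crit
```

The outer loop over t remains, but the scan over reference points is a single vectorised expression on integer-indexed arrays, and the variance estimate is read from a precomputed array instead of being rebuilt for every (n, t) pair.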
2. Pandas DataFrame reconstruction inside the SADF inner loop
Snippet 17.2 (getYX) builds a Pandas DataFrame on every call. The SADF outer loop calls this function once per time step t, and the inner loop at each t calls getBetas() approximately (t − τ) times. The DataFrame construction — including index alignment, copy-on-write checks, and column naming — is purely overhead: the computation needs only NumPy arrays. The fix is to build the full X and y arrays once outside the outer loop and pass read-only slices by reference.
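The build-once idea can be sketched as follows. The function name build_lagged_design is hypothetical (the module's internal helper may differ), and the column order chosen here, lagged level first and constant last, is an assumption made for the illustration.

```python
import numpy as np

def build_lagged_design(log_prices, lags=1):
    """Build the full ADF design matrix and target once, as read-only arrays.

    Columns of X: y_{t-1}, dy_{t-1}, ..., dy_{t-lags}, constant.
    """
    levels = np.asarray(log_prices, dtype=float)
    dy = np.diff(levels)
    target = dy[lags:]                                        # dy_t
    cols = [levels[lags:-1]]                                  # y_{t-1}
    cols += [dy[lags - l : -l] for l in range(1, lags + 1)]   # dy_{t-l}
    cols.append(np.ones_like(target))                         # constant
    X = np.column_stack(cols)
    X.setflags(write=False)       # guard against accidental in-place edits
    target.setflags(write=False)
    return X, target
```

Inside the double loop a window is then just X[t0:t1] and target[t0:t1]. Basic slices of NumPy arrays are views, so no data is copied and no index alignment takes place per window.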
3. Silent argument-order bug in getBetas
The book signature is getBetas(y, x) — outcome first, then features. Every calling site in the snippets is correct, but the convention is the opposite of the scikit-learn standard (X, y). Any user who ported the function and called it with getBetas(X, y) would obtain a numerical result without an error message; the beta vector would correspond to the regression of X on y rather than y on X. The optimized implementation standardises the signature to get_betas(X, y) throughout and raises a clear error if shapes are inconsistent.
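A shape-checked OLS helper along those lines might look like the sketch below. The name get_betas_sketch is hypothetical and the specific checks are assumptions about what "inconsistent shapes" should mean; the module's own get_betas may differ in both.

```python
import numpy as np

def get_betas_sketch(X, y):
    """OLS coefficients and their covariance, with the (X, y) argument order.

    Raises ValueError when the arguments look swapped or mismatched.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be 2-D (n_samples, n_features); were X and y swapped?")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y must be 1-D with one entry per row of X")
    if X.shape[0] <= X.shape[1]:
        raise ValueError("need more observations than regressors")
    xtx_inv = np.linalg.inv(X.T @ X)
    b_mean = xtx_inv @ (X.T @ y)
    resid = y - X @ b_mean
    b_var = (resid @ resid) / (X.shape[0] - X.shape[1]) * xtx_inv
    return b_mean, b_var
```

Requiring a 2-D design matrix and a 1-D target is what turns a swapped call into an immediate error instead of a plausible-looking but meaningless coefficient vector.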
None of these are criticisms of the chapter. The book's snippets serve their stated purpose: they clarify what the algorithm computes. Production use requires the optimizations described above.
The Production Problem: Window Selection Over Time
The SADF algorithm uses a backwards-expanding window: the right edge of the window is fixed at the current time t, and the left edge t_0 expands backward to the beginning of the sample. This means that as more bars accumulate, every SADF update re-reads the full history from bar 1. On a series that grows by one bar per day, the computation time grows quadratically with calendar time.
The practical question is whether the full expanding window is necessary, or whether a rolling window of fixed length W produces a useful signal at much lower cost. The answer depends on the type of bubble being detected.
For short-run bubbles with a holding period of days to weeks, a rolling window of W = 252 bars (one year of daily data) captures all relevant structure. The left edge discards history older than one year, which is acceptable because regime switches from two years ago are no longer tradeable. For long-run bubbles spanning multiple years — the dot-com and subprime-crisis examples in the chapter — the expanding window is necessary because the test needs enough contrast between the pre-bubble and bubble regimes to achieve statistical power.
The practical solution is to implement a hybrid: use a fixed-size lookback window L for the expanding left edge, set large enough to capture the longest expected bubble duration but not equal to the full history. For daily equity data, L = 504 (two years) is a reasonable default. Computing SADF on [max(1, t−L), t] preserves sensitivity to medium-run bubbles and bounds the per-bar cost to O(L²) instead of O(T²). The get_sadf() implementation supports this via the window boundaries passed to _get_y_x_numpy():
```python
import pandas as pd
from afml.structural_breaks import get_sadf

# Rolling-window SADF: limit lookback to the last 504 bars
L = 504
min_length = 20
results = {}
for t in range(min_length, len(log_prices)):
    window = log_prices.iloc[max(0, t - L): t + 1]
    sadf_t = get_sadf(window, model="linear", lags=1, min_length=min_length)
    if len(sadf_t) > 0:
        # keep only the statistic for the window's last bar
        results[log_prices.index[t]] = sadf_t.iloc[-1]
sadf_rolling = pd.Series(results)
```
The rolling version changes the interpretation of the statistic slightly: it answers "is the process explosive in the last L bars?" rather than "has it ever been explosive?". For a feature intended to select the current trading strategy, the shorter-horizon interpretation is more appropriate. For a risk management signal intended to detect long-run bubble accumulation, the full expanding window is needed.
Structural Breaks as Regime Selectors
The chapter's opening motivation — that structural break transitions offer some of the best risk-adjusted opportunities in financial markets — implies a connection between the test statistics and strategy selection that the chapter does not make explicit. The connection is this: the three regimes identified by the SADF specification (steady, unit-root, explosive) map directly onto the strategy regime in which different signal types perform best.
In the steady regime (β < 0, SADF deeply negative), the process is mean-reverting with a finite equilibrium level. The half-life of deviations from that level can be estimated directly from the ADF coefficient: half-life = −log(2) / log(1 + β). In this regime, signals based on over-extension from moving averages, Bollinger Band reversions, and pairs-trading spread mean-reversion all have theoretical support. The market maker's inventory model is well-posed in the steady regime because the market is clearing at a stable price level.
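The half-life formula is a one-liner, but it is only defined for −1 < β < 0, so a small guard is worthwhile. The function name half_life is illustrative, and the result is measured in bars.

```python
import math

def half_life(beta):
    """Half-life in bars of an AR(1) deviation, given the ADF coefficient beta."""
    if not -1.0 < beta < 0.0:
        raise ValueError("half-life is defined only for -1 < beta < 0")
    return -math.log(2.0) / math.log(1.0 + beta)
```

For example, β = −0.5 means each bar removes half of the remaining deviation, so the half-life is exactly one bar, while β = −0.05 gives a half-life of roughly 13.5 bars.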
In the explosive regime (β > 0, SADF spiking), the process is trending with a trajectory that the model predicts will either continue exponentially or reverse abruptly. The chapter emphasises that this regime is where most participants are caught off guard: those positioned for mean-reversion are fighting the trend, and the forced unwinding of those positions is what sustains the explosive move. The appropriate strategy is to run with the trend, size positions modestly because the exit timing is uncertain, and monitor SADF for the first sign that the expanding-window set {ADF_{t0,t}} is collapsing, which signals the bubble bursting.
The unit-root regime (β ≈ 0, SADF near zero) is the most difficult to trade. The process is genuinely unpredictable under the ADF specification: past prices carry no information about future direction. This is the regime where random entry with tight stops performs as well as any signal-based strategy, and where transaction costs dominate. The correct response is to reduce position size and wait for the regime to resolve in one direction.
This framework gives structural break statistics a concrete role in the feature matrix that goes beyond a simple "is there a break?" flag. Each SADF value at each bar is a continuous score of explosiveness; the three-way classification into steady, unit-root, and explosive regimes is a downstream discretization applied after the fact. Including raw SADF, SDFC break-point location, and CSW excess-over-critical-value as features lets the ML model learn regime structure from labeled data instead of relying on a fixed threshold.

Figure 3. 2-panel illustration of SADF regime zones mapped to strategy selection on synthetic data
- Panel (a): The synthetic price series with a background colour indicating the regime at each bar. Orange shading marks periods where SADF > 1.5 (explosive); blue marks SADF < −0.5 (steady/mean-reverting); grey marks the unit-root neutral zone.
- Panel (b): SADF_t with threshold lines and strategy labels. The orange "trend / event strategy" annotations mark the two explosive episodes. The blue "mean-reversion strategy" annotation marks the early steady phase. The grey "neutral / reduce exposure" label marks the period between the two explosions where SADF was near zero.
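The shading rule described for Figure 3 can be written as a small discretisation step. The thresholds 1.5 and −0.5 are the ones used in the figure; the function name classify_regime and the label strings are illustrative choices, and the thresholds should be treated as tunable, not as calibrated critical values.

```python
import numpy as np

def classify_regime(sadf, upper=1.5, lower=-0.5):
    """Map SADF values to 'explosive', 'steady', or 'unit_root' labels."""
    sadf = np.asarray(sadf, dtype=float)
    return np.where(
        sadf > upper,
        "explosive",
        np.where(sadf < lower, "steady", "unit_root"),
    )
```

NaN inputs, such as the warm-up bars before min_length observations are available, fail both comparisons and fall into the neutral label. As noted above, feeding the raw SADF value to the model alongside or instead of this label avoids committing to fixed thresholds.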
Results on Synthetic Data
The figure below applies the full test battery to 300 synthetic bars with two injected explosive episodes: a rally in bars 80–120 and a sell-off in bars 190–210. Both episodes use a drift magnitude far exceeding the baseline noise level.

Figure 4. 4-panel illustration of structural break test statistics on 300 synthetic bars with two injected explosive episodes (orange shading)
- Panel (a): The synthetic price series. The first explosive episode produces a price rally from approximately 160 to 300; the second produces a rapid reversal.
- Panel (b): The CSW one-sided statistic (solid blue) versus the time-dependent critical value c_α[n,t] (dashed orange). The statistic clearly exceeds the critical value during both explosive episodes. The critical value rises slowly as the span (t − n) of the best reference level grows; this property means the test becomes harder to reject late in the sample, which is a deliberate design choice to prevent false alarms from data accumulation.
- Panel (c): The Chow DFC_{τ*} statistic. The SDFC supremum (gold marker) is found at bar 190 — the onset of the sell-off — because the sign change from positive to negative drift produces the largest absolute t-statistic. This illustrates both the strength and the limitation of the Chow approach: it correctly identifies the most extreme break point, but a single SDFC value at T cannot distinguish between the two separate episodes.
- Panel (d): SADF_t with the 1.5 threshold. The statistic rises into the explosive zone during both episodes and returns to negative values between them. The second spike (sell-off) is sharper because the sell-off velocity was higher than the rally velocity in the synthetic generator.
A result worth examining explicitly: the CSW statistic in panel (b) remains elevated for several bars after the end of the first explosive episode. This persistence is expected because the CSW reference level y_n is set to the lowest historical price in the expanding backward scan. After a large rally, the best reference level is still the pre-rally trough, so the departure y_t − y_n remains large even after the rally has stopped. For a feature pipeline, this persistence is informative: the test is reporting that the cumulative departure from the pre-regime level is still statistically significant, even if the current direction has reversed. That is a different piece of information from what SADF reports, which is why including both in the feature matrix adds value.
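A series with the same layout can be generated with a few lines of NumPy. This is a sketch and not the generator behind the figures: the function name, the starting price, the noise level, and the two drift magnitudes are all assumptions chosen so that the rally and sell-off dominate the baseline noise.

```python
import numpy as np

def make_synthetic_log_prices(n=300, start_price=160.0, noise=0.005,
                              rally=(80, 120, 0.015),
                              selloff=(190, 210, -0.03), seed=0):
    """Log prices from Gaussian returns with two injected drift episodes.

    Each episode is (first_bar, last_bar, per-bar drift), inclusive on both ends.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, noise, n)
    for first, last, drift in (rally, selloff):
        returns[first : last + 1] += drift
    return np.log(start_price) + np.cumsum(returns)
```

Wrapping the result in a pd.Series with a date index gives an input of the kind the test functions in this article are called with, which makes it straightforward to check that each statistic reacts inside the injected windows and stays quiet elsewhere.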
Conclusion
This article implemented the structural break test suite from AFML Chapter 17 in afml.structural_breaks. The CUSUM family is represented by the Chu-Stinchcombe-White test on levels. The explosiveness family provides the Chow-Type DFC, SADF across six regression models, QADF, and CADF. All implementations are backed by a 38-test suite covering shape, mathematical properties, regression values, edge cases, and numerical agreement with naive baselines.
The book's Python snippets are mathematically correct but computationally unusable on realistic series lengths. The three concrete problems — O(T²) Pandas label lookups in CSW, repeated DataFrame construction in the SADF inner loop, and a silent argument-order convention — each require separate fixes. The optimized implementation achieves 32–50× speedups on the innermost loops by switching to precomputed NumPy arrays and a @njit-compiled OLS kernel.
The production window selection problem — whether to use an expanding or rolling window for SADF — depends on the bubble duration being targeted. Daily series with a two-year lookback (L = 504) bound the computation to O(L²) per bar while retaining sensitivity to medium-run regime transitions.
The next article in this series will port the structural break test kernels to MQL5, implementing the CSW and SADF statistics as a reusable include file in MQL5/Include/BlueprintQuant/ for use in Expert Advisors.
References
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 17.
- Phillips, P., Wu, Y., & Yu, J. (2011). Explosive behavior in the 1990s Nasdaq: When did exuberance escalate asset values? International Economic Review, 52, 201–226.
- Homm, U., & Breitung, J. (2012). Testing for speculative bubbles in stock markets: A comparison of alternative methods. Journal of Financial Econometrics, 10(1), 198–231.
- Andrews, D. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61(4), 821–856.
- Phillips, P., Shi, S., & Yu, J. (2013). Testing for multiple bubbles 1: Historical episodes of exuberance and collapse in the S&P 500. Working paper, Singapore Management University.
- Chow, G. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28(3), 591–605.