Discussing the article: "Feature Engineering for ML (Part 1): Fractional Differentiation — Stationarity Without Memory Loss"
Check out the new article: Feature Engineering for ML (Part 1): Fractional Differentiation — Stationarity Without Memory Loss.
Integer differentiation forces a binary choice between stationarity and memory: returns (d=1) are stationary but discard all price-level information; raw prices (d=0) preserve memory but violate ML stationarity assumptions. We implement the fixed-width fractional differentiation (FFD) method from AFML Chapter 5, covering get_weights_ffd (iterative recurrence with threshold cutoff), frac_diff_ffd (bounded dot product per bar), and fracdiff_optimal (binary search for minimum stationary d*).
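The weight recurrence the abstract refers to can be sketched as follows. This is a minimal illustration of the `get_weights_ffd` idea described above, not the afml library's exact code; the `threshold` parameter name and the oldest-first return order are my assumptions.

```python
import numpy as np

def get_weights_ffd(d, threshold=1e-5):
    """Weights for fixed-width fractional differentiation.

    Iterative recurrence: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k,
    truncated once |w_k| falls below `threshold`. Returned oldest-first
    so the most recent observation carries weight 1.
    """
    w, k = [1.0], 1
    while True:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break  # all later weights are even smaller in magnitude
        w.append(w_k)
        k += 1
    return np.array(w[::-1])
```

For d = 1 the recurrence yields exactly [−1, 1], recovering ordinary first differences; for 0 < d < 1 the weights decay slowly, which is how the transform retains memory of distant observations.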
Every machine learning pipeline for financial time series faces a preprocessing decision that most practitioners make without thinking: how to transform raw prices into features. The standard answer is to compute returns as first differences of log-prices. Returns are stationary, which satisfies the assumptions of most ML algorithms. But returns have a critical flaw: they erase memory. A return series contains no information about what price level the asset has visited, how far it has drifted from its long-term mean, or how its current level relates to historical support and resistance zones. Each return relates the current price to exactly one predecessor, and everything before that predecessor is discarded.
This is not merely a theoretical concern. Equilibrium models need memory to assess how far a price process has drifted from its expected value. Mean-reversion strategies need memory to identify the mean. Trend-following strategies need memory to distinguish a true trend from noise. When integer differencing erases this memory, the ML algorithm must reconstruct it via feature engineering (lagged returns, rolling statistics, technical indicators). These features are imperfect proxies for information lost during preprocessing.
Fractional differentiation resolves this dilemma. Instead of differencing by an integer (0 for prices, 1 for returns), we difference by a real number d between 0 and 1. The result is a series that is stationary — satisfying ML assumptions — while preserving as much memory as possible from the original price series. López de Prado introduced this technique to the financial ML community in Chapter 5 of AFML, building on foundational work by Hosking (1981). This article develops the theory, explains the two implementation strategies (expanding window and fixed-width window), and walks through the production-grade Python implementation in the afml library. The next article in the series translates this engine into MQL5 for deployment on live MetaTrader 5 feeds.
Author: Patrick Murimi Njoroge