Time Series Econometrics and GARCH Volatility Models in Algorithmic Trading (Part 1)
1.1 Foundations of Time Series Analysis
The modern theory of time series analysis traces its origins to the work of Yule (1927), who introduced the autoregressive model, and Slutsky (1937), who demonstrated that moving average processes could generate apparently cyclical behavior from random shocks. The synthesis of these ideas by Box and Jenkins (1970) into the ARIMA modeling framework established the standard methodology for time series identification, estimation, and forecasting that remains widely used today. The Box-Jenkins approach prescribes a three-stage iterative procedure: (i) model identification through examination of the autocorrelation function (ACF) and partial autocorrelation function (PACF); (ii) parameter estimation via maximum likelihood or conditional least squares; and (iii) diagnostic checking through residual analysis.
In the context of financial return series, Hamilton (1994) provided a comprehensive treatment of time series methods, while Tsay (2010) specialized the framework for financial applications. A critical insight from this literature is that financial returns typically exhibit weak serial dependence in levels but strong dependence in squared returns—a phenomenon known as volatility clustering. This observation motivated the development of conditional heteroskedasticity models, as traditional ARMA models assume a constant unconditional variance.
1.2 ARCH and GARCH Models: Development and Extensions
Engle's (1982) ARCH model was the first to formally parameterize time-varying conditional variance as a function of past squared innovations. The GARCH(1,1) generalization by Bollerslev (1986) introduced lagged conditional variance terms, yielding a more parsimonious specification that captured long-memory effects in volatility. Nelson (1991) proposed the Exponential GARCH (EGARCH) model to address the leverage effect—the asymmetric volatility response to positive and negative return shocks—while Glosten, Jagannathan, and Runkle (1993) developed the GJR-GARCH threshold model for the same purpose. Zakoian (1994) introduced the Threshold GARCH (TGARCH) model with an alternative asymmetric specification.
The empirical adequacy of GARCH models has been extensively studied. Hansen and Lunde (2005) compared 330 ARCH-type models for exchange rate and equity return volatility and found that the simple GARCH(1,1) is remarkably difficult to outperform. Andersen and Bollerslev (1998) reconciled the apparent poor performance of GARCH models in predicting realized volatility by demonstrating that much of the forecast evaluation literature used noisy proxies for the true latent variance process. Subsequent work by Andersen, Bollerslev, Diebold, and Labys (2003) introduced realized volatility estimators based on high-frequency data, providing more accurate benchmarks for evaluating conditional variance models.
1.3 Time Series Models in Algorithmic Trading
The application of time series models to trading strategy development has a substantial history. Chan (2009) provided a practitioner-oriented overview of statistical arbitrage strategies based on mean-reverting time series processes. Avellaneda and Lee (2010) developed pairs trading strategies grounded in cointegration theory, an extension of univariate time series methods to multivariate settings. More recently, the integration of machine learning with classical time series methods has attracted considerable attention (Dixon, Halperin, and Bilokon, 2020), though the fundamental building blocks remain the ARMA and GARCH specifications.
Several studies have specifically examined GARCH-based trading strategies. Engle and Sokalska (2012) proposed intraday volatility-based trading rules. Alexander and Lazar (2006) evaluated GARCH-based Value-at-Risk models for position sizing in commodity trading. Brownlees, Engle, and Kelly (2011) developed a practical framework for volatility forecasting in risk management applications. Despite this body of work, a rigorous academic treatment that bridges the gap between GARCH theory and full-stack algorithmic trading system design remains conspicuously absent from the literature.
1.4 Volatility Forecasting and Economic Value
The question of whether superior volatility forecasts translate into economic value has been addressed from several angles. Fleming, Kirby, and Ostdiek (2001, 2003) demonstrated that volatility timing strategies based on conditional variance forecasts can generate substantial economic gains for mean-variance investors, even after accounting for transaction costs. West, Edison, and Cho (1993) established the theoretical conditions under which improved volatility forecasts yield higher expected utility for portfolio investors. Engle and Colacito (2006) developed an evaluation framework for variance and covariance forecasts that maps directly to the economic loss of a mean-variance investor. Our research builds upon this tradition by embedding GARCH volatility forecasts within a complete algorithmic trading architecture and evaluating economic performance through strategy-level metrics including Sharpe ratio, maximum drawdown, and Calmar ratio.
2.1 Stochastic Processes and Stationarity
Let (Ω, ℱ, P) denote a probability space. A discrete-time stochastic process {Xₜ}ₜ∈ℤ is a collection of random variables defined on this space, indexed by integer-valued time. We adopt the following definition of covariance stationarity.
Definition 3.1 (Covariance Stationarity). A stochastic process {Xₜ} is covariance stationary (weakly stationary) if (i) E[Xₜ] = μ for all t; (ii) Var(Xₜ) = γ(0) < ∞ for all t; and (iii) Cov(Xₜ, Xₜ₋ₖ) = γ(k) depends only on the lag k and not on t.
For financial applications, we define the log-return process rₜ = ln(Pₜ/Pₜ₋₁), where Pₜ denotes the asset price at time t. Under standard regularity conditions, the return process rₜ is typically (at least approximately) covariance stationary, even when the price process Pₜ is non-stationary (integrated of order one). This observation motivates the common practice of modeling returns rather than price levels.
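As a concrete illustration, log returns can be computed from a price series in a few lines; the price values below are hypothetical.

```python
import numpy as np

# Hypothetical daily closing prices
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9])

# Log returns: r_t = ln(P_t / P_{t-1})
log_returns = np.log(prices[1:] / prices[:-1])

# Log returns aggregate additively: their sum over the window
# equals the log of the total gross return over the window
total = np.log(prices[-1] / prices[0])
print(log_returns)
print(log_returns.sum(), total)
```

The additivity property is one practical reason log returns, rather than simple returns, are the standard modeling object.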
2.2 Autoregressive and Moving Average Processes
Definition 3.2 (AR(p) Process). A stochastic process {rₜ} follows an autoregressive process of order p, denoted AR(p), if rₜ = c + ϕ₁rₜ₋₁ + ϕ₂rₜ₋₂ + … + ϕₚrₜ₋ₚ + εₜ, where c is a constant, ϕ₁, …, ϕₚ are autoregressive coefficients, and {εₜ} is a white noise process with E[εₜ] = 0 and Var(εₜ) = σ².
The AR(p) process is covariance stationary if and only if all roots of the characteristic polynomial Φ(z) = 1 − ϕ₁z − ϕ₂z² − … − ϕₚzᵖ lie outside the unit circle. Under stationarity, the process admits the infinite moving average (Wold) representation rₜ = μ + ∑ⱼ₌₀^∞ ψⱼεₜ₋ⱼ, where the coefficients ψⱼ are absolutely summable.
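The root condition is easy to check numerically. The sketch below uses NumPy's polynomial root finder to test stationarity for given AR coefficients; the example coefficient values are illustrative.

```python
import numpy as np

def is_stationary(phi):
    """Check covariance stationarity of an AR(p) process by testing
    whether all roots of Phi(z) = 1 - phi_1 z - ... - phi_p z^p
    lie outside the unit circle."""
    # np.roots expects coefficients ordered from highest degree to constant
    coeffs = np.concatenate(([-c for c in phi[::-1]], [1.0]))
    roots = np.roots(coeffs)
    return np.all(np.abs(roots) > 1.0)

print(is_stationary([0.5]))        # AR(1) with |phi| < 1 -> True
print(is_stationary([1.2]))        # explosive AR(1) -> False
print(is_stationary([0.5, 0.3]))   # a stationary AR(2) -> True
```

For the AR(1) case the check reduces to |ϕ₁| < 1, since the single root of 1 − ϕ₁z is 1/ϕ₁.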
Definition 3.3 (MA(q) Process). A stochastic process {rₜ} follows a moving average process of order q, denoted MA(q), if rₜ = c + εₜ + θ₁εₜ₋₁ + θ₂εₜ₋₂ + … + θ_q ε_{t−q}, where θ₁, …, θ_q are moving average coefficients and {εₜ} is a white noise process as in Definition 3.2.
An MA(q) process is always covariance stationary regardless of the parameter values. However, for the process to be invertible—a requirement for unique identification—all roots of the MA characteristic polynomial Θ(z) = 1 + θ₁z + θ₂z² + … + θ_q z^q must lie outside the unit circle.
2.3 ARIMA Processes
Definition 3.4 (ARIMA(p,d,q) Process). A stochastic process {Xₜ} follows an autoregressive integrated moving average process of orders (p, d, q) if the d-th difference (1 − L)ᵈ Xₜ is a stationary and invertible ARMA(p,q) process, where L denotes the lag operator.
For financial return series, d = 0 is the typical case (since returns are generally stationary), reducing the ARIMA specification to an ARMA model. However, the ARIMA framework is relevant when modeling integrated variables such as cumulative returns, asset prices, or interest rate levels that require differencing to achieve stationarity.
2.4 The ARCH Model
Engle (1982) introduced the ARCH model to formalize the observation that the conditional variance of financial returns varies over time in a predictable manner. Consider the return process decomposition rₜ = μₜ + εₜ, where μₜ = E[rₜ | ℱₜ₋₁] is the conditional mean and εₜ is the innovation.
Definition 3.5 (ARCH(m) Process). The innovation process εₜ follows an ARCH(m) process if εₜ = σₜ zₜ, where zₜ ~ i.i.d.(0,1), and the conditional variance is given by σ²ₜ = α₀ + α₁ε²ₜ₋₁ + α₂ε²ₜ₋₂ + … + αₘε²ₜ₋ₘ, where α₀ > 0 and αᵢ ≥ 0 for i = 1, …, m.
The positivity constraints on the αᵢ parameters ensure that the conditional variance is non-negative. The unconditional variance of εₜ is given by Var(εₜ) = α₀ / (1 − ∑ᵢ₌₁ᵐ αᵢ), which exists and is finite if and only if ∑ᵢ₌₁ᵐ αᵢ < 1. Even with Gaussian zₜ, the unconditional kurtosis of εₜ exceeds 3 (the Gaussian value), so the model generates the fat-tailed return distributions observed in empirical data.
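A short simulation illustrates both properties. The parameter values below are illustrative, and the kurtosis comparison assumes Gaussian zₜ.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_arch1(alpha0, alpha1, n, burn=500):
    """Simulate an ARCH(1) process: eps_t = sigma_t * z_t with
    sigma2_t = alpha0 + alpha1 * eps_{t-1}^2 and Gaussian z_t."""
    eps = np.zeros(n + burn)
    for t in range(1, n + burn):
        sigma2 = alpha0 + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2) * rng.standard_normal()
    return eps[burn:]   # discard burn-in to reduce initialization effects

eps = simulate_arch1(alpha0=0.1, alpha1=0.5, n=100_000)

# Unconditional variance: alpha0 / (1 - alpha1) = 0.2
print(eps.var())

# Sample kurtosis exceeds the Gaussian value of 3
kurt = np.mean(eps**4) / eps.var()**2
print(kurt)
```

For α₁ = 0.5 the theoretical kurtosis is 3(1 − α₁²)/(1 − 3α₁²) = 9, so the sample value should lie well above 3.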
2.5 The GARCH Model
Definition 3.6 (GARCH(p,q) Process). The innovation process εₜ follows a GARCH(p,q) process if εₜ = σₜ zₜ, where zₜ ~ i.i.d.(0,1), and the conditional variance satisfies σ²ₜ = ω + ∑ᵢ₌₁ᵖ αᵢε²ₜ₋ᵢ + ∑ⱼ₌₁^q βⱼσ²ₜ₋ⱼ, where ω > 0, αᵢ ≥ 0, and βⱼ ≥ 0.
The GARCH(1,1) specification, σ²ₜ = ω + αε²ₜ₋₁ + βσ²ₜ₋₁, is the workhorse model in financial econometrics due to its parsimony and empirical adequacy. The persistence of volatility shocks is governed by the sum α + β, where values close to unity indicate high persistence (the Integrated GARCH or IGARCH case). The unconditional variance equals ω/(1 − α − β) when α + β < 1.
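The persistence and volatility-clustering properties can be seen in a minimal GARCH(1,1) simulation; the parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_garch11(omega, alpha, beta, n, burn=1000):
    """Simulate GARCH(1,1): sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}."""
    eps = np.zeros(n + burn)
    sigma2 = omega / (1.0 - alpha - beta)   # start at the long-run variance
    for t in range(1, n + burn):
        sigma2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2
        eps[t] = np.sqrt(sigma2) * rng.standard_normal()
    return eps[burn:]

omega, alpha, beta = 0.05, 0.08, 0.90     # persistence alpha + beta = 0.98
eps = simulate_garch11(omega, alpha, beta, n=200_000)

long_run = omega / (1.0 - alpha - beta)   # unconditional variance = 2.5
print(eps.var(), long_run)

# Volatility clustering: returns are near-uncorrelated in levels,
# but squared returns are positively autocorrelated
sq = eps**2
acf1_sq = np.corrcoef(sq[:-1], sq[1:])[0, 1]
print(acf1_sq)
```

The first-order autocorrelation of the squared series is clearly positive, which is exactly the volatility-clustering signature that motivates the model.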
Theorem 3.1 (Stationarity of GARCH(1,1)). The GARCH(1,1) process εₜ = σₜ zₜ with σ²ₜ = ω + αε²ₜ₋₁ + βσ²ₜ₋₁ is strictly stationary and ergodic if and only if E[ln(αz²ₜ + β)] < 0. The process is covariance stationary if and only if α + β < 1.
Proof. The strict stationarity condition follows from the theory of random recurrence equations (Bougerol and Picard, 1992). The GARCH(1,1) conditional variance can be written as σ²ₜ = ω + (αz²ₜ₋₁ + β)σ²ₜ₋₁, which is a stochastic recurrence of the form Yₜ = Aₜ Yₜ₋₁ + Bₜ with Aₜ = αz²ₜ₋₁ + β and Bₜ = ω. By the theorem of Bougerol and Picard, a unique strictly stationary solution exists if and only if the top Lyapunov exponent is negative, i.e., E[ln Aₜ] = E[ln(αz²ₜ + β)] < 0, where the two expectations coincide because the zₜ are i.i.d. The covariance stationarity condition follows by taking unconditional expectations: E[σ²ₜ] = ω + (α + β)E[σ²ₜ₋₁], yielding E[σ²ₜ] = ω/(1 − α − β) provided α + β < 1. □
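The Lyapunov condition in Theorem 3.1 is straightforward to check by Monte Carlo. The sketch below assumes Gaussian zₜ and illustrative parameter values; it also shows that the IGARCH boundary α + β = 1 still satisfies strict stationarity.

```python
import numpy as np

rng = np.random.default_rng(7)

def lyapunov_exponent(alpha, beta, n_draws=1_000_000):
    """Monte Carlo estimate of E[ln(alpha * z^2 + beta)] for Gaussian z.
    A negative value implies strict stationarity of the GARCH(1,1) process
    (Bougerol and Picard, 1992)."""
    z = rng.standard_normal(n_draws)
    return np.mean(np.log(alpha * z**2 + beta))

# Covariance stationary case: alpha + beta < 1
lyap_cov = lyapunov_exponent(0.08, 0.90)
print(lyap_cov)

# IGARCH case: alpha + beta = 1 is not covariance stationary, yet remains
# strictly stationary, since E[ln(alpha z^2 + beta)] < ln(E[alpha z^2 + beta]) = 0
# by Jensen's inequality
lyap_igarch = lyapunov_exponent(0.10, 0.90)
print(lyap_igarch)
```

Both estimates come out negative, illustrating that strict stationarity is a strictly weaker requirement than covariance stationarity for GARCH processes.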
2.6 Asymmetric GARCH Extensions
2.6.1 The EGARCH Model
The Exponential GARCH model of Nelson (1991) specifies the logarithm of the conditional variance as ln(σ²ₜ) = ω + α(|zₜ₋₁| − E[|zₜ₋₁|]) + γzₜ₋₁ + β ln(σ²ₜ₋₁). The leverage effect is captured by the parameter γ: when γ < 0, negative return shocks increase volatility more than positive shocks of equal magnitude. A key advantage of the EGARCH specification is that no non-negativity constraints are required on the parameters, since the exponential function ensures σ²ₜ > 0 automatically.
2.6.2 The GJR-GARCH Model
The GJR-GARCH model (Glosten, Jagannathan, and Runkle, 1993) augments the standard GARCH specification with an indicator function: σ²ₜ = ω + αε²ₜ₋₁ + γε²ₜ₋₁ · I(εₜ₋₁ < 0) + βσ²ₜ₋₁, where I(·) is the indicator function. The parameter γ captures the differential impact of negative shocks. The news impact curve for the GJR-GARCH model is piecewise linear, with slope α for positive innovations and slope α + γ for negative innovations.
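The piecewise-linear news impact curve can be traced out directly. The sketch below uses illustrative parameter values and holds the lagged variance at its unconditional level, which for symmetric zₜ equals ω/(1 − α − γ/2 − β).

```python
def gjr_news_impact(eps, omega, alpha, gamma, beta, sigma2_bar):
    """News impact curve of GJR-GARCH: next-period conditional variance
    as a function of today's shock, holding lagged variance fixed."""
    indicator = 1.0 if eps < 0 else 0.0
    return omega + (alpha + gamma * indicator) * eps**2 + beta * sigma2_bar

# Illustrative parameters
omega, alpha, gamma, beta = 0.05, 0.03, 0.10, 0.90

# Unconditional variance under symmetric innovations:
# E[eps^2 I(eps<0)] = sigma2/2, so persistence is alpha + gamma/2 + beta
sigma2_bar = omega / (1 - alpha - gamma / 2 - beta)

neg = gjr_news_impact(-1.0, omega, alpha, gamma, beta, sigma2_bar)
pos = gjr_news_impact(+1.0, omega, alpha, gamma, beta, sigma2_bar)
print(neg, pos)
print(neg - pos)   # difference equals gamma * eps^2 = 0.10 (up to rounding)
```

A shock of −1 raises next-period variance by γ·ε² = 0.10 more than a shock of +1, which is precisely the slope asymmetry described above.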
2.7 Innovation Distributions
While the standard GARCH specification assumes Gaussian innovations zₜ ~ N(0,1), empirical evidence strongly favors fat-tailed distributions. The Student-t distribution with ν degrees of freedom provides a natural extension, with the density f(z; ν) = [Γ((ν+1)/2) / (√(π(ν−2)) Γ(ν/2))] · (1 + z²/(ν−2))⁻⁽ᵛ⁺¹⁾ᐟ², defined for ν > 2. As ν → ∞, the Student-t converges to the Gaussian. An alternative is the Generalized Error Distribution (GED), which nests the Gaussian (ν = 2) and allows for both thinner and thicker tails.
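The standardized Student-t density above can be verified numerically. This sketch checks, for an illustrative ν = 8, that the density integrates to one and has unit variance (the "standardized" property that makes it a valid GARCH innovation distribution).

```python
import numpy as np
from math import gamma, sqrt, pi

def std_t_pdf(z, nu):
    """Density of the standardized Student-t (unit variance), nu > 2."""
    c = gamma((nu + 1) / 2) / (sqrt(pi * (nu - 2)) * gamma(nu / 2))
    return c * (1.0 + z**2 / (nu - 2)) ** (-(nu + 1) / 2)

# Riemann-sum checks on a wide grid; for nu = 8 the tails
# beyond |z| = 50 contribute negligibly
z = np.linspace(-50.0, 50.0, 1_000_001)
dz = z[1] - z[0]
f = std_t_pdf(z, nu=8.0)

total = (f * dz).sum()
var = (z**2 * f * dz).sum()
print(total)   # approximately 1
print(var)     # approximately 1
```

The ν − 2 scaling inside the density is what enforces unit variance; the textbook Student-t with the same ν would instead have variance ν/(ν − 2).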
3. Estimation Methodology and Diagnostic Testing
3.1 Maximum Likelihood Estimation
Let θ = (ω, α₁, …, αₚ, β₁, …, β_q)ᵀ denote the parameter vector of a GARCH(p,q) model. Given a sample of T return observations {r₁, …, r_T}, the conditional log-likelihood function under Gaussian innovations is ℓ(θ) = −(T/2)ln(2π) − (1/2)∑ₜ₌₁ᵀ [ln(σ²ₜ) + ε²ₜ/σ²ₜ], where εₜ = rₜ − μₜ and σ²ₜ is specified by the GARCH equation. The maximum likelihood estimator (MLE) θ̂ = argmax_θ ℓ(θ) is obtained via numerical optimization, typically using the BFGS or Marquardt algorithms.
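A minimal sketch of this estimation procedure for a zero-mean GARCH(1,1), using SciPy's L-BFGS-B optimizer in place of the algorithms named above, might look as follows; the true parameters, starting values, and variance initialization are all illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, r):
    """Negative Gaussian log-likelihood of a zero-mean GARCH(1,1)."""
    omega, alpha, beta = theta
    T = len(r)
    sigma2 = np.empty(T)
    sigma2[0] = r.var()                  # a common initialization choice
    for t in range(1, T):
        sigma2[t] = omega + alpha * r[t - 1]**2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + r**2 / sigma2)

# Simulate from known parameters, then recover them by QMLE
rng = np.random.default_rng(3)
omega0, alpha0, beta0 = 0.1, 0.1, 0.8
T = 20_000
r = np.zeros(T)
s2 = omega0 / (1 - alpha0 - beta0)
for t in range(1, T):
    s2 = omega0 + alpha0 * r[t - 1]**2 + beta0 * s2
    r[t] = np.sqrt(s2) * rng.standard_normal()

res = minimize(neg_loglik, x0=[0.05, 0.05, 0.85], args=(r,),
               method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, 1.0), (1e-6, 1.0)])
print(res.x)   # estimates close to (0.1, 0.1, 0.8)
```

The bounds enforce the positivity constraints of Definition 3.6; production implementations typically also impose α + β < 1 or reparameterize to guarantee it.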
Theorem 4.1 (Consistency and Asymptotic Normality of GARCH MLE). Under Assumptions A1–A5 (stated in Appendix A), the QMLE θ̂ satisfies: (i) θ̂ → θ₀ in probability as T → ∞ (consistency); and (ii) √T(θ̂ − θ₀) →ᵈ N(0, V), where V = A⁻¹BA⁻¹ is the sandwich covariance matrix, A = −E[∂²ℓₜ/∂θ∂θᵀ] is the expected negative Hessian of the per-observation log-likelihood, and B = E[(∂ℓₜ/∂θ)(∂ℓₜ/∂θ)ᵀ] is the outer product of gradients.
The sandwich form of the asymptotic covariance matrix V is essential because, under distributional misspecification (e.g., assuming Gaussian innovations when the true distribution is Student-t), the information matrix equality A = B does not hold, and the standard inverse-Hessian covariance estimator is inconsistent. The robust sandwich estimator remains consistent under quasi-maximum likelihood estimation, providing valid inference even when the innovation distribution is misspecified.
3.2 Model Selection Criteria
We employ three information criteria for model selection: the Akaike Information Criterion AIC = −2ℓ(θ̂) + 2k, the Bayesian Information Criterion BIC = −2ℓ(θ̂) + k·ln(T), and the Hannan-Quinn Criterion HQC = −2ℓ(θ̂) + 2k·ln(ln(T)), where k is the number of estimated parameters. The BIC is consistent (selects the true model with probability approaching one as T → ∞), while the AIC tends to overfit in finite samples. In practice, we report all three criteria and select the specification that achieves the best out-of-sample forecasting performance.
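Computing the three criteria from a fitted model is mechanical; the log-likelihood values, parameter counts, and sample size in this sketch are hypothetical.

```python
import numpy as np

def information_criteria(loglik, k, T):
    """AIC, BIC, and HQC from a maximized log-likelihood value,
    k estimated parameters, and T observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * np.log(T)
    hqc = -2.0 * loglik + 2.0 * k * np.log(np.log(T))
    return aic, bic, hqc

# Hypothetical fits on T = 2500 daily returns:
# GARCH(1,1) with k = 4 (mu, omega, alpha, beta) vs. GARCH(2,2) with k = 6
aic1, bic1, hqc1 = information_criteria(loglik=-3210.5, k=4, T=2500)
aic2, bic2, hqc2 = information_criteria(loglik=-3208.9, k=6, T=2500)

print(aic1, bic1, hqc1)
print(aic2, bic2, hqc2)
print(bic1 < bic2)   # BIC penalizes the two extra parameters more heavily
```

In this example the small likelihood gain of the larger model does not justify its extra parameters under any of the three criteria, consistent with BIC's stronger penalty ln(T) > 2 for T ≥ 8.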
3.3 Diagnostic Testing
3.3.1 Tests for Serial Correlation
The Ljung-Box Q-statistic tests for residual serial correlation: Q(m) = T(T+2) ∑ₖ₌₁ᵐ ρ̂²ₖ/(T−k), where ρ̂ₖ is the sample autocorrelation of standardized residuals at lag k. Under the null hypothesis of no serial correlation, Q(m) ~ χ²(m) asymptotically. We apply this test to both the standardized residuals ẑₜ = ε̂ₜ/σ̂ₜ and the squared standardized residuals ẑ²ₜ.
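A direct implementation of the Q-statistic, illustrated on simulated series with and without serial correlation:

```python
import numpy as np

def ljung_box(x, m):
    """Ljung-Box Q(m) statistic for a series x (e.g. standardized residuals)."""
    T = len(x)
    xc = x - x.mean()
    denom = np.sum(xc**2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k**2 / (T - k)
    return T * (T + 2) * q

rng = np.random.default_rng(1)

# i.i.d. noise: Q(10) should look like a chi2(10) draw -> no rejection
z = rng.standard_normal(5000)
q_noise = ljung_box(z, m=10)
print(q_noise)

# Strongly autocorrelated AR(1) series: Q(10) is huge -> clear rejection
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.6 * ar[t - 1] + rng.standard_normal()
q_ar = ljung_box(ar, m=10)
print(q_ar)
```

When the test is applied to residuals of a fitted ARMA model, the degrees of freedom are conventionally reduced by the number of estimated ARMA parameters.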
3.3.2 ARCH-LM Test
Engle's (1982) Lagrange Multiplier test for ARCH effects regresses ẑ²ₜ on a constant and m lags: ẑ²ₜ = α₀ + α₁ẑ²ₜ₋₁ + … + αₘẑ²ₜ₋ₘ + uₜ. The test statistic T·R² from this auxiliary regression follows a χ²(m) distribution under the null hypothesis of no remaining ARCH effects. Rejection of the null on raw return residuals motivates GARCH modeling; failure to reject on GARCH-filtered residuals validates the volatility specification.
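The auxiliary regression can be run with ordinary least squares; this is a sketch on simulated data with and without ARCH effects.

```python
import numpy as np

def arch_lm(resid, m):
    """Engle's ARCH-LM statistic: T * R^2 from regressing squared
    residuals on a constant and m of their own lags."""
    e2 = resid**2
    y = e2[m:]
    # Design matrix: constant plus lags 1..m of the squared residuals
    X = np.column_stack([np.ones(len(y))] +
                        [e2[m - j:-j] for j in range(1, m + 1)])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    r2 = 1.0 - np.sum((y - fitted)**2) / np.sum((y - y.mean())**2)
    return len(y) * r2

rng = np.random.default_rng(2)

# i.i.d. Gaussian noise: no ARCH effects, statistic behaves like chi2(5)
lm_noise = arch_lm(rng.standard_normal(5000), m=5)
print(lm_noise)

# Simulated ARCH(1) series: strong ARCH effects, statistic is very large
eps = np.zeros(5000)
for t in range(1, 5000):
    eps[t] = np.sqrt(0.1 + 0.5 * eps[t - 1]**2) * rng.standard_normal()
lm_arch = arch_lm(eps, m=5)
print(lm_arch)
```

The same function applied to GARCH-filtered standardized residuals should produce a small statistic if the volatility specification is adequate.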
3.3.3 Distributional Tests
We assess the distributional assumptions on the standardized residuals using the Jarque-Bera test for normality, the Kolmogorov-Smirnov test for goodness-of-fit to the hypothesized innovation distribution, and probability integral transform (PIT) histograms. If the model is correctly specified, the PIT values uₜ = F(ẑₜ; θ̂) should be approximately uniformly distributed on [0,1].
3.4 Volatility Forecasting
For a GARCH(1,1) model, the h-step-ahead conditional variance forecast from time T is given by the recursion σ²ₜ₊ₕ|ₜ = ω + (α + β)σ²ₜ₊ₕ₋₁|ₜ for h ≥ 2, with initial condition σ²ₜ₊₁|ₜ = ω + αε²ₜ + βσ²ₜ. The forecast converges to the unconditional variance at a geometric rate: σ²ₜ₊ₕ|ₜ = σ̄² + (α + β)ʰ⁻¹(σ²ₜ₊₁|ₜ − σ̄²), where σ̄² = ω/(1 − α − β). This mean-reversion property is central to the trading strategy: when the current conditional variance is elevated relative to the long-run level, we anticipate a subsequent decline in volatility, and vice versa.
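The recursion and its closed-form solution can be cross-checked in a few lines; the parameters and the starting state are illustrative.

```python
import numpy as np

def garch11_forecast(omega, alpha, beta, eps_T, sigma2_T, horizon):
    """h-step-ahead GARCH(1,1) conditional variance forecasts from time T."""
    forecasts = np.empty(horizon)
    forecasts[0] = omega + alpha * eps_T**2 + beta * sigma2_T   # h = 1
    for h in range(1, horizon):
        forecasts[h] = omega + (alpha + beta) * forecasts[h - 1]
    return forecasts

omega, alpha, beta = 0.05, 0.08, 0.90
long_run = omega / (1 - alpha - beta)   # unconditional variance = 2.5

# Start from an elevated variance state: forecasts decay toward 2.5
f = garch11_forecast(omega, alpha, beta, eps_T=3.0, sigma2_T=4.0, horizon=250)
print(f[0], f[-1], long_run)

# Closed-form check at h = 100:
# sigma2_{T+h|T} = lr + (alpha + beta)^(h-1) * (sigma2_{T+1|T} - lr)
h = 100
closed = long_run + (alpha + beta)**(h - 1) * (f[0] - long_run)
print(f[h - 1], closed)
```

The geometric decay rate (α + β)ʰ⁻¹ implies a volatility half-life of ln(0.5)/ln(α + β) steps (about 34 here), a quantity often used to calibrate the holding period of mean-reversion trades on variance.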


