Discussing the article: "MetaTrader 5 Machine Learning Blueprint (Part 11): Kelly Criterion, Prop Firm Integration, and CPCV Dynamic Backtesting"
Great article, really enjoyed it.
Curious though—why did you go with CSCV/PBO instead of DSR for handling selection bias?
Would be interested to hear your thinking here.
That’s an excellent question, and I’m glad you’re digging into the validation layer—it’s where most production systems quietly fail.
Short answer: For evaluating a single, dynamically sized, path-dependent strategy across varying historical contexts, CPCV + PBO was the more direct and appropriate diagnostic tool in this article (rather than DSR).
DSR’s Core Strength: Correcting for Multiple Testing
The Deflated Sharpe Ratio (DSR) is designed to address selection bias when you run many trials (such as thousands of hyperparameter combinations or model variants) and then select the best-performing one. It deflates the observed Sharpe ratio by accounting for the number of independent trials, the variance of the Sharpe ratios across those trials, the non-normality of returns (skewness and kurtosis), and the sample length, giving you a more realistic probability that the reported performance reflects genuine skill rather than luck.
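As a rough illustration of that deflation, here is a minimal sketch of the commonly cited Bailey and López de Prado form of the DSR, using only the Python standard library. The function and parameter names are hypothetical (they are not taken from the article's code), and every Sharpe ratio is assumed to be per-observation rather than annualized.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()        # standard normal distribution


def expected_max_sharpe(sr_variance: float, n_trials: int) -> float:
    """Approximate expected maximum Sharpe ratio among n_trials with no true edge.

    sr_variance is the variance of the Sharpe ratios observed across trials.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    return sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    )


def deflated_sharpe_ratio(
    sr: float,            # Sharpe ratio of the selected trial (per observation)
    sr_variance: float,   # variance of Sharpe ratios across all trials
    n_trials: int,        # number of (effectively independent) trials
    n_obs: int,           # number of return observations
    skew: float = 0.0,    # skewness of returns
    kurt: float = 3.0,    # kurtosis of returns (3.0 for a normal distribution)
) -> float:
    """Probability that the true Sharpe ratio exceeds the luck-only benchmark."""
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    spread = 1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    if spread <= 0:
        raise ValueError("inconsistent skew/kurt inputs")
    z = (sr - sr0) * sqrt(n_obs - 1) / sqrt(spread)
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Same observed Sharpe ratio, but selected from 10 versus 1,000 trials.
    print(deflated_sharpe_ratio(0.1, 0.01, 10, 1000))
    print(deflated_sharpe_ratio(0.1, 0.01, 1000, 1000))
```

With everything else held fixed, the second call returns a lower probability than the first, because the benchmark Sharpe ratio that luck alone could produce grows with the number of trials.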
The article’s primary focus was not hyperparameter optimization of the sizing logic. Instead, the pipeline was presented as a fixed, integrated system:
- Stage-1: get_signal() (confidence-aware with concurrency correction)
- Stage-2: Kelly multiplier (payoff-aware)
- Dynamic prop-firm w-calibration (budget-aware)
The central question was: “Is this particular sizer robust across different historical contexts and path realizations?” — not “Which of 5,000 sizers performs best?”
Why CPCV/PBO Fits a Stateful, Path-Dependent Strategy
A dynamically sized strategy is inherently stateful. Tuesday’s position size depends on Monday’s P&L, which in turn depends on all prior sizing decisions and the evolving account state (PropFirmAccountState). A single historical backtest is therefore just one random draw from the full distribution of possible equity paths.
- CPCV (Combinatorial Purged Cross-Validation) generates that distribution by executing the exact same logic over φ[N, k] combinatorial purged paths, each simulated bar-by-bar from a freshly initialized account state. This produces the equity-curve “fan” shown in Figure 5.
- PBO (Probability of Backtest Overfitting) then quantifies the overfitting risk on that distribution: “If I select this strategy based on its performance in one particular split/path, what is the probability that it will perform worse out-of-sample than the median across all paths?”
PBO is the natural companion to CPCV because it directly leverages the same combinatorial path structure to audit a single model/sizer, rather than correcting for selection bias across many competing models.
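To make the φ[N, k] notation concrete, here is a minimal sketch that counts CPCV splits and backtest paths, assuming the standard AFML definition φ[N, k] = (k / N) · C(N, k) for N groups with k test groups per split. The function names are hypothetical, and purging and embargo are deliberately left out.

```python
from itertools import combinations
from math import comb


def cpcv_counts(n_groups: int, k_test: int) -> tuple[int, int]:
    """Return (number of train/test splits, number of backtest paths)."""
    if not 0 < k_test < n_groups:
        raise ValueError("require 0 < k_test < n_groups")
    n_splits = comb(n_groups, k_test)
    # phi[N, k] = (k / N) * C(N, k); this is always an integer.
    n_paths = n_splits * k_test // n_groups
    return n_splits, n_paths


def cpcv_test_groups(n_groups: int, k_test: int) -> list[tuple[int, ...]]:
    """Enumerate which groups serve as the test set in each split."""
    return list(combinations(range(n_groups), k_test))


if __name__ == "__main__":
    splits, paths = cpcv_counts(6, 2)
    print(splits, paths)  # 15 splits, 5 paths
    # Each group is tested in exactly `paths` splits, which is why the
    # out-of-sample pieces can be stitched into that many full-length paths.
    for g in range(6):
        assert sum(g in s for s in cpcv_test_groups(6, 2)) == paths
```

With N = 6 and k = 2 this gives 15 splits and 5 paths, so the sizer is simulated over five complete out-of-sample equity curves instead of one.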
They Are Complementary, Not Competitors
I deliberately chose CPCV/PBO for this article because it matched the immediate validation task. In a full production pipeline that does involve large-scale hyperparameter optimization (as covered in earlier parts of the series), DSR remains the essential final gatekeeper after model selection.
A robust workflow would typically combine both tools:
- Use CPCV/PBO early to stress-test the integrated model + sizing framework for path-dependency and stability.
- Apply DSR at the end, after any HPO on signals or parameters, to guard against selection bias in the final reported metrics.
You’ve hit exactly the right nuance. If the workflow involved grid-searching kelly_fraction, safety_factor, or max_amplification over hundreds of points, DSR would become critical to avoid being fooled by the luckiest configuration. But for the concrete question (“Will this specific sizer blow up a FundedNext account in 20% of parallel universes?”), PBO computed on CPCV paths is the sharper, more targeted instrument.
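The “parallel universes” question can be put in code as a simple count over paths. The sketch below is only an illustration under stated assumptions: the helper name, the toy equity paths, and the 10% loss limit are all hypothetical, and the limit is measured against the initial balance, which may not match the exact rules applied in the article.

```python
def breach_fraction(
    paths: list[list[float]],
    initial_balance: float,
    max_loss_frac: float,
) -> float:
    """Fraction of equity paths that ever fall below the overall loss floor."""
    if not paths:
        raise ValueError("at least one path is required")
    floor = initial_balance * (1.0 - max_loss_frac)
    breached = sum(1 for path in paths if min(path) < floor)
    return breached / len(paths)


if __name__ == "__main__":
    # Five toy equity paths, each starting from a fresh balance of 100.
    toy_paths = [
        [100.0, 101.0, 95.0],
        [100.0, 89.0, 105.0],   # dips below the floor of 90: a breach
        [100.0, 102.0, 103.0],
        [100.0, 99.0, 91.0],
        [100.0, 93.0, 92.0],
    ]
    print(breach_fraction(toy_paths, initial_balance=100.0, max_loss_frac=0.10))
```

Note that the second path counts as a breach even though it ends higher than it started, which is the kind of path-dependent failure a single end-of-period statistic would hide.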
- 2026.04.06
- www.mql5.com
Thanks for taking the time to share your thoughts; I appreciate it. I went back and reread AFML more carefully, and together with your explanation it helped me understand this part much better.
Check out the new article: MetaTrader 5 Machine Learning Blueprint (Part 11): Kelly Criterion, Prop Firm Integration, and CPCV Dynamic Backtesting.
The bet-sizing signal from Part 10 is concurrency-corrected but carries no payoff-ratio adjustment, no response to a hard drawdown budget, and no validation across combinatorial paths. This article covers three additions: a two-stage architecture in which a Kelly payoff multiplier is applied on top of get_signal, preserving the concurrency correction while incorporating win/loss asymmetry; a prop firm integration layer that calibrates the sigmoid w parameter continuously from the remaining drawdown budget under FundedNext Stellar 2-Step rules; and a CPCV backtest framework that simulates a fresh account state across all φ[N, k] paths, producing a Sharpe distribution and a PBO audit.
You have a bet-sizing signal from the toolkit introduced in Part 10. The signal is confidence-aware, concurrency-corrected, and discretized. What it is not yet is payoff-aware, budget-constrained, or validated across combinatorial paths. Three concrete gaps remain.
First, the sizing methods in Part 10 treat wins and losses symmetrically. A strategy that wins three dollars for every one it loses warrants a fundamentally different allocation than a symmetric bet at the same probability, and get_signal has no mechanism to express this. Second, none of the AFML methods incorporate a hard drawdown limit. When a prop firm account has consumed 70% of its daily loss capacity, the same model signal should produce a much smaller position than it would at the start of the day, and a static sizing function cannot respond to that. Third, a single backtest of a dynamically-sized strategy is a misleading performance summary. The position sizes at each bar depend on the P&L history up to that bar, which depends on every prior sizing decision, so the result is as much a function of the specific historical path as of the strategy's genuine edge.
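The payoff asymmetry in the first gap can be seen with the textbook Kelly formula for a binary bet, f* = p − (1 − p) / b, where p is the win probability and b is the win/loss payoff ratio. The sketch below is a generic illustration with hypothetical names; it is not the article's Stage-2 multiplier, and flooring at zero is a choice made here for simplicity.

```python
def kelly_fraction(p_win: float, payoff_ratio: float) -> float:
    """Full-Kelly fraction for a binary bet: f* = p - (1 - p) / b.

    payoff_ratio (b) is the average win divided by the average loss.
    Negative-edge bets are floored at zero (no position).
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    return max(0.0, p_win - (1.0 - p_win) / payoff_ratio)


if __name__ == "__main__":
    # Same 55% win probability, different payoff structure.
    print(round(kelly_fraction(0.55, 1.0), 4))  # symmetric bet: about 0.10
    print(round(kelly_fraction(0.55, 3.0), 4))  # 3:1 payoff:    about 0.40
```

At the same 55% win probability, the 3:1 bet warrants roughly four times the allocation of the symmetric one, which is the information a purely probability-driven sizer has no way to express.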
All three gaps are addressed here. After reading, you will have:
- a precise account of when the Kelly criterion should replace or supplement get_signal, including the numerical crossover point at which the two methods diverge and the five structural conditions Kelly cannot satisfy in live financial markets;
- a two-stage hybrid architecture in which get_signal handles confidence-aware, concurrency-corrected signal sizing and a Kelly payoff multiplier applies asymmetric win/loss adjustment as a second stage, preserving the concurrency correction while adding what Kelly alone can express;
- a prop firm risk integration layer (PropFirmAwareSizer) in which the sigmoid w parameter is calibrated continuously from the remaining drawdown budget under the FundedNext Stellar 2-Step rules, so that as daily or overall loss capacity is consumed, the sizing function flattens automatically without threshold logic or manual override;
- a CPCV dynamic backtest framework that simulates a fresh account state bar-by-bar through each of the φ[N, k] combinatorial paths, producing a distribution of equity curves and a PBO audit rather than the single path-dependent result that standard backtesting provides.
Each component comes with practical limitations that the relevant sections make explicit.
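To show what "the sizing function flattens as budget is consumed" can mean in practice, here is a small sketch built on the AFML sigmoid m = x / sqrt(w + x²) and its calibration w = x² (m⁻² − 1). The mapping from remaining budget to target size (a linear scaling of the target at a reference input) is purely an assumption for illustration, as are all names and defaults; the article's PropFirmAwareSizer may calibrate w quite differently.

```python
from math import inf, sqrt


def bet_size_sigmoid(w: float, x: float) -> float:
    """AFML-style sigmoid bet size m = x / sqrt(w + x^2), for w > 0."""
    return x / sqrt(w + x * x)


def calibrate_w(x_ref: float, m_target: float) -> float:
    """Solve for w so that bet_size_sigmoid(w, x_ref) equals m_target."""
    if x_ref == 0.0 or not 0.0 < m_target < 1.0:
        raise ValueError("require x_ref != 0 and 0 < m_target < 1")
    return x_ref ** 2 * (m_target ** -2 - 1.0)


def budget_aware_w(
    remaining_frac: float,  # share of drawdown budget still unused, in [0, 1]
    x_ref: float = 1.0,     # hypothetical reference input to the sigmoid
    m_full: float = 0.95,   # hypothetical target size at x_ref with full budget
) -> float:
    """Assumed mapping: scale the target size linearly with remaining budget."""
    remaining_frac = min(max(remaining_frac, 0.0), 1.0)
    if remaining_frac == 0.0:
        return inf  # budget exhausted: the sigmoid collapses to zero size
    return calibrate_w(x_ref, m_full * remaining_frac)


if __name__ == "__main__":
    for frac in (1.0, 0.5, 0.3, 0.0):
        w = budget_aware_w(frac)
        print(frac, round(bet_size_sigmoid(w, 1.0), 4))
```

As the remaining budget shrinks, w grows and the same input produces a smaller position, with no explicit threshold logic; an exhausted budget maps to zero size.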
Author: Patrick Murimi Njoroge