Discussing the article: "Machine Learning Blueprint (Part 4): The Hidden Flaw in Your Financial ML Pipeline — Label Concurrency"


Check out the new article: Machine Learning Blueprint (Part 4): The Hidden Flaw in Your Financial ML Pipeline — Label Concurrency.

Discover how to fix a critical flaw in financial machine learning that causes overfit models and poor live performance: label concurrency. When you label data with the triple-barrier method, the resulting training labels overlap in time, violating the IID assumption at the core of most ML algorithms. This article provides a hands-on solution through sample weighting. You will learn how to quantify the temporal overlap between trading signals, calculate sample weights that reflect each observation's unique information, and pass those weights to scikit-learn to build more robust classifiers. Mastering these techniques will make your trading models more reliable and, ultimately, more profitable.

Most non-financial ML researchers can assume that observations are drawn from IID (independent and identically distributed) processes. For example, you can obtain blood samples from a large number of patients and measure their cholesterol. Various underlying common factors will of course shift the mean and standard deviation of the cholesterol distribution, but the samples are still independent: there is one observation per subject. Now suppose someone in your laboratory spills blood from each tube into the nine tubes to its right. That is, tube 10 contains blood from patient 10, but also from patients 1 through 9; tube 11 contains blood from patient 11, but also from patients 2 through 10; and so on. You now need to determine the features predictive of high cholesterol (diet, exercise, age, etc.) without knowing for certain the cholesterol level of any individual patient. That is the equivalent challenge we face in financial ML, with the additional handicap that the spillage pattern is non-deterministic and unknown.

Models trained on concurrent observations often show inflated in-sample performance (because they're learning the same patterns multiple times) but poor out-of-sample performance (because the real frequency of those patterns is much lower than the model believes).
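To make the overlap concrete, you can count how many labels are "active" at each bar. Below is a minimal sketch in Python/pandas, assuming a hypothetical Series `t1` that maps each event's start time to the time its first triple-barrier was touched; the names `t1`, `bar_index`, and `count_concurrent_labels` are illustrative, not necessarily the article's exact code:

```python
import pandas as pd

def count_concurrent_labels(bar_index: pd.DatetimeIndex, t1: pd.Series) -> pd.Series:
    """Count how many triple-barrier labels span each bar.

    t1 maps each event's start time (index) to the time its first
    barrier was touched (value). Bars covered by several label
    windows carry shared, not independent, information.
    """
    concurrency = pd.Series(0, index=bar_index)
    for start, end in t1.items():
        # Label-based slicing on a sorted DatetimeIndex is inclusive of
        # both ends, so every bar inside the label's lifespan is counted.
        concurrency.loc[start:end] += 1
    return concurrency
```

A bar with concurrency 3, for example, is shared by three overlapping labels, so each of those labels contributes only a third of a bar's worth of unique information at that point.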

Sample weighting provides an elegant solution. Instead of treating all observations equally, we assign weights based on how much unique information each observation contains. Observations that overlap heavily with others receive lower weights, while truly independent observations receive higher weights.
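One common way to turn this idea into numbers is an average-uniqueness scheme: a label's weight is the mean of 1/concurrency over the bars it spans, so a fully independent label gets weight 1.0 and heavily overlapped labels get proportionally less. The sketch below continues the hypothetical `t1` and `concurrency` objects from above and assumes unique event start times; again, this is illustrative rather than the article's exact implementation:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def average_uniqueness(t1: pd.Series, concurrency: pd.Series) -> pd.Series:
    """Weight each label by the mean of 1/concurrency over its lifespan."""
    weights = pd.Series(index=t1.index, dtype=float)
    for start, end in t1.items():
        # A label alone over its whole window averages to 1.0; a window
        # shared with many other labels averages much lower. Division is
        # safe: each label's own window always has concurrency >= 1.
        weights.loc[start] = (1.0 / concurrency.loc[start:end]).mean()
    return weights

# Hypothetical usage: X and y are features/labels aligned with t1.index.
# w = average_uniqueness(t1, concurrency)
# clf = RandomForestClassifier(n_estimators=500, random_state=42)
# clf.fit(X, y, sample_weight=w.values)  # scikit-learn accepts per-sample weights
```

Passing the weights through `sample_weight` means overlapping observations still inform the model, but no longer get counted as if they were independent pieces of evidence.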

Author: Patrick Murimi Njoroge