Discussing the article: "MetaTrader 5 Machine Learning Blueprint (Part 1): Data Leakage and Timestamp Fixes"
Activity-driven bars do not solve all of the problems you mentioned for time bars. For example, you wrote:
The Subtle Intra-Bar Leakage: However, a more subtle form of data leakage can still occur within the very formation of that time bar. If a significant event transpires midway through a 1-minute bar (e.g., at 09:00:35), any features derived from that bar (such as its high price or a flag for the event) will inevitably incorporate this information by the bar's end.
If you build equal-volume, equal-range, or other tick-based custom bars, you will still mark each bar with a single label, and it will leak (or, more precisely, blur) information about the high price across the entire bar.
The only way to solve this is to build "bars" with the specific features you are going to use in mind. For example, if highs or lows are the main features, you should probably try zigzag "bars" with the extremums marked with their exact times.
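To make the zigzag idea concrete, here is a minimal Python sketch (not code from the article or from MT5). It assumes ticks arrive as (timestamp, price) pairs and uses a hypothetical fixed reversal threshold. The function name zigzag_pivots and the extra "confirmed" timestamp are my own illustrative additions: each extremum keeps the exact time it occurred, plus the time at which a reversal made it knowable.

```python
from datetime import datetime

def zigzag_pivots(ticks, threshold):
    """Return swing pivots as (kind, extreme_time, extreme_price, confirmed_time).

    ticks: list of (datetime, price), sorted by time.
    threshold: hypothetical reversal size (in price units) that confirms a pivot.
    """
    pivots = []
    if not ticks:
        return pivots
    direction = 0                      # 0 = undecided, +1 = up leg, -1 = down leg
    hi_t, hi_p = ticks[0]              # candidate high
    lo_t, lo_p = ticks[0]              # candidate low
    for t, p in ticks[1:]:
        if direction != -1 and p > hi_p:
            hi_t, hi_p = t, p          # extend the candidate high
        if direction != 1 and p < lo_p:
            lo_t, lo_p = t, p          # extend the candidate low
        if direction != -1 and hi_p - p >= threshold:
            # Price fell far enough: the high is only known as of time t.
            pivots.append(("high", hi_t, hi_p, t))
            direction = -1
            lo_t, lo_p = t, p
        elif direction != 1 and p - lo_p >= threshold:
            # Price rose far enough: the low is only known as of time t.
            pivots.append(("low", lo_t, lo_p, t))
            direction = 1
            hi_t, hi_p = t, p
    return pivots

# Made-up ticks; the high occurs at 09:00:35, as in the quoted example.
day = (2024, 1, 2)
ticks = [
    (datetime(*day, 9, 0, 0), 1.1000),
    (datetime(*day, 9, 0, 20), 1.1008),
    (datetime(*day, 9, 0, 35), 1.1015),
    (datetime(*day, 9, 0, 50), 1.1009),
    (datetime(*day, 9, 1, 10), 1.1002),
    (datetime(*day, 9, 1, 40), 1.0998),
    (datetime(*day, 9, 2, 5), 1.1011),
]
pivots = zigzag_pivots(ticks, threshold=0.0010)
for kind, when, price, confirmed in pivots:
    print(kind, when.time(), price, "confirmed at", confirmed.time())
```

In this sketch the first tick is treated as a provisional low once price has risen by the threshold. A feature built from a pivot should be timestamped with the confirmation time, not the time of the extremum itself; otherwise the zigzag introduces its own look-ahead.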
Actually, the approach of using constant timeframes, and specifically limiting them to M1, is problematic in the context of data leakage in MT5. Labelling M1 bars with the ending time is not much better than labelling them with the beginning time, imho.
For those who are interested in building custom bars (charts) natively in MT5, there is an article with an MQL5 implementation of equal-volume, equal-range, and renko bars. Of course, you can mark the bars with the ending time in the open-source code.
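For readers working outside MQL5, the idea of stamping an equal-volume bar with its ending time can be sketched in Python roughly as follows. This is not the code from the linked article. The (timestamp, price, volume) tick layout, the function name build_volume_bars, and the volume target are assumptions made for illustration, and a tick that overshoots the target is not split between bars.

```python
from datetime import datetime

def build_volume_bars(ticks, volume_per_bar):
    """Group (datetime, price, volume) ticks into bars of roughly equal volume.

    Each finished bar carries both start_time and end_time; end_time is the
    time of the last tick in the bar, i.e. when the bar became fully known.
    An unfinished trailing bar is dropped.
    """
    bars = []
    current = None
    for ts, price, volume in ticks:
        if current is None:
            current = {"start_time": ts, "open": price,
                       "high": price, "low": price, "volume": 0}
        current["high"] = max(current["high"], price)
        current["low"] = min(current["low"], price)
        current["close"] = price
        current["end_time"] = ts
        current["volume"] += volume
        if current["volume"] >= volume_per_bar:
            bars.append(current)       # bar is complete as of this tick
            current = None
    return bars

# Made-up ticks for illustration only.
day = (2024, 1, 2)
volume_ticks = [
    (datetime(*day, 9, 0, 5), 1.1000, 4),
    (datetime(*day, 9, 0, 35), 1.1012, 3),
    (datetime(*day, 9, 1, 50), 1.1006, 5),
    (datetime(*day, 9, 2, 10), 1.1003, 6),
    (datetime(*day, 9, 4, 40), 1.1009, 4),
    (datetime(*day, 9, 5, 0), 1.1010, 2),
]
volume_bars = build_volume_bars(volume_ticks, volume_per_bar=10)
for bar in volume_bars:
    print(bar["start_time"].time(), "->", bar["end_time"].time(), bar["volume"])
```

Even with the ending time as the label, the bar's high is still attributed to the whole bar, which is the blurring described above.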

Check out the new article: MetaTrader 5 Machine Learning Blueprint (Part 1): Data Leakage and Timestamp Fixes.
Before we can even begin to make use of ML in our trading on MetaTrader 5, it’s crucial to address one of the most overlooked pitfalls—data leakage. This article unpacks how data leakage, particularly the MetaTrader 5 timestamp trap, can distort our model's performance and lead to unreliable trading signals. By diving into the mechanics of this issue and presenting strategies to prevent it, we pave the way for building robust machine learning models that deliver trustworthy predictions in live trading environments.
Data snooping, or data leakage, might seem subtle, but its impact on machine learning models can be monumental and devastating. Imagine studying for a test where you unknowingly peek at the answers beforehand. Your perfect score feels earned, but it's actually cheating. This is precisely what happens when we use MetaTrader 5's default timestamps in machine learning: data leakage unexpectedly corrupts our model's integrity.
How MetaTrader 5's Timestamps Trick You
By timestamping each bar at its start, MetaTrader 5 implies that the data of a 5-minute bar stamped 18:55:00 was available at 18:55:00, a full 5 minutes before the bar actually closed. If your model uses this in training, it's like giving a student the exam answers 5 minutes before the test begins. To counteract this, we should avoid MetaTrader 5's precompiled time bars and instead build the bars we use in our models from tick data.
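A rough Python sketch of this fix: aggregate raw ticks into time bars and key each bar by its closing time, so that a bar covering 18:55:00 to 19:00:00 is stamped 19:00:00. This is not the article's implementation. The (timestamp, price) tick layout and the function name build_time_bars are assumptions for illustration; in practice the ticks would come from your broker's tick history.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def build_time_bars(ticks, bar_seconds=300):
    """Aggregate (datetime, price) ticks into OHLC bars keyed by CLOSE time.

    A tick at exactly a bar boundary belongs to the bar that opens there.
    The most recent bar may still be forming and should not be used as a
    finished observation.
    """
    width = timedelta(seconds=bar_seconds)
    bars = {}
    for ts, price in sorted(ticks):
        # Floor the tick time to the start of its bar, then label by the end.
        start = EPOCH + ((ts - EPOCH) // width) * width
        close_time = start + width
        bar = bars.get(close_time)
        if bar is None:
            bars[close_time] = {"open": price, "high": price,
                                "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars

# Made-up ticks for illustration only.
day = (2024, 1, 2)
bar_ticks = [
    (datetime(*day, 18, 55, 10), 1.1000),
    (datetime(*day, 18, 57, 30), 1.1010),
    (datetime(*day, 18, 59, 59), 1.0995),
    (datetime(*day, 19, 0, 5), 1.1002),
]
time_bars = build_time_bars(bar_ticks, bar_seconds=300)
for close_time, bar in time_bars.items():
    print(close_time, bar)
```

With this labelling, joining features to other data on the bar timestamp can only pick up information that existed once the bar had closed.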
Author: Patrick Murimi Njoroge