Discussing the article: "MetaTrader 5 Machine Learning Blueprint (Part 1): Data Leakage and Timestamp Fixes"

 

Check out the new article: MetaTrader 5 Machine Learning Blueprint (Part 1): Data Leakage and Timestamp Fixes.

Before we can even begin to make use of ML in our trading on MetaTrader 5, it’s crucial to address one of the most overlooked pitfalls—data leakage. This article unpacks how data leakage, particularly the MetaTrader 5 timestamp trap, can distort our model's performance and lead to unreliable trading signals. By diving into the mechanics of this issue and presenting strategies to prevent it, we pave the way for building robust machine learning models that deliver trustworthy predictions in live trading environments. 

Data leakage (sometimes called data snooping) might seem subtle, but its impact on machine learning models can be devastating. Imagine studying for a test where you unknowingly peek at the answers beforehand. Your perfect score feels earned, but it's actually cheating. This is precisely what happens when we use MetaTrader 5's default timestamps in machine learning: data leakage silently corrupts our model's integrity.

How MetaTrader 5's Timestamps Trick You

[Figure: EURUSD M5 chart in MetaTrader 5]

MetaTrader 5 labels the 5-minute bar starting at 18:55 (the second-to-last bar above) as:

Time          Open     High     Low      Close
2 Apr 18:55   1.08718  1.08724  1.08668  1.08670

By timestamping the bar at its start, MetaTrader 5 implies that its data was available at 18:55:00, a full 5 minutes before the bar actually closed at 19:00:00. The high, low, and close are not known until then, so a model trained on this timestamp sees values from the future; it's like giving a student the exam answers 5 minutes before the test begins. To counteract this, we should avoid MetaTrader 5's prebuilt time bars and instead build the bars for our models from tick data.
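As a rough illustration of that idea, the sketch below aggregates ticks into 5-minute bars with pandas and stamps each bar with its end time, so a bar only becomes "visible" once it has closed. The tick layout (a DatetimeIndex plus a `price` column), the helper name `ticks_to_time_bars`, the year, and the intra-bar tick times are all assumptions made up for the example; only the four prices come from the bar quoted above.

```python
import pandas as pd


def ticks_to_time_bars(ticks: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Aggregate ticks into OHLC bars stamped with the bar's END time.

    Assumes `ticks` has a DatetimeIndex and a `price` column
    (a hypothetical layout; adapt it to your own tick source).
    """
    return (
        ticks["price"]
        # closed="left": a bar covers [start, end); label="right": stamp it with `end`
        .resample(freq, closed="left", label="right")
        .ohlc()
        .dropna()  # drop intervals that contained no ticks
    )


# Synthetic ticks: prices taken from the 18:55 bar above, times invented.
ticks = pd.DataFrame(
    {"price": [1.08718, 1.08724, 1.08668, 1.08670, 1.08675]},
    index=pd.to_datetime(
        [
            "2025-04-02 18:55:00",
            "2025-04-02 18:56:10",
            "2025-04-02 18:58:30",
            "2025-04-02 18:59:59",
            "2025-04-02 19:00:00",  # belongs to the NEXT bar
        ]
    ),
)

bars = ticks_to_time_bars(ticks)
print(bars)
```

With this labelling, the bar that MetaTrader 5 shows as 18:55 is indexed at 19:00, the first moment at which all four of its values are actually known.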

Author: Patrick Murimi Njoroge

 

The activity-driven bars do not solve all problems you mentioned for time bars. For example, you wrote:

The Subtle Intra-Bar Leakage: However, a more subtle form of data leakage can still occur within the very formation of that time bar. If a significant event transpires midway through a 1-minute bar (e.g., at 09:00:35), any features derived from that bar (such as its high price or a flag for the event) will inevitably incorporate this information by the bar's end.

If you build equal-volume, equal-range, or other tick-based custom bars, you will still mark each bar with a single label, and it will leak (or, more precisely, blur) information about the high price across the entire bar.

The only way to solve this is to build "bars" with the specific features you're going to use in mind. For example, if highs or lows are the main features, you should probably try zigzag "bars" with the extremums marked with their exact times.
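To make the zigzag idea concrete, here is a minimal sketch, not a reference implementation: it scans a tick series and emits a pivot whenever price reverses by at least `threshold` from the running extreme. The names `Pivot` and `zigzag_pivots` are hypothetical, and the sample prices are invented integer points (chosen to avoid floating-point rounding at the threshold). Each pivot keeps both the exact time of the extreme and the time of the tick that confirmed it, since the extreme is only known in hindsight.

```python
from typing import List, NamedTuple, Sequence


class Pivot(NamedTuple):
    kind: str          # "high" or "low"
    time: str          # time of the tick where the extreme occurred
    price: int
    confirmed_at: str  # time of the tick that confirmed the reversal


def zigzag_pivots(times: Sequence[str], prices: Sequence[int], threshold: int) -> List[Pivot]:
    """Return zigzag pivots; a pivot is confirmed after a reversal >= threshold."""
    pivots: List[Pivot] = []
    direction = 0  # 0 = undecided, +1 = tracking a high, -1 = tracking a low
    hi = lo = 0    # indices of the running high / low candidates
    for i in range(1, len(prices)):
        p = prices[i]
        if direction >= 0 and p > prices[hi]:
            hi = i
        if direction <= 0 and p < prices[lo]:
            lo = i
        if direction <= 0 and p - prices[lo] >= threshold:
            # Price rose far enough from the running low: that low is a pivot.
            pivots.append(Pivot("low", times[lo], prices[lo], times[i]))
            direction, hi = 1, i
        elif direction >= 0 and prices[hi] - p >= threshold:
            # Price fell far enough from the running high: that high is a pivot.
            pivots.append(Pivot("high", times[hi], prices[hi], times[i]))
            direction, lo = -1, i
    return pivots


# Invented sample data, prices in integer points.
times = ["09:00:05", "09:00:20", "09:00:35", "09:00:50", "09:01:10", "09:01:30", "09:01:45"]
prices = [100, 120, 150, 130, 110, 140, 160]

pivots = zigzag_pivots(times, prices, threshold=30)
for pv in pivots:
    print(pv)
```

In a leakage-aware pipeline one would presumably key features on `confirmed_at` rather than `time`, because the pivot at 09:00:35 in this sample is not identifiable until the tick at 09:01:10.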

Actually, the approach with constant timeframes, and specifically limiting them to M1, is problematic in the context of data leakage in MT5. Labelling M1 bars with the ending time is not much better than labelling them with the beginning time, imho.


For those who are interested in building custom bars (charts) natively in MT5, there is an article with an MQL5 implementation of equal-volume, equal-range, and renko bars. Of course, you can mark the bars with their ending time in the open source code.

Custom symbols: Practical basics
  • www.mql5.com
The article is devoted to the programmatic generation of custom symbols which are used to demonstrate some popular methods for displaying quotes. It describes a suggested variant of minimally invasive adaptation of Expert Advisors for trading a real symbol from a derived custom symbol chart. MQL source codes are attached to this article.
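For readers who prefer to prototype outside MQL5, the equal-volume bars discussed in this thread can be sketched in Python as follows. This is an illustrative approximation, not the linked article's implementation: the function name `volume_bars`, the `price`/`volume` column layout, and the sample ticks are assumptions. A bar closes on the tick that brings accumulated volume to at least `volume_per_bar`, ticks are never split, the incomplete trailing bar is dropped, and every bar is stamped with the time of its last tick.

```python
import pandas as pd

COLUMNS = ["time", "open", "high", "low", "close", "volume"]


def volume_bars(ticks: pd.DataFrame, volume_per_bar: float) -> pd.DataFrame:
    """Build (approximately) equal-volume bars, labelled with their END time.

    Assumes `ticks` has a DatetimeIndex plus `price` and `volume` columns
    (a hypothetical layout for this sketch).
    """
    rows = []
    acc = 0.0   # volume accumulated in the bar being built
    start = 0   # position of the first tick of the bar being built
    for i, vol in enumerate(ticks["volume"].to_numpy()):
        acc += vol
        if acc >= volume_per_bar:
            chunk = ticks.iloc[start : i + 1]
            rows.append(
                {
                    "time": chunk.index[-1],  # end time: last tick in the bar
                    "open": chunk["price"].iloc[0],
                    "high": chunk["price"].max(),
                    "low": chunk["price"].min(),
                    "close": chunk["price"].iloc[-1],
                    "volume": chunk["volume"].sum(),
                }
            )
            acc, start = 0.0, i + 1
    return pd.DataFrame(rows, columns=COLUMNS).set_index("time")


# Invented sample ticks.
ticks = pd.DataFrame(
    {
        "price": [100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0],
        "volume": [4, 4, 4, 4, 4, 4, 4],
    },
    index=pd.to_datetime(
        [
            "2025-04-02 09:00:01",
            "2025-04-02 09:00:07",
            "2025-04-02 09:00:35",
            "2025-04-02 09:01:02",
            "2025-04-02 09:01:40",
            "2025-04-02 09:02:15",
            "2025-04-02 09:02:50",
        ]
    ),
)

bars = volume_bars(ticks, volume_per_bar=10)
print(bars)
```

As noted in the comment above, each such bar still carries a single label, so the exact moment of its high or low is blurred unless it is recorded separately.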