Check out the new article: Pipelines in MQL5.
In this piece, we look at a data preparation step that is rapidly gaining significance in machine learning: data preprocessing pipelines. These are, in essence, a streamlined sequence of transformation steps that prepare raw data before it is fed to a model. As uninteresting as this may initially seem to the uninitiated, this 'data standardization' not only saves on training time and execution costs, but also goes a long way toward ensuring better generalization. In this article we focus on some scikit-learn preprocessing functions; while we are not exploiting the MQL5 Wizard here, we will return to it in coming articles.
Python's scikit-learn library has, for all intents and purposes, become the go-to industry standard for machine learning preprocessing. With a minimal amount of code, developers can apply a StandardScaler to center features; a MinMaxScaler to compress features into a fixed range; a RobustScaler to reduce the undue influence of outliers; or a OneHotEncoder to expand categorical features into binary representations. More than that, scikit-learn's Pipeline class allows these steps to be chained seamlessly, which means that every dataset passed to the model undergoes the same transformation sequence. This modular, plug-and-play mechanism has driven rapid adoption of machine learning across a vast array of industries.
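For readers less familiar with the Python side, a minimal sketch of this chaining, using two of the scalers mentioned above (the toy data here is purely illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Chain the scalers: every dataset passed through the pipeline
# undergoes the same transformation steps, in the same order.
pipe = Pipeline([
    ("center", StandardScaler()),  # zero mean, unit variance
    ("range",  MinMaxScaler()),    # then squeeze into [0, 1]
])

X_out = pipe.fit_transform(X)
print(X_out)  # each column now spans [0, 1]
```

Calling `fit_transform` once on the whole pipeline is what guarantees the consistency the article alludes to: the same fitted parameters (means, ranges) are reused when `pipe.transform` is later applied to test data.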
MQL5 developers, in contrast, face a different reality. While MQL5 is relatively efficient at handling trading data, it does not natively provide preprocessing methods comparable to scikit-learn's. Every transformation, whether scaling, encoding, or imputing missing values, has to be coded manually and often in a fragmented way. This not only raises the odds of introducing errors, but also makes it harder to reproduce test results or to maintain consistency between training and testing data.
The solution, in my opinion, could lie in designing a preprocessing pipeline class in MQL5 that emulates this scikit-learn philosophy. If we implement reusable modules such as CStandardScaler, CMinMaxScaler, CRobustScaler, and COneHotEncoder, we can chain them into a pipeline container. This structure would ensure that raw financial time series undergo systematic preparation before entering deep learning models, whether those models are coded natively in MQL5 or imported via ONNX. MQL5 developers would thereby adopt a familiar Python-style workflow, unlocking cleaner experimentation, faster development, and presumably more reliable AI systems.
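The MQL5 classes themselves are the subject of the full article and are not reproduced here; as a sketch of the fit/transform contract such classes would emulate, here is a minimal, library-free Python outline (the class names `StandardScalerSketch` and `PipelineSketch` are illustrative inventions, not part of scikit-learn or MQL5):

```python
import numpy as np

class StandardScalerSketch:
    """Centers each column to zero mean and unit (population) variance."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)  # parameters learned once, on training data
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

class PipelineSketch:
    """Container that applies a fixed sequence of fit/transform steps."""
    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X):
        for step in self.steps:
            X = step.fit(X).transform(X)
        return X

X = np.array([[1.0], [2.0], [3.0]])
result = PipelineSketch([StandardScalerSketch()]).fit_transform(X)
print(result)  # column centered to zero mean, unit variance
```

An MQL5 port of this design would replace the NumPy arrays with `matrix`/`vector` types and store the fitted parameters (means, ranges, quartiles) as class members, so that the exact same transformation can be replayed on live data in an Expert Advisor.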
Author: Stephen Njuki