Discussing the article: "Developing a robot in Python and MQL5 (Part 2): Model selection, creation and training, Python custom tester" - page 2

 

Still, raw prices turn out to be the best features.

I used to be sceptical about them because of their non-stationarity, but after some manipulation I also started getting decent models out of these features.

So from ignorance comes knowledge, and from knowledge comes ignorance :)

 
Ivan Butko #:
Good motivation when there are results!
And, as I've realised, there isn't a week of work ahead, or even a month, but a solid year's worth.

Thank you very much! Yes, it motivates me a lot! I'll keep on researching) It's night again, I have a cup of coffee and code ideas with me)))))

 
Maxim Dmitrievsky #:

Still, raw prices turn out to be the best features.

I used to be sceptical about them because of their non-stationarity, but after some manipulation I also started getting decent models out of these features.

So from ignorance comes knowledge, and from knowledge comes ignorance :)

Here's something along those lines that I tried; my mother-in-law is a trader with 15+ years of experience, and she keeps saying you need to build features on volumes))) https://www.mql5.com/ru/code/50133

Price / Volume indicator (www.mql5.com): one of the simple features for machine learning
 
Yevgeniy Koshtenko #:

Here's something along those lines that I tried; my mother-in-law is a trader with 15+ years of experience, and she keeps saying you need to build features on volumes))) https://www.mql5.com/ru/code/50133

Yes, it's true; more often volatility is added as a feature (e.g. a std indicator), but it doesn't give much. Or increments divided by volatility.
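A minimal sketch of both feature variants just mentioned, assuming a pandas series of close prices (the function name and window size are illustrative):

```python
import pandas as pd

def volatility_features(close: pd.Series, window: int = 20) -> pd.DataFrame:
    """Two simple feature variants: rolling volatility and
    volatility-normalised increments."""
    returns = close.pct_change()              # bar-to-bar increments
    vol = returns.rolling(window).std()       # the 'std indicator'
    return pd.DataFrame({
        'volatility': vol,
        'norm_return': returns / vol,         # increment divided by volatility
    }).dropna()
```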

 

Eugene, your articles got me started studying ML as applied to trading; thank you very much for that.

Could you clarify the following points?

After the label_data function processes the data, its volume shrinks considerably (we get a random set of bars that satisfy the function's conditions). The data then passes through several more functions and is split into train and test samples. The model is trained on the train sample; after that, the ['labels'] column is removed from the test sample and we try to predict its values to evaluate the model. Isn't this a substitution of concepts in the test data? After all, for testing we use data that has already passed through label_data, i.e. a set of non-sequential bars selected in advance by a function that looks at future data. And then the tester has the parameter 10, which, as I understand it, determines how many bars after entry the deal is closed; but since we have a non-sequential set of bars, it is unclear what we actually get.

So the questions are: where am I wrong? Why aren't all bars >= FORWARD used for testing? And if we don't use all bars >= FORWARD, how can we choose the bars needed for prediction without knowing the future?

Thank you.
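For reference, the leakage-free evaluation the question points toward would score every bar at or after FORWARD in chronological order, rather than only the pre-selected ones. A minimal sketch, assuming a fitted model and a time-indexed dataset DataFrame; FORWARD, HOLD_BARS and the column names are placeholders, not the article's actual code:

```python
# Hypothetical leakage-free evaluation: score every bar at or after
# FORWARD in chronological order, with no pre-selection by label_data()
test = dataset[dataset.index >= FORWARD].sort_index().copy()
X = test.drop(columns=['labels'])
test['pred'] = model.predict(X)

# 'Close the deal after 10 bars' then has an unambiguous meaning:
# the exit price is the close 10 consecutive bars after the entry bar
HOLD_BARS = 10
test['exit_price'] = test['close'].shift(-HOLD_BARS)
test['trade_return'] = (test['exit_price'] - test['close']) * \
                       (test['pred'] * 2 - 1)   # map 0/1 labels to -1/+1
```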

 
Great work; very interesting, practical and down to earth. It's rare to see an article this good, with actual examples rather than just theory without results. Thank you so much for your work and for sharing; I'll be following and looking forward to this series.
 
Eric Ruvalcaba #:
Great work; very interesting, practical and down to earth. It's rare to see an article this good, with actual examples rather than just theory without results. Thank you so much for your work and for sharing; I'll be following and looking forward to this series.

Thanks a lot! Yes, there are still a lot of ideas to implement, including extending this one with conversion to ONNX)

 
Is there any particular reason for using RandomForestClassifier for feature selection and XGBClassifier for the final classification?
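For readers unfamiliar with the pattern being asked about, a minimal sketch of the usual two-stage setup, assuming pre-split X_train/X_test/y_train/y_test arrays (this is an illustration, not the article's actual code):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier

# Stage 1: rank features with a random forest and keep those
# whose importance is above the median
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=300, random_state=42),
    threshold='median',
)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Stage 2: train the final classifier on the reduced feature set
clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric='logloss')
clf.fit(X_train_sel, y_train)
print(clf.score(X_test_sel, y_test))
```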
 

Critical flaws:

  1. Problems in preventing data leakage:
    • The augment_data() function creates serious data leakage problems between the training and test sets
    • Augmentation mixes data from different time periods
  2. Errors in performance evaluation methodology:
    • Model testing does not take into account real market conditions
    • The model is trained on future data and tested on historical data, which is unacceptable
  3. Technical problems in the code:
    • The generate_new_features() function creates features but does not return them (it returns the original data instead); a possible fix is sketched after this list
    • test_model() uses X_test.iloc[i]['close'], but the 'close' column may be missing after feature transformation
  4. Inconsistent data processing:
    • Data is labelled twice in different ways (markup_data() and label_data())
    • Clustering results (cluster) are not used in further training
  5. Methodological problems in trading strategy:
    • Static exit after 10 bars instead of an adaptive strategy
    • No risk management (except for a simple stop-loss)
    • No consideration of transaction costs (other than simple spread)
  6. Ineffective validation:
    • No validation of the model on historical data that respects the time structure (walk-forward analysis)
    • Cross-validation is applied to the time series without accounting for its temporal ordering
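For flaw 3, a minimal sketch of the likely fix, assuming generate_new_features builds rolling features on a pandas DataFrame with a 'close' column (the feature set shown is illustrative, not the article's actual code):

```python
import pandas as pd

def generate_new_features(data: pd.DataFrame) -> pd.DataFrame:
    data = data.copy()                        # don't mutate the caller's frame
    # Example features only; the article's actual set may differ
    data['returns'] = data['close'].pct_change()
    data['volatility'] = data['returns'].rolling(20).std()
    data['ma_ratio'] = data['close'] / data['close'].rolling(50).mean()
    # The reported bug: new columns were built but the *original* frame
    # was returned; returning the modified copy fixes it
    return data.dropna()
```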

Recommendations for improvement:

  1. Eliminate data leakage - strictly separate the data in time
  2. Implement proper walk-forward validation (see the sketch after this list)
  3. Implement more realistic testing that accounts for slippage and commissions
  4. Refine the logic of entering/exiting positions
  5. Use time-series-specific methods
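A minimal sketch of recommendations 2 and 5 combined, using scikit-learn's TimeSeriesSplit for walk-forward evaluation (X and y are assumed to be time-ordered NumPy arrays; the hyperparameters are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit
from xgboost import XGBClassifier

def walk_forward_validate(X, y, n_splits=5):
    """Train on an expanding window of past data, test on the block
    that immediately follows it; no shuffling across time."""
    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = []
    for train_idx, test_idx in tscv.split(X):
        model = XGBClassifier(n_estimators=200, max_depth=4,
                              eval_metric='logloss')
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores)), float(np.std(scores))
```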