Discussing the article: "Developing a robot in Python and MQL5 (Part 1): Data preprocessing"
In the EURUSD forecasting task, we added a binary column "labels" indicating whether the next price change exceeded the spread and commission.
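For reference, a minimal sketch of such a labelling rule might look like the following. The column name `close`, the spread and the commission values here are my assumptions for illustration, not the article's actual code:

```python
import pandas as pd

SPREAD = 0.00010      # assumed average EURUSD spread, in price units
COMMISSION = 0.00002  # assumed round-trip commission, in price units

def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Mark bars whose next price change exceeds total trading costs."""
    df = df.copy()
    next_change = df["close"].shift(-1) - df["close"]
    df["labels"] = (next_change.abs() > SPREAD + COMMISSION).astype(int)
    return df.iloc[:-1]  # drop the last bar, which has no "next" price
```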
By the way, out of over 700,000 data points, the price changed by more than the spread in only 70,000 cases.
EURUSD has ~0 spread 90% of the time. You are working with H1 data. How did you get this result?
Feature engineering is the transformation of raw data into a set of features for training machine learning models. The goal is to find the most informative features. There is a manual approach (a person selects the features) and an automatic one (done by algorithms).
We will use an automatic approach. We will apply the new feature creation method to automatically extract the best features from our data. We will then select the most informative ones from the resulting set.
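As an illustration only (the article's actual generation method may differ), automatic feature creation often amounts to sweeping a set of simple transformations over rolling windows of the price series. The column name `close` and the window lengths below are assumptions:

```python
import pandas as pd

def generate_features(df: pd.DataFrame, windows=(5, 20, 50)) -> pd.DataFrame:
    """Sweep a few transformations over rolling windows of the close price."""
    out = df.copy()
    for w in windows:
        out[f"sma_{w}"] = out["close"].rolling(w).mean()   # moving average
        out[f"std_{w}"] = out["close"].rolling(w).std()    # rolling volatility
        out[f"ret_{w}"] = out["close"].pct_change(w)       # w-bar price change
    return out.dropna()
```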
The best feature for price prediction turned out to be the opening price itself. Features based on moving averages, price increments, standard deviation, and daily and monthly price changes also made it into the top. The automatically generated features turned out to be uninformative. This raises a question about the quality of the feature generation algorithm, or rather its complete absence.
A person generated all of these features from OHLCT data, five columns in total. You are feeding the feature generation algorithm a much larger number of initial features. It is hard to imagine that the algorithm could not reproduce something as simple as a moving-average feature.
Feature clustering combines similar features into groups to reduce the number of features. This helps to get rid of redundant data, reduce correlation and simplify the model without overfitting. The best feature for price prediction turned out to be the opening price itself.
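A minimal sketch of one common way to do this is hierarchical clustering on a correlation-distance matrix; the threshold and the "keep the first member of each cluster" rule below are my assumptions, not necessarily the article's procedure:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_features(features: pd.DataFrame, threshold: float = 0.2) -> list:
    """Group highly correlated features and keep one representative per group."""
    corr = features.corr().abs()
    dist = 1.0 - corr                                   # correlation distance
    condensed = squareform(dist.values, checks=False)   # condensed distance matrix
    labels = fcluster(linkage(condensed, method="average"),
                      t=threshold, criterion="distance")
    kept = [corr.columns[labels == c][0] for c in np.unique(labels)]
    return kept
```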
Did the clustering throw out the HLC prices because they fell into the same cluster as the O-price?
If the price turned out to be the best feature for its own prediction (and the other features are its derivatives), does that mean we should forget about the other features, and that it is reasonable to add more input data by moving to a lower timeframe and taking the prices of other symbols as features?
Prices should, of course, be removed from the training sample, as the ML model will not perform adequately on new data, especially if the prices fall outside the training range.
The high informativeness of prices comes from the uniqueness of their values: it is easier for the algorithm to memorize prices or match them with labels.
In ML practice, it is not only uninformative features that are removed, but also suspiciously over-informative ones, which is exactly what raw prices are.
In an ideal scenario, there should be several features that are roughly equally informative, with no clear leaders or outsiders. That means none of the features pollute the training or dominate it.
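One way to sanity-check this in practice is to look at how model importances are distributed across features and flag anything that dominates. This is only a sketch; the RandomForest model and the cut-off of three times the mean importance are arbitrary choices of mine, not from the article:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def importance_report(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Rank features and flag any that dominate the importance distribution."""
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X, y)
    imp = pd.Series(model.feature_importances_, index=X.columns)
    imp = imp.sort_values(ascending=False)
    dominant = imp[imp > 3 * imp.mean()]  # arbitrary "too informative" cut-off
    if not dominant.empty:
        print("Suspiciously dominant features:", list(dominant.index))
    return imp
```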
If we switch to returns, the feature generation algorithm is bound to generate a cumulative sum, which is just the prices again, only without it being known that they are prices.
I don't get it
All features should be pseudo-stationary, like increments. Raw prices should be removed from the training set.
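To illustrate the point about increments and cumulative sums made above (the price values here are made up purely for the example):

```python
import pandas as pd

# Made-up close prices purely for illustration.
close = pd.Series([1.1012, 1.1018, 1.1009, 1.1021])

increments = close.diff().dropna()            # pseudo-stationary increments
reconstructed = close.iloc[0] + increments.cumsum()

print(increments.round(4).tolist())           # [0.0006, -0.0009, 0.0012]
print(reconstructed.round(4).tolist())        # [1.1018, 1.1009, 1.1021] == close[1:]
```

Differencing removes the trend, but the cumulative sum of those increments reconstructs the original price series up to its starting value.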
Check out the new article: Developing a robot in Python and MQL5 (Part 1): Data preprocessing.
Developing a trading robot based on machine learning: A detailed guide. The first article in the series deals with collecting and preparing data and features. The project is implemented using the Python programming language and libraries, as well as the MetaTrader 5 platform.
The market is becoming increasingly complex. Today it is turning into a battle of algorithms. Over 95% of trading turnover is generated by robots.
The next step is machine learning. Machine learning models are not strong AI, but neither are they simple linear algorithms; they are capable of making a profit in difficult conditions. It is interesting to apply machine learning to creating trading systems. Thanks to neural networks, the trading robot will analyze big data, find patterns and predict price movements.
We will look at the development cycle of a trading robot: data collection, processing, sample expansion, feature engineering, model selection and training, creating a trading system via Python, and monitoring trades.
Working in Python has its own advantages: speed in the field of machine learning, as well as the ability to select and generate features. Exporting models to ONNX requires reproducing exactly the same feature generation logic as in Python, which is not easy. That is why I have chosen online trading directly from Python.
Author: Yevgeniy Koshtenko