Discussing the article: "Data Science and ML (Part 29): Essential Tips for Selecting the Best Forex Data for AI Training Purposes"
For example, the authors of this article https://link.springer.com/article/10.1186/s40854-024-00622-6?utm_source prove that OHLC is not just four numbers, but a single topological object.
If we keep only Close, we lose information about volatility within the bar. With a 99% correlation, the remaining 1% looks like "noise" to a linear regression, but for a trader that 1% difference is the "signal" (shadow length, breakout strength). Removing "correlated" prices turns a candlestick chart into a line chart, destroying the very essence of candlestick analysis.
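A minimal sketch of that information loss, using two hypothetical bars (the prices are invented for illustration) that share Open and Close but differ in High and Low. A Close-only representation cannot tell them apart, while the bar range and shadow lengths can. The `Bar` class and both helper functions are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """One OHLC candle (hypothetical container for illustration)."""
    open: float
    high: float
    low: float
    close: float


def close_only(bar: Bar) -> tuple[float]:
    """Representation that keeps only the Close price."""
    return (bar.close,)


def intrabar_shape(bar: Bar) -> tuple[float, float, float]:
    """Range and shadow lengths: the intrabar volatility that Close alone discards."""
    bar_range = bar.high - bar.low
    upper_shadow = bar.high - max(bar.open, bar.close)
    lower_shadow = min(bar.open, bar.close) - bar.low
    return (bar_range, upper_shadow, lower_shadow)


# Hypothetical bars: same Open and Close, very different tails.
calm = Bar(open=1.1000, high=1.1010, low=1.0995, close=1.1005)
wild = Bar(open=1.1000, high=1.1080, low=1.0920, close=1.1005)

print(close_only(calm) == close_only(wild))       # True: identical to a Close-only model
print(intrabar_shape(calm), intrabar_shape(wild))  # the shapes clearly differ
```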
The author himself admits the limitations of the method, but still suggests using it for feature selection.
The market is not linear. The same article also introduces the concept of structural constraints (High ≥ Close). Pearson correlation does not see these constraints. If we follow the logic of the article under discussion and remove the "redundant" High/Low, the model no longer knows the bounds of admissible values. As a result, we get an algorithm that cannot tell a "calm market" from a "market with huge tails" (long shadows) when their opening prices coincide.
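One way to see why Pearson correlation is blind to the High ≥ Close constraint: the coefficient does not change when a series is shifted by a constant. A "High" series pushed below Close therefore has the same correlation with Close while violating the constraint on every bar. The sketch below uses NumPy and a small invented price series; the numbers and helper names are illustrative only.

```python
import numpy as np

# Hypothetical close prices and highs (High >= Close on every bar).
close = np.array([1.1005, 1.1012, 1.0998, 1.1020, 1.1031, 1.1027])
high = close + np.array([0.0004, 0.0010, 0.0007, 0.0003, 0.0012, 0.0006])


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def satisfies_high_ge_close(high: np.ndarray, close: np.ndarray) -> bool:
    """Structural constraint from the OHLC definition: High >= Close on every bar."""
    return bool(np.all(high >= close))


# Shift every High down by a constant larger than any wick: the shape of the
# series is unchanged, but every High now sits below its Close.
broken_high = high - 0.0020

print(pearson(high, close), pearson(broken_high, close))  # same value up to rounding
print(satisfies_high_ge_close(high, close))                # True
print(satisfies_high_ge_close(broken_high, close))         # False
```

A correlation-based filter sees both "High" series as equally redundant, even though only one of them describes a possible candle.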
This is false economy ("saving on matches").
Rather than "throwing away" data to simplify it, you can transform it (an unconstrained transformation). Instead of removing High and Low because of their correlation with Open, transform them into relative values (candle range, position of the close relative to the extremes). The dimensionality stays the same (or becomes slightly smaller), the informativeness (geometry) is fully preserved, and the correlation problem disappears.
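A minimal sketch of such a transformation, assuming positive prices and Low < High. Each bar is mapped to four unconstrained values: log(Low), log(High - Low), and the logits of the Open and Close positions within the bar's range. Whether this matches the exact formulation in the Springer paper is an assumption; the function names and the `eps` clipping parameter are hypothetical. The inverse rebuilds a bar that satisfies Low ≤ Open, Close ≤ High by construction, so a model can predict unconstrained targets without ever emitting an impossible candle.

```python
import math


def logit(p: float) -> float:
    """Map a value in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Numerically stable inverse of logit: map a real number into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_unconstrained(o: float, h: float, l: float, c: float,
                     eps: float = 1e-6) -> tuple[float, float, float, float]:
    """Transform a positive-priced OHLC bar (l < h) into four unconstrained values.

    Open and Close positions inside [Low, High] are clipped to [eps, 1 - eps]
    so that bars opening or closing exactly at an extreme stay finite.
    """
    if not (0.0 < l < h and l <= o <= h and l <= c <= h):
        raise ValueError("expected 0 < Low < High with Open and Close inside [Low, High]")
    bar_range = h - l
    open_pos = min(max((o - l) / bar_range, eps), 1.0 - eps)
    close_pos = min(max((c - l) / bar_range, eps), 1.0 - eps)
    return (math.log(l), math.log(bar_range), logit(open_pos), logit(close_pos))


def from_unconstrained(y: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Rebuild (Open, High, Low, Close); Low <= Open, Close <= High holds by construction."""
    log_low, log_range, z_open, z_close = y
    low = math.exp(log_low)
    bar_range = math.exp(log_range)
    high = low + bar_range
    return (low + sigmoid(z_open) * bar_range, high, low, low + sigmoid(z_close) * bar_range)


# Hypothetical EURUSD-like bar as (Open, High, Low, Close).
bar = (1.1000, 1.1080, 1.0920, 1.1005)
y = to_unconstrained(*bar)
print(y)
print(from_unconstrained(y))  # recovers the original bar up to floating-point error

# Any real-valued vector (e.g. a raw model output) decodes to a valid candle.
o, h, l, c = from_unconstrained((0.1, -6.0, 3.0, -2.5))
print(l <= min(o, c) and max(o, c) <= h)  # True
```

The log range and the close-position logit are exactly the "candle range" and "close position relative to the extremes" mentioned above, only moved onto the real line so that ordinary regressors can target them.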
Check out the new article: Data Science and ML (Part 29): Essential Tips for Selecting the Best Forex Data for AI Training Purposes.
In this article, we dive deep into the crucial aspects of choosing the most relevant and high-quality Forex data to enhance the performance of AI models.
Traders have a wealth of data at hand: indicators (MetaTrader 5 has more than 36 built-in indicators), symbol pairs (more than 100 symbols) that can also serve as data for correlation strategies, news, which is also valuable information for traders, and so on. My point is that there is abundant information for traders to use in manual trading or when building Artificial Intelligence models that help our trading robots make smart trading decisions.
Among all the information we have at hand, some of it is bound to be bad (that is just common sense). Not all indicators, data, strategies, etc. are useful for a particular trading symbol, strategy, or situation. How do we determine the right information for trading and for machine learning models to achieve maximum efficiency and profitability? This is where feature selection comes into play.
Author: Omega J Msigwa