Discussing the article: "MQL5 Wizard Techniques you should know (Part 58): Reinforcement Learning (DDPG) with Moving Average and Stochastic Oscillator Patterns"

 

Check out the new article: MQL5 Wizard Techniques you should know (Part 58): Reinforcement Learning (DDPG) with Moving Average and Stochastic Oscillator Patterns.

Moving Average and Stochastic Oscillator are very common indicators whose collective patterns we explored in the prior article, via a supervised learning network, to see which “patterns-would-stick”. We take the analysis from that article a step further by considering the effect reinforcement learning, when used with this trained network, would have on performance. Readers should note our testing is over a very limited time window. Nonetheless, we continue to harness the minimal coding requirements afforded by the MQL5 wizard in showcasing this.

In our last article, we tested 10 signal patterns from our two indicators (MA & Stochastic Oscillator). Seven were able to forward-walk over a 1-year test window, but of these, only two did so by placing both long and short trades. This was down to our small test window, which is why readers are urged to test this on more history before taking it any further.

We are following a thesis here where the three main modes of machine learning can be used together, each in its own ‘phase’. These modes, to recap, are supervised-learning (SL), reinforcement-learning (RL), and inference-learning (IL). We dwelt on SL in the last article, where combined patterns of the moving average and stochastic oscillator were normalized into a binary vector of features. This vector was then fed into a simple neural network that we trained on the EUR USD pair for the year 2023 and subsequently forward tested for the year 2024.
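
As a rough illustration only, and not the exact encoding used in the previous article, a binary feature vector of this kind could be built in MQL5 along the following lines; the thresholds and the choice of four flags are hypothetical:

//+------------------------------------------------------------------+
//| Hypothetical sketch: map MA & Stochastic readings to a binary    |
//| feature vector, similar in spirit to the SL inputs.              |
//+------------------------------------------------------------------+
void FeaturesFromIndicators(const double ma_fast, const double ma_slow,
                            const double stoch_main, const double stoch_signal,
                            const double close_price, vector &features)
  {
   features.Init(4);
   features[0] = (ma_fast     > ma_slow)      ? 1.0 : 0.0;   // fast MA above slow MA
   features[1] = (close_price > ma_fast)      ? 1.0 : 0.0;   // price above fast MA
   features[2] = (stoch_main  > stoch_signal) ? 1.0 : 0.0;   // %K above %D
   features[3] = (stoch_main  < 20.0)         ? 1.0 : 0.0;   // oversold zone
  }

void OnStart()
  {
   vector f;
   FeaturesFromIndicators(1.1030, 1.1010, 35.0, 40.0, 1.1045, f);
   Print("binary features: ", f);                            // e.g. [1,1,0,0]
  }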

Since our approach is based on the thesis that RL can be used to train models while they are in use, we want to demonstrate this in this article by using our earlier results & network from SL. RL, we are positing, is a form of back propagation during deployment that carefully fine-tunes our buy-sell decisions so that they are not based solely on projected changes in price, as was the case in the SL model.
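
To make the idea of “back propagation during deployment” concrete, here is a minimal, hypothetical sketch of a single DDPG-style update with a scalar state and action and linear actor/critic. It omits the replay buffer and target networks of full DDPG and does not reflect the article’s class layout; the weights, rates, and dummy values are assumptions for illustration:

double w_actor  = 0.10;          // actor:  a = w_actor * s
double w_critic = 0.10;          // critic: Q(s,a) = w_critic * s * a
double gamma_rl = 0.90;          // discount factor
double lr       = 0.01;          // learning rate

void DDPGStep(const double s, const double r, const double s_next)
  {
   double a      = w_actor * s;                                  // actor's action in state s
   double a_next = w_actor * s_next;                             // next action (no target nets in this sketch)
   double q      = w_critic * s * a;                             // critic estimate Q(s,a)
   double target = r + gamma_rl * w_critic * s_next * a_next;    // TD target
   double td_err = target - q;
// critic step: reduce the TD error
   w_critic += lr * td_err * (s * a);
// actor step: deterministic policy gradient, dQ/da * da/dw_actor
   double dQ_da = w_critic * s;
   w_actor  += lr * dQ_da * s;
  }

void OnStart()
  {
   DDPGStep(0.6, 0.02, 0.4);     // state, reward, next state (dummy values)
   Print("w_actor=", w_actor, "  w_critic=", w_critic);
  }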

This ‘fine-tuning’, as we have seen in past RL articles, marries exploration and exploitation. In doing so, our policy network determines, from training in a live market environment, which states should result in buy or sell actions. There can be cases where a bullish state does not necessarily present a buying opportunity, and vice versa, which means our RL model acts as an extra filter on the decisions made by the SL model. The states from our SL model were single-dimension continuous values, and the action space we will be using is very similar, as sketched below.
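
For illustration only, the sketch below assumes Gaussian exploration noise on the actor’s single-dimension action and a simple threshold rule for using that action as a filter on the SL signal; the noise scale and the ±0.5 thresholds are hypothetical, not the article’s values:

double ExploreAction(const double actor_output, const double noise_sigma)
  {
// Gaussian exploration noise via Box-Muller on two uniform draws
   double u1    = (MathRand() + 1.0) / 32768.0;
   double u2    = (MathRand() + 1.0) / 32768.0;
   double noise = MathSqrt(-2.0 * MathLog(u1)) * MathCos(2.0 * M_PI * u2) * noise_sigma;
   return(MathMax(-1.0, MathMin(1.0, actor_output + noise)));   // clip to [-1, +1]
  }

int FilterSignal(const int sl_signal, const double rl_action)
  {
// sl_signal: +1 buy, -1 sell, 0 none (from the supervised model)
// act only when the RL action agrees strongly enough with the SL signal
   if(sl_signal > 0 && rl_action >  0.5) return(+1);
   if(sl_signal < 0 && rl_action < -0.5) return(-1);
   return(0);
  }

void OnStart()
  {
   MathSrand((int)TimeLocal());
   double a = ExploreAction(0.7, 0.1);   // exploit a bullish actor output, with some exploration
   Print("action=", a, "  filtered signal=", FilterSignal(1, a));
  }

In this arrangement a trade is only opened when both models agree, which is what is meant above by the RL model acting as an extra filter rather than a replacement for the SL signal.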

Author: Stephen Njuki