Discussing the article: "MQL5 Wizard Techniques you should know (Part 54): Reinforcement Learning with hybrid SAC and Tensors"

 

Check out the new article: MQL5 Wizard Techniques you should know (Part 54): Reinforcement Learning with hybrid SAC and Tensors.

Soft Actor Critic is a Reinforcement Learning algorithm that we looked at in a previous article, where we also introduced Python and ONNX to this series as efficient approaches to training networks. We revisit the algorithm with the aim of exploiting tensors and the computational graphs that Python frameworks typically build around them.

Soft Actor Critic (SAC) is one of the algorithms used in Reinforcement Learning when training a neural network. To recap, reinforcement learning is an emerging training paradigm in machine learning, alongside supervised learning and unsupervised learning.

The replay buffer is a very important component of SAC, an off-policy Reinforcement Learning algorithm, because it stores past experiences of state, action, reward, next state, and the done flag (which records whether an episode is complete or still ongoing) and draws sample mini-batches from them for training. Its main purpose is to de-correlate experiences, so that the agent learns from a more diverse set, which tends to improve learning stability and sample efficiency.
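To make the idea concrete, here is a minimal replay-buffer sketch in Python. It is only illustrative: the class name, capacity, and batch size are assumptions for this example and not taken from the article. Each entry stores (state, action, reward, next_state, done), and uniform random sampling returns a de-correlated mini-batch.

import random
from collections import deque

import numpy as np


class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Older experiences are discarded automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition; `done` flags whether the episode ended here.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, which stabilises off-policy training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)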

In implementing SAC, we can use the MQL5 language, but the networks created would not be as efficient to train as those built in Python with open-source libraries like TensorFlow or PyTorch. Therefore, as in the last reinforcement learning article, where Python was used to model a rudimentary SAC network, we continue with Python, this time looking to explore and harness its tensor graphs. There are, in principle, two ways of implementing a replay buffer in Python: a manual approach or a tensor-based approach, a sketch of the latter follows below.
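The sketch below shows what a tensor-based buffer could look like, assuming PyTorch and pre-allocated tensors; the article's own implementation may differ in names and details. The point of the design is that sampled mini-batches are already tensors on the target device, so they can feed the actor and critic updates directly without per-step Python-object conversions.

import torch


class TensorReplayBuffer:
    def __init__(self, capacity, state_dim, action_dim, device="cpu"):
        self.capacity = capacity
        self.device = device
        # Pre-allocate storage once, as one tensor per field.
        self.states = torch.zeros(capacity, state_dim, device=device)
        self.actions = torch.zeros(capacity, action_dim, device=device)
        self.rewards = torch.zeros(capacity, 1, device=device)
        self.next_states = torch.zeros(capacity, state_dim, device=device)
        self.dones = torch.zeros(capacity, 1, device=device)
        self.idx, self.full = 0, False

    def push(self, state, action, reward, next_state, done):
        # Write in ring-buffer fashion so the oldest experiences are overwritten.
        i = self.idx
        self.states[i] = torch.as_tensor(state, device=self.device)
        self.actions[i] = torch.as_tensor(action, device=self.device)
        self.rewards[i] = float(reward)
        self.next_states[i] = torch.as_tensor(next_state, device=self.device)
        self.dones[i] = float(done)
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size):
        # Draw random indices on the same device; the returned slices are
        # tensors ready for the network updates.
        high = self.capacity if self.full else self.idx
        idx = torch.randint(0, high, (batch_size,), device=self.device)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])

Compared with the manual version above, this trades a little flexibility for speed: the buffer's shape is fixed at construction, but sampling avoids repeated list-to-array conversions on every training step.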

Author: Stephen Njuki