Discussing the article: "MQL5 Wizard Techniques you should know (Part 51): Reinforcement Learning with SAC"
MetaQuotes:
Check out the new article: MQL5 Wizard Techniques you should know (Part 51): Reinforcement Learning with SAC.
Author: Stephen Njuki

Hello Stephen, thanks for your educative articles. I am suggesting that you add NFP, CPI, and interest rate historical data from economic calendars, since that data strongly influences the market.
Check out the new article: MQL5 Wizard Techniques you should know (Part 51): Reinforcement Learning with SAC.
Soft Actor Critic is a Reinforcement Learning algorithm that utilizes three neural networks: an actor network and two critic networks. These machine learning models are paired in a master-slave partnership, where the critics are modelled to improve the forecast accuracy of the actor network. While also introducing ONNX in this series, we explore how these ideas could be put to the test as a custom signal of a wizard-assembled Expert Advisor.
Soft Actor Critic is yet another reinforcement learning algorithm that we are considering, having already looked at a few others, including proximal policy optimization, deep-Q-networks, and SARSA. Like some of those we have already covered, this algorithm uses neural networks, but with an important caveat. Three networks are used in total: two critic networks and an actor network. The two critic networks make reward forecasts (Q-values) when given an action and an environment state as input, and the minimum of the outputs of these two networks is used in modulating the loss function used for training the actor network.
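To make the role of the twin critics concrete, the sketch below shows how the smaller of the two Q-value estimates could feed into a soft target of the kind SAC trains against. This is only a minimal illustration: the function name SoftTarget and the gamma (discount) and alpha (entropy temperature) parameters are assumptions for this sketch, not code taken from the attached files.

```
//+------------------------------------------------------------------+
//| Illustrative soft Bellman target using the minimum of the two    |
//| critic Q-value estimates. All names and default values here are  |
//| placeholders for illustration only.                              |
//+------------------------------------------------------------------+
double SoftTarget(const double q1,            // Q-estimate from the first critic
                  const double q2,            // Q-estimate from the second critic
                  const double reward,        // observed reward for the step
                  const double next_log_prob, // log-probability of the next action
                  const double gamma = 0.99,  // assumed discount factor
                  const double alpha = 0.2)   // assumed entropy temperature
{
   double _q_min = MathMin(q1, q2);           // clipped double-Q: keep the smaller estimate
   // soft target: reward + discounted (min Q minus entropy-weighted log-probability)
   return(reward + gamma * (_q_min - alpha * next_log_prob));
}
```

Taking the minimum of the two estimates is what counters the overestimation bias that the twin-critic setup is meant to address, while the alpha-weighted log-probability term is what makes the target "soft".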
The inputs to the actor network are environment state coordinates, with the output being 2-fold. A mean vector, and a log-standard-deviation vector. By using the Gaussian process, these two vectors are used to derive a probability distribution for the possible actions open to the actor. And so, while the 2 critic networks can be trained traditionally, the actor network clearly is a different kettle of fish. There is quite a bit to get into here, so let’s first reiterate the basics before going any further. The two critic networks for input take the current state of the environment and an action. Their output is an estimate of the expected return (Q-value) for taking that action in that state. The use of two critics helps in reducing overestimation bias, a common problem with Q-learning.
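As a generic illustration of how a log-probability could be computed from those two actor outputs (this is not the specific helper shared in the article; the function and variable names are assumptions), the log-density of an action under a diagonal Gaussian can be summed dimension by dimension:

```
//+------------------------------------------------------------------+
//| Illustrative log-probability of an action vector under a         |
//| diagonal Gaussian parameterised by the actor's mean and          |
//| log-standard-deviation outputs. Generic sketch only.             |
//+------------------------------------------------------------------+
double GaussianLogProb(const double &action[], const double &mean[], const double &log_std[])
{
   double _sum = 0.0;
   int _size = ArraySize(action);
   for(int i = 0; i < _size; i++)
   {
      double _std = MathExp(log_std[i]);              // recover the standard deviation
      double _z   = (action[i] - mean[i]) / _std;     // standardised residual
      // per-dimension log N(action | mean, std^2)
      _sum += -0.5 * (_z * _z) - log_std[i] - 0.5 * MathLog(2.0 * M_PI);
   }
   return(_sum);                                      // sum over independent dimensions
}
```

Summing over the dimensions treats the action components as independent, which is the usual diagonal-Gaussian simplification.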
We are sticking with the same model we have been using thus far, of 9 environment states and 3 possible actions. In order to process the probability distribution of the actions, we need the log-probabilities function whose code was shared at the beginning of this piece. Compiling with the wizard and performing a test run over the remaining 4 months of the data window presents us with the following report:
Author: Stephen Njuki