Thanks Sir @Dmitriy
Hello everyone. After about 3-4 cycles (database collection - training - test), this version began to produce just a straight line in tests: no deals are opened. Training was run with 500,000 iterations every time. Another interesting point: at a certain moment the error of one of the critics first became very large, and then the errors of both critics gradually decreased to 0. For the last 2-3 cycles the errors of both critics have stayed at 0, and in the tests Test.mqh gives a straight line and no deals. Among the Research.mqh passes there are passes with deals and a negative profit, and there are also passes with no deals and a zero result. In one of the cycles there were only 5 passes with a positive result.
In general, it is strange. I have been training strictly according to Dmitriy's instructions in all the articles, and I have not been able to get a result from any of them. I do not understand what I am doing wrong...
New article Neural networks made easy (Part 51): Behavior-Guided Actor-Critic (BAC) has been published:
Author: Dmitriy Gizlyk
I downloaded the zipped folder, but there were many other folders inside.
If possible, I would like you to explain how to deploy and train it.
Congratulations on a great job!
Thank you very much
Check out the new article: Neural networks made easy (Part 51): Behavior-Guided Actor-Critic (BAC).
The last two articles considered the Soft Actor-Critic algorithm, which incorporates entropy regularization into the reward function. This approach balances environmental exploration and model exploitation, but it is only applicable to stochastic models. The current article proposes an alternative approach that is applicable to both stochastic and deterministic models.
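For readers who have not followed the previous parts: the entropy regularization mentioned here is usually written as an extra term added to the reward. A minimal sketch of the standard Soft Actor-Critic objective (the conventional textbook form, not a quote from this article):

J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right]

Here \mathcal{H} is the entropy of the policy in state s_t, and the temperature \alpha weighs exploration against exploitation: the larger the entropy bonus, the more the stochastic policy is pushed to try diverse actions. This term only makes sense for a stochastic policy, which is exactly the limitation the article sets out to remove.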
First, let's talk about the need to study the environment in general. I think everyone agrees that this process is necessary. But for what exactly and at what stage?
Let's start with a simple example. Suppose we find ourselves in a room with three identical doors and need to get out onto the street. What do we do? We open the doors one by one until we find the one we need. When we enter the same room again, we no longer open all the doors to get outside; instead, we head straight for the already known exit. If we have a different task, several options are possible. We can again open all the doors except the known exit and look for the right one. Or we can first recall which doors we opened earlier while looking for the way out and whether the one we need was among them. If we remember the right door, we head towards it. Otherwise, we check the doors we have not tried before.
Conclusion: We need to study the environment in an unfamiliar situation to choose the right action. After finding the required route, additional exploration of the environment can only get in the way.
However, when the task changes in a known state, we may need to explore the environment further, for example to search for a better route. In the example above, this may happen if we needed to pass through several more rooms or found ourselves on the wrong side of the building.
Therefore, we need an algorithm that allows us to enhance environmental exploration in unexplored states and minimize it in previously explored states.
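To make the requirement more concrete, here is a minimal sketch of one generic way to implement it: an intrinsic reward that is large for rarely visited states and fades as a state becomes familiar. This is a simple count-based bonus written in C++-style code purely for illustration; it is not the behavioral component of the BAC algorithm discussed in the article, and the class and parameter names are assumptions made for the example.

// Generic count-based exploration bonus: large in unfamiliar states,
// close to zero in well-studied ones. Illustration only, not the BAC method.
#include <cmath>
#include <unordered_map>
#include <vector>

class NoveltyBonus
  {
   std::unordered_map<unsigned long long, long long> visits; // visit counts per state bucket
   double beta;                                               // bonus scale
   double cell;                                               // discretization step

   // Hash a continuous state vector into a coarse grid cell (FNV-1a style mix).
   unsigned long long Key(const std::vector<double> &state) const
     {
      unsigned long long h = 1469598103934665603ULL;
      for(double x : state)
        {
         unsigned long long q = (unsigned long long)(long long)std::floor(x / cell);
         h ^= q;
         h *= 1099511628211ULL;
        }
      return h;
     }

public:
   NoveltyBonus(double beta_, double cell_) : beta(beta_), cell(cell_) {}

   // Intrinsic reward that shrinks as the same state bucket is revisited.
   double Bonus(const std::vector<double> &state)
     {
      long long n = ++visits[Key(state)];
      return beta / std::sqrt((double)n);
     }
  };

// Usage sketch during data collection:
//    double shaped = reward + novelty.Bonus(state);

The 1/sqrt(n) decay is just one common choice; any monotonically decreasing function of visit frequency expresses the same idea of rewarding exploration only where the environment is still unknown.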
Author: Dmitriy Gizlyk