Discussing the article: "Neural networks made easy (Part 38): Self-Supervised Exploration via Disagreement"

 

Check out the new article: Neural networks made easy (Part 38): Self-Supervised Exploration via Disagreement.

One of the key problems in reinforcement learning is environment exploration. Previously, we have already looked at an exploration method based on Intrinsic Curiosity. Today I propose to look at another algorithm: Exploration via Disagreement.

Disagreement-based exploration is a reinforcement learning method that allows an agent to explore the environment without relying on external rewards, finding new, unexplored areas with the help of an ensemble of models.

In the article "Self-Supervised Exploration via Disagreement", the authors describe this approach and propose a simple method: train an ensemble of forward dynamics models and encourage the agent to take the actions for which the disagreement, or variance, among the ensemble's predictions is greatest.
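
To make the idea concrete, here is a minimal sketch of such an ensemble in Python/PyTorch (the article's own code is in MQL5; all names, sizes, and the architecture below are illustrative assumptions, not the authors' implementation). Each member predicts the next state from the current state and action:

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """One forward dynamics model: predicts the next state from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# The ensemble is simply N independently initialized models; random
# initialization (and, optionally, bootstrapped data) keeps them diverse.
STATE_DIM, ACTION_DIM, N_MODELS = 10, 4, 5  # illustrative sizes
ensemble = [ForwardModel(STATE_DIM, ACTION_DIM) for _ in range(N_MODELS)]
```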

Thus, rather than choosing the actions with the greatest expected reward, the agent chooses the actions that maximize disagreement between the models in the ensemble. This steers it toward regions of the state space where the ensemble's models disagree, which are likely to be new, unexplored parts of the environment.
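
Under the same assumptions as the sketch above, the disagreement can be measured as the variance of the ensemble's next-state predictions, and this value serves as the intrinsic reward the policy is trained to maximize:

```python
def intrinsic_reward(ensemble, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Disagreement bonus: variance of the ensemble's next-state predictions."""
    with torch.no_grad():
        # Shape: (n_models, batch, state_dim)
        preds = torch.stack([model(state, action) for model in ensemble])
    # Variance across ensemble members, averaged over the state dimensions.
    # High values mark transitions the models still disagree on.
    return preds.var(dim=0).mean(dim=-1)
```

This bonus can then be fed to any standard RL algorithm in place of (or in addition to) the external reward, so the agent is drawn to transitions the ensemble has not yet learned to predict consistently.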

As these regions are visited and the models are trained on the newly collected transitions, their predictions converge, ultimately reducing the spread of the ensemble and providing the agent with increasingly accurate predictions about the states of the environment and the possible consequences of its actions.
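
This self-annealing behavior follows directly from the training loop: each member is regressed toward the observed next state, so wherever real transitions have been collected, the predictions agree and the bonus decays. A hypothetical training step, continuing the sketch above, might look like this:

```python
def train_step(ensemble, optimizers, state, action, next_state):
    """Fit every ensemble member to the observed transition."""
    for model, opt in zip(ensemble, optimizers):
        loss = nn.functional.mse_loss(model(state, action), next_state)
        opt.zero_grad()
        loss.backward()
        opt.step()

optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]
```

Note that under mean squared error each model converges to the mean outcome of a stochastic transition, so the disagreement shrinks even where a single model's prediction error would stay high.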

In addition, exploration via disagreement allows the agent to cope successfully with the stochasticity of its interaction with the environment. The experiments conducted by the authors of the article showed that the proposed approach genuinely improves exploration in stochastic environments and outperforms previously existing methods of intrinsic motivation and uncertainty modeling. The authors also observed that their approach can be extended to supervised learning, where the value of a sample is determined not by its ground truth label but by the state of the ensemble of models.

Thus, exploration via disagreement is a promising approach to solving the exploration problem in stochastic environments. It allows the agent to explore the environment more efficiently without having to rely on external rewards, which can be especially useful in real-world applications where external rewards are limited or costly.

Author: Dmitriy Gizlyk

 

Thank you very much for this article!
I see that you also offer a zip file with many RL experiments inside. Is there a specific mq5 file that I can compile, run, and evaluate in more detail?

Thank you very much!

 
Eugen Funk #:

Thank you very much for this article!
I see that you also offer a zip file with many RL experiments inside. Is there a specific mq5 file that I can compile, run, and evaluate in more detail?

Thank you very much!

Hi, yes you can. The attachment contains all the files from the previous articles.
