Discussing the article: "Neural networks made easy (Part 55): Contrastive intrinsic control (CIC)"
At none of the stages do I come out in profit.
The first stages are pre-training, which is about exploring the environment and learning the Actor's skills. No external reward is used here at all; we train the Actor to develop multiple skills, so we do not expect profitable passes. The external reward is used only in the last stage, Finetune, when we train the Planner to manage the Actor's skills for the task at hand. And the results depend directly on how completely the first two iterations were carried out.
Hello again. I can't understand one point. What is the point of setting a take profit if it is trailed? It will never trigger that way.
First of all, it is a risk management tool, a defence against sharp, large movements. Besides, we are training the model. In theory, the stop loss and take profit do not have to be larger than the size of a single candle. In the process of training, we look for the most profitable strategy.
Check out the new article: Neural networks made easy (Part 55): Contrastive intrinsic control (CIC).
Contrastive training is an unsupervised representation learning method. Its goal is to teach a model to highlight similarities and differences in data sets. In this article, we will talk about using contrastive training approaches to explore different Actor skills.
The Contrastive Intrinsic Control algorithm begins with training the Agent in the environment using feedback and obtaining trajectories of states and actions. Representation training is then performed using Contrastive Predictive Coding (CPC), which motivates the Agent to retrieve key features from states and actions. Representations are formed that take into account the dependencies between successive states.
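To make the representation training step more concrete, here is a minimal sketch of a CPC-style (InfoNCE) contrastive loss over consecutive states. It is written in Python/PyTorch purely for illustration: the encoder architecture, embedding size, and temperature are assumptions, not the article's actual implementation.

```python
# Hedged sketch: CPC-style contrastive loss where each state's positive
# example is its own successor and the other successors in the batch
# serve as negatives. All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps a raw state vector to a compact, normalized representation."""
    def __init__(self, state_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(s), dim=-1)

def cpc_loss(encoder: StateEncoder,
             states: torch.Tensor,
             next_states: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    z = encoder(states)                       # (B, D) current-state embeddings
    z_next = encoder(next_states)             # (B, D) successor embeddings
    logits = z @ z_next.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(z.size(0))         # diagonal entries are the positives
    return F.cross_entropy(logits, targets)
```

Minimizing this loss pulls the embeddings of consecutive states together and pushes apart unrelated transitions, which is exactly the dependency between successive states mentioned above.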
Intrinsic rewards play an important role in determining which behavioral strategies should be maximized. CIC maximizes the entropy of transitions between states, which promotes diversity in Agent behavior. This allows the Agent to explore and create a variety of behavioral strategies.
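One common way to turn transition entropy into an intrinsic reward is a particle-based (k-nearest-neighbour) estimate: the further a transition's embedding lies from its neighbours, the more novel it is and the larger the reward. The sketch below illustrates this idea; the value of k and the log form of the reward are assumptions made for the example, not the article's exact settings.

```python
# Hedged sketch: kNN-based entropy estimate used as an intrinsic reward.
# Transitions whose embeddings are far from their nearest neighbours
# (i.e. rarely visited regions) receive a larger reward.
import torch

def knn_intrinsic_reward(embeddings: torch.Tensor, k: int = 12) -> torch.Tensor:
    """embeddings: (N, D) transition representations; requires N > k."""
    dists = torch.cdist(embeddings, embeddings)       # (N, N) pairwise distances
    knn_dists, _ = dists.topk(k + 1, largest=False)   # k+1 smallest, includes self (0)
    knn_dists = knn_dists[:, 1:]                      # drop the self-distance
    # Mean distance to the k nearest neighbours is inversely related to local
    # density; log(1 + d) keeps the reward bounded in dense regions.
    return torch.log(1.0 + knn_dists.mean(dim=-1))
```

Maximizing this reward pushes the Agent toward under-explored transitions, which is what produces the diversity of behavior described above.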
After generating a variety of skills and strategies, the CIC algorithm uses the Discriminator to instantiate the skill representations. The Discriminator aims to ensure that states are predictable and stable. In this way, the Agent learns to "use" skills in predictable situations.
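Below is a sketch of one possible contrastive Discriminator that scores how well a skill vector matches the transition it generated: the matching pair is treated as the positive example and the other skills in the batch as negatives. The dot-product scoring and the network sizes are assumptions for illustration only.

```python
# Hedged sketch: contrastive skill discriminator. For each transition
# (state, next_state) the matching skill is the positive; other skills
# in the batch act as negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    def __init__(self, embed_dim: int, skill_dim: int, hidden: int = 128):
        super().__init__()
        # Projects a transition (concatenated state embeddings) into skill space.
        self.transition_net = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, skill_dim),
        )
        # Projects the skill vector into the same space.
        self.skill_net = nn.Sequential(
            nn.Linear(skill_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, skill_dim),
        )

    def forward(self, z_state, z_next_state, skills, temperature: float = 0.5):
        t = F.normalize(self.transition_net(
            torch.cat([z_state, z_next_state], dim=-1)), dim=-1)
        s = F.normalize(self.skill_net(skills), dim=-1)
        logits = t @ s.t() / temperature          # (B, B) transition-skill similarities
        targets = torch.arange(logits.size(0))    # i-th skill belongs to i-th transition
        return F.cross_entropy(logits, targets)
```

Training the Discriminator this way ties each skill to a predictable region of state space, which is what allows the Agent to "use" skills in predictable situations.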
The combination of exploration motivated by intrinsic rewards and the use of skills for predictable actions creates a balanced approach for creating varied and effective strategies.
As a result, the Contrastive Intrinsic Control algorithm encourages the Agent to detect and learn a wide range of behavioral strategies while ensuring stable learning. Below is the author's visualization of the algorithm.
Author: Dmitriy Gizlyk