Discussing the article: "Neural networks made easy (Part 55): Contrastive intrinsic control (CIC)"

 

Check out the new article: Neural networks made easy (Part 55): Contrastive intrinsic control (CIC).

Contrastive training is an unsupervised representation learning method. Its goal is to teach a model to highlight similarities and differences in data. In this article, we will talk about using contrastive training approaches to explore different Actor skills.

The Contrastive Intrinsic Control algorithm begins by training the Agent in the environment using feedback and collecting trajectories of states and actions. Representations are then learned with Contrastive Predictive Coding (CPC), which motivates the Agent to extract key features from states and actions and forms representations that account for the dependencies between successive states.
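The article's implementation is in MQL5; purely as an illustration of the contrastive idea behind CPC, here is a minimal numpy sketch of an InfoNCE-style loss over already-embedded state transitions and skill vectors. The function and variable names are illustrative assumptions, not the article's code.

```python
import numpy as np

def info_nce_loss(transition_emb: np.ndarray, skill_emb: np.ndarray,
                  temperature: float = 0.5) -> float:
    """transition_emb, skill_emb: (batch, dim) L2-normalised embeddings.
    Row i of each matrix is a positive pair; all other rows act as negatives."""
    logits = transition_emb @ skill_emb.T / temperature       # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(logits.shape[0])
    # maximise the log-probability of the matching (diagonal) pairs
    return float(-log_probs[idx, idx].mean())
```

Minimising this loss pulls each transition embedding toward its own skill embedding and pushes it away from the other skills in the batch, which is the "highlight similarities and differences" behaviour described above.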

Intrinsic rewards play an important role in determining which behavioral strategies should be maximized. CIC maximizes the entropy of transitions between states, which promotes diversity in Agent behavior. This allows the Agent to explore and create a variety of behavioral strategies.
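The article does not reproduce the formula in this summary, but CIC-style methods commonly approximate the entropy of state transitions with a particle (k-nearest-neighbour) estimator in the representation space: the farther an embedded transition lies from its neighbours in the batch, the larger the intrinsic reward. A hedged numpy sketch of that idea, with illustrative names:

```python
import numpy as np

def knn_entropy_reward(embeddings: np.ndarray, k: int = 12) -> np.ndarray:
    """Particle-based entropy proxy: reward each sample by the log distance
    to its k-th nearest neighbour within the batch. embeddings: (batch, dim),
    with batch size larger than k."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    kth = np.sort(d, axis=1)[:, k - 1]          # distance to the k-th neighbour
    return np.log(1.0 + kth)                    # larger distance -> larger reward
```

Rewarding samples that sit far from their neighbours pushes the Agent toward transitions it has rarely produced, which is what drives the behavioural diversity mentioned above.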

After generating a variety of skills and strategies, the CIC algorithm uses the Discriminator to instantiate the skill representations. The Discriminator aims to ensure that states are predictable and stable. In this way, the Agent learns to "use" skills in predictable situations.
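As a simple illustration only (again, not the article's MQL5 code), the Discriminator can be thought of as a scoring function over (transition, skill) pairs. Once trained, it lets us ask which skill best explains an observed transition, i.e. in which situations a given skill behaves predictably:

```python
import numpy as np

def most_consistent_skill(transition_emb: np.ndarray,
                          skill_bank: np.ndarray) -> int:
    """Return the index of the skill whose embedding best explains the
    observed transition, i.e. the pair the Discriminator scores highest.
    transition_emb: (dim,), skill_bank: (n_skills, dim)."""
    scores = skill_bank @ transition_emb        # (n_skills,) similarity scores
    return int(np.argmax(scores))
```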

The combination of exploration motivated by intrinsic rewards and the use of skills for predictable actions creates a balanced approach for creating varied and effective strategies.

As a result, the Contrastive Intrinsic Control algorithm encourages the Agent to detect and learn a wide range of behavioral strategies, while ensuring stable learning. Below is the custom algorithm visualization.

Custom algorithm visualization

Author: Dmitriy Gizlyk

 
Hello. I can't get a positive result in Research. It just draws a straight line. It seems there is a limit on the results in the code.
 
star-ik #:
Hello. I can't get a positive result in Research. It just draws a straight line. It seems there is a limit on the results in the code.

At what stage? First run with random parameters? After running Pretrain? Or Finetune?

 
I don't end up in profit at any stage.
 
star-ik #:
I don't end up in profit at any stage.

The first stages are pre-training, which is about exploring the environment and learning the Actor's skills. No external reward is used here at all; we train the Actor to develop multiple skills, so we do not expect profitable passes. The external reward is used only at the final Finetune stage, when we train the Planner to manage the Actor's skills for the task at hand. The results depend directly on the completeness of the first two iterations.

 
What are Finetune's acceptable error rates? And when will the file be written to the Tester folder?
 
I've got Research working; I'm in profit.
 
Hello again. There is one point I can't understand. What is the point of setting a take profit if it is trailed? It will never trigger that way.
 
star-ik #:
Hello again. There is one point I can't understand. What is the point of setting a take profit if it is trailed? It will never trigger that way.

First of all, it is a risk management tool, a defence against sharp, large moves. Besides, we are training the model: theoretically, the stop loss and take profit need not be larger than the size of a candle. During training we look for the most profitable strategy.

 
Another question. After Research has gone into profit, can I run the rest repeatedly on this data? The thing is, it goes back into the red again and spoils the statistics.
 
Dmitriy, you have already published a new article and may not be coming back here, but I will still try to ask a question. Please tell me, did your Expert Advisor execute trades in both directions? For some reason, mine only buys. Is it worth continuing to struggle with it?