Discussing the article: "Neural networks made easy (Part 61): Optimism issue in offline reinforcement learning"
Neural Networks - It's Simple (Part 61)
Part 61, can you see the result in monetary terms?
I must say a big thank you to the author, who takes a purely theoretical article and explains in popular language how it can:
a) apply it in trading,
b) program and test it in a strategy tester.
Take a look at the original article and see for yourself what kind of work Dmitry has done - https://arxiv.org/abs/2207.10295.

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
- arxiv.org
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We...

You are missing trading opportunities:
- Free trading apps
- Over 8,000 signals for copying
- Economic news for exploring financial markets
Registration
Log in
You agree to website policy and terms of use
If you do not have an account, please register
Check out the new article: Neural networks made easy (Part 61): Optimism issue in offline reinforcement learning.
During the offline learning, we optimize the Agent's policy based on the training sample data. The resulting strategy gives the Agent confidence in its actions. However, such optimism is not always justified and can cause increased risks during the model operation. Today we will look at one of the methods to reduce these risks.
Recently, offline reinforcement learning methods have become widespread, which promises many prospects in solving problems of varying complexity. However, one of the main problems that researchers face is the optimism that can arise while learning. The agent optimizes its strategy based on the data from the training set and gains confidence in its actions. But the training set is quite often not able to cover the entire variety of possible states and transitions of the environment. In a stochastic environment, such confidence turns out to be not entirely justified. In such cases, the agent's optimistic strategy may lead to increased risks and undesirable consequences.
Author: Dmitriy Gizlyk