Discussing the article: "Neural networks made easy (Part 58): Decision Transformer (DT)"

 

Check out the new article: Neural networks made easy (Part 58): Decision Transformer (DT).

We continue to explore reinforcement learning methods. In this article, I will focus on a slightly different algorithm that considers the Agent’s policy in the paradigm of constructing a sequence of actions.

In this series, we have already examined a fairly wide range of reinforcement learning algorithms. They all follow the same basic loop (sketched in code after the list):

  1. The Agent analyzes the current state of the environment.
  2. It takes the optimal action (within the framework of its learned Policy, i.e. behavior strategy).
  3. It moves into a new state of the environment.
  4. It receives a reward from the environment for completing the transition to the new state.
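The following is a minimal illustrative sketch of this loop in Python. The `env` and `agent` objects are hypothetical placeholders (not code from the article), shown only to make the four steps above concrete.

```python
# Illustrative agent-environment interaction loop.
# `env` and `agent` are hypothetical stand-ins for an environment and a trained policy.

def run_episode(env, agent):
    state = env.reset()                                   # 1. observe the current state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.act(state)                         # 2. act according to the learned Policy
        next_state, reward, done = env.step(action)       # 3. move into a new state
        agent.observe(state, action, reward, next_state)  # 4. receive the reward for the transition
        state = next_state
        total_reward += reward
    return total_reward
```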

This sequence rests on the principles of a Markov decision process: the starting point is always the current state of the environment, and the optimal action in that state does not depend on the path that led to it.
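In standard notation, this Markov property can be written as follows: the next state depends only on the current state and action, not on the earlier history.

```latex
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)
```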


I want to introduce an alternative approach presented by the Google team in the article "Decision Transformer: Reinforcement Learning via Sequence Modeling" (June 2, 2021). The key idea of this work is to recast the reinforcement learning problem as conditional sequence modeling: an autoregressive model generates actions conditioned on the desired return (reward-to-go).
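To make the idea of conditioning on the desired return more tangible, here is a minimal sketch, under my own simplifying assumptions, of how a trajectory is turned into a token sequence of (return-to-go, state, action) triplets and how the next action is requested from an autoregressive model. The `model.predict_next_action` call and the token layout are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of the Decision Transformer framing: actions are generated autoregressively,
# conditioned on the remaining desired reward (return-to-go) and the observed states.

def returns_to_go(rewards):
    """Reward still to be collected from each step onward (the conditioning signal)."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def select_action(model, rtg_seq, state_seq, action_seq):
    """Interleave (return-to-go, state, action) tokens and ask the model for the next action."""
    tokens = []
    for i, (rtg, state) in enumerate(zip(rtg_seq, state_seq)):
        tokens.extend([("rtg", rtg), ("state", state)])
        if i < len(action_seq):                 # past steps already have actions
            tokens.append(("action", action_seq[i]))
    # A causal (autoregressive) transformer attends to the whole prefix
    # and outputs the action that should follow the latest (rtg, state) pair.
    return model.predict_next_action(tokens)    # hypothetical model interface
```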

Author: Dmitriy Gizlyk
