
Check out the new article: Neural Networks Made Easy (Part 97): Training Models With MSFformer.
When exploring various model architecture designs, we often devote insufficient attention to the process of model training. In this article, I aim to address this gap.
The initial training dataset gives the model its first understanding of the environment. However, the financial markets are so multifaceted that no training set can fully capture them. Additionally, the dependencies that the model learns between the analyzed indicators and profitable trades may turn out to be false or incomplete, because the training set may lack the examples that would reveal such discrepancies. Therefore, during the training process we will need to refine the training dataset, and at this stage the approach to collecting additional data differs from the one used for the initial collection.
The task at this stage is to optimize the Actor's learned policy. To achieve this, we need data that lies relatively close to the trajectory of the current Actor policy, which allows us to see how the reward changes when actions deviate from that policy. With this information, we can increase the profitability of the current policy by shifting it in the direction that maximizes the reward.
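To make the idea of "the direction of reward changes" concrete, here is a minimal Python sketch (not from the article's code base) of a crude finite-difference estimate: actions are sampled near the current policy output, each is evaluated, and the reward-weighted average of the deviations points towards higher reward. The action values and the toy reward function are purely illustrative assumptions.

```python
import numpy as np

def estimate_ascent_direction(policy_action, perturbed_actions, rewards):
    """Estimate the direction of reward growth around the current policy action.

    policy_action    : action produced by the current policy, shape (n,)
    perturbed_actions: actions sampled near the policy action, shape (k, n)
    rewards          : reward observed for each perturbed action, shape (k,)

    Returns a reward-weighted average of the deviations, i.e. a rough
    finite-difference estimate of the reward gradient w.r.t. the action.
    """
    deviations = perturbed_actions - policy_action      # (k, n)
    advantages = rewards - rewards.mean()                # center the rewards
    # Deviations that led to above-average reward dominate the estimate
    return (advantages[:, None] * deviations).mean(axis=0)

# Illustrative usage with a toy quadratic reward (all names are hypothetical)
rng = np.random.default_rng(0)
a0 = np.array([0.2, -0.1])                               # current policy action
samples = a0 + 0.05 * rng.standard_normal((32, 2))       # cloud of nearby actions
rewards = -np.sum((samples - np.array([0.5, 0.0]))**2, axis=1)
direction = estimate_ascent_direction(a0, samples, rewards)
a1 = a0 + 0.1 * direction                                # step towards higher reward
```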
There are various approaches to achieve this, and the choice may depend on factors such as the model architecture. For instance, with a stochastic policy we can simply run several Actor passes with the current policy in the strategy tester: the stochastic head handles the exploration itself, since the randomness of the Actor's actions covers the region of the action space we are interested in, and we can then retrain the model on the updated data. In the case of a deterministic Actor policy, where the model establishes an explicit mapping from the environmental state to the action, we can add some noise to the Agent's actions to create a cloud of actions around the current Actor policy.
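The two data-collection modes can be sketched as follows. This is a minimal Python illustration under assumed action shapes and noise scales, not the article's MQL5 implementation: the stochastic head samples from its own distribution, while the deterministic action is perturbed with Gaussian noise and clipped back into the valid range.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_stochastic(mean, log_std):
    """Stochastic head: the policy outputs a distribution, so repeated
    passes naturally produce a cloud of actions around the mean."""
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

def perturb_deterministic(action, noise_scale=0.05, low=-1.0, high=1.0):
    """Deterministic head: add explicit Gaussian noise to the action
    and clip the result back into the valid action range."""
    noisy = action + noise_scale * rng.standard_normal(action.shape)
    return np.clip(noisy, low, high)

# Collecting extra samples near the current policy (values are illustrative)
mean, log_std = np.array([0.3, -0.2]), np.array([-2.0, -2.0])
det_action = np.array([0.3, -0.2])

stochastic_batch    = [sample_stochastic(mean, log_std) for _ in range(10)]
deterministic_batch = [perturb_deterministic(det_action) for _ in range(10)]
```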
In both cases, it is convenient to use the slow optimization mode of the strategy tester to collect additional data for the training dataset.
Author: Dmitriy Gizlyk