Discussing the article: "Neural Networks in Trading: Actor—Director—Critic"

 

Check out the new article: Neural Networks in Trading: Actor—Director—Critic.

We invite you to explore the Actor-Director-Critic framework, which combines hierarchical learning and a multi-component architecture for creating adaptive trading strategies. In this article, we take a detailed look at how using the Director to classify the Actor's actions helps to effectively optimize trading decisions and improve the robustness of models in financial market conditions.

In financial applications, the Actor–Critic architecture is commonly used to build Agents capable of forecasting short-term returns while managing long-term risk. For example, in portfolio rebalancing tasks, the Critic learns to estimate expected returns, while the Actor selects asset weights that maximize portfolio value. However, even this advanced architecture has limitations. During the early stages of training, the Critic's estimates may be highly inaccurate, causing the Actor to receive misleading signals. As a result, the Agent may repeatedly explore action-space regions that are known to be unprofitable.

To address this limitation, the paper "Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework" introduced a new framework: Actor—Director—Critic (ADC). In addition to the Actor and Critic, the architecture incorporates a third component — the Director. Its role is to act as a classifier capable of distinguishing high-quality actions from poor ones even before the Critic has learned to provide reliable evaluations. Unlike the Critic, the Director performs a classification rather than an evaluation function. It determines whether a particular action should be used to train the policy or whether it is inherently low-quality and should be excluded from further consideration.

The introduction of the Director offers several advantages. First, selectivity is critically important during the early stages of training, where ineffective actions should be avoided whenever possible. Second, in environments with high transaction costs and market volatility, every unsuccessful action can be expensive for the Actor. Under such conditions, the Director serves as an initial guidance mechanism for the Actor, enabling it to focus on potentially effective actions. This approach reduces exploration entropy and accelerates the formation of productive strategies.


Author: Dmitriy Gizlyk