Article "Neural networks made easy (Part 29): Advantage Actor-Critic algorithm" - page 2

 
         double reward = Rates[i - 1].close - Rates[i - 1].open;
         switch(action)
           {
            case 0:
               if(reward < 0)
                  reward *= -20;
               else
                  reward *= 1;
               break;
            case 1:
               if(reward > 0)
                  reward *= -20;
               else
                  reward *= -1;
               break;
            default:
               if(batch == 0)
                  reward = -fabs(reward);
               else
                 {
                  switch((int)vActions[batch - 1])
                    {
                     case 0:
                        reward *= -1;
                        break;
                     case 1:
                        break;
                     default:
                        reward = -fabs(reward);
                        break;
                    }
                 }
               break;
           }

Could you explain the reward calculation code above in detail? In Part 27 the reward policy was as follows, which differs from this code:

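  1. A profitable position receives a reward equal to the candlestick body size (the system state is analyzed on every candlestick; we are in the position from the candlestick's open to its close).
  2. The "out of the market" state is penalized by the candlestick body size (the body size with a negative sign, representing lost profit).
  3. A losing position is penalized by double the candlestick body size (loss + lost profit).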
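
To make the difference concrete, here is a minimal sketch in plain C++ (not MQL5) that puts the two policies side by side. `Part27Reward` and `QuotedReward` are hypothetical helper names, not functions from the article. The mapping "action 0 = buy, 1 = sell, anything else = out of the market" is an assumption on my part, and `prevAction` stands in for `(int)vActions[batch - 1]` from the snippet.

```cpp
#include <cmath>

// Hypothetical helper: the Part 27 policy as described in the list above.
// Assumption: action 0 = buy, 1 = sell, anything else = out of the market.
double Part27Reward(int action, double open, double close)
{
    const double body = close - open;            // signed candlestick body
    if (action != 0 && action != 1)
        return -std::fabs(body);                 // rule 2: lost profit
    const double profit = (action == 0) ? body : -body;
    return (profit >= 0.0) ? profit              // rule 1: body size
                           : 2.0 * profit;       // rule 3: loss + lost profit
}

// Hypothetical helper: a direct transcription of the quoted Part 29 snippet.
// prevAction replaces (int)vActions[batch - 1]; it is ignored when batch == 0.
double QuotedReward(int action, double open, double close,
                    int batch, int prevAction)
{
    double reward = close - open;
    switch (action)
    {
    case 0:
        if (reward < 0)
            reward *= -20;                       // as written, the result is positive
        // else: "reward *= 1" leaves the value unchanged
        break;
    case 1:
        if (reward > 0)
            reward *= -20;                       // up candle: 20x penalty
        else
            reward *= -1;                        // down candle: positive reward
        break;
    default:
        if (batch == 0)
            reward = -std::fabs(reward);
        else
        {
            switch (prevAction)
            {
            case 0:  reward *= -1;                 break;
            case 1:                                break;
            default: reward = -std::fabs(reward);  break;
            }
        }
        break;
    }
    return reward;
}
```

For a down candle with open 1.0 and close 0.5, `Part27Reward(0, 1.0, 0.5)` gives -1.0 (double the body as a penalty), while `QuotedReward(0, 1.0, 0.5, 0, 0)` gives 10.0, because a negative body multiplied by -20 becomes positive. So, read literally, case 0 of the quoted code never produces a negative reward, whereas case 1 applies a 20x penalty on an up candle. Whether that asymmetry is intended is exactly what I would like clarified.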
Discussion of article "Neural networks made easy (Part 29): Advantage Actor-Critic algorithm"
  • 2022.11.25
  • MetaQuotes
  • www.mql5.com
New article Neural networks made easy (Part 29): Advantage Actor-Critic algorithm has been published: Author: Dmitriy Gizlyk...
Attached files:
Capture.PNG  15 kb