6. Backpropagation methods for Dropout

Traditionally, after implementing the feed-forward algorithm, we move on to organizing the backpropagation process. As you know, in the base class of the neural layer, the backpropagation algorithm is implemented by four virtual methods:

  • CalcOutputGradient for calculating the error gradient at the output of a neural network
  • CalcHiddenGradient for propagating a gradient through a hidden layer
  • CalcDeltaWeights for calculating the weight correction values
  • UpdateWeights for updating the weight matrix

All the above methods are overridden in new classes as needed. As mentioned earlier, our Dropout layer does not contain trainable parameters and, consequently, has no weight matrix. Thus, the last two methods are not relevant to our class. At the same time, we still have to override them to maintain the integrity of our model architecture, because during training the model calls these methods for all the neural layers it uses. If we do not override them, calls to these methods will execute the inherited parent implementation. In that case, the absence of a weight matrix buffer and the related objects can lead to critical errors; at best, the validity checks will terminate the method with a false result and interrupt the training process. Therefore, we override these methods with empty stubs that always return a positive result.

   virtual bool      CalcDeltaWeights(CNeuronBase *prevLayer, bool read)
                                                         override { return true; }
   virtual bool      UpdateWeights(int batch_size, TYPE learningRate,
                          VECTOR &Beta, VECTOR &Lambda) override { return true; }

The CalcOutputGradient method is used only for the results layer, and the operating principles of Dropout do not imply using this layer as a results layer. Therefore, we do not override it.

Thus, we only have one method left to override: the CalcHiddenGradient method, which propagates the error gradient through the hidden layer. Like most of the previous methods, it is declared as virtual in the neural layer base class and is overridden in all new classes to implement the specific algorithm of the neural layer. In the parameters, the method receives a pointer to the object of the previous layer. At the very beginning of the method body, we set up a control block to verify the validity of the pointers to the objects used by the method. As in the feed-forward method, we check the pointers to all used objects, both external and internal.

bool CNeuronDropout::CalcHiddenGradient(CNeuronBase *prevLayer)
  {
//--- control block
   if(!prevLayer || !prevLayer.GetGradients() || !m_cGradients)
      return false;

After successfully passing the control block, we need to branch the algorithm depending on the computing device. As always, in this section we will consider the implementation of the algorithm using standard MQL5 tools and will return to the multi-threaded implementation of the algorithm in the next section.

//--- branching of the algorithm depending on the execution device
   ulong total = m_cOutputs.Total();
   if(!m_cOpenCL)
     {

In the implementation block using standard MQL5 tools, we first check the layer's operating mode. In operational use mode, we simply copy the data from the error gradient buffer of the current layer into the corresponding buffer of the previous layer.

      //--- check the operating mode flag
      if(!m_bTrain)
         prevLayer.GetGradients().m_mMatrix = m_cGradients.m_mMatrix;
      else
         prevLayer.GetGradients().m_mMatrix = m_cGradients.m_mMatrix *
                                              m_cDropOutMultiplier.m_mMatrix;
     }
   else  // OpenCL block
     {
      return false;
     }
//---
   return true;
  }

If the method operates in model training mode, then, according to the Dropout algorithm, we need to multiply the error gradient buffer of the current layer element-by-element by the masking vector buffer. The element-wise matrix multiplication operation allows us to do this in literally one line of code.
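
To make the reasoning explicit, here is a short sketch of why multiplying by the same masking vector is correct. Assuming, as in the feed-forward pass, that each element of the masking vector is either 0 for a dropped neuron or the scaling coefficient for a kept one, the layer output is a simple element-wise product, so the chain rule gives:

$$O_i = I_i \cdot m_i \quad\Rightarrow\quad \frac{\partial E}{\partial I_i} = \frac{\partial E}{\partial O_i}\cdot\frac{\partial O_i}{\partial I_i} = \frac{\partial E}{\partial O_i}\cdot m_i$$

Here $I_i$ is an element of the layer input, $O_i$ the corresponding output, $m_i$ the mask element, and $E$ the error function. The gradient is therefore zeroed for dropped neurons and scaled by the same coefficient for the kept ones.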

As you can see, at this stage we have passed the error gradient to the buffer of the previous layer. Thus, the task set for this method is accomplished, and we can complete its execution. For now, we add a stub in the block for organizing multi-threaded operations; we will return to it in one of the subsequent sections.
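
As a side illustration, below is a minimal standalone script sketch (with hypothetical values, not part of the CNeuronDropout class) that reproduces the same element-wise operation using MQL5 matrix operations. It assumes the inverted Dropout scheme described for the feed-forward pass, where kept neurons are scaled by 1/(1 - p), with p being the probability of dropping a neuron.

//+------------------------------------------------------------------+
//| Standalone demonstration of the element-wise gradient masking    |
//+------------------------------------------------------------------+
void OnStart()
  {
   double p = 0.3;                        // example dropout probability
   double k = 1.0 / (1.0 - p);            // scaling coefficient for kept neurons
   matrix grad(1, 4);                     // error gradient coming from the next layer
   matrix mask(1, 4);                     // masking vector: 0 = dropped neuron
   grad[0][0] = 0.5;  grad[0][1] = 0.2;  grad[0][2] = 0.8;  grad[0][3] = 0.1;
   mask[0][0] = k;    mask[0][1] = 0;    mask[0][2] = k;    mask[0][3] = k;
//--- element-wise product: dropped neurons receive a zero gradient,
//--- the rest are scaled by the same coefficient as in the feed-forward pass
   matrix prev = grad * mask;
   PrintFormat("%.4f %.4f %.4f %.4f", prev[0][0], prev[0][1], prev[0][2], prev[0][3]);
//--- expected output: 0.7143 0.0000 1.1429 0.1429
  }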

Thus, we have fully implemented the Dropout algorithm using standard MQL5 tools. At this stage, you can already create a model and obtain initial results using this approach. However, as we have discussed before, for any neural layer to be fully functional within the model, it is equally important to be able to restore a previously trained model at any convenient time. Therefore, in the next section, we will look at the methods for saving the neural layer data and restoring the layer's functioning from previously saved data.