6. Batch normalization feed-forward methods

We continue building the batch normalization class and, along the way, deepening our understanding of how neural networks are structured and organized. Earlier, we discussed various architectures for constructing neural layers to solve practical tasks. The batch normalization layer is just as important to the operation of a neural network, although its contribution is less obvious: it is hidden inside the internal processes of the network itself and serves primarily to stabilize our model.

We have already built the class initialization methods. Now it is time to implement the methods that perform the actual computations. We begin with the FeedForward method. This method is declared virtual in the CNeuronBase neural layer base class of our library and is overridden in each new class.

I would like to remind you that this approach allows us to avoid writing our own dispatch methods and functions that would redirect information flows and call different methods depending on the class of the object being used. In practice, we can simply assign a pointer to any derived object to a local variable of the neural layer base class and call the method declared in that base class. The system performs all the dispatching without our involvement and calls the method implementation that corresponds to the actual type of the object.

This is exactly the property we exploit by declaring the method parameter as a pointer to the neural layer base class. A pointer to any of the neural layer objects in our library can be passed in this parameter, and we can work with it through the overridden virtual functions.
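As a minimal sketch of this mechanism (prev_layer here is a hypothetical pointer to the preceding layer; any class derived from CNeuronBase behaves the same way):

   CNeuronBase      *layer = NULL;
   CNeuronBatchNorm *batch_norm = new CNeuronBatchNorm();
//--- store the derived object through a base class pointer
   layer = batch_norm;
//--- the virtual call is resolved by the actual object type,
//--- so CNeuronBatchNorm::FeedForward is executed here
   if(!layer.FeedForward(prev_layer))
      Print("Feed-forward pass failed");
   delete batch_norm;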

The operation of the feed-forward method itself starts with a control block for checking pointers to the objects used by the method. Here we check both the pointer to the object of the previous layer obtained in the parameters and pointers to internal objects.

bool CNeuronBatchNorm::FeedForward(CNeuronBase *prevLayer)
  {
//--- control block
   if(!prevLayer || !prevLayer.GetOutputs() || !m_cOutputs ||
      !m_cWeights || !m_cActivation)
      return false;

Please note that, along with the other objects, we also check the pointer to the activation function object. Although the batch normalization algorithm itself does not use an activation function, we will not limit the user and will leave them the option to apply an activation function as they see fit. Moreover, there are practical cases where applying an activation function after data normalization is beneficial. For example, the method's authors recommend normalizing data immediately before applying the activation function. At first glance, such an approach would seem to require modifications to every previously discussed class. However, we can achieve the same effect without changing the existing classes: we simply declare the required neural layer without an activation function, followed by a normalization layer with the desired activation function (see the sketch below). Therefore, I believe that supporting activation in our class is justified.
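A rough sketch of this layer sequence when describing a network architecture is shown below. The description object and the field and constant names used here (CLayerDescription, defNeuronBase, defNeuronBatchNorm, AF_NONE, AF_TANH, batch) are assumptions for illustration only and should be replaced with the identifiers actually used by your model description code.

//--- a fully connected layer declared without an activation function
   CLayerDescription *dense = new CLayerDescription();
   dense.type       = defNeuronBase;       // assumed layer type identifier
   dense.count      = 100;
   dense.activation = AF_NONE;             // activation deliberately disabled here
//--- a batch normalization layer that applies the desired activation
   CLayerDescription *norm = new CLayerDescription();
   norm.type       = defNeuronBatchNorm;   // assumed layer type identifier
   norm.count      = 100;
   norm.batch      = 1000;                 // normalization batch size (assumed field name)
   norm.activation = AF_TANH;              // activation applied after normalization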

Next, we branch the algorithm for the case when the normalization batch size is 1 or less. With a batch of 1, no normalization is performed: we simply pass the tensor of the original data to the output of the neural layer. After copying the data between buffers, we call the activation method, verify the results of the operations, and exit the method.

//--- check the size of the normalization batch
   if(m_iBatchSize <= 1)
     {
      m_cOutputs.m_mMatrix = prevLayer.GetOutputs().m_mMatrix;
      if(m_cOpenCL && !m_cOutputs.BufferWrite())
         return false;
      if(!m_cActivation.Activation(m_cOutputs))
         return false;
      return true;
     }

Now we need to implement the main algorithm of the method. Following the concept we have adopted, we create two variants of the implementation: one using standard MQL5 tools and one using multi-threaded computations with OpenCL. Therefore, we add another branch depending on the computational device chosen by the user. In this section, we consider the MQL5 implementation; we will return to the OpenCL implementation in further sections.

//--- branching of the algorithm over the computing device
   if(!m_cOpenCL)
     {

We start the MQL5 block with a little preparatory work. To simplify access to the data, we copy the sequence of source data into a local matrix and reshape it into a single row.

      MATRIX inputs = prevLayer.GetOutputs().m_mMatrix;
      if(!inputs.Reshape(1, prevLayer.Total()))
         return false;

According to the data normalization algorithm, we first find the mean value. When designing the architecture of our solution, we decided to use an exponential moving average, which is determined by the formula:

μ = (μ_prev * (N − 1) + x) / N,

where N is the normalization batch size, μ_prev is the mean accumulated on the previous step, and x is the current input value.

      VECTOR mean = (m_cBatchOptions.Col(0) * ((TYPE)m_iBatchSize - 1.0) + 
                     inputs.Row(0)) / (TYPE)m_iBatchSize;

After determining the moving average, we find the variance in a similar way, as an exponential moving average of the squared deviation of the current value from the mean.

      VECTOR delt = inputs.Row(0) - mean;
      VECTOR variance = (m_cBatchOptions.Col(1) * ((TYPE)m_iBatchSize - 1.0) +
                         MathPow(delt, 2)) / (TYPE)m_iBatchSize;

Once the mean and variance values are found, we can easily compute the normalized value of the current element in the sequence.

      VECTOR std = sqrt(variance) + 1e-32;
      VECTOR nx = delt / std;

Note that we add a small constant to the standard deviation to eliminate the potential zero divide error.

The next step of the batch normalization algorithm is scaling and shifting the normalized value using the trainable parameters, which are stored in the first and second columns of the m_cWeights matrix.

      VECTOR res = m_cWeights.Col(0) * nx + m_cWeights.Col(1);

After that, we only need to save the obtained values into the respective elements of the buffers. Please note that we save not only the results of the algorithm operations in the result buffer but also our intermediate values in the normalization parameters buffer. We will need them in subsequent iterations of the algorithm. Do not forget to check the results of the operations.

      if(!m_cOutputs.Row(res, 0) ||
         !m_cBatchOptions.Col(mean, 0) ||
         !m_cBatchOptions.Col(variance, 1) ||
         !m_cBatchOptions.Col(nx, 2))
         return false;
     }
   else  // OpenCL block
     {
      return false;
     }

This completes the branching of the algorithm depending on the computing device used. As always, we leave a temporary stub in the OpenCL block in the form of a false return value; we will come back to this part later.

Now, before exiting the method, we activate the values in the result buffer of our class. To do this, we call the Activation method of m_cActivation, our dedicated object for working with the activation function. After checking the result of the operation, we exit the method.

   if(!m_cActivation.Activation(m_cOutputs))
      return false;
//---
   return true;
  }

With that, we conclude our work on the feed-forward method of the CNeuronBatchNorm batch normalization class. I hope that understanding the logic behind its construction wasn't difficult for you. Now, let's move on to building the backpropagation methods.