Creating a neural layer using MQL5 tools

When starting to implement a fully connected neural layer, it should be taken into account that this will be the base class for all subsequent architectural solutions of neural layers. Therefore, we must make it as versatile as possible, allow for future expansion of its functionality, and make it easy to integrate such extensions into the existing solution.

Let's start by creating our neural layer base class CNeuronBase inherited from the CObject class. We define the internal variables of the class:

  • m_cOpenCL — a pointer to an instance of the class for working with OpenCL technology
  • m_cActivation — a pointer to an activation function object
  • m_eOptimization — the type of neuron optimization method during training
  • m_cOutputs — an array of values at the output of neurons
  • m_cWeights — an array of weights
  • m_cDeltaWeights — an array for accumulating outstanding weight updates (cumulative error gradient for each weight since the last update)
  • m_cGradients — the error gradient at the output of the neural layer as a result of the last iteration of the backward pass
  • m_cMomenum — unlike other variables, this will be an array of two elements for recording pointers to arrays of accumulated moments

To facilitate access to these variables from derived classes, all of them are declared in the protected section of the class.

In the class constructor, we initialize the above variables with default parameters. I have specified Adam as the optimization method and Swish as the activation function, but you can choose your preferred optimization method and activation function. We leave the pointer to the class for working with OpenCL empty and create instances of all other classes used.

CNeuronBase::CNeuronBase(void)   : m_eOptimization(Adam)
  {
   m_cOpenCL = NULL;
   m_cActivation = new CActivationSwish();
   m_cOutputs = new CBufferType();
   m_cWeights = new CBufferType();
   m_cDeltaWeights = new CBufferType();
   m_cGradients = new CBufferType();
   m_cMomenum[0] = new CBufferType();
   m_cMomenum[1] = new CBufferType();
  }

We immediately create a class destructor so we don't forget about memory cleanup after the class finishes its work.

CNeuronBase::~CNeuronBase(void)
  {
   if(!!m_cActivation)
      delete m_cActivation;
   if(!!m_cOutputs)
      delete m_cOutputs;
   if(!!m_cWeights)
      delete m_cWeights;
   if(!!m_cDeltaWeights)
      delete m_cDeltaWeights;
   if(!!m_cGradients)
      delete m_cGradients;
   if(!!m_cMomenum[0])
      delete m_cMomenum[0];
   if(!!m_cMomenum[1])
      delete m_cMomenum[1];
  }

Next, we create a neural layer initialization method. In the parameters, the method receives a pointer to a CLayerDescription object with a description of the layer to be created. To avoid getting lost in the intricacies of the method algorithm, I suggest breaking it down into separate logical blocks.

The method starts with a block in which we check the input parameters. First, we check the validity of the pointer to the description object. Then we check the type of the layer being created and the number of neurons in the layer: each layer should have at least one neuron because, from the logical perspective of constructing a neural network, a layer without neurons blocks the passage of the signal and paralyzes the entire network. Note that when checking the type of the created layer, we use the virtual method Type and not the constant defNeuronBase which it returns. This is a very important point for future class inheritance. If we compared against the constant, the check in descendant classes would always fail when creating any layer other than the base one. Using a virtual method allows us to obtain the type identifier of the final derived class, so the comparison between the specified type of neural layer and the object being created yields the correct result.

bool CNeuronBase::Init(const CLayerDescription *desc)
  {
//--- source data control block
   if(!desc || desc.type != Type() || desc.count <= 0)
      return false;
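
To see why the virtual call matters, here is a hypothetical descendant; the CNeuronDerived class and the defNeuronDerived constant are illustrative names only and are not part of the library. By overriding Type, the derived class makes the desc.type != Type() check compare the description against its own identifier rather than against defNeuronBase.

#define defNeuronDerived   0x7901        // illustrative identifier, not from the library
//--- hypothetical descendant used only to illustrate the virtual Type() call
class CNeuronDerived  :  public CNeuronBase
  {
public:
   virtual int       Type(void)   const { return(defNeuronDerived); }
  };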

In the next block, we verify the validity of the previously created buffers for the results at the output of the neural layer and the error gradient for them (creating new instances of the class if necessary). We then initialize both buffers with zero values.

//--- creating a results buffer
   if(!m_cOutputs)
      if(!(m_cOutputs = new CBufferType()))
         return false;
   if(!m_cOutputs.BufferInit(1, desc.count, 0))
      return false;
//--- creating error gradient buffer
   if(!m_cGradients)
      if(!(m_cGradients = new CBufferType()))
         return false;
   if(!m_cGradients.BufferInit(1, desc.count, 0))
      return false;

After that, we check the number of elements of the input signal. In the case of using the neural layer as an array of incoming signals, we will not have preceding neural layers, and other data buffers will not be required. We can remove them without any problem and clear the memory. Then we check the validity of the pointer to the object in m_cOpenCL and, if the result is positive, we create a copy of the data buffer in the OpenCL context.

//--- removing unused features for the source data layer
   if(desc.window <= 0)
     {
      if(m_cActivation)
         delete m_cActivation;
      if(m_cWeights)
         delete m_cWeights;
      if(m_cDeltaWeights)
         delete m_cDeltaWeights;
      if(m_cMomenum[0])
         delete m_cMomenum[0];
      if(m_cMomenum[1])
         delete m_cMomenum[1];
      if(m_cOpenCL)
         if(!m_cOutputs.BufferCreate(m_cOpenCL))
            return false;
      m_eOptimization = desc.optimization;
      return true;
     }

Further method code is executed only if there are previous neural layers. Let's create and initialize an instance of the activation function object. We have moved this process to a separate method, SetActivation, which we simply call here. We will examine the algorithm of the SetActivation method a bit later.

//--- initializing an activation function object
   VECTOR ar_temp = desc.activation_params;
   if(!SetActivation(desc.activation, ar_temp))
      return false;

The next step is to initialize the matrix of weights. We determine the number of elements in the matrix and initialize it with random values using the Xavier method. In the case of using LReLU as an activation function, we will use the He method.

//--- initializing a weight matrix object
   if(!m_cWeights)
      if(!(m_cWeights = new CBufferType()))
         return false;
   if(!m_cWeights.BufferInit(desc.count, desc.window + 1, 0))
      return false;
   double weights[];
   double sigma = (desc.activation == AF_LRELU ?
                  2.0 / (double)(MathPow(1 + desc.activation_params[0], 2)
                                                           * desc.window) :
                  1.0 / (double)desc.window);
   if(!MathRandomNormal(0, MathSqrt(sigma), m_cWeights.Total(), weights))
      return false;
   for(uint i = 0; i < m_cWeights.Total(); i++)
      if(!m_cWeights.m_mMatrix.Flat(i, (TYPE)weights[i]))
         return false;

We still need to initialize the buffers for deltas and moments. The size of the buffers will be equal to the size of the weight matrix, and we will initialize them with zero values. Remember that not all optimization methods use the moment matrices in the same way. Therefore, we will initialize the matrices of moments depending on the optimization method. We will clear and delete unnecessary arrays to free up memory for productive use.

//--- initialization of the gradient accumulation object at the weight matrix level
   if(!m_cDeltaWeights)
      if(!(m_cDeltaWeights = new CBufferType()))
         return false;
   if(!m_cDeltaWeights.BufferInit(desc.count, desc.window + 1, 0))
      return false;
//--- initializing moment objects
   switch(desc.optimization)
     {
      case None:
      case SGD:
          for(int i = 0; i < 2; i++)
            if(m_cMomenum[i])
               delete m_cMomenum[i];
         break;

      case MOMENTUM:
      case AdaGrad:
      case RMSProp:
         if(!m_cMomenum[0])
            if(!(m_cMomenum[0] = new CBufferType()))
               return false;
          if(!m_cMomenum[0].BufferInit(desc.count, desc.window + 1, 0))
            return false;
         if(m_cMomenum[1])
            delete m_cMomenum[1];
         break;

      case AdaDelta:
      case Adam:
          for(int i = 0; i < 2; i++)
           {
            if(!m_cMomenum[i])
               if(!(m_cMomenum[i] = new CBufferType()))
                  return(false);
             if(!m_cMomenum[i].BufferInit(desc.count, desc.window + 1, 0))
               return false;
           }
         break;

      default:
         return false;
         break;
     }
//--- saving parameter optimization method
   m_eOptimization = desc.optimization;
   return true;
  }

At the end of the method we save the specified weight optimization method.
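
To see the initialization in action, here is a minimal usage sketch. It assumes that CLayerDescription exposes as public members the fields accessed above (type, count, window, activation, activation_params, and optimization); the numeric values are arbitrary.

//--- a minimal usage sketch: describing and initializing a fully connected layer
CLayerDescription *desc = new CLayerDescription();
desc.type = defNeuronBase;                 // basic fully connected layer
desc.count = 10;                           // 10 neurons in the layer
desc.window = 4;                           // 4 elements in the input vector
desc.activation = AF_SWISH;                // activation function
desc.activation_params = VECTOR::Ones(2);  // parameters passed to SetActivation
desc.optimization = Adam;                  // parameter optimization method
CNeuronBase *layer = new CNeuronBase();
if(!layer.Init(desc))
   Print("Layer initialization failed");
delete desc;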

The SetOpenCL method is used to save a pointer to the object of work with the OpenCL context and looks simpler than the initialization method. However, unlike all previously considered methods, we do not terminate the method operation upon receiving an invalid pointer to the object. This is because we do not introduce a flag for the use of OpenCL technology in every neural layer class. Instead, we use a single flag in the base class of the neural network. In turn, to check the use of the technology inside the class, we can verify the validity of the pointer in the m_cOpenCL variable.

It should be noted that all objects of the neural network operate within a single OpenCL context. All objects are provided with a pointer to the same instance of the CMyOpenCL class. With such an approach, deleting that instance in one of the neural network objects invalidates the pointer in all objects that use it, and a separate flag would then no longer correspond to the actual state of the pointer. Additionally, when the use of the technology is disabled, we retain the possibility of passing an empty pointer to the object.

Therefore, the code of our method can be conditionally divided into two parts. The first part of the code will be executed when receiving an invalid pointer to the object. In this case, we need to clear all previously created data buffers in the OpenCL context.

bool CNeuronBase::SetOpenCL(CMyOpenCL *opencl)
  {
   if(!opencl)
     {
      if(m_cOutputs)
         m_cOutputs.BufferFree();
      if(m_cGradients)
         m_cGradients.BufferFree();
      if(m_cWeights)
         m_cWeights.BufferFree();
      if(m_cDeltaWeights)
         m_cDeltaWeights.BufferFree();
      for(int i = 0; i < 2; i++)
        {
         if(m_cMomenum[i])
            m_cMomenum[i].BufferFree();
        }
      if(m_cActivation)
         m_cActivation.SetOpenCL(m_cOpenCL, Rows(), Cols());
      m_cOpenCL = opencl;
      return true;
     }

The second part of the method will be executed when receiving a valid pointer to the object working with the OpenCL context. Here, we organize the creation of new data buffers in the specified OpenCL context for all objects of the current class.

   if(m_cOpenCL)
      delete m_cOpenCL;
   m_cOpenCL = opencl;
   if(m_cOutputs)
      m_cOutputs.BufferCreate(opencl);
   if(m_cGradients)
      m_cGradients.BufferCreate(opencl);
   if(m_cWeights)
      m_cWeights.BufferCreate(opencl);
   if(m_cDeltaWeights)
      m_cDeltaWeights.BufferCreate(opencl);
   for(int i = 0; i < 2; i++)
     {
      if(m_cMomenum[i])
         m_cMomenum[i].BufferCreate(opencl);
     }

   if(m_cActivation)
      m_cActivation.SetOpenCL(m_cOpenCL, Rows(), Cols());
//---
   return(!!m_cOpenCL);
  }

Earlier, we talked about isolating the activation function initialization procedure into a separate method. I suggest examining this method to complete the description of the new object initialization process. This is one of the few methods where we don't organize a block for data verification. Verification of the activation function parameters is not feasible due to the variance in the range of permissible values when using different functions. In most cases, the range of their values is limited only by common sense and the architectural requirements of the model.

As for the choice of the activation function, it is implicitly constrained to the list of allowable values of the enumeration. We create the activation function objects within the body of a switch statement, which gives us implicit control over the type of the activation function: if the specified value does not match any case, we create an instance of the base class without an activation function.

The need to create a base class is due to maintaining the functionality of the class without using an activation function in standard mode. As you will see a little later, in some cases we will use neural layers without activation functions.

bool CNeuronBase::SetActivation(ENUM_ACTIVATION_FUNCTION function, VECTOR &params)
  {
   if(m_cActivation)
      delete m_cActivation;

   switch(function)
     {
      case AF_LINEAR:
         if(!(m_cActivation = new CActivationLine()))
            return false;
         break;

      case AF_SIGMOID:
         if(!(m_cActivation = new CActivationSigmoid()))
            return false;
         break;

      case AF_LRELU:
         if(!(m_cActivation = new CActivationLReLU()))
            return false;
         break;

      case AF_TANH:
         if(!(m_cActivation = new CActivationTANH()))
            return false;
         break;

      case AF_SOFTMAX:
         if(!(m_cActivation = new CActivationSoftMAX()))
            return false;
         break;

      case AF_SWISH:
         if(!(m_cActivation = new CActivationSwish()))
            return false;
         break;

      default:
         if(!(m_cActivation = new CActivation()))
            return false;
         break;
     }

After creating an instance of the required activation function object, we pass the function parameters and a pointer to the OpenCL context object to the new object.

   if(!m_cActivation.Init(params[0], params[1]))
      return false;
   m_cActivation.SetOpenCL(m_cOpenCL, m_cOutputs.Rows(), m_cOutputs.Cols());
   return true;
  }

Feed-forward operations will be implemented in the FeedForward method. In the parameters, the method receives a pointer to the object of the previous layer. Since we are planning to build the classes of all neural layers based on one base class, we can use the class of the base neural layer in the method parameters to get a pointer to the previous layer of any type. The use of virtual access methods to the internal objects of the class allows you to build a universal interface without being tied to a specific type of neural layer.

At the beginning of the method, we check the validity of pointers to all objects used in the method. This is our initial data: the pointer to the previous layer received in the parameters, as well as the buffer of the neurons' output states contained in it. Together with them, we will check the pointers to the weight matrix and the buffer for recording the results of the forward pass of the current layer, that is, the buffer of the output states of the neurons of the current layer. Again, it's a good practice to check the pointer to the instance of the class for calculating the values of the activation function.

bool CNeuronBase::FeedForward(CNeuronBase * prevLayer)
  {
//--- control block
   if(!prevLayer || !m_cOutputs || !m_cWeights ||
      !prevLayer.GetOutputs() || !m_cActivation)
      return false;
   CBufferType *input_data = prevLayer.GetOutputs();

Then we check the pointer to the object working with OpenCL. If the pointer is valid, we move on to the block that uses this technology. We will talk about it a little later when considering the organization of parallel computing. In case of an invalid or missing pointer, we move on to the block of calculations using standard MQL5 tools. Here, we first check the consistency of the matrix sizes and reshape the source data matrix into a row vector, adding a unit element for the bias. Then we multiply it by the transposed weight matrix and write the result to the output buffer. Before exiting the method, do not forget to compute the values of the activation function at the output of the neural layer.

//---branching of the algorithm depending on the device for performing operations
   if(!m_cOpenCL)
     {
      if(m_cWeights.Cols() != (input_data.Total() + 1))
         return false;
      //---
      MATRIX m = input_data.m_mMatrix;
      if(!m.Reshape(1, input_data.Total() + 1))
         return false;
      m[0, m.Cols() - 1] = 1;
      m_cOutputs.m_mMatrix = m.MatMul(m_cWeights.m_mMatrix.Transpose());
     }

   else
     {
      //--- Here is the code for accessing the OpenCL program
      return false;
     }
//---
   return m_cActivation.Activation(m_cOutputs);
  }

The forward pass is followed by the backpropagation pass. We break down this neural network training procedure into component parts and create four methods:

  • CalcOutputGradient for calculating the error gradient at the output of the neural network,
  • CalcHiddenGradient to enable the gradient propagation through the hidden layer,
  • CalcDeltaWeights for calculating the necessary weight adjustments,
  • UpdateWeights for updating the weight matrix.

We will move along the data flow path and consider the algorithm of each method.
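
Before examining each method, here is a simplified, hypothetical sketch of how the network manager could chain these calls during one backward pass. The helper function, its signature, and the layers array are assumptions made only for illustration; the actual CNet code is discussed separately, and in practice UpdateWeights is typically called only after accumulating gradients over a batch.

//--- illustrative helper, not part of the library
bool BackwardPassSketch(CNeuronBase *&layers[], int total, CBufferType *target,
                        ENUM_LOSS_FUNCTION loss, int batch_size, TYPE lr,
                        VECTOR &beta, VECTOR &lambda)
  {
//--- error gradient at the network output
   if(!layers[total - 1].CalcOutputGradient(target, loss))
      return false;
//--- propagate the gradient from the output layer towards the input layer
   for(int l = total - 1; l > 0; l--)
      if(!layers[l].CalcHiddenGradient(layers[l - 1]) ||
         !layers[l].CalcDeltaWeights(layers[l - 1]))
         return false;
//--- adjust the weights (here after every pass, for simplicity)
   for(int l = 1; l < total; l++)
      if(!layers[l].UpdateWeights(batch_size, lr, beta, lambda))
         return false;
   return true;
  }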

In the process of supervised learning, after the forward pass, the calculated output values of the neural network are compared with the target values. The deviation on each neuron of the output layer is determined at this moment. We perform this operation in the CalcOutputGradient method. The algorithm of this method is quite simple: the method receives an array of target values and the type of the loss function used as parameters. At the beginning of the method, we will validate the pointers to the used objects as well as ensure the compatibility of the array sizes.

bool CNeuronBase::CalcOutputGradient(CBufferType * target, ENUM_LOSS_FUNCTION loss)
  {
//--- control block
   if(!target || !m_cOutputs || !m_cGradients ||
      target.Total() < m_cOutputs.Total() ||
      m_cGradients.Total() < m_cOutputs.Total())
      return false;

Next, similar to the feed-forward method, we will create a branching in the algorithm depending on the device used for calculations. The algorithm using the OpenCL technology will be discussed in the next chapter, and now let's look at the process construction using MQL5.

Let's take a look at the process of computing the error gradient at the output of the neural network. At first glance, we should move in the direction of minimizing the error for each neuron. In other words, calculate the difference between the reference and calculated values and minimize this difference. In this case, we get a linear dependence of the error and the gradient. This is true when using mean absolute error as a loss function, with all the resulting advantages and disadvantages.

When we were talking about the loss function, we considered other options and discussed their advantages and disadvantages. But how can we take advantage of them? The answer here is pretty simple. One should consider the loss function and the trainable model as a single complex function. In this case, we should minimize not the deviation for each neuron of the output layer, but directly the value of the loss function. Just as when propagating the error gradient through the neural network, we calculate the derivative of the loss function and multiply it by the deviation of the loss function value from zero. For MAE and MSE, we can take only the derivative of the loss function as the error and skip multiplying it by the loss value, since this linear scaling is compensated by the learning rate. When using cross-entropy, however, we have to multiply the derivative by the value of the loss function. The reason is that if the target and calculated values are equal, the loss function returns 0 while its derivative equals −1. If we did not multiply the derivative by the error, we would keep adjusting the model parameters even in the absence of an error.

In this case, it is not at all necessary to fully calculate the value of the loss function. Cross-entropy is commonly used as the loss function in classification tasks. Therefore, as target values, we expect to obtain a vector in which only one element will be set to one, while all others will be zero. For zero values, the derivative will also be zero, and multiplication by 1 doesn't change the result. Therefore, it is enough for us to multiply the derivative by the logarithm of the calculated value. It is the logarithm of 1 that will give 0, indicating that there is no error.

Taking into account the above, to calculate the corresponding error gradient at the model's output, we will use a switch statement to create a branching process based on the employed loss function. In case the specified loss function is not present, we will calculate the simple deviation of the calculated results from the target values.
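
Putting this together, for a single output neuron with target value t and computed value y, the cross-entropy branch in the code below stores

grad = −t · log(y) / (y + FLT_MIN)

where FLT_MIN only protects against division by zero. With t = 1 and y = 1 this gives 0, which is exactly the "no error" signal discussed above.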

//--- branching of the algorithm depending on the device for performing operations
   if(!m_cOpenCL)
     {
      switch(loss)
        {
         case LOSS_MAE:
            m_cGradients.m_mMatrix = target.m_mMatrix - m_cOutputs.m_mMatrix;
            break;
         case LOSS_MSE:
            m_cGradients.m_mMatrix = (target.m_mMatrix - m_cOutputs.m_mMatrix) * 2;
            break;
         case LOSS_CCE:
            m_cGradients.m_mMatrix = target.m_mMatrix / (m_cOutputs.m_mMatrix + FLT_MIN) *
                                     MathLog(m_cOutputs.m_mMatrix) * (-1);
            break;
         case LOSS_BCE:
            m_cGradients.m_mMatrix = (target.m_mMatrix - m_cOutputs.m_mMatrix) /
                     (MathPow(m_cOutputs.m_mMatrix, 2) - m_cOutputs.m_mMatrix + FLT_MIN);
            break;
         default:
            m_cGradients.m_mMatrix = target.m_mMatrix - m_cOutputs.m_mMatrix;
            break;
        }
     }
   else
      return false;
//---
   return true;
  }

After obtaining the error at the neural network output, it's necessary to determine the influence of each neuron in our network on this error. To achieve this, we need to propagate the error gradient layer by layer, reaching every neuron. The responsibility for organizing the loop that iterates through the layers of the neural network lies with the network manager, that is, the neural network base class CNet. Now we will examine the organization of the process within a single neural layer.

In the parameters, the CalcHiddenGradient method receives a pointer to the previous layer of the neural network. We will need it to write the transmitted error gradient. In the previous method, we determined the error at the neuron output, but the neuron output value depends on the activation function. To determine the influence of each element of the input data on the final result, it's necessary to exclude the influence of the activation function on the error. To achieve this, we will adjust the error gradient using the derivative of the activation function. This operation, like the computation of the activation function itself, is implemented in a separate class.

bool CNeuronBase::CalcHiddenGradient(CNeuronBase *prevLayer)
  {
//--- adjusting the incoming gradient to the derivative of the activation function
   if(!m_cActivation.Derivative(m_cGradients))
      return false;

Next comes the block in which we check pointers of used objects. First, we validate the received pointer to the previous layer. Then, we extract and validate the pointers to the buffers of results and gradients from the previous layer. We also verify the consistency of the number of elements in the specified buffers. Additionally, we check the presence of a sufficient number of elements in the weight matrix. Such a number of preventive checks are necessary for the stable operation of the method and to prevent potential errors when accessing data arrays.

//--- checking the buffers of the previous layer
   if(!prevLayer)
      return false;
   CBufferType *input_data = prevLayer.GetOutputs();
   CBufferType *input_gradient = prevLayer.GetGradients();
   if(!input_data || !input_gradient ||
      input_data.Total() != input_gradient.Total())
      return false;
//--- checking the correspondence between the size of the source data buffer and the weight matrix
   if(!m_cWeights || m_cWeights.Cols() != (input_data.Total() + 1))
      return false;

After successfully passing all the checks, we proceed directly to the computational part. Let me remind you that the derivative of the product of a variable and a constant is a constant. In this case, the derivative with respect to the neuron is the corresponding weight. Consequently, the neuron influence on the result is the product of the error gradient at the output of the function and the corresponding weight. We will calculate the sum of such products for each neuron in the previous layer. We will write the obtained values into the corresponding cell of the gradient buffer of the previous layer.

As in the methods described above, we carry out the separation of the algorithm depending on the computing device used. We will get acquainted with the algorithm for implementing multi-threaded calculations a little later. Let's now consider the implementation of the algorithm using MQL5 tools. As mentioned earlier, we need to calculate the sum of products of error gradients from neurons dependent on a given neuron and their corresponding weights. Performing this operation is easily accomplished using matrix multiplication. In this case, it suffices to multiply the error gradient matrix by the matrix of weights. We will store the result of the operation in a local matrix.

We cannot immediately write the result of the operation to the error gradient matrix of the previous layer. If you look at the forward pass method, you will see how we added the bias element. Accordingly, when multiplying matrices, we will get the result, taking into account the error on the bias element. However, the previous layer does not expect this value, and the size of the matrix of gradients is smaller. Therefore, we will first resize the matrix obtained from the multiplication operation to the required dimensions, and then transfer its values to the gradient matrix of the previous layer.

Note that in this method, we do not adjust the gradient obtained at the output of the previous layer by the derivative of the activation function of neurons in the previous layer, as we did with a similar operation at the beginning of this method. Therefore, if the previous layer is the hidden layer of our network, then the first thing that will be done when calling the considered method on the lower layer is to adjust the gradient for the derivative of the activation function. Doubling the operation will lead to errors.

//--- branching of the algorithm depending on the device for performing operations
   if(!m_cOpenCL)
     {
      MATRIX grad = m_cGradients.m_mMatrix.MatMul(m_cWeights.m_mMatrix);
      if(!grad.Reshape(input_data.Rows(), input_data.Cols()))
         return false;
      input_gradient.m_mMatrix = grad;
     }
   else
      return false;
//---
   return true;
  }

We now have a calculated error gradient on each neuron in our network. There is enough data to update the weights. However, as we know, the weights are not always updated after each iteration of the backpropagation pass. Therefore, we separated the process of updating the weight matrix into two methods. In the first one, we will calculate the error gradient for each weight similarly to how we calculated the error gradient for the neuron in the previous layer. In the second one, we will adjust the weight matrix.

We will calculate the value of the error gradient for the weight matrix in the CalcDeltaWeights method. In the parameters of the method, similar to the previous one, there will be a pointer to the preceding layer of the neural network, but now we will use not the gradient buffer from it, but the array of output values.

Similar to the previously discussed methods, this method starts with a block of checks. It is followed by a block of calculations.

bool CNeuronBase::CalcDeltaWeights(CNeuronBase *prevLayer)
  {
//--- control block
   if(!prevLayer || !m_cDeltaWeights || !m_cGradients)
      return false;
   CBufferType *Inputs = prevLayer.GetOutputs();
   if(!Inputs)
      return false;

In the previous method, we have already adjusted the gradient for the derivative of the activation function. Therefore, we will skip this iteration and proceed directly to the calculation of the gradient on the weights. Here, as in other methods, there is a branching of the algorithm based on the computation device. In the MQL5 block, similarly to the previous method, we will employ matrix multiplication, because, in essence, both methods perform a similar operation only for different matrices. But there are a few differences here.

First, in the previous method, we removed the bias element. However, in this case, we need to add a unitary element to the vector of the previous layer results in order to determine the error gradient on the corresponding weight.

Second, earlier we multiplied the matrix of gradients by the matrix of weights. Now we multiply the transposed matrix of error gradients by the vector of the previous layer results with the bias element.

In addition, we were overwriting the error gradient of the previous layer, but for the weight gradient, we will sum them up, thereby accumulating the error gradient over the entire period between weight update operations.

//--- branching of the algorithm depending on the device for performing operations
   if(!m_cOpenCL)
     {
      MATRIX m = Inputs.m_mMatrix;
      if(!m.Reshape(1, Inputs.Total() + 1))
         return false;
      m[0, Inputs.Total()] = 1;
      m = m_cGradients.m_mMatrix.Transpose().MatMul(m);
      m_cDeltaWeights.m_mMatrix += m;
     }
   else
      return false;
//---
   return true;
  }

At the conclusion of the backpropagation process, we need to adjust the weight matrix. To perform this functionality, our class provides the UpdateWeights method. However, let's not forget that we have different options available for choosing the optimization method. The question was resolved using a simple and intuitive approach. The public method for updating the weights provides a dispatcher function to select the optimization method based on the user's choice. The actual process of adjusting the weight matrix is implemented in separate methods, with one method for each optimization method version.

bool CNeuronBase::UpdateWeights(int batch_size, TYPE learningRate,
                                VECTOR &Beta, VECTOR &Lambda)
  {
//--- control block
   if(!m_cDeltaWeights || !m_cWeights ||
       m_cWeights.Total() < m_cDeltaWeights.Total() || batch_size <= 0)
      return false;
//---
   bool result = false;
   switch(m_eOptimization)
     {
      case None:
         result = true;
         break;

       case SGD:
          result = SGDUpdate(batch_size, learningRate, Lambda);
          break;
       case MOMENTUM:
          result = MomentumUpdate(batch_size, learningRate, Beta, Lambda);
          break;
       case AdaGrad:
          result = AdaGradUpdate(batch_size, learningRate, Lambda);
          break;
       case RMSProp:
          result = RMSPropUpdate(batch_size, learningRate, Beta, Lambda);
          break;
       case AdaDelta:
          result = AdaDeltaUpdate(batch_size, Beta, Lambda);
          break;
       case Adam:
          result = AdamUpdate(batch_size, learningRate, Beta, Lambda);
          break;
     }
//---
   return result;
  }

Algorithms for each of the weight optimization methods have already been presented earlier, when we considered their features. We will not duplicate them here; we simply implement them as protected methods of our neural layer base class.
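
As a reminder of the general pattern, below is a minimal sketch of what the simplest of these methods, SGDUpdate, might look like. This is not the library implementation: it assumes that Lambda[1] carries an L2 regularization factor, ignores the remaining regularization terms, and omits the OpenCL branch.

//--- a simplified sketch of the SGD weight update (not the actual library code)
bool CNeuronBase::SGDUpdate(int batch_size, TYPE learningRate, VECTOR &Lambda)
  {
   if(!m_cWeights || !m_cDeltaWeights || batch_size <= 0)
      return false;
//--- average gradient accumulated since the last update
   MATRIX delta = m_cDeltaWeights.m_mMatrix / (TYPE)batch_size;
//--- gradient step with a simple L2 penalty (Lambda[1] assumed to be the L2 factor)
   m_cWeights.m_mMatrix += learningRate * (delta - Lambda[1] * m_cWeights.m_mMatrix);
//--- reset the accumulator for the next batch
   m_cDeltaWeights.m_mMatrix.Fill(0);
   return true;
  }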

We have already discussed the implementation of feed-forward and backpropagation operations in a fully connected neural layer. However, we will not re-train the neural network at each launch. Therefore, we need methods for working with files: writing the state of the neural layer and reading it back. We should be resource-efficient, so let's consider what information we need to save. The general rule is to save the minimum amount of information that is still sufficient for a quick startup and the functioning of the class without interrupting the process. Let's take a look at the internal class variables and critically evaluate the need to save their contents to a file.

  • m_cOpenCL — a pointer to an instance of the class for working with OpenCL technology, which is responsible for a separate functionality, but does not contain additional information. Not to be written to file.
  • m_cActivation — a pointer to an activation function object. The activation function type is set by the user when constructing a neural network. Using a different activation function can lead to distortion of the results across the entire network. Save.
  • m_eOptimization — a type of neuron optimization method during training, which is specified by the user when constructing a neural network. Influences the learning process. Save.
  • m_cOutputs — an array of neuron output values. The number of elements is set by the neural network architect. The content is overwritten on every forward pass. It's sufficient to save the number of neurons in the layer and not save the entire array.
  • m_cWeights — a weight matrix. The value of the elements is formed in the process of training the neural network. Save.
  • m_cDeltaWeights — a matrix for accumulating outstanding weight updates (cumulative error gradient for each weight since the last update). Values are accumulated between weight matrix updates and reset to zero after the weights are adjusted. The size of the array is equal to the weight matrix. Not to be written to a file.
  • m_cGradients — the error gradient at the output of the neural layer as a result of the last iteration of the backward pass. The content is overwritten on every backward pass. The size of the array is equal to the buffer of the output signal. Not to be written to a file.
  • m_cMomenum — unlike other variables, this is an array of two elements for storing pointers to moment accumulation arrays. The use of the buffers depends on the optimization method. The content is accumulated during the training of the neural network. Save.

After determining the data to be written to the file, let's proceed to create the file writing method Save. This virtual method exists in all descendant classes of the CObject class. In the parameters, the method receives the handle of the file to be written.

In the body of the method, we first check the received handle and the validity of the pointer to the result buffer of the neural layer. As we remember, a neural layer can be used either with full functionality or as a source data layer. When using an object as an input data layer, we deleted all buffers except for the results buffer, which holds the input data. Therefore, the presence of this buffer is mandatory for any neural layer. If any of the checks fail, we exit the method with a result of false.

Next, we write the type of the neural layer and the size of the result buffer to the file. At the same time, do not forget to check the results of the operations.

bool CNeuronBase::Save(const int file_handle)
  {
//--- control block
   if(file_handle == INVALID_HANDLE)
      return false;
//--- writing result buffer data
   if(!m_cOutputs)
      return false;
   if(FileWriteInteger(file_handle, Type()) <= 0 ||
      FileWriteInteger(file_handle, m_cOutputs.Total()) <= 0)
      return false;

After successfully writing the size of the result buffer, we check the validity of the pointers to the activation function object and the weight matrix. If at least one of them is missing, we consider the current neural layer to be a source data layer. To record this, we write 1 to the file as a flag indicating that a source data layer is being saved. Otherwise, we write 0, which indicates that a fully functional neural layer is being saved.

//--- checking and writing the source data layer flag
   if(!m_cActivation || !m_cWeights)
     {
      if(FileWriteInteger(file_handle, 1) <= 0)
         return false;
      return true;
     }
   if(FileWriteInteger(file_handle, 0) <= 0)
      return false;

Then, based on the optimization method, we determine the number of momentum buffers that need to be written to the file.

   int momentums = 0;
   switch(m_eOptimization)
     {
      case SGD:
         momentums = 0;
         break;
      case MOMENTUM:
      case AdaGrad:
      case RMSProp:
         momentums = 1;
         break;
      case AdaDelta:
      case Adam:
         momentums = 2;
         break;
      default:
         return false;
         break;
     }

Immediately, we organize a loop to validate the pointers to the momentum buffers.

   for(int i = 0; i < momentums; i++)
      if(!m_cMomenum[i])
         return false;

After the block of checks, there are operations for directly writing data to the file. First, we save the values of variables, and then we call the file writing methods for the objects that need to be saved.

//--- saving a matrix of weighting coefficients, moments, and activation functions
   if(FileWriteInteger(file_handle, (int)m_eOptimization) <= 0 ||
      FileWriteInteger(file_handle, momentums) <= 0)
      return false;
   if(!m_cWeights.Save(file_handle) || !m_cActivation.Save(file_handle))
      return false;
   for(int i = 0; i < momentums; i++)
      if(!m_cMomenum[i].Save(file_handle))
         return false;
//---
   return true;
  }

As seen from the provided code, we simply skip objects that do not need to be saved. However, this approach is not applicable when loading data from a file, as even skipped objects are necessary for the normal functioning of the neural layer. Therefore, the data loading method Load must be supplemented with a missing object initialization block. Let's see how it is implemented.

Just like when writing to a file, the method also receives a file handle for data in its parameters. Therefore, at the beginning of the method, we validate the received file handle.

bool CNeuronBase::Load(const int file_handle)
  {
//--- control block
   if(file_handle == INVALID_HANDLE)
      return false;

Reading data from the file should be done in precise accordance with the sequence in which the data was written. First, we saved the type of the neural layer and the number of elements in the results buffer. The type of the neural layer will be read by the method of the top-level object (the dynamic array of neural layers) in order to create the required neural layer. In the body of this method, we read the number of elements in the result buffer and initialize a buffer of the corresponding size.

//--- loading result buffer
   if(!m_cOutputs)
      if(!(m_cOutputs = new CBufferType()))
         return false;
   int outputs = FileReadInteger(file_handle);
   if(!m_cOutputs.BufferInit(1, outputs, 0))
      return false;

Immediately create a gradient buffer of the same size.

//--- creating error gradient buffer
   if(!m_cGradients)
      if(!(m_cGradients = new CBufferType()))
         return false;
   if(!m_cGradients.BufferInit(1, outputs, 0))
      return false;

Next, we check the flag for loading the input data neural layer. In the case of loading it, we delete unused objects and exit the method with a positive result.

//--- checking the source data layer flag
   int input_layer = FileReadInteger(file_handle);
   if(input_layer == 1)
     {
      if(m_cActivation)
         delete m_cActivation;
      if(m_cWeights)
         delete m_cWeights;
      if(m_cDeltaWeights)
         delete m_cDeltaWeights;
      if(m_cMomenum[0])
         delete m_cMomenum[0];
      if(m_cMomenum[1])
         delete m_cMomenum[1];
      if(m_cOpenCL)
         if(!m_cOutputs.BufferCreate(m_cOpenCL))
            return false;
      m_eOptimization = None;
      return true;
     }

Further code is executed only when loading a fully functional neural layer. At the beginning of this block, we read the optimization method from the file and the number of used momentum buffers.

   m_eOptimization = (ENUM_OPTIMIZATION)FileReadInteger(file_handle);
   int momentums = FileReadInteger(file_handle);

After that, we check the pointer to the weights matrix object. If necessary, we will create a new instance of the object and immediately call the data buffer loading method.

//--- creating objects before loading data
   if(!m_cWeights)
      if(!(m_cWeights = new CBufferType()))
         return false;
//--- loading data from file
   if(!m_cWeights.Load(file_handle))
      return false;

Then, we read the type of the activation function from the file and initialize an instance of the corresponding class using the SetActivation method. The activation function parameters are then loaded by calling the Load method of the activation function object.

//--- activation function
   if(FileReadInteger(file_handle) != defActivation)
      return false;
   ENUM_ACTIVATION_FUNCTION activation = 
                         (ENUM_ACTIVATION_FUNCTION)FileReadInteger(file_handle);
   VECTOR params = VECTOR::Zeros(2);
   if(!SetActivation(activation, params))
      return false;
   if(!m_cActivation.Load(file_handle))
      return false;

Similarly, we will load the data of the momentum buffers.

//---
   for(int i = 0; i < momentums; i++)
     {
      if(!m_cMomenum[i])
         if(!(m_cMomenum[i] = new CBufferType()))
            return false;
      if(!m_cMomenum[i].Load(file_handle))
         return false;
     }

After loading the data, we initialize the m_cDeltaWeights buffer. The buffer will be initialized with zero values. In this case, the buffer size is equal to the number of elements in the weights matrix.

First, check the pointer to the object and create a new one if necessary. Then, we will write 0 into all elements of the buffer.

//--- initializing remaining buffers
   if(!m_cDeltaWeights)
      if(!(m_cDeltaWeights = new CBufferType()))
         return false;
   if(!m_cDeltaWeights.BufferInit(m_cWeights.m_mMatrix.Rows(),
                                  m_cWeights.m_mMatrix.Cols(), 0))
      return false;

At the end of the method, we pass the current pointer m_cOpenCL to all internal objects. Here, we are not adding a check for the validity of the pointer. Since all objects of the neural network work within the same OpenCL context, we pass even an invalid pointer to the objects.

//--- passing a pointer to the OpenCL context to objects
   SetOpenCL(m_cOpenCL);
//---
   return true;
  }
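
For completeness, here is a small, purely illustrative sketch of persisting a single layer to a binary file; in the finished library this is driven by the neural network manager rather than called directly, and the file name is arbitrary.

//--- illustrative only: writing one layer to a binary file
int handle = FileOpen("layer.bin", FILE_WRITE | FILE_BIN);
if(handle != INVALID_HANDLE)
  {
   if(!layer.Save(handle))            // 'layer' is an initialized CNeuronBase pointer
      Print("Failed to save the neural layer");
   FileClose(handle);
  }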

As a result of implementing all the methods described above, the final structure of our class has taken the following form.

class CNeuronBase    :  public CObject
  {
protected:
   bool              m_bTrain;
   CMyOpenCL*        m_cOpenCL;
   CActivation*      m_cActivation;
   ENUM_OPTIMIZATION m_eOptimization;
   CBufferType*      m_cOutputs;
   CBufferType*      m_cWeights;
   CBufferType*      m_cDeltaWeights;
   CBufferType*      m_cGradients;
   CBufferType*      m_cMomenum[2];

   //---
   virtual bool      SGDUpdate(int batch_size, TYPE learningRate,
                                                    VECTOR &Lambda);
   virtual bool      MomentumUpdate(int batch_size, TYPE learningRate,
                                                     VECTOR &Beta, VECTOR &Lambda);
   virtual bool      AdaGradUpdate(int batch_size, TYPE learningRate,
                                                    VECTOR &Lambda);
   virtual bool      RMSPropUpdate(int batch_size, TYPE learningRate,
                                                     VECTOR &Beta, VECTOR &Lambda);
   virtual bool      AdaDeltaUpdate(int batch_size,
                                                    VECTOR &BetaVECTOR &Lambda);
   virtual bool      AdamUpdate(int batch_size, TYPE learningRate,
                                                     VECTOR &Beta, VECTOR &Lambda);
   virtual bool      SetActivation(ENUM_ACTIVATION_FUNCTION function,
                                                    VECTOR &params);

public:
                     CNeuronBase(void);
                    ~CNeuronBase(void);
   //---
   virtual bool      Init(const CLayerDescription *description);
   virtual bool      SetOpenCL(CMyOpenCL *opencl);
   virtual bool      FeedForward(CNeuronBase *prevLayer);
   virtual bool      CalcOutputGradient(CBufferType *target,
                                                    ENUM_LOSS_FUNCTION loss);
   virtual bool      CalcHiddenGradient(CNeuronBase *prevLayer);
   virtual bool      CalcDeltaWeights(CNeuronBase *prevLayer);
   virtual bool      UpdateWeights(int batch_size, TYPE learningRate,
                                                     VECTOR &Beta, VECTOR &Lambda);
   virtual void      TrainMode(bool flag)         {  m_bTrain = flag;            }
   virtual bool      TrainMode(void)        const {  return m_bTrain;            }
   //---
   CBufferType       *GetOutputs(void)      const {  return(m_cOutputs);         }
   CBufferType       *GetGradients(void)    const {  return(m_cGradients);       }
   CBufferType       *GetWeights(void)      const {  return(m_cWeights);         }
   CBufferType       *GetDeltaWeights(void) const {  return(m_cDeltaWeights);    }

   virtual bool      SetOutputs(CBufferType *buffer, bool delete_prevoius = true);
   //--- methods for working with files
   virtual bool      Save(const int file_handle);
   virtual bool      Load(const int file_handle);
   //--- method of identifying the object
   virtual int       Type(void)             const { return(defNeuronBase);       }
   virtual ulong     Rows(void)             const { return(m_cOutputs.Rows());   }
   virtual ulong     Cols(void)             const { return(m_cOutputs.Cols());   }
   virtual ulong     Total(void)            const { return(m_cOutputs.Total());  }
  };