Activation function class

We still have some open questions regarding the implementation of the neural layer base class. One of them is the neuron activation function class.

The activation function class will contain the operations for calculating the activation function and its derivative. There are many types of activation functions; the book does not list them all and covers only the more commonly used ones. Moreover, new, well-performing activation functions keep emerging. So, if you need to add a new activation function to this library, the easiest way is to create a new class that inherits from a common base class. By overriding a couple of methods responsible for the actual calculation of the function and its derivative, you propagate the change to all neural network objects, including those created earlier.

Following this logic, I decided not to create a single activation function class covering all the functions discussed earlier, but rather a class hierarchy in which each class implements the algorithm of only one activation function. At the top of this hierarchy is a single base class that defines the interfaces for interaction with other objects and allows them to access the activation methods without being tied to a specific activation function.

By creating a single branching point in the algorithm, at the initialization of a specific activation function class, we avoid checking the selected function at every iteration of the forward and backward passes.
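As a rough sketch, such a branching point could live in the neural layer initialization code. Note that the CNeuronBase class name, the SetActivation method, and the m_cActivation member used here are assumptions made purely for illustration; the actual layer code is discussed in its own section.

// Hypothetical sketch: the concrete activation class is chosen once,
// when the neural layer is initialized, so the forward and backward
// passes can later call it polymorphically without any further checks.
bool CNeuronBase::SetActivation(ENUM_ACTIVATION_FUNCTION function, VECTOR &params)
  {
   if(!!m_cActivation)
      delete m_cActivation;
   switch(function)
     {
      case AF_NONE:
         m_cActivation = new CActivation();
         break;
      case AF_LINEAR:
         m_cActivation = new CActivationLine();
         break;
      case AF_LRELU:
         m_cActivation = new CActivationLReLU();
         break;
      // ... other activation function classes are added here as they are created
      default:
         return false;
     }
   if(!m_cActivation)
      return false;
   return m_cActivation.Init(params);
  }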

The parent class for all activation functions, CActivation, is inherited from the CObject class, which is the base class for all objects in MQL5.

The CActivation class only contains methods for organizing the interface and does not implement any of the activation functions. To organize the work of the activation function classes, I defined the following methods:

  • CActivation — a class constructor;
  • ~CActivation — a class destructor;
  • Init — passing parameters to calculate the activation function;
  • GetFunction — getting the used activation function and its parameters;
  • Activation — calculates the activation function values from the values received in the parameters;
  • Derivative — calculates the derivative of the activation function;
  • SetOpenCL — writing a pointer to an OpenCL object;
  • Save and Load — virtual methods for working with files;
  • Type — a virtual method for class identification.

In general, the class looks much simpler than those discussed previously. In the constructor of the class, we will set the default activation function parameters.

CActivation::CActivation(void) : m_iRows(0),
                                 m_iCols(0),
                                 m_cOpenCL(NULL)
  {
   m_adParams = VECTOR::Ones(2);
   m_adParams[1] = 0;
  }

Note that to calculate the derivative of certain activation functions, we only need the value of the activation function itself. In other cases, we will need values before the activation function. Therefore, let's introduce two pointers to the corresponding data buffers:

  • m_cInputs
  • m_cOutputs

In the body of this class, we will create only one instance of a buffer, and in another variable, we will save a pointer to the buffer that calls the neural layer. Due to this, in the destructor of the class, we will only delete one object.

CActivation::~CActivation(void)
  {
   if(!!m_cInputs)
      delete m_cInputs;
  }

In the class initialization method, we store the received activation function parameters and create a data buffer object. It's important to note that at this stage, we are merely creating an instance of the class; we are not initializing the buffer itself because we don't yet know the required buffer dimensions.

bool CActivation::Init(VECTOR &params)
  {
   m_adParams = params;
//---
   m_cInputs = new CBufferType();
   if(!m_cInputs)
      return false;
//---
   return true;
  }

The method for reading the activation parameters is straightforward: we simply return the values of the variables.

ENUM_ACTIVATION_FUNCTION CActivation::GetFunction(VECTOR &params)
  {
   params = m_adParams;
   return GetFunction();
  }

The Activation method, which calculates the values of the activation function, receives in its parameters a pointer to the result buffer of the neural layer. This buffer contains the neuron results before the activation function is applied. We need to activate the obtained values and write them back into the specified buffer. However, as we know, the pre-activation values might be needed when calculating the derivatives of certain functions. Therefore, we "play" with the pointers to the buffer objects: the received pointer is saved in the m_cInputs variable, while the buffer previously stored in m_cInputs is written both into the parameter variable and into the m_cOutputs variable. The current class corresponds to the absence of an activation function, so we don't perform any operations on the obtained data.

However, there is one nuance. Since we don't perform any operations on the obtained data, we need to return them to the calling program. At this point, we have already replaced the buffer that we will return. Therefore, we check the used activation function, and if no further actions are required on the obtained data, we will return the pointer to the buffer back and delete the unnecessary object.

It might seem like there are many unnecessary actions in the method that didn't alter the data in any way. However, these are our small investments in the functionality of the inheriting classes.

bool CActivation::Activation(CBufferType *&output)
  {
   if(!output || output.Total() <= 0)
      return false;
   m_cOutputs = m_cInputs;
   m_cInputs = output;
   output = m_cOutputs;
   if(GetFunction() == AF_NONE && output != m_cInputs)
     {
      delete output;
      output = m_cInputs;
     }
//---
   return true;
  }

At the same time, the method for calculating the derivative of the activation function in this class will remain nominal: in all cases it simply returns true.

The SetOpenCL method, which enables the multi-threaded computation functionality, receives in its parameters a pointer to the object for working with the OpenCL context and the dimensions of the result buffer of the calling neural layer. We will need these buffer dimensions to initialize the buffer and create it in the context.

In the body of the method, we store the resulting dimensions and pointer, then initialize the data buffer of the specified size with null values and create a buffer in the OpenCL context.

bool CActivation::SetOpenCL(CMyOpenCL *opencl, const ulong rows, const ulong cols)
  {
   m_iRows = rows;
   m_iCols = cols;
   if(m_cOpenCL != opencl)
     {
      if(m_cOpenCL)
         delete m_cOpenCL;
      m_cOpenCL = opencl;
     }
//---
   if(!!m_cInputs)
     {
       if(!m_cInputs.BufferInit(m_iRows, m_iCols, 0))
          return false;
       m_cInputs.BufferCreate(m_cOpenCL);
      }
//---
   return(!!m_cOpenCL);
  }

As you can see, the methods of the class are quite simple. All that remains is to look at the file-handling methods. Their algorithm is also simple. In the body of the Save method, as usual, we check the file handle received in the parameters and then write the object type, the activation function type, the buffer dimensions, and the function parameters.

bool CActivation::Save(const int file_handle)
  {
   if(file_handle == INVALID_HANDLE)
      return false;
   if(FileWriteInteger(file_handle, Type()) <= 0 ||
      FileWriteInteger(file_handle, (int)GetFunction()) <= 0 ||
      FileWriteInteger(file_handle, (int)m_iRows) <= 0 ||
      FileWriteInteger(file_handle, (int)m_iCols) <= 0 ||
      FileWriteDouble(file_handle, (double)m_adParams[0]) <= 0 ||
      FileWriteDouble(file_handle, (double)m_adParams[1]) <= 0)
      return false;
//---
   return true;
  }

The Load data loading method also receives a file handle in its parameters. In the method body, we check the validity of the received handle and read the previously saved values. After that, we initialize the data buffer, not forgetting to control the execution of the operations.

bool CActivation::Load(const int file_handle)
  {
   if(file_handle == INVALID_HANDLE)
      return false;
   m_iRows = (uint)FileReadInteger(file_handle);
   m_iCols = (uint)FileReadInteger(file_handle);
   m_adParams.Init(2);
   m_adParams[0] = (TYPE)FileReadDouble(file_handle);
   m_adParams[1] = (TYPE)FileReadDouble(file_handle);
//---
   if(!m_cInputs)
     {
      m_cInputs = new CBufferType();
      if(!m_cInputs)
         return false;
     }
   if(!m_cInputs.BufferInit(m_iRows, m_iCols, 0))
      return false;
//---
   return true;
  }

We have reviewed all the methods of the CActivation activation function base class. As a result, we have the following class structure.

class CActivation : protected CObject
  {
protected:
   ulong             m_iRows;
   ulong             m_iCols;
   VECTOR            m_adParams;
   CMyOpenCL*        m_cOpenCL;
   //---
   CBufferType*      m_cInputs;
   CBufferType*      m_cOutputs;
 
public:
                     CActivation(void);
                    ~CActivation(void) { if(!!m_cInputs) delete m_cInputs; }
   //---
   virtual bool      Init(VECTOR &params);
   virtual ENUM_ACTIVATION_FUNCTION  GetFunction(VECTOR &params);
   virtual ENUM_ACTIVATION_FUNCTION   GetFunction(void) { return AF_NONE; }
   virtual bool      Activation(CBufferType*& output);
   virtual bool      Derivative(CBufferType*& gradient) { return true;    }
   //---
   virtual bool      SetOpenCL(CMyOpenCL *opencl, const ulong rows,
                                                  const ulong cols);

   //--- methods for working with files
   virtual bool      Save(const int file_handle);
   virtual bool      Load(const int file_handle);
   //--- object identification method
   virtual int       Type(void)             const { return defActivation; }
  };

However, as we discussed earlier, this class only lays the groundwork for the future classes of specific activation functions. To add an actual activation function algorithm, you need to create a new class and override several methods in it. For example, let's create a class for the linear activation function. The structure of this class is given below.

class CActivationLine   :  public CActivation
  {
public:
                     CActivationLine(void) {};
                    ~CActivationLine(void) {};
   //---
   virtual ENUM_ACTIVATION_FUNCTION   GetFunction(void) override
                                               { return AF_LINEAR; }
   virtual bool      Activation(CBufferType*& output) override;
   virtual bool      Derivative(CBufferType*& gradient) override;
  };

The new CActivationLine class is publicly inherited from the CActivation base class created above. The constructor and destructor of the class are empty. All we have to do is override three methods:

  • GetFunction — returns the activation function used by the class;
  • Activation — calculates the activation function values from the values received in the parameters;
  • Derivative — calculates the derivative of the activation function.

In the GetFunction method, we only change the returned activation function type to the one corresponding to the class (AF_LINEAR).

Like the method of the parent class, the Activation method receives in its parameters a pointer to the source data buffer. In the body of the method, we don't check the received pointer ourselves; we simply call the parent class method, which checks the pointer and "plays" with the pointers to the data buffers. After that, the algorithm splits into two branches: one using OpenCL technology and one without it. We will cover multi-threaded operations a little later. In the branch without multi-threading, we simply call the matrix activation function on the obtained values, specifying AF_LINEAR as the activation function type and passing the function parameters.

bool CActivationLine::Activation(CBufferType*& output)
  {
   if(!CActivation::Activation(output))
      return false;
//---
   if(!m_cOpenCL)
     {
       if(!m_cInputs.m_mMatrix.Activation(output.m_mMatrix, AF_LINEAR,
                                           m_adParams[0], m_adParams[1]))
         return false;
     }
   else // OpenCL block
     {
      return false;
     }
//---
   return true;
  }

The method that calculates the derivative is even more straightforward. In the parameters, the method receives a pointer to the error gradient object. The obtained values must be corrected by the derivative of the activation function. As you know, the derivative of a linear function is the coefficient of its argument. So, the only thing we have to do is multiply the resulting gradient vector by the activation function parameter at index 0.

bool CActivationLine::Derivative(CBufferType*& gradient)
  {
   if(!m_cInputs || !m_cOutputs ||
      !gradient || gradient.Total() < m_cOutputs.Total())
      return false;
//---
   if(!m_cOpenCL)
     {
      gradient.m_mMatrix = gradient.m_mMatrix * m_adParams[0];
     }
   else // OpenCL block
     {
      return false;
     }
//---
   return true;
  }

As you can see, the mechanism for describing a new activation function is quite simple. Let's create a class for using LReLU (Leaky ReLU) as the activation function in a similar manner.

class CActivationLReLU : public CActivation
  {
public:
                     CActivationLReLU(void) { m_adParams[0] = (TYPE)0.3; };
                    ~CActivationLReLU(void) {};
   //---
   virtual ENUM_ACTIVATION_FUNCTION   GetFunction(void) override { return AF_LRELU; }
   virtual bool      Activation(CBufferType*& output) override;
   virtual bool      Derivative(CBufferType*& gradient) override;
  };

In the activation method of the new class, we again use the matrix activation function call, this time specifying the corresponding function type, AF_LRELU.

bool CActivationLReLU::Activation(CBufferType*& output)
  {
   if(!CActivation::Activation(output))
      return false;
//---
   if(!m_cOpenCL)
     {
       if(!m_cInputs.m_mMatrix.Activation(output.m_mMatrix, AF_LRELU, m_adParams[0]))
         return false;
     }
   else // OpenCL block
     {
      return false;
     }
//---
   return true;
  }

We'll use a similar approach in the derivative method of the activation function.

bool CActivationLReLU::Derivative(CBufferType*& gradient)
  {
   if(!m_cInputs || !m_cOutputs || !gradient ||
      m_cOutputs.Total() <= 0 || gradient.Total() < m_cOutputs.Total())
      return false;
//---
   if(!m_cOpenCL)
     {
      MATRIX temp;
       if(!m_cInputs.m_mMatrix.Derivative(temp, AF_LRELU, m_adParams[0]))
         return false;
      gradient.m_mMatrix *= temp;
      }
    else // OpenCL block
     {
      return false;
     }
//---
   return true;
  }

The reader may reasonably ask why we should create new classes at all if we use the matrix activation functions built into the MQL5 language. This is done primarily to ensure a unified approach with and without the OpenCL multi-threaded technology. Later, these methods will also incorporate code for organizing multi-threaded computations in the OpenCL context. The described classes thus enable a unified call to the activation function algorithms both with standard MQL5 tools and with multi-threaded computations in the OpenCL context.
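As a rough usage sketch, the calling neural layer only stores a pointer to the CActivation base class and calls the same two methods regardless of the concrete function selected. The CNeuronBase class name, the m_cActivation member, and the method names below are assumptions made here for illustration only.

// Hypothetical fragments of a neural layer's forward and backward passes:
// the layer works only with the CActivation base class pointer, so the same
// calls serve CActivation, CActivationLine, CActivationLReLU, and any future class.
bool CNeuronBase::ActivateOutputs(CBufferType *&outputs)
  {
   // outputs holds the pre-activation values; after the call it points
   // to the buffer with the activated results
   return !!m_cActivation && m_cActivation.Activation(outputs);
  }

bool CNeuronBase::AdjustGradients(CBufferType *&gradients)
  {
   // correct the error gradients by the derivative of the selected function
   return !!m_cActivation && m_cActivation.Derivative(gradients);
  }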