Defining constants and enumerations

Defining constants is one of those basic tasks that is often overlooked. Yet it brings organization and system to all further work on creating a software product. It deserves particular attention when building complex, structured products with a multi-block, branched architecture.

Here, we won't discuss specific local variables and constants, as their scope will often be determined by separate blocks or functions. We will discuss creating constants that will serve as a common thread throughout our program and will frequently be used for organizing interactions both between blocks within our product and for data exchange with external programs.

Starting a large project by creating constants and enumerations is a very useful practice. Here, we can also include the creation of global variables. Primarily, this is one of the integral parts of developing project architecture. When contemplating the list of global constants and enumerations, we are re-evaluating our project as a whole, reconsidering its objectives and the means to achieve them. Even in broad strokes, we conceptualize the project structure and define the tasks of each block and the flow of information between them. We also understand what information needs to be obtained from an external program, what information needs to be returned, and at which stage of the process.

The work done at this stage becomes our roadmap for the project. A detailed examination of the data exchange interfaces lets us assess what information is needed at each stage, identify its sources, and uncover potential data deficits. Eliminating such deficits at the design stage is far easier than during implementation, when we would have to return to the design stage to find the missing data sources, work out how to deliver the information from source to processing point, and fit those changes into the established architecture with minimal adjustments. That would trigger an unpredictable number of revisions to already established processes, and the impact of each revision on adjacent processes would also have to be assessed.

We will collect all files of the library being built in the NeuroNetworksBook\realization subdirectory, in accordance with the project's file structure.

All global constants of our project will be collected in one file, defines.mqh.

So what constants are we going to define?

Let's take a look at the architecture of the project. As we've discussed, the result of our work will be a class that encompasses the complete organization of a neural network's operation. In the MQL5 architecture, all objects are inherited from the base class CObject. It includes the virtual method Type, which returns an integer value and serves to identify the class. Consequently, for unique identification of our class, we should define a constant, preferably distinct from the constants of existing classes. It will serve as a kind of business card for our class within the program. To create named constants, we will use the macro substitution mechanism.

#define defNeuronNet             0x8000
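To see how such a constant works in practice, here is a minimal sketch (the class name CNet is illustrative; the actual class is built later in the book):

```mql5
//--- Overriding CObject::Type lets any part of the program identify
//--- the object by its "business card" constant.
class CNet : public CObject
  {
public:
   virtual int       Type(void) const { return defNeuronNet; }
  };
//--- usage: if(pointer.Type() == defNeuronNet) we know it is our network
```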

Next, our neural network will consist of neurons. Neurons are organized into layers, and a neural network may consist of multiple layers. Since we are constructing a universal constructor, at this stage, we don't know the number of layers in the neural network or the number of neurons in each layer. Therefore, we assume that there will be a dynamic array for storing pointers to neuron layers. Most likely, in addition to simple storage of pointers to neural layer objects, we will need to create additional methods for working with them. Based on these considerations, we will create a separate class for such storage. Consequently, we will also create a business card for it.

#define defArrayLayers           0x8001

Next in the structure, we will create a separate class for the neural layer. Later, when we approach the implementation of computation algorithms using the OpenCL technology, we will discuss the organization of vector computations and the means of transferring data to the GPU memory. In this context, creating classes for each individual neuron might not be very convenient, but we will need a class for storing information and organizing data exchange buffering. Thus, we must create "business cards" for these objects as well.

It should be noted that the book will explore several architectural solutions for organizing neurons. Each architecture has its own peculiarities in terms of forward and backward propagation algorithms. However, we have already decided that we will not create distinct objects for neurons. So, we need to introduce identification at the level of neural layers. Therefore, we will create separate identifiers for each architecture of the neural layer.

#define defBuffer                0x8002
#define defActivation            0x8003
#define defLayerDescription      0x8004
#define defNeuronBase            0x8010
#define defNeuronConv            0x8011
#define defNeuronProof           0x8012
#define defNeuronLSTM            0x8013
#define defNeuronAttention       0x8014
#define defNeuronMHAttention     0x8015
#define defNeuronGPT             0x8016
#define defNeuronDropout         0x8017
#define defNeuronBatchNorm       0x8018
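As a sketch of how these identifiers may pay off later (the factory function is hypothetical, and the layer classes named here are only constructed in the following chapters), an object of the right class can be created from its identifier, for example when restoring a network from a file:

```mql5
//--- Hypothetical factory: maps a saved layer identifier back to an object.
CObject *CreateLayer(const int type)
  {
   switch(type)
     {
      case defNeuronBase:  return new CNeuronBase();
      case defNeuronConv:  return new CNeuronConv();
      case defNeuronProof: return new CNeuronProof();
      //--- ... one case per layer identifier
     }
   return NULL;
  }
```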

We have defined constants and object identifiers and can move further. Let's recall what this book starts with. At the very beginning of the book, we considered a mathematical model of a neuron. Each neuron has an activation function. We've seen several options for activation functions, and all of them are valid choices. Due to the absence of a derivative, we'll exclude the threshold function from the list. However, we'll implement the remaining discussed activation functions using the OpenCL technology. In the case of working with the CPU, we will use vector operations in which activation functions are already implemented. To maintain consistency in approaches and to indicate the used activation function, we use the standard enumeration ENUM_ACTIVATION_FUNCTION.
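On the CPU side, these standard enumeration values plug directly into the built-in vector operations; a minimal illustration:

```mql5
//--- The built-in Activation method applies the chosen function
//--- element-wise; AF_SIGMOID is one of the standard
//--- ENUM_ACTIVATION_FUNCTION values.
vector inputs = {-1.0, 0.0, 1.0};
vector outputs;
if(inputs.Activation(outputs, AF_SIGMOID))
   Print(outputs);   // sigmoid applied to each element
```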

However, it's worth noting that later, when discussing convolutional neural network algorithms, we will become familiar with the organization of a pooling layer. It utilizes other functions.

//--- pooling layer activation functions
enum ENUM_PROOF
  {
   AF_MAX_POOLING,
   AF_AVERAGE_POOLING
  };

Take a look at the chapter Training a neural network. In it, we discussed various options for loss functions and optimization methods for neural networks. In my understanding, we should provide the user with the ability to choose what they want to use. However, we need to restrict the choices to the capabilities of our library. For a loss function, we can use the standard enumeration ENUM_LOSS_FUNCTION by analogy with the activation function. For model optimization methods, we will create a new enumeration.
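By analogy, here is a minimal illustration of the standard ENUM_LOSS_FUNCTION enumeration in use with the built-in vector operations:

```mql5
//--- The Loss method computes the chosen loss between the calling vector
//--- of predictions and a vector of target values; LOSS_MSE is one of
//--- the standard ENUM_LOSS_FUNCTION values.
vector pred   = {0.2, 0.8};
vector target = {0.0, 1.0};
double mse = pred.Loss(target, LOSS_MSE);   // mean squared error
```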

enum ENUM_OPTIMIZATION
  {
   None=-1,
   SGD,
   MOMENTUM,
   AdaGrad,
   RMSProp,
   AdaDelta,
   Adam
  };

As you can see, I added the None element to the enumeration of optimization methods to allow training to be disabled for a specific layer. This approach is often used when applying a pre-trained network to new data. For instance, we might have a trained neural network that performs well on one financial instrument and want to replicate it for other instruments or timeframes. In all likelihood, without retraining, its performance would drop dramatically.

In this case, we have a choice: to train the neural network from scratch or to retrain the existing network. The second option usually requires less time and resources. However, to avoid disrupting the entire network, the retraining process starts with a low learning rate and focuses on the final layers (decision-making neurons), while leaving the initial analytical layers untrained.

Along with the learning methods, we discussed techniques for improving the convergence of neural networks. Normalization and dropout will be organized as separate layers; constants for them were already defined above among the neural layer identifiers. Of the regularization techniques, we will implement one: Elastic Net. The process is controlled through the parameters λ1 and λ2. If both are zero, regularization is disabled; if only one of them is zero, we obtain pure L1 or L2 regularization, depending on which parameter remains non-zero.
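As a simplified illustration (the helper function below is hypothetical, not the book's final implementation), the Elastic Net contribution to the weight gradient behaves exactly as described:

```mql5
//--- lambda1==0 gives pure L2, lambda2==0 gives pure L1,
//--- both zero disables regularization entirely.
vector RegularizationGrad(const vector &weights,
                          const double lambda1, const double lambda2)
  {
   vector grad = 2 * lambda2 * weights;          // L2 term: 2*lambda2*w
   for(ulong i = 0; i < weights.Size(); i++)     // L1 term: lambda1*sign(w)
      grad[i] += lambda1 * (weights[i] > 0 ? 1.0 :
                           (weights[i] < 0 ? -1.0 : 0.0));
   return grad;
  }
```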

Have you noticed that in this chapter we have refreshed our memories of the major milestones of the material we have studied? In addition, behind each constant or enumeration element, there is a specific functionality that we still need to implement.

But I'd like to add one more point. When introducing the OpenCL technology, we discussed that not all OpenCL-enabled devices work with the double type. It would probably be foolish to create copies of the library for different data types.

Here it's important to understand that different data types provide different levels of precision for computations. Therefore, when creating a model, it's important to ensure consistent conditions for all scenarios of model operation, both with and without using the OpenCL technology. To address this issue, we will introduce data type macros along with corresponding types for vectors and matrices.

#define TYPE                      double
#define MATRIX                    matrix<TYPE>
#define VECTOR                    vector<TYPE>
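A quick sketch of why this matters (the matrix sizes below are arbitrary): once all host-side computations go through these macros, switching the entire library to single precision is a one-line change of the TYPE definition.

```mql5
//--- Changing "#define TYPE double" to "#define TYPE float" would switch
//--- every declaration below to single precision without further edits.
MATRIX weights(2, 3);                      // expands to matrix<TYPE>
VECTOR inputs(3);                          // expands to vector<TYPE>
inputs.Fill((TYPE)0.5);
weights.Fill((TYPE)0.1);
VECTOR outputs = weights.MatMul(inputs);   // all operands share one type
```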

We organize a similar macro substitution for an OpenCL program.

#resource "opencl_program.cl" as string OCLprogram
//---
#define LOCAL_SIZE                256
const string ExtType=StringFormat("#define TYPE %s\r\n"
                                  "#define TYPE4 %s4\r\n"
                                  "#define LOCAL_SIZE %d\r\n",
                                   typename(TYPE),typename(TYPE),LOCAL_SIZE);
#define cl_program                ExtType+OCLprogram
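For illustration (assuming the standard library COpenCL wrapper; the book's own OpenCL work is discussed later), the assembled cl_program string is what gets compiled on the device, so the kernels see the same TYPE and LOCAL_SIZE as the host code:

```mql5
#include <OpenCL\OpenCL.mqh>
//--- cl_program expands to ExtType+OCLprogram: the type and work-group
//--- macros are prepended to the kernel source before compilation.
COpenCL opencl;
if(!opencl.Initialize(cl_program, true))
   Print("OpenCL program initialization error");
```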

Here we can also add various model hyperparameters, such as the learning rate, along with parameters for the optimization and regularization methods.

#define defLossSmoothFactor       1000
#define defLearningRate           (TYPE)3.0e-4
#define defBeta1                  (TYPE)0.9
#define defBeta2                  (TYPE)0.999
#define defLambdaL1               (TYPE)0
#define defLambdaL2               (TYPE)0

However, it's important to keep in mind that the hyperparameter values mentioned here are just default values. During the operation of the model, we will use variables that will be initialized with these values when the model is created. However, the user has the right to specify different values without changing the library code. We will discuss the mechanism of such a process when constructing classes and their methods.