Mechanism for describing the structure of the future neural network

We have already decided to build a universal constructor for conveniently creating neural networks of various configurations. Hence, we need some mechanism (an interface) for passing in the configuration of the model to be built. Let's think about what information we need from the user to understand unambiguously what kind of neural network is to be created.

First of all, we need to understand how many layers of neurons our network will have. There should be at least two such layers: an input layer that receives the initial data and an output layer that returns the results. In addition, the neural network may include hidden layers, and we will not limit their number for now.

To create each layer of the neural network, we need to know the number of neurons in that layer. Hence, in addition to the number of neural layers, the user must specify the number of neurons in each layer.

Now let's recall that in the previous section we defined constants for several types of neural layers, which differ in the type of neurons they contain. To understand what kind of layer the user wants to create, we need to get this information from them. So, the user should be able to specify the layer type for each layer being created.

In addition, we considered different variants of activation functions. Which one should be used when creating neurons?

When creating a universal tool, we must provide the user with the option to choose the activation function. Hence, we add the activation function to the list of parameters that the user should specify.

Then there is another question: will all neurons in the same layer use the same activation function, or will it be possible to use different activation functions within a single layer? I propose to focus on the first option, where all neurons of one layer use the same activation function.

Let me explain my point. While discussing techniques for improving the convergence of neural networks, and data normalization in particular, we talked about the importance of data comparability at the input of a neural layer. Using different activation functions within one layer, on the other hand, is highly likely to lead to a data imbalance. This is due to the nature of the activation functions themselves. Remember, the sigmoid returns values in the range from 0 to 1, the hyperbolic tangent returns values from -1 to 1, and ReLU can return values from 0 to +∞. Evidently, different activation functions will produce significantly different values and will only complicate the training and operation of the neural network.
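To recap the ranges involved:

sigmoid(x) = 1 / (1 + e^(-x)),  values in (0, 1)
tanh(x),                        values in (-1, 1)
ReLU(x) = max(0, x),            values in [0, +∞)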

Additionally, from a technical perspective, there are advantages to using one activation function for the entire neural layer. In this case, we can limit ourselves to a single integer value to store the activation function code for the whole layer, regardless of the number of neurons. To store individual activation functions, we would have to create a whole vector of values the size of the layer.
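In code, the difference looks roughly like this (a comparison sketch; the per-neuron variant is shown only to be rejected):

// one activation function per layer: a single code, regardless of layer size
ENUM_ACTIVATION_FUNCTION activation;
// per-neuron activation functions would require an array sized to the layer
ENUM_ACTIVATION_FUNCTION activations[];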

The next thing we need to know when creating the architecture of a neural network is the weight optimization method. In the chapter Neural network optimization methods, we covered six optimization methods. In the previous chapter, we set up an enumeration to identify them. Now you can take advantage of this enumeration and let the user choose one of them.

Why is it important for us to know the optimization method now, at the stage of creating the neural network, rather than during its training? It's very simple: different optimization methods require different numbers of objects to store their state, and all the required objects must be created together with the neural network. Given the memory constraints of our computing machine, we need to use memory rationally and not create unnecessary objects.
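For example, plain stochastic gradient descent stores nothing beyond the weight matrix itself, while Adam additionally maintains two moment matrices of the same size. A hypothetical allocation fragment inside a layer's initialization could look like this (the buffer class and member names here are placeholders, not our actual implementation):

switch(optimization)
  {
   case Adam:
      // Adam maintains first and second moment estimates for every weight
      m_cMomentum1 = new CBufferType();   // hypothetical buffer object
      m_cMomentum2 = new CBufferType();
      break;
   default:
      // plain gradient descent keeps no additional state
      break;
  }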

When creating layers such as normalization and Dropout, we will need some specific information. For normalization, we need the normalization sample size (batch), and for Dropout, we need to specify the probability of "dropping out" neurons during training.

Looking ahead, for some types of neural layers, we will still need the size of the input and output window, as well as the step size from the beginning of one input window to the beginning of the next window.

To make it easier for the user to create consecutively identical layers, let's add another parameter to specify such a sequence.

As a result, we have accumulated a dozen parameters that the user needs to specify for each layer. Let's add to this the total number of layers to create in a neural network. These are all things we want to get from the user before creating the neural network. We will not overly complicate the data transfer process, and to describe one neural layer, we will create a class named CLayerDescription with elements to store the specified parameters.

class CLayerDescription : public CObject
  {
public:
                     CLayerDescription(void);
                    ~CLayerDescription(void) {};
   //---
   int               type;               // Type of neural layer
   int               count;              // Number of neurons in the layer
   int               window;             // Source data window size
   int               window_out;         // Results window size
   int               step;               // Input data window step
   int               layers;             // Number of identical neural layers to create
   int               batch;              // Batch size for updating the weight matrix
   ENUM_ACTIVATION_FUNCTION   activation;         // Activation function type
   VECTOR            activation_params;  // Vector of activation function parameters
   ENUM_OPTIMIZATION optimization;       // Weight matrix optimization type
   TYPE              probability;        // Masking probability (Dropout only)
  };

Note that the created class is inherited from the CObject class, which is the base class for all objects in MQL5. It's a small point that we'll exploit a little later.

We will not complicate the class constructor in any way; it will only set some default values. You can use any values of your own here, but I recommend specifying the parameters you use most often. This will make it easier to specify them later in the program code.

CLayerDescription::CLayerDescription(void)   :  type(defNeuronBase),
                                                count(100),
                                                window(100),
                                                window_out(0),
                                                step(100),
                                                layers(1),
                                                activation(AF_TANH),
                                                optimization(Adam),
                                                probability(0.1),
                                                batch(100)
  {
// default activation function parameters: {1, 0}
   activation_params = VECTOR::Ones(2);
   activation_params[1] = 0;
  }
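With these defaults in place, describing a layer often comes down to overriding just a few fields. Here is a quick sketch (the neuron count and activation choice are purely illustrative):

CLayerDescription *desc = new CLayerDescription();
desc.type       = defNeuronBase;  // base layer type constant from the previous section
desc.count      = 500;            // 500 neurons in this layer
desc.activation = AF_SIGMOID;     // override the default hyperbolic tangent
// all remaining fields keep their default values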

Now let's get back to why it was important to inherit from CObject. Here everything is quite straightforward: we have created an object to describe one neural layer but not the whole neural network. We have not yet specified the total number of layers and their sequence.

I decided not to complicate the process and use the CArrayObj class from the standard MQL5 library. This is a dynamic array class for storing pointers to CObject objects and their successors. Hence, we can write our neural layer description objects into it. In this way, we address the issue of a container for storing and transmitting information about neural networks. The sequence of neural layers will correspond to the sequence of stored descriptions from the zero-index input layer to the output layer.
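For example, the description of a small perceptron with an input layer, two identical hidden layers, and an output layer might be assembled as follows (the layer sizes and type constant are illustrative):

#include <Arrays\ArrayObj.mqh>

CArrayObj *layers = new CArrayObj();
//--- input layer
CLayerDescription *desc = new CLayerDescription();
desc.type  = defNeuronBase;
desc.count = 40;                  // 40 input values
layers.Add(desc);
//--- two identical hidden layers described by a single object
desc = new CLayerDescription();
desc.type   = defNeuronBase;
desc.count  = 100;
desc.layers = 2;                  // create two such layers in a row
layers.Add(desc);
//--- output layer
desc = new CLayerDescription();
desc.type       = defNeuronBase;
desc.count      = 2;
desc.activation = AF_SIGMOID;
layers.Add(desc);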

In my opinion, this is a rather simple and intuitive way to describe the structure of a neural network. But every reader can make use of their own developments.