6.1.4.1 Creating a script to test batch normalization

To analyze the effect of batch normalization on the result, let's take the simplest fully connected perceptron models. One of our very first tests checked how normalizing the input data during preprocessing influences model performance. In that test, we concluded that normalizing the initial data is important, and we have used normalized data in all subsequent models. However, preliminary normalization always has a cost, and it is inconvenient in financial markets, where the initial data arrives as a continuous stream. In this case, the normalization of the source data must be written into the program code. When the dataset changes, whether over time or because a different instrument is analyzed, the code or the externally defined normalization parameters may need to be modified, and the model then has to be retrained. This is an additional cost. Therefore, it would be logical to incorporate data normalization into the model itself and update its parameters during training. Isn't the batch normalization layer we are looking at suitable for solving this problem? This will be our first test.

To conduct such an experiment, we will use the script for testing perceptron models perceptron.py and create a copy of it named batch_norm.py. Let's make small changes to it.

At the beginning of the script, we import the necessary libraries as usual.

# Import libraries
import os
import pandas as pd
import numpy as np
import tensorflow as tf 
from tensorflow import keras 
import matplotlib.pyplot as plt
import MetaTrader5 as mt5

Before training, we need to load training datasets which are available in the sandbox of the MetaTrader 5 terminal. To determine the path to the sandbox, we connect to the terminal and find the path to the terminal data folder. We add MQL5\Files to the resulting path and thus get the path to the terminal sandbox. If you saved the training dataset to a subdirectory, you also need to add it to this sandbox path. Now you can disconnect from the terminal. We will create two local variables with the full path to the files of the training dataset, one with normalized data and the second one with non-normalized data.

# Load the training dataset
if not mt5.initialize():
    print("initialize() failed, error code =",mt5.last_error())
    quit()
 
path=os.path.join(mt5.terminal_info().data_path,r'MQL5\Files')
mt5.shutdown()
filename = os.path.join(path,'study_data.csv')
filename_not_norm = os.path.join(path,'study_data_not_norm.csv')
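Before loading, it can be useful to confirm that both files really exist in the sandbox. The following is a minimal sketch of such a check; the helper name missing_files is hypothetical and not part of the original script.

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# Hypothetical relative names are used here; in the script the list would be
# [filename, filename_not_norm].
print(missing_files(["study_data.csv", "study_data_not_norm.csv"]))
```

An empty list means that both datasets are in place and the script can continue.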

First, we load data from the normalized set.

data = np.asarray( pd.read_table(filename,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

Then we divide the loaded data into patterns and targets. Recall that when creating the training dataset, we wrote all the information about a pattern to the file in one line, and each line describes exactly one pattern. The last two elements of each row contain the target values of the pattern. Let's use this property: we determine the number of elements in the second dimension of our data array and subtract the number of target values from it to get the number of elements describing one pattern. Using this information, we divide the data into two arrays.

# Divide the training sample into initial data and goals
targets=2
inputs=data.shape[1]-targets
train_data=data[:,0:inputs]
train_target=data[:,inputs:]
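To see what this slicing does, here is a small sketch on a made-up array of three patterns with four inputs and two targets each. The toy values are assumptions for illustration only.

```python
import numpy as np

# Toy stand-in for the loaded array: 3 patterns, 4 inputs + 2 targets each.
toy = np.arange(18, dtype=np.float64).reshape(3, 6)

targets = 2
inputs = toy.shape[1] - targets      # 6 - 2 = 4
toy_data = toy[:, 0:inputs]          # first 4 columns of every row
toy_target = toy[:, inputs:]         # last 2 columns of every row

print(toy_data.shape, toy_target.shape)   # (3, 4) (3, 2)
print(toy_target[0])                      # [4. 5.]
```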

After that, we load and divide the data of the non-normalized training dataset in the same way.

# Load the non-normalized training dataset
data = np.asarray( pd.read_table(filename_not_norm,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

# Split the non-normalized training sample into initial data and goals
train_nn_data=data[:,0:inputs]
train_nn_target=data[:,inputs:]
 
del data

After dividing the training dataset into two tensors, we delete the source data object in order to use our resources more efficiently.
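Note that the non-normalized data is split with the inputs value computed from the normalized file, so both files are assumed to have the same column layout. A minimal sketch of a check for this assumption is shown below; the function check_split is hypothetical and not part of the original script.

```python
import numpy as np

def check_split(data_part, target_part, inputs, targets):
    """Return True if both parts have the expected shapes and equal row counts."""
    return (data_part.ndim == 2 and target_part.ndim == 2
            and data_part.shape[0] == target_part.shape[0]
            and data_part.shape[1] == inputs
            and target_part.shape[1] == targets)

# Made-up array with one column more than expected (4 inputs + 2 targets).
raw = np.zeros((5, 7))
inputs, targets = 4, 2
print(check_split(raw[:, :inputs], raw[:, inputs:], inputs, targets))  # False
```

In the script, such a check could be applied to train_nn_data and train_nn_target right after the split, before the source array is deleted.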

The next step after loading the data is to create neural network models for testing.

First, we will create a small fully connected perceptron with one hidden layer of 40 elements and a result layer of 2 elements.

# Creating the first model with one hidden layer
model1 = keras.Sequential([keras.layers.InputLayer(input_shape=inputs),
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(targets, activation=tf.nn.tanh) 
                         ])

After this, we create a callback object for early stopping, which ends training if the model's error on the training dataset does not decrease for five consecutive epochs. When compiling the model, we specify the Adam optimization method and the mean squared error as the loss function. In addition to the loss, we track the Accuracy metric, which shows the proportion of correct responses of the model.

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
model1.compile(optimizer='Adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
model1.summary()
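The logic of early stopping with a patience of five epochs can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the Keras implementation, and the loss values are made up.

```python
def early_stop_epoch(losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up loss values: the best loss is reached at epoch 3,
# so with patience=5 training stops at epoch 8.
losses = [0.9, 0.7, 0.5, 0.6, 0.55, 0.52, 0.51, 0.5, 0.4]
print(early_stop_epoch(losses))   # 8
```

Note that the improvement at epoch 9 is never seen: the loss did not beat its best value for five epochs in a row, so training has already ended.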

Next, we create a second model, in which we simply add a batch normalization layer between the source data layer and the hidden model layer.

# Add batch normalization to the source data 
# to a model with one hidden layer
model1bn = keras.Sequential([keras.layers.InputLayer(input_shape=inputs),
                             keras.layers.BatchNormalization(),
                             keras.layers.Dense(40, activation=tf.nn.swish), 
                             keras.layers.Dense(targets, activation=tf.nn.tanh) 
                            ])

And we compile the model with the same parameters.

model1bn.compile(optimizer='Adam',
                 loss='mean_squared_error',
                 metrics=['accuracy'])
model1bn.summary()

The models for our first experiment are ready.
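To recall what the added layer does with non-normalized inputs during training, here is a minimal NumPy sketch of the batch normalization forward pass. The function name and the made-up batch are assumptions for illustration; eps=1e-3 matches the default epsilon of the Keras layer.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature (column) of a batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)                     # biased variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Made-up batch of 1000 patterns: two features on very different scales.
rng = np.random.default_rng(0)
batch = np.column_stack([rng.normal(100.0, 5.0, 1000),
                         rng.normal(50000.0, 20000.0, 1000)])
out = batch_norm_train(batch, gamma=np.ones(2), beta=np.zeros(2))

# Both features now have a mean close to 0 and a standard deviation close to 1.
print(out.mean(axis=0), out.std(axis=0))
```

With gamma set to ones and beta to zeros, the layer acts as plain standardization; during training both vectors are adjusted together with the other weights.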

In the second experiment, I would like to evaluate the impact of batch normalization applied inside the network, between the hidden layers of the model. For this experiment, we will also create fully connected perceptrons, but with three identical hidden layers. The first of them does not use batch normalization: we take the first model from this script and add two more hidden layers to it, identical to the first one. The source data and result layers remain unchanged.

# Create a model with three hidden layers
model2 = keras.Sequential([keras.layers.InputLayer(input_shape=inputs),
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(targets, activation=tf.nn.tanh) 
                         ])

For the sake of experiment purity, we will compile the model with the same parameters.

model2.compile(optimizer='Adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
model2.summary()

Now let's add a batch normalization layer before each hidden layer. Note that we do not add a batch normalization layer before the result layer, because the authors of the method do not recommend it. In their experiments, this worsened the results of the models.

# Add batch normalization for the source data and hidden layers of the second model
model2bn = keras.Sequential([keras.layers.InputLayer(input_shape=inputs),
                             keras.layers.BatchNormalization(),
                             keras.layers.Dense(40, activation=tf.nn.swish), 
                             keras.layers.BatchNormalization(),
                             keras.layers.Dense(40, activation=tf.nn.swish), 
                             keras.layers.BatchNormalization(),
                             keras.layers.Dense(40, activation=tf.nn.swish), 
                             keras.layers.Dense(targets, activation=tf.nn.tanh) 
                            ])

As before, the model is compiled without changing the parameters.

model2bn.compile(optimizer='Adam',
                 loss='mean_squared_error',
                 metrics=['accuracy'])
model2bn.summary()
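The parameter counts printed by summary() can be checked by hand. The sketch below assumes the Keras defaults, where a batch normalization layer holds two trainable vectors (gamma and beta) and two non-trainable ones (moving mean and moving variance). The input size of 10 is hypothetical; in the script it is derived from the data file.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batch_norm_params(n_features):
    """Return (trainable, non_trainable) parameter counts of one BN layer."""
    return 2 * n_features, 2 * n_features   # gamma, beta / moving mean, variance

def model2bn_params(inputs, hidden=40, targets=2):
    """Count parameters of three BN + Dense pairs followed by the result layer."""
    trainable = non_trainable = 0
    n_in = inputs
    for _ in range(3):
        t, nt = batch_norm_params(n_in)
        trainable += t + dense_params(n_in, hidden)
        non_trainable += nt
        n_in = hidden
    trainable += dense_params(hidden, targets)
    return trainable, non_trainable

print(model2bn_params(10))   # (3982, 180)
```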

Now that all the models are built, we can start training them. All models will be trained with the same parameters. We will use batches of 1000 patterns, with the weights updated after each batch. Training will last for 500 epochs unless early stopping occurs. The last 10% of the training dataset will be used for validation, and the remaining patterns will be shuffled during training.
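The effect of these settings on one training epoch can be estimated with simple arithmetic. The sketch below assumes that the validation part is cut from the end of the arrays before shuffling, as Keras does; the dataset size of 40000 patterns is made up.

```python
import math

def fit_sizes(n_samples, batch_size=1000, validation_split=0.1):
    """Return (train_size, val_size, updates_per_epoch)."""
    train_size = int(n_samples * (1.0 - validation_split))
    val_size = n_samples - train_size
    updates = math.ceil(train_size / batch_size)
    return train_size, val_size, updates

# 40000 patterns is a made-up dataset size.
print(fit_sizes(40000))   # (36000, 4000, 36)
```

In this example, each epoch performs 36 weight updates on 36000 patterns, and the remaining 4000 patterns are used only for validation.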

First, let's train a model with one hidden layer using normalized data.

# Train the first model on normalized data
history1 = model1.fit(train_data, train_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.1,
                      shuffle=True)
model1.save(os.path.join(path,'perceptron1.h5'))

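Next, we train a model of the same architecture on non-normalized data. Calling fit on model1 again would continue from the weights it has already learned on normalized data, so we build and compile a fresh copy for this run.

# Train a fresh copy of the first model on non-normalized data
model1nn = keras.Sequential([keras.layers.InputLayer(input_shape=inputs),
                             keras.layers.Dense(40, activation=tf.nn.swish),
                             keras.layers.Dense(targets, activation=tf.nn.tanh)
                            ])
model1nn.compile(optimizer='Adam',
                 loss='mean_squared_error',
                 metrics=['accuracy'])
history1nn = model1nn.fit(train_nn_data, train_nn_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.1,
                      shuffle=True)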

Now we train a similar model using a batch normalization layer between the source data and the hidden layer. Training will be carried out on a non-normalized training dataset.

history1bn = model1bn.fit(train_nn_data, train_nn_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.1,
                      shuffle=True)
model1bn.save(os.path.join(path,'perceptron1bn.h5'))

The results of the first two trainings will serve as benchmarks for evaluating the model performance with the batch normalization layer.

At this stage, we have gathered enough information to answer the question of the first experiment: can data normalization during preprocessing be replaced with a batch normalization layer between the raw data and the trainable model?

Let's move on to the second experiment and determine how adding a batch normalization layer before each hidden layer affects the training process and the overall performance of the trained model. To do this, we need to train two more models.

First, we train a model with three hidden layers using pre-normalized data. We use the same training parameters to train the model.

history2 = model2.fit(train_data, train_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.1,
                      shuffle=True)
model2.save(os.path.join(path,'perceptron2.h5'))

Next, we train the model on the non-normalized training dataset, but with a batch normalization layer before each hidden layer. In particular, a batch normalization layer is also placed before the first hidden layer, right after the source data layer.

history2bn = model2bn.fit(train_nn_data, train_nn_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.1,
                      shuffle=True)
model2bn.save(os.path.join(path,'perceptron2bn.h5'))

After training these two models, we have enough information to draw conclusions from the results of the second experiment. For clarity, let's create graphs showing the change in training and validation errors as a function of the number of training epochs.

First, let's plot how the mean squared error between the model outputs and the target values changes for the first experiment.

# Drawing model training results with one hidden layer
plt.plot(history1.history['loss'], label='Normalized inputs train')
plt.plot(history1.history['val_loss'], label='Normalized inputs validation')
plt.plot(history1nn.history['loss'], label='Unnormalized inputs train')
plt.plot(history1nn.history['val_loss'], label='Unnormalized inputs validation')
plt.plot(history1bn.history['loss'],
                        label='Unnormalized inputs\nvs BatchNormalization train')
plt.plot(history1bn.history['val_loss'],
                   label='Unnormalized inputs\nvs BatchNormalization validation')
plt.ylabel('$MSE$ $loss$')
plt.xlabel('$Epochs$')
plt.title('Model training dynamics\n1 hidden layer')
plt.legend(loc='upper right', ncol=2)

In addition to the first graph, let's plot the dynamics of changes in the Accuracy metric.

plt.figure()
plt.plot(history1.history['accuracy'], label='Normalized inputs train')
plt.plot(history1.history['val_accuracy'], label='Normalized inputs validation')
plt.plot(history1nn.history['accuracy'], label='Unnormalized inputs train')
plt.plot(history1nn.history['val_accuracy'], label='Unnormalized inputs validation')
plt.plot(history1bn.history['accuracy'],
                           label='Unnormalized inputs\nvs BatchNormalization train')
plt.plot(history1bn.history['val_accuracy'],
                      label='Unnormalized inputs\nvs BatchNormalization validation')
plt.ylabel('$Accuracy$')
plt.xlabel('$Epochs$')
plt.title('Model training dynamics\n1 hidden layer')
plt.legend(loc='lower right', ncol=2)

We build similar graphs to display the results of the second experiment.

# Drawing the results of training models with three hidden layers
plt.figure()
plt.plot(history2.history['loss'], label='Normalized inputs train')
plt.plot(history2.history['val_loss'], label='Normalized inputs validation')
plt.plot(history2bn.history['loss'],
                   label='Unnormalized inputs\nvs BatchNormalization train')
plt.plot(history2bn.history['val_loss'],
              label='Unnormalized inputs\nvs BatchNormalization validation')
plt.ylabel('$MSE$ $loss$')
plt.xlabel('$Epochs$')
plt.title('Model training dynamics\n3 hidden layers')
plt.legend(loc='upper right', ncol=2)

plt.figure()
plt.plot(history2.history['accuracy'], label='Normalized inputs train')
plt.plot(history2.history['val_accuracy'], label='Normalized inputs validation')
plt.plot(history2bn.history['accuracy'],
                       label='Unnormalized inputs\nvs BatchNormalization train')
plt.plot(history2bn.history['val_accuracy'],
                  label='Unnormalized inputs\nvs BatchNormalization validation')
plt.ylabel('$Accuracy$')
plt.xlabel('$Epochs$')
plt.title('Model training dynamics\n3 hidden layers')
plt.legend(loc='lower right', ncol=2)
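Alongside the graphs, a short numerical summary can be helpful, for example the epoch with the lowest validation error. The helper below is a hypothetical addition that works on a plain dictionary with the same layout as the history attribute returned by fit; the numbers are made up.

```python
def best_epoch(history, key="val_loss"):
    """Return (1-based epoch, value) of the minimum of history[key]."""
    values = history[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Made-up numbers in the same layout as a Keras History.history dict.
demo_history = {"loss": [0.50, 0.40, 0.35, 0.33],
                "val_loss": [0.52, 0.45, 0.47, 0.46]}
print(best_epoch(demo_history))           # (2, 0.45)
print(best_epoch(demo_history, "loss"))   # (4, 0.33)
```

In the script, it could be called as best_epoch(history2bn.history) for each trained model.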

So, at this stage, we have trained all the models using data from the training set. For us, the training dataset represents historical data. Of course, the fact that the model can approximate historical data is a good thing. But we would like the model to work well in real-time. To check how the model behaves on unknown data, let's check the operation of the models on a test sample.

We load the test dataset in the same way as we loaded the training datasets. First, let's load the normalized test dataset.

# Load the test dataset
test_filename = os.path.join(path,'test_data.csv')
test = np.asarray( pd.read_table(test_filename,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

Now we divide the loaded data into patterns and target values.

# Separation of the test sample into initial data and goals
test_data=test[:,0:inputs]
test_target=test[:,inputs:]

Then we repeat the algorithm to load the non-normalized test dataset.

test_filename = os.path.join(path,'test_data_not_norm.csv')
test = np.asarray( pd.read_table(test_filename,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

# Split the test dataset into initial data and goals
test_nn_data=test[:,0:inputs]
test_nn_target=test[:,inputs:]
 
del test

After copying the data, we delete the array of initial data, which will allow us to manage our resources more efficiently.

Next, we will test the operation of all models on test samples. We check the operation of models without batch normalization layers on normalized data. We will test models using batch normalization layers on non-normalized test sample data.

# Checking the results of models on a test sample
test_loss1, test_acc1 = model1.evaluate(test_data, test_target, verbose=2)
test_loss1bn, test_acc1bn = model1bn.evaluate(test_nn_data, test_nn_target,
                                              verbose=2)
test_loss2, test_acc2 = model2.evaluate(test_data, test_target, verbose=2)
test_loss2bn, test_acc2bn = model2bn.evaluate(test_nn_data, test_nn_target,
                                              verbose=2)
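The models with batch normalization can receive non-normalized test data because, outside of training, the layer no longer uses the statistics of the current batch. Instead, it applies the moving mean and variance accumulated during training. A minimal NumPy sketch of this behavior follows; the function names and the stored statistics are made up, while momentum=0.99 and eps=1e-3 match the Keras defaults.

```python
import numpy as np

def update_moving(moving, batch_stat, momentum=0.99):
    """Exponential moving average used to track batch statistics in training."""
    return momentum * moving + (1.0 - momentum) * batch_stat

def batch_norm_infer(x, moving_mean, moving_var, gamma, beta, eps=1e-3):
    """Normalize with stored statistics; the result does not depend on the batch."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

# Made-up stored statistics for two features.
moving_mean = np.array([100.0, 50000.0])
moving_var = np.array([25.0, 4.0e8])
gamma, beta = np.ones(2), np.zeros(2)

# One pattern: one standard deviation above and below the stored means.
sample = np.array([[105.0, 30000.0]])
print(batch_norm_infer(sample, moving_mean, moving_var, gamma, beta))
```

The output is close to 1 and -1, and it would be the same whether the pattern is evaluated alone or as part of a larger batch.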

Testing results are output to the log.

# Output test results to the journal
print('Model 1 hidden layer')
print('Test accuracy:', test_acc1)
print('Test loss:', test_loss1)

print('Model 1 hidden layer with BatchNormalization')
print('Test accuracy:', test_acc1bn)
print('Test loss:', test_loss1bn)

print('Model 3 hidden layers')
print('Test accuracy:', test_acc2)
print('Test loss:', test_loss2)

print('Model 3 hidden layers with BatchNormalization')
print('Test accuracy:', test_acc2bn)
print('Test loss:', test_loss2bn)
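The test loss printed here is the mean squared error between the model outputs and the target values. As a reminder of what this number means, here is a small sketch that computes it by hand for made-up targets and predictions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences over all patterns and outputs."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up targets and predictions for three patterns with two outputs each.
y_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_pred = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]

# Two of the six outputs are off by 0.5, so the result is 0.5 / 6.
print(mse(y_true, y_pred))
```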

For clarity, we present the results graphically, separately for the mean squared error and for the Accuracy metric.

plt.figure()
plt.bar(['1 hidden layer','1 hidden layer\nvs BatchNormalization',
         '3 hidden layers','3 hidden layers\nvs BatchNormalization'],
        [test_loss1,test_loss1bn,test_loss2,test_loss2bn])
plt.ylabel('$MSE$ $Loss$')
plt.title('Test results')

plt.figure()
plt.bar(['1 hidden layer','1 hidden layer\nvs BatchNormalization',
         '3 hidden layers','3 hidden layers\nvs BatchNormalization'],
        [test_acc1,test_acc1bn,test_acc2,test_acc2bn])
plt.ylabel('$Accuracy$')
plt.title('Test results')
 
plt.show()

After creating the graphs, we call the command to render them on the user's screen.

With this, we conclude our work on the script that tests how the use of a batch normalization layer affects training results and model performance. We will look at the results in the next section, dedicated to testing the models.