Implementing the perceptron model in Python

To implement the fully connected perceptron model in Python, we will use the template we created earlier. As you may recall, in this template, we left the description of the neural layers in our model unfilled.

# Create a neural network model
model = keras.Sequential([keras.Input(shape=inputs),
                         # Fill the model with a description of the neural layers
                         ])

To create fully connected layers in a neural network, we will use the layers.Dense class from the Keras library. The following operation is performed within this layer:

output = activation(dot(input, kernel) + bias)
where:

  • activation = the activation function, set in the parameters
  • input = an array of source data
  • kernel = the weight matrix
  • dot = the dot product (vector multiplication) operation
  • bias = the bias (offset) vector
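
The sketch below (illustrative only, with random data, not part of our script) reproduces this operation by hand and compares it with the output of a Dense layer. It assumes the NumPy and TensorFlow imports shown in the code.

# Illustrative check of the Dense operation on random data
import numpy as np
import tensorflow as tf
from tensorflow import keras

x = np.random.rand(3, 5).astype('float32')        # 3 samples with 5 features each
dense = keras.layers.Dense(4, activation='tanh')  # a layer with 4 neurons
y_layer = dense(x).numpy()                        # output computed by the layer

kernel, bias = dense.get_weights()                # weight matrix and bias vector
y_manual = np.tanh(np.dot(x, kernel) + bias)      # activation(dot(input, kernel) + bias)

print(np.allclose(y_layer, y_manual))             # True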

Dense provides parameters to control the neural layer creation process:

  • units — the dimension of the output space (the number of neurons in the layer);
  • activation — the activation function to use;
  • use_bias — an optional flag indicating whether to use a bias vector;
  • kernel_initializer — the method of initializing the weight matrix;
  • bias_initializer — the method of initializing the bias vector;
  • kernel_regularizer — the regularization method for the weight matrix;
  • bias_regularizer — the regularization method for the bias vector;
  • activity_regularizer — the regularization method applied to the layer output (its activations);
  • kernel_constraint — a constraint function for the weight matrix;
  • bias_constraint — a constraint function for the bias vector.

Please note that the layer settings cannot be changed after the layer has been called for the first time.
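
As an illustration (a sketch with arbitrary values, not part of our script), here is a Dense layer created with several of the parameters listed above set explicitly:

import tensorflow as tf
from tensorflow import keras

# Illustrative only: a Dense layer with several parameters set explicitly
layer = keras.layers.Dense(
    units=40,                                        # number of neurons in the layer
    activation=tf.nn.swish,                          # activation function
    use_bias=True,                                   # use a bias vector
    kernel_initializer='glorot_uniform',             # weight matrix initialization
    bias_initializer='zeros',                        # bias vector initialization
    kernel_regularizer=keras.regularizers.l2(1e-5))  # weight matrix regularization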

In addition to the above parameters, Dense can take an input_shape parameter that indicates the size of the input array. This parameter is valid only for the first layer of the neural network. When it is used, an input layer is created and inserted in front of the current layer. Using it can be considered equivalent to explicitly defining the input layer.
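
For example, the following two definitions create equivalent models (a sketch only; the value of inputs here is hypothetical, in our script it comes from the template):

import tensorflow as tf
from tensorflow import keras

inputs = 12   # hypothetical number of input features, used only for this illustration

# Variant 1: an explicit input layer
model_a = keras.Sequential([keras.Input(shape=(inputs,)),
                            keras.layers.Dense(40, activation=tf.nn.swish)])

# Variant 2: the same model defined via the input_shape parameter of the first layer
model_b = keras.Sequential([keras.layers.Dense(40, activation=tf.nn.swish,
                                               input_shape=(inputs,))])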

We'll start implementing our first neural network model by copying our script template into a new perceptron.py file. In this file, we will create the first model with one hidden layer of 40 neurons and 2 neurons in the output layer. In the hidden layer, we'll use Swish as the activation function. The neurons in the output layer will be activated by the hyperbolic tangent.

# Create a neural network model
model1 = keras.Sequential([keras.Input(shape=inputs),
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(targets, activation=tf.nn.tanh) 
                          ])
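
To make sure the layers are assembled as intended, we can print the model structure. This call is not part of the template, it is just a quick check:

model1.summary()    # prints layer types, output shapes, and parameter counts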

In theory, this is enough to start training the model. However, we are studying the operation of various models and want to understand how changing the neural network architecture affects the model's ability to learn and to generalize the source data. So I've added two more models. One of them has two additional hidden layers, resulting in a model with three hidden layers. All three hidden layers are completely identical: they have 40 elements each and are activated by the Swish function. The input and output layers remain unchanged.

# Create a model with three hidden layers
model2 = keras.Sequential([keras.Input(shape=inputs),
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(40, activation=tf.nn.swish), 
                           keras.layers.Dense(targets, activation=tf.nn.tanh) 
                         ])

The following steps should be repeated for each model. First, let's prepare the model for training using the compile method.

model2.compile(optimizer='Adam',
               loss='mean_squared_error',
               metrics=['accuracy'])

After that, we will start the model training process and save the trained model.

history2 = model2.fit(train_data, train_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.2,
                      shuffle=True)
model2.save(os.path.join(path,'perceptron2.h5'))

We will build the third model on the basis of the second model with the addition of regularization. For each hidden layer, we will pass to the kernel_regularizer parameter a keras.regularizers.l1_l2 object with the L1 and L2 regularization coefficients. As you can see from the name, we'll be using Elastic Net (a combination of L1 and L2 regularization).

# Add regularization to the model with three hidden layers
model3 = keras.Sequential([keras.Input(shape=inputs),
               keras.layers.Dense(40, activation=tf.nn.swish,
                  kernel_regularizer=keras.regularizers.l1_l2(l1=1e-7, l2=1e-5)), 
               keras.layers.Dense(40, activation=tf.nn.swish,
                  kernel_regularizer=keras.regularizers.l1_l2(l1=1e-7, l2=1e-5)), 
               keras.layers.Dense(40, activation=tf.nn.swish,
                  kernel_regularizer=keras.regularizers.l1_l2(l1=1e-7, l2=1e-5)), 
               keras.layers.Dense(targets, activation=tf.nn.tanh) 
                         ])
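
To see what exactly the regularizer adds to the loss, the keras.regularizers.l1_l2 object can be called directly on a tensor of weights. The sketch below uses arbitrary numbers and is not part of our script:

import tensorflow as tf
from tensorflow import keras

# Illustrative only: the penalty that l1_l2 adds to the loss for a given weight matrix
reg = keras.regularizers.l1_l2(l1=1e-7, l2=1e-5)
w = tf.constant([[0.5, -1.0], [2.0, 0.1]])
penalty = reg(w)           # 1e-7 * sum(|w|) + 1e-5 * sum(w**2)
print(float(penalty))      # the value added to the total loss for this weight matrix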

Next, we'll compile and train the model. All three models use identical training parameters. This will make it possible to directly assess the impact of the model architecture on the learning outcome. At the same time, we will eliminate the influence of other factors as much as possible.

model3.compile(optimizer='Adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
history3 = model3.fit(train_data, train_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.2,
                      shuffle=True)
model3.save(os.path.join(path,'perceptron3.h5'))

Since we are training not one but three models in this script, we also need to adjust the visualization block. Let's display the training results of all three models on one graph, which will show the differences in the training and validation processes. We will make changes to the blocks that construct both graphs.

# Plot the training results of the three models
plt.figure()
plt.plot(history1.history['loss'],
         label='Train 1 hidden layer')
plt.plot(history1.history['val_loss'],
         label='Validation 1 hidden layer')
plt.plot(history2.history['loss'],
         label='Train 3 hidden layers')
plt.plot(history2.history['val_loss'],
         label='Validation 3 hidden layers')
plt.plot(history3.history['loss'],
         label='Train 3 hidden layers with regularization')
plt.plot(history3.history['val_loss'],
         label='Validation 3 hidden layers with regularization')
plt.ylabel('$MSE$ $Loss$')
plt.xlabel('$Epochs$')
plt.title('Dynamic of Models train')
plt.legend(loc='lower left')

plt.figure()
plt.plot(history1.history['accuracy'],
         label='Train 1 hidden layer')
plt.plot(history1.history['val_accuracy'],
         label='Validation 1 hidden layer')
plt.plot(history2.history['accuracy'],
         label='Train 3 hidden layers')
plt.plot(history2.history['val_accuracy'],
         label='Validation 3 hidden layers')
plt.plot(history3.history['accuracy'],
         label='Train 3 hidden layers\nwith regularization')
plt.plot(history3.history['val_accuracy'],
         label='Validation 3 hidden layers\nwith regularization')
plt.ylabel('$Accuracy$')
plt.xlabel('$Epochs$')
plt.title('Dynamic of Models train')
plt.legend(loc='upper left')

After training, our template tests the model performance on a test sample. Here, we also need to test all three models under the same conditions. I'll skip the block that loads the test sample, as it was carried over from the template unchanged. Below is only the code that tests the models themselves.

# Check the results of models on a test sample
test_loss1, test_acc1 = model1.evaluate(test_data,
                                        test_target,
                                        verbose=2)
test_loss2, test_acc2 = model2.evaluate(test_data,
                                        test_target,
                                        verbose=2)
test_loss3, test_acc3 = model3.evaluate(test_data,
                                        test_target,
                                        verbose=2)

In the template, the test results were simply output to the log. Now we have the results of testing three models, and it will be more informative to compare them on a graph. We will use the Matplotlib library to build the graphs.

In this case, we will not display the dynamics of the process, as before, but will compare the final values. Therefore, a bar chart is more convenient for displaying them. The library offers the bar method for constructing such charts. This method takes two arrays as parameters: in the first, we specify the labels of the compared items, and in the second, their values. To complete the picture, let's add a graph title and a vertical axis label using the title and ylabel methods, respectively.

plt.figure()
plt.bar(
    ['1 hidden layer','3 hidden layers','3 hidden layers\nwith regularization'],
    [test_loss1,test_loss2,test_loss3])
plt.ylabel('$MSE$ $Loss$')
plt.title('Result of test')
 
plt.figure()
plt.bar(
    ['1 hidden layer','3 hidden layers','3 hidden layers\nwith regularization'],
    [test_acc1,test_acc2,test_acc3])
plt.ylabel('$Accuracy$')
plt.title('Result of test')
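
One final reminder: Matplotlib displays the prepared figures only after plt.show() is called. If your copy of the template does not already call it at the end of the script, add it after the plotting blocks:

plt.show()    # render all figures prepared above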

We will see how the script works a little later. In the next chapter, we'll prepare data for training and testing models.