Neural network training

We have already learned about the structure of an artificial neuron, the organization of data exchange between neurons, and the principles of neural network construction. We also learned how to initialize synaptic coefficients. The next step is to train the neural network.

Tom Mitchell proposed the following definition of machine learning:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E."

Usually, three main approaches to neural network training are distinguished:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

In practice, supervised learning algorithms produce the best results, but they require a lot of preparatory work. The very principle of supervised learning implies that correct answers exist. Acting as a supervisor, at each training iteration we guide the neural network by showing it the correct results, thereby encouraging it to learn which answers are right and which are wrong.

In this approach, the neural network learns correlations between the raw input data and the correct answers. Ideally, it should learn to extract the essential features from the initial data. By generalizing the extracted features, it should then determine which class an object belongs to (classification tasks) or predict the most probable value or course of events (regression tasks).
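
To make the idea concrete, below is a minimal sketch of one supervised training loop. The single linear neuron, the synthetic data, and the learning rate are illustrative assumptions for this sketch, not the method of any particular library: at each iteration, the network's output is compared against the correct answers, and the synaptic coefficients are nudged to reduce the error.

```python
import numpy as np

rng = np.random.default_rng(42)

# An illustrative labeled sample: inputs X and "correct answers" y
# (here the answer is simply the sum of the inputs).
X = rng.normal(size=(100, 3))
y = X.sum(axis=1, keepdims=True)

# One linear neuron: synaptic coefficients (weights) and a bias.
W = rng.normal(scale=0.1, size=(3, 1))
b = np.zeros((1, 1))
lr = 0.05  # learning rate

for epoch in range(200):
    pred = X @ W + b                      # forward pass
    error = pred - y                      # deviation from the correct answers
    loss = (error ** 2).mean()            # mean squared error
    grad_W = 2 * X.T @ error / len(X)     # gradient with respect to the weights
    grad_b = 2 * error.mean(keepdims=True)
    W -= lr * grad_W                      # nudge parameters toward smaller error
    b -= lr * grad_b

print(f"final loss: {loss:.6f}")
```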

The complexity of this approach lies in the extensive preparatory work it requires: a correct answer must be mapped to every set of initial data in the training sample. This work cannot always be automated, so human labelers often have to be involved. At the same time, relying on manual work to prepare the training set and its answers increases the risk of errors in the sample and, as a result, of an improperly configured neural network.

Another risk of this approach is overfitting the neural network. This phenomenon usually occurs when a deep network is trained on a small set of input data. In such a case, the neural network is able to simply memorize all the input-answer pairs, losing its ability to generalize. As a result, we get a neural network with excellent results on the training dataset but near-random answers on the test sample and on real data.

The risk of overfitting can be reduced by various regularization, normalization, and dropout methods, which we will discuss later.
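
As a preview, the sketch below illustrates two of these techniques: an L2 (weight decay) penalty added to the gradient, and "inverted" dropout applied to a hidden activation during training. The layer sizes, dropout rate, decay coefficient, and placeholder gradient values are all arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

h = rng.normal(size=(32, 16))      # hidden-layer activations (batch x units)
W = rng.normal(scale=0.1, size=(16, 1))
grad_W = rng.normal(size=(16, 1))  # gradient from the loss (placeholder values)

# L2 regularization: penalize large weights by adding them to the gradient.
weight_decay = 1e-4
grad_W += weight_decay * W

# Inverted dropout: randomly zero activations during training and rescale
# the survivors, so nothing needs to change at inference time.
keep_prob = 0.8
mask = (rng.random(h.shape) < keep_prob) / keep_prob
h_train = h * mask   # used during training
h_infer = h          # used as-is during inference
```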

Also, note that we can encounter a task where there is no clear correct answer for the presented data sets. In such cases, other approaches to training neural networks are used.

Unsupervised learning is used when there are no correct answers for the training sample. Unsupervised learning algorithms allow for the extraction of individual features of the original data objects. Comparing the extracted features, the algorithms cluster the original data, grouping the most similar objects into certain classes. The number of such classes is specified in the hyperparameters of the neural network.
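
The idea can be illustrated outside of neural networks with k-means, a classic clustering algorithm swapped in here purely as an illustration: the number of clusters k plays the role of the hyperparameter mentioned above, and the data below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled data: two blobs with no "correct answers" attached.
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(3, 0.5, size=(50, 2))])

k = 2                                  # number of classes, set as a hyperparameter
centers = X[rng.choice(len(X), k, replace=False)]

for _ in range(20):
    # Assign each object to the nearest center (grouping similar objects).
    dist = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
    labels = dist.argmin(axis=1)
    # Move each center to the mean of the objects assigned to it.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)  # the discovered class centers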

This cost saving at the data preparation stage comes at the expense of recognition quality and a narrower range of solvable tasks.

As the volume of available training data has grown, unsupervised learning algorithms have come to be widely used for the preliminary training of neural networks. Initially, a neural network is created and trained unsupervised on a large dataset. This teaches it to extract individual features from the initial data and to partition a large amount of data into separate classes of objects.

Then, decision-making neural layers (most often fully connected perceptron neural layers) are added to the pre-trained network, and the neural network is further trained using supervised learning algorithms.
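
Below is a highly simplified sketch of this two-stage scheme, assuming a linear autoencoder for the unsupervised stage and a single sigmoid decision layer fine-tuned on labels; the data, layer sizes, and learning rates are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stage 1: unsupervised pre-training on a large unlabeled dataset.
# A linear autoencoder learns to compress 8 inputs into 2 features and back.
X_big = rng.normal(size=(1000, 8))
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))
lr = 0.01

for _ in range(300):
    code = X_big @ W_enc                        # extracted features
    recon = code @ W_dec                        # reconstruction of the input
    err = recon - X_big                         # reconstruction error
    grad_dec = code.T @ err / len(X_big)
    grad_enc = X_big.T @ (err @ W_dec.T) / len(X_big)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# Stage 2: add a decision layer on top and fine-tune it with supervision
# on a much smaller labeled dataset.
X_small = rng.normal(size=(50, 8))
y_small = (X_small[:, 0] > 0).astype(float).reshape(-1, 1)
W_head = rng.normal(scale=0.1, size=(2, 1))

for _ in range(500):
    feats = X_small @ W_enc                     # features from the pre-trained encoder
    prob = 1 / (1 + np.exp(-(feats @ W_head)))  # sigmoid decision layer
    grad_head = feats.T @ (prob - y_small) / len(X_small)
    W_head -= 0.1 * grad_head                   # only the new layer is trained here
```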

This approach allows the neural network to be trained on a large volume of initial data, which helps minimize the risk of overfitting. And since training on the primary dataset is unsupervised, the deep neural network can then be fine-tuned on a relatively small labeled dataset. This reduces the resources required for the preparatory work of supervised training.

Reinforcement learning can be considered a separate approach to neural network training. It is used to solve optimization problems that require building a strategy, and it demonstrates its best results when training neural networks to play computer and logic games, for which the method was originally developed. It is applicable to long, finite processes in which the neural network must make a series of decisions based on the state of the environment, while the cumulative result of those decisions becomes clear only at the end of the process, for example, as a win or a loss in a game.

The essence of the method is to assign a reward or penalty to each action; during training, the strategy that maximizes the cumulative reward is determined.
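
A minimal illustration of the reward idea is tabular Q-learning, a classic reinforcement learning algorithm used here purely as a sketch: the agent receives its only reward at the end of the episode, and the learned action values come to encode the strategy with the maximum cumulative reward. The toy environment, rewards, and constants are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy environment: states 0..4 on a line; action 1 moves right, action 0 moves left.
# The only reward (+1) comes at the end of the episode, on reaching state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # estimated cumulative reward per (state, action)
alpha, gamma, eps = 0.1, 0.9, 0.2    # learning rate, discount factor, exploration rate

def greedy(qvals):
    # Best-known action, breaking ties at random.
    best = np.flatnonzero(qvals == qvals.max())
    return rng.choice(best)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < eps else greedy(Q[s])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the very end
        # Shift the estimate toward the reward plus the discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned strategy: the best action in each state
```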

In this book, more attention will be given to supervised learning, which in practice shows the best training results and is applicable to solving regression tasks, including time series forecasting.