Convolutional neural networks

We continue our exploration of neural network architectures, focusing now on the principles of operation and construction of convolutional neural networks (CNNs). These networks are widely used in tasks involving object recognition in photos and videos. CNNs are considered robust to changes in scale, shifts in perspective, and other spatial distortions of images. Their architecture enables them to detect objects equally effectively in any part of a scene.

In addition to the architectural differences we will discuss in the next chapter, there is a fundamental distinction in how convolutional and fully connected neural networks logically process incoming data streams. As we have seen earlier, each neuron in a fully connected network is linked to all neurons in the previous layer and responds to distinct patterns in the input data. Let's try to translate this to image pattern recognition.

Imagine training a neural network to recognize digits printed on a piece of paper. Each digit is printed on a perfectly clean sheet, and the network has learned to recognize them perfectly. But as soon as you feed it a slightly noisy input, its output becomes unpredictable: the noise alters the image, making it deviate from the ideal patterns in the training dataset.

The opposite situation is also possible. If you train a neural network on noisy images, where the object of interest sits on some background, a fully connected network with a sufficient number of neurons and an adequate training dataset can solve such a task. However, it perceives the picture as a whole. Changing or removing the background can upset the balance learned by a fully connected network, causing its output to become unpredictable. Once again, it is all about the integrity of perception. In supervised learning, when we showed the network a noisy image during training along with the correct answer, the network associated that image with the answer and memorized it. But it did not pick out the relevant object from the picture; instead, it memorized the picture as a whole. The absence of the background is therefore perceived by the network as the absence of some component of the image, and in such a case it will struggle to produce the correct result.

Similar issues arise with rotations, zooms, or any minor alteration of the input data: a fully connected network treats each change as a new, unseen pattern. Processing and memorizing all these variants demands additional resources.

In addition to the recognition problems mentioned above, there is also a performance problem. As the size of the processed images grows, so does the size of the incoming data stream, and with it the weight matrices. More memory is therefore required to store the weights, and more time is required to train the network. On top of that, in many cases a significant portion of the image carries no useful information, so resources are spent inefficiently.
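To make the scaling problem concrete, here is a small Python sketch comparing the weight count of one fully connected layer against one convolutional layer. The specific sizes (128 hidden neurons, 128 filters of 3x3) are hypothetical, chosen only for illustration.

```python
# Compare weight counts: fully connected layer vs. convolutional layer.

def fc_weights(height, width, neurons):
    # In a fully connected layer, every input pixel feeds every neuron.
    return height * width * neurons

def conv_weights(kernel, filters):
    # In a convolutional layer, each filter has kernel*kernel weights,
    # independent of the image size.
    return kernel * kernel * filters

for size in (28, 256, 1024):
    fc = fc_weights(size, size, 128)    # hypothetical 128 hidden neurons
    conv = conv_weights(3, 128)         # hypothetical 128 filters of 3x3
    print(f"{size}x{size} image: FC = {fc:,} weights, Conv = {conv:,} weights")
```

The fully connected weight count grows with the square of the image side, while the convolutional count stays constant at 1,152 weights regardless of the input size.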

To address these issues, convolutional neural networks were designed. Instead of examining the whole image at once, CNNs divide it into small components and scrutinize each one as if under a microscope. Each convolutional layer contains filters that check individual parts of the image for correspondence to the desired patterns, which mitigates the influence of noise and background.

The use of small weight matrices for each filter significantly reduces the memory required to store them. Moreover, with this approach, the size of the weight matrix depends not on the size of the original image but on the size of the filter. Therefore, as the processed images grow, the size of the weight matrix remains unchanged.
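As an illustration, here is a minimal Python sketch of a single 2D convolution (valid padding, stride 1). The hand-made vertical-edge kernel is hypothetical and stands in for a filter that a real CNN would learn during training; note that the same 3x3 weight matrix is reused at every position, whatever the image size.

```python
# Minimal 2D convolution over a grayscale image (valid padding, stride 1).

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            # Slide the same small kernel across the image and compute
            # the weighted sum at each position.
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# Hypothetical vertical-edge detector; a trained CNN learns such filters itself.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

# Tiny 4x4 test image: dark left half, bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

result = convolve2d(image, kernel)   # 2x2 map of pattern-match scores
```

Each output value measures how strongly the local patch matches the filter's pattern, which is exactly the "degree of similarity" a convolutional layer reports for each part of the image.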

In addition to the benefits above, CNNs reduce the amount of data processed with each layer: for each small constituent part of the image, only one value is returned, indicating the degree of similarity between that part and the desired pattern.

In terms of trading, I have tried to carry the advantages of CNNs over to a new domain. Instead of searching an image for small constituents of a desired pattern, we will look for small components of patterns in the input data stream of price candles and indicator values. By doing so, we will try to eliminate the influence of noise fluctuations on the overall result.

Furthermore, one of the major advantages of convolutional neural networks lies in the fact that the network learns the filters during the training process.

Let's delve into the architecture of this solution and get to know it more closely.