Discussion of article "Neural networks made easy (Part 7): Adaptive optimization methods"

 

New article Neural networks made easy (Part 7): Adaptive optimization methods has been published:

In previous articles, we used stochastic gradient descent to train a neural network, applying the same learning rate to all neurons within the network. In this article, I suggest looking at adaptive learning methods, which allow the learning rate to be changed for each neuron. We will also consider the pros and cons of this approach.
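For reference, below is a minimal sketch of one Adam update step for a single weight, following the standard formulas. The function name, parameter layout and default hyperparameters are illustrative assumptions and are not taken from the article's code.

   //--- Illustrative sketch: one Adam update for a single weight (assumed helper, not the article's code)
   double AdamStep(double w,            // current weight
                   double grad,         // error gradient for this weight
                   double &m,           // first moment estimate (kept between calls)
                   double &v,           // second moment estimate (kept between calls)
                   int    t,            // update counter, starting from 1
                   double lr=0.001,double beta1=0.9,double beta2=0.999,double eps=1e-8)
     {
      m=beta1*m+(1.0-beta1)*grad;                  // exponential average of gradients
      v=beta2*v+(1.0-beta2)*grad*grad;             // exponential average of squared gradients
      double m_hat=m/(1.0-MathPow(beta1,t));       // bias correction of the first moment
      double v_hat=v/(1.0-MathPow(beta2,t));       // bias correction of the second moment
      return(w-lr*m_hat/(MathSqrt(v_hat)+eps));    // per-weight adaptive step
     }

Because the step size is scaled by the accumulated first and second moments of each individual gradient, every weight effectively gets its own learning rate.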

Testing of the Adam optimization method was performed under the same conditions that were used in earlier tests: symbol EURUSD, timeframe H1, data on 20 consecutive candlesticks fed into the network, and training on the history of the last two years. The Fractal_OCL_Adam Expert Advisor was created for testing. It is based on the Fractal_OCL EA, with the Adam optimization method specified in the neural network description in the OnInit function of the main program.

      desc.count=(int)HistoryBars*12;   // layer size: 12 values per candlestick
      desc.type=defNeuron;              // fully connected neuron layer
      desc.optimization=ADAM;           // use the Adam optimization method

The number of layers and neurons has not changed.

The Expert Advisor was initialized with random weights in the range from -1 to 1, excluding zero values. During testing, the neural network error stabilized at around 30% as early as the 2nd training epoch. As you may remember, when training with the stochastic gradient descent method, the error stabilized at around 42% only after the 5th training epoch.
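A minimal sketch of such an initialization is shown below; the helper name is hypothetical and only illustrates drawing a non-zero value in the range from -1 to 1.

   //--- Hypothetical helper: random weight in [-1;1], excluding zero
   double RandomWeight(void)
     {
      double w=0.0;
      while(w==0.0)                                // re-draw until the value is non-zero
         w=2.0*MathRand()/32767.0-1.0;             // MathRand() returns an integer in [0;32767]
      return(w);
     }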


The graph of missed fractals shows a gradual increase in the value throughout training. However, after 12 training epochs the growth rate gradually slows down, and after the 14th epoch the value reached 72.5%. When training a similar neural network with the stochastic gradient descent method, the percentage of missed fractals after 10 epochs was 97-100% for different learning rates.


Author: Dmitriy Gizlyk
