Discussion of article "Neural networks made easy (Part 13): Batch Normalization"

 

New article Neural networks made easy (Part 13): Batch Normalization has been published:

In the previous article, we started considering methods aimed at improving neural network training quality. In this article, we continue this topic with another such approach: batch data normalization.

Various approaches to data normalization are used in practical neural network applications. All of them aim to keep the training sample data and the outputs of the hidden layers of the neural network within a certain range and with certain statistical characteristics, such as variance and median. This is important because network neurons apply linear transformations which, in the course of training, shift the sample towards the antigradient.

Consider a fully connected perceptron with two hidden layers. During a feed-forward pass, each layer generates a certain data set that serves as a training sample for the next layer. The result of the output layer is compared with the reference data. Then, during the feed-backward pass, the error gradient is propagated from the output layer through the hidden layers towards the initial data. Having received an error gradient at each neuron, we update the weight coefficients, adjusting the neural network to the training samples of the last feed-forward pass. A conflict arises here: the second hidden layer (H2 in the figure below) is adjusted to the data sample at the output of the first hidden layer (H1 in the figure), while by changing the parameters of the first hidden layer we have already changed that data array. In other words, we adjust the second hidden layer to a data sample which no longer exists. A similar situation occurs with the output layer, which adjusts to the output of the second hidden layer, which has also already changed. The error scale is even greater if we consider the distortion between the first and second hidden layers. The deeper the neural network, the stronger this effect. This phenomenon is referred to as internal covariate shift.


Classical neural networks partly solve this problem by reducing the learning rate. Minor changes in weights do not entail significant changes in the sample distribution at the output of a neural layer. However, this approach does not solve the scaling problem that appears as the number of neural network layers grows, and it also slows down learning. Another problem of a small learning rate is that the process can get stuck in local minima, which we already discussed in article 6.
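For reference, the transform the article builds on is the standard batch normalization: compute the mean and variance of each neuron's output over the batch, normalize, then apply a learned scale (gamma) and shift (beta). Below is a minimal sketch of that formula in plain MQL5; it is only an illustration of the math, not the OpenCL-based layer implemented in the article.

//--- Minimal sketch of the batch normalization transform:
//---     y = gamma * (x - mu) / sqrt(var + eps) + beta
//--- Illustration only; the article's OpenCL layer works differently internally.
void BatchNormForward(const double &x[],       // one neuron's outputs over the batch
                      double       &y[],       // normalized outputs
                      const double  gamma,     // learned scale
                      const double  beta,      // learned shift
                      const double  eps=1e-5)  // numerical stability term
  {
   int n=ArraySize(x);
   ArrayResize(y,n);
   if(n<=0)
      return;
//--- batch mean
   double mu=0.0;
   for(int i=0;i<n;i++)
      mu+=x[i];
   mu/=n;
//--- batch variance
   double var=0.0;
   for(int i=0;i<n;i++)
      var+=MathPow(x[i]-mu,2);
   var/=n;
//--- normalize, then scale and shift
   for(int i=0;i<n;i++)
      y[i]=gamma*(x[i]-mu)/MathSqrt(var+eps)+beta;
  }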

Author: Dmitriy Gizlyk

 

Hi Dmitriy

Would you have a code example of an LSTM using the version of the NeuroNet.mqh file from article 13?

I tried to use the fractal_lstm.mq5 file from article 4, but without success ... an error occurs during training ...


cheers

 
Hi, this series about neural networks is very good. Congratulations!

For me as a beginner with NNs it was very enlightening. I want to use your proposals to code an EA. It should be a construction set for DNNs, to try out different functions and topologies and learn which work better.

So I modified your last example (MLMH + Convolutional).
I added many different activation functions (32 functions: Gaussian, SELU, SiLU, Softsign, Symmetric Sigmoid, ...) and their derivatives,
and I changed the error/success calculation (Buy, Sell, DontBuySell), because I think "don't trade" shouldn't be left undefined. So if the NN recognizes neither buy nor sell and this turns out to be correct, it should be rewarded in the feedback loop.
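To illustrate that idea of rewarding a correct "no trade" decision, here is a tiny sketch of a three-class target encoding; the function name and the encoding are assumptions for illustration only, not the article's code.

//--- Sketch: treat "don't trade" as a third, explicitly rewarded class
//--- instead of leaving it undefined (names/encoding are assumptions, not the article's code).
void SetTarget(const bool buy_signal,const bool sell_signal,double &target[])
  {
   ArrayResize(target,3);
   ArrayInitialize(target,0.0);
   if(buy_signal)
      target[0]=1.0;        // Buy
   else if(sell_signal)
      target[1]=1.0;        // Sell
   else
      target[2]=1.0;        // DontBuySell: a correct "no trade" gets rewarded too
  }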

Maybe someone already has solutions or can help with the following questions:

I'm not able to create functions which need the weights of the complete layer: Softmax, Maxout, PReLU with learned alpha.
Also I'm not able to implement different optimizers (AdaBound, AMSBound, Momentum).
I'm thinking of a DNN-builder EA for testing, to find the best net topology.

1. How can I find the in-/out-count of neurons and weights per layer?

2. What topology do you suggest? I tried many variations:
A) A few neuron layers with count=19000, then descending counts in the next layers (*0.3)
B) 1 convolutional + 12 MLMH layers with 300 neurons each
C) 29 layers with 300 neurons each
D) 29 layers with 300 neurons each and normalization between each layer.
I get forecasts of at most 57%, but I think it can/has to be better.
Should there be layers with a rising neuron count and then descending again?

3. How can I run a backtest? There is a condition to return false when in tester mode - I tried to comment it out, but no success.
There are many very detailed explanations, but I'm missing some of the overview.

4. Which layer should follow which? Where should the BatchNorm layers be placed?

5. How many output neurons does a convolutional layer, or the multi-head layers like MLMH, have when layers=x, step=y, window_out=z? I have to calculate the count of the next neuron layer, and I want to avoid overly big layers or bottlenecks (see the sizing sketch below, after question 9).

6. What about LSTM_OCL? Is it too weak compared to attention/MH, MLMH?

7. I want to implement a separate eta (learning rate) for each layer, but had no success (lack of know-how about classes - I'm a good 3rd-generation-language coder).

8. What should be modified to get an error rate < 0.1? I constantly get 0.6+.

9. What about bias neurons in the existing layer layouts?
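Regarding question 5: assuming the standard, padding-free 1D convolution arithmetic (the exact sizing inside NeuroNet.mqh and the attention layers may differ), the output count can be estimated roughly like this:

//--- Rough sizing helper (assumption: standard 1D convolution, no padding).
//--- Treat it only as a starting point; verify against the layer's actual output buffer size.
int ConvOutputCount(const int inputs,      // neurons in the previous layer
                    const int window,      // convolution window size
                    const int step,        // stride of the window
                    const int window_out)  // filters (outputs) per window position
  {
   if(step<=0 || window>inputs)
      return(0);
   int positions=(inputs-window)/step+1;   // how many positions the window takes
   return(positions*window_out);           // one output per filter at each position
  }
//--- Example: 300 inputs, window=9, step=3, window_out=8 -> ((300-9)/3+1)*8 = 784 outputs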

I have already studied many websites for weeks, but didn't find answers to these questions.
But I'm looking forward to solving this, encouraged by the positive feedback from others who have already had success.

Maybe Part 14 is coming with solutions for all these issues?

Best regards
and many thanks in advance
 

Hi. I am getting this error:

CANDIDATE FUNCTION NOT VIABLE: NO KNOW CONVERSION FROM 'DOUBLE __ATTRIBUTE__((EXT_VECTOR_TYPE92000' TO 'HALF4' FOR 1ST ARGUMENT

2022.11.30 08:52:28.185 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) OpenCL program create failed. Error code=5105

2022.11.30 08:52:28.194 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) Error of feedForward function: 4401
2022.11.30 08:52:28.199 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) invalid pointer access in 'NeuroNet.mqh' (2271,16)

when using the EA examples since article part 10.

Please any guess???

Thank you

 
MrRogerioNeri #:

Hi. I am getting this error:

CANDIDATE FUNCTION NOT VIABLE: NO KNOW CONVERSION FROM 'DOUBLE __ATTRIBUTE__((EXT_VECTOR_TYPE92000' TO 'HALF4' FOR 1ST ARGUMENT

2022.11.30 08:52:28.185 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) OpenCL program create failed. Error code=5105

2022.11.30 08:52:28.194 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) Error of feedForward function: 4401
2022.11.30 08:52:28.199 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) invalid pointer access in 'NeuroNet.mqh' (2271,16)

when using the EA examples since article part 10.

Please any guess???

Thank you

Hi, can you send the full log?

 

Hi. Thanks for help

Rogerio

Files:
20221201.log  7978 kb
 
MrRogerioNeri #:

Hi. Thanks for help

Rogerio

Hello Rogerio.

1. You didn't create the model.

CS      0       08:28:40.162    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) EURUSD_PERIOD_H1_ 20Fractal_OCL_AttentionMLMH_d.nnw
CS      0       08:28:40.163    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) OnInit - 130 -> Error of read EURUSD_PERIOD_H1_ 20Fractal_OCL_AttentionMLMH_d.nnw prev Net 5004

2. Your GPU doesn't support double. Please load the latest version from the article https://www.mql5.com/ru/articles/11804

CS      0       08:28:40.192    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) OpenCL: GPU device 'Intel HD Graphics 4400' selected
CS      0       08:28:43.149    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) 1:9:26: error: OpenCL extension 'cl_khr_fp64' is unsupported
CS      0       08:28:43.149    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) 1:55:16: error: no matching function for call to 'dot'
CS      0       08:28:43.149    Fractal_OCL_AttentionMLMH_d (EURUSD,H1) c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:2199:61: note: candidate function not viable: no known conversion from 'double4' to 'float' for 1st argument
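For anyone hitting the same messages: the key line is the unsupported 'cl_khr_fp64' extension, i.e. the device cannot run kernels that use double. A quick up-front check could look like the sketch below; it assumes your terminal build lets you query device properties (CL_DEVICE_NAME, CL_DEVICE_EXTENSIONS) through the context handle returned by CLContextCreate.

//--- Sketch: check whether the selected OpenCL device advertises double precision.
//--- Assumption: device properties can be queried via the context handle in your build.
void OnStart()
  {
   int cl_ctx=CLContextCreate(CL_USE_GPU_ONLY);
   if(cl_ctx==INVALID_HANDLE)
     {
      Print("No OpenCL device available");
      return;
     }
   string device="",extensions="";
   CLGetInfoString(cl_ctx,CL_DEVICE_NAME,device);
   CLGetInfoString(cl_ctx,CL_DEVICE_EXTENSIONS,extensions);
   if(StringFind(extensions,"cl_khr_fp64")<0)
      Print(device,": no fp64 support - use the float-based version of the library");
   else
      Print(device,": fp64 supported");
   CLContextFree(cl_ctx);
  }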
 

Hi Dmitriy

You wrote: "You didn't create the model."

But how do I create a model? I compile all the program sources and run the EA.

The EA creates a file in the 'Files' folder with the extension .nnw. Isn't this file the model?

Thanks 

 

Hi Teacher Dmitriy

Now none of the .mqh files compile.

For example, when I try to compile VAE.mqh I get this error:

'MathRandomNormal' - undeclared identifier VAE.mqh 92 8
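For reference: MathRandomNormal() is not a built-in function, it is declared in the standard statistics library, so a missing include is one common cause of this error (an assumption about this particular case; the article's VAE code may fail for another reason). A minimal check:

//--- Sketch: MathRandomNormal() comes from the standard statistics library.
//--- Adding this include at the top of the file usually resolves the
//--- "undeclared identifier" error (assumption: the include is what is missing here).
#include <Math\Stat\Normal.mqh>

void OnStart()
  {
   double z[];
//--- draw 10 samples from N(0,1) just to confirm the function is visible
   if(MathRandomNormal(0.0,1.0,10,z))
      Print("first sample: ",z[0]);
  }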

I will try to start from the beginning again.

One more question: when you publish a new version of NeuroNet.mqh, is it fully compatible with the older EAs?

Thanks

rogerio

PS: Even after deleting all files and directories and starting with a fresh copy from Part 1 and 2, I can no longer compile any code.

For example, when I try to compile the code in fractal.mq5 I get this error:

cannot convert type 'CArrayObj *' to reference of type 'const CArrayObj *' NeuroNet.mqh 437 29

Sorry, I really wanted to understand your articles and code.

PS2: OK, I removed the word 'const' from 'feedForward', 'calcHiddenGradients' and 'sumDOW', and now I can compile Fractal.mqh and Fractal2.mqh.


