Great and useful article material
Thank you!
Yes, it's planned :)
Dmitriy, please tell me: at first I didn't understand why sine and cosine are added to the input values:
neuron.setOutputVal(inputVals.At(i)+(i%2==0 ? sin(i) : cos(i)) );
I also need some advice - should I try to normalise the input data in some way for my tasks?
In the examples now, as I understand it, everything is fed in "as is". But even in the fractal examples some oscillators range from 0 to 1, while prices can be far above 1 depending on the instrument.
Doesn't this create an initial bias when training on non-normalised inputs?
This is for time embedding. I'll go into more detail in the next article.
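For context, the trick resembles the classic sinusoidal position encoding from the Transformer paper ("Attention Is All You Need"). Below is a minimal sketch of that encoding; the function name PositionEncoding is illustrative, and whether the library uses exactly this form is an assumption, not something confirmed in the thread.
//--- a sketch of classic sinusoidal position encoding (Vaswani et al.);
//--- an assumption about the idea behind the sin/cos trick above,
//--- not the library's confirmed implementation
double PositionEncoding(int pos,int i,int dimension)
  {
   //--- even indices get a sine, odd indices a cosine, so every bar
   //--- receives a unique phase signature the network can learn from
   double angle=pos/MathPow(10000.0,2.0*(i/2)/(double)dimension);
   return (i%2==0 ? MathSin(angle) : MathCos(angle));
  }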
I am trying hard to understand the meaning)
So the input values are arithmetically offset by a constant matrix of values from 0 to 1, regardless of what the input data is and what its absolute values are.
In this sense I understand time embedding as follows: a sine wave is superimposed on the time series, so that the significance of past candles fluctuates over time.
OK, it's clear. Apparently it doesn't matter that the input data of each bar fluctuates with a different phase, or perhaps that is a feature.
But then the question about normalisation becomes all the more relevant. The values are quite different for EURUSD and SP500, for example.
And apparently it would be correct to move this time embedding from the library into the Train function.
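On the normalisation question, here is a minimal sketch of min-max scaling of the raw inputs into [0;1] before feeding them to the network; the helper name NormalizeInputs is illustrative and not part of the article's library:
#include <Arrays\ArrayDouble.mqh>
//--- min-max scaling of raw inputs into [0;1];
//--- NormalizeInputs is an illustrative helper, not a library method
bool NormalizeInputs(CArrayDouble *inputVals)
  {
   if(CheckPointer(inputVals)==POINTER_INVALID || inputVals.Total()==0)
      return false;
   double min_v=inputVals.At(0);
   double max_v=inputVals.At(0);
   for(int i=1;i<inputVals.Total();i++)
     {
      double v=inputVals.At(i);
      if(v<min_v) min_v=v;
      if(v>max_v) max_v=v;
     }
   if(max_v==min_v)
      return false;                               // constant series, nothing to scale
   for(int i=0;i<inputVals.Total();i++)
      inputVals.Update(i,(inputVals.At(i)-min_v)/(max_v-min_v));
   return true;
  }
With such scaling, EURUSD and SP500 inputs land in the same range, which would address the concern above.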
@Dmitriy Gizlyk, a question arose while studying and working with the library:
in the gradient calculation method for hidden layers you add outputVal;
this is done so that its value is compensated later in the calcOutputGradients method, for the sake of universality, right?
You also added gradient normalisation.
bool CNeuron::calcHiddenGradients(CLayer *&nextLayer)
  {
   double targetVal=sumDOW(nextLayer)+outputVal;
   return calcOutputGradients(targetVal);
  }
//+------------------------------------------------------------------+
bool CNeuron::calcOutputGradients(double targetVal)
  {
   double delta=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal)-outputVal;
   gradient=(delta!=0 ? delta*activationFunctionDerivative(outputVal) : 0);
   return true;
  }
The question is whether it would be more correct to normalise not the target but the final delta like this.
double delta=targetVal-outputVal;
delta=delta>1 ? 1 : delta<-1 ? -1 : delta;
Why? An example: if outputVal is close to 1 and the total weighted gradient from the next layer is also large and positive, then at present we get a final delta close to zero, which seems wrong.
After all, the gradient delta should be proportional to the error of the next layer. In other words, when the effective weight of a neuron is negative (and possibly in some other cases), the neuron is penalised less for an error than it would be with a positive weight. I may have explained it clumsily, but I hope those familiar with the subject will get the idea :) Perhaps you have already noticed this point and made the decision deliberately; it would be interesting to hear the reasons.
The same point applies to the OpenCL code:
__kernel void CalcHiddenGradient(__global double *matrix_w,
                                 __global double *matrix_g,
                                 __global double *matrix_o,
                                 __global double *matrix_ig,
                                 int outputs, int activation)
  {
   ..............
   switch(activation)
     {
      case 0:
         sum=clamp(sum+out,-1.0,1.0)-out;
         sum=sum*(1-pow(out==1 || out==-1 ? 0.99999999 : out,2));
Not really. We check the target values: just as in hidden layers we add outputVal to the gradient to get the target and then check its value. The point is that the sigmoid has a limited range of outputs: the logistic function from 0 to 1, tanh from -1 to 1. If we penalise the neuron for the deviation and increase the weight coefficient indefinitely, we will end up with weight overflow. After all, suppose the neuron's value has reached 1, and the subsequent layer, transmitting the error, says that we should increase the value to 1.5. The neuron will obediently increase its weights at each iteration, while the activation function will keep cutting the values off at the level of 1. Therefore, I limit the target values to the range of acceptable values of the activation function, and I leave the adjustment beyond that range to the weights of the subsequent layer.
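To put concrete numbers on that clamping logic, here is a minimal sketch for the tanh case (range [-1;1]); the helper name ClampedDelta is illustrative, not part of the library:
//--- clamping the target to tanh's range [-1;1];
//--- ClampedDelta is an illustrative helper, not a library method
double ClampedDelta(double sumDOW,double outputVal)
  {
   double targetVal=sumDOW+outputVal;             // restore the target, as in calcHiddenGradients
   targetVal=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal);
   return targetVal-outputVal;                    // delta that feeds the gradient
  }
//--- example: outputVal=1.0 and the next layer asks for 1.5 (sumDOW=0.5);
//--- ClampedDelta(0.5,1.0)==0, so the weights stop growing instead of
//--- chasing a value that tanh can never produce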
I think I've got it. But I'm still wondering whether this is the right approach; here is an example:
suppose the network makes a mistake, outputting 0 when the true value is 1. From the last layer the gradient weighted onto the previous layer then comes out (most likely, as I understand it) positive and can be greater than 1, say 1.6.
Suppose there is a neuron in the previous layer that produced +0.6, i.e. it produced the correct value, so its weight should be increased. Yet with this normalisation we cut off the change of its weight.
The result is norm(1.6)=1, and 1-0.6=0.4, whereas if we normalise as I suggested, it would be 1. In this case we inhibit the amplification of the correct neuron.
What do you think?
About the infinite growth of weights: I've heard it happens with a "bad" error function, when there are many local minima and no pronounced global one, or when the function is not convex, something like that. I'm no great expert; I just believe that runaway weights can and should be fought with other methods too.
An experiment to test both variants suggests itself, if I can work out how to formulate the test )
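As a starting point for such a test, here is a minimal script sketch comparing the two delta variants on the numbers from the example above; the function names DeltaClampTarget and DeltaClampDelta are illustrative, not the library's API:
//--- standalone MQL5 script comparing both delta variants;
//--- the function names are illustrative, not library methods
double DeltaClampTarget(double targetVal,double outputVal)
  {
   //--- variant from the library: clamp the target, then take the delta
   targetVal=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal);
   return targetVal-outputVal;
  }
double DeltaClampDelta(double targetVal,double outputVal)
  {
   //--- proposed variant: take the delta first, then clamp it
   double delta=targetVal-outputVal;
   return (delta>1 ? 1 : delta<-1 ? -1 : delta);
  }
void OnStart()
  {
   //--- numbers from the example: sumDOW=1.6, the neuron produced +0.6
   double sumDOW=1.6,outputVal=0.6;
   double targetVal=sumDOW+outputVal;                                       // 2.2, as in calcHiddenGradients
   PrintFormat("clamp target: %.1f",DeltaClampTarget(targetVal,outputVal)); // 0.4
   PrintFormat("clamp delta : %.1f",DeltaClampDelta(targetVal,outputVal));  // 1.0
  }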

New article Neural networks made easy (Part 9): Documenting the work has been published:
We have already come a long way, and the code in our library is growing bigger and bigger. This makes it difficult to keep track of all connections and dependencies. Therefore, I suggest creating documentation for the earlier created code and keeping it updated with each new step. Properly prepared documentation will help us see the integrity of our work.
Once the program completes, you will receive ready-to-use documentation. Some screenshots are shown below. The full documentation is provided in the attachment.
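As a taste of the format, here is a minimal Doxygen-style comment block for one of the library's methods; the descriptive text is illustrative, not quoted from the article:
//+------------------------------------------------------------------+
/// \brief  Calculates the error gradient for a hidden-layer neuron.
/// \param  nextLayer  Pointer to the next (downstream) layer.
/// \return true on success, otherwise false.
//+------------------------------------------------------------------+
bool CNeuron::calcHiddenGradients(CLayer *&nextLayer)
  {
   double targetVal=sumDOW(nextLayer)+outputVal;
   return calcOutputGradients(targetVal);
  }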
Author: Dmitriy Gizlyk