Great and useful article material
Thank you!
Yes, it's planned :)
Dmitriy, please tell me: at first I didn't understand why sine and cosine are added to the input values:
neuron.setOutputVal(inputVals.At(i)+(i%2==0 ? sin(i) : cos(i)) );
I also need some advice - should I try to normalise the input data in some way for my tasks?
In the examples now, as I understand it, everything is fed in "as is". But even in the fractal examples some oscillators range from 0 to 1, while prices can be far above 1 depending on the instrument.
Doesn't this create an initial bias when training on non-normalised inputs?
This is for time embedding. I'll go into more detail in the next article.
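For context, the trick resembles the classic sinusoidal position encoding from the Transformer paper ("Attention Is All You Need"). Below is a minimal sketch of that encoding; the function name PositionEncoding is illustrative, and whether the library uses exactly this form is an assumption, not something confirmed in the thread.
//--- a sketch of classic sinusoidal position encoding (Vaswani et al.);
//--- an assumption about the idea behind the sin/cos trick above,
//--- not the library's confirmed implementation
double PositionEncoding(int pos,int i,int dimension)
  {
   //--- even indices get a sine, odd indices a cosine, so every bar
   //--- receives a unique phase signature the network can learn from
   double angle=pos/MathPow(10000.0,2.0*(i/2)/(double)dimension);
   return (i%2==0 ? MathSin(angle) : MathCos(angle));
  }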
I am trying hard to understand the meaning)
So the input values are arithmetically offset by a constant matrix of values from 0 to 1, regardless of what the input data is and what its absolute values are.
In this sense I understand time embedding as follows: a sine wave is superimposed on the time series, so that the significance of past candles fluctuates over time.
OK, it's clear. Apparently it doesn't matter that the input data of each bar fluctuates with a different phase, or perhaps that is a feature.
But then the question about normalisation becomes all the more relevant. The values are quite different for EURUSD and SP500, for example.
And apparently it would be correct to move this time embedding from the library into the Train function.
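On the normalisation question, here is a minimal sketch of min-max scaling of the raw inputs into [0;1] before feeding them to the network; the helper name NormalizeInputs is illustrative and not part of the article's library:
#include <Arrays\ArrayDouble.mqh>
//--- min-max scaling of raw inputs into [0;1];
//--- NormalizeInputs is an illustrative helper, not a library method
bool NormalizeInputs(CArrayDouble *inputVals)
  {
   if(CheckPointer(inputVals)==POINTER_INVALID || inputVals.Total()==0)
      return false;
   double min_v=inputVals.At(0);
   double max_v=inputVals.At(0);
   for(int i=1;i<inputVals.Total();i++)
     {
      double v=inputVals.At(i);
      if(v<min_v) min_v=v;
      if(v>max_v) max_v=v;
     }
   if(max_v==min_v)
      return false;                               // constant series, nothing to scale
   for(int i=0;i<inputVals.Total();i++)
      inputVals.Update(i,(inputVals.At(i)-min_v)/(max_v-min_v));
   return true;
  }
With such scaling, EURUSD and SP500 inputs land in the same range, which would address the concern above.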
@Dmitriy Gizlyk, a question arose while studying and working with the library:
in the gradient calculation method for hidden layers you add outputVal;
this is done so that its value is compensated later in the calcOutputGradients method, for the sake of universality, right?
You also added gradient normalisation.
bool CNeuron::calcHiddenGradients(CLayer *&nextLayer)
  {
   double targetVal=sumDOW(nextLayer)+outputVal;
   return calcOutputGradients(targetVal);
  }
//+------------------------------------------------------------------+
bool CNeuron::calcOutputGradients(double targetVal)
  {
   double delta=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal)-outputVal;
   gradient=(delta!=0 ? delta*activationFunctionDerivative(outputVal) : 0);
   return true;
  }
The question is whether it would be more correct to normalise not the target but the final delta like this.
double delta=targetVal-outputVal;
delta=delta>1 ? 1 : delta<-1 ? -1 : delta;
Why? An example: if outputVal is close to 1 and the total weighted gradient from the next layer is also large and positive, then at present we get a final delta close to zero, which seems wrong.
After all, the gradient delta should be proportional to the error of the next layer. In other words, when the effective weight of a neuron is negative (and possibly in some other cases), the neuron is penalised less for an error than it would be with a positive weight. I may have explained it clumsily, but I hope those familiar with the subject will get the idea :) Perhaps you have already noticed this point and made the decision deliberately; it would be interesting to hear the reasons.
The same point applies to the OpenCL code:
__kernel void CalcHiddenGradient(__global double *matrix_w,
                                 __global double *matrix_g,
                                 __global double *matrix_o,
                                 __global double *matrix_ig,
                                 int outputs, int activation)
  {
   ..............
   switch(activation)
     {
      case 0:
         sum=clamp(sum+out,-1.0,1.0)-out;
         sum=sum*(1-pow(out==1 || out==-1 ? 0.99999999 : out,2));
Not really. We check the target values: just as in hidden layers we add outputVal to the gradient to get the target and then check its value. The point is that the sigmoid has a limited range of outputs: the logistic function from 0 to 1, tanh from -1 to 1. If we penalise the neuron for the deviation and increase the weight coefficient indefinitely, we will end up with weight overflow. After all, suppose the neuron's value has reached 1, and the subsequent layer, transmitting the error, says that we should increase the value to 1.5. The neuron will obediently increase its weights at each iteration, while the activation function will keep cutting the values off at the level of 1. Therefore, I limit the target values to the range of acceptable values of the activation function, and I leave the adjustment beyond that range to the weights of the subsequent layer.
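To put concrete numbers on that clamping logic, here is a minimal sketch for the tanh case (range [-1;1]); the helper name ClampedDelta is illustrative, not part of the library:
//--- clamping the target to tanh's range [-1;1];
//--- ClampedDelta is an illustrative helper, not a library method
double ClampedDelta(double sumDOW,double outputVal)
  {
   double targetVal=sumDOW+outputVal;             // restore the target, as in calcHiddenGradients
   targetVal=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal);
   return targetVal-outputVal;                    // delta that feeds the gradient
  }
//--- example: outputVal=1.0 and the next layer asks for 1.5 (sumDOW=0.5);
//--- ClampedDelta(0.5,1.0)==0, so the weights stop growing instead of
//--- chasing a value that tanh can never produce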
I think I've got it. But I'm still wondering whether this is the right approach; here is an example:
suppose the network makes a mistake, outputting 0 when the true value is 1. From the last layer the gradient weighted onto the previous layer then comes out (most likely, as I understand it) positive and can be greater than 1, say 1.6.
Suppose there is a neuron in the previous layer that produced +0.6, i.e. it produced the correct value, so its weight should be increased. Yet with this normalisation we cut off the change of its weight.
The result is norm(1.6)=1, and 1-0.6=0.4, whereas if we normalise as I suggested, it would be 1. In this case we inhibit the amplification of the correct neuron.
What do you think?
About the infinite growth of weights: I've heard it happens with a "bad" error function, when there are many local minima and no pronounced global one, or when the function is not convex, something like that. I'm no great expert; I just believe that runaway weights can and should be fought with other methods too.
An experiment to test both variants suggests itself, if I can work out how to formulate the test )
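As a starting point for such a test, here is a minimal script sketch comparing the two delta variants on the numbers from the example above; the function names DeltaClampTarget and DeltaClampDelta are illustrative, not the library's API:
//--- standalone MQL5 script comparing both delta variants;
//--- the function names are illustrative, not library methods
double DeltaClampTarget(double targetVal,double outputVal)
  {
   //--- variant from the library: clamp the target, then take the delta
   targetVal=(targetVal>1 ? 1 : targetVal<-1 ? -1 : targetVal);
   return targetVal-outputVal;
  }
double DeltaClampDelta(double targetVal,double outputVal)
  {
   //--- proposed variant: take the delta first, then clamp it
   double delta=targetVal-outputVal;
   return (delta>1 ? 1 : delta<-1 ? -1 : delta);
  }
void OnStart()
  {
   //--- numbers from the example: sumDOW=1.6, the neuron produced +0.6
   double sumDOW=1.6,outputVal=0.6;
   double targetVal=sumDOW+outputVal;                                       // 2.2, as in calcHiddenGradients
   PrintFormat("clamp target: %.1f",DeltaClampTarget(targetVal,outputVal)); // 0.4
   PrintFormat("clamp delta : %.1f",DeltaClampDelta(targetVal,outputVal));  // 1.0
  }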

New article Neural networks made easy (Part 9): Documenting the work has been published:
We have already come a long way, and the code in our library is growing bigger and bigger. This makes it difficult to keep track of all connections and dependencies. Therefore, I suggest creating documentation for the earlier created code and keeping it updated with each new step. Properly prepared documentation will help us see the integrity of our work.
Once the program completes, you will receive ready-to-use documentation. Some screenshots are shown below. The full documentation is provided in the attachment.
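As a taste of the format, here is a minimal Doxygen-style comment block for one of the library's methods; the descriptive text is illustrative, not quoted from the article:
//+------------------------------------------------------------------+
/// \brief  Calculates the error gradient for a hidden-layer neuron.
/// \param  nextLayer  Pointer to the next (downstream) layer.
/// \return true on success, otherwise false.
//+------------------------------------------------------------------+
bool CNeuron::calcHiddenGradients(CLayer *&nextLayer)
  {
   double targetVal=sumDOW(nextLayer)+outputVal;
   return calcOutputGradients(targetVal);
  }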
Author: Dmitriy Gizlyk