Discussion of article "Backpropagation Neural Networks using MQL5 Matrices"

 

New article Backpropagation Neural Networks using MQL5 Matrices has been published:

The article describes the theory and practice of applying the backpropagation algorithm in MQL5 using matrices. It provides ready-made classes along with script, indicator and Expert Advisor examples.

As we will see below, MQL5 provides a large set of built-in activation functions. The choice of a function should be made based on the specific problem (regression, classification). Usually, it is possible to select several functions and then experimentally find the optimal one.

Popular activation functions

Activation functions can have different value ranges, limited or unlimited. In particular, the sigmoid (3) maps the data into the range [0,+1], which is better suited for classification problems, while the hyperbolic tangent maps data into the range [-1,+1], which is considered better suited for regression and forecasting problems.
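For illustration (a minimal sketch, not code from the article), the built-in Activation method makes these ranges easy to see:

void OnStart()
  {
   vector x = {-5.0, -1.0, 0.0, 1.0, 5.0};
   vector s, t;
   x.Activation(s, AF_SIGMOID);   // results stay within (0, +1)
   x.Activation(t, AF_TANH);      // results stay within (-1, +1)
   Print("sigmoid: ", s);         // approx. [0.0067, 0.2689, 0.5, 0.7311, 0.9933]
   Print("tanh:    ", t);         // approx. [-0.9999, -0.7616, 0, 0.7616, 0.9999]
  }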

Author: Stanislav Korotky

 
matrix temp;
if(!outputs[n].Derivative(temp, of))

In the backpropagation, the derivative functions expect to receive x, not the activation of x (unless they changed it recently, for the activation functions this applies to).

Here is an example:

#property version   "1.00"

int OnInit()
  {
//---
  double values[];
  matrix vValues;
  int total=20;
  double start=-1.0;
  double step=0.1;
  ArrayResize(values,total,0);
  vValues.Init(20,1);
  for(int i=0;i<total;i++){
     values[i]=sigmoid(start);     // manual sigmoid of the raw (pre-activation) value
     vValues[i][0]=start;          // raw value for the built-in Activation call
     start+=step;
  }
  matrix activations;
  vValues.Activation(activations,AF_SIGMOID);
  //print sigmoid
  for(int i=0;i<total;i++){
     Print("sigmoidDV("+values[i]+")sigmoidVV("+activations[i][0]+")");
  }
  //derivatives
  matrix derivatives;
  activations.Derivative(derivatives,AF_SIGMOID);    // built-in Derivative called on the activation values
  for(int i=0;i<total;i++){
     values[i]=sigmoid_derivative(values[i]);        // manual derivative y*(1-y) computed from the activation value
     Print("dDV("+values[i]+")dVV("+derivatives[i][0]+")");
  }
//---
   return(INIT_SUCCEEDED);
  }

double sigmoid(double of){
return((1.0/(1.0+MathExp((-1.0*of)))));
}

double sigmoid_derivative(double output){
return(output*(1-output));
}

Also, there are activation functions for which you can pass extra inputs, both in the activation and in the derivative call, like ELU for instance:

Derivative(output,AF_ELU,alpha);
Activation(output,AF_ELU,alpha);
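For example, a minimal self-contained sketch (the alpha value here is arbitrary, purely for the demo):

void OnStart()
  {
   const double alpha = 0.3;           // extra ELU parameter
   vector x = {-2.0, -0.5, 0.0, 1.5};
   vector act, der;
   x.Activation(act, AF_ELU, alpha);   // ELU: x for x >= 0, alpha*(exp(x)-1) for x < 0
   x.Derivative(der, AF_ELU, alpha);   // ELU derivative with the same alpha
   Print("elu:  ", act);
   Print("elu': ", der);
  }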
 
Lorentzos Roussos #:

In the backpropagation, the derivative functions expect to receive x, not the activation of x (unless they changed it recently, for the activation functions this applies to).

I'm not sure what you mean. There are formulae in the article, which are converted into the source code exactly. The outputs are obtained with the Activation call during the feedforward stage, and then we take their derivatives during backpropagation. Probably you missed that the indexing of the output arrays in the classes has a +1 offset relative to the indexing of the layer weights.
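(Schematically: outputs[0] holds the network input, and for each layer i the feed-forward computes outputs[i + 1] = Activation(outputs[i].MatMul(weights[i])), so weights[i] sits between outputs[i] and outputs[i + 1]; hence the +1 offset.)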

 
Stanislav Korotky #:

I'm not sure what you mean. There are formulae in the article, which are converted into the source code exactly. The outputs are obtained with the Activation call during the feedforward stage, and then we take their derivatives during backpropagation. Probably you missed that the indexing of the output arrays in the classes has a +1 offset relative to the indexing of the layer weights.

Yes, the temp matrix is the previous output multiplied by the weights, and then outputs[] contains the activation values.

In the backprop you are taking the derivatives of these activation values, while the matrix/vector Derivative function expects you to pass in the temp values.

Here's the difference in derivatives:

 

Allow me to simplify it:

#property version   "1.00"

int OnInit()
  {
//---
  //let's assume x is the output of the previous layer (*) the weights of a node,
  //i.e. the value that goes into the activation.
    double x=3;
  //we get the sigmoid by the formula below
    double activation_of_x=sigmoid(x);
  //and for the derivative we do
    double derivative_of_activation_of_x=sigmoid_derivative(activation_of_x);
  //now we do the same with the matrix/vector methods
    vector vX;
    vX.Init(1);
    vX[0]=3;
    //we create a vector for activations
    vector vActivation_of_x;
    vX.Activation(vActivation_of_x,AF_SIGMOID);
    //we create vectors for derivatives
    vector vDerivative_of_activation_of_x,vDerivative_of_x;
    vActivation_of_x.Derivative(vDerivative_of_activation_of_x,AF_SIGMOID);
    vX.Derivative(vDerivative_of_x,AF_SIGMOID);
    
    Print("NormalActivation("+activation_of_x+")");
    Print("vector Activation("+vActivation_of_x[0]+")");
    Print("NormalDerivative("+derivative_of_activation_of_x+")");
    Print("vector Derivative Of Activation Of X ("+vDerivative_of_activation_of_x[0]+")");
    Print("vector Derivative Of X ("+vDerivative_of_x[0]+")");
    //you are doing the vector derivative of activation of x, which returns the wrong value
    //the vector/matrix Derivative expects to be given x, not activation(x)
//---
   return(INIT_SUCCEEDED);
  }

double sigmoid(double of){
return((1.0/(1.0+MathExp((-1.0*of)))));
}

double sigmoid_derivative(double output){
return(output*(1-output));
}


 
Lorentzos Roussos #:

Allow me to simplify it:

I see your point. Indeed, the sigmoid derivative is formulated via 'y', that is, through the sigmoid value at point x, i.e. y(x): y'(x) = y(x)*(1-y(x)). This is exactly how the code in the article implements it.

Your test script calculates "derivative" taking x as input, not y, hence the values are different.
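To put numbers on it: for x = 0.68, y = sigmoid(0.68) ≈ 0.6637, so y*(1 - y) ≈ 0.2232, whereas x*(1 - x) = 0.2176.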

 
Stanislav Korotky #:

I see your point. Indeed, the sigmoid derivative is formulated via 'y', that is, through the sigmoid value at point x, i.e. y(x): y'(x) = y(x)*(1-y(x)). This is exactly how the code in the article implements it.

Your test script calculates "derivative" taking x as input, not y, hence the values are different.

Yes, but the activation values are passed to the derivative function while it expects the pre-activation values. That's what I'm saying.

And you missed the point there: the correct value is the one with x as input (the correct value according to MQ's function itself).

You are not storing output_of_previous * weights anywhere (I think), which is what should be sent to the derivative functions (according to MQ's function itself again, I'll stress that).

 
Lorentzos Roussos #:

Yes, but the activation values are passed to the derivative function while it expects the pre-activation values. That's what I'm saying.

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

double derivative(double output)
{
  return output * (1 - output);
}

void OnStart()
{
  vector x = {0.68};
  vector y;
  x.Activation(y, AF_SIGMOID);                            // got activation/sigmoid result as "y(x)" in y[0]
  vector d;
  x.Derivative(d, AF_SIGMOID);                            // got derivative of sigmoid at x
  Print(derivative(x[0]), " ", derivative(y[0]), " ", d); // 0.2176 0.2231896389723258 [0.2231896389723258]
}
 
Stanislav Korotky #:

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

Yeah, that's what I'm saying: the correct derivative matches the derivative called on the x values.

In the backprop function you are calling the equivalent of y.Derivative(d, AF_SIGMOID).

The outputs matrix is y in the article's backprop; I don't think you are storing the equivalent of x in a matrix to call the derivative on.

(Again, according to the MQ function.)

--

Even in your example you are calling the derivative on x; I bet you typed y at first and then "whoopsed".

Just tell them on the Russian forum. It will save a lot of people a lot of time if they can add it to the docs.

Thanks

 
Stanislav Korotky #:

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

Let me simplify this.

This is your example:

double derivative(double output)
{
  return output * (1 - output);
}

void OnStart()
{
  vector x = {0.68};
  vector y;
  x.Activation(y, AF_SIGMOID);                            // got activation/sigmoid result as "y(x)" in y[0]
  vector d;
  x.Derivative(d, AF_SIGMOID);                            // got derivative of sigmoid at x
  Print(derivative(x[0]), " ", derivative(y[0]), " ", d); // 0.2176 0.2231896389723258 [0.2231896389723258]
}

In your example you are calling x.Derivative to fill the derivatives vector d.

You are not calling y.Derivative to fill the derivatives. Why? Because it returns the wrong values (and you probably saw it; that's why you used x.Derivative).

What is y? The activation values of x.

So when you do this:

x.Activation(y, AF_SIGMOID);  

you fill y with the activation values of x, but you call the derivative on x, not on y (which is correct according to the MQL5 function).

In your article, in the feed forward, temp is

matrix temp = outputs[i].MatMul(weights[i]);

And y would be the activation values of x. What matrix is that?

temp.Activation(outputs[i + 1], i < n - 1 ? af : of)

The outputs. In the article, the y (which you don't call the derivative on in your example) is the outputs matrix.

(What we are seeing in the code above is the equivalent of x.Activation(y, AF) from the example, which fills y with the activation values.)

In your backprop code you are not calling x.Derivative, because x (matrix temp = outputs[i].MatMul(weights[i]);) is not stored anywhere and you cannot call it on that. You are calling the equivalent of y.Derivative, which returns the wrong values:

outputs[n].Derivative(temp, of)
outputs[i].Derivative(temp, af)

because y holds the activation values.

Again, according to the MQL5 function.

So in your example you are using the right call, and in the article you are using the wrong call.

Cheers

 

So, you want this:

   bool backProp(const matrix &target)
   {
      if(!ready) return false;
   
      if(target.Rows() != outputs[n].Rows() ||
         target.Cols() != outputs[n].Cols())
         return false;
      
      // output layer
      matrix temp;
      //*if(!outputs[n].Derivative(temp, of))
      //*  return false;
      if(!outputs[n - 1].MatMul(weights[n - 1]).Derivative(temp, of))
         return false;
      matrix loss = (outputs[n] - target) * temp; // data record per row
     
      for(int i = n - 1; i >= 0; --i) // for each layer except output
      {
         //*// remove unusable pseudo-errors for neurons, added as constant bias source
         //*// (in all layers except for the last (where it wasn't added))
         //*if(i < n - 1) loss.Resize(loss.Rows(), loss.Cols() - 1);
         #ifdef BATCH_PROP
         matrix delta = speed[i] * outputs[i].Transpose().MatMul(loss);
         adjustSpeed(speed[i], delta * deltas[i]);
         deltas[i] = delta;
         #else
         matrix delta = speed * outputs[i].Transpose().MatMul(loss);
         #endif
         
         //*if(!outputs[i].Derivative(temp, af))
         //*   return false;
         //*loss = loss.MatMul(weights[i].Transpose()) * temp;
         if(i > 0) // backpropagate loss to previous layers
         {
            if(!outputs[i - 1].MatMul(weights[i - 1]).Derivative(temp, af))
               return false;
            matrix mul = loss.MatMul(weights[i].Transpose());
            // remove unusable pseudo-errors for neurons, added as constant bias source
            mul.Resize(mul.Rows(), mul.Cols() - 1);
            loss = mul * temp;
         }
         
         weights[i] -= delta;
      }
      return true;
   }
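Another option (only a sketch with illustrative names, not the article's class, and without the bias column handling) would be to cache the pre-activation matrices once during the feed-forward pass and take the derivatives on those in backProp, instead of recomputing the MatMul:

void OnStart()
  {
   // tiny 2-layer demo: outputs[], preact[] and weights[] only mimic the article's naming
   matrix outputs[3], preact[2], weights[2];
   weights[0] = matrix::Full(2, 2, 0.1);
   weights[1] = matrix::Full(2, 1, 0.2);
   matrix input = {{0.5, -0.25}};
   outputs[0] = input;

   // feed forward: keep the pre-activation matrices alongside the activated outputs
   for(int i = 0; i < 2; ++i)
     {
      preact[i] = outputs[i].MatMul(weights[i]);
      preact[i].Activation(outputs[i + 1], AF_SIGMOID);
     }

   // backpropagation would then take the derivative at the cached pre-activation values
   matrix temp;
   preact[1].Derivative(temp, AF_SIGMOID);   // instead of outputs[2].Derivative(temp, AF_SIGMOID)
   Print(temp);
  }

This trades a little extra memory for not repeating the matrix multiplication on every backward pass.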

I'll think about it.
