Deep Neural Networks: A Getting Started Tutorial, Part #4

30 June 2014, 12:42
Sergey Golubev

Setting the Network Weights

The code for method SetWeights is presented in Listing 3. Method SetWeights accepts an array of values that represent both weights and bias values. The method assumes the values are stored in a particular order: input-to-A weights followed by A-layer biases, followed by A-to-B weights, followed by B-layer biases, followed by B-to-output weights, followed by output biases.

Listing 3: Method SetWeights

public void SetWeights(double[] weights)
{
  // Total number of weights and biases the network expects
  int numWeights = (numInput * numHiddenA) + numHiddenA +
    (numHiddenA * numHiddenB) + numHiddenB +
    (numHiddenB * numOutput) + numOutput;
  if (weights.Length != numWeights)
    throw new Exception("Bad weights length");

  int k = 0; // index into the flat weights array

  for (int i = 0; i < numInput; ++i)     // input-to-A weights
    for (int j = 0; j < numHiddenA; ++j)
      iaWeights[i][j] = weights[k++];

  for (int i = 0; i < numHiddenA; ++i)   // layer-A biases
    aBiases[i] = weights[k++];

  for (int i = 0; i < numHiddenA; ++i)   // A-to-B weights
    for (int j = 0; j < numHiddenB; ++j)
      abWeights[i][j] = weights[k++];

  for (int i = 0; i < numHiddenB; ++i)   // layer-B biases
    bBiases[i] = weights[k++];

  for (int i = 0; i < numHiddenB; ++i)   // B-to-output weights
    for (int j = 0; j < numOutput; ++j)
      boWeights[i][j] = weights[k++];

  for (int i = 0; i < numOutput; ++i)    // output biases
    oBiases[i] = weights[k++];
}


Method SetWeights also assumes the weights are stored in row-major form, where the row indices are the "from" indices and the column indices are the "to" indices. For example, if iaWeights[0][2] = 1.23, then the weight from input node [0] to layer-A node [2] has value 1.23.

An alternative design for method SetWeights is to pass the weights and bias values in as six separate parameters rather than as a single-array parameter. Or you might want to overload SetWeights to accept either a single array parameter or six weights and bias value parameters.
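To make the expected array length concrete, here is a small standalone sketch. The helper name TotalWeights is mine for illustration, not part of the demo program; it simply factors out the length computation at the top of SetWeights. For a hypothetical 2-3-4-2 network, the flat array must hold (2*3) + 3 + (3*4) + 4 + (4*2) + 2 = 35 values.

```csharp
using System;

class WeightCountDemo
{
  // Length of the flat array expected by SetWeights:
  // input-to-A weights, A biases, A-to-B weights, B biases,
  // B-to-output weights, output biases.
  public static int TotalWeights(int numInput, int numHiddenA,
    int numHiddenB, int numOutput)
  {
    return (numInput * numHiddenA) + numHiddenA +
      (numHiddenA * numHiddenB) + numHiddenB +
      (numHiddenB * numOutput) + numOutput;
  }

  static void Main()
  {
    Console.WriteLine(TotalWeights(2, 3, 4, 2)); // 35
  }
}
```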

Computing Deep Neural Network Outputs

Method ComputeOutputs begins by setting up scratch arrays to hold preliminary (before activation) sums:

public double[] ComputeOutputs(double[] xValues)
{
  double[] aSums = new double[numHiddenA];
  double[] bSums = new double[numHiddenB];
  double[] oSums = new double[numOutput];

These scratch arrays could have been declared as class members; if so, remember to zero out each array at the beginning of ComputeOutputs. Next, the input values are copied into the corresponding class array:

for (int i = 0; i < xValues.Length; ++i)
  this.inputs[i] = xValues[i];


An alternative is to use the C# Array.Copy method here. Notice the input values aren't changed by ComputeOutputs, so an alternative design is to eliminate the class member array named inputs, and to eliminate the need to copy values from the xValues array. In my opinion, the explicit inputs array makes a slightly clearer design and is worth the overhead of an extra array copy operation.
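The Array.Copy alternative is a one-liner. This standalone fragment (with illustrative array sizes of my choosing) shows it is equivalent to the explicit loop:

```csharp
using System;

class CopyDemo
{
  static void Main()
  {
    double[] xValues = { 1.0, 2.0, 3.0 };
    double[] inputs = new double[3];

    // Equivalent to:
    //   for (int i = 0; i < xValues.Length; ++i) inputs[i] = xValues[i];
    Array.Copy(xValues, inputs, xValues.Length);

    Console.WriteLine(inputs[2]); // 3
  }
}
```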

The next step is to compute the preliminary sum of weights times inputs for the layer-A nodes, add the bias values, then apply the activation function:

for (int j = 0; j < numHiddenA; ++j) // weights * inputs
  for (int i = 0; i < numInput; ++i)
    aSums[j] += this.inputs[i] * this.iaWeights[i][j];

for (int i = 0; i < numHiddenA; ++i)  // add biases
  aSums[i] += this.aBiases[i];

for (int i = 0; i < numHiddenA; ++i)   // apply activation
  this.aOutputs[i] = HyperTanFunction(aSums[i]);


In the demo, I use a WriteLine statement along with helper method ShowVector to display the pre-activation sums and the local layer-A outputs.

Next, the layer-B local outputs are computed, using the just-computed layer-A outputs as local inputs:

for (int j = 0; j < numHiddenB; ++j)
  for (int i = 0; i < numHiddenA; ++i)
    bSums[j] += aOutputs[i] * this.abWeights[i][j];

for (int i = 0; i < numHiddenB; ++i)
  bSums[i] += this.bBiases[i];

for (int i = 0; i < numHiddenB; ++i)
  this.bOutputs[i] = HyperTanFunction(bSums[i]); 


Next, the final outputs are computed:

for (int j = 0; j < numOutput; ++j)
  for (int i = 0; i < numHiddenB; ++i)
    oSums[j] += bOutputs[i] * boWeights[i][j];

for (int i = 0; i < numOutput; ++i)
  oSums[i] += oBiases[i];

double[] softOut = Softmax(oSums);
Array.Copy(softOut, outputs, softOut.Length);


The final outputs are computed into the class array named outputs. For convenience, these values are also returned by the method:

  double[] retResult = new double[numOutput];
  Array.Copy(this.outputs, retResult, retResult.Length);
  return retResult;
}


An alternative to explicitly returning the output values as an array is to have ComputeOutputs return void and implement a public accessor method GetOutputs. Method HyperTanFunction is defined as:

private static double HyperTanFunction(double x)
{
  if (x < -20.0) return -1.0;    // tanh(x) is -1.0 to double precision here
  else if (x > 20.0) return 1.0; // tanh(x) is +1.0 to double precision here
  else return Math.Tanh(x);
}


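Because SetWeights consumes the flat array in one fixed order, a matching serializer is straightforward. The sketch below (the class, sizes, and the name GetWeights are illustrative assumptions, not the demo's code) round-trips a weight array through SetWeights and back:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: SetWeights plus a companion GetWeights that emits
// values in the same fixed order. Sizes here are illustrative.
class WeightsRoundTrip
{
  const int numInput = 2, numHiddenA = 2, numHiddenB = 2, numOutput = 1;
  double[][] iaWeights = MakeMatrix(numInput, numHiddenA);
  double[] aBiases = new double[numHiddenA];
  double[][] abWeights = MakeMatrix(numHiddenA, numHiddenB);
  double[] bBiases = new double[numHiddenB];
  double[][] boWeights = MakeMatrix(numHiddenB, numOutput);
  double[] oBiases = new double[numOutput];

  static double[][] MakeMatrix(int rows, int cols)
  {
    double[][] m = new double[rows][];
    for (int i = 0; i < rows; ++i) m[i] = new double[cols];
    return m;
  }

  public void SetWeights(double[] weights) // same logic as Listing 3
  {
    int k = 0;
    for (int i = 0; i < numInput; ++i)
      for (int j = 0; j < numHiddenA; ++j)
        iaWeights[i][j] = weights[k++];
    for (int i = 0; i < numHiddenA; ++i) aBiases[i] = weights[k++];
    for (int i = 0; i < numHiddenA; ++i)
      for (int j = 0; j < numHiddenB; ++j)
        abWeights[i][j] = weights[k++];
    for (int i = 0; i < numHiddenB; ++i) bBiases[i] = weights[k++];
    for (int i = 0; i < numHiddenB; ++i)
      for (int j = 0; j < numOutput; ++j)
        boWeights[i][j] = weights[k++];
    for (int i = 0; i < numOutput; ++i) oBiases[i] = weights[k++];
  }

  public double[] GetWeights() // emits in the same order SetWeights reads
  {
    var result = new List<double>();
    for (int i = 0; i < numInput; ++i)
      for (int j = 0; j < numHiddenA; ++j)
        result.Add(iaWeights[i][j]);
    result.AddRange(aBiases);
    for (int i = 0; i < numHiddenA; ++i)
      for (int j = 0; j < numHiddenB; ++j)
        result.Add(abWeights[i][j]);
    result.AddRange(bBiases);
    for (int i = 0; i < numHiddenB; ++i)
      for (int j = 0; j < numOutput; ++j)
        result.Add(boWeights[i][j]);
    result.AddRange(oBiases);
    return result.ToArray();
  }

  static void Main()
  {
    var net = new WeightsRoundTrip();
    // 2-2-2-1 network: (2*2)+2 + (2*2)+2 + (2*1)+1 = 15 values
    double[] w = Enumerable.Range(1, 15).Select(x => (double)x).ToArray();
    net.SetWeights(w);
    Console.WriteLine(w.SequenceEqual(net.GetWeights())); // True
  }
}
```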
And method Softmax is defined as:

private static double[] Softmax(double[] oSums)
{
  // Determine the largest sum; subtracting it before calling
  // Math.Exp prevents arithmetic overflow
  double max = oSums[0];
  for (int i = 1; i < oSums.Length; ++i)
    if (oSums[i] > max) max = oSums[i];

  // Common scaling denominator
  double scale = 0.0;
  for (int i = 0; i < oSums.Length; ++i)
    scale += Math.Exp(oSums[i] - max);

  double[] result = new double[oSums.Length];
  for (int i = 0; i < oSums.Length; ++i)
    result[i] = Math.Exp(oSums[i] - max) / scale;

  return result; // scaled so the outputs sum to 1.0
}


Method Softmax is quite subtle: subtracting the max value from each sum before exponentiation avoids arithmetic overflow without changing the result. But it's unlikely you'd ever need to modify the method, so you can usually treat it as a black box function.
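The max-subtraction works because softmax is unchanged when all inputs are shifted by the same constant. This standalone sketch (illustrative input values, not demo data) shows that shifted and unshifted inputs yield the same probabilities:

```csharp
using System;

class SoftmaxDemo
{
  public static double[] Softmax(double[] oSums) // same logic as the article's
  {
    double max = oSums[0];
    for (int i = 1; i < oSums.Length; ++i)
      if (oSums[i] > max) max = oSums[i];

    double scale = 0.0;
    for (int i = 0; i < oSums.Length; ++i)
      scale += Math.Exp(oSums[i] - max);

    double[] result = new double[oSums.Length];
    for (int i = 0; i < oSums.Length; ++i)
      result[i] = Math.Exp(oSums[i] - max) / scale;
    return result;
  }

  static void Main()
  {
    double[] a = Softmax(new double[] { 1.0, 2.0, 3.0 });
    // Naive exp(1003.0) would overflow a double, but the
    // max-subtraction makes the shifted inputs safe:
    double[] b = Softmax(new double[] { 1001.0, 1002.0, 1003.0 });

    double sum = 0.0;
    for (int i = 0; i < a.Length; ++i)
    {
      sum += a[i];
      Console.WriteLine(Math.Abs(a[i] - b[i]) < 1e-12); // True
    }
    Console.WriteLine(Math.Abs(sum - 1.0) < 1e-12); // True: sums to 1.0
  }
}
```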

Go Exploring

The code and explanation presented in this article should give you a good basis for understanding neural networks with two hidden layers. What about three or more hidden layers? The consensus in the research literature is that two hidden layers are sufficient for almost all practical problems. But I'm not entirely convinced, and fully connected feed-forward neural networks with more than two hidden layers are relatively unexplored.

Training a deep neural network is much more difficult than training an ordinary neural network with a single layer of hidden nodes, and this factor is the main obstacle to using networks with multiple hidden layers. Standard back-propagation training often fails to give good results. In my opinion, alternate training techniques, in particular particle swarm optimization, are promising. However, these alternatives haven't been studied much.
