Implementing recurrent models in Python

In the previous sections, we reviewed the principles of organizing a recurrent model architecture and even built a recurrent neural layer based on the LSTM block algorithm. Earlier, we used the Keras library for TensorFlow to build our previous neural network models in Python. The same library offers a number of options for building recurrent neural layers, ranging from basic recurrent layer classes to more complex models:

  • AbstractRNNCell — abstract object representing an RNN cell
  • Bidirectional — bidirectional wrapper for RNN layers
  • ConvLSTM1D — 1D convolutional LSTM block
  • ConvLSTM2D — 2D convolutional LSTM block
  • ConvLSTM3D — 3D convolutional LSTM block
  • GRU — recurrent block by Cho et al. (2014)
  • LSTM — Long Short-Term Memory layer by Hochreiter and Schmidhuber (1997)
  • RNN — base class for the recurrent layer
  • SimpleRNN — fully connected recurrent layer in which the output is fed back to the input

In addition to the basic recurrent layer class, the list contains the already familiar LSTM and GRU models. You can also create bidirectional recurrent layers, which are most often used in text translation tasks. The ConvLSTM models are built on the LSTM block architecture but use convolutional layers instead of fully connected ones for the gates and the new content layer.
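
As a quick illustration, and only a minimal sketch, the layers listed above are created like any other Keras layer. The class names come straight from the list; the specific parameter values below are arbitrary and assume TensorFlow 2.x.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Basic recurrent layers from the list above
simple_rnn = layers.SimpleRNN(32)                     # fully connected recurrent layer
gru_block  = layers.GRU(32)                           # GRU block (Cho et al., 2014)
lstm_block = layers.LSTM(32, return_sequences=True)   # LSTM returning the full sequence

# Bidirectional wrapper around an LSTM layer (often used in translation tasks)
bi_lstm = layers.Bidirectional(layers.LSTM(32))

# Convolutional LSTM for sequences of 2D feature maps (e.g., images)
conv_lstm = layers.ConvLSTM2D(filters=16, kernel_size=(3, 3))
```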

Additionally, there is an abstract recurrent cell class for creating custom architectural solutions for recurrent models.
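
To show how the abstract cell can be used, here is a minimal hypothetical example, assuming TensorFlow 2.x: a custom cell with a simple tanh update rule, wrapped in the generic RNN layer. The cell name and its update rule are purely illustrative.

```python
import tensorflow as tf

class MinimalRNNCell(tf.keras.layers.AbstractRNNCell):
    """Hypothetical cell: h_t = tanh(x_t * W + h_{t-1} * U)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    @property
    def state_size(self):
        # Size of the hidden state carried between timesteps
        return self.units

    def build(self, input_shape):
        # Weights for the input and for the recurrent (hidden) state
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer="glorot_uniform",
                                      name="kernel")
        self.recurrent_kernel = self.add_weight(shape=(self.units, self.units),
                                                initializer="orthogonal",
                                                name="recurrent_kernel")
        self.built = True

    def call(self, inputs, states):
        prev_h = states[0]
        h = tf.tanh(tf.matmul(inputs, self.kernel) +
                    tf.matmul(prev_h, self.recurrent_kernel))
        return h, [h]

# The custom cell is wrapped in the generic RNN layer
custom_layer = tf.keras.layers.RNN(MinimalRNNCell(32))
```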

We won't go deep into the Keras library API right now. We will use the LSTM block to create our test recurrent models. This is exactly the kind of model we recreated in MQL5, so we will be able to compare the performance of models created in different programming languages.

Based on the available hardware and the layer's configuration, the LSTM class automatically chooses between a cuDNN-based implementation and a pure TensorFlow one in order to maximize performance.
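
As a rough illustration of this behavior (assuming TensorFlow 2.x on a machine with a GPU; the variable names are arbitrary), keeping the default arguments allows the fast fused kernel to be used, while changing certain parameters forces the generic implementation:

```python
import tensorflow as tf

# With default arguments (tanh activation, sigmoid recurrent activation,
# no recurrent dropout, unroll=False) the layer can use the fused cuDNN
# kernel when a GPU is available
fast_lstm = tf.keras.layers.LSTM(64)

# A non-default recurrent activation disables the cuDNN path, so the slower
# but portable pure TensorFlow implementation is used instead
generic_lstm = tf.keras.layers.LSTM(64, recurrent_activation="hard_sigmoid")
```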

Users have access to an extensive range of parameters for fine-tuning the recurrent block (a short usage sketch follows the list):

  • units — dimensionality of the output space
  • activation — activation function
  • recurrent_activation — activation function for the recurrent step (gate)
  • use_bias — flag indicating whether to use a bias vector
  • kernel_initializer — initializer for the weights matrix used in the linear transformation of the inputs
  • recurrent_initializer — initializer for the weights matrix used in the linear transformation of the recurrent state
  • bias_initializer — initializer for the bias vector
  • kernel_regularizer — regularization function for the input weights matrix
  • recurrent_regularizer — regularization function for the recurrent weights matrix
  • bias_regularizer — regularization function for the bias vector
  • activity_regularizer — regularization function applied to the output of the layer
  • kernel_constraint — constraint function for the input weights matrix
  • recurrent_constraint — constraint function for the recurrent weights matrix
  • bias_constraint — constraint function for the bias vector
  • dropout — floating-point number from 0 to 1, defining the share of elements to be dropped out during linear transformation of input data
  • recurrent_dropout — floating-point number from 0 to 1, determining the share of elements to be dropped out during linear transformation of the recurrent state
  • return_sequences — boolean flag to specify whether to return the last result in the output sequence or the results of the whole sequence
  • return_state — boolean flag to indicate whether to return the last state in addition to the output
  • go_backwards — boolean flag to instruct the processing of the input sequence in the backward order and return the reverse sequence
  • stateful — boolean flag to indicate the use of the last state for each sample with the i index in the batch as the initial state for the sample with the i index in the next batch
  • time_major — the format of the input and output sequence tensor shapes
  • unroll — boolean flag indicating whether to unroll the recurrent network or use a symbolic loop; unrolling can speed up training but requires more memory and is only suitable for short sequences
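
The sketch below (an illustrative configuration, not a recommendation, assuming TensorFlow 2.x) shows how several of these parameters are passed when the layer is created:

```python
import tensorflow as tf

# An LSTM layer configured with some of the parameters listed above
lstm_layer = tf.keras.layers.LSTM(
    units=40,                        # dimensionality of the output space
    activation="tanh",               # activation of the new content layer
    recurrent_activation="sigmoid",  # activation of the gates
    use_bias=True,                   # include bias vectors
    dropout=0.1,                     # dropout on the input transformation
    return_sequences=True,           # return the output for every timestep
    stateful=False                   # do not carry state between batches
)
```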

After acquainting ourselves with the control parameters of the LSTM layer class, we will proceed to the practical implementation of various models using the recurrence layer.