Description of a Python script structure

Python is an interpreted programming language with a minimalistic syntax. Such syntax enables the fast creation of small code blocks and the immediate testing of their functionality. As a result, Python allows you to focus on solving the problem rather than on programming itself. It is perhaps precisely this quality that has earned Python its popularity.

Although interpreted programming languages run slower than compiled ones, Python has currently become the most popular programming language for creating and experimenting with neural networks. The issue of execution speed is solved by using various libraries written, among others, in compiled programming languages. Fortunately, Python can easily be extended with libraries written in practically any available programming language.

We, too, won't be constructing complex algorithms and will make use of ready-made solutions, including libraries both for building neural networks and for trading. Let's start by familiarizing ourselves with some of them.

The os module contains functions for working with the operating system. Using this library enables the creation of cross-platform applications, as the functionality of this module operates independently of the installed operating system. Here are just some of the capabilities of the os module (a short usage sketch follows the list):

  • os.name returns the name of the operating system. In current Python 3 versions, the possible values are 'posix', 'nt', and 'java'.
  • os.environ is a mapping object containing the environment variables; it allows you to read, modify, add, and delete them.
  • os.path contains a number of functions for working with file and directory paths.
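
As a minimal sketch (the environment variables and file names here are arbitrary), these facilities are typically used as follows:

# Examples of working with the os module
import os

print(os.name)                     # e.g. 'nt' on Windows, 'posix' on Linux/macOS

# read an environment variable, falling back to an empty string
home = os.environ.get('USERPROFILE', os.environ.get('HOME', ''))

# build a file path in a platform-independent way
filename = os.path.join(home, 'data', 'quotes.csv')
print(filename)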

The Pandas module is a library for processing and analyzing data. The library provides specialized data structures and operations for processing numeric tables and time series. It enables data analysis and modeling without using specialized programming languages for statistical processing, such as R or Octave.

The package is designed for data cleaning and initial assessment based on general statistical indicators, such as the mean, quantiles, and so on. It cannot be considered a purely statistical package, however: the data structures it creates, such as DataFrame and Series, are used as inputs in most data analysis and machine learning modules, including SciPy, Scikit-Learn, and others.

The DataFrame object is defined in the Pandas library and is designed for working with indexed two-dimensional data arrays.
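
As a minimal sketch (the column names and values are arbitrary), creating a DataFrame and obtaining basic statistics may look as follows:

# Creating a DataFrame and calculating basic statistics
import pandas as pd

df = pd.DataFrame({'open':  [1.101, 1.103, 1.102],
                   'close': [1.103, 1.102, 1.105]})

print(df.mean())          # column means
print(df.quantile(0.5))   # medians (0.5 quantile)
print(df.describe())      # summary statistics for an initial assessment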

In addition, the library provides:

  • Tools for data exchange between structures in memory and files of different formats;
  • Built-in data matching tools and ways to handle missing information;
  • Reformatting datasets, including creating pivot tables;
  • Advanced indexing and sampling capabilities from large datasets;
  • Grouping capabilities that enable performing three-step operations like “split, apply, combine”;
  • Merging and combining different datasets.

The library provides the ability to create hierarchical indexing, allowing you to work with high-dimensional data within structures of lower dimensions. Functions for working with time series allow you to form time periods and change intervals. The library is optimized for high performance, with the most important parts of the code written in Cython and C.
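
As an illustration of working with time series (the dates and values below are synthetic), changing the interval of a series comes down to a single resample call:

# Resampling a time series to a different interval
import numpy as np
import pandas as pd

index = pd.date_range('2023-01-02', periods=7*24, freq='H')   # a week of hourly points
prices = pd.Series(1.05 + np.random.default_rng(0).normal(0, 0.001, len(index)).cumsum(),
                   index=index)

daily = prices.resample('D').ohlc()   # aggregate hourly data into daily OHLC bars
print(daily.head())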

Another library for working with multidimensional arrays is NumPy, an open-source module. Its main capabilities include support for multidimensional arrays (including matrices) and high-level mathematical functions designed for working with them.

The NumPy library implements computational algorithms in the form of functions and operators that are optimized for working with multidimensional arrays. The library offers the ability to perform vectorized operations on data. The core functions are written in C and optimized for maximum performance. As a result, any algorithm that can be expressed as a sequence of operations on arrays (matrices) and implemented using NumPy runs as fast as the equivalent code executed in MATLAB.
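
As a small illustration of vectorized operations (the array contents are arbitrary), an element-wise calculation over an entire array requires no explicit loop:

# Vectorized operations on arrays instead of explicit loops
import numpy as np

prices = np.array([1.101, 1.103, 1.102, 1.105, 1.104])

returns = np.diff(prices) / prices[:-1]    # element-wise relative changes
print(returns.mean(), returns.std())       # aggregate statistics in one call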

In this sense, Python together with NumPy can be considered an alternative to MATLAB: both languages are interpreted and allow performing operations directly on arrays.

NumPy is often used as a base for working with multidimensional arrays in other libraries. The aforementioned Pandas also uses the NumPy library for low-level array operations.

The Matplotlib module is a comprehensive library for creating static, animated, and interactive visualizations. It can be used to visualize large volumes of data.

We will use the TensorFlow library to create neural network models. This is a comprehensive open-source platform for machine learning. It has a flexible ecosystem of tools, libraries, and community resources that allow researchers to advance the latest achievements in machine learning while enabling developers to easily create and deploy machine learning-based applications.

The library enables the creation and training of machine learning models using intuitive high-level APIs with eager execution, such as Keras. This provides immediate model iteration and facilitates debugging.
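
Thanks to eager execution, operations are evaluated immediately and their results can be inspected right away, which simplifies debugging. A minimal sketch:

# Eager execution: operations return concrete values immediately
import tensorflow as tf

print(tf.executing_eagerly())    # True by default in TensorFlow 2.x

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)              # the result is available at once
print(b.numpy())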

Of course, to integrate with the MetaTrader 5 terminal, we will use the MetaTrader5 library of the same name. It provides a set of functions for data exchange with the terminal, including functions for retrieving market information and executing trading operations.

For technical analysis of data, you can make use of the TA-lib library, which offers a wide range of functions for technical indicators.
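
As a hedged sketch of how these two libraries can be combined (the symbol, timeframe, and indicator period are arbitrary, the terminal must be installed, and TA-lib must be installed separately), one could request quotes from the terminal and compute an indicator on the close prices:

# Requesting quotes from the terminal and calculating an indicator on them
import numpy as np
import MetaTrader5 as mt5
import talib

if mt5.initialize():
    rates = mt5.copy_rates_from_pos("EURUSD", mt5.TIMEFRAME_H1, 0, 100)
    mt5.shutdown()
    if rates is not None:
        close = np.asarray(rates['close'], dtype=np.float64)
        rsi = talib.RSI(close, timeperiod=14)    # 14-period Relative Strength Index
        print(rsi[-1])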

Before you can use the libraries, you must install them in the Python environment you are using. To do this, in the command prompt or Windows PowerShell with administrator privileges, you need to execute a series of commands:

  • installing NumPy

pip install numpy

  • installing Pandas

pip install pandas

  • installing Matplotlib

pip install matplotlib

  • installing TensorFlow

pip install tensorflow

  • installing Keras

pip install keras

  • installing MetaTrader 5 library

pip install MetaTrader5

Moving on to the structure of our script, let's create a template file, template.py. The script will consist of several blocks. First, we need to import the required libraries into our script.

# import libraries
import os
import pandas as pd
import numpy as np
import tensorflow as tf 
from tensorflow import keras 
import matplotlib as mp
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import MetaTrader5 as mt5

After training the models, we will create visualization plots to depict the training process and compare the performance of different models. To standardize the plots, we will define common parameters for their construction.

# set parameters for results graphs
mp.rcParams.update({'font.family':'serif',
                    'font.serif':'Clear Sans',
                    'axes.labelsize':'medium',
                    'legend.fontsize':'small',
                    'figure.figsize':[6.0,4.0],
                    'xtick.labelsize':'small',
                    'ytick.labelsize':'small',
                    'axes.titlesize':'x-large',
                    'axes.titlecolor':'#333333',
                    'axes.labelcolor':'#333333',
                    'axes.edgecolor':'#333333'
                   })

We will perform the training and testing of all models on a single dataset, which we will specifically preload into a file on the local disk. This approach will allow us to eliminate the influence of disparate data and assess the performance of different neural network models under consistent conditions.

Hence, in the next step, we will load the initial data from the file into a table. Please note: since MetaTrader 5 programs can access files only within the terminal's sandbox, the data file will be stored in the MQL5\Files directory of your terminal (or one of its subdirectories, if any were specified when saving the file), and our script needs the full path to it.

Instead of hardcoding the path to the terminal sandbox in our program code, we will retrieve it from MetaTrader 5 using the provided API. To achieve this, we first establish a connection to the installed terminal and verify the outcome of this operation.

# Connecting to the MetaTrader 5 terminal
if not mt5.initialize():
    print("initialize() failed, error code =",mt5.last_error())
    quit()

After successfully connecting to the terminal, we will request the sandbox path and then disconnect from the terminal. Subsequent operations involving model creation and training will be conducted using Python tools. We do not plan to perform any trading operations in this script.

# Requesting a sandbox path
path=os.path.join(mt5.terminal_info().data_path,r'MQL5\Files')
mt5.shutdown()

In the following short data loading block, you can observe the usage of functions from all three aforementioned libraries simultaneously. Using the os.path.join function, we concatenate the path to the working directory with the name of the training sample file. With the read_table function from the Pandas library, we read and convert the contents of the CSV file into a table. Then we convert the obtained table into a two-dimensional array using the NumPy library function.

# Loading a training sample
filename = os.path.join(path,'study_data.csv')
data = np.asarray( pd.read_table(filename,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

The actual reading of the CSV file contents and the transformation of rows into a table are performed using the read_table function from the Pandas library. This function has quite a few parameters for precise configuration of the methods to transform string data into the desired numerical data type. Their full description can be found in the library documentation. We will only describe those we use:

  • filename gives the name of the file to be read, specifying full or relative path;
  • sep specifies the data separator used in the file;
  • header specifies the row number(s) to be used as column names and marks where the data begins; since our file has no headers, we pass the value None;
  • skipinitialspace is a boolean parameter that specifies whether to skip spaces after the delimiter;
  • encoding specifies the type of encoding used;
  • float_precision determines which converter should be used for floating point values;
  • dtype specifies the final data type;
  • low_memory internally processes the file piecemeal, which will result in less memory usage during parsing.

As a result of these operations, all training sample data were loaded into a two-dimensional array object of type numpy.ndarray from the NumPy library. The loaded data contain both the source data elements and the target values. However, for training a neural network, we need to separately feed the source data as input to the network and then compare the obtained output with the target values after the forward pass. In other words, the input data and target values are used at different stages of the process and in different places.

Hence, we need to split this data into separate arrays. Let each data row represent an individual data pattern, with the last two elements of the row containing the target values for that pattern. The shape attribute gives the size of our array, so we can use it to determine the dimensions of the initial data and target values. Only by knowing these dimensions can we copy the corresponding columns into new arrays.

In the block below, we will divide the training sample into two arrays. In doing so, we are separating only the columns while preserving the entire structure of rows. Thus, we get the initial data in one array and the target values in the other. The patterns can be matched to their corresponding target values by row number.

# Dividing the training sample into baseline data and targets
inputs=data.shape[1]-2
targets=2
train_data=data[:,0:inputs]
train_target=data[:,inputs:]

Now that we have the training data, we can start building the neural network model. We will create models using the Sequential class from the Keras library.

# Create a neural network model
model = keras.Sequential([....])

The Sequential model is a linear stack of layers. You can create a Sequential model by passing a list of layers to the model's constructor, and you can also add layers using the add method.
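
For example, the following two fragments (the layer sizes are arbitrary) build equivalent models, first by passing a list of layers to the constructor and then via the add method:

# Two equivalent ways to build a Sequential model
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Dense(64, activation='relu'),
                          layers.Dense(2)])

model = keras.Sequential()
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2))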

First of all, our model needs to know what dimensionality of data to expect at the input. In this regard, the first layer of the Sequential model must obtain information about the dimensionality of the input data. All subsequent layers perform an automatic dimensionality calculation.

There are several ways to specify the dimensions of the raw data:

  • Pass the input_shape argument to the first layer.
  • Some 2D layers support the specification of the dimensionality of the input data via the input_dim argument. Some 3D layers support input_dim and input_length arguments.
  • Use a special Input layer for the source data, with the shape parameter specifying the dimensionality of the input.

# Create a neural network model
model = keras.Sequential([keras.Input(shape=(inputs,)),
 # Fill the model with a description of the neural layers
                         ])

We will become familiar with the types of proposed neural layers as we study their architectures. Now let's look at the general principles of building and organizing models.

Once the model is created, you need to prepare it for training and customize the process. This functionality is performed in the compile method which has several parameters:

  • optimizer — an optimizer, can be specified as a string identifier of an existing optimizer or as an instance of the Optimizer class;
  • loss — a loss function, which can be specified by the string identifier of an existing loss function or by a custom function;
  • metrics — a list of metrics that the model should evaluate during training and testing, for example, 'accuracy' could be used for the classification task;
  • loss_weights — an optional list or dictionary that defines scalar coefficients for weighting the loss contributions of various model outputs;
  • weighted_metrics — a list of metrics that will be evaluated and weighted during training and testing.

For each parameter, the Keras library offers a different list of possible values, but it does not limit the user to the proposed options. For each parameter, there is a possibility to add custom classes and algorithms.

model.compile(optimizer='Adam',
              loss='mean_squared_error',
              metrics=['accuracy'])

Next, we can start training the created model using the fit method, which allows training a model for a fixed number of epochs. The method has the following parameters for customizing the learning process:

  • x — an array of initial data;
  • y — an array of target results;
  • batch_size — an optional parameter that specifies the number of "source data - target values" pairs processed before the weight matrix is updated;
  • epochs — a number of epochs of training;
  • verbose — an optional parameter, specifies the level of detail of training logging: 0 - no messages, 1 - progress bar, 2 - one line per epoch, auto - automatic selection;
  • callbacks — a list of callbacks to apply during training;
  • validation_split — the fraction of the training sample (a value between 0 and 1) to set aside for validation;
  • validation_data — a separate sample for validating the learning process;
  • shuffle — a logical value that indicates the need to shuffle the training sample data before the next epoch;
  • class_weight — an optional dictionary mapping class indices to a weight value used to weight the loss function (only during training);
  • sample_weight — an optional array of NumPy weights for the training sample used to weight the loss function (only during training);
  • initial_epoch — a training start epoch, can be useful for resuming a previous training cycle;
  • steps_per_epoch — the total number of batches before one epoch is declared complete and the next one begins; by default it is determined by the training sample size;
  • validation_steps — the total number of batches from the validation sample to process when performing validation at the end of each epoch; by default it is determined by the validation sample size;
  • validation_batch_size — a number of samples per validation batch;
  • validation_freq — an integer specifying the number of training epochs between validation runs.

Of course, we will not use the full set of parameters in the first model. Let's dwell on the callbacks parameter, which sets a list of callbacks. This option provides a means of interacting with the learning process while it runs.

Its usage allows configuring the retrieval of real-time information about the training process and managing the process itself. In particular, you can accumulate the average values of indicators for an epoch or save the results of each epoch to a CSV file. You can also monitor training metrics and adjust the learning rate or even stop the training process when the monitored metric stops improving. At the same time, it is possible to add your own callback classes.
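
For example (the file name and thresholds below are arbitrary), the standard callbacks mentioned above could be configured as follows:

# Examples of standard Keras callbacks
csv_logger = tf.keras.callbacks.CSVLogger('training_log.csv')            # save each epoch's results to a CSV file
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',     # reduce the learning rate when
                                                 factor=0.5, patience=3) # the monitored metric stops improving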

I suggest using early stopping in the training procedure if there is no improvement in the error function metric over five epochs.

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
history = model.fit(train_data, train_target,
                      epochs=500, batch_size=1000,
                      callbacks=[callback],
                      verbose=2,
                      validation_split=0.2,
                      shuffle=True)

After training is complete, save the trained model to a file on the local disk. To do this, let's use the methods of the Keras and os libraries.

# Saving the learned model
model.save(os.path.join(path,'model.h5'))
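
Later, the trained model can be restored from this file without retraining. A minimal sketch, assuming the same path variable:

# Loading a previously saved model from disk
model = keras.models.load_model(os.path.join(path, 'model.h5'))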

For clarity and understanding of the training process, let's plot the dynamics of metric changes during training and validation. Here we will use the methods of the Matplotlib library.

# Drawing model learning results
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.ylabel('$MSE$ $Loss$')
plt.xlabel('$Epochs$')
plt.title('Dynamic of Models train')
plt.legend(loc='upper right')
 
plt.figure()
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.ylabel('$Accuracy$')
plt.xlabel('$Epochs$')
plt.title('Dynamic of Models train')
plt.legend(loc='lower right')

After training, it's necessary to evaluate our model's performance on the test dataset, as before deploying the model in real conditions, we need to understand how it will perform on new data. To do this, we will load a test sample. The data loading procedure is entirely analogous to loading the training dataset, with the only difference being the filename.

# Loading a test sample
test_filename = os.path.join(path,'test_data.csv')
test = np.asarray( pd.read_table(test_filename,
                   sep=',',
                   header=None,
                   skipinitialspace=True,
                   encoding='utf-8',
                   float_precision='high',
                   dtype=np.float64,
                   low_memory=False))

After loading the data, we will split the obtained table into source data and target labels, just as we did with the training dataset.

# Dividing the test sample into raw data and targets
test_data=test[:,0:inputs]
test_target=test[:,inputs:]

We will check the quality of the trained model on the test sample using the evaluate method from the Keras library. As a result of calling the specified method, we obtain the loss value and metrics on the test dataset. The method has a number of parameters to customize the testing process:

  • x — an array of initial data of the test sample;
  • y — an array of test sample targets;
  • batch_size — a size of the test batch;
  • verbose — the logging detail level (0 - no logging, 1 - progress indication);
  • sample_weight — an optional parameter used to weight the loss function;
  • steps — a total number of steps to declare the testing process complete;
  • callbacks — a list of callbacks to apply during evaluation;
  • return_dict — a boolean variable that defines the format of the method output (True = as a "metric-value" dictionary, False = as a list).

Most of the above parameters are optional and also have default values. To initiate the testing process, in most cases, it's sufficient to simply provide the data arrays.

# Validation of model results on test sample
test_loss, test_acc = model.evaluate(test_data, test_target)

Finally, let's output the test results to the log and display the previously plotted graphs.

# Logging test results
print('Model in test')
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)
 
# Output of created charts
plt.show()

At this point, the basic script template can be considered complete. It's worth noting that attempting to run this script will result in an error, but this has nothing to do with mistakes in the template itself: we have not yet provided a description of our model and have left that block empty. As we explore various neural network solutions, we will fill in the model architecture description block and will then be able to fully assess the performance of our template.