Integrating ML models with the Strategy Tester (Part 3): Managing CSV files (II)

Jonathan Pereira | 4 July, 2023

Introduction

In this article, the third part of the series on integrating the Strategy Tester with Python, we will build the CFileCSV class for efficient management of CSV files. We will walk through its code and some examples so that readers can see how the class can be used in practice.

So, what is CSV?

CSV (Comma-Separated Values) is a simple, widely used file format for storing and exchanging data. It resembles a table: each row is a record and each column is a field of that record. Values are separated by a delimiter, which makes them easy to read and write with many different tools and programming languages.
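To make the record/field structure concrete, here is a minimal sketch that splits one CSV line into its fields. It is written in C++ rather than MQL5 so that it can be compiled and tested outside MetaTrader. `SplitRecord` is a hypothetical helper, not part of CFileCSV, and it assumes that no field contains the delimiter (no quoting).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV record into its fields.
// Assumes unquoted fields, i.e. the delimiter never appears inside a value.
std::vector<std::string> SplitRecord(const std::string &line, char delimiter)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = line.find(delimiter, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));  // last field runs to the end of the line
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;  // continue after the delimiter
    }
    return fields;
}
```

For example, `SplitRecord("Timestamp;Close;Last", ';')` yields three fields, and an empty value between two delimiters is kept as an empty string, so the column positions are preserved.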

The CSV format appeared in the early 1970s, when it was first used on mainframe systems. It has no single creator: it grew out of common practice rather than from a formal specification.

CSV is often used to import and export data in spreadsheets, databases, data analysis programs and similar applications. Its popularity comes from being easy to use and understand, and from its compatibility with many systems and tools. This makes it especially useful when data has to be moved between applications, for example from one system to another.

So, the key advantages of CSV are ease of use and compatibility. However, it also has limitations: it does not support complex data types (every value is stored as plain text), and it is poorly suited to very large amounts of data. There is also no universally enforced standard (RFC 4180 documents a common variant, but many applications deviate from it, for example by using ';' as the delimiter), which can cause compatibility issues between applications. Finally, because the format carries no schema and performs no validation, data can be silently lost or altered, for instance when a value contains the delimiter. In general, CSV is a versatile and easy-to-use option for storing and sharing data. Nevertheless, it is important to understand its limitations and take steps to ensure data accuracy.
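Because CSV stores every value as plain text, numbers must be parsed and validated when a file is read back. The following C++ sketch shows one such check using the standard `std::strtod`. The helper name `ParseDouble` is hypothetical, and the sketch assumes the default "C" locale, where '.' is the decimal separator.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a CSV field to double and rejects anything
// that is not entirely a valid number (empty field, trailing text, out of range).
bool ParseDouble(const std::string &field, double &value)
{
    if (field.empty())
        return false;
    const char *begin = field.c_str();
    char *end = nullptr;
    errno = 0;
    double parsed = std::strtod(begin, &end);
    if (end != begin + field.size() || errno == ERANGE)
        return false;  // partial parse or out-of-range value
    value = parsed;
    return true;
}
```

A field such as "1,0912", written with a comma as the decimal separator, is rejected instead of being silently read as 1. This is the kind of accuracy check the format itself does not provide.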


Motivation

The CFileCSV class was created out of the need to integrate the MetaTrader 5 Strategy Tester environment with Python. While developing trading strategies based on machine learning (ML) models, I ran into difficulties using models created in Python. I would either have to build a machine learning library in MQL5, which was beyond my main goal, or write an Expert Advisor entirely in Python.

Although the MQL5 language provides resources for creating ML libraries, I did not want to spend time and effort developing them, since my main goal was to analyze data and build models in a fast and efficient way.

The task, therefore, was to find an intermediate solution: I wanted to take advantage of ML models built in Python while still being able to apply them directly in my MQL5 work. So I started looking for a way to overcome this limitation and integrate the two environments.

The idea was to create a messaging system through which MetaTrader 5 and Python could communicate in a timely fashion. It would control initialization, the transfer of data from MetaTrader 5 to Python, and the sending of predictions from Python back to MetaTrader 5. The CFileCSV class was designed to facilitate this interaction by making data storage and loading efficient.


Introduction to the CFileCSV class

CFileCSV is a class for working with CSV (Comma-Separated Values) files. It is derived from CFile, the file-handling class of the MQL5 Standard Library, and adds functionality specific to CSV files. Its purpose is to make CSV files easier to read and write, whatever data types are involved.

One of the big benefits of CSV files is that they are easy to share and provide a convenient way to import and export data. They can be opened and edited in programs like Excel or Google Sheets, and they can be read from many programming languages. Moreover, since the format is only loosely defined (for example, the delimiter is not fixed), files can be written and read in whatever variant a given task needs.

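The CFileCSV class has four main public methods: Open, WriteHeader, WriteLine, and Read. It also has two private helper methods, both overloads of ToString, which convert a one-dimensional array, or one row of a two-dimensional array, into a delimited string. The public write methods then write that string to the file.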

#include <Files\File.mqh>

class CFileCSV : public CFile
  {
private:
   template<typename T>
   string            ToString(const int, const T &[][]);
   template<typename T>
   string            ToString(const T &[]);
   short             m_delimiter;

public:
                     CFileCSV(void);
                    ~CFileCSV(void);
   //--- methods for working with files
   int               Open(const string,const int, const short);
   template<typename T>
   uint              WriteHeader(const T &values[]);
   template<typename T>
   uint              WriteLine(const T &values[][]);
   string            Read(void);
  };  

When using this class, keep in mind that it was designed for simple, well-formed CSV files. It does not quote or escape fields, so a value that contains the delimiter or a line break will shift the columns. If the data is not formatted as expected, the results may be unexpected. It is also important to check that the file has actually been opened, with the FILE_WRITE flag, before you attempt to write to it.
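If your values may contain the delimiter, one option is to quote fields before writing them. The C++ sketch below follows the quoting convention described in RFC 4180: wrap the field in double quotes and double any embedded quote. `QuoteField` is a hypothetical helper. It is not part of CFileCSV, whose ToString methods do no quoting.

```cpp
#include <string>

// Hypothetical helper: quotes a field RFC 4180-style when it contains the
// delimiter, a double quote or a line break; other fields are returned as-is.
std::string QuoteField(const std::string &value, char delimiter)
{
    bool needsQuotes = value.find(delimiter) != std::string::npos ||
                       value.find('"') != std::string::npos ||
                       value.find('\n') != std::string::npos ||
                       value.find('\r') != std::string::npos;
    if (!needsQuotes)
        return value;

    std::string quoted = "\"";
    for (char c : value) {
        if (c == '"')
            quoted += "\"\"";  // an embedded quote is written twice
        else
            quoted += c;
    }
    quoted += '"';
    return quoted;
}
```

The trade-off is on the reading side. Once fields can be quoted, a reader that simply splits each line on the delimiter is no longer sufficient.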

As an example of using the CFileCSV class, we can create a CSV file from a data matrix. First, we create an instance of the class and open the file with the Open method, passing the file name and the open flags (the delimiter defaults to ';'). Next, we use the WriteHeader method to write the header to the file and the WriteLine method to write the data rows from the matrix. The following function illustrates these steps:

#include "FileCSV.mqh"

void CreateCSVFile(string fileName, string &headers[], string &data[][])
  {
   // Creates an object of the CFileCSV class
   CFileCSV csvFile;

   // Opens the file for writing in ANSI format. Open() returns the file handle,
   // so compare it with INVALID_HANDLE (-1) instead of treating it as a bool
   if(csvFile.Open(fileName, FILE_WRITE|FILE_ANSI) != INVALID_HANDLE)
     {
      int rows = ArrayRange(data, 0);
      int cols = ArrayRange(data, 1);
      int headerSize = ArraySize(headers);
      // Checks that the data matrix has as many columns as the header array has elements
      // and that it has at least one row
      if(cols != headerSize || rows == 0)
        {
         Print("Error: invalid number of columns or rows. The data array must have the same number of columns as the headers array and at least one row.");
         csvFile.Close();
         return;
        }
      // Writes the header to the file
      csvFile.WriteHeader(headers);
      // Writes the data rows to the file
      csvFile.WriteLine(data);
      // Closes the file
      csvFile.Close();
     }
   else
     {
      // Shows an error message, with the error code, if the file cannot be opened
      Print("Error opening file! Error code: ", GetLastError());
     }
  }

The purpose of this function is to create a CSV file from an array of headers and a matrix of data. It starts by creating an object of the CFileCSV class and checking whether the file can be opened for writing in ANSI format. If it can, the function checks that the number of columns in the data matrix equals the number of elements in the header array and that the data matrix has at least one row. If these conditions are met, it writes the header with WriteHeader(), writes the data rows with WriteLine() and finally closes the file. If the file cannot be opened, an error message is printed.

The usage example at the end of this article performs the same steps inline. The function itself can be extended to perform other tasks. For example, you could add more validations, such as checking whether the file already exists before opening it, or add an option to choose which delimiter to use.
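One of the suggested extensions, a configurable delimiter, can be sketched in C++ (chosen here so it can be tested outside MetaTrader). The sketch builds the file content in memory, so it can be checked without file I/O. `BuildCSV` is a hypothetical name, and writing the resulting string to disk is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C++ counterpart of CreateCSVFile with a delimiter option.
// Builds the CSV text in `out`; returns false if the input fails validation.
bool BuildCSV(const std::vector<std::string> &headers,
              const std::vector<std::vector<std::string>> &rows,
              std::string &out,
              char delimiter = ';')
{
    // Same checks as the MQL5 function: at least one row, and each row must have
    // as many columns as the header. Unlike a two-dimensional MQL5 array, a
    // vector of vectors can hold rows of different lengths, so each row is checked.
    if (rows.empty())
        return false;
    for (const auto &row : rows)
        if (row.size() != headers.size())
            return false;

    out.clear();
    auto appendRecord = [&out, delimiter](const std::vector<std::string> &fields) {
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i > 0)
                out += delimiter;  // delimiter between fields, not after the last one
            out += fields[i];
        }
        out += '\n';
    };

    appendRecord(headers);
    for (const auto &row : rows)
        appendRecord(row);
    return true;
}
```

Passing ',' instead of the default ';' is then the only change needed to produce a comma-separated file.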

The CFileCSV class provides a simple and practical way to read and write CSV files. However, you should use it with care: make sure the files are in the expected format, and check the return values of its methods to confirm that each operation succeeded.


Implementation

As mentioned above, the CFileCSV class has four main public methods: Open, WriteHeader, WriteLine, and Read. It also has two private helper methods, both overloads of ToString.

int CFileCSV::Open(const string file_name,const int open_flags, const short delimiter=';')
  {
   m_delimiter=delimiter;
//--- the delimiter is passed as the third argument of CFile::Open, not ORed into the open flags
   return(CFile::Open(file_name,open_flags|FILE_CSV,delimiter));
  }
template<typename T>
uint CFileCSV::WriteHeader(const T &values[])
  {
   string header=ToString(values);
//--- check handle
   if(m_handle!=INVALID_HANDLE)
      return(::FileWrite(m_handle,header));
//--- failure
   return(0);
  }
template<typename T>
uint CFileCSV::WriteLine(const T &values[][])
  {
   int len=ArrayRange(values, 0);

   if(len<1)
      return 0;

   string lines="";
   for(int i=0; i<len; i++)
      if(i<len-1)
         lines += ToString(i, values)  + "\n";
      else
         lines += ToString(i, values);

   if(m_handle!=INVALID_HANDLE)
      return(::FileWrite(m_handle, lines));
   return 0;
  }
string CFileCSV::Read(void)
  {
   string res="";
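//--- in CSV mode, FileReadString reads the next field, up to the delimiter or the end of the line
   if(m_handle!=INVALID_HANDLE)
      res = FileReadString(m_handle);

   return res;
  }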

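WriteLine joins the rows with "\n" and leaves the last row without one, because FileWrite appends the line terminator itself after each call. The same row assembly can be mirrored in C++ for testing outside MetaTrader. `AssembleRows` is a hypothetical name, and the rows are assumed to be already converted to strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the row assembly in CFileCSV::WriteLine: fields are joined with the
// delimiter, rows are joined with "\n", and no newline follows the last row
// (in MQL5, FileWrite adds the line terminator after writing the string).
std::string AssembleRows(const std::vector<std::vector<std::string>> &rows,
                         char delimiter)
{
    std::string lines;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        for (std::size_t x = 0; x < rows[i].size(); ++x) {
            if (x > 0)
                lines += delimiter;
            lines += rows[i][x];
        }
        if (i + 1 < rows.size())
            lines += '\n';  // separator between rows only
    }
    return lines;
}
```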
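The ToString methods are private helper methods of the CFileCSV class. They convert a one-dimensional array, or a single row of a two-dimensional array, into one string whose values are separated by the delimiter. WriteHeader and WriteLine then write that string to the file.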

template<typename T>
string CFileCSV::ToString(const int row, const T &values[][])
  {
   string res="";
   int cols=ArrayRange(values, 1);

   for(int x=0; x<cols; x++)
      if(x<cols-1)
         res+=(string)values[row][x] + ShortToString(m_delimiter);
      else
         res+=(string)values[row][x];

   return res;
  }
template<typename T>
string CFileCSV::ToString(const T &values[])
  {
   string res="";
   int len=ArraySize(values);

   if(len<1)
      return res;

   for(int i=0; i<len; i++)
      if(i<len-1)
         res+=(string)values[i] + ShortToString(m_delimiter);
      else
         res+=(string)values[i];

   return res;
  }

WriteHeader and WriteLine use these methods to convert the values passed to them into strings before writing those strings to the open file. This ensures that the values are written in the expected format and separated by the specified delimiter, so that the data in the CSV file is correct and well organized.

In addition, these methods provide the CFileCSV class with more flexibility, allowing it to handle different kinds of data because they are implemented as templates. This means that these methods can be applied to any kind of data that can be converted to a string, including integers, floats, strings, and others. This makes the CFileCSV class very versatile and easy to use.

In particular, they place the delimiter after every element except the last one in an array or matrix row. This keeps the values in the CSV file properly separated, which is very important for reading and interpreting the stored data later.
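The template idea can be illustrated in C++ as well (used here because it can be tested outside MetaTrader). The following generic join is in the spirit of the ToString templates, but it is not the article's code. `JoinValues` is a hypothetical name, and conversion is done by streaming each value into `std::ostringstream`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Any type that can be written to std::ostringstream (int, double,
// std::string, ...) is converted to text; the delimiter goes after every
// element except the last one.
template <typename T>
std::string JoinValues(const std::vector<T> &values, char delimiter)
{
    std::ostringstream res;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0)
            res << delimiter;
        res << values[i];
    }
    return res.str();
}
```

One design point to watch is number formatting. With the stream's default settings, doubles are written with 6 significant digits, so 1.23456789 becomes "1.23457". When writing prices, the precision should be chosen explicitly, much as DoubleToString takes a digits argument in MQL5.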


An example of using ToString(const int row, const T &values[][]):

int data[2][3] = {{1, 2, 3}, {4, 5, 6}};
string str = csvFile.ToString(1, data);
//str -> "4;5;6"

In this example, we pass the index of the second row (1) of the data matrix to the ToString method. The method iterates over each element in that row, appends it to the resulting string, and inserts the delimiter after every element except the last one. The resulting string will be '4;5;6'.

Example of using ToString(const T &values[]):

string headers[] = {"Name", "Age", "Gender"};
string str = csvFile.ToString(headers);
//str -> "Name;Age;Gender"

In this example, the 'headers' array is passed to the ToString method. The method iterates over each element of the array, appending it to the resulting string and inserting a delimiter at the end of each element except the last element of the array. The resulting string will be 'Name;Age;Gender'.

These are just examples of the two ToString overloads, which can be applied to any data type that can be converted to a string. Note, however, that the calls above are only illustrative: the methods are declared private, so they can only be called from inside the CFileCSV class.


Algorithmic complexity 

How can we measure the complexity of algorithms and use this information to optimize the performance of algorithms and systems?

Big O notation is an important tool for analyzing algorithms. It comes from mathematics: Paul Bachmann introduced it in 1894 and Edmund Landau popularized it. It was later adopted in computer science, where Donald Knuth's work in the 1970s did much to establish its use in algorithm analysis, and it is still widely used today. It allows programmers to roughly estimate how an algorithm's running time (or memory use) grows with the size of its input. With this tool, it is possible to compare different algorithms and choose those that perform better for specific tasks.

The amount of data and the complexity of the problems we need to solve keep growing. That is why Big O notation is so relevant: as more and more data is generated every day, we need more efficient algorithms to process it.

Big O notation is based on the idea that an algorithm's execution time grows with the size of its input n according to some mathematical function, such as a constant, a logarithm, a polynomial or an exponential. Big O expresses an upper bound on that growth and is written O(f(n)), where f(n) describes how the cost of the algorithm scales with n.
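Stated formally, with n the input size, the definition and a short worked instance are shown below. The polynomial T(n) is an arbitrary illustration, not a measurement of CFileCSV.

```latex
f(n) = O\big(g(n)\big) \iff \exists\, c > 0,\ \exists\, n_0 \ge 1 :\ \forall n \ge n_0,\ |f(n)| \le c\,|g(n)|

T(n) = 3n^2 + 5n + 2 \le 3n^2 + 5n^2 + 2n^2 = 10n^2 \quad (n \ge 1) \;\Rightarrow\; T(n) = O(n^2)
```

Here c = 10 and n_0 = 1. The lower-order terms 5n and 2 do not change the class, which is why Big O keeps only the fastest-growing term.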

Let's now look at a few examples of using Big O notation:

Big O helps you decide which algorithm to choose for a particular problem and where to optimize the performance of a system.



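The time complexity of each method of the CFileCSV class depends on the size of the data passed to it. Reading the code above and counting each element visit as one step: Open does a fixed amount of work apart from the operating system call, O(1); WriteHeader visits each of the n header elements once, O(n); WriteLine visits every cell of an r-by-c matrix, O(r·c), and builds the whole output as a single string before writing it, so its memory use grows the same way; Read returns a single field, so its cost is proportional to the length of that field.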

Please note that these complexities are estimates: actual performance is also influenced by factors such as the size of the file's write buffer and the file system. Moreover, Big O is usually quoted for the worst case, and it describes how the cost grows, not the absolute time. For example, doubling the amount of data passed to an O(n) method roughly doubles its running time.

In general, the CFileCSV class has an acceptable time complexity and is efficient for working with files that are not too large. However, if you need to handle very large files, you may need to take other approaches or optimize the class to handle specific use cases.

 


Usage Example

//+------------------------------------------------------------------+
//|                                                    exemplo_2.mq5 |
//|                                     Copyright 2022, Lethan Corp. |
//|                           https://www.mql5.com/pt/users/14134597 |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, Lethan Corp."
#property link      "https://www.mql5.com/pt/users/14134597"
#property version   "1.00"
#include "FileCSV.mqh"

CFileCSV csvFile;
string fileName = "dados.csv";
string headers[] = {"Timestamp", "Close", "Last"};
string data[1][3];

//The OnStart function: entry point of the script
void OnStart(void)
  {
//Fill the 'data' array with the timestamp, the close price of the current bar and the last deal price
   data[0][0] = TimeToString(TimeCurrent());
   data[0][1] = DoubleToString(iClose(Symbol(), PERIOD_CURRENT, 0), _Digits);
   data[0][2] = DoubleToString(SymbolInfoDouble(Symbol(), SYMBOL_LAST), _Digits);

//Open the CSV file; Open() returns a file handle, so compare it with INVALID_HANDLE
   if(csvFile.Open(fileName, FILE_WRITE|FILE_ANSI) != INVALID_HANDLE)
     {
      //Write the header
      csvFile.WriteHeader(headers);
      //Write data rows
      csvFile.WriteLine(data);
      //Close the file
      csvFile.Close();
     }
   else
     {
      Print("File opening error! Error code: ", GetLastError());
     }
  }
//+------------------------------------------------------------------+

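This script demonstrates the CFileCSV class in MQL5. It fills a one-row data array with the current time, the close price of the current bar and the last deal price. It then opens dados.csv (in the terminal's MQL5\Files folder) for writing, writes the header and the data row, and closes the file. If the file cannot be opened, it prints an error message.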


Conclusion

The CFileCSV class provides a practical and efficient way to work with CSV files. It includes methods for opening files, writing headers and data rows, and reading CSV files. Together, the Open, WriteHeader, WriteLine and Read methods ensure that data is written in an organized and readable form. Thank you for your time! In the next article, we will look at how to use ML models through file sharing with the CFileCSV class introduced in this article.