Example of searching for a trading strategy using SQLite

Let's try to use SQLite to solve practical problems. We will import the history of quotes into a database as MqlRates structures and analyze it in order to identify patterns and search for potential trading strategies. Of course, any chosen logic can also be implemented in pure MQL5, but SQL allows you to approach the problem in a different way, in many cases more efficiently and with many interesting built-in SQL functions at hand. The scope of this book, which is aimed at learning MQL5, does not allow us to go deep into this technology, but we mention it as worthy of the attention of an algorithmic trader.

The script for converting quotes history into a database format is called DBquotesImport.mq5. In the input parameters, you can set the prefix of the database name and the size of the transaction (the number of records in one transaction).

input string Database = "MQL5Book/DB/Quotes";
input int TransactionSize = 1000;

To add MqlRates structures to the database using our ORM layer, the script defines an auxiliary MqlRatesDB structure that provides the rules for binding structure fields to database columns. Since our script only writes data to the database and does not read it back, it does not need to be bound using the DatabaseReadBind function, which would impose a restriction on the "simplicity" of the structure. The absence of this constraint makes it possible to derive the MqlRatesDB structure from MqlRates (and avoid repeating the description of the fields).

struct MqlRatesDB: public MqlRates
{
   /* for reference:
   
      datetime time;
      double   open;
      double   high;
      double   low;
      double   close;
      long     tick_volume;
      int      spread;
      long     real_volume;
   */
   
   bool bindAll(DBQuery &q) const
   {
      return q.bind(0, time)
         && q.bind(1, open)
         && q.bind(2, high)
         && q.bind(3, low)
         && q.bind(4, close)
         && q.bind(5, tick_volume)
         && q.bind(6, spread)
         && q.bind(7, real_volume);
   }
   
   long rowid(const long setter = 0)
   {
      // rowid is set by us according to the bar time
      return time;
   }
};
   
DB_FIELD_C1(MqlRatesDB, datetime, time, DB_CONSTRAINT::PRIMARY_KEY);
DB_FIELD(MqlRatesDB, double, open);
DB_FIELD(MqlRatesDB, double, high);
DB_FIELD(MqlRatesDB, double, low);
DB_FIELD(MqlRatesDB, double, close);
DB_FIELD(MqlRatesDB, long, tick_volume);
DB_FIELD(MqlRatesDB, int, spread);
DB_FIELD(MqlRatesDB, long, real_volume);

The database name is formed from the Database prefix, the symbol name, and the timeframe of the current chart on which the script is running. A single table, "MqlRatesDB", is created in the database with the field configuration specified by the DB_FIELD macros. Please note that the primary key is not generated by the database but is taken directly from the bars, from the time field (bar opening time).

void OnStart()
{
   Print("");
   DBSQLite db(Database + _Symbol + PeriodToString());
   if(!PRTF(db.isOpen())) return;
   
   PRTF(db.deleteTable(typename(MqlRatesDB)));
   
   if(!PRTF(db.createTable<MqlRatesDB>(true))) return;
   ...

Next, in packages of TransactionSize bars, we request bars from the history and add them to the table. This is the job of the helper function ReadChunk, called in a loop as long as there is data (the function returns true) and the user has not stopped the script manually. The function code is shown below.

   int offset = 0;
   while(ReadChunk(db, offset, TransactionSize) && !IsStopped())
   {
      offset += TransactionSize;
   }

Upon completion of the process, we ask the database for the number of records added to the table and output it to the log.

   DBRow *rows[];
   if(db.prepare(StringFormat("SELECT COUNT(*) FROM %s",
      typename(MqlRatesDB))).readAll(rows))
   {
      Print("Records added: "rows[0][0].integer_value);
   }
}

The ReadChunk function looks as follows.

bool ReadChunk(DBSQLite &db, const int offset, const int size)
{
   MqlRates rates[];
   MqlRatesDB ratesDB[];
   const int n = CopyRates(_Symbol, PERIOD_CURRENT, offset, size, rates);
   if(n > 0)
   {
      DBTransaction tr(db, true);
      Print(rates[0].time);
      ArrayResize(ratesDB, n);
      for(int i = 0; i < n; ++i)
      {
         ratesDB[i] = rates[i];
      }
      
      return db.insert(ratesDB);
   }
   else
   {
      Print("CopyRates failed: "_LastError" "E2S(_LastError));
   }
   return false;
}

It calls the built-in CopyRates function, which fills the rates array with bars. The bars are then transferred to the ratesDB array so that a single statement, db.insert(ratesDB), can write the information to the database (MqlRatesDB formalizes how to do this correctly).

The presence of the DBTransaction object (with the automatic "commit" option enabled) inside the block means that all operations with the array are wrapped in a transaction. To indicate progress, the timestamp of the first bar of each processed block is displayed in the log.
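
For reference, below is a minimal sketch of how such an RAII transaction wrapper can be organized on top of the built-in DatabaseTransactionBegin/Commit/Rollback functions. This is only an illustration: the actual DBTransaction class in the book's database wrapper may differ in details.

// Sketch of an RAII transaction wrapper (hypothetical, for illustration only)
class DBTransactionSketch
{
   const int handle;      // database handle obtained from the DBSQLite wrapper
   const bool autoCommit; // whether to commit automatically in the destructor
   bool done;             // protection against double commit/rollback
public:
   DBTransactionSketch(DBSQLite &db, const bool commitOnExit = false) :
      handle(db.getHandle()), autoCommit(commitOnExit), done(false)
   {
      DatabaseTransactionBegin(handle); // built-in MQL5 function
   }
   bool commit()
   {
      if(done) return false;
      done = true;
      return DatabaseTransactionCommit(handle); // apply all accumulated inserts at once
   }
   ~DBTransactionSketch()
   {
      if(!done)
      {
         if(autoCommit) commit();                  // automatic commit on leaving the block
         else DatabaseTransactionRollback(handle); // otherwise discard the changes
      }
   }
};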

As long as CopyRates returns data and its insertion into the database succeeds, the loop in OnStart continues, shifting the numbers of the copied bars deeper into the history. When the end of the available history or the bar limit set in the terminal settings is reached, CopyRates returns error 4401 (HISTORY_NOT_FOUND), and the script exits.

Let's run the script on the EURUSD, H1 chart. The log should show something like this.

   db.isOpen()=true / ok
   db.deleteTable(typename(MqlRatesDB))=true / ok
   db.createTable<MqlRatesDB>(true)=true / ok
   2022.06.29 20:00:00
   2022.05.03 04:00:00
   2022.03.04 10:00:00
   ...
   CopyRates failed: 4401 HISTORY_NOT_FOUND
   Records added: 100000

We now have the database QuotesEURUSDH1.sqlite, on which we can experiment to test various trading hypotheses. You can open it in MetaEditor to make sure that the data has been transferred correctly.
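
For a quick programmatic check, you could also print a summary of the new table from a script. The following snippet is not part of the attached source codes; it is just a sketch built on the table name used above and the standard DatabasePrint function.

// hypothetical helper: print the number of bars and the covered date range
void CheckQuotesTable(DBSQLite &db)
{
   DatabasePrint(db.getHandle(),
      "SELECT COUNT(*) AS bars,"
      " DATETIME(MIN(time), 'unixepoch') AS first_bar,"
      " DATETIME(MAX(time), 'unixepoch') AS last_bar"
      " FROM MqlRatesDB", 0);
}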

Let's check one of the simplest strategies based on regularities in history. We will gather statistics on two consecutive bars moving in the same direction, broken down by intraday time and day of the week. If some combination of time and day of the week shows a tangible advantage, it can be treated in the future as a signal to enter the market in the direction of the first bar.

First, let's design an SQL query that requests quotes for a certain period and calculates the price movement on each bar, that is, the difference between adjacent opening prices.

Since the bar time is stored as a number of seconds (as per the datetime standard in MQL5 and, conveniently, the "Unix epoch" of SQL), it is desirable to convert it to a string for easy reading, so let's start the SELECT query with a datetime field based on the DATETIME function:

SELECT
   DATETIME(time, 'unixepoch') as datetime, open, ...

This field will not participate in the analysis and is given here only for the user. After it, the open price is shown for reference, so that we can check the calculation of price increments in the debug printout.

Since we are going to select, if necessary, a certain period from the entire file, the filtering condition will require the time field in its "raw" form, so it should also be added to the query. In addition, for the planned analysis of quotes, we will need to extract from the bar timestamp its intraday time, as well as the day of the week (numbered as in MQL5, with 0 being Sunday). Let's call these two columns of the query intraday and day, respectively; the TIME and STRFTIME functions are used to obtain them.

SELECT
   DATETIME(time, 'unixepoch') as datetime, open,
   time,
   TIME(time, 'unixepoch') AS intraday,
   STRFTIME('%w', time, 'unixepoch') AS day, ...

To calculate the price increment in SQL, you can use the LAG function. It returns the value of the specified column with an offset of the specified number of rows. For example, LAG(X, 1) gets the X value from the previous row; the second parameter defaults to 1, so it can be omitted, giving the equivalent form LAG(X). To get the value from the next row, call LAG(X, -1). In any case, LAG requires an additional syntactic construction that specifies the sorting order of records, in the simplest case in the form OVER(ORDER BY column).

Thus, to get the price increment between the opening prices of two neighboring bars, we write:

   ...
   (LAG(open,-1) OVER (ORDER BY time) - open) AS delta, ...

This column is predictive because it looks into the future.

We can detect that two bars formed in the same direction by multiplying their increments: positive values indicate a consistent rise or fall:

   ...
   (LAG(open,-1) OVER (ORDER BY time) - open) * (open - LAG(open) OVER (ORDER BY time))
      AS product, ...

This indicator is chosen as the simplest one to calculate: for real trading systems, you can choose a more complex criterion.

To evaluate the profit generated by the system on the backtest, you need to multiply the direction of the previous bar (which acts as a predictor of the future movement) by the price increment on the next bar. The direction is calculated in the direction column (using the SIGN function) for reference only. The profit estimate in the estimate column is the product of the previous movement's direction and the increment of the next bar (delta): if the direction is preserved, we get a positive result (in points).

   ...
   SIGN(open - LAG(open) OVER (ORDER BY time)) AS direction,
   (LAG(open,-1) OVER (ORDER BY time) - open) * SIGN(open - LAG(open) OVER (ORDER BY time))
      AS estimate ...

Expressions in an SQL statement cannot use AS aliases defined in the same statement. That is why we cannot define estimate as delta * direction and have to repeat the calculation of the product explicitly. Recall, however, that the delta and direction columns are not needed for programmatic analysis and are added here only to visualize the table for the user.

At the end of the SQL command, we specify the table from which the selection is made, and the filtering conditions for the backtest date range: two parameters "from" and "to".

...
FROM MqlRatesDB
WHERE (time >= ?1 AND time < ?2)

Optionally, we can add a LIMIT ?3 clause (and pass some small value, for example, 10) so that the initial visual verification of the query results does not force you to look through tens of thousands of records.

You can check the operation of the SQL command using the DatabasePrint function; unfortunately, however, this function cannot work with prepared queries with parameters. Therefore, we have to replace the SQL parameters '?n' with format specifiers and build the query string using StringFormat, substituting the parameter values there. Alternatively, we could avoid DatabasePrint entirely and output the results to the log ourselves, row by row (through a DBRow array).

Thus, the final fragment of the query turns into:

   ...
   WHERE (time >= %ld AND time < %ld)
   ORDER BY time LIMIT %d;

It should be noted that the datetime values in this query come from MQL5 in the "machine" format, i.e., the number of seconds since the beginning of 1970. If we want to debug the same SQL query in MetaEditor, it is more convenient to write the date range condition using date literals (strings), as follows:

   WHERE (time >= STRFTIME('%s', '2015-01-01') AND time < STRFTIME('%s', '2021-01-01'))

Again, we need the STRFTIME function here (the '%s' modifier in SQL converts the specified date string to a "Unix epoch" timestamp; the fact that '%s' resembles an MQL5 format string is just a coincidence).

Save the designed SQL query in a separate text file DBQuotesIntradayLag.sql and connect it as a resource to the test script of the same name, DBQuotesIntradayLag.mq5.

#resource "DBQuotesIntradayLag.sql" as string sql1

The first parameter of the script allows you to set the prefix of the database name; the database should already exist after running DBquotesImport.mq5 on a chart with the same symbol and timeframe. The subsequent inputs set the date range and the length limit of the debug printout to the log.

input string Database = "MQL5Book/DB/Quotes";
input datetime SubsetStart = D'2022.01.01';
input datetime SubsetStop = D'2023.01.01';
input int Limit = 10;

The table with quotes is known in advance, from the previous script.

const string Table = "MqlRatesDB";

In the OnStart function, we open the database and make sure that the quotes table is available.

void OnStart()
{
   Print("");
   DBSQLite db(Database + _Symbol + PeriodToString());
   if(!PRTF(db.isOpen())) return;
   if(!PRTF(db.hasTable(Table))) return;
   ...

Next, we substitute the parameters into the SQL query string. Note that, in addition to replacing the SQL parameters '?n' with format sequences, we first double the percent signs '%', because otherwise StringFormat would treat them as its own format commands and would not pass them through to SQL.

   string sqlrep = sql1;
   StringReplace(sqlrep, "%", "%%");
   StringReplace(sqlrep, "?1", "%ld");
   StringReplace(sqlrep, "?2", "%ld");
   StringReplace(sqlrep, "?3", "%d");
   
   const string sqlfmt = StringFormat(sqlrep, SubsetStart, SubsetStop, Limit);
   Print(sqlfmt);

All these manipulations were required only to execute the query via the DatabasePrint function. In the working version of the analytical script, we would read the query results and analyze them programmatically, bypassing the formatting and the DatabasePrint call.
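
For illustration, here is a rough sketch of such a parameterized variant using the built-in database functions directly (bypassing our ORM layer). The column indices assume the column order of the query shown above; the snippet is not part of the attached script.

   // prepare the original query, leaving the '?1'/'?2'/'?3' parameters intact
   int request = DatabasePrepare(db.getHandle(), sql1);
   if(request != INVALID_HANDLE)
   {
      if(DatabaseBind(request, 0, (long)SubsetStart) // parameter indices are 0-based
         && DatabaseBind(request, 1, (long)SubsetStop)
         && DatabaseBind(request, 2, Limit))
      {
         while(DatabaseRead(request)) // read the result set row by row
         {
            string stamp;
            double product;
            DatabaseColumnText(request, 0, stamp);     // column 0: datetime
            DatabaseColumnDouble(request, 7, product); // column 7: product
            Print(stamp, " ", DoubleToString(product, 8));
         }
      }
      DatabaseFinalize(request);
   }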

Finally, let's execute the SQL query and output the table with the results to the log.

   DatabasePrint(db.getHandle(), sqlfmt, 0);
}

Here is what we will see for 10 bars of EURUSD,H1 at the beginning of 2022.

db.isOpen()=true / ok
db.hasTable(Table)=true / ok
      SELECT
         DATETIME(time, 'unixepoch') as datetime,
         open,
         time,
         TIME(time, 'unixepoch') AS intraday,
         STRFTIME('%w', time, 'unixepoch') AS day,
         (LAG(open,-1) OVER (ORDER BY time) - open) AS delta,
         SIGN(open - LAG(open) OVER (ORDER BY time)) AS direction,
         (LAG(open,-1) OVER (ORDER BY time) - open) * (open - LAG(open) OVER (ORDER BY time))
            AS product,
         (LAG(open,-1) OVER (ORDER BY time) - open) * SIGN(open - LAG(open) OVER (ORDER BY time))
            AS estimate
      FROM MqlRatesDB
      WHERE (time >= 1640995200 AND time < 1672531200)
      ORDER BY time LIMIT 10;
 #| datetime               open       time intraday day       delta dir       product      estimate
--+------------------------------------------------------------------------------------------------
 1| 2022-01-03 00:00:00 1.13693 1641168000 00:00:00 1  0.0003200098
 2| 2022-01-03 01:00:00 1.13725 1641171600 01:00:00 1  2.999999e-05  1  9.5999478e-09  2.999999e-05
 3| 2022-01-03 02:00:00 1.13728 1641175200 02:00:00 1  -0.001060006  1 -3.1799748e-08  -0.001060006
 4| 2022-01-03 03:00:00 1.13622 1641178800 03:00:00 1 -0.0003400007 -1  3.6040028e-07  0.0003400007
 5| 2022-01-03 04:00:00 1.13588 1641182400 04:00:00 1  -0.001579991 -1  5.3719982e-07   0.001579991
 6| 2022-01-03 05:00:00  1.1343 1641186000 05:00:00 1  0.0005299919 -1 -8.3739827e-07 -0.0005299919
 7| 2022-01-03 06:00:00 1.13483 1641189600 06:00:00 1 -0.0007699937  1 -4.0809905e-07 -0.0007699937
 8| 2022-01-03 07:00:00 1.13406 1641193200 07:00:00 1 -0.0002600149 -1  2.0020098e-07  0.0002600149
 9| 2022-01-03 08:00:00  1.1338 1641196800 08:00:00 1   0.000510001 -1 -1.3260079e-07  -0.000510001
10| 2022-01-03 09:00:00 1.13431 1641200400 09:00:00 1  0.0004800036  1  2.4480023e-07  0.0004800036
...

It is easy to verify that the intraday time of the bar is extracted correctly, as well as the day of the week: 1 corresponds to Monday. You can also check the delta increments. The product and estimate values are empty in the first row because their calculation requires the previous row, which is missing.

Let's complicate our SQL query by grouping records with the same combinations of time of day (intraday) and day of the week (day) and calculating a certain target indicator that characterizes the success of trading for each of these combinations. As such an indicator, let's take the average of the product values divided by their standard deviation. The larger the average product of price increments of neighboring bars, the greater the expected profit, and the smaller the spread of these products, the more stable the forecast. In the SQL query, this indicator is named objective.

In addition to the target indicator, we will also calculate the profit estimate (backtest_profit) and the profit factor (backtest_PF). We will estimate profit as the sum of price increments (estimate) over all bars for a given intraday time and day of the week (the open-to-open increment of a bar is an analog of the future profit in points per bar). The profit factor is, as usual, the ratio of the sum of positive increments to the sum of negative ones.

   SELECT
      AVG(product) / STDDEV(product) AS objective,
      SUM(estimate) AS backtest_profit,
      SUM(CASE WHEN estimate >= 0 THEN estimate ELSE 0 END) /
         SUM(CASE WHEN estimate < 0 THEN -estimate ELSE 0 END) AS backtest_PF,
      intraday, day
   FROM
   (
      SELECT
         time,
         TIME(time, 'unixepoch') AS intraday,
         STRFTIME('%w', time, 'unixepoch') AS day,
         (LAG(open,-1) OVER (ORDER BY time) - open) AS delta,
         SIGN(open - LAG(open) OVER (ORDER BY time)) AS direction,
         (LAG(open,-1) OVER (ORDER BY time) - open) * (open - LAG(open) OVER (ORDER BY time))
            AS product,
         (LAG(open,-1) OVER (ORDER BY time) - open) * SIGN(open - LAG(open) OVER (ORDER BY time))
            AS estimate
      FROM MqlRatesDB
      WHERE (time >= STRFTIME('%s', '2015-01-01') AND time < STRFTIME('%s', '2021-01-01'))
   )
   GROUP BY intraday, day
   ORDER BY objective DESC

The earlier SQL query has become a nested one, from which the outer SQL query now aggregates the data. Grouping over all combinations of time and day of the week is provided by the GROUP BY intraday, day clause. In addition, we have added sorting by the target indicator (ORDER BY objective DESC) so that the best options appear at the top of the table.

In the nested query, we removed the LIMIT parameter because the number of groups is acceptable, being much smaller than the number of analyzed bars: for H1 we get 120 combinations (24 * 5).

The extended query is placed in the text file DBQuotesIntradayLagGroup.sql, which in turn is connected as a resource to the test script of the same name, DBQuotesIntradayLagGroup.mq5. Its source code differs little from the previous one, so we will immediately show the result of its run for the default date range: from the beginning of 2015 to the beginning of 2021 (excluding 2021 and 2022).

db.isOpen()=true / ok
db.hasTable(Table)=true / ok
   SELECT
      AVG(product) / STDDEV(product) AS objective,
      SUM(estimate) AS backtest_profit,
      SUM(CASE WHEN estimate >= 0 THEN estimate ELSE 0 END) /
         SUM(CASE WHEN estimate < 0 THEN -estimate ELSE 0 END) AS backtest_PF,
      intraday, day
   FROM
   (
      SELECT
         ...
      FROM MqlRatesDB
      WHERE (time >= 1420070400 AND time < 1609459200)
   )
   GROUP BY intraday, day
   ORDER BY objective DESC
  #|             objective       backtest_profit       backtest_PF intraday day
---+---------------------------------------------------------------------------
  1|      0.16713214428916     0.073200000000001  1.46040631486258 16:00:00 5
  2|     0.118128291843983    0.0433099999999995  1.33678071539657 20:00:00 3
  3|     0.103701251751617   0.00929999999999853  1.14148790506616 05:00:00 2
  4|     0.102930330078208    0.0164399999999973   1.1932071923845 08:00:00 4
  5|     0.089531492651001    0.0064300000000006  1.10167615433271 07:00:00 2
  6|    0.0827628326995007 -8.99999999970369e-05 0.999601152226913 17:00:00 4
  7|    0.0823433025146974    0.0159700000000012  1.21665988332657 21:00:00 1
  8|    0.0767938336191962   0.00522999999999874  1.04226945769012 13:00:00 1
  9|    0.0657741522256548    0.0162299999999986  1.09699976093712 15:00:00 2
 10|    0.0635243373432768   0.00932000000000044  1.08294766820933 22:00:00 3
...
110|   -0.0814131025461459   -0.0189100000000015 0.820605255668329 21:00:00 5
111|   -0.0899571263478305   -0.0321900000000028 0.721250432975386 22:00:00 4
112|   -0.0909772560603298   -0.0226100000000016 0.851161872161138 19:00:00 4
113|   -0.0961794181717023  -0.00846999999999931 0.936377976414036 12:00:00 5
114|    -0.108868074018582   -0.0246099999999998 0.634920634920637 00:00:00 5
115|    -0.109368419185336   -0.0250700000000013 0.744496534855268 08:00:00 2
116|    -0.121893581607986   -0.0234599999999998 0.610945273631843 00:00:00 3
117|    -0.135416609546408   -0.0898899999999971 0.343437294573087 00:00:00 1
118|    -0.142128458003631   -0.0255200000000018 0.681835182645536 06:00:00 4
119|    -0.142196924506816   -0.0205700000000004 0.629769618430515 00:00:00 2
120|     -0.15200009633513   -0.0301499999999988 0.708864426419475 02:00:00 1

Thus, the analysis tells us that the 16:00 H1 bar on Friday is the best candidate to continue the trend set by the previous bar. Next in preference is the 20:00 bar on Wednesday. And so on.

However, it is desirable to check the found settings on the forward period.

To do this, we can execute the current SQL query not only on the "past" date range (in our test, up to 2021) but also on the "future" one (from the beginning of 2021). The results of both queries should be joined (JOIN) by our groups (intraday, day). Then, while maintaining the sorting by the target indicator, we will see, in adjacent columns, the profit and profit factor for the same combinations of time and day of the week, and how much they have deteriorated.

Here's the final SQL query (abbreviated):

SELECT * FROM
(
   SELECT
      AVG(product) / STDDEV(product) AS objective,
      SUM(estimate) AS backtest_profit,
      SUM(CASE WHEN estimate >= 0 THEN estimate ELSE 0 END) /
         SUM(CASE WHEN estimate < 0 THEN -estimate ELSE 0 END) AS backtest_PF,
      intraday, day
   FROM
   (
      SELECT ...
      FROM MqlRatesDB
      WHERE (time >= STRFTIME('%s', '2015-01-01') AND time < STRFTIME('%s', '2021-01-01'))
   )
   GROUP BY intraday, day
) backtest
JOIN
(
   SELECT
      SUM(estimate) AS forward_profit,
      SUM(CASE WHEN estimate >= 0 THEN estimate ELSE 0 END) /
         SUM(CASE WHEN estimate < 0 THEN -estimate ELSE 0 END) AS forward_PF,
      intraday, day
   FROM
   (
      SELECT ...
      FROM MqlRatesDB
      WHERE (time >= STRFTIME('%s', '2021-01-01'))
   )
   GROUP BY intraday, day
) forward
USING(intraday, day)
ORDER BY objective DESC

The full text of the query is provided in the file DBQuotesIntradayBackAndForward.sql. It is connected as a resource in the script DBQuotesIntradayBackAndForward.mq5.

By running the script with default settings, we get the following indicators (with abbreviations):

 #|          objective    backtest_profit    backtest_PF intraday day forward_profit     forward_PF
--+------------------------------------------------------------------------------------------------
 1|   0.16713214428916     0.073200000001  1.46040631486 16:00:00 5   0.004920000048  1.12852664576
 2|  0.118128291843983    0.0433099999995  1.33678071539 20:00:00 3   0.007880000055    1.277856135
 3|  0.103701251751617   0.00929999999853  1.14148790506 05:00:00 2   0.002210000082  1.12149532710
 4|  0.102930330078208    0.0164399999973   1.1932071923 08:00:00 4   0.001409999969  1.07253086419
 5|  0.089531492651001    0.0064300000006  1.10167615433 07:00:00 2  -0.009119999869 0.561749159058
 6| 0.0827628326995007 -8.99999999970e-05 0.999601152226 17:00:00 4   0.009070000091  1.18809622563
 7| 0.0823433025146974    0.0159700000012  1.21665988332 21:00:00 1    0.00250999999  1.12131464475
 8| 0.0767938336191962   0.00522999999874  1.04226945769 13:00:00 1  -0.008490000055 0.753913043478
 9| 0.0657741522256548    0.0162299999986  1.09699976093 15:00:00 2    0.01423999997  1.34979120609
10| 0.0635243373432768   0.00932000000044  1.08294766820 22:00:00 3   -0.00456999993 0.828967065868
...

So, the trading system with the best trading schedule found continues to show a profit in the "future" period, although not as large as on the backtest.

Of course, the considered example is only a particular case of a trading system. We could, for example, find combinations of time and day of the week where a reversal strategy works on neighboring bars, or build systems on entirely different principles (tick analysis, the calendar, a portfolio of trading signals, etc.).

The bottom line is that the SQLite engine provides many convenient tools that you would otherwise need to implement in MQL5 on your own. Admittedly, learning SQL takes time. The platform allows you to choose the optimal combination of the two technologies for efficient programming.