MQL's OOP notes: Online Analytical Processing of trading hypercubes: part 1

12 December 2016, 19:28
Stanislav Korotky
A trader's work normally involves analysing a lot of data. Part of it, probably the most significant part, is numbers: time series, economic indicators and trading reports have to be studied, classified and analysed. Trading may well have been one of the first applied fields for big data science. When numbers form a massive array with numerous properties, it can be described as a hypercube. A well-known technology named Online Analytical Processing (OLAP) is aimed at dissecting such a cube. Today we'll consider OLAP as a part of the MetaTrader infrastructure and provide a simple implementation of multidimensional analysis in MQL.

Before we start, we should decide what data to analyse. There are many options to choose from, such as trading statements, optimization reports, or custom indicator readings. It is actually not so important which one we use, because we're going to design a universal object-oriented framework suitable for any purpose. But since we need something to test the framework on, we should pick a concrete task. One of the most popular tasks is the analysis of an account's trading history, so let's take it.

For a given trading history, one may want to see profits split by symbols, days of the week, or buys and sells. Or one may want to compare the profitability of several expert advisers by month. And then it may become important to combine these two analyses in one view. All of this can be achieved with OLAP.
  

The blueprint

The object-oriented approach implies that we need to decompose the task into a set of simple, logically decoupled yet interconnected parts (classes) and think of a general plan of how they work together. The first class we need is a data record, which carries our input data. A record can hold, for example, information about a trading operation or about a single pass during optimization.

The record is basically a vector with an arbitrary number of fields. Since this is an abstract thing, we don't need to know what every field means. For every specific use case we can derive a concrete record class which knows the meaning of every field and processes it accordingly.

To read records from some abstract information source (such as account history, a CSV file, an HTML report, or even WebRequest) we need another class: a data adapter. At the abstract base level, the class should provide only one basic function: iterating through records. Again, for every specific use case we can derive a concrete data adapter capable of populating records with real data from the underlying source.

All the records should be somehow mapped into a meta cube. We don't yet know how this can be done, but this is the essence of the task: to slice and aggregate the data into a generalized view with various meaningful statistics. As an abstraction, the meta cube provides only basic properties such as the number of dimensions, their names, and the size of each dimension. Descendant classes should fill the cube with specific statistics. For example, we can imagine such aggregators as the sum or the average value of a selected field in the input records.

Before values can be aggregated, the records must be mapped, that is, each record should get a set of indices uniquely identifying a cell in the meta cube. This subtask can be delegated to a special class: a selector. The abstract base class will define an interface to return the range of possible values (for example, splitting records by day of week implies that the selector produces a day number, that is 0 - 6), and to map a record into that range. Derived classes will provide concrete implementations of these methods. You may notice that every selector corresponds to an edge (or dimension) of the meta cube, and the size of every dimension equals the range of values of the corresponding selector.

In addition, it's sometimes useful to filter out some records. This is why we should provide a filter. It's much like a selector with a restriction applied to its values (for example, if you have a day of week selector producing indices 0 - 6, you can form a filter on it to exclude a specific day from the calculations).

After the cube is built, we'll need to visualize the results. So, let's add a class for output and name it display.

Finally, there should be a core class which binds all the parts mentioned above together. Let's call it analyst.

The whole picture looks like this.

[Diagram: Online Analytical Processing in MetaTrader]



The Implementation

Let's start coding the classes described above. First goes the Record.

class Record
{
  private:
    float data[];
    
  public:
    Record(const int length)
    {
      ArrayResize(data, length);
      ArrayInitialize(data, 0);
    }
    
    void set(const int index, float value)
    {
      data[index] = value;
    }
    
    float get(const int index) const
    {
      return data[index];
    }
};

It does nothing more than store values in the data array (vector). We use the float type to save memory. Data cubes can be very large, so using float instead of double saves 50% of memory at the expense of somewhat lower accuracy.
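To get a feel for the accuracy trade-off, here is a minimal script sketch. It is not part of the framework, and the sample time and ticket number are made up for illustration. A float has a 24-bit significand, so integers above 16,777,216 cannot all be represented exactly. Values around 1.48 billion (the number of seconds in a datetime of 2016) are spaced 128 apart, which means a stored time may shift by up to about a minute.

```mql4
// Sketch: how much precision a 32-bit float keeps for large integers.
// Assumption: the sample time and ticket are arbitrary illustrative values.
void OnStart()
{
  datetime t = D'2016.12.12 19:28:00';
  float f = (float)(long)t;          // seconds since 1970 squeezed into a float
  datetime back = (datetime)(long)f; // and read back

  Print("original: ", TimeToString(t, TIME_DATE | TIME_SECONDS));
  Print("restored: ", TimeToString(back, TIME_DATE | TIME_SECONDS));
  Print("difference, sec: ", (long)back - (long)t);

  int ticket = 123456789;            // hypothetical large ticket number
  float ft = (float)ticket;
  Print("ticket: ", ticket, ", restored: ", (int)ft);
}
```

For aggregated statistics such as profit by day of week this loss is usually tolerable, but fields that must stay exact (tickets, for instance) are better treated as approximate labels when stored this way.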

We'll acquire records from various places by means of DataAdapter.

class DataAdapter
{
  public:
    virtual Record *getNext() = 0;
    virtual int reservedSize() = 0;
};

The method getNext should be called in a loop until it returns NULL (no more records). Until then all the records should be saved somewhere (see details below). The method reservedSize is provided for optimizing memory allocation.
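The calling convention can be tried out without any trading history. The sketch below repeats minimal copies of Record and DataAdapter to stay self-contained, and adds a hypothetical ArrayDataAdapter (not part of the article's sources) that serves records from a fixed array of made-up profits.

```mql4
// Minimal copies of the article's classes, repeated to keep the sketch self-contained
class Record
{
  private:
    float data[];
  public:
    Record(const int length) { ArrayResize(data, length); ArrayInitialize(data, 0); }
    void set(const int index, float value) { data[index] = value; }
    float get(const int index) const { return data[index]; }
};

class DataAdapter
{
  public:
    virtual Record *getNext() = 0;
    virtual int reservedSize() = 0;
};

// Hypothetical adapter: serves one single-field record per array element
class ArrayDataAdapter: public DataAdapter
{
  private:
    double source[];
    int cursor;
  public:
    ArrayDataAdapter(const double &values[]): cursor(0)
    {
      ArrayCopy(source, values);
    }
    virtual int reservedSize() { return ArraySize(source); }
    virtual Record *getNext()
    {
      if(cursor >= ArraySize(source)) return NULL; // no more records
      Record *r = new Record(1);
      r.set(0, (float)source[cursor++]);
      return r;
    }
};

void OnStart()
{
  double profits[] = {10.5, -3.2, 7.0}; // made-up values
  ArrayDataAdapter adapter(profits);

  Record *records[];
  ArrayResize(records, adapter.reservedSize()); // allocate once, up front

  int n = 0;
  Record *r;
  while((r = adapter.getNext()) != NULL) // iterate until NULL
  {
    records[n++] = r;
  }
  ArrayResize(records, n);

  for(int i = 0; i < n; i++)
  {
    Print(i, ": ", records[i].get(0));
    delete records[i]; // the caller owns the records
  }
}
```

Note that whoever collects the records is also responsible for deleting them, since getNext allocates each one with new.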

Every dimension of the cube is calculated based on one or more fields of the records. It's convenient to denote every such field as an element in a special enumeration. For example, if we're going to analyze trading history of an account, we can stick to the following enumeration.

// MT4 and MT5 hedge
enum TRADE_RECORD_FIELDS
{
  FIELD_NONE,          // none
  FIELD_NUMBER,        // serial number
  FIELD_TICKET,        // ticket
  FIELD_SYMBOL,        // symbol
  FIELD_TYPE,          // type (OP_BUY/OP_SELL)
  FIELD_DATETIME1,     // open datetime
  FIELD_DATETIME2,     // close datetime
  FIELD_DURATION,      // duration
  FIELD_MAGIC,         // magic number
  FIELD_LOT,           // lot
  FIELD_PROFIT_AMOUNT, // profit amount
  FIELD_PROFIT_PERCENT,// profit percent
  FIELD_PROFIT_POINT,  // profit points
  FIELD_COMMISSION,    // commission
  FIELD_SWAP           // swap
};

And if we'd like to analyze MetaTrader's optimization results, we could use the following enumeration.

enum OPTIMIZATION_REPORT_FIELDS
{
  OPTIMIZATION_PASS,
  OPTIMIZATION_PROFIT,
  OPTIMIZATION_TRADE_COUNT,
  OPTIMIZATION_PROFIT_FACTOR,
  OPTIMIZATION_EXPECTED_PAYOFF,
  OPTIMIZATION_DRAWDOWN_AMOUNT,
  OPTIMIZATION_DRAWDOWN_PERCENT,
  OPTIMIZATION_PARAMETER_1,
  OPTIMIZATION_PARAMETER_2,
  //...
};

For every specific use case we should elaborate a specific enumeration. Any such enumeration can then be used as the template parameter of the templatized class Selector.

template<typename E>
class Selector
{
  protected:
    E selector;
    string _typename;
    
  public:
    Selector(const E field): selector(field)
    {
      _typename = typename(this);
    }
    
    // returns index of cell to store values from the record
    virtual bool select(const Record *r, int &index) const = 0;
    
    virtual int getRange() const = 0;
    virtual float getMin() const = 0;
    virtual float getMax() const = 0;
    
    virtual E getField() const
    {
      return selector;
    }
    
    virtual string getLabel(const int index) const = 0;
    
    virtual string getTitle() const
    {
      return _typename + "(" + EnumToString(selector) + ")";
    }
};

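The field selector holds a specific value - an element of the enumeration. For example, if TRADE_RECORD_FIELDS is used, a selector for buy/sell trade operations is bound to the FIELD_TYPE element. Conceptually it is created like so (in practice a derived class is instantiated, because the base Selector declares pure virtual methods):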

new Selector<TRADE_RECORD_FIELDS>(FIELD_TYPE);

The field _typename is auxiliary. It will be reassigned in derived classes to identify selectors by class name, which can be useful in logs or resulting graphs. The field is used in the virtual method getTitle.

The method select is the main point of the class. This is where the incoming Record is mapped to a specific index on the axis formed by the current selector. The index should lie between the getMin and getMax values, and the overall number of indices equals the value returned by getRange. If a given record cannot be mapped inside the range, select returns false. Otherwise, if the mapping is successful, it returns true.

The method getLabel returns a user-friendly description of the given index. For example, for buy/sell operations, index 0 should produce "buy" and index 1 - "sell".

Since we're going to concentrate on trading history analysis, let's introduce an intermediate class of selectors applied to the TRADE_RECORD_FIELDS enumeration.

class TradeSelector: public Selector<TRADE_RECORD_FIELDS>
{
  public:
    TradeSelector(const TRADE_RECORD_FIELDS field): Selector(field)
    {
      _typename = typename(this);
    }

    virtual bool select(const Record *r, int &index) const
    {
      index = 0;
      return true;
    }
    
    virtual int getRange() const
    {
      return 1; // this is a scalar by default, returns 1 value
    }
    
    virtual float getMin() const
    {
      return 0;
    }
    
    virtual float getMax() const
    {
      return (float)(getRange() - 1);
    }
    
    virtual string getLabel(const int index) const
    {
      return "scalar" + (string)index;
    }
};

By default, it maps all records into a single cell in the cube. For example, you can get total profit using this selector.

Now let's specialize this selector a bit more for the case of splitting operation by type (buy/sell).

class TypeSelector: public TradeSelector
{
  public:
    TypeSelector(): TradeSelector(FIELD_TYPE)
    {
      _typename = typename(this);
    }

    virtual bool select(const Record *r, int &index) const
    {
      ...
    }
    
    virtual int getRange() const
    {
      return 2; // OP_BUY, OP_SELL
    }
    
    virtual float getMin() const
    {
      return OP_BUY;
    }
    
    virtual float getMax() const
    {
      return OP_SELL;
    }
    
    virtual string getLabel(const int index) const
    {
      static string types[2] = {"buy", "sell"};
      return types[index];
    }
};

We define the class using FIELD_TYPE element in the base constructor. getRange returns 2 because we have only 2 possible values for the type: OP_BUY or OP_SELL. getMin and getMax return appropriate values. What should we write inside select method?

To answer the question we should decide which information is stored in the record. Let's code a class derived from the Record and intended for use with trading history.

class TradeRecord: public Record
{
  private:
    static int counter;

  protected:
    void fillByOrder()
    {
      set(FIELD_NUMBER, counter++);
      set(FIELD_TICKET, OrderTicket());
      set(FIELD_TYPE, OrderType());
      set(FIELD_DATETIME1, OrderOpenTime());
      set(FIELD_DATETIME2, OrderCloseTime());
      set(FIELD_DURATION, OrderCloseTime() - OrderOpenTime());
      set(FIELD_MAGIC, OrderMagicNumber());
      set(FIELD_LOT, (float)OrderLots());
      set(FIELD_PROFIT_AMOUNT, (float)OrderProfit());
      set(FIELD_PROFIT_POINT, (float)((OrderType() == OP_BUY ? +1 : -1) * (OrderClosePrice() - OrderOpenPrice()) / MarketInfo(OrderSymbol(), MODE_POINT)));
      set(FIELD_COMMISSION, (float)OrderCommission());
      set(FIELD_SWAP, (float)OrderSwap());
    }
    
  public:
    TradeRecord(): Record(TRADE_RECORD_FIELDS_NUMBER)
    {
      fillByOrder();
    }
};

int TradeRecord::counter = 0; // a static member must be defined at the global level

The helper method fillByOrder demonstrates how most of the fields can be filled from the current order. Of course, the order should be selected beforehand somewhere in the code. The number of fields, TRADE_RECORD_FIELDS_NUMBER, can be either hardcoded in a macro definition or determined dynamically from the TRADE_RECORD_FIELDS enumeration (you may find details in the source code attached at the end of the story, which continues in part 2).

As you can see, the field FIELD_TYPE is filled with the operation code from OrderType. Now we can get back to TypeSelector's select method.

    virtual bool select(const Record *r, int &index) const
    {
      int t = (int)r.get(selector);
      index = t;
      return index >= getMin() && index <= getMax();
    }

Here, we read the field from incoming record and assign its value (which can be OP_BUY or OP_SELL) as the index. Only market orders are counted, so select returns false for any other types. We'll consider some other selectors later.
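The same mapping idea applies to the day of week example mentioned earlier. The following is only a sketch of the mapping logic, written as free functions rather than as a Selector descendant. The names selectWeekDay and weekDayLabel are hypothetical, and the sample date is arbitrary. It relies on the standard TimeToStruct function, where day_of_week is 0 for Sunday.

```mql4
// Hypothetical helper: maps a time to a day of week index 0..6 (0 = Sunday),
// mirroring what a select method could do for a datetime field
bool selectWeekDay(const datetime t, int &index)
{
  MqlDateTime dt;
  if(!TimeToStruct(t, dt)) return false; // conversion failed, reject the record
  index = dt.day_of_week;
  return index >= 0 && index <= 6;
}

// Hypothetical counterpart of getLabel
string weekDayLabel(const int index)
{
  static string days[7] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
  return days[index];
}

void OnStart()
{
  int index;
  if(selectWeekDay(D'2016.12.12 19:28:00', index))
  {
    Print(index, " ", weekDayLabel(index));
  }
}
```

In a real selector the time would come from the record (FIELD_DATETIME1 or FIELD_DATETIME2), the range would be 7, and the minimum and maximum would be 0 and 6.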

It's time to implement a data adapter specific for trading history. This is the class where TradeRecords will be generated based on a real account history. 

class HistoryDataAdapter: public DataAdapter
{
  private:
    int size;
    int cursor;
    
  protected:
    void reset()
    {
      cursor = 0;
      size = OrdersHistoryTotal();
    }
    
  public:
    HistoryDataAdapter()
    {
      reset();
    }
    
    virtual int reservedSize()
    {
      return size;
    }
    
    virtual Record *getNext()
    {
      if(cursor < size)
      {
        while(OrderSelect(cursor++, SELECT_BY_POS, MODE_HISTORY))
        {
          if(OrderType() < 2)
          {
            if(MarketInfo(OrderSymbol(), MODE_POINT) == 0)
            {
              Print("MarketInfo is missing:");
              OrderPrint();
              continue;
            }

            return new TradeRecord();
          }
        }
        return NULL;
      }
      return NULL;
    }
};

The adapter just iterates through all orders in the history and creates a TradeRecord for every market order. There must be a core class which obtains the adapter and invokes its getNext method. Moreover, the core class should store the returned records in an internal array. This brings us to the class Analyst.

template<typename E>
class Analyst
{
  private:
    DataAdapter *adapter;
    Record *data[];
    int size;
    
  public:
    Analyst(DataAdapter &a): adapter(&a)
    {
      ArrayResize(data, adapter.reservedSize());
    }
    
    ~Analyst()
    {
      int n = ArraySize(data);
      for(int i = 0; i < n; i++)
      {
        delete data[i];
      }
    }
    
    void acquireData()
    {
      Record *record;
      int i = 0;
      while((record = adapter.getNext()) != NULL)
      {
        data[i++] = record;
      }
      ArrayResize(data, i);
      size = i;
    }
};

The class does not actually instantiate the adapter but accepts it as a constructor parameter. This is the well-known principle of dependency injection. It allows us to decouple Analyst from any concrete implementation of DataAdapter. In other words, we can freely swap the history adapter for a different adapter class without any modification of Analyst.

Analyst can now populate the internal array with records, but it lacks the most important part: the aggregation itself.

As you remember, aggregators are classes capable of calculating specific statistics for selected record fields. The base class for aggregators is a meta cube, a multidimensional array storing statistics.

class MetaCube
{
  protected:
    int dimensions[];
    int offsets[];
    double totals[];
    string _typename;
    
  public:
    int getDimension() const
    {
      return ArraySize(dimensions);
    }
    
    int getDimensionRange(const int n) const
    {
      return dimensions[n];
    }
    
    int getCubeSize() const
    {
      return ArraySize(totals);
    }
    
    virtual double getValue(const int &indices[]) const = 0;
};

The array dimensions describes the structure of the MetaCube. Its size should be equal to the number of selectors used, and every element should contain the size of the corresponding dimension, which comes from the selector's range. For example, if we want to see profits by day of week, we should craft a selector returning day indices in the range 0 - 6 according to the order open (or close) time. Since this is the only selector we use, dimensions will have 1 element, and its value will be 7. If we want to add another selector - say TypeSelector, described above - in order to see profits split both by day of week and by operation type, then dimensions will contain 2 elements: 7 and 2. This also means that there will be 14 cells with statistics.

The array totals is the actual storage of the statistics. You may wonder why it is one-dimensional while the meta cube has just been called multidimensional. This is because we do not know beforehand how many dimensions the user will decide to add, so we use a plain array to store elements belonging to many dimensions. This is just a matter of proper indexing, that is, we need to know where every subarray begins. This is done by means of the array offsets.
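The indexing can be illustrated on the 7 x 2 example. The script below is a standalone sketch that does not use the framework classes. It assumes the usual scheme in which the offset of each dimension is the product of the sizes of all preceding dimensions, so that the offsets are 1 and 7, and the cell (3, 1) lands at position 3 * 1 + 1 * 7 = 10 of the 14 cells.

```mql4
// Sketch: flattening multidimensional indices into a position in a plain array
void OnStart()
{
  int dimensions[] = {7, 2}; // day of week (0..6) x operation type (0..1)
  int count = ArraySize(dimensions);

  // offset of a dimension = product of the sizes of all preceding dimensions
  int offsets[];
  ArrayResize(offsets, count);
  offsets[0] = 1;
  for(int i = 1; i < count; i++)
  {
    offsets[i] = offsets[i - 1] * dimensions[i - 1];
  }

  int k[] = {3, 1}; // sample cell: day index 3, type index 1
  int position = 0;
  for(int j = 0; j < count; j++)
  {
    position += k[j] * offsets[j];
  }

  Print("offsets: ", offsets[0], ", ", offsets[1]);
  Print("flat position: ", position);
}
```

Every combination of valid indices maps to a distinct position between 0 and 13, so no two cells collide in the flat array.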

The base class does not resize or initialize the member arrays, because all this stuff will be implemented in derived classes.

Since all aggregators will have many common features, let's pack them into a single base class derived from MetaCube.

template<typename E>
class Aggregator: public MetaCube
{
  protected:
    const E field;

The Aggregator should process a specific field of the records. For example, this can be profit (FIELD_PROFIT_AMOUNT). We'll initialize the variable field in the constructor below.

    const int selectorCount;
    const Selector<E> *selectors[];

The calculations should be performed in a multidimensional space formed by an arbitrary number of selectors. We have considered the example of counting profits split by day of week and buy/sell operation, which requires 2 selectors. They should be stored in the array selectors. The selectors themselves are passed into the object, again via the constructor (see below).

  public:
    Aggregator(const E f, const Selector<E> *&s[]): field(f), selectorCount(ArraySize(s))
    {
      ArrayResize(selectors, selectorCount);
      for(int i = 0; i < selectorCount; i++)
      {
        selectors[i] = s[i];
      }
      _typename = typename(this);
    }

As you remember, we have the one-dimensional array totals as the storage of values in the multidimensional space of selectors. To convert indices of multiple dimensions into a single position in the one-dimensional array, we use the following helper function.

    int mixIndex(const int &k[]) const
    {
      int result = 0;
      for(int i = 0; i < selectorCount; i++)
      {
        result += k[i] * offsets[i];
      }
      return result;
    }

It accepts an array of indices and returns the ordinal position. You'll see how the array offsets is filled a few lines below.

This is the trickiest part: the initialization of all internal arrays.

    virtual void setSelectorBounds()
    {
      ArrayResize(dimensions, selectorCount);
      int total = 1;
      for(int i = 0; i < selectorCount; i++)
      {
        dimensions[i] = selectors[i].getRange();
        total *= dimensions[i];
      }
      ArrayResize(totals, total);
      ArrayInitialize(totals, 0);
      
      ArrayResize(offsets, selectorCount);
      offsets[0] = 1;
      for(int i = 1; i < selectorCount; i++)
      {
        offsets[i] = dimensions[i - 1] * offsets[i - 1]; // 1, X, Y*X
      }
    }

Finally, we come to the task of calculating statistics.

    // build an array with number of dimensions equal to number of selectors
    virtual void calculate(const Record *&data[])
    {
      int k[];
      ArrayResize(k, selectorCount);
      int n = ArraySize(data);
      for(int i = 0; i < n; i++)
      {
        int j = 0;
        for(j = 0; j < selectorCount; j++)
        {
          int d;
          if(!selectors[j].select(data[i], d)) // does record satisfy selector?
          {
            break;                             // skip it, if not
          }
          k[j] = d;
        }
        if(j == selectorCount)
        {
          update(mixIndex(k), data[i].get(field));
        }
      }
    }

This method is called for an array of records. Every record is presented to each selector in the loop, and the indices returned by the selectors are saved in the local array k. As soon as any selector rejects the record, the loop breaks and the record is skipped. If all selectors have determined valid indices for the record, we invoke the method update. It accepts the offset of an element in the array totals (the offset is calculated by the helper function mixIndex shown above, based on the array of indices) and the value from the specified field of the record. If we continue the example of analyzing the distribution of profits, then the field will be FIELD_PROFIT_AMOUNT and its value comes from OrderProfit, as you remember.

    virtual void update(const int index, const float value) = 0;

The method is pure virtual and should be overridden in descendant classes.

Finally, aggregator should provide a method to access resulting statistics.

    double getValue(const int &indices[]) const
    {
      return totals[mixIndex(indices)];
    }
};

The base Aggregator class does almost all the work. Now we can implement many specific aggregators with minimal effort. But apart from this there is still a lot to do, so let's continue with all the routine tasks in part 2.
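To give an idea of how little a concrete aggregator has to add, here is a standalone sketch of the accumulation step for a sum and an average. It is an assumption about how such aggregators could work, not the code from part 2. The free function update only borrows the parameter list of Aggregator's update method, and the arrays sums and counts are hypothetical stand-ins for the cube storage, with made-up profits as input.

```mql4
// Flat storage for a cube of 2 cells (index 0 = buy, index 1 = sell)
double sums[2];
int counts[2];

// Hypothetical free function with the same parameters as Aggregator's update
void update(const int index, const float value)
{
  sums[index] += value; // all that a "sum" aggregator would need
  counts[index]++;      // extra bookkeeping required for an "average"
}

void OnStart()
{
  ArrayInitialize(sums, 0);
  ArrayInitialize(counts, 0);

  // made-up profits: two buys and one sell
  update(0, (float)10.5);
  update(0, (float)(-3.5));
  update(1, (float)7.0);

  for(int i = 0; i < 2; i++)
  {
    double average = counts[i] > 0 ? sums[i] / counts[i] : 0.0;
    Print("cell ", i, ": sum = ", sums[i], ", average = ", average);
  }
}
```

In the framework itself the index would come from mixIndex and the value from the record field, so a descendant class only needs to decide how to fold each value into its cell.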

