Feature Engineering for ML (Part 8): Entropy Features in MQL5

MetaTrader 5 — Trading systems | 30 June 2026, 14:41


Patrick Murimi Njoroge

Introduction

This article ports four entropy estimators—Shannon, Plug-In, Lempel-Ziv and Kontoyiannis—that were implemented in Python (Part 7) to MQL5 and explains the design decisions required to make them practical inside MetaTrader 5. The Python reference relied on NumPy sliding windows, Python dictionaries and Numba acceleration; none of those primitives are available in MQL5. More importantly, MQL5 access to intrabar ticks is limited to the broker‑cached window returned by CopyTicksRange(), so historical tick sequences beyond that cache are simply unavailable. The port therefore addresses three central constraints at once:

how to fetch intrabar ticks with CopyTicksRange() and encode them as tick-rule directions, marking bars with insufficient ticks via the sentinel ENT_EMPTY;
how to count overlapping w‑grams without NumPy (a base‑3 integer hash into a fixed counter array replaces sliding_window_view and unique);
how to bound the Kontoyiannis match search (a configurable look‑back window caps the O(n²) inner loop).

The goal is practical: reproduce the Python reference numerically, expose safe EA/indicator integration points, and make the feature set usable in live trading despite the tick-cache limitation.

Getting Intrabar Tick Data in MQL5

The Python entropy pipeline from Part 7 receives tick-rule sequences as pre-computed NumPy arrays — the data collection step is handled elsewhere in the pipeline before the entropy functions are called. In MQL5, the class is responsible for collecting its own intrabar tick data. The only practical mechanism is CopyTicksRange():

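int CopyTicksRange(
   const string symbol_name,              // symbol name
   MqlTick      &ticks_array[],           // receiving array
   uint         flags    = COPY_TICKS_ALL, // type of ticks requested
   ulong        from_msc = 0,             // start time, ms since 1970.01.01
   ulong        to_msc   = 0              // end time, ms since 1970.01.01
   );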

from_msc and to_msc are timestamps in milliseconds. For bar i on the current symbol and period, the range is:

from_msc = (long)iTime(symbol, period, i)     * 1000;
to_msc   = (long)iTime(symbol, period, i - 1) * 1000 - 1;

This captures ticks strictly inside bar i: from its open until 1 ms before bar i−1 opens. The COPY_TICKS_ALL flag is used so that all available tick records are returned regardless of type.
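The boundary arithmetic can be checked outside the terminal. The sketch below is plain C++ rather than MQL5, and BarRangeMsc is a hypothetical helper, not part of CEntropyFeatures. It assumes that both open times are whole seconds, which is how iTime() returns them.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of CEntropyFeatures) mirroring the
// from_msc/to_msc arithmetic above. bar_open and next_open are bar open
// times in seconds; the result is the inclusive millisecond range
// [bar_open * 1000, next_open * 1000 - 1].
std::pair<std::int64_t, std::int64_t> BarRangeMsc(std::int64_t bar_open,
                                                  std::int64_t next_open)
{
    const std::int64_t from_msc = bar_open * 1000;       // first ms of the bar
    const std::int64_t to_msc   = next_open * 1000 - 1;  // 1 ms before the next bar opens
    return {from_msc, to_msc};
}
```

Take two consecutive H1 bars opening at 1,700,000,000 s and 1,700,003,600 s. The range runs from 1,700,000,000,000 to 1,700,003,599,999 ms inclusive, which is exactly 3,600,000 ms.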

The tick history constraint

CopyTicksRange() draws from the terminal's local tick cache, which MetaTrader 5 fills on demand when the chart is open. The cache depth is broker-controlled. It is typically a few days on most retail platforms and occasionally extends to several weeks on platforms configured for historical tick export. Bars older than the cache boundary return an empty array (return value 0 or −1), and the class records ENT_EMPTY for those bars. This constraint does not normally affect live use, because tick data for the most recent bars is present once the terminal has synchronized. It does, however, limit historical coverage when the indicator is computed on a freshly opened chart.

This constraint differs materially from the tick-volume issue documented in Part 6. There, the real traded volume field (MqlTick.volume) was zero on retail forex feeds, but tick count was always available as a proxy and historical coverage was unlimited. Here, the tick sequence itself is unavailable beyond the cache boundary, with no proxy that preserves the directional information the entropy estimators require.

A bar must also supply a minimum number of tick-direction observations for the estimates to be meaningful. The constructor parameter min_ticks (default 8) enforces this floor. Bars whose encoded sequence contains fewer than min_ticks directions return ENT_EMPTY rather than estimates computed on two or three symbols.

Encoding: From MqlTick to uchar Sequence

Part 7 encoded a Python tick-rule array as bytes using the mapping −1 → 98 ('b'), 0 → 99 ('c'), +1 → 97 ('a'). The MQL5 port uses a numerically simpler mapping over uchar (0–255):

Bid change	Direction	uchar value
Δbid < 0	sell	0
Δbid = 0	flat	1
Δbid > 0	buy	2
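Part 7 and the MQL5 port use different symbol values, so comparing encoded sequences byte for byte requires a translation step. Below is a minimal C++ sketch that maps the uchar codes back to Part 7's characters (ToPart7Bytes is a hypothetical name).

```cpp
#include <string>
#include <vector>

// Converts MQL5-style direction codes {0 = sell, 1 = flat, 2 = buy}
// into the byte alphabet used in Part 7 ('b' = -1, 'c' = 0, 'a' = +1),
// e.g. to diff encoded sequences against the Python reference.
std::string ToPart7Bytes(const std::vector<unsigned char>& dirs)
{
    std::string out;
    out.reserve(dirs.size());
    for (unsigned char d : dirs)
    {
        switch (d)
        {
            case 0:  out.push_back('b'); break;  // sell: -1 -> 98
            case 1:  out.push_back('c'); break;  // flat:  0 -> 99
            case 2:  out.push_back('a'); break;  // buy:  +1 -> 97
            default: out.push_back('?'); break;  // not a valid direction code
        }
    }
    return out;
}
```

Relabelling the alphabet does not change any of the four estimates. Each one depends only on which symbols are equal: symbol counts, w-gram identities, and phrase and match comparisons. The conversion therefore matters only when diffing the encoded sequences themselves.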

The last price field (MqlTick.last) is zero on virtually all retail forex feeds. The bid price is used as the reference price for the tick rule throughout. On equity or futures instruments where last is populated, the same class can be adapted by replacing the d_bid difference with ticks[i].last − ticks[i−1].last.
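The tick rule itself does not depend on the MetaTrader API. The following C++ sketch (EncodeTickRule is a hypothetical name) applies it to a plain price vector. That vector could hold bid values or, on instruments where the field is populated, last prices. The sketch assumes that a non-positive price marks an undefined field and skips such records, so each direction compares two consecutive valid prices.

```cpp
#include <vector>

// Applies the tick rule to a price series: 0 = price fell, 1 = unchanged,
// 2 = price rose. Assumption: a non-positive price is an undefined field,
// so that record is skipped and the next direction is measured against the
// last valid price.
std::vector<unsigned char> EncodeTickRule(const std::vector<double>& prices)
{
    std::vector<unsigned char> seq;
    if (prices.size() > 1) seq.reserve(prices.size() - 1);
    double prev = 0.0;                  // last valid price seen so far
    for (double p : prices)
    {
        if (p <= 0.0) continue;         // undefined price: skip record
        if (prev > 0.0)
        {
            if (p > prev)      seq.push_back(2);   // buy
            else if (p < prev) seq.push_back(0);   // sell
            else               seq.push_back(1);   // flat
        }
        prev = p;
    }
    return seq;
}
```

For the prices 1.10, 1.11, 1.11, 0.0, 1.105 the result is {2, 1, 0}. That is a rise, an unchanged price, and then a fall once the zero record has been skipped.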

The complete encoding logic, including the millisecond time range construction, is encapsulated in _GetTickSequence():

//+------------------------------------------------------------------+
//| _GetTickSequence: fetch ticks for one bar and apply tick rule    |
//+------------------------------------------------------------------+
int CEntropyFeatures::_GetTickSequence(
   datetime bar_open, datetime bar_close, uchar &seq[])
  {
   MqlTick ticks[];
   long from_msc = (long)bar_open  * 1000;       // bar open, in ms
   long to_msc   = (long)bar_close * 1000 - 1;   // 1 ms before the next bar opens

   int n = CopyTicksRange(m_symbol, ticks, COPY_TICKS_ALL, from_msc, to_msc);
   if(n <= 1)
      return(0);                                 // no usable tick history

   if(ArrayResize(seq, n - 1) < 0)
      return(0);

   int    cnt      = 0;
   double prev_bid = 0.0;                        // last valid bid seen so far
   for(int i = 0; i < n; i++)
     {
      double bid = ticks[i].bid;
      if(bid <= 0.0)
         continue;                               // undefined bid: skip record
      if(prev_bid > 0.0)
        {
         double d_bid = bid - prev_bid;
         uchar  dir;
         if(d_bid > 0.0)      dir = 2;           // buy
         else if(d_bid < 0.0) dir = 0;           // sell
         else                 dir = 1;           // flat
         seq[cnt++] = dir;
        }
      prev_bid = bid;
     }

   ArrayResize(seq, cnt);                        // trim to the valid count
   return(cnt);
  }

The function preallocates n − 1 slots and writes one direction per pair of consecutive valid ticks, since the first valid tick has no predecessor. Tick records with an undefined (zero) bid are skipped. The final ArrayResize(seq, cnt) therefore trims the array to the exact count of valid directions, which is smaller than n − 1 whenever such records occur.

Architecture: CEntropyFeatures

The class follows the same header-only, engine-centered design used in Part 6's CMicrostructureFeatures. The file CEntropyFeatures.mqh is placed in MQL5\Include\Features\ and included with angle-bracket syntax:

#include <Features\CEntropyFeatures.mqh>

The public interface is minimal: a constructor that binds symbol, period, and three algorithm parameters; a Calculate() method that iterates over bars; and four read-only accessors.

Figure 1. Architecture of CEntropyFeatures — data flow from MetaTrader 5 tick API to the four entropy accessors

MetaTrader 5 API (left): CopyTicksRange() and iTime() supply the raw tick array and bar boundary timestamps respectively.
CEntropyFeatures (centre): _GetTickSequence() applies the bid-change tick rule and encodes directions as uchar {0, 1, 2}. The amber badge marks the sentinel path: when tick count is below min_ticks, ENT_EMPTY = −1e38 is written to all four output arrays for that bar without calling the estimators. Below the sentinel, the encoded array fans out to the four private estimator methods.
Accessors (right): each method takes an integer offset, where 0 corresponds to the bar at start_bar, and guards against out-of-range access by returning ENT_EMPTY.

Constructor signature

//+------------------------------------------------------------------+
//| CEntropyFeatures constructor                                     |
//+------------------------------------------------------------------+
CEntropyFeatures::CEntropyFeatures(
   string          symbol,
   ENUM_TIMEFRAMES period,
   int             konto_window = 20,
   int             plugin_word  = 2,
   int             min_ticks    = 8
   )
  {
   m_symbol       = symbol;
   m_period       = period;
   m_konto_window = MathMax(0, konto_window);
   m_plugin_w     = MathMax(1, MathMin(ENT_MAX_W, plugin_word));
   m_min_ticks    = MathMax(4, min_ticks);
   m_n            = 0;
  }

konto_window = 0 activates the expanding-window variant of the Kontoyiannis estimator, which looks back to the start of the sequence for each evaluation point. konto_window > 0 limits the look-back to the most recent konto_window symbols, bounding the O(n²) inner loop; negative values are treated as 0. The default of 20 is a reasonable bound for per-bar messages with 24–50 ticks. plugin_word controls the w-gram word length for the Plug-In estimator and is clamped to the range 1 to ENT_MAX_W = 4. That upper limit gives 3⁴ = 81 possible grams on a 3-symbol alphabet, which is appropriate before the gram table grows sparse. min_ticks is floored at 4, the minimum required for the Kontoyiannis estimator to produce a meaningful result.

Shannon Entropy

Shannon entropy measures the marginal symbol distribution, treating each tick direction as an independent draw from the ternary alphabet {0, 1, 2}:

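$$H = -\sum_{k=0}^{2} p_k \log_2 p_k, \qquad p_k = \frac{c_k}{n}$$

where c_k is the number of times direction k occurs among the n encoded ticks.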

The MQL5 implementation counts occurrences in a three-element integer array, avoiding heap allocation:

//+------------------------------------------------------------------+
//| _Shannon: marginal symbol-frequency entropy in bits              |
//+------------------------------------------------------------------+
double CEntropyFeatures::_Shannon(const uchar &seq[], int n)
  {
   if(n < 2)
      return(ENT_EMPTY);

   int cnt[3] = {0, 0, 0};
   for(int i = 0; i < n; i++)
     {
      if(seq[i] < 3) cnt[seq[i]]++;
     }

   double h = 0.0;
   for(int k = 0; k < 3; k++)
     {
      if(cnt[k] > 0)
        {
         double p = (double)cnt[k] / n;
         h -= p * MathLog(p) / MathLog(2.0);
        }
     }
   return(h);
  }

Shannon's H lies in [0, log₂(3) ≈ 1.585]. It reaches its maximum when all three directions appear with equal frequency (maximum uncertainty about the next tick direction) and falls to zero when only one direction appears. On a strongly trending bar where 80–90% of ticks are buys, Shannon's H falls below 1.0. For example, with 80% buys and the remaining ticks split evenly between sells and flats, H = −(0.8 log₂ 0.8 + 2 × 0.1 log₂ 0.1) ≈ 0.92. On a bar where the three directions are equally likely, Shannon's H approaches 1.585.
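A C++ transcription of the same counting scheme makes these bounds easy to check. ShannonBits is a hypothetical name, and NAN stands in for ENT_EMPTY. std::log2 replaces the MathLog(p) / MathLog(2.0) ratio, which is mathematically equivalent.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Marginal Shannon entropy (bits) of a ternary direction sequence, using the
// same counting scheme as _Shannon(). Returns NAN where the MQL5 version
// returns ENT_EMPTY (fewer than two symbols).
double ShannonBits(const std::vector<unsigned char>& seq)
{
    const std::size_t n = seq.size();
    if (n < 2) return NAN;
    std::array<int, 3> cnt{0, 0, 0};
    for (unsigned char s : seq)
        if (s < 3) ++cnt[s];                // count sells, flats, buys
    double h = 0.0;
    for (int c : cnt)
        if (c > 0)
        {
            const double p = static_cast<double>(c) / static_cast<double>(n);
            h -= p * std::log2(p);
        }
    return h;
}
```

Three cases illustrate the range. A bar with 8 buys, 1 sell and 1 flat tick gives H ≈ 0.922. A bar containing each direction once gives log₂(3). A bar of buys only gives 0.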

Shannon's H is a marginal estimator: it captures the frequency distribution of individual symbols but ignores sequential structure. A sequence that alternates perfectly between buy and sell has the same Shannon's H as a sequence with the same buy/sell counts in random order. The three estimators that follow specifically target this sequential structure.

Plug-In Entropy

The Plug-In estimator extends Shannon's H to w-grams: it computes Shannon entropy over the distribution of all overlapping subsequences of length w, then normalizes to bits per symbol by dividing by w. A trending bar produces a highly non-uniform w-gram distribution (the same bigram "buy→buy" dominates), pulling the normalized entropy down. A bar with no sequential structure produces a nearly uniform distribution of w-grams, yielding normalized entropy close to its maximum.

Hash-based w-gram counting

Python uses numpy.lib.stride_tricks.sliding_window_view to extract all overlapping w-grams as a 2-D array, then numpy.unique to count distinct rows. MQL5 has neither. The port uses a base-3 integer hash: given the w-gram starting at position i, its hash is

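$$h_i = \sum_{j=0}^{w-1} s_{i+j}\, 3^{j}$$

where s_{i+j} ∈ {0, 1, 2} is the encoded direction at position i + j. The first symbol of the w-gram is therefore the least-significant base-3 digit.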

Since the alphabet is {0, 1, 2}, each unique w-gram maps to a unique integer in [0, 3^w). This allows the counts to be stored in a fixed array of size 3^w (maximum 81 entries for w = 4) without any sorting or dictionary lookups.
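Here is a short C++ sketch of the hash and its inverse (GramHash and GramFromHash are hypothetical names). It uses the same digit order as _PlugIn(), with the first symbol of the w-gram as the least-significant base-3 digit. The inverse is only a debugging aid for labelling counter slots.

```cpp
#include <vector>

// Base-3 hash of the w-gram starting at position i (first symbol = least
// significant digit), as in the counting loop of _PlugIn().
int GramHash(const std::vector<unsigned char>& seq, int i, int w)
{
    int hash = 0, base = 1;
    for (int j = 0; j < w; ++j)
    {
        hash += static_cast<int>(seq[i + j]) * base;
        base *= 3;
    }
    return hash;
}

// Inverse mapping: recovers the w-gram stored in counter slot `hash`.
std::vector<unsigned char> GramFromHash(int hash, int w)
{
    std::vector<unsigned char> gram(w);
    for (int j = 0; j < w; ++j)
    {
        gram[j] = static_cast<unsigned char>(hash % 3);  // lowest digit first
        hash /= 3;
    }
    return gram;
}
```

For example, the trigram {2, 0, 1} hashes to 2 + 0·3 + 1·9 = 11. For w = 4, decoding every hash in [0, 81) recovers a gram that hashes back to the same value, which confirms that the mapping is one-to-one.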

The implementation also applies the arr[:n−1] truncation from the original AFML Python module, which processes only the first n − 1 symbols of the encoded array:

//+------------------------------------------------------------------+
//| _PlugIn: block w-gram entropy, normalized to bits/symbol         |
//+------------------------------------------------------------------+
double CEntropyFeatures::_PlugIn(const uchar &seq[], int n, int w)
  {
   int m = n - 1;            // truncate to n-1 (Python compatibility)
   if(m < w)
      return(ENT_EMPTY);

   int alpha_pow = 1;
   for(int k = 0; k < w; k++) alpha_pow *= 3;

   int counts[];
   if(ArrayResize(counts, alpha_pow) < 0)
      return(ENT_EMPTY);
   ArrayInitialize(counts, 0);

   int total = 0;
   for(int i = 0; i <= m - w; i++)
     {
      int hash = 0, base = 1;
      for(int j = 0; j < w; j++)
        {
         hash += (int)seq[i + j] * base;
         base *= 3;
        }
      if(hash >= 0 && hash < alpha_pow)
        {
         counts[hash]++;
         total++;
        }
     }

   if(total == 0)
      return(ENT_EMPTY);

   double h = 0.0;
   for(int k = 0; k < alpha_pow; k++)
     {
      if(counts[k] > 0)
        {
         double p = (double)counts[k] / total;
         h -= p * MathLog(p) / MathLog(2.0);
        }
     }
   return(h / w);
  }

For w = 1 the Plug-In estimator reduces to Shannon's H on the first n − 1 symbols, which differs from Shannon's H computed on all n symbols by at most one symbol's contribution. The default of w = 2 captures the first layer of sequential dependency: buy→buy, buy→sell, sell→buy, and so on. The arr[:n−1] truncation is required for numerical agreement with the Python reference; omitting it produces outputs that differ by a small but systematic amount on most bars.
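The effect of sequential structure is easiest to see on two sequences with identical symbol counts. The C++ sketch below follows the _PlugIn() steps, including the n − 1 truncation (PlugInBits is a hypothetical name, and NAN stands in for ENT_EMPTY). Both test sequences contain four sells and four buys, so Shannon's H on all eight symbols is exactly 1 bit for each.

```cpp
#include <cmath>
#include <vector>

// Plug-In (block) entropy in bits per symbol, following _PlugIn(): only the
// first n - 1 symbols are used, overlapping w-grams are counted through a
// base-3 hash, and the block entropy is divided by w.
double PlugInBits(const std::vector<unsigned char>& seq, int w)
{
    const int m = static_cast<int>(seq.size()) - 1;   // arr[:n-1] truncation
    if (w < 1 || m < w) return NAN;                   // ENT_EMPTY in MQL5
    int alpha_pow = 1;
    for (int k = 0; k < w; ++k) alpha_pow *= 3;       // 3^w counter slots
    std::vector<int> counts(alpha_pow, 0);
    int total = 0;
    for (int i = 0; i <= m - w; ++i)
    {
        int hash = 0, base = 1;
        for (int j = 0; j < w; ++j)
        {
            hash += static_cast<int>(seq[i + j]) * base;
            base *= 3;
        }
        if (hash >= 0 && hash < alpha_pow) { ++counts[hash]; ++total; }
    }
    if (total == 0) return NAN;
    double h = 0.0;
    for (int c : counts)
        if (c > 0)
        {
            const double p = static_cast<double>(c) / total;
            h -= p * std::log2(p);
        }
    return h / w;                                     // bits per symbol
}
```

With w = 2 the alternating sequence 0, 2, 0, 2, … contains only the bigrams 02 and 20, in equal numbers. That gives 1 bit / 2 = 0.5 bits per symbol. The blocked sequence 0, 0, 0, 0, 2, 2, 2, 2 yields about 0.730. The Plug-In estimator therefore separates two sequences that Shannon's H cannot tell apart.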

Lempel-Ziv Complexity

The Lempel-Ziv complexity c(n)/n is the number of phrases c(n) produced by the LZ76-style greedy parsing of the sequence, divided by the sequence length n. The algorithm maintains a phrase library seeded with the first symbol. Starting from position 1, it grows the candidate substring at the current position one symbol at a time until the candidate is not in the library. It then adds that candidate as a new phrase and advances past it. If the remaining suffix is already in the library, it is counted as one final phrase.

Phrase-library membership vs. arbitrary occurrence

The original AFML Python module contained a bug in the membership check: the inner loop checked whether the candidate appeared anywhere in the text before the current position, not whether it was a member of the phrase library. These two conditions are not equivalent, because a substring can occur in the text without ever having been recorded as a phrase. In the sequence CABCAB, for example, the library parse first records C, A and B as separate phrases. At position 3 the candidate CA does occur earlier in the text (positions 0–1), but only as the two separate phrases C and A, so it is not in the library. The correct version records CA as a new phrase, while the buggy version treats CA as already seen and keeps extending the candidate. Part 7 corrected this bug in Python; the MQL5 implementation applies the same correction.
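The difference between the two membership rules can be reproduced in a few lines of C++. This is an illustrative sketch, not the Part 7 code, and three points are assumptions:

- LzPhraseCount is a hypothetical name.
- The text-occurrence rule is a reconstruction of the bug as described: the candidate must occur entirely within the text before the current position.
- The library is held in a std::unordered_set instead of the parallel arrays used by _LempelZiv().

The parsing loop follows _LempelZiv(), including the extra phrase counted for an already-seen trailing suffix.

```cpp
#include <string>
#include <unordered_set>

// Phrase count of the greedy LZ parse with two membership rules:
//   use_library = true : candidate is "seen" only if it is a library phrase
//   use_library = false: candidate is "seen" if it occurs anywhere in the
//                        text before the current position (buggy rule,
//                        reconstructed here for illustration)
// Returns -1 for sequences shorter than 2 (ENT_EMPTY in MQL5).
int LzPhraseCount(const std::string& s, bool use_library)
{
    const int n = static_cast<int>(s.size());
    if (n < 2) return -1;
    std::unordered_set<std::string> library{s.substr(0, 1)};  // seed: first symbol
    int phrases = 1, pos = 1;
    while (pos < n)
    {
        bool found_new = false;
        for (int len = 1; len <= n - pos; ++len)
        {
            const std::string cand = s.substr(pos, len);
            const bool seen = use_library
                ? library.count(cand) > 0
                : s.substr(0, pos).find(cand) != std::string::npos;
            if (!seen)
            {
                library.insert(cand);   // new phrase
                ++phrases;
                pos += len;             // advance past the phrase
                found_new = true;
                break;
            }
        }
        if (!found_new) { ++phrases; break; }  // trailing, already-seen suffix
    }
    return phrases;
}
```

On CABCAB the library rule yields 5 phrases (C, A, B, CA and the trailing B), while the text-occurrence rule yields only 4. The difference arises because CA occurs in the prefix CAB without ever being a library phrase. The resulting complexities are 5/6 and 4/6. A hashed library like this one is also one way to avoid the linear library scan on long tick messages.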

The library is represented as two parallel integer arrays — lib_starts[] and lib_lengths[] — that record the start position and length of each phrase. Membership is checked by comparing the content of the candidate against each library entry of the same length:

//+------------------------------------------------------------------+
//| _LempelZiv: LZ complexity c(n)/n with phrase-library check       |
//+------------------------------------------------------------------+
double CEntropyFeatures::_LempelZiv(const uchar &seq[], int n)
  {
   if(n < 2)
      return(ENT_EMPTY);

   int lib_starts[], lib_lengths[];
   if(ArrayResize(lib_starts,  n) < 0) return(ENT_EMPTY);
   if(ArrayResize(lib_lengths, n) < 0) return(ENT_EMPTY);

   lib_starts[0]  = 0;
   lib_lengths[0] = 1;
   int lib_size = 1, phrase_cnt = 1, pos = 1;

   while(pos < n)
     {
      bool found_new = false;
      for(int length = 1; length <= n - pos; length++)
        {
         bool in_lib = false;
         for(int p = 0; p < lib_size && !in_lib; p++)
           {
            if(lib_lengths[p] != length) continue;
            bool match = true;
            int  s     = lib_starts[p];
            for(int k = 0; k < length && match; k++)
               match = (seq[pos + k] == seq[s + k]);
            if(match) in_lib = true;
           }
         if(!in_lib)
           {
            if(lib_size < n)
              {
               lib_starts[lib_size]  = pos;
               lib_lengths[lib_size] = length;
               lib_size++;
              }
            phrase_cnt++;
            pos += length;
            found_new = true;
            break;
           }
        }
      if(!found_new)
        { phrase_cnt++; break; }
     }

   return((double)phrase_cnt / n);
  }

The worst-case time complexity is O(n²) per bar, because each new phrase potentially requires checking all previous library entries. For per-bar messages of 24–50 ticks, this means elementary comparisons on the order of n² (2,500 at n = 50), which is not a practical bottleneck. If tick sequences grow substantially longer (e.g., from M1 bars with hundreds of ticks), the inner loop should be replaced with a hash-table or trie structure.

Kontoyiannis Entropy

The Kontoyiannis estimator approximates the entropy rate from match lengths. The longer the sequence starting at a given position can be matched against what came before, the more predictable the source. For each position i in the second half of the sequence, the algorithm finds Λi, the length of the longest subsequence starting at i that also starts somewhere in the look-back window [look_start, i). The match lengths enter the estimator as:

$$\hat{H} = \frac{1}{n} \sum_{i=\lfloor n/2 \rfloor}^{n-1} \frac{\log_2(i+1)}{\Lambda_i}$$

Long matches (large Λi) indicate a predictable sequence and shrink the corresponding term, which lowers the estimate. Short matches (Λi = 1) indicate a random sequence and contribute the largest possible term.

The minimum-lambda floor

When no match of any length is found in the look-back window for a given position — which can happen when the symbol at position i has not appeared in the window at all — the raw longest match length would be zero, causing a division-by-zero error. The implementation floors Λi at 1:

int lambda_i = MathMax(1, lam - 1);

This ensures that a completely novel symbol contributes a full log₂(i+1) unit to the entropy estimate, which is consistent with the symbol carrying maximum local surprise.

//+------------------------------------------------------------------+
//| _Konto: Kontoyiannis entropy rate with bounded look-back         |
//+------------------------------------------------------------------+
double CEntropyFeatures::_Konto(const uchar &seq[], int n, int window)
  {
   if(n < ENT_MIN_N)
      return(ENT_EMPTY);

   double sum  = 0.0;
   int    half = n / 2;

   for(int i = half; i < n; i++)
     {
      int look_start = (window > 0) ? MathMax(0, i - window) : 0;

      int lam = 1;
      while(true)
        {
         if(i + lam > n) break;
         bool found = false;
         for(int j = look_start; j < i && !found; j++)
           {
            if(j + lam > n) continue;
            bool match = true;
            for(int k = 0; k < lam && match; k++)
               match = (seq[i + k] == seq[j + k]);
            if(match) found = true;
           }
         if(!found) break;
         lam++;
        }

      int lambda_i = MathMax(1, lam - 1);
      sum += MathLog((double)(i + 1)) / MathLog(2.0) / lambda_i;
     }
   return(sum / n);
  }

Note that the inner search over j permits the candidate match starting at position j to overlap the evaluation point i (i.e., j < i < j + lam). This is correct: the look-back window constrains where a match may start relative to i, not where it ends. A prior bug in the original AFML Python code inserted a guard

if j + length > start:
    break

that incorrectly prevented this overlap. The guard produced systematically shorter matches and therefore overestimated entropy rates, since each term log₂(i+1)/Λi grows as Λi shrinks. The MQL5 port replicates the corrected behavior from Part 7.
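For experimentation outside the terminal, the following C++ sketch (KontoBits is a hypothetical name) follows the _Konto() loop shown above. It uses evaluation points in the second half of the sequence, a look-back of the last window symbols (0 = expanding), matches that may overlap i, and the Λi ≥ 1 floor. The value of ENT_MIN_N is not shown in the article, so the min_n default of 4 is an assumption taken from the min_ticks floor.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Kontoyiannis estimate following _Konto(). window = 0 means an expanding
// look-back. min_n = 4 is an assumed stand-in for ENT_MIN_N.
double KontoBits(const std::vector<unsigned char>& seq, int window, int min_n = 4)
{
    const int n = static_cast<int>(seq.size());
    if (n < min_n) return NAN;                         // ENT_EMPTY in MQL5
    double sum = 0.0;
    for (int i = n / 2; i < n; ++i)
    {
        const int look_start = window > 0 ? std::max(0, i - window) : 0;
        int lam = 1;
        while (i + lam <= n)                           // try a match of length lam
        {
            bool found = false;
            for (int j = look_start; j < i && !found; ++j)   // may overlap i
                found = std::equal(seq.begin() + i, seq.begin() + i + lam,
                                   seq.begin() + j);
            if (!found) break;
            ++lam;
        }
        const int lambda_i = std::max(1, lam - 1);      // minimum-lambda floor
        sum += std::log2(static_cast<double>(i + 1)) / lambda_i;
    }
    return sum / n;
}
```

The test values show two properties. First, a constant eight-symbol sequence does not score zero (≈ 0.731), because matches are truncated at the end of the sequence and the last evaluation points always have short Λi. Second, in 0, 0, 0, 0, 1, 1, 1, 1 the first evaluation point is a symbol never seen in the look-back window. There the floor turns what would be Λi = 0 into a full log₂(5) term, giving ≈ 0.948.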

EA Integration

An EA includes CEntropyFeatures.mqh and allocates the object in OnInit(). To add entropy features to the feature vector before each signal evaluation, the EA calls Calculate() on the most recently closed bar and reads the four estimates at offset 0:

#include <Features\CEntropyFeatures.mqh>

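CEntropyFeatures *g_ent = NULL;

int OnInit()
  {
   g_ent = new CEntropyFeatures(_Symbol, _Period,
                                20,   // Konto window
                                2,    // Plug-In word length
                                8);   // min ticks per bar
   if(CheckPointer(g_ent) == POINTER_INVALID)
      return(INIT_FAILED);           // object could not be created
   return(INIT_SUCCEEDED);
  }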

void OnTick()
  {
   //--- recompute on new-bar event only
   static datetime last_bar = 0;
   datetime        current  = iTime(_Symbol, _Period, 0);
   if(current == last_bar) return;
   last_bar = current;

   if(!g_ent.Calculate(1, 1))
      return;

   double h  = g_ent.ShannonH(0);
   double pi = g_ent.PlugInH(0);
   double lz = g_ent.LempelZiv(0);
   double kt = g_ent.KontoH(0);

   if(h <= ENT_EMPTY + 1.0)
      return;   // tick data unavailable for this bar

   //--- append to feature vector and run ONNX inference ...
  }

void OnDeinit(const int reason)
  {
   if(g_ent != NULL) { delete g_ent; g_ent = NULL; }
  }

Calling Calculate(1, 1) on every new bar computes one bar at a time and invokes CopyTicksRange() once. This is inexpensive for live use. If the EA needs a longer history (e.g., to compute a rolling mean of entropy over 20 bars), pass a larger n_bars argument on the first call and cache the results; subsequent calls with n_bars = 1 then only update the most recent bar.

The sentinel check — if(h <= ENT_EMPTY + 1.0) — gates the entire feature vector update. If entropy data is unavailable (bar outside the tick cache), the EA skips signal evaluation for that bar rather than operating with a stale or fabricated feature.
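The sentinel check has one subtle detail. At a magnitude of 1e38, adjacent doubles are about 1.9e22 apart, so ENT_EMPTY + 1.0 rounds back to ENT_EMPTY and the test behaves exactly like h <= ENT_EMPTY. That is harmless, because no valid estimate comes near −1e38, but a threshold well above the sentinel states the intent more directly. The C++ sketch below illustrates this (IsValidEstimate is a hypothetical helper). Checking pi, lz and kt individually would be a cautious extension, since _PlugIn() can return ENT_EMPTY on its own when n − 1 < w even though Shannon's H is valid.

```cpp
// Sentinel value as described in the text (ENT_EMPTY = -1e38).
constexpr double kEntEmpty = -1e38;

// Hypothetical helper: true when h is a real estimate rather than the
// sentinel. Any threshold far above -1e38 and far below every valid
// estimate works; half the sentinel is used here.
bool IsValidEstimate(double h)
{
    return h > kEntEmpty / 2.0;
}
```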

The Entropy Viewer

EntropyViewer.mq5 renders all four estimators as line plots in a separate indicator subwindow, with an optional rolling z-score normalization that equalizes the scale differences between, for example, LZ values in (0, 1] and Konto values that may range up to log₂(n). The indicator exposes four inputs:

Input	Default	Purpose
inp_max_bars	300	Number of most-recent bars to compute; sets how many CopyTicksRange() calls each recomputation makes
inp_konto_window	20	Kontoyiannis look-back window (0 = expanding)
inp_plugin_word	2	Plug-In word length (1–4)
inp_normalize	false	Apply rolling z-score normalization (window = 60 bars)
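The article does not spell out the normalization formula. The following C++ sketch is therefore one plausible reading of "rolling z-score (window = 60 bars)", not the EntropyViewer code. RollingZScore is a hypothetical name. Three choices are assumptions: the window includes the current bar, sentinel values are skipped, and the population standard deviation is used.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kEntEmpty = -1e38;   // sentinel, as described in the text

// Rolling z-score over the last `window` entries of a chronologically ordered
// series (index 0 = oldest), current entry included. Sentinel entries are
// skipped; the output is kEntEmpty when the current entry is a sentinel,
// fewer than two valid entries fall in the window, or the window is flat.
std::vector<double> RollingZScore(const std::vector<double>& x, int window)
{
    std::vector<double> z(x.size(), kEntEmpty);
    if (window < 2) return z;
    const std::size_t w = static_cast<std::size_t>(window);
    for (std::size_t t = 0; t < x.size(); ++t)
    {
        if (x[t] <= kEntEmpty) continue;               // no estimate for this bar
        const std::size_t lo = (t + 1 >= w) ? t + 1 - w : 0;
        double sum = 0.0;
        int cnt = 0;
        for (std::size_t k = lo; k <= t; ++k)
            if (x[k] > kEntEmpty) { sum += x[k]; ++cnt; }
        if (cnt < 2) continue;
        const double mean = sum / cnt;
        double ss = 0.0;                               // two-pass variance
        for (std::size_t k = lo; k <= t; ++k)
            if (x[k] > kEntEmpty) ss += (x[k] - mean) * (x[k] - mean);
        const double sd = std::sqrt(ss / cnt);         // population std. dev.
        if (sd <= 0.0) continue;                       // flat window: undefined z
        z[t] = (x[t] - mean) / sd;
    }
    return z;
}
```

Dividing by the rolling standard deviation is what puts LZ values in (0, 1] and Kontoyiannis values of a few bits on a common scale. Windows that contain fewer than two valid bars are left as sentinels rather than filled with a fabricated value.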

On each OnCalculate() call, the indicator runs CEntropyFeatures::Calculate(1, inp_max_bars) and recomputes the full window. Because Calculate() starts at bar 1, the forming bar is never included, and estimates for closed bars do not change once their ticks are available. The full recomputation is nonetheless useful because the terminal loads tick history on demand: bars that returned ENT_EMPTY on an earlier call can be filled in on a later one. The inp_max_bars default of 300 keeps the total CopyTicksRange() overhead manageable. Reduce it on longer timeframes, where bars contain hundreds of ticks and the inner LZ loop becomes the bottleneck.

Buffer mapping

CEntropyFeatures stores results with index 0 at the most recently closed bar (time-series order). Unless ArraySetAsSeries() is applied, MetaTrader 5 indicator buffers store results with index 0 at the oldest bar in the chart history (chronological order). The mapping from class index k to buffer index is:

buf_idx = rates_total - 2 - k;

Index rates_total − 1 is the current forming bar (shift 0 in time-series numbering), which is excluded because Calculate() starts at bar 1. Index rates_total − 2 corresponds to the most recently closed bar (class index 0). Subsequent class indices map to progressively older bars.
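This mapping and its inverse are easy to get off by one. A pair of small C++ helpers with explicit range checks can help when debugging the viewer (the names are hypothetical, not part of the attached files).

```cpp
// Class index k (0 = most recently closed bar, i.e. shift 1) to chronological
// buffer index. Returns -1 when k falls outside the chart history.
int ClassToBufferIndex(int k, int rates_total)
{
    const int idx = rates_total - 2 - k;
    return (k >= 0 && idx >= 0) ? idx : -1;
}

// Inverse: the same formula with the roles swapped. The forming bar
// (buf_idx = rates_total - 1) has no class index and returns -1.
int BufferToClassIndex(int buf_idx, int rates_total)
{
    const int k = rates_total - 2 - buf_idx;
    return (buf_idx >= 0 && buf_idx < rates_total - 1) ? k : -1;
}
```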

Validation

Regime profiles

Figure 2 shows all four estimators computed by CEntropyFeatures across 60 synthetic bars partitioned into four successive market regimes. Each bar carries a synthetic tick sequence of 24 directions; the distribution of directions differs by regime to produce the expected entropy behaviour.

Figure 2. 4-panel illustration of entropy estimator outputs across four synthetic market regimes

Panel (a) — Shannon's H: Drops sharply in the trending regime (bars 15–30) where 80% of ticks are buys; recovers in the oscillating and final random regimes. Maximum is log₂(3) ≈ 1.585 for a uniform ternary distribution.
Panel (b) — Plug-In H (w=2): Also drops in the trending regime and shows a further drop in the oscillating regime because the buy→sell, sell→buy bigrams dominate, creating a non-uniform bigram distribution even when marginal frequencies are balanced. This is the information advantage of the Plug-In estimator over Shannon's H.
Panel (c) — Lempel-Ziv c(n)/n: Falls in the trending regime (repetitive tick sequences produce a short phrase library) and shows a moderate elevation in the oscillating regime (the alternating pattern is compressible but less so than a pure trend). The theoretical maximum approaches 1 for a maximally complex (incompressible) sequence.
Panel (d) — Konto H: Exhibits higher bar-to-bar variance than the other estimators at n = 24 ticks per bar with a bounded window of 12. The trend is visible but noisier, which is consistent with the Kontoyiannis estimator's known sensitivity to message length at small n.

Python–MQL5 agreement

Figure 3 confirms that the MQL5 implementation reproduces the Python reference exactly across all four estimators. The validation was performed on 100 synthetic bars with tick sequences drawn from the same three regime distributions used in Figure 2. Each panel plots the Python output on the x-axis against the MQL5-equivalent output on the y-axis; all 100 points fall on the y = x diagonal, yielding r² = 1.000000 for every estimator.

Figure 3. 2×2-panel illustration of Python–MQL5 numerical agreement across 100 synthetic bars

Panels (a)–(d): One panel per estimator. All points lie on the y = x diagonal; r² = 1.000000 for Shannon's H, Plug-In H, Lempel-Ziv, and Kontoyiannis H. Exact agreement holds because both implementations apply identical algorithm steps: the arr[:n−1] truncation in Plug-In, the phrase-library (not arbitrary-occurrence) check in Lempel-Ziv, and the Λi ≥ 1 floor in Kontoyiannis.
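There is a caveat on the agreement statistic. r² = 1 only establishes a perfect linear relationship, so an implementation that returned exactly twice the Python value would also score r² = 1. The y = x check is what establishes agreement. That check can be made numeric by reporting the largest absolute difference alongside r². A C++ sketch follows (CompareSeries and AgreementStats are hypothetical names).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct AgreementStats
{
    double r2;            // squared Pearson correlation
    double max_abs_diff;  // largest |a[i] - b[i]|
};

// Compares two series of estimates for the same bars (e.g. Python reference
// vs. MQL5 output). r2 near 1 shows a linear relationship; max_abs_diff
// near 0 is what shows the points actually lie on y = x.
AgreementStats CompareSeries(const std::vector<double>& a,
                             const std::vector<double>& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    if (n == 0) return {NAN, NAN};
    double ma = 0.0, mb = 0.0;
    for (std::size_t i = 0; i < n; ++i) { ma += a[i]; mb += b[i]; }
    ma /= static_cast<double>(n);
    mb /= static_cast<double>(n);
    double sab = 0.0, saa = 0.0, sbb = 0.0, max_diff = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const double da = a[i] - ma, db = b[i] - mb;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
        max_diff = std::max(max_diff, std::fabs(a[i] - b[i]));
    }
    const double r2 = (saa > 0.0 && sbb > 0.0) ? (sab * sab) / (saa * sbb) : NAN;
    return {r2, max_diff};
}
```

Comparing the series 0.1, 0.5, 0.9 with itself gives r² = 1 and a maximum difference of 0. Comparing it with its doubled copy also gives r² = 1, but the maximum difference is 0.9.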

Validation script

EntropyValidation.mq5 automates five property checks when run as a Script on any liquid forex pair with recent tick data:

// Expected output (any liquid pair with tick data available):
// [CHECK 1] PASS — Count() = n_bars
// [CHECK 2] PASS — Shannon's H ∈ [0, 1.585]: N valid, 0 violations
// [CHECK 3] PASS — Plug-In H ∈ [0, 1.585]: N valid, 0 violations
// [CHECK 4] PASS — Lempel-Ziv ∈ (0, 1]: N valid, 0 violations
// [CHECK 5] PASS — Konto H > 0: N valid, 0 violations
// Tick data coverage: N / 50 bars had >= 8 ticks

The coverage line is the key diagnostic. On a freshly attached chart, the terminal may not yet have cached tick data for all 50 bars, so N may be less than 50. After the chart has been open for a few minutes and MetaTrader 5 has fetched the required tick history, the coverage approaches 50/50 for liquid pairs on hourly and shorter timeframes. If the coverage remains low after several minutes, the broker's tick cache depth may be insufficient for the selected timeframe and bar count. Reducing inp_n_bars, or switching to a shorter timeframe so that the same number of bars spans less calendar time, usually resolves this.

Conclusion

The MQL5 port provides a production‑ready implementation of the four entropy estimators and the surrounding integration pieces needed to use them in an EA or indicator. Key deliverables are the header‑only CEntropyFeatures class (ShannonH, PlugInH, LempelZiv, KontoH and the ENT_EMPTY sentinel), an EntropyViewer indicator for visualization (with optional z‑score normalization), and a Validation script that checks value bounds and tick coverage. The port implements three MQL5‑specific adaptations: using CopyTicksRange() and an explicit ENT_EMPTY path for bars outside the broker tick cache; base‑3 hashing into fixed counters for efficient w‑gram frequency counts; and a bounded Kontoyiannis look‑back window to control cost on short tick messages.

All four estimators match the Python reference exactly on synthetic tests (r² = 1.000000) and include the algorithmic corrections from Part 7 (arr[:n−1] truncation for Plug‑In, phrase‑library membership for Lempel‑Ziv, and Λi ≥ 1 floor for Kontoyiannis). Note the practical limitation: entropy features are confined to bars inside the broker's tick cache, so the implementation is best suited for live use and recent historical bars on liquid symbols. The next article returns to Python and introduces structural-break detection features that complement these entropy measures for regime identification.

Attached Files

#	File	Destination	Purpose
1	CEntropyFeatures.mqh	MQL5\Include\Features\	Header-only class: tick sequence extraction and four entropy estimators
2	EntropyViewer.mq5	MQL5\Indicators\	Separate-window indicator with four DRAW_LINE buffers and optional z-score normalization
3	EntropyValidation.mq5	MQL5\Scripts\	Script: five property checks (bounds and count) with tick-data coverage report

References

López de Prado, M. (2018). Advances in Financial Machine Learning, Chapter 18: Entropy Features. Wiley.
Kontoyiannis, I., Algoet, P. H., Suhov, Y. M., & Wyner, A. J. (1998). Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Transactions on Information Theory, 44(3), 1319–1327.
Ziv, J., & Lempel, A. (1976). On the complexity of finite sequences. IEEE Transactions on Information Theory, 22(1), 75–81.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.


Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.

This article was written by a user of the site and reflects their personal views. MetaQuotes Ltd is not responsible for the accuracy of the information presented, nor for any consequences resulting from the use of the solutions, strategies or recommendations described.
