Discussing the article: "Feature Engineering for ML (Part 7): Entropy Features in Python"

Check out the new article: Feature Engineering for ML (Part 7): Entropy Features in Python.

The article provides production-ready entropy estimators (Shannon, plug-in, Lempel–Ziv, Kontoyiannis) operating on tick-rule-encoded sequences. It fixes three correctness and performance issues in the original code, verifies outputs against chapter references, and extends the encoding with quantile and sigma options. Users gain reproducible results and markedly faster computation on large bar sets.

The tick rule assigns a direction b_t ∈ {−1, 0, +1} to each trade within a bar. The sequence {b_1, b_2, ..., b_N} is the raw material for all four estimators. Three properties of this sequence are informative.
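A minimal sketch of the tick-rule encoding, assuming trade prices arrive as a plain list in time order. The helper name `tick_rule` is hypothetical, not the article's API. It maps each price change to its sign, so an unchanged price yields 0, matching the {−1, 0, +1} alphabet above. A common variant instead carries the previous sign forward when the price is unchanged, which removes the 0 state.

```python
import numpy as np


def tick_rule(prices):
    """Hypothetical helper: encode trades as -1, 0 or +1.

    Each entry is the sign of the price change from the previous trade.
    The first trade has no predecessor, so the output is one shorter than
    the input. Unchanged prices map to 0 (no carry-forward).
    """
    prices = np.asarray(prices, dtype=float)
    return np.sign(np.diff(prices)).astype(int)


trades = [100.0, 100.5, 100.5, 100.25, 100.75]
print(tick_rule(trades))  # [ 1  0 -1  1]
```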

The first property is the marginal distribution. A uniform distribution over {−1, +1}, meaning equal numbers of buys and sells, produces the maximum Shannon entropy of 1 bit. A skewed distribution, with most trades in one direction, produces low Shannon entropy. Low entropy therefore signals that the order flow within the bar is directionally concentrated, which the microstructure literature associates with informed trading.
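The marginal-distribution argument can be checked with a short plug-in Shannon entropy computed from symbol frequencies. This is a sketch, not the article's implementation; `shannon_entropy` is a hypothetical name. Over {−1, +1} the ceiling is log2(2) = 1 bit; if unchanged-price trades (0) also appear, the ceiling rises to log2(3) ≈ 1.585 bits.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum of p * log2(1/p), with p = c / n, so every term is non-negative.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


balanced = [1, -1, 1, -1, -1, 1]  # 3 buys, 3 sells
skewed = [1, 1, 1, 1, 1, -1]      # 5 buys, 1 sell
one_sided = [1] * 6               # buys only

print(shannon_entropy(balanced))          # 1.0
print(round(shannon_entropy(skewed), 2))  # 0.65
print(shannon_entropy(one_sided))         # 0.0
```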

The second property is the sequential structure. Two bars can have the same marginal distribution (equal buys and sells, H = 1 bit) but completely different sequential structure: one might alternate perfectly (BSBSBS) while the other is random. Plug-in entropy on bigrams would detect this regularity, because BSBSBS produces only the bigrams BS and SB; Shannon entropy on the marginal distribution would not.

Lempel–Ziv complexity captures a third property: how many new phrases appear as the sequence extends, which measures how compressible the sequence is. A trending sequence (all buys) is maximally compressible: it parses into the fewest possible phrases. A random sequence is incompressible: new phrases keep appearing throughout.
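A sketch comparing these views on synthetic B/S strings, under stated assumptions. `block_entropy_per_symbol` and `lz_phrase_count` are hypothetical names. The plug-in estimator uses overlapping words of length w and divides by w to report bits per symbol. The Lempel–Ziv count uses an LZ78-style incremental parsing (each new phrase is the shortest chunk of the remaining input not already seen), which is one common convention among several. Under this convention a constant sequence parses as B, BB, BBB, ..., so it needs more than one phrase; but because the k-th phrase can be at most k symbols long, no sequence of the same length parses into fewer.

```python
import math
import random
from collections import Counter


def block_entropy_per_symbol(seq, w):
    """Plug-in entropy of overlapping length-w words, divided by w (bits/symbol)."""
    words = [seq[i:i + w] for i in range(len(seq) - w + 1)]
    n = len(words)
    counts = Counter(words)
    return sum((c / n) * math.log2(n / c) for c in counts.values()) / w


def lz_phrase_count(seq):
    """Number of distinct phrases in an LZ78-style incremental parsing."""
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:  # shortest unseen extension: a new phrase
            phrases.add(current)
            current = ""
    # An unfinished tail that repeats a known phrase is not counted.
    return len(phrases)


n = 1000
alternating = "BS" * (n // 2)                        # BSBSBS...
rng = random.Random(42)                              # fixed seed for repeatability
noisy = "".join(rng.choice("BS") for _ in range(n))
trending = "B" * n                                   # all buys

for name, s in [("alternating", alternating), ("random", noisy), ("all buys", trending)]:
    print(f"{name:<12} marginal={block_entropy_per_symbol(s, 1):.3f} "
          f"bigram={block_entropy_per_symbol(s, 2):.3f} "
          f"lz_phrases={lz_phrase_count(s)}")
```

Expect the alternating string to show a marginal entropy of 1 bit but a bigram rate near 0.5 bits per symbol, the random string to stay near 1 on both, and the all-buys string to produce the smallest phrase count.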

Author: Patrick Murimi Njoroge