MetaTrader 5 Machine Learning Blueprint (Part 14): Transaction Cost Modeling for Triple-Barrier Labels in MQL5

MetaTrader 5 — Trading systems | 8 May 2026, 13:24


Patrick Murimi Njoroge

Introduction

Imagine building a bridge but omitting the structure's own weight from your load calculations: the design fails for a cost that was visible from the start. The same operational blindness affects many triple‑barrier labeling pipelines. Researchers commonly set min_ret to an arbitrary constant (0.5–1%) or to legacy spread/commission assumptions, then treat every historical price move above that threshold as genuine signal. The missing step is a disciplined, repeatable way to answer: what is the actual round‑trip transaction cost for this symbol, at this broker, given my strategy's typical holding period and entry hours?

Transaction Cost Pipeline - Data Flow

Figure 1. 2-stage pipeline from broker data collection to labeling threshold

Stage 1 (MQL5): TransactionCostCollector.mq5 runs on any chart, samples CopySpread() history, reads swap rates and symbol properties via SymbolInfoDouble(), and writes a structured CSV to the terminal's Files directory.
Stage 2 (Python): load_cost_model() parses the CSV into a TransactionCostModel, which exposes min_ret_for_symbol() for the labeling pipeline and summary() for cost inspection before committing to a threshold.

This article gives that answer. It presents a reproducible two-stage pipeline: (1) a compact MQL5 script that samples the broker's spread history, swap rates, and symbol properties and writes a structured CSV; and (2) a Python TransactionCostModel that ingests the CSV, converts all components to a common fractional-return unit, and exposes min_ret_for_symbol() plus diagnostic routines. The outputs are explicit: a CSV with spread percentiles and hourly means, a model object that returns per-trade cost breakdowns, a cost-calibrated min_ret ready for get_events(), and the same parameters to pass into your per-trade P&L calculation.

Note: min_ret here is a labeling threshold (for training data construction), not the execution profit-taking target an EA will place in the terminal.

The Three Cost Components and How to Measure Each

Spread is the only component you can measure precisely and in real time from MQL5. It is the difference between ask and bid at the moment of intended entry. For label construction, what matters is not the instantaneous spread at a single bar but the distribution of spread across the hours and sessions your strategy actually trades. A strategy that trades during the London session on EURUSD faces a different spread environment than one that trades during the Asian session or around news events.

EURUSD Spread Distribution by Hour of Day

Figure 2. Session-stratified spread distribution by hour of day (broker time)

Panel (a): Mean spread in pips by hour across the full spread history. The London–New York overlap (hours 12–17) shows the tightest spreads; hours outside active sessions show substantially wider spreads that a mean-based cost estimate would systematically understate.

The MQL5 function CopySpread() gives access to a full history of spread values at bar resolution, which is exactly the data needed to characterize that distribution.
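To make "characterizing the distribution" concrete, here is a minimal Python sketch that summarizes a spread sample at several percentiles. The function name spread_percentiles and the synthetic sample are illustrative assumptions, not part of the article's code. NumPy's default linear interpolation is used, so results can differ slightly from a floor-index percentile.

```python
import numpy as np


def spread_percentiles(spread_points, pip_factor=10.0, levels=(50, 95, 99)):
    """Summarize a spread sample given in broker points; results are in pips."""
    arr = np.asarray(spread_points, dtype=float)
    if arr.size == 0:
        raise ValueError("spread sample is empty")
    pips = arr / pip_factor
    summary = {"mean": float(pips.mean())}
    for q in levels:
        summary[f"p{q}"] = float(np.percentile(pips, q))
    return summary


# Synthetic sample (assumption): 90 quiet bars at 10 points, 10 spikes at 60.
sample = [10] * 90 + [60] * 10
stats = spread_percentiles(sample)
print(stats)
```

In a sample like this, the median sits at the quiet-bar level while the upper percentiles capture the spikes, which is why a single mean hides the cost of entering during wide-spread periods.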

Slippage is the difference between the price at which you intended to enter and the price at which you were filled. This cannot be measured from historical bar data — it can only be measured from live or demo execution records. For label construction, you have two practical options: use broker-published slippage statistics if they are available, or derive an empirically calibrated constant from your demo trading log. A conservative starting assumption for a market order on a major forex pair during normal liquidity is 0.3–0.7 pips. For indices, metals, or exotic pairs, slippage is higher and more variable. The Python model accepts this as a parameter so it can be updated when you have better data.
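As a rough illustration of the second option, the sketch below derives a slippage constant from a demo trade log. The column names (side, requested_price, fill_price), the function name, and the four sample trades are hypothetical; adapt them to whatever your own log actually records.

```python
import pandas as pd


def slippage_pips_from_log(log, point=0.00001, pip_factor=10.0, quantile=0.75):
    """
    Adverse slippage in pips from a trade log.

    Assumed columns (hypothetical): side (+1 buy, -1 sell),
    requested_price, fill_price. Positive values are adverse fills.
    """
    pip_size = pip_factor * point
    adverse = log["side"] * (log["fill_price"] - log["requested_price"]) / pip_size
    return {
        "mean": float(adverse.mean()),
        "quantile": float(adverse.quantile(quantile)),
    }


# Made-up demo fills for illustration only.
demo_log = pd.DataFrame(
    {
        "side": [1, 1, -1, -1],
        "requested_price": [1.10000, 1.10000, 1.10000, 1.10000],
        "fill_price": [1.10004, 1.10006, 1.09995, 1.10001],
    }
)
slip = slippage_pips_from_log(demo_log)
print(slip)
```

A buy filled above the requested price and a sell filled below it both count as adverse; the last sample trade is a favorable fill and lowers the mean. Using an upper quantile rather than the mean is one way to keep the estimate conservative.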

Swap is the overnight financing cost — or credit — charged for holding a position past the daily rollover. It is the most commonly ignored component and the one that most systematically distorts labels for strategies with multi-day holding periods. A strategy labeled as profitable using price alone may be unprofitable after swaps on a two-week holding period. MQL5 exposes swap rates directly through SymbolInfoDouble() with SYMBOL_SWAP_LONG and SYMBOL_SWAP_SHORT. The swap mode — whether the rate is expressed in points, account currency, or as an interest percentage — varies by broker and instrument, and the Python model handles all three cases.

There is one more component worth noting even though it is not a market microstructure cost: the triple-swap day. Most brokers charge three times the nightly swap rate on one specific weekday — typically Wednesday for forex — to compensate for the weekend. For strategies with a two-to-four day average holding period, whether the position spans this day is not random; it depends on entry timing. The model accounts for this explicitly when the holding period is three or more days.
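The dependence on entry timing can be illustrated with a small calendar-based counter. This is a sketch under stated assumptions, not the article's model: rollover is assumed to occur at broker midnight, no swap is charged for rollovers at the end of Saturday or Sunday, and the triple-swap day is charged at exactly three nights. The function name swap_nights is hypothetical.

```python
from datetime import datetime, timedelta


def swap_nights(entry: datetime, exit_: datetime, triple_day: int = 3) -> int:
    """
    Count charged swap nights between entry and exit.

    triple_day uses the MQL5 weekday convention (0=Sun ... 6=Sat).
    Assumptions: rollover at broker midnight, weekend rollovers are free,
    the triple day counts as three nights.
    """
    nights = 0
    day = entry.date()
    while day < exit_.date():
        # Python weekday(): Mon=0 ... Sun=6; convert to MQL5: Sun=0 ... Sat=6.
        mql_weekday = (day.weekday() + 1) % 7
        if mql_weekday in (0, 6):
            pass  # weekend rollover: nothing charged under this assumption
        elif mql_weekday == triple_day:
            nights += 3
        else:
            nights += 1
        day += timedelta(days=1)
    return nights


# Two three-day holds with different entry weekdays.
tue_to_fri = swap_nights(datetime(2024, 1, 2, 10), datetime(2024, 1, 5, 10))
fri_to_mon = swap_nights(datetime(2024, 1, 5, 10), datetime(2024, 1, 8, 10))
print(tue_to_fri, fri_to_mon)
```

Both positions are held for three calendar days, yet the Tuesday entry spans the Wednesday rollover and the Friday entry does not, so their charged nights differ.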

MQL5: The Transaction Cost Collector Script

This is a standalone MQL5 script — not an EA, not an indicator — that you compile once and run on any chart of the target instrument. It has no event loop and produces no terminal output beyond a brief summary to the Experts tab. Instead it executes a single blocking pass, collects the required broker data, writes a structured CSV, and exits. The CSV is the only persistent output.

Save the script at Scripts/TransactionCostCollector.mq5 and compile it in MetaEditor. To run, open any chart for the target symbol, right-click the script in the Navigator, and select Attach to chart. The input dialog exposes two parameters: InpBars (bars of spread history to sample, default 50,000) and InpOutputFile (optional filename override; blank defaults to <symbol>_costs.csv). Click OK. The output file appears at <terminal_data_path>/MQL5/Files/<symbol>_costs.csv. Copy that file to your Python project data directory and pass the path to load_cost_model().

How the Script Works

The script body is organized into five sequential sections inside OnStart(), each handled by a dedicated helper function. The entry point reads as a clean call sequence; all measurement and formatting logic is pushed into the helpers. The CSV is written in a uniform five-column format (section, key, value, unit, note) through a single reusable row writer. The five sections produce four primary CSV sections (symbol_properties, swap, commission, spread_summary) followed by one row per active hour in spread_by_hour.

Section 1 — Symbol properties reads all static instrument metadata: point size, tick size and value, contract size, pip factor (10 for 5-digit brokers, 1 for 4-digit), minimum lot, and the base/profit/margin currency triple. The Python model uses these values to convert spread points to price units and to compute notional trade values for commission scaling.

Section 2 — Swap rates reads SYMBOL_SWAP_LONG, SYMBOL_SWAP_SHORT, SYMBOL_SWAP_MODE, and SYMBOL_SWAP_ROLLOVER3DAYS. Rather than converting to a common unit here, the script records the native values and the mode string. The Python model handles all three conversion cases.

Section 3 — Commission diagnostic reads ACCOUNT_COMMISSION_BLOCKED and writes it alongside an embedded note explaining the reference-trade procedure for deriving the actual per-lot rate. This value is zero when no positions are open and cannot be used directly as a per-lot rate; see the commission note in the Practical Considerations section.

Section 4 — Spread distribution calls CopySpread() for the full requested history and computes summary statistics: mean, standard deviation, and the p25/p50/p75/p90/p95/p99 percentiles, in both points and pips.

Section 5 — Session-stratified spread aligns the spread array with bar timestamps via CopyTime() and computes the mean spread for each hour of the day (0–23, broker time). These per-hour values populate the spread_by_hour section of the CSV and are consumed by session_adjusted_spread_pips() in the Python model for session-aware cost estimation.
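For orientation, the sketch below shows one way the five-column layout described above could be read back in Python. It is not the article's load_cost_model(); the helper names read_cost_csv, hourly_spread, and mean_spread_for_hours are hypothetical, and the sample rows contain made-up values.

```python
import csv
import io

# Illustrative rows in the section,key,value,unit,note layout (values made up).
SAMPLE = """section,key,value,unit,note
symbol_properties,symbol,EURUSD,-,
symbol_properties,pip_factor,10.0,points_per_pip,10 for 5-digit; 1 for 4-digit
spread_summary,p95_pips,1.4000,pips,
spread_by_hour,hour_00,2.1000,pips,broker_time; n=2083
spread_by_hour,hour_13,0.8000,pips,broker_time; n=2084
"""


def read_cost_csv(text):
    """Group rows into {section: {key: value}}; values stay as strings."""
    sections = {}
    for row in csv.DictReader(io.StringIO(text)):
        sections.setdefault(row["section"], {})[row["key"]] = row["value"]
    return sections


def hourly_spread(sections):
    """Turn hour_HH keys into an {hour: mean_spread_pips} mapping."""
    rows = sections.get("spread_by_hour", {})
    return {int(key.split("_")[1]): float(value) for key, value in rows.items()}


def mean_spread_for_hours(hourly, hours):
    """Unweighted average of the hourly means over the hours actually traded."""
    values = [hourly[h] for h in hours if h in hourly]
    if not values:
        raise ValueError("no spread data for the requested hours")
    return sum(values) / len(values)


sections = read_cost_csv(SAMPLE)
hourly = hourly_spread(sections)
print(sections["spread_summary"]["p95_pips"], hourly)
```

Keeping values as strings until a specific consumer converts them avoids guessing types for rows such as symbol or swap_mode, which are not numeric.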

The listing below opens with the file header, input parameters, and forward declarations. All six helpers are declared before OnStart() so the entry point reads as a sequence of named steps rather than an unstructured block.

//+------------------------------------------------------------------+
//| TransactionCostCollector.mq5                                     |
//|                                                                  |
//| Collects transaction cost data for a given symbol: spread        |
//| distribution, swap rates, and commission diagnostics.            |
//|                                                                  |
//| Run as a Script (not EA) on any chart of the target symbol.      |
//| Output: <terminal_data_path>/MQL5/Files/<symbol>_costs.csv       |
//+------------------------------------------------------------------+
#property script_show_inputs

//--- Input parameters
input int    InpBars       = 50000;  // Bars of spread history to sample
input string InpOutputFile = "";     // Override filename (blank = <symbol>_costs.csv)

#define CSV_SEP ","

//--- Forward declarations
void WriteCsvRow(int handle, string section, string key,
                 string value, string unit, string note);

void CollectSymbolProperties(int fh, string symbol, int digits,
                             double point, double pip_factor,
                             double tick_size, double tick_value,
                             double contract_sz, double min_lot,
                             string cur_base, string cur_profit,
                             string cur_margin);

void CollectSwapInfo(int fh, string symbol, double swap_long,
                     double swap_short, int swap_mode,
                     int swap_3day, string swap_mode_str);

void CollectCommissionInfo(int fh, double commission_blocked);

void CollectSpreadDistribution(int fh, string symbol,
                               int digits, double point, double pip_factor,
                               const int &spread_arr[], int n,
                               int bars_sampled);

void CollectSessionSpread(int fh, string symbol,
                          int digits, double point, double pip_factor,
                          const int &spread_arr[], int n);

OnStart() resolves all symbol properties and swap rates once at the top, then delegates each measurement step to a helper. The spread array is copied only after the static sections are written, so if CopySpread() fails the script still closes the file with the symbol, swap, and commission rows in place.

void OnStart()
  {
   string symbol = Symbol();
   int    digits = (int)SymbolInfoInteger(symbol, SYMBOL_DIGITS);

   //--- Core symbol properties
   double point       = SymbolInfoDouble(symbol, SYMBOL_POINT);
   double tick_size   = SymbolInfoDouble(symbol, SYMBOL_TRADE_TICK_SIZE);
   double tick_value  = SymbolInfoDouble(symbol, SYMBOL_TRADE_TICK_VALUE);
   double contract_sz = SymbolInfoDouble(symbol, SYMBOL_TRADE_CONTRACT_SIZE);
   double min_lot     = SymbolInfoDouble(symbol, SYMBOL_VOLUME_MIN);
   string cur_base    = SymbolInfoString(symbol, SYMBOL_CURRENCY_BASE);
   string cur_profit  = SymbolInfoString(symbol, SYMBOL_CURRENCY_PROFIT);
   string cur_margin  = SymbolInfoString(symbol, SYMBOL_CURRENCY_MARGIN);
   double pip_factor  = (digits == 3 || digits == 5) ? 10.0 : 1.0;

   //--- Swap rates
   double swap_long  = SymbolInfoDouble(symbol, SYMBOL_SWAP_LONG);
   double swap_short = SymbolInfoDouble(symbol, SYMBOL_SWAP_SHORT);
   int    swap_mode  = (int)SymbolInfoInteger(symbol, SYMBOL_SWAP_MODE);
   int    swap_3day  = (int)SymbolInfoInteger(symbol, SYMBOL_SWAP_ROLLOVER3DAYS);

   string swap_mode_str = "";
   switch(swap_mode)
     {
      case SYMBOL_SWAP_MODE_POINTS:           swap_mode_str = "points";        break;
      case SYMBOL_SWAP_MODE_CURRENCY_SYMBOL:  swap_mode_str = "currency";      break;
      case SYMBOL_SWAP_MODE_INTEREST_OPEN:    swap_mode_str = "interest_open"; break;
      case SYMBOL_SWAP_MODE_INTEREST_CURRENT: swap_mode_str = "interest_curr"; break;
      case SYMBOL_SWAP_MODE_CURRENCY_MARGIN:  swap_mode_str = "currency_mrgn"; break;
      case SYMBOL_SWAP_MODE_CURRENCY_DEPOSIT: swap_mode_str = "currency_dep";  break;
      default:                                swap_mode_str = "unknown";       break;
     }

   double commission_blocked = AccountInfoDouble(ACCOUNT_COMMISSION_BLOCKED);

   //--- Open CSV
   string fname = InpOutputFile != "" ? InpOutputFile
                                       : symbol + "_costs.csv";
   int fh = FileOpen(fname, FILE_WRITE | FILE_CSV | FILE_ANSI, ',');
   if(fh == INVALID_HANDLE)
     {
      Print("ERROR: Cannot open output file: ", fname);
      return;
     }
   FileWrite(fh, "section" + CSV_SEP + "key" + CSV_SEP + "value"
                + CSV_SEP + "unit" + CSV_SEP + "note");

   //--- Static data sections
   CollectSymbolProperties(fh, symbol, digits, point, pip_factor,
                           tick_size, tick_value, contract_sz, min_lot,
                           cur_base, cur_profit, cur_margin);
   CollectSwapInfo(fh, symbol, swap_long, swap_short, swap_mode,
                   swap_3day, swap_mode_str);
   CollectCommissionInfo(fh, commission_blocked);

   //--- Spread history
   int spread_arr[];
   int bars_copied = CopySpread(symbol, PERIOD_CURRENT, 0, InpBars, spread_arr);
   if(bars_copied <= 0)
     {
      Print("ERROR: CopySpread failed. Bars available: ", bars_copied);
      FileClose(fh);
      return;
     }
   CollectSpreadDistribution(fh, symbol, digits, point, pip_factor,
                             spread_arr, bars_copied, bars_copied);
   CollectSessionSpread(fh, symbol, digits, point, pip_factor,
                        spread_arr, bars_copied);
   FileClose(fh);

   Print("TransactionCostCollector: wrote ", fname);
   Print("  Bars sampled: ", bars_copied);
  }

The four helpers below handle the static-data sections. WriteCsvRow() is the single point through which all CSV output passes; changing the field separator or quoting behavior requires editing only this one function. The remaining three helpers each call it in sequence for their respective rows.

//+------------------------------------------------------------------+
//| Write a single CSV row in the standard 5-column format           |
//+------------------------------------------------------------------+
void WriteCsvRow(int handle, string section, string key,
                 string value, string unit, string note)
  {
   FileWrite(handle,
             section + CSV_SEP + key   + CSV_SEP +
             value   + CSV_SEP + unit  + CSV_SEP + note);
  }

//+------------------------------------------------------------------+
//| Symbol properties section                                        |
//+------------------------------------------------------------------+
void CollectSymbolProperties(int fh, string symbol, int digits,
                             double point, double pip_factor,
                             double tick_size, double tick_value,
                             double contract_sz, double min_lot,
                             string cur_base, string cur_profit,
                             string cur_margin)
  {
   string sec = "symbol_properties";
   WriteCsvRow(fh, sec, "symbol",          symbol,                          "—", "");
   WriteCsvRow(fh, sec, "digits",          (string)digits,                  "—", "");
   WriteCsvRow(fh, sec, "point",           DoubleToString(point, 10),       "price", "");
   WriteCsvRow(fh, sec, "pip_factor",      DoubleToString(pip_factor, 1),   "points_per_pip",
               "10 for 5-digit; 1 for 4-digit");
   WriteCsvRow(fh, sec, "tick_size",       DoubleToString(tick_size, 10),   "price", "");
   WriteCsvRow(fh, sec, "tick_value",      DoubleToString(tick_value, 6),   "account_currency_per_lot", "");
   WriteCsvRow(fh, sec, "contract_size",   DoubleToString(contract_sz, 2),  "units", "");
   WriteCsvRow(fh, sec, "min_lot",         DoubleToString(min_lot, 2),      "lots", "");
   WriteCsvRow(fh, sec, "currency_base",   cur_base,   "—", "");
   WriteCsvRow(fh, sec, "currency_profit", cur_profit, "—", "");
   WriteCsvRow(fh, sec, "currency_margin", cur_margin, "—", "");
  }

//+------------------------------------------------------------------+
//| Swap rates section                                               |
//+------------------------------------------------------------------+
void CollectSwapInfo(int fh, string symbol, double swap_long,
                     double swap_short, int swap_mode,
                     int swap_3day, string swap_mode_str)
  {
   string sec = "swap";
   WriteCsvRow(fh, sec, "swap_long",  DoubleToString(swap_long, 6),
               swap_mode_str, "per night; negative = debit from account");
   WriteCsvRow(fh, sec, "swap_short", DoubleToString(swap_short, 6),
               swap_mode_str, "per night; negative = debit from account");
   WriteCsvRow(fh, sec, "swap_mode",  swap_mode_str,
               "—", "see SYMBOL_SWAP_MODE enum");
   WriteCsvRow(fh, sec, "swap_3day",  (string)swap_3day, "weekday",
               "0=Sun … 6=Sat; triple swap charged on this day");
  }

//+------------------------------------------------------------------+
//| Commission diagnostic section                                    |
//+------------------------------------------------------------------+
void CollectCommissionInfo(int fh, double commission_blocked)
  {
   string sec = "commission";
   WriteCsvRow(fh, sec, "commission_blocked_now",
               DoubleToString(commission_blocked, 4),
               "account_currency",
               "ACCOUNT_COMMISSION_BLOCKED; see derivation note");
   WriteCsvRow(fh, sec, "derivation_note",
               "Open a reference trade of 1.0 lot on this symbol; "
               "read ACCOUNT_COMMISSION_BLOCKED; that value is the "
               "per-side per-lot rate", "—", "");
  }

The two spread collectors both operate on the same spread array returned by CopySpread(). CollectSpreadDistribution() sorts a copy to extract percentiles; CollectSessionSpread() aligns the same array with bar timestamps via CopyTime() to compute per-hour means. Passing the array by const reference to both functions avoids copying 50,000 integers twice.

//+------------------------------------------------------------------+
//| Spread distribution: percentiles and summary statistics          |
//+------------------------------------------------------------------+
void CollectSpreadDistribution(int fh, string symbol,
                               int digits, double point, double pip_factor,
                               const int &spread_arr[], int n,
                               int bars_sampled)
  {
   double sum = 0.0, sum_sq = 0.0;
   int    min_sp = INT_MAX, max_sp = 0;
   for(int i = 0; i < n; i++)
     {
      int s = spread_arr[i];
      sum    += s;
      sum_sq += (double)s * s;
      if(s < min_sp) min_sp = s;
      if(s > max_sp) max_sp = s;
     }
   double mean_sp = sum / n;
   double var_sp  = (sum_sq / n) - (mean_sp * mean_sp);
   double std_sp  = MathSqrt(MathMax(var_sp, 0.0));

   int sorted[];
   ArrayCopy(sorted, spread_arr, 0, 0, n);
   ArraySort(sorted);

   double p25 = sorted[(int)(n * 0.25)];
   double p50 = sorted[(int)(n * 0.50)];
   double p75 = sorted[(int)(n * 0.75)];
   double p90 = sorted[(int)(n * 0.90)];
   double p95 = sorted[(int)(n * 0.95)];
   double p99 = sorted[(int)(n * 0.99)];

   string sec = "spread_summary";
   WriteCsvRow(fh, sec, "bars_sampled", (string)bars_sampled, "bars",    "");
   WriteCsvRow(fh, sec, "mean_points",  DoubleToString(mean_sp, 4), "points", "");
   WriteCsvRow(fh, sec, "std_points",   DoubleToString(std_sp, 4),  "points", "");
   WriteCsvRow(fh, sec, "p25_points",   DoubleToString(p25, 2),     "points", "");
   WriteCsvRow(fh, sec, "p50_points",   DoubleToString(p50, 2),     "points", "");
   WriteCsvRow(fh, sec, "p75_points",   DoubleToString(p75, 2),     "points", "");
   WriteCsvRow(fh, sec, "p90_points",   DoubleToString(p90, 2),     "points", "");
   WriteCsvRow(fh, sec, "p95_points",   DoubleToString(p95, 2),     "points", "");
   WriteCsvRow(fh, sec, "p99_points",   DoubleToString(p99, 2),     "points", "");
   WriteCsvRow(fh, sec, "mean_pips",    DoubleToString(mean_sp / pip_factor, 4), "pips", "");
   WriteCsvRow(fh, sec, "p50_pips",     DoubleToString(p50     / pip_factor, 4), "pips", "");
   WriteCsvRow(fh, sec, "p95_pips",     DoubleToString(p95     / pip_factor, 4), "pips", "");
   WriteCsvRow(fh, sec, "p99_pips",     DoubleToString(p99     / pip_factor, 4), "pips", "");
  }

//+------------------------------------------------------------------+
//| Session-stratified spread: mean by hour of day (broker time)     |
//+------------------------------------------------------------------+
void CollectSessionSpread(int fh, string symbol,
                          int digits, double point, double pip_factor,
                          const int &spread_arr[], int n)
  {
   datetime times[];
   int time_copied = CopyTime(symbol, PERIOD_CURRENT, 0, n, times);
   if(time_copied != n)
      Print("WARNING: CopyTime returned ", time_copied, " bars, expected ", n,
            ". Session spread may be incomplete.");

   double hour_sum[24];
   int    hour_cnt[24];
   ArrayInitialize(hour_sum, 0);
   ArrayInitialize(hour_cnt, 0);

   int limit = MathMin(n, time_copied);
   for(int i = 0; i < limit; i++)
     {
      MqlDateTime dt;
      TimeToStruct(times[i], dt);
      hour_sum[dt.hour] += spread_arr[i];
      hour_cnt[dt.hour]++;
     }

   string sec = "spread_by_hour";
   for(int h = 0; h < 24; h++)
     {
      if(hour_cnt[h] > 0)
        {
         double hmean_pips = (hour_sum[h] / hour_cnt[h]) / pip_factor;
         string hour_str   = StringFormat("hour_%02d", h);
         WriteCsvRow(fh, sec, hour_str,
                     DoubleToString(hmean_pips, 4), "pips",
                     "broker_time; n=" + (string)hour_cnt[h]);
        }
     }
  }

Python: The Transaction Cost Model

The Python side consists of a TransactionCostModel dataclass and a load_cost_model() factory function. The dataclass holds all broker and instrument parameters and exposes methods for computing each cost component as a fraction of entry price. The factory function reads the CSV exported by the MQL5 script and constructs the model automatically. Place both in afml/transaction_costs.py.

Two design decisions are worth noting before reading the code. First, every cost is expressed as a fractional return, not a pip value. This makes the model instrument-agnostic: the same class works identically on currency pairs, metals, and indices without unit conversion, and the output is directly comparable to the return series used for triple-barrier labeling. Second, commission_per_lot is expressed as a per-side rate — the commission charged for a single opening or closing trade on one standard lot. The round-trip computation doubles this internally. This matches the way most ECN brokers quote commission ($7 per lot per side, for example) and avoids the silent doubling error that occurs when a round-trip rate is passed to a function that applies its own ×2 multiplier.
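A short worked example may help fix both conventions. The two helper functions below are illustrative and not part of the article's class; they assume the account currency equals the profit currency and use made-up inputs (a 1.2 pip spread, $7 per side per lot, EURUSD at 1.1000).

```python
def pips_to_frac(pips, entry_price, pip_factor=10.0, point=0.00001):
    """Convert a pip amount to a fraction of entry price."""
    return pips * pip_factor * point / entry_price


def round_trip_commission_frac(per_side_per_lot, entry_price,
                               contract_size=100_000.0):
    """
    Round-trip commission as a fraction of notional.

    Commission and notional both scale linearly with traded volume, so the
    lot size cancels and the per-standard-lot figures are sufficient.
    Assumes account currency == profit currency.
    """
    return 2.0 * per_side_per_lot / (entry_price * contract_size)


spread_frac = pips_to_frac(1.2, entry_price=1.1000)
commission_frac = round_trip_commission_frac(7.0, entry_price=1.1000)

# The silent doubling error: passing a round-trip rate (14) where a
# per-side rate (7) is expected doubles the commission fraction.
doubled_frac = round_trip_commission_frac(14.0, entry_price=1.1000)
print(spread_frac, commission_frac, doubled_frac)
```

Because both quantities are now plain fractions of entry price, they can be added together and compared directly with a return series.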

The dataclass definition with all field declarations follows. Fields that have no universal default — symbol and spread_pips — are placed first and have no default value; all others default to zero or conservative starting points.

"""
afml/transaction_costs.py

Loads broker transaction cost data exported by TransactionCostCollector.mq5
and derives the min_ret threshold for triple-barrier label construction.

The threshold is the minimum return a trade must achieve to cover the
round-trip transaction cost at a given spread percentile, slippage
assumption, and holding-period-adjusted swap accrual.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

import pandas as pd

@dataclass
class TransactionCostModel:
    """
    Broker-specific transaction cost model for a single symbol.

    All costs are expressed as fractional returns (e.g., 0.0001 = 1 pip
    on a 1.0000 priced instrument) so they can be compared directly to the
    return series used for triple-barrier labeling.

    Parameters
    ----------
    symbol : str
        Instrument identifier (e.g., "EURUSD").
    spread_pips : float
        Spread for cost calculation. Use p95 from the collected distribution,
        not the mean; entries during high-spread periods are disproportionately
        costly and the mean systematically understates them.
    slippage_pips : float
        One-way slippage estimate. Derived from live or demo trade log.
        Default 0.5 pips is conservative for major forex pairs.
    commission_per_lot : float
        Per-side commission in account currency per standard lot.
        Confirm from a reference trade; see CollectCommissionInfo() note.
        Set to zero for spread-only brokers.
    swap_long_per_night : float
        Swap in native MQL5 units for long positions.
        MQL5 sign convention: negative = broker debits your account.
    swap_short_per_night : float
        Swap in native MQL5 units for short positions.
        Usually negative; sometimes positive for short carry trades.
    swap_mode : str
        MQL5 SYMBOL_SWAP_MODE string from the CSV.
    swap_triple_day : int
        Weekday on which triple swap is charged (0=Sun, 6=Sat).
    pip_factor : float
        Points per pip. 10 for 5-digit brokers, 1 for 4-digit.
    point : float
        Broker point size (e.g., 0.00001 for EURUSD 5-digit).
    tick_value : float
        Account currency value of one tick per standard lot.
    tick_size : float
        Minimum price movement.
    contract_size : float
        Units per standard lot (e.g., 100,000 for forex).
    lot_size : float
        Lot size used for this strategy (e.g., 0.01 = one micro-lot).
    account_currency_rate : float
        Exchange rate from profit currency to account currency.
        Set to 1.0 when profit currency equals account currency.
    spread_by_hour : dict[int, float]
        Hour-of-day mean spread in pips (broker time). Used by
        session_adjusted_spread_pips() for hour-aware cost estimation.
    """

    symbol:                str
    spread_pips:           float
    slippage_pips:         float            = 0.5
    commission_per_lot:    float            = 0.0
    swap_long_per_night:   float            = 0.0
    swap_short_per_night:  float            = 0.0
    swap_mode:             str              = "points"
    swap_triple_day:       int              = 3        # Wednesday default
    pip_factor:            float            = 10.0
    point:                 float            = 0.00001
    tick_value:            float            = 10.0
    tick_size:             float            = 0.00001
    contract_size:         float            = 100_000.0
    lot_size:              float            = 0.01
    account_currency_rate: float            = 1.0
    spread_by_hour:        dict[int, float] = field(default_factory=dict)

The three entry-level cost methods below each return a single component of the round-trip cost as a fraction of the entry price; the fourth component, swap, has its own method further on. pip_value is a utility property that exposes the account-currency value of one pip at the configured lot size; it is not used internally but is available for quick sanity checks when inspecting model parameters.

    # ── Derived helpers ───────────────────────────────────────────────────────

    @property
    def pip_value(self) -> float:
        """Account currency value of one pip per lot_size."""
        return (
            (self.tick_value / self.tick_size)
            * (self.pip_factor * self.point)
            * self.lot_size
            * self.account_currency_rate
        )

    def spread_cost_frac(self, entry_price: float) -> float:
        """
        Round-trip spread cost as a fraction of entry price.

        For triple-barrier entries (market orders), the round-trip spread
        is the full bid-ask spread: you enter at ask and exit at bid on a
        market order stop or time exit, so the full spread is crossed once.
        """
        spread_price = self.spread_pips * self.pip_factor * self.point
        return spread_price / entry_price

    def slippage_cost_frac(self, entry_price: float) -> float:
        """Round-trip slippage as a fraction of entry price (2× one-way)."""
        slippage_price = (
            self.slippage_pips * self.pip_factor * self.point * 2
        )
        return slippage_price / entry_price

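    def commission_cost_frac(self, entry_price: float) -> float:
        """
        Round-trip commission as a fraction of entry price.

        commission_per_lot is the per-side rate per standard lot; scale it
        by lot_size and multiply by 2 to get the round-trip cost of the
        traded position (entry commission + exit commission). Divide by
        the position's notional value to convert to a fractional return.
        """
        notional = entry_price * self.contract_size * self.lot_size
        if notional == 0:
            return 0.0
        # commission_per_lot is quoted per standard lot, so scale it by
        # lot_size to match the notional of the position actually traded.
        round_trip = self.commission_per_lot * self.lot_size * 2
        return (round_trip * self.account_currency_rate) / notional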

The swap method is the most involved because it must handle three families of MQL5 swap modes (points, currency, and interest) and the triple-swap day. The MQL5 sign convention for swap rates is that a negative value represents a debit from your account (a cost). The method follows this convention: it negates the rate so that a negative swap produces a positive cost fraction, and a positive swap (a carry credit) produces a negative cost fraction that reduces the effective round-trip cost. Carry traders on pairs like AUDJPY or USDMXN benefit from this sign handling; taking abs() of the rate, as the original code did, silently converts carry credits into costs.

    def swap_cost_frac(
        self,
        entry_price: float,
        holding_days: float,
        side: int = 1,
    ) -> float:
        """
        Swap accrual as a fraction of entry price for a given holding period.

        Returns a positive value when swap is a net cost and a negative value
        when swap is a carry credit that reduces the effective round-trip cost.

        MQL5 sign convention: negative swap rate = broker debits the account
        (a cost to you). This method negates the rate so that a debit produces
        a positive cost fraction and a carry credit produces a negative one.

        Triple-swap day: when holding_days >= 3, two additional nights of swap
        are charged on the assumption that the position spans the rollover day.
        This is a conservative approximation; the actual exposure depends on
        entry timing. See the Practical Considerations section for a note on
        calendar days vs. trading bars.

        Parameters
        ----------
        entry_price  : float  Entry price of the trade.
        holding_days : float  Expected holding period in calendar days.
        side         : int    +1 for long, -1 for short.
        """
        rate = (
            self.swap_long_per_night if side >= 0
            else self.swap_short_per_night
        )
        nights = holding_days + (2 if holding_days >= 3 else 0)

        if self.swap_mode == "points":
            # rate in broker points; negative = debit
            # Negate: negative rate → positive cost fraction
            swap_price = -rate * self.point * nights
            return swap_price / entry_price if entry_price > 0 else 0.0

        elif self.swap_mode in ("currency", "currency_mrgn", "currency_dep"):
            # rate in account currency per lot per night; negative = debit
            total_swap = -rate * self.lot_size * nights
            notional   = entry_price * self.contract_size * self.lot_size
            return total_swap / notional if notional > 0 else 0.0

        elif self.swap_mode in ("interest_open", "interest_curr"):
            # Annual interest rate; negative = you pay
            nightly_rate = -rate / 100.0 / 365.0
            return nightly_rate * nights

        return 0.0
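
To make the sign convention concrete, the following standalone sketch reproduces only the points-mode branch outside the class. The function name swap_frac_points and the swap rates used here are illustrative assumptions, not figures from any broker.

```python
def swap_frac_points(rate_points: float, point: float,
                     nights: float, entry_price: float) -> float:
    """Points-mode swap as a fraction of entry price.

    Positive result = net cost, negative result = carry credit.
    """
    if entry_price <= 0:
        return 0.0
    # Negate the broker rate: a debit (negative rate) becomes a positive cost.
    return -rate_points * point * nights / entry_price


# Hypothetical rates: the long side is debited 6.5 points per night,
# the short side is credited 2.1 points per night.
long_cost = swap_frac_points(-6.5, 0.00001, nights=2, entry_price=1.085)
short_cost = swap_frac_points(2.1, 0.00001, nights=2, entry_price=1.085)

print(f"long swap fraction:  {long_cost:+.8f}")   # positive: adds to cost
print(f"short swap fraction: {short_cost:+.8f}")  # negative: reduces cost
```

Wrapping the rate in abs() would make both results positive, which is exactly the carry-credit error described above.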

The remaining four methods form the public API. round_trip_cost_frac() sums all four components and is the single call that min_ret_for_symbol() maps across the price series. The floor at zero in min_ret_for_symbol() handles the edge case of a strong carry trade where the credit from swap reduces the total cost below zero; a negative labeling threshold is meaningless and would label every trade as positive regardless of return. session_adjusted_spread_pips() and summary() are diagnostic utilities.

    def round_trip_cost_frac(
        self,
        entry_price: float,
        holding_days: float = 0.0,
        side: int = 1,
    ) -> float:
        """
        Total round-trip cost as a fraction of entry price.

        Parameters
        ----------
        entry_price  : Reference price (e.g., close at the label bar).
        holding_days : Expected holding period in calendar days.
                       Pass 0.0 for intraday strategies (no overnight cost).
        side         : +1 long, -1 short. Affects swap direction.
        """
        return (
            self.spread_cost_frac(entry_price)
            + self.slippage_cost_frac(entry_price)
            + self.commission_cost_frac(entry_price)
            + self.swap_cost_frac(entry_price, holding_days, side)
        )

    def min_ret_for_symbol(
        self,
        price_series: pd.Series,
        holding_days: float = 0.0,
        side: int = 1,
        cost_multiplier: float = 1.5,
    ) -> float:
        """
        Derive the min_ret threshold for triple-barrier labeling.

        The threshold is cost_multiplier × median round-trip cost across the
        price series. A cost_multiplier of 1.5 means the profit barrier must
        exceed 1.5× the round-trip cost to receive a positive label; trades
        that barely cover costs are treated as non-events.

        The result is floored at zero. For strong carry pairs the swap credit
        can reduce the median cost below zero, which would yield a negative
        threshold — a result that labels every trade as positive regardless of
        return and defeats the purpose of the filter.

        Parameters
        ----------
        price_series    : pd.Series  Close prices indexed the same as labels.
        holding_days    : float      Expected average holding period in calendar days.
        side            : int        +1 or -1. Use 1 if direction is unknown.
        cost_multiplier : float      Safety margin above break-even cost.
                                     1.0 = break-even; 1.5 is recommended.

        Returns
        -------
        float
            min_ret value to pass to get_events() or equivalent.
        """
        costs = price_series.apply(
            lambda p: self.round_trip_cost_frac(p, holding_days, side)
        )
        return float(max(0.0, costs.median() * cost_multiplier))

    def session_adjusted_spread_pips(self, hour: int) -> float:
        """
        Return the mean spread for a given hour-of-day (broker time).
        Falls back to spread_pips if that hour has no data.
        """
        return self.spread_by_hour.get(hour, self.spread_pips)

    def summary(self, entry_price: float, holding_days: float = 1.0) -> dict:
        """Human-readable cost breakdown at a reference entry price and holding period."""
        pip_price = self.pip_factor * self.point
        return {
            "spread_frac":      self.spread_cost_frac(entry_price),
            "slippage_frac":    self.slippage_cost_frac(entry_price),
            "commission_frac":  self.commission_cost_frac(entry_price),
            "swap_long_frac":   self.swap_cost_frac(entry_price, holding_days,  1),
            "swap_short_frac":  self.swap_cost_frac(entry_price, holding_days, -1),
            "total_long_frac":  self.round_trip_cost_frac(entry_price, holding_days,  1),
            "total_short_frac": self.round_trip_cost_frac(entry_price, holding_days, -1),
            "total_long_pips":  self.round_trip_cost_frac(entry_price, holding_days,  1)
                                * entry_price / pip_price,
            "total_short_pips": self.round_trip_cost_frac(entry_price, holding_days, -1)
                                * entry_price / pip_price,
        }
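
The threshold logic in min_ret_for_symbol() reduces to a median, a multiplier, and a floor. The sketch below isolates that arithmetic using the standard library instead of pandas; the helper name min_ret_from_costs and the cost figures are hypothetical.

```python
import statistics


def min_ret_from_costs(costs, cost_multiplier: float = 1.5) -> float:
    """Median round-trip cost times a safety multiplier, floored at zero."""
    return max(0.0, statistics.median(costs) * cost_multiplier)


# Hypothetical per-bar round-trip cost fractions.
net_cost = [0.00010, 0.00012, 0.00014]       # ordinary case: costs are positive
net_credit = [-0.00005, -0.00003, -0.00001]  # carry credit exceeds other costs

threshold_cost = min_ret_from_costs(net_cost)
threshold_credit = min_ret_from_costs(net_credit)

print(threshold_cost)    # 1.5 times the median cost
print(threshold_credit)  # floored at zero instead of going negative
```

The second case is the carry-pair edge case: without the floor, the threshold would be negative and every event would pass the filter.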

The factory function parses the CSV produced by the MQL5 script and constructs the dataclass from it. The spread_percentile argument controls which row of the spread_summary section is used as spread_pips; slippage_pips and commission_per_lot are not in the CSV and must be supplied by the caller because they cannot be measured from historical bar data.

# ── CSV loader ────────────────────────────────────────────────────────────────


def load_cost_model(
    csv_path: Path,
    spread_percentile: str = "p95_pips",
    slippage_pips: float = 0.5,
    commission_per_lot: float = 0.0,
    lot_size: float = 0.01,
    account_currency_rate: float = 1.0,
) -> TransactionCostModel:
    """
    Build a TransactionCostModel from a CSV exported by TransactionCostCollector.mq5.

    Parameters
    ----------
    csv_path : Path
        Path to the <symbol>_costs.csv file.
    spread_percentile : str
        Which spread statistic to use as the model spread. Options: mean_pips,
        p50_pips, p95_pips, p99_pips. Default p95_pips is recommended.
    slippage_pips : float
        One-way slippage. Not in the CSV — must be supplied from live or demo trade
        log analysis.
    commission_per_lot : float
        Per-side commission per standard lot in account currency. Confirmed from a
        reference trade; zero for spread-only brokers.
    lot_size : float
        Strategy lot size.
    account_currency_rate : float
        Profit-to-account-currency exchange rate.

    Returns
    -------
    TransactionCostModel
    """
    df = pd.read_csv(csv_path)

    def get(section: str, key: str) -> str:
        row = df[(df["section"] == section) & (df["key"] == key)]
        if row.empty:
            raise KeyError(f"Missing: section={section!r} key={key!r} in {csv_path}")
        return str(row["value"].iloc[0]).strip()

    symbol = get("symbol_properties", "symbol")
    point = float(get("symbol_properties", "point"))
    pip_factor = float(get("symbol_properties", "pip_factor"))
    tick_size = float(get("symbol_properties", "tick_size"))
    tick_value = float(get("symbol_properties", "tick_value"))
    contract_sz = float(get("symbol_properties", "contract_size"))

    swap_long = float(get("swap", "swap_long"))
    swap_short = float(get("swap", "swap_short"))
    swap_mode = get("swap", "swap_mode")
    swap_3day = int(get("swap", "swap_3day"))

    spread_pips = float(get("spread_summary", spread_percentile))

    hour_rows = df[df["section"] == "spread_by_hour"]
    spread_by_hour: dict[int, float] = {}
    for _, row in hour_rows.iterrows():
        hour = int(str(row["key"]).replace("hour_", ""))
        spread_by_hour[hour] = float(row["value"])

    return TransactionCostModel(
        symbol=symbol,
        spread_pips=spread_pips,
        slippage_pips=slippage_pips,
        commission_per_lot=commission_per_lot,
        swap_long_per_night=swap_long,
        swap_short_per_night=swap_short,
        swap_mode=swap_mode,
        swap_triple_day=swap_3day,
        pip_factor=pip_factor,
        point=point,
        tick_value=tick_value,
        tick_size=tick_size,
        contract_size=contract_sz,
        lot_size=lot_size,
        account_currency_rate=account_currency_rate,
        spread_by_hour=spread_by_hour,
    )
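
The loader assumes a long-format CSV with section, key, and value columns. The sketch below shows the same lookup pattern on an abbreviated, made-up sample using only the standard library. The rows and numbers are assumptions for illustration; a real export from the collector contains more keys.

```python
import csv
import io

# Abbreviated, hypothetical sample of the section/key/value layout.
SAMPLE_CSV = """section,key,value
symbol_properties,symbol,EURUSD
symbol_properties,point,0.00001
symbol_properties,pip_factor,10
spread_summary,mean_pips,0.8
spread_summary,p95_pips,1.2
spread_by_hour,hour_0,1.9
spread_by_hour,hour_9,0.7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def get(section: str, key: str) -> str:
    """Return the value for one (section, key) pair or raise KeyError."""
    for row in rows:
        if row["section"] == section and row["key"] == key:
            return row["value"].strip()
    raise KeyError(f"Missing: section={section!r} key={key!r}")


spread_pips = float(get("spread_summary", "p95_pips"))

# Per-hour rows use keys of the form hour_<n>; strip the prefix to get the hour.
spread_by_hour = {
    int(row["key"].replace("hour_", "")): float(row["value"])
    for row in rows
    if row["section"] == "spread_by_hour"
}

print(spread_pips, spread_by_hour)
```

Raising on a missing key, as load_cost_model() does, is preferable to a silent default because a truncated export then fails at load time, not inside the labeling run.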

Integration with the Labeling and P&L Pipeline

With both parts in place, the pipeline integration has five steps: load the model, inspect the cost breakdown at a reference price, derive min_ret for labeling, pass it to get_events(), and pass the same cost parameters to triple_barrier_pnl(). The first two steps amount to a mandatory pre-flight check: inspect the summary() output before committing to any threshold. If any component looks anomalous — a commission fraction that is larger than the spread fraction, for example, or a swap cost that accounts for the majority of the total — investigate the source data before proceeding.

from pathlib import Path
import pandas as pd
from afml.transaction_costs import load_cost_model

# Step 1: Load model from MQL5 export
model = load_cost_model(
    csv_path           = Path("data/EURUSD_costs.csv"),
    spread_percentile  = "p95_pips",   # conservative: 95th percentile
    slippage_pips      = 0.4,          # derived from demo trade log
    commission_per_lot = 7.0,          # per-side; confirmed from reference trade
    lot_size           = 0.01,
)

# Step 2: Inspect cost breakdown before committing to a threshold
print(pd.Series(model.summary(entry_price=1.08500, holding_days=1.0)))
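
The two anomaly checks named above can be automated as a rough pre-flight gate. This is a minimal sketch that operates on the dictionary returned by summary(); the function name preflight_warnings, the 50% swap share cutoff, and the sample numbers are assumptions, not part of the attached module.

```python
def preflight_warnings(summary: dict, swap_share_limit: float = 0.5) -> list[str]:
    """Flag cost breakdowns that deserve a manual look before labeling."""
    warnings = []

    # A commission larger than the spread is unusual enough to double-check.
    if summary["commission_frac"] > summary["spread_frac"]:
        warnings.append("commission fraction exceeds spread fraction")

    # Swap making up most of the total cost suggests a rate or mode problem.
    for side in ("long", "short"):
        total = summary[f"total_{side}_frac"]
        swap = summary[f"swap_{side}_frac"]
        if total > 0 and swap > swap_share_limit * total:
            warnings.append(f"swap dominates total cost on the {side} side")

    return warnings


# Hypothetical breakdown using the same keys that summary() returns.
example = {
    "spread_frac": 0.00011,
    "slippage_frac": 0.00007,
    "commission_frac": 0.00013,
    "swap_long_frac": 0.00040,
    "swap_short_frac": -0.00010,
    "total_long_frac": 0.00071,
    "total_short_frac": 0.00021,
}
issues = preflight_warnings(example)
for issue in issues:
    print("CHECK:", issue)
```

A warning here does not mean the data is wrong, only that the source values should be confirmed before a threshold is derived from them.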

The summary() output is the first thing to examine. It shows each cost component as a fraction of the entry price, along with the totals expressed in both fractions and pips. On EURUSD at 1.085, with a 95th-percentile spread of 1.2 pips, 0.4 pips of slippage, and a commission equivalent of roughly 0.07 pips for a 0.01 lot, the total intraday round-trip cost is approximately 1.67 pips. Once the breakdown is verified, derive min_ret and pass it to the labeling call.

close = pd.read_parquet("data/EURUSD_H1.parquet")["close"]

# Step 3: Derive min_ret for the labeling call
min_ret_intraday = model.min_ret_for_symbol(
    price_series=close,
    holding_days=0.0,   # no overnight exposure
    cost_multiplier=1.5,
)

# For H1 with max_hold=48: 48 bars × 1 hour = 48 hours = 2.0 calendar days
min_ret_swing = model.min_ret_for_symbol(
    price_series=close,
    holding_days=2.0,
    cost_multiplier=1.5,
)

print(f"min_ret (intraday): {min_ret_intraday:.6f}")
print(f"min_ret (swing):    {min_ret_swing:.6f}")

# Step 4: Pass to get_events for triple-barrier label construction
events = get_events(
    close=close,
    t_events=sampled_idx,
    pt_sl=[2, 1],
    target=volatility,
    min_ret=min_ret_swing,  # cost-derived labeling threshold
    num_threads=4,
    vertical_barrier_times=barriers,
)

The most common integration mistake is to use one set of cost assumptions for labeling and a different set when computing the P&L series that feeds downstream parameter estimation. If min_ret is calibrated to a p95 spread of 1.2 pips but the P&L pipeline deducts only a hardcoded 0.5 pips, then E[P&L] and the average winning-trade value will be more optimistic than the cost basis the labels assumed. The Part 15 O-U parameter estimates, and the optimal PT and SL derived from them, will then be set wider than the realized cost environment supports. Avoid this by sourcing cost parameters from the model object rather than hardcoding them at each call site.

# Step 5: Pass the same cost parameters to the P&L pipeline
#          model.spread_pips and model.slippage_pips are the p95 values from
#          the CSV — the same figures used to compute min_ret. Do not substitute
#          separate hardcoded constants here.
pnl_series = triple_barrier_pnl(
    df=bars,
    signals=sigs,
    atr=at,
    pt_mult=2.0,
    sl_mult=1.0,
    max_hold=48,
    spread_pips=model.spread_pips,     # from cost model, not hardcoded
    slippage_pips=model.slippage_pips,   # include slippage in P&L
    pip=0.0001,
)

The table below maps each parameter in the model to its source, the recommended value where one exists, and the specific caution associated with each.

	Parameter	Source	Notes
1.	spread_pips	MQL5 script CSV (spread_summary)	Use p95_pips by default. min_ret_for_symbol() reads it from the model; pass the same model.spread_pips to triple_barrier_pnl() rather than hardcoding a separate constant at each call site.
2.	slippage_pips	Live or demo trade log	Cannot be derived from historical bar data. 0.4–0.7 pips is a conservative starting range for major forex pairs. Include in both the labeling cost and the P&L deduction.
3.	commission_per_lot	Reference trade on demo account	Per-side rate. Open a 1.0 lot reference trade and read the commission charged for it. ACCOUNT_COMMISSION_BLOCKED reports only the commission reserved for currently open positions, so treat it as a diagnostic, not as a per-lot rate. Confirm from the account statement. Zero for spread-only brokers.
4.	swap_long_per_night / swap_short_per_night	MQL5 script CSV (swap)	Collected automatically. MQL5 sign convention: negative = debit. Re-run the script quarterly; swap rates change with central bank policy.
5.	holding_days	Strategy design	Calendar days, not trading bars. For H1 with max_hold=48: 48 × 1 hour = 2.0 calendar days. For H4 with max_hold=30: 30 × 4 hours = 120 hours = 5.0 calendar days. Must be consistent between min_ret_for_symbol() and triple_barrier_pnl().
6.	cost_multiplier	Research parameter	1.5 is a reasonable default. Applies to min_ret derivation only — not to the P&L deduction. See Practical Considerations for calibration guidance.

Practical Considerations

Use p95, not the mean, for spread. The mean spread is the average across all market conditions, including the liquid London-session hours and quiet Asian consolidation periods. Your strategy will occasionally enter during illiquid windows — news events, session opens, thin early-morning hours — and those are precisely the entries where the spread is widest and a mean-calibrated cost model will be most wrong. The 95th percentile is a more honest representation of the spread environment your entries will actually face. If your strategy enforces hard entry-hour filters, you can substitute session_adjusted_spread_pips() to use the hourly mean for entries during those specific hours rather than a single distribution-wide percentile.
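
The gap between the mean and the 95th percentile is easiest to see on a skewed sample. The sketch below uses a made-up spread distribution and a simple nearest-rank percentile; the collector script may compute its percentiles differently, so treat the helper as an assumption.

```python
import math
import statistics


def nearest_rank_percentile(values, pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical spread sample in pips: mostly tight, with a few wide prints
# of the kind seen around news events and session opens.
spreads = [0.6] * 50 + [0.8] * 40 + [1.5] * 6 + [4.0] * 4

mean_spread = statistics.mean(spreads)
p95_spread = nearest_rank_percentile(spreads, 95)

print(f"mean: {mean_spread:.2f} pips, p95: {p95_spread:.2f} pips")
```

In this sample a mean-calibrated model would understate the cost of roughly one entry in ten, and those are the entries made in the widest-spread conditions.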

Round-Trip Cost Breakdown by Holding Period (EURUSD)

Figure 3. 2-panel illustration of cost component breakdown for intraday and swing holding periods

Panel (a) — Intraday (holding_days=0.0): Spread and slippage account for the full round-trip cost; swap is zero. Commission contributes a small but non-negligible fraction on ECN accounts.
Panel (b) — Swing (holding_days=2.0): Swap accrual makes the total cost meaningfully larger than the intraday figure. Strategies calibrated on intraday cost estimates will systematically mislabel swing-holding-period trades.

The commission gap is intentional. ACCOUNT_COMMISSION_BLOCKED is not a per-lot rate; it is the commission reserved for currently open positions. When no positions are open, it returns zero. To estimate the per-side rate, open a 1.0 lot reference trade on a demo account, read the commission from the Account tab, and confirm from the end-of-day statement. Treat this as a one-time calibration per broker relationship, repeated whenever you change accounts or brokers.

The cost_multiplier=1.5 is a research parameter, not a constant. For a strategy with a high win rate and short holding period, 1.5× may be too conservative and will over-filter genuine signals. For a strategy with a moderate win rate on longer holding periods, 1.5× may be too loose. The correct way to calibrate it is to examine the label distribution before and after the filter.

Label Distribution Before vs After Cost-Calibrated min_ret

Figure 4. 2-panel illustration of label distribution before and after the cost-calibrated min_ret filter

Panel (a) — Before: Raw label distribution including trades that barely clear or fail to clear the round-trip cost. The zero-label class is under-represented relative to what the cost environment warrants.
Panel (b) — After: Label distribution after applying a cost-calibrated min_ret. Trades removed by the filter that were previously labeled positive indicate that the filter is removing genuine edge — the multiplier is too aggressive. Trades removed uniformly across all original label outcomes indicate the filter is removing noise.

If the removed labels are approximately uniformly distributed across winning, losing, and zero-outcome categories, the filter is removing noise — borderline trades that the market treats as coin flips. If the removed labels are disproportionately concentrated in winning trades, the filter is removing edge; the profit threshold is above actual market structure and should be loosened. The goal is to filter labels that are indistinguishable from friction, not labels that represent real opportunity.
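
One way to run this check is to compare the removal rate within each original label class. The following is a minimal sketch under the assumption that you hold the unfiltered labels and a boolean mask marking which events survive the min_ret filter; the function name and the sample data are hypothetical.

```python
from collections import Counter


def removal_rate_by_label(labels, kept) -> dict:
    """Fraction of events in each label class that the filter removed."""
    total = Counter(labels)
    removed = Counter(lab for lab, keep in zip(labels, kept) if not keep)
    return {lab: removed[lab] / total[lab] for lab in sorted(total)}


# Hypothetical unfiltered labels: 10 winners, 10 losers, 10 zero outcomes.
labels = [1] * 10 + [-1] * 10 + [0] * 10

# Case A: the filter removes 2 events from every class.
kept_uniform = ([False] * 2 + [True] * 8) * 3
# Case B: the filter removes 6 winners and nothing else.
kept_skewed = [False] * 6 + [True] * 4 + [True] * 20

rates_uniform = removal_rate_by_label(labels, kept_uniform)
rates_skewed = removal_rate_by_label(labels, kept_skewed)

print("uniform:", rates_uniform)
print("skewed: ", rates_skewed)
```

Similar rates across classes, as in case A, are consistent with the filter removing noise. A rate concentrated in the winning class, as in case B, suggests the multiplier should be lowered.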

The cost_multiplier applies to labeling only — not to P&L computation. When you pass model.spread_pips and model.slippage_pips to triple_barrier_pnl(), you are deducting the raw cost from each trade outcome with no multiplier. The multiplier is a conservatism buffer for the binary pass/fail labeling gate. Inflating costs in the P&L series would distort the distribution used for O-U parameter estimation in Part 15.

Calendar days vs. trading bars. The holding_days parameter is in calendar days for the purpose of swap accrual, but max_hold is in trading bars. The two agree for entries early in the week: 48 H1 bars starting Monday morning end Wednesday morning, spanning exactly 2 calendar days. They diverge near weekends. A position entered Thursday at 10:00 with max_hold=48 bars will close around Monday at 10:00, spanning 4 calendar nights despite covering only 48 trading hours. For strategies with consistent intraday entry patterns, the discrepancy is small. For strategies that can enter late in the trading week, consider using the maximum possible calendar span rather than the mean when setting holding_days.

Re-run quarterly, and always after changing brokers. Spread distributions shift with changes in market structure and broker liquidity arrangements. Swap rates change with central bank policy. A cost model calibrated twelve months ago on one broker may meaningfully understate costs on a different broker or in the current rate environment. The MQL5 script takes a few seconds on any chart and produces a fresh CSV immediately. Re-run it at the start of each new training cycle.

Conclusion

The two components described here close a concrete, operational gap: they replace ad hoc, hardcoded min_ret constants with a broker‑measured, repeatable cost calibration that is directly usable in both labeling and P&L computation. Practically, the pipeline delivers four tangible artefacts you should integrate into your research flow:

a CSV export from MetaTrader 5 containing spread percentiles, hourly spread means, swap parameters and symbol metadata;
a TransactionCostModel that converts those values into fractional returns and computes per‑trade cost components;
a numeric min_ret derived from the median round‑trip cost (optionally multiplied by a safety factor) ready to pass to get_events();
the same cost parameters fed into triple_barrier_pnl() so labels and realized P&L are computed on a single, consistent cost basis.

Three design rules matter in practice. Express costs as fractional returns so the model is instrument‑agnostic; use a conservative spread percentile (p95) or session‑adjusted hourly spreads rather than the mean; and treat slippage and per‑lot commission as caller‑supplied calibrations (they require execution logs or a reference trade). Finally, make this measurement part of your cadence: re‑run the MQL5 collector quarterly and whenever you change brokers or accounts, and always use the same TransactionCostModel outputs for both labeling and P&L to avoid subtle but consequential mismatches in downstream parameter estimation.

Attached Files

	File	Module	Role in this article	Key dependencies
1.	TransactionCostCollector.mq5	MQL5 Scripts	Standalone script. Collects spread distribution percentiles and per-hour session means via CopySpread(), swap rates via SymbolInfoDouble(), all symbol properties, and commission diagnostic from ACCOUNT_COMMISSION_BLOCKED. Writes a structured CSV to the MQL5 Files directory. Run once per instrument per quarter or after any broker change.	MQL5 standard library only.
2.	transaction_costs.py	afml	Python module containing the TransactionCostModel dataclass and load_cost_model() factory. Computes spread, slippage, commission, and swap as fractional returns. Handles all three MQL5 swap modes and the triple-swap day. Swap is sign-aware: carry credits reduce the effective cost rather than inflating it. Exposes min_ret_for_symbol() for the labeling pipeline, session_adjusted_spread_pips() for hour-aware strategies, and summary() for cost inspection. model.spread_pips and model.slippage_pips should also be passed to triple_barrier_pnl() in the P&L pipeline.	pandas, dataclasses, pathlib


transaction_costs.py (13.42 KB)

TransactionCostCollector.mq5 (15.6 KB)

Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.

This article was written by a user of the site and reflects their personal views. MetaQuotes Ltd is not responsible for the accuracy of the information presented, nor for any consequences resulting from the use of the solutions, strategies or recommendations described.