Encoding Candlestick Patterns (Part 3): Frequency Analysis for Single Candlestick Type Structure

Daniel Opoku

Introduction

Financial price series combine many interacting drivers—macro data, liquidity, policy, news and trader behavior—so charts often look noisy. Yet price action repeatedly produces recognizable candle shapes and local structures. In Parts 1–2 we formalized this observation: individual candles were converted into a finite alphabet of symbols (Part 1) and tools were developed to enumerate possible symbol combinations (Part 2). What remained unresolved was a practical, data-driven question: which of those encoded candle types actually occur in live markets, how often, and what are the reproducible outputs a practitioner can use?

This article closes that basic gap by providing a focused, reproducible frequency analysis of single‑candle symbols. Target audience: traders, algo developers and researchers who want a measurable market profile as input to further modeling. Scope and limits: we analyze only individual candlestick symbols (one‑letter patterns) — not multi‑candle sequences — and we treat Marubozu variants as a single category while grouping all non‑matching candles under the unclassified underscore (“_”).

Using 1,500‑candle samples from GBP/USD and Gold (XAUUSD) on H1, M15 and M5 timeframes, we (1) apply the Part‑1 encoding to historical bars, (2) count occurrences of A/G/H/E/a/g/h/e/D/_ symbols, and (3) compute raw counts and percentages. The deliverable is a reproducible MQL5 script and a TXT report that together produce: the encoded series, per‑symbol counts and percentages, and simple summary metrics (dominant symbols, share of unclassified candles, and bullish/bearish symmetry). These measurable outputs are intended to be the stable input for the next step: frequency analysis of two-letter patterns and transition probabilities.

Frequency analysis is widely used in many disciplines, including linguistics, genetics, and data science, to identify dominant structures within large datasets. Applying the same principle to encoded candlestick data allows us to determine which market structures occur most frequently, which are relatively rare, and whether certain combinations exhibit recurring behavior that may warrant further investigation.


Objective

The primary objective of this article is to perform a statistical frequency analysis of encoded candlestick patterns using real market data. Specifically, we will investigate the occurrence of individual candlestick structures (single-symbol patterns) within historical price series.

Using Gold (XAUUSD) and GBPUSD as case studies, we will:

  1. Convert historical candlestick data into the alphabet-based encoding system as in Part 1.
  2. Extract individual encoded symbols from the encoded market sequence.
  3. Compute the frequency of occurrence and percentage of each symbol.
  4. Compare pattern distributions across different financial instruments.
  5. Identify dominant candlestick structures within the dataset.

To accomplish these objectives, we will develop an MQL5 script capable of automatically encoding historical price data, extracting pattern sequences, counting occurrences, and generating statistical summaries.

The results obtained from this analysis will provide a quantitative foundation for future studies on pattern reliability, transition probabilities, conditional occurrence analysis, and predictive modeling. Rather than treating candlestick patterns solely as visual formations, we begin to view them as measurable statistical events whose behavior can be systematically analyzed and compared across markets.

Ultimately, frequency analysis serves as the bridge between pattern generation and statistical modeling. Once the occurrence characteristics of encoded candlestick structures are understood, traders and researchers can proceed to investigate whether frequently occurring patterns possess exploitable market tendencies and whether rare patterns carry unique informational value.


Single-Candlestick Frequency Analysis

In this section, we examine the frequency distribution of individual encoded candlestick types for Gold (XAUUSD) and GBPUSD across the H1, M15, and M5 timeframes. The objective is to identify the dominant candlestick structures within the encoded price-action series and to determine whether certain candle types occur more frequently than others.

Before discussing the results, it is important to clarify two aspects of the encoding scheme introduced in Part 1. First, the underscore symbol ('_') represents unclassified candlesticks. These are candlesticks that do not satisfy any of the predefined bullish or bearish classification rules and are therefore grouped into a single unclassified category.

Second, the Marubozu class (Part 1) includes three related candlestick definitions. They differ slightly, but all show a strong directional body with minimal or no shadows; therefore, they are treated as a single category for encoding and analysis.

  • GBPUSD Encoded Series

GBPUSD H1 Series

For GBPUSD on the one-hour timeframe, a total of 1,500 candlesticks were transformed into their corresponding alphabetic representations using the encoding framework developed in Part 1. Figure 1 presents the resulting encoded price series, where each letter denotes a specific candlestick structure identified from the historical market data.

Figure 1: GBPUSD H1 Encoded Series

Figure 1 illustrates the market as a sequence of encoded symbols rather than conventional candlestick charts. This transformation enables price action to be analyzed using statistical and computational techniques. As discussed previously, the underscore symbol (_) represents candlesticks that do not satisfy any of the predefined bullish or bearish classification criteria and are therefore categorized as unclassified candlesticks.

Having converted the candlestick data into symbolic form, the next step was to examine the frequency of occurrence of each candle type. The results are summarized in Table 1.

Table 1: GBPUSD H1 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 310 | 20.67% |
| Bullish SpinTop | 'G' | 86 | 5.73% |
| Bullish Pinbar | 'H' | 28 | 1.87% |
| Bullish Inv.Pinbar | 'E' | 50 | 3.33% |
| Bearish Marubozu | 'a' | 311 | 20.73% |
| Bearish SpinTop | 'g' | 90 | 6.00% |
| Bearish Pinbar | 'h' | 38 | 2.53% |
| Bearish Inv.Pinbar | 'e' | 48 | 3.20% |
| Doji | 'D' | 8 | 0.53% |
| Unclassified | '_' | 531 | 35.40% |

Among the bullish candlestick categories, the symbol 'A' recorded the highest frequency with 310 occurrences, representing 20.67% of the total sample. This was followed by 'G' with 86 occurrences (5.73%), 'E' with 50 occurrences (3.33%), and 'H' with 28 occurrences (1.87%).

Similarly, within the bearish candlestick categories, 'a' exhibited the highest occurrence with 311 counts (20.73%), followed by 'g' with 90 counts (6.00%), 'e' with 48 counts (3.20%), and 'h' with 38 counts (2.53%). The neutral candlestick represented by 'D' appeared 8 times, accounting for 0.53% of the dataset.

The unclassified candlestick category '_' recorded 531 occurrences, representing 35.40% of all observations. This indicates that more than one-third of the analyzed candlesticks do not conform to the predefined encoding rules and may require further subdivision or refinement in future studies.

Aggregating the classified candle types shows that bullish candlesticks accounted for 474 observations (31.60%), while bearish candlesticks accounted for 487 observations (32.47%). The difference between the two groups is relatively small, suggesting that the market exhibited a balanced distribution of bullish and bearish price action over the analyzed period.

A closer examination of corresponding bullish and bearish candlestick pairs reveals a remarkable degree of symmetry. For example, the occurrence frequencies of 'A' and 'a', 'G' and 'g', as well as 'E' and 'e', are very similar. This near-equivalence suggests that the encoding framework captures market behavior that is largely balanced between upward and downward price movements.

Given this observed symmetry, it is reasonable to hypothesize that the unclassified candlestick population is also split roughly evenly between bullish and bearish candles. The encoded series cannot confirm this on its own, because the '_' symbol carries no direction. The underlying prices can, however: CandleType() returns 'D' whenever the close equals the open, so every one of the 531 unclassified candles closed either above or below its open.
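One way to test this hypothesis is to read the direction of each '_' candle from its raw prices. The sketch below is a minimal illustration and not part of the published script: UnclassifiedSplit and SplitUnclassified() are hypothetical names, the code relies on the CandleType() function listed under Code Structure, and it assumes the scan starts at shift 1 so that the still-forming bar is skipped.

```mql5
//--- Hypothetical container for the direction split of '_' candles
struct UnclassifiedSplit
  {
   int               bullish;   // '_' candles with close > open
   int               bearish;   // '_' candles with close < open
  };

//--- Hypothetical helper: split unclassified candles by direction
UnclassifiedSplit SplitUnclassified(int nlookback)
  {
   UnclassifiedSplit s;
   ZeroMemory(s);

//--- shift 0 is the bar still forming, so start at 1 (an assumption)
   for(int shift = 1; shift <= nlookback; shift++)
     {
      if(CandleType(shift) != "_")
         continue;                       // only unclassified candles

      double open  = iOpen(_Symbol, PERIOD_CURRENT, shift);
      double close = iClose(_Symbol, PERIOD_CURRENT, shift);

      if(close > open)
         s.bullish++;
      else
         s.bearish++;                    // close == open is encoded as "D", never "_"
     }
   return s;
  }
```

Provided the same bars are scanned as in the frequency report, the two fields add up to the unclassified count, so their ratio shows directly how close the '_' population is to an even split.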

Overall, the frequency analysis demonstrates that the encoded market series possesses a relatively balanced bullish–bearish structure. This finding establishes a statistical foundation for the subsequent analysis of multi-candlestick patterns, where combinations of encoded symbols can be investigated to determine whether certain sequences occur more frequently than would be expected by chance.

GBPUSD M15 Series

To investigate whether the frequency characteristics observed in the H1 timeframe persist on a lower timeframe, the same analysis was performed on 1,500 GBPUSD candlesticks from the 15-minute (M15) timeframe. The candlesticks were transformed into their corresponding encoded alphabet series, as illustrated in Figure 2.

Figure 2: GBPUSD M15 Encoded Series

The frequency distribution of the encoded candlestick types is summarized in Table 2.

Table 2: GBPUSD M15 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 306 | 20.40% |
| Bullish SpinTop | 'G' | 77 | 5.13% |
| Bullish Pinbar | 'H' | 35 | 2.33% |
| Bullish Inv.Pinbar | 'E' | 44 | 2.93% |
| Bearish Marubozu | 'a' | 314 | 20.93% |
| Bearish SpinTop | 'g' | 82 | 5.47% |
| Bearish Pinbar | 'h' | 45 | 3.00% |
| Bearish Inv.Pinbar | 'e' | 44 | 2.93% |
| Doji | 'D' | 13 | 0.87% |
| Unclassified | '_' | 540 | 36.00% |

Among the bullish candlestick categories, 'A' recorded the highest frequency with 306 occurrences, representing 20.40% of the total sample. This was followed by 'G' with 77 occurrences (5.13%), 'E' with 44 occurrences (2.93%), and 'H' with 35 occurrences (2.33%).

For the bearish candlestick categories, 'a' was the most frequently occurring pattern with 314 occurrences (20.93%), followed by 'g' with 82 occurrences (5.47%), 'h' with 45 occurrences (3.00%), and 'e' with 44 occurrences (2.93%).

The neutral candlestick represented by 'D' appeared 13 times, accounting for 0.87% of the dataset. Meanwhile, the unclassified category '_' recorded 540 occurrences, representing 36.00% of all observations.

When aggregated, the bullish candlestick categories contributed 462 occurrences (30.80%), while the bearish categories contributed 485 occurrences (32.33%). Similar to the H1 analysis, the difference between bullish and bearish counts is relatively small.

The same symmetry between corresponding bullish and bearish pairs seen in the GBPUSD H1 series appears here. Its consistency across both the H1 and M15 timeframes suggests that the encoding framework captures an inherent balance between bullish and bearish price movements in the GBPUSD market.

GBPUSD M5 Series

The analysis was further extended to the 5-minute (M5) timeframe to determine whether the observed frequency characteristics remain stable at a higher sampling resolution. As with the previous datasets, 1,500 candlesticks were converted into their encoded alphabet representations, as shown in Figure 3.

Figure 3: GBPUSD M5 Encoded Series

Table 3 summarizes the frequency distribution of the encoded candlestick types.

Table 3: GBPUSD M5 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 329 | 21.93% |
| Bullish SpinTop | 'G' | 71 | 4.73% |
| Bullish Pinbar | 'H' | 39 | 2.60% |
| Bullish Inv.Pinbar | 'E' | 43 | 2.87% |
| Bearish Marubozu | 'a' | 337 | 22.47% |
| Bearish SpinTop | 'g' | 59 | 3.93% |
| Bearish Pinbar | 'h' | 44 | 2.93% |
| Bearish Inv.Pinbar | 'e' | 44 | 2.93% |
| Doji | 'D' | 46 | 3.07% |
| Unclassified | '_' | 488 | 32.53% |

Within the bullish candlestick categories, 'A' remained the dominant structure with 329 occurrences, accounting for 21.93% of the total observations. This was followed by 'G' with 71 occurrences (4.73%), 'E' with 43 occurrences (2.87%), and 'H' with 39 occurrences (2.60%).

Among the bearish candlestick categories, 'a' again recorded the highest frequency with 337 occurrences (22.47%), followed by 'g' with 59 occurrences (3.93%). Both 'e' and 'h' appeared 44 times each, representing 2.93% of the total dataset.

The neutral candlestick category 'D' recorded 46 occurrences, corresponding to 3.07% of all observations. This represents a notable increase compared to both the H1 and M15 datasets. The unclassified candlestick category '_' accounted for 488 occurrences (32.53%), which is lower than the proportion observed in the M15 dataset.

The aggregated bullish candlestick count was 482 (32.13%), while the bearish count was 484 (32.27%), once again demonstrating a remarkably balanced distribution between bullish and bearish market structures.

One notable observation at the M5 timeframe is the increased occurrence of the neutral candlestick ('D'). The frequency of doji-like structures exceeded that of several classified bullish and bearish categories, including E/e and H/h. This suggests that lower timeframes may contain a greater proportion of indecision or equilibrium periods, where neither buyers nor sellers establish a clear directional advantage within a single candle interval.

Despite this increase in neutral candlestick activity, the overall bullish–bearish symmetry remains evident. The frequencies of corresponding bullish and bearish candlestick pairs continue to exhibit close agreement, reinforcing the observations made in both the H1 and M15 analyses.

Overall, the results from the three GBPUSD timeframes reveal a consistent statistical structure. The dominance of the A/a categories, the substantial proportion of unclassified candlesticks, and the near-symmetrical distribution between bullish and bearish patterns appear to be persistent characteristics of the encoded GBPUSD price-action series. The primary distinction at lower timeframes is the increased presence of neutral candlestick structures, which may reflect the noisier and more indecisive nature of short-term market movements.

  • GOLD Encoded Series

Having examined the frequency characteristics of the GBPUSD encoded series, our attention now turns to Gold (XAUUSD) to determine whether similar statistical properties exist in a different financial instrument. Gold is known for exhibiting distinct volatility characteristics and market dynamics compared to major currency pairs. Therefore, analyzing its encoded candlestick distribution provides an opportunity to assess the robustness of the encoding framework across different asset classes.

XAUUSD H1 Series

The same methodology applied to GBPUSD was used for the Gold H1 dataset. A total of 1,500 candlesticks were sampled and transformed into their corresponding alphabetic representations according to the encoding scheme introduced in Part 1. Figure 4 presents the resulting encoded candlestick series.

Figure 4: XAUUSD H1 Encoded Series

The frequency distribution of the encoded candlestick types is summarized in Table 4.

Table 4: XAUUSD H1 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 286 | 19.07% |
| Bullish SpinTop | 'G' | 94 | 6.27% |
| Bullish Pinbar | 'H' | 27 | 1.80% |
| Bullish Inv.Pinbar | 'E' | 39 | 2.60% |
| Bearish Marubozu | 'a' | 310 | 20.67% |
| Bearish SpinTop | 'g' | 108 | 7.20% |
| Bearish Pinbar | 'h' | 44 | 2.93% |
| Bearish Inv.Pinbar | 'e' | 33 | 2.20% |
| Doji | 'D' | 0 | 0.00% |
| Unclassified | '_' | 559 | 37.27% |

Among the bullish candlestick categories, 'A' recorded the highest frequency with 286 occurrences, representing 19.07% of the total sample. This was followed by 'G' with 94 occurrences (6.27%), 'E' with 39 occurrences (2.60%), and 'H' with 27 occurrences (1.80%).

Within the bearish categories, 'a' dominated with 310 occurrences (20.67%), followed by 'g' with 108 occurrences (7.20%), 'h' with 44 occurrences (2.93%), and 'e' with 33 occurrences (2.20%).

Interestingly, no neutral candlestick ('D') was observed within the analyzed sample. The unclassified category '_' accounted for 559 occurrences, representing 37.27% of the dataset, which is the highest proportion observed thus far.

Aggregating the classified candle types yielded 446 bullish occurrences (29.73%) and 495 bearish occurrences (33.00%). Bearish structures appeared more frequently than bullish ones, but the difference of 49 occurrences is about 3.3% of the total sample, so the overall distribution remains relatively balanced.

A comparison of corresponding bullish and bearish candlestick pairs reveals a fair degree of symmetry. The frequencies of A/a, G/g, and E/e are reasonably aligned, while H/h shows the widest relative gap (27 versus 44). This observation suggests that upward and downward price movements occur with broadly similar structural characteristics. Given this balance, it is reasonable to extend the earlier hypothesis that the unclassified candlestick population is also approximately evenly distributed between bullish and bearish structures.

XAUUSD M15 Series

To investigate whether the statistical properties observed in the H1 timeframe persist at a lower timeframe, the analysis was repeated using 1,500 Gold candlesticks from the M15 timeframe. Figure 5 presents the encoded candlestick series.

Figure 5: XAUUSD M15 Encoded Series

The frequency distribution of the encoded symbols is summarized in Table 5.

Table 5: XAUUSD M15 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 317 | 21.13% |
| Bullish SpinTop | 'G' | 89 | 5.93% |
| Bullish Pinbar | 'H' | 33 | 2.20% |
| Bullish Inv.Pinbar | 'E' | 39 | 2.60% |
| Bearish Marubozu | 'a' | 324 | 21.60% |
| Bearish SpinTop | 'g' | 90 | 6.00% |
| Bearish Pinbar | 'h' | 34 | 2.27% |
| Bearish Inv.Pinbar | 'e' | 41 | 2.73% |
| Doji | 'D' | 2 | 0.13% |
| Unclassified | '_' | 531 | 35.40% |

Among the bullish candlestick categories, 'A' again emerged as the dominant structure with 317 occurrences, accounting for 21.13% of the total observations. This was followed by 'G' with 89 occurrences (5.93%), 'E' with 39 occurrences (2.60%), and 'H' with 33 occurrences (2.20%).

For the bearish categories, 'a' remained the most frequent pattern with 324 occurrences (21.60%), followed by 'g' with 90 occurrences (6.00%), 'e' with 41 occurrences (2.73%), and 'h' with 34 occurrences (2.27%).

The neutral candlestick category ('D') appeared only 2 times (0.13%), indicating that indecision-type structures remain relatively rare within the Gold M15 dataset. The unclassified category '_' accounted for 531 occurrences (35.40%).

When grouped by market direction, bullish candlesticks contributed 478 occurrences (31.87%), while bearish candlesticks contributed 489 occurrences (32.60%). The difference between the two groups was only 11 occurrences, representing less than 1% of the total sample.

This result provides even stronger evidence of bullish-bearish symmetry than that observed in the H1 dataset. The frequencies of corresponding bullish and bearish candlestick pairs remain nearly identical, suggesting that the encoded market structure maintains a stable equilibrium between buying and selling activity. Consequently, the assumption regarding the approximate balance of bullish and bearish candlesticks within the unclassified category remains plausible for the M15 timeframe.

XAUUSD M5 Series

The analysis was further extended to the M5 timeframe to determine whether the observed statistical characteristics persist at a finer market resolution. As before, 1,500 Gold candlesticks were transformed into their corresponding encoded alphabet series, as illustrated in Figure 6.

Figure 6: XAUUSD M5 Encoded Series

The frequency distribution of the encoded candlestick types is summarized in Table 6.

Table 6: XAUUSD M5 Candle Counts & Percentages (Window: 1500)

| Candle Type | Symbol | Count | Percentage |
|---|---|---|---|
| Bullish Marubozu | 'A' | 302 | 20.13% |
| Bullish SpinTop | 'G' | 97 | 6.47% |
| Bullish Pinbar | 'H' | 41 | 2.73% |
| Bullish Inv.Pinbar | 'E' | 34 | 2.27% |
| Bearish Marubozu | 'a' | 338 | 22.53% |
| Bearish SpinTop | 'g' | 99 | 6.60% |
| Bearish Pinbar | 'h' | 51 | 3.40% |
| Bearish Inv.Pinbar | 'e' | 37 | 2.47% |
| Doji | 'D' | 2 | 0.13% |
| Unclassified | '_' | 499 | 33.27% |

Within the bullish categories, 'A' remained the dominant pattern with 302 occurrences (20.13%), followed by 'G' with 97 occurrences (6.47%), 'H' with 41 occurrences (2.73%), and 'E' with 34 occurrences (2.27%). For the bearish categories, 'a' recorded the highest frequency with 338 occurrences (22.53%), followed by 'g' with 99 occurrences (6.60%), 'h' with 51 occurrences (3.40%), and 'e' with 37 occurrences (2.47%). The neutral candlestick category ('D') again appeared only 2 times (0.13%), while the unclassified category '_' accounted for 499 occurrences (33.27%), representing a lower proportion than observed in the H1 and M15 datasets.

The total bullish count was 474 occurrences (31.60%), compared with 525 bearish occurrences (35.00%). The bearish count exceeded the bullish count by 51 occurrences, or 3.4% of the 1,500 observations. This is the widest gap among the six datasets, although it remains small relative to the sample size.

The frequencies of corresponding bullish and bearish candlestick pairs continue to exhibit a strong degree of symmetry. While minor deviations become more apparent at this timeframe, the overall structure remains balanced, suggesting that the observed differences may largely reflect short-term market fluctuations rather than a persistent directional bias.

The consistency of this symmetry across the H1, M15, and M5 timeframes provides further support for the hypothesis that the unclassified candlestick population is approximately evenly divided between bullish and bearish structures. Because every dataset here uses the same 1,500-candle window, testing whether the distribution converges toward an even split as the sample grows would require larger windows.

Comparative Discussion: GBPUSD vs Gold

Table 7: Comparing GBPUSD vs Gold

| Metric | GBPUSD H1 | Gold H1 | GBPUSD M15 | Gold M15 | GBPUSD M5 | Gold M5 |
|---|---|---|---|---|---|---|
| 'A' + 'a' % | ~41.4 | ~39.7 | ~41.3 | ~42.7 | ~44.4 | ~42.7 |
| Bullish % | 31.60 | 29.73 | 30.80 | 31.87 | 32.13 | 31.60 |
| Bearish % | 32.47 | 33.00 | 32.33 | 32.60 | 32.27 | 35.00 |
| Neutral 'D' % | 0.53 | 0.00 | 0.87 | 0.13 | 3.07 | 0.13 |
| Unclassified % | 35.40 | 37.27 | 36.00 | 35.40 | 32.53 | 33.27 |

Overall, the Gold datasets are similar to GBPUSD. Across all three timeframes, A/a are dominant, G/g are typically second, and bullish and bearish frequencies are closely matched. This suggests that the encoding captures structural properties that generalize across these two markets, which strengthens confidence in the methodology and provides a foundation for the subsequent analysis of multi-candlestick pattern frequencies.

Unclassified candles are consistently high (roughly 32–37%) across all timeframes and instruments, which reflects the strictness of the Part 1 thresholds: about a third of all candles fit none of the Marubozu, spinning top, pinbar, inverted pinbar, or doji definitions. These candles should not be ignored; the CandleType() function could be extended with additional types to reduce this share.


Practical Implication for Traders

From a frequency‑based trading perspective, the data suggests that focusing on 'A' and 'a' patterns (strong, near‑marubozu candles) provides the highest number of trading opportunities. However, because they are so common, they may also generate many false signals if used in isolation. Conversely, patterns involving 'H', 'h', or 'D' are statistically rare—trading strategies that rely on them will have few historical examples for backtesting and are unlikely to produce frequent entries.

The near symmetry between bullish and bearish frequencies also reinforces the importance of strict risk management: over a large sample, the market does not favor one side. Any perceived "edge" from a candlestick pattern must come from its contextual placement (e.g., support/resistance, trend phase) rather than from a simple directional imbalance.

Code Structure

To perform the frequency analysis of encoded candlestick patterns, a set of custom structures and functions was developed. The program scans historical candlestick data, converts each candlestick into its corresponding alphabet symbol, counts the occurrence of each symbol, computes frequency statistics, and finally saves the results to an output file.

The CandleCounts structure is a custom data container for the frequency results. It holds one integer counter for each encoded symbol, plus the total number of symbols in the encoded series.

//+------------------------------------------------------------------+
//| Structure to hold candle type counts                             |
//+------------------------------------------------------------------+
struct CandleCounts
  {
   int               A;    // bullish marubozu/Long/short
   int               G;    // bullish spinning top
   int               H;    // bullish pinbar
   int               E;    // bullish inverted pinbar
   int               a;    // bearish marubozu/Long/short
   int               g;    // bearish spinning top
   int               h;    // bearish pinbar
   int               e;    // bearish inverted pinbar
   int               D;    // doji
   int               zero; // unclassified (_)
   int               total;
  };

The CandleType() function is responsible for converting a candlestick into its encoded alphabet representation. The function computes key candlestick properties, including candle body size, upper wick length and lower wick length. Using the classification rules established in Part 1, the function evaluates these characteristics and assigns the appropriate alphabet symbol to the candlestick. If a candlestick does not satisfy any of the predefined classification rules, the function assigns the underscore symbol '_' to indicate an unclassified candlestick. The function accepts candlestick position as input and returns the corresponding encoded symbol as a string.

//+------------------------------------------------------------------+
//| Candle Type labeling function                                    |
//+------------------------------------------------------------------+
string CandleType(int shift)
  {
//--- Get price data for the specified candle
   double open  = iOpen(NULL, 0, shift);
   double close = iClose(NULL, 0, shift);
   double high  = iHigh(NULL, 0, shift);
   double low   = iLow(NULL, 0, shift);

//--- Calculate candle components
   double body      = MathAbs(close - open);
   double upperWick = high - MathMax(open, close);
   double lowerWick = MathMin(open, close) - low;

//--- Handle bullish candles
   if(close > open)
     {
      if(body > 1.5 * upperWick && body > 1.5 * lowerWick)
         return "A"; // long body / marubozu
      if(2 * body < upperWick && 2 * body < lowerWick)
         return "G"; // spinning top
      if(lowerWick > 2.5 * body && lowerWick > 2 * upperWick)
         return "H"; // pinbar
      if(upperWick > 2.5 * body && upperWick > 2 * lowerWick)
         return "E"; // inverted pinbar
     }
//--- Handle bearish candles
   else
      if(close < open)
        {
         if(body > 1.5 * upperWick && body > 1.5 * lowerWick)
            return "a";
         if(2 * body < upperWick && 2 * body < lowerWick)
            return "g";
         if(lowerWick > 2.5 * body && lowerWick > 2 * upperWick)
            return "h";
         if(upperWick > 2.5 * body && upperWick > 2 * lowerWick)
            return "e";
        }
      else
         if(close == open)
            return "D"; // doji

//--- Return _ if no conditions met
   return "_";
  }
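A quick way to sanity-check these rules is to print the symbol assigned to the most recent closed bars and compare them with the chart. The helper below is a hypothetical addition: the name PrintRecentCodes and the choice to skip the forming bar at shift 0 are assumptions. It uses only CandleType() from above and standard MQL5 calls.

```mql5
//--- Hypothetical helper: log bar time and encoded symbol, oldest first
void PrintRecentCodes(int nbars)
  {
   for(int shift = nbars; shift >= 1; shift--)
     {
      datetime barTime = iTime(_Symbol, PERIOD_CURRENT, shift);
      PrintFormat("%s  %s",
                  TimeToString(barTime, TIME_DATE | TIME_MINUTES),
                  CandleType(shift));
     }
  }
```

Calling PrintRecentCodes(10) from the script writes ten lines to the Experts log, one per bar. Because the rules are tested in a fixed order (A, G, H, E), a candle that satisfies more than one condition is labeled with the first match.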

The CountFromSeries() function performs the frequency-counting operation. After the historical price series has been transformed into an encoded alphabet sequence, this function scans the sequence one character at a time and records the occurrence of each candlestick type.

Before counting begins, all counters are reset to zero to ensure that previous calculations do not influence the current analysis. The function then determines the length of the encoded series and iterates through each symbol, updating the appropriate frequency count. The output of this function forms the basis for calculating frequencies and their percentages.

//+------------------------------------------------------------------+
//| Count from an existing series string                             |
//+------------------------------------------------------------------+
CandleCounts CountFromSeries(string series)
{
   CandleCounts c;
   ZeroMemory(c);

   int len = StringLen(series);
   c.total = len;

   for(int i = 0; i < len; i++)
   {
      //--- Read the i-th symbol as a character code and increment its counter
      switch(StringGetCharacter(series, i))
      {
         case 'A': c.A++;    break;
         case 'G': c.G++;    break;
         case 'H': c.H++;    break;
         case 'E': c.E++;    break;
         case 'a': c.a++;    break;
         case 'g': c.g++;    break;
         case 'h': c.h++;    break;
         case 'e': c.e++;    break;
         case 'D': c.D++;    break;
         default:  c.zero++; break;   // '_' or any unexpected character
      }
   }

   return c;
}
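The aggregated bullish and bearish totals quoted in the analysis can be derived from the same counters. The following is a minimal sketch under stated assumptions: BullishTotal(), BearishTotal(), SharePct() and PrintSummary() are hypothetical names that do not appear in the published script, and they assume the CandleCounts structure defined earlier.

```mql5
//--- Hypothetical helpers built on the CandleCounts structure
int BullishTotal(const CandleCounts &c)
  {
   return c.A + c.G + c.H + c.E;
  }

int BearishTotal(const CandleCounts &c)
  {
   return c.a + c.g + c.h + c.e;
  }

//--- Percentage of the whole window, guarded against an empty series
double SharePct(int count, int total)
  {
   return (total > 0) ? (count * 100.0 / total) : 0.0;
  }

//--- Print the direction totals, their gap, and the unclassified share
void PrintSummary(const CandleCounts &c)
  {
   int bull = BullishTotal(c);
   int bear = BearishTotal(c);
   int gap  = (bull > bear) ? (bull - bear) : (bear - bull);

   PrintFormat("Bullish:      %d (%.2f%%)", bull, SharePct(bull, c.total));
   PrintFormat("Bearish:      %d (%.2f%%)", bear, SharePct(bear, c.total));
   PrintFormat("Gap:          %d (%.2f%%)", gap, SharePct(gap, c.total));
   PrintFormat("Unclassified: %d (%.2f%%)", c.zero, SharePct(c.zero, c.total));
  }
```

For the GBPUSD H1 counts in Table 1 this gives 474 bullish (31.60%) and 487 bearish (32.47%). Dividing the summed count by the total avoids the small drift that comes from adding already-rounded percentages (20.73 + 6.00 + 2.53 + 3.20 = 32.46).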

The FormatLine() function is responsible for formatting the output records into a consistent and readable layout. For each candlestick type, the function combines the candlestick label, encoded symbol, frequency count & percentage occurrence into a predefined text format suitable for display and file output. Using a dedicated formatting function improves code readability and ensures that all output records follow a uniform structure.

//+------------------------------------------------------------------+
//|    Format count + percentage line                                |
//+------------------------------------------------------------------+
string FormatLine(string label, string code, int count, int total)
{
   double pct = (total > 0) ? (count * 100.0 / total) : 0.0;
   return StringFormat("%-20s (%s): %3d  (%5.2f%%)", label, code, count, pct);
}
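As a quick illustration of the layout, the call below feeds FormatLine() the Bullish Marubozu figures from Table 1. The wrapper name DemoFormatLine is hypothetical and exists only for this example.

```mql5
//--- Hypothetical demo: format one report line and send it to the log
void DemoFormatLine()
  {
//--- 310 of 1500 candles; the percentage is computed inside FormatLine()
   Print(FormatLine("Bullish Marubozu", "A", 310, 1500));
  }
```

In the format string, %-20s left-aligns the label in a 20-character field, %3d right-aligns the count, and %5.2f prints the percentage with two decimals, so the line should show the label, the code in parentheses, the count 310 and the share 20.67%.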

The SaveMarketStructureToFile() function handles the generation of the final report.

The function performs the following tasks:

  1. Creates the output file name.
  2. Opens a text file for writing.
  3. Validates that the file was successfully opened.
  4. Writes the frequency analysis results into the file.
  5. Closes the file after all records have been saved.

If the file cannot be opened, the function immediately terminates and returns without writing any data. Otherwise, it proceeds to generate the complete report.

Upon successful completion, the following message is displayed: Results saved to file.

This function acts as the final stage of the analysis pipeline by preserving the generated statistics for further review.

//+------------------------------------------------------------------+
//| Write series + counts + percentages to TXT file                  |
//+------------------------------------------------------------------+
void SaveMarketStructureToFile(string series, int nlookback, string filename = "")
  {
//--- Build default filename if none provided
   if(StringLen(filename) == 0)
      filename = _Symbol + "Mdata2.txt";

//--- open/create file
   int handle = FileOpen(filename, FILE_TXT | FILE_WRITE);

//--- check if file opened successfully
   if(handle == INVALID_HANDLE)
     {
      Print("Failed to open file: ", filename);
      return;
     }

//--- 1) Write the continuous series string
   FileWriteString(handle, "=== MARKET CODED STRUCTURE " + TimeFrameToString(Period()) + " SERIES ===\n");
   FileWriteString(handle, series);
   FileWriteString(handle, "\n\n");

//--- 2) Generate counts
   CandleCounts c = CountFromSeries(series);

//--- 3) Build report with percentages
   string report =
      "=== CANDLE COUNTS & PERCENTAGES (Window: " + IntegerToString(nlookback) + ") ===\n" +
      "----------------------------------------------------------------\n" +
      FormatLine("Bullish Marubozu",   "A", c.A, c.total)     + "\n" +
      FormatLine("Bullish SpinTop",    "G", c.G, c.total)     + "\n" +
      FormatLine("Bullish Pinbar",     "H", c.H, c.total)     + "\n" +
      FormatLine("Bullish Inv.Pinbar", "E", c.E, c.total)     + "\n" +
      FormatLine("Bearish Marubozu",   "a", c.a, c.total)     + "\n" +
      FormatLine("Bearish SpinTop",    "g", c.g, c.total)     + "\n" +
      FormatLine("Bearish Pinbar",     "h", c.h, c.total)     + "\n" +
      FormatLine("Bearish Inv.Pinbar", "e", c.e, c.total)     + "\n" +
      FormatLine("Doji",               "D", c.D, c.total)     + "\n" +
      FormatLine("Unclassified",       "_", c.zero, c.total)  + "\n" +
      "----------------------------------------------------------------\n" +
      StringFormat("Total: %-3d  (100.00%%)", c.total);

//--- 4) Write report to file
   FileWriteString(handle, report);

//--- close file
   FileClose(handle);

//--- output message
   MessageBox(StringFormat("Results saved to file: %s", filename), "Prompt");
  }
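When the report does not appear where expected, it helps to know that FileOpen() without the FILE_COMMON flag works inside the terminal's file sandbox. The sketch below is not part of the attached script: it writes a small test file, logs the error code if the open fails, and prints the expected location. The file name is a hypothetical one chosen for illustration, and the printed path assumes a normal chart run (the Strategy Tester uses its own agent folder).

```mql5
//--- Sketch: open a text file, report failures, and log where the file lands
void OnStart()
  {
   string filename = _Symbol + "_sketch.txt";   // hypothetical name, not the article's

   ResetLastError();
   int handle = FileOpen(filename, FILE_TXT | FILE_WRITE);
   if(handle == INVALID_HANDLE)
     {
      PrintFormat("FileOpen failed for %s, error code %d", filename, GetLastError());
      return;
     }

   FileWriteString(handle, "test line\n");
   FileClose(handle);

   //--- without FILE_COMMON, files are created under <data folder>\MQL5\Files
   string path = TerminalInfoString(TERMINAL_DATA_PATH) + "\\MQL5\\Files\\" + filename;
   Print("File written to: ", path);
  }
```

The same GetLastError() call can be added to the failure branch of SaveMarketStructureToFile() if a more specific diagnostic than "Failed to open file" is needed.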


Program Flow

The analysis begins by scanning historical candlestick data from the selected financial instrument and timeframe. Each candlestick is passed to the CandleType() function, where it is converted into its corresponding alphabet symbol according to the encoding rules defined in Part 1. These symbols are concatenated to form an encoded market series.

After the encoded series has been generated, the SaveMarketStructureToFile() function is called to produce the final statistical report. Within this function, CountFromSeries() scans the encoded sequence and records the frequency of each candlestick type.

Next, the FormatLine() function converts each raw count into a percentage of the total and organizes the record into a standardized output format containing the candlestick label, encoded symbol, frequency count, and percentage occurrence.

Finally, SaveMarketStructureToFile() writes the formatted results to a text file and closes the file upon completion. When the process finishes successfully, the program displays the message: Results saved to file.

This workflow can be summarized as follows:

Figure 7: Workflow

Code Usage Demonstration

Figures 8–10 show the script's output on GBPJPY for the H1, M15, and M5 timeframes. Each run produces the encoded series and the per-symbol statistics, which can be compared with the earlier results to support informed decisions.

Figure 8: GBPJPY H1 Encoded Series and Statistics

Figure 9: GBPJPY M15 Encoded Series and Statistics

Figure 10: GBPJPY M5 Encoded Series and Statistics


Conclusion

This study translated the symbolic encoding framework into a practical, reproducible market profiling step. By processing 1,500‑bar windows for GBPUSD and XAUUSD across H1, M15 and M5, we produced per‑symbol frequency tables that show consistent patterns:

  • The Marubozu category (A/a) is the most frequent across instruments and timeframes. SpinTop (G/g) tends to be the second most common.
  • Bullish vs. bearish totals are closely matched in all samples, supporting a hypothesis of near‑symmetry in the classified population.
  • A substantial fraction (≈32–37%) of candles are unclassified (“_”), reflecting strict classification thresholds; this flag should be interpreted as a signal to adjust rules or expand the taxonomy if needed.
  • Lower timeframes (M5) show an increased incidence of neutral/doji candles in some samples, which is consistent with higher short‑term indecision and noise.
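The bullish/bearish symmetry and the unclassified share described above can be checked directly on any encoded series. The following is a minimal sketch, not the attached script: it groups the article's symbols into bullish (A, G, H, E), bearish (a, g, h, e), doji (D), and unclassified (_) and prints each group's share. The sample string is made up for illustration and is not market data.

```mql5
//--- Sketch: bullish / bearish / doji / unclassified shares of an encoded series
void OnStart()
  {
   string series = "AagG_hD_Ae";   // made-up sample: 3 bullish, 4 bearish, 1 doji, 2 unclassified
   int bull = 0, bear = 0, doji = 0, unclassified = 0;
   int n = StringLen(series);

   for(int i = 0; i < n; i++)
     {
      ushort ch = StringGetCharacter(series, i);
      if(ch == 'A' || ch == 'G' || ch == 'H' || ch == 'E')
         bull++;
      else if(ch == 'a' || ch == 'g' || ch == 'h' || ch == 'e')
         bear++;
      else if(ch == 'D')
         doji++;
      else if(ch == '_')
         unclassified++;
     }

   if(n == 0)
     {
      Print("Empty series, nothing to summarize");
      return;
     }

   PrintFormat("Bullish:      %d (%.2f%%)", bull,         100.0 * bull / n);
   PrintFormat("Bearish:      %d (%.2f%%)", bear,         100.0 * bear / n);
   PrintFormat("Doji:         %d (%.2f%%)", doji,         100.0 * doji / n);
   PrintFormat("Unclassified: %d (%.2f%%)", unclassified, 100.0 * unclassified / n);
  }
```

Replacing the sample string with the series written to the TXT report gives the group totals for a real window, which makes the near-symmetry claim easy to re-test on other symbols and timeframes.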

Practical output: the MQL5 script and TXT report let you (a) produce the encoded symbol string for any symbol/TF, (b) obtain counts and percentages for A/G/H/E/a/g/h/e/D/_ and (c) use these profiles as objective inputs to further testing. Importantly, single‑candle frequency alone does not constitute a trading edge—any hypothesis of predictability requires contextual filters (trend, support/resistance, position in sequences) and out‑of‑sample testing.

The analysis is not restricted to 1,500 candlesticks. The Lookback input parameter determines the sample size used in the frequency analysis, so traders and researchers can analyze fewer or more candlesticks depending on the research objective.
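When the sample size is increased, the requested number of bars may exceed the history loaded for the chart. The sketch below shows one way to guard against that. It is an illustration under stated assumptions rather than the attached script's logic: the default of 1500 and the choice to start copying from the current bar (position 0) are assumptions, and only the input name Lookback follows the article.

```mql5
//--- Sketch: clamp the requested sample size to the history that is loaded
#property script_show_inputs

input int Lookback = 1500;   // assumed default; the input name follows the article

void OnStart()
  {
   int available = Bars(_Symbol, _Period);        // bars loaded for this symbol/timeframe
   int n = (int)MathMin(Lookback, available);     // never request more than exists
   if(n <= 0)
     {
      Print("No bars available for ", _Symbol);
      return;
     }
   if(n < Lookback)
      PrintFormat("Requested %d bars, only %d available", Lookback, n);

   MqlRates rates[];
   ResetLastError();
   int copied = CopyRates(_Symbol, _Period, 0, n, rates);   // position 0 is the current bar
   if(copied <= 0)
     {
      Print("CopyRates failed, error ", GetLastError());
      return;
     }
   PrintFormat("Sample size used for the frequency analysis: %d", copied);
  }
```

Reporting the number of bars actually copied alongside the percentages keeps results from different runs comparable, since the percentages are only meaningful relative to the sample size.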

Next step: extend this pipeline to consecutive two‑symbol patterns and transition statistics. Those sequence-level frequencies and conditional probabilities are the natural follow-up required to evaluate pattern reliability and to begin building predictive models.

Attached files: MarketSign_.mq5 (7.61 KB)