Encoding Candlestick Patterns (Part 3): Frequency Analysis for Single Candlestick Type Structure
Introduction
Financial price series combine many interacting drivers—macro data, liquidity, policy, news and trader behavior—so charts often look noisy. Yet price action repeatedly produces recognizable candle shapes and local structures. In Parts 1–2 we formalized this observation: individual candles were converted into a finite alphabet of symbols (Part 1) and tools were developed to enumerate possible symbol combinations (Part 2). What remained unresolved was a practical, data-driven question: which of those encoded candle types actually occur in live markets, how often, and what are the reproducible outputs a practitioner can use?
This article closes that basic gap by providing a focused, reproducible frequency analysis of single‑candle symbols. Target audience: traders, algo developers and researchers who want a measurable market profile as input to further modeling. Scope and limits: we analyze only individual candlestick symbols (one‑letter patterns) — not multi‑candle sequences — and we treat Marubozu variants as a single category while grouping all non‑matching candles under the unclassified underscore (“_”).
Using 1,500‑candle samples from GBP/USD and Gold (XAUUSD) on H1, M15 and M5 timeframes, we (1) apply the Part‑1 encoding to historical bars, (2) count occurrences of A/G/H/E/a/g/h/e/D/_ symbols, and (3) compute raw counts and percentages. The deliverable is a reproducible MQL5 script and a TXT report that together produce: the encoded series, per‑symbol counts and percentages, and simple summary metrics (dominant symbols, share of unclassified candles, and bullish/bearish symmetry). These measurable outputs are intended to be the stable input for the next step: frequency analysis of two-letter patterns and transition probabilities.
Frequency analysis is widely used in many disciplines, including linguistics, genetics, and data science, to identify dominant structures within large datasets. Applying the same principle to encoded candlestick data allows us to determine which market structures occur most frequently, which are relatively rare, and whether certain combinations exhibit recurring behavior that may warrant further investigation.
Objective
The primary objective of this article is to perform a statistical frequency analysis of encoded candlestick patterns using real market data. Specifically, we will investigate the occurrence of individual candlestick structures (single-symbol patterns) within historical price series.
Using Gold (XAUUSD) and GBPUSD as case studies, we will:
- Convert historical candlestick data into the alphabet-based encoding system as in Part 1.
- Extract individual encoded symbols from the encoded market sequence.
- Compute the frequency of occurrence and percentage of each symbol.
- Compare pattern distributions across different financial instruments.
- Identify dominant candlestick structures within the dataset.
To accomplish these objectives, we will develop an MQL5 script capable of automatically encoding historical price data, extracting pattern sequences, counting occurrences, and generating statistical summaries.
The results obtained from this analysis will provide a quantitative foundation for future studies on pattern reliability, transition probabilities, conditional occurrence analysis, and predictive modeling. Rather than treating candlestick patterns solely as visual formations, we begin to view them as measurable statistical events whose behavior can be systematically analyzed and compared across markets.
Ultimately, frequency analysis serves as the bridge between pattern generation and statistical modeling. Once the occurrence characteristics of encoded candlestick structures are understood, traders and researchers can proceed to investigate whether frequently occurring patterns possess exploitable market tendencies and whether rare patterns carry unique informational value.
Single-Candlestick Frequency Analysis
In this section, we examine the frequency distribution of individual encoded candlestick types for Gold (XAUUSD) and GBPUSD across the H1, M15, and M5 timeframes. The objective is to identify the dominant candlestick structures within the encoded price-action series and to determine whether certain candle types occur more frequently than others.
Before discussing the results, it is important to clarify two aspects of the encoding scheme introduced in Part 1. First, the underscore symbol ('_') represents unclassified candlesticks. These are candlesticks that do not satisfy any of the predefined bullish or bearish classification rules and are therefore grouped into a single unclassified category.
Second, the Marubozu class (Part 1) includes three related candlestick definitions. They differ slightly, but all show a strong directional body with minimal or no shadows; therefore, they are treated as a single category for encoding and analysis.
- GBPUSD Encoded Series
GBPUSD H1 Series
For GBPUSD on the one-hour (H1) timeframe, 1,500 candlesticks were transformed into their alphabetic representations using the encoding framework developed in Part 1. Figure 1 presents the resulting encoded price series, where each letter denotes a specific candlestick structure identified from the historical market data.

Figure 1: GBPUSD H1 Encoded Series
Figure 1 illustrates the market as a sequence of encoded symbols rather than conventional candlestick charts. This transformation enables price action to be analyzed using statistical and computational techniques. As discussed previously, the underscore symbol (_) represents candlesticks that do not satisfy any of the predefined bullish or bearish classification criteria and are therefore categorized as unclassified candlesticks.
Having converted the candlestick data into symbolic form, the next step was to examine the frequency of occurrence of each candle type. The results are summarized in Table 1.
Table 1: GBPUSD H1 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 310 | 20.67% |
| Bullish SpinTop 'G' | 86 | 5.73% |
| Bullish Pinbar 'H' | 28 | 1.87% |
| Bullish Inv.Pinbar 'E' | 50 | 3.33% |
| Bearish Marubozu 'a' | 311 | 20.73% |
| Bearish SpinTop 'g' | 90 | 6.00% |
| Bearish Pinbar 'h' | 38 | 2.53% |
| Bearish Inv.Pinbar 'e' | 48 | 3.20% |
| Doji 'D' | 8 | 0.53% |
| Unclassified '_' | 531 | 35.40% |
Among the bullish candlestick categories, the symbol 'A' recorded the highest frequency with 310 occurrences, representing 20.67% of the total sample. This was followed by 'G' with 86 occurrences (5.73%), 'E' with 50 occurrences (3.33%), and 'H' with 28 occurrences (1.87%).
Similarly, within the bearish candlestick categories, 'a' exhibited the highest occurrence with 311 counts (20.73%), followed by 'g' with 90 counts (6.00%), 'e' with 48 counts (3.20%), and 'h' with 38 counts (2.53%). The neutral candlestick represented by 'D' appeared 8 times, accounting for 0.53% of the dataset.
The unclassified candlestick category '_' recorded 531 occurrences, representing 35.40% of all observations. This indicates that more than one-third of the analyzed candlesticks do not conform to the predefined encoding rules and may require further subdivision or refinement in future studies.
Aggregating the classified candle types shows that bullish candlesticks accounted for 474 observations (31.60%), while bearish candlesticks accounted for 487 observations (32.47%). The difference of 13 candles is under 1% of the sample, suggesting that the market exhibited a balanced distribution of bullish and bearish price action over the analyzed period.
A closer examination of corresponding bullish and bearish candlestick pairs reveals a remarkable degree of symmetry. For example, the occurrence frequencies of 'A' and 'a', 'G' and 'g', as well as 'E' and 'e', are very similar. This near-equivalence suggests that the encoding framework captures market behavior that is largely balanced between upward and downward price movements.
Given this observed symmetry, it is reasonable to hypothesize that the unclassified candlestick population is similarly balanced between bullish and bearish structures.
The encoded series does not record a direction for unclassified candles, so the split of the 531 unclassified observations cannot be read from the symbols alone; confirming it would require returning to the raw open and close prices. The symmetry in the classified categories is consistent with a roughly even split but does not prove it.
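The encoded symbols carry no direction for '_', but the raw bars do. As a minimal sketch (not part of the original script), the helper below revisits each unclassified candle and compares its close with its open. It assumes the CandleType() function listed in the Code Structure section is available in the same file; the name CountUnclassifiedDirection, the cap on available history, and the choice to skip the still-forming bar at shift 0 are illustrative assumptions.

```mql5
//--- Hypothetical helper (not in the original script): splits the
//--- unclassified '_' candles by the sign of close - open.
//--- Assumes CandleType(int shift) from the Code Structure section.
void CountUnclassifiedDirection(int lookback, int &bullish, int &bearish)
  {
   bullish = 0;
   bearish = 0;

   //--- Never read past the closed bars that actually exist
   int closedBars = Bars(_Symbol, PERIOD_CURRENT) - 1;
   int n = (int)MathMin(lookback, closedBars);

   for(int shift = 1; shift <= n; shift++)
     {
      if(CandleType(shift) != "_")
         continue;                        // only unclassified candles

      double open  = iOpen(_Symbol, PERIOD_CURRENT, shift);
      double close = iClose(_Symbol, PERIOD_CURRENT, shift);

      if(close > open)
         bullish++;
      else if(close < open)
         bearish++;
      // close == open cannot occur here: CandleType() returns "D" for it
     }
  }
```

Calling `CountUnclassifiedDirection(1500, up, down)` with two `int` variables and printing them would show how the unclassified candles in the same window divide. Whether the split is close to even is an empirical question that this article's tables do not answer.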
Overall, the frequency analysis demonstrates that the encoded market series possesses a relatively balanced bullish–bearish structure. This finding establishes a statistical foundation for the subsequent analysis of multi-candlestick patterns, where combinations of encoded symbols can be investigated to determine whether certain sequences occur more frequently than would be expected by chance.
GBPUSD M15 Series
To investigate whether the frequency characteristics observed in the H1 timeframe persist on a lower timeframe, the same analysis was performed on 1,500 GBPUSD candlesticks from the 15-minute (M15) timeframe. The candlesticks were transformed into their corresponding encoded alphabet series, as illustrated in Figure 2.

Figure 2: GBPUSD M15 Encoded Series
The frequency distribution of the encoded candlestick types is summarized in Table 2.
Table 2: GBPUSD M15 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 306 | 20.40% |
| Bullish SpinTop 'G' | 77 | 5.13% |
| Bullish Pinbar 'H' | 35 | 2.33% |
| Bullish Inv.Pinbar 'E' | 44 | 2.93% |
| Bearish Marubozu 'a' | 314 | 20.93% |
| Bearish SpinTop 'g' | 82 | 5.47% |
| Bearish Pinbar 'h' | 45 | 3.00% |
| Bearish Inv.Pinbar 'e' | 44 | 2.93% |
| Doji 'D' | 13 | 0.87% |
| Unclassified '_' | 540 | 36.00% |
Among the bullish candlestick categories, 'A' recorded the highest frequency with 306 occurrences, representing 20.40% of the total sample. This was followed by 'G' with 77 occurrences (5.13%), 'E' with 44 occurrences (2.93%), and 'H' with 35 occurrences (2.33%).
For the bearish candlestick categories, 'a' was the most frequently occurring pattern with 314 occurrences (20.93%), followed by 'g' with 82 occurrences (5.47%), 'h' with 45 occurrences (3.00%), and 'e' with 44 occurrences (2.93%).
The neutral candlestick represented by 'D' appeared 13 times, accounting for 0.87% of the dataset. Meanwhile, the unclassified category '_' recorded 540 occurrences, representing 36.00% of all observations.
When aggregated, the bullish candlestick categories contributed 462 occurrences (30.80%), while the bearish categories contributed 485 occurrences (32.33%). Similar to the H1 analysis, the difference between bullish and bearish counts is relatively small.
The same pairwise symmetry noted for the GBPUSD H1 series appears here: A/a (306 vs. 314), G/g (77 vs. 82), and E/e (44 vs. 44) are closely matched. The consistency of this symmetry across both H1 and M15 timeframes indicates that the encoding framework captures an inherent balance between bullish and bearish price movements in the GBPUSD market.
GBPUSD M5 Series
The analysis was further extended to the 5-minute (M5) timeframe to determine whether the observed frequency characteristics remain stable at a higher sampling resolution. As with the previous datasets, 1,500 candlesticks were converted into their encoded alphabet representations, as shown in Figure 3.

Figure 3: GBPUSD M5 Encoded Series
Table 3 summarizes the frequency distribution of the encoded candlestick types.
Table 3: GBPUSD M5 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 329 | 21.93% |
| Bullish SpinTop 'G' | 71 | 4.73% |
| Bullish Pinbar 'H' | 39 | 2.60% |
| Bullish Inv.Pinbar 'E' | 43 | 2.87% |
| Bearish Marubozu 'a' | 337 | 22.47% |
| Bearish SpinTop 'g' | 59 | 3.93% |
| Bearish Pinbar 'h' | 44 | 2.93% |
| Bearish Inv.Pinbar 'e' | 44 | 2.93% |
| Doji 'D' | 46 | 3.07% |
| Unclassified '_' | 488 | 32.53% |
Within the bullish candlestick categories, 'A' remained the dominant structure with 329 occurrences, accounting for 21.93% of the total observations. This was followed by 'G' with 71 occurrences (4.73%), 'E' with 43 occurrences (2.87%), and 'H' with 39 occurrences (2.60%).
Among the bearish candlestick categories, 'a' again recorded the highest frequency with 337 occurrences (22.47%), followed by 'g' with 59 occurrences (3.93%). Both 'e' and 'h' appeared 44 times each, representing 2.93% of the total dataset.
The neutral candlestick category 'D' recorded 46 occurrences, corresponding to 3.07% of all observations. This represents a notable increase compared to both the H1 and M15 datasets. The unclassified candlestick category '_' accounted for 488 occurrences (32.53%), which is lower than in both the H1 (35.40%) and M15 (36.00%) datasets.
The aggregated bullish candlestick count was 482 (32.13%), while the bearish count was 484 (32.27%), once again demonstrating a closely balanced distribution between bullish and bearish market structures.
One notable observation at the M5 timeframe is the increased occurrence of the neutral candlestick ('D'). The frequency of doji-like structures exceeded that of several classified bullish and bearish categories, including E/e and H/h. This suggests that lower timeframes may contain a greater proportion of indecision or equilibrium periods, where neither buyers nor sellers establish a clear directional advantage within a single candle interval.
Despite this increase in neutral candlestick activity, the overall bullish–bearish symmetry remains evident. The frequencies of corresponding bullish and bearish candlestick pairs continue to exhibit close agreement, reinforcing the observations made in both the H1 and M15 analysis.
Overall, the results from the three GBPUSD timeframes reveal a consistent statistical structure. The dominance of the A/a categories, the substantial proportion of unclassified candlesticks, and the near-symmetrical distribution between bullish and bearish patterns appear to be persistent characteristics of the encoded GBPUSD price-action series. The primary distinction at lower timeframes is the increased presence of neutral candlestick structures, which may reflect the noisier and more indecisive nature of short-term market movements.
- GOLD Encoded Series
Having examined the frequency characteristics of the GBPUSD encoded series, our attention now turns to Gold (XAUUSD) to determine whether similar statistical properties exist in a different financial instrument. Gold is known for exhibiting distinct volatility characteristics and market dynamics compared to major currency pairs. Therefore, analyzing its encoded candlestick distribution provides an opportunity to assess the robustness of the encoding framework across different asset classes.
XAUUSD H1 Series
The same methodology applied to GBPUSD was used for the Gold H1 dataset. A total of 1,500 candlesticks were sampled and transformed into their corresponding alphabetic representations according to the encoding scheme introduced in Part 1. Figure 4 presents the resulting encoded candlestick series.

Figure 4: XAUUSD H1 Encoded Series
The frequency distribution of the encoded candlestick types is summarized in Table 4.
Table 4: XAUUSD H1 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 286 | 19.07% |
| Bullish SpinTop 'G' | 94 | 6.27% |
| Bullish Pinbar 'H' | 27 | 1.80% |
| Bullish Inv.Pinbar 'E' | 39 | 2.60% |
| Bearish Marubozu 'a' | 310 | 20.67% |
| Bearish SpinTop 'g' | 108 | 7.20% |
| Bearish Pinbar 'h' | 44 | 2.93% |
| Bearish Inv.Pinbar 'e' | 33 | 2.20% |
| Doji 'D' | 0 | 0.00% |
| Unclassified '_' | 559 | 37.27% |
Among the bullish candlestick categories, 'A' recorded the highest frequency with 286 occurrences, representing 19.07% of the total sample. This was followed by 'G' with 94 occurrences (6.27%), 'E' with 39 occurrences (2.60%), and 'H' with 27 occurrences (1.80%).
Within the bearish categories, 'a' dominated with 310 occurrences (20.67%), followed by 'g' with 108 occurrences (7.20%), 'h' with 44 occurrences (2.93%), and 'e' with 33 occurrences (2.20%).
Interestingly, no neutral candlestick ('D') was observed within the analyzed sample. The unclassified category '_' accounted for 559 occurrences, representing 37.27% of the dataset, which is the highest proportion observed thus far.
Aggregating the classified candle types yielded 446 bullish occurrences (29.73%) and 495 bearish occurrences (33.00%). Bearish structures appeared more frequently than bullish structures, but the difference of 49 occurrences is about 3.3% of the 1,500-candle sample. Consequently, the overall distribution remains relatively balanced.
A comparison of corresponding bullish and bearish candlestick pairs shows broad symmetry. The frequencies of A/a (286 vs. 310), G/g (94 vs. 108), and E/e (39 vs. 33) are reasonably close, while H/h (27 vs. 44) shows the largest relative gap. Overall, upward and downward price movements occur with similar structural characteristics, so it remains reasonable to extend the earlier hypothesis that the unclassified candlestick population is also approximately evenly distributed between bullish and bearish structures.
XAUUSD M15 Series
To investigate whether the statistical properties observed in the H1 timeframe persist at a lower timeframe, the analysis was repeated using 1,500 Gold candlesticks from the M15 timeframe. Figure 5 presents the encoded candlestick series.

Figure 5: XAUUSD M15 Encoded Series
The frequency distribution of the encoded symbols is summarized in Table 5.
Table 5: XAUUSD M15 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 317 | 21.13% |
| Bullish SpinTop 'G' | 89 | 5.93% |
| Bullish Pinbar 'H' | 33 | 2.20% |
| Bullish Inv.Pinbar 'E' | 39 | 2.60% |
| Bearish Marubozu 'a' | 324 | 21.60% |
| Bearish SpinTop 'g' | 90 | 6.00% |
| Bearish Pinbar 'h' | 34 | 2.27% |
| Bearish Inv.Pinbar 'e' | 41 | 2.73% |
| Doji 'D' | 2 | 0.13% |
| Unclassified '_' | 531 | 35.40% |
Among the bullish candlestick categories, 'A' again emerged as the dominant structure with 317 occurrences, accounting for 21.13% of the total observations. This was followed by 'G' with 89 occurrences (5.93%), 'E' with 39 occurrences (2.60%), and 'H' with 33 occurrences (2.20%).
For the bearish categories, 'a' remained the most frequent pattern with 324 occurrences (21.60%), followed by 'g' with 90 occurrences (6.00%), 'e' with 41 occurrences (2.73%), and 'h' with 34 occurrences (2.27%).
The neutral candlestick category ('D') appeared only 2 times (0.13%), indicating that indecision-type structures remain relatively rare within the Gold M15 dataset. The unclassified category '_' accounted for 531 occurrences (35.40%).
When grouped by market direction, bullish candlesticks contributed 478 occurrences (31.87%), while bearish candlesticks contributed 489 occurrences (32.60%). The difference between the two groups was only 11 occurrences, representing less than 1% of the total sample.
This result provides even stronger evidence of bullish-bearish symmetry than that observed in the H1 dataset. The frequencies of corresponding bullish and bearish candlestick pairs remain nearly identical, suggesting that the encoded market structure maintains a stable equilibrium between buying and selling activity. Consequently, the assumption regarding the approximate balance of bullish and bearish candlesticks within the unclassified category remains plausible for the M15 timeframe.
XAUUSD M5 Series
The analysis was further extended to the M5 timeframe to determine whether the observed statistical characteristics persist at a finer market resolution. As before, 1,500 Gold candlesticks were transformed into their corresponding encoded alphabet series, as illustrated in Figure 6.

Figure 6: XAUUSD M5 Encoded Series
The frequency distribution of the encoded candlestick types is summarized in Table 6.
Table 6: XAUUSD M5 Candle Counts & Percentages (Window: 1500)
| Candle Type | Count | Percentage |
|---|---|---|
| Bullish Marubozu 'A' | 302 | 20.13% |
| Bullish SpinTop 'G' | 97 | 6.47% |
| Bullish Pinbar 'H' | 41 | 2.73% |
| Bullish Inv.Pinbar 'E' | 34 | 2.27% |
| Bearish Marubozu 'a' | 338 | 22.53% |
| Bearish SpinTop 'g' | 99 | 6.60% |
| Bearish Pinbar 'h' | 51 | 3.40% |
| Bearish Inv.Pinbar 'e' | 37 | 2.47% |
| Doji 'D' | 2 | 0.13% |
| Unclassified '_' | 499 | 33.27% |
Within the bullish categories, 'A' remained the dominant pattern with 302 occurrences (20.13%), followed by 'G' with 97 occurrences (6.47%), 'H' with 41 occurrences (2.73%), and 'E' with 34 occurrences (2.27%).
For the bearish categories, 'a' recorded the highest frequency with 338 occurrences (22.53%), followed by 'g' with 99 occurrences (6.60%), 'h' with 51 occurrences (3.40%), and 'e' with 37 occurrences (2.47%).
The neutral candlestick category ('D') again appeared only 2 times (0.13%), while the unclassified category '_' accounted for 499 occurrences (33.27%), representing a lower proportion than observed in the H1 and M15 datasets.
The total bullish count was 474 occurrences (31.60%), compared with 525 bearish occurrences (35.00%). The bearish count exceeded the bullish count by 51 occurrences, or 3.4% of the 1,500-candle sample. This is the largest gap among the six datasets, though it remains modest relative to the sample size.
The frequencies of corresponding bullish and bearish candlestick pairs continue to exhibit a strong degree of symmetry. While minor deviations become more apparent at this timeframe, the overall structure remains balanced, suggesting that the observed differences may largely reflect short-term market fluctuations rather than a persistent directional bias.
The rough symmetry seen across the H1, M15, and M5 timeframes is consistent with the hypothesis that the unclassified candlestick population is approximately evenly divided between bullish and bearish structures. Because every dataset contains the same 1,500 candles, this is evidence of consistency across timeframes rather than of convergence with sample size; larger samples would be needed to test whether the bearish surplus seen in all three Gold datasets shrinks.
Comparative Discussion: GBPUSD vs Gold
Table 7: Comparing GBPUSD vs Gold
| Metric | GBPUSD H1 | Gold H1 | GBPUSD M15 | Gold M15 | GBPUSD M5 | Gold M5 |
|---|---|---|---|---|---|---|
| 'A' + 'a' % | ~41.4 | ~39.7 | ~41.3 | ~42.7 | ~44.4 | ~42.7 |
| Bullish % | 31.60 | 29.73 | 30.80 | 31.87 | 32.13 | 31.60 |
| Bearish % | 32.47 | 33.00 | 32.33 | 32.60 | 32.27 | 35.00 |
| Neutral 'D' % | 0.53 | 0.00 | 0.87 | 0.13 | 3.07 | 0.13 |
| Unclassified % | 35.40 | 37.27 | 36.00 | 35.40 | 32.53 | 33.27 |
Overall, the Gold datasets are similar to GBPUSD. Across all three timeframes, A/a are the dominant classified types, G/g are typically second, and bullish and bearish frequencies are closely matched. This suggests the encoding captures structural properties that generalize across markets, which strengthens confidence in the methodology and provides a foundation for the subsequent analysis of multi-candlestick pattern frequencies.
Unclassified candles are consistently high (roughly 32.5–37.3%) across all timeframes and instruments, making '_' the largest single category in every dataset. This reflects the strictness of the Part 1 thresholds: about one-third of candles fit none of the Marubozu, SpinTop, Pinbar, Inverted Pinbar, or Doji definitions. These candles should not be ignored; the CandleType() function could be expanded to include other types.
Practical Implication for Traders
From a frequency‑based trading perspective, the data suggests that focusing on 'A' and 'a' patterns (strong, near‑marubozu candles) provides the highest number of trading opportunities. However, because they are so common, they may also generate many false signals if used in isolation. Conversely, patterns involving 'H', 'h', or 'D' are statistically rare—trading strategies that rely on them will have few historical examples for backtesting and are unlikely to produce frequent entries.
The near symmetry between bullish and bearish frequencies also reinforces the importance of strict risk management: over a large sample, the market does not favor one side. Any perceived "edge" from a candlestick pattern must come from its contextual placement (e.g., support/resistance, trend phase) rather than from a simple directional imbalance.
Code Structure
To perform the frequency analysis of encoded candlestick patterns, a set of custom structures and functions was developed. The program scans historical candlestick data, converts each candlestick into its corresponding alphabet symbol, counts the occurrence of each symbol, computes frequency statistics, and finally saves the results to an output file.
The CandleCounts structure is a custom data container for the frequency results. It holds one integer counter per encoded symbol (unclassified candles are stored in the zero field) plus the total number of candles in the series.
```mql5
//+------------------------------------------------------------------+
//| Structure to hold candle type counts                             |
//+------------------------------------------------------------------+
struct CandleCounts
  {
   int A;      // bullish marubozu/Long/short
   int G;      // bullish spinning top
   int H;      // bullish pinbar
   int E;      // bullish inverted pinbar
   int a;      // bearish marubozu/Long/short
   int g;      // bearish spinning top
   int h;      // bearish pinbar
   int e;      // bearish inverted pinbar
   int D;      // doji
   int zero;   // unclassified (_)
   int total;  // number of candles in the series
  };
```
The CandleType() function converts a candlestick into its encoded alphabet representation. It computes the candle body size, upper wick length, and lower wick length, then evaluates them against the classification rules established in Part 1. The rules are checked in order, and the first one that matches determines the symbol. If a candlestick satisfies none of the rules, the function returns the underscore symbol '_' to indicate an unclassified candlestick. The function accepts the candlestick's shift (its bar index on the current chart) as input and returns the encoded symbol as a string.
```mql5
//+------------------------------------------------------------------+
//| Candle Type labeling function                                    |
//+------------------------------------------------------------------+
string CandleType(int shift)
  {
//--- Get price data for the specified candle on the current chart
   double open  = iOpen(_Symbol, PERIOD_CURRENT, shift);
   double close = iClose(_Symbol, PERIOD_CURRENT, shift);
   double high  = iHigh(_Symbol, PERIOD_CURRENT, shift);
   double low   = iLow(_Symbol, PERIOD_CURRENT, shift);

//--- Calculate candle components
   double body      = MathAbs(close - open);
   double upperWick = high - MathMax(open, close);
   double lowerWick = MathMin(open, close) - low;

//--- Handle bullish candles
   if(close > open)
     {
      if(body > 1.5 * upperWick && body > 1.5 * lowerWick)
         return "A";   // long body / marubozu
      if(2 * body < upperWick && 2 * body < lowerWick)
         return "G";   // spinning top
      if(lowerWick > 2.5 * body && lowerWick > 2 * upperWick)
         return "H";   // pinbar
      if(upperWick > 2.5 * body && upperWick > 2 * lowerWick)
         return "E";   // inverted pinbar
     }
//--- Handle bearish candles (same rules, lowercase symbols)
   else if(close < open)
     {
      if(body > 1.5 * upperWick && body > 1.5 * lowerWick)
         return "a";
      if(2 * body < upperWick && 2 * body < lowerWick)
         return "g";
      if(lowerWick > 2.5 * body && lowerWick > 2 * upperWick)
         return "h";
      if(upperWick > 2.5 * body && upperWick > 2 * lowerWick)
         return "e";
     }
//--- Open equals close exactly
   else if(close == open)
      return "D";      // doji

//--- Return _ if no conditions met
   return "_";
  }
```
The CountFromSeries() function performs the frequency-counting operation. After the historical price series has been transformed into an encoded alphabet sequence, this function scans the sequence one character at a time and records the occurrence of each candlestick type.
Before counting begins, all counters are reset to zero to ensure that previous calculations do not influence the current analysis. The function then determines the length of the encoded series and iterates through each symbol, updating the appropriate frequency count. The output of this function forms the basis for calculating frequencies and their percentages.
```mql5
//+------------------------------------------------------------------+
//| Count from an existing series string                             |
//+------------------------------------------------------------------+
CandleCounts CountFromSeries(string series)
  {
   CandleCounts c;
   ZeroMemory(c);                 // reset every counter to zero

   int len = StringLen(series);
   c.total = len;

   for(int i = 0; i < len; i++)
     {
      //--- Inspect one character of the encoded series at a time
      switch(StringGetCharacter(series, i))
        {
         case 'A': c.A++; break;
         case 'G': c.G++; break;
         case 'H': c.H++; break;
         case 'E': c.E++; break;
         case 'a': c.a++; break;
         case 'g': c.g++; break;
         case 'h': c.h++; break;
         case 'e': c.e++; break;
         case 'D': c.D++; break;
         default:  c.zero++; break;   // '_' (and any other character)
        }
     }
   return c;
  }
```
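The bullish and bearish totals quoted in the analysis sections are sums of four counters each. As a minimal sketch, the helpers below derive those totals and the gap between them from a filled CandleCounts value. They are not part of the original script: the names BullishTotal, BearishTotal, and PrintDirectionSummary are hypothetical, and the code assumes the CandleCounts structure defined earlier in this section.

```mql5
//--- Hypothetical helpers (not in the original script); they assume
//--- the CandleCounts structure defined earlier in this section.
int BullishTotal(const CandleCounts &c) { return c.A + c.G + c.H + c.E; }
int BearishTotal(const CandleCounts &c) { return c.a + c.g + c.h + c.e; }

//--- Print directional totals and their gap as a share of the sample
void PrintDirectionSummary(const CandleCounts &c)
  {
   if(c.total <= 0)
      return;                         // nothing to summarize

   int    bull = BullishTotal(c);
   int    bear = BearishTotal(c);
   double gap  = MathAbs(bull - bear) * 100.0 / c.total;

   PrintFormat("Bullish: %d (%.2f%%)  Bearish: %d (%.2f%%)  Gap: %.2f%%",
               bull, bull * 100.0 / c.total,
               bear, bear * 100.0 / c.total,
               gap);
  }
```

For the GBPUSD H1 counts in Table 1, the sums are 310 + 86 + 28 + 50 = 474 bullish and 311 + 90 + 38 + 48 = 487 bearish, so the gap is 13 candles, or 0.87% of the 1,500-candle sample.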
The FormatLine() function formats the output records into a consistent, readable layout. For each candlestick type, it computes the percentage from the count and the total (returning 0.00% when the total is zero) and combines the candlestick label, encoded symbol, frequency count, and percentage into a fixed-width text line suitable for display and file output. Using a dedicated formatting function improves code readability and ensures that all output records follow a uniform structure.
```mql5
//+------------------------------------------------------------------+
//| Format count + percentage line                                   |
//+------------------------------------------------------------------+
string FormatLine(string label, string code, int count, int total)
  {
   double pct = (total > 0) ? (count * 100.0 / total) : 0.0;
   return StringFormat("%-20s (%s): %3d (%5.2f%%)", label, code, count, pct);
  }
```
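To make the output layout concrete, the short demo below passes the 'A' count from Table 1 through FormatLine(). It is an illustrative sketch only: DemoFormatLine is a hypothetical name, and the code assumes FormatLine() as listed above is in the same file.

```mql5
//--- Hypothetical demo (not in the original script); assumes the
//--- FormatLine() function listed above is in the same file.
void DemoFormatLine()
  {
   //--- Table 1 values: 310 'A' candles out of 1,500
   Print(FormatLine("Bullish Marubozu", "A", 310, 1500));

   //--- A zero total is guarded inside FormatLine(), so no division
   //--- by zero occurs and the percentage is reported as 0.00
   Print(FormatLine("Doji", "D", 0, 0));
  }
```

The label is left-aligned and padded to 20 characters, so the first call should log `Bullish Marubozu     (A): 310 (20.67%)`, matching the first row of Table 1.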
The SaveMarketStructureToFile() function handles the generation of the final report.
The function performs the following tasks:
- Creates the output file name.
- Opens a text file for writing.
- Validates that the file was successfully opened.
- Writes the frequency analysis results into the file.
- Closes the file after all records have been saved.
If the file cannot be opened, the function immediately terminates and returns without writing any data. Otherwise, it proceeds to generate the complete report.
Upon successful completion, a message box displays "Results saved to file:" followed by the file name.
This function acts as the final stage of the analysis pipeline by preserving the generated statistics for further review.
```mql5
//+------------------------------------------------------------------+
//| Write series + counts + percentages to TXT file                  |
//+------------------------------------------------------------------+
void SaveMarketStructureToFile(string series, int nlookback, string filename = "")
  {
//--- Build default filename if none provided
   if(StringLen(filename) == 0)
      filename = _Symbol + "Mdata2.txt";

//--- open/create file
   int handle = FileOpen(filename, FILE_TXT | FILE_WRITE);

//--- check if file opened successfully
   if(handle == INVALID_HANDLE)
     {
      Print("Failed to open file: ", filename);
      return;
     }

//--- 1) Write the continuous series string
//---    (TimeFrameToString() is a helper that is not listed in this section)
   FileWriteString(handle, "=== MARKET CODED STRUCTURE " + TimeFrameToString(Period()) + " SERIES ===\n");
   FileWriteString(handle, series);
   FileWriteString(handle, "\n\n");

//--- 2) Generate counts
   CandleCounts c = CountFromSeries(series);

//--- 3) Build report with percentages
   string report =
      "=== CANDLE COUNTS & PERCENTAGES (Window: " + IntegerToString(nlookback) + ") ===\n" +
      "----------------------------------------------------------------\n" +
      FormatLine("Bullish Marubozu",   "A", c.A,    c.total) + "\n" +
      FormatLine("Bullish SpinTop",    "G", c.G,    c.total) + "\n" +
      FormatLine("Bullish Pinbar",     "H", c.H,    c.total) + "\n" +
      FormatLine("Bullish Inv.Pinbar", "E", c.E,    c.total) + "\n" +
      FormatLine("Bearish Marubozu",   "a", c.a,    c.total) + "\n" +
      FormatLine("Bearish SpinTop",    "g", c.g,    c.total) + "\n" +
      FormatLine("Bearish Pinbar",     "h", c.h,    c.total) + "\n" +
      FormatLine("Bearish Inv.Pinbar", "e", c.e,    c.total) + "\n" +
      FormatLine("Doji",               "D", c.D,    c.total) + "\n" +
      FormatLine("Unclassified",       "_", c.zero, c.total) + "\n" +
      "----------------------------------------------------------------\n" +
      StringFormat("Total: %-3d (100.00%%)", c.total);

//--- 4) Write report to file
   FileWriteString(handle, report);

//--- close file
   FileClose(handle);

//--- output message
   MessageBox(StringFormat("Results saved to file: %s", filename), "Prompt");
  }
```
Program Flow
The analysis begins by scanning historical candlestick data from the selected financial instrument and timeframe. Each candlestick is passed to the CandleType() function, where it is converted into its corresponding alphabet symbol according to the encoding rules defined in Part 1. These symbols are concatenated to form an encoded market series.
After the encoded series has been generated, the SaveMarketStructureToFile() function is called to produce the final statistical report. Within this function, CountFromSeries() scans the encoded sequence and records the frequency of each candlestick type. The raw counts are then converted into percentage occurrences.
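The count-to-percentage conversion described above can be written out explicitly. The notation below is introduced here for illustration only (n_s for the count of symbol s, N for the total number of encoded candles); the numbers in the worked line are hypothetical and are not results from the article.

```latex
% Percentage occurrence of symbol s in an encoded series of N candles
p_s = 100 \times \frac{n_s}{N},
\qquad
N = \sum_{s \in \{A,G,H,E,a,g,h,e,D,\_\}} n_s

% Hypothetical worked example: a symbol seen 300 times in a 1,500-candle window
p_s = 100 \times \frac{300}{1500} = 20.00\%
```

Because every candle receives exactly one symbol, the percentages of all ten symbols add up to 100%, which matches the "Total" line of the report.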
Next, the FormatLine() function organizes each record into a standardized output format containing the candlestick label, encoded symbol, frequency count, and percentage occurrence.
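Finally, SaveMarketStructureToFile() writes the formatted results to a text file and closes it. Because FileOpen() is called without the FILE_COMMON flag, the file is created in the terminal's local MQL5\Files folder. When the process finishes successfully, the program displays a message box reading "Results saved to file: " followed by the file name.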
This workflow can be summarized as follows: 
Figure 7: Workflow
Code Usage Demonstration
Figures 8–10 show the script running on GBPJPY on the H1, M15, and M5 timeframes. Each figure shows the encoded series together with its per-symbol counts and percentages, so the profiles can be compared across timeframes and against the results of previous studies.

Figure 8: GBPJPY H1 Encoded Series and Statistics

Figure 9: GBPJPY M15 Encoded Series and Statistics

Figure 10: GBPJPY M5 Encoded Series and Statistics
Conclusion
This study translated the symbolic encoding framework into a practical, reproducible market profiling step. By processing 1,500‑bar windows for GBPUSD and XAUUSD across H1, M15 and M5, we produced per‑symbol frequency tables that show consistent patterns:
- The Marubozu category (A/a) is the most frequent across instruments and timeframes. SpinTop (G/g) tends to be the second most common.
- Bullish vs. bearish totals are closely matched in all samples, supporting a hypothesis of near‑symmetry in the classified population.
- A substantial fraction (≈32–37%) of candles are unclassified (“_”), reflecting strict classification thresholds; a share this large is a signal to adjust the rules or expand the taxonomy if finer coverage is needed.
- Lower timeframes (M5) show an increased incidence of neutral/doji candles in some samples, which is consistent with higher short‑term indecision and noise.
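The summary observations above (dominant symbol, unclassified share, bullish/bearish balance) can be derived directly from the encoded string. The following is a minimal sketch under stated assumptions, not the article's own code: CountSymbol() and PrintSummary() are hypothetical helpers, and the sample series in OnStart() is made up purely for illustration.

```mql5
//--- Hypothetical helper: number of times 'sym' occurs in 'series'
int CountSymbol(const string series, const ushort sym)
  {
   int n   = 0;
   int len = StringLen(series);
   for(int i = 0; i < len; i++)
      if(StringGetCharacter(series, i) == sym)
         n++;
   return n;
  }

//--- Hypothetical helper: prints simple summary metrics for an encoded series
void PrintSummary(const string series)
  {
   int total = StringLen(series);
   if(total == 0)
     {
      Print("Empty series, nothing to summarize");
      return;
     }

   //--- bullish symbols are upper-case A/G/H/E, bearish are lower-case a/g/h/e
   int bull = CountSymbol(series, 'A') + CountSymbol(series, 'G') +
              CountSymbol(series, 'H') + CountSymbol(series, 'E');
   int bear = CountSymbol(series, 'a') + CountSymbol(series, 'g') +
              CountSymbol(series, 'h') + CountSymbol(series, 'e');
   int unclassified = CountSymbol(series, '_');

   //--- dominant classified symbol (the underscore is deliberately excluded)
   string classified = "AGHEagheD";
   ushort best       = 0;
   int    bestCount  = -1;
   for(int k = 0; k < StringLen(classified); k++)
     {
      ushort s = StringGetCharacter(classified, k);
      int    c = CountSymbol(series, s);
      if(c > bestCount)
        {
         bestCount = c;
         best      = s;
        }
     }

   PrintFormat("Bullish: %d (%.2f%%), Bearish: %d (%.2f%%)",
               bull, 100.0 * bull / total, bear, 100.0 * bear / total);
   PrintFormat("Unclassified: %d (%.2f%%)", unclassified, 100.0 * unclassified / total);
   PrintFormat("Dominant classified symbol: %s (%d occurrences)",
               ShortToString(best), bestCount);
  }

//--- Script entry point with an illustrative, made-up series
void OnStart()
  {
   PrintSummary("AaG_gD_A_hAe_");
  }
```

In practice, the string passed to PrintSummary() would be the same encoded series that is handed to SaveMarketStructureToFile(). Counting the underscore separately keeps the unclassified share from masking which classified symbol dominates.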
Practical output: the MQL5 script and TXT report let you (a) produce the encoded symbol string for any symbol/TF, (b) obtain counts and percentages for A/G/H/E/a/g/h/e/D/_ and (c) use these profiles as objective inputs to further testing. Importantly, single‑candle frequency alone does not constitute a trading edge—any hypothesis of predictability requires contextual filters (trend, support/resistance, position in sequences) and out‑of‑sample testing.
The analysis is not restricted to 1,500 candlesticks. The Lookback input parameter determines the sample size used in the frequency analysis, so traders and researchers can analyze fewer or more candlesticks depending on the research objective.
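As a rough illustration of how the sample size can be kept within the available history, the sketch below declares a Lookback input and clamps it to the number of bars on the chart. The default of 1500 mirrors the window used in this article; skipping the still-forming bar and the exact clamping logic are assumptions made here, not necessarily what the attached script does.

```mql5
#property script_show_inputs

//--- Sample size for the frequency analysis (default mirrors the article's window)
input int Lookback = 1500;

void OnStart()
  {
   //--- number of bars currently available for the chart symbol and timeframe
   int available = Bars(_Symbol, _Period);

   //--- assumption: the current, unfinished bar is skipped, so one bar is reserved
   int usable = available - 1;
   int n      = (Lookback < usable) ? Lookback : usable;

   if(n <= 0)
     {
      Print("Not enough history to run the analysis");
      return;
     }

   PrintFormat("Requested %d candles, analyzing %d closed candles on %s",
               Lookback, n, _Symbol);
  }
```

A larger window smooths the percentages but mixes older market regimes into the profile, while a smaller one reacts faster at the cost of noisier frequencies.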
Next step: extend this pipeline to consecutive two‑symbol patterns and transition statistics. Those sequence-level frequencies and conditional probabilities are the natural follow-up required to evaluate pattern reliability and to begin building predictive models.
Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.
This article was written by a user of the site and reflects their personal views. MetaQuotes Ltd is not responsible for the accuracy of the information presented, nor for any consequences resulting from the use of the solutions, strategies or recommendations described.