A brief review of the previous material and the prerequisites for creating a new model
In the first article, we introduced the concept of market memory,
which is defined as a long-term dependence between price increments of some order. We then considered the concept of "seasonal
patterns", which exist in the markets. Until now, these two concepts existed separately. The purpose of this article is to show that
"market memory" has a seasonal nature, which is expressed through the maximized correlation of increments of arbitrary order for closely
located time intervals and through the minimized correlation for distant time intervals.
Let's put forward the following hypothesis:
Correlation of price increments depends on the presence of seasonal patterns, as well as on the clustering of nearby increments.
Let's try to confirm or refute it in a free, intuitive and slightly mathematical style.
The classical econometric approach to identifying patterns in price increments: autocorrelation
According to the classical approach, the absence of patterns in price increments is determined through the absence of serial correlation. If there
is no autocorrelation, the series of increments is considered random, while further search for patterns is believed to be
ineffective.
Let's see an example of visual analysis of EURUSD increments using autocorrelation function. All the examples will be performed using
IPython.
# imports shared by all the listings below (the legacy MetaTrader5 package exposes MT5CopyRatesRange)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from MetaTrader5 import *

def standard_autocorrelation(symbol, lag):
    rates = pd.DataFrame(MT5CopyRatesRange(symbol, MT5_TIMEFRAME_H1, datetime(2015, 1, 1), datetime(2020, 1, 1)),
                         columns=['time', 'open', 'low', 'high', 'close', 'tick_volume', 'spread', 'real_volume'])
    rates = rates.drop(['open', 'low', 'high', 'tick_volume', 'spread', 'real_volume'], axis=1).set_index('time')
    rates = rates.diff(lag).dropna()
    from pandas.plotting import autocorrelation_plot
    plt.figure(figsize=(10, 5))
    autocorrelation_plot(rates)

standard_autocorrelation('EURUSD', 50)
This function converts H1 close prices into differences with the specified lag (here, lag 50 is used) and displays the autocorrelation
chart.
Fig. 1. Classical price increments correlogram
The autocorrelation chart does not reveal any patterns in price increments. Correlations between adjacent
increments fluctuate around zero, which points to the randomness of the time series. We could end our econometric analysis right here, by
concluding that the market is random. However, I suggest reviewing the autocorrelation function from a different angle: in the context of
seasonal patterns.
I assumed that the correlation of price increments could be generated by the presence of seasonal patterns.
Therefore, let us exclude from the example all hours except for a specific one. This creates a new time series with its own
specific properties. Let's build an autocorrelation function for this series:
Fig. 2. A correlogram of price increments with excluded hours (with only the 1st hour
for each day left)
The correlogram looks better for the new series. There is a strong dependence of current increments on previous
ones, and the dependence decreases as the time delta between the increments grows. It means that the current day's 1st hour increment strongly
correlates with the previous day's 1st hour increment, and so on. This very important information indicates the existence of seasonal
patterns, i.e. the increments have a memory.
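The hour-filtering step itself is not shown as code at this point in the article. Below is a minimal sketch of how such a single-hour series can be built with pandas; since the MT5 terminal data is not reproducible here, a synthetic hourly random walk stands in for the `MT5CopyRatesRange` output:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for H1 close prices; in the article this frame
# comes from MT5CopyRatesRange instead
idx = pd.date_range('2015-01-01', periods=24 * 500, freq='h')
rates = pd.DataFrame({'close': np.cumsum(np.random.randn(len(idx)))}, index=idx)

# Keep only the bars of one specific hour, then difference with the chosen lag
hour_series = rates[rates.index.hour == 1].diff(50).dropna()
print(len(hour_series))  # → 450: one bar per day survives, minus the 50-bar lag
```

The resulting series has one observation per day, so a lag of 1 on it already means "same hour, previous day".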
A custom approach to identifying patterns in price increments: seasonal autocorrelation
We have found that there is a correlation between the current day's 1st hour increment and previous 1st hour
increments, which, however, decreases as the distance in days increases. Now let's see if there is a correlation between
adjacent hours. For this purpose, let's modify the code:
def seasonal_autocorrelation(symbol, lag, hour1, hour2):
    rates = pd.DataFrame(MT5CopyRatesRange(symbol, MT5_TIMEFRAME_H1, datetime(2015, 1, 1), datetime(2020, 1, 1)),
                         columns=['time', 'open', 'low', 'high', 'close', 'tick_volume', 'spread', 'real_volume'])
    rates = rates.drop(['open', 'low', 'high', 'tick_volume', 'spread', 'real_volume'], axis=1).set_index('time')
    rates = rates.drop(rates.index[~rates.index.hour.isin([hour1, hour2])]).diff(lag).dropna()
    from pandas.plotting import autocorrelation_plot
    plt.figure(figsize=(10, 5))
    autocorrelation_plot(rates)

seasonal_autocorrelation('EURUSD', 50, 1, 2)
Here, we delete all hours except the first and the second, calculate the differences for the new series and build the autocorrelation
function:
Fig. 3. A correlogram of price increments with excluded hours (with only the 1st and the
2nd hours for each day left)
There is also a high correlation for the series composed of adjacent hours, which indicates their mutual
influence. But can we get a reliable relative score for all pairs of hours, not just for the selected ones? To do this, let's use the methods
described below.
Heat map of seasonal correlations for all hours
Let's continue to explore the market and try to confirm the original hypothesis. Let's look at the big picture. The below function sequentially
removes hours from the time series, leaving only one hour. It builds the price difference for this series and determines correlation with
the series built for other hours:
# calculate the correlation heatmap between all hours
def correlation_heatmap(symbol, lag, corrthresh):
    out = pd.DataFrame()
    rates = pd.DataFrame(MT5CopyRatesRange(symbol, MT5_TIMEFRAME_H1, datetime(2015, 1, 1), datetime(2020, 1, 1)),
                         columns=['time', 'open', 'low', 'high', 'close', 'tick_volume', 'spread', 'real_volume'])
    rates = rates.drop(['open', 'low', 'high', 'tick_volume', 'spread', 'real_volume'], axis=1).set_index('time')
    for i in range(24):
        ratesH = rates.drop(rates.index[~rates.index.hour.isin([i])]).diff(lag).dropna()
        out[str(i)] = ratesH['close'].reset_index(drop=True)
    plt.figure(figsize=(10, 10))
    corr = out.corr()
    # generate a mask for the upper triangle
    mask = np.zeros_like(corr, dtype=bool)  # np.bool is deprecated; plain bool works
    mask[np.triu_indices_from(mask)] = True
    sns.heatmap(corr[corr >= corrthresh], mask=mask)
    return out

out = correlation_heatmap(symbol='EURUSD', lag=25, corrthresh=0.9)
The function accepts the increment order (time lag), as well as a correlation threshold used to drop hours with
low correlation. Here is the result:
Fig. 4. Heat map of correlations between increments for different hours for the period
2015-2020.
It is clear that the following clusters have the maximum correlation: hours 0-5 and 10-14. In the previous
article we created a trading system based on the first cluster, which was discovered in a different way (using a boxplot). The patterns are
now also visible on the heat map. Let's take the second interesting cluster and analyze it. Here are the summary statistics for the cluster:
out[['10','11','12','13','14']].describe()

              10           11           12           13           14
count  1265.000000  1265.000000  1265.000000  1265.000000  1265.000000
mean     -0.001016    -0.001015    -0.001005    -0.000992    -0.000999
std       0.024613     0.024640     0.024578     0.024578     0.024511
min      -0.082850    -0.084550    -0.086880    -0.087510    -0.087350
25%      -0.014970    -0.015160    -0.014660    -0.014850    -0.014820
50%      -0.000900    -0.000860    -0.001210    -0.001350    -0.001280
75%       0.013460     0.013690     0.013760     0.014030     0.013690
max       0.082550     0.082920     0.085830     0.089030     0.086260
Parameters for all the cluster hours are quite close, while their average value for the analyzed sample is negative (about 100 five-digit points).
A shift in average increments indicates that the market is more likely to decline during these hours than to grow. It is also worth noting that
an increase in the increment lag leads to a greater correlation between the hours due to the appearance of a trend component, while a decrease in
the lag leads to lower values. However, the relative arrangement of the clusters remains almost unchanged.
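The effect of the lag on inter-hour correlation is mechanical, and it can be illustrated even on a pure random walk with no patterns at all: increments over longer windows share more common price history, so neighboring-hour series correlate more strongly. A hedged sketch (synthetic data; the `hour_corr` helper and the hours 13/14 are mine, chosen to match the cluster discussed above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range('2015-01-01', periods=24 * 2000, freq='h')
close = pd.Series(np.cumsum(rng.standard_normal(len(idx))), index=idx)

def hour_corr(lag, h1=13, h2=14):
    # correlation between the increment series of two fixed hours
    a = close[close.index.hour == h1].diff(lag).dropna().reset_index(drop=True)
    b = close[close.index.hour == h2].diff(lag).dropna().reset_index(drop=True)
    n = min(len(a), len(b))
    return a.iloc[:n].corr(b.iloc[:n])

# a larger lag inflates the correlation purely through shared history
print(hour_corr(1), hour_corr(25))
```

On such data the correlation is high for both lags and higher for the larger one, which is why the relative arrangement of the clusters, rather than the raw correlation level, is the interesting signal.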
For example, for a single lag, increments of hours 12, 13, and 14 still strongly correlate:
plt.figure(figsize=(10,5))
plt.plot(out[['12','13','14']])
plt.legend(out[['12','13','14']])
plt.show()
Fig. 5. Visual similarity between the series of increments with a single lag composed
of different hours
Formula of regularity: simple but good
Remember the hypothesis:
Correlation of price increments depends on the presence of seasonal patterns, as well as on the clustering of nearby increments.
We have seen in the autocorrelation diagram and on the heat map that there is a dependence of hourly increments both on past values and on
nearby hour increments. The first phenomenon stems from the recurrence of events at certain hours of the day. The second one is connected
with the clustering of volatility in certain time periods. Both of these phenomena should be considered separately and, if possible,
combined. In this article, we will additionally study the dependence of particular hour increments (removing all other hours
from the time series) on their previous values. The most interesting part of the research will be performed in the next article.
# calculate a jointplot between real and predicted returns
def hourly_signals_statistics(symbol, lag, hour, hour2, rfilter):
    rates = pd.DataFrame(MT5CopyRatesRange(symbol, MT5_TIMEFRAME_H1, datetime(2015, 1, 1), datetime(2020, 1, 1)),
                         columns=['time', 'open', 'low', 'high', 'close', 'tick_volume', 'spread', 'real_volume'])
    rates = rates.drop(['open', 'low', 'high', 'tick_volume', 'spread', 'real_volume'], axis=1).set_index('time')
    # price differences for every hour series
    H = rates.drop(rates.index[~rates.index.hour.isin([hour])]).reset_index(drop=True).diff(lag).dropna()
    H2 = rates.drop(rates.index[~rates.index.hour.isin([hour2])]).reset_index(drop=True).diff(lag).dropna()
    # current returns for both hours
    HF = H[1:].reset_index(drop=True); HL = H2[1:].reset_index(drop=True)
    # previous returns for both hours
    HF2 = H[:-1].reset_index(drop=True); HL2 = H2[:-1].reset_index(drop=True)
    # basic equation: ret[1] = ret[0] - (ret[lag] - ret[lag-1])
    # or Close[1] = (Close[0]-Close[lag]) - ((Close[lag]-Close[lag*2]) - (Close[lag-1]-Close[lag*2-1]))
    predicted = HF - (HF2 - HL2)
    real = HL
    # correlation jointplot between the two series
    outcorr = pd.DataFrame()
    outcorr['Hour ' + str(hour)] = H['close']
    outcorr['Hour ' + str(hour2)] = H2['close']
    # real VS predicted prices
    out = pd.DataFrame()
    out['real'] = real['close']
    out['predicted'] = predicted['close']
    out = out.loc[(out['predicted'] >= rfilter) | (out['predicted'] <= -rfilter)]
    # plotting results
    from scipy import stats
    sns.jointplot(x='Hour ' + str(hour), y='Hour ' + str(hour2), data=outcorr, kind="reg", height=7, ratio=6).annotate(stats.pearsonr)
    sns.jointplot(x='real', y='predicted', data=out, kind="reg", height=7, ratio=6).annotate(stats.pearsonr)

hourly_signals_statistics('EURUSD', lag=25, hour=13, hour2=14, rfilter=0.00)
Here is an explanation of what is done in the above listing. Two series are formed by removing unnecessary hours, and price
increments (differences) are calculated for them. The hours for the series are set by the 'hour' and 'hour2' parameters. Then
sequences shifted relative to each other by one value are obtained for each hour, i.e. the HF series is ahead of HF2 (and HL of HL2) by
one value; this allows calculating the actual increment and the predicted increment, as well as the difference between them. First, let's
construct a scatter plot for the hour 13 and hour 14 increments:
Figure 5. Scatter plot for increments of hours 13 and 14 for
the period 2015-2020.
As expected, increments are highly correlated. Now let's try to predict the next increment based on the
previous one. To do this, here is a simple formula, which can predict the next value:
Basic equation: ret[1] = ret[0] - (ret[lag] - ret[lag-1])
or Close[1] = (Close[0]-Close[lag]) - ((Close[lag]-Close[lag*2]) -
(Close[lag-1]-Close[lag*2-1]))
Here is the explanation of the resulting formula. To predict the future increment, we stand on the zero bar and predict
the value of the next increment, ret[1]. To do this, we subtract from the current increment the difference between the previous increment
(taken 'lag' values back) and the one next to it (lag-1). If the correlation of increments between two adjacent hours is strong, it can be
expected that the predicted increment will be described by this equation. The second line restates the equation in terms of close prices.
Thus, the future forecast is based on three increments. The second part of the code predicts future increments and compares them to the
actual ones. Here is the resulting chart:
Figure 6. Scatter plot for actual and predicted increments for
the period 2015-2020.
You can see that the charts in figures 5 and 6 are similar, which means that the method of determining patterns via
correlation is adequate. At the same time, the values are scattered across the chart rather than lying on one line. These are forecast errors
which negatively affect the prediction; they should be handled separately (which is beyond the scope of this article). A forecast around zero
is not really interesting: if the forecast for the next price increment is equal to the current one, you cannot generate profit from it.
Forecasts can be filtered using the rfilter parameter.
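The whole construction, i.e. the hour filtering, the HF/HF2 shift, the prediction formula and the rfilter cut, can be checked end-to-end on synthetic data. The sketch below mirrors the names of the listing; the random-walk prices, the hours 13/14 and the rfilter value are illustrative only:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range('2015-01-01', periods=24 * 300, freq='h')
close = pd.Series(np.cumsum(rng.standard_normal(len(idx))), index=idx)
lag = 25

# increment series for the two chosen hours
H = close[close.index.hour == 13].reset_index(drop=True).diff(lag).dropna().reset_index(drop=True)
H2 = close[close.index.hour == 14].reset_index(drop=True).diff(lag).dropna().reset_index(drop=True)

HF, HL = H[1:].reset_index(drop=True), H2[1:].reset_index(drop=True)      # current returns
HF2, HL2 = H[:-1].reset_index(drop=True), H2[:-1].reset_index(drop=True)  # previous returns

predicted = HF - (HF2 - HL2)   # ret[1] = ret[0] - (ret[lag] - ret[lag-1])
real = HL

out = pd.DataFrame({'real': real, 'predicted': predicted})
rfilter = 20.0  # threshold on the scale of this synthetic series
filtered = out.loc[(out['predicted'] >= rfilter) | (out['predicted'] <= -rfilter)]
print(out['predicted'].corr(out['real']), len(filtered))
```

Note that on a random walk the predicted/real correlation is also high, again purely because of the shared price history, so the filter on forecast magnitude carries the tradable information.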
Figure 7. Scatter plot for actual and predicted increments with rfilter = 0.03 for
the period 2015-2020.
Note that the heat map was created using data from 2015 up to the current date. Let us shift the starting date back to 2000:
Fig. 8. Heat map of correlations between increments for different hours for the period
from 2000 to 2020.

              10           11           12           13           14
count  5151.000000  5151.000000  5151.000000  5151.000000  5151.000000
mean      0.000470     0.000470     0.000472     0.000472     0.000478
std       0.037784     0.037774     0.037732     0.037693     0.037699
min      -0.221500    -0.227600    -0.222600    -0.221100    -0.216100
25%      -0.020500    -0.020705    -0.020800    -0.020655    -0.020600
50%       0.000100     0.000100     0.000150     0.000100     0.000250
75%       0.023500     0.023215     0.023500     0.023570     0.023420
max       0.213700     0.212200     0.210700     0.212600     0.208800
As you can see, the heat map has become somewhat thinner, and the dependence between hours 13 and 14 has decreased. At the same time, the
average value of increments is positive, which gives priority to buying. A shift in the average value will not allow trading effectively
in both time intervals at once, so you have to choose.
Let's look at the resulting scatter plot for this period (here I only provide the actual/forecast chart):
Fig. 9. Scatter plot for actual and forecast increments with rfilter = 0.03, for the
period 2000-2020.
The spread of values has increased, which is a negative factor for such a long period.
Thus, we have obtained a formula and an approximate idea of the distribution of actual and predicted increments
for certain hours. For greater clarity, the dependencies can be visualized in 3D.
# calculate a jointplot between real and predicted returns
def hourly_signals_statistics3D(symbol, lag, hour, hour2, rfilter):
    rates = pd.DataFrame(MT5CopyRatesRange(symbol, MT5_TIMEFRAME_H1, datetime(2015, 1, 1), datetime(2020, 1, 1)),
                         columns=['time', 'open', 'low', 'high', 'close', 'tick_volume', 'spread', 'real_volume'])
    rates = rates.drop(['open', 'low', 'high', 'tick_volume', 'spread', 'real_volume'], axis=1).set_index('time')
    rates = pd.DataFrame(rates['close'].diff(lag)).dropna()
    out = pd.DataFrame()
    for i in range(hour, hour2):
        H = rates.drop(rates.index[~rates.index.hour.isin([hour])]).reset_index(drop=True)
        H2 = rates.drop(rates.index[~rates.index.hour.isin([i+1])]).reset_index(drop=True)
        HF = H[1:].reset_index(drop=True); HL = H2[1:].reset_index(drop=True)    # current hours
        HF2 = H[:-1].reset_index(drop=True); HL2 = H2[:-1].reset_index(drop=True) # last day hours
        predicted = HF - (HF2 - HL2)
        real = HL
        out3D = pd.DataFrame()
        out3D['real'] = real['close']
        out3D['predicted'] = predicted['close']
        out3D['predictedABS'] = predicted['close'].abs()
        out3D['hour'] = i
        out3D = out3D.loc[(out3D['predicted'] >= rfilter) | (out3D['predicted'] <= -rfilter)]
        out = out.append(out3D)
    import plotly.express as px
    fig = px.scatter_3d(out, x='hour', y='predicted', z='real', size='predictedABS', color='hour', height=1000, width=1000)
    fig.show()

hourly_signals_statistics3D('EURUSD', lag=24, hour=10, hour2=23, rfilter=0.000)
This function uses the already familiar formula for calculating the predicted and the actual values. Each separate
scatter plot shows the actual/forecast dependence for each hour, as if a signal were generated by the previous day's hour 10 increment. Hours
from 10:00 to 23:00 are used as an example. Correlation between the nearest hours is maximal; as the distance increases, correlation
decreases (the scatter plots become more like circles). Starting with hour 16, further hours have little dependence on the previous day's hour 10.
Using the attachment, you can rotate the 3D object and select fragments to obtain more detailed information.
Fig. 10. 3D scatter plot for actual and predicted increments from 2015 to 2020.
Now it's time to create an Expert Advisor to see how it works.
Example of an Expert Advisor trading the identified seasonal patterns
Similarly to an example from the previous
article, the robot will trade a seasonal pattern which is only based on the statistical relationship between the current increment and
the previous one for one specific hour. The difference is that other hours are traded and a principle based on the proposed formula is
used.
Let's consider an example of using the resulting formula for trading, based on the statistical study:
input int OpenThreshold = 30;
input int OpenThreshold1 = 30;
input int OpenThreshold2 = 30;
input int OpenThreshold3 = 30;
input int OpenThreshold4 = 30;
input int Lag = 10;
input int stoploss = 150;
input int OrderMagic = 666;
input double MaximumRisk=0.01;
input double CustomLot=0;
The following interval with patterns was determined: {10, 11, 12, 13, 14}. Based on it, the "Open threshold" parameter can be set for each hour
individually. These parameters are similar to "rfilter" in fig. 9. The 'Lag' variable contains the lag value for the increments (remember,
we analyzed the lag of 25 by default, i.e. almost a day for the H1 timeframe). Lags could be set separately for each hour, but the same value for
all hours is used here for simplicity purposes. Stop Loss is also the same for all positions. All these parameters can be optimized.
The trading logic is as follows:
void OnTick() {
   if(!isNewBar()) return;
   CopyClose(NULL, 0, 0, Lag*2+1, prArr);
   ArraySetAsSeries(prArr, true);
   const double pr = (prArr[1] - prArr[Lag]) - ((prArr[Lag] - prArr[Lag*2]) - (prArr[Lag-1] - prArr[Lag*2-1]));
   TimeToStruct(TimeCurrent(), hours);
   if(hours.hour >= 10 && hours.hour <= 14) {
      if(CheckMoneyForTrade(_Symbol, LotsOptimized(), ORDER_TYPE_SELL)) {
         if(pr <= signal && hours.hour == 10)
            OrderSend(Symbol(), OP_SELL, LotsOptimized(), Bid, 0, Ask + stoploss*_Point, NormalizeDouble(Bid - signal, _Digits), NULL, OrderMagic);
         if(pr <= signal1 && hours.hour == 11)
            OrderSend(Symbol(), OP_SELL, LotsOptimized(), Bid, 0, Ask + stoploss*_Point, NormalizeDouble(Bid - signal1, _Digits), NULL, OrderMagic);
         if(pr <= signal2 && hours.hour == 12)
            OrderSend(Symbol(), OP_SELL, LotsOptimized(), Bid, 0, Ask + stoploss*_Point, NormalizeDouble(Bid - signal2, _Digits), NULL, OrderMagic);
         if(pr <= signal3 && hours.hour == 13)
            OrderSend(Symbol(), OP_SELL, LotsOptimized(), Bid, 0, Ask + stoploss*_Point, NormalizeDouble(Bid - signal3, _Digits), NULL, OrderMagic);
         if(pr <= signal4 && hours.hour == 14)
            OrderSend(Symbol(), OP_SELL, LotsOptimized(), Bid, 0, Ask + stoploss*_Point, NormalizeDouble(Bid - signal4, _Digits), NULL, OrderMagic);
      }
   }
}
The 'pr' value is calculated by the above formula, which forecasts the price increment on the next bar. The condition for each
hour is then checked: if the forecast meets the threshold for the particular hour, a sell deal is opened. We have already found out
that the shift of average increments into the negative zone makes purchases ineffective in the 2015-2020 interval; you can check this
yourself.
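Before running the MQL5 optimization, the same sell rule can be prototyped in Python. The sketch below loosely restates the EA logic (bar-indexing off-by-ones relative to the MQL5 as-series convention are glossed over; the thresholds and the random-walk data are illustrative, not the article's):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range('2015-01-01', periods=24 * 400, freq='h')
close = pd.Series(np.cumsum(rng.standard_normal(len(idx))) + 100.0, index=idx)

lag = 25
thresholds = {10: 0.5, 11: 0.5, 12: 0.5, 13: 0.5, 14: 0.5}  # per-hour, like the OpenThreshold* inputs

# pr roughly mirrors the EA formula: ret[0] - (ret[lag] - ret[lag-1])
ret = close.diff(lag)
pr = ret - (ret.shift(lag) - ret.shift(lag - 1))

# sell when the forecast is negative enough during a traded hour
signals = [t for t, value in pr.dropna().items()
           if t.hour in thresholds and value <= -thresholds[t.hour]]
print(len(signals))
```

Such a prototype makes it easy to sanity-check the signal logic per hour before optimizing stops, lots and thresholds in the tester.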
Let's launch genetic optimization with parameters specified in fig. 11 and view the result:
Fig. 11. The table of genetic optimization parameters
Let's look at the optimization chart. In the optimized interval, the most efficient Lag values lie
between 17 and 30 hours, which is very close to our assumption about the dependence of a particular hour's increment on the
same hour of the previous day:
Fig. 12. Relationship of the 'Lag' variable to the 'Order threshold' variable in the
optimized interval
The forward chart looks similar:
Fig. 13. Relationship of the 'Lag' variable to the 'Order threshold' variable in the
forward interval
Here are the best results from the backtest and forward optimization tables:
Fig. 14, 15. Backtest and forward optimization charts.
It can be seen that the pattern persists throughout the entire 2015–2020 interval. We can assume that the
econometric approach worked quite well. There are dependencies between same-hour increments on successive days, with some
clustering (the dependence may be not with the same hour, but with a nearby one). In the next article, we will analyze how to use the second
pattern.
Check the increment period on another timeframe
Let us perform an additional check on the M15 timeframe. Suppose we are looking for the same correlation between the current hour and the same
hour of the previous day. In this case the effective lag must be 4 times larger, about 24*4 = 96, because each hour contains four M15
periods. I have optimized the Expert Advisor with the same settings on the M15 timeframe.
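The lag conversion between timeframes is simple arithmetic; a small helper (the function name is mine) makes the expectation explicit:

```python
def equivalent_lag(lag_h1, minutes_per_bar):
    """Convert an H1 increment lag into another timeframe, keeping the same time span."""
    return lag_h1 * 60 // minutes_per_bar

print(equivalent_lag(24, 15))  # → 96: one day expressed in M15 bars
print(equivalent_lag(25, 15))  # → 100: the H1 lag of 25 used earlier, on M15
```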
In the optimized interval, the resulting effective lag is below 60, which is strange. Probably the optimizer found another pattern, or the EA
was over-optimized.
Fig. 16. Relationship of the 'Lag' variable to the 'Order threshold' variable in the
optimized interval
As for the forward test results, the effective lag is normal and corresponds to 100, which confirms the
pattern.
Fig. 17. Relationship of the 'Lag' variable to the 'Order threshold' variable in the
forward interval
Let us view the best backtest and forward results:
Fig. 18. Backtest and forward; the best forward pass
The resulting curve is similar to the H1 chart curve, with a significant increase in the number of deals.
Probably, the strategy can be optimized for lower timeframes.
Conclusion
In this article, we put forward the following hypothesis:
Correlation of price increments depends on the presence of seasonal patterns, as well as on the clustering of
nearby increments.
The first part has been fully confirmed: there is a correlation between hourly increments from different days. The second statement has been
implicitly confirmed: the correlation shows clustering, which means that current hour increments also depend on the increments of
neighboring hours.
Please note that the proposed Expert Advisor is by no means the only possible variant for trading the found correlations. The proposed logic
reflects the author’s view on correlations, while the EA optimization was only performed to additionally confirm the patterns found
through statistical research.
Since the second part of our study requires substantial additional research, in the next article we will use a simple machine learning model to
fully confirm or refute the second part of the hypothesis.
The attachment contains a ready-to-use framework in the Jupyter notebook format, which you can use to study other financial instruments.
The results can further be tested using the attached testing EA.