### Introduction

Traders widely use indicators that show the basic quotes "more clearly", allowing them to analyze and forecast market price movements. Whether the transformation is valid and whether the obtained results are credible is usually not considered; at best, these questions are replaced by testing trading systems based on the indicators.

It seems quite obvious to me that there is no sense in using indicators, let alone building trading systems on them, unless we can resolve the questions of how the initial quotes are transformed and how credible the obtained result is. This article shows that there are serious reasons for such a conclusion. We will consider potential problems using three indicators: the straight trend line, the exponential moving average and the Hodrick-Prescott filter.

### 1. A Bit of Theory

For the readers' convenience I will mention some terms from probability theory and mathematical statistics that are used below. I am not going to provide links, as the terms are used here exactly as in textbooks.

**1.1. The Probabilistic Description of Economic Observations**

The quotes that we observe are indirect sample measurements (the general population is unknown to us) of some stochastic process at the fundamental level, which includes:

- accurately measured deterministic components, for example, executed currency buying or selling deals;
- deterministic components measured with an error, such as the amount of currency sold in a time interval, for example, in one day;
- a stochastic component that cannot be measured: the mood of the crowd. For most of the time, the main characteristic of this component is a random movement with a drift.

Interaction of these components results in a stochastic process including:

- trends (deterministic and stochastic);
- cycles with fixed and stochastic periods length;
- random movement with a drift.

Non-stationarity is a common characteristic of the stochastic process reflected in currency quotes. The concept of a nonstationary stochastic process matters to us because such a process offers almost no means of direct analysis, so it must be divided into a set of separate processes that can be analyzed. When applying an indicator, a trader usually does not think about whether it is applicable to a specific symbol's quotes. However, econometrics provides tools for evaluating both the applicability of an indicator and the result of applying it.

**1.2. Random Event. Probability**

*A random event* (buying and selling currency, in our case) is an event that may or may not happen. We know that the number of deals differs from day to day and between times of day and is, in fact, a random value; most commonly, however, only the events at discrete points in time (minute, hour, day, etc.) are taken into account.

*Relative frequency of a random event* is the ratio of the number of occurrences of such an event M to the total number of observations N. As the number of observations grows (in theory, up to infinity), the frequency tends to a number called the probability of the random event. By definition, the probability is a value from zero to one. The term "probability" will usually be used in this article instead of relative frequency.
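The convergence of the relative frequency M/N to the probability can be illustrated with a small simulation. This is only a sketch: the event probability of 0.3, the function name `relative_frequency` and the seed are assumptions made for the illustration, not values taken from the quotes.

```python
import random

def relative_frequency(n_observations, probability, seed=0):
    """Return M / N for N simulated observations of an event
    that occurs with the given (assumed) probability."""
    rng = random.Random(seed)
    occurrences = sum(1 for _ in range(n_observations)
                      if rng.random() < probability)
    return occurrences / n_observations

if __name__ == "__main__":
    # The frequency wanders for small N and settles near 0.3 for large N.
    for n in (10, 1_000, 100_000):
        print(n, relative_frequency(n, 0.3))
```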

*Random value* is a quantity that takes different values with certain probabilities.

*General set* (general population) stands for all possible values the random value can take. In the market we always deal with a sample of the general set, usually the quotes for some period of time. It is quite natural that statistics obtained from a sample differ from those calculated on the general set, just as the relative frequency differs from the probability. In statistics, further calculations are carried out to evaluate the difference between the sample statistics and those of the general set. Such an approach is impossible in the case of indicators, because an indicator treats prices, for example the close price, as deterministic values during its calculations.

Another interesting observation. Since we are trying to observe the general set, we can ignore the differences in the quotes submitted by different dealing centers, as it is easy to change the quotes values but it is very difficult to change their statistical properties.

**1.3. Characteristics of Random Variables**

**1.3.1. Descriptive Statistics**

Sets of random quantities (currency quotes, in our case) are characterized by a number of parameters. Some of these parameters will be used further on.

*Histogram* is a chart showing the frequency of the values of a random variable. In the limit it becomes a chart of the probability density.

*Arithmetic mean (average)* is the sum of all observed values divided by the number of observations (the number of periods in our case). It does not exist for every distribution; it is most commonly used for normal distributions, where it coincides with the median. Strictly speaking, this implies that the popular moving average indicator can be applied only if the quotes follow a distribution law for which the average exists.

The *median* divides all observations in a sample into two parts: observations in the first part are less than the median value, and observations in the second part exceed it. The median exists for any distribution and is not sensitive to outliers. An average that is equal (or close) to the median is one of the features of the normal distribution law.

Deviation from the average is quite an interesting question. *Dispersion* (variance) is the mean value of the squared deviations of a random value from its mathematical expectation. The square root of the dispersion is the *mean-square (standard) deviation*.

Standard deviation and dispersion are not resistant to outliers.
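The difference in outlier resistance between these measures can be seen on a tiny sample. The sketch below uses Python's standard `statistics` module; the five prices and the distorted value are made up for the illustration.

```python
import statistics

def describe(sample):
    """Mean, median, dispersion (population variance) and standard deviation."""
    return {
        "mean": statistics.fmean(sample),
        "median": statistics.median(sample),
        "dispersion": statistics.pvariance(sample),
        "std": statistics.pstdev(sample),
    }

# Made-up close prices; the second sample has one distorted observation.
prices = [1.3540, 1.3550, 1.3560, 1.3570, 1.3580]
with_outlier = prices[:-1] + [1.5580]

clean = describe(prices)
dirty = describe(with_outlier)

if __name__ == "__main__":
    print("clean:", clean)
    print("dirty:", dirty)
```

The median is the same in both samples, while the mean shifts and the standard deviation grows many times over.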

A dimensionless quantity called the *asymmetry ratio* (skewness) measures the degree of *asymmetry* of the distribution density curve. If the absolute value of the skewness is small compared with the square root of «six divided by the number of observations» (its approximate standard error for a normal sample), the distribution of the random value is consistent with the normal law.

Another value that characterizes the distribution density is *kurtosis*. It is equal to 3 for the normal distribution. If kurtosis is greater than three, the peak is sharp and the «heavy» tails fall off slowly.

As we can see, plenty of concepts are applicable to random variables having the normal distribution law. This is not so bad, since a large number of distribution laws approach the normal one as the number of observations increases.

**1.3.2. Normal Distribution**

The normal (Gaussian) distribution is the limiting case of almost all real probability distributions.

The theoretical basis is Lyapunov's limit theorem, which states that the distribution of a sum of independent random values with any initial distributions will be normal if there are lots of observations and the contribution of each is small. Therefore, the normal distribution is widely used in many real-world applications of probability theory.

The normal distribution is a symmetrical bell-shaped curve, extending all over the number axis. Gaussian distribution depends on two parameters: μ (mathematical expectation) and σ (standard deviation).

The mathematical expectation and the median of this distribution are equal to μ, while the dispersion is equal to σ^{2}. The probability density curve is symmetric about the mathematical expectation. The asymmetry ratio and kurtosis are γ = 0 and ε = 3.

The normal distribution density is often described not as a function of the variable x but as a function of the variable z = (x − μ) / σ, which has zero mathematical expectation and a dispersion equal to 1.

Distribution with μ = 0 and σ = 1 is called the standard normal distribution (i.i.i).
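For reference, the densities behind the description above can be written out explicitly. These are the standard textbook formulas; the symbols f and φ are introduced here only for the illustration.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
z = \frac{x-\mu}{\sigma},
\qquad
\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}
```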

Fig. 1. Normal distribution

**1.3.3. Student's Distribution (t-distribution)**

The main parameter is the number of degrees of freedom n (determined by the number of elements in the sample). As the number of degrees of freedom increases, Student's distribution approaches the standard normal one, and for n > 30 it can be replaced with the normal distribution. For n < 30, Student's distribution has heavier tails.

Fig. 2. Student's distribution

t-statistics is widely used for testing statistical hypotheses.

**1.3.4. Chi-square (Pearson Distribution)**

If the X_{i} are independent random values having the standard normal distribution (i.i.i), the sum of their squares follows the χ^{2}-distribution. The density depends on a single parameter ν (usually called the number of degrees of freedom), equal to the number of independent random variables. As ν → ∞, the χ^{2}-distribution tends to the normal distribution with center ν and dispersion 2ν. The distribution density is asymmetric and unimodal, and it becomes flatter and more symmetrical as the number of degrees of freedom increases.

Fig. 3. Pearson distribution (chi-square)

**1.3.5. Fisher F-distribution**

The Fisher F-distribution is the distribution of a dispersion ratio, i.e. the ratio of the dispersions of two series.

If two independent random variables have chi-square distributions with *V*1 and *V*2 degrees of freedom, the ratio of these variables, each divided by its number of degrees of freedom, has the Fisher distribution.

Fig. 4. Fisher distribution

**1.3.6. R-square Determination Ratio**

The determination ratio shows what proportion of the dispersion of the result is explained by the influence of the independent variables. In the case of two variables it is the square of the Pearson correlation, i.e. the share of dispersion that the two variables have in common.

The significance of the correlation ratio depends on the number of observations and is assessed with the Fisher F-statistics. When the number of candlesticks in a quote exceeds 100, even very small deviations of the observed value from zero are sufficient to confirm its significance.
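The statement that, for two variables, the determination ratio equals the squared Pearson correlation can be checked numerically. The sketch below is a minimal illustration; the function names and the sample data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def r_squared_of_line_fit(x, y):
    """R-square of the least-squares line y = a + b*x: 1 - SS_res / SS_tot."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up data for the illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

if __name__ == "__main__":
    print(pearson_r(x, y) ** 2, r_squared_of_line_fit(x, y))
```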

**1.4. Determining Hypotheses**

What conclusions can we make about some parameter of the general set if we only have a sample value of this parameter? The answer depends on whether we have some prior information about the magnitude of the general parameter.

If there is no prior information about the magnitude of the general parameter, we can evaluate it from the sample value by setting a confidence interval for it, i.e. the range within which its value is located with a certain confidence probability.

In practice we usually need to check some specific and in most cases simple hypothesis **H0**. This hypothesis is called the null hypothesis. To test it, criteria are used that allow us to accept or reject it. The statistics most often used as criteria are the t-statistics, F-statistics and chi-square statistics. When using software for statistics (for example, STATISTICA) or econometrics (like EViews), the calculated criterion is accompanied by its significance value, the *p-value*. For example, a p-value of 0.02 (2%) means that the corresponding criterion is __not significant__ at the 1% significance level and significant at the 5% significance level. Loosely speaking, "1 - p-value" can be read as the confidence with which the null hypothesis is rejected.

The choice of the significance level is subjective and is determined by the severity of the consequences of an erroneous evaluation of a specific criterion.

**1.5. Quotes Statistical Characteristics**

**1.5.1. Descriptive Statistics**

Descriptive statistics include:

- A histogram, which must approach the distribution law as the number of candlesticks in a quote increases;
- Measures of central tendency: mean, median;
- Dispersion measure: standard deviation;
- Shape measures: skewness and kurtosis;
- Jarque-Bera normality criterion.

Jarque-Bera criterion. Null hypothesis **H0**: the distribution is normal. Suppose, for example, that the probability accompanying the criterion value is equal to 0.04. It may seem that the following conclusion can be made: the probability of accepting the null hypothesis is 4%. However, this is not entirely correct: the calculated value is the p-value of the criterion, so the null hypothesis is rejected at the 5% significance level, i.e. with a confidence of 96%.

**1.5.2. Autocorrelation and Q-statistics**

*Correlation* is a measure of the relation between two variables. The correlation ratio can vary from -1.00 to +1.00. The value -1.00 means perfect negative correlation, the value +1.00 means perfect positive correlation, and the value 0.00 means the absence of correlation.

The correlation between the elements of one quote is called autocorrelation. It can be very useful in finding trends. The presence of autocorrelation challenges any conclusions about the quotes as random variables, because the most significant property of a random value is the independence of prices at different periods of time.

In statistical analysis software the autocorrelation is accompanied by the Ljung-Box Q-statistics with its p-value. The null hypothesis is that there is no autocorrelation up to the given lag (candlestick), so a p-value equal to zero leads us to reject it and conclude that autocorrelation is present up to that lag.

*Exclusion of autocorrelations (trends) from the quotes is the first step in getting the possibility to use the methods of mathematical statistics.*
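A minimal sketch of how the sample autocorrelation and the Ljung-Box Q-statistics are computed may help here. It uses the textbook formula Q = n(n + 2) Σ ρ_k² / (n − k); the function names and the synthetic trending series are assumptions for the illustration, and the p-value (which requires the chi-square distribution) is left out.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

def ljung_box_q(series, max_lag):
    """Ljung-Box Q-statistics for lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# Synthetic series with a pure linear trend (made-up values).
trend = [1.30 + 0.001 * i for i in range(100)]
q_stat = ljung_box_q(trend, max_lag=5)

if __name__ == "__main__":
    print(autocorrelation(trend, 1), q_stat)
```

Under the null hypothesis Q is compared with a chi-square distribution whose number of degrees of freedom equals the number of lags; for 5 lags the 5% critical value is about 11.07, and a trending series exceeds it by a wide margin.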

**1.5.3. Quotes Stationarity**

We will consider quotes **stationary** if their mathematical expectation and dispersion do **not** depend on time. Even this definition of stationarity is too strict and not very suitable for practical application. Quotes are very often considered stationary if the deviations of the mathematical expectation and/or dispersion amount to a few per cent (usually not more than 5%) within some period of time.

The actual quotes in the Forex market are not stationary. They have the following deviations:

- The presence of a trend generated by the dependence between observations in time. Dependence is a characteristic feature of currency quotes and of economic observations in general;
- Cyclicity;
- Varying dispersion (heteroscedasticity).

Quotes that deviate from stationary ones are called *nonstationary*. They are analyzed by successive decomposition into components. The decomposition terminates when the remainder is a stationary series with an almost constant expectation and/or dispersion.

There are several tests for quotes stationarity. The basic ones are unit root tests, the most famous of which is the Dickey-Fuller test. The null hypothesis **H0**: the quotes are not stationary (they have a unit root), i.e. the average and dispersion depend on time. Since a dependence on time (a trend) is almost always present, the presence of a trend in the quotes must be indicated when carrying out the test. At this stage trends are determined by eye.

**1.6. Indicators Specification (Regression)**

A superficial glance at indicator source code written in languages such as MQL5 reveals two forms of specifying an indicator: analytic (the most common) and tabular (applied to the indicators called filters, for example, Kravchuk's indicators).

But we will use the term 'regression' - a common term in mathematical statistics and econometrics.

Having the idea of what we want to get from the quotes, we need to set the following parameters to formulate the (indicator) regression:

- The list of independent variables used for the indicator calculation;
- Independent variables ratios (coefficients);
- Indicator calculation equation that will be used for the dependent variable calculation.

While there are some difficulties in creating multi-currency indicators, there are no such difficulties in regression.

Having these three items, it will be necessary to fit the regression to a quote. In contrast to traders' forums, the word «fit, fitting» is not a dirty one in econometrics but a standard procedure, during which the conformity of the (indicator) regression to the quotes is calculated using one of many evaluation methods. Ordinary least squares (OLS) is the best-known evaluation method.

Evaluation reveals two points of interest:

- Compliance of the indicator with the quotes – the value of the remaining error;
- Stability of the calculated regression parameters in the future.

The answers to these questions are given during the indicators diagnostics.
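As an illustration of fitting, the simplest of the indicators considered in this article, the straight trend line, can be fitted to prices with ordinary least squares. The sketch below is not tied to any particular package; the function name `fit_trend_line` and the bar index t = 0, 1, ..., n − 1 as the only independent variable are assumptions for the example.

```python
def fit_trend_line(prices):
    """OLS fit of price = a + b * t, where t is the bar index 0..n-1.

    Returns the intercept a, the slope b and the residues
    (the remaining error for every bar)."""
    n = len(prices)
    t_mean = (n - 1) / 2
    p_mean = sum(prices) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices))
    b = s_tp / s_tt
    a = p_mean - b * t_mean
    residues = [p - (a + b * t) for t, p in enumerate(prices)]
    return a, b, residues

if __name__ == "__main__":
    # Made-up prices lying close to a rising line.
    a, b, residues = fit_trend_line([1.300, 1.311, 1.319, 1.331, 1.340])
    print(a, b, residues)
```

The residues returned here are exactly what the diagnostics below examine: their size answers the first question, and the behavior of `a` and `b` on new bars answers the second.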

**1.7. Indicators Diagnostics**

Indicators (regressions) diagnostics is divided into three groups:

- Ratios diagnostics;
- Residues diagnostics;
- Stability diagnostics.

Each verification procedure described below includes the specification of a null hypothesis that is used as the verification hypothesis. The verification result consists of the values of one or more statistics and their attached p-values. The latter indicate how compatible the data are with the null hypothesis on which the verification statistics is based.

Thus, small *p-values* lead to the rejection of the null hypothesis. For example, if a p-value lies between 0.01 and 0.05, the null hypothesis is rejected at the 5% level but not at the 1% level.

It should be noted that each verification comes with its own assumptions and distribution results. For example, some test statistics have exact, finite-sample distributions (usually t or F-distributions). Others are large-sample test statistics with asymptotic χ^{2} distributions.

**1.7.1. Ratios Diagnostics**

Ratios diagnostics gives information about the evaluated ratios and the restrictions on them, including the special case of verifications for missed and redundant variables. The following verifications of the regression equation ratios will be used:

- Confidence ellipses reveal the correlation between the ratios of the equation;
- The missing variables test determines whether additional variables are needed in the regression equation;
- The redundant variables test reveals excessive variables;
- The break test determines the reaction of the regression equation to changes of a trend. It is desirable to create a regression equation that is equally good at reflecting the quotes on ascending, descending and flat segments.

**1.7.2. Residues Diagnostics**

We have already mentioned the importance of studying the residues when trying to transform nonstationary quotes into stationary ones.

A unit root test can show that the residues are distributed much closer to the normal law than the basic quotes. The word "closer" reflects the fact that the average and dispersion of the residues still depend on time, which leads to instability of the regression equation ratios.

Using the terms of traders' forums, we can say that we should not "overoptimize" a trading system (here is our infamous fitting!), i.e. it must not lose its characteristics on the next segments. The system becomes unsuitable for future quotes segments because the mathematical expectation and dispersion change over time.

The following tests will be carried out for the residues: serial correlation, normality, heteroscedasticity and autoregressive conditional heteroscedasticity of the residues.

**Correlograms - Q-statistics** show the autocorrelations of the residues and the Ljung-Box Q-statistics for the appropriate lags together with their *p-values*.

**Histogram - normality test** shows the histogram and descriptive statistics of the residues, including the Jarque-Bera statistics for testing normality. If the residues are normally distributed, the histogram should be bell-shaped and the Jarque-Bera statistics should not be significant.

**Tests of heteroscedasticity** verify the heteroscedasticity of the equation residues. If there is evidence of heteroscedasticity, it is necessary either to change the regression specification (change the indicator) or to model the heteroscedasticity.

Let's use White's heteroscedasticity test with the __null hypothesis of no heteroscedasticity against the alternative of heteroscedasticity of an unknown, general form.__

White describes his method as a general test for model misspecification, since the null hypothesis underlying the test assumes that the errors are both homoscedastic and independent of the independent variables, and that the linear specification of the model is correct. Violation of any of these conditions could lead to a significant test statistics. Conversely, an insignificant test statistics implies that none of the three conditions has been violated.

**1.7.3. Stability Diagnostics**

Stability diagnostics is the most interesting and important part in this case, as its results reveal the predictive capabilities of the indicator. In MT4 or MT5 one may attempt to diagnose stability using the strategy tester. Further on we will show that the strategy tester cannot diagnose the future stability of a trading system created using the indicators. It can only give some evaluation of a trading system based on historical data.

As in trading system tests, the common method of stability diagnostics is to divide the *T* bars of a quote into *T1* observations that will be used for the evaluation and *T2 = T – T1* bars that will be used for testing and evaluation.

If a trading system is tested on two segments, the problem of its future stability cannot be solved, as the test on the second segment shows only that this new segment is similar to the previous one in its unknown statistical parameters. At the same time, the statistical issues that were solved during the creation of the trading system remain unknown.

Of course, different quotes segments are selected during the trading systems testing but it is impossible to detect by eye, say, heteroscedasticity areas or the quotes segments, at which regression ratios will be unstable.

Several tests (not all the stability tests) are listed below. With these tests we may be sure that a trading system will show a stable result if the test conditions appear in a quote in the future.

For example, a change of trend direction from descending to ascending or vice versa is covered by a breakpoint test. If this test has not found a breakpoint, we can be sure that the indicator will show stable results under any changes of a trend.

**Quandt-Andrews break point test**

__The null hypothesis: the absence of breakpoints between two observations, spaced from the ends of the sample by 15%.__

The Quandt-Andrews breakpoint test checks for one or more unknown structural breakpoints in the sample for a given equation. The basic idea of the test is that a separate Chow breakpoint test is carried out for each observation between two dates or observations **t1** and **t2**. The *k* test statistics from the Chow tests are then summarized into one test statistics for the test __against the null hypothesis concerning the absence of breakpoints between t1 and t2__.

**Ramsey RESET Test**

__The null hypothesis: the regression error (disturbance) is a normally distributed value with a zero average.__

Serial correlation, heteroscedasticity or a non-normal distribution law all violate the assumption that the noises are distributed normally.

RESET - a common test for the following types of specification errors:

- Missed variables; X does not include all appropriate variables;
- Incorrect functional form: some or all variables in y and X must be transformed by a logarithm, a power, an inverse value or in some other way;
- The correlation between X and e can be caused by a few factors including X measurement error or the presence of a lag value and a correlation in the noise series.

With such specification errors, the OLS estimates will be biased (the systematic error is not equal to zero) and inconsistent (they do not converge in probability to the evaluated quantity as the number of observations increases), so the ordinary inference procedures will be unjustified.

**Recursive Residues**

Recursive Residues tests are based on the multiple regression evaluations with the gradual increase in the number of bars.

**One Step Ahead Forecasting Test**

If we look at the definition of the recursive residues presented above, we see that each recursive residue is a one step ahead forecast error. To check whether the value of the dependent variable at time t could have come from the model fitted to all the data up to that point, each error must be compared with its standard deviation from the full sample.
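The idea of re-estimating the regression on a growing number of bars and recording the one step ahead forecast error can be sketched as follows. This is a simplified illustration, assuming a straight trend line as the regression: the errors are left unscaled, whereas proper recursive residues are divided by the forecast standard error. All names and the synthetic series are hypothetical.

```python
def ols_line(ts, ys):
    """Least-squares intercept and slope for y = a + b * t."""
    n = len(ts)
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / s_tt
    return y_mean - b * t_mean, b

def one_step_ahead_errors(prices, min_bars=3):
    """Fit the line on bars 0..end-1, forecast bar `end`, keep the error.

    The estimation window grows by one bar at every step."""
    errors = []
    for end in range(min_bars, len(prices)):
        a, b = ols_line(list(range(end)), prices[:end])
        forecast = a + b * end
        errors.append(prices[end] - forecast)
    return errors

# Synthetic quotes: ten rising bars, then the trend reverses (made-up values).
prices = [1.0 + 0.01 * i for i in range(10)] + [1.09 - 0.01 * i for i in range(1, 6)]
errors = one_step_ahead_errors(prices)

if __name__ == "__main__":
    print(errors)
```

While the trend holds, the errors stay near zero; at the reversal they jump, which is the kind of instability these tests are meant to expose.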

**Ratio Recursive Estimates**

This view traces the change in the estimate of any ratio as the amount of data in the estimation sample increases. The figure shows the selected ratios of the equation for all feasible recursive estimations, together with bands of two standard errors around the estimated ratios.

If a ratio changes significantly as data are added to the evaluation equation, that is a sure sign of instability. Ratio plots can sometimes show dramatic jumps as the postulated equation tries to digest a structural break.

Technical analysis has a wide range of so-called "adaptive" indicators, though there are no attempts to determine the actual need for such an adaptation. Recursive estimates of the ratios can solve this issue.

### 2. Preparing Initial Data

Let's take close prices of EURUSD daily quotes from November 11, 2010 to March 23, 2011 for our analysis. The quotes are received from MT4 terminal by F2 and exported to Excel.

The quotes linear chart looks as follows:

Fig. 5. EURUSD chart

This example shows the necessity of controlling missing data in the indicators. We should not think that the quotes shown are just a special case of low-quality quotes. Data omissions can occur for various reasons. Besides, we should mind the data missing on US holidays. The issue of missing data becomes particularly critical when building trading systems based on various economic factors, such as the correlation of currency rates and stock indices, which are not traded around the clock.

In our simple case it is possible to carry out the linear interpolation and lessen the influence of missed data on calculations at least to some degree.
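A linear interpolation of missing bars might look like the sketch below. It assumes that missing close prices are marked with `None` in a chronologically ordered list; the function name is hypothetical, and gaps at the very beginning or end of the series are left untouched because there is nothing to interpolate between.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

if __name__ == "__main__":
    # Made-up closes with a two-bar hole.
    print(interpolate_missing([1.3500, None, None, 1.3560]))
```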

In addition, there is the issue of outliers, which is more complicated than the issue of missing data. Before we start looking for outliers, we should answer the following question: what is an outlier? I consider an outlier to be a price movement exceeding three standard deviations that was not followed by a further strong price movement.

Outliers are determined not from the quotes but from their residues: let's calculate the series obtained by subtracting the previous price value from the next one, eurusd(i) – eurusd(i+1) (in the MQL notation). English-language literature has several names for this value. It is labeled 'differenced' on the chart, and the word 'returns' is used most often. I will use the word 'residue' here and below. It is the value obtained after removal of a trend from the quotes. The EURUSD residues chart looks as follows:

Fig. 6. EURUSD residue

The standard deviation for EURUSD quotes is equal to 0.033209. Therefore, there are no outliers in our quotes according to the formulated outliers criteria.
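The three-sigma check described above can be sketched in a few lines. Two assumptions are made here: the prices are stored in chronological order (so the difference is current minus previous bar), and by default the threshold uses the standard deviation of the differences themselves. The text uses the standard deviation of the quotes (0.033209), which can be passed through the hypothetical `sigma` parameter instead. The condition "not followed by a further strong movement" is not modelled.

```python
import statistics

def first_differences(prices):
    """Differences between each bar and the previous one (chronological order)."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]

def flag_large_moves(prices, n_sigmas=3.0, sigma=None):
    """Indices of bars whose move from the previous bar exceeds n_sigmas * sigma."""
    diffs = first_differences(prices)
    if sigma is None:
        sigma = statistics.pstdev(diffs)   # assumption: sigma of the differences
    return [i + 1 for i, d in enumerate(diffs) if abs(d) > n_sigmas * sigma]

if __name__ == "__main__":
    # Made-up quotes: small oscillations with one spike at bar 15.
    quotes = [1.35 + 0.001 * (i % 2) for i in range(30)]
    quotes[15] += 0.05
    print(flag_large_moves(quotes))                   # spike up and the return move
    print(flag_large_moves(quotes, sigma=0.033209))   # threshold from the text
```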

In case outliers are present, they can be treated as missing data and then interpolated.

The described method of removing the outliers is not the only one and, most importantly, it is not correct. If the residue consists of what remains of the quotes after the trend is removed, it is quite evident that the size of the outliers depends on the method by which the trend is determined, i.e. the outliers issue must be considered after the trend determination issue is solved.

At this point the preparation of the basic data for the further analysis is considered to be complete.

### 3. Analysis of the Statistical Parameters

The analysis of Forex quotes statistical parameters and the analysis of EURUSD quotes in particular are carried out to check the possibility of applying the indicators for analysis and trading systems creation.

The typical algorithm of creating a trading system looks as follows:

- An indicator is selected (for example, Moving Average) and a trading system is created on its basis;
- As it is usually impossible to build a trading system based on a single indicator, additional indicators are implemented into it to avoid false market entries.

Also, "do not overfit, just do not overfit" mantra should be uttered at this stage.

**3.1. Descriptive Statistics**

We know from statistics that if a quote obeyed the normal distribution law as a random value, the error in calculating the average would change with the number of periods and vanish at infinity, where the average coincides with the mathematical expectation, which is a constant for the normal law. The quotes could then be replaced with a straight horizontal line, and stop loss and take profit could be set at the levels of standard deviations. But that is not the case. Let's examine the reasons.

We will check the compliance of quotes with the normal distribution law.

Let's create EURUSD quotes histogram which looks as follows:

Fig. 7. EURUSD histogram

The histogram shows how many times a definite price has emerged within the range we have selected.

According to its look, the distribution is not normal, two tops spoil the whole picture. Let's carry out Jarque-Bera normality test with H0 null hypothesis: the distribution is normal. The result is shown below:

Parameter | Value (fact) | Theoretical value |
---|---|---|
Average | 1.3549 | The average should be equal to the median |
Median | 1.3580 | The median should be equal to the average |
Standard deviation | 0.0332 | - |
Asymmetry (skewness) | 0.0909 | 0.0 |
Kurtosis | 2.1052 | 3.0 |
Jarque-Bera | 3.5773 | 0.0 |
Probability | 0.1671 | 1.0 |

Table 1. Distribution normality test result
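The Jarque-Bera value and its probability in Table 1 can be reproduced from the skewness and kurtosis with the textbook formula JB = n/6 · (S² + (K − 3)²/4), whose p-value comes from the chi-square distribution with two degrees of freedom. The sketch below assumes n = 103 observations (the total reported in Table 2) and uses the rounded moments from the table, so it agrees with the tabulated figures only to about three decimal places.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """Return the Jarque-Bera statistics and its p-value.

    For a chi-square distribution with 2 degrees of freedom the
    probability of exceeding x is exp(-x / 2)."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Rounded values taken from Table 1; n is assumed from Table 2.
jb, p_value = jarque_bera(n=103, skewness=0.0909, kurtosis=2.1052)

if __name__ == "__main__":
    print(jb, p_value)
```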

According to the Jarque-Bera criterion, the conclusion about non-compliance with normality is not so clear-cut, because:

- The average and the median almost coincide;
- Asymmetry is close to zero;
- Kurtosis is close to three;
- The existing discrepancies are well reflected by the last line, "Probability", which gives 16.7186% for the hypothesis that the distribution is normal.

We may have different attitudes to this figure. On the one hand, we cannot reject the null hypothesis (the quotes are distributed normally) at a conventional significance level such as 5% (95% confidence). On the other hand, a figure of about 16.7% hardly allows us to consider the distribution normal.

Since the average almost coincides with the median (one of the normal distribution features), let's check whether we can trust the calculated values of the average. Let's carry out the test for the equality of averages by dividing the quotes into sections.

The result is as follows:

EURUSD | Quantity | Average | Standard deviation | Average error |
---|---|---|---|---|
[1.25, 1.3) | 4 | 1.2951 | 0.0034 | 0.0017 |
[1.3, 1.35) | 42 | 1.3262 | 0.0125 | 0.0019 |
[1.35, 1.4) | 48 | 1.3740 | 0.0133 | 0.0019 |
[1.4, 1.45) | 9 | 1.4131 | 0.0083 | 0.0027 |
All | 103 | 1.3549 | 0.0332 | 0.0032 |

Table 2. Comparing the average values at segments

As shown by this test, the average is calculated with an error having the most typical value of 19 pips that can reach up to 32 pips.
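The "Average error" column of Table 2 appears to be the standard error of the mean, i.e. the standard deviation divided by the square root of the number of observations. This is an assumption about how the table was computed, but it reproduces the tabulated values to within rounding or truncation of the last digit, as the sketch below shows.

```python
import math

def standard_error(n, std):
    """Standard error of the mean: std / sqrt(n)."""
    return std / math.sqrt(n)

# (segment, quantity, standard deviation, tabulated average error) from Table 2.
table2 = [
    ("[1.25, 1.3)", 4, 0.0034, 0.0017),
    ("[1.3, 1.35)", 42, 0.0125, 0.0019),
    ("[1.35, 1.4)", 48, 0.0133, 0.0019),
    ("[1.4, 1.45)", 9, 0.0083, 0.0027),
    ("All", 103, 0.0332, 0.0032),
]

if __name__ == "__main__":
    for segment, n, std, tabulated in table2:
        print(segment, round(standard_error(n, std), 5), tabulated)
```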

Considering that, we conclude that we cannot use the average.

The standard deviation value of 0.033209 looks very suspicious. These are 332 pips! Generally speaking, such a large standard deviation is obvious: EURUSD quote has a trend, which in fact is a regular deterministic component distorting any statistical characteristics of quotes.

**3.2. Testing the Quotes Autocorrelation**

The concept of "randomness" is based on the independence of the values of a random quantity from each other. The appearance of the quotes, however, reveals sections of directional movement: trends.

Determinism (presence of a trend) implies dependence between adjacent EURUSD values, which can be checked by calculating the autocorrelation function (ACF), i.e. the correlation between the adjacent EURUSD values.

The results are shown below:

Fig. 8. Autocorrelation function of EURUSD quotes

The probability associated with the Q-statistics is the same at every lag and equal to zero.

The calculations show that:

- The value of the autocorrelation function decreases smoothly, and this decrease is probably regular.

The calculated probability refers to the test whose null hypothesis is that there is no correlation up to lag 16 (in our case). Since this probability is equal to zero for all lags, we strictly reject the null hypothesis of no autocorrelation (no trend) in the quotes.
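The calculation behind the correlogram can be sketched as follows. I assume here that the Q-statistic is of the Ljung-Box form, which is what econometrics packages commonly report; the function names and the synthetic series are illustrative and are not the EURUSD data used above.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic for the null of no autocorrelation up to max_lag."""
    n = len(series)
    total = sum(autocorrelation(series, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total

# Illustrative data only: a straight upward drift, not real quotes.
trending = [1.30 + 0.001 * t for t in range(100)]
print(autocorrelation(trending, 1))   # close to 1 for a trending series
print(ljung_box_q(trending, 16))      # large Q: "no autocorrelation" is rejected
```

Under the null hypothesis Q follows a chi-square distribution with max_lag degrees of freedom, so a very large Q corresponds to a probability that is practically zero.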

**3.3. Quotes Stationarity Analysis**

We will analyze the stationarity of EURUSD quotes using the Dickey-Fuller test in its three versions: with a shift, with a shift and a trend, and without a shift or a trend.

The test result consists of two parts: for EURUSD and for the first difference of EURUSD quotes, denoted as D(EURUSD).

The null hypothesis of the test is that EURUSD is not stationary (has a unit root). Besides testing for a unit root, we will also calculate the statistical characteristics of the differenced EURUSD series. The chart of the first difference is presented below:

Fig. 9. EURUSD quotes residue

Visually, the differenced EURUSD quotes look like random oscillations located approximately around zero.

Let's examine three variants of the EURUSD quotes stationarity test.

**1. The quotes without a shift (a constant) and a trend, for which the regression looks as follows:**

D(EURUSD) = C(1) * EURUSD(1) + C(2) * D(EURUSD(1))

The probability of accepting the null hypothesis (the series is not stationary): 0.6961

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
EURUSD(1) | 3.09E-05 | 0.0488 | 0.9611 |
D(EURUSD(1)) | 0.2747 | 2.8759 | 0.0049 |

Table 3. Stationarity test results without considering the shift and the trend

Evaluation of the regression fitting to D(EURUSD) by R-square: 0.07702.

The following conclusions can be drawn from that data:

- EURUSD quotes should be recognized as not stationary with high probability (69%): we cannot reject the null hypothesis;
- The D(EURUSD) increment does not depend on the previous EURUSD price value: the probability of that ratio being equal to zero is 96%;
- The D(EURUSD) increment does depend on the previous increment D(EURUSD(1)): the probability of that ratio being equal to zero is only 0.5%;
- The R-square determination ratio of 0.077028 shows that the regression explains almost none of the variation in the differenced quotes D(EURUSD).

**2. EURUSD quote with a shift (a constant), for which the regression looks as follows:**

D(EURUSD) = C(1) * EURUSD(1) + C(2) * D(EURUSD(1)) + C(3)

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
EURUSD(1) | -0.0445 | -1.6787 | 0.0964 |
D(EURUSD(1)) | 0.3049 | 3.1647 | 0.0021 |
C | 0.0603 | 1.6803 | 0.0961 |

Table 4. Stationarity test results considering the shift

The probability of the null hypothesis acceptance (the series is not stationary): 0.4389

Evaluation of the regression fitting to D(EURUSD) by R-square: 0.1028

The following conclusions can be drawn from that data:

- EURUSD quotes should be recognized as not stationary with quite high probability (43%): we cannot reject the null hypothesis;
- We should not include the previous EURUSD price value and the constant (the shift) in the regression equation for the D(EURUSD) increment, since at the 5% significance level we cannot reject the hypothesis that these ratios are equal to zero;
- The D(EURUSD) increment does depend on the previous increment D(EURUSD(1)): the probability of that ratio being equal to zero is only 0.2%;
- The R-square determination ratio of 0.102876 shows that the regression explains almost none of the variation in the differenced quotes D(EURUSD).

**3. EURUSD quote with a shift (a constant) and a trend, for which the regression looks as follows:**

D(EURUSD) = C(1) * EURUSD(1) + C(2) * D(EURUSD(1)) + C(3) + C(4) * TREND

The probability of accepting the null hypothesis (the series is not stationary): 0.2541

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
EURUSD(-1) | -0.0743 | -2.6631 | 0.0091 |
D(EURUSD(-1)) | 0.2717 | 2.8867 | 0.0048 |
C | 0.0963 | 2.5891 | 0.0111 |
TREND(11/01/2010) | 8.52E-05 | 2.7266 | 0.0076 |

Table 5. Stationarity test results considering the shift and the trend

Evaluation of the regression fitting to D(EURUSD) by R-square: 0.1667

The following conclusions can be drawn from that data:

- EURUSD quotes should be recognized as not stationary with quite high probability (25%): we cannot reject the null hypothesis;
- Although the probability of the trend ratio being equal to zero is less than 1%, the value of this ratio is extremely small, i.e. the trend is practically a horizontal line;
- The R-square determination ratio of 0.166742 shows that the regression explains very little of the variation in the differenced quotes D(EURUSD).

From these calculations the following conclusion can be drawn: although the basic EURUSD quotes are not stationary, their first difference, obtained by subtracting the previous price value from the current one, is probably stationary.
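For readers who want to see what stands behind such a table, here is a minimal sketch of the first variant of the test regression (no shift, no trend, one lagged increment), estimated by ordinary least squares in plain Python. The function name and the synthetic random walk are my assumptions; the sketch does not compute the probability of the null hypothesis, because that requires the special Dickey-Fuller critical values rather than the usual Student table.

```python
import math
import random

def dickey_fuller_regression(y):
    """OLS fit of D(y) = c1 * y(-1) + c2 * D(y(-1)), no constant, no trend.

    Returns (c1, c2, t_stat_c1).
    """
    dy = [y[t] - y[t - 1] for t in range(2, len(y))]       # D(y)
    x1 = [y[t - 1] for t in range(2, len(y))]              # y(-1)
    x2 = [y[t - 1] - y[t - 2] for t in range(2, len(y))]   # D(y(-1))
    # Normal equations for two regressors, solved with Cramer's rule.
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * d for a, d in zip(x1, dy))
    s2y = sum(b * d for b, d in zip(x2, dy))
    det = s11 * s22 - s12 * s12
    c1 = (s1y * s22 - s2y * s12) / det
    c2 = (s2y * s11 - s1y * s12) / det
    rss = sum((d - c1 * a - c2 * b) ** 2 for d, a, b in zip(dy, x1, x2))
    sigma2 = rss / (len(dy) - 2)
    se_c1 = math.sqrt(sigma2 * s22 / det)   # from the inverse of X'X
    return c1, c2, c1 / se_c1

# Synthetic random walk (illustrative, not real quotes).
random.seed(1)
walk = [1.35]
for _ in range(200):
    walk.append(walk[-1] + random.gauss(0.0, 0.01))
print(dickey_fuller_regression(walk))
```

For a series with a unit root the ratio at y(-1) stays close to zero, which is the same pattern as in the EURUSD(1) row of Table 3.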

In this case we have removed a trend and a shift, which can be described by the following equation:

eurusd = c(1) * trend + c(2),

where c(1) and c(2) are constants that can be evaluated by the method of least squares.

This equation is an ordinary regression equation that coincides completely with the "regression" tool in the MT4 terminal. In other words, we have replaced the basic quote with a straight line. This is a widely used method in technical analysis: we can easily recall a wide range of instruments consisting of straight lines, such as channels, support and resistance levels, Fibonacci levels, Gann etc.

Straight lines are the first tool used by any trader. But why do we trust this tool? Why do we consider the straight lines to be reliable? We will answer this question later in the article.

In addition to straight lines, technical analysis also uses indicators that replace the basic quotes with some curve. We will do the same and take two well-known indicators for analysis: the exponential moving average and the Hodrick-Prescott filter.

### 4. Quotes Detrending

The term "detrending" is used to emphasize the connection of this section with the corresponding notion in econometrics. More precisely, and in accordance with the previously declared model of financial markets, we should talk about removing a regular component from the quotes (detrending).

We have determined three regular components in our case: linear trend, exponential moving average and Hodrick-Prescott filter.

All regular components will be set as time series.

**4.1. Linear Trend**

Let's define the linear trend as a series in which each value is obtained by adding one to the previous value.

We will evaluate the linear regression ratios:

eurusd = c(1) * trend + c(2)

We get the combined chart of the EURUSD basic quote, the straight regression line shifted vertically, and the residue obtained by deducting the regression line from the quote:

Fig. 10. EURUSD chart, linear regression and residue

Now we evaluate the following equation using the method of least squares:

EURUSD = C(1)*TREND + C(2)

The evaluation of the regression equation is accompanied by the following data:

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
TREND | 0.0004 | 4.4758 | 0.0000 |
C | 1.3318 | 223.3028 | 0.0000 |

Table 6. Linear trend stationarity test results

R-square of the regression fit to the quote: 0.1655.

The following conclusions can be drawn from the result:

- According to the R-square determination ratio, the straight line explains only about 16% of the variation in the quotes;
- The residue obtained by deducting the linear trend from the quote differs little from the quote itself. Apparently, it will have the same statistical flaws as the quote.
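The linear detrending step can be sketched in a few lines. The sketch assumes that the trend series starts from zero (the starting value does not change the slope, the R-square or the residue, only the constant); the function name is illustrative.

```python
def fit_linear_trend(quotes):
    """Least-squares fit of quotes = c1 * trend + c2 with trend = 0, 1, 2, ...

    Returns (c1, c2, r_squared, residue).
    """
    n = len(quotes)
    trend = list(range(n))
    mean_t = sum(trend) / n
    mean_q = sum(quotes) / n
    s_tt = sum((t - mean_t) ** 2 for t in trend)
    s_tq = sum((t - mean_t) * (q - mean_q) for t, q in zip(trend, quotes))
    c1 = s_tq / s_tt
    c2 = mean_q - c1 * mean_t
    # The residue is what remains after deducting the straight line.
    residue = [q - (c1 * t + c2) for t, q in zip(trend, quotes)]
    ss_res = sum(r * r for r in residue)
    ss_tot = sum((q - mean_q) ** 2 for q in quotes)
    return c1, c2, 1.0 - ss_res / ss_tot, residue
```

R-square here is the share of the variance of the quotes reproduced by the straight line, so a value of about 0.17 leaves most of the movement in the residue.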

**4.2. Exponential Smoothing**

For exponential smoothing we select the Holt-Winters algorithm without a seasonal component, with smoothing parameters for the quote (level) and for the trend.

The main idea of the method:

- Remove the trend from the time series by separating the level from the trend;
- Smooth the level (the a parameter);
- Smooth the trend forecast (the b parameter).

The obtained result is shown in the figure.

Fig. 11. Exponential moving average

We have obtained a standard exponential moving average that lags a bit but follows the quote well enough. The smoothing parameters are displayed at the top of the figure; no parameter selection has been carried out.
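The smoothing described above can be sketched as Holt's linear (double) exponential smoothing, which is the Holt-Winters scheme with the seasonal part dropped. The parameter names alpha and beta correspond to the a and b parameters; the start-up values for the level and the trend are a simple choice of mine, and a statistics package may initialize them differently.

```python
def holt_smooth(series, alpha, beta):
    """Holt's linear exponential smoothing without seasonality.

    alpha smooths the level, beta smooths the trend. Returns the list of
    smoothed levels, one per input value.
    """
    level = series[0]
    trend = series[1] - series[0]   # simple start-up choice (an assumption)
    levels = [level]
    for value in series[1:]:
        previous_level = level
        # The level is pulled towards the new value ...
        level = alpha * value + (1.0 - alpha) * (previous_level + trend)
        # ... and the trend towards the latest change of the level.
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
        levels.append(level)
    return levels
```

Smaller alpha and beta give a smoother but more lagging curve; on a perfectly straight series the smoothed level reproduces the input.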

Now we evaluate the following equation using the method of least squares:

EURUSD = C(1)*EURUSD_EX + C(2)

The evaluation of the regression equation is accompanied by the following data:

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
EURUSD_EX | 0.9168 | 24.3688 | 0.0000 |
C | 0.1145 | 2.2504 | 0.0266 |

Table 7. Linear regression evaluation results

R-square of the regression fit to the quote: 0.8546.

The following conclusions can be drawn from the result:

- According to the R-square determination ratio, the exponential moving average explains about 85% of the variation in the quotes;
- The residue obtained by deducting the exponential average from the quote resembles a random process with a normal distribution, so further analysis of that residue makes sense.

**4.3. Hodrick-Prescott Filter**

The Hodrick-Prescott filter has a single parameter, lambda.

We will not deal with the selection of this parameter and simply take it equal to 8162.

The result is shown below:

Fig. 12. Hodrick-Prescott filter
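For reference, the filter itself can be sketched with NumPy. The sketch uses the standard definition of the Hodrick-Prescott trend as the solution of a penalized least-squares problem; the function name is illustrative, and a dense matrix is used only because the series is short.

```python
import numpy as np

def hodrick_prescott(series, lam):
    """Return (trend, cycle) of the Hodrick-Prescott decomposition.

    The trend minimises sum((y - trend)**2) plus lam times the sum of
    squared second differences of the trend, which leads to the linear
    system (I + lam * D'D) trend = y, where D takes second differences.
    """
    y = np.asarray(series, dtype=float)
    n = y.size
    d = np.zeros((n - 2, n))
    for i in range(n - 2):
        d[i, i:i + 3] = (1.0, -2.0, 1.0)
    trend = np.linalg.solve(np.eye(n) + lam * d.T @ d, y)
    return trend, y - trend
```

With lam equal to 8162 the call would be `hodrick_prescott(quotes, 8162)`; the larger lambda is, the closer the trend comes to a straight line.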

Now we evaluate the following equation using the method of least squares:

EURUSD = C(1)*EURUSD_HP + C(2)

The evaluation of the regression equation is accompanied by the following data:

Variable | Ratio | t-statistics | Probability of being equal to zero |
---|---|---|---|
EURUSD_HP | 1.0577 | 23.9443 | 0.0000 |
C | -0.0782 | -1.3070 | 0.1942 |

Table 8. Evaluation results of fitting the regression to the quotes

R-square of the regression fit to the quote: 0.8502.

The following conclusions can be drawn from the result:

- The probability of the second ratio (the constant) being equal to zero is 19%, which casts doubt on the use of the constant in the regression equation;
- According to the R-square determination ratio, the Hodrick-Prescott filter explains about 85% of the variation in the quotes;
- The residue obtained by deducting the Hodrick-Prescott filter from the quote resembles a random process with a normal distribution, so it makes sense to analyze it further.

### 5. Ratios Diagnostics

Ratios diagnostics includes the following tests:

- The confidence ellipse shows the correlation between the ratios of the regression equation: the closer the ellipse is to a circle, the weaker the correlation;
- The confidence interval defines the limits within which the equation ratios vary. In technical analysis, the ratios are constants that can usually be changed through the "period" parameter or in some other way. In any case, the ratios are not treated as random values. Let's check whether that is true;
- Missed variables test: the null hypothesis is that an additional independent variable is not significant;
- Redundant variables test: the null hypothesis is that the ratio of an additional variable is equal to zero;
- The breakpoints test determines the presence of points at which the statistical characteristics of the quotes change. We will take trend reversal points in the technical analysis sense as candidates for such points. In the analyzed EURUSD quotes we can distinguish at least two trends, descending and ascending (here we ignore flat movement).

**5.1. Confidence Ellipse**

Let's create confidence ellipses for each of the regression equations:

Fig. 13. Confidence ellipse for the regression equation 1

Fig. 14. Confidence ellipse for the regression equation 2

Fig. 15. Confidence ellipse for the regression equation 3

The following conclusions can be drawn from the figures:

- The ratios of the linear trend regression are correlated; the correlation can be roughly estimated at 0.5;
- For the regressions with the exponential moving average and the Hodrick-Prescott filter, the correlation is practically equal to one, which requires excluding the constants from these regression equations. The significant probability of the constant being equal to zero supports the idea of excluding it from the regression equation with the Hodrick-Prescott filter.
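Why the correlation comes out so close to one (in absolute value) for the two smoothed curves can be seen from a short sketch. For a regression of the form y = c1*x + c2, the correlation between the two estimated ratios follows from the covariance matrix of the least-squares estimates and depends only on the regressor. The function name and the synthetic regressor are illustrative assumptions, not the series used in the article.

```python
import math

def slope_intercept_correlation(x):
    """Correlation between the OLS estimates of c1 and c2 in y = c1*x + c2.

    It follows from the covariance matrix sigma^2 * inverse(X'X) and equals
    -mean(x) / sqrt(mean(x^2)).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_x2 = sum(v * v for v in x) / n
    return -mean_x / math.sqrt(mean_x2)

# Illustrative regressor that stays near 1.35, like a smoothed EURUSD curve.
smooth = [1.35 + 0.03 * math.sin(t / 10.0) for t in range(103)]
print(slope_intercept_correlation(smooth))   # very close to -1
```

When the regressor barely moves relative to its own level, the constant and the slope can compensate for each other almost perfectly, so the data cannot pin them down separately.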

**5.2. Confidence Interval**

Let's check the assumption that the constants in the regression equation are random values.

To do this, we should create confidence intervals:

Variable | Ratio | 90% interval: lower border | 90% interval: upper border | 90% interval: % from the interval | 95% interval: lower border | 95% interval: upper border | 95% interval: % from the interval |
---|---|---|---|---|---|---|---|
TREND | 0.0004 | 0.0002 | 0.0006 | 74.3362 | 0.0002 | 0.0006 | 88.7168 |
C | 1.3318 | 1.3219 | 1.3417 | 1.4868 | 1.3200 | 1.3436 | 1.7767 |
EURUSD_EX | 0.9168 | 0.8543 | 0.9793 | 13.6247 | 0.8422 | 0.9914 | 16.2810 |
C | 0.1145 | 0.0300 | 0.1991 | 147.5336 | 0.0135 | 0.2155 | 176.2960 |
EURUSD_HP | 1.0577 | 0.9844 | 1.1310 | 13.8661 | 0.9701 | 1.1453 | 16.5694 |
C | -0.0782 | -0.1776 | 0.0211 | 254.0276 | -0.1970 | 0.0405 | 303.5529 |

Table 9. Confidence levels of the regression ratios

Looking at the confidence intervals, we can see that the ratio is a random value and behaves as one: as the confidence level rises from 90% to 95%, the interval widens.

The "% from the interval" column is of great interest, as it shows the width of the confidence interval as a percentage of the ratio value. As we can see, for the regression constants with the exponential average and the filter this quantity takes completely unacceptable values of over 100%! It is worth mentioning again that the correlation between the two ratios of these equations is almost equal to one.
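The relative width of the interval can be recovered from the t-statistics alone, which gives a quick cross-check of the table. The sketch below assumes approximate two-sided Student critical values for about 100 degrees of freedom (1.660 and 1.984); these values and the function name are mine, not taken from the calculations above.

```python
def relative_interval_width(t_statistic, t_critical):
    """Confidence-interval width as a percentage of the ratio value.

    width = 2 * t_critical * std_error and std_error = |ratio| / |t|,
    so width / |ratio| = 2 * t_critical / |t|.
    """
    return 100.0 * 2.0 * t_critical / abs(t_statistic)

# Approximate two-sided critical values for about 100 degrees of freedom
# (an assumption of this sketch).
T_CRIT_90 = 1.660
T_CRIT_95 = 1.984

# t-statistics of the two constants from Tables 7 and 8.
for name, t_stat in [("C (exponential average)", 2.2504),
                     ("C (Hodrick-Prescott)", -1.3070)]:
    print(name,
          round(relative_interval_width(t_stat, T_CRIT_90), 1),
          round(relative_interval_width(t_stat, T_CRIT_95), 1))
```

The results land close to the values in Table 9. A relative width above 200% means that the interval contains zero, which is the case for the constant of the regression with the Hodrick-Prescott filter.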

Let's remove the constant from the equations and re-evaluate the regression ratios.

The following result will be obtained:

Variable | Ratio | 90% interval: lower border | 90% interval: upper border | 90% interval: % from the interval | 95% interval: lower border | 95% interval: upper border | 95% interval: % from the interval |
---|---|---|---|---|---|---|---|
EURUSD_EX | 1.0014 | 0.9999 | 1.0030 | 0.3131 | 0.9996 | 1.0033 | 0.3742 |
EURUSD_HP | 1.0000 | 0.9984 | 1.0015 | 0.3127 | 0.9981 | 1.0018 | 0.3737 |

Table 10. Confidence intervals of the recalculated regression ratios

I will not display the new calculations for the regressions with the exponential average and the filter in order not to make the article too big.

I will just mention that the following regression equations will be used further on:

EURUSD = 1.00149684612*EURUSD_EX

EURUSD = 1.00002609628*EURUSD_HP

**5.3. Missed and Excessive Variables (Indicators)**

A typical algorithm for creating a trading system consists of the following steps. Some indicator is taken and a trading system based on it is tested. Then an additional indicator is added to filter out false signals of the trading system, and so on.

This algorithm cannot show when a trader must stop. It cannot indicate whether additional indicators are needed or whether some indicators should be excluded from the trading system. The existing theory of trading system creation cannot answer these questions, but the answers can be found by carrying out the tests for missed and excessive variables (indicators).

Missed variables test: the null hypothesis is that an additional independent variable is not significant.

Let's create one complex indicator out of all three ones that we have:

EURUSD = C(1)*TREND + C(2) + C(3)*EURUSD_EX + C(4)*EURUSD_HP

Evaluating the ratios of this integral indicator (regression), we get the following result:

EURUSD = 1.41879198369e-05*TREND - 0.00319950161771 + 0.50111527265*EURUSD_EX + 0.501486719095*EURUSD_HP

The probabilities of the corresponding ratios being equal to zero are shown in the following table:

Variable | Ratio | Probability of being equal to zero |
---|---|---|
TREND | 1.42E-05 | 0.7577 |
C | -0.0032 | 0.9608 |
EURUSD_EX | 0.5011 | 0.0000 |
EURUSD_HP | 0.5014 | 0.0004 |

Table 11. Evaluation of the probability of the indicator ratios being equal to zero

The table shows that we should not have included the TREND indicator and the constant, as we cannot reject the hypothesis that their ratios are equal to zero.

Let's add one more indicator (the square of the exponential average, eurusd_ex^2) to the previous integral indicator and carry out the missed variable test for eurusd_ex^2 with the null hypothesis: the additional eurusd_ex^2 variable is not significant.

According to the calculated t and F-statistics, the probability of the additional variable (eurusd_ex^2) not being significant is equal to 44.87%. On this basis, it can be argued that additional indicators are not needed in our trading system.
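The F-statistic behind such a test compares the residual sums of squares of the regression without and with the additional variable. Here is a minimal NumPy sketch; the function name and argument layout are illustrative assumptions, and converting the statistic into a probability requires the F distribution, which is left out.

```python
import numpy as np

def omitted_variable_f(y, x_restricted, x_extra):
    """F-statistic for adding the columns of x_extra to a regression of y.

    x_restricted is the 2-D matrix of regressors already in the model.
    """
    y = np.asarray(y, dtype=float)
    xr = np.asarray(x_restricted, dtype=float)
    xu = np.column_stack([xr, np.asarray(x_extra, dtype=float)])

    def rss(x):
        # Residual sum of squares of the least-squares fit of y on x.
        coef, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ coef
        return float(resid @ resid)

    q = xu.shape[1] - xr.shape[1]     # number of added variables
    dof = len(y) - xu.shape[1]        # residual degrees of freedom
    rss_r, rss_u = rss(xr), rss(xu)
    return ((rss_r - rss_u) / q) / (rss_u / dof)
```

A small F means that the additional variable barely reduces the residual sum of squares, which is the situation reported above for eurusd_ex^2.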

But an even more interesting thing is the estimation of the overall indicator with eurusd_ex^2 added, shown in the table:

Variable | Ratio | Probability of being equal to zero |
---|---|---|
TREND | 1.69E-05 | 0.7154 |
C | 1.9682 | 0.4496 |
EURUSD_EX | -2.3705 | 0.5317 |
EURUSD_HP | 0.4641 | 0.0020 |
EURUSD_EX^2 | 1.0724 | 0.4487 |

Table 12. Evaluation of the probability of the overall indicator ratios being equal to zero with eurusd_ex^2

The table shows that only the indicator based on Hodrick-Prescott filter is of some interest.

Redundant variables test: the null hypothesis is that the ratio of an additional variable is equal to zero.

Let's look at the matter from the other side and carry out the redundant variables test with the null hypothesis: the ratio of the redundant variable is equal to zero. We will specify trend and c as the redundant variables in our complex indicator.

According to the calculated t and F-statistics, the probability of the ratios of the trend and c variables being equal to zero is 92.95%. On this basis, it can be argued that trend and c are redundant variables in our trading system. That agrees well with the previous results.

The evaluation of the overall indicator consisting of the exponential average and the Hodrick-Prescott filter looks as follows:

Variable | Ratio | Probability of being equal to zero |
---|---|---|
EURUSD_EX | 0.4992 | 0.00 |
EURUSD_HP | 0.5015 | 0.00 |

Table 13. Evaluation of the probability of the overall indicator ratios being equal to zero, in case this indicator consists of the exponential moving average and Hodrick-Prescott filter

In other words, we have no doubts about the usefulness of these indicators in the trading system.

### 6. Residues Diagnostics

**6.1. Autocorrelation - Q Statistics**

Fig. 16. Autocorrelation function after deduction of the linear trend

The correlogram shows that deducting the linear trend from the basic quote does not remove the trend, as the ACF shows. The probability of no correlation is equal to zero, i.e. we strictly reject the null hypothesis at all significance levels.

Fig. 17. Autocorrelation function after deduction of the exponential smoothing

The correlogram shows that deducting the exponential curve from the basic quote has removed the trend at all lags (candlesticks) above the second one, as the ACF shows.

According to the calculations, the probability of no correlation is equal to zero, i.e. we strictly reject the null hypothesis at all significance levels.

But if we make some additional effort and remove the correlation at the first two lags, we will be able to obtain a residue without correlations.

Fig. 18. Autocorrelation function after deduction of Hodrick-Prescott filter

The correlogram shows that deducting the Hodrick-Prescott filter from the basic quote has removed the trend at all lags (candlesticks) above the third one, as the ACF shows. The probability of no correlation is equal to zero, i.e. we strictly reject the null hypothesis at all significance levels. But if we make some additional effort and remove the correlation at the first two lags, we will be able to obtain a residue without correlations.

Conclusion. The attempt to remove the deterministic component by deducting our indicators from the basic EURUSD quote has failed completely for the linear trend and has partially succeeded for the exponential moving average and the Hodrick-Prescott filter.

Further analysis of our indicators is meaningless because of the remaining autocorrelation (deterministic component). We should find an indicator that eliminates the autocorrelation in the residues. We will do that in the next section.

### 7. Creating and Examining an Indicator Based on the Analysis

At present we do not have a formal theory for creating a set of indicators. The only way is a direct search, selecting some set according to the analysis results.

The previous autocorrelation analysis showed that autocorrelation remains at the first lags (candlesticks) of the quotes after detrending.

Let's examine the following equation with that fact in mind:

EURUSD = C(1)*EURUSD_HP(1) + C(2)*D(EURUSD_HP(1)) + C(3)*D(EURUSD_HP(2))

D(EURUSD_HP(1)) is the residue between the quote and the Hodrick-Prescott filter smoothing at the first lag (that is, the second bar rather than the first one, if bars are counted starting from one).

Evaluating the ratios of that equation by the least squares method gives the following results:

Variable | Ratio | Probability of being equal to zero |
---|---|---|
EURUSD_HP(1) | 1.0001 | 0.0000 |
D(EURUSD(1)) | 0.8262 | 0.0000 |
D(EURUSD(-2)) | -0.4881 | 0.0000 |

Table 14. Results of the ratios evaluation by using the least squares method

According to the excessive variables test and the calculated t and F-statistics, the probability that the ratios of the eurusd(1) and eurusd(2) variables are equal to zero is zero, i.e. these two variables are not excessive.

The autocorrelation test shows the absence of dependencies up to lag 16 with a probability of more than 70% (first signature line):

Fig. 19. Residue autocorrelation

The White heteroscedasticity test gives an F-statistic confirming that heteroscedasticity is absent with a probability of 80%.

The Quandt-Andrews breakpoint test with the null hypothesis "no breakpoints" gives the following result: the null hypothesis is accepted with a probability of 71% (no breakpoints).

It should be mentioned once again that, according to standard technical analysis, the examined quotes have at least one breakpoint (one trend reversal). But our indicator has similar statistical parameters for both the descending and the ascending trend and is therefore invariant to the market state.

The Ramsey integral test with the null hypothesis "the errors of the regression are normally distributed" is accepted with a probability of 48% by the t and F-statistics. On this basis, we can neglect the residue autocorrelation and its heteroscedasticity.

This also means that the least squares estimates are unbiased (the mathematical expectation of the estimate coincides with the estimated value), so it is possible to carry out the recursive residues test.

Let's test the one-step-ahead recursive residues forecast. The upper part of the figure shows the recursive residues and the limit lines at two standard deviations. In addition, the left axis shows the probabilities for those candlesticks at which the hypothesis of constant indicator ratios would be rejected at the 5%, 10% and 15% significance levels. There are not many such points, but their existence means false triggering of stop losses and take profits.

Fig. 20. Recursive residues forecast test

Let's look at the recursive estimates of the regression equation ratios. The chart is formed as follows: the ratio values are calculated for the leftmost bars. Then one bar is added and the ratio values are calculated again, and so on up to the very last bar. With only a small number of bars at the left side, the ratio values are, of course, very unstable. However, as the number of bars used in the calculation grows, stability (constancy) improves.
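The procedure can be sketched for the simplest case of one regressor without a constant, y = c * x. This is a deliberate simplification of the three-ratio equation above, and the function name and the min_bars parameter are my assumptions; the idea of re-estimating on a growing window is the same.

```python
def recursive_estimates(y, x, min_bars=2):
    """Expanding-window least-squares estimates of c in y = c * x.

    The first estimate uses the first min_bars bars; every following
    estimate adds one more bar to the sample.
    """
    estimates = []
    sum_xy = 0.0
    sum_xx = 0.0
    for bar, (xv, yv) in enumerate(zip(x, y), start=1):
        sum_xy += xv * yv
        sum_xx += xv * xv
        if bar >= min_bars:
            estimates.append(sum_xy / sum_xx)
    return estimates
```

The last element of the returned list coincides with the estimate on the full sample, and the earlier elements show how far the ratio wandered before it settled.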

Fig. 21. C(1) ratio recursive estimates

Fig. 22. C(2) ratio recursive estimates

Fig. 23. C(3) ratio recursive estimates

The figures show that some instability was observed at the beginning of the quotes interval, after which the ratio values can be considered to have become stable. However, strictly speaking, the ratios of our regression equation are not constants.

### Conclusion

This article has presented one more proof that financial data is not stationary. The standard method of decomposing nonstationary data into a sum of components has been used to obtain a stationary residue.

Having a stationary residue of basic quotes, we can answer the main question concerning the stability of the obtained indicator.

The information presented in the article is just the beginning of creating a trading system, which can and should be based on quote forecasts.


Translated from Russian by MetaQuotes Software Corp.

Original article: https://www.mql5.com/ru/articles/320