Discussion of article "Econometric Approach to Analysis of Charts" - page 10

 
denkir:

I understand, I use the concept of "class" or "interval" for this.

faa1947, in your figure I see that the distribution is not unimodal. That is another problem.

The number of classes (bins) is also calculated by various formulas and rules; the best-known are:

Sturges' formula, Freedman-Diaconis rule, Scott's rule, Square-root choice, etc.
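These rules can be computed directly. A minimal numpy sketch (the sample here is simulated standard-normal data, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # stand-in for 100 observations
n = len(x)

# Square-root choice: k = ceil(sqrt(n))
k_sqrt = int(np.ceil(np.sqrt(n)))

# Sturges' formula: k = ceil(log2(n)) + 1
k_sturges = int(np.ceil(np.log2(n))) + 1

# Scott's rule: bin width h = 3.49 * s / n^(1/3)
h_scott = 3.49 * x.std(ddof=1) / n ** (1 / 3)
k_scott = int(np.ceil((x.max() - x.min()) / h_scott))

# Freedman-Diaconis rule: h = 2 * IQR / n^(1/3)
q75, q25 = np.percentile(x, [75, 25])
h_fd = 2 * (q75 - q25) / n ** (1 / 3)
k_fd = int(np.ceil((x.max() - x.min()) / h_fd))

print(f"sqrt: {k_sqrt}, Sturges: {k_sturges}, "
      f"Scott: {k_scott}, Freedman-Diaconis: {k_fd}")
```

The rules disagree with each other even on the same data, which is part of why the number of histogram columns is a modelling choice rather than a fact about the sample.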

See my post above, -Alexey-. Still, you should start from the sample length, which is determined by the purpose of the forecast for the trading system (TS). That is: is the sample long enough to answer the question of whether one may enter (or exit) a position? Most likely the sample length will not be large (up to a hundred), and fantastic distributions like the one shown above will disappear.
 
-Alexey-:

denkir :

I think you can. They should not affect the statistical parameters of the sample in any way (in particular, the distribution parameters). That's why they are outliers.

What is the basis for the above statement? (by the way, this is not the reason why they are removed, although they do affect the parameters).

On the basis that the sample size is large enough not to be affected by outliers. This is one of the reasons why I favour a large sample size. Then colleague faa1947 said, if I'm not mistaken, that one outlier is not the same as another. It is clear that if, for example, the EURUSD rate fluctuated in the sample within the range of $1.20-1.50, and then some server error returned the value $15.77, such an outlier will wreck all the statistical parameters of the sample.
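The effect of such a single erroneous quote on the sample statistics is easy to demonstrate. A minimal numpy sketch (the simulated quotes and the $15.77 value mirror the example above; none of this is real terminal data):

```python
import numpy as np

rng = np.random.default_rng(1)
# 99 plausible EURUSD quotes plus one erroneous server value
clean = rng.uniform(1.20, 1.50, size=99)
with_outlier = np.append(clean, 15.77)

print(f"clean:        mean={clean.mean():.3f}  std={clean.std(ddof=1):.3f}")
print(f"with outlier: mean={with_outlier.mean():.3f}  "
      f"std={with_outlier.std(ddof=1):.3f}")
```

A single bad value out of 100 shifts the mean well outside the true trading range and inflates the standard deviation by an order of magnitude, which is exactly why such gross errors must be screened out before any fitting.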


-Alexey-:

A true distribution characterises the properties of a series, but a non-stationary one is by definition one in which they change.

But tell me, what is the connection between the true distribution and stationarity? I confess I have never encountered such a connection. Where did you get it from, if it is not a secret? ;-)


 
faa1947:
See my post above, -Alexey-. Still, you should start from the sample length, which is determined by the purpose of the forecast for the trading system (TS). That is: is the sample long enough to answer the question of whether one may enter (or exit) a position? Most likely the sample length will not be large (up to a hundred), and fantastic distributions like the one shown above will disappear.

See, here's the thing. By reducing the sample size, we may miss a phenomenon such as volatility clustering. Then there is no point in using a non-linear model, so we can use a linear one. And this does not reflect the nature of the financial series, its derivative (not to be confused with the derivative in mathematical analysis).
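Volatility clustering is exactly what a too-short sample can hide: raw returns look uncorrelated, but their squares do not. A minimal numpy sketch (the GARCH(1,1) parameters and the `acf1` helper are illustrative assumptions, not taken from the thread):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a GARCH(1,1)-style series: returns with clustered volatility
n, omega, alpha, beta = 3000, 0.05, 0.15, 0.80
r = np.empty(n)
sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    r[t] = np.sqrt(sigma2) * rng.standard_normal()
    sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

# Raw returns are nearly uncorrelated, but squared returns are not:
print(f"lag-1 ACF of returns:         {acf1(r):+.3f}")
print(f"lag-1 ACF of squared returns: {acf1(r ** 2):+.3f}")
```

A linear (ARMA-type) model of the returns sees only the first number and would call the series noise; the second number is the signature of clustering that motivates non-linear models.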

And what data did you use to obtain the bimodal distribution? Could you upload a file with the data?

 
denkir:

See, here's the thing. By reducing the sample size, we may miss a phenomenon such as volatility clustering. Then there is no point in using a non-linear model, so we can use a linear one. And this does not reflect the nature of the financial series, its derivative (not to be confused with the derivative in mathematical analysis).

If we profess the systems approach, we should start from the goal: entering/exiting a position. What is the bottom line? If we take a limited sample, volatility or non-stationarity need not be present at all, right? The model has to be chosen. ARMA (not even ARIMA) is widely used. Our goal is not to apply a particular model; the goal is to get a forecast with a reasonable level of confidence limits. It would be interesting simply to calculate the forecast over, say, 100, 500 or 1000 candlesticks. Maybe we will see something.

And what are these data on the basis of which you obtained the bimodal distribution? Could you upload a file with the data?

EURUSD D1 from 1999/01/04 to 2011/01/13, 3063 candlesticks taken from the terminal.

 
denkir:

On the basis that the sample is large enough to be unaffected by outliers. This is one of the reasons why I am in favour of a large sample size. Then my colleague faa1947 said, if I am not mistaken, that one outlier is not the same as another. It is clear that if, for example, the EURUSD rate fluctuated in the sample within the range of $1.20-1.50, and then some server error returned the value $15.77, such an outlier will wreck all the statistical parameters of the sample.


But you tell me, what is the connection between the true distribution and stationarity? I confess, I've never seen such a connection. Where did you get it from, if it is not a secret? ;-)


I answered you on the last page with a quote from the textbook. If it is not clear, I will explain in more detail:

The concept of a general population is in a certain sense similar to the concept of a random variable (probability distribution law, probability space), because it is completely conditioned by a certain set of conditions.

A certain set of conditions generates a stationary process. If that set is undefined, there is no general population and, accordingly, no true distribution. The logic is roughly like this. Although, of course, I could be wrong.

As for your answer - it is wrong. By the way, I was hoping to see a reference to the material on the basis of which you made the above statement. If you do not have one (i.e. you do not know why outliers are removed and how they affect the distribution), then I recommend that you, as you earlier recommended to me, study the material. Everything is clear with gross outliers, and you rightly note that there should be a check for them (such a check can detect a gap in quotes), but the question is not about them, but about 3-4-5 sigma values. It is also interesting that, even without touching the question of whether they should be removed or not, the methodology of removing them (at least according to the book mentioned above, although it does not say everything) is, I think, a very large and lengthy piece of work, and quite a complicated one.

 
-Alexey-:

A certain set of conditions generates a stationary process. If it is uncertain, there is no general population, and therefore no true distribution. The logic is roughly like this. Although, of course, I may be wrong.

According to Wikipedia: the general population is the set of values that a researcher has selected for analysis. A researcher can select the general population by any rule, including one as dubious as "stationarity". It is normal for the rule to be chosen outside the concepts of statistics, for example, "all transactions buying euros for roubles". The transactions are generated by some economic process that we know only approximately, and we do not know the number of transactions. These two circumstances force us to consider non-stationarity the main characteristic of currency quotations; everything else is its symptoms.

Everything is clear with gross outliers, and you have correctly noted that there should be a check for them (such a check can detect a gap in quotes), but the discussion (and the question) is not about them, but about 3-4-5 sigma values.

Below I give histograms of D1 quotes of the euro/dollar pair over 100 candlesticks.

Here are the descriptive statistics of the sample

Then one of the Close values was successively changed to 3 sigma, 5 sigma and 5 times the mean.

We see:

1. A left tail everywhere.

2. The number of histogram columns changes, even though 20 intervals per 100 candlesticks are specified everywhere.

3. The p-value of the fit to the normal law and, accordingly, the confidence range of the fit changes.
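The sensitivity of a normality fit to a single 3- or 5-sigma substitution can be reproduced numerically. A minimal numpy sketch using the Jarque-Bera statistic as the normality measure (the simulated 100-point sample stands in for the 100 candlesticks; the thread's own figures used a different fit test):

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4).

    Larger values mean a worse fit to the normal law.
    """
    n = len(x)
    z = (x - x.mean()) / x.std()
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean()
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

rng = np.random.default_rng(4)
x = rng.normal(size=100)   # 100 "candlesticks"
s = x.std(ddof=1)

for k in (0, 3, 5):
    y = x.copy()
    if k:
        y[0] = x.mean() + k * s   # replace one Close with a k-sigma value
    print(f"{k}-sigma substitution: JB = {jarque_bera(y):.2f}")
```

One substituted value in a sample of 100 is enough to move the statistic (and hence the p-value of the fit) sharply, which is the point of the experiment above: at this sample size the normality verdict hangs on one or two observations.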

 
faa1947:
According to Wikipedia: the general population is the set of values that a researcher has selected for analysis. A researcher can select the general population by any rule, including one as dubious as "stationarity". It is normal for the rule to be chosen outside the concepts of statistics, for example, "all transactions buying euros for roubles". The transactions are generated by some economic process that we know only approximately, and we do not know the number of transactions. These two circumstances force us to consider non-stationarity the main characteristic of currency quotations; everything else is its symptoms.

Half of it is wrong, the other half is understated, and some of it is correct. That is why it is desirable to check Wikipedia against specialised literature, preferably literature recommended by the Ministry of Education and the like. Later I will write out the definition from the encyclopaedic dictionary of statistics. I think you are completely confused.

A population is a set of values that a researcher has selected for analysis.

That is only when all existing data have been selected and it is known that there are no others. Otherwise, if I take 10 candles, that is a sample, not a general population. For example, when you measure the operation of a machine, it is known that tomorrow there will be new data, so there is never a general population. The general population is estimated from the sample - read about estimation and about sampling. And in estimation one does not fit the normal law (unless it is known to be such); the law is identified by special methods, and while it is unknown, how can anything be removed? (I gave you a link to the book - this is written about there.) And further, estimation is done not on one sample but on several, special tests are compared, and so on.
 
-Alexey-:

Half of it is wrong, the other half is understated, and some of it is correct. Therefore, it is desirable to check Wikipedia against specialised literature, preferably literature recommended by the Ministry of Education and the like. Later I will write out the definition from the encyclopaedic dictionary of statistics. I think you are completely confused.

This is only when all existing data are taken and it is known that there are no others. Otherwise, if I take 10 candles, that is a sample, not a general population. For example, when you measure the operation of a machine, it is known that tomorrow there will be new data, so there is never a general population. The general population is estimated from the sample - read about estimation. And in estimation one does not fit the normal law (unless it is known to be such); the law is identified by special methods, and while it is unknown, how can anything be removed? (I gave you a link to the book - this is written about there.) And further, estimation is done not on one sample but on several, special tests are compared, and so on.

Unfortunately, you allowed yourself to pull separate phrases out of context without trying to understand the meaning of my post. Once again: in forex, the rule for selecting the general population is all transactions, and they are not known to us. The definition from an encyclopaedia interests me no more than the one from Wikipedia - I am interested in the forecast, not in the statistical properties of the general population. Under your pressure I almost descended into academic hair-splitting and even went into the STATISTICA package to look for a definition of "general population": the world-famous package does not consider this concept.

Outlier removal. I gave specifics of the effect of outliers on the fit; look closely. If you do not remove the outlier, you will get the wrong fit - that is the whole point - and quite possibly a fit with very good characteristics. If you change the outlier, either the parameters of the law or the law itself will change, all because of one or two values in the sample. After fitting, there is no need to clean the data.

It is still very desirable that you read the textbooks, and better still the documentation of the econometrics packages rather than mathematical statistics texts - it will improve your understanding. Any econometrics package starts with preparing the data for analysis, not the other way round (first fitting, then data preparation).

 
OK, I'll leave you to it. Any package is just a tool - nothing more. We seem to speak different languages :) I just didn't want readers to be misled, but I hope those who need it will understand.
 
-Alexey-:
:) I just didn't want the readers to be misled, but I hope those who need it will figure it out.

Ok, I'll leave you to it.

I don't think that is the right decision.

Any package is just a tool - nothing more. We seem to be speaking different languages.

Please forgive me for allowing myself to give you advice about packages. In theoretical arguments, such as theses and dissertations, your approach is quite valid and preferable to mine. But we have a practical problem, and a package is not just a tool, but a system in which many concepts are brought together to produce a practical result. We get not just an unambiguous interpretation of terms, but also their unambiguous computation. The author of the article mentioned GARCH, and that is a very loose concept, much less unambiguous than "general population".

I hope you will continue to participate in the topic.