No universal methods for removing outliers

Denis Kirichenko 2011.02.27 23:07 #71

faa1947:

As it seems to me, the topic starter tried to settle the question with a single dashing sabre stroke, yet econometric packages offer many more models than GARCH. Choosing a model and then choosing its parameters is the middle of the road, not the beginning of it...

Yes, more. GARCH was taken for illustrative purposes as an example. It was not evaluated at all. Much less other models. This has been said several times.

In previous posts there was criticism about analysis based on differences. I think this criticism arose because the author missed the stage of preparation of initial data.

According to the author of the article, non-stationarity is the only evil of the market. It is not.

It isn't. The other evils are "fat tails", volatility clustering and leverage effects....

1. We need to decide on the number of candlesticks in the sample. Does the number of candles in the sample depend on the timeframe? According to the literature, 50 candlesticks should be enough.

And I have seen data suggesting that a disadvantage of non-linear models is the need for a large sample... about 1000 observations.

2. Let's try to fit a distribution to our sample. Preferably a normal distribution. The immediate question will be the number of racks the graph is plotted against. Where did you get the number of racks on which the graph is plotted?

It will not be normal. I will write about this later in the article about distributions. It's coming out soon.

Please clarify the meaning of the term "rack".

Denis Kirichenko 2011.02.27 23:12 #72

faa1947:
- presence of outliers: outliers, i.e. quotes beyond some threshold (for example, 3 sigma), should be replaced with the threshold value. Bulashev has a different opinion about the size of the threshold.

- check by Fourier or ACF for cycles, just in case. Due to the limited sample and the properties of the market itself, there are most likely no cycles.

- solve the problem of trends. I cannot agree with the author: detrending by subtracting the mean (mathematical expectation) is a serious simplification of the problem. The logarithm is taken for an exponential trend, while for an additive trend the first differences are sufficient. The trend will have to be dealt with separately, and regressions of every variety will be needed. You have to subtract the regression, not the mean. This applies to deterministic trends, but there are also stochastic trends.

Without addressing these issues, reasoning about the statistical characteristics of the sample has no basis...

I agree that we need to work on the sample. That is already a matter of mathematical statistics...

There are no universal methods for removing outliers...

That's why the sample size should be large.

About trends. I haven't investigated the issue. I'll keep it in mind.
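To make the three detrending options mentioned in the quote concrete, here is a minimal sketch in plain Python. The function names and the toy series are my own assumptions for illustration; none of this is taken from the article or from any particular package. It only covers deterministic trends, and the regression is the simplest case, a straight line fitted by ordinary least squares.

```python
import math

def first_differences(series):
    """First differences: enough to remove an additive (linear) trend."""
    return [b - a for a, b in zip(series, series[1:])]

def log_differences(series):
    """Differences of logarithms: for an exponential trend (prices must be > 0)."""
    return first_differences([math.log(p) for p in series])

def subtract_linear_regression(series):
    """Residuals after subtracting an OLS line fitted against the bar index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]

# Hypothetical toy series, not real quotes.
additive = [100 + 2 * t for t in range(10)]          # grows by 2 per bar
exponential = [100 * 1.01 ** t for t in range(10)]   # grows by 1% per bar

diffs = first_differences(additive)
log_diffs = log_differences(exponential)
residuals = subtract_linear_regression(additive)
```

On these toy series the first differences of the additive series are constant, the log differences of the exponential series are constant, and the regression residuals of the additive series are zero up to rounding. Real quotes will not behave this neatly, which is the point of the quoted objection.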

Denis Kirichenko 2011.02.27 23:17 #73

-Alexey-:
Bravo! An excellent post that touches on many questions. But some points are open to criticism. For example, one of them: on what basis did you decide that outliers need to be removed? They must not be removed.

I think you can. They should not affect the statistical parameters of the sample in any way (in particular, the distribution parameters). That's why they are outliers.

Denis Kirichenko 2011.02.27 23:24 #74

-Alexey-:

As far as I know, outliers are removed when making measurements when it is known in advance that the results are united by at least some law, i.e. in other words, when the process generating the measured value is non-random or random stationary, and the outlier can be caused by randomness (exceeding the limits of non-randomness or stationarity), and such randomness in this case is a distortion. If we deal with a series of prices, non-stationary, then randomness of any level is a part of statistics (besides the non-random part, but it is difficult to separate them), and removal of a part of statistics, respectively, is a distortion of statistics. I am closer to the idea that when working with a random non-stationary process we have no right to remove (cut) something.

I don't know. Where did you get this idea from? Usually an observation is considered an outlier when it goes beyond some value of a statistical criterion. By the way, the procedure is described by Bulashev. That is, stationarity or non-stationarity is not a criterion for whether to apply the outlier removal procedure.

Denis Kirichenko 2011.02.27 23:29 #75

faa1947:
Not all outliers are alike. You have to look through the quotes. If an outlier is a relatively rare event, it should be clipped to the threshold (not removed). If that is not the case, it is unclear what to do. In principle, outliers strongly distort the statistics. Any statistics package provides this capability and gives corresponding recommendations.

I'd rather agree. But we need to understand clearly on what basis we remove a given outlier. I mean, we need to build a system of rules... for statistical processing.
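As an illustration of the difference between clipping an outlier to the threshold and deleting it, here is a minimal sketch in plain Python. The function names, the default of 3 sigma and the toy data are assumptions made for the example; the threshold itself is exactly the judgment call discussed in this thread, and this is not the procedure from Bulashev.

```python
import statistics

def clip_to_threshold(values, k=3.0):
    """Replace values farther than k sample standard deviations from the
    mean with the threshold itself; the sample size is unchanged."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lo, hi = mean - k * sd, mean + k * sd
    return [min(max(v, lo), hi) for v in values]

def remove_beyond_threshold(values, k=3.0):
    """Drop values farther than k sample standard deviations from the mean;
    the sample becomes shorter."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= k * sd]

# Hypothetical increments with one obvious spike at the end.
increments = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05, 0.2,
              -0.15, 0.0, 0.1, -0.1, 0.05, 5.0]

clipped = clip_to_threshold(increments)
removed = remove_beyond_threshold(increments)
```

Note that the mean and standard deviation here are computed from the sample that still contains the spike, so the spike inflates its own threshold. In a small sample a single outlier may never exceed 3 sigma at all, which is one more argument for a large sample.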

Denis Kirichenko 2011.02.27 23:31 #76

faa1947:

The deeper problem of applying mathematical statistics and econometrics is that the initial data, the intermediate results and the conclusions all have to be checked by extra-mathematical, intuitive methods. Selection of the cut-off threshold (2, 3, 4 sigma or other) is possible only after visually inspecting the graph and belongs to the problem of selecting confidence intervals. The biggest problem of applying mathematical statistics is that its application is inconceivable without the art of the statistician himself. No one will formulate a "cut / don't cut" rule. If you cut, you remove a characteristic of non-stationarity; if you do not cut, you distort the true distribution of the general population through an unlucky sample.

The heart of econometrics is hypothesis testing, where it is possible to make errors of the first and second kind: to reject the correct null hypothesis in favour of the incorrect alternative hypothesis, and to reject the correct alternative hypothesis in favour of the incorrect null hypothesis.

Given the above, I can agree and disagree with you at the same time. It is impossible to answer your question unequivocally without considering a specific sample in advance.

I agree. There is subjectivity. But I stand by having the hypothesis evaluated using mathematical statistics...

Denis Kirichenko 2011.02.27 23:58 #77

-Alexey-:

What sections can you recommend? For each of my statements (including the questions) I can provide a reference to the underlying theory.

Yes? In my opinion, the questions you put to faa1947 suggest that you are not familiar with the subject.

For example, a statistical distribution is a characteristic of variation. Stationarity is a temporal one...

Here is your gem:

A true distribution characterises the properties of a series, but a non-stationary one is by definition one in which they change.

Then about model parameters and fitting... When the model parameters are set, nothing is fitted...

Denis Kirichenko 2011.02.28 00:13 #78

faa1947:

Even the topic starter is not participating. I would like some consistency in the discussion and in the development of the article under discussion. For example, as a first step, to consider in detail, on a concrete example, the preliminary analysis of the data and its preparation for modelling. For example:

1. Justification of the sample size.

2. Justification of the need for data transformation.

3. Choosing how to transform the data:

- dealing with outliers and missing data.

- Data transformation - removal of trends, cyclicality

4. Determination of trend types and their accounting

5. Fitting the distribution to the transformed data.

6. Analysis for stationarity of transformed data.

7. Accounting for heteroscedasticity

The topic starter is a bit dumbfounded by the interest shown :-))))

Now that's real constructive criticism, imho. Many thanks to colleague faa1947. I'll take some time out.... I'll try to post my thoughts later.... but in general, I agree with the proposed procedural list....

--- 2011.02.28 00:34 #79

denkir:

I'm going to take some time out.....

thank you.

Denis Kirichenko 2011.02.28 00:48 #80

sergeev:
thank you.

What do you mean? :-)

It's better to keep quiet, it's more useful? Did you mean something like that from the classics?

- When you speak, Ivan Vasilyevich, it seems as if you are delirious.

Discussion of article "Econometric Approach to Analysis of Charts" - page 8