Outlier is not the same as an outlier in measurement

[Deleted] 2011.02.26 13:53 #61

faa1947:

Bravo! Great post, touches on many issues. But, some points can be criticised. For example, one of them - on what basis did you decide that you need to remove outliers? They should not be removed.

СанСаныч Фоменко 2011.02.26 20:19 #62

-Alexey-:
Bravo! Great post, touches on many issues. But, some points can be criticised. For example, one of them - on what basis did you decide that you need to remove outliers? You can't remove them.

Not all outliers are alike. You have to look at the quotes. If outliers are relatively rare, they should be clipped to a threshold (not deleted). If they are not rare, it is not clear what to do. In principle, outliers heavily distort statistics. Any statistics package provides for this and gives appropriate recommendations.
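
To make the difference between clipping and deleting concrete, here is a minimal sketch. The function name `clip_to_threshold`, the multiplier `k` and the sample numbers are illustrative assumptions, not something taken from the article or from any particular statistics package.

```python
from statistics import mean, stdev

def clip_to_threshold(values, k=3.0):
    """Clip values lying more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    lo, hi = m - k * s, m + k * s
    # Out-of-range points are pulled back to the boundary, not dropped,
    # so the series keeps its length and its alignment in time.
    return [min(max(v, lo), hi) for v in values]

# Hypothetical returns with one extreme value at the end.
returns = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.1, -0.15, 0.0, 5.0]
clipped = clip_to_threshold(returns, k=2.0)
print(len(returns), len(clipped))
print(max(returns), max(clipped))
```

Note that the extreme value itself inflates the standard deviation used for the threshold, which is one reason the choice of `k` cannot be made mechanically.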

[Deleted] 2011.02.27 01:53 #63

faa1947:
Not all outliers are alike. You have to look at the quotes. If outliers are relatively rare, they should be clipped to a threshold (not deleted). If they are not rare, it is not clear what to do. In principle, outliers heavily distort statistics. Any statistics package provides for this and gives appropriate recommendations.

As far as I know, outliers are removed in measurements when it is known in advance that the results are united by at least some law, i.e. in other words, when the process generating the measured value is non-random or random stationary, and the outlier can be caused by randomness (exceeding the limits of non-randomness or stationarity), and such randomness in this case is a distortion. If we deal with a series of prices, non-stationary, then randomness of any level is a part of statistics (besides the non-random part, but it is difficult to separate them), and removal of a part of statistics, respectively, is a distortion of statistics. I am closer to the idea that when working with a random non-stationary process we have no right to remove (cut) something. By the way, you did not answer what you think is the ultimate goal of, for example, trimming. Statistics packages are probably aimed at working with stationary series, and the recommendations to trim anomalous values are valid in this case.

If this is not the case, it is not clear what to do.

What do you mean by this?

СанСаныч Фоменко 2011.02.27 10:24 #64

-Alexey-:

As far as I know, outliers are removed when making measurements when it is known in advance that the results are united by at least some law, i.e. in other words, when the process generating the measured value is non-random or random stationary, and the outlier can be caused by randomness (exceeding the limits of non-randomness or stationarity), and such randomness in this case is a distortion. If we deal with a series of prices, non-stationary, then randomness of any level is a part of statistics (besides the non-random part, but it is difficult to separate them), and removal of a part of statistics, respectively, is a distortion of statistics. I am closer to the idea that when working with a random non-stationary process we have no right to remove (cut) something. By the way, you did not answer what you think is the ultimate goal of, for example, trimming. Statistical packages are probably aimed at working with stationary series, and recommendations to trim anomalous values in this case are valid.

What do you mean by that?

Even the ARIMA model deals with non-stationary series by reducing them to a stationary form.
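
As a rough illustration of what "reducing to a stationary form" means in ARIMA, the sketch below differences a simulated random walk. The helper name `difference` and the simulated data are assumptions made for the example; this is not the full ARIMA procedure, only its differencing step.

```python
import random

def difference(series, d=1):
    """Apply first differencing d times (the 'I' part of ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

random.seed(0)
steps = [random.gauss(0.0, 1.0) for _ in range(500)]

# Random walk: non-stationary, because its variance grows with time.
walk = [0.0]
for e in steps:
    walk.append(walk[-1] + e)

# One round of differencing recovers the stationary increments.
increments = difference(walk, d=1)
print(len(walk), len(increments))
```

Whether real quotes become stationary after differencing is exactly what is disputed in this thread; the sketch only shows the mechanics on a series where it is known to work.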

It seems to me that the problem of cutting quotes has two layers: superficial and deep.

On the surface there are problems, for example, of knocking down stops, which have nothing to do with the non-stationarity of the market.

The deeper problem of applying mathematical statistics and econometrics is that the initial data, the intermediate results and the conclusions all have to be checked by extra-mathematical, intuitive methods. The choice of the cut-off threshold (2, 3, 4 sigma or other) is possible only after visually inspecting the chart, and it belongs to the problem of choosing confidence intervals. The biggest problem of applying mathematical statistics is that it is inconceivable without the art of the statistician himself. No one will formulate a "cut - don't cut" rule. If you cut, you remove a characteristic of non-stationarity; if you do not cut, you distort the true distribution of the general population through an unlucky sample.

The heart of econometrics is hypothesis testing, where it is possible to make errors of the first and second kind: to reject the correct null hypothesis in favour of the incorrect alternative hypothesis, and to reject the correct alternative hypothesis in favour of the incorrect null hypothesis.

Given the above, I can agree and disagree with you at the same time. It is impossible to answer your question unequivocally without considering a specific sample in advance.

[Deleted] 2011.02.27 14:45 #65

Even the ARIMA model deals with non-stationary series by reducing them to a stationary form.

And even after that the model orders can change over time. Conclusion - a non-stationary data series has been fitted, on some segment, to a model (method) designed to work with a stationary series. Since this is so, it is necessary to somehow investigate how often the fitting has to be redone and how long it keeps working. Without that - how do you use an inappropriate model?

if you do not cut, you distort the true distribution of the general population through an unlucky sample

But here I think the reasoning is wrong. A non-stationary series has no general population, otherwise it would be a stationary series. And since that is so, there is no true distribution.

About knocking down stops - how do you know whether it happened or not? Of course, if you analyse the data of several dealing centres (DCs) near strong levels, even round ones, and see that some of them knocked down stops (for which you need to introduce a criterion), then I agree with you: some quasi-objective grounds for cutting appear. But establishing this is a whole job, a big piece of research.

The deeper problem of applying mathematical statistics and econometrics is that the initial data, the intermediate results and the conclusions all have to be checked by extra-mathematical, intuitive methods. The choice of the cut-off threshold (2, 3, 4 sigma or other) is possible only after visually inspecting the chart, and it belongs to the problem of choosing confidence intervals. The biggest problem of applying mathematical statistics is that it is inconceivable without the art of the statistician himself.

That's not what I would call it. The art, or maybe just the level of training, of a statistician is determined by how well he can estimate the limits of applicability, to a non-stationary series, of methods designed for a stationary one. And estimate them not intuitively but quantitatively (numerically).

СанСаныч Фоменко 2011.02.27 18:01 #66

-Alexey-:

And even after that the model orders can change over time. Conclusion - a non-stationary data series has been fitted, on some segment, to a model (method) designed to work with a stationary series.

Standard reasoning in TA: a non-stationary series is a sum of stationary segments with different characteristics. If we take Matlab's toolbox, this issue is not considered at all: it is assumed that the time series deviates from a normally distributed one in several ways, and then we fight these deviations. Not all of them can be dealt with.

Since this is so, it is necessary to somehow investigate how often the fitting has to be redone and how long it keeps working. Without that - how do you use an inappropriate model?

There is no such problem. There are two kinds of forecast: one step ahead (for the next candle) and many steps ahead.

But here I think the reasoning is wrong. A non-stationary series has no general population, otherwise it would be a stationary series. And since that is so, there is no true distribution.

I disagree in principle. Stationarity is a characteristic of the series, not the size of the population.

About knocking down a stop - how do you know if it was or not? Of course, if you analyse the data of several DCs, near strong levels, even round ones, and see that some of them blew off the stop (for which you need to introduce a criterion), then I agree with you, some quasi-objective grounds for cutting appear. But - to establish this is a whole job, a lot of research.

Knocking down a stop is just an example. When reviewing quotes, we have to decide what we're going to take as quotes, and what drops out for reasons unknown to us.

That's not what I would call it. The art, or maybe just the level of training, of a statistician is determined by how well he can estimate the limits of applicability, to a non-stationary series, of methods designed for a stationary one. And estimate them not intuitively but quantitatively (numerically).

I disagree. It is impossible to reduce a series to a completely stationary one. That is where errors in testing hypotheses come from.

[Deleted] 2011.02.27 19:33 #67

faa1947:

Standard reasoning in TA: a non-stationary series is a sum of stationary segments with different characteristics.

Is there a basis for this reasoning? After all, the series obtained by the mentioned sum can be obtained without it - by chance - and it can also be obtained as another sum of other segments with other laws. And since this is so, what are we to do (which one is true)?

If we take Matlab's toolbox, this issue is not considered at all: it is assumed that the time series deviates from a normally distributed one in several ways, and then we fight these deviations.

On the basis of what is this considered?

There is no such problem. There are two kinds of forecast: one step ahead (for the next candle) and many steps ahead.

What does this have to do with the fact that a model with different parameters may be more optimal at the next steps?

I disagree in principle. Stationarity is a characteristic of the series, not the size of the population.

This is not clear - can you describe in more detail what you mean? A true distribution characterises the properties of a series, but a non-stationary series is by definition one in which those properties change. Therefore it has no single true distribution belonging to a general population. A non-stationary series has a true distribution only at a given moment of time and for a finite number of candlesticks.
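
One crude way to see properties of a series changing over time is to compare the mean and variance across consecutive windows. The sketch below is only an illustration under assumed names (`window_stats`) and an artificial series; it is not a formal stationarity test.

```python
from statistics import mean, pvariance

def window_stats(series, window):
    """Mean and variance over consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats

# Artificial series: quiet first half, volatile second half.
series = [1.0, -1.0] * 50 + [5.0, -5.0] * 50
stats = window_stats(series, window=50)
for m, v in stats:
    print(f"mean={m:.2f} variance={v:.2f}")
```

If the per-window figures differ sharply, as they do for the second half here, a single distribution fitted to the whole sample describes neither regime well.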

Knocking down a stop is just an example. When reviewing quotes, we need to decide what we will take as quotes, and what falls out for reasons unknown to us.

And on what basis can you decide something if the reasons are unknown?

I disagree. It is impossible to reduce a series to a completely stationary one. That is where errors in testing hypotheses come from.

Now I disagree in principle. With the wording itself. How can a series whose characteristics change randomly be made stationary? I.e., this approach is not grounded in anything, so what hypotheses can we talk about?

СанСаныч Фоменко 2011.02.27 20:45 #68

-Alexey-:

Is there any basis for this reasoning? After all, the series obtained by the mentioned sum can be obtained without it - by chance - and it can also be obtained as another sum of other segments with other laws. And since this is so, what are we to do (which one is true)?

In my post I argued that there is no basis for it. In TA there is simply no other way.

On what basis is that considered to be the case?

It's not my opinion - that's what all of mathematical statistics is based on.

What does that have to do with the fact that a model with different parameters may be more optimal in the next steps?

There is no such thing as "optimal". Either there is a fit at some confidence level or there is not. If there is a fit, there is a prediction.

It's not clear about this - can you describe in more detail what you mean?

The number of random values in the time series is not involved in determining stationarity.

And on what basis can anything be decided if the causes are unknown?

This is the standard for random processes. If the causes are known, then most likely a deterministic process.

Now I disagree in principle. With the wording itself. How can a series whose characteristics change randomly be made stationary? I.e. this approach is not grounded in anything, what hypotheses can we talk about?

GARCH is a model with changing volatility, for example.
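
For readers unfamiliar with it, the standard GARCH(1,1) recursion for the conditional variance can be sketched as below. The function name, the parameter values and the sample residuals are assumptions chosen for illustration; parameters are not estimated here, only the recursion is shown.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]
    Returns len(residuals) + 1 values; the last is the one-step-ahead forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Start from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals with one shock in the middle.
residuals = [0.0, 0.0, 3.0, 0.0]
path = garch11_variance(residuals, omega=0.1, alpha=0.1, beta=0.8)
print([round(v, 4) for v in path])
```

The conditional variance jumps after the shock and then decays, while the long-run variance `omega / (1 - alpha - beta)` stays fixed as long as `alpha + beta < 1`.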

We have a two-person discussion and it has become too abstract. Even the topic starter is not participating. I would like some consistency in the discussion and in the development of the article under discussion. For example, as a first step, to consider in detail, on a concrete example, the preliminary analysis of the data and its preparation for modelling. For example:

1. Justification of the sample size.

2. Justification of the need for data transformation.

3. Choosing how to transform the data:

- dealing with outliers and missing data.

- Data transformation - removal of trends, cyclicality

4. Determination of trend types and their accounting

5. Fitting the distribution to the transformed data.

6. Analysis for stationarity of transformed data.

7. Accounting for heteroscedasticity

That's enough for now. A different plan is quite acceptable. I would like to organise a systematic presentation of the problem of preparing quotes for the modelling described in the article under discussion.

Denis Kirichenko 2011.02.27 20:53 #69

The topic starter [gosh, I have never been called that before] is in a bit of a creative crisis :-))))

But he's following the discussion....

He is grateful to faa1947 for making constructive comments...

-Alexey-, I would recommend that you study the fundamentals ...

I will consider all comments, later I will present my counterarguments and arguments...

[Deleted] 2011.02.27 21:20 #70

denkir:

-Alexey-, I would recommend that you study the fundamentals....

Which sections can you recommend? For each of my statements (including the questions) I can provide a reference to the fundamentals.

P.S. And what moment is the creative crisis connected with, if it's not a secret? :)

Discussion of article "Econometric Approach to Analysis of Charts" - page 7