Discussion of article "Statistical Estimations"

 

New article Statistical Estimations is published:

Estimation of the statistical parameters of a sequence is very important, since most mathematical models and methods rest on various assumptions: for example, normality of the distribution law, a particular dispersion (variance) value, or other parameters. Thus, when analyzing and forecasting time series, we need a simple and convenient tool that allows us to estimate the main statistical parameters quickly and clearly. The article briefly describes the simplest statistical parameters of a random sequence and several methods for its visual analysis. It offers an implementation of these methods in MQL5 and ways to visualize the results of the calculations using the Gnuplot application.

Author: Victor

 
For those who have seriously studied (or are studying) the joint movement of financial instruments (more than two) - MQL4 forum
  • www.mql5.com
 

"Eliminating outliers.


Before proceeding to the estimation of statistical parameters, it should be noted that the estimates may be insufficiently precise if the sample contains gross errors (outliers). The impact of outliers on the accuracy of estimates is particularly strong when the sample size is small. Outliers are values that deviate anomalously from the centre of the distribution. Such deviations can be caused by various kinds of unlikely events and by errors made while collecting the statistics or generating the sequence.

It is rather difficult to decide whether to filter outliers or not, because in most cases it is impossible to determine unambiguously whether a given value is an outlier or belongs to the process under consideration. If outliers are detected and a decision is made to filter them, the question arises of what to do with these erroneous values. The most logical option is simply to exclude them from the sample, which may increase the accuracy of the estimates of the statistical characteristics of the general population; but one should not forget that, when dealing with time series, samples must be excluded from the sequence with care."

It's better not to do it at all.

Yes, all data should be validated, and yes, validation should be automated.

But it is better to discard a data source than to manipulate the original data, manually or automatically.

In real life, accepting or excluding large risks on the basis of their "low probability" is the cause of many tragedies and disasters.

 

Victor, I have a question.

Do you think kurtosis can be less than 1?

If so, then

gs=(1.55+0.8*MathLog10((double)n/10.0)*MathSqrt(kurt-1))*MathSqrt(sum2/(n-1));

would be equal to -1. :-)
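A minimal sketch (in Python, not MQL5) of the threshold expression quoted above, assuming n is the sample size and sum2 is the sum of squared deviations from the mean. The clamp to kurt >= 1 is a hypothetical guard for illustration, not necessarily the correction the author made. Python's math.sqrt raises ValueError for a negative argument, so the unguarded branch returns NaN explicitly instead of calling it.

```python
import math


def outlier_threshold(n, kurt, sum2, clamp=True):
    """gs = (1.55 + 0.8*log10(n/10)*sqrt(kurt-1)) * sqrt(sum2/(n-1)).

    Assumption: sum2 is the sum of squared deviations from the mean.
    clamp=True is a hypothetical guard that raises kurt to at least 1,
    so the square root never receives a negative argument.
    """
    if clamp:
        kurt = max(kurt, 1.0)
    elif kurt < 1.0:
        # Without a guard the square root of (kurt - 1) is undefined.
        return math.nan
    return ((1.55 + 0.8 * math.log10(n / 10.0) * math.sqrt(kurt - 1.0))
            * math.sqrt(sum2 / (n - 1)))


print(outlier_threshold(100, 0.5, 99.0))               # 1.55
print(outlier_threshold(100, 0.5, 99.0, clamp=False))  # nan
```

With n = 100 and sum2 = 99 the standard deviation term is exactly 1, so the clamped result reduces to the constant 1.55 whenever kurt <= 1.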

Great article!

 
denkir:

Victor, I have a question.

Do you think kurtosis can be less than 1?

If so.

would be equal to -1. :-)

Great article!


Most likely, kurtosis theoretically cannot be less than one. Probably a value of one would be obtained for a sequence whose samples lie on a straight line, for example, 1, 2, 3, 4, 5.

Whether the algorithm used in the article can, due to errors, yield a kurtosis value less than one, I don't know. As mentioned at the end of the article, the behaviour of the coefficient calculation algorithm has not been investigated.
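For reference, a short Python check of the population (biased) kurtosis m4/m2². Since m4 >= m2² always holds, this estimate cannot drop below 1; it equals 1 only when every sample lies at the same distance from the mean, e.g. 1, -1, 1, -1. For the straight-line sequence 1, 2, 3, 4, 5 it gives 1.7. The function name is illustrative and not taken from the article.

```python
def population_kurtosis(x):
    """Biased (population) kurtosis m4 / m2**2 (not excess kurtosis)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant sequence")
    return m4 / m2 ** 2


print(population_kurtosis([1, 2, 3, 4, 5]))  # 1.7
print(population_kurtosis([1, -1, 1, -1]))   # 1.0
```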

 

Indeed, when unbiased estimates are computed, kurtosis can take a value less than one, for example, for the input sequence 4, 7, 13, 16.

Thank you for your remark. I will make the corrections.
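A sketch of how a bias-corrected estimate can fall below 1 for 4, 7, 13, 16. It assumes the common bias-corrected estimator G2 (the formula behind Excel's KURT) shifted by 3 to give non-excess kurtosis; the exact formula used in the article is not shown in this thread and may differ. The population estimate is included for comparison.

```python
def bias_corrected_kurtosis(x):
    """G2 (as in Excel's KURT) plus 3. Assumed, not the article's code."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 samples")
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)  # unbiased variance
    if s2 == 0:
        raise ValueError("kurtosis is undefined for a constant sequence")
    sum4 = sum((v - mean) ** 4 for v in x)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum4 / s2 ** 2
          - 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return g2 + 3.0


def population_kurtosis(x):
    """Biased (population) kurtosis m4 / m2**2, always >= 1."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


data = [4, 7, 13, 16]
print(bias_corrected_kurtosis(data))  # about -0.3, i.e. below 1
print(population_kurtosis(data))      # about 1.36
```

The correction terms grow large for small n (here n = 4), which is why the bias-corrected value can land far below the population one.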

 
The corrections have been made.