Bernoulli, de Moivre-Laplace theorem; Kolmogorov criterion; Bernoulli scheme; Bayes formula; Chebyshev inequalities; Poisson distribution law; Fisher, Pearson, Student, Smirnov etc. theorems, models, simple language, without formulas. - page 7

 
sever31: What are "tails" in a distribution? Are they outliers that are clearly knocked out of the general pattern of the distribution?

Well, roughly, but not quite. Yes, we are talking about values of a random variable that are very different from its mean.

Tails are usually described as thick or thin. Here is a very loose definition of a tail: it is the probability that a deviation exceeds a given value.

The thickness of the tail is not determined by the magnitude of the outlier itself, i.e. the deviation from the mean, but by the probability of such strong deviations. The higher it is, the thicker the tail is.

A normal distribution is generally considered to have thin tails. I don't know of any practical distribution whose tails are thinner than those of the normal distribution.

And now for an even more precise definition of tails. But first, a picture and a little introduction:

This is the well-known picture of a bell, i.e. a Gaussian distribution. The curve drawn here is the density function of the distribution (here a normal distribution). Along the bottom are the sigmas, i.e. standard deviations. Sigma is a measure of how narrow or wide any distribution is.

The area under any probability density function (pdf in the English-language literature) is always 1.

Any pdf is non-negative. This actually reflects the fact that probability is always non-negative.

If we want to find the probability of a random variable being between sigma and two sigmas (to the right of the mean), it is sufficient to find the area under the curve bounded by the vertical lines "+ sigma" and "+ 2*sigma". Let us denote it as follows: P( sigma <= X < 2*sigma). Keep in mind that even at +1000*sigma this function is still not equal to zero. Yes, it decreases very quickly (like mathExp(-x^2)), but it does not become zero.
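As a rough numerical illustration of that area (a sketch only, assuming a normal distribution and measuring X from the mean in units of sigma; the helper name normal_cdf is mine, not from any library):

```python
import math

def normal_cdf(z):
    """Probability that a standard normal variable is below z (z is in sigmas)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area under the bell between +1 sigma and +2 sigma to the right of the mean.
p_between = normal_cdf(2.0) - normal_cdf(1.0)
print(f"P(sigma <= X < 2*sigma) = {p_between:.4f}")  # roughly 0.136
```

So about 13.6% of all values of a normally distributed variable fall into that strip.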

Now back to the tails. The right tail is the function right_tail( X; X0 ) = P( X0 <= X < infinity ). Please note again that the tail is a function of X0. The larger X0 (the further to the right), the smaller the function usually is. I.e. usually (not always, but asymptotically always) it is a decreasing function of X0 and tends to zero.

For the normal distribution right_tail_normal( X; X0 ) ~ mathExp(-X0^2) or something comparable (can't remember, it's a non-elementary function).

But for the Laplace distribution (see picture in my previous post):


right_tail_laplace( X; X0 ) ~ mathExp(-a*X0). Note: this is already another function, one that tends to zero much more slowly than the tail of the normal distribution!

And here is another one - the Cauchy distribution:


For it, right_tail_cauchy( X; X0 ) ~ 1 / X0. This function tends to zero even more slowly as X0 increases.

We have seen three different right_tail( X; X0 ) functions. The real difference between the tails of different pdfs is the rate at which this function decreases. For the normal distribution the function decreases very fast (thin tail), for the Laplace distribution it decreases quite fast but infinitely more slowly than the first one (already a thick tail), and for Cauchy it decreases infinitely more slowly than both of the first two (a creepy fat tail).
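To put numbers on the three tails, here is a minimal sketch. It assumes the standard forms of each distribution (normal with sigma = 1, Laplace with scale b = 1, Cauchy with scale 1, all centred at zero); these parameter choices and the function names are mine, picked only for illustration.

```python
import math

def right_tail_normal(x0):
    """P(X >= x0) for a standard normal variable."""
    return 0.5 * math.erfc(x0 / math.sqrt(2.0))

def right_tail_laplace(x0, b=1.0):
    """P(X >= x0) for a zero-centred Laplace variable with scale b (valid for x0 >= 0)."""
    return 0.5 * math.exp(-x0 / b)

def right_tail_cauchy(x0):
    """P(X >= x0) for a standard Cauchy variable."""
    return 0.5 - math.atan(x0) / math.pi

# Compare how fast each tail shrinks as the threshold X0 moves to the right.
for x0 in (2.0, 5.0, 10.0):
    print(f"X0={x0:5.1f}  normal={right_tail_normal(x0):.3e}  "
          f"laplace={right_tail_laplace(x0):.3e}  cauchy={right_tail_cauchy(x0):.3e}")
```

At X0 = 10 the normal tail is already astronomically small, the Laplace tail is of the order of 1e-5, and the Cauchy tail is still about 3%.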

 
Mathemat:

Not a good idea to illustrate a normal distribution. I'm not sure that stopping the process at, say, 10,000 will give exactly a normal distribution in the cross section. Also, this distribution has parameters that are constantly changing.

Please elaborate on this point if possible. Frankly speaking, I don't understand why the bell that appears is not normal. The point is that each line is the trajectory of a wandering particle, all particles have the same binomial increment process and a finite and equal number of steps, therefore any aggregate process has identical aggregate properties. How can the parameters change?
 
C-4:
Please elaborate on this point if you can. To be honest, I don't understand why the bell that's being drawn is not normal. The point is that each line is the trajectory of a wandering particle, all particles have the same binomial increment process and a finite and equal number of steps. How can the parameters change?

Of course, but I would like some details from you first.

1. "all particles have the same binomial increment process" - explain what this means. This is the first time I have heard of such a process. What is the distribution function of the increments?

2. "hence any aggregate process has identical aggregate properties" - well that too is completely incomprehensible and not at all mathematical.

If you make a "cross section" of this entire set of trajectories at abscissa, say 10000, then each trajectory will mark a point there. How can you be sure that all these points are distributed exactly according to the normal law?

 
Mathemat:

If you make a "cross section" of this entire set of trajectories at abscissa, say 10000, then each trajectory will mark a point there. How can you be sure that all these points are distributed exactly according to the normal law?


The central limit theorem. The random variable in question is the sum of a large number (10000) of independent random variables which means that its distribution is close to the normal distribution.
 

1. "all particles have the same binomial increment process" - explain what this means. This is the first time I have heard of such a process. What is the distribution function of the increments?

Maybe I didn't put it accurately. I meant this, which in turn comes from the accumulation of discrete random variables: -1 and +1.

If you make a "cross section" of this entire set of trajectories at abscissa, say 10000, then each trajectory will mark a point there. How can you be sure that all these points are distributed exactly according to the normal law?

Now I don't understand at all how these points can be distributed non-normally if every trajectory has the same standard deviation (RMS) and the same number of steps, 10,000. We need to set up an experiment and plot a histogram of the hits; I bet it will be normal, with the top of the bell at zero.
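Such an experiment can be sketched as follows. This is only an illustration under my own assumptions: increments of -1 and +1 with equal probability, 10,000 steps per trajectory, 5,000 trajectories and a fixed seed. Only the end point of each trajectory (the "cross section") is kept.

```python
import math
import random
import statistics

N_STEPS = 10_000   # steps per trajectory, as in the discussion
N_PATHS = 5_000    # number of trajectories (an arbitrary choice)

rng = random.Random(42)  # fixed seed so the run is repeatable

def endpoint(n_steps):
    """End point of a walk of n_steps increments, each -1 or +1 with probability 1/2."""
    # Every random bit is one fair coin flip: a 1 bit is a +1 step, a 0 bit is a -1 step.
    ups = bin(rng.getrandbits(n_steps)).count("1")
    return 2 * ups - n_steps

endpoints = [endpoint(N_STEPS) for _ in range(N_PATHS)]

sigma_theory = math.sqrt(N_STEPS)  # each step has variance 1, so sigma = sqrt(n) = 100
mean = statistics.fmean(endpoints)
std = statistics.pstdev(endpoints)
within_one_sigma = sum(abs(x) <= sigma_theory for x in endpoints) / N_PATHS

print(f"mean = {mean:.2f} (theory 0)")
print(f"std  = {std:.2f} (theory {sigma_theory:.0f})")
print(f"share within one sigma = {within_one_sigma:.3f} (normal law gives about 0.68)")
```

The sample mean comes out near zero, the standard deviation near 100, and roughly two thirds of the end points fall within one sigma, which is what the central limit theorem leads one to expect. It shows closeness to the normal law, not exact normality.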

 

You convinced me, Avals.

I was picking on you, C-4. I still don't understand about the "binomial increment process", though. Well, let's assume you meant increments distributed according to some law with finite mean and variance.

 
As part of my research I needed to generate a random stock chart of the OHLC type. With returns everything is simple: we generate random numbers with a specified mean and variance (Excel can do this), but how to build OHLC charts from these returns is the question. The difficulty lies in defining a normal range of High and Low relative to Open and Close. That is why I am asking the experts how to construct OHLC correctly from returns. Of course, one can randomly generate each tick and "assemble" OHLC candles from the tick history, but this is a very slow and pointless method.
 
C-4: It is of course possible to randomly generate each tick and "assemble" OHLC candles from the tick history, but this is a very slow and pointless method.
But it is very accurate, because it does not require introducing several arbitrary parameters. It does not, however, remove the need to know the statistical characteristics of the tick process :). And that process differs from a Wiener process in some ways. For example, it is considerably more mean-reverting than the standard Wiener process.
 

Yes, very accurate indeed. But speed is the problem. I'm writing in C# + WealthLab, and it's quite a cumbersome combination. I tried to generate 100 bars with 3000 ticks each and it ended up taking 8-10 seconds. I need to generate at least 500 000 bars, and preferably 3-4 million (about 10 years of one-minute history).

It seems that the input to the formula should be the variance, the mean and the number of ticks, and the output should be an OHLC bar. It looks like this.

Let's simplify the task for the first approximation: let's generate fully "normal" OHLC. Let it be a classical normal distribution. Another thing is that afterwards we would like to generate a distribution based on this formula that approximates real market ones - for example, take the real volatility of instruments and generate a random OHLC based on it.
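For reference, the tick-by-tick variant of such a "normal" generator can be sketched like this. It is not the closed formula being asked for, only the slow but parameter-free approach discussed above, and it rests on my own assumptions: independent Gaussian tick increments, additive prices (think of them as log-prices or offsets), and each bar opening at the previous close. The function name generate_ohlc and its parameters are hypothetical.

```python
import random

def generate_ohlc(n_bars, ticks_per_bar, tick_mean=0.0, tick_sigma=1.0,
                  start=0.0, seed=None):
    """Build (open, high, low, close) bars from simulated Gaussian tick increments."""
    rng = random.Random(seed)
    bars = []
    price = start
    for _ in range(n_bars):
        # The bar opens where the previous one closed.
        open_ = high = low = price
        for _ in range(ticks_per_bar):
            price += rng.gauss(tick_mean, tick_sigma)  # one simulated tick
            if price > high:
                high = price
            elif price < low:
                low = price
        bars.append((open_, high, low, price))  # the last tick is the close
    return bars

# Small demonstration: 5 bars of 100 ticks each.
for o, h, l, c in generate_ohlc(5, 100, seed=1):
    print(f"O={o:8.2f}  H={h:8.2f}  L={l:8.2f}  C={c:8.2f}")
```

High and Low come out of the simulated path itself, so no extra range parameters are needed. The cost is one random number per tick, which is exactly the speed problem described above.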

 
Do as you like. I can't advise you, as I don't know the characteristics of the tick process.