Bernoulli, de Moivre-Laplace theorem; Kolmogorov criterion; Bernoulli scheme; Bayes formula; Chebyshev inequalities; Poisson distribution law; Fisher, Pearson, Student, Smirnov etc. theorems, models, simple language, without formulas. - page 6

 
sergeyas:

Let's listen to Alexei's presentation first, since he was the first to start.

Yusuf and everyone else, please don't take this as belittling your knowledge of the subject.

Instead of proceeding step by step, you start piling on extra terminology and getting ahead of yourselves.



It's a trader's disease: the fear of not getting to push the button in time. I'm like that myself.
 

The concept of a normal distribution from Chapter 9 of Bollinger on Bollinger Bands

 
 

This thread promises to be a good reservoir of knowledge.

A long time ago I decided to obtain the normal distribution in practice, so I ran a numerical experiment. I took 500 cumulative series of 10,000 independent trials each, which gives 500 random, unrelated graphs. Give them all the same starting point and watch how they diverge over time, or more precisely, as the number of trials grows. Their divergence obeys the normal distribution law, and taken together they form the bell of the normal distribution:

Interestingly, the typical divergence (the standard deviation of the series' positions) equals the square root of the number of trials. Thus after 1,000 trials we can expect a series to be about 32 points away from its starting zero position, while after 10,000 trials it will be only about 100 points away. You can see this in the shape of the bell: at first it spreads out quite sharply, and then the "speed" of divergence decreases.

An interesting fact is that the sum of all 500 series, no matter how many trials they contain, will be approximately zero. The picture illustrates this well: after 10,000 trials about 50% of the series were above zero and about 50% were below zero. Thus the average state, or expected value, of all the systems tends towards zero.
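The experiment described above can be reproduced with a short sketch. It rests on assumptions that the post does not state: each trial is taken to be a fair +1/-1 coin flip, and the function names (`walk_endpoint`, `simulate`) and the seed are made up for the illustration. It reports the root-mean-square distance from zero, the mean absolute distance, and the average position of the 500 series at 1,000 and 10,000 trials. For such a walk the RMS distance is sqrt(N), while the mean absolute distance is somewhat smaller, about sqrt(2N/pi), i.e. roughly 0.8*sqrt(N).

```python
import math
import random


def walk_endpoint(n_steps, rng):
    """Net displacement after n_steps fair +1/-1 increments (assumed model)."""
    # Each random bit is one coin flip; count the +1 outcomes.
    heads = bin(rng.getrandbits(n_steps)).count("1")
    return 2 * heads - n_steps


def simulate(n_series=500, checkpoints=(1_000, 10_000), seed=1):
    """Return {n_trials: (rms, mean_abs, mean)} over n_series independent walks."""
    rng = random.Random(seed)
    positions = [0] * n_series
    done = 0
    stats = {}
    for n in checkpoints:
        # Extend every walk from the previous checkpoint up to n trials.
        for i in range(n_series):
            positions[i] += walk_endpoint(n - done, rng)
        done = n
        rms = math.sqrt(sum(p * p for p in positions) / n_series)
        mean_abs = sum(abs(p) for p in positions) / n_series
        mean = sum(positions) / n_series
        stats[n] = (rms, mean_abs, mean)
    return stats


if __name__ == "__main__":
    for n, (rms, mean_abs, mean) in simulate().items():
        print(f"N={n}: RMS={rms:.1f}, sqrt(N)={math.sqrt(n):.1f}, "
              f"mean|S|={mean_abs:.1f}, mean={mean:.1f}")
```

Changing `seed` gives a different set of 500 graphs; the RMS figure should stay close to sqrt(N), while the average position fluctuates around zero.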

I therefore have a question for the experts: how do you calculate the deviation of the actual mean from the theoretical expected value of zero? Of course, there is no reason to expect the sum of all trials to be exactly 0. It may be +3 or -20 or so. And a second sub-question: will this error shrink to zero as the number of trials increases, or will it "freeze" at a level proportional to the square root of the number of trials?

 
C-4:

how do you calculate the deviation of the actual mean from the theoretical expected value of zero? Of course, there is no reason to expect the sum of all trials to be exactly 0. It may be +3 or -20 or so. And a second sub-question: will this error shrink to zero as the number of trials increases, or will it "freeze" at a level proportional to the square root of the number of trials?


A random walk (RW) is a sum of independent random variables. Let the increments be normally distributed with mean 0 and standard deviation X. Then the sum of N increments is also normally distributed, with mean 0 and standard deviation SQRT(N)*X, which is what you have in the figure (N there is 10,000).

If we take the sum of M such independent random walks, it will also be normally distributed with mean 0 and standard deviation SQRT(M*N)*X.

So as the number of trials increases, the sum will neither freeze nor tend to zero: its spread will grow in proportion to the square root of the number of trials. But the arithmetic mean (the sum divided by the number of trials) will converge to zero as the number of trials increases, by the Bernoulli theorem already discussed.
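The two answers can be written out as a short derivation. The symbols are introduced here only for illustration: x_i are the independent increments with mean 0 and standard deviation X, S_N is their sum over N trials, and S^(j) are M independent copies of S_N.

```latex
\begin{aligned}
S_N &= \sum_{i=1}^{N} x_i, \qquad \mathbb{E}[x_i] = 0, \quad \operatorname{Var}(x_i) = X^2, \\
\operatorname{Var}(S_N) &= \sum_{i=1}^{N} \operatorname{Var}(x_i) = N X^2
  \;\Longrightarrow\; \sigma(S_N) = \sqrt{N}\,X, \\
\sigma\!\left(\sum_{j=1}^{M} S^{(j)}\right) &= \sqrt{M N}\,X, \\
\sigma\!\left(\frac{S_N}{N}\right) &= \frac{\sqrt{N}\,X}{N} = \frac{X}{\sqrt{N}}
  \xrightarrow[N \to \infty]{} 0 .
\end{aligned}
```

So the error of the sum grows like the square root of the number of trials, while the error of the mean shrinks like one over that square root.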

 
What are "tails" in a distribution? Are they outliers that clearly fall outside the distribution?
 

If we take the sum of M such independent random walks, it will also be normally distributed with mean 0 and standard deviation SQRT(M*N)*X

OK, let me try to solve the problem: 10 cumulative series of 10,000 trials each are given. The final results of the series are as follows:

Series   Result
1        145
2        -32
3        -80
4        25
5        -172
6        102
7        78
9        -121
10       95
Total    40

The sum of the M independent random walks is +40. Substituting it into the formula gives SQRT(40*10,000) * 100 = 63,245. That result makes no sense. I must have misunderstood what "sum of M" means.

Or does it mean that all the experiments should be lined up one after another, and the deviation of the final result from the expected value analyzed?

 
C-4: A long time ago I decided to obtain the normal distribution in practice, so I ran a numerical experiment. I took 500 cumulative series of 10,000 independent trials each, which gives 500 random, unrelated graphs. Give them all the same starting point and watch how they diverge over time, or more precisely, as the number of trials grows. Their divergence obeys the normal distribution law, and taken together they form the bell of the normal distribution:

Not a good way to illustrate a normal distribution. I'm not sure that stopping the process at, say, 10,000 will give exactly a normal distribution in the cross section. Besides, the parameters of this distribution are constantly changing.

If I'm wrong, give me a link where it is claimed that the distribution of the "cross section" (i.e. the divergences from zero) is at least asymptotically normal.
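For the special case of a fair coin, asymptotic normality of the cross section is what the de Moivre-Laplace theorem from the thread title states. The sketch below is one way to check it numerically, under the assumption (not confirmed for the original experiment) that each trial is a fair +1/-1 step. It computes the exact binomial probability that the walk ends within k standard deviations of zero and compares it with the normal value; the function names are hypothetical.

```python
import math


def exact_prob_within(n_steps, k_sigma):
    """Exact P(|S_n| <= k_sigma * sqrt(n_steps)) for a fair +1/-1 walk."""
    bound = k_sigma * math.sqrt(n_steps)  # sigma of S_n is sqrt(n_steps)
    favourable = sum(
        math.comb(n_steps, heads)
        for heads in range(n_steps + 1)
        if abs(2 * heads - n_steps) <= bound
    )
    return favourable / 2 ** n_steps


def normal_prob_within(k_sigma):
    """P(|Z| <= k_sigma) for a standard normal Z."""
    return math.erf(k_sigma / math.sqrt(2))


if __name__ == "__main__":
    for n in (100, 10_000):
        for k in (1, 2, 3):
            print(f"N={n}, within {k} sigma: "
                  f"exact={exact_prob_within(n, k):.4f}, "
                  f"normal={normal_prob_within(k):.4f}")
```

The gap between the two columns comes from the discreteness of the walk and narrows as N grows. The point about changing parameters still holds: the standard deviation of the cross section depends on where the process is stopped.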

SProgrammer: Understanding this is the key to understanding 90% in the theorem.

Without formulas you won't get a gut feel for it. You know that yourself. But you can't use formulas here.

yosuf: This shows that the solutions of the material balance equations and the laws of probability theory coincide, and that they complement each other in interpreting the results of the analysis of phenomena.

Haven't you heard that the gamma function is found in all sorts of fields of science and engineering?

I don't see anything supernatural in its turning up when you solve differential equations. And you only brought up the gamma distribution because you saw what this function is called in Excel. So what connection do your differential equations have with probability theory, Yusuf?!

SProgrammer is right that very few distributions are actually used in probability theory and mathematical statistics, although you can invent as many as you like. So if you are still so enthralled by (18), I recommend that you think about Erlang and where you got it from. Just try to set out your reflections not as terse conclusions like the one quoted above, but in a more complete form.

I looked it up in Feller, vol. 2. There is something about the gamma distribution, but with horrible formulas and only a couple of words about Erlang. So not there.

But there is something interesting about the exponential distribution (Feller, vol. 2, p. 69):


This is particularly interesting because the distribution of price returns is well approximated by the Laplace distribution.
 
C-4:

OK, let me try to solve the problem: 10 cumulative series of 10,000 trials each are given. The final results of the series are as follows:

Series   Result
1        145
2        -32
3        -80
4        25
5        -172
6        102
7        78
9        -121
10       95
Total    40

The sum of the M independent random walks is +40. Substituting it into the formula gives SQRT(40*10,000) * 100 = 63,245. That result makes no sense. I must have misunderstood what "sum of M" means.

Or does it mean that all the experiments should be lined up one after another, and the deviation of the final result from the expected value analyzed?


Basil, let's start at the beginning. Did you model the random walk as a cumulative sum of coin-type increments, i.e. two outcomes +1 and -1 with equal probabilities 0.5/0.5? This random variable (RV) itself is not normally distributed: it is a discrete distribution with 2 values. Its mean is 0 and its standard deviation is SQRT(0.5*1 + 0.5*1) = 1.

Next we consider the random walk as a sum of these increments. Suppose we take 10,000 increments, as you do. What will the sum be equal to? Obviously it is a random variable too (the second one). If the increments are independent, its distribution converges to normal as the number of trials grows, with mean 0 and standard deviation SQRT(10000)*1 = 100. From this and the three-sigma rule, for example, it follows that over 99% of the realizations of this RV fall in the interval -300...+300, i.e. fewer than 1 in 100 realizations land outside it.

Then you consider the sum of these RVs. Your column holds the sum of 10 realizations of this RV. That is a new (already third) RV, which is also normally distributed, with mean 0 and standard deviation 100*SQRT(10), about 316. The +40 you got as the total is just one realization of this third RV, and it varies quite widely: again, over 99% of the values will lie in the range -949...+949.
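The same bookkeeping can be applied to the table from the question. This is only a sketch under stated assumptions: the steps are fair +1/-1 increments, so each step has standard deviation 1 (since E[x^2] = 1); only the nine results actually listed are used, because series 8 is absent from the table; and the helper names are made up for the example.

```python
import math

# Final results of the series as listed in the post (series 8 is not listed).
results = [145, -32, -80, 25, -172, 102, 78, -121, 95]

N_STEPS = 10_000   # trials per series, from the post
STEP_SIGMA = 1.0   # assumption: fair +1/-1 steps, so Var = E[x^2] = 1


def endpoint_sigma(n_steps, step_sigma):
    """Standard deviation of one series after n_steps independent steps."""
    return math.sqrt(n_steps) * step_sigma


def total_sigma(n_series, n_steps, step_sigma):
    """Standard deviation of the sum of n_series independent series."""
    return math.sqrt(n_series * n_steps) * step_sigma


def rms_about_zero(values):
    """Root-mean-square of the values around the theoretical mean of 0."""
    return math.sqrt(sum(v * v for v in values) / len(values))


total = sum(results)
sigma_one = endpoint_sigma(N_STEPS, STEP_SIGMA)
sigma_total = total_sigma(len(results), N_STEPS, STEP_SIGMA)
z_score = total / sigma_total  # how many sigmas the total is away from 0

if __name__ == "__main__":
    print(f"total = {total}")
    print(f"theoretical sigma of one series = {sigma_one:.1f}")
    print(f"observed RMS of listed results  = {rms_about_zero(results):.1f}")
    print(f"theoretical sigma of the total  = {sigma_total:.1f}")
    print(f"total in units of sigma         = {z_score:.2f}")
```

Under these assumptions the total of +40 lies well within one standard deviation of zero, and the spread of the listed results is of the same order as the theoretical sigma of a single series. Note that the formula takes the number of series and the number of trials as inputs, not the observed total.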

 
The theory heavyweights have forgotten my little question :(