Discussion of article "Application of the Eigen-Coordinates Method to Structural Analysis of Nonextensive Statistical Distributions" - page 2

 
alsu:

What I mean by all this: suppose we have a certain model, and on its basis we have obtained a theoretical function. And suppose that, through our ignorance, we failed to take into account some very small but systematic factor. In this case the eigen-coordinates method, because of its extraordinary sensitivity, will rap us on the knuckles and declare that the real data do not correspond to the model. But that is not true! The model is correct; it merely fails to account for one factor, and from the practical point of view this deficiency may be quite insignificant (as in the Hilhorst-Schell example, where the difference is hard to notice even by eye). So I would read "only from the fundamental point of view" as "rather from the fundamental point of view", in the sense that maximum accuracy of fit may matter little from the applied point of view (for solving a practical problem), but matter a great deal from the fundamental one (for a thorough understanding of all the processes taking place).

From an applied point of view, maximum fit accuracy is not so essential if you know the limitations of the model in advance. For example, there are experimental data, and there is a theory that describes them well in some region (any model has limitations). If the method suddenly flags a discrepancy, it will do so outside that region (e.g. our model does not work at high/low temperatures), and we will see this. On the other hand, we usually have information about the properties of the model, e.g. that it was derived under certain assumptions, and that at these temperatures other effects appear which the model does not account for. There is nothing wrong with this: the model has a domain of applicability.

A fundamental approach is always stronger, because its domain of applicability is wider. And to have a wide domain of applicability, a model needs special properties.

Also, the method only gives us a verdict that the model does not fit the experimental data, but says nothing about the reasons for the discrepancy (as in my example: we cannot tell whether the model is correct "in general" with minor flaws, or whether it should be completely revised), and that is a shortcoming.

There is stronger magic for such cases: symmetry considerations.

It seems to me that the architectural flaw of statistical mechanics can hardly be corrected with the help of the exponential distribution.

 

Quantum:

It seems to me that the architectural flaw of statistical mechanics can hardly be corrected with the help of the exponential distribution.

And there is no flaw; try substituting mu=0, nu=1, a=gamma into your calculations (paragraphs 2.3-2.4 of the article). Here is an extract from the article:


In this case the calculations are almost trivial: after the substitution, of the 3 coordinates only 2 remain, and you can notice that X1 and X2 are linearly dependent, i.e. in fact one more coordinate has to be eliminated. Next, substitute real data, for example EURUSD. You will be pleasantly surprised by the results (in terms of chart linearity). The most interesting thing is that, as far as I remember, the deviations from linearity occur precisely in the region of "high temperatures" (in the sense of large moduli of returns), and not at all in the direction you would expect: if you plot everything carefully, you will see that the "fat tail" of the distribution thins out sharply at the end (it is hard to estimate, there are not enough points, but it looks something like exp(-x^3) or exp(-x^4)). This bears on the questions of a) whether one single model can be built that works in all regions (probably not, since nonlinear effects in the "saturation mode" play the predominant role) and b) the fact that such a tail suits a q-Gaussian about as well as an accordion suits a goat, for that matter.

You can also do it the other way round: feed a csv file with the real distribution of the moduli of deviations into the script from paragraph 2.4 and see what happens. Since the problem is highly overdetermined (one of the coefficients, C3, is very close to zero, and the other two, C1 and C2, are almost linearly dependent), I cannot even predict the result (the least-squares fit may overflow). If you are lazy, wait till evening and I can do it myself. Once we see the pictures, it will be clear who is right and what to talk about next.
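For reference, a minimal sketch of feeding such a file to a script (assuming a one-column csv in the terminal's \Files\ folder; the file name and delimiter here are illustrative, not from the article):

// read a csv file with moduli of deviations from the \Files\ folder
void OnStart()
  {
   int h = FileOpen("deviations.csv", FILE_READ|FILE_CSV|FILE_ANSI, ';');
   if(h == INVALID_HANDLE) { Print("cannot open file, error ", GetLastError()); return; }
   double data[];
   int n = 0;
   while(!FileIsEnding(h))
     {
      double x = FileReadNumber(h);
      ArrayResize(data, n + 1);
      data[n++] = MathAbs(x);   // work with the moduli
     }
   FileClose(h);
   Print("read ", n, " values");
   // ...pass data[] on to the eigen-coordinates calculation from paragraph 2.4
  }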

By the way, I do not claim that the exponential is a panacea; on the contrary, on the non-extensivity side I support you, and I suggest calculating which distribution maximises the q-entropy on [0; +inf) (do you know the calculus of variations? I don't know it very well, but in principle I can manage it, it is not very complicated). There are theoretical considerations (I wrote above about information), though not fully formalised, plus some intuition, if you like.
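For what it's worth, a sketch of that variational problem under one standard choice of constraints (normalisation plus a fixed unnormalised q-mean; as far as I know, other standard constraint choices give the same family with a redefined q):

maximise S_q[p] = \frac{1 - \int_0^{\infty} p^q\,dx}{q-1} subject to \int_0^{\infty} p\,dx = 1, \int_0^{\infty} x\,p^q\,dx = U_q.

Setting the variation of S_q - \alpha \int p - \beta \int x\,p^q to zero:

-\frac{q\,p^{q-1}}{q-1} - \alpha - \beta q\,x\,p^{q-1} = 0 \;\Rightarrow\; p^{q-1} \propto \frac{1}{1 + (q-1)\beta x},

p(x) = C\,[1 + (q-1)\beta x]^{-1/(q-1)} = C\,e_q(-\beta x),

i.e. exactly the q-exponential, which reduces to the ordinary exponential C e^{-\beta x} as q -> 1.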

 
Ah, well, yes. I should have got off my arse and looked on the internet: it turns out that the q-exponential has already been calculated by kind people. Who is going to do the fitting to the quotes?
 

Particularly pleasing is that

The q-exponential distribution has been used to describe the distribution of wealth (assets) among individuals.
 
alsu:

And there is no flaw; try substituting mu=0, nu=1, a=gamma into your calculations (paragraphs 2.3-2.4 of the article). Here is an extract from the article:


In this case the calculations are almost trivial: after the substitution, of the 3 coordinates only 2 remain, and you can notice that X1 and X2 are linearly dependent, i.e. in fact one more coordinate has to be eliminated. Next, substitute real data, for example EURUSD. You will be pleasantly surprised by the results (in terms of chart linearity). The most interesting thing is that, as far as I remember, the deviations from linearity occur precisely in the region of "high temperatures" (in the sense of large moduli of returns), and not at all in the direction you would expect: if you plot everything carefully, you will see that the "fat tail" of the distribution thins out sharply at the end (it is hard to estimate, there are not enough points, but it looks something like exp(-x^3) or exp(-x^4)). This bears on the questions of a) whether one single model can be built that works in all regions (probably not, since nonlinear effects in the "saturation mode" play the predominant role) and b) the fact that such a tail suits a q-Gaussian about as well as an accordion suits a goat, for that matter.

You can also do it the other way round: feed a csv file with the real distribution of the moduli of deviations into the script from paragraph 2.4 and see what happens. Since the problem is highly overdetermined (one of the coefficients, C3, is very close to zero, and the other two, C1 and C2, are almost linearly dependent), I cannot even predict the result (the least-squares fit may overflow). If you are lazy, wait till evening and I can do it myself. Once we see the pictures, it will be clear who is right and what to talk about next.

By the way, I do not claim that the exponential is a panacea; on the contrary, on the non-extensivity side I support you, and I suggest calculating which distribution maximises the q-entropy on [0; +inf) (do you know the calculus of variations? I don't know it very well, but in principle I can manage it, it is not very complicated). There are theoretical considerations (I wrote above about information), though not fully formalised, plus some intuition, if you like.

Working with moduli is a very good idea; it would be interesting to see what comes out.

P1(x) is weaker than P2(x): the latter has richer dynamics according to the differential equation, and besides, P2(x) contains the Gaussian as a special case, which makes it universal (it can handle all problems where the Gaussian appears).

I think we should dig towards P(U): it is almost a Gaussian, but with a cunning nonlinear transformation of the argument through erf^-1(x); this is how the tails were cut off in Scher's case.

When differentiating and integrating P(U), constructions with an argument transformation of the form erf(a*erf^-1(x)) appear; what this is, is not quite clear.
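One thing that can be said about that construction (a worked calculation, easy to verify by differentiating erf): putting u = erf^{-1}(x), so that dx/du = (2/\sqrt{\pi})\,e^{-u^2},

\frac{d}{dx}\,\mathrm{erf}\bigl(a\,\mathrm{erf}^{-1}(x)\bigr) = \frac{a\,e^{-a^2 u^2}}{e^{-u^2}} = a\,e^{(1-a^2)\,[\mathrm{erf}^{-1}(x)]^2}.

For a = 1 this is identically 1 (the identity map); for a > 1 the derivative vanishes as x -> ±1, while for a < 1 it blows up there, which is exactly the kind of factor that reshapes the tails.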

I.e. the idea is to recover, from known exact solutions (Scher has a second example on slide 25), by comparing equations, the general form of a differential equation whose solutions reduce to the known functions in particular cases (by analogy with the hypergeometric function).

plot InverseErf - Wolfram|Alpha: www.wolframalpha.com
 
alsu:
Ah, well, yes. I should have got off my arse and looked on the internet: it turns out that the q-exponential has already been calculated by kind people.

No less kind people have shown that there is a global fork (eq. 32), at which, after the "specific choice" h(x)=tanh(x) and lambda=1, we get g -> q.

I wonder whether there are other "specific choice" options with a "Gaussian" outcome. I think there must be: the birth of a new quality cannot rest on something that "does not play any special role"; fundamentality is simply required here.

UPD: it is possible that "does not play any special role" is an incorrect statement, made on the basis of a few special cases.

 
Quantum:

From an applied point of view, maximum fit accuracy is not so essential if you know the limitations of the model in advance.

The principle of "you can't spoil the porridge with butter" (i.e. that extra accuracy never hurts) is very questionable in practical modelling.

If you concentrate specifically on economic time series, then, along with the need to solve other problems, you always face the two-sided problem of "redundancy/insufficiency" of the model. When models fit equally well, the simpler one is chosen. Statistics offers a set of tests that allow you to attack this problem, one of which is sketched below.
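As one example of such a criterion, formalising "equal fit, fewer parameters wins" (a minimal sketch; the function name is mine, the formula is the standard Akaike information criterion):

// Akaike information criterion: penalises the log-likelihood by the parameter count
double AIC(const double log_likelihood, const int n_params)
  {
   return 2.0 * n_params - 2.0 * log_likelihood;
  }
// of two models fitted to the same data, prefer the one with the smaller AIC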

The whole modelling mechanism should be balanced. Sure, it is interesting to have breakthroughs in individual places, but the practical interest appears when the other elements of the models are pulled up to the level of that breakthrough.

At the moment there is still the problem of kinks (breakpoints) in the quotes, which cannot be accounted for in modelling. Until this problem is solved, any refinement of the models is meaningless.

 

Yes, perhaps it is better to look at experimental data first.

Let us consider the classic example (Fig. 4 of the article) of describing the SP500 distribution with the q-Gaussian (the function P2(x)).

Daily data on SP500 closing prices were taken from the link: http://wikiposit.org/w?filter=Finance/Futures/Indices/S__and__P%20500/.


SP 500 close prices

SP500 logarithmic returns

SP 500 logarithmic returns distribution


To run the check, copy the SP500-data.csv file to the \Files\ folder, then run CalcDistr_SP500.mq5 (distribution calculation) and then q-gaussian-SP500.mq5 (eigen-coordinates analysis). The core of the return calculation is sketched below.
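A minimal sketch, under my assumptions (the actual CalcDistr_SP500.mq5 reads the closes from the csv file; here they are simply taken from an array, and the function name is mine):

// logarithmic returns r[i] = ln(C[i+1]/C[i]) from a series of closing prices
int LogReturns(const double &close[], double &r[])
  {
   int n = ArraySize(close);
   if(n < 2) return 0;
   ArrayResize(r, n - 1);
   for(int i = 1; i < n; i++)
      r[i - 1] = MathLog(close[i] / close[i - 1]);
   return n - 1;   // 2633 closes give the 2632 returns seen in the log below
  }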

Calculation results:

2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    2: theta=1.770125768485269
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    1: theta=1.864132228192338
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    2: a=2798.166930885822
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    1: a=8676.207867097581
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    2: x0=0.04567518783335043
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    1: x0=0.0512505923716428
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    C1=-364.7131366394939
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    C2=37.38352859698793
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    C3=-630.3207508306047
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    C4=28.79001868944634
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    1  0.00177913 0.03169294 0.00089521 0.02099064 0.57597695
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    2  0.03169294 0.59791579 0.01177430 0.28437712 11.55900584
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    3  0.00089521 0.01177430 0.00193200 0.04269286 0.12501732
2012.06.29 20:01:19    q-gaussian-SP500 (EURUSD,D1)    4  0.02099064 0.28437712 0.04269286 0.94465120 3.26179090
2012.06.29 20:01:09    CalcDistr_SP500 (EURUSD,D1)    checking distibution cnt=2632.0 n=2632
2012.06.29 20:01:09    CalcDistr_SP500 (EURUSD,D1)    Min=-0.1229089015984444 Max=0.1690557338964631 range=0.2919646354949075 size=2632
2012.06.29 20:01:09    CalcDistr_SP500 (EURUSD,D1)    Total data=2633

Estimates of the parameter q obtained by the eigen-coordinates method (q = 1 + 1/theta; theta ≈ 1.77-1.86 gives q ≈ 1.54-1.57): q ~ 1.55.

In the example (Figure 4 of the article), q~1.4.

SP 500 eigencoordinates X1 Y1

SP 500 eigencoordinates X2 Y2

SP 500 eigencoordinates X3 Y3

SP 500 eigencoordinates X4 Y4

Conclusions: on the whole these data map quite well onto the q-Gaussian. The data were taken as is, but some averaging is still present, since the SP500 is an index instrument and daily charts were used.

X1 and X2 are sensitive by nature. On X3 and X4 the tails are slightly distorted, but not so badly as to rule out the q-Gaussian as the right function; we need to find an example with a more pronounced problem.

X1 and X2 can be improved by replacing them with JX1 and JX2; they should straighten out. The tails on X3 and X4 can be corrected by expanding the set of eigen-coordinates: generalise the quadratic dependence, i.e. abandon the symmetry around x0 (+ new parameters). We can also look at the cubic case (1+a(x-x0)^3)^theta and its extensions (+ new parameters); a sketch of its eigen-coordinates follows.
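For the cubic case the eigen-coordinates can be built by the same integration-by-parts construction as in the article (a sketch; I have not checked how well-conditioned the resulting basis is). The function P(x) = A(1+a(x-x_0)^3)^{\theta} satisfies the ODE

(1 + a(x-x_0)^3)\,P'(x) = 3a\theta\,(x-x_0)^2\,P(x),

and integrating from x_1 to x, taking each \int x^k P'\,dx by parts, gives the linear relation Y(x) = \sum_{i=1}^{6} C_i X_i(x) with

Y = P(x) - P(x_1),
X_1 = x^3 P(x) - x_1^3 P(x_1), \quad X_2 = x^2 P(x) - x_1^2 P(x_1), \quad X_3 = x P(x) - x_1 P(x_1),
X_4 = \int_{x_1}^{x} s^2 P\,ds, \quad X_5 = \int_{x_1}^{x} s P\,ds, \quad X_6 = \int_{x_1}^{x} P\,ds,

C_1 = \frac{-a}{1 - a x_0^3}, \quad C_2 = \frac{3 a x_0}{1 - a x_0^3}, \quad C_3 = \frac{-3 a x_0^2}{1 - a x_0^3},
C_4 = \frac{3 a (\theta+1)}{1 - a x_0^3}, \quad C_5 = \frac{-6 a x_0 (\theta+1)}{1 - a x_0^3}, \quad C_6 = \frac{3 a x_0^2 (\theta+1)}{1 - a x_0^3},

so the parameters are recoverable from the fitted coefficients, e.g. x_0 = -C_2/(3 C_1) and \theta = -C_4/(3 C_1) - 1.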

The dependence on instrument, time interval and timeframe requires further study.

Quandl - Find, Use and Share Numerical Data: wikiposit.org (pages curated by Quandl)
 
faa1947:

At the moment there is still the problem of kinks (breakpoints) in the quotes, which cannot be accounted for in modelling. Until this problem is solved, any refinement of the models is meaningless.

Regarding breakpoints (if I understood them correctly).

Let's consider the distribution of logarithmic returns for #AA, M5 (2011.12.01 21:15:00 - 2012.06.29 18:10:00).

The calculation was made using the script CalcDistr.mq5 on 10000 data points for the symbol #AA, M5.

#AA

The distribution of logarithmic returns in this case (scale M5) has a complex structure:

#AA distribution

If we interpret the distribution of logarithmic returns as roughly the probability of movement in a given direction, then what we have here is clearly a sum of distributions: the structure of the distribution at small scales indicates non-stationarity.

The current dynamics is determined by the local distribution, and at the breakpoints it is rearranged.

I.e. the distribution is asymmetric by nature (taking |x| will not do); it consists of two parts/distributions (positive and negative), and the local dynamics is determined by whichever side has the larger volume in the order book.
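A minimal sketch of checking that asymmetry on the returns themselves (the names here are mine; r[] is the array of logarithmic returns):

// compare the positive and negative halves of the return distribution
void CompareHalves(const double &r[])
  {
   double pos_sum = 0, neg_sum = 0;
   int    pos_n   = 0, neg_n   = 0;
   for(int i = 0; i < ArraySize(r); i++)
     {
      if(r[i] > 0) { pos_sum += r[i]; pos_n++; }
      if(r[i] < 0) { neg_sum -= r[i]; neg_n++; }   // accumulate |r| for the negative half
     }
   if(pos_n > 0 && neg_n > 0)
      PrintFormat("positive: n=%d mean=%g  negative: n=%d mean=%g",
                  pos_n, pos_sum / pos_n, neg_n, neg_sum / neg_n);
   // a noticeable difference between the two halves is a sign of the asymmetry above
  }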

Files:
CalcDistr.mq5  4 kb
 

Interesting material, thank you. I don't want to disturb the mathematical idyll reigning here, but I still can't help asking two simple questions:

1. The question of the practical value of these distributions: what should we arrive at in the end? Description for its own sake is fine, but (I apologise, of course) it smacks of pure academicism.

2. Is it reasonable to try to describe processes of completely different natures, occurring at different "levels" of the market, with a single distribution? The problem of "kinks" has already been mentioned here, but that is only part of the problem. Moreover, over different historical time intervals the very composition of the processes changes significantly; how one would describe that with one distribution, I do not understand.