Matstat, Econometrics, Matan

 
Welcome to all gurus of mathematical statistics, econometrics, and mathematical analysis.
A thread for constructive dialogue on various topics from these scientific fields.
Sensible communication is welcome!
 

Forum on trading, automated trading systems and trading strategy testing

From theory to practice. Part 2

Aleksey Nikolayev, 2021.05.05 22:38

Roughly speaking, mixing weakens the dependence but does not remove it completely.
In fact, probabilistic dependence is the most important part of the theorem in terms of practical applications.
When I watched an MIT probability theory course for engineers on YouTube, it was all about that.
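
A quick numerical illustration of that point, as a sketch in Python (the AR(1) process is chosen here only as a stock example of a mixing sequence): its autocorrelation decays geometrically with the lag, so the dependence weakens but never vanishes at any finite lag.

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1): x[t] = phi * x[t-1] + noise; for |phi| < 1 it is mixing
phi, n = 0.8, 100_000
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Sample autocorrelation at several lags: ~phi**lag, small but nonzero
for lag in (1, 5, 10, 20):
    r = np.corrcoef(x[:-lag], x[lag:])[0, 1]
    print(f"lag {lag:2d}: acf ~ {r:.4f} (theory {phi**lag:.4f})")
```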


Do you mean the coefficient of determination R²?
Or something else by probabilistic dependence?

I compute R² in real time, to estimate the "strength of influence" of a variable x on y;
surprisingly, on some currency series it holds quite consistently at high values.

[Screenshot: real-time R² readout]

 
Isn't just probability enough? If this is probability theory...
 
Dmitry Fedoseev:
Isn't just probability enough? If this is probability theory...

I wanted to clarify which evaluation criterion was meant in this context.
If ordinary correlation, then it and R² are calculated differently, and accordingly give different estimates.
In statistics, R² is usually recommended as the more reliable one.

[Screenshot: R² readout]

and literally ten minutes later

[Screenshot: R² readout ten minutes later]
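
A small sketch of how such a rolling estimate can be computed (Python; the series here are hypothetical stand-ins for currency data). One note: for a simple linear regression of y on x with an intercept, R² is exactly the squared Pearson correlation, so the two only diverge in more general settings.

```python
import numpy as np

def rolling_r2(x, y, window=100):
    """Squared Pearson correlation of y on x over a sliding window."""
    out = np.full(len(x), np.nan)
    for t in range(window, len(x) + 1):
        r = np.corrcoef(x[t - window:t], y[t - window:t])[0, 1]
        out[t - 1] = r * r   # equals regression R^2 for a one-factor fit
    return out

# Hypothetical pair of related series for illustration
rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(1_000))
y = 0.8 * x + np.cumsum(0.3 * rng.standard_normal(1_000))
print(rolling_r2(x, y)[-5:])
```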

 
Roman:

Do you mean the coefficient of determination R²?
Or something else by probabilistic dependence?

Probabilistic (stochastic) dependence is one of the most important concepts in probability theory and matstat. The concept is defined (via conditional probability) first for random events, and then carried over to random variables in the form of a conditional distribution. Dependence is a mismatch between the conditional distribution and the unconditional one, while independence is their coincidence. A popular explanation of dependence: knowing what value one random variable took carries information about the value of the other. Dependence lies between two extreme states: independence and a rigid functional relationship.

The general idea is that we always start from a joint distribution of random variables, on the basis of which all sorts of specific dependence metrics are constructed. These can be copulas, mutual information, correlation, etc.

Correlation, R², etc. are reasonably applicable only when the joint distribution is multivariate normal. In practice they are also applied (because of their simplicity) when normality is not certain, but then their usefulness is determined only by experience.
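
A minimal sketch of that last point (Python, with a deliberately artificial example): for y = x² with symmetric x, the Pearson correlation is near zero even though y is rigidly determined by x, while a crude histogram-based mutual information estimate still picks the dependence up.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
y = x**2                                   # rigid functional dependence, but not linear

print("corr:", np.corrcoef(x, y)[0, 1])   # ~0: correlation misses it

# Plug-in mutual information from a 2D histogram (in nats)
def mutual_info(a, b, bins=30):
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

print("MI  :", mutual_info(x, y))          # clearly positive
```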

 
Aleksey Nikolayev:

Probabilistic (stochastic) dependence is one of the most important concepts in probability theory and matstat. The concept is defined (via conditional probability) first for random events, and then carried over to random variables in the form of a conditional distribution. Dependence is a mismatch between the conditional distribution and the unconditional one, while independence is their coincidence. A popular explanation of dependence: knowing what value one random variable took carries information about the value of the other. Dependence lies between two extreme states: independence and a rigid functional relationship.

The general idea is that we always start from a joint distribution of random variables, on the basis of which all sorts of specific dependence metrics are constructed. These can be copulas, mutual information, correlation, etc.

Correlation, R², etc. are reasonably applicable only when the joint distribution is multivariate normal. In practice they are also applied (because of their simplicity) when normality is not certain, but then their usefulness is determined only by experience.

Ah, this tricky distribution, I always forget about it ))
So all statistical models require this assumption?
And since there is no normality in price series, the torture of preparing the data begins,
trying to somehow bring it closer to a normal distribution without losing the original properties.

From this follows the problem of how to prepare this data.
Standardization, cusum, differencing, etc., as I understand it, do not lead to quality results.
So people start thinning the data or something. What methods are there in general?

So again I come to the conclusion that preparing quality data for statistical models is a huge topic of study.
I searched for tutorials on this topic but found nothing: big data, ML, and neural networks are everywhere, but how to prepare quality data for them is for some reason never covered.
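
For what it's worth, a small sketch of the usual first steps (Python; `prices` is a hypothetical array of positive close prices, simulated here): log returns and standardization, with a Jarque-Bera test to check how far from normal the result still is.

```python
import numpy as np
from scipy import stats

# Hypothetical price series; in practice use real close prices
rng = np.random.default_rng(2)
prices = 100 * np.exp(np.cumsum(0.001 * rng.standard_normal(5_000)))

r = np.diff(np.log(prices))            # log returns (increments)
z = (r - r.mean()) / r.std(ddof=1)     # standardization

stat, pvalue = stats.jarque_bera(z)    # H0: the sample is normal
print(f"Jarque-Bera p-value: {pvalue:.4f}")  # small p => still non-normal
```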

 

I just can't understand the following anomaly, and why it happens.
I computed an orthogonal regression model, which is supposed to be better than OLS (ordinary least squares).
I got the starting coefficients.
Then the model parameters (coefficients) are adjusted by a median algorithm, i.e. a kind of robustness against outliers.
The model describes the initial series well.

Blue is the original series.
Grey is the model.

[Screenshot: model (grey) tracking the original series (blue)]

But on one section of the history I observe a divergence, which further on converges back to an exact fit as in the screenshot above.

[Screenshot: section where the model diverges from the series]


I honestly cannot understand why this happens, and what contributes to it.
The coefficients are recalculated at each step and should fit (x) to (y).
Is it a fitting error? I understand that there can be an error over one, two, or even three calculation steps,
but it seems strange for the error to last so long. Maybe it's not a fitting error? Is it something else?
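
For reference, one common form of orthogonal regression is total least squares, which minimizes perpendicular rather than vertical distances. A minimal sketch in Python (via the first principal direction of the centered data); this is only a generic illustration, not the median-adjusted model described above:

```python
import numpy as np

def tls_line(x, y):
    """Fit y ~ a + b*x by total least squares (orthogonal distances)."""
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # First right singular vector = direction of the fitted line
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    dx, dy = vt[0]
    b = dy / dx
    a = y.mean() - b * x.mean()
    return a, b

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200) + 0.3 * rng.standard_normal(200)  # noise in x too
y = 2.0 * x + 1.0 + 0.3 * rng.standard_normal(200)
print(tls_line(x, y))   # ~ (1.0, 2.0); OLS would bias the slope toward zero
```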

 
Roman:

I just can't understand the following anomaly, and why it happens.
I computed an orthogonal regression model, which is supposed to be better than OLS (ordinary least squares).
I got the starting coefficients.
Then the model parameters (coefficients) are adjusted by a median algorithm, i.e. a kind of robustness against outliers.
The model describes the initial series well.

Blue is the original series.
Grey is the model.

But on one section of the history I observe a divergence, which further on converges back to an exact fit as in the screenshot above.

I honestly cannot understand why this happens, and what contributes to it.
The coefficients are recalculated at each step and should fit (x) to (y).
Is it a fitting error? I understand that there can be an error over one, two, or even three calculation steps,
but it seems strange for the error to last so long. Maybe it's not a fitting error? Is it something else?

I can only advise finding some statistical package that implements your model (or one similar to it) and seeing how it behaves on your data. This may help you understand whether the problem is a faulty model or an implementation error.
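
As one concrete option along these lines: scipy.odr implements orthogonal distance regression, which may or may not match the model above, but gives a reference fit to compare coefficients against. A minimal sketch:

```python
import numpy as np
from scipy import odr

def linear(beta, x):
    return beta[0] * x + beta[1]

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + 0.5 * rng.standard_normal(200)

data = odr.RealData(x, y)
fit = odr.ODR(data, odr.Model(linear), beta0=[1.0, 0.0]).run()
print("slope, intercept:", fit.beta)     # compare against your own fit
print("std errors      :", fit.sd_beta)
```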

 
Roman:

And since there is no normality in price series, the torture of preparing the data begins,
trying to somehow bring it closer to a normal distribution without losing the original properties.

Wouldn't logarithmic increments work?
 
Aleksey Nikolayev:

I can only advise finding some statistical package that implements your model (or one similar to it) and seeing how it behaves on your data. This may help you understand whether the problem is a faulty model or an implementation error.

Thanks for the idea, I didn't realise that.

 
secret:
Wouldn't logarithmic increments work?

Yes, that's basically what I'm doing, as a more or less decent option.
On another, similar model I also sometimes observe small discrepancies, like a divergence.
But not as prolonged as in the screenshot above; rather short-lived ones. It made me wonder why it happens that way.
And trying this model, I saw an even more prolonged divergence.

So I do not understand where this divergence comes from: an incorrect model, or poor-quality source data?
I do not understand the logic to follow.
Either I should bring the initial data approximately to normal,
or I should shovel through different models.
But try writing such a model first; it's not so easy to just test it and throw it away ))
