Dependency statistics in quotes (information theory, correlation and other feature selection methods)

 
Candid:

Do I remember correctly that the input data here is the absolute value of the percentage increments?

But if so, then this is essentially the same volatility (i.e. a monotonic, single-valued function of it), and one would expect all volatility-related effects to show up here as well, albeit in a somewhat filtered form. And since the effects of volatility seem to far outweigh all other market phenomena, the prospect of seeing "something else" against their background looks rather problematic. I repeat: I think it is more promising to try to consistently exclude known but "useless" effects from the raw data.

By the way, Alexei (Mathemat), is your source data in absolute values too?

Good day!

For all Forex charts I calculate the returns in pips (unlike for the stock market).

Second - I keep the signs, i.e. I do not take the absolute value. Everything you have seen for D1, M5, H1 is calculated from returns in pips, discretized into a 5-symbol alphabet, with the sign of the price change preserved.

In the last chart for H1 I squared the returns to get rid of the sign.
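To make the procedure concrete, here is a minimal sketch (my own illustration, not Alexey's actual code) of how returns in pips could be discretized into a 5-symbol alphabet by quantiles, with an option to square the returns to drop the sign; the pip size and the synthetic prices are purely illustrative:

import numpy as np

def discretize_returns(close, point=0.0001, n_bins=5, drop_sign=False):
    # Returns in pips (close-to-close differences divided by the pip size)
    returns = np.diff(close) / point
    if drop_sign:
        returns = returns ** 2          # keep only the magnitude information
    # Quantile edges: each of the n_bins symbols is (roughly) equally populated
    edges = np.quantile(returns, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(returns, edges) + 1   # symbols 1..n_bins

# Illustrative usage on synthetic prices (a stand-in for real EURUSD closes)
rng = np.random.default_rng(0)
prices = 1.10 + np.cumsum(rng.normal(0.0, 0.0005, size=1000))
signed_symbols = discretize_returns(prices)                    # sign preserved
unsigned_symbols = discretize_returns(prices, drop_sign=True)  # sign removed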

 
HideYourRichess:

Actually, the great Bohr, like the great Shannon, solved their problems by going from the essence, the "physics", to the numbers, unlike what is happening here.

The second problem is that it is impossible to explain to people who want to believe that their belief is false. How do you explain to them that the method is not applicable, since it is designed for stationarity and independence? Even if independence is relaxed to Markov chains, that still excludes applying the method to data with "memory" longer than the order considered. Non-stationarity and dependence (and I want to stress once more that this dependence is itself non-stationary, so neither Markov chains nor conditional entropies work) follow directly from an understanding of the market processes that generate the quote flow.

No, they did not go from the essence, they went from the facts :)) That's a joke :).

Are you demanding strict stationarity from real processes? I hope not. Let's move on. We need regularity, i.e. an effect that persists for a long enough time. That is, we are interested in market processes that are stationary, at least approximately and at least over the time span limited by our sample. In other words, the apparatus is quite adequate to the purpose.

 
alexeymosc:

Good day!

For all Forex charts I calculate the returns in pips (unlike for the stock market).

Second - I keep the signs, i.e. I do not take the absolute value. Everything you have seen for D1, M5, H1 is calculated from returns in pips, discretized into a 5-symbol alphabet, with the sign of the price change preserved.

In the last chart for H1 I squared the returns to get rid of the sign.

The presence of the sign makes a big difference, of course. It is just that in your article the probability density functions are given only for positive values.
 
Candid:
The presence of the sign makes a big difference, of course. It is just that in your article the probability density functions are given only for positive values.

These functions are not over the returns but over the values of the calculated mutual information, and that quantity cannot be negative.

In the Habr article the sign of the returns was also kept, but there I took percentage increments. It does not make much difference, though.

Here, compare the last two charts for EURUSD H1. In the first the sign of the increments is preserved, in the second it is dropped. The informativeness of the second system is naturally higher. But the informativeness is not low even when the sign of the movement direction is kept. That is already interesting.

 
alexeymosc:
These functions are not over the returns but over the values of the calculated mutual information, and that quantity cannot be negative.

Yes, I've already noticed that I was wrong.

In any case, if the methodology picks up the effects of volatility even on sign-preserving data, that rather speaks in its favour, imho.

 
Candid:

Yes, I've already noticed that I was wrong.

In any case, if the methodology picks up the effects of volatility even on sign-preserving data, that rather speaks in its favour, imho.

I have posted the actual calculation table below - these are real EURUSD M5 quotes: https://www.mql5.com/ru/forum/135430/page22

State 1 is the lower quantile (a strong downward movement), state 5 is a strong upward movement. The independent variable is the return one step back, i.e. the nearest lag. It can be seen that if the source value = 1, the receiver is more likely to take the value 1 or 5, with a bias towards 5.

If the source takes the value 5, the receiver is again more likely to be 1 or 5, but with a bias towards 1. These things reduce the uncertainty of the receiver's state. Both volatility and the skew towards specific values play a role here. If the volatility is isolated separately, what remains is the informative component for specific values (and not just for the pair of polar values 1-5).

I posted this screenshot on purpose to make the essence of the research clearer. Everything is based on the probability density functions.
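For readers following along, here is a rough sketch (my own illustration, not the code behind the posted table) of how such a table of conditional probabilities and the mutual information between the lagged symbol (source) and the current symbol (receiver) could be computed from a 5-state discretized series:

import numpy as np

def lag_contingency(symbols, lag=1, n_states=5):
    # Joint counts of (source = symbol[t - lag], receiver = symbol[t])
    src, dst = symbols[:-lag], symbols[lag:]
    table = np.zeros((n_states, n_states))
    np.add.at(table, (src - 1, dst - 1), 1)
    return table

def mutual_information_bits(table):
    # Mutual information (in bits) of the joint distribution in the table;
    # the total is always non-negative, even though individual terms may be negative.
    p = table / table.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# table = lag_contingency(symbols, lag=1)
# cond = table / table.sum(axis=1, keepdims=True)  # rows: P(receiver | source)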

 
joo:
how is the search done, not by brute force?

Brute-force enumeration is one option. You could also try a genetic search algorithm with mutual information as the fitness function.

Imagine you have a set of 100 variables, all discretized in the same way. It could happen that if variable 5 takes the value 3 and variable 76 takes the value 1, then the probability that the dependent variable takes the value 4 is 75%. But to find this pair of independent variables, we need to measure the mutual information between the two independent variables and the dependent variable 100 * 100 - 100 times. And if we want to look at combinations of three independent variables...
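To illustrate the idea, here is a toy sketch under my own assumptions (population size, mutation scheme and the MI estimator are illustrative choices, not a prescription): a small genetic search over k-variable subsets that uses the mutual information between the joint state of the selected variables and the target as its fitness function.

import numpy as np

def mi_bits(x, y):
    # Mutual information (bits) between two discrete integer arrays
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), 1)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def subset_fitness(X, y, subset):
    # Encode each row of the selected columns as one discrete symbol, then take MI with y
    _, codes = np.unique(X[:, subset], axis=0, return_inverse=True)
    return mi_bits(codes, y)

def genetic_search(X, y, k=2, pop_size=30, generations=50, seed=0):
    # Individuals are k-variable subsets; selection keeps the best half,
    # mutation swaps one selected variable for a random unused one.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    population = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: -subset_fitness(X, y, s))
        parents = population[: pop_size // 2]
        children = []
        for p in parents:
            child = p.copy()
            i = rng.integers(k)
            child[i] = rng.choice([v for v in range(n) if v not in child])
            children.append(child)
        population = parents + children
    best = max(population, key=lambda s: subset_fitness(X, y, s))
    return best, subset_fitness(X, y, best)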

 
Avals:
Do not generate random walks based on GARCH. You need to take a real series and generate a random walk based on its real volatility. I posted a script here https://forum.mql4.com/ru/41986/page10 which replaces the offline history of a real instrument with a random walk built from tick volumes. Such a random walk reproduces the real volatility almost 100%. GARCH and the like do not take into account many nuances, such as the different cyclical patterns and much else. If there is a difference between this random-walk series and the series it was generated from, that is where it gets interesting :)

That is a good idea. Here is a generated random-walk series with volatility identical to EURUSD. Alexey, please run the analysis on it. Let's see if there are any differences.
Files:
eurusd_r.zip  499 kb
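This is not Avals' script, but one simple way to build such a series (a sketch assuming close prices only, without tick volumes) is to keep the real bar-to-bar return magnitudes and randomize their signs, which preserves the real volatility profile while destroying direction:

import numpy as np

def random_walk_with_real_vol(close, seed=0):
    # Random-walk prices whose bar-to-bar volatility matches the real series:
    # the real return magnitudes are kept, only their signs are randomized.
    rng = np.random.default_rng(seed)
    returns = np.diff(close)
    signs = rng.choice([-1.0, 1.0], size=len(returns))
    return np.concatenate(([close[0]], close[0] + np.cumsum(np.abs(returns) * signs)))

# rw_close = random_walk_with_real_vol(eurusd_m5_close)  # eurusd_m5_close: real quotes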
 
Candid:

No, they did not go from the essence, they went from the facts :)) That's a joke :).

Are you demanding strict stationarity from real processes? I hope not. Let's move on. We need regularity, i.e. an effect that persists for a long enough time. That is, we are interested in market processes that are stationary, at least approximately and at least over the time span limited by our sample. In other words, the apparatus is quite adequate to the purpose.

Exactly, you hope. According to my calculations, the processes that take place in the market at different times differ, shall we say, many times over. Not by the percentages you are hoping for. You are trying to compare processes that occur at one time with a process from another time - so where do the stationarity and the adequacy of the method come from? Reflections of this non-stationarity can be seen in changes of volatility (both cyclical and sporadic), but even that is not the whole picture.

It seems that many people here have read Shiryaev's lecture on Pastukhov's work, and when the maestro himself says that "volatility is itself volatile", it should be clear that things are not simple and that we must look carefully at what we do. But no, once again we see another attempt to stretch some formulas over the market.

In short, do what you want; it is your time and your losses. Of course, if you enjoy the process of studying the numbers, that is another matter - then it is just the fun of a hobby.

 
HideYourRichess:

Exactly, you hope. According to my calculations, the processes that take place in the market at different times differ, shall we say, many times over. Not by the percentages you are hoping for.

Firstly, we understand that. Non-stationarity is a given that you have to put up with, in the worst case parting with your hard-earned money.

Secondly, by discretizing into 5 quantiles we coarsen the data series, and the noise is absorbed, at least partially, within the quantile ranges. The distribution over the symbols becomes uniform.
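As a small illustration of that point (my own sketch with made-up fat-tailed returns), quantile binning into 5 states gives a near-uniform distribution over the symbols regardless of the shape of the underlying return distribution:

import numpy as np

rng = np.random.default_rng(1)
returns = rng.standard_t(df=3, size=10_000)          # fat-tailed stand-in for returns
edges = np.quantile(returns, [0.2, 0.4, 0.6, 0.8])   # 5-quantile boundaries
symbols = np.digitize(returns, edges) + 1            # symbols 1..5
shares = np.bincount(symbols)[1:] / len(symbols)
print(shares)                                        # roughly [0.2, 0.2, 0.2, 0.2, 0.2]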
