Dependency statistics in quotes (information theory, correlation and other feature selection methods) - page 3

 
alexeymosc:

I tried to look for dependencies in the quotes of a single financial instrument using statistical methods. To start, I took the Dow Jones Industrial index, daily data, and transformed the series into a series of percentage increments.

The increments are heteroscedastic. If you want to forecast the direction of price movement, you must take this into account.

To predict volatility, it is better to use more specialized models instead of NS (neural networks).
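As an editorial illustration (not from the thread): one of the simplest volatility-specific alternatives to both NS and full ARCH fitting is a RiskMetrics-style EWMA variance estimate. The synthetic two-regime returns and the decay factor `lam = 0.94` below are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heteroscedastic returns: volatility alternates between two regimes.
sigma = np.where(np.arange(2000) % 500 < 250, 0.005, 0.02)
returns = rng.normal(0.0, sigma)

# EWMA (RiskMetrics-style) variance: var_t = lam*var_{t-1} + (1-lam)*r_{t-1}^2
lam = 0.94
var = np.empty_like(returns)
var[0] = returns[:50].var()  # seed with an initial sample variance
for t in range(1, len(returns)):
    var[t] = lam * var[t - 1] + (1 - lam) * returns[t - 1] ** 2

ewma_vol = np.sqrt(var)
# The estimate tracks the regime switches: higher in the high-sigma stretches.
print(ewma_vol[200:250].mean(), ewma_vol[450:500].mean())
```

A recursion with one tunable parameter is about as far from an "unjustified complication" as a volatility model gets, which is the spirit of the remark above.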

 

Reply to faa1947:

"About "linearity" and "non-linearity" I would also be cautious, as this question can and should be posed within the framework of the model with which you approximate the time series. By analysing the coefficients of this model you can conclude that these coefficients are: constants (or almost constants), deterministic functions or stochastic functions. This is a perfectly concrete and constructive process of analysing the type of dependencies. And what is constructive about discovering this information dependence? And again, how do you see it on the original time series?"


I could tie this to the economic side as well, but, sorry, just read the responses to my first post in the thread; they are about exactly that: intraday volatility is cyclical. And the mutual information showed precisely that. On daily and higher timeframes the situation is entirely different: there are no obvious cycles there.

How to see it in the original series? Nothing easier: look at at least half a year of history on hourly bars and note whether volatility (the size of the candles) differs by time of day. As for the daily bars, I personally have not found any natural cycles or any other day-to-day or economic logic there. It is just an internal dependency structure in prices.
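The "look at the hourly bars" check can be sketched in a few lines (an editorial example on synthetic data; a real check would use actual hourly bars, and the session hours chosen here are an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic hourly returns for ~250 days: volatility depends on the hour of
# day, mimicking the intraday cycle discussed above (quiet nights, busy sessions).
hours = np.tile(np.arange(24), 250)
hourly_sigma = np.where((hours >= 8) & (hours < 17), 0.003, 0.001)
returns = rng.normal(0.0, hourly_sigma)

# Group |return| by hour of day: the cycle shows up as a step in the profile.
profile = np.array([np.abs(returns[hours == h]).mean() for h in range(24)])
print(profile.round(4))
```

On real data, the same grouped-by-hour profile either shows a clear daily shape (intraday) or flattens out (daily bars and above), which matches the distinction drawn in the post.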

And I am not approximating the time series yet; I am extracting data from prices that lets me look at familiar prices from a slightly different angle. You see the lack of dependencies in the increments because you are using autocorrelation. That tells me a lot: there is no linear dependence there and never was, so there is no need to keep showing autocorrelograms. I produced plenty of them myself long ago, and they looked the same as yours. )
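The point of contention here, that autocorrelation misses what mutual information catches, can be made concrete with a toy pair of variables (an editorial sketch; the plug-in histogram estimator and the `x**2` dependence are illustrative assumptions, not anything computed in the thread):

```python
import numpy as np

rng = np.random.default_rng(2)

# A purely non-linear dependence: y is driven by x**2, so the Pearson
# correlation between x and y is ~0 even though y clearly depends on x.
x = rng.normal(size=20000)
y = x**2 + 0.1 * rng.normal(size=20000)

corr = np.corrcoef(x, y)[0, 1]

# Plug-in mutual information estimate (in bits) from a 2-D histogram.
def mutual_information(a, b, bins=20):
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

mi = mutual_information(x, y)
print(round(corr, 3), round(mi, 3))
```

The correlation comes out near zero while the mutual information is clearly positive: exactly the situation where an autocorrelogram says "nothing here" and an information-theoretic measure disagrees.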

 
anonymous:

The increments are heteroscedastic. If you want to predict the direction of price movement, you must necessarily take this into account.

To predict volatility, it is better to use more specific models instead of NS.


And what models are more specific to forex than, say, NS? I am just interested in your opinion. There are a lot of models in the world.
 
alexeymosc:

And what models are more specific to forex than, say, NS? I am just interested in your opinion. There are a lot of models in the world.

I was not talking about models specific to forex (there are many of them, especially for derivatives: https://en.wikipedia.org/wiki/Vanna_Volga), but about models specific to volatility (there are many of them besides ARCH).

NS is not a forex-specific approach; it is used everywhere (or rather the other way around: it gets used wherever people are too lazy to build proper models and have plenty of computing resources).

Approaches to predicting volatility and predicting price direction should be different. For the former, NS should not be used (an unjustified complication); for the latter, you can try.

 

faa1947, please be more careful with the layout of your posts. Sometimes you can't immediately separate what you're quoting from your response.

Now to the point:

faa1947: It seems to me that increasing the sample size is of interest only within the framework of the limit theorem on convergence in probability to the normal law. I must disappoint you: if we do not set ourselves such a task, simply increasing the sample gives nothing. Below I show the sample increased 10-fold.

A shot in the dark, sorry. What normality in the limit are you talking about? Normality of what? Of the distribution of returns? At this stage that hypothesis makes no difference to me. I need no hypothesis about the distribution of returns or about what law they tend to.

Personally, on the hourly data I had the following requirement: since I intended to use the chi-square test of independence of random variables (because I wanted to), the sample size had to be such that every frequency of a joint event was guaranteed to be at least 5. You should know this constraint too. That is why the sample on the hourlies came out so small.
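The "every expected frequency at least 5" rule for the chi-square independence test can be sketched as follows (an editorial toy example; the bin edges and the constructed dependence between `a` and `b` are assumptions, not the poster's actual data):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)

# Bin two dependent variables into a 4x4 contingency table and verify the
# classic validity rule: every EXPECTED cell frequency should be at least 5
# before the chi-square independence test can be trusted.
a = rng.normal(size=3000)
b = 0.5 * a + rng.normal(size=3000)  # dependent by construction

cuts = [-0.5, 0.0, 0.5]              # bin edges -> 4 categories per variable
table = np.zeros((4, 4))
for i, j in zip(np.digitize(a, cuts), np.digitize(b, cuts)):
    table[i, j] += 1

chi2, p, dof, expected = chi2_contingency(table)
rule_ok = bool(expected.min() >= 5)  # the constraint described in the post
print(rule_ok, p < 0.01)
```

Note that `chi2_contingency` returns the expected frequencies, so the rule is checked against those, not against the observed counts; this is what drives the minimum sample size the post mentions.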

But that's me. Why alexeymosc took his sample exactly as large as he did, I don't know. But I can guess: he probably wanted to establish a pattern for the entire series, not just part of it.

faa1947: I would also be cautious about "linearity" and "non-linearity", because this question can and should be posed within the framework of the model with which you approximate the time series. By analysing the coefficients of this model you can conclude whether these coefficients are constants (or almost constants), deterministic functions, or stochastic functions. This is a perfectly concrete and constructive way of analysing the type of dependencies.

There are no models yet. Only Data Mining with non-parametric statistical methods.

I am confident that it is precisely a non-linear relationship: there is no significant linear relationship detectable by Pearson correlation at lags greater than 10. You know this yourself. But relationships are also found at much larger lags. So they are non-linear!
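This "no linear dependence at large lags, yet dependence exists" pattern is a textbook stylized fact, and it can be reproduced on a simulated series (an editorial sketch; the GARCH(1,1)-like generator and its parameters are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a GARCH(1,1)-like series: the returns themselves are linearly
# unpredictable, but their squares stay autocorrelated far into the past.
n = 50000
omega, alpha, beta = 1e-6, 0.08, 0.9
r = np.zeros(n)
var = np.full(n, omega / (1 - alpha - beta))
for t in range(1, n):
    var[t] = omega + alpha * r[t - 1] ** 2 + beta * var[t - 1]
    r[t] = np.sqrt(var[t]) * rng.normal()

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

lag = 50
print(round(acf(r, lag), 3), round(acf(r**2, lag), 3))
```

The autocorrelation of `r` at lag 50 is indistinguishable from zero, while the autocorrelation of `r**2` is clearly positive: a non-linear dependence at a lag where Pearson correlation sees nothing.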

faa1947: And what is constructive about detecting this information dependence? And again, how do you see it on the original time series?

It is not easy to see; here I agree with you. We only know the average amount of information transmitted to the zero bar from a rather distant history, and the mechanism of this "information attack from the past" is unknown to us. We still have to manage to turn those naked bits into a forecasting tool. Who said it would be easy?

The increments are heteroscedastic. If you want to forecast the direction of price movement, you should necessarily take this into account.

I am extremely ignorant of modern econometric models, including ARCH and the related family. Can you explain in simple terms why this has to be taken into account at a stage where no models of the increments' behaviour are being built? No models, just a blunt application of information theory. Thank you.

 
anonymous:

I was not talking about models specific to forex (there are many of them, especially for derivatives: https://en.wikipedia.org/wiki/Vanna_Volga), but about models specific to volatility (there are many of them besides ARCH).

NS is not a forex-specific approach; it is used everywhere (or rather the other way around: it gets used wherever people are too lazy to build proper models and have plenty of computing resources).

Approaches to predicting volatility and predicting price direction should be different. For the former there is no need to use NS (an unjustified complication); for the latter, you can try.


I agree in principle about NS, although the method itself is not that simple. There are also many conventions which it is desirable, and sometimes obligatory, to follow (from data preprocessing and selection of relevant variables to the construction of the networks). In general, there are people who like to apply knowledge they already have to the phenomena under study, and there are those who start studying from scratch; the latter would probably prefer NS. IMHO.

But I'm not going to predict volatility, I'm always trying to predict the direction of price movement. In this problem I use NS.

 
alexeymosc:

Answer from faa1947:


...intraday volatility is cyclical. And the mutual information has shown that.

Your mutual information showed me nothing. Before doing statistical processing you should make sure there are no deterministic components in the time series. If they are present, they will swamp the statistics and none of the results can be trusted. I must disappoint you: determining volatility from the raw time series is flawed. I manage to build models with the following parameters: a volatility of 44 pips fluctuating by plus or minus two pips, i.e. I can consider it constant. The volatility that remains for analysis depends strongly on the model applied.

And I do not approximate the time series yet, I extract data from prices that allow me to look at the usual prices from a slightly different angle. Here you see the lack of dependencies in the increments

You are running ahead of the locomotive. Actually, at textbook level, the order of time-series analysis is defined: stationary or non-stationary; for a non-stationary series, choose a transformation that makes it stationary. Surely this first step will involve removing the trend. Then we shall see.

 

I don't understand what you are doing here. I decided to refresh my understanding of information theory (IT) and looked it up in a glossary of terms:

IT considers the notion of "information" only from the quantitative side, without reference to its value or even its meaning. Under such an approach, a page of typewritten text almost always contains approximately the same amount of information, determined only by the number of characters and spaces on the page and not depending on what is printed on it, including the case of a meaningless, chaotic set of characters. For modelling communication systems this approach is valid, as they are designed to transmit information represented by any character set error-free over a communication channel. In those cases where it is essential to consider the value and meaning of the information, the quantitative approach is not applicable. This circumstance imposes essential restrictions on the fields of possible application of IT; failure to take it into account led, at early stages of its development, to an overestimation of its applied significance.

In this connection I have three possible answers:

1. You are sure that the dictionary is lying and that's not really the case.

2. You are at the "early stages of development" and have not yet undertaken an assessment of "applied significance".

3. You are something else.
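The glossary's quantitative point is easy to demonstrate (an editorial example; the pangram text and the `char_entropy_bits` helper are assumptions for illustration):

```python
import math
import random
from collections import Counter

# The dictionary's claim in miniature: Shannon entropy of a page of text
# depends only on the character frequencies, not on whether the text means
# anything at all.
def char_entropy_bits(text):
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

meaningful = "the quick brown fox jumps over the lazy dog " * 20

# Same multiset of characters, shuffled into nonsense.
random.seed(0)
chars = list(meaningful)
random.shuffle(chars)
shuffled = "".join(chars)

print(char_entropy_bits(meaningful), char_entropy_bits(shuffled))
```

Both values coincide: the "meaningless, chaotic set of characters" carries exactly as many bits as the sensible sentence, which is precisely the limitation of the quantitative approach the glossary warns about.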

 
Mathemat:

I am extremely ignorant of modern econometric models...

That clarifies a lot. Actually, econometrics is a science (I stress: a science) that has been studying economic time series for at least 100 years. The Econometric Society was formed in the US in the 1930s. Judging by your posts, this is a science you are not strong in. You are not alone on this forum in that. As a side note: according to the developers of this site there is a grammatical error in the word "econometrics" and its derivatives.

 
faa1947: Your mutual information didn't show me anything. Before doing statistical processing of the time series you should make sure that there are no deterministic components in it.

Here we go again; another shot in the dark. The study was done not on the price series but on its returns. That is first.

Secondly, preprocessing of the data such as you mention is determined primarily by the goals of the analysis, not by dogmatic requirements imposed on the study without regard to those goals.

You are running ahead of the locomotive. Actually, at the textbook level, the order of time-series analysis is defined: stationary or non-stationary; for a non-stationary series, choose a transformation that makes it stationary. Surely this first step will involve removing the trend. Then we shall see.

See my objection above. Match the research methods to the research objectives! And finally stop muttering your incantations about stationarity, detrending and other things irrelevant to the topic of this study.

2 HideYourRichess: I'm having a bit of a holiday today, so I'm temporarily free to say whatever I think :) Are we having a religious showdown about what information is?

2 faa1947:

Actually econometrics is a science (I stress science) that has been studying economic time series for at least 100 years.

OK, let it be a science. As I recall, econometrics is very fond of imposing its models on financial data. I don't impose them. Then I'm not doing econometrics. Any other questions?
