Dependency statistics in quotes (information theory, correlation and other feature selection methods)

 
faa1947:

Residuals analysis for autocorrelation and for the type of probability density function. Also, of course, R^2. These are, in principle, standard time series forecasting techniques.

This is just a start and, by generally accepted standards, not complete. A complete analysis is shown here, although it is only an example of use. There are three groups of analyses: coefficients, residuals and stability. If you can reconcile the contradictions between them, you may get an estimate, which is always a prediction error, since the target is the prediction and everything else is intermediate results.
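To make that checklist concrete, here is a minimal Python sketch of the first two groups plus R^2, assuming statsmodels and scipy are available; the AR(5) model and the random-walk placeholder series are illustrative, not anything posted in the thread (a stability check is sketched separately further down):

```python
import numpy as np
from scipy.stats import jarque_bera
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.diagnostic import acorr_ljungbox

# Placeholder data: a random-walk "price" and its first differences.
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.standard_normal(1000))
returns = np.diff(prices)          # X[t] - X[t-1]

LAGS = 5
model = AutoReg(returns, lags=LAGS).fit()
resid = model.resid

# Coefficients: are they statistically significant?
print("coefficient p-values:", np.round(model.pvalues, 3))

# Residuals: autocorrelation (Ljung-Box) and density type (Jarque-Bera).
lb = acorr_ljungbox(resid, lags=[10], return_df=True)
print("Ljung-Box p-value:", float(lb["lb_pvalue"].iloc[0]))
jb_stat, jb_p = jarque_bera(resid)
print("Jarque-Bera p-value (normality):", jb_p)

# And, of course, R^2 of the in-sample fit.
r2 = 1.0 - np.var(resid) / np.var(returns[LAGS:])
print("R^2:", round(r2, 4))
```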

Yes, I don't think I've ever done a FULL academic study. There is usually a shortage of time and effort in my work, so I take a shorter route: I build several (2 or more) predictive models and do a residuals analysis, then choose one model based on a balanced assessment of the accuracy and the quality of the residual series. I agree that, given time, one can go deeper into predictor reliability estimates.

By the way, a good and relatively low-cost test of predictor accuracy and stability is cross-validation, where training and validation periods are chosen many times so that eventually the whole original series is broken into small segments and each segment is included in the validation sample.
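A minimal sketch of that segment-wise cross-validation in Python, assuming scikit-learn; the three-lag linear model and the random placeholder series are illustrative only:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000)        # placeholder return series

# Three lagged returns as predictors of the current return.
X = np.column_stack([np.roll(returns, k) for k in (1, 2, 3)])[3:]
y = returns[3:]

# KFold with shuffle=False cuts the series into contiguous segments;
# each segment serves exactly once as the validation sample.
errors = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=False).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    errors.append(np.mean((pred - y[val_idx]) ** 2))

# The spread of per-segment errors is as informative as their mean.
print("MSE per segment:", np.round(errors, 4))
print("mean:", np.mean(errors), "std:", np.std(errors))
```

Note that, unlike walk-forward testing, this scheme trains on data from both sides of each validation segment, so with strongly overlapping lagged features one may want to leave a small gap around each block.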

 
alexeymosc:

Yes, I don't think I've ever done a FULL academic study. There is usually a shortage of time and effort in my work, so I take a shorter route: I build several (2 or more) predictive models and do a residuals analysis, then choose one model based on a balanced assessment of the accuracy and the quality of the residual series. I agree that, given time, one can go deeper into predictor reliability estimates.

By the way, a good and relatively low-cost test of predictor accuracy and stability is cross-validation, where training and validation periods are chosen many times so that eventually the whole original series is broken into small segments and each segment is included in the validation sample.

For a very long time I have been giving the following advice, hard-won from my own experience: for about thirty years now, and especially nowadays, the most productive way to master a subject area has been to take a ready-made package focused on that subject area. I mentioned EViews, but it is not the only such package; it is simply the one used in teaching in our universities. By using the package you will gain:

a huge number of functional, specialised programs validated by millions of users

an analysis scheme

completeness of analysis

references from which the algorithms used were taken.

Having gained experience and a broader outlook, you can then move on to enhancements. But these improvements will be based on systematic knowledge of the subject area, and it will not occur to you to bring in Peters or someone else like Ptolemy, as some have done above.

 
faa1947:

For a very long time I have been giving the following advice, hard-won from my own experience: for about thirty years now, and especially nowadays, the most productive way to master a subject area has been to take a ready-made package focused on that subject area. I mentioned EViews, but it is not the only such package; it is simply the one used in teaching in our universities. By using the package you will gain:

a huge number of functional, specialised programs validated by millions of users

an analysis scheme

completeness of analysis

references from which the algorithms used were taken.

Having gained experience and a broader outlook, you can then move on to enhancements. But these improvements will be based on systematic knowledge of the subject area, and it will not occur to you to bring in Peters or someone else like Ptolemy, as some have done above.


Sorry, but the main thing is the correct problem statement and the correct conclusions. For example, what is written in the article in relation to "Diagnostics of indicators", and the conclusions based on residuals that the indicator is useful or not, etc., are not correct (imho). Just as it is not true that the indicator has to predict, or to be useful at all, on the whole series. You do not need to forecast prices all the time to get profit, only at relatively rare moments. It is prediction only in a certain sense, because what is predicted is not a single price change but the robustness of the system and the repetition of past statistics. And it is not indicators that make predictions, but only the system as a whole.

P.S. The article is still useful, thanks :)

 
Avals:


For example, what is written in the article in relation to "Diagnostics of indicators", and the conclusions based on residuals that the indicator is useful or not, etc., are not correct (imho). Just as it is not true that the indicator has to predict, or to be useful at all, on the whole series.

I agree; the article was written with some hesitation, because its purpose was to demonstrate the approach in general, the methodology in its entirety, so to speak.

You do not need to forecast prices all the time to get profit, only at relatively rare moments

I do not agree at all. Any TS (trading system) always predicts, whether it admits it or not. Periodically, an analysis is made in order to take decisions for the future: to take a position, to exit, to stay out of the market or to stay in the market. These decisions are based on forecasts of future market behaviour.

It is not a single price change that is predicted, but the robustness of the system.

Robustness is not predicted, it is constructed, and the result of the construction is evaluated by the forecast error: if the error variance is close to constant, the system will be stable.

And it is not the indicators that predict, but only the system as a whole.

Naturally. The article assumes that the system consists of a single indicator. Even with this simplification, the article has become too complicated.

If you want to discuss the article in detail, I suggest moving to the appropriate thread; maybe others will join. This thread has a different topic, after all.

 
faa1947:

You do not need to forecast prices all the time to get profit, only at relatively rare moments

I do not agree at all. Any TS always predicts, whether it admits it or not. Periodically, an analysis is made in order to take decisions for the future: to take a position, to exit, to stay out of the market or to stay in the market. These decisions are based on forecasts of future market behaviour.


Not "no need to predict", but no need to predict continuously :) Only at discrete moments; and checking the residuals of the indicator on the whole series actually tests its ability to predict the series continuously.
faa1947:

It is not a single price change that is predicted, but the robustness of the system.

Robustness is not predicted, it is constructed, and the result of the construction is evaluated by the forecast error: if the error variance is close to constant, the system will be stable.

The result of an individual trade is not important; the statistics over many trades are. They must at least partially correspond to the favourable statistics of the back-tests. So when money is given to the system to manage, the bet is exactly on this: that our favourable statistics will persist (given a systematic approach). And that is robustness.
 
alexeymosc:

It is possible that it is. But when we build a returns series of the form X[t] - X[t-1], it almost doesn't show it. I use the words returns and increments interchangeably; they all refer to the differenced price series.

This is easily verifiable, no need for philological arguments :). By the way, you can try to find out about it on this forum.

The skew of the probability in the direction of a sign change is minimal and insignificant. But if you calculate the conditional entropy between the dependent variable and the returns over two or more lags, then all the unevenness is accounted for in the resulting figure, so that the entropy is reduced.

Once again: the strength of the effect depends on the timeframe, but will an H1 series generated on random ticks be similar to an H1 series generated on ticks with real returns?

I tried to train a neural network (NS) on hourly data and took only the most informative lags (42 variables, at lags 1, 2, 23, 24, 25, ..., 479, 480, 481). Unfortunately, the result did not work out very well: the accuracy of predicting the quantile number is in the region of 30-40%. The neural network was able to carry the irregularities through to the output, but the dependencies are not sufficient for prediction. The whole problem is that the independent variables are mutually informative at lags 1, 2, 24, ..., and the total amount of information about the zero bar is really small. As an option, we should consider daily and higher timeframes.


I assumed from the beginning that the technique senses all dependencies, both those useful and those useless for forecasting. Regarding volatility, there is definite evidence here to support such an assumption. That is, those "informative" lags of yours may simply be clogged with this kind of information, useless for prediction.

I think my time in this thread is either over or has not come yet :). It's probably time for the fountain to take a rest :).
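For readers following this exchange, here is a minimal Python sketch of the conditional-entropy measurement being debated: returns are quantile-binned, the target is the sign of the zero-bar return, and the information contributed by a few of the lags named above is computed. The bin count and the random placeholder series are assumptions; on random data the figures should come out near zero.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(target, feature):
    """H(target | feature) = sum over f of p(f) * H(target | feature == f)."""
    h = 0.0
    for f in np.unique(feature):
        mask = feature == f
        h += mask.mean() * entropy(target[mask])
    return h

rng = np.random.default_rng(1)
returns = rng.standard_normal(5000)                  # placeholder H1 returns

# Quantile-bin the returns so every bin is equally populated.
edges = np.quantile(returns, np.linspace(0, 1, 6)[1:-1])
binned = np.digitize(returns, edges)
target = (returns > 0).astype(int)                   # sign of the zero-bar return

for lag in (1, 2, 24, 480):                          # lags mentioned in the thread
    t, f = target[lag:], binned[:-lag]
    info = entropy(t) - conditional_entropy(t, f)    # mutual information, bits
    print(f"lag {lag:3d}: I(sign; return[t-lag]) = {info:.4f} bits")
```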

 
Candid:


I assumed from the beginning that the technique senses all dependencies, both those useful and those useless for forecasting. Regarding volatility, there is definite evidence here to support such an assumption. That is, those "informative" lags of yours may simply be clogged with this kind of information, useless for prediction.

I have realised that quite well too. The problem has become more complicated, to the level of finding informative features for the sign of the price change. And if it turns out that on intraday timeframes the required informative lags are cyclic (I suspect they are), then their total information about the direction of the price movement on the zero bar will be very small...

I will check this. Next: try to investigate daily bars, and weekly ones may probably still be statistically reliable. But if the informative lags are cyclic there too, then, alas, I think the idea of using only lags will not work. Then one can try indicators.

I had planned it that way from the start, by the way. But until I test it, the fountain will not rest.

"So these "informative" lags of yours could just be clogged up with this kind of useless information for prediction."

And what do you actually mean by "useless information"? OK, volatility is not our friend. There are also noise components. I think you decided too early to write the method off. One has to know how to use any tool; I am still learning, so there is more water than calories here.

I have already suggested it: post any indicator and the target it is supposed to indicate. One condition: it should be defined by formulas in Excel, and there should be a possibility to play with its parameters. The target should also be specific, for example: the price will go down within at most 6 bars, or the price will exceed the current price by 10 points. I will run the data and give you a set of indicator parameters that are optimal in terms of information entropy.
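This is not the poster's actual tool, only a sketch of what such a parameter scan could look like in Python, reusing entropy() and conditional_entropy() from the sketch above; the SMA-distance indicator, the point size, and the concrete target definition are hypothetical stand-ins for whatever would be posted:

```python
import numpy as np
# Reuses entropy() and conditional_entropy() defined in the earlier sketch.

rng = np.random.default_rng(2)
price = 100.0 + np.cumsum(rng.standard_normal(5000))  # placeholder prices

# Target: the price exceeds the current price by 10 points within 6 bars.
POINT, HORIZON, MOVE = 0.01, 6, 10
future_max = np.array([price[i + 1:i + 1 + HORIZON].max()
                       for i in range(len(price) - HORIZON)])
target = (future_max - price[:-HORIZON] > MOVE * POINT).astype(int)

best = None
for period in range(5, 101, 5):                       # the parameter to play with
    sma = np.convolve(price, np.ones(period) / period, mode="valid")
    ind = price[period - 1:] - sma                    # distance of price from SMA
    feat = ind[:len(target) - period + 1]             # align indicator with target
    tgt = target[period - 1:]
    bins = np.digitize(feat, np.quantile(feat, [0.2, 0.4, 0.6, 0.8]))
    info = entropy(tgt) - conditional_entropy(tgt, bins)
    if best is None or info > best[1]:
        best = (period, info)

print("most informative SMA period (by mutual information):", best)
```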

 
Avals:
They must at least partially correspond to the favourable statistics of the back-tests.
Hope for "favourable" or have a numerical expression for this "favourable". Above I named one of the estimated values - the variance variation of the prediction error should not exceed 5%. But this is not the only requirement for a robust system. And the back test only gives hope that it will not change.
 
faa1947:
To hope for "favourable" or to have a numerical expression for this "favourable". Above I named one of the calculated values - the variance variation of the forecast error should not exceed 5%. But this is not the only requirement for a robust system. And the back test only gives hope that it will not change.


Yes, there are methods for assessing robustness.

Applied to the topic and to your article: the way to bring the entire price series to a stationary form with a positive expectation (MO) is to create a profitable and robust reversal system that is always in the market. The same applies to the way to distinguish a real series from a random walk. That is, in fact, the indicator that will pass your tests, and the criterion that will distinguish the real series from a random walk (SB), is the algorithm of such a system. Therefore, it is naive to believe that a randomly taken indicator, or the mutual information method, is such an algorithm for market quotes. That could happen only purely by chance.

 
faa1947:

Efficient market theory is not considered in econometrics; all of its assumptions are based on the market not being efficient. Econometrics does not include Markowitz, his apologists, and their efficient portfolios. Econometrics has been around for over 100 years, and it has never been disproved by Peters, Mandelbrot and others, as it is originally based on the assumption that the market is non-stationary.

It is econometrics that justifies a forecast one step ahead and shows the reasons for the fatal deterioration of the forecast several steps ahead.


The problem is that macro indicators can periodically change their weights, etc., plus there is only a short period of time available for a full analysis...

I would agree, of course, that the phase must be present in the analysis...
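The one-step-ahead claim quoted above matches the textbook behaviour of, for example, a stationary AR(1) model; this is a standard derivation, not taken from the post. For $x_t = \phi x_{t-1} + \varepsilon_t$ with $|\phi| < 1$ and $\operatorname{Var}(\varepsilon_t) = \sigma^2$, the $h$-step forecast and its error variance are

$$\hat{x}_{t+h} = \phi^h x_t, \qquad \operatorname{Var}\!\left(x_{t+h} - \hat{x}_{t+h}\right) = \sigma^2 \sum_{i=0}^{h-1} \phi^{2i} \;\xrightarrow{\,h \to \infty\,}\; \frac{\sigma^2}{1 - \phi^2}.$$

At $h = 1$ the error variance is just $\sigma^2$; as $h$ grows it approaches the unconditional variance of the series, i.e. a forecast several steps ahead is no better than the series mean.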
