How to make sense in a scientific study - MQL4 and MetaTrader 4

Avals 2011.11.29 05:22 #721

faa1947:

Yes, 40 is a little low. I ran the test and posted it above. Beyond 70 observations, increasing the sample further does not affect the result. Here is the result as a function of sample length, and it is noteworthy. The following model coefficients are estimated:

EURUSD = C(1)*HP1(-1) + C(2)*HP1(-2) + C(3)*HP1_D(-1) + C(4)*EQ1_HP2(-1) + C(5)*EQ1_HP2(-2) + C(6)*EQ1_HP2(-3) + C(7)*EQ1_HP2_D(-1) + C(8)*EQ1_HP2_D(-2) + C(9)*EQ1_HP2_D(-3) + C(10)*EQ1_HP2_D(-4)

There are 10 in total. All coefficients are random variables. The question is: at what sample length do they become approximately constant? I will show all the coefficients in one figure:

Here the sample is 80 observations. We see that after about half of the sample everything settles down, especially the standard error of the coefficient estimates. For the first coefficient I will give a larger plot:

This is an estimate of the coefficient itself - we see that its value is not a constant.

And now the standard error of the coefficient estimate:

Hence I conclude that the sample should be somewhere over 60 observations.

We need stable coefficients with a small error - this is a measure of sample length!

The convergence of the model coefficients or of their errors to some number does not determine the number of observations needed. Take an ordinary linear regression: the fewer data points it has, the faster its coefficients change as new ones arrive, and the more it has, the slower they change. But this is a property of the regression itself, not a measure of its accuracy in predicting the series. And it does not determine the size of the regression calculation window.

If you apply a criterion that gives numerical results, you need to know not just the number but how far you can trust it in the given case. Mathematical statistics uses the confidence interval (CI) for this purpose, for example.
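The expanding-sample experiment discussed in this post can be sketched roughly as follows. This is only an illustration under loud assumptions: a one-regressor model on synthetic data stands in for the ten-coefficient model quoted above, and the helper name `ols_slope_and_se` is hypothetical. It re-estimates the slope and its standard error on the first 10, 20, ... 80 observations.

```python
import math
import random

def ols_slope_and_se(x, y):
    """Return (slope, standard error of the slope) for y = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    return b, se

# Synthetic data (an assumption, not the EURUSD series from the thread).
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(80)]
y = [0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

# Re-estimate on an expanding sample: first 10 points, first 20, and so on.
path = [(n, *ols_slope_and_se(x[:n], y[:n])) for n in range(10, 81, 10)]
for n, b, se in path:
    print(f"n={n:3d}  slope={b:+.3f}  se={se:.3f}")
```

On stationary data like this the standard error shrinks roughly like 1/sqrt(n) whatever the model's forecasting quality, which is the point made above: a flattening curve by itself says little about how far the numbers can be trusted.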

СанСаныч Фоменко 2011.11.29 05:31 #722

Avals:

And by the way, about testing the residuals for normality: 116 observations are far too few for the results to be reliable. Of course, the test can be applied, and it will attribute the distribution to normal with some probability, but what is the confidence interval of that estimate? The 25% is again a very approximate value: at the 95% confidence level the range might be 0...50, for example, or 22...28. It depends on both the number of observations and the variance. It seems to me that with 116 observations the CI would be huge.

I don't analyse for normality. Why?

First of all we need to extract from the quote series what we can use: the correlation between observations. If we get a residual without correlations, then we have to find out whether it contains any other usable information - ARCH. If it does, then we model this information too (write an analytical formula for it). The ideal residual is one from which we cannot (or do not know how to) extract any more information for modelling.
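The two-step check described here (first correlation in the residuals, then ARCH-type dependence) might be sketched like this. It is a rough illustration, not the method used in the article: synthetic white noise stands in for real model residuals, the lag count of 10 is an arbitrary choice, and a Ljung-Box statistic on the squared residuals (a McLeod-Li-style check) is swapped in for a formal ARCH LM test.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorr(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# 5% critical value of chi-square with 10 degrees of freedom
# (not adjusted for any estimated model parameters).
CHI2_95_DF10 = 18.307

random.seed(7)
residuals = [random.gauss(0.0, 1.0) for _ in range(116)]  # placeholder residuals

q_levels = ljung_box_q(residuals, 10)                     # step 1: linear correlation
q_squares = ljung_box_q([e * e for e in residuals], 10)   # step 2: ARCH-type effects
print(f"Q(levels)  = {q_levels:.2f}  reject at 5%: {q_levels > CHI2_95_DF10}")
print(f"Q(squares) = {q_squares:.2f}  reject at 5%: {q_squares > CHI2_95_DF10}")
```

If neither statistic exceeds the critical value, this particular check finds nothing left to model, which is weaker than saying nothing is there.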

СанСаныч Фоменко 2011.11.29 05:34 #723

paukas:
Make up your mind somehow.....

It's a very good habit to read a sentence to the end, or better still, a paragraph to the end, or better still, everything the author writes.

СанСаныч Фоменко 2011.11.29 05:37 #724

Avals:

The convergence of the model coefficients or of their errors to some number does not determine the number of observations needed. Take an ordinary linear regression: the fewer data points it has, the faster its coefficients change as new ones arrive, and the more it has, the slower they change. But this is a property of the regression itself, not a measure of its accuracy in predicting the series. And it does not determine the size of the regression calculation window.

If you apply a criterion that gives numerical results, you need to know not just the number but how far you can trust it in the given case. Mathematical statistics uses the confidence interval (CI) for this purpose, for example.

The reasoning is not very clear: why increase the window if the coefficient is constant and the error of the coefficient is constant? We can see this in the figures.

Vladimir Paukas 2011.11.29 05:39 #725

faa1947:
It is a very useful habit to read a sentence to the end, or better still, a paragraph to the end, or better still, everything the author writes.

An even more useful habit is to write in a way that makes sense not only to the author, but also to the average collective farmer.

Vasiliy Sokolov 2011.11.29 05:40 #726

Reshetov:

At last the adept of the cult has revealed the main secret of the religious trick!

Elementary, Watson! Because they are non-stationary. Stationarity is when the variance and the expectation are constants and do not depend on the sample on which they are measured. I.e. on any other independent sample we should get approximately the same constants. If we do not, the stationarity hypothesis is disproved.

The stationarity hypothesis can also be tested another way, by increasing the sample size. Under stationarity both the variance and the expectation must again remain constant.

Oh, come on! The main problem is not the non-stationarity of the market but the model itself: it simply cannot work, as the strategy tester shows. The topic starter does not want to admit this, and at the same time he wonders why his model does not work. There is no need to make such a fuss with R^2 and so on when a simple test is a much more objective way to tell what is what.

If you want that kind of stationarity, please use equivolume charts. There volatility is constant, and the variance and expectation must be finite, but it will be of little use: the model did not work on ordinary charts and will not work on equivolume charts either.
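The subsample definition of stationarity quoted in this post can be illustrated with a small sketch. The assumptions are mine: synthetic white noise and its cumulative sum (a random walk) stand in for real series, the helper `block_stats` is hypothetical, and comparing block statistics by eye is an informal check, not a formal test.

```python
import random
import statistics

def block_stats(series, blocks):
    """Mean and sample variance of each of `blocks` equal, non-overlapping parts."""
    size = len(series) // blocks  # any remainder at the end is dropped
    parts = [series[i * size:(i + 1) * size] for i in range(blocks)]
    return [(statistics.mean(p), statistics.variance(p)) for p in parts]

random.seed(11)
noise = [random.gauss(0.0, 1.0) for _ in range(400)]

# Random walk: cumulative sum of the same noise.
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)

for name, series in (("white noise", noise), ("random walk", walk)):
    print(name)
    for mean, var in block_stats(series, 4):
        print(f"  mean={mean:+8.3f}  variance={var:8.3f}")
```

For the stationary series the block means and variances should come out roughly alike, while for the random walk they typically drift from block to block.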

Avals 2011.11.29 05:41 #727

faa1947:

I don't analyse for normality. Why?

First of all we need to extract from the quote series what we can use: the correlation between observations. If we get a residual without correlations, then we have to find out whether it contains any other usable information - ARCH. If it does, then we model this information too (write an analytical formula for it). The ideal residual is one from which we cannot (or do not know how to) extract any more information for modelling.

How can you say you don't analyse it? It is written in your article, section 1.3, "Estimating residuals from a regression equation".

You get specific numbers -

"The probability that the residual is normally distributed is 25.57%."

ACF, etc. etc.

But these numbers are of no value without an indication of how much they can be believed.

Can a profit factor from 40 trades be trusted as much as one from 400? The same goes for all other statistics and numerical criteria: their accuracy needs to be estimated. A confidence interval is one way to do that. 116 observations are not enough to trust a verdict that a distribution is or is not normal, whichever criterion is applied.
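To put rough numbers on how trust depends on the number of observations, here is a small sketch using two standard large-sample approximations: the half-width of a 95% interval for a mean is about 1.96*s/sqrt(n), and sample autocorrelations of white noise lie within about +/-1.96/sqrt(n). The function names and the unit standard deviation are illustrative assumptions, and neither formula is a confidence interval for a normality-test p-value.

```python
import math

def mean_ci_half_width(sample_std, n, z=1.96):
    """Approximate half-width of a 95% confidence interval for a mean."""
    return z * sample_std / math.sqrt(n)

def white_noise_acf_band(n, z=1.96):
    """Approximate 95% band for a sample autocorrelation of white noise."""
    return z / math.sqrt(n)

# Sample sizes mentioned in the thread: 40 and 400 trades, 116 and 300 observations.
for n in (40, 116, 300, 400):
    print(
        f"n={n:3d}  mean CI half-width (s=1): {mean_ci_half_width(1.0, n):.3f}  "
        f"ACF band: +/-{white_noise_acf_band(n):.3f}"
    )
```

Going from 40 to 400 observations narrows both intervals by a factor of sqrt(10), a little over three.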

Yury Reshetov 2011.11.29 05:43 #728

faa1947:

The deafness is astounding.

I've been harping on this for years - the quote series is non-stationary and cannot be predicted.

I've been saying all along in this thread - the quote series is non-stationary, but it can be predicted if the residual from the model is stationary. The residual is of interest because then you can add the (analytical) model to a stationary residual. This sum is equal to the quote series, not a pip is lost. I have written this a hundred times above. But no, the same thing again: adepts who, like the Chukchi in the joke, are writers but not readers.

Keep talking. The residual is non-stationary, because if a model fitted to a single sample is tested on any other independent sample, the residual is no longer a constant. It is possible to make fits to other samples, but after those fits we get a different model for each individual sample.

Once again, I repeat for the especially gifted: stationarity can be revealed only by coincidence of statistical data on different, independent samples. And there is no such coincidence.

The trick with econometric manipulations is that they found a method to fit a model to a sample in such a way that all the residuals in that sample are approximately equal. But since such a trick only occurs for a single sample and in other samples the model gives different results, the residuals are not stationary, but only fitted to a single sample. Econometric models cannot extrapolate the future because they do not yet have historical data (which will only appear in the future) that can be fitted to the model.

It is the same as a redrawing indicator - fitting its readings to specific data, changing them retroactively.
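The in-sample versus out-of-sample argument above can be tried on a toy case. This sketch rests on my own assumptions: a synthetic random walk stands in for the quote series, a straight-line trend stands in for the econometric model, and `fit_line` and `residuals` are hypothetical helpers. It fits on the first half and compares the residuals on both halves.

```python
import random
import statistics

def fit_line(t, y):
    """Least-squares (intercept, slope) of y on t."""
    mt, my = statistics.mean(t), statistics.mean(y)
    slope = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum(
        (a - mt) ** 2 for a in t
    )
    return my - slope * mt, slope

def residuals(coef, t, y):
    """Differences between y and the fitted line at the points t."""
    intercept, slope = coef
    return [b - (intercept + slope * a) for a, b in zip(t, y)]

random.seed(3)
# Synthetic random walk standing in for a price series (an assumption).
price, level = [], 0.0
for _ in range(200):
    level += random.gauss(0.0, 1.0)
    price.append(level)
t = list(range(200))

coef = fit_line(t[:100], price[:100])                 # fit on the first sample only
in_sample = residuals(coef, t[:100], price[:100])
out_sample = residuals(coef, t[100:], price[100:])    # same model, independent sample

for name, res in (("in-sample", in_sample), ("out-of-sample", out_sample)):
    print(f"{name:14s} mean={statistics.mean(res):+8.3f}  variance={statistics.variance(res):8.3f}")
```

The in-sample residual mean is zero by construction of least squares; whether the out-of-sample residuals look anything like them is exactly what the two sides of this thread disagree about.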

Avals 2011.11.29 05:44 #729

faa1947:

The reasoning is not very clear: why increase the window if the coefficient is constant and the error of the coefficient is constant? We can see this in the figures.

I'm not suggesting that the window for calculating the regression coefficients should be enlarged. That window is not determined by their convergence to a number. I am talking about the number of observations and how it affects the accuracy of your criteria and statistical estimates.

[Deleted] 2011.11.29 05:49 #730

The rule of thumb in statistics is that there should be at least 300 points, which is the lower limit.

Econometrics: one step ahead forecast - page 73