The Sultonov Regression Model (SRM) - claiming to be a mathematical model of the market. - page 46

 
VladislavVG:


It does not matter. The method used to solve the equation has no effect on the solution if the solution exists and is unique. But if there are many acceptable solutions and Yusuf has found a local minimum, then a genetic algorithm or a bee algorithm is better. For a brute-force solution, just use the strategy tester: its genetic algorithm will help you.

There is a danger of slipping into curve fitting: with so many parameters being varied, you need more examples for the optimisation.

And one more remark: you don't know the sample period over which the coefficients were searched. If this period coincided with the period of balance growth on the graph, then, alas, everything is not so rosy.

In my case there is a single solution, if one exists at all, and it always gives a zero sum of residuals (MO = 0, i.e. zero mean error). In other words, it is an analytically exact solution of the equation in question, so approximate solutions are out of the question, although you are right that they have a right to exist. But now that an exact solution method is available, it makes no sense to deal with approximate ones. A degenerate set of input data, for which the equation cannot be solved, has not yet been encountered on forex OHLC data: the algorithm has not stumbled on any, so the data are well-formed.
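For reference, the zero-sum property mentioned above can be checked numerically. The sketch below assumes the fit is an ordinary least-squares fit with an intercept term (the post does not give the actual algorithm) and uses synthetic random data rather than forex quotes. Under that assumption the residuals sum to zero to machine precision, while the sum of squared residuals stays positive. The name `fit_ols` is hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = a0 + X @ a; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])     # first column is the intercept a0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

rng = np.random.default_rng(0)                    # synthetic data, NOT forex quotes
X = rng.normal(size=(15, 4))                      # 15 observations, 4 regressors
y = rng.normal(size=15)

coef, resid = fit_ols(X, y)
residual_sum = resid.sum()                        # zero up to rounding error
squared_sum = float(resid @ resid)                # generally not zero
print(f"sum of residuals = {residual_sum:.2e}, sum of squares = {squared_sum:.4f}")
```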
 

Now let's try to express the average price of the next bar as a linear function of the form:

F(t+1) = a0 + a1*O(t) + a2*H(t) + a3*L(t) + a4*C(t)

For a selected history of 15 daily (D1) bars, the following coefficient values are obtained:

a4        a3         a2        a1        a0        RR, %
1.17387   -0.71318   0.04476   0.27979   0.27433   0.2894
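A fit of this form can be sketched as follows. This is only an illustration under stated assumptions: the post does not define F or RR, so here F(t+1) is assumed to be the mean of the next bar's O, H, L and C, and RR is assumed to be the mean absolute relative error in percent. The bars are synthetic, and `fit_next_bar` is a hypothetical name.

```python
import numpy as np

def fit_next_bar(ohlc):
    """Fit F(t+1) = a0 + a1*O(t) + a2*H(t) + a3*L(t) + a4*C(t) by least squares.

    ohlc: array of shape (n, 4) with columns O, H, L, C.
    ASSUMPTION: F is the mean of the next bar's O, H, L, C.
    Returns (coefficients a0..a4, mean absolute relative error in percent).
    """
    ohlc = np.asarray(ohlc, dtype=float)
    target = ohlc[1:].mean(axis=1)                          # F(t+1)
    A = np.column_stack([np.ones(len(ohlc) - 1), ohlc[:-1]])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    fitted = A @ coef
    rel_err_pct = 100 * np.mean(np.abs(fitted - target) / target)
    return coef, rel_err_pct

# Synthetic random-walk bars (16 bars give 15 observations); not real quotes.
rng = np.random.default_rng(1)
close = 1.29 + np.cumsum(rng.normal(0, 0.005, size=16))
open_ = np.concatenate([[1.29], close[:-1]])
high = np.maximum(open_, close) + rng.uniform(0, 0.003, size=16)
low = np.minimum(open_, close) - rng.uniform(0, 0.003, size=16)
bars = np.column_stack([open_, high, low, close])

coef, rel_err_pct = fit_next_bar(bars)
print("a0..a4:", np.round(coef, 5), "relative error, %:", round(rel_err_pct, 4))
```

The relative error is measured in-sample, on the same bars that were used for the fit.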

We have obtained very important information about the nature of the price change:
1. The closing price C of the current bar is the most significant for the formation of the next bar's price;
2. The second most significant is the low price L, which indicates that bears are considerably stronger than bulls;
3. The opening price O comes next in significance;
4. The insignificance of the high price H indicates that a significant upward price movement is unlikely.
Conclusion: SELL is preferable in the near term.




 
yosuf:

Now let's try to express the average price of the next bar as a linear function of the form:

F(t+1) = a0 + a1*O(t) + a2*H(t) + a3*L(t) + a4*C(t)

For a selected history of 15 daily (D1) bars, the following coefficient values are obtained:

a4        a3         a2        a1        a0        RR, %
1.17387   -0.71318   0.04476   0.27979   0.27433   0.2894

I don't understand at all how these regression coefficients were estimated.

Here is a least-squares (OLS) fit for EURUSD_H1 with sample length = 50 bars.

Dependent Variable: F
Method: Panel Least Squares
Date: 12/02/12   Time: 10:26
Sample: 1 50
Periods included: 23
Cross-sections included: 3
Total panel (unbalanced) observations: 47
F = C(1) + C(2)*OPEN(-1) + C(3)*HIGH(-1) + C(4)*LOW(-1) + C(5)*CLOSE(-1)

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     0.114716     0.046286      2.478392     0.0173
C(2)    -0.051038     0.156544     -0.326030     0.7460
C(3)    -0.343986     0.179835     -1.912786     0.0626
C(4)     0.139395     0.190961      0.729968     0.4695
C(5)     1.163942     0.207562      5.607671     0.0000

R-squared             0.947458    Mean dependent var       1.247037
Adjusted R-squared    0.942454    S.D. dependent var       0.002839
S.E. of regression    0.000681    Akaike info criterion   -11.64578
Sum squared resid     1.95E-05    Schwarz criterion       -11.44895
Log likelihood        278.6757    Hannan-Quinn criterion  -11.57171
F-statistic           189.3409    Durbin-Watson stat       1.935322
Prob(F-statistic)     0.000000

Here is the graph

Here is a least-squares (OLS) fit for EURUSD_H1 with sample length = 2000 bars.

Dependent Variable: F
Method: Panel Least Squares
Date: 12/02/12   Time: 10:29
Sample: 1 2000
Periods included: 23
Cross-sections included: 85
Total panel (unbalanced) observations: 1915
F = C(1) + C(2)*OPEN(-1) + C(3)*HIGH(-1) + C(4)*LOW(-1) + C(5)*CLOSE(-1)

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     0.000190     0.000729      0.260526     0.7945
C(2)     0.026179     0.029181      0.897122     0.3698
C(3)    -0.020055     0.028992     -0.691745     0.4892
C(4)    -0.106262     0.032127     -3.307569     0.0010
C(5)     1.099945     0.031672     34.72901      0.0000

R-squared             0.999362    Mean dependent var       1.259869
Adjusted R-squared    0.999361    S.D. dependent var       0.031014
S.E. of regression    0.000784    Akaike info criterion   -11.46178
Sum squared resid     0.001174    Schwarz criterion       -11.44727
Log likelihood        10979.66    Hannan-Quinn criterion  -11.45644
F-statistic           748391.1    Durbin-Watson stat       2.058272
Prob(F-statistic)     0.000000

For any sample length it is mandatory to calculate the standard errors of the estimated coefficients. In the last estimation, the coefficient C(1) = 0.000190 has a standard error of 0.000729. Not only is the coefficient value ridiculously small, but its standard error is almost 4 times the value itself!

Sorry, Yusuf, but this is just reinventing the wheel. Any textbook on regression analysis starts with an equation like yours. But unlike you, students know how to evaluate the result of the fit: any of them will tell you that this regression cannot be used.
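The standard errors and t-statistics in such a report follow from the textbook OLS formulas. A minimal sketch on synthetic data is shown below; it is not the statistical package's own implementation, and `ols_with_errors` is a hypothetical name.

```python
import numpy as np

def ols_with_errors(X, y):
    """OLS with intercept; returns (coefficients, standard errors, t-statistics)."""
    A = np.column_stack([np.ones(len(y)), X])
    n, k = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - k)              # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Synthetic example: the second regressor has no true effect on y.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(size=200)

coef, se, t = ols_with_errors(X, y)
for name, c, s, tv in zip(["const", "x1", "x2"], coef, se, t):
    print(f"{name:5s} coef={c: .4f}  se={s:.4f}  t={tv: .2f}")
```

A coefficient whose standard error is comparable to or larger than its own value has a small t-statistic, which is the situation described for C(1) above.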

 
faa1947:


I don't understand at all how these regression coefficients were estimated.

Here is a least-squares (OLS) fit for EURUSD_H1 with sample length = 50 bars.


For any sample length it is mandatory to calculate the standard errors of the estimated coefficients. In the last estimation, the coefficient C(1) = 0.000190 has a standard error of 0.000729. Not only is the coefficient value ridiculously small, but its standard error is almost 4 times the value itself!

Please state the form of the regression equation you are investigating.
 
yosuf:
Please state the form of the regression equation you are investigating.


It is given in the post. Here is a copy:

F = C(1) + C(2)*OPEN(-1) + C(3)*HIGH(-1) + C(4)*LOW(-1) + C(5)*CLOSE(-1)

For 50 bars, here are the coefficients:

F = 0.114716047564 - 0.0510381399594*OPEN(-1) - 0.343985953799*HIGH(-1) + 0.139395237588*LOW(-1) + 1.16394204527*CLOSE(-1)

But the whole point is the estimation of these coefficients. You don't want to understand that the estimated coefficient values cannot be trusted, not just occasionally but almost always. This is the heart of regression analysis.

You have to answer the question: on what basis do we believe that the coefficients we calculated have exactly the values we see?
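One common way to approach this question is to look at an interval of roughly two standard errors around each estimate. The sketch below uses the coefficient and standard-error pairs from the 50-bar report earlier in the thread. The two-standard-error width is a rule-of-thumb approximation, not an exact confidence interval, and the helper names are hypothetical.

```python
# (coefficient, standard error) pairs copied from the 50-bar report in the thread
reported = {
    "C(1)": (0.114716, 0.046286),
    "C(2)": (-0.051038, 0.156544),
    "C(3)": (-0.343986, 0.179835),
    "C(4)": (0.139395, 0.190961),
    "C(5)": (1.163942, 0.207562),
}

def rough_interval(coef, se, width=2.0):
    """Approximate interval of +/- `width` standard errors (rule of thumb)."""
    return coef - width * se, coef + width * se

def excludes_zero(interval):
    low, high = interval
    return low > 0 or high < 0

# Coefficients whose rough interval does not contain zero
distinguishable = [
    name for name, (c, s) in reported.items()
    if excludes_zero(rough_interval(c, s))
]

for name, (c, s) in reported.items():
    low, high = rough_interval(c, s)
    print(f"{name}: [{low: .4f}, {high: .4f}]")
print("distinguishable from zero:", distinguishable)
```

With these numbers only C(1) and C(5) have intervals that stay away from zero, which agrees with the Prob. column of the report.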

 
faa1947:


It is given in the post. Here is a copy:

F = C(1) + C(2)*OPEN(-1) + C(3)*HIGH(-1) + C(4)*LOW(-1) + C(5)*CLOSE(-1)

For 50 bars, here are the coefficients:

F = 0.114716047564 - 0.0510381399594*OPEN(-1) - 0.343985953799*HIGH(-1) + 0.139395237588*LOW(-1) + 1.16394204527*CLOSE(-1)

But the whole point is the estimation of these coefficients. You don't want to understand that the estimated coefficient values cannot be trusted, not just occasionally but almost always. This is the heart of regression analysis.

You have to answer the question: on what basis do we believe that the coefficients we calculated have exactly the values we see?

By definition, OLS gives the best estimate of the coefficients of the equation in question, and if you don't like them for whatever reason, look for another way to estimate them or change the form of the equation. This is the standard approach when investigating phenomena and processes. If the regression equation with the coefficients found by OLS gives a relative error of less than 1% (0.29% in this case), what more could I want from these coefficients? You are fixated on the reliability of the coefficients, but no more reliable way of determining them than OLS has yet been devised. Nevertheless, we must be aware that any reasoning and conclusions we make hold only within the sample in question, and there is no guarantee that they will remain true outside it, including in the future. Still, we are forced to assume, with some degree of probability, that they apply in the near future. Nothing and no one can predict the future with absolute certainty.
 
yosuf:
By definition, OLS gives the best estimate of the coefficients of the equation in question, and if you don't like them for whatever reason, look for another way to estimate them or change the form of the equation. This is the standard approach when investigating phenomena and processes. If the regression equation with the coefficients found by OLS gives a relative error of less than 1% (0.29% in this case), what more could I want from these coefficients? You are fixated on the reliability of the coefficients, but no more reliable way of determining them than OLS has yet been devised. Nevertheless, we must be aware that any reasoning and conclusions we make hold only within the sample in question, and there is no guarantee that they will remain true outside it, including in the future. Still, we are forced to assume, with some degree of probability, that they apply in the near future. Nothing and no one can predict the future with absolute certainty.


Somehow you did not look closely at the regression output. In the last report, different coefficients are estimated with different accuracy. The best has a standard error of about 3% of the coefficient value, but there are also standard errors that are multiples of the coefficient value.

I'm not fixated on anything. I just do standard regression estimation. In any case, I don't quote coefficient values without their error estimates.

About OLS: I have to disappoint you. OLS is not the only method; moreover, it is a method with a large number of restrictions. There are other methods that do not have these limitations.

 
yosuf:

Now let's try to express the average price of the next bar as a linear function of the form:

F(t+1) = a0 + a1*O(t) + a2*H(t) + a3*L(t) + a4*C(t)

For a selected history of 15 daily (D1) bars, the following coefficient values are obtained:

These 15 daily bars: which dates were they taken from?
 
Demi:
These 15 daily bars: which dates were they taken from?

The data used are D1 bars from 16.09.12 to 05.10.12.
 
yosuf:
The data used are D1 bars from 16.09.12 to 05.10.12.




)))) I thought so:

1. Your data are not homogeneous. The model mixes bars describing 24 hours of price dynamics with bars describing 4 hours of price dynamics. The Sunday bars should be removed. Everyone makes this mistake.

2. You need an appropriate number of observations. There is no precise formula, but it is somewhere between 5 and 10 observations per variable. You have four variables and fifteen observations, so the model is inadequate. And don't do what one great expert on this forum did: take a model with four variables and 5,000 observations! ))))

3. Once you have built the model, determine the partial correlation coefficient for each variable. You will find that only C is statistically significant. Build a model that includes only C, and the coefficient on C will be positive.

From this you draw the conclusion common to ALL autoregressive models: IF THE PRICE IS RISING, there is a higher probability that it will continue to rise, and vice versa. Then you throw the model out.
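The first two checks can be sketched in a few lines. The dates below are simply the calendar range mentioned in the thread, read as dd.mm.yy (an assumption), with Saturdays dropped on the further assumption that no bars form on Saturdays. Whether a real D1 history contains Sunday bars depends on the broker's server time. The function names are hypothetical.

```python
from datetime import date, timedelta

def drop_sunday_bars(bar_dates):
    """Keep only bars that are not dated on a Sunday (Monday=0 ... Sunday=6)."""
    return [d for d in bar_dates if d.weekday() != 6]

def observations_per_variable(n_obs, n_vars):
    return n_obs / n_vars

start, end = date(2012, 9, 16), date(2012, 10, 5)
calendar = [start + timedelta(days=i) for i in range((end - start).days + 1)]
bar_dates = [d for d in calendar if d.weekday() != 5]   # ASSUMPTION: no Saturday bars
weekday_bars = drop_sunday_bars(bar_dates)

# Rule of thumb from the post: 5 to 10 observations per variable.
ratio = observations_per_variable(15, 4)                # 15 bars, 4 variables (O, H, L, C)
enough = 5 <= ratio <= 10

print(f"Sunday bars removed: {len(bar_dates) - len(weekday_bars)}")
print(f"observations per variable: {ratio}, within 5..10: {enough}")
```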
