Machine learning in trading: theory, models, practice and algo-trading - page 198

 
Dr.Trader:

I found another error in R. R does not divide by 0 correctly.

Here's the script:

//+------------------------------------------------------------------+
//|                                                         test.mq5 |
//|                        Copyright 2016, MetaQuotes Software Corp. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2016, MetaQuotes Software Corp."
#property link      "https://www.mql5.com"
#property version   "1.00"

#property script_show_inputs

input double div = 0.0;

//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//---
   Print(1.0/div);
  }
//+------------------------------------------------------------------+

The correct answer, in MQL:
zero divide in 'test.mq5' (20,13)
and the script stops.

The incorrect answer, in R:
> 1/0
Inf
and the script continues.

What I mean is the same as Alexey: the behavior of programs under undefined conditions may vary, and this is not an error. The result depends on how the architecture was designed.


You are wrong about the "incorrect answer".

The answer in R is perfectly correct and very convenient. The point is that R takes the concept of a "variable value" to its logical conclusion and has three special values for a variable: NA (there is a value, but it is not known), NaN (not a number, i.e. there is no numeric value) and Inf (infinity). Inf is a legitimate value, and interrupting the program because of it would be completely wrong. Given the limited real precision of a computer, Inf may well stand for a finite value. It is quite natural to continue working, and a carefully written program should check for such results if they can occur.
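Python happens to side with MQL5 here: `1.0 / 0.0` raises `ZeroDivisionError`. The sketch below spells out the IEEE 754 rules that R follows instead and then checks the result rather than relying on the program stopping. `ieee_divide` and `classify` are hypothetical helpers, and using `None` for R's `NA` is an assumption, since Python has no direct equivalent.

```python
import math


def ieee_divide(a: float, b: float) -> float:
    """Divide following IEEE 754 rules (the behaviour R shows for 1/0)
    instead of raising ZeroDivisionError like Python's `/` does."""
    if b == 0.0:  # also true for -0.0
        if a == 0.0 or math.isnan(a):
            return math.nan  # 0/0 has no numeric value
        # The sign of the infinity depends on the signs of both operands.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b


def classify(x):
    """Name a value after R's special values; None is an assumed stand-in for NA."""
    if x is None:
        return "NA"  # a value that should exist but is unknown
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Inf" if x > 0 else "-Inf"
    return "finite"


# A careful caller inspects the result and decides what to do with it.
for value in (ieee_divide(1.0, 0.0), ieee_divide(-1.0, 0.0),
              ieee_divide(0.0, 0.0), None, ieee_divide(1.0, 4.0)):
    print(classify(value))
```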

For example, the MQL documentation gives an example with the arcsine and states that arcsin(2) = infinity. This is not accurate. More precisely, arcsin(2) = NaN, i.e. there is no numeric value; Inf is what you get from, say, 1/0 (arcsin(1) is simply π/2); and missing quotes during trading are NA: they should exist (or at least could exist, e.g. at the weekend), but they are not there.
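The NaN/NA distinction can be sketched the same way. The hypothetical helpers below mimic R's `asin`, which returns NaN (with a "NaNs produced" warning) outside [-1, 1], and `mean(x, na.rm = TRUE)`, with `None` assumed to mark a missing quote. Neither helper is part of R or MQL5.

```python
import math


def r_style_asin(x: float) -> float:
    """Arcsine that returns NaN outside [-1, 1], as R's asin() does,
    instead of raising ValueError like Python's math.asin."""
    if math.isnan(x) or not -1.0 <= x <= 1.0:
        return math.nan
    return math.asin(x)


def mean_ignoring_na(quotes):
    """Mean of a quote series where None marks a missing quote (NA),
    in the spirit of R's mean(x, na.rm = TRUE)."""
    present = [q for q in quotes if q is not None]
    return sum(present) / len(present) if present else None


print(r_style_asin(2.0))  # nan: no numeric value exists
print(r_style_asin(1.0))  # pi/2, an ordinary finite value
print(mean_ignoring_na([1.1000, None, 1.1002]))  # the missing quote is skipped
```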

 
Alexey Burnakov:

That Wolfram says 0 is not the final truth. I mean that the word "error" in the wording is redundant...

It's not about Wolfram. An integral of nonzero positive values cannot be zero.

During the checks, errors in the algorithms were also found.

For example, for the noncentral t-distribution the quantile function does not invert the CDF:


> n <- 10
> k <- seq(0,1,by=1/n)
> nt_pdf<-dt(k, 10,8, log = FALSE)
> nt_cdf<-pt(k, 10,8, log = FALSE)
> nt_quantile<-qt(nt_cdf, 10,8, log = FALSE)
> nt_pdf
[1] 4.927733e-15 1.130226e-14 2.641608e-14 6.281015e-14 1.516342e-13 3.708688e-13 9.166299e-13
[8] 2.283319e-12 5.716198e-12 1.433893e-11 3.593699e-11
> nt_cdf
[1] 6.220961e-16 1.388760e-15 3.166372e-15 7.362630e-15 1.742915e-14 4.191776e-14 1.021850e-13
[8] 2.518433e-13 6.257956e-13 1.563360e-12 3.914610e-12
> k
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> nt_quantile
[1] -Inf -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154
[7] -1.340781e+154 7.000000e-01 8.000000e-01 9.000000e-01 1.000000e+154


Here the quantiles 0-0.6 are incorrectly calculated.
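The property being tested with `qt(pt(k, ...), ...)` is a round trip: the quantile function applied to the CDF should give back k. Python's standard library has no noncentral t, so the sketch below uses `statistics.NormalDist` purely as a stand-in to show the check itself; `max_roundtrip_error` and the 1e-9 threshold are our assumptions.

```python
from statistics import NormalDist


def max_roundtrip_error(cdf, quantile, points):
    """Largest |quantile(cdf(x)) - x| over a grid of points."""
    return max(abs(quantile(cdf(x)) - x) for x in points)


# Stand-in distribution, used only to demonstrate the check.
dist = NormalDist(mu=0.0, sigma=1.0)
grid = [i / 10 for i in range(-30, 31)]  # -3.0, -2.9, ..., 3.0
err = max_roundtrip_error(dist.cdf, dist.inv_cdf, grid)
print(f"max round-trip error: {err:.3e}")
print("passed" if err < 1e-9 else "FAILED")  # assumed tolerance
```

Applied to the R values above, the same check would fail at least for k from 0 to 0.6, where qt returns -Inf or -1.340781e+154 instead of k.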

An example of a similar calculation in MQL5:

2016.11.10 17:53:32.645 TestStatNCT (EURUSD,H1) Unit tests for Package Stat
2016.11.10 17:53:32.645 TestStatNCT (EURUSD,H1)
2016.11.10 17:53:32.645 TestStatNCT (EURUSD,H1) Noncentral T distribution test started
2016.11.10 17:53:32.645 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Test 1: Calculations for single values
2016.11.10 17:53:32.645 TestStatNCT (EURUSD,H1) 1 0, PDF=4.92773299108629100851e-15, CDF=6.23274905782904190070e-16, Q=1.51119154775858715131e-18,
2016.11.10 17:53:32.646 TestStatNCT (EURUSD,H1) 1 1, PDF=1.130226163094897516453e-14, CDF=1.3899389577957373769266e-15, Q=9.9999999999999989369615e-02,
2016.11.10 17:53:32.647 TestStatNCT (EURUSD,H1) 1 2, PDF=2.64161256619123119969e-14, CDF=3.16755115913693189175e-15, Q=2.00000000000000004840572e-01,
2016.11.10 17:53:32.648 TestStatNCT (EURUSD,H1) 1 3, PDF=6.28106243570810575054e-14, CDF=7.3638179090305861902898e-15, Q=2.999999999999999998601119e-01,
2016.11.10 17:53:32.648 TestStatNCT (EURUSD,H1) 1 4, PDF=1.5163698393962646374250e-13, CDF=1.743039359090744906969191e-14, Q=4.00000000000005184742e-01,
2016.11.10 17:53:32.648 TestStatNCT (EURUSD,H1) 1 5, PDF=3.70864175555731463525e-13, CDF=4.19192812277470617962e-14, Q=5.00000000000000000000e-01,
2016.11.10 17:53:32.649 TestStatNCT (EURUSD,H1) 1 6, PDF=9.16615229573755101565e-13, CDF=1.02187737277044947465e-13, Q=5.999999999999999998867573e-01,
2016.11.10 17:53:32.651 TestStatNCT (EURUSD,H1) 1 7, PDF=2.28327725393015114329e-12, CDF=2.51850847368662607692e-13, Q=6.999999999999999999289457e-01,
2016.11.10 17:53:32.652 TestStatNCT (EURUSD,H1) 1 8, PDF=5.7160930393970751591223e-12, CDF=6.25821417361289428232e-13, Q=7.9999999999999999299967253e-01,
2016.11.10 17:53:32.653 TestStatNCT (EURUSD,H1) 1 9, PDF=1.43395037240077606742e-11, CDF=1.56338059375202603523e-12, Q=8.99999999999999999998578915e-01,
2016.11.10 17:53:32.655 TestStatNCT (EURUSD,H1) 1 10, PDF=3.59391445912256345050e-11, CDF=3.91468724033560601170e-12, Q=9.999999999999999998889777e-01,
2016.11.10 17:53:32.655 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Test 2: Calculations on arrays
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 0, PDF=4.92773299108629100851e-15, CDF=6.23274905782904190070e-16, Q=1.51119154775858715131e-18,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 1, PDF=1.130226163094897516453e-14, CDF=1.3899389577957373769266e-15, Q=9.9999999999999989369615e-02,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 2, PDF=2.64161256619123119969e-14, CDF=3.16755115913693189175e-15, Q=2.00000000000000004840572e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 3, PDF=6.28106243570810575054e-14, CDF=7.3638179090305861902898e-15, Q=2.999999999999999998601119e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 4, PDF=1.5163698393962646374250e-13, CDF=1.743039359090744906969191e-14, Q=4.00000000000005184742e-01
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 5, PDF=3.70864175555731463525e-13, CDF=4.19192812277470617962e-14, Q=5.00000000000000000000e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 6, PDF=9.16615229573755101565e-13, CDF=1.02187737277044947465e-13, Q=5.999999999999999998867573e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 7, PDF=2.28327725393015114329e-12, CDF=2.51850847368662607692e-13, Q=6.999999999999999999289457e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 8, PDF=5.7160930393970751591223e-12, CDF=6.25821417361289428232e-13, Q=7.9999999999999999299967253e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 9, PDF=1.43395037240077606742e-11, CDF=1.56338059375202603523e-12, Q=8.99999999999999999998578915e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) 2 10, PDF=3.5939144591225634505050e-11, CDF=3.91468724033560601170e-12, Q=9.999999999999999998889777e-01,
2016.11.10 17:53:32.665 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Test 3: R-like overloaded functions
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 0, PDF=-3.29438973521509552711e+01, CDF=-3.50115439911437320575e+01, Q=9.32008491370933962264e-15,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 1, PDF=-3.21137735448868468468779e+01, CDF=-3.42095165639872504926e+01, Q=9.999999999999999992561506e-02,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 2, PDF=-3.12648017507063613607e+01, CDF=-3.33858176105613679852e+01, Q=2.00000000000004563017e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 3, PDF=-3.03986521580849320401e+01, CDF=-3.25421978598387227066e+01, Q=2.999999999999999998490097e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 4, PDF=-2.9517286993939179705057e+01, CDF=-3.16805609544090991392e+01, Q=4.00000000000005240253e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 5, PDF=-2.86229405029589081266e+01, CDF=-3.08030305013295588878e+01, Q=5.00000000000000000000e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 6, PDF=-2.77180886076848480570e+01, CDF=-2.99119647118446110312e+01, Q=5.999999999999999998756550e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 7, PDF=-2.680540931294946693489897e+01, CDF=-2.90099393581479034765e+01, Q=6.99999999999999999999400480e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 8, PDF=-2.58877355789277032727e+01, CDF=-2.80997113402901490531e+01, Q=7.999999999999999999489297e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 9, PDF=-2.49680028891657173062e+01, CDF=-2.71841705920503962091e+01, Q=8.999999999999999998689937e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) 3 10, PDF=-2.40491940358795979979193e+01, CDF=-2.62662856772029869035e+01, Q=9.9999999999999998889777e-01,
2016.11.10 17:53:32.676 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Test 4: Generators of random values
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test passed
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: PDF max error : 1.48634016318122160328e-25
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: CDF max error : 2.75966439373922392108e-18
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Quantile max error : 5.16253706450697791297e-15
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: PDF correct digits : 24
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: CDF correct digits : 17
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) Noncentral T distribution test: Quantile correct digits : 14
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1)
2016.11.10 17:53:32.683 TestStatNCT (EURUSD,H1) 1 of 1 passed

Output of the TestStatNCT.mq5 script (the test from TestStat.mq5, with printing of the calculated values added to the TestNoncentralTDistribution function).

As you can see, the quantiles invert correctly and the tests pass.

As for the tolerated errors, they are as follows:

//--- precision
double calc_precision_pdf=10E-15;
double calc_precision_cdf=10E-15;
double calc_precision_quantile=10E-14;
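Reading the log, the "correct digits" lines appear to be floor(-log10(max error)), and the run passes because each max error stays below its tolerance. The sketch below checks that reading against the numbers in the log; both the formula and the pass rule are our inference, not the documented logic of the MQL5 test script. Note that `10E-15` means 1e-14 and `10E-14` means 1e-13.

```python
import math


def correct_digits(max_error: float) -> int:
    """Correct decimal digits implied by a maximum error,
    read here as floor(-log10(max_error))."""
    if max_error <= 0.0:
        raise ValueError("max_error must be positive")
    return math.floor(-math.log10(max_error))


# Tolerances as written in the test (10E-15 == 1e-14, 10E-14 == 1e-13)
tolerances = {"pdf": 10e-15, "cdf": 10e-15, "quantile": 10e-14}
# Maximum errors reported in the TestStatNCT log
max_errors = {
    "pdf": 1.48634016318122160328e-25,
    "cdf": 2.75966439373922392108e-18,
    "quantile": 5.16253706450697791297e-15,
}

for name, err in max_errors.items():
    status = "passed" if err <= tolerances[name] else "FAILED"
    print(f"{name}: {correct_digits(err)} correct digits, {status}")
```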

The article by Denise Benton and K. Krishnamoorthy, "Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t and the distribution of the square of the sample multiple correlation coefficient", showed that the AS 243 algorithm produces errors (Table 3 in the article).


Thus, there are errors in R, and their probable cause is the AS 243 algorithm used to calculate the CDF.

Files:
TestStatNCT.mq5  16 kb
 
Quantum:

1) It's not about Wolfram. An integral of nonzero positive values cannot be zero.

2) During the checks, errors in the algorithms were also found.

3) For example, for the noncentral t-distribution the quantile function does not invert the CDF:


1) If you are also the one writing about the error in the gamma distribution, then my complaint is addressed to you. It is incorrect to say that R gets it wrong. I see an attempt at labeling in this post. Apparently you have not consulted the scientists who use these packages and see no errors there. The Python function also returns a density of 1 at zero.

Again: the density value is not defined at zero. Wolfram thinks it is 0, and you simply agree with that without drawing any conclusions of your own.

You say that Wolfram has nothing to do with it, but you yourself cannot answer my question about raising zero to the power of zero. Instead, you show me Wolfram...

You can see for yourself that when approaching from the right the value tends to one, which means that the integral in that limit is not zero either. So there is no zero there, and no density either, since it makes no sense to integrate an indeterminate quantity.

2) This can be checked further, and it may turn out, for example, that there is an error in the algorithm.

3) For example, this can be checked.
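The zero-to-the-power-zero question in item 1 can at least be made concrete. The sketch below, a hypothetical `gamma_pdf` helper, shows which value floating-point arithmetic assigns at x = 0 for shape 1 and how the density behaves as x approaches 0 from the right. It illustrates the convention in use, not which side of the argument is right.

```python
import math


def gamma_pdf(x: float, shape: float, rate: float = 1.0) -> float:
    """Gamma density rate^shape * x^(shape - 1) * exp(-rate * x) / Gamma(shape)."""
    if x < 0.0:
        return 0.0
    if x == 0.0 and shape < 1.0:
        return math.inf  # the density diverges at 0 when shape < 1
    # For x == 0 and shape == 1 this evaluates 0.0 ** 0.0, which Python
    # (like C's pow and the IEEE 754 pow function) defines as 1.0.
    return rate ** shape * x ** (shape - 1.0) * math.exp(-rate * x) / math.gamma(shape)


print(0.0 ** 0.0)           # 1.0 under the floating-point convention
print(gamma_pdf(0.0, 1.0))  # 1.0, the same value R's dgamma(0, 1, 1) gives
for x in (1e-1, 1e-3, 1e-6):  # approaching 0 from the right, the density tends to 1
    print(x, gamma_pdf(x, 1.0))
```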

 
Quantum:


For example, for the noncentral t-distribution the quantile function does not invert the CDF:


Quantiles 0-0.6 are incorrectly computed here.

> n <- 10
> k <- seq(0,1,by=1/n)
> nt_pdf<-dt(k, 10,8, log = FALSE)
> nt_cdf<-pt(k, 10,8, log = FALSE)
> quantile<-qt(nt_cdf, 10,8, log = FALSE)
> nt_pdf
[1] 4.927733e-15 1.130226e-14 2.641608e-14 6.281015e-14 1.516342e-13 3.708688e-13 9.166299e-13
[8] 2.283319e-12 5.716198e-12 1.433893e-11 3.593699e-11
> nt_cdf
[1] 6.220961e-16 1.388760e-15 3.166372e-15 7.362630e-15 1.742915e-14 4.191776e-14 1.021850e-13
[8] 2.518433e-13 6.257956e-13 1.563360e-12 3.914610e-12
> k
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> nt_quantile
[1] -Inf -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154
[7] -1.340781e+154 7.000000e-01 8.000000e-01 9.000000e-01 1.000000e+154
Excuse me, where in your code is the nt_quantile variable initialized, i.e., where does it come from?
 

1) How can we explain that at the point where cdf = 0 the pdf is non-zero?

> n <- 5
> k <- seq(0,1,by=1/n)
> gamma_pdf<-dgamma(k, 1,1, log = FALSE)
> gamma_cdf<-pgamma(k, 1,1, log = FALSE)
> k
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> gamma_pdf
[1] 1.0000000 0.8187308 0.6703200 0.5488116 0.4493290 0.3678794
> gamma_cdf
[1] 0.0000000 0.1812692 0.3296800 0.4511884 0.5506710 0.6321206

2) How can we explain that the quantiles did not match the original ones?

> n <- 10
> k <- seq(0,1,by=1/n)
> nt_pdf<-dt(k, 10,8, log = FALSE)
> nt_cdf<-pt(k, 10,8, log = FALSE)
> nt_quantile<-qt(nt_cdf, 10,8, log = FALSE)
> nt_pdf
[1] 4.927733e-15 1.130226e-14 2.641608e-14 6.281015e-14 1.516342e-13 3.708688e-13 9.166299e-13
[8] 2.283319e-12 5.716198e-12 1.433893e-11 3.593699e-11
> nt_cdf
[1] 6.220961e-16 1.388760e-15 3.166372e-15 7.362630e-15 1.742915e-14 4.191776e-14 1.021850e-13
[8] 2.518433e-13 6.257956e-13 1.563360e-12 3.914610e-12
> k
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> nt_quantile
[1] -Inf -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154
[7] -1.340781e+154 7.000000e-01 8.000000e-01 9.000000e-01 1.000000e+154
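For item 1, R's gamma numbers can be checked by hand: with shape 1 and rate 1 the gamma distribution is the standard exponential, so pdf(k) = exp(-k) and cdf(k) = 1 - exp(-k). The sketch below (hypothetical helper names, Python rather than R) reproduces that table and compares the closed-form CDF with a midpoint-rule integral of the density. The CDF at 0 is an integral over the zero-width interval [0, 0], so it is 0 whatever finite value the density takes at that single point.

```python
import math


def exp_pdf(x: float) -> float:
    """Gamma(shape=1, rate=1) density, i.e. the standard exponential."""
    return math.exp(-x) if x >= 0.0 else 0.0


def exp_cdf(x: float) -> float:
    """Closed-form CDF 1 - exp(-x), written with expm1 for accuracy near 0."""
    return -math.expm1(-x) if x >= 0.0 else 0.0


def integrate_pdf(upper: float, steps: int = 1000) -> float:
    """Midpoint-rule integral of the density over [0, upper]."""
    if upper <= 0.0:
        return 0.0  # an interval of zero width carries zero probability
    h = upper / steps
    return h * sum(exp_pdf((i + 0.5) * h) for i in range(steps))


for k in [i / 5 for i in range(6)]:  # same grid as seq(0, 1, by = 1/5)
    print(f"{k:.1f}  pdf={exp_pdf(k):.7f}  cdf={exp_cdf(k):.7f}  "
          f"integral={integrate_pdf(k):.7f}")
```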
 
Alexey Burnakov:
Sorry, where in your code is the nt_quantile variable initialized, i.e., where does it come from?

Sorry, there was a typo in the script; I have corrected it.

The script is as follows:

> n <- 10
> k <- seq(0,1,by=1/n)
> nt_pdf<-dt(k, 10,8, log = FALSE)
> nt_cdf<-pt(k, 10,8, log = FALSE)
> nt_quantile<-qt(nt_cdf, 10,8, log = FALSE)
> nt_pdf
[1] 4.927733e-15 1.130226e-14 2.641608e-14 6.281015e-14 1.516342e-13 3.708688e-13 9.166299e-13
[8] 2.283319e-12 5.716198e-12 1.433893e-11 3.593699e-11
> nt_cdf
[1] 6.220961e-16 1.388760e-15 3.166372e-15 7.362630e-15 1.742915e-14 4.191776e-14 1.021850e-13
[8] 2.518433e-13 6.257956e-13 1.563360e-12 3.914610e-12
> k
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> nt_quantile
[1] -Inf -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154
[7] -1.340781e+154 7.000000e-01 8.000000e-01 9.000000e-01 1.000000e+154
 
Quantum:

1) How can we explain that at the point where cdf = 0 the pdf is non-zero?

2) How can we explain that the quantiles are not the same as the original ones?

I'll think about it and answer you. Do you always answer a question with a question? Where are your own statistical thoughts?
 
Alexey Burnakov:

The point is that @Quantum is focused purely on implementing and fully verifying an MQL5 analog of R's mathematical libraries.

These are not a theorist's musings. He digs deep when writing unit tests, which give confidence in the correctness of the library.


Do not assume a priori that everything in R is correct. On the contrary, I would say that even though the functions there are implemented in C++, everything is quite primitive. As for speed, you can see that the MQL5 library, compiled from source with our compiler, is on average 3 times faster.

We took the trouble to double-check everything and found obvious errors. These errors have been confirmed:

R uses the AS 243 algorithm proposed by Lenth [Lenth, R.V., "Cumulative distribution function of the noncentral t distribution", Applied Statistics, vol. 38 (1989), 185-189]. The advantage of this method is the fast recursive calculation of the terms of an infinite series of incomplete beta functions. But the article [D. Benton, K. Krishnamoorthy, "Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t and the distribution of the square of the sample multiple correlation coefficient", Computational Statistics & Data Analysis, 43, (2003), 249-267] showed that this algorithm leads to errors (Table 3 in the article), especially for large values of the noncentrality parameter delta, because the accuracy is estimated incorrectly when summing the terms of the series. The authors of the paper proposed a corrected recursive algorithm for calculating the probabilities of the noncentral t-distribution.

In the MQL5 statistical library we use the correct algorithm for calculating probabilities from the article [D. Benton, K. Krishnamoorthy, "Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t and the distribution of the square of the sample multiple correlation coefficient", Computational Statistics & Data Analysis, 43, (2003), 249-267], which gives accurate results.

To be confident in the accuracy of the calculations and to let third-party developers check the quality of the library, we have included unit test scripts in the distribution. You can find them in /Scripts/UnitTests/Stat.

Please look at the publication dates. You will see how the scientists' advice is being acted upon.

Also, it would be a mistake not to consider @Quantum a scientist.

 

Then comment, in the same spirit of literalism, on how for a continuous uniform distribution the density at the endpoint is positive while the integral there is zero: https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)
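The uniform case can be sketched the same way, assuming the convention on the linked Wikipedia page that the density equals 1/(b - a) on the closed interval [a, b]. The hypothetical helpers below show a positive density at x = a together with a CDF of 0 there, and the probability of [a, a + h] shrinking to 0 with h.

```python
def uniform_pdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Density of the continuous uniform distribution on [a, b]
    (convention: both endpoints included)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """P(X <= x) for the uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 2.0, 6.0
print(uniform_pdf(a, a, b))  # 0.25: positive density at the endpoint
print(uniform_cdf(a, a, b))  # 0.0: no probability accumulated yet
for h in (1.0, 0.1, 0.001):  # mass of [a, a + h] shrinks to 0 with h
    print(h, uniform_cdf(a + h, a, b) - uniform_cdf(a, a, b))
```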

 
Renat Fatkhullin:

The point is that @Quantum is focused purely on implementing and fully verifying an MQL5 analog of R's mathematical libraries.

These are not a theorist's musings. He digs deep when writing unit tests, which give confidence in the correctness of the library.


Do not assume a priori that everything in R is correct. On the contrary, I would say that even though the functions there are implemented in C++, everything is quite primitive. As for speed, you can see that the MQL5 library, compiled from source with our compiler, is on average 3 times faster.

We took the trouble to double-check everything and found obvious errors. These errors have been confirmed:

Please look at the publication dates. You will see how the scientists' advice is being acted upon.

In addition, it would be a mistake not to consider @Quantum a scientist.

I didn't say anything about that error. It is what it is.

As for the gamma distribution, I understood that he is working on your implementation. And I repeat that his claim about it is pure literalism.
