Discussion of article "Statistical Distributions in MQL5 - taking the best of R" - page 7

 
ivanivan_11:

I'm talking about R, but my skill is very small)) Can someone check whether the code is correct?

If the code is correct, can you check the benchmark?

The code is wrong.

You measured the compilation time of the function, not its execution:

The function cmpfun compiles the body of a closure and returns a new closure with the same formals and the body replaced by the compiled body expression.


Proof of the error:

> library(microbenchmark)
> library(compiler)
> n <- 2000
> k <- seq(0,n,by=20)
> qqq<- function(xx) { a <- dbinom(k, n, pi/10, log = TRUE) }
> res <- microbenchmark(cmpfun(qqq))
> a
Error: object 'a' not found

If the qqq function had actually been run during the benchmark, the object a would have received the computed data. Instead, it turns out the object was never even created.

As a result, your benchmark measured the compilation time instead of the execution time. In my code everything is correct: the benchmark measures the actual execution time, and the object a is created with the correct data.


And yes, compilation is quite a costly process: the timings come out in milliseconds rather than microseconds:

> print(res)
Unit: milliseconds
        expr      min       lq     mean   median       uq      max neval
 cmpfun(qqq) 1.741577 1.783867 1.856366 1.812613 1.853096 2.697548   100
 

And as a separate joke: in your example you overrode the system function q() - quit.

There was no way to quit R :)
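
A tiny illustration of that name clash (my own sketch, not the original code):

q <- function(xx) xx^2   # hypothetical stand-in for the benchmarked function; shadows base::q
q(3)                     # calls the user function - it does NOT quit the session
# base::q("no")          # the fully qualified call (or quit("no")) still exits R
rm(q)                    # removing the shadowing object restores the normal q()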

 
Dr.Trader:

I mean that the MQL compiler already knows all the input parameters at compile time. It would be enough for it to compute everything during compilation, so that when the script is called it simply returns the pre-computed result. I saw some articles on Habr comparing C++ compilers, and judging by the analysis of the assembler code, that is exactly what happens there.

Yes, it may be actively using that. Here are some examples: https://www.mql5.com/ru/forum/58241.

But in this case that won't work: the calculation has to be done in full, because of the complexity, the loop and the array filling.

 
ivanivan_11:

If the code is correct, can you check the benchmark?

You need to replace res <- microbenchmark(cmpfun(q)) with res <- microbenchmark(q()). But libraries that are already compiled will not be recompiled into bytecode; I got the same results.
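
For completeness, a sketch of the corrected measurement (same n, k and function body as in the snippets above): compile once outside the benchmark and time the call itself.

library(microbenchmark)
library(compiler)

n <- 2000
k <- seq(0, n, by = 20)
qqq  <- function(xx) dbinom(k, n, pi/10, log = TRUE)
qqqc <- cmpfun(qqq)                      # compile once, outside the timed expression

res <- microbenchmark(qqq(0), qqqc(0))   # both calls are actually executed now
print(res)                               # per the posts above: microseconds, not milliseconds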

Renat Fatkhullin:
qqq<- function(xx) { a <- dbinom(k, n, pi/10, log = TRUE) }

"a" in this case will be a local variable, inaccessible outside the function itself anyway. But you can do it this way -
a <<- dbinom(k, n, pi/10, log = TRUE)
then it will be a global variable.
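
To illustrate the difference (my own sketch, with the same n and k as above):

n <- 2000
k <- seq(0, n, by = 20)

f_local  <- function() { a <- dbinom(k, n, pi/10, log = TRUE) }   # a is local, lost on return
f_global <- function() { a <<- dbinom(k, n, pi/10, log = TRUE) }  # <<- assigns in the global environment
f_return <- function() dbinom(k, n, pi/10, log = TRUE)            # idiomatic R: just return the value

f_local();  exists("a")   # FALSE in a fresh session - the local a disappeared with the call
f_global(); exists("a")   # TRUE - a now holds the results
a <- f_return()           # the usual style: assign the returned value yourself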

Renat Fatkhullin:

But in this case it won't work - you need to count in full because of complexity, loop and array filling.

I see, the speed of execution is excellent then

 

By the way, interpreting the primitive call a <- dbinom(k, n, pi/10, log = TRUE) costs practically nothing: it drops straight into the R kernel and runs natively (dbinom lives in R.dll).

So trying to byte-compile this call is obviously pointless.
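
One way to see this is to print the R-level definition of dbinom - it is only a thin wrapper around a native call, so there is essentially nothing left for the byte-compiler to speed up:

print(stats::dbinom)
# On a recent build of R this prints roughly:
# function (x, size, prob, log = FALSE)
# .Call(C_dbinom, x, size, prob, log)
# <bytecode: ...>  <environment: namespace:stats>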

 

Since I've written many times about the speed of R, let me put in my two cents.

Dear Renat!

Your example proves nothing at all!

You have taken two similar functions and drawn a conclusion about R's performance as a whole.

The functions you have given do not represent the power and diversity of R at all.

You should compare computationally heavy operations.

For example, matrix multiplication...

Let's measure the expression in R

c <- a %*% b,

where a and b are matrices of at least 100x100. In your measurement, make sure that R uses Intel's MKL; that is achieved simply by installing the corresponding build of R.
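
For illustration, a sketch of such a measurement (the matrix size and seed are my own choices):

library(microbenchmark)

set.seed(1)
m <- 500                                   # at least 100x100; larger sizes show the BLAS better
a <- matrix(rnorm(m * m), nrow = m)
b <- matrix(rnorm(m * m), nrow = m)

# %*% is matrix multiplication in R (plain * would be element-wise);
# with an MKL-linked build of R the same call goes to the optimized BLAS automatically.
res <- microbenchmark(a %*% b, times = 20)
print(res)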

If we look at R, there are mountains of code containing computationally intensive operations, and they are executed with the most efficient libraries available at the moment.

And the usefulness of R in trading lies not in the functions you rewrote (although those are needed too), but in the models. In one of my replies to you I mentioned the caret package - see what it is.... Implementing any practically useful trading model both within that package and in MQL will give you the answer.

Besides, do not forget that loading all the cores of a computer is routine for R. On top of that, you can load neighbouring computers on the local network.
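
A minimal sketch with base R's parallel package (a PSOCK cluster built from host names would spread the same work over machines on the local network):

library(parallel)

cl  <- makeCluster(detectCores())                        # one worker per local core
out <- parLapply(cl, 1:8, function(i) sum(rnorm(1e6)))   # toy workload on every worker
stopCluster(cl)

# For neighbouring machines: cl <- makeCluster(c("host1", "host2"))  # hypothetical host names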

PS.

To me, the very idea of comparing the performance of MQL and R is questionable: the two systems have completely different subject areas.

 

SanSanych, we will test everything and release a benchmark. But first we will complete the functionality.

The test was justified, and it immediately revealed a problem. I have presented the theoretical justification, and I am sure that R's system overhead will persist across the overwhelming majority of its functionality.

We can multiply matrices in such a way that Intel will lose. That stopped being rocket science long ago, and Intel (or rather, the third-party programmers affiliated with the company) holds no monopoly on mythical knowledge of its own processors.

[Deleted]  
СанСаныч Фоменко:

Since I've written many times about the speed of R, let me put in my two cents.

.................

To San-Sanych and the other guys.

San-Sanych, you know how much I respect you ... ((S) Kataev and Feinzilberg, known as "Ilf and Petrov"), despite some of your post-Soviet jokes here.

Let me clarify something important for you:

1). The main job of a programmer is not to write programs but to READ them, in particular his own. Any programmer spends 95...99% of his time sitting and staring at the monitor. Is he writing a program? No, mostly he is reading it. Therefore, the closer what he reads on the screen is to natural language - to what he was taught by his mother, father, grandmother and school teacher - the more efficiently he will decipher those language-like squiggles on the screen and match the algorithm against his program.

2). For the purposes of point (1) there is nothing better on average than the C language. That's why, for example, I personally (as well as 2-3 responsible and not very responsible people) managed to write a project with 700+ subroutines in C, MQL4, CUDA..... And everything works.

3). From the point of view of point (1), the object-oriented variant of C, i.e. C++, is much worse. (But about that another time).

4). Full compatibility of classical C and MQL4 is simply invaluable. Transferring a procedure back and forth takes half a minute.

5). The main advantage of C+MQL4 is CLARITY. That is, the comprehensibility and transparency of everything that is on the screen of the programmer.

If we compare C-MQL4 with your R, we should look not at the speed and volume of the written code but at the CLARITY of the text, that is, its comprehensibility. Otherwise the programmer will stare at the screen for 24 hours in vain attempts to understand what the program does, what parameters it has, why the author named them the way he did, and why he did it this way and not another. What matters here is not the speed of the program but the correctness of its work and the speed of its APPLICABILITY for the final programmer.

From this point of view, what MetaQuotes has done is of course a great help for those who want to put statistics into their EAs. There is nothing to compare in terms of simplicity and comprehensibility of the functions. And this is important, especially if you have delicate calculations (and Forex, and trading in general, requires delicate calculations).

Let's compare.

Here is what the integration function looks like in C - MQL4:

//__________________________________________________________________
//|
// Integral_Simpson_3_Points_Lite ()|
//_______________________________________________________|
#define  FACTOR_1_3      (1.0 / 3.0)

double Integral_Simpson_3_Points_Lite
(
   double       & Price_Array[],
   int          Period_Param,
   int          Last_Bar
)
{
        double  Sum, Sum_Full_Cycles, Sum_Tail, Sum_Total;
        int             i, j;
        int             Quant;
        int             Full_Cycles;
        int             Tail_Limit;
        int             Tail_Start;

        //..................................................................
        if (Last_Bar < 0)
        {
                Last_Bar = 0;
        }
        //.........................................
        if (Period_Param <= 1)
        {
                return (0.0);
        }
        //.................................................................
        if (Period_Param == 2)
        {
                return (0.5 * (Price_Array[Last_Bar] + Price_Array[Last_Bar + 1]));
        }
        //=============================================================================================
        //=============================================================================================
        Quant = 3;
        Full_Cycles = (Period_Param - 1) / (Quant - 1);
        Tail_Start = Full_Cycles * (Quant - 1);
        Tail_Limit = Period_Param - Tail_Start;
        //...........................................................
        j = Last_Bar;

        Sum = 0.0;
        for (i = 0; i < Full_Cycles; i ++)
        {
                //.........................................................................
                Sum += Price_Array[j];
                Sum += 4.0 * Price_Array[j + 1];
                Sum += Price_Array[j + 2];
                j = j + (Quant - 1);
        }
        //...........................................................
        Sum_Full_Cycles = Sum * FACTOR_1_3;
        Sum_Tail = Integral_Trapezoid_Lite (Price_Array,
                                            Tail_Limit,
                                            Last_Bar + Tail_Start);
        //.........................................................................
        Sum_Total = Sum_Full_Cycles + Sum_Tail;
        //...............................................................
        return (Sum_Total) ;

}
[Deleted]  

I'll write in parts, it's easier to write that way.

There is a trapezoidal integration function inside:

//__________________________________________________________________
//|
// Integral_Trapezoid_Lite ()|
//_______________________________________________________|
double Integral_Trapezoid_Lite
(
   double       & Price_Array[],
   int          Period_Param,
   int          Last_Bar
)
{
        double  Sum;
        int             i;
        int             Price_Index ;
        //.........................................
        if (Last_Bar < 0)
        {
                Last_Bar = 0;
        }
        //.........................................
        if (Period_Param <= 1)
        {
                return (0.0);
        }
        //.................................................................
        if (Period_Param == 2)
        {
                return (0.5 * (Price_Array[Last_Bar] + Price_Array[Last_Bar + 1]));
        }
        //..................................................................
        //..................................................................
        Sum = 0.0;
        for (i = 0; i < Period_Param; i++)
        {
                Price_Index = Last_Bar + i;
                if (Price_Index < 0)
                {
                        break;
                }
                //........................................
                if ((i == 0) || (i == (Period_Param - 1)))
                {
                        Sum = Sum + Price_Array[Price_Index] * 0.5;   // use the shifted index, not the raw loop counter
                }
                else
                {
                        Sum = Sum + Price_Array[Price_Index];
                }
        }
        //...............................................................
        return (Sum) ;
}
[Deleted]  

Everything is absolutely clear and understandable. And what is important, it always works and works well, i.e. with low error even in MT4-MQL4, which saves a lot of time.

But if you want to find out why you have incomprehensible errors when working in R, or if you just want to understand what parameters there are in the integration procedure or what integration method they have programmed there, you will see the following (God forgive me for posting this to immature programming kids):

http://www.netlib.org/quadpack/

This is only the header of the function, which was originally written in Fortran; the main body comes after it. This is the original program used in the R package for integration.

What is there to understand here, tell me?

      subroutine qagse(f,a,b,epsabs,epsrel,limit,result,abserr,neval,
     *   ier,alist,blist,rlist,elist,iord,last)
c***begin prologue  qagse
c***date written   800101   (yymmdd)
c***revision date  830518   (yymmdd)
c***category no.  h2a1a1
c***keywords  automatic integrator, general-purpose,
c             (end point) singularities, extrapolation,
c             globally adaptive
c***author  piessens,robert,appl. math. & progr. div. - k.u.leuven
c           de doncker,elise,appl. math. & progr. div. - k.u.leuven
c***purpose  the routine calculates an approximation result to a given
c            definite integral i = integral of f over (a,b),
c            hopefully satisfying following claim for accuracy
c            abs(i-result).le.max(epsabs,epsrel*abs(i)).
c***description
c
c        computation of a definite integral
c        standard fortran subroutine
c        real version
c
c        parameters
c         on entry
c            f      - real
c                     function subprogram defining the integrand
c                     function f(x). the actual name for f needs to be
c                     declared e x t e r n a l in the driver program.
c
c            a      - real
c                     lower limit of integration
c
c            b      - real
c                     upper limit of integration
c
c            epsabs - real
c                     absolute accuracy requested
c            epsrel - real
c                     relative accuracy requested
c                     if  epsabs.le.0
c                     and epsrel.lt.max(50*rel.mach.acc.,0.5 d-28),
c                     the routine will end with ier = 6.
c
c            limit  - integer
c                     gives an upperbound on the number of subintervals
c                     in the partition of (a,b)
c
c         on return
c            result - real
c                     approximation to the integral
c
c            abserr - real
c                     estimate of the modulus of the absolute error,
c                     which should equal or exceed abs(i-result)
c
c            neval  - integer
c                     number of integrand evaluations
c
c            ier    - integer
c                     ier = 0 normal and reliable termination of the
c                             routine. it is assumed that the requested
c                             accuracy has been achieved.
c                     ier.gt.0 abnormal termination of the routine
c                             the estimates for integral and error are
c                             less reliable. it is assumed that the
c                             requested accuracy has not been achieved.
c            error messages
c                         = 1 maximum number of subdivisions allowed
c                             has been achieved. one can allow more sub-
c                             divisions by increasing the value of limit
c                             (and taking the according dimension
c                             adjustments into account). however, if
c                             this yields no improvement it is advised
c                             to analyze the integrand in order to
c                             determine the integration difficulties. if
c                             the position of a local difficulty can be
c                             determined (e.g. singularity,
c                             discontinuity within the interval) one
c                             will probably gain from splitting up the
c                             interval at this point and calling the
c                             integrator on the subranges. if possible,
c                             an appropriate special-purpose integrator
c                             should be used, which is designed for
c                             handling the type of difficulty involved.
c                         = 2 the occurrence of roundoff error is detec-
c                             ted, which prevents the requested
c                             tolerance from being achieved.
c                             the error may be under-estimated.
c                         = 3 extremely bad integrand behaviour
c                             occurs at some points of the integration
c                             interval.
c                         = 4 the algorithm does not converge.
c                             roundoff error is detected in the
c                             extrapolation table.
c                             it is presumed that the requested
c                             tolerance cannot be achieved, and that the
c                             returned result is the best which can be
c                             obtained.
c                         = 5 the integral is probably divergent, or
c                             slowly convergent. it must be noted that
c                             divergence can occur with any other value
c                             of ier.
c                         = 6 the input is invalid, because
c                             epsabs.le.0 and
c                             epsrel.lt.max(50*rel.mach.acc.,0.5 d-28).
c                             result, abserr, neval, last, rlist(1),
c                             iord(1) and elist(1) are set to zero.
c                             alist(1) and blist(1) are set to a and b
c                             respectively.
c
c            alist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the left end points
c                     of the subintervals in the partition of the
c                     given integration range (a,b)
c
c            blist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the right end points
c                     of the subintervals in the partition of the given
c                     integration range (a,b)
c
c            rlist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the integral
c                     approximations on the subintervals
c
c            elist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the moduli of the
c                     absolute error estimates on the subintervals
c
c            iord   - integer
c                     vector of dimension at least limit, the first k
c                     elements of which are pointers to the
c                     error estimates over the subintervals,
c                     such that elist(iord(1)), ..., elist(iord(k))
c                     form a decreasing sequence, with k = last
c                     if last.le.(limit/2+2), and k = limit+1-last
c                     otherwise
c
c            last   - integer
c                     number of subintervals actually produced in the
c                     subdivision process
c
c***references  (none)
c***routines called  qelg,qk21,qpsrt,r1mach
c***end prologue  qagse
c
      real a,abseps,abserr,alist,area,area1,area12,area2,a1,
     *  a2,b,blist,b1,b2,correc,defabs,defab1,defab2,r1mach,
     *  dres,elist,epmach,epsabs,epsrel,erlarg,erlast,errbnd,
     *  errmax,error1,error2,erro12,errsum,ertest,f,oflow,resabs,
     *  reseps,result,res3la,rlist,rlist2,small,uflow
      integer id,ier,ierro,iord,iroff1,iroff2,iroff3,jupbnd,k,ksgn,
     *  ktmin,last,limit,maxerr,neval,nres,nrmax,numrl2
      logical extrap,noext
c
      dimension alist(limit),blist(limit),elist(limit),iord(limit),
     * res3la(3),rlist(limit),rlist2(52)
c
      external f
c
c            the dimension of rlist2 is determined by the value of
c            limexp in subroutine qelg (rlist2 should be of dimension
c            (limexp+2) at least).
c
c            list of major variables
c            -----------------------
c
c           alist     - list of left end points of all subintervals
c                       considered up to now
c           blist     - list of right end points of all subintervals
c                       considered up to now
c           rlist(i)  - approximation to the integral over
c                       (alist(i),blist(i))
c           rlist2    - array of dimension at least limexp+2
c                       containing the part of the epsilon table
c                       which is still needed for further computations
c           elist(i)  - error estimate applying to rlist(i)
c           maxerr    - pointer to the interval with largest error
c                       estimate
c           errmax    - elist(maxerr)
c           erlast    - error on the interval currently subdivided
c                       (before that subdivision has taken place)
c           area      - sum of the integrals over the subintervals
c           errsum    - sum of the errors over the subintervals
c           errbnd    - requested accuracy max(epsabs,epsrel*
c                       abs(result))
c           *****1    - variable for the left interval
c           *****2    - variable for the right interval
c           last      - index for subdivision
c           nres      - number of calls to the extrapolation routine
c           numrl2    - number of elements currently in rlist2. if an
c                       appropriate approximation to the compounded
c                       integral has been obtained it is put in
c                       rlist2(numrl2) after numrl2 has been increased
c                       by one.
c           small     - length of the smallest interval considered
c                       up to now, multiplied by 1.5
c           erlarg    - sum of the errors over the intervals larger
c                       than the smallest interval considered up to now
c           extrap    - logical variable denoting that the routine
c                       is attempting to perform extrapolation
c                       i.e. before subdividing the smallest interval
c                       we try to decrease the value of erlarg.
c           noext     - logical variable denoting that extrapolation
c                       is no longer allowed (true value)
c
c            machine dependent constants
c            ---------------------------
c
c           epmach is the largest relative spacing.
c           uflow is the smallest positive magnitude.
c           oflow is the largest positive magnitude.
c
c***first executable statement  qagse
      epmach = r1mach(4)
c
c            test on validity of parameters
c            ------------------------------