Article "Statistical Distributions in MQL5 - taking the best of R" - page 7

 
ivanivan_11:

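I'm talking about the R language, but my level really is too low)) Could someone check whether the code is correct?

If the code is correct, could you check the benchmark?

The code is wrong.

You are measuring the compilation time of the function, not its execution time:

The function cmpfun compiles the body of a closure and returns a new closure with the same formals and the body replaced by the compiled body expression.


Proof of the error: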

> library(microbenchmark)
> library(compiler)
> n <- 2000
> k <- seq(0,n,by=20)
> qqq<- function(xx) { a <- dbinom(k, n, pi/10, log = TRUE) }
> res <- microbenchmark(cmpfun(qqq))
> a
Error: object 'a' not found

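If the qqq function had actually been run during the benchmark, the object a would have received the computed data. Instead, the object was never created at all.

So the benchmark measured compilation time, not execution time. In my code everything is correct: the benchmark measures the actual execution time, and the object a is created with the correct data.


And yes, compilation is a fairly expensive process: the result is reported in milliseconds rather than microseconds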

> print(res)
Unit: milliseconds
        expr      min       lq     mean   median       uq      max neval
 cmpfun(qqq) 1.741577 1.783867 1.856366 1.812613 1.853096 2.697548   100
 

One more funny thing: in your example you overrode the system function q(), which is quit.

So there was no way to exit R :)

 
Dr.Trader:

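What I mean is that the mql compiler already knows all the input parameters at compile time. It is enough for it to compute everything during compilation, and when the script is called it simply returns the precomputed result. I have seen articles on Habr comparing C++ compilers, and judging by the analysis of the assembly code, that is exactly what happens there.

Yes, it may well be doing this actively. Here are some examples: https://www.mql5.com/ru/forum/58241.

But in this case it will not work: because of the complexity, the loops and the array filling, everything has to be computed in full.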
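
To illustrate the concern about precomputed results, here is a minimal C sketch (plain C rather than MQL, and every name in it is hypothetical, not taken from this thread). When the input is a literal constant, an optimizing compiler may evaluate a side-effect-free function at compile time, so a benchmark can end up timing almost nothing. Reading the input through a `volatile` object is one common way to make sure the work is done at run time.

```c
#include <stddef.h>

/* Hypothetical workload: sum of i*i for i in [0, n). */
static double sum_squares(size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (double)i * (double)i;
    }
    return sum;
}

/* Benchmark-friendly wrapper: the loop bound is read back from a volatile
   object, so the compiler has to load it at run time and cannot replace
   the whole call with a constant computed during compilation. */
double sum_squares_runtime(size_t n)
{
    volatile size_t runtime_n = n;
    return sum_squares(runtime_n);
}
```

Timing `sum_squares_runtime(N)` instead of `sum_squares(N)` with a literal `N` keeps the measured loop in the generated code, whatever the optimization level.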

 
ivanivan_11:

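If the code is correct, could you check the benchmark?

You need to replace res <- microbenchmark(cmpfun(q)) with res <- microbenchmark(q()). But libraries that were already compiled are not recompiled into bytecode, so I got the same results.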

Renat Fatkhullin:
qqq<- function(xx) { a <- dbinom(k, n, pi/10, log = TRUE) }

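In this case "a" is a local variable and cannot be accessed outside the function itself. But you can write it like this:
a <<- dbinom(k, n, pi/10, log = TRUE)
and then it will be a global variable.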

Renat Fatkhullin:

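But in this case it will not work: because of the complexity, the loops and the array filling, everything has to be computed in full.

I see, it executes very fast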

 

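By the way, interpreting a primitive call such as a <- dbinom(k, n, pi/10, log = TRUE), which drops straight into native execution in the R core (dbinom lives in r.dll), costs next to nothing.

So trying to compile this call is obviously pointless.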

 

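Since I have written many times about how fast R is, let me add my two cents.

Dear Renat

Your example proves nothing at all!

You took two similar functions and drew a conclusion about R's performance.

The functions you gave are in no way representative of R's power and diversity.

You should compare computationally heavy operations.

For example, matrix multiplication

Let's measure the following expression in R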

c <- a %*% b,

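where a and b are matrices of at least 100*100. For this test, make sure R is using Intel's MKL. This is achieved simply by installing the appropriate build of R.

R contains a large amount of code with computationally intensive operations. To execute them, it uses the most efficient libraries available today.

And R's role in trading lies not in the functions you have rewritten (although they are needed too) but in models. In one of my replies to you I mentioned the caret package. Take a look at what it is.... Implementing any practically useful trading model with that package and in MQL will give you the answer

Besides, do not forget that loading all the cores of a computer is routine for R. On top of that, you can also load neighbouring computers on the local network.

PS.

To me, the idea of comparing the performance of MQL and R is questionable: the two systems have completely different subject areas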

 

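SanSanych, we will test everything and publish benchmarks. But first we will finish the functionality.

The test was justified, and it immediately revealed the problem. I have given the theoretical rationale, and I am sure that R's system overhead will persist across the great majority of the functionality.

We can multiply matrices in such a way that Intel will lose. This stopped being rocket science long ago, and Intel (or rather the third-party programmers working for the company) is not the champion of mythical knowledge about its own processors.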

[Deleted]
СанСаныч Фоменко:

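Since I have written many times about how fast R is, let me add my two cents.

.................

To San-Sanych and others.

San-Sanych, you know how much I respect you... ((c) Kataev and Fainzilberg, better known as "Ilf and Petrov"), even though you crack some post-Soviet jokes here.

Allow me to clarify a few important things for you: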

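1). A programmer's main job is not writing programs but reading them, especially his own. Any programmer spends 95...99% of his time sitting and staring at the monitor. Is he writing a program? No, mostly he is reading it. So the closer what he reads on the screen is to natural language, that is, to what his mother, father, grandmother and schoolteacher taught him, the more effectively he can decipher these linguistic "squiggles" on the screen and find the correspondence between the algorithm and the program.

2). That is why I personally (together with 2-3 responsible and not so responsible people) was able to write a project of more than 700 subroutines in C, MQL4, CUDA....., and everything works.

3). From the standpoint of point (1), the object-oriented variant of C, that is C++, is much worse (more on that another time).

4). The full compatibility of classic C and MQL4 is invaluable. Moving a program back and forth takes half a minute.

5). The main advantage of C+MQL4 is clarity. That is, everything on the programmer's screen is understandable and transparent.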

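If we compare C-MQL4 with your R, we should look not at the speed of writing code or its volume but at the clarity of the text, that is, its comprehensibility. Otherwise the programmer will stare at the screen for 24 hours, trying in vain to understand what the program does, what its parameters are, why the author named them that way, and why the program does things this way and not another. What matters here is not the speed of the program but the correctness of its work and how readily the end programmer can apply it.

From this point of view, what Metaquotes has done is of course a great help to those who want to put statistics into their EAs. In simplicity and clarity of the functions, nothing compares. This is very important, especially when you need delicate calculations (and Forex generally does require delicate calculations).

Let's compare.

Here is the integration function in C - MQL4: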

//__________________________________________________________________
//|
// Integral_Simpson_3_Points_Lite ()|
//_______________________________________________________|
#define  FACTOR_1_3      (1.0 / 3.0)

double Integral_Simpson_3_Points_Lite
(
   double       & Price_Array[],
   int          Period_Param,
   int          Last_Bar
)
{
        double  Sum, Sum_Full_Cycles, Sum_Tail, Sum_Total;
        int             i, j;
        int             Quant;
        int             Full_Cycles;
        int             Tail_Limit;
        int             Tail_Start;

        //..................................................................
        if (Last_Bar < 0)
        {
                Last_Bar = 0;
        }
        //.........................................
        if (Period_Param <= 1)
        {
                return (0.0);
        }
        //.................................................................
        if (Period_Param == 2)
        {
                return (0.5 * (Price_Array[Last_Bar] + Price_Array[Last_Bar + 1]));
        }
        //=============================================================================================
        //=============================================================================================
        Quant = 3;
        Full_Cycles = (Period_Param - 1) / (Quant - 1);
        Tail_Start = Full_Cycles * (Quant - 1);
        Tail_Limit = Period_Param - Tail_Start;
        //...........................................................
        j = Last_Bar;

        Sum = 0.0;
        for (i = 0; i < Full_Cycles; i ++)
        {
                //.........................................................................
                Sum += Price_Array[j];
                Sum += 4.0 * Price_Array[j + 1];
                Sum += Price_Array[j + 2];
                j = j + (Quant - 1);
        }
        //...........................................................
        Sum_Full_Cycles = Sum * FACTOR_1_3;
        Sum_Tail = Integral_Trapezoid_Lite (Price_Array,
                                            Tail_Limit,
                                            Last_Bar + Tail_Start);
        //.........................................................................
        Sum_Total = Sum_Full_Cycles + Sum_Tail;
        //...............................................................
        return (Sum_Total) ;

}
[Deleted]

I will write in parts, it is more convenient that way.

Inside it there is a trapezoid integration function:

//__________________________________________________________________
//|
// Integral_Trapezoid_Lite ()|
//_______________________________________________________|
double Integral_Trapezoid_Lite
(
   double       & Price_Array[],
   int          Period_Param,
   int          Last_Bar
)
{
        double  Sum;
        int             i;
        int             Price_Index ;
        //.........................................
        if (Last_Bar < 0)
        {
                Last_Bar = 0;
        }
        //.........................................
        if (Period_Param <= 1)
        {
                return (0.0);
        }
        //.................................................................
        if (Period_Param == 2)
        {
                return (0.5 * (Price_Array[Last_Bar] + Price_Array[Last_Bar + 1]));
        }
        //..................................................................
        //..................................................................
        Sum = 0.0;
        for (i = 0; i < Period_Param; i++)
        {
                Price_Index = Last_Bar + i;
                if (Price_Index < 0)
                {
                        break;
                }
                //........................................
                // Index from Last_Bar (Price_Index), not from the start of the array
                if ((i == 0) || (i == (Period_Param - 1)))
                {
                        Sum = Sum + Price_Array[Price_Index] * 0.5;
                }
                else
                {
                        Sum = Sum + Price_Array[Price_Index];
                }
        }
        //...............................................................
        return (Sum) ;
}
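[Deleted]

Everything is very clear and understandable. More importantly, it always works correctly, with few errors even in MT4-MQL4, which saves a great deal of time.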
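
One way to back up a claim like this is to check the scheme against integrals whose values are known. The following is a minimal, self-contained C sketch, not the poster's code: `simpson_unit_spacing` is a hypothetical name, the samples are assumed to be one unit apart, and the function follows the same idea as the code above (full three-point Simpson panels, then a single trapezoid if one interval is left over).

```c
#include <assert.h>
#include <stddef.h>

/* Integrates n equally spaced samples y[0..n-1] (spacing assumed to be 1).
   Full 3-point Simpson panels cover as many intervals as possible; if the
   number of intervals is odd, the last one is handled by a trapezoid. */
double simpson_unit_spacing(const double *y, size_t n)
{
    assert(y != NULL);
    if (n <= 1) {
        return 0.0;                       /* no interval to integrate over */
    }

    size_t panels = (n - 1) / 2;          /* each panel spans two intervals */
    double sum = 0.0;
    for (size_t p = 0; p < panels; p++) {
        size_t j = 2 * p;
        sum += y[j] + 4.0 * y[j + 1] + y[j + 2];
    }
    double total = sum / 3.0;             /* Simpson weight h/3 with h = 1 */

    size_t tail_start = 2 * panels;
    if (tail_start + 1 < n) {             /* one leftover interval */
        total += 0.5 * (y[tail_start] + y[tail_start + 1]);
    }
    return total;
}
```

For the samples 0, 1, 4, 9, 16 (x*x at x = 0..4) the result is 64/3, which is the exact value of the integral, because Simpson's rule is exact for quadratics. For the four samples 0, 1, 2, 3 the result is 4.5, one Simpson panel plus one trapezoid.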

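But if you want to find out why incomprehensible errors occur when working in R, or you simply want to understand what parameters the integration routine takes, or which integration method was programmed there, this is what you will see (God forgive me for posting this to immature programming kids):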

http://www.netlib.org/quadpack/

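This is only the header of a function originally written in Fortran. The body comes later. This is the original routine used for integration in the R package.

What is there to understand here, tell me?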

      subroutine qagse(f,a,b,epsabs,epsrel,limit,result,abserr,neval,
     *   ier,alist,blist,rlist,elist,iord,last)
c***begin prologue  qagse
c***date written   800101   (yymmdd)
c***revision date  830518   (yymmdd)
c***category no.  h2a1a1
c***keywords  automatic integrator, general-purpose,
c             (end point) singularities, extrapolation,
c             globally adaptive
c***author  piessens,robert,appl. math. & progr. div. - k.u.leuven
c           de doncker,elise,appl. math. & progr. div. - k.u.leuven
c***purpose  the routine calculates an approximation result to a given
c            definite integral i = integral of f over (a,b),
c            hopefully satisfying following claim for accuracy
c            abs(i-result).le.max(epsabs,epsrel*abs(i)).
c***description
c
c        computation of a definite integral
c        standard fortran subroutine
c        real version
c
c        parameters
c         on entry
c            f      - real
c                     function subprogram defining the integrand
c                     function f(x). the actual name for f needs to be
c                     declared e x t e r n a l in the driver program.
c
c            a      - real
c                     lower limit of integration
c
c            b      - real
c                     upper limit of integration
c
c            epsabs - real
c                     absolute accuracy requested
c            epsrel - real
c                     relative accuracy requested
c                     if  epsabs.le.0
c                     and epsrel.lt.max(50*rel.mach.acc.,0.5 d-28),
c                     the routine will end with ier = 6.
c
c            limit  - integer
c                     gives an upperbound on the number of subintervals
c                     in the partition of (a,b)
c
c         on return
c            result - real
c                     approximation to the integral
c
c            abserr - real
c                     estimate of the modulus of the absolute error,
c                     which should equal or exceed abs(i-result)
c
c            neval  - integer
c                     number of integrand evaluations
c
c            ier    - integer
c                     ier = 0 normal and reliable termination of the
c                             routine. it is assumed that the requested
c                             accuracy has been achieved.
c                     ier.gt.0 abnormal termination of the routine
c                             the estimates for integral and error are
c                             less reliable. it is assumed that the
c                             requested accuracy has not been achieved.
c            error messages
c                         = 1 maximum number of subdivisions allowed
c                             has been achieved. one can allow more sub-
c                             divisions by increasing the value of limit
c                             (and taking the according dimension
c                             adjustments into account). however, if
c                             this yields no improvement it is advised
c                             to analyze the integrand in order to
c                             determine the integration difficulties. if
c                             the position of a local difficulty can be
c                             determined (e.g. singularity,
c                             discontinuity within the interval) one
c                             will probably gain from splitting up the
c                             interval at this point and calling the
c                             integrator on the subranges. if possible,
c                             an appropriate special-purpose integrator
c                             should be used, which is designed for
c                             handling the type of difficulty involved.
c                         = 2 the occurrence of roundoff error is detec-
c                             ted, which prevents the requested
c                             tolerance from being achieved.
c                             the error may be under-estimated.
c                         = 3 extremely bad integrand behaviour
c                             occurs at some points of the integration
c                             interval.
c                         = 4 the algorithm does not converge.
c                             roundoff error is detected in the
c                             extrapolation table.
c                             it is presumed that the requested
c                             tolerance cannot be achieved, and that the
c                             returned result is the best which can be
c                             obtained.
c                         = 5 the integral is probably divergent, or
c                             slowly convergent. it must be noted that
c                             divergence can occur with any other value
c                             of ier.
c                         = 6 the input is invalid, because
c                             epsabs.le.0 and
c                             epsrel.lt.max(50*rel.mach.acc.,0.5 d-28).
c                             result, abserr, neval, last, rlist(1),
c                             iord(1) and elist(1) are set to zero.
c                             alist(1) and blist(1) are set to a and b
c                             respectively.
c
c            alist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the left end points
c                     of the subintervals in the partition of the
c                     given integration range (a,b)
c
c            blist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the right end points
c                     of the subintervals in the partition of the given
c                     integration range (a,b)
c
c            rlist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the integral
c                     approximations on the subintervals
c
c            elist  - real
c                     vector of dimension at least limit, the first
c                      last  elements of which are the moduli of the
c                     absolute error estimates on the subintervals
c
c            iord   - integer
c                     vector of dimension at least limit, the first k
c                     elements of which are pointers to the
c                     error estimates over the subintervals,
c                     such that elist(iord(1)), ..., elist(iord(k))
c                     form a decreasing sequence, with k = last
c                     if last.le.(limit/2+2), and k = limit+1-last
c                     otherwise
c
c            last   - integer
c                     number of subintervals actually produced in the
c                     subdivision process
c
c***references  (none)
c***routines called  qelg,qk21,qpsrt,r1mach
c***end prologue  qagse
c
      real a,abseps,abserr,alist,area,area1,area12,area2,a1,
     *  a2,b,blist,b1,b2,correc,defabs,defab1,defab2,r1mach,
     *  dres,elist,epmach,epsabs,epsrel,erlarg,erlast,errbnd,
     *  errmax,error1,error2,erro12,errsum,ertest,f,oflow,resabs,
     *  reseps,result,res3la,rlist,rlist2,small,uflow
      integer id,ier,ierro,iord,iroff1,iroff2,iroff3,jupbnd,k,ksgn,
     *  ktmin,last,limit,maxerr,neval,nres,nrmax,numrl2
      logical extrap,noext
c
      dimension alist(limit),blist(limit),elist(limit),iord(limit),
     * res3la(3),rlist(limit),rlist2(52)
c
      external f
c
c            the dimension of rlist2 is determined by the value of
c            limexp in subroutine qelg (rlist2 should be of dimension
c            (limexp+2) at least).
c
c            list of major variables
c            -----------------------
c
c           alist     - list of left end points of all subintervals
c                       considered up to now
c           blist     - list of right end points of all subintervals
c                       considered up to now
c           rlist(i)  - approximation to the integral over
c                       (alist(i),blist(i))
c           rlist2    - array of dimension at least limexp+2
c                       containing the part of the epsilon table
c                       which is still needed for further computations
c           elist(i)  - error estimate applying to rlist(i)
c           maxerr    - pointer to the interval with largest error
c                       estimate
c           errmax    - elist(maxerr)
c           erlast    - error on the interval currently subdivided
c                       (before that subdivision has taken place)
c           area      - sum of the integrals over the subintervals
c           errsum    - sum of the errors over the subintervals
c           errbnd    - requested accuracy max(epsabs,epsrel*
c                       abs(result))
c           *****1    - variable for the left interval
c           *****2    - variable for the right interval
c           last      - index for subdivision
c           nres      - number of calls to the extrapolation routine
c           numrl2    - number of elements currently in rlist2. if an
c                       appropriate approximation to the compounded
c                       integral has been obtained it is put in
c                       rlist2(numrl2) after numrl2 has been increased
c                       by one.
c           small     - length of the smallest interval considered
c                       up to now, multiplied by 1.5
c           erlarg    - sum of the errors over the intervals larger
c                       than the smallest interval considered up to now
c           extrap    - logical variable denoting that the routine
c                       is attempting to perform extrapolation
c                       i.e. before subdividing the smallest interval
c                       we try to decrease the value of erlarg.
c           noext     - logical variable denoting that extrapolation
c                       is no longer allowed (true value)
c
c            machine dependent constants
c            ---------------------------
c
c           epmach is the largest relative spacing.
c           uflow is the smallest positive magnitude.
c           oflow is the largest positive magnitude.
c
c***first executable statement  qagse
      epmach = r1mach(4)
c
c            test on validity of parameters
c            ------------------------------