Features of the mql5 language, subtleties and tricks - page 117

 

This is the variant I've come up with:

int log2_(ulong n)
{
  if (n==0) return -1;

  #define  M(n, i, base, S) ( n >= (ulong(1)<<(i)) ? S(n, i+base/2) : S(n, i-(base+1)/2) )

  #define  S_0(n, i)  i
  #define  S_1(n, i)  M(n, i, 1, S_0)
  #define  S_2(n, i)  M(n, i, 2, S_1)
  #define  S_4(n, i)  M(n, i, 4, S_2)
  #define  S_8(n, i)  M(n, i, 8, S_4)
  #define  S_16(n, i) M(n, i, 16, S_8)
  #define  S(n)       M(n, 32, 32, S_16)

  return S(n); 
}

It's supposed to be the fastest of all possible ones. All calculations are performed with constants, so they are calculated during compilation. So, everything is reduced to just 6 consecutive comparisons and nothing more. However, this variant works slower than the previous one. I can't understand the reason for that.

 
Alexey Navoykov:

This is the variant I've come up with:

In idea, this is the fastest of all possible. All calculations are made with constants, so they are calculated during compilation. So, everything is reduced to just 6 consecutive comparisons, and nothing more. However, this variant works slower than the previous one. I can't understand what the reason is.

Dividing by two slows it down ? Try to replace with a shift? I suspect that the calculated constants - must be calculated immediately (in this case - and the shift in the define - must also be replaced by a constant).

Besides, "question" is a rather controversial operator, as I know. Twenty years ago it was checked in C++ and sometimes "question" generates much longer code than the usual if operator. Maybe it's the same here ?

And, I would make the return code uint - what if there are some checks when converting signed and unsigned values?

I haven't got an opportunity to experiment manually yet - the CPU is way overloaded... Even text is typed "with slowness"...

 
Georgiy Merts:

Dividing by two slows it down ? Try to replace it with a shift ? I suspect that the calculated constants - must be calculated immediately (in this case - and the shift in the define - must also be replaced by a constant).

Also - "question" - as I know, it's quite a controversial operator ...

Replacing division with shift has no effect. I suspect that the resulting expression is too long, so the compiler didn't optimize it to the end.

But I ran the tests when Optimize=0, while when optimization was enabled, everything went well - the second variant was one and a half times faster. Bingo!

If optimisation is disabled, then the second option is slightly slower at small values, but slightly faster at larger ones. In short, the second option is definitely better.

 
Alexey Navoykov:

This is the variant I've come up with:

This is supposedly the fastest of all possible variants. All calculations are performed with constants, so they are calculated during compilation. So, everything is reduced to just 6 consecutive comparisons and nothing more. However, this variant works slower than the previous one. I cannot understand the reason for that.

That's right - your variant is the fastest.

It's just that the test is idle. Programmers very often forget one important thing when testing for performance: if a calculated value is not used anywhere, the compiler simply won't perform the computation.

It makes sense, what's the point? It's like in quantum superposition. Why should the moon exist if no one is looking at it. "Does the moon exist just because a mouse is looking at it?" (Albert Einstein). :))

So this version of the test with the checksum calculation and printing it would be more correct:

#property strict
#define    test(M,S,EX)        {long sum=0; uint nn=(uint)pow(10,M); ulong mss=GetMicrosecondCount(); for(uint t12=1;t12<=nn;t12++){EX;sum+=(long)n1;} \
                                Print(S+": loops="+(string)nn+" μs="+string(GetMicrosecondCount()-mss)+" Контрольная сумма="+string(sum));}

int log2(ulong n){
  if (n==0) return -1;
  #define  S(k) if (n >= (ulong(1)<<k)) { i += k;  n >>= k; }
  int i=0;  S(32);  S(16);  S(8);  S(4);  S(2);  S(1);  return i;
  #undef  S}


static const uint ulLogTable[64] = {
0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61,
51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62,
57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56,
45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 };

uint _FastLog2(ulong ulInput){
   ulInput |= ulInput >> 1;
   ulInput |= ulInput >> 2;
   ulInput |= ulInput >> 4;
   ulInput |= ulInput >> 8;
   ulInput |= ulInput >> 16;
   ulInput |= ulInput >> 32;  
   return(ulLogTable[(uint)((ulInput * 0x03f6eaf2cd271461) >> 58)]);};
   
int log2_(ulong n)
{
  if (n==0) return -1;

  #define  M(n, i, base, S) ( n >= (ulong(1)<<(i)) ? S(n, i+base/2) : S(n, i-(base+1)/2) )

  #define  S_0(n, i)  i
  #define  S_1(n, i)  M(n, i, 1, S_0)
  #define  S_2(n, i)  M(n, i, 2, S_1)
  #define  S_4(n, i)  M(n, i, 4, S_2)
  #define  S_8(n, i)  M(n, i, 8, S_4)
  #define  S_16(n, i) M(n, i, 16, S_8)
  #define  S(n)       M(n, 32, 32, S_16)

  return S(n); 
}

void OnStart(){
  srand(GetTickCount());
  ulong n1;
  test(8,"MathLog",n1=ulong(MathLog((double)t12)/MathLog(2.0)))
  test(8,"log2",n1=log2(t12))
  test(8,"log2_",n1=log2_(t12))
  test(8,"_FastLog2",n1=_FastLog2(t12))}

Result:

2019.01.05 02:30:03.681 TestLog (.BrentCrud,H4) MathLog:   loops=100000000 μs=805196 Контрольная сумма=2465782300
2019.01.05 02:30:04.092 TestLog (.BrentCrud,H4) log2:      loops=100000000 μs=410657 Контрольная сумма=2465782300
2019.01.05 02:30:04.234 TestLog (.BrentCrud,H4) log2_:     loops=100000000 μs=141975 Контрольная сумма=2465782300
2019.01.05 02:30:04.432 TestLog (.BrentCrud,H4) _FastLog2: loops=100000000 μs=198015 Контрольная сумма=2465782300
And second place is still _FastLog2, not log2 :))
 
Nikolai Semko:

It's just an idle test. An important point is often forgotten in performance tests: if the calculated value is not used anywhere, the compiler simply doesn't perform the computation.

It makes sense, what's the point? It's like in quantum superposition. Why should the moon exist if no one is looking at it. "Does the moon exist just because a mouse is looking at it?" (Albert Einstein). :))

So this version of the test with the checksum calculation and printing it will be more correct:

Your code is tangled. The variables used in the define are located at the other end of the program code - it's not convenient to sort through such chaos. But that's not the point. The point is that the results of your tests cannot be considered reliable because the compiler knows in advance the algorithm of values passed into the function. So it optimizes your tests. You should compute on random numbers .

By the way, why do you have srand in your code? When I saw it, at first I thought you were using random, but actually you're not.

Here is my code:

void OnStart()
{
  Print("OnStart");
  srand(GetTickCount());
  int count= 50000000;
   
  #define  TEST(func) { \ 
    ulong sum=0; \
    ulong mcscount= GetMicrosecondCount(); \
    for (int i=0; i<count; i++) \
      sum += func(rand() | rand()<<15); \
    Print("Result "+#func+":  sum=",sum,"  time=",(GetMicrosecondCount()-mcscount)/1000," ms"); \    
  }
  
   TEST(log2);
  TEST(log2_);
}
 
Alexey Navoykov:

Your code is confusing. The variables used in the define are located at the other end of the program code - it is inconvenient to sort through such chaos. But this is not the point, the point is that your test results cannot be considered reliable, because the compiler knows in advance the algorithm of values passed into the function. Therefore it optimizes your tests. You should calculate on random numbers .

By the way, why do you have srand in your code? When I saw it, at first I thought you were using random, but actually you're not.

Here is my code:

the code is not mine. I just tweaked it and removed rand to check the same checksums and remove relatively expensive rand function from the loop, but I simply forgot to remove srand.

I return rand. You're right - the compiler does optimize the loop for the sum of logarithms from consecutive values. I'm surprised, though. I don't understand how it does that. Perhaps there's something we're not taking into account.

#property strict
#define    test(M,S,EX)        {srand(45); long sum=0; uint nn=(uint)pow(10,M); ulong mss=GetMicrosecondCount(); for(uint t12=1;t12<=nn;t12++){EX;sum+=(long)n1;} \
                                Print(S+": loops="+(string)nn+" μs="+string(GetMicrosecondCount()-mss)+" Контрольная сумма="+string(sum));}

int log2(ulong n){
  if (n==0) return -1;
  #define  S(k) if (n >= (ulong(1)<<k)) { i += k;  n >>= k; }
  int i=0;  S(32);  S(16);  S(8);  S(4);  S(2);  S(1);  return i;
  #undef  S}


static const uint ulLogTable[64] = {
0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61,
51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62,
57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56,
45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 };

uint _FastLog2(ulong ulInput){
   ulInput |= ulInput >> 1;
   ulInput |= ulInput >> 2;
   ulInput |= ulInput >> 4;
   ulInput |= ulInput >> 8;
   ulInput |= ulInput >> 16;
   ulInput |= ulInput >> 32;  
   return(ulLogTable[(uint)((ulInput * 0x03f6eaf2cd271461) >> 58)]);};
   
int log2_(ulong n)
{
  if (n==0) return -1;

  #define  M(n, i, base, S) ( n >= (ulong(1)<<(i)) ? S(n, i+base/2) : S(n, i-(base+1)/2) )

  #define  S_0(n, i)  i
  #define  S_1(n, i)  M(n, i, 1, S_0)
  #define  S_2(n, i)  M(n, i, 2, S_1)
  #define  S_4(n, i)  M(n, i, 4, S_2)
  #define  S_8(n, i)  M(n, i, 8, S_4)
  #define  S_16(n, i) M(n, i, 16, S_8)
  #define  S(n)       M(n, 32, 32, S_16)

  return S(n); 
}

void OnStart(){
  ulong n1,n;
  test(8,"MathLog",n=(rand()+1)*(rand()+1);n1=ulong(MathLog((double)n)/MathLog(2.0)))
  test(8,"log2",n=(rand()+1)*(rand()+1);n1=log2(n))
  test(8,"log2_",n=(rand()+1)*(rand()+1);n1=log2_(n))
  test(8,"_FastLog2",n=(rand()+1)*(rand()+1);n1=_FastLog2(n))}

Result:

2019.01.05 04:10:25.808 TestLog (EURUSD,H1)     MathLog:   loops=100000000 μs=1168737 Контрольная сумма=2661391201
2019.01.05 04:10:26.474 TestLog (EURUSD,H1)     log2:      loops=100000000 μs=665631  Контрольная сумма=2661391201
2019.01.05 04:10:27.315 TestLog (EURUSD,H1)     log2_:     loops=100000000 μs=841299  Контрольная сумма=2661391201
2019.01.05 04:10:27.694 TestLog (EURUSD,H1)    _FastLog2:  loops=100000000 μs=378627   Контрольная сумма=2661391201
   

The current winner is _FastLog2

 
Nikolai Semko:

Result:

Current winner _FastLog2

I wonder how you got the same checksum everywhere, if the values are random.

 
Alexey Navoykov:

I wonder how you get the same checksum everywhere, if the values are random.

srand(45) for all functions

I just did it that way at first, but got different checksums, because I didn't take into account that rand()*rand() can be 0, which breaks the checksum. Now I added one to get away from zero.

 
Nikolai Semko:

srand(45) for all functions

I just did the same at first, but got different checksums, because I didn't take into account that rand()*rand() can be 0, which breaks the checksum. Now I added one to get away from zero.

And why do you need the same checksum if we're talking specifically about speed measurements? The point of the sum in this case is simply to prevent the compiler from cutting the code, that's all. And by doing srand(45), you again allow to optimize the test.

By the way, speaking of zero, FastLog2 doesn't check for zero, which gives it a head start. But it's still one and a half to two times slower than log2 if tested correctly).

 
Alexey Navoykov:
And why do you need the same checksum if we are speaking specifically about speed measurements? The point of the sum in this case is simply to prevent the compiler from cutting the code, that's all. And doing srand(45), again, allows you to optimize the test.

You are overestimating the capabilities of the compiler here. Remove srand(45) - checksums will be different, but the speed result remains the same.

Moreover, I was guided by the fact that the calculations were the same for the sake of purity of the experiment, because I didn't go into details of all the functions. Sometimes the value of a function parameter can affect the time of its execution.

All the more reason to check the correctness of the algorithms.

Reason: