Which is faster - Floating-Point or Integer arithmetic? - page 13

 
Fernando Carreiro:

OK! Here it is and I will leave it up to you to draw your own conclusions:

Hello,
Yep, it seems there is no gain from going 32-bit; but we should note that float/double division is slower than integer division. Also, the slower floats in your example suggest that on modern hardware it may be better to work at the native bit width. However, one should keep in mind that narrower types, e.g. ints or shorts, offer the chance to vectorise and gain additional speed.

Regards
 
Demos Stogios: Yep, it seems there is no gain from going 32-bit; but we should note that float/double division is slower than integer division. Also, the slower floats in your example suggest that on modern hardware it may be better to work at the native bit width. However, one should keep in mind that narrower types, e.g. ints or shorts, offer the chance to vectorise and gain additional speed.
Yes, I agree! The best approach is for the coder/user to always test their own implementations and decide which is best for the individual case!
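
For anyone who wants to run that kind of test outside MetaTrader, here is a minimal sketch in C++ (MQL5 is syntactically close to it). It is not the script used in this thread: `BenchResult`, `run_benchmark` and `run_demo` are hypothetical names, and the loop body is just one multiply, one add and one divide chosen so the result is easy to verify.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>

// Result of one timed run: the final accumulator (kept so the loop has an
// observable effect) and the elapsed wall-clock time in milliseconds.
template <typename T>
struct BenchResult {
    T value;
    long long ms;
};

// Hypothetical helper, not taken from the thread's scripts.
// Each step computes (acc * x + x) / x, which equals acc + 1, so the loop
// performs one multiply, one add and one divide in type T per iteration.
// Keep `iterations` at or below 1,000,000 so the int version cannot overflow.
template <typename T>
BenchResult<T> run_benchmark(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    T acc = static_cast<T>(1);
    for (int i = 0; i < iterations; ++i) {
        const T x = static_cast<T>(i % 1000 + 1);  // divisor in 1..1000, never zero
        acc = (acc * x + x) / x;
    }
    const auto stop = std::chrono::steady_clock::now();
    const long long ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return BenchResult<T>{acc, ms};
}

// Prints one line per type, loosely following the log format used in this thread.
void run_demo(int iterations) {
    const BenchResult<int> r_int = run_benchmark<int>(iterations);
    const BenchResult<double> r_dbl = run_benchmark<double>(iterations);
    assert(r_int.value == iterations + 1);                       // sanity check
    assert(r_dbl.value == static_cast<double>(iterations + 1));  // sanity check
    std::cout << "<int>: " << r_int.ms << " ms for " << iterations << " iterations\n";
    std::cout << "<double>: " << r_dbl.ms << " ms for " << iterations << " iterations\n";
}
```

Calling `run_demo(1000000)` from `main` prints the two timings. An optimizing compiler may simplify or vectorise such a loop, so the numbers say as much about the compiler as about the hardware, which is one more reason to test the real code rather than rely on a toy loop.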
 
Alain Verleyen:

So I ran the scripts (1E8 iterations), once again :-D

We can easily see when the script started and finished, and that all 4 logical processors were used at 100%. I also checked the MT5 threads, and I confirm there was only 1 additional thread while the script was running.

Conclusion: 1 thread can use 100% of all cores. I was not aware of that.


What I couldn't understand was why the execution time is multiplied by 100 when the iterations are just multiplied by 10 (hardcoded in script, so recompiled between the 2 runs).

2018.01.16 17:54:06.223 224626_2 (NZDUSD,H1) <int>: 37 ms for 10000000 iterations

2018.01.16 17:54:06.251 224626_2 (NZDUSD,H1) <double>: 27 ms for 10000000 iterations


2018.01.16 17:59:14.606 224626_2 (NZDUSD,H1) <int>: 3545 ms for 100000000 iterations

2018.01.16 17:59:18.672 224626_2 (NZDUSD,H1) <double>: 4062 ms for 100000000 iterations

So I changed the script a bit to use an input parameter to select the iteration count, instead of it being hardcoded. And surprise:

2018.01.16 18:04:02.855 224626_2 (NZDUSD,H1) <int>: 34 ms for 10000000 iterations

2018.01.16 18:04:02.887 224626_2 (NZDUSD,H1) <double>: 31 ms for 10000000 iterations


2018.01.16 18:03:53.974 224626_2 (NZDUSD,H1) <int>: 413 ms for 100000000 iterations

2018.01.16 18:03:54.423 224626_2 (NZDUSD,H1) <double>: 449 ms for 100000000 iterations
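
To put numbers on the difference, the growth factors can be worked out from the log lines above. This is only arithmetic on the quoted timings; `growth` and `print_growth` are hypothetical helper names. With the hardcoded count the run time grows roughly 96x (int) and 150x (double) for 10x more iterations, while with the input parameter it grows roughly 12x and 14x.

```cpp
#include <cassert>
#include <iostream>

// Factor by which the run time grew between the short run and the long run.
double growth(double ms_small, double ms_large) {
    assert(ms_small > 0.0);  // avoid dividing by zero
    return ms_large / ms_small;
}

// Prints the growth factors for the four pairs of log lines quoted above
// (each pair goes from 1e7 to 1e8 iterations, i.e. 10x more work).
void print_growth() {
    std::cout << "hardcoded count, <int>:    x" << growth(37.0, 3545.0) << "\n";
    std::cout << "hardcoded count, <double>: x" << growth(27.0, 4062.0) << "\n";
    std::cout << "input parameter, <int>:    x" << growth(34.0, 413.0) << "\n";
    std::cout << "input parameter, <double>: x" << growth(31.0, 449.0) << "\n";
}
```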

So there was also an MT5 compiler issue. :-)

P.S: My system workload was very low during these tests.

So in the end it was not a compiler issue, but a memory issue. The "slow" results I had were due to disk usage (virtual memory). A bit stupid that I didn't think of that sooner.

Thanks to @Fernando Carreiro for his collaboration on his instructive thread.

PS: As a side note, a computer with 4 GB of RAM under Windows 10 is not usable without virtual memory enabled.
 
Alain Verleyen:

So in the end it was not a compiler issue, but a memory issue. The "slow" results I had were due to disk usage (virtual memory). A bit stupid that I didn't think of that sooner.

Thanks to @Fernando Carreiro for his collaboration on his instructive thread.

PS: As a side note, a computer with 4 GB of RAM under Windows 10 is not usable without virtual memory enabled.

Good to know :) As for RAM, well, when I start to feel the heat from the lack of physical RAM, I prefer to back off rather than keep adding to the load and torturing my SSD at the same time. And yes, I am on 4 GB as well, hehe.

 
Comments that do not relate to this topic have been moved to "double variable with wrong decimal cases".
 

For those interested in efficient code and how MQL4/5 code is optimized by the compiler for modern CPUs/GPUs, there is a very interesting discussion on the Russian forum about calculating square roots directly versus accessing an array of pre-calculated values.

Forum on trading, automated trading systems and testing of trading strategies

Errors, bugs, questions

Renat Fatkhullin, 2018.03.13 22:59

The check showed that:

  1. SQRT calls map to direct CPU instructions

  2. SQRT plus the mathematical calculations run without branching, and a single instruction (128-bit data) computes two roots at once

    This code turns into the following SSE assembly code:
             D1=sqrt((X1-X)*(X1-X)+(Y1-Y)*(Y1-Y));
             D2=sqrt((X2-X)*(X2-X)+(Y2-Y)*(Y2-Y));
             D3=sqrt((X3-X)*(X3-X)+(Y3-Y)*(Y3-Y));
             D4=sqrt((X4-X)*(X4-X)+(Y4-Y)*(Y4-Y));
             D5=sqrt((X5-X)*(X5-X)+(Y5-Y)*(Y5-Y));
             D6=sqrt((X6-X)*(X6-X)+(Y6-Y)*(Y6-Y));
             D7=sqrt((X7-X)*(X7-X)+(Y7-Y)*(Y7-Y));
             D8=sqrt((X8-X)*(X8-X)+(Y8-Y)*(Y8-Y));
            ...
            sqrtsd  xmm1, xmm1
            unpcklpd        xmm4, xmm4
            movapd  xmm3, xmmword ptr [rsp + 432]
            unpcklpd        xmm3, xmmword ptr [rsp + 384]
            subpd   xmm3, xmm4
            mulpd   xmm3, xmm3
            unpcklpd        xmm0, xmm0
            movapd  xmm5, xmmword ptr [rsp + 416]
            unpcklpd        xmm5, xmmword ptr [rsp + 400]
            subpd   xmm5, xmm0
            mulpd   xmm5, xmm5
            addpd   xmm5, xmm3
            sqrtpd  xmm8, xmm5
            movapd  xmm5, xmmword ptr [rsp + 464]
            subpd   xmm5, xmm4
            mulpd   xmm5, xmm5
            movapd  xmm7, xmm9
            subpd   xmm7, xmm0
            mulpd   xmm7, xmm7
            addpd   xmm7, xmm5
            movapd  xmm6, xmm10
            unpcklpd        xmm6, xmm11
            subpd   xmm6, xmm4
            movapd  xmm3, xmmword ptr [rsp + 368]
            unpcklpd        xmm3, xmmword ptr [rsp + 352]
            subpd   xmm3, xmm0
            movapd  xmm4, xmm8
            shufpd  xmm4, xmm4, 1
            sqrtpd  xmm5, xmm7
            mulpd   xmm6, xmm6
            mulpd   xmm3, xmm3
            addpd   xmm3, xmm6
            sqrtpd  xmm15, xmm3
            movapd  xmm0, xmm14
            unpcklpd        xmm0, xmmword ptr [rsp + 336]
            subpd   xmm0, xmm2
            mulpd   xmm0, xmm0
            movapd  xmm2, xmm0
            shufpd  xmm2, xmm2, 1
            addsd   xmm2, xmm0
            movapd  xmm0, xmm15
            shufpd  xmm0, xmm0, 1
            sqrtsd  xmm12, xmm2
    This is actually a work of art. 8 roots were computed in 4 calls of an assembly instruction. Two double values were computed per call.

  3. With array-based operations everything goes the regular way, with checks, branches and losses on the double -> integer index conversion

  4. When working with arrays in this example there is constant mixing of FPU/ALU operations, which is very bad for performance

  5. The optimization of dynamic array access is excellent, beyond praise. But the mixing of FPU/ALU operations + the double -> integer conversion + the branches cost time

Overall conclusion: the math in MQL5 wins thanks to ideal optimization. It is not that the arrays lose here, but that the math wins.

Conclusion: calculating square roots directly is more efficient.
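
To make the two approaches concrete, here is a rough sketch in C++ (not the MQL5 code from the Russian thread). `distance_direct`, `build_sqrt_table` and `distance_lookup` are hypothetical names, and the table variant assumes integer-valued coordinates, so that the squared distance can serve as an array index. The lookup path contains exactly the steps listed in points 3 to 5: a range check (a branch), a double -> integer conversion, and an array read.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct route: pure floating-point math, no branches, no index conversion.
double distance_direct(double x, double y, double x1, double y1) {
    const double dx = x1 - x;
    const double dy = y1 - y;
    return std::sqrt(dx * dx + dy * dy);
}

// Hypothetical pre-calculated table: roots[i] == sqrt(i) for i in [0, size).
std::vector<double> build_sqrt_table(std::size_t size) {
    std::vector<double> roots(size);
    for (std::size_t i = 0; i < size; ++i) {
        roots[i] = std::sqrt(static_cast<double>(i));
    }
    return roots;
}

// Table route: exact only when the squared distance is a whole number
// (assumption: integer-valued coordinates).
double distance_lookup(const std::vector<double>& roots,
                       double x, double y, double x1, double y1) {
    const double dx = x1 - x;
    const double dy = y1 - y;
    const double d2 = dx * dx + dy * dy;
    // Range check (a branch): fall back to sqrt when d2 is outside the table.
    if (!(d2 < static_cast<double>(roots.size()))) {
        return std::sqrt(d2);
    }
    const std::size_t idx = static_cast<std::size_t>(d2);  // double -> integer index
    return roots[idx];
}
```

For example, with a table built by `build_sqrt_table(100)`, both `distance_direct(0, 0, 3, 4)` and `distance_lookup(table, 0, 0, 3, 4)` return 5. Which one is faster depends on the compiler and CPU, so it is worth timing both in your own code, as suggested earlier in this thread.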