Which is faster - Floating-Point or Integer arithmetic? - Expert Advisors and Automated Trading - MQL5 programming forum

Demos Stogios 2018.01.17 14:03 #121

Fernando Carreiro:

OK! Here it is and I will leave it up to you to draw your own conclusions:

Hello,

Yep, it seems there is no gain from going 32-bit. We should note, though, that float/double division is slower than integer division. Also, the slower floats in your example suggest that on modern hardware it may be better to work at the native bit width. However, keep in mind that narrower types, e.g. ints or shorts, offer a chance to vectorise and gain additional speed.

Regards
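To make the vectorisation point concrete, here is a small C++ sketch (C++ rather than MQL5, and not taken from the scripts in this thread). It only counts how many values of a given type fit into one 128-bit SSE register; the names `LanesPerRegister` and `SumShorts` are made up for illustration. Whether a compiler actually vectorises a given loop depends on the compiler and its settings, so treat this as a way to reason about the potential gain, not as a measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size in bytes of one 128-bit SSE (xmm) register.
constexpr std::size_t kSseRegisterBytes = 16;

// Number of elements of type T that fit into one 128-bit register.
// Hypothetical helper, for illustration only.
template <typename T>
constexpr std::size_t LanesPerRegister() {
    return kSseRegisterBytes / sizeof(T);
}

// Sums 16-bit values into a 32-bit total. A loop this simple is a typical
// candidate for auto-vectorisation, but that is up to the compiler.
std::int32_t SumShorts(const std::int16_t* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    std::int32_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

On common platforms, where `double` is 8 bytes and `short` is 2 bytes, `LanesPerRegister<double>()` gives 2 and `LanesPerRegister<std::int16_t>()` gives 8, which is the kind of headroom the narrower types offer if the loop does get vectorised.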


Fernando Carreiro 2018.01.17 14:17 #122

Demos Stogios: Yep, it seems there is no gain from going 32-bit. We should note, though, that float/double division is slower than integer division. Also, the slower floats in your example suggest that on modern hardware it may be better to work at the native bit width. However, keep in mind that narrower types, e.g. ints or shorts, offer a chance to vectorise and gain additional speed.

Yes, I agree! The best approach is for coders/users to always test their own implementations and decide which is best for their individual case!

Alain Verleyen 2018.01.17 15:35 #123

Alain Verleyen:

So I ran the scripts (1E8 iterations), once again :-D

We can easily see when the script started and finished, and that all 4 logical processors were used at 100%. I also checked the MT5 threads, and I confirm there was only 1 additional thread while the script was running.

Conclusion: 1 thread can use 100% of all cores. I was not aware of that.

What I couldn't understand was why the execution time is multiplied by 100 when the iterations are just multiplied by 10 (hardcoded in script, so recompiled between the 2 runs).

2018.01.16 17:54:06.223 224626_2 (NZDUSD,H1) <int>: 37 ms for 10000000 iterations

2018.01.16 17:54:06.251 224626_2 (NZDUSD,H1) <double>: 27 ms for 10000000 iterations

2018.01.16 17:59:14.606 224626_2 (NZDUSD,H1) <int>: 3545 ms for 100000000 iterations

2018.01.16 17:59:18.672 224626_2 (NZDUSD,H1) <double>: 4062 ms for 100000000 iterations

So I changed the script a bit to use an input parameter to select the iteration count, instead of it being hardcoded. And surprise:

2018.01.16 18:04:02.855 224626_2 (NZDUSD,H1) <int>: 34 ms for 10000000 iterations

2018.01.16 18:04:02.887 224626_2 (NZDUSD,H1) <double>: 31 ms for 10000000 iterations

2018.01.16 18:03:53.974 224626_2 (NZDUSD,H1) <int>: 413 ms for 100000000 iterations

2018.01.16 18:03:54.423 224626_2 (NZDUSD,H1) <double>: 449 ms for 100000000 iterations

So there was also an MT5 compiler issue. :-)

P.S: My system workload was very low during these tests.

So in the end it was not a compiler issue, but a memory issue. The "slow" results I had were due to disk usage (virtual memory). A bit stupid of me not to think of that sooner.

Thanks to @Fernando Carreiro for his collaboration on his instructive thread.

PS: As a side note, a computer with 4 GB of RAM under Windows 10 is not usable without virtual memory enabled.
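For anyone who wants to repeat this kind of test, here is a hypothetical C++ sketch (C++ rather than MQL5, and not the script used above). It takes the iteration count as a runtime parameter and reports how much memory the run needs. It assumes the benchmark allocates a buffer with one element per iteration; that is my assumption, not something shown in the logs. The names `BufferBytes`, `Timing` and `TimeFillAndSum` are invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for a buffer holding one element of type T per iteration.
// Assumption: the benchmark's memory use grows with the iteration count.
template <typename T>
constexpr std::uint64_t BufferBytes(std::uint64_t iterations) {
    return iterations * sizeof(T);
}

// Result of one timed run: elapsed wall-clock time and a checksum that
// keeps the compiler from discarding the work.
template <typename T>
struct Timing {
    long long elapsed_ms;
    T checksum;
};

// Fills a buffer of `iterations` elements and sums it, timing the whole run.
// The iteration count is a runtime argument, not a compile-time constant.
template <typename T>
Timing<T> TimeFillAndSum(std::size_t iterations) {
    assert(iterations > 0);  // a zero-length run measures nothing
    const auto start = std::chrono::steady_clock::now();
    std::vector<T> buffer(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        buffer[i] = static_cast<T>(i % 7 + 1);  // small values, no overflow
    }
    T checksum = 0;
    for (const T value : buffer) {
        checksum += value;
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    return {static_cast<long long>(elapsed.count()), checksum};
}
```

Under that assumption, 1E8 doubles need `BufferBytes<double>(100000000)` = 800,000,000 bytes. On a 4 GB machine that is enough to push the system into virtual memory, so it is worth checking the buffer size against free RAM before trusting the timings.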


Demos Stogios 2018.01.17 18:12 #124

Alain Verleyen:

So in the end it was not a compiler issue, but a memory issue. The "slow" results I had were due to disk usage (virtual memory). A bit stupid of me not to think of that sooner.

Thanks to @Fernando Carreiro for his collaboration on his instructive thread.

PS: As a side note, a computer with 4 GB of RAM under Windows 10 is not usable without virtual memory enabled.

Good to know :) As for RAM, well, when I start to feel the heat, i.e. the lack of physical RAM, I prefer to back off rather than keep adding to the load and torturing my SSD at the same time. And, yes, I am on 4 GB as well, hehe.


Alain Verleyen 2018.01.18 21:08 #125

Comments that do not relate to this topic have been moved to "double variable with wrong decimal cases".

Alain Verleyen 2018.03.14 10:09 #126

For those interested in efficient code and how mql4/5 code is optimized by the compiler for modern CPU/GPU, there is a very interesting discussion on the Russian forum about calculating square roots directly versus looking them up in an array of pre-calculated values.

Forum on trading, automated trading systems and testing of trading strategies

Errors, bugs, questions

Renat Fatkhullin, 2018.03.13 22:59

The check showed that:

SQRT calls are mapped to direct CPU instructions

SQRT + the mathematical calculations run without branching, and a single instruction (on 128-bit data) computes two roots at once

This code turns into the following SSE assembly code:

         D1=sqrt((X1-X)*(X1-X)+(Y1-Y)*(Y1-Y));
         D2=sqrt((X2-X)*(X2-X)+(Y2-Y)*(Y2-Y));
         D3=sqrt((X3-X)*(X3-X)+(Y3-Y)*(Y3-Y));
         D4=sqrt((X4-X)*(X4-X)+(Y4-Y)*(Y4-Y));
         D5=sqrt((X5-X)*(X5-X)+(Y5-Y)*(Y5-Y));
         D6=sqrt((X6-X)*(X6-X)+(Y6-Y)*(Y6-Y));
         D7=sqrt((X7-X)*(X7-X)+(Y7-Y)*(Y7-Y));
         D8=sqrt((X8-X)*(X8-X)+(Y8-Y)*(Y8-Y));

        ...
        sqrtsd  xmm1, xmm1
        unpcklpd        xmm4, xmm4
        movapd  xmm3, xmmword ptr [rsp + 432]
        unpcklpd        xmm3, xmmword ptr [rsp + 384]
        subpd   xmm3, xmm4
        mulpd   xmm3, xmm3
        unpcklpd        xmm0, xmm0
        movapd  xmm5, xmmword ptr [rsp + 416]
        unpcklpd        xmm5, xmmword ptr [rsp + 400]
        subpd   xmm5, xmm0
        mulpd   xmm5, xmm5
        addpd   xmm5, xmm3
        sqrtpd  xmm8, xmm5
        movapd  xmm5, xmmword ptr [rsp + 464]
        subpd   xmm5, xmm4
        mulpd   xmm5, xmm5
        movapd  xmm7, xmm9
        subpd   xmm7, xmm0
        mulpd   xmm7, xmm7
        addpd   xmm7, xmm5
        movapd  xmm6, xmm10
        unpcklpd        xmm6, xmm11
        subpd   xmm6, xmm4
        movapd  xmm3, xmmword ptr [rsp + 368]
        unpcklpd        xmm3, xmmword ptr [rsp + 352]
        subpd   xmm3, xmm0
        movapd  xmm4, xmm8
        shufpd  xmm4, xmm4, 1
        sqrtpd  xmm5, xmm7
        mulpd   xmm6, xmm6
        mulpd   xmm3, xmm3
        addpd   xmm3, xmm6
        sqrtpd  xmm15, xmm3
        movapd  xmm0, xmm14
        unpcklpd        xmm0, xmmword ptr [rsp + 336]
        subpd   xmm0, xmm2
        mulpd   xmm0, xmm0
        movapd  xmm2, xmm0
        shufpd  xmm2, xmm2, 1
        addsd   xmm2, xmm0
        movapd  xmm0, xmm15
        shufpd  xmm0, xmm0, 1
        sqrtsd  xmm12, xmm2

This is actually a work of art. 8 roots are computed in 4 calls of the assembly instruction. Two double values are computed per call.

With the array-based approach everything runs the ordinary way, with checks, branches and losses on the double -> integer index conversion
When working with arrays in this example, FPU and ALU operations are constantly mixed, which is very bad for performance
The optimization of dynamic array access is excellent, beyond praise. But the mixing of FPU/ALU operations + the double -> integer conversion + the branches cost time

Overall conclusion: math in MQL5 wins thanks to ideal optimization. It is not that arrays lose here, it is that math wins.

Conclusion: calculating square roots directly is more efficient.
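To illustrate the two approaches being compared, here is a minimal C++ sketch (C++ rather than MQL5, and not the code from the Russian thread). It assumes a table of pre-calculated square roots indexed by the truncated squared distance, which is my guess at the general shape of the lookup variant. The names `DistanceDirect`, `BuildSqrtTable` and `DistanceLookup` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct variant: pure floating-point math, no branches, no index conversion.
double DistanceDirect(double x1, double y1, double x, double y) {
    const double dx = x1 - x;
    const double dy = y1 - y;
    return std::sqrt(dx * dx + dy * dy);
}

// Pre-calculates sqrt(i) for every integer i in [0, max_value].
std::vector<double> BuildSqrtTable(std::size_t max_value) {
    std::vector<double> table(max_value + 1);
    for (std::size_t i = 0; i <= max_value; ++i) {
        table[i] = std::sqrt(static_cast<double>(i));
    }
    return table;
}

// Lookup variant: needs a range check (a branch) and a double -> integer
// conversion before the array access. The index is truncated, so results
// are exact only when the squared distance is a whole number.
double DistanceLookup(const std::vector<double>& table,
                      double x1, double y1, double x, double y) {
    assert(!table.empty());
    const double dx = x1 - x;
    const double dy = y1 - y;
    const double squared = dx * dx + dy * dy;
    // Fall back to direct math when the value is outside the table (or NaN).
    if (!(squared < static_cast<double>(table.size()))) {
        return std::sqrt(squared);
    }
    const std::size_t index = static_cast<std::size_t>(squared);
    return table[index];
}
```

The lookup variant contains exactly the extra steps named in the quoted post: a check, a branch and a double -> integer conversion mixed into floating-point code. Which one is faster on a given machine still has to be measured.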

