Possible conditional check error - page 4

 
PlummersSoftwareLLC/Primes
  • github.com
Prime Number Projects in C#/C++/Python. Contribute to PlummersSoftwareLLC/Primes development by creating an account on GitHub.
 
    eval = (data[cnt] > 0x7F);
    left[left_ptr] = (data[cnt] * eval) + (left[left_ptr] * !(eval)); 
    left_ptr += eval;

The result of the expression is assigned to variable 'eval', so it is OK.

But,

    ((test1) || ((test2) && (MyConditionFunc() == 1)) ) && MyTestFunc();

The compiler issues the warning "result of expression not used" because the value of the expression is not assigned to any variable. The expression is still evaluated, but its result is discarded.

Look up statements vs. expressions to understand the difference.

The boolean expression is evaluated according to short-circuit evaluation.

https://en.wikipedia.org/wiki/Short-circuit_evaluation
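
To make the short-circuit behaviour concrete, here is a minimal C++ sketch (C++ rather than MQL5, so it can be compiled anywhere). The bodies of MyConditionFunc() and MyTestFunc() and the g_calls counter are assumptions made up for illustration; only the shape of the expression comes from the post above.

```cpp
// Hypothetical counter, used only to observe how many helper calls are made.
static int g_calls = 0;

// Illustrative stand-ins for the functions named in the expression above.
int  MyConditionFunc() { ++g_calls; return 1; }
bool MyTestFunc()      { ++g_calls; return true; }

// Returning (or assigning) the value uses the result of the expression,
// so the compiler has no reason to warn about an unused expression.
bool Evaluate(bool test1, bool test2)
{
    // If test1 is true, the right side of || is skipped entirely.
    // If test2 is false, MyConditionFunc() is never called.
    // MyTestFunc() is called only when the left side of && is true.
    return (test1 || (test2 && MyConditionFunc() == 1)) && MyTestFunc();
}
```

With test1 true, only MyTestFunc() runs; with both flags false, neither helper runs; with only test2 true, both helpers run.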


BTW: branchless code always executes faster than code with branching. It is a fact, but this is another topic.

Your code still has branching. To convert it to branchless, use:

eval = (uchar) ((data[cnt] - 0x7F) >> 31);

Edit:

eval = (data[cnt] - 0x80) >> 31;
right[right_ptr] = (data[cnt] * eval) + (right[right_ptr] * !(eval));
right_ptr += eval;

left[left_ptr] = (data[cnt] * !(eval)) + (left[left_ptr] * eval);
left_ptr += !eval;


The difference found in the benchmarks is due to different conditions for branching plus some compiler optimizations.

 
Yes, very good example, although I think you mean shift right by 7 bits, not 31 bits.


 
The trick here is that if data[cnt] >= 128, then data[cnt] - 128 is nonnegative, otherwise it is negative. The highest bit in an int, the sign bit (bit 31), is 1 if and only if that number is negative. So shifting right by 31 makes the whole result 0 if it used to be nonnegative, and 1 if it used to be negative.
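
The sign-bit trick can be sketched in C++ as follows. This is an illustration under stated assumptions, not the code from the attached script: the names BelowThreshold and Partition are made up here, the data is assumed to be unsigned bytes, and the difference is converted to an unsigned 32-bit value before shifting so that the shift is logical and yields exactly 0 or 1 (an arithmetic shift of a negative signed int would typically give -1 instead).

```cpp
#include <cstddef>
#include <cstdint>

// Returns 1 when value < 128 and 0 otherwise, without a comparison.
inline uint32_t BelowThreshold(uint8_t value)
{
    // The difference lies in [-128, 127]; it is negative exactly when value < 128.
    int32_t diff = static_cast<int32_t>(value) - 0x80;
    // After the conversion to unsigned, >> 31 leaves only the former sign bit.
    return static_cast<uint32_t>(diff) >> 31;
}

// Branchless split: bytes < 128 go to right[], bytes >= 128 go to left[].
// Both output buffers must be initialised and hold at least count elements,
// because the slot at the current index is read and rewritten every iteration.
void Partition(const uint8_t* data, size_t count,
               uint8_t* left, size_t& left_ptr,
               uint8_t* right, size_t& right_ptr)
{
    for (size_t cnt = 0; cnt < count; ++cnt) {
        uint32_t eval = BelowThreshold(data[cnt]);
        // Either store the new byte or write the old slot value back.
        right[right_ptr] = static_cast<uint8_t>(data[cnt] * eval + right[right_ptr] * !eval);
        right_ptr += eval;
        left[left_ptr] = static_cast<uint8_t>(data[cnt] * !eval + left[left_ptr] * eval);
        left_ptr += !eval;
    }
}
```

Every iteration performs the same loads, multiplies and stores regardless of the data, which is what makes the loop body branch-free.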

 

Not all branching is bad. In addition to compiler optimizations, modern CPUs have branch predictors. This is a hardware optimization for if...else statements.

Please refer to this link for a nice explanation about branch prediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array

Keep in mind, by using this

eval = (data[cnt] - 0x80) >> 31;

you effectively give up the compiler optimizations and the CPU branch predictor. Plus, you sacrifice code readability for performance. So it is best reserved for performance-critical loops.

Why is processing a sorted array faster than processing an unsorted array?
  • 2012.06.27
  • GManNickG
  • stackoverflow.com
Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data miraculously makes the code almost six times faster:
 
amrali:
...


BTW: branchless code always executes faster than code with branching. It is a fact, but this is another topic.

Your code still has branching. To convert it to branchless, use:

Edit:


The difference found in the benchmarks is due to different conditions for branching plus some compiler optimizations.

Your example code actually proves that your statement is false.

2021.07.11 06:03:08.015    372818 (EURUSD,M1)    BINARY SHIFT Loop time: 2.4 nanosec; total time: 40982 microsec. Left = 1606418432 Right = 532676545

And the ternary operator is way faster (why?), even though it uses branching. It all depends on the compiler, but unfortunately we can't see the assembler code.
 
Alain Verleyen:

Your example code actually proves that your statement is false.

2021.07.11 06:03:08.015    372818 (EURUSD,M1)    BINARY SHIFT Loop time: 2.4 nanosec; total time: 40982 microsec. Left = 1606418432 Right = 532676545

And the ternary operator is way faster (why?), even though it uses branching. It all depends on the compiler, but unfortunately we can't see the assembler code.

You know, Alain, I tested it on a laptop with a Core i3 and the branchless code was faster. This shows that it greatly depends on the L1 and L2 caches and pipeline capacity, among other factors such as compiler optimizations.

 
Anyway, these are broad guidelines, and there will always be other factors in the equation. However, good algorithm design and good programming patterns are indispensable.
 
amrali:
You know, Alain, I tested it on a laptop with a Core i3 and the branchless code was faster. This shows that it greatly depends on the L1 and L2 caches and pipeline capacity, among other factors such as compiler optimizations.

Interesting. Would you mind posting the results?

My setup:

2021.07.09 15:36:53.702 Terminal Windows 10 build 19042, Intel Core i7-9750H  @ 2.60GHz, 12 / 15 Gb memory, 62 / 279 Gb disk, IE 11, UAC, GMT-5


Files:
372818.mq5  6 kb
 
Alain Verleyen:

Interesting. Would you mind posting the results?

My setup:

2021.07.09 15:36:53.702 Terminal Windows 10 build 19042, Intel Core i7-9750H  @ 2.60GHz, 12 / 15 Gb memory, 62 / 279 Gb disk, IE 11, UAC, GMT-5


Hi Alain, I have checked your code. With the ternary operator, you are actually comparing apples to oranges. Your code increments the left and right pointers unconditionally, which is reflected in your shorter execution time. The original code increments the pointers based on the condition.
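
The difference between the two kinds of loop can be illustrated with a small C++ sketch. It is a hypothetical reconstruction, not the code from the attached 372818.mq5: the Counts struct and both function names are assumptions, and only the pointer bookkeeping is modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical pair of output positions, standing in for left_ptr and right_ptr.
struct Counts { size_t left; size_t right; };

// Conditional variant: a pointer advances only when its side receives the byte.
Counts SplitConditional(const uint8_t* data, size_t count)
{
    Counts c{0, 0};
    for (size_t cnt = 0; cnt < count; ++cnt) {
        if (data[cnt] > 0x7F) ++c.left;
        else                  ++c.right;
    }
    return c;
}

// Unconditional variant: both pointers advance on every iteration, so the
// loop does not depend on the data and does less work per element.
Counts SplitUnconditional(const uint8_t* data, size_t count)
{
    (void)data;  // the data never influences the pointer positions here
    Counts c{0, 0};
    for (size_t cnt = 0; cnt < count; ++cnt) {
        ++c.left;
        ++c.right;
    }
    return c;
}
```

For the same input the two variants end with different pointer positions, so their timings measure different work and cannot be compared directly.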

Here are my results after I made some corrections.

Unconcditional Loop time: 4.9 nanosec; total time: 81974 microsec. Left = 1606418432 Right = 532676574
IF IF Loop time: 6.6 nanosec; total time: 110222 microsec. Left = 1606418432 Right = 532676574
IF ELSE IF Loop time: 6.6 nanosec; total time: 110091 microsec. Left = 1606418432 Right = 532676574
IF ELSE Loop time: 6.6 nanosec; total time: 110222 microsec. Left = 1606418432 Right = 532676574
TERNARY ? Loop time: 0.6 nanosec; total time: 9404 microsec. Left = 1606418432 Right = 532676574
BINARY SHIFT Loop time: 4.5 nanosec; total time: 76047 microsec. Left = 1606418432 Right = 532676574
BINARY SHIFT APPLES Loop time: 0.6 nanosec; total time: 10869 microsec. Left = 1606418432 Right = 532676574


Terminal MetaTrader 5 x64 build 2994 started for MetaQuotes Software Corp.
Terminal Windows 7 Service Pack 1 build 7601, Intel Core i3-2330M  @ 2.20GHz, 2 / 3 Gb memory, 9 / 29 Gb disk, IE 8, Admin, GMT+2
Files:
372818.mq5  7 kb