check pointer by "!" - page 7

 
Alain Verleyen #:

Though, with a little improvement 😄

2025.09.30 09:18:48.828    CCollection::benchmarkNot                                 23914 µs, sum = 333795028; Null-Values: 9332204; InstancesCount: 667796
2025.09.30 09:18:48.858    CCollection::benchmarkNot_DoubleCheck            30039 µs, sum = 333795028; Null-Values: 9332204; InstancesCount: 667796
2025.09.30 09:18:48.868    CCollection::benchmarkNullOrCheck                     9694 µs, sum = 333795028; Null-Values: 9332204; InstancesCount: 667796
2025.09.30 09:18:48.879    CCollection::benchmarkNullOrCheck_NoBranch    11098 µs, sum = 333795028; Null-Values: 9332204; InstancesCount: 667796
2025.09.30 09:18:48.888    CCollection::benchmarkNullOrCheck_NoBranchAV  9201 µs, sum = 333795028; Null-Values: 9332204; InstancesCount: 667796

PS: Result obtained with AVX512 maximum optimizations.

Though what I initially missed is that you changed my code to this:

    CCollection()
    {
        MathSrand(GetTickCount());
        for(int idx = 0; idx < _size; idx++)
        { units[idx] = ((MathRand() % 0x0F) == NULL) ? new A(idx) : NULL; }
    }

The pointer is either valid or NULL, so CheckPointer() is never triggered, because you removed the possibility of having a dangling pointer.

Re-enabling the dangling pointers changes the results.

2025.09.30 09:57:25.953    CCollection::benchmarkNot                                  22305 µs, sum = 332781055; Null-Values: 9333276; InstancesCount: 667339
2025.09.30 09:57:25.984    CCollection::benchmarkNot_DoubleCheck             30936 µs, sum = 332781055; Null-Values: 9333276; InstancesCount: 667339
2025.09.30 09:57:25.994    CCollection::benchmarkNullOrCheck                     9978 µs, sum = 332781055; Null-Values: 9333276; InstancesCount: 667339
2025.09.30 09:57:26.006    CCollection::benchmarkNullOrCheck_NoBranch     11597 µs, sum = 332781055; Null-Values: 9333276; InstancesCount: 667339
2025.09.30 09:57:26.020    CCollection::benchmarkNullOrCheck_NoBranchAV  13998 µs, sum = 332781055; Null-Values: 9333276; InstancesCount: 667339

I seriously doubt that using the ternary operator here really avoids branching.
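To make the "dangling pointer" case concrete, here is a minimal sketch of how such a pointer arises and why the two checks can disagree. The class name CItem is hypothetical and only serves the illustration; it is not part of the benchmark code.

```mql5
// Illustration only: CItem is a hypothetical class, not taken from the benchmark.
class CItem
{
public:
    int id;
    CItem(const int a_id) : id(a_id) {}
};

void OnStart()
{
    CItem* item = new CItem(1);
    delete item;   // the object is destroyed, but 'item' is not reset, so it now dangles

    // A plain NULL comparison only looks at the value stored in the pointer variable.
    const bool isNull = (item == NULL);

    // CheckPointer() asks the runtime whether that value still refers to a live object.
    const bool isInvalid = (CheckPointer(item) == POINTER_INVALID);

    PrintFormat("item==NULL: %s, CheckPointer()==POINTER_INVALID: %s",
                (string)isNull, (string)isInvalid);

    item = NULL;   // after the reset, the cheap NULL comparison is sufficient again
}
```

With a dangling pointer as described in this thread (not NULL but invalid), the first value should be false and the second true, which is why a collection that only ever holds valid or NULL pointers does not exercise the CheckPointer() part of the check.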

 
Alain Verleyen #:

Thanks for sharing a concrete example of the "no branch" paradigm.

As expected it's not better in this case 😮‍💨

What's the point of benchmarkNot_DoubleCheck()?

It's supposed to show the actual cost of the check, compared to a single check.
 
Alain Verleyen #:

PS2: There is not really an improvement if you can have a dangling pointer (a pointer that is not NULL but invalid).

I have been testing this with "No Optimization". Please be aware that the NoBranch function performs 2 checks, not one. So the gain from the NoBranch coding style is significant compared to a branching version of the same code, at least as far as I can tell.

Comparing the "Not" function with the "Not_DoubleCheck" function, you can see the actual cost of the implicit function call by subtracting one result from the other: 30039 - 23914 = 6125 µs for the second check in the loop.

The cost for the non-branching function is significantly lower, which I interpret as the gain from reducing branch-prediction misses: 11098 - 9694 = 1404 µs.


I have no idea whether this is a correct interpretation of the results, but I assume it is.

 
Alain Verleyen #:

Though what I initially missed is that you changed my code to this:

The pointer is either valid or NULL, so CheckPointer() is never triggered, because you removed the possibility of having a dangling pointer.

Re-enabling the dangling pointers changes the results.

I seriously doubt that using the ternary operator here really avoids branching.

I have updated the code according to your functional proposal, and added more "force-to-do-the-loop" code:

//+------------------------------------------------------------------+
//|                                                  Playground1.mq5 |
//|                                  Copyright 2025, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
#define _size int(1e7)

class A
{
    public:
    const long getId() const { return id; }
    
    static int getInstancesCount() { return instancesCount; }
    
    static int instancesCount;
    static int invalid_ptrs;
    static int valid_ptrs;
    const int  id;

    public:
    A(int a_id) : id(a_id) { instancesCount++; }
};

int A::instancesCount = 0;
int A::invalid_ptrs = 0;
int A::valid_ptrs = 0;


class CCollection
{
    public:
    void               benchmarkNot()
    {
        const ulong start = GetMicrosecondCount();
        long sum = 0;
        long count_null = NULL;
        for(int i = 0; i < _size; i++)
        {
            if(!units[i])
            { 
                count_null++;
                continue;
            }
            sum += units[i].getId() % (MathRand() + 1);
        }
        const ulong usElapsed = GetMicrosecondCount() - start;
        printf("%-45s %llu µs, sum = %lli; Null-Values: %lli; InstancesCount: %i", __FUNCTION__, usElapsed, sum, count_null, A::instancesCount);
    }
    
    void               benchmarkNot_DoubleCheck()
    {
        const ulong start = GetMicrosecondCount();
        long sum = 0;
        long count_null = NULL;
        for(int i = 0; i < _size; i++)
        {
            count_null += (!units[i]);
            sum += (!units[i]) ? NULL : (units[i].getId() % (MathRand() + 1));
        }
        const ulong usElapsed = GetMicrosecondCount() - start;
        printf("%-45s %llu µs, sum = %lli; Null-Values: %lli; InstancesCount: %i", __FUNCTION__, usElapsed, sum, count_null, A::instancesCount);
    }
    
    void               benchmarkNullOrCheck()
    {
        const ulong start = GetMicrosecondCount();
        long sum = 0;
        long count_null = NULL;
        for(int i = 0; i < _size; i++)
        {
            if(units[i]==NULL || CheckPointer(units[i])==POINTER_INVALID)
            {
                count_null++;
                continue;
            }
            sum += units[i].getId() % (MathRand() + 1);
        }
        const ulong usElapsed = GetMicrosecondCount() - start;
        printf("%-45s %llu µs, sum = %lli; Null-Values: %lli; InstancesCount: %i", __FUNCTION__, usElapsed, sum, count_null, A::instancesCount);
    }


    void               benchmarkNullOrCheck_NoBranch()
    {
        const ulong start = GetMicrosecondCount();
        long sum = 0;
        long count_null = NULL;
        for(int i = 0; i < _size; i++)
        {
            count_null += (units[i]==NULL || CheckPointer(units[i])==POINTER_INVALID);
            sum += (units[i]!=NULL && CheckPointer(units[i])!=POINTER_INVALID) ? (units[i].getId() %(MathRand() + 1)) : NULL;
        }
        const ulong usElapsed = GetMicrosecondCount() - start;
        printf("%-45s %llu µs, sum = %lli; Null-Values: %lli; InstancesCount: %i", __FUNCTION__, usElapsed, sum, count_null, A::instancesCount);
    }


    private:
    A*                 units[_size];

    public:
    CCollection()
    {
        MathSrand(GetTickCount());
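        // Populate roughly 1 slot in 15 with a new object; all other slots stay NULL.
        for(int idx = 0; idx < _size; idx++)
        { units[idx] = ((MathRand() % 0x0F) == 0) ? new A(idx) : NULL; }

        // Delete roughly 1 in 15 of the created objects WITHOUT resetting the slot,
        // which leaves a dangling (non-NULL but invalid) pointer behind.
        for(int idx = 0; idx < _size; idx++)
        { 
            if( ((MathRand() % 0x0F) == 0) 
             && (units[idx] != NULL) )
            { 
                delete(units[idx]); 
                A::invalid_ptrs++;
            }
            else if(units[idx] != NULL)
            { A::valid_ptrs++; }
        }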
    }
    
    ~CCollection()
    {
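        // Only delete slots that still reference a live dynamic object;
        // NULL and dangling slots are skipped.
        for(int i = 0; i < _size; i++)
        {
            if(CheckPointer(units[i]) == POINTER_DYNAMIC)
            { delete units[i]; }
        }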
    }
};



void OnStart()
{
    CCollection* collection = new CCollection();

    printf("*** Benchmarking ***\n Invalid pointers: %i\n Valid pointers: %i", A::invalid_ptrs, A::valid_ptrs);

    collection.benchmarkNot();
    collection.benchmarkNot_DoubleCheck();
    collection.benchmarkNullOrCheck();
    collection.benchmarkNullOrCheck_NoBranch();

    delete collection;
}


 
Dominik Egert #:

I have updated the code according to your functional proposal, and added more "force-to-do-the-loop" code:


I think this line should add something like this:

sum += (sum / xxx-calc);

This change should prevent some optimization tactics by the compiler. I will upload an updated version later or tomorrow.
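As a rough illustration of the idea (not the updated version announced above), the sketch below feeds the running sum back into itself so that every iteration depends on the previous one. The divisor (i + 1) and the loop size are placeholders chosen for this example only; the actual "xxx-calc" term is not specified in the thread.

```mql5
void OnStart()
{
    const int n = 1000000;   // hypothetical loop size for this sketch
    long sum = 0;

    const ulong start = GetMicrosecondCount();
    for(int i = 0; i < n; i++)
    {
        // Same shape as the accumulation in the benchmark functions.
        sum += i % (MathRand() + 1);

        // Hypothetical feedback term: it reads the running sum, so each
        // iteration depends on the result of the previous one.
        sum += sum / (i + 1);
    }
    const ulong usElapsed = GetMicrosecondCount() - start;

    // Printing sum makes the loop result observable, so the loop cannot
    // simply be discarded as dead code.
    PrintFormat("%llu µs, sum = %lli", usElapsed, sum);
}
```

Whether such a dependency changes the measured timings at all is a separate question that only a run can answer.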
 
Dominik Egert #:

I have been testing this with "No Optimization". Please be aware that the NoBranch function performs 2 checks, not one. So the gain from the NoBranch coding style is significant compared to a branching version of the same code, at least as far as I can tell.

Comparing the "Not" function with the "Not_DoubleCheck" function, you can see the actual cost of the implicit function call by subtracting one result from the other: 30039 - 23914 = 6125 µs for the second check in the loop.

The cost for the non-branching function is significantly lower, which I interpret as the gain from reducing branch-prediction misses: 11098 - 9694 = 1404 µs.


I have no idea whether this is a correct interpretation of the results, but I assume it is.

I think we have already demonstrated that there is no point in using the "Not" version, except maybe for lazy coders. Initializing a pointer variable to NULL and setting it back to NULL after the object has been deleted is not much effort, and if you need to check a pointer you can then use the ptr==NULL version. To be safe, you can use the "full" check version ptr==NULL || CheckPointer(ptr) ...
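The discipline described above could look roughly like the following sketch. The names CItem, SafeDelete and IsUsable are hypothetical helpers invented for this illustration; they are not part of the attached file or of the standard library.

```mql5
// Illustration only: CItem, SafeDelete and IsUsable are hypothetical names.
class CItem
{
public:
    int id;
    CItem(const int a_id) : id(a_id) {}
};

// Deletes a dynamic object and resets the caller's pointer to NULL,
// so that no dangling pointer is left behind.
void SafeDelete(CItem* &ptr)
{
    if(CheckPointer(ptr) == POINTER_DYNAMIC)
        delete ptr;
    ptr = NULL;
}

// "Full" check: the cheap NULL comparison comes first, and CheckPointer()
// is only evaluated for non-NULL pointers.
bool IsUsable(CItem* ptr)
{
    return !(ptr == NULL || CheckPointer(ptr) == POINTER_INVALID);
}

void OnStart()
{
    CItem* item = NULL;                        // always start from NULL
    Print("before new: ", IsUsable(item));

    item = new CItem(7);
    Print("after new: ", IsUsable(item));

    SafeDelete(item);                          // delete and reset in one step
    Print("after SafeDelete: ", IsUsable(item), ", item==NULL: ", item == NULL);
}
```

If every deletion goes through a helper like this, the plain ptr==NULL test is enough in the hot loop, and the full check remains available as a safety net.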

Now we come to the branching/non-branching issue. I added a macro to disable the counters and the dangling pointers, to compare specifically branch versus no branch.

No compiler optimization:

2025.10.01 09:39:06.539    *** Benchmarking with dangling pointers***
2025.10.01 09:39:06.539     Invalid pointers: 44348
2025.10.01 09:39:06.539     Valid pointers: 622807
2025.10.01 09:39:06.563    CCollection::benchmarkNot                                 23687 µs, sum = 5100127920; Null-Values: 0; InstancesCount: 667155
2025.10.01 09:39:06.579    CCollection::benchmarkNullOrCheck                    16378 µs, sum = 5102876848; Null-Values: 0; InstancesCount: 667155
2025.10.01 09:39:06.594    CCollection::benchmarkNullOrCheck_NoBranch    14406 µs, sum = 5102051155; Null-Values: 0; InstancesCount: 667155

2025.10.01 09:39:52.304    *** Benchmarking with NULL pointers only***
2025.10.01 09:39:52.304     Invalid pointers: 0
2025.10.01 09:39:52.304     Valid pointers: 0
2025.10.01 09:39:52.330    CCollection::benchmarkNot                                 26422 µs, sum = 5463352753; Null-Values: 0; InstancesCount: 666925
2025.10.01 09:39:52.350    CCollection::benchmarkNullOrCheck                    19452 µs, sum = 5462121881; Null-Values: 0; InstancesCount: 666925
2025.10.01 09:39:52.365    CCollection::benchmarkNullOrCheck_NoBranch    14860 µs, sum = 5461147781; Null-Values: 0; InstancesCount: 666925

This clearly demonstrates the gain of the "no branch" coding style.

With max compiler optimization, though:

2025.10.01 09:45:02.648    *** Benchmarking with dangling pointers***
2025.10.01 09:45:02.648     Invalid pointers: 44090
2025.10.01 09:45:02.648     Valid pointers: 622524
2025.10.01 09:45:02.672    CCollection::benchmarkNot                                  23566 µs, sum = 5102222996; Null-Values: 0; InstancesCount: 666614
2025.10.01 09:45:02.682    CCollection::benchmarkNullOrCheck                     10399 µs, sum = 5099655659; Null-Values: 0; InstancesCount: 666614
2025.10.01 09:45:02.692    CCollection::benchmarkNullOrCheck_NoBranch     10119 µs, sum = 5092526990; Null-Values: 0; InstancesCount: 666614

2025.10.01 09:45:16.656    *** Benchmarking with NULL pointers only***
2025.10.01 09:45:16.656     Invalid pointers: 0
2025.10.01 09:45:16.656     Valid pointers: 0
2025.10.01 09:45:16.678    CCollection::benchmarkNot                                  22125 µs, sum = 5459597493; Null-Values: 0; InstancesCount: 667443
2025.10.01 09:45:16.688    CCollection::benchmarkNullOrCheck                       9883 µs, sum = 5470744351; Null-Values: 0; InstancesCount: 667443
2025.10.01 09:45:16.698    CCollection::benchmarkNullOrCheck_NoBranch       9827 µs, sum = 5474991922; Null-Values: 0; InstancesCount: 667443

This clearly demonstrates that the gain of the "no branch" coding style is marginal when compiler optimizations are used. Still, there is a gain.

My conclusion: the theoretical gain of the no-branch coding style is obvious. The practical gain is real but small (in this case); it's up to the coder to decide whether it's worth investing time in it.
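To put "real but small" into numbers, the relative gain can be computed from the timings in the logs above (benchmarkNullOrCheck versus benchmarkNullOrCheck_NoBranch). The helper name GainPercent is made up for this sketch.

```mql5
// Relative time saved by the no-branch variant, in percent of the branching time.
// GainPercent is a hypothetical helper for this illustration.
double GainPercent(const ulong branchUs, const ulong noBranchUs)
{
    return 100.0 * ((double)branchUs - (double)noBranchUs) / (double)branchUs;
}

void OnStart()
{
    // Timings in µs copied from the logs above.
    PrintFormat("no optimization,  dangling pointers: %.1f percent", GainPercent(16378, 14406));
    PrintFormat("no optimization,  NULL only:         %.1f percent", GainPercent(19452, 14860));
    PrintFormat("max optimization, dangling pointers: %.1f percent", GainPercent(10399, 10119));
    PrintFormat("max optimization, NULL only:         %.1f percent", GainPercent(9883, 9827));
}
```

For these single runs that works out to about 12.0 and 23.6 percent without optimization, and about 2.7 and 0.6 percent with maximum optimization. Since each figure comes from one measurement, the small values in particular should be read with run-to-run variation in mind.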

Files:
__forum__.mq5  9 kb
 
Dominik Egert #:

I think this line should add something like this:

This change should prevent some optimization tactics by the compiler. I will upload an updated version later or tomorrow.

I don't think it will have any impact on the results.
 
Alain Verleyen #:

I think we have already demonstrated that there is no point in using the "Not" version, except maybe for lazy coders. Initializing a pointer variable to NULL and setting it back to NULL after the object has been deleted is not much effort, and if you need to check a pointer you can then use the ptr==NULL version. To be safe, you can use the "full" check version ptr==NULL || CheckPointer(ptr) ...

My conclusion: the theoretical gain of the no-branch coding style is obvious. The practical gain is real but small (in this case); it's up to the coder to decide whether it's worth investing time in it.


I agree.