Features of the mql5 language, subtleties and tricks - page 277

 
Edgar Akhmadeev #:

Wow! The first version, consistently the best on all processors (x86/AVX/AVX2).

We should probably just write a script to test all the algorithms for correctness. It's simple, IMHO: in a loop over dates, compare every calculated result with the built-in function.
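
A minimal sketch of such a correctness check, written as a standalone C++ port so it can run outside MetaTrader. Everything named here is an assumption made for illustration: DateTimeFields stands in for MqlDateTime, time_to_struct_fast is a port of the table-based function posted later in this thread, and std::gmtime plays the role of the built-in TimeToStruct(). It assumes a 64-bit time_t and only covers 1970..2099, because the leap-year test (year & 3) breaks in 2100.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical stand-in for MqlDateTime (eight int fields).
struct DateTimeFields {
    int year, mon, day, hour, min, sec, day_of_week, day_of_year;
};

// First second of 2100.01.01 UTC; the (year & 3) leap test fails from here on.
constexpr uint32_t kFirstUnsupported = 4102444800u;

// C++ port of the table-based conversion from this thread (assumed equivalent).
inline void time_to_struct_fast(uint32_t t, DateTimeFields& dt) {
    static const int Months[] = { 0, 11512692, 11512196, 11511744, 11511248,
                                  11510766, 11510272, 11509790, 11509296,
                                  11508797, 11508318, 11507822, 11507342 };
    assert(t < kFirstUnsupported);
    const int days = (int)(t / 86400u);          // whole days since 1970.01.01
    const int quad = (days << 2) | 2;            // 4 * days + 2
    dt.day_of_year = (quad % 1461) >> 2;         // zero-based day of year
    dt.year = quad / 1461 + 1970;
    const int leap = !(dt.year & 3);             // 1 in a leap year
    dt.mon = ((dt.day_of_year + (dt.day_of_year < leap + 59 ? 0 : 2 - leap)) * 12 + 373) / 367;
    // Relies on an arithmetic right shift of a negative int (guaranteed since C++20).
    dt.day = days - ((dt.year * 5844 - Months[dt.mon]) >> 4);
    dt.hour = (int)(t / 3600u % 24u);
    dt.min = (int)(t / 60u % 60u);
    dt.sec = (int)(t % 60u);
    dt.day_of_week = (days + 4) % 7;             // 1970.01.01 was a Thursday
}

// Walks the supported range in steps of `step` seconds and counts every
// timestamp where any field differs from the library reference.
inline long count_mismatches(uint32_t step) {
    if (step == 0) return -1;                    // avoid an endless loop
    long bad = 0;
    for (uint64_t t = 0; t < kFirstUnsupported; t += step) {
        DateTimeFields f{};
        time_to_struct_fast((uint32_t)t, f);
        const std::time_t tt = (std::time_t)t;
        const std::tm* r = std::gmtime(&tt);
        if (r == nullptr ||
            f.year != r->tm_year + 1900 || f.mon != r->tm_mon + 1 ||
            f.day != r->tm_mday || f.hour != r->tm_hour ||
            f.min != r->tm_min || f.sec != r->tm_sec ||
            f.day_of_week != r->tm_wday || f.day_of_year != r->tm_yday)
            ++bad;
    }
    return bad;
}
```

Calling count_mismatches with a step that is not a multiple of 86400 (for example 43201) visits every day in the range at varying times of day. In MQL5 the same loop would compare each candidate against TimeToStruct() instead of std::gmtime.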


I guess you mean ISA subsets rather than processors...

BTW, AMD CPUs still support AVX512...

Although I don't see any benefit from that for the task we are trying to optimize here.

What we could try, though, is to do the date calculations using doubles or floats and see whether we can find performance gains between the different ISA subsets.
 
Dominik Egert #:
BTW, AMD processors still support AVX512....

My AMD Ryzen 3 PRO 3200GE does not. I think server processors support it, but I haven't inquired.

 
Edgar Akhmadeev #:
AMD Ryzen 3 PRO 3200GE
I am sorry, I didn't check your specific CPU. Intel removed AVX512 in its 12th-gen CPUs because of compatibility issues with the efficiency cores, but AMD still has it.
 
Dominik Egert #:

A very slightly more optimized version:



Results:


I have noticed for a long time that something strange happens with the MQL compiler when running benchmarks.

If the number of functions being compared exceeds a certain limit (?!), you do not get the right results: the functions towards the end of the mq5 file get the best results, with the lowest execution times, while the functions higher up get bad results. This has happened to me before, so it's better to limit the comparison to 3 or 4 functions at most. If anybody has an explanation for this strange phenomenon, please enlighten me.

This happened here too, when Dominik's version was added to the benchmark script file.

To get a clearer view, I simply kept only the two functions in the source code, and here are my head-to-head results:

Compiler Version: 4620 X64 Regular, optimization - true
13th Gen Intel Core i7-13700KF, AVX2 + FMA3
With hours (dt.hour+ dt.min+ dt.sec - on), random datetimes[].
1970.01.01 00:00:18 - 2097.11.29 23:59:36
 3.87 ns, checksum = 1240831954764697   // TimeToStructFast
 3.83 ns, checksum = 1240831954764697   // TimeToStructMQLplus
18.53 ns, checksum = 1240831954764697  /// MQL's TimeToStruct()

Compiler Version: 4620 AVX, optimization - true
13th Gen Intel Core i7-13700KF, AVX2 + FMA3
With hours (dt.hour+ dt.min+ dt.sec - on), random datetimes[].
1970.01.01 00:00:02 - 2097.11.29 23:59:39
 4.07 ns, checksum = 1245293506380519   // TimeToStructFast
 4.08 ns, checksum = 1245293506380519   // TimeToStructMQLplus
18.57 ns, checksum = 1245293506380519  /// MQL's TimeToStruct()

Compiler Version: 4620 AVX2 + FMA3, optimization - true
13th Gen Intel Core i7-13700KF, AVX2 + FMA3
With hours (dt.hour+ dt.min+ dt.sec - on), random datetimes[].
1970.01.01 00:00:47 - 2097.11.29 23:59:41
 4.10 ns, checksum = 1295113651058133   // TimeToStructFast
 4.08 ns, checksum = 1295113651058133   // TimeToStructMQLplus
18.62 ns, checksum = 1295113651058133  /// MQL's TimeToStruct()

The performance is the same, and both managed to get below 5 nanoseconds.

 
amrali #:

I have noticed for a long time that something strange happens with the MQL compiler when running benchmarks.

If the number of functions being compared exceeds a certain limit (?!), you do not get the right results: the functions towards the end of the mq5 file get the best results, with the lowest execution times, while the functions higher up get bad results. This has happened to me before, so it's better to limit the comparison to 3 or 4 functions at most. If anybody has an explanation for this strange phenomenon, please enlighten me.

This happened here too, when Dominik's version was added to the benchmark script file.

To get a clearer view, I simply kept only the two functions in the source code, and here are my head-to-head results:

The performance is the same, and both managed to get below 5 nanoseconds.

Oh, I remember that discussion, where I said that adding a comment changes the performance of functions within the same mqh file...

Wasn't that in connection with the radix sort algorithm?

My own explanation at the time was that it is about memory paging. But that was unsatisfying, since I had only added comments.

Edit:
So in conclusion, the functions perform the same, roughly.
 
Dominik Egert #:
Oh, I remember that discussion, where I said that adding a comment changes the performance of functions within the same mqh file...

Wasn't that in connection with the radix sort algorithm?

My own explanation at the time was that it is about memory paging. But that was unsatisfying, since I had only added comments.

Edit:
So in conclusion, the functions perform the same, roughly.

Yes, exactly. We ran into the same issue when testing radix sort. The problem still exists.

I don't know whether it is a problem with the optimizations the compiler applies, or whether the processor is ramping up its clock (warm-up, turbo boost or whatever).
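
One way to probe the warm-up hypothesis is to run an untimed pass first and then time the candidates in both orders. The sketch below is a standalone C++ illustration, not the benchmark script used in this thread: the two candidate functions, the input generator and all names are hypothetical. It can only reveal effects of run order and warm-up, and says nothing about how the MQL compiler treats the position of a function in the source file.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

using Candidate = uint64_t (*)(uint32_t);

// Two hypothetical candidates that must return identical results.
inline uint64_t seconds_of_day_mod(uint32_t t) { return t % 86400u; }
inline uint64_t seconds_of_day_sub(uint32_t t) { return t - (t / 86400u) * 86400u; }

struct Timing {
    double ns_per_call;   // average wall-clock time per call
    uint64_t checksum;    // sum of all results, keeps the loop from being optimized away
};

// Times one candidate over the whole input set.
inline Timing time_candidate(Candidate f, const std::vector<uint32_t>& data) {
    uint64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (uint32_t t : data) sum += f(t);
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return { data.empty() ? 0.0 : ns / (double)data.size(), sum };
}

// Deterministic pseudo-random timestamps (simple LCG, fixed seed).
inline std::vector<uint32_t> make_inputs(std::size_t count) {
    std::vector<uint32_t> v(count);
    uint32_t x = 12345u;
    for (auto& e : v) { x = x * 1664525u + 1013904223u; e = x; }
    return v;
}

struct OrderCheck { Timing a_first, b_second, b_first, a_second; };

// Warm up once without keeping the timings, then measure A,B and B,A.
inline OrderCheck run_both_orders(Candidate a, Candidate b, const std::vector<uint32_t>& data) {
    time_candidate(a, data);   // untimed warm-up pass
    time_candidate(b, data);
    OrderCheck r;
    r.a_first  = time_candidate(a, data);
    r.b_second = time_candidate(b, data);
    r.b_first  = time_candidate(b, data);
    r.a_second = time_candidate(a, data);
    return r;
}
```

If a candidate is consistently faster only when it runs second, run order or warm-up is the likelier cause; if the ranking holds in both orders, the difference is more likely real.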

 
amrali #:

Yes, exactly. We ran into the same issue when testing radix sort. The problem still exists.

I don't know whether it is a problem with the optimizations the compiler applies, or whether the processor is ramping up its clock (warm-up, turbo boost or whatever).


Here is a version with a minimal memory footprint, but for some reason I am unable to replace the const uint t with a variable from the structure. If I do so, I get an "array out of range" error. Any idea?


bool TimeToStructMQLplus(const datetime timestamp, MqlDateTime& dt_struct)
{
    // Packed per-month offsets used to recover the day of month (index 0 is unused).
    static const int Months[] = { 0, 11512692, 11512196, 11511744, 11511248, 11510766, 11510272, 11509790, 11509296, 11508797, 11508318, 11507822, 11507342 };

    const uint t             = (uint)(timestamp);
    dt_struct.day_of_week   = (int)(t / 86400);                  // scratch: whole days since 1970.01.01
    dt_struct.mon           = (dt_struct.day_of_week << 2) | 2;  // scratch: 4 * days + 2

    dt_struct.day_of_year   = (dt_struct.mon % 1461) >> 2;       // 1461 = days in a 4-year cycle
    dt_struct.year          = (dt_struct.mon / 1461) + 1970;
    dt_struct.day           = !(dt_struct.year & 3);             // scratch: 1 in a leap year (valid for 1970..2099)

    // Month 1..12 from the zero-based day of year, corrected for the length of February.
    dt_struct.mon           = ((((dt_struct.day_of_year + ((dt_struct.day_of_year < (dt_struct.day + 59)) ? 0 : (2 - dt_struct.day))) * 12) + 373) / 367);
    // Day of month: days since the epoch minus the days up to the start of this month.
    dt_struct.day           = dt_struct.day_of_week - (int)((dt_struct.year * 5844 - Months[dt_struct.mon]) >> 4);
    #ifndef WITHOUT_HOURS
        dt_struct.hour      = (int)(t / 3600) % 24;
        dt_struct.min       = (int)(t / 60) % 60;
        dt_struct.sec       = (int)(t % 60);

    #endif //#ifndef WITHOUT_HOURS
    dt_struct.day_of_week   = (dt_struct.day_of_week + 4) % 7;   // 1970.01.01 was a Thursday (4)
    return (true);
}


EDIT:

Yes, it is a bit strange, though I have trouble replicating it... Same story as back then: I had the issue, but nobody else could replicate it.


Results of attached file:

Compiler Version: 4620 X64 Regular, optimization - true
12th Gen Intel Core i7-12700K, AVX2 + FMA3
With hours (dt.hour+ dt.min+ dt.sec - on), random datetimes[].
1970.01.01 00:00:23 - 2097.11.29 23:59:51
 3.95 ns, checksum = 1224148774003212   // TimeToStructMQLplus
 4.00 ns, checksum = 1224148774003212   // TimeToStructFast
19.98 ns, checksum = 1224148774003212  /// MQL's TimeToStruct()
 
Dominik Egert #:


Here is a version with a minimal memory footprint, but for some reason I am unable to replace the const uint t with a variable from the structure. If I do so, I get an "array out of range" error. Any idea?

Here it is (maybe for a digital clock with limited memory):

bool TimeToStructMQLplus(const datetime timestamp, MqlDateTime& dt_struct)
{
    // Packed per-month offsets used to recover the day of month (index 0 is unused).
    static const int Months[] = { 0, 11512692, 11512196, 11511744, 11511248, 11510766, 11510272, 11509790, 11509296, 11508797, 11508318, 11507822, 11507342 };

  //const uint t             = (uint)(timestamp);
  //dt_struct.day_of_week   = (int)(t / 86400);
    dt_struct.sec           = (int)(timestamp);                        // scratch: raw 32-bit timestamp, replaces t
    dt_struct.day_of_week   = (int)((uint)dt_struct.sec / 86400);      // cast back to uint before dividing
    dt_struct.mon           = (dt_struct.day_of_week << 2) | 2;        // scratch: 4 * days + 2

    dt_struct.day_of_year   = (dt_struct.mon % 1461) >> 2;             // 1461 = days in a 4-year cycle
    dt_struct.year          = (dt_struct.mon / 1461) + 1970;
    dt_struct.day           = !(dt_struct.year & 3);                   // scratch: 1 in a leap year (valid for 1970..2099)

    dt_struct.mon           = ((((dt_struct.day_of_year + ((dt_struct.day_of_year < (dt_struct.day + 59)) ? 0 : (2 - dt_struct.day))) * 12) + 373) / 367);
    dt_struct.day           = dt_struct.day_of_week - (int)((dt_struct.year * 5844 - Months[dt_struct.mon]) >> 4);
    #ifndef WITHOUT_HOURS
        dt_struct.hour      = (int)((uint)dt_struct.sec / 3600) % 24;
        dt_struct.min       = (int)((uint)dt_struct.sec / 60) % 60;
        dt_struct.sec       = (int)((uint)dt_struct.sec % 60);         // overwritten last, after hour and min were read from it

    #endif //#ifndef WITHOUT_HOURS
    dt_struct.day_of_week   = (dt_struct.day_of_week + 4) % 7;         // 1970.01.01 was a Thursday (4)
    return (true);
}

In 2038 the seconds counter (Unix timestamp) will exceed INT_MAX, so you must keep using the uint type for the seconds (to avoid negative numbers).

Note that the fields of MqlDateTime are ints, so you have to cast the re-used field. Casting to a type with the same number of bits or more (a widening cast) keeps the original bits intact. Between int and uint it only changes how the sign bit is interpreted (two's complement).
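
The effect of the cast can be sketched in a few lines of standalone C++. The helper names are made up for illustration, and int32_t/uint32_t are assumed to correspond to MQL5's int and uint. The conversion of an out-of-range value to int32_t wraps on all mainstream compilers and is guaranteed to do so since C++20.

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit Unix timestamp in a signed field, as done with dt_struct.sec above.
inline int32_t store_in_int_field(uint32_t t) { return (int32_t)t; }

// Correct: reinterpret the stored bits as unsigned before dividing.
inline uint32_t days_unsigned(int32_t field) { return (uint32_t)field / 86400u; }

// Wrong after 2038: signed division yields a negative day count.
inline int32_t days_signed(int32_t field) { return field / 86400; }
```

For the first second past INT_MAX (2147483648, on 2038.01.19), the stored field is negative, days_signed returns -24855, and days_unsigned returns the intended 24855. Casting the field back to uint32_t gives the original timestamp again, because the bits never changed.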

 
amrali #:

Here it is (maybe for a digital clock with limited memory):

In 2038 the seconds counter (Unix timestamp) will exceed INT_MAX, so you must keep using the uint type for the seconds (to avoid negative numbers).

Note that the fields of MqlDateTime are ints, so you have to cast the re-used field. Casting to a type with the same number of bits or more (a widening cast) keeps the original bits intact. Between int and uint it only changes how the sign bit is interpreted (two's complement).


Thank you. Yes, I saw it this morning. I was too exhausted yesterday.

Now this function has a footprint of 2 cache lines, of which only one needs to be pulled, as the other will already hold the MqlDateTime structure. That is 92 bytes in total: MqlDateTime, preloaded via the function stack, plus the datetime make 40 bytes together, and the static array is 13*4 = 52 bytes.

I don't think it is possible to optimize this function any further as it is now.
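
The byte counts above can be reproduced with a small standalone C++ sketch. DateTimeFields and datetime_t are hypothetical stand-ins for MqlDateTime (eight 4-byte ints) and datetime (8 bytes), and the 64-byte cache line is an assumption that holds for typical x86-64 CPUs. The line count ignores alignment, so it is a lower bound rather than a measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the MQL5 types.
struct DateTimeFields {
    int32_t year, mon, day, hour, min, sec, day_of_week, day_of_year;
};
using datetime_t = int64_t;

constexpr std::size_t kCacheLine  = 64;                                  // assumed line size
constexpr std::size_t kArgsBytes  = sizeof(DateTimeFields) + sizeof(datetime_t);
constexpr std::size_t kTableBytes = 13 * sizeof(int32_t);                // the Months[] table
constexpr std::size_t kTotalBytes = kArgsBytes + kTableBytes;

// Smallest number of cache lines that can hold `bytes` bytes.
constexpr std::size_t lines_needed(std::size_t bytes) {
    return (bytes + kCacheLine - 1) / kCacheLine;
}

static_assert(kArgsBytes == 40, "structure plus timestamp");
static_assert(kTableBytes == 52, "static month table");
static_assert(kTotalBytes == 92, "total footprint quoted above");
```

Here lines_needed(kTotalBytes) evaluates to 2, while the arguments and the table each fit into a single line on their own.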
 
Dominik Egert #:

Thank you. Yes, I saw it this morning. I was too exhausted yesterday.

Now this function has a footprint of 2 cache lines, of which only one needs to be pulled, as the other will already hold the MqlDateTime structure. That is 92 bytes in total: MqlDateTime, preloaded via the function stack, plus the datetime make 40 bytes together, and the static array is 13*4 = 52 bytes.

I don't think it is possible to optimize this function any further as it is now.
Is your PC out of memory?!

Edit:
Variables will be optimized into processor registers rather than into cache.