Features of the mql5 language, subtleties and tricks - page 308

 

b5179, I tried to speed up calculations by creating a table of pre-calculated data. But I didn't get any acceleration, because every access to an array element by index in MQL5 triggers a bounds check, even when the code is provably error-free.


Then I decided to try to create an object that has FASTER access to the index than the array. And I "succeeded"!

I put it in quotes because I got unexplained results.

#include <fxsaber\TicksShort\Table.mqh> // https://www.mql5.com/en/code/61126
#include <fxsaber\Benchmark\Benchmark.mqh> // https://www.mql5.com/en/code/31279

#define  AMOUNT 50 // If you make it 60+, it will slow down hundreds of times!

void OnStart()
{
  long Array[2048];
  ArrayInitialize(Array, 1);

  TABLE<long> Table;
  Table = Array;
    
  long Sum1 = 0;
  long Sum2 = 0;
    
  const ulong StartTime1 = GetMicrosecondCount();
    for (int i = 0; i < 1e10 / AMOUNT; i++)
      for (uchar j = 0; j < AMOUNT; j++)
        Sum1 += Array[j];
  Print(GetMicrosecondCount() - StartTime1);

  const ulong StartTime2 = GetMicrosecondCount();
  _BV( // If you comment out this line and the line four below, the slowdown will be 200 times!
  for (int i = 0; i < 1e10 / AMOUNT; i++)
    for (uchar j = 0; j < AMOUNT; j++)
      Sum2 += Table[j];
     , 1)
  Print(GetMicrosecondCount() - StartTime2);
      
  Print(Sum1);
  Print(Sum2);
}


The performance of this code depends greatly on whether compiler optimisation is enabled, and also on the selected instruction set (x64, AVX, ...).

432795 // Time to calculate the sum of array elements by index.
Alert: Bench_Stack = 0, 1 <= Time[Test6.mq5 187 in OnStart: for (int i = 0; i < 1 e10 / AMOUNT; i++) for (uchar j = 0; j < AMOUNT; j++) Sum2 += Table[j];] = 403864 mcs.
403951 // Time to calculate the sum of "table" elements by index.
10000000000
10000000000

It turns out there is a way to access items by index that is 8-9% faster than indexing an array.


And then the inexplicable things begin, which I highlighted in the source code. For some reason the table is summed this quickly only when the loop is wrapped in the macro. Because of that macro I had to add the corresponding #include, and I cannot reproduce the macro's effect without it - the code slows down hundreds of times.


The same wild slowdown also occurs when the number of calculated elements increases slightly - see the comment on AMOUNT.


Another complaint about the compiler. In the source code the array index is of type uchar, and the corresponding static array has 2048 elements - obviously more than UCHAR_MAX. So why does the compiler still check the index against the array bounds on every access?


Please do not emit such checks in situations like this. And please explain the anomalous behaviour of the script above. Also: is it possible to build data tables in static arrays so that the index is not checked on every access?

Search string: Uluchshenie 130.
 
fxsaber #:

b5179, I tried to speed up calculations by creating a table of pre-calculated data. But I didn't get any acceleration, because every access to an array element by index inside MQL5 causes checking of index correctness, even if everything in the code is error-free.

There are a few points to consider here:

1. Index checking for arrays is done at compile time only (if the compiler can deduce the array bounds) and never at runtime; that is why you may get an "array out of index" runtime error if your index is beyond the array bounds.

2. The faster access times achieved here are not due to an object or a structure; they come merely from the compiler's optimization of the "switch" statement.

When compiling a small-to-medium switch with consecutive cases (0 to n) in optimization mode, the compiler converts the series of multiple if-else comparisons, O(n), into a jump table: an array of pointers to code blocks indexed by the case number at runtime, giving O(1) direct access similar to a lookup table backed by an array.

3. Optimization of a switch depends on the number of cases plus the optimization mode. Switches with many cases, or complex ones, do not get this jump-table optimization and remain a chain of if-else comparisons.

4. A lookup table using an array will always be the standard and most reliable method, regardless of the number of cases, their complexity, or the compiler's optimization mode.

5. It would be more interesting if you measured random access times instead of sequential ones, to negate the effect of the CPU cache on the results:

const ulong StartTime1 = GetMicrosecondCount();
for (int i = 0; i < INT_MAX; i++) { 
   uchar j = (uchar) (((ulong)i * 601) % AMOUNT);  // Random index. 601 is a prime > AMOUNT
   Sum1 += Array[j];
}
Print(GetMicrosecondCount() - StartTime1);

 
amrali #:
There are a few things to consider here:

1. Index checking for arrays is only done at compile time (if the compiler can detect array boundaries) and never at runtime, so you may get an "array out of index" error at runtime if your index is outside the array boundaries.

I disagree. The out-of-bounds check for arrays happens at runtime (as does a pointer-validity check every time you access an object through a pointer).

 
fxsaber #:

I disagree. The out-of-bounds check for arrays happens at runtime (as does a pointer-validity check every time you access an object through a pointer).

Yes, it seems I was wrong in point #1 - I did not know about the new runtime access-control checks.

Still, the other points can provide valid explanations for your findings.

Another way to defeat spatial and temporal locality:
   // Generate an array of randomized indices
   int indices[][2];
   ArrayResize(indices, AMOUNT);
   for(int i = 0; i < AMOUNT; i++) {
      indices[i][0] = MathRand();
      indices[i][1] = i;
   }
   ArraySort(indices);

   // Iterate the array in a randomized order
   const ulong StartTime1 = GetMicrosecondCount();
   for (int i = 0; i < 1e10 / AMOUNT; i++)
      for (uchar j = 0; j < AMOUNT; j++) {
         int k = indices[j][1];
         Sum1 += Array[k];
      }
   Print(GetMicrosecondCount() - StartTime1);
 
amrali #:
Another way to defeat spatial and temporal locality is :

A good way to mix up the indexes. Thanks.

 

Simpler mix (Fixed):

const ulong StartTime1 = GetMicrosecondCount();
for (int i = 0; i < INT_MAX; i++) { 
   uchar j = (uchar) (((ulong)i * 67) % AMOUNT);  // Pseudo-random index. 67 is prime and 67 % AMOUNT != 1
   Sum1 += Array[j];
}
Print(GetMicrosecondCount() - StartTime1);
 
b5179, another interesting compiler behaviour.
#define  AMOUNT 2000

void OnStart()
{
  long Array[AMOUNT];
  ArrayInitialize(Array, 1);
  
  long Sum = 0;

  const ulong StartTime1 = GetMicrosecondCount();
    for (int i = 0; i < 1e10 / AMOUNT; i++)
      for (uint j = 0; j < AMOUNT; j++)
// Sum += Array[j]; // 1880657
        Sum += Array[j] << 1; // 1154557: 40% faster in AVX mode!
  Print(GetMicrosecondCount() - StartTime1);
      
  Print(Sum);
  Print("Compiler Version: " + (string)__MQLBUILD__ + " " + __CPU_ARCHITECTURE__); // Compiler Version: 5179 AVX
}

If you do an additional bit operation during addition, the execution speed increases by 40%.

Search string: Oshibka 139.

 
fxsaber #:
b5179, another interesting compiler behaviour.

If you do an additional bit operation during addition, the execution speed increases by 40%.

Search string: Oshibka 139.

By the way, I accidentally found that ArrayInitialize(array, 0) is much faster than ZeroMemory(array) for long[] arrays.
This could make a difference in a class constructor or a reset() method.
 
amrali #:
By the way, I accidentally found that ArrayInitialize(array, 0) is much faster than ZeroMemory(array) for long[] arrays.

The second function zeroes the array size.

 
fxsaber #:

The second function zeroes the array size.

Wrong!
ZeroMemory() zeros all the array elements. (The size does not change).

See the example in the docs: