Errors, bugs, questions - page 2165

 

I am not getting an optimisation graph for negative values.

The data is available in the optimisation results.

To reproduce, have your EAs return negative values. You can multiply the values by -1 to check.

 
Renat Fatkhullin:

A check revealed that:

  1. SQRT is mapped directly to CPU instructions.

  2. SQRT and the surrounding arithmetic are performed without branches, and a single instruction (on 128-bit data) computes two roots at once.

    This code turns into the following assembler SSE code:
    This is actually a work of art. 8 roots were calculated with 4 calls of one assembler instruction; each call evaluates two double values.

  3. When going through an array, everything proceeds as usual: checks, branching, and the cost of converting double -> integer index.

  4. Working with arrays in this example constantly mixes FPU and ALU operations, which is very bad for performance.

  5. The optimization of dynamic array access is great - beyond praise. But mixing FPU/ALU operations + double -> integer conversion + branching wastes time.

The general conclusion: the math won in MQL5 thanks to excellent optimization. It is not that arrays lose here; rather, the math wins.
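To make point 2 concrete, here is a minimal C++ sketch (not MQL5, and not the code the MQL5 compiler actually emits) of computing two double-precision roots with one SSE2 instruction. The helper name `sqrt_pairs` is hypothetical, and the scalar fallback is an assumption added so the sketch also builds on non-x86 targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: writes sqrt(in[i]) to out[i] for i in [0, n).
// n must be even so the values can be processed in pairs.
void sqrt_pairs(const double* in, double* out, std::size_t n)
{
    assert(n % 2 == 0);
    for (std::size_t i = 0; i < n; i += 2) {
#ifdef HAVE_SSE2
        __m128d v = _mm_loadu_pd(in + i);  // load two doubles into one 128-bit register
        v = _mm_sqrt_pd(v);                // one SQRTPD instruction: two roots at once
        _mm_storeu_pd(out + i, v);         // store both results
#else
        out[i]     = std::sqrt(in[i]);     // scalar fallback on targets without SSE2
        out[i + 1] = std::sqrt(in[i + 1]);
#endif
    }
}
```

With 8 input values the loop body runs 4 times, which corresponds to the "8 roots in 4 instructions" observation above. An optimizing compiler may produce similar code from plain scalar source on its own.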

Thank you very much for the valuable information.

This is, of course, much better news. It's really cool!

I've always said that the MQ team is brilliant!

But I also realize that one has to be very careful about mixing data types. Still, forewarned is forearmed.
It would be great to get some advice from the developers on this.

Now I'll experiment with the types of both regular variables and arrays. I wonder what will come out.

 
Renat Fatkhullin:

I've been experimenting.

I still don't seem to be able to put the puzzle together.

I made two variants: in the first, everything is converted to int; in the second, to double.

Yes, it became a bit faster. But the main drawback is still there.

Here is the main bottleneck block in the int variant:

 if(arr)
        {  // compute square roots via the SQRT[] lookup array
         D1=SQRT[((X1-X)*(X1-X)+(Y1-Y)*(Y1-Y))];
         D2=SQRT[((X2-X)*(X2-X)+(Y2-Y)*(Y2-Y))];
         D3=SQRT[((X3-X)*(X3-X)+(Y3-Y)*(Y3-Y))];
         D4=SQRT[((X4-X)*(X4-X)+(Y4-Y)*(Y4-Y))];
         D5=SQRT[((X5-X)*(X5-X)+(Y5-Y)*(Y5-Y))];
         D6=SQRT[((X6-X)*(X6-X)+(Y6-Y)*(Y6-Y))];
         D7=SQRT[((X7-X)*(X7-X)+(Y7-Y)*(Y7-Y))];
         D8=SQRT[((X8-X)*(X8-X)+(Y8-Y)*(Y8-Y))];
        }

It uses only the int type, with no type mixing. The SQRT array itself is now int.

It runs only 10% faster.

The situation with the double variant is similar.

Everything else is identical. The only difference is that in the first case the sqrt() function is calculated and there is type mixing.

In the second case an int array is accessed, there is no type mixing, and in theory only the ALU should be used.

Yet the second way is 3 times slower. So whatever the reason is, it lies in the array.

There is one more important thing.

In the int example, if the canvas has size 100x100, i.e. with the following parameters

we get a speed advantage when accessing the array.

That is, with an SQRT array of size 20,000 we gain 15-20%, and with an array of size 3,000,000 we lose 200%, despite absolutely identical math.

So the size of the array is the cause of the slowdown?

Files:
LSD_double.mq5  10 kb
LSD_int.mq5  10 kb
 

People lost the ability to understand the output of modern C++ compilers long ago.

In addition, your code is a mess, which means there is almost no chance of building naive axioms of the form "if these conditions hold, the result will be this". The optimizer rearranges everything so thoroughly that your hypotheses will be off by tens of percent even after minuscule changes to the code.

Take another look at how 8 roots were packed into 4 assembler instructions and realize that you have no basis to assert, demand, or appeal to your own logic. Optimizers have long operated at a level beyond the reach of programmers.

The way the compiler laid out the roots is an art. And you are trying to beat it with arrays without understanding the simplest constraint: reading from an array is already a loss. It is perfect register use and batched root calculation versus branching (with its penalties) and trips to memory with frequent cache misses.

You ask "why is it faster on a small buffer and why does it fail miserably on a big one" because you know nothing about the processor's L1/L2/L3 caches. If the data is in the cache, it is processed quickly. If not, you wait a couple of dozen cycles while the data is read from a higher-level cache or from memory.
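The two approaches being compared can be sketched in C++ as follows (a sketch, not the MQL5 code from the attached files). The helper names `make_sqrt_table`, `dist_lookup`, and `dist_direct` are hypothetical. Both paths return the same values; what differs is that the lookup does a memory read whose cost depends on whether the entry is already in cache, while the direct path stays in registers. No timings are claimed here, since they depend on the CPU and its cache sizes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: table[n] == (int)sqrt(n) for every n in [0, size).
std::vector<int> make_sqrt_table(std::size_t size)
{
    std::vector<int> table(size);
    for (std::size_t n = 0; n < size; ++n)
        table[n] = static_cast<int>(std::sqrt(static_cast<double>(n)));
    return table;
}

// Distance via table lookup: one memory read per call.
// The larger the table, the more likely the read misses the cache.
int dist_lookup(const std::vector<int>& table, int dx, int dy)
{
    const long long n = 1LL * dx * dx + 1LL * dy * dy;
    assert(n >= 0 && static_cast<std::size_t>(n) < table.size());
    return table[static_cast<std::size_t>(n)];
}

// Distance via direct calculation: no table access at all.
int dist_direct(int dx, int dy)
{
    const long long n = 1LL * dx * dx + 1LL * dy * dy;
    return static_cast<int>(std::sqrt(static_cast<double>(n)));
}
```

For a 100x100 canvas the largest index is 99*99 + 99*99 = 19,602, so a table of 20,000 entries is enough, which matches the small-array case described in the thread.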
 
Renat Fatkhullin:

People lost the ability to understand the output of modern C++ compilers long ago.

In addition, your code is a mess, which means there is almost no chance of building naive axioms of the form "if these conditions hold, the result will be this". The optimizer rearranges everything so thoroughly that your hypotheses will be off by tens of percent even after minuscule changes to the code.

Take another look at how 8 roots were packed into 4 assembler instructions and realize that you have no basis to assert, demand, or appeal to your own logic. Optimizers have long operated at a level beyond the reach of programmers.

I can see the results of your comparison with VS perfectly well, and I'm delighted with them.
But the question remains.

I apologise for the chaotic working code, but we are only talking about this section of it and comparing the two execution paths:

 if(arr)
        {  // compute square roots via the SQRT[] lookup array
         D1=SQRT[((X1-X)*(X1-X)+(Y1-Y)*(Y1-Y))];
         D2=SQRT[((X2-X)*(X2-X)+(Y2-Y)*(Y2-Y))];
         D3=SQRT[((X3-X)*(X3-X)+(Y3-Y)*(Y3-Y))];
         D4=SQRT[((X4-X)*(X4-X)+(Y4-Y)*(Y4-Y))];
         D5=SQRT[((X5-X)*(X5-X)+(Y5-Y)*(Y5-Y))];
         D6=SQRT[((X6-X)*(X6-X)+(Y6-Y)*(Y6-Y))];
         D7=SQRT[((X7-X)*(X7-X)+(Y7-Y)*(Y7-Y))];
         D8=SQRT[((X8-X)*(X8-X)+(Y8-Y)*(Y8-Y))];
        }
 else // compute square roots via the sqrt() function
        {
         D1=(int)sqrt((X1-X)*(X1-X)+(Y1-Y)*(Y1-Y));
         D2=(int)sqrt((X2-X)*(X2-X)+(Y2-Y)*(Y2-Y));
         D3=(int)sqrt((X3-X)*(X3-X)+(Y3-Y)*(Y3-Y));
         D4=(int)sqrt((X4-X)*(X4-X)+(Y4-Y)*(Y4-Y));
         D5=(int)sqrt((X5-X)*(X5-X)+(Y5-Y)*(Y5-Y));
         D6=(int)sqrt((X6-X)*(X6-X)+(Y6-Y)*(Y6-Y));
         D7=(int)sqrt((X7-X)*(X7-X)+(Y7-Y)*(Y7-Y));
         D8=(int)sqrt((X8-X)*(X8-X)+(Y8-Y)*(Y8-Y));
        }

There is no mess here.

You said that "Optimization of dynamic array access is great - beyond praise."

But... see my previous message.

How do you explain my last experiment?

"That is, when we use an SQRT array of size 20,000, we gain 15-20%, and when we use an array of size 3,000,000, we lose 200%, with exactly the same math.

So the size of the array is the cause of the slowdown?"

 

Read my previous reply carefully - it already contains the exact answer.

My simple answer to your questions: carefully read five technical articles on processor design, its performance, and the factors that affect it. Without that, a discussion is impossible, because the basics would have to be explained first.

 
Renat Fatkhullin:

The way the compiler laid out the roots is an art. And you are trying to beat it with arrays without understanding the simplest constraint: reading from an array is already a loss. It is perfect register use and batched root calculation versus branching (with its penalties) and trips to memory with frequent cache misses.

You ask "why is it faster on a small buffer and why does it fail miserably on a big one" because you know nothing about the processor's L1/L2/L3 caches. If the data is in the cache, it is processed quickly. If not, you wait a couple of dozen cycles while the data is read from a higher-level cache or from memory.

Renat Fatkhullin:

Read my previous reply carefully - it already contains the exact answer.

My simple answer to your questions: carefully read five technical articles on processor design, its performance, and the factors that affect it. Without that, a discussion is impossible, because the basics would have to be explained first.

Yay!!!
Finally!
Getting anything out of you, Renat, is like pulling teeth.

The picture is clearer to me now.

I was wrong to blame your compiler, and I apologise. I could have guessed that the reason was the limited size of the processor caches. I know very little about modern processors and really do need to read up on them.

Still, it was not for nothing that I wrote this lab-rat code and stirred all this up.

So, for those programmers reading this thread I will summarize what I personally found out as a result of this wave:

  • The sqrt() function, and most likely many other elementary functions, are very fast and are executed at the level of CPU instructions rather than in compiled code.
  • The MQL5 compiler is so strong at optimizing mathematical logic that it easily beats the modern VS C++ compiler, which is very inspiring.
  • In resource-intensive tasks it is reasonable to avoid mixing types. Mixing types lowers computation speed.
  • SIZE MATTERS! (I mean the size of the array :)) This is due to how the processor's multi-level cache works and to its limited size. Programmers would do well to watch the total size of their arrays and understand that large arrays can significantly affect calculation speed. As far as I understand, arrays work relatively comfortably while their total size stays under about 512 KB, which is ~65,000 elements of type double or ~130,000 of type int.
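The arithmetic behind the last point can be checked with a short C++ sketch. The 512 KB budget is the poster's own estimate, not a property of any particular CPU, and the helper names `elements_that_fit` and `fits_in_budget` are hypothetical. Element sizes are passed explicitly, assuming an 8-byte double and a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many elements of elem_size bytes fit into budget_bytes.
std::size_t elements_that_fit(std::size_t budget_bytes, std::size_t elem_size)
{
    assert(elem_size > 0);
    return budget_bytes / elem_size;
}

// Hypothetical helper: does an array of `count` elements stay within the budget?
bool fits_in_budget(std::size_t count, std::size_t elem_size, std::size_t budget_bytes)
{
    return count <= elements_that_fit(budget_bytes, elem_size);
}
```

With a budget of 512 * 1024 bytes this gives 65,536 elements of 8 bytes and 131,072 elements of 4 bytes, in line with the rounded figures above. The 20,000-element int array from the experiment (80,000 bytes) stays within that budget, while the 3,000,000-element one (12,000,000 bytes) does not.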

I'm off to fix my code based on this information. I have often overdone the size of my arrays.

Thank you all!

 

How do I know whether the crosshair button is pressed or released?

You can catch a press of the mouse wheel, but what if the mouse is not being used?

 
Alexandr Bryzgalov:

How do I know whether the crosshair button is pressed or released?

You can catch a press of the mouse wheel, but what if the mouse is not being used?

Can it be forced on or switched back off when needed?

CHART_CROSSHAIR_TOOL

Enable/disable access to "crosshair" tool by pressing the middle mouse button

bool (default value true)

 
Alexey Viktorov:

Can it be forced on or switched back off when needed?

CHART_CROSSHAIR_TOOL

Enable/disable access to "crosshair" tool by pressing middle mouse button

bool (true by default)

As far as I understand, it only controls access to the tool; it does not switch the tool itself on or off.
