DirectX - page 2

 
Rorschach:

https://www.mql5.com/ru/forum/227736

Ported it to a shader. For the first 15 seconds the original code runs on the CPU, then the GPU version runs.

"Before compiling, you need to move the m_pixels[] array from protected: to public: in the Canvas.mqh file"

Hmm, interesting. The speed turns out to be the same. Probably because there are no 3D transformations.

I should try it with this code.


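P.S. Use this construct so you don't have to modify the original library files:

// Temporarily turn "protected" into "public" so that CCanvas::m_pixels[] is accessible
#define protected public
#include <Canvas\Canvas.mqh>
// Restore the keyword for the rest of the program
#undef protected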

P.P.S.

Oh no! Sorry, I made a mistake in the code and so didn't notice that the CPU part wasn't running at all. I've corrected it.
Result: 2.5 times faster on the GPU.

Great to see. Thank you very much for this implementation. I just tweaked it a bit.

Files:
 
Rorschach:

https://www.mql5.com/ru/forum/227736

Ported it to a shader. For the first 15 seconds the original code runs on the CPU, then the GPU version runs.

It would be interesting to do the same with this code and see how the performance gain changes with the complexity of the calculations (more gravity centres).

This code doesn't use a precomputed sine array to speed things up; it just calls sin() directly. You can also change the number of rotating gravity centres.

Files:
Swirl2.mq5  5 kb
 
Nikolai Semko:

I should try it with this code.

I want to move the indicator calculations entirely to the GPU, but first I need to figure out how to transfer such volumes of data.

Nikolai Semko:

Result: 2.5 times faster on the GPU.

Nikolai Semko:

This code doesn't use a precomputed sine array to speed things up; it just calls sin() directly.

XRGB(uchar(128+127*sin(d*45)),uchar(128+127*sin(d*70)),uchar(128+127*sin(d*25)));

By the way, I use this formula in my shader, so the speedup is ~10x.

Nikolai Semko:

It would be interesting to do the same with this code and see how the performance gain changes with the complexity of the calculations (more gravity centres).

I'll try to do that.

 
Rorschach:

I want to move the indicator calculations entirely to the GPU, but first I need to figure out how to transfer such volumes of data.

By the way, I use this formula in my shader, so the speedup is ~10x.

I'll try to do that.

I think the results will be inspiring.

Yes, it really is a very cool addition to MQL5! Thanks to the MQ team for that.

Simply by including the resource in a program, you can speed up mathematical calculations many times over using the video card, which also offloads the CPU. In other words, a great option for the Market.

And while OpenCL requires the user to take some action to install software, DirectX is already installed in Windows by default. And the most amazing thing: I checked how much the .ex5 file grows when DirectX is used and saw no increase in file size at all. Very cool! Let's study it and use it.

 
Nikolai Semko:

I think the results will be inspiring.

Yes, it really is a very cool addition to MQL5! Thanks to the MQ team for that.

Simply by including the resource in a program, you can speed up mathematical calculations many times over using the video card, which also offloads the CPU. In other words, a great option for the Market.

And while OpenCL requires the user to take some action to install software, DirectX is already installed in Windows by default. And the most amazing thing: I checked how much the .ex5 file grows when DirectX is used and saw no increase in file size at all. Very cool! Let's study it and use it.

By the way, yes, DX is more versatile than OCL: the same capabilities plus 3D. There is also a CPU mode, and it doesn't require installing a CPU driver.

 
Rorschach:

By the way, I use this formula in my shader, so the speedup is ~10x.

It still comes out at 2.5x after all. The optimization sped up the CPU version, but not the GPU version.

Nikolai Semko:

It would be interesting to do the same with this code and see how the performance gain changes with the complexity of the calculations (more gravity centres).

This code doesn't use a precomputed sine array to speed things up; it just calls sin() directly. You can also change the number of rotating gravity centres.

I enabled optimization and restructured the code into 3 loops to make it easier to port. The limit is 512 centres. By default, it runs directly on the GPU.

Files:
pixel.zip  1 kb
 
Rorschach:

It still comes out at 2.5x after all. The optimization sped up the CPU version, but not the GPU version.

I enabled optimization and restructured the code into 3 loops to make it easier to port. The limit is 512 centres. By default, it runs directly on the GPU.

Thank you very much!
2.5x on simple calculations alone is also a very good result. With 3D, I think the gain will be even bigger.

 
Rorschach:

It still comes out at 2.5x after all. The optimization sped up the CPU version, but not the GPU version.

I enabled optimization and restructured the code into 3 loops to make it easier to port. The limit is 512 centres. By default, it runs directly on the GPU.

You asked for an OpenCL implementation. Here is what I came up with. Sorry about the code, I had no time to clean it up. The main thing is that it works.

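// The double type in kernel code requires fp64 support
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

// One work-item per pixel: (X, Y) = pixel coordinates, Width = image width
// N = number of gravity centres, XP/YP = centre coordinates,
// h = lookup table for the channel values, buf = output ARGB pixels
__kernel void Func(int N, __global double *XP, __global double *YP, __global uchar *h, __global uint *buf)
{
   size_t X = get_global_id(0);
   size_t Width = get_global_size(0);
   size_t Y = get_global_id(1);

   float2 p;
   double D=0,S1=0,S2=0;

   // Sum the distances from the pixel to all centres (S2)
   // and to the first half of the centres only (S1)
   for(int w=0;w<N;w++){
      p.x = XP[w]-X;
      p.y = YP[w]-Y;
      D = fast_length(p);
      S2+=D;
      if(w<N/2)
         S1+=D;
   }
   // Ratio in [0, 1] selects the colour
   double d=S1/S2;
   // Pack alpha 0xFF and the three looked-up channels into one uint:
   // upsample(hi, lo) joins two values into one twice as wide
   buf[Y*Width+X] = upsample(upsample((uchar)0xFF,(uchar)h[(int)(d*11520)]),upsample((uchar)h[(int)(d*17920)],(uchar)h[(int)(d*6400)]));
}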
Files:
Swirl2_OCL.mq5  14 kb
test_002.zip  1 kb
 
Serhii Shevchuk:

You asked for an OpenCL implementation. Here is what I came up with. Sorry about the code, I had no time to clean it up. The main thing is that it works.

Wow! To be honest, I didn't expect that. A gain of more than 10x with my modest video card.
Thank you very much!

 
Serhii Shevchuk:

You asked for an OpenCL implementation. Here is what I came up with. Sorry about the code, I had no time to clean it up. The main thing is that it works.

Thank you very much!

Are your calculations in double precision? Then the result is especially impressive.