OpenCL: internal implementation tests in MQL5 - page 64

 

http://www.ixbt.com/video3/rad2.shtml - For large data sets it is better to use optimized libraries than to get "creative" writing your own OpenCL programs (though I'm not ruling that out). A hybrid system is also possible, where small volumes are handled with OpenCL and large volumes with optimized libraries. The library may need to be ported to a specific programming language, and conditions created for including it. If this can be implemented, the result would be impressive and the operation would be accelerated many times over. Note to .....

P.S. This may deserve a new thread in the forum.

 
GKS: It may be necessary to convert libraries to a specific programming language and create conditions to enable this library. If this can be implemented, you will get an impressive result and consequently the operation will be accelerated many times.

It is not practical for the developers to adapt the compiler specifically for one extremely specialized product, however unique it may be.

And so far I do not see any trading tasks that require multiplying matrices of such a huge size.

 
MetaQuotes:

Announcement of MetaTrader 5 update

An update of the MetaTrader 5 platform will be published in the next few days. Once the update is published, there will be an additional news release containing the final list of changes and build numbers. The following changes are planned:

MetaTrader 5 Client Terminal build 648

MetaTester: Added support for using OpenCL programs in testing agents.

Those who understand OpenCL: please prepare a test task for Cloud+OpenCL. The mathematical prospects are very interesting.
 
hrenfx: Those who understand OpenCL: please prepare a test task for Cloud+OpenCL. The mathematical prospects are very interesting.
That is more a job for MetaDriver. He is the big OpenCL specialist here and has been trying to emulate testing.
 
hrenfx:
Those who understand OpenCL: please prepare a test task for Cloud+OpenCL. The mathematical prospects are very interesting.
Mathemat:
That's more for MetaDriver........................
I will think about it. I need some ideas in terms of what exactly needs to be computed.
 

I recently updated my video driver (NVIDIA 301.42).

Out of curiosity I ran one of the old tests (ParallelTester_00-01x) and could not believe my eyes.

Back on page 24 I ran this test and the ratio was 29; after I set the memory to dual-channel mode it rose to 39.

Now it is ~306.

2012.05.31 22:05:11     ParallelTester_00-01 x (EURUSD,D1)       OpenCL init OK!
2012.05.31 22:05:11     ParallelTester_00-01 x (EURUSD,D1)       GPU time = 141 ms
2012.05.31 22:05:11     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.05.31 22:05:11     ParallelTester_00-01 x (EURUSD,D1)       Result on Gpu МахResult==1.28051 at 1213 pass
2012.05.31 22:05:54     ParallelTester_00-01 x (EURUSD,D1)       CPU time = 43259 ms
2012.05.31 22:05:54     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.05.31 22:05:54     ParallelTester_00-01 x (EURUSD,D1)       Result on Cpu МахResult==1.28051 at 1213 pass
2012.05.31 22:05:54     ParallelTester_00-01 x (EURUSD,D1)       CpuTime/GpuTime = 306.8014184397163
2012.05.31 21:41:04     OpenCL  GPU: NVIDIA Corporation GeForce GT 440 with OpenCL 1.1 (2 units, 1660 MHz, 1023 Mb, version 301.42)
2012.05.31 21:41:04     OpenCL  CPU: AuthenticAMD AMD Athlon(tm) II X4 630 Processor with OpenCL 1.1 (4 units, 2948 MHz, 2048 Mb, version 2.0)
Amazing. It seems that NVIDIA has finally done a decent job on its drivers.
 

fyords, how did you make earlier events appear higher in the log?

But in general it's great, I understand you. I was just as happy when I bought my HD 4870 on the cheap and saw its power.

One small recommendation: choose the parameters so that the GPU execution time is on the order of 1 second. Then the time ratio will also be more accurate. The average error of the GetTickCount() function is no less than tens of milliseconds, so you could just as easily get 120 ms or 170 ms on the GPU, and the measured speedup depends heavily on that.
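
The recommendation above can be sketched in MQL5 roughly as follows. This is only an illustration under stated assumptions: `Workload()` is a hypothetical stand-in for one test pass and `MIN_ELAPSED_MS` is an assumed target; neither comes from ParallelTester. The idea is to keep repeating the work until about one second has been measured, so that a GetTickCount() error of tens of milliseconds becomes a small fraction of the total.

```mql5
// Hypothetical workload standing in for one real test pass (an assumption).
double Workload()
{
   double k = 0.0;
   for(int i = 0; i < 1000000; i++)
      k += MathSin(i);
   return k;
}

void OnStart()
{
   const uint MIN_ELAPSED_MS = 1000;    // assumed target: about 1 second of measured time
   uint   runs    = 0;
   uint   elapsed = 0;
   double sink    = 0.0;                // accumulates results so the work is actually used
   uint   start   = GetTickCount();

   // Repeat the workload until the measured interval is long enough
   while(elapsed < MIN_ELAPSED_MS)
   {
      sink += Workload();
      runs++;
      elapsed = GetTickCount() - start;
   }

   PrintFormat("runs = %u, total = %u ms, per run = %.3f ms (sink = %g)",
               runs, elapsed, (double)elapsed / runs, sink);
}
```

Dividing the total by the number of runs gives a per-pass time whose absolute error shrinks as the measured interval grows.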

I have tweaked this script a bit to make it run on all available devices (read from the bottom up: 1) CPU on the Intel platform, then 2) HD 4870 on the AMD platform, and 3) CPU on the AMD platform):

2012.05.31 15:48:35     OpenCL  CPU: GenuineIntel  Intel(R) Pentium(R) CPU G840 @ 2.80 GHz with OpenCL 1.2 (2 units, 2793 MHz, 8040 Mb, version 2.0 (sse2))
2012.05.31 15:48:35     OpenCL  GPU: Advanced Micro Devices, Inc. ATI RV770 with OpenCL 1.0 (10 units, 780 MHz, 512 Mb, version CAL 1.4.1720)
2012.05.31 15:48:35     OpenCL  CPU: Intel(R) Corporation  Intel(R) Pentium(R) CPU G840 @ 2.80 GHz with OpenCL 1.1 (2 units, 2800 MHz, 8040 Mb, version 1.1)

The script results are from the bottom up!

2012.06.01 01:06:12     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     ------------
2012.06.01 01:06:12     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Result on Gpu МахResult==0.87527 at 10902 pass
2012.06.01 01:06:12     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     CpuTime/GpuTime = 24.76943755169562
2012.06.01 01:06:12     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     GPU time = 9672 ms
2012.06.01 01:06:02     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Device number = 2
2012.06.01 01:06:02     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     ------------
2012.06.01 01:06:02     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Result on Gpu МахResult==0.87527 at 10902 pass
2012.06.01 01:06:02     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     CpuTime/GpuTime = 204.7606837606838
2012.06.01 01:06:02     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     GPU time = 1170 ms
2012.06.01 01:06:01     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Device number = 1
2012.06.01 01:06:01     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     ------------
2012.06.01 01:06:01     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Result on Gpu МахResult==0.87527 at 10902 pass
2012.06.01 01:06:01     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     CpuTime/GpuTime = 77.55584331498866
2012.06.01 01:06:01     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     GPU time = 3089 ms
2012.06.01 01:05:57     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Device number = 0
2012.06.01 01:05:57     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     -------------------------
2012.06.01 01:05:57     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Result on Cpu МахResult==0.87527 at 10902 pass
2012.06.01 01:05:57     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     Соunt indicators = 16; Count history bars = 144000; Count pass = 12800
2012.06.01 01:05:57     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     CPU time = 239570 ms
2012.06.01 01:01:58     ParallelTester_00-01 x_new_cycle (EURUSD,H1)     ========================================
With the last parameter (Count pass) set 10 times lower, my card is not as fast as yours. It probably doesn't have time to get up to speed properly :)
 
For your information, the error of GetTickCount is well under 16 ms; it's not as if you were using Windows 95.
 
Mathemat:

fyords, how did you make earlier events appear higher in the log?

In the reports: right-click, choose "View", then press the "Query" button in the new window. The log is then built in the correct time order, which is more convenient to read (for me).

As for the script, thank you. I will try it tomorrow, since it takes a long time to complete, especially with Count pass = 12800.

For now, here is the old script with Count pass = 12800:

2012.06.01 01:05:53     ParallelTester_00-01 x (EURUSD,D1)       OpenCL init OK!
2012.06.01 01:05:54     ParallelTester_00-01 x (EURUSD,D1)       GPU time = 999 ms
2012.06.01 01:05:54     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 12800
2012.06.01 01:05:54     ParallelTester_00-01 x (EURUSD,D1)       Result on Gpu МахResult==1.49697 at 10010 pass
2012.06.01 01:13:08     ParallelTester_00-01 x (EURUSD,D1)       CPU time = 434167 ms
2012.06.01 01:13:08     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 12800
2012.06.01 01:13:08     ParallelTester_00-01 x (EURUSD,D1)       Result on Cpu МахResult==1.49697 at 10010 pass
2012.06.01 01:13:08     ParallelTester_00-01 x (EURUSD,D1)       CpuTime/GpuTime = 434.6016016016016
The gain has become even greater.
 
Renat: For information: GetTickCount has an error much less than 16 ms, it's not like you're using Windows 95.

The error is not actually much less than that. It is close to 16 ms, but there are outliers from the average, clustering around 32 ms, 48 ms and even more. They are rare, I admit, and can be ignored.

But when someone runs a script, they are not necessarily leaving the computer idle. The system can also run its own tasks, which can slow down execution.

Technically, the standard deviation is indeed small, around 6-7 ms, and depends only weakly on the execution time itself. But it reflects the true variation poorly. Here's a histogram of the times recorded when performing the same calculations:

The distance between adjacent bars is 16 ms. The smaller side bars are still quite likely, and they differ from each other by as much as 32 ms. If the middle bar ("true execution time") is 140 ms, then the left one is 124 ms and the right one is 156 ms.

So, the real variation when divided by the low GPU execution time can be quite large:

20 seconds/124 ms ~ 161

20 seconds/156 ms ~ 128.

The "true ratio" of execution times roughly corresponds to the largest bar:

20 seconds/140 ms ~ 143.

If the execution time on the GPU is longer, the impact of this error is much smaller. Let it be at least 500 ms.
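
The arithmetic above can be checked with a short MQL5 sketch. The helper `PrintSpread()` and the fixed error of 16 ms in either direction are assumptions made for illustration. The second call further assumes that the workload is scaled so that the CPU time grows in the same proportion as the GPU time.

```mql5
// Prints the CPU/GPU time ratio when the measured GPU time is off by +/- errMs.
// PrintSpread is a hypothetical helper, not part of any script in this thread.
void PrintSpread(const double cpuMs, const double gpuMs, const double errMs)
{
   double low  = cpuMs / (gpuMs + errMs);   // GPU time overestimated
   double mid  = cpuMs / gpuMs;             // "true" ratio
   double high = cpuMs / (gpuMs - errMs);   // GPU time underestimated
   PrintFormat("GPU %.0f ms: ratio %.1f .. %.1f .. %.1f (spread %.1f%% of the true ratio)",
               gpuMs, low, mid, high, 100.0 * (high - low) / mid);
}

void OnStart()
{
   // Case from the text: 20 seconds on the CPU, 140 ms on the GPU
   PrintSpread(20000.0, 140.0, 16.0);

   // Same true ratio, but with the workload scaled so the GPU takes 500 ms
   PrintSpread(20000.0 * 500.0 / 140.0, 500.0, 16.0);
}
```

With the same absolute error, the spread of the ratio narrows as the GPU time grows, which is the point of asking for at least 500 ms.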

Script for the simulation:

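#define BIG       10000000   // inner-loop iterations per measurement
#define SMALL     1000       // number of measurements

void OnStart( )
{
   Print( "Script started..." );
   double k = 0.0;
   uint times[ SMALL ];
   MathSrand( (int)TimeCurrent( ) );
   for( int ii = 0; ii < SMALL; ii ++ )
   {
      Comment( ii );
      uint st = GetTickCount( );                 // GetTickCount() returns uint
      for( int i = 0; i < BIG; i ++ )   k = sin( i );
      times[ ii ] = GetTickCount( ) - st;        // elapsed time in milliseconds
   }

   // One value per line: each FileWrite() call ends its record with "\r\n"
   int h = FileOpen( "gtc_times.txt", FILE_WRITE | FILE_CSV | FILE_ANSI );
   if( h == INVALID_HANDLE )
   {
      Print( "FileOpen failed, error ", GetLastError( ) );
      return;
   }
   for( int ii = 0; ii < SMALL; ii ++ )
      FileWrite( h, times[ ii ] );   
   FileClose( h ); 
   Print("Script unloaded");
}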
//+------------------------------------------------------------------+
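
A histogram like the one discussed above could be rebuilt from the saved file with a sketch along these lines. It assumes that gtc_times.txt contains one integer per line and can be read as ANSI text (if the file was written in another encoding, the flags need adjusting). The 16 ms bin width is taken from the discussion, while `MAX_BINS` is a hypothetical cap chosen for illustration.

```mql5
#define BIN_MS    16    // bin width in ms, taken from the discussion above
#define MAX_BINS  64    // hypothetical cap: covers times up to 1023 ms

void OnStart()
{
   // Assumption: the file holds one integer per line, readable as ANSI text
   int h = FileOpen("gtc_times.txt", FILE_READ | FILE_TXT | FILE_ANSI);
   if(h == INVALID_HANDLE)
   {
      Print("Cannot open gtc_times.txt, error ", GetLastError());
      return;
   }

   int counts[MAX_BINS];
   ArrayInitialize(counts, 0);
   int total = 0;

   while(!FileIsEnding(h))
   {
      string line = FileReadString(h);
      if(StringLen(line) == 0)
         continue;                              // skip empty lines
      int ms  = (int)StringToInteger(line);
      int bin = ms / BIN_MS;
      if(bin < 0)         bin = 0;              // guard against malformed values
      if(bin >= MAX_BINS) bin = MAX_BINS - 1;   // clamp outliers into the last bin
      counts[bin]++;
      total++;
   }
   FileClose(h);

   // Print only the non-empty bins
   for(int b = 0; b < MAX_BINS; b++)
      if(counts[b] > 0)
         PrintFormat("%4d..%4d ms: %d", b * BIN_MS, (b + 1) * BIN_MS - 1, counts[b]);
   Print("Samples read: ", total);
}
```

If the times really cluster in steps of about 16 ms, most samples should fall into a few adjacent bins.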