OpenCL: internal implementation tests in MQL5 - page 65

 
fyords: In the reports, right-click "View"; in the new window, press the "Query" button. The log is then ordered correctly by time, and it's easier to read (for me).

Thanks, I didn't realise you were opening a report.

The gain was even greater.

I'm amazed: this is a budget card for under $80! So, NVidia has done some serious work on the driver.

 

And here are the new results:

2012.06.01 09:25:23     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     ========================================
2012.06.01 09:32:25     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     CPU time = 421203 ms
2012.06.01 09:32:25     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Соunt indicators = 16; Count history bars = 144000; Count pass = 12800
2012.06.01 09:32:25     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Result on Cpu МахResult==1.2809 at 9448 pass
2012.06.01 09:32:25     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     -------------------------
2012.06.01 09:32:28     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Device number = 0
2012.06.01 09:32:39     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     GPU time = 11263 ms
2012.06.01 09:32:39     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     CpuTime/GpuTime = 37.39705229512563
2012.06.01 09:32:39     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Result on Gpu МахResult==1.2809 at 9448 pass
2012.06.01 09:32:39     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     ------------
2012.06.01 09:32:39     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Device number = 1
2012.06.01 09:32:40     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     GPU time = 998 ms
2012.06.01 09:32:40     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     CpuTime/GpuTime = 422.0470941883768
2012.06.01 09:32:40     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     Result on Gpu МахResult==1.2809 at 9448 pass
2012.06.01 09:32:40     ParallelTester_00-01 x_new_cycle (EURUSD,D1)     ------------

My understanding is: 1. pure CPU, 2. CPU with OpenCL, 3. GPU with OpenCL?

And still, a factor of 422.

Mathemat:

I'm amazed: this is a budget card for under $80! So, NVidia has done some serious work on the driver.

I'm even more amazed: from rags to riches. One gets the impression that NVidia reads this forum, runs similar tests, finds bugs and fixes them.

If only the tester could choose the device to compute on by itself, i.e. without us having to write the code by hand, that would be very good. Still, 1 second (or 11 seconds if the video card is unsuitable or absent) against 7 minutes is impressive.

 
To keep the measurement error small, a good method is to run a series of measurements and either average the results or discard the extremes. But, of course, it is better to increase the amount of computation so that the result stabilises.
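The averaging idea can be sketched roughly as follows. This is an illustration in Python, not the script from this thread: `time_once`, `trimmed_mean` and `workload` are hypothetical names, and `workload` is only a stand-in for the code being measured.

```python
import statistics
import time


def time_once(func):
    """Return the wall-clock duration of one call to func, in milliseconds."""
    start = time.perf_counter()
    func()
    return (time.perf_counter() - start) * 1000.0


def trimmed_mean(samples, discard=1):
    """Average the samples after dropping the `discard` smallest and largest."""
    if len(samples) <= 2 * discard:
        raise ValueError("not enough samples to discard the extremes")
    ordered = sorted(samples)
    return statistics.fmean(ordered[discard:len(ordered) - discard])


def workload():
    # Hypothetical stand-in for the computation under test.
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    samples = [time_once(workload) for _ in range(10)]
    print(f"trimmed mean = {trimmed_mean(samples):.3f} ms")
```

Dropping one value from each end removes a single outlier in either direction; a longer workload reduces the scatter itself, which is the preferable fix.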

Modern operating systems and truly multi-core processors have largely eliminated the scatter in measurements made with GetTickCount. My original comment was solely about the erroneous statement "the average error of GetTickCount is at least tens of ms".
 
Attention Intel processor owners! When the nVidia driver 301.42 is installed, OpenCL drivers for the CPU are installed automatically.

In the registry it looks like this:

[HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors]
"nvcuda.dll"=dword:00000000
"amdocl.dll"=dword:00000000
"amdocl64.dll"=dword:00000000

"IntelOpenCL64.dll"=dword:00000000

They're about 1.5 times slower (highlighted in red) than Intel's native driver (highlighted in green).

You can delete the corresponding registry values, but save a copy of the branch first, just in case.

 

Dear Admin. Haven't been on your forum for a while, may have missed this point.

Will it become possible to offer video cards for the needs of the cloud?

 
ilovebtc:

Dear Admin. Haven't been on your forum for a while, may have missed this point.

Will it become possible to offer video cards for the needs of the cloud?

Almost done https://www.mql5.com/ru/forum/23/page15#comment_201948

19. MetaTester: Added support for using OpenCL programs in testing agents.

OpenCL programs are intended for performing computations on video cards that support OpenCL 1.1 or higher. Modern video cards contain hundreds of small specialized processors that can simultaneously perform simple mathematical operations on incoming data streams. The OpenCL language takes care of organising such parallel computing and offers a great speed-up for a certain class of tasks.
After the tests, we will enable OpenCL in the distributed network.
 
fyords: My understanding is: 1. pure CPU, 2. CPU with OpenCL, 3. GPU with OpenCL?

Yes, that's right.

Would you mind running the attached script and posting the results? It's really interesting.

Don't be put off by the large number of figures. They are only there to check that the calculations are correct.

The script also runs through all the devices. The main task is to multiply two large matrices.

The settings can be changed only in the code: the linear size of the matrices, _size, is set in this line:

#define _size       2000

Change it only if you run out of memory. A sign of that is a discrepancy between the array values when running on a discrete GPU: if the values differ by more than 10^(-4), that is an obvious error. But you seem to have enough memory.
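For a rough idea of how much memory a given _size needs, here is a back-of-the-envelope estimate in Python. It rests on two assumptions that are not stated in the thread: the matrices are stored as 4-byte single-precision floats, and three buffers (two inputs, one result) sit on the device at once. The helper names are hypothetical.

```python
def matrix_bytes(rows, cols, bytes_per_element=4):
    """Bytes needed for one dense rows x cols matrix (assumed 4-byte floats)."""
    return rows * cols * bytes_per_element


def total_buffer_mib(size, bytes_per_element=4, buffers=3):
    """MiB for `buffers` square matrices of linear size `size` (assumed layout)."""
    return buffers * matrix_bytes(size, size, bytes_per_element) / (1024 * 1024)


if __name__ == "__main__":
    size = 2000
    print(size * size)                       # 4000000 elements per matrix
    print(round(total_buffer_mib(size), 1))  # 45.8
```

The element count agrees with the "read = 4000000 elements" line in the logs. Under these assumptions _size = 2000 needs under 50 MiB, and the requirement grows with the square of _size.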


For example, I have a Radeon 6930 graphics card with 1280 stream processors. How will it show up in the agent list: as 1 device, or as all 1280?

On its own it is several times faster than 10 processors, so the reward should not be the same as for 1 added device.

 
Mathemat:

Would you mind running the attached script and posting the results? It's really interesting.

No, I don't mind at all. I'm curious about it myself. I haven't changed anything in the settings.

2012.06.02 09:28:27     vect_v2_all_devices (EURUSD,D1) =======================================
2012.06.02 09:28:27     vect_v2_all_devices (EURUSD,D1) OCL martices mul:         ROWS1 = 2000; COLSROWS = 2000; COLS2 = 2000
2012.06.02 09:30:31     vect_v2_all_devices (EURUSD,D1) CPUTime = 124.504
2012.06.02 09:30:31     vect_v2_all_devices (EURUSD,D1) ---------------
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) read = 4000000 elements
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) Device = 0: time = 2.824 sec.
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) CPUTime / GPUTotalTime = 44.088
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1968,1939 ) = -5.27639246;    thirdCPU[ 1968,1939 ] = -5.27639246;    buf[ 1968,1939 ] = -5.27639198
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 585,810 ) = 3.74615073;    thirdCPU[ 585,810 ] = 3.74615073;    buf[ 585,810 ] = 3.74614906
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1131,1732 ) = -4.46934557;    thirdCPU[ 1131,1732 ] = -4.46934557;    buf[ 1131,1732 ] = -4.46934605
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 587,999 ) = -4.46048546;    thirdCPU[ 587,999 ] = -4.46048546;    buf[ 587,999 ] = -4.46048260
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 983,1903 ) = 3.42076445;    thirdCPU[ 983,1903 ] = 3.42076445;    buf[ 983,1903 ] = 3.42076564
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1927,313 ) = 5.62960339;    thirdCPU[ 1927,313 ] = 5.62960339;    buf[ 1927,313 ] = 5.62960196
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 355,1897 ) = 5.86679220;    thirdCPU[ 355,1897 ] = 5.86679220;    buf[ 355,1897 ] = 5.86678505
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1455,1651 ) = -3.67937088;    thirdCPU[ 1455,1651 ] = -3.67937088;    buf[ 1455,1651 ] = -3.67936754
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1207,856 ) = 1.30920172;    thirdCPU[ 1207,856 ] = 1.30920172;    buf[ 1207,856 ] = 1.30920100
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) sum( 1699,575 ) = 2.55669522;    thirdCPU[ 1699,575 ] = 2.55669522;    buf[ 1699,575 ] = 2.55669498
2012.06.02 09:30:38     vect_v2_all_devices (EURUSD,D1) ________________________
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) read = 4000000 elements
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) Device = 1: time = 1.514 sec.
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) CPUTime / GPUTotalTime = 82.235
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 407,514 ) = -3.69270682;    thirdCPU[ 407,514 ] = -3.69270682;    buf[ 407,514 ] = -3.69270515
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1421,1902 ) = -7.43944120;    thirdCPU[ 1421,1902 ] = -7.43944120;    buf[ 1421,1902 ] = -7.43943167
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1197,1072 ) = -1.49989450;    thirdCPU[ 1197,1072 ] = -1.49989450;    buf[ 1197,1072 ] = -1.49989557
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1249,1056 ) = -0.22817086;    thirdCPU[ 1249,1056 ] = -0.22817086;    buf[ 1249,1056 ] = -0.22817032
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 385,1856 ) = 3.88903213;    thirdCPU[ 385,1856 ] = 3.88903213;    buf[ 385,1856 ] = 3.88902068
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 952,488 ) = 0.37963703;    thirdCPU[ 952,488 ] = 0.37963703;    buf[ 952,488 ] = 0.37963703
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 345,1572 ) = 2.28500485;    thirdCPU[ 345,1572 ] = 2.28500485;    buf[ 345,1572 ] = 2.28500390
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1928,468 ) = -1.35805547;    thirdCPU[ 1928,468 ] = -1.35805547;    buf[ 1928,468 ] = -1.35805535
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1881,1968 ) = -3.12033391;    thirdCPU[ 1881,1968 ] = -3.12033391;    buf[ 1881,1968 ] = -3.12033129
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) sum( 1454,575 ) = 5.97233009;    thirdCPU[ 1454,575 ] = 5.97233009;    buf[ 1454,575 ] = 5.97232151
2012.06.02 09:30:42     vect_v2_all_devices (EURUSD,D1) ________________________

I just don't understand any of these numbers. Can you explain? At least in simple terms: is it good or not? They differ between devices, and within a line the values already differ in places at the 5th or 6th digit after the decimal point.

I think I got it: it is a repeated test of the same operations, and the final time is the average for each device. Right?

 
fyords: I just don't understand any of these numbers. Can you explain? At least in simple terms: is it good or not? They differ between devices, and within a line the values already differ in places at the 5th or 6th digit after the decimal point.
sum( 407,514 ) = -3.69270682;    thirdCPU[ 407,514 ] = -3.69270682;    buf[ 407,514 ] = -3.69270515

These are just check values. If they agree to within 0.00001, everything is OK. The indices are chosen at random: it is a spot check to make sure that the calculations are correct. Well, we are not going to print the results of a full check of all 4 million elements of the resulting matrix here, are we?
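The spot check can be illustrated with a small Python sketch. It is not the attached script: all function names are hypothetical, the matrices are tiny, and `multiply_float32` merely imitates a single-precision device result by rounding every intermediate value to float32 (an assumption about where the small differences in the logs come from).

```python
import random
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_reference(a, b, i, j):
    """Element (i, j) of the product a*b, computed in double precision."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def multiply_float32(a, b):
    """Full product with every intermediate rounded to float32 (device stand-in)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc = to_float32(acc + to_float32(a[i][k] * b[k][j]))
            out[i][j] = acc
    return out


def spot_check(a, b, result, samples=10, tol=1e-4, rng=random):
    """Compare `samples` randomly chosen elements of result with the reference.

    Returns (passed, worst absolute difference seen).
    """
    worst = 0.0
    for _ in range(samples):
        i = rng.randrange(len(a))
        j = rng.randrange(len(b[0]))
        worst = max(worst, abs(dot_reference(a, b, i, j) - result[i][j]))
    return worst <= tol, worst


if __name__ == "__main__":
    rng = random.Random(0)
    size = 20  # tiny compared with the 2000 x 2000 matrices in the thread
    a = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    b = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    passed, worst = spot_check(a, b, multiply_float32(a, b), rng=rng)
    print(passed, worst)
```

Checking ten random elements costs ten dot products instead of a second full multiplication, which is why only a handful of lines are printed per device.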

I think I got it: it is a repeated test of the same operations, and the final time is the average for each device. Right?

No, it is a single multiplication of two large matrices.

As for the performance figures: very good for this card. Now my results. Devices (listed bottom to top, in initialisation order):

2012.06.02 05:49:25     OpenCL  CPU: GenuineIntel  Intel(R) Pentium(R) CPU G840 @ 2.80 GHz with OpenCL 1.2 (2 units, 2793 MHz, 8040 Mb, version 2.0 (sse2))
2012.06.02 05:49:25     OpenCL  GPU: Advanced Micro Devices, Inc. ATI RV770 with OpenCL 1.0 (10 units, 780 MHz, 512 Mb, version CAL 1.4.1720)
2012.06.02 05:49:25     OpenCL  CPU: Intel(R) Corporation  Intel(R) Pentium(R) CPU G840 @ 2.80 GHz with OpenCL 1.1 (2 units, 2800 MHz, 8040 Mb, version 1.1)

I.e. first the Intel CPU with Intel's OCL engine, then my dinosaur HD 4870, and then the CPU again, but with AMD's engine. Script:

2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) ________________________
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 1477,98 ) = -5.84002066;    thirdCPU[ 1477,98 ] = -5.84002066;    buf[ 1477,98 ] = -5.84001255
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 1339,1186 ) = 0.59214997;    thirdCPU[ 1339,1186 ] = 0.59214997;    buf[ 1339,1186 ] = 0.59215009
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 1410,1861 ) = -0.27033439;    thirdCPU[ 1410,1861 ] = -0.27033439;    buf[ 1410,1861 ] = -0.27033412
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 1282,459 ) = -0.87189484;    thirdCPU[ 1282,459 ] = -0.87189484;    buf[ 1282,459 ] = -0.87189591
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 710,1645 ) = 4.86117268;    thirdCPU[ 710,1645 ] = 4.86117268;    buf[ 710,1645 ] = 4.86116362
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 526,938 ) = 0.94805324;    thirdCPU[ 526,938 ] = 0.94805324;    buf[ 526,938 ] = 0.94805157
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 914,489 ) = 5.58242941;    thirdCPU[ 914,489 ] = 5.58242941;    buf[ 914,489 ] = 5.58243275
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 811,257 ) = -1.11584055;    thirdCPU[ 811,257 ] = -1.11584055;    buf[ 811,257 ] = -1.11583853
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 318,498 ) = 1.62952971;    thirdCPU[ 318,498 ] = 1.62952971;    buf[ 318,498 ] = 1.62952805
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) sum( 648,1434 ) = -5.57316303;    thirdCPU[ 648,1434 ] = -5.57316303;    buf[ 648,1434 ] = -5.57315731
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) CPUTime / GPUTotalTime = 29.879
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) Device = 2: time = 3.105 sec.
2012.06.02 07:38:19     vect_v2_all_devices (EURUSD,H1) read = 4000000 elements
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) ________________________
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 684,439 ) = 0.21124490;    thirdCPU[ 684,439 ] = 0.21124490;    buf[ 684,439 ] = 0.21124732
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 795,204 ) = -1.68047857;    thirdCPU[ 795,204 ] = -1.68047857;    buf[ 795,204 ] = -1.68047154
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 579,1503 ) = 2.46559286;    thirdCPU[ 579,1503 ] = 2.46559286;    buf[ 579,1503 ] = 2.46558809
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 675,1504 ) = 0.44935751;    thirdCPU[ 675,1504 ] = 0.44935751;    buf[ 675,1504 ] = 0.44935691
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 1251,1415 ) = -2.85569835;    thirdCPU[ 1251,1415 ] = -2.85569835;    buf[ 1251,1415 ] = -2.85569715
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 204,1755 ) = 0.31420049;    thirdCPU[ 204,1755 ] = 0.31420049;    buf[ 204,1755 ] = 0.31420231
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 1999,74 ) = -2.22978306;    thirdCPU[ 1999,74 ] = -2.22978306;    buf[ 1999,74 ] = -2.22977948
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 436,657 ) = 0.59192652;    thirdCPU[ 436,657 ] = 0.59192652;    buf[ 436,657 ] = 0.59192693
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 967,922 ) = -4.91348410;    thirdCPU[ 967,922 ] = -4.91348410;    buf[ 967,922 ] = -4.91348314
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) sum( 1489,1175 ) = -2.48868656;    thirdCPU[ 1489,1175 ] = -2.48868656;    buf[ 1489,1175 ] = -2.48868561
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) CPUTime / GPUTotalTime = 179.795
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) Device = 1: time = 0.516 sec.
2012.06.02 07:38:15     vect_v2_all_devices (EURUSD,H1) read = 4000000 elements
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) ________________________
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 303,1215 ) = -7.46387863;    thirdCPU[ 303,1215 ] = -7.46387863;    buf[ 303,1215 ] = -7.46388054
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1173,1406 ) = -5.64940453;    thirdCPU[ 1173,1406 ] = -5.64940453;    buf[ 1173,1406 ] = -5.64940882
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1617,1405 ) = -0.98162729;    thirdCPU[ 1617,1405 ] = -0.98162729;    buf[ 1617,1405 ] = -0.98162866
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 760,1003 ) = -0.97699410;    thirdCPU[ 760,1003 ] = -0.97699410;    buf[ 760,1003 ] = -0.97699606
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 679,793 ) = -5.41226530;    thirdCPU[ 679,793 ] = -5.41226530;    buf[ 679,793 ] = -5.41227150
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1345,1865 ) = 0.95630527;    thirdCPU[ 1345,1865 ] = 0.95630527;    buf[ 1345,1865 ] = 0.95630503
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1289,1659 ) = -3.82919979;    thirdCPU[ 1289,1659 ] = -3.82919979;    buf[ 1289,1659 ] = -3.82920074
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1216,1759 ) = 4.87398672;    thirdCPU[ 1216,1759 ] = 4.87398672;    buf[ 1216,1759 ] = 4.87398672
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1268,1060 ) = 2.78621030;    thirdCPU[ 1268,1060 ] = 2.78621030;    buf[ 1268,1060 ] = 2.78621435
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) sum( 1686,577 ) = -4.36586094;    thirdCPU[ 1686,577 ] = -4.36586094;    buf[ 1686,577 ] = -4.36585188
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) CPUTime / GPUTotalTime = 22.783
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) Device = 0: time = 4.072 sec.
2012.06.02 07:38:14     vect_v2_all_devices (EURUSD,H1) read = 4000000 elements
2012.06.02 07:38:10     vect_v2_all_devices (EURUSD,H1) ---------------
2012.06.02 07:38:10     vect_v2_all_devices (EURUSD,H1) CPUTime = 92.774
2012.06.02 07:36:37     vect_v2_all_devices (EURUSD,H1) OCL martices mul:         ROWS1 = 2000; COLSROWS = 2000; COLS2 = 2000
2012.06.02 07:36:37     vect_v2_all_devices (EURUSD,H1) =======================================