OpenCL and the tools for it. Reviews and impressions. - page 5

 
Mathemat:

Great, it will provide a basis for comparing MQL and C/C++. I just provided the facts. And anyway, why on earth should I be a doubting Thomas?

There might be a bit more difference on the nerves, no argument there.

And, I'd like to see your "just the facts", by the way.

My "just the facts" are almost there. "Almost" - because I dug out the sources of my old tests. I'm going to update them a bit, and then I'll post the sources with the test results in a table here.

 
joo: And, I'd like to see your 'just the facts', by the way.

Right here. See my second post on the page.

You, by the way, already replied to my post with the link. The C code above can be easily reworked in MQL4. Please see the attachment.

Files:
pi.mq4  1 kb
 
Mathemat:

Right here. See my second post on the page.

By the way, you already replied to my post with the link. The C code above is easily reworked for MQL4. Please see the attachment.

Please make a table of test results and post it here, so that no one reading this thread would have to jump through the links.
 
AlexEro:

I sent you a link to the pictures of these behemoths in a private message - not to pander to the digital ******ism of vocational school kids.

Come on, believe me, your picture won't change their numbers here by even a percent. Go ahead and post it.

But how long will it last? Ideally, you should fit proper cooling to such a rig, water cooling, say, as in the next picture.

 
joo:
Please, write down the test results as a table and post them here, so that nobody reading this thread would have to jump through the links.

It's not a table. A couple of pictures.

Tests of parallelpi_x.cpp programs at different compilation settings. When comparing against MQL4, only the 1st result is important: 6.723 seconds. No accelerators (SSE*, IPP, OML) are used there.

But if anyone wants to experiment and see how the results change when the accelerators are turned on, the attachment below contains an archive with the compiled .exe files and the required parallel libraries. Just put them all in one directory and run them from the command line.

Of course, these results are no competition for the monsters built on arrays of graphics cards.

The same program rewritten in MQL4:


The result: 22.98 seconds, i.e. 3.4 times slower. But this test does no work with arrays, and that may be crucial for us. The code of the script is attached.
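
The sources themselves are only in the attachments, so the following is just a sketch of the kind of loop a pi benchmark like this typically times. It assumes the common midpoint-rule integration of 4/(1+x^2) over [0, 1]; the name `estimate_pi` and the step count are illustrative and are not taken from parallelpi_x.cpp or pi_1.mq4.

```python
import math

def estimate_pi(num_steps):
    """Midpoint-rule estimate of the integral of 4/(1+x^2) on [0, 1], which equals pi."""
    step = 1.0 / num_steps
    total = 0.0
    for i in range(num_steps):
        x = (i + 0.5) * step          # midpoint of the i-th sub-interval
        total += 4.0 / (1.0 + x * x)
    return total * step

approx = estimate_pi(1_000_000)       # assumed step count, for illustration only
print(f"estimate = {approx:.10f}, error = {abs(approx - math.pi):.2e}")

# Ratio of the two timings quoted in the post (MQL4 vs. plain C++ build).
print(f"slowdown = {22.98 / 6.723:.1f}x")
```

A loop like this is pure scalar arithmetic, which is consistent with the remark that the test says nothing about array handling.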

Files:
release.zip  278 kb
pi_1.mq4  1 kb
 
joo:

MQL5 is 20 times faster than MQL4.

C++ is 6 times faster than MQL5 (when using libraries that automatically parallelize execution).

Total: 20*6=120 times.

If you use GPU calculations, it will be even faster.

TOTAL: 10/120=0.083 s.

something like this.

So, fanfare! There are four different compilers in the ring, competing to... to compile.

Well, that's a joke, of course. But seriously, six tests have been written. The results are shown in the table below, and the comments come after it. :)

| Test # | Description | Executable name | Test result, s |
|--------|-------------|-----------------|----------------|
| 1 | Ex5 script and Ex5 library | 1 MLP MQL compiler.ex5 | 97.2 |
| 2 | Ex5 script and C++ dll library, MS compiler, all optimizations disabled | 2 MLP MS compiler nonOpt.ex5 | 42.6 |
| 3 | Ex5 script and C++ dll library, MS compiler, all optimizations on | 3 MLP MS compiler Opt.ex5 | 27.1 |
| 4 | Ex5 script and C++ dll library, Intel compiler, all optimizations included | 4 MLP Intel compiler.ex5 | 12.5 |
| 5 | Ex4 script and Ex4 library | 5 MLP MQL4 compiler.ex4 | 669.6 |
| 6 | Ex4 script and C++ dll library, Intel compiler, all optimizations enabled | 6 MLP MQL4 Intel compiler.ex4 | 10.7 |




The heavy computation used for the tests is a 4-layer MLP neural network (80-100-100-10), built as pluggable ex4, ex5, and dll libraries.
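
To make the workload concrete, here is a minimal sketch of one forward pass through a fully connected 80-100-100-10 network. The tanh activation, the random weights, and the names `make_layer` and `forward` are assumptions made for illustration; the attached sources may differ on all of these points.

```python
import math
import random

LAYER_SIZES = [80, 100, 100, 10]  # layer sizes as given in the post

def make_layer(n_in, n_out, rng):
    """One dense layer: n_out rows, each holding n_in weights followed by one bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, inputs):
    """Propagate the inputs through every layer (tanh is an assumed activation)."""
    signal = list(inputs)
    for layer in layers:
        # zip stops at len(signal), so row[-1] (the bias) is added separately.
        signal = [
            math.tanh(sum(w * s for w, s in zip(row, signal)) + row[-1])
            for row in layer
        ]
    return signal

rng = random.Random(0)
layers = [make_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
outputs = forward(layers, [0.5] * LAYER_SIZES[0])
print(len(outputs))  # one value per output neuron
```

Almost all the time goes into the inner multiply-and-add loop, which is the part an optimizing compiler or a GPU can speed up.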

We see that:

- the results of the ex5 libraries and the ex4 libraries differ by a factor of 6.8 (tests 1 and 5)

- the results of the ex5 libraries and the Intel-compiled dll libraries differ by a factor of 7.8 (tests 1 and 4)

- the results of the ex4 libraries and the Intel-compiled dll libraries differ by a factor of 62.5 (tests 5 and 6)
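
These ratios can be recomputed from the timings in the table. A small sketch: the dictionary keys are simply the test numbers, and the helper name `ratio` is illustrative.

```python
# Test number -> run time in seconds, as reported in the table.
timings = {1: 97.2, 2: 42.6, 3: 27.1, 4: 12.5, 5: 669.6, 6: 10.7}

def ratio(slow, fast):
    """How many times longer test `slow` ran than test `fast`."""
    return timings[slow] / timings[fast]

for slow, fast in [(5, 1), (1, 4), (5, 6)]:
    print(f"test {slow} vs test {fast}: {ratio(slow, fast):.2f}x")
```

The first ratio works out to about 6.89, which the post truncates to 6.8; the other two match the quoted 7.8 and 62.5 after rounding.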

Conclusions:

Of course, I exaggerated a little about the 20-fold difference between MQL4 and MQL5. Such a difference does occur (I remember where that number came from) with intensive use of 2-dimensional arrays, but I'm too lazy to tune the tests for that specific feature (besides, it would not correspond to typical cases of "heavy" calculations, and typical cases are what the tests represent), so you can take my word for it or write a test yourself. So we can talk about a 6.8-fold speed difference between MQL4 and MQL5 and a 62.5-fold difference between MQL4 and C++.

And one fact puzzles me - Achtung! - the MQL4 script that calls the dll runs faster than the MQL5 script with the same dll... What does that mean? I checked it and ran it several times, no errors. The developers seem to have said that dll calls in MT5 are optimized compared to MT4. Either it is down to peculiarities of the builds (MT5 574 and MT4 409) and the tests are not quite correct, or... I don't know.


Anyone who needs it can use the network (the compiled dll from tests 4 or 6 is a very fast network); the sources are attached. The number of neurons in each layer is configurable. But the optimizer (in both 4 and 5) doesn't support more than 64 parameters, and besides you have to use a large step, so this monster (as in the tests: 80-100-100-10, 19210 parameters to optimize!) cannot be trained by standard means; you have to use custom optimization algorithms. By the way, for this (and not only for this) I decided to make a paid tool (it will be in the shop, of course) that will let custom optimizers in both 4 and 5 train an unlimited number of parameters, even with a step of 0.
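
The figure of 19210 parameters can be checked by counting one weight per connection plus one bias per neuron. That layout is an assumption, but it reproduces the number quoted above exactly; `count_parameters` is an illustrative name.

```python
def count_parameters(layer_sizes):
    """Weights plus one bias per neuron for a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 81*100 + 101*100 + 101*10 = 8100 + 10100 + 1010
print(count_parameters([80, 100, 100, 10]))  # 19210
```

Against the optimizer's limit of 64 parameters, that is roughly 300 times too many, hence the need for a custom optimization algorithm.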

Files:
tests_mlp.zip  71 kb
 

It's convincing, joo, very convincing even. But there are a couple of points.

First, even the best result with the MS compiler (option 3) is less than 4 times better than option 1.

Second, I don't believe that the Intel all-inclusive compiler is better than MS by more than a factor of 2. So you've enabled more optimizations with Intel.

In any case - impressive. Waiting for the table to be filled to the end.

Now I know what these arrays of video cards are for: to do these calculations on every tick!

 
Mathemat:

Second, I don't believe that Intel's all-inclusive compiler is more than twice as good as MS'. So, Intel has more optimizations enabled.

Whichever optimizations are there, all of them are on. There's no sense in using a compiler without using all its features: fish look for the deepest water, and a programmer looks for the best compiler. In this sense, the compilers from MQ are not configurable, so we can assume that they are optimally tuned.
 

And one fact puzzled me, Achtung! - the MQL4 script that calls the dll runs faster than the MQL5 script with the same dll... What does that mean? I checked it and ran it several times, no errors. The developers seem to have said that dll calls in MT5 are optimized compared to MT4. Either it is down to peculiarities of the builds (MT5 574 and MT4 409) and the tests are not quite correct, or... I don't know.



Thank you, this is very clear and illustrative.

I guess we shouldn't fuss so much over individual percentage points, because the CPU speed (whether you measure it in megahertz, MHz, or gigahertz, GHz) of a hung computer... is zero.

When the speed difference is about 10-20%, it makes more sense to worry about the reliability of the program and its environment, and about error handling. For example, when using a DLL built with MSC, you should pay attention to how it is linked with MSVCRT.DLL and to which version, because it has to work inside the complicated terminal.exe process and within the MSVCRT.DLL initialization block, which can differ from version to version, and so on. For normal error handling in a DLL, MetaTrader builds (and monitors) a chain of exceptions, which in itself slows down the whole system and the DLL calls, and so on.

On the subject of neural network speed:

Here someone has ported the FANN neural network library to OpenCL and reports a 20x speedup on an average GTX 285 card:

"On my current GPU (GeForce 9500 GT), I'm getting roughly the same speed between the normal and OpenCL versions. I currently have a GTX 285 on order, and it should be at least 10x faster. With a modern GPU, such as the GTX 480, I expect it to be at least 20x faster than my 2.26GHz Nehalem Mac Pro. "

...

"Yep, the new card (GTX 285) runs the kernel about 20x faster."

http://leenissen.dk/fann/forum/viewtopic.php?f=2&t=658&start=0

http://leenissen.dk/fann/wp/

 

Almost everyone has been there of course.

For die-hard MT4 fans who don't visit mql5.com: OpenCL: Internal Implementation Tests in MQL5

Almost there, however.

I suspect it will be very attractive for autotraders using other platforms.
