OpenCL: internal implementation tests in MQL5 - page 60

 
Mathemat:

Yeah, MD, you're not doing so well on a bare CPU. Intel's all over the place, though...

Oh, come on, you've got a great graphics card.

I'm afraid to touch it. I uninstalled it once and found myself in almost total darkness: Windows hooked it up with some inappropriate driver. I had to reinstall the AMD APP SDK in 1% visibility conditions. It was fun... :)) Hitting all the buttons for the hundredth time...
 
MetaDriver: Yeah. I'm scratching my head. But maybe the driver's crooked. I'm afraid to touch it.
Funny thing: even the native drivers are crooked, both Intel's and AMD's. Let's hope these are just growing pains.
 

Folks, could you advise whether this OpenCL on a pure CPU gives such acceleration only in MQL5 or in other languages as well?

Have you done the following comparison: MQL5 + OpenCL VS C++ + Full Compiler Optimization?

I suspect that it's not OpenCL that is so cool but the lack of optimizations in MQL5 itself.

P.S. I've got an i7 2700K; I'll try (when I get around to it) to test it on a bare CPU.

 
Mathemat:
Funny thing: even the native drivers are crooked, both Intel's and AMD's. Let's hope these are just growing pains.
Well, I hope that in a couple of years this will stop happening and everything will work fine. For the time being I have other problems: learning how to program this stuff properly... :)
 
hrenfx:

Folks, could you advise whether this OpenCL on a pure CPU gives such acceleration only in MQL5 or in other languages as well?

Have you done the following comparison: MQL5 + OpenCL VS C++ + Full Compiler Optimization?

I suspect that it's not OpenCL that is so cool but the lack of optimizations in MQL5 itself.

P.S. I've got an i7 2700K; I'll try (when I get around to it) to test it on a bare CPU.

Nah, we didn't.

Go for it! Good stuff.

Basically, porting the code to C++ takes ten minutes at most (the part that computes on the bare CPU, that is). As for using OpenCL from C++, I have only seen it in a primer and never tried it myself. Actually, I wish I had.

 
hrenfx: I have a suspicion that it's not OpenCL that is cool but lack of optimizations in MQL5 itself.

Yes, that is logical. I recently had a discussion on the ixbt forum with a CPU analyst who shares the same opinion. The discussion started here (my nickname is tamehtaM). Please don't kick me for my incompetence. But I have the impression that Felid is also overdoing it: too often he talks about the IGP, even when the IGP is either absent or too weak.

My first post in this thread was written when I still did not know how to install the Intel OpenCL runtime properly. In fact, I'm still not sure that it is installed correctly. But it is already about three times faster than with the AMD APP SDK.

The acceleration figures are implausibly large, that's for sure. And they should obviously come down once the optimizations appear.

Interestingly enough, even without these optimizations the five (MQL5) is faster than the four (MQL4).

And the acceleration itself is not the main thing. The main thing is the absolute runtime figures. They will definitely not get worse. To be more precise, they should not get any worse.

 
Mathemat:

That is obviously a discrete card, not a CPU: such speedups are hardly achievable in emulation. And the number of devices is already 5, which is really creepy.

If you don't mind, please run this slightly modified code and post the result here. In it, the calculations for the various OpenCL devices are put into a loop (they should be fast), while the calculation on x86, the longest one, is executed only once. It will take a while, but you only have to run the script once.
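To make the structure of that test clearer, here is a rough sketch in Python rather than MQL5. It is not the ParallelTester script itself: `time_ms`, `run_benchmark` and the dummy workloads are hypothetical names invented for illustration, and plain Python callables stand in for the OpenCL devices.

```python
import time

def time_ms(fn):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0

def run_benchmark(device_runs, cpu_run):
    """device_runs: {device number: callable}; cpu_run: the slow baseline callable."""
    device_results, device_times = {}, {}
    # The fast part: one timed run per device, inside a loop.
    for number, run in device_runs.items():
        device_results[number], device_times[number] = time_ms(run)
    # The slow part: the plain CPU baseline, executed only once.
    cpu_result, cpu_time = time_ms(cpu_run)
    # Speed-up of each device relative to the single baseline run.
    ratios = {n: cpu_time / t for n, t in device_times.items() if t > 0}
    return cpu_result, device_results, cpu_time, device_times, ratios

# Dummy workloads standing in for the real calculations (assumption).
fast = lambda: sum(range(10_000))
slow = lambda: sum(i for i in range(10_000))

cpu_result, device_results, cpu_time, device_times, ratios = run_benchmark(
    {0: fast, 1: fast}, slow)
```

Every device gets its own timing, and all of them are compared with the same single baseline run, so the expensive part is paid for only once.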

I realise that I am already boring you. But in any case it will be good info for the Support Team.

Here is the result.

2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) CpuTime/GpuTime = 439.0727802037846
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Cpu MachResult==1.41575 at 7544 pass
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) CPU time = 301643 ms
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 687 ms
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 4
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 234 ms
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 3
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 234 ms
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 2
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 234 ms
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 1
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 234 ms
2012.04.23 21:37:54 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 0
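The terminal prints the newest line first, so the log reads bottom-up: each "OpenCL init OK! Device number = N" line is followed (one line higher) by that device's "GPU time". A small Python sketch for pulling the per-device times out of such a log could look like this. The function name `parse_times` and the shortened sample lines are my own, not part of the script, and the patterns assume the messages are worded exactly as above.

```python
import re

def parse_times(log_lines):
    """Return ({device number: GPU ms}, CPU ms) from newest-first log lines."""
    device_times = {}
    cpu_ms = None
    current = None
    for line in reversed(log_lines):  # walk from oldest to newest
        m = re.search(r"Device number = (\d+)", line)
        if m:
            current = int(m.group(1))  # later GPU time belongs to this device
            continue
        m = re.search(r"GPU time = (\d+) ms", line)
        if m and current is not None:
            device_times[current] = int(m.group(1))
            continue
        m = re.search(r"CPU time = (\d+) ms", line)
        if m:
            cpu_ms = int(m.group(1))
    return device_times, cpu_ms

# Shortened sample in the same newest-first order as the log above.
sample = [
    "CPU time = 301643 ms",
    "GPU time = 687 ms",
    "OpenCL init OK! Device number = 4",
    "GPU time = 234 ms",
    "OpenCL init OK! Device number = 3",
]
times, cpu_ms = parse_times(sample)
print(times, cpu_ms)
```

Fed with the full log, it should give 234 ms for devices 0 to 3 and 687 ms for device 4.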

 
casinonsk:

Here is the result.

2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) CpuTime/GpuTime = 439.0727802037846
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Cpu MachResult==1.41575 at 7544 pass
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:42:58 ParallelTester_00-01x_cycle (EURUSD,H1) CPU time = 301643 ms
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) Result on Gpu MachResult==1.41575 at 7544 pass
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) Count indicators = 16; Count history bars = 144000; Count pass = 12800
2012.04.23 21:37:56 ParallelTester_00-01x_cycle (EURUSD,H1) GPU time = 687 ms
2012.04.23 21:37:55 ParallelTester_00-01x_cycle (EURUSD,H1) OpenCL init OK! Device number = 4

So it is clear now: device = 4 is the bare CPU. And GPU time = 687 ms is quite plausible. It worked fine for you.

I just don't understand why the CPU time is so long. I get about 235000 ms at a much lower core frequency (2.8 GHz).

Either you are short of memory, or the memory is very slow, or the CPU is permanently loaded by some terribly resource-intensive task. It's not clear. The time ratio is unnaturally high.
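To put numbers on that, here is the arithmetic as a short Python snippet. The first two values are taken from the log; 235000 ms is my own approximate figure from a different machine, so the second ratio is only a what-if, not a measurement.

```python
reported_cpu_ms = 301643   # "CPU time" from the log
reported_gpu_ms = 687      # "GPU time" for device 4 from the log
mathemat_cpu_ms = 235000   # approximate figure from another machine (assumption)

# The ratio the script printed as CpuTime/GpuTime.
ratio = reported_cpu_ms / reported_gpu_ms
# What-if: the same device time against the faster CPU baseline.
hypothetical = mathemat_cpu_ms / reported_gpu_ms
# How much slower the reported CPU run is than the 235000 ms one.
slowdown = reported_cpu_ms / mathemat_cpu_ms

print(f"{ratio:.2f} {hypothetical:.2f} {slowdown:.2f}")
```

So the reported x86 run is roughly 28% slower than mine, and that alone moves the ratio from about 342 to about 439.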

 
Mathemat:

So it is clear now: device = 4 is the bare CPU. And GPU time = 687 ms is quite plausible. It worked fine for you.

I just don't understand why the CPU time is so long. I get about 235000 ms at a much lower core frequency (2.8 GHz).

Either you are short of memory, or the memory is very slow, or the CPU is permanently loaded by some terribly resource-intensive task. It's not clear. The time ratio is unnaturally high.

An Expert Advisor optimization is running, so the cores are a bit loaded.

16 GB of memory.

 
casinonsk: An Expert Advisor optimization is running, so the cores are a bit loaded.

Then everything is clear. Try running the same script once the optimization is over. The optimization probably loads all the cores.

It would be very interesting to see how a bare top-end CPU performs when nothing else is in the way.
