Discussion of article "OpenCL: From Naive Towards More Insightful Programming"

Sceptic Philozoff 2012.06.06 14:06 #11

denkir: You could add for example some statistical calculation.

I'm already planning to; most likely it will simply be collecting statistics over a long history. It will have nothing to do with moving averages, though. I hate moving averages, and I have every right to :)

Synchronise quotes.

This is a task for the CPU rather than the GPU. Not every task is suitable for the GPU, is it? It's better to choose one that is, because that's what the GPU is for.

This will be a two-dimensional array.

I expect the future article to use a three-dimensional one. At the same time, it will become clear how to work with a GPU buffer that holds an array of any dimension.
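One common way to keep an array of any dimension in a flat buffer is row-major flattening, the same scheme as the `in1[ r * COLSROWS + col ]` indexing quoted later in this thread. A minimal sketch in plain C follows; `flat_index` is a hypothetical helper name, not something from the article or from the MQL5 API:

```c
#include <stddef.h>

/* Row-major offset of the multi-index idx[0..ndim-1] inside an array whose
   extents are dims[0..ndim-1]. The last index varies fastest.
   flat_index is a hypothetical helper introduced for illustration only. */
size_t flat_index(const size_t *dims, const size_t *idx, size_t ndim)
{
    size_t offset = 0;
    for (size_t d = 0; d < ndim; d++)
        offset = offset * dims[d] + idx[d];   /* Horner-style accumulation */
    return offset;
}
```

For two dimensions this reduces to `row * cols + col`; for three dimensions it gives `(i * dim1 + j) * dim2 + k`, so the same one-dimensional buffer can hold either shape.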


Denis Kirichenko 2012.06.06 14:08 #12

Mathemat:

I'm already planning to; most likely it will simply be collecting statistics over a long history....

I expect the future article to use a three-dimensional one. At the same time it will become clear how to write an array of any dimensionality to the GPU buffer and how to work with it.

Great! Let's wait...

Sceptic Philozoff 2012.06.08 16:49 #13

denkir: Take the history of quotes for all instruments in the terminal. Let's say one minute. Synchronise the quotes.

Do they need to be synchronised as well?

P.S. I read the help and see that it is necessary.

Sceptic Philozoff 2013.06.27 01:45 #14

I recalculated the results of the article taking into account the hardware upgrade.

A year ago: Intel Pentium G840 CPU (2 cores @ 2.8 GHz) + AMD HD4870 video card.

Recently: Intel Xeon E3-1230v2 CPU (4 cores/8 threads @ 3.3 GHz; slightly behind the i7-3770) + AMD HD6870 video card.

The results of the OpenCL calculations are shown on the graph (the horizontal axis is the number of the optimisation applied in the article):

Computation time in seconds is shown vertically. The execution time of the script run "in one go on the CPU" (on a single core, without OpenCL) varied with the algorithm changes within 95 ± 25 s.

As we can see, the overall picture is clear, although not everything in it is obvious.

The obvious outsider is the dual-core G840, as expected. Further optimisations did not change the execution time much: it stayed between 4 and 5.5 seconds. Note that even in this case the speedup of the script reached values of more than 20 times.
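As a rough check of the "more than 20 times" figure, taking the numbers quoted above at face value (about 95 s on a single CPU core without OpenCL, and 4 to 5.5 s with OpenCL), the speedup S works out as:

```latex
S = \frac{T_{\text{CPU}}}{T_{\text{OpenCL}}}, \qquad
\frac{95}{5.5} \approx 17.3 \;\le\; S \;\le\; \frac{95}{4} \approx 23.8
```

The ± 25 s spread of the single-core time widens this range further, so the estimate is only indicative.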

In the contest between two video cards of different generations, the ancient HD4870 and the more modern HD6870, the 6870 can almost be considered the winner. The exception is the last optimisation stage, where the ancient monster 4870 still snatched a nominal victory (although it lagged behind almost all the rest of the time). The reasons, frankly speaking, are unclear: it has fewer shaders and a lower clock frequency, yet it still won.

Let's assume that these are the vagaries of how video card generations develop. Or an error in my algorithm :)

I was frankly pleased with the Xeon, which beat the ancient 4870 at every optimisation stage, fought the 6870 on almost equal terms, and in the end even managed to beat them both. I'm not saying that it will always be like that on any task. But the task was computationally quite heavy: after all, it was the multiplication of two 2000 x 2000 matrices!
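For a sense of scale, assuming the straightforward row-by-column algorithm (one multiply and one add per inner-loop step), the product of two n x n matrices with n = 2000 costs roughly:

```latex
n^{3} = 2000^{3} = 8 \times 10^{9} \ \text{multiply-add steps}
\;\approx\; 1.6 \times 10^{10} \ \text{floating-point operations}
```

At the quoted single-core time of about 95 s this corresponds to roughly 1.7 x 10^8 operations per second, and at 4 s to about 4 x 10^9. These are back-of-the-envelope figures derived from the timings in this post, not separate measurements.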

The conclusion is simple: if you already have a decent CPU such as an i7, and your OpenCL computations are not too long, then maybe you don't need an additional powerful heater (video card). On the other hand, loading the CPU at 100% for tens of seconds (during long calculations) is not very pleasant, because the computer "loses responsiveness" for that time.


Winsor Hoang 2015.01.15 04:54 #15

Hi there,

Can you provide an example of how OpenCL can speed up EA backtesting in EveryTick mode? Currently, it takes me 18 minutes to run 14 years of data in EveryTick mode. I believe that lots of traders will be interested if OpenCL can reduce the test time by 50%.


Lorentzos Roussos 2023.04.26 13:49 #16

Great article. I did not understand how moving the transfer of the row from inside the loop out into private memory speeds it up.

The same transfer operation occurs anyway. I got zero impact on my code, but of course each case differs. (Then again, zero impact after adding something means that something did speed up, just with no net gain.)

This part in the OpenCL source:

      "  REALTYPE rowbuf[ COLSROWS ];                                               \r\n"
      "  for( int col = 0; col < COLSROWS; col ++ )                                 \r\n"
      "     rowbuf[ col ] = in1[ r * COLSROWS + col ];                              \r\n"

This is where you move it outside of the crunch loop.

Thank you
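One possible reading of the question above, sketched in plain C rather than OpenCL. It assumes, as the `r` index in the excerpt suggests, that each work-item computes one whole output row; the function names, the `float` type and the small `COLSROWS` value are assumptions made for illustration. In the direct version the row of `in1` is fetched again for every output column; in the buffered version it is fetched once and then reused.

```c
#include <stddef.h>

#define COLSROWS 4            /* assumption: small size for illustration; the article used 2000 */
typedef float REALTYPE;       /* assumption: the article's REALTYPE may be float or double */

/* Direct version: the row of in1 is re-read from the input buffer for every
   output column, i.e. COLSROWS * COLSROWS reads of in1 per output row. */
void matmul_direct(const REALTYPE *in1, const REALTYPE *in2, REALTYPE *out)
{
    for (int r = 0; r < COLSROWS; r++)
        for (int c = 0; c < COLSROWS; c++) {
            REALTYPE sum = 0;
            for (int k = 0; k < COLSROWS; k++)
                sum += in1[r * COLSROWS + k] * in2[k * COLSROWS + c];
            out[r * COLSROWS + c] = sum;
        }
}

/* Buffered version: the row is copied once into rowbuf (private memory in the
   OpenCL kernel) and reused for all columns, i.e. COLSROWS reads of in1 per row. */
void matmul_rowbuf(const REALTYPE *in1, const REALTYPE *in2, REALTYPE *out)
{
    for (int r = 0; r < COLSROWS; r++) {
        REALTYPE rowbuf[COLSROWS];
        for (int col = 0; col < COLSROWS; col++)
            rowbuf[col] = in1[r * COLSROWS + col];
        for (int c = 0; c < COLSROWS; c++) {
            REALTYPE sum = 0;
            for (int k = 0; k < COLSROWS; k++)
                sum += rowbuf[k] * in2[k * COLSROWS + c];
            out[r * COLSROWS + c] = sum;
        }
    }
}
```

Both functions produce the same result; only the number of reads from `in1` differs. Whether that shows up as a speedup depends on the device: on a CPU, or on a GPU that already caches global reads or spills a large private array back to slower memory, the gain may well be zero, which would be consistent with "each case differs".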


Juan Calvo 2024.02.18 19:21 #17

Thanks for the article, it has been a great learning experience. Looking forward to putting it into practice.
