Discussion of article "Using OpenCL to test candlestick patterns" - page 2

 
Aleksey Vyazmikin:

And another question: how long does it take to transfer/prepare the data for the video card? I have heard the opinion that OpenCL on a video card makes no sense for real trading because of the time lost transferring data for processing. Is that really so?

Figure 8 shows the time intervals between checkpoints in the program code, as printed to the terminal console. I will duplicate it here:

[Figure 8: measured times]

Where:

  • OpenCL init - initialisation of the OpenCL device and creation of kernels from resources
  • OpenCL buffering - copying the OHLC buffers to video card memory (six months of history, period M1)
  • OpenCL total execution - execution (running several kernels, reading back the results, etc.)
  • OpenCL test and OpenCL prepare orders - the execution stages that make up OpenCL total execution

Loading the data into the array ("Buffering" in the figure) took 26.6 ms (perhaps I had to load or synchronise something), while loading the same data into video card memory took 8.7 ms.

Yes, if you shuttle a large amount of data back and forth many times, you can lose a lot of time. Therefore, the algorithm needs to be built so that copying to and from GPU memory is minimised. In the near future I want to modify the code for testing on tick history. The volumes there will be larger; it will be interesting to see the copying time.

So far, the most expensive step is initialisation: it takes 316 ms. But it can be done once if we keep using the same kernels from then on.
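
For reference, a minimal sketch of how such checkpoints can be measured with MQL5's built-in OpenCL functions and GetMicrosecondCount(); the kernel source and data volume are placeholders, not the article's actual code:

#define KERNEL_SRC "__kernel void dummy(__global float *d){ d[get_global_id(0)]*=2.0f; }"

void OnStart()
  {
   ulong t0=GetMicrosecondCount();
   int ctx=CLContextCreate(CL_USE_GPU_ONLY);     // OpenCL init: device + kernels
   int prg=CLProgramCreate(ctx,KERNEL_SRC);
   int krn=CLKernelCreate(prg,"dummy");
   ulong t1=GetMicrosecondCount();
   PrintFormat("OpenCL init: %.1f ms",(t1-t0)/1000.0);

   float data[];
   ArrayResize(data,1000000);                    // placeholder volume
   ArrayInitialize(data,1.0f);
   int buf=CLBufferCreate(ctx,1000000*sizeof(float),CL_MEM_READ_WRITE);
   CLBufferWrite(buf,data);                      // buffering: host -> GPU copy
   ulong t2=GetMicrosecondCount();
   PrintFormat("OpenCL buffering: %.1f ms",(t2-t1)/1000.0);

   uint offs[1]={0},works[1]={1000000};
   CLSetKernelArgMem(krn,0,buf);
   CLExecute(krn,1,offs,works);                  // run the kernel
   CLBufferRead(buf,data);                       // read the results back
   ulong t3=GetMicrosecondCount();
   PrintFormat("OpenCL total execution: %.1f ms",(t3-t2)/1000.0);

   CLBufferFree(buf);
   CLKernelFree(krn);
   CLProgramFree(prg);
   CLContextFree(ctx);
  }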


 
Serhii Shevchuk:

Figure 8 shows the time intervals between checkpoints in the program code, as printed to the terminal console. I will duplicate it here:

Where:

  • OpenCL init - initialisation of the OpenCL device and creation of kernels from resources
  • OpenCL buffering - copying the OHLC buffers to video card memory (six months of history, period M1)
  • OpenCL total execution - execution (running several kernels, reading back the results, etc.)
  • OpenCL test and OpenCL prepare orders - the execution stages that make up OpenCL total execution

Loading the data into the array ("Buffering" in the figure) took 26.6 ms (perhaps I had to load or synchronise something), while loading the same data into video card memory took 8.7 ms.

Yes, if you shuttle a large amount of data back and forth many times, you can lose a lot of time. Therefore, the algorithm needs to be built so that copying to and from GPU memory is minimised. In the near future I want to modify the code for testing on tick history. The volumes there will be larger; it will be interesting to see the copying time.

So far, the most expensive step is initialisation: it takes 316 ms. But it can be done once if we keep using the same kernels.


Thank you, very informative! However, how does this compare to a processor?

In any case, it can be useful for many strategies that work at bar opening, if complex calculations are needed, for example, automatic adaptation to the market, both at initialisation and once a day.

 
Aleksey Vyazmikin:

Thank you, very informative! However, how does this compare to a processor?

Anyway, it can be useful for many strategies that work at bar opening, if complex calculations are needed, for example, automatic adaptation to the market, both at initialisation and once a day.

There are performance comparison tables at the end of the article. The EA is optimised in the strategy tester, and then the results obtained and the time spent are compared with what the OpenCL tester produced. See point 3, "Performance Comparison".

At a testing depth of 9 months in the "OHLC on M1" mode, the OpenCL tester needs at most 1 second to optimise two parameters of 100 steps each (10,000 passes).

Such optimisation can be performed at least every minute on 60 pairs, which can already be called automatic adaptation to the market, if I understand correctly what you mean.

 
Serhii Shevchuk:

There are performance comparison tables at the end of the article. The EA is optimised in the strategy tester, and then the results obtained and the time spent are compared with what the OpenCL tester produced. See point 3, "Performance Comparison".

At a testing depth of 9 months in the "OHLC on M1" mode, the OpenCL tester needs at most 1 second to optimise two parameters of 100 steps each (10,000 passes).

Such optimisation can be performed at least every minute on 60 pairs, which can already be called automatic adaptation to the market, if I understand correctly what you are talking about.

I have seen the tables; my question was about a single iteration (the result of one pass) for both the CPU and the GPU, since there is data preparation and transfer. In any case, this is obviously a useful topic; it's a pity that it is not simple enough for everyone to use.

 
Am I correct that if there is a GPU farm of multiple cards, computations will be performed on all cards?
 
kogriv:
Am I correct that if there is a GPU farm of multiple cards, the computation will be done on all cards?

MT5 supports only one GPU device, unfortunately.

 
Aleksey Vyazmikin:

MT5 only supports one GPU device, unfortunately.

Multiple devices can be used.

It is up to the developer to decide which devices to use and how to use them.
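
For illustration, a minimal sketch of enumerating the OpenCL devices visible to the terminal and creating a context on a chosen one, using MQL5's CLGetInfo* functions (error handling omitted):

void OnStart()
  {
   int total=(int)CLGetInfoInteger(0,CL_DEVICE_COUNT);   // number of OpenCL devices
   PrintFormat("OpenCL devices: %d",total);
   for(int i=0; i<total; i++)
     {
      int ctx=CLContextCreate(i);                        // context on device with ordinal i
      if(ctx==INVALID_HANDLE)
         continue;
      string name;
      CLGetInfoString(ctx,CL_DEVICE_NAME,name);
      PrintFormat("device %d: %s",i,name);
      CLContextFree(ctx);                                // keep only the context you need
     }
  }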

 
Renat Fatkhullin:

Multiple devices can be used.

The developer decides which devices to use and how to use them.

If I'm not mistaken, you wrote earlier that only one device can be used at a time (you can choose which one); has something changed?

We are not talking about agents; as I understand it, each of them can use its own device, but then again it is not clear how to bind an agent to a particular device...
 

Very interesting article!

Unfortunately, your solution does not offer 'genetic optimisation'.
But this could easily be added, even a version whose options and benefits for the user exceed those of MQL5. MQL5 offers only one choice: either genetic optimisation, or testing everything, which is slow.

All you have to do is:
1) create a table of the results with their parameter combinations,
2) sort them according to the results,
3) divide them into, for example, 5 or (a user-definable) n sections,
4) randomly select parameter combinations from each section,
5) change a value for each of these selected parameter combinations,
6) check whether the combination has already been tested (if so, return to 4),
7) test it,
8) start again at 1) unless an abort criterion has been reached.

Only one combination is selected from the worst section, two from the next best, then three, and so on.
Thus the density of optimisation is highest in the best section, and since all sections always remain about the same size relative to each other (1/5 or 1/n), the section boundaries shift, which further increases the density around the best results (see the sketch after the next paragraph).

You could also let the user determine how the number of tested combinations grows from section to section: one more each time (1, 2, 3, 4, 5 in the 5th, top section), or two more each time (1, 3, 5, 7, 9), or even more, to further increase the density in the best section.
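
To make the idea concrete, here is a minimal sketch of one generation of this section-based selection in MQL5-style code; Pass, MutateOneValue(), AlreadyTested() and RunTest() are hypothetical placeholders for the tester's own structures, not anything from the article:

// Hypothetical record of one tester pass
struct Pass
  {
   double params[2];   // two optimised parameters, as an example
   double result;      // e.g. profit of the pass
  };

void MutateOneValue(Pass &p)      { p.params[MathRand()%2]+=0.1*(MathRand()%3-1); } // step 5 (placeholder)
bool AlreadyTested(const Pass &p) { return(false); }  // step 6: look up in a history table (placeholder)
void RunTest(Pass &p)             { p.result=0.0; }   // step 7: one tester pass on the GPU (placeholder)

// One generation over a table sorted ascending by result; 'sections' is n
void NextGeneration(Pass &table[],int sections)
  {
   int total=ArraySize(table);
   int per=total/sections;                       // each section is 1/n of the table
   for(int s=0; s<sections; s++)                 // s=0 is the worst section
     {
      int picks=s+1;                             // 1 from the worst, 2 from the next, ...
      for(int k=0; k<picks; k++)
        {
         Pass cand;
         int tries=0;
         do
           {
            int idx=s*per+MathRand()%per;        // step 4: random pick inside section s
            cand=table[idx];
            MutateOneValue(cand);                // step 5: change one value
            tries++;
           }
         while(AlreadyTested(cand) && tries<10); // step 6: avoid retesting combinations
         RunTest(cand);                          // step 7
        }
     }
   // step 8: the caller re-sorts the table and repeats until the abort criterion
  }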

Maybe you could introduce something very interesting and useful?
If you do a cluster analysis of the parameter combinations in each section of the table, you could select the combinations not (only) randomly but selectively:
a) midway between the best values of two clusters, to fathom the valley in between;
b) midway between two combinations of the same cluster that are as different as possible, to find a new best value, a new top.
This would probably improve the quality of the results of the genetic optimisation significantly. You would have to try it!
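
And a minimal sketch of these two selection rules, assuming cluster membership has already been computed (e.g. by k-means) and reusing the hypothetical Pass struct from the sketch above:

// (uses the Pass struct from the sketch above)

// Rule a): candidate midway between the best members of two clusters
Pass Midpoint(const Pass &a,const Pass &b)
  {
   Pass m;
   for(int i=0; i<2; i++)
      m.params[i]=0.5*(a.params[i]+b.params[i]);
   m.result=0.0;                       // to be filled by a test run
   return(m);
  }

// Rule b): candidate midway between the two most distant members of one
// cluster (members[] holds all passes of that cluster; assumes at least two)
Pass MidOfFarthestPair(const Pass &members[])
  {
   int n=ArraySize(members),ia=0,ib=1;
   double best=-1.0;
   for(int i=0; i<n; i++)
      for(int j=i+1; j<n; j++)
        {
         double d=0.0;
         for(int p=0; p<2; p++)
           {
            double diff=members[i].params[p]-members[j].params[p];
            d+=diff*diff;              // squared Euclidean distance in parameter space
           }
         if(d>best) { best=d; ia=i; ib=j; }
        }
   return(Midpoint(members[ia],members[ib]));
  }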

Cluster analysis



PS: In addition to my proposal, could you also develop a presentation of the results in the form of clusters?
The Strategy Tester always shows only the results over the whole range of a variable. It would be much more informative to see, for example, that there are three clusters, and for each cluster the respective best value (the bigger, the better), its dispersion (the bigger, the better) and the statistics of the individual parameters (max., min., mean, standard deviation; the bigger, the better). This would make it easier to see where the more robust parameter combinations are likely to be, and where the best values could be merely random.
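
A minimal sketch of the per-cluster statistics such a view would display, again over the hypothetical Pass records from the sketches above (one parameter shown):

// (uses the Pass struct from the sketch above)

// Print the best result plus max/min/mean/standard deviation of
// parameter 0 for one cluster of passes
void PrintClusterStats(const Pass &members[],int cluster_id)
  {
   int n=ArraySize(members);
   if(n==0)
      return;
   double mn=members[0].params[0],mx=mn,sum=0.0,sum2=0.0,best=members[0].result;
   for(int i=0; i<n; i++)
     {
      double v=members[i].params[0];
      mn=MathMin(mn,v);
      mx=MathMax(mx,v);
      sum+=v;
      sum2+=v*v;
      best=MathMax(best,members[i].result);
     }
   double mean=sum/n;
   double sd=MathSqrt(MathMax(sum2/n-mean*mean,0.0));  // population std. deviation
   PrintFormat("cluster %d: best=%.2f min=%.4f max=%.4f mean=%.4f sd=%.4f",
               cluster_id,best,mn,mx,mean,sd);
  }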



PPS: By chance I found this article: "Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs" that would fit my suggestion :)




 

Great article!!!

Congratulations, @decanium!
