MetaTrader 5 Python User Group - how to use Python in MetaTrader - page 25

 
fxsaber:

How did you debug the OpenCL part of the code?

By running it and printing the results.
 
Fast235:

Renat, could you please give some pointers on when OpenCL would have an advantage: when working with a large number of arrays, or when there is a large set of symbols and indicators in one EA?

If you can build millions of arrays that have no cross-links and can be analysed in parallel, you can speed things up.

In general, anyone who works on their tasks and understands their limits will easily answer this question for themselves. As long as there is no understanding of the limits (only dreams of "what if") and the tasks being solved keep getting changed, parallelism remains just a cool idea.

Most tasks cannot be effectively parallelised, unfortunately.
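
To make the "no cross-links" condition concrete, here is a minimal Python sketch. It is not from the thread, and the function names and sample data are made up for illustration. The per-array mean depends only on its own array, so the arrays can be handed to workers in any order. The exponential moving average needs the previous output to compute the next one, so a single series cannot be split up in the same naive way.

```python
from concurrent.futures import ThreadPoolExecutor

def array_mean(values):
    """Independent task: depends only on its own array."""
    return sum(values) / len(values)

def ema(values, alpha=0.5):
    """Sequential task: each output depends on the previous output."""
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

# Hypothetical data: many small arrays with no links between them.
arrays = [[float(i + j) for j in range(4)] for i in range(1000)]

# The arrays can be processed in any order, by any number of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    means = list(pool.map(array_mean, arrays))

# The recurrence below has to run step by step.
smoothed = ema([1.0, 2.0, 3.0, 4.0])
```

The thread pool here only shows that the work can be distributed freely. In CPython, threads do not speed up pure-Python arithmetic, so a real speed-up would need processes, a JIT such as numba, or OpenCL.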

 
Renat Fatkhullin:

If you can build millions of arrays that have no cross-links and can be analysed in parallel, you can speed things up.

In general, anyone who works on their tasks and understands their limits will easily answer this question for themselves. As long as there is no understanding of the limits (only dreams of "what if") and the tasks being solved keep getting changed, parallelism remains just a cool idea.

Most tasks cannot be effectively parallelised, unfortunately.

I keep in mind that calculations come with side costs of their own, which is why the question came up.

 
Renat Fatkhullin:
By running it and printing the results.

Is it possible to put a print inside the OpenCL code?

 

Roughly speaking about OpenCL

You can imagine a processor with large (vector) registers, each of which can hold N separate double values (for example, with N=64, a register holds 64 values).
Such registers can be added, multiplied and so on with each other, so a single instruction performs the operation on N doubles.

But there are limitations.

You cannot operate on values within a single register; for example, you cannot add some of the values in a register to one another.
Such a processor runs at a much lower clock frequency than an ordinary CPU, so it makes no sense to use it for tasks that require sequential processing of a single value.
Besides the lower frequency, there is a memory limitation: values can only be loaded from and stored to a special memory.
Data can be copied from RAM into this special memory, but only through a very narrow (slow) channel.
Therefore, tasks that require processing large amounts of data are also poorly suited to OpenCL.
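
The trade-off described above can be put into a rough back-of-the-envelope model. The Python sketch below is an illustration only: the function names and every number passed to them are hypothetical, not measurements of any real device.

```python
def offload_time(n_bytes, bandwidth_bytes_per_s, device_compute_s):
    """Total time for a device run: copying over the channel plus compute."""
    transfer_s = n_bytes / bandwidth_bytes_per_s
    return transfer_s + device_compute_s

def worth_offloading(cpu_compute_s, n_bytes, bandwidth_bytes_per_s, device_compute_s):
    """True when the device is faster even after paying for the transfer."""
    return offload_time(n_bytes, bandwidth_bytes_per_s, device_compute_s) < cpu_compute_s

# Hypothetical numbers, chosen only to show the two regimes.
# Little data, heavy arithmetic: the transfer is negligible.
heavy_math = worth_offloading(cpu_compute_s=4.0, n_bytes=1e6,
                              bandwidth_bytes_per_s=1e9, device_compute_s=0.1)

# Lots of data, light arithmetic: the transfer dominates.
heavy_data = worth_offloading(cpu_compute_s=0.5, n_bytes=4e9,
                              bandwidth_bytes_per_s=1e9, device_compute_s=0.05)
```

With these made-up inputs, the first case favours the device and the second does not, because copying the data alone takes longer than the whole CPU run.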

 
fxsaber:

Is it possible to put a print inside the OpenCL code?

Outside.

Let's skip the lecture session. Just go and read everything that we, and the web in general, have written in detail about OpenCL.

OpenCL search:




Interestingly, a Google search for "opencl trading" turns up a lot of material from our resources:


 
Renat Fatkhullin:

Here is a comparison of Python 3.8 and MQL5 in single-thread and OpenCL modes (time in seconds, lower is better):

Script                 Time, s
pi-single.py           4.1743
pi-multi.py            0.2101
Speed PI.mq5 single    4.1836
Speed PI.mq5 OpenCL    0.1025

Python ran in JIT mode via numba. The hardware:

  • Windows 10 x64, Intel Xeon E5-2690 v3 @ 2.60GHz
  • GeForce RTX 2080

The OpenCL example is very simple, with no special optimization. Although the task is not a large one for OpenCL and incurs setup overhead, it still gave a much better result.

Very large parallel calculations can be run routinely with OpenCL. The entry threshold is not high, and it takes only a day to figure out how to use it.

Files for reproducing the test are attached.

It turns out that this problem of calculating PI in OpenCL was already solved 7 years ago:

OpenCL: The Bridge to Parallel Worlds
  • www.mql5.com
This article opens a short series of publications on programming in OpenCL, or Open Computing Language. Before OpenCL support was added, the MetaTrader 5 platform in its current form did not allow the advantages of multi-core processors to be used directly, i.e. natively, to speed up computations. In the "Articles" section, though, a year and a half...
 
I can't figure it out yet.
Single thread: the value of PI is 3.141592653590
Single thread: calculated in 7.382561 seconds
OpenCL not found. Error code=5100
OpenCL initialization failed with 5100

Does anyone have any direct links to what needs to be downloaded? On Intel it requires registration.


P.S. For the Intel GPU built into the CPU, I had to uninstall the video adapter drivers, then install Intel_OpenCL_driver, then reinstall the video adapter drivers. This way everything works and video doesn't slow down.

 

Vict:

I'm afraid that active use of this feature will turn startup into a multi-minute quest.

It won't; you can enable caching.

Renat Fatkhullin:

Here is a comparison of Python 3.8 and MQL5 in single-thread and OpenCL modes (time in seconds, lower is better):

Script                 Time, s
pi-single.py           4.1743
pi-multi.py            0.2101
Speed PI.mq5 single    4.1836
Speed PI.mq5 OpenCL    0.1025

Python ran in JIT mode via numba. The hardware:

  • Windows 10 x64, Intel Xeon E5-2690 v3 @ 2.60GHz
  • GeForce RTX 2080

In multithreaded mode you are comparing CPU performance with GPU performance in 10000 threads. The GPU is not used in the Python code.

If I find a computer with a suitable video card, I will fix the Python code and test it there. I will try to run your code on the CPU; I thought a GPU was required.

I have no intention of proving who is faster. I am more interested in your plans for integrating Python.

Are you planning any trading functions and tick events in Python?


Still, a GPU seems to be mandatory: the "AMD APP SDK" cannot be downloaded.
MQL5 Documentation: Trading Functions
  • www.mql5.com
Before studying the platform's trading functions, you need a clear understanding of the basic terms: order, deal and position. An order is an instruction to a brokerage company to buy or sell a financial instrument. There are two main types of orders: market and pending. In addition, there are special Take...
 
Ilyas:

Roughly speaking about OpenCL

You can imagine a processor with large (vector) registers, each of which can hold N separate double values (for example, with N=64, a register holds 64 values).
Such registers can be added, multiplied and so on with each other, so a single instruction performs the operation on N doubles.

But there are limitations.

You cannot operate on values within a single register; for example, you cannot add some of the values in a register to one another.
Such a processor runs at a much lower clock frequency than an ordinary CPU, so it makes no sense to use it for tasks that require sequential processing of a single value.
Besides the lower frequency, there is a memory limitation: values can only be loaded from and stored to a special memory.
Data can be copied from RAM into this special memory, but only through a very narrow (slow) channel.
Therefore, tasks that require processing large amounts of data are also poorly suited to OpenCL.

There are no vectors in the PI calculation example. It simply divides the total sum into several independent chunks and sends one to each OpenCL core. Everything is added up at the end.

For example, if there is no discrete video card and the CPU has 4 physical + 4 virtual cores, execution will be eight times faster. That is, the chunks of the sum will be computed on the cores in parallel.
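
The chunking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes PI is computed by midpoint-rule integration of 4/(1+x^2) on [0, 1], a common setup for this benchmark; the attached files may do it differently. The names `chunk_sum` and `pi_chunked` and the default sizes are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(start, stop, n_steps):
    """Partial midpoint-rule sum of 4/(1+x^2) over steps [start, stop)."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h

def pi_chunked(n_steps=200_000, n_chunks=8):
    """Split the sum into independent chunks and add up the partial results."""
    # Chunk boundaries: chunk k covers steps bounds[k] .. bounds[k+1]-1.
    bounds = [n_steps * k // n_chunks for k in range(n_chunks + 1)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(chunk_sum, bounds[:-1], bounds[1:], [n_steps] * n_chunks)
        return sum(parts)

pi_estimate = pi_chunked()
```

No chunk reads anything produced by another chunk, which is what lets each one run on its own core. The thread pool here only shows the decomposition; CPython threads will not speed up this loop, so real gains need numba, processes, or an OpenCL kernel as in the MQL5 test.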
