Parallelism is just a cool idea, according to OpenCL (Part 2) - General

Renat Fatkhullin 2019.12.04 11:18 #241

fxsaber:

How did you debug the OpenCL part of the code?

By running it and printing the results.

Renat Fatkhullin 2019.12.04 11:22 #242

Fast235:

Renat, could you please provide some pointers on when OpenCL would have an advantage working with a large number of arrays or when there is a large set of symbols and indicators in one EA?

If you can build millions of arrays with no cross-dependencies and analyze them in parallel, you can speed things up.

In general, anyone who actually solves problems and understands the limits of their tasks can easily answer this question themselves. As long as there is no understanding of the limits of the tasks you are straining to solve day to day (only "what if" dreams), parallelism remains just a cool idea.

Most tasks cannot be effectively parallelised, unfortunately.

Fast235 2019.12.04 11:24 #243

Renat Fatkhullin:

If you can build millions of arrays with no cross-dependencies and analyze them in parallel, you can speed things up.

In general, anyone who actually solves problems and understands the limits of their tasks can easily answer this question themselves. As long as there is no understanding of the limits of the tasks you are straining to solve day to day (only "what if" dreams), parallelism remains just a cool idea.

Most tasks cannot be effectively parallelised, unfortunately.

I remember that the calculations come with side issues of their own, which is why the question came up.

fxsaber 2019.12.04 11:31 #244

Renat Fatkhullin:
By running it and printing the results.

Is it possible to put a Print inside the OpenCL code?

Ilyas 2019.12.04 11:40 #245

Roughly speaking about OpenCL

Imagine a processor with large (vector) registers, each of which holds N separate double values (for example, with N=64 a register holds 64 values).
Such registers can be added to each other, multiplied by each other, and so on, so a single instruction performs an operation on N doubles.

But there are limitations.

You cannot operate on the values within one register; for example, you cannot add some of the values in a register to each other.
In clock frequency such a processor is far behind an ordinary CPU, so it makes no sense to use it for tasks that require sequential processing of a single value.
Besides the lower frequency, there is also a memory limitation: values can only be loaded from and stored to a special memory.
Data can be copied from RAM into this special memory, but only through a very narrow (slow) channel.
Therefore, tasks that require processing large amounts of data are also poorly suited to OpenCL.
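
The register picture above can be mimicked with a toy model in plain Python. This is only a sketch of the mental model from the post, not of how OpenCL is actually programmed: the function names `lane_add`, `lane_mul` and `horizontal_sum` are made up for illustration, and N=64 is taken from the post's example.

```python
N = 64  # lane count from the post's example; real hardware widths vary

def lane_add(a, b):
    """Model of one 'instruction': add two registers lane by lane."""
    assert len(a) == len(b) == N
    return [x + y for x, y in zip(a, b)]

def lane_mul(a, b):
    """Model of one 'instruction': multiply two registers lane by lane."""
    assert len(a) == len(b) == N
    return [x * y for x, y in zip(a, b)]

def horizontal_sum(reg):
    """Adding up the lanes of ONE register is not a lane-wise operation.

    It is modelled here as N - 1 dependent additions done one after another,
    which is the kind of sequential work the post says this hardware is bad at.
    """
    total = reg[0]
    for value in reg[1:]:
        total += value
    return total

a = [float(i) for i in range(N)]  # register holding 0.0 .. 63.0
b = [2.0] * N                     # register holding 2.0 in every lane

c = lane_add(a, b)  # every lane computed independently of the others
d = lane_mul(a, b)
s = horizontal_sum(a)
```

In this model `lane_add` and `lane_mul` stand for the cheap case (one operation across all lanes), while `horizontal_sum` stands for the case the post describes as a limitation.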

Renat Fatkhullin 2019.12.04 12:08 #246

fxsaber:

Is it possible to put a Print inside the OpenCL code?

Outside.

Let's skip the lecture session. Just go and read everything that has been written and described in detail about OpenCL, both by us and on the web in general.

OpenCL search:

Interestingly, googling "opencl trading" returns a lot of material from our resources:

Renat Fatkhullin 2019.12.04 12:27 #247

Renat Fatkhullin:

Here is a comparison of Python 3.8 and MQL5 in single-thread and OpenCL modes (time in seconds, lower is better):

pi-single.py    pi-multi.py    Speed PI.mq5 single    Speed PI.mq5 OpenCL
4.1743          0.2101         4.1836                 0.1025

Python runs in JIT mode via numba; the hardware is:

Windows 10 x64, Intel Xeon E5-2690 v3 @ 2.60GHz
GeForce RTX 2080

The OpenCL example is very simple and has no optimization tricks. Although the task is not a large one for OpenCL and it incurred setup overhead, it still gave a much better result.

Very large parallel calculations can be routinely run with OpenCL. The entry threshold is not high, and it only takes a day to figure out how to use it.

Files for reproducing the results are attached.
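
For reference, the timings quoted in the table can be turned into speedup factors with a few lines of Python. The numbers are copied from the table; the dictionary keys and the helper name `speedup` are just labels chosen for this sketch.

```python
# Timings in seconds, copied from the comparison table in this post.
timings = {
    "pi-single.py": 4.1743,
    "pi-multi.py": 0.2101,
    "Speed PI.mq5 single": 4.1836,
    "Speed PI.mq5 OpenCL": 0.1025,
}

def speedup(baseline_seconds, parallel_seconds):
    """How many times faster the parallel run is than the baseline run."""
    return baseline_seconds / parallel_seconds

python_speedup = speedup(timings["pi-single.py"], timings["pi-multi.py"])
mql5_speedup = speedup(timings["Speed PI.mq5 single"], timings["Speed PI.mq5 OpenCL"])

print(f"Python (numba) multi vs single: {python_speedup:.1f}x")
print(f"MQL5 OpenCL vs single:          {mql5_speedup:.1f}x")
```

These ratios only compare each parallel run with its own single-threaded baseline on the hardware listed above; they say nothing about other machines or other tasks.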

It turns out that this problem of PI calculation in OpenCL was already solved 7 years ago:

OpenCL: The Bridge to Parallel Worlds

www.mql5.com

This article opens a short series of publications devoted to programming in OpenCL, or Open Computing Language. Before OpenCL support was added, the MetaTrader 5 platform in its current form did not allow the advantages of multi-core processors to be used directly, i.e. natively, to speed up computations. In the "Articles" section, admittedly, another one and a half...

fxsaber 2019.12.04 12:30 #248

I can't figure it out yet.

Single thread: the value of PI is 3.141592653590
Single thread: calculated in 7.382561 seconds
OpenCL not found. Error code=5100
OpenCL initialization failed with 5100

Does anyone have direct links to what needs to be downloaded? Intel's site requires registration.

P.S. For the Intel GPU integrated into the CPU, I had to remove the video adapter drivers, then install Intel_OpenCL_driver, then reinstall the video adapter drivers. This way everything works and the video doesn't slow down.

Lyuk 2019.12.04 14:50 #249

Vict:

I'm afraid that active use of this feature will turn startup into a multi-minute quest.

It won't: you can enable caching.

Renat Fatkhullin:

Here is a comparison of Python 3.8 and MQL5 in single-thread and OpenCL modes (time in seconds, lower is better):

pi-single.py    pi-multi.py    Speed PI.mq5 single    Speed PI.mq5 OpenCL
4.1743          0.2101         4.1836                 0.1025

Python runs in JIT mode via numba; the hardware is:

Windows 10 x64, Intel Xeon E5-2690 v3 @ 2.60GHz
GeForce RTX 2080

In the multithreaded mode you are comparing CPU and GPU performance with 10000 threads. The GPU is not used in the Python version.

If I find a computer with a suitable video card, I will fix the Python code and test it there. I will also try to run your code on the CPU; I thought a GPU was required.

I have no intention of proving who is faster. I am more interested in your plans for integrating Python.

Are you planning any trading functions and tick events in Python?

Still, a GPU seems to be mandatory: the "AMD APP SDK" cannot be downloaded.

MQL5 Documentation: Trade Functions

www.mql5.com

Before you start studying the trade functions of the platform, you need a clear understanding of the basic terms: order, deal and position. An order is an instruction to the brokerage company to buy or sell a financial instrument. There are two main types of orders: market and pending. Besides these, there are special Take...

fxsaber 2019.12.04 14:53 #250

Ilyas:

Roughly speaking about OpenCL

Imagine a processor with large (vector) registers, each of which holds N separate double values (for example, with N=64 a register holds 64 values).
Such registers can be added to each other, multiplied by each other, and so on, so a single instruction performs an operation on N doubles.

But there are limitations.

You cannot operate on the values within one register; for example, you cannot add some of the values in a register to each other.
In clock frequency such a processor is far behind an ordinary CPU, so it makes no sense to use it for tasks that require sequential processing of a single value.
Besides the lower frequency, there is also a memory limitation: values can only be loaded from and stored to a special memory.
Data can be copied from RAM into this special memory, but only through a very narrow (slow) channel.
Therefore, tasks that require processing large amounts of data are also poorly suited to OpenCL.

There are no vectors in the PI calculation example. It simply splits the total sum into several independent chunks and sends one to each OpenCL core. Everything is added up at the end.

For example, if there is no discrete video card and the CPU has 4 physical + 4 virtual cores, execution will be eight times faster. That is, the chunks of the sum are computed on each core in parallel.
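
The chunking idea described above can be sketched in plain Python. This is not the attached Speed PI.mq5 or its OpenCL kernel: the thread does not show which formula that code uses, so this sketch assumes midpoint-rule integration of 4/(1+x^2) over [0, 1], and the names `partial_pi_sum` and `chunked_pi` are made up for illustration. Threads are used only to show the decomposition; in CPython they will not give a real speedup for this kind of arithmetic.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_pi_sum(start, stop, steps):
    """Midpoint-rule contribution of slices start..stop-1 (out of `steps`) to pi.

    Each call depends only on its own index range, so chunks are independent.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h

def chunked_pi(steps=200_000, chunks=8):
    """Split the sum into `chunks` independent ranges and add the partial results."""
    bounds = [(k * steps // chunks, (k + 1) * steps // chunks) for k in range(chunks)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        parts = list(pool.map(lambda b: partial_pi_sum(b[0], b[1], steps), bounds))
    return math.fsum(parts)  # "everything is added up at the end"

pi_estimate = chunked_pi()
print(f"pi is approximately {pi_estimate:.12f}")
```

The only shared step is the final addition of the partial sums, which is why this task splits across cores so easily.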

MetaTrader 5 Python User Group - how to use Python in Metatrader - page 25