
I didn't play with local memory back when I was testing OpenCL. I found it very complicated, even in other resources on the internet. I only tested vectorized kernels.
I managed to implement a Bitonic sort that beats radix sort in speed (by optimizing the kernel algorithm).
But in the end, OpenCL is a non-portable solution for parallel processing. Now it is almost dead!
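(For anyone curious what such a kernel looks like: below is a minimal sketch of the textbook XOR-partner bitonic network driven from MQL5's OpenCL wrapper. It is not amrali's optimized kernel; the kernel name, the power-of-two size N and the host scaffolding are illustrative assumptions, and error handling is omitted for brevity.)

#define N 1024

const string cl_src=
   "__kernel void bitonic_step(__global float *data,              \r\n"
   "                           const uint dist, const uint block) \r\n"
   "{                                                             \r\n"
   "   uint i = get_global_id(0);                                 \r\n"
   "   uint j = i ^ dist;               // partner index          \r\n"
   "   if(j > i)                        // lower index does swap  \r\n"
   "   {                                                          \r\n"
   "      bool asc = ((i & block) == 0);                          \r\n"
   "      float a = data[i], b = data[j];                         \r\n"
   "      if((a > b) == asc) { data[i] = b; data[j] = a; }        \r\n"
   "   }                                                          \r\n"
   "}                                                             \r\n";

void OnStart()
  {
   float data[N];
   for(int k=0; k<N; k++)
      data[k]=(float)MathRand();

   int ctx=CLContextCreate(CL_USE_GPU_ONLY);
   int prg=CLProgramCreate(ctx,cl_src);
   int krn=CLKernelCreate(prg,"bitonic_step");
   int buf=CLBufferCreate(ctx,N*sizeof(float),CL_MEM_READ_WRITE);
   CLBufferWrite(buf,data);
   CLSetKernelArgMem(krn,0,buf);

   uint offs[1]={0}, work[1]={N};
   // full bitonic network: one kernel launch per (block, dist) pair
   for(uint block=2; block<=N; block<<=1)
      for(uint dist=block>>1; dist>0; dist>>=1)
        {
         CLSetKernelArg(krn,1,dist);
         CLSetKernelArg(krn,2,block);
         CLExecute(krn,1,offs,work);
        }

   CLBufferRead(buf,data);              // data[] now sorted ascending
   CLBufferFree(buf);  CLKernelFree(krn);
   CLProgramFree(prg); CLContextFree(ctx);
  }

(The optimization remark stands: this naive version launches N work items per pass and half of them idle, which is exactly the kind of thing kernel-level tuning removes.)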
Non-portable? We are talking about the MT5 platform. MQL5 isn't portable either.
Maybe I misunderstood your point.
Yeah, I find it daunting. Like Scarlett Johansson appears in front of me and I must flirt with her.
Dead within the MQL5 ecosystem, or in general?
amrali #:
...
Almost dead in general, because there are still no common API standards among graphics-card manufacturers, unlike the situation with CPU manufacturers.
Hmm, yeah, I got the sense nVidia does not like CL a lot. Implementing standards for an open library would be a big headache for the companies, I imagine, besides them running their own.
A standard between their libraries would make more sense.
Experimental or not though, if you can find the sweet spot for your algorithm, it's still faster.
On your computer alone! You will never know what GPU other users have.
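(One partial mitigation on the MQL5 side is to query the actual device at run time instead of hard-coding launch parameters. A small sketch; the property IDs are from MQL5's documented ENUM_OPENCL_PROPERTY_INTEGER, but treat the exact set as something to verify against your build:)

int ctx=CLContextCreate(CL_USE_GPU_ONLY);
if(ctx==INVALID_HANDLE)
   ctx=CLContextCreate(CL_USE_ANY);      // fall back to any OpenCL device

long units =CLGetInfoInteger(ctx,CL_DEVICE_MAX_COMPUTE_UNITS);
long wg_max=CLGetInfoInteger(ctx,CL_DEVICE_MAX_WORK_GROUP_SIZE);
long lmem  =CLGetInfoInteger(ctx,CL_DEVICE_LOCAL_MEM_SIZE);
PrintFormat("units=%I64d  max work-group=%I64d  local mem=%I64d bytes",
            units,wg_max,lmem);
// derive work-group size / tiling from these instead of compile-time constants
CLContextFree(ctx);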
So the example I posted may be running slower on your GPU, even though it's just the minimum basic optimization it could have?
I mean, it's not factoring in anything (apart from requiring a GPU with double support); it just sends the data down.
Unless you mean that if the optimization becomes too specific it can't scale (can't be distributed with the speed claims it was tested at), that I get.
From the resources I read on the subject before, I remember the main complaint was performance non-portability.
These fine details need extended testing across different GPUs.
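(On the "apart from requiring a GPU with double support" point: MQL5 lets you make that requirement explicit when creating the context, so the fallback can be handled up front rather than failing at kernel compile time. A sketch, assuming you keep both a float and a double variant of the kernel source:)

// request a double-capable GPU first, then degrade to float-only
int ctx=CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
bool has_double=(ctx!=INVALID_HANDLE);
if(!has_double)
   ctx=CLContextCreate(CL_USE_GPU_ONLY);
// a double kernel additionally needs this line in its OpenCL source:
//    #pragma OPENCL EXTENSION cl_khr_fp64 : enable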
I see, I'm in no way experienced enough in it to counter that.
It's the poor man's Rocinante, I guess.
Thanks for the tips.
Where are you getting the values from for ?
I'm running a kernel and it says private memory 0. Shouldn't it say 32 (f, b, a and c)?
Here is the full code: