Discussion of article "Neural networks made easy (Part 5): Multithreaded calculations in OpenCL" - page 2

So this is not news: there was 1 core and it was fully loaded; now there are two cores and the load has halved. Most likely the changes are more significant than that, and the comparison is not a fair one.
To understand the reasons for the speedup, it is not enough to look at the number of cores; you also have to look at the computing architecture.
I agree. What I didn't understand is why 4 vectors were parallelised instead of 2?
So this is not news: there was 1 core and it was fully loaded; now there are two cores and the load has halved. Most likely the changes are more significant than that, and the comparison is not a fair one.
It may be down to more efficient memory allocation: a core gets its data in full at once, without redistribution, and that turns out to be faster. For some tasks, though, it may be slower, namely when the kernel calculations are resource-intensive.
I agree. What I didn't understand is why 4 vectors were parallelised instead of 2?
Two vectors of 4 elements each were parallelised: the vectors inp and weight. Four elements were written into each, and then the two vectors were multiplied with dot.
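To make the "two vectors of 4 elements" point concrete, here is a minimal sketch in plain C. It is not the article's kernel: the `float4` struct, `load4`, `dot4` and `weighted_sum` are hypothetical stand-ins that only imitate what OpenCL's built-in `float4` type and `dot` function compute, and the zero-padding of the tail is an assumption. Plain C gives no guarantee that the four products actually run in parallel; the sketch only shows how the data is grouped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's built-in float4 type. */
typedef struct { float x, y, z, w; } float4;

/* Computes the same value OpenCL's dot() returns for two float4 arguments. */
static float dot4(float4 a, float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Loads 4 consecutive elements starting at index i.
   Elements past the end of the array are filled with zeros (assumption). */
static float4 load4(const float *v, int i, int n) {
    float4 r;
    r.x = (i     < n) ? v[i]     : 0.0f;
    r.y = (i + 1 < n) ? v[i + 1] : 0.0f;
    r.z = (i + 2 < n) ? v[i + 2] : 0.0f;
    r.w = (i + 3 < n) ? v[i + 3] : 0.0f;
    return r;
}

/* Weighted sum of the inputs, taken 4 elements at a time:
   only two vectors (inp and weight) exist per step, each holding 4 values. */
float weighted_sum(const float *inp, const float *weight, int n) {
    assert(inp != NULL && weight != NULL);
    float sum = 0.0f;
    for (int i = 0; i < n; i += 4) {
        sum += dot4(load4(inp, i, n), load4(weight, i, n));
    }
    return sum;
}
```

Calling `weighted_sum` on inputs `{1, 2, 3, 4, 5, 6}` with weights `{0.5, 0.5, 0.5, 0.5, 2, 2}` takes two steps of the loop: one full group of 4 and one group of 2 padded with zeros.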
Dimitri, thanks for the reply.
It may be down to more efficient memory allocation: the kernel gets its data in full at once, without reallocation, and that turns out to be faster. For some tasks, though, it may be slower, namely when the kernel calculations are resource-intensive.
Maybe.
Two vectors of 4 elements each were parallelised: the vectors inp (initial data) and weight (weights). Four elements were written into each, and then the two vectors were multiplied with dot.
So the speedup comes from the sequential multiplication? After all, two vectors are parallelised, and within each of them the 4 multiplications are (nominally) performed one after another?
Using vector operations lets you compute the products of 4 elements in parallel rather than sequentially. See the video https://ru.coursera.org/lecture/parallelnoye-programmirovaniye/4-1-chto-takoie-viektorizatsiia-i-zachiem-ona-nuzhna-f8lh3. It is about OpenMP, but the idea is the same.
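Since the video is about OpenMP, the same idea can be sketched on the CPU side in C. The function names `dot_scalar` and `dot_simd` are hypothetical and do not come from the article. The `#pragma omp simd` directive is only a hint: it takes effect when the compiler is run with OpenMP support (for example `-fopenmp` or `-fopenmp-simd` in GCC and Clang) and is otherwise ignored, in which case both functions do the same scalar work.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: one multiply-add per loop iteration. */
float dot_scalar(const float *a, const float *b, int n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Same loop with an OpenMP SIMD hint. With OpenMP enabled, the compiler
   may process several elements per instruction (e.g. 4 floats at once)
   and combine the partial sums at the end via the reduction clause. */
float dot_simd(const float *a, const float *b, int n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the vectorised version may add the partial sums in a different order, its result can differ from the scalar one in the last bits for general floating-point data; for small integer-valued inputs the two agree exactly.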