Discussion of article "Neural networks made easy (Part 5): Multithreaded calculations in OpenCL" - page 2

 
Aleksey Vyazmikin:

So this is nothing new: there was one core and it was loaded, and now there are two cores, so the load has dropped by half... Most likely the changes are more significant and the comparison is not valid.

To understand the reasons for the acceleration, it is not enough to look at the number of cores; you also need to look at the computation architecture.
 
Dmitriy Gizlyk:
To understand the reasons for the acceleration, it is not enough to look at the number of cores; you also need to look at the computation architecture.

I agree. But I didn't understand: why were 4 vectors parallelised rather than 2?

 
Aleksey Vyazmikin:

So this is nothing new: there was one core and it was loaded, and now there are two cores, so the load has dropped by half... Most likely the changes are more significant and the comparison is not valid.

Memory allocation can be more efficient: the kernel gets its data in full at once, without redistribution, which turns out faster. But for some tasks it may be slower, namely when the kernel calculations are resource-intensive.

 
Aleksey Vyazmikin:

I agree. But I didn't understand: why were 4 vectors parallelised rather than 2?

Two vectors of 4 elements each were parallelised: the vectors inp and weight. Four elements were packed into each, and they were then multiplied in dot.

dot(inp,weight) -> i1*w1+i2*w2+i3*w3+i4*w4
 
Dmitriy Gizlyk:

Dmitriy, thanks for the reply.

 
Maxim Dmitrievsky:

Memory allocation can be more efficient: the kernel gets its data in full at once, without redistribution, which turns out faster. But for some tasks it may be slower, namely when the kernel calculations are resource-intensive.

Maybe.

 
Thanks for the series of articles, interesting and informative. But I can't work out how to train the network. I have already run it several times on the EURUSD chart: the forecast grows and then starts to fall. Has anyone managed to train the network?
 
Dmitriy Gizlyk:

Two vectors of 4 elements each were parallelised: the vectors inp (initial data) and weight (weights). Four elements were written into each, and they were then multiplied in dot.

So the speedup comes from the multiplication operation? After all, only two vectors are parallelised, and within each vector the 4 multiplications (nominally) are performed sequentially?

 
A very interesting article, and neural networks are a hot topic nowadays. Thanks to the author for the good work.
 
Aleksey Vyazmikin:

So the speedup comes from the multiplication operation? After all, only two vectors are parallelised, and within each vector the 4 multiplications (nominally) are performed sequentially?

Using vector operations lets you perform the 4 element-wise products in parallel rather than sequentially. See the video https://ru.coursera.org/lecture/parallelnoye-programmirovaniye/4-1-chto-takoie-viektorizatsiia-i-zachiem-ona-nuzhna-f8lh3. It is about OpenMP, but the idea is the same.

4.1. What vectorization is and why it is needed - Vector computing with OpenMP 4.0 | Coursera
  • ru.coursera.org
Video created by National Research Tomsk State University for the course "Introduction to Parallel Programming Using OpenMP and MPI". Welcome to the fourth week of the course! This week we will look at ...