Machine learning in trading: theory, models, practice and algo-trading - page 1980

 
Valeriy Yastremskiy:
And how is memory organized?

depends on where

if you understand it right away, then I'm waiting for clarification :)

http://peterbloem.nl/blog/transformers

Transformers from scratch
  • peterbloem.nl
I will assume a basic understanding of neural networks and backpropagation. If you’d like to brush up, this lecture will give you the basics of neural networks and this one will explain how these principles are applied in modern deep learning systems. Self-attention The fundamental operation of any transformer architecture is the self-attention...
 

Hi all, I didn't post the video directly in the forum thread, but put it on my blog. WARNING: non-normative language. For those who are really interested in the market...

https://www.mql5.com/ru/blogs/post/739164

Talking and showing about the market
  • www.mql5.com
It turns out I'm even more interesting to watch when drunk. WARNING: the video contains non-normative language, so keep children away from the screen. And yes, we actually talk about the market.
 
Mihail Marchukajtes:

Hi all, I didn't post the video directly in the forum thread, but put it on my blog. WARNING: non-normative language. For those who are really interested in the market...

https://www.mql5.com/ru/blogs/post/739164

There is also a direct reference to you, Maxim!!!!!
 
Maxim Dmitrievsky:

I only tinkered with forests before, I didn't use NNs.....

Yeah, me neither... That's why I keep talking about a block diagram, so that you can understand how it works at least at the level of images

 
mytarmailS:

Yeah, me neither... That's why I keep talking about a block diagram, so that you can understand how it works at least at the level of images

I spent two days trying to figure out what a Kohonen layer is

and it turned out to be just a primitive autoencoder

Vladimir wrote about them in articles
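
To make that concrete, a minimal NumPy sketch (my own toy example, not taken from Vladimir's articles) of a Kohonen-style vector quantization layer seen as a primitive autoencoder: encoding picks the index of the nearest codebook vector, decoding just returns that vector, and training nudges the winner toward the sample.

# Kohonen / vector-quantization layer as a "primitive autoencoder":
# encode = index of the nearest codebook vector, decode = that vector itself.
import numpy as np

rng = np.random.default_rng(0)

n_codes, dim = 16, 4                         # codebook size and input dimension (arbitrary)
codebook = rng.normal(size=(n_codes, dim))   # prototype vectors (the layer's "weights")

def encode(x):
    """Return the index of the closest prototype (winner-takes-all)."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return int(np.argmin(dists))

def decode(k):
    """Reconstruction is just the prototype itself."""
    return codebook[k]

def train_step(x, lr=0.1):
    """Move the winning prototype toward the sample (online k-means, no neighborhood)."""
    k = encode(x)
    codebook[k] += lr * (x - codebook[k])

# toy usage: quantize random samples, then check the reconstruction error
for _ in range(1000):
    train_step(rng.normal(size=dim))

x = rng.normal(size=dim)
x_hat = decode(encode(x))
print("reconstruction error:", np.linalg.norm(x - x_hat))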
 
Maxim Dmitrievsky:

depends on where

if you understand it, I'm waiting for clarification :)

http://peterbloem.nl/blog/transformers

"What I cannot create, I do not understand," as Feynman said.

Multiplication is better than addition: the sign is taken into account. In general, the product of, say, argument and result is something) a single function that accounts for both.

It's not quite clear how the queries, keys and values are organized.

The main difference is pseudo-parallel processing with access to the training data, and the scalar product of the input and output vectors, called self-attention. The matrix of these scalar products is then used in training. And it's not weights.

I didn't find anything about long short-term memory in the article.

In general, additional matrices correcting the result are created.

I don't pretend to understand it correctly ))))
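
For reference, this is roughly how the article's self-attention looks in plain NumPy. A simplified single-head sketch of my own (dimensions are arbitrary): queries, keys and values are three linear projections of the same input, and the matrix of scalar products is recomputed from the data on every pass, so it is indeed not a set of weights.

# single-head self-attention in plain NumPy
import numpy as np

rng = np.random.default_rng(1)

t, k = 6, 8                       # sequence length and embedding size (arbitrary)
X = rng.normal(size=(t, k))       # input sequence: t vectors of dimension k

# the only trained parameters: three projection matrices
Wq = rng.normal(size=(k, k))
Wk = rng.normal(size=(k, k))
Wv = rng.normal(size=(k, k))

Q, K, V = X @ Wq, X @ Wk, X @ Wv  # queries, keys, values

scores = Q @ K.T / np.sqrt(k)     # t x t matrix of scaled scalar products
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax

Y = weights @ V                   # each output is a weighted mix of the values
print(Y.shape)                    # (t, k): one output vector per input position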

 
Valeriy Yastremskiy:

"What I cannot create, I do not understand," as Feynman said.

Multiplication is better than addition: the sign is taken into account. In general, the product of, say, argument and result is something) a single function that accounts for both.

It's not quite clear how the queries, keys and values are organized.

The main difference is pseudo-parallel processing with access to the training data, and the scalar product of the input and output vectors, called self-attention. The matrix of these scalar products is then used in training. And it's not weights.

I didn't find anything about long short-term memory in the article.

In general, additional matrices correcting the result are created.

I don't pretend to understand it correctly))))

it's a different algorithm (supposedly the coolest right now); there's no notion of long and short memory in it like in LSTM

the "long and short" part is just about seeing how an LSTM cell works
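
Just to pin the terminology down, here is a toy NumPy sketch (my own simplification, not production code) of one LSTM cell step: the cell state c is the long-term memory, the hidden state h is the short-term one, and the gates decide what to forget, what to write and what to expose.

# one LSTM cell step in plain NumPy
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One step; W has shape (4*n, n_in + n), b has shape (4*n,)."""
    n = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    f = sigmoid(z[0*n:1*n])        # forget gate: what to keep of the old cell state
    i = sigmoid(z[1*n:2*n])        # input gate: how much new candidate to write
    g = np.tanh(z[2*n:3*n])        # candidate values
    o = sigmoid(z[3*n:4*n])        # output gate: what part of memory to expose
    c_new = f * c + i * g          # long-term memory (cell state) update
    h_new = o * np.tanh(c_new)     # short-term memory (hidden state) / output
    return h_new, c_new

# toy usage over a short random sequence
rng = np.random.default_rng(2)
n_in, n = 3, 5
W, b = rng.normal(size=(4*n, n_in + n)), np.zeros(4*n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(10, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h)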

 
Maxim Dmitrievsky:

I spent two days trying to figure out what a Kohonen layer (VQ) is

but it turns out it's just a primitive autoencoder

Vladimir wrote about them in articles

Did Vladimir write about VQ specifically? Or just in general?

What about memory? How does it work there? Is it permanent or does it work in a window (like an indicator)? Is it static or is it retrained?

Is it possible to do something similar with forests?

I have a million questions)))

 
Maxim Dmitrievsky:

it's a different algorithm (supposedly the coolest right now); there's no notion of long and short memory in it like in LSTM

the "long and short" part is just about seeing how an LSTM cell works

Ahh. Well then, as I understand it, it's self-attention plus the resource cost in time. In general, scaling the network architecture simply improves its performance up to some limit. Here, as I understand it, the network is made more complex by combining different network logics, and that is then scaled))). And consequently:

The bottleneck in training transformers is the matrix of scalar products in self-attention. For a sequence of length t, it is a dense matrix containing t² elements. At standard 32-bit precision and with t = 1000, a batch of 16 such matrices takes up about 250 MB of memory. Since we need at least four of them (before and after softmax, plus their gradients) for a single self-attention operation, this limits us to a maximum of twelve layers on a standard 12 GB GPU.
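
The quadratic scaling is easy to check with a back-of-the-envelope script (my own sketch; the exact totals also depend on how many copies of the matrix, heads and gradients are held at once):

# memory of the t x t self-attention score matrix grows quadratically with t
BYTES_PER_FLOAT32 = 4

def attention_matrix_mb(t, batch=16, copies=1):
    """Megabytes for `copies` batches of t x t float32 score matrices."""
    return copies * batch * t * t * BYTES_PER_FLOAT32 / 2**20

for t in (250, 500, 1000, 2000):
    # four copies per self-attention op: before/after softmax plus their gradients
    print(f"t={t:>4}: {attention_matrix_mb(t, copies=4):8.1f} MB")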

 
Maxim Dmitrievsky:

You have to do a lot of studying and thinking before you understand...

you might have to buy brain vitamins, drink less)

I haven't figured it out yet.) But it's not as hard as it sounds.

So we're back to the usual block diagram again: you have to draw it up first, so you have an image-level understanding...

like -

first the clusterer (it does this and that)

then we connect the classifier to the output (it does this and that)

then we calculate something (it does this and that)

the output is connected to the clusterer again

etc...


If you just read some complicated nonsense where you don't even know the terms, what will you get?

So, you need to understand the basic principle of the algorithm, and understand it at the level of a block diagram, as I said. Then you will see what is what, and once you see that, you will understand what can be improved and how.
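
Something like this, as a purely hypothetical Python sketch (the block names and placeholder logic are mine, not a real strategy), just to show what such a block diagram might look like in code:

# each block is a function with a one-line comment saying what it does,
# and the output feeds back to the start of the loop
import numpy as np

def clusterer(window):
    """Block 1: compress the raw window into a discrete state (e.g. a cluster id)."""
    return int(np.sign(window.mean()))                 # placeholder logic

def classifier(state, features):
    """Block 2: given the state and recent features, predict a direction."""
    return 1 if state * features.mean() > 0 else -1    # placeholder logic

def evaluate(signal, next_return):
    """Block 3: score the prediction against what the price actually did."""
    return signal * next_return

rng = np.random.default_rng(3)
prices = rng.normal(size=200).cumsum()                 # toy price series

score = 0.0
for i in range(50, len(prices) - 1):
    window = np.diff(prices[i - 50:i])                 # raw input block
    state = clusterer(window)                          # block 1
    signal = classifier(state, window[-10:])           # block 2
    score += evaluate(signal, prices[i + 1] - prices[i])  # block 3, feeds back
print("toy cumulative score:", score)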
