Machine learning in trading: theory, models, practice and algo-trading - page 3517
The convolution layer and the LSTM layer. The stride is the step by which the filter moves in the convolution.
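Not from the post itself, just a minimal numpy sketch of what the stride means: a larger stride makes the filter jump further per step and yields fewer outputs (the input series, filter and sizes here are made up).

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Slide `kernel` over `x`, moving `stride` positions per step."""
    k = len(kernel)
    return np.array([float(np.dot(x[i:i + k], kernel))
                     for i in range(0, len(x) - k + 1, stride)])

x = np.arange(10, dtype=float)           # toy input series
kernel = np.array([0.25, 0.5, 0.25])     # toy 3-tap filter

print(conv1d(x, kernel, stride=1))       # 8 outputs: the window moves 1 step at a time
print(conv1d(x, kernel, stride=2))       # 4 outputs: the window jumps 2 steps
```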
On convolutions...
If the second function in the convolution is rectangular, we get an SMA.
If that function has some other shape (triangular, exponential, etc.), we get something else (perhaps an EMA); only the coefficients applied to the different rows will differ.
But in any case, just as with SMA and EMA, there will be a lag because of the averaging.
So they seem to be of as little use as MAs. The one plus is a reduction in the number of predictors.
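For illustration only (the price series and window length are made up): convolving a series with a rectangular kernel reproduces the SMA exactly, and any other kernel shape is just a differently weighted average of the same window, so the same lag applies.

```python
import numpy as np
import pandas as pd

# Hypothetical price series, purely for illustration.
prices = pd.Series([1.10, 1.12, 1.11, 1.15, 1.14, 1.13, 1.16, 1.18, 1.17, 1.19])

n = 3
rect_kernel = np.ones(n) / n                             # rectangular kernel
conv_out = np.convolve(prices.values, rect_kernel, mode="valid")
sma = prices.rolling(n).mean().dropna().values           # ordinary SMA

print(np.allclose(conv_out, sma))                        # True: rectangular convolution == SMA

# A triangular (or exponentially decaying) kernel is just another weighting
# of the same window - different coefficients, same averaging lag.
tri_kernel = np.array([1.0, 2.0, 1.0]) / 4.0
weighted = np.convolve(prices.values, tri_kernel, mode="valid")
```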
That is somewhat different, and much more primitive.
In a convolutional network the convolution kernels are "invented" by the network itself during training, and there are thousands of such kernels/filters...
So thinking that a convolutional layer can be replaced by a single rectangular kernel is absurd.
It fits coefficients to the different rows? Then it will create some more suitable function instead of the rectangular one.
Fits coefficients to the different rows?
No, it fits the convolution kernels.
Then it will create some more suitable function instead of the rectangular one.
Yes.
Well, the coefficients of the convolution kernel are chosen so as to minimise the error, as I recall.
A matrix or vector of these coefficients slides over the matrix or vector of input data with a given step, the values are multiplied, and, for example, the window with the largest summed values is taken as the output.
The outputs are then passed to the next layer of the network, in this case the LSTM, as shown in the picture.
The LSTM then tries to remember the sequences of these values, acting as a kind of memory.
So the distinctive feature of the convolutional layer is that it can pick out the most important elements from the whole matrix, while the LSTM remembers their sequences.
The result is a selection of the important features of each row plus memorisation of their order: what comes after what. Approximation. Usually a fully connected layer with softmax is put at the end to distribute all this goodness across the classes.
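Roughly what that pipeline looks like as a minimal Keras sketch; the layer sizes, number of filters, sequence length and class count are assumptions, not taken from the post or the picture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed shapes: 64 bars per sample, 8 predictors per bar, 3 classes.
n_steps, n_features, n_classes = 64, 8, 3

model = keras.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    layers.Conv1D(32, kernel_size=5, strides=1, activation="relu"),  # learned kernels slide over the series
    layers.MaxPooling1D(pool_size=2),                                # keep the strongest responses
    layers.LSTM(16),                                                 # remember the order of the pooled features
    layers.Dense(n_classes, activation="softmax"),                   # distribute over the classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Random data, purely to show the call signature.
X = np.random.randn(256, n_steps, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=256)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```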
Does LSTM work with radically unbalanced classes - say, two classes with a ten-to-one ratio?
LSTM basically works shitty ) it is very hard to train properly.
There are constant gradient explosions because of the repeated multiplication of the weights.
Maybe it has been fixed somehow by now, but I doubt it, because everyone has switched to transformers.
A transformer is some combination of convolution and LSTM - unofficially you can probably draw that analogy.
It's not like random forests; with an LSTM you have to jump through hoops to train it properly.
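For reference, not from the post: the usual workarounds are gradient clipping against the exploding gradients mentioned above and class weights for the 10:1 imbalance from the question. A self-contained sketch with made-up data and arbitrary parameter values:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny LSTM classifier on made-up data, just to show the two knobs.
X = np.random.randn(256, 64, 8).astype("float32")
y = (np.random.rand(256) < 0.1).astype(int)            # roughly 10:1 class imbalance

model = keras.Sequential([
    layers.Input(shape=(64, 8)),
    layers.LSTM(16),
    layers.Dense(1, activation="sigmoid"),
])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0 - the standard
# guard against exploding LSTM gradients.
opt = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=opt, loss="binary_crossentropy")

# class_weight makes the rare class count about ten times more in the loss.
model.fit(X, y, epochs=2, batch_size=32, class_weight={0: 1.0, 1: 10.0}, verbose=0)
```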
I made a visualisation to better understand how the model's results degrade.
Below is a graph showing the ZZ on the balance of weighted quantum segment errors on the train sample.
You can see that overall the trend is good, a positive probability shift, but what about the next two samples, test and exam?
Below are the graphs for these samples.
We observe a flat period and, only at the end of it, a trend towards a positive probability shift. Most likely we would have stopped a model using this quantum segment without ever waiting for the rise.
And what comes next?
On the exam sample the rise continued at the beginning, but then everything fell into a flat again. How can there be a long trend on train while later we only see occasional bursts - has something changed? One can only guess, or make an additional split on another quantum segment; but how to do that on train, where everything already looks fine, is a mystery.
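Not the author's code (the quantum segment selection itself is not shown here), just a minimal sketch of how such balance curves over chronological train/test/exam samples can be drawn, with made-up per-trade outcomes and arbitrary split sizes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-trade outcomes of the model, in chronological order.
outcomes = np.random.randn(3000) * 0.5 + 0.02

# Chronological split into train / test / exam, as in the post.
splits = {"train": outcomes[:1500], "test": outcomes[1500:2250], "exam": outcomes[2250:]}

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (name, part) in zip(axes, splits.items()):
    ax.plot(np.cumsum(part))            # balance curve = cumulative sum of outcomes
    ax.set_title(name)
    ax.set_xlabel("trade #")
axes[0].set_ylabel("balance")
plt.tight_layout()
plt.show()
```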
Still, it happens that quite acceptable variants can be found - here, for example, are three samples below.
Thoughts, ideas, considerations are welcome!
If it is overfitted on train, then it should be trained less.