What to feed to the input of the neural network? Your ideas...

 
My prediction is that the only model able to trade at a stable profit for a long time is the CNN + BiLSTM + MLP combination.

1) Convolutional networks will be better at pattern extraction and generalisation. That is, they will do the preprocessing of the input data.

2) Recurrent networks will work with data already cleaned of rubbish, and their previous-state mechanism will adapt to the changing market.

3) The multilayer perceptron will simply learn trading rules from the properly processed data. After all, it is the most meticulous of all three.


If the behaviour of a currency pair changes strongly, the recurrent networks will not let the account drain quickly, but will gradually bring trading to flat, which will be the signal to retrain the whole system.


This is my subjective opinion based on my understanding of these methods.

I haven't properly put this combination through its paces yet.

Again, considering that all these methods have been discussed hundreds of times by the professionals here, I may be wrong. On the other hand, we have moved away from the standard prediction of the next candle, which is what the articles usually do, in favour of polishing the TS. And maybe that will make a difference.
 
Ivan Butko #:

Convolutional networks will be better at pattern extraction and generalisation. That is, they will do the preprocessing of the input data.

...
This is all nonsense, but the question is different:
Is the donor TS also in profit on the forward test? )
 
Ivan Butko #:

The Problem of Numbers.

... But it so happens that the input data already carries a force factor in the form of its quantitative value. ...


Samples, each of n numbers, fed to the inputs X[0 ... n-1] during training, should be pre-processed (normalised) in such a way that they are EQUIVALENT to each other, all else being equal.


Given that the value Y at the output of the connected neuron is the sum X[0]*W[0] + X[1]*W[1] + X[2]*W[2] + ... + X[n-1]*W[n-1], this equivalence can be achieved in only one way:

the sums |X[0]| + |X[1]| + |X[2]| + ... + |X[n-1]| must be the same for all training samples,

where, as the "all else being equal" condition, W[n] is taken equal to X[n].

That is, with this "all else being equal" condition taken into account, two samples are numerically equivalent if their sums X[0]^2 + X[1]^2 + X[2]^2 + ... + X[n-1]^2 are the same.


For example :

If sections of the price series, taken as increments X[n][k] = close[n][k] - open[n][k], where n is the index of the value within a sample and k is the sample number, are used as the training set,

then the sum |X[0][k]| + |X[1][k]| + |X[2][k]| + ... + |X[n-1][k]| is the length of the path travelled by the price curve within sample k.

That is, the training samples k must first be normalised relative to each other, in the physical sense of this example, literally by the path length of their price curves.

Then it is possible (and necessary) to apply one more normalisation, into the range from -1 to +1, by dividing all X[n][k] by max, where max is the largest |X[n][k]| over all X[n][k].

Then, during training, these normalised samples X[0 ... n-1][k] are fed in as training examples, each with its own score d[k], which determines that sample's contribution to the overall training result.
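
A minimal sketch of this two-step normalisation (the function and array names are mine, not from the post; each row of X is assumed to hold the close-open increments of one sample):

```python
import numpy as np

def normalise_samples(X):
    """Two-step normalisation sketched above.
    X is assumed to have shape [n_samples, n], rows of close-open increments.

    Step 1: rescale each sample so that the 'path length' sum(|X[i]|)
            is the same (equal to 1) for every sample.
    Step 2: divide everything by the global maximum |value| so that all
            inputs land in the range [-1, +1].
    """
    X = np.asarray(X, dtype=float)
    path_len = np.abs(X).sum(axis=1, keepdims=True)  # price path length per sample
    path_len[path_len == 0] = 1.0                    # guard against flat samples
    X_eq = X / path_len                              # samples now equivalent by path length
    return X_eq / np.max(np.abs(X_eq))               # scale into [-1, +1]
```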


Again :

I suppose that training should not be done with all available samples k, but only with those that have a "decent" (I use that word deliberately) score d[k].

However, in order to select the "worthy" ones, we still need to have all samples k with their d[k] scores.
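
Purely for illustration, the selection itself is a simple mask (the scores and threshold below are invented):

```python
import numpy as np

# Hypothetical example: 6 samples with made-up scores d[k] and an arbitrary
# threshold for what counts as a "decent" score.
X = np.random.randn(6, 4)                     # stand-in for the normalised samples
d = np.array([0.9, 0.1, 0.7, 0.3, 0.8, 0.2])  # per-sample scores d[k]
decent = d >= 0.5                             # mask of the "worthy" samples
X_train, d_train = X[decent], d[decent]       # train only on those
```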

However, this is another topic ...

 
Maxim Dmitrievsky #:
Is the donor TS also in profit on the forward test? )

Absolutely

I described the theory: the point of the NS is to filter rough TSs that show something, not to search for the grail.

And I described the approach: manually search for any TS that shows at least some results.

And filter it with NS.



Here is the original TS.

Unfit for trading, but showing signs of life.


And here is the same one, only filtered by a Python NS (an LSTM).

 
Evgeny Shevtsov #:

Samples, each of n numbers, fed to the inputs X[0 ... n-1] during training, should be pre-processed (normalised) in such a way that they are EQUIVALENT to each other, all else being equal.

...

You write interestingly.

But this is where the dissonance with the "problem of numbers" comes in:

Then it is possible (and necessary) to apply one more normalisation, into the range from -1 to +1, by dividing all X[n][k] by max, where max is the largest |X[n][k]| over all X[n][k].
After any normalisation, every squiggle is described by a quantitative value, and the larger that value, the more it affects the final NS output. Yet such an "outlier" trajectory may not be a pattern at all, while carrying the most "weighty" number. Unreasonable and unjustified.

After all, assigning a "strength" to a pattern is, roughly speaking, tantamount to finding the grail. Then there would be no need for an NS: just take the patterns with the highest power (numerical) index, combine them with other similar patterns and open a position, because by this logic the probability of winning is supposedly higher.

That's why I wanted to get away from raw numbers, or to find a method that would, as objectively as possible, assign a certain strength to a pattern which, in some context (in a mixture with other patterns), gave a high probability of working out.

Then pre-processing would make sense.
 
Ivan Butko #:

The problem of numbers.

...

...


...And a weight that tries to weaken (nullify) an input value of 0.9 will also weaken, even(!) more strongly(!), the other values in the lower range of that input (which may later be more important for the system's performance), because of its static nature: the weights do not change in a trained NS.
...
...


Evgeny Shevtsov#:

Samples, each of n numbers, fed to the inputs X[0 ... n-1] during training, should be pre-processed (normalised) in such a way that they are EQUIVALENT to each other, all else being equal.

...

Again :

I suppose that training should not be done with all available samples k, but only with those that have a "decent" (I use that word deliberately) score d[k].

However, in order to select the "worthy" ones, we still need to have all samples k with their d[k] scores.

...


One of the options (just an example) for formalising information on a chart into a meaningful number (endowing it with an objective "force" factor relative to other graphical information) is support and resistance levels. Let's assume they are formalised. The price bounces off them or breaks through them; it clings to them or ignores them.

In this case, the numbers describing each successive price on the chart form the same kind of MinMax window, only the maximum value is that very S/R level. If the price is near the level, the value is 0.9 (-0.9) and closer to 1; if the price is far from it, the value is closer to 0.

If one price is on one side of the level and another is on the other side, then each side gets its own polarity: on one side the values closest to the level are the negative maxima, and on the other side the positive maxima.


This is just a sketch, an example of what the numbers should be: not just a position relative to the other numbers in the window, but a reflection of some market context.
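
As an illustration of that sketch only, a hypothetical encoding might look like this (the function name, the window parameter and the linear shape are my assumptions, not from the post):

```python
def encode_vs_level(price: float, level: float, window: float) -> float:
    """Hypothetical encoding of a price relative to an S/R level.

    Returns a value in [-1, +1]:
      magnitude -> closeness to the level (1.0 = price sits on the level),
      sign      -> which side of the level the price is on.
    """
    closeness = max(0.0, 1.0 - abs(price - level) / window)  # 1 near the level, 0 beyond the window
    side = 1.0 if price >= level else -1.0                   # polarity by side of the level
    return side * closeness

# Example: level at 1.1000, window of 50 points
print(encode_vs_level(1.1005, 1.1000, 0.0050))   # ~ 0.9 (close to the level, from above)
print(encode_vs_level(1.0995, 1.1000, 0.0050))   # ~-0.9 (close to the level, from below)
print(encode_vs_level(1.1100, 1.1000, 0.0050))   #   0.0 (far from the level)
```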


Another option: we express patterns through numbers. Say we have a candlestick pattern A of one candle, after which the price historically goes up 55 times out of 100. The next pattern AB (a combination of 2 candles) is followed by 48 up moves and 52 down. The third combination ABC (three candlesticks) works out 53 up and 47 down.
So we have one pattern with a historical skew up, another with a skew down, and a third up: +5, -2, +3 (the deviation from 50).
We combine them into a window, normalise into the range -1..1 and get the numbers for the NS input: 1, -1 and 0.4286.
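
The arithmetic behind those three numbers, as a quick check (a plain min-max scaling into [-1, +1]):

```python
def minmax_to_pm1(values):
    """Min-max scale a window of pattern scores into [-1, +1]."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

# Deviations from 50%: 55 -> +5, 48 -> -2, 53 -> +3
print(minmax_to_pm1([5, -2, 3]))   # [1.0, -1.0, 0.4285...]
```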

This is an example of how the quantitative factor, which is decisive for the mathematical NS, can carry objective information about the market, and the task of the NS is then to choose which parts of that objective information are useful for stable trading.

And not to frantically extract something from the absolute rubbish that the average numerical input set for an NS usually is.

 

What you are describing are the L1 and L2 normalisations (regularisations). Usually they are already built into ML libraries.

If there is much more rubbish than good examples (which is typical for a typical forex dataset), no approach will help.
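
For reference, a minimal PyTorch sketch of what "built in" means here (the layer sizes and coefficients are arbitrary):

```python
import torch
import torch.nn as nn

# A toy MLP; the sizes are arbitrary and only serve the illustration.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# L2 regularisation is typically built in as weight_decay of the optimiser.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# L1 regularisation is usually added to the loss by hand.
l1_penalty = 1e-5 * sum(p.abs().sum() for p in model.parameters())
```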

 
Ivan Butko #:

Absolutely

...

And here is the same one, only filtered by a Python NS (an LSTM).
If it only shows this in the tester and not on a real account, it is still overfitting.

 

And here is how the convolutional network (CNN) behaves.

Backtest of the original crude, simple TS:
2000-2021


And its forward:


Working method:
- find some TS and check it on the forward period: it should at least not lose money; evaluate its behaviour subjectively.
- before opening a position, record information from the chart (experimentally, anything) into an input set (I did this with a script).
- record the result of the deal as the target for the previously recorded input set.
- go through the whole training period this way.
- feed the collected data to the NS and train it.
- add "if (.... && out > Open_level)" to the position-opening condition (see the sketch after this list).
- optimise in MT5: sweep the threshold (the only parameter to optimise in the MT5 strategy tester).
- choose the set you like; for example, below is the same chart as the first one above, but "filtered" by the NS.



- check forward 2021-2025
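
A hedged Python sketch of the gating step referenced in the list (names such as allow_entry and open_level are placeholders, not the author's actual code):

```python
def allow_entry(ts_signal: bool, ns_output: float, open_level: float) -> bool:
    """The gate from the list above: the donor TS still generates the entry
    signal, and the trained NS output merely filters it, i.e. the Python
    analogue of the MQL5 condition 'if (... && out > Open_level)'."""
    return ts_signal and ns_output > open_level

# Open_level is the single parameter later swept in the MT5 optimiser;
# candidate thresholds could also be pre-screened in Python like this:
for open_level in (0.1, 0.3, 0.5, 0.7, 0.9):
    pass  # evaluate the filtered TS on the training period for each threshold
```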



For comparison, BiLSTM (a previous attempt).


As input: 1000 (!) normalised prices.

(I keep coming back to the idea of formulating and designing a market context for the NS. And the first thing it should contain: a lot of units of information.)


The above was a simple way of exploiting an NS until we figure out how to teach it to search for TSs on its own.

I guess we'll have to dive into the ML topics of unsupervised learning, data mining, and so on. The truth is buried somewhere in there.




Notes on NS: retraining is 99% of the work. That is, to arrive at the result above, I had to sit for half a day setting up the architecture. There is no magic wand for hyperparameter settings, and there are dozens and hundreds of ways to "tweak" something; you can't learn them all.

But in the process of tinkering you come to understand, or feel, the direction of thought: where to poke and what to try next.


At the current moment, looking back at all the many years of effort with these networks, one thing I can say for sure: there is not enough functionality for selecting models.

1) Initialise weights randomly. Ok.

2) Training is started and here is the most interesting part:

3) Each set of weights should be checked on the forward period against user-defined criteria: for example, the recovery factor.

4) As soon as the forward result improves, auto-save the model and display its profit graph on the screen. All saved graphs should be visible and scrollable, so the user can follow the progress in parallel.

5) At the next iteration, if the model has deteriorated (overfitting has started, or it is simply an unsuccessful set of weights), go back to the previous model and change the learning rate. Continue as long as the training results improve.

6) If the training results do not improve, or the user does not like the direction the training is taking, interrupt the training and start it again with a new initialisation of the weights.

7) We extend this basic process with additional functionality on demand: for example, auto-adding neurons or layers at each new training run, changing activation functions, loss functions, optimisation functions (Adam to BFG+momentum and others), and so on.

That is, we create a maximally useful search engine for finding a working model (a sketch of such a loop is given after this list).

And we don't do it all by hand.
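
A rough Python/PyTorch sketch of the checkpoint-and-rollback loop described in points 1-7 (the toy data, the model, the metric and the thresholds are all assumptions, not the author's tool; the forward metric here is simply negative MSE rather than a real trading metric):

```python
import copy
import torch
import torch.nn as nn

# Toy stand-ins for the real data: random tensors instead of chart features.
X_train, y_train = torch.randn(512, 10), torch.randn(512, 1)
X_forward, y_forward = torch.randn(256, 10), torch.randn(256, 1)

def forward_metric(model):
    """User-defined forward criterion; here negative MSE as a stand-in
    for something like the recovery factor."""
    with torch.no_grad():
        return -nn.functional.mse_loss(model(X_forward), y_forward).item()

def search(restarts=3, epochs=50):
    best_state, best_score = None, float("-inf")
    for _ in range(restarts):                                   # (6) fresh random initialisation
        model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))  # (1)
        lr = 1e-2
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        saved = copy.deepcopy(model.state_dict())
        saved_score = forward_metric(model)
        for _ in range(epochs):                                 # (2) training loop
            opt.zero_grad()
            nn.functional.mse_loss(model(X_train), y_train).backward()
            opt.step()
            score = forward_metric(model)                       # (3) check on the forward
            if score > saved_score:                             # (4) improved: autosave
                saved, saved_score = copy.deepcopy(model.state_dict()), score
            else:                                               # (5) worse: roll back, lower LR
                model.load_state_dict(saved)
                lr *= 0.5
                opt = torch.optim.Adam(model.parameters(), lr=lr)
        if saved_score > best_score:                            # keep the best run overall
            best_state, best_score = saved, saved_score
    return best_state, best_score
```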



The weight space is like a map with an infinite number of white dots. With each new weight, the number of possible combinations of multiplication results over the equally infinite combinations of input numbers exceeds the number of stars in the sky.

Each initialisation is a point on this map, a point of infinitesimally small size, and training is a trek to neighbouring points. If a given cluster of points is far from the "grail" point, it is better to discard it immediately and start training again. A new initialisation of the weights is another random point on this huge map, and a new training run is a new probing of the space around it.

An ideal formula (method) for finding the required set of weights for the NS (by somehow analysing this map) is something from the science-fiction section. But it is food for thought: how to solve, or how to bypass, these quantitative limitations.

And constant reinitialisation with a new start of training is at least the simplest and most accessible method to approach the best neural network setup.