A year later, after reading an amazing contribution by Chris70, I'm back and have just printed the whole thread. I'm particularly keen on neural networks. Such a discussion would be of help.
Please check this article too for more on this and other topics. Feel free to join the project and make it a success.
Okay... it's probably now really time to get back to neural networks... (yeah, I know, the off-topic intermezzo was mostly my own fault...):
In the meantime I finished the code of a multicurrency EA version for neural network price forecasting, so that it's now ready for training, validation and testing.
As I mentioned, the predictions in previous attempts (single currency) were not very reliable on average for any random moment in time, so I wanted to concentrate more on finding high-probability setups only.
I chose a classifier model this time instead of forecasting exact prices, because the method that I'll use closely follows an approach suggested by Dr. Marcos López de Prado ("Quant of the Year" 2019, author of the book "Advances in Financial Machine Learning").
The network has 3 outputs that are labeled based on the "triple barrier method":
- output 1: an upper price level (n pips, fixed distance) is hit first (=upper "horizontal barrier")
- output 2: a lower price level (n pips, fixed distance) is hit first (=lower "horizontal barrier")
- output 3: no price level is hit within a max. number of minutes (="vertical barrier")
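As a minimal sketch of how such labels could be produced (Python rather than MQL; the function name, the use of a single price series instead of separate highs/lows, and the parameter names are assumptions for illustration, not the EA's actual code):

```python
def triple_barrier_label(prices, start, barrier, max_steps):
    """Label the bar at index `start` by whichever barrier is touched first.

    prices    : list of prices (assumed here: one value per bar, e.g. closes)
    barrier   : fixed distance of both horizontal barriers, in price units
    max_steps : number of bars until the vertical barrier
    Returns 0 (upper level hit first), 1 (lower level hit first)
    or 2 (no level hit within max_steps bars).
    """
    entry = prices[start]
    upper = entry + barrier
    lower = entry - barrier
    last = min(start + max_steps, len(prices) - 1)
    for i in range(start + 1, last + 1):
        if prices[i] >= upper:
            return 0
        if prices[i] <= lower:
            return 1
    return 2


# Example: 10 pips = 0.0010 on a 4-decimal quote
closes = [1.1000, 1.1004, 1.0998, 1.1012, 1.0990]
label = triple_barrier_label(closes, start=0, barrier=0.0010, max_steps=4)
```

With real data one would probably check the bar's high against the upper level and its low against the lower level, and decide how to treat a bar that touches both.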
The activation function of the output layer is "softmax", which has the nice quality that all three outputs together add up to 1.0, so that the individual outputs can be seen as probabilities within a distribution.
Because it is a classifier this time, the loss function that we want to minimize during training isn't MSE loss (mean squared error), but cross-entropy loss.
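To make the two terms concrete, here is a small Python sketch of softmax and of the cross-entropy loss for one sample (function names are illustrative; a real network would compute this over batches):

```python
import math


def softmax(logits):
    """Turn raw output values into probabilities that sum to 1.0."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss for one sample: negative log of the probability of the correct class."""
    return -math.log(probs[true_class])


# Three raw outputs: upper barrier, lower barrier, vertical barrier
probs = softmax([2.0, 1.0, 0.1])
loss_if_upper_was_right = cross_entropy(probs, 0)
loss_if_vertical_was_right = cross_entropy(probs, 2)
```

The loss is small when the network put a high probability on the class that actually occurred and grows quickly when it did not.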
The network has a normal MLP architecture for now, but I might also give it a try and compare with LSTM cells.
As I mentioned earlier, MLPs are good for pattern recognition, LSTMs and related types of recurrent networks are better for long time dependencies. So both have advantages. A multilayered fully connected LSTM network combines the advantages of both and this is also the model that I had initially used with the autoencoder. Without the autoencoder (which gets a little complicated with multi-currency trading), computation performance will suffer, which is why I start with a normal MLP; this doesn't mean that it can't have many neurons/layers, but not having to backpropagate on top of that through lots of time-steps is gonna make the training part a lot faster. We'll see.
Nevertheless, we're not done yet with a standard MLP network. Further following the suggestion of Dr. M.Lopez De Prado, I'm taking the outputs and the correct labels and thereby obtain true positives / true negatives / false positives / false negatives and can make a second (!) MLP network learn (after training of the main network) with this "meta labeling", so that I can calculate things like accuracy (validity), precision (reliability), recall and F-Score. The objective is to use these values for selection of high probability setups only.
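The metrics mentioned above follow directly from the four confusion counts. A minimal Python sketch, assuming the question has been reduced to a binary one (for example "did the network signal 'upper barrier first'?" versus "was the upper barrier really hit first?"); the function name and data are made up for illustration:

```python
def binary_metrics(predicted, actual):
    """Compute accuracy, precision, recall and F1 from two lists of 0/1 values."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


signals = [1, 1, 1, 0, 0, 1, 0, 0]   # what the primary network predicted
outcome = [1, 0, 1, 0, 1, 1, 0, 0]   # what actually happened
m = binary_metrics(signals, outcome)
```

For selecting high-probability setups, precision is the number to watch: it says how often a signal that was actually taken turned out to be right.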
For the inputs of the primary/main network, I'm using n periods of High/Low/Close prices (1.) of the main chart symbol and (2.) of additional symbols that are conveniently passed in as an input variable (=comma separated list). Instead of pure prices, I take log returns as a differencing method. The plan is to use at least all major pairs (EURUSD, USDJPY, GBPUSD, USDCAD, USDCHF) plus AUDUSD, as long as MT5 can handle this many price histories simultaneously... It is the job of the neural network to find correlations among the currency pairs by itself and thereby derive possible consequences for the next upcoming prices of the main chart symbol.
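For reference, the log return of a bar is ln(price_t / price_(t-1)). A short Python sketch (illustrative names, not the EA's code):

```python
import math


def log_returns(prices):
    """Return ln(p[t] / p[t-1]) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


closes = [1.1000, 1.1011, 1.1005, 1.1020]
returns = log_returns(closes)  # one value fewer than there are prices
```

Log returns are additive over time (their sum equals the log return over the whole span) and are comparable across symbols with very different price levels, such as EURUSD and USDJPY.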
I also added the month, the day of the week and the time as input variables.
For those of you who think about developing neural networks by themselves (may it be MQL or e.g. Python..), let's think for a moment about how to best feed these variables into a network (and if you don't know it yet, maybe I can show a neat trick):
Let's take the hour of the day as an example: 23 is followed by 0... does this really make sense? The minutes 23:59 and 0:00 are direct neighbors, but their values are at the highest possible distance. We have no continuity and the network will have some issues trying to make something meaningful out of this huge step. So what can we do?
One very common method (in fact the standard method for this purpose) is called "one-hot" encoding, which means we don't take just one input for the hour of the day, but 24 (i.e. 0-23). If for example the hour is 15:xx, then input number 15 gets the value 1, and the other 23 of these inputs get the value 0. This method isn't rare at all: it is the usual way to feed categorical values (anything that is a class rather than a quantity) into a network, and also the usual way to encode class labels, e.g. in image recognition.
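In Python, one-hot encoding of the hour could be sketched like this (the helper name is an assumption for illustration):

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


hour_inputs = one_hot(15, 24)  # 15:xx -> input number 15 is 1, the rest are 0
```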
If we only encode the hour, we need those 24 inputs. If we also encode the minute of the hour we have 60 more. Then 12 for the month... All this is absolutely feasible, but there might be a more elegant way...:
Think of the hour hand of a clock (and let's say this clock has a 24h watchface instead of 12h): instead of taking the value of the hour, we might instead take the angle of the hour hand, then we get a 360 degrees circle. Still, between 359° and 0°, there is this huge gap that we want to avoid. So how do we achieve continuity? The magic trick: the sine and cosine wave function! They are continuous, no gaps between neighbor values. If we put this into code, the declaration of the inputs can then look something like this:
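A minimal sketch of the idea, written in Python rather than MQL and using hypothetical names; it assumes the time of day is converted to seconds since midnight and mapped onto one full turn of a 24h dial:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def encode_time_of_day(hour, minute, second):
    """Map a time of day to a (sin, cos) pair on a 24h circle."""
    seconds = hour * 3600 + minute * 60 + second
    angle = 2 * math.pi * seconds / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)


# 23:59:59 and 00:00:00 now end up right next to each other
before_midnight = encode_time_of_day(23, 59, 59)
midnight = encode_time_of_day(0, 0, 0)
gap = math.hypot(before_midnight[0] - midnight[0],
                 before_midnight[1] - midnight[1])
```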
et voilà.. we just used only 2 inputs for continuous time information that is precise down to the second, instead of 24+60+60=144 inputs for the one-hot encoding method;
sin(2*M_PI*mon/12) and cos(2*M_PI*mon/12) do the same for the month; this method works for all kinds of such "cyclic" variables.
Okay... now let's see if the multicurrency network version is training without any surprises and I'll come back later with some results...
Neural networks made easy (Part 6): Experimenting with the neural network learning rate - MT5
Neural networks made easy (Part 7): Adaptive optimization methods - MT5
Neural networks made easy (Part 8): Attention mechanisms - MT5
Neural networks made easy (Part 11): A take on GPT - MT5
In June 2018, OpenAI presented the GPT neural network model, which immediately showed the best results in a number of language tests. GPT-2 appeared in 2019, and GPT-3 was presented in May 2020. These models demonstrated the ability of the neural network to generate related text. Additional experiments concerned the ability to generate music and images. The main disadvantage of such models is connected with the computing resources they involve. It took a month to train the first GPT on a machine with 8 GPUs. This disadvantage can be partially compensated by the possibility of using pre-trained models to solve new problems. But considerable resources are required to maintain the model functioning considering its size.
Machine learning in Grid and Martingale trading systems. Would you bet on it? - MT5
We have been working hard studying various approaches to using machine learning aimed at finding patterns in the forex market. You already know how to train models and implement them. But there are a large number of approaches to trading, almost every one of which can be improved by applying modern machine learning algorithms. One of the most popular algorithms is the grid and/or martingale. Before writing this article, I did a little exploratory analysis, searching for the relevant information on the Internet. Surprisingly, this approach has little to no coverage in the global network. I had a little survey among the community members regarding the prospects of such a solution, and the majority answered that they did not even know how to approach this topic, but the idea itself sounded interesting. Although, the idea itself seems quite simple.
Let us conduct a series of experiments with two purposes. First, we will try to prove that this is not as difficult as it might seem at first glance. Second, we will try to find out if this approach is applicable and effective.