Taking Neural Networks to the next level

 

Absolute values of MSE, MAE and RMSE are meaningless unless they refer to exactly the same kind of outputs. Only relative changes are relevant, e.g. when results from different input data are compared (training / validation / testing) or when changes to the model (like a different number of layers, a different learning rate, or different activation functions) are compared to each other. The absolute values also depend on the label scaling method in use, on other label data manipulations like log differencing, and on the choice of loss function (probably MSE here).
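For reference (my addition, not part of the post), the standard definitions make this scale dependence obvious, since all three metrics are computed directly from the raw label values y and predictions ŷ:

```
\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad
\mathrm{RMSE}=\sqrt{\mathrm{MSE}}
```

Rescaling the labels by a factor k scales MAE and RMSE by k and MSE by k², so absolute values are only comparable on identical outputs.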

I didn't write anything here over the past few days because I was playing around a lot, trying to find any useful results. This is not a technical (/coding) issue; it just seems that there really isn't that much to predict in forex. However, it isn't complete randomness, either.

Predictions often look something like this:

[image: ae_lstm_predictions]

The data points in green are price predictions one timestep into the future (=autoencoder-decoded LSTM outputs), each made at the time of the preceding vertical line (as can be seen on the far right, where the predictions are already made and the actual prices are still unknown). Distinct parallel lines appear because the model quickly learns the typical distance of the high and low prices from the open and close. Even if a directional bias does exist, it is also wrong VERY often. We all know that smaller time steps are noisier; on the other hand, it's hard to make predictions far into the future. The predictions I could make were only correct in 51-57% of the cases (on unseen data that were not part of the training set, with a "correct" prediction defined as a correctly predicted direction of the close versus the open for a given time period). It is probably no wonder that I got stable ~57% figures only for predictions 1 minute ahead.
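As a hedged illustration of the metric described above (function name and signature are my own, not code from the thread), the directional hit rate could be measured like this in MQL5:

```
// Sketch: a prediction counts as "correct" if it gets the direction of
// close vs. open right for the period; returns the hit rate in percent.
double DirectionalHitRate(const double &open[], const double &close[],
                          const double &pred_close[], const int n)
{
   int hits = 0, counted = 0;
   for(int i = 0; i < n; i++)
   {
      double actual    = close[i]      - open[i];
      double predicted = pred_close[i] - open[i];
      if(actual == 0.0 || predicted == 0.0) continue;   // skip flat periods
      counted++;
      if((actual > 0) == (predicted > 0)) hits++;       // same direction?
   }
   return (counted > 0) ? 100.0 * hits / counted : 0.0;
}
```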

@John OHagan: what were your results for the percentage of correct directional prediction?

So it's not entirely random, but how useful is it? To conclude: I couldn't find any model that would be useful for retail traders, because the edge is too small to overcome spreads and commissions. I also compared against the traditional LSTM model (both stacked and single-layer versions) and came to similar results (also in the 51-57% range, depending on the predicted period), although I could confirm that the autoencoder model made a huge performance difference (=computation-wise). I could also confirm that LSTMs were superior to just feeding the timestep data into a simple MLP network.

Despite these disappointing results, the chapter of neural networks isn't closed for me, but I consider their usefulness for forex predictions very limited. I also believe that this is no fault of neural networks; I guess that's just what forex is like.

A positive finding of this experiment, on the other hand, is the great power of autoencoders as an alternative to moving averages: autoencoders can return denoised "typical" current prices just like MAs, but with absolutely NO LAG. I will probably make use of this quality and forget about the prediction part.

----------

I also compared my neural network results to the predictive quality of another model that I worked with quite a lot in the past, and which in retrospect might be a better alternative: polynomial regression, which is a special case of multiple linear regression and can be solved via an iterative approach or via matrix operations. The matrix approach is based on a simple formula: we are looking for the coefficient vector at the cost function minimum, which equals (X^T * X)^(-1) * X^T * y. The matrix transpose is easy, but the tricky part is the ^(-1): we need a matrix inverse, which is easy in languages such as Python but a little tricky to realize in MQL5 (if anybody knows a good solution...). The iterative approach can be written in just a few lines of MQL5 code. Just like neural networks, it demands too much performance to be used with every tick (at least in backtesting), but it is still a lot quicker than neural networks; in practice, think of ~30 iterations. Polynomial regression is "wrong" just as often as neural networks are, but it has the advantage that the "predictions" are more stable, i.e. the directional bias usually doesn't change with every second candle. So even if the estimate for the next individual candle is wrong very often, polynomial regression is a great tool for catching the predominant trend. Plus: if we don't need to change our opinion with "every second bar", this translates into much lower commissions.
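Since the question of the matrix inverse in MQL5 came up, here is a hedged sketch (my own illustration, not code from this thread) of how to avoid the explicit inverse altogether: solve the normal equations (X^T X) c = X^T y directly with Gauss-Jordan elimination. All names (PolyRegression etc.) are mine, and x is normalized to [0..1] for numerical stability:

```
// Sketch: least-squares polynomial fit y ~ c[0] + c[1]*x + ... + c[deg]*x^deg
// via the normal equations, solved by Gauss-Jordan elimination with partial
// pivoting - no explicit matrix inverse required.
bool PolyRegression(const double &y[], const int n, const int deg, double &c[])
{
   int m = deg + 1;
   double A[]; ArrayResize(A, m * m); ArrayInitialize(A, 0.0);   // A = X^T * X
   double b[]; ArrayResize(b, m);     ArrayInitialize(b, 0.0);   // b = X^T * y

   for(int i = 0; i < n; i++)
   {
      double x = (n > 1) ? (double)i / (n - 1) : 0.0;  // normalize x to [0..1]
      double xr = 1.0;                                 // xr = x^row
      for(int r = 0; r < m; r++)
      {
         b[r] += xr * y[i];
         double xrc = xr;                              // xrc = x^(row+col)
         for(int col = 0; col < m; col++) { A[r * m + col] += xrc; xrc *= x; }
         xr *= x;
      }
   }
   // Gauss-Jordan elimination on the augmented system [A | b]
   for(int k = 0; k < m; k++)
   {
      int piv = k;                                     // partial pivoting
      for(int r = k + 1; r < m; r++)
         if(MathAbs(A[r * m + k]) > MathAbs(A[piv * m + k])) piv = r;
      if(MathAbs(A[piv * m + k]) < 1e-12) return false; // singular system
      if(piv != k)
      {
         for(int col = 0; col < m; col++)
         { double tmp = A[k * m + col]; A[k * m + col] = A[piv * m + col]; A[piv * m + col] = tmp; }
         double tb = b[k]; b[k] = b[piv]; b[piv] = tb;
      }
      for(int r = 0; r < m; r++)                       // eliminate column k
      {
         if(r == k) continue;
         double f = A[r * m + k] / A[k * m + k];
         for(int col = k; col < m; col++) A[r * m + col] -= f * A[k * m + col];
         b[r] -= f * b[k];
      }
   }
   ArrayResize(c, m);
   for(int r = 0; r < m; r++) c[r] = b[r] / A[r * m + r]; // coefficients
   return true;
}
```

Solving the system is cheaper and numerically safer than building the inverse; for a 5th-degree fit the matrix is only 6x6, so this runs in negligible time per bar.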

Of course, polynomial regression can be considered "repainting", because with every newly incoming data point the regression curve changes. I therefore like to work with the "trace line" (I like to call it "PRT"), which doesn't repaint. Applied to the same timeframe and number of periods, the lag of this trace line is lower than the lag of a Hull Moving Average, which is usually considered to have very little lag (I usually work with 5th-degree polynomial regression). On the other hand, polynomial regression doesn't handle "corners" (price spikes) very well and tends to exaggerate there. What at first glance seems like a problem can be extremely useful: when the price drifts away significantly from the polynomial regression (trace) line, this often marks a high or low and can therefore help with jumping onto the next move earlier (or improve the timing for getting out of a trade).
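For illustration only, here is a minimal sketch of how such a non-repainting trace line could be built, under my assumption of the construction (the thread doesn't spell it out): for every bar, fit the polynomial over the trailing window with the PolyRegression sketch above and keep only the fitted value at that bar's own, newest position:

```
// Assumption: the "trace line" records, for each bar, the endpoint of the
// polynomial regression fitted over the trailing window - so past values
// never change (no repainting), unlike the full regression curve.
void PolyTraceLine(const double &close[], const int total, const int window,
                   const int deg, double &trace[])
{
   ArrayResize(trace, total);
   double seg[], c[];
   ArrayResize(seg, window);
   for(int bar = window - 1; bar < total; bar++)
   {
      for(int i = 0; i < window; i++)                 // copy trailing window
         seg[i] = close[bar - window + 1 + i];
      if(PolyRegression(seg, window, deg, c))
      {
         double yhat = 0.0;                           // evaluate fit at x = 1,
         for(int d = 0; d <= deg; d++) yhat += c[d];  // i.e. at the newest bar
         trace[bar] = yhat;
      }
      else trace[bar] = close[bar];                   // fall back on raw price
   }
}
```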

By the way: I think moving averages can be seen as "polynomial regression of degree 0". For a given single point in time, an MA is just a single value and lacks any directional information. Adding this information leads us to trend lines (like f(x)=a*x+b), which can be described as first-degree regression lines (usually obtained via the least squares method), or as a regression channel if you add a corridor to it, e.g. of n standard deviations. But trend lines don't go around corners at all. This is where higher-degree regression lines come into play.
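A quick sanity check of the degree-0 claim (my addition): fitting a constant c to the last n closes by least squares gives

```
\frac{d}{dc}\sum_{i=1}^{n}(y_i - c)^2 \;=\; -2\sum_{i=1}^{n}(y_i - c) \;=\; 0
\quad\Longrightarrow\quad
c \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i ,
```

which is exactly the simple moving average of those n bars, so the SMA really is the degree-0 special case.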

I think a direct comparison between trading based on price deviations from (1) autoencoded prices, (2) polynomial regression (/the polynomial regression trace line) and (3) a standard moving average, in an otherwise identical expert advisor, might be an interesting project for the future.

@John O'Hagen: sorry I haven't answered your PM yet. And yes, evolutionary algos are definitely cool, but they're used in different scenarios, because they are a method of reinforcement learning.

 
Chris70:

Absolute values of MSE, MAE and RMSE are meaningless unless they refer to exactly the same kind of outputs. Only relative changes are relevant, e.g. when results from different input data are compared (training / validation / testing) or when changes to the model are compared to each other. [...]

Yes, yes, yes, and yes! Here in trading, it's the pattern recognition that is touchy, not the NN itself. I'm quite sure it could show the revolutionary performance you expected at first, once applied to a domain other than pattern recognition.

 
Icham Aidibe:

Yes, yes, yes, and yes! Here in trading, it's the pattern recognition that is touchy, not the NN itself. I'm quite sure it could show the revolutionary performance you expected at first, once applied to a domain other than pattern recognition.

I don't totally agree. What I expected was a significant performance boost by using the autoencoder trick. This part still holds true 100%.

I didn't expect anything "revolutionary" with regard to the predictive capacity. This part was completely unclear at the beginning, just as I said I'd be totally okay if the system simply proved a random walk (which it didn't).

As for the pattern recognition part: NNs are by nature absolutely great(!) at pattern recognition (just think of what's possible with convolutional neural networks). The problem is the market itself. We usually read too much into "patterns", and neural networks are just a neat way to prove this. When we look at a chart, our brain can fool us with nice picturesque lies; NNs are an analytical/objective way to test whether patterns hold up to their promise. The autoencoder part is nothing but a pattern recognition machine. The "universal function solver" within the multilayered structure then just shows that we shouldn't overinterpret the significance of those patterns for what happens at the following time step, and the memory that is added by using LSTM cells instead of simple neurons adds the time dimension and shows the role of the context in which those patterns appear.

In a way, those networks are no better and no worse for the task than the billions of neurons in the human brain (they are only more objective and can be tested systematically more easily). We can visually analyze chart patterns, make a "best guess" for what happens next, and declare ourselves happy if we are correct a little more than 50% of the time. It would be bold to expect much more from such a synthetic little "brain", and I don't believe anything that deserves the term "revolutionary" is possible when it comes to forex forecasting. So I think the justification of such networks in forex comes from automation, not from being superior.

@Icham: just curious... what other domains of application for neural networks of whatever kind (MLP, RNN, CNN...) in trading can you think of? News trading, sentiment analysis...? What else?

 

Thanks for the interesting topic; it will be interesting to see what you do next.

Maybe you can better apply machine learning to more skewed assets such as VXX, the DJ, or stocks. That would be more interesting than forex.


Also, I think (but am not sure) that using whole bars from the past n bars would be better information for ML than tick data, because too much irrelevant information would be too much even for a universal function approximator; we still need to simplify the problem for the robot so that it can function better (garbage in, garbage out).


Other things I could imagine feeding into ML are wave frequencies (I'm looking at the old Fourier transform indicators at the moment),

and slopes of MAs, etc.

 

Brian, thanks for the answer. I tried both whole-bar OHLC and tick data. It doesn't change the outcome.

Fourier transformation is a different approach to getting simplified / denoised information out of a chart pattern, but the results should look almost identical to those of an autoencoder.

The reason why I trade only forex right now is just the better conditions / trading costs for forex with the broker that I currently use. You are probably right, though! In the past, during manual trading (with another broker), I also found that trading stocks is much easier. Oil also seems a lot easier (if you can manage the volatility). What's easier for the discretionary trader might be easier for the machine as well...

 

Yes, I think the underlying asset is the easiest starting point in terms of expected value. If it is a zero-sum game, then we have zero expected value minus fees. If it is a skewed asset (perhaps due to profits or losses being earned), then perhaps with a system in place we can extract the positive expected value. I found this with a candlestick EA on the Dow Jones: I could only get it to win on the buy side, never on the sell side, no matter what. Also, in Excel analysis you can see far more positive days than down days, so for what that's worth, the probability is to the upside, as can be seen on long-term charts.


I could not get consistent results in forex from discretionary trading or from EAs. Are you using any effective method for discretionary trading, or how do you decide what to do? On eToro there is a trader with mostly green months, all at about 1-1.5% profit, just trading AUDUSD, for years, but with only just over 100 trades, so it must somehow be possible.


Besides this, I still have plenty more things to discuss, but for now I would also like to ask: what are these speech-recognition-type systems I hear about that Renaissance uses/used? Is that also ML? I don't know anything about that or about Markov chains.

 

With regard to the "zero-sum game": the observation that the winning probability is very close to 50% on average for any point in time doesn't mean that there is an even distribution, or that there can't be situations with a far bigger winning probability.

Looking exclusively for high-probability situations might actually be a better use case for neural networks. Here I looked only at the general predictive capacity at any random point in time.

Although the subject of this thread may suggest otherwise, I believe there's more money to be made by just reacting to what actually happens than by predicting what might possibly happen. That doesn't mean we can't try both simultaneously, though. Most of my profitable EAs are based on reactions to some kind of statistical outlier. Breakout or reversal trades based on deviation from the just-mentioned polynomial regression trace lines are just one example. Yesterday was such an example, with perfect opportunities in the GBP pairs, where I made a nice profit on GBPJPY.

I did discretionary trading successfully for many years, almost exclusively with knock-out certificates on EURUSD, Oil, DAX and Gold, but nowadays my trading is 100% automated via VPS. It starts with being too lazy to calculate position sizing manually ;-). Psychology is completely out of the equation, and I can trade several pairs simultaneously, 24/5. All my EAs have an automated safety killswitch for max. drawdown protection (meaning all positions are closed and new positions are forbidden when a defined DD level is reached). I can only see benefits and can't imagine going back to discretionary trading.

Regarding speech recognition: I'm just an ordinary guy who reads articles about machine learning on the internet, so I really am no machine learning expert, and I try to learn mostly what I need for trading, which is why I'm almost clueless about speech recognition. What I know is that LSTMs are still usually the major component, and recently "attention" learning became a big deal. However, the benefit of attention comes from dealing better with varying sequence lengths, like a different number of words in sentences, and we don't have this problem in trading.

About Markov chains: please don't confuse Markov chains, the hidden Markov model, and the Markov decision process. The hidden Markov model was used for a long time as the state of the art for speech recognition before it was replaced for the most part by recurrent neural networks like LSTM and GRU.

Here is a fantastic explanation of the hidden Markov model and Bayes' theorem: https://youtu.be/kqSzLo9fenk

I have no practical experience with the hidden Markov Model in trading, but it is for sure something worth looking into.

I only have some experience with Q-learning (which is derived from the Markov decision process) for decision making in expert advisors.
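For readers who haven't seen it, here is a minimal generic sketch of the tabular Q-learning update (my own illustration, not code from my EAs; the state discretization and reward design are the hard, strategy-specific parts and are only assumed here):

```
#define N_STATES  16   // assumed: number of discretized market states
#define N_ACTIONS 3    // assumed: e.g. buy / sell / stay flat

double Q[N_STATES][N_ACTIONS];  // Q-value table (globals start at zero)

// One tabular Q-learning step after observing (s, a, reward, s_next):
// Q(s,a) += alpha * (reward + gamma * max_a' Q(s_next,a') - Q(s,a))
void QUpdate(const int s, const int a, const double reward, const int s_next,
             const double alpha = 0.1, const double gamma = 0.95)
{
   double best_next = Q[s_next][0];              // best value of next state
   for(int i = 1; i < N_ACTIONS; i++)
      if(Q[s_next][i] > best_next) best_next = Q[s_next][i];
   Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a]);
}
```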

 

Thanks for the insights, I will look at those links. I also agree with the outlier idea you mentioned. Can I ask how frequently your running system trades, and, as a goal, what % return do you aim to achieve for, say, a month or a year, and with what drawdown?


I also think you should try the high-probability situations for your ML, as you mentioned.


Are there existing indicators in the Codebase for the polynomial regression trace lines you mentioned?

 

This "what's your annual target?" question drives me crazy ;-) I think it's completely pointless. We all want to make as much money as we can, obviously, and with minimum risk, but what's realistic is not our decision.

Maybe I have a good guess whether it's going to rain today, and I can decide if I'll go outside and whether I'll bring an umbrella to reduce the chance of getting wet, but I won't control the weather. Or at least I don't think the weather god will be impressed if my personal "target" is "plenty of sun today".

I can choose my trading environment, my position size, my diversification and a well-tested strategy. I can limit the risk and make sure that I have a historically positive expectancy, and the result then just is what happens. Going by those rules, it's almost impossible to have negative years. But how positive? I don't think it makes any sense to choose a "target" for something that cannot be controlled. Sometimes the trade just isn't there. I don't care. My only target is limited risk. Then I just take what the market has to offer.

Apart from that, for diversification it is important not to trade just one asset or currency pair, and not just one system, so annual results also depend on the number of expert advisors that are running in parallel and on their online time / market exposure.

My personal red line for the drawdown of the entire account is 20%. Since 2007 I have never reached this level, but I think it's important to define a limit. For an expert advisor to qualify for real trading, I want to see at least 5 years of backtesting with a drawdown never beyond 15% (with roughly 1% risk per trade; the actual formula is a little more complicated), at least 100 trades and a recovery factor >4. Needless to say, many expert advisors are dumped because they just don't qualify. No problem - on to another strategy. The profit factor is less important and depends on the number of trades during the backtest. A profit factor as low as 1.2 (after commissions and swap) can be totally okay if it was achieved over, say, 2000 trades, while I would ignore such an outcome if it was over just 150 trades. The EAs that reach profit factors in the ~2.0 area are usually the ones that trade only about once a week or less. Anything beyond 2.0 is, in my experience, only possible with serious overfitting (or high-risk martingale or grid systems) and not realistic for real trading.
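For readers unfamiliar with these tester metrics, the usual definitions (my addition; to my understanding this matches what MetaTrader's strategy tester reports) are:

```
\text{profit factor}=\frac{\text{gross profit}}{\lvert\text{gross loss}\rvert},\qquad
\text{recovery factor}=\frac{\text{net profit}}{\text{maximal drawdown}}
```

A recovery factor >4 thus means the strategy earned more than four times its worst historical drawdown.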

Other traders may have different limits. I think this is very personal. How risk-averse am I? What's the account size? Is it money that I need? Is it retirement money? Is it just "venture capital" that can be lost without any serious consequence...?

---

Sources for polynomial regression in the Codebase have just been quoted in a parallel thread. Working with the "trace line" on top of standard polynomial regression is my own idea (/at least I haven't seen it elsewhere yet).

My computer (/MetaTrader) is occupied with another backtest right now, so I'm sorry I can't create screenshots of what I mean at the moment.

 
Chris70:


A positive finding of this experiment, on the other hand, is the great power of autoencoders as an alternative to moving averages: autoencoders can return denoised "typical" current prices just like MAs, but with absolutely NO LAG. I will probably make use of this quality and forget about the prediction part.

@Chris70 

Thanks for this, this is a very useful piece of information. Have you tried other methods like Gaussian mixtures or variational autoencoders?

This is an interesting article about the Continuous Ranked Probability Score (CRPS) that I came across in the Russian part of the mql5 forum: https://towardsdatascience.com/a-short-tutorial-on-fuzzy-time-series-part-iii-69445dff83fb