# Taking Neural Networks to the next level - page 14

412

Thanks for your code analysis. I seriously wasn't expecting this. Anyway ...

1. Why just one hidden layer? Honest answer? I don't know how to calculate back-propagation when there is more than 1 hidden layer :/ I wasn't kidding when I wrote that my knowledge about ANNs is 30 years old. I actually had to search the net to refresh my memory and the only useful examples that had source code attached, were using one hidden layer. I did find more examples that come with a lot of lengthy explanations and math formulas, but even though I can read most source code easily, it's not that easy for me to wrap my head around the math. And nowadays, it's hard to find any code that doesn't depend on some external libraries, which makes most of the source code examples on the net completely useless, unless you are using the exact same programming language and tools as the original author. For example, instead of writing a few lines of code to show the actual math function, the author would #include some external library where the function is implemented. I mean ... seriously?

```
double MSE=0;
for (int k=0; k<OUTPUT_NEURONS; k++)
{ MSE += 0.5*pow(FTargetLayer[k]-FOutputLayer[k],2); }
```

... gives you the same result as this ...

```
double Error=0, MSE=0;
for (int k=0; k<OUTPUT_NEURONS; k++)
{ Error = FTargetLayer[k]-FOutputLayer[k]; MSE += Error*Error; }
MSE *= 0.5;
```

... but the latter should (I think?) be less expensive when there are a lot of OUTPUT_NEURONS, because it doesn't make any function calls and the 0.5 multiplication is done only once for the entire loop. I didn't test it, though. Just another one of my "auto-optimization" habits. Loops, function calls, memory allocation/de-allocation, floating point math, error handling - expensive stuff.

3. The "if e!=-1.0" is just my paranoia of Division-by-Zero exceptions ;) Or exceptions in general. But now that you've explained what Exp does (I didn't really think about it), it's clear that it was unnecessary there.
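As a quick sanity check on the two error loops above, here is a minimal sketch in plain Python (not MQL5, so it runs without MetaTrader). The sample values and the function names `error_per_term` and `error_hoisted` are made up for illustration; only the arithmetic mirrors the two snippets. It only confirms that both variants agree, it does not measure which one is faster.

```python
# Hypothetical sample values standing in for FTargetLayer / FOutputLayer.
targets = [1.0, -1.0, 0.5, 0.0, -0.5]
outputs = [0.8, -0.6, 0.1, 0.3, -0.9]


def error_per_term(targets, outputs):
    """First variant: pow() call and 0.5 multiplication for every neuron."""
    total = 0.0
    for t, o in zip(targets, outputs):
        total += 0.5 * pow(t - o, 2)
    return total


def error_hoisted(targets, outputs):
    """Second variant: plain multiplication, 0.5 applied once after the loop."""
    total = 0.0
    for t, o in zip(targets, outputs):
        error = t - o
        total += error * error
    return 0.5 * total


a = error_per_term(targets, outputs)
b = error_hoisted(targets, outputs)
# The two results can differ in the last floating-point digit, so compare with a tolerance.
print(a, b, abs(a - b) < 1e-12)
```

Whether the second form is actually cheaper depends on the compiler, so it would still need to be timed in MQL5 itself.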

Now ... here's the same source code, but with modifications based on your suggestions. It still only has one hidden layer and uses different variables for input/hidden/target/output layers (as before), but hey! It's a step in the right direction. Right? I still think Neural Networks aren't very useful for predicting price action, but if anyone wants to experiment with it in MQL5, maybe this short example can give them a head start. It should be easy enough for anyone with some programming experience to combine that code with your activation functions to make it more useful, but I left that part out to keep it short. And ... just for the record ... I'm not using this code myself, so don't ask me for advice or yell at me if it doesn't work :P The only thing I can say is that it compiles in the MQL5 editor ;)

```
// Neural Network Parameters ...
#define DEFAULT_LEARNING_RATE 0.001
#define DEFAULT_WEIGHTS_INIT_FACTOR 0.5
#define INPUT_NEURONS  100
#define HIDDEN_NEURONS 80
#define OUTPUT_NEURONS 5

//- Neurons and their weights -
double FInputLayer[INPUT_NEURONS+1];    // the last element is used for BIAS = 1.0
double FHiddenLayer[HIDDEN_NEURONS+1];  // the last element is used for BIAS = 1.0
double FTargetLayer[OUTPUT_NEURONS];    // target output when training the network
double FOutputLayer[OUTPUT_NEURONS];    // output produced by the network

double FHiddenLayerWeights[HIDDEN_NEURONS+1][INPUT_NEURONS+1];
double FOutputLayerWeights[OUTPUT_NEURONS+1][HIDDEN_NEURONS+1];

double FLearningRate = DEFAULT_LEARNING_RATE;
double FWeightsInitFactor = DEFAULT_WEIGHTS_INIT_FACTOR;

double FTrainingError = 0;

//+------------------------------------------------------------------+
void bpInit()
{
   FTrainingError = 0;
   bpInitWeights();
}
//+------------------------------------------------------------------+
void bpInitWeights()
{
   int i, j, k;
   MathSrand(GetTickCount());
   //--- Initializes the hidden layer weights to random values in [-factor, +factor]
   for(j=0; j<=HIDDEN_NEURONS; j++)
      for(i=0; i<=INPUT_NEURONS; i++)
         FHiddenLayerWeights[j][i] = (MathRand()/32767.0 - 0.5) * (FWeightsInitFactor * 2.0);
   //--- Initializes the output layer weights
   for(k=0; k<=OUTPUT_NEURONS; k++)
      for(j=0; j<=HIDDEN_NEURONS; j++)
         FOutputLayerWeights[k][j] = (MathRand()/32767.0 - 0.5) * (FWeightsInitFactor * 2.0);
}
//+------------------------------------------------------------------+
void bpApply()
{
   int i, j, k;
   FInputLayer[INPUT_NEURONS] = 1.0;    // input layer's bias
   FHiddenLayer[HIDDEN_NEURONS] = 1.0;  // hidden layer's bias
   //--- Feedforwards from INPUT to HIDDEN layer
   for(j=0; j<HIDDEN_NEURONS; j++)
   {
      FHiddenLayer[j] = 0;
      for(i=0; i<=INPUT_NEURONS; i++)
         FHiddenLayer[j] += FInputLayer[i] * FHiddenLayerWeights[j][i];
      FHiddenLayer[j] = BipolarSigmoid(FHiddenLayer[j]);
   }
   //--- Feedforwards from HIDDEN to OUTPUT layer
   for(k=0; k<OUTPUT_NEURONS; k++)
   {
      FOutputLayer[k] = 0;
      for(j=0; j<=HIDDEN_NEURONS; j++)
         FOutputLayer[k] += FHiddenLayer[j] * FOutputLayerWeights[k][j];
      FOutputLayer[k] = BipolarSigmoid(FOutputLayer[k]);
   }
}
//+------------------------------------------------------------------+
double HiddenErrors[HIDDEN_NEURONS+1];
double HiddenWeightsCorrection[HIDDEN_NEURONS+1][INPUT_NEURONS+1];
double OutputErrors[OUTPUT_NEURONS];
double OutputWeightsCorrection[OUTPUT_NEURONS][HIDDEN_NEURONS+1];
//+------------------------------------------------------------------+
double bpTrain()
{
   int i, j, k;
   double Error;

   //--- Feedforward phase
   bpApply();

   //--- Backpropagation phase
   //- Computes OUTPUT layer weights error information
   for(k=0; k<OUTPUT_NEURONS; k++)
   {
      Error = FTargetLayer[k] - FOutputLayer[k];
      FTrainingError += Error * Error;  // accumulates across calls until bpInit() resets it
      OutputErrors[k] = Error * BipolarSigmoidDerivation(FOutputLayer[k]);
      for(j=0; j<=HIDDEN_NEURONS; j++)
         OutputWeightsCorrection[k][j] = FLearningRate * OutputErrors[k] * FHiddenLayer[j];
   }

   //- Computes HIDDEN layer weights error information
   for(j=0; j<HIDDEN_NEURONS; j++)
   {
      HiddenErrors[j] = 0;
      for(k=0; k<OUTPUT_NEURONS; k++)
         HiddenErrors[j] += OutputErrors[k] * FOutputLayerWeights[k][j];
      // the derivative is applied once, after the sum over all output neurons is complete
      HiddenErrors[j] *= BipolarSigmoidDerivation(FHiddenLayer[j]);
      for(i=0; i<=INPUT_NEURONS; i++)
         HiddenWeightsCorrection[j][i] = FLearningRate * HiddenErrors[j] * FInputLayer[i];
   }

   //- Updates OUTPUT layer weights and bias
   for(k=0; k<OUTPUT_NEURONS; k++)
      for(j=0; j<=HIDDEN_NEURONS; j++)
         FOutputLayerWeights[k][j] += OutputWeightsCorrection[k][j];

   //- Updates HIDDEN layer weights and bias
   for(j=0; j<HIDDEN_NEURONS; j++)
      for(i=0; i<=INPUT_NEURONS; i++)
         FHiddenLayerWeights[j][i] += HiddenWeightsCorrection[j][i];

   return FTrainingError*0.5;
}
//+------------------------------------------------------------------+
double BipolarSigmoid(double x)
{
   return 2.0/(1.0+MathExp(-x))-1.0;
}
//+------------------------------------------------------------------+
// Expects the activation value Fx = BipolarSigmoid(x), not x itself
double BipolarSigmoidDerivation(double Fx)
{
   return ((1.0+Fx)*(1.0-Fx))*0.5;
}
//+------------------------------------------------------------------+
```
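For readers wondering how the same delta rule carries over to more than one hidden layer, here is a minimal sketch in plain Python with no external libraries. It is not MQL5 and not part of the code above; the names `init_network`, `forward` and `train_step`, the layer sizes and the sample values are all assumptions made for illustration. The idea it shows: each hidden layer's error is the weighted sum of the errors of the layer after it, multiplied by the derivative of its own activation, and the weights are only changed once all errors are known.

```python
import math
import random


def bipolar_sigmoid(x):
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


def bipolar_sigmoid_derivative(fx):
    # Takes the activation value f(x), not x, like BipolarSigmoidDerivation above.
    return (1.0 + fx) * (1.0 - fx) * 0.5


def init_network(sizes, init_factor=0.5, seed=1):
    """sizes = [inputs, hidden1, ..., outputs]; the last weight in each row is the bias."""
    rng = random.Random(seed)
    return [[[(rng.random() - 0.5) * 2.0 * init_factor for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(weights, inputs):
    """Returns the activations of every layer, input layer first."""
    activations = [list(inputs)]
    for layer in weights:
        prev = activations[-1] + [1.0]  # append the bias input
        activations.append([bipolar_sigmoid(sum(w * a for w, a in zip(row, prev)))
                            for row in layer])
    return activations


def train_step(weights, inputs, targets, learning_rate=0.05):
    """One online backpropagation step; returns 0.5 * squared error before the update."""
    activations = forward(weights, inputs)
    outputs = activations[-1]
    # Output layer: error times derivative of the activation.
    deltas = [(t - o) * bipolar_sigmoid_derivative(o) for t, o in zip(targets, outputs)]
    all_deltas = [deltas]
    # Hidden layers, walking backwards: weighted sum of the next layer's deltas.
    for l in range(len(weights) - 1, 0, -1):
        layer, hidden = weights[l], activations[l]
        next_deltas = deltas
        deltas = [sum(next_deltas[k] * layer[k][j] for k in range(len(layer)))
                  * bipolar_sigmoid_derivative(hidden[j])
                  for j in range(len(hidden))]
        all_deltas.insert(0, deltas)
    # Apply the corrections only after all deltas have been computed.
    for l, layer in enumerate(weights):
        prev = activations[l] + [1.0]
        for k, row in enumerate(layer):
            for j in range(len(row)):
                row[j] += learning_rate * all_deltas[l][k] * prev[j]
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical usage: 3 inputs, two hidden layers (4 and 3 neurons), 2 outputs.
weights = init_network([3, 4, 3, 2])
sample_in, sample_target = [0.2, -0.4, 0.9], [0.5, -0.5]
first_error = train_step(weights, sample_in, sample_target)
for _ in range(200):
    last_error = train_step(weights, sample_in, sample_target)
print(first_error, last_error)
```

With `sizes = [inputs, hidden, outputs]` this reduces to the single-hidden-layer case, so the loop over layers is the only real addition.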
PS. Maybe it's just me, but ... as a programmer/software-developer, I find code written in any programming language much easier to understand than explanations written in any natural language (like English - which is used by people to communicate).
544

" I still think Neural Networks aren't very useful for predicting price action, but if anyone wants to experiment with it in MQL5, maybe this short example can give them a head start."

I take it that you don't plan on going much deeper into the subject. It's still nice of you to share this example. It may be the most basic form of a neural network, but it's a good starting point for people.

1023

My previous post wasn't meant to offend anybody. Hope we're all cool ;)

The actual statement is: if you could make it work on one or two years out of sample after training on at least 10 years, it might be (at least from my point of view) worth giving it a try, because a positive edge is a positive edge.

In the best possible outcome, you exploited this model's potential and made some winning trades; in the worst case, you either lost some money and saw it perform, or launched it in a demo account and just saw it perform (so you could figure out how long it makes useful predictions, etc.). Surely this can be done on out-of-sample data too (with similar or maybe the same results).

I assumed that random walk results would look like about 33% accuracy in predictions (since I heard it in a tutorial from SentDex). However, 64% is above a coin flip. (Surely a solid statement about useful % scores depends on what you are trying to predict, or is there a definition for that?)

@Chris70

After implementing the regression approach, the model only makes predictions in one specific direction, either only long or only short, though that isn't intended. It might be caused by the labeling method:

The output neurons look like this: [+/- points in profit direction, +/- points in loss direction].

Next I'll implement it the following way: [+/- points in positive direction, +/- points in negative direction].

Did you solve it (the "direction-bias" problem, if one can call it that) in a similar way?

I'm excited to see where this thread goes.

544

all cool, I guess;

Again, training years are irrelevant, I think, as long as at least different market cycles are covered: in order to discover hidden mathematical relationships with the least amount of overfitting, it's mainly about training samples(!), not years. The lower the time frame, the more samples we get and the shorter the training history can be, and then microcycles also become more interesting than macrocycles, so there can be no satisfying answer as to which training period is optimal without looking at the time frame, too.

The same goes for the predictions in testing on unseen data: for the accuracy it's more interesting how many predictions were made than how many years passed. Imagine a network based on the monthly timeframe: in 5 years it makes only 60 predictions. That's what an M1-based algorithm does within an hour (assuming a model that does one iteration with every new bar).
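To put rough numbers on the "samples, not years" point, here is a small sketch in plain Python. It assumes a market that trades 24 hours a day, 5 days a week, 52 weeks a year, which is only an approximation; the timeframe labels follow MetaTrader's naming and the helper `bars` is made up for illustration.

```python
# Assumption: 24 hours a day, 5 days a week, 52 weeks a year of trading.
HOURS_PER_YEAR = 24 * 5 * 52

# Bars (potential training samples or predictions) per year for a few timeframes.
BARS_PER_YEAR = {
    "M1": HOURS_PER_YEAR * 60,
    "H1": HOURS_PER_YEAR,
    "D1": 5 * 52,
    "MN1": 12,
}


def bars(timeframe, years):
    """Number of bars a model would see on this timeframe over the given span."""
    return BARS_PER_YEAR[timeframe] * years


monthly_in_5_years = bars("MN1", 5)  # the monthly example from the post
m1_in_one_hour = 60                  # one bar per minute

for timeframe in BARS_PER_YEAR:
    print(timeframe, bars(timeframe, 1))
print(monthly_in_5_years, m1_in_one_hour)
```

Under these assumptions a single year of M1 data holds several hundred thousand bars, while a decade of monthly data holds only 120.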

Bayne, you should probably mention that we know each other's trading approaches a bit because of topics in the German forum and also many exchanged PMs that the public forum can't know about, which makes it harder for the public to understand what model you're referring to here. So if it's in public, I think it would be nice to explain a bit what you're doing (maybe also in a dedicated thread or forum blog post); the labeling that you describe here makes less sense to anybody who doesn't know about your other attempts (your price channel classification model etc.).

I don't understand what you mean by direction bias "problem". I regard a directional bias as the network having an opinion / a tendency (and not something wrong or an error or "problem"). We want(!) the network to be biased because it's the difference between randomness and predictive capacity. I'm aware that in science the word bias is often used for some kind of systematic error; that's not what I meant here. I mean a tendency of the individual prediction, not some systematic tendency that makes all predictions wrong by a certain amount.

In a completely random scenario, over time a network would learn to reproduce the mean. I'm assuming here that a global loss function minimum was actually found, so the training was not aborted too early, some kind of momentum measures were put into place for the learning process to hopefully protect against getting stuck in local optima, and dropout or other methods were used against overfitting. In a really random world, reproducing the mean is then nothing else but the answer with the least average error. For such a well-trained network, any deviation ("bias") from the mean must suggest an actual trend direction for the next time step (= not only of the past data, but also one time step ahead).
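The "reproducing the mean" argument can be checked numerically. The sketch below, in plain Python, assumes that "least average error" means mean squared error (for mean absolute error the best constant would be the median instead). The random targets and the candidate offsets are made up for illustration.

```python
import random

rng = random.Random(42)
# Hypothetical "purely random" targets: next-bar returns unrelated to any input.
returns = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
mean = sum(returns) / len(returns)


def mean_squared_error(constant, values):
    """Average squared error of always predicting the same constant."""
    return sum((v - constant) ** 2 for v in values) / len(values)


# Compare the mean with a few constant predictions slightly above and below it.
candidates = [mean - 0.5, mean - 0.1, mean, mean + 0.1, mean + 0.5]
errors = {c: mean_squared_error(c, returns) for c in candidates}
best = min(errors, key=errors.get)

for c in candidates:
    print(round(c - mean, 1), errors[c])
print(best == mean)
```

A network that has no usable input information cannot do better than this constant, which is why a consistent deviation from the mean is the interesting part.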

1023

What about the fact that the market changes at irregular intervals? Remember: in 1999, fewer institutions and retail traders used algorithms. In 2010 it was more. Nowadays in the USA it's about 80% (source: ChatwithTraders ;) ). Therefore, as you already mentioned: the more recent the data, the more relevant it is.

One aspect that speaks for using longer timespans remains: how do we know that the techniques used back in 1999 are sufficiently represented in today's data, so that our algo model could use them if older techniques begin to "work" again for many other people around the world (e.g. the Turtle Trader strategy)?

Probably I'm thinking too much about something that has too little relevance in this giant pool of prices. And probably covering a large part with enough "different market cycles" is enough (or maybe the best possible thing) we can give our model to prevent it from learning irrelevant stuff that has gone "out of time".

About the model I'm talking about: no no, don't get me wrong ^^. I am rebuilding your LSTM-MetaLabel model (to modify it). I haven't talked about my price channel approach in this forum yet (if there's interest I could do it, no problem).

(But the mistake I was referring to was in fact the same labeling approach that was also used in that price channel approach ;) ).

The term "bias" in my recent post is intentionally used for an error. After rebuilding the LSTM-MetaLabel model with those "wrong" labels, it made 600 predictions... all long, not a single one short. Where short trades should have been, we got a positive number for both sides (or an absolute "No" from the MetaLabel net) and therefore couldn't trade them :/ ==> the conclusion: one output for +/- points in profit direction and one for +/- points in loss direction definitely don't work well for that approach (tested on multiple hyperparameter/model variations) ==> now I'm trying a "normal" label approach.

Btw: check the oil price if you're training a model with USD. It could pay off as a feature ;)

@NELODI

You don't have to be 100% right in your trades. Also, the market doesn't have to consist of traders all using the same strategy. It's enough if a large part of traders uses an indicator (maybe only 10% using similar variations of a strategy can be enough). Also, some indicator which isn't directly used in a strategy could uncover "niches" or market gaps which resulted from the use of other indicators. Not to forget that some indicators will always "work" better or worse than others. A moving average (at least as a regime filter) will always hold up as long as traders trade trends. It's more about finding gaps than the holy grail.

60% accuracy with, for example, a CRV (reward-to-risk ratio) of 1.5 will create profits if the right patterns are found.
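The arithmetic behind the 60% / 1.5 figure can be written out as a short sketch in plain Python. It assumes every losing trade costs one unit of risk, every winning trade earns the reward-to-risk ratio times that unit, and trading costs such as spread and commission are ignored. The function names are made up for illustration.

```python
def expectancy_per_trade(win_rate, reward_to_risk):
    """Average result per trade, in units of the amount risked per trade."""
    return win_rate * reward_to_risk - (1.0 - win_rate)


def break_even_win_rate(reward_to_risk):
    """Win rate at which the expectancy is exactly zero."""
    return 1.0 / (1.0 + reward_to_risk)


expectancy = expectancy_per_trade(0.60, 1.5)
threshold = break_even_win_rate(1.5)
print(expectancy, threshold)
```

Under these assumptions the break-even win rate for a ratio of 1.5 is 40%, so 60% leaves a margin of about half a risk unit per trade before costs.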

544

[NELODI: did you delete your last post? this answer is referring to it, but I can no longer see your original post(?) --> your question was if

A) traders use technical analysis because it works or if

B) tech. analysis works because traders use it]

---

or even

C) Traders use technical indicators even if they don't work?

Like anybody, I only have my personal opinion and experience and it's impossible to come up with definitive answers. I like to see the markets as a signal/noise combination. Thousands of traders and algos all over the world with their different timeframes, strategies, indicators... altogether produce a perfect noise that only causes random moves that on a long-term average are unbiased. Then there is the "signal", mostly driven by news/fundamentals (which doesn't say that the overlay of the two can't sometimes go into the "wrong" direction). Of course this is a theoretical descriptive model as a possible idea of reality - not reality itself. But if we follow this thought, signal/noise decomposition tools should give us better results than classic technical analysis. With one exception: "indicators" that try to identify statistical outliers that are so extreme that they can't be explained by randomness. Just think of the infamous Swiss Franc crash: I'm sure that many indicators went through the roof and once we see really exceptional values, I can imagine that there is some short term predictive capacity within some traditional technical indicators. But exceptional by definition doesn't happen all the time and for sure an RSI <30 or a 2 std.dev. Bollinger Band by any means is far from exceptional and doesn't really tell a story. And I also don't believe that there is some yet to be found "needle in the haystack indicator" that does any better. Extremes (with the disadvantage of being rare and giving only few trade signals) and signal/noise decomposition methods in my opinion are the way to go. This is the reason why I worked so much with neural networks. If a neural network doesn't discover the real story behind the noise, then no other method does either. It has been proven that neural networks of decent size and with non-linear activations are universal(!) formulas and no single indicator or indicator combination can beat the full spectrum of "universal".

Anybody please prove me wrong.

I know I'm quite lonely with this opinion and many traders (especially after a lucky streak) would tell me how fabulous their magical indicator combination is. But it's simple math that for any dollar won there's another trader that lost a dollar and I'm confident that the dollar won more often goes to institutions than retail traders. Many "successful" traders confuse being lucky for a while with having an edge. I don't have the magic recipe either, but after doing this for 12 years now I acknowledge that finding valuable information in the currency markets is extremely hard and all the simple tools are useless beyond chance. So for me the answer is C).

@Bayne: the real credits for the LSTM/metalabel concept of course go to the book author Dr.M.LopezDePrado, so it's not "my" model.

Again regarding the training time horizon: in general there probably ain't no such thing as too many data, as long as your code can handle it within reasonable time. Speaking for myself, I also have to deal with the data quality that my broker provides and work with what I have.

You mention correctly that the landscape changes over time and that some phenomena of e.g. 1999 no longer exist, whereas new aspects have shown up that years ago nobody thought about. So of course you can consider training based on all kinds of market phenomena as advantageous, but you can just as well consider it the other way around and ask "why pollute my shiny weight matrix that is so well adapted to "modern" market behaviour with 20 year old data that are no longer relevant?". I guess there just is no perfect way.

412

In essence, my question was: "Do traders use technical indicators because they work, or do technical indicators work because traders use them?" But you are right. Traders also use technical indicators which don't work. But since there are so many indicators in use and you can't know which ones are being used when, I don't believe it is possible to build a "universal" formula to cover them all. And that's what you are trying to do with a Neural Network - if I understand correctly.

11968

Some instruments are available for trading with exotic specs, which requires "neuralizing" much more than just entry points, but so far it's possible to handle most of the available pairs in the same "model" as your chosen standard (EURUSD).

412

I don't trade EURUSD. I think it moves way too slow. My personal favorite is XAUUSD, because it has a lot of volume, but is much more volatile. I also keep an eye on USDCHF, USDJPY, EURCHF and EURJPY, because (A) I've found them to be the most reliable pairs to predict with standard indicators and (B) that is where most of the action happens during London and New York Sessions (my trading hours). I've also traded GBP crosses before, but I've stopped trading GBP because it became extremely unpredictable, with sudden spikes in both directions, for no apparent reason, before returning to the average. I use the same indicators on all the instruments I'm trading. Here's an example of what my chart looks like. I bet most traders would run away from such a crowded chart, but that's how I like it ;)

[Chart screenshot: USDJPY, M1, 2019.10.14; International Capital Markets Pty Ltd., MetaTrader 5, Demo]

544

NELODI:

I don't believe it is possible to build a "universal" formula to cover them all.

This has been proven as the "universal approximation theorem".

"Approximation" because neural networks are an iterative method. But as long as the approximation isn't good enough, every new training iteration takes us one step closer to the optimal formula (asymptotically).

It has also been proven that this doesn't depend on the number of layers. In theory, even a "Multilayer" Perceptron with a single hidden layer can be a universal formula (/approximator), given enough neurons.

Any formula of course can only tell as much as is provided by the input variables. If there is a missing variable, that contains additional and irreplaceable information, there will always be a better formula. But this is just as true for technical indicators. Or how much (technical) chart analysis can you do with 5 prices? If we add more prices, volume, other currency pairs, the picture becomes more complete for ANY "formula". This being said, given identical input data, a neural network will always beat any technical indicator. It has to! It's by definition forced to do so! Because it will self-improve until it does. And because technical indicators just impose some kind of mathematical filter upon price (and/or volume) data that maybe improves upon human interpretability, but can't be reversed into its source data and therefore reduces available information, raw prices are better source variables. These combined with a universal formula approximator can be trained to reveal the best possible information (=always under the assumption that valuable information exists AT ALL).
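The point that an indicator cannot be reversed into its source data can be illustrated with the simplest filter there is. The sketch below, in plain Python, uses a three-bar simple moving average as a stand-in for "some kind of mathematical filter"; the two price windows are made up for illustration.

```python
def simple_moving_average(prices):
    """Plain arithmetic mean of the given price window."""
    return sum(prices) / len(prices)


# Two hypothetical price windows that tell opposite stories.
rising = [1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0]

sma_rising = simple_moving_average(rising)
sma_falling = simple_moving_average(falling)

# Different inputs, identical indicator value: the direction can no longer be recovered.
print(sma_rising, sma_falling, rising != falling)
```

Anything that only sees the indicator value gets the same input in both cases, whereas something fed the raw prices can still tell the two windows apart.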

Just imagine you had to find a formula for facial recognition: that is an equation with as many variables as the picture has megapixels. That's humanly impossible. But a neural network can do it. This example shows that there's hardly any limit to finding good equations. A bunch of indicators selected based on personal choice or trial and error cannot possibly be this powerful.

The real limitation is not in the neural networks, but in the data themselves: we just can't find hidden relationships that simply don't exist. But anything that does exist, an ANN will find it.

Furthermore, indicators essentially are no proven formulas, but hypotheses: we HYPOTHESIZE e.g. an asset is overbought and price will go down next because some oscillator is high. It's no more than an illusion that indicators are proven formulas. Neural networks on the other hand are indeed proven formulas (with iteratively self-improving accuracy towards a global error minimum). So with neural networks we don't need to make any assumptions. If the network is wrong, it will correct itself until it isn't.

Fun fact: any discretionary trader uses neural networks - they're also called 'brains' ;-)   ... they just don't seem to work on autopilot if we're not at the screens

[I also don't like EURUSD very much; the "standard" is only because of the available data quality, which varies between currency pairs with my broker; I like GBPJPY for the combination of high volatility + low spreads and things like EURCHF or AUDUSD if I want more stability / fewer surprises. When I still did discretionary trading, my favorite always was oil: good volatility, but always the same repeating patterns. I just "knew" what price would do next. So thanks also for Bayne's suggestion: adding the oil price to the picture might indeed be interesting.]