Taking Neural Networks to the next level - page 12

 
Chris70:

Thanks for the link to Siraj Raval's video. I think what's interesting about this video is that although the code example is written in Python, only simple functions that have a direct equivalent in Mql were used. Basic Python, IMHO, really isn't that different. The real power comes with the imported libraries like keras, numpy, matplotlib, scipy...

And then there is often the argument about using GPU power, which in Mql is only possible via OpenCL; but in my personal experience Mql, as a compiled (i.e. more 'medium-level') language, can more than compensate for that compared to a 'high-level' interpreted language like Python, which is certainly better suited for human beings but less so for machines.

Of course there is no direct equivalent to matrix operations in numpy or to adding neural network layers with just one line of code as in Keras. But this is not because Python is inherently more powerful in the sense that it can do things that can't be done with Mql; it's because other people have already done 99% of the work for us. With Mql, on the other hand, we first need to build these libraries ourselves. Is it a lot of work? Definitely. Is it a waste of time? I don't think so. This work only needs to be done ONCE, and not with the complete complexity of Python's libraries, but only for the functions that we actually use; functionality then grows over time according to the requirements of individual projects.
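
Just to illustrate the kind of building block such a self-made library starts from, here is a minimal sketch of a plain matrix multiplication in Mql (illustrative only; the function name and the row-major flat-array layout are just one possible choice):

// Minimal sketch, not production library code: matrices stored row-major in flat arrays.
bool MatMul(const double &a[], const int aRows, const int aCols,
            const double &b[], const int bRows, const int bCols,
            double &c[])
  {
   if(aCols != bRows)                          // shapes must agree
      return(false);
   ArrayResize(c, aRows * bCols);
   for(int i = 0; i < aRows; i++)
      for(int j = 0; j < bCols; j++)
        {
         double sum = 0.0;
         for(int k = 0; k < aCols; k++)
            sum += a[i * aCols + k] * b[k * bCols + j];   // row-major indexing
         c[i * bCols + j] = sum;
        }
   return(true);
  }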

I'm pretty sure that anybody who isn't a professional data scientist anyway will only ever use the tip of the iceberg of what can theoretically be done with Python.

A problem with machine learning in Mql is that, as of today, the standard library and the examples in the codebase are pretty much useless for serious machine learning applications.

No idea. Python's integration with MT5 is very new; I coded mine in MQL, but when I did it I had no other choice. You do have the choice.

edit: ALGLIB isn't documented in MQL, but it works too
 
Bayne:
Your autoencoder:

Is it still an MLP or have you tried LSTM autoencoders?
If I get it right, you are only autoencoding the current index. Or also a 3rd dimension (= the whole sequence back in your prediction window for every index, like an LSTM does)?

edit:

Additionally, a test without the autoencoder would also be valuable and informative.

+ are any trailing stops or similar involved?

If you suggest LSTM autoencoders I'd like to ask you: why? Sure, generally speaking I can think of scenarios where such a model makes sense, e.g. if in a sequence-to-sequence prediction case we have to deal with variable input sequence length; an LSTM autoencoder can then deliver a constant-length sequence (= the bottleneck representation) for further processing with whatever model. However, with price data we have a continuous data feed that we can cut into any chunks we want and therefore don't need to deal with sequence length variability.

Furthermore, the whole idea behind the autoencoder model that I described in the beginning of this thread was denoising and performance improvement in the case of very high resolution input data. If on the other hand we process high resolution data with every iteration over all timesteps of an LSTM, this is detrimental to performance and contradicts the reasoning behind using an autoencoder in the first place; and yet, we're not done with the autoencoding alone, because we still need the decoder and a prediction model.

Keep in mind that I also used an LSTM (only after autoencoding), which is where the "3rd dimension" comes in. Sure, we could also first use an LSTM-type autoencoder and then feed the encoding into a supervised learning LSTM prediction model. But why?

___

If instead we change the reasoning and don't look for performance improvement or predictions, but just denoising in order to get a "typical" current price reference (similar to using a moving average), an LSTM autoencoder totally makes sense, but has very little to do with the application that I presented in this thread.

___

Making predictions and actual trading decisions (entries and exits) based on those predictions are two different things, so of course trailing stops are not "involved" in any model either. This is an entirely different discussion that has little to do with the prediction models themselves. Do I use trailing stops? Yes, I have them in most of my EAs as a selectable option (either fixed distance or via fractal-based support/resistance zones), but genetic optimisation usually shows better results with stop trailing switched off, so just one initial stop and a fixed take profit without "classic" trailing. I say classic because with prediction models it makes sense to verify, every time a new prediction comes in, that the odds are still in favor of pre-existing open positions; if they aren't, it is only logical to get out prematurely, even if neither stop loss nor take profit has been triggered. But I wouldn't consider this a trailing stop.
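
To make that last point concrete, here is a rough sketch of the "get out prematurely" logic (illustrative only, not my actual EA code; GetPrediction() and its sign convention are just placeholders):

#include <Trade\Trade.mqh>
CTrade trade;

// Placeholder for the model's output: > 0 bullish, < 0 bearish.
double GetPrediction() { return(0.0); }

// Close open positions on the current symbol whenever a fresh prediction
// no longer favors them, instead of trailing the stop.
void CheckOpenPositions()
  {
   double p = GetPrediction();
   for(int i = PositionsTotal() - 1; i >= 0; i--)
     {
      ulong ticket = PositionGetTicket(i);               // also selects the position
      if(ticket == 0 || PositionGetString(POSITION_SYMBOL) != _Symbol)
         continue;
      long type = PositionGetInteger(POSITION_TYPE);
      bool oddsGone = (type == POSITION_TYPE_BUY  && p < 0.0) ||
                      (type == POSITION_TYPE_SELL && p > 0.0);
      if(oddsGone)
         trade.PositionClose(ticket);                    // exit before SL/TP is hit
     }
  }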

___

@Icham Aidibe: yes, you're right - ALGLIB is a huge(!) and valuable resource for functions related to statistics and machine learning, but the neural network functions from datanalysis.mqh are still somewhat rudimentary (in my opinion).

 
Chris70:

If you suggest LSTM autoencoders I'd like to ask you: why? [...]

Thanks for the answer. Recently I've been wondering how you manage the problem of lacking data in terms of the Y label of multiple currencies. What if the next bar of the other pair doesn't exist (the last hour of the day, for example)? Do you skip the whole row/prediction (that would mean losing up to 2-3% of samples), or take the last known price? What's your trick? :/

edit: another thing that caught my interest was the historical length you trained the model on (amount of years or decades)

 

@Bayne:

concerning "lacking data" we should consider three aspects:

1. missing data due to bad quality of the data provided by the broker: so the data do exist, but they're just missing in the provided file

2. actual periods of more rare data, e.g. at night

3. time differences between the first tick of a new bar/candle for different currency pairs

What I'm doing about aspect (1): I consider datapoints with a "0" value as invalid and take the last valid datapoint instead; another approach might be to declare datapoints with a defined maximum absolute percentage deviation relative to the last datapoint as invalid. Beyond that, there really isn't much we can do about it; if the data provider gives us nonsense prices, there's just no way of knowing the true prices.
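
As an illustration only (not my actual code; the function name and the 5% threshold are arbitrary), this cleaning step could look like the following:

// Treat zero prices as invalid and fall back to the last valid price;
// optionally also reject implausible jumps beyond maxDeviation (here 5%).
double CleanPrice(const double price, double &lastValid, const double maxDeviation = 0.05)
  {
   if(price <= 0.0)
      return(lastValid);                                        // "0" datapoint -> invalid
   if(lastValid > 0.0 && MathAbs(price / lastValid - 1.0) > maxDeviation)
      return(lastValid);                                        // optional outlier filter
   lastValid = price;                                           // accept and remember
   return(price);
  }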

Using iBarShift is one possibility for dealing with aspect (3). However, I personally don't like it and do something very simple instead: I keep a datetime variable with the timestamp of the current bar, cycle through all symbols, and exit the next neural network iteration via a return shortcut if the zero-indexed candle of any symbol still has the same timestamp as when my datetime variable was last updated. If instead they're all different (= a new bar has begun for all symbols), the function continues and the timestamp variable is updated.
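
In code, this synchronisation check could look roughly like the following sketch (illustrative only, with a hypothetical symbol basket and the H1 timeframe as an example; not my actual implementation):

string   symbols[]     = {"EURUSD", "GBPUSD", "USDJPY"};    // example basket
datetime lastProcessed = 0;                                 // timestamp of the last processed bar

// Only proceed when ALL symbols have opened a new bar since the last iteration.
bool AllSymbolsHaveNewBar()
  {
   for(int s = 0; s < ArraySize(symbols); s++)
      if(iTime(symbols[s], PERIOD_H1, 0) <= lastProcessed)  // this symbol hasn't started a new bar yet
         return(false);
   lastProcessed = iTime(symbols[0], PERIOD_H1, 0);         // all fresh: update the reference timestamp
   return(true);
  }

void OnTick()
  {
   if(!AllSymbolsHaveNewBar())
      return;                                               // the "return shortcut"
   // ... run the next neural network iteration here ...
  }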

Because forex usually has very high liquidity and tick frequency, it doesn't make much of a difference to delay the next iteration until all symbols have received a new tick. In practice they're updated more or less simultaneously. What are a few seconds, for example, if the network is based on the hourly timeframe?

If whole bars are actually missing in the data file, this indeed leads to automatic skipping of that iteration, because - in other words - there is then at least one symbol with an unchanged timestamp of the most recent bar. However, I'd rather lose a few percent of my data than train on bad data.

Because the number of datapoints in the training set depends just as much on the timeframe as on the historical data period, it doesn't make sense to look at the historical period length alone; for example, five years of H1 bars are roughly 5 × 260 × 24 ≈ 31,000 datapoints, whereas five years of D1 bars are only about 1,300. I think the number of datapoints is what really counts. In practice I usually do my network training over several years, within the limitations of the data provided by my individual broker (for which lookback period do I have tick data? M1 data? D1 data?), so again it depends on the timeframe.

 
Beware man, ALGLIB is 20 years old!
 
Chris70:

@Bayne: concerning "lacking data" we should consider three aspects: [...]

For the historical question I was more referring to the result from a few pages ago (69% accuracy or so). Recently I've been training on hourly bars of EURUSD's last 20 years and wondered if it may make sense to use less data, to learn patterns which may not last as long but occur more frequently and so have more predictive power.

So I'd like to ask you what bar timeframe and historical period you used for the result mentioned.

 

Sorry Bayne/Ben, I only have a vague idea what you might be referring to, because I did so many things in the meantime that I don't reliably remember that specific training session. I guess it was on the M15 chart, but please don't ask me about the time span (although there are so many M15 bars per year that I highly doubt it was more than 5 years).

On the other hand: do you actually have good data quality over 20 years? Then lucky you! I don't really, with my broker. The further back in time I go, the worse it gets. That's why I sometimes even use the MetaQuotes demo data for training, but this is far from optimal because my real trading is on an ECN account.

 

The amount of history data you need for trading (as a trader, not an Artificial Neural Network trying to "learn" how to trade) depends on the time-frame you plan to trade.

For example, if you are Scalping on 1-Minute charts and closing each trade within 15 minutes or less, about a week's worth of history data should be enough to make solid trade decisions. But ... you would need at least 27 years of history data to get comparable accuracy when Swing-Trading on Daily charts, under the assumption that your trades won't stay open for much longer than 2 weeks.

One thing that I should point out here is that you will need data from the most recent history. Having said that, I seriously doubt you can build an Artificial Neural Network to trade successfully if it doesn't have the ability to learn and to continue learning as new data arrives, adapting its decisions continuously. The most important candle is always the last candle. Everything that happened before loses its importance as time passes by.

Also, because of the noise, which is always present on the market, it is highly unlikely that you would be able to predict the NEXT candle, so don't even bother with that. What you should try to get from your ANN is the general direction (sideways, up or down) and the range where the price is most likely going to be within the next X candles.

In short, since you need roughly 10000 bars (yes, 10 thousand) to make reasonable predictions about possible price movements within the next 15 bars, your Artificial Neural Network would most likely also need 10000+ input neurons, 3 output neurons for the general direction (sideways, up or down), 15 output neurons for the upper range and 15 output neurons for the lower range. You'd also have to keep training your network with each new bar that closes, so it can continue adapting to constantly changing market conditions. Alternatively, if you wanted to reduce the number of input neurons, you could try using less data from your targeted time-frame while adding data from multiple higher time-frames, but that would come at the cost of reduced accuracy.

There are also big differences between trading Days (Mondays are usually slow movers, as traders start preparing for a new week, while Fridays tend to be choppy, because most traders will try to minimize their exposure over the weekend and close their positions) and trading Sessions (EUR pairs have the biggest moves when London is open, while AUD pairs move mostly during the Sydney Session). If your ANN does not have any input about the time of the day and the day of the week, it might as well be trading blindly. So you should probably also include the day of the week and the hour of the day as input neurons, because they usually have a big impact on trading activity.
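
For what it's worth, feeding the day of the week and the hour of the day into a network is straightforward in Mql5. A minimal sketch (illustrative only; the simple scaling to [0,1] is just one possible choice):

// Build two extra input features from a bar's timestamp:
// the day of the week and the hour of the day, scaled to [0,1].
void TimeFeatures(const datetime barTime, double &features[])
  {
   MqlDateTime dt;
   TimeToStruct(barTime, dt);                 // split the timestamp into calendar fields
   ArrayResize(features, 2);
   features[0] = dt.day_of_week / 6.0;        // 0 = Sunday ... 6 = Saturday
   features[1] = dt.hour / 23.0;              // 0 ... 23
  }

A cyclical sin/cos encoding of the hour would be a common alternative, so that the network doesn't see 23:00 and 00:00 as far apart.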

If you can do that, provided you have sufficient computing power to keep it running in real-time, you might be able to achieve results comparable to those of an experienced trader. Otherwise, you'll just be lagging behind and your ANN predictions will be useless. And that is - IMHO - the absolute minimum you would need, NOT a guarantee for success.

 

You also have to be careful of artificially reconstructed prices during high-volatility peaks, when liquidity is low and there are gaps in the prices.


But why not use random data to train your NN?

 
NELODI:

The amount of history data you need for trading (as a trader, not an Artificial Neural Network trying to "learn" how to trade) depends on the time-frame you plan to trade. [...]

You're mentioning correctly that the most recent history is more important than older data. If batch or mini-batch training is used (= the usual way of doing it in Python), this indeed means that we need regular maintenance, i.e. retraining the network on a schedule. This is not necessary, on the other hand, if we use "online" training, which offers the option(!) - not an obligation - of continuous retraining simultaneously with trading, upon each newly incoming bar. This is much easier to implement in Mql (and it's also what I did; I don't use batch training). Batch training has its major performance advantages if the training is based on tensor operations done by a GPU. If the calculations are CPU-based and the iterations rely on for-loops instead of tensor calculations, this advantage no longer counts. The only valid critique regarding 'online' training is that it's impossible to shuffle the data, so we suffer from overfitting to the most recent data. But this can be a good(!) thing, because it reflects changing market periods. In fact, it's somehow a very "human" behaviour to be more under the impression of the most recent data.
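
To make the distinction concrete, here is a toy sketch of "online" training on a single linear neuron (illustrative only, not my actual network code; the learning rate and the squared-error loss are arbitrary choices): every new sample immediately triggers one weight update, so there is no batching and no shuffling, and the most recent bars naturally dominate.

double weights[];                              // one weight per input feature
double bias = 0.0;

// One "online" SGD step on a single sample: forward pass, gradient of the
// squared error, immediate weight update.
void OnlineUpdate(const double &inputs[], const double target, const double lr = 0.001)
  {
   int n = ArraySize(inputs);
   if(ArraySize(weights) != n)
     {
      ArrayResize(weights, n);
      ArrayInitialize(weights, 0.0);           // lazy zero initialisation
     }
   double prediction = bias;
   for(int i = 0; i < n; i++)
      prediction += weights[i] * inputs[i];
   double error = prediction - target;         // gradient of 0.5*error^2 w.r.t. the prediction
   for(int i = 0; i < n; i++)
      weights[i] -= lr * error * inputs[i];
   bias -= lr * error;
  }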

Where you're completely wrong is your understanding of input data with time series (sorry if it makes me wonder whether you have any practical experience with neural networks at all or if you're just talking theory): you don't need to compute all "10000+" datapoints at once. Instead, these are a chain of data and you always process only one chain member at a time, not 10000! But this doesn't mean that the information of the other 9999 datapoints isn't also involved (indirectly): simply because all previous training iterations were based on the sum of those 9999 datapoints, their information is "stored" in the network's weights. This is true for any network architecture. With the special cases of recurrent networks (RNN) or recurrent networks with memory cells (LSTM, GRU), we have on top of that a "lookback period" of timesteps in chronological order (even in batch training with shuffled data) that is relevant for the individual calculation, beyond the weight matrix. But this lookback period usually consists of a few dozen or a few hundred candles, not 10000.
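
As a small illustration of such a lookback window (again just a sketch, not my actual code; the window length of 64 bars and the H1 timeframe are arbitrary example values):

#define LOOKBACK 64                             // example lookback length, not 10000

// Collect the last LOOKBACK closed H1 bars in chronological order,
// i.e. the sequence an RNN/LSTM steps through one timestep at a time.
bool BuildLookback(const string symbol, double &window[])
  {
   double closes[];
   ArraySetAsSeries(closes, false);             // index 0 = oldest element
   if(CopyClose(symbol, PERIOD_H1, 1, LOOKBACK, closes) != LOOKBACK)
      return(false);                            // not enough history available
   ArrayResize(window, LOOKBACK);
   for(int t = 0; t < LOOKBACK; t++)
      window[t] = closes[t];                    // one timestep per element
   return(true);
  }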

If you compare this to manual trading: do you have 10000 candles on your chart at any time? Probably not. But you have an understanding of general market behaviour / trading rules / the current market phase. With neural networks, all of this is stored in the network weights.

Again, for the individual trade you probably only have a few dozen candles on your chart, and adding more wouldn't change anything because you already know about the current phase that the market is in. This is an ability that networks with memory cells have, too: independent of the weights, there are also the "cell state" variables. These adapt continuously, with every "prediction" (= forward pass), i.e. even outside network training, and they can very well store the information about the current market phase / sentiment.
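
The point is simply that this state lives outside the weights and is carried over from one forward pass to the next; a minimal sketch of such a persistent state container (purely illustrative, the struct and its fields are hypothetical):

// Persistent LSTM state: kept in a global variable on purpose, so it is NOT
// reset between forward passes and can keep tracking the current market phase
// even while no training takes place.
struct LstmState
  {
   double cell[];        // long-term memory (cell state)
   double hidden[];      // short-term output of the last timestep
  };

LstmState g_state;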
