Taking Neural Networks to the next level - page 5

 
Chris70:

There have been some good articles on the topic of genetic learning, like this one:

https://www.mql5.com/en/articles/2225

I also wrote a bit about it here: https://www.mql5.com/en/forum/316602 , but there was no interest in the topic at the time

Added to favorites, very interesting, thanks.
 

Sorry guys, the first results are disappointing.

I let the version with the 60 min * M1 OHLC-V data (= 5 x 60 = 300 original data points per timestep, auto-encoded to 30 LSTM inputs; 3-layer LSTM network with 100 timesteps) train on EURUSD 2016+2017 until the loss function converged to a minimum, then compared this training set to a test set of 2017+2018.
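To make the setup more tangible, here is a minimal sketch of roughly that architecture in Python/Keras. The numbers (300 raw values per hour, a 30-unit bottleneck, 3 LSTM layers, 100 timesteps, MAE loss) come from the post; everything else (activations, 100 units per LSTM layer, optimizer, variable names) is my own assumption, not Chris70's actual implementation:

# Minimal sketch (TensorFlow 2.x / Keras) of the described setup:
# 300 raw values per hour (60 x M1 OHLC-V bars) compressed to 30 features
# by an autoencoder, then a 3-layer LSTM over a window of 100 hourly steps.
from tensorflow.keras import layers, models

RAW_PER_STEP = 300   # 60 M1 bars x 5 values (OHLC-V)
CODE_SIZE    = 30    # autoencoder bottleneck = LSTM input size
TIMESTEPS    = 100   # LSTM lookback window

# autoencoder: 300 raw values -> 30 compressed features (and back)
ae_in  = layers.Input(shape=(RAW_PER_STEP,))
code   = layers.Dense(CODE_SIZE, activation="tanh")(ae_in)      # activation is an assumption
ae_out = layers.Dense(RAW_PER_STEP, activation="linear")(code)
autoencoder = models.Model(ae_in, ae_out)
encoder     = models.Model(ae_in, code)
autoencoder.compile(optimizer="adam", loss="mae")

# 3-layer LSTM over sequences of encoded timesteps
seq_in = layers.Input(shape=(TIMESTEPS, CODE_SIZE))
x = layers.LSTM(100, return_sequences=True)(seq_in)             # 100 units per layer is an assumption
x = layers.LSTM(100, return_sequences=True)(x)
x = layers.LSTM(100)(x)
pred = layers.Dense(CODE_SIZE)(x)                               # predict the next encoded timestep
lstm_model = models.Model(seq_in, pred)
lstm_model.compile(optimizer="adam", loss="mae")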

After training, the model still showed a directional bias, i.e. clear indications of whether it "thought" the price would go up or down over the next hour (shown by zig-zag lines pointing in that direction). Visually, these estimates seemed correct more often than not.

This is supported by the data: 

On the training set, the mean absolute error (MAE) went down to 0.153, compared to the "naive" model's baseline of 0.248. Absolute values are meaningless here, because the numbers result from a mixture of price and volume; only relative changes count.

The coefficient of determination (computed with the formula for multiple linear regression) went up from 0.148 (baseline) to 0.273 (training set).
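For reference, a small sketch of how such a comparison against a naive baseline can be computed (the arrays are placeholders, and I'm assuming the "naive" model predicts "no change", i.e. zero, for stationary targets; the post's R² uses the multiple-linear-regression formula, which may differ from the definition below):

# Sketch: MAE and R^2 of a model vs. a naive "no change" baseline.
# y_true / y_pred are placeholders for the real targets and model outputs.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true  = np.random.randn(1000)                        # placeholder targets
y_pred  = 0.5 * y_true + 0.5 * np.random.randn(1000)   # placeholder model output
y_naive = np.zeros_like(y_true)                        # naive prediction: no change

print("MAE model:", mae(y_true, y_pred), " MAE naive:", mae(y_true, y_naive))
print("R2  model:", r2(y_true, y_pred))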

So far, so good... BUT:

Those findings couldn't be confirmed on the test set:

On the test set, the mean absolute error (MAE), at 0.30, was even a little higher than the 0.248 baseline value.

The coefficient of determination, at 0.19, was only slightly better than the baseline (0.148).

This suggests that the model made useful predictions on the training set, but without confirmation on the test set there is a strong suspicion of overfitting.


Of course, it makes sense to play around a little and try out different model architectures (layers, nodes, timesteps, autoencoder settings, learning rates...), and it is very well possible that a slightly different design does the trick.

Apart from that, I also have to run comparisons with:

- the same model without volume data (not sure if volume is really helping here)

- a simple LSTM without the autoencoder part

- a stacked LSTM without the autoencoder

- a multi-currency LSTM

I'll keep you guys updated if I find anything more useful. The project is still on.

EDIT: just found a little bug in the code that corrupted the autoencoder file that I used here... need to repeat the test

 

You should ditch tick volume. A broker can filter ticks, or plug in or plug out an LP (liquidity provider), which changes the tick volume. Here one can see three different tick volumes from one broker:

[attached chart: tick volume comparison from one broker]
Also, if you train your model on data from broker X, it will fail on broker Y because of the differences in tick volume. Only use real volume, where the volume comes from an exchange, not from your broker.


I will say it again: if we can agree that the data (EURUSD, whatever time frame) you use for training is a random walk (you still seem to be on the fence about that), where there is no pattern and no structure to be found, and your autoencoder does give results, then it is due to overfitting / curve fitting, i.e. your encoder has modelled a structure to fit the data. It will perform poorly out of sample (OOS), where that structure no longer fits the data.

This brings up another point. In order for the model to adapt to the market, batch training (offline training) is, in my opinion, not suitable. This type of model requires a continuous learning process. You could try and test whether online training gives a more or less stable result over time. It would also reduce your worry about the "confidence" of what is predicted: for example, if the last sample shown to the network results in a higher-than-average MSE, you can use this information both in training and in trading decisions.

 

You're convincing me that I should ditch tick volume (--> "done").

About the random walk: I neither agree nor disagree. I'm just not sure yet (and trying to find out). But can we agree that we should all stop trading right now if everything is random? Maybe we should... How can anybody with positive P&L stats be sure that he (or she) isn't just fooling themselves through their own survivorship bias, and that their "trading skills" weren't just luck? Yes, I know about all those guys out there saying things like "you only need to know how to read a chart"... and before the "consistency" arguments arise: myself included, I'm net profitable and think I know something about technical analysis, but if you are correct and everything is ONLY random, then there really isn't much to know for a simple coin flip. So I think it's really worth finding out, rather than just taking complete randomness as a given fact. But if it is a fact, then please, anybody, tell me how trading is anything other than gambling? That is not my opinion, but supporting the "randomness only" hypothesis is another way of saying that an edge in the forex market is entirely impossible. Then we should all delete MetaTrader, walk away and never trade another day.

Can we agree that the trained model should return no directional bias if it's fed with zero inputs (or more precisely: the mean of the inputs, which equals zero for stationary data) as a baseline simulating the "naive" situation, and that a significantly lower error on unseen data, compared to that "naive" baseline, would suggest non-randomness?
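As an illustration of that sanity check, here is a small Python sketch (all names, shapes and data are assumptions; the untrained stand-in model only takes the place of the real, trained network): feed a window of zeros, which for standardized stationary inputs represents the "no information" case, and compare the test-set MAE to the naive "no change" baseline.

# Sketch of the zero-input / naive-baseline check described above.
# lstm_model is only a stand-in for the actual trained network.
import numpy as np
import tensorflow as tf

TIMESTEPS, CODE_SIZE = 100, 30

lstm_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(100, input_shape=(TIMESTEPS, CODE_SIZE)),
    tf.keras.layers.Dense(CODE_SIZE),
])

# 1) Zero-input check: the prediction on the "mean" input should show no directional bias.
zero_window = np.zeros((1, TIMESTEPS, CODE_SIZE), dtype="float32")
print("prediction on zero input (should be ~0):", lstm_model.predict(zero_window).mean())

# 2) Naive-baseline comparison on unseen data (placeholders for the real test set):
X_test = np.random.randn(500, TIMESTEPS, CODE_SIZE).astype("float32")
y_test = np.random.randn(500, CODE_SIZE).astype("float32")
mae_model = np.mean(np.abs(y_test - lstm_model.predict(X_test)))
mae_naive = np.mean(np.abs(y_test - 0.0))   # naive model: predict "no change"
print("MAE model:", mae_model, " MAE naive:", mae_naive)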

 

I don't agree with the continuous learning concept, by the way. As I understand it, you're trying to extract useful information for trading decisions from a model that is overfitted predominantly to the most recent price behaviour. IF this works, then you're saying that the most recent prices do in fact contain valuable information, which (1) contradicts(!) the random walk and (2) means a non-overfitting model should act upon this information just as well (without the continuous re-training).

 

Reading this, in my opinion I would also say prices are random. It's true that trading is mostly based not on skill but on luck. Even worse, the financial mathematics of buying and selling can work against you.

However, the question is: can we make a profit off this randomness? For example, stationary noise could easily be profitable to trade. I don't know if you consider that to be random.

Having said this, this randomness applies mostly to forex. It does not apply over reasonable periods of time to indices like the Dow Jones.


The price probably is random if you think it moves as a result of people trading, each with their own idea of where the price is going, but with stock indices there is more of an investment, buy-side mindset.


If you consider what guys like Buffett would think, then the real place to look for non-random patterns is in financial statement data instead of prices.

 
Chris70:

You're convincing me that I should ditch tick volume.

About the random walk: I neither agree nor disagree. I'm just not sure yet (and trying to find out). [...] Supporting the "randomness only" hypothesis is another way of saying that an edge in the forex market is entirely impossible.

Can we agree that the trained model should return no directional bias if it's fed with zero inputs (the mean of the inputs, which equals zero for stationary data) as a baseline simulating the "naive" situation, and that a significantly lower error on unseen data, compared to that "naive" baseline, would suggest non-randomness?

My comment is strictly about neural networks and NN training with regard to trying to predict future price. It has nothing to do with trading the random walk.

Regarding the second point, are you referring to the autoencoder?

Let me explain with a simple analogy.

We take 100 random people of various ages, colours, weights and so on. We go to the beach, where there is a breeze. Each of the 100 people in turn flips a coin, and we keep a tally of the results. After the event we end up with a time sequence of 100 coin flips.

Now we want to predict whether the next coin flip will be heads or tails. There are multiple ways this can be done. One method is to train a network solely on the tally. Because these inputs do not carry sufficient information for a mathematical function relating the outputs to the inputs, the network cannot be expected to learn this non-existent function. The best result obtainable in this case is to predict the last value as the future value.

The results can be improved by supplying the network with data that contains enough information for it to come up with a function. For example, a model of the coin: maybe it has an imbalance. This alone may predict the outcome better without the tally at all, or combining the tally with the coin model may outperform the coin model alone.

Regarding your second post about batch vs. online training: your understanding of my point is not correct. It has nothing to do with the random walk issue. The market is dynamic, as can be seen in my volume picture; things do not stay the same. So a once-trained model will go out of fashion, just like a simple RSI overbought/oversold TP/SL optimization on a certain time window. The network, in my opinion, should be kept as up to date as possible. So when you present a new sample to the network and it puts out a high MSE (= a sample that contains new information it has not seen before), one could refrain from trading, as the predicted output could be unreliable because of this new information. You would also be able to regulate the learning rate, pause the back-propagation, etc. (see the sketch below).
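A minimal Python sketch of that idea, not the poster's actual implementation: the model, the 2x "surprise" threshold and all names are my assumptions. Weights are updated one sample at a time, and training/trading are gated by how far the per-sample error sits above its running average.

# Online (per-sample) updates with error-based gating; `model` is any
# compiled Keras model, and the threshold is purely illustrative.
import numpy as np

ERROR_FACTOR = 2.0          # "surprise" threshold: 2x the running mean error
running_mse, ALPHA = None, 0.05

def process_sample(model, x, y):
    """x: (1, timesteps, features), y: (1, targets). Returns (prediction, trade_ok)."""
    global running_mse
    pred = model.predict(x, verbose=0)
    mse = float(np.mean((pred - y) ** 2))

    # exponential moving average of the per-sample error
    running_mse = mse if running_mse is None else (1 - ALPHA) * running_mse + ALPHA * mse
    surprised = mse > ERROR_FACTOR * running_mse

    if not surprised:
        model.train_on_batch(x, y)   # online update: one gradient step on this sample
    # if the sample is "surprising", one could instead lower the learning rate,
    # pause back-propagation, and/or refrain from trading on this prediction
    return pred, not surprised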

 

I'm not sure if I understand the word "tally" correctly (I never lived outside Germany (= not a native English speaker) and am hearing the word for the first time). Nevertheless, I think you're wrong about the coin flip experiment. After keeping track of a series of 100 coin flips, the best prediction for the next flip is not the last result of the series, but the result with the higher frequency within the series. If the distribution was 53:47 heads:tails, then the answer should be heads, based on the limited available information about the empirical probability from this small sample. My point being: the last result in the series is irrelevant. And even if the distribution was exactly 50:50, the last result is no better than any other.
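To make the two decision rules explicit, a tiny sketch with simulated (hypothetical) flips: the majority rule uses all 100 observations to estimate the empirical probability, while the "repeat the last flip" rule uses only one of them.

# Majority-outcome rule vs. "repeat the last value" rule on a simulated tally.
import random

flips = [random.choice("HT") for _ in range(100)]    # the recorded tally
heads = flips.count("H")

majority_prediction   = "H" if heads >= 50 else "T"  # uses all 100 observations
last_value_prediction = flips[-1]                    # uses only the last one

print(f"heads:tails = {heads}:{100 - heads}")
print("majority rule predicts  :", majority_prediction)
print("last-value rule predicts:", last_value_prediction)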

Even a random walk can show impressive dynamics and all kinds of trends. And undoubtedly, big price moves happen all the time (sometimes news-driven), while we shouldn't forget that a random distribution doesn't have to be a NORMAL distribution; extreme moves can occur more frequently than a normal distribution would suggest.

The fact that all those dynamics exist is nothing we need to discuss. The real question is whether we can know about the moves before they happen.

How frequently a neural network or any predictive model needs to be retrained (up to continuous live retraining) is something that needs to be supported by data. The risk of overfitting to the most recent data exists just as much as it does with long-term data.

Example: let's say EURUSD has just made a huge upward move after the release of non-farm payrolls news. A short-term retrained model might be overfitting in the sense that it now assumes skyrocketing is the normal market behaviour and therefore predicts still higher prices, whereas in reality the news is over and the market may bounce back in the opposite direction.

Giving more weight to the most recent data and changing the model all the time can therefore also be counterproductive, as opposed to sticking with something that has been learnt from many events, not just this one event.

I understand your intuition, but it's not true unless proven.

 

By online training I mean that the weights are updated after every sample instead of after a batch of samples. It does not mean regarding or disregarding recent or older data, nor giving priority to either. It is just a different training method (see the sketch below).
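In Keras terms, the difference can be reduced to the batch size used for the weight updates; a sketch with a placeholder model and placeholder data (both are assumptions for illustration):

# Batch vs. online training differ only in how often the weights are updated.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(100, 30)),
    tf.keras.layers.Dense(30),
])
model.compile(optimizer="adam", loss="mae")

X = np.random.randn(512, 100, 30).astype("float32")   # placeholder sequences
y = np.random.randn(512, 30).astype("float32")         # placeholder targets

# batch (offline) training: one weight update per 64 samples
model.fit(X, y, batch_size=64, epochs=1, verbose=0)

# online training: one weight update after every single sample
model.fit(X, y, batch_size=1, epochs=1, verbose=0)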

 

Hi Chris

I have been playing with NNs and trading for a while; I don't consider myself a great programmer, but I have used a few. My current version is a Conv-LSTM that trades with MetaTrader 5. I could also find little value in using volume (I did try it as an input). I also find the lower timeframes too noisy to give a reliable signal, but from 15 minutes and above I have found some value. The test and evaluation results from a 1-hour time frame, GBP/USD:

Mean Squared Error:  0.009717771693628622
Mean Absolute Error:  0.06431447854841699
RMSE:  0.0985787588359106
Time used: 2.8135385513305664 sec 

I have also been playing with evolutionary algos and find them really cool.

If this is of interest to you, please feel free to drop me a line. I've sent you a PM via your profile.

John
