You can also run it in the optimizer; it should be faster in non-visual mode. Just set the "training counter" to e.g. 1-100 and optimize "slow and complete" with all but one thread disabled; the weight matrix will be saved to the common file folder and re-used in the next training cycle. But be aware that if "every tick" is chosen, you get as many iterations per cycle as there are ticks, so it's probably better not to choose years of training history. A shorter time per cycle also means you get more preliminary reports (also in the common file folder).

Ok. Did that now, using only OHLC data from M1 charts and a single day.

Now, I see some dots plotted in the "Passes/Custom max" window.

What do the dots represent? They seem to be moving somewhere between 60k and 59k at the moment.

NELODI: Ok. Did that now, using only OHLC data from M1 charts and 1 day of training. Now, I see some dots plotted in the "Passes/Custom max" window. What do the dots represent?

That's the loss, i.e. 1/2 squared error times 10^6. Ideally, it should go down between the passes. If not, usually the learning rate is too high (this depends on both the initial rate and the time decay value), or we're stuck in a local minimum; then momentum (setting: ~0-0.9) can help, or an optimizer with built-in momentum functionality like Nesterov or ADAM.

Because the function results are not scaled and can be as high as y, the MSE will have pretty high values with this concrete function example, too.

The "10^6" is a relic of other applications: if I have e.g. outputs in 0-1 and an average absolute error of, say, 0.01, the MSE is sometimes too low to be shown with the few digits after the decimal point (2? 3?) that the custom results graph can display, so indications that are too high are better than ones that are too low.
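To make the scaling concrete, here is a minimal sketch in Python (mine, not the EA's actual code) of the value that gets plotted per pass:

```python
# Minimal sketch (not the EA's code): the plotted value is half the
# mean squared error, multiplied by 10^6 for display.
def plotted_loss(targets, outputs):
    n = len(targets)
    sse = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return 0.5 * (sse / n) * 1e6

# On 0-1 outputs, an average absolute error of ~0.01 would otherwise
# show up as an unreadably small MSE; scaled, it stays visible:
print(plotted_loss([0.50, 0.52], [0.51, 0.51]))  # ~50
```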

Chris70: That's the loss, i.e. 1/2 squared error times 10^6. Ideally, it should go down between the passes. If not, usually the learning rate is too high (depends both on initial rate and time decay value) or we're stuck in a local minimum (then momentum (setting: ~ 0-0.9) can help or an optimizer that has built-in momentum functionality like Nesterov or ADAM).

Ok. I've changed it now to use 5 days of M1 OHLC data from EURUSD per training pass, with the "train counter" input parameter starting at 1 and stopping at 10000 (10k steps). That should be enough to run through the night. But the "Custom max" output doesn't look like it's going down. It just keeps moving up and down between 60k and 59k. Except for the "train counter", all the input parameters are at your default settings. Do you want me to change something and restart?

Btw ... looking at input parameters, "MLP optimizer method" is set to "Nesterov Accelerated Gradient (NAG)"

I'd upload the image, but I don't see any options to do it on this Forum. Hmmm.

Nesterov is pretty robust (I think it was the preset in the uploaded file; can't you change it in the input settings?); you can go with that. The Nesterov algorithm has a "look-ahead" step for the weight update built into the formula, which serves as a neat momentum adaptation. I would only stay away from ADADELTA; although it has the advantage that no learning rate needs to be set, it requires other fine-tuning because it is e.g. quite sensitive to good weight initialization, otherwise you run into vanishing or exploding gradients very quickly.
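For reference, the look-ahead idea can be sketched in a few lines (a generic textbook formulation of NAG, not necessarily this EA's exact implementation; the quadratic toy objective is my own example):

```python
# Sketch of one NAG update for a single weight. The gradient is taken
# at the look-ahead position w + momentum*v, not at w itself -- that
# look-ahead is what distinguishes NAG from plain momentum.
def nag_step(w, v, grad, lr=0.001, momentum=0.9):
    g = grad(w + momentum * v)      # evaluate gradient ahead of w
    v = momentum * v - lr * g       # update the velocity
    return w + v, v                 # apply the velocity to the weight

# Toy usage: minimize f(w) = w^2 (gradient 2w) starting from w = 1.0
w, v = 1.0, 0.0
for _ in range(200):
    w, v = nag_step(w, v, lambda x: 2.0 * x, lr=0.05)
# w has converged close to the minimum at 0
```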

As a starting point for the learning rate: go into visual mode; for test purposes, set a very high learning rate just to see what I mean, e.g. 0.2; you'll probably see the loss curve graph skyrocketing almost immediately. From there, go down with the learning rate in steps of roughly a factor of 0.1. Once you have found a value with a decreasing loss curve, take about 10-50% of this value to start the training (this will usually be somewhere between 0.0001 and 0.01). Time decay is not obligatory, but it can help with fine-tuning at the later stages of the training. Another method would be setting the highest tolerated learning rate in the beginning, but combined with a quicker decline using a higher time decay value (time decay means nothing else but: learning rate *= 1/(1+decay*iterations), so if you set a value of 0.0001 for example, the learning rate will have reached half its original value after 10,000 iterations).
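In code, that decay rule is just (a minimal illustration of the formula as described, with the 0.0001 example plugged in):

```python
# Time-decay rule as described: effective rate = initial / (1 + decay * t)
def decayed_lr(lr0, decay, iteration):
    return lr0 / (1.0 + decay * iteration)

# With decay = 0.0001, the rate halves after 10,000 iterations:
print(decayed_lr(0.001, 0.0001, 10_000))   # 0.0005
print(decayed_lr(0.001, 0.0001, 30_000))   # a quarter of the original
```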

But if the loss still fails to improve after many thousands of iterations (I almost expect this to happen with this formula! The challenge was to force the network to fail, and your example might do a very good job!), it is quite possible that we have just proven that your function actually is a pretty close approximation to ideal randomness (although it will technically still be pseudo-random).

If (!!) the neural network is still able to reveal some non-random principles, we should (in theory) over time at least see that
0.5*sqrt(2*MSE) < y

By the way: I set the range for y only to 0-100.
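The training reports further down in this thread express this conversion as sqrt(2*(L/n)), i.e. turning the reported half-MSE loss L back into the quoted "avg. abs. error (before rescaling)". As a sketch (the rescaling by the 0-100 label range is my own assumption, not taken from the EA):

```python
import math

# Sketch: the reports quote sqrt(2*(L/n)) as the average absolute
# error on the 0-1 scaled outputs, where L is the half-MSE loss and
# n the number of output neurons. Multiplying by the 0-100 label
# range (my assumption) puts it back on the original scale.
def report_abs_error(loss, n_outputs=1, y_range=100.0):
    return math.sqrt(2.0 * (loss / n_outputs)) * y_range

# The first report's loss of 0.060732458 corresponds to the quoted
# 0.3485... before rescaling, i.e. roughly 34.85 on the 0-100 scale.
print(report_abs_error(0.060732458))
```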

Okay... enough looking at computer screens for today...

For something that is supposed to be capable of finding the best formula for given inputs and outputs, it sure has a lot of parameters that need to be manually tweaked.

Anyway ... points printed on the chart keep going up with each new training iteration (now at 60369), but I am going to let it run as-is until
tomorrow, then I'll post a new screenshot.

Does somebody have a link for a good NN EA? Or is this something people are currently developing? Also, I'm new to trading (2 months). I've read some stuff about NNs and it's interesting!

I've stopped training in the optimizer because the results weren't getting any better and I wanted to see details. As you can see in the attached screenshots, the ANN already went through more than 14 million iterations, but the average error is still very high and it's NOT improving with new iterations.

Since using 3 hidden layers with 100 neurons per layer didn't work out, I've deleted the files generated by the ANN and started training a new ANN using 10 hidden layers with 500 neurons each (as per your comment a few posts back). Here is a screenshot of the new training session, so you can see the initial results. I will be running this through the optimizer now to speed up the training process.

Here are the reports from the 1st test, using 3 hidden layers with 100 neurons each, which ran through 14170063 iterations before I stopped it:

=============== PROJECT 'ANN_challenge' (TRAINING SUMMARY) ===============
network name: MT5 forum challence
training data start from: 2019.10.14 00:00
training data end at: 2019.10.17 23:58
symbol (active chart window): EURUSD
MODEL: multilayer perceptron (fully connected)
neural network architecture:
0: input layer, 3 neurons
1: hidden layer, 100 neurons (400 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
2: hidden layer, 100 neurons (10100 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
3: output layer, 1 neurons (101 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
loss function: MSE
total number of neurons: 204 plus 3 bias neurons
dead neurons (apoptosis): 0
total number of weights: 10601 (including 201 bias weights)
ADDITIONAL LEARNING PARAMETERS:
learning rate decay: 0.0001
learning rate momentum: 0.3
optimizer: Nesterov
optimization coeff. beta2: 0.9
dropout level: 0.0
weight initialization method: Chris_uniform
INPUT FEATURE SCALING PARAMETERS:
method: minmax_method
exponent: 1.0
radius/clipping: 10.0
baseline shift: 0.0
LABEL SCALING PARAMETERS:
method: minmax_method
exponent: 1.0
radius/clipping: 1.0
baseline shift: 0.0
RESULTS:
total number of backprop. iterations: 14170063 (=773 passes over 18331 training set samples)
total training time: 274 minutes
MSE loss (entire network): 0.060732458
single output neuron loss (average): 0.060732458
with sqrt(2*(L/n))=0.3485181718 avg. abs. error (before rescaling)
=============== PROJECT 'ANN_challenge' (TEST REPORT) ===============
network name: MT5 forum challence
test data start from: 2019.10.14 00:00
test data end at: 2019.10.17 23:58
symbol (active chart window): EURUSD
total number of samples in the test set: 22515
the network has been trained on 14170063 total backpropagation iterations
(=773 passes over 18331 training set samples)
MODEL: multilayer perceptron (fully connected)
neural network architecture:
0: input layer, 3 neurons
1: hidden layer, 100 neurons (400 incoming weights), activation: sigmoid (logistic)
2: hidden layer, 100 neurons (10100 incoming weights), activation: sigmoid (logistic)
3: output layer, 1 neurons (101 incoming weights), activation: sigmoid (logistic)
loss function: MSE
total number of dead neurons (apoptosis): 0
total number of neurons: 204 plus 3 bias neurons
total number of weights: 10601 (including 201 bias weights)
=============== TEST RESULTS SUMMARY ===============
LABELS (=TARGETS / 'REAL' VALUES):
mean: 49.94030646
median: 46.0
variance: 833.00869208
standard deviation: 28.86188996
excess kurtosis: -1.21960764
sample skewness: 0.06041718
median skewness (Pearson): 0.40956844
PREDICTIONS (=RESCALED OUTPUTS):
mean: 61.82898731
median: 48.62273801
variance: 209.89952254
standard deviation: 14.48790953
excess kurtosis: -1.86380558
sample skewness: 0.3105026
median skewness (Pearson): 2.73460763
TEST OUTCOME (=RESCALED OUTPUTS VS LABELS):
mean squared error (MSE): 441.38377362
standard error of regression (=SER,=RMSE): 21.00913548
mean absolute error (MAE): 16.29652085
maximum absolute deviation (MAD): 48.62272513
explained variance (SSE): 4725887.75010007
residual variance (SSR): 9937755.66295825
total variance (SST): 21675447.70111327
R squared (coefficient of determination): 0.54152017
=============== INDIVIDUAL OUTPUT NEURON RESULTS ===============
output[1] my fn function result 'r':
label mean: 49.94030646, label var.: 833.00869208, label std.dev.:28.86188996, label exc. kurtosis: -1.21960764, label median skewness: 0.13652281,
output mean: 61.82898731, output var.: 209.89952254, output std.dev.: 14.48790953, output exc. kurtosis: -1.86380558, output median skewness: 0.91153588,
MAE: 16.29652085, MAD: 48.62272513, MSE: 441.38377362, SER: 21.00913548, SSE: 4725887.75010007, SSR: 9937755.66295825, SST: 21675447.70111327, R2: 0.54152017

And these are the results of a new test, using an ANN with 10 hidden layers and 500 neurons each ...

=============== PROJECT 'ANN_challenge' (TRAINING SUMMARY) ===============
network name: MT5 forum challence
training data start from: 2019.10.17 00:00
training data end at: 2019.10.18 23:59
symbol (active chart window): EURUSD
MODEL: multilayer perceptron (fully connected)
neural network architecture:
0: input layer, 3 neurons
1: hidden layer, 500 neurons (2000 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
2: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
3: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
4: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
5: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
6: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
7: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
8: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
9: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
10: output layer, 1 neurons (501 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
loss function: MSE
total number of neurons: 4504 plus 10 bias neurons
dead neurons (apoptosis): 0
total number of weights: 2006501 (including 4501 bias weights)
ADDITIONAL LEARNING PARAMETERS:
learning rate decay: 0.0001
learning rate momentum: 0.3
optimizer: Nesterov
optimization coeff. beta2: 0.9
dropout level: 0.0
weight initialization method: Chris_uniform
INPUT FEATURE SCALING PARAMETERS:
method: minmax_method
exponent: 1.0
radius/clipping: 10.0
baseline shift: 0.0
LABEL SCALING PARAMETERS:
method: minmax_method
exponent: 1.0
radius/clipping: 1.0
baseline shift: 0.0
RESULTS:
total number of backprop. iterations: 203760 (=1068 passes over 190 training set samples)
total training time: 416 minutes
MSE loss (entire network): 0.1146638594
single output neuron loss (average): 0.1146638594
with sqrt(2*(L/n))=0.4788817379 avg. abs. error (before rescaling)
=============== PROJECT 'ANN_challenge' (TEST REPORT) ===============
network name: MT5 forum challence
test data start from: 2019.10.17 00:00
test data end at: 2019.10.18 23:45
symbol (active chart window): EURUSD
total number of samples in the test set: 192
the network has been trained on 203952 total backpropagation iterations
(=1069 passes over 190 training set samples)
MODEL: multilayer perceptron (fully connected)
neural network architecture:
0: input layer, 3 neurons
1: hidden layer, 500 neurons (2000 incoming weights), activation: sigmoid (logistic)
2: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
3: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
4: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
5: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
6: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
7: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
8: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
9: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
10: output layer, 1 neurons (501 incoming weights), activation: sigmoid (logistic)
loss function: MSE
total number of dead neurons (apoptosis): 0
total number of neurons: 4504 plus 10 bias neurons
total number of weights: 2006501 (including 4501 bias weights)
=============== TEST RESULTS SUMMARY ===============
LABELS (=TARGETS / 'REAL' VALUES):
mean: 52.85416667
median: 29.0
variance: 787.65624819
standard deviation: 28.0652142
excess kurtosis: -1.17058506
sample skewness: 0.04767082
median skewness (Pearson): 2.54986474
PREDICTIONS (=RESCALED OUTPUTS):
mean: 49.69327202
median: 49.97117145
variance: 8.36744753
standard deviation: 2.89265406
excess kurtosis: 34.61102201
sample skewness: 5.36822157
median skewness (Pearson): -0.28821223
TEST OUTCOME (=RESCALED OUTPUTS VS LABELS):
mean squared error (MSE): 809.83882906
standard error of regression (=SER,=RMSE): 28.45766732
mean absolute error (MAE): 24.4477848
maximum absolute deviation (MAD): 50.24216175
explained variance (SSE): 1606.54992483
residual variance (SSR): 155489.05517977
total variance (SST): 159126.21266913
R squared (coefficient of determination): 0.02285706
=============== INDIVIDUAL OUTPUT NEURON RESULTS ===============
output[1] my fn function result 'r':
label mean: 52.85416667, label var.: 787.65624819, label std.dev.:28.0652142, label exc. kurtosis: -1.17058506, label median skewness: 0.84995491,
output mean: 49.69327202, output var.: 8.36744753, output std.dev.: 2.89265406, output exc. kurtosis: 34.61102201, output median skewness: -0.09607074,
MAE: 24.4477848, MAD: 50.24216175, MSE: 809.83882906, SER: 28.45766732, SSE: 1606.54992483, SSR: 155489.05517977, SST: 159126.21266913, R2: 0.02285706

Attached are screenshots taken from the Strategy Tester showing the "Passes/Custom max" chart, raw data from the last couple of iterations, and a detailed view of the ANN training in visual mode.

I've switched to visual mode now, to see what the ANN looks like. From what I can see, the larger ANN, after 210k iterations, thinks that all the outputs are somewhere between 49 and 51, with the "predicted result" slowly moving up or down with each iteration. It looks almost as if the large ANN is updating the weights to find the average value of all possible outputs (the range of outputs is between 0 and 100 in this test set).

Ok. I think, after these two tests, it is fairly safe to say that the ANN is useless when the output is as far away from linear as possible, as is the case with this pseudo-random generator.

This was fun. The outcome was kind-of expected, but it was fun getting there.

So ... Chris, would you accept another challenge? ;)

No random numbers this time. Now, I'd like to use price data. No trading, just training and prediction.

For input neurons, use the Open price of the last 100 bars (completed bars, not the current unfinished bar).

For output neurons, use the Close price of the exact same bars. That's 100 inputs and 100 outputs.

In this scenario, once the ANN is fully trained, the only thing it would actually have to "predict" is the last Close price, since the first 99 outputs are always going to be identical to the last 99 inputs. The challenge is to find out how good the ANN will be at predicting the result of the last output neuron when presented with previously unseen data for the input neurons. But ... I'm also curious to see if the ANN can "figure out" that the first 99 outputs are just copies of the last 99 inputs and get to a state where it would always produce correct results for the first 99 outputs.

Feel free to increase or decrease the number of input and output neurons, if you think it can improve the results. I've used 100 here only as an
example.
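As a sketch of how one such training sample could be assembled (hypothetical helper, not from the EA; bar arrays are assumed oldest-first):

```python
# Hypothetical sketch of one training sample for the proposed setup:
# inputs = Open prices of the last n completed bars,
# labels = Close prices of those same bars.
def make_sample(opens, closes, n=100):
    assert len(opens) >= n and len(closes) >= n
    x = list(opens[-n:])    # n input neurons
    y = list(closes[-n:])   # n output neurons
    return x, y

# In a gap-free series each Close equals the next bar's Open, so the
# first n-1 labels are just copies of inputs 2..n:
opens  = [100.0 + i for i in range(120)]
closes = [100.0 + i + 1 for i in range(120)]
x, y = make_sample(opens, closes)
assert y[:-1] == x[1:]   # only y[-1] carries genuinely new information
```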

PS. I'll try to do the same with my very simple ANN (source code I've posted earlier in this thread).
