Discussion of article "Deep Neural Networks (Part IV). Creating, training and testing a model of neural network" - page 2
Something about supervised training on such a very small set seems suspicious - only 1000 examples (25% of the total).
Have you compared the quality of training on these 1000 and on all 3000 (pretrain + train)? Is the difference really that small?
One advantage of using pretraining is that fine-tuning then needs far fewer examples, training epochs and a lower learning rate. For pre-training, on the other hand, more examples are desirable. In my experience 1500-2000 examples are quite sufficient for pre-training. Pre-training places the DNN weights in the region of an optimal solution; after that the network only needs to be fine-tuned, and increasing these values leads to a degradation of quality.
But you can check this experimentally.
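A quick way to check it (a minimal sketch, not from the article; I assume DTcut and the parameter list par_0 from the article's listings are already present in env) is to fine-tune the same network once on the ~1000-example train set and once on pretrain + train (~3000 examples), keeping everything else identical, and compare the validation errors in the logs:
evalq({
  require(darch)
  require(dplyr)
  require(magrittr)
  # small set: the ~1000 "train" examples only
  x_small <- DTcut$train$woe %>% as.data.frame()
  y_small <- DTcut$train$raw$Class %>% as.data.frame()
  # large set: pretrain + train together (~3000 examples)
  x_large <- rbind(as.data.frame(DTcut$pretrain$woe),
                   as.data.frame(DTcut$train$woe))
  y_large <- rbind(DTcut$pretrain$raw, DTcut$train$raw)["Class"]
  # identical parameters, only the amount of training data differs
  DNN_small <- darch(darch = NULL, paramsList = par_0,
                     x = x_small, y = y_small,
                     xValid = DTcut$val$woe %>% as.data.frame(),
                     yValid = DTcut$val$raw$Class %>% as.data.frame())
  DNN_large <- darch(darch = NULL, paramsList = par_0,
                     x = x_large, y = y_large,
                     xValid = DTcut$val$woe %>% as.data.frame(),
                     yValid = DTcut$val$raw$Class %>% as.data.frame())
  # compare the "Classification error on Validation set" lines of the two runs
}, env)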
Good afternoon, gentlemen traders + programmers + mathematicians...
To the author: thank you for the article, it is well explained and even understandable... )
Vladimir, I have a question of my own; please answer if possible:
OUTLINE OF THE PROBLEM AND ITS SOLUTION:
1. Training: find repetitions of a Template* in DB1* and store them in DB2*. The number of repetitions and the allowed deviation fully characterise the confidence in the Hypothesis*. The resulting DB2 with repetitions will be much smaller than DB1 for the FIs* of interest, and therefore less demanding in computing power. By converting the quotes of different FIs* to a single scale, we can save the repetitions found across different FIs* into one (unified) DB1, which will give a larger number of repeating Templates*.
2. Analysis: when a new Point* appears in the data, and accordingly a new Template*:
2.1. immediately run a new search in DB2 - a quick search; save the results in DB3* and update DB2* accordingly;
2.2. during idle time run a new search in DB1 - an accurate but slow search; save the results in DB3* and update DB2* accordingly;
2.3. based on the changes in DB3*, check (against the trading strategy) and, if necessary, change the trading position.
Deciphering of the abbreviations:
DB1 - database / history of quotes of all FIs*:
1) as a chain of Points* limited by a long history, from which groups of 20 points are analysed; after analysing a group we shift by one point and repeat the analysis;
2) as a list of Templates*.
DB2 - database of repeating Templates* that are identical either completely or partially (any number of a Template*'s variables may be discarded).
DB3 - database of Templates* that are identical, fully or partially (with deviations up to a specified level in the coordinates of the Template* Points*), to the last, i.e. currently relevant, Template*.
FI* - financial instrument (EURUSD, GBPUSD, _SPX500, ...); each financial instrument is stored in a separate file.
Point* - consists of price, date and volume, which can correspond to a bar close, a fractal, etc.
Template* - a group of 20 Points*.
Hypothesis* - the output of the FI* analysis system (for example, a neural network).
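Purely for illustration (a sketch of my own, not part of the system itself), a Point* and a Template* could be represented in R like this:
# one Point*: price, date, volume (e.g. a bar close or a fractal)
point <- data.frame(price  = 1.17835,
                    date   = as.POSIXct("2017-11-15 17:00"),
                    volume = 1250)
# a Template* is a group of 20 consecutive Points*; in DB1 the window
# is shifted by one point and the analysis is repeated
make_template <- function(points, i, n = 20) points[i:(i + n - 1), ]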
QUESTION ABOUT THE TASK:
Can the task at hand be solved using a neural network, and if so, which training method is most appropriate and why?
Let me clarify the question: is it possible to train a neural network by feeding it a significant number of Templates* (one part of the template containing the landmarks for identity detection and the other part containing a Hypothesis*)? If the answer is "yes", what type of neural network can be used for this, and have you come across any ready-made implementations of such a thing?
As far as I understand, the variants you proposed are not suitable for this.
I have stated the task in a somewhat summarised form - I didn't want to go into subtleties, but I did want an answer to the question.
For myself I can say the obvious: neural networks attract me, but I have not managed to dive into them yet.
Regards,
Vladimir
Good afternoon.
For training and classifying sequences of varying length you should definitely use an LSTM. There are many varieties of such neural networks; it all depends on the type of input data and the classification goal.
It is now possible to use from R/MT4 all kinds of neural networks available in TensorFlow/Keras/CNTK.
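As a minimal illustration (a sketch, not production code; I assume the keras R package is installed, x_train is an array [samples, timesteps, features], e.g. 20 points with 3 features each, and y_train holds 0/1 labels; sequences of different lengths would first need pad_sequences() or masking):
library(keras)

# small LSTM classifier over fixed-length sequences
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(20, 3)) %>%   # 20 time steps x 3 features
  layer_dense(units = 1, activation = "sigmoid")       # binary classification head

model %>% compile(loss = "binary_crossentropy",
                  optimizer = "adam",
                  metrics = c("accuracy"))

model %>% fit(x_train, y_train,
              epochs = 20, batch_size = 50,
              validation_split = 0.2)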
Good luck
Good!
The answer is clear, thank you.
I will move in this direction.
Vladimir, what do you think about joining forces? I think together we can overcome...
With respect,
Vladimir
Good.
We can discuss working together, but I will only be able to start after this series of articles is finished - I hope to be done by the end of the month.
However, we can begin the preparatory and explanatory work now. For effective work it is important for me to understand the basic idea, and at the moment I don't quite understand yours. Can you visualise it in some way?
Good luck
Hello,
could you please explain why different parameters are chosen for the different layers of the neural network:
darch.unitFunction = c("tanhUnit", "maxoutUnit", "softmaxUnit"),
darch.dropout = c(0.1, 0.2, 0.1),
darch.weightUpdateFunction = c("weightDecayWeightUpdate", "maxoutWeightUpdate", "weightDecayWeightUpdate"),
1) just to show that the package offers a large variety?
2) or do the specified combinations of parameters have some advantage (from your experience)?
3) or is diversity useful in principle, and one should try different parameters for different layers (not necessarily exactly those specified in the code)?
Because the layers have different activation functions.
The activation function of the first hidden layer is determined by the type of input data. Since our inputs are in the range [-1, +1], the most appropriate is tanh, but it could also be sigmoid or maxout. The second hidden layer is maxout. The output layer, of course, is softmax.
The dropout values come from experience. Different layers can, and in principle should, have different parameters. Not all packages offer such wide possibilities for varying parameters.
Choosing a combination of these hyperparameters to get the lowest classification error is an optimisation problem (there will be an example in the fifth part of the article).
So you can experiment with any parameters (as long as they are appropriate for our data).
Experiment.
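As a rough illustration of that optimisation (a sketch only, not the article's code; DTcut and par_0 are assumed to be the objects from the listings in this thread, and the error is simply measured on the validation set):
evalq({
  require(darch)
  require(magrittr)
  # a few combinations of the first-layer activation and the dropout vector
  grid <- list(
    list(unit1 = "tanhUnit",    dropout = c(0.1, 0.2, 0.1)),
    list(unit1 = "sigmoidUnit", dropout = c(0.1, 0.2, 0.1)),
    list(unit1 = "tanhUnit",    dropout = c(0.2, 0.3, 0.2))
  )
  errs <- sapply(grid, function(g) {
    p <- par_0
    p$darch.unitFunction <- c(g$unit1, "maxoutUnit", "softmaxUnit")
    p$darch.dropout      <- g$dropout
    dnn <- darch(darch = NULL, paramsList = p,
                 x = DTcut$train$woe %>% as.data.frame(),
                 y = DTcut$train$raw$Class %>% as.data.frame(),
                 xValid = DTcut$val$woe %>% as.data.frame(),
                 yValid = DTcut$val$raw$Class %>% as.data.frame())
    pred <- predict(dnn, newdata = DTcut$val$woe %>% as.data.frame(), type = "class")
    mean(pred != DTcut$val$raw$Class)   # classification error on the validation set
  })
  errs   # pick the combination with the lowest error
}, env)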
Good luck
Hello,
another question.
Why is training split into 2 stages:
1. pretrain, then train only the top DNN layer,
and
2. fine-tune the whole network?
Wouldn't we get the same result without the 2nd stage if we extend the 1st stage to
pretraining + fine-tuning the whole network at the same time
(i.e. set rbm.lastLayer = 0, bp.learnRate = 1, darch.trainLayers = T)?
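For comparison, as I understand it, the article's two-stage scheme looks roughly like this (a sketch from memory, not the article's exact code; parameter names as in par_0, exact values may differ):
evalq({
  # Stage 1: RBM pre-training of the hidden layers + backprop of the top layer only
  par_1 <- par_0
  par_1$rbm.lastLayer     <- -1                     # assumption: do not pre-train the output layer
  par_1$darch.trainLayers <- c(FALSE, FALSE, TRUE)  # fine-tune only the softmax layer
  DNN_pre <- darch(darch = NULL, paramsList = par_1,
                   x = DTcut$pretrain$woe %>% as.data.frame(),
                   y = DTcut$pretrain$raw$Class %>% as.data.frame())
  # Stage 2: fine-tune the whole network, starting from the pre-trained weights
  par_2 <- par_0
  par_2$rbm.numEpochs     <- 0                      # no further RBM training
  par_2$darch.trainLayers <- c(TRUE, TRUE, TRUE)
  DNN_fine <- darch(darch = DNN_pre, paramsList = par_2,
                    x = DTcut$train$woe %>% as.data.frame(),
                    y = DTcut$train$raw$Class %>% as.data.frame(),
                    xValid = DTcut$val$woe %>% as.data.frame(),
                    yValid = DTcut$val$raw$Class %>% as.data.frame())
}, env)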
Update. I set up an experiment (in the original post the changed lines of code were highlighted in red):
evalq({
  require(darch)
  require(dplyr)
  require(magrittr)
  Ln <- c(0, 16, 8, 0)
  nEp_0 <- 25
  #------------------
  par_0 <- list(
    layers = Ln,
    seed = 54321,
    logLevel = 5,
    # params RBM ========================
    rbm.consecutive = F, # each RBM is trained one epoch at a time
    rbm.numEpochs = nEp_0,
    rbm.batchSize = 50,
    rbm.allData = TRUE,
    rbm.lastLayer = 0,
    rbm.learnRate = 0.3,
    rbm.unitFunction = "tanhUnitRbm",
    # params NN ========================
    darch.batchSize = 50,
    darch.numEpochs = nEp_0,
    darch.trainLayers = T,
    darch.unitFunction = c("tanhUnit", "maxoutUnit", "softmaxUnit"),
    bp.learnRate = 1,
    bp.learnRateScale = 1,
    darch.weightDecay = 0.0002,
    darch.dither = F,
    darch.dropout = c(0.1, 0.2, 0.1),
    darch.fineTuneFunction = backpropagation, # rpropagation
    normalizeWeights = T,
    normalizeWeightsBound = 1,
    darch.weightUpdateFunction = c("weightDecayWeightUpdate",
                                   "maxoutWeightUpdate",
                                   "weightDecayWeightUpdate"),
    darch.dropout.oneMaskPerEpoch = T,
    darch.maxout.poolSize = 2,
    darch.maxout.unitFunction = "linearUnit")
  #---------------------------
  DNN_default <- darch(darch = NULL,
                       paramsList = par_0,
                       x = DTcut$pretrain$woe %>% as.data.frame(),
                       y = DTcut$pretrain$raw$Class %>% as.data.frame(),
                       xValid = DTcut$val$woe %>% as.data.frame(),
                       yValid = DTcut$val$raw$Class %>% as.data.frame()
                       )
}, env)
got:
INFO [2017-11-15 17:53:24] Classification error on Train set (best model): 29.1% (582/2000)
INFO [2017-11-15 17:53:24] Train set (best model) Cross Entropy error: 1.146
INFO [2017-11-15 17:53:25] Classification error on Validation set (best model): 30.54% (153/501)
INFO [2017-11-15 17:53:25] Validation set (best model) Cross Entropy error: 1.192
INFO [2017-11-15 17:53:25] Best model was found after epoch 8
INFO [2017-11-15 17:53:25] Final 0.632 validation Cross Entropy error: 1.175
INFO [2017-11-15 17:53:25] Final 0.632 validation classification error: 30.01%
INFO [2017-11-15 17:53:25] Fine-tuning finished after 4.4 secs
Your result after the second step was:
INFO [2017-11-15 17:49:45] Classification error on Train set (best model): 32.57% (326/1001)
INFO [2017-11-15 17:49:45] Train set (best model) Cross Entropy error: 1.244
INFO [2017-11-15 17:49:45] Classification error on Validation set (best model): 30.74% (154/501)
I.e. the error on the validation set is about the same, roughly 30%.
Before that I tried with
yValid = DTcut$train$raw$Class %>% as.data.frame()
as in the code of step 1, and got:
INFO [2017-11-15 17:48:58] Classification error on Train set (best model): 28.85% (577/2000)
INFO [2017-11-15 17:48:58] Train set (best model) Cross Entropy error: 1.153
INFO [2017-11-15 17:48:59] Classification error on Validation set (best model): 35.66% (357/1001)
I had thought it was impossible to get a 30% error in one stage, but after validating with the set from step 2 it became as good as yours.
It is possible that the DTcut$val set is simply better than the DTcut$train set, which is why my single stage performed as well as your two stages.
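One way to check that would be to evaluate the same trained model on both hold-out sets with predict() (a sketch; DNN_default and DTcut as above):
evalq({
  err_on <- function(dnn, x, y) {
    pred <- predict(dnn, newdata = as.data.frame(x), type = "class")
    mean(pred != y)   # simple classification error
  }
  c(train = err_on(DNN_default, DTcut$train$woe, DTcut$train$raw$Class),
    val   = err_on(DNN_default, DTcut$val$woe,   DTcut$val$raw$Class))
}, env)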