Machine learning in trading: theory, models, practice and algo-trading - page 609

 
Sergey Chalyshev:


Throw your network in the trash if it reacts like that to the PRNG values. A normal network works and learns from any initial values, even zeros.


Nonsense.
 

  • Determination of optimal DNN hyperparameters

Typically, the number of hidden layers, the number of neurons in each hidden layer, the activation, initialization and regularization functions used, and the learning rate are considered the hyperparameters of a neural network. The structure of hyperparameter optimization is shown in the figure below.

Fig. 1. Structure of hyperparameters in a neural network and ways to optimize them

There are three ways to optimize hyperparameters:

  1. Grid search
  2. Genetic optimization
  3. Bayesian optimization

In the first case, a vector of several fixed values is specified for each hyperparameter. Then, using the caret::train() function or your own script, the model is trained on every combination of hyperparameter values. After that, the model with the best classification quality is selected, and its parameters are accepted as optimal. The disadvantage of this method is that, by setting a grid of values, we are likely to miss the optimum.
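For illustration, a minimal grid-search sketch with caret::train() might look like this. The data frame dt with a factor target Class, and the choice of the "mlp" model tuned only by hidden-layer size, are my assumptions, not the article's setup:

library(caret)

grid <- expand.grid(size = c(5, 10, 20, 50))        # fixed candidate values for the only tuning parameter
ctrl <- trainControl(method = "cv", number = 5)     # 5-fold cross-validation

fit <- train(Class ~ ., data = dt,
             method     = "mlp",                    # single-hidden-layer perceptron from RSNNS
             tuneGrid   = grid,
             trControl  = ctrl,
             preProcess = c("center", "scale"))

fit$bestTune                                        # the combination with the best CV quality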

In the second case, a stochastic search for the best parameters is performed using genetic algorithms. We have already discussed several genetic optimization algorithms in detail, so we will not repeat them here.
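As a hedged sketch of this variant, one option in R is the GA package (my choice for illustration; the earlier discussion did not necessarily use it). The fitness function cv_score() is hypothetical: it should train the model with the proposed hyperparameters and return the quality metric to maximize.

library(GA)

cv_score <- function(x) {
  n_neurons  <- round(x[1])
  learn_rate <- x[2]
  # ... train the model here and return, e.g., mean cross-validated accuracy ...
  runif(1)                                  # placeholder so the sketch runs
}

res <- ga(type    = "real-valued",
          fitness = cv_score,
          lower   = c(5, 0.01),             # lower bounds: n_neurons, learn_rate
          upper   = c(100, 1.0),            # upper bounds
          popSize = 20,
          maxiter = 30)

res@solution                                # best hyperparameter vector found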

The third case uses the Bayesian approach (Gaussian processes and IMS), which we will test in this article. We will use the rBayesianOptimization package (version 1.1.0). The theory behind the applied methods is given in Jasper Snoek, Hugo Larochelle, Ryan P. Adams (2012), "Practical Bayesian Optimization of Machine Learning Algorithms".
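A minimal sketch with rBayesianOptimization is shown below. The objective fit_dnn() is hypothetical: it should train the DNN with the proposed hyperparameters and return the metric to maximize in the Score element.

library(rBayesianOptimization)

fit_dnn <- function(n1, n2, learnRate) {
  # ... train and evaluate the network here ...
  acc <- runif(1)                           # placeholder score
  list(Score = acc, Pred = 0)
}

opt <- BayesianOptimization(
  fit_dnn,
  bounds = list(n1 = c(10L, 100L),          # neurons in hidden layer 1 (integer)
                n2 = c(5L, 50L),            # neurons in hidden layer 2 (integer)
                learnRate = c(0.1, 1.0)),
  init_points = 10,                         # random points before fitting the Gaussian process
  n_iter      = 20,                         # Bayesian optimization iterations
  acq         = "ucb")                      # acquisition function

opt$Best_Par                                # best hyperparameters found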

Hyperparameters of neural networks can in general be divided into two groups: global and local (node-level). The global ones include the number of hidden layers, the number of neurons in each layer, the learning rate and momentum, and the initialization of the neuron weights. The local ones include the layer type, activation function, dropout/dropconnect and other regularization parameters.
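Purely as an illustration of this split, the two groups could be written down in R like this; the names are mine, not part of any package API:

hyper <- list(
  global = list(n_layers  = 2,
                n_neurons = c(50, 20),
                learnRate = 0.5,
                momentum  = 0.9,
                init      = "xavier"),
  local  = list(layer_type = "dense",
                activation = "tanh",
                dropout    = c(0.2, 0.1)))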

 

So, once again, multiple training runs on the same data are recommended.

I have done several manual training runs here, and I am a little confused. I assumed the error would decrease steadily, but it jumps around.

Even repeated training with the same network structure can give results that differ by 3-5%, which can also get in the way of choosing the right structure.

 

Can you suggest any other traders who offer training? A friend recommended training (from Polikarp Brekhunov - changed by Artyom Trishkin); who knows, maybe there are other traders who run training courses?

 
elibrarius:

So, once again, multiple training runs on the same data are recommended.

I have done several manual training runs here, and I am a little confused. I assumed the error would decrease steadily, but it jumps around.

Even repeated training with the same network structure can give results that differ by 3-5%, which can also get in the way of choosing the right structure.

This is how it should be. The initial weights are initialized with small random values (depending on the type of initialization). To get reproducible results, you need to set the PRNG to the same state before each training run. That is what set.seed() is for.
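A minimal illustration of this point, using nnet only as a compact stand-in for any network (not the one discussed here):

library(nnet)

d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- factor(d$x1 + d$x2 > 1)

set.seed(12345)                             # same PRNG state before each run
m1 <- nnet(y ~ ., data = d, size = 5, trace = FALSE)
set.seed(12345)
m2 <- nnet(y ~ ., data = d, size = 5, trace = FALSE)

all.equal(m1$wts, m2$wts)                   # TRUE: identical initial weights, identical result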

Good luck

 
Vladimir Perervenko:

This is how it should be. The initial weights are initialized with small random values (depending on the type of initialization). To get reproducible results, you need to set the PRNG to the same state before each training run. That is what set.seed() is for.

Good luck

Ah, well, I was pre-training the RBM in darch for 2 epochs with learnRate = 0.3.

 
Vladimir Perervenko:

  • Determination of optimal DNN hyperparameters

In the second case, a stochastic search for the best parameters is performed using genetic algorithms.

For trading, the idea of optimizing a model (trading system, TS) is highly questionable, since any optimization looks for peaks/troughs, and we do not need those. Ideally, we need smooth plateaus that are as large as possible. These plateaus should have one wonderful property: changes in the model parameters should NOT cause it to leave the plateau.

This is about optimization.

To this we should add the problem of the stability of the model parameters: if they change, they should change only within a fairly narrow (5%) confidence interval. As I see it, stable model parameters mean that the model's performance sits on a certain plateau; and if we suddenly get a very good result while testing the model, it means we have hit a minimum point, an unstable state that will never occur in practice, and moreover there will be a stop-out right next to this optimal point.

PS.

By the way, the developers have provided exactly this possibility in the tester: searching for a plateau by color. Personally, I use the tester as a finishing tool and take the parameters that correspond to a square surrounded by squares of the same color. That is a clear expression of my notion of a "plateau".
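A sketch of this "plateau" idea in code: given a grid of test-quality values over two parameters, keep only the cells whose neighbours all lie within a small tolerance of the cell itself. The tolerance and the grid are illustrative, not a recipe.

is_plateau <- function(score, tol = 0.05) {
  ok <- matrix(FALSE, nrow(score), ncol(score))
  for (i in 2:(nrow(score) - 1)) {
    for (j in 2:(ncol(score) - 1)) {
      nb <- score[(i - 1):(i + 1), (j - 1):(j + 1)]       # the cell and its 8 neighbours
      ok[i, j] <- all(abs(nb - score[i, j]) <= tol * abs(score[i, j]))
    }
  }
  ok
}

# score <- matrix(...)                       # e.g. profit factor over a 2-D parameter grid
# which(is_plateau(score), arr.ind = TRUE)   # parameter cells sitting on a plateau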

 
elibrarius:

Ah, well, I was pre-training the RBM in darch for 2 epochs with learnRate = 0.3.

Inside darch() there is a seed parameter, NULL by default. Set it to some state, for example seed = 12345.

That is a small learnRate value. Start with learnRate = 0.7 and numEpochs = 10 for both the RBM and the NN. But these are just off-the-cuff figures; you need to optimize them for your specific data set.
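A hedged sketch of this advice with darch; the argument names follow the darch 0.12 conventions referred to in this thread (check the version you have installed), and X, Y are a hypothetical predictor matrix and target.

library(darch)

model <- darch(X, Y,
               layers          = c(ncol(X), 50, 2),   # input / hidden / output sizes
               seed            = 12345,               # fix the PRNG state for reproducibility
               rbm.numEpochs   = 10,                  # RBM pre-training epochs
               rbm.learnRate   = 0.7,                 # instead of 0.3
               darch.numEpochs = 10)                  # fine-tuning epochs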

Good luck

 
Vladimir Perervenko:
Inside darch() there is a seed parameter, NULL by default. Set it to some state, for example seed = 12345.

That is a small learnRate value. Start with learnRate = 0.7 and numEpochs = 10 for both the RBM and the NN. But these are just off-the-cuff figures; you need to optimize them for your specific data set.

Good luck

Thank you! I will try it.
 
SanSanych Fomenko:

For trading, the idea of optimizing a model (trading system, TS) is highly questionable, since any optimization looks for peaks/troughs, and we do not need those. Ideally, we need smooth plateaus that are as large as possible. These plateaus should have one wonderful property: changes in the model parameters should NOT cause it to leave the plateau.

This is about optimization.

To this we should add the problem of the stability of the model parameters: if they change, they should change only within a fairly narrow (5%) confidence interval. As I see it, stable model parameters mean that the model's performance sits on a certain plateau; and if we suddenly get a very good result while testing the model, it means we have hit a minimum point, an unstable state that will never occur in practice, and moreover there will be a stop-out right next to this optimal point.

PS.

By the way, the developers have provided exactly this possibility in the tester: searching for a plateau by color. Personally, I use the tester as a finishing tool and take the parameters that correspond to a square surrounded by squares of the same color. That is a clear expression of my notion of a "plateau".

1. What optimization are you talking about, and of what? What plateau? What model? If you mean a neural network, it would be strange not to train the DNN (optimize its parameters) before using it.

2. Which model parameters(?) should be stable?

I don't follow your reasoning.

I was talking about the optimization of DNN hyperparameters, which absolutely must be done, and not in the tester.

Reason: