Machine learning in trading: theory, models, practice and algo-trading - page 931

 
Dr. Trader:

Genetics tries to find the correct parameters within a limited number of function calls. By narrowing the range of this parameter (30 instead of 200), genetics can investigate the region from 1 to 30 in more detail. And this is correct: if you know the specific limits for some model parameter, it is better to give that information to genetics right away.


Alternatively:

Add this line (the green one) to the code, and the genetic algorithm will have 500 individuals in the population instead of the default 50. It will then be able to check 10 times more models (but the script runtime will also increase 10-fold), trying as many combinations of model parameters as possible. Even with a maximum of 200 neurons, I think genetics will be able to find the best result of 0.85, or at least get close to it.

I cannot understand what genetics you are talking about. There is no genetics in ELM. Just look at the ELM theory or the description of the elmNN package:

"ELM algorithm is an alternative training method for SLFN ( Single Hidden Layer Feedforward Networks ) which does not need any iterative tuning nor setting parameters such as learning rate, momentum, etc., which are current issues of the traditional gradient-based learning algorithms ( like backpropagation ).

Training of a SLFN with ELM is a three-step learning model:

Given a training set P = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, hidden node output function G(a, b, x), and the number of hidden nodes L:

1) Assign randomly the hidden node parameters (a_i, b_i), i = 1, ..., L. That is, the arc weights between the input layer and the hidden layer, and the hidden layer biases, are randomly generated.

2) Calculate the hidden layer output matrix H using one of the available activation functions.

3) Calculate the output weights B: B = ginv(H) %*% T (matrix multiplication), where T is the target output of the training set.

ginv(H) is the Moore-Penrose generalized inverse of hidden layer output matrix H. This is calculated by the MASS package function ginv.

Once the SLFN has been trained, the output of a generic test set is simply Y = H %*% B (matrix multiplication). Salient features:

- The learning speed of ELM is extremely fast.

- Unlike traditional gradient-based learning algorithms which only work for differentiable activation functions, ELM works for all bounded nonconstant piecewise continuous activation functions.

- Unlike traditional gradient-based learning algorithms facing several issues like local minima, improper learning rate and overfitting, etc., ELM tends to reach the solutions straightforward without such trivial issues.

- The ELM learning algorithm looks much simpler than other popular learning algorithms: neural networks and support vector machines."
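
To make those three steps concrete, here is a minimal R sketch (an illustration only, not the elmNN code itself; the input matrix X, the target matrix Tm and a logistic activation are assumed for the example):

library(MASS)  # ginv() gives the Moore-Penrose generalized inverse

# Minimal ELM training following the three steps quoted above.
# X: N x n input matrix, Tm: N x m target matrix (the "T" in the description), L: number of hidden nodes.
elm_train <- function(X, Tm, L, act = function(z) 1 / (1 + exp(-z))) {
  n <- ncol(X)
  A <- matrix(runif(n * L, -1, 1), n, L)   # step 1: random input-to-hidden weights
  b <- runif(L, -1, 1)                     # step 1: random hidden-layer biases
  H <- act(sweep(X %*% A, 2, b, "+"))      # step 2: hidden layer output matrix H
  B <- ginv(H) %*% Tm                      # step 3: output weights B = ginv(H) %*% T
  list(A = A, b = b, B = B, act = act)
}

# Prediction on new data: Y = H %*% B
elm_predict <- function(model, X) {
  H <- model$act(sweep(X %*% model$A, 2, model$b, "+"))
  H %*% model$B
}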

Even with a small number of neurons it is impossible to get two identical neural networks. You are defining the threshold for converting the continuous output into a class incorrectly. A threshold of 0.5 is the worst case; the median is acceptable, but there are more advanced approaches.
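
A small illustration of the thresholding point (the output vector is made up; only the idea of using the median instead of a fixed 0.5 comes from the post):

pred <- c(0.31, 0.42, 0.55, 0.61, 0.72)          # hypothetical continuous network outputs
class_fixed  <- as.integer(pred > 0.5)           # fixed threshold of 0.5 (the worst case above)
class_median <- as.integer(pred > median(pred))  # median threshold, as suggested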

Good luck

 
Maxim Dmitrievsky:

Oh, that will be something to read; I forgot what it is. Or rather, I forgot what the difference between GBM and XGBoost is... or never knew.

gbm seems to be able to boost any model, while xgb seems to work on trees only.

I know that boosting is slightly better than bagging, which is how a forest is built. I don't know about overfitting.
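
To make the comparison concrete, a rough sketch of how the two R packages are usually called (train_df is a hypothetical data frame with a binary target y; the parameters are arbitrary placeholders):

library(gbm)
library(xgboost)

# gbm: classical gradient boosting machine, boosting regression trees by default
m_gbm <- gbm(y ~ ., data = train_df, distribution = "bernoulli",
             n.trees = 100, interaction.depth = 3, shrinkage = 0.1)

# xgboost: gradient boosting on trees with a regularized objective
m_xgb <- xgboost(data = as.matrix(train_df[, names(train_df) != "y"]),
                 label = train_df$y, nrounds = 100, max_depth = 3, eta = 0.1,
                 objective = "binary:logistic", verbose = 0)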


Overfitting has NOTHING to do with the model type.

A model is overfitted in two cases:

  • the presence of noise predictors is the main problem and is solved ONLY by the selection of predictors;
  • too precise model fitting, i.e. "optimization" of its parameters, usually on a small sample. This kind of overfitting is dealt with by developer experience.

 
SanSanych Fomenko:

SanSanych, stop getting hysterical

 
Vladimir Perervenko:

I cannot understand what genetics you are talking about. There is no genetics in ELM. Just look at the ELM theory or the description of the elmNN package quoted above.

"ELM algorithm is an alternative training method for SLFN ( Single Hidden Layer Feedforward Networks ) which does not need any iterative tuning nor setting parameters such as learning rate, momentum, etc., which are current issues of the traditional gradient-based learning algorithms ( like backpropagation ).

Training of a SLFN with ELM is a three-step learning model:

Given a training set P = {(xi , ti )|xi E R , ti E R , i = 1,..., N}, hidden node output function G(a, b, x), and the number of hidden nodes L

1) Assign randomly hidden node parameters (ai , bi ), i = 1,..., L. It means that the arc weights between the input layer and the hidden layer and the hidden layer are randomly generated.

2) Calculate the hidden layer output matrix H using one of the available activation functions.

3) Calculate the output weights B: B = ginv(H) %*% T ( matrix multiplication ), where T is the target output of the training set.

ginv(H) is the Moore-Penrose generalized inverse of hidden layer output matrix H. This is calculated by the MASS package function ginv.

Once the SLFN has been trained, the output of a generic test set is simply Y = H %*% B ( matrix multiplication ). Salient features:

- The learning speed of ELM is extremely fast.

- Unlike traditional gradient-based learning algorithms which only work for differentiable activation functions, ELM works for all bounded nonconstant piecewise continuous activation functions.

- Unlike traditional gradient-based learning algorithms facing several issues like local minima, improper learning rate and overfitting, etc., ELM tends to reach the solutions straightforward without such trivial issues.

- The ELM learning algorithm looks much simpler than other popular learning algorithms: neural networks and support vector machines."

Even with a small number of neurons it is impossible to get two identical neural networks. You are defining the threshold for converting the continuous output into a class incorrectly. A threshold of 0.5 is the worst case; the median is acceptable, but there are more advanced approaches.

Good luck

On my small training file there is a 100% match between trainings...

 

Over what period of time do you dump the data?

I have a span of 2 years, with 15 seconds between data points. Predictors: 30 natural ones and over 1000 generated in the "(double)(val1 < val2)" format.

At first I also thought that the number of predictors had to be reduced, but practice showed that more is better.
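
As a sketch of that generation scheme in R (raw is a hypothetical data frame holding the natural predictors; none of this is the original code):

pairs <- combn(names(raw), 2, simplify = FALSE)        # all pairs of natural predictors
generated <- lapply(pairs, function(p) as.numeric(raw[[p[1]]] < raw[[p[2]]]))  # (double)(val1 < val2)
names(generated) <- sapply(pairs, function(p) paste(p, collapse = "_lt_"))
X <- cbind(raw, as.data.frame(generated))              # natural plus generated comparison features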

Of course, 1000 predictors over 2 years gives me about 3 GB. Using R for such volumes is not serious.

Python has surpassed R in data mining because of Cython and Jython, which are plugged into projects like TensorFlow, Spark, MXNet...

 
Dr. Trader:

Genetics tries to find the correct parameters within a limited number of function calls. By narrowing the range of this parameter (30 instead of 200), genetics can investigate the region from 1 to 30 in more detail. And this is correct: if you know the specific limits for some model parameter, it is better to give that information to genetics right away.


Alternatively:

Add this line (the green one) to the code, and the genetic algorithm will have 500 individuals in the population instead of the default 50. It will then be able to check 10 times more models (but the script runtime will also increase 10-fold), trying as many combinations of model parameters as possible. Even with a maximum of 200 neurons, I think genetics will be able to find the best result of 0.85, or at least get close to it.

Thanks!!!! Indeed, the result has improved. Well, let's see how it goes... The main thing is to earn steadily...

 
Vladimir Perervenko:

I can't understand what genetics you're talking about.

It's in the R script which I showed Michael about a hundred pages ago. The genetic algorithm goes through the parameters for elmnn (activation function, PRNG seed, number of hidden neurons). The fitness function for genetics trains a committee of elmnn models using these parameters, evaluated through k-fold, etc.

I wrote this script myself when I was inspired by your article about elmnn and Bayesian optimization. But I use genetics instead of Bayes; it works much faster this way, and I've tailored the evaluation to my own taste.
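
For reference, a rough sketch of that kind of setup (this is not the actual script; the GA and elmNN packages are assumed, the committee and k-fold evaluation are simplified to a single model scored on a plain validation split, and X, Y, Xval, Yval are placeholders):

library(GA)     # genetic algorithm; popSize defaults to 50
library(elmNN)  # elmtrain() / predict()

acts <- c("sig", "sin", "radbas", "hardlim", "purelin")   # candidate activation functions

# Fitness: train one elmnn model for a given (activation, PRNG seed, number of neurons) and score it.
fitness_fn <- function(par, X, Y, Xval, Yval) {
  act  <- acts[ceiling(par[1])]       # which activation function
  seed <- as.integer(par[2])          # PRNG seed for the random hidden weights
  nhid <- as.integer(par[3])          # number of hidden neurons
  set.seed(seed)
  model <- elmtrain(x = X, y = Y, nhid = nhid, actfun = act)
  pred  <- predict(model, newdata = Xval)
  -mean((pred - Yval)^2)              # ga() maximizes, so return the negative error
}

ga_result <- ga(
  type    = "real-valued",
  fitness = function(par) fitness_fn(par, X, Y, Xval, Yval),
  lower   = c(1, 1, 1),
  upper   = c(length(acts), 1e6, 200),
  popSize = 500,                      # "500 individuals instead of the default 50"
  maxiter = 50
)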

 
SanSanych Fomenko:

The model is overfitted in two cases:

  • the presence of noise predictors is the main problem and is solved ONLY by the selection of predictors

The question is not just for you, but for everyone.

In practice it is indeed so: if there are noise predictors, the NN cannot get above 50-55%. If the predictors are properly selected, it can even give 70%.

But why is this so?
1) After all, the NN should automatically assign weights close to 0 to noise predictors during training (which is equivalent to excluding them from the selection). We saw that in the problem at the beginning of this thread.
2) If training does not push their weights down, then at least dropout should sift them out...

 
Dr. Trader:

This is in the R script which I showed Michael about a hundred pages ago. The genetic algorithm goes through the parameters for elmnn (activation function, PRNG seed, number of hidden neurons). The fitness function for genetics trains a committee of elmnn models using these parameters, evaluated through k-fold, etc.

Copy it to your blog, maybe someone else will find it useful. It's unrealistic to dig anything up here later.
 
elibrarius:

The question is not just for you, but for everyone.

In practice it is indeed so: if there are noise predictors, the NN cannot get above 50-55%. If the predictors are properly selected, it can even give 70%.

But why is this so?
1) After all, the NN should automatically assign weights close to 0 to noise predictors during training (which is equivalent to excluding them from the selection). We saw that in the problem at the beginning of this thread.
2) If training does not push their weights down, then at least dropout should sift them out...

The extra dimension is still there, and you have to draw a curve through it somehow, possibly with a large error.

Dropout, on the contrary, increases the error, doesn't it?
