Machine learning in trading: theory, models, practice and algo-trading - page 368

 
elibrarius:
I think SanSanych meant not to bother with writing your own code, but to use ready-made functions from R

It's not just about the code. Finished code is the implementation of certain ideas.

See

gafs - predictor selection by genetic algorithm;
the final model is fit on the subset of predictors associated with the optimal number of generations, as determined by resampling.


rfe - backward selection of predictors

implements backward selection of predictors based on a ranking of predictor importance.

The predictors are ranked, and the less important ones are sequentially eliminated prior to modeling.

The goal is to find a subset of predictors that can be used to build a more accurate model.

safs - predictor selection by simulated annealing

During the search, a measure of fitness (i.e., the SA energy value) is needed to guide it. This is an internal measure of performance. During the search, the available data are the instances selected by the top-level resampling (e.g., the nine-tenths parts mentioned above). A common approach is to conduct another, nested resampling procedure; another option is to use a holdout set of samples to determine the internal estimate of performance.
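For reference, a minimal R sketch of how these three caret wrappers are called; the toy data, subset sizes, and control settings are illustrative assumptions, not recommendations:

library(caret)

# Toy data: 10 numeric predictors, binary target
set.seed(42)
X <- data.frame(matrix(rnorm(200 * 10), ncol = 10))
y <- factor(ifelse(X$X1 + X$X2 + rnorm(200) > 0, "up", "down"))

# rfe: backward elimination guided by ranked predictor importance
rfe_fit <- rfe(X, y, sizes = c(2, 4, 6, 8),
               rfeControl = rfeControl(functions = rfFuncs, method = "cv", number = 5))
predictors(rfe_fit)   # the retained subset

# gafs / safs follow the same pattern with their own control objects:
# gafs(X, y, iters = 10,
#      gafsControl = gafsControl(functions = rfGA, method = "cv", number = 5))
# safs(X, y, iters = 10,
#      safsControl = safsControl(functions = rfSA, method = "cv", number = 5))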


That said, always remember, "garbage in, garbage out."

 
SanSanych Fomenko:

It's not just about the code. Finished code is the implementation of certain ideas.

See

gafs - predictor selection by genetic algorithm;
the final model is fit on the subset of predictors associated with the optimal number of generations, as determined by resampling.


rfe - backward selection of predictors

implements backward selection of predictors based on a ranking of predictor importance.

The predictors are ranked, and the less important ones are sequentially eliminated prior to modeling.

The goal is to find a subset of predictors that can be used to build a more accurate model.

safs - predictor selection by simulated annealing

During the search, a measure of fitness (i.e., the SA energy value) is needed to guide it. This is an internal measure of performance. During the search, the available data are the instances selected by the top-level resampling (e.g., the nine-tenths parts mentioned above). A common approach is to conduct another, nested resampling procedure; another option is to use a holdout set of samples to determine the internal estimate of performance.


That said, always remember, "garbage in, garbage out."

Interesting ideas. You can't write these things yourself...

For a start, I did a correlation screening.
I sorted the predictors by their total correlation; then, starting from the least correlated one, I remove everything correlated with it and repeat with the remaining predictors (a sketch of this greedy filter follows the numbers below).

Pearson correlation matrix:
1.00, 0.97, 0.86, 0.88, 0.84, 0.80
0.97, 1.00, 0.92, 0.84, 0.79, 0.75
0.86, 0.92, 1.00, 0.73, 0.67, 0.63
0.88, 0.84, 0.73, 1.00, 0.99, 0.98
0.84, 0.79, 0.67, 0.99, 1.00, 1.00
0.80, 0.75, 0.63, 0.98, 1.00, 1.00

Kfull - column sums of the correlation matrix:
5.35, 5.26, 4.80, 5.42, 5.30, 5.16

Kfull sorted, as value(column index):
4.80(2), 5.16(5), 5.26(1), 5.30(4), 5.35(0), 5.42(3)

Inputs to delete: 1, 3, 4
Inputs to keep: 0, 2, 5
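A minimal R sketch of this greedy filter; the 0.9 cutoff is an assumed threshold, and with it the code reproduces the keep-set 0, 2, 5 above (as 1, 3, 6 in R's 1-based indexing):

greedy_corr_filter <- function(X, cutoff = 0.9) {
  cm <- abs(cor(X, method = "pearson"))
  ord <- order(colSums(cm))    # least correlated overall first
  keep <- integer(0)
  for (i in ord) {
    # keep column i only if it is not strongly correlated with anything kept so far
    if (all(cm[i, keep] < cutoff)) keep <- c(keep, i)
  }
  sort(keep)                   # 1-based indices of the columns to keep
}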

 
SanSanych Fomenko:

It's not just about the code. Finished code is the implementation of certain ideas.

See

gafs - predictor selection by genetic algorithm;
the final model is fit on the subset of predictors associated with the optimal number of generations, as determined by resampling.


rfe - backward selection of predictors

implements backward selection of predictors based on a ranking of predictor importance.

The predictors are ranked, and the less important ones are sequentially eliminated prior to modeling.

The goal is to find a subset of predictors that can be used to build a more accurate model.

safs - predictor selection by simulated annealing

During the search, a measure of fitness (i.e., the SA energy value) is needed to guide it. This is an internal measure of performance. During the search, the available data are the instances selected by the top-level resampling (e.g., the nine-tenths parts mentioned above). A common approach is to conduct another, nested resampling procedure; another option is to use a holdout set of samples to determine the internal estimate of performance.


That said, always remember, "garbage in, garbage out."

Interesting ideas; of course, you can't write that yourself, and you wouldn't come up with the idea either)

The question remains whether such complex ideas are needed for trading purposes. For now I'll try removing the highly correlated ones; if the result is unacceptable, I'll have to switch to R.
 
elibrarius:

Interesting ideas; of course, you can't write that yourself, and you wouldn't come up with the idea either)

The question remains whether such complex ideas are needed for trading purposes. For now I'll try removing the highly correlated ones; if the result is unacceptable, I'll have to switch to R.

You are stuck on ALGLIB in the sense that it is extremely inconvenient for research work. R is interpreted, and trying anything out in it is a breeze.

Removing highly correlated (multicollinear) predictors is a must.

The correlation between a predictor and the target is certainly interesting. I can throw in an idea for this kind of correlation: build a linear regression (there are more sophisticated variants) and throw out the predictors whose coefficients are NOT significant. That didn't work for me. Maybe it will for you.
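A minimal sketch of that idea in R, assuming a numeric target and a 5% significance threshold (both are assumptions of mine):

sig_predictors <- function(X, y, alpha = 0.05) {
  # Fit a linear regression of the target on all predictors
  fit <- lm(y ~ ., data = data.frame(X, y = y))
  # Coefficient p-values, with the intercept row dropped
  p <- summary(fit)$coefficients[-1, "Pr(>|t|)"]
  names(p)[p < alpha]   # predictors whose coefficients are significant
}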

But predictor selection is a must in ML.

 

The article https://www.mql5.com/ru/articles/497 suggested changing the slope of the activation function depending on the number of inputs.

In ALGLIB and in R, is the slope standard, or is it self-adjusted depending on the number of inputs? Or does almost nobody know what is inside these black boxes?
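As I understand the article's idea, it amounts to something like the sketch below; the 1/n scaling is my reading of the article, not what ALGLIB or R actually do:

# A sigmoid whose slope flattens as the number of inputs grows,
# so the summed input does not immediately saturate the neuron
scaled_sigmoid <- function(x, n_inputs) {
  1 / (1 + exp(-x / n_inputs))
}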

Нейронные сети - от теории к практике (Neural networks - from theory to practice)
  • 2012.10.06
  • Dmitriy Parfenovich
  • www.mql5.com
Nowadays, probably every trader has heard of neural networks and knows how cool they are. Most people imagine that those who understand them are all but supermen. In this article, I will try to explain how a neural network is built, what can be done with it, and show practical examples of its use.
 
elibrarius:

The article https://www.mql5.com/ru/articles/497 suggested changing the slope of the activation function depending on the number of inputs.

In ALGLIB and in R, is the slope standard, or is it self-adjusted depending on the number of inputs? Or does almost nobody know what is inside these black boxes?


Some neural network libraries offer the user very broad possibilities for customizing the created neural network, down to setting the activation functions of individual neurons or adding/removing individual connections. However, practice shows that such extensive functionality is often simply not required: there are typical architectures that cannot be significantly improved by fine-tuning, and the same goes for the methods of training neural networks. Finally, there is another reason not to give the user too rich a toolbox: if a neural network requires fine-tuning, then such tuning is not difficult for the author of the software package, but can often baffle the end user. Hence the conclusion: a good neural network package should not require complex tuning. In line with this principle, the ALGLIB package tries to resolve as many issues as possible automatically, leaving only the really important decisions to the user.

Available architectures
The ALGLIB package allows you to build neural networks with no hidden layers, with one hidden layer, or with two hidden layers. Connections go from the input layer to the first hidden layer (if there is one), then to the second, then to the output layer; there are no "short" connections from the input layer to the output layer. Hidden layers have one of the standard squashing activation functions, but more variety is possible for the output layer: it can be linear (such networks are used in approximation problems) or have a squashing activation function (when the network outputs are limited to a certain range). Networks with an activation function bounded from below (or above) are also available. In the simplest case (the bound is zero), this function tends to x as x tends to +∞ and tends exponentially to zero as x tends to -∞.
A special case is neural networks with a linear output layer and SOFTMAX normalization of the outputs. They are used for classification problems in which the outputs of the network must be non-negative and sum to exactly one, which allows them to be used as the probabilities of assigning the input vector to one of the classes (in the limit, the outputs of a trained network converge to those probabilities). The number of outputs of such a network is always at least two (a limitation dictated by elementary logic).
This set of architectures, despite its minimalism, is sufficient to solve most practical problems. The absence of superfluous details lets you concentrate on the problem (classification or approximation) without paying excessive attention to unimportant details (for example, the choice of a particular activation function for the nonlinear layer usually has little effect on the result).

http://alglib.sources.ru/dataanalysis/neuralnetworks.php
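For reference, the SOFTMAX normalization mentioned above is a one-liner; a minimal sketch in R:

softmax <- function(z) {
  e <- exp(z - max(z))   # subtracting the max is a standard numerical-stability trick
  e / sum(e)             # non-negative outputs that sum to exactly one
}
softmax(c(1.2, -0.3, 0.5))   # ≈ 0.581 0.130 0.289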

 
I read that earlier, but it doesn't answer the question(
"Hidden layers have one of the standard squashing activation functions" - which one? And does it adjust its coefficients to the number of incoming connections? The article showed that the number of inputs affects the result. Otherwise we stuff 100 neurons into the hidden layer (or even into the input layer), while the following neurons may only work well with 5 inputs...
 
elibrarius:
I read that earlier, but it doesn't answer the question(
"Hidden layers have one of the standard squashing activation functions" - which one? And does it adjust its coefficients to the number of incoming connections? The article showed that the number of inputs affects the result. Otherwise we stuff 100 neurons into the hidden layer (or even into the input layer), while the following neurons may only work well with 5 inputs...


You can only find that out by looking in the code or asking them... since there are two types of network, it's probably a sigmoid, plus some other one at the output that varies depending on whether the output is linear or bounded to a range.

And as for choosing the network architecture, yeah... there's a lot of fiddling ahead, but luckily here you can set at most two hidden layers, which already makes it easier :) I read somewhere that the hidden layer, in general, should be half the size of the input layer... or bigger, I forget )

 

By the way, on the EURUSD M1 chart I checked the correlation of periods from 10 to 60 (6 of them) against the output (I don't have a zigzag, but something close to it).

-0.00,0.01,0.00,0.01,0.01,-0.01

The correlations range from -0.01 to 0.01, i.e., there is essentially no correlation at all.
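For what it's worth, the check itself is a couple of lines in R (function and argument names are placeholders of my own):

corr_to_target <- function(X, target) {
  # Pearson correlation of each predictor column against the target vector
  sapply(X, function(col) cor(col, target, method = "pearson"))
}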

Nevertheless, your Expert Advisor shows a profit. If you can place trades manually by looking at trendlinearre and defining some rules from the chart movements, it would be much easier to write an ordinary Expert Advisor that trades by those rules.

 
Maxim Dmitrievsky:


I read somewhere that the hidden layer, in general, should be half the size of the input layer... or bigger, I forget )

One of the rules of thumb: "Optimal number of neurons in the hidden layer (# of hidden neurons) = (# of inputs + # of outputs) / 2, or SQRT(# of inputs * # of outputs)".

And in the Reshetov network you used, it's 2nInputs.
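A quick check of those two rules of thumb with illustrative numbers (say, 10 inputs and 1 output):

n_in <- 10; n_out <- 1
(n_in + n_out) / 2    # 5.5  -> about 5-6 hidden neurons
sqrt(n_in * n_out)    # 3.16 -> about 3 hidden neurons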
