Machine learning in trading: theory, models, practice and algo-trading - page 556

 
Aleksey Terentev:
Overfitting occurs in the presence of very large weights (~10^18), a consequence of multicollinearity, which leads to instability of the model A(x, w).


Overfitting is treated by: early stopping of training, limiting the growth of the weights (L1 (Lasso) and L2 regularization), limiting connections in the network (Dropout); penalty functions can also be applied (ElasticNet, Lasso).

Moreover, L1 regularization performs feature selection, since it drives the weight coefficients of some features to zero.

Getting rid of "noisy" features is feature selection, and we have our own methods for that. It does not always benefit the model, so sometimes L2 regularization is used instead (it helps deal with multicollinearity).
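For illustration, a minimal sketch (synthetic data and scikit-learn, not anyone's actual TS) of how L1 zeroes the coefficients of uninformative or collinear features while L2 only shrinks them:

```python
# Minimal sketch: L1 (Lasso) produces a sparse solution (implicit feature selection),
# L2 (Ridge) keeps all features but tames multicollinearity. Data is synthetic.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)          # deliberately collinear pair
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)  # only 2 informative features

lasso = Lasso(alpha=0.1).fit(X, y)   # L1: most coefficients end up exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: all coefficients kept, just shrunk

print("L1 non-zero coefficients:", np.flatnonzero(lasso.coef_))
print("L2 largest coefficients :", np.argsort(-np.abs(ridge.coef_))[:5])
```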


SanSanych Fomenko, your statement about the relation between features and targets is a bit presumptuous. How can you assert something that has not yet been proven? That is exactly what the ML model is built for: a built and working model gives an estimate of what the relationship is, with such-and-such accuracy.

And the example with trousers and skirts shows the researcher's limited knowledge of the subject area, because in such a model you throw away valuable features: place of residence, time of year, latitude and longitude of the region of residence, and so on.


My example is a degenerate case, a pure thought experiment: classification without error. There are no such features in economics, but in genetics, if not 100%, then something only a little short of it is possible.

Now about regularization.

Undoubtedly.

But the important thing is the sequence of steps.

First, always: the selection of features based on their "relation" to the target. In stock markets, on the basis of economic relationships.

That always comes first, and only then everything else.


I have a working TS (trading system) with RF. I select 27 predictors out of several hundred attributes using that "relation". Then, on every bar (H1), I select from those 27 with a standard algorithm; from 5 to 15 remain, and they are always different. I limit the number of trees: 100 is a lot, 50 is not enough - at 50 the error does not stabilize.

This is concrete experience. The ZZ (ZigZag) classification error is a little under 30%. There is no way to reduce it - other predictors are needed, and I have no ideas for new predictors.
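For those who want to play with the idea, a rough sketch of this kind of per-bar predictor filtering with a random forest. The data is synthetic, and the 27 pre-filtered predictors, the 100 trees and the mean-importance threshold are illustrative assumptions, not the actual TS:

```python
# Rough sketch of per-bar predictor filtering with a random forest, as described above.
# Synthetic data; the 27-predictor set, 100 trees and the threshold are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_bars, n_pred = 2000, 27                      # pretend these 27 survived the "relation" filter
X = rng.normal(size=(n_bars, n_pred))
y = (X[:, 0] + 0.5 * X[:, 3] - X[:, 7]
     + rng.normal(scale=1.0, size=n_bars) > 0).astype(int)   # toy ZZ-style binary target

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1).fit(X, y)
print("OOB error:", 1.0 - rf.oob_score_)

# Keep only predictors whose importance exceeds the mean importance
# (on real data typically some 5-15 of them survive, and the set changes from window to window)
keep = np.flatnonzero(rf.feature_importances_ > rf.feature_importances_.mean())
print("Predictors selected on this window:", keep)
```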

 
SanSanych Fomenko:

Since you feed so many parameters to the input at once, it is clear what you mean.

In that case, overfitting is somewhat secondary here, which is probably what caught my eye. It is closer to "lightening" the calculations.

 
Aleksey Terentev:

Since you feed so many parameters to the input at once, it is clear what you mean.

In that case, overfitting is somewhat secondary here, which is probably what caught my eye. It is closer to "lightening" the calculations.


Why secondary?

What is primary?

What could be scarier than overfitting?

 
SanSanych Fomenko:

Why secondary?

What is primary?

What could be scarier than overfitting?

In the question you raised about selecting parameters for classification with certain correlations with the target data, you mentioned overfitting. I tried to correct you and generalized the question to the task of feature selection. At the same time I gave some considerations on overfitting, from which the question of selection arises as a consequence.
That is why I say that, in the question as raised, overfitting is secondary.
Although you are right, it should have been phrased like this: "Selection of parameters is secondary to overfitting, and the question deserves a more detailed look from that angle."
 
I added a signal indicator, if anyone is interested. https://www.mql5.com/ru/blogs/post/712023
 
Read at your leisure.
https://habrahabr.ru/post/345950/
Welcome to the Era of Deep Neuroevolution
  • habrahabr.ru
On behalf of the Uber AI Labs team, which also includes Joel Lehman, Jay Chen, Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such and Xingwen Zhang. In the field of training deep neural networks (DNN) with many layers and millions of connections, stochastic gradient descent (SGD) is typically used for training. Many...
 

it turned out that a simple NN works very poorly beyond the boundaries of the training sample (it saturates to a hyperbolic-tangent constant)... in the case of regression, that is. I.e., not much better than RF

very illustrative article

https://habrahabr.ru/post/322438/


Neural Networks in Pictures: from a Single Neuron to Deep Architectures
  • habrahabr.ru
Many materials on neural networks start straight away by demonstrating fairly complex architectures, while the most basic things - activation functions, weight initialization, choosing the number of layers in the network, etc. - are covered only in passing, if at all. As a result, a beginning practitioner has to take standard configurations and...
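A minimal sketch of the saturation effect described above: a tanh MLP fitted to a plain linear target flattens to a constant outside the training range, and a random forest cannot extrapolate either. The data and model settings here are purely illustrative:

```python
# Sketch: a tanh MLP trained on x in [-3, 3] saturates outside that range,
# and a random forest also predicts a constant there (trees cannot extrapolate).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X_train = rng.uniform(-3, 3, size=(500, 1))
y_train = X_train.ravel() + 0.1 * rng.normal(size=500)   # simple linear dependence

mlp = MLPRegressor(hidden_layer_sizes=(16,), activation='tanh',
                   max_iter=5000, random_state=2).fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=2).fit(X_train, y_train)

X_test = np.array([[0.0], [2.0], [5.0], [10.0]])          # last two lie outside the training range
print("MLP:", mlp.predict(X_test))   # flattens out near the edge-of-range value
print("RF :", rf.predict(X_test))    # also flat outside the range
```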
 
Maxim Dmitrievsky:

it turned out that a simple NN works very poorly beyond the boundaries of the training sample (it saturates to a hyperbolic-tangent constant)... in the case of regression, that is. I.e., not much better than RF

very illustrative article

https://habrahabr.ru/post/322438/


IMHO a useful and informative article, with no pretensions to novelty but with good practical value.


Aleksey Terentev:
Read at your leisure.
https://habrahabr.ru/post/345950/

IMHO it's a useless article - either a bad translation or I'm a bad reader - but to me it looks like a banal heap of outdated ideas and, sadly, one more confirmation of the crisis of deep learning as a flagship technology; even the actual founder of the field, Geoffrey Hinton, has recently spoken about this in his articles on capsule neural networks.

I respect Uber Taxi...)

 
Ivan Negreshniy:
IMHO a useful and informative article, with no pretensions to novelty but with good practical value.

That's why it's easy to get confused: say we were using a linear model or regression and everything was fine, and then we decided to switch to an MLP for the same tasks... and nothing works :)

That's why everyone prefers classification, although regression is good for forecasting :)

I would even say that for trending markets a linear model or regression is more suitable, and for flat ones an MLP.

 

In the course of my exercises with GARCH I came across an amazing pattern.

Here is the EURCAD quote


And here is the autocorrelation of absolute increments


An amazing regularity!

1. What does it mean?

2. How can this regularity be used?
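For reference, a minimal sketch of how such an autocorrelation of absolute increments can be reproduced. Synthetic GARCH(1,1)-like returns stand in for the EURCAD series here, and all parameters are purely illustrative; the slowly decaying, positive ACF of |increments| is the volatility-clustering effect that GARCH models describe:

```python
# Sketch: autocorrelation of absolute price increments (volatility clustering).
# Synthetic GARCH(1,1)-style returns are used in place of real EURCAD data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n, omega, alpha, beta = 5000, 0.05, 0.10, 0.85
r = np.empty(n)
sigma2 = omega / (1 - alpha - beta)          # start at the unconditional variance
for t in range(n):
    r[t] = np.sqrt(sigma2) * rng.normal()
    sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2

abs_r = pd.Series(np.abs(r))
acf = [abs_r.autocorr(lag=k) for k in range(1, 51)]
print("ACF of |increments|, lags 1..10:", np.round(acf[:10], 3))  # slowly decaying, positive
```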


PS.

Not all currency pairs look like this.

Here is USDJPY

