Machine learning in trading: theory, models, practice and algo-trading - page 607

 
Vizard_:

Let's talk about learning TAO, ugh, TAU)))


No, no... there's only DAO here...

 
Vizard_:

Package learningCurve, R, Learning Curve.

How do they help calculate the number of neurons?
 
elibrarius:
How do they help calculate the number of neurons?

If the error has stopped dropping sharply, then stop training :)

 
Maxim Dmitrievsky:

If the error has stopped dropping sharply, then stop training :)

We can only train a network whose structure is already defined, and then look at its errors. First we need to determine that structure, i.e. the number of neurons, and only then can we look at the errors.

That is, the question is how learningCurve helps determine the optimal number of neurons before training.

Or how to do it by any other method.

 
elibrarius:

We can only train a network whose structure is already defined, and then look at its errors. First we need to determine that structure, i.e. the number of neurons, and only then can we look at the errors.

That is, the question is how learningCurve helps determine the optimal number of neurons before training.

Or how to do it by any other method.


It turns out that the number doesn't matter then... training simply stops when the error stops dropping, and overfitting doesn't happen. That is, the number of neurons can be deliberately large,

if I understand correctly.

 
Vizard_:

Let's talk about learning TAO, ugh, TAU)))


The relativity of cognition is due to many causes, chief among them the differing preparedness of consciousness for the act of perceiving and understanding one and the same phenomenon, which leads to inconsistent results of cognition (reactions, decisions, actions, etc.).

 
Maxim Dmitrievsky:

It turns out that the number doesn't matter then... training simply stops when the error stops dropping, and overfitting doesn't happen. That is, the number of neurons can be deliberately large,

if I understand correctly.

That is early stopping. I don't see any connection with the learningCurve package.

With early stopping, training is halted at the moment when the effective complexity of the network reaches its optimal value. That moment is estimated from the behavior of the validation error over time.

But it has its weaknesses too: too large a network will stop learning early, before the nonlinearities have had time to show their full strength. In other words, this method tends to end up with weakly nonlinear solutions.
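For illustration, a minimal R sketch of that early-stopping idea (not the learningCurve package): it assumes the nnet package, toy data, and an arbitrary "stop when the validation error no longer improves" rule; all names, sizes and thresholds here are placeholders.

# Early stopping sketch: keep training in small chunks and watch the validation error.
library(nnet)

set.seed(1)
n <- 1000
x <- matrix(rnorm(n * 5), ncol = 5)                 # 5 toy features
y <- x[, 1] - 0.5 * x[, 2]^2 + rnorm(n, sd = 0.1)   # toy target

x_tr <- x[1:800, ];  y_tr <- y[1:800]               # training part
x_va <- x[801:n, ];  y_va <- y[801:n]               # validation part

fit      <- nnet(x_tr, y_tr, size = 20, linout = TRUE, maxit = 10, trace = FALSE)
best_err <- mean((predict(fit, x_va) - y_va)^2)

for (step in 1:50) {
  # continue from the previous weights, 10 more epochs at a time
  fit <- nnet(x_tr, y_tr, size = 20, linout = TRUE, maxit = 10,
              Wts = fit$wts, trace = FALSE)
  err <- mean((predict(fit, x_va) - y_va)^2)
  if (err > best_err * 0.999) break                 # validation error stopped dropping
  best_err <- err
}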

Vizard_:

Errors (2pc).

Please write in more detail. How does learningCurve help determine the number of neurons for the network?

 

I can't say anything about regularization, I haven't experimented with it.

But early stopping leads to overfitting. It more or less works for image recognition because the test and training data are very similar, which is why it is often recommended in books and articles. But it is not suitable for forex.


I advise learning k-fold cross-validation. I've seen several different variants; this one works well.

Let's use five folds. Say there are 1000 rows in the training table.

1) Train the model on rows 201-1000. If it is a neural network, use no early stopping; just train it for a fixed number of epochs, enough to reach high accuracy. Predict rows 1-200.
2) Train the model again, now on rows 1-200 together with rows 401-1000, using the same model parameters and otherwise identical settings. Predict rows 201-400.
3) Retrain the model, now on rows 1-400 together with rows 601-1000, same parameters, otherwise identical settings. Predict rows 401-600.
4) Retrain the model, now on rows 1-600 together with rows 801-1000, same parameters, otherwise identical settings. Predict rows 601-800.
5) Retrain the model, now on rows 1-800, same parameters, otherwise identical settings. Predict rows 801-1000.

As a result, we have five models created by the same training algorithm with identical parameters, and five predictions, each made on data the corresponding model has never seen.
The five prediction arrays are concatenated into one long array of length 1000 and evaluated against the real data, for example with R2. That score evaluates our model, the training method, and everything else.
Then we tune the model parameters (activation function, number of layers and their sizes, etc.), repeating all these steps each time (train 5 models, predict the 5 chunks unique to each model, concatenate them, compute R2), getting a better and better estimate.

To predict new data in real trading, we run each of the five models and average the five results; that average is the final prediction on the new data (a rough sketch follows below).

P.S. The number of folds is better taken in the couple of dozen range; there are only five in this example for simplicity of description.
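A rough R sketch of this scheme, assuming the same toy nnet model as in the earlier sketch; the fold layout follows the five contiguous 200-row blocks described above, and R2 is computed by hand. Feature names, network size and epoch counts are placeholders, not recommendations.

# Five-fold cross-validation with contiguous blocks, as described above.
library(nnet)

set.seed(2)
n <- 1000
x <- matrix(rnorm(n * 5), ncol = 5)
y <- x[, 1] - 0.5 * x[, 2]^2 + rnorm(n, sd = 0.1)

k      <- 5
folds  <- split(1:n, rep(1:k, each = n / k))      # rows 1-200, 201-400, ..., 801-1000
oof    <- numeric(n)                              # out-of-fold predictions
models <- vector("list", k)

for (i in 1:k) {
  test_rows  <- folds[[i]]
  train_rows <- setdiff(1:n, test_rows)
  # identical settings for every fold, no early stopping
  m <- nnet(x[train_rows, ], y[train_rows], size = 20, linout = TRUE,
            maxit = 200, trace = FALSE)
  models[[i]]    <- m
  oof[test_rows] <- predict(m, x[test_rows, ])
}

# R2 of the concatenated out-of-fold predictions against the real data
r2 <- 1 - sum((y - oof)^2) / sum((y - mean(y))^2)
print(r2)

# In real trading: predict new data with each of the five models and average
x_new    <- matrix(rnorm(5), nrow = 1)
pred_new <- mean(sapply(models, function(m) predict(m, x_new)))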

 

A special way of obtaining and detecting categories is an operation of the following kind: cause + conditions → consequence, where the consequence arises only when cause and conditions are combined. Applying this operation to the categories of part and whole, one finds the category of structure playing the role of the necessary condition: parts + structure → whole, i.e. the whole cannot be obtained without the corresponding structural condition; a mountain cannot be obtained from a sufficient number of grains of sand while they are simply lying on a plane. The necessary condition for obtaining a system from elements is the relations and connections between those elements: elements + connections → system. The importance of form showed itself clearly when the simple sewing needle was turned into a sewing-machine needle, for which the eye was moved to the needle's point. The new quality of the needle required a change of configuration: shape + configuration → quality. This example also shows the law of development of a system's opposites at work: a change in quality does not necessarily require a change in quantity.

 

The optimal number of hidden units is a specific problem to be solved by experiment. But the general rule is: the more hidden neurons, the higher the risk of overfitting. In that case the system does not learn the underlying structure of the data but effectively memorizes the patterns themselves, together with any noise they contain. Such a network performs well in-sample but poorly out-of-sample. How can overfitting be avoided? There are two popular methods: early stopping and regularization. The author prefers his own method, based on global search.
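As an illustration of the regularization option, a minimal sketch using nnet's decay argument, which adds an L2 penalty on the weights; the decay value, network size and toy data below are arbitrary examples, not recommendations.

# Weight decay (L2 regularization): penalize large weights instead of stopping early.
library(nnet)

set.seed(3)
x <- matrix(rnorm(1000 * 5), ncol = 5)
y <- x[, 1] - 0.5 * x[, 2]^2 + rnorm(1000, sd = 0.1)
x_tr <- x[1:800, ];     y_tr <- y[1:800]
x_va <- x[801:1000, ];  y_va <- y[801:1000]

fit_plain <- nnet(x_tr, y_tr, size = 30, linout = TRUE, maxit = 300,
                  decay = 0,    trace = FALSE)     # no penalty
fit_reg   <- nnet(x_tr, y_tr, size = 30, linout = TRUE, maxit = 300,
                  decay = 1e-3, trace = FALSE)     # fit criterion + 1e-3 * sum(weights^2)

mse <- function(m) mean((predict(m, x_va) - y_va)^2)
c(plain = mse(fit_plain), regularized = mse(fit_reg))  # compare out-of-sample error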

Let's summarize this part of the story. The best approach to sizing a network is to follow Occam's principle: of two models with the same performance, the one with fewer parameters will generalize more successfully. This does not mean that one should always choose a simple model to improve performance; rather, the point is that many hidden neurons and layers do not guarantee an advantage. Too much attention today is paid to large networks and too little to the principles of their design. Bigger is not always better.


http://ai-news.ru/2016/05/tehnologii_fondovogo_rynka_10_zabluzhdenij_o_nejronnyh_setyah_578372.html

Stock market technologies: 10 misconceptions about neural networks
  • ai-news.ru
Neural networks are one of the most popular classes of machine learning algorithms. In financial analysis they are most often used for forecasting, building custom indicators, algorithmic trading and risk modeling. Despite all this, the reputation of neural networks is somewhat tarnished, since the results of their application can...