Machine learning in trading: theory, models, practice and algo-trading - page 2015

 
Maxim Dmitrievsky:

you feed all the features to both the input and the output, with fewer neurons in the hidden layer. It simply compresses the information by minimizing the reconstruction error at the output. The output should equal the input (ideally). After training, the second half of the NN is discarded; at the output you get compressed features, equal in number to the neurons in the hidden layer

you can add recurrent layers, etc.

Google "autoencoder" and its varieties.

Yes, I got it all, thank you. I just don't understand how to train the network to give multiple responses to a single sample row at once. What's the metric there? And it's not clear how to do it with trees...
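A minimal sketch of the autoencoder described above, in Python with Keras (the library choice, layer sizes, and data are assumptions for illustration; the thread names none of them):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 100   # number of input features
n_bottleneck = 10  # compressed representation size

# Encoder: input -> bottleneck (fewer neurons than the input)
inp = keras.Input(shape=(n_features,))
code = layers.Dense(n_bottleneck, activation="tanh")(inp)
# Decoder: bottleneck -> reconstruction of the input
out = layers.Dense(n_features, activation="linear")(code)

autoencoder = keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")

# Train with the same matrix as input and target: output should equal input
X = np.random.randn(5000, n_features).astype("float32")  # placeholder data
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)

# Discard the decoder half; keep only the encoder
encoder = keras.Model(inp, code)
X_code = encoder.predict(X)  # shape (5000, n_bottleneck): compressed features
```

Recurrent layers would slot into the same scheme in place of the Dense layers.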

 
Aleksey Vyazmikin:

Yes, I got it all, thank you. I just don't understand how to train the network to give multiple responses to a single sample row at once. What's the metric there? And it's not clear how to do it with trees...

I'm no expert at all, but first comes decomposition, when one event generates many possible causes. After that comes recomposition, when all these possible causes are analyzed for what influences them. The result is a small number of parameters; by influencing them, we can control the event.

 
Aleksey Vyazmikin:

Yes, I got it all, thank you. I just don't understand how to train the network to give multiple responses to a single sample row at once. What's the metric there? And it's not clear how to do it with trees...

It doesn't make much sense to use an autoencoder first, or a deep NN in general. You need them when you have a lot of tasks of the same type. For example, to compress images and so on, and then use them in other NNs

 
Alexei Tarabanov:

I'm no expert at all, but first comes decomposition, when one event generates many possible causes. After that comes recomposition, when all these possible causes are analyzed for what influences them. The result is a small number of parameters; by influencing them, we can control the event.

Not exactly - there, by refracting the data in a neuron through the weights in the functions, the values are collapsed into a single function (something like focusing an image). And then, knowing these weights, you decompose it back into components, the way a prism splits light into a rainbow. I understand the process, but I don't understand how to do it with trees.

 
Maxim Dmitrievsky:

It doesn't make much sense to use an autoencoder first, or a deep NN in general. You need them when you have a lot of tasks of the same type. For example, to compress images and so on, and then use them as embeddings in other NNs

Perhaps it makes sense to train trees on exactly these "bottleneck" neurons, i.e. on a reduced number of predictors.
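A sketch of this idea under the same assumptions as the autoencoder sketch above: take the bottleneck outputs as the reduced predictor set and train a tree ensemble on them (scikit-learn and the stand-in data are my choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-ins: in practice X_code would be encoder.predict(X) from the
# autoencoder sketch above, and y the trading target for the same rows.
rng = np.random.default_rng(0)
X_code = rng.normal(size=(5000, 10))  # bottleneck outputs, 10 predictors
y = rng.integers(0, 2, size=5000)     # hypothetical binary target

# The trees now see 10 compressed predictors instead of the full feature set
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_code, y)
```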

 
Aleksey Vyazmikin:

Not exactly - there, by refracting the data in a neuron through the weights in the functions, the values are collapsed into a single function (something like focusing an image). And then, knowing these weights, you decompose it back into components, the way a prism splits light into a rainbow. I understand the process, but I don't understand how to do it with trees.

No. Try decomposition first and you'll get it.

Simply put, decomposition follows one principle, and assembly follows another. Analysis and synthesis. Neurons work the same way in both cases, but in the first case the event is taken apart, and in the second it is assembled around the factors affecting it.

 
Aleksey Vyazmikin:

Perhaps it makes sense to train trees on exactly these "bottleneck" neurons, i.e. on a reduced number of predictors.

it makes no sense

compression is compression. If the model is bad as it is, compression won't help anything. And regularization performs roughly the same function.

 
There are losses when compressing the input data. If compression is judged by the ability to reconstruct the same input data, then we lose information uniformly, including information that would predict the target well.
If we compress only for the sake of the target function, that is the best option.
I think normal training does exactly that.
My conclusion: compression evaluated on the inputs alone will degrade the quality of subsequent training of the target on the compressed data.
However, it is better to run an experiment than to draw conclusions from theory.

Although you can understand why Aleksey is looking for a way to reduce dimensionality - he uses random forests and boosting. In a single tree, most of the 3000 inputs may never be used at all. Forests and boosting handle this better, but I'm afraid it is of little use.
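The experiment suggested here can be sketched directly. A hypothetical setup with PCA standing in for input-only compression and a random forest as the learner (the data, the dimensions, and the choice of PCA are all assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))  # placeholder predictors
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)  # target driven by one column

# Input-only compression: keeps directions of high input variance,
# paying no attention to the target
X_pca = PCA(n_components=5).fit_transform(X)

raw = cross_val_score(RandomForestClassifier(n_estimators=200), X, y, cv=5).mean()
comp = cross_val_score(RandomForestClassifier(n_estimators=200), X_pca, y, cv=5).mean()
print(f"raw: {raw:.3f}  compressed: {comp:.3f}")
# On data like this the compressed score typically drops toward chance:
# the informative column is lost along with everything else.
```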
 
Maxim Dmitrievsky:

it makes no sense

compression is compression. If the model is bad as it is, compression won't help anything. And regularization performs roughly the same function.

elibrarius:
There are losses when compressing the input data. If compression is judged by the ability to reconstruct the same input data, then we lose information uniformly, including information that would predict the target well.
If we compress only for the sake of the target function, that is the best option.
I think normal training does exactly that.
My conclusion: compression evaluated on the inputs alone will degrade the quality of subsequent training of the target on the compressed data.
However, it is better to run an experiment than to draw conclusions from theory.

Although you can understand why Aleksey is looking for a way to reduce dimensionality - he uses random forests and boosting. In a single tree, most of the 3000 inputs may never be used at all. Forests and boosting handle this better, but I'm afraid it is of little use.

There are a number of ideas as to why this might be useful:

1. You can identify interdependent predictors:

1.1. Build a separate model with them and evaluate their predictive power

1.2. Exclude them from the sample and assess their impact on the result; if excluding them improves the result, then consider creating similar predictors

2. Use one predictor instead of a group of predictors:

2.1. This equalizes the chances of it being picked at random when building the model

2.2. It reduces training time by reducing dimensionality

Yes, I want to check this, but I don't know of an out-of-the-box tool for creating such a model - a sketch of one possible approach follows below.
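A minimal sketch of what such a tool might look like (my construction, not an existing package): cluster interdependent predictors by correlation and replace each group with its first principal component:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA

def collapse_correlated(X, threshold=0.8):
    """Replace each cluster of correlated columns with one PCA component."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)  # distance: columns that move together are close
    # Condensed upper-triangle distances, then average-linkage clustering
    links = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    labels = fcluster(links, t=1.0 - threshold, criterion="distance")
    parts = []
    for lab in np.unique(labels):
        cols = np.where(labels == lab)[0]
        if len(cols) == 1:
            parts.append(X[:, cols])  # lone predictor kept as-is
        else:
            # One synthetic predictor stands in for the whole group
            parts.append(PCA(n_components=1).fit_transform(X[:, cols]))
    return np.hstack(parts)

# Usage: X_small = collapse_correlated(X, threshold=0.9)
```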


By the way, a thought occurred to me: why not use piecewise-linear functions in training (as with quantization, but instead of a step function)? It would allow some slack in data precision and reduce overfitting.

 
Aleksey Vyazmikin:

By the way, a thought occurred to me: why not use piecewise-linear functions in training (as with quantization, but instead of a step function)? It would allow some slack in data precision and reduce overfitting.

Because that can't be trained - the solver would get stuck in local minima. And as for ideas, nothing can be extracted from there, because it's a black box
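For reference, the contrast being discussed can be shown in a few lines: step quantization of a predictor into bins versus a piecewise-linear ramp between the same bin borders. Whether the second variant is trainable or useful is exactly what is disputed above; the borders and data here are placeholders:

```python
import numpy as np

x = np.random.default_rng(0).normal(size=1000)   # a raw predictor
borders = np.quantile(x, np.linspace(0, 1, 9))   # 8 bins from quantiles

# Step quantization: every value inside a bin maps to the same level
x_step = np.digitize(x, borders[1:-1])

# Piecewise-linear variant: interpolate between the same borders, so a
# small change in the input gives a small change in the output ("slack")
x_ramp = np.interp(x, borders, np.arange(len(borders)))
```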