Machine learning in trading: theory, models, practice and algo-trading - page 2792

 
Maxim Dmitrievsky #:

I checked the informativeness of the features by shifting them back in time. That is, instead of taking the most recent feature values, we take values offset into the past. I used 50 offsets (from zero to -50 bars).

The right column is the offset in bars, the left column is the mutual information. The rows are sorted in ascending order of mutual information between features and labels.

It turned out that the most recent prices are not always better than earlier ones; there is some increase at the -11 bar relative to the zero bar:


What do you mean by "mutual information"? Are you interested in the effect of the feature on the label, or in their mutual influence? How is "mutual information" calculated?

 
СанСаныч Фоменко #:

What do you mean by "mutual information"? Are you interested in the effect of the feature on the label, or in their mutual influence? How is "mutual information" calculated?

You're stumping me with your questions

 
Maxim Dmitrievsky #:

I checked the informativeness of the features by shifting them back in time. That is, instead of taking the most recent feature values, we take values offset into the past. I used 50 offsets (from zero to -50 bars).

The right column is the offset in bars, the left column is the mutual information. The rows are sorted in ascending order of mutual information between features and labels.

It turned out that the most recent prices are not always better than earlier ones; there is some increase at the -11 bar relative to the zero bar:


Are these H1 features?

#   MI        offset, bars
0   0.001554  23
1   0.001612  22
2   0.001708  15
3   0.001783  24
Looks like diurnal cycles. 22-24 hours are the most informative. So today will be the same as yesterday.
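
A minimal sketch of how such a lag check could look in Python (the column names 'close' and 'label' and the use of scikit-learn's mutual_info_classif are assumptions for illustration, not the code actually used above):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# df is assumed to hold an hourly 'close' column and a binary 'label' column (hypothetical names)
def mi_by_lag(df: pd.DataFrame, max_lag: int = 50) -> pd.DataFrame:
    rows = []
    for lag in range(max_lag + 1):
        feature = df['close'].shift(lag)                  # value of the feature `lag` bars in the past
        mask = feature.notna() & df['label'].notna()
        mi = mutual_info_classif(feature[mask].to_frame(),
                                 df['label'][mask],
                                 random_state=0)[0]
        rows.append({'mi': mi, 'lag_bars': lag})
    # ascending sort, as in the printout above: the most informative offsets end up at the bottom
    return pd.DataFrame(rows).sort_values('mi').reset_index(drop=True)
```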
 
Maxim Dmitrievsky #:

You're stumping me with your questions

Why stumped?

For me, the influence, connection, or predictive power of a feature (predictor) with respect to a label can be explained by the following example.

Suppose there is a label "person" that takes two values: male and female.

And suppose there is a "clothing" feature that takes two kinds of values, trousers and skirts, where the number of distinct trousers and skirts runs into the hundreds or thousands.

Suppose that men wear only trousers and women only skirts. Then such a feature determines the label without errors, i.e. the prediction error = 0%. We can say that the feature influences, is connected with, and predicts the label 100%. If these conditions hold in the future, the error will not change and will remain 0%.

In modern society this is not the case, so there will be a prediction error, the magnitude of which is unknown and may vary depending on what values the feature takes.

There are many approaches, implemented as software packages, which for our example (given that some women like trousers and some men like skirts) will show some deviation from a 100% connection between the feature and the label.
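
A toy numeric illustration of that example (the encoding and the use of scikit-learn's mutual_info_score are my assumptions; any of the many packages mentioned would do):

```python
from sklearn.metrics import mutual_info_score

# label: 0 = man, 1 = woman; clothing: 0 = trousers, 1 = skirt (hypothetical toy encoding)
person        = [0, 0, 0, 0, 1, 1, 1, 1]
dress_strict  = [0, 0, 0, 0, 1, 1, 1, 1]   # men only in trousers, women only in skirts
dress_mixed   = [0, 0, 0, 1, 1, 1, 0, 0]   # two women in trousers, one man in a skirt

print(mutual_info_score(person, dress_strict))  # maximal: the feature determines the label (0% error)
print(mutual_info_score(person, dress_mixed))   # lower: the connection is no longer 100%
```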


The graphs show this very well.

An example of a useless feature:


An example of a fairly promising feature. The overlap area is the prediction error. In the previous graph one class distribution completely overlapped the other, so the prediction error is 50%.


Does this measure tell you whether a feature looks like the first graph or the second? The estimates differ by a factor of 2.5, but the numbers are relative. Are all the features junk, or are some, or all, of them great?
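
The claim that full overlap gives a 50% prediction error can be checked numerically; a rough sketch on synthetic data, using histogram overlap as a crude stand-in for the prediction error (equal class priors assumed):

```python
import numpy as np

def overlap_error(x0: np.ndarray, x1: np.ndarray, bins: int = 50) -> float:
    """Rough prediction error: half the overlap area of the two class-conditional histograms."""
    lo, hi = min(x0.min(), x1.min()), max(x0.max(), x1.max())
    h0, edges = np.histogram(x0, bins=bins, range=(lo, hi), density=True)
    h1, _     = np.histogram(x1, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    return 0.5 * np.sum(np.minimum(h0, h1)) * width   # overlap area / 2, equal priors assumed

rng = np.random.default_rng(0)
useless   = overlap_error(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))  # ~0.5: full overlap
promising = overlap_error(rng.normal(0, 1, 10_000), rng.normal(3, 1, 10_000))  # much smaller
print(useless, promising)
```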

 
Well, look it up on Google, I don't want to quote Wikipedia. The measure of connection can be geometric, as in the case of correlation, or informational, as in the case of MI.

I don't understand why I have to fight someone else's laziness, which you yourself have previously admitted to ).

Give one good approach; a large number of packages is not needed. The name will be enough.
 
Maxim Dmitrievsky #:

Well, look it up on Google, I don't want to quote Wikipedia. The measure of connection can be geometric, as in the case of correlation, or informational, as in the case of MI.

I don't understand why I have to fight someone else's laziness, which you yourself have previously admitted to ).

Yeah, well, okay. Let it be so

 
СанСаныч Фоменко #:

Yeah, well, okay. So be it

Not only do you not give any results, referring instead to a lot of good packages, but you also make me guess what exactly you meant. If something specific is being discussed, write specifically, with specific results.

It's a trivial example about overlapping distributions; show me how to obtain them efficiently.
The information-based measure has already been named: it is entropy, and mutual information built on it. Does it need to be written 500 times? Entropy is defined for one series, mutual information for two.
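
For reference, the standard definitions behind those terms (entropy of a single series, mutual information of two):

```latex
H(X) = -\sum_x p(x)\,\log p(x), \qquad
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y)
```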
 

It is better to evaluate features not with methods and packages unrelated to the model, but with the model itself.
Two years ago I compared methods for evaluating feature importance: https://www.mql5.com/ru/blogs/post/737458.

The model itself was taken as the reference. I trained it N times (once per feature), each time removing one feature.
The more the result deteriorated after removing a feature, the more important that feature was. There were also features whose removal improved the result, i.e. they were clearly noise.

None of the importance-estimation methods matched this reference importance. I am afraid that mutual information and other packages may be just as inconsistent.

Comparison of different methods for evaluating predictor importance.
  • www.mql5.com
I compared different methods for evaluating predictor importance. The tests were run on the Titanic data (36 features and 891 rows) using a random forest of 100 trees. The printout with the results is below. …
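
A minimal sketch of the drop-one-feature procedure described above (a scikit-learn random forest as in the blog post; cross-validation and the accuracy metric are my assumptions, not necessarily what the blog post used):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drop_column_importance(X, y, n_trees: int = 100, cv: int = 5) -> dict:
    """Importance of a feature = drop in score after retraining without that feature."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    baseline = cross_val_score(model, X, y, cv=cv).mean()
    importance = {}
    for col in X.columns:                       # X is assumed to be a pandas DataFrame
        score = cross_val_score(model, X.drop(columns=[col]), y, cv=cv).mean()
        importance[col] = baseline - score      # negative value -> removing the feature helped (noise)
    return importance
```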
 
elibrarius #:

It is better to evaluate features not with methods and packages unrelated to the model, but with the model itself.
Two years ago I compared methods for evaluating feature importance: https://www.mql5.com/ru/blogs/post/737458

The model itself was taken as the reference. I trained it N times (once per feature), each time removing one feature.
The more the result deteriorated after removing a feature, the more important that feature was. There were also features whose removal improved the result, i.e. they were clearly noise.

None of the importance-estimation methods matched this reference importance. I'm afraid that mutual information and other packages may be just as inconsistent.

To a first approximation you are certainly right: one should have a final score, if by evaluating a model you mean its performance measures.

But there is one nuance that outweighs everything.

Evaluating a model through its performance is an evaluation on historical data. But how will the model behave in the future?

If we evaluate the features themselves, we can run a sliding window and collect statistics on how each feature's importance score changes, for each feature individually. And it seems to me preferable to use features whose importance score fluctuates little, ideally by less than 10%. In my feature set the standard deviation of these fluctuations ranges from 10% to 120% over 500 bars (from memory). A 10% value means the score stays within a 10% channel, i.e. the figure we see is real. But at 120%, the importance score we see is a fiction.
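
A rough sketch of the sliding-window stability check described above (mutual information is used as the per-feature score only for illustration; any importance score could be plugged in, and the 500-bar window follows the figure mentioned above):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def score_stability(X: pd.DataFrame, y: pd.Series, window: int = 500, step: int = 100) -> pd.Series:
    """Sd/mean (in %) of each feature's MI score over sliding windows of `window` bars."""
    scores = []
    for start in range(0, len(X) - window + 1, step):
        sl = slice(start, start + window)
        scores.append(mutual_info_classif(X.iloc[sl], y.iloc[sl], random_state=0))
    scores = np.array(scores)                              # shape: (n_windows, n_features)
    variation = 100 * scores.std(axis=0) / scores.mean(axis=0)
    return pd.Series(variation, index=X.columns).sort_values()   # small % -> stable importance
```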

 
СанСаныч Фоменко #:

To a first approximation you are certainly right: one should have a final score, if by evaluating a model you mean its performance measures.

But there is one nuance that outweighs everything.

Evaluating a model through its performance is an evaluation on historical data. But how will the model behave in the future?

Evaluate with a walking-forward test.
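
A minimal sketch of a walking-forward evaluation (scikit-learn's TimeSeriesSplit, a random forest, and accuracy are placeholders; the point is the expanding-train / next-segment-test loop):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

def walk_forward_scores(X, y, n_splits: int = 10) -> list:
    """Train on an expanding history, test on the next out-of-sample segment, roll forward."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        scores.append(accuracy_score(y.iloc[test_idx], model.predict(X.iloc[test_idx])))
    return scores   # the spread of these out-of-sample scores answers "how will it behave in the future?"
```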