Machine learning in trading: theory, models, practice and algo-trading - page 2739

 

I can post a sample with a lot of predictors. Whoever selects the best ones wins. Let's hold a contest.

The best ones will be determined on an independent sample, after training the model on the selected predictors.

Does anyone want to participate, or is everyone here only capable in words?

 

The topic is too general and often breaks down into its components. It is long overdue to be split into several threads. For example: 1. ML: data preprocessing. 2. ML: model selection. 3. ML: model training and optimisation. 4. ML: implementation of models. 5. ML: automation of ML.

The division is very coarse and approximate, but it would be clear what each thread is about. As it stands, this one is about everything and nothing.

And of course reproducible code examples are necessary; otherwise the discussion has no practical use.

Good luck to everyone

 
Aleksey Vyazmikin #:

So does the script do it or not?

I'm just surprised how easily many people here lose the thread of the conversation.

Aleksey, you asked for an example of how to look at feature importance with a sliding window.

I wrote a script for you.

Then you wanted to search at different scales or whatever else. Why the hell should a script written in the past fulfil wishes you came up with later?

So I'm just surprised at how easily many people here lose the thread of the conversation. That means you.

 
Vladimir Perervenko #:

The topic is too general and often breaks down into its components. It is long overdue to be split into several threads. For example: 1. ML: data preprocessing. 2. ML: model selection. 3. ML: model training and optimisation. 4. ML: implementation of models. 5. ML: automation of ML.

The division is very coarse and approximate, but it would be clear what each thread is about. As it stands, this one is about everything and nothing.

And of course reproducible code examples are necessary; otherwise the discussion has no practical use.

Good luck to everyone

It would make more sense to divide by the tasks being solved, but those are too individual...

For example, in a preprocessing thread two people will not find common ground if one predicts ZZ on the whole sample and the other uses ML to select 10-20 clusters from all the data for some purpose of his own... And so on...

It will be the same shit, just smeared across more threads.
 
Valeriy Yastremskiy #:

SSF did not say much that was new, of course; finding the relationship between predictors and results is an obvious goal. The only new thing I caught was that he found about 200 significant features over the whole training set, but uses only about 5 per cent of them for any specific data.

I take this to mean that there are ways to quickly determine the state or properties of a series in order to select the predictors that are most significant for the latest data. The question of how much data, or how long a section, is needed for a proper selection naturally arises. But apparently it works even with only 200 predictors found and selected over the whole large training set.

I see it like this. A series has properties that are stable by some measures, but these measures and their number differ from section to section. ML finds distinct states of the series that stay stable for long enough, each of which can be described by a different model and, accordingly, by different model settings, i.e. predictors. The total number of predictors is the total number of settings across the different models, so once the model is identified, the settings previously found for it can be retrieved quickly.


I once posted a table in this thread, but now it is not at hand, so I will clarify my idea in words.

I rely on the notion of a "link" between predictor and teacher (the target). The "link" is NOT correlation, and it is NOT the "importance" of predictors reported after fitting almost any ML model. The latter reflects how often a predictor is used by the algorithm, so a large "importance" value might be assigned to Saturn's rings or coffee grounds. There are packages that calculate the "link" between predictor and teacher, for example on the basis of information theory.
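
To make the information-theoretic "link" concrete, here is a minimal sketch in Python. It assumes the "link" is measured as mutual information between a discretized predictor and a discrete teacher; the post does not say which measure or package was actually used, so the plug-in estimator, the equal-width binning and all names below are illustrative only.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=5):
    """Equal-width binning of a numeric sequence into integer bin labels (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Synthetic data, for illustration only: one predictor carries
# information about the teacher, the other is pure noise.
rng = random.Random(0)
target = [rng.randint(0, 1) for _ in range(2000)]
informative = [t + rng.gauss(0, 0.5) for t in target]
noise = [rng.gauss(0, 1) for _ in target]

mi_informative = mutual_information(discretize(informative), target)
mi_noise = mutual_information(discretize(noise), target)
print(round(mi_informative, 3), round(mi_noise, 3))
```

On data like this the noise predictor should score close to zero regardless of how often a fitted model happens to split on it, which is the distinction drawn above between "link" and "importance".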

So, a word about the table I posted here.

The table contained a numerical estimate of the "link" between each predictor and the teacher. Several hundred values of the "link" were obtained as the window moved. These values varied for any particular predictor. I calculated the mean and sd of the "link" for each predictor, which allowed me to:

- isolate predictors whose "link" value is too small, i.e. noise;

- isolate predictors whose "link" value is too variable. It was possible to find predictors with a sufficiently large "link" value and an sd of less than 10%.


Once again, the problem of constructing a trading system (TS) based on ML is to find predictors that have a large "link" value and a small sd as the window moves. In my opinion, such predictors will ensure a stable prediction error in the future.
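
The selection described above can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the "link" is taken to be plug-in mutual information on discrete data, "sd less than 10%" is read as sd relative to the mean, and the window length, step and thresholds are arbitrary. None of these details are given in the post.

```python
import math
import random
import statistics
from collections import Counter


def link(xs, ys):
    """Plug-in mutual information in bits; stands in for the "link" measure (assumption)."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def window_stats(xs, ys, window, step):
    """Mean and sd of the link over windows of length `window` moved by `step`."""
    values = [link(xs[i:i + window], ys[i:i + window])
              for i in range(0, len(xs) - window + 1, step)]
    return statistics.mean(values), statistics.stdev(values)


def select_stable(predictors, target, window=1000, step=100,
                  min_link=0.05, max_rel_sd=0.10):
    """Keep predictors whose mean link is large and whose sd is small relative to it."""
    kept = {}
    for name, xs in predictors.items():
        mean, sd = window_stats(xs, target, window, step)
        if mean >= min_link and sd <= max_rel_sd * mean:
            kept[name] = (mean, sd)
    return kept


# Synthetic data, for illustration only.
rng = random.Random(1)
n = 6000
target = [rng.randint(0, 1) for _ in range(n)]
predictors = {
    # agrees with the teacher 95% of the time over the whole sample
    "stable": [t if rng.random() > 0.05 else 1 - t for t in target],
    # perfect in the first half, random in the second: large but unstable link
    "drifting": [t if i < n // 2 else rng.randint(0, 1)
                 for i, t in enumerate(target)],
    # unrelated to the teacher: link close to zero
    "noise": [rng.randint(0, 1) for _ in range(n)],
}

kept = select_stable(predictors, target)
print(sorted(kept))
```

The "drifting" predictor shows why the mean alone is not enough: its average link is high, but the spread across windows gives it away.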


This is NOT the first time I have said the above. Unfortunately, the discussion constantly goes into noise and narcissism.

 
mytarmailS #:

Aleksey, you asked for an example of how to look at feature importance with a sliding window.

I wrote a script for you...

Then you wanted to search at different scales or whatever else. Why the hell should a script written in the past fulfil wishes you came up with later?

So I'm just surprised at how easily many people here lose the thread of the conversation. That means you.

How so? Yes, I did ask for a script; I quote: "Can you make a script in R for calculations on my sample? I will run it for the sake of experiment. The experiment should reveal the optimal sample size." But that was in response to something that had already been done.

Earlier I wrote: "... And how do you propose to watch this in dynamics, how would it be implemented?" There I was asking how to implement predictor evaluation in dynamics, i.e. regular evaluation over some window, where it is not clear whether the window is recomputed at each new sample or after every n samples. If that is what you have done, then I did not understand it.
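
The two schedules in question (re-evaluating at each new sample versus after every n samples) differ only in the step of the window. A small Python sketch, with a hypothetical helper name and arbitrary sizes:

```python
def window_starts(n_samples, window, step):
    """Start indices of every full window of length `window`, moved `step` samples at a time."""
    return list(range(0, n_samples - window + 1, step))


# 20 samples, window of 10 (illustrative numbers only)
every_sample = window_starts(20, 10, 1)  # re-evaluate at each new sample
every_fifth = window_starts(20, 10, 5)   # re-evaluate after every 5 samples

print(len(every_sample), every_fifth)
```

A step of 1 gives the finest picture of how the estimate changes but costs the most computation; a larger step trades resolution for speed.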

The code you posted is great, but it's just hard for me to understand what it does exactly or what it proves in essence, so I started asking additional questions. What do the two pictures with graphs mean?

 
СанСаныч Фоменко #:

I once posted a table in this thread, but I don't have it handy at the moment, so I'll clarify my thought in words.

I rely on the notion of a "link" between predictor and teacher (the target). The "link" is NOT correlation, and it is NOT the "importance" of predictors reported after fitting almost any ML model. The latter reflects how often a predictor is used by the algorithm, so a large "importance" value might be assigned to Saturn's rings or coffee grounds. There are packages that calculate the "link" between predictor and teacher, for example on the basis of information theory.

So, a word about the table I posted here.

The table contained a numerical estimate of the "link" between each predictor and the teacher. Several hundred values of the "link" were obtained as the window moved. These values varied for any particular predictor. I calculated the mean and sd of the "link" for each predictor, which allowed me to:

- isolate predictors whose "link" value is too small, i.e. noise;

- isolate predictors whose "link" value is too variable. It was possible to find predictors with a sufficiently large "link" value and an sd of less than 10%.


Once again, the problem of constructing a trading system (TS) based on ML is to find predictors that have a large "link" value and a small sd as the window moves. In my opinion, such predictors will ensure a stable prediction error in the future.


This is NOT the first time I have said the above. Unfortunately, the discussion constantly goes into noise and narcissism.

So in essence you have the same approach as me, curious! Only perhaps we look for the "link" differently. As windows, I take 10 sections of the sample and look for the "link" on each of them. How do you do it?

What is your algorithm for finding the link? Can you describe it?

 
СанСаныч Фоменко #:


This is NOT the first time I have said the above. Unfortunately, the discussion is constantly drifting into noise and narcissism.

Yes, the real discussion turns into a display of who is the most d'Artagnan of all the d'Artagnans against a background of (moderated word) :-)

All because of the lack of any results. You can improve and change the method all you like, but the result stays rock-solid at 50/50.

 
Aleksey Vyazmikin #:

So in essence you have the same approach as me, curious! Only perhaps we look for the "link" differently. As windows, I take 10 sections of the sample and look for the "link" on each of them. How do you do it?

What is your algorithm for finding the link? Can you describe it?

I use my own algorithm - it works much faster than numerous R libraries. For example,

library("entropy")

You can just use graphs:



Everything has already been posted in this thread. It is all described systematically and chewed over at the code level in the articles by Vladimir Perervenko.

 
Maxim Kuznetsov #:

Yes, the real discussion turns into a display of who is the most d'Artagnan of all the d'Artagnans against a background of (moderated word) :-)

All because of the lack of any results. You can improve and change the method all you like, but the result stays rock-solid at 50/50.

I don't have the energy to take it to the level of an Expert Advisor. But the model's result is a fitting error of 8% to 22%, and it differs little between the fitting period and out of sample.
