Machine learning in trading: theory, models, practice and algo-trading - page 29

 
Dr.Trader:

I plotted the dependence of R^2 and the percentage of winning cases on the number of components used. The best forward-test result came with 41 components (about 70% of cases won, which is very good), but you cannot see that from the backtest charts: there the curves just keep growing. Going by component importance alone, I should have taken 73 components, which is not the best result on the forward test.

R^2 on the forward test can be negative even when more than 50% of cases are won. This is because the required outputs are unbalanced: the number of class "0" examples differs from the number of class "1" examples, so their mean is not 0.5, and R^2 comes out a little worse because of that.

Use cross-validation to choose the number of components; then check the value that wins the cross-validation on a separate validation set.
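As a rough illustration of this advice (nothing of the kind was posted in the thread, and the choice of Python/scikit-learn is mine), here is a minimal sketch that picks the number of PCA components by cross-validation and then checks the winner on a held-out validation set; synthetic data stands in for the attached file.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=2000, n_features=100, n_informative=10,
                           random_state=0)

# Hold out a validation set; cross-validation runs on the training part only.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])

# Try several candidate numbers of components with 5-fold cross-validation.
grid = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20, 41, 73]},
                    cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("best number of components by CV:", grid.best_params_["pca__n_components"])
# Final check of the CV winner on data it has never seen.
print("validation accuracy:", grid.score(X_valid, y_valid))
```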
 

Since we're on a roll, I am attaching my dataset (binary classification).

There are nine input parameters (the first columns), all of them informative, and one output (the rightmost column).

If the output is 1, the next bar's opening-price difference is positive; if it is 0, it is negative.

I'm interested to see whose algorithm will show better generalization ability than mine.

Files:
datasets.zip  21 kb
 
Yury Reshetov:

Since we're on a roll, I am attaching my dataset.

There are nine input parameters (the first columns), all of them informative, and one output parameter (the rightmost column).

If the output is 1, the next bar's opening-price difference is positive; if it is 0, it is negative.

I'm interested to see whose algorithm will show better generalization ability than mine.

1. How is the "informativeness of predictors" proved?

2. What is "generalizability"?

 
SanSanych Fomenko:

1. How is the "informativeness of predictors" proved?

2. What is "generalizability"?

1. A marked deterioration in generalizability if at least one informative predictor is removed from the sample

2. See video:


 
Yury Reshetov:

1. A marked deterioration in generalizability if at least one informative predictor is removed from the sample

2. See video:


Yury, hi. I will try to look through your data.
 
Alexey Burnakov:
Yury, hi. I will try to look through your data.

Greetings!

If you are interested in the data, I can post a script that collects the information from the charts and writes it to a file.

 
Yury Reshetov:

1. A marked deterioration in generalizability if at least one informative predictor is removed from the sample

2. See video:


2. See video:

Sorry, but this is the usual nonsense of an uneducated graduate student who has not yet been told that, besides his beloved self, there are many other people who not only know all this, and not only have advanced much further, but have implemented it in algorithms used by millions of people (if you count students among them).

1. A marked deterioration in generalizability if at least one informative predictor is removed from the sample

Believe me, unfortunately this proves nothing. Moreover, if the set of predictors is bad (contains a lot of noise), then this effect only gets stronger as the noise increases. The explanation is very simple: the more noise there is, the easier it is for the algorithm to find "convenient" values in it.
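The test being argued about here can be written down in a few lines. This is my own hedged sketch in Python/scikit-learn (the thread fixes no library): compare cross-validated accuracy with and without each predictor; on data with many noise columns the measured drops bounce around, which is exactly the objection above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: only 3 of 20 predictors are actually informative.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=3,
                           n_redundant=0, random_state=1)

model = RandomForestClassifier(n_estimators=200, random_state=1)
baseline = cross_val_score(model, X, y, cv=5).mean()

for col in range(X.shape[1]):
    X_drop = np.delete(X, col, axis=1)          # remove one predictor
    score = cross_val_score(model, X_drop, y, cv=5).mean()
    print(f"predictor {col}: score change {score - baseline:+.3f}")
```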

Regarding the problem in general.

There are quite a large number of algorithms that determine the importance of predictors for a given target variable. These algorithms can be divided into two groups: those built into the model-building algorithm and those that exist on their own. In my opinion, and in the opinion of people in this thread and at the link I cited here, all these algorithms suffer from one common drawback: if there is a certain critical number of noise predictors among the inputs, the algorithm stops working and, moreover, starts discarding predictors that are relevant to the target variable.

That is why we in this thread try to clean up the initial set of predictors beforehand, and then work with the remaining predictors using standard methods.
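The two groups of algorithms mentioned above can be sketched side by side; the following is only an illustration in Python/scikit-learn (not the tooling used by anyone in the thread): an importance measure embedded in the model-building algorithm versus a standalone filter that looks only at the predictor-target dependence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, random_state=2)

# Group 1: importance built into the model-building algorithm.
embedded = RandomForestClassifier(n_estimators=300, random_state=2).fit(X, y)
print("embedded importances:", np.round(embedded.feature_importances_, 3))

# Group 2: a standalone filter, independent of any particular model.
standalone = mutual_info_classif(X, y, random_state=2)
print("mutual information:  ", np.round(standalone, 3))
```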

Regarding your file.

1. I tried 6 classification models on your data and could not get the error below 50% with any of them. If you want, I can post the results here.

2. The reason for this result is that you have a very bad set of predictors: noise, i.e. predictors that have no relation to the target variable. Predictors 6, 7 and 8 have some predictive power, but very little. I do not work with predictors like these. The rest are just noise.

PS.

If you are really interested in the subject, learn caret. Once you master it, you will be able to teach that clever guy from the video a thing or two. caret contains almost 200 models, plus very useful preprocessing functions and two very good algorithms for selecting predictors.

PPS.

Once, on a forum, I posted my view of what "the predictor is relevant to the target variable" means.

So.

Let's take the target variable: male/female.

Predictor: clothing.

If the predictor (clothing) contains only skirts and trousers, then for the population of a number of countries this predictor is 100% relevant to the target variable (the correspondence is one-to-one). But clothing comes in all kinds and in far greater variety, so the relevance is not 100% but less. That is, one set of clothes may bear a relation to the target variable, while another may have no relation to it at all, i.e. it is noise. So the problem is how to find the NOT-noise predictors, given that a predictor may be noise in one window and not in another. And what is the measure of this "noisiness"?
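One possible way to put a number on that "noisiness" for a categorical predictor like clothing (my own illustration, not the poster's definition) is to build a contingency table against the target and compute Cramér's V: it approaches 1 when the predictor pins down the class and 0 when it carries no information.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: clothing categories; columns: counts of (female, male). Made-up numbers.
table = np.array([
    [120,   5],   # skirt    -> almost exclusively female
    [ 10, 115],   # trousers -> almost exclusively male
    [ 60,  60],   # t-shirt  -> carries no information
])

chi2, p, dof, _ = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"Cramér's V = {cramers_v:.2f}")   # near 1 = relevant, near 0 = noise
```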

 
Yury Reshetov:

Greetings!

If you are interested in the data, I can post a script that collects the information from the charts and writes it to a file.

I have a question too. Should I build a predictor on the training set and measure the error on the test set? Then we can compare with your result, right?
 
Alexey Burnakov:
I have a question too. Should I build a predictor on the training set and measure the error on the test set? Then we can compare with your result, right?
Mm-hmm.
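The protocol just agreed on, written out as a minimal sketch (Python/scikit-learn chosen only for illustration; the CSV file name and the assumption that the target sits in the last column follow the dataset description earlier in the thread):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

data = np.loadtxt("dataset.csv", delimiter=",")   # hypothetical file name
X, y = data[:, :-1], data[:, -1]                  # target in the last column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
error = 1.0 - model.score(X_test, y_test)         # classification error on test
print(f"test error: {error:.3f}")                 # compare this figure across models
```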
 

Colleagues, if you have time, could you ask me questions under the article? https://habrahabr.ru/company/aligntechnology/blog/303750/

Because Habr is completely silent!

Methodological notes on selecting informative features (feature selection)
  • habrahabr.ru
Hi everyone! My name is Alexey Burnakov. I am a Data Scientist at Align Technology. In this article I will tell you about the feature-selection approaches we practice in the course of our data-analysis experiments. At our company, statisticians and machine-learning engineers analyze large volumes of clinical information related to treatment...