Discussion of article "Random Forests Predict Trends" - page 9

 
Reshetov:
Now you've ruined everything. And how is it that your high AI didn't immediately recognise a trivial sum?
 
gpwr:

faa1947, please show how your model works on the example below. The first column is the modelled series, the 2nd and 3rd columns are predictors. What is the predictive power of these predictors?

-0.35742 0.461646 -0.81907
0.431277 0.004545 0.426731
-0.00018 -0.4037 0.403528
-0.08575 0.90851 -0.99426
0.773826 0.008975 0.764852
0.426905 -0.96485 1.391757
0.253233 0.487955 -0.23472
0.20994 0.880554 -0.67061
-0.09929 0.160276 -0.25956
0.332911 -0.08736 0.420268
0.032258 0.360106 -0.32785
0.253027 -0.06859 0.321621
-0.66668 -0.54985 -0.11683
-0.5476 -0.13231 -0.41529
-0.75652 0.536488 -1.29301
-0.66109 -0.87314 0.212052
-0.09993 -0.86293 0.763
0.014625 0.715032 -0.70041
-0.48345 -0.62666 0.143206
-0.03596 0.935653 -0.97161
-0.17023 0.678024 -0.84826
0.293376 0.079529 0.213847
0.002922 0.754594 -0.75167
0.329194 -0.05535 0.384546
0.639423 -0.41358 1.053007
0.431631 -0.60334 1.034973
0.59125 0.497989 0.093262
0.266098 -0.79645 1.062549
-0.02624 0.643164 -0.6694
0.055014 -0.46214 0.517154
0.436132 -0.89992 1.336052
-0.30143 0.628619 -0.93005
-0.12168 0.886467 -1.00814
-0.10851 -0.0507 -0.0578
-0.74573 -0.50921 -0.23653
-0.574 0.244825 -0.81883
-0.87313 0.336188 -1.20932
-0.00344 0.117363 -0.1208
-0.20265 0.424326 -0.62697
0.177873 -0.17967 0.357541

I am not a mathematician in the general sense. I try to reason very concretely, and in trading it is not difficult to obtain a sample of 10,000 rows. I do not know how to draw any conclusions from 40 rows and see no need to learn, although everything I talk about applies to such samples too. For medicine, 40 rows is normal.

1. In general, I am writing about the overtraining of the model within which the "predictive ability" arose.

2. I am concerned with predicting nominal values - "long/short". These are classification models. Your example is a regression problem, and I do not work with regression models.

I'm willing to continue.

I need a price series on which I can run a ZigZag (ZZ) and get multiple reversals. From that I need a reasonably large file.

In addition to the quotes themselves, several predictors - even one will do. I will answer the question: does this predictor have predictive power for longs/shorts?

And of course a file is needed - or do you suggest typing everything by hand?

 

TheXpert:

Reshetov:

The secret of "high generalisability" of your sample is revealed: the value of the first column is the sum of the values of the other two columns.


Now you've ruined everything :) And how is it that your high AI didn't recognise a trivial sum at once?

And it is not designed to recognise sums: it builds binary-classification models, while this task belongs to multiple regression.

Although the model is for binary classification, the expression:

double x0 = 2.0 * (v0 + 0.96485) / 1.900503 - 1.0;

double x1 = 2.0 * (v1 + 1.00814) / 2.399897 - 1.0;

y = 0.12981203254657206 + 0.8176828303879957 * x0 + 1.0 * x1 - 0.005143248786272694 * x0 * x1;

simplifies to y ~ v0 + v1.

And then all that is left is to test the hypothesis in a spreadsheet.
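Instead of a spreadsheet, the hypothesis can also be checked in a few lines of Python. The rows below are the first five lines of gpwr's sample; the 1e-4 tolerance is an assumption that merely absorbs the rounding of the printed six-digit values.

```python
# Check Reshetov's hypothesis: column 1 equals column 2 + column 3.
# Rows are the first five lines of gpwr's sample (rounded as printed).
rows = [
    (-0.35742, 0.461646, -0.81907),
    (0.431277, 0.004545, 0.426731),
    (-0.00018, -0.4037,  0.403528),
    (-0.08575, 0.90851,  -0.99426),
    (0.773826, 0.008975, 0.764852),
]

for y, v0, v1 in rows:
    # Any difference is only rounding noise from the six-digit printout.
    assert abs(y - (v0 + v1)) < 1e-4

print("y == v0 + v1 up to rounding")
```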

 

Good afternoon, SanSanych.

On the issue of undertraining, overtraining, you can look at the draft book here http://www.iro.umontreal.ca/~bengioy/dlbook/.

Section 5.3.3 describes everything very well. In general, the whole book is very useful, especially as it is written by luminaries of the field.

Good luck

 
vlad1949:

Good afternoon, SanSanych.

On the issue of undertraining, overtraining, you can look at the draft book here http://www.iro.umontreal.ca/~bengioy/dlbook/.

Section 5.3.3 describes everything very well. In general, the whole book is very useful, especially as it is written by luminaries of the field.

Good luck

Good afternoon!

Thanks for the link.

I have a complete set of tools and a selection of literature on the subject. But it does not facilitate practical application.

If you wish, I can share it, in the hope that together we can get this whole toolkit working automatically.

 
faa1947:

Good afternoon!

Thanks for the link.

I have a complete set of tools and a selection of literature on the subject. But it doesn't make practical application any easier.

If you wish, I can share it, in the hope that together we can get this whole toolkit working automatically.

I solve this problem programmatically. The results are fine.

Good luck

 
faa1947:

I am not a mathematician in the general sense. I try to reason very concretely, and in trading it is not difficult to obtain a sample of 10,000 rows. I do not know how to draw any conclusions from 40 rows and see no need to learn, although everything I talk about applies to such samples too. For medicine, 40 rows is normal.

1. In general, I am writing about the overtraining of the model within which the "predictive ability" arose.

2. I am concerned with predicting nominal values - "long/short". These are classification models. Your example is a regression problem, and I do not work with regression models.

I'm ready to continue.

I need a price series on which I can run a ZigZag (ZZ) and get multiple reversals. From that I need a reasonably large file.

In addition to the quotes themselves, several predictors - even one will do. I will answer the question: does this predictor have predictive power for longs/shorts?

And, of course, a file is needed - or do you suggest typing everything by hand?

I see. It is quite easy to check whether a model is overtrained by comparing its behaviour on the training sample and outside it. But making a model not overtrained depends on our ability to determine which predictor inputs are relevant to the modelled series and which are not, and that is much harder than detecting overtraining. A model's ability to generalise depends on how overtrained it is.

The example I gave is very simple. The modelled series y is a noisy sine wave. The first predictor x1 is random numbers. The second predictor x2 is the difference x2 = y - x1. In other words, the model is described exactly by the sum of the predictors: y = x1 + x2.

The fact that you refused to apply your method to this simple example only raises the suspicion that your method is not capable of determining the relevance of the data - and determining that relevance is precisely the main goal of identifying overtraining and eliminating it. Real modelling problems are much more complex than my example: they include relevant data together with far more irrelevant data, and separating one from the other is incredibly difficult. A neural network fed all the inputs will develop connections to both the relevant and the irrelevant ones and will therefore be overtrained.

Since you apparently don't know how to determine the relevance of data, I have no interest in your articles and books. Good luck!
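The construction described above is easy to reproduce. A minimal sketch, assuming NumPy and arbitrary choices of sample size, noise level, and seed: y is a noisy sine wave, x1 is random, x2 = y - x1, and an ordinary least-squares fit recovers both coefficients as 1 - together the two predictors are perfectly "relevant" by construction.

```python
# gpwr's example: y = noisy sine, x1 = noise, x2 = y - x1, so y = x1 + x2.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 40)
y = np.sin(t) + 0.1 * rng.standard_normal(40)  # modelled series: noisy sine
x1 = rng.uniform(-1, 1, 40)                    # first predictor: pure noise
x2 = y - x1                                    # second predictor: complements x1

# OLS fit of y on (x1, x2) recovers the exact relation y = 1*x1 + 1*x2.
X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # ~ [1.0, 1.0]
```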
 
gpwr:

It is quite easy to check if the model is overtrained by comparing its behaviour on the training sample and outside.

This is a great illusion - and one that, as I understand it, you have not yet paid for. The model given in the article shows equally good results on three samples outside the training set - and yet that model is overtrained.

And making the model not overtrained depends on our ability to determine which predictor inputs are relevant to the modelled series and which are not, which is much harder than detecting overtraining.

First, read the article carefully - Table 3 gives the significance of the predictors in predicting the target variable.

And then study the fundamentals - for example, the specialised predictor-selection packages varSelRF, Boruta and FSelector. The CORElearn package alone has 35 (!) different algorithms for estimating which predictors matter for the target variable.

From my experience of selecting predictors that matter for the target variable:

1. Form a rather large set of predictors - say, 50 of them - over 15,000 bars.

2. Using one of the algorithms above, select predictors on these 15,000 bars; typically 15 to 20 remain that are used in model building in more than 20% of cases.

3. Then take a smaller window, say 2,000 bars, and move it forward one bar at a time, selecting the significant predictors from the 20 (out of 50) chosen earlier.

4. The specific list of significant predictors changes all the time.
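As a rough illustration of steps 1-4, here is a sketch on synthetic data. It is an assumption-laden placeholder: the real importance estimates would come from packages such as Boruta or CORElearn, a plain |correlation| ranking stands in for them here, and the window is moved in strides of 1,000 bars instead of one to keep the loop short.

```python
# Sliding-window predictor selection, sketched with a correlation proxy.
import numpy as np

rng = np.random.default_rng(1)
n_bars, n_pred = 15_000, 50
X = rng.standard_normal((n_bars, n_pred))            # 50 candidate predictors
# Only the first 5 predictors actually drive the synthetic target.
target = X[:, :5] @ rng.uniform(0.5, 1.0, 5) + rng.standard_normal(n_bars)

def select(Xw, yw, top_k):
    """Rank predictors by |correlation| with the target in this window."""
    corr = [abs(np.corrcoef(Xw[:, j], yw)[0, 1]) for j in range(Xw.shape[1])]
    return np.argsort(corr)[::-1][:top_k]

# Steps 1-2: shortlist ~20 of the 50 predictors on the full 15,000 bars.
shortlist = select(X, target, top_k=20)

# Steps 3-4: slide a 2,000-bar window and re-select from the shortlist;
# the current list of significant predictors drifts as the window moves.
window = 2_000
for start in range(0, n_bars - window, 1_000):
    sel = select(X[start:start + window, shortlist],
                 target[start:start + window], top_k=10)
    current = shortlist[sel]
```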

Since you apparently do not know how to determine the relevance of data, I have no interest in your articles and books.

The efficiency of these packages in your hands will increase greatly if you spend a small amount of money on my book, which explains why all this is needed, how to make sense of it, and gives real examples on real data.

And the effect will be even greater if you and I together try to build a non-overtrained model on your predictors. Success is not guaranteed, but it is guaranteed that you will not write such superficial posts after working with me. Moreover, you will be much more careful on real accounts.

 
faa1947:

1. Form a rather large set of predictors - say, 50 of them - over 15,000 bars.

Well, now it is clear why you make money selling your book and not trading.

 
faa1947:

These are great illusions - and, as I understand it, ones you have not yet paid for. The model given in the article shows equally good results on three out-of-sample sets - and yet that model is overtrained.

Overtraining is a well-established and quite specific term. You are not only substituting your own meaning for it, you are not even explaining what it means in your understanding.

The way you talk is very reminiscent of Sulton :)
