Machine learning in trading: theory, models, practice and algo-trading - page 81

 
mytarmailS:
there's a ser :)

)))

Are you a WealthLab trader, by any chance?

 
Alexey Burnakov:

)))

Are you a WealthLab trader, by any chance?

No, it's TSLab. I export data from R and use TSLab to simulate trading on the R model's signals; it's faster, more convenient and more visual than simulating the trades in R itself. I also use stop-losses, take-profits, commissions... so writing all of that in R would be a real headache.
 
Dr.Trader:

4) if you do cross-validation during training, repeat it several times on the same data, see how much variation there is in the results, and choose the models and predictors with the smallest variation

That's what comes to mind right now, but it's far from all the possible problems.

You'll be surprised and will definitely disagree with me (and not only you :) ), but I don't believe that cross-validation is effective in the market, at least not in its classic application.
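
Dr.Trader's point 4 quoted above can be sketched in a few lines of R. This is only an illustration under assumptions: the caret and randomForest packages and a hypothetical data frame `dat` with a factor target `y`, none of which come from the thread.

```r
# Repeated 5-fold cross-validation: look at the spread of the per-fold
# results, not just their mean (Dr.Trader's point 4).
library(caret)          # assumed; wraps randomForest via method = "rf"

set.seed(1)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10)
fit  <- train(y ~ ., data = dat, method = "rf", trControl = ctrl)

fit$resample                 # accuracy of every fold in every repeat
sd(fit$resample$Accuracy)    # a large spread means an unstable model / predictor set
```

A predictor set whose accuracy swings widely from repeat to repeat is exactly what that point suggests discarding.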

 
mytarmailS:

You'll be surprised and will definitely disagree with me (and not only you :) ), but I don't believe that cross-validation is effective in the market, at least not in its classic application.

Why so few trees?
 
mytarmailS:

You'll be surprised and will definitely disagree with me (and not only you :) ), but I don't believe that cross-validation is effective in the market, at least not in its classic application.

Let's start with how you understand the very meaning of cross-validation. Can you explain it?
 
SanSanych Fomenko:
Why so few trees?

Well, it turns out that the more trees there are, the fewer deals the system makes, and their quality doesn't improve at all.

For example, if my model makes 500 deals with the parameters 10/5, then with the parameters 5/200 (5 splits, 200 trees) it makes one deal or none at all. Generalization goes down: the model ends up looking for very specific situations that happened once but will never happen again in the future.
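
A rough way to see this "more trees, fewer deals" effect. This is a sketch, not mytarmailS's actual code: `train_set`, `test_set` and the three-class factor target `y` ("-1"/"0"/"1") are hypothetical, and since "splits" is not a randomForest parameter, only the number of trees is varied here.

```r
library(randomForest)   # assumed

for (nt in c(5, 200)) {
  fit <- randomForest(y ~ ., data = train_set, ntree = nt)
  sig <- predict(fit, test_set)
  # a "deal" here = any out-of-sample prediction other than "no reversal"
  cat("ntree =", nt, " deals:", sum(sig != "0"), "\n")
}
```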

 
Alexey Burnakov:
Let's start with how you understand the very meaning of cross-validation. Can you explain it?

I'm sure it's the same as yours.

We divide the data segment into 5 parts, train on 4 of them and test on the 5th, then cycle through all the variants so that the out-of-sample check falls on each of the 5 parts of the sample, and compute the average error.

That seems to be it, if I haven't forgotten anything.

 
mytarmailS:

Well, it turns out that the more trees there are, the fewer deals the system makes, and their quality doesn't improve at all.

For example, if my model makes 500 deals with the parameters 10/5, then with the parameters 5/200 (5 splits, 200 trees) it makes one deal or none at all. Generalization goes down: the model ends up looking for very specific situations that happened once but will never happen again in the future.

It's an interesting idea. So it turns out you fight overfitting with the number of trees?
 
mytarmailS:

I'm sure it's the same as yours.

We divide the data segment into 5 parts, train on 4 of them and test on the 5th, then cycle through all the variants so that the out-of-sample check falls on each of the 5 parts of the sample, and compute the average error.

That seems to be it, if I haven't forgotten anything.

Yes. What do you need this for? To find the optimal training parameters.

What don't you like about this approach? How are you going to select parameters?
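
Using the 5-fold scheme described above for parameter selection, as Alexey suggests, might look roughly like this; `dat`, its factor target `y`, and ntree as the parameter being tuned are all assumptions made for the sketch.

```r
library(randomForest)   # assumed

cv_error <- function(dat, ntree, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(dat)))  # random fold labels
  mean(sapply(seq_len(k), function(i) {
    fit <- randomForest(y ~ ., data = dat[folds != i, ], ntree = ntree)
    mean(predict(fit, dat[folds == i, ]) != dat$y[folds == i])  # error on the held-out fold
  }))
}

grid   <- c(5, 50, 200, 500)
errors <- sapply(grid, function(nt) cv_error(dat, nt))
grid[which.min(errors)]   # the ntree value this cross-validation would pick
```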

 
SanSanych Fomenko:
It's an interesting idea. So it turns out you fight overfitting with the number of trees?

not really...

What I'm writing is only applicable to my approach.

You know how I build my target: it's reversals.

I have three classes: reversal "up", reversal "down" and "no reversal" (1, -1, 0).

You also know that the class skew is enormous: there are dozens of times more observations of class "0" than of "-1" and "1".

This means the model learns the "0" class best, since it has the most observations. And as you add trees during training, the "0" class gets learned better and better, and as it gets stronger it starts to absorb (squeeze out) the "1" and "-1" classes. That's why the more trees there are, the fewer deals the model makes.
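
For what it's worth, the randomForest package has a standard way to keep a dominant class from squeezing out the others: stratified sampling per tree. A minimal sketch under assumptions, with a hypothetical `train_set` whose factor target `y` has the levels "-1", "0", "1".

```r
library(randomForest)   # assumed

x <- train_set[, setdiff(names(train_set), "y")]
y <- train_set$y                        # factor with levels "-1", "0", "1"

n_min <- min(table(y))                  # size of the rarest class
fit <- randomForest(x, y,
                    ntree    = 200,
                    strata   = y,
                    sampsize = rep(n_min, nlevels(y)))  # balanced draw for every tree

fit$confusion   # out-of-bag confusion matrix: "0" no longer swallows "-1" and "1"
```

Whether that helps the trading result is a separate question, but it removes the mechanical reason why adding trees kills the "1" and "-1" signals.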
