Machine learning in trading: theory, models, practice and algo-trading - page 1202

 
Maxim Dmitrievsky:

I do not really understand what is in the figures and the essence of the problem

The figures show the financial result (y-axis) of the model when choosing different probability thresholds for the binary classification (x-axis). On the test sample it turned out that one should always enter the market whenever the activation signal appears (training decides whether to enter or not). The resulting paradox is that training only worsens the basic activation signal, and I would not have noticed it if I had not decided to look at how the financial result changes as the classification threshold is shifted along the probability range.
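For illustration, a minimal sketch of such a threshold sweep in Python (the names proba and trade_pnl are placeholders I introduce, not from the post): each threshold defines which activation signals are actually traded, and a threshold of 0 corresponds to "always enter on the signal".

import numpy as np

def pnl_vs_threshold(proba, trade_pnl):
    """Financial result as a function of the classification threshold.

    proba     -- predicted probability of the "enter" class for each activation signal
    trade_pnl -- realized profit/loss of each of those trades if it were taken
    """
    thresholds = np.linspace(0.0, 1.0, 101)
    # At threshold 0 every signal is traded; near 1 almost none are.
    pnl = np.array([trade_pnl[proba >= t].sum() for t in thresholds])
    return thresholds, pnl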

Maxim Dmitrievsky:

I made a lot of variants of models, and now I'm trying to figure out which one to choose for monitoring :D or to improve it further

In short... I can't decide which one to choose for monitoring :D, or whether to improve it further.

Because for each size of the moving window there should be a different distribution from which trades are drawn. That way the model fits better, including to the test sample (whereas zigzag or other labelings are very deterministic in themselves - there are few degrees of freedom for the fitting). The last thing left is to do exactly that, i.e. a more thorough enumeration of the labelings, and after that there really will be nothing else to do.

As inputs - increments with different lags, the old way, with feature selection via importances and maybe via PCA to get rid of correlation; I made such variants of bots too. But in general, using PCA is a flawed idea (although, again, the Internet says the opposite). Not only does the sample have to be centered, but on new data these components slowly turn into garbage.
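A sketch of what such lagged-increment inputs with centering and PCA could look like (scikit-learn based; the function and variable names are mine). The transform has to be fitted on the training part only and then reused, which is exactly where the components tend to degrade on new data:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def lagged_increments(close, lags=(1, 2, 3, 5, 10)):
    """Feature matrix of price increments taken with different lags."""
    max_lag = max(lags)
    cols = [close[max_lag:] - close[max_lag - lag:-lag] for lag in lags]
    return np.column_stack(cols)

close = np.cumsum(np.random.randn(2000))          # placeholder price series
X = lagged_increments(close)

# Centering/scaling and PCA are fitted on the training part only;
# on new data the same fitted transform is reused.
X_train, X_new = X[:1500], X[1500:]
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3).fit(scaler.transform(X_train))
X_train_dec = pca.transform(scaler.transform(X_train))
X_new_dec = pca.transform(scaler.transform(X_new))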

We approach the problem very differently. A purely mathematical description of prices without any real (visually observable pattern) justification is alien to me. On the contrary, I apply ZZ and see its effectiveness (predictors based on ZZ are always at the top of the list in all ML packages). I think combining the two approaches could improve the results.

Selecting models through significance is nonsense - I've shown before that removing different significant predictors from the same model can improve the learning results and form new, more productive and stable relationships in the tree leaves. All this "importance" reflects the greedy principle of tree construction, which is not a priori correct, so we need separate, meaningful methods for evaluating predictors - I don't have them yet.

 

Maxim Dmitrievsky:

All this gives something like this, pretty much no fuss, just wait 10 minutes:

the possibility of further improvement actually seems dubious when the model is already reproducing more than 100% of the training section

maybe on some good segment of the chart/instrument you could squeeze out more

It looks good, but the model's period is too short - how does it behave on data from a year ago?

 
Aleksey Vyazmikin:

The figures show the financial result (y-axis) of the model when choosing different probability thresholds for the binary classification (x-axis). On the test sample it turned out that one should always enter the market whenever the activation signal appears (training decides whether to enter or not). The resulting paradox is that training only worsens the basic activation signal, and I would not have noticed it if I had not decided to look at how the financial result changes as the classification threshold is shifted along the probability range.

We approach the problem very differently. A purely mathematical description of price without any tangible (visually observable pattern) justification is alien to me. On the contrary, I apply ZZ and see its effectiveness (predictors based on ZZ are always at the top of the list in all ML packages). I think combining the two approaches could improve the results.

Selecting models through significance is nonsense - I've shown before that removing different significant predictors from the same model can improve the learning results and form new, more productive and stable relationships in the tree leaves. This whole "importance" thing reflects the greedy principle of tree construction, which is not a priori correct, so we need separate, meaningful methods for evaluating predictors - I don't have them yet.

Well, importances should be examined on the test sample, not only on the training one. Classical built-in importance like Gini always lies; you have to do permutation (randomize each of the predictors one by one and look at the model error), then throw the worst ones out. You need to get rid of correlated features beforehand, otherwise importances via permutation will also lie. No nonsense, and you get the best possible model. Why try to reinvent the wheel if nothing better has been invented so far.
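For reference, scikit-learn ships a ready-made version of this permutation check; a minimal sketch assuming an already fitted classifier model and a held-out test set (the variable names are placeholders):

from sklearn.inspection import permutation_importance

# model is an already fitted classifier; X_test / y_test were not used in training.
result = permutation_importance(model, X_test, y_test,
                                scoring="accuracy", n_repeats=10, random_state=0)

# Rank features by the mean drop in test accuracy when each one is shuffled.
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")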

 
Aleksey Vyazmikin:

It looks good, but the period of the model is too short, how does it behave on data from a year ago?

15 minutes; I just don't train on a longer period because it takes longer to wait

Maybe only fxsaber can make such a smooth chart with 15-minute OOS for a few years :)

I have no-action learning, i.e. no strategy is built into the model from the beginning.
 
Aleksey Vyazmikin:

I'm looking at the graph of profit dependence on the number of trees in the model (512 models)

and it looks like models with more than 60 trees lose money less often, or else the sample is simply small...

Here are other graphs with different numbers of trees.

Sample 7400 for all, rf algorithm

Number of trees is 50.


The error drops as the number of trees grows. It would seem we should keep increasing it - maybe it will drop all the way to zero.

Number of trees = 150


At 150 trees the accuracy improves, but not by much - by a couple of hundredths.

Let's increase the number of trees.




Conclusion: it makes sense to increase the number of trees up to about 50, but beyond 100 there is no point.

I'm too lazy to redo it now, but I have also varied the sample size.

Up to 1000 observations, the sample size strongly affects the accuracy of the model. But beyond 5000 observations, sample size does NOT affect model accuracy.


Hence I conclude: the error is not determined by the model or its parameters, but by the "predictors-target" linkage.
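A sketch of how such a sweep over the number of trees and the sample size could be reproduced (scikit-learn's RandomForestClassifier in place of the original R rf; X and y are placeholder data, not the author's):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def accuracy_for(X, y, n_trees, train_size):
    """Test accuracy of a random forest for a given tree count and training-set size."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_size, random_state=0)
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

# Vary the number of trees at a fixed sample size ...
for n in (10, 50, 100, 150, 300):
    print(n, accuracy_for(X, y, n_trees=n, train_size=5000))

# ... then vary the sample size at a fixed number of trees.
for size in (500, 1000, 2000, 5000):
    print(size, accuracy_for(X, y, n_trees=100, train_size=size))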

 
SanSanych Fomenko:

Let us increase the number of trees

Hence I conclude: the error is determined NOT by the model or its parameters, but by the "predictors-target" link.

50-100 trees is the usual initial recommendation; there is no sense in multiplying weak classifiers beyond that, and I did not see any improvement either.

Monte Carlo and the like to the rescue, SanSanych... constructing such bundles by hand is not a job for the human mind, it's just brain-f...
 
Maxim Dmitrievsky:

Well, importances should be examined on the test sample, not only on the training one. Classical built-in importances like Gini always lie; you have to do permutation (sequentially randomize each of the predictors and watch the model error), then throw out the worst ones. You need to get rid of correlated features beforehand, otherwise importances via permutation will also lie. No nonsense, and you get the best possible model. Why reinvent the wheel if nothing better has been invented yet.

Honestly, I don't understand the method - are we talking about excluding predictors from training one by one and comparing the results with and without that predictor? Then what does randomizing mean? And how do you decide whether a predictor is bad or not - if a predictor lets you split 1% of the sample correctly and sits at mid-depth in the tree, is it good or bad? Maybe you just need to assess the quality of the tree construction starting from the root predictor - how it cuts the sample at each level; maybe you need a smooth gradient of decay... The wheel does have to be reinvented, because what is publicly available is not the best of what exists; for example, maybe the sample should be split not at the maximum but at the mean or at x sigma, or whatever - maybe the rules will be more complicated, but more stable. By the way, I do not understand why there is no training method that would use not only numeric thresholds for splits, but also logical ones that compare predictors with each other.

Maxim Dmitrievsky:

15 minutes; I just don't train on a longer period because it takes longer to wait

I have a feeling that only fxsaber could make such a smooth chart on 15-minute OOS for a few years :)

I have no-action learning, i.e. no strategy is built into the model from the very beginning

Have you ever tried to create a primitive strategy and train filters for it that would allow or forbid entering the market?

 
Aleksey Vyazmikin:

Honestly, I don't understand the method - are we talking about excluding predictors from training one by one and comparing the results with and without that predictor? Then what does randomizing mean? And how do you decide whether a predictor is bad or not - if a predictor lets you split 1% of the sample correctly and sits at mid-depth in the tree, is it good or bad? Maybe you just need to assess the quality of the tree construction starting from the root predictor - how it cuts the sample at each level; maybe you need a smooth gradient of decay... The wheel does have to be reinvented, because what is publicly available is not the best of what exists; for example, maybe the sample should be split not at the maximum but at the mean or at x sigma, or whatever - maybe the rules will be more complicated, but more stable. By the way, I do not understand why there is no training method that would use not only numeric thresholds for splits, but also logical ones that compare predictors with each other?

Have you ever tried to create a primitive strategy and train filters for it that would approve or forbid an entry into the market?

First train the model with all features, save errors

then randomize each of the predictors in turn, say, with a normal distribution, and check the error again on all features, including this randomized (changed) one, comparing it with the initial error. There is no need to retrain the model. Check each of the predictors this way. If a predictor was good, the error on the entire sample (with all other predictors left original) will increase dramatically compared to the initial one. Save the error differences and sift out the best features based on them. Then, at the end, train on the best ones only and put the model into production. Bad predictors are just noise for the model - who needs them with their 1%. Usually only 5-10 good ones remain; the importance of the rest falls off exponentially (Zipf's law).
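A minimal sketch of this procedure as described (the trained model stays fixed, one column at a time is randomized on the test matrix; all names here are placeholders):

import numpy as np
from sklearn.metrics import accuracy_score

def permutation_drops(model, X_test, y_test, seed=0):
    """Accuracy drop per feature when that single column is randomized (no retraining)."""
    rng = np.random.default_rng(seed)
    base_acc = accuracy_score(y_test, model.predict(X_test))
    drops = []
    for j in range(X_test.shape[1]):
        X_perturbed = X_test.copy()
        # Shuffle the column (or, as suggested, draw it from a normal distribution)
        # so that its link to the target is destroyed.
        X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
        drops.append(base_acc - accuracy_score(y_test, model.predict(X_perturbed)))
    return np.array(drops)   # large drop => important feature; keep the top 5-10 and retrain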

I tried training filters, but not much; I don't see much point - it's better to put everything into one model at once.

Here, by the way, is a VERY competent piece specifically on predictor selection (I already posted it earlier):

Beware Default Random Forest Importances (explained.ai)
 
SanSanych Fomenko:

Here are other graphs with different numbers of trees.

Sample 7400 for all, algorithm rf

The number of trees is 50.


The error drops as the number of trees grows. It would seem we should keep increasing it - maybe it will drop all the way to zero.

Number of trees = 150


At 150 trees the accuracy improves, but not by much - by a couple of hundredths.

Let's increase the number of trees.




Conclusion: it makes sense to increase the number of trees up to about 50, but beyond 100 there is no point.

I'm too lazy to redo it now, but I have also varied the sample size.

Up to 1000 observations, the sample size strongly affects the accuracy of the model. But beyond 5000 observations, sample size does NOT affect model accuracy.


Hence my conclusion: the error is NOT determined by the model or its parameters, but by the predictor-target relationship.


I think the right number of trees can differ between random forests and the various types of boosting, and it depends on the quality of the predictors and on the situations, which can differ for the same target (for example, a target of 100 points of profit from any entry point). It would be interesting to see which combinations of leaves are used to make decisions and how often - I think that kind of information could evaluate the model better. Another problem is that it is impossible to feed the market to training and testing as something stationary, which means that only part of the trained model will be exercised in tests and the model has to be judged by that part, while the other part may turn out to be much better. And if the cost of a classification error is not equal in absolute value to that of a correct classification (we use a trailing stop and thereby reduce the cost of errors), model evaluation becomes even more complicated.

 
Maxim Dmitrievsky:

first train the model on all features, save the errors

then randomize each of the predictors in turn, say, with a normal distribution, and check the error again on all features, including this randomized (changed) one, comparing it with the initial error. There is no need to retrain the model. Check each of the predictors this way. If a predictor was good, the error on the entire sample (with all other predictors left original) will increase dramatically compared to the initial one. Save the error differences and sift out the best features based on them. Then, at the end, train on the best ones only and put the model into production. Bad predictors are just noise for the model - who needs them with their 1%. Usually only 5-10 good ones remain; the importance of the rest falls off exponentially (Zipf's law).

I tried training filters, but not much; I don't see the point - it's better to put everything into one model at once.

Here, by the way, is a VERY competent piece specifically on predictor selection (I already posted it earlier).

Thank you. Randomize using the same values the predictor takes in the sample, right?

In general, the approach is clear, thank you, I need to think about how to implement and try it out.

Sorry, I won't be able to get through it, so I'll listen to your retellings of it on occasion.

But again, I don't think this is quite right, because it all depends on how close to the root the predictor is in the tree...