Machine learning in trading: theory, models, practice and algo-trading - page 897

 
Aleksey Vyazmikin:

assessment of oob (out of bag)

 
Maxim Dmitrievsky:

oob (out of bag) estimation

I read about this method here: https://habr.com/company/ods/blog/324402/ but I could not see how this estimation helps find a pattern in data that keeps changing. Maybe I'm wrong, but here is a simple example: suppose the sample contains a pattern like this

"

1+2=3

...

1+2=3,5

...

1+2=3,8

...

1+2=3,5

...

1+2=3

"

"..." - is not a certain period of time after which the rule changes. Even if ideally there is a pattern of rule changes. How can they find this regularity saying that a rule will be changed in n sample lines and then n*x rules will return to their original state? What if the nature of the rule change is not just a time interval, but the influence of other circumstances whose data are in the sample, but the regularity of their influence can only be estimated by the sequence of events (i.e. by the sequence in which each row of data is submitted)? Forests pull chunks by different methods, how can they see not only the horizontal (set of predictors) regularity, but also the vertical (change of predictors relative to the past n)?

 
Aleksey Vyazmikin:

I'll answer later in the evening... I suddenly had a craving for pizza and blondie

 
Maxim Dmitrievsky:

I'll answer later in the evening... I suddenly had a craving for pizza and blondie

Spring can explain the suddenness :)

I will wait for the answer, thanks for taking the time to answer my probably stupid questions.

 
Aleksey Vyazmikin:

Spring can explain the suddenness :)

I will wait for the answer, thanks for taking the time to answer my probably stupid questions.

On the contrary, these are good, logical questions; I was asking them myself not long ago.

 
Aleksey Vyazmikin:

"..." - it is not a certain period of time after which the rule changes. Even if ideally there is a pattern of rule changes. How can forests find this regularity saying that a rule will be changed in n sample lines, and then in n*x lines the rule will return to its original state? What if the nature of the rule change is not just a time interval, but the influence of other circumstances whose data are in the sample, but the regularity of their influence can only be estimated by the sequence of events (i.e. by the sequence in which each row of data is submitted)? Forests pull chunks using different methods, how can they see not only the horizontal (set of predictors) pattern, but also the vertical (change in predictors relative to past n)?

Well, not exactly a change in the pattern; more like a coarser approximation. If the sample is large enough, the forest is trained on random subsets, it pulls chunks out of the sample, yes, and the model is validated on the oob (the remaining chunks), and the errors are compared. If the errors are roughly the same, the forest is not overfitted, so there is a higher probability of correct predictions in the future. If the error on the oob is not satisfactory, you can play with the settings a bit, for example reduce the training subset (adding more noise to the model) and increase the validation subset. By doing so, the model will approximate the training sample worse and the error will be larger, but on new data there is a chance of getting roughly the same error, i.e. the model will be stable on both subsamples. And since the subsamples themselves are chosen at random, a large number of unknowns in the training subsample are covered. Clearly this is not a panacea, but it gives more flexibility compared to plain trees. The same goes for NN ensembles.
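A minimal sketch of this OOB check, assuming scikit-learn's RandomForestClassifier and synthetic data; the max_samples fraction stands in for "reducing the training subset", and comparing the in-sample score with the OOB score is the overfitting check described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # some nonlinear dependence

for frac in (None, 0.5, 0.2):  # None = full bootstrap, smaller = noisier, coarser trees
    rf = RandomForestClassifier(
        n_estimators=300,
        max_samples=frac,      # fraction of rows drawn for each tree
        oob_score=True,        # validate each tree on its out-of-bag rows
        random_state=0,
        n_jobs=-1,
    ).fit(X, y)
    print(f"max_samples={frac}: train={rf.score(X, y):.3f}  oob={rf.oob_score_:.3f}")

If the train score stays much higher than the OOB score, the forest is overfitted; shrinking max_samples usually brings the two closer together at the cost of a rougher fit.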

 
Maxim Dmitrievsky:

Well, not exactly a change in the pattern; more like a coarser approximation. If the sample is large enough, the forest is trained on random subsets, it pulls chunks out of the sample, yes, and the model is validated on the oob (the remaining chunks), and the errors are compared. If the errors are roughly the same, the forest is not overfitted, so there is a higher probability of correct predictions in the future. If the error on the oob is not satisfactory, you can play with the settings a bit, for example reduce the training subset (adding more noise to the model) and increase the validation subset. By doing so, the model will approximate the training sample worse and the error will be larger, but on new data there is a chance of getting roughly the same error, i.e. the model will be stable on both subsamples. And since the subsamples themselves are chosen at random, a large number of unknowns in the training subsample are covered. Clearly this is not a panacea, but it gives more flexibility compared to plain trees. The same goes for NN ensembles.

Well, roughly as I thought: put simply, the rules are checked, each conditionally independent tree against its own subsample, and through this cross-checking the overfitting error is caught; but in the same way all temporal patterns whose causality could not be established get cut out (and that causality could only be established by chance, if a tree happened to check its result against the very subsample where the pattern persisted).

And if we slice the sample and train on smaller chunks (say, cut a year into 12 months and take 2-3 years), and then, in the case of a tree, gather from each tree the rules with more weight and match them against the 24 samples (if a rule works on less than x% of the samples, throw it out), won't we see that different rules work in different periods? Then we could make an assumption about cyclicality, which must be present in financial markets due to timing (financial reports).
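A rough sketch of that idea, assuming a hypothetical pandas DataFrame df with a datetime index, numeric predictors and a binary "target" column: fit a small tree (a handful of explicit rules) on each month and score it on every other month, looking for period-dependent behaviour.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def cross_period_scores(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    # split the sample into monthly chunks by the datetime index
    months = {period: chunk for period, chunk in df.groupby(df.index.to_period("M"))}
    scores = pd.DataFrame(index=list(months), columns=list(months), dtype=float)
    for m_train, d_train in months.items():
        # a shallow tree = a few explicit rules learned on one month
        tree = DecisionTreeClassifier(max_depth=4, random_state=0)
        tree.fit(d_train.drop(columns=[target]), d_train[target])
        for m_test, d_test in months.items():
            # how well do this month's rules hold in every other month?
            scores.loc[m_train, m_test] = tree.score(
                d_test.drop(columns=[target]), d_test[target]
            )
    return scores

Rows of the resulting table show the month the rules were learned on, columns the month they were tested on; a repeating band of good scores would hint at the cyclicality mentioned above.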

For example, many people write about correlation analysis as a preliminary method for evaluating predictors, but when I look at the table I cannot understand it: the correlation is small, yet after the tree is built it assigns high importance to this element. Why does this happen?
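One possible reason, shown on purely synthetic data (none of the table's values are used): a predictor whose effect is non-monotonic can have near-zero linear correlation with the target and still get high importance in a tree.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
x1 = rng.normal(size=20_000)      # useful, but only through its magnitude
x2 = rng.normal(size=20_000)      # pure noise
y = (np.abs(x1) > 1).astype(int)  # symmetric rule, so linear correlation with x1 is ~0

print("corr(x1, y):", round(float(np.corrcoef(x1, y)[0, 1]), 3))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(np.column_stack([x1, x2]), y)
print("importances:", tree.feature_importances_)  # x1 dominates despite near-zero correlation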


If we take a predictor called "arr_TimeH" and think about it, it is obvious that we can expect different behavior from the market at different times. For example, at 10 am, when the exchange opens, there will be a strong move, because the information (accumulated events) from the period without trading is being worked off; at other times the situation may be different, say scheduled news comes out, after which a strong market move is very likely; on the other hand, there is the evening session, where the move often turns against the previous day and may have a smaller amplitude. In other words, time clearly influences how much the market will move. That is why I think ML methods for trading should be applied with an adjustment for trading, rather than simply trusting established traditions, including for data preprocessing.


P.S. I drew the tables in Photoshop, ticking the checkboxes at random just to show the color, and was shocked when I saw that the color of the checkboxes coincided with the color of the significance scales, tone for tone! How so? It turns out I was unconsciously paying attention to it and it affected my choice; maybe people trade intuitively in the same way, i.e. according to a system they are not aware of.

 
Aleksey Vyazmikin:

For example, many people write about correlation analysis as a preliminary method for evaluating predictors, but when I look at the table I cannot understand it: the correlation is small, yet after the tree is built it assigns high importance to this element. Why does this happen?

Perhaps, by a combination of your time predictors (month, week, day, hour...), the tree simply homes in on a specific BUY/SELL bar.

It's like memorizing the time of big bars and trading profitably by it on history, even though the correlation of this attribute with price movement will be practically zero.

 
Ivan Negreshniy:

Perhaps, by a combination of your time predictors (month, week, day, hour...), the tree simply homes in on a specific BUY/SELL bar.

It's like memorizing the time of big bars and trading profitably by it on history, even though the correlation of this attribute with price movement will be practically zero.

Perhaps that is what happens, but there are only two such predictors, day of week and hour, i.e. we can get 5*14=70 groups from this feature, while the sample contains 403933 rows, i.e. about 5770 rows fall into each group; on the other hand, there are 33000 target rows, i.e. about 471 target rows per group. And if you also take into account that there are other predictors, we end up with a great many groups. It's like cutting an apple into slices, labeling the slices, and writing down those slices that have more of one trait than the others; since there are so many slices, there will be slices containing only one trait. So the question arises: how many predictors should there be for a given sample size? How big should the apple slices be?
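A back-of-the-envelope check of those numbers (the extra predictor cardinalities below are purely hypothetical):

rows, target_rows = 403_933, 33_000
groups = 5 * 14                               # weekday x hour
print(rows // groups, target_rows // groups)  # ~5770 rows and ~471 target rows per group

# add a few more hypothetical categorical predictors and watch the slices shrink
for extra_cardinality in (5, 50, 500):
    g = groups * extra_cardinality
    print(f"{g} groups -> ~{rows // g} rows, ~{target_rows // g} target rows per group")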

Well, there is a pattern in the days and hours in itself, and the influences here are chronometric factors: the opening of the trading session, the periods of trading sessions, news (economic/statistical releases, which mostly come out at the same time and day of the week).

 
Aleksey Vyazmikin:

Perhaps that is what happens, but there are only two such predictors, day of week and hour, i.e. we can get 5*14=70 groups from this feature, while the sample contains 403933 rows, i.e. about 5770 rows fall into each group; on the other hand, there are 33000 target rows, i.e. about 471 target rows per group. And if you also take into account that there are other predictors, we end up with a great many groups. It's like cutting an apple into slices, labeling the slices, and writing down those slices that have more of one trait than the others; since there are so many slices, there will be slices containing only one trait. So the question arises: how many predictors should there be for a given sample size? How big should the apple slices be?

Well, there is a pattern in the days and hours in itself, and the influences here are chronometric factors: the opening of the trading session, the periods of trading sessions, news (economic/statistical releases, which mostly come out at the same time and day of the week).

Maxim Dmitrievsky, how do you solve this problem?

What are the options? The pieces of the apple can be of different sizes.
For each NN in the ensemble, add some context and use these contexts in some controlling NN?
By context I mean, for example, a reference to some basic definition, concept or predictor, plus some data...