Machine learning in trading: theory, models, practice and algo-trading - page 1301

 
Aleksey Vyazmikin:

Doesn't the way a model is evaluated affect its performance when it is applied to an unfamiliar sample?

What are you doing, building a bunch of different models and checking which one works best?

What does this have to do with "leaves" then - selecting the best leaves and so on?

I'm just trying to understand what you're writing about periodically.

Or does each row correspond to one leaf?
 
Maxim Dmitrievsky:

What are you doing, building a bunch of different models and checking which one works best?

What does this have to do with "leaves" then - selecting the best leaves and so on?

I'm just trying to understand what you're writing periodically.

It seems the conversation was about automatic model selection. I explained that interesting models can be selected in two ways: through a known criterion and formula (as I do now - the last three columns are filled in for each sample, such a table is built for every sample, and if all three filter columns agree, the model is selected), or through machine learning, when you understand what you want from the model on an independent sample but don't know how to achieve it. In the second approach the model's various metrics become the predictors, and a model is trained on them which then, by means of ML, selects suitable models from similar data. I ran a similar training experiment last year; it gave positive results - the precision of the selection was good, but the recall was not - so I decided there was not enough diversity in the sample and put the work aside until better times. Now many samples are generated in a different way, so I can return to this work. The main idea is not to pick the best out of the available pool, but to select the best by absolute criteria, whether via ML or a fixed metric.

The leaves come later - that is work with the models that have already been selected.

Each row is a separate model.
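A minimal sketch of the first route - the fixed criterion with three filter columns - assuming the per-model table is a CSV and the filter columns are boolean flags (the file and column names here are made up for illustration, they are not from the thread):

```python
import pandas as pd

# Hypothetical table: one row per trained model, metric columns per sample,
# plus three filter columns filled in by fixed criteria (names are assumed).
models = pd.read_csv("model_metrics.csv")
filter_cols = ["filter_profit", "filter_drawdown", "filter_stability"]

# A model is kept only when all three filter columns agree.
selected = models[models[filter_cols].all(axis=1)]
print(f"{len(selected)} of {len(models)} models passed all three filters")
```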
 
Aleksey Vyazmikin:

It seems the conversation was about automatic model selection. I explained that interesting models can be selected in two ways: through a known criterion and formula (as I do now - the last three columns are filled in for each sample, such a table is built for every sample, and if all three filter columns agree, the model is selected), or through machine learning, when you understand what you want from the model on an independent sample but don't know how to achieve it. In the second approach the model's various metrics become the predictors, and a model is trained on them which then, by means of ML, selects suitable models from similar data. I ran a similar training experiment last year; it gave positive results - the precision of the selection was good, but the recall was not - so I decided there was not enough diversity in the sample and put the work aside until better times. Now many samples are generated in a different way, so I can return to this work. The main idea is not to pick the best out of the available pool, but to select the best by absolute criteria, whether via ML or a fixed metric.

The leaves come later - that is work with the models that have already been selected.

So you then take n models (as in the file), feed their metrics in as predictors for the neural network, and then what comes out?

Some estimates based on experience? Like, a model with these metrics will work, but one with those won't?

And then you filter new models through this thing? So the neural network essentially selects the ML models by itself?

 
Maxim Dmitrievsky:

So you then take n models (as in the file), feed their metrics in as predictors for the neural network, and then what comes out?

Some estimates based on experience? Like, a model with these metrics will work, but one with those won't?

And then you filter new models through this thing? So the neural network essentially selects the ML models by itself?

When I experimented, I took similar metrics on the test sample, and as the target I put the result on the sample that was independent of training. The targets were profit and drawdown (separately for buy and sell trades) and something else from the model's own metrics, I don't remember exactly. Now I would also need to add the metrics from the training sample to the test-sample data (at the time I didn't know that CatBoost's results there can differ significantly), and I still need to experiment with the target.

The resulting model was then fed the results of other samples of models; the main outcome was that it filtered out unprofitable models well.
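As a rough sketch of that second, meta-model route (not the actual code from the thread - the file layout, column names and CatBoost parameters are all assumptions): one row per candidate model, its training- and test-sample metrics as predictors, and a 0/1 label for whether it stayed profitable on the independent sample.

```python
import pandas as pd
from catboost import CatBoostClassifier

# One row per candidate model; metrics measured on the training and test
# samples are the predictors, the label says whether the model remained
# profitable on the independent sample. Column names are illustrative.
data = pd.read_csv("candidate_models.csv")
feature_cols = [c for c in data.columns if c.startswith(("train_", "test_"))]

meta = CatBoostClassifier(iterations=300, depth=4, learning_rate=0.03, verbose=False)
meta.fit(data[feature_cols], data["profitable_on_independent"])

# Later, a fresh pool of models is filtered by the meta-model's verdict.
new_pool = pd.read_csv("new_models.csv")
keep = new_pool[meta.predict(new_pool[feature_cols]) == 1]
```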
 
Aleksey Vyazmikin:

When I experimented, I took similar metrics on the test sample, and as the target I put the result on the sample that was independent of training. The targets were profit and drawdown (separately for buy and sell trades) and something else from the model's own metrics, I don't remember exactly. Now I would also need to add the metrics from the training sample to the test-sample data (at the time I didn't know that CatBoost's results there can differ significantly), and I still need to experiment with the target.

That's a very strange, ornate solution; I've never seen anything like it and find it hard to say anything about it.

But if it works - good.
 
Maxim Dmitrievsky:

That's a very strange, ornate solution; I've never seen anything like it and find it hard to say anything about it.

But if it works - good.

The idea is that from the structure of the model and its behavior on the test and training samples you can form certain expectations about how it will behave in real work.

This direction is very interesting, but it requires time and resources. On the other hand, it could be developed collectively here, with predictors exchanged openly.

If nothing can be said from the model about its future performance, then all this ML is a waste of time - it comes down to chance...

 
Aleksey Vyazmikin:

The idea is that from the structure of the model and its behavior on the test and training samples you can form certain expectations about how it will behave in real work.

This direction is very interesting, but it requires time and resources. On the other hand, it could be developed collectively here, with predictors exchanged openly.

If nothing can be said from the model about its future performance, then all this ML is a waste of time - it comes down to chance...

Over time the scatter of results increases; this has to be taken into account. Only if the model breaks down immediately on new trades is it pure curve fitting; otherwise you can try to improve it. The simplest way to improve it is regularization (the gradient step, i.e. the learning rate, in CatBoost) or simply under-training it.

Look at how people trade - all sorts of martingale stuff. ML already gives some advantage over that.

I'm not writing about complex Bayesian-type models now, because I haven't figured out how to work with them myself - there is still a lot to learn and to code.
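For reference, the "gradient step" mentioned above is CatBoost's learning_rate, which together with the L2 term on leaf values is the most direct regularization knob; the parameter values below are only an example, not a recommendation from the thread.

```python
from catboost import CatBoostClassifier

# Smaller learning_rate ("gradient step") and an explicit L2 penalty are the
# simplest regularization knobs in CatBoost; parameter values are examples.
model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.01,  # smaller step: each new tree corrects less aggressively
    l2_leaf_reg=6,       # L2 regularization on leaf values
    depth=4,
    verbose=False,
)
# model.fit(X_train, y_train, eval_set=(X_test, y_test), early_stopping_rounds=50)
```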
 
Maxim Dmitrievsky:

Over time the scatter of results increases; this has to be taken into account. Only if the model breaks down immediately on new trades is it pure curve fitting; otherwise you can try to improve it.


Yesterday I showed that CatBoost forms noise in the leaves (of its binary trees), which can be removed so that the model improves. I experimented a bit more in this direction, increasing the filtering, and found that past a certain threshold something paradoxical happens: improvement stops on the independent sample but continues on the test and training samples. That is, in reality the model keeps working (on the sample independent of training) by inertia, on links with low weight - in effect it is just fitting - and the question is whether the weights are distributed incorrectly or the model is over-trained and happens to work on white noise (well, not exactly noise, but on the less significant indicators of the binary trees). I think it is also worth looking at where these relationships came from and assessing their significance on a short exam sample.
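One way to experiment with that kind of leaf filtering is to export the model to CatBoost's JSON format, zero out leaf values below some threshold, and load the model back. A rough sketch (the 0.01 threshold is arbitrary, the file names are made up, and multiclass models lay out leaf_values differently):

```python
import json
from catboost import CatBoostClassifier

THRESHOLD = 0.01  # arbitrary cut-off for "noise" leaf values

model = CatBoostClassifier()
model.load_model("trained.cbm")
model.save_model("trained.json", format="json")

with open("trained.json") as f:
    dump = json.load(f)

# Zero out small leaf values in every oblivious tree (binary-classification layout).
zeroed = 0
for tree in dump["oblivious_trees"]:
    values = tree["leaf_values"]
    zeroed += sum(1 for v in values if 0 < abs(v) < THRESHOLD)
    tree["leaf_values"] = [v if abs(v) >= THRESHOLD else 0.0 for v in values]

with open("pruned.json", "w") as f:
    json.dump(dump, f)

pruned = CatBoostClassifier()
pruned.load_model("pruned.json", format="json")
print(f"zeroed {zeroed} leaf values; compare `model` and `pruned` on each sample")
```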

 
Aleksey Vyazmikin:

Yesterday I showed that CatBoost forms noise in the leaves (of its binary trees), which can be removed so that the model improves. I experimented a bit more in this direction, increasing the filtering, and found that past a certain threshold something paradoxical happens: improvement stops on the independent sample but continues on the test and training samples. That is, in reality the model keeps working (on the sample independent of training) by inertia, on links with low weight - in effect it is just fitting - and the question is whether the weights are distributed incorrectly or the model is over-trained and happens to work on white noise (well, not exactly noise, but on the less significant indicators of the binary trees). I think it is also worth looking at where these relationships came from and assessing their significance on a short exam sample.

Whichever way you dig, you will find illusory "regularities" everywhere - they can be found in any phenomenon.

What pleases me most is the large number of "predictors". Where would they even come from in the price quotes? 90% of what is there is garbage.

 
Maxim Dmitrievsky:

I have no idea, I don't get into trees and leaves, and I don't intend to... everything can be done at the level of the model itself.

Whichever way you dig, you will find illusory "regularities" everywhere - they can be found in any phenomenon.

So just work in the known ways.

And manual tuning is exactly what inspires me - I've lost faith in passive magic.

I don't know the exact algorithm for the leaf weights, but I think it depends on the sequence in which the links are found, not just on the links themselves: if a new tree in the boosting corrects an error, its weight is assigned from the delta of that error correction, while the new link may be more valuable than the correction itself. Ideally the links and their weights should be double-checked, along with the number of binary trees involved in each decision - if a dozen trees together give a probability of 0.5, it may be a weak link... On the other hand, the size of the tree itself has to be taken into account (I am using depth 4 now, precisely to get short rules in the leaves). This is just a thought, it doesn't need an answer...
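As an illustration of that last thought (a sketch, not the method from the thread): CatBoost's predict() accepts ntree_start/ntree_end, so the raw contribution of each tree to a given prediction can be pulled out and decisions that rest on many tiny contributions flagged; the thresholds below are arbitrary.

```python
import numpy as np
from catboost import CatBoostClassifier

def per_tree_contributions(model: CatBoostClassifier, X) -> np.ndarray:
    """Raw (log-odds) contribution of each tree to each row's prediction."""
    return np.column_stack([
        model.predict(X, prediction_type="RawFormulaVal",
                      ntree_start=i, ntree_end=i + 1)
        for i in range(model.tree_count_)
    ])

def weak_decisions(contribs: np.ndarray, small=0.05, near_zero=0.1) -> np.ndarray:
    """Flag rows where almost all trees contribute little and the total raw
    score stays near zero (roughly a 0.5 probability, ignoring the bias term)."""
    mostly_small = (np.abs(contribs) < small).mean(axis=1) > 0.9
    return mostly_small & (np.abs(contribs.sum(axis=1)) < near_zero)
```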
