Machine learning in trading: theory, models, practice and algo-trading - page 1532

 
Aleksey Nikolayev:

Or something like a portfolio of leaf-systems with a rolling recalculation.

What exactly should I recalculate?

 
Aleksey Nikolayev:

Or something like a portfolio of leaf-systems with a rolling recalculation.

Then it's multiclass. Make a 2nd model that will choose which leaf is best to trade at the moment. The advantages are not obvious, and it is harder to make.

 
Maxim Dmitrievsky:

Well, to begin with the theory: for example, what is the point of selecting models separately for selling and for buying?

Everything that is not a buy is a sell, and vice versa.

My ideology is to improve the basic trading strategy, and the strategy is trend-following; therefore it does not provide for reversals out of nowhere, i.e. market entries from a flat, and the "do not enter" class essentially sifts out false breakouts from the flat.

Further, I experimented with models for closing profitable positions near long ZigZag peaks, and the result was not satisfactory: either the predictive ability is lower there, or my predictors do not work well there, or both, so I do not use a reversal strategy. On the contrary, I think it is better to train two different models here.

Maxim Dmitrievsky:

When you can just filter entries through a higher threshold. The "do not trade" class can be given excessive weight by the model, so the model error decreases while the predictive (generalization) ability falls overall.

I use almost this approach, binary classification into trade/do-not-trade where the decision is set by a model threshold, in my experiments with CatBoost, but the trouble is that the model is built like a vacuum cleaner, sucking up everything good and bad, and the output is a fast model with a low number of entries.
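The threshold-based trade/do-not-trade filtering described here can be sketched in plain Python; the probabilities and the 0.7 cutoff below are made-up illustration values, not anyone's actual settings:

```python
# Filter model signals by a probability threshold: only probabilities above
# the cutoff become actual trade entries; the rest are "do not trade".
def filter_by_threshold(probs, threshold=0.7):
    """Return 1 (trade) where the predicted probability exceeds threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

# Hypothetical model outputs for six candidate entries.
probs = [0.55, 0.92, 0.71, 0.40, 0.83, 0.66]
signals = filter_by_threshold(probs, threshold=0.7)
print(signals)  # -> [0, 1, 1, 0, 1, 0]
```

Raising the threshold trades fewer, higher-confidence entries, which is exactly the "few entries" effect mentioned above.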

Maxim Dmitrievsky:

The point of the second model is that the 1st model will have errors of the first and second kind: false positives and false negatives. We are interested in removing them. To do this, feed the same features to the input of the 2nd model, and as its target take the outcome of the 1st model's trades, where 0 means the trade was profitable and 1 means the trade was losing. Train the second classifier and trade only when it outputs 0, i.e. it filters the signals of the 1st model. Losing trades will almost disappear on the train set; you then have to check it on the test set - that is one.

You can train the second model not only on the train set but also capture part of the out-of-sample data; then it will correct trades on new data - that is two. And then the tests.
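The two-model scheme described above can be sketched as follows. The toy data and the simple rules standing in for trained classifiers are invented for illustration only:

```python
# Two-model filtering: model 1 generates entry signals; model 2 is trained on
# the same features with target 0 = trade was profitable, 1 = trade was losing,
# and we trade only when model 2 predicts 0.

def first_model(row):
    """Toy signal model: enter when the 'momentum' feature is positive."""
    return 1 if row["momentum"] > 0 else 0

def second_model(row):
    """Toy error filter: predicts 1 (losing trade) when volatility is high.
    In practice this would be a trained classifier, not a fixed rule."""
    return 1 if row["volatility"] > 0.5 else 0

rows = [
    {"momentum": 0.8, "volatility": 0.2},   # signal, filter allows the trade
    {"momentum": 0.3, "volatility": 0.9},   # signal, filter rejects the trade
    {"momentum": -0.5, "volatility": 0.1},  # no signal at all
]

# Trade only where model 1 says "enter" AND model 2 predicts a profitable trade (0).
decisions = [first_model(r) == 1 and second_model(r) == 0 for r in rows]
print(decisions)  # -> [True, False, False]
```

The key design point is that the second model's target comes from the first model's trade outcomes, so it learns specifically where the first model is wrong.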

That is exactly what I did, only I used not the separate predictors but leaves as predictors, where 1 means a signal from the leaf in the predictor, and the target holds the correct classification answers. Of course, my method does not allow finding new combinations of predictors, but it does allow looking for relationships between the existing combinations.

I will try your method too, thanks. However, in this concept it can probably be implemented only as a CatBoost model, to save time, but I am afraid it will again cut everything down heavily.
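Using leaf activations as binary predictors for the second model, as described above, amounts to building a 0/1 feature matrix. The leaf rules below are invented for illustration:

```python
# Each "leaf" is a rule that fires (1) or not (0) on a sample; the resulting
# 0/1 vector becomes the feature row for the second (correcting) model.
leaves = [
    lambda x: x["rsi"] < 30,        # hypothetical leaf rule 1
    lambda x: x["trend"] > 0,       # hypothetical leaf rule 2
    lambda x: x["volume"] > 1_000,  # hypothetical leaf rule 3
]

def leaf_features(sample):
    """Encode a sample as binary leaf-activation features."""
    return [1 if rule(sample) else 0 for rule in leaves]

sample = {"rsi": 25, "trend": 0.4, "volume": 500}
print(leaf_features(sample))  # -> [1, 1, 0]
```

The second model then trains on these rows against the correct class answers, looking for relationships between the existing leaf combinations rather than between the raw predictors.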

 
Maxim Dmitrievsky:

Then it's multiclass. Make a 2nd model that will choose which leaf is best to trade at the moment. The advantages are not obvious, and it is harder to make.

Nah, there are too many leaves; such a classification will not work...

 
Aleksey Vyazmikin:

My ideology is to improve the basic trading strategy, and the strategy is trend-following; therefore it does not provide for reversals out of nowhere, i.e. market entries from a flat, and the "do not enter" class essentially sifts out false breakouts from the flat.

There should be no reversals as such; the 2nd model will simply filter out the false signals. Well, it depends on the implementation and on what you want.

Then it is strange why there are so many losing trades, or is that from the OOS picture? Yes, to use the 2nd model there should be many trades so that there is something to filter; you can even add trades artificially for that (oversampling).
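The oversampling mentioned here, artificially adding trades so the 2nd model has enough material to filter, can be done in its simplest form by duplicating randomly chosen rows; the trade records below are made up:

```python
import random

def oversample(rows, target_size, seed=0):
    """Duplicate randomly chosen rows until the dataset reaches target_size."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    extra = [rng.choice(rows) for _ in range(target_size - len(rows))]
    return rows + extra

trades = [("buy", 1), ("sell", 0), ("buy", 0)]  # made-up trade records
augmented = oversample(trades, target_size=8)
print(len(augmented))  # -> 8
```

Plain duplication is the crudest variant; class-aware resampling of only the minority outcome is the more common refinement.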

 
Maxim Dmitrievsky:

There should be no reversals as such; the 2nd model will simply filter out the false signals. Well, it depends on the implementation and on what you want.

Then it is strange why there are so many losing trades, or is that from the OOS picture? Yes, to use the 2nd model there should be many trades so that there is something to filter; you can even add trades artificially for that (oversampling).

The screenshots above were tests on data that was not involved in training the model in any way.

Here are the training results. In the screenshots, the first is my combination of leaves on the data used for training (only a 1/5 interval was taken).

Notice that profitable long trades are 52.86%.

Then we added a tree built on the answers of the other leaves.

And the results improved: profitable long trades increased to 79.56%.

To summarize, the approach generally works, but the result in real application is not that great. Why? Probably some of the leaves on which training took place do not contain stable relationships, or those relationships are few. Each leaf has a response rate of about 1-3% in the sample.

 
Another option, I think, is to try regression instead of classification and isolate exactly the high-yield combinations of leaves; maybe the effect in monetary terms will be better.
 
Aleksey Vyazmikin:

To summarize, the approach generally works, but the result in real application is not that great. Why? Probably some of the leaves on which training took place do not contain stable relationships, or those relationships are few. Each leaf has a response rate of about 1-3% in the sample.

Well, if you just want to improve what you have, there is the folds option. Break the sample into several chunks and train the 1st model on some chunks and the 2nd on the rest. I have done up to 500 folds. You can do fewer, of course. It gives some improvement.
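Splitting the sample into chunks so that the 1st model trains on some folds and the 2nd on the rest could look like this; the fold count and the alternating assignment are illustrative choices, not a prescription:

```python
# Split a sample into n_folds contiguous chunks, then give alternating chunks
# to the 1st (main) model and the 2nd (correcting) model.
def split_folds(samples, n_folds):
    """Return a list of n_folds contiguous chunks covering the whole sample."""
    size = len(samples) // n_folds
    folds = [samples[i * size:(i + 1) * size] for i in range(n_folds - 1)]
    folds.append(samples[(n_folds - 1) * size:])  # last fold takes the remainder
    return folds

samples = list(range(14_000))  # stand-in for the ~14,000-row sample
folds = split_folds(samples, 10)
train_1 = [x for i, f in enumerate(folds) if i % 2 == 0 for x in f]  # main model
train_2 = [x for i, f in enumerate(folds) if i % 2 == 1 for x in f]  # correcting model
print(len(folds))  # -> 10
```

Contiguous chunks (rather than shuffled rows) keep the time ordering intact, which matters for market data where shuffling would leak future information into training.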

 
Maxim Dmitrievsky:

Well, if you just want to improve what you have, there is the folds option. Break the sample into several chunks and train the 1st model on some chunks and the 2nd on the rest. I have done up to 500 folds. You can do fewer, of course. It gives some improvement.

Split the sample into parts and train multiple models, or what?

Though my sample is only about 14,000 rows...

 
Aleksey Vyazmikin:

Split the sample into parts and train multiple models, or what?

Though my sample is only about 14,000 rows...

I wrote above: the main model on one half of the sample, the second, correcting model on the other half.

Then 5-10 folds will be enough; you can do more.