Machine learning in trading: theory, models, practice and algo-trading - page 2843

 
СанСаныч Фоменко #:

I see. You have a superficial familiarity with machine learning models.

The first element of the chain is preprocessing, which takes 50% to 70% of the labour. This is where future success is determined.

The second element of the chain is training the model on the training set.

The third element of the chain is running the trained model on the test set. If the model's performance on these sets differs by more than about a third, the model is overfitted. This happens regularly, if not constantly. An overfitted model is a model that is too accurate on the training data. Sorry, these are the basics.
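The check described above can be sketched in a few lines. This is a minimal illustration, assuming "differs by a third" means the relative difference between training and test error; the threshold and error values are hypothetical.

```python
# A sketch of the overfitting check described above: compare the model's
# error on the training set with its error on the test set, and flag the
# model when the relative gap exceeds roughly one third.
def looks_overfitted(train_error, test_error, tolerance=1/3):
    if train_error <= 0:
        return True  # a perfect training fit is itself suspicious
    return abs(test_error - train_error) / train_error > tolerance

print(looks_overfitted(0.10, 0.11))  # similar errors: not flagged
print(looks_overfitted(0.10, 0.30))  # test error much worse: flagged
```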


We seem to have different ideas about the basics. Sorry. And we seem to speak different languages.
 
СанСаныч Фоменко #:

What's that about?

Earlier I wrote

Forum on trading, automated trading systems and testing trading strategies

Machine learning in trading: theory, models, practice and algo-trading

Aleksey Vyazmikin, 2022.12.08:44

Could you send me your sample? We share the same view of the problem of poor model training; I would like to compare how much better your selection method is than mine, and whether mine works on your sample.


You replied that it was a good idea, but you erased the message.

 
Andrey Dik #:

We seem to have different ideas about the basics. Sorry. And we seem to speak different languages.

As far as I understand now, I am discussing machine learning models and the optimisation that is built into these models. That's where you started, with neural networks.

You are discussing optimisation per se, which to me is not relevant in machine learning.


Good luck to you in your search for a global optimum.

 
СанСаныч Фоменко #:

As far as I understand now, I am discussing machine learning models and the optimisation that is built into those models. That is where you started, with neural networks.

You are discussing optimisation as such, which to me is not relevant in machine learning.


Good luck in your search for the global optimum.


I don't need the global optimum that you are looking for and failing to reach with your overfitted models)))
Look at training from a slightly different perspective: you have built a model, predictors, shmudictors and other cool stuff, trained the model, and then it does not work out of sample (OOS). There you go, it's the optimisation algorithm's fault again!
I state responsibly: it is not the optimisation algorithm (AO) that is at fault, it is the bad model.
I recommend a random AO with sorting. Your models will always be a little drunk, a little undertrained. Guaranteed.
 
СанСаныч Фоменко #:

I have arrived at the most important idea: there is an undeniable connection between optimisation and model overfitting. The model should always be left somewhat "coarse", and certainly no global optima are needed.

Simply rejecting the global optimum will obviously not avoid overfitting. Overfitting is excessive adaptation of the model to a particular sample at the expense of the underlying regularity. It occurs because of the extremely high flexibility of almost all ML algorithms. The standard way to deal with it is therefore to add a penalty for excessive model flexibility to the optimisation criterion (lasso regression, for example). One can also simply restrict the model's flexibility directly, but mathematically that is just a stiffer penalty.

This, by the way, is a good example of why it should be possible to create custom criteria.
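The lasso penalty mentioned above can be illustrated in its simplest setting. This is a sketch under the assumption of an orthonormal design matrix, where the lasso solution has a known closed form: each least-squares coefficient is soft-thresholded towards zero; the coefficient values are hypothetical.

```python
# A minimal sketch of the L1 (lasso) penalty: with an orthonormal design,
# the lasso solution soft-thresholds each least-squares coefficient, so
# small "noise" coefficients become exactly zero and flexibility drops.
import numpy as np

def soft_threshold(w, lam):
    # Shrink towards zero; coefficients smaller than lam become exactly 0.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

rng = np.random.default_rng(1)
# Hypothetical least-squares coefficients: two real effects plus 18 small
# noise coefficients that merely fit the particular sample.
w_ols = np.concatenate([[2.5, -1.8], rng.normal(scale=0.1, size=18)])

w_lasso = soft_threshold(w_ols, lam=0.5)

print(np.count_nonzero(w_ols), np.count_nonzero(w_lasso))
```

The penalised model keeps only the two genuine effects; that is the "coarser" model the discussion is about.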

Preferring a plateau to the global extremum is a somewhat different matter. It is no longer about over-fitting to a particular sample at the expense of an existing, unchanging dependence. Here the point is that, because of the non-stationarity of prices (which you wrote about at first), the dependence itself changes, so we need to look for stable (robust) parameter values that remain good enough even when the dependence changes slightly.

Don't mix everything into one pile.
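The plateau-versus-peak distinction can be shown on a toy criterion surface. This is an assumption-laden illustration: the surface, its shapes, and the neighbourhood size are all invented for the sketch; the idea is that averaging the criterion over a neighbourhood of each parameter favours the broad, robust plateau over a sharp spike.

```python
# A toy sketch: the criterion over one model parameter has a narrow
# global spike and a broad plateau. Smoothing the criterion over a
# neighbourhood picks the robust plateau parameter instead of the spike.
import numpy as np

params = np.linspace(0, 10, 1001)  # grid step 0.01
# Broad plateau around p=3, narrow spike (global maximum) around p=8.
criterion = 0.9 * np.exp(-((params - 3) / 1.5) ** 2) \
          + 1.0 * np.exp(-((params - 8) / 0.05) ** 2)

best_raw = params[np.argmax(criterion)]  # the sharp global maximum

# Average the criterion over a +/-0.25 neighbourhood of each parameter.
k = 25  # half-window in grid steps
kernel = np.ones(2 * k + 1) / (2 * k + 1)
smoothed = np.convolve(criterion, kernel, mode="same")
best_robust = params[np.argmax(smoothed)]  # lands on the plateau

print(best_raw, best_robust)
```

A small shift of the dependence moves the spike's payoff to almost nothing, while the plateau parameter stays good, which is exactly the robustness argument above.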

SanSanych Fomenko #:

When I am looking for an acceptable list of predictors, it is optimisation in the trouser-shopping sense. But the meaning is quite different: I am trying to avoid "rubbish in, rubbish out". There is a qualitative difference here from trying to find the "right" algorithm that locates the global optimum. No global optimum will produce a profitable trading system (TS) from rubbish.

The choice of trousers is an example of multi-criteria optimisation: the choice is made by length, size, colour, fabric, price, brand, etc. Clearly no Pareto surface is built; instead, all the criteria are implicitly blended in the buyer's head into one compromise. The same thing happens with feature selection. The important difference from trousers is that here an explicit formalisation of the trade-off criterion is useful, since constant reliance on intuition leads to unpredictable failures.
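The explicit formalisation argued for above can be as simple as a weighted scalarisation: collapse several criteria into one score with declared weights instead of blending them in one's head. The candidates, scores, and weights below are hypothetical.

```python
# A minimal sketch of scalarising a multi-criteria choice: candidate
# predictor sets are rated on two criteria (in-sample fit and model
# simplicity), and an explicit weighted score picks the compromise.

# Hypothetical candidates: (name, fit_quality, number_of_features)
candidates = [
    ("all_200_features", 0.95, 200),
    ("top_20_features", 0.90, 20),
    ("top_5_features", 0.82, 5),
]

def score(fit, n_features, w_fit=1.0, w_simplicity=0.002):
    # Explicit compromise: reward fit, penalise complexity.
    return w_fit * fit - w_simplicity * n_features

best = max(candidates, key=lambda c: score(c[1], c[2]))
print(best[0])
```

The weights make the trade-off inspectable and repeatable, which is precisely what intuition does not give.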

 

If the model works, it has settings at which it performs well on unknown data. It is also likely to have settings that do not give satisfactory performance out of sample; some call this case overtraining. In fact, the evaluation criterion has simply been chosen incorrectly. A correct criterion will give the purple curve for a working model. The problem then comes down to maximising (finding the global maximum of) the correct evaluation criterion: if we find the global maximum of the correct criterion, we get the purple curve.

And vice versa: if the criterion is chosen incorrectly, then maximising that incorrect criterion gives the red curve.

And this all assumes the model works; even so, we see how important the evaluation criterion is.

But if the model does not work, then nothing will help, neither criterion nor optimisation.

So: model -> criterion -> optimisation of the criterion.

 
Aleksey Nikolayev #:

The choice of trousers is an example of multi-criteria optimisation: the choice is made by length, size, colour, fabric, price, brand, etc. Clearly no Pareto surface is built; instead, all the criteria are implicitly blended in the buyer's head into one compromise. The same thing happens with feature selection. The important difference from trousers is that here an explicit formalisation of the trade-off criterion is useful, since constant reliance on intuition leads to unpredictable failures.

Trouser selection is a good example of criterion-driven optimisation: not every pair of good trousers fits everyone. User-driven optimisation gives the best, best-fitting trousers (the global maximum of the criterion).

trousers -> trouser evaluation criterion -> selection (optimisation of the trouser evaluation criterion)

 
Aleksey Nikolayev #:

A simple rejection of the global extremum will obviously not avoid overfitting. Overfitting is excessive adaptation of the model to a particular sample at the expense of the underlying regularity. It occurs because of the extremely high flexibility of almost all ML algorithms. The standard way to deal with it is therefore to add a penalty for excessive model flexibility to the optimisation criterion (lasso regression, for example). One can simply restrict the model's flexibility directly, but mathematically that is just a stricter penalty.

This, by the way, is a good example of why it should be possible to create custom criteria.

Preferring a plateau to the global extremum is a somewhat different matter. It is no longer about over-fitting to a particular sample at the expense of an existing, unchanging dependence. Here the point is that, because of the non-stationarity of prices (which you wrote about at first), the dependence itself changes, and we need to look for stable (robust) parameter values that remain good enough even when the dependence changes slightly.

Don't mix everything in one heap.

The choice of trousers is an example of multi-criteria optimisation: the choice is made by length, size, colour, fabric, price, brand, etc. Clearly no Pareto surface is built; instead, all the criteria are implicitly blended in the buyer's head into one compromise. The same thing happens with feature selection. The important difference from trousers is that here an explicit formalisation of the trade-off criterion is useful, since constant reliance on intuition leads to unpredictable failures.

Nice to see a post by someone on topic!

 
Andrey Dik #:

If the model works, it has settings at which it performs well on unknown data. It is also likely to have settings that do not give satisfactory performance out of sample; some call this case overtraining. In fact, the evaluation criterion has simply been chosen incorrectly. A correct criterion will give the purple curve for a working model. The problem then comes down to maximising (finding the global maximum of) the correct evaluation criterion: if we find the global maximum of the correct criterion, we get the purple curve.

And vice versa: if the criterion is chosen incorrectly, then maximising that incorrect criterion gives the red curve.

And this all assumes the model works; even so, we see how important the evaluation criterion is.

But if the model does not work, then nothing will help, neither criterion nor optimisation.

So: model -> criterion -> optimisation of the criterion.

I distinguish between two types of criteria: integral and derived.

Examples of integral criteria: final balance, profit factor and others. These criteria give a summary assessment without taking into account the intermediate events of the process (in trading, the individual trades). For example, take two results with the same final balance of 10000: in one case 1000 profitable trades, in the other 999 losing trades and 1 profitable one. Obviously the integral criterion is the same in both cases, but the way the result was achieved is radically different. That is why people often complain about integral criteria: overfitting creeps in, the market is non-stationary, and so on.
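The example above is easy to make concrete. The trade results below are hypothetical numbers chosen only to reproduce the 10000 figure from the text.

```python
# A toy illustration: an integral criterion such as the final balance
# cannot distinguish two radically different trade histories.
def final_balance(trades):
    return sum(trades)

steady_wins = [10] * 1000             # 1000 profitable trades of 10 each
one_lucky   = [-10] * 999 + [19990]   # 999 losses, then one huge win

print(final_balance(steady_wins), final_balance(one_lucky))
```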

An example of a derived criterion is the standard deviation of the balance curve from the straight line running from its starting point to its end point. Unlike integral criteria, such criteria take the intermediate results of the process into account, which makes it possible to describe the requirements in the criterion unambiguously.
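The derived criterion just described can be sketched directly, under the assumption that the balance curve is given as the cumulative balance after each trade; the two example curves are hypothetical.

```python
# A sketch of the derived criterion above: the standard deviation of the
# balance curve from the straight line joining its start and end points.
import numpy as np

def linearity_criterion(balance):
    balance = np.asarray(balance, dtype=float)
    # Straight reference line from the first to the last balance value.
    line = np.linspace(balance[0], balance[-1], balance.size)
    return np.std(balance - line)

# Same final balance, very different paths:
steady = [0, 10, 20, 30, 40, 50]      # smooth, near-linear growth
lumpy  = [0, -10, -20, -30, -40, 50]  # one lucky trade at the end

print(linearity_criterion(steady), linearity_criterion(lumpy))
```

Unlike the final balance, this criterion separates the two histories: the smooth curve scores near zero deviation, the lucky one scores much worse.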

Integral criteria also have their place, as they are applicable to certain types of systems (for example, where the number of trades per unit of time is practically constant).

But for both integral and derived criteria, the global optimum must be reached. The choice of criterion determines the robustness of the system in the future.

If the researcher has the idea that it may be necessary to search not for the global maximum but for something "in the middle", then the criteria for evaluating the model must immediately be reconsidered.

 
I have a question:
Why quote someone's huge half-page text just to add two words of your own?
I will never understand these people...