Machine learning in trading: theory, models, practice and algo-trading - page 3357

 

A probabilistic approach is good and right. We will always have strong noise, and the point is to look for deviations from what we would see under a random walk (SB). The noise variance alone will not be sufficient for that.

IMHO, the classification task is not well suited, because it discards a lot of information. We need something like looking at the distribution of the size of the price move in the desired direction, modelling how that distribution depends on the features, and only then building a trading system (TS) based on the shape of that distribution (if it differs from what it would be under SB).
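A rough sketch of that idea in Python (everything here is a placeholder assumption: synthetic prices, a toy one-bar feature, a 5-bar horizon, and a permutation baseline standing in for SB):

```python
# Sketch of the idea above: compare the distribution of forward price moves
# conditioned on a feature-based signal against a random-walk (SB) baseline.
# Everything here is a placeholder: synthetic prices, a toy signal, horizon = 5.
import numpy as np
from scipy import stats

def forward_move(close: np.ndarray, horizon: int = 5) -> np.ndarray:
    """Price change over the next `horizon` bars."""
    return np.roll(close, -horizon) - close

def conditional_vs_baseline(close: np.ndarray, signal: np.ndarray, horizon: int = 5):
    """Two-sample KS test: forward moves on signal bars vs. a permuted baseline."""
    fwd = forward_move(close, horizon)[:-horizon]
    sig = signal[:-horizon].astype(bool)

    conditioned = fwd[sig]                       # moves that follow the signal
    # Baseline: permuting the forward moves destroys any link with the signal,
    # which is what we would see under a random walk with the same marginals.
    rng = np.random.default_rng(0)
    baseline = rng.permutation(fwd)[:conditioned.size]

    return conditioned, baseline, stats.ks_2samp(conditioned, baseline)

# Example with pure-noise prices, so no difference should be detected:
close = 100.0 + np.cumsum(np.random.default_rng(1).normal(size=5000))
signal = np.diff(close, prepend=close[0]) > 0    # toy "feature": last bar was up
_, _, ks = conditional_vs_baseline(close, signal)
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```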

 
Maxim Dmitrievsky #:
Some new definitions again.
One last time: the classifier is calibrated because it outputs incorrect probabilities. They are meaningless in their original form. Get over it.

Couldn't stomach it.

There is no such thing as an abstract (read: reference, ideal) probability that is not tied to a random process.

No such thing.

The probability of a coin toss, and so on.

Therefore, any classifier gives a probability that characterises that particular classifier, and that probability gives us the characteristic we need - the prediction error. Another classifier will give other probabilities, with a corresponding class prediction error.

Depending on the predictors and the labels associated with them, as well as on class balancing, the problem arises of setting a threshold for dividing the probabilities into classes. The tools for this operation, called "calibration", are given above. It can also be done in a makeshift ("kolkhoz") way.

In any case, you can significantly reduce the prediction error for the probabilities given by a particular classifier, because when working with a particular classifier there are no other probabilities in nature. If you don't like the probabilities, work on the classifier or do calibration. There is no room in this particular process for "perfect" probabilities, which do not exist even in theory.

One thing that is clear is that dividing into classes by a 0.5 threshold is highly questionable and rarely works.
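For illustration only, a minimal sklearn sketch of those two operations - calibrating the classifier's scores and then choosing the class threshold from a metric instead of the default 0.5. The data is synthetic, and the model, the isotonic method and the 0.8 target precision are arbitrary assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for real features and labels.
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: calibrate the classifier's scores (method="sigmoid" = Platt, or "isotonic").
base = RandomForestClassifier(n_estimators=200, random_state=0)
model = CalibratedClassifierCV(base, method="isotonic", cv=5)
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Step 2: pick the threshold from the metric you care about, not 0.5 by default.
precision, recall, thresholds = precision_recall_curve(y_te, proba)
target_precision = 0.8                          # arbitrary target
ok = np.where(precision[:-1] >= target_precision)[0]
threshold = thresholds[ok[0]] if ok.size else 0.5
signals = proba >= threshold                    # final class decision
print(f"chosen threshold: {threshold:.3f}, signals fired: {signals.sum()} of {signals.size}")
```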

 
СанСаныч Фоменко #:

Couldn't stomach it.

The probabilities given by the classifier are meaningless. They are not probabilities. If you need probabilities, you cannot use them in that form. Don't run ahead of the locomotive and load this fact with new meanings. At least come to terms with it.
 
СанСаныч Фоменко #:

Couldn't stomach it.

There is no such thing as an abstract (read: reference, ideal) probability that is not tied to a random process.

No such thing.

The probability of a coin toss, and so on.

Therefore, any classifier gives a probability that characterises that particular classifier, and that probability gives us the characteristic we need - the prediction error. Another classifier will give other probabilities, with a corresponding class prediction error.

Depending on the predictors and the labels associated with them, as well as on class balancing, the problem arises of setting a threshold for dividing the probabilities into classes. The tools for this operation, called "calibration", are given above. It can also be done in a makeshift ("kolkhoz") way.

In any case, you can significantly reduce the prediction error for the probabilities given by a particular classifier, because when working with a particular classifier there are no other probabilities in nature. If you don't like the probabilities, work on the classifier or do calibration. There is no room in this particular process for "perfect" probabilities, which do not exist even in theory.

One thing that is clear is that dividing into classes by a 0.5 threshold is highly questionable and rarely works.

That is about common errors in mathematical statistics (matstat) that come from using the wrong probability model. For example, if the noise in a regression is actually Laplace-distributed, but we compute as if it were Gaussian, there will obviously be errors.

PS. Actually, the whole point there is to return to the probabilistic origins of ML, which, by the way, was called statistical learning in its early days (at least in the USSR).
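To make the Laplace-vs-Gaussian remark concrete, a tiny illustration: with the mean and variance matched, the Gaussian model badly understates the tail probabilities of Laplace noise (numbers are purely illustrative):

```python
# Same mean and variance, very different tails: Gaussian model vs. actual Laplace noise.
import numpy as np
from scipy import stats

sigma = 1.0
gauss = stats.norm(scale=sigma)
laplace = stats.laplace(scale=sigma / np.sqrt(2))     # Laplace with variance sigma**2

for k in (2, 3, 4):
    p_gauss = 2 * gauss.sf(k * sigma)                 # P(|noise| > k*sigma) under the Gaussian model
    p_laplace = 2 * laplace.sf(k * sigma)             # the same event under the true Laplace noise
    print(f"P(|noise| > {k} sigma): Gaussian model {p_gauss:.4%}   actual Laplace {p_laplace:.4%}")
```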

 

I have already described the example above. There is a classifier that passes the OOS, but the returns are distributed 60/40. You don't like it, you raise the decision threshold, but the situation doesn't change, and sometimes it gets even worse. You scratch your head as to why this is so.

An explanation of why this is so has been given: with real probability estimates, the situation should change.

A solution is given.
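A toy numerical illustration of that effect (synthetic numbers, not a trading model, all parameters made up): with honest probabilities the win rate among trades with p > t tracks the threshold, while with a noisy, badly scaled score raising the threshold barely moves it and only shrinks the number of trades:

```python
# Toy illustration, not a trading model: honest probabilities vs. a noisy score.
import numpy as np

rng = np.random.default_rng(7)
p_true = rng.beta(2, 2, size=50_000)                 # true per-signal win probabilities
wins = rng.random(p_true.size) < p_true              # realized outcomes (True = profit)
# A badly scaled, noisy score: barely related to the true probability.
score = np.clip(0.55 + 0.05 * (p_true - 0.5) + rng.normal(0.0, 0.08, p_true.size), 0.0, 1.0)

for name, s in (("honest probability", p_true), ("raw noisy score", score)):
    for t in (0.5, 0.6, 0.7):
        m = s > t
        rate = wins[m].mean() if m.any() else float("nan")
        print(f"{name:18s}  t={t:.1f}  trades={int(m.sum()):6d}  win rate={rate:.2f}")
```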


 
Maxim Dmitrievsky #:

I have already described the example above. There is a classifier that passes the OOS, but the returns are distributed 60/40. You don't like it, you raise the decision threshold, but the situation doesn't change, and sometimes it gets even worse. You scratch your head as to why this is so.

An explanation of why this is so has been given: with real probability estimates, the situation should change.

A solution is given.


Wasn't this obvious a long time ago?
 
Post-optimisation - no one can explain that one either, but they call it calibration! Oh, yeah.
 
Maxim Dmitrievsky #:

I have already described the example above. There is a classifier that passes the OOS, but the returns are distributed 60/40. You don't like it, you raise the decision threshold, but the situation doesn't change, and sometimes it gets even worse. You scratch your head as to why this is so.

An explanation of why this is so has been given: with real probability estimates, the situation should change.

A solution is given.


However, I would like to point out that calibration will not be a panacea, and it is not free - you need the existing classifier to have good properties. To avoid going into explanations, I will quote from your second reference, the SHAD one: "In general, it can be shown that this method works well if, for each of the true classes, the predicted probabilities are normally distributed with equal variance." This is about Platt calibration, but some conditions surely have to be met for the other methods too.

Actually, it is just like in matstat: the probabilistic properties of the model used should match the data under study.
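One rough way to check whether the quoted condition approximately holds is to compare reliability curves for sigmoid (Platt) and isotonic calibration on held-out data: if isotonic comes out noticeably better calibrated, the Platt assumption is probably violated. A sketch on synthetic data (model and parameters are arbitrary):

```python
# Compare sigmoid (Platt) vs. isotonic calibration via reliability curves.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(GradientBoostingClassifier(random_state=1), method=method, cv=5)
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
    # For a well-calibrated model frac_pos is close to mean_pred in every bin.
    print(f"{method:8s}: max reliability gap = {np.abs(frac_pos - mean_pred).max():.3f}")
```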

 
Aleksey Nikolayev #:

However, I would like to point out that calibration will not be a panacea, and it is not free - you need the existing classifier to have good properties. To avoid going into explanations, I will quote from your second reference, the SHAD one: "In general, it can be shown that this method works well if, for each of the true classes, the predicted probabilities are normally distributed with equal variance." This is about Platt calibration, but some conditions surely have to be met for the other methods too.

Actually, it is just like in matstat: the probabilistic properties of the model used should match the data under study.

Of course, this is just a way to make the outputs probabilistic, because using raw model probabilities is useless.

 
Discussed a long time ago; it ended up as over-optimisation of the function :)