Machine learning in trading: theory, models, practice and algo-trading - page 3359

 

Classifiers such as support vector machines and decision trees have a predict_proba function because they can provide class-probability estimates based on their internal features. However, these probability estimates may not be entirely accurate and may not reflect the classifier's real confidence.


For example, for a support vector machine, the predict_proba function may return probability estimates based on the distance to the separating hyperplane, but these values can be skewed by peculiarities of the method itself.


For decision trees, the predict_proba function can compute class probabilities from the number of objects of each class in the leaf nodes, but these probabilities may not be entirely accurate because of the tree's structure.


Thus, although these classifiers have a predict_proba function, the probabilities they provide may be less reliable than those from methods built on a probabilistic model, such as a naive Bayes classifier or logistic regression.
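For context, a minimal sketch (scikit-learn on synthetic data, not tied to any model in this thread) that puts the raw predict_proba of an SVC and a decision tree next to the same estimators wrapped in CalibratedClassifierCV:

# Sketch only, assuming scikit-learn and synthetic data: raw predict_proba
# vs. the same estimator wrapped in CalibratedClassifierCV, scored with the
# Brier score (lower is better calibrated).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, base in [("SVC", SVC(probability=True, random_state=0)),
                   ("Tree", DecisionTreeClassifier(max_depth=6, random_state=0))]:
    raw = base.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # wrap a fresh clone of the same estimator in Platt (sigmoid) calibration
    cal = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X_tr, y_tr)
    cal_p = cal.predict_proba(X_te)[:, 1]
    print(name,
          "Brier raw:", round(brier_score_loss(y_te, raw), 4),
          "Brier calibrated:", round(brier_score_loss(y_te, cal_p), 4))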

 

I present a small experiment for <removed by moderator>.

I trained some model (it doesn't matter which one); without calibration its properties do not improve when the threshold is increased: there are fewer deals, and the profit factor does not grow.

I calibrated it with a method I had available and ran it with different thresholds. Calibration was done on data after 2015; everything before that is OOS.

The calibration method is custom; I came up with it myself. Later I will compare it with the well-known ones, since there is a small hitch with exporting them to MT5, and then I will decide.

[Result charts at thresholds 0.5, 0.6, and 0.7]

A simple example showing that calibration of even initially weak models gives some result.

QED.
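The threshold sweep itself is trivial; a hedged sketch of what "ran with different thresholds" could look like on calibrated outputs (the custom calibration method is not shown, and calibrated_proba and returns are placeholders, not data from this thread):

# Sketch only: apply increasing probability thresholds to calibrated outputs
# and count how many trades survive each cut.
import numpy as np

def threshold_report(calibrated_proba, returns, thresholds=(0.5, 0.6, 0.7)):
    """For each threshold keep only trades with p > threshold and report the
    deal count and a crude profit factor (gross profit / gross loss)."""
    for t in thresholds:
        mask = calibrated_proba > t
        r = returns[mask]
        gross_profit = r[r > 0].sum()
        gross_loss = -r[r < 0].sum()
        pf = gross_profit / gross_loss if gross_loss > 0 else float("inf")
        print(f"threshold {t}: deals={mask.sum()}, profit factor={pf:.2f}")

# placeholder data only; in practice these would be the calibrated model's
# probabilities and the per-trade returns
rng = np.random.default_rng(0)
threshold_report(rng.uniform(0, 1, 1000), rng.normal(0.0, 1.0, 1000))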

It is simply a way to bring the outputs into probabilistic form, because using the model's raw probabilities is of no use at all. The transition to a purely probabilistic formulation of problems in trading matured long ago and is even slightly overripe.
 
There is also a trick of calibrating a model that was trained on other data to the labels of the current data. In some subtle situations, which I will not explain, it has a good effect.
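One possible reading (a guess only, since the trick is not spelled out here): refit a monotone calibration map from the frozen model's scores onto the new dataset's labels. The names frozen_model, X_new, y_new are illustrative, not from the thread.

# Guesswork sketch: a model trained elsewhere is re-mapped onto the labels
# of the current dataset by fitting an isotonic curve on its scores.
from sklearn.isotonic import IsotonicRegression

def calibrate_to_new_labels(frozen_model, X_new, y_new):
    """Return a function mapping the frozen model's scores onto probabilities
    estimated from the new dataset's labels."""
    scores = frozen_model.predict_proba(X_new)[:, 1]
    iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y_new)
    return lambda X: iso.predict(frozen_model.predict_proba(X)[:, 1])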
 
mytarmailS #:
What is calibration
https://stats.stackexchange.com/questions/552146/probability-calibration-of-statistical-models
How to calibrate
https://www.tidymodels.org/learn/models/calibration/

https://mlr.mlr-org.com/articles/tutorial/classifier_calibration.html

I read the articles, read the articles at the links.

Strange impression.

According to the articles, the point of calibration is smoothing of one kind or another. And why would setting thresholds on smoothed probabilities be any better than on unsmoothed ones? There is no estimate of this, although for me the natural estimate is classification error.
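For what it's worth, the comparison this question asks for can be run directly: classification error at fixed thresholds on raw versus calibrated probabilities of the same base model. A toy scikit-learn sketch on synthetic data, nothing from this thread:

# Toy sketch: classification error at fixed thresholds on raw vs. calibrated
# probabilities of the same base model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=5000, weights=[0.7, 0.3], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

raw_model = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
cal_model = CalibratedClassifierCV(GradientBoostingClassifier(random_state=1),
                                   method="isotonic", cv=3).fit(X_tr, y_tr)

for t in (0.5, 0.6, 0.7):
    err_raw = np.mean((raw_model.predict_proba(X_te)[:, 1] > t) != y_te)
    err_cal = np.mean((cal_model.predict_proba(X_te)[:, 1] > t) != y_te)
    print(f"threshold {t}: error raw={err_raw:.3f}, calibrated={err_cal:.3f}")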

 
It seems, then, that one can quite well calibrate any regression output, not just one that produces "probabilities". I wonder whether this makes any sense.
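A minimal sketch of that idea, assuming isotonic regression as the calibrator: any continuous regression score can be mapped to the empirical probability of a binary event.

# Illustrative only: an arbitrary regression score is mapped to the empirical
# probability of a binary event ("outcome is positive") via isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(2)
score = rng.normal(size=3000)                              # raw regression output
event = (score + rng.normal(scale=2.0, size=3000)) > 0     # noisy binary outcome

iso = IsotonicRegression(out_of_bounds="clip").fit(score, event.astype(float))
print(iso.predict([-2.0, 0.0, 2.0]))   # monotone map from raw score to P(event)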
 
Aleksey Nikolayev #:
if it makes any sense.
That's the most important question
 

New business - selling predictors


 
Maxim Dmitrievsky #:

A simple example showing that calibration of even initially weak models gives some result.

Calibration is a mechanism for interpreting the model's output, tuned to particular data.

By itself it does not change the model's output values. The variant where, after quantisation, the ranges get rearranged because of a spike in class proportions is something I have not seen in models; everything always flows smoothly. Maybe if you divide it into 100 segments this would occur....

In essence, calibration generally leads to a shift of the 0.5 point, more often upwards. So such a point can be found without calibration; why you failed to do so is quite unclear, especially if Take Profit and Stop Loss are the same for all positions. If they are not the same, you need a completely different approach: calibration by an expectation matrix :)
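A small sketch of this point, assuming a monotone calibrator such as isotonic regression: calibration does not reorder the outputs, so any threshold on the calibrated scale corresponds to some threshold on the raw scale.

# Sketch: fit an isotonic calibration map, then show that thresholding the
# calibrated scores at 0.5 selects exactly the same trades as thresholding the
# raw scores at some equivalent cut-off.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(3)
raw = rng.uniform(size=2000)                 # miscalibrated raw scores
y = (rng.uniform(size=2000) < raw ** 2)      # true hit rate is raw**2, not raw

iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y.astype(float))
cal = iso.predict(raw)

picked = cal > 0.5
raw_equiv = raw[picked].min()                # smallest raw score among selected
print("0.5 on the calibrated scale ~ raw threshold", round(raw_equiv, 3))
print("identical selection:", np.array_equal(picked, raw >= raw_equiv))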

 
Aleksey Vyazmikin #:

Calibration is a mechanism for interpreting the model's output, tuned to particular data.

By itself it does not change the model's output values. The variant where, after quantisation, the ranges get rearranged because of a spike in class proportions is something I have not seen in models; everything always flows smoothly. Maybe if you divide it into 100 segments this would occur....

In essence, calibration generally leads to a shift of the 0.5 point, more often upwards. So such a point can be found without calibration; why you failed to do so is quite unclear, especially if Take Profit and Stop Loss are the same for all positions. If they are not the same, you need a completely different approach: calibration by an expectation matrix :)

I'm not pushing anything on anyone at all. There are plenty of approaches; the question was about knowledge of ML.

There is always a magic cure for all troubles - optimise everything that moves.
 
Maxim Dmitrievsky #:
I'm not pushing anything on anyone at all. There are plenty of approaches; the question was about knowledge of ML.

Well, personally I never associated the model's response with the probability of the class occurring; I take it as the model's confidence in assigning the class. Confidence is computed from the leaves, and the leaves are computed from the training sample. A single leaf, on its own, will show the probability of the class occurring. But since each leaf does not respond at every point of the sample, the summation of probabilities ends up distorted in the model's final response. Perhaps there is a way to correct this at that level, and I am interested in it; I tried to steer the discussion in that direction.

In my opinion, the solution is to group leaves by similar response points and then transform the averaged aggregate results of the groups....
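A sketch of that leaf-level view, assuming a scikit-learn decision tree (the grouping rule here, 0.1-wide frequency bins, is purely illustrative): map each training sample to its leaf, compute per-leaf class frequencies, then group leaves with similar response.

# Sketch: per-leaf class-1 frequencies of a fitted decision tree, grouped
# into coarse bins by similar response.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, random_state=4)
tree = DecisionTreeClassifier(max_depth=6, random_state=4).fit(X, y)

leaf_id = tree.apply(X)                      # leaf index for every sample
leaf_rate = {}
for leaf in np.unique(leaf_id):
    mask = leaf_id == leaf
    leaf_rate[leaf] = y[mask].mean()         # class-1 frequency inside the leaf

bins = {}
for leaf, rate in leaf_rate.items():
    bins.setdefault(round(rate, 1), []).append(leaf)

for b in sorted(bins):
    print(f"leaf group ~{b:.1f}: {len(bins[b])} leaves")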
