Machine learning in trading: theory, models, practice and algo-trading - page 3605

[Deleted]  
Aleksey Vyazmikin #:

Your code removes from the sample the rows belonging to clusters whose mean label is (as you write) 0.5.

My code doesn't do that, and it is visible in the code. This is wishful thinking on your part.
[Deleted]  
Aleksey Vyazmikin #:

I haven't done that. But, in essence, this is binarisation. Again, if the probabilities hold on new data, the effect will be there; if not, it all falls apart.

I get a similar effect through quantisation, in essence.

More guesswork. I feel like a fortune teller.
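Since the thread contrasts clustering with quantisation, here is a minimal sketch (my own illustration, not code from either participant) of what "a similar effect through quantisation" can mean: split a predictor into quantile bins and measure each bin's mean label against the global mean, i.e. the local probability shift.

```python
import numpy as np

def quantile_bin_shift(feature, labels, n_bins=10):
    """Per-bin mean label and its deviation (shift) from the global mean."""
    interior = np.quantile(feature, np.linspace(0, 1, n_bins + 1))[1:-1]
    bin_idx = np.digitize(feature, interior)      # bins 0 .. n_bins-1
    global_mean = labels.mean()
    stats = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            p = labels[mask].mean()
            stats.append((b, int(mask.sum()), p, p - global_mean))
    return stats

# Synthetic data where the true class probability rises with the feature,
# so the upper quantile bins carry a positive shift.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = (rng.random(1000) < np.where(x > 0.5, 0.7, 0.3)).astype(float)
for b, n, p, shift in quantile_bin_shift(x, y):
    print(f"bin {b}: n={n}, p(y=1)={p:.2f}, shift={shift:+.2f}")
```

Whether such a per-bin shift survives on new data is exactly what the rest of the discussion is about.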
 
Maxim Dmitrievsky #:
My code doesn't do that, and it is visible in the code. This is wishful thinking on your part.

Yes, I misunderstood - that is reflected in the second post.

Maxim Dmitrievsky #:
More guesswork. I feel like a fortune teller.

I don't get it.

[Deleted]  
Aleksey Vyazmikin #:

Yes, I misunderstood - that is reflected in the second post.

I don't get it.

You can combine it with removing bad clusters, but that's not about label correction.
 
Maxim Dmitrievsky #:
You can combine it with removing bad clusters, but that's not about label correction.

What matters is: what gives grounds to believe that on new data this cluster or quantum segment will show the same probability shift, or at least a shift with the same sign relative to the mean?

[Deleted]  
Aleksey Vyazmikin #:

What matters is: what gives grounds to believe that on new data this cluster or quantum segment will show the same probability shift, or at least a shift with the same sign relative to the mean?

Don't you read?

#36001

every post is incomprehensible for some reason
Machine learning in trading: theory, models, practice and algo-trading
  • 2024.08.22
  • Maxim Dmitrievsky
  • www.mql5.com
Good day everyone. I know there are machine learning and statistics enthusiasts on this forum...
 
Maxim Dmitrievsky #:

Don't you read?

#36001

every post is incomprehensible for some reason

Well, regarding the wording "Understanding it is very simple - it stops working :)" - by what criteria should one understand that? I showed above that such a probability shift may keep working over a long period, or may not...

I read everything. I have already written many times that all this is not enough - some additional criterion is needed beyond just checking on history.
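The question above - whether a cluster's probability shift keeps at least its sign on new data - can be checked directly on a hold-out period. A minimal sketch (my own, with a toy k-means; not code from this thread): fit clusters on the training period only, then compare the sign of each cluster's mean-label deviation between train and hold-out.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Toy Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:                      # farthest-point init
        d = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        lab = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                centers[j] = X[lab == j].mean(0)
    return centers

def shift_sign_agreement(X_tr, y_tr, X_te, y_te, k=2):
    """Count clusters whose probability-shift sign persists on hold-out data."""
    centers = kmeans(X_tr, k)
    assign = lambda X: ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    a_tr, a_te = assign(X_tr), assign(X_te)
    agree = total = 0
    for j in range(k):
        if (a_tr == j).any() and (a_te == j).any():
            s_tr = np.sign(y_tr[a_tr == j].mean() - y_tr.mean())
            s_te = np.sign(y_te[a_te == j].mean() - y_te.mean())
            total += 1
            agree += int(s_tr == s_te)
    return agree, total

# Stationary synthetic data: two feature blobs with class rates 0.2 and 0.8.
# Here the shift persists by construction; on real market data it may not,
# which is exactly the concern voiced above.
rng = np.random.default_rng(1)
def make(n):
    side = rng.random(n) < 0.5
    X = rng.normal(np.where(side, 3.0, -3.0)[:, None], 1.0, size=(n, 2))
    y = (rng.random(n) < np.where(side, 0.8, 0.2)).astype(float)
    return X, y
X_tr, y_tr = make(500)
X_te, y_te = make(500)
print("sign agreement:", shift_sign_agreement(X_tr, y_tr, X_te, y_te, k=2))
```

A low agreement ratio on the hold-out period would be one concrete "additional criterion beyond checking on history".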

[Deleted]  
Aleksey Vyazmikin #:

Well, regarding the wording "Understanding it is very simple - it stops working :)" - by what criteria should one understand that? I showed above that such a probability shift may keep working over a long period, or may not...

I read everything. I have already written many times that all this is not enough - some additional criterion is needed beyond just checking on history.

What exactly is being discussed: ways to correct labels, or complaints that something doesn't work?

 
Maxim Dmitrievsky #:

What exactly is being discussed: ways to correct labels, or complaints that something doesn't work?

Your proposed relabelling method is generally clear. I just want to clarify: what sampling period is taken for the cluster analysis (is a hold-out sample involved in the process), and how is it taken - as a whole or randomly?

At the same time, I note that I do not consider this process a "correction", since there is no comparison with any reference markup.

Regarding seeding - for me this is the most important issue: a way to monitor the change in probability with minimal lag.

[Deleted]  
Aleksey Vyazmikin #:

Your proposed relabelling method is generally clear. I just want to clarify: what sampling period is taken for the cluster analysis (is a hold-out sample involved in the process), and how is it taken - as a whole or randomly?

At the same time, I note that I do not consider this process a "correction", since there is no comparison with any reference markup.

Regarding seeding - for me this is the most important issue: a way to monitor the change in probability with minimal lag.

On train and validation, of course. The test set has no labels. What data are you passing to the function? These are newbie questions.

The reference is the distribution of labels in the clusters. Ideally, each cluster should contain only 0s or only 1s - then the data is completely predictable. Moving toward this reference is what is called correction.

From this it follows automatically that the larger the sample, the higher the statistical significance of the patterns.
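As a sketch of the "reference" idea described above (my own illustration; the function name and the 0.4-0.6 ambiguity band are assumptions, not the poster's code): each cluster is pushed toward all-0 or all-1 by majority vote, while clusters stuck near 50/50, which carry no usable probability shift, are marked for removal instead of being relabelled.

```python
import numpy as np

def correct_labels(clusters, labels, ambiguous_band=(0.4, 0.6)):
    """Relabel each cluster to its majority class; flag near-50/50 clusters for removal."""
    labels = labels.astype(float).copy()
    keep = np.ones(len(labels), dtype=bool)
    for c in np.unique(clusters):
        mask = clusters == c
        p = labels[mask].mean()                 # share of class 1 in this cluster
        if ambiguous_band[0] <= p <= ambiguous_band[1]:
            keep[mask] = False                  # no usable shift: drop, don't relabel
        else:
            labels[mask] = float(p > 0.5)       # move the cluster to all-0 or all-1
    return labels, keep

# Tiny example: cluster 0 leans to 1, cluster 1 leans to 0, cluster 2 is 50/50.
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
labels   = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])
new_labels, keep = correct_labels(clusters, labels)
print(new_labels, keep)
```

This combines the two operations discussed above - correction toward the 0/1 reference and removal of bad clusters - in one pass; the band threshold is where the "additional criterion" question from earlier in the thread would come in.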