I am thinking how to make an artificial sample for binary classification - General

СанСаныч Фоменко 2024.04.24 18:25 #34931

mytarmailS #:
I can fiddle with one sample so that it will pass the test on 5 more samples, but on the real 6th sample it will fail.

So this primitive doesn't work, and what the guy in the video says is real.

Can we see at least four samples?

Aleksey Vyazmikin 2024.04.24 21:04 #34932

I am thinking about how to build an artificial sample for binary classification. Requirements: many noise predictors, some of them carrying useful data, and at least about 20,000 rows. The main thing is to know for certain that a solution can be found. Something like a benchmark to see how bad an algorithm is. My idea so far is to take some function and label its sections above and below zero as "1" and "0" respectively. Any ideas?
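A minimal sketch of the function-sign idea, in Python using only the standard library. Everything specific here is an assumption made for illustration: the name `make_sample`, the two signal columns, the eight noise columns, and the target function sin(x1) + 0.5 * x2. The target depends only on the signal columns, so a perfect solution is known to exist.

```python
import math
import random

def make_sample(n_rows=20000, n_noise=8, seed=0):
    """Return (rows, labels): 2 signal columns followed by n_noise noise columns."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        # Signal predictors: the only columns the target depends on.
        x1 = rng.uniform(-3.0, 3.0)
        x2 = rng.uniform(-3.0, 3.0)
        # Noise predictors: independent of the target by construction.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        # Hypothetical target function; label is 1 above zero, 0 otherwise.
        f = math.sin(x1) + 0.5 * x2
        labels.append(1 if f > 0 else 0)
        rows.append([x1, x2] + noise)
    return rows, labels

rows, labels = make_sample()
```

Since the generating function is known, the regions where a model gets the label wrong can be compared directly against the true boundary f = 0.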

Aleksey Nikolayev 2024.04.25 03:55 #34933

Aleksey Vyazmikin #:
I am thinking about how to build an artificial sample for binary classification. Requirements: many noise predictors, some of them carrying useful data, and at least about 20,000 rows. The main thing is to know for certain that a solution can be found. Something like a benchmark to see how bad an algorithm is. My idea so far is to take some function and label its sections above and below zero as "1" and "0" respectively. Any ideas?

Usually you just generate multivariate random samples. If you need fully separable classes, use distributions with bounded support, uniform ones for example. If you need partial mixing of the classes, use Gaussians or mixtures of them.
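A rough sketch of that practice, in standard-library Python. The function name `two_class_sample`, the interval bounds, and the Gaussian means are illustrative assumptions. In this sketch the target is simply the index of the distribution each row was drawn from.

```python
import random

def two_class_sample(n_per_class=1000, n_features=3, separable=True, seed=0):
    """Draw n_per_class rows from each of two class-specific distributions."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            if separable:
                # Uniform on [0, 1] for class 0 and [2, 3] for class 1:
                # the supports are bounded and do not overlap.
                low = 2.0 * label
                row = [rng.uniform(low, low + 1.0) for _ in range(n_features)]
            else:
                # Unit-variance Gaussians centred at 0 and 1:
                # unbounded support, so the classes partially mix.
                row = [rng.gauss(float(label), 1.0) for _ in range(n_features)]
            rows.append(row)
            labels.append(label)
    return rows, labels

sep_rows, sep_labels = two_class_sample(separable=True)
mix_rows, mix_labels = two_class_sample(separable=False)
```

In the separable case a single threshold on any one feature splits the classes exactly. In the Gaussian case no classifier can reach zero error, which sets a known floor for the benchmark.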

Aleksey Vyazmikin 2024.04.25 08:10 #34934

Aleksey Nikolayev #:
Usually you just generate multivariate random samples. If you need fully separable classes, use distributions with bounded support, uniform ones for example. If you need partial mixing of the classes, use Gaussians or mixtures of them.

I want something meaningful, not random.

Maxim Dmitrievsky 2024.04.25 08:11 #34935

mytarmailS #:

How do you see it other than the way it was in the video?

Well, in many ways, through mutual information, for example.

Aleksey Nikolayev 2024.04.25 08:36 #34936

Aleksey Vyazmikin #:

I want something meaningful, not random.

If you want to reinvent the wheel, you have every right. I was just describing standard practice.

Aleksey Vyazmikin 2024.04.25 08:54 #34937

Aleksey Nikolayev #:
If you want to reinvent the wheel, you have every right. I was just describing standard practice.

I did not understand how the target is labeled in the proposed method, or how the connection with the predictors is established.

Why I want something meaningful: my goal is to reduce classification error by estimating unstable quantum segments, and for that I want to look at the areas where errors occur, perhaps visualise them. I want to try different approaches to estimation and check that the result is reliable. If, for example, only 5 useful sections out of 10 were detected on a significant predictor, I need to understand how they differ.

Aleksey Nikolayev 2024.04.25 09:42 #34938

Aleksey Vyazmikin #:

I did not understand how the target is labeled in the proposed method, or how the connection with the predictors is established.

Why I want something meaningful: my goal is to reduce classification error by estimating unstable quantum segments, and for that I want to look at the areas where errors occur, perhaps visualise them. I want to try different approaches to estimation and check that the result is reliable. If, for example, only 5 useful sections out of 10 were detected on a significant predictor, I need to understand how they differ.

It might be worth looking into data generation packages, something like this.

3 Python packages for generating synthetic data

2022.07.15
habr.com

When working on a data problem, it is not uncommon for real data to be hard to obtain: the information may be confidential, collecting it may take a long time, or you may simply need to test a project with data that meets certain criteria. To solve...

Aleksey Vyazmikin 2024.04.25 10:15 #34939

Aleksey Nikolayev #:
Might be worth looking into data generation packages, something like this.

Thanks, it might be useful, but it's not quite what I need. Anyway, I have already reinvented my own wheel in my head...

mytarmailS 2024.04.25 14:21 #34940

Maxim Dmitrievsky #:
Well, in various ways, through mutual information, for example.

I'm not sure what to compare it to.

Machine learning in trading: theory, models, practice and algo-trading - page 3494