How to improve the quality of data - General

Aleksey Vyazmikin 2020.11.12 15:42 #21121

Maxim Dmitrievsky:

I like Near-Miss better (from the pictures)

The pictures are nice - but you have to try.

Maxim Dmitrievsky 2020.11.12 15:48 #21122

Aleksey Vyazmikin:

I must have changed it in the wrong place.

Please check what's wrong.

it is not X, y but data_X, data_y

If you use under-sampling (reducing the number of samples of the majority class), you'd better collect a lot of data; otherwise the output will be too small (limited by the size of the minority class).
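To make the size concern concrete, here is a minimal sketch of plain random under-sampling in pure Python. It is not ClusterCentroids (which builds centroids rather than picking rows), and `random_under_sample` and the toy data are hypothetical names invented for illustration. It only shows that the resampled set ends up at roughly the minority class size times the number of classes.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Hypothetical helper: shrink every class to the size of the smallest one."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    # Draw n_min rows from each class without replacement.
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data (an assumption): 95 rows of class 0 and 5 rows of class 1.
data_X = [[float(i)] for i in range(100)]
data_y = [0] * 95 + [1] * 5

X_res, y_res = random_under_sample(data_X, data_y)
print(len(data_y), "->", len(y_res))  # 100 rows shrink to 10
print(Counter(y_res))
```

With only 5 minority rows, 90% of the data is thrown away, which is why a large initial dataset matters here.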

Aleksey Vyazmikin 2020.11.12 18:44 #21123

Maxim Dmitrievsky:

it's not X, y but data_X, data_y

cc = ClusterCentroids(random_state=0)
x_resampled, y_resampled = cc.fit_resample(data_X, data_y)

NameError                                 Traceback (most recent call last)
<ipython-input-7-29177f78bed3> in <module>()
      1 cc = ClusterCentroids(random_state=0)
----> 2 x_resampled, y_resampled = cc.fit_resample(data_X, data_y)

NameError: name 'data_X' is not defined

Maxim Dmitrievsky 2020.11.12 18:55 #21124

Aleksey Vyazmikin:

Aleksey ... )))

The dataset has to be loaded and all the steps in the notebook have to be run. What is printed there now is saved output; those objects no longer exist.

Aleksey Vyazmikin 2020.11.12 19:19 #21125

Maxim Dmitrievsky:

Aleksey... )))

The dataset has to be loaded and all the steps in the notebook have to be run. What is printed there now is saved output; those objects no longer exist.

Can the archive be downloaded and unzipped there?

Maxim Dmitrievsky 2020.11.12 19:37 #21126

Aleksey Vyazmikin:

Can the archive be downloaded and unzipped there?

https://stackoverflow.com/questions/3451111/unzipping-files-in-python I'm on my phone. If you don't manage it, I'll do it tomorrow. You can also upload the archive to Google Drive once and then copy it into Google Colab, in case your internet is slow. You can also open zips directly: https://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe And to save: https://www.google.ru/amp/s/cmdlinetips.com/2020/05/how-to-save-pandas-dataframe-as-gzip-zip-file/amp/
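As a rough sketch of the two options mentioned above (unpacking an archive, or reading a file straight out of it), here is a version that uses only the Python standard library. The archive contents and the temporary directory are assumptions made so the example runs on its own; the `exam.zip` / `exam.csv` names and the `;` separator follow the ones used in this thread.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path

# Work in a throwaway directory so nothing real is overwritten.
work_dir = Path(tempfile.mkdtemp())
archive_path = work_dir / "exam.zip"

# Build a tiny archive (made-up contents) so the example is self-contained.
with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("exam.csv", "a;b;target\n1;2;0\n3;4;1\n")

# Option 1: unpack everything to disk.
with zipfile.ZipFile(archive_path) as zf:
    zf.extractall(work_dir / "unzipped")

# Option 2: read the CSV directly from the archive without unpacking it.
with zipfile.ZipFile(archive_path) as zf:
    with zf.open("exam.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.reader(text, delimiter=";"))

print(rows[0])         # header row
print(len(rows) - 1)   # number of data rows
```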

Aleksey Vyazmikin 2020.11.13 00:33 #21127

Maxim Dmitrievsky:
https://stackoverflow.com/questions/3451111/unzipping-files-in-python I'm on my phone. If you don't manage it, I'll do it tomorrow. You can also upload the archive to Google Drive once and then copy it into Google Colab, in case your internet is slow. You can also open zips directly: https://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe And to save: https://www.google.ru/amp/s/cmdlinetips.com/2020/05/how-to-save-pandas-dataframe-as-gzip-zip-file/amp/

It hasn't worked yet - I'll try again tomorrow.

Maxim Dmitrievsky 2020.11.13 09:06 #21128

Aleksey Vyazmikin:

So far it hasn't worked - I'll try again tomorrow.

Reading:

data = pd.read_csv('exam.zip', sep=';')

Writing:

to_save.to_csv('oversamled_exam.zip', sep=';',
               compression=dict(method='zip', archive_name='exam.csv'))

I've updated the notebook.

Aleksey Vyazmikin 2020.11.13 16:08 #21129

Maxim Dmitrievsky:

Reading:

data = pd.read_csv('exam.zip', sep=';')

Writing:

to_save.to_csv('oversamled_exam.zip', sep=';',
               compression=dict(method='zip', archive_name='exam.csv'))

I've updated the notebook.

Thank you! It all worked out.

I think I understood it correctly: only train is transformed, because the test sample is just for control. That's what I did, but the result is very strange: logloss on the test sample exceeds 1 and keeps growing. How can that be? I'm shocked.
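A logloss above 1 is not impossible in itself: the metric has no upper bound and punishes confident wrong answers heavily. The sketch below uses made-up numbers, not the data from this thread. One possible explanation, and it is only an assumption, is that a model fitted on a rebalanced train set outputs probabilities that are too high for the minority class on the original, imbalanced test set.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Made-up imbalanced test set: 9 negatives and 1 positive.
y_test = [0] * 9 + [1]

coin_flip = [0.5] * 10       # uninformative model
overconfident = [0.95] * 10  # model that is almost sure everything is class 1

print(log_loss(y_test, coin_flip))      # equals ln(2), about 0.693
print(log_loss(y_test, overconfident))  # well above 1
```

If this is what is happening, comparing the mean predicted probability on test with the actual share of class 1 there would be a quick check.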

Aleksey Vyazmikin 2020.11.13 16:20 #21130

Maxim, how do you set this thing up?

from imblearn.under_sampling import TomekLinks

tl = TomekLinks(return_indices=True, ratio='majority')
X_tl, y_tl, id_tl = tl.fit_sample(X, y)

What is id_tl ?
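In the older imbalanced-learn API used in the snippet above, `return_indices=True` makes the sampler return a third value: the indices of the rows of the original X that were kept. As far as I know, later releases replaced `ratio` with `sampling_strategy`, `fit_sample` with `fit_resample`, and the third return value with the `sample_indices_` attribute on the fitted sampler, so check the version you have installed. The pure-Python sketch below is not the library's implementation; `tomek_links_majority` and the toy data are hypothetical and only illustrate what such indices mean.

```python
import math


def tomek_links_majority(X, y):
    """Hypothetical helper: return indices of rows kept after dropping the
    majority-class member of every Tomek link."""
    n = len(X)

    def nearest(i):
        # Index of the closest other point (Euclidean distance).
        return min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(n)]

    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    majority = max(counts, key=counts.get)

    # A Tomek link is a pair of mutual nearest neighbours with different labels.
    drop = set()
    for i in range(n):
        j = nn[i]
        if nn[j] == i and y[i] != y[j] and y[i] == majority:
            drop.add(i)
    return [i for i in range(n) if i not in drop]


# Toy 1-D data (an assumption): rows 3 and 4 are close and of opposite classes.
X = [[0.0], [1.0], [2.1], [3.0], [3.4], [6.0]]
y = [0, 0, 0, 0, 1, 1]

id_tl = tomek_links_majority(X, y)
X_tl = [X[i] for i in id_tl]
y_tl = [y[i] for i in id_tl]
print(id_tl)  # row 3 (majority side of the link) is missing
```

Such indices are handy for filtering anything else aligned with X, for example timestamps or a dataframe that holds extra columns.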

Machine learning in trading: theory, models, practice and algo-trading - page 2113