Machine learning in trading: theory, models, practice and algo-trading - page 2113

 
Maxim Dmitrievsky:


I like Near-Miss better (from the pictures)

The pictures are nice, but you have to try it.
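If you want to try it without installing anything, here is a minimal NumPy sketch of the NearMiss-1 rule. It is not imbalanced-learn's code; the library version is NearMiss(version=1). The rule keeps the majority-class samples whose average distance to their k nearest minority samples is smallest, until both classes are the same size. The function name and the k=3 default are my own assumptions, and the distances are brute force, so it only suits small data.

```python
import numpy as np


def near_miss_1(X, y, minority_label, k=3):
    """NearMiss-1 sketch for a binary problem: keep the majority samples whose
    mean distance to their k nearest minority samples is smallest, until both
    classes have the same size. Brute-force distances, so small data only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    k = min(k, len(min_idx))
    # Euclidean distances, shape (n_majority, n_minority)
    d = np.linalg.norm(X[maj_idx][:, None, :] - X[min_idx][None, :, :], axis=2)
    # mean distance of each majority sample to its k closest minority samples
    score = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep_maj = maj_idx[np.argsort(score, kind="stable")[: len(min_idx)]]
    keep = np.sort(np.concatenate([min_idx, keep_maj]))
    return X[keep], y[keep]


# toy data: 90 majority rows around 0, 10 minority rows around 3
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = near_miss_1(X, y, minority_label=1)
print(np.bincount(y_res))  # [10 10]
```

The kept majority rows are the ones hugging the minority class. That is why NearMiss pictures look "cleaner" along the border, but the boundary region then dominates what the model sees.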

 
Aleksey Vyazmikin:

I must have changed it in the wrong place.

Please check what's wrong.

It is not X, y but data_X, data_y.

If you use under-sampling (reducing the number of majority-class samples), you had better collect a lot of data; otherwise the output will be too small, because each class is cut down to the size of the minority class.
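To make the size problem concrete, here is a sketch of plain random under-sampling. This is a simpler technique than Near-Miss or ClusterCentroids, which choose rows more carefully but end up with the same number of rows. The function name and the 2% minority share in the example are illustrative assumptions.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest
    class has (works for binary and multiclass labels)."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# 10,000 rows with a 2% minority class: only 400 rows survive
y = [1] * 200 + [0] * 9800
X = [[i] for i in range(len(y))]  # each row just stores its original index
X_res, y_res = random_undersample(X, y)
print(len(y_res), Counter(y_res))
```

With 10,000 rows, only 2 × 200 = 400 remain, so the majority class loses 97.96% of its rows.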

 
Maxim Dmitrievsky:

it's not X, y but data_X, data_y

cc = ClusterCentroids(random_state=0)
x_resampled, y_resampled = cc.fit_resample(data_X, data_y)
NameError                                 Traceback (most recent call last)
<ipython-input-7-29177f78bed3> in <module>()
      1 cc = ClusterCentroids(random_state=0)
----> 2 x_resampled, y_resampled = cc.fit_resample(data_X, data_y)

NameError: name 'data_X' is not defined
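For context on what ClusterCentroids in the snippet above does, here is a NumPy sketch of the idea behind it. This is my own code, not imbalanced-learn's. The majority class is clustered with k-means into as many clusters as there are minority samples, and the cluster centroids replace the majority rows. As far as I know, with dense input the library's default also returns centroids, so the resulting majority rows are synthetic rather than original rows. The helper names and the 50-iteration Lloyd loop are assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign every point to its nearest centre (squared Euclidean)
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist2, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def cluster_centroids(X, y, minority_label):
    """Binary case: replace the majority class by as many k-means centroids as
    there are minority samples; minority rows are kept unchanged."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    maj_label = y[y != minority_label][0]
    centers = kmeans(X_maj, k=len(X_min))
    X_res = np.vstack([centers, X_min])
    y_res = np.concatenate([np.full(len(centers), maj_label),
                            np.full(len(X_min), minority_label)])
    return X_res, y_res


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_res, y_res = cluster_centroids(X, y, minority_label=1)
print(X_res.shape, np.bincount(y_res))  # (40, 2) [20 20]
```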
 
Aleksey Vyazmikin:

Aleksey ... )))

The dataset has to be loaded and all the steps in the notebook run again. What is printed there now is just stored output; those objects no longer exist.

 
Maxim Dmitrievsky:

Aleksey... )))

The dataset has to be loaded and all the steps in the notebook run again. What is printed there now is just stored output; those objects no longer exist.

Can the archive be downloaded and unzipped there?

 
Aleksey Vyazmikin:

Can the archive be downloaded and unzipped there?

https://stackoverflow.com/questions/3451111/unzipping-files-in-python I'm on my phone; if it doesn't work out, I'll do it tomorrow. If your internet connection is bad, you can also upload the archive to Google Drive once and then copy it into Google Colab. You can also read zipped files directly: https://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe. And save them the same way: https://www.google.ru/amp/s/cmdlinetips.com/2020/05/how-to-save-pandas-dataframe-as-gzip-zip-file/amp/
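Here is a minimal sketch of the unzipping approach from the first link, using only the standard-library zipfile module. The function name, archive name and folder names are placeholders. In Colab you would typically mount Drive first with from google.colab import drive; drive.mount('/content/drive') and then point the archive path at the file there.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip(archive_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir and return the
    paths of the extracted entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]


# usage: build a throwaway archive, then extract it
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "exam.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("exam.csv", "x;y\n1;0\n")
    files = unzip(archive, Path(tmp) / "data")
    print(files[0].name, files[0].read_text())
```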
 
Maxim Dmitrievsky:
https://stackoverflow.com/questions/3451111/unzipping-files-in-python I'm on my phone; if it doesn't work out, I'll do it tomorrow. If your internet connection is bad, you can also upload the archive to Google Drive once and then copy it into Google Colab. You can also read zipped files directly: https://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe. And save them the same way: https://www.google.ru/amp/s/cmdlinetips.com/2020/05/how-to-save-pandas-dataframe-as-gzip-zip-file/amp/

It hasn't worked yet; I'll try again tomorrow.

 
Aleksey Vyazmikin:

So far it hasn't worked; I'll try again tomorrow.

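Reading:

import pandas as pd

# pandas infers zip compression from the .zip extension
data = pd.read_csv('exam.zip', sep=';')

Writing:

# to_save is the (resampled) DataFrame to store; index=False avoids an
# extra 'Unnamed: 0' column when the file is read back
to_save.to_csv('oversamled_exam.zip', sep=';', index=False,
               compression=dict(method='zip', archive_name='exam.csv'))

I've updated the notebook.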

 
Maxim Dmitrievsky:

reading

data = pd.read_csv('exam.zip', sep=';')

write

to_save.to_csv('oversamled_exam.zip', sep=';',
               compression=dict(method='zip', archive_name='exam.csv'))


I've updated the notebook.

Thank you! It all worked out.

I think I've got it right: the transform goes on the train set only, because the test set is just the control sample. That's what I did, but the result is very strange: the logloss on the test sample exceeds 1 and keeps growing. How can that be? I'm shocked.
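This is not a diagnosis of that particular run, only one plausible contributor. A model fit on under-sampled (balanced) training data learns the balanced class prior, so on an untouched test set with the original imbalance it overestimates the minority probability. Log loss punishes that hard: a single row costs more than 1 whenever the probability given to its true class falls below 1/e ≈ 0.368. The sketch below computes binary log loss and applies the standard Bayes prior correction for under-sampling the negative class, p = βp_s / (βp_s − p_s + 1). Here p_s is the score learned on the resampled data and β is the fraction of majority rows kept. The positive class is assumed to be the minority, and the scores are illustrative, not from the thread.

```python
import math


def log_loss(y_true, p_pos, eps=1e-15):
    """Binary log loss (natural log), averaged over samples."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(y_true)


def undo_undersampling(p_s, beta):
    """Map a positive-class probability learned on under-sampled data back to
    the original prior. beta = fraction of negative (majority) rows kept."""
    return beta * p_s / (beta * p_s - p_s + 1)


y_test = [1] * 10 + [0] * 90   # untouched test set, 10% positives
p_balanced = [0.7] * 100        # hypothetical scores from a balanced-data model
beta = 10 / 90                  # share of negatives kept to balance 10 vs 90
p_fixed = [undo_undersampling(p, beta) for p in p_balanced]
print(round(log_loss(y_test, p_balanced), 3))  # 1.119
print(round(log_loss(y_test, p_fixed), 3))     # 0.366
```

For scale, predicting 0.5 everywhere gives ln 2 ≈ 0.693, so a test logloss above 1 is worse than a coin flip. If the correction pulls the number down a lot, miscalibration from resampling is at least part of the story.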

 

Maxim, how do you set this thing up?

from imblearn.under_sampling import TomekLinks

tl = TomekLinks(return_indices=True, ratio='majority')
X_tl, y_tl, id_tl = tl.fit_sample(X, y)

What is id_tl?
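As far as I know, in the imbalanced-learn versions that snippet was written for, return_indices=True made the sampler also return the indices (into the original X) of the samples it kept, so id_tl would be those indices. Newer releases dropped return_indices and renamed ratio to sampling_strategy. They expose the kept indices as the sample_indices_ attribute after fit_resample; check the docs for your installed version. To show what is being kept, here is a dependency-light NumPy sketch of Tomek-link removal. It is my own code, with ratio='majority' modelled as "drop only the majority member of each link". It returns the kept indices the same way.

```python
import numpy as np


def tomek_links_undersample(X, y, majority_label):
    """Remove the majority-class member of every Tomek link.

    A Tomek link is a pair of samples from different classes that are each
    other's nearest neighbour. Returns X_res, y_res and the indices (into the
    original X) of the rows that were kept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = d.argmin(axis=1)        # index of each point's nearest neighbour
    idx = np.arange(len(X))
    is_link = (nn[nn] == idx) & (y[nn] != y)  # mutual NN with another class
    drop = is_link & (y == majority_label)
    kept = np.flatnonzero(~drop)
    return X[kept], y[kept], kept


X = [[0.0], [1.0], [1.4], [4.0], [4.5], [8.0]]
y = [0, 0, 1, 0, 0, 1]
X_tl, y_tl, id_tl = tomek_links_undersample(X, y, majority_label=0)
print(id_tl)  # [0 2 3 4 5]
```

Rows 1 and 2 are each other's nearest neighbours and belong to different classes, so row 1 (the majority member) is dropped. Rows 3 and 4 are also mutual neighbours, but they share a class, so they stay.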
