Discussion of article "Metamodels in machine learning and trading: Original timing of trading orders" - page 12

[Deleted]  
Aleksey Vyazmikin #:

And how can an index value (0, 1, 2) be compared to a calendar date?

They are not compared. The time column just has to be set as the index, and 0, 1, 2 are not needed at all.

Try the highlighted part in your code:

def get_prices() -> pd.DataFrame:
    p = pd.read_csv('files/EURUSD_H1.csv', delim_whitespace=True)
    pFixed = pd.DataFrame(columns=['time', 'close'])
    pFixed['time'] = p['<DATE>'] + ' ' + p['<TIME>']
    pFixed['time'] = pd.to_datetime(pFixed['time'], format='mixed')
    pFixed['close'] = p['<CLOSE>']
    pFixed.set_index('time', inplace=True)
    pFixed.index = pd.to_datetime(pFixed.index, unit='s')
    pFixed = pFixed.dropna()
    pFixedC = pFixed.copy()

    count = 0
    for i in PERIODS:
        pFixed[str(count)] = pFixedC.rolling(i).mean() - pFixedC
        count += 1

    return pFixed.dropna()
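
For illustration, here is a minimal sketch of the same idea on a tiny in-memory sample: the date and time columns are joined, parsed, and set as the index, so no 0, 1, 2 counter is involved. The two rows and the `%Y.%m.%d %H:%M:%S` format are assumptions about the export layout, not taken from the actual EURUSD_H1.csv file.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real file layout may differ.
raw = io.StringIO(
    "<DATE> <TIME> <CLOSE>\n"
    "2023.01.02 00:00:00 1.0701\n"
    "2023.01.02 01:00:00 1.0705\n"
)
p = pd.read_csv(raw, sep=r"\s+")

# Join date and time, parse them, and make the result the index.
prices = pd.DataFrame({
    "time": pd.to_datetime(p["<DATE>"] + " " + p["<TIME>"],
                           format="%Y.%m.%d %H:%M:%S"),
    "close": p["<CLOSE>"],
}).set_index("time")

# The index is now a DatetimeIndex, so rows are addressed by timestamp.
print(type(prices.index).__name__)
print(prices.loc["2023-01-02 01:00:00", "close"])
```

With a DatetimeIndex in place, slicing by date strings such as `prices["2023-01-02":]` works directly.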
 
Maxim Dmitrievsky #:

They are not compared. The time column just has to be set as the index, and 0, 1, 2 are not needed at all.

Try the highlighted part in your code:

In general, I have already wasted a lot of time. Adapting the code to data from a file was not successful.

Apparently, it is easier to rewrite it from scratch.

The data disappears somewhere after labeling on the second iteration. I don't know whether it's some kind of problem with the book.

Unique values in train_y (meta model): [0.]
Traceback (most recent call last):
  File "F:/FX/Python/meta_modeling_Viborka.py", line 504, in <module>
    res.append(brute_force(pr[pr.columns[1:]], bad_samples_fraction=0.5))
  File "F:/FX/Python/meta_modeling_Viborka.py", line 265, in brute_force
    meta_model.fit(train_X, train_y, eval_set=(test_X, test_y),
  File "C:\Program Files\Python38\lib\site-packages\catboost\core.py", line 5100, in fit
    self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
  File "C:\Program Files\Python38\lib\site-packages\catboost\core.py", line 2319, in _fit
    self._train(
  File "C:\Program Files\Python38\lib\site-packages\catboost\core.py", line 1723, in _train
    self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
  File "_catboost.pyx", line 4645, in _catboost._CatBoost._train
  File "_catboost.pyx", line 4694, in _catboost._CatBoost._train
_catboost.CatBoostError: C:/Go_Agent/pipelines/BuildMaster/catboost.git/catboost/private/libs/target/target_converter.cpp:375: Target contains only one unique value
>>> 
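
The error above is raised because the target passed to `fit` holds a single class. A minimal sketch of a guard that could run before fitting is shown below; `has_both_classes` is a hypothetical helper, not part of the article's code or of CatBoost.

```python
import numpy as np

def has_both_classes(y) -> bool:
    """Return True if the target holds at least two distinct values."""
    return np.unique(np.asarray(y)).size >= 2

# The situation from the log above: every meta label is 0.
train_y = np.array([0.0, 0.0, 0.0])

if not has_both_classes(train_y):
    # Skip or stop here instead of calling meta_model.fit(...)
    print("Skipping fit: target contains only one unique value")
```

Such a check only avoids the crash; it does not explain why all labels became zero.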

Basically, yes, the zeroing out happens here:

    # mark bad labels from bad_samples_book
    if BAD_SAMPLES_BOOK.value_counts().max() > 1:
        to_mark = BAD_SAMPLES_BOOK.value_counts()
        mean = to_mark.mean()
        marked_idx = to_mark[to_mark > mean*bad_samples_fraction].index
        pr2.loc[pr2.index.isin(marked_idx), 'meta_labels'] = 0.0
    else:
        pr2.loc[pr2.index.isin(BAD_SAMPLES_BOOK), 'meta_labels'] = 0.0

I can't understand how I could get a value greater than one after the first iteration.

Accordingly, since I don't, all the "meta_labels" values are zeroed out for me.
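
A toy sketch of how `value_counts()` behaves here may help. It assumes the book is a `pd.DatetimeIndex` that accumulates timestamps of misclassified samples across iterations; the timestamps and the two iterations are made up for illustration.

```python
import pandas as pd

# Assumption: each iteration appends the timestamps it got wrong.
iter1 = pd.DatetimeIndex(["2023-01-02 00:00", "2023-01-02 01:00"])
iter2 = pd.DatetimeIndex(["2023-01-02 01:00", "2023-01-02 02:00"])

book = iter1
max_after_first = book.value_counts().max()   # each timestamp seen once

book = book.append(iter2)
counts = book.value_counts()
max_after_second = counts.max()               # 01:00 has now been seen twice

# Same thresholding as in the snippet above.
bad_samples_fraction = 0.5
marked_idx = counts[counts > counts.mean() * bad_samples_fraction].index

print(max_after_first, max_after_second, len(marked_idx))
```

Under this assumption a count above one can only appear once the same timestamp has been added at least twice, so after a single iteration the `else` branch runs and every timestamp in the book is set to 0.0. In this toy case the threshold `mean * 0.5` is also below one, so all three timestamps end up marked.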
[Deleted]  
Aleksey Vyazmikin #:

Basically, yes, the zeroing out happens here:

I can't understand how I could get a value greater than one after the first iteration.

Accordingly, since I don't, all the "meta_labels" values are zeroed out for me.

Something strange is going on. Give me time to concentrate and we'll figure it out later ) or send me a piece of your dataset.

 
Maxim Dmitrievsky #:

Something strange is going on. Give me time to concentrate and we'll figure it out later ) or send me a piece of your dataset.

This is a rough draft. I had to solve problems that did not occur in your case, namely that unbalanced classes lead to errors when the data is divided into subsamples.

While it is computing, you can sleep...

I'll try to upload the sample later.
Maxim Dmitrievsky #:

Here's a sample.

Still, I guess I didn't fully understand the labeling.

 
Maxim Dmitrievsky #:

Do I understand correctly that the meta model marks examples with class "1", and the second model is then applied only to the examples the first one marked as ones?

I'm just confused by your code: what the model classifies as a one is written as a zero. If I understood it correctly, of course...

    p2 = [x[0] < 0.5 for x in p]
    p2_meta = [x[0] < 0.5 for x in p_meta]

In general, if I have reconstructed the method correctly, the result is as follows.

000

I added checks to the code, otherwise it crashes with an error. It is attached.

And do I understand correctly that model parsing does not work because the model code has changed?

[Deleted]  
Aleksey Vyazmikin #:

Do I understand correctly that the meta model marks examples with class "1", and the second model is then applied only to the examples the first one marked as ones?

I'm just confused by your code: what the model classifies as a one is written as a zero. If I understood it correctly, of course...

In general, if I have reconstructed the method correctly, the result is as follows.

I added checks to the code, otherwise it crashes with an error. It is attached.

And do I understand correctly that model parsing does not work because the model code has changed?

There, x[0] is the probability of class 0; the model returns probabilities for both classes. So if the probability of class 0 is less than 0.5, class 1 is predicted. That means True == 1 and False == 0. Therefore, there is no error.
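
A small sketch of that conversion, using a hand-written array as a stand-in for the model's probability output (the numbers are made up; each row is assumed to be [P(class 0), P(class 1)]):

```python
import numpy as np

# Stand-in for the probability output: one row per sample.
p = np.array([
    [0.8, 0.2],   # class 0 is more likely
    [0.3, 0.7],   # class 1 is more likely
    [0.5, 0.5],   # tie: the strict "<" sends it to class 0
])

# Same expression as in the quoted code.
p2 = [x[0] < 0.5 for x in p]

# True means "class 1 predicted", False means "class 0 predicted".
labels = [int(flag) for flag in p2]
print(labels)
```

Testing `x[1] > 0.5` instead would give the same labels whenever the two columns sum to one.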

Yes, there are changes in the new version of CatBoost; I'll send you the reworked code. I'm away from my computer at the moment, so I'll try to help you later.
 
Maxim Dmitrievsky #:
Yes, there are changes in the new version of CatBoost; I'll send you the reworked code. I'm away from my computer at the moment, so I'll try to help you later.

Thanks.

[Deleted]  
Aleksey Vyazmikin #:

Thank you.

Maybe the labels in your dataset are inverted.