Discussion of article "Metamodels in machine learning and trading: Original timing of trading orders" - page 11

 
Maxim Dmitrievsky #:
It probably means that the attributes and labels are built via functions, i.e. automatically.

Probably.

Maxim Dmitrievsky #:
When applied to new data, the already trained model is used; you don't need to delete anything.

And when initialising the start of training, why delete it? I'm not splitting hairs, I just don't understand: we did the markup once at the beginning, and then we immediately do the markup again... I don't understand this point.

Maxim Dmitrievsky #:
If you want the same - make min and max trade duration the same, min=max
If only we knew how to do it right - but we don't...

You can substitute any automatic labelling function - that is the flexibility of the approach.
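For example, a substitute could look like this - a minimal sketch, not the article's actual function (the name my_labelling and the horizon parameter are illustrative), assuming the dataframe has a 'close' column as in the article's code:

```python
import pandas as pd

def my_labelling(df: pd.DataFrame, horizon: int = 5) -> pd.DataFrame:
    # Illustrative stand-in for an auto-labelling function:
    # label 1 if the close is higher `horizon` bars ahead, else 0.
    out = df.copy()
    out['labels'] = (out['close'].shift(-horizon) > out['close']).astype(int)
    # Drop the tail rows whose future price is unknown
    return out.iloc[:-horizon]
```

Any function that returns the dataframe with a filled 'labels' column can be plugged in at that step.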

Nah, I just want to try my own sampling, and to do that I need to understand what has to be removed there.

[Deleted]  
Aleksey Vyazmikin #:

Probably.

And when initialising the start of training, why delete it? I'm not splitting hairs, I just don't understand: we did the markup once at the beginning, and then we immediately do the markup again... I don't understand this point.

Nah, I just want to try my own sampling, and to do that I need to understand what has to be removed there.

You can skip that step and use your own sampling, yes. I've experimented with different options. The re-labelling was done to refine the labels, along the lines of trade/no-trade.
[Deleted]  
By the way, if there is interest, I can post other, "clearer" and simpler implementations, because this one is a bit hard to understand. There are a lot of assumptions here; it's easy to get confused.

Including variants with conversion to ONNX, if there is interest in that direction... Because then you can replace this whole construction with a single neural network with an interesting custom architecture and easily transfer it to the terminal.
 
Maxim Dmitrievsky #:
By the way, if there is interest, I can post other, "clearer" and simpler implementations, because this one is a bit hard to understand. There are a lot of assumptions here; it's easy to get confused.

Including variants with conversion to ONNX, if there is interest in that direction... Because then you can replace this whole construction with a single neural network with an interesting custom architecture and easily transfer it to the terminal.
Maxim, please publish, as soon as you have time, a variant with conversion to ONNX. All such models lose money for me, even on the training set, and I don't understand why :(
 
Maxim Dmitrievsky #:
By the way, if there is interest, I can post other, "clearer" and simpler implementations, because this one is a bit hard to understand. There are a lot of assumptions here; it's easy to get confused.

Including variants with conversion to ONNX, if there is interest in that direction... Because then you can replace this whole construction with a single neural network with an interesting custom architecture and easily transfer it to the terminal.

I am interested in any code examples with a short description, because I don't understand Python code very well yet.

[Deleted]  
Aleksey Vyazmikin #:

I am interested in any code examples with a short description, because I don't understand Python code very well yet.

Python is very simple. Literally a couple of books: "Python for Complex Problems", which covers the use of basic packages like numpy, pandas and sklearn, and Mark Lutz's Learning Python, Volume 1. The second one is about classes; you don't need much of it.
 
Maxim Dmitrievsky #:
Python is very simple. Literally a couple of books: "Python for Complex Problems", which covers the use of basic packages like numpy, pandas and sklearn, and Mark Lutz's Learning Python, Volume 1. The second one is about classes; you don't need much of it.

Thanks for the recommendation. Python may be easy, but my memory has become poor, so it's hard to learn new things.

Is the "close" column involved anywhere else after creating the sample with two targets, or can it be filled with zeros?

 

In general, my data is in a csv file in this format:

  • Columns with predictors
  • Column with a calendar date in the format 2010.01.04 07:24:00
  • Target_P column - contains the direction of the trade
  • Target_100 column - contains the target
  • Target_100_Buy column - contains the financial result when buying
  • Target_100_Sell column - contains the financial result when selling

Accordingly, I bring them to the form described in the article with this code:

import pandas as pd

# Load the data from the file
Load_train_data = pd.read_csv('E:\\FX\\MT5_CB\\MQL5\\Files\\Catboost_Tester_M02_104_SMO\\Setup\\train.csv', sep=';')

# Keep the predictor columns
train_data = Load_train_data.loc[:, :'iVolumes_B0_S15_D1']

# Keep the target values
train_labels = Load_train_data['Target_100']

# Convert the 'Time' column to datetime format
Load_train_data['Time'] = pd.to_datetime(Load_train_data['Time'], format='%Y.%m.%d %H:%M:%S')

# Convert back to a string with the required format
#Load_train_data['Time'] = Load_train_data['Time'].dt.strftime('%Y-%m-%d %H:%M:%S')

# Keep the time column
train_time = Load_train_data['Time']

# Print the result
print(train_time)

# Build a new DataFrame by concatenating the columns
combined_data = pd.concat([train_time, train_data, train_labels], axis=1)

# Insert a new column "close" after "Time", filled with the value 1.1
combined_data.insert(combined_data.columns.get_loc('Time') + 1, 'close', 1.1)

# Rename the "Target_100" column to "labels"
combined_data.rename(columns={'Target_100': 'labels'}, inplace=True)

# Add a column with the data from train_labels
combined_data['meta_labels'] = train_labels

pr = combined_data

# Print the result
print(combined_data)

It prints this:

0        2010-01-04 07:24:00
1        2010-01-04 21:02:00
2        2010-01-04 21:14:00
3        2010-01-04 21:56:00
4        2010-01-04 23:08:00
                ...         
28193    2018-03-12 01:18:00
28194    2018-03-12 02:52:00
28195    2018-03-12 03:08:00
28196    2018-03-12 03:38:00
28197    2018-03-12 08:32:00
Name: Time, Length: 28198, dtype: object
                      Time  close  ...  labels  meta_labels
0      2010-01-04 07:24:00    1.1  ...       0            0
1      2010-01-04 21:02:00    1.1  ...       0            0
2      2010-01-04 21:14:00    1.1  ...       0            0
3      2010-01-04 21:56:00    1.1  ...       0            0
4      2010-01-04 23:08:00    1.1  ...       0            0
...                    ...    ...  ...     ...          ...
28193  2018-03-12 01:18:00    1.1  ...       0            0
28194  2018-03-12 02:52:00    1.1  ...       0            0
28195  2018-03-12 03:08:00    1.1  ...       0            0
28196  2018-03-12 03:38:00    1.1  ...       1            1
28197  2018-03-12 08:32:00    1.1  ...       0            0

[28198 rows x 2412 columns]

Next, I comment out these functions in your code:

# make dataset
#pr = get_prices()
#pr = labelling_relabeling(pr, relabeling=False)
#a, b = tester(pr, MARKUP, use_meta=False, plot=False)
#pr['meta_labels'] = b
#pr = pr.dropna()
#pr = labelling_relabeling(pr, relabeling=True)

I get an error

Traceback (most recent call last):
  File "F:/FX/Python/meta_modeling_Viborka.py", line 386, in <module>
    res.append(brute_force(pr[pr.columns[1:]], bad_samples_fraction=0.5))
  File "F:/FX/Python/meta_modeling_Viborka.py", line 128, in brute_force
    X = X[X.index >= START_DATE]
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\ops\common.py", line 81, in new_method
    return method(self, other)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\arraylike.py", line 60, in __ge__
    return self._cmp_method(other, operator.ge)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\indexes\range.py", line 964, in _cmp_method
    return super()._cmp_method(other, op)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\indexes\base.py", line 6783, in _cmp_method
    result = ops.comparison_op(self._values, other, op)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 296, in comparison_op
    res_values = _na_arithmetic_op(lvalues, rvalues, op, is_cmp=True)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 171, in _na_arithmetic_op
    result = func(left, right)
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 239, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "C:\Program Files\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 70, in _evaluate_standard
    return op(a, b)
TypeError: '>=' not supported between instances of 'int' and 'datetime.datetime'

I want to test it, but I can't :(

[Deleted]  
Aleksey Vyazmikin #:
TypeError: '>=' not supported between instances of 'int' and 'datetime.datetime'

The first thing I noticed: your dataframe index is wrong. It should be datetime, i.e. the time column should be set as the index.
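A minimal sketch of both the problem and the fix, on synthetic two-row data (column names taken from the snippet above; dates are illustrative):

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2010.01.04 07:24:00', '2010.01.04 21:02:00'],
                           format='%Y.%m.%d %H:%M:%S'),
    'close': [1.1, 1.1],
})

# With the default RangeIndex, the date comparison inside brute_force() fails:
try:
    df[df.index >= dt.datetime(2010, 1, 4)]
except TypeError as e:
    print(e)  # the same TypeError as in the traceback

# Fix: make 'Time' the index, so it becomes a DatetimeIndex
df = df.set_index('Time')
filtered = df[df.index >= dt.datetime(2010, 1, 4, 12, 0)]
print(filtered)  # only the 21:02 row remains
```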

 
Maxim Dmitrievsky #:

The first thing I noticed: your dataframe index is wrong. It should be datetime, i.e. the time column should be set as the index.

And how can an index value (0, 1, 2) be compared to a calendar date?