Machine learning in trading: theory, models, practice and algo-trading - page 3401

 
mytarmailS #:
https://youtu.be/Ipw_2A2T_wg?si=U03oigHFfaFxwjbs

These are ML heroes.

That reminds me.


 
mytarmailS #:

Try this one.

binary classification

Thank you. Now it works fast!

GPT won't replace a human, of course, but it helps quite a bit.

mytarmailS #:

50,000 features / columns

it found a subset of the best features in under 3 seconds.


all the features relevant to the target were found, and none of the 50,000 noise features were selected
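
A claim like that is easy to sanity-check on synthetic data. A minimal sketch, with the sizes cut down so it runs in seconds (the column names, coefficients and sample size here are made up for illustration, not taken from the experiment above):

library(abess)
set.seed(1)
n <- 500; p <- 2000                       # 2,000 columns, only the first 5 informative
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("f", 1:p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- rbinom(n, 1, plogis(x %*% beta))     # binary target driven by f1..f5
fit <- abess(x, y, family = "binomial")
extract(fit)$support.vars                 # ideally returns only "f1".."f5"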

So, it found six predictors out of the whole list. Hmm, now I will train 100 CatBoost models on them and see the average result.
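
A hedged sketch of that averaging loop, assuming the catboost R package and the Pred/target objects produced by the script below (it scores the training pool only for brevity; a real comparison would score a holdout):

library(catboost)
pool <- catboost.load_pool(data[, Pred], label = target)
acc <- sapply(1:100, function(seed) {
  params <- list(loss_function = "Logloss", iterations = 300, random_seed = seed)
  m <- catboost.train(pool, params = params)
  p <- catboost.predict(m, pool, prediction_type = "Probability")
  mean((p > 0.5) == target)               # accuracy of this seed's model
})
mean(acc)                                 # the average result over 100 models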


mytarmailS #:
Logistic regression is a classification algorithm; texts, for example, are classified with it.

Yes, of course it's a classification algorithm. I see no contradiction between your arguments and my earlier words - in general, just a misunderstanding.
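
For what it's worth, a minimal logistic-regression classifier in base R (purely illustrative, on a built-in dataset):

# Classify iris flowers as versicolor / not versicolor from two features
fit <- glm(I(Species == "versicolor") ~ Sepal.Length + Sepal.Width,
           data = iris, family = binomial)
pred <- predict(fit, type = "response") > 0.5   # class decision at the 0.5 cutoff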

This code saves the indices of the predictors to be excluded and the list of predictors selected by the method.

#  Install and load the abess package
#install.packages("abess")
library(abess)

#  Load the data from a CSV file
data <- read.csv("E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\train.csv", sep = ";")

#  Specify the target variable column
target_column <- "Target_100"
target <- data[, target_column]

#  Exclude columns by their names
столбцы_исключения <- c("Time","Target_P","Target_100","Target_100_Buy","Target_100_Sell")
data_without_excluded <- data[, !names(data) %in% столбцы_исключения]

#  Select only the first 500 columns (optional)
#data_without_excluded <- data[, 1:500]

#  Apply the abess method
#  Specify your model and the abess method settings here
#  For example:
#model <- abess(y = target, x = data_without_excluded, method = "lasso")
model <- abess(y = target, x = data_without_excluded, tune.path = "gsection", early.stop = TRUE)

#  Extract the model results - various statistics
ex <- extract(model)

#  Get the names of the selected predictors (columns)
Pred <- ex$support.vars

#  Save the selected predictors to a CSV file
write.csv(Pred, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Pred.csv", row.names = FALSE)

#  Get the indices of all predictors in the dataset
все_предикторы <- colnames(data_without_excluded)
индексы_всех_предикторов <- seq_along(все_предикторы)

#  Get the indices of the predictors that were not selected (the exclusion list)
индексы_оставшихся_предикторов <- setdiff(индексы_всех_предикторов, match(Pred, все_предикторы))

#  Decrease the indices by 1
индексы_оставшихся_предикторов <- индексы_оставшихся_предикторов - 1

#  Save the indices to a CSV file
write.csv(индексы_оставшихся_предикторов, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Оставшиеся_предикторы.csv", row.names = FALSE)
 

So let's compare: 100 CatBoost models are trained. The first picture shows the result with abess selection, the second without it, on the train sample.


The test sample is the one we stop training on (early stopping).

The exam sample is a delayed sample that does not participate in the training process.
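
A sketch of that three-way chronological split (the 60/20/20 fractions are my assumption, not the author's):

n <- nrow(data)
train <- data[1:floor(0.6 * n), ]                     # models are fitted here
test  <- data[(floor(0.6 * n) + 1):floor(0.8 * n), ]  # training is stopped on this part
exam  <- data[(floor(0.8 * n) + 1):n, ]               # delayed sample, never touches training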


It seems the abess selection method does not work very effectively....

I should note that my classes are very unbalanced - about 16% ones - maybe that affects the selection.
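
One hedged way to account for the imbalance at the selection stage is to upweight the minority class through abess's observation-weight argument (a sketch; the ~16% share and the binomial family come from the discussion, not from the original script):

library(abess)
pos_share <- mean(target == 1)                        # about 0.16 here
w <- ifelse(target == 1, (1 - pos_share) / pos_share, 1)
model <- abess(y = target, x = data_without_excluded,
               family = "binomial", weight = w,
               tune.path = "gsection", early.stop = TRUE)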

 
Aleksey Vyazmikin #:

The abess selection method does not work very effectively....

My classes are very unbalanced - about 16% ones - maybe that affects the selection.

I have 5 options

1. Something is wrong with the data saving. I have BIG questions about the code - it is very strange and I don't understand it. Is it GPT again?

2. Perhaps you need to normalise the data

3. Maybe there is something wrong in the data itself.

4. Maybe there's an imbalance.

5. Maybe it's actually working worse
 
mytarmailS #:
I have 5 options

1. Something is wrong with the data saving. I have BIG questions about the code - it is very strange and I don't understand it. Is it GPT again?

2. Perhaps you need to normalise the data

3. Maybe there is something wrong in the data itself

4. Maybe there's an imbalance

5. Maybe it's actually working worse

1. It saves correctly - it matches the output in the log. The code, on the contrary, is clear to me: it is a compilation of the original code and the one you suggested here earlier.

2. Probably needed for this method. I will try to do it.

3. Well, there may be something wrong in the data - that's what the selection process is for. The data itself is produced correctly, if that is what we are talking about.

4. That's what I'm writing about. Maybe auto-balancing needs to be set in the parameters? (See the sketch after this list.)

5. So far it turns out like this. Maybe with stationary data it would be OK.
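
On point 4: CatBoost does have a built-in auto-balancing training parameter, auto_class_weights. A sketch of how it might be set from R (parameter values per the CatBoost docs; the rest of the wiring is illustrative):

library(catboost)
pool <- catboost.load_pool(data_without_excluded, label = target)
params <- list(loss_function = "Logloss",
               auto_class_weights = "Balanced",       # or "SqrtBalanced"
               iterations = 500)
model_cb <- catboost.train(pool, params = params)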

 
Aleksey Vyazmikin #:

1. It saves correctly - it matches the output in the log. The code, on the contrary, is clear to me: it is a compilation of the original code and the one you suggested here earlier.

2. Probably needed for this method. I will try to do it.

3. Well, there may be something wrong in the data - that's what the selection process is for. The data itself is produced correctly, if that is what we are talking about.

4. That's what I'm writing about. Maybe auto-balancing needs to be set in the parameters?

5. So far it turns out like this. Maybe with stationary data it would be OK.

#  Get the indices of the predictors that were not selected (the exclusion list)
индексы_оставшихся_предикторов <- setdiff(индексы_всех_предикторов, match(Pred, все_предикторы))

#  Decrease the indices by 1
индексы_оставшихся_предикторов <- индексы_оставшихся_предикторов - 1

I don't understand this part at all.



Normalisation is easy - just apply scale(data) to the data before all the procedures. This normalises the matrix by columns.
 
mytarmailS #:
That's what I don't understand at all.

For CatBoost you need to supply the list of predictors to exclude, i.e. those that were not selected. Decreasing the value by 1 is necessary because indices in CatBoost are counted from zero, as in many other languages.
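
A hedged sketch of how those zero-based indices are consumed on the CatBoost side: the ignored_features training parameter takes zero-based column indices, which is why the script subtracts 1 (the R wiring here is illustrative):

library(catboost)
ignored <- read.csv("E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Оставшиеся_предикторы.csv")$x
pool <- catboost.load_pool(data_without_excluded, label = target)
params <- list(loss_function = "Logloss",
               ignored_features = as.integer(ignored),  # columns CatBoost will skip
               iterations = 500)
model_cb <- catboost.train(pool, params = params)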

mytarmailS #:
Normalisation is easy - just apply scale(data) to the data before all the procedures. This normalises the matrix by columns.
#  Normalisation of the selected predictors
normalized_data <- scale(data_without_excluded)
data_without_excluded <- normalized_data

Is this acceptable?

In general, the result is identical to that without normalisation.

 
Aleksey Vyazmikin #:

For CatBoost you need to supply the list of predictors to exclude, i.e. those that were not selected. Decreasing the value by 1 is necessary because indices in CatBoost are counted from zero, as in many other languages.


Is this acceptable?

In general, the result is identical to that without normalisation.

Yes, it is acceptable.

So the problem is something else. I don't understand the code I highlighted, but without a computer in front of me I'm slow on the uptake - I'll have a look tomorrow....

I had coffee at 6.30pm, it's almost 3.40am and I'm lying here staring at the ceiling.
 
mytarmailS #:

Shit, I had coffee at 6.30pm, it's almost 3.40am now and I'm lying here staring at the ceiling.
It's not the coffee that's keeping you up, it's the ML - it's the disease.
 
mytarmailS #:
Yes, it is acceptable.

So the problem is something else. I don't understand the code I highlighted, but without a computer in front of me I'm slow on the uptake - I'll have a look tomorrow....

I had coffee at 6.30pm, it's almost 3.40am and I'm lying here staring at the ceiling.

So the coffee wasn't fake.
