Machine learning in trading: theory, models, practice and algo-trading - page 2425

 
Maxim Dmitrievsky:
Well, you can take 5-15 increments; as features they will be no worse than indicators.

A mathematical expectation of over 30 points on increments alone - where can we observe that? Training on a 2014-2018 sample and working in 2020 - where is that with those pre-transformations?

Maxim Dmitrievsky:
Or weed out all the predictors by correlation first (a matter of seconds) and then take the remaining 5-15 (if you even get that many).

That's how econometrics saves you time.

Do you want to try to do better? I'll throw you a sample; it's not very big.

 
Maxim Dmitrievsky:

I've been thinking about dying strategies...

What if I forecast the market's characteristics far ahead? Then I reconstruct a series with the forecasted characteristics, train on it, and then trade the real market with this model... have you tried thinking in this direction?


For example, to predict the spectrum of the market...

Like "we can't know the future, but we can imagine it".
 
Aleksey Vyazmikin:

So I did the first phase of research.

How much energy did it take....

 
mytarmailS:

How much energy did it take....

This is a gift to all the skeptics.

 
Maxim Dmitrievsky:

Or weed out all the predictors by correlation first (a matter of seconds) and then take the remaining 5-15 (if you even get that many).

I'll check your idea, though - it's not hard for me. But which correlation coefficient should I take? And how do I select the 5-15 from the remaining predictors? Write it out specifically: how should I measure and rank them?
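
One simple way to rank the survivors (my own sketch, not Maxim's prescription; reduced_Data and target are placeholder names) is to order them by absolute correlation with the target and keep the top N:

rank.by.target.cor <- function(reduced_Data, target, top.n = 15){
  # score each surviving predictor by |correlation with the target|
  scores <- sapply(reduced_Data, function(col) abs(cor(col, target)))
  keep   <- names(sort(scores, decreasing = TRUE))[1:min(top.n, length(scores))]
  reduced_Data[, keep]
}

# e.g. take the 5-15 best of what the correlation filter left:
# top15 <- rank.by.target.cor(reduced_Data, target = df1$Target_100, top.n = 15)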

 
mytarmailS:

How much energy did it take....

Instead of counting other people's money, could you give me a hint on R?

Here is a script I made that should calculate the correlation matrix and remove the correlated columns.

library('caret')

df1 = read.csv("F:\\FX\\Открытие Брокер_Demo\\MQL5\\Files\\Proboy_236_TP_4_SL_4\\Si\\Setup\\train.csv", header = TRUE, sep = ";", dec = ".")

df1 <- df1[, ! colnames(df1) %in%
           c("Target_100_Buy",
             "Target_100_Sell",
             "Target_P",
             "Time",
             "Target_100")] # remove the columns we do not need
print(df1)
df2 = cor(df1)
hc = findCorrelation(df2, cutoff = 0.3) # put any value as a "cutoff"
hc = sort(hc)
reduced_Data = df1[, -c(hc)]
print(reduced_Data)
write.table(reduced_Data, file = "F:\\FX\\Открытие Брокер_Demo\\MQL5\\Files\\Proboy_236_TP_4_SL_4\\Si\\Setup\\outfile_03.csv",
            append = FALSE, quote = FALSE, sep = ";",
            eol = "\n", na = "NA", dec = ".", row.names = FALSE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")

And there are two questions:

1. How do I run this code in a for loop? On each pass I need to increase the coefficient and change the file name so it is saved with the coefficient in it, or save to another directory generated in the loop.

2. I remove the auxiliary columns before the calculation; how do I copy them back into the resulting table (reduced_Data) after removing the correlated columns?

Thanks for your reply.

 
Aleksey Vyazmikin:

Instead of counting other people's money, could you give me a hint on R?

Here is a script I made that should calculate the correlation matrix and remove the correlated columns.

And there are two questions:

1. How do I run this code in a for loop? On each pass I need to increase the coefficient and change the file name so it is saved with the coefficient in it, or save to another directory generated in the loop.

2. I remove the auxiliary columns before the calculation; how do I copy them back into the resulting table (reduced_Data) after removing the correlated columns?

Thank you for your reply.

Answer to question (2):

library('caret')
# df1 is the loaded data
df1 <- as.data.frame(matrix(nrow = 100, ncol = 10, data = sample(1:10, 1000, replace = T)))
# head(df1)

not <- c("V1","V2","V3") # names of the variables we do NOT want in the correlation

df2 <- cor( df1[, ! colnames(df1) %in% not] )
# head(df2)

not.need <- findCorrelation(df2, cutoff = 0.1) # put any value as a "cutoff"
not.need.nms <- colnames(df2)[not.need] # get the names of the variables that failed the correlation test
# head(not.need.nms)

# get the original df1, only without the features that failed the selection
reduced_Data <- df1[, ! colnames(df1) %in% not.need.nms]
# head(reduced_Data)


Now we can write a function that does the same thing but looks neater:

get.findCorrelation <- function(data, not.used.colums, cor.coef){
  library('caret')
  df2 <- cor( data[, ! colnames(data) %in% not.used.colums] )
  not.need <- findCorrelation(df2, cutoff = cor.coef)
  not.need.nms <- colnames(df2)[not.need] # names of the variables that failed the correlation test
  reduced_Data <- data[, ! colnames(data) %in% not.need.nms]
  return(reduced_Data)
}

gf <- get.findCorrelation(data = df1 , 
                          not.used.colums = c("V1","V2","V3"),
                          cor.coef = 0.1)

The data is fed in as the input:

data = df1

then we specify the columns that will not be used for the correlation analysis:

not.used.colums = c("V1","V2","V3")

and the correlation cutoff for the findCorrelation function:

cor.coef = 0.1
The output is df1, but without the junk features.
 

Now the answer to the first question

way <- "F:\\FX\\Открытие Брокер_Demo\\MQL5\\Files\\Proboy_236_TP_4_SL_4\\Si\\Setup\\"

cor.test.range <- seq(from = 0.1,to = 0.7,by = 0.1)  #  диапазон перебора в коеф корр

for(i in 1:length(cor.test.range)){
  
  reduced_Data <- get.findCorrelation(data = df1 , 
                      not.used.colums = c("V1","V2","V3"),
                      cor.coef = cor.test.range[i] )
  
  file.name <- paste0("test.with.cor_" , cor.test.range[i] , ".csv")
  final.way <- paste0(way , file.name)
  
  
  write.csv2(x = reduced_Data,file = final.way,row.names = F)  #  возможно это лучше
  
  #  write.table(reduced_Data, file = final.way,
  #              append = FALSE, quote = FALSE, sep=";",
  #              eol = "\n", na = "NA", dec = ".", row.names = FALSE,
  #              col.names = TRUE, qmethod = c("escape", "double"),
  #              fileEncoding = "")
}
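
The "another directory generated in the loop" part of question 1 can be handled with dir.create - a sketch, reusing the same way, df1 and cor.test.range as above:

for(i in 1:length(cor.test.range)){

  reduced_Data <- get.findCorrelation(data = df1,
                      not.used.colums = c("V1","V2","V3"),
                      cor.coef = cor.test.range[i])

  sub.way <- paste0(way, "cor_", cor.test.range[i], "\\")
  dir.create(sub.way, showWarnings = FALSE) # create the folder if it does not exist

  write.csv2(x = reduced_Data,
             file = paste0(sub.way, "reduced.csv"),
             row.names = F)
}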
 
Aleksey Vyazmikin:

Then I trained with a fixed quantum table setting on a sample split of train - 60%, test - 20%, exam - 20%, with...

Doesn't it seem to you that you're tuning your model to the most successful variant on the test?
I have several times hit successful test sections myself and thought: here it is, the Grail ))). And after shifting a few months forward or backward everything became clear: both the model was wrong and the predictors were wrong, and I was losing on those sections.

I switched completely to analyzing models on cross-validation or walk-forward validation. At best I saw 50%.
Doc mentioned cross-validation in one of his last posts, too.
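
For reference, caret (already used above) has createTimeSlices, which builds exactly these walk-forward splits - a minimal sketch, assuming 1000 bars, a 600-bar training window and a 100-bar forward window:

library('caret')
slices <- createTimeSlices(1:1000,
                           initialWindow = 600,   # length of each training window
                           horizon       = 100,   # length of the forward (test) window
                           fixedWindow   = TRUE,  # slide the window instead of growing it
                           skip          = 99)    # step between consecutive folds
# slices$train[[1]] holds the indices of the first training window,
# slices$test[[1]]  the forward indices that follow it.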

 
YURY_PROFIT:

I have a very good machine-learning Expert Advisor from Maxim, the programmer I quarreled with here. In backtests it was on fire; in forward tests it worked for a month and a half with decent profit statistics; now it is dumping without even pausing))

YURY_PROFIT:

Please provide me an example of a machine learning EA that worked in the real market for at least 3 months in profit without retraining.

So what's the problem with continuous retraining? What is this in-a-vacuum condition of "without retraining"? If the forward works for even a day, that is already a grail, and you could even retrain on every tick.
