Machine learning in trading: theory, models, practice and algo-trading - page 2807

 
mytarmailS #:

.

Your script consumes almost 9 gigabytes of RAM on my sample, but it seems to work - the files are saved. I don't even know where all that memory goes, given that the sample itself takes a little over a gigabyte.

 
mytarmailS #:

.

I also found out that the table headers (column names) are saved in quotes - how do I switch this off?

 

What does this code even do? To make it faster, you should convert all columns to the same data type (float32; float16 is not needed, it would be slower) and calculate the coRRelation through fast arrays,

if we are talking about really fixing the kaRma.
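For illustration only (not part of the original post), a minimal sketch of what "fast arrays" could mean here, assuming the sample is already loaded into an all-numeric data frame dt; the 0.9 threshold is taken from the discussion further down, everything else is a hypothetical name:

## convert the data frame to a single-type matrix and use one cor() call
m <- as.matrix(dt)                     # assumes all columns are numeric
storage.mode(m) <- "double"            # one storage type for the whole matrix

cm <- cor(m)                           # full correlation matrix in compiled code
cm[upper.tri(cm, diag = TRUE)] <- NA   # keep each pair only once

## column pairs with |correlation| above 0.9
idx <- which(abs(cm) > 0.9, arr.ind = TRUE)
pairs <- data.frame(col1 = colnames(m)[idx[, 1]],
                    col2 = colnames(m)[idx[, 2]],
                    cor  = cm[idx])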

 
Aleksey Vyazmikin #:

Your script consumes almost 9 gigabytes of RAM on my sample, but it seems to work - the files are saved. I don't even know where all that memory goes, given that the sample itself takes a little over a gigabyte.

So what?

R is bad, I guess.)

Aleksey Vyazmikin #:

I also found a problem - the table headers (column names) are saved in quotes - how do I switch this off?

What did you do to solve the problem?

 
mytarmailS #:

So what?

R is bad, I guess.)

What did you do to solve the problem?

Bad/good is too categorical a judgement.

It's obvious that either the package code is not memory-efficient (though it may well be fast), or the script copies the whole table/sample many times - see the sketch after this post for one way to check that.

And what did I do? I found the problem and reported it to a professional, hoping for help.
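A quick way to test the "many copies" hypothesis (my illustration, not from the original posts) is base R's tracemem(), which prints a message every time an object is duplicated; the data frame here is just a stand-in:

## base R copies a data.frame on modification (copy-on-modify)
df <- data.frame(a = runif(1e6), b = runif(1e6))
tracemem(df)             # report every duplication of df
df$a <- df$a * 2         # prints a tracemem[...] line: the table was copied

## data.table updates columns by reference, without copying
library(data.table)
DT <- as.data.table(df)
tracemem(DT)
DT[, a := a * 2]         # no tracemem output: modified in place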

 
Maxim Dmitrievsky #:

What does this code even do? To make it faster, you should convert all columns to the same data type (float32; float16 is not needed, it would be slower) and calculate the coRRelation through fast arrays,

if we are talking about really fixing the kaRma.

As far as I understand, R has no concept of different data types (int, float, etc.) at all. And even if it did, that would reduce memory usage but would hardly affect the speed. On video cards - yes, there would be a gain.

 
Aleksey Vyazmikin #:

As far as I understand, R has no concept of different data types (int, float, etc.) at all. And even if it did, that would reduce memory usage but would hardly affect the speed. On video cards - yes, there would be a gain.

Everything is there, and it affects the speed dramatically: dataframes are the slowest beasts, with the biggest overhead.

It's not about video cards, it's about understanding that nobody in their right mind computes such things through dataframes.
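For reference (my illustration, not part of the original post), base R does distinguish storage types, and both memory use and matrix-based computation reflect it:

x_int <- 1:1e6                  # integer vector, 4 bytes per element
x_dbl <- as.numeric(x_int)      # double vector, 8 bytes per element
typeof(x_int); typeof(x_dbl)    # "integer"  "double"
object.size(x_int)              # about 4 MB
object.size(x_dbl)              # about 8 MB

## the same correlation screening on a plain double matrix instead of a
## data.frame avoids the per-column coercion and indexing overhead
m <- matrix(runif(1e6), ncol = 100)
system.time(cor(m))             # a single call into compiled code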

 
Aleksey Vyazmikin #:

A hint: is it really necessary to use vectors of 100,000 observations to see the correlation between them?

I am looking for highly correlated vectors, i.e. with correlation greater than 0.9.
 
You're welcome
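As an aside (my sketch, not from the thread): if the goal is only to flag pairs with correlation above 0.9, a rough pre-filter on a random subsample is usually enough, with the survivors re-checked on the full data. The names, the subsample size and the looser 0.85 cut-off below are assumptions:

set.seed(1)
sub <- dt[sample(nrow(dt), 1e4), ]        # 10 000 random rows instead of 100 000
cm_sub <- cor(as.matrix(sub))

## candidate pairs from the subsample (looser 0.85 cut-off to be safe),
## to be re-checked on the full sample before dropping columns
cand <- which(abs(cm_sub) > 0.85 & upper.tri(cm_sub), arr.ind = TRUE)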
 
Aleksey Vyazmikin #:

Your script has been running for more than a day and has not yet created a single file with the screening results. I don't know - maybe it's time to stop it?

It depends on the hardware and the sample size. If your processor is multi-core, parallelise the execution. Below is a parallel variant.

##---- parallel --------------------------
## dt, cor.test.range, get.findCor() and patch come from the earlier (serial) script
library("doFuture")
registerDoFuture()
plan(multisession)   # one worker session per available core
require(foreach)

## screen the columns for every correlation threshold in parallel
bench::bench_time(
  foreach(i = 1:length(cor.test.range)) %dopar% {
    get.findCor(dt, cor.coef = cor.test.range[i])
  } -> res
)
#  process     real
# 140.62 ms    2.95 m

## write each result to its own CSV file
bench::bench_time(
  for (i in 1:length(cor.test.range)) {
    paste0("train1_", cor.test.range[i] * 10, ".csv") %>%
      paste0(patch, .) %>% fwrite(res[[i]], .)
  }
)
#  process    real
#   156 ms    157 ms

Roughly four times faster than the serial version. Hardware and software:

sessionInfo()
#  AMD FX-8370 Eight-Core Processor
#  R version 4.1.3 (2022-03-10)
#  Platform: x86_64-w64-mingw32/x64 (64-bit)
#  Running under: Windows 10 x64 (build 19044)
#
#  Matrix products: default
#
#  locale:
#     [1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251    LC_MONETARY=Russian_Russia.1251
# [4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251
#
#  attached base packages:
#     [1] stats     graphics  grDevices utils     datasets  methods   base
#
#  other attached packages:
#     [1] doFuture_0.12.2 future_1.28.0   foreach_1.5.2   fstcore_0.9.12  tidyft_0.4.5
#
#  loaded via a namespace (and not attached):
#     [1] Rcpp_1.0.9        codetools_0.2-18  listenv_0.8.0     digest_0.6.30     parallelly_1.32.1 magrittr_2.0.3
# [7] bench_1.1.2       stringi_1.7.8     data.table_1.14.4 fst_0.9.8         iterators_1.0.14  tools_4.1.3
# [13] stringr_1.4.1     import_1.3.0.9003 parallel_4.1.3    compiler_4.1.3    globals_0.16.1

Good luck
