Machine learning in trading: theory, models, practice and algo-trading - page 2808

It's all there. The speed will be hit catastrophically: dataframes are the slowest beasts, with big overhead.
It's not about video cards, it's about understanding that nobody in their right mind computes such things through dataframes.
What do you mean by "dataframes"? Explain it to someone who doesn't know this language.
A tip: is it really necessary to use vectors of 100,000 observations to see the correlation between them?
I am looking for highly correlated vectors, i.e. with correlation greater than 0.9. I don't know whether it is necessary or not - one has to experiment. The sample is not stationary: for half of the sample there was no correlation, and then, bang, it appeared.
Besides, I've tried all the coefficients in steps of 0.1.
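For illustration only (the matrix x, its size and the 0.9 threshold below are placeholders, not your data): base cor() builds the whole correlation matrix in one call, and the highly correlated pairs can then be pulled out of it without looping over a dataframe.

```r
# Minimal sketch: find column pairs with |correlation| > 0.9.
# 'x' and its dimensions are placeholders for the real data.
set.seed(1)
x <- matrix(rnorm(100000 * 50), nrow = 100000)   # 50 columns of 100,000 observations

cm <- cor(x)                                 # full correlation matrix in one call
cm[lower.tri(cm, diag = TRUE)] <- NA         # keep each pair once, drop the diagonal

idx <- which(abs(cm) > 0.9, arr.ind = TRUE)  # indices of highly correlated pairs
pairs <- data.frame(col_i = idx[, "row"], col_j = idx[, "col"], corr = cm[idx])
```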
You're welcome.
Is this a cry from the heart?
It depends on the hardware and the sample size. If the processor is multi-core, parallelise the execution - below is a variant of parallel execution.
It is about 4 times faster than serial execution, depending on the hardware and software.
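The code itself is not reproduced in this quote, so here is only a sketch of one possible parallel variant using the base parallel package; the placeholder matrix x and the column-chunking scheme are assumptions, not the author's actual script.

```r
# Sketch of a parallel variant: split the columns into chunks and let each
# worker correlate its chunk against all columns. 'x' is placeholder data.
library(parallel)

x <- matrix(rnorm(10000 * 40), ncol = 40)

n_cores <- max(1, detectCores() - 1)          # leave one core for the system
cl <- makeCluster(n_cores)                    # PSOCK cluster, works on Windows too

chunks <- split(seq_len(ncol(x)), cut(seq_len(ncol(x)), n_cores))

res <- parLapply(cl, chunks,
                 function(cols, m) cor(m[, cols, drop = FALSE], m), m = x)
stopCluster(cl)

cm <- do.call(rbind, res)                     # reassemble the full correlation matrix
```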
Good luck
So parallelism will not increase RAM consumption?
Although mytarmailS's code is more RAM-hungry, it is 50 times faster. Maybe there are some limitations in the libraries you use - the script ran for more than 30 hours and did not create a single file.
Thanks, but the code examples are rather complicated for me - in R I am just a consumer, and I can't figure out what to change in the main script.
Do you mean that for each data type there should be a method to calculate corr?
matrix is a data type built into R; presumably it has something of its own like matrix.corr(), the same way a vector does.
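For reference, base R has no matrix.corr() method as such, but plain cor() plays that role: given a whole numeric matrix it returns the column-by-column correlation matrix, computed in compiled code with no dataframe overhead.

```r
m <- matrix(rnorm(1000 * 5), ncol = 5)   # 5 columns of 1,000 observations
cor(m)                                   # 5 x 5 matrix of pairwise column correlations
cor(m[, 1], m[, 2])                      # correlation of a single pair, same function
```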
What do you mean by "dataframes"? Explain it to someone who doesn't know this language.
That was rather a message to those who write in R :) Dataframes are tables for convenient display of data and for typical manipulations with it, such as extracting subsamples (as in SQL).
They are not designed to be hammered in loops over data as large as yours - that will be 20-100 times slower than plain arrays. As for memory, you have already seen it for yourself.
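A rough way to see that overhead for yourself (exact numbers depend on the machine): element-by-element access in a loop on a data.frame dispatches through the [.data.frame method and is far slower than the same access on a matrix.

```r
# Rough comparison of element access in a loop: matrix vs. data.frame.
n  <- 100000
m  <- matrix(rnorm(n * 10), ncol = 10)
df <- as.data.frame(m)

system.time(for (i in 1:10000) v <- m[i, 3])    # plain matrix indexing
system.time(for (i in 1:10000) v <- df[i, 3])   # dispatches to [.data.frame, far slower
```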
I think it's fine here:
I don't know how fast the built-in "matrix" type is, but it uses caret, which can also slow things down. And the built-in type seems to have no vectorized operation for calculating correlation, or something like that.
Where do these thoughts come from?
Why slow down a built-in type with some third-party lib, when the type should have its own corr calculation that is as fast as possible for it?
Doesn't the lib take the data type into account? A data type exists precisely so that calculations on it are as cheap as possible, and a matrix, of all things, should be designed for computation.
How do you get smarter about the future without getting dumber about the past? Algorithmically... without creating terabytes of knowledge.
You don't.
Doesn't the lib take the data type into account? A data type exists precisely so that calculations on it are as cheap as possible, and a matrix, of all things, should be designed for computation.
I have not found an analogue of numpy for R; the matrices there are not that fast, and R itself consumes a lot of memory because of its paradigm.
Of course a third-party lib can be slow - who has ever checked it?
I don't know what to compare it with, and I don't want to load a gigabyte dataset just to compare the speed.