How to build a statistical library for the next 15 years? - Articles, Library comments

Renat Fatkhullin 2016.11.29 18:21 #121

СанСаныч Фоменко:

Read it.

Looked into it.

I think the tests you presented are not quite correct. I consider it necessary to say so, because performance comparisons are far from the least important thing.

The point is that MQL is qualitatively different from R, and when comparing performance these qualitative differences should be taken into account where possible. R is an interpreter and MQL is a compiler. For industrial programs this qualitative difference works in MQL's favour.

Take a look at the source code of R (it is open source). All the basic mathematics there is written in C/C++ and compiled into the engine. Most packages are written in C++ too; otherwise you would never live to see the results of your calculations.

If R were interpreted even in its basic/system functions, it would lag behind MQL5 by a factor of 200-500. We deliberately tested R's system C/C++ functions rather than hand-written processing in loops (where R lags behind by hundreds of times).

R development is a constant search for "which package will save me from writing a for/while/foreach loop". In fact, there is only one way to do computations in R: hand any more or less heavy calculation off to third-party packages.

But there is another qualitative difference, also of great importance when programs are used in production, and the tests did not take it into account, which distorted the results.

The qualitative difference between R and MQL is that the elementary object in MQL is a scalar, from which more complex objects, such as vectors, are built. And it is vectors that are fed to the input of distribution functions.

Look in the /include/math/stat directory: there are hundreds of vector functions.

In R there is no concept of a scalar at all. The simplest object is a vector. R exploits this fact extensively, and our example comparing distribution functions is exactly where an R-specific programming technique, "vectorisation", which is not available in MQL, should have been used. Since this R-specific technique speeds up computations 10-100 times (depending on the size of the matrix), the R code should have used it. The case for vectorisation is obvious: in the tests we take an input vector and perform calculations over it 100 times, i.e. we are in effect working with a matrix whose columns are identical, though they could just as well be different.

There is no vectorisation and no modern features in R. The code there is just written head-on by ordinary junior programmers. Yes, they are decent mathematicians, but they are mediocre programmers.

GPU computing in R remains a fairy tale, apart from isolated attempts in a handful of rare packages.

To summarise: a test in R should be written in R using its own capabilities, especially those that have no analogues in MQL.

You simply do not know either R or MQL5.

You haven't looked at the sources of R, and you don't know the sources of MQL5. You have not spent the last 15 years building compilers. Yet you are trying to argue with those who have done it all.


Renat Fatkhullin 2016.11.29 18:27 #122

Currently, the MQL5 statistical library (excluding Alglib, Fuzzy) already has more than 461 functions: https://www.mql5.com/ru/forum/86386/page222#comment_3867386.

This already covers the basic statistical functions well.

If you read the article earlier, I recommend reading it again: yesterday a new version of the article was released with many new functions.


fxsaber 2016.11.29 18:51 #123

Renat Fatkhullin:

Currently, the MQL5 statistical library (excluding Alglib, Fuzzy) already has more than 461 functions: https://www.mql5.com/ru/forum/86386/page222#comment_3867386.

This already covers the basic statistical functions well.

I recommend that those who read the article earlier read it again: yesterday a new version of the article was released with many new functions.

I still haven't figured out how to send a direct message to Quantum. Please add something that may not even exist in R.

Namely, fast computation of the mean over an interval when the interval is shifted one step to the right. Likewise for the Pearson correlation coefficient.

Pearson is fairly expensive to compute head-on. But there are iterative methods: computing K[i] from K[i-1].
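The iterative approach described above can be sketched as follows, in C++ as a stand-in for MQL5 (the syntax is close). This is a minimal sketch, not a function from the MQL5 library; the names RollingPearson and rolling_stats are hypothetical. Each shift of the window adds the incoming element to a set of running sums and subtracts the outgoing one, so both the mean and the Pearson coefficient are updated in O(1) per step instead of O(w).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Running sums over a window of fixed length n for two series x and y.
struct RollingPearson {
    std::size_t n;  // window length
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    explicit RollingPearson(std::size_t window) : n(window) {}

    void add(double x, double y)    { sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y; }
    void remove(double x, double y) { sx -= x; sy -= y; sxx -= x * x; syy -= y * y; sxy -= x * y; }

    double mean_x() const { return sx / n; }

    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    double pearson() const {
        const double cov = n * sxy - sx * sy;
        const double vx  = n * sxx - sx * sx;
        const double vy  = n * syy - sy * sy;
        return cov / std::sqrt(vx * vy);
    }
};

// For every window [i, i + w) stores mean(x) in `means` and corr(x, y) in `corrs`.
void rolling_stats(const std::vector<double>& x, const std::vector<double>& y, std::size_t w,
                   std::vector<double>& means, std::vector<double>& corrs) {
    means.clear();
    corrs.clear();
    if (w == 0 || x.size() < w || y.size() != x.size()) return;
    RollingPearson rp(w);
    for (std::size_t i = 0; i < w; ++i) rp.add(x[i], y[i]);  // first window, computed in full
    means.push_back(rp.mean_x());
    corrs.push_back(rp.pearson());
    for (std::size_t i = w; i < x.size(); ++i) {
        rp.add(x[i], y[i]);             // element entering on the right
        rp.remove(x[i - w], y[i - w]);  // element leaving on the left
        means.push_back(rp.mean_x());
        corrs.push_back(rp.pearson());
    }
}
```

Plain running sums can lose precision through cancellation on long series with a large mean; Welford-style updates are more robust in that case. The coefficient is also undefined when either series is constant over the window, and the sketch does not guard against that.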

Funnily enough, this is the first time I've come across a Russian sentence with a comma after every word:

Add, please, a thing, that, perhaps, does not exist.


Renat Fatkhullin 2016.11.29 19:51 #124

Why don't you write the necessary function yourself?

Look at the full sources of functions in /include/math/stat and write the missing ones.

СанСаныч Фоменко 2016.11.29 19:56 #125

Renat Fatkhullin:

There is no vectorisation and no modern features in R. The code there is just written by ordinary junior programmers. Yes, they are decent mathematicians, but they are mediocre programmers.

You simply do not know either R or MQL5.

You haven't looked at R sources, and you don't know MQL5 sources. You haven't spent the last 15 years building compilers. Yet you are trying to argue with those who have done it all.

I have very modest knowledge of programming, but not to the extent you describe.

Anyway, I understand perfectly well that the internal implementation of R in C++ you refer to has nothing to do with the problem of measuring execution speed I raised. I am writing about the technique of writing code in R itself, and what is inside is what we measure.

So, about vectorisation.

In R, a line like this looks perfectly normal:

c <- a+b

It is always at least a vector computation; what exactly it does depends on the context, i.e. on what a and b are.

Furthermore,

c <- sqrt(a)

will give a vector c, each element of which is the square root of the corresponding element of vector a

In this case, a does not necessarily have to be a vector, it can be a more complex object, such as a matrix.

In MQL these are always loops.
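To make the contrast concrete, here is roughly what R's single `c <- a + b` expands to in a scalar-based language, written in C++ as a stand-in for MQL5. The function name add_recycled is hypothetical. The sketch also mimics R's recycling rule, under which the shorter operand is reused cyclically; R additionally warns when the longer length is not a multiple of the shorter one, which this sketch omits.

```cpp
#include <cstddef>
#include <vector>

// Elementwise a + b as an explicit loop, recycling the shorter operand as R does.
std::vector<double> add_recycled(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};  // R also gives a zero-length result here
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    std::vector<double> c(n);
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i % a.size()] + b[i % b.size()];  // wrap the index of the shorter vector
    return c;
}
```

For example, adding {1, 2, 3, 4} and {10, 20} gives {11, 22, 13, 24}, the same as `c(1, 2, 3, 4) + c(10, 20)` in R.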

Moreover, vectorisation in R covers not only the objects themselves but also:

a procedural toolkit in the form of the specific apply functions and their variants;
the use of the Intel MKL library to perform all these vectorised operations.

Now, back to the point of what I wrote in my previous post.

I am not writing about the quality of the C++ functions' implementation at all. Like you, I propose measuring them as they are, but by using the R language tools that are specifically intended for vectorised operations.

For example.

For all your tests, form a matrix M with 100 columns (as in your tests), where each column models a quote series.

Then in R the minimum over all columns looks like this:

apply(M, 2, min)

The result will be a vector that contains the minimum of each column
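For comparison, here is a C++ sketch (again as a stand-in for MQL5) of what apply(M, 2, min) computes: one pass per column. It assumes the matrix is stored column-major in a flat array, which is how R lays out matrices in memory; the name col_min is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimum of each column of an nrow x ncol matrix stored column-major in `m`.
// Empty or inconsistent input yields an empty result; that edge-case
// behaviour is a choice of this sketch, not R's.
std::vector<double> col_min(const std::vector<double>& m, std::size_t nrow, std::size_t ncol) {
    std::vector<double> result;
    if (nrow == 0 || m.size() != nrow * ncol) return result;
    result.reserve(ncol);
    for (std::size_t j = 0; j < ncol; ++j) {
        // Column j occupies the contiguous block m[j*nrow, (j+1)*nrow).
        const auto first = m.begin() + static_cast<std::ptrdiff_t>(j * nrow);
        const auto last = first + static_cast<std::ptrdiff_t>(nrow);
        result.push_back(*std::min_element(first, last));
    }
    return result;
}
```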

Following this pattern, we need to measure the speed of all the distribution functions wrapped in the appropriate apply. There are many of them and they differ. There are no analogues in MQL.
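A minimal C++ sketch of the kind of measurement proposed here, with the normal density standing in for "a distribution function": it applies the density to every column of a column-major matrix, which yields the same values as apply(M, 2, dnorm) in R (or simply dnorm(M), since dnorm is itself vectorised), and times the loop with std::chrono. The names normal_pdf and pdf_by_column are hypothetical, and a single timed run like this is only indicative; a fair benchmark would repeat the measurement several times.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Normal density with mean mu and standard deviation sigma (sigma > 0).
double normal_pdf(double x, double mu, double sigma) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// Applies the standard normal density column by column to an nrow x ncol
// column-major matrix and stores the wall-clock time of the loop in elapsed_ms.
std::vector<double> pdf_by_column(const std::vector<double>& m, std::size_t nrow,
                                  std::size_t ncol, double& elapsed_ms) {
    elapsed_ms = 0.0;
    if (m.size() != nrow * ncol) return {};
    std::vector<double> out(m.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t j = 0; j < ncol; ++j)
        for (std::size_t i = 0; i < nrow; ++i)
            out[j * nrow + i] = normal_pdf(m[j * nrow + i], 0.0, 1.0);
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return out;
}
```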

At the same time, make sure that the Intel MKL library is installed together with R.

Intel® Math Kernel Library (Intel® MKL) | Intel® Software


СанСаныч Фоменко 2016.11.29 20:08 #126

Renat Fatkhullin:

Why don't you write the necessary function yourself?

Look at the full function sources in /include/math/stat and write the missing ones.

Interesting idea.

Maybe you could find someone to port packages. Splines, for example. There is top-quality work there, the real deal.

Renat Fatkhullin 2016.11.29 20:19 #127

СанСаныч Фоменко:

I have very modest knowledge of programming, but not to the extent you describe.

In any case, I understand perfectly well that the internal C++ implementation of R you refer to has nothing to do with the problem of measuring execution speed I have raised. I am writing about the technique of writing code in R itself, and what is inside is what we measure.

Unfortunately, you had no idea that everything inside R is in C/C++. You clearly thought that R was interpreted even in its system functions.

So, about vectorisation.

In R, a line like this looks perfectly normal:

c <- a+b

It's always at least a vector computation; it depends on the context, i.e. on what a and b are.

Furthermore,

c <- sqrt(a)

will give a vector c, each element of which is the square root of the corresponding element of vector a

In this case, a does not necessarily have to be a vector, it can be a more complex object, such as a matrix.

In MQL these are always loops.

We have shown how to make loops fast, and in pure MQL5 source code at that, without using C++.

And we will beat R even on the simplest vectorised sqrt. Here are two standard functions from the library that are full analogues of R's:

bool MathSqrt(double &array[]) // the result is put into the same vector
bool MathSqrt(const double &array[],double &result[]) // result into a separate vector

Compare.
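For readers not familiar with MQL5, here is a C++ sketch of what the two overloads above do, with std::vector standing in for MQL5 dynamic arrays. It mirrors the signatures quoted in the post but is not the library source: returning false for an empty input is an assumption of this sketch, and the real MQL5 return-value semantics may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// In-place variant: each element is replaced by its square root.
bool MathSqrt(std::vector<double>& array) {
    if (array.empty()) return false;  // assumption: empty input reported as failure
    for (double& v : array) v = std::sqrt(v);
    return true;
}

// Out-of-place variant: the input is untouched, results go to a separate vector.
bool MathSqrt(const std::vector<double>& array, std::vector<double>& result) {
    if (array.empty()) return false;  // same assumption as above
    result.resize(array.size());
    for (std::size_t i = 0; i < array.size(); ++i) result[i] = std::sqrt(array[i]);
    return true;
}
```

The one-argument form avoids allocating a second buffer; the two-argument form keeps the source data intact, which matters when the same input is reused across several calculations.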

You haven't quite grasped yet that these 461 functions of the standard MQL5 maths library cover a huge range of basic mathematical operations.

Moreover, vectorisation in R covers not only the objects themselves but also:

a procedural toolkit in the form of the specific apply functions and their variants;
the use of the Intel MKL library to perform all these vectorised operations.

Yes, yes. Theoretically.

Yet 99% of the operations you actually perform go through the simplest functions, with no chance of acceleration.

In MQL5, OpenCL is standard and you can accelerate anything without third-party libraries. And plain MQL5 already gives results on par with C++.

But in R, the only option is to look for a package to accelerate each loop. Yes, literally every loop that has any significant number of iterations.

Now, back to the point of what I wrote in my previous post.

You lack deep knowledge and rely on superficial reasoning.


Renat Fatkhullin 2016.11.29 20:26 #128

Few people realise this, but when MKL is used there is likely a huge overhead in moving R's input data into the plain arrays that MKL works on, and then moving the results back into R's internal data representation.

I haven't dug into this, but logically that's how it looks, which means that supporting MKL comes at a serious cost.

In MQL5 there are no such losses at all, of course. Only with OpenCL do you need to copy data, and there it is a simple flat memcpy.


fxsaber 2016.11.29 20:32 #129

Renat Fatkhullin:

Why don't you write the necessary function yourself?

I did write it once, but I never packaged it as a math function.

Look at the full sources of functions in /include/math/stat and write the missing ones.

The point is to get them into the standard library, polished both scientifically and programmatically, the way Quantum does it.

Most likely a performance comparison with your solution will be needed. Then, I think, it will be possible to make the case for adding this reinvented wheel to the math library. I haven't seen it in math packages myself (I can't speak for R).


Renat Fatkhullin 2016.11.29 20:37 #130

Another little secret: why MQL5 is so fast, especially when the libraries are supplied entirely as source code.

Our compiler performs such deep optimisation and can eliminate so many checks and conditions that functions disappear completely and loops are simplified to the extreme. Of course, this applies only to the x64 version.

Unlike other systems, which rely on libraries/packages (where even a call cannot be optimised), the MQL5 compiler almost always works with the full source code and always performs global optimisation to the maximum depth. This gives amazing results.

That is why it is important for us to ship all standard libraries as source code. We know that in the end everything will be optimised so thoroughly that you can beat almost anyone on speed. Even the overhead of a managed language then matters much less.


Discussion of article "Statistical Distributions in MQL5 - taking the best of R" - page 13