Machine learning in trading: theory, models, practice and algo-trading - page 211

 
Renat Fatkhullin:

1) Unfortunately, you phrased the question incompletely and received a brief, polite and not well thought-through "it doesn't matter" answer.

You were hoping for an "it is a convention" answer and built it into the question itself. But Duncan sidestepped "what is correct" the first time and did the same the second time.

2) You did not get confirmation that R is accurate, and you did not get an answer as to why the result differs in other packages. Working through the question "why is the answer different in other packages" is more important and could shed light on the topic.


3) Our position:

The expression for dgamma

$$f(x) = \frac{1}{s^a\,\Gamma(a)}\, x^{a-1} e^{-x/s}$$

for x ≥ 0, a > 0 and s > 0

is undefined at the point 0.

R takes the view that this point can be included in the calculation, using the limit value even when it is infinite, as in the case of dgamma(0, 0.5, 1).
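To make this convention concrete, here is a minimal Python sketch that returns the right-hand limit of the formula above at x = 0, which as far as I can tell is what R reports. The function name `dgamma` and its `scale` parameter are chosen for illustration only; note that in R the third positional argument of `dgamma` is `rate`, not `scale`, which makes no difference for dgamma(0, 0.5, 1) because both equal 1.

```python
import math


def dgamma(x: float, shape: float, scale: float = 1.0) -> float:
    """Gamma density f(x) = x^(a-1) * exp(-x/s) / (s^a * Gamma(a)).

    Hypothetical re-implementation for illustration. At x == 0 the
    formula contains 0^(a-1), so this sketch returns the right-hand limit:
    +inf for a < 1, 1/s for a == 1 and 0 for a > 1.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x < 0:
        return 0.0
    if x == 0:
        if shape < 1:
            return math.inf
        if shape == 1:
            return 1.0 / scale
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (scale ** shape * math.gamma(shape))


for a in (0.5, 1.0, 2.0):
    print(a, dgamma(0.0, a))  # inf, 1.0, 0.0 respectively
```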

However, if probabilities are computed with an infinite value at zero, then all integrals of dgamma formally become infinite, and by this logic pgamma should be infinite for all values of x.

This contradicts the output of pgamma, where all values are finite. And they are correct, as if the density at x = 0 were taken to be 0.

1) Yes, I didn't get a detailed answer, although I did sum it up... I'm not imposing my opinion; to be honest, I'm tired of arguing too. I will point out that what this person said almost verbatim matched our original message. How the density is defined at the boundary point does not matter, as long as the integrals are computed correctly:

We state that, strictly speaking, the density of the gamma distribution at zero is undefined. And if the right-hand limit is taken, the density equals one.

In light of this, we think that the wording "calculation errors in R" is not correct. More precisely, it is a matter of convention: what value is assigned to the expression zero to the power of zero. Setting the gamma density to zero at the point zero does not seem to be an established convention.

2) There was no talk of accuracy at all on my part. The density at zero is not about precision; it is about what the function returns there: non-convergence (NaN), the limit, or zero. The main point is that it does not matter for calculating the integral.

3) I reread the revised text of the article, and I'm glad you decided not to treat the behavior of dgamma as an error.

But this:

all integrals from dgamma formally become infinite and by this logic pgamma should be equal to infinity for all values of x.

That's strange, Renat.

pgamma cannot be infinite in principle, since the integral is bounded above by 1.

Take the normal distribution. It is defined on [-inf, +inf], and the integral of its density over this entire interval equals 1. Somehow, integrating (summing) the density over an infinitely large support does not produce an infinite sum, even though the density is nonzero in every region of the support.

And for dgamma, the point x == 0 with density == inf (by the way, you did not address at all the case where the density tends to 1 at this point, or what conclusions about integration you draw from it): how often does it occur? I'll tell you: never. In any continuous distribution the probability that a random variable takes any particular value is 0; all statisticians know this. The density describes the probability of an infinitesimally small neighborhood of x per unit of its length.

It follows that no matter how large the density is at the boundary point, its contribution to the total integral is 0. Think about it...
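For the dgamma(x, 0.5, 1) case discussed here this can be checked directly: the substitution t = u² removes the singularity at zero and, using Γ(1/2) = √π, gives a closed form.

$$\int_0^x \frac{t^{-1/2}e^{-t}}{\Gamma(1/2)}\,dt \;=\; \frac{2}{\sqrt{\pi}}\int_0^{\sqrt{x}} e^{-u^2}\,du \;=\; \operatorname{erf}\!\left(\sqrt{x}\right)$$

The result is finite for every x, tends to 1 as x → ∞, and the value assigned to the density at the single point t = 0 never enters it. For x = 0.8 it gives erf(√0.8) ≈ 0.7940968, the pgamma(0.8, 0.5, 1) value.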

I think you're overthinking it. ) But I'm not going to argue and dig into it further. Maybe someday I'll figure it out and answer it myself instead of Duncan. )

Thank you.

 

R is an amazing system, which personally opened my eyes to how far we in MetaTrader/MQL were from the real need "to make complex calculations simply and right now".

We (C++ developers) have in our blood the approach "you can do everything yourself, and we give you the low-level base and the speed of calculations". We are fanatical about performance and we are good at it - MQL5 is great on 64 bits.

When I personally sat down to work with R, I realized that I needed as many powerful one-line functions as possible and the ability to do research in general.

So we have made a sharp turn and started upgrading MetaTrader 5:

  • included the previously rewritten Alglib and Fuzzy math libraries in the standard delivery and covered them with unit tests
  • developed analogs of R's statistical functions and covered them with tests
  • developed the first beta version of the Graphics library as an analog of plot in R, with single-line functions for fast output
  • started changing the interfaces of the terminal output windows so that tabular data can be handled: changed the output direction, added the option to hide unnecessary columns, and switched the Expert Advisor output window to a monospaced font
  • added a powerful ArrayPrint function for automatic printing of arrays, including arrays of structures
  • added FileLoad and FileSave functions for quickly saving arrays to disk and loading them back


Of course, we are only at the beginning of the road, but the right direction is already clear.

 

7 integration steps are certainly not enough. Here are 1000:

> pgamma(0.8, 0.5, 1)
[1] 0.7940968

# and now a reinvent-the-wheel integration:
> integration_steps <- seq(0, 0.8, length.out=1001)
> integration_result <- 0
> for(i in 2:length(integration_steps)){
+ integration_result <- integration_result + dgamma(integration_steps[i], 0.5, 1) * (integration_steps[i] - integration_steps[i-1])
+ }
> integration_result
[1] 0.7709089
# error ~0.02, but the method could not be any simpler, so it will do :). The infinity at x=0 does not get in the way.
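The same experiment is easy to reproduce outside R. In this minimal Python sketch (helper names are hypothetical), `right_riemann` ports the R loop above, which starts at i = 2 and therefore evaluates dgamma only at right endpoints, never at x = 0; `midpoint` is a swapped-in alternative, the midpoint rule, added for comparison. The reference value uses the closed form erf(√x), which holds for shape 0.5 and scale 1.

```python
import math

SHAPE, SCALE = 0.5, 1.0  # dgamma(x, 0.5, 1): rate = scale = 1


def density(x: float) -> float:
    # gamma density for x > 0; the schemes below never evaluate it at x = 0
    return x ** (SHAPE - 1) * math.exp(-x / SCALE) / (SCALE ** SHAPE * math.gamma(SHAPE))


def right_riemann(f, lo: float, hi: float, steps: int) -> float:
    # same scheme as the R loop: f at the right end of each sub-interval
    h = (hi - lo) / steps
    return sum(f(lo + i * h) * h for i in range(1, steps + 1))


def midpoint(f, lo: float, hi: float, steps: int) -> float:
    # midpoint rule: f at the centre of each sub-interval, which also avoids x = 0
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) * h for i in range(steps))


# closed form for shape 0.5 and scale 1: pgamma(x, 0.5, 1) = erf(sqrt(x))
exact = math.erf(math.sqrt(0.8))

for n in (1000, 100_000):
    r, m = right_riemann(density, 0.0, 0.8, n), midpoint(density, 0.0, 0.8, n)
    print(n, round(r, 7), round(m, 7), round(exact, 7))
```

Both schemes converge slowly, with an error roughly proportional to √h because of the x^(-1/2) behavior near zero, but neither ever becomes infinite: the singular point itself is never sampled.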
 
Alexey Burnakov:

1) Yes, I did not get a detailed answer, although I did sum it up... I'm not imposing my opinion; to be honest, I'm tired of arguing too. I will draw your attention to the fact that this person's words almost verbatim matched our original message.

It was a polite response without detail or verification. And the answer did not match Wolfram Alpha and Matlab, which is the problem.

No need to sidestep: the root question was stated clearly.

 
Dr. Trader:


# error ~0.02, but this method is as home-made as it gets, so it will do :). The infinity at x=0 does not get in the way.

Integrate the function 1/x from 0 to 1, including the boundary points, and compare with the result of the analytical calculation.

Wolfram says the integral will not converge because of the singularity at x=0.

 
Quantum:

Integrate the function 1/x from 0 to 1, including the boundary points, and compare with the result of the analytical calculation.

With the same code I get 7.485471. R got to 76.3342 and reported that it could not go any further, so that is not an accurate result, i.e. it is wrong. Wolfram said straight away that the integral does not converge and gave no answer.
I don't know the correct answer; what is it?
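The 7.485471 is no accident: with the right-endpoint rule on [0, 1] and N steps, each term h·1/(i·h) is mathematically 1/i, so the sum is the harmonic number H_N = 1 + 1/2 + ... + 1/N, which grows like ln N without bound as the grid is refined. A minimal Python sketch (helper names are hypothetical) contrasts it with x^(-1/2), which is also infinite at 0 but whose sums settle near 2:

```python
import math


def right_riemann(f, lo: float, hi: float, steps: int) -> float:
    # right-endpoint rule, the same scheme as the R loop earlier in the thread
    h = (hi - lo) / steps
    return sum(f(lo + i * h) * h for i in range(1, steps + 1))


EULER_GAMMA = 0.5772156649015329  # H_n is approximately ln(n) + EULER_GAMMA

for n in (1_000, 100_000):
    s_inv = right_riemann(lambda x: 1.0 / x, 0.0, 1.0, n)    # = H_n, keeps growing
    s_sqrt = right_riemann(lambda x: x ** -0.5, 0.0, 1.0, n)  # settles near 2
    print(n, round(s_inv, 6), round(math.log(n) + EULER_GAMMA, 6), round(s_sqrt, 6))
```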

Don't tell me that because the integral of 1/x cannot be found, the integral of dgamma(x) cannot be found either. Even though both functions tend to infinity as x -> 0+, they do so at different rates, and that rate determines whether the integral exists.
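The role of the rate can be made precise with the power functions x^(-p) on (0, 1]:

$$
\int_{\varepsilon}^{1} x^{-p}\,dx =
\begin{cases}
\dfrac{1-\varepsilon^{1-p}}{1-p} \;\to\; \dfrac{1}{1-p}, & p < 1,\\[6pt]
-\ln \varepsilon \;\to\; +\infty, & p = 1,\\[6pt]
\dfrac{\varepsilon^{1-p}-1}{p-1} \;\to\; +\infty, & p > 1,
\end{cases}
\qquad \varepsilon \to 0^{+}.
$$

Near zero the gamma density behaves like x^(a-1)/(s^a Γ(a)), i.e. p = 1 - a, which is below 1 for every a > 0, so its integral from 0 always exists; 1/x is exactly the borderline case p = 1.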

 

Take the function -log(x). It tends to infinity as x -> 0. You could drop the minus sign, and then it would tend to minus infinity, but that is less convenient for me.

And its integral from 0 to 1 exists. The infinity does not get in the way.
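The value of that integral follows from the antiderivative x - x ln x and the limit ε ln ε → 0:

$$\int_0^1 (-\ln x)\,dx = \lim_{\varepsilon\to 0^+}\Big[x - x\ln x\Big]_{\varepsilon}^{1} = \lim_{\varepsilon\to 0^+}\big(1 - \varepsilon + \varepsilon\ln\varepsilon\big) = 1.$$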


 
Renat Fatkhullin:

R is an amazing system, which personally opened my eyes to how far we were in MetaTrader/MQL from the real need to "make complex calculations simply and right now".

...

So we took a sharp turn and started upgrading MetaTrader 5:

  • included the previously rewritten Alglib and Fuzzy math libraries as standard and covered them with unit tests
  • developed analogs of R's statistical functions and covered them with tests
  • developed the first beta version of the Graphics library as an analog of plot in R, with single-line functions for fast output
  • started changing the interfaces of the terminal output windows so that tabular data can be handled: changed the output direction, added the option to hide unnecessary columns, and replaced the font in the Expert Advisor output window with a monospaced one
  • added a powerful ArrayPrint function for automatic printing of arrays, including arrays of structures
  • added FileLoad and FileSave functions for quickly saving arrays to disk and loading them back


Of course, we are only at the beginning of the road, but the right direction is already clear.

R, like many other languages, is for now more convenient for machine learning than MQL, because it provides ready-made functionality for processing data in arrays. A machine learning sample is most often a two-dimensional array, so it requires functions for working with arrays:

  1. Inserting rows and columns, as arrays of lower dimension, into another array
  2. Replacing rows and columns of an array with arrays of lower dimension
  3. Deleting rows and columns from an array (for example, to remove low-value predictors or examples that are obvious outliers from the sample)
  4. Splitting an array into parts, yielding two or more arrays that are parts of the original (needed to split a sample into training and test parts, or into more parts, for example for walk-forward testing)
  5. Randomly shuffling the rows and columns of an array with a uniform distribution (needed so that examples from the sample fall into different parts, preferably evenly distributed across them)
  6. Various functions for processing data by individual rows or columns (for example, computing the arithmetic mean of a row or column, the variance, or the maximum or minimum value in a row for subsequent normalization)
  7. And so on and so forth.
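Several of the operations listed above (items 3 to 6) are one-liners in most high-level languages. A minimal Python sketch on a plain list-of-rows matrix; the function names are hypothetical and do not correspond to any R, MQL or AlgLib API:

```python
import random
import statistics


def shuffle_rows(rows, seed=None):
    # uniform random permutation of the examples (rows); the input is not modified
    out = [list(r) for r in rows]
    random.Random(seed).shuffle(out)
    return out


def split_rows(rows, fraction):
    # first `fraction` of the rows for training, the rest for testing
    cut = round(len(rows) * fraction)
    return rows[:cut], rows[cut:]


def delete_column(rows, j):
    # drop predictor j, e.g. a low-value one
    return [r[:j] + r[j + 1:] for r in rows]


def column_stats(rows):
    # per-column mean and population standard deviation
    cols = list(zip(*rows))
    return [statistics.fmean(c) for c in cols], [statistics.pstdev(c) for c in cols]


def standardize(rows, means, sds):
    # z-score each column; constant columns (sd == 0) become 0
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, sds)] for r in rows]


data = [[float(i), float(i % 3), 1.0] for i in range(10)]
data = delete_column(data, 2)              # the constant third column carries no information
train, test = split_rows(shuffle_rows(data, seed=1), 0.7)
means, sds = column_stats(train)           # statistics from the training part only
train_z, test_z = standardize(train, means, sds), standardize(test, means, sds)
```

Computing the normalization statistics on the training part only and reusing them for the test part keeps test information out of training.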

Until MQL implements the functionality described above for handling samples in arrays, most developers of machine learning algorithms will prefer other languages that already have all of it. Or they will use the unpretentious MLP (an algorithm from the 1960s) from the AlgLib library where, if I remember correctly, two-dimensional arrays are represented as one-dimensional ones for convenience.
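Storing a two-dimensional sample in a one-dimensional array usually means row-major order, where element (i, j) sits at index i·n_cols + j. The Python sketch below shows this generic index arithmetic; it is an illustration under that assumption, not a description of AlgLib's actual internals, and the helper names are hypothetical. Rows stay contiguous, but columns become strided, which is one reason deleting or reordering predictors is clumsy without dedicated helpers.

```python
def flat_index(i, j, n_cols):
    # row-major layout: position of element (i, j) of an n_rows x n_cols matrix
    return i * n_cols + j


def get_row(flat, i, n_cols):
    # a row is a contiguous slice
    return flat[i * n_cols:(i + 1) * n_cols]


def get_column(flat, j, n_cols):
    # a column is a strided slice
    return flat[j::n_cols]


def delete_column(flat, j, n_cols):
    # every row shrinks by one, so the whole array has to be rebuilt
    return [v for k, v in enumerate(flat) if k % n_cols != j]


matrix = [[1, 2, 3], [4, 5, 6]]
flat = [v for row in matrix for v in row]  # [1, 2, 3, 4, 5, 6]
```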

Of course, density functions of random distributions are also necessary functionality. But such functions are not always needed in machine learning tasks, and some tasks do not use them at all. Operations on samples as multidimensional arrays, however, are something no implementation of a machine learning algorithm can do without, whatever the task, unless of course the task is training a network on known, normalized data such as the trivial XOR.

 
Renat Fatkhullin:

R is an amazing system, which personally opened my eyes to how far we in MetaTrader/MQL were from the real need "to make complex calculations simply and right now".

We (C++ developers) have in our blood the approach "you can do everything yourself, and we give you the low-level base and the speed of calculations". We are fanatical about performance and we are good at it - MQL5 is great on 64 bits.

When I personally sat down to work with R, I realized that I needed as many powerful one-line functions as possible and the ability to do research in general.

So we have made a sharp turn and started upgrading MetaTrader 5:

  • included the previously rewritten Alglib and Fuzzy math libraries in the standard delivery and covered them with unit tests
  • developed analogs of R's statistical functions and covered them with tests
  • developed the first beta version of the Graphics library as an analog of plot in R, with single-line functions for fast output
  • began changing the interfaces of the terminal output windows so that tabular data can be handled: changed the output direction, added the option to hide unnecessary columns, and switched the Expert Advisor output window to a monospaced font
  • added a powerful ArrayPrint function for automatic printing of arrays, including arrays of structures
  • added FileLoad and FileSave for quickly writing arrays to disk and reading them back


Of course, we are only at the beginning of the road, but the right direction is already clear.

This is a balanced and surprisingly objective assessment of R.

The constructive part of the discussion was not wasted: you listen to comments and suggestions from R users. We are interested in improving the platform too.

Of course you are only at the beginning, but in any case these "vaccinations" of R will improve MQL.

Good luck with your hard work.

 

Regarding the conventions Burnakov spoke about.

Let us consider three quite different cases.

1. Division by a constant equal to zero.

In R we have the result

> 1/0
[1] Inf

Is this result correct?

For an interpreter, this result should be considered correct, because the R environment must not be terminated under any circumstances.

For a compiled program, this result is not correct. The correct behavior is that when an exception occurs, program execution is interrupted and control passes to the handler for that exception; otherwise the program crashes.

Notice how different these are!
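The two policies can be seen side by side in Python, which happens to take the "exception" route for float division: `1.0 / 0.0` raises ZeroDivisionError instead of returning Inf. The helper `ieee_divide` below is a hypothetical sketch of the IEEE 754 result that R returns (±Inf, or NaN for 0/0), written out explicitly because plain Python floats do not produce it.

```python
import math


def ieee_divide(x: float, y: float) -> float:
    # hypothetical helper: IEEE 754 result of x / y, including division by (signed) zero
    if y == 0.0:
        if x == 0.0 or math.isnan(x):
            return math.nan  # 0/0 is NaN
        # sign of x times the sign of the zero
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y


try:
    1.0 / 0.0  # Python interrupts the computation here
except ZeroDivisionError as exc:
    print("handled:", exc)  # control passes to the handler instead of crashing

print(ieee_divide(1.0, 0.0))  # inf, as with R's 1/0
```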

2. Division by a variable equal to zero.

> a<-0
> 1/a
[1] Inf

Strictly speaking, this case differs from the previous one.

The function 1/a is continuous everywhere except at a = 0. At that point the left-hand limit is -Inf and the right-hand limit is +Inf.

R doesn't take this into account, but we can accept that, since the difference between minus infinity and plus infinity matters in mathematics, not in program code.
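The one-sided limits are easy to see numerically. A minimal Python sketch; the `one_sided` helper is hypothetical. It also shows that IEEE 754 arithmetic, which R uses, does keep a sign on zero: `-0.0 == 0.0` is true, yet the sign bit survives, and in R `1/-0` evaluates to `-Inf`. `a <- 0` is simply +0, so `1/a` returns `+Inf`.

```python
import math


def one_sided(f, point, steps=(1e-1, 1e-3, 1e-6, 1e-12)):
    # evaluate f just to the left and just to the right of `point`
    left = [f(point - h) for h in steps]
    right = [f(point + h) for h in steps]
    return left, right


left, right = one_sided(lambda a: 1.0 / a, 0.0)
print(left)   # large negative values: the limit from the left is -Inf
print(right)  # large positive values: the limit from the right is +Inf

# IEEE 754 zero carries a sign, even though -0.0 == 0.0 compares equal
print(-0.0 == 0.0, math.copysign(1.0, -0.0))  # True -1.0
```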


3. Dividing two infinitesimal quantities as they approach zero

> sin(a)/a
[1] NaN

I see no explanation at all for NaN here: the limit of sin(a)/a as a -> 0 is 1. But given point 2, it is quite clear that R does not handle limits as such.
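At a = 0 the expression is literally 0/0, and IEEE 754 arithmetic, which R follows, defines 0/0 as NaN without looking at any limit. A function that should return the limit has to encode that convention explicitly. A minimal Python sketch; `sinc` here is a hypothetical helper, and plain Python raises ZeroDivisionError at a = 0 instead of returning NaN:

```python
import math


def sin_over_x(a: float) -> float:
    # literal formula; at a == 0 this is 0/0 (NaN in R, ZeroDivisionError in Python)
    return math.sin(a) / a


def sinc(a: float) -> float:
    # explicit convention: at a == 0 return the limit of sin(a)/a, which is 1
    return 1.0 if a == 0.0 else math.sin(a) / a


for a in (1e-1, 1e-4, 1e-8):
    print(a, sin_over_x(a))  # values approach 1 as a -> 0
print(sinc(0.0))  # 1.0
```

This is the kind of per-function convention that has to be decided and documented when code is ported.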

Are these errors of R as a programming system? I don't know. Ideally the R documentation would describe such nuances, but how could that be done with decentralized development and, at the moment, about 130,000 functions? And is it even necessary?

What does this imply in terms of the discussion that has arisen?

Such things have to be agreed on before setting out, "on the shore". I see two options:

1. We take R and port it to MQL code as is. In doing so, we must keep in mind that the cases listed above may be interpreted differently in different R functions.

2. We declare conventions on which values are accepted in the cases I mentioned (the list may be incomplete). We thoroughly check the R code and, where it does not match our conventions, port it from R to MQL with corrections according to those conventions. Thanks to centralized development, we apply the accepted conventions consistently, and in this sense we get a better system.
