Machine learning in trading: theory, models, practice and algo-trading - page 298

 
Andrey Dik:

Which is better: to spend a little more time on development but then always compute quickly, or to develop quickly and then always put up with slow calculations?

If you develop quickly in R but the result computes slowly, where do you actually run the computations? Quickly develop a supercar that is slow to drive? I don't need such a supercar.


In research tasks it is the speed of development that comes to the fore. Instead of a person who writes super-fast code but tests one hypothesis a day, it is easy to hire a person who writes slow code and runs the calculations all night, but by the next day has twenty hypotheses tested.

For putting a model into production it is easier to hire another person on a small salary who will rewrite the most promising models in a fast language. There are plenty of programmers and the competition among them is high, but researchers able to come up with something new still have to be hunted for :P

This is common practice. Look at job postings for quantitative researcher and quantitative developer: the former usually require R/Python, the latter C++/Java.

 
anonymous:


In research tasks it is the speed of development that comes to the fore. Instead of a person who writes super-fast code but tests one hypothesis a day, it is easy to hire a person who writes slow code and runs the calculations all night, but by the next day has twenty hypotheses tested.

For putting a model into production it is easier to hire another person on a small salary who will rewrite the most promising models in a fast language. There are plenty of programmers and the competition among them is high, but researchers able to come up with something new still have to be hunted for.

This is common practice. Look at job postings for quantitative researcher and quantitative developer: the former usually require R/Python, the latter C++/Java.

Isn't it easier to use ready-made, fast C++ libraries instead of the same but slow ones in R for the sake of "fast development"? You, and many others here, constantly substitute one notion for another. You can "develop quickly" in a C-like environment just as well. Or are you saying that to work in R you don't need the appropriate knowledge of data mining and statistics? You need the same knowledge to develop in C. So why do the work twice?

And I don't know what "common practice" you are talking about, but I am used to doing things well right away, and doing them once.

During my engineering practice I often saw people who for some reason thought that if they used super-duper development-friendly software packages, everything would certainly turn out well... But they could not understand one thing: for everything to turn out well they needed something else, namely brains in their head.

 
Andrey Dik:

Isn't it easier to use ready-made, fast C++ libraries instead of the same but slow ones in R for the sake of "fast development"? You, and many others here, constantly substitute one notion for another. You can "develop quickly" in a C-like environment just as well.

The choice of language is your right and your personal business. As for libraries: alas, for more or less modern or unusual algorithms you have to implement everything yourself. For example, I have not found libraries for calculating "bipower variation" (what does that even mean? :D).

Or are you saying that to work in R you don't need the appropriate knowledge of data mining and statistics?

I do not claim that, nor do I hold that opinion.

And I don't know what "common practice" you are talking about, but I am used to doing things well right away, and doing them once.

During my engineering practice I often saw people who for some reason thought that if they used super-duper development-friendly software packages, everything would certainly turn out well... But they could not understand one thing: for everything to turn out well they needed something else, namely brains in their head.

My own practice on projects of ## man-years of complexity, where 95% of the code is written in R, shows that different tools are good for different tasks, sometimes even slow ones, as long as they are used for prototypes rather than for production. In the industry, using different tools at different stages of development is common practice, which is confirmed by the requirements listed for specialists in different positions. The projects I mentioned would have been much harder if they had been implemented in a C-like language, even by universal soldiers who know and can do everything and write the solution to the problem straight away, without any research phase.

Therefore I take my leave.

 
Andrey Dik:

Isn't it easier to use ready-made, fast C++ libraries instead of the same but slow ones in R for the sake of "fast development"?

When discussing R I have always argued for a comprehensive, systematic evaluation of this "system for statistics and graphics"; the algorithmic language itself is not even mentioned in its name.

A head-on comparison of the performance of code in R against the same code in MQL, or in another programming language, on a specific local example is completely incorrect, because a trading system (TC) never consists, as discussed above, of a single correlation function; it is always a large body of code and function calls. Since R has a huge set of functions, the code usually has a small number of lines (1,000 lines is a lot) but is very dense in content.

The program itself, which solves some meaningful problem, can be roughly divided into two parts:

  1. the algorithm itself, as code written in R
  2. calls to the functions for which R is a shell.

On point 1, if a significant amount of code has to be written, the speed issue can become very serious. There are three ways to overcome it (a small sketch follows the list):

  • compilation to byte-code
  • parallelisation across all cores and/or neighbouring computers; this is standard and very easy to do, if the algorithm allows it
  • rewriting the bottleneck in another, compiled language; C-family languages are native to R, and the process of making such inserts is well documented
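
As a rough illustration of the first two options, here is a minimal sketch. slow_f is an invented toy function, not code from this thread; the base packages compiler and parallel provide the byte-code compiler and the cluster tools.

    library(compiler)
    library(parallel)

    # toy function standing in for "a significant amount of R code"
    slow_f <- function(x) {
      s <- 0
      for (i in seq_along(x)) s <- s + x[i]^2
      s
    }

    # 1) byte-code compilation of the same function
    fast_f <- cmpfun(slow_f)

    # 2) spreading independent calls over all local cores
    cl  <- makeCluster(detectCores())
    res <- parLapply(cl, replicate(8, rnorm(1e5), simplify = FALSE), fast_f)
    stopCluster(cl)

    # 3) the third option is a C/Rcpp insert, documented separately

Note that since R 3.4 the JIT compiler byte-compiles most functions automatically, so the gain from cmpfun alone may be modest; parallelisation and C inserts are where the large speed-ups usually come from.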

On point 2, performance is a quite different matter, and I doubt that in ordinary development of an MQL program, without special effort, one can surpass R in performance.

For example.

On the surface, operations on vectors and matrices in R are elementary. Expressions of the form a <- b*c are executed by compiled library code (Intel MKL, when R is linked against it); there are no loops at the R level. This is available to everyone, and no additional knowledge or effort is required. When developing an MQL program, loops will most likely be used, although it is also possible to call the same library (if the program is not meant for the Marketplace), but a rather high level of knowledge and effort is needed to achieve the same result.
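
A toy comparison of the two styles, assuming nothing beyond base R (the variable names are invented for the example):

    x <- rnorm(1e6)
    y <- rnorm(1e6)

    # vectorised: a single call, the loop runs in compiled code
    a1 <- x * y

    # the same result written as an explicit R-level loop
    a2 <- numeric(length(x))
    for (i in seq_along(x)) a2[i] <- x[i] * y[i]

    all.equal(a1, a2)   # TRUE; only the execution time differs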


But that is not all.

If we turn to computationally heavy algorithms, which are reached through calls to R functions, then the developer of the function himself, having run into the performance problem, was concerned about it and usually solved it at the development stage. It is usually solved either by writing the core in C, or by calling libraries in C or Fortran. In addition, such algorithms often expose a parameter that enables the use of multiple cores. The developer in R gets all of this automatically, without effort, and can concentrate fully on the essence of the TC rather than on the technique of implementation. It is doubtful that the developer of an MQL program can outdo the developers of the functions in R, for the trivial reason that he would have to work specifically on performance instead of on the essence of the TC.
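
One hedged illustration, my example rather than anything from the post: the ranger package (assumed installed) fits random forests in C++ and exposes the thread count as an ordinary argument, so the R user gets a multi-core implementation without writing a line of low-level code.

    library(ranger)   # third-party package; its core is C++

    df  <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))

    # the heavy lifting runs multi-threaded in C++; the R call only sets parameters
    fit <- ranger(y ~ ., data = df, num.trees = 500, num.threads = 4)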


From all of the above it follows that a correct performance comparison would be to write the code of a real TC in MQL and the same code in MQL+R (there are no trade orders in R) and compare them. But in all its glory, do we really need that?

 
mytarmailS:
Has anyone tried working with recurrence plots? You can read about them here: https://habrahabr.ru/post/145805/. In particular, feeding them to machine learning instead of raw time series? It might work well as an option.

More reading: http://geo.phys.spbu.ru/Problems_of_geophysics/2005/20_Zolotova_38_2005.pdf

An idea for those who can implement it: compare these plots with machine vision, looking for matching patterns, as a substitute for useless correlation.
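
For reference, a minimal sketch of how such a plot is built, assuming a univariate series and a hand-picked threshold eps; dedicated packages such as nonlinearTseries or crqa do this properly, with time-delay embedding.

    # R[i, j] = 1 if |x_i - x_j| <= eps, else 0
    recurrence_plot <- function(x, eps) {
      D <- as.matrix(dist(x))                 # pairwise distances between states
      R <- (D <= eps) * 1L                    # the binary recurrence matrix
      image(R, col = c("white", "black"), axes = FALSE)
      invisible(R)
    }

    set.seed(1)
    prices <- cumsum(rnorm(300))              # toy random-walk "price" series
    rp <- recurrence_plot(prices, eps = 0.5)  # rp could then be fed to an ML model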
 
Maxim Dmitrievsky:

An idea for those who can implement it: compare these plots with machine vision, looking for matching patterns, as a substitute for useless correlation.
It is possible to find the most frequent pattern without any tricks. But what's the point?
 
fxsaber:

Previously, some TC ideas could not be verified because the low performance of certain algorithms got in the way. In this case that is exactly what happened: an alternative algorithm made it possible to explore in the optimizer an idea as old as the world, one that previously could not be computed in a reasonable time.

When you have to compute hundreds of billions of Pearson correlation coefficients over patterns several thousand elements long, the low speed of a seemingly simple task becomes an insurmountable bottleneck. One might start to argue that if a problem looks too computationally heavy, it is a poorly formulated, poorly understood problem. Perhaps so. But what is done is done. And it is always interesting to see how others implement something similar.
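
A hedged sketch of the bottleneck itself, with toy sizes and no claim to be fxsaber's optimizer code: the Pearson correlation of one pattern against every sliding window of a series can at least be collapsed into a single vectorised cor() call instead of a loop per window.

    set.seed(1)
    pattern <- rnorm(500)        # toy length; the post talks about several thousand
    series  <- rnorm(10000)

    m   <- length(pattern)
    idx <- seq_len(length(series) - m + 1)
    W   <- vapply(idx, function(i) series[i:(i + m - 1)], numeric(m))  # windows as columns

    r <- as.vector(cor(pattern, W))   # one Pearson coefficient per window

At the sizes mentioned in the post the window matrix no longer fits in memory, so the real task has to be chunked, computed incrementally from running sums, or, as suggested just below, pushed onto a GPU.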


And if you also implement it on a GPU, you will be truly invaluable.)
 
fxsaber:
It is possible to find the most frequent pattern without any tricks. But what's the point?

Well, for strategies based on patterns, of course) Not necessarily the most frequent one; there can be many variants.
 
Maxim Dmitrievsky:

Well, for strategies based on patterns, of course) Not necessarily the most frequent one; there can be many variants.

That is what I don't understand. Pattern data is useful for compressing data with a small loss of information (various media formats). But what does it have to do with a TC?

The easiest way to discuss this topic is with the most frequent pattern. Finding it is not a difficult task.

Say some squiggle (pattern) occurs most often. The criterion for selecting it is not that important right now. Let there be a pattern. How do we use it to trade?

To say that if the most recent history coincides with the first 80% of the squiggle, then the next prices will follow the same path as the remaining 20% of the squiggle, is nonsense.
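
Just to make the object of the argument concrete, the "80/20" idea being dismissed here would look roughly like the following naive sketch; the function and its parameters are invented for illustration and are not anyone's actual system.

    # if the recent history matches the first 80% of the squiggle,
    # "forecast" the returns of its remaining 20%
    forecast_by_pattern <- function(history, pattern, known = 0.8, min_cor = 0.9) {
      k      <- floor(length(pattern) * known)
      recent <- tail(history, k)
      if (cor(recent, pattern[1:k]) >= min_cor)
        diff(pattern[k:length(pattern)])   # expected path of the remaining 20%
      else
        NULL                               # no match, no forecast
    }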

 
fxsaber:

That is what I don't understand. Pattern data is useful for compressing data with a small loss of information (various media formats). But what does it have to do with a TC?

The easiest way to discuss this topic is with the most frequent pattern. Finding it is not a difficult task.

Say some squiggle (pattern) occurs most often. The criterion for selecting it is not that important right now. Let there be a pattern. How do we use it to trade?

To say that if the most recent history coincides with the first 80% of the squiggle, then the next prices will follow the same path as the remaining 20% of the squiggle, is nonsense.


Why do you think it is nonsense, if the forecast is confirmed on the same history in a mass of cases?