Machine learning in trading: theory, models, practice and algo-trading - page 2831

 
СанСаныч Фоменко #:

1. Dick's question is a perfectly valid and correct one. I don't use neural networks (NS) myself, but I do know that any function in any R package necessarily carries a reference to the author of the algorithm and, for serious algorithms, a reference to the article or book describing the algorithm implemented in R. Since you are well acquainted with neural networks, if you used R you could look up the corresponding type of network, follow the reference to where the algorithm is described, find the discussion around it, learn all the nuances from the professionals... and answer Dick at the highest professional level, instead of mumbling something incoherent.


2. R, as its very name says, is the language of statistics and graphics. The essence of R is revealed by the topic structure of its reference documentation.

Here is a list of topics that R packages cover. One of the topics is machine learning.

Here is the list of packages related to ML.

A few years ago one could still find competitors to R among other specialised statistical languages, for example SPSS; today I can find none. R remains the only statistical language that is actively supported and moderated, has a huge number of mirrors, and is included in Microsoft software.


3. Comparing R with Python is completely unjustified.

R is a specialised language; Python is a general-purpose one. Python far surpasses R in the number of users, but the bulk of Python's user base is in web development. The fact that Python has statistics packages does NOT make it a statistics language; by that logic C++, in which the packages used by both R and Python are implemented, would also qualify as a statistics language. Thanks to its detailed topic classification and the references each function carries to its underlying algorithm, R can be used to study the theory and practice of statistics, while Python cannot.

And here I'd quote Prado, though not verbatim since I don't remember the exact wording: "Machine learning offers more possibilities in trading than classical statistics does."

And the ML libraries are very well developed for Python, although others, like statsmodels, exist too. So arguing with these facts and trying to prove something to each other is pointless.

In my understanding, R is for students, professors and hobbyists, so they can quote each other and brag about something. Or maybe for some professors who teach. Python is for serious projects and production. I haven't heard of any major ML project in R running in production.
 
Maxim Dmitrievsky #:

And here I'd quote Prado, though not verbatim since I don't remember the exact wording: "Machine learning offers more possibilities in trading than classical statistics does."

And the ML libraries are very well developed for Python, although others, like statsmodels, exist too. So arguing with these facts and trying to prove something to each other is pointless.

In my understanding, R is for students, professors and hobbyists, so they can quote each other and brag about something. Or maybe for some professors who teach. Python is for serious projects and production. I haven't heard of any major ML project in R running in production.

Why are you arguing about something you yourself claim you don't know?

Microsoft has made some efforts to make R a tool for developing large projects by a large number of developers.

Here is the list of Microsoft products/services that let you work with R:
  1. Microsoft R Server / R Server for Azure HDInsight
  2. Data Science VM
  3. Azure Machine Learning
  4. SQL Server R Services
  5. Power BI
  6. R Tools for Visual Studio

What kind of hobbyists, what kind of professors need the Microsoft R Server ecosystem?



And the virtual machines (Data Science VM) that let you unify anything and everything?

And the Azure cloud service with its machine-learning packages and collaborative development tools?


R is now an industrial-grade system that Microsoft has integrated into its own products and into third-party ones.


And you say it's "for amateurs"...

 
СанСаныч Фоменко #:

Why are you speculating about something you yourself claim you don't know?

Microsoft has made some efforts to make R a tool for developing large projects by a large number of developers.

Here is the list of Microsoft products/services that let you work with R:
  1. Microsoft R Server / R Server for Azure HDInsight
  2. Data Science VM
  3. Azure Machine Learning
  4. SQL Server R Services
  5. Power BI
  6. R Tools for Visual Studio

What kind of hobbyists, what kind of professors need the Microsoft R Server ecosystem?



And the virtual machines (Data Science VM) that let you unify anything and everything?

And the Azure cloud service with its machine-learning packages and collaborative development tools?


R is now an industrial-grade system that Microsoft has integrated into its own products and into third-party ones.


And you say it's "for amateurs"...

It's clear that "there's something there."

I'm speaking from the experience of talking to implementers, including large ones, at least in Russia.

They all did it in Python.

Microsoft has SDKs for machine learning in C# and Azure, but nobody uses them.

So you've built a big project in R and put it on a server. And who will maintain it? No one, because there aren't enough specialists, and nobody wants to learn R for the sake of statistics alone.

With Python you can hire any student for peanuts and you'll be fine.

By "no one" I mean NOBODY writes in R, under any pretext. Because there's Python.

And you push traders to write in R so that they'll do what? Waste their time on a useless language?

 
mytarmailS #:

There's also the question of what we mean by finding the maximum of a noisy function...

As I understood the definition, "optimisation of a noisy function" means the function is complex and its maximum is hard to find: gradient algorithms don't apply, and so on... Roughly speaking, it's no big deal: you apply a global optimisation algorithm and off it goes searching for the global maximum...


But I look at it differently. I want to find the maximum of the noisy function with the noise removed: not the global maximum of the noisy function, but the global maximum of the denoised function...

And it's not trivial, because the function is unknown and the noise parameters are unknown...

You need a priori information about the noise. Moreover, you need a clear mathematical model of the noise: additive, multiplicative or something else. You can't build a filter without a model, and the model has to resemble the real data.

Perhaps you should look at methods for processing geodata, which work with two- or three-dimensional data. Or something like the Fourier transform, as in JPEG images, or wavelets, as in JPEG 2000, or multidimensional splines, etc.
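To illustrate the point that a filter presupposes a noise model, here is a minimal sketch (all functions and parameters are illustrative, not from the thread): we assume an additive Gaussian noise model, y = f(x) + eps, and build a Gaussian kernel smoother matched to that assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# True function (unknown in practice) and an assumed ADDITIVE noise model: y = f(x) + eps.
x = np.linspace(0.0, 1.0, 501)
f = np.sin(2 * np.pi * x) * np.exp(-x)       # hypothetical target function
y = f + rng.normal(scale=0.3, size=x.size)   # additive Gaussian noise, sigma assumed known

def gaussian_smooth(values, bandwidth, grid):
    """Kernel smoother: weighted average of neighbours, weights from a Gaussian kernel."""
    w = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)        # normalise weights row-wise
    return w @ values

y_hat = gaussian_smooth(y, bandwidth=0.03, grid=x)

# The smoothed estimate should be much closer to f than the raw noisy samples.
print(np.abs(y - f).mean(), np.abs(y_hat - f).mean())
```

Under a multiplicative noise model (y = f(x) * eps) this filter would be the wrong one; you would smooth log(y) instead, which is exactly why the model has to come first.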

 
Maxim Dmitrievsky #:

I would quote the slogan: "to strive and to seek, to find and to hide it again".

Set any criterion as a custom metric, in particular these standard ones. It will still optimise by logloss, but it will stop on the custom ones, which probably makes some sense.

And indeed it does, because stopping in boosting is always based on some custom criterion like accuracy anyway.

So far, I guess, that's the only way. You can also prune trees by a custom criterion.

 
Maxim Dmitrievsky #:

it's clear that "there's something there"

I speak from the experience of talking to implementers, including large ones, at least in the Russian Federation

everything was done in Python

Microsoft has SDKs for machine learning in C# and Azure, but nobody uses them

So you've built a big project in R and put it on a server. And who will maintain it? No one, because there aren't enough specialists, and nobody wants to learn R for the sake of statistics alone.

and with Python you can hire any student for peanuts and you'll be fine

by "no one" I mean NOBODY writes in R, under any pretext. Because there's Python.

and you push traders to write in R so that they'll do what? waste their time on a useless language

The Russian Federation is a bad example, because it is extremely backward when it comes to mathematical statistics. Our forum is full of engineers, but most of them have an extremely poor grasp of it. In universities, professors teach statistics in Excel). All this reflects extremely badly on our scientific and technical school: since Soviet times, serious solutions have far more often been bought ready-made abroad than developed in the country.

 
Aleksey Nikolayev #:

It was correctly answered that you need a priori information about the noise. Moreover, you need a clear mathematical model of the noise: additive, multiplicative or something else. You can't build a filter without a model, and the model has to resemble the real data.

Perhaps you should look at methods for processing geodata, which work with two- or three-dimensional data. Or something like the Fourier transform, as in JPEG images, or wavelets, as in JPEG 2000, or multidimensional splines, etc.

I can build a model without any problem: any decomposition, even PCA, and away we go...

But what about the data? There is none: it's an unknown function, and a multidimensional one at that...

ALL the data we have is the scatter of points sampled by the optimisation algorithm (that's if you save them).

It's not a time series; there's no structure or order.

 
Aleksey Nikolayev #:

The Russian Federation is a bad example, because it is extremely backward when it comes to mathematical statistics. Our forum is full of engineers, but most of them have an extremely poor grasp of it. In universities, professors teach statistics in Excel). All this reflects extremely badly on our scientific and technical school: since Soviet times, serious solutions have far more often been bought ready-made abroad than developed in the country.

Excel will be more useful for them later in life :D

 
mytarmailS #:

I can build a model without any problem: any decomposition, even PCA, and away we go...

But what about the data? There is none: it's an unknown function, and a multidimensional one at that...

The data is the scatter of points sampled by the optimisation algorithm (if you save them).

It's not a time series; there's no structure or order.

Well, the first thing that comes to mind is to divide the space with a grid whose cells are neither too large nor too small (the size is determined by the noise model). Start from some cell (chosen at random, say): the few points inside it determine the direction of the gradient of the smoothed function; move to the next cell, and so on, until there is no transition or you start looping. The position of the extremum is then known only up to the cell size, so the cells should not be too large, but at the same time they must be big enough to allow smoothing, so not too small either. And we must accept that there is no exact position of the extremum in principle, because it will vary with the smoothing method.
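The grid idea above can be sketched roughly like this (all names, the test function and its noise level are illustrative, not from the thread): bin the scattered samples into cells, average within each cell to smooth the noise, then hill-climb from cell to neighbouring cell until no neighbour has a higher smoothed value.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_f(p):
    """Stand-in for the unknown function plus additive noise; peak near (0.7, 0.7)."""
    return -((p - 0.7) ** 2).sum() + rng.normal(scale=0.05)

# Scattered search points, e.g. saved from an optimisation-algorithm run.
pts = rng.uniform(0.0, 1.0, size=(5000, 2))
vals = np.array([noisy_f(p) for p in pts])

n = 10                                 # cell count per axis: resolution vs smoothing trade-off
idx = np.minimum((pts * n).astype(int), n - 1)
grid_sum = np.zeros((n, n))
grid_cnt = np.zeros((n, n))
np.add.at(grid_sum, (idx[:, 0], idx[:, 1]), vals)   # unbuffered accumulation per cell
np.add.at(grid_cnt, (idx[:, 0], idx[:, 1]), 1)
mean = np.where(grid_cnt > 0, grid_sum / np.maximum(grid_cnt, 1), -np.inf)

# Hill-climb over cells: move to the best neighbour until no neighbour improves.
cell = (0, 0)
while True:
    best = cell
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            i, j = cell[0] + di, cell[1] + dj
            if 0 <= i < n and 0 <= j < n and mean[i, j] > mean[best]:
                best = (i, j)
    if best == cell:
        break
    cell = best

print(cell)   # extremum known only up to cell size; true peak lies near (0.7, 0.7)
```

As the post says, the answer is only accurate to one cell, and a different cell size or smoothing rule would shift it slightly.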

 
Aleksey Nikolayev #:

Well, the first thing that comes to mind is to divide the space with a grid whose cells are neither too large nor too small (the size is determined by the noise model). Start from some cell (chosen at random, say): the few points inside it determine the direction of the gradient of the smoothed function; move to the next cell, etc., until there is no transition or you start looping. The position of the extremum is then known only up to the cell size, so the cells should not be too large, but at the same time they must be big enough to allow smoothing, so not too small either. And we must accept that there is no exact position of the extremum in principle, because it will vary with the smoothing method.

sounds like a lot of work.)
