Machine learning in trading: theory, models, practice and algo-trading - page 3090

 
mytarmailS #:
I haven't looked into it myself yet; I just came across it. I'm catastrophically short of time for all of this.
There is a 30-page description of the method here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253. I've started reading it. Apparently it is based on cross-validation, but with its own twist: it is combinatorially symmetric.
The Probability of Backtest Overfitting • papers.ssrn.com
 
Andrey Dik #:
No one was attacking R. Go back a few pages, refresh your memory.
Sanych called me and anyone else who doesn't bang their forehead on the altar of R a collective farmer.

I apologise for the "collective farmer" remark; perhaps it was not entirely accurate.

Once again I will try to explain the difference between professional development and village-style development done on the principle of "the first guy in the village".

R is not just a programming language; it is an environment for a narrowly specialised domain: statistics, which includes ML among other things.

Packages in R are part of the language. Look at the language distribution itself: it already ships with several base packages.

The set of R packages - more than 10,000 packages with more than 100,000 functions - is a FUNCTIONALLY COMPLETE toolset for solving problems such as ML.

Let me explain using ML as an example.

This site mostly discusses different variants of classification algorithms, especially variants of neural networks. MetaQuotes' Python materials are particularly revealing.

From the ML point of view, the classification algorithm itself is only part of the problem - about 30%. Try to find the other 70% in a village called Python. And it is almost impossible to find the other variants of classification models there, of which there are up to 200 (!).

R has an excellent reference system that lets you find whatever is missing.

If you don't know WHAT to look for, then at the first stage you can take Rattle to see what a toolset for ML looks like: initial data analysis, transformation, predictor selection, preparation of files for testing, computation with one model or several, and evaluation of the results with the appropriate graphics. This is the basic level.
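For orientation, a minimal R snippet for that basic level (assuming the rattle package is already installed):

# Opens the Rattle point-and-click workbench: load data, transform,
# select predictors, build models, evaluate the results
library(rattle)
rattle()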

If you have outgrown Rattle, you can take the Caret wrapper, which covers ML tasks at the top level. Caret provides access to up to 200 (!) model packages that can produce trading signals. These packages can be compared and selected, and ensembles of models can be built. Caret has everything Rattle has, but at a more professional level.
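As a rough sketch of that Caret workflow (the data frame trades and its factor target signal are made-up placeholders, not anything from this thread), two models are trained under the same cross-validation scheme and their resampled results compared:

library(caret)
set.seed(42)

# 'trades' with a factor column 'signal' is a hypothetical dataset
ctrl <- trainControl(method = "cv", number = 5)

fit_rf  <- train(signal ~ ., data = trades, method = "rf",  trControl = ctrl)
fit_gbm <- train(signal ~ ., data = trades, method = "gbm", trControl = ctrl,
                 verbose = FALSE)

# Compare the resampled accuracy of the two models; ensembles can be built on top
summary(resamples(list(rf = fit_rf, gbm = fit_gbm)))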

For everything Caret offers, R has analogues and a huge number of other supporting tools. All of this serves a single purpose.


All of this is what I call a PROFESSIONAL ENVIRONMENT for working in statistics, and in ML in particular.

 
😂😂😂😂
 
A response from Prado et al. to Maxim, with his preference for taking the OOS from the early part of the data:
Page 7.

Fourth, even if the researcher is working with a large sample, the OOS analysis will have to cover a large portion of the sample to be conclusive, which is detrimental to strategy development (see Hawkins [15]). If the OOS is taken from the end of the time series, we lose the most recent observations, which are often the most representative of the future. If the OOS is taken from the beginning of the time series, testing was done on possibly the least representative part of the data.
 
Forester #:
There is a 30-page description of the method here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253. I've started reading it. Apparently it is based on cross-validation, but with its own twist: it is combinatorially symmetric.

I don't even want to read it anymore. I'm burned out.

But I could write an automatic synthesiser of strategies with a check against a non-overfitting criterion...

In other words, I can generate strategies that maximise the non-overfitting criterion.


I can synthesise strategies by this criterion and then test them on new data to see whether they are rubbish or worth paying attention to...


Tested it -> got the result -> threw it away / learned from it.

But running around for years with one idea like a clown, doing nothing with it and throwing it at everyone, is a dead end.


What is the non-overfitting criterion there?

 
Forester #:
A response from Prado et al. to Maxim, with his preference for taking the OOS from the early part of the data:
Page 7.

Fourth, even if the researcher is working with a large sample, the OOS analysis will have to cover a large portion of the sample to be conclusive, which is detrimental to strategy development (see Hawkins [15]). If the OOS is taken from the end of the time series, we lose the most recent observations, which are often the most representative of the future. If the OOS is taken from the beginning of the time series, testing was done on possibly the least representative part of the data.
I think that's why they use cross-validation: so that every section of the data ends up in the OOS exactly once.
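That is exactly what plain k-fold gives you; a tiny check with caret (the 1000-row index is an arbitrary placeholder):

library(caret)
folds <- createFolds(seq_len(1000), k = 5)   # list of 5 test-fold index vectors
length(unlist(folds))                        # 1000: every observation is OOS somewhere
anyDuplicated(unlist(folds))                 # 0: no observation is OOS more than once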
 
mytarmailS #:

What is the non-overfitting criterion there?

I'm only on page 8 so far, and that's still the introduction )))
It looks like it will be a comparison by Sharpe ratio (although they write that any other performance metric can be used) on cross-validation.
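For reference, here is my rough R sketch of that procedure as I understand it from the paper (CSCV/PBO): split the performance history of N strategy variants into S blocks, treat every combination of S/2 blocks as in-sample, pick the best variant there by Sharpe, and look at its rank on the complementary blocks. The matrix ret, the block count S and the helper names are my own placeholders, not the authors' code.

# Rough sketch of CSCV / PBO, assuming 'ret' is a T x N matrix of returns
# for N strategy variants; S even blocks, as in the paper (default S = 16)
pbo_cscv <- function(ret, S = 16) {
  Tn <- nrow(ret) - nrow(ret) %% S           # trim so all blocks have equal size
  N  <- ncol(ret)
  blocks <- split(seq_len(Tn), rep(seq_len(S), each = Tn / S))
  sharpe <- function(x) mean(x) / sd(x)      # any other performance metric would do

  logits <- sapply(combn(S, S / 2, simplify = FALSE), function(is_ids) {
    is_rows  <- unlist(blocks[is_ids])
    oos_rows <- unlist(blocks[-is_ids])
    perf_is  <- apply(ret[is_rows,  , drop = FALSE], 2, sharpe)
    perf_oos <- apply(ret[oos_rows, , drop = FALSE], 2, sharpe)
    best <- which.max(perf_is)               # the variant selected in-sample
    w <- rank(perf_oos)[best] / (N + 1)      # its relative rank out-of-sample
    log(w / (1 - w))                         # logit; <= 0 means below-median OOS
  })
  mean(logits <= 0)                          # estimated probability of backtest overfitting
}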

 

Wow, they've got as far as Prado.

None of his techniques worked for me.)

 
Maxim Dmitrievsky #:

Wow, they've got as far as Prado.

None of his techniques worked for me.)

Perhaps only the embargo interval from cross-validation worked for me. That data is harmful and should always be removed, otherwise the OOS results come out inflated.
Maybe something else... I can't remember everything.
But it's not certain that it is his invention; maybe he just retold someone else's useful idea.
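For anyone who has not run into it: a minimal sketch of what an embargo in time-series cross-validation looks like, as I understand the idea (the fold count and gap length are arbitrary placeholders, not Prado's exact procedure):

# k-fold splits over n bars with an embargo gap after each test block, so
# training rows that immediately follow the test fold (and could leak
# information through overlapping labels) are dropped
embargo_folds <- function(n, k = 5, embargo = 20) {
  fold_id <- cut(seq_len(n), breaks = k, labels = FALSE)
  lapply(seq_len(k), function(f) {
    test <- which(fold_id == f)
    gap  <- if (max(test) < n) (max(test) + 1):min(max(test) + embargo, n) else integer(0)
    list(train = setdiff(seq_len(n), c(test, gap)), test = test)
  })
}

splits <- embargo_folds(1000)   # 5 folds over 1000 bars, 20-bar embargo after each test block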
 