Machine learning in trading: theory, models, practice and algo-trading - page 3175

 
fxsaber #:
That's how it is, I asked one question, and then professionals came with their answers))))))

Like divorcées pouncing on a young, naive IT guy))) now I have to fight them off with a stick))))
 
fxsaber #:

Please clarify, what is the meaning of these intervals?

Right now I'm imagining the following scheme for them.

  1. The number cruncher (the optimiser) runs on train, filtering by test.
  2. The number cruncher is then switched off completely, and a few of the best results are taken on exam.


The first point seems strange - à la a "forward test" in the tester. Is it any better than plain optimisation without filtering, but on the combined interval train+test?
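To make the scheme concrete, here is a minimal Python sketch of how I read points 1-2 (the `score` function and the filtering threshold are placeholders, not anyone's actual code): score parameter sets on train, filter by test, and only then evaluate a few of the best survivors on exam.

```python
import pandas as pd

def split_by_date(df, test_start, exam_start):
    """Chronological split of a time-indexed DataFrame into train/test/exam."""
    train = df[df.index < test_start]
    test = df[(df.index >= test_start) & (df.index < exam_start)]
    exam = df[df.index >= exam_start]
    return train, test, exam

def select_candidates(param_sets, score, train, test, exam, top_n=5):
    """Point 1: score parameter sets on train and filter by test.
    Point 2: rank the survivors and look at a few of the best on exam."""
    survivors = []
    for p in param_sets:
        if score(p, train) > 0 and score(p, test) > 0:   # crude 'filtering by test'
            survivors.append((p, score(p, test)))
    survivors.sort(key=lambda x: x[1], reverse=True)
    # exam is touched only for the handful of best candidates
    return [(p, score(p, exam)) for p, _ in survivors[:top_n]]
```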

In medicine, a group of 60 approximately equally ill patients is randomly divided into three groups: the first group is treated with the new drug, the second with an old one, and the third is not treated at all - they are given a placebo. If the first group does better than the second and the third, the drug is recognised as good, the experiment is repeated on a larger number of patients for some time, i.e. monitored, and only then is the drug released into general use.

It seems logical that this reduces the probability of false-positive and false-negative results, but to me it is not a panacea against errors.

And I don't understand, and don't accept, categorical assessments of results in noisy studies at all)))).
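As a rough numerical illustration of the analogy (a sketch with simulated numbers, nothing more): randomly assign 60 "patients" to three equal groups and compare the group means - the random assignment is what makes the comparison meaningful, and the noise is exactly why a significance test is still needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# 60 hypothetical patients with a simulated 'recovery score'
outcomes = rng.normal(loc=0.0, scale=1.0, size=60)

# random assignment into three groups of 20: new drug / old drug / placebo
idx = rng.permutation(60)
groups = {"new_drug": outcomes[idx[:20]],
          "old_drug": outcomes[idx[20:40]],
          "placebo":  outcomes[idx[40:]]}

for name, g in groups.items():
    print(f"{name:9s} mean outcome: {g.mean():+.3f}")
# with purely random data the three means still differ by chance,
# which is why a formal significance test is needed on top of the split
```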

 
Andrey Dik #:

No, you can't.

The shape of the relief (surface) tells you only about the characteristics of the trading system under a particular optimisation criterion. Take another criterion and the relief will be different. Misunderstanding this leads to the misconception that optimisation (training) should not be carried out to the global maximum - on the contrary, it should. It is the choice of an optimisation criterion adequate to the strategy that is the key to correct training.

All this has been discussed many times before.

And everyone has stayed with his own opinion. As far as I can see, you are alone in yours.

You need to look for plateaus, not individual peaks, which due to the randomness of the process will never appear again.

 
Valeriy Yastremskiy #:

In medicine, a group of 60 approximately equally ill patients is randomly divided into three groups: the first group is treated with the new drug, the second with an old one, and the third is not treated at all - they are given a placebo. If the first group does better than the second and the third, the drug is recognised as good, the experiment is repeated on a larger number of patients for some time, i.e. monitored, and only then is the drug released into general use.

It seems logical that this reduces the probability of false-positive and false-negative results, but to me it is not a panacea against errors.

And I don't understand, and don't accept, categorical assessments of results in noisy studies at all)))).

Splitting the file into different sections - and, in the example you gave, the random assignment of patients to groups, which corresponds to my sampling - only works if the predictors are relevant to the target variable, i.e. are not rubbish. In medicine, the relationship of a drug (predictor) to a disease is worked out by understanding the physiology of what happens when the drug is introduced into the body. We have to have other methods of determining the relationship between a predictor and the target variable - all of this is preprocessing, which is done BEFORE training the model, and this step is mandatory.

Likewise, the testing architecture is mandatory and must be built ON TOP OF model overfitting.

 
СанСаныч Фоменко #:

And everyone has stayed with his own opinion. As far as I can see, you are alone in yours.

You need to look for plateaus, not individual peaks, which due to the randomness of the process will never appear again.

I don't really care that I'm in a minority of one; it just shows how few people really understand the problem))))

Whether it's plateaus or peaks depends on the surface of the optimisation criterion - the criterion! Why do you think error is so often used as the criterion in ML? Because its surface is monotonic))) i.e. people always try to choose a criterion that is as monotonic as possible and, ideally, has a single global extremum.

So we should not be looking for a plateau, but for a criterion whose hypersurface is as monotonic as possible.

By the way, the error criterion has exactly one global minimum, with the value 0. The fact that you need to stop training before reaching that global minimum is a separate issue and has nothing to do with the shape of the criterion surface.
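A toy illustration of the point about criterion surfaces (synthetic data, not a claim about any particular TS): sweep a single parameter and compare a squared-error criterion, which is smooth with one global extremum, against a ragged profit-like criterion with many local peaks.

```python
import numpy as np

rng = np.random.default_rng(1)
params = np.linspace(-3, 3, 301)          # a single strategy/model parameter
target = 1.0                              # the 'true' value of the parameter

# criterion 1: squared error -- smooth, single global minimum at target
mse_surface = (params - target) ** 2

# criterion 2: a noisy profit-like criterion -- rugged, many local maxima
profit_surface = np.cos(5 * params) + 0.3 * rng.normal(size=params.size)

def count_local_maxima(y):
    """Count strict interior local maxima of a 1-D surface."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

print("local maxima of -MSE surface  :", count_local_maxima(-mse_surface))
print("local maxima of profit surface:", count_local_maxima(profit_surface))
```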

 
How I pity those poor people with unformed minds who read this nonsense and think that smart people are discussing something clever here....
 
Maxim Dmitrievsky #:
One could conclude that you've been doing this for years. Or you could just do a random search, which you are doing now.

I just wrote that random search is an unproductive approach.

When testing the potential of a sample I use randomisation, with an element of randomness in predictor selection, and I have been using it in CatBoost for years.

Randomisation by itself gives no grounds for expecting the model to continue to work, because the predictor responses were randomised into it.
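For reference, this kind of randomness in predictor selection can be switched on in CatBoost itself through its documented parameters; a sketch with arbitrary values (`rsm` is the share of features considered at each split selection, `random_strength` adds noise to split scoring):

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))                              # synthetic predictors
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)  # synthetic target

model = CatBoostClassifier(
    iterations=200,
    depth=6,
    rsm=0.5,              # random subspace: use 50% of features per split selection
    random_strength=2.0,  # extra randomness in split scoring
    random_seed=42,
    verbose=False,
)
model.fit(X, y)
print("train accuracy:", model.score(X, y))
```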

 
Aleksey Nikolayev #:

IMHO, it looks like p-hacking, which Maxim wrote about recently. Unless some statistical tests are used to determine the significance of the selected quanta, it is definitely that.

I once gave a simple example where the best hour of the week for trading was selected on a random walk (where such an hour obviously doesn't exist). There were only 5*24=120 variants, but that was enough for such an hour always to be found (the time interval was half a year, I think). There was "stability across the sample" there as well.

What significance tests do you suggest? I'm not saying the algorithm for selecting quantum segments is perfect - on the contrary, there is a lot of rubbish in it and I want to improve it.

I don't understand on what grounds you decided that this is some kind of "p-hacking" - and which part exactly: the selection of quantum segments, or the screening of rows, which are screened out well by the quantum segments even without training (i.e. the graphs that I plotted)? Yes, the method differs somewhat from the common approach to building tree-based models, but not by much - the concept remains.

Regarding the random-walk example, there are two considerations here:

1. If the process is unknown and there is only data, then one can hypothesise that there is some best hour to trade. Or is there a reason to reject this hypothesis out of hand?

2. If these observations were relatively evenly distributed over time (over the event history), then this looks more like an artefact of the random number generator.

In training I use samples over a long period - usually at least 10 years.
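On consideration 1, the multiple-comparisons effect is easy to check on simulated data (a sketch assuming numpy and scipy): generate half a year of pure random-walk "returns", pick the best of the 120 hour-of-week buckets, and see whether its mean survives a Bonferroni correction for the 120 comparisons.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# ~ half a year of hourly 'returns' of a pure random walk: 26 weeks * 120 trading hours
n_weeks, buckets = 26, 5 * 24
returns = rng.normal(loc=0.0, scale=1.0, size=(n_weeks, buckets))

# the 'best hour of the week' by mean return -- one is always found
best = int(np.argmax(returns.mean(axis=0)))
t_stat, p_value = stats.ttest_1samp(returns[:, best], popmean=0.0)

print(f"best bucket: {best}, mean return: {returns[:, best].mean():+.3f}")
print(f"raw p-value: {p_value:.4f}, "
      f"Bonferroni-corrected: {min(1.0, p_value * buckets):.4f}")
# the raw p-value of the 'best' bucket often looks significant;
# after correcting for 120 comparisons it usually does not
```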

 
СанСаныч Фоменко #:

Splitting the file into different sections - and, in the example you gave, the random assignment of patients to groups, which corresponds to my sampling - only works if the predictors are relevant to the target variable, i.e. are not rubbish. In medicine, the relationship of a drug (predictor) to a disease is worked out by understanding the physiology of what happens when the drug is introduced into the body. We have to have other methods of determining the relationship between a predictor and the target variable - all of this is preprocessing, which is done BEFORE training the model, and this step is mandatory.

Likewise, the testing architecture is mandatory and must be built ON TOP OF model overfitting.

Unfortunately, no. Phagocytosis can be seen under a microscope, but beyond that, where the microscope no longer helps, the science of medicine is hypotheses confirmed by properly designed experiments)

And by the way, the patients don't know which group they are in.))))

In general: similar conditions, and a search for cause-and-effect relationships without understanding those relationships.

 
fxsaber #:

It varies, of course. But very often you can see a breakdown right after Sample. Perhaps it's a cognitive bias: you pay more attention to something and get the impression that it happens too often.

If it happens often, then it can hardly be a matter of global patterns changing, otherwise the break point would fall around the same date.

But maybe, purely statistically, the frequency of their occurrence changes. That is, the old patterns keep working, but new ones also appear that the model has not seen before, for a number of reasons - most importantly, they were not present during training. For example, volatility has changed significantly and the code (effectively the predictors) relies on some fixed values, or there are simply few observations for such volatility - during training it stayed stable the whole time or sat in other ranges. In other words, new kinds of observations accumulate in the sample (if new data keep being collected) - here we need a mechanism for detecting the appearance of such events.
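One simple mechanism of that kind (a sketch, not the actual procedure used here): remember the per-feature quantile ranges seen during training and flag new observations whose predictors fall outside them.

```python
import numpy as np

def fit_ranges(X_train, low=0.01, high=0.99):
    """Remember per-feature quantile ranges observed during training."""
    return np.quantile(X_train, low, axis=0), np.quantile(X_train, high, axis=0)

def novelty_mask(X_new, ranges):
    """Flag rows with at least one predictor outside the training range."""
    lo, hi = ranges
    return ((X_new < lo) | (X_new > hi)).any(axis=1)

rng = np.random.default_rng(3)
X_train = rng.normal(0.0, 1.0, size=(5000, 10))   # 'old' volatility regime
X_new = rng.normal(0.0, 2.0, size=(1000, 10))     # volatility has changed

share = novelty_mask(X_new, fit_ranges(X_train)).mean()
print(f"share of new observations outside the training ranges: {share:.1%}")
```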

The opposite can also happen - when a strategy is built on rare events and the trend lasts for more than a year. Recently I was shown such a miracle of EA construction.

The person had initially seen the history of the EA's behaviour only since 2016 (the growth in gold) and complained that something had broken and that the right-hand part of the chart needed to be fixed up with the help of ML.

Under the bonnet there turned out to be a dozen indicators, each of which gave on average 100 signals; i.e. in effect outliers detected by different indicators had been found on the history and combined into one common group. Will these outliers keep repeating with the same probabilistic outcome?

Yes, there are outliers that are not really outliers, even though statistically they are, but how to separate them from the rest is an open question.

fxsaber #:

The chart shows three years of daily trading.

For ticks that seems like a lot, but I use a longer range - from 10 years on minute bars - and the signals are not that frequent to begin with: there is a base signal.

fxsaber #:

What I didn't do was to make a chart for each range. I counted the statistical data, but did not look at the chart itself.

Look at the dynamics of the patterns - often they turn out to be clumps that occurred within a relatively short time interval relative to the sample; it is a good sign if observations of a pattern tend to recur over the entire interval.

Another nuance with the same CatBoost: there, about 50% of the leaves die off on new data, i.e. the patterns on which the model was built stop occurring.
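That leaf die-off can be measured directly; a sketch assuming the catboost Python package and its calc_leaf_indexes method (synthetic, deliberately shifted data): compare which (tree, leaf) pairs are visited by the training data and by the new data.

```python
import numpy as np
from catboost import CatBoostRegressor, Pool

rng = np.random.default_rng(5)
X_train = rng.normal(size=(3000, 10))
y_train = X_train[:, 0] + 0.5 * rng.normal(size=3000)
X_new = rng.normal(0.5, 1.5, size=(1000, 10))     # shifted 'new' data

model = CatBoostRegressor(iterations=100, depth=6, verbose=False, random_seed=5)
model.fit(X_train, y_train)

def visited_leaves(model, X):
    """Set of (tree, leaf) pairs that at least one object of X falls into."""
    leaf_idx = model.calc_leaf_indexes(Pool(X))   # shape: (n_objects, n_trees)
    trees = np.broadcast_to(np.arange(leaf_idx.shape[1]), leaf_idx.shape)
    return set(zip(trees.ravel().tolist(), leaf_idx.ravel().tolist()))

train_leaves = visited_leaves(model, X_train)
new_leaves = visited_leaves(model, X_new)
dead = train_leaves - new_leaves
print(f"leaves used in training but never hit by new data: "
      f"{len(dead) / len(train_leaves):.1%}")
```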

fxsaber #:

Didn't understand the highlighted.

The other two samples, test and exam, follow the sample on which the training was done.

Then you asked why use them - test was originally used to control overfitting in CatBoost, i.e. when further iterations no longer give an improvement on the test sample, training is stopped. And exam is simply independent testing of the model. Ideally, you should learn to select, using train and test alone, a model that will be successful on exam - but that is a separate problem.
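In code this division of roles looks roughly as follows (a sketch on synthetic data; eval_set plus early stopping is CatBoost's standard overfitting control, and exam is touched only once at the end):

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(9)
X = rng.normal(size=(6000, 15))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=2.0, size=6000) > 0).astype(int)

# chronological split: train -> test -> exam
X_train, y_train = X[:4000], y[:4000]
X_test,  y_test  = X[4000:5000], y[4000:5000]
X_exam,  y_exam  = X[5000:], y[5000:]

model = CatBoostClassifier(iterations=1000, depth=6, verbose=False, random_seed=9)
model.fit(
    X_train, y_train,
    eval_set=(X_test, y_test),     # test controls overfitting
    early_stopping_rounds=50,      # stop when test stops improving
    use_best_model=True,
)

# exam is used exactly once, as an independent check of the chosen model
print("exam accuracy:", model.score(X_exam, y_exam))
```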

And, of course, once the model-selection problem is solved, the training sample can be enlarged if necessary, or at least the train and test samples can be shifted closer to the current date.
