Machine learning in trading: theory, models, practice and algo-trading - page 2550

 
Aleksey Nikolayev #:

Generally speaking, after training on the train set you get not one model but a whole family of them, parameterized by meta-parameters: for example, different degrees of an interpolating polynomial, or different regularization coefficients in lasso regression, etc. The best value of the meta-parameter is then chosen on the test set (the best model of the family is selected by testing). In turn, the optimization of the meta-parameter on test can itself be governed by its own parameters (meta-meta-parameters), which can be optimized on the exam set. For example, in what proportion to split the original sample into train and test.

But, most likely, I just do not understand your idea.)

It is better to select meta-parameters not on a single test segment, but on several segments glued together by cross-validation or walking forward. We discussed this recently.
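
A minimal sketch of the idea in Python (just an illustration with sklearn; the data, model and parameter grid here are placeholders, not anything specific):

# Walk-forward (expanding-window) selection of a meta-parameter:
# the score is averaged over several consecutive test segments
# instead of being taken from a single one.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
from sklearn.linear_model import LogisticRegression

X = np.random.rand(1000, 10)                      # placeholder features
y = np.random.randint(0, 2, 1000)                 # placeholder labels

tscv = TimeSeriesSplit(n_splits=5)                # 5 chronological test segments
grid = {"C": [0.01, 0.1, 1.0, 10.0]}              # the meta-parameter being selected
search = GridSearchCV(LogisticRegression(max_iter=1000), grid,
                      cv=tscv, scoring="neg_log_loss")
search.fit(X, y)                                  # score = average over the glued segments
print(search.best_params_)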
 
elibrarius #:
It is better to select meta-parameters not on a single test segment, but on several segments glued together by cross-validation or walking forward. We discussed this recently.

I agree. Actually, I just wanted to express the idea that cross-validation itself can be arranged in a non-trivial way and governed by its own parameters (meta-parameters), which can also be optimized on yet another sample (rather than taken out of thin air).

 
Aleksey Nikolayev #:

I agree. Actually, I just wanted to express the idea that cross-validation itself can be arranged in a non-trivial way and governed by its own parameters (meta-parameters), which can also be optimized on yet another sample (rather than taken out of thin air).

You understood correctly - perform any actions so that the sample becomes more similar to the exam sample.

The question is how best to do that.

One option that is often used is enumerating predictors, but with a large set of them there are too many combinations. Could we instead, say, compare rows (sets of predictor values) for similarity and for stability of the target outcome across the two samples? Then anomalous rows (say, rare ones, or ones not found in the exam sample) could be dropped from the training sample, and in theory training would be much better for it.
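
As a rough sketch of that idea (my own illustration, not from any package: the nearest-neighbour distance to the exam sample is used as the measure of whether a row "occurs" there, and the cut-off is arbitrary):

# Drop training rows that have no close analogue in the exam sample.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(5000, 20)                   # placeholder training predictors
X_exam = np.random.rand(1000, 20)                    # placeholder exam predictors

scaler = StandardScaler().fit(X_exam)
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(X_exam))
dist, _ = nn.kneighbors(scaler.transform(X_train))   # distance of each train row to exam

threshold = np.quantile(dist, 0.95)                  # arbitrary cut-off
keep = dist[:, 0] <= threshold                       # rows that do have an analogue in exam
X_train_filtered = X_train[keep]                     # anomalous rows removed (or just marked)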

 
Aleksey Vyazmikin #:

You understood correctly - perform any actions so that the sample becomes more similar to the exam sample.

The question is how best to do that.

One option that is often used is enumerating predictors, but with a large set of them there are too many combinations. Could we instead, say, compare rows (sets of predictor values) for similarity and for stability of the target outcome across the two samples? Then anomalous rows (say, rare ones, or ones not found in the exam sample) could be dropped from the training sample, and in theory training would be much better for it.

Vladimir had an article about this; I don't remember the name of the package, but it simply threw unpredictable rows out of the sample.
 
mytarmailS #:
Vladimir had an article about this; I don't remember the name of the package, but it simply threw unpredictable rows out of the sample.

Interesting, I'll have to look it up. But I want to throw out not the unpredictable rows, rather those that do not occur in the sample outside of training. It would of course be even more interesting to mark them somehow rather than just throw them out - that is, to identify them.

 
Aleksey Vyazmikin #:

Interesting, I'll have to look it up. But I want to throw out not the unpredictable rows, rather those that do not occur in the sample outside of training. It would of course be even more interesting to mark them somehow rather than just throw them out - that is, to identify them.

Perhaps we are talking about removing outlying observations, which come in two types - outliers in the response and outliers in the predictors (both may be combined in the same observation). Strictly speaking, only the former are called outliers, while the latter usually go by a different name (leverage points). This theory is well developed for linear regression. You could probably check each test observation to see whether it is, in some sense, an outlier relative to the exam sample.
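
For linear regression this looks roughly like the following (a plain-numpy sketch with placeholder data; the hat-matrix diagonal measures leverage in the predictors, the standardized residual measures an outlier in the response, and Cook's distance combines the two):

# Two kinds of outlying observations in linear regression.
import numpy as np

X = np.random.rand(200, 5)                            # placeholder predictors
y = X @ np.random.rand(5) + 0.1 * np.random.randn(200)

X1 = np.column_stack([np.ones(len(X)), X])            # add intercept
H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T              # hat matrix
leverage = np.diag(H)                                 # large value -> outlier in the predictors

beta = np.linalg.lstsq(X1, y, rcond=None)[0]
resid = y - X1 @ beta
sigma2 = resid @ resid / (len(y) - X1.shape[1])
std_resid = resid / np.sqrt(sigma2 * (1 - leverage))  # large |value| -> outlier in the response

p = X1.shape[1]
cooks_d = std_resid**2 * leverage / (p * (1 - leverage))  # combined influence measure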

 
Aleksey Vyazmikin #:

Interesting, I'll have to look it up. But I want to throw out not the unpredictable rows, rather those that do not occur in the sample outside of training. It would of course be even more interesting to mark them somehow rather than just throw them out - that is, to identify them.

You can use tree-based models...
Decompose the model into rules, analyze the rules for the statistics you need (repeatability and so on...), check whether a rule still shows up on the new data...

With the "intrees" package it is five lines of code and off you go.
 
mytarmailS #:
Vladimir had an article about this; I don't remember the name of the package, but it simply threw unpredictable rows out of the sample.

The NoiseFiltersR package. Take a look at the article.

 
elibrarius #:
It is better to select meta-parameters not on a single test segment, but on several segments glued together by cross-validation or walking forward. We discussed this recently.

And how should the CV results be used correctly? Take the best model parameters found and train with them on the whole dataset, or use CV only for selecting good datasets?

For example, the output is the following table:

        iterations  test-Logloss-mean  test-Logloss-std  train-Logloss-mean  train-Logloss-std
0                0           0.689013          0.005904            0.681549           0.007307
1                1           0.685340          0.011887            0.660894           0.001061
2                2           0.685858          0.012818            0.641069           0.004738
3                3           0.685975          0.023640            0.629656           0.000656
4                4           0.686613          0.024923            0.612977           0.002072
...            ...                ...               ...                 ...                 ...
95              95           0.863043          0.402531            0.123702           0.028628
96              96           0.866321          0.406193            0.122224           0.028623
97              97           0.869681          0.409679            0.120777           0.028611
98              98           0.873030          0.413121            0.119361           0.028595
99              99           0.874569          0.419064            0.117974           0.028572
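
To be concrete (a sketch, assuming the table comes from something like catboost's cv(), which it resembles; the data and parameters are placeholders) - is the first option supposed to look like this?

# Pick the best iteration from CV, then retrain on the full dataset.
import numpy as np
from catboost import CatBoostClassifier, Pool, cv

X = np.random.rand(5000, 10)                              # placeholder data
y = np.random.randint(0, 2, 5000)
params = {"loss_function": "Logloss", "iterations": 100}

cv_results = cv(Pool(X, y), params, fold_count=5)         # a table like the one above
best_iter = int(cv_results["test-Logloss-mean"].idxmin()) # iteration with the best mean CV loss

model = CatBoostClassifier(loss_function="Logloss",
                           iterations=best_iter + 1,
                           verbose=False)
model.fit(X, y)                                           # final model trained on all the data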
 
Vladimir Perervenko #:

The NoiseFiltersR package. Take a look at the article.

While noisy predictors are more or less clear, noisy examples are much less so. I would like to know more about the methods for defining them (in the sense of the theory, not the names of the packages/functions used, although of course R packages always come with links to the articles). It is clear that classification should include a "don't trade" class, since striving to be in the market all the time is considered a mistake. But it is not quite clear how this class can be described correctly in a more or less formal way.
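
The naive version I can write down is just a dead zone on the future return (my own sketch with an arbitrary horizon and threshold, placeholder prices), but that hardly counts as a proper formal definition:

# Three-class labelling with a "don't trade" dead zone on the forward return.
import numpy as np
import pandas as pd

close = pd.Series(np.cumsum(0.1 * np.random.randn(1000)) + 100)  # placeholder price series
horizon, threshold = 10, 0.002                                   # arbitrary meta-parameters

fwd_return = close.shift(-horizon) / close - 1
label = pd.Series(0, index=close.index)                          # 0 = don't trade
label[fwd_return > threshold] = 1                                # 1 = buy
label[fwd_return < -threshold] = -1                              # -1 = sell
label = label.iloc[:-horizon]                                    # drop rows with no future price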
