Machine learning in trading: theory, models, practice and algo-trading - page 3166

 
Forester #:

it's time to end this epic endeavour of trying to find patterns in random data.

Yeah.

Or it's time to stop thinking by inertia and take a sober look at the results.

 
Forester #:

The results of training on the coreset are often not bad

For the years '10 through '21 a coreset with a 30% share was found (30% of randomly selected history from that stretch took part in training); the other years are essentially pure OOS

In the terminal it looks like this


 

There are many methods for building a coreset. Here are some of the most popular ones (two of them are sketched in code after the list):

  • Random subset: Simply select a random subset of points from the original dataset. This is the easiest way to get a coreset, but it does not always provide the best quality.
  • Reference points: Select points from the original dataset that have a large impact on the prediction of the machine learning algorithm. This is a more efficient way to get a coreset than a random subset, but it can be more complex.
  • Clustering: Group the points from the original dataset based on their similarities. Select one point from each group as the coreset. This is an efficient way to get a coreset that represents the original dataset well, but it can be more complex.
  • Haemometric kernel: Select points from the original dataset using the haemometric kernel. This is a powerful method of obtaining a coreset, which can be used to improve the quality of machine learning algorithms.
  • Extended random subset: This method selects random points from the original dataset, but with higher probability selects points that have a high impact on the prediction of the machine learning algorithm. This is an efficient way to obtain a coreset that provides good quality and can be used for a variety of machine learning tasks.

It is important to note that there is no universal way of obtaining a coreset that suits every machine learning task. The choice of method depends on the specific task and the available computational resources.
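To make the first and third items above concrete, here is a minimal illustrative sketch (not from the thread; the data and function names are made up) of a random-subset coreset and a clustering-based coreset built with scikit-learn's KMeans:

```python
# Minimal sketch of two coreset strategies: a plain random subset and a
# clustering-based selection (one representative point per cluster).
import numpy as np
from sklearn.cluster import KMeans

def random_coreset(X, fraction=0.3, seed=0):
    """Pick a random fraction of rows as the coreset (returns row indices)."""
    rng = np.random.default_rng(seed)
    n = max(1, int(len(X) * fraction))
    return rng.choice(len(X), size=n, replace=False)

def clustering_coreset(X, n_points=100, seed=0):
    """Cluster the data and keep the sample closest to each centroid."""
    km = KMeans(n_clusters=n_points, n_init=10, random_state=seed).fit(X)
    idx = []
    for c in range(n_points):
        members = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        idx.append(members[np.argmin(dist)])
    return np.array(idx)

X = np.random.randn(5000, 10)              # placeholder feature matrix
print(random_coreset(X, 0.3).shape)        # ~30% random coreset
print(clustering_coreset(X, 300).shape)    # 300-point clustering coreset
```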

*Bard

 
Maxim Dmitrievsky #:

The results of training on the coreset are often not bad

For the years '10 through '21 a coreset with a 30% share was found (30% of randomly selected history from that stretch took part in training); the remaining years are essentially pure OOS

In the terminal it looks like this


Well, there are also drawdown periods lasting from six months to a year. Are you ready for that? Especially if the drawdown starts right after launching on a real account?

 
Forester #:

Well, there are also drawdown periods lasting from six months to a year. Are you ready for that? Especially if the drawdown starts right after launching on a real account?

Usually people diversify.

These stretches will be profitable on other instruments. And if the overall trend of all of them is the same as on the chart shown, that is practically guaranteed investment stability.

You just need to put together a portfolio of instruments that gives the largest recovery factor.
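For reference, recovery factor here is understood in the MetaTrader sense of net profit divided by maximal equity drawdown. Below is a rough sketch, with synthetic equity curves standing in for real instruments, of scoring instrument combinations by the recovery factor of their summed (equally weighted) equity:

```python
# Rough sketch: recovery factor = net profit / maximal equity drawdown.
# The equity curves below are synthetic placeholders, not real trading results.
import numpy as np
from itertools import combinations

def recovery_factor(equity):
    """Net profit of an equity curve divided by its maximal drawdown."""
    profit = equity[-1] - equity[0]
    running_max = np.maximum.accumulate(equity)
    max_dd = np.max(running_max - equity)
    return profit / max_dd if max_dd > 0 else np.inf

rng = np.random.default_rng(1)
# three synthetic instrument equity curves: small drift plus noise
curves = {name: np.cumsum(rng.normal(0.05, 1.0, 2000))
          for name in ("EURUSD", "GBPUSD", "USDJPY")}

# try every combination and keep the one whose combined equity recovers best
best = max(
    (combo for r in range(1, len(curves) + 1)
           for combo in combinations(curves, r)),
    key=lambda combo: recovery_factor(sum(curves[s] for s in combo)),
)
print("best combination by recovery factor:", best)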
 
Forester #:

Well, there are also drawdown periods lasting from six months to a year. Are you ready for that? Especially if the drawdown starts right after launching on a real account?

I'm not ready to bet on 20 years :) this is more of a case study.

10 years of training and 1 year of OOS suits me fine.

But there is a lot of noise; sometimes the model throws out almost all samples as useless and only 3 trades remain.

There are also stretches of history that are never predicted properly.

All in all, it's not a very rewarding activity.

It's like turning the dial on an old radio receiver and accidentally catching some station through the noise.

 

Once again I am convinced that to forecast you need a model.

A model removes the unnecessary (noise) and keeps the necessary (signal), amplifying the signal where possible; a model is also more deterministic, with more repeatability in its patterns...

As an example: high-low prices of one-minute bars.


Next we build the simplest simplification of the price (we create a model).

Then we remove the excess (improve the model) using a simple, well-known dimensionality-reduction algorithm; the model becomes more repeatable.
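The post does not name the dimensionality-reduction algorithm, so the sketch below assumes PCA over sliding windows of the minute high-low mid-price, purely as an illustration of the "remove the excess" step; the prices here are synthetic placeholders:

```python
# Hedged sketch: PCA is assumed here, since the post does not name the algorithm.
# Sliding windows of a minute-bar mid-price are compressed to a few principal
# components and reconstructed, discarding the rest as "excess" (noise).
import numpy as np
from sklearn.decomposition import PCA

def window_matrix(series, width=60):
    """Stack overlapping windows of a 1-D price series into rows."""
    return np.array([series[i:i + width] for i in range(len(series) - width)])

rng = np.random.default_rng(0)
mid = np.cumsum(rng.normal(0, 0.0001, 10_000))   # synthetic mid price
high, low = mid + 0.0002, mid - 0.0002           # toy high/low bands

X = window_matrix((high + low) / 2.0, width=60)

pca = PCA(n_components=3)                        # keep only 3 components: the "model"
X_reduced = pca.fit_transform(X)                 # compressed representation
X_restored = pca.inverse_transform(X_reduced)    # denoised, more repeatable shape

print("explained variance kept:", pca.explained_variance_ratio_.sum().round(3))
```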

And the last, perhaps purely decorative, touch.


I wonder how ML would train on such data?

This is a test sample.

Confusion Matrix and Statistics

          Reference
Prediction    -1     1
        -1 24130  2780
        1   4478 23613
                                          
               Accuracy : 0.868           
                 95% CI : (0.8652, 0.8709)
    No Information Rate : 0.5201          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.7363          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.8435          
            Specificity : 0.8947          
         Pos Pred Value : 0.8967          
         Neg Pred Value : 0.8406          
             Prevalence : 0.5201          
         Detection Rate : 0.4387          
   Detection Prevalence : 0.4893          
      Balanced Accuracy : 0.8691          
                                          
       'Positive' Class : -1  

Have you ever seen numbers like this before?
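For what it's worth, the headline figures in the caret printout above can be reproduced directly from the four cell counts (caret treats -1 as the positive class here); a small arithmetic check:

```python
# Re-deriving the headline metrics from the confusion-matrix cells above.
tp = 24130   # predicted -1, actually -1  (class -1 is the "positive" class)
fp = 2780    # predicted -1, actually  1
fn = 4478    # predicted  1, actually -1
tn = 23613   # predicted  1, actually  1

total = tp + fp + fn + tn
accuracy    = (tp + tn) / total                  # ~0.868
sensitivity = tp / (tp + fn)                     # ~0.8435 (recall of class -1)
specificity = tn / (tn + fp)                     # ~0.8947 (recall of class  1)
balanced    = (sensitivity + specificity) / 2    # ~0.8691

print(round(accuracy, 4), round(sensitivity, 4),
      round(specificity, 4), round(balanced, 4))
```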




 
СанСаныч Фоменко #:

What's the exact name? Or is it homemade?

I have been using different "wooden" (tree-based) models for many years and have never seen anything like this.

What do you mean, homemade? There's a theoretical justification and a good article. There's a package, RLT v3.2.6. It works well. Just pay attention to the version.

Regarding ONNX for tree ("wooden") models in Python: see the skl2onnx package.

It supports scikit-learn models; the latest supported opset is 15.

Good luck

skl2onnx (pypi.org, 2023.05.09): Convert scikit-learn models to ONNX
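A minimal sketch of such an export, assuming a plain scikit-learn random forest and placeholder data; the skl2onnx calls are the package's documented entry points, with opset 15 as mentioned above:

```python
# Minimal sketch: exporting a scikit-learn tree ensemble to ONNX via skl2onnx.
# The training data is a random placeholder; target_opset=15 follows the post.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.randn(1000, 10).astype(np.float32)
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100).fit(X, y)

onnx_model = convert_sklearn(
    model,
    initial_types=[("input", FloatTensorType([None, X.shape[1]]))],
    target_opset=15,
)
with open("rf_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```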
 
And the biggest do-it-yourselfer of them all is Breiman, because he didn't write in R. Such a hack.
 
mytarmailS #:

Have you ever seen numbers like this before?

0.99 on train/test, with the model truncated to a couple of iterations. Only a few rules remain, and they predict the classes well.
