Machine learning in trading: theory, models, practice and algo-trading - page 86

 
Mihail Marchukajtes:

Well, say it shuffled the sample and split it in half. Then the training and test samples end up with the same number of examples of both classes, no?

If the two classes are not equally represented in the sample, they will not be equally represented in the test part either. Only the examples of the less represented class are divided in half: one half goes to the training part and the other half to the test part. The training part receives exactly as many examples of the more represented class as it has of the less represented one. The remaining examples of the more represented class, which did not get into the training part, go to the test part.

The point is that earlier versions had no balancing at all. The sample was shuffled with a random number generator and simply divided into two parts: half of the examples went to training and half to the test. Then I came across a sample in which the two classes were highly unbalanced. It was quite obvious that the more represented class showed excellent generalization ability, while the less represented class was rock bottom. I had to add balancing to the splitting algorithm to get rid of such skew.
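The split described above can be sketched roughly as follows. This is only an illustration in Python, not the actual code of the splitting algorithm: the function name `balanced_split`, the `seed` parameter, and rounding down when the smaller class has an odd number of examples are all assumptions.

```python
import random


def balanced_split(samples, labels, seed=0):
    """Split a two-class sample into (train, test) lists of (sample, label).

    Hypothetical helper: half of the smaller class goes to training, the
    same number of examples is taken from the larger class, and everything
    else goes to the test part.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")

    # Order the classes by size: smaller class first, larger class second.
    (minor_y, minor), (major_y, major) = sorted(
        by_class.items(), key=lambda kv: len(kv[1])
    )
    rng.shuffle(minor)
    rng.shuffle(major)

    # Assumption: with an odd count, the extra example goes to the test part.
    half = len(minor) // 2

    train = [(x, minor_y) for x in minor[:half]]
    train += [(x, major_y) for x in major[:half]]
    test = [(x, minor_y) for x in minor[half:]]
    test += [(x, major_y) for x in major[half:]]

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With 10 examples of one class and 90 of the other, the training part gets 5 of each, and the test part gets the remaining 5 and 85, so the test part stays unbalanced, as described above.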

 
No... it's okay.... false alarm :-)
 
SanSanych Fomenko:
The model is overfitted because the list of predictors was not cleaned of noise predictors. This is a training example, and it was made this way deliberately. That's why I say so confidently.

I thought about it.
The forest memorizes data, that's a fact, and with more trees it has more "memory" to memorize with. But if it still can't reach 100% accuracy even with enough trees, that means the training data contains inconsistent examples: sets of training examples whose predictor values are exactly the same but whose classes differ. Such data can never be predicted with 100% accuracy, even on the training data itself. So it turns out that the model cannot even fit the training data completely; it simply lacks the data, and that makes it less prone to overfitting.
The inconsistency of the training examples is not even caused by an error. It appears because some predictor that would have allowed 100% accuracy was intentionally removed. But without it, the predictions on new data will be better.
It's a very interesting rule. On its basis we can build a simple method for preliminary evaluation of a predictor set, so as to discard some sets before training and cross-validating models.
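A minimal sketch of such a preliminary check, in Python. It groups training examples by their predictor values and counts how many could be classified correctly at best, which gives the ceiling on training accuracy for a given predictor set. The function names and the `columns` parameter are hypothetical, and the check assumes discrete or discretized predictors, since continuous values rarely repeat exactly.

```python
from collections import Counter, defaultdict


def max_training_accuracy(rows, labels):
    """Best training accuracy any deterministic model could reach.

    Examples with identical predictor values but different classes cannot
    all be classified correctly; at best, the majority class of each group
    of identical rows is predicted.
    """
    if not labels:
        raise ValueError("empty training set")
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row)][y] += 1
    # In each group, only the most frequent class can be predicted correctly.
    best_hits = sum(max(counts.values()) for counts in groups.values())
    return best_hits / len(labels)


def ceiling_for_subset(rows, labels, columns):
    """Same ceiling, computed using only the predictors listed in columns."""
    reduced = [tuple(row[c] for c in columns) for row in rows]
    return max_training_accuracy(reduced, labels)


# Tiny illustration: with both predictors the data is consistent,
# with only the first predictor two examples contradict each other.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]
full_ceiling = ceiling_for_subset(rows, labels, [0, 1])
reduced_ceiling = ceiling_for_subset(rows, labels, [0])
```

A ceiling of 1.0 says the predictor set allows the model to memorize the training data completely; a lower ceiling means the set contains contradictions. Which ceiling values justify discarding a set is not something this sketch can decide.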

 

Hello!

1) Has anyone tried any of the above? Any results?

2) Has anyone tried testing strategies directly in R? I need to simulate trading in R, quite primitively but with stops and other small details. Is there a tool that makes this as simple and fast as possible?

 
Yury Reshetov:

Where do I get real volumes as historical data? MetaTrader provides only a tick counter, which it calls "volumes". Moreover, the values of these counters can differ by orders of magnitude between different brokers ("kitchens").

...

Tick volumes differ not only between different kitchens, but even within one. Sometimes you can see a step: the flow of ticks was dense, and then, bam, it went sparse.

This is due to a change in the tick filter inside the dealing desk.

It's interesting: there is a correlation between real volumes and tick volumes, and there is also a correlation between tick volumes and bar size.

 
Nikolay Demko:

Does this mean that there is a correlation between real volumes and bar size?

of course
 
mytarmailS:
of course
What correlation is there between the volume and the bar? The volume can be high while the candle has no body, and the same goes for a bar. Or the volume is small and the candle has grown... It all depends on the market conditions at that moment...
 
Mihail Marchukajtes:
What correlation is there between the volume and the bar? The volume can be high while the candle has no body, and the same goes for a bar. Or the volume is small and the candle has grown... It all depends on the market conditions at that moment...

:)

When answering, I assumed that we are talking about highly liquid markets, such as futures and foreign exchange markets. I don't think anyone here trades penny stocks.

http://prntscr.com/c10p51

This is the correlation between volatility and volume in a sliding window of 100. If I remember correctly, a value above 0.6 is considered a significant positive correlation.
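For anyone who wants to reproduce a calculation like this, here is a rough sketch of a sliding-window Pearson correlation in plain Python. It is not the code behind the screenshot. The function names are hypothetical, and the choice of a volatility measure (for example, the high minus low range of each bar) is an assumption left to the caller.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant series has no defined correlation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def rolling_correlation(volatility, volume, window=100):
    """Correlation of volatility and volume over each full sliding window.

    Returns one value per window, so the result has
    len(volume) - window + 1 elements (empty if the series is too short).
    """
    if len(volatility) != len(volume):
        raise ValueError("series must have the same length")
    result = []
    for end in range(window, len(volume) + 1):
        start = end - window
        result.append(pearson(volatility[start:end], volume[start:end]))
    return result
```

Each returned value can then be compared with the 0.6 threshold mentioned above to see in which windows the relationship holds.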

 
mytarmailS:

Maybe someone will be interested: I found a package called quantstrat that can simulate trading and build trading systems.

http://www.rinfinance.com/agenda/2013/workshop/Humme+Peterson.pdf

I'll repost this, maybe it just got overlooked.

And another useful link: http://www.r-programming.org/papers

Or is no one interested in these packages? If not, why? I'm curious how and where people test their models.

 
mytarmailS:

I'll repost this, maybe it just got overlooked.

And another useful link: http://www.r-programming.org/papers

Or is no one interested in these packages? If not, why? I'm curious how and where people test their models.

All packages (models) can be divided into two categories:

  • good in principle
  • not good in principle

The performance of the packages that are "good in principle" is about the same; the differences are not essential.

The problems lie not in the model, but in the set of predictors and their preliminary preparation. For a given set of predictors, both the possibility of building a model that is NOT overfitted and the magnitude of its error depend little on which model is used. That is why you should take the simplest and fastest model among those that are "good in principle".

PS.

From my own experience: more than 75% of the effort in building a trading system (TS) goes into selecting predictors, if I manage to find such a set for a particular target variable at all.
