Machine learning in trading: theory, models, practice and algo-trading - page 115

 
Anton Zverev:
I regularly read topics like this (not only on this forum) where people try to build complex trading theories.
Genetic algorithms, neural networks, intricate formulas that only the author understands, etc.

And I always see that such systems do not work in the market. The monitoring goes either to zero or into the red.
But in a neighboring thread someone earns with an Expert Advisor built on two moving averages. And they earn good money.

The question is, does it all make sense?
Because in my experience, the simpler and clearer the system, the more profitable it is.

Are you by any chance a clerk in an old-style brokerage house?

(Do tell us "how it should be done". Let me guess, your "secret method" is to trade moving averages (at random) and double down after a loss, right?))

I have to live on commission, gentlemen, only on the commission ...

 
Dr.Trader:

Committee creation and testing:

There's a problem: the original classes are of factor type, but when the result goes into the matrix it is converted to the ordinal numbers of the corresponding factor levels, so at the end the comparison has to go through as.numeric().

For everything to work properly with factors, I need to create predictionMatrix as a data.frame, but after that my rbind call gave warnings; something else needs changing, and I haven't figured out what's wrong there.
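A minimal sketch of the coercion issue described above (the objects here are illustrative stand-ins, not the actual committee code):

#----------------------------------------------------------------------
# Minimal illustration of the factor-to-matrix coercion issue (stand-in objects).
pred <- factor(c("setosa", "virginica", "versicolor"))

# A matrix cannot store factors: cbind()/rbind() keep only the integer
# level codes, so the class labels are lost.
predictionMatrix <- cbind(pred)
predictionMatrix                      # 1, 3, 2 - level codes, not labels

# Hence the comparison at the end has to go through as.numeric():
as.numeric(pred) == predictionMatrix[, 1]

# Keeping predictions in a data.frame preserves the factor levels, but then
# every rbind() must combine data.frames with identical columns and levels,
# otherwise warnings like the ones mentioned above appear.
predictionDf <- data.frame(pred = pred)
str(predictionDf)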

A few thoughts on the code:

1. You don't have to use the for() construct unless it is absolutely necessary. There is a wonderful alternative, foreach(), which, besides its higher execution speed, lets you parallelize calculations across the available cores (a parallel variant is sketched after the output below).

2. A model ensemble makes sense and gives results only if the models differ significantly. Two variants: one data set - different models (RF, DT, SVM); one model - different data sets. An example of the latter variant is below.

#----------------------------------------------------------------------
require(ranger)     # fast random forest implementation
require(foreach)
require(magrittr)
require(rminer)     # provides holdout() for the stratified split
data("iris")
totalModels = 50
res <- list()
Acc <- c()
# Train totalModels random forests, each on its own stratified 2/3 holdout split
res <- foreach(i = seq_len(totalModels),
               .packages = c("ranger", "magrittr")) %do% {
          id <- rminer::holdout(y = iris$Species, ratio = 2/3)
          x.test <- iris[id$ts, -ncol(iris)]
          y.test <- iris[id$ts, ncol(iris)]
          model <- ranger(Species~., data = iris[id$tr, ], 
                          write.forest = TRUE)
          pred <- predict(model, x.test)$predictions
          acc <- sum(pred == y.test) / length(y.test)   # accuracy on the holdout part
          list(Acc = acc, mod = model) 
        }
# Collect the accuracy of each trained model
for (i in 1:totalModels) {Acc[i] <- res[[i]]$Acc}
Acc
 [1] 0.9803922 0.9607843 0.9803922 0.9607843
 [5] 0.9607843 0.9215686 1.0000000 0.9411765
 [9] 0.9019608 0.9607843 0.9803922 0.9607843
[13] 0.9803922 0.9215686 0.9607843 0.9215686
[17] 0.9803922 0.8823529 0.9411765 0.9803922
[21] 0.9607843 0.9215686 0.9607843 0.9411765
[25] 0.9411765 0.9607843 0.9411765 0.9607843
[29] 0.8823529 0.9019608 1.0000000 0.9411765
[33] 0.9215686 0.9803922 1.0000000 0.9607843
[37] 0.9411765 0.9803922 0.9607843 0.9215686
[41] 0.9411765 0.9607843 0.9411765 1.0000000
[45] 0.9607843 0.9411765 0.9215686 0.9411765
[49] 0.9803922 0.9607843
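For the parallel execution mentioned in point 1, the same loop can be run with %dopar% after registering a backend. A rough sketch, assuming the doParallel and rminer packages are installed (this is not part of the original post):

#----------------------------------------------------------------------
require(doParallel)                              # also attaches parallel
cl <- makeCluster(max(1, detectCores() - 1))     # leave one core free
registerDoParallel(cl)
res <- foreach(i = seq_len(totalModels),
               .packages = c("ranger", "rminer")) %dopar% {
          id <- rminer::holdout(y = iris$Species, ratio = 2/3)
          model <- ranger(Species ~ ., data = iris[id$tr, ],
                          write.forest = TRUE)
          pred <- predict(model, iris[id$ts, -ncol(iris)])$predictions
          list(Acc = mean(pred == iris[id$ts, ncol(iris)]), mod = model)
        }
stopCluster(cl)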

Choose the models with the best performance and work with them from there.
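One possible reading of "work with them", sketched in code: keep, say, the ten models with the highest Acc and combine their predictions by majority vote (the cut-off of ten and the reuse of iris as "new" data are arbitrary assumptions):

#----------------------------------------------------------------------
# Sketch: keep the 10 most accurate models and let them vote.
ord  <- order(Acc, decreasing = TRUE)
best <- lapply(res[ord[1:10]], `[[`, "mod")

newData <- iris[, -ncol(iris)]        # stand-in for genuinely new observations
votes <- sapply(best, function(m) as.character(predict(m, newData)$predictions))

# Majority vote across the committee, row by row
committeePred <- apply(votes, 1, function(v) names(which.max(table(v))))
head(committeePred)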

Good luck

 
Vladimir Perervenko:

A few thoughts on the code:

1. You don't need to use for() unless it is absolutely necessary. There is a nice alternative, foreach(), which, besides its high execution speed, lets you parallelize calculations across the available cores.

2. A model ensemble makes sense and gives results only if the models differ significantly. Two variants: one data set - different models (RF, DT, SVM); one model - different data sets. An example of the latter option is below.

Choose models with better performance and work with them.

Good luck

I would like to see you more often. Don't disappear.
 
Vladimir Perervenko:


Choose the models with the best performance and work with them.


This is where the trouble lies.

And "best" by what metrics, based on what data?

Why do I ask: because Vkontov is struggling to figure out how to choose a model (out of many) using the training and test data. And here you have it so straightforward: take the ones with the best metrics and work with them.

 
Alexey Burnakov:

This is where the trouble lies.

And "best" by what metrics, based on what data?

Why do I ask: because Vkontov is struggling to figure out how to choose a model (out of many) using the training and test data. And here you have it so straightforward: take the ones with the best metrics and work with them.

The initial set is split into train/test with stratification. We train on train and test on test. Is that really not clear from the code?

Good luck

 
SanSanych Fomenko:
I would like to see you more often. Do not disappear.
Unfortunately I only have time to browse the site from time to time. There is a lot of work.
 
Vladimir Perervenko:

The initial set is split into train/test with stratification. We train on train and test on test. Is that really not clear from the code?

Good luck,

I will try rminer::holdout, thanks for the example. Generally, in my experience, if you choose a model and its parameters so as to get the best result on the test sample, then the model will of course end up showing a really good result on that test sample, but on new data the result is usually very poor. I'm talking specifically about forex data; in other areas this is quite a normal approach. I don't expect rminer::holdout to change anything dramatically for forex.
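To make that point concrete, a minimal sketch of keeping a final chunk of data that is touched only once, after the model has been chosen on a separate validation piece. The 60/20/20 split and the random shuffle are arbitrary assumptions; for forex one would split chronologically rather than randomly:

#----------------------------------------------------------------------
# Minimal sketch of a train / validation / final-test split (60/20/20, arbitrary).
# For time series the split should be chronological, not a random shuffle.
require(ranger)
set.seed(1)
n   <- nrow(iris)
idx <- sample(n)
trainIdx <- idx[1:round(0.6 * n)]
validIdx <- idx[(round(0.6 * n) + 1):round(0.8 * n)]
finalIdx <- idx[(round(0.8 * n) + 1):n]

# Fit candidate models on train and pick the winner on the validation part only...
m <- ranger(Species ~ ., data = iris[trainIdx, ], write.forest = TRUE)
validAcc <- mean(predict(m, iris[validIdx, ])$predictions == iris$Species[validIdx])

# ...then report the chosen model once, on data never used for the selection.
finalAcc <- mean(predict(m, iris[finalIdx, ])$predictions == iris$Species[finalIdx])
c(validation = validAcc, final = finalAcc)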
 
Dr.Trader:
I'll try rminer::holdout, thanks for the example. Generally, from experience, if you pick a model and its parameters to get the best result on the test sample, then the model will indeed end up showing a really good result on that test sample, but on new data the result is usually very poor. I'm talking specifically about forex data; in other areas this is quite a normal approach. I don't expect rminer::holdout to change anything dramatically for forex.
That's what I mean. And he didn't get it.

In forex, a good test result doesn't mean good performance out of sample. That's why people are struggling. And here it's done just like that - take the best results (and overfit in the process). In good faith.)
 
Dr.Trader:
then the model will end up showing a really good result on the test sample. But the result is usually very low on new data. I'm talking specifically about forex data,
Alexey Burnakov:
In forex, a good test doesn't mean good performance outside the sample. That's why people are struggling.

The market moves against its own statistics - this is a theory that I have confirmed in practice. It is the only theory I know that answers every question, from why a model does not work on new data to why everyone loses money in the market in general...

why is this so hard for you to accept?

Do old knowledge and habits really suppress the perception of new information that much?

Why concentrate so much on the model if the difference in performance between models is between 0.5% and 5%?

No model can help here, because the essence is in the data itself.


I've posted this picture more than once, but nevertheless...

Look closely! This is the difference between the cumulative buy and sell forecasts of two networks, cum(buy.signal) - cum(sell.signal). Ideally, if our model is good, the blue chart should correlate with the price - that would mean the network understands the data and reacts to it adequately. In fact, what do we see?
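For anyone who wants to reproduce that kind of chart, a sketch of the computation described above; the data below is randomly generated filler, not the poster's actual network output:

#----------------------------------------------------------------------
# Sketch of the quantity plotted above; dummy data stands in for the real
# network forecasts and close prices.
set.seed(42)
price       <- cumsum(rnorm(500))   # stand-in for a close-price series
buy.signal  <- runif(500)           # stand-in for per-bar buy forecasts
sell.signal <- runif(500)           # stand-in for per-bar sell forecasts

# Cumulative buy forecasts minus cumulative sell forecasts
signal.diff <- cumsum(buy.signal) - cumsum(sell.signal)

# If the model "understands" the data this curve should track the price;
# an inverse correlation is what the post above complains about.
cor(signal.diff, price)

plot(scale(price), type = "l", col = "grey", xlab = "bar", ylab = "scaled value")
lines(scale(signal.diff), col = "blue")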

We cannot say that the model does not understand the data: the correlation is inverse, but the structure is the same. Something is wrong with the direction, though - the market goes against the predictions and against the statistics that the network learned in the past...

Now tell me, what model can help with this? What cross-validation will help? Any model training followed by an out-of-sample check (new data) would be nothing more than fitting a model that happens to work well out of sample, nothing more... And you see it all the time when you train models yourselves: on brand new data the model always fails. Don't you see?! I'm giving you the answer to why this happens!

 

Is this a chart of the data on which the training itself took place, or only a test on new data? If you draw the chart over both periods at once, training and test, will the blue and grey lines coincide completely on the first (training) part of the data and then switch sharply to an inverse correlation once the new data begins?

If it were that simple, it would be enough to train any model and just invert its predictions. Unfortunately, that doesn't work.
Teaching a model to give 0% accuracy on new data is just as difficult as achieving 100% accuracy. The baseline - flipping a coin, for example - is 50% accuracy, and moving a couple of dozen percent in either direction is a task of equal difficulty. The problem is not that the models give opposite results, but that on some bars the result is correct and on others it is wrong, all of it at random and with no way to filter out only the correct results.
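A toy sketch of why that is: inverting a binary classifier's calls simply flips its accuracy from acc to 1 - acc, so a model that is reliably wrong would be exactly as hard to obtain as one that is reliably right (the 40% figure is arbitrary):

#----------------------------------------------------------------------
# Toy sketch: inverting binary predictions turns accuracy acc into 1 - acc.
set.seed(7)
truth <- sample(c("buy", "sell"), 100, replace = TRUE)
pred  <- ifelse(runif(100) < 0.4, truth,                    # right ~40% of the time
                ifelse(truth == "buy", "sell", "buy"))      # wrong otherwise
mean(pred == truth)        # ~0.40

inverted <- ifelse(pred == "buy", "sell", "buy")
mean(inverted == truth)    # ~0.60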

And why are you subtracting the sell forecast from the buy forecast? Maybe you should do the opposite, cum(sell.signal) - cum(buy.signal)? Then the correlation would suddenly be right too.
