Machine learning in trading: theory, models, practice and algo-trading - page 883

 
Maxim Dmitrievsky:

A regular forest, a random forest and a forest of trees are all the same thing :) A forest is an ensemble of trees.
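As a rough illustration of "a forest is an ensemble of trees" that decides by voting, here is a minimal Python sketch. It is not the random forest implementation that rattle calls: it only bags one-split stumps on bootstrap samples and takes a majority vote, with no random feature selection. The toy data and the names fit_stump, fit_forest and forest_predict are made up for the example.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split 'tree' (decision stump) on (x, label) pairs.

    Tries every observed x as a threshold and keeps the one with the
    fewest training errors, predicting the majority label on each side.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right of the split, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        trees.append(fit_stump(sample))
    return trees


def forest_predict(trees, x):
    """Majority vote over the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for the example): label 0 below 5.0, label 1 from 5.0 up.
rows = [(x / 10, 0) for x in range(0, 50)] + [(x / 10, 1) for x in range(50, 100)]
trees = fit_forest(rows)
print(forest_predict(trees, 1.0), forest_predict(trees, 9.0))
```

Each stump sees a slightly different sample, so individual thresholds differ a little, and the vote smooths that out.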

Are the features collapsed in the sense that there are fewer of them, or what? By collapsed features we usually mean ones that rarely change and/or categorical ones, such as ones and zeros (well, that is the high-level understanding).

No, collapsed means that one variable has many values, but the number of combinations remains the same. I attached a file, analogous to the previous one for purchases, but in a different representation.

 
Vizard_:

Binarization kills a lot of useful information.

What difference does it make how the information is presented? It doesn't change ...?

 
Maxim Dmitrievsky:

Personally I have nothing against SanSanych: he is a very competent and discreet man who is doing something of his own, I don't know what, and he probably needs R for it.

I intuitively prefer Python. I haven't come up with anything special in it yet, nothing to make you say wow, but I keep learning it quietly; we'll see if it helps :D

R is a great environment, which has many advantages over Python. The main thing is that R is a modeling environment. Compared to Python, the results in R can be obtained faster and easier.

It is clear that Python, together with modules, has its own advantages.

By the way, as for RF, it seems that here, just as with neural networks, we can do without the search for and selection of predictors and feed in the normalized time series directly.

 
SanSanych Fomenko:

Normal forest or random forest, or both?

In rattle, run both forest models, which are called tree and ada. Open the Log tab and look at the R code and the references to the packages used, and you will understand how they differ.

I understand the difference between a tree and a forest (or I think I do): forests are better when there is more uncertainty in the data, i.e. a less stable pattern, since a forest makes decisions by voting across random trees (independent because they are shortened). Or am I wrong? And I don't have the "ada" option here, it is not in the screenshot; there is "Forest". Is that the wrong one?

SanSanych Fomenko:

I installed Rattle and R (and, well, all this stuff is glitchy ...),

I don't understand which glitches you mean. I have run a huge number of models lately and everything was fine.

I had some strange moments when downloading packages: it says it has started downloading but doesn't download anything, then it installs and says the libraries it needs are missing, then it hangs while reading data from a file... And the progress of the work is not shown, so it is not clear how long to wait for it to finish. Those are the glitches I mean so far. Once I had to kill the task from the Task Manager...

SanSanych Fomenko:


Your rattle picture is incomplete. At a minimum, you have to go to the neighboring Evaluate tab and look at the results there.

But most importantly you need to split the source file into two parts with different names (most likely you will have to do it in R).

On the first file you build all six models and look at their evaluations on test and validate. Then you enter the name of the second file in the R Dataset field and get the evaluations on it as well. All the evaluations must be approximately the same!

If the evaluations do not coincide, and the models' results on the second file are much worse, then your models are overtrained (overfitted), and the reason for the overtraining is the presence of noise predictors, i.e. predictors unrelated to the target variable.
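The comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the error rates would come from rattle's Evaluate tab, the function names are hypothetical, and the 0.05 tolerance is an arbitrary stand-in for "approximately the same", which the post does not quantify.

```python
def error_rate(predicted, actual):
    """Share of wrong class predictions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


def looks_overtrained(first_file_error, second_file_error, tolerance=0.05):
    """Flag a model whose error on the second file is much worse.

    The tolerance is a hypothetical threshold chosen for the example.
    """
    return second_file_error - first_file_error > tolerance


# Made-up numbers: the errors are close in the first case and far apart in the second.
print(looks_overtrained(0.27, 0.29))
print(looks_overtrained(0.27, 0.45))
```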


This is the moment of truth: either you have a set of predictors relevant to the particular target variable or you don't, and no model can fix this sad circumstance. Then the tedious work of selecting a "target-predictors" pair begins. The models are of no interest at this stage. Once you find the pair, the models are trivial in R: you can pick a dozen of them in a day and build ensembles from them.

So how do you split a file in R? Do you need some special algorithm? It will be interesting to see what comes out in the end.

 
SanSanych Fomenko:


2. There is no problem using R in an EA: everything works and is very stable.

And does it work with MT5? Where can I find code examples? I think it would be better to pass the information through an indicator: in the optimizer it will be easy to compare them once it is connected to the EA, and you will see visually what the forest thinks about the market situation at any given time.

 
Yuriy Asaulenko:

R is a wonderful environment with many advantages over Python. The main one is that R is a modeling environment. Compared to Python, results in R can be obtained faster and more easily.

It is clear that Python, together with its modules, has its own advantages.

By the way, as for RF, it seems that here, just as with neural networks, we can do without the search for and selection of predictors and feed in the normalized time series directly.

You don't even have to normalize it.

 
Maxim Dmitrievsky:

You can even do it without normalization.

That will not work. Each time series segment must be clearly referenced to a certain level, zero for example.
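One possible reading of "referenced to zero", sketched in Python: shift every window so that a chosen reference point becomes zero. Using the last value of the window as the reference is an assumption made for the example, not something stated in the post, and anchor_to_zero is a hypothetical helper name.

```python
def anchor_to_zero(segment):
    """Shift a window of prices so that its last value becomes zero.

    The choice of the last value as the reference level is an assumption.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    last = segment[-1]
    return [value - last for value in segment]


# Two windows with the same shape at different price levels (made-up data).
low = [100.0, 101.0, 103.0]
high = [200.0, 201.0, 203.0]
print(anchor_to_zero(low))
print(anchor_to_zero(high))
```

Without the shift the model sees two unrelated inputs; after it, both windows are the same pattern.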

 
Aleksey Vyazmikin:


I understand the difference between a tree and a forest (or I think I do): forests are better when there is more uncertainty in the data, i.e. a less stable pattern, since a forest makes decisions by voting across random trees (independent because they are shortened). Or am I wrong?

I don't know, I'm judging by the results.

And I don't have the "ada" option, it is not in the screenshot; there is "Forest". Isn't that it?

In order:


Tree

# The 'rpart' package provides the 'rpart' function.


Boost

# Extreme Boost

# The 'xgboost' package implements the extreme gradient boost algorithm.


SVM

# Support vector machine.

# The 'kernlab' package provides the 'ksvm' function.


Linear

# Regression model

# Build a Regression model.


Neural Net

# Neural Network

# Build a neural network model using the nnet package.

library(nnet, quietly=TRUE)


By the way, I did this work for you; you can look it all up yourself in the Log tab. If you have another version of rattle, the list may be different.


So how do you split a file in R? Do you need some special algorithm? It will be interesting to see what comes out in the end.

By index, for example: [1:2000,], [2001:4000,]. It is important to keep the natural time order in the second file.
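The same split by index, sketched in Python rather than R. R's rows 1:2000 and 2001:4000 correspond to the zero-based slices [0:2000] and [2000:4000]; the rows are not shuffled, so the time order is kept in both parts. The data here is a made-up stand-in for the rows of the source file, and split_in_time_order is a hypothetical helper name.

```python
def split_in_time_order(rows, first_size):
    """Cut rows into two consecutive parts without shuffling."""
    if not 0 < first_size < len(rows):
        raise ValueError("first_size must leave both parts non-empty")
    return rows[:first_size], rows[first_size:]


# Stand-in for 4000 chronologically ordered rows, numbered as in R (from 1).
rows = list(range(1, 4001))
first, second = split_in_time_order(rows, 2000)
print(first[0], first[-1], second[0], second[-1])
```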

 
Aleksey Vyazmikin:

And does it work with MT5? Where can I see code examples? For convenience I am interested in something like an indicator: I think it is better to pass the information through an indicator, because in the optimizer it will be easy to compare them once it is connected to the EA, and you will see visually what the forest thinks about the market situation at any given time.

The library has been modified upon my request - I needed a tester from MT5. I did the math, I'm too lazy to look for it, maybe I've cleaned it up.

Have a look at the articles by Vladimir Perervenko

If you are interested in networks, he has the latest word in this area: R, advisors (EAs), and the man is available on the site
 
Aleksey Vyazmikin:

No, collapsed means that one variable has many values, but the number of combinations remains the same. I attached a file, analogous to the previous one for purchases, but in a different representation.

Try it any way you like :) The main thing is not to forget to read the theory, so that you don't do something stupid. Getting started with a package is not difficult: there are plenty of them, some even online, where you don't need to install anything. There is a data science boom now, "it" is everywhere

I have no time to analyze the archives, I'm working on my own stuff
