Discussion of article "Random Forests Predict Trends" - page 13

 
MetaQuotes Software Corp.:

New article Predicting Trends with Random Forests has been published:

By СанСаныч Фоменко


On paper this model predicts very well, but in practice it has a lot of problems.

1. Because the ZigZag signal is chosen as the target, it can be predicted from the simplest of variables. Take, for example, the rank of the current price among the prices of the past 20 periods (moverank_price_20). By construction of the target, the ZigZag signal is upward when moverank_price_20 = 1, and when moverank_price_20 > 1 there is a more than 90% probability that the ZigZag signal is downward. Many such variables can be constructed, so it is easy to get a successful model prediction. But there is an a priori condition: you have to know that the point in question is a ZigZag point. At other points in time, which are not ZigZag points, these variables have no predictive power.

2. This causes a big problem in application: you do not know where a ZigZag leg starts, so you have to calculate the variables on all the data. At that point moverank_price_20 = 1 no longer means that you are at a ZigZag starting point, so you cannot predict the turning point after which the trend goes up.

3. So ZigZag is a difficult target to make work.
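
Point 2 can be illustrated with a minimal sketch in Python (the thread itself works in R; this is only a language-neutral illustration). It assumes that moverank_price_20 is the rank of the current price among the last 20 prices, with 1 meaning the lowest. That definition, the function name move_rank and the toy price series are assumptions, not taken from the article.

```python
def move_rank(prices, window=20):
    """Rank of each price within its trailing window (1 = lowest in the window).

    Assumed definition of moverank_price_20; the article may compute it differently.
    """
    ranks = []
    for i in range(len(prices)):
        if i + 1 < window:
            ranks.append(None)  # not enough history yet
            continue
        recent = prices[i + 1 - window : i + 1]
        # Count how many prices in the window are strictly below the current one.
        ranks.append(1 + sum(p < prices[i] for p in recent))
    return ranks


# Toy series: a steadily falling market, one unit lower on every bar.
falling = [100 - i for i in range(30)]
ranks = move_rank(falling)

# Bars where the rank equals 1, i.e. the current price is a new 20-bar low.
low_bars = [i for i, r in enumerate(ranks) if r == 1]
print(low_bars)
```

In this toy series the rank equals 1 on every bar once 20 bars of history exist, although at most the very last bar could turn out to be a ZigZag low. Calculated on all the data, the variable fires far more often than the pivots it is supposed to mark.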

 
Could you please tell me what to install there under Windows? In Russian, if you can, because something is messed up here: https://rattle.togaware.com/rattle-install-mswindows.html
 

There is a table in the article



                 0      1      MeanDecreaseAccuracy  MeanDecreaseGini
MA_eur.5.dif1    42.97  41.85  54.86                 321.86
EUR.dif3         37.21  46.38  51.80                 177.34
RSI_eur.14       37.70  40.11  50.75                 254.61
EUR.dif2         24.66  31.64  38.24                 110.83
MA_eur.10.dif1   22.94  25.39  31.48                 193.08
CHF.dif3         22.91  23.42  30.15                  73.36
MA_chf.5.dif1    21.81  23.24  29.56                 135.34


But nothing is said about what the figures mean in themselves. How they relate to each other is clear: the higher, the better. But what should the values themselves be, and what do they depend on? In the article the maximum value of MeanDecreaseGini is 321.86, while mine is 1876. Does it depend on the number of predictors, or on something else? And I have a MeanDecreaseAccuracy of 140.22; how do I interpret that? Maybe I should just convert all values to a percentage of the largest value?
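
The rescaling idea from the last sentence can be sketched as follows. This is a minimal Python illustration (not R), using the MeanDecreaseGini column as read from the table above; the helper name percent_of_max is hypothetical. It only re-expresses the ranking inside one model and does not make the numbers comparable between models.

```python
# MeanDecreaseGini values as read from the article's table.
gini = {
    "MA_eur.5.dif1": 321.86,
    "EUR.dif3": 177.34,
    "RSI_eur.14": 254.61,
    "EUR.dif2": 110.83,
    "MA_eur.10.dif1": 193.08,
    "CHF.dif3": 73.36,
    "MA_chf.5.dif1": 135.34,
}


def percent_of_max(scores):
    """Express every score as a percentage of the largest score."""
    top = max(scores.values())
    return {name: round(100.0 * value / top, 1) for name, value in scores.items()}


relative = percent_of_max(gini)

# Print the predictors from most to least important on the relative scale.
for name, pct in sorted(relative.items(), key=lambda item: -item[1]):
    print(f"{name:<16}{pct:6.1f}")
```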

 
Aleksey Vyazmikin:

There's a table in the article



                 0      1      MeanDecreaseAccuracy  MeanDecreaseGini
MA_eur.5.dif1    42.97  41.85  54.86                 321.86
EUR.dif3         37.21  46.38  51.80                 177.34
RSI_eur.14       37.70  40.11  50.75                 254.61
EUR.dif2         24.66  31.64  38.24                 110.83
MA_eur.10.dif1   22.94  25.39  31.48                 193.08
CHF.dif3         22.91  23.42  30.15                  73.36
MA_chf.5.dif1    21.81  23.24  29.56                 135.34


But nothing is said about what the figures mean in themselves. How they relate to each other is clear: the higher, the better. But what should the values themselves be, and what do they depend on? In the article the maximum value of MeanDecreaseGini is 321.86, while mine is 1876. Does it depend on the number of predictors, or on something else? And I have a MeanDecreaseAccuracy of 140.22; how do I interpret that? Maybe I should just convert all values to a percentage of the largest value?

These are internal statistics on how the predictors were used while building the set of trees that together is called randomForest. Comparing them across different models makes no sense at all. I have not managed to compare them even within a single model. If you want to select predictors, you need to use other tools. I have written about this many times in the machine learning thread, and I am not the only one.

 
СанСаныч Фоменко:

These are internal statistics on how the predictors were used while building the set of trees that together is called randomForest. Comparing them across different models makes no sense at all. I have not managed to compare them even within a single model. If you want to select predictors, you need to use other tools. I have written about this many times in the machine learning thread, and I am not the only one.

I see, so it is an estimate within the model, not an absolute one...

Perhaps you did write about it on the forum, but that is a lot of volume to cope with... I read half of the forum through my reader. If you can point me in the right direction, I would be grateful.

 
Aleksey Vyazmikin:

I see, so it is an estimate within the model, not an absolute one...

Perhaps you did write about it on the forum, but that is a lot of volume to cope with... I read half of the forum through my reader. If you can point me in the right direction, I would be grateful.

1. I do not have a short answer, because this is a whole field, called data mining, which is comparable to modelling.

2. The standard scheme for data mining classification models is as follows:

  • define a target variable
  • look for predictors for THIS target variable
  • determine the predictive power of the predictors: one part of a predictor's values should predict one class and the other part the other class, and the smaller the overlap, the better the predictor
  • use packages for determining the importance of predictors; there are many of them, and I am attaching a file with an overview

3. Fit the model on the first half of the file, preferably with cross-validation.

4. Check it on the second half of the file. The results should match.
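
Steps 3 and 4 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the caret workflow: the "model" is a one-threshold rule (fit_stump), the data are a toy series with a few deliberately flipped labels, and every name here is hypothetical.

```python
def fit_stump(xs, ys):
    """Pick the threshold t that maximises training accuracy for 'class 1 if x >= t'."""
    best_t, best_hits = None, -1
    for t in sorted(set(xs)):
        hits = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t


def accuracy(t, xs, ys):
    """Share of rows where the threshold rule agrees with the label."""
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, folds=5):
    """Mean accuracy over contiguous folds: fit on the rest, score on the fold."""
    size = len(xs) // folds
    scores = []
    for k in range(folds):
        lo, hi = k * size, (k + 1) * size
        t = fit_stump(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        scores.append(accuracy(t, xs[lo:hi], ys[lo:hi]))
    return sum(scores) / folds


# Toy data: one predictor with values 0..9; the class is 1 when x >= 5,
# and the label is flipped on every 20th row to imitate noise.
xs = [(i * 7) % 10 for i in range(200)]
ys = [int(x >= 5) ^ int(i % 20 == 0) for i, x in enumerate(xs)]

half = len(xs) // 2
cv_acc = cross_validate(xs[:half], ys[:half])      # step 3: first half, with CV
model = fit_stump(xs[:half], ys[:half])            # final fit on the first half
test_acc = accuracy(model, xs[half:], ys[half:])   # step 4: second half

print(f"cross-validated accuracy, first half: {cv_acc:.3f}")
print(f"accuracy on the second half:          {test_acc:.3f}")
```

The two halves are split in file order rather than shuffled, so the check runs on rows that come after the training rows. If the two figures diverge noticeably, the model has probably fitted noise in the first half.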


For all this you need a lot of READY-MADE tools. The best is caret. It has everything you need. But it is not enough.


PS.

This is R. Outside of it, you can't get any further than inarticulate baby babble.

 
СанСаныч Фоменко:

1. I do not have a short answer, because this is a whole field, called data mining, which is comparable to modelling.

2. The standard scheme for data mining classification models is as follows:

  • define a target variable
  • look for predictors for THIS target variable
  • determine the predictive power of the predictors: one part of a predictor's values should predict one class and the other part the other class, and the smaller the overlap, the better the predictor
  • use packages for determining the importance of predictors; there are many of them, and I am attaching a file with an overview

3. Fit the model on the first half of the file, preferably with cross-validation.

4. Check it on the second half of the file. The results should match.


For all this you need a lot of READY-MADE tools. The best is caret. It has everything you need. But it is not enough.


PS.

This is R. Outside of it, you can't get any further than inarticulate baby babble.

Thanks, I'll keep looking!

 
I installed RStudio and downloaded the Boruta package, but how do I activate it and how do I work with it?
 
Aleksey Vyazmikin:
I installed RStudio and downloaded the Boruta package, but how do I activate it and how do I work with it?

Read the documentation, always, for every package. In RStudio, open the Packages tab, type the package name into the search box and click on the package name that comes up; the Help will open. Or, better, look the package up by name here; there may be links to related materials.

If you are interested in the underlying theory, the help for the functions included in the package will have a link to the theoretical article.

CRAN Packages By Name
  • cran.r-project.org
 
СанСаныч Фоменко:

Read the documentation, always, for every package. In RStudio, open the Packages tab, type the package name into the search box and click on the package name that comes up; the Help will open. Or, better, look the package up by name here; there may be links to related materials.

If you are interested in the underlying theory, the help for the functions included in the package will have a link to the theoretical article.

Thanks!

So I opened the PDF with the description, and the settings left me dumbfounded: so many things are required, and I do not know what half of them mean.

Is there something simpler, even if less reliable, and preferably with a GUI?

In general, it would be very useful if you wrote articles on this topic, with details of where and how!