Machine learning in trading: theory, models, practice and algo-trading - page 486

 
I don't know how it is counted there; mine counts the percentage of wrong guesses.
For example, with this variant there were 1000 entries, of which 600 worked out in the plus (guessed right) and 400 worked out in the minus (guessed wrong). So the error is the number of wrong guesses relative to all entries; in this example, error = 400/1000 = 0.4.

Respectfully.
 
Andrey Kisselyov:
I don't know how it is counted there; mine counts the percentage of wrong guesses.
For example, with this variant there were 1000 entries, of which 600 worked out in the plus (guessed right) and 400 worked out in the minus (guessed wrong). So the error is the number of wrong guesses relative to all entries; in this example, error = 400/1000 = 0.4.

Respectfully.

Here, as I understand it, the final error is for some reason divided by the number of examples multiplied by the number of inputs. If you remove this line:

return(result/(npoints*df.m_nclasses));

If you multiply it back, you get a quite clear result, for example 0.5578064232767638 :)
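To make the arithmetic concrete, a minimal sketch (not the ALGLIB source; all numbers are hypothetical) of how the returned figure relates to the accumulated error:

//+------------------------------------------------------------------+
//| Sketch: how the returned figure relates to the accumulated error.|
//| All numbers are hypothetical, for illustration only.             |
//+------------------------------------------------------------------+
void OnStart()
  {
   int    npoints =1000;       // hypothetical training set size
   int    nclasses=1;          // one output
   double result  =0.5578;     // hypothetical accumulated error
   // what the return statement above produces: a hard-to-read figure
   double returned=result/(npoints*nclasses);   // 0.0005578
   // multiplying back restores the readable figure
   double restored=returned*npoints*nclasses;   // 0.5578
   Print("returned=",DoubleToString(returned,7),
         "  restored=",DoubleToString(restored,4));
  }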


 
Maxim Dmitrievsky:

Here, as I understand it, the final error is for some reason divided by the number of examples multiplied by the number of inputs. If you remove this line:

If you multiply it back, you get a quite clear result, for example 0.5578064232767638 :)


Most likely, npoints in
return(result/(npoints*df.m_nclasses));
means _Point (points), something like guessed points out of ... or vice versa.

Respectfully.
 
Andrey Kisselyov:
Most likely, npoints in it means _Point (points), something like guessed points out of ... or vice versa.

Respectfully.

No, here npoints means the length of the input vector :)

And nclasses is the number of outputs, how about that.

//+------------------------------------------------------------------+
//| This subroutine builds random decision forest.                   |
//| INPUT PARAMETERS:                                                |
//|     XY          -   training set                                 |
//|     NPoints     -   training set size, NPoints>=1                |
//|     NVars       -   number of independent variables, NVars>=1    |
//|     NClasses    -   task type:                                   |
//|                     * NClasses=1 - regression task with one      |
//|                                    dependent variable            |
//|                     * NClasses>1 - classification task with      |
//|                                    NClasses classes.             |
//|     NTrees      -   number of trees in a forest, NTrees>=1.      |
//|                     recommended values: 50-100.                  |
//|     R           -   percent of a training set used to build      |
//|                     individual trees. 0<R<=1.                    |
//|                     recommended values: 0.1 <= R <= 0.66.        |
//| OUTPUT PARAMETERS:                                               |
//|     Info        -   return code:                                 |
//|                     * -2, if there is a point with class number  |
//|                           outside of [0..NClasses-1].            |
//|                     * -1, if incorrect parameters was passed     |
//|                           (NPoints<1, NVars<1, NClasses<1,       |
//|                           NTrees<1, R<=0 or R>1).                |
//|                     *  1, if task has been solved                |
//|     DF          -   model built                                  |
//|     Rep         -   training report, contains error on a training|
//|                     set and out-of-bag estimates of              |
//|                     generalization error.                        |
//+------------------------------------------------------------------+
static void CDForest::DFBuildRandomDecisionForest(CMatrixDouble &xy,
                                                  const int npoints,
                                                  const int nvars,
                                                  const int nclasses,
                                                  const int ntrees,
                                                  const double r,int &info,
                                                  CDecisionForest &df,
                                                  CDFReport &rep)

I.e., the final error should be multiplied by the length of the training sample multiplied by the number of outputs (if there is one output, we omit that factor).

Might be useful to someone.

I hope I haven't mixed anything up and did everything right :) at least the error values now make sense.
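For reference, a minimal sketch of calling this routine on toy data. The include path, the matrix element access, and the report field names (m_rmserror, m_oobrmserror) are my assumptions about the MQL5 ALGLIB port and may differ between builds:

#include <Math\Alglib\alglib.mqh>
//+------------------------------------------------------------------+
//| Sketch: build a forest on toy data and print the reported errors.|
//+------------------------------------------------------------------+
void OnStart()
  {
   int    npoints =100;   // training set size
   int    nvars   =2;     // independent variables
   int    nclasses=1;     // 1 => regression with one output
   int    ntrees  =50;    // recommended 50-100
   double r       =0.66;  // share of the set used per tree
   CMatrixDouble xy;
   xy.Resize(npoints,nvars+1);          // last column holds the target
   for(int i=0;i<npoints;i++)
     {
      double x0=MathRand()/32767.0;
      double x1=MathRand()/32767.0;
      xy[i].Set(0,x0);
      xy[i].Set(1,x1);
      xy[i].Set(2,x0+x1);               // target: a simple known function
     }
   int info=0;
   CDecisionForest df;
   CDFReport      rep;
   CDForest::DFBuildRandomDecisionForest(xy,npoints,nvars,nclasses,
                                         ntrees,r,info,df,rep);
   // training-set and out-of-bag estimates from the report
   Print("info=",info,
         "  train rms=",rep.m_rmserror,
         "  oob rms=",rep.m_oobrmserror);
  }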
 
Maxim Dmitrievsky:

No, here npoints means the length of the input vector :)

Then you need to look at what result is, since the divisor consists of the input parameters.

Respectfully.
 
Andrey Kisselyov:
Then you need to look at what result is, since the divisor consists of the input parameters.

Respectfully.

In short, it's just the average error over all examples, and you don't need it... result returns just the total error, which is then divided by the number of examples in the sample (this can be removed).

 
Maxim Dmitrievsky:

In short, it's just the average error over all examples, and you don't need it... result returns just the total error, which is then divided by the number of examples in the sample

So it needs to be brought back to normal, which you did by multiplying by the divisor.

I.e., the final error should be multiplied by the length of the training sample multiplied by the number of outputs (if there is one output, we omit that factor)



Respectfully.

 
Andrey Kisselyov:

So it needs to be brought back to normal, which you did by multiplying by the divisor.

Respectfully.


That feeling when you tweak something and then rejoice like a child :)

 
Maxim Dmitrievsky:

That feeling when you tweak something and then rejoice like a child :)

Go look in the cupboard and take a candy from the vase.

Respectfully.
 
Maxim Dmitrievsky:

In theory, the error in random forests should be small, because when the forest is built all variables are used in the decision trees, and there is no restriction on memory usage like the number of neurons in neural networks. There you can only "blur" the result with separate operations, such as restricting tree depth, pruning, or bagging. I don't know whether there is pruning in the MQL implementation of ALGLIB, but there is bagging:

//|     R           -   percent of a training set used to build      |
//|                     individual trees. 0<R<=1.                    |
//|                     recommended values: 0.1 <= R <= 0.66.        |

If you make this variable smaller than 1, the error should increase.
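One quick way to check would be to rebuild the forest with different R values and compare the training and out-of-bag errors, something along these lines (a sketch with a hypothetical helper name, and the same assumptions about the port's report fields as in the example above):

//+------------------------------------------------------------------+
//| Sketch: rebuild the forest with different R and watch the errors.|
//| Assumes xy,npoints,nvars,nclasses,ntrees prepared as above.      |
//+------------------------------------------------------------------+
void TestBaggingShare(CMatrixDouble &xy,const int npoints,const int nvars,
                      const int nclasses,const int ntrees)
  {
   double shares[]={1.0,0.66,0.33,0.1};
   for(int k=0;k<ArraySize(shares);k++)
     {
      int info=0;
      CDecisionForest df;
      CDFReport      rep;
      CDForest::DFBuildRandomDecisionForest(xy,npoints,nvars,nclasses,
                                            ntrees,shares[k],info,df,rep);
      // smaller R => each tree sees fewer examples, so the error on the
      // training set should grow, while the out-of-bag estimate stays honest
      Print("R=",shares[k],
            "  train rms=",rep.m_rmserror,
            "  oob rms=",rep.m_oobrmserror);
     }
  }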
