Machine learning in trading: theory, models, practice and algo-trading - page 486

 
I don't know how it is counted there; mine counts the percentage of wrong guesses.
For example, with this variant there were 1000 entries, of which 600 worked out in the plus (guessed right) and 400 worked out in the minus (guessed wrong). So the error is the number of wrong guesses relative to all entries; in this example, error = 400/1000 = 0.4.

Respectfully.
 
Andrey Kisselyov:
I don't know how it is counted there; mine counts the percentage of wrong guesses.
For example, with this variant there were 1000 entries, of which 600 worked out in the plus (guessed right) and 400 worked out in the minus (guessed wrong). So the error is the number of wrong guesses relative to all entries; in this example, error = 400/1000 = 0.4.

Respectfully.

Here, as I understand it, the final error is for some reason divided by the number of examples multiplied by the number of inputs. If you remove this line:

return(result/(npoints*df.m_nclasses));

If you multiply it back, you get a quite clear result, for example 0.5578064232767638 :)
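To make the arithmetic concrete, a minimal sketch (not the ALGLIB source; all numbers are hypothetical) of how the returned figure relates to the accumulated error:

//+------------------------------------------------------------------+
//| Sketch: how the returned figure relates to the accumulated error.|
//| All numbers are hypothetical, for illustration only.             |
//+------------------------------------------------------------------+
void OnStart()
  {
   int    npoints =1000;       // hypothetical training set size
   int    nclasses=1;          // one output
   double result  =0.5578;     // hypothetical accumulated error
   // what the return statement above produces: a hard-to-read figure
   double returned=result/(npoints*nclasses);   // 0.0005578
   // multiplying back restores the readable figure
   double restored=returned*npoints*nclasses;   // 0.5578
   Print("returned=",DoubleToString(returned,7),
         "  restored=",DoubleToString(restored,4));
  }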


 
Maxim Dmitrievsky:

Here, as I understand it, the final error is for some reason divided by the number of examples multiplied by the number of inputs. If you remove this line:

If you multiply it back, you get a quite clear result, for example 0.5578064232767638 :)


Most likely, npoints in
return(result/(npoints*df.m_nclasses));
means _Point (points), something like guessed points out of ... or vice versa.

Respectfully.
 
Andrey Kisselyov:
Most likely, npoints in it means _Point (points), something like guessed points out of ... or vice versa.

Respectfully.

No, here npoints means the length of the input vector :)

And nclasses is the number of outputs, how about that.

//+------------------------------------------------------------------+
//| This subroutine builds random decision forest.                   |
//| INPUT PARAMETERS:                                                |
//|     XY          -   training set                                 |
//|     NPoints     -   training set size, NPoints>=1                |
//|     NVars       -   number of independent variables, NVars>=1    |
//|     NClasses    -   task type:                                   |
//|                     * NClasses=1 - regression task with one      |
//|                                    dependent variable            |
//|                     * NClasses>1 - classification task with      |
//|                                    NClasses classes.             |
//|     NTrees      -   number of trees in a forest, NTrees>=1.      |
//|                     recommended values: 50-100.                  |
//|     R           -   percent of a training set used to build      |
//|                     individual trees. 0<R<=1.                    |
//|                     recommended values: 0.1 <= R <= 0.66.        |
//| OUTPUT PARAMETERS:                                               |
//|     Info        -   return code:                                 |
//|                     * -2, if there is a point with class number  |
//|                           outside of [0..NClasses-1].            |
//|                     * -1, if incorrect parameters was passed     |
//|                           (NPoints<1, NVars<1, NClasses<1,       |
//|                           NTrees<1, R<=0 or R>1).                |
//|                     *  1, if task has been solved                |
//|     DF          -   model built                                  |
//|     Rep         -   training report, contains error on a training|
//|                     set and out-of-bag estimates of              |
//|                     generalization error.                        |
//+------------------------------------------------------------------+
static void CDForest::DFBuildRandomDecisionForest(CMatrixDouble &xy,
                                                  const int npoints,
                                                  const int nvars,
                                                  const int nclasses,
                                                  const int ntrees,
                                                  const double r,int &info,
                                                  CDecisionForest &df,
                                                  CDFReport &rep)

I.e., the final error should be multiplied by the length of the training sample multiplied by the number of outputs (if there is one output, we omit that factor).

Might be useful to someone.

I hope I haven't mixed anything up and did everything right :) at least the error values now make sense.
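For reference, a minimal sketch of calling this routine on toy data. The include path, the matrix element access, and the report field names (m_rmserror, m_oobrmserror) are my assumptions about the MQL5 ALGLIB port and may differ between builds:

#include <Math\Alglib\alglib.mqh>
//+------------------------------------------------------------------+
//| Sketch: build a forest on toy data and print the reported errors.|
//+------------------------------------------------------------------+
void OnStart()
  {
   int    npoints =100;   // training set size
   int    nvars   =2;     // independent variables
   int    nclasses=1;     // 1 => regression with one output
   int    ntrees  =50;    // recommended 50-100
   double r       =0.66;  // share of the set used per tree
   CMatrixDouble xy;
   xy.Resize(npoints,nvars+1);          // last column holds the target
   for(int i=0;i<npoints;i++)
     {
      double x0=MathRand()/32767.0;
      double x1=MathRand()/32767.0;
      xy[i].Set(0,x0);
      xy[i].Set(1,x1);
      xy[i].Set(2,x0+x1);               // target: a simple known function
     }
   int info=0;
   CDecisionForest df;
   CDFReport      rep;
   CDForest::DFBuildRandomDecisionForest(xy,npoints,nvars,nclasses,
                                         ntrees,r,info,df,rep);
   // training-set and out-of-bag estimates from the report
   Print("info=",info,
         "  train rms=",rep.m_rmserror,
         "  oob rms=",rep.m_oobrmserror);
  }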
 
Maxim Dmitrievsky:

No, here npoints means the length of the input vector :)

Then you need to look at what result is, since the divisor consists of the input parameters.

Respectfully.
 
Andrey Kisselyov:
Then you need to look at what result is, since the divisor consists of the input parameters.

Respectfully.

In short, it's just the average error over all examples, and you don't need it... result returns just the total error, which is then divided by the number of examples in the sample (this can be removed).

 
Maxim Dmitrievsky:

In short, it's just the average error over all examples, and you don't need it... result returns just the total error, which is then divided by the number of examples in the sample

So it needs to be brought back to normal, which you did by multiplying by the divisor.

I.e., the final error should be multiplied by the length of the training sample multiplied by the number of outputs (if there is one output, we omit that factor)



Respectfully.

 
Andrey Kisselyov:

So it needs to be brought back to normal, which you did by multiplying by the divisor.

Respectfully.


That feeling when you tweak something and then rejoice like a child :)

 
Maxim Dmitrievsky:

That feeling when you tweak something and then rejoice like a child :)

Go look in the cupboard and take a candy from the vase.

Respectfully.
 
Maxim Dmitrievsky:

In theory, the error in random forests should be small, because when the forest is built all variables are used in the decision trees, and there is no restriction on memory usage like the number of neurons in neural networks. There you can only "blur" the result with separate operations, such as restricting tree depth, pruning, or bagging. I don't know whether there is pruning in the MQL implementation of ALGLIB, but there is bagging:

//|     R           -   percent of a training set used to build      |
//|                     individual trees. 0<R<=1.                    |
//|                     recommended values: 0.1 <= R <= 0.66.        |

If you make this variable smaller than 1, the error should increase.
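One quick way to check would be to rebuild the forest with different R values and compare the training and out-of-bag errors, something along these lines (a sketch with a hypothetical helper name, and the same assumptions about the port's report fields as in the example above):

//+------------------------------------------------------------------+
//| Sketch: rebuild the forest with different R and watch the errors.|
//| Assumes xy,npoints,nvars,nclasses,ntrees prepared as above.      |
//+------------------------------------------------------------------+
void TestBaggingShare(CMatrixDouble &xy,const int npoints,const int nvars,
                      const int nclasses,const int ntrees)
  {
   double shares[]={1.0,0.66,0.33,0.1};
   for(int k=0;k<ArraySize(shares);k++)
     {
      int info=0;
      CDecisionForest df;
      CDFReport      rep;
      CDForest::DFBuildRandomDecisionForest(xy,npoints,nvars,nclasses,
                                            ntrees,shares[k],info,df,rep);
      // smaller R => each tree sees fewer examples, so the error on the
      // training set should grow, while the out-of-bag estimate stays honest
      Print("R=",shares[k],
            "  train rms=",rep.m_rmserror,
            "  oob rms=",rep.m_oobrmserror);
     }
  }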
