Machine learning in trading: theory, models, practice and algo-trading - page 1266

 
Andrey Khatimlianskii:

I'm a guest in this thread. I just dropped by to share an article.

Numerai has fallen for Karl Marx's slogan "Workers of the world, unite!", in the ML interpretation of Geoffrey Hinton :)
 
Maxim Dmitrievsky:

And that, by the way, is exactly the point.

Considering the number of morons on the forum (including your favorite magician Mudo), I don't think it's worth supporting this thread any longer, because I haven't gotten any benefit out of it for myself.

Maxim, you are wrong! There is a benefit for you: the benefit lies in the very formulation and presentation of the problems. However, I'm not trying to persuade you.

 
Yuriy Asaulenko:

Maxim, you are wrong! There is a benefit for you: the benefit lies in the very formulation and presentation of the problems. However, I'm not trying to persuade you.

Well, you can see for yourself that clinical morons live here; it's not a forum persona, but real clinical cases. You say one word to them, they give you two back, with every message.

 
elibrarius:

I found some obscure code in the Alglib forests. Here is the full code of the cross-entropy calculation function from dataanalysis.mqh:

//+------------------------------------------------------------------+
//| Average cross-entropy (in bits per element) on the test set      |
//| INPUT PARAMETERS:                                                |
//|     DF      -   decision forest model                            |
//|     XY      -   test set                                         |
//|     NPoints -   test set size                                    |
//| RESULT:                                                          |
//|     CrossEntropy/(NPoints*LN(2)).                                |
//|     Zero if model solves regression task.                        |
//+------------------------------------------------------------------+
static double CDForest::DFAvgCE(CDecisionForest &df,CMatrixDouble &xy,
                                const int npoints)
  {
//--- create variables
   double result=0;
   int    i=0;
   int    j=0;
   int    k=0;
   int    tmpi=0;
   int    i_=0;
//--- creating arrays
   double x[];
   double y[];
//--- allocation
   ArrayResizeAL(x,df.m_nvars);
   ArrayResizeAL(y,df.m_nclasses);
//--- initialization
   result=0;
   for(i=0;i<=npoints-1;i++)
     {
      for(i_=0;i_<=df.m_nvars-1;i_++)
         x[i_]=xy[i][i_];
      //--- function call
      DFProcess(df,x,y);
      //--- check
      if(df.m_nclasses>1)
        {
         //--- classification-specific code
         k=(int)MathRound(xy[i][df.m_nvars]);
         tmpi=0;
         for(j=1;j<=df.m_nclasses-1;j++)
           {
            //--- check
            if(y[j]>(double)(y[tmpi]))
               tmpi=j;
           }

         //--- check
         if(y[k]!=0.0)
            result=result-MathLog(y[k]);
         else
            result=result-MathLog(CMath::m_minrealnumber);
        }
     }
//--- return result
   return(result/npoints);
  }

The fragment that computes tmpi (the loop over j) produces a value that is never used anywhere further down in the code. Why is it included at all?
Either something is missing, or the code wasn't fully cleaned up.
I actually started digging into this function because I wanted to examine an individual tree. And when I set the number of trees in the forest to 1, I saw that all the other errors stay between 0 and 1, while this one comes out anywhere from 100 to 300+.
Does anyone understand cross-entropy? Is this code even correct, or is something left unfinished?

According to Wikipedia it should be

H(p, q) = -\sum_{x} p(x) \log q(x)

which, for one-hot true labels, reduces to the average logloss -(1/N) \sum_{i} \log q_i(k_i), where k_i is the correct class of sample i.
The value can, in general, go to infinity when calculating logloss if zero probability is predicted for the correct class, since the formula includes all the other classes with a zero coefficient anyway. It seems someone tried to work around this glitch here: the loop finds, in tmpi, the class with the highest predicted probability for the sample, perhaps intending to add it to the formula, but apparently never thought it through :)
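To make the mechanism concrete: the code above clips a zero prediction to CMath::m_minrealnumber (on the order of 1e-300), so each such sample contributes roughly -MathLog(1e-300) ≈ 690 nats to the sum, which is how the average can land in the hundreds when a single tree outputs hard 0/1 probabilities. Below is a minimal sketch of that clipped logloss, outside Alglib, with hypothetical names (AvgLogLoss, probs, labels) and assuming one-hot true labels:

//+------------------------------------------------------------------+
//| Clipped multiclass logloss, illustration only (not Alglib code)  |
//+------------------------------------------------------------------+
double AvgLogLoss(const double &probs[],  // flattened [npoints x nclasses] predicted probabilities
                  const int    &labels[], // true class index of each sample
                  const int     npoints,
                  const int     nclasses)
  {
   double eps=1e-300;                       // clip floor, same role as CMath::m_minrealnumber
   double sum=0.0;
   for(int i=0;i<npoints;i++)
     {
      double p=probs[i*nclasses+labels[i]]; // probability predicted for the true class
      if(p<eps)
         p=eps;                             // clipping prevents log(0) = -infinity
      sum-=MathLog(p);
     }
   return(sum/npoints);                     // average logloss in nats per sample
  }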
 
Ivan Negreshniy:
The value can, in general, go to infinity when calculating logloss if zero probability is predicted for the correct class, since the formula includes all the other classes with a zero coefficient anyway. It seems someone tried to work around this glitch here: the loop finds, in tmpi, the class with the highest predicted probability for the sample, perhaps intending to add it to the formula, but apparently never thought it through :)
tmpi is actually used in only 1 of the 5 error functions. Apparently that function served as a blank for the others, and they forgot to remove it from the rest.
In total, 1 function has tmpi and uses it, and 2 more have it but don't use it.
Either way, it doesn't affect how the code works.
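For comparison, this is the pattern in which the argmax that tmpi computes does get used, namely a classification-error count, where the result naturally stays in [0, 1]. A sketch only, with hypothetical names (AvgClsError, probs, labels), not the literal Alglib function:

//+------------------------------------------------------------------+
//| Misclassification rate, illustration of where the argmax matters |
//| (not the literal Alglib error function)                          |
//+------------------------------------------------------------------+
double AvgClsError(const double &probs[],  // flattened [npoints x nclasses] predicted probabilities
                   const int    &labels[], // true class index of each sample
                   const int     npoints,
                   const int     nclasses)
  {
   int errors=0;
   for(int i=0;i<npoints;i++)
     {
      int tmpi=0;                                    // index of the most probable class
      for(int j=1;j<nclasses;j++)
         if(probs[i*nclasses+j]>probs[i*nclasses+tmpi])
            tmpi=j;
      if(tmpi!=labels[i])                            // here the argmax is actually used
         errors++;
     }
   return((double)errors/npoints);                   // fraction misclassified, always in [0, 1]
  }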
 
elibrarius:
tmpi is actually used in only 1 of the 5 error functions. Apparently that function served as a blank for the others, and they forgot to remove it from the rest.
In total, 1 function has tmpi and uses it, and 2 more have it but don't use it.
Either way, it doesn't affect how the code works.

What I'm getting at is that a good error formula could take the probability distribution over all classes into account, not just the single correct one.

That said, if even one sample has zero predicted probability for its correct class, everything flies off to infinity.

Apparently that's why I prefer regression with a quadratic error :)
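A sketch of the kind of quadratic error Ivan mentions, computed over the whole predicted distribution (Brier-score style): it uses every class probability, not just the correct one, and stays bounded even when the true class gets zero probability. Names (AvgQuadraticError, probs, labels) are hypothetical; this is an illustration, not Alglib code:

//+------------------------------------------------------------------+
//| Quadratic (Brier-style) error over the full class distribution   |
//| (illustration only, not part of Alglib)                          |
//+------------------------------------------------------------------+
double AvgQuadraticError(const double &probs[],  // flattened [npoints x nclasses] predictions
                         const int    &labels[], // true class index of each sample
                         const int     npoints,
                         const int     nclasses)
  {
   double sum=0.0;
   for(int i=0;i<npoints;i++)
     {
      for(int j=0;j<nclasses;j++)
        {
         double target=(j==labels[i]) ? 1.0 : 0.0; // one-hot target for class j
         double d=probs[i*nclasses+j]-target;
         sum+=d*d;                                 // squared error over every class, not just the true one
        }
     }
   return(sum/npoints);                            // bounded per sample, never flies off to infinity
  }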

 
Ivan Negreshniy:
What I'm getting at is that a good error formula could take the probability distribution over all classes into account, not just the single correct one.
Well, there are 5 error functions. This one is kind of strange, but the other 4 stay between 0 and 1, as they should. So there is a choice)
 
Now let Kesha (SanSanych's grandson) and Alyosha, who was done in by investors, lead this thread. That would be fair.
 
Alexander_K2:
Now let Kesha (SanSanych's grandson) and Alyosha, who was done in by investors, lead this thread. That would be fair.

It makes more sense to drop this thread and start a new, more adequate one, along with the other related topics.

By the way, I found a normal distribution in prices. I have already written in Tip that all the abnormality comes from "wrong" data processing; we introduce it ourselves.)

I'll post it in the Python thread in the next few days, or sooner.

 
Yuriy Asaulenko:

It makes more sense to drop this thread and start a new, more adequate one, along with the other related topics.

By the way, I found a normal distribution in prices. I have already written in Tip that all the abnormality comes from "wrong" data processing; we introduce it ourselves.)

I'll post it in the Python thread in the next few days, or sooner.

Alas, with no one of the level of Matemat, Northwind and Prival left on the forum, all these topics have no future. IMHO.
