Machine learning in trading: theory, models, practice and algo-trading - page 1259

 
I.e., if there are 20 predictors, I don't know what Bayes will come up with if it undertrains.
 
Maxim Dmitrievsky:

Not quite, it should be clearer here

https://habr.com/ru/post/276355/

The range of applications is large; how exactly it will be used is another question.

I will read it tonight

 
Maxim Dmitrievsky:

We have nothing to talk to you about, because I can see from your psycho-type that you are either a child or just... whatever, forget it.

"Us" as in you? You should find a job for starters, it's a shame to sit on your parents' shoulders at such an age.

 
Maxim Dmitrievsky:

Not quite, it should be clearer here

https://habr.com/ru/post/276355/

the range of applications is large; how exactly it will be used is another question

Bottom line from part 2 of the article:
Dropout can be seen as a cheap version of Bayesianism, and a very simple one. The idea is based on the same analogy with ensembles that I mentioned at the end of the last post: imagine you have a neural network. Now imagine you take it, randomly tear off a few neurons, and set that copy aside. After ~1000 such operations you get an ensemble of a thousand networks, each slightly and randomly different from the others. We average their predictions, and the random deviations partly cancel each other out and give us actual predictions. Now imagine you have a Bayesian network, and you draw a set of its weights from their uncertainty a thousand times: you get the same kind of ensemble of slightly different networks.

What's cooler about the Bayesian approach is that it allows you to use that randomness in a controlled way.

....

In practice this translates into the fact that a Bayesian network gives better results than a dropout network, albeit not by much.

What makes dropout cooler is that it is very simple, of course.

That is, a deep NN with dropout is analogous to a Bayesian network. And dropout is available in many packages, so you can first use it to check whether there is anything worth catching in your predictors/targets, and then try to improve the results with the Bayesian approach. Dropout was used in Vladimir Perervenko's articles, so you can experiment starting from them.
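(Just to make the "tear off a few neurons ~1000 times and average" idea concrete, here is a minimal Python/Keras sketch of Monte-Carlo dropout, i.e. keeping dropout active at prediction time. The toy data, the 20 predictors, the layer sizes and the 0.5 dropout rate are all made up for illustration; this is not the code from the article or from Perervenko's articles.)

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# toy data: 20 hypothetical predictors, binary target (illustration only)
X = np.random.randn(256, 20).astype("float32")
y = (X[:, 0] + 0.1 * np.random.randn(256) > 0).astype("float32")

model = tf.keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(100, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(100, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)

# "tear off a few neurons" ~1000 times: keep dropout active at prediction
# time (training=True) and average the ensemble of slightly different nets
preds = np.stack([model(X, training=True).numpy() for _ in range(1000)])
mean_pred = preds.mean(axis=0)   # averaged prediction
std_pred  = preds.std(axis=0)    # spread of the ensemble = rough uncertainty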

I experimented, but did not find any breakthrough on my predictors.
But I didn't make the networks so deep that 1000 neurons would have to be dropped from them. I haven't tried anything bigger than N-100-100-100-100-1 (401 neurons). So maybe 100 neurons were being dropped, but not 1000. To drop 1000 you need a network with 4-10 thousand neurons, and it would probably take a long time to train.
It is possible that forests with 1000 trees would give similar results while training much faster than an NN.
 
elibrarius:
I haven't tried anything bigger than N-100-100-100-100-1 (401 neurons).

Such monsters don't train properly. Imho, you need a simpler NN, somewhere up to 100 neurons.

 
elibrarius:
Bottom line from part 2 of the article:

That is, a deep NN with dropout is analogous to a Bayesian network. And dropout is available in many packages, so you can first use it to check whether there is anything worth catching in your predictors/targets, and then try to improve the results with the Bayesian approach. Dropout was used in Vladimir Perervenko's articles, so you can experiment starting from them.

I experimented, but did not find any breakthrough on my predictors.
But I didn't make the networks so deep that 1000 neurons would have to be dropped from them. I haven't tried anything bigger than N-100-100-100-100-1 (401 neurons). So maybe 100 neurons were being dropped, but not 1000. To drop 1000 you need a network with 4-10 thousand neurons, and it would probably take a long time to train.
It is possible that forests with 1000 trees would give similar results while training much faster than an NN.

Oh, I don't know. Beyond some point, with any number of trees/forests the accuracy stops growing and the extra trees just sit there as ballast, not improving anything. Drop them or not, it's like a poultice on a corpse — it makes no difference.

That's a crude comparison between Bayes and dropout, in my opinion, but I'm still not very strong in this area, so I won't argue; maybe it's true.

 
Maxim Dmitrievsky:

A crude comparison between Bayes and dropout in my opinion, but I'm still not very strong in this area, so I won't argue, maybe so

It was the author of the article who made the comparison, not me. And he based his article on another, larger one, in which the experiments were done. That is, this comparison apparently comes from the developers of the method.

 
Maxim Dmitrievsky:

Oh, I don't know. Beyond some point, with any number of trees/forests the accuracy stops growing and the extra trees just sit there as ballast, not improving anything. Drop them or not, it's like a poultice on a corpse — it makes no difference.

If you're going to build a forest with 1000 trees, you should probably feed about 1% of the data to each one, i.e. r=0.01, rather than the 0.1 ... 0.6 recommended for forests with 100 trees.
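(For what it's worth, a sketch of the same idea in Python/scikit-learn rather than Alglib; the max_samples parameter plays roughly the role of Alglib's r, i.e. the fraction of the training set each tree is fitted on. The data here is synthetic, just to make it runnable.)

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic data: 10 000 rows, 20 predictors, binary target
X = np.random.randn(10_000, 20)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1000 trees, each fitted on ~1% of the training set (analogue of r=0.01),
# versus the usual 100 trees fitted on half of the data each
big_forest   = RandomForestClassifier(n_estimators=1000, max_samples=0.01,
                                      bootstrap=True, random_state=0).fit(X_tr, y_tr)
small_forest = RandomForestClassifier(n_estimators=100, max_samples=0.5,
                                      bootstrap=True, random_state=0).fit(X_tr, y_tr)

print("1000 trees @ 1%  :", big_forest.score(X_te, y_te))
print(" 100 trees @ 50% :", small_forest.score(X_te, y_te))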
 

I found some obscure code in the Alglib forests. The full code of the cross-entropy calculation function from dataanalysis.mqh:

//+------------------------------------------------------------------+
//| Average cross-entropy (in bits per element) on the test set      |
//| INPUT PARAMETERS:                                                |
//|     DF      -   decision forest model                            |
//|     XY      -   test set                                         |
//|     NPoints -   test set size                                    |
//| RESULT:                                                          |
//|     CrossEntropy/(NPoints*LN(2)).                                |
//|     Zero if model solves regression task.                        |
//+------------------------------------------------------------------+
static double CDForest::DFAvgCE(CDecisionForest &df,CMatrixDouble &xy,
                                const int npoints)
  {
//--- create variables
   double result=0;
   int    i=0;
   int    j=0;
   int    k=0;
   int    tmpi=0;
   int    i_=0;
//--- creating arrays
   double x[];
   double y[];
//--- allocation
   ArrayResizeAL(x,df.m_nvars);
   ArrayResizeAL(y,df.m_nclasses);
//--- initialization
   result=0;
   for(i=0;i<=npoints-1;i++)
     {
      for(i_=0;i_<=df.m_nvars-1;i_++)
         x[i_]=xy[i][i_];
      //--- function call
      DFProcess(df,x,y);
      //--- check
      if(df.m_nclasses>1)
        {
         //--- classification-specific code
         k=(int)MathRound(xy[i][df.m_nvars]);
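         //--- NOTE: the block below computes tmpi (the index of the class with
         //---       the highest predicted probability), but tmpi is never used
         //---       again; this is the fragment discussed after the listing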
         tmpi=0;
         for(j=1;j<=df.m_nclasses-1;j++)
           {
            //--- check
            if(y[j]>(double)(y[tmpi]))
               tmpi=j;
           }

         //--- check
         if(y[k]!=0.0)
            result=result-MathLog(y[k]);
         else
            result=result-MathLog(CMath::m_minrealnumber);
        }
     }
//--- return result
   return(result/npoints);
  }

The fragment computing tmpi (marked with a note in the listing) calculates something that is not used anywhere further in the code. So why is it there?
Either something is missing, or the code was not completely cleaned up.
I started digging into this function because I wanted to examine a single tree. When I set the number of trees in the forest to 1, I saw that all the other errors are between 0 and 1, while this one can reach 100-300+.
Can someone who understands cross-entropy tell me whether this code is even correct, or is something unfinished?

According to Wikipedia it should be

H(p, q) = -Σ_x p(x) · log q(x),

which for a test set with one-hot class labels reduces to averaging -log q(true class) over the points.
 

Compared with CatBoost — it returns normal values, usually > 0.5 on the test set... well, as usual.
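(To make the comparison concrete, here is a small Python/numpy sketch of average cross-entropy computed the way the header comment above promises, i.e. divided by ln(2) to get bits per element, whereas the Alglib code actually returns nats. The clipping constant stands in for m_minrealnumber and is my own choice. It also shows how a single tree that outputs hard 0/1 "probabilities" blows the value up, which is presumably why the metric jumps to 100-300+ with ntrees = 1.)

import numpy as np

def avg_cross_entropy(probs, true_class, bits=True):
    # probs: (n_points, n_classes) predicted class probabilities
    # true_class: (n_points,) integer labels
    p = probs[np.arange(len(true_class)), true_class]  # probability of the true class
    p = np.clip(p, 1e-300, 1.0)        # guard against log(0), like m_minrealnumber
    ce = -np.log(p).mean()             # nats per element
    return ce / np.log(2.0) if bits else ce

# a single tree votes with hard 0/1 probabilities: one confident mistake
# contributes -log(1e-300) ~ 690 nats, so the average explodes
probs = np.array([[1.0, 0.0],   # true class 1 gets probability 0 -> huge penalty
                  [0.9, 0.1],
                  [0.2, 0.8]])
true  = np.array([1, 0, 1])
print(avg_cross_entropy(probs, true))   # hundreds, while other metrics stay in [0, 1]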

I'll look at the calculation tomorrow; maybe some debugging code was not removed.

In general, this metric doesn't matter here, because it is not used for early stopping or anything else... and as a result it is uninformative. The classification error is taken instead, and that's all.