Machine learning in trading: theory, models, practice and algo-trading - page 1260

 
Maxim Dmitrievsky:
compared to CatBoost - it returns normal values, usually > 0.5 on the test... as usual

Did you check a forest of 1 tree in CatBoost? And does CatBoost output cross-entropy or some other error?

Alglib gives 5 different errors (my example with 1 tree produced):

Alert: Training set error estimate report: relclserror=0.267; avgce=184.207; rmserror=0.516; avgerror=0.267; avgrelerror=0.267;

 

No, it's built differently: it keeps adding trees while the entropy decreases; when the entropy starts growing for n iterations in a row, it stops, so as not to overcomplicate the model.

It outputs entropy and any custom metric, but internally it is driven by entropy.
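
For reference, a minimal sketch of what this looks like in the CatBoost Python API with the overfitting detector enabled; the synthetic data and the parameter values are illustrative assumptions, not anything from this thread:

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# synthetic stand-in data, just to make the sketch runnable
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = CatBoostClassifier(
    iterations=1000,          # upper bound on the number of trees
    loss_function='Logloss',  # cross-entropy for binary classification
    od_type='Iter',           # overfitting detector: stop once the eval
    od_wait=20,               # metric has not improved for 20 iterations
    verbose=False)
model.fit(X_train, y_train, eval_set=(X_test, y_test))
print(model.tree_count_)      # trees actually kept, usually far below 1000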

By the way, the English version of the alglib site has a new, faster forest... I wanted to rewrite it, but I can't find it )

 
I took a look at the xgboost docs. Its output is not cross-entropy, but:
error - Binary classification error rate. It is calculated as (# wrong cases) / (# all cases).
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
A different threshold (e.g., 0.) could be specified as "error@0."
 
elibrarius:
I took a look at the xgboost docs. Its output is not cross-entropy, but:
error - Binary classification error rate. It is calculated as (# wrong cases) / (# all cases).
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
A different threshold (e.g., 0.) could be specified as "error@0."

Well, yes, and here the classification error is used by default, apparently.

But you have to distinguish: boosting uses the error to stop, while the forest just reports it after the fact and builds its trees all the way through.
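
For illustration, a hedged sketch in the xgboost Python API of both points: the 'error' metric from the docs quoted above (plus an 'error@t' custom threshold) and boosting stopping early on an eval set. The data and parameter values are made up:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# synthetic stand-in data, just to make the sketch runnable
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X[:1500], label=y[:1500])
dval = xgb.DMatrix(X[1500:], label=y[1500:])

params = {'objective': 'binary:logistic',
          'eval_metric': ['error', 'error@0.4']}  # default 0.5 and a custom threshold
booster = xgb.train(params, dtrain, num_boost_round=200,
                    evals=[(dval, 'val')],
                    early_stopping_rounds=10,  # boosting stops on the eval metric
                    verbose_eval=False)

# the same error rate computed by hand at the default 0.5 threshold
pred = booster.predict(dval)  # predicted probabilities
print(np.mean((pred > 0.5).astype(int) != y[1500:]))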
 
Maxim Dmitrievsky:

No, it's built differently: it keeps adding trees while the entropy decreases; when the entropy starts growing for n iterations in a row, it stops, so as not to overcomplicate the model.

It outputs entropy and any custom metric, but internally it is driven by entropy.

By the way, the English version of the alglib site has a new, faster forest... I wanted to rewrite it, but I can't find it )

Where is the new alglib - should I look for it somewhere on the forum? It would be interesting to compare the code of this function.

 
elibrarius:

Where is the new alglib - should I look for it somewhere on the forum? It would be interesting to compare the code of this function.

No, it's in C++ or C#

http://www.alglib.net/arcnews.php#date_16_06_2018

  • improved random forests construction algorithm, which is from 2x to 10x faster than previous version and produces orders of magnitude smaller forests.
 
Maxim Dmitrievsky:

No, it's in C++ or C#

http://www.alglib.net/arcnews.php#date_16_06_2018

  • improved random forests construction algorithm, which is from 2x to 10x faster than previous version and produces orders of magnitude smaller forests.

Thanks!

 
elibrarius:

Thanks!

If you get around to looking into it and comparing, write :) so it could be rewritten, if it's not too much hassle. The files of the current forest are too big; it would be nice to shrink them. And the speed-up is a bonus too.

 

Thanks to Maxim's initiative I was unbanned! Thanks Maxim.

I have already started my leaf-based trading robot, the one I described before the New Year. The result is negative so far, but I feel I have to give it more time, because Si is flat now after the strong New Year market, so it was not the best moment to start the robot.

The tests are still running with minimal volume on an account with a long history and bad past results, so there will be no public signal; I will post the report later, once the statistics have accumulated. I'm doing this for anyone who wants to know whether my approach is profitable or not.

Regarding CatBoost: since my models are very small, 1-30 trees or so, I ran into a situation where the test sample (on which the model is selected) and the exam sample (on which it is independently tested) can show very good financial results, while on the training sample the results are very weak. So now I test the model on all three samples and select it only if I'm satisfied with all of them. I recommend looking at the training sample as well; I didn't do it before because I expected the same effect as with tree leaves (my alternative approach, where only individual leaves are selected) or with forests, namely that the model would certainly behave well on the training sample, but that turned out not always to be the case.
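
A minimal sketch of that three-sample check, with synthetic data standing in for the train/test/exam samples (the names, sizes, and parameters are illustrative assumptions):

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# three disjoint samples standing in for train / test / exam
X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
samples = {'train': (X[:1000], y[:1000]),
           'test':  (X[1000:2000], y[1000:2000]),
           'exam':  (X[2000:], y[2000:])}

model = CatBoostClassifier(iterations=30, verbose=False)  # tiny model, 1-30 trees
model.fit(*samples['train'], eval_set=samples['test'])

# keep the model only if it holds up on all three samples
for name, (Xs, ys) in samples.items():
    print(name, 'accuracy =', round(model.score(Xs, ys), 3))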

So far the question of model selection criteria remains open for me: after the tests, out of 100k trained models no more than 10-30 turn out to be good (by a number of financial indicators and model criteria), which is of course not enough. I should either relax the criteria or generate more models. CatBoost has a lot of different parameters, so you can churn out a lot of models.
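
A sketch of generating many models by sweeping a few CatBoost parameters; the grid below is a made-up illustration, not the poster's actual search space:

from itertools import product
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)

# a tiny illustrative grid -- a real search would sweep far more values
grid = {'depth': [2, 4, 6],
        'learning_rate': [0.03, 0.1, 0.3],
        'l2_leaf_reg': [1, 3, 9]}

models = []
for depth, lr, l2 in product(*grid.values()):
    m = CatBoostClassifier(iterations=30, depth=depth, learning_rate=lr,
                           l2_leaf_reg=l2, verbose=False)
    m.fit(X, y)
    models.append(m)
print(len(models), 'models trained')  # 27 here; scale the grid up for more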

On the other hand, I had high hopes for training on video cards, and it turned out that the GTX 1060 is of very little use: so far the experiments show that training 200 models took 20 minutes on it, while on a G3900 CPU (honestly, the cheapest junk for the LGA1151 socket) it took only 6 minutes! At the same time the CPU is always loaded at 50-60 percent, which makes it impossible to use more than 2 video cards at once, while I had high hopes for a rig of 6 video cards. I don't know why this happens when in theory everything should be fast. The biggest bottleneck in GPU computation is transferring the model from RAM to video memory and back, but for me it is slow even beyond that; perhaps the transfer happens after every iteration, and that's why it slows down. Has anyone else tried running training on a GPU?
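
For anyone who wants to reproduce the comparison, a minimal sketch of CatBoost GPU training (task_type and devices are real CatBoost parameters; the data is synthetic). For very small models like these, the per-iteration CPU-to-GPU transfer overhead can plausibly dominate, which would match the timings above:

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# '0' selects the first GPU; compare wall-clock time against task_type='CPU'
model = CatBoostClassifier(iterations=30, task_type='GPU', devices='0',
                           verbose=False)
model.fit(X, y)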

 
Maxim Dmitrievsky:

If you get around to looking into it and comparing, write :) so it could be rewritten, if it's not too much hassle. The files of the current forest are too big; it would be nice to shrink them. And the speed-up is a bonus too.

I compared - the same unused piece of code is there, shown below (the copyright is dated 2009, i.e. no edits were made in this part):


             Copyright 16.02.2009 by Bochkanov Sergey
        *************************************************************************/
        public static double dfavgce(decisionforest df,
            double[,] xy,
            int npoints,
            alglib.xparams _params)
        {
            double result = 0;
            double[] x = new double[0];
            double[] y = new double[0];
            int i = 0;
            int j = 0;
            int k = 0;
            int tmpi = 0;
            int i_ = 0;

            x = new double[df.nvars-1+1];
            y = new double[df.nclasses-1+1];
            result = 0;
            for(i=0; i<=npoints-1; i++)
            {
                for(i_=0; i_<=df.nvars-1;i_++)
                {
                    x[i_] = xy[i,i_];
                }
                dfprocess(df, x, ref y, _params);
                if( df.nclasses>1 )
                {
                   
                    //
                    // classification-specific code
                    //
                    k = (int)Math.Round(xy[i,df.nvars]);
                    tmpi = 0;
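                    // find the predicted class (argmax of y); note that tmpi
                    // is computed here but never used afterwards -- this is
                    // the unused piece of code mentioned above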
                    for(j=1; j<=df.nclasses-1; j++)
                    {
                        if( (double)(y[j])>(double)(y[tmpi]) )
                        {
                            tmpi = j;
                        }
                    }

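                    // accumulate cross-entropy: -log of the probability the
                    // forest assigned to the true class k, clamped to
                    // minrealnumber to avoid log(0)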
                    if( (double)(y[k])!=(double)(0) )
                    {
                        result = result-Math.Log(y[k]);
                    }
                    else
                    {
                        result = result-Math.Log(math.minrealnumber);
                    }
                }
            }
            result = result/npoints;
            return result;
        }
