Machine learning in trading: theory, models, practice and algo-trading - page 1264

 

Mihail Marchukajtes:

Maximka is tangled in the woods, can't find his way out? Business ....

I admit I haven't been here for a long time, all business and worries.... And now I've decided to drop in. Check in, so to speak :-) By the way....

A holy place is never empty. There was Maximka the Stoner, now there's Maximka the Half-wit.)))


 
Vizard_:

A holy place is never empty. There was Maximka the Stoner, now there's Maximka the Half-wit.)))


well, you're just a good day for life, and you don't change your status

Secret Girl
 
Maxim Dmitrievsky:

well, you're just a good day for life, and you don't change your status

secret girl

Oh, Teacher))) It's the same with the "models". Where's the deer and where's the girl, where's the 0 and where's the 1... No way to tell, it's all in one pile))) Hilarious...

 
Vizard_:

Oh, Teacher))) It's the same with the "models". Where's the deer and where's the girl, where's the 0 and where's the 1... No way to tell, it's all in one pile))) Hilarious...

You took a long time to think of an answer, more than 10 minutes... You must have been getting ready, preparing screenshots, worrying.

I get it, you need to respawn from a pile of shit to make a spectacular entrance, then you feel good about yourself ))

Well, welcome back ))))

 
Maxim Dmitrievsky:

The learning speed is good; the response time in use and the load time of the structure are bad, because the forest files are large. I have had them up to 300 MB.

Something is wrong with serialization: the forest trains and saves faster than it loads back from the file.

If it says the forest now produces files that are orders of magnitude smaller, that is a very big speedup.

The NN, on the contrary, takes longer to train, but its response is instantaneous. There is no difference in classification quality. You can use either, but the forest works out of the box, while the NN needs tuning.


Here is everything the update description says about the forest:
improved random forests construction algorithm, which is from 2x to 10x faster than previous version and produces orders of magnitude smaller forests.

The old version has the following data structure:

//--- node info:

//--- W[K+0] - variable number (-1 for leaf mode)
//--- W[K+1] - threshold (class/value for leaf node)
//--- W[K+2] - ">=" branch index (absent for leaf node)

The new one stores the same 3 variables per node and 2 per leaf node.
And the tree is built exactly the same way, down to the last example with zero error. I haven't seen any pruning.
The only thing I saw in the code comments about the speedup is
SplitStrength - the split type:
* 0 = split at the random position, fastest one
* 1 = split at the middle of the range
* 2 = strong split at the best point of the range (default)

Apparently the random split gives the 2-10x speedup, while splitting at the best points can produce a more compact tree.

You can just add random point selection to the partitioning function. A 2-3 line edit))

 
elibrarius:

I don't know English very well.
Here is everything the update description says about the forest:
improved random forests construction algorithm, which is from 2x to 10x faster than previous version and produces orders of magnitude smaller forests.

The old version has the following data structure:

//--- node info:

//--- W[K+0] - variable number (-1 for leaf mode)
//--- W[K+1] - threshold (class/value for leaf node)
//--- W[K+2] - ">=" branch index (absent for leaf node)

The new one stores the same 3 variables per node and 2 per leaf node.
And the tree is built exactly the same way, down to the last example with zero error. I haven't seen any pruning.
The only thing I saw in the code comments about the speedup is
SplitStrength - the split type:
* 0 = split at the random position, fastest one
* 1 = split at the middle of the range
* 2 = strong split at the best point of the range (default)

Apparently the random split gives the 2-10x speedup, while splitting at the best points can produce a more compact tree.

So now you can choose the splitting method? But the default is still the slowest.

Oh well, you can redo it yourself then, yes :)

 
Maxim Dmitrievsky:

So now you can choose the splitting method? But the default is still the slowest.

Oh well, you can redo it yourself then, yes :)

Only I'm afraid that all the edits will be overwritten when the terminal updates. You should make a copy of the forest class and keep it as a separate file.
 
elibrarius:
Only I'm afraid that all the edits will be overwritten when the terminal updates. You should make a copy of the forest class and keep it as a separate file.

Yeah, or keep an archive.

Well then, let's experiment. Thanks for poking around, it's useful.

Maybe you could also add Bayesian trees there, if you know them well.
 
elibrarius:

So as I understand it, this is where you can try changing things:

//+------------------------------------------------------------------+
//| Makes split on attribute                                         |
//+------------------------------------------------------------------+
static void CDForest::DFSplitC(double &x[],int &c[],int &cntbuf[],const int n,
                               const int nc,const int flags,int &info,
                               double &threshold,double &e,double &sortrbuf[],
                               int &sortibuf[])
 
Maxim Dmitrievsky:

So as I understand it, this is where you can try changing things:

Yes. And duplicate it in DFSplitR, so that regression forests get the same functionality.
