Machine learning in trading: theory, models, practice and algo-trading - page 2110

 
elibrarius:
If you use only unique values, the result is skewed. For example, take about 100 rows with 10 unique values: 2 values occur in 45 rows each and 8 values occur only once. When splitting into 5 quanta, it is possible that only values occurring once will be chosen as borders, and the 2 most representative values (45 rows each) will be skipped.
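A minimal Python sketch of this skew, using a hypothetical column shaped like the example (98 rows: two values with 45 rows each and eight values that occur once). The helpers `quantile_borders` and `bin_counts` are illustrative names, not CatBoost functions; borders are taken at equal-count positions of the sorted input, and here that input is the set of unique values.

```python
import bisect

def quantile_borders(values, n_quanta):
    """Equal-count cut points: n_quanta - 1 borders taken from the sorted values."""
    s = sorted(values)
    return sorted({s[(k * len(s)) // n_quanta] for k in range(1, n_quanta)})

def bin_counts(values, borders):
    """Rows per quantum; a value v falls into the first quantum whose border is >= v."""
    counts = [0] * (len(borders) + 1)
    for v in values:
        counts[bisect.bisect_left(borders, v)] += 1
    return counts

# Hypothetical column: two values with 45 rows each, eight values that occur once.
column = [0.0] * 45 + [float(v) for v in range(1, 9)] + [100.0] * 45

borders_unique = quantile_borders(set(column), 5)
print(borders_unique)                      # [2.0, 4.0, 6.0, 8.0]
print(bin_counts(column, borders_unique))  # [47, 2, 2, 2, 45]
```

All four borders are spent on the eight single-row values, while the two values that hold 90 of the 98 rows never become borders.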

Different approaches work with different effectiveness on different predictors, which is why I want to have several algorithms, to understand how best to pick them. I will share my research if you can translate the code for MT5.

 
Aleksey Vyazmikin:

With a uniform distribution it is clear: I would first create an array of unique values and use it for cutting.

But there are other methods of building the grid:

Uniform: simply divide the range of values, not the rows. For example, if the values in a column run from 0 to 100, the quantum step is 100/255 ≈ 0.39, i.e. the borders are 0, 0.39, 0.78, ..., 99.61.

Then you can use these values to find the ones actually present in the column and remove duplicates.
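A minimal Python sketch of the Uniform idea as described here. It assumes that "finding the values actually present" means snapping each grid point to the nearest value in the column and then dropping duplicates; that interpretation and the helper names `uniform_borders` and `snap_to_present` are assumptions, not CatBoost's code.

```python
import bisect

def uniform_borders(values, n_quanta=255):
    """Equal-width grid over the value range: lo, lo + step, ..., lo + (n_quanta - 1) * step."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_quanta
    return [lo + k * step for k in range(n_quanta)]

def snap_to_present(grid, values):
    """Replace each grid point by the nearest value present in the column, then drop duplicates."""
    present = sorted(set(values))
    snapped = set()
    for g in grid:
        i = bisect.bisect_left(present, g)
        # Nearest candidates: the present value just below g and the one at or above it.
        candidates = present[max(i - 1, 0):i + 1]
        snapped.add(min(candidates, key=lambda v: abs(v - g)))
    return sorted(snapped)

column = [0.0, 3.0, 10.0, 55.0, 100.0]
grid = uniform_borders(column)
print(len(grid), round(grid[1], 2), round(grid[-1], 2))     # 255 0.39 99.61
print(snap_to_present(uniform_borders(column, 4), column))  # [0.0, 10.0, 55.0]
```

With 4 quanta the grid is 0, 25, 50, 75; both 50 and 75 snap to 55, so the duplicate is removed and only 3 borders remain.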


UniformAndQuantiles: just build half of the quanta, 255/2 ≈ 127, by method 1 and the other 128 by method 2, then combine them into one array.
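A minimal Python sketch of the combination as described in this post: 127 equal-width borders plus 128 equal-count borders, merged into one sorted array without duplicates. It follows the forum description, not CatBoost's source, and the helper names are hypothetical.

```python
def uniform_borders(values, n):
    """n equal-width borders starting at the column minimum."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n
    return [lo + k * step for k in range(n)]

def quantile_borders(values, n):
    """n equal-count borders taken from the sorted column, starting at its minimum."""
    s = sorted(values)
    return [s[(k * len(s)) // n] for k in range(n)]

def uniform_and_quantiles(values, total=255):
    """Half of the borders by width, the other half by count, merged and deduplicated."""
    n_uniform = total // 2          # 127 when total = 255
    n_quantile = total - n_uniform  # 128 when total = 255
    merged = set(uniform_borders(values, n_uniform)) | set(quantile_borders(values, n_quantile))
    return sorted(merged)

column = [float(v) for v in range(10)] + [100.0]
print(uniform_and_quantiles(column, total=4))  # [0.0, 5.0, 50.0]
```

In this reading both halves span the whole range, so there is no point where one method stops and the other starts: the equal-width borders cover sparse regions (50.0 above), the equal-count borders cover dense ones (5.0), and coinciding borders are merged.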

The other 3 methods are complicated; I didn't look at them.

 
Aleksey Vyazmikin:

These are CatBoost's sample quantization methods; the search/training then runs over exactly these borders.

My experiments show that the grid should be chosen for each predictor separately, and then a gain in quality is observed. But CatBoost can't do that, and I can't build the grids myself, so I have to build grids, export them to CSV and then go through them to evaluate the behavior of the target in each. I think this is a very promising feature, but I need the code translated to MQL.
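One possible way to evaluate the target's behavior in each quantum, sketched in Python under the assumption of a binary 0/1 target: count the rows per quantum and the share of rows with target == 1. The function name and the toy data are hypothetical.

```python
import bisect

def target_rate_per_quantum(values, targets, borders):
    """Return (rows, share of target == 1) per quantum; None for empty quanta.

    A value v belongs to the first quantum whose upper border is >= v;
    values above the last border fall into the final quantum.
    """
    n_bins = len(borders) + 1
    rows = [0] * n_bins
    ones = [0] * n_bins
    for v, t in zip(values, targets):
        i = bisect.bisect_left(borders, v)
        rows[i] += 1
        ones[i] += t
    return [(r, o / r if r else None) for r, o in zip(rows, ones)]

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
targets = [0, 0, 1, 1, 1, 0]
print(target_rate_per_quantum(values, targets, [2.0, 4.0]))
# [(2, 0.0), (2, 1.0), (2, 0.5)]
```

Comparing these per-quantum shares across candidate grids gives a simple way to see which grid separates the target better for a given predictor.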

1) This is what happens. It takes a separate column, sorts it, and splits it into quanta.

2) That's exactly what it does.

 

Uniform: simply divide the range of values, not the rows. For example, if the values in a column run from 0 to 100, the quantum step is 100/255 ≈ 0.39, i.e. the borders are 0, 0.39, 0.78, ..., 99.61.

Then you can use these values to find the ones actually present in the column and remove duplicates.


UniformAndQuantiles: just build half of the quanta, 255/2 ≈ 127, by method 1 and the other 128 by method 2, then combine them into one array.

The other 3 methods are complicated; I didn't look at them.

It's the complex ones I'm interested in :)

As for UniformAndQuantiles, I know the theory, but I don't understand how it is done in practice: how do you determine the region where plain quanta are used and the region where quantiles are used? That is what I don't get: is it one method up to the middle and the other after it? That would be crazy.

 
Aleksey Vyazmikin:

Choosing the right partitioning significantly affects the result.

Here is an example for Recall: the spread is up to 50%, which for me is significant.

The number of borders increases from 8 to 512, though the bars on the histogram are not in that order; my naming gets in the way a bit.


I'm still experimenting with grid selection, but it's already obvious that different predictors need different grids so that the model follows the underlying logic rather than just fitting.

Take 65535 quanta and don't bother. The calculations will be as accurate as possible.

 
Aleksey Vyazmikin:

I am interested in these complex ones :)

As for UniformAndQuantiles, I know the theory, but I don't understand how it is done in practice: how do you determine the region where plain quanta are used and the region where quantiles are used? That is what I don't get: is it one method up to the middle and the other after it? That would be crazy.

Yes

 
elibrarius:

1) This is what happens. It takes a separate column, sorts it, and splits it into quanta.

2) That's exactly what it does. What makes you think that?

It can't evaluate the relationship between the target and the set of values within a quantum. It splits the grid into the given number of segments, where possible, for all predictors, and that is not always necessary. But CatBoost can work with a supplied (separately prepared) quantization grid, which is what I use.
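CatBoost documents a file-based way to supply custom borders (the `input_borders` training parameter, `--input-borders-file` on the command line). Below is a minimal Python sketch that writes per-predictor grids to such a file, assuming the tab-separated `feature_index<TAB>border` line layout with zero-based feature indices; verify the exact format, including the optional missing-value-mode column, against the documentation for your CatBoost version. The grids, file name and helper names are hypothetical.

```python
import os
import tempfile

def format_borders(borders_by_feature):
    """One 'feature_index<TAB>border' line per border; features and borders sorted, duplicates dropped."""
    lines = []
    for feature_idx in sorted(borders_by_feature):
        for border in sorted(set(borders_by_feature[feature_idx])):
            lines.append(f"{feature_idx}\t{border!r}")
    return "\n".join(lines) + "\n"

def save_borders(path, borders_by_feature):
    """Write the formatted borders to a text file (assumed CatBoost borders-file layout)."""
    with open(path, "w") as f:
        f.write(format_borders(borders_by_feature))

# Hypothetical grids chosen separately for each predictor (zero-based feature indices).
grids = {1: [55.0, 10.0, 10.0], 0: [0.39, 0.78]}
path = os.path.join(tempfile.gettempdir(), "custom_borders.tsv")
save_borders(path, grids)
print(format_borders(grids), end="")
```

The file path would then be passed to training, for example as `input_borders=path` in the CatBoost model parameters (an assumed usage to check against the docs), so that CatBoost uses these borders instead of generating its own.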

 
elibrarius:

Take 65535 quanta and don't bother. The calculations will be as accurate as possible.

No, that will be curve fitting, not a meaningful model!

 
elibrarius:

Yes

That's pretty weird.

 
Aleksey Vyazmikin:

It can't evaluate the relationship between the target and the set of values within a quantum. The grid is split into the given number of segments, where possible, for all predictors, and that is not always necessary. But CatBoost can work with a supplied (separately prepared) quantization grid, which is what I use.

And do you know how to do that?
