Machine learning in trading: theory, models, practice and algo-trading - page 2109

 
Maxim Dmitrievsky:

select all files and download them, they will be zipped

the sample lengths will then be different, if a part

Thank you, that's right - the archive can be downloaded, which is nice!

But samples of different lengths are a problem - I was thinking of singling out the most random columns, where small deviations are acceptable.

I think this method doesn't need to be applied to the sample itself - otherwise how would it be used in the real world?

I'm starting training on it - let's see what happens.

 
Aleksey Vyazmikin:

Thank you, that's right - the archive can be downloaded, which is nice!

But samples of different lengths are a problem - I was thinking of singling out the most random columns, where small deviations are acceptable.

I think this method doesn't need to be applied to the sample itself - otherwise how would it be used in the real world?

I'm starting training on it - let's see what happens.

I don't need it for exams, but it may come in handy.

 
elibrarius:

Too lazy to convert)
I'll explain the point:

1) we sort the column
2) we compute the average number of elements per quantum, e.g. 10000 elements / 255 quanta = 39.21
3) in a loop we advance by 39.21 elements at each step and add the value from the sorted array to the array of quantum values. I.e. element 0 of the sorted array = border of quantum 0, the 39th element = quantum 1, the 78th = quantum 2, etc.

If a value is already in the array, i.e. we have hit an area with many duplicates, we skip the duplicate and don't add it.

At each step we add exactly 39.21 and then round the accumulated sum to pick the element in the array, so that the spacing stays even. I.e. instead of element 195 (39 * 5 = 195) we take element 196 (39.21 * 5 = 196.05, (int) 196).
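For clarity, a minimal C++ sketch of the procedure described above (function and variable names are mine, not taken from CatBoost or the thread):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Walk the sorted column with a fractional step (e.g. 10000 / 255 = 39.21),
    // truncating the accumulated position to pick each border and skipping
    // duplicate values. Assumes quantCount > 0.
    std::vector<double> BuildCountBasedBorders(std::vector<double> values, int quantCount) {
        std::sort(values.begin(), values.end());
        std::vector<double> borders;
        const double step = static_cast<double>(values.size()) / quantCount;
        double pos = 0.0;
        while (static_cast<std::size_t>(pos) < values.size()) {
            double v = values[static_cast<std::size_t>(pos)];
            if (borders.empty() || v != borders.back()) {  // skip duplicates
                borders.push_back(v);
            }
            pos += step;  // 0, 39.21, 78.42, ... -> indices 0, 39, 78, ...
        }
        return borders;
    }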

With a uniform distribution it's clear - I would first build an array of unique values and use it for the splitting.

But there are other methods of building the grid:

    THolder<IBinarizer> MakeBinarizer(const EBorderSelectionType type) {
        switch (type) {
            case EBorderSelectionType::UniformAndQuantiles:
                return MakeHolder<TMedianPlusUniformBinarizer>();
            case EBorderSelectionType::GreedyLogSum:
                return MakeHolder<TGreedyBinarizer<EPenaltyType::MaxSumLog>>();
            case EBorderSelectionType::GreedyMinEntropy:
                return MakeHolder<TGreedyBinarizer<EPenaltyType::MinEntropy>>();
            case EBorderSelectionType::MaxLogSum:
                return MakeHolder<TExactBinarizer<EPenaltyType::MaxSumLog>>();
            case EBorderSelectionType::MinEntropy:
                return MakeHolder<TExactBinarizer<EPenaltyType::MinEntropy>>();
            case EBorderSelectionType::Median:
                return MakeHolder<TMedianBinarizer>();
            case EBorderSelectionType::Uniform:
                return MakeHolder<TUniformBinarizer>();
        }
    }
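For contrast with the count-based sketch above, here is a rough illustration of what the Uniform mode amounts to - equal-width intervals between the minimum and maximum value; this is illustrative only, not CatBoost's actual implementation:

    #include <algorithm>
    #include <vector>

    // Equal-width borders over [min, max]; assumes a non-empty input column.
    std::vector<double> UniformBorders(const std::vector<double>& values, int borderCount) {
        const double mn = *std::min_element(values.begin(), values.end());
        const double mx = *std::max_element(values.begin(), values.end());
        std::vector<double> borders;
        for (int i = 1; i <= borderCount; ++i) {
            borders.push_back(mn + (mx - mn) * i / (borderCount + 1));
        }
        return borders;
    }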
 
Aleksey Vyazmikin:

With a uniform distribution it's clear - I would first build an array of unique values and use it for the splitting.

But there are other methods of building the grid:

There must be a lot of samples, otherwise the model won't learn anything.

 
Maxim Dmitrievsky:

There must be a lot of samples, otherwise the model won't learn anything.

These are the sample quantization methods in CatBoost - these are the boundaries over which the enumeration/learning then runs.

My experiments show that the grid should be chosen for each predictor separately - then a quality gain is observed - but CatBoost can't do that, and I don't know how to make it do so, so I have to build the grids myself and upload them to csv, and then go through them to assess how the target behaves inside them. I think it's a very promising tool, but I need to translate the code into MQL.
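Since the workflow described here involves dumping per-predictor grids to csv, a minimal sketch of that export step; the file layout (one line per feature, borders separated by semicolons) and all names are my own assumption, not a CatBoost or MQL format:

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Writes one line per predictor: featureName;border1;border2;...
    void SaveGridsToCsv(const std::string& path,
                        const std::vector<std::string>& featureNames,
                        const std::vector<std::vector<double>>& borders) {
        std::ofstream out(path);
        for (std::size_t i = 0; i < featureNames.size(); ++i) {
            out << featureNames[i];
            for (double b : borders[i]) {
                out << ';' << b;
            }
            out << '\n';
        }
    }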

 
Aleksey Vyazmikin:

These are the sample quantization methods in CatBoost - these are the boundaries over which the enumeration/learning then runs.

My experiments show that the grid should be chosen for each predictor separately - then a quality gain is observed - but CatBoost can't do that, and I don't know how to make it do so, so I have to build the grids myself and upload them to csv, and then go through them to assess how the target behaves inside them. I think it's a very promising tool, but I need to translate the code into MQL.

Is this in the settings of the model itself (its parameters)? I don't know what it is.

If it's not in the settings, then it's bullshit.

 
Maxim Dmitrievsky:

Is this in the settings of the model itself (its parameters)? I don't know what it is.

If it's not in the settings, then it's bullshit.

It is in the settings, at least for the command line

--feature-border-type

The quantization mode for numerical features.

Possible values:
  • Median
  • Uniform
  • UniformAndQuantiles
  • MaxLogSum
  • MinEntropy
  • GreedyLogSum
Quantization - CatBoost documentation (catboost.ai)
 
Aleksey Vyazmikin:

It is in the settings, at least for the command line

--feature-border-type

The quantization mode for numerical features.

Possible values:
  • Median
  • Uniform
  • UniformAndQuantiles
  • MaxLogSum
  • MinEntropy
  • GreedyLogSum

Does it make a big difference? It should be within a percent

 
Aleksey Vyazmikin:

With a uniform distribution it's clear - I would first build an array of unique values and use it for the splitting.

But there are other methods of building the grid:

With unique values it will be a mess. For example, 100 rows in total, of which 10 values are unique: 2 of them occur in 45 rows each and 8 occur once. Splitting into 5 quanta, it is quite possible that only 5 of the single-occurrence values get chosen, and the 2 most representative ones (45 rows each) get skipped.
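To make this concrete, a small self-contained sketch using the counts from the example (two values occurring 45 times each plus eight single-occurrence values, 98 rows; the specific values are mine): borders picked over the unique values miss the two heavy values entirely, while count-based borders keep them.

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        // 45 rows of 2.0, 45 rows of 4.0, and eight values that occur once.
        std::vector<double> data;
        for (int i = 0; i < 45; ++i) data.push_back(2.0);
        for (int i = 0; i < 45; ++i) data.push_back(4.0);
        for (double v : {1.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0}) data.push_back(v);
        std::sort(data.begin(), data.end());

        // Unique values, sorted: 1 2 3 4 5 6 7 8 9 10.
        std::vector<double> uniq(data);
        uniq.erase(std::unique(uniq.begin(), uniq.end()), uniq.end());

        // 5 quanta over unique values: every 2nd unique value -> 1 3 5 7 9
        // (the heavy values 2 and 4 are skipped).
        std::printf("borders over unique values:");
        for (std::size_t i = 0; i < uniq.size(); i += uniq.size() / 5) std::printf(" %g", uniq[i]);
        std::printf("\n");

        // 5 quanta over all rows, stepping by 98 / 5 = 19.6 and skipping
        // duplicate borders -> 1 2 4 (the heavy values are kept).
        std::printf("borders over all rows:");
        const double step = data.size() / 5.0;
        std::vector<double> borders;
        for (double pos = 0.0; static_cast<std::size_t>(pos) < data.size(); pos += step) {
            double v = data[static_cast<std::size_t>(pos)];
            if (borders.empty() || v != borders.back()) borders.push_back(v);
        }
        for (double b : borders) std::printf(" %g", b);
        std::printf("\n");
        return 0;
    }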
 
Maxim Dmitrievsky:

Does it make a big difference? It should be within a percent

Choosing the right partitioning makes a big difference.

Here's an example on Recall - up to 50% variation - for me that's significant.

The number of borders increases from 16 to 512 in increments of 16 - though not in that order on the histogram; my labels get in the way a bit.


I am still experimenting with grid selection, but it is already obvious that different predictors need different grids, so that the logic is captured and not just a fit.
