Trees, forests, and boosting can be used with categorical attributes - General

Rorschach 2020.10.25 02:16 #20391

Aleksey Vyazmikin:

There's something there - I don't know what it is.

Day of week, day of month, hour, minute, ...same for exit..., deal duration in minutes, SL, TP, result +-1

4 2 6 0 4 2 6 57 57 100 100 -1
4 2 6 0 4 2 6 57 57 100 200 -1
4 2 6 0 4 2 6 57 57 100 300 -1
4 2 6 0 4 2 6 57 57 100 400 -1
4 2 6 0 4 2 6 57 57 100 500 -1
4 2 6 0 4 2 6 57 57 100 600 -1
4 2 6 0 4 2 6 57 57 100 700 -1
4 2 6 0 4 2 6 57 57 100 800 -1
4 2 6 0 4 2 6 57 57 100 900 -1
4 2 6 0 4 2 6 57 57 100 1000 -1
4 2 6 0 4 2 6 57 57 100 1100 -1
4 2 6 0 4 2 6 57 57 100 1200 -1
4 2 6 0 4 2 6 57 57 100 1300 -1
4 2 6 0 4 2 6 57 57 100 1400 -1
4 2 6 0 4 2 6 57 57 100 1500 -1
4 2 6 0 4 2 6 57 57 100 1600 -1
4 2 6 0 4 2 6 57 57 100 1700 -1
4 2 6 0 4 2 6 57 57 100 1800 -1
4 2 6 0 4 2 6 57 57 100 1900 -1
4 2 6 0 4 2 6 57 57 100 2000 -1
4 2 6 0 4 2 6 57 57 100 2100 -1
4 2 6 0 4 2 6 57 57 100 2200 -1
4 2 6 0 4 2 6 57 57 100 2300 -1
4 2 6 0 4 2 6 57 57 100 2400 -1
4 2 6 0 4 2 6 57 57 100 2500 -1
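
Read against the column description, each row appears to hold twelve values: four entry-time fields, the same four for the exit, then duration, SL, TP and result. A minimal Python sketch of that reading follows; the field names and `parse_deal` are made up for illustration and do not come from the original post.

```python
from typing import Dict, List

# Hypothetical field names, inferred from the column description in the post.
FIELDS: List[str] = [
    "entry_dow", "entry_dom", "entry_hour", "entry_minute",
    "exit_dow", "exit_dom", "exit_hour", "exit_minute",
    "duration_min", "sl", "tp", "result",
]

def parse_deal(line: str) -> Dict[str, int]:
    """Split one whitespace-separated row into named integer fields."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))

deal = parse_deal("4 2 6 0 4 2 6 57 57 100 200 -1")
```

Under this reading the sample row is a deal opened at 6:00 and closed at 6:57, which agrees with the 57-minute duration.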

I have 8GB of memory.

As I understood from your results, the entry information is not taken into account at all. It's strange, because a whole class of systems is based on entry time.

So 50% is taken from the day the deal was closed?


Maxim Dmitrievsky 2020.10.25 02:24 #20392

Rorschach:

Day of week, day of month, hour, minute, ...same for exit..., deal duration in minutes, SL, TP, result +-1

I have 8GB of memory.

As I understood from your results, the entry information is not taken into account at all. That's strange, since a whole class of systems is based on entry time.

That's no way to prepare features. The ranges of column values should be comparable. For categorical ones, this is done with one-hot encoding.
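
As a rough illustration of both points, here is a minimal Python sketch: one-hot encoding for a categorical column and min-max scaling for a numeric one. The function names are hypothetical, the 0..6 coding of the day of week is an assumption, and min-max is only one of several ways to bring ranges in line.

```python
from typing import List

def one_hot(value: int, n_categories: int) -> List[int]:
    """Return a 0/1 vector with a single 1 at position `value`."""
    if not 0 <= value < n_categories:
        raise ValueError("category out of range")
    return [1 if i == value else 0 for i in range(n_categories)]

def min_max_scale(column: List[float]) -> List[float]:
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Assumption: day of week is coded 0..6, so there are 7 categories.
day_of_week = one_hot(4, 7)
# TP values from the first few rows of the sample data.
tp_scaled = min_max_scale([100, 200, 300, 400, 500])
```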

Aleksey Vyazmikin 2020.10.25 02:42 #20393

Rorschach:

Day of week, day of month, hour, minute, ...same for exit..., deal duration in minutes, SL, TP, result +-1

I have 8GB of memory.

As I understood from your results, the entry information is not taken into account at all. It is strange, since a whole class of systems is based on entry time.

So 50% is taken from the closing day of the trade?

In general, the result is not strange. There are probably days when the trend changes more often or the market is flat. A move without pullbacks does not last forever and on average ends after some number of points, which is why the holding time and the TP and SL sizes turned out to matter. The entry time turned out to be unimportant because it does not guarantee a move without pullbacks; that would be a forecast of the future. If we had been looking specifically for the entry time of profitable trades, we would have found where the probability is highest. In general, if there were more predictors, the entry time might turn out to matter in combination with one of them.

The percentage most likely only reflects how high in the tree the split on that predictor sits. I haven't looked into it. Here is the description, run through a translator:

"

Individual importance values for each of the input features (the default method of calculating feature importance for non-ranking metrics).

For each feature, PredictionValuesChange shows how much the prediction changes, on average, when the feature value changes. The greater the importance value, the greater on average the change in prediction value if that feature is changed.

"

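The idea in the quoted description, the average change in prediction when one feature's value changes, can be illustrated with a toy calculation in Python. This is a simplified stand-in and not the library's actual formula; `mean_prediction_change`, `toy_predict` and the sample rows are all hypothetical.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]

def mean_prediction_change(predict: Callable[[Row], float],
                           rows: List[Row], feature: int) -> float:
    """Average |change in prediction| when `feature` is replaced by the
    value taken from each other row (a simplified illustration only)."""
    total, count = 0.0, 0
    for row in rows:
        base = predict(row)
        for other in rows:
            if other[feature] == row[feature]:
                continue  # same value, nothing changes
            changed = list(row)
            changed[feature] = other[feature]
            total += abs(predict(changed) - base)
            count += 1
    return total / count if count else 0.0

# Toy model (hypothetical): the prediction depends on feature 1 only.
def toy_predict(row: Row) -> float:
    return 1.0 if row[1] > 150 else -1.0

rows = [(4, 100), (2, 200), (6, 300)]
importance = [mean_prediction_change(toy_predict, rows, f) for f in range(2)]
```

In this toy setup feature 0 gets zero importance because the model never looks at it, which mirrors the observation that the entry-time columns contributed nothing.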

Aleksey Vyazmikin 2020.10.25 02:43 #20394

Maxim Dmitrievsky:
You can't prepare features like that. The ranges of column values should be comparable. For categorical ones, it's done with one-hot encoding.

Why do you consider time as categorical? Or what features are we talking about?

Maxim Dmitrievsky 2020.10.25 03:07 #20395

Aleksey Vyazmikin:

Why do you think time is categorical? Or what features are we talking about?

Where is the time there? There's the day of the week, the day of the month, the hour of the day, the minute of the hour. Time is a continuous value, and here the categories are ordinal.

Forester 2020.10.25 07:08 #20396

Aleksey Vyazmikin:

By the way, have you seen a generator that randomly returns a number from an array without repeats? I need exactly that kind of generator.

This is what I do:

1) I create an array of row indexes with length equal to the number of rows and fill it with values from 0 to N-1, where N is the number of rows

2) I shuffle this array

void RandomizeIdx(int &idx[], int rows) {// reference to the index array and its length
        int j = 0, c = 0;
        for (int r = rows - 1; r > 0; r--) {// Fisher-Yates: walk from the last row down to the first
                j = RandomInteger(r + 1);// row to swap with; assumes RandomInteger(n) returns 0..n-1
                c = idx[r]; idx[r] = idx[j]; idx[j] = c;// swap the two indexes
        }
}

where RandomInteger(n) is any random number generator that returns an integer from 0 to n-1

3) then I go through these indexes in order in a loop and use each one to fetch the corresponding row from the main array; after shuffling the indexes, the order is pseudo-random and nothing repeats
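
The three steps can be sketched in Python as follows. This version uses a Fisher-Yates shuffle, which picks the swap partner only from the part of the array that is not yet fixed, so every ordering is equally likely. `shuffled_indices` is an illustrative name, and the standard `random` module stands in for RandomInteger().

```python
import random
from typing import List, Optional

def shuffled_indices(n_rows: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return the indexes 0..n_rows-1 in random order (Fisher-Yates)."""
    rng = rng or random.Random()
    idx = list(range(n_rows))             # step 1: fill with 0..N-1
    for r in range(n_rows - 1, 0, -1):    # step 2: shuffle in place
        j = rng.randrange(r + 1)          # partner from positions 0..r
        idx[r], idx[j] = idx[j], idx[r]
    return idx

data = ["row0", "row1", "row2", "row3", "row4"]
order = shuffled_indices(len(data), random.Random(42))
# Step 3: read rows through the shuffled indexes; each row appears once.
picked = [data[i] for i in order]
```

Passing a seeded `random.Random` makes the order reproducible between runs, which helps when comparing models trained on the same shuffled sample.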


mytarmailS 2020.10.25 09:00 #20397

Has anyone tried to do a classification for a lot of classes, say 10k?

does it work at all?

Forester 2020.10.25 09:29 #20398

mytarmailS:

Has anyone tried to do a classification for a lot of classes, say 10k?

does it work at all?

Trees/forests/boosting can. But I haven't tried more than 3 classes; I haven't had such a task.

mytarmailS 2020.10.25 09:32 #20399

elibrarius:
Trees/forests/boosting can. But I haven't tried more than 3 classes; I haven't had such a task.

Forests hang, not enough RAM

Forester 2020.10.25 09:53 #20400

mytarmailS:

Forests are frozen, not enough RAM

Fewer trees and less depth; that may be enough, at least just to test.
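
A back-of-the-envelope Python estimate shows why both knobs matter with many classes. It assumes full binary trees where every leaf stores one 8-byte value per class; real implementations differ, so treat this as an upper-bound sketch. `forest_leaf_memory_bytes` is a hypothetical helper and the tree counts and depths are example figures only.

```python
def forest_leaf_memory_bytes(n_trees: int, depth: int, n_classes: int,
                             bytes_per_value: int = 8) -> int:
    """Upper-bound estimate: full binary trees, one value per class per leaf."""
    leaves_per_tree = 2 ** depth
    return n_trees * leaves_per_tree * n_classes * bytes_per_value

# 100 trees of depth 10 with 10k classes: about 8.2 GB for leaf values alone.
big = forest_leaf_memory_bytes(n_trees=100, depth=10, n_classes=10_000)
# 10 trees of depth 5 with the same classes: about 25.6 MB.
small = forest_leaf_memory_bytes(n_trees=10, depth=5, n_classes=10_000)
```

Memory grows linearly with the number of trees but doubles with each extra level of depth, so reducing depth is the stronger lever under these assumptions.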

Machine learning in trading: theory, models, practice and algo-trading - page 2040