Is there a pattern in the chaos? Let's try to find it! Machine learning using a specific sample as an example.

Aleksey Vyazmikin 2022.10.21 19:46

Actually, I suggest downloading the file from the link. There are 3 csv files in the archive:

train.csv - the sample to train on.
test.csv - an auxiliary sample; it can be used during training, including by merging it with train.
exam.csv - a sample that takes no part in training.

The sample contains 5581 predictor columns; the target is in column 5583, "Target_100". Columns 5581, 5582, 5584 and 5585 are auxiliary. The column numbers are evidently zero-based, since the 5581 predictors occupy columns 0 to 5580. The auxiliary columns contain:

column 5581, "Time" - date of the signal
column 5582, "Target_P" - direction of the trade: "+1" - buy / "-1" - sell
column 5584, "Target_100_Buy" - financial result from buying
column 5585, "Target_100_Sell" - financial result from selling.
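
A minimal sketch of reading this layout by position, using only the Python standard library. It assumes the column numbers are zero-based (5581 predictors in columns 0 to 5580) and that the files are semicolon-delimited with a header row, as the CatBoost command later in the thread suggests. The function name and the tiny two-predictor demo data are hypothetical stand-ins, not the real files.

```python
import csv
import io

# Names of the five trailing columns, in the order given in the post.
TRAILING = ["Time", "Target_P", "Target_100", "Target_100_Buy", "Target_100_Sell"]


def read_sample(stream, n_predictors=5581, delimiter=";"):
    """Read a sample and split each row into (predictors, trailing-column dict)."""
    reader = csv.reader(stream, delimiter=delimiter)
    next(reader)  # skip the header row
    rows = []
    for raw in reader:
        predictors = raw[:n_predictors]  # kept as raw strings
        trailing = dict(zip(TRAILING, raw[n_predictors:n_predictors + len(TRAILING)]))
        rows.append((predictors, trailing))
    return rows


# Hypothetical two-predictor stand-in for train.csv (all values are made up).
demo = io.StringIO(
    "p0;p1;Time;Target_P;Target_100;Target_100_Buy;Target_100_Sell\n"
    "0.5;1.5;2022.10.21 19:46;1;1;0.0012;-0.0007\n"
)
rows = read_sample(demo, n_predictors=2)
print(rows[0][1])
```

For the real files the call would be `read_sample(open("train.csv", newline=""))` with the default `n_predictors`.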

The goal is to create a model that will "earn" more than 3000 points on exam.csv sample.
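
One way the "earned" total could be computed is sketched below. This is an assumption, not the author's scoring code: for every signal the model decides to enter, the result is taken from "Target_100_Buy" when "Target_P" is +1 and from "Target_100_Sell" when it is -1, and skipped signals contribute nothing. The total comes out in the units of the result columns; how those convert to "points" is not stated in the post. The function name and demo values are hypothetical.

```python
def earned(rows, decisions):
    """Sum the financial result over the signals the model chose to enter."""
    total = 0.0
    for row, enter in zip(rows, decisions):
        if not enter:
            continue  # class 0: stay out of the trade
        if row["Target_P"] == 1:
            total += row["Target_100_Buy"]
        else:
            total += row["Target_100_Sell"]
    return total


# Hypothetical signals (values are made up for illustration).
signals = [
    {"Target_P": 1, "Target_100_Buy": 0.5, "Target_100_Sell": -0.25},
    {"Target_P": -1, "Target_100_Buy": -0.25, "Target_100_Sell": 0.75},
    {"Target_P": 1, "Target_100_Buy": -0.25, "Target_100_Sell": 0.5},
]
model_decisions = [1, 1, 0]  # enter, enter, skip
total = earned(signals, model_decisions)
print(total)
```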

The solution must be reached without peeking at exam, i.e. without using any data from that sample.

To keep things interesting, please describe the method that achieved the result.

Samples can be transformed in any way you like, including changing the target, but you should explain the nature of the transformation so that it is not a pure fit to the exam sample.

spiderman8811 2022.10.22 07:35 #1

Of course there is.

Aleksey Vyazmikin 2022.10.22 08:42 #2

spiderman8811 #:
Of course there is.

You want to try and prove it?

Aleksey Vyazmikin 2022.10.22 18:47 #3

Training CatBoost "out of the box" with the settings below, while iterating over the random seed, gives the following probability distributions.

REM For every extensionless file in the folder, train with seeds 8, 16, ..., 80.
REM All other settings are identical between runs; only the seed and output folder change.
FOR %%a IN (*.) DO (
    FOR /L %%s IN (8,8,80) DO (
        catboost-1.0.6.exe fit --learn-set train.csv --test-set test.csv --column-description %%a --has-header --delimiter ; ^
            --model-format CatboostBinary,CPP --train-dir ..\Rezultat\RS_%%s\result_4_%%a ^
            --depth 6 --iterations 1000 --nan-mode Forbidden --learning-rate 0.03 --rsm 1 ^
            --fold-permutation-block 1 --boosting-type Plain --l2-leaf-reg 6 ^
            --loss-function Logloss --use-best-model --eval-metric Logloss --custom-metric Logloss ^
            --od-type Iter --od-wait 100 --random-seed %%s --random-strength 1 ^
            --auto-class-weights SqrtBalanced --sampling-frequency PerTreeLevel ^
            --border-count 32 --feature-border-type Median ^
            --bootstrap-type Bayesian --bagging-temperature 1 ^
            --leaf-estimation-method Newton --leaf-estimation-iterations 10
    )
)
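
The files matched by `*.` are passed to `--column-description`, so they are presumably CatBoost column description files. A minimal sketch of generating one for the layout described in the first post follows. It assumes zero-based column indices and the tab-separated "index, type" line format, with the auxiliary columns marked `Auxiliary` and the target marked `Label`; columns that are not listed are treated by CatBoost as numeric features. The author's actual files are not shown in the thread, so this is only one plausible version, and the function name is hypothetical.

```python
def column_description(n_predictors=5581):
    """Build column description text: one 'index<TAB>type' line per special column."""
    entries = [
        (n_predictors, "Auxiliary"),      # Time
        (n_predictors + 1, "Auxiliary"),  # Target_P
        (n_predictors + 2, "Label"),      # Target_100
        (n_predictors + 3, "Auxiliary"),  # Target_100_Buy
        (n_predictors + 4, "Auxiliary"),  # Target_100_Sell
    ]
    return "".join(f"{index}\t{kind}\n" for index, kind in entries)


cd_text = column_description()
print(cd_text)
```

Marking the two result columns as `Auxiliary` keeps them out of training, which matters because they are derived from the trade outcome.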

1. train sample

2. test sample

3. exam sample

As you can see, the model prefers to classify almost everything as zero, since that leaves less chance of making a mistake.

Forester 2022.10.24 09:31 #4

About the last 4 columns:

With class 0, the loss should apparently occur in both cases, i.e. -0.0007 for both buy and sell? Or, if a buy/sell bet is placed anyway, will it make a profit when the direction is right?

Forester 2022.10.24 09:35 #5

The 1/-1 direction is chosen by separate logic, i.e. ML is not involved in choosing the direction? Do we only need to learn 0/1, trade or don't trade, with the direction rigidly fixed?

Aleksey Vyazmikin 2022.10.24 10:29 #6

elibrarius #:

About the last 4 columns:

With class 0, the loss should apparently occur in both cases, i.e. -0.0007 for both buy and sell? Or, if a buy/sell bet is placed anyway, will it make a profit when the direction is right?

With class zero, do not enter the trade.

I used to use three target classes, which is why there are two financial result columns at the end instead of one, but with CatBoost I had to switch to two classes.

elibrarius #:
The 1/-1 direction is chosen by separate logic, i.e. ML is not involved in choosing the direction? Do we only need to learn 0/1, trade or don't trade, with the direction rigidly fixed?

Yes, the model only decides whether to enter or not. However, nothing in this experiment forbids training a model with three classes; for that it is enough to transform the target to take the direction of entry into account.
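
A minimal sketch of such a transformation, under the assumption that "Target_100" is 0 or 1 and "Target_P" is +1 or -1: multiplying the two turns class 1 into +1 (enter long) or -1 (enter short) and leaves class 0 (do not enter) unchanged. The function name and demo values are hypothetical.

```python
def three_class_target(target_100, target_p):
    """Combine the binary target with the trade direction into -1 / 0 / +1."""
    if target_100 not in (0, 1):
        raise ValueError(f"unexpected Target_100 value: {target_100}")
    if target_p not in (-1, 1):
        raise ValueError(f"unexpected Target_P value: {target_p}")
    return target_100 * target_p


# Hypothetical (Target_100, Target_P) pairs.
pairs = [(1, 1), (1, -1), (0, 1), (0, -1)]
labels = [three_class_target(t, p) for t, p in pairs]
print(labels)
```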

Forester 2022.10.24 10:57 #7

Aleksey Vyazmikin #:

If the class is zero, do not enter the trade.

I used to use three target classes, which is why there are two financial result columns at the end instead of one, but with CatBoost I had to switch to two classes.

Yes, the model only decides whether to enter or not. However, nothing in this experiment forbids training a model with three classes; for that it is enough to transform the target to take the direction of entry into account.

I.e. for class 0 (do not enter), if the correct trade direction is chosen, will there be a profit or not?

Forester 2022.10.24 10:59 #8

Aleksey Vyazmikin #:

If the class is zero, do not enter the trade.

I used to use three target classes, which is why there are two financial result columns at the end instead of one, but with CatBoost I had to switch to two classes.

CatBoost has multiclass, so it's strange to give up the 3 classes.

Aleksey Vyazmikin 2022.10.24 11:10 #9

elibrarius #:
I.e. for class 0 (do not enter), if the correct trade direction is chosen, will there be a profit or not?

There will be no profit (if you do revaluation, there will be a small percentage of profit at zero).

The only correct way to redo the target is to split class "1" into "-1" and "1"; anything else is a different strategy.

elibrarius #:

CatBoost has multiclass, so it's strange to give up the 3 classes.

It does, but there is no integration with MQL5.

For multiclass there is no model export to any language at all.

It is probably possible to connect it through a DLL, but I can't figure that out on my own.

Forester 2022.10.24 11:47 #10

Aleksey Vyazmikin #:

There will be no profit (if you do a revaluation there will be a small percentage of profit at zero).

Then there is little point in the financial result columns. There will also be errors on class 0 (we will predict 1 where the true class is 0), and the cost of such an error is unknown. That means the balance line cannot be built. All the more so since 70% of your data is class 0, i.e. 70% of the errors come with an unknown financial result.
You can forget about 3000 points. Even if that figure is reached, it will be unreliable.

So there is no point in solving the problem...
