Discussion of article "CatBoost machine learning algorithm from Yandex with no Python or R knowledge required"

 

New article CatBoost machine learning algorithm from Yandex with no Python or R knowledge required has been published:

The article provides the code and a description of the main stages of the machine learning process using a specific example. You do not need Python or R knowledge to obtain the model; basic MQL5 knowledge is enough, which is exactly my level. I therefore hope the article will serve as a good tutorial for a broad audience, helping those interested to evaluate machine learning capabilities and implement them in their programs.

The results are not very impressive, but note that the main trading rule, "avoid losing money", is observed. Even if we choose another model from the CB_Svod.csv file, the effect would still be positive: the worst model we obtained has a financial result of -25 points, while the average financial result across all models is 3889.9 points.


Fig. 9 Financial result of trained models for the period 01.08.2019 - 31.10.2020


Author: Aleksey Vyazmikin

 
Thanks for the article ) Which metrics did you find most successful in classifying wp? ROC-AUC, Accuracy, F1, or maybe something else?
 
Maxim Dmitrievsky:
Thanks for the article ) Which metrics did you find most successful in classifying wp? ROC-AUC, Accuracy, F1, or maybe something else?

You're welcome :)

I mostly work with unbalanced samples where, for binary classification, the zeros greatly outnumber the ones and each one is worth much more than a zero. In practice, I only look at Recall and Precision.

Recall shows how active the model is given the "knowledge" it has acquired, i.e. how often it responds to a "stimulus" in the form of a one; the higher the value, the larger the share of ones the model responds to.

Precision shows how accurate those responses are; depending on the strategy, this accuracy can be acceptable even at small percentages, for example 45%.
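For a binary case where the ones are the rare class, the two metrics above can be computed directly from the confusion counts. A minimal stdlib-Python sketch (the labels and predictions are made-up illustration data, not taken from the article's samples):

```python
def recall_precision(y_true, y_pred):
    """Recall and Precision for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of ones the model responds to
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of responses that are correct
    return recall, precision

# Unbalanced toy sample: many more zeros than ones
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
r, p = recall_precision(y_true, y_pred)  # both 2/3 here
```

A model can score well on Accuracy in such a sample simply by always predicting zero, which is why these two metrics are more informative for rare-signal trading.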

Standard metrics mostly assume the classified objects are of equal value, which is not enough to make a final decision.

To select groups of models, I also use an "error balance" metric, which evaluates how the errors behave over time.

In general, I use more than 30 different metrics; perhaps that is a separate topic for research and an article.

 
Aleksey Vyazmikin:

You're welcome :)

I mostly work with unbalanced samples where, for binary classification, the zeros greatly outnumber the ones and each one is worth much more than a zero. In practice, I only look at Recall and Precision.

Recall shows how often the model responds to a "stimulus" in the form of a one; the higher the value, the larger the share of ones the model responds to.

Precision shows how accurate those responses are; depending on the strategy, this accuracy can be acceptable even at small percentages, for example 45%.

Standard metrics mostly assume the classified objects are of equal value, which is not enough to make a final decision.

To select groups of models, I also use an "error balance" metric, which evaluates how the errors behave over time.

In general, I use more than 30 different metrics; perhaps that is a separate topic for research and an article.

So, roughly speaking, a separate model with a low signal frequency is used for each strategy, and then they are combined? I just didn't quite get that from the article. The philosophy of the approach itself is interesting. You talked earlier in the machine learning topic about leaf selection and so on. Or is that a separate topic?
 
Maxim Dmitrievsky:
So, a separate model with a low signal frequency is used for each strategy, and then they are combined?

The plan is to use different basic strategies with different settings as signal generators and, accordingly, a separate model for each signal. Because calculating the predictors is computationally costly, it should all be done in one Expert Advisor. At the moment I do not have a system for working with virtual positions that could support pending orders and stops, as I plan to use this approach on netting accounts.
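The "one model per signal generator" idea can be sketched as a simple dispatch table. All names, weights, and thresholds below are hypothetical illustrations (Python stands in for the MQL5 Expert Advisor, and a linear scorer stands in for a per-strategy CatBoost model):

```python
class Signal:
    """A trade signal emitted by one of the basic strategies."""
    def __init__(self, strategy, features):
        self.strategy = strategy    # which generator produced it
        self.features = features    # predictor values at signal time

def make_scorer(weights, threshold):
    """Stand-in for a trained per-strategy model: linear score + cutoff."""
    def score(features):
        s = sum(w * f for w, f in zip(weights, features))
        return s >= threshold       # True = take the trade
    return score

# One model per signal generator (hypothetical strategy names)
models = {
    "rsi_reversal": make_scorer([0.5, -0.2], 0.1),
    "ma_crossover": make_scorer([0.3, 0.4], 0.2),
}

def on_signal(sig):
    """Route each signal to the model trained for its strategy."""
    return models[sig.strategy](sig.features)
```

In a single Expert Advisor this keeps the expensive predictor calculation shared, while each strategy's signals are filtered by its own model.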

 
Maxim Dmitrievsky:
I just didn't quite get it from the article. The philosophy of the approach itself is interesting. You talked earlier in the machine learning thread about leaf selection and so on. Or is that a separate topic?

Leaves are a separate topic. The leaves that came out of boosting were actually not very good in terms of their characteristics (I simply took them from the first tree). The other day I learnt that XGBoost has an option to save leaves, but it requires Python; maybe the quality of the leaves is better there.

 

Aleksey, congratulations on your debut!

Imho, for a first article it is very decent. The quality is much higher than that of some writers who have written more than a dozen articles. Thank you, I'll look into the nuances.

P.S. I watched the video. It's nice that there are such enthusiastic young people.

 
Denis Kirichenko:

Aleksey, congratulations on your debut!

Imho, for a first article it is very decent. The quality is much higher than that of some writers who have written more than a dozen articles. Thank you, I will look into the nuances.

P.S. I watched the video. It's nice that there are such enthusiastic young people.

Thank you!

This article was written as a manual on applying CatBoost from the command line, with subsequent integration into MT5. I also wanted to showcase a good class for working with tables, "CSV fast", which has substantially eased my work with large CSV files; I have been using it for more than two years. With that in mind, I decided to conduct an experiment with standard indicators, and it was successful.

The appendix contains the code of the EA and the script that organises the whole infrastructure. It was important to me that what I describe could be reproduced, so please test it, report bugs, and suggest improvements; I welcome joint contributions to this work.

Yes, of course young people are different; it's just that the older you get, the further removed they are from your social environment, and the less objectively you perceive them.

 

Thanks for the article. I knew it was being prepared and was waiting for it to be published. The results of the short-period test, both in your article and in Maxim Dmitrievsky's previous one, are good. Experience with machine learning shows that a test on a short period adjacent to training, on data the network has not been trained on, almost always yields a positive trading result based purely on what the network learned, and beyond that some optimisation or retraining is required. Have you thought about automating the retraining process, or training the network up to a short test period and then shifting forward? This would give a broader picture of the success or failure of a particular approach.

 
Andrey Dibrov:

Thanks for the article. I knew it was being prepared and was waiting for it to be published. The results of the short-period test, both in your article and in Maxim Dmitrievsky's previous one, are good. Experience with machine learning shows that a test on a short period adjacent to training, on data the network has not been trained on, almost always yields a positive trading result based purely on what the network learned, and beyond that some optimisation or retraining is required. Have you thought about automating the retraining process, or training the network up to a short test period and then shifting forward? This would give a broader picture of the success or failure of a particular approach.

Glad you found the article interesting.

My approach and the one in Maxim's article differ in everything except the tool used, CatBoost. In my article the sample is prepared by signal, and more than 100 bars can pass between signals, whereas Maxim's article uses a long training period on every bar to find general patterns; his training is more trend-focused. I suggest using complex predictors, while Maxim's article shows a simple transformation of the price series. Maxim showed, first of all, how to train a model in Python and apply it in MT5 without Python, which is very convenient; the approach itself cannot be used in trading, which is why the article contains the phrase "naive approach"!

My article also focuses on the process of training and integrating the CatBoost model into MT5, but the training is done not in Python but in a separate standalone console programme, a wrapper around the CatBoost library that exposes the same functions as in Python. Additionally, the article examined the structure of the Expert Advisor in order to show the full cycle of gathering and processing information from A to Z.

The success of training depends on the predictors; in this article we mainly used predictors in the form of standard oscillators, and it was a pleasant surprise for me that the model trained on them and shows interesting results. Like Maxim, my goal was not to provide a ready-made money-making solution, but I have laid a good foundation.

As for testing, you are being inattentive: Maxim's article shows the EA's behaviour on the out-of-training sample over 4 months, while my EA shows positive results over 15 months. Working for more than a year without retraining is a very good result if we stick to the paradigm of market volatility.

The experiment you suggest is possible if the sample is sufficient, and here my method is inferior to the one described in Maxim's article: over small time intervals there simply will not be enough data for training. According to my observations, at least 15,000 signals are needed for training, especially considering that the predictors are neither analysed individually nor selected.

Also, note that the test sample (test.csv in the article) is used only to assess the quality of the resulting model; no training is performed on it. As a result, we trained on only 60% of the entire sample, covering 01.06.2014 to 30.04.2018. The model was not built on the remaining data, so we can say the model works on market information that was current 2.5 years ago.
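The split described above, training on the first 60% of a chronologically ordered sample and keeping the remainder purely for evaluation, can be sketched as follows (stdlib Python; the rows are stand-in values, not the article's actual train.csv/test.csv contents):

```python
def chronological_split(rows, train_share=0.6):
    """Split rows that are already sorted by date: the first train_share
    goes to training, the rest is used only to evaluate the model."""
    cut = int(len(rows) * train_share)
    return rows[:cut], rows[cut:]

rows = list(range(100))              # stand-in for signals sorted by date
train, test = chronological_split(rows)
# train covers the oldest 60%, test the most recent 40%
```

The key point is that the split is by time, not random: the evaluation data always lies strictly after the training data, mimicking how the model meets the future in live trading.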

 
Aleksey Vyazmikin:

Glad you found the article interesting.

My approach and the one in Maxim's article differ in everything except the tool used, CatBoost. In my article the sample is prepared by signal, and more than 100 bars can pass between signals, whereas Maxim's article uses a long training period on every bar to find general patterns; his training is more trend-focused. I suggest using complex predictors, while Maxim's article shows a simple transformation of the price series. Maxim showed, first of all, how to train a model in Python and apply it in MT5 without Python, which is very convenient; the approach itself cannot be used in trading, which is why the article contains the phrase "naive approach"!

My article also focuses on the process of training and integrating the CatBoost model into MT5, but the training is done not in Python but in a separate standalone console programme, a wrapper around the CatBoost library that exposes the same functions as in Python. Additionally, the article examined the structure of the Expert Advisor in order to show the full cycle of gathering and processing information from A to Z.

The success of training depends on the predictors; in this article we mainly used predictors in the form of standard oscillators, and it was a pleasant surprise for me that the model trained on them and shows interesting results. Like Maxim, my goal was not to provide a ready-made money-making solution, but I have laid a good foundation.

As for testing, you are being inattentive: Maxim's article shows the EA's behaviour on the out-of-training sample over 4 months, while my EA shows positive results over 15 months. Working for more than a year without retraining is a very good result if we stick to the paradigm of market volatility.

The experiment you suggest is possible if the sample is sufficient, and here my method is inferior to the one described in Maxim's article: over small time intervals there simply will not be enough data for training. According to my observations, at least 15,000 signals are needed for training, especially considering that the predictors are neither analysed individually nor selected.

Also, note that the test sample (test.csv in the article) is used only to assess the quality of the resulting model; no training is performed on it. As a result, we trained on only 60% of the entire sample, covering 01.06.2014 to 30.04.2018. The model was not built on the remaining data, so we can say the model works on market information that was current 2.5 years ago.

I did pay attention to the length of the test period. But the stable positive result comes on a short period adjacent to the training period: one to two months. Say we train on two years of history. Test on the next month. Save the result. Shift (or extend) the training window by that month and retrain. Test on the next month. Save the result. And so on.
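The retraining scheme described above — a fixed (or growing) training window, a one-month test, then a shift and retrain — is a walk-forward procedure. A sketch of the window generator in stdlib Python (month granularity and the 24/1 defaults are illustrative, not from the article):

```python
def walk_forward(n_months, train_len=24, test_len=1, anchored=False):
    """Yield (train_start, train_end, test_end) month indices.
    anchored=False shifts the training window forward each step;
    anchored=True keeps the start fixed and only extends the window."""
    start = 0
    while start + train_len + test_len <= n_months:
        train_start = 0 if anchored else start
        yield (train_start, start + train_len, start + train_len + test_len)
        start += test_len

windows = list(walk_forward(27))
# first window: train on months [0, 24), test on month [24, 25)
```

Collecting the test results from every window gives the "broader picture" mentioned earlier: instead of one out-of-sample period, the approach is judged on a whole sequence of adjacent short tests.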