Discussing the article: "Data Science and ML (Part 31): Using CatBoost AI Models for Trading"

 

Check out the new article: Data Science and ML (Part 31): Using CatBoost AI Models for Trading.

CatBoost AI models have gained massive popularity recently among machine learning communities due to their predictive accuracy, efficiency, and robustness to scattered and difficult datasets. In this article, we are going to discuss in detail how to implement these types of models in an attempt to beat the forex market.

CatBoost is an open-source library that implements gradient boosting on decision trees. It was designed specifically to address the challenges of handling categorical features in machine learning.

It was developed by Yandex and open-sourced in 2017.

Despite being introduced recently compared to machine learning techniques such as linear regression or SVMs, CatBoost has gained massive popularity among AI communities and has become one of the most used machine learning models on platforms like Kaggle.

What made CatBoost gain this much attention is its ability to automatically handle categorical features in the dataset, which can be challenging to many machine learning algorithms.

  • CatBoost models usually perform better than other models with minimal effort. Even with the default parameters and settings, they can perform well accuracy-wise.
  • Unlike neural networks, which require domain knowledge to code and make them work, CatBoost's implementation is straightforward.
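
To give a feel for how categorical values can be turned into numbers without leaking the target, here is a minimal sketch of an ordered target statistic, the general idea CatBoost documents for categorical features. It is a simplification: the function name, the `prior` value of 0.5, and the sample data are assumptions for illustration, and the real library works over random permutations of the rows rather than the fixed order used here.

```python
def ordered_target_stats(categories, targets, prior=0.5):
    """Encode each categorical value using only the rows that precede it.

    Simplified form: (positives seen so far in this category + prior)
                     / (rows seen so far in this category + 1)
    """
    seen_count = {}      # category -> number of earlier rows
    seen_positive = {}   # category -> sum of earlier targets
    encoded = []
    for cat, y in zip(categories, targets):
        n = seen_count.get(cat, 0)
        s = seen_positive.get(cat, 0)
        encoded.append((s + prior) / (n + 1))
        # Update the running totals only after encoding the current row,
        # so a row never sees its own target.
        seen_count[cat] = n + 1
        seen_positive[cat] = s + y
    return encoded


# Made-up data: a categorical column and a binary "price went up" target.
sessions = ["london", "tokyo", "london", "london", "tokyo"]
went_up = [1, 0, 1, 0, 1]
encoded = ordered_target_stats(sessions, went_up)
```

Because each row is encoded from earlier rows only, the first occurrence of a category always falls back to the prior.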

Author: Omega J Msigwa

 
Your writing is thought provoking. 

I wonder what would happen if we also tracked which trading session we're in.
 
Yes, the trading session is a valuable variable to have in your training data.
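
A session feature along these lines could be derived from each bar's timestamp. The sketch below is only an illustration: the session windows in `SESSION_HOURS_UTC` are rough assumed hours, not values from the article, and should be adjusted to your broker and to daylight saving changes.

```python
from datetime import datetime, timezone

# Hypothetical session windows in UTC hours [start, end); adjust as needed.
SESSION_HOURS_UTC = {
    "tokyo": (0, 9),
    "london": (7, 16),
    "new_york": (12, 21),
}


def active_sessions(ts):
    """Return the names of the sessions open at a timezone-aware timestamp."""
    hour = ts.astimezone(timezone.utc).hour
    return [name for name, (start, end) in SESSION_HOURS_UTC.items()
            if start <= hour < end]


def session_label(ts):
    """Collapse the open sessions into one categorical label."""
    names = active_sessions(ts)
    return "+".join(names) if names else "off_hours"


bar_time = datetime(2024, 1, 10, 13, 0, tzinfo=timezone.utc)
label = session_label(bar_time)
```

Overlapping sessions get a combined label such as "london+new_york", which keeps the overlap visible to the model as its own category.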
 
All classifiers (including catboost) work correctly only with normalised attributes. Prices as attributes are not suitable.
 

and there is also the problem of exporting the classifier model to ONNX


Note

The label is inferred incorrectly for binary classification. This is a known bug in the onnxruntime implementation. Ignore the value of this parameter in case of binary classification.
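
One possible way to live with this is to ignore the label output and derive the class from the probabilities instead. The sketch below assumes the exported model returns its probabilities as one class-to-probability map per row; the function name and the sample values are made up for illustration, and no ONNX runtime is called here.

```python
def labels_from_probabilities(probabilities):
    """Pick the most probable class for each row.

    `probabilities` is assumed to be a list with one dict per row,
    mapping class id to probability, e.g. [{0: 0.3, 1: 0.7}, ...].
    """
    return [max(row, key=row.get) for row in probabilities]


# Stand-in for what an inference session might return for three rows.
example_output = [{0: 0.30, 1: 0.70}, {0: 0.85, 1: 0.15}, {0: 0.45, 1: 0.55}]
predicted = labels_from_probabilities(example_output)
```

If a decision threshold other than 0.5 is wanted, the same probabilities can be compared against that threshold instead of taking the maximum.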

 
Price cannot be used as training data. Early last year I trained a model on the price of gold. When gold kept hitting new highs, those new-high prices became the model's inputs, and the model did not recognise them: no matter how far the price moved above the highest price in the training data, the model returned a constant probability.
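
This behaviour is expected from tree-based models: they split on thresholds learned from the training data, so every price above the highest threshold falls into the same leaf. A common alternative is to feed the model returns instead of raw prices. The sketch below uses made-up prices and a hypothetical `log_returns` helper to show that returns stay in a familiar range even when the price level does not.

```python
import math


def log_returns(prices):
    """Convert a price series into log returns: ln(p[t] / p[t-1])."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Made-up prices: the "live" series sits entirely above the training range.
train_prices = [1800.0, 1818.0, 1809.0, 1827.0]
live_prices = [2400.0, 2424.0, 2412.0, 2436.0]

train_features = log_returns(train_prices)
live_features = log_returns(live_prices)
```

Here every live price is above anything seen in training, yet the live returns match the training returns because the relative moves are the same.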
 
Thank you very much for the helpful article.

I have a small question or concern that I hope to share.

When I tried to convert the CatBoost model in a pipeline to ONNX with categorical variables the process failed, throwing an error.


I believe the underlying issue might be related to what is described here:

https://catboost.ai/docs/en/concepts/apply-onnx-ml

Specifics:

Only models trained on datasets without categorical features are currently supported.


In the Jupyter Notebook catboost-4-trading.ipynb that I downloaded, the pipeline fitting code is written as:

pipe.fit(X_train, y_train, catboost__eval_set=(X_test, y_test))

It appears that the parameter "catboost__cat_features=categorical_features" is omitted, so the model may have been trained without specifying categorical features.

This might explain why the model could be saved as ONNX without any problem.

If this is the case, then perhaps the CatBoost native method "save_model" could be used directly, like this:

model = pipe.named_steps['catboost']

model_filename = "CatBoost.EURUSD.OHLC.D1.onnx"

model.save_model(model_filename, format='onnx')

I hope this observation might be helpful.
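
If categorical information is still wanted in a model that must be exported to ONNX, one possible workaround (an assumption, not something taken from the article) is to encode the categories as integers before training, so the model is trained without cat_features. The helper names and sample values below are hypothetical.

```python
def fit_ordinal_mapping(values):
    """Assign each distinct category an integer code, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def apply_ordinal_mapping(values, mapping, unknown=-1):
    """Encode values with a fitted mapping; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


train_sessions = ["london", "tokyo", "london", "new_york"]
mapping = fit_ordinal_mapping(train_sessions)
encoded_train = apply_ordinal_mapping(train_sessions, mapping)
encoded_live = apply_ordinal_mapping(["tokyo", "sydney"], mapping)
```

The same mapping would have to be reproduced wherever the ONNX model is run, and integer codes impose an arbitrary order on the categories, so this trades away CatBoost's native categorical handling.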

 
border_count is a partition into bins (quantization segments) for any features, not just categorical features.
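
To illustrate what such a partition looks like, here is a rough sketch that places borders at evenly spaced ranks of a numeric feature and maps each value to a bin. It is not CatBoost's own border selection, which is controlled by its feature_border_type setting; the helper names and the data are made up.

```python
from bisect import bisect_right


def quantile_borders(values, border_count):
    """Pick up to `border_count` thresholds at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    borders = []
    for i in range(1, border_count + 1):
        idx = i * len(ordered) // (border_count + 1)
        border = ordered[min(idx, len(ordered) - 1)]
        # Skip duplicates so the borders stay strictly increasing.
        if not borders or border > borders[-1]:
            borders.append(border)
    return borders


def to_bin(value, borders):
    """Return the bin index: the number of borders less than or equal to `value`."""
    return bisect_right(borders, value)


feature = [0.2, 1.5, 0.7, 3.1, 2.4, 0.9, 1.1, 2.8]
borders = quantile_borders(feature, border_count=3)
bins = [to_bin(v, borders) for v in feature]
```

With three borders the feature is split into four bins, and the trees then choose their splits among those borders instead of among every distinct value.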
 
Why is there no fixed stop loss in the EA code?