Discussing the article: "Python, ONNX and MetaTrader 5: Creating a RandomForest model with RobustScaler and PolynomialFeatures data preprocessing"

 

Check out the new article: Python, ONNX and MetaTrader 5: Creating a RandomForest model with RobustScaler and PolynomialFeatures data preprocessing.

In this article, we will create a random forest model in Python, train the model, and save it as an ONNX pipeline with data preprocessing. After that, we will use the model in the MetaTrader 5 terminal.

Random Forest is a powerful tool in the machine learning toolkit. To better understand how it works, let's visualize it as a huge group of people coming together to make collective decisions. Instead of real people, however, each member of this group is an independent classifier or predictor of the current situation. Within this group, each "person" is a decision tree capable of making decisions based on certain attributes. When the random forest makes a decision, it uses democracy and voting: each tree expresses its opinion, and the final decision is made by majority vote.

Random Forest is widely used in a variety of fields, and its flexibility makes it suitable for both classification and regression problems. In a classification task, the model decides which of the predefined classes the current state belongs to. For example, in the financial market, this could mean a decision to buy (class 1) or sell (class 0) an asset based on a variety of indicators.

However, in this article, we will focus on regression problems. Regression in machine learning is an attempt to predict the future numerical values of a time series based on its past values. Instead of classification, where we assign objects to certain classes, in regression we aim to predict specific numbers. This could be, for example, forecasting stock prices, predicting temperature or any other numerical variable.
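As a quick illustration of the regression setting described above (this is a minimal sketch, not the article's full pipeline; the synthetic series and lag count are invented for the example), here is how a random forest can predict the next value of a series from its past values:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic "price" series: a noisy trend stands in for real market data
rng = np.random.default_rng(42)
prices = np.cumsum(rng.normal(0.1, 1.0, 500)) + 100

# Build lagged features: predict the next value from the previous 5 values
n_lags = 5
X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
y = prices[n_lags:]

# Train on the first 400 samples, then forecast the remaining ones
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X[:400], y[:400])
predictions = model.predict(X[400:])
print(predictions.shape)  # (95,) - one numeric forecast per test sample
```

Each prediction here is a specific number (the next value of the series), which is exactly what distinguishes regression from classification.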

Author: Yevgeniy Koshtenko

 
One error I found in the code:

future_pr = dataset['<CLOSE>'].iloc[i + rand]
should be:

future_pr = dataset['close'].iloc[i + rand]

Otherwise you will get an error:

File "C:\Python39\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: '<close>'

There is an incorrect column reference in your labelling_relabeling_regression function. I kept getting errors because you try to access the column '<CLOSE>' in your dataset, but pandas is unable to find it because the correct column name is 'close', not '<CLOSE>'. The case sensitivity and the additional angle brackets cause pandas to throw a KeyError.

Very simple error but someone else following along could get confused and give up.

Since the rest of your code uses <CLOSE>, it's probably better to just add a line like:

dataset = dataset.rename(columns={'close': '<CLOSE>'})


def labelling_relabeling_regression(dataset, min_value=1, max_value=1):
    dataset = dataset.rename(columns={'close': '<CLOSE>'})  # add this line so the rest of the code operates smoothly
    future_prices = []

    for i in range(dataset.shape[0] - max_value):
        rand = random.randint(min_value, max_value)
        future_pr = dataset['<CLOSE>'].iloc[i + rand]
        future_prices.append(future_pr)

    dataset = dataset.iloc[:len(future_prices)].copy()
    dataset['future_price'] = future_prices

    return dataset
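To sanity-check the corrected function, here is a standalone usage sketch (the function is repeated from above so the snippet runs on its own; the tiny 'close' price series is invented for the example):

```python
import random
import pandas as pd

def labelling_relabeling_regression(dataset, min_value=1, max_value=1):
    # Rename 'close' so the rest of the article's code (which uses '<CLOSE>') works
    dataset = dataset.rename(columns={'close': '<CLOSE>'})
    future_prices = []

    for i in range(dataset.shape[0] - max_value):
        rand = random.randint(min_value, max_value)
        future_pr = dataset['<CLOSE>'].iloc[i + rand]
        future_prices.append(future_pr)

    dataset = dataset.iloc[:len(future_prices)].copy()
    dataset['future_price'] = future_prices
    return dataset

# Tiny invented price series just to exercise the function
df = pd.DataFrame({'close': [100.0, 101.5, 99.8, 102.3, 103.1]})
labeled = labelling_relabeling_regression(df, min_value=1, max_value=2)
print(labeled.columns.tolist())  # ['<CLOSE>', 'future_price']
print(len(labeled))              # 3 - rows within max_value of the end are dropped
```

Each row's future_price is the close price 1 to 2 bars ahead, which becomes the regression target.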


 
Shashank Rai #:
One bug in the code I found: [...]

Thanks a lot, I'll take a look. I may have missed it when editing the code!
 
Yevgeniy Koshtenko #:
Thanks a lot, I'll take a look. I may have missed it when editing the code!

No problem, sir. One more recommendation: for those not using Google Colab and just working on their own machine or on AWS, there is no need to import gdown.

Instead, use the following:

First:

import os  # add this line to your imports instead of import gdown


Second:

replace the following section


# Save the pipeline
joblib.dump(pipeline, 'rf_pipeline.joblib')

# Convert pipeline to ONNX
onnx_model = convert_sklearn(pipeline, initial_types=initial_type)

# Save the model in ONNX format
model_onnx_path = "rf_pipeline.onnx"
onnx.save_model(onnx_model, model_onnx_path)

# Connect Google Drive (if you work in Colab and this is necessary)
from google.colab import drive
drive.mount('/content/drive')

# Specify the path to Google Drive where you want to move the model
drive_path = '/content/drive/My Drive/'  # Make sure the path is correct
rf_pipeline_onnx_drive_path = os.path.join(drive_path, 'rf_pipeline.onnx')

# Move ONNX model to Google Drive
shutil.move(model_onnx_path, rf_pipeline_onnx_drive_path)

print('The rf_pipeline model is saved in the ONNX format on Google Drive:', rf_pipeline_onnx_drive_path)

with:

# Save the pipeline in Joblib format
joblib_model_path = 'rf_pipeline.joblib'
joblib.dump(pipeline, joblib_model_path)

# Convert pipeline to ONNX format
initial_type = [('float_input', FloatTensorType([None, n_features]))]
onnx_model = convert_sklearn(pipeline, initial_types=initial_type)

# Save the ONNX model
model_onnx_path = "rf_pipeline.onnx"
onnx.save_model(onnx_model, model_onnx_path)

# Specify the local folder within the current directory to save the model
local_folder_path = './models/'  # Adjust this path as needed

# Create the directory if it doesn't exist
if not os.path.exists(local_folder_path):
    os.makedirs(local_folder_path)

# Specify the full paths for the models within the local folder
local_joblib_path = os.path.join(local_folder_path, 'rf_pipeline.joblib')
local_onnx_path = os.path.join(local_folder_path, 'rf_pipeline.onnx')

# Move the models to the specified local folder
shutil.move(joblib_model_path, local_joblib_path)
shutil.move(model_onnx_path, local_onnx_path)

print(f'The rf_pipeline model in Joblib format is saved locally at: {local_joblib_path}')

print(f'The rf_pipeline model in ONNX format is saved locally at: {local_onnx_path}')


The model will be stored in a sub-folder called /models. This will also store your model as a joblib file, in case you need it at runtime. Also, both models can be run directly from Python to get predictions.
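On the point about running the saved models directly from Python, here is a minimal, hedged sketch of the joblib side (the pipeline steps mirror the article's RobustScaler + PolynomialFeatures + RandomForestRegressor setup, but the training data and paths are invented for the example; the ONNX file can be loaded analogously with onnxruntime.InferenceSession if that package is installed):

```python
import os
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, PolynomialFeatures
from sklearn.ensemble import RandomForestRegressor

# Fit a small pipeline on invented data (stand-in for the article's training step)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = X_train @ np.array([0.5, -1.0, 0.3, 2.0]) + rng.normal(scale=0.1, size=200)

pipeline = Pipeline([
    ('scaler', RobustScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('rf', RandomForestRegressor(n_estimators=50, random_state=0)),
])
pipeline.fit(X_train, y_train)

# Save to the ./models/ sub-folder, matching the snippet above
os.makedirs('./models/', exist_ok=True)
joblib.dump(pipeline, './models/rf_pipeline.joblib')

# Later (or in another script): load and predict without retraining
loaded = joblib.load('./models/rf_pipeline.joblib')
preds = loaded.predict(X_train[:5])
print(preds.shape)  # (5,)
```

Because the preprocessing steps are inside the Pipeline, the loaded model applies scaling and polynomial expansion automatically at prediction time.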

 
Thanks for the article. After doing reinforcement learning for my uni project, I did wonder if this was possible.
 
Shashank Rai #:

No problem sir. Another recommendation is for those who don't use google colab [...]

Corrected. I have sent the updated tutorial file to the moderator for approval.

 
Adrian Page #:
Thanks for the article. After doing reinforcement learning for my uni project, I did wonder if this was possible.

Yes, I haven't gotten to reinforcement learning yet!

 
22.04.2024 - changed the dataset source from Excel format to the Python MetaTrader5 library. Also, the model is now saved locally (previously it was saved to Google Drive).