Discussing the article: "Creating a mean-reversion strategy based on machine learning" - page 8

[Deleted]  
sibirqk #:

IMHO, of course, but using Savitzky-Golay is not much different from using a moving average. The SG filter is the midpoint of a polynomial regression over a given sliding window, with a specified polynomial degree. For degree 1, it exactly matches the centered moving average of the corresponding period.
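The degree-1 claim can be checked numerically. This is a minimal sketch on synthetic random-walk prices (the data, the window of 21 and the variable names are assumptions made for illustration); it compares SciPy's savgol_filter with polyorder=1 against a centered simple moving average, away from the edges where the filter switches to its boundary handling.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=300)) + 100.0  # synthetic random-walk prices

window = 21  # odd window so that it has a well-defined centre
sg = savgol_filter(prices, window_length=window, polyorder=1)

# Centered simple moving average; 'valid' drops half a window at each edge
sma = np.convolve(prices, np.ones(window) / window, mode="valid")

half = window // 2
# Compare only interior points, where the full window fits around each bar
max_gap = float(np.max(np.abs(sg[half:-half] - sma)))
print(max_gap)  # expected to be at floating-point noise level
```

Note that the match is with a centered average, which looks half a window into the future, not with the trailing moving average usually plotted on a chart.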

To identify mean reversion, it makes more sense, in my opinion, to use amplitude filtering: renko, range bars, zigzags. I think range bars are the best, since the difference between High and Low is a constant. Or a constant-size zigzag (ZZ), which is basically the same thing.
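For readers unfamiliar with amplitude filtering, here is a rough sketch of a constant-size zigzag: a pivot is confirmed only after price has moved at least a fixed amount away from the running extreme. The function name zigzag_pivots and the absolute threshold are assumptions for illustration; this is not code from the article or from any indicator library.

```python
def zigzag_pivots(prices, threshold):
    """Return indices of confirmed pivots of a fixed-size zigzag.

    A high (low) becomes a pivot once price has retraced from it
    by at least `threshold` price units.
    """
    pivots = []
    if len(prices) < 2:
        return pivots
    direction = 0  # 0 = not yet known, 1 = up leg, -1 = down leg
    hi = lo = 0    # indices of the running high and low
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            if p > prices[hi]:
                hi = i
            if p < prices[lo]:
                lo = i
            if p - prices[lo] >= threshold:      # first leg is up
                pivots.append(lo)
                direction, hi = 1, i
            elif prices[hi] - p >= threshold:    # first leg is down
                pivots.append(hi)
                direction, lo = -1, i
        elif direction == 1:
            if p > prices[hi]:
                hi = i                           # extend the up leg
            elif prices[hi] - p >= threshold:    # reversal confirmed
                pivots.append(hi)
                direction, lo = -1, i
        else:
            if p < prices[lo]:
                lo = i                           # extend the down leg
            elif p - prices[lo] >= threshold:    # reversal confirmed
                pivots.append(lo)
                direction, hi = 1, i
    return pivots


demo = zigzag_pivots([10, 11, 13, 12, 12.5, 10, 9, 11.5, 12], threshold=2)
print(demo)
```

The last leg stays unconfirmed until price reverses by the threshold, which is the usual repainting caveat of zigzag-style filters.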

Well, the ZZ has proven to be worse.
[Deleted]  

Fourier and singular value decomposition perform quite well as filters. I have not tuned the parameters; these are first attempts.
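As an illustration of the Fourier variant only (the SVD variant is not shown), a low-pass filter can be sketched with NumPy's FFT. The function name fft_lowpass and the keep parameter are hypothetical, and this is not necessarily the filter used in the experiments above.

```python
import numpy as np


def fft_lowpass(x, keep):
    """Keep the mean and the `keep` lowest-frequency components of x."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[keep + 1:] = 0.0  # index 0 is the mean, 1..keep are retained
    return np.fft.irfft(spectrum, n=len(x))


# Synthetic check: a slow wave plus a faster, smaller one
n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 2 * t / n)
fast = 0.3 * np.sin(2 * np.pi * 40 * t / n)

smoothed = fft_lowpass(slow + fast, keep=10)
recovery_error = float(np.max(np.abs(smoothed - slow)))
print(recovery_error)  # the fast component is removed almost exactly
```

Like the Savitzky-Golay filter, this transform uses the whole series, so every smoothed value depends on future prices. That is acceptable for labeling historical data but not for computing live features.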


[Deleted]  

An example of adding exponential decay to the filtering process, based on the first labeling (markup) function from the article. The most recent examples are given more weight in the labeling, to adapt to more recent data.

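import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

# calculate_labels_filter() is the labeling helper defined in the article.
def get_labels_filter(dataset, rolling=200, quantiles=[.45, .55], polyorder=3, decay_factor=0.95) -> pd.DataFrame: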
    """
    Generates labels for a financial dataset based on price deviation from a Savitzky-Golay filter,
    with exponential weighting applied to prioritize recent data.

    Args:
        dataset (pd.DataFrame): DataFrame containing financial data with a 'close' column.
        rolling (int, optional): Window size for the Savitzky-Golay filter. Defaults to 200.
        quantiles (list, optional): Quantiles to define the "reversion zone". Defaults to [.45, .55].
        polyorder (int, optional): Polynomial order for the Savitzky-Golay filter. Defaults to 3.
        decay_factor (float, optional): Growth rate of the exponential weights. Weights rise from 1
                                        (oldest row) to about exp(decay_factor) (newest row), so higher
                                        values prioritize recent data more; 0 disables weighting.
                                        Defaults to 0.95.

    Returns:
        pd.DataFrame: The original DataFrame with a new 'labels' column and filtered rows:
                       - 'labels' column: 
                            - 0: Buy
                            - 1: Sell
                       - Rows where 'labels' is 2 (no signal) are removed.
                       - Rows with missing values (NaN) are removed.
                       - The temporary 'lvl' column is removed. 
    """

    # Calculate smoothed prices using the Savitzky-Golay filter
    smoothed_prices = savgol_filter(dataset['close'].values, window_length=rolling, polyorder=polyorder)
    
    # Calculate the difference between the actual closing prices and the smoothed prices
    diff = dataset['close'] - smoothed_prices
    
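    # Apply exponential weighting to the 'diff' values: the weight grows with the
    # row index, from 1 (oldest row) to roughly exp(decay_factor) (newest row)
    weighted_diff = diff * np.exp(np.arange(len(diff)) * decay_factor / len(diff))
    dataset = dataset.copy()  # avoid modifying the caller's DataFrame
    dataset['lvl'] = weighted_diff  # Add the weighted difference as 'lvl'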

    # Remove any rows with NaN values 
    dataset = dataset.dropna()
    
    # Calculate the quantiles of the 'lvl' column (price deviation)
    q = dataset['lvl'].quantile(quantiles).to_list() 

    # Extract the closing prices and the calculated 'lvl' values as NumPy arrays
    close = dataset['close'].values
    lvl = dataset['lvl'].values
    
    # Calculate buy/sell labels using the 'calculate_labels_filter' function 
    labels = calculate_labels_filter(close, lvl, q) 

    # Trim the dataset to match the length of the calculated labels
    dataset = dataset.iloc[:len(labels)].copy()
    
    # Add the calculated labels as a new 'labels' column to the DataFrame
    dataset['labels'] = labels
    
    # Remove any rows with NaN values
    dataset = dataset.dropna()
    
    # Remove rows where the 'labels' column has a value of 2.0 (no signal)
    dataset = dataset.drop(dataset[dataset.labels == 2.0].index)
    
    # Return the modified DataFrame with the 'lvl' column removed
    return dataset.drop(columns=['lvl'])


  • The decay_factor parameter (default 0.95) is added to the code to control how strongly recent data is weighted relative to past data.
  • For each data point, the weight is np.exp(np.arange(len(diff)) * decay_factor / len(diff)), which is multiplied by the diff values. The weight grows with the row index, so recent diffs get more weight and older ones less.
  • Weighted 'lvl' column: the lvl column now stores the exponentially weighted differences, making the labeling process more sensitive to recent price movements.
  • A smaller decay_factor value (closer to 0) flattens the weighting: at 0 every weight equals 1, so old and recent deviations count equally and the function reduces to the unweighted version.
  • A larger decay_factor value makes the weighting more aggressive: the newest row is weighted about exp(decay_factor) times the oldest (about 2.6 times at the default 0.95), so recent deviations from the smoothed price have more influence on the labels.
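A quick way to see what decay_factor does is to evaluate the weight expression from get_labels_filter on its own. The helper name decay_weights is introduced here only for illustration.

```python
import numpy as np


def decay_weights(n, decay_factor):
    """Weights multiplied into diff: exp(i * decay_factor / n) for row i."""
    return np.exp(np.arange(n) * decay_factor / n)


for d in (0.0, 0.5, 0.95):
    w = decay_weights(1000, d)
    # First (oldest) and last (newest) weight for each setting
    print(d, round(float(w[0]), 3), round(float(w[-1]), 3))
```

The oldest row always gets a weight of 1. With decay_factor = 0 the newest row also gets 1, and as decay_factor increases the newest weight approaches exp(decay_factor).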
[Deleted]  

When training on shorter intervals, e.g. 2018 to 2024, you may get few trades if n_clusters = 10 in the hyperparameters. Reducing the number of clusters, e.g. to 3-5, helps to get more trades.

In this way, you can train on shorter time periods and look for good patterns on them by varying different parameters.

You can also reduce the filter periods (Savitzky-Golay filter or splines) used by the trade samplers.


 
Hello Max! I'm writing to say that I'm looking forward to articles by 'Maxim Dmitrievsky'. I've been closely following your work for the past 2 years and study each article you post. I'm from Brazil, and each time I learn something new and valuable.

I hope from the bottom of my heart that you, Maxim, continue sharing your knowledge and research, and that the MetaQuotes team values you as a respected author and 'shares the profits' to encourage you to continue the amazing work. I wish you the best, Maxim!

Please, @MetaQuotes and the @MetaQuotes @ alexx team, give this guy a raise! He deserves it <3

Greetings from Brazil
Maxim Dmitrievsky: trader's profile (www.mql5.com, 2025.03.07)
[Deleted]  
Vinicius Barenho Pereira #:
Hello Max! I'm writing to say that I'm looking forward to articles by 'Maxim Dmitrievsky'. I've been closely following your work for the past 2 years and study each article you post. I'm from Brazil, and each time I learn something new and valuable.

I hope from the bottom of my heart that you, Maxim, continue sharing your knowledge and research, and that the MetaQuotes team values you as a respected author and 'shares the profits' to encourage you to continue the amazing work. I wish you the best, Maxim!

Please, @MetaQuotes and the @MetaQuotes @ alexx team, give this guy a raise! He deserves it <3

Greetings from Brazil

Thanks, I'll try to do something interesting and maybe useful in the future :)

 

This is great: at the end of the article it will be possible to train different machine learning models in Python and convert them into trading systems for the MetaTrader 5 trading terminal.

I will look into it in more detail - thanks for the article!

[Deleted]  
Roman Shiredchenko #:
This is great: at the end of the article it will be possible to train different machine learning models in Python and convert them into trading systems for the MetaTrader 5 trading terminal.

I will look into it in more detail - thanks for the article!

You are welcome, they sometimes even make money )
 

Hello Maxim,

I found an issue with the value generation in the get_features function in Python and in MetaTrader 5.

The problem lies in the "skew" statistic in Python and "skewness" in MQL5. From the tests I performed, the values generated by the two languages are slightly different. For example:

-0.087111 in MQL5, and -0.092592 in Python.

It may seem minor, but after the classification of the meta_labels, this leads to a delayed prediction, causing the EA to usually enter one candle late, which makes the strategy ineffective. I recommend not using this statistic in MQL5, or attempting to calculate it manually to match the same values.
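One possible cause, which is an assumption and was not verified against the MQL5 implementation, is a different normalization of the skewness estimate. pandas' skew() returns the bias-corrected sample skewness, while the plain population formula gives a slightly smaller magnitude on short windows. The sketch below shows the two conventions in Python only; the helper skew_population and the synthetic data are introduced for illustration.

```python
import numpy as np
import pandas as pd


def skew_population(x):
    """Biased (population) skewness g1 = m3 / m2**1.5 using 1/n moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return m3 / m2 ** 1.5


rng = np.random.default_rng(1)
x = rng.normal(size=20)  # synthetic window of 20 values
n = len(x)

g1 = skew_population(x)
G1 = pd.Series(x).skew()                     # pandas: bias-corrected estimate
correction = np.sqrt(n * (n - 1)) / (n - 2)  # G1 = g1 * correction

print(g1, G1, g1 * correction)
```

For a window of 20 values the correction factor is about 1.08, so a gap of several percent between two implementations is plausible if they use different conventions. If that turns out to be the cause, computing the statistic manually with the same formula on both sides should remove the mismatch.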

Greetings from Brazil

[Deleted]  
KleversonGerhardt #:

Hello Maxim,

I found an issue with the value generation in the get_features function in Python and in MetaTrader 5.

The problem lies in the "skew" statistic in Python and "skewness" in MQL5. From the tests I performed, the values generated by the two languages are slightly different. For example:

-0.087111 in MQL5, and -0.092592 in Python.

It may seem minor, but after the classification of the meta_labels, this leads to a delayed prediction, causing the EA to usually enter one candle late, which makes the strategy ineffective. I recommend not using this statistic in MQL5, or attempting to calculate it manually to match the same values.

Greetings from Brazil

Hi, thank you! I will check it.