Discussing the article: "Spurious Regressions in Python"

MetaQuotes 2024.05.24 10:14

Check out the new article: Spurious Regressions in Python.

Spurious regressions occur when two time series exhibit a high degree of correlation purely by chance, leading to misleading results in regression analysis. In such cases, even though variables may appear to be related, the correlation is coincidental and the model may be unreliable.

Before diving into the realm of algorithmic trading with machine learning, it's crucial to ascertain whether a meaningful relationship exists between the model inputs and the variable we aim to predict. This article illustrates the utility of employing unit root tests on model residuals to validate the presence of such a relationship in our datasets.

Unfortunately, it is possible to construct models using datasets that do not have any genuine relationship. These models may still yield impressively low error metrics, reinforcing a false sense of control and an excessively optimistic outlook. Such flawed models are commonly referred to as "spurious regressions."

This article begins by building an intuitive understanding of spurious regressions. Afterwards, we'll generate synthetic time series data to simulate a spurious regression and observe its characteristic effects. We'll then look at methods for identifying spurious regressions, and use those insights to validate a machine learning model built in Python. Finally, if our model is validated, we will export it to ONNX and implement a trading strategy in MQL5.
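The idea of simulating a spurious regression and testing its residuals can be sketched in a few lines. This is a minimal illustration, not the article's own code: the helper names (`random_walk`, `ols`, `dickey_fuller_t`) are hypothetical, the data is synthetic, and the unit root test is a bare Dickey-Fuller t-statistic with no constant and no lag terms. The usual Dickey-Fuller tables do not apply directly to regression residuals; Engle-Granger style critical values are stricter (roughly -3.3 at the 5% level for one regressor plus a constant). In practice a library routine such as `adfuller` or `coint` from `statsmodels.tsa.stattools` would be the more natural choice.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, resid, 1.0 - ss_res / ss_tot

def dickey_fuller_t(e):
    """t-statistic of rho in e[t] - e[t-1] = rho * e[t-1] + u[t]."""
    lagged = e[:-1]
    delta = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / sxx
    ss_res = sum((d - rho * l) ** 2 for l, d in zip(lagged, delta))
    se = (ss_res / (len(delta) - 1) / sxx) ** 0.5
    return rho / se

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 1000
x = random_walk(n, rng)

series = {
    # Independent random walk: any fit against x is spurious.
    "unrelated": random_walk(n, rng),
    # Built from x plus stationary noise: a genuine relationship.
    "related": [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x],
}

results = {}
for name, y in series.items():
    _, _, resid, r2 = ols(x, y)
    t_stat = dickey_fuller_t(resid)
    results[name] = (r2, t_stat)
    print(f"{name:>9}: R^2 = {r2:.3f}, residual DF t-stat = {t_stat:.2f}")
```

The point to look for is the residual statistic rather than R^2: the genuinely related pair should produce a strongly negative t-statistic (stationary residuals), while the unrelated pair typically does not, whatever its R^2 happens to be.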

Author: Gamuchirai Zororo Ndawana

Carl Schreiber 2024.06.21 11:31 #1

Very interesting article! For traders without in-depth statistical knowledge, though, it would have been helpful if basic terms such as residuals (the difference between the actual value and the prediction) and stationarity (whether the mean and variance are constant over time) had been briefly explained.

Gamuchirai Zororo Ndawana 2024.06.21 12:31 #2

Carl Schreiber #:
Very interesting article! For traders without in-depth statistical knowledge, though, it would have been helpful if basic terms such as residuals (the difference between the actual value and the prediction) and stationarity (whether the mean and variance are constant over time) had been briefly explained.

Thank you, Carl. You're right; I'll remember to keep it short and sweet next time to maximize utility.

Aleksey Vyazmikin 2024.09.23 20:47 #3

Perhaps this is a translation issue, but I would like to clarify. Is stationarity in the article defined for the residuals, i.e. the difference between the actual bar closing prices and their predictions? I may have misread, but why are we drawing conclusions from the same data the model was trained on? Wouldn't it be more logical to apply the model to a later, out-of-sample period?

The article makes it seem like the time series of quotes are stationary, but all sources tell us otherwise. I think this is an error in the perception of the material.

Also, the question of model accuracy is not covered. As I understood it, the model is not accurate at all; if so, can we still apply these tests when the errors in the model's output vary so strongly?

Ideally, it would be useful to see how excluding predictors, by one technique or another, affected the results of the regression model.

I think more articles are needed on this topic that can actually be applied to the quotes.

Gamuchirai Zororo Ndawana 2024.09.24 09:37 #4

Aleksey Vyazmikin #:

The article makes it seem like the time series of quotes are stationary, but all sources tell us otherwise. I think this is an error in the perception of the material.

Ideally, it would be useful to see how excluding predictors, by one technique or another, affected the results of the regression model.

I think more articles are needed on this topic that can actually be applied to the quotes.

Hey Aleksey, as you probably already know, there are many different ways we can solve any problem. I prefer measuring the model's residuals on test data it has not seen before. However, the academic literature I was reading at the time suggested to me that even training data the model has seen before is still fine.
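To make the train-versus-test point concrete, here is a small sketch under stated assumptions: the data is synthetic and related by construction, the 800/400 split is arbitrary, and `fit_line` and `dickey_fuller_t` are hypothetical helpers rather than anything from the article. The model is fitted on the first part of the sample only, and the same bare Dickey-Fuller style statistic (no constant, no lag terms) is computed on both the in-sample and the unseen residuals.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dickey_fuller_t(e):
    """t-statistic of rho in e[t] - e[t-1] = rho * e[t-1] + u[t]."""
    lagged = e[:-1]
    delta = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / sxx
    ss_res = sum((d - rho * l) ** 2 for l, d in zip(lagged, delta))
    se = (ss_res / (len(delta) - 1) / sxx) ** 0.5
    return rho / se

rng = random.Random(7)  # fixed seed so the run is repeatable
n, split = 1200, 800    # assumed sizes: first 800 points train, last 400 test

# Synthetic input (a random walk) and a target genuinely tied to it.
x, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Fit on the training window only.
a, b = fit_line(x[:split], y[:split])

# Residuals on data the model has seen and on data it has not.
train_resid = [yi - (a + b * xi) for xi, yi in zip(x[:split], y[:split])]
test_resid = [yi - (a + b * xi) for xi, yi in zip(x[split:], y[split:])]

t_train = dickey_fuller_t(train_resid)
t_test = dickey_fuller_t(test_resid)
print(f"train residual t-stat = {t_train:.2f}")
print(f"test  residual t-stat = {t_test:.2f}")
```

When the relationship is genuine, both sets of residuals should look stationary. Checking the held-out residuals is the stricter of the two, since the fitted coefficients were never tuned to that window.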

I was not aware that the way I wrote may have suggested that time series of market quotes are stationary. We all know they aren't. It was not my intention to say that, and I probably could have phrased things better.

The question of model accuracy was beyond my scope, because spurious models may still score high accuracy metrics.

You know, this was one of the first articles I ever wrote for the community. I've learned a lot since then, and I'll continue adding to the series. This time, I'll keep my writing clear, and I'll make a point of demonstrating how we can apply this to our advantage when trading financial markets.
