Discussing the article: "Overcoming The Limitation of Machine Learning (Part 9): Correlation-Based Feature Learning in Self-Supervised Finance"

 

Check out the new article: Overcoming The Limitation of Machine Learning (Part 9): Correlation-Based Feature Learning in Self-Supervised Finance.

Self-supervised learning is a powerful paradigm of statistical learning that derives supervisory signals from the observations themselves. This approach reframes challenging unsupervised learning problems into more familiar supervised ones. The technology has applications that are often overlooked by our community of algorithmic traders. Our discussion, therefore, aims to give the reader an approachable bridge into the open research area of self-supervised learning and offers practical applications that provide robust and reliable statistical models of financial markets without overfitting to small datasets.

Academic texts often provide statistical tests to determine whether a model's assumptions hold. It is important to know how well your model's assumptions align with the nature of your problem, because this tells us whether the model we selected is in sound health for the task we want to delegate to it. However, these standard statistical tests introduce an additional set of material challenges to an already difficult objective. In brief, standard academic solutions are not only difficult to execute and interpret correctly, but they are also vulnerable to producing false results, passing a model that is in fact unsound. This leaves practitioners exposed to unmitigated risks.

Therefore, this article proposes a more practical solution to ensure that your model’s assumptions about the real world are not being violated. We focus on one assumption shared by all statistical models—from simple linear models to modern deep neural networks. All of them assume that the target you have selected is a function of the observations you have on hand. We show that higher levels of performance can be reached by treating the given set of observations as raw material from which we generate new candidate targets that may be easier to learn. This paradigm is also known as self-supervised learning.
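The idea of treating the observations as raw material for new candidate targets can be sketched in a few lines. The sketch below uses synthetic data and illustrative candidate definitions (the names and formulas are assumptions for demonstration, not the article's actual implementation): each candidate target is computed directly from the input columns, so it is a function of the inputs by construction, and we can compare how easily a simple model learns each one.

```python
# Hedged sketch of self-supervised target generation.
# The data and candidate definitions below are illustrative assumptions,
# not the article's actual pipeline.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))  # stand-in for market observations (e.g. OHLC)

# Conventional target: a noisy future return, hard to learn reliably.
noisy_target = X @ np.array([0.2, -0.1, 0.05, 0.0]) + rng.normal(scale=2.0, size=n)

# Self-supervised candidates: each is computed from the observations,
# so the "target is a function of the inputs" assumption holds by construction.
candidates = {
    "noisy_return": noisy_target,
    "mean_of_inputs": X.mean(axis=1),
    "spread_col0_col1": X[:, 0] - X[:, 1],
}

# Fit a simple model to each candidate and compare validation fit;
# candidates that are deterministic functions of X are far easier to learn.
for name, y in candidates.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    score = r2_score(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: validation R^2 = {score:.3f}")
```

In practice the candidate targets would come from transformations with financial meaning (for instance, correlation-based features, as the article's title suggests), and the comparison would use proper walk-forward validation rather than a random split.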

These new targets generated from the input data are, by their very definition, guaranteed to be functions of the observations. Doing so may seem unnecessary, but it addresses one of our statistical models' greatest blind spots, helping us build more robust and reliable numerically driven trading applications. Let us get started.

Author: Gamuchirai Zororo Ndawana