Discussing the article: "Gaussian Processes in Machine Learning (Part 1): Classification Model in MQL5"
I haven’t had a proper read through it yet, but it seems I’ve already missed something.
Unlike methods such as ... decision trees, which output only a class label, GPs provide a probabilistic prediction.
In my humble opinion, decision trees are excellent at predicting class probabilities.
For classification tasks where the targets are discrete class labels, Gaussian likelihood isn’t suitable.
It seems that ‘tree-based’ classification algorithms convert probabilities into continuous ‘log-odds’ values, and then classification effectively boils down to a regression problem on these continuous log-odds values. Why can’t this be applied to Gaussian likelihood, whatever that may be? Unfortunately, I haven’t come across this term anywhere other than in the Python manual, but I’m familiar with the Gaussian distribution, Gaussian mixtures, maximum likelihood and expectation-maximisation ;-).
Good afternoon!
Indeed, I had a look at scikit-learn; the trees return the class probability. For some reason, I thought that only ensemble methods returned probabilities. Well, you live and learn, as they say.
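To make this concrete: as far as I understand it, a single tree's class probability is simply the fraction of training samples of each class that landed in the leaf. Below is a minimal sketch of that idea in plain Python. The function names, the one-split "stump" and the toy labels are all hypothetical illustrations, not scikit-learn's actual code.

```python
from collections import Counter

def leaf_probabilities(labels, classes):
    """Class probabilities in a leaf = class fractions among its training labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in classes}

def stump_predict_proba(x, threshold, left_labels, right_labels, classes=(0, 1)):
    """Hypothetical one-split tree: route x to a leaf, return that leaf's fractions."""
    leaf = left_labels if x <= threshold else right_labels
    return leaf_probabilities(leaf, classes)

# Toy training labels that ended up in each leaf (assumed data).
left_leaf = [0, 0, 0, 1]
right_leaf = [1, 1, 0, 1, 1]

proba = stump_predict_proba(2.7, threshold=2.0,
                            left_labels=left_leaf, right_labels=right_leaf)
print(proba)  # fractions of class 0 and class 1 in the right leaf
```

So a tree does give a probability, but it is a frequency estimate from the leaf rather than a posterior distribution over a latent function, which is what a GP provides.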
Now, regarding Gaussian likelihood and why it isn’t suitable for classification tasks.
Gaussian likelihood is the probability density of a normal distribution, parameterised by its mean (mathematical expectation) and variance. In our case, the role of the mean in the likelihood is played by the latent function f, whilst the variance is, in effect, the noise in the data.
How does the likelihood differ from a standard probability density function? In a standard probability density function, we fix the parameter values, substitute different values of y, and obtain the probability density at each y.
With likelihood, it is the other way round. Our y is fixed, whilst the distribution parameters vary. In other words, the likelihood is a function of the parameters. For example, the likelihood might tell us that, for parameters 0.2 and 1, the probability (density) of our observed y is 0.06, whereas for parameters 0.8 and 1.2 it is 0.12. In other words, the second set of parameters provides a more plausible description of the empirical data we are dealing with. Hence the name ‘likelihood’.
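A small sketch of this "fixed y, varying parameters" view, in plain Python. The observations are made up, and I am assuming the second parameter in each pair is a standard deviation, so the printed values are not meant to reproduce the 0.06 and 0.12 quoted above.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gaussian_likelihood(ys, mean, std):
    """Likelihood of i.i.d. observations ys as a function of (mean, std)."""
    result = 1.0
    for y in ys:
        result *= gaussian_pdf(y, mean, std)
    return result

# The data are fixed (hypothetical values); only the parameters change.
observed = [0.9, 1.1, 0.7]

lik_a = gaussian_likelihood(observed, mean=0.2, std=1.0)
lik_b = gaussian_likelihood(observed, mean=0.8, std=1.2)

print(lik_a, lik_b)
print("second parameter set is more plausible:", lik_b > lik_a)
```

The same function is used in both calls; what changes is which argument we treat as the variable.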
Now, why can’t we simply take the log-odds and apply a Gaussian likelihood to them? A Gaussian likelihood assumes that the observed data y follow a normal distribution. In other words, the y are continuous values.
In a GP model for classification, the latent function f(x) can be interpreted as the log-odds. But we predict this function; we do not observe it. What we do observe are the discrete labels y, and the likelihood is applied precisely to the observed data. Since our observed data are discrete, in the binary case they follow a Bernoulli distribution rather than a normal one.
For a classification problem, the likelihood must describe the probability of discrete labels; therefore, it is natural to choose a Bernoulli likelihood here, with the latent f mapped to a probability through a sigmoid such as the logistic function.
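To illustrate what a likelihood over discrete labels looks like, here is a minimal sketch in plain Python. It assumes a Bernoulli likelihood with a logistic sigmoid link and labels coded as 0/1; those choices, the function names and the toy values are my assumptions for illustration, not necessarily what the article's library uses.

```python
import math

def sigmoid(f):
    """Logistic function: maps a latent log-odds value f to a probability."""
    return 1.0 / (1.0 + math.exp(-f))

def bernoulli_likelihood(y, f):
    """p(y | f) for a label y in {0, 1} given the latent value f."""
    p = sigmoid(f)
    return p if y == 1 else 1.0 - p

def log_likelihood(labels, latents):
    """Sum of log p(y_i | f_i) over all observations."""
    return sum(math.log(bernoulli_likelihood(y, f))
               for y, f in zip(labels, latents))

labels = [1, 0, 1]                 # observed discrete labels (toy data)
good_latents = [2.0, -1.5, 0.5]    # latent values that agree with the labels
bad_latents = [-2.0, 1.5, -0.5]    # latent values that contradict them

print(log_likelihood(labels, good_latents))
print(log_likelihood(labels, bad_latents))
```

Note that for any f the two label probabilities sum to one, which is exactly what a Gaussian density over y cannot guarantee when y only takes the values 0 and 1.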
Check out the new article: Gaussian Processes in Machine Learning (Part 1): Classification Model in MQL5.
We continue our acquaintance with Gaussian processes (GP) as a machine learning model. In the previous article, we examined in detail the regression problem, where the main goal was to predict continuous values. Today we turn to a much more complex topic: classification. Its main difficulty is that inference for GP classification has no analytical solution, which requires approximate methods such as the Laplace approximation.
To effectively solve this complex problem, we will develop a modular library of Gaussian processes in MQL5. This approach will allow us to structure the code by separating the GP model into independent components and will provide a solid foundation for further improvements and extensions. This library will become a universal tool for both regression and classification tasks.
In the first part of the article, we will examine in detail the theory of GP classification, including the mathematics underlying the approximate methods. We will also introduce the main class of the library, GaussianProcess, which will unite all components of the model, as well as the GPOptimizationObjective class responsible for communication with the Alglib optimization library.
Author: Evgeniy Chernish