Discussion of article "Neural networks made easy (Part 17): Dimensionality reduction"

 

New article Neural networks made easy (Part 17): Dimensionality reduction has been published:

In this part, we continue discussing artificial intelligence models, namely unsupervised learning algorithms. We have already discussed one of the clustering algorithms. In this article, I share an approach to solving problems related to dimensionality reduction.

Principal component analysis was invented by the English mathematician Karl Pearson in 1901. Since then, it has been successfully used in many fields of science.

To understand the essence of the method, let us take a simplified task: reducing a two-dimensional data array to a vector. From a geometric point of view, this can be represented as the projection of points of a plane onto a straight line.

In the figure below, the initial data is represented by blue dots. There are two projections, onto the orange and gray lines, with dots of the corresponding color. As you can see, the average distance from the initial points to their orange projections is smaller than the corresponding distances to the gray projections. Moreover, some gray projections of different points overlap. So, the orange projection is preferable: it keeps all the individual points separated and loses less data during dimensionality reduction (the loss being the distance from the points to their projections).

Such a line is called the principal component. That is why the method is called Principal Component Analysis.

From a mathematical point of view, each principal component is a numerical vector whose size is equal to the dimension of the original data. The product of the vector of original data describing one system state and the corresponding principal component vector gives the projection of the analyzed state onto the straight line.
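As a quick illustration (a Python/NumPy sketch on made-up data, not the article's MQL5 code): the first principal component is taken from the covariance matrix of a toy 2D data set, and the dot product of each centered state vector with it gives the one-dimensional projection.

import numpy as np
# Toy data set: each row is one system state described by two parameters
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])
Xc = X - X.mean(axis=0)                      # center the data
cov = np.cov(Xc, rowvar=False)               # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
w = eigvecs[:, np.argmax(eigvals)]           # principal component (unit vector)
projection = Xc @ w                          # one value per state: its position on the line
print(w, projection)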

Depending on the original data dimension and the requirements for dimensionality reduction, there can be several principal components, but no more than the original data dimension. When rendering a volumetric projection, there will be three of them. When compressing data, the allowable error is usually a loss of up to 1% of the information.

Principal component method

Visually, this looks similar to linear regression, but these are completely different methods that produce different results.

Author: Dmitriy Gizlyk

 

Another area where dimensionality reduction methods are used is data visualization. For example, suppose you have data describing the states of some system, represented by 10 parameters, and you need to find a way to visualize it. For human perception, 2- and 3-dimensional images are the most convenient. Of course, you could make several slides with different combinations of 2-3 parameters, but that would not give a complete picture of the system states. In most cases, different states would merge into a single point on different slides, and they would not always be the same states.

Therefore, we would like to find an algorithm that translates all our system states from 10 parameters into 2- or 3-dimensional space while separating the states and preserving their mutual arrangement as much as possible, and, of course, with minimal loss of information.
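To make this concrete, here is a hedged Python/NumPy sketch of such a reduction; the 10-parameter states are random placeholders, and the two strongest principal components give flat coordinates suitable for a 2D scatter plot.

import numpy as np
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 10))          # placeholder: 200 states, 10 parameters each
Xc = states - states.mean(axis=0)            # center the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]            # components by descending variance
W = eigvecs[:, order[:2]]                    # 10 x 2 reduction matrix
coords_2d = Xc @ W                           # 200 x 2 coordinates for plotting
explained = eigvals[order[:2]].sum() / eigvals.sum()   # share of variance kept
print(coords_2d.shape, round(explained, 3))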

Dmitry, thank you for the article!

After reading these lines, I immediately remember the process of analysing the results of optimisation, when I look at a 3D graph and change parameters for each of the axes in turn. After all, I want to see not just the best value of a parameter, but also its influence on other parameters.

Will the principal component method help in this case? What will the graph look like after dimensionality reduction? And how will it be possible to extract the values of a particular parameter at each point?

 
Andrey Khatimlianskii #:

...the process of analysing the results of optimisation, when I look at a 3D graph and change parameters for each of the axes in turn. After all, I want to see not just the best value of a parameter, but also its influence on other parameters.

Will the principal component method help in this case? What will the graph look like after dimensionality reduction? And how will it be possible to extract the values of a particular parameter at each point?

In the case of explicit axis positions (when they can be determined unambiguously), yes, it will help. When there are several location variants that are close in value, the first calculations will give an axis direction that is not always true. In general, dimensionality reduction does not work on uniform distributions.

P.S. The article is solid, respect to the author.
 
Valeriy Yastremskiy #:

In the case of explicit axis positions (when they can be determined unambiguously), yes, it will help. When there are several location variants that are close in value, the first calculations will give an axis direction that is not always true. In general, dimensionality reduction does not work on uniform distributions.

Apparently, you need to be well versed in the subject to understand the answer.

The strategy parameters are on the axes; they can have very different values and be related or independent. I would like to analyse one chart and see all the relationships at once.

 

"Reducing dimensionality" can quickly push you into a two-dimensional corner if you don't realise that it only works "in sample" in most cases :)

but the article is cool in terms of porting PCA to MQL, although it is in alglib
 
Andrey Khatimlianskii #:

Apparently, you need to be well versed in the subject to understand the answer.

The strategy parameters are on the axes; they can have very different values and be related or independent. I would like to analyse one chart and see all the relationships at once.

No, PCA will not give you all the correlations at once; it will only give you an average and highlight the strongest ones. If the long-term result does not depend on the parameters, i.e. is constant, PCA will not help. If the influence of the parameters on the result is stepwise or wave-like, it will not help either, unless, of course, the analysis is made within one wave/step.

 
Maxim Dmitrievsky #:

"Reducing dimensionality" can quickly push you into a two-dimensional corner if you don't realise that it only works "in sample" in most cases :)

but the article is cool in terms of porting PCA to MQL, although it is in alglib.

you can see the trend))))

 
Andrey Khatimlianskii #:

...the process of analysing the results of optimisation, when I look at a 3D graph and change parameters for each of the axes in turn. After all, I want to see not just the best value of a parameter, but also its influence on other parameters.

Will the principal component method help in this case? What will the graph look like after dimensionality reduction? And how will it be possible to extract the values of a particular parameter at each point?

Andrew, the situation can be explained by the graph presented in the first post. With PCA we reduce the dimensionality to a single line. That is, by multiplying the 2 coordinates by the dimensionality reduction vector, we get one value: the distance from "0" to the point on the orange line. By multiplying this distance by the transposed reduction matrix, we get the coordinates of this point back in 2-dimensional space. Of course, in doing so we get a point lying on the line, with some deviation from the true point. Thus, for each point in the reduced space we can obtain coordinates in the original space, but with some error relative to the original data.
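For anyone who wants to check the numbers, here is a small Python/NumPy sketch of exactly this forward and backward step on made-up 2D points (it is not the article's MQL5 implementation): the reduction vector gives the distance along the line, and multiplying that distance by the transposed vector restores 2D coordinates, with the reconstruction error left over.

import numpy as np
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
mean = X.mean(axis=0)
Xc = X - mean                                 # center the points
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w = eigvecs[:, [np.argmax(eigvals)]]          # 2 x 1 reduction matrix
distance = Xc @ w                             # distance from "0" along the orange line
restored = distance @ w.T + mean              # back to 2D: points lying on the line
error = np.linalg.norm(X - restored, axis=1)  # deviation from the true points
print(restored, error)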

 
Dmitriy Gizlyk #:

Andrew, the situation can be explained by the graph presented in the first post. With PCA we reduce the dimensionality to a single line. That is, by multiplying the 2 coordinates by the dimensionality reduction vector, we get one value: the distance from "0" to the point on the orange line. By multiplying this distance by the transposed reduction matrix, we get the coordinates of this point back in 2-dimensional space. Of course, in doing so we get a point lying on the line, with some deviation from the true point. Thus, for each point in the reduced space we can obtain coordinates in the original space, but with some error relative to the original data.

Thanks for the answer.

If the X axis is the value of the parameter and the Y axis is the result of the run in money, a lot of information is lost after the transformation.

And the most unclear thing is how this might look on a 3d chart. How will the dimensionality be lowered?

And for 4d? What would be the result?

Probably need a good imagination or deep understanding of all processes here )

 
Andrey Khatimlianskii #:

Thank you for your reply.

If the X axis is the value of the parameter and the Y axis is the result of the run in money, a lot of information is lost after the conversion.

And the most unclear thing is how it might look on a 3d chart. How will the dimensionality be reduced?

And for 4d? What would be the result?

I guess you need a good imagination or a deep understanding of all the processes )

This is just a simplified example, of course; when we have 2 parameters and one result, this method is hardly needed. It is when there are more than 5 parameters that the visualisation problem arises. 4 parameters can be represented by a video (one parameter as time), 3 parameters by a volumetric image, and the result by density or colour in the volumetric image.

The explanation from the wiki is not bad:

Formal formulation of the PCA problem

The principal component analysis problem has at least four basic versions:

  • approximate the data by linear manifolds of lower dimension;
  • find subspaces of lower dimensionality, in the orthogonal projection on which the data scatter (i.e. the standard deviation from the mean value) is maximised (this version is written out as a formula after the list);
  • find subspaces of lower dimensionality, in the orthogonal projection on which the RMS distance between points is maximal;
  • for a given multidimensional random variable, construct such an orthogonal transformation of coordinates that the correlations between individual coordinates turn to zero.
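For reference, the second version (maximising the scatter of the orthogonal projection) can be written compactly. Assuming the data vectors x_1, ..., x_n are already centered, the first principal component is

w_1 = \arg\max_{\|w\| = 1} \frac{1}{n} \sum_{i=1}^{n} \left( w^{\top} x_i \right)^2,

and each subsequent component solves the same problem in the subspace orthogonal to the components already found.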
 
Always heard Neurals were the future of AI.