
An introduction to SVD and its widely used applications


If you’re in the data science world (or close to it), you probably already heard of SVD a thousand times even if you haven’t used it. Whether it’s for PCA (Principal Components Analysis) or recommendation algorithms, SVD is a powerful technique widely used today in a lot of models — we’ll describe what it is and show how it’s used in some key methods.

Note: the focus here is on the theory; the short code sketches below are only meant to illustrate the ideas.

What’s SVD?

SVD stands for Singular Value Decomposition. The SVD of a matrix X of dimension n×d is given by:

X = U D V^T

Where:

  • U and V are square and orthogonal:
    U^T U = U U^T = I_n
    V^T V = V V^T = I_d
  • D is diagonal of dimension n×d

Some additional notes:

  • D is not square (it is rectangular whenever n ≠ d)
  • The SVD exists for any matrix
  • SVD is different from the eigenvalue decomposition of a matrix.
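
To make the definition concrete, here is a minimal NumPy sketch (not part of the original article; the matrix and its dimensions are arbitrary) that computes an SVD and checks the properties above:

import numpy as np

n, d = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))

# full_matrices=True returns a square U (n×n) and a square V^T (d×d);
# the singular values come back as a 1-D array.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the rectangular diagonal matrix D (n×d) from the singular values
# (this slicing assumes d <= n, as in this example).
D = np.zeros((n, d))
D[:d, :d] = np.diag(s)

# U and V are orthogonal, and X = U D V^T.
assert np.allclose(U @ U.T, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(d))
assert np.allclose(U @ D @ Vt, X)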

Let’s define the eigenvalue decomposition of a matrix. It exists for a matrix X if and only if X is square and its eigenvectors form a basis of the underlying space. If that’s the case, then one can write:

X = P Delta P^{-1}

where P is the matrix of the eigenvectors and Delta is a diagonal matrix of the eigenvalues of X; here, Delta is square.

In some sense, SVD is a generalization of eigenvalue decomposition since it can be applied to any matrix.
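
To see the difference, here is a small sketch (arbitrary example data, not from the article) of an eigenvalue decomposition with NumPy, which only applies to a square matrix:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))      # a square matrix

eigvals, P = np.linalg.eig(A)    # columns of P are the eigenvectors
Delta = np.diag(eigvals)

# If the eigenvectors form a basis (P is invertible), then A = P Delta P^{-1}.
# Note that eig may return complex eigenvalues even for a real matrix.
assert np.allclose(P @ Delta @ np.linalg.inv(P), A)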

SVD used in PCA

PCA stands for Principal Components Analysis. Given an input matrix X, it consists of finding components p_i that are linear combinations of the original coordinates:

p_i = a_{i1} x_1 + a_{i2} x_2 + ... + a_{id} x_d

in such a way that:

  • The components are orthogonal (E[p_i p_j] = 0 for i ≠ j)
  • The components are ordered in such a way that p_1 explains the largest percentage of variance, p_2 explains the second largest (not already explained by p_1), and so on.

We assume here that X is normalized ( E(x_i)=0 and Var(x_i)=1 for each feature).

One can show that the components p are given by

p = Gamma^T x

where Gamma is the matrix of eigenvectors obtained from the eigenvalue decomposition of the variance-covariance matrix of X. Note that this is possible because the variance-covariance matrix is symmetric, and any symmetric matrix has an eigenvalue decomposition. Also note that the orthogonality of Gamma implies

Gamma^{-1} = Gamma^T, and therefore x = Gamma p

Note also that since E(x)=0, E(p)=0. And

Var(p) = Gamma^T Var(x) Gamma = Delta

where Delta is the matrix of the eigenvalues of the variance-covariance matrix of X, placed in descending order. The key thing to remember here is that the principal components of X are found through a linear transformation using the eigenvectors of the variance-covariance matrix.

Once we have the components, we can reduce our data by keeping only the top k ( k being arbitrary) components.
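
Here is a minimal sketch of PCA done this way (eigenvalue decomposition of the variance-covariance matrix) with NumPy; the data is arbitrary and the variable names simply mirror the notation above:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))

# Normalize: zero mean and unit variance for each feature.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Variance-covariance matrix and its eigenvalue decomposition
# (eigh is used because Sigma is symmetric).
Sigma = (X.T @ X) / X.shape[0]
eigvals, Gamma = np.linalg.eigh(Sigma)

# Sort eigenvalues (and eigenvectors) in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, Gamma = eigvals[order], Gamma[:, order]

# Principal components: p = Gamma^T x, i.e. P = X Gamma for the whole sample.
P = X @ Gamma

# Keep only the top k components to reduce the data.
k = 2
P_k = P[:, :k]

# The empirical covariance of the components is (up to rounding) the diagonal
# matrix of eigenvalues, in descending order.
print(np.round((P.T @ P) / X.shape[0], 3))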

But do we really need to compute this eigenvalue decomposition?

No! And that’s where SVD comes in. Indeed, using the SVD X = U D V^T above, the eigenvectors of the variance-covariance matrix are given by the columns of V, and its eigenvalues are the squared singular values of X (divided by n, the normalization used in the covariance matrix).

We can thus use SVD to perform PCA, and keep the top k singular values of X to obtain the top k principal components.

P_k = X V_k = U_k D_k

In some common cases, doing PCA through SVD is numerically more efficient than through the eigenvalue decomposition of the variance-covariance matrix.
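
As a quick check, here is a sketch (arbitrary data, assuming X has n observations in rows and d features in columns) showing that both routes agree:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
n = X.shape[0]

# SVD route: the right singular vectors V give the principal directions,
# and the eigenvalues of the covariance matrix are s**2 / n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals_svd = s**2 / n
P_svd = X @ Vt.T                    # principal components (scores)

# Eigenvalue-decomposition route, for comparison.
eigvals, Gamma = np.linalg.eigh((X.T @ X) / n)
order = np.argsort(eigvals)[::-1]
eigvals, Gamma = eigvals[order], Gamma[:, order]

assert np.allclose(eigvals_svd, eigvals)
# Eigenvectors are defined up to a sign flip, hence the absolute values.
assert np.allclose(np.abs(Vt.T), np.abs(Gamma))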

SVD used in matrix completion

For most recommendation algorithms, the input matrix is very sparse, so matrix factorization methods are key, and SVD plays a central role in them.

The general matrix factorization problem can be formulated as

min_M || X - M ||_F^2

under the constraint that the rank of M is less than or equal to an arbitrary integer r. One can prove that the solution to this problem is

M* = U D_r V^T

where

X = U D V^T is the SVD of X,

and D_r is constructed from D by keeping just the first r singular values.
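
Here is a minimal sketch of this best rank-r approximation (the Eckart-Young result) with NumPy, on an arbitrary matrix:

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 6))
r = 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep just the first r singular values and set the rest to zero.
s_r = np.zeros_like(s)
s_r[:r] = s[:r]
M_star = U @ np.diag(s_r) @ Vt

print(np.linalg.matrix_rank(M_star))        # r
print(np.linalg.norm(X - M_star, 'fro'))    # minimal error among rank-r matrices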

One can note that we can’t easily solve the above problem when the matrix X has missing values. There are several methods to cope with that. One of them is to start with an initial solution and to iterate by doing SVDs until convergence. We won’t get into the details of the methods here, but the bottom line is that SVD can also be used in matrix completion, since it involves doing matrix factorization.
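
To give an idea of what such an iteration can look like, here is a simplified sketch (an illustration of the general idea, not the exact algorithm of any particular library): fill the missing entries with the mean, truncate the SVD to rank r, put the observed entries back, and repeat:

import numpy as np

def svd_impute(X, r, n_iter=50):
    """Iteratively impute the NaN entries of X with a rank-r SVD approximation."""
    missing = np.isnan(X)
    X_hat = np.where(missing, np.nanmean(X), X)   # initial fill with the global mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        s[r:] = 0.0                               # keep only the top r singular values
        low_rank = U @ np.diag(s) @ Vt
        X_hat = np.where(missing, low_rank, X)    # observed entries stay fixed
    return X_hat

# Example: a rank-2 matrix with about 20% of its entries removed.
rng = np.random.default_rng(5)
A = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 8))
X = A.copy()
X[rng.random(A.shape) < 0.2] = np.nan

A_hat = svd_impute(X, r=2)
print(np.max(np.abs(A_hat - A)))                  # reconstruction error (should be small)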

