
An introduction to SVD and its widely used applications


If you’re in the data science world (or close to it), you probably already heard of SVD a thousand times even if you haven’t used it. Whether it’s for PCA (Principal Components Analysis) or recommendation algorithms, SVD is a powerful technique widely used today in a lot of models — we’ll describe what it is and show how it’s used in some key methods.

Note: the focus here is on the theory; the short code sketches below are only meant to illustrate the ideas.

What’s SVD?

SVD stands for Singular Value Decomposition. The SVD of a matrix X of dimension n×d is given by:

X = U D V^T

Where:

  • U and V are square and orthogonal:
    U^T U = U U^T = I_n
    V^T V = V V^T = I_d
  • D is diagonal of dimension n×d

Some additional notes:

  • D is not square (it is rectangular whenever n ≠ d)
  • The SVD exists for any matrix
  • SVD is different from the eigenvalue decomposition of a matrix.
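
To make the definition concrete, here is a minimal NumPy sketch (not part of the original article; the matrix and its dimensions are arbitrary) that computes an SVD and checks the properties above:

import numpy as np

n, d = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))

# full_matrices=True returns a square U (n×n) and a square V^T (d×d);
# the singular values come back as a 1-D array.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the rectangular diagonal matrix D (n×d) from the singular values
# (this slicing assumes d <= n, as in this example).
D = np.zeros((n, d))
D[:d, :d] = np.diag(s)

# U and V are orthogonal, and X = U D V^T.
assert np.allclose(U @ U.T, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(d))
assert np.allclose(U @ D @ Vt, X)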

Let’s define the eigenvalue decomposition of a matrix. It exists for a matrix X if and only if X is square and its eigenvectors form a basis of the underlying space. If that’s the case, then one can write:

X = P Delta P^{-1}

where P is the matrix of the eigenvectors and Delta is a diagonal matrix of the eigenvalues of X; here, Delta is square.

In some sense, SVD is a generalization of eigenvalue decomposition since it can be applied to any matrix.
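
To see the difference, here is a small sketch (arbitrary example data, not from the article) of an eigenvalue decomposition with NumPy, which only applies to a square matrix:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))      # a square matrix

eigvals, P = np.linalg.eig(A)    # columns of P are the eigenvectors
Delta = np.diag(eigvals)

# If the eigenvectors form a basis (P is invertible), then A = P Delta P^{-1}.
# Note that eig may return complex eigenvalues even for a real matrix.
assert np.allclose(P @ Delta @ np.linalg.inv(P), A)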

SVD used in PCA

PCA stands for Principal Components Analysis. Given an input matrix X, it consists of finding components p_i that are linear combinations of the original coordinates:

p_i = a_{i1} x_1 + a_{i2} x_2 + ... + a_{id} x_d

in such a way that:

  • The components are orthogonal (E[p_i p_j] = 0 for i ≠ j)
  • The components are ordered in such a way that p_1 explains the largest percentage of variance, p_2 explains the second largest (not already explained by p_1), and so on.

We assume here that X is normalized ( E(x_i)=0 and Var(x_i)=1 for each feature).

One can show that the components p are given by

p = Gamma^T x

where Gamma is the matrix of eigenvectors obtained from the eigenvalue decomposition of the variance-covariance matrix of X. Note that this is possible because the variance-covariance matrix is symmetric, and any symmetric matrix has an eigenvalue decomposition. Also note that the orthogonality of Gamma implies

Gamma^{-1} = Gamma^T, and therefore x = Gamma p

Note also that since E(x)=0, E(p)=0. And

Var(p) = Gamma^T Var(x) Gamma = Delta

where Delta is the matrix of the eigenvalues of the variance-covariance matrix of X, placed in descending order. The key thing to remember here is that the principal components of X are found through a linear transformation using the eigenvectors of the variance-covariance matrix.

Once we have the components, we can reduce our data by keeping only the top k ( k being arbitrary) components.
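
Here is a minimal sketch of PCA done this way (eigenvalue decomposition of the variance-covariance matrix) with NumPy; the data is arbitrary and the variable names simply mirror the notation above:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))

# Normalize: zero mean and unit variance for each feature.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Variance-covariance matrix and its eigenvalue decomposition
# (eigh is used because Sigma is symmetric).
Sigma = (X.T @ X) / X.shape[0]
eigvals, Gamma = np.linalg.eigh(Sigma)

# Sort eigenvalues (and eigenvectors) in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, Gamma = eigvals[order], Gamma[:, order]

# Principal components: p = Gamma^T x, i.e. P = X Gamma for the whole sample.
P = X @ Gamma

# Keep only the top k components to reduce the data.
k = 2
P_k = P[:, :k]

# The empirical covariance of the components is (up to rounding) the diagonal
# matrix of eigenvalues, in descending order.
print(np.round((P.T @ P) / X.shape[0], 3))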

But do we really need to compute this eigenvalue decomposition?

No! And that’s where SVD comes in. Indeed, using the SVD X = U D V^T above, the eigenvectors of the variance-covariance matrix are given by the columns of V, and its eigenvalues are the squared singular values of X (divided by n, the normalization used in the covariance matrix).

We can thus use SVD to perform PCA, and keep the top k singular values of X to obtain the top k principal components.

P_k = X V_k = U_k D_k

In some common cases, doing PCA through SVD is numerically more efficient than through the eigenvalue decomposition of the variance-covariance matrix.
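
As a quick check, here is a sketch (arbitrary data, assuming X has n observations in rows and d features in columns) showing that both routes agree:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
n = X.shape[0]

# SVD route: the right singular vectors V give the principal directions,
# and the eigenvalues of the covariance matrix are s**2 / n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals_svd = s**2 / n
P_svd = X @ Vt.T                    # principal components (scores)

# Eigenvalue-decomposition route, for comparison.
eigvals, Gamma = np.linalg.eigh((X.T @ X) / n)
order = np.argsort(eigvals)[::-1]
eigvals, Gamma = eigvals[order], Gamma[:, order]

assert np.allclose(eigvals_svd, eigvals)
# Eigenvectors are defined up to a sign flip, hence the absolute values.
assert np.allclose(np.abs(Vt.T), np.abs(Gamma))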

SVD used in matrix completion

For most recommendation algorithms, the input matrix is very sparse, so matrix factorization methods are key, and SVD plays a central role in them.

The general matrix factorization problem can be formulated as

min_M || X - M ||_F^2

under the constraint that the rank of M is less than or equal to an arbitrary integer r. One can prove that the solution to this problem is

M* = U D_r V^T

where

X = U D V^T is the SVD of X,

and D_r is constructed from D by keeping just the first r singular values.
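
Here is a minimal sketch of this best rank-r approximation (the Eckart-Young result) with NumPy, on an arbitrary matrix:

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 6))
r = 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep just the first r singular values and set the rest to zero.
s_r = np.zeros_like(s)
s_r[:r] = s[:r]
M_star = U @ np.diag(s_r) @ Vt

print(np.linalg.matrix_rank(M_star))        # r
print(np.linalg.norm(X - M_star, 'fro'))    # minimal error among rank-r matrices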

One can note that we can’t easily solve the above problem when the matrix X has missing values. There are several methods to cope with that. One of them is to start with an initial solution and to iterate by doing SVDs until convergence. We won’t get into the details of the methods here, but the bottom line is that SVD can also be used in matrix completion, since it involves doing matrix factorization.
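
To give an idea of what such an iteration can look like, here is a simplified sketch (an illustration of the general idea, not the exact algorithm of any particular library): fill the missing entries with the mean, truncate the SVD to rank r, put the observed entries back, and repeat:

import numpy as np

def svd_impute(X, r, n_iter=50):
    """Iteratively impute the NaN entries of X with a rank-r SVD approximation."""
    missing = np.isnan(X)
    X_hat = np.where(missing, np.nanmean(X), X)   # initial fill with the global mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        s[r:] = 0.0                               # keep only the top r singular values
        low_rank = U @ np.diag(s) @ Vt
        X_hat = np.where(missing, low_rank, X)    # observed entries stay fixed
    return X_hat

# Example: a rank-2 matrix with about 20% of its entries removed.
rng = np.random.default_rng(5)
A = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 8))
X = A.copy()
X[rng.random(A.shape) < 0.2] = np.nan

A_hat = svd_impute(X, r=2)
print(np.max(np.abs(A_hat - A)))                  # reconstruction error (should be small)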

