Understanding Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD) is a fundamental technique in linear algebra for matrix factorization. It has widespread applications in data science and machine learning.
What is SVD?
Any m×n matrix A of rank r can be decomposed as
A = U𝛴V^T = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_r u_r v_r^T,
where U is an m×m orthogonal matrix, 𝛴 is an m×n diagonal matrix, V is an n×n orthogonal matrix, u_i and v_i are the i-th columns of U and V, and σ_i is the i-th diagonal element of 𝛴. Such a decomposition is called a singular value decomposition.
A matrix being orthogonal means that its columns and rows are orthonormal vectors. In other words,
U^T U = U U^T = I and V^T V = V V^T = I.
The diagonal elements of 𝛴 are sorted in decreasing order and are called singular values. The rank of 𝛴, which equals the number of nonzero diagonal elements, is the same as the rank of A.
The proof of the existence of the SVD is beyond the scope of this article. Interested readers can check out https://www.cs.ox.ac.uk/james.worrell/SVD-thin.pdf for more details.
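To make the definition concrete, here is a minimal sketch using NumPy (the example matrix is an arbitrary assumption, chosen purely for illustration) that computes the decomposition and checks the properties above:

```python
import numpy as np

# An arbitrary 4x3 example matrix (assumed here purely for illustration).
A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [ 2.0, 0.0, 4.0],
              [ 1.0, 2.0, 0.0]])

# full_matrices=True returns the full SVD: U is 4x4, Vt is 3x3,
# and s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 4x3 diagonal matrix Sigma from the singular values.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

# U and V are orthogonal: U^T U = I and V^T V = I.
print(np.allclose(U.T @ U, np.eye(U.shape[0])))    # True
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))) # True

# The product U Sigma V^T recovers A.
print(np.allclose(U @ Sigma @ Vt, A))               # True

# The singular values come out sorted in decreasing order.
print(s)
```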
Why SVD?
Almost all of SVD’s applications in machine learning are related to dimensionality reduction, so let’s elaborate on that.
Upon revisiting the equation above, have you noticed that there are only r terms in the summation of the rightmost formula? The number of terms is the same as the rank of the matrix. This motivates us to approximate matrix A with a lower-rank matrix, defined by
A_k = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_k u_k v_k^T,
where k < r.
That makes sense, but there are many other ways to find a lower-rank matrix that is similar to A. As a naive example, we can simply set a column or row of a full-rank matrix A to zero, which by definition gives us a lower-rank matrix. Why is the SVD better than other low-rank approximations?
The answer is that the truncated SVD approximates A optimally for every rank k < r. As stated by the Eckart–Young theorem, among all matrices B of rank at most k, the Frobenius norm of the difference, ||A − B||_F, is minimized when B equals the rank-k truncated SVD of A, i.e. B = A_k.
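As a quick numerical illustration of this claim (the random matrix and the chosen rank k are assumptions made only for this sketch), the snippet below compares the rank-k truncated SVD against the naive approach of zeroing out columns, measuring both errors in the Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))  # a random full-rank matrix, assumed for illustration

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3  # target rank, chosen arbitrarily

# Rank-k truncated SVD: keep only the k largest singular values.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Naive lower-rank matrix: zero out two columns so the rank drops to at most k.
B = A.copy()
B[:, 3:] = 0.0

print("rank of A_k:", np.linalg.matrix_rank(A_k))
print("rank of B:  ", np.linalg.matrix_rank(B))

# Eckart-Young: the truncated SVD gives the smaller Frobenius-norm error,
# and that error equals sqrt(sum of the squared discarded singular values).
print("||A - A_k||_F =", np.linalg.norm(A - A_k, "fro"))
print("||A - B||_F   =", np.linalg.norm(A - B, "fro"))
print("sqrt(sum of discarded sigma^2) =", np.sqrt(np.sum(s[k:] ** 2)))
```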
A proof of the Eckart–Young theorem can be found at https://lzhangstat.medium.com/the-proof-of-eckhart-young-theorem-8aad76e5f4be.
How to calculate the SVD, i.e. U, 𝛴, and V
We know that if a matrix is symmetric, its eigenvectors can be chosen to be orthonormal, and we can write the matrix in the form QΛQ^T, where Q consists of the matrix’s eigenvectors and Λ is diagonal. While A itself may not be symmetric, A^T A always is (and it is positive semidefinite). That means we can write A^T A = QΛQ^T. If we plug in A = U𝛴V^T, we find that
A^T A = (U𝛴V^T)^T (U𝛴V^T) = V𝛴^T U^T U 𝛴 V^T = V(𝛴^T 𝛴)V^T.
That means V = Q and the diagonal entries of 𝛴 are the square roots of the eigenvalues in Λ, i.e. 𝛴 = sqrt(Λ).
To find U, we do the same thing with AA^T (or, to keep the signs of the columns consistent with V, take u_i = A v_i / σ_i for each nonzero singular value).
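The sketch below follows this recipe (the 3×2 example matrix is an assumption made for illustration): it eigendecomposes A^T A to get V and the singular values, then recovers U. Note that building U independently from the eigenvectors of AA^T leaves each eigenvector’s sign ambiguous, so here U is formed as u_i = A v_i / σ_i, which pins the signs down consistently.

```python
import numpy as np

A = np.array([[3.0,  2.0],
              [2.0,  3.0],
              [2.0, -2.0]])  # an arbitrary 3x2 example matrix (assumption)

# Eigendecomposition of the symmetric matrix A^T A = Q Lambda Q^T.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse them.
eigvals, Q = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]

# V = Q, and the singular values are the square roots of the eigenvalues.
V = Q
s = np.sqrt(np.clip(eigvals, 0.0, None))

# Recover U column by column: u_i = A v_i / sigma_i (valid for nonzero sigma_i).
U = A @ V / s

# Check against A = U Sigma V^T and against NumPy's own SVD.
print(np.allclose(U @ np.diag(s) @ V.T, A))   # True
print(s)                                      # singular values from this recipe
print(np.linalg.svd(A, compute_uv=False))     # same values from NumPy
```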