Introduction to Mathematical Programming

Ming Zhong

Lecture 24

October 29, 2018

Table of Contents

1 Singular Value Decomposition (SVD)

Decomposition

We have discussed several decomposition techniques (for A ∈ R^{n×n}):

- A = LU when A is non-singular (LU decomposition).
- A = LL^T when A is symmetric positive definite (Cholesky).
- A = QR for any real matrix (unique when A is non-singular).
- A = PDP^{-1} when A has n linearly independent eigenvectors (diagonalization).
- A = PJP^{-1} for any square matrix (Jordan form).

We are now ready to discuss the Singular Value Decomposition (SVD),

A = UΣV∗,

for any A ∈ C^{m×n}, with U ∈ C^{m×m}, V ∈ C^{n×n}, and Σ ∈ R^{m×n}.
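As a quick numerical sanity check, here is a minimal MATLAB sketch (the test matrix A is our own illustrative choice, not from the lecture) verifying a few of these factorizations with built-in routines:

    A = [4 1 0; 1 3 1; 0 1 2];    % symmetric positive definite example

    [L, U, P] = lu(A);            % P*A = L*U (with partial pivoting)
    norm(P*A - L*U)               % ~0

    R = chol(A);                  % A = R'*R, with R upper triangular
    norm(A - R'*R)                % ~0

    [Q, Rq] = qr(A);              % A = Q*Rq, with Q orthogonal
    norm(A - Q*Rq)                % ~0

    [P2, D] = eig(A);             % A*P2 = P2*D
    norm(A - P2*D/P2)             % ~0, i.e., A = P2*D*inv(P2)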

Some Brief History

Going back in time:

- The SVD was originally developed by differential geometers, who wanted to determine whether two bilinear forms could be made equal by independent orthogonal transformations.
- Eugenio Beltrami in 1873 and, independently, Camille Jordan in 1874, obtained it for bilinear forms.
- James Joseph Sylvester in 1889 independently derived the SVD for real square matrices; he called the singular values the canonical multipliers of the matrix.
- Autonne in 1915 derived the SVD via the polar decomposition.
- Carl Eckart and Gale Young in 1936 proved the SVD for rectangular and complex matrices, as a generalization of the principal axis transformation of Hermitian matrices.
- Erhard Schmidt in 1907 defined an analogue of the SVD for integral operators.
- Émile Picard in 1910 was the first to call the numbers singular values.
- Kogbetliantz in 1954-1955 and, independently, Hestenes in 1958 developed practical methods for computing the SVD, closely resembling the Jacobi eigenvalue algorithm and using plane (Givens) rotations.
- These were superseded by the method of Gene Golub and William Kahan in 1965, which uses Householder transformations (reflections).
- Golub and Christian Reinsch in 1970 published a variant of the Golub/Kahan algorithm.

Some Special Matrices

Recall that:

- A^T is the transpose of the matrix A.
- A^{-1} is the inverse of A.
- A^* is the Hermitian transpose of A, i.e., A^* = (Ā)^T.
- If A = A^T, A is symmetric; if A = A^*, A is Hermitian.
- If A = -A^T, A is skew-symmetric.
- If A^* = A^{-1}, A is unitary.
- A diagonal matrix has the form

  D = \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & d_{nn} \end{pmatrix}.

Matrix as Linear Transformation

Example. Consider the following:

\vec{x} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad A = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}, \quad \vec{y} = A\vec{x} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}.
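A one-line check of this product in MATLAB:

    x = [1; 3];
    A = [2 1; -1 1];
    y = A*x        % returns [5; 2]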

Matrix as Linear Transformation, cont.

As seen from the example, \vec{x} ↦ A\vec{x} is a (linear) transformation, which rotates \vec{x} and then stretches it.

A rotation (unitary) matrix,

Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},

rotates a vector counter-clockwise about the origin by the angle θ.

A stretching (compressing) matrix,

D = \begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix},

stretches a vector by the factor α if α > 1 (and compresses it if 0 < α < 1).
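A small numerical illustration in MATLAB (the angle and factor are our own picks):

    theta = pi/4;  alpha = 2;
    Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation by 45 degrees
    D = alpha*eye(2);                                     % uniform stretch by 2
    v = [1; 0];
    Q*v          % rotated: [sqrt(2)/2; sqrt(2)/2]
    D*(Q*v)      % rotated, then stretched: [sqrt(2); sqrt(2)]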

The Idea

So we want to factor a matrix into a number of constituent components, built from stretching/compressing and rotation.

- Consider the image of the unit sphere under any m × n matrix: it is a hyper-ellipse.
- A hyper-ellipse in R^m is obtained upon stretching the unit sphere in R^n by some factors, σ_1, σ_2, ..., σ_m, in the orthogonal directions \vec{u}_1, \vec{u}_2, ..., \vec{u}_m.
- We only consider ‖\vec{u}_i‖_2 = 1; then σ_i \vec{u}_i is the i-th principal semi-axis.
- If A has rank r, exactly r of the σ_i's will be non-zero.
- If m > n, at most n of the σ_i's will be non-zero.
- We will also assume that σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0.
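To see this picture numerically, the MATLAB sketch below (the 2 × 2 matrix is our own example) maps points of the unit circle through A and compares the longest and shortest image vectors with the singular values:

    A = [2 1; -1 1];
    t = linspace(0, 2*pi, 1000);
    X = [cos(t); sin(t)];        % points on the unit circle
    Y = A*X;                     % the image is an ellipse
    r = sqrt(sum(Y.^2, 1));      % lengths of the image vectors
    [max(r), min(r)]             % semi-axes of the ellipse
    svd(A)'                      % agrees: sigma_1, sigma_2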

Unit Sphere to Hyper-Ellipse

[Figure: the unit sphere mapped by A onto a hyper-ellipse with principal semi-axes σ_i \vec{u}_i.]

The Idea, cont.

The transformation of the unit sphere into the hyper-ellipse is

A\vec{v}_i = σ_i \vec{u}_i,  1 ≤ i ≤ r.

Putting them all together,

A \begin{pmatrix} \vec{v}_1 & \cdots & \vec{v}_r \end{pmatrix} = \begin{pmatrix} \vec{u}_1 & \cdots & \vec{u}_r \end{pmatrix} \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{pmatrix},

or, in compact form, A\hat{V} = \hat{U}\hat{Σ}, where \hat{Σ} is an r × r diagonal matrix, \hat{V} is an n × r matrix with orthonormal columns, and \hat{U} is an m × r matrix with orthonormal columns.

Reduced SVD

The factorization A\hat{V} = \hat{U}\hat{Σ} is called the reduced singular value decomposition, or reduced SVD.
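MATLAB exposes the reduced factorization directly through the economy-size svd; a short sketch (the tall matrix is our own example):

    A = randn(6, 3);               % tall matrix, m > n
    [U, S, V] = svd(A, 'econ');    % U is 6x3, S is 3x3, V is 3x3
    norm(A*V - U*S)                % ~0: A*Vhat = Uhat*Sigmahat
    norm(A - U*S*V')               % ~0: A = Uhat*Sigmahat*Vhat'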

Full SVD

We will add n − r orthonormal columns to \hat{V} and m − r orthonormal columns to \hat{U}, so that we have

A = UΣV^*,

where U ∈ C^{m×m} is unitary, V ∈ C^{n×n} is unitary, and Σ ∈ R^{m×n} is diagonal.

Σ has exactly r positive diagonal entries, sorted in descending order, σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0; the rest of the entries are zero, and r ≤ min{m, n}.

A first rotates by V^*, then stretches/compresses by Σ, then rotates again by U.

Theorem. Every matrix A ∈ C^{m×n} has a singular value decomposition. Furthermore, the singular values {σ_i} are uniquely determined, and if A is square and the σ_i are distinct, the singular vectors \vec{u}_i and \vec{v}_i are unique up to complex signs.
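A numerical check of the theorem's assertions in MATLAB (the random complex matrix is our own example):

    A = randn(4, 3) + 1i*randn(4, 3);
    [U, S, V] = svd(A);
    norm(A - U*S*V')        % ~0: the factorization exists
    norm(U'*U - eye(4))     % ~0: U is unitary
    norm(V'*V - eye(3))     % ~0: V is unitary
    diag(S)'                % real, non-negative, in descending order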

Computing SVD

Assuming A = UΣV^*, then

A^*A = (UΣV^*)^* UΣV^* = VΣ^2V^*,

and

AA^* = UΣV^*(UΣV^*)^* = UΣ^2U^*.

These become the eigenvalue problems

A^*A V = VΣ^2,    AA^* U = UΣ^2.

Note that A^*A is Hermitian and positive semi-definite, so it has an orthonormal eigenbasis with real, non-negative eigenvalues; this is what makes the construction possible.
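A sketch of this route in MATLAB (fine for small, well-conditioned examples; in practice one does not form A^*A explicitly, since squaring worsens the conditioning):

    A = [2 1; -1 1];                    % our earlier example matrix
    [V, L] = eig(A'*A);                 % A'*A = V*L*V'
    sqrt(sort(diag(L), 'descend'))'     % singular values via eigenvalues
    svd(A)'                             % agrees with the line above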

Example

Example. Consider

A = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix},

then

A^*A = \begin{pmatrix} 9 & 0 \\ 0 & 4 \end{pmatrix}, \quad AA^* = \begin{pmatrix} 9 & 0 \\ 0 & 4 \end{pmatrix}.

Example, cont.

The eigenvalues are λ_{1,2} = {9, 4}, so σ_{1,2} = {3, 2}, and U and V take the form

\begin{pmatrix} \pm 1 & 0 \\ 0 & \pm 1 \end{pmatrix}.

So,

A = UΣV^* = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
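Checking against MATLAB's svd (which fixes its own sign convention, so U and V may differ from the hand choice above by signs):

    A = [3 0; 0 -2];
    [U, S, V] = svd(A);
    diag(S)'             % [3 2], as computed by hand
    norm(A - U*S*V')     % ~0 for any valid sign choice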

Computing SVD

The SVD of a matrix A is typically computed by a two-step procedure.

- First step: the matrix is reduced to a bi-diagonal matrix; this takes O(mn^2) floating-point operations (flops), assuming m ≥ n.
- Second step: compute the SVD of the bi-diagonal matrix by an iterative method (as in eigenvalue algorithms). Since it suffices to compute the SVD up to a certain precision (machine epsilon), this step takes O(n) iterations, each costing O(n) flops.
- If only the singular values are needed, the reduction can be done using Householder reflections for a cost of 4mn^2 − 4n^3/3 flops.
- If m ≫ n, it is advantageous to first reduce the matrix A to a triangular matrix with the QR decomposition and then use Householder reflections to further reduce the matrix to bi-diagonal form; the cost is 2mn^2 + 2n^3 flops.
- The second step can be done by a variant of the QR algorithm for the computation of eigenvalues, first described by Golub & Kahan in 1965.
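The slides give no code for the first step; below is a minimal sketch of Householder bidiagonalization in MATLAB (our own illustrative implementation for real A with m ≥ n, not the LAPACK routine). It produces A = U*B*V' with B upper bidiagonal.

    % Save as bidiag_house.m. Minimal sketch, not production code.
    function [U, B, V] = bidiag_house(A)
        [m, n] = size(A);
        U = eye(m); V = eye(n); B = A;
        for k = 1:n
            % Left reflection: zero out B(k+1:m, k).
            v = house(B(k:m, k));
            B(k:m, :) = B(k:m, :) - 2*v*(v'*B(k:m, :));
            U(:, k:m) = U(:, k:m) - 2*(U(:, k:m)*v)*v';
            if k <= n - 2
                % Right reflection: zero out B(k, k+2:n).
                v = house(B(k, k+1:n)');
                B(:, k+1:n) = B(:, k+1:n) - 2*(B(:, k+1:n)*v)*v';
                V(:, k+1:n) = V(:, k+1:n) - 2*(V(:, k+1:n)*v)*v';
            end
        end
    end

    function v = house(x)
        % Householder vector: (I - 2*v*v')*x is a multiple of e_1.
        s = norm(x);
        if x(1) < 0, s = -s; end
        v = x; v(1) = v(1) + s;
        if norm(v) > 0, v = v / norm(v); end
    end

A quick check: with A = randn(5, 3) and [U, B, V] = bidiag_house(A), norm(A - U*B*V') is ~0 and B is upper bidiagonal.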

Computing SVD, cont.

Continuing on:

- The routine DBDSQR in LAPACK implements this iterative approach for bi-diagonal matrices; combined with the reduction step, it forms the routine DGESVD.
- The same algorithm is implemented in the GNU Scientific Library (GSL).
- GSL also offers an alternative method, using a one-sided Jacobi orthogonalization in the second step; yet another method for the second step uses the idea of divide-and-conquer eigenvalue algorithms.
- There are more...
- In MATLAB, one simply uses [U, S, V] = svd(A).

Applications of SVD

Some important applications:

- Pseudo-inverse: A† = VΣ†U^*.
- Solving homogeneous linear equations, A\vec{x} = \vec{0}.
- Total least squares minimization: minimize ‖A\vec{x}‖_2 subject to ‖\vec{x}‖_2 = 1.
- Range, null space, and rank.
- Low-rank matrix approximation: truncated SVD.
- Separable models: A = \sum_i σ_i \vec{u}_i ⊗ \vec{v}_i^*.
- Nearest orthogonal matrix, the Kabsch algorithm, signal processing, linear inverse problems (Tikhonov regularization), PCA, pattern recognition, modal analysis, latent semantic indexing in natural language text processing, quantum information (the Schmidt decomposition), etc.
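Two of these applications in a few lines of MATLAB (the matrix size, tolerance, and rank k are our own illustrative picks):

    A = randn(6, 4);
    [U, S, V] = svd(A);

    % Pseudo-inverse via the SVD: invert only the non-zero singular values.
    tol = max(size(A)) * eps(S(1,1));
    r = sum(diag(S) > tol);                  % numerical rank
    Sdag = zeros(size(A'));
    Sdag(1:r, 1:r) = diag(1 ./ diag(S(1:r, 1:r)));
    Adag = V * Sdag * U';
    norm(Adag - pinv(A))                     % ~0: matches the built-in

    % Truncated SVD: best rank-k approximation in the 2-norm (Eckart-Young).
    k = 2;
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
    norm(A - Ak) - S(k+1, k+1)               % ~0: error equals sigma_{k+1}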
