Properties of the Singular Value Decomposition
1 Properties of the Singular Value Decomposition

A good reference on numerical linear algebra is G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1983.

Preliminary definitions:

Hermitian: Consider $x \in \mathbb{C}^n$. Then we define the vector "$x$ Hermitian" by $x^H := \bar{x}^T$. That is, $x^H$ is the complex conjugate transpose of $x$. Similarly, for a matrix $A \in \mathbb{C}^{m \times n}$, we define $A^H \in \mathbb{C}^{n \times m}$ by $A^H := \bar{A}^T$. We say that $A \in \mathbb{C}^{n \times n}$ is a Hermitian matrix if $A = A^H$.

Euclidean inner product: Given $x, y \in \mathbb{C}^n$, let the elements¹ of $x$ and $y$ be denoted

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$

Then the Euclidean inner product is defined as

$$\langle x, y \rangle := x^H y = \bar{x}_1 y_1 + \bar{x}_2 y_2 + \cdots + \bar{x}_n y_n.$$

Euclidean vector norm: Let $\langle \cdot, \cdot \rangle$ denote the Euclidean inner product. Then the vector norm associated with this inner product is given by

$$\|x\|_2 := \sqrt{\langle x, x \rangle} = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}.$$

We often omit the "2" subscript when we are discussing the Euclidean norm (or "2-norm") exclusively.

¹ Sometimes we use subscripts to denote the elements of a vector, and sometimes to denote different members of a set of vectors. The meaning will be clear from context.

Euclidean matrix norm: Given $A \in \mathbb{C}^{m \times n}$, the matrix norm induced by the Euclidean vector norm is given by

$$\|A\|_2 := \max_{v \neq 0} \frac{\|Av\|_2}{\|v\|_2} = \sqrt{\lambda_{\max}(A^H A)},$$

where $\lambda_{\max}(A^H A)$ denotes the largest eigenvalue of the matrix $A^H A$. (It is a fact that all the eigenvalues of a matrix of the form $A^H A$ are real and nonnegative.)

Orthogonality: Two vectors $x, y \in \mathbb{C}^n$ are orthogonal if $\langle x, y \rangle = 0$.

Orthonormal Set: A collection of vectors $\{x_1, x_2, \ldots, x_m\} \subset \mathbb{C}^n$ is said to be an orthonormal set if

$$\langle x_i, x_j \rangle = \begin{cases} 0, & i \neq j, \\ 1, & i = j. \end{cases}$$

(Hence $\|x_i\| = 1$ for all $i$.)

Orthogonal Complement: Consider a subspace $\mathcal{X} \subseteq \mathbb{C}^n$. The orthogonal complement of $\mathcal{X}$, denoted $\mathcal{X}^\perp$, is defined as

$$\mathcal{X}^\perp := \{ x \in \mathbb{C}^n : \langle x, y \rangle = 0 \ \ \forall\, y \in \mathcal{X} \}.$$

That is, every vector in $\mathcal{X}^\perp$ is orthogonal to every vector in $\mathcal{X}$.

Unitary Matrix: A matrix $U \in \mathbb{C}^{n \times n}$ is unitary if $U^H U = U U^H = I_n$.

Fact: If $U$ is a unitary matrix, then the columns of $U$ form an orthonormal basis (ONB) for $\mathbb{C}^n$.

Proof of Fact: Denote the columns of $U$ as $U = [u_1 \ u_2 \ \cdots \ u_n]$. Then

$$U^H U = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1^H u_1 & u_1^H u_2 & \cdots & u_1^H u_n \\ u_2^H u_1 & u_2^H u_2 & \cdots & u_2^H u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n^H u_1 & u_n^H u_2 & \cdots & u_n^H u_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$

so $\langle u_i, u_j \rangle = u_i^H u_j$ equals 1 when $i = j$ and 0 otherwise, which is exactly the orthonormality condition.

Singular Value Decomposition: Consider $A \in \mathbb{C}^{m \times n}$. Then there exist unitary matrices

$$U = [u_1 \ u_2 \ \cdots \ u_m], \qquad V = [v_1 \ v_2 \ \cdots \ v_n]$$

such that

$$A = U \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^H \ \ (m \ge n), \qquad A = U \begin{bmatrix} \Sigma & 0 \end{bmatrix} V^H \ \ (m \le n),$$

where

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p \end{bmatrix}, \qquad p = \min(m, n),$$

and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.

Terminology: We refer to $\sigma_i$ as the $i$'th singular value, to $u_i$ as the $i$'th left singular vector, and to $v_i$ as the $i$'th right singular vector.

Properties of Singular Values and Vectors:

(1) Each singular value and its associated singular vectors satisfy

$$A v_i = \sigma_i u_i, \qquad i = 1, \ldots, p.$$

(2) The largest singular value, $\sigma_{\max} := \sigma_1$, satisfies

$$\sigma_{\max} = \max_{v \neq 0} \frac{\|Av\|_2}{\|v\|_2} = \|A\|_2.$$

(3) The smallest singular value, $\sigma_{\min} := \sigma_p$, satisfies

$$\sigma_{\min} = \min_{v \neq 0} \frac{\|Av\|_2}{\|v\|_2}.$$

If $A$ is square and invertible, then $\|A^{-1}\|_2 = 1/\sigma_{\min}$.

(4) Suppose that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0$. Then $\mathrm{rank}(A) = r$.

(5) Suppose that $\mathrm{rank}(A) = r$. Then $R(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$, where $R(A)$ denotes the range, or column space, of $A$.

(6) Suppose that $\mathrm{rank}(A) = r$. Then $N(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$, where $N(A)$ denotes the (right) nullspace of $A$.

(7) Suppose that $\mathrm{rank}(A) = r$. Then $R^\perp(A) = \mathrm{span}\{u_{r+1}, u_{r+2}, \ldots, u_m\}$, where $R^\perp(A)$ denotes the orthogonal complement of $R(A)$.

(8) Suppose that $\mathrm{rank}(A) = r$. Then $N^\perp(A) = \mathrm{span}\{v_1, v_2, \ldots, v_r\}$, where $N^\perp(A)$ denotes the orthogonal complement of $N(A)$.

(9) Suppose that $\mathrm{rank}(A) = r$. Then

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^H = U_r \Sigma_r V_r^H,$$

where $U_r := [u_1 \ u_2 \ \cdots \ u_r]$, $V_r := [v_1 \ v_2 \ \cdots \ v_r]$, and $\Sigma_r := \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$.

(10) If $A \in \mathbb{C}^{n \times n}$ is invertible, then

$$A^{-1} = \sum_{i=1}^{n} \frac{1}{\sigma_i} v_i u_i^H.$$

Definition: The pseudoinverse of $A$, denoted $A^\#$, is given by

$$A^\# = V_r \Sigma_r^{-1} U_r^H.$$

The pseudoinverse has the following properties:

• If $m = n$ and $\mathrm{rank}(A) = n$, then $A^\# = A^{-1}$.
• If $m > n$ and $\mathrm{rank}(A) = n$, then $A$ is left invertible, and $A^\# = A^{-L}$.
• If $m < n$ and $\mathrm{rank}(A) = m$, then $A$ is right invertible, and $A^\# = A^{-R}$.
• Note that $A^\#$ is well defined even if $\mathrm{rank}(A) < \min(m, n)$.
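These properties map directly onto computation. The following is a minimal Python/numpy sketch (an illustration added here, not part of the original notes; the matrix and all variable names are made up for the example). It builds a matrix of known rank, reads the rank and the fundamental subspaces off the SVD per properties (4)-(8), and forms $A^\# = V_r \Sigma_r^{-1} U_r^H$, checking it against numpy's built-in pinv.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2

# A sum of two outer products has rank 2 (generically),
# mirroring the dyadic expansion in property (9).
A = sum(c * np.outer(rng.standard_normal(m), rng.standard_normal(n))
        for c in (5.0, 2.0))

# Full SVD: A = U @ diag(s) @ Vh, singular values sorted decreasingly.
U, s, Vh = np.linalg.svd(A)

# Property (4): rank = number of singular values above a tolerance.
tol = max(m, n) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))                   # -> 2

# Properties (5)-(8): orthonormal bases for the fundamental subspaces.
Ur = U[:, :rank]                # range(A)
Vr = Vh[:rank, :].conj().T      # orthogonal complement of null(A)
Vn = Vh[rank:, :].conj().T      # null(A)
assert np.allclose(A @ Vn, 0.0)  # A annihilates the nullspace basis

# Pseudoinverse via the truncated SVD: A# = V_r Sigma_r^{-1} U_r^H.
A_pinv = Vr @ np.diag(1.0 / s[:rank]) @ Ur.conj().T
assert np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-10))
print(rank, np.allclose(A @ A_pinv @ A, A))   # 2 True
```

Note the explicit tolerance when counting singular values: in floating point the "zero" singular values of a rank-deficient matrix come out as tiny nonzero numbers, which is exactly the theme of the discussion that follows.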
Many results for linear systems have the form "if such and such a matrix has full rank, then such and such a property holds" (paraphrased from Golub and Van Loan). Such results are naive, in that they neglect the fact that a matrix generated from physical data is almost always full rank. The more important question is not "does the matrix have full rank?", but rather "how close is the matrix to one which does not have full rank?". The SVD is a very useful tool in making this concept precise.

Consider $A \in \mathbb{C}^{m \times n}$, and suppose that $A$ has full rank: $\mathrm{rank}(A) = p := \min(m, n)$. Suppose that we perturb the elements of $A$, yielding $\tilde{A} = A + \Delta A$. We wish to know how large the error $\Delta A$ can become before $\tilde{A}$ loses rank.

Proposition:
(a) Suppose that $\|\Delta A\|_2 < \sigma_{\min}(A)$. Then $\mathrm{rank}(A + \Delta A) = p$.
(b) There exists a matrix $\Delta A$, with $\|\Delta A\|_2 = \sigma_{\min}(A)$, such that $\mathrm{rank}(A + \Delta A) < p$.

Proof:
(a) Using the triangle inequality (and dropping the subscript):

$$\sigma_{\min}(A + \Delta A) := \min_{\|v\| = 1} \|(A + \Delta A)v\| \ge \min_{\|v\| = 1} \{ \|Av\| - \|\Delta A\, v\| \} \ge \min_{\|v\| = 1} \|Av\| - \max_{\|v\| = 1} \|\Delta A\, v\| \ge \sigma_{\min}(A) - \sigma_{\max}(\Delta A) > 0,$$

and hence $\mathrm{rank}(A + \Delta A) = p$.

(b) Let $A = \sum_{i=1}^{p} \sigma_i u_i v_i^H$, and consider $\Delta A = -\sigma_p u_p v_p^H$ (where $\sigma_p = \sigma_{\min}$). It is easy to see that $A + \Delta A = \sum_{i=1}^{p-1} \sigma_i u_i v_i^H$, and thus that $\mathrm{rank}(A + \Delta A) = p - 1$.

Suppose that $\mathrm{rank}(A) = p$, but $A$ has very small singular values. Then $A$ is "close" to a singular matrix in the sense that there exists a small perturbation $\Delta A$ to the elements of $A$ that causes $\tilde{A}$ to lose rank. Indeed, such a matrix should possibly be treated in applications as though it were singular. In practice, people do not always look for small singular values to determine distance to singularity. It is often more useful to compute the ratio between the maximum and minimum singular values.

Definition (Condition Number): Consider $A \in \mathbb{C}^{m \times n}$ and suppose that $\mathrm{rank}(A) = p = \min(m, n)$. Then the (Euclidean) condition number of $A$ is defined as

$$\kappa(A) := \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$$

###
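The proposition is easy to exercise numerically. Here is a short numpy sketch (again illustrative; the random matrix is arbitrary): part (b)'s perturbation $\Delta A = -\sigma_p u_p v_p^H$ has 2-norm exactly $\sigma_{\min}(A)$ and drops the rank, while any strictly smaller perturbation, per part (a), cannot.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))     # full rank, p = 3, almost surely

U, s, Vh = np.linalg.svd(A)
p = s.size
sigma_min = s[-1]

# Part (b): dA = -sigma_p * u_p v_p^H is the smallest perturbation
# (in the 2-norm) that makes A + dA rank deficient.
dA = -sigma_min * np.outer(U[:, p - 1], Vh[p - 1, :])
print(np.linalg.norm(dA, 2))                 # equals sigma_min
print(np.linalg.matrix_rank(A + dA))         # p - 1 = 2

# Part (a): any perturbation with ||dA||_2 < sigma_min preserves rank.
print(np.linalg.matrix_rank(A + 0.99 * dA))  # still 3

# Condition number: kappa(A) = sigma_max / sigma_min.
print(s[0] / s[-1], np.linalg.cond(A, 2))    # same value
```

The last two lines confirm that the built-in condition number is exactly the ratio of extreme singular values, which motivates the definition above.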
Suppose $\mathrm{rank}(A) = p$, but that $\kappa(A) \gg 1$. It follows that $A$ is "almost rank deficient". In fact, the process of calculating an inverse² for $A$ may not be numerically robust, and calculations involving $A^{-1}$ may be prone to error. Let's explore this idea. Consider the linear system of equations

$$Ax = b, \qquad A \in \mathbb{C}^{m \times n}, \quad b \in \mathbb{C}^m, \quad m \ge n, \quad \mathrm{rank}(A) = n.$$

Suppose that we are given the data for $A$ and $b$ and need to solve for $x$. Let $R(A)$ denote the range of $A$. If $b \in R(A)$, then we may find $x$ from $x = A^\# b$. In reality, the elements of $A$ and $b$ will be corrupted by errors. These may arise due to uncertainty in the methods used to generate the data. Errors also arise due to numerical roundoff in the computer representation of the data. We would like to know how these errors affect the accuracy of the solution vector $x$.

² If $A$ is not square, then we calculate $A^\#$, which will equal the left or right inverse, whichever is appropriate.

Let

$$\tilde{A} := A + \Delta A, \qquad \tilde{b} := b + \Delta b, \qquad \tilde{x} := x + \Delta x,$$

so that $\tilde{A}\tilde{x} = \tilde{b}$. Our next result relates the errors $\Delta A$ and $\Delta b$ to errors in the computed value of $x$.

Proposition: Suppose that $A \in \mathbb{C}^{m \times n}$ has rank $n$, and that $\frac{\|\Delta A\|}{\sigma_{\min}(A)} \le \alpha < 1$. Suppose further that $\Delta A$ and $\Delta b$ satisfy the bounds $\frac{\|\Delta A\|}{\|A\|} \le \delta$ and $\frac{\|\Delta b\|}{\|b\|} \le \delta$, where $\delta$ is a constant. Then

$$\frac{\|\Delta x\|}{\|x\|} \le \frac{2\delta}{1 - \alpha}\, \kappa(A).$$

###

The proof is obtained as a sequence of homework exercises in Golub and Van Loan.

Note that we can, in principle, calculate $A^{-1}$ by successively solving the above linear system for $b = e_1, e_2, \ldots, e_n$, the columns of the identity matrix; the solution of $Ax = e_i$ is then the $i$'th column of $A^{-1}$.
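To close, here is a numpy sketch of that column-by-column inverse, together with a quick look at the error amplification the proposition predicts. The Hilbert matrix used as the example, the perturbation size, and the tolerances are all illustrative choices, not from the notes.

```python
import numpy as np

# Hilbert matrix H[i, j] = 1/(i + j + 1): a classic ill-conditioned example.
n = 8
idx = np.arange(n)
A = 1.0 / (idx[:, None] + idx[None, :] + 1.0)
print(f"kappa(A) = {np.linalg.cond(A, 2):.2e}")        # about 1.5e+10

# Columns of A^{-1}: solve A x = e_i for each unit vector e_i.
A_inv = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])
print(np.allclose(A_inv @ A, np.eye(n), atol=1e-4))    # loose tolerance: A is ill-conditioned

# The proposition in action: a relative error of ~1e-12 in b
# can be amplified in x by up to kappa(A) ~ 1e10.
x_true = np.ones(n)
b = A @ x_true
g = np.random.default_rng(2).standard_normal(n)
db = 1e-12 * np.linalg.norm(b) * g / np.linalg.norm(g)
x_pert = np.linalg.solve(A, b + db)
print(np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true))
```

The final printed relative error is many orders of magnitude larger than the $10^{-12}$ perturbation of $b$, consistent with the bound $\frac{\|\Delta x\|}{\|x\|} \le \frac{2\delta}{1-\alpha}\,\kappa(A)$.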