
Chapter 7

The Singular Value Decomposition (SVD)

1 The SVD produces orthonormal bases of $v$'s and $u$'s for the four fundamental subspaces.

2 Using those bases, $A$ becomes a diagonal matrix $\Sigma$ and $Av_i = \sigma_i u_i$: $\sigma_i$ = singular value.

3 The two-bases diagonalization $A = U\Sigma V^T$ often has more information than $A = X\Lambda X^{-1}$.

4 $U\Sigma V^T$ separates $A$ into rank-1 matrices $\sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T$. $\sigma_1 u_1 v_1^T$ is the largest!

7.1 Bases and Matrices in the SVD

The Singular Value Decomposition is a highlight of linear algebra. $A$ is any $m$ by $n$ matrix, square or rectangular. Its rank is $r$. We will diagonalize this $A$, but not by $X^{-1}AX$. The eigenvectors in $X$ have three big problems: They are usually not orthogonal, there are not always enough eigenvectors, and $Ax = \lambda x$ requires $A$ to be a square matrix. The singular vectors of $A$ solve all those problems in a perfect way.

Let me describe what we want from the SVD: the right bases for the four subspaces. Then I will write about the steps to find those bases in order of importance.

The price we pay is to have two sets of singular vectors, $u$'s and $v$'s. The $u$'s are in $\mathbf{R}^m$ and the $v$'s are in $\mathbf{R}^n$. They will be the columns of an $m$ by $m$ matrix $U$ and an $n$ by $n$ matrix $V$. I will first describe the SVD in terms of those basis vectors. Then I can also describe the SVD in terms of the orthogonal matrices $U$ and $V$.

(using vectors) The $u$'s and $v$'s give bases for the four fundamental subspaces:

$u_1, \ldots, u_r$ is an orthonormal basis for the column space
$u_{r+1}, \ldots, u_m$ is an orthonormal basis for the left nullspace $N(A^T)$
$v_1, \ldots, v_r$ is an orthonormal basis for the row space
$v_{r+1}, \ldots, v_n$ is an orthonormal basis for the nullspace $N(A)$.


More than just orthogonality, these basis vectors diagonalize the matrix $A$:

$$\text{“}A \text{ is diagonalized”} \qquad Av_1 = \sigma_1 u_1 \quad Av_2 = \sigma_2 u_2 \quad \ldots \quad Av_r = \sigma_r u_r \qquad (1)$$

Those singular values $\sigma_1$ to $\sigma_r$ will be positive numbers: $\sigma_i$ is the length of $Av_i$. The $\sigma$'s go into a diagonal matrix that is otherwise zero. That matrix is $\Sigma$.

(using matrices) Since the $u$'s are orthonormal, the matrix $U_r$ with those $r$ columns has $U_r^T U_r = I$. Since the $v$'s are orthonormal, the matrix $V_r$ has $V_r^T V_r = I$. Then the equations $Av_i = \sigma_i u_i$ tell us column by column that $AV_r = U_r\Sigma_r$:

$$AV_r = U_r\Sigma_r \qquad A\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \qquad \begin{matrix}(m \text{ by } n)(n \text{ by } r)\\ (m \text{ by } r)(r \text{ by } r)\end{matrix} \qquad (2)$$

This is the heart of the SVD, but there is more. Those $v$'s and $u$'s account for the row space and column space of $A$. We have $n - r$ more $v$'s and $m - r$ more $u$'s, from the nullspace $N(A)$ and the left nullspace $N(A^T)$. They are automatically orthogonal to the first $v$'s and $u$'s (because the whole nullspaces are orthogonal). We now include all the $v$'s and $u$'s in $V$ and $U$, so these matrices become square. We still have $AV = U\Sigma$.

$$AV = U\Sigma \qquad A\begin{bmatrix} v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r & \cdots & u_m \end{bmatrix}\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \end{bmatrix} \qquad \begin{matrix}(m \text{ by } n)(n \text{ by } n)\\ (m \text{ by } m)(m \text{ by } n)\end{matrix} \qquad (3)$$

The new $\Sigma$ is $m$ by $n$. It is just the $r$ by $r$ matrix in equation (2) with $m - r$ extra zero rows and $n - r$ new zero columns. The real change is in the shapes of $U$ and $V$. Those are square orthogonal matrices. So $AV = U\Sigma$ can become $A = U\Sigma V^T$. This is the Singular Value Decomposition. I can multiply columns $u_i\sigma_i$ from $U\Sigma$ by rows of $V^T$:

$$\text{SVD} \qquad A = U\Sigma V^T = u_1\sigma_1 v_1^T + \cdots + u_r\sigma_r v_r^T. \qquad (4)$$

Equation (2) was a “reduced SVD” with bases for the row space and column space. Equation (3) is the full SVD with nullspaces included. They both split up $A$ into the same $r$ matrices $u_i\sigma_i v_i^T$ of rank one: column times row.

We will see that each $\sigma_i^2$ is an eigenvalue of $A^TA$ and also of $AA^T$. When we put the singular values in descending order, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, the splitting in equation (4) gives the $r$ rank-one pieces of $A$ in order of importance. This is crucial.

Example 1 When is $A = U\Sigma V^T$ (singular values) the same as $X\Lambda X^{-1}$ (eigenvalues)?

Solution $A$ needs orthonormal eigenvectors to allow $X = U = V$. $A$ also needs eigenvalues $\lambda \geq 0$ if $\Lambda = \Sigma$. So $A$ must be a positive semidefinite (or definite) symmetric matrix. Only then will $A = X\Lambda X^{-1}$, which is also $Q\Lambda Q^T$, coincide with $A = U\Sigma V^T$.
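A quick numerical illustration of Example 1 (a sketch assuming NumPy, not part of the original text): for a symmetric positive definite matrix, the eigenvalues and singular values are the same numbers, and both factorizations rebuild the matrix.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric positive definite

lam, Q = np.linalg.eigh(S)               # eigenvalues ascending: 1, 3
U, sigma, Vt = np.linalg.svd(S)          # singular values descending: 3, 1

print(np.allclose(np.sort(sigma), np.sort(lam)))   # same numbers since lambda >= 0
print(np.allclose(U @ np.diag(sigma) @ Vt, S))     # U Sigma V^T = Q Lambda Q^T = S
```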

Example 2 If $A = xy^T$ (rank 1) with unit vectors $x$ and $y$, what is the SVD of $A$?

Solution The reduced SVD in (2) is exactly $xy^T$, with rank $r = 1$. It has $u_1 = x$ and $v_1 = y$ and $\sigma_1 = 1$. For the full SVD, complete $u_1 = x$ to an orthonormal basis of $u$'s, and complete $v_1 = y$ to an orthonormal basis of $v$'s. No new $\sigma$'s, only $\sigma_1 = 1$.
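A rank-1 matrix also shows the difference between the reduced SVD of equation (2) and the full SVD of equation (3) in software. Here is a minimal NumPy sketch (an illustration, not the book's notation; the columns here are not unit vectors, so $\sigma_1 = \sqrt{45}$ rather than 1, and full_matrices=False keeps min(m, n) columns rather than only the $r$ columns of equation (2)):

```python
import numpy as np

A = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0]])          # 2 by 3 matrix with rank r = 1

# Full SVD, equation (3): square U (2 by 2) and square V (3 by 3)
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, sigma.shape, Vt.shape)    # (2, 2) (2,) (3, 3)

# Reduced SVD: keeps min(m, n) = 2 triples; equation (2) would keep only r = 1
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, Vtr.shape)               # (2, 2) (2, 3)

# Both give the same rank-one splitting (4): here only sigma_1 is nonzero
print(np.allclose(A, sigma[0] * np.outer(U[:, 0], Vt[0])))   # True
```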

Proof of the SVD

We need to show how those amazing $u$'s and $v$'s can be constructed. The $v$'s will be orthonormal eigenvectors of $A^TA$. This must be true because we are aiming for

$$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^T U^T U\Sigma V^T = V\Sigma^T\Sigma V^T. \qquad (5)$$

On the right you see the eigenvector matrix $V$ for the symmetric positive (semi)definite matrix $A^TA$. And $(\Sigma^T\Sigma)$ must be the eigenvalue matrix of $(A^TA)$: Each $\sigma^2$ is $\lambda(A^TA)$!

Now $Av_i = \sigma_i u_i$ tells us the unit vectors $u_1$ to $u_r$. This is the key equation (1). The essential point (the whole reason that the SVD succeeds) is that those unit vectors $u_1$ to $u_r$ are automatically orthogonal to each other (because the $v$'s are orthogonal):

$$\text{Key step} \qquad u_i^T u_j = \left(\frac{Av_i}{\sigma_i}\right)^T\left(\frac{Av_j}{\sigma_j}\right) = \frac{v_i^T A^TA v_j}{\sigma_i\sigma_j} = \frac{\sigma_j^2}{\sigma_i\sigma_j}\, v_i^T v_j = \text{zero}. \qquad (6)$$

The $v$'s are eigenvectors of $A^TA$ (symmetric). They are orthogonal and now the $u$'s are also orthogonal. Actually those $u$'s will be eigenvectors of $AA^T$.

Finally we complete the $v$'s and $u$'s to $n$ $v$'s and $m$ $u$'s with any orthonormal bases for the nullspaces $N(A)$ and $N(A^T)$. We have found $V$ and $\Sigma$ and $U$ in $A = U\Sigma V^T$.
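The proof is constructive, and a short sketch can follow it step by step (assuming NumPy and distinct nonzero singular values; this illustration is not the author's code): take eigenvectors $v_i$ of $A^TA$, set $\sigma_i = \|Av_i\|$ and $u_i = Av_i/\sigma_i$, and check $A = U\Sigma V^T$.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Step 1: v's are orthonormal eigenvectors of S = A^T A
S = A.T @ A
lam, V = np.linalg.eigh(S)               # eigh returns ascending order
order = np.argsort(lam)[::-1]            # reorder so sigma_1 >= sigma_2
lam, V = lam[order], V[:, order]

# Step 2: sigma_i = sqrt(lambda_i) is the length of A v_i
sigma = np.sqrt(lam)

# Step 3: u_i = A v_i / sigma_i, automatically orthonormal by equation (6)
U = (A @ V) / sigma

print(np.allclose(U.T @ U, np.eye(2)))              # u's are orthonormal
print(np.allclose(A, U @ np.diag(sigma) @ V.T))     # A = U Sigma V^T
```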

An Example of the SVD

Here is an example to show the computation of the three matrices in $A = U\Sigma V^T$.

Example 3 Find the matrices $U, \Sigma, V$ for $A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix}$. The rank is $r = 2$.

With rank 2, this $A$ has positive singular values $\sigma_1$ and $\sigma_2$. We will see that $\sigma_1$ is larger than $\lambda_{\max} = 5$, and $\sigma_2$ is smaller than $\lambda_{\min} = 3$. Begin with $A^TA$ and $AA^T$:

$$A^TA = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix} \qquad AA^T = \begin{bmatrix} 9 & 12 \\ 12 & 41 \end{bmatrix}$$

Those have the same trace (50) and the same eigenvalues $\sigma_1^2 = 45$ and $\sigma_2^2 = 5$. The square roots are $\sigma_1 = \sqrt{45}$ and $\sigma_2 = \sqrt{5}$. Then $\sigma_1\sigma_2 = 15$ and this is the determinant of $A$.

A key step is to find the eigenvectors of $A^TA$ (with eigenvalues 45 and 5):

$$\begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 45\begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = 5\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

Then $v_1$ and $v_2$ are those (orthogonal!) eigenvectors rescaled to length 1.

$$\text{Right singular vectors} \qquad v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \qquad u_i = \text{left singular vectors}.$$

Now compute $Av_1$ and $Av_2$, which will be $\sigma_1 u_1 = \sqrt{45}\,u_1$ and $\sigma_2 u_2 = \sqrt{5}\,u_2$:

$$Av_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 3 \\ 9 \end{bmatrix} = \sqrt{45}\;\frac{1}{\sqrt{10}}\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \sigma_1 u_1$$

$$Av_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -3 \\ 1 \end{bmatrix} = \sqrt{5}\;\frac{1}{\sqrt{10}}\begin{bmatrix} -3 \\ 1 \end{bmatrix} = \sigma_2 u_2$$

The division by $\sqrt{10}$ makes $u_1$ and $u_2$ orthonormal. Then $\sigma_1 = \sqrt{45}$ and $\sigma_2 = \sqrt{5}$ as expected. The Singular Value Decomposition is $A = U\Sigma V^T$:

$$U = \frac{1}{\sqrt{10}}\begin{bmatrix} 1 & -3 \\ 3 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sqrt{45} & \\ & \sqrt{5} \end{bmatrix} \qquad V = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. \qquad (7)$$
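A quick check of equation (7) (a sketch assuming NumPy; a library SVD may flip the signs of paired columns in $U$ and $V$, which leaves $U\Sigma V^T$ unchanged):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U = np.array([[1.0, -3.0],
              [3.0,  1.0]]) / np.sqrt(10)
Sigma = np.diag([np.sqrt(45), np.sqrt(5)])
V = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)

print(np.allclose(U @ Sigma @ V.T, A))                    # equation (7) rebuilds A
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  [np.sqrt(45), np.sqrt(5)]))             # same singular values
```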

$U$ and $V$ contain orthonormal bases for the column space and the row space (both spaces are just $\mathbf{R}^2$). The real achievement is that those two bases diagonalize $A$: $AV$ equals $U\Sigma$. Then the matrix $U^TAV = \Sigma$ is diagonal. The matrix $A$ splits into a combination of two rank-one matrices, columns times rows:

$$\sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T = \frac{\sqrt{45}}{\sqrt{20}}\begin{bmatrix} 1 & 1 \\ 3 & 3 \end{bmatrix} + \frac{\sqrt{5}}{\sqrt{20}}\begin{bmatrix} 3 & -3 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix} = A.$$

An Extreme Matrix

Here is a larger example, when the $u$'s and the $v$'s are just columns of the identity matrix. So the computations are easy, but keep your eye on the order of the columns. The matrix $A$ is badly lopsided (strictly triangular). All its eigenvalues are zero. $AA^T$ is not close to $A^TA$. The matrices $U$ and $V$ will be permutations that fix these problems properly.

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{matrix}\text{eigenvalues } \lambda = 0, 0, 0, 0 \text{ all zero!}\\ \text{only one eigenvector } (1, 0, 0, 0)\\ \text{singular values } \sigma = 3, 2, 1\\ \text{singular vectors are columns of } I\end{matrix}$$

We always start with $A^TA$ and $AA^T$. They are diagonal (with easy $v$'s and $u$'s):

$$A^TA = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 9 \end{bmatrix} \qquad AA^T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Their eigenvectors ($u$'s for $AA^T$ and $v$'s for $A^TA$) go in decreasing order $\sigma_1^2 > \sigma_2^2 > \sigma_3^2$ of the eigenvalues. These eigenvalues $\sigma^2 = 9, 4, 1$ are not zero!

$$U = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 3 & & & \\ & 2 & & \\ & & 1 & \\ & & & 0 \end{bmatrix} \qquad V = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

Those first columns $u_1$ and $v_1$ have 1's in positions 3 and 4. Then $u_1\sigma_1 v_1^T$ picks out the biggest number $A_{34} = 3$ in the original matrix $A$. The three rank-one matrices in the SVD come exactly from the numbers 3, 2, 1 in $A$.

$$A = U\Sigma V^T = 3u_1v_1^T + 2u_2v_2^T + 1u_3v_3^T.$$
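This extreme example is easy to verify (a sketch assuming NumPy; the library may choose opposite signs for individual singular vectors):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)

U, sigma, Vt = np.linalg.svd(A)
print(sigma)                                    # [3. 2. 1. 0.]
print(np.allclose(U @ np.diag(sigma) @ Vt, A))  # A = U Sigma V^T

# Up to signs, U and V are the permutations displayed above
print(np.allclose(np.abs(U),    np.eye(4)[:, [2, 1, 0, 3]]))
print(np.allclose(np.abs(Vt.T), np.eye(4)[:, [3, 2, 1, 0]]))
```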

Note Suppose I remove the last row of $A$ (all zeros). Then $A$ is a 3 by 4 matrix and $AA^T$ is 3 by 3 (its fourth row and column will disappear). We still have eigenvalues $\lambda = 1, 4, 9$ in $A^TA$ and $AA^T$, producing the same singular values $\sigma = 3, 2, 1$ in $\Sigma$. Removing the zero row of $A$ (now $3 \times 4$) just removes the last row of $\Sigma$ together with the last row and column of $U$. Then $(3 \times 4) = (3 \times 3)(3 \times 4)(4 \times 4)$. The SVD is totally adapted to rectangular matrices.

A good thing, because the rows and columns of a data matrix $A$ often have completely different meanings (like a spreadsheet). If we have the grades for all courses, there would be a column for each student and a row for each course: The entry $a_{ij}$ would be the grade. Then $\sigma_1 u_1 v_1^T$ could have $u_1$ = combination course and $v_1$ = combination student. And $\sigma_1$ would be the grade for those combinations: the highest grade.

The matrix $A$ could count the frequency of key words in a journal: a different article for each column of $A$ and a different word for each row. The whole journal is indexed by the matrix $A$ and the most important information is in $\sigma_1 u_1 v_1^T$. Then $\sigma_1$ is the largest frequency for a hyperword (the word combination $u_1$) in the hyperarticle $v_1$.

I will soon show pictures for a different problem: a photo broken into SVD pieces.

Singular Value Stability versus Eigenvalue Instability

The 4 by 4 example $A$ provides an example (an extreme case) of the instability of eigenvalues. Suppose the 4, 1 entry barely changes from zero to $1/60{,}000$. The rank is now 4.

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ \frac{1}{60{,}000} & 0 & 0 & 0 \end{bmatrix} \qquad \begin{matrix}\text{That change by only } 1/60{,}000 \text{ produces a}\\ \text{much bigger jump in the eigenvalues of } A\\ \lambda = 0, 0, 0, 0 \text{ becomes } \lambda = \dfrac{1}{10}, \dfrac{i}{10}, \dfrac{-1}{10}, \dfrac{-i}{10}\end{matrix}$$

The four eigenvalues moved from zero onto a circle around zero. The circle has radius $\frac{1}{10}$ when the new entry is only $1/60{,}000$. This shows serious instability of eigenvalues when $AA^T$ is far from $A^TA$. At the other extreme, if $A^TA = AA^T$ (a “normal matrix”) the eigenvectors of $A$ are orthogonal and the eigenvalues of $A$ are totally stable.

By contrast, the singular values of any matrix are stable. They don't change more than the change in $A$. In this example, the new singular values are 3, 2, 1, and $1/60{,}000$. The matrices $U$ and $V$ stay the same. The new fourth piece of $A$ is $\sigma_4 u_4 v_4^T$, with fifteen zeros and that small entry $\sigma_4 = 1/60{,}000$.
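This contrast is easy to reproduce (a sketch assuming NumPy, not from the original text):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)
A[3, 0] = 1 / 60000                         # the tiny change in the 4,1 entry

# Eigenvalues jump from 0, 0, 0, 0 onto a circle of radius 1/10
print(np.abs(np.linalg.eigvals(A)))         # all approximately 0.1

# Singular values barely move: 3, 2, 1 and the new 1/60000
print(np.linalg.svd(A, compute_uv=False))   # [3, 2, 1, about 1.67e-05]
```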

Singular Vectors of $A$ and Eigenvectors of $S = A^TA$

Equations (5–6) “proved” the SVD all at once. The singular vectors $v_i$ are the eigenvectors $q_i$ of $S = A^TA$. The eigenvalues $\lambda_i$ of $S$ are the same as $\sigma_i^2$ for $A$. The rank $r$ of $S$ equals the rank $r$ of $A$. The all-important rank-one expansions (from columns times rows) were perfectly parallel:

$$\text{Symmetric } S \qquad S = Q\Lambda Q^T = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + \cdots + \lambda_r q_r q_r^T$$

$$\text{SVD of } A \qquad A = U\Sigma V^T = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_r u_r v_r^T$$

The $q$'s are orthonormal, the $u$'s are orthonormal, the $v$'s are orthonormal. Beautiful.

But I want to look again, for two good reasons. One is to fix a weak point in the eigenvalue part, where Chapter 6 was not complete. If $\lambda$ is a double eigenvalue of $S$, we can and must find two orthonormal eigenvectors. The other reason is to see how the SVD picks off the largest term $\sigma_1 u_1 v_1^T$ before $\sigma_2 u_2 v_2^T$. We want to understand the eigenvalues $\lambda$ (of $S$) and singular values $\sigma$ (of $A$) one at a time instead of all at once.

Start with the largest eigenvalue $\lambda_1$ of $S$. It solves this problem:

$$\lambda_1 = \text{maximum ratio } \frac{x^TSx}{x^Tx}. \quad \text{The winning vector is } x = q_1 \text{ with } Sq_1 = \lambda_1 q_1. \qquad (8)$$

Compare with the largest singular value $\sigma_1$ of $A$. It solves this problem:

$$\sigma_1 = \text{maximum ratio } \frac{\|Ax\|}{\|x\|}. \quad \text{The winning vector is } x = v_1 \text{ with } Av_1 = \sigma_1 u_1. \qquad (9)$$

This “one at a time” approach applies also to $\lambda_2$ and $\sigma_2$. But not all $x$'s are allowed:

$$\lambda_2 = \text{maximum ratio } \frac{x^TSx}{x^Tx} \text{ among all } x\text{'s with } q_1^Tx = 0. \quad \text{The winning } x \text{ is } q_2. \qquad (10)$$

$$\sigma_2 = \text{maximum ratio } \frac{\|Ax\|}{\|x\|} \text{ among all } x\text{'s with } v_1^Tx = 0. \quad \text{The winning } x \text{ is } v_2. \qquad (11)$$

When $S = A^TA$ we find $\lambda_1 = \sigma_1^2$ and $\lambda_2 = \sigma_2^2$. Why does this approach succeed?

Start with the ratio $r(x) = x^TSx/x^Tx$. This is called the Rayleigh quotient. To maximize $r(x)$, set its partial derivatives to zero: $\partial r/\partial x_i = 0$ for $i = 1, \ldots, n$. Those derivatives are messy and here is the result: one vector equation for the winning $x$:

$$\text{The derivatives of } r(x) = \frac{x^TSx}{x^Tx} \text{ are zero when } Sx = r(x)\,x. \qquad (12)$$

So the winning $x$ is an eigenvector of $S$. The maximum ratio $r(x)$ is the largest eigenvalue $\lambda_1$ of $S$. All good. Now turn to $A$, and notice the connection to $S = A^TA$!

$$\text{Maximizing } \frac{\|Ax\|}{\|x\|} \text{ also maximizes } \left(\frac{\|Ax\|}{\|x\|}\right)^2 = \frac{x^TA^TAx}{x^Tx} = \frac{x^TSx}{x^Tx}.$$

So the winning $x = v_1$ in (9) is the top eigenvector $q_1$ of $S = A^TA$ in (8).
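The “one at a time” idea is also how power iteration finds $\lambda_1$ and $q_1$ in practice. Here is a minimal sketch (assuming NumPy; an illustration of equations (8)–(9), not the algorithms of Chapter 11): repeated multiplication by $S = A^TA$ drives a random vector toward $q_1 = v_1$, and the Rayleigh quotient approaches $\lambda_1 = \sigma_1^2$.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
S = A.T @ A                              # [[25, 20], [20, 25]]

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
for _ in range(50):                      # power iteration on S
    x = S @ x
    x /= np.linalg.norm(x)

rayleigh = x @ S @ x                     # r(x) = x^T S x / x^T x with ||x|| = 1
print(np.isclose(rayleigh, 45))                          # lambda_1 = sigma_1^2 = 45
print(np.isclose(np.linalg.norm(A @ x), np.sqrt(45)))    # sigma_1 = max ||Ax|| / ||x||
```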

Now I have to explain why $q_2$ and $v_2$ are the winning vectors in (10) and (11). We know they are orthogonal to $q_1$ and $v_1$, so they are allowed in those competitions. These paragraphs can be omitted by readers who aim to see the SVD in action (Section 7.2).

Start with any orthogonal matrix $Q_1$ that has $q_1$ in its first column. The other $n - 1$ orthonormal columns just have to be orthogonal to $q_1$. Then use $Sq_1 = \lambda_1 q_1$:

$$SQ_1 = S\begin{bmatrix} q_1 & q_2 & \ldots & q_n \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & \ldots & q_n \end{bmatrix}\begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix} = Q_1\begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix}. \qquad (13)$$

Multiply by $Q_1^T$, remember $Q_1^TQ_1 = I$, and recognize that $Q_1^TSQ_1$ is symmetric like $S$:

$$\text{The symmetry of } Q_1^TSQ_1 = \begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix} \text{ forces } w = 0 \text{ and } S_{n-1}^T = S_{n-1}.$$

The requirement $q_1^Tx = 0$ has reduced the maximum problem (10) to size $n - 1$. The largest eigenvalue of $S_{n-1}$ will be the second largest for $S$. It is $\lambda_2$. The winning vector in (10) will be the eigenvector $q_2$ with $Sq_2 = \lambda_2 q_2$.

We just keep going, or use the magic word induction, to produce all the eigenvectors $q_1, \ldots, q_n$ and their eigenvalues $\lambda_1, \ldots, \lambda_n$. The Spectral Theorem $S = Q\Lambda Q^T$ is proved even with repeated eigenvalues. All symmetric matrices can be diagonalized.

Similarly the SVD is found one step at a time from (9) and (11) and onwards. Section 7.2 will show the geometry: we are finding the axes of an ellipse. Here I ask a different question: How are the $\lambda$'s and $\sigma$'s actually computed?

Computing the Eigenvalues of $S$ and the SVD of $A$

The singular values $\sigma_i$ of $A$ are the square roots of the eigenvalues $\lambda_i$ of $S = A^TA$. This connects the SVD to the symmetric eigenvalue problem (symmetry is good). In the end we don't want to multiply $A^T$ times $A$ (squaring is time-consuming: not good). But the same ideas govern both problems.

How to compute the $\lambda$'s for $S$ and singular values $\sigma$ for $A$? The first idea is to produce zeros in $A$ and $S$ without changing the $\sigma$'s and the $\lambda$'s. Singular vectors and eigenvectors will change (no problem). The similar matrix $Q^{-1}SQ$ has the same $\lambda$'s as $S$. If $Q$ is orthogonal, this matrix is $Q^TSQ$ and still symmetric. Section 11.3 will show how to build $Q$ from 2 by 2 rotations so that $Q^TSQ$ is symmetric and tridiagonal (many zeros). We can't get all the way to a diagonal matrix $\Lambda$ (which would show all the eigenvalues of $S$) without a new idea and more work in Chapter 11.

For the SVD, what is the parallel to $Q^{-1}SQ$? Now we don't want to change any singular values of $A$. Natural answer: You can multiply $A$ by two different orthogonal matrices $Q_1$ and $Q_2$. Use them to produce zeros in $Q_1^TAQ_2$. The $\sigma$'s and $\lambda$'s don't change:

$$(Q_1^TAQ_2)^T(Q_1^TAQ_2) = Q_2^TA^TAQ_2 = Q_2^TSQ_2 \text{ gives the same } \sigma(A) \text{ from the same } \lambda(S).$$

The freedom of two $Q$'s allows us to reach $Q_1^TAQ_2 =$ bidiagonal matrix (2 diagonals). This compares perfectly to $Q^TSQ =$ 3 diagonals. It is nice to notice the connection between them: $(\text{bidiagonal})^T(\text{bidiagonal}) = \text{tridiagonal}$.

The final steps to a diagonal $\Lambda$ and a diagonal $\Sigma$ need more ideas. This problem can't be easy, because underneath we are solving $\det(S - \lambda I) = 0$ for polynomials of degree $n = 100$ or $1000$ or more. The favorite way to find $\lambda$'s and $\sigma$'s uses simple orthogonal matrices to approach $Q^TSQ = \Lambda$ and $U^TAV = \Sigma$. We stop when very close to $\Lambda$ and $\Sigma$. This 2-step approach is built into the commands eig(S) and svd(A).
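NumPy offers close analogues of those MATLAB commands, and their outputs agree in exactly the way this section predicts (a sketch; an illustration rather than text from the book; the svd routine works on $A$ itself by bidiagonalization instead of forming $A^TA$):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)     # computed from A directly
lam = np.linalg.eigvalsh(A.T @ A)              # eigenvalues of S, ascending

print(np.allclose(sigma, np.sqrt(lam[::-1])))  # sigma_i = sqrt(lambda_i)
```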

REVIEW OF THE KEY IDEAS

1. The SVD factors $A$ into $U\Sigma V^T$, with $r$ singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$.

2. The numbers $\sigma_1^2, \ldots, \sigma_r^2$ are the nonzero eigenvalues of $AA^T$ and $A^TA$.

3. The orthonormal columns of $U$ and $V$ are eigenvectors of $AA^T$ and $A^TA$.

4. Those columns hold orthonormal bases for the four fundamental subspaces of $A$.

5. Those bases diagonalize the matrix: $Av_i = \sigma_i u_i$ for $i \leq r$. This is $AV = U\Sigma$.

6. $A = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T$ and $\sigma_1$ is the maximum of the ratio $\|Ax\| / \|x\|$.

WORKED EXAMPLES

7.1 A Identify by name these decompositions of $A$ into a sum of columns times rows:

1. Orthogonal columns $u_1\sigma_1, \ldots, u_r\sigma_r$ times orthonormal rows $v_1^T, \ldots, v_r^T$.
2. Orthonormal columns $q_1, \ldots, q_r$ times triangular rows $r_1^T, \ldots, r_r^T$.
3. Triangular columns $l_1, \ldots, l_r$ times triangular rows $u_1^T, \ldots, u_r^T$.

Where do the rank and the pivots and the singular values of $A$ come into this picture?

Solution These three factorizations are basic to linear algebra, pure or applied:

1. Singular Value Decomposition $A = U\Sigma V^T$
2. Gram-Schmidt Orthogonalization $A = QR$
3. Gaussian Elimination $A = LU$

You might prefer to separate out singular values $\sigma_i$ and heights $h_i$ and pivots $d_i$:

1. $A = U\Sigma V^T$ with unit vectors in $U$ and $V$. The singular values $\sigma_i$ are in $\Sigma$.

2. $A = QHR$ with unit vectors in $Q$ and diagonal 1's in $R$. The heights $h_i$ are in $H$.

3. $A = LDU$ with diagonal 1's in $L$ and $U$. The pivots $d_i$ are in $D$.

Each $h_i$ tells the height of column $i$ above the plane of columns 1 to $i - 1$. The volume of the full $n$-dimensional box ($r = m = n$) comes from $A = U\Sigma V^T = LDU = QHR$:

$$|\det A| = |\text{product of } \sigma\text{'s}| = |\text{product of } d\text{'s}| = |\text{product of } h\text{'s}|.$$
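A one-matrix check of this volume identity (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)
print(np.isclose(abs(np.linalg.det(A)), np.prod(sigma)))   # |det A| = product of sigma's
```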

7.1 B Show that $\sigma_1 \geq |\lambda_{\max}|$. The largest singular value dominates all eigenvalues.

Solution Start from $A = U\Sigma V^T$. Remember that multiplying by an orthogonal matrix does not change length: $\|Qx\| = \|x\|$ because $\|Qx\|^2 = x^TQ^TQx = x^Tx = \|x\|^2$. This applies to $Q = U$ and $Q = V^T$. In between is the diagonal matrix $\Sigma$.

$$\|Ax\| = \|U\Sigma V^Tx\| = \|\Sigma V^Tx\| \leq \sigma_1\|V^Tx\| = \sigma_1\|x\|. \qquad (14)$$

An eigenvector has $\|Ax\| = |\lambda|\,\|x\|$. So (14) says that $|\lambda|\,\|x\| \leq \sigma_1\|x\|$. Then $|\lambda| \leq \sigma_1$.

Apply also to the unit vector $x = (1, 0, \ldots, 0)$. Now $Ax$ is the first column of $A$. Then by inequality (14), this column has length $\leq \sigma_1$. Every entry must have $|a_{ij}| \leq \sigma_1$.

Equation (14) shows again that the maximum value of $\|Ax\| / \|x\|$ equals $\sigma_1$.

Section 11.2 will explain how the ratio $\sigma_{\max}/\sigma_{\min}$ governs the roundoff error in solving $Ax = b$. MATLAB warns you if this “condition number” is large. Then $x$ is unreliable.
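Both facts in this worked example are easy to test (a sketch assuming NumPy; np.linalg.cond computes exactly that ratio $\sigma_{\max}/\sigma_{\min}$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)
lam = np.linalg.eigvals(A)

print(sigma[0] >= np.max(np.abs(lam)))                       # sigma_1 dominates every |lambda|
print(np.isclose(np.linalg.cond(A), sigma[0] / sigma[-1]))   # the condition number
```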