Math 480: Diagonalization and the Singular Value Decomposition

These notes cover diagonalization and the Singular Value Decomposition.

1. Diagonalization

Recall that a diagonal matrix is a square matrix with all off-diagonal entries equal to zero. Here are a few examples of diagonal matrices:
$$\begin{pmatrix} 4 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix}, \qquad \begin{pmatrix} 4 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1\end{pmatrix}, \qquad \begin{pmatrix} -6 & 0\\ 0 & 2\end{pmatrix}.$$

Definition 1.1. We say that an n × n matrix A is diagonalizable if there exists an invertible matrix S such that $S^{-1}AS$ is diagonal. Note that if $D = S^{-1}AS$ is diagonal, then we can equally well write $A = SDS^{-1}$. So diagonalizable matrices are those that admit a factorization $A = SDS^{-1}$ with D diagonal.

Example: If D is a diagonal n × n matrix and S is an invertible n × n matrix, then $A = SDS^{-1}$ is diagonalizable, since $S^{-1}AS = S^{-1}(SDS^{-1})S = D$. For instance, the matrix $S = \begin{pmatrix} -1 & 2\\ 2 & 4\end{pmatrix}$ is invertible, so
$$S\begin{pmatrix} -6 & 0\\ 0 & 2\end{pmatrix}S^{-1} = \begin{pmatrix} -2 & 2\\ 8 & -2\end{pmatrix}$$
is diagonalizable.

Fact 1.2. If A is a diagonalizable n × n matrix, with $S^{-1}AS = D$, then the columns of S are eigenvectors of A, and the diagonal entries of D are eigenvalues of A. In particular, if A is diagonalizable then there must exist a basis for $\mathbb{R}^n$ consisting of eigenvectors of A.

This follows from a simple computation: since $S^{-1}AS = D$, multiplying both sides by S yields $AS = SD$.

Write $S = [\vec v_1 \ \cdots \ \vec v_n]$ and set
$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{pmatrix}.$$
Since multiplying S by a diagonal matrix (on the right) just scales the columns,

$$SD = [\lambda_1\vec v_1 \ \cdots \ \lambda_n\vec v_n].$$
On the other hand,
$$AS = A[\vec v_1 \ \cdots \ \vec v_n] = [A\vec v_1 \ \cdots \ A\vec v_n].$$

So the equation $AS = SD$ tells us that $A\vec v_i = \lambda_i\vec v_i$ (for each i), which precisely says that $\vec v_i$ is an eigenvector with eigenvalue $\lambda_i$. The previous discussion also works in reverse, and yields the following conclusion.

Fact 1.3. If A is an n × n matrix and there exists a basis $\vec v_1,\dots,\vec v_n$ for $\mathbb{R}^n$ such that $\vec v_i$ is an eigenvector of A with eigenvalue $\lambda_i$, then A is diagonalizable. More specifically, if $S = [\vec v_1 \ \cdots \ \vec v_n]$, then $S^{-1}AS = D$, where D is the n × n diagonal matrix with diagonal entries $\lambda_1,\dots,\lambda_n$.

Example. I claim that the matrix
$$A = \begin{pmatrix} 4 & 2 & 2\\ 2 & 4 & 2\\ 2 & 2 & 4\end{pmatrix}$$
has eigenvalues 2 and 8. To find the corresponding eigenvectors, you can analyze $N(A - 2I)$ and $N(A - 8I)$. By considering the parametric form for the homogeneous systems $(A - 2I)\vec x = \vec 0$ and $(A - 8I)\vec x = \vec 0$, you'll find that the vectors
$$\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1\\ 0\\ 1\end{pmatrix}$$
form a basis for the eigenspace associated to the eigenvalue 2, and
$$\begin{pmatrix} 1\\ 1\\ 1\end{pmatrix}$$
forms a basis for the eigenspace associated with the eigenvalue 8. We can then conclude that $S^{-1}AS = D$, where
$$S = \begin{pmatrix} -1 & -1 & 1\\ 1 & 0 & 1\\ 0 & 1 & 1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 8\end{pmatrix}.$$
Note that order is important here: since we put eigenvectors corresponding to 2 into the first two columns of S, we have to put the eigenvalue 2 into the first two diagonal entries of D. We could, however, have switched the order of the eigenvectors corresponding to 2 without changing D, giving a second way of diagonalizing A. A third way of diagonalizing A would be to set
$$T = \begin{pmatrix} 1 & -1 & -1\\ 1 & 0 & 1\\ 1 & 1 & 0\end{pmatrix} \quad\text{and}\quad E = \begin{pmatrix} 8 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{pmatrix},$$
and again we have $T^{-1}AT = E$.

Exercise 1: Check these formulas without computing $S^{-1}$ and $T^{-1}$. (Multiply both sides of the equations $S^{-1}AS = D$ and $T^{-1}AT = E$ by S or T and check instead that $AS = SD$ and $AT = TE$.)
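A numerical double-check, separate from the by-hand verification the exercise asks for, might look like this in NumPy (my own addition, not part of the notes):

```python
import numpy as np

# Check AS = SD and AT = TE for the 3x3 example above, with no inverses needed.
A = np.array([[4, 2, 2],
              [2, 4, 2],
              [2, 2, 4]], dtype=float)
S = np.array([[-1, -1, 1],
              [ 1,  0, 1],
              [ 0,  1, 1]], dtype=float)
D = np.diag([2.0, 2.0, 8.0])
T = np.array([[1, -1, -1],
              [1,  0,  1],
              [1,  1,  0]], dtype=float)
E = np.diag([8.0, 2.0, 2.0])

print(np.allclose(A @ S, S @ D))   # True
print(np.allclose(A @ T, T @ E))   # True
```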

Example: The matrix $A = \begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix}$ is not diagonalizable. If you compute the characteristic polynomial $\det(A - \lambda I)$, you'll see that it is simply $(1 - \lambda)^2$, so the only eigenvalue is $\lambda = 1$. The corresponding eigenspace is $N(A - 1\cdot I) = N(A - I)$. This space is 1-dimensional (why?), so there cannot be a basis for $\mathbb{R}^2$ consisting of eigenvectors of A. So Fact 1.2 tells us that we can't diagonalize A.
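You can also see the failure numerically (a NumPy illustration I'm adding, not part of the notes): the computed eigenvectors do not span $\mathbb{R}^2$.

```python
import numpy as np

# NumPy finds the eigenvalue 1 twice, but the two reported "eigenvectors"
# point in essentially the same direction, so there is no eigenvector basis.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # [1. 1.]
print(eigvecs)    # columns are (1, 0) and roughly (-1, 0): not independent
```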

Exercise 2: Determine whether or not the following matrices are diagonalizable. For the ones that are diagonalizable, write them in the form $SDS^{-1}$ with D diagonal.

 5 −1 −4   −3 1   −4 6  , , −2 4 −2 . −1 −1 −8 10   −3 −3 6

2. Diagonalization of Symmetric Matrices

In general, it's hard to tell whether a matrix is diagonalizable, because it's hard to find eigenvalues exactly: they're roots of a complicated polynomial. However, in some cases one can tell very quickly that a matrix is diagonalizable.

Theorem 2.1 (The Spectral Theorem). If A is an n × n symmetric matrix, then A is diagonalizable. In other words, there is a basis for $\mathbb{R}^n$ consisting of eigenvectors of A.

This is hard to prove, and we'll simply take it for granted. However, some additional information is much easier to establish.

Fact 2.2. If A is an n × n symmetric matrix, and $\vec v$ and $\vec w$ are eigenvectors of A with different eigenvalues, then $\vec v$ and $\vec w$ are perpendicular.

This is relatively easy to check using our understanding of orthogonality. Say $A\vec v = \lambda_1\vec v$ and $A\vec w = \lambda_2\vec w$ with $\lambda_1 \neq \lambda_2$. We need to check that $\langle\vec v, \vec w\rangle = 0$. Since $\langle\vec v, \vec w\rangle = \vec v^T\vec w$,
$$\lambda_1\vec v^T\vec w = (\lambda_1\vec v)^T\vec w = (A\vec v)^T\vec w = \vec v^T A^T\vec w = \vec v^T A\vec w = \vec v^T\lambda_2\vec w = \lambda_2\vec v^T\vec w.$$
So $\lambda_1\vec v^T\vec w = \lambda_2\vec v^T\vec w$, and since $\lambda_1 \neq \lambda_2$, we conclude that $\vec v^T\vec w = 0$. In this computation, we used the fact that A is symmetric (where?) and the fact that $\vec w$ is an eigenvector (where?).
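Here is a quick numerical illustration of Fact 2.2 (with a made-up symmetric matrix, not from the notes): the eigenvectors NumPy finds for distinct eigenvalues come out pairwise orthogonal.

```python
import numpy as np

# With probability 1 a random symmetric matrix has distinct eigenvalues, so
# by Fact 2.2 its eigenvectors should be pairwise orthogonal.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                        # a symmetric 4x4 matrix

eigvals, V = np.linalg.eig(A)      # columns of V are (normalized) eigenvectors
print(np.allclose(V.T @ V, np.eye(4)))   # True: the columns are orthonormal
```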

We'll mostly be interested in symmetric matrices of the form $A^TA$, where A is any m × n matrix. Remember that all such matrices are symmetric, because $(A^TA)^T = A^T(A^T)^T = A^TA$.

Fact 2.3. For any m × n matrix A, the eigenvalues of the symmetric matrix $A^TA$ are all non-negative (real) numbers.

This is again easy to check. If $\lambda$ is an eigenvalue of $A^TA$, then we can always find a (non-zero) eigenvector $\vec v$ associated with $\lambda$, and dividing $\vec v$ by $\|\vec v\|$ yields a length-one eigenvector. So let's just assume that $\|\vec v\| = 1$ and $A^TA\vec v = \lambda\vec v$. Then we have
$$\|A\vec v\|^2 = \langle A\vec v, A\vec v\rangle = (A\vec v)^T A\vec v = \vec v^T(A^TA\vec v) = \vec v^T(\lambda\vec v) = \lambda\langle\vec v, \vec v\rangle = \lambda.$$
So $\lambda = \|A\vec v\|^2$, which is a non-negative real number. In this computation, we used the fact that $\vec v$ is an eigenvector of $A^TA$ (where?) and the fact that $\|\vec v\| = 1$ (where?).
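And a similar numerical illustration of Fact 2.3 (again with a made-up matrix):

```python
import numpy as np

# The eigenvalues of A^T A are real and non-negative for any rectangular A.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))     # an arbitrary 5x3 matrix

gram = A.T @ A                      # 3x3 symmetric matrix
eigvals = np.linalg.eigvalsh(gram)  # eigvalsh: eigenvalues of a symmetric matrix
print(eigvals)                      # all >= 0 (up to rounding error)
```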

Exercise 3: Write each of the following symmetric matrices in the form $SDS^{-1}$ with D diagonal. In the second case, the eigenvalues are −1 and 11.
$$\begin{pmatrix} 5/2 & 1/2 & 0\\ 1/2 & 5/2 & 0\\ 0 & 0 & 5\end{pmatrix}, \qquad \begin{pmatrix} 3 & 4 & 4\\ 4 & 3 & 4\\ 4 & 4 & 3\end{pmatrix}$$

3. The Singular Value Decomposition

Lots of matrices that arise in practice are not diagonalizable, and are often not even square. However, there is something sort of similar to diagonalization that works for any m × n matrix. We will call a square matrix orthogonal if its columns are orthonormal.

Exercise 4: Explain the following statement: if A is an orthogonal n × n matrix, then A is invertible and $A^T = A^{-1}$. (This came up when we discussed the QR factorization.)

Definition 3.1. A Singular Value Decomposition of an m × n matrix A is an expression
$$A = U\Sigma V^T$$
where U is an m × m matrix with orthonormal columns, V is an n × n matrix with orthonormal columns, and $\Sigma = (\sigma_{i,j})$ is an m × n matrix with $\sigma_{i,j} = 0$ for $i \neq j$ and

$$\sigma_{1,1} \geq \sigma_{2,2} \geq \sigma_{3,3} \geq \cdots \geq 0.$$

Example: Here is an example of a SVD:
$$\begin{pmatrix} 6 & 30 & -21\\ 17 & 10 & -22\end{pmatrix} = \begin{pmatrix} 4/5 & -3/5\\ 3/5 & 4/5\end{pmatrix}\begin{pmatrix} 45 & 0 & 0\\ 0 & 15 & 0\end{pmatrix}\begin{pmatrix} 1/3 & 2/3 & -2/3\\ 2/3 & -2/3 & -1/3\\ 2/3 & 1/3 & 2/3\end{pmatrix}.$$

Exercise 5: Check that the above decomposition is a Singular Value Decomposition. (You need to check that the left-hand matrix in the decomposition has orthonormal columns, that the rows of the right-hand matrix are orthonormal, and that the middle matrix is "diagonal" with non-increasing, non-negative entries on the diagonal. Of course no work is required to check this third condition.)
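A numerical way to carry out these checks (a sketch I'm adding, using the matrices from the example above):

```python
import numpy as np

# Verify the claimed SVD: orthonormal columns of U, orthonormal rows of V^T,
# and the product really reproducing A.
A   = np.array([[ 6, 30, -21],
                [17, 10, -22]], dtype=float)
U   = np.array([[4/5, -3/5],
                [3/5,  4/5]])
Sig = np.array([[45, 0, 0],
                [ 0, 15, 0]], dtype=float)
Vt  = np.array([[1/3,  2/3, -2/3],
                [2/3, -2/3, -1/3],
                [2/3,  1/3,  2/3]])

print(np.allclose(U.T @ U, np.eye(2)))    # columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # rows of V^T are orthonormal
print(np.allclose(U @ Sig @ Vt, A))       # the product is A
```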

Here are the key facts about the SVD:

Theorem 3.2. Every m × n matrix A admits (many) Singular Value Decompositions.

Fact 3.3. If $A = U\Sigma V^T$ is a Singular Value Decomposition of an m × n matrix A, then
• The numbers $\sigma_{i,i}$ are the square roots of the eigenvalues of $A^TA$, repeated according to their multiplicities as roots of the characteristic polynomial of $A^TA$. (Note that these eigenvalues are non-negative by Fact 2.3, so their square roots are real.)
• The columns of V are eigenvectors of $A^TA$.
• The columns of U are eigenvectors of $AA^T$.

The terms $\sigma_{i,i}$ are called the singular values of A. We'll set $\sigma_i = \sigma_{i,i}$ for convenience.

It is not hard to check the second two statements in Fact 3.3, and in doing so we'll also check the first statement. Let $V = [\vec v_1 \ \cdots \ \vec v_n]$. To see why each $\vec v_i$ is an eigenvector of $A^TA$, we simply write out $A^TA$ in terms of the given SVD:

(1) $\qquad A^TA\vec v_i = (U\Sigma V^T)^T(U\Sigma V^T)\vec v_i = V\Sigma^T(U^TU)(\Sigma V^T\vec v_i) = V\Sigma^T\Sigma\vec e_i.$

The last step uses Exercise 4 and the fact that the columns of V are orthonormal (think about this!). Now, $\Sigma^T\Sigma$ is diagonal with diagonal entries $\sigma_i^2$, so $\Sigma^T\Sigma\vec e_i = \sigma_i^2\vec e_i$ and hence

$$A^TA\vec v_i = V\Sigma^T\Sigma\vec e_i = V(\sigma_i^2\vec e_i) = \sigma_i^2\vec v_i.$$

So $\vec v_i$ is an eigenvector of $A^TA$ with eigenvalue $\sigma_i^2$.
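A quick numerical illustration of this fact, using NumPy's built-in SVD on a made-up matrix:

```python
import numpy as np

# Each column v_i of V satisfies (A^T A) v_i = sigma_i^2 v_i.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

_, sigma, Vt = np.linalg.svd(A)     # NumPy returns V^T and the singular values
V = Vt.T
for i in range(3):
    lhs = A.T @ A @ V[:, i]
    rhs = sigma[i]**2 * V[:, i]
    print(np.allclose(lhs, rhs))    # True for each i
```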

Exercise 6: Give a similar explanation for why the columns of U are eigenvectors of $AA^T$.

Exercise 7: Show that for any square matrix A, the eigenvectors of A are also eigenvectors of $A^2$. What are the eigenvalues for $A^2$?

Example: If A is a symmetric n × n matrix, then we learned in the previous section that A is diagonalizable; that is, $A = SDS^{-1}$ for some diagonal matrix D and some invertible matrix S. In fact, we saw that the columns of S corresponding to different eigenvalues are orthogonal to one another. If you apply Gram–Schmidt to the columns of S, you'll get an orthogonal matrix T that also diagonalizes A. If you order the columns of T and D so that the diagonal entries of D are decreasing, then you've found a SVD for A.

For instance, let $A = \begin{pmatrix} 6 & 0 & 6\\ 0 & 12 & 6\\ 6 & 6 & 9\end{pmatrix}$. To find a SVD for A, we can just diagonalize A (making sure the matrix S has orthonormal columns). We need to figure out the eigenvalues of $A^TA$. Since A is symmetric, $A^TA = A^2$, and the eigenvalues of $A^2$ are just the squares of the eigenvalues of A (see Exercise 7). So the singular values of A are just the eigenvalues of A! (Strictly speaking, they are the absolute values of the eigenvalues; for this A all of the eigenvalues turn out to be non-negative.) To compute these eigenvalues/singular values, we first compute the characteristic polynomial, which is the determinant of
$$A - \lambda I = \begin{pmatrix} 6-\lambda & 0 & 6\\ 0 & 12-\lambda & 6\\ 6 & 6 & 9-\lambda\end{pmatrix}.$$
Using cofactor expansion and simplifying shows that the characteristic polynomial is
$$-\lambda^3 + 27\lambda^2 - 162\lambda = -\lambda(\lambda^2 - 27\lambda + 162) = -\lambda(\lambda - 9)(\lambda - 18),$$
so the eigenvalues of A are 18, 9, and 0. So the diagonal matrix in our SVD will be
$$\Sigma = \begin{pmatrix} 18 & 0 & 0\\ 0 & 9 & 0\\ 0 & 0 & 0\end{pmatrix}.$$
The matrix V in our SVD must contain eigenvectors of $A^TA = A^2$ going along with the above eigenvalues; again Exercise 7 tells us that $A^2$ and A have the same eigenvectors. If you compute eigenvectors of A and normalize, you should get the orthogonal matrix
$$V = \begin{pmatrix} 1/3 & 2/3 & -2/3\\ 2/3 & -2/3 & -1/3\\ 2/3 & 1/3 & 2/3\end{pmatrix}$$
(note that you could change any column to its negative). From what we learned about diagonalization, we now know that $V\Sigma V^T = A$. This is a SVD of A. You can get slightly different SVDs of A by negating any of the columns of V.
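As a numerical cross-check (my own addition, not part of the notes), NumPy confirms both the factorization and the singular values:

```python
import numpy as np

# Check the SVD just constructed for the symmetric matrix A.
A = np.array([[6,  0, 6],
              [0, 12, 6],
              [6,  6, 9]], dtype=float)
V = np.array([[1/3,  2/3, -2/3],
              [2/3, -2/3, -1/3],
              [2/3,  1/3,  2/3]])
Sig = np.diag([18.0, 9.0, 0.0])

print(np.allclose(V @ Sig @ V.T, A))        # True
print(np.linalg.svd(A, compute_uv=False))   # approximately 18, 9, 0
```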

We won't worry too much about how to find Singular Value Decompositions, but here is a quick sketch of one general method. We just need orthogonal matrices U and V satisfying $AV = U\Sigma$, since multiplying both sides on the right by $V^{-1} = V^T$ gives a SVD of A. As we just learned, the matrix V will simply contain eigenvectors of $A^TA$; remember that since $A^TA$ is symmetric, there exists an orthonormal basis for $\mathbb{R}^n$ consisting of such eigenvectors. Then

$$AV = [A\vec v_1 \ \cdots \ A\vec v_n],$$
while if $\sigma_1,\dots,\sigma_r$ are the non-zero singular values of A,
$$U\Sigma = [\sigma_1\vec u_1 \ \cdots \ \sigma_r\vec u_r \ \vec 0 \ \cdots \ \vec 0].$$

So we just set $\vec u_i = A\vec v_i/\sigma_i$ if $\sigma_i \neq 0$. (It's not too hard to check that the vectors $\vec u_i$ are orthonormal, by writing out their inner products using transpose notation: $\vec u_i\cdot\vec u_j = \vec u_i^T\vec u_j$.) These vectors $\vec u_i$ are eigenvectors of $AA^T$ with eigenvalues $\sigma_i^2$:
$$AA^T\vec u_i = AA^T\Bigl(\tfrac{1}{\sigma_i}A\vec v_i\Bigr) = \tfrac{1}{\sigma_i}A(A^TA\vec v_i) = \tfrac{1}{\sigma_i}A(\sigma_i^2\vec v_i) = \sigma_i A\vec v_i = \sigma_i^2\vec u_i.$$
To get the other columns of U, we just find an orthonormal basis for the 0-eigenspace of $AA^T$ (which is just $N(AA^T)$). This can be done by applying Gram–Schmidt to the basis we get from parametric form.
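Here is a minimal NumPy sketch of this procedure (my own addition), under the simplifying assumption that A has full column rank, so there are no zero singular values and the extra Gram–Schmidt step is not needed; it produces the "reduced" form in which U has only as many columns as A, and the remaining columns of a full U would come from $N(AA^T)$ as described above. The function name is my own invention.

```python
import numpy as np

def svd_via_gram_matrix(A):
    eigvals, V = np.linalg.eigh(A.T @ A)       # orthonormal eigenvectors of A^T A
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.maximum(eigvals, 0.0))  # singular values
    U = (A @ V) / sigma                        # u_i = A v_i / sigma_i
    return U, sigma, V.T

A = np.array([[ 4.0,  8.0],
              [11.0,  7.0],
              [14.0, -2.0]])                   # 3x2 matrix with rank 2
U, sigma, Vt = svd_via_gram_matrix(A)
print(sigma)                                   # compare with np.linalg.svd(A, compute_uv=False)
print(np.allclose((U * sigma) @ Vt, A))        # True
```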

Exercise 8: Determine which, if any, of the following are Singular Value Decompositions.

a) $\begin{pmatrix} 5/\sqrt{10} & 4/\sqrt{10}\\ 4/\sqrt{10} & -5/\sqrt{10}\end{pmatrix}\begin{pmatrix} 5 & 0 & 0\\ 0 & 6 & 0\end{pmatrix}\begin{pmatrix} 1/3 & -2/3 & 2/3\\ -2/3 & -1/3 & 2/3\\ 2/3 & -2/3 & 1/3\end{pmatrix}$

b) $\begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10}\\ 1/\sqrt{10} & -3/\sqrt{10}\end{pmatrix}\begin{pmatrix} 5 & 0 & 0\\ 0 & 6 & 0\end{pmatrix}\begin{pmatrix} 1/3 & -2/3 & 2/3\\ -2/3 & -1/3 & 2/3\\ 2/3 & -2/3 & 1/3\end{pmatrix}$

c) $\begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10}\\ 1/\sqrt{10} & -3/\sqrt{10}\end{pmatrix}\begin{pmatrix} 6 & 0 & 0\\ 0 & 5 & 0\end{pmatrix}\begin{pmatrix} 1/3 & -2/3 & 2/3\\ -2/3 & -1/3 & 2/3\\ 2/3 & -2/3 & 1/3\end{pmatrix}$

4. Application: the four fundamental subspaces

Given an SVD $A = U\Sigma V^T$, it's relatively easy to find bases for the four fundamental subspaces R(A), N(A), $R(A^T)$, and $N(A^T)$. This is based on the following fact.

Fact 4.1. If A has r non-zero singular values, counted according to their multiplicity (i.e. if Σ has r non-zero entries), then A has rank r.

Why is this? Row reduce $A = U\Sigma V^T$ by multiplying on the left by elementary matrices. Since U is invertible, you'll eventually get down to $I\Sigma V^T = \Sigma V^T$, whose rows are $\sigma_i\vec v_i^T$. The first r of these rows are non-zero and linearly independent, while the rest are zero. So $\Sigma V^T$ has r pivotal rows, so you'll get r pivots when you finish row reducing $A = U\Sigma V^T$.
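Numerically, this is essentially how rank is computed in practice: count the singular values that are non-zero (in floating point, larger than a small tolerance). A small sketch with a made-up matrix:

```python
import numpy as np

# Third row = first row + second row, so the rank is 2.
A = np.array([[ 4, 11, 14],
              [ 8,  7, -2],
              [12, 18, 12]], dtype=float)

sigma = np.linalg.svd(A, compute_uv=False)
r = np.sum(sigma > 1e-10 * sigma[0])     # number of "non-zero" singular values
print(sigma, r)                          # r == 2
print(np.linalg.matrix_rank(A))          # also 2
```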

We can now explain how to find orthonormal bases for the four fundamental subspaces.

The Range of A: The first r columns of U form an orthonormal basis for R(A): for $1 \leq i \leq r$ we have
$$A\vec v_i = U\Sigma V^T\vec v_i = U\Sigma\vec e_i = U(\sigma_i\vec e_i) = \sigma_i\vec u_i,$$

so the vectors $\vec u_i$ are in R(A), and they're linearly independent (because they're orthonormal). Since $\dim R(A) = \operatorname{rank}(A) = r$, these vectors must form an (orthonormal) basis.

The Null Space of A: The last n − r columns of V form a basis for N(A). These vectors are linearly independent (because they're orthonormal), and they lie in N(A) because
$$A\vec v_i = U\Sigma V^T\vec v_i = U\Sigma\vec e_i = U\vec 0 = \vec 0 \quad\text{for } i > r.$$
By the Rank–Nullity Theorem, $\dim N(A) = n - r$, so the linearly independent set $\{\vec v_{r+1},\dots,\vec v_n\}$ must form a basis for N(A).

To find bases for $N(A^T)$ and $R(A^T)$, just notice that $A^T = (U\Sigma V^T)^T = V\Sigma^T U^T$. This is a SVD for $A^T$, because the diagonal entries of Σ and $\Sigma^T$ are the same, and V and U are orthogonal. Applying the same reasoning as above to $A^T = V\Sigma^T U^T$, we find that the first r columns of V are a basis for $R(A^T)$ and the last m − r columns of U are a basis for $N(A^T)$.

Example: Consider the SVD
$$\begin{pmatrix} 4 & 11 & 14\\ 8 & 7 & -2\end{pmatrix} = \begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10}\\ 1/\sqrt{10} & -3/\sqrt{10}\end{pmatrix}\begin{pmatrix} 6\sqrt{10} & 0 & 0\\ 0 & 3\sqrt{10} & 0\end{pmatrix}\begin{pmatrix} 1/3 & 2/3 & 2/3\\ -2/3 & -1/3 & 2/3\\ 2/3 & -2/3 & 1/3\end{pmatrix}.$$
Letting A denote the matrix on the left, we have:
• $\left\{\begin{pmatrix} 3/\sqrt{10}\\ 1/\sqrt{10}\end{pmatrix}, \begin{pmatrix} 1/\sqrt{10}\\ -3/\sqrt{10}\end{pmatrix}\right\}$ is a basis for R(A);
• $\left\{\begin{pmatrix} 2/3\\ -2/3\\ 1/3\end{pmatrix}\right\}$ is a basis for N(A);
• $\left\{\begin{pmatrix} 1/3\\ 2/3\\ 2/3\end{pmatrix}, \begin{pmatrix} -2/3\\ -1/3\\ 2/3\end{pmatrix}\right\}$ is a basis for $R(A^T)$;
• and $N(A^T) = \{\vec 0\}$ (so, if you like, it has an empty basis).
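If you want to see this done numerically, here is a sketch (my own addition) that reads the four bases off from NumPy's SVD of the same matrix:

```python
import numpy as np

A = np.array([[4, 11, 14],
              [8,  7, -2]], dtype=float)

U, sigma, Vt = np.linalg.svd(A)          # full SVD: U is 2x2, Vt is 3x3
r = np.sum(sigma > 1e-10 * sigma[0])     # rank

col_space  = U[:, :r]      # orthonormal basis for R(A)
null_space = Vt[r:, :].T   # orthonormal basis for N(A)
row_space  = Vt[:r, :].T   # orthonormal basis for R(A^T)
left_null  = U[:, r:]      # orthonormal basis for N(A^T)  (empty here, since r = m = 2)
print(col_space, null_space, row_space, left_null, sep="\n")
```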

Exercise 9: Find bases (and dimensions) of the four fundamental subspaces of the matrix whose singular value decomposition is given below:
$$\begin{pmatrix} 6 & 30 & -21\\ 17 & 10 & -22\end{pmatrix} = \begin{pmatrix} 4/5 & -3/5\\ 3/5 & 4/5\end{pmatrix}\begin{pmatrix} 45 & 0 & 0\\ 0 & 15 & 0\end{pmatrix}\begin{pmatrix} 1/3 & 2/3 & -2/3\\ 2/3 & -2/3 & -1/3\\ 2/3 & 1/3 & 2/3\end{pmatrix}.$$
Use your answer to compute the projection of $(1, 1, 1)$ onto the row space of A.

5. Application: data compression

One of the principal uses of the SVD is in data compression. The basic idea is that the important information in a matrix is really contained in its largest singular values, and one may often ignore the smaller singular values without losing the essential features of the data.

If $A = U\Sigma V^T$ is a singular value decomposition of an m × n matrix, and Σ has precisely r non-zero entries, then the rank of A is r (Fact 4.1). Similarly, if we let $\Sigma_k$ be the matrix formed from Σ by replacing $\sigma_{k+1},\dots,\sigma_r$ by zeros, then $A_k = U\Sigma_kV^T$ is a rank k matrix, and is the best approximation to A by a rank k matrix. (This can be made precise, but we won't worry about the details.) We will refer to $A_k = U\Sigma_kV^T$ as the rank k approximation to A.

Now, deleting entries from the diagonal of Σ doesn't really get rid of all that much information: if you had to transmit the entire matrices U, V, and $\Sigma_k$, you'd be transmitting $m^2 + n^2 + k$ numbers, which is actually more numbers than were in the original m × n matrix A! However, we can re-write the rank k approximation $A_k = U\Sigma_kV^T$ in a way that requires much less information than was contained in the original matrix A.

Fact 5.1. If the columns of V are $\vec v_i$ and the columns of U are $\vec u_i$, then we have
$$A_k = U_k\Sigma_k(V_k)^T, \quad\text{where}\quad U_k = [\vec u_1 \ \cdots \ \vec u_k \ \vec 0 \ \cdots \ \vec 0] \quad\text{and}\quad V_k = [\vec v_1 \ \cdots \ \vec v_k \ \vec 0 \ \cdots \ \vec 0].$$
In other words, we can just replace all the columns of U and V after the kth column by zeros.

T This is straightforward to check: just think about multiplying U(ΣkV ), and you’ll see that only the first k columns of U and V are actually used in computing these products. Note that this new description of our rank k approximation really does contain a lot less data than was in the original matrix A: there are just k non-zero columns of Uk, each containing m entries, and just k non-zero columns of Vk, each containing n entries. So there are a total of km + kn + k non-zero entries in Uk,Σk, and Vk, which is much less than the mn entries in the original matrix A (if k is much less then m and n). Note also that the entries of the vectors ~ui and ~vi are necessarily small numbers, since these are unit vectors, while the entries of A could all be quite large.  6 0 6  Exercise 9: Compute the rank 1 and rank 2 approximations to the matrix A =  0 12 6  6 6 9 discussed on p. 5.

The third project shows how to use these ideas to compress images.
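For concreteness, here is a minimal sketch (with made-up data, not taken from the project) of how one might form rank k approximations in NumPy:

```python
import numpy as np

def rank_k_approximation(A, k):
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and the matching columns/rows.
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 40))     # stand-in for an image or data matrix
A2 = rank_k_approximation(A, 2)
print(np.linalg.matrix_rank(A2))      # 2
print(np.linalg.norm(A - A2))         # size of the approximation error
```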