396 Chapter 7. Applied Mathematics and AᵀA

7.2 Positive Definite Matrices and the SVD

This chapter about applications of AᵀA depends on two important ideas in linear algebra. These ideas have big parts to play, and we focus on them now.

1. Positive definite symmetric matrices (both AᵀA and AᵀCA are positive definite)

2. Singular Value Decomposition (A = UΣVᵀ gives perfect bases for the 4 subspaces)

Those are orthogonal matrices U and V in the SVD. Their columns are orthonormal eigenvectors of AAᵀ and AᵀA. The entries in the diagonal matrix Σ are the square roots of the eigenvalues. The matrices AAᵀ and AᵀA have the same nonzero eigenvalues. Section 6.5 showed that the eigenvectors of these symmetric matrices are orthogonal. I will show now that the eigenvalues of AᵀA are positive, if A has independent columns.

Start with AᵀAx = λx. Then xᵀAᵀAx = λxᵀx. Therefore λ = ‖Ax‖²/‖x‖² > 0.

I separated xᵀAᵀAx into (Ax)ᵀ(Ax) = ‖Ax‖². We don't have λ = 0 because AᵀA is invertible (since A has independent columns). The eigenvalues must be positive. Those are the key steps to understanding positive definite matrices. They give us three tests on S, three ways to recognize when a symmetric matrix S is positive definite :

Positive definite symmetric matrices
1. All the eigenvalues of S are positive.
2. The "energy" xᵀSx is positive for all nonzero vectors x.
3. S has the form S = AᵀA with independent columns in A.
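These three tests are easy to check numerically. Here is a minimal NumPy sketch (mine, not part of the text), using the matrix S = [5 4; 4 5] that appears in Example 1(b) below. The Cholesky factorization is used for test 3: it succeeds exactly when S is positive definite, and its factor supplies one valid A.

```python
import numpy as np

# Sketch: verify the three equivalent tests on S = [5 4; 4 5].
S = np.array([[5.0, 4.0],
              [4.0, 5.0]])

# Test 1: all eigenvalues of S are positive (eigvalsh is for symmetric S).
eigs = np.linalg.eigvalsh(S)
assert np.all(eigs > 0)            # eigenvalues are 1 and 9

# Test 2: the energy x^T S x is positive for a sample of nonzero x.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    assert x @ S @ x > 0

# Test 3: S = A^T A for some A with independent columns.
# Cholesky gives S = L L^T with L lower triangular; take A = L^T.
L = np.linalg.cholesky(S)
A = L.T
assert np.allclose(A.T @ A, S)
```

Any one of the three asserts failing would mean S is not positive definite.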

There is also a test on the pivots (all > 0) and a test on the n upper left determinants (all > 0).

Example 1 Are these matrices positive definite ? When their eigenvalues are positive, construct matrices A with S = AᵀA and find the positive energy xᵀSx.

(a) S = [4 0; 0 1]   (b) S = [5 4; 4 5]   (c) S = [4 5; 5 4]

Solution The answers are yes, yes, and no. The eigenvalues of those matrices S are

(a) λ = 4 and 1 : positive   (b) λ = 9 and 1 : positive   (c) λ = 9 and −1 : not positive.

A quicker test than eigenvalues uses two determinants : the 1 by 1 determinant S₁₁ and the 2 by 2 determinant of S. Example (b) has S₁₁ = 5 and det S = 25 − 16 = 9 (pass). Example (c) has S₁₁ = 4 but det S = 16 − 25 = −9 (fail the test).

Positive energy is equivalent to positive eigenvalues, when S is symmetric. Let me test the energy xᵀSx in all three examples. Two examples pass and the third fails :

[x₁ x₂][4 0; 0 1][x₁; x₂] = 4x₁² + x₂²              Positive energy when x ≠ 0
[x₁ x₂][5 4; 4 5][x₁; x₂] = 5x₁² + 8x₁x₂ + 5x₂²     Positive energy when x ≠ 0
[x₁ x₂][4 5; 5 4][x₁; x₂] = 4x₁² + 10x₁x₂ + 4x₂²    Energy = −2 when x = (1, −1)

Positive energy is a fundamental property. This is the best definition of positive definiteness.

When the eigenvalues are positive, there will be many matrices A that give AᵀA = S. One choice of A is symmetric and positive definite ! Then AᵀA is A², and this choice A = √S is a true square root of S. The successful examples (a) and (b) have S = A² :

[4 0; 0 1] = [2 0; 0 1][2 0; 0 1]   and   [5 4; 4 5] = [2 1; 1 2][2 1; 1 2]

We know that all symmetric matrices have the form S = VΛVᵀ with orthonormal eigenvectors in V. The diagonal matrix Λ has a square root √Λ, when all eigenvalues are positive. In this case A = √S = V√ΛVᵀ is the symmetric positive definite square root :

AᵀA = √S √S = (V√ΛVᵀ)(V√ΛVᵀ) = V√Λ√ΛVᵀ = S   because VᵀV = I.

Starting from this unique square root √S, other choices of A come easily. Multiply √S by any matrix Q that has orthonormal columns (so that QᵀQ = I). Then Q√S is another choice for A (not a symmetric choice). In fact all choices come this way :

AᵀA = (Q√S)ᵀ(Q√S) = √S QᵀQ √S = S.   (1)

I will choose a particular Q in Example 1, to get particular choices of A.

Example 1 (continued) Choose Q = [0 1; 1 0] to multiply √S. Then A = Q√S.

A = [0 1; 1 0][2 0; 0 1] = [0 1; 2 0]   has   S = AᵀA = [4 0; 0 1]

A = [0 1; 1 0][2 1; 1 2] = [1 2; 2 1]   has   S = AᵀA = [5 4; 4 5].
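This construction is quick to verify numerically. A sketch (mine, not from the text): build √S from the eigendecomposition and confirm that multiplying by the permutation Q still gives AᵀA = S.

```python
import numpy as np

# Sketch: the symmetric square root sqrt(S) = V sqrt(Lambda) V^T,
# then a second choice A = Q sqrt(S) with orthonormal Q.
S = np.array([[5.0, 4.0],
              [4.0, 5.0]])

lam, V = np.linalg.eigh(S)               # S = V diag(lam) V^T
sqrtS = V @ np.diag(np.sqrt(lam)) @ V.T
assert np.allclose(sqrtS, [[2.0, 1.0], [1.0, 2.0]])  # the square root [2 1; 1 2]
assert np.allclose(sqrtS @ sqrtS, S)                 # sqrt(S)^2 = S

Q = np.array([[0.0, 1.0],                # the permutation Q of Example 1
              [1.0, 0.0]])
A = Q @ sqrtS
assert np.allclose(A, [[1.0, 2.0], [2.0, 1.0]])      # a different A
assert np.allclose(A.T @ A, S)                       # A^T A = S still holds
```

Note that A = [1 2; 2 1] is not positive definite (its eigenvalues are 3 and −1), yet AᵀA = S holds for every choice Q with QᵀQ = I.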

Positive Semidefinite Matrices

Positive semidefinite matrices include positive definite matrices, and more. Eigenvalues of S can be zero. Columns of A can be dependent. The energy xᵀSx can be zero, but not negative. This gives new equivalent conditions on a (possibly singular) matrix S = Sᵀ.

1′ All eigenvalues of S satisfy λ ≥ 0 (semidefinite allows zero eigenvalues).
2′ The energy is nonnegative for every x : xᵀSx ≥ 0 (zero energy is allowed).
3′ S has the form AᵀA (every A is allowed; its columns can be dependent).

Example 2 The first two matrices are singular and positive semidefinite, but not the third :

(d) S = [0 0; 0 1]   (e) S = [4 4; 4 4]   (f) S = [−4 4; 4 −4].

The eigenvalues are 1, 0 and 8, 0 and −8, 0. The energies xᵀSx are x₂² and 4(x₁ + x₂)² and −4(x₁ − x₂)². So the third matrix is actually negative semidefinite.

Singular Value Decomposition

Now we start with A, square or rectangular. Applications also start this way : the matrix comes from the model. The SVD splits any matrix into orthogonal U times diagonal Σ times orthogonal Vᵀ. Those orthogonal factors will give orthogonal bases for the four fundamental subspaces associated with A.

Let me describe the goal for any m by n matrix, and then how to achieve that goal.

Find orthonormal bases v₁, …, vₙ for Rⁿ and u₁, …, uₘ for Rᵐ so that
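A numerical check of Example 2 (my sketch; the small tolerance guards against floating-point zeros coming out as ±1e-16):

```python
import numpy as np

# Sketch: eigenvalue signs for the three matrices of Example 2.
Sd = np.array([[0.0, 0.0], [0.0, 1.0]])
Se = np.array([[4.0, 4.0], [4.0, 4.0]])
Sf = np.array([[-4.0, 4.0], [4.0, -4.0]])

tol = 1e-12
# (d) and (e): all eigenvalues >= 0, so positive semidefinite.
assert np.all(np.linalg.eigvalsh(Sd) >= -tol)
assert np.all(np.linalg.eigvalsh(Se) >= -tol)
# (f): all eigenvalues <= 0, so negative semidefinite.
assert np.all(np.linalg.eigvalsh(Sf) <= tol)

# Zero energy is allowed: x = (1, -1) gives x^T S x = 0 for (e).
x = np.array([1.0, -1.0])
assert np.isclose(x @ Se @ x, 0.0)
```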

Av₁ = σ₁u₁  …  Avᵣ = σᵣuᵣ    Avᵣ₊₁ = 0  …  Avₙ = 0   (2)

The rank of A is r. Those requirements in (2) are expressed by a multiplication AV = UΣ. The r nonzero singular values σ₁ ≥ σ₂ ≥ … ≥ σᵣ > 0 are on the diagonal of Σ :

AV = UΣ   A[v₁ … vᵣ … vₙ] = [u₁ … uᵣ … uₘ] Σ, where Σ has σ₁, …, σᵣ on its diagonal and zeros elsewhere.   (3)

The last n − r vectors in V are a basis for the nullspace of A. The last m − r vectors in U are a basis for the nullspace of Aᵀ. The diagonal matrix Σ is m by n, with r nonzeros. Remember that V⁻¹ = Vᵀ, because the columns v₁, …, vₙ are orthonormal in Rⁿ :

Singular Value Decomposition   AV = UΣ  becomes  A = UΣVᵀ.   (4)
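In code, `np.linalg.svd` produces these factors directly. A sketch (my choice of matrix, the same 2 by 3 matrix used in Example 3 below): with `full_matrices=True` (the default), U is m by m and Vᵀ is n by n, so the last n − r columns of V span the nullspace of A and the last m − r columns of U span the nullspace of Aᵀ.

```python
import numpy as np

# Sketch: recover A = U Sigma V^T and the nullspace bases from the SVD.
A = np.array([[2.0, 2.0, 0.0],
              [-1.0, 1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)            # s holds the singular values

r = int(np.sum(s > 1e-12))             # numerical rank (here r = 2)
Sigma = np.zeros(A.shape)              # Sigma is m by n, like A
Sigma[:r, :r] = np.diag(s[:r])

assert np.allclose(U @ Sigma @ Vt, A)        # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(2))       # orthogonal U
assert np.allclose(Vt @ Vt.T, np.eye(3))     # orthogonal V
assert np.allclose(A @ Vt.T[:, r:], 0.0)     # last v's span the nullspace of A
```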

The SVD has orthogonal matrices U and V, containing eigenvectors of AAᵀ and AᵀA.

Comment. A is diagonalized by its eigenvectors : Axᵢ = λᵢxᵢ is like Avᵢ = σᵢuᵢ. But even if A has n eigenvectors, they may not be orthogonal. We need two bases : an input basis of v's in Rⁿ and an output basis of u's in Rᵐ. With two bases, any m by n matrix can be diagonalized. The beauty of those bases is that they can be chosen orthonormal. Then UᵀU = I and VᵀV = I.

The v's are eigenvectors of the symmetric matrix S = AᵀA. We can guarantee their orthogonality, so that vⱼᵀvᵢ = 0 for j ≠ i. That matrix S is positive semidefinite, so its eigenvalues are σᵢ² ≥ 0. The key to the SVD is that Avⱼ is orthogonal to Avᵢ :

Orthogonal u's   (Avⱼ)ᵀ(Avᵢ) = vⱼᵀ(AᵀAvᵢ) = vⱼᵀ(σᵢ²vᵢ) = σᵢ² if j = i, and 0 if j ≠ i.   (5)

This says that the vectors uᵢ = Avᵢ/σᵢ are orthonormal for i = 1, …, r. They are a basis for the column space of A. And the u's are eigenvectors of the symmetric matrix AAᵀ, which is usually different from S = AᵀA (but the eigenvalues σ₁², …, σᵣ² are the same).

Example 3 Find the input and output eigenvectors v and u for the rectangular matrix A :

A = [2 2 0; −1 1 0] = UΣVᵀ.

Solution Compute S = AᵀA and its unit eigenvectors v₁, v₂, v₃. The eigenvalues σ² are 8, 2, 0 so the positive singular values are σ₁ = √8 and σ₂ = √2 :

AᵀA = [5 3 0; 3 5 0; 0 0 0]  has  v₁ = (1/√2, 1/√2, 0),  v₂ = (−1/√2, 1/√2, 0),  v₃ = (0, 0, 1).

The outputs u₁ = Av₁/σ₁ and u₂ = Av₂/σ₂ are also orthonormal, with σ₁ = √8 and σ₂ = √2. Those vectors u₁ and u₂ are in the column space of A :

u₁ = [2 2 0; −1 1 0] v₁ / √8 = (1, 0)   and   u₂ = [2 2 0; −1 1 0] v₂ / √2 = (0, 1).

Then U = I and the Singular Value Decomposition for this 2 by 3 matrix is UΣVᵀ :

A = [2 2 0; −1 1 0] = [1 0; 0 1] [√8 0 0; 0 √2 0] [1/√2 1/√2 0; −1/√2 1/√2 0; 0 0 1].

The Fundamental Theorem of Linear Algebra

I think of the SVD as the final step in the Fundamental Theorem. First come the dimensions of the four subspaces in Figure 7.3. Then comes the orthogonality of those pairs of subspaces. Now come the orthonormal bases of v's and u's that diagonalize A :

SVD   Avⱼ = σⱼuⱼ for j ≤ r    Aᵀuⱼ = σⱼvⱼ for j ≤ r
      Avⱼ = 0 for j > r       Aᵀuⱼ = 0 for j > r

Multiplying Avⱼ = σⱼuⱼ by Aᵀ and dividing by σⱼ gives that equation Aᵀuⱼ = σⱼvⱼ.

[Figure 7.3 shows the row space and column space of A (each of dimension r) and the nullspaces of A and Aᵀ (dimensions n − r and m − r). The v's in Rⁿ map to the u's in Rᵐ : Av₁ = σ₁u₁, …, Avᵣ = σᵣuᵣ, and Avᵣ₊₁ = … = Avₙ = 0.]

Figure 7.3: Orthonormal bases of v’s and u’s that diagonalize A : m by n with rank r.

The "norm" of A is its largest singular value : ‖A‖ = σ₁. This measures the largest possible ratio of ‖Av‖ to ‖v‖. That ratio of lengths is a maximum when v = v₁ and Av = σ₁u₁. This singular value σ₁ is a much better measure for the size of a matrix than the largest eigenvalue. An extreme case can have zero eigenvalues and just one eigenvector (1, 1) for A. But AᵀA can still be large : if v = (1, −1) then Av is 200 times larger.

A = [100 −100; 100 −100]  has  λ_max = 0.  But  σ_max = norm of A = 200.   (6)
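This extreme case is easy to reproduce (a sketch, using the matrix of equation (6); the loose eigenvalue tolerance is needed because eigenvalues of a defective matrix are computed with limited accuracy):

```python
import numpy as np

# Sketch: the 2-norm of A is sigma_1, not the largest eigenvalue.
A = np.array([[100.0, -100.0],
              [100.0, -100.0]])

# Both eigenvalues are (numerically) zero: A is nilpotent, A @ A = 0.
eigs = np.linalg.eigvals(A)
assert np.all(np.abs(eigs) < 1e-5)

# But the largest singular value, the norm of A, is 200.
sigma_max = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(sigma_max, 200.0)
assert np.isclose(np.linalg.norm(A, 2), 200.0)   # same number from norm()

# The maximizing direction v = (1, -1): ||Av|| is 200 times ||v||.
v = np.array([1.0, -1.0])
ratio = np.linalg.norm(A @ v) / np.linalg.norm(v)
assert np.isclose(ratio, 200.0)
```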

The Condition Number

A valuable property of A = UΣVᵀ is that it puts the pieces of A in order of importance. Multiplying a column uᵢ times a row σᵢvᵢᵀ produces one piece of the matrix. There will be r nonzero pieces from r nonzero σ's, when A has rank r. The pieces add up to A, when we multiply columns of U times rows of ΣVᵀ :

The pieces have rank 1   A = [u₁ … uᵣ][σ₁v₁ᵀ; … ; σᵣvᵣᵀ] = u₁(σ₁v₁ᵀ) + … + uᵣ(σᵣvᵣᵀ).   (7)

The first piece gives the norm of A, which is σ₁. The last piece gives the norm of A⁻¹, which is 1/σₙ when A is invertible. The condition number is σ₁ times 1/σₙ :

Condition number of A   c(A) = ‖A‖ ‖A⁻¹‖ = σ₁/σₙ.   (8)

This number c(A) is the key to numerical stability in solving Av = b. When A is an orthogonal matrix, the symmetric S = AᵀA is the identity matrix. So all singular values of an orthogonal matrix are σ = 1. At the other extreme, a singular matrix has σₙ = 0. In that case c = ∞. Orthogonal matrices have the best condition number c = 1.

Data Matrices : Application of the SVD

"Big data" is the linear algebra problem of this century (and we won't solve it here). Sensors and scanners and imaging devices produce enormous volumes of information. Making decisive sense of that data is the problem for a world of analysts (mathematicians and statisticians of a new type). Most often the data comes in the form of a matrix.

The usual approach is by PCA : Principal Component Analysis. That is essentially the SVD. The first piece σ₁u₁v₁ᵀ holds the most information (in statistics this piece has the greatest variance). It tells us the most. The Chapter 7 Notes include references.
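The condition number and the rank-one pieces of equation (7) can both be checked in a few lines. A sketch (my example, a random 4 by 4 matrix): the pieces σᵢuᵢvᵢᵀ add back up to A, and keeping only the first piece is the best rank-one approximation in the 2-norm, which is the first step of PCA.

```python
import numpy as np

# Sketch: condition number sigma_1 / sigma_n, and the rank-one pieces.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
c = s[0] / s[-1]
assert np.isclose(c, np.linalg.cond(A, 2))   # numpy computes the same ratio

# The sum of the rank-one pieces sigma_i * u_i * v_i^T reconstructs A.
pieces = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(4)]
assert np.allclose(sum(pieces), A)

# The first piece is the best rank-one approximation:
# the error in the 2-norm is exactly sigma_2.
A1 = pieces[0]
assert np.isclose(np.linalg.norm(A - A1, 2), s[1])
```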

REVIEW OF THE KEY IDEAS

1. Positive definite symmetric matrices have positive eigenvalues and pivots and energy.

2. S = AᵀA is positive definite if and only if A has independent columns.

3. xᵀAᵀAx = (Ax)ᵀ(Ax) is zero when Ax = 0. AᵀA can be positive semidefinite.

4. The SVD is a factorization A = UΣVᵀ = (orthogonal)(diagonal)(orthogonal).

5. The columns of V and U are eigenvectors of AᵀA and AAᵀ (singular vectors of A).

6. Those orthonormal bases achieve Avᵢ = σᵢuᵢ and A is diagonalized.

7. The largest piece of A = σ₁u₁v₁ᵀ + … + σᵣuᵣvᵣᵀ gives the norm ‖A‖ = σ₁.

Problem Set 7.2

1 For a 2 by 2 matrix, suppose the 1 by 1 and 2 by 2 determinants a and ac − b² are positive. Then c > b²/a is also positive.

(i) λ₁ and λ₂ have the same sign because their product λ₁λ₂ equals ____.

(ii) That sign is positive because λ₁ + λ₂ equals ____.

Conclusion : The tests a > 0, ac − b² > 0 guarantee positive eigenvalues λ₁, λ₂.

2 Which of S₁, S₂, S₃, S₄ has two positive eigenvalues? Use a and ac − b², don't compute the λ's. Find an x with xᵀS₁x < 0, confirming that S₁ fails the test.

S₁ = [5 6; 6 7]   S₂ = [−1 −2; −2 −5]   S₃ = [1 10; 10 100]   S₄ = [1 10; 10 101].

3 For which numbers b and c are these matrices positive definite ?

S = [1 b; b 9]   S = [2 4; 4 c]   S = [c b; b c].

4 What is the energy q = ax² + 2bxy + cy² = xᵀSx for each of these matrices? Complete the square to write q as a sum of squares d₁( )² + d₂( )².

S = [1 2; 2 9]   and   S = [1 3; 3 9].

5 xᵀSx = 2x₁x₂ certainly has a saddle point and not a minimum at (0, 0). What symmetric matrix S produces this energy? What are its eigenvalues?

6 Test to see if AᵀA is positive definite in each case :

A = [1 2; 0 3]   and   A = [1 1; 1 2; 2 1]   and   A = [1 1 2; 1 2 1].

7 Which 3 by 3 symmetric matrices S and T produce these quadratic energies?

xᵀSx = 2(x₁² + x₂² + x₃²) − 2x₁x₂ − 2x₂x₃. Why is S positive definite?

xᵀTx = 2(x₁² + x₂² + x₃²) − 2x₁x₂ − 2x₁x₃ − 2x₂x₃. Why is T only semidefinite?

8 Compute the three upper left determinants of S to establish positive definiteness. (The first is 2.) Verify that their ratios give the second and third pivots.

Pivots = ratios of determinants   S = [2 2 0; 2 5 3; 0 3 8].

9 For what numbers c and d are S and T positive definite? Test the 3 determinants:

S = [c 1 1; 1 c 1; 1 1 c]   and   T = [1 2 3; 2 d 4; 3 4 5].

10 If S is positive definite then S⁻¹ is positive definite. Best proof : The eigenvalues of S⁻¹ are positive because ____. Second proof (only for 2 by 2) :

The entries of S⁻¹ = (1/(ac − b²)) [c −b; −b a] pass the determinant tests ____.

11 If S and T are positive definite, their sum S + T is positive definite. Pivots and eigenvalues are not convenient for S + T. Better to prove xᵀ(S + T)x > 0.

12 A positive definite matrix cannot have a zero (or even worse, a negative number) on its diagonal. Show that this matrix fails to have xᵀSx > 0 :

[x₁ x₂ x₃][4 1 1; 1 0 2; 1 2 5][x₁; x₂; x₃] is not positive when (x₁, x₂, x₃) = ( , , ).

13 A diagonal entry aⱼⱼ of a symmetric matrix cannot be smaller than all the λ's. If it were, then A − aⱼⱼI would have ____ eigenvalues and would be positive definite. But A − aⱼⱼI has a ____ on the main diagonal.

14 Show that if all λ > 0 then xᵀSx > 0. We must do this for every nonzero x, not just the eigenvectors. So write x as a combination of the eigenvectors and explain why all "cross terms" are xᵢᵀxⱼ = 0. Then xᵀSx is

(c₁x₁ + … + cₙxₙ)ᵀ(c₁λ₁x₁ + … + cₙλₙxₙ) = c₁²λ₁x₁ᵀx₁ + … + cₙ²λₙxₙᵀxₙ > 0.

15 Give a quick reason why each of these statements is true :

(a) Every positive definite matrix is invertible.

(b) The only positive definite projection matrix is P = I.

(c) A diagonal matrix with positive diagonal entries is positive definite.

(d) A symmetric matrix with a positive determinant might not be positive definite !

16 With positive pivots in D, the factorization S = LDLᵀ becomes (L√D)(√D Lᵀ). (Square roots of the pivots give D = √D√D.) Then A = √D Lᵀ yields the Cholesky factorization S = AᵀA, which is "symmetrized LU" :

From A = [3 1; 0 2] find S.   From S = [4 8; 8 25] find A = chol(S).
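A sketch of how Problem 16 looks in NumPy (mine, not from the book): `np.linalg.cholesky` returns the lower-triangular factor L with S = LLᵀ, so A = Lᵀ plays the role of MATLAB's upper-triangular `chol(S)`.

```python
import numpy as np

# From A find S = A^T A:
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A
assert np.allclose(S, [[9.0, 3.0], [3.0, 5.0]])

# From S recover an upper-triangular A with S = A^T A:
S2 = np.array([[4.0, 8.0],
               [8.0, 25.0]])
L = np.linalg.cholesky(S2)       # lower triangular, S2 = L @ L.T
A2 = L.T                         # upper triangular, like chol(S2) in MATLAB
assert np.allclose(A2, [[2.0, 4.0], [0.0, 3.0]])
assert np.allclose(A2.T @ A2, S2)
```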

17 Without multiplying S = [cos θ −sin θ; sin θ cos θ][2 0; 0 5][cos θ sin θ; −sin θ cos θ], find

(a) the determinant of S   (b) the eigenvalues of S   (c) the eigenvectors of S   (d) a reason why S is symmetric positive definite.

18 For F₁(x, y) = ¼x⁴ + x²y + y² and F₂(x, y) = x³ + xy − x, find the second derivative matrices H₁ and H₂ :

Test for minimum   H = [∂²F/∂x² ∂²F/∂x∂y; ∂²F/∂y∂x ∂²F/∂y²] is positive definite

H₁ is positive definite so F₁ is concave up (= convex). Find the minimum point of F₁ and the saddle point of F₂ (look only where first derivatives are zero).

19 The graph of z = x² + y² is a bowl opening upward. The graph of z = x² − y² is a saddle. The graph of z = −x² − y² is a bowl opening downward. What is a test on a, b, c for z = ax² + 2bxy + cy² to have a saddle point at (0, 0) ?

20 Which values of c give a bowl and which c give a saddle point for the graph of z = 4x² + 12xy + cy² ? Describe this graph at the borderline value of c.

21 When S and T are symmetric positive definite, ST might not even be symmetric. But its eigenvalues are still positive. Start from STx = λx and take dot products with Tx. Then prove λ > 0.

22 Suppose C is positive definite (so yᵀCy > 0 whenever y ≠ 0) and A has independent columns (so Ax ≠ 0 whenever x ≠ 0). Apply the energy test to xᵀAᵀCAx to show that AᵀCA is positive definite : the crucial matrix in engineering.

23 Find the eigenvalues and unit eigenvectors v₁, v₂ of AᵀA. Then find u₁ = Av₁/σ₁ :

A = [1 2; 3 6]   and   AᵀA = [10 20; 20 40]   and   AAᵀ = [5 15; 15 45].

Verify that u₁ is a unit eigenvector of AAᵀ. Complete the matrices U, Σ, V.

SVD   [1 2; 3 6] = [u₁ u₂][σ₁ 0; 0 0][v₁ v₂]ᵀ.

24 Write down orthonormal bases for the four fundamental subspaces of this A.

25 (a) Why is the trace of AᵀA equal to the sum of all aᵢⱼ² ?

(b) For every rank-one matrix, why is σ₁² = sum of all aᵢⱼ² ?

26 Find the eigenvalues and unit eigenvectors of AᵀA and AAᵀ. Keep each Av = σu :

Fibonacci matrix   A = [1 1; 1 0]

Construct the singular value decomposition and verify that A equals UΣVᵀ.

27 Compute AᵀA and AAᵀ and their eigenvalues and unit eigenvectors for V and U :

Rectangular matrix   A = [1 1 0; 0 1 1].

Check AV = UΣ (this will decide ± signs in U). Σ has the same shape as A.

28 Construct the matrix with rank one that has Av = 12u for v = ½(1, 1, 1, 1) and u = ⅓(2, 2, 1). Its only singular value is σ₁ = ____.

29 Suppose A is invertible (with σ₁ > σ₂ > 0). Change A by as small a matrix as possible to produce a singular matrix A₀. Hint : U and V do not change.

From A = [u₁ u₂][σ₁ 0; 0 σ₂][v₁ v₂]ᵀ find the nearest A₀.

30 The SVD for A + I doesn't use Σ + I. Why is σ(A + I) not just σ(A) + 1 ?

31 Multiply AᵀAv = σ²v by A. Put in parentheses to show that Av is an eigenvector of AAᵀ. We divide by its length ‖Av‖ = σ to get the unit eigenvector u.

32 My favorite example of the SVD is when Av(x) = dv/dx, with the endpoint conditions v(0) = 0 and v(1) = 0. We are looking for orthogonal functions v(x) so that their derivatives Av = dv/dx are also orthogonal. The perfect choice is v₁ = sin πx and v₂ = sin 2πx and vₖ = sin kπx. Then each uₖ is a cosine. The derivative of v₁ is Av₁ = π cos πx = πu₁. The singular values are σ₁ = π and σₖ = kπ. Orthogonality of the sines (and orthogonality of the cosines) is the foundation for Fourier series.

You may object to AV = UΣ. The derivative A = d/dx is not a matrix ! The orthogonal factor V has functions sin kπx in its columns, not vectors. The matrix U has cosine functions cos kπx. Since when is this allowed? One answer is to refer you to the chebfun package on the web. This extends linear algebra to matrices whose columns are functions, not vectors.

Another answer is to replace d/dx by a first difference matrix A. Its shape will be N + 1 by N. A has 1's down the diagonal and −1's on the diagonal below. Then AV = UΣ has discrete sines in V and discrete cosines in U. For N = 2 those will be sines and cosines of 30° and 60° in v₁ and u₁.

Can you construct the difference matrix A (3 by 2) and AᵀA (2 by 2)? The discrete sines are v₁ = (√3/2, √3/2) and v₂ = (√3/2, −√3/2). Test that Av₁ is orthogonal to Av₂. What are the singular values σ₁ and σ₂ in Σ ?
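For the last part of Problem 32, a numerical sketch (my construction of one possible 3 by 2 difference matrix, with 1's on the diagonal and −1's below):

```python
import numpy as np

# Sketch: a 3-by-2 first-difference matrix and the discrete sines.
A = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, -1.0]])
AtA = A.T @ A
assert np.allclose(AtA, [[2.0, -1.0], [-1.0, 2.0]])

s3 = np.sqrt(3) / 2
v1 = np.array([s3, s3])          # discrete sine, frequency 1
v2 = np.array([s3, -s3])         # discrete sine, frequency 2
assert np.isclose((A @ v1) @ (A @ v2), 0.0)   # Av1 is orthogonal to Av2

# Singular values of A (descending order): sqrt(3) and 1.
sigma = np.linalg.svd(A, compute_uv=False)
assert np.allclose(sigma, [np.sqrt(3.0), 1.0])
```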