MATH 544 — HOMEWORK V

JAKOB STREIPEL

Date: April 2, 2018.

Problem 4.1.5. Let $A \in \mathbb{R}^{n \times m}$ be nonzero with rank $r$. Show that all of the equations
\[
Av_i = \begin{cases} \sigma_i u_i & i = 1, 2, \ldots, r \\ 0 & i = r+1, r+2, \ldots, m \end{cases}
\qquad\text{and}\qquad
A^T u_i = \begin{cases} \sigma_i v_i & i = 1, 2, \ldots, r \\ 0 & i = r+1, r+2, \ldots, n \end{cases}
\]
follow from the singular value decomposition $A = U\Sigma V^T$ (and its transpose), where $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_m$ are the columns of $U$ and $V$, respectively.

Solution. By construction of $A = U\Sigma V^T$, where $V$ is an orthogonal matrix, meaning $V^T V = I$, we have $AV = U\Sigma$. Again by construction $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{n \times m}$ and $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$. Hence writing $U$ in terms of its columns $U = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}$ we have
\[
U\Sigma = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0)
= \begin{pmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_r u_r & 0 & \cdots & 0 \end{pmatrix},
\]
an $n \times m$ matrix whose last $m - r$ columns are zero. Writing the left-hand side $AV$ in terms of the columns of $V$ we similarly have
\[
AV = \begin{pmatrix} Av_1 & Av_2 & \cdots & Av_m \end{pmatrix}.
\]
Since these are equal, comparing the columns we have precisely
\[
Av_i = \begin{cases} \sigma_i u_i & i = 1, 2, \ldots, r \\ 0 & i = r+1, r+2, \ldots, m, \end{cases}
\]
as claimed.
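As a quick numerical sanity check of these column relations, here is a minimal MATLAB sketch; the example matrix is an arbitrary rank-deficient one, not taken from the problem.

    A = [1 2 0; 2 4 0; 0 0 0; 1 2 0];   % hypothetical 4x3 example with rank 1
    [U, S, V] = svd(A);                  % full SVD, A = U*S*V'
    r = rank(A);
    for i = 1:size(A, 2)
        if i <= r
            disp(norm(A*V(:,i) - S(i,i)*U(:,i)))   % A v_i = sigma_i u_i, so ~0
        else
            disp(norm(A*V(:,i)))                   % A v_i = 0, so ~0
        end
    end
    for i = 1:size(A, 1)
        if i <= r
            disp(norm(A'*U(:,i) - S(i,i)*V(:,i)))  % A' u_i = sigma_i v_i, so ~0
        else
            disp(norm(A'*U(:,i)))                  % A' u_i = 0, so ~0
        end
    end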

By transposing $A = U\Sigma V^T$ we get
\[
A^T = (U\Sigma V^T)^T = (V^T)^T \Sigma^T U^T = V \Sigma^T U^T,
\]
whereby, since $U^T U = I$ because $U$ is orthogonal, we get $A^T U = V\Sigma^T$. By essentially the same argument as above applied to this new equation we get
\[
A^T U = \begin{pmatrix} A^T u_1 & A^T u_2 & \cdots & A^T u_n \end{pmatrix}
= \begin{pmatrix} \sigma_1 v_1 & \sigma_2 v_2 & \cdots & \sigma_r v_r & 0 & \cdots & 0 \end{pmatrix} = V\Sigma^T,
\]
which by comparing columns tells us
\[
A^T u_i = \begin{cases} \sigma_i v_i & i = 1, 2, \ldots, r \\ 0 & i = r+1, r+2, \ldots, n, \end{cases}
\]
as desired. □

Problem 4.2.3. Recall that the Frobenius norm is defined by
\[
\|A\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 \right)^{1/2}.
\]
Show that $\|A\|_F = (\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_r^2)^{1/2}$.

Solution. Note that as described, $A \in \mathbb{R}^{n \times m}$. We start by showing that the Frobenius norm is equal to
\[
\|A\|_F = \sqrt{\operatorname{tr}(A^T A)},
\]
where by $\operatorname{tr} B$ we mean the trace of a matrix $B \in \mathbb{R}^{m \times m}$, i.e. the sum of its diagonal elements,
\[
\operatorname{tr} B = \sum_{i=1}^{m} B_{ii}.
\]
Since the diagonal elements of $A^T A$ are
\[
(A^T A)_{ii} = \sum_{j=1}^{n} a_{ji} a_{ji} = \sum_{j=1}^{n} a_{ji}^2 = \sum_{j=1}^{n} |a_{ji}|^2,
\]
we have
\[
\operatorname{tr}(A^T A) = \sum_{i=1}^{m} (A^T A)_{ii} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ji}|^2.
\]
Switching the order of the sums we get
\[
\operatorname{tr}(A^T A) = \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ji}|^2,
\]
and switching the names of the dummy variables $i$ and $j$ we get
\[
\operatorname{tr}(A^T A) = \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 = \|A\|_F^2
\]
by definition. Since these are sums of nonnegative numbers, we can take square roots of both sides.

With this expression of the Frobenius norm in hand, suppose $B = UC$ where $U$ is an orthogonal matrix. Then
\[
\|B\|_F = \|UC\|_F = \sqrt{\operatorname{tr}((UC)^T (UC))} = \sqrt{\operatorname{tr}(C^T U^T U C)} = \sqrt{\operatorname{tr}(C^T C)} = \|C\|_F,
\]
since $U$ being orthogonal means $U^T U = I$. Similarly multiplication by an orthogonal matrix from the right does not change the Frobenius norm, since
\[
\|CU\|_F = \sqrt{\operatorname{tr}((CU)^T (CU))} = \sqrt{\operatorname{tr}((CU)(CU)^T)} = \sqrt{\operatorname{tr}(CUU^T C^T)} = \sqrt{\operatorname{tr}(CC^T)} = \sqrt{\operatorname{tr}(C^T C)} = \|C\|_F,
\]
where we have twice invoked the fact that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for any two (compatible) matrices, which follows by the same reversal and relabelling of the sums defining the trace as we used above.

Hence since $A = U\Sigma V^T$ where $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0)$ and $U$ and $V$ are orthogonal, we have
\[
\|A\|_F = \|U\Sigma V^T\|_F = \|U\Sigma\|_F = \|\Sigma\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_r^2}. \qquad \square
\]
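A small MATLAB sketch can illustrate both the identity and the orthogonal invariance used above; the matrices below are arbitrary random ones, not part of the problem.

    A = randn(5, 3);                          % an arbitrary test matrix
    s = svd(A);
    disp(norm(A, 'fro') - sqrt(sum(s.^2)))    % ~0: ||A||_F = sqrt(sum of sigma_i^2)
    [Q, ~] = qr(randn(5));                    % a random 5x5 orthogonal matrix
    disp(norm(Q*A, 'fro') - norm(A, 'fro'))   % ~0: orthogonal factors preserve ||.||_F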

Problem 4.2.8. MATLAB's command cond computes the condition number $\kappa_2(A)$. This works for both square and nonsquare matrices. Generate a random $3 \times 3$ matrix (A = randn(3)) and use MATLAB to compute $\kappa_2(A)$ three different ways: (i) using cond, (ii) taking the ratio of largest to smallest singular value, and (iii) computing $\|A\|_2 \|A^{-1}\|_2$ (norm(A) * norm(inv(A))).

Solution. Running A = randn(3) we acquire the matrix
\[
A = \begin{pmatrix} 0.2323 & -0.2365 & 2.2294 \\ 0.4264 & 2.0237 & 0.3376 \\ -0.3728 & -2.2584 & 1.0001 \end{pmatrix}
\]
(rounded to four decimals, at any rate), on which we will perform the subsequent computations.

(i) Executing cond(A) returns $\kappa_2(A) = 76.530628562477673$. The reason we print the above to so many decimal places will be apparent in part (iii).

(ii) The MATLAB command s = svd(A) returns a list of the singular values of $A$ in descending order. Hence s(1) / s(end) computes

\[
\kappa_2 = \sigma_1 / \sigma_3 = 76.530628562477673,
\]
which is precisely what we got above. This is not very surprising, since help cond specifies that cond computes $\kappa_2$ precisely as the ratio of the largest and smallest singular values as given by svd.

(iii) Finally norm(A) * norm(inv(A)) returns
\[
\kappa_2 = \|A\|_2 \|A^{-1}\|_2 = 76.530628562477744,
\]
which differs from the previous two in the thirteenth decimal. Note for the record that on our computer, as measured by tic and toc, the former two are also slightly faster than the last option on the matrix at hand. □
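For reference, the three computations can be collected into one short script; since randn is involved, the actual digits will of course differ from run to run.

    A = randn(3);                      % a fresh random matrix
    k1 = cond(A);                      % (i)   built-in condition number
    s  = svd(A);
    k2 = s(1) / s(end);                % (ii)  ratio of extreme singular values
    k3 = norm(A) * norm(inv(A));       % (iii) ||A||_2 * ||A^{-1}||_2
    format long
    disp([k1; k2; k3])                 % (i) and (ii) agree exactly; (iii) may
                                       % differ in the last few digits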

Theorem 4.2.11. Let $A \in \mathbb{R}^{n \times m}$, $n \geq m$, $\operatorname{rank}(A) = m$, with singular values $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_m > 0$. Then
\[
\|(A^T A)^{-1}\|_2 = \sigma_m^{-2}, \qquad \|(A^T A)^{-1} A^T\|_2 = \sigma_m^{-1}, \qquad \|A (A^T A)^{-1}\|_2 = \sigma_m^{-1}, \qquad \text{and} \qquad \|A (A^T A)^{-1} A^T\|_2 = 1.
\]

Problem 4.2.12. Let $A$ be as in Theorem 4.2.11 with singular value decomposition $A = U\Sigma V^T$.

(a) Determine the singular value decompositions of the matrices $(A^T A)^{-1}$, $(A^T A)^{-1} A^T$, $A (A^T A)^{-1}$, and $A (A^T A)^{-1} A^T$ in terms of the singular value decomposition of $A$. Use the orthogonality of $U$ and $V$ whenever possible. Pay attention to the dimensions of the various matrices.

(b) Use the results of part (a) to prove Theorem 4.2.11.

Solution. Note that by assumption $A$ has full rank, whereby $A^T A$ is indeed invertible, making the expressions we are after well-defined.

(a) Let us first of all understand $A^T A$. Since $A = U\Sigma V^T$, with $U$ and $V$ by definition orthogonal, we have
\[
A^T = (U\Sigma V^T)^T = (V^T)^T \Sigma^T U^T = V \Sigma^T U^T
\]
as the singular value decomposition of $A^T$, with $\Sigma^T$ looking very much like $\Sigma$, except that the extra block of zeros is to the right of the diagonal block rather than below it. This importantly means that
\[
\Sigma^T \Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2) \in \mathbb{R}^{m \times m},
\]
which means that
\[
A^T A = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T
\]
is the singular value decomposition of $A^T A$. Now since $\Sigma^T \Sigma$ is a diagonal square matrix with nonzero diagonal elements, it is invertible. In particular
\[
(\Sigma^T \Sigma)^{-1} = \operatorname{diag}(\sigma_1^{-2}, \sigma_2^{-2}, \ldots, \sigma_m^{-2}) \in \mathbb{R}^{m \times m}.
\]
Hence
\[
(A^T A)^{-1} = (V \Sigma^T \Sigma V^T)^{-1} = (V^T)^{-1} (\Sigma^T \Sigma)^{-1} V^{-1} = V (\Sigma^T \Sigma)^{-1} V^T.
\]
Now this is not quite the singular value decomposition of $(A^T A)^{-1}$, since the singular values are currently in ascending order, as opposed to descending. We therefore let $P$ be the $m \times m$ matrix with $1$s on the anti-diagonal and $0$s elsewhere (which is clearly orthogonal; $P^T = P$ and $P^T P = P^2 = I$) and write
\[
(A^T A)^{-1} = V P P (\Sigma^T \Sigma)^{-1} P P V^T = (VP)\bigl(P (\Sigma^T \Sigma)^{-1} P\bigr)(VP)^T,
\]
where
\[
P (\Sigma^T \Sigma)^{-1} P = \operatorname{diag}(\sigma_m^{-2}, \sigma_{m-1}^{-2}, \ldots, \sigma_1^{-2})
\]
and $VP$ is orthogonal since both $V$ and $P$ are orthogonal. This, therefore, is the desired singular value decomposition, since the singular values are now in descending order.

Multiplying this by $A^T$ from the right we get
\[
(A^T A)^{-1} A^T = \bigl(V (\Sigma^T \Sigma)^{-1} V^T\bigr)\bigl(V \Sigma^T U^T\bigr) = V (\Sigma^T \Sigma)^{-1} \Sigma^T U^T,
\]
where
\[
(\Sigma^T \Sigma)^{-1} \Sigma^T = \operatorname{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_m^{-1}) \in \mathbb{R}^{m \times n}.
\]

Hence we again need to adjust the order of the diagonal elements. Abusing notation slightly, we also write $P$ for the orthogonal permutation matrix of the appropriate size that reverses the order of the first $m$ indices and fixes any remaining ones (for square $m \times m$ factors this is exactly the anti-diagonal matrix above). Then
\[
(A^T A)^{-1} A^T = (VP)\bigl(P (\Sigma^T \Sigma)^{-1} \Sigma^T P\bigr)(UP)^T
\]
is the singular value decomposition of $(A^T A)^{-1} A^T$. Similarly if we multiply $(A^T A)^{-1}$ from the left by $A$ we get
\[
A (A^T A)^{-1} = (U\Sigma V^T)\bigl(V (\Sigma^T \Sigma)^{-1} V^T\bigr) = U \Sigma (\Sigma^T \Sigma)^{-1} V^T,
\]
where
\[
\Sigma (\Sigma^T \Sigma)^{-1} = \operatorname{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_m^{-1}) \in \mathbb{R}^{n \times m},
\]
and so once more we adjust the order of the singular values and get
\[
A (A^T A)^{-1} = (UP)\bigl(P \Sigma (\Sigma^T \Sigma)^{-1} P\bigr)(VP)^T
\]
as the singular value decomposition of $A (A^T A)^{-1}$. Finally for $A (A^T A)^{-1} A^T$ we have
\[
A (A^T A)^{-1} A^T = \bigl(U \Sigma (\Sigma^T \Sigma)^{-1} V^T\bigr)\bigl(V \Sigma^T U^T\bigr) = U \Sigma (\Sigma^T \Sigma)^{-1} \Sigma^T U^T,
\]
where

\[
\Sigma (\Sigma^T \Sigma)^{-1} \Sigma^T = \begin{pmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_m^{-1}) \\ 0 \end{pmatrix} \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_m) & 0 \end{pmatrix} = \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{n \times n}.
\]
Hence
\[
A (A^T A)^{-1} A^T = U \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} U^T
\]
is the singular value decomposition of $A (A^T A)^{-1} A^T$, with $m$ singular values equal to $1$ and the remaining $n - m$ equal to $0$, already in descending order.
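As a numerical sanity check of this last formula, the following MATLAB sketch (with an arbitrary random full-column-rank matrix) displays the singular values of $A (A^T A)^{-1} A^T$; the final line is an aside, not part of the solution, checking that this matrix is idempotent.

    A = randn(6, 3);                   % n = 6 >= m = 3; full column rank almost surely
    Proj = A * ((A'*A) \ A');          % the matrix A (A'A)^{-1} A'
    disp(svd(Proj)')                   % approximately [1 1 1 0 0 0]
    disp(norm(Proj*Proj - Proj))       % ~0: the matrix is idempotent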

(b) Since the singular values of $(A^T A)^{-1}$ are
\[
\sigma_m^{-2} \geq \sigma_{m-1}^{-2} \geq \ldots \geq \sigma_1^{-2},
\]

the spectral norm is $\|(A^T A)^{-1}\|_2 = \sigma_m^{-2}$. Similarly for $(A^T A)^{-1} A^T$ and $A (A^T A)^{-1}$ the singular values are
\[
\sigma_m^{-1} \geq \sigma_{m-1}^{-1} \geq \ldots \geq \sigma_1^{-1},
\]
so
\[
\|(A^T A)^{-1} A^T\|_2 = \|A (A^T A)^{-1}\|_2 = \sigma_m^{-1}.
\]
Finally the nonzero singular values of $A (A^T A)^{-1} A^T$ are all equal to $1$, whence
\[
\|A (A^T A)^{-1} A^T\|_2 = 1. \qquad \square
\]
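Theorem 4.2.11 is also easy to check numerically; the following sketch uses an arbitrary random full-rank matrix and should print four numbers that are essentially zero.

    A = randn(7, 4);                         % n = 7 >= m = 4, full rank almost surely
    s = svd(A);  sm = s(end);                % smallest singular value sigma_m
    M = A' * A;
    disp(norm(inv(M))       - sm^(-2))       % ~0: ||(A'A)^{-1}||_2      = sigma_m^{-2}
    disp(norm(M \ A')       - sm^(-1))       % ~0: ||(A'A)^{-1} A'||_2   = sigma_m^{-1}
    disp(norm(A / M)        - sm^(-1))       % ~0: ||A (A'A)^{-1}||_2    = sigma_m^{-1}
    disp(norm(A * (M \ A')) - 1)             % ~0: ||A (A'A)^{-1} A'||_2 = 1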

Problem 4.3.9. Work this exercise using pencil and paper (and exact arithmetic). Let
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{pmatrix} \qquad \text{and} \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

(a) Find the SVD of $A$.

(b) Calculate $A^\dagger$.

(c) Calculate the minimum norm solution of the least-squares problem for the overdetermined system $Ax = b$.

Solution. (a) Following the lead of the constructive proof of the singular value decomposition theorem, we start by computing the eigenvalues of the matrix
\[
A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{pmatrix} = \begin{pmatrix} 14 & 28 \\ 28 & 56 \end{pmatrix}.
\]
This has the characteristic equation

\[
0 = \det(tI - A^T A) = \begin{vmatrix} t - 14 & -28 \\ -28 & t - 56 \end{vmatrix} = (t - 14)(t - 56) - 28^2 = t^2 - 70t = t(t - 70),
\]
so the eigenvalues are $\lambda_1 = 70$ and $\lambda_2 = 0$, which gives us $\sigma_1 = \sqrt{70}$ as the only nonzero singular value of $A$, the other being $0$. In other words
\[
\Sigma = \begin{pmatrix} \sqrt{70} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
Note for the record that exactly one positive singular value is expected: the columns of $A$ are multiples of one another, so $\operatorname{rank} A = 1$.

We therefore want to find an orthonormal set of eigenvectors of $A^T A$, so consider
\[
A^T A \, v_1 = \begin{pmatrix} 14 & 28 \\ 28 & 56 \end{pmatrix} v_1 = 70 v_1 = \lambda_1 v_1.
\]
Since $14 \cdot 1 + 28 \cdot 2 = 70$ and $28 \cdot 1 + 56 \cdot 2 = 2 \cdot 70$, we see that the vector $(1, 2)^T$ is an eigenvector corresponding to $\lambda_1 = 70$. Normalising this vector we have
\[
v_1 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]

For $\lambda_2 = 0$ note that the second column of $A^T A$ is twice the first column, so $(-2, 1)^T$ satisfies the equation

\[
A^T A \, v_2 = \lambda_2 v_2 = 0,
\]
and if we normalise this we get
\[
v_2 = \frac{1}{\sqrt{5}} \begin{pmatrix} -2 \\ 1 \end{pmatrix}.
\]
This means that our matrix $V$ in the singular value decomposition is
\[
V = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}.
\]
This then means that we compute
\[
u_1 = \frac{A v_1}{\sqrt{\lambda_1}} = \frac{1}{\sqrt{70}} \begin{pmatrix} \sqrt{5} \\ 2\sqrt{5} \\ 3\sqrt{5} \end{pmatrix} = \frac{1}{\sqrt{14}} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\]
as the first column of $U$. For the remaining two columns $u_2$ and $u_3$ we can choose arbitrary vectors of norm $1$ such that $u_1$, $u_2$, and $u_3$ form an orthonormal set. Somewhat arbitrarily we take
\[
u_2 = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} \qquad \text{and} \qquad w_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
\]
where $u_2$ is chosen deliberately to be of unit length and orthogonal to $u_1$, so that we save ourselves a step in the Gram-Schmidt process below. We proceed to orthogonalise $w_3$:
\[
w_3 - \frac{\langle w_3, u_1 \rangle}{\langle u_1, u_1 \rangle} u_1 - \frac{\langle w_3, u_2 \rangle}{\langle u_2, u_2 \rangle} u_2
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{1}{14} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - \frac{2}{5} \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}
= \frac{1}{70} \begin{pmatrix} 9 \\ 18 \\ -15 \end{pmatrix},
\]
which we normalise to get
\[
u_3 = \frac{1}{3\sqrt{70}} \begin{pmatrix} 9 \\ 18 \\ -15 \end{pmatrix} = \frac{1}{\sqrt{70}} \begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix}.
\]
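A quick MATLAB sketch of the same Gram-Schmidt step, purely as a check on the arithmetic:

    u1 = [1; 2; 3] / sqrt(14);
    u2 = [2; -1; 0] / sqrt(5);
    w3 = [1; 0; 0];
    u3 = w3 - (w3'*u1)*u1 - (w3'*u2)*u2;     % orthogonalise w3 against u1 and u2
    u3 = u3 / norm(u3);
    disp(u3 - [3; 6; -5] / sqrt(70))         % ~0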

Arranging these vectors $u_1$, $u_2$, and $u_3$ as columns in the matrix $U$ we have
\[
U = \begin{pmatrix} 1/\sqrt{14} & 2/\sqrt{5} & 3/\sqrt{70} \\ 2/\sqrt{14} & -1/\sqrt{5} & 6/\sqrt{70} \\ 3/\sqrt{14} & 0 & -5/\sqrt{70} \end{pmatrix}.
\]
As a check, we compute
\[
U \Sigma V^T = \begin{pmatrix} 1/\sqrt{14} & 2/\sqrt{5} & 3/\sqrt{70} \\ 2/\sqrt{14} & -1/\sqrt{5} & 6/\sqrt{70} \\ 3/\sqrt{14} & 0 & -5/\sqrt{70} \end{pmatrix}
\begin{pmatrix} \sqrt{70} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\frac{1}{\sqrt{5}} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
= \begin{pmatrix} \sqrt{5} & 0 \\ 2\sqrt{5} & 0 \\ 3\sqrt{5} & 0 \end{pmatrix}
\frac{1}{\sqrt{5}} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{pmatrix} = A,
\]
verifying that this is indeed a singular value decomposition of $A$.

(b) Knowing the singular value decomposition $A = U\Sigma V^T$ of $A$, we can directly compute $A^\dagger = V \Sigma^\dagger U^T$, where
\[
\Sigma^\dagger = \begin{pmatrix} 1/\sqrt{70} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Hence
\[
A^\dagger = V \Sigma^\dagger U^T = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{70} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{14} & 2/\sqrt{14} & 3/\sqrt{14} \\ 2/\sqrt{5} & -1/\sqrt{5} & 0 \\ 3/\sqrt{70} & 6/\sqrt{70} & -5/\sqrt{70} \end{pmatrix}
= \frac{1}{\sqrt{5}} \begin{pmatrix} 1/\sqrt{70} & 0 & 0 \\ 2/\sqrt{70} & 0 & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{14} & 2/\sqrt{14} & 3/\sqrt{14} \\ 2/\sqrt{5} & -1/\sqrt{5} & 0 \\ 3/\sqrt{70} & 6/\sqrt{70} & -5/\sqrt{70} \end{pmatrix}
= \frac{1}{70} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}.
\]

(c) Knowing the pseudoinverse of $A$, the minimum norm solution to the least-squares problem $Ax = b$ is given by
\[
x = A^\dagger b = \frac{1}{70} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{70} \begin{pmatrix} 6 \\ 12 \end{pmatrix} = \frac{1}{35} \begin{pmatrix} 3 \\ 6 \end{pmatrix}. \qquad \square
\]
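The hand computation can also be checked against MATLAB; the largest singular value, the pseudoinverse, and the minimum norm solution are all unambiguous, so the differences printed below should be essentially zero.

    A = [1 2; 2 4; 3 6];  b = [1; 1; 1];
    s = svd(A);
    disp(s(1) - sqrt(70))                        % largest singular value is sqrt(70)
    disp(norm(pinv(A) - [1 2 3; 2 4 6] / 70))    % ~0: pseudoinverse matches part (b)
    disp(norm(pinv(A)*b - [3; 6] / 35))          % ~0: minimum norm solution matches (c)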

Problem 4.3.14. (a) Let $A \in \mathbb{R}^{n \times m}$ and $B = A^\dagger \in \mathbb{R}^{m \times n}$. Show that the following four relationships hold:
\[
BAB = B, \qquad ABA = A, \qquad (BA)^T = BA, \qquad (AB)^T = AB. \tag{4.3.15}
\]
(b) Conversely, show that if $A$ and $B$ satisfy (4.3.15), then $B = A^\dagger$. Thus the equations (4.3.15) characterise the pseudoinverse. In many books these equations are used to define the pseudoinverse.

Solution. Note for the record that we took the equations of (a) as the definition of the pseudoinverse in the lectures, as the end of the problem suggests is often the case. We therefore remark that in the book from which the problem hails the definition given is $A^\dagger = V \Sigma^\dagger U^T$, given a singular value decomposition $A = U\Sigma V^T$, where by $\Sigma^\dagger$ we mean
\[
\Sigma^\dagger = \operatorname{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{R}^{m \times n}
\]
given
\[
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{n \times m}
\]
and $\operatorname{rank} A = r$. These are the assumptions based on which we will prove the parts of this problem. Note therefore that
\[
\Sigma^\dagger \Sigma = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times m}
\qquad \text{and} \qquad
\Sigma \Sigma^\dagger = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{n \times n},
\]
where $I_r$ is the $r \times r$ identity matrix.

(a) We have $B = A^\dagger = V \Sigma^\dagger U^T$, so we compute
\[
BAB = (V \Sigma^\dagger U^T)(U \Sigma V^T)(V \Sigma^\dagger U^T) = V \Sigma^\dagger \Sigma \Sigma^\dagger U^T
\]
since $U$ and $V$ are orthogonal. Hence by the expression for $\Sigma^\dagger \Sigma$ above, this is
\[
BAB = V \Sigma^\dagger U^T = B.
\]
Similarly
\[
ABA = (U \Sigma V^T)(V \Sigma^\dagger U^T)(U \Sigma V^T) = U \Sigma \Sigma^\dagger \Sigma V^T = U \Sigma V^T = A.
\]
For the last two equations,
\[
(BA)^T = \bigl((V \Sigma^\dagger U^T)(U \Sigma V^T)\bigr)^T = V \Sigma^T U^T U (\Sigma^\dagger)^T V^T = V (\Sigma^\dagger \Sigma)^T V^T.
\]
Now importantly, since
\[
\Sigma^\dagger \Sigma = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times m},
\]
this matrix is symmetric, i.e. $(\Sigma^\dagger \Sigma)^T = \Sigma^\dagger \Sigma$, whence
\[
(BA)^T = V \Sigma^\dagger \Sigma V^T = V \Sigma^\dagger (U^T U) \Sigma V^T = (V \Sigma^\dagger U^T)(U \Sigma V^T) = BA,
\]
where we can insert $U^T U$ in the middle since it is the $n \times n$ identity matrix, $U$ being an orthogonal $n \times n$ matrix. Similarly
\[
(AB)^T = \bigl((U \Sigma V^T)(V \Sigma^\dagger U^T)\bigr)^T = (U \Sigma \Sigma^\dagger U^T)^T = U (\Sigma \Sigma^\dagger)^T U^T = U \Sigma \Sigma^\dagger U^T,
\]
since for the same reason as above $\Sigma \Sigma^\dagger$ is also symmetric, and therefore we can insert the $m \times m$ identity matrix $V^T V$ in the middle, yielding
\[
(AB)^T = U \Sigma \Sigma^\dagger U^T = U \Sigma V^T V \Sigma^\dagger U^T = AB.
\]

(b) We know from the previous part that there exists a matrix $B$ satisfying the four conditions, namely $B = A^\dagger$. We prove that any matrix $B$ satisfying these four conditions must be $A^\dagger$ in a slightly roundabout way, namely by proving that given $A$, there is a unique solution $B$ to those four conditions. In other words, suppose $A$ is fixed and that $B$ satisfies the conditions, and suppose in addition that there exists some matrix $C$ which also satisfies the four equations, i.e.
\[
CAC = C, \qquad ACA = A, \qquad (CA)^T = CA, \qquad (AC)^T = AC.
\]
Now
\[
AB = (ACA)B = (AC)(AB) = (AC)^T (AB)^T
\]
since $A = ACA$, $AC = (AC)^T$, and $AB = (AB)^T$. Moreover
\[
AB = (AC)^T (AB)^T = C^T A^T B^T A^T = C^T (ABA)^T = C^T A^T
\]
since $ABA = A$. Finally
\[
AB = C^T A^T = (AC)^T = AC
\]
since again $AC = (AC)^T$. Thus in all $AB = AC$. The same holds with multiplication by $A$ from the right, i.e.
\[
BA = B(ACA) = (BA)(CA) = (BA)^T (CA)^T = A^T B^T A^T C^T = (ABA)^T C^T = A^T C^T = (CA)^T = CA,
\]
since, in order, $A = ACA$, $BA = (BA)^T$, and $CA = (CA)^T$, followed by $A = ABA$ and $CA = (CA)^T$ again. Finally this means that
\[
B = BAB = B(AB) = B(AC) = (BA)C = (CA)C = CAC = C
\]
since $B = BAB$, $AB = AC$, $BA = CA$, and $C = CAC$. This means that there is only one matrix that satisfies the four equations (4.3.15), and we have shown that $A^\dagger$ satisfies them, whereby $B = A^\dagger$. □
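As a closing sanity check, MATLAB's pinv satisfies the four equations (4.3.15) up to rounding; the test matrix below is an arbitrary rank-deficient example, not from the problem.

    A = [1 2 0; 2 4 0; 3 6 1];    % hypothetical 3x3 matrix of rank 2
    B = pinv(A);
    disp(norm(B*A*B - B))         % ~0: BAB = B
    disp(norm(A*B*A - A))         % ~0: ABA = A
    disp(norm((B*A)' - B*A))      % ~0: (BA)^T = BA
    disp(norm((A*B)' - A*B))      % ~0: (AB)^T = AB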