MATLAB PCG algorithm

function x = pcg(Afun,b,tol,maxit,Binvfun,x0)
x = x0; r = b - feval(Afun,x); rho = 1;
for i = 1 : maxit
  y = feval(Binvfun,r);
  rho1 = rho; rho = r' * y;
  if (i == 1)
    p = y;
  else
    beta = rho / rho1; p = y + beta * p;
  end
  q = feval(Afun,p);
  alpha = rho / (p' * q);
  x = x + alpha * p;
  if (norm(b - feval(Afun,x)) <= tol*norm(b)), return; end
  r = r - alpha * q;
end

Dubious termination criterion!                                                        △

Summary: Advantages of Krylov methods vs. direct elimination (IF they converge at all/sufficiently fast):
• They require the system matrix A in procedural form y = evalA(x) ↔ y = Ax only.
• They can perfectly exploit the sparsity of the system matrix.
• They can cash in on low accuracy requirements (IF a viable termination criterion is available).
• They can benefit from a good initial guess.


4.4 Essential Skills Learned in Chapter 4

You should know:
• the relation between a linear system of equations with an s.p.d. matrix and a quadratic minimization problem,
• how the steepest descent method works and its convergence properties,
• the idea behind the conjugate gradient method and its convergence properties,
• how to use the MATLAB function pcg,
• the importance of the preconditioner.


5 Eigenvalues

Example 5.0.1 (Analytic solution of homogeneous linear ordinary differential equations). → [47, Rem. 5.6.1]

Autonomous homogeneous linear ordinary differential equation (ODE):

  ẏ = Ay ,   A ∈ C^{n,n} .                                                          (5.0.1)

If A = S diag(λ1,...,λn) S^{-1} with S ∈ C^{n,n} regular, then, setting D := diag(λ1,...,λn) and z := S^{-1}y,

  ẏ = Ay   ⟷   ż = Dz .

➣ solution of the initial value problem:

  ẏ = Ay , y(0) = y0 ∈ C^n   ⇒   y(t) = S z(t) ,  ż = Dz ,  z(0) = S^{-1} y0 .

The initial value problem for the decoupled homogeneous linear ODE ż = Dz has a simple analytic solution

  z_i(t) = exp(λ_i t) (z0)_i = exp(λ_i t) ( (S^{-1})_{i,:} y0 ) .

In light of Rem. ??:

  A = S diag(λ1,...,λn) S^{-1}   ⇔   A (S)_{:,i} = λ_i (S)_{:,i} ,  i = 1,...,n .     (5.0.2)

In order to find the transformation matrix S, all non-zero solution vectors (= eigenvectors) x ∈ C^n of the linear eigenvalue problem

  Ax = λx

have to be found.                                                                     △

Example 5.0.2 (Normal mode analysis). → [?]

In the computation of the IR spectrum of a molecule one is interested in the vibrational frequencies of the molecule, which is described by n positional degrees of freedom x ∈ R^n and corresponding velocities ẋ = dx/dt ∈ R^n.

Suppose all masses are equal, with an effective mass m  ➣  kinetic energy K(x,ẋ) = (m/2) ||ẋ||₂² ; the total potential energy is described by a model U(x).

1. Find a local minimum x* of the potential energy:

     DU(x*) = 0 ,   H := D²U(x*)  symmetric, positive semi-definite,

   hence, by Taylor expansion near the local minimum,

     U(x* + a) = U(x*) + ½ aᵀ H a .

2. Newtonian mechanics near the local minimum:

     m ẍ = m ä = −D_a U(x* + a) ,

   and, since we stay near the minimum,

     m ä = −D_a ( U(x*) + ½ aᵀ H a ) = −H a .

3. As H is real and symmetric, its eigenvectors w^j, j = 1,...,n, are orthogonal and hence form a convenient basis for representing any vector:

     a = c_1(t) w^1 + ... + c_n(t) w^n .

   Inserting this into the system of second-order ODEs we get

     m ( c̈_1 w^1 + ... + c̈_n w^n ) = −( c_1 H w^1 + ... + c_n H w^n ) = −( λ_1 c_1 w^1 + ... + λ_n c_n w^n ) ,

   where λ_j denotes the eigenvalue associated with w^j:  H w^j = λ_j w^j .

4. Taking the scalar product with the eigenvector w^k we obtain an uncoupled set of equations for each c_k(t), k = 1,...,n:

     m c̈_k = −λ_k c_k .

5. Looking for solutions of the form c_k = α_k sin(ω_k t) and substituting into the differential equation, one gets the angular vibrational frequency of the normal mode:

     ω_k = sqrt(λ_k / m) .

6. In the case of different masses we end up with the system

     M ä = −H a

   with the mass matrix M, which is symmetric, positive definite, but not necessarily diagonal. We thus must perform the normal mode analysis on the matrix M^{-1}H:

     M^{-1} H w^j = λ_j w^j   ⇔   H w^j = λ_j M w^j .
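Step 6 amounts to a generalized eigenvalue problem (→ Sect. 5.1, 5.2). A minimal MATLAB sketch (not part of the original example; H and M are assumed to be given, the function name is ad hoc):

% Sketch: vibrational frequencies omega_k from the generalized EVP H*w = lambda*M*w
function omega = normalmodes(H,M)
% H : symmetric positive semi-definite Hessian of the potential energy
% M : symmetric positive definite mass matrix
lambda = eig(H,M);        % generalized eigenvalues, see d = eig(A,B) in Sect. 5.2
lambda = max(lambda,0);   % clip tiny negative values caused by roundoff
omega  = sqrt(lambda);    % omega_k = sqrt(lambda_k)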

5.1 Theory of eigenvalue problems

Definition 5.1.1 (Eigenvalues and eigenvectors).
• λ ∈ C eigenvalue (ger.: Eigenwert) of A ∈ K^{n,n}  :⇔  det(λI − A) = 0
  (zero of the characteristic polynomial χ(λ) := det(λI − A))
• spectrum of A ∈ K^{n,n}:   σ(A) := { λ ∈ C : λ eigenvalue of A }
• eigenspace (ger.: Eigenraum) associated with the eigenvalue λ ∈ σ(A):   Eig_A(λ) := Ker(λI − A)
• x ∈ Eig_A(λ) \ {0}   =̂   eigenvector of A for the eigenvalue λ

Two simple facts:

  λ ∈ σ(A)   ⇒   dim Eig_A(λ) > 0 ,                                                  (5.1.1)
  det(A) = det(Aᵀ)  ∀A ∈ K^{n,n}   ⇒   σ(A) = σ(Aᵀ) .                                (5.1.2)

✎ notation:   ρ(A) := max{ |λ| : λ ∈ σ(A) }   =̂   spectral radius of A ∈ K^{n,n}

Theorem 5.1.2 (Bound for spectral radius).
For any matrix norm ||·|| induced by a vector norm (→ Def. 2.4.2):

  ρ(A) ≤ ||A|| .

Lemma 5.1.3 (Gershgorin circle theorem).
For any A ∈ K^{n,n} holds true

  σ(A) ⊂ ⋃_{j=1}^{n} { z ∈ C : |z − a_jj| ≤ Σ_{i≠j} |a_ji| } .
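A quick numerical check of the Gershgorin circle theorem in MATLAB (a sketch, not part of the original notes; the test matrix is arbitrary):

% Sketch: every eigenvalue lies in at least one Gershgorin disk (Lemma 5.1.3)
n = 5; A = rand(n,n);
lambda = eig(A);
c = diag(A);                     % centers a_jj
r = sum(abs(A),1)' - abs(c);     % radii: column sums without the diagonal entry
for i = 1:n
  assert(any(abs(lambda(i) - c) <= r + eps));
end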

Lemma 5.1.4 (Similarity and spectrum).
The spectrum of a matrix is invariant with respect to similarity transformations:

  ∀A ∈ K^{n,n}:   σ(S^{-1} A S) = σ(A)   ∀ regular S ∈ K^{n,n} .

Lemma 5.1.5 (Existence of a one-dimensional invariant subspace).

  ∀C ∈ C^{n,n}:   ∃u ∈ C^n:   C (Span{u}) ⊂ Span{u} .

Theorem 5.1.6 (Schur normal form).

  ∀A ∈ K^{n,n}:   ∃U ∈ C^{n,n} unitary:   U^H A U = T   with T ∈ C^{n,n} upper triangular .

Eigenvalue problems (EVPs):
➊ Given A ∈ K^{n,n}, find all eigenvalues (= spectrum of A).
➋ Given A ∈ K^{n,n}, find σ(A) plus all eigenvectors.
➌ Given A ∈ K^{n,n}, find a few eigenvalues and associated eigenvectors.

(Linear) generalized eigenvalue problem:

Given A ∈ C^{n,n} and regular B ∈ C^{n,n}, seek x ≠ 0, λ ∈ C such that

  Ax = λBx   ⇔   B^{-1} A x = λx .                                                    (5.1.3)

x =̂ generalized eigenvector,  λ =̂ generalized eigenvalue

Obviously every generalized eigenvalue problem is equivalent to a standard eigenvalue problem,

  Ax = λBx   ⇔   B^{-1} A x = λx .

However, it is usually not advisable to use this equivalence for numerical purposes!

Remark 5.1.1 (Generalized eigenvalue problems and Cholesky factorization).

If B = B^H is s.p.d. (→ Def. 2.6.2) with Cholesky factorization B = R^H R, then

  Ax = λBx   ⇔   Ãy = λy   where   Ã := R^{-H} A R^{-1} ,   y := Rx .

➞ This transformation can be used for efficient computations.                         △
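A minimal MATLAB sketch of the transformation from Rem. 5.1.1 (not part of the original notes; the test matrices are ad hoc, B is assumed s.p.d.):

% Sketch: solve A*x = lambda*B*x via the Cholesky factorization B = R'*R
n = 6;
A = randn(n); A = A + A';             % Hermitian test matrix
B = randn(n); B = B'*B + n*eye(n);    % s.p.d. test matrix
R = chol(B);                          % B = R'*R, R upper triangular
Atilde = (R') \ (A / R);              % Atilde = R^{-H} * A * R^{-1}
[W,D] = eig(Atilde);                  % Atilde*y = lambda*y
V = R \ W;                            % back-transformation x = R^{-1}*y
norm(A*V - B*V*D)                     % small: columns of V solve the generalized EVP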

Corollary 5.1.7 (Principal axis transformation).

  A ∈ K^{n,n} with A A^H = A^H A:   ∃U ∈ C^{n,n} unitary:   U^H A U = diag(λ1,...,λn) ,  λ_i ∈ C .

A matrix A ∈ K^{n,n} with A A^H = A^H A is called normal.

Examples of normal matrices are
• Hermitian matrices:       A^H = A        ➤  σ(A) ⊂ R
• unitary matrices:         A^H = A^{-1}   ➤  |σ(A)| = 1
• skew-Hermitian matrices:  A = −A^H       ➤  σ(A) ⊂ iR

➤ Normal matrices can be diagonalized by unitary similarity transformations.
  Symmetric real matrices can be diagonalized by orthogonal similarity transformations.

In Cor. 5.1.7:  – λ1,...,λn = eigenvalues of A
                – columns of U = orthonormal basis of eigenvectors of A


5.2 “Direct” Eigensolvers

Purpose: solution of eigenvalue problems ➊, ➋ for dense matrices “up to machine precision”.

MATLAB function: eig

  d = eig(A)     : computes the spectrum σ(A) = {d1,...,dn} of A ∈ C^{n,n}
  [V,D] = eig(A) : computes V ∈ C^{n,n} and diagonal D ∈ C^{n,n} such that AV = VD
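A tiny usage sketch for eig (not from the notes; the test matrix is arbitrary):

% Sketch: eigenvalues/eigenvectors of a small dense matrix
A = [2 1 0; 1 3 1; 0 1 4];
d = eig(A);          % spectrum only
[V,D] = eig(A);      % eigenvectors as columns of V, eigenvalues on diag(D)
norm(A*V - V*D)      % should be of the order of machine precision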

Remark 5.2.1 (QR-Algorithm). → [20, Sect. 7.5]

Note: all “direct” eigensolvers are iterative methods.

Idea: iteration based on successive unitary similarity transformations

  A = A^(0) −→ A^(1) −→ ...  −→  diagonal matrix, if A = A^H ;
                                  upper triangular matrix, else   (→ Thm. 5.1.6)

(superior stability of unitary transformations, see Rem. ??)

Code 5.2.2: QR-algorithm with shift

function d = eigqr(A,tol)
n = size(A,1);
while (norm(tril(A,-1)) > tol*norm(A))
  shift = A(n,n);
  [Q,R] = qr(A - shift*eye(n));
  A = Q'*A*Q;
end
d = diag(A);

QR-algorithm (with shift):
– in general: quadratic convergence,
– cubic convergence for normal matrices   ([20, Sect. 7.5, 8.2]).

Computational cost: O(n³) operations per step of the QR-algorithm.

Library implementations of the QR-algorithm provide numerically stable (→ Def. ??) eigensolvers.   △

Remark 5.2.3 (Unitary similarity transformation to tridiagonal form).

Successive Householder similarity transformations of A = A^H annihilate, column by column, all entries below the first subdiagonal; each step affects only the remaining rows/columns and targets one vector.

➣ transformation to tridiagonal form! (For general matrices a similar strategy achieves a similarity transformation to upper Hessenberg form.)

This transformation is used as a preprocessing step for the QR-algorithm ➣ eig.   △

Remark 5.2.4 (Computational effort for eigenvalue computations).

Computational effort (#elementary operations) for eig():

  eigenvalues & eigenvectors of A ∈ K^{n,n}           ~ 25n³ + O(n²)
  only eigenvalues of A ∈ K^{n,n}                     ~ 10n³ + O(n²)
  eigenvalues & eigenvectors of A = A^H ∈ K^{n,n}     ~  9n³ + O(n²)
  only eigenvalues of A = A^H ∈ K^{n,n}               ~ 4/3·n³ + O(n²)
  only eigenvalues of tridiagonal A = A^H ∈ K^{n,n}   ~ 30n² + O(n)

➤ O(n³) in all dense cases!

Note: eig is not available for sparse matrix arguments.
Exception: d = eig(A) for sparse Hermitian matrices.                                  △

Similar functionality for the generalized EVP Ax = λBx, A, B ∈ C^{n,n}:

  d = eig(A,B)     : computes all generalized eigenvalues
  [V,D] = eig(A,B) : computes V ∈ C^{n,n} and diagonal D ∈ C^{n,n} such that AV = BVD

Note: (generalized) eigenvectors can be recovered as the columns of V:

  AV = VD   ⇔   A (V)_{:,i} = (D)_{i,i} (V)_{:,i} ,   if D = diag(d1,...,dn) .

Example 5.2.5 (Runtimes of eig).

Code 5.2.6: measuring runtimes of eig

function eigtiming
A = rand(500,500); B = A'*A;
C = gallery('tridiag',500,1,3,1);
times = [];
for n=5:5:500
  An = A(1:n,1:n); Bn = B(1:n,1:n); Cn = C(1:n,1:n);
  t1 = 1000; for k=1:3, tic; d = eig(An);     t1 = min(t1,toc); end
  t2 = 1000; for k=1:3, tic; [V,D] = eig(An); t2 = min(t2,toc); end
  t3 = 1000; for k=1:3, tic; d = eig(Bn);     t3 = min(t3,toc); end
  t4 = 1000; for k=1:3, tic; [V,D] = eig(Bn); t4 = min(t4,toc); end
  t5 = 1000; for k=1:3, tic; d = eig(Cn);     t5 = min(t5,toc); end
  times = [times; n t1 t2 t3 t4 t5];
end

figure;
loglog(times(:,1),times(:,2),'r+', times(:,1),times(:,3),'m*',...
       times(:,1),times(:,4),'cp', times(:,1),times(:,5),'b^',...
       times(:,1),times(:,6),'k.');
xlabel('{\bf matrix size n}','fontsize',14);
ylabel('{\bf time [s]}','fontsize',14);
title('eig runtimes');
legend('d = eig(A)','[V,D] = eig(A)','d = eig(B)','[V,D] = eig(B)','d = eig(C)',...
       'location','northwest');
print -depsc2 '../PICTURES/eigtimingall.eps'

figure;
loglog(times(:,1),times(:,2),'r+', times(:,1),times(:,3),'m*',...
       times(:,1),(times(:,1).^3)/(times(1,1)^3)*times(1,2),'k-');
xlabel('{\bf matrix size n}','fontsize',14);
ylabel('{\bf time [s]}','fontsize',14);
title('nxn random matrix');
legend('d = eig(A)','[V,D] = eig(A)','O(n^3)','location','northwest');
print -depsc2 '../PICTURES/eigtimingA.eps'

figure;
loglog(times(:,1),times(:,4),'r+', times(:,1),times(:,5),'m*',...
       times(:,1),(times(:,1).^3)/(times(1,1)^3)*times(1,2),'k-');
xlabel('{\bf matrix size n}','fontsize',14);
ylabel('{\bf time [s]}','fontsize',14);
title('nxn random Hermitian matrix');
legend('d = eig(A)','[V,D] = eig(A)','O(n^3)','location','northwest');
print -depsc2 '../PICTURES/eigtimingB.eps'

figure;
loglog(times(:,1),times(:,6),'r*',...
       times(:,1),(times(:,1).^2)/(times(1,1)^2)*times(1,2),'k-');
xlabel('{\bf matrix size n}','fontsize',14);
ylabel('{\bf time [s]}','fontsize',14);
title('nxn tridiagonal Hermitian matrix');
legend('d = eig(A)','O(n^2)','location','northwest');
print -depsc2 '../PICTURES/eigtimingC.eps'

Ôº ¾89 Ôº ¾9½ " ! Example 5.2.5 (Runtimes of eig). △ Code 5.2.6: measuring runtimes of eig Remark 5.2.3 (Unitary similarity transformation to tridiagonal form). 1 function e i g t i m i n g 2 H Successive Householder similarity transformations of A = A : 3 A = rand (500 ,500) ; B = A’ ∗A; (➞ =ˆ affected rows/columns, =ˆ targeted vector) 4 C = gallery ( ’ t r i d i a g ’ ,500 ,1 ,3 ,1) ; 5 times = []; 6 for n=5:5:500 0 0 0 0 0 0 0 0 0 0 7 An = A( 1 : n , 1 : n ) ; Bn = B( 1 : n , 1 : n ) ; Cn = C( 1 : n , 1 : n ) ; 0 0 0 0 0 0 0 8 t1 = 1000; for k =1:3 , t i c ; d = eig (An) ; t1 = min ( t1 , toc ) ; end −→ −→ −→ 0 9 t2 = 1000; for k =1:3 , t i c ;[ V,D] = eig (An) ; t2 = min ( t2 , toc ) ; end 0 t3 = 1000; for k =1:3 , t i c ; d = eig (Bn) ; t3 = min ( t3 , toc ) ; end 0 0 0 0 0 0 1 t4 = 1000; for k =1:3 , t i c ;[ V,D] = eig (Bn) ; t4 = min ( t4 , toc ) ; end transformation to tridiagonal form ! (for general matrices a similar strategy can achieve a 2 t5 = 1000; for k =1:3 , t i c ; d = eig (Cn) ; t5 = min ( t5 , toc ) ; end 3 times = [ times ; n t1 t2 t3 t4 t5 ]; similarity transformation to upper Hessenberg form) 4 end 5 6 figure ; this transformation is used as a preprocessing step for QR-algorithm ➣ eig. △ 7 loglog ( times ( :, 1 ) , times ( :, 2 ) , ’ r + ’ , times ( :, 1 ) , times ( :, 3 ) , ’m∗ ’ ,... 8 times ( :, 1 ) , times ( :, 4 ) , ’ cp ’ , times ( :, 1 ) , times ( :, 5 ) , ’ b^ ’ ,... 9 times ( :, 1 ) , times ( :, 6 ) , ’ k . ’ ) ; 0 xlabel ( ’ { \ bf matrix size n } ’ , ’ f o n t s i z e ’ ,14) ;

5º¾ 5º¾ Similar functionality for generalized EVP Ax = λBx, A, B ∈ Cn,n 1 ylabel ( ’ { \ bf time [ s ] } ’ , ’ f o n t s i z e ’ ,14) ;

Ôº ¾9¼ Ôº ¾9¾ eig runtimes nxn random matrix 1 2 10 10 2 t i t l e ( ’ eig runtimes ’ ) ; d = eig(A) d = eig(A) [V,D] = eig(A) [V,D] = eig(A) d = eig(B) O(n3) 3 legend ( ’ d = eig (A) ’ , ’ [ V,D] = eig (A) ’ , ’ d = eig (B) ’ , ’ [ V,D] = eig (B) ’ , ’ d = 1 0 [V,D] = eig(B) 10 10 eig (C) ’ ,... d = eig(C)

0 10 4 ’ l o c a t i o n ’ , ’ northwest ’ ) ; −1 10 5 −1 10

6 −2 p r in t −depsc2 ’ ../ PICTURES/ e i g t i m i n g a l l . eps ’ 10

−2 7 time [s] time [s] 10

−3 8 figure ; 10 −3 10 9 loglog ( times ( :, 1 ) , times ( :, 2 ) , ’ r + ’ , times ( :, 1 ) , times ( :, 3 ) , ’m∗ ’ ,...

−4 0 10 −4 times ( :, 1 ) , ( times ( :, 1 ) . ^ 3 ) / ( times ( 1 , 1 ) ^3)∗times ( 1 , 2 ) , ’ k−’ ) ; 10 1 xlabel ( ’ { \ bf matrix size n } ’ , ’ f o n t s i z e ’ ,14) ;

−5 −5 10 10 2 0 1 2 3 0 1 2 3 ylabel ( ’ { \ bf time [ s ] } ’ , ’ f o n t s i z e ’ ,14) ; 10 10 10 10 10 10 10 10 matrix size n Fig. 47 matrix size n Fig. 48 3 t i t l e ( ’ nxn random matrix ’ ) ; 4 legend ( ’ d = eig (A) ’ , ’ [ V,D] = eig (A) ’ , ’O( n ^3) ’ , ’ l o c a t i o n ’ , ’ northwest ’ ) ; 5 6 p r in t −depsc2 ’ ../ PICTURES/ eigtimingA . eps ’ 7 8 figure ; 9 loglog ( times ( :, 1 ) , times ( :, 4 ) , ’ r + ’ , times ( :, 1 ) , times ( :, 5 ) , ’m∗ ’ ,... 0 times ( :, 1 ) , ( times ( :, 1 ) . ^ 3 ) / ( times ( 1 , 1 ) ^3)∗times ( 1 , 2 ) , ’ k−’ ) ; 1 xlabel ( ’ { \ bf matrix size n } ’ , ’ f o n t s i z e ’ ,14) ; 2 ylabel ( ’ { \ bf time [ s ] } ’ , ’ f o n t s i z e ’ ,14) ;

3 t i t l e ( ’ nxn random Hermitian matrix ’ ) ; 5º¾ 5º¾

Ôº ¾9¿ Ôº ¾95

nxn random Hermitian matrix nxn tridiagonel Hermitian matrix 2 0 10 10 4 legend ( ’ d = eig (A) ’ , ’ [ V,D] = eig (A) ’ , ’O( n ^3) ’ , ’ l o c a t i o n ’ , ’ northwest ’ ) ; d = eig(A) d = eig(A) [V,D] = eig(A) O(n2) 3 1 O(n ) 5 10 −1 6 p r in t −depsc2 ’ ../ PICTURES/ eigtimingB . eps ’ 10

0 7 10

−2 8 figure ; 10 −1 10 9 loglog ( times ( :, 1 ) , times ( :, 6 ) , ’ r ∗ ’ ,...

−2 time [s] time [s] 10 0 times ( :, 1 ) , ( times ( :, 1 ) . ^ 2 ) / ( times ( 1 , 1 ) ^2)∗times ( 1 , 2 ) , ’ k−’ ) ; −3 10 1 xlabel ( ’ { \ bf matrix size n } ’ , ’ f o n t s i z e ’ ,14) ; −3 10 2 ylabel ( ’ { \ bf time [ s ] } ’ , ’ f o n t s i z e ’ ,14) ; −4 10 3 −4 t i t l e ( ’ nxn t r i d i a g o n e l Hermitian matrix ’ ) ; 10 4 legend ( ’ d = eig (A) ’ , ’O( n ^2) ’ , ’ l o c a t i o n ’ , ’ northwest ’ ) ;

−5 −5 10 10 5 0 1 2 3 0 1 2 3 10 10 10 10 10 10 10 10 matrix size n Fig. 49 matrix size n Fig. 50 6 p r in t −depsc2 ’ ../ PICTURES/ eigtimingC . eps ’ ☛ For the sake of efficiency: think which information you really need when computing eigenval- ues/eigenvectors of dense matrices

Potentially more efficient methods for sparse matrices will be introduced below in Sects. ??, 5.3.


5.3 Krylov Subspace Methods

Recall from Sect. 4.2.2, Lemma 4.2.5:

  The residuals r0,...,r_{m−1} generated by the CG iteration, Alg. 4.2.1, applied to Ax = z with x^(0) = 0,
  provide an orthogonal basis of K_m(A,z) (if r_k ≠ 0).

Inexpensive Ritz projection of Ax = λx onto K_m(A,z):

  V_mᵀ A V_m x = λx ,   V_m := [ r0/||r0|| , ... , r_{m−1}/||r_{m−1}|| ] ∈ R^{n,m} .            (5.3.1)

recall: the residuals are generated by short recursions, see Alg. 4.2.1.

Lemma 5.3.1 (Tridiagonal Ritz projection from CG residuals).
  V_mᵀ A V_m is a tridiagonal matrix.

Proof. By Lemma 4.2.5, {r0,...,r_{ℓ−1}} is an orthogonal basis of K_ℓ(A,r0), if all the residuals are non-zero. As A K_{ℓ−1}(A,r0) ⊂ K_ℓ(A,r0), we conclude the orthogonality r_mᵀ A r_j = 0 for all j = 0,...,m−2. Since

  ( V_mᵀ A V_m )_{ij} = r_{i−1}ᵀ A r_{j−1} ,   1 ≤ i,j ≤ m ,

the assertion of the lemma follows.                                                               □

Algorithm for computing V_l and T_l: the Lanczos process, with

  V_l^H A V_l  =  tridiagonal matrix with diagonal (α1,...,α_l) and sub/super-diagonal (β1,...,β_{l−1})  =:  T_l ∈ K^{l,l} .

Code 5.3.1: Lanczos process

function [V,alph,bet] = lanczos(A,k,z0)
V = z0/norm(z0);
alph = zeros(k,1);
bet  = zeros(k,1);
for j=1:k
  p = A*V(:,j); alph(j) = dot(p,V(:,j));
  w = p - alph(j)*V(:,j);
  if (j > 1), w = w - bet(j-1)*V(:,j-1); end;
  bet(j) = norm(w); V = [V, w/bet(j)];
end
bet = bet(1:end-1);

Computational effort per step: 1× A×vector, 2 dot products, 2 AXPY operations, 1 division.
Total computational effort for l steps of the Lanczos process, if A has at most k non-zero entries per row:  O(nkl).

Note: Code 5.3.1 assumes that no residual vanishes. This could happen if z0 exactly belonged to the span of a few eigenvectors. However, in practical computations inevitable round-off errors will always ensure that the iterates do not stay in an invariant subspace of A, cf. Rem. ??.

Convergence (what we expect from the above considerations) → [14, Sect. 8.5]:

In the l-th step:   λ_n ≈ µ_l^(l) ,   λ_{n−1} ≈ µ_{l−1}^(l) ,  ... ,   λ_1 ≈ µ_1^(l) ,

  σ(T_l) = { µ_1^(l),...,µ_l^(l) } ,   µ_1^(l) ≤ µ_2^(l) ≤ ··· ≤ µ_l^(l) .

Example 5.3.2 (Lanczos process for eigenvalue computation).

A from Ex. ??:   A = gallery('minij',100);

[Fig. 51, Fig. 52: approximation error |Ritz value − eigenvalue| for λ_n, λ_{n−1}, λ_{n−2}, λ_{n−3} versus the step of the Lanczos process, semi-logarithmic scale.]

Observation: same as in Ex. ??, linear convergence of the Ritz values to the eigenvalues.

However, for A ∈ R^{10,10}, a_ij = min{i,j}: good initial convergence, but then a sudden "jump" of Ritz values off the eigenvalues!

  l   σ(T_l)
  1   38.500000
  2    3.392123  44.750734
  3    1.117692   4.979881  44.766064
  4    0.597664   1.788008   5.048259  44.766069
  5    0.415715   0.925441   1.870175   5.048916  44.766069
  6    0.336507   0.588906   0.995299   1.872997   5.048917  44.766069
  7    0.297303   0.431779   0.638542   0.999922   1.873023   5.048917  44.766069
  8    0.276160   0.349724   0.462449   0.643016   1.000000   1.873023   5.048917  44.766069
  9    0.276035   0.349451   0.462320   0.643006   1.000000   1.873023   3.821426   5.048917  44.766069
 10    0.263867   0.303001   0.365376   0.465199   0.643104   1.000000   1.873023   5.048917  44.765976  44.766069

Conjecture: impact of roundoff errors, cf. Ex. 4.2.4.

Example 5.3.3 (Impact of roundoff on the Lanczos process).

A ∈ R^{10,10} ,  a_ij = min{i,j} :   A = gallery('minij',10);

Computed by [V,alpha,beta] = lanczos(A,n,ones(n,1)); see Code 5.3.1: the tridiagonal matrix

  T ≈ tridiagonal with diagonal      (38.500000, 9.642857, 2.720779, 1.336364, 0.826316, 0.582380, 0.446860, 0.363803, 3.820888, 41.254286)
      and sub/super-diagonal         (14.813845, 2.062955, 0.776284, 0.385013, 0.215431, 0.126781, 0.074650, 0.043121, 11.991094) .

  σ(A) = {0.255680, 0.273787, 0.307979, 0.366209, 0.465233, 0.643104, 1.000000, 1.873023, 5.048917, 44.766069}
  σ(T) = {0.263867, 0.303001, 0.365376, 0.465199, 0.643104, 1.000000, 1.873023, 5.048917, 44.765976, 44.766069}

Uncanny cluster of computed eigenvalues of T ("ghost eigenvalues", [20, Sect. 9.2.5]).

Moreover, V^H V is far from the identity matrix: most entries agree with I up to about 10^{-6}, but the first columns of V have lost orthogonality against the last ones, with off-diagonal entries as large as (V^H V)_{1,9} ≈ 0.258801 and (V^H V)_{1,10} ≈ 0.883711.

➣ Loss of orthogonality of the residual vectors due to roundoff
  (compare: impact of roundoff on the CG iteration, Ex. 4.2.4).

Idea:   do not rely on the orthogonality relations of Lemma 4.2.5;
        use explicit Gram-Schmidt orthogonalization.

Details: inductive approach: given an ONB {v1,...,v_l} of K_l(A,z),

  ṽ_{l+1} := A v_l − Σ_{j=1}^{l} ( v_j^H A v_l ) v_j ,   v_{l+1} := ṽ_{l+1} / ||ṽ_{l+1}||_2   ⇒   v_{l+1} ⊥ K_l(A,z) .     (5.3.2)

(Gram-Schmidt orthogonalization, cf. (4.2.6))

Code 5.3.4: Arnoldi process

function [V,H] = arnoldi(A,k,v0)
V = [v0/norm(v0)];
H = zeros(k+1,k);
for l=1:k
  vt = A*V(:,l);
  for j=1:l
    H(j,l) = dot(V(:,j),vt);
    vt = vt - H(j,l)*V(:,j);
  end
  H(l+1,l) = norm(vt);
  if (H(l+1,l) == 0), break; end
  V = [V, vt/H(l+1,l)];
end

Arnoldi process, in step l: 1× A×vector, l+1 dot products, l AXPY operations, n divisions.
➣ Computational cost for l steps, if A has at most k non-zero entries per row:  O(nkl²).
H(l+1,l) = 0  ➞  STOP!

If it does not stop prematurely, the Arnoldi process of Code 5.3.4 yields an orthonormal basis (ONB) of K_{k+1}(A,v0) for a general A ∈ C^{n,n}.
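The contrast between the two processes can be seen directly from the computed basis vectors. A sketch (not from the notes; it assumes lanczos from Code 5.3.1 and arnoldi from Code 5.3.4 are available):

% Sketch: loss of orthogonality of the Lanczos basis vs. the Arnoldi basis
A = gallery('minij',100); z0 = ones(100,1); k = 30;
[VL,alph,bet] = lanczos(A,k,z0);
[VA,H] = arnoldi(A,k,z0);
norm(VL'*VL - eye(size(VL,2)))   % typically far from zero: orthogonality lost, cf. Ex. 5.3.3
norm(VA'*VA - eye(size(VA,2)))   % markedly smaller, cf. Ex. 5.3.7 below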

Algebraic view of the Arnoldi process of Code 5.3.4, meaning of the output H:

  V_l = [ v1,...,v_l ] :    A V_l = V_{l+1} H̃_l ,   H̃_l ∈ K^{l+1,l}   with

    h̃_ij =  v_i^H A v_j ,   if i ≤ j ,
            ||ṽ_i||_2 ,      if i = j+1 ,
            0 ,              else .

➡ The matrices H̃_l are non-square upper Hessenberg matrices: written column-wise, A v_l is a linear combination of v_1,...,v_{l+1} only.

Lemma 5.3.2 (Theory of the Arnoldi process).
For the matrices V_l ∈ K^{n,l}, H̃_l ∈ K^{l+1,l} arising in the l-th step, l ≤ n, of the Arnoldi process holds
 (i)   V_l^H V_l = I  (orthonormal columns),
 (ii)  A V_l = V_{l+1} H̃_l ;  H̃_l is a non-square upper Hessenberg matrix,
 (iii) V_l^H A V_l = H_l ∈ K^{l,l} ,  h_ij = h̃_ij for 1 ≤ i,j ≤ l,
 (iv)  if A = A^H, then H_l is tridiagonal  (➣ Lanczos process).

Proof. Direct from the Gram-Schmidt orthogonalization and inspection of Code 5.3.4.               □

Remark 5.3.5 (Arnoldi process and Ritz projection).

Interpretation of Lemma 5.3.2 (iii) & (i):

  H_l x = λx  is a (generalized) Ritz projection of the EVP  Ax = λx .                            △

Eigenvalue approximation for the general EVP Ax = λx by the Arnoldi process:

In the l-th step:   λ_n ≈ µ_l^(l) ,   λ_{n−1} ≈ µ_{l−1}^(l) ,  ... ,   λ_1 ≈ µ_1^(l) ,

  σ(H_l) = { µ_1^(l),...,µ_l^(l) } ,   |µ_1^(l)| ≤ |µ_2^(l)| ≤ ··· ≤ |µ_l^(l)| .

Code 5.3.6: Arnoldi eigenvalue approximation

function [dn,V,Ht] = arnoldieig(A,v0,k,tol)
n = size(A,1); V = [v0/norm(v0)];
Ht = zeros(1,0); dn = zeros(k,1);
for l=1:n
  d = dn;
  Ht = [Ht, zeros(l,1); zeros(1,l)];
  vt = A*V(:,l);
  for j=1:l
    Ht(j,l) = dot(V(:,j),vt);
    vt = vt - Ht(j,l)*V(:,j);
  end
  ev = sort(eig(Ht(1:l,1:l)));
  dn(1:min(l,k)) = ev(end:-1:end-min(l,k)+1);
  if (norm(d-dn) < tol*norm(dn)), break; end;
  Ht(l+1,l) = norm(vt);
  V = [V, vt/Ht(l+1,l)];
end

Arnoldi process for computing the k largest (in modulus) eigenvalues of A ∈ C^{n,n}:
– 1 A×vector multiplication per step  (➣ attractive for sparse matrices),
– heuristic termination criterion based on the change of the Ritz values.
However: the required storage increases with the number of steps, cf. the situation with GMRES, Sect. ??.
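A usage sketch for Code 5.3.6 (not from the notes; the sparse test matrix is ad hoc):

% Sketch: approximate the 4 largest (in modulus) eigenvalues of a sparse matrix
n = 100;
A = spdiags([-ones(n,1) 2*ones(n,1) -ones(n,1)], -1:1, n, n);  % sparse tridiagonal test matrix
[dn,V,Ht] = arnoldieig(A, ones(n,1), 4, 1e-6);
dn    % approximations of the 4 largest eigenvalues of A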

Example 5.3.7 (Stability of the Arnoldi process).

A ∈ R^{100,100} ,  a_ij = min{i,j} :   A = gallery('minij',100);

[Fig. 53, Fig. 54: approximation error of the Ritz values for λ_n, λ_{n−1}, λ_{n−2}, λ_{n−3} versus the step of the process; left: Lanczos process, right: Arnoldi process.]

Ritz values during the Arnoldi process for A = gallery('minij',10);  ↔ Ex. 5.3.2:

  l   σ(H_l)
  1   38.500000
  2    3.392123  44.750734
  3    1.117692   4.979881  44.766064
  4    0.597664   1.788008   5.048259  44.766069
  5    0.415715   0.925441   1.870175   5.048916  44.766069
  6    0.336507   0.588906   0.995299   1.872997   5.048917  44.766069
  7    0.297303   0.431779   0.638542   0.999922   1.873023   5.048917  44.766069
  8    0.276159   0.349722   0.462449   0.643016   1.000000   1.873023   5.048917  44.766069
  9    0.263872   0.303009   0.365379   0.465199   0.643104   1.000000   1.873023   5.048917  44.766069
 10    0.255680   0.273787   0.307979   0.366209   0.465233   0.643104   1.000000   1.873023   5.048917  44.766069

Observation: almost perfect approximation of the spectrum of A, no "ghost eigenvalues".

For the above examples the Arnoldi process and the Lanczos process are algebraically equivalent, because they are applied to a symmetric matrix A = Aᵀ. However, they behave strikingly differently, which indicates that they are not numerically equivalent. The Arnoldi process is much less affected by roundoff than the Lanczos process, because it does not take the orthogonality of the "residual vector sequence" for granted. Hence, the Arnoldi process enjoys superior numerical stability (→ Sect. ??, Def. ??) compared to the Lanczos process.

Example 5.3.8 (Eigenvalue computation with the Arnoldi process).

Eigenvalue approximation from the Arnoldi process for a non-symmetric A, initial vector ones(100,1);

n = 100;
M = full(gallery('tridiag',-0.5*ones(n-1,1),2*ones(n,1),-1.5*ones(n-1,1)));
A = M*diag(1:n)*inv(M);

[Fig. 55, Fig. 56: Ritz values and approximation errors for the largest eigenvalues λ_n, λ_{n−1}, λ_{n−2} versus the step of the Arnoldi process.]
[Fig. 57, Fig. 58: Ritz values and approximation errors for the smallest eigenvalues λ_1, λ_2, λ_3 versus the step of the Arnoldi process.]

Observation: "vaguely linear" convergence of the largest and smallest eigenvalues, cf. Ex. ??.

Krylov subspace iteration methods (= Arnoldi process, Lanczos process) are attractive for computing a few of the largest/smallest eigenvalues and associated eigenvectors of large sparse matrices.

Remark 5.3.9 (Krylov subspace methods for generalized EVPs).

Adaptation of Krylov subspace iterative eigensolvers to the generalized EVP Ax = λBx, B s.p.d.: replace the Euclidean inner product with the "B-inner product" (x,y) ↦ x^H B y.          △

MATLAB functions:

  d = eigs(A,k,sigma)   : k largest/smallest eigenvalues of A
  d = eigs(A,B,k,sigma) : k largest/smallest eigenvalues of the generalized EVP Ax = λBx, B s.p.d.
  d = eigs(Afun,n,k)    : Afun = handle to a function providing the matrix×vector product for
                          A / A^{-1} / A − αI / (A − αB)^{-1}  (use flags to tell eigs about
                          special properties of the matrix behind Afun).

eigs just calls routines of the open source ARPACK numerical library.
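A small usage sketch for eigs (not from the notes; the sparse test matrix is arbitrary):

% Sketch: a few extremal eigenvalues of a large sparse matrix with eigs
n = 1000;
A = spdiags([-ones(n,1) 2*ones(n,1) -ones(n,1)], -1:1, n, n);   % sparse s.p.d. test matrix
d_lm = eigs(A,6);          % 6 eigenvalues of largest magnitude
d_sm = eigs(A,6,'sm');     % 6 eigenvalues of smallest magnitude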

5.4 Singular Value Decomposition

Remark 5.4.1 (Principal component analysis (PCA)).

Given: n data points a_j ∈ R^m, j = 1,...,n, in an m-dimensional (feature) space.

Conjectured "linear dependence": a_j ∈ V, V ⊂ R^m a p-dimensional subspace, p < min{m,n} unknown
(➣ possibility of dimensional reduction).

Task (PCA): determine the (minimal) p and (an orthonormal basis of) V.

Perspective of linear algebra:

  conjecture   ⇔   rank(A) = p   for   A := (a_1,...,a_n) ∈ R^{m,n} ,   Im(A) = V .

Extension: data affected by measurement errors (but the conjecture upheld for the unperturbed data).

Application: chemometrics (multivariate calibration methods for the analysis of chemical mixtures).     △

Theorem 5.4.1.
For any A ∈ K^{m,n} there are unitary matrices U ∈ K^{m,m}, V ∈ K^{n,n} and a (generalized) diagonal(∗) matrix Σ = diag(σ1,...,σp) ∈ R^{m,n}, p := min{m,n}, σ1 ≥ σ2 ≥ ··· ≥ σp ≥ 0, such that

  A = U Σ V^H .

(∗): Σ (generalized) diagonal matrix  :⇔  (Σ)_{i,j} = 0, if i ≠ j, 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Proof (of Thm. 5.4.1, by induction).

By [47, Thm. 4.2.3] continuous functions attain extremal values on compact sets (here the unit ball {x ∈ R^n: ||x||_2 ≤ 1}):

  ∃ x ∈ K^n, y ∈ K^m ,  ||x||_2 = ||y||_2 = 1 :   Ax = σy ,  σ = ||A||_2 ,

where we used the definition of the matrix 2-norm, see Def. 2.4.2. By Gram-Schmidt orthogonalization there are Ṽ ∈ K^{n,n−1}, Ũ ∈ K^{m,m−1} such that V = (x Ṽ) ∈ K^{n,n} and U = (y Ũ) ∈ K^{m,m} are unitary. Then

  U^H A V = (y Ũ)^H A (x Ṽ) = [ y^H A x   y^H A Ṽ ;  Ũ^H A x   Ũ^H A Ṽ ] = [ σ   w^H ;  0   B ]  =:  A1 .

Since

  || A1 (σ; w) ||_2²  =  (σ² + w^H w)² + ||Bw||_2²  ≥  (σ² + w^H w)² ,

we conclude

  ||A1||_2²  =  sup_{0≠x∈K^n} ||A1 x||_2² / ||x||_2²  ≥  || A1 (σ; w) ||_2² / || (σ; w) ||_2²  ≥  (σ² + w^H w)² / (σ² + w^H w)  =  σ² + w^H w .     (5.4.1)

On the other hand σ = ||A||_2 = ||U^H A V||_2 = ||A1||_2, so (5.4.1) yields σ² ≥ σ² + ||w||_2², hence w = 0, that is

  A1 = [ σ  0 ;  0  B ] .

Then apply the induction argument to B.                                                           □

Definition 5.4.2 (Singular value decomposition (SVD)).
The decomposition A = UΣV^H of Thm. 5.4.1 is called the singular value decomposition (SVD) of A. The diagonal entries σ_i of Σ are the singular values of A.

Remark 5.4.3 (Uniqueness of the SVD).

The SVD of Def. 5.4.2 is not (necessarily) unique, but the singular values are. Assume that A has two singular value decompositions

  A = U1 Σ1 V1^H = U2 Σ2 V2^H   ⇒   U1 (Σ1 Σ1^H) U1^H  =  A A^H  =  U2 (Σ2 Σ2^H) U2^H ,

with Σ1Σ1^H and Σ2Σ2^H of the form diag(s1,...,sm). Two similar diagonal matrices with sorted diagonals are equal, hence the singular values coincide.                                            △

MATLAB functions (for the algorithms see [20, Sect. 8.3]):

  s = svd(A)          : computes the singular values of the matrix A
  [U,S,V] = svd(A)    : computes the singular value decomposition according to Thm. 5.4.1
  [U,S,V] = svd(A,0)  : "economical" singular value decomposition for m > n:
                        U ∈ K^{m,n}, Σ ∈ R^{n,n}, V ∈ K^{n,n}
  s = svds(A,k)       : k largest singular values (important for sparse A → Def. 2.5.1)
  [U,S,V] = svds(A,k) : partial singular value decomposition: U ∈ K^{m,k}, V ∈ K^{n,k},
                        Σ ∈ R^{k,k} diagonal, with the k largest singular values of A.

Explanation: in the "economical" singular value decomposition only the first n columns of U are computed, so that Σ can be stored as a square diagonal matrix.

Lemma 5.4.3.
The squares σ_i² of the non-zero singular values of A are the non-zero eigenvalues of A^H A and of A A^H, with associated eigenvectors (V)_{:,1},...,(V)_{:,p} and (U)_{:,1},...,(U)_{:,p}, respectively.

Proof. A A^H and A^H A are similar (→ Lemma 5.1.4) to diagonal matrices with the non-zero diagonal entries σ_i² (σ_i ≠ 0), e.g.,

  A A^H = U Σ V^H V Σ^H U^H = U (Σ Σ^H) U^H .                                                     □

Remark 5.4.2 (SVD and additive rank-1 decomposition).

Recall from linear algebra: rank-1 matrices are tensor products of vectors,

  A ∈ K^{m,n} and rank(A) = 1   ⇔   ∃ u ∈ K^m, v ∈ K^n :  A = u v^H ,                             (5.4.2)

because rank(A) = 1 means that Ax = µ(x)u for some u ∈ K^m and a linear form x ↦ µ(x); by the Riesz representation theorem the latter can be written as µ(x) = v^H x.

The singular value decomposition provides an additive decomposition into rank-1 matrices:

  A = U Σ V^H = Σ_{j=1}^{p} σ_j (U)_{:,j} (V)_{:,j}^H .                                           (5.4.3)
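A quick numerical check of Lemma 5.4.3 (a sketch, not part of the notes):

% Sketch: singular values of A vs. eigenvalues of A'*A
A  = rand(7,4);
s  = svd(A);                        % sigma_1 >= ... >= sigma_4
mu = sort(eig(A'*A),'descend');     % eigenvalues of A^H A
norm(s.^2 - mu)                     % agreement up to roundoff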

Code 5.4.4: PCA in three dimensions via SVD ' $ 1 % Use of SVD for PCA with perturbations Lemma 5.4.4 (SVD and rank of a matrix). 2 If the singular values of A satisfy σ1 ≥ ··· ≥ σr > σr+1 = ··· σp = 0, then 3 V = [1 , −1; 0 , 0 . 5 ; −1 , 0 ]; A = V∗rand (2 ,20) +0.1∗rand (3 ,20) ; • rank(A) = r , 4 [U, S,V] = svd (A, 0 ) ; 5 • Ker(A) = Span (V):,r+1,..., (V):,n , 6 figure ; sv = diag (S( 1 : 3 , 1 : 3 )) • Im(A) = Span (U):,1,..., (U):,r .  7 &  % 8 [ X,Y] = meshgrid ( −2:0.2:0 , −1:0.2:1) ; n = size (X, 1 ) ; m = size (X, 2 ) ; 9 figure ; plot3 (A ( 1 ,: ) ,A ( 2 ,: ) ,A ( 3 ,: ) , ’ r ∗ ’ ) ; grid on ; hold on ; 10 M = U( :, 1 : 2 ) ∗[ reshape (X, 1 , prod ( size (X))) ; reshape (Y, 1 , prod ( size (Y))) ];

5º4 11 mesh( reshape (M( 1 ,: ) ,n ,m) , reshape (M( 2 ,: ) ,n ,m) , reshape (M( 3 ,: ) ,n ,m)) ; 5º4

Ôº ¿¾½ 12 colormap ( cool ) ; view (35 ,10) ; Ôº ¿¾¿

Illustration: columns = ONB of Im(A) rows = ONB of Ker(A) 13 14 p r in t −depsc2 ’ ../ PICTURES/ svdpca . eps ’ ;

Σr 0      

      H singular values: 0.2 A = U V (5.4.4)         0       3.1378       −0.2      0 0          1.8092 −0.4               0.1792 −0.6       −0.8             −1 We observe a gap between the second and third −1.2 Remark: MATLAB function r=rank(A) relies on svd(A) singular value ➣ the data points essentially lie −1.4 −1.6

in a 2D subspace.

Example 5.4.5 (Principal component analysis for data analysis).

Columns of A → series of measurements at different times/locations etc.
Rows of A    → measured values corresponding to one time/location etc.

Goal: detect linear correlations.

Concrete setting: two quantities measured over one year at 10 different sites (of course, the measurements are affected by errors/fluctuations):

n = 10; m = 50;
x = sin(pi*(1:m)'/m);              % 1st model vector
y = cos(pi*(1:m)'/m);              % 2nd model vector
A = [];
for i = 1:n
  A = [A, x.*rand(m,1), y+0.1*rand(m,1)];
end

[Fig. 60: the 20 measurement series (columns of A) plotted against the measurement index.]

[Figure: distribution of the singular values of the matrix: two dominant singular values!]

➤ The measurements display a linear correlation with two principal components.

  principal components    =  u_{:,1}, u_{:,2}   (leftmost columns of the U-matrix of the SVD)
  their relative weights  =  v_{:,1}, v_{:,2}   (leftmost columns of the V-matrix of the SVD)

[Fig. 61: the two principal components and the contribution of each principal component to every measurement series.]

• Application of the SVD: extrema of quadratic forms on the unit sphere

A minimization problem on the Euclidean unit sphere {x ∈ K^n: ||x||_2 = 1}:

  given A ∈ K^{m,n}, m > n,   find x ∈ K^n, ||x||_2 = 1,   ||Ax||_2 → min .                       (5.4.5)

Use that multiplication with unitary matrices preserves the 2-norm (→ Thm. ??) and the singular value decomposition A = UΣV^H (→ Def. 5.4.2):

  min_{||x||_2=1} ||Ax||_2²  =  min_{||x||_2=1} ||UΣV^H x||_2²  =  min_{||V^H x||_2=1} ||UΣ(V^H x)||_2²
                             =  min_{||y||_2=1} ||Σy||_2²  =  min_{||y||_2=1} ( σ1² y1² + ··· + σn² yn² )  ≥  σn² .

The minimum σn² is attained for y = e_n  ⇒  minimizer x = V e_n = (V)_{:,n}.

By similar arguments:

  σ1 = max_{||x||_2=1} ||Ax||_2 ,   (V)_{:,1} = argmax_{||x||_2=1} ||Ax||_2 .                      (5.4.6)

Recall: the 2-norm of the matrix A (→ Def. 2.4.2) is defined as the maximum in (5.4.6). Thus we have proved the following result:

Lemma 5.4.5 (SVD and Euclidean matrix norm).
 • ∀A ∈ K^{m,n}:           ||A||_2 = σ1(A) ,
 • ∀A ∈ K^{n,n} regular:   cond_2(A) = σ1/σn .
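A quick MATLAB check of Lemma 5.4.5 (a sketch, not part of the notes):

% Sketch: 2-norm and 2-norm condition number from singular values
A  = rand(8,5);  s = svd(A);
abs(norm(A) - s(1))              % ||A||_2 = sigma_1
B  = rand(6,6) + 6*eye(6);       % regular square test matrix
sB = svd(B);
abs(cond(B) - sB(1)/sB(end))     % cond_2(B) = sigma_1/sigma_n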

5º4 • Application of SVD: best low rank approximation 5º4

Ôº ¿¾6 Ôº ¿¾8 ➋ Kn,n−k Find ONB {z1,..., zn−k} of Ker(B) and assemble it into a matrix Z = [z1 ... zn−k] ∈ Definition 5.4.6 (Frobenius norm). The Frobenius norm of A ∈ Km,n is defined as n−k n−k r m n 2 2 2 2 2 H 2 2 2 2 kA − BkF ≥ k(A − B)ZkF = kAZkF = kAzik2 = σj(vj zi) kAkF := |aij| . Xi=1 Xi=1 Xj=1 Xi=1 Xj=1

Obvious: kAkF invariant under unitary transformations of A Note: information content of a rank-k matrix M ∈ Km,n is equivalent to k(m + n) numbers! p Frobenius norm and SVD: kAk2 = σ2 (5.4.7) F j=1 j Approximation by low-rank matrices ↔ matrix compression P

✎ Km,n N notation: Rk(m, n) := {A ∈ : rank(A) ≤ k}, m, n, k ∈

5.5 Essential Skills Learned in Chapter 5

You should know:

5º4 5º5

Ôº ¿¾9 • complexity of the direct eigensolver eig of Matlab Ôº ¿¿½

• what Krylov methods are, and when and how to use them
• what the singular value decomposition is and how to use it
• applications of the SVD: principal component analysis, extrema of quadratic forms on the unit sphere, best low rank approximation
