STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 4

Date: October 10, 2018, version 1.0. Comments, bug reports: [email protected].

1. spectral radius

• the matrix 2-norm is also known as the spectral norm
• the name is connected to the fact that the norm is given by the square root of the largest eigenvalue of AᵀA, i.e., the largest singular value of A (more on this later)
• in general, the spectral radius ρ(A) of a matrix A ∈ ℂⁿˣⁿ is defined in terms of its largest eigenvalue:

      ρ(A) = max{|λᵢ| : Axᵢ = λᵢxᵢ, xᵢ ≠ 0}

• note that the spectral radius does not define a norm on ℂⁿˣⁿ; for example, the nonzero matrix

      J = [ 0  1 ]
          [ 0  0 ]

  has ρ(J) = 0 since both its eigenvalues are 0
• there are some relationships between the norm of a matrix and its spectral radius
• the easiest one is that

      ρ(A) ≤ ‖A‖

  for any matrix norm that satisfies the inequality ‖Ax‖ ≤ ‖A‖‖x‖ for all x ∈ ℂⁿ, i.e., a consistent norm
  – here's a proof: Axᵢ = λᵢxᵢ; taking norms, ‖Axᵢ‖ = ‖λᵢxᵢ‖ = |λᵢ|‖xᵢ‖; therefore

        |λᵢ| = ‖Axᵢ‖/‖xᵢ‖ ≤ ‖A‖;

    since this holds for every eigenvalue of A, it follows that maxᵢ|λᵢ| = ρ(A) ≤ ‖A‖
  – in particular this is true for any operator norm
  – it is in general not true for norms that do not satisfy the consistency inequality ‖Ax‖ ≤ ‖A‖‖x‖ (thanks to Likai Chen for pointing this out); for example, the matrix

        A = [ 1/√2   1/√2 ]
            [ 1/√2  −1/√2 ]

    is orthogonal and therefore ρ(A) = 1, but ‖A‖_{H,∞} = 1/√2, and so ρ(A) > ‖A‖_{H,∞}
  – exercise: show that any eigenvalue of a unitary or an orthogonal matrix must have absolute value 1
• on the other hand, the following characterization is true for any matrix norm, even inconsistent ones:

      ρ(A) = lim_{m→∞} ‖Aᵐ‖^{1/m}

• we can also get an upper bound for any particular matrix (but not for all matrices simultaneously)

Theorem 1. Let A ∈ ℂⁿˣⁿ and ε > 0. There exists an operator norm ‖·‖_α of the form

      ‖A‖_α = max_{x≠0} ‖Ax‖_α / ‖x‖_α,

where ‖·‖_α is a norm on ℂⁿ, such that ‖A‖_α ≤ ρ(A) + ε. The norm ‖·‖_α depends on A and ε.

• this result suggests that the largest eigenvalue of a matrix can be easily approximated
• here is an example: let

      A = [  2  −1                ]
          [ −1   2  −1            ]
          [      ⋱    ⋱    ⋱      ]
          [          −1   2  −1   ]
          [              −1   2   ]

  – the eigenvalues of this matrix, which arises frequently in numerical methods for solving differential equations, are known to be

        λⱼ = 2 + 2 cos(jπ/(n+1)),  j = 1, 2, …, n;

    the largest eigenvalue satisfies

        |λ₁| = 2 + 2 cos(π/(n+1)) ≤ 4

    and ‖A‖_∞ = 4, so in this case the ∞-norm provides an excellent approximation
  – on the other hand, suppose

        A = [ 1  10⁶ ]
            [ 0   1  ];

    we have ‖A‖_∞ = 10⁶ + 1 but ρ(A) = 1, so in this case the norm yields a poor approximation; however, with

        D = [ ε  0 ]
            [ 0  1 ]

    we get

        DAD⁻¹ = [ 1  10⁶ε ]
                [ 0    1  ]

    and ‖DAD⁻¹‖_∞ = 1 + 10⁶ε, which for sufficiently small ε yields a much better approximation to ρ(DAD⁻¹) = ρ(A) = 1
• if ‖A‖ < 1 for some submultiplicative norm, then ‖Aᵐ‖ ≤ ‖A‖ᵐ → 0 as m → ∞
• since ‖A‖ is a continuous function of the entries of A, it follows that Aᵐ → O, i.e., every entry of Aᵐ goes to 0
• however, if ‖A‖ > 1, it does not follow that ‖Aᵐ‖ → ∞
• for example, suppose

      A = [ 0.99  10⁶  ]
          [ 0     0.99 ];

  in this case ‖A‖_∞ > 1, yet we claim that because ρ(A) < 1, we still have Aᵐ → O and so ‖Aᵐ‖ → 0
• let us prove this more generally; in fact we claim the following

Lemma 1. lim_{m→∞} Aᵐ = O if and only if ρ(A) < 1.

Proof. (⇒) Let Ax = λx with x ≠ 0. Then Aᵐx = λᵐx. Taking limits,

      (lim_{m→∞} λᵐ) x = lim_{m→∞} λᵐx = lim_{m→∞} Aᵐx = (lim_{m→∞} Aᵐ) x = Ox = 0.

Since x ≠ 0, we must have lim_{m→∞} λᵐ = 0 and thus |λ| < 1. Hence ρ(A) < 1.
(⇐) Since ρ(A) < 1, there exists some operator norm ‖·‖_α such that ‖A‖_α < 1 by Theorem 1. So ‖Aᵐ‖_α ≤ (‖A‖_α)ᵐ → 0 and so Aᵐ → O. ∎
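• as a quick numerical illustration of the above (a Python/numpy sketch of our own; the matrix is the 2 × 2 example above and ε = 10⁻⁹ is an arbitrary choice), we can check that ρ(A) ≤ ‖A‖_∞, that the diagonal scaling DAD⁻¹ tightens the bound, and that ‖Aᵐ‖^{1/m} → ρ(A):

      import numpy as np

      # the 2 x 2 example above: rho(A) = 1 but the infinity-norm is 10^6 + 1
      A = np.array([[1.0, 1e6],
                    [0.0, 1.0]])

      rho = max(abs(np.linalg.eigvals(A)))     # spectral radius rho(A) = 1
      print(rho, np.linalg.norm(A, np.inf))    # 1.0  1000001.0, so rho(A) <= ||A||_inf

      # diagonal similarity scaling: D A D^{-1} has the same eigenvalues as A
      eps = 1e-9                               # arbitrary small parameter
      D = np.diag([eps, 1.0])
      print(np.linalg.norm(D @ A @ np.linalg.inv(D), np.inf))   # 1 + 10^6 eps = 1.001

      # the limit characterization rho(A) = lim ||A^m||^(1/m)
      for m in [1, 10, 100, 1000]:
          print(m, np.linalg.norm(np.linalg.matrix_power(A, m), np.inf) ** (1 / m))
          # decreases (slowly, for this A) toward rho(A) = 1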
• alternatively, the second part above may also be proved directly via the Jordan form of A and the expression for the kth power of an n_r × n_r Jordan block J_r with eigenvalue λ_r,

      J_r^k = [ λ_r^k  C(k,1)λ_r^{k−1}  C(k,2)λ_r^{k−2}  ⋯  C(k,n_r−1)λ_r^{k−(n_r−1)} ]
              [        λ_r^k            C(k,1)λ_r^{k−1}  ⋯            ⋮               ]
              [                 ⋱               ⋱        ⋱   C(k,2)λ_r^{k−2}          ]
              [                               λ_r^k          C(k,1)λ_r^{k−1}          ]
              [                                              λ_r^k                    ]

  valid for sufficiently large k, where C(k,j) denotes the binomial coefficient (see previous lecture); this route does not use Theorem 1
• in Homework 1 we will see that if ‖A‖ < 1 for some operator norm, then I − A is nonsingular

2. Gerschgorin's theorem

• for A = [a_ij] ∈ ℂⁿˣⁿ and i = 1, …, n, we define the Gerschgorin discs

      G_i := {z ∈ ℂ : |z − a_ii| ≤ r_i},  where r_i := Σ_{j≠i} |a_ij|

• Gerschgorin's theorem says that the n eigenvalues of A are all contained in the union of the G_i's
• before we prove this, we need a result that is useful in its own right
• a matrix A ∈ ℂⁿˣⁿ is called strictly diagonally dominant if

      |a_ii| > Σ_{j≠i} |a_ij|  for all i = 1, …, n

• it is called weakly diagonally dominant if the '>' is replaced by '≥'

Lemma 2. A strictly diagonally dominant matrix is nonsingular.

Proof. Let A be strictly diagonally dominant. Suppose Ax = 0 for some x ≠ 0. Let k ∈ {1, …, n} be such that |x_k| = max_{i=1,…,n} |x_i|. Since x ≠ 0, we must have |x_k| > 0. Now observe that

      a_kk x_k = −Σ_{j≠k} a_kj x_j

and so by the triangle inequality,

      |a_kk||x_k| = |Σ_{j≠k} a_kj x_j| ≤ Σ_{j≠k} |a_kj||x_j| ≤ (Σ_{j≠k} |a_kj|)|x_k| < |a_kk||x_k|,

where the last inequality is by strict diagonal dominance. This is a contradiction. In other words, there is no nonzero vector x with Ax = 0. So ker(A) = {0} and A is nonsingular. ∎

• we are going to use this to prove the first part of Gerschgorin's theorem
• the second part requires a bit of topology

Theorem 2 (Gerschgorin). The spectrum of A is contained in the union of its Gerschgorin discs, i.e.,

      λ(A) ⊆ ⋃_{i=1}^{n} G_i.

Furthermore, the number of eigenvalues (counted with multiplicity) in each connected component of ⋃_{i=1}^{n} G_i is equal to the number of Gerschgorin discs that constitute that component.

Proof. Suppose z ∉ ⋃_{i=1}^{n} G_i. Then A − zI is a strictly diagonally dominant matrix (check!) and therefore nonsingular by the lemma above. Hence det(A − zI) ≠ 0 and so z is not an eigenvalue of A. This proves the first part.
For the second part, consider the matrix

      A(t) := [ a_11    t a_12  ⋯  t a_1n ]
              [ t a_21  a_22    ⋯  t a_2n ]
              [   ⋮       ⋮     ⋱    ⋮    ]
              [ t a_n1  t a_n2  ⋯  a_nn   ]

for t ∈ [0, 1]. Note that A(0) = diag(a_11, …, a_nn) and A(1) = A. We let G_i(t) be the ith Gerschgorin disc of A(t), so

      G_i(t) = {z ∈ ℂ : |z − a_ii| ≤ t r_i}.

Clearly G_i(t) ⊆ G_i for any t ∈ [0, 1]. By the first part, all eigenvalues of all the matrices A(t) are contained in ⋃_{i=1}^{n} G_i. Since the eigenvalues of A(t) depend continuously on the parameter t, A(0) must have the same number of eigenvalues as A(1) in each connected component of ⋃_{i=1}^{n} G_i. Now just observe that the eigenvalues of A(0) are simply the centers a_kk of the discs in each connected component. ∎
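• here is a short Python/numpy sketch of the first part of Theorem 2 (the helper name gerschgorin_discs and the random test matrix are our own choices, for illustration only): compute the centers a_ii and the radii r_i, then verify that every eigenvalue of A lies in at least one disc:

      import numpy as np

      def gerschgorin_discs(A):
          # i-th disc: all z with |z - A[i,i]| <= r_i,
          # where r_i is the sum of |a_ij| over j != i
          centers = np.diag(A)
          radii = np.abs(A).sum(axis=1) - np.abs(centers)
          return centers, radii

      rng = np.random.default_rng(0)
      A = rng.standard_normal((5, 5))          # a random 5 x 5 test matrix

      centers, radii = gerschgorin_discs(A)
      for lam in np.linalg.eigvals(A):
          # every eigenvalue lies in some disc (tiny tolerance for roundoff)
          assert any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))

  note the connection to Lemma 2: if A is strictly diagonally dominant, then no disc contains 0, so 0 is not an eigenvalue and A is nonsingular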
3. schur decomposition

• suppose we want a decomposition for arbitrary matrices A ∈ ℂⁿˣⁿ like the EVD for normal matrices, A = QΛQ*, i.e., diagonalizing with unitary matrices
• the way to obtain such a decomposition is to relax the requirement that the factor Λ in A = QΛQ* be diagonal and instead allow it to be upper triangular
• this gives the Schur decomposition:

      A = QRQ*                                                        (3.1)

• as in the EVD for normal matrices, Q is a unitary matrix, but

      R = [ r_11  r_12  ⋯  r_1n ]
          [ 0     r_22  ⋯  r_2n ]
          [  ⋮       ⋱   ⋱   ⋮  ]
          [ 0     ⋯    0  r_nn  ]

  is upper triangular
• note that the eigenvalues of A are precisely the diagonal entries of R: r_11, …, r_nn
• unlike the Jordan canonical form, the Schur decomposition is readily computable in finite precision via the QR algorithm
• the QR algorithm is based on the QR decomposition, which we will discuss later in this course
• in its most basic form, the QR algorithm does the following:

      input:      A_0 = A
      step k:     A_k = Q_k R_k        (perform QR decomposition)
      step k + 1: A_{k+1} = R_k Q_k    (multiply the QR factors in reverse order)

• under suitable conditions, one may show that Q_k → Q and R_k → R, where Q and R are the requisite factors in (3.1)
• in most undergraduate linear algebra classes, one is taught to find eigenvalues by solving for the roots of the characteristic polynomial p_A(x) = det(xI − A) = 0
• this is almost never done in practice
• for one, there is no finite algorithm for finding the roots of a polynomial when the degree exceeds four, by the famous impossibility result of Abel and Galois
• in fact, what happens is the opposite: the roots of a univariate polynomial (first divide by the coefficient of the highest-degree term so that it becomes monic),

      p(x) = c_0 + c_1 x + ⋯ + c_{n−1} x^{n−1} + x^n,

  are usually obtained as the eigenvalues of its companion matrix

      C_p = [ 0  0  ⋯  0  −c_0     ]
            [ 1  0  ⋯  0  −c_1     ]
            [ 0  1  ⋯  0  −c_2     ]
            [ ⋮  ⋮  ⋱  ⋮    ⋮      ]
            [ 0  0  ⋯  1  −c_{n−1} ]
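• to tie the last two points together, here is a bare-bones Python/numpy sketch of our own (unshifted, with no Hessenberg reduction or deflation, unlike practical QR implementations; the helper names companion and qr_iterate are ours) that finds the roots of a cubic as the eigenvalues of its companion matrix:

      import numpy as np

      def companion(c):
          # companion matrix C_p of the monic polynomial
          # p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1) + x^n
          n = len(c)
          C = np.zeros((n, n))
          C[1:, :-1] = np.eye(n - 1)    # ones on the subdiagonal
          C[:, -1] = -np.asarray(c)     # last column: -c_0, ..., -c_{n-1}
          return C

      def qr_iterate(A, iters=500):
          # basic QR algorithm: factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k;
          # each A_{k+1} = Q_k^* A_k Q_k is unitarily similar to A_k, so the
          # eigenvalues never change while A_k tends to an upper-triangular matrix
          Ak = np.array(A, dtype=float)
          for _ in range(iters):
              Q, R = np.linalg.qr(Ak)
              Ak = R @ Q
          return Ak

      # p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
      T = qr_iterate(companion([-6.0, 11.0, -6.0]))
      print(np.sort(np.diag(T)))        # approximately [1. 2. 3.]

  this is in fact how numpy.roots computes polynomial roots: it forms the companion matrix and hands it to an eigenvalue solver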