
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 LECTURE 5

1. Jordan canonical form

• if A is not diagonalizable and we want something like a diagonalization, then the best we can do is a Jordan canonical form, where we get

      A = XJX^{-1}                                                   (1.1)

  – the matrix J has the following characteristics
    ∗ it is not diagonal but it is the next best thing to diagonal, namely bidiagonal, i.e. only the entries a_ii and a_{i,i+1} can be non-zero; every other entry in J is 0
    ∗ the diagonal entries of J are precisely the eigenvalues of A, counted with multiplicity
    ∗ the superdiagonal entries a_{i,i+1} are as simple as they can be: each takes one of two possible values, a_{i,i+1} = 0 or 1
    ∗ if a_{i,i+1} = 0 for all i, then J is in fact diagonal and (1.1) reduces to the eigenvalue decomposition
  – the matrix J is more commonly viewed as a block diagonal matrix

      J = \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{bmatrix}

    ∗ each block J_r, for r = 1, . . . , k, has the form

      J_r = \begin{bmatrix} \lambda_r & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_r \end{bmatrix}

      where J_r is n_r × n_r
    ∗ clearly ∑_{r=1}^{k} n_r = n
  – the set of column vectors of X is called a Jordan basis of A
• in general the Jordan basis X includes all eigenvectors of A but also additional vectors that are not eigenvectors of A
• the Jordan canonical form provides valuable information about the eigenvalues of A
• the values λ_j, for j = 1, . . . , k, are the eigenvalues of A
• for each distinct eigenvalue λ, the number of Jordan blocks having λ as a diagonal element equals the number of linearly independent eigenvectors associated with λ; this number is called the geometric multiplicity of the eigenvalue λ
• the sum of the sizes of all of these blocks is called the algebraic multiplicity of λ
• we now consider the eigenvalues of J_r,

      λ(J_r) = λ_r, . . . , λ_r

   where λ_r is repeated n_r times, but because

0 1  . .  .. ..    Jr − λrI =  .   .. 1 0

   is a matrix of rank n_r − 1, it follows that the homogeneous system (J_r − λ_r I)x = 0 has only one linearly independent solution, and therefore there is only one eigenvector (up to a scalar multiple) associated with this Jordan block
• the unique unit vector that solves (J_r − λ_r I)x = 0 is the vector e_1 = [1, 0, . . . , 0]^T
• now consider the matrix

0 1  0 1  0 0 1  ......  .. ..   .. ..   ......        2  . .   . .   . .  (Jr − λrI) =  .. ..   .. ..  =  .. ..       1  .   .   .   .. 1  .. 1  .. 0 0 0 0

• it is easy to see that (J_r − λ_r I)^2 e_2 = 0
• continuing in this fashion, we can conclude that

      (J_r − λ_r I)^k e_k = 0,    k = 1, . . . , n_r − 1
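• aside (not in the original notes): a minimal numpy sketch of the two facts above, namely that J_r − λ_r I has rank n_r − 1 (so λ_r has only one eigenvector up to scaling) and that (J_r − λ_r I)^k e_k = 0; the block size n_r = 4 and eigenvalue λ_r = 2.0 are arbitrary choices

      # a sketch, assuming numpy is available; the size and eigenvalue are arbitrary
      import numpy as np

      n_r, lam = 4, 2.0
      Jr = lam * np.eye(n_r) + np.diag(np.ones(n_r - 1), k=1)  # single Jordan block
      N = Jr - lam * np.eye(n_r)                                # nilpotent part J_r - lam*I

      print(np.linalg.matrix_rank(N))                  # 3 = n_r - 1, so only one eigenvector

      e = np.eye(n_r)                                  # columns are e_1, ..., e_{n_r}
      print(np.allclose(N @ e[:, 0], 0))               # True: e_1 is the eigenvector

      for k in range(1, n_r):                          # (J_r - lam*I)^k e_k = 0
          print(k, np.allclose(np.linalg.matrix_power(N, k) @ e[:, k - 1], 0))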

• the Jordan form can be used to easily compute powers of a matrix
• for example,

      A^2 = XJX^{-1}XJX^{-1} = XJ^2X^{-1}

and, in general,

      A^k = XJ^kX^{-1}
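• aside (not in the original notes): a small numerical check of the identity A^k = XJ^kX^{-1}; the matrices X and J below are arbitrary illustrative choices (J a single 3×3 Jordan block, X any invertible matrix), not taken from the notes

      # a sketch, assuming numpy is available
      import numpy as np

      lam = 0.5
      J = np.array([[lam, 1.0, 0.0],
                    [0.0, lam, 1.0],
                    [0.0, 0.0, lam]])       # one 3x3 Jordan block
      X = np.array([[1.0, 2.0, 0.0],
                    [0.0, 1.0, 3.0],
                    [1.0, 0.0, 1.0]])       # any invertible matrix will do
      A = X @ J @ np.linalg.inv(X)

      k = 5
      lhs = np.linalg.matrix_power(A, k)
      rhs = X @ np.linalg.matrix_power(J, k) @ np.linalg.inv(X)
      print(np.allclose(lhs, rhs))          # True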

• due to its structure, it is easy to compute powers of a Jordan block J_r:

k λr 1  . .  .. ..  k   Jr =  .   .. 1  λr 0 1  . .  .. ..  k   = (λrI + K) ,K =  .   .. 1 0 k X k = λk−jKj j r j=0 2 which yields, for k > nr,  k−(n −1) λk kλk−1 kλk−2 ··· k λ r r 1 r 2 r nr−1 r  . . .   .. .. .    k  . . .  Jr =  .. .. .  (1.2)    .. .   . .  k λr • for example, λ 1 03 λ3 3λ2 3λ  0 λ 1 =  0 λ3 3λ2 0 0 λ 0 0 λ3 • we now consider an application of the Jordan canonical form – consider the system of differential equations 0 y (t) = Ay(t), y(t0) = y0 – using the Jordan form, we can rewrite this system as y0(t) = XJX−1y(t) – ultiplying through by X−1 yields X−1y0(t) = JX−1y(t) which can be rewritten as z0(t) = Jz(t) where z = Q−1y(t) – this new system has the initial condition −1 z(t0) = z0 = Q y0 – if we assume that J is a diagonal matrix (which is true in the case where A has a full set of linearly independent eigenvectors), then the system decouples into scalar equations of the form 0 zi(t) = λizi(t),

   where λ_i is an eigenvalue of A
  – this equation has the solution

      z_i(t) = e^{λ_i(t−t_0)} z_i(t_0),

   and therefore the solution to the original system is

      y(t) = X \begin{bmatrix} e^{λ_1(t−t_0)} & & \\ & \ddots & \\ & & e^{λ_n(t−t_0)} \end{bmatrix} X^{-1} y_0

• the Jordan canonical form suffers however from one major defect that makes it useless in practice: it cannot be computed in finite precision or in the presence of rounding errors in general, a result of Golub and Wilkinson
• that is why you won't find a Matlab function for the Jordan canonical form

2. spectral radius

• the matrix 2-norm is also known as the spectral norm
• the name is connected to the fact that this norm is given by the square root of the largest eigenvalue of A^T A, i.e. the largest singular value of A (more on this later)
• in general, the spectral radius ρ(A) of a matrix A ∈ C^{n×n} is defined in terms of its largest eigenvalue

      ρ(A) = max{|λ_i| : Ax_i = λ_i x_i, x_i ≠ 0}

• note that the spectral radius does not define a norm on C^{n×n}
• for example the non-zero matrix

      J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

  has ρ(J) = 0 since both its eigenvalues are 0
• there are some relationships between the norm of a matrix and its spectral radius
• the easiest one is that

      ρ(A) ≤ ‖A‖

  for any matrix norm that satisfies the inequality ‖Ax‖ ≤ ‖A‖‖x‖ for all x ∈ C^n
  – here's a proof: Ax_i = λ_i x_i; taking norms,

      ‖Ax_i‖ = ‖λ_i x_i‖ = |λ_i|‖x_i‖

    therefore

      |λ_i| = ‖Ax_i‖ / ‖x_i‖ ≤ ‖A‖

    since this holds for any eigenvalue of A, it follows that

      max_i |λ_i| = ρ(A) ≤ ‖A‖

  – in particular this is true for any operator norm
  – this is in general not true for norms that do not satisfy the inequality ‖Ax‖ ≤ ‖A‖‖x‖ (thanks to Likai Chen for pointing out); for example the matrix

" √1 √1 # A = 2 2 √1 − √1 2 2 √ is orthogonal and therefore ρ(A) = 1 but kAkH,∞ = 1/ 2 and so ρ(A) > kAkH,∞ – exercise: show that any eigenvalue of a unitary or an must have absolute value 1 • on the other hand, the following characterization is true for any matrix norm, even the inconsistent ones ρ(A) = lim kAmk1/m m→∞ • we can also get an upper bound for any particular matrix (but not for all matrices)

Theorem 1. For every ε > 0, there exists an operator norm ‖·‖_α such that

      ‖A‖_α ≤ ρ(A) + ε.

The norm is dependent on A and ε.

• this result suggests that the largest eigenvalue of a matrix can be easily approximated
• here is an example, let

      A = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}

  – the eigenvalues of this matrix, which arises frequently in numerical methods for solving differential equations, are known to be

      λ_j = 2 + 2 cos(jπ/(n + 1)),    j = 1, 2, . . . , n

    the largest eigenvalue is

      |λ_1| = 2 + 2 cos(π/(n + 1)) ≤ 4

   and ‖A‖_∞ = 4, so in this case, the ∞-norm provides an excellent approximation
  – on the other hand, suppose

      A = \begin{bmatrix} 1 & 10^6 \\ 0 & 1 \end{bmatrix}

    we have ‖A‖_∞ = 10^6 + 1, but ρ(A) = 1, so in this case the norm yields a poor approximation; however, suppose

      D = \begin{bmatrix} ε & 0 \\ 0 & 1 \end{bmatrix}

    then

      DAD^{-1} = \begin{bmatrix} 1 & 10^6 ε \\ 0 & 1 \end{bmatrix}

    and ‖DAD^{-1}‖_∞ = 1 + 10^6 ε (e.g. 1 + 10^{−6} for ε = 10^{−12}), which, for sufficiently small ε, yields a much better approximation to ρ(DAD^{-1}) = ρ(A).
• if ‖A‖ < 1 for some submultiplicative norm, then ‖A^m‖ ≤ ‖A‖^m → 0 as m → ∞
• since ‖A‖ is a continuous function of the elements of A, it follows that A^m → O, i.e. every entry of A^m goes to 0
• however, if ‖A‖ > 1, it does not follow that ‖A^m‖ → ∞
• for example, suppose

      A = \begin{bmatrix} 0.99 & 10^6 \\ 0 & 0.99 \end{bmatrix}

  in this case ‖A‖_∞ > 1, but we claim that because ρ(A) < 1, A^m → O and so ‖A^m‖ → 0
• let us prove this more generally, in fact we claim the following

Lemma 1. lim_{m→∞} A^m = O if and only if ρ(A) < 1.

Proof. (⇒) Let Ax = λx with x ≠ 0. Then A^m x = λ^m x. Taking limits,

      (lim_{m→∞} λ^m) x = lim_{m→∞} λ^m x = lim_{m→∞} A^m x = (lim_{m→∞} A^m) x = Ox = 0.

Since x ≠ 0, we must have lim_{m→∞} λ^m = 0 and thus |λ| < 1. Hence ρ(A) < 1.
(⇐) Since ρ(A) < 1, there exists some operator norm ‖·‖_α such that ‖A‖_α < 1 by Theorem 1. So ‖A^m‖_α ≤ ‖A‖_α^m → 0 and so A^m → O. Alternatively this part may also be proved directly via the Jordan form of A and (1.2) (without using Theorem 1). □

• in Homework 1 we will see that if ‖A‖ < 1 for some operator norm, then I − A is nonsingular

3. Gerschgorin's theorem

• for A = [a_ij] ∈ C^{n×n} and i = 1, . . . , n, we define the Gerschgorin discs

      G_i := {z ∈ C : |z − a_ii| ≤ r_i}    where    r_i := ∑_{j≠i} |a_ij|

• Gerschgorin's theorem says that the n eigenvalues of A are all contained in the union of the G_i's
• before we prove this, we need a result that is by itself useful
• a matrix A ∈ C^{n×n} is called strictly diagonally dominant if

      |a_ii| > ∑_{j≠i} |a_ij|    for i = 1, . . . , n

• it is called weakly diagonally dominant if the '>' is replaced by '≥'

Lemma 2. A strictly diagonally dominant matrix is nonsingular.

Proof. Let A be strictly diagonally dominant. Suppose Ax = 0 for some x ≠ 0. Let k ∈ {1, . . . , n} be such that |x_k| = max_{i=1,...,n} |x_i|. Since x ≠ 0, we must have |x_k| > 0. Now observe that

      a_kk x_k = − ∑_{j≠k} a_kj x_j

and so by the triangle inequality,

      |a_kk||x_k| = | ∑_{j≠k} a_kj x_j | ≤ ∑_{j≠k} |a_kj||x_j| ≤ ( ∑_{j≠k} |a_kj| ) |x_k| < |a_kk||x_k|

where the last inequality is by strict diagonal dominance. This is a contradiction. In other words, there is no non-zero vector x with Ax = 0. So ker(A) = {0} and so A is nonsingular. □

• we are going to use this to prove the first part of Gerschgorin's theorem
• the second part requires a bit of topology

Theorem 2 (Gerschgorin). The spectrum of A is contained in the union of its Gerschgorin discs, i.e.

      λ(A) ⊆ ∪_{i=1}^{n} G_i.

Furthermore, the number of eigenvalues (counted with multiplicity) in each connected component of ∪_{i=1}^{n} G_i is equal to the number of Gerschgorin discs that constitute that component.

Proof. Suppose z ∉ ∪_{i=1}^{n} G_i. Then A − zI is a strictly diagonally dominant matrix (check!) and therefore nonsingular by the above lemma. Hence det(A − zI) ≠ 0 and so z is not an eigenvalue of A. This proves the first part. For the second part, consider the matrix

      A(t) := \begin{bmatrix} a_{11} & ta_{12} & \cdots & ta_{1n} \\ ta_{21} & a_{22} & \cdots & ta_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ta_{n1} & ta_{n2} & \cdots & a_{nn} \end{bmatrix}

for t ∈ [0, 1]. Note that A(0) = diag(a_11, . . . , a_nn) and A(1) = A. We will let G_i(t) be the ith Gerschgorin disc of A(t). So G_i(t) = {z ∈ C : |z − a_ii| ≤ t r_i}. Clearly G_i(t) ⊆ G_i for any t ∈ [0, 1]. By the first part, all eigenvalues of all the matrices A(t) are contained in ∪_{i=1}^{n} G_i. Since the set of eigenvalues of the matrices A(t) depends continuously on the parameter t, A(0) must have the same number of eigenvalues as A(1) in each connected component of ∪_{i=1}^{n} G_i. Now just observe that the eigenvalues of A(0) are simply the centers a_kk of the discs in each connected component. □

4. singular value decomposition

• let A ∈ C^{m×n}, we can always write

      A = UΣV^*                                                      (4.1)

  – U ∈ C^{m×m} and V ∈ C^{n×n} are both unitary matrices

      U^*U = I_m = UU^*,    V^*V = I_n = VV^*

  – Σ ∈ C^{m×n} is a diagonal matrix in the sense that σ_ij = 0 if i ≠ j
  – if m > n, then Σ looks like

σ1  .  ..       σn Σ =    0 ··· 0   . .   . .  0 ··· 0 – if m < n, then Σ looks like   σ1 0 ··· 0  .. . . Σ =  . . . σm 0 ··· 0 – if m = n, then Σ looks like   σ1  ..  Σ =  .  σm

  – the diagonal elements of Σ, denoted σ_i, i = 1, . . . , min(m, n), are all nonnegative, and can be ordered such that

      σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0,    σ_{r+1} = · · · = σ_{min(m,n)} = 0

  – r is the rank of A
• this decomposition of A is called the singular value decomposition, or svd
  – the values σ_i, for i = 1, 2, . . . , min(m, n), are the singular values of A
  – the columns of U are the left singular vectors
  – the columns of V are the right singular vectors
• an alternative decomposition of A omits the singular values that are equal to zero:

      A = Ũ Σ̃ Ṽ^*

  – Ũ ∈ C^{m×r} is a matrix with orthonormal columns, i.e. satisfying Ũ^*Ũ = I_r (but not ŨŨ^* = I_m!)
  – Ṽ ∈ C^{n×r} is also a matrix with orthonormal columns, i.e. satisfying Ṽ^*Ṽ = I_r (but again not ṼṼ^* = I_n!)
  – Σ̃ is an r × r diagonal matrix with diagonal elements σ_1, . . . , σ_r
  – again r = rank(A)
  – the columns of Ũ are the left singular vectors corresponding to the nonzero singular values of A, and form an orthonormal basis for the range of A
  – the columns of Ṽ are the right singular vectors corresponding to the nonzero singular values of A, and form an orthonormal basis for the cokernel of A
  – this is called the condensed or compact or reduced svd
  – note that in this case, Σ̃ is a square matrix
• the form in (4.1) is sometimes called the full svd
• we may also write the reduced svd as a sum of rank-1 matrices

      A = σ_1 u_1 v_1^* + σ_2 u_2 v_2^* + · · · + σ_r u_r v_r^*

  – Ũ = [u_1, . . . , u_r], i.e. u_1, . . . , u_r ∈ C^m are the left singular vectors of A
  – Ṽ = [v_1, . . . , v_r], i.e. v_1, . . . , v_r ∈ C^n are the right singular vectors of A
  – note that for x = [x_1, . . . , x_m]^T ∈ C^m and y = [y_1, . . . , y_n]^T ∈ C^n,

      xy^T = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix},    xy^* = \begin{bmatrix} x_1 \bar{y}_1 & x_1 \bar{y}_2 & \cdots & x_1 \bar{y}_n \\ x_2 \bar{y}_1 & x_2 \bar{y}_2 & \cdots & x_2 \bar{y}_n \\ \vdots & \vdots & & \vdots \\ x_m \bar{y}_1 & x_m \bar{y}_2 & \cdots & x_m \bar{y}_n \end{bmatrix}

  – if neither x nor y is the zero vector, then rank(xy^T) = rank(xy^*) = 1
  – furthermore if rank(A) = 1, then there exist x ∈ C^m and y ∈ C^n so that A = xy^*
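• aside (not in the original notes): a minimal numpy sketch contrasting the full and reduced svd and checking the rank-1 expansion above; the 3×2 test matrix is an arbitrary choice, and note that numpy's full_matrices=False keeps min(m, n) singular triples, so a rank-deficient A would need a further truncation to r = rank(A)

      # a sketch, assuming numpy is available
      import numpy as np

      A = np.array([[1.0, 0.0],
                    [2.0, 0.0],
                    [0.0, 3.0]])                     # 3x2 matrix of rank 2

      # full svd: U is 3x3, Vh (= V^*) is 2x2, s holds singular values in decreasing order
      U, s, Vh = np.linalg.svd(A, full_matrices=True)
      print(U.shape, s, Vh.shape)                    # (3, 3) [3. 2.236...] (2, 2)

      # reduced svd: only min(m, n) columns of U are kept
      Ur, sr, Vhr = np.linalg.svd(A, full_matrices=False)
      print(Ur.shape, Vhr.shape)                     # (3, 2) (2, 2)

      # rank-1 expansion A = sum_i sigma_i u_i v_i^*; the rows of Vh already store v_i^*
      A_rebuilt = sum(sr[i] * np.outer(Ur[:, i], Vhr[i, :]) for i in range(len(sr)))
      print(np.allclose(A, A_rebuilt))               # True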
