Chapter 7

Powers of Matrices

7.1 Introduction

In chapter 5 we saw that many natural (repetitive) processes, such as the rabbit examples, can be described by a discrete linear system

(1) ~u_{n+1} = A~u_n, n = 0, 1, 2, ...,

and that the solution of such a system has the form

(2) ~u_n = A^n ~u_0.

In addition, we learned how to find an explicit expression for A^n, and hence for ~u_n, by using the constituent matrices of A. In this chapter, however, we will be primarily interested in “long range predictions”, i.e. we want to know the behaviour of ~u_n for n large (or as n → ∞). In view of (2), this is implied by an understanding of the limit

A^∞ = lim_{n→∞} A^n,

provided that this limit exists. Thus we shall:

1) Establish a criterion to guarantee the existence of this limit (cf. Theorem 7.6);
2) Find a quick method to compute this limit when it exists (cf. Theorems 7.7 and 7.8).

We then apply these methods to analyze two application problems:

1) The shipping of commodities (cf. section 7.6);
2) Rat mazes (cf. section 7.7).

As we saw in section 5.9, the latter give rise to Markov chains, which we shall study here in more detail (cf. section 7.7).

7.2 Powers of Numbers

As was mentioned in the introduction, we want to study here the behaviour of the powers A^n of a matrix A as n → ∞. However, before considering the general case, it is useful to first investigate the situation for 1 × 1 matrices, i.e. to study the behaviour of the powers a^n of a number a.

(a) Powers of Real Numbers

Let us first look at a few examples.

Example 7.1. Determine the behaviour of a^n for a = 2, −1/2, −1.

(a) a = 2:
  n   | 1    2    3    4     5     ···
  a^n | 2    4    8    16    32    ···        the sequence diverges (the limit doesn't exist)

(b) a = −1/2:
  n   | 1     2    3     4     5      ···
  a^n | −1/2  1/4  −1/8  1/16  −1/32  ···     lim_{n→∞} (−1/2)^n = 0: the sequence converges to 0

(c) a = −1:
  n   | 1    2    3    4    5    ···
  a^n | −1   1    −1   1    −1   ···          the limit doesn't exist (bounces back and forth)
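These three behaviours are easy to reproduce numerically; here is a minimal sketch in Python (the range of exponents shown is arbitrary):

for a in (2, -0.5, -1):
    print(a, [a**n for n in range(1, 9)])

The printed powers of 2 grow without bound, those of −1/2 shrink toward 0, and those of −1 oscillate between −1 and 1, just as in the tables above.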

These three examples generalize to yield the following facts.

Theorem 7.1. Let a ∈ R.
(a) If |a| < 1, then lim_{n→∞} a^n = 0.
(b) If |a| > 1, then the sequence {a^n} diverges.
(c) If |a| = 1, then lim_{n→∞} a^n exists if and only if a = 1 (and then lim_{n→∞} a^n = 1).

(b) Powers of Complex Numbers

Next, we consider the behaviour of the powers c^n of a complex number c = a + bi. Here we first need to clarify the notion of the limit of a sequence {c_n}_{n≥1} of complex numbers.

Definition. If c_n = a_n + ib_n, n = 1, 2, ..., is a sequence of complex numbers such that

a = lim_{n→∞} a_n and b = lim_{n→∞} b_n

exist, then we say that c = a + ib is the limit of {c_n}_{n≥1} and write

lim_{n→∞} c_n = a + ib.

Remarks. 1) If either lim_{n→∞} a_n or lim_{n→∞} b_n does not exist, then neither does lim_{n→∞} c_n.
2) The following rule gives an alternate definition of the limit of the sequence {c_n}_{n≥1} (and is sometimes useful for calculating the limit c):

lim_{n→∞} c_n = c ⇔ lim_{n→∞} |c_n − c| = 0,

where |·| denotes the complex absolute value. [To see this equivalence, write c_n = a_n + ib_n and c = a + ib, so |c_n − c|^2 = (a_n − a)^2 + (b_n − b)^2. Then lim c_n = c ⇔ lim a_n = a and lim b_n = b ⇔ lim |a_n − a| = 0 and lim |b_n − b| = 0 ⇔ lim(a_n − a)^2 = 0 and lim(b_n − b)^2 = 0 ⇔ lim((a_n − a)^2 + (b_n − b)^2) = 0 ⇔ lim |c_n − c|^2 = 0 ⇔ lim |c_n − c| = 0.]

3) We also observe that if c = lim_{n→∞} c_n exists, then we have

lim_{n→∞} |c_n| = |c|.

[For lim |c_n|^2 = lim(a_n^2 + b_n^2) = lim a_n^2 + lim b_n^2 = a^2 + b^2 = |c|^2, and hence lim |c_n| = |c|.]

Theorem 7.1 naturally extends to complex numbers in the following way.

Theorem 7.2. Let α ∈ C.
(a) If |α| < 1, then lim_{n→∞} α^n = 0. “The powers spiral in to 0.”
(b) If |α| > 1, then the sequence {α^n} diverges, since lim_{n→∞} |α^n| = ∞. “The powers spiral out to ∞.”
(c) If |α| = 1, then lim_{n→∞} α^n does not exist except if α = 1.

Proof. (a) If |α| < 1, then lim |α^n| = lim |α|^n = 0, and hence lim α^n = 0 by Remark 2) above (with c = 0). (Here we have used the multiplicative rule |α^n| = |α|^n of the absolute value.)
(b) If |α| > 1, then the limit lim |α^n| = lim |α|^n = ∞ does not exist, and hence neither does the limit lim α^n, by Remark 3) above.
(c) Suppose that β := lim α^n exists and that |α| = 1. Then |β| = lim |α^n| = lim |α|^n = 1, so β ≠ 0. Thus we have αβ = α lim α^n = lim α^{n+1} = β, so αβ = β and hence α = 1.

Corollary. lim_{n→∞} α^n exists if and only if α = 1 or |α| < 1. Moreover, lim_{n→∞} α^n = 0 if |α| < 1.

Example 7.2. Analyze the behaviour of the powers of α = 5/6 + (5/12)i, β = 11/12 + (11/24)i and γ = cos(2π/100) + i sin(2π/100).

Solution. Since |α| = (5/12)√5 ≈ .931694990 < 1, the powers α^n spiral in to 0, as is shown in figure 7.1. Since |β| = (11/24)√5 ≈ 1.024864490 > 1, the powers β^n spiral out to ∞, as is evident in figure 7.1. Finally, since |γ| = 1, the powers γ^n run around on the unit circle (cf. figure 7.1).


[Figure: the powers α^n, β^n and γ^n plotted in the complex plane.]

Figure 7.1: Powers of the numbers α = (5/12)(2 + i), β = (11/24)(2 + i) and γ = cos(2π/100) + i sin(2π/100)
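The spirals of figure 7.1 can be checked numerically; here is a small sketch in Python using the built-in complex type (the exponents shown are arbitrary):

import cmath

alpha = (5/12) * (2 + 1j)
beta = (11/24) * (2 + 1j)
gamma = cmath.exp(2j * cmath.pi / 100)   # cos(2*pi/100) + i*sin(2*pi/100)

for z in (alpha, beta, gamma):
    # |z^n| = |z|^n, so these absolute values reveal the three behaviours
    print(abs(z), [round(abs(z**n), 4) for n in (1, 10, 100)])

The powers of alpha shrink in absolute value, those of beta grow, and those of gamma remain on the unit circle, in accordance with Theorem 7.2.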

Exercises 7.2.

1. Calculate the following limits whenever they exist:
(a) lim_{n→∞} (1 + (−1)^n i)/(1 + 2^n i);  (b) lim_{n→∞} (5 + (−1)^n i)/(1 + 2^{10000} i);  (c) lim_{n→∞} ((−2)^n + i)/(1 + 2^n);
(d) lim_{n→∞} Σ_{k=0}^n ((1 + i)/2)^k;  (e) lim_{n→∞} Σ_{k=0}^n e^{ki(1+i)}.

2. (a) Find the first n for which |(3 + 4i)^n| > 10^7.
(b) Find the first n for which |(.3 + .4i)^n| < 10^{−7}.

3. Write out a detailed proof of Theorem 7.1.

7.3 Sequences of powers of matrices

We now extend the results of the previous section about limits of powers of numbers to limits of powers of matrices. Before doing so, we first have to clarify the meaning of a limit for matrices.

Definition. Let A_1, A_2, A_3, ... be a sequence of m × n matrices. Then we say that

A = lim_{k→∞} A_k

if we have A = (a_ij) and lim_{k→∞} a_ij^{(k)} = a_ij, for all i, j with 1 ≤ i ≤ m and 1 ≤ j ≤ n, where

  A_k = ( a_11^{(k)}  ···  a_1n^{(k)} )
        (    ···             ···     )
        ( a_m1^{(k)}  ···  a_mn^{(k)} ).

Example 7.3. (a) If

  A_k = ( 1/k     1       )
        ( 1/k^2   (k+1)/k ),

then

  lim_{k→∞} A_k = ( 0  1 )
                  ( 0  1 )

because lim_{k→∞} 1/k = 0 = lim_{k→∞} 1/k^2 and lim_{k→∞} (k + 1)/k = 1.

(b) lim_{k→∞} ( (−1)^k  0 ; 0  1 ) does not exist!

Theorem 7.3 (Rules for limits). Suppose that lim_{k→∞} A_k = A exists and let B, C be matrices such that the product C A_k B is defined. Then:

(a) lim_{k→∞} (A_k B) = AB,
(b) lim_{k→∞} (C A_k) = CA,
(c) lim_{k→∞} (C A_k B) = CAB.

Proof. Use the definition of limits of matrices and the properties of (usual) limits.

Definition. A square matrix A is called power convergent if lim_{k→∞} A^k exists.

Example 7.4. (a) We have

  lim_{k→∞} ( 1/2 0 0 ; 0 1/3 0 ; 0 0 1/4 )^k = lim_{k→∞} ( (1/2)^k 0 0 ; 0 (1/3)^k 0 ; 0 0 (1/4)^k ) = ( 0 0 0 ; 0 0 0 ; 0 0 0 ),

so A = Diag(1/2, 1/3, 1/4) is power convergent.
(b) A = ( 1 0 ; 0 −1 ) is not power convergent, for A^k = ( 1 0 ; 0 (−1)^k ), and we know that the sequence (−1)^k does not converge.
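Example 7.4 is easy to observe numerically; a minimal numpy sketch (the exponent 50 is arbitrary):

import numpy as np

A = np.diag([1/2, 1/3, 1/4])
B = np.diag([1.0, -1.0])

print(np.linalg.matrix_power(A, 50))   # essentially the zero matrix
print(np.linalg.matrix_power(B, 50))   # even power: equals I
print(np.linalg.matrix_power(B, 51))   # odd power: equals B, so no limit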

In view of Theorem 7.2, the above examples show the following general rule:

Theorem 7.4. A diagonal matrix A = Diag(λ_1, λ_2, ..., λ_m) is power convergent if and only if for each i with 1 ≤ i ≤ m we have:

(3) either |λ_i| < 1 or λ_i = 1.

More generally, a diagonable matrix A is power convergent if and only if each eigenvalue λ_i of A satisfies (3).

Proof. Since A is diagonable, there is a matrix P such that D := P^{-1}AP is a diagonal matrix D = Diag(λ_1, ..., λ_m). Thus, for every k ≥ 1 we have A^k = (PDP^{-1})^k = PD^kP^{-1}, by Theorem 6.1. Thus, by Theorem 7.3:

lim_{k→∞} A^k = lim_{k→∞} PD^kP^{-1} = P (lim_{k→∞} D^k) P^{-1} = P (lim_{k→∞} Diag(λ_1^k, ..., λ_m^k)) P^{-1}.

This means that

A is power convergent ⇔ D is power convergent ⇔ (3) holds for all λ_i's (the latter by Theorem 7.2).

Remark. Note that the above proof also shows the following more general fact: If A and B = P^{-1}AP are two similar matrices, then A is power convergent ⇔ B is power convergent. Moreover, if this is the case then we have lim_{k→∞} A^k = P (lim_{k→∞} B^k) P^{-1}.

When is a general matrix power convergent? Following the strategy introduced in section 6.6, we first analyze this question for Jordan blocks, then for Jordan matrices and finally for arbitrary matrices.

Theorem 7.5. A Jordan block J = J(λ, m) is power convergent if and only if

(4) either |λ| < 1, or λ = 1 and m = 1.

Example 7.5. a) If J = J(1/2, 2) = ( 1/2 1 ; 0 1/2 ), then

  J^k = ( (1/2)^k  k(1/2)^{k−1} ; 0  (1/2)^k ) → ( 0 0 ; 0 0 ).

Thus, J is power convergent and converges to 0.
b) If J = J(1, 2) = ( 1 1 ; 0 1 ), then J^k = ( 1 k ; 0 1 ), so the sequence does not converge, i.e. J is not power convergent.
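The decay of J^k in part a) can be confirmed numerically (a sketch; the chosen exponents are arbitrary):

import numpy as np

J = np.array([[0.5, 1.0],
              [0.0, 0.5]])
for k in (2, 10, 40):
    print(k, np.linalg.matrix_power(J, k))
# both the diagonal entries (1/2)^k and the corner entry k*(1/2)^(k-1) tend to 0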

Proof of Theorem 7.5. Recall from chapter 6, Theorem 6.5, that

  J(λ, m)^n = ( λ^n   nλ^{n−1}   ···   (n choose m−1) λ^{n−m+1} )
              ( 0     λ^n        ···   ...                      )
              ( ...              ...   nλ^{n−1}                 )
              ( 0     ···        0     λ^n                      ),

i.e. J(λ, m)^n = (a_ij^{(n)}), where a_ij^{(n)} = 0 if i > j, and a_ij^{(n)} = (n choose j−i) λ^{n−(j−i)} if i ≤ j.

We now check the following four different cases:

Case 1: |λ| ≥ 1 (but λ ≠ 1). Since the sequence a_11^{(n)} = λ^n doesn't converge for n → ∞ (cf. Theorem 7.2), neither does J(λ, m)^n.

Case 2: λ = 1 and m > 1. Since the sequence a_12^{(n)} = nλ^{n−1} = n does not converge for n → ∞, neither does J(λ, m)^n. (Note that since m > 1, this entry is actually present.)

Case 3: λ = 1 and m = 1. The sequence of 1 × 1 matrices {(1)^n} clearly converges to the 1 × 1 matrix J_∞ = (1).

Case 4: |λ| < 1. In this case we shall show that {a_ij^{(n)}} converges for all i, j. In view of the above formula for a_ij^{(n)}, this follows once we have shown:

Claim. |λ| < 1 ⇒ lim_{n→∞} (n choose a) λ^{n−a} = 0 for all a ∈ Z (a ≥ 0).

Proof of claim. Since (n choose a) = n(n−1)···(n−a+1)/a! ≤ n · n · ... · n = n^a, we have

  |(n choose a) λ^{n−a}| ≤ |n^a λ^{n−a}| = (1/|λ|^a)|n^a λ^n| = (1/|λ|^a) n^a e^{n log|λ|}.

We now use the following fact from Analysis:

  lim_{t→∞} t^a e^{−tb} = 0, if a, b > 0,

i.e. exponentials grow much faster than polynomials. This is applicable here because −b = log|λ| < 0 (since |λ| < 1), and so we see that (n choose a) λ^{n−a} → 0 as n → ∞. This, therefore, proves the claim.

Putting these four cases together yields the assertion of Theorem 7.5.

For future reference, we note that in the course of the above proof we have also shown:

Corollary. If |λ| < 1, then lim_{n→∞} J(λ, m)^n = 0.

We now want to extend Theorem 7.5 to general matrices. To state the result in a convenient form, we first introduce the following two definitions.

Definition. An eigenvalue λ_i of a matrix A is called regular if its geometric multiplicity is the same as its algebraic multiplicity, i.e. if m_A(λ_i) = ν_A(λ_i).

Remark. Actually, this concept was already introduced in section 6.2. Recall that we saw there that an eigenvalue λ_i of a matrix A is regular if and only if all the Jordan blocks with eigenvalue λ_i of its associated Jordan canonical form J are of size 1 × 1; cf. Corollary 3 of Theorem 6.4. In particular, A is diagonable if and only if all of its eigenvalues are regular (cf. Theorem 6.4, Corollary 2′).

Example 7.6. If A = ( 5 0 ; 0 5 ), then 5 is a regular eigenvalue: ν_A(5) = m_A(5) = 2. If B = ( 5 1 ; 0 5 ), then 5 is not a regular eigenvalue: ν_B(5) = 1, m_B(5) = 2.

Definition. An eigenvalue λ_i of A is called dominant if we have

|λ_j| < |λ_i|, for every eigenvalue λ_j ≠ λ_i of A.

Example 7.7. a) If A = Diag(1, 2, 3, 4, 4), then 4 is a dominant eigenvalue of A.
b) If A = Diag(1, 2, 3, 4, −4), then there is no dominant eigenvalue because |−4| = |4|.
c) If A = Diag(1, i), there is no dominant eigenvalue.

We can now characterize power convergent matrices as follows.

Theorem 7.6. A square matrix A is power convergent if and only if

(5) either |λ_i| < 1 for all eigenvalues λ_i of A, or 1 is a regular, dominant eigenvalue of A.

Example 7.8. a) Determine whether A = ( 0 −1 ; 1 0 ) is power convergent.
Here ch_A(t) = t^2 + 1 = (t − i)(t + i), so λ_1 = i, λ_2 = −i. Thus A is not power convergent since |λ_1| = |λ_2| = 1. [In fact: A^2 = −I and A^4 = I, so A is clearly not power convergent.]
b) A = ( 1 0 0 ; 0 1/2 1 ; 0 0 1/2 ) is power convergent: since ch_A(t) = (t − 1)(t − 1/2)^2, we see that λ_1 = 1 is a dominant, regular eigenvalue (the latter because ν_A(λ_1) = m_A(λ_1) = 1).

Proof of Theorem 7.6. Step 1. The theorem is true for Jordan blocks J(λ, m) by Theorem 7.5.

Step 2. It is true for Jordan matrices J = Diag(J_11, ...) because it is true for each block by step 1; cf. Problem 4 of Exercises 7.3.
Step 3. If A is an arbitrary matrix, then by Jordan's theorem A = PJP^{-1}, for some Jordan matrix J. Thus, by the remark following Theorem 7.4 we see that A is power convergent ⇔ J is power convergent ⇔ (5) holds for J (by step 2) ⇔ (5) holds for A.
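Theorem 7.6 translates directly into a numerical test. The sketch below checks condition (5) with numpy; the function name and the tolerance handling are our own (hypothetical) choices, and regularity of the eigenvalue 1 is tested by comparing its algebraic multiplicity with the dimension of the 1-eigenspace obtained from a rank computation:

import numpy as np

def is_power_convergent(A, tol=1e-9):
    lam = np.linalg.eigvals(A)
    m = A.shape[0]
    if np.all(np.abs(lam) < 1 - tol):
        return True                      # first alternative of (5)
    # otherwise 1 must be a regular, dominant eigenvalue
    alg_mult = int(np.sum(np.abs(lam - 1) < tol))
    if alg_mult == 0:
        return False
    others = lam[np.abs(lam - 1) >= tol]
    dominant = np.all(np.abs(others) < 1 - tol)
    geo_mult = m - np.linalg.matrix_rank(A - np.eye(m), tol=tol)
    return bool(dominant and geo_mult == alg_mult)

A = np.array([[1, 0, 0], [0, 0.5, 1], [0, 0, 0.5]])
print(is_power_convergent(A))   # True, as in Example 7.8 b)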

Exercises 7.3.

1. Which of the following matrices are power convergent? Justify your answer.

 1 1 1   5 −8 2   1 1   0 1  1 ; 1 ; 1 0 3 2 ; 1 4 −7 2 . 2 −1 3 2 −2 3 3   3   4 −2 1 0 0 1

2. Determine which of the following matrices are power convergent.

 0 0 1 1   0 0 0 1 

1  0 0 −1 3   0 0 −1 0  A =   B =   . 2  0 1 0 0   0 1 0 0  −2 3 0 0 2 0 0 0

 1 0 1 0 

1  0 1 0 1  3. (a) Is the matrix A =   power convergent? Explain. 2  1 0 1 0  0 1 0 1  0 1 0  (b) Find all α ∈ C such that the matrix B =  0 0 1  is power conver- α3 −3α2 3α gent. Justify your answer!

4. (a) Show that A = Diag(B,C) is power convergent if and only if both B and C are power convergent.

(b) Show that A = Diag(A1,...,Ar) is power convergent if and only if each Ai, 1 ≤ i ≤ r, is power convergent.

5. Show that if A is a power convergent matrix then | det(A)| ≤ 1.

6. (a) If A is an integral 2 × 2 matrix which is power convergent, prove that its only eigenvalues are 0 and 1.
(b) Use part (a) to write down all integral 2 × 2 matrices which are power convergent.

7.4 Finding lim_{n→∞} A^n (when A is power convergent)

Having analyzed when A is power convergent, i.e. when the limit lim_{n→∞} A^n exists, we now want to determine this limit.

Naive method: find an explicit formula for A^n and use it to calculate the limit of each entry.

Better method: use the spectral decomposition theorem (without calculating all the E_ik's explicitly).

Example 7.9. If A = ( 0 1 ; −2 3 ), B = (1/3)A and C = (1/2)A, find lim_{n→∞} B^n and lim_{n→∞} C^n.

Solution. (a) Here ch_A(t) = (t − 1)(t − 2),
⇒ ch_B(t) = (t − 1/3)(t − 2/3)
⇒ B is power convergent and B^n = (1/3)^n E_10 + (2/3)^n E_20
⇒ lim_{n→∞} B^n = 0 since (1/3)^n, (2/3)^n → 0.

(b) As before, ch_A(t) = (t − 1)(t − 2),
⇒ ch_C(t) = (t − 1/2)(t − 1)
⇒ C is power convergent and C^n = (1/2)^n E_10 + 1^n E_20
⇒ lim_{n→∞} C^n = E_20.

Thus, to find the limit, it is enough to calculate E_20. Using the formula for the constituent (polynomials and) matrices (cf. (9) and (13)) we obtain

  E_20 = (1/(λ_2 − λ_1))(C − λ_1 I) = 2C − I = ( −1 1 ; −2 2 ),

and hence lim_{n→∞} C^n = ( −1 1 ; −2 2 ).

Note that in the above example, we were able to calculate the limits without first calculating B^n and C^n in detail. Generalizing this idea leads to the following two theorems.
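A quick numerical check of part (b) (a numpy sketch; the exponent 60 is arbitrary, large enough that (1/2)^n is negligible):

import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])
C = A / 2
E20 = A - np.eye(2)                    # = 2C - I, the limit computed above
print(np.linalg.matrix_power(C, 60))   # agrees with E20 to machine precision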

Theorem 7.7. If |λ_i| < 1 for every eigenvalue λ_i of A, then A is power convergent and lim_{n→∞} A^n = 0.

Proof. By the Corollary of Theorem 7.5, this is true for Jordan blocks. Using the same argument as in the proof of Theorem 7.6, the assertion follows for an arbitrary matrix.

Theorem 7.8. If λ_1 = 1 is a regular, dominant eigenvalue of A, then A is power convergent and we have

  lim_{n→∞} A^n = E_10 ≠ 0,

where E_10 is the first constituent matrix of A associated to λ_1 = 1. Moreover, the other constituent matrices E_1k associated to λ_1 = 1 are equal to zero:

E_11 = ··· = E_{1,m_1−1} = 0.

Corollary. lim_{n→∞} A^n = 0 ⇔ |λ_i| < 1, for all eigenvalues λ_i of A.

Proof. (⇐) Theorem 7.7.
(⇒) If lim_{n→∞} A^n = 0, then A is in particular power convergent. By Theorem 7.6, A satisfies either the hypothesis of Theorem 7.7 or that of Theorem 7.8. However, in the latter case we have lim A^n = E_10 ≠ 0, so we must be in the situation of Theorem 7.7.

It remains to prove Theorem 7.8. Before proving it, however, let us illustrate it with the following example.

Example 7.10. Find lim_{n→∞} A^n when A = (1/4)B and

  B = ( 3  0  0  1 )
      ( 1  3 −1 −1 )
      ( 1 −1  3 −1 )
      ( 1  0  0  3 ).

Solution. Expanding det(B − tI) along the 1st or last row yields ch_B(t) = (t − 4)^2(t − 2)^2, and so ch_A(t) = (t − 1)^2(t − 1/2)^2. Row reducing B − 4I yields

  B − 4I = ( −1  0  0  1 )      ( 1  0  0 −1 )
           (  1 −1 −1 −1 )  ↔   ( 0 −1 −1  0 )
           (  1 −1 −1 −1 )      ( 0  0  0  0 )
           (  1  0  0 −1 )      ( 0  0  0  0 ),

and so we see that ν_A(1) = ν_B(4) = 4 − 2 = 2 = m_A(1); thus, 1 is a regular eigenvalue. Moreover, 1 is also dominant (because |λ_2| = 1/2 < 1), so A is power convergent. Thus, by Theorem 7.8 we have lim_{n→∞} A^n = E_10 (and E_11 = 0).

To find the limit E_10, write down the spectral decomposition theorem:

  f(A) = f(1)E_10 + f′(1)E_11 + f(1/2)E_20 + f′(1/2)E_21, for every f ∈ C[t].

Choose f such that f(1/2) = f′(1/2) = 0, i.e. f(t) = (2t − 1)^2. Then:

  (2A − I)^2 = f(1)E_10 + f′(1)E_11 + 0·E_20 + 0·E_21 ⇒ E_10 = (2A − I)^2,

since f(1) = (2·1 − 1)^2 = 1 and E_11 = 0. Thus

  lim_{n→∞} A^n = E_10 = (2A − I)^2 = (1/4)(B − 2I)^2 = (1/4)( 1 0 0 1 ; 1 1 −1 −1 ; 1 −1 1 −1 ; 1 0 0 1 )^2 = (1/4)( 2 0 0 2 ; 0 2 −2 0 ; 0 −2 2 0 ; 2 0 0 2 ).

To prove Theorem 7.8, we first verify the following fact:

Theorem 7.9. If λ_1 is a regular eigenvalue of A, then its associated constituent matrices E_10, ..., E_{1,m_1−1} satisfy:

  E_10 ≠ 0 and E_11 = ... = E_{1,m_1−1} = 0.

Proof. We follow the usual strategy by considering first (special) Jordan matrices, and then arbitrary matrices.

Case 1. A = J is a Jordan matrix with only one (regular) eigenvalue λ_1. Then, since λ_1 is a regular eigenvalue, we have J = λ_1 I, and so for every f ∈ C[t]

  f(J) = f(λ_1)I = f(λ_1)I + f′(λ_1)·0 + ... + f^{(m_1−1)}(λ_1)·0.

On the other hand, by the spectral decomposition theorem we have

  f(J) = f(λ_1)E_10 + f′(λ_1)E_11 + ... + f^{(m_1−1)}(λ_1)E_{1,m_1−1}, for all f ∈ C[t],

and so it follows that

  E_10 = I, E_11 = 0, ..., E_{1,m_1−1} = 0,

either by appealing to the uniqueness property of the spectral decomposition or, more directly, as follows. Take f(t) = 1. Then f(λ_1) = 1 and f′(λ_1) = ··· = f^{(m_1−1)}(λ_1) = 0, so I = f(λ_1)I = f(J) =

1·E_10 + 0·E_11 + ··· + 0·E_{1,m_1−1}, which means that E_10 = I. Next, take f(t) = (t − λ_1)^k (k > 0). Then f^{(k)}(λ_1) = k! and f^{(l)}(λ_1) = 0 for l ≠ k, so k!·E_1k = f(J) = f(λ_1)I = 0 and hence E_1k = 0.

Case 2. A = Diag(J_1, J_2), where J_1 = λ_1 I and λ_1 is not an eigenvalue of J_2. Then E_1k^A = Diag(E_1k^{J_1}, 0) (because λ_1 is not an eigenvalue of J_2; cf. Theorem 7.11(b) below). But by case 1 we have E_1k^{J_1} = 0, for 1 ≤ k ≤ m_1 − 1, and so it also follows that E_1k^A = 0 for k ≥ 1. Moreover, E_10^A = Diag(I, 0) ≠ 0. Thus, the assertion is true in this case as well.

Case 3. A arbitrary.

By Jordan's theorem (and the fact that λ_1 is regular), there is a matrix P such that

  J = P^{-1}AP = ( λ_1 I  0 ; 0  J_2 )

has the form of case 2. Thus, since A = PJP^{-1}, we have for 1 ≤ k ≤ m_1 − 1 that

  E_1k^A = P E_1k^J P^{-1} = P·0·P^{-1} = 0 (by case 2).

Moreover, E_10^A = P E_10^J P^{-1} = P Diag(I_{m_1}, 0) P^{-1} ≠ 0, and so the theorem follows.

Corollary. If λ_i is a regular eigenvalue of A, then

  E_i0 = (1/g_i(λ_i)) g_i(A), where g_i(t) = ch_A(t)/(t − λ_i)^{m_i}.

Proof. Take f(t) = g_i(t) in the spectral decomposition formula. Since g_i(λ_j) = 0 if λ_j ≠ λ_i, and E_ik = 0 for k ≥ 1 by Theorem 7.9, the formula reduces to g_i(A) = g_i(λ_i)E_i0. Thus, the assertion follows since g_i(λ_i) ≠ 0.

Proof of Theorem 7.8. Let λ_1 = 1, λ_2, ..., λ_s denote the distinct eigenvalues of A and m_1, ..., m_s their respective algebraic multiplicities. Then by the spectral decomposition theorem applied to f(t) = t^n we obtain

  A^n = Σ_{i=1}^s Σ_{k=0}^{m_i−1} k! (n choose k) λ_i^{n−k} E_ik

because f^{(k)}(t) = n(n−1)···(n−k+1) t^{n−k} = k! (n choose k) t^{n−k}. By hypothesis, λ_1 = 1 is a regular eigenvalue and so E_11 = ... = E_{1,m_1−1} = 0 by Theorem 7.9. Thus, since λ_1^n = 1 for all n, we obtain

  A^n = E_10 + Σ_{i=2}^s Σ_{k=0}^{m_i−1} k! (n choose k) λ_i^{n−k} E_ik.

Now since λ_1 = 1 is a dominant eigenvalue, we have |λ_i| < 1 for all i > 1, and so by the claim in the proof of Theorem 7.5 the coefficients of the above sum tend to 0 as n → ∞. Thus, lim_{n→∞} A^n = E_10, and E_10 ≠ 0 by Theorem 7.9.

The above formula for E_i0 (cf. Theorem 7.9, Corollary) can be used to quickly compute the limit lim_{n→∞} A^n in many cases.

Example 7.11. Find lim_{n→∞} A^n when A = (1/4)( 3 0 −1 ; 0 4 0 ; −1 0 3 ).

Solution. First note that since A is real and symmetric, it is diagonable by the Principal Axis Theorem 5.4, and so we could compute the limit by diagonalizing A. However, it is quicker to use Theorem 7.9 and the above corollary; the latter is applicable since all eigenvalues of a diagonable matrix are regular. The characteristic polynomial of A is

  ch_A(t) = (−1)^3 det(A − tI) = −(1/4^3) det( 3−4t  0  −1 ; 0  4−4t  0 ; −1  0  3−4t )
          = −(1/4^3)((4 − 4t)(3 − 4t)^2 − (4 − 4t)) = (t − 1)^2 (t − 1/2).

Thus, 1 is a dominant, regular eigenvalue of A. Since g_1(t) = ch_A(t)/(t − 1)^2 = t − 1/2, it follows from Theorem 7.8 and the above corollary to Theorem 7.9 that

 1 0 −1  n g1(A) 1 lim A = E10 = = 2A − I =  0 2 0  . n→∞ g (1) 2 1 −1 0 1

In the case that λ_i is a simple eigenvalue, there is a more direct formula for E_i0. Here, the term “simple” means:

Definition. An eigenvalue λ_i of A is called simple if its algebraic multiplicity is 1, i.e. if m_A(λ_i) = 1.

Note. Clearly, each simple eigenvalue is regular. [Indeed, if λ is a simple eigenvalue then 1 = m_A(λ) ≥ ν_A(λ) ≥ 1, so m_A(λ) = ν_A(λ), i.e. λ is regular.]

Before giving the formula for E_i0 when λ_i is simple, we first note the following observation.

Observation. A square matrix A and its transpose A^t have the same eigenvalues with the same algebraic and geometric multiplicities; i.e. we have

  m_{A^t}(λ) = m_A(λ) and ν_{A^t}(λ) = ν_A(λ), for all λ ∈ C.

Indeed, since det(A^t) = det(A), we see that ch_{A^t}(t) = ch_A(t), and so the algebraic multiplicities are the same. Furthermore, ν_{A^t}(λ) = n − rank(A^t − λI) = n − rank((A − λI)^t) = n − rank(A − λI) = ν_A(λ), and so the geometric multiplicities are also the same. In fact, one can also show that A and A^t are similar (see Problem 4 of the exercises), so they actually have the same Jordan canonical form.

Theorem 7.10. If A is power convergent and 1 is a simple eigenvalue of A, then

  lim_{n→∞} A^n = E_10 = (1/(~u^t · ~v)) ~u · ~v^t,

where ~u^t · ~v is a scalar and ~u · ~v^t is a matrix, and where:
~u ∈ E_A(1) is any non-zero 1-eigenvector of A, and
~v ∈ E_{A^t}(1) is any non-zero 1-eigenvector of A^t.

Proof. Let P be such that J = P^{-1}AP is a Jordan matrix, which we may take to be of the form J = Diag(J(1, 1), ...). Then, since |λ_i| < 1 for i > 1, we see that

(6) lim_{n→∞} A^n = P (lim_{n→∞} J^n) P^{-1} = P ( 1 0 ··· 0 ; 0 0 ··· 0 ; ··· ; 0 0 ··· 0 ) P^{-1} = P ~e_1 ~e_1^t P^{-1},

where ~e_1 = (1, 0, ..., 0)^t. Now since ~e_1 is a 1-eigenvector of J, it follows that ~x := P~e_1 (= the first column of P) is a 1-eigenvector of A because A~x = AP~e_1 = PJ~e_1 = P~e_1 = ~x. Similarly, ~y^t := ~e_1^t P^{-1} (= the first row of P^{-1}) is (the transpose of) a 1-eigenvector of A^t. Indeed, since ~e_1^t J = ~e_1^t, we see that ~y^t A = ~e_1^t P^{-1} A = ~e_1^t J P^{-1} = ~e_1^t P^{-1} = ~y^t. Taking the transpose yields A^t~y = ~y, which means that ~y is a 1-eigenvector of A^t. Thus by the above equation (6) we obtain

  lim_{n→∞} A^n = P~e_1~e_1^t P^{-1} = ~x~y^t = (1/(~x^t~y)) ~x~y^t,

the latter because ~x^t~y = ~y^t~x = ~e_1^t P^{-1}P~e_1 = ~e_1^t~e_1 = 1. Thus, the theorem holds if we take ~u = ~x and ~v = ~y.

Now if we make another choice of ~u, then we must have ~u = k~x for some k ≠ 0 because the 1-eigenspace of A is one-dimensional. (Recall that the algebraic multiplicity m_A(1) = 1 by hypothesis and so the corresponding geometric multiplicity is also ν_A(1) = 1.) Similarly, any other choice of ~v has the form ~v = c~y for some c ≠ 0. We then have ~u^t~v = (k~x^t)(c~y) = kc~x^t~y = kc and similarly ~u~v^t = kc~x~y^t. Thus,

  (1/(~u^t~v)) ~u~v^t = ~x~y^t = lim_{n→∞} A^n,

as desired.

Example 7.12. Find lim_{n→∞} A^n when A = (1/2)( 2 0 3 ; 0 2 2 ; −1 1 0 ).

Solution. Here ch_A(t) = (t − 1)(t − 1/2)^2, so 1 is a simple dominant eigenvalue, and hence A is power convergent. Moreover,

  E_A(1) = {c(1, 1, 0)^t} because A − I = (1/2)( 0 0 3 ; 0 0 2 ; −1 1 −2 ),
  E_{A^t}(1) = {c(2, −3, 0)^t} because A^t − I = (1/2)( 0 0 −1 ; 0 0 1 ; 3 2 −2 ),

so we can take ~u = (1, 1, 0)^t and ~v = (2, −3, 0)^t in Theorem 7.10. Now

  ~u^t · ~v = (1, 1, 0)·(2, −3, 0)^t = −1,
  ~u · ~v^t = ( 2 −3 0 ; 2 −3 0 ; 0 0 0 ),

hence

  lim_{n→∞} A^n = (1/(~u^t~v)) ~u~v^t = (1/(−1))( 2 −3 0 ; 2 −3 0 ; 0 0 0 ) = ( −2 3 0 ; −2 3 0 ; 0 0 0 ).

Appendix: Further Properties of Constituent Matrices

In order to state these properties in a convenient manner, it is useful to introduce the following notation.
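The rank-one formula of Theorem 7.10 can be checked numerically for this example (a numpy sketch; the exponent 120 is arbitrary):

import numpy as np

A = np.array([[2, 0, 3],
              [0, 2, 2],
              [-1, 1, 0]]) / 2
u = np.array([[1.0], [1.0], [0.0]])     # 1-eigenvector of A
v = np.array([[2.0], [-3.0], [0.0]])    # 1-eigenvector of A^t
limit = (u @ v.T) / float(u.T @ v)      # (1/(u^t v)) u v^t
print(np.allclose(np.linalg.matrix_power(A, 120), limit))   # True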

Notation. As usual, let A be an m × m matrix with distinct eigenvalues λ_1, ..., λ_s of multiplicities m_1, ..., m_s, respectively, so

  ch_A(t) = (t − λ_1)^{m_1} ··· (t − λ_s)^{m_s},

and let E_ik be the constituent matrices of A. Note that the numbering of the E_ik's depends on the ordering of the eigenvalues, so it is useful to label the E_ik's in a way which is independent of the numbering.

For any λ ∈ C and any k ≥ 0 put

  E_{λ,k}^A = E_ik if λ = λ_i and k ≤ m_i − 1, and E_{λ,k}^A = 0_{m×m} otherwise.

We then have:

Theorem 7.11. Let λ ∈ C and let k ≥ 0 be an integer.
(a) If B = P^{-1}AP, then E_{λ,k}^B = P^{-1} E_{λ,k}^A P.
(b) If A = Diag(A_1, A_2, ..., A_r), then E_{λ,k}^A = Diag(E_{λ,k}^{A_1}, ..., E_{λ,k}^{A_r}).

Proof. (a) By the spectral decomposition formula (Theorem 5.12) we have for every f ∈ C[t] that

  f(A) = Σ_{i=1}^s Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E_{λ_i,k}^A,

so by Theorem 5.2 we obtain

  f(B) = P^{-1}f(A)P = Σ_{i=1}^s Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) P^{-1} E_{λ_i,k}^A P.

Since B has the same eigenvalues and multiplicities as A, the spectral decomposition formula for B gives

  f(B) = Σ_{i=1}^s Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E_{λ_i,k}^B.

By the uniqueness of the matrices appearing in the spectral decomposition formula, it thus follows that E_{λ_i,k}^B = P^{-1} E_{λ_i,k}^A P for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, if λ = λ_i for some i, and if 0 ≤ k ≤ m_i − 1, then we see that the assertion holds. On the other hand, if these conditions do not hold, then E_{λ,k}^A = 0 = E_{λ,k}^B, so the assertion holds here as well.

(b) Let λ_1, ..., λ_s be the eigenvalues of A = Diag(A_1, ..., A_r). Since the eigenvalues of A_j, for 1 ≤ j ≤ r, are a subset of those of A, and since m_{A_j}(λ_i) ≤ m_A(λ_i), we can write the spectral decomposition formula for A_j in the form

  f(A_j) = Σ_{i=1}^s Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E_{λ_i,k}^{A_j}

  f(A) = Diag(f(A_1), ..., f(A_r)) = Σ_{i=1}^s Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) Diag(E_{λ_i,k}^{A_1}, ..., E_{λ_i,k}^{A_r}).

Comparing this to the spectral decomposition formula of A, we conclude as in part (a) that E_{λ_i,k}^A = Diag(E_{λ_i,k}^{A_1}, ..., E_{λ_i,k}^{A_r}), for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, the assertion of part (b) holds when λ = λ_i, for some i and k ≤ m_i − 1, and hence in general because in the other cases all the matrices are equal to 0.

Corollary 1. If A = Diag(A1,...,Ar), then

  E_{λ,k}^A = 0 for k ≥ max(m_{A_1}(λ), ..., m_{A_r}(λ)).

Proof. By definition we have that E_{λ,k}^{A_j} = 0 for k ≥ m_{A_j}(λ), for all j = 1, ..., r. Thus, the assertion follows directly from Theorem 7.11(b).

We can use the above Theorem 7.11 and its Corollary to relate the vanishing of the constituent matrices to the maximum size of the Jordan blocks of the Jordan canonical form J = J_A of A. For this, let λ ∈ C and put

  t_A(λ) := max{k ≥ 0 : J(λ, k) appears as a Jordan block in J_A}.

Corollary 2. If A is an m × m matrix and if λ ∈ C, then

  E_{λ,k}^A = 0 ⇔ k ≥ t_A(λ).

Proof. By Jordan's theorem, J = J_A = P^{-1}AP for some P, so by Theorem 7.11(a) we have E_{λ,k}^A = P E_{λ,k}^J P^{-1}. In particular, E_{λ,k}^A = 0 ⇔ E_{λ,k}^J = 0.

If t_A(λ) = 0, i.e., if λ is not an eigenvalue of J = J_A, then the assertion is clear. Thus, assume that t_A(λ) ≥ 1, say λ = λ_1. Then J = Diag(J_11, ..., J_1ν, J′), where J_1j = J(λ_1, k_1j) and J′ is a Jordan matrix with m_{J′}(λ) = 0. Now by Example 5.25 we have that

  E_{λ,k}^{J_1j} = (1/k!) J(0, k_1j)^k and E_{λ,k}^{J_1j} = 0 ⇔ k ≥ k_1j.

Thus, by Theorem 7.11(b) we have that E_{λ,k}^J = (1/k!) Diag(J(0, k_11)^k, ..., J(0, k_1ν)^k, 0) = 0 if and only if k ≥ max(k_11, ..., k_1ν) = t_J(λ) = t_A(λ), and so the assertion follows.

Remark. 1) Recall from Corollary 3 of Theorem 6.4 that λ is a regular eigenvalue of A if and only if t_A(λ) = 1. Thus, if m_A(λ) ≥ 1, then by Corollary 2 we have that

  λ is a regular eigenvalue of A ⇔ E_{λ,k}^A = 0 for all k ≥ 1.

Note that this is a restatement of Theorem 7.9, so Corollary 2 generalizes this theorem.

2) The number t_A(λ) is connected with the generalized geometric multiplicities ν_A^p(λ) in the following way:

  ν_A^k(λ) = ν_A^{k+1}(λ) ⇔ k ≥ t_A(λ).

Indeed, by definition of t_A(λ) we have that the Jordan canonical form of A has no Jordan blocks J(λ, j) of size j ≥ k if and only if k ≥ t_A(λ) + 1, and by Theorem 6.5(c) we have that this is the case if and only if ν_A^k(λ) = ν_A^{k−1}(λ).

3) If we put

  μ_A(t) := (t − λ_1)^{t_A(λ_1)} ··· (t − λ_s)^{t_A(λ_s)},

where λ_1, ..., λ_s are the distinct eigenvalues of A, then μ_A(t) | ch_A(t) and we have that

  μ_A(A) = 0

by the spectral decomposition formula and by Corollary 2. This, therefore, refines the Cayley–Hamilton Theorem. Note that μ_A(t) is the minimal polynomial of A introduced in the Appendix of Chapter 6.

Exercises 7.4.

1. By using shortcuts, find the constituent matrices of the matrices

  A = ( 1  0  −1   2   −1   )            B = (1/2)( 1 0 0 0 −1 )
      ( 0  1   0  −3    0   )                      ( 1 2 0 0  1 )
      ( 0  0  1/2  0    0   )     and              ( 1 0 2 0  1 )
      ( 0  0   0  1/2   0   )                      ( 1 0 0 2  1 )
      ( 0  0   0   0  −1/3  )                      ( 0 0 0 0  1 ),

and use these to calculate lim_{n→∞} A^n and lim_{n→∞} B^n.

2. Find lim_{n→∞} A^n when

  (a) A = (1/4)( 1 1 0 0 ; 0 1 0 0 ; 2 1 1 2 ; 2 1 3 2 )   and   (b) A = (1/10)( 5 2 2 ; 3 4 6 ; 2 4 2 ).

3. Let A be an n × n matrix which is power convergent with limit B = lim_{k→∞} A^k, and let m = m_A(1) ≥ 0 denote the algebraic multiplicity of 1. Show:
(a) ch_B(t) = (t − 1)^m t^{n−m};
(b) 0 and 1 are regular eigenvalues of B (whenever they occur).
[Hint: Look at the Jordan canonical form of A.]

4. (a) Let J = J(λ, 3). Show that J is similar to its transpose J^t. In other words, find P such that

  P^{-1}( λ 1 0 ; 0 λ 1 ; 0 0 λ )P = ( λ 0 0 ; 1 λ 0 ; 0 1 λ ).

(b) As in (a), but for any Jordan block J = J(λ, m). Extend this to a general Jordan matrix.
(c) Conclude that for any square matrix A, its transpose A^t is similar to A.

7.5 The spectral radius and Geršgorin's theorem

In much of our preceding work we were able to solve problems concerning a square matrix A provided we knew certain facts about its spectrum, i.e. about its set of eigenvalues. However, often only a crude estimate of where the eigenvalues lie is enough to enable us to solve the problem. For example, to determine that lim_{n→∞} A^n = 0 we need only know that |λ| < 1 for all eigenvalues λ of A. Now it turns out that it is frequently easy to get such crude estimates by using a remarkable theorem due to Geršgorin. To be able to state this theorem in a convenient form, we first introduce the following terminology.

Definition. Let A be a square matrix. Its spectral radius is the real number defined by

  ρ(A) = max{|λ| : λ ∈ C is an eigenvalue of A}.

Example 7.13. Find the spectral radius of A = Diag(1, i, 1 + i) = ( 1 0 0 ; 0 i 0 ; 0 0 1+i ).

Solution. The spectrum of A is {1, i, 1 + i}, so

  ρ(A) = max(|1|, |i|, |1 + i|) = max(1, 1, √2) = √2.

With the help of the spectral radius we can restate some of our previous theorems such as Theorems 7.6, 7.7 and 7.8 in a more convenient form:

Theorem 7.6′. A is power convergent if and only if either ρ(A) < 1, or ρ(A) = 1 and 1 is a dominant, regular eigenvalue of A.

Theorem 7.8′. lim_{n→∞} A^n = 0 ⇔ ρ(A) < 1.

One of the interesting facts concerning the spectral radius is that we can estimate it without knowing the eigenvalues explicitly. There are several such methods available. The first one is based on Geršgorin's theorem and relates the spectral radius ρ(A) of A to the number

  ||A|| = max_{1≤j≤m} ( Σ_{i=1}^m |a_ij| ),

i.e. the maximum of the absolute column sums, which is often called the norm of the m × m matrix A = (a_ij).

Remarks. 1) The norm ||·|| satisfies the following properties:
(0) ||A|| ≥ 0, and ||A|| = 0 ⇔ A = 0,
(1) ||A + B|| ≤ ||A|| + ||B||,
(2) ||cA|| = |c| ||A||, if c ∈ C,
(3) ||A · B|| ≤ ||A|| · ||B||.

To see this, note first that properties (0) and (2) are clear and that (1) follows immediately from the triangle inequality. Finally, to verify (3), write A = (a_ij), B = (b_ij) and C = AB = (c_ij). Then c_ij = Σ_{k=1}^m a_ik b_kj, so for each fixed column index j:

  Σ_{i=1}^m |c_ij| = Σ_{i=1}^m |Σ_{k=1}^m a_ik b_kj| ≤ Σ_i Σ_k |a_ik||b_kj|
                   = Σ_k |b_kj| (Σ_i |a_ik|) ≤ Σ_k |b_kj| · ||A||
                   ≤ ||A|| · ||B||.

2) Actually, any function ||·|| : M_m(C) → R satisfying the four properties (0)–(3) is called a (matrix) norm. There are many others; for example, ||A||_t := ||A^t|| also satisfies these properties, as does ||A||_2 := max(||~v_1||, ..., ||~v_m||), if A = (~v_1| ... |~v_m). Some of these are implemented in MAPLE: ||A|| is given by the MAPLE command norm(A, 1), ||A^t|| by norm(A, infinity), and ||A||_2 by norm(A, 2).

As we shall see presently, the following estimate on the spectral radius follows easily from Geršgorin's theorem (which gives much more precise information about the location of the eigenvalues).

Theorem 7.12. If A is an m × m matrix, then ρ(A) ≤ min(||A||, ||A^t||).

 1 1  0 3 4   Example 7.14. Estimate the spectral radius of A =  1 0 1  .  2 4  1 1 2 3 0 2 1 Solution. The sums of the absolute values of the columns are 1, 3 , 2 , respectively, so 7 3 5 t 5 ||A|| = 1. On the other hand, the row sums are 12 , 4 , 6 so ||A || = 6 . Thus, by Theorem 7.12 we see 5 5 ρ(A) ≤ min(1, 6 ) = 6 . Note that this estimate already shows that A is power convergent (with limit 0). To see how good this estimate is, let us compute the characteristic polynomial of A,

  ch_A(t) = t^3 − (3/8)t − 1/12,

which doesn't have any rational roots. Its roots are all real; they are (approximately): −0.420505805, −0.282066739, 0.702574789. Thus, the spectral radius is

  ρ(A) ≈ 0.702574789,

which is not too far away from the bound 5/6 = 0.83333...
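These quantities are one-liners in numpy (a sketch; the norms are taken per the column-sum definition above):

import numpy as np

A = np.array([[0, 1/3, 1/4],
              [1/2, 0, 1/4],
              [1/2, 1/3, 0]])
print(np.abs(A).sum(axis=0).max())         # ||A||   = 1.0   (max column sum)
print(np.abs(A).sum(axis=1).max())         # ||A^t|| = 5/6   (max row sum)
print(np.abs(np.linalg.eigvals(A)).max())  # rho(A) ≈ 0.70257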

We now turn to Geršgorin's theorem, for which we first require some additional notation (and terminology).

Notation. If A = (a_ij) ∈ M_m(C) is an m × m matrix, then for each k with 1 ≤ k ≤ m put

  r_k = Σ_{j=1, j≠k}^m |a_kj|,

  D_k = D_k(A) = {z ∈ C : |z − a_kk| ≤ r_k}.

Thus, r_k is the sum of the absolute values of the entries of the k-th row excluding the diagonal element, and D_k is the disc of radius r_k centered at the diagonal element a_kk ∈ C. These discs are called the Geršgorin discs.

Example 7.15. (a) A = ( 0 1 ; 2 2 ): a_11 = 0, r_1 = |1| = 1; a_22 = 2, r_2 = |2| = 2.
(b) A = ( 0 1 ; 1 0 ): a_11 = a_22 = 0, r_1 = r_2 = 1, so D_1 = D_2.

(c) A = ( 0 1 −1 ; 0 0 1 ; 1 0 1 ): a_11 = 0, r_1 = |1| + |−1| = 2; a_22 = 0, r_2 = |0| + |1| = 1; a_33 = 1, r_3 = |1| + |0| = 1.
(d) A = ( i 0 i ; 0 −i 1 ; 1 1 0 ): a_11 = i, r_1 = 1; a_22 = −i, r_2 = 1; a_33 = 0, r_3 = 2.
(e) A = ( 3i i −1 ; −1 4 1 ; 2 −1 −4+i ): a_11 = 3i, r_1 = |i| + |−1| = 2; a_22 = 4, r_2 = |−1| + |1| = 2; a_33 = −4 + i, r_3 = |2| + |−1| = 3.

[Figure: the three Geršgorin discs of the matrix in (e), centered at 3i, 4 and −4 + i.]

Theorem 7.13 (Geršgorin, 1931). Each eigenvalue λ of A lies in at least one Geršgorin disc D_k(A) of A.

Proof. Since λ is an eigenvalue of A, we have A~v = λ~v for some ~v = (v_1, ..., v_m)^t ≠ ~0. Then, for each k, we have λv_k = (A~v)_k = Σ_{j=1}^m a_kj v_j = a_kk v_k + Σ_{j≠k} a_kj v_j, and so

(7) (λ − a_kk)v_k = Σ_{j=1, j≠k}^m a_kj v_j.

Now choose k such that |v_k| = max_{1≤j≤m} |v_j|.

Then, applying the triangle inequality to (7) yields:

  |λ − a_kk| · |v_k| = |Σ_{j≠k} a_kj v_j| ≤ Σ_{j≠k} |a_kj||v_j| ≤ (Σ_{j≠k} |a_kj|) · |v_k| = r_k · |v_k|.

Since |v_k| ≠ 0 (for otherwise ~v = ~0, a contradiction), we obtain |λ − a_kk| ≤ r_k, i.e. λ ∈ D_k(A).
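Geršgorin's theorem is easy to test numerically. The following sketch (the helper function is our own) computes the discs of the matrix of Example 7.15(e) and confirms that every eigenvalue lies in at least one disc:

import numpy as np

def gershgorin_discs(A):
    # (centre, radius) of each disc: the radius is the off-diagonal row sum
    radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

A = np.array([[3j, 1j, -1],
              [-1, 4, 1],
              [2, -1, -4 + 1j]])
discs = gershgorin_discs(A)
for lam in np.linalg.eigvals(A):
    # prints True for each eigenvalue (up to rounding)
    print(lam, any(abs(lam - c) <= r for c, r in discs))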

Corollary. Suppose that the diagonal entries of the matrix A = (a_ij) are much larger than its off-diagonal entries in the sense that

(8) |a_kk| > Σ_{j=1, j≠k}^m |a_kj|, for k = 1, ..., m.

Then A is invertible.

Proof. The hypothesis means that |0 − a_kk| > r_k, or that 0 ∉ D_k(A), for 1 ≤ k ≤ m. By Geršgorin's theorem, this means that 0 is not an eigenvalue of A, and hence A is invertible.

Example 7.16. If A is as in Example 7.15(e), then the picture shows that 0 does not lie in any Geršgorin disc, so A is invertible.

Proof of Theorem 7.12. We first show:

(9) ρ(A) ≤ ||A^t|| = max_{1≤k≤m} ( Σ_{j=1}^m |a_kj| ).

For this, let λ be an eigenvalue of A of maximal absolute value, i.e. |λ| = ρ(A). Now by

Geršgorin's theorem, λ ∈ D_k(A) for some k, i.e.

  |λ − a_kk| ≤ Σ_{j=1, j≠k}^m |a_kj|, and hence
  |λ − a_kk| + |a_kk| ≤ Σ_{j=1}^m |a_kj| ≤ ||A^t||.

On the other hand, since ρ(A) = |λ| = |(λ − a_kk) + a_kk| ≤ |λ − a_kk| + |a_kk| by the triangle inequality, it follows from the above inequality that ρ(A) ≤ |λ − a_kk| + |a_kk| ≤ ||A^t||, which proves (9). Next we apply (9) to A^t: since A and A^t have the same eigenvalues, ρ(A) = ρ(A^t) ≤ ||(A^t)^t|| = ||A||. Thus we have ρ(A) ≤ ||A|| and ρ(A) ≤ ||A^t||, and so Theorem 7.12 follows.

For real matrices, there are other methods of estimating the spectral radius. One of these is based on the trace of the matrix.

Definition. If A = (a_ij) is an m × m matrix, then its trace is defined as the sum of its diagonal elements:

  tr(A) = a_11 + a_22 + ··· + a_mm.

Example 7.17. a) tr( 1 2 ; 3 4 ) = 1 + 4 = 5.
b) tr( 1 −2 3 ; 4 −6 6 ; 7 −8 9 ) = 1 − 6 + 9 = 4.

Properties of the trace. If A, B and P are m × m matrices and P is invertible, then
(1) tr(AB) = tr(BA),
(2) tr(P^{-1}AP) = tr(A),

(3) tr(A) = m_A(λ_1)λ_1 + ··· + m_A(λ_s)λ_s = sum of the eigenvalues of A, counted with their multiplicities.

[Indeed, to verify (1), write A = (a_ij) and B = (b_ij). Then

  tr(AB) = Σ_{i=1}^m Σ_{k=1}^m a_ik b_ki = Σ_{k=1}^m Σ_{i=1}^m b_ki a_ik = tr(BA),

which is (1). From this (2) follows because

  tr(P^{-1}AP) = tr((P^{-1})(AP)) = tr((AP)(P^{-1})) = tr(A), by (1).

Finally, (3) is immediate for Jordan matrices J. If A is an arbitrary matrix, then by Jordan's theorem P^{-1}AP = J is a Jordan matrix for some P, and thus tr(A) = tr(P^{-1}AP) = tr(J) by (2).]

Theorem 7.14. If A is a real m × m matrix, then

(10) ρ(A) ≤ √(ρ(A^tA)) ≤ √(tr(A^tA)).

Remark. Even though √(ρ(A^tA)) is a better estimate of ρ(A) than √(tr(A^tA)), the price we pay for this improvement is often not worth the effort, for ρ(A^tA) is in general as hard (or harder) to compute as ρ(A). (For example, if A is a symmetric matrix (A^t = A), then ρ(A^tA) = ρ(A^2) = ρ(A)^2, so the first inequality in (10) is an equality in this case and hence is vacuous.) However, the combination of this inequality with other methods frequently leads to better results, as the following example shows.

Example 7.18. Estimate ρ(A) when

  A = (  1 −3  4  0 )
      (  2  1  2  1 )
      (  1 −1  1 −1 )
      ( −1  2  1 −1 ).

Solution. We try in turn the various methods at our disposal.

Method 1: use Theorem 7.12. Here ||A|| = max(5, 7, 8, 3) = 8 and ||A^t|| = max(8, 6, 4, 5) = 8, so by Theorem 7.12 we obtain ρ(A) ≤ min(8, 8) = 8.

Method 2: use Theorem 7.14. Since the diagonal entries of A^tA are 7, 15, 22, 3, we see that tr(A^tA) = 47 and hence by Theorem 7.14 we obtain ρ(A) ≤ √47 < 6.86.

Method 3: combine Theorems 7.12 and 7.14. We have

  A^tA = (  7 −4  8  2 )
         ( −4 15 −9  0 )
         (  8 −9 22  0 )
         (  2  0  0  3 ).

Applying Method 1 to A^tA yields ρ(A^tA) ≤ ||A^tA|| = max(21, 28, 39, 5) = 39, and hence by the first inequality of (10) we obtain ρ(A) ≤ √(ρ(A^tA)) ≤ √39 < 6.25, which is the best estimate.

[However, none of these three estimates is particularly good, for

  ch_A(t) = t^4 − 2t^3 + 3t^2 + 16t + 45,

which has (by MAPLE) the approximate roots 2.36496 ± 2.57701i, −1.36496 ± 1.34728i, and so ρ(A) = |2.36496 + 2.57701i| ≈ 3.49772, which is almost 1/2 of the above estimate!]
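The three estimates of Example 7.18, and the true spectral radius, can be compared with a few lines of numpy (a sketch):

import numpy as np

A = np.array([[1, -3, 4, 0],
              [2, 1, 2, 1],
              [1, -1, 1, -1],
              [-1, 2, 1, -1]], dtype=float)
AtA = A.T @ A
print(np.abs(A).sum(axis=0).max())              # Method 1: ||A|| = 8
print(np.sqrt(np.trace(AtA)))                   # Method 2: sqrt(47) ≈ 6.86
print(np.sqrt(np.abs(AtA).sum(axis=0).max()))   # Method 3: sqrt(39) ≈ 6.25
print(np.abs(np.linalg.eigvals(A)).max())       # true rho(A) ≈ 3.49772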

The proof of Theorem 7.14 is based on the fact that A^tA is a positive semi-definite matrix, so we first review some properties of such matrices.

Insert: Properties of positive semi-definite matrices

Let A be a real symmetric m × m matrix. Then A is said to be positive semi-definite (notation: A ≥ 0) if

  ~v^tA~v ≥ 0, for all ~v ∈ R^m.

(Note that ~v^tA~v is a 1 × 1 matrix, i.e. a number.)

1) A ∈ M_m(R) ⇒ A^tA ≥ 0.
[Clearly, A^tA is real symmetric, for (A^tA)^t = A^t(A^t)^t = A^tA. Furthermore, ~v^t(A^tA)~v = (~v^tA^t)A~v = (A~v)^tA~v = ||A~v||^2 ≥ 0, so A^tA is positive semi-definite.]

2) A ≥ 0 ⇒ A is diagonable and has real non-negative eigenvalues λ_i ≥ 0.
[Indeed, since A^t = A, the Principal Axis Theorem (cf. Chapter 5) tells us that A is diagonable and that the eigenvalues λ_i are real. Now if ~v_i is an associated eigenvector, then λ_i||~v_i||^2 = λ_i~v_i^t~v_i = ~v_i^t(A~v_i) ≥ 0, so λ_i ≥ 0 since ||~v_i||^2 > 0.]

3) A ≥ 0 ⇔ A^t = A and λ_i ≥ 0 for every eigenvalue λ_i of A.
[Clearly, if A ≥ 0 then A^t = A by definition and λ_i ≥ 0 by property 2). Moreover, the converse is clear if A = Diag(λ_1, ..., λ_m) is diagonal. In the general case, A = PDP^t for some orthogonal P (so P^{-1} = P^t) and diagonal D ≥ 0, and hence ~v^tA~v = ~v^tPDP^t~v = (P^t~v)^tD(P^t~v) ≥ 0.]

4) A ≥ 0 ⇒ ρ(A) ≤ tr(A).

[Indeed, let λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 0 be the eigenvalues of A, listed according to their multiplicities. Then ρ(A) = λ1 ≤ λ1 + λ2 + ··· + λm = tr(A).] In addition to these elementary properties, we also require the following more subtle fact.

Lemma. Let A ∈ M_m(R) be a real m × m matrix and let λ ∈ C be an eigenvalue of A. If 0 ≤ μ_1 ≤ ··· ≤ μ_m denote the eigenvalues of A^tA (counted with multiplicities), then there are non-negative real numbers θ_1, ..., θ_m ≥ 0 such that

(11) |λ|^2 = Σ_{i=1}^m θ_i μ_i and Σ_{i=1}^m θ_i = 1.

Proof. Note that even though A is real, some of its eigenvalues and eigenvectors may be non-real. We thus introduce the complex conjugate ~v̄ and complex norm ||~v|| of a vector ~v = (v_1, ..., v_m)^t ∈ C^m:

  ~v̄ = (v̄_1, ..., v̄_m)^t,
  ||~v|| = √(~v̄^t · ~v) = √(|v_1|^2 + ··· + |v_m|^2).

This norm has the familiar properties:

(12) ||~v|| ≥ 0, and ||~v|| = 0 ⇔ ~v = ~0;
(13) ||α~v|| = |α| · ||~v||.

We apply this to an eigenvector ~v ≠ ~0 associated to our given eigenvalue λ. (Thus ~v ∈ E_A(λ).) Since A is real, we have A~v̄ = (A~v)‾, and so

  ~v̄^t A^tA ~v = (A~v)‾^t (A~v) = ||A~v||^2 = ||λ~v||^2 = |λ|^2 ||~v||^2, by (13).

Now we apply the Principal Axis Theorem to the symmetric matrix A^tA to find an orthogonal matrix P ∈ M_m(R) such that

  P^{-1}(A^tA)P = Diag(μ_1, μ_2, ..., μ_m) = D, or A^tA = PDP^{-1} = PDP^t.

Put ~w = P^t~v. Then, since P is real, ~w̄ = (P^t~v)‾ = P^t~v̄, and so

  ~w̄^t D ~w = (P^t~v̄)^t D (P^t~v) = ~v̄^t PDP^t ~v = ~v̄^t(A^tA)~v = |λ|^2||~v||^2

by the above. On the other hand, since D is a diagonal matrix,

  ~w̄^t D ~w = Σ_{i=1}^m μ_i w̄_i w_i = Σ_{i=1}^m μ_i |w_i|^2, if ~w = (w_1, ..., w_m)^t.

Thus, combining this with the previous equation yields

  Σ_{i=1}^m μ_i|w_i|^2 = |λ|^2 ||~v||^2,

and so we see that the first equation of (11) holds with

  θ_i = |w_i|^2 / ||~v||^2 ≥ 0.

Moreover, the second equation of (11) also holds because

  ||~v||^2 Σ_{i=1}^m θ_i = Σ_{i=1}^m |w_i|^2 = ||~w||^2 = ||P^t~v||^2 = (P^t~v̄)^t(P^t~v) = ~v̄^t P P^t ~v = ~v̄^t ~v = ||~v||^2,

and so, dividing through by ||~v||^2, we obtain that

  Σ_{i=1}^m θ_i = 1.

Proof of Theorem 7.14. Since A^tA is positive semi-definite (cf. property 1) of the above insert), we have ρ(A^tA) ≤ tr(A^tA) by property 4). This proves the second inequality of (10). To prove the first inequality, let λ ∈ C be an eigenvalue of A such that |λ| = ρ(A), and put M = max{μ_i : 1 ≤ i ≤ m}, where the μ_i's are the eigenvalues of A^tA. Thus, since A^tA is positive semi-definite, M = ρ(A^tA). Then by the Lemma we have

  |λ|^2 = Σ_{i=1}^m μ_iθ_i ≤ Σ_{i=1}^m Mθ_i = M Σ_{i=1}^m θ_i = M = ρ(A^tA),

and so ρ(A)^2 = |λ|^2 ≤ ρ(A^tA). From this the first inequality of (10) follows by taking square roots.

Exercises 7.5.

1. (a) Find the Geršgorin discs of the following matrices.

 0 −1 1   2i 1 1   0 1  A = ; B = 1 2 1 ; C = 0 2 0 . −1 i     1 0 −1 1 0 −1

(b) What can you say about the approximate location of the eigenvalues?
(c) Compute the eigenvalues and compare your answer to that obtained in (b).

2. Suppose A is a 3 × 3 matrix with spectrum {1 + i, 1 − i, 1}.
(a) What is the spectral radius of A?
(b) What is the spectral radius of B = A^2 − 2A + 2I?
(c) Is B power convergent?

3. Let A be an m × m matrix with characteristic polynomial

  ch_A(t) = t^m + a_{m−1}t^{m−1} + ... + a_1t + a_0.

Show that

  tr(A) = −a_{m−1} and det(A) = (−1)^m a_0.

4. (a) Show that for all m × m matrices we have

tr(ABC) = tr(BCA) = tr(CAB).

(b) Find three 2 × 2 matrices A, B, C such that tr(ABC) ≠ tr(ACB).

7.6 Geometric Series

In the previous sections we learned how to determine whether a given matrix A is power convergent and also how to find the limit lim_{n→∞} A^n. We now apply these methods to the following problem arising in economics.

Problem: Four regions (or cities) R1, R2, R3, and R4 ship non-renewable commodities (such as antique paintings, rental cars) among themselves. Each region ships per week a fixed percentage of its goods to the other regions according to the following diagram:

  R1 --70%--> R2      R2 --30%--> R1
  R1 --10%--> R3      R1 --10%--> R4      R2 --10%--> R4

The figures represent the percentage of goods each region ships per week (percentage: in terms of goods present).

Today's distribution:
  R1 has $300,000 worth of goods,
  R2 has $200,000 worth of goods,
  R3 has $100,000 worth of goods,
  R4 has $100,000 worth of goods.

Questions: 1) What is the distribution of goods after n weeks?
2) What happens in the long run?

Analysis: Let v_{1,n} = the amount (in units of $100,000) of goods present in R1 after n weeks, v_{2,n} = the amount of goods in R2 after n weeks, etc. Thus

  ~v_n = (v_{1,n}, v_{2,n}, v_{3,n}, v_{4,n})^t

represents the distribution of goods after n weeks, and today's distribution is represented by the vector ~v_0 = (3, 2, 1, 1)^t. We now determine how the distribution of the goods changes from week n to week n + 1:

  R1 retains 10% of its goods (because it sends 70 + 10 + 10 = 90% away) + receives 30% of those of R2;
  R2 retains 60% of its goods (because it sends 30 + 10 = 40% away) + receives 70% of those of R1;
  R3 retains all of its goods + receives 10% of those of R1;
  R4 retains all of its goods + receives 10% of those of R1 and 10% of those of R2.

We thus have the equations:

1 3 v1,n+1 = 10 v1,n + 10 v2,n 7 6 v2,n+1 = 10 v1,n + 10 v2,n 1 v1,n+1 = 10 v1,n + + 1 · v3,n 1 1 v1,n+1 = 10 v1,n + 10 v2,n + + 1 · v4,n This means that we have the discrete linear system

  ~v_{n+1} = A~v_n, in which

  A = (1/10)( 1 3 0 0 ; 7 6 0 0 ; 1 0 10 0 ; 1 1 0 10 )   and   ~v_0 = ( 3 ; 2 ; 1 ; 1 ).

From chapter 5 (discrete linear systems) we therefore know that

  ~v_n = A^n ~v_0.

How can we compute A^n efficiently? We could of course apply our previous method (using constituent matrices) to the matrix A, but since A has the special shape

  A = ( T 0 ; B I ), where T = (1/10)( 1 3 ; 7 6 ) and B = (1/10)( 1 0 ; 1 1 ),

it is better to use the following rule.

Observation (Rule for multiplying partitioned matrices). Given the partitioned matrices as indicated, we have

  ( A_11 A_12 ··· A_1s )( B_11 B_12 ··· B_1t )   ( C_11 C_12 ··· C_1t )
  ( A_21  ···     A_2s )( B_21  ···     B_2t )   ( C_21  ···     C_2t )
  (  ···               )(  ···               ) = (  ···               )
  ( A_r1  ···     A_rs )( B_s1  ···     B_st )   ( C_r1  ···     C_rt ),

where C_ij = A_i1 B_1j + A_i2 B_2j + ··· + A_is B_sj.

Notes: 1) As indicated, A_ij is an m_i × n_j matrix and B_ij an n_i × k_j matrix.
2) This generalizes the rules we learned for multiplying block matrices.

Applying this rule to our situation with A = ( T 0 ; B I ) yields:

  A^2 = ( T 0 ; B I )( T 0 ; B I ) = ( T^2 + 0·B  T·0 + 0·I ; B·T + I·B  B·0 + I^2 ) = ( T^2 0 ; B(T + I) I ).

Similarly,

  A^3 = A^2 · A = ( T^2 0 ; B(T + I) I )( T 0 ; B I ) = ( T^3 0 ; B(T + I)T + B  I ) = ( T^3 0 ; B(T^2 + T + I) I ).

Thus, continuing in this manner leads to the formula

(14) A^n = ( T^n  0 ; B(T^{n−1} + T^{n−2} + ··· + T + I)  I ).

This formula therefore shows that in order to find A^n, we only have to do computations involving 2 × 2 (rather than 4 × 4) matrices! However, while we know how to compute T^n efficiently, how do we compute the sum

  T^{n−1} + ··· + T + I?

Recall (from high school):

Geometric series. If a ≠ 1 is any number, then

  1 + a + a^2 + ··· + a^{n−1} = (1 − a^n)/(1 − a),

and hence, if |a| < 1, then the infinite series (called a geometric series) converges:

  Σ_{n=0}^∞ a^n = lim_{n→∞}(1 + a + ··· + a^{n−1}) = 1/(1 − a).

(Recall from analysis that an (infinite) series is the limit of a sequence of partial sums.) This idea works for matrices as well, as we shall now see.

Definition. Let T be a square matrix. Then the sequence {S_n}_{n≥0} defined by

  S_n = I + T + ... + T^n, S_0 = I,

is called the geometric series generated by T. The series converges if the sequence {S_n}_{n≥0} converges; we then write

  Σ_{n=0}^∞ T^n = lim_{n→∞} S_n.

Theorem 7.15. The geometric series generated by T converges if and only if

(15) ρ(T) < 1; i.e. |λ_i| < 1 for each eigenvalue λ_i of T.

If this condition holds, then I − T is invertible and we have

(16) S_n = Σ_{k=0}^n T^k = (I − T)^{-1}(I − T^{n+1}),

(17) Σ_{k=0}^∞ T^k = (I − T)^{-1}.

Remark. Note that by the Corollary of Theorem 7.8 the above condition (15) is equivalent to:

(18) lim_{n→∞} T^n = 0.

Proof of Theorem 7.15. (⇒) Suppose first that the sequence {S_n}_n converges, and put S = lim_{n→∞} S_n. Then, since

  S_n − S_{n−1} = (T^n + T^{n−1} + ··· + I) − (T^{n−1} + ··· + I) = T^n,

it follows that lim_{n→∞} T^n = lim_{n→∞}(S_n − S_{n−1}) = lim_{n→∞} S_n − lim_{n→∞} S_{n−1} = S − S = 0. Thus, T is power convergent with limit 0, so by Theorems 7.6, 7.7 and 7.8 we see that (15) holds.

(⇐) Now suppose that (15) holds. Then the eigenvalues of I − T are 1 − λ_i ≠ 0, so I − T is invertible. Furthermore,

  T·S_n = T(I + T + ··· + T^n) = T + T^2 + ··· + T^{n+1} = (I + T + ··· + T^n) + T^{n+1} − I = S_n + T^{n+1} − I.

Thus, T·S_n − S_n = T^{n+1} − I and hence (T − I)S_n = T^{n+1} − I. Since T − I is invertible, formula (16) follows. Finally, from (15) and Theorem 7.7 we have that lim_{n→∞} T^{n+1} = 0, so

  lim_{n→∞} S_n = (I − T)^{-1}(I − lim_{n→∞} T^{n+1}) = (I − T)^{-1} · I,

which proves (17).

Example 7.19. Find the geometric series Σ_{n=0}^∞ T^n when T = (1/10)( 1 3 ; 7 6 ).

Solution. We first verify that T satisfies the hypothesis (15).

Method 1: By Theorem 7.12 we have ρ(T) ≤ ||T|| = max(8/10, 9/10) = 9/10 < 1, so T satisfies (15).

Method 2: Put T′ = 10T = ( 1 3 ; 7 6 ). Then ch_{T′}(t) = (1 − t)(6 − t) − 21 = t^2 − 7t − 15 = (t − (7 + √109)/2)(t − (7 − √109)/2), so ch_T(t) = (t − λ_1)(t − λ_2) with λ_1 = (7 + √109)/20 and λ_2 = (7 − √109)/20. Since

  0 < (7 + √109)/20 < (7 + √121)/20 = 18/20 < 1, and
  0 > (7 − √109)/20 > (7 − √121)/20 = −4/20 > −1,

we see that ρ(T) = max(|λ_1|, |λ_2|) = (7 + √109)/20 < 9/10 < 1, so T satisfies (15).

Thus, by Theorem 7.15, the geometric series generated by T converges and

  Σ_{k=0}^∞ T^k = (I − T)^{-1} = ( (1/10)( 9 −3 ; −7 4 ) )^{-1} = (10/(36 − 21))( 4 3 ; 7 9 ) = (2/3)( 4 3 ; 7 9 ).
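Theorem 7.15 can be observed numerically by comparing the partial sums with the closed form (a numpy sketch; 200 terms is arbitrary but ample since ρ(T) < 1):

import numpy as np

T = np.array([[1, 3],
              [7, 6]]) / 10
S, term = np.eye(2), np.eye(2)
for _ in range(200):            # partial sums S_n = I + T + ... + T^n
    term = term @ T
    S += term
print(S)                                 # ≈ (2/3) * [[4, 3], [7, 9]]
print(np.linalg.inv(np.eye(2) - T))      # the closed form (I - T)^(-1)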

Example 7.20. Application to the “shipping of commodities” problem: Find lim_{n→∞} A^n when A = ( T 0 ; B I ), with T as in Example 7.19 and B = (1/10)( 1 0 ; 1 1 ).

Solution. By the previous discussion we have:

  A^n = ( T^n  0 ; B(I + T + ··· + T^{n−1})  I )   (by (14)),

so by Theorem 7.15

  lim_{n→∞} A^n = ( 0  0 ; B(I − T)^{-1}  I ).

Now by Example 7.19,

  (I − T)^{-1} = (2/3)( 4 3 ; 7 9 ), so B(I − T)^{-1} = (1/10)( 1 0 ; 1 1 )·(2/3)( 4 3 ; 7 9 ) = (1/15)( 4 3 ; 11 12 ),

and hence

  lim_{n→∞} A^n = (1/15)( 0 0 0 0 ; 0 0 0 0 ; 4 3 15 0 ; 11 12 0 15 ).

This means that the distribution of goods in the long run is

  lim_{n→∞} ~v_n = lim_{n→∞} A^n ~v_0 = (1/15)( 0 0 0 0 ; 0 0 0 0 ; 4 3 15 0 ; 11 12 0 15 )( 3 ; 2 ; 1 ; 1 ) = (1/15)( 0 ; 0 ; 33 ; 72 ) = ( 0 ; 0 ; 2.2 ; 4.8 ).

In other words, R1 and R2 do not have any goods in the long run, R3 has $220,000 worth of goods in the long run, and R4 has $480,000 worth of goods in the long run.
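The long-run distribution can also be found by simply iterating the system (a numpy sketch; 500 steps is arbitrary):

import numpy as np

A = np.array([[1, 3, 0, 0],
              [7, 6, 0, 0],
              [1, 0, 10, 0],
              [1, 1, 0, 10]]) / 10
v = np.array([3.0, 2.0, 1.0, 1.0])   # today's distribution ~v_0
for _ in range(500):                 # iterate ~v_{n+1} = A ~v_n
    v = A @ v
print(v)                             # ≈ [0, 0, 2.2, 4.8]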

Exercises 7.6.

1. For each of the following matrices, determine (and justify) whether or not the geometric series generated by the matrix converges and, if so, find its limit.

  A = (1/2)( 1 1 ; −1 1 ),   B = (1/4)( 3 0 1 ; 1 2 −1 ; 1 0 3 ),   C = (1/5)( 4 0 0 0 ; 1 4 −1 0 ; 0 0 3 0 ; 1 3 0 0 ).

2. Find lim_{n→∞} A^n when

  A = (1/2)( 1 0 1 0 0 0 ; 0 1 1 0 0 0 ; −1 1 1 0 0 0 ; 0 0 1 1 0 0 ; 0 1 0 0 1 0 ; 1 0 0 0 0 1 ).

3. (a) Let T be an r × r matrix, B an s × r matrix and put m = r + s. Show that the m × m matrix A = ( T 0 ; B I ) is invertible if and only if T is invertible, and that in this case

  A^{-1} = ( T^{-1} 0 ; −BT^{-1} I ).

(b) Write down the inverse of the 10 × 10 matrix A whose first column is (10, 9, ..., 2, 1)^t and whose remaining columns are those of the identity matrix, without performing any calculations.

4. Let P = ( A C ; 0 B ), where A is an r × r matrix, B is an s × s matrix and C is an r × s matrix.
(a) Show that P is invertible if and only if A and B are invertible. [Hint: Consider the linear independence of suitable rows and/or columns.]
(b) If A and B are invertible, show that the inverse of P is given by

  P^{-1} = ( A^{-1} X ; 0 B^{-1} ), where X = −A^{-1}CB^{-1}.

Note: A quick method to prove (a) is to use the general fact that det(P) = det(A)·det(B), but you are not supposed to use this rule in this problem.

7.7 Stochastic matrices and Markov chains

The discrete linear system ~v_{n+1} = A~v_n of the previous example (“shipping of commodities”) had the following two properties:

1) Each entry of A (= the transition matrix) is non-negative.
2) The sum of the entries of each column is 1 (= the sum of the percentages).

A discrete linear system of this type is called a stochastic system, and its transition matrix A is called a . We had encountered such matrices already in subsection 5.9.3 in connection with Markov chains. More generally:

Definition. A real p × q matrix A = (a_ij) is called stochastic if

(19) a_ij ≥ 0, for all i, j;

(20) Σ_p · A = Σ_q, where Σ_p = (1, 1, ..., 1) (viewed as a 1 × p matrix with p entries).

Remark. Condition (20) means: the sum of each column of A is equal to 1.

Example 7.21. (a) (1, 1, 1)( 2 4 ; 3 2 ; 1 5 ) = (2 + 3 + 1, 4 + 2 + 5) = (6, 11) ≠ (1, 1)
⇒ A = ( 2 4 ; 3 2 ; 1 5 ) is not stochastic.
(b) B = (1/5)( 1 1 ; 2 3 ; 2 1 ); Σ_3 B = (1, 1, 1)·(1/5)( 1 1 ; 2 3 ; 2 1 ) = (1, 1) = Σ_2 ⇒ B is stochastic.
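Conditions (19) and (20) are straightforward to test numerically (a sketch; the function name is our own):

import numpy as np

def is_stochastic(A, tol=1e-12):
    A = np.asarray(A)
    return bool(np.all(A >= 0) and np.allclose(A.sum(axis=0), 1.0, atol=tol))

B = np.array([[1, 1],
              [2, 3],
              [2, 1]]) / 5
print(is_stochastic(B))   # True, as in Example 7.21(b)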

Theorem 7.16. If A is a stochastic matrix of size p × q and B is a stochastic matrix of size q × r, then AB is a stochastic matrix of size p × r.

Proof. Clearly AB has non-negative entries (since A and B do), so it is enough to show that Σ_p(AB) = Σ_r. Now:

  Σ_p(AB) = (Σ_pA)B = Σ_qB = Σ_r

(using first that A is stochastic, then that B is), and so AB is stochastic.

Corollary. If A is a square stochastic matrix, then so is A^n for all n ≥ 0. If, moreover, ~v is a stochastic vector, then A^n~v is a stochastic vector for all n ≥ 0.

Proof. Induct on n. Since the statement is clear for n = 0, assume n > 0. By the induction hypothesis we have that A^{n−1} is stochastic, and hence A^n = A · A^{n−1} is also stochastic by Theorem 7.16. Thus, the first assertion holds for all n ≥ 0. The second assertion follows from the first by applying Theorem 7.16 to A^n and ~v.

Remark. In section 5.9 we saw that if (S, A) is a Markov chain, then its transition matrix A and probability distribution vectors ~v_n are all stochastic, and that these satisfy the discrete linear system ~v_{n+1} = A~v_n. In the sequel we shall frequently refer to any such discrete linear system (in which A and the ~v_n's are stochastic) as an (abstract) Markov chain.

We have the following basic result about the eigenvalues of a square stochastic matrix:

Theorem 7.17. Let A be a square stochastic matrix of size m × m. Then
(a) 1 is a regular eigenvalue of A.

(b) We have |λ_i| ≤ 1 for every eigenvalue λ_i of A; i.e. ρ(A) = 1.

Proof. (a) We first prove that λ_1 = 1 is an eigenvalue. Since A and A^t have the same eigenvalues, this follows from:

(21) Σ_m^t is an eigenvector of A^t with eigenvalue λ_1 = 1.

Indeed, A stochastic ⇒ Σ_mA = Σ_m ⇒ A^tΣ_m^t = Σ_m^t ⇒ Σ_m^t is a 1-eigenvector of A^t. To prove that 1 is a regular eigenvalue of A, we will use the norm of a matrix as was defined in section 7.5. Since A is stochastic, we have that

(22) ||A|| = max(1, 1,..., 1) = 1.

Thus, since A^n is also stochastic, we have that ||A^n|| = 1 for any n ≥ 1. Thus, by using property (3) of the norm, we see that for any invertible matrix P we have

(23) ||P^{-1} A^n P|| ≤ c_P := ||P^{-1}|| · ||P||, for all n ≥ 1.

Now take P such that P^{-1}AP = J is a Jordan matrix. Then by (23) we see that ||J^n|| ≤ c_P for all n ≥ 1. This implies that J cannot contain a Jordan block J(1, k) of size k ≥ 2, because then

J(1, k)^n =
[ 1  n  *  ...  * ]
[ 0  1  n  ...  * ]
[ :        :    : ]
[ 0  0  ...  1  n ]
[ 0  0  ...  0  1 ],

where the *'s are positive numbers, and so ||J^n|| ≥ ||J(1, k)^n|| ≥ 1 + n, which contradicts (23). Thus, 1 is a regular eigenvalue of A.

(b) By Theorem 7.12 and (22) we see that the spectral radius satisfies ρ(A) ≤ ||A|| = 1. Since by part (a) we know that 1 is an eigenvalue of A, it follows that ρ(A) = 1.

Warning. Theorem 7.17 does not imply that A is power convergent, for 1 need not be a dominant eigenvalue; i.e. there may be other eigenvalues λ_i ≠ 1 with |λ_i| = 1. For example,

A = [ 0  1 ]
    [ 1  0 ]

is stochastic, but not power convergent (since its eigenvalues are 1, −1). Note that A^2 = I, so A^{2n+1} = A and A^{2n} = I.

Corollary. A stochastic matrix A is power convergent if and only if 1 is a dominant eigenvalue of A, i.e. if and only if λ = 1 is the only eigenvalue of A of absolute value 1.

Proof. Combine Theorems 7.6 and 7.17.

While a stochastic matrix need not be power convergent in general, there is a fairly simple criterion which ensures its power convergence. This criterion, which was discovered by Oskar Perron (1880–1975) in 1907, depends on the following simple concept.

Definition. A square stochastic matrix A is called primitive if for some n ≥ 1 the matrix A^n has no entries equal to 0.

Remark. Note that if A^n has no zero entries, then the same is true for A^{n+1}, etc. Thus we see that A is primitive if and only if A^n is primitive for some (hence every) n.

Example 7.22. (a) A = (1/10) [ 1  2 ; 9  8 ] is clearly primitive (take n = 1).

(b) A = (1/10) [ 0  1 ; 10  9 ] is also primitive, for

A^2 = (1/100) [ 0  1 ; 10  9 ]^2 = (1/100) [ 10  9 ; 90  91 ].

(c) Upper/lower triangular (block) matrices are never primitive. In particular, the transition matrix A of the commodities example (Example 7.19) is not primitive.
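Primitivity can likewise be tested by repeated squaring, as in Example 7.22(b). A small Python sketch (the function name and the use of Wielandt's bound (m − 1)^2 + 1, a known bound on how high a power must be examined for an m × m primitive matrix, are our choices):

import numpy as np

def is_primitive(A):
    """Test whether some power A^n (n >= 1) has no zero entries.

    By the remark above, for a stochastic A, once A^n is positive all higher
    powers are too, so it suffices to examine A, A^2, A^4, ... up to the bound."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    bound = (m - 1) ** 2 + 1
    P, n = A.copy(), 1
    while True:
        if (P > 0).all():
            return True
        if n > bound:
            return False
        P = P @ P        # pass from A^n to A^(2n)
        n *= 2

print(is_primitive(np.array([[0, 1], [10, 9]]) / 10))    # True: Example 7.22(b)
print(is_primitive(np.array([[1, 0.5], [0, 0.5]])))      # False: triangular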

Theorem 7.18 (Perron’s Theorem). If A is a primitive, stochastic matrix, then:

(a) λ_1 = 1 is a simple, dominant eigenvalue of A.

(b) The associated eigenspace

E_A(1) = {c~p : c ∈ C}

is generated by a (unique) stochastic vector ~p (called the Perron vector of A).

(c) All the entries of ~p are positive.

Proof. Omitted; a proof can be found in F.R. Gantmacher, The Theory of Matrices, vol. II, p. 53ff.

Corollary. If A is primitive and stochastic, then A is power convergent with limit

lim_{n→∞} A^n = (~p | ~p | ... | ~p)   (m columns, each equal to ~p),

where ~p denotes the Perron vector of A. Moreover, for any vector ~v we have

lim_{n→∞} A^n ~v = c~p, where c = Σ_m ~v.

In particular, if ~v is stochastic, then lim_{n→∞} A^n ~v = ~p does not depend on ~v.

Proof. By Perron's theorem (Theorem 7.18) we know that 1 is a dominant, simple eigenvalue of A, so A is power convergent. Moreover, by Theorem 7.10 we have

lim_{k→∞} A^k = (1/(~u^t ~v)) ~u ~v^t,

where ~u is a 1-eigenvector of A and ~v is a 1-eigenvector of A^t. Here we can take

~u = ~p (the Perron vector) and ~v = (1, ..., 1)^t = Σ_m^t,

since A is stochastic. Now ~u^t ~v = Σ_m ~p = 1 since ~p is stochastic, and

~u ~v^t = ~p (1, ..., 1) = (~p | ... | ~p),

so the first formula follows. Moreover,

lim_{k→∞} A^k ~v = (lim_{k→∞} A^k) ~v = (~p | ... | ~p) ~v = ~p (Σ_m ~v) = c~p, with c = Σ_m ~v.

Remarks. 1) As the above proof shows, the conclusion of the corollary holds more generally for stochastic matrices for which 1 is a simple, dominant eigenvalue (and ~p is a stochastic 1-eigenvector). However, in this more general case the entries of ~p need not be positive.

2) The above theorem and its corollary reveal the following interesting fact about a stochastic discrete linear system ~v_{n+1} = A~v_n: if A is primitive, then no state will be completely drained in the long run (because the limit c~p has no zero entries). Note that in the commodities example (Example 7.19), the states R_1 and R_2 were drained in the long run (but there the transition matrix A was not primitive!).

Example 7.23. Consider the economics example 7.19 again, but assume now that the transition matrix A describing the fractions a_ij of the value of goods shipped weekly from region R_j to R_i is

A =
[ .1  .2  .1  .1 ]
[ .7  .6  .1  .1 ]
[ .1  .1  .7  .1 ]
[ .1  .1  .1  .7 ]

and the initial distribution vector is ~v = (v_1, v_2, v_3, v_4)^t, where v_1 + v_2 + v_3 + v_4 = 7. Find the distribution of the goods in the long run.

Solution. Note that here we are not specifying how many $100,000 worth of goods was present initially in each region, only that the total present initially was $700,000. So ~v could be (3, 2, 1, 1)^t or (7, 0, 0, 0)^t, etc.

Qualitative Solution. Thanks to O. Perron (through the corollary to his theorem) we know (with no calculation) that there is a vector ~p = (p_1, ..., p_4)^t, all of whose entries are positive (p_j > 0), that depends only on A (not on ~v), such that lim_{n→∞} A^n ~v = 7~p. Therefore, in the long run, no matter how ~v was chosen among all the vectors with entry sum 7, eventually R_j will have $700,000 p_j worth of the commodity. This tells us a great deal about the economic situation in a qualitative way.

Quantitative Solution. If we want more quantitative information, we need to find ~p. This is a routine matter of finding the eigenvectors for 1. We know in advance (again thanks to O. Perron) that 1 is simple, so any solution ~x of (A − I)~x = ~0 will be a multiple of ~p, i.e. ~x = c~p, with c = Σ_4 ~x. Now by row reducing A − I (or by inspection) we find that

~x = (6, 16, 11, 11)^t

is such a solution, so

~p = (1/Σ_4 ~x) ~x = (1/44)(6, 16, 11, 11)^t = (3/22, 4/11, 1/4, 1/4)^t ≈ (.14, .36, .25, .25)^t.
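This eigenvector computation is easy to check numerically; here is a minimal sketch in Python with numpy (variable names are our own):

import numpy as np

A = np.array([[.1, .2, .1, .1],
              [.7, .6, .1, .1],
              [.1, .1, .7, .1],
              [.1, .1, .1, .7]])

w, V = np.linalg.eig(A)
x = V[:, np.argmin(np.abs(w - 1))].real   # eigenvector for the eigenvalue closest to 1
p = x / x.sum()                           # rescale so the entries sum to 1
print(p)   # approx (0.1364, 0.3636, 0.25, 0.25) = (3/22, 4/11, 1/4, 1/4)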

Since lim_{n→∞} A^n ~v = 7~p, this means that

R_1 gets $95,455 = 3/22 × $700,000 worth of goods eventually,

R_2 gets $254,545 = 4/11 × $700,000 worth of goods eventually,

R_3 gets $175,000 = 1/4 × $700,000 worth of goods eventually,

R_4 gets $175,000 = 1/4 × $700,000 worth of goods eventually,

regardless of how the $700,000 worth of goods was initially distributed! The only effect the choice of ~v has on the system is to determine how soon the "ultimate" distribution is reached. For example, if

~v = 7~p ≈ (0.95, 2.55, 1.75, 1.75)^t,

then it is reached immediately; however, if

~v = (1.4, 2.1, 1.75, 1.75)^t,

then it will take longer. (Using the computer, it is not difficult to find the first n for which the dollar difference between $10^5 (A^n ~v)_j and $7 × 10^5 p_j is negligible for all j.

Here, 'negligible' means less than $10,000, $1,000, $100, $10, $1 or $0.01, depending on the situation at hand.)
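For instance, such a search could be sketched as follows (assuming the threshold is $1 and the starting vector is the second ~v above; the code and names are ours):

import numpy as np

A = np.array([[.1, .2, .1, .1],
              [.7, .6, .1, .1],
              [.1, .1, .7, .1],
              [.1, .1, .1, .7]])
p = np.array([3/22, 4/11, 1/4, 1/4])
v = np.array([1.4, 2.1, 1.75, 1.75])   # entry sum 7, i.e. $700,000 in total

tol = 1.0                               # 'negligible' = less than one dollar
n = 0
while np.max(np.abs(1e5 * v - 7e5 * p)) >= tol:
    v = A @ v                           # advance one week: v_{n+1} = A v_n
    n += 1
print(n)                                # first n with every region within $1 of its limit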

 1 1  0 2 3 Example 7.24. Find the limit lim An when A =  1 0 2  . n→∞  2 3  1 1 2 2 0 Solution. Clearly, A is stochastic. Even though A has some zero entries, A is still primitive because  0 1 1   0 1 1   5 2 4  2 3 2 3 1 A2 =  1 0 2   1 0 2  =  4 7 2   2 3   2 3  12   1 1 1 1 2 2 0 2 2 0 3 3 6 has no zero entries. Thus, by Perron’s theorem, A is power convergent and lim An = (~p|~p|~p), n→∞ where ~p is the Perron vector of A. To compute ~p, solve (A − I)~x = ~0:  −6 3 2   8   8  1 1 3 −6 4 ~x = ~0 ⇒ ~x = c 10 , so ~p = 10 . 6     27   3 3 −6 9 9  8 8 8  n Th. 7.18 1 Thus, lim A = 27  10 10 10  . n→∞ 9 9 9

 1 1  0 2 2 Example 7.25. Find the limit lim Bn when B =  1 0 1  . n→∞  2 2  1 2 3 3 0 Solution. We first observe that even though B is not stochastic (for the columns do not add to 1), A = Bt is stochastic, and so the theory of the present section applies. In fact, since A is the matrix of Example 7.24, we know that  8 8 8  n 1 lim A = 27  10 10 10  . n→∞ 9 9 9 Now since Bn = (At)n = (An)t, we therefore obtain

 8 8 8 t  8 10 9  n n t 1 1 lim B = lim (A ) = 27  10 10 10  = 27  8 10 9  . n→∞ n→∞ 9 9 9 8 10 9 350 Chapter 7: Powers of Matrices

We now use Perron's Theorem to analyze the following examples of "rat mazes" which might arise in biology experiments.

Example 7.26 (A Rat Maze). As in Example 5.30, consider the following system of three chambers connected by passages, as shown.

[Figure: three chambers, labelled 1, 2, 3, connected by passages.]

At time t_0, a rat is placed in one of the chambers (say in chamber 1). Each minute thereafter, the rat is driven out of its present chamber by some stimulus and is prevented from re-entering immediately. Assume: the rat chooses the exits of each chamber at random.

Question: What is the probability that the rat is in a certain chamber in the long run?

Solution. Recall from the solution of Example 5.30 that this problem gives rise to a Markov chain

~v_{n+1} = A~v_n with ~v_0 = (1, 0, 0)^t,

in which ~v_n = (v_{1,n}, v_{2,n}, v_{3,n})^t denotes the state probability vector (with v_{i,n} = the probability that the rat is in chamber i after n minutes), and that the transition matrix A is given by

A =
[  0   1/2  1/3 ]
[ 1/2   0   2/3 ]
[ 1/2  1/2   0  ]

(cf. Example 5.30), which is the same as that of Example 7.24. Thus, by the latter example we see that the matrix A is primitive, and that the Perron vector is ~p = (1/27)(8, 10, 9)^t. By Perron's theorem we therefore have

 8 8 8   8  n 1 1 lim A = 27  10 10 10  and lim ~vn = ~p = 27  10  . n→∞ 9 9 9 n→∞ 9

This means that in the long run the rat spends:

8/27 of its time in chamber 1,
10/27 of its time in chamber 2,
9/27 = 1/3 of its time in chamber 3.

Note: This result does not depend on where (i.e. in which chamber) the rat was originally placed.

Example 7.27. This example is the same experiment as in Example 7.26, except that the maze is different. Moreover, if the rat chooses exit A or B, then it leaves the maze (and the experiment is over).

[Figure: maze with chambers 1, 2, 3 and exits A, B.]

Analysis: To interpret this as a Markov chain, we let

s_j be the state: 'the rat is in chamber j', for j = 1, 2, 3;
s_4 be the state: 'the rat went to exit A';
s_5 be the state: 'the rat went to exit B'.

The corresponding transition matrix (assuming random selections by the rat) is:

 1 1  0 3 4 0 0  1 1   2 0 4 0 0    A =  1 1 0 0 0  .  2 3   0 1 1 1 0   3 4  1 0 0 4 0 1  TO  This matrix is of the form A = where BI

 1 1    0 3 4  1 1  0 0   1 1 0 3 4 1 0 T =  2 0 4  ,B = 1 ,O =  0 0  and I = . 1 1 0 0 4 0 1 2 3 0 0 0 The matrix A isn’t primitive, but we can use Theorem 7.15 to find lim An, as we did n→∞ in the problem of section 7.6 (cf. Example 7.20).   n 0 0 We have lim A = −1 because ρ(T ) < 1 (cf. Example 7.14). n→∞ B(I − T ) I  0 0 0 0 0    132 60 48  0 0 0 0 0  1 n 1   Now (I − T )−1 =  90 126 54  , and so lim A =  0 0 0 0 0  . 78 n→∞ 13   96 72 120  9 10 8 13 0  4 3 5 0 13

If we place the rat in chamber i with probability q_i initially, then the initial vector is ~v_0 = (q_1, q_2, q_3, 0, 0)^t and

lim_{n→∞} ~v_n = lim_{n→∞} A^n ~v_0 = (0, 0, 0, 1 − q, q)^t, where q = (4/13)q_1 + (3/13)q_2 + (5/13)q_3.
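The block computation above is short enough to verify directly; a small numpy sketch (matrix names as in the example):

import numpy as np

T = np.array([[0, 1/3, 1/4],
              [1/2, 0, 1/4],
              [1/2, 1/3, 0]])
B = np.array([[0, 1/3, 1/4],
              [0, 0, 1/4]])

# Lower-left block of lim A^n is B(I - T)^(-1), since rho(T) < 1 (Theorem 7.15).
absorb = B @ np.linalg.inv(np.eye(3) - T)
print(13 * absorb)   # approx [[9, 10, 8], [4, 3, 5]]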

Example 7.28. Consider the following rat maze:

[Figure: maze with four chambers, labelled 1, 2, 3, 4.]

Find the probability that the rat is in a given chamber after n minutes when n is large.

Solution. Here we have the Markov chain ~v_{n+1} = A~v_n whose transition matrix is

 1 1  0 0 4 2  3 1     0 0  OB A =  4 2  = ,  1 1 0 0  CO  3 2  2 1 3 2 0 0

if we let

B = (1/4) [ 1  2 ; 3  2 ],  C = (1/6) [ 2  3 ; 4  3 ]  and  O = [ 0  0 ; 0  0 ].

Thus A^2 = [ BC  0 ; 0  CB ], so

A^{2k} = [ (BC)^k  0 ; 0  (CB)^k ]  and  A^{2k+1} = A · A^{2k} = [ 0  B(CB)^k ; C(BC)^k  0 ],  for all k ≥ 0.

This shows that A is not power convergent. To find out what A^n looks like when n is large, we therefore need to distinguish two cases: (a) n large and even, (b) n large and odd.

Case (a). We calculate lim_{k→∞} (BC)^k = (~p | ~p), where ~p is the Perron vector of the primitive stochastic matrix

BC = (1/24) [ 10   9 ]
            [ 14  15 ].

We find that ~p = (1/23)(9, 14)^t, and hence

L_1 := lim_{k→∞} (BC)^k = (1/23) [  9   9 ]
                                 [ 14  14 ].

Then we calculate lim_{k→∞} (CB)^k = (~q | ~q), where ~q is the Perron vector of the primitive stochastic matrix

CB = (1/24) [ 11  10 ]
            [ 13  14 ].

We find that ~q = (1/23)(10, 13)^t, and hence

L_2 := lim_{k→∞} (CB)^k = (1/23) [ 10  10 ]
                                 [ 13  13 ].

Therefore

lim_{k→∞} A^{2k} = Diag(L_1, L_2) = (1/23)
[  9   9   0   0 ]
[ 14  14   0   0 ]
[  0   0  10  10 ]
[  0   0  13  13 ].

Thus, when n is large and even,

~v_n = A^n ~v_0 ≈ Diag(L_1, L_2) ~v_0 = (1/23) (9(v_1+v_2), 14(v_1+v_2), 10(v_3+v_4), 13(v_3+v_4))^t when ~v_0 = (v_1, v_2, v_3, v_4)^t.

For example, if ~v_0 = (1/2, 0, 1/2, 0)^t then ~v_{1024} ≈ (1/46)(9, 14, 10, 13)^t, and if ~v_0 = (1, 0, 0, 0)^t then ~v_{1024} ≈ (1/23)(9, 14, 0, 0)^t.

The latter computation can be interpreted as follows. If we place the rat in chamber 1 at time t = 0, then after 1024 minutes the probability that it is in chamber 1 is approximately 9/23, and that it is in chamber 2 is (approximately) 14/23. On the other hand, it is highly unlikely (i.e. the probability is almost 0) that it is in chambers 3 or 4.

Case (b). Here we have

lim_{k→∞} A^{2k+1} = A lim_{k→∞} A^{2k} = [ 0  B L_2 ; C L_1  0 ] = (1/23)
[  0   0   9   9 ]
[  0   0  14  14 ]
[ 10  10   0   0 ]
[ 13  13   0   0 ].

Thus, when n is large and odd,

~v_n ≈ (1/23)
[  0   0   9   9 ]
[  0   0  14  14 ]
[ 10  10   0   0 ]
[ 13  13   0   0 ] ~v_0 = (1/23) (9(v_3+v_4), 14(v_3+v_4), 10(v_1+v_2), 13(v_1+v_2))^t.

For example, if ~v_0 = (1, 0, 0, 0)^t then ~v_{1025} ≈ (1/23)(0, 0, 10, 13)^t, and if ~v_0 = (0, 0, 1, 0)^t then ~v_{1025} ≈ (1/23)(9, 14, 0, 0)^t.

Again, these computations can be interpreted as follows. If we place the rat in chamber 1 at time t = 0, then it is highly unlikely that after 1025 minutes it is in chambers 1 or 2, whereas the probability that it is in chambers 3 and 4 is approximately 10/23 and 13/23, respectively. Similarly, if we place it in chamber 3 at time t = 0, then it is highly unlikely that it is in chambers 3 or 4 after 1025 minutes, whereas the probability that it is in chambers 1 and 2 at that time is approximately 9/23 and 14/23, respectively.
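The two limiting patterns can be observed directly by computing high even and odd powers; a small sketch:

import numpy as np
from numpy.linalg import matrix_power

A = np.array([[0, 0, 1/4, 1/2],
              [0, 0, 3/4, 1/2],
              [1/3, 1/2, 0, 0],
              [2/3, 1/2, 0, 0]])
v0 = np.array([1.0, 0, 0, 0])             # rat starts in chamber 1

print(23 * matrix_power(A, 1024) @ v0)    # approx (9, 14, 0, 0)
print(23 * matrix_power(A, 1025) @ v0)    # approx (0, 0, 10, 13)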

Example 7.29. The game of craps. This game requires only an ordinary pair of dice. It has two stages:

"Coming Out". The "shooter" (= player) rolls the dice. If she "rolls" (i.e. the sum of the dots on the tops of the dice is) 7 or 11, she wins; if she rolls 2, 3 or 12, she loses. In these two cases the game is over. However, if 4, 5, 6, 8, 9 or 10 is rolled, then that number is called the shooter's "point" and we go to the second stage:

"Making the Point". Now the shooter rolls the dice again, and keeps rolling: she wins if she makes her point (i.e. rolls the same point as was determined in the first stage), loses if she rolls 7, and rolls again if any other number turns up.

For example: Suppose the shooter rolls 6 at first (so 6 is the point). Then she rolls 8 (not the point, not 7), so she rolls again. Say she rolls a 3 (3 is not 7, not 6), so she rolls again, say a 2 (not 7, not 6), so she rolls again, etc., until either 6 is rolled (she wins) or 7 (she loses).

We consider 8 states:

s_1 : the point is 4      s_2 : the point is 10
s_3 : the point is 5      s_4 : the point is 9
s_5 : the point is 6      s_6 : the point is 8
s_7 : the shooter loses   s_8 : the shooter wins

The transition matrix is

A = (1/36)
[ 27   0   0   0   0   0   0   0 ]
[  0  27   0   0   0   0   0   0 ]
[  0   0  26   0   0   0   0   0 ]
[  0   0   0  26   0   0   0   0 ]
[  0   0   0   0  25   0   0   0 ]
[  0   0   0   0   0  25   0   0 ]
[  6   6   6   6   6   6  36   0 ]
[  3   3   4   4   5   5   0  36 ]

and the initial vector is ~v_0 = (1/36)(3, 3, 4, 4, 5, 5, 4, 8)^t. Therefore w, the shooter's probability of winning eventually, is the 8th entry in lim_{n→∞} A^n ~v_0.

Now A = [ T  O ; B  I ], where T is the diagonal matrix (1/36) Diag(27, 27, 26, 26, 25, 25) occupying the first 6 rows and columns of A, so ρ(T) = 27/36 < 1, and hence

lim_{n→∞} A^n ~v_0 = [ O  O ; B(I − T)^{-1}  I ] ~v_0,

where

B = (1/36) [ 6  6  6  6  6  6 ]
           [ 3  3  4  4  5  5 ]

and (I − T)^{-1} = 36 Diag(1/9, 1/9, 1/10, 1/10, 1/11, 1/11). Thus,

B(I − T)^{-1} = (1/165) [ 110  110  99  99  90  90 ]
                        [  55   55  66  66  75  75 ],

and so the last row of A_∞ = lim_{n→∞} A^n is

(1/165)(55, 55, 66, 66, 75, 75, 0, 165) = (1/3, 1/3, 2/5, 2/5, 5/11, 5/11, 0, 1).

Thus, the desired probability is

w = (1/3, 1/3, 2/5, 2/5, 5/11, 5/11, 0, 1) · ~v_0 = 976/1980 = 244/495 ≈ 0.4929.
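The value w = 244/495 ≈ 0.4929 can also be corroborated by direct simulation of the game, without any matrix theory; a Monte Carlo sketch (the function name and trial count are our own choices):

import random

def craps_win_probability(trials=1_000_000, seed=1):
    """Estimate the shooter's probability of winning by playing many games."""
    rng = random.Random(seed)
    roll = lambda: rng.randint(1, 6) + rng.randint(1, 6)
    wins = 0
    for _ in range(trials):
        first = roll()
        if first in (7, 11):            # "coming out": immediate win
            wins += 1
        elif first not in (2, 3, 12):   # otherwise 'first' becomes the point
            while True:                 # "making the point"
                r = roll()
                if r == first:          # made the point: win
                    wins += 1
                    break
                if r == 7:              # rolled 7: lose
                    break
    return wins / trials

print(craps_win_probability())          # approx 0.4929 = 244/495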

Exercises 7.7.

1. Consider the following stochastic matrices

 2 0 1   2 2 1  1 1 A = 4  0 4 2  and B = 4  0 2 2  . 2 0 1 2 0 1

Show that B is primitive but that A is not. (Suggestion: Show that A violates Perron’s Theorem.)

2. Let A = (a_ij) be a stochastic matrix of size m × m. Show that:

(a) If for every pair i, j with 1 ≤ i, j ≤ m there is an index k (depending on i, j) with a_ik ≠ 0 and a_kj ≠ 0, then A is primitive. [Hint: Consider A^2.]

(b) If each row and each column of A contains fewer than m/2 zeros, then A is primitive. [Hint: Use part (a).]

3. A salesman's territory consists of 3 cities C_1, C_2, C_3. He never sells in the same city on successive days. If he sells in C_1, then the next day he sells in C_2. If he sells in either C_2 or C_3, then the next day he is twice as likely to sell in C_1 as in the other city.

Let v_{kn} denote the probability that he is in city C_k on the nth day, and let ~v_n = (v_{1n}, v_{2n}, v_{3n})^t denote the associated probability distribution vector.

(a) Draw a transition diagram and write down the transition matrix A such that ~v_{n+1} = A~v_n.
(b) Find the matrix A^2. If he is in city C_2 on one day, what is the probability that he is in C_1 two days later?
(c) Show that the matrix A is primitive.
(d) What proportion of his time does the salesman spend in each city in the long run?

4. A sequence of random numbers x_1, x_2, x_3, ..., x_n, ... is generated by rolling one die repeatedly and recording the number x_n shown at the top of the die at the nth roll. Let y_n = max(x_1, x_2, ..., x_n), the largest number shown in n rolls. This gives rise to a Markov chain in which the state s_j is the event: "the largest number shown so far is j" (for j = 1, 2, 3, 4, 5, 6).

(a) Write down the transition matrix and initial vector of the Markov chain.
(b) By means of (a), or otherwise, compute the probability that 4 is the largest number shown in three rolls of the die. What happens in the long run?

7.8 Application: Finding approximate eigenvalues of a matrix

Very often it is a difficult if not impossible task to find the exact eigenvalues of a relatively small (even a 5 × 5) matrix, and so we have to be content with finding approximate values. However, in place of finding first the characteristic polynomial and then its approximate roots, there is a better and quicker method based on the ideas of this chapter. This method, while not always successful, works frequently enough that most computer programs for finding approximate eigenvalues are based on it.

Before presenting the method, let us first observe the following variant of Theorem 7.8.

Theorem 7.19. If α ≠ 0 is a dominant, regular eigenvalue of A, then

(a) B = (1/α)A is power convergent, and
(b) each non-zero column of lim_{n→∞} B^n is an α-eigenvector of A.

The proof of this theorem is based on the following fact, which we had already used several times in the examples.

Observation. Let σ be a non-zero complex number and let λ_1, ..., λ_s be the distinct eigenvalues of an m × m matrix A. Then σλ_1, ..., σλ_s are the distinct eigenvalues of σA, and we have

m_{σA}(σλ_i) = m_A(λ_i) and ν_{σA}(σλ_i) = ν_A(λ_i), for 1 ≤ i ≤ s.

In particular, λi is a regular eigenvalue of A if and only if σλi is a regular eigenvalue of σA.

[Write ch_A(t) = (−1)^m det(A − tI) = (t − λ_1)^{m_1} ··· (t − λ_s)^{m_s}. Then

ch_{σA}(t) = (−1)^m det(σA − tI)
           = (−1)^m det(σ(A − (t/σ)I))
           = (−1)^m σ^m det(A − (t/σ)I)
           = σ^m ((t/σ) − λ_1)^{m_1} ··· ((t/σ) − λ_s)^{m_s}
           = (t − σλ_1)^{m_1} ··· (t − σλ_s)^{m_s},

which shows that σA has the eigenvalues σλ_1, ..., σλ_s with respective algebraic multiplicities m_1, ..., m_s. Finally, since

E_{σA}(σλ_i) = Nullspace(σA − σλ_i I) = Nullspace(σ(A − λ_i I)) = Nullspace(A − λ_i I) = E_A(λ_i),

we have in particular that ν_{σA}(σλ_i) = ν_A(λ_i) by taking dimensions.]

Proof of Theorem 7.19. (a) Let λ_1 = α, λ_2, ..., λ_s be the eigenvalues of A. Then by the above observation the eigenvalues of B are

1, λ_2/α, ..., λ_s/α.

Since α is a regular eigenvalue for A, 1 is a regular eigenvalue for B (by the same observation). Moreover, since α is dominant, we have |λ_j/α| < 1 for j > 1. Thus B satisfies the conditions of Theorem 7.6 for power convergence and hence is power convergent.

(b) Put L = lim_{n→∞} B^n, and let ~v = L~e_j be any non-zero column of L. Since

BL = B lim_{n→∞} B^n = lim_{n→∞} B^{n+1} = lim_{n→∞} B^n = L,

we see that B~v = BL~e_j = L~e_j = ~v, and so A~v = α~v. This means that ~v ∈ E_A(α), as claimed.

From the above theorem we see that for large n, every non-zero column of α^{−n}A^n, and therefore every non-zero column of A^n, is approximately an eigenvector of A belonging to the dominant eigenvalue α; furthermore, the approximation becomes better the larger n is. Let

~v = (v_1, v_2, ..., v_m)^t

be any non-zero column of A^n, where n is large. Since ~v is an approximate eigenvector of A, i.e. A~v ≈ α~v, the numbers

(A~v)_j / v_j, for those j with v_j ≠ 0,

differ little from each other and from α; in fact, if ~v were an exact eigenvector, then these numbers would all coincide with the corresponding eigenvalue (which is close to α).

If we are given a matrix A and asked to find an eigenvalue of A, even if we don't know in advance that A has a dominant, regular eigenvalue, we could look at the powers of A (it's easy to do this with a computer) and test a non-zero column ~c_n. First we will look at a 2 × 2 example (even though we can find the eigenvalues exactly for such small matrices) to illustrate this method (called the power method).

Example 7.30. Find an approximate eigenvalue of A = [ 1  1 ; 1  2 ].

Solution. We have

A^2 = [ 2  3 ]
      [ 3  5 ];

squaring A^2 we get

A^4 = [ 13  21 ]
      [ 21  34 ];

squaring A^4 we get

A^8 = [ 610   987 ]
      [ 987  1597 ].

Suppose we stopped squaring at the third step and chose the first column of A^4 as our candidate ~v for an eigenvector of A. How good is ~v? Is A~v close to a multiple of ~v?

Here ~v = (13, 21)^t, A~v = (34, 55)^t, and

(A~v)_1 / v_1 = 34/13 ≈ 2.61538,
(A~v)_2 / v_2 = 55/21 ≈ 2.61905.

We could take either 2.61538 or 2.61905 as our candidate for the corresponding eigenvalue; let us compromise by taking λ to be their average: (1/2)(34/13 + 55/21) ≈ 2.61722. Then ~v, λ will be good candidates if A~v is close to λ~v. How close is A~v to λ~v?

||A~v − λ~v|| = √((34 − 13λ)^2 + (55 − 21λ)^2) < √(0.0005669 + 0.0014793) < 0.046.

Is that small enough? That depends on (a) how good an approximation is required and (b) on the size of ~v. To standardize comparisons between different eigenvector candidates (ones coming from different powers of A), we could take care of (b) by normalizing ~v; note that

||A((1/||~v||)~v) − λ((1/||~v||)~v)|| = (1/||~v||)||A~v − λ~v|| < 0.0022

in our case. If we are satisfied with that degree of approximation, we stop here. If not (suppose e.g. we wanted this quantity to be < 0.00001), then we square the power of A we were looking at (in this case A^4) and repeat the procedure on that matrix (in this case A^8). Taking the first column of A^8 (we could use any non-zero column) as our new candidate for an eigenvector of A, we get

~v = (610, 987)^t, A~v = (1597, 2584)^t, and

(A~v)_1 / v_1 = 1597/610,
(A~v)_2 / v_2 = 2584/987;

these last two numbers average out to λ ≈ 2.618033618. To see how good ~v, λ are, we calculate the tolerance ∆, which is defined by

(24) ∆(~v, λ) = ||A~v − λ~v|| / ||~v||.

Here we get

∆ = (1/||~v||) ||(1597 − 1597.000507, 2584 − 2583.999181)^t|| < (1/||~v||)(0.0052) < 0.000005,

which is good enough, since we prespecified ∆ < 0.00001. Moreover, we know that λ ≥ |µ| for all other eigenvalues µ of A (so λ is "almost" dominant). If, instead of 0.00001, we had specified 0.000001, then we would have to square A^8 and repeat the process. In this case we obtain λ ≈ 2.618033989 and ∆ ≈ 0 to 9 decimal places.

Thus we have found one approximate eigenvalue λ_1 ≈ 2.6180... of A. To find the other one, we shall use the trace tr(A) of A. Recall (cf. section 7.5) that

tr(A) = sum of the diagonal entries of A = sum of the eigenvalues of A (counted with their multiplicities);

hence in our case tr(A) = λ_1 + λ_2 = 1 + 2 = 3, and so λ_2 = 3 − λ_1 = 0.381966011 to 9 decimal places. To see how good an approximation this is, let us compute λ_1 and λ_2 exactly:

ch_A(t) = det [ 1−t  1 ; 1  2−t ] = (1 − t)(2 − t) − 1 = t^2 − 3t + 1 = (t − λ_1)(t − λ_2),

with

λ_1 = (3 + √(9−4))/2 = 3/2 + (1/2)√5 ≈ 2.618033989 and λ_2 = (3 − √5)/2 ≈ 0.381966011.

Thus, the approximate eigenvalues which we found agree with the exact eigenvalues up to 9 decimal places.
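The squaring procedure carried out by hand above is easy to mechanize. Here is a rough Python sketch of it (the names are ours, and we assume, as in Theorem 7.19, that A has a dominant, regular eigenvalue):

import numpy as np

def power_method(A, eps=1e-5, max_squarings=30):
    """Return (lam, v, delta) with delta = ||Av - lam v|| / ||v|| < eps (if attained)."""
    A = np.asarray(A, dtype=float)
    M = A.copy()
    lam, v, delta = None, None, np.inf
    for _ in range(max_squarings):
        v = M[:, 0]                          # candidate eigenvector: first column
        y = A @ v
        ratios = y[v != 0] / v[v != 0]
        lam = ratios.mean()                  # average of the quotients (Av)_j / v_j
        delta = np.linalg.norm(y - lam * v) / np.linalg.norm(v)
        if delta < eps:
            break
        M = M @ M                            # square: pass to the next power of A
        M = M / np.abs(M).max()              # rescale to keep the entries manageable
    return lam, v, delta

lam, v, delta = power_method(np.array([[1.0, 1], [1, 2]]))
print(lam)   # approx 2.618034 = (3 + sqrt(5))/2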

Example 7.31. Find an eigenvalue λ of

A =
[  1  2  4  1 ]
[  2  1  0  3 ]
[  2  1  4  1 ]
[ −1  1  2  4 ]

with tolerance ∆(λ, ~v) < 0.0001 (cf. (24)).

Solution. The first column of A^8 is

~c = (917758, 558547, 970125, 515537)^t.

Since its entries are a bit too large to deal with comfortably, we choose ~v = 10^{−5} times ~c, i.e.

~v = (9.17758, 5.58547, 9.70125, 5.15537)^t, and so A~v = (64.30889, 39.40674, 67.90100, 36.43187)^t.

The corresponding value of λ is

λ = (1/4)(64.30889/9.17758 + ··· + 36.43187/5.15537) ≈ 7.032094636.

To see how good the approximation is, we calculate:

||A~v − λ~v|| ≈ 0.4503433331, ||~v|| ≈ 15.36611666;

therefore ∆(λ, ~v) ≈ 0.450343/15.36611 ≈ 0.029307 > 0.0001. So we look at 10^{−2} times the first column of (10^{−5}A^8)(10^{−5}A^8); call it ~w. We get

 5.405674   37.96079165   3.332553   ...  ~w =   and A~w =   .  5.700911   ...  3.086350 ...

The corresponding λ is

λ = (1/4)(37.96079165/5.405674 + .../3.332553 + .../5.700911 + .../3.086350) ≈ 7.022464671.

Since ||A~w − λ~w|| ≈ 0.00077075 and ||~w|| ≈ 9.074864, we see that

∆(λ, ~w) = ||A~w − λ~w|| / ||~w|| ≈ 0.00077075/9.074864 ≈ 0.0000849 < 0.0001,

and so we can stop here and take λ, ~w as our approximate eigenvalue and eigenvector.

Power Method Algorithm

Suppose A is any m × m matrix and ε > 0 is the degree of tolerance required for our approximation (in our last example ε = 0.0001). Successively carry out the following steps until ∆ < ε. We will suppose we have tried A, A^2, A^4, ..., A^{2^{n−1}} and are now looking at M = A^{2^n}.

1) Let ~v be the first column of M (or any other non-zero column).
2) Let ~y = A~v.

3) Calculate λ = the average of { y_j/v_j : v_j ≠ 0 }.
4) Calculate ∆ = ||~y − λ~v|| / ||~v||.
5) If ∆ < ε, stop: (λ, ~v) is the required approximation. If ∆ ≥ ε, then square A^{2^n}, call the new matrix M, and do steps 1–5 again.

This is just a rough outline (compare the sketch following Example 7.30). We have left out the possibility of replacing M by a scalar multiple of M if the entries in M are unwieldy. In addition, some instruction should be given about when to quit if the method does not seem to be working. We will point out in the exercises how the power method can sometimes give us an eigenvalue of A even when A doesn't have a dominant regular eigenvalue. Of course, the method always works when A has a dominant, regular eigenvalue.

Deflation Method

Having found one eigenvalue of A, we would like to find the rest. One popular method is to replace A by a smaller matrix that contains the rest of the eigenvalues of A, and to keep repeating the process until all the eigenvalues are found.

Theorem 7.20. Suppose A = (a_ij) is an m × m matrix and ~v = (v_1, ..., v_m)^t is an eigenvector for the eigenvalue α. If v_1 ≠ 0, define the (m − 1) × (m − 1) matrix A_2 by

A_2 =
[ a_22  a_23  ...  a_2m ]
[ a_32  a_33  ...  a_3m ]
[  :     :          :   ]
[ a_m2  a_m3  ...  a_mm ]
− (1/v_1)(v_2, v_3, ..., v_m)^t (a_12, a_13, ..., a_1m).

Then we have

ch_A(t) = (t − α) ch_{A_2}(t).

Thus, if λ_2, λ_3, ..., λ_m denote the eigenvalues of A_2 (counted with multiplicities), then α, λ_2, ..., λ_m are those of A.

Before we prove the theorem, let us see how it is used.

Example 7.32. Use the deflation method to find the eigenvalues of

 1 1 8  A =  4 4 2  . 0 6 4

Solution. Since all row sums are 10, we see without further calculation that

A (1, 1, 1)^t = 10 (1, 1, 1)^t.

So 10 is an eigenvalue and (1, 1, 1)^t is one of its eigenvectors. Thus

A_2 = [ 4  2 ] − (1/1) [ 1 ] (1, 8) = [ 4  2 ] − [ 1  8 ] = [ 3  −6 ]
      [ 6  4 ]         [ 1 ]          [ 6  4 ]   [ 1  8 ]   [ 5  −4 ].

The rest of the eigenvalues of A are the eigenvalues of A_2, which we find by calculating the roots of the characteristic polynomial of A_2. Here

ch_{A_2}(t) = (t − 3)(t + 4) + 30 = t^2 + t + 18,

and its roots are (−1 ± i√71)/2, so the eigenvalues of A are

10, (−1 + i√71)/2, (−1 − i√71)/2.

 1 1 0 1   0 2 1 0  Example 7.33. Find the eigenvalues of A =   by the deflation method.  1 −1 3 0  −1 1 2 1

Solution. Since each row sum is 3, we see that 3 is an eigenvalue of A with eigenvector (1, 1, 1, 1)^t. Here

A_2 = [  2  1  0 ]   [ 1 ]             [  2  1  0 ]   [ 1  0  1 ]   [  1  1  −1 ]
      [ −1  3  0 ] − [ 1 ] (1, 0, 1) = [ −1  3  0 ] − [ 1  0  1 ] = [ −2  3  −1 ]
      [  1  2  1 ]   [ 1 ]             [  1  2  1 ]   [ 1  0  1 ]   [  0  2   0 ].

We could use the power method to find the eigenvalues of A_2, but instead we shall compute its characteristic polynomial. It is ch_{A_2}(t) = (t − 2)(t^2 − 2t + 3), so the eigenvalues of A_2 are 2, 1 + i√2, 1 − i√2, and hence the eigenvalues of A are 3, 2, 1 + i√2, 1 − i√2.
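The construction of A_2 in Theorem 7.20 (and in its variant, Theorem 7.20′ below) is mechanical and easy to code; a small sketch (the function name is ours), checked against Example 7.32:

import numpy as np

def deflate(A, v, j=0):
    """Deflated (m-1) x (m-1) matrix A_2 for eigenvector v with v[j] != 0."""
    A = np.asarray(A, dtype=float)
    v = np.asarray(v, dtype=float)
    Aj = np.delete(np.delete(A, j, axis=0), j, axis=1)   # delete j-th row and column
    vj = np.delete(v, j)                                 # v without its j-th entry
    rj = np.delete(A[j], j)                              # j-th row of A without entry j
    return Aj - np.outer(vj, rj) / v[j]

A = np.array([[1, 1, 8], [4, 4, 2], [0, 6, 4]])
A2 = deflate(A, [1, 1, 1])
print(A2)                      # [[ 3. -6.] [ 5. -4.]] as computed above
print(np.linalg.eigvals(A2))   # approx (-1 +/- i sqrt(71))/2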

Proof of Theorem 7.20. We know that A~v = α~v and v_1 ≠ 0. Define an m × m matrix P by

P =
[ v_1  0  0  ...  0 ]
[ v_2  1  0  ...  0 ]
[ v_3  0  1  ...  0 ]
[  :   :  :       : ]
[ v_m  0  0  ...  1 ].

If we let ~w = (v_2, v_3, ..., v_m)^t, then P = [ v_1  ~0^t ; ~w  I ], where I is the (m − 1) × (m − 1) identity matrix. The first column of AP is A~v, but A~v = α~v, so

AP =
[ αv_1  a_12  ...  a_1m ]
[ αv_2  a_22  ...  a_2m ]
[  :     :          :   ]
[ αv_m  a_m2  ...  a_mm ].

If we put ~c = (a_12, ..., a_1m)^t and write B for the lower right (m − 1) × (m − 1) block of A, then AP = [ αv_1  ~c^t ; α~w  B ]. Moreover,

P^{-1} = [ v_1^{-1}  ~0^t ; −v_1^{-1}~w  I ]

because, by block multiplication,

[ v_1^{-1}  ~0^t ; −v_1^{-1}~w  I ] [ v_1  ~0^t ; ~w  I ] = [ v_1^{-1}v_1  ~0^t ; −v_1^{-1}~w v_1 + ~w  I ] = [ 1  ~0^t ; ~0  I ],

which is the m × m identity matrix. Therefore

P^{-1}AP = [ v_1^{-1}  ~0^t ; −v_1^{-1}~w  I ] [ αv_1  ~c^t ; α~w  B ]
         = [ α  v_1^{-1}~c^t ; −α~w + α~w  −v_1^{-1}~w~c^t + B ]
         = [ α  v_1^{-1}~c^t ; ~0  A_2 ].

Since similar matrices have the same characteristic polynomials, we obtain

ch_A(t) = ch_{P^{-1}AP}(t) = det(tI − P^{-1}AP) = det [ t − α  −v_1^{-1}~c^t ; ~0  tI − A_2 ].

Expanding this determinant along the first column yields ch_A(t) = (t −

α) det(tI − A_2) = (t − α) ch_{A_2}(t), as desired.

Here is a version of Theorem 7.20 which can be used if the first entry of the eigenvector ~v is zero. We omit the proof, since it is similar to that of Theorem 7.20.

Theorem 7.20′. Suppose A = (a_ij) is an m × m matrix, ~v = (v_1, ..., v_m)^t is an eigenvector for the eigenvalue α, and v_j ≠ 0. Define

A_2 = A^{(j)} − (1/v_j)(v_1, ..., v_{j−1}, v_{j+1}, ..., v_m)^t (a_{j1}, ..., a_{j,j−1}, a_{j,j+1}, ..., a_{jm}),

where A^{(j)} is the (m − 1) × (m − 1) matrix obtained from A by deleting its jth row and column. Then

ch_A(t) = (t − α) ch_{A_2}(t),

and hence α, λ_2, λ_3, ..., λ_m are the eigenvalues of A if λ_2, λ_3, ..., λ_m are those of A_2.

Exercises 7.8.

1. Let

A =
[  0  2  −2  4 ]
[  0  2  −1  1 ]
[ −1  2  −1  4 ]
[ −1  1  −1  4 ].

Given that 2 is an eigenvalue of A, use the deflation method (at least once) to find the spectrum of A.

7.9 Review Exercises

Review Problems, Part A

1. Which of the following matrices are power convergent?

 3 −3 1   3 3 1   1 −1 0  1 1 (a) 4  0 2 0  , (b) 4  0 4 0  , (c)  0 1 0  , −1 −5 3 1 −3 3 0 −1 1  0 1.9 .9   .8 .4  (d) 0 .9 0 , (e) .   .2 .6 .9 .9 0

For those matrices A which are power convergent, find lim An. n→∞ 2. Find the spectral radius of each matrix:

 7 3 1 −1   3 4 5   2 1   1 2  0 1 2 3 (a) ; (b) ; (c) −1 2 1 ; (d)   . 3 1 3 2    0 0 −1 4  1 0 1   0 0 0 −9

3. Find the spectral radius of A if ch_A(t) = (t^2 + 6t + 9)(t^2 + 2t + 11)^2.

4. Find the spectral radius of A if

 1 1 1  n lim A =  2 2 2  . n→∞ 3 3 3

 2 .5 0 .5   .2 i .4 −.1  5. Suppose that A =  , and that λ is any eigenvalue of A.  .2 0 1 + i 1  1+i 2 .1 0 3

(a) Draw the Geršgorin discs D_j(A), j = 1, 2, 3, 4, in the plane.
(b) Use (a) to obtain upper and lower bounds on Re(λ), Im(λ) and |λ|.
(c) Does your answer in (a) give you enough information to decide:

(i) whether A is invertible? If so, is it?
(ii) whether (1/3)A is power convergent? If so, is it?

 3 1 0 −1   −2 5 1 0  6. Suppose A =  .  −1 0 4 1  2 −1 −1 7 (a) Use Method 1 (Theorem 7.12) to estimate ρ(A). (b) Use Method 2 (Theorem 7.14) to estimate ρ(A). (c) Use Method 3 (Theorems 7.12 and 7.14) to estimate ρ(A). (d) Do you have enough information from (a), (b), and (c) to decide that

lim_{n→∞} ((1/9)A)^n = 0?

Explain briefly.

7. For each of the following matrices T, determine whether the geometric series generated by T converges. If so, find the sum Σ_{n=1}^{∞} T^n.

 0 1 2  1  1 2  1  1 2  (a) T = .3, (b) T = , (c) T = , (d) T = 0 0 3 . 4 3 2 5 3 2   0 0 0

8. (a) Use the technique of the economics example to evaluate

~w = lim_{n→∞}
[ .2  .4  0  0 ]^n [ v_1 ]
[ .4  .4  0  0 ]   [ v_2 ]
[ .3  .1  1  0 ]   [ v_3 ]
[ .1  .1  0  1 ]   [ v_4 ].

(b) Find two different stochastic vectors such that the vectors ~w calculated in (a) are different.

9. (a) Use the fact that A is a primitive, stochastic matrix to evaluate lim An when n→∞

 .5 .2 .1  A =  .3 .4 .8  . .2 .4 .1

(b) If ~v = (v_1, v_2, v_3)^t is an arbitrary stochastic vector, find lim_{n→∞} A^n ~v.
(c) Is there a pair of stochastic vectors ~v, ~w such that

lim_{n→∞} A^n ~v ≠ lim_{n→∞} A^n ~w?

10. Suppose we alter the game of “craps” so that the player wins (instead of losing) if he rolls a 2 at the first stage of the game.

(a) Find the initial vector ~v_0 for the Markov chain describing the new game.
(b) Find the probability that the player wins eventually under this new rule.

11. If A is an n × n stochastic matrix and A^t is also stochastic, show that

 1 1 ··· 1  1 1  1 1 ··· 1  k t   lim A = (Σn| ... |Σn) =  . . .  . k→∞ n n  . . .  1 1 ··· 1

12. Players A and B play the following game: Player A begins the game with $1 and B begins with $3. A rolls two dice. If A rolls 3, 5, or 6, then A wins $1. If A rolls 2, then A wins $2 (or $1, if that is all the money that B has). If A rolls any other number, then B wins $1. Player A continues to roll until A or B has no money left. Find the probability that A wins eventually. Suggestion. Use a Markov chain with 5 states:

s_1: A has $1
s_2: A has $2
s_3: A has $3
s_4: A has $4 (i.e. B has no more money)
s_5: A has $0 (i.e. A has no more money)

Sample game:

Before roll no. | A has | B has | A rolls | Result
0               | $1    | $3    | 2       | A wins $2 from B
1               | $3    | $1    | 9       | A loses $1 to B
2               | $2    | $2    | 6       | A wins $1 from B
3               | $3    | $1    | 4       | A loses $1 to B
4               | $2    | $2    | 3       | A wins $1 from B
5               | $3    | $1    | 2       | A wins only $1 from B (all B has)
6               | $4    | $0    | -       | B loses: game over

 0 −1 1 2   2 4 −1 −2  13. Given that ~v = (0, 2, 0, 1)t is an eigenvector of A =  :  4 2 1 −4  −3 −1 1 5 (a) Find the eigenvalue of A belonging to the given vector. (b) Find the spectrum of A by using deflation at least once. Section 7.9: Review Exercises 367

Review Problems, Part B

1. Find

lim_{n→∞}
[ .1   0  1  0 ]^n
[  0  .2  0  0 ]
[ .8  .4  1  0 ]
[ .1  .4  0  1 ]

wait, corrected:

lim_{n→∞}
[ .1   0  0  0 ]^n
[  0  .2  0  0 ]
[ .8  .4  1  0 ]
[ .1  .4  0  1 ]

and

lim_{n→∞}
[ .1   0   0  0  0 ]^n
[  0  .2   0  0  0 ]
[  0   0  .3  0  0 ]
[ .8  .4  .3  1  0 ]
[ .1  .4  .4  0  1 ].

2. Suppose

A =
[ x   1  1  1   1 ]
[ 0  .5  1  2   1 ]
[ 0   0  1  0   0 ]
[ 0   0  0  1   0 ]
[ 0   0  0  0  .7 ].

For which values of x (if any) is A power convergent? Explain briefly.

3. Suppose that the characteristic polynomial of an m × m matrix A is (t − 1)^2 (t + 1)^3 (t + 4)(t − 2)^2, and let B = (1/2)A.

(a) Find m and the trace of A.
(b) Find the spectrum of B.
(c) Is B power convergent? Justify your answer.

4. Suppose

A =
[ −1+3i    0     1    2  ]
[    0   1+3i   0.3  0.2 ]
[  0.5    −2    3.5   0  ]
[    0     1   −1.5   3  ].

Apply Geršgorin's theorem to A and to the transpose of A,

(a) to estimate the minimum of the real parts of the eigenvalues of A,
(b) to estimate the maximum of the real parts of the eigenvalues of A.

5. Suppose

A =
[ 1+i   −1   .2 ]
[  .2  1−i   −1 ]
[  .3   .2   −1 ],

and λ is any eigenvalue of A.

(a) Sketch the Geršgorin discs D_j(A), j = 1, 2, 3.
(b) Obtain upper and lower bounds on |λ|, Re(λ), and Im(λ).
(c) Does the information above enable you to decide if A is invertible?
(d) Does the information above enable you to decide if (1/2)A is power convergent?

6. Suppose that the characteristic polynomial of an m × m matrix A is (t^2 + 6t + 9)(t^2 + 2t + 10)(t^2 + 2t − 8).

(a) Find m, the determinant and the trace of A.
(b) If r is the spectral radius of A, is (1/r)A power convergent? Justify your answer.

 −1 1 2 1  1 7. Suppose A =   and B = ( )A.  −1 1 2 −1  4 1 −2 1 1 (a) Estimate the spectral radius of A by any of the methods of the text. (b) Show that B is power-convergent.  0.8 0.3 0.2  8. Let A =  0 0.2 0  0 0.5 0.4 (a) Find the spectral radius of A.   ∞ 1 X n (b) Solve the equation A x =  1  n=0 1

 4  3i 1 1 − 3 i 0  0 0 2 i  9. Let A =  .  −1 1 3 −1  1 0 0 −2 (a) For each of the Gerˇsgorin discs of A, write down its centre and radius. (b) Give a rough sketch of the above discs. (c) Use Gerˇsgorin’s theorem to estimate the spectral radius ρ(A) of A.

10. Alex, Barbara and Charles are playing a game which has been analysed as a Markov process with 5 states: 1 - Alex plays, 2 - Barbara plays, 3 - Charles plays, 4 - Alex wins, 5 - Alex loses. The matrix M of transition probabilities and the matrix M* = lim_{n→∞} M^n are given below.

 3 3 1    8 8 8 0 0 0 0 0 0 0  1 3 3 0 0   0 0 0 0 0   8 8 8      ∗   M =  3 1 3 0 0  ,M =  0 0 0 0 0   8 8 8     1 0 0 1 0   11 4 7 1 0   8   26 13 26  1 1 15 9 19 0 8 8 0 1 26 13 26 0 1 Section 7.9: Review Exercises 369

1) What is the chance that Charles will have the next turn if Alex is playing now?
2) What is the chance that Alex will have the next turn if Charles is playing now?
3) If Alex is not allowed to start the game, whom should he prefer to have start it? Give your reason.
4) What are the chances that Alex will win if the starting player is chosen at random from among the three of them?
5) If M is written as [ A  O ; B  I ], how is M* written in terms of A, B, I, O?
6) Fill in the last three rows of a new matrix N which represents the analysis made when the state "Alex loses" is broken up into 5 - "Barbara wins" and 6 - "Charles wins". You need to know that only the person actually playing can win on that turn.

 3 3 1  8 8 8 0 0 0  1 3 3   0 0 0   8 8 8   3 1 3 0 0 0   8 8 8  N =    ? ? ? 1 0 0     ? ? ? 0 1 0    ? ? ? 0 0 1

 .8 .1 .1  n 11. Find lim A where A is the column (and row) stochastic matrix  .2 .7 .1  . n→∞ 0 .2 .8

 4 1 1 1   1 3 0 0  12. Let A =  .  1 0 1 0  1 0 0 1 (a) Show that 1 is an eigenvalue of A. (b) Without calculating the characteristic polynomial of A, show√ that the eigen- values of A are real numbers lying in the interval 0 < x ≤ 33. (c) By using the deflation method, find a 3 × 3 matrix B such that f(x) = (x − 1)g(x) where f(x) and g(x) are the characteristic polynomials of A and B respectively. √ 1 (d) Given that the other two eigenvalues of B are 2 (3 ± 5), why does L = 1 n lim A exist? Find L. n→∞ 5 370 Chapter 7: Powers of Matrices

 0 4 4 4   2 1 5 7  13. Suppose A =  . Use deflation at least once to find a spectrum of  2 2 6 −7  1 1 3 3  2   0   1   0  A, given that A   =   .  −1   0  0 0

 3  −1 1 2 14. Let A =  −1 0 0 . 3 3 5 − 2 2 2  1  (a) Show that  −1  is an eigenvector of A belonging to eigenvalue 1. 2 (b) Using the method of deflation find all other eigenvalues.

(c) Is A power convergent? If yes, find lim_{n→∞} A^n.

15. Let

A =
[  0    1  0 ]
[  0    0  1 ]
[ 12  −16  7 ],
~v = (1, 2, 4)^t.

(a) Verify by direct matrix multiplication that ~v is a 2-eigenvector of the matrix A.
(b) Apply deflation and find the remaining eigenvalues of A.