
Diagonalizability

Consider the constant-coefficient linear system

y' = Ay    (1)

where A is some n × n matrix. In general, this problem is not easy to solve since the equations in the system are usually coupled.

Example 1. If A = \begin{pmatrix} 2 & 1 \\ 4 & -1 \end{pmatrix}, the system y' = Ay is

y_1' = 2y_1 + y_2
y_2' = 4y_1 − y_2

Because both y1 and y2 appear in both equations, we cannot solve these equations independently.

In some cases, however, the equations in the system are uncoupled.

Example 2. If A = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}, the system y' = Ay is

y_1' = 3y_1
y_2' = −2y_2.

Since the first equation only contains y1 and the second equation only contains y2, we may solve the two equations independently of one another. The solution is

y_1 = c_1 e^{3t}
y_2 = c_2 e^{-2t}

or in vector form

y = \begin{pmatrix} c_1 e^{3t} \\ c_2 e^{-2t} \end{pmatrix} = c_1 e^{3t} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_2 e^{-2t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}.

The matrix A in this example is called a diagonal matrix, since its entries off the main diagonal are all zero. In general, a linear system is uncoupled if and only if its coefficient matrix is a diagonal matrix.

Now suppose we are given a system y' = Ay which is coupled. Is it possible to make a substitution which transforms the system into an uncoupled system? To answer this question, let's consider a substitution of the form

y = Cx,

where C is some invertible n × n matrix. This defines a change of coordinates: the old variable y will be replaced by the new variable x. To see what effect this has on the system y' = Ay, we substitute the change of coordinates into both sides. On the left side we have y' = Cx', and on the right side Ay = ACx, so Cx' = ACx. Multiplying both sides by the inverse of C gives x' = C^{-1}ACx.

Thus the original system in y with coefficient matrix A has been replaced by the new system in x with coefficient matrix C^{-1}AC. In order for this new system to be uncoupled, we need this coefficient matrix to be a diagonal matrix. That is, we want C^{-1}AC = D for some diagonal matrix D.

Definition 1. We say that a matrix A is diagonalizable if there exist an invertible matrix C and a diagonal matrix D such that C^{-1}AC = D.

So a system y' = Ay can be uncoupled if and only if A is a diagonalizable matrix. This conclusion leads to the following questions.

• Which matrices are diagonalizable?

• For those matrices A which are diagonalizable, how do we find C and D?

To answer these questions, suppose that we have found matrices

C = \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{pmatrix} \quad and \quad D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}

such that C^{-1}AC = D. Then multiplying both sides by C gives AC = CD. But

AC = \begin{pmatrix} | & | & & | \\ Av_1 & Av_2 & \cdots & Av_n \\ | & | & & | \end{pmatrix} \quad and \quad CD = \begin{pmatrix} | & | & & | \\ \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \\ | & | & & | \end{pmatrix},    (2)

so if we equate the columns of these matrices we find that

Av_1 = λ_1 v_1
Av_2 = λ_2 v_2
  ⋮
Av_n = λ_n v_n.

Motivated by this, we make the following definition.

Definition 2. A nonzero vector v such that

Av = λv

for some scalar λ is called an eigenvector of A with eigenvalue λ.

Thus in order for A to be diagonalized by the matrices C and D, the columns of C must be eigenvectors whose eigenvalues are the diagonal entries of D. Since C is invertible, its columns must be linearly independent. Thus we have shown that if A is diagonalizable, then there exist n linearly independent eigenvectors of A. But conversely, if there exist n linearly independent eigenvectors of A, then placing these vectors into the columns of a matrix C, and placing their eigenvalues into the diagonal matrix D, it follows by the same calculations above that AC = CD, so C^{-1}AC = D and A is diagonalizable.

Theorem 1. An n × n matrix A is diagonalizable if and only if there exist n linearly independent eigenvectors of A.

Example 3. Let A = \begin{pmatrix} 2 & 1 \\ 4 & -1 \end{pmatrix} and define v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. Then

Av_1 = \begin{pmatrix} 2 & 1 \\ 4 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3v_1,

so v_1 is an eigenvector with eigenvalue 3. Next, let v_2 = \begin{pmatrix} -1 \\ 4 \end{pmatrix}. Then

Av_2 = \begin{pmatrix} 2 & 1 \\ 4 & -1 \end{pmatrix} \begin{pmatrix} -1 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ -8 \end{pmatrix} = -2v_2,

so v_2 is an eigenvector with eigenvalue −2. Since v_1 and v_2 are linearly independent, the matrix C = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} is invertible, and by the reasoning above,

C^{-1}AC = D = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}.    (Check this yourself!)
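One way to carry out that check is numerically. The following is only a minimal sketch (it assumes NumPy is installed; the variable names are ours, not part of the text):

```python
import numpy as np

# Matrix A and eigenvector matrix C from Example 3
A = np.array([[2.0, 1.0],
              [4.0, -1.0]])
C = np.array([[1.0, -1.0],
              [1.0,  4.0]])   # columns are v1 and v2

# C^{-1} A C should be the diagonal matrix D = diag(3, -2)
D = np.linalg.inv(C) @ A @ C
print(np.round(D, 10))
# [[ 3.  0.]
#  [ 0. -2.]]
```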

Thus we have found matrices C and D which diagonalize A. Therefore, if we now make the change of variable y = Cx, then the system y' = Ay is transformed into the uncoupled system x' = Dx:

x_1' = 3x_1
x_2' = −2x_2.

The general solution is x_1 = c_1 e^{3t}, x_2 = c_2 e^{-2t}, or

x = \begin{pmatrix} c_1 e^{3t} \\ c_2 e^{-2t} \end{pmatrix}.

Therefore the general solution of the system y' = Ay is

y = Cx = \begin{pmatrix} 1 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} c_1 e^{3t} \\ c_2 e^{-2t} \end{pmatrix} = c_1 e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{-2t} \begin{pmatrix} -1 \\ 4 \end{pmatrix}.

The last statement in this example can be generalized as follows.

Theorem 2. If {v_1, v_2, ..., v_n} are linearly independent eigenvectors of A with eigenvalues λ_1, λ_2, ..., λ_n, respectively, then the general solution of y' = Ay is

y = c_1 e^{λ_1 t} v_1 + c_2 e^{λ_2 t} v_2 + ··· + c_n e^{λ_n t} v_n.

Proof. Let C be the matrix with columns v_1 through v_n and D the diagonal matrix with diagonal entries λ_1 through λ_n. Then using equations (2), we have AC = CD and therefore C^{-1}AC = D. So, as above, the substitution y = Cx transforms y' = Ay into x' = Dx. The solution of this uncoupled system is

x = \begin{pmatrix} c_1 e^{\lambda_1 t} \\ c_2 e^{\lambda_2 t} \\ \vdots \\ c_n e^{\lambda_n t} \end{pmatrix},

so

y = Cx = c_1 e^{λ_1 t} v_1 + c_2 e^{λ_2 t} v_2 + ··· + c_n e^{λ_n t} v_n.
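To see the formula of Theorem 2 in action, one can compare it against a direct numerical integration of y' = Ay. The sketch below does this for the matrix of Example 3; it assumes NumPy and SciPy are available, and the constants c1, c2 are chosen arbitrarily:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[2.0, 1.0],
              [4.0, -1.0]])
v1, v2 = np.array([1.0, 1.0]), np.array([-1.0, 4.0])   # eigenvectors of A (Example 3)
lam1, lam2 = 3.0, -2.0                                  # corresponding eigenvalues
c1, c2 = 1.0, 2.0                                       # arbitrary constants

def eigen_solution(t):
    # y(t) = c1 e^{lam1 t} v1 + c2 e^{lam2 t} v2  (Theorem 2)
    return c1 * np.exp(lam1 * t) * v1 + c2 * np.exp(lam2 * t) * v2

# Integrate y' = Ay numerically from the same initial condition y(0)
sol = solve_ivp(lambda t, y: A @ y, (0.0, 1.0), eigen_solution(0.0),
                t_eval=[1.0], rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])          # numerical value of y(1)
print(eigen_solution(1.0))   # closed-form value of y(1); the two should agree
```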

We now focus on the task of finding the eigenvectors and eigenvalues of a matrix. We begin with the eigenvalues.

Finding Eigenvalues

Suppose that v is an eigenvector of A with eigenvalue λ. Then

Av = λv.

We can rewrite this equation as Av − λv = 0, or, since Iv = v for any vector v, Av − λIv = 0.

This may be written (A − λI)v = 0, which means that v ∈ N(A − λI). Since eigenvectors are nonzero by definition, this implies that the null space of A − λI is non-trivial. This in turn implies that A − λI is not invertible, and therefore

det(A − λI) = 0.

Each step in this chain of implications can be reversed, so we have the following test for eigenvalues.

Theorem 3. Let A be an n × n matrix. Then λ is an eigenvalue of A if and only if det(A − λI) = 0.

Example 4. Let

A = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}.

Then

A − λI = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix} − \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} 1-\lambda & 2 \\ 4 & 3-\lambda \end{pmatrix},

so

det(A − λI) = (1 − λ)(3 − λ) − 8 = λ^2 − 4λ − 5 = (λ − 5)(λ + 1).

This expression is zero when λ = 5 or λ = −1, so these are the only eigenvalues of A.

The expression

p_A(λ) = det(A − λI)

is in general a polynomial of degree n, called the characteristic polynomial of A. The eigenvalues of A are the roots of this polynomial. Next let’s turn to the task of finding the eigenvectors.
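Before doing so, here is a quick numerical cross-check of Example 4; this is only a sketch and assumes NumPy is available:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 3.0]])

coeffs = np.poly(A)           # coefficients of det(lambda*I - A); here [1, -4, -5]
print(np.roots(coeffs))       # roots of the characteristic polynomial: [ 5. -1.]
print(np.linalg.eigvals(A))   # eigenvalues computed directly; same values
```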

Finding Eigenvectors

Once an eigenvalue λ of a matrix A is known, the associated eigenvectors are the nonzero solutions of Av = λv. Equivalently, they are the nonzero elements of the null space of A − λI. We call

Eλ = N(A − λI)

the eigenspace associated with the eigenvalue λ. Every nonzero vector in Eλ is an eigenvector with eigenvalue λ.

Example 5. Let A = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}. In the previous example we found that the eigenvalues of A were 5 and −1. For λ = 5,

A − 5I = \begin{pmatrix} -4 & 2 \\ 4 & -2 \end{pmatrix} \xrightarrow{\text{rref}} \begin{pmatrix} 1 & -1/2 \\ 0 & 0 \end{pmatrix} \implies E_5 = N(A − 5I) = \operatorname{span}\underbrace{\begin{pmatrix} 1 \\ 2 \end{pmatrix}}_{v_1}.

For λ = −1,

A − (−1)I = \begin{pmatrix} 2 & 2 \\ 4 & 4 \end{pmatrix} \xrightarrow{\text{rref}} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \implies E_{-1} = N(A − (−1)I) = \operatorname{span}\underbrace{\begin{pmatrix} -1 \\ 1 \end{pmatrix}}_{v_2}.

Thus v_1 and all of its nonzero multiples are eigenvectors with eigenvalue 5, and v_2 and all of its nonzero multiples are eigenvectors with eigenvalue −1. Since v_1 and v_2 are linearly independent, the matrix C = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} is invertible, and we have

C^{-1}AC = D = \begin{pmatrix} 5 & 0 \\ 0 & -1 \end{pmatrix},    (Check!)

so the matrix A is diagonalizable.
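The null-space computations in Example 5 can also be reproduced symbolically. A minimal sketch, assuming SymPy is available:

```python
import sympy as sp

A = sp.Matrix([[1, 2],
               [4, 3]])
I = sp.eye(2)

for lam in [5, -1]:
    basis = (A - lam * I).nullspace()   # basis of the eigenspace E_lambda
    print(lam, [list(v) for v in basis])
# 5  [[1/2, 1]]   -- a multiple of (1, 2)
# -1 [[-1, 1]]
```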

Here is an example of a non-diagonalizable matrix.

Example 6. Let A = \begin{pmatrix} 2 & 1 \\ -1 & 4 \end{pmatrix}. The characteristic polynomial of A is p_A(λ) = λ^2 − 6λ + 9 = (λ − 3)^2, so λ = 3 is the only eigenvalue of A. Since

A − 3I = \begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix} \xrightarrow{\text{rref}} \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix} \implies E_3 = N(A − 3I) = \operatorname{span}\underbrace{\begin{pmatrix} 1 \\ 1 \end{pmatrix}}_{v},

the only eigenvectors of A are multiples of v. Thus there do not exist two linearly independent eigenvectors of A, so by Theorem 1, A is not diagonalizable.

Part of the problem in this example was that the 2 × 2 matrix A only had one eigenvalue. That is, the number of different eigenvalues of A was less than the size of A. It turns out that if we avoid this situation, then we are guaranteed diagonalizability. The key is that eigenvectors with different eigenvalues are linearly independent.

Theorem 4. Suppose v1,..., vk are eigenvectors with different eigenvalues λ1, . . . , λk. Then they are linearly independent.

Proof. Suppose c1v1 + c2v2 + ··· + ckvk = 0.

Since vi is in the null space of A − λiI, we have (A − λiI)vi = 0. On the other hand, (A − λiI)vj = (λj − λi)vj for j ≠ i. So if we apply the product

(A − λ2I) · (A − λ3I) ··· (A − λkI)

to both sides, every term but the v1 term vanishes, and we are left with

(λ1 − λ2)(λ1 − λ3) ··· (λ1 − λk)c1v1 = 0.

Since v1 is an eigenvector, it is nonzero, and by assumption the eigenvalues are different, so each term in parentheses is nonzero. Hence c1 must equal zero. Now we are left with

c2v2 + ··· + ckvk = 0,

and proceeding in the same way we find that c2 = 0, and so on, until ck = 0, so the eigenvectors are linearly independent.

Theorem 5. Suppose A is an n × n matrix with n different eigenvalues. Then A is diagonalizable.

Proof. Let λ1, . . . , λn be the distinct eigenvalues. Then there exist corresponding eigen- vectors v1,..., vn. By Theorem 4, they are linearly independent, so by Theorem 1, A is diagonalizable.

Example 7. Let

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.

The characteristic polynomial of A is det(A − λI) = −λ(λ^2 − 15λ − 18), so the eigenvalues of A are λ = 0 and λ = (15 ± √297)/2. Since these are all distinct, A is diagonalizable.

As the following example shows, it is not necessary to have distinct eigenvalues to be diagonalizable.
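Before turning to it, here is a quick numerical confirmation that the matrix of Example 7 indeed has three distinct eigenvalues (a sketch; NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
print(np.sort(np.linalg.eigvals(A)))
# approximately [-1.117  0.  16.117] -- three distinct eigenvalues,
# matching 0 and (15 +/- sqrt(297))/2
```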

Example 8. Let

A = \begin{pmatrix} 2 & 1 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}.

The characteristic polynomial of A is (2 − λ)(3 − λ)^2, so A only has two distinct eigenvalues, λ = 2 and λ = 3. However,

E_2 = \operatorname{span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\} \quad and \quad E_3 = \operatorname{span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \\ 1 \end{pmatrix} \right\}.

The three eigenvectors above are linearly independent, and one can check that

\begin{pmatrix} 1 & 1 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 2 & 1 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix},

so A is in fact diagonalizable. The key in this example was that the dimensions of the eigenspaces summed to 3, the size of the matrix A.
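The matrix identity above is easy to verify numerically as well; a minimal sketch assuming NumPy is available:

```python
import numpy as np

A = np.array([[2.0, 1.0, 4.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
C = np.array([[1.0, 1.0, 4.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # the three eigenvectors of Example 8 as columns

print(np.round(np.linalg.inv(C) @ A @ C, 10))
# [[2. 0. 0.]
#  [0. 3. 0.]
#  [0. 0. 3.]]
```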

Theorem 6. Let λ1, λ2, . . . , λk be the distinct eigenvalues of an n × n matrix A. Then A is diagonalizable if and only if

dim(E_{λ1}) + dim(E_{λ2}) + ··· + dim(E_{λk}) = n.

Proof. First suppose that the dimensions of the eigenspaces sum to n. Let di = dim(E_{λi}) and choose a basis βi = {v_{i1}, ..., v_{i d_i}} for each eigenspace E_{λi}. Then, by hypothesis, the union of these bases, β = β1 ∪ β2 ∪ ··· ∪ βk, is a set consisting of n vectors. To show that these vectors are linearly independent, suppose some linear combination of them equals zero. Then

\underbrace{(c_{11}v_{11} + \cdots + c_{1d_1}v_{1d_1})}_{w_1} + \underbrace{(c_{21}v_{21} + \cdots + c_{2d_2}v_{2d_2})}_{w_2} + \cdots + \underbrace{(c_{k1}v_{k1} + \cdots + c_{kd_k}v_{kd_k})}_{w_k} = 0.

We claim that wi = 0 for all i. For if any of the wi are nonzero, then they are eigenvectors with distinct eigenvalues λi, and by the equation above they would be linearly dependent,

a contradiction. Now since w1 is zero, and {v_{11}, ..., v_{1d_1}} is a basis for E_{λ1}, it follows that c_{11} = ··· = c_{1d_1} = 0. Similarly all the other coefficients are zero.

Now suppose conversely that A is diagonalizable, and let β = {v1, ..., vn} be a set of n linearly independent eigenvectors of A. Write β as a disjoint union β1 ∪ β2 ∪ ··· ∪ βk, where

each βi contains the basis vectors which lie in E_{λi}. Then each βi is a linearly independent set of vectors in E_{λi}. To see that β1 spans E_{λ1}, let v be any vector in E_{λ1}. Since β is a basis for R^n,

v = w1 + w2 + ··· + wk,

where each wi is in span(βi). This can be rewritten

(w1 − v) + w2 + ··· + wk = 0.

If any of the vectors w1 − v, w2, ..., wk are nonzero, then this equation describes a linear dependence among eigenvectors with distinct eigenvalues, a contradiction. Thus all of these vectors must be zero, and in particular v = w1 is in span(β1). Hence E_{λ1} is spanned by β1, and therefore β1 is a basis for E_{λ1}. Similarly every other βi is a basis for E_{λi}. Thus

n = # elements in β = \sum_{i=1}^{k} (# elements in βi) = \sum_{i=1}^{k} dim(E_{λi}).

Unfortunately, it is not always the case that the dimensions of the eigenspaces sum to n.

Example 9. Let

A = \begin{pmatrix} 3 & 4 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & 3 \end{pmatrix}.

The characteristic polynomial of A is (3 − λ)^3, so the only eigenvalue of A is λ = 3. But since

E_3 = \operatorname{span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\},

dim(E_3) = 1, and therefore A is not diagonalizable.
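The eigenspace dimension in Example 9 can be read off numerically from the rank of A − 3I, since dim N(A − 3I) = n − rank(A − 3I). A minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[3.0, 4.0, 5.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

dim_E3 = n - np.linalg.matrix_rank(A - 3.0 * np.eye(n))
print(dim_E3)   # 1 -- the eigenspace dimensions sum to 1 < 3, so A is not diagonalizable
```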

Complex Eigenvalues and Eigenvectors

Since the roots of a real polynomial may be complex numbers, a matrix A may have complex eigenvalues. In such cases, we find the corresponding eigenspaces in the same manner as we do for real eigenvalues. The only difference is that the eigenvectors will be vectors with complex components.

Example 10. Let A = \begin{pmatrix} -1 & 8 \\ -1 & 3 \end{pmatrix}. The characteristic polynomial of A is p_A(λ) = λ^2 − 2λ + 5. Its roots are λ = 1 ± 2i. Since

A − (1 + 2i)I = \begin{pmatrix} -2-2i & 8 \\ -1 & 2-2i \end{pmatrix} \xrightarrow{\text{rref}} \begin{pmatrix} 1 & -2+2i \\ 0 & 0 \end{pmatrix}, \qquad E_{1+2i} = \operatorname{span}\left\{ \begin{pmatrix} 2-2i \\ 1 \end{pmatrix} \right\}

and

A − (1 − 2i)I = \begin{pmatrix} -2+2i & 8 \\ -1 & 2+2i \end{pmatrix} \xrightarrow{\text{rref}} \begin{pmatrix} 1 & -2-2i \\ 0 & 0 \end{pmatrix}, \qquad E_{1-2i} = \operatorname{span}\left\{ \begin{pmatrix} 2+2i \\ 1 \end{pmatrix} \right\},

we therefore have C^{-1}AC = D, where

C = \begin{pmatrix} 2-2i & 2+2i \\ 1 & 1 \end{pmatrix} \quad and \quad D = \begin{pmatrix} 1+2i & 0 \\ 0 & 1-2i \end{pmatrix}.

The solution of the system y' = Ay is therefore

y = c_1 e^{(1+2i)t} \begin{pmatrix} 2-2i \\ 1 \end{pmatrix} + c_2 e^{(1-2i)t} \begin{pmatrix} 2+2i \\ 1 \end{pmatrix}.
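The eigenvalue computation in Example 10 can be checked numerically; NumPy handles complex eigenvalues of real matrices directly. A minimal sketch:

```python
import numpy as np

A = np.array([[-1.0, 8.0],
              [-1.0, 3.0]])
vals, vecs = np.linalg.eig(A)
print(vals)        # [1.+2.j 1.-2.j]  (order may vary)
print(vecs[:, 0])  # normalized eigenvector paired with vals[0];
                   # a complex multiple of (2 - 2i, 1) or (2 + 2i, 1)
```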

Next we make an important observation about complex eigenvalues and eigenvectors. The conjugate of a number z = a + bi is the number z̄ = a − bi. The conjugate of a vector w is the vector w̄ whose entries are the conjugates of the entries of w. Likewise, the conjugate

of a matrix A is the matrix Ā whose entries are the conjugates of the entries of A. For instance,

\overline{3 + 2i} = 3 − 2i, \qquad \overline{\begin{pmatrix} 2+i \\ 4-3i \end{pmatrix}} = \begin{pmatrix} 2-i \\ 4+3i \end{pmatrix}, \qquad \overline{\begin{pmatrix} 3 & 2-i \\ 5i & 2+2i \end{pmatrix}} = \begin{pmatrix} 3 & 2+i \\ -5i & 2-2i \end{pmatrix}.

The following properties of conjugates are not difficult to verify:

• \overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}

• \overline{z_1 z_2} = \overline{z_1}\, \overline{z_2}

• \overline{zw} = \overline{z}\, \overline{w}

• \overline{Aw} = \overline{A}\, \overline{w}

• \overline{AB} = \overline{A}\, \overline{B}

It turns out that complex eigenvalues and eigenvectors of real matrices always appear in conjugate pairs.

Theorem 7. Let A be a real n × n matrix, and suppose that w is an eigenvector with eigenvalue λ. Then w̄ is an eigenvector of A with eigenvalue λ̄.

Proof. Since w is an eigenvector with eigenvalue λ, we have Aw = λw. Taking the conjugate of both sides and using the properties above gives Ā w̄ = λ̄ w̄. But since A is a real matrix, Ā = A, so this becomes A w̄ = λ̄ w̄.

In the case that w and λ are real, w̄ = w and λ̄ = λ, and the theorem does not tell us anything new. Otherwise the theorem provides a nice shortcut for finding eigenvalues and eigenvectors. Once we have found an eigenvector w with eigenvalue λ, we know that λ̄ is also an eigenvalue, with eigenvector w̄.

Example 11. In Example 10, we found that w = \begin{pmatrix} 2-2i \\ 1 \end{pmatrix} was an eigenvector with eigenvalue λ = 1 + 2i. By Theorem 7 we therefore know immediately that w̄ = \begin{pmatrix} 2+2i \\ 1 \end{pmatrix} is an eigenvector with eigenvalue λ̄ = 1 − 2i.

The following formula is useful for dealing with expressions involving exponentials of complex numbers.

Theorem 8. (Euler's Formula) e^{iθ} = cos θ + i sin θ.

Proof. Consider the Taylor series expansion of the function e^x:

e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots

Now substitute x = iθ and use the fact that i^2 = −1, i^3 = −i, i^4 = 1, etc., to get

e^{iθ} = 1 + iθ − \frac{θ^2}{2!} − i\frac{θ^3}{3!} + \frac{θ^4}{4!} + \cdots
       = \left(1 − \frac{θ^2}{2!} + \frac{θ^4}{4!} − \cdots\right) + i\left(θ − \frac{θ^3}{3!} + \frac{θ^5}{5!} − \cdots\right).

Now just observe that these series are the Taylor series expansions of cos θ and sin θ.

Using Euler's formula, we can rewrite the exponential of the complex number a + bi as

e^{a+bi} = e^a e^{bi} = e^a(cos b + i sin b).
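This identity is easy to spot-check numerically with Python's built-in complex arithmetic (a minimal sketch; the sample values of θ, a, b are arbitrary):

```python
import cmath, math

theta = 0.7
print(cmath.exp(1j * theta))                   # e^{i*theta}
print(math.cos(theta) + 1j * math.sin(theta))  # cos(theta) + i*sin(theta); same value

a, b = 1.0, 2.0
print(cmath.exp(a + 1j * b))                           # e^{a + bi}
print(math.exp(a) * (math.cos(b) + 1j * math.sin(b)))  # e^a (cos b + i sin b); same value
```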

Example 12. Let A be the matrix in Example 10. We found that

z_1 = e^{(1+2i)t} \begin{pmatrix} 2-2i \\ 1 \end{pmatrix}, \qquad z_2 = e^{(1-2i)t} \begin{pmatrix} 2+2i \\ 1 \end{pmatrix}

is a fundamental set of (complex-valued) solutions of y' = Ay. By Euler's formula, e^{(1+2i)t} = e^t e^{2ti} = e^t(cos 2t + i sin 2t), and if we write

\begin{pmatrix} 2-2i \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} + i \begin{pmatrix} -2 \\ 0 \end{pmatrix},

then

z_1 = e^t(\cos 2t + i \sin 2t)\left( \begin{pmatrix} 2 \\ 1 \end{pmatrix} + i \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right)
    = e^t\left( \cos 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} − \sin 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right) + e^t\left( \sin 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + \cos 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right) i.

Similarly,

z_2 = e^t\left( \cos 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} − \sin 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right) − e^t\left( \sin 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + \cos 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right) i.

Since z_1 and z_2 form a fundamental solution set, any linear combination of them is a solution. In particular, if we define

y_1 = \tfrac{1}{2} z_1 + \tfrac{1}{2} z_2 = e^t\left( \cos 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} − \sin 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right)
y_2 = \tfrac{1}{2i} z_1 − \tfrac{1}{2i} z_2 = e^t\left( \sin 2t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + \cos 2t \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right),

then y_1 and y_2 are two real-valued solutions. It is not hard to verify that they are linearly independent and therefore form a fundamental solution set.

To generalize the calculations in this example, suppose A has a complex eigenvalue λ = a + bi (b ≠ 0), and suppose w is an eigenvector with eigenvalue λ. Write w = u + iv, where u and v are real vectors. By Theorem 7, we know that w̄ = u − iv is an eigenvector with eigenvalue λ̄ = a − bi. Thus we have the two complex solutions

z_1 = e^{λt} w = e^{(a+bi)t}(u + iv) = e^{at} e^{bti}(u + iv)
    = e^{at}(\cos bt + i \sin bt)(u + iv)
    = e^{at}(\cos bt\, u − \sin bt\, v) + e^{at}(\sin bt\, u + \cos bt\, v)\, i

and

z_2 = e^{λ̄t} w̄ = e^{(a−bi)t}(u − iv) = e^{at} e^{−bti}(u − iv)
    = e^{at}(\cos bt − i \sin bt)(u − iv)
    = e^{at}(\cos bt\, u − \sin bt\, v) − e^{at}(\sin bt\, u + \cos bt\, v)\, i.

Observe that z_2 = z̄_1. By letting

y_1 = \tfrac{1}{2} z_1 + \tfrac{1}{2} z_2 = e^{at}(\cos bt\, u − \sin bt\, v)
y_2 = \tfrac{1}{2i} z_1 − \tfrac{1}{2i} z_2 = e^{at}(\sin bt\, u + \cos bt\, v),

we obtain two real-valued solutions y_1 and y_2. These are precisely the real and imaginary parts of the complex-valued solution z_1.
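A quick numerical check of this last statement, using the eigenvalue λ = 1 + 2i and eigenvector w = (2 − 2i, 1) from Examples 10 and 12 (a sketch assuming NumPy; the sample time t is arbitrary):

```python
import numpy as np

a, b = 1.0, 2.0                      # lambda = a + bi = 1 + 2i
u = np.array([2.0, 1.0])             # real part of the eigenvector w
v = np.array([-2.0, 0.0])            # imaginary part of w
w = u + 1j * v

t = 0.37                             # arbitrary sample time
z1 = np.exp((a + 1j * b) * t) * w    # complex solution z1(t) = e^{lambda t} w
y1 = np.exp(a * t) * (np.cos(b * t) * u - np.sin(b * t) * v)
y2 = np.exp(a * t) * (np.sin(b * t) * u + np.cos(b * t) * v)

print(np.allclose(y1, z1.real), np.allclose(y2, z1.imag))   # True True
```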
