Section 5.2 Diagonalization

We have seen that diagonal and triangular matrices are much easier to work with than most matrices: determinants and eigenvalues are easy to compute, and multiplication is much more straightforward. Diagonal matrices are particularly nice. For example, the product of two diagonal matrices can be computed by simply multiplying their corresponding diagonal entries:
$$\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} \begin{pmatrix} b_1 & 0 & \cdots & 0 \\ 0 & b_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_n \end{pmatrix} = \begin{pmatrix} a_1b_1 & 0 & \cdots & 0 \\ 0 & a_2b_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_nb_n \end{pmatrix}.$$

Computationally, diagonal matrices are the easiest to work with.

With this idea in mind, we introduce similarity:

Definition 1. An $n \times n$ matrix $A$ is similar to a matrix $B$ if there is an invertible matrix $P$ so that $B = P^{-1}AP$; the function $c_P$ defined by $c_P(A) = P^{-1}AP$ is called a similarity transformation.

As an example, the matrices
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -8 & 3 \\ -36 & 13 \end{pmatrix}$$
are similar; with
$$P = \begin{pmatrix} 4 & -1 \\ -3 & 1 \end{pmatrix} \quad \text{and} \quad P^{-1} = \begin{pmatrix} 1 & 1 \\ 3 & 4 \end{pmatrix},$$
you should check that

$$B = P^{-1}AP = \begin{pmatrix} 1 & 1 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} 4 & -1 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -8 & 3 \\ -36 & 13 \end{pmatrix},$$
so that $A$ and $B$ are indeed similar.
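This product is easy to confirm by hand, or numerically; the following is a minimal sketch using numpy (a sanity check, not part of the text's development):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 4]])
P = np.array([[4, -1],
              [-3, 1]])

# B = P^{-1} A P should reproduce the matrix B from the example
B = np.linalg.inv(P) @ A @ P
print(B)  # [[ -8.   3.]
          #  [-36.  13.]]
```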

We are concerned with similarity not for its own sake as an interesting phenomenon, but because of quantities known as similarity invariants. We can get a feel for what similarity invariants are by considering data about the matrices from the previous example: in particular, let's calculate the determinant, trace, and eigenvalues of $A$ and $B$:

property      A     B
determinant   4     4
trace         5     5
eigenvalues   1, 4  1, 4
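These values are quick to recompute numerically; a short sketch with numpy, assuming the matrices from the example above:

```python
import numpy as np

A = np.array([[1, 0], [0, 4]])
B = np.array([[-8, 3], [-36, 13]])

for name, M in (("A", A), ("B", B)):
    print(name,
          round(np.linalg.det(M)),        # determinant: 4 for both
          np.trace(M),                    # trace: 5 for both
          np.sort(np.linalg.eigvals(M)))  # eigenvalues: [1, 4] for both
```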

As you may have guessed from the table above, certain matrix properties, such as the determinant, trace, and eigenvalues, are "shared" among similar matrices; this is what we mean when we use the phrase similarity invariant. In other words, the determinant of the matrix $A$ is also the determinant of any matrix similar to $A$. In fact, it is quite easy to check that the determinant is a similarity invariant; to do so, we recall two rules:
$$\det(AB) = \det(A)\det(B), \quad \text{and} \quad \det(P^{-1}) = \frac{1}{\det P}.$$
Any matrix similar to a given matrix $A$ must have form $P^{-1}AP$, with $P$ an invertible matrix. Let's calculate the determinant of this similar matrix, using the rules above:

$$\det(P^{-1}AP) = (\det P^{-1})(\det A)(\det P) = \left(\frac{1}{\det P}\right)(\det A)(\det P) = \det A.$$

We have just shown that similar matrices share determinants; that is, $\det A = \det(P^{-1}AP)$ for any invertible matrix $P$.
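The same invariance is easy to observe experimentally; here is a minimal numpy sketch using random matrices (a randomly generated $P$ is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))        # almost surely invertible

similar = np.linalg.inv(P) @ A @ P
print(np.isclose(np.linalg.det(A), np.linalg.det(similar)))  # True
```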

In general, if matrices A and B are similar, so that B = P −1AP , then they share:

• determinant
• trace
• eigenvalues
• rank
• nullity
• invertibility
• characteristic polynomial
• eigenspace corresponding to a particular (shared) eigenvalue

Based on the example above, you may have already guessed the reason that we care about the idea of similarity: if $A$ and $B$ are similar matrices, and if $A$ is diagonal, then it is much easier to calculate data such as the determinant and eigenvalues for $A$ than for $B$. With this in mind, we introduce the idea of diagonalizability:

Definition 2. An $n \times n$ matrix $A$ is diagonalizable if it is similar to a diagonal matrix. That is, $A$ is diagonalizable if there is an invertible matrix $P$ so that $P^{-1}AP$ is diagonal.

The matrix
$$B = \begin{pmatrix} -8 & 3 \\ -36 & 13 \end{pmatrix}$$

is diagonalizable, since it is similar to the diagonal matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}.$$

Key Point. It is important to note that not every matrix is diagonalizable. Indeed, there are many matrices which are simply not similar to a diagonal matrix. We will examine a few later in this section.

Criteria for Diagonalizability

Given the information we have collected so far, it should be clear that, while diagonal matrices are arguably the easiest type of matrix to work with, diagonalizable matrices are almost as easy. If I want to know the determinant, trace, etc. of a matrix $B$ that is similar to a diagonal matrix $A$, then I simply need to make the (easier) calculations for $A$.

This leads to a few interesting questions: how can we be certain that a given matrix is diagonalizable? And if we know that a matrix is diagonalizable, how do we find the diagonal matrix to which it is similar?

The following theorem answers the first question:

Theorem 5.2.1. If $A$ is an $n \times n$ matrix, then the following statements are equivalent:

(a) $A$ is diagonalizable.

(b) $A$ has $n$ linearly independent eigenvectors.

In other words, we can check that a matrix $A$ is diagonalizable by looking at its eigenvectors: if $A$ has $n$ linearly independent eigenvectors, then it is diagonalizable.
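One way to turn this criterion into a computation is to count independent eigenvectors by summing the nullities of $A - \lambda I$ over the distinct eigenvalues $\lambda$. The sketch below does this with numpy; the function name and tolerance are my own choices, not from the text:

```python
import numpy as np

def num_independent_eigenvectors(A, tol=1e-6):
    """Count linearly independent eigenvectors of A by summing the
    nullity of (A - lam*I) over the distinct eigenvalues lam."""
    n = A.shape[0]
    distinct = []
    for lam in np.linalg.eigvals(A):
        if all(abs(lam - mu) > tol for mu in distinct):
            distinct.append(lam)
    # nullity of (A - lam*I) = number of independent eigenvectors for lam
    return sum(n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
               for lam in distinct)

A = np.array([[2, 0, 0, 0],
              [0, -1, 0, 0],
              [0, 4, 2, 0],
              [4, 0, 0, 3]], dtype=float)
print(num_independent_eigenvectors(A))  # 4 = n, so A is diagonalizable
```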

Example. In Section 5.1, we saw that the matrix
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix}$$
has repeated eigenvalue $\lambda_1 = \lambda_3 = 2$, with two corresponding linearly independent eigenvectors
$$x_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -4 \end{pmatrix} \quad \text{and} \quad x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$

In addition, the eigenvalues $\lambda_2 = -1$ and $\lambda_4 = 3$ have eigenvectors
$$x_2 = \begin{pmatrix} 0 \\ 1 \\ -\tfrac{4}{3} \\ 0 \end{pmatrix} \quad \text{and} \quad x_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},$$
respectively. You should check that the four eigenvectors above are linearly independent by inspecting the linear combination $ax_1 + bx_2 + cx_3 + dx_4 = 0$; indeed, it is easy to see that the corresponding system

$$a = 0, \qquad b = 0, \qquad -\tfrac{4}{3}\,b + c = 0, \qquad -4a + d = 0$$

has only the trivial solution $a = b = c = d = 0$. Since $A$ is $4 \times 4$ and has 4 linearly independent eigenvectors, $A$ is diagonalizable.
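Alternatively, the independence check can be done by computing the rank of the matrix whose columns are the four eigenvectors; a small numpy sketch:

```python
import numpy as np

x1 = np.array([1, 0, 0, -4])
x2 = np.array([0, 1, -4/3, 0])
x3 = np.array([0, 0, 1, 0])
x4 = np.array([0, 0, 0, 1])

V = np.column_stack([x1, x2, x3, x4])
print(np.linalg.matrix_rank(V))  # 4: the eigenvectors are independent
```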

Example. Determine if the matrix
$$A = \begin{pmatrix} 4 & -1 \\ 1 & 2 \end{pmatrix}$$
is diagonalizable.

We need to check the eigenvectors of $A$; if $A$ is diagonalizable, then it has two linearly independent eigenvectors. Accordingly, we begin by finding the eigenvalues of $A$, using the characteristic equation:
$$\det(\lambda I - A) = \det\begin{pmatrix} \lambda - 4 & 1 \\ -1 & \lambda - 2 \end{pmatrix} = (\lambda - 4)(\lambda - 2) + 1 = \lambda^2 - 6\lambda + 8 + 1 = \lambda^2 - 6\lambda + 9,$$

so that the characteristic equation for $A$ is

$$\lambda^2 - 6\lambda + 9 = 0.$$

By factoring, $\lambda^2 - 6\lambda + 9 = (\lambda - 3)^2$, so the only root is $\lambda = 3$; thus $A$ has a single repeated eigenvalue.

Any eigenvector $x$ corresponding to $\lambda = 3$ must satisfy $Ax = 3x$:
$$\begin{pmatrix} 4 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3x_1 \\ 3x_2 \end{pmatrix}, \quad \text{that is,} \quad \begin{pmatrix} 4x_1 - x_2 \\ x_1 + 2x_2 \end{pmatrix} = \begin{pmatrix} 3x_1 \\ 3x_2 \end{pmatrix}.$$

Thus we see that
$$4x_1 - x_2 = 3x_1 \qquad \text{and} \qquad x_1 + 2x_2 = 3x_2,$$
both of which amount to the single equation
$$x_1 - x_2 = 0, \quad \text{or} \quad x_1 = x_2.$$

Parameterizing $x_1$ as $x_1 = t$, we see that any eigenvector of $A$ must have form
$$\begin{pmatrix} t \\ t \end{pmatrix} = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Thus A has only one linearly independent eigenvector; since A is 2 × 2, it is not diagonalizable.
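The same conclusion can be reached numerically: the geometric dimension of the eigenspace for $\lambda = 3$ is the nullity of $A - 3I$, which follows from a rank computation. A minimal numpy sketch:

```python
import numpy as np

A = np.array([[4, -1],
              [1, 2]])
M = A - 3 * np.eye(2)                    # [[1, -1], [1, -1]]

nullity = 2 - np.linalg.matrix_rank(M)   # dimension of the eigenspace
print(nullity)                           # 1: only one independent eigenvector
```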

The following theorem on eigenvalues and their associated eigenvectors will give us a quick way to check some matrices for diagonalizability:

Theorem 5.2.2. If $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues of an $n \times n$ matrix $A$, and $x_1, x_2, \ldots, x_k$ are eigenvectors corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_k$ respectively, then
$$\{x_1, x_2, \ldots, x_k\}$$
is a linearly independent set.

The theorem says that, for each distinct eigenvalue of a matrix $A$, we are guaranteed another linearly independent eigenvector. For example, if a $4 \times 4$ matrix $A$ has eigenvalues $-10$, $2$, $0$, and $5$, then since it has four distinct eigenvalues, $A$ automatically has four linearly independent eigenvectors. Taken together with Theorem 5.2.1 on diagonalizability, we have the following corollary:

Corollary. Any $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

Key Point. We must be extremely careful to note that we can only use the corollary to draw conclusions about an $n \times n$ matrix if the matrix has $n$ distinct eigenvalues. If the matrix does not have $n$ distinct eigenvalues, then it may or may not be diagonalizable. In fact, we have seen two matrices
$$\begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 4 & -1 \\ 1 & 2 \end{pmatrix}$$
with repeated eigenvalues: the first matrix has eigenvalues $-1$, $2$, $2$, and $3$, and the second has eigenvalues $3$ and $3$. The first matrix is diagonalizable, while the second is not.

Finding the Similar Diagonal Matrix

Earlier, we asked how we could go about finding the diagonal matrix to which a diagonalizable matrix $A$ is similar. If $A$ is diagonalizable, with $P^{-1}AP$ the desired diagonal matrix, then we can rephrase the question above: how do we find $P$?

The answer to this question turns out to be quite interesting:

Theorem. Let the $n \times n$ matrix $A$ be diagonalizable with $n$ linearly independent eigenvectors $x_1, x_2, \ldots, x_n$. Set
$$P = \begin{pmatrix} | & | & & | \\ x_1 & x_2 & \cdots & x_n \\ | & | & & | \end{pmatrix}.$$
In other words, $P$ is the matrix whose columns are the $n$ linearly independent eigenvectors of $A$. Then $P^{-1}AP$ is a diagonal matrix whose diagonal entries are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ that correspond to the eigenvectors forming the successive columns of $P$.

Example. Find the diagonal matrix to which
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix}$$
is similar.

Earlier, we saw that $A$ has linearly independent eigenvectors
$$x_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -4 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 0 \\ 1 \\ -\tfrac{4}{3} \\ 0 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad x_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
corresponding to eigenvalues

$$\lambda_1 = 2, \quad \lambda_2 = -1, \quad \lambda_3 = 2, \quad \text{and} \quad \lambda_4 = 3,$$
respectively. The matrix $P$ from the theorem above is given by
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\tfrac{4}{3} & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix};$$
you should check that
$$P^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \tfrac{4}{3} & 1 & 0 \\ 4 & 0 & 0 & 1 \end{pmatrix}$$
is its inverse. According to the theorem, $P^{-1}AP$ is diagonal; let's verify this:
$$P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \tfrac{4}{3} & 1 & 0 \\ 4 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\tfrac{4}{3} & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & \tfrac{8}{3} & 2 & 0 \\ 12 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\tfrac{4}{3} & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
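The same verification is a one-liner numerically; a minimal numpy sketch:

```python
import numpy as np

A = np.array([[2, 0, 0, 0],
              [0, -1, 0, 0],
              [0, 4, 2, 0],
              [4, 0, 0, 3]], dtype=float)

# columns of P are the eigenvectors x1, x2, x3, x4
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, -4/3, 1, 0],
              [-4, 0, 0, 1]])

print(np.round(np.linalg.inv(P) @ A @ P, 12))  # diag(2, -1, 2, 3)
```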

Powers of a Diagonalizable Matrix

Multiplication with diagonal matrices is remarkably simple. As a quick example, we can calculate the product below simply by multiplying corresponding diagonal entries:
$$\begin{pmatrix} 5 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 20 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
As you might have guessed, there is a simple formula for calculating powers of diagonal matrices: if
$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix},$$
then $D^k$ exists if

• $k$ is any integer and each $d_i$ is nonzero, or

• one or more of the $d_i$ is $0$, and $k$ is a positive integer.

If $D^k$ exists, then
$$D^k = \begin{pmatrix} d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n^k \end{pmatrix}.$$
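This entrywise rule is easy to confirm numerically; a minimal numpy sketch:

```python
import numpy as np

d = np.array([5.0, -2.0, 3.0, -1.0])
D = np.diag(d)

# matrix power of a diagonal matrix = diagonal of entrywise powers
print(np.allclose(np.linalg.matrix_power(D, 3), np.diag(d**3)))  # True
```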

We can use these facts to our advantage if we know that a matrix $A$ is diagonalizable. To see why, suppose we wished to calculate $A^k$, where $A$ is diagonalizable but not itself diagonal. Since $A$ is diagonalizable, there is a diagonal matrix $D$ and invertible matrix $P$ so that

$$A = P^{-1}DP;$$
thus we rewrite $A^k$ as
$$A^k = (P^{-1}DP)^k = \underbrace{(P^{-1}DP)(P^{-1}DP)\cdots(P^{-1}DP)}_{k \text{ copies}} = P^{-1}D(PP^{-1})D(PP^{-1})\cdots(PP^{-1})DP = P^{-1}\underbrace{DD\cdots D}_{k \text{ copies}}\,P = P^{-1}D^kP.$$

In other words, in order to calculate $A^k$ (which might be hard), we can first calculate $D^k$ (easy), then finish off with a similarity transformation:
$$A^k = P^{-1}D^kP.$$
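Here is a minimal numpy sketch of this shortcut, using the $4 \times 4$ matrix and the $P$ found above; note that with the convention $D = P^{-1}AP$ from the earlier example, the roles of $P$ and $P^{-1}$ swap, so $A^5 = PD^5P^{-1}$:

```python
import numpy as np

A = np.array([[2, 0, 0, 0],
              [0, -1, 0, 0],
              [0, 4, 2, 0],
              [4, 0, 0, 3]], dtype=float)
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, -4/3, 1, 0],
              [-4, 0, 0, 1]])
Pinv = np.linalg.inv(P)

D = Pinv @ A @ P                              # diagonal: diag(2, -1, 2, 3)
A5 = P @ np.linalg.matrix_power(D, 5) @ Pinv  # power via the diagonal form
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))  # True
```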

Key Point. The observation above is actually true in a more general context: if $A$ and $B$ (not necessarily diagonal) are similar via an invertible matrix $P$, then $A^k$ and $B^k$ are also similar via $P$.

Example. Given
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix},$$
find the eigenvalues of $A^5$.

In an earlier example, we saw that $A$ is diagonalizable via
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\tfrac{4}{3} & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix}$$
to the diagonal matrix
$$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
Now since $A$ and $D$ are similar, they share eigenvalues; $A^5$ and $D^5$ are also similar (via the same matrix $P$), so they share eigenvalues as well. The eigenvalues of $D^5$ are easy to find; $D^5$ is diagonal, so its eigenvalues are its diagonal entries. Let's make the calculation:
$$D^5 = \begin{pmatrix} 2^5 & 0 & 0 & 0 \\ 0 & (-1)^5 & 0 & 0 \\ 0 & 0 & 2^5 & 0 \\ 0 & 0 & 0 & 3^5 \end{pmatrix} = \begin{pmatrix} 32 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 32 & 0 \\ 0 & 0 & 0 & 243 \end{pmatrix}.$$

Thus $D^5$ has eigenvalues $\lambda_1 = 32$, $\lambda_2 = -1$, $\lambda_3 = 32$, and $\lambda_4 = 243$, which it shares with $A^5$ since they are similar.

Geometric and Algebraic Multiplicity

Given an $n \times n$ matrix $A$, there are two possibilities for the types of eigenvalues $A$ could have:

1. n distinct eigenvalues

2. some repeated eigenvalues.

In the first case, we automatically know that $A$ is diagonalizable; however, in the second case, $A$ may or may not be diagonalizable. Indeed, we saw two different examples of repeated eigenvalues:
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix}$$
has eigenvalues $-1$, $2$, $2$, and $3$ and is diagonalizable, while
$$B = \begin{pmatrix} 4 & -1 \\ 1 & 2 \end{pmatrix}$$
has eigenvalues $3$ and $3$ but is not diagonalizable. There are apparently different types of repeated eigenvalues: those, like the $2$ from matrix $A$ above, that do not affect diagonalizability, and those, such as the $3$ from matrix $B$, which do.

We would like to have some terminology to help us differentiate between "good" eigenvalues and "bad" ones, and so we will momentarily introduce the ideas of algebraic and geometric multiplicity. Before we do so, recall that eigenvalues are nothing but roots of the characteristic equation of the associated matrix. For example, since matrix $B$ above has eigenvalue $3$ repeated twice, its characteristic equation must be $(\lambda - 3)(\lambda - 3) = 0$.

Definition. Let $\lambda_0$ be an eigenvalue of the matrix $A$. The number of times that $(\lambda - \lambda_0)$ appears as a factor of the characteristic polynomial $\det(\lambda I - A)$ of $A$ is called the algebraic multiplicity of $\lambda_0$. The dimension of the eigenspace associated with $\lambda_0$ is called the geometric multiplicity of $\lambda_0$.

Let's think about the ideas of algebraic and geometric multiplicity in terms of the two examples above. Starting with
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 4 & 2 & 0 \\ 4 & 0 & 0 & 3 \end{pmatrix},$$

we can calculate that its characteristic equation is

$$(\lambda - 2)(\lambda - 2)(\lambda + 1)(\lambda - 3) = 0.$$

Since $\lambda - 2$ shows up twice as a factor, the eigenvalue $2$ has algebraic multiplicity $2$; the other two eigenvalues have algebraic multiplicity $1$.

To get the geometric multiplicities of the eigenvalues, we need to check the eigenspaces corresponding to each distinct eigenvalue: earlier, we saw that $A$ has linearly independent eigenvectors
$$x_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -4 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 0 \\ 1 \\ -\tfrac{4}{3} \\ 0 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad x_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix};$$
$x_1$ and $x_3$ correspond to the repeated eigenvalue $\lambda = 2$, $x_2$ corresponds to the distinct eigenvalue $\lambda = -1$, and $x_4$ corresponds to the distinct eigenvalue $\lambda_4 = 3$. Thus the repeated eigenvalue $2$ corresponds to an eigenspace with a basis consisting of $2$ vectors, whereas each of the eigenvalues $-1$ and $3$ has an eigenspace with a basis consisting of $1$ vector. Thus the geometric multiplicity of the eigenvalue $2$ is $2$, and the eigenvalues $-1$ and $3$ both have geometric multiplicity $1$. We record all of the data in a table:

Distinct eigenvalue   Geometric multiplicity   Algebraic multiplicity
2                     2                        2
-1                    1                        1
3                     1                        1

Next, let's examine the matrix
$$B = \begin{pmatrix} 4 & -1 \\ 1 & 2 \end{pmatrix}.$$
It is easy to calculate the characteristic equation

$$(\lambda - 3)(\lambda - 3) = 0;$$
since the factor $(\lambda - 3)$ shows up twice in the equation, the repeated eigenvalue $\lambda = 3$ has algebraic multiplicity $2$.

Earlier, we saw that the eigenspace corresponding to eigenvalue $\lambda = 3$ is generated by the single vector
$$x = \begin{pmatrix} 1 \\ 1 \end{pmatrix};$$
thus $\lambda = 3$ has geometric multiplicity $1$.

Let’s compare all of the data that we have generated:

Distinct eigenvalue of A   Geometric multiplicity   Algebraic multiplicity
2                          2                        2
-1                         1                        1
3                          1                        1

Distinct eigenvalue of B   Geometric multiplicity   Algebraic multiplicity
3                          1                        2

Notice in the example above that the algebraic and geometric multiplicities of each distinct eigenvalue of the diagonalizable matrix A matched up, whereas the geometric multiplicity of the only distinct eigenvalue of the non-diagonalizable matrix B was “deficient” compared with its algebraic multiplicity. This idea is actually a general rule, made precise in the following theorem:

Theorem 5.2.4. Let A be an n × n matrix.

(a) The geometric multiplicity of a distinct eigenvalue of A is less than or equal to its algebraic multiplicity.

(b) A is diagonalizable if and only if the algebraic and geometric multiplicities of each of its distinct eigenvalues are equal.
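Part (b) gives a practical diagonalizability test; the sketch below computes both multiplicities numerically for the matrix $B$ above (the helper name and tolerance are my own choices, not from the text):

```python
import numpy as np

def multiplicities(A, lam, tol=1e-6):
    """Return (algebraic, geometric) multiplicity of the eigenvalue lam."""
    n = A.shape[0]
    # algebraic: how many computed eigenvalues coincide with lam
    algebraic = int(np.sum(np.abs(np.linalg.eigvals(A) - lam) < tol))
    # geometric: nullity of (A - lam*I), i.e. the eigenspace dimension
    geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
    return algebraic, geometric

B = np.array([[4.0, -1.0],
              [1.0, 2.0]])
print(multiplicities(B, 3))  # (2, 1): deficient, so B is not diagonalizable
```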
