
Diagonalization and powers of matrices

Brian Krummel April 6, 2020

One important application of diagonalization is computing powers of square matrices. Let $A$ be a diagonalizable $n \times n$ matrix expressed as

\[ A = PDP^{-1} \]

for an $n \times n$ diagonal matrix $D$ and an invertible $n \times n$ matrix $P$. Suppose we want to compute $A^k$ for some positive integer $k$. Then by multiplying $PDP^{-1}$ by itself $k$ times and cancelling $P^{-1}P = I$:

\[ A^k = (PDP^{-1})^k = \underbrace{(PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1})}_{k \text{ times}} = P \underbrace{DD \cdots D}_{k \text{ times}} P^{-1} = PD^kP^{-1}. \]

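Computing $D^k$ for a diagonal matrix is very easy:

\[ D^k = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix}. \]

That is, $D^k$ is the diagonal matrix obtained by raising each diagonal entry of $D$ to the $k$-th power.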

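Example 1. Given the $2 \times 2$ matrix

\[ A = \begin{pmatrix} -4 & 6 \\ -3 & 5 \end{pmatrix}, \]

find $A^5$.

Answer. Finding eigenvalues of $A$. Suppose $\lambda$ is an eigenvalue of $A$. Then $(A - \lambda I)x = 0$ has a nontrivial solution. Thus the matrix $A - \lambda I$ is singular and $\det(A - \lambda I) = 0$. We have

\[ \det(A - \lambda I) = \begin{vmatrix} -4 - \lambda & 6 \\ -3 & 5 - \lambda \end{vmatrix} = (-4 - \lambda)(5 - \lambda) + 18 = \lambda^2 - \lambda - 2 = (\lambda + 1)(\lambda - 2). \]

Therefore the eigenvalues of $A$ are $-1$ and $2$.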

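Finding an eigenvector corresponding to $-1$. We solve $(A + I)X = 0$.

\[ A + I = \begin{pmatrix} -3 & 6 \\ -3 & 6 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -2 \\ 0 & 0 \end{pmatrix}. \]

$x_2$ is a free variable and $x_1$ is a basic variable with $x_1 = 2x_2$, so an eigenvector of $A$ corresponding to $-1$ is

\[ \begin{pmatrix} 2 \\ 1 \end{pmatrix}. \]

Finding an eigenvector corresponding to $2$. We solve $(A - 2I)X = 0$.

\[ A - 2I = \begin{pmatrix} -6 & 6 \\ -3 & 3 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}. \]

$x_2$ is a free variable and $x_1$ is a basic variable with $x_1 = x_2$, so an eigenvector of $A$ corresponding to $2$ is

\[ \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Diagonalize. We let $D$ be the diagonal matrix whose diagonal entries are the eigenvalues $-1, 2$. We let $P$ be the matrix whose columns are the corresponding eigenvectors:

\[ D = \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad P = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}. \]

Thus

\[ A = PDP^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}^{-1}. \tag{1} \]

Compute $A^5$.

\[ \begin{aligned}
A^5 = PD^5P^{-1} &= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix}^5 \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}^{-1} \\
&= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix}^5 \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} \\
&= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 32 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} \\
&= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1 \\ -32 & 64 \end{pmatrix} \\
&= \begin{pmatrix} -34 & 66 \\ -33 & 65 \end{pmatrix}.
\end{aligned} \]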
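As a sanity check on Example 1, the following minimal sketch computes $A^5$ two ways: through $PD^5P^{-1}$ and by repeated multiplication. The helper names `matmul` and `inv2` are hypothetical, written here only for illustration, and exact rational arithmetic is an assumption made so the two results can be compared with `==`.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inv2(M):
    """Invert a 2x2 matrix using the adjugate formula (illustrative helper)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[-4, 6], [-3, 5]]
P = [[2, 1], [1, 1]]      # columns are the eigenvectors for -1 and 2
eigenvalues = [-1, 2]
k = 5

# D^k: raise each diagonal entry to the k-th power.
Dk = [[eigenvalues[0] ** k, 0], [0, eigenvalues[1] ** k]]
via_diagonalization = matmul(matmul(P, Dk), inv2(P))

# Direct computation by repeated multiplication, for comparison.
direct = [[1, 0], [0, 1]]
for _ in range(k):
    direct = matmul(direct, A)

print(direct)                          # [[-34, 66], [-33, 65]]
print(via_diagonalization == direct)   # True
```

For a $2 \times 2$ matrix and $k = 5$ the direct product is just as cheap; the diagonalized form pays off when $k$ is large or when a formula in $k$ is wanted.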

More generally, this gives us a way to compute functions of matrices.

Example 2. Let $A$ be as in Example 1. Is $B = A^2 + 2A + 5I$ diagonalizable?

Answer. We have already shown that $A$ is diagonalizable, so let $A = PDP^{-1}$. Then, using $A^2 = PD^2P^{-1}$ and $I = PIP^{-1}$,

\[ B = A^2 + 2A + 5I = (PDP^{-1})^2 + 2PDP^{-1} + 5I = PD^2P^{-1} + 2PDP^{-1} + 5PIP^{-1} = P(D^2 + 2D + 5I)P^{-1}. \]

Recalling (1),

\[ B = P \begin{pmatrix} (-1)^2 + 2(-1) + 5 & 0 \\ 0 & 2^2 + 2(2) + 5 \end{pmatrix} P^{-1} = P \begin{pmatrix} 4 & 0 \\ 0 & 13 \end{pmatrix} P^{-1}. \]

Therefore, B is diagonalizable.
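The computation in Example 2 can be checked numerically. The sketch below evaluates $B = A^2 + 2A + 5I$ entry by entry and compares it with $P f(D) P^{-1}$, where $f(x) = x^2 + 2x + 5$. The helpers `matmul`, `inv2`, and `f` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inv2(M):
    """Invert a 2x2 matrix using the adjugate formula (illustrative helper)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def f(x):
    """The polynomial from Example 2, applied to a scalar."""
    return x ** 2 + 2 * x + 5

A = [[-4, 6], [-3, 5]]
P = [[2, 1], [1, 1]]
eigenvalues = [-1, 2]
I = [[1, 0], [0, 1]]

# Direct evaluation of B = A^2 + 2A + 5I, entry by entry.
A2 = matmul(A, A)
B_direct = [[A2[i][j] + 2 * A[i][j] + 5 * I[i][j] for j in range(2)] for i in range(2)]

# Evaluation through the diagonalization: P f(D) P^{-1}.
f_D = [[f(eigenvalues[0]), 0], [0, f(eigenvalues[1])]]
B_diag = matmul(matmul(P, f_D), inv2(P))

print(f_D)                  # [[4, 0], [0, 13]]
print(B_direct)             # [[-5, 18], [-9, 22]]
print(B_direct == B_diag)   # True
```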

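Notice that here we had a polynomial function $f(x) = x^2 + 2x + 5$. We showed that if $A$ is a diagonalizable $n \times n$ matrix written as

\[ A = PDP^{-1} = P \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} P^{-1} \]

where $P$ is an invertible $n \times n$ matrix, then $f(A) = A^2 + 2A + 5I$ (with $5I$ in place of $5$) is

\[ f(A) = P f(D) P^{-1} = P \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{pmatrix} P^{-1}. \]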

This holds true for any polynomial function $f(x)$. In fact, this holds true for any real analytic function $f(x)$, i.e. any function that equals the sum of its Taylor series.

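Example 3. For instance, consider the exponential function $\exp(x) = e^x$. This function has the Taylor series

\[ \exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \]

We can define $\exp(A)$ for an $n \times n$ matrix $A$ by

\[ \exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}, \]

where the infinite sum means that we compute the infinite sum for each entry. Of course, defining $\exp(A)$ by an infinite series is not particularly enlightening. Instead, suppose that $A$ is a diagonalizable matrix with $A = PDP^{-1}$ for an $n \times n$ diagonal matrix $D$ and $n \times n$ invertible matrix $P$. Then using $A^k = PD^kP^{-1}$:

\[ \begin{aligned}
\exp(A) &= \sum_{k=0}^{\infty} \frac{A^k}{k!} = \sum_{k=0}^{\infty} \frac{PD^kP^{-1}}{k!} = P \cdot \sum_{k=0}^{\infty} \frac{D^k}{k!} \cdot P^{-1} \\
&= P \cdot \sum_{k=0}^{\infty} \frac{1}{k!} \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} \cdot P^{-1} \\
&= P \begin{pmatrix} \sum_{k=0}^{\infty} \frac{\lambda_1^k}{k!} & 0 & \cdots & 0 \\ 0 & \sum_{k=0}^{\infty} \frac{\lambda_2^k}{k!} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{k=0}^{\infty} \frac{\lambda_n^k}{k!} \end{pmatrix} P^{-1} \\
&= P \begin{pmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{pmatrix} P^{-1}.
\end{aligned} \]

For instance, when $A$ is as in Example 1,

\[ \begin{aligned}
\exp(A) &= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^{-1} & 0 \\ 0 & e^{2} \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}^{-1} \\
&= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^{-1} & 0 \\ 0 & e^{2} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} \\
&= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} e^{-1} & -e^{-1} \\ -e^{2} & 2e^{2} \end{pmatrix} \\
&= \begin{pmatrix} 2e^{-1} - e^{2} & -2e^{-1} + 2e^{2} \\ e^{-1} - e^{2} & -e^{-1} + 2e^{2} \end{pmatrix}.
\end{aligned} \]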
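To see that the closed form agrees with the series definition, here is a rough numerical sketch. It sums the first 40 terms of $\sum A^k / k!$ in floating point and compares the result with $P \exp(D) P^{-1}$. The cutoff of 40 terms and the helper `matmul` are assumptions made for this illustration, not part of the notes.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[-4.0, 6.0], [-3.0, 5.0]]
P = [[2.0, 1.0], [1.0, 1.0]]
P_inv = [[1.0, -1.0], [-1.0, 2.0]]

# Closed form: P * diag(e^{-1}, e^{2}) * P^{-1}.
exp_D = [[math.exp(-1), 0.0], [0.0, math.exp(2)]]
closed_form = matmul(matmul(P, exp_D), P_inv)

# Truncated Taylor series: sum of A^k / k! for k = 0, ..., 39.
series = [[1.0, 0.0], [0.0, 1.0]]   # the k = 0 term is the identity
term = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 40):
    # term holds A^k / k!, built from the previous term A^(k-1) / (k-1)!.
    term = [[entry / k for entry in row] for row in matmul(term, A)]
    series = [[s + t for s, t in zip(s_row, t_row)]
              for s_row, t_row in zip(series, term)]

# Largest entrywise difference between the two computations.
max_gap = max(abs(s - c)
              for s_row, c_row in zip(series, closed_form)
              for s, c in zip(s_row, c_row))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, since the omitted terms are bounded by a multiple of $2^{40}/40!$.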

This is important when studying differential equations. Recall that for each real number $a$, the solution to $y' = ay$ is $y = ce^{at}$, where $c \in \mathbb{R}$ is a constant. For each $n \times n$ matrix $A$, we can consider the differential system $Y' = AY$, where $Y(t)$ is a function of $t$ taking values in $\mathbb{R}^n$. The solution to $Y' = AY$ is $Y = \exp(tA) \cdot C$, where $C \in \mathbb{R}^n$ is a constant vector.

Example 4. Metropolis is served by two local newspapers, the Daily Planet and the Metropolis Star. The Daily Planet seems to be in trouble: it currently has only a 34% market share. Every year, 10% of its readership switches to the Star, whereas only 6% of the Star's readership switches to the Planet. Assume that no one subscribes to both papers and that the total newspaper readership remains constant. What is the long-term outlook for the Planet?

Answer. Next year, the market shares of the Planet and the Star will be, respectively,

\[ 0.9 \cdot 0.34 + 0.06 \cdot 0.66 = 0.3456, \qquad 0.1 \cdot 0.34 + 0.94 \cdot 0.66 = 0.6544. \]

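This can be expressed as the matrix product

\[ \begin{pmatrix} 0.9 & 0.06 \\ 0.1 & 0.94 \end{pmatrix} \begin{pmatrix} 0.34 \\ 0.66 \end{pmatrix} = \begin{pmatrix} 0.3456 \\ 0.6544 \end{pmatrix}. \]

In other words, $X_1 = PX_0$ where

\[ P = \begin{pmatrix} 0.9 & 0.06 \\ 0.1 & 0.94 \end{pmatrix}, \qquad X_0 = \begin{pmatrix} 0.34 \\ 0.66 \end{pmatrix}, \qquad X_1 = \begin{pmatrix} 0.3456 \\ 0.6544 \end{pmatrix}. \]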

We refer to the vectors $X_0$ and $X_1$ representing the readership for each year as the state vectors. For each positive integer $k$, we let the state vector $X_k$ represent the readership in the $k$-th year. Notice that the sum of the entries of each state vector $X_k$ (for $k = 0, 1$) is 1. We call a column vector with non-negative entries and the sum of its entries equal to 1 a probability vector. We refer to the matrix $P$ as the transition matrix, as it transitions the state vector $X_k$ for the $k$-th year to the state vector $X_{k+1} = PX_k$ for the next year via multiplication. Each column of $P$ gives the probabilities that a reader of one newspaper stays with it or switches to its rival. Thus the state vectors satisfy the inductive relationship

\[ X_{k+1} = PX_k \tag{2} \]

for each $k$. Notice that since the readers of each newspaper either stay with it or go to its rival in the next year, the entries in each column of $P$ must sum to 1. We call a matrix $P$ with non-negative entries and the sum of its entries in each column equal to 1 a probability matrix. Since the transition matrix $P$ is independent of the readership, we say that this is a Markov process. If we compute the readership for the next few years, we obtain

\[ \begin{aligned}
X_2 = PX_1 &= \begin{pmatrix} 0.9 & 0.06 \\ 0.1 & 0.94 \end{pmatrix} \begin{pmatrix} 0.3456 \\ 0.6544 \end{pmatrix} = \begin{pmatrix} 0.350304 \\ 0.649696 \end{pmatrix}, \\
X_3 = PX_2 &= \begin{pmatrix} 0.9 & 0.06 \\ 0.1 & 0.94 \end{pmatrix} \begin{pmatrix} 0.350304 \\ 0.649696 \end{pmatrix} = \begin{pmatrix} 0.35425536 \\ 0.64574464 \end{pmatrix}, \\
X_4 = PX_3 &= \begin{pmatrix} 0.9 & 0.06 \\ 0.1 & 0.94 \end{pmatrix} \begin{pmatrix} 0.35425536 \\ 0.64574464 \end{pmatrix} = \begin{pmatrix} 0.3575745024 \\ 0.6424254976 \end{pmatrix}.
\end{aligned} \]
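The year-by-year figures can be reproduced with a short loop. This is a minimal sketch; the function name `step` and the list `history` are hypothetical, and `Fraction` is used only so that the decimals are exact rather than rounded.

```python
from fractions import Fraction

# Transition matrix and initial state from Example 4, as exact decimals.
P = [[Fraction("0.9"), Fraction("0.06")],
     [Fraction("0.1"), Fraction("0.94")]]
X0 = [Fraction("0.34"), Fraction("0.66")]

def step(P, X):
    """One year of the process: return the product P X."""
    return [sum(p * x for p, x in zip(row, X)) for row in P]

# history[k] holds the state vector for year k.
history = [X0]
for _ in range(4):
    history.append(step(P, history[-1]))

for k, Xk in enumerate(history):
    print(k, [float(v) for v in Xk])
```

Each state vector in `history` still sums to 1, as expected of a probability vector.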

The Planet is not in trouble. The readership of the Planet is in fact going up each year, whereas the readership of the Star is going down. This is because even though the Planet is currently less popular, there are not enough disgruntled Planet readers to keep the Star growing. To compute the readership for the $k$-th year, we multiply the initial state vector $X_0$ by $P$ a total of $k$ times. Hence by (2),

\[ X_k = P^k X_0 \]

for each $k$. We can use what we learned about computing powers of matrices using diagonalization to compute $P^k$ and $P^k X_0$ and thereby work out the long-term prospects of the Daily Planet.

Find eigenvalues. We compute

\[ \begin{aligned}
\det(P - \lambda I) &= \begin{vmatrix} 0.9 - \lambda & 0.06 \\ 0.1 & 0.94 - \lambda \end{vmatrix} \\
&= (0.9 - \lambda)(0.94 - \lambda) - 0.006 \\
&= \lambda^2 - 1.84\lambda + 0.84 \\
&= (\lambda - 1)(\lambda - 0.84) = 0.
\end{aligned} \]

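Therefore, $\lambda = 1, 0.84$.

Find eigenvectors for $\lambda = 1$. We compute

\[ P - I = \begin{pmatrix} -0.1 & 0.06 \\ 0.1 & -0.06 \end{pmatrix} \xrightarrow{R_2 + R_1 \mapsto R_2} \begin{pmatrix} -0.1 & 0.06 \\ 0 & 0 \end{pmatrix} \xrightarrow{-10 R_1 \mapsto R_1} \begin{pmatrix} 1 & -0.6 \\ 0 & 0 \end{pmatrix}. \]

$x_2$ is a free variable and $x_1 = 0.6\,x_2$. Thus the eigenspace corresponding to $\lambda = 1$ is spanned by

\[ \begin{pmatrix} 0.6 \\ 1 \end{pmatrix}. \]

Find eigenvectors for $\lambda = 0.84$. We compute

\[ P - 0.84\,I = \begin{pmatrix} 0.06 & 0.06 \\ 0.1 & 0.1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}. \]

$x_2$ is a free variable and $x_1 = -x_2$. Thus the eigenspace corresponding to $\lambda = 0.84$ is spanned by

\[ \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]

Diagonalize. $P = QDQ^{-1}$ where

\[ D = \begin{pmatrix} 1 & 0 \\ 0 & 0.84 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0.6 & -1 \\ 1 & 1 \end{pmatrix}. \]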

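Compute $P^k$ and $P^k X_0$. We have that

\[ \begin{aligned}
P^k = QD^kQ^{-1} &= Q \begin{pmatrix} 1 & 0 \\ 0 & 0.84^k \end{pmatrix} Q^{-1} \\
&= \begin{pmatrix} 0.6 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0.84^k \end{pmatrix} \begin{pmatrix} 0.6 & -1 \\ 1 & 1 \end{pmatrix}^{-1} \\
&= \begin{pmatrix} 0.6 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0.84^k \end{pmatrix} \cdot \frac{1}{1.6} \begin{pmatrix} 1 & 1 \\ -1 & 0.6 \end{pmatrix} \\
&= \frac{1}{1.6} \begin{pmatrix} 0.6 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -0.84^k & 0.6 \cdot 0.84^k \end{pmatrix} \\
&= \frac{1}{1.6} \begin{pmatrix} 0.6 + 0.84^k & 0.6\,(1 - 0.84^k) \\ 1 - 0.84^k & 1 + 0.6 \cdot 0.84^k \end{pmatrix}
\end{aligned} \]

and

\[ \begin{aligned}
P^k X_0 &= \frac{1}{1.6} \begin{pmatrix} 0.6 + 0.84^k & 0.6\,(1 - 0.84^k) \\ 1 - 0.84^k & 1 + 0.6 \cdot 0.84^k \end{pmatrix} \begin{pmatrix} 0.34 \\ 0.66 \end{pmatrix} = \frac{1}{1.6} \begin{pmatrix} 0.6 - 0.056 \cdot 0.84^k \\ 1 + 0.056 \cdot 0.84^k \end{pmatrix} \\
&= \begin{pmatrix} 0.375 - 0.035 \cdot 0.84^k \\ 0.625 + 0.035 \cdot 0.84^k \end{pmatrix}.
\end{aligned} \]

As a check, $k = 0$ gives $X_0 = (0.34, 0.66)$ and $k = 1$ gives $X_1 = (0.3456, 0.6544)$. Letting $k \to \infty$, the long-term readership $X$ is given by

\[ X = \lim_{k \to \infty} P^k X_0 = \lim_{k \to \infty} \begin{pmatrix} 0.375 - 0.035 \cdot 0.84^k \\ 0.625 + 0.035 \cdot 0.84^k \end{pmatrix} = \begin{pmatrix} 0.375 \\ 0.625 \end{pmatrix}. \]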
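The diagonalized form of $P^k X_0$ can be compared against plain iteration. The sketch below does so exactly for the first 20 years and then evaluates the diagonalized form at $k = 200$ to see how close it is to the limit. The helpers `matmul`, `inv2`, and `state_via_diagonalization` are hypothetical names, and the choices of 20 and 200 are arbitrary.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inv2(M):
    """Invert a 2x2 matrix using the adjugate formula (illustrative helper)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

P = [[Fraction("0.9"), Fraction("0.06")],
     [Fraction("0.1"), Fraction("0.94")]]
Q = [[Fraction("0.6"), Fraction(-1)],
     [Fraction(1), Fraction(1)]]       # columns: eigenvectors for 1 and 0.84
Q_inv = inv2(Q)
X0 = [[Fraction("0.34")], [Fraction("0.66")]]   # column vector

def state_via_diagonalization(k):
    """Return Q D^k Q^{-1} X0 as a 2x1 column."""
    Dk = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction("0.84") ** k]]
    return matmul(matmul(matmul(Q, Dk), Q_inv), X0)

# Compare with repeated multiplication by P for years 0 through 20.
X = X0
agree = True
for k in range(21):
    agree = agree and (state_via_diagonalization(k) == X)
    X = matmul(P, X)

far_future = state_via_diagonalization(200)
print(agree)   # True
print([float(row[0]) for row in far_future])
```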

Note that there is another way we could have determined the long-term readership. Recall that $X_{k+1} = PX_k$ for each $k$. Letting $k \to \infty$,

\[ X = \lim_{k \to \infty} X_{k+1} = \lim_{k \to \infty} PX_k = P \cdot \lim_{k \to \infty} X_k = PX, \]

or simply $X = PX$. In other words, the long-term readership $X$ is an eigenvector of $P$ with corresponding eigenvalue 1. Therefore,

\[ X = c \begin{pmatrix} 0.6 \\ 1 \end{pmatrix} \]

for some $c \in \mathbb{R}$. Since the entries of $X$ sum to 1, we must have $1.6\,c = c\,(0.6 + 1) = 1$, or $c = 0.625$, so that

\[ X = 0.625 \begin{pmatrix} 0.6 \\ 1 \end{pmatrix} = \begin{pmatrix} 0.375 \\ 0.625 \end{pmatrix} \]

as we found above.
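The eigenvector route takes only a few lines to check. This sketch rescales the eigenvector $(0.6, 1)$ so that its entries sum to 1 and confirms that multiplying by $P$ leaves it unchanged. The variable names are illustrative assumptions.

```python
from fractions import Fraction

P = [[Fraction("0.9"), Fraction("0.06")],
     [Fraction("0.1"), Fraction("0.94")]]

# Eigenvector for eigenvalue 1, before normalization.
v = [Fraction("0.6"), Fraction(1)]

# Scale so the entries sum to 1, i.e. multiply by c = 1 / (0.6 + 1).
c = 1 / sum(v)
X = [c * entry for entry in v]

# Apply P once; a steady state must be left unchanged.
PX = [sum(p * x for p, x in zip(row, X)) for row in P]

print(float(c))                 # 0.625
print([float(x) for x in X])    # [0.375, 0.625]
print(PX == X)                  # True
```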

We can describe some of what we observed with the following theorem. Recall that a probability vector is a column vector with non-negative entries and the sum of its entries equal to 1. A probability matrix is a matrix with non-negative entries and the sum of its entries in each column equal to 1.

Theorem 1. Let $P$ be an $n \times n$ probability matrix with all non-zero entries. Then there is a unique probability vector $X \in \mathbb{R}^n$ such that $PX = X$. Moreover, for each probability vector $X_0 \in \mathbb{R}^n$,

\[ X = \lim_{k \to \infty} P^k X_0. \]
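The theorem can be illustrated (not proved) with the newspaper matrix from Example 4, whose entries are all non-zero. The sketch below starts from several probability vectors and iterates 300 times in floating point. The starting vectors and the iteration count are arbitrary choices made for this illustration.

```python
# Transition matrix from Example 4; all entries are non-zero.
P = [[0.9, 0.06],
     [0.1, 0.94]]

def step(P, X):
    """One year of the Markov process: return P X."""
    return [sum(p * x for p, x in zip(row, X)) for row in P]

# Several different initial probability vectors.
starts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.34, 0.66]]

limits = []
for X in starts:
    for _ in range(300):
        X = step(P, X)
    limits.append(X)

for X0, X in zip(starts, limits):
    print(X0, "->", [round(x, 6) for x in X])
```

Every starting vector ends up at approximately $(0.375, 0.625)$, the unique probability vector fixed by $P$.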
