
Unit 5: Diagonalization

Juan Luis Melero and Eduardo Eyras, October 2018

Contents

1 Matrix diagonalization
  1.1 Definitions
    1.1.1 Similar matrix
    1.1.2 Diagonalizable matrix
    1.1.3 Eigenvalues and Eigenvectors
  1.2 Calculation of eigenvalues and eigenvectors
2 Properties of matrix diagonalization
  2.1 Similar matrices have the same eigenvalues
  2.2 Relation between the rank and the eigenvalues of a matrix
  2.3 Eigenvectors are linearly independent
  2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors
  2.5 Eigenvectors of a symmetric matrix are orthogonal
3 Matrix diagonalization process example
4 Exercises
5 R practical
  5.1 Eigenvalues and Eigenvectors

1 Matrix diagonalization

1.1 Definitions

1.1.1 Similar matrix

Two matrices are called similar if they are related through a third matrix in the following way:

A, B ∈ M_{n×n}(ℝ) are similar if ∃P ∈ M_{n×n}(ℝ) invertible such that A = P^{-1}BP

Note that two similar matrices have the same determinant.

Proof: given A, B similar, A = P^{-1}BP:

det(A) = det(P^{-1}BP) = det(P^{-1}) det(B) det(P) = (1/det(P)) det(B) det(P) = det(B)

1.1.2 Diagonalizable matrix

A matrix is diagonalizable if it is similar to a diagonal matrix, i.e.:

A ∈ M_{n×n}(ℝ) is diagonalizable if ∃P ∈ M_{n×n}(ℝ) invertible such that P^{-1}AP is diagonal

P is the matrix of change to a basis where A has a diagonal form.

We will say that A is diagonalizable (or diagonalizes) if and only if there is a basis B = {u_1, . . . , u_n} with the property:

A u_1 = λ_1 u_1
   ⋮                with λ_1, . . . , λ_n ∈ ℝ
A u_n = λ_n u_n

That is, A has diagonal form in this basis and, consequently, A acts as a diagonal matrix on every vector of the vector space expressed in this basis.

1.1.3 Eigenvalues and Eigenvectors

Let A be a square matrix A ∈ M_{n×n}(ℝ). λ ∈ ℝ is an eigenvalue of A if, for some non-zero vector u,

Au = λu

We can rewrite this as a system of equations:

Au = λu → (λI_n − A)u = 0 or (A − λI_n)u = 0

This homogeneous system of equations has non-trivial solutions if and only if the determinant of λI_n − A is zero.

1.2 Calculation of eigenvalues and eigenvectors

Let A be a square matrix A ∈ M_{n×n}(ℝ). λ ∈ ℝ is an eigenvalue of A ⇐⇒ det(λI_n − A) = 0. A vector u is an eigenvector for λ ⇐⇒ (λI_n − A)u = 0. First we calculate the eigenvalues and afterwards the eigenvectors.

To compute eigenvalues we solve the equation:

0 = det(λI_n − A) = λ^n + α_{n−1} λ^{n−1} + · · · + α_2 λ^2 + α_1 λ + α_0

Thus, each eigenvalue λ_i is a root of this polynomial (the characteristic polynomial).

To compute the eigenvectors, we solve the linear equation for each eigenvalue:

(λ_i I_n − A)u = 0

The set of solutions for a given eigenvalue is called the Eigenspace of A corresponding to the eigenvalue λ:

E(λ) = {u | (λI_n − A)u = 0}

Note that E(λ) is the kernel of a linear map (we leave as an exercise to show that λI_n − A defines a linear map):

E(λ) = Ker(λI_n − A)

Since the kernel of a linear map is a vector subspace, the eigenspace is a vector subspace.

Given a square matrix representing a linear map from a vector space to itself (an endomorphism), the eigenvectors describe the subspaces on which the matrix acts as multiplication by a number (the corresponding eigenvalue), i.e. the vectors on which the matrix diagonalizes.
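As a minimal sketch in R of the defining property Au = λu (using the 2-dimensional example analysed next; R's eigen() function is covered in section 5, but plain matrix multiplication suffices here):

#Check the defining property A u = lambda u
> A <- matrix(c(-3, -1, 4, 2), 2, 2)
> u <- c(4, 1)       #an eigenvector for lambda = -2 (computed below)
> A %*% u            #gives (-8, -2), which equals -2 * u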

Example in ℝ². Consider the matrix in 2 dimensions:

A = [ −3   4 ]
    [ −1   2 ]

To diagonalize this matrix we write the characteristic equation:

det(λI_2 − A) = det [ λ+3   −4  ] = (λ + 3)(λ − 2) + 4 = 0
                    [  1    λ−2 ]

λ² + λ − 2 = 0 → (λ + 2)(λ − 1) = 0 → λ = −2, 1

The eigenvalues of this matrix are −2 and 1. Now we calculate the eigenvectors for each eigenvalue by solving the homogeneous linear equations for the components of the vectors.

For eigenvalue λ = −2:

(−2I_2 − A)u = 0 → [ −2+3   −4   ] [ u_1 ] = [ 0 ] → [ u_1 − 4u_2 ] = [ 0 ] → u_1 = 4u_2
                   [   1    −2−2 ] [ u_2 ]   [ 0 ]   [ u_1 − 4u_2 ]   [ 0 ]

Hence, the eigenspace is:

E(−2) = { u = ( a, a/4 ), a ∈ ℝ }

In particular, u = (1, 1/4) is an eigenvector with eigenvalue −2.

For eigenvalue λ = 1:

(I_2 − A)u = 0 → [ 1+3   −4  ] [ u_1 ] = [ 0 ] → [ 4u_1 − 4u_2 ] = [ 0 ] → u_1 = u_2
                 [  1    1−2 ] [ u_2 ]   [ 0 ]   [  u_1 − u_2  ]   [ 0 ]

Hence the eigenspace has the form:

E(1) = { u = ( a, a ), a ∈ ℝ }

In particular, u = (1, 1) is an eigenvector with eigenvalue 1.

Example in ℝ³. Consider the following matrix:

A = [ −5   0   0 ]
    [  3   7   0 ]
    [  4  −2   3 ]

det(λI_3 − A) = det [ λ+5    0     0  ] = (λ + 5)(λ − 7)(λ − 3) = 0
                    [ −3    λ−7    0  ]
                    [ −4     2    λ−3 ]

3 solutions: λ = −5, 7, 3

Eigenvectors for λ = −5:

       16  0 0 0 x 0 − 9 z 4 (−5I3 − A)u = 0 → −3 −12 0  y = 0 → u =  9 z  −4 2 −8 z 0 z

The eigenspace is:

E(−5) = { u = ( x, y, z ) | x = −(16/9)z, y = (4/9)z, z ∈ ℝ }

Eigenvectors for λ = 7:

 12 0 0 x 0  0  (7I3 − A)u = 0 → −3 0 0 y = 0 → u = −2z −4 2 4 z 0 z The eigenspace is:      0  E(7) = u = −2z , z ∈ R  z 

Eigenvectors for λ = 3:

 8 0 0 x 0 0 (3I3 − A)u = 0 → −3 −4 0 y = 0 → u = 0 −4 2 0 z 0 z The eigenspace is:      0  E(3) = u = 0 , z ∈ R  z 

2 Properties of matrix diagonalization

In this section we describe some of the properties of diagonalizable matrices.

2.1 Similar matrices have the same eigenvalues

Theorem:

A, B ∈ M_{n×n}(ℝ) similar =⇒ A, B have the same eigenvalues

Proof: given two square matrices that are similar:

A, B ∈ M_{n×n}(ℝ), A = P^{-1}BP

The eigenvalues are calculated with the characteristic polynomial, that is:

det(λI_n − A) = det(λP^{-1}P − P^{-1}BP) = det(P^{-1}(λI_n − B)P) =

= det(P^{-1}) det(λI_n − B) det(P) = det(λI_n − B)

Hence, two similar matrices have the same characteristic polynomial and therefore have the same eigenvalues.

This result also allows us to understand better the process of diagonalization. The determinant of a diagonal matrix is the product of the elements in its diagonal, so the characteristic polynomial of a diagonal matrix must be of the form ∏_i (λ − λ_i) = 0, where λ_i are the eigenvalues. Thus, to diagonalize a matrix is to establish its similarity to a diagonal matrix containing its eigenvalues.
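A minimal numerical sketch of the theorem in R (B and P chosen ad hoc, with P invertible):

#Similar matrices have the same eigenvalues
> B <- matrix(c(1, 0, 2, 3), 2, 2)
> P <- matrix(c(2, 1, 1, 1), 2, 2)   #det(P) = 1, so P is invertible
> A <- solve(P) %*% B %*% P          #A = P^{-1} B P
> eigen(A)$values                    #3 and 1
> eigen(B)$values                    #the same: 3 and 1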

2.2 Relation between the rank and the eigenvalues of a matrix

Recall that the rank of a matrix is the maximum number of linearly independent row or column vectors.

Property: for a diagonalizable matrix A, rank(A) = number of non-zero eigenvalues of A (counted with multiplicity).

Proof: A is diagonalizable if it is similar to a diagonal matrix, i.e. D = P^{-1}AP with D diagonal. As we saw in section 1.1.1, the determinant of two similar matrices is the same, therefore:

D = P^{-1}AP → det(D) = det(A)

We can see that a matrix is singular, i.e. has det(A) = 0, if and only if at least one of its eigenvalues is zero. As the rank of a diagonal matrix is the number of non-zero rows, and similar matrices have the same rank, the rank of A is the number of non-zero eigenvalues.
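A short sketch of this property in R, using qr() to obtain the rank (a base-R option):

#Rank vs. number of non-zero eigenvalues
> A <- matrix(c(1, 2, 2, 4), 2, 2)   #singular: the rows are proportional
> eigen(A)$values                    #5 and 0: one zero eigenvalue
> qr(A)$rank                         #1 = number of non-zero eigenvalues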

2.3 Eigenvectors are linearly independent

Theorem: eigenvectors corresponding to different eigenvalues of a matrix are linearly independent.

Proof: we prove this by contradiction, i.e. we assume the opposite and arrive at a contradiction. Consider the case of two non-zero eigenvectors of a 2 × 2 matrix A:

u_1 ≠ 0, u_2 ≠ 0, Au_1 = λ_1 u_1, Au_2 = λ_2 u_2

We assume that they are linearly dependent:

u_1 = c u_2

Now we apply the matrix A on both sides and use the fact that they are eigenvectors:

λ_1 u_1 = c λ_2 u_2 = λ_2 u_1 → (λ_1 − λ_2) u_1 = 0

If the eigenvalues are different, λ_1 ≠ λ_2, then necessarily u_1 = 0, which is a contradiction, since we assumed that the eigenvectors are non-zero. Thus, if the eigenvalues are different, the eigenvectors are linearly independent.

For n eigenvectors: first, assume linear dependence,

u_1 = Σ_{j=2}^{n} α_j u_j

Apply the matrix to both sides:

λ_1 u_1 = Σ_{j=2}^{n} α_j λ_j u_j   and   λ_1 u_1 = λ_1 Σ_{j=2}^{n} α_j u_j → Σ_{j=2}^{n} (λ_1 − λ_j) α_j u_j = 0

For different eigenvalues, λ_i ≠ λ_j for i ≠ j, necessarily all the coefficients α_j must be zero, which contradicts the assumed linear dependence.

As a result, the eigenvectors of a matrix with n different eigenvalues form a basis of the vector space and diagonalize the matrix (see section 2.4).
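A small sketch in R: when the eigenvalues are distinct, the matrix whose columns are the eigenvectors has non-zero determinant, i.e. the eigenvectors are linearly independent:

#Distinct eigenvalues give linearly independent eigenvectors
> A <- matrix(c(-3, -1, 4, 2), 2, 2)   #eigenvalues -2 and 1 (section 1.2)
> V <- eigen(A)$vectors
> det(V)                               #non-zero: the columns form a basis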

2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors

Theorem:

A ∈ M_{n×n}(ℝ) is diagonalizable ⇐⇒ A has n linearly independent eigenvectors.

Proof: we have to prove both directions.

1. A diagonalizable =⇒ n linearly independent eigenvectors.
2. n linearly independent eigenvectors =⇒ A diagonalizable.

Proof of 1: assume A is diagonalizable. Then we know it must be similar to a diagonal matrix:

∃P ∈ M_{n×n}(ℝ) | P^{-1}AP is diagonal

We can write:

P^{-1}AP = D = [ λ_1  ⋯   0  ]     and     P = ( p_1 | . . . | p_n )
               [  ⋮   ⋱   ⋮  ]
               [  0   ⋯  λ_n ]

P is defined in terms of its column vectors p_i.

We multiply both sides of the equation by P from the left:

P^{-1}AP = D → AP = PD

AP = PD → A ( p_1 | . . . | p_n ) = ( p_1 | . . . | p_n ) [ λ_1  ⋯   0  ]
                                                          [  ⋮   ⋱   ⋮  ]
                                                          [  0   ⋯  λ_n ]

This can be rewritten as:

( Ap_1 | . . . | Ap_n ) = ( λ_1 p_1 | . . . | λ_n p_n ) → Ap_i = λ_i p_i

This tells us that the column vectors of P, the p_i, are actually eigenvectors of A. Since the matrix A is diagonalizable, P must be invertible, so the column vectors (i.e. the eigenvectors) p_i cannot be linearly dependent on each other, since otherwise det(P) = 0.

Proof of 2: assume that A has n linearly independent eigenvectors. That means

∃ p_i, i = 1, . . . , n | Ap_i = λ_i p_i     (1)

We define a matrix P by using the p_i as the column vectors:

P = ( p_1 | . . . | p_n )

We define a diagonal matrix D where the diagonal values are these eigenvalues:

D = [ λ_1  ⋯   0  ]
    [  ⋮   ⋱   ⋮  ]
    [  0   ⋯  λ_n ]

We can rewrite equation (1) in terms of the matrices P and D:

Ap_i = λ_i p_i, i = 1, . . . , n → AP = PD → D = P^{-1}AP

Since the p_i are all linearly independent, P^{-1} exists. A is similar to a diagonal matrix and, therefore, A is diagonalizable.

Q.E.D.

Conclusion: a matrix is diagonalizable if and only if we can write:

A = PDP^{-1}

where P is the matrix whose columns are the eigenvectors,

P = ( p_1 | . . . | p_n )

and D is the diagonal matrix containing the eigenvalues,

D = [ λ_1  ⋯   0  ]
    [  ⋮   ⋱   ⋮  ]
    [  0   ⋯  λ_n ]
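This factorization can be sketched in R with the output of eigen() (see section 5):

#Reconstruct A from its eigenvectors and eigenvalues
> A <- matrix(c(-3, -1, 4, 2), 2, 2)
> ev <- eigen(A)
> P <- ev$vectors
> D <- diag(ev$values)
> P %*% D %*% solve(P)    #recovers A, since A = P D P^{-1}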

2.5 Eigenvectors of a symmetric matrix are orthogonal

In general, the eigenvectors of a matrix corresponding to different eigenvalues are linearly independent, and the matrix diagonalizes when there are enough eigenvectors to form a basis of the vector space on which the endomorphism acts (section 2.4). In general, however, the eigenvectors are not orthogonal, so they do not form an orthogonal basis. For a symmetric matrix, the eigenvectors corresponding to different eigenvalues are always orthogonal.

Theorem: if v_1, . . . , v_n are eigenvectors of a real symmetric matrix A and the corresponding eigenvalues are all different, then the eigenvectors corresponding to different eigenvalues are orthogonal to each other.

Proof: a symmetric matrix is defined by A^T = A. We will consider eigenvectors of A and of A^T and prove that any pair corresponding to different eigenvalues is orthogonal. Let u be an eigenvector of A^T and v an eigenvector of A. If the corresponding eigenvalues are different, then u and v must be orthogonal:

A^T u = λ_u u,    Av = λ_v v

T  A u, v = hλuu, vi = λu hu, vi  → (λu − λv) hu, vi = 0 T T  T T A u, v = A u v = u Av = λvu v = λv hu, vi 

If λ_u ≠ λ_v → ⟨u, v⟩ = 0

As the matrix is symmetric, A^T = A, so the result holds for any pair of eigenvectors corresponding to different eigenvalues of A.

Properties used in the proof:

⟨u, v⟩ = u^T v = ( u_1 . . . u_n ) ( v_1 )
                                   (  ⋮  )
                                   ( v_n )

⟨u, Bv⟩ = u^T B v = (B^T u)^T v = ⟨B^T u, v⟩
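A brief sketch in R: for a symmetric matrix, eigen() returns orthonormal eigenvectors, so the matrix of eigenvectors V satisfies V^T V = I:

#Eigenvectors of a symmetric matrix are orthogonal
> S <- matrix(c(2, 1, 1, 2), 2, 2)   #symmetric: t(S) equals S
> V <- eigen(S)$vectors
> round(t(V) %*% V, 10)              #the identity matrix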

3 Matrix diagonalization process example

In this section we will perform the whole process to diagonalize a matrix with an example.

Example: consider the following matrix:

A = [ 1  2 ]
    [ 2  1 ]

Calculate its eigenvalues and eigenvectors and build the matrix P that transforms it into a diagonal matrix through P^{-1}AP.

We write down the characteristic polynomial:

det(A − λI_2) = det [ 1−λ    2  ] = (1 − λ)² − 4 = λ² − 2λ − 3
                    [  2    1−λ ]

det(A − λI_2) = 0 → (λ − 3)(λ + 1) = 0 → λ = 3, −1

It has two solutions, i.e. two eigenvalues. We know that two different eigenvalues give two linearly independent eigenvectors (section 2.3). Hence, at this point we already know that the matrix diagonalizes. We now calculate the eigenvectors:

[ 1  2 ] [ x ] = 3 [ x ] → x + 2y = 3x, 2x + y = 3y → x = y → eigenvectors of 3: ( x, x )
[ 2  1 ] [ y ]     [ y ]

[ 1  2 ] [ x ] = −1 [ x ] → x + 2y = −x, 2x + y = −y → y = −x → eigenvectors of −1: ( x, −x )
[ 2  1 ] [ y ]      [ y ]

We can also calculate the eigenvectors through the eigenspaces, i.e. the set of solutions of Au = λu:

E(3) = Ker(A − 3I_2) = { ( a, a ), a ∈ ℝ } ⊂ ℝ²

E(−1) = Ker(A + I_2) = { ( b, −b ), b ∈ ℝ } ⊂ ℝ²

We choose two particular eigenvectors, one from each space:

( 1, 1 ) ∈ E(3),    ( 1, −1 ) ∈ E(−1)

We build the matrix P from these vectors:

P = [ 1   1 ]
    [ 1  −1 ]

Now we need to calculate P^{-1} and check that P^{-1}AP is a diagonal matrix with the eigenvalues in the diagonal.

P^{-1} = (1/det(P)) Adj(P) = (1/det(P)) C_P^T = −(1/2) [ −1  −1 ]
                                                       [ −1   1 ]

We confirm that it is the inverse:

P^{-1}P = −(1/2) [ −1  −1 ] [ 1   1 ] = [ 1  0 ]
                 [ −1   1 ] [ 1  −1 ]   [ 0  1 ]

Now we confirm that A is similar to a diagonal matrix through P, and that this diagonal matrix contains the eigenvalues in its diagonal:

P^{-1}AP = −(1/2) [ −1  −1 ] [ 1  2 ] [ 1   1 ] = [ 3   0 ]
                  [ −1   1 ] [ 2  1 ] [ 1  −1 ]   [ 0  −1 ]

It is important to note that this works with a matrix P built from any choice of eigenvectors. In addition, the order of the eigenvalues in the diagonal matrix follows the order of the eigenvectors chosen as columns of P.
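The same computation can be sketched in R, with the hand-picked P above:

#Check the worked example numerically
> A <- matrix(c(1, 2, 2, 1), 2, 2)
> P <- matrix(c(1, 1, 1, -1), 2, 2)
> solve(P) %*% A %*% P    #diag(3, -1), as computed by hand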

4 Exercises

Ex. 1 — Consider the matrix A = [ 3  2 ]
                                [ 0  1 ]

1. Calculate its eigenvalues and eigenvectors.
2. Calculate P such that P^{-1}AP is diagonal.

 1 2 1  Ex. 2 — Consider the matrix A =  2 0 −2 −1 2 3 1. Calculate its eigenvalues and eigenvectors 2. Calculate P such that P −1AP is diagonal

Ex. 3 — Consider the following linear map between polynomials of degree ≤ 1:

f : P_1[x] → P_1[x], where a + bx → (a + b) + (a + b)x

1. Calculate the associated matrix A.
2. Calculate the eigenvalues and eigenvectors associated to this linear map.
3. What is the matrix P such that P^{-1}AP is diagonal?

Ex. 4 — Consider the matrix A = [ 2   1  −1 ]
                                [ 0  −1   2 ]
                                [ 0   0  −1 ]

Show that A is not diagonalizable. Hint: you can use the theorem that says that a square matrix of size n is diagonalizable if and only if it has n linearly independent eigenvectors (having n different eigenvalues is sufficient).

Ex. 5 — Consider the matrix A = [ 5  1  3 ]
                                [ 0  7  6 ]
                                [ 0  1  8 ]

Calculate an orthonormal (orthogonal and unit length) basis for each of its eigenspaces.

5 R practical

5.1 Eigenvalues and Eigenvectors

Having built a matrix, R has the function eigen() that calculates the eigenvalues and the eigenvectors of the matrix.

#Introduce the matrix
> m <- matrix(c(-3, -1, 4, 2), 2, 2)
> m
     [,1] [,2]
[1,]   -3    4
[2,]   -1    2

#Compute the eigenvalues and eigenvectors
> ev <- eigen(m)
> evalues <- ev$values
> evalues
[1] -2  1
#Eigenvalues are always returned
#in decreasing order of absolute value

> evectors <- ev$vectors
> evectors
           [,1]       [,2]
[1,] -0.9701425 -0.7071068
[2,] -0.2425356 -0.7071068
#Returns the eigenvectors as columns.
#The eigenvectors have unit length.
#Their order corresponds to
#the order of the eigenvalues.

Notice that evectors is a valid matrix P. So, computing the inverse of that matrix, you can check the diagonalization.

> library('matlib')
> p <- evectors
> pi <- inv(p)
> pi %*% m %*% p
              [,1]          [,2]
[1,] -2.000000e+00 -2.220446e-16
[2,] -1.110223e-16  1.000000e+00
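If the matlib package is not available, the same check can be done with base R's solve(), which also computes the inverse:

#Base-R alternative to matlib's inv()
> pi <- solve(p)
> pi %*% m %*% p    #same diagonal result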

#The diagonal coincides with the eigenvalues.
#The other elements are "0" (up to floating-point error).

Try to test the theorems and properties using R (you may need to use commands from previous units).
