Unit 5: Matrix diagonalization
Juan Luis Melero and Eduardo Eyras October 2018
Contents

1 Matrix diagonalization
  1.1 Definitions
    1.1.1 Similar matrix
    1.1.2 Diagonalizable matrix
    1.1.3 Eigenvalues and Eigenvectors
  1.2 Calculation of eigenvalues and eigenvectors

2 Properties of matrix diagonalization
  2.1 Similar matrices have the same eigenvalues
  2.2 Relation between the rank and the eigenvalues of a matrix
  2.3 Eigenvectors are linearly independent
  2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors
  2.5 Eigenvectors of a symmetric matrix are orthogonal

3 Matrix diagonalization process example

4 Exercises

5 R practical
  5.1 Eigenvalues and Eigenvectors
1 Matrix diagonalization
1.1 Definitions

1.1.1 Similar matrix

Two matrices are called similar if they are related through a third matrix in the following way:
A, B ∈ Mn×n(R) are similar if ∃P ∈ Mn×n(R) invertible | A = P⁻¹BP

Note that two similar matrices have the same determinant.
Proof: given A, B similar, A = P⁻¹BP:

det(A) = det(P⁻¹BP) = det(P⁻¹)det(B)det(P) = (1/det(P))det(B)det(P) = det(B)
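As a quick numerical check of this property, here is a sketch in Python with NumPy (the matrices P and B below are arbitrary choices, not from the notes):

```python
import numpy as np

# An invertible change-of-basis matrix P and an arbitrary matrix B
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # det(P) = 1, so P is invertible
B = np.array([[4.0, -2.0],
              [1.0,  3.0]])

# A = P^-1 B P is similar to B
A = np.linalg.inv(P) @ B @ P

# Similar matrices have the same determinant
print(np.linalg.det(A))  # same as det(B), up to rounding
print(np.linalg.det(B))  # 4*3 - (-2)*1 = 14
```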
1.1.2 Diagonalizable matrix

A matrix is diagonalizable if it is similar to a diagonal matrix, i.e.:
A ∈ Mn×n(R) is diagonalizable if ∃P ∈ Mn×n(R) invertible | P⁻¹AP is diagonal

P is the matrix of change to a basis where A has a diagonal form.
We will say that A is diagonalizable (or diagonalizes) if and only if there is a basis B = {u1, . . . , un} with the property:
Au1 = λ1u1
...
Aun = λnun,   with λ1, . . . , λn ∈ R

That is, A has diagonal form in this basis and, consequently, A acts diagonally on every vector of the vector space expressed in this basis.
1.1.3 Eigenvalues and Eigenvectors
Let A be a square matrix, A ∈ Mn×n(R). λ ∈ R is an eigenvalue of A if, for some non-zero vector u, Au = λu.
We can rewrite this as a homogeneous system of equations:

Au = λu → (λIn − A)u = 0 or (A − λIn)u = 0

This homogeneous system has non-trivial solutions if and only if its determinant is zero.
1.2 Calculation of eigenvalues and eigenvectors
Let A be a square matrix, A ∈ Mn×n(R). λ ∈ R is an eigenvalue of A ⇐⇒ det(λIn − A) = 0. A non-zero vector u is an eigenvector for the eigenvalue λ ⇐⇒ (λIn − A)u = 0. First we calculate the eigenvalues, and afterwards the eigenvectors.
To compute eigenvalues we solve the equation:
0 = det(λIn − A) = λ^n + α_{n−1} λ^{n−1} + · · · + α_2 λ² + α_1 λ + α_0
Thus, each eigenvalue λi is a root of this polynomial (the characteristic polynomial).
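The characteristic polynomial can also be computed numerically. A sketch in Python with NumPy, using the 2×2 matrix of the example further below; np.poly returns the coefficients of det(λIn − A) and np.roots finds its roots:

```python
import numpy as np

# 2x2 example matrix from section 1.2
A = np.array([[-3.0, 4.0],
              [-1.0, 2.0]])

# Coefficients of det(lambda*I - A), highest power first:
# here lambda^2 + lambda - 2
coeffs = np.poly(A)
print(coeffs)

# The roots of the characteristic polynomial are the eigenvalues
print(np.sort(np.roots(coeffs)))  # approximately -2 and 1
```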
To compute the eigenvectors, we solve the linear equation for each eigenvalue:
(λiIn − A)u = 0
The set of solutions for a given eigenvalue λ is called the eigenspace of A corresponding to λ:
E(λ) = {u | (λIn − A)u = 0}

Note that E(λ) is the kernel of a linear map (we leave it as an exercise to show that u ↦ (λIn − A)u is a linear map):
E(λ) = Ker(λIn − A)
Since the kernel of a linear map is a vector subspace, the eigenspace is a vector subspace.
Given a square matrix representing a linear map from a vector space to itself (an endomorphism), the eigenvectors describe the subspaces on which the matrix acts as multiplication by a number (the eigenvalue), i.e. the vectors on which the matrix diagonalizes.
Example in R². Consider the matrix in 2 dimensions:

A = [ −3  4 ]
    [ −1  2 ]

To diagonalize this matrix we write the characteristic equation:
det(λI2 − A) = det [ λ + 3    −4   ] = (λ + 3)(λ − 2) + 4 = 0
                   [   1     λ − 2 ]

λ² + λ − 2 = 0 → (λ + 2)(λ − 1) = 0 → λ = −2, 1

The eigenvalues of this matrix are −2 and 1. Now we calculate the eigenvectors for each eigenvalue by solving the homogeneous linear equations for the components of the vectors. For eigenvalue λ = −2:

(−2I2 − A)u = 0 → [ −2 + 3    −4   ] [ u1 ] = [ 0 ] → [ u1 − 4u2 ] = [ 0 ] → u1 = 4u2
                  [   1     −2 − 2 ] [ u2 ]   [ 0 ]   [ u1 − 4u2 ]   [ 0 ]

Hence, the eigenspace is:
E(−2) = { u = ( a, a/4 ), a ∈ R }

In particular, u = ( 1, 1/4 ) is an eigenvector with eigenvalue −2.

For eigenvalue λ = 1:

(I2 − A)u = 0 → [ 1 + 3    −4   ] [ u1 ] = [ 0 ] → [ 4u1 − 4u2 ] = [ 0 ] → u1 = u2
                [   1     1 − 2 ] [ u2 ]   [ 0 ]   [  u1 − u2  ]   [ 0 ]
Hence the eigenspace has the form:
E(1) = { u = ( a, a ), a ∈ R }

In particular, u = ( 1, 1 ) is an eigenvector with eigenvalue 1.

Example in R³. Consider the following matrix:

A = [ −5   0   0 ]
    [  3   7   0 ]
    [  4  −2   3 ]
det(λI3 − A) = det [ λ + 5    0      0   ] = (λ + 5)(λ − 7)(λ − 3) = 0
                   [  −3    λ − 7    0   ]
                   [  −4      2    λ − 3 ]
3 solutions: λ = −5, 7, 3
Eigenvectors for λ = −5:
(−5I3 − A)u = 0 → [  0    0    0 ] [ x ]   [ 0 ]
                  [ −3  −12    0 ] [ y ] = [ 0 ] → u = ( −16/9 z, 4/9 z, z )
                  [ −4    2   −8 ] [ z ]   [ 0 ]
The eigenspace is:
E(−5) = { u = ( x, y, z ) | x = −16/9 z, y = 4/9 z, z ∈ R }

Eigenvectors for λ = 7:
(7I3 − A)u = 0 → [ 12  0  0 ] [ x ]   [ 0 ]
                 [ −3  0  0 ] [ y ] = [ 0 ] → u = ( 0, −2z, z )
                 [ −4  2  4 ] [ z ]   [ 0 ]

The eigenspace is:

E(7) = { u = ( 0, −2z, z ), z ∈ R }
Eigenvectors for λ = 3:

(3I3 − A)u = 0 → [  8   0  0 ] [ x ]   [ 0 ]
                 [ −3  −4  0 ] [ y ] = [ 0 ] → u = ( 0, 0, z )
                 [ −4   2  0 ] [ z ]   [ 0 ]

The eigenspace is:

E(3) = { u = ( 0, 0, z ), z ∈ R }
2 Properties of matrix diagonalization
In this section we describe some of the properties of diagonalizable matrices.
2.1 Similar matrices have the same eigenvalues

Theorem:
A, B ∈ Mn×n(R) similar =⇒ A, B have the same eigenvalues

Proof: given two square matrices that are similar:

A, B ∈ Mn×n(R), A = P⁻¹BP

The eigenvalues are calculated with the characteristic polynomial, that is:

det(λIn − A) = det(λP⁻¹P − P⁻¹BP) = det(P⁻¹(λIn − B)P) =

det(P⁻¹)det(λIn − B)det(P) = det(λIn − B)

Hence, two similar matrices have the same characteristic polynomial and therefore the same eigenvalues.
This result also helps to understand the process of diagonalization. The determinant of a diagonal matrix is the product of the elements in its diagonal, so its characteristic polynomial has the form ∏i (λ − λi) = 0, where the λi are the eigenvalues. Thus, to diagonalize a matrix is to establish its similarity to a diagonal matrix containing its eigenvalues.
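A quick numerical illustration of this theorem, sketched in Python with NumPy (the diagonal B and invertible P below are arbitrary choices, not from the notes):

```python
import numpy as np

# B and the similar matrix A = P^-1 B P share the characteristic
# polynomial, hence the same eigenvalues
B = np.diag([3.0, -1.0, 2.0])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # det(P) = 3, so P is invertible

A = np.linalg.inv(P) @ B @ P

print(np.sort(np.linalg.eigvals(A).real))  # ~ [-1.  2.  3.]
print(np.sort(np.linalg.eigvals(B).real))  # [-1.  2.  3.]
```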
2.2 Relation between the rank and the eigenvalues of a matrix

Recall that the rank of a matrix is the maximum number of linearly independent row or column vectors.
Property: for a diagonalizable matrix A, rank(A) = number of non-zero eigenvalues of A, counted with multiplicity.
Proof: A is diagonalizable if it is similar to a diagonal matrix D, i.e. D = P⁻¹AP. As we saw in section 1.1.1, two similar matrices have the same determinant, therefore:

D = P⁻¹AP → det(D) = det(A)

Since det(D) is the product of the eigenvalues, a matrix is singular, i.e. has det(A) = 0, if at least one of its eigenvalues is zero. As the rank of a diagonal matrix is the number of non-zero rows, and similar matrices have the same rank, the rank of A is the number of non-zero eigenvalues.
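A numerical illustration of this property, sketched in Python with NumPy (the diagonalizable matrix below is an arbitrary construction, not from the notes):

```python
import numpy as np

# Build a diagonalizable A with eigenvalues 2, 5 and 0:
# its rank equals the number of non-zero eigenvalues
D = np.diag([2.0, 5.0, 0.0])
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])   # det(P) = 1, so P is invertible
A = P @ D @ np.linalg.inv(P)

vals = np.linalg.eigvals(A)
nonzero = int(sum(abs(v) > 1e-9 for v in vals))
print(nonzero)                   # 2 non-zero eigenvalues
print(np.linalg.matrix_rank(A))  # rank 2
```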
2.3 Eigenvectors are linearly independent

Theorem: the eigenvectors of a matrix corresponding to different eigenvalues are linearly independent.
Proof: we prove this by contradiction, i.e. we assume the opposite and arrive at a contradiction. Consider the case of two non-zero eigenvectors of a 2 × 2 matrix A:
u1 ≠ 0, u2 ≠ 0, Au1 = λ1u1, Au2 = λ2u2
We assume that they are linearly dependent:
u1 = cu2
Now we apply the matrix A on both sides and use the fact that they are eigenvectors:

λ1u1 = cλ2u2 = λ2u1 → (λ1 − λ2)u1 = 0
If the eigenvalues are different, λ1 ≠ λ2, then u1 = 0, which is a contradiction, since we assumed that the eigenvectors are non-zero. Thus, if the eigenvalues are different, the eigenvectors are linearly independent.
For n eigenvectors: first, assume linear dependence:

u1 = Σ_{j=2}^{n} αj uj

Apply the matrix to both sides:

λ1u1 = Σ_{j=2}^{n} αj λj uj   and   λ1u1 = λ1 Σ_{j=2}^{n} αj uj → Σ_{j=2}^{n} (λ1 − λj) αj uj = 0

For different eigenvalues, λi ≠ λj for i ≠ j, all the coefficients αj must necessarily be zero, which contradicts the assumed linear dependence.
As a result, the eigenvectors of a matrix with n different non-zero eigenvalues form a basis of the vector space and diagonalize the matrix (see section 2.4).
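A quick numerical check of this independence, sketched in Python with NumPy, using the 2×2 example from section 1.2: the matrix of eigenvectors must have non-zero determinant.

```python
import numpy as np

# Eigenvalues -2 and 1 are distinct, so the eigenvectors
# returned by np.linalg.eig are linearly independent
A = np.array([[-3.0, 4.0],
              [-1.0, 2.0]])

vals, P = np.linalg.eig(A)   # columns of P are the eigenvectors
print(abs(np.linalg.det(P)) > 1e-9)  # True: columns form a basis
```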
2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors

Theorem:
A ∈ Mn×n(R) is diagonalizable ⇐⇒ A has n linearly independent eigenvec- tors.
Proof: we have to prove both directions.

1. A diagonalizable =⇒ n linearly independent eigenvectors.
2. n linearly independent eigenvectors =⇒ A diagonalizable.

Proof of 1: assume A is diagonalizable. Then, we know it must be similar to a diagonal matrix:
∃P ∈ Mn×n(R) | P⁻¹AP is diagonal

We can write:

P⁻¹AP = D = [ λ1 · · ·  0 ]     and     P = ( p1 | . . . | pn )
            [  ⋮   ⋱    ⋮ ]
            [  0  · · · λn ]

where P is defined in terms of its column vectors pi.
We multiply both sides of the equation by P from the left:

P⁻¹AP = D → AP = PD

AP = PD → A ( p1 | . . . | pn ) = ( p1 | . . . | pn ) [ λ1 · · ·  0 ]
                                                      [  ⋮   ⋱    ⋮ ]
                                                      [  0  · · · λn ]

This can be rewritten as:

( Ap1 | . . . | Apn ) = ( λ1p1 | . . . | λnpn ) → Api = λipi
This tells us that the column vectors pi of P are actually eigenvectors of A. Since the matrix A is diagonalizable, P must be invertible, so the column vectors (i.e. the eigenvectors) pi cannot be linearly dependent on each other, since otherwise det(P) = 0.
Proof of 2: assume that A has n linearly independent eigenvectors. That means:

∃pi, i = 1, . . . , n | Api = λipi    (1)

We define a matrix P by using the pi as column vectors: P = ( p1 | . . . | pn )
We define a diagonal matrix D whose diagonal values are these eigenvalues:

D = [ λ1 · · ·  0 ]
    [  ⋮   ⋱    ⋮ ]
    [  0  · · · λn ]

We can rewrite equation (1) in terms of the matrices P and D:
Api = λipi, i = 1, . . . , n → AP = PD → D = P⁻¹AP
Since the pi are all linearly independent, P⁻¹ exists. A is similar to a diagonal matrix; then, A is diagonalizable.
Q.E.D.
Conclusion: a matrix is diagonalizable if we can write:
A = PDP⁻¹
Where P is the matrix containing the eigenvectors as column vectors, P = ( p1 | . . . | pn ),
And D is the diagonal matrix containing the eigenvalues:

D = [ λ1 · · ·  0 ]
    [  ⋮   ⋱    ⋮ ]
    [  0  · · · λn ]
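This factorization can be checked numerically. A sketch in Python with NumPy, rebuilding the 2×2 example matrix from section 1.2 out of its eigenvectors and eigenvalues:

```python
import numpy as np

# Rebuild A from its eigen-decomposition A = P D P^-1
A = np.array([[-3.0, 4.0],
              [-1.0, 2.0]])

vals, P = np.linalg.eig(A)   # eigenvalues and eigenvector matrix
D = np.diag(vals)

A_rebuilt = P @ D @ np.linalg.inv(P)
print(np.allclose(A_rebuilt, A))  # True
```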
2.5 Eigenvectors of a symmetric matrix are orthogonal

In general, the eigenvectors of a matrix corresponding to different eigenvalues are linearly independent, and the matrix diagonalizes when there are enough eigenvectors to form a basis of the vector space on which the endomorphism acts (section 2.4). In general the eigenvectors are not orthogonal, so they do not form an orthogonal basis. However, for a symmetric matrix, eigenvectors corresponding to different eigenvalues are always orthogonal.
Theorem: if v1, . . . , vn are eigenvectors of a real symmetric matrix A and the corresponding eigenvalues are all different, then the eigenvectors corresponding to different eigenvalues are orthogonal to each other.
Proof: a symmetric matrix is defined by Aᵀ = A. We will consider eigenvectors of A and Aᵀ and prove that they are orthogonal for any pair with different eigenvalues. Let u be an eigenvector of Aᵀ and v an eigenvector of A. If the corresponding eigenvalues are different, then u and v must be orthogonal:
Aᵀu = λu u,   Av = λv v

⟨Aᵀu, v⟩ = ⟨λu u, v⟩ = λu ⟨u, v⟩
⟨Aᵀu, v⟩ = (Aᵀu)ᵀv = uᵀAv = λv uᵀv = λv ⟨u, v⟩

Subtracting both expressions: (λu − λv)⟨u, v⟩ = 0. If λu ≠ λv → ⟨u, v⟩ = 0.
As the matrix is symmetric, Aᵀ = A, so the result holds for any pair of eigenvectors of A with different eigenvalues.
Properties used in the proof: for u, v ∈ Rⁿ and B ∈ Mn×n(R),

⟨u, v⟩ = uᵀv

⟨u, Bv⟩ = uᵀBv = (Bᵀu)ᵀv = ⟨Bᵀu, v⟩
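A numerical check of this orthogonality, sketched in Python with NumPy; np.linalg.eigh is NumPy's routine for symmetric matrices (the matrix below is the one used in the worked example of section 3):

```python
import numpy as np

# A real symmetric matrix: eigenvectors for different
# eigenvalues are orthogonal
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # symmetric; eigenvalues -1 and 3

vals, P = np.linalg.eigh(A)  # eigh sorts eigenvalues in ascending order
print(vals)                  # approximately [-1, 3]
print(P[:, 0] @ P[:, 1])     # dot product ~ 0: orthogonal eigenvectors
```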
3 Matrix diagonalization process example
In this section we will perform the whole process to diagonalize a matrix with an example.
Example: consider the following matrix:

A = [ 1  2 ]
    [ 2  1 ]

Calculate its eigenvalues and eigenvectors and build the matrix P to transform it into a diagonal matrix through P⁻¹AP.
We write down the characteristic polynomial:
det(A − λI2) = det [ 1 − λ    2   ] = (1 − λ)² − 4 = λ² − 2λ − 3
                   [   2    1 − λ ]
det(A − λI2) = 0 → (λ − 3)(λ + 1) = 0 → λ = 3, −1

It has two solutions, i.e. two different eigenvalues. We know that two different eigenvalues give two linearly independent eigenvectors (section 2.3). Hence, at this point we already know that the matrix diagonalizes. We now calculate the eigenvectors:
λ = 3 → [ 1  2 ] [ x ] = 3 [ x ] → x + 2y = 3x → x = y → eigenvectors of 3: ( x, x )
        [ 2  1 ] [ y ]     [ y ]   2x + y = 3y
λ = −1 → [ 1  2 ] [ x ] = − [ x ] → x + 2y = −x → y = −x → eigenvectors of −1: ( x, −x )
         [ 2  1 ] [ y ]     [ y ]   2x + y = −y
We can also calculate the eigenvectors through the eigenspaces, i.e. the sets of solutions of Au = λu:

E(3) = Ker(A − 3I2) = { ( a, a ), a ∈ R } ⊂ R²

E(−1) = Ker(A + I2) = { ( b, −b ), b ∈ R } ⊂ R²

We choose two particular eigenvectors, one from each space:
( 1, 1 ) ∈ E(3),   ( 1, −1 ) ∈ E(−1)

We build the matrix P from these vectors:

P = [ 1   1 ]
    [ 1  −1 ]
Now we need to calculate P⁻¹ and check that P⁻¹AP is a diagonal matrix with the eigenvalues in the diagonal.
P⁻¹ = (1/det(P)) Adj(P) = (1/det(P)) Cᵀ_P = −(1/2) [ −1  −1 ]
                                                   [ −1   1 ]

We confirm that it is the inverse:

P⁻¹P = −(1/2) [ −1  −1 ] [ 1   1 ] = [ 1  0 ]
              [ −1   1 ] [ 1  −1 ]   [ 0  1 ]

Now we confirm that A is similar to a diagonal matrix through P, and that this diagonal matrix contains the eigenvalues in its diagonal:
P⁻¹AP = −(1/2) [ −1  −1 ] [ 1  2 ] [ 1   1 ] = [ 3   0 ]
               [ −1   1 ] [ 2  1 ] [ 1  −1 ]   [ 0  −1 ]

It is important to note that this works with a matrix P built from any choice of eigenvectors. In addition, the order of the eigenvalues in the diagonal matrix follows the order of the eigenvectors chosen as columns.
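The same check can be done numerically. A sketch in Python with NumPy of the worked example:

```python
import numpy as np

# Numerical check of the worked example: P^-1 A P = diag(3, -1)
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 6))  # diagonal matrix with 3 and -1
```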
4 Exercises

Ex. 1 — Consider the matrix A = [ 3  2 ]
                                [ 0  1 ]

1. Calculate its eigenvalues and eigenvectors
2. Calculate P such that P⁻¹AP is diagonal
Ex. 2 — Consider the matrix A = [  1  2   1 ]
                                [  2  0  −2 ]
                                [ −1  2   3 ]

1. Calculate its eigenvalues and eigenvectors
2. Calculate P such that P⁻¹AP is diagonal
Ex. 3 — Consider the following linear map between polynomials of degree ≤ 1:

f : P1[x] → P1[x], where a + bx ↦ (a + b) + (a + b)x

1. Calculate the associated matrix A
2. Calculate the eigenvalues and eigenvectors associated to this linear map
3. What is the matrix P such that P⁻¹AP is diagonal?
Ex. 4 — Consider the matrix A = [ 2   1  −1 ]
                                [ 0  −1   2 ]
                                [ 0   0  −1 ]

Show that A is not diagonalizable. Hint: you can use the theorem that says that a square matrix of size n is diagonalizable if and only if it has n linearly independent eigenvectors (having n different eigenvalues is a sufficient condition).
Ex. 5 — Consider the matrix A = [ 5  1  3 ]
                                [ 0  7  6 ]
                                [ 0  1  8 ]

Calculate an orthonormal (orthogonal and unit length) basis for each of its eigenspaces.
5 R practical
5.1 Eigenvalues and Eigenvectors

Having built a matrix, R has the function eigen() that calculates the eigenvalues and the eigenvectors of a matrix.

#Introduce the matrix
> m <- matrix(c(-3, -1, 4, 2), 2, 2)
> m
     [,1] [,2]
[1,]   -3    4
[2,]   -1    2
#Compute the eigenvalues and eigenvectors
> ev <- eigen(m)
> evalues <- ev$values
> evalues
[1] -2  1
#Eigenvalues are always returned in
#decreasing order of the absolute value
> evectors <- ev$vectors
> evectors
           [,1]       [,2]
[1,] -0.9701425 -0.7071068
[2,] -0.2425356 -0.7071068
#Returns the eigenvectors as columns
#The eigenvectors are unit length
#Their order corresponds to the
#order of the eigenvalues

Notice that evectors is a valid matrix P, so by computing the inverse of that matrix you can check the diagonalization.

> library('matlib')
> p <- evectors
> pi <- inv(p)
> pi %*% m %*% p
              [,1]          [,2]
[1,] -2.000000e+00 -2.220446e-16
[2,] -1.110223e-16  1.000000e+00
#The diagonal coincides with the eigenvalues
#The other elements are "0" up to numerical precision

Try to test the theorems and properties using R (you may need to use commands from previous units).