
Chapter 7: Symmetric Matrices and Quadratic Forms

(Last Updated: December 5, 2016)

These notes are derived primarily from *Linear Algebra and Its Applications* by David Lay (4th ed.). A few theorems have been moved around.

1. Diagonalization of Symmetric Matrices

We have seen already that it can be quite time intensive to determine whether a matrix is diagonalizable. We'll see that there are certain classes of matrices that are always diagonalizable.

Definition 1. A matrix $A$ is symmetric if $A^T = A$.

 3 −2 4   Example 1. Let A = −2 6 2. 4 2 3

Note that $A^T = A$, so $A$ is symmetric. The characteristic polynomial of $A$ is $\chi_A(t) = (t+2)(t-7)^2$, so the eigenvalues are $-2$ and $7$. The corresponding eigenspaces have bases
$$\lambda = -2:\ \left\{ \begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix} \right\}, \qquad \lambda = 7:\ \left\{ \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}.$$

Hence, $A$ is diagonalizable. Now we use Gram-Schmidt to find an orthonormal basis for $\mathbb{R}^3$. Note that the eigenvector for $\lambda = -2$ is already orthogonal to both eigenvectors for $\lambda = 7$, so we only need to apply Gram-Schmidt to the $\lambda = 7$ eigenvectors:
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1/2 \\ 2 \\ 1/2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix}.$$
Finally, we normalize each vector:
$$u_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1/(3\sqrt{2}) \\ 2\sqrt{2}/3 \\ 1/(3\sqrt{2}) \end{bmatrix}, \quad u_3 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}.$$

Now the matrix $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ is orthogonal, and so $U^T U = I$.
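As a quick numerical check (a NumPy sketch, not part of the original notes; the variable names are ours), we can verify that the columns of $U$ orthogonally diagonalize $A$:

```python
import numpy as np

# Matrix from Example 1 and the orthonormal eigenvectors found above.
A = np.array([[3., -2., 4.],
              [-2., 6., 2.],
              [4., 2., 3.]])
s = np.sqrt(2.0)
U = np.column_stack([
    [1/s, 0, 1/s],               # u1 (eigenvalue 7)
    [-1/(3*s), 2*s/3, 1/(3*s)],  # u2 (eigenvalue 7)
    [-2/3, -1/3, 2/3],           # u3 (eigenvalue -2)
])

print(np.allclose(U.T @ U, np.eye(3)))  # True: U is orthogonal
print(np.round(U.T @ A @ U, 10))        # diag(7, 7, -2)
```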

Theorem 2. If A is symmetric, then any two eigenvectors from different eigenspaces are orthogonal.

Proof. Let $v_1, v_2$ be eigenvectors for $A$ with corresponding eigenvalues $\lambda_1, \lambda_2$, where $\lambda_1 \neq \lambda_2$. Then

$$\lambda_1(v_1 \cdot v_2) = (\lambda_1 v_1)^T v_2 = (Av_1)^T v_2 = v_1^T A^T v_2 = v_1^T A v_2 = v_1^T(\lambda_2 v_2) = \lambda_2(v_1 \cdot v_2).$$

Hence, $(\lambda_1 - \lambda_2)(v_1 \cdot v_2) = 0$. Since $\lambda_1 \neq \lambda_2$, we must have $v_1 \cdot v_2 = 0$. $\square$

Based on the previous theorem, we say that the eigenspaces of $A$ are mutually orthogonal.

Definition 2. An $n \times n$ matrix $A$ is orthogonally diagonalizable if there exists an orthogonal $n \times n$ matrix $P$ and a diagonal matrix $D$ such that $A = PDP^T$.

Theorem 3. If A is orthogonally diagonalizable, then A is symmetric.

Proof. Since $A$ is orthogonally diagonalizable, $A = PDP^T$ for some orthogonal matrix $P$ and diagonal matrix $D$. Then $A$ is symmetric because $A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A$. $\square$

It turns out the converse of the above theorem is also true! The set of eigenvalues of a matrix A is called the spectrum of A and is denoted σA.

Theorem 4 (The Spectral Theorem for symmetric matrices). Let A be a (real) symmetric n × n matrix. Then the following hold.

(1) A has n real eigenvalues, counting multiplicities.

(2) For each eigenvalue $\lambda$ of $A$, $\operatorname{geomult}_\lambda(A) = \operatorname{algmult}_\lambda(A)$.

(3) The eigenspaces are mutually orthogonal.

(4) $A$ is orthogonally diagonalizable.

Proof. We proved in HW9, Exercise 6 that every eigenvalue of a symmetric matrix is real. The second part of (1), as well as (2), is an immediate consequence of (4). We proved (3) in Theorem 2. Note that (4) is trivial when $A$ has $n$ distinct eigenvalues, by (3).

We prove (4) by induction on $n$. Clearly the result holds when $A$ is $1 \times 1$. Assume that every $(n-1) \times (n-1)$ symmetric matrix is orthogonally diagonalizable.

Let $A$ be $n \times n$, let $\lambda_1$ be an eigenvalue of $A$, and let $u_1$ be a (unit) eigenvector for $\lambda_1$. By the Gram-Schmidt process, we may extend $u_1$ to an orthonormal basis $\{u_1, \ldots, u_n\}$ for $\mathbb{R}^n$, where $\{u_2, \ldots, u_n\}$ is a basis for $W^\perp$ with $W = \operatorname{Span}\{u_1\}$. Set $U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$. Then

$$U^T A U = \begin{bmatrix} u_1^T A u_1 & \cdots & u_1^T A u_n \\ \vdots & \ddots & \vdots \\ u_n^T A u_1 & \cdots & u_n^T A u_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix}.$$

The first column is as indicated because $u_i^T A u_1 = u_i^T(\lambda_1 u_1) = \lambda_1(u_i \cdot u_1) = \lambda_1 \delta_{i1}$. As $U^T A U$ is symmetric, $* = 0$ and $B$ is a symmetric $(n-1) \times (n-1)$ matrix that is orthogonally diagonalizable with eigenvalues $\lambda_2, \ldots, \lambda_n$ (by the inductive hypothesis). Because $A$ and $U^T A U$ are similar, the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$. Since $B$ is orthogonally diagonalizable, there exists an orthogonal matrix $Q$ such that $Q^T B Q = D$, where the diagonal entries of $D$ are $\lambda_2, \ldots, \lambda_n$. Now
$$\begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}^T \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & Q^T B Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}.$$

" # " # 1 0 1 0 Note that is orthogonal. Set V = U . As the product of orthogonal matrices is 0 Q 0 Q T orthogonal, V is itself orthogonal and V AV is diagonal. 

Suppose $A$ is orthogonally diagonalizable, so $A = UDU^T$ where $U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}$ and $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $A$, $\lambda_1, \ldots, \lambda_n$. Then

$$A = UDU^T = \lambda_1 u_1 u_1^T + \cdots + \lambda_n u_n u_n^T.$$

This is known as the spectral decomposition of $A$. Each $u_i u_i^T$ is called a projection matrix because $(u_i u_i^T)x$ is the projection of $x$ onto $\operatorname{Span}\{u_i\}$.
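The spectral decomposition is easy to reproduce numerically. The following sketch (NumPy; the helper name `spectral_decomposition` is ours, not from the notes) rebuilds a symmetric matrix from its eigenvalues and rank-one projections:

```python
import numpy as np

def spectral_decomposition(A):
    """Return eigenvalues and projection matrices u_i u_i^T for a real
    symmetric matrix A, so that A = sum_i lam_i * P_i."""
    lam, U = np.linalg.eigh(A)  # eigh gives orthonormal eigenvectors for symmetric A
    projections = [np.outer(U[:, i], U[:, i]) for i in range(len(lam))]
    return lam, projections

A = np.array([[3., -2., 4.],
              [-2., 6., 2.],
              [4., 2., 3.]])
lam, Ps = spectral_decomposition(A)
A_rebuilt = sum(l * P for l, P in zip(lam, Ps))
print(np.allclose(A, A_rebuilt))  # True
```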

Example 5. Construct a spectral decomposition of the matrix $A$ in Example 1.
Recall that $A = \begin{bmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}$ and our orthonormal basis of eigenvectors was
$$u_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1/(3\sqrt{2}) \\ 2\sqrt{2}/3 \\ 1/(3\sqrt{2}) \end{bmatrix}, \quad u_3 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}.$$

Setting $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ gives $U^T A U = D = \operatorname{diag}(7, 7, -2)$. The projection matrices are

 1/2 0 1/2   1/18 −2/9 −1/18   4/9 2/9 −4/9        u uT =   , u uT =  8  , u uT =   . 1 1  0 0 0  2 2  −2/9 9 2/9  3 3  2/9 1/9 −2/9        1/2 0 1/2 −1/18 2/9 1/18 −4/9 −2/9 4/9 The spectral decomposition is

$$7u_1u_1^T + 7u_2u_2^T - 2u_3u_3^T = A.$$

2. Quadratic Forms

Definition 3. A quadratic form is a function $Q$ on $\mathbb{R}^n$ given by $Q(x) = x^T A x$, where $A$ is an $n \times n$ symmetric matrix, called the matrix of the quadratic form.

Example 6. The function $x \mapsto \|x\|^2$ is a quadratic form, given by setting $A = I$.

Quadratic forms appear in differential geometry, physics, economics, and statistics.

Example 7. Let $A = \begin{bmatrix} 5 & -1 \\ -1 & 2 \end{bmatrix}$ and $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. The corresponding quadratic form is

$$Q(x) = x^T A x = 5x_1^2 - 2x_1x_2 + 2x_2^2.$$

Example 8. Find the matrix of the quadratic form $Q(x) = 8x_1^2 + 7x_2^2 - 3x_3^2 - 6x_1x_2 + 4x_1x_3 - 2x_2x_3$.
By inspection we see that
$$A = \begin{bmatrix} 8 & -3 & 2 \\ -3 & 7 & -1 \\ 2 & -1 & -3 \end{bmatrix}.$$
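The rule behind this inspection is that the coefficient of $x_i^2$ goes on the diagonal, while the coefficient of $x_ix_j$ is split evenly between the $(i,j)$ and $(j,i)$ entries. A small sketch (NumPy; the names `Q` and `x` are ours) checking Example 8:

```python
import numpy as np

# Matrix of Q(x) = 8x1^2 + 7x2^2 - 3x3^2 - 6x1x2 + 4x1x3 - 2x2x3 (Example 8).
A = np.array([[ 8., -3.,  2.],
              [-3.,  7., -1.],
              [ 2., -1., -3.]])

def Q(x):
    # Direct evaluation from the polynomial, for comparison with x^T A x.
    x1, x2, x3 = x
    return 8*x1**2 + 7*x2**2 - 3*x3**2 - 6*x1*x2 + 4*x1*x3 - 2*x2*x3

x = np.array([1.0, -2.0, 3.0])
print(Q(x), x @ A @ x)  # both print 45.0
```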

Theorem 9 (Principal Axes Theorem). Let $A$ be an $n \times n$ symmetric matrix. Then there is an orthogonal change of variable $x = Py$ that transforms the quadratic form $x^T A x$ into a quadratic form $y^T D y$ with $D$ diagonal.

Proof. By the Spectral Theorem, there exists an orthogonal matrix $P$ such that $P^T A P = D$ with $D$ diagonal. For all $x \in \mathbb{R}^n$, set $y = P^T x$. Then $x = Py$ and

$$x^T A x = (Py)^T A (Py) = y^T(P^T A P)y = y^T D y. \qquad \square$$

Example 10. Let $Q(x) = 3x_1^2 - 4x_1x_2 + 6x_2^2$. Make a change of variable that transforms $Q$ into a quadratic form with no cross-product terms.
We have $A = \begin{bmatrix} 3 & -2 \\ -2 & 6 \end{bmatrix}$ with eigenvalues $7$ and $2$, and eigenspace bases $\left\{ \begin{bmatrix} -1 \\ 2 \end{bmatrix} \right\}$ and $\left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$, respectively. We normalize each to determine our diagonalizing matrix $P$, so that $P^T A P = D$ where
$$P = \frac{1}{\sqrt{5}}\begin{bmatrix} -1 & 2 \\ 2 & 1 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 7 & 0 \\ 0 & 2 \end{bmatrix}.$$

Our change of variable is $x = Py$ and the new form is $Q'(y) = y^T D y = 7y_1^2 + 2y_2^2$.
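The change of variable in Example 10 can be checked numerically. This sketch (NumPy; the eigenvector ordering and signs produced by the library may differ from the $P$ chosen above) diagonalizes the form and confirms the cross term disappears:

```python
import numpy as np

# Q(x) = 3x1^2 - 4x1x2 + 6x2^2 from Example 10.
A = np.array([[3., -2.],
              [-2., 6.]])
lam, P = np.linalg.eigh(A)  # columns of P are orthonormal eigenvectors
D = P.T @ A @ P             # diagonal matrix of the form in the new variables
print(np.round(D, 10))      # diag(2, 7), up to the ordering of the eigenvalues

# For any x, writing y = P^T x gives x^T A x = y^T D y = sum(lam_i * y_i^2).
x = np.array([1.0, 2.0])
y = P.T @ x
print(x @ A @ x, np.sum(lam * y**2))  # both print 19.0
```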

If $A$ is a symmetric $n \times n$ matrix, the quadratic form $Q(x) = x^T A x$ is a real-valued function with domain $\mathbb{R}^n$. If $n = 2$, then the graph of $Q$ is the set of points $(x_1, x_2, z)$ with $z = Q(x)$. For example, if $Q(x) = 3x_1^2 + 7x_2^2$, then $Q(x) > 0$ for all $x \neq 0$. Such a form is called positive definite, and it is possible to determine this property from the eigenvalues of $A$.

Definition 4. A quadratic form $Q$ is

(1) positive definite if $Q(x) > 0$ for all $x \neq 0$.

(2) negative definite if $Q(x) < 0$ for all $x \neq 0$.

(3) indefinite if $Q(x)$ assumes both positive and negative values.

Theorem 11. Let $A$ be an $n \times n$ symmetric matrix. Then $Q(x) = x^T A x$ is

(1) positive definite if and only if the eigenvalues of $A$ are all positive.

(2) negative definite if and only if the eigenvalues of $A$ are all negative.

(3) indefinite otherwise.

Proof. By the Principal Axes Theorem, there exists an orthogonal matrix P such that x = P y and

$$Q(x) = x^T A x = y^T D y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2,$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Since $P$ is invertible, there is a one-to-one correspondence between nonzero $x$ and nonzero $y$. For $y \neq 0$, the sign of the value above is clearly determined by the signs of the $\lambda_i$; hence so is the sign of $Q(x)$ for $x \neq 0$. $\square$
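Theorem 11 gives a simple computational test. A small sketch (NumPy; the helper name `classify_form` is ours, and forms with zero eigenvalues are reported separately since the theorem's trichotomy does not address the semidefinite case):

```python
import numpy as np

def classify_form(A, tol=1e-12):
    """Classify the quadratic form x^T A x by the signs of the eigenvalues of
    the symmetric matrix A (Theorem 11)."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "indefinite"
    return "semidefinite (some eigenvalues are zero)"

print(classify_form(np.array([[3., 0.], [0., 7.]])))    # positive definite
print(classify_form(np.array([[3., -2.], [-2., -6.]]))) # indefinite
```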

We can also determine the maximum and minimum of a quadratic form when it is evaluated on unit vectors. This is known as constrained optimization.

Theorem 12. Let A be a symmetric matrix. Set

$$m = \min\{x^T A x : \|x\| = 1\} \quad \text{and} \quad M = \max\{x^T A x : \|x\| = 1\}.$$

Then $M$ is the greatest eigenvalue of $A$ and $m$ is the least eigenvalue of $A$. Moreover, $x^T A x = M$ (resp. $m$) when $x$ is a unit eigenvector corresponding to $M$ (resp. $m$).
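Theorem 12 turns constrained optimization over the unit sphere into an eigenvalue computation. The sketch below (NumPy; the random sampling is our own illustration, and the matrix is the one that appears in Example 13 below) compares the eigenvalue bounds with values of the form on random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[ 7., -4., -2.],
              [-4.,  1., -4.],
              [-2., -4.,  7.]])  # symmetric matrix of Example 13 below

lam, V = np.linalg.eigh(A)
m, M = lam[0], lam[-1]           # least and greatest eigenvalues

# Sample unit vectors; every value of x^T A x lies between m and M.
xs = rng.normal(size=(1000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', xs, A, xs)
print(m, vals.min(), vals.max(), M)  # m <= min <= max <= M
```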

Example 13. Let $Q(x) = 7x_1^2 + x_2^2 + 7x_3^2 - 8x_1x_2 - 4x_1x_3 - 8x_2x_3$. Find a vector $x$ such that $Q(x)$ is maximized (minimized) subject to $x^T x = 1$.
The matrix of $Q$ is $A = \begin{bmatrix} 7 & -4 & -2 \\ -4 & 1 & -4 \\ -2 & -4 & 7 \end{bmatrix}$ and the eigenvalues are $9$ and $-3$. Hence, the maximum (resp. minimum) value of $Q$ subject to the constraint is $9$ (resp. $-3$).
An eigenvector for $9$ is $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$. Hence, setting $u = \begin{bmatrix} -1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}$ gives $Q(u) = 9$.
An eigenvector for $-3$ is $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$. Hence, setting $v = \begin{bmatrix} 1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}$ gives $Q(v) = -3$.

4. Singular Value Decomposition

The key question in this section is whether it is possible to diagonalize a non-square matrix.

Example 14. Let $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$. The linear transformation with matrix $A$ maps the unit sphere $\{x \in \mathbb{R}^3 : \|x\| = 1\}$ onto an ellipse in $\mathbb{R}^2$. Find a unit vector $x$ at which the length $\|Ax\|$ is maximized and compute this maximum length.

The key observation here is that $\|Ax\|$ is maximized at the same $x$ that maximizes $\|Ax\|^2$, and

$$\|Ax\|^2 = (Ax)^T(Ax) = x^T(A^T A)x.$$

Thus, we want to maximize the quadratic form $Q(x) = x^T(A^T A)x$ subject to the constraint $\|x\| = 1$. The eigenvalues of
$$A^T A = \begin{bmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{bmatrix}$$
are $\lambda_1 = 360$, $\lambda_2 = 90$, $\lambda_3 = 0$ with corresponding (unit) eigenvectors
$$v_1 = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}.$$

The maximum value is $360$, obtained when $x = v_1$. That is, the vector $Av_1$ corresponds to the point on the ellipse furthest from the origin. Then
$$Av_1 = \begin{bmatrix} 18 \\ 6 \end{bmatrix} \quad \text{and} \quad \|Av_1\| = \sqrt{360} = 6\sqrt{10}.$$
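The computation in Example 14 is easy to confirm numerically (a NumPy sketch, not part of the original notes):

```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8., 7., -2.]])

lam, V = np.linalg.eigh(A.T @ A)  # eigenvalues of A^T A in ascending order
v1 = V[:, -1]                     # unit eigenvector for the largest eigenvalue (360)
print(np.round(lam, 6))           # [0, 90, 360] up to rounding
print(np.linalg.norm(A @ v1)**2)  # 360, the maximum of ||Ax||^2 on the unit sphere
print(np.linalg.norm(A @ v1))     # 6*sqrt(10) ≈ 18.97
```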

The trick we utilized above is a handy one. That is, even though $A$ is not symmetric (it wasn't even square), $A^T A$ is symmetric and we can extract information about $A$ from $A^T A$.

Let $A$ be an $m \times n$ matrix. Then $A^T A$ is symmetric and so can be orthogonally diagonalized. Let $\{v_1, \ldots, v_n\}$ be an orthonormal $A^T A$-eigenbasis for $\mathbb{R}^n$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. For $1 \leq i \leq n$,
$$0 \leq \|Av_i\|^2 = (Av_i)^T(Av_i) = v_i^T(A^T A)v_i = v_i^T(\lambda_i v_i) = \lambda_i v_i^T v_i = \lambda_i.$$
Hence, $\lambda_i \geq 0$ for all $i$. Arrange the eigenvalues such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$ and define $\sigma_i = \sqrt{\lambda_i}$.

That is, the $\sigma_i$ represent the lengths of the vectors $Av_i$. These are called the singular values of $A$.

Example 15. In Example 14, the singular values are
$$\sigma_1 = \sqrt{360} = 6\sqrt{10}, \quad \sigma_2 = \sqrt{90} = 3\sqrt{10}, \quad \sigma_3 = 0.$$

Theorem 16. Let $A$ be an $m \times n$ matrix. Suppose $\{v_1, \ldots, v_n\}$ is an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^T A$ with corresponding eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$. Suppose

$A$ has $r$ nonzero singular values. Then $\{Av_1, \ldots, Av_r\}$ is an orthogonal basis for $\operatorname{Col}(A)$ and $\operatorname{rank} A = r$.

Proof. Suppose $i \neq j$. Then

$$(Av_i) \cdot (Av_j) = (Av_i)^T(Av_j) = v_i^T(A^T A)v_j = \lambda_j(v_i^T v_j) = 0.$$

Hence, $\{Av_1, \ldots, Av_r\}$ is an orthogonal set of nonzero vectors and therefore linearly independent. It is left to show that this set spans $\operatorname{Col}(A)$. Since there are exactly $r$ nonzero singular values, $Av_i \neq 0$ if and only if $1 \leq i \leq r$. Let $y \in \operatorname{Col}(A)$, so $y = Ax$ for some $x \in \mathbb{R}^n$. Then $x = c_1 v_1 + \cdots + c_n v_n$ for some scalars $c_i \in \mathbb{R}$, and so

$$y = Ax = c_1 Av_1 + \cdots + c_r Av_r + c_{r+1}Av_{r+1} + \cdots + c_n Av_n = c_1 Av_1 + \cdots + c_r Av_r + 0 + \cdots + 0.$$

Thus, $y \in \operatorname{Span}\{Av_1, \ldots, Av_r\}$, and so $\{Av_1, \ldots, Av_r\}$ is an orthogonal basis for $\operatorname{Col}(A)$ and $\operatorname{rank} A = \dim \operatorname{Col}(A) = r$. $\square$
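Theorem 16 can also be observed numerically. In this sketch (NumPy, our own illustration using the matrix of Example 14), the images $Av_i$ are mutually orthogonal and the number of nonzero singular values equals the rank:

```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8., 7., -2.]])
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]      # decreasing eigenvalues
lam, V = lam[order], V[:, order]

AV = A @ V                          # columns are A v_1, ..., A v_n
print(np.round(AV.T @ AV, 6))       # diagonal matrix: the columns are orthogonal
r = int(np.sum(lam > 1e-10))
print(r, np.linalg.matrix_rank(A))  # both equal 2
```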

Theorem 17 (Singular Value Decomposition). Let $A$ be an $m \times n$ matrix with rank $r$. Then there exists an $m \times n$ matrix $\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$, where $D$ is an $r \times r$ diagonal matrix whose diagonal entries are the first $r$ singular values of $A$, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, and there exist an $m \times m$ orthogonal matrix $U$ and an $n \times n$ orthogonal matrix $V$ such that $A = U\Sigma V^T$.

Proof. Let $\{v_1, \ldots, v_n\}$ be an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^T A$, ordered as in Theorem 16. By Theorem 16, $\{Av_1, \ldots, Av_r\}$ is an orthogonal basis of $\operatorname{Col}(A)$. For each $i$ with $1 \leq i \leq r$, set $u_i = Av_i/\|Av_i\| = \frac{1}{\sigma_i}Av_i$, so that $Av_i = \sigma_i u_i$. Extend $\{u_1, \ldots, u_r\}$ to an orthonormal basis $\{u_1, \ldots, u_m\}$ of $\mathbb{R}^m$. Let $U = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$. Then both $U$ and $V$ are orthogonal and
$$AV = \begin{bmatrix} Av_1 & \cdots & Av_r & 0 & \cdots & 0 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \cdots & \sigma_r u_r & 0 & \cdots & 0 \end{bmatrix} = U\Sigma.$$

Since $V$ is orthogonal, $A = AVV^T = U\Sigma V^T$. $\square$

The columns of U in the preceding theorem are called the left singular vectors of A and the columns of V are the right singular vectors of A.

We summarize the method for computing an SVD below. Let $A$ be an $m \times n$ matrix of rank $r$. Then $A = U\Sigma V^T$ where $U$, $\Sigma$, and $V$ are as below.

(1) Find an orthonormal basis of eigenvectors of $A^T A$, $\{v_1, \ldots, v_n\}$.

(2) Arrange the eigenvalues of $A^T A$ in decreasing order. The matrix $V$ is $\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ with the columns in this order.

(3) The matrix $\Sigma$ is obtained by placing the $r$ nonzero singular values along the diagonal in decreasing order (and zeros elsewhere).

(4) Set $u_i = \dfrac{Av_i}{\|Av_i\|}$ for $i = 1, \ldots, r$. Extend the orthogonal set $\{u_1, \ldots, u_r\}$ to an orthonormal basis $\{u_1, \ldots, u_m\}$ of $\mathbb{R}^m$ by adding vectors not in the span and applying Gram-Schmidt. The matrix $U$ is $\begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}$.

Example 18. Consider $A$ as in Example 14. We have already that
$$V = \begin{bmatrix} 1/3 & -2/3 & 2/3 \\ 2/3 & -1/3 & -2/3 \\ 2/3 & 2/3 & 1/3 \end{bmatrix}.$$
The nonzero singular values are $\sigma_1 = \sqrt{360} = 6\sqrt{10}$ and $\sigma_2 = 3\sqrt{10}$, so
$$D = \begin{bmatrix} 6\sqrt{10} & 0 \\ 0 & 3\sqrt{10} \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} D & 0 \end{bmatrix} = \begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix}.$$
Now
$$u_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix} 18 \\ 6 \end{bmatrix} = \begin{bmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{bmatrix} \quad \text{and} \quad u_2 = \frac{1}{\sigma_2}Av_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix} 3 \\ -9 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{10} \\ -3/\sqrt{10} \end{bmatrix}.$$

2 h i T Note that {u1, u2} is already a basis for R and so U = u1 u2 . Now we check that A = UΣV .

Example 19. Construct the Singular Value Decomposition of $A = \begin{bmatrix} 7 & 1 \\ 0 & 0 \\ 5 & 5 \end{bmatrix}$.
We compute $A^T A = \begin{bmatrix} 74 & 32 \\ 32 & 26 \end{bmatrix}$. The eigenvalues of $A^T A$ are $\lambda_1 = 90$ and $\lambda_2 = 10$ (note that $\lambda_1 \geq \lambda_2 > 0$). The corresponding eigenvectors are $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$, respectively. Normalizing gives
$$v_1 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}, \quad \text{so} \quad V = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}.$$
The singular values are now $\sigma_1 = \sqrt{90} = 3\sqrt{10}$ and $\sigma_2 = \sqrt{10}$. Hence,
$$D = \begin{bmatrix} 3\sqrt{10} & 0 \\ 0 & \sqrt{10} \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} D \\ 0 \end{bmatrix} = \begin{bmatrix} 3\sqrt{10} & 0 \\ 0 & \sqrt{10} \\ 0 & 0 \end{bmatrix}.$$
$A$ has rank $2$ and
$$u_1 = \frac{1}{\sigma_1}Av_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix} \quad \text{and} \quad u_2 = \frac{1}{\sigma_2}Av_2 = \begin{bmatrix} -1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}.$$

Choose $u_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ so that $\{u_1, u_2, u_3\}$ is an orthonormal basis of $\mathbb{R}^3$. Then $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ and $A = U\Sigma V^T$.

Example 20. Construct the Singular Value Decomposition of $A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{bmatrix}$.
We compute $A^T A = \begin{bmatrix} 9 & -9 \\ -9 & 9 \end{bmatrix}$. The eigenvalues of $A^T A$ are $\lambda_1 = 18$ and $\lambda_2 = 0$ with normalized eigenvectors
$$v_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad \text{so} \quad V = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.$$
The singular values are now $\sigma_1 = \sqrt{18} = 3\sqrt{2}$ and $\sigma_2 = 0$. Hence,
$$\Sigma = \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad u_1 = \frac{1}{\sigma_1}Av_1 = \begin{bmatrix} 1/3 \\ -2/3 \\ 2/3 \end{bmatrix}.$$

To find $u_2, u_3$ such that $\{u_1, u_2, u_3\}$ is an orthonormal basis of $\mathbb{R}^3$, we find a basis for $\operatorname{Nul}(u_1^T)$. Set $u_1^T x = 0$. Then we have
$$\frac{1}{3}x_1 - \frac{2}{3}x_2 + \frac{2}{3}x_3 = 0,$$
which implies $x_1 - 2x_2 + 2x_3 = 0$. Hence, the parametric solution is given by
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_2 - 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}x_2 + \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}x_3.$$

2 −2 Set w = 1 and w =  0 . By construction, 1w and w are orthogonal to u but not to each 2   3   ¯ 2 3 1 0 1 other. We apply Gram-Schmidt. Set ue21 = w2. Then       −2   2 −2/5 ue2 · w3 4 u3 = w3 − proj u2 = w3 − u2 =  0  − − 1 =  4/5  . e Span{ue2} e e       ue2 · ue2 5 1 0 1 Normalizing gives, √ √ 2/ 5 √ −2/5 −2/3 5 √ 5 √ u2 = 1/ 5 , u3 =  4/5  =  4/3 5  .   3    √  0 1 5/3

3 h i T Now {u1, u2, u3} is an orthonormal basis of R . Then U = u1 u2 u3 and A = UΣV .