
Chapter 5 Symmetric and Hermitian Matrices

In this chapter, we discuss the special classes of symmetric and Hermitian matrices. We will conclude the chapter with a few words about so-called normal matrices. Before we begin, we mention one consequence of the last chapter that will be useful in the proof of the unitary diagonalization of Hermitian matrices. Let $A$ be an $m \times n$ matrix with $m \ge n$, and assume (for the moment) that $A$ has linearly independent columns. Then if the Gram-Schmidt process is applied to the columns of $A$, the result can be expressed in terms of a matrix factorization $A = \tilde{Q}\tilde{R}$, where the orthogonal vectors are the columns of $\tilde{Q}$, and $\tilde{R}$ is unit upper triangular $n \times n$ with the inner products as its entries.

For example, recall that $q_i := a_i - \sum_{k=1}^{i-1} \frac{q_k^H a_i}{q_k^H q_k}\, q_k$. Rearranging the equation,
\[ a_i = \sum_{k=1}^{i-1} \frac{q_k^H a_i}{q_k^H q_k}\, q_k + q_i + \sum_{k=i+1}^{n} 0 \cdot q_k, \]
where the right hand side is a linear combination of the first $i$ (only!) columns of the matrix $\tilde{Q}$. So this equation can be rewritten

\[ a_i = \tilde{Q} \begin{bmatrix} \frac{q_1^H a_i}{q_1^H q_1} \\ \frac{q_2^H a_i}{q_2^H q_2} \\ \vdots \\ \frac{q_{i-1}^H a_i}{q_{i-1}^H q_{i-1}} \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \]
Putting such equations together for $i = 1, \ldots, n$, we arrive at the desired result, $A = \tilde{Q}\tilde{R}$.



Now, we really would prefer that $\tilde{Q}$ have orthonormal columns. That means scaling each column of $\tilde{Q}$ by one over its length in the 2-norm. This corresponds to postmultiplication of $\tilde{Q}$ by a diagonal matrix $\tilde{D}$ whose entries are $1/\|q_i\|_2$. Thus, $A = \tilde{Q}\tilde{R} = \tilde{Q}\tilde{D}\tilde{D}^{-1}\tilde{R} = QR$, where now $Q$ has orthonormal columns and $R$ is still upper triangular, but its rows have been scaled by the entries of $\tilde{D}^{-1}$. The factorization $A = QR$ is called the QR factorization of $A$. Although we assumed in the preceding, for ease of discussion, that $A$ had linearly independent columns, in fact such a factorization exists for any matrix $A$; the fine details are omitted.
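For concreteness, here is a minimal numerical sketch of the normalized Gram-Schmidt construction just described, written in Python with NumPy (the library choice and the test matrix are our own assumptions, not part of the notes; we use the "modified" ordering, which behaves better in floating point):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Sketch of A = QR via (modified) Gram-Schmidt.

    Assumes A has linearly independent columns, as in the text above.
    """
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for i in range(n):
        v = A[:, i].astype(complex)        # working copy of column a_i
        for k in range(i):
            R[k, i] = Q[:, k].conj() @ v   # inner product q_k^H v
            v = v - R[k, i] * Q[:, k]      # remove the q_k component
        R[i, i] = np.linalg.norm(v)        # the D-tilde scaling
        Q[:, i] = v / R[i, i]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))                   # True: A = QR
print(np.allclose(Q.conj().T @ Q, np.eye(2)))  # True: orthonormal columns
```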

5.1 Diagonalization of Hermitian Matrices

Definition 5.1. A matrix is said to be Hermitian if $A^H = A$, where the $H$ superscript denotes the Hermitian (i.e., conjugate) transpose. Some texts use an asterisk for this, that is, $A^*$ means the same as $A^H$. If $A$ is Hermitian, it means that $a_{ij} = \bar{a}_{ji}$ for every $i, j$ pair. Thus, the diagonal of a Hermitian matrix must be real.

Definition 5.2. A matrix is said to be symmetric if AT = A.

Clearly, if $A$ is real, then $A^H = A^T$, so a real-valued Hermitian matrix is symmetric. However, if $A$ has complex entries, symmetric and Hermitian have different meanings. There is such a thing as a complex symmetric matrix ($a_{ij} = a_{ji}$); a complex symmetric matrix need not have real diagonal entries. Here are a few examples.

Symmetric Matrices:

\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}; \quad A = \begin{bmatrix} 0 & 2 & 4 \\ 2 & 7 & -5 \\ 4 & -5 & 8 \end{bmatrix}; \quad A = \begin{bmatrix} 1 - 2i & 3 + 4i \\ 3 + 4i & 8 - 7i \end{bmatrix}. \]

Hermitian Matrices:
\[ A = \begin{bmatrix} 6 & 8 + 4i \\ 8 - 4i & 9 \end{bmatrix}; \quad A = \begin{bmatrix} 1 & 2 + 3i & 8 \\ 2 - 3i & 4 & 6 - 7i \\ 8 & 6 + 7i & 5 \end{bmatrix}; \quad A = \begin{bmatrix} 3 & 5 \\ 5 & 8 \end{bmatrix}. \]

As the examples show, the set of all real symmetric matrices is included within the set of all Hermitian matrices, since in the case that $A$ is real-valued, $A^H = A^T$. On the other hand, the complex symmetric example illustrates that complex symmetric matrices need not be Hermitian.
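A quick numerical illustration of the distinction (a sketch assuming NumPy; the matrix is the complex symmetric example above):

```python
import numpy as np

# Complex symmetric example: equal to its transpose, but not to its
# conjugate transpose, so it is symmetric without being Hermitian.
A = np.array([[1 - 2j, 3 + 4j],
              [3 + 4j, 8 - 7j]])
print(np.allclose(A, A.T))         # True:  A^T = A (symmetric)
print(np.allclose(A, A.conj().T))  # False: A^H != A (not Hermitian)
```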

Theorem 5.3. Suppose that $A$ is Hermitian. Then all the eigenvalues of $A$ are real.


Proof. Suppose that $Ax = \lambda x$ for $(\lambda, x)$ an eigenpair of $A$. Multiply both sides of the eigen-equation by $x^H$. (Recall that by definition of an eigenvector, $x \ne 0$.) Then we have

\begin{align*}
x^H A x &= x^H \lambda x = \lambda x^H x \\
x^H A^H x &= \lambda \|x\|_2^2 \\
(Ax)^H x &= \lambda \|x\|_2^2 \\
(\lambda x)^H x &= \lambda \|x\|_2^2 \\
\bar{\lambda} \|x\|_2^2 &= \lambda \|x\|_2^2 \\
\bar{\lambda} &= \lambda
\end{align*}

and the last equality implies that λ must be real.
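A numerical sanity check of this theorem (a sketch; the random Hermitian matrix and NumPy are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                    # A is Hermitian by construction
evals = np.linalg.eigvals(A)
print(np.max(np.abs(evals.imag)))     # ~1e-15: the eigenvalues are real
```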

Recall that $A$ is diagonalizable (over $\mathbb{C}^n$) if and only if $A$ has a set of $n$ linearly independent eigenvectors. We will show that Hermitian matrices are always diagonalizable, and, furthermore, that the eigenvectors have a very special relationship.

Theorem 5.4. If A is Hermitian, then any two eigenvectors from different eigenspaces are orthogonal in the standard inner-product for Cn (Rn, if A is real symmetric).

Proof. Let $v_1, v_2$ be two eigenvectors that belong to two distinct eigenvalues, say $\lambda_1, \lambda_2$, respectively. We need to show that $v_1^H v_2 = 0$. Since this is true iff $v_1^H (\lambda_2 v_2) = \lambda_2 v_1^H v_2 = 0$, let us start there:

\begin{align*}
v_1^H (\lambda_2 v_2) &= v_1^H (A v_2) \\
&= v_1^H A^H v_2 \\
&= (A v_1)^H v_2 \\
&= (\lambda_1 v_1)^H v_2 \\
&= \bar{\lambda}_1 v_1^H v_2 \\
&= \lambda_1 v_1^H v_2,
\end{align*}

where the last equality follows from the previous theorem. It follows that $(\lambda_2 - \lambda_1) v_1^H v_2 = 0$. But we assumed that $\lambda_2 \ne \lambda_1$, so it must be the case that $v_1^H v_2 = 0$, as desired.

Definition 5.5. A real matrix $A$ is said to be orthogonally diagonalizable if there exists an orthogonal matrix $Q$ and a diagonal matrix $D$ such that
\[ A = Q D Q^T. \]

Definition 5.6. A matrix $A \in \mathbb{C}^{n \times n}$ is called unitarily diagonalizable if there exists a unitary matrix $U$ and a diagonal matrix $D$ such that $A = U D U^H$.

Theorem 5.7 (Spectral Theorem). Let $A$ be Hermitian. Then $A$ is unitarily diagonalizable.

Proof. Let $A$ have Jordan decomposition $A = W J W^{-1}$. Since $W$ is square and invertible, we can factor (see the beginning of this chapter) $W = QR$, where $Q$ is unitary and $R$ is upper triangular. Thus,
\[ A = Q R J R^{-1} Q^H = Q T Q^H, \]
where $T := R J R^{-1}$ is upper triangular because it is the product of upper triangular matrices$^{13}$, and $Q$ is unitary$^{14}$. So $A$ is unitarily similar to an upper triangular matrix $T$, and we may pre-multiply by $Q^H$ and post-multiply by $Q$ to obtain
\[ Q^H A Q = T. \]
Taking the conjugate transpose of both sides,
\[ Q^H A^H Q = T^H. \]
However, $A = A^H$, and so we get $T = T^H$. But $T$ was upper triangular, and this can only happen if $T$ is diagonal. Thus $A = Q D Q^H$, as desired.
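In code, this unitary diagonalization is exactly what `np.linalg.eigh` computes for Hermitian input (a sketch under our own choice of random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B + B.conj().T                     # Hermitian
lam, Q = np.linalg.eigh(A)             # eigh is specialized to Hermitian matrices
print(np.allclose(Q.conj().T @ Q, np.eye(3)))          # Q is unitary
print(np.allclose(Q @ np.diag(lam) @ Q.conj().T, A))   # A = Q D Q^H, D real
```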

Corollary 5.8. In summary, if $A$ is $n \times n$ Hermitian, it has the following properties:

• $A$ has $n$ real eigenvalues, counting multiplicities.

• The algebraic and geometric multiplicities of each distinct eigenvalue match.

• The eigenspaces are mutually orthogonal in the sense that eigenvectors corresponding to different eigenvalues are orthogonal.

• $A$ is unitarily diagonalizable.

Exercise 5.1. Let $A$ and $B$ both be orthogonally diagonalizable real matrices.
a) Show $A$ and $B$ are symmetric.
b) Show that if $AB = BA$, then $AB$ is orthogonally diagonalizable.

$^{13}$As mentioned elsewhere in the text, it is straightforward to show the product of upper triangular matrices is upper triangular. It is likewise straightforward to show that the inverse of an upper triangular matrix is upper triangular, so the expression $R J R^{-1}$ is the product of 3 upper triangular matrices and is upper triangular.
$^{14}$This is in fact the Schur factorization.


5.1.1 Spectral Decomposition

Definition 5.9. The set of eigenvalues of a matrix $A$ is sometimes called the spectrum of $A$. The spectral radius is the largest magnitude eigenvalue of $A$.

We know that if $A$ is Hermitian, $A = Q D Q^H$, so let us write the triple matrix product out explicitly:

\begin{align*}
A &= [q_1, \ldots, q_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} q_1^H \\ q_2^H \\ \vdots \\ q_n^H \end{bmatrix} \\
&= [\lambda_1 q_1, \cdots, \lambda_n q_n] \begin{bmatrix} q_1^H \\ q_2^H \\ \vdots \\ q_n^H \end{bmatrix} \\
&= \sum_{i=1}^n \lambda_i q_i q_i^H.
\end{align*}
This expression for $A$ is called the spectral decomposition of $A$. Note that each $q_i q_i^H$ is a rank-one matrix AND that each $q_i q_i^H$ is an orthogonal projector onto $\mathrm{Span}(q_i)$.
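The rank-one sum can be checked directly (a sketch; the small symmetric matrix is a hypothetical example of our own):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, Q = np.linalg.eigh(A)
# Rebuild A as the sum of lambda_i * q_i q_i^H
S = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
print(np.allclose(S, A))            # True: the spectral decomposition
P = np.outer(Q[:, 0], Q[:, 0])      # orthogonal projector onto Span(q_1)
print(np.allclose(P @ P, P))        # True: projectors are idempotent
```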

5.1.2 Positive Definite, Negative Definite, Indefinite

Definition 5.10. Let $A$ be a real symmetric matrix. We say that $A$ is also positive definite if for every non-zero $x \in \mathbb{R}^n$, $x^T A x > 0$.

A similar definition holds for Hermitian matrices.

Definition 5.11. Let $A$ be a complex Hermitian matrix. We say that $A$ is also positive definite if for every non-zero $x \in \mathbb{C}^n$, $x^H A x > 0$.

A useful consequence for HPD (SPD) matrices is that their eigenvalues (which we already know are real due to the Hermitian property) must be STRICTLY POSITIVE. Therefore, HPD (SPD) matrices MUST BE INVERTIBLE!

Theorem 5.12. A Hermitian (symmetric) matrix with all positive eigenvalues must be positive definite.

Proof. Since from the previous section we know $A = Q D Q^H$ exists, from what we are given, the entries of $D$ must be positive. Let $x \ne 0$. Then
\[ x^H A x = x^H Q D Q^H x \]

78 Chapter5. SymmetricandHermitianMatrices

\begin{align*}
&= (Q^H x)^H D (Q^H x) \\
&= z^H D z \\
&= \sum_{i=1}^n \lambda_i |z_i|^2.
\end{align*}

Since the $\lambda_i > 0$ and $z := Q^H x$ cannot be zero (why?), the result follows.

Example 5.13 Let $A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}$. Compute the eigenvalues and observe they are both positive. By the previous theorem, this matrix is SPD.
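Carrying out the computation numerically (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 2.0]])
print(np.linalg.eigvalsh(A))   # [1.585..., 4.414...]: both positive, so A is SPD
# Empirical spot-check of the definition x^T A x > 0:
rng = np.random.default_rng(2)
x = rng.standard_normal(2)
print(x @ A @ x > 0)           # True for this (and any) non-zero x
```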

Exercise 5.2. Let $A$ be HPD. Show $\langle q, z \rangle := z^H A q$ defines a valid inner product on $\mathbb{C}^n$.

A close cousin is the positive semi-definite matrix.

Definition 5.14. A Hermitian (symmetric) matrix is positive semi-definite if for every non-zero $x \in \mathbb{C}^n$ ($x \in \mathbb{R}^n$), $x^H A x \ge 0$.

We also have the concept of negative-definite matrices.

Definition 5.15. If $A$ is Hermitian, then it is negative definite if for every non-zero $x \in \mathbb{C}^n$, $x^H A x < 0$.

A negative definite Hermitian (symmetric) matrix must have all strictly negative eigenvalues. So it, too, is invertible.

A symmetric (Hermitian) indefinite matrix is one that has some positive and some negative (and possibly zero) eigenvalues.

5.2 Quadratic Forms

A motivating quote from David Lay's Third Ed., Linear Algebra and Its Applications:

Quadratic forms occur frequently in applications of linear algebra to engineering (in design criteria and optimization) and signal processing (as output noise power). They also arise, for example, in physics (as potential and kinetic energy), differential geometry (as normal curvature of surfaces), economics (as utility functions), and statistics (in confidence ellipsoids).


In fact, you saw a quadratic form already in the definitions of the previous subsection.

Definition 5.16. A quadratic form on $\mathbb{R}^n$ ($\mathbb{C}^n$) is a function $\mathcal{Q} : \mathbb{R}^n \to \mathbb{R}$ ($\mathcal{Q} : \mathbb{C}^n \to \mathbb{R}$) defined as follows:
\[ \mathcal{Q}(x) = x^T A x \quad (\mathcal{Q}(x) = x^H A x) \]
where $A$ is an $n \times n$ symmetric matrix (Hermitian matrix). Here, $A$ is called the matrix of the quadratic form.

Example 5.17 $A = \begin{bmatrix} 5 & 0 \\ 0 & 4 \end{bmatrix}$. Compute the quadratic form: $x^T A x = 5x_1^2 + 4x_2^2$.

Example 5.18 $A = \begin{bmatrix} 5 & -1 \\ -1 & 4 \end{bmatrix}$. If we compute the quadratic form here, there are cross terms due to the presence of non-zero off-diagonal entries:

\[ x^T A x = 5x_1^2 - 2 x_1 x_2 + 4 x_2^2. \]

In the 2nd example, it's difficult to see whether or not this quadratic form will always give something that's positive. However, there is an easy way to investigate the possibility (well, easy provided someone has handed you the eigendecomposition!):

Theorem 5.19. Let $A$ be an $n \times n$ Hermitian (symmetric) matrix. Then there is a unitary (orthogonal) change of variables of the form $x = Qy$ that transforms the quadratic form $x^H A x$ ($x^T A x$) into a quadratic form $y^H D y$ ($y^T D y$) where the latter has no cross product term.
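A sketch of this change of variables for Example 5.18 (NumPy assumed; the random test vector is our own):

```python
import numpy as np

A = np.array([[5.0, -1.0], [-1.0, 4.0]])   # the matrix of Example 5.18
lam, Q = np.linalg.eigh(A)                  # A = Q D Q^T
rng = np.random.default_rng(3)
x = rng.standard_normal(2)
y = Q.T @ x                                 # change of variables x = Qy
# Same value either way, but the y-form has no cross term:
print(np.isclose(x @ A @ x, lam[0] * y[0]**2 + lam[1] * y[1]**2))  # True
```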

5.2.1 Geometry and Principal Axes

In $\mathbb{R}^2$, we can get some geometric intuition for the quadratic form of a symmetric $A$ by looking at its level sets. Consider $f(x) = x^T A x$ as a map from $\mathbb{R}^2$ to $\mathbb{R}$. Now consider $W_c = \{ x \in \mathbb{R}^2 \mid x^T A x = c \}$ for a fixed, real, constant $c$. The set $W_c$ is called the $c$-level set of the quadratic function $f(x)$. One of the following will occur:

• The $c$-level set will be an ellipse (or circle, if both semi-axes have the same length)

• The $c$-level set will be a hyperbola

• The $c$-level set will be one or two lines, or a single point, or contain no points at all

80 Chapter5. SymmetricandHermitianMatrices

To see this, we start with $A$ being a $2 \times 2$ real, diagonal matrix. Then $x^T A x = a_{11} x_1^2 + a_{22} x_2^2$. Let's assume that $A$ is invertible (no 0's on the diagonal). Then

\[ a_{11} x_1^2 + a_{22} x_2^2 = c \;\Rightarrow\; \frac{a_{11}}{c} x_1^2 + \frac{a_{22}}{c} x_2^2 = 1. \tag{5.1} \]

Consider first the case that $a_{11} > 0$, $a_{22} > 0$. Note that in this case $A$ is symmetric positive definite. Then if $c > 0$, $\frac{a_{11}}{c} = \frac{1}{\alpha^2}$ for some positive $\alpha$ and $\frac{a_{22}}{c} = \frac{1}{\beta^2}$ for some positive $\beta$. For example, if $c = 1$, $\alpha = \frac{1}{\sqrt{a_{11}}}$. Thus the rightmost equation in (5.1) is in fact the equation for an ellipse centered at the origin, with $\alpha$ being the length of the semi-axis oriented along the $x_1$ axis, and $\beta$ being the length of the semi-axis oriented in the vertical $x_2$ direction. On the other hand, if $c < 0$ and $a_{11}$, $a_{22}$ are both positive, there are no solutions to the equation; i.e., the $-2$ level set would contain no points at all, for instance.

If $A$ is negative definite, $a_{11} < 0$, $a_{22} < 0$. If you look at a level set for which $c < 0$, the same analysis as above goes through – the picture will be an ellipse.

Now, without loss of generality, assume that $a_{11} > 0$ but $a_{22} < 0$. This means that $A$ is indefinite. Looking back at (5.1), it can be rewritten

\[ \frac{x_1^2}{\alpha^2} - \frac{x_2^2}{\beta^2} = 1, \qquad \alpha > 0, \; \beta > 0, \]

for $\alpha$ as above and $\beta^2 = \frac{c}{|a_{22}|}$. The former is an equation for a hyperbola centered at the origin opening on the left and right. To draw it, you draw a rectangle centered at the origin extending to $(\alpha, 0)$ on the $x_1$ axis and $(-\alpha, 0)$ in the other direction, and $(0, \beta)$ and $(0, -\beta)$ on the $x_2$ axis; then the 2 diagonal lines through the origin and the corners of the rectangle give the outline (asymptotes) for the hyperbola. (Obviously if $a_{11} < 0$ and $a_{22} > 0$ you will also get a hyperbola, opening up and down.)

And if one of the diagonal elements is zero (assume without loss of generality that $a_{22} = 0$), the level set equation reduces to $x_1^2 = \frac{c}{a_{11}}$. If $c = 0$, then the line $x_1 = 0$ is the solution set. Take $c$ to be non-zero and to have the same sign as $a_{11}$. Then the solutions are $x_1 = \pm\sqrt{\frac{c}{a_{11}}}$, so we get 2 parallel lines for the level curves. If both diagonal elements are non-zero with the same sign and $c$ is 0, the level curve is the single point $(0, 0)$.

Now the interesting question: What if $A$ is not diagonal, but it is symmetric? The analysis proceeds easily if we use a change of coordinates provided by the eigenvectors of $A$. Since $A$ is symmetric, $A = Q D Q^T$. So $x^T A x = y^T D y$ where $y = Q^T x$. So if we use the $y_1, y_2$ coordinate system, the matrix is diagonal in that coordinate system, and shape-wise the analysis goes through as above. We want to convince ourselves that the level set picture (ellipse, hyperbola, etc.), when sketched in $x_1, x_2$ space, though, should be the same, just aligned with $q_1, q_2$ (which are orthonormal!) as the coordinate axes.

Now since $y = Q^T x$, $Qy = x$. Let $y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, whose span represents the "horizontal" axis in $(y_1, y_2)$ space, which, by the analysis above, is one of the 2 principal axes for representing the level curves in that space. Then $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =$


$Qy = q_1 = \begin{bmatrix} q_{11} \\ q_{21} \end{bmatrix}$; so the corresponding axis in $(x_1, x_2)$ space must be $q_1$. Similarly, if $y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, the vertical direction in $(y_1, y_2)$ space, then $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Qy = q_2$, so the corresponding axis in $(x_1, x_2)$ space must be $q_2$. See boardwork from the in-class lecture for pictures.

Finally, now that we have level curves, we can draw a 3D picture by setting $z = f(x)$ and plotting $(x_1, x_2, z)$. Verify the following:

• When $A$ is SPD, the surface is bowl shaped, concave up, with a unique minimum.

• When $A$ is symmetric negative definite, the surface is bowl shaped, concave down, with a unique maximum.

• When $A$ is symmetric indefinite (no zero eigenvalues) we get a saddle shape, concave up in some directions and concave down in others.

• When $A$ is only symmetric but singular, you get a parabolic trough (an entire valley of points that are all maxima or minima).

We can now use these pictures and the intuition developed by looking at them to discuss optimization.

5.2.2 Constrained Optimization; Optimality Characterization of Eigenvalues

In Calculus III, you saw pictures of quadratic forms before, but they were developed without the matrix theory behind them. The test for existence of a (global) min is the same as determining if $A$ is SPD. For a global max, it's the same as checking for negative definite. Often in practice, we are looking for a max or a min value over a restricted set of points in $\mathbb{R}^n$. (We could do this for $\mathbb{C}^n$ too of course, but the graphical intuition is for $\mathbb{R}^2$, $\mathbb{R}^3$, so we will constrain the discussion to the real case here.) We motivate our discussion with the following example.

Example 5.20 Find the maximum and minimum values of $f(x) = x^T D x$, with $D$ a diagonal matrix with entries $7, 5, 4$, respectively, subject to the constraint $\|x\|_2 = 1$. (So: how large and how small can the quadratic form be over all vectors in $\mathbb{R}^3$ with unit length?) Note that this means $D$ is symmetric positive definite. We have

\begin{align*}
f(x) &= 7x_1^2 + 5x_2^2 + 4x_3^2 \\
&\le 7x_1^2 + 7x_2^2 + 7x_3^2 \\
&= 7(x_1^2 + x_2^2 + x_3^2) = 7
\end{align*}

82 Chapter5. SymmetricandHermitianMatrices

where the last equality follows from the constraint. So $f(x) \le 7$. Can it ever equal 7? Yes. Set $x = (1, 0, 0)$ (admissible since it has unit length), and the upper bound is achieved exactly. Therefore, 7 is the max value of the quadratic form when $\|x\|_2 = 1$. Similarly,

\begin{align*}
f(x) &= 7x_1^2 + 5x_2^2 + 4x_3^2 \\
&\ge 4x_1^2 + 4x_2^2 + 4x_3^2 \\
&= 4(x_1^2 + x_2^2 + x_3^2) = 4
\end{align*}

where again the last equality follows from the constraint. To see that this lower bound can be achieved, substitute $x = (0, 0, 1)$ (admissible since it has unit length). The key observation is this: the max and min of the quadratic form over all unit length vectors correspond to the maximum and minimum eigenvalues of this positive definite matrix.

The following example can be illustrated graphically since we restrict ourselves to a $2 \times 2$ matrix (a 2-variable quadratic form) and therefore a 3D picture.

Example 5.21 Let $f(x) = x^T D x$ where $D = \begin{bmatrix} 6 & 0 \\ 0 & 3 \end{bmatrix}$. Define $g(x) = \|x\|_2^2$. Now the set of all unit-length vectors satisfies $g(x) = x_1^2 + x_2^2 = 1$ (the unit circle in 2D), which corresponds to the points on the surface $(x_1, x_2, 1)$ (you get a cylinder in 3D). Finding the min and max of the quadratic subject to the unit length constraint is a problem of finding the highest and lowest points on the intersection curve of those two surfaces. Proceeding logically as in the preceding example, we find the max to be 6 and the min to be 3. The vectors for which the max and min values are achieved are $\pm(1, 0)$ (2 maxes) and $\pm(0, 1)$ (2 mins), respectively. A key takeaway is that since $D$ was diagonal, the max and min values occur at the (normalized) eigenvector directions, respectively.
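A brute-force check of this example over random unit vectors (a sketch assuming NumPy; the sampling density is an arbitrary choice):

```python
import numpy as np

D = np.array([[6.0, 0.0], [0.0, 3.0]])
rng = np.random.default_rng(4)
X = rng.standard_normal((2, 100_000))
X /= np.linalg.norm(X, axis=0)             # columns are random unit vectors
vals = np.einsum('ij,ik,kj->j', X, D, X)   # x^T D x for each column x
print(vals.min(), vals.max())              # approaches 3 and 6
```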

These two examples illustrate something about symmetric matrices (not just SPD ones) that is true more generally.

Theorem 5.22. Let A be a symmetric matrix. Define

\[ a = \min \{ x^T A x \mid \|x\|_2 = 1 \}, \qquad b = \max \{ x^T A x \mid \|x\|_2 = 1 \}. \]
Then every eigenvalue $\lambda$ of $A$ satisfies $a \le \lambda \le b$. Specifically, $a$ is the minimum eigenvalue (label it $\lambda_n$) of $A$ and $b$ is the maximum eigenvalue (label it $\lambda_1$) of $A$. Moreover, $b$ is attained for any (normalized) eigenvector corresponding to $\lambda_1$ and $a$ is attained for any (normalized) eigenvector corresponding to $\lambda_n$.


Proof. We'll sketch the proof here. It's along the lines of the first two examples. We just need to change into the correct coordinate system, and show that doing so doesn't change the length of the vectors in question. Recall $x^T A x = y^T D y$, $y = Q^T x$ (equivalently $x = Qy$). Here, assume that the entries in $D$ are ordered from largest to smallest (not in magnitude, but in actual value, including sign).

\[ \|x\|_2 = \sqrt{x^T x} = \sqrt{y^T Q^T Q y} = \sqrt{y^T y} = \|y\|_2, \]
since $Q$ is an orthogonal matrix. Therefore, as the quadratic form can be represented in either coordinate system, it assumes the same set of values when $x$, $y$ are allowed to vary over the set of all unit vectors. Now we are able to use the same trick as in the first example to deduce the max and min of the quadratic function as represented in the diagonal coordinate system. We are able to deduce (along the lines of the 2nd example) that the max and min are achieved when $y = \pm e_1$ and $y = \pm e_n$, due to the ordering of the entries in $D$ that was assumed. But when $y = \pm e_1$, $x = Qy = \pm q_1$, and when $y = \pm e_n$, $x = Qy = \pm q_n$.

Let $a, b$ be as defined above. Recall $q_1, q_n$ are orthogonal. Let $c = (1 - \alpha)a + \alpha b$ for $\alpha \in [0, 1]$. Verify first of all that $c \in [a, b]$. Now let $x = \sqrt{1 - \alpha}\, q_n + \sqrt{\alpha}\, q_1$. Then
\[ \|x\|_2^2 = x^T x = (1 - \alpha) q_n^T q_n + \alpha\, q_1^T q_1 + 2\sqrt{1 - \alpha}\sqrt{\alpha}\, q_n^T q_1 = (1 - \alpha) + \alpha + 0 = 1. \]
Also $x^T A x = (1 - \alpha)\lambda_n + \alpha\lambda_1 = (1 - \alpha)a + \alpha b = c$. So for each number $c$ between $a$ and $b$, there is a unit vector $x$ such that $c = x^T A x$, which shows that the set of all possible values of $x^T A x$ for $\|x\|_2 = 1$ is a closed interval on the real axis.

Finally, to show that every eigenvalue $\lambda \in [a, b]$, it suffices to find $x$ so that $\lambda = x^T A x$. But this is done by choosing $x$ to be any of the other $n - 2$ (normalized) eigenvectors.
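The interpolation step in this proof can be reproduced numerically (a sketch; the matrix and $\alpha$ are arbitrary choices of ours; note that `np.linalg.eigh` returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)                 # ascending: lam[0] = a, lam[-1] = b
a, b = lam[0], lam[-1]
alpha = 0.25
c = (1 - alpha) * a + alpha * b
x = np.sqrt(1 - alpha) * Q[:, 0] + np.sqrt(alpha) * Q[:, -1]
print(np.isclose(np.linalg.norm(x), 1.0))  # x is a unit vector
print(np.isclose(x @ A @ x, c))            # and attains the intermediate value c
```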

5.2.3 Motivation for Chapter 6....

Suppose that $B \in \mathbb{C}^{m \times n}$ ($\mathbb{R}^{m \times n}$). Certainly if $m \ne n$, it doesn't even make sense to talk about $B$ being Hermitian (symmetric), let alone positive (semi-)definite. On the other hand, $B^H B$ (which is $n \times n$) and $B B^H$ (which is $m \times m$) are Hermitian (symmetric, in the real case) – this is easy to establish. They are also at least positive semi-definite, if not positive definite – the proof is left as an exercise. And therefore each of these matrices is unitarily (orthogonally) diagonalizable, with real, non-negative eigenvalues. We mention this here because it is an excellent motivation for considering if one can use the above fact to derive a "nice" way to "diagonalize" (it's in quotes because the diagonal term is not necessarily square if $m \ne n$) the $m \times n$ matrix $B$. Read on to Chapter 6 for the answer....


Chapter 6 Singular Value Decomposition

In this section, we again appeal to the fact that every linear transformation can be identified with a corresponding matrix of the transformation, $A$. So it suffices to study what's going on with these matrices.

6.1 Matrix 2-norm and Frobenius norm

Previously, we defined norms in terms of inner products on vector spaces. The space $\mathbb{C}^{m \times n}$ is also a vector space. Consider the following inner product definition:
\[ \langle A, B \rangle = \sum_{i=1}^m \sum_{j=1}^n \bar{B}_{ij} A_{ij} = \sum_{j=1}^n B_{:,j}^* A_{:,j}, \]
so it's basically the sum of the inner products of corresponding columns of the matrices $A$ and $B$.
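Numerically, this inner product and its trace form (see Exercise 6.2 below) agree; here is a quick sketch, with NumPy and the random matrices as our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
ip = np.sum(B.conj() * A)                          # sum_ij conj(B_ij) A_ij
print(np.isclose(ip, np.trace(B.conj().T @ A)))    # True: trace form of <A, B>
print(np.isclose(np.sum(A.conj() * A).real,        # <A, A> = sum |A_ij|^2
                 np.sum(np.abs(A)**2)))            # True
```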

Exercise 6.1. Show that this is a valid inner product.

Exercise 6.2. The trace of a matrix is the sum of the diagonal entries in the matrix. Show that $\langle A, A \rangle = \mathrm{trace}(A^* A)$.

In particular, we have that $\langle A, A \rangle = \sum\sum |A_{ij}|^2$. From this, we will define the Frobenius norm:

\[ \|A\|_F = \sqrt{\langle A, A \rangle} = \sqrt{\sum\sum |A_{ij}|^2}. \]
This is slightly annoying to compute when $A$ is dense, but if the matrix is diagonal, the double sum collapses to a single sum. It is also possible to use standard vector norms to induce matrix norms. In this class, we will consider only one such example, called the matrix 2-norm.

Define $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2$. Since $x$ has unit length, it is on the unit ball. Multiplication by $A$ is a linear transformation. Thus, the 2-norm measures the maximum amount by which multiplication by $A$ can stretch a unit vector.
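A minimal numerical illustration of this definition (a sketch; sampling unit vectors is a crude stand-in for the true maximization, and `np.linalg.norm(A, 2)` is assumed to compute the exact matrix 2-norm):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 1.0]])
rng = np.random.default_rng(6)
X = rng.standard_normal((2, 200_000))
X /= np.linalg.norm(X, axis=0)                   # columns are unit vectors
sampled = np.linalg.norm(A @ X, axis=0).max()    # max ||Ax||_2 over the samples
print(sampled, np.linalg.norm(A, 2))             # sampled value approaches the 2-norm
```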
