6 Inner Product Spaces

You should be familiar with the dot product in $\mathbb{R}^2$ and with its basic properties. For any vectors $x = \begin{pmatrix}x_1\\x_2\end{pmatrix}$ and $y = \begin{pmatrix}y_1\\y_2\end{pmatrix}$, define

$$x \cdot y := x_1y_1 + x_2y_2$$

• The norm or length of a vector is $\|x\| = \sqrt{x \cdot x}$.
• The angle $\theta$ between vectors satisfies $x \cdot y = \|x\|\,\|y\|\cos\theta$.
• Vectors are orthogonal or perpendicular precisely when $x \cdot y = 0$.

The dot product is precisely what allows us to compute lengths of, and angles between, vectors in $\mathbb{R}^2$. An inner product is an abstract structure that generalizes this idea to other vector spaces.

6.1 Inner Products and Norms

Definition 6.1. An inner product on a vector space $V$ over $\mathbb{R}$ is a positive definite, symmetric, bilinear form; that is, a function $\langle\,,\,\rangle : V \times V \to \mathbb{R}$ satisfying the following properties: $\forall \lambda \in \mathbb{R},\ x, y, z \in V$,

(a) Positive definiteness: $x \neq 0 \implies \langle x, x\rangle > 0$
(b) Symmetry: $\langle y, x\rangle = \langle x, y\rangle$
(c) Bilinearity: $\langle \lambda x + y, z\rangle = \lambda\langle x, z\rangle + \langle y, z\rangle$ and $\langle z, \lambda x + y\rangle = \lambda\langle z, x\rangle + \langle z, y\rangle$

A real vector space equipped with an inner product is a (real) inner product space. The norm or length of $x \in V$ is $\|x\| := \sqrt{\langle x, x\rangle}$. A unit vector has $\|x\| = 1$. Vectors $x, y$ are perpendicular or orthogonal if $\langle x, y\rangle = 0$. Vectors are orthonormal if they are orthogonal unit vectors.

The bilinearity condition is worth unpacking slightly:

• An inner product is linear in both slots: if $z \in V$ is constant, then we have two linear maps $T_z, U_z : V \to \mathbb{R}$ defined by

$$T_z(x) = \langle x, z\rangle, \qquad U_z(x) = \langle z, x\rangle$$

• We need only assume the first linearity condition: that the inner product is linear in the second slot ($U_z$ linear) follows from the symmetry condition.

Euclidean Space  The standard or Euclidean inner product on $\mathbb{R}^n$ is the usual dot product,
$$\langle x, y\rangle = y^Tx = \sum_{j=1}^n x_jy_j = x_1y_1 + \cdots + x_ny_n, \qquad \|x\| = \sqrt{\sum_{j=1}^n x_j^2}$$
where the norm is the Euclidean/Pythagorean length function. The resulting inner product space is often called Euclidean space. Unless indicated, if we refer to $\mathbb{R}^n$ as an inner product space, then we always mean with respect to the standard inner product. There are, however, many other ways to make $\mathbb{R}^n$ into an inner product space. . .

Example 6.2. (Weighted inner products)  Define an alternative inner product on $\mathbb{R}^2$ via

$$\langle x, y\rangle = x_1y_1 + 3x_2y_2$$

It is easy to check that this satisfies the required properties:

(a) Positive definiteness: If $x \neq 0$, then $\langle x, x\rangle = x_1^2 + 3x_2^2 > 0$;
(b) Symmetry follows from the commutativity of multiplication in $\mathbb{R}$:

$$\langle y, x\rangle = y_1x_1 + 3y_2x_2 = x_1y_1 + 3x_2y_2 = \langle x, y\rangle$$

(c) Bilinearity follows from symmetry and the associative/distributive laws in $\mathbb{R}$:

$$\langle \lambda x + y, z\rangle = (\lambda x_1 + y_1)z_1 + 3(\lambda x_2 + y_2)z_2 = \lambda(x_1z_1 + 3x_2z_2) + (y_1z_1 + 3y_2z_2) = \lambda\langle x, z\rangle + \langle y, z\rangle$$

With respect to this new inner product, the concept of orthogonality in $\mathbb{R}^2$ has changed: for example, if $x = \frac{1}{2}\begin{pmatrix}1\\1\end{pmatrix}$ and $y = \frac{1}{2\sqrt{3}}\begin{pmatrix}3\\-1\end{pmatrix}$, we see that $\{x, y\}$ is an orthonormal set with respect to $\langle\,,\,\rangle$:
$$\|x\|^2 = \tfrac{1}{4}(1^2 + 3\cdot 1^2) = 1, \qquad \|y\|^2 = \tfrac{1}{12}(3^2 + 3\cdot 1^2) = 1, \qquad \langle x, y\rangle = \tfrac{1}{4\sqrt{3}}(3 - 3) = 0$$
With respect to the standard dot product, however, these vectors are not at all special:

$$x \cdot x = \frac{1}{2}, \qquad y \cdot y = \frac{5}{6}, \qquad x \cdot y = \frac{1}{2\sqrt{3}}$$

We have the same vector space $\mathbb{R}^2$, but different inner product spaces: $(\mathbb{R}^2, \cdot) \neq (\mathbb{R}^2, \langle\,,\,\rangle)$.
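As a quick numerical sanity check (my addition, not part of the original notes), here is a short numpy sketch of the weighted inner product above, encoded by the diagonal weight matrix $W = \operatorname{diag}(1, 3)$; the helper name `inner` is purely illustrative.

```python
import numpy as np

# Weighted inner product <x, y> = x1*y1 + 3*x2*y2, written as <x, y> = y^T W x
# with the diagonal weight matrix W = diag(1, 3).
W = np.diag([1.0, 3.0])

def inner(x, y):
    return y @ W @ x

x = np.array([1.0, 1.0]) / 2
y = np.array([3.0, -1.0]) / (2 * np.sqrt(3))

print(inner(x, x), inner(y, y), inner(x, y))   # 1.0, 1.0, 0.0 -> orthonormal for < , >
print(x @ x, y @ y, x @ y)                     # 0.5, 0.833..., 0.288... -> unremarkable for the dot product
```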

This example is easily generalized: let $a_1, \ldots, a_n$ be positive real numbers (called weights) and define

$$\langle x, y\rangle = \sum_{j=1}^n a_jx_jy_j = a_1x_1y_1 + \cdots + a_nx_ny_n$$

It is a simple exercise to check that this defines an inner product on Rn. In particular, this means that Rn may be viewed as an inner product space in infinitely many distinct ways.

There are even more inner products on $\mathbb{R}^n$ which aren't described in the above example. For instance, if $A \in M_n(\mathbb{R})$ is a symmetric matrix, then $\langle x, y\rangle := y^TAx$ certainly satisfies the symmetry and bilinearity conditions. The matrix is called positive definite if $\langle x, y\rangle$ is an inner product. Here is an example of an inner product:
$$A = \begin{pmatrix}3 & 1\\1 & 1\end{pmatrix} \qquad \langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$$
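Not in the original notes, but possibly helpful: one practical way to test positive definiteness of a symmetric matrix is to check that all of its eigenvalues are positive. A minimal numpy sketch using the matrix $A$ above:

```python
import numpy as np

# The symmetric matrix A defines an inner product <x, y> = y^T A x exactly when
# A is positive definite; here we check that its eigenvalues are all positive.
A = np.array([[3.0, 1.0],
              [1.0, 1.0]])

print(np.linalg.eigvalsh(A))      # [0.586..., 3.414...] -- both positive

x = np.array([1.0, -2.0])
print(x @ A @ x > 0)              # positive definiteness on a sample vector
```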

Basic properties of real inner products

Lemma 6.3. Let V be a real inner product space, let x, y, z ∈ V and λ ∈ R.

1. $\langle x, 0\rangle = 0$

2. $\|x\| = 0 \iff x = 0$

3. $\|\lambda x\| = |\lambda|\,\|x\|$

4. $\langle x, y\rangle = \langle x, z\rangle$ for all $x \implies y = z$

5. (Cauchy–Schwarz inequality) $|\langle x, y\rangle| \le \|x\|\,\|y\|$, with equality $\iff x$ and $y$ are parallel

6. (Triangle inequality) $\|x + y\| \le \|x\| + \|y\|$, with equality $\iff x$ and $y$ are parallel and point in the same direction

Be careful with notation: |λ| means absolute value (applies to scalars), while ||x|| means norm (applies to vectors).

Proof. We leave parts 1–3 as exercises.

4. Let $x = y - z$ and apply the bilinearity condition and part 2 to see that
$$\langle x, y\rangle = \langle x, z\rangle \implies 0 = \langle y - z, y - z\rangle = \|y - z\|^2 \implies y = z$$

5. If $y = 0$, the result is trivial. WLOG we may assume $\|y\| = 1$: if the inequality holds for this, then it holds for all non-zero $y$ by parts 1 and 4. Now expand:
$$0 \le \|x - \langle x, y\rangle y\|^2 \qquad\text{(positive definiteness)}$$
$$= \|x\|^2 + |\langle x, y\rangle|^2\|y\|^2 - \langle x, y\rangle\langle x, y\rangle - \langle x, y\rangle\langle y, x\rangle \qquad\text{(bilinearity)}$$
$$= \|x\|^2 - |\langle x, y\rangle|^2 \qquad\text{(symmetry)}$$
Taking square-roots establishes the inequality. By part 2, equality holds if and only if $x = \langle x, y\rangle y$, which is precisely when $x, y$ are linearly dependent.

6. We establish the squared result:
$$\bigl(\|x\| + \|y\|\bigr)^2 - \|x + y\|^2 = \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 - \|x\|^2 - \langle x, y\rangle - \langle y, x\rangle - \|y\|^2$$
$$= 2\bigl(\|x\|\,\|y\| - \langle x, y\rangle\bigr) \ge 2\bigl(\|x\|\,\|y\| - |\langle x, y\rangle|\bigr) \ge 0 \qquad\text{(Cauchy–Schwarz)}$$
Equality requires both equality in Cauchy–Schwarz ($x, y$ parallel) and that $\langle x, y\rangle \ge 0$: since $x, y$ are already parallel, this means that one is a non-negative multiple of the other.

Complex Inner Product Spaces  We can also define an inner product on a complex vector space. Since we will be primarily interested in the real case, we will often simply state the differences, if there are any.

Definition 6.4. An inner product on a vector space $V$ over $\mathbb{C}$ is a positive definite, conjugate-symmetric, sesquilinear form; a function $\langle\,,\,\rangle : V \times V \to \mathbb{C}$ satisfying the following properties: $\forall \lambda \in \mathbb{C},\ x, y, z \in V$,

(a) Positive definiteness: $x \neq 0 \implies \langle x, x\rangle > 0$

(b) Conjugate-symmetry: $\langle y, x\rangle = \overline{\langle x, y\rangle}$

(c) Sesquilinearity: $\langle \lambda x + y, z\rangle = \lambda\langle x, z\rangle + \langle y, z\rangle$ and $\langle z, \lambda x + y\rangle = \overline{\lambda}\langle z, x\rangle + \langle z, y\rangle$

A complex vector space equipped with an inner product is a (complex) inner product space. The concepts of norm and orthogonality are identical to the real case.

• The prefix sesqui- means one-and-a-half, reflecting the property of being linear in the first slot and conjugate-linear in the second.

• A separate definition for the real case is not required. Simply replace C with F above: if F = R, we recover Definition 6.1. Note that an inner product space can only have base field R or C. For the remainder of this chapter, when we say an inner product space over F, we implicitly assume that F is R or C.

• We need only assume that a complex inner product is linear in its first slot; conjugate-symmetry then demonstrates the conjugate-linearity property.

• The entirety of Lemma 6.3 holds in a complex inner product space. A little extra care is needed for the proofs of the Cauchy–Schwarz and triangle inequalities. Note that |λ| now means the complex modulus of λ ∈ C.

• Warning! In case you dabble in the dark arts, be aware that the convention in Physics is for an inner product to be conjugate-linear in the first entry rather than the second!

The standard (Hermitian) inner product and norm on $\mathbb{C}^n$ are
$$\langle x, y\rangle = y^*x = \sum_{j=1}^n x_j\overline{y_j} = x_1\overline{y_1} + \cdots + x_n\overline{y_n}, \qquad \|x\| = \sqrt{\sum_{j=1}^n |x_j|^2}$$
where $y^* = \overline{y}^T$ denotes the conjugate transpose of $y$. We can also define weighted inner products exactly as in Example 6.2:

$$\langle x, y\rangle = \sum_{j=1}^n a_jx_j\overline{y_j} = a_1x_1\overline{y_1} + \cdots + a_nx_n\overline{y_n}$$
note that the weights $a_j$ must still be positive real numbers.
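A small numpy sketch (not from the notes) of the standard Hermitian inner product, using the convention above that it is linear in the first slot and conjugate-linear in the second:

```python
import numpy as np

# Standard Hermitian inner product on C^n: <x, y> = y* x = sum_j x_j * conj(y_j).
def herm(x, y):
    return np.conj(y) @ x        # equivalently np.vdot(y, x)

x = np.array([1 + 2j, 3j])
y = np.array([2 - 1j, 1 + 1j])

print(herm(x, y))                              # a complex number
print(herm(y, x) == np.conj(herm(x, y)))       # conjugate-symmetry
print(herm(x, x))                              # real and positive: <x, x> = 14
```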

Further Examples of Inner Products

The Frobenius inner product  If $F = \mathbb{R}$ or $\mathbb{C}$, and $A, B \in M_{m\times n}(F)$, define

$$\langle A, B\rangle = \operatorname{tr}(B^*A)$$

where $\operatorname{tr}$ is the trace of a square matrix and $B^* = \overline{B}^T$ is the conjugate transpose. This makes $M_{m\times n}(F)$ into an inner product space.

Example 6.5. In M3×2(C),

$$\left\langle \begin{pmatrix}1 & i\\2-i & 0\\0 & -1\end{pmatrix}, \begin{pmatrix}0 & 7\\1 & 2i\\3-2i & 4\end{pmatrix}\right\rangle = \operatorname{tr}\left[\begin{pmatrix}0 & 1 & 3+2i\\7 & -2i & 4\end{pmatrix}\begin{pmatrix}1 & i\\2-i & 0\\0 & -1\end{pmatrix}\right] = \operatorname{tr}\begin{pmatrix}2-i & -3-2i\\5-4i & -4+7i\end{pmatrix} = 2 - i - 4 + 7i = -2 + 6i$$

$$\left\|\begin{pmatrix}1 & i\\2-i & 0\\0 & -1\end{pmatrix}\right\|^2 = \operatorname{tr}\left[\begin{pmatrix}1 & 2+i & 0\\-i & 0 & -1\end{pmatrix}\begin{pmatrix}1 & i\\2-i & 0\\0 & -1\end{pmatrix}\right] = \operatorname{tr}\begin{pmatrix}6 & i\\-i & 2\end{pmatrix} = 8$$

This isn't really a new example: if we map $M_{m\times n}(F) \to F^{mn}$ by stacking the columns of a matrix, then the Frobenius inner product becomes the standard inner product.
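A short numpy check (my addition) of this observation, using the matrices of Example 6.5 as reconstructed above:

```python
import numpy as np

# Frobenius inner product <A, B> = tr(B* A), and the same number obtained by
# stacking columns and using the standard Hermitian inner product on C^(mn).
A = np.array([[1, 1j], [2 - 1j, 0], [0, -1]])
B = np.array([[0, 7], [1, 2j], [3 - 2j, 4]])

frob = np.trace(B.conj().T @ A)
flat = np.conj(B.flatten(order='F')) @ A.flatten(order='F')   # column-stacking

print(frob, flat)                        # both equal (-2+6j)
print(np.trace(A.conj().T @ A).real)     # ||A||^2 = 8.0
```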

The L2 inner product If we have continuous functions f , g : [a, b] → C, define

$$\langle f, g\rangle := \int_a^b f(t)\overline{g(t)}\, dt$$
This is well-defined, since continuous functions on closed bounded intervals are integrable. The properties of an inner product are easy to check; in particular,

$$\|f\|^2 = \int_a^b |f(x)|^2\, dx = 0 \iff f(x) \equiv 0$$
follows because $f$ is continuous. With sufficient care, we can even take the interval to be $(-\infty, \infty)$, though we must restrict to functions $f$ for which $\int_{-\infty}^{\infty} |f(x)|^2\, dx$ is finite. This is our first example of an infinite-dimensional inner product space. It is common to restrict to subspaces of $C[a, b]$: in particular, if $f, g$ are real-valued, then we have a real inner product space and the complex conjugate in the definition is unnecessary.

Example 6.6. Let $f(x) = x$ and $g(x) = x^2$: these lie in the real inner product space $P_2(\mathbb{R}) \subseteq C[-1, 1]$ with respect to the $L^2$ inner product;

$$\langle f, g\rangle = \int_{-1}^1 x^3\, dx = 0, \qquad \|f\|^2 = \int_{-1}^1 x^2\, dx = \frac{2}{3}, \qquad \|g\|^2 = \int_{-1}^1 x^4\, dx = \frac{2}{5}$$
With some simple scaling, we see that $\left\{\frac{1}{\|f\|}f,\ \frac{1}{\|g\|}g\right\} = \left\{\sqrt{\tfrac{3}{2}}\,x,\ \sqrt{\tfrac{5}{2}}\,x^2\right\}$ is an orthonormal set.

The $\ell^2$ inner product (for Analysis aficionados only!)  The standard inner product on $F^n$ can be generalized to the vector space $\ell^2$ of square-summable sequences with values in $F$. That is,
$$(s_n) \in \ell^2 \iff \sum_{n=1}^{\infty} |s_n|^2 < \infty$$
The inner product is defined by
$$\langle (s_n), (t_n)\rangle = \sum_{n=1}^{\infty} s_n\overline{t_n}$$
It requires a little analysis (outside the scope of this course) to prove that $\ell^2$ is an inner product space: the challenge is showing that square-summable sequences form a vector space! This example is the prototypical Hilbert space: a useful type of infinite-dimensional inner product space. You should compare the structure with the $L^2$ inner product. Both inner products can also be generalized with weights as in Example 6.2.

Exercises.
6.1.1. (a) Check explicitly that $\langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$ is positive definite and therefore defines an inner product on $\mathbb{R}^2$. (Hint: Try to write $\|x\|^2$ as a sum of squares. . . )
   (b) Let $x = \begin{pmatrix}1\\1\end{pmatrix}$. With respect to the inner product above, find a non-zero unit vector $y$ which is orthogonal to $x$.
6.1.2. Define $\langle x, y\rangle := \sum_{j=1}^n x_jy_j$ on $\mathbb{C}^n$. Is this an inner product? Which of the properties (a), (b), (c) of Definition 6.4 does it satisfy?
6.1.3. Prove parts 1, 2 and 3 of Lemma 6.3.
6.1.4. Use basic algebra to prove the Cauchy–Schwarz inequality for vectors $x = \begin{pmatrix}a\\b\end{pmatrix}$ and $y = \begin{pmatrix}c\\d\end{pmatrix}$ in $\mathbb{R}^2$ with the standard (dot) product.
6.1.5. Prove the Cauchy–Schwarz and triangle inequalities for a complex inner product space. What has to change from Lemma 6.3?
6.1.6. Let $V$ be an inner product space; prove Pythagoras' Theorem:

If $\langle x, y\rangle = 0$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$

6.1.7. Prove the polarization identities:ᵃ
   (a) In any real inner product space: $\langle x, y\rangle = \frac{1}{4}\|x + y\|^2 - \frac{1}{4}\|x - y\|^2$
   (b) In any complex inner product space: $\langle x, y\rangle = \frac{1}{4}\sum_{k=1}^4 i^k\|x + i^ky\|^2$
6.1.8. Let $m \in \mathbb{Z}$ and consider the complex-valuedᵇ function $f_m(x) = \frac{1}{\sqrt{2\pi}}e^{imx}$ defined on the interval $[-\pi, \pi]$. If $\langle\,,\,\rangle$ is the $L^2$ inner product on $C[-\pi, \pi]$, prove that $\{f_m : m \in \mathbb{Z}\}$ is an orthonormal set. This example is central to the study of Fourier series.

aThese say that if one knows the length of every vector, then one knows the inner product. bIf the reliance on complex numbers is alarming, recall Euler’s formula eimx = cos mx + i sin mx. One can instead work entirely with the real-valued functions cos mx and sin mx. The problem is that you then need integration by parts :(

6.2 Gram–Schmidt Orthogonalization and Orthogonal Sets

Throughout this chapter, and unless stated otherwise, $V$ is an inner product space over either the real or complex field $F$. We start with a simple definition, relating subspaces to orthogonality.

Definition 6.7. Let $U$ be a subspace of $V$. Its orthogonal complement $U^\perp$ is the set

$$U^\perp = \{x \in V : \forall u \in U,\ \langle x, u\rangle = 0\}$$

It is easy to check that U⊥ is itself a subspace of V and that U ∩ U⊥ = {0}. It can moreover be seen that U ⊆ (U⊥)⊥, though equality need not hold (see Exercise 6.2.7.).

Example 6.8. $U = \operatorname{Span}\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\-1\\3\end{pmatrix}\right\} \le \mathbb{R}^3$ has orthogonal complement $U^\perp = \operatorname{Span}\left\{\begin{pmatrix}0\\3\\1\end{pmatrix}\right\}$.

We will use the concept of an orthogonal complement repeatedly. As in the example, we’re about to see that we often have a direct sum decomposition V = U ⊕ U⊥; recall what this means:

∀x ∈ V, ∃ unique u ∈ U, w ∈ U⊥ such that x = u + w

Whenever $V = U \oplus U^\perp$, we will call $u$ and $w$ the orthogonal projections of $x$ onto $U$ and $U^\perp$ respectively; sometimes denoted $\pi_U(x)$ and $\pi_{U^\perp}(x)$. The first question is how one computes these.

Theorem 6.9. Suppose $\beta = \{u_1, \ldots, u_n\}$ is an orthogonalᵃ subset of $V$ consisting of non-zero vectors and let $U = \operatorname{Span}\beta$.

1. For any $x \in U$, we have

$$x = \sum_{j=1}^n \frac{\langle x, u_j\rangle}{\|u_j\|^2}\, u_j \qquad (*)$$

Moreover, β is linearly independent and thus a basis of U.

This simplifies to $x = \sum\langle x, u_j\rangle u_j$ if $\beta$ is an orthonormal set.

2. V = U ⊕ U⊥. Indeed, for any x ∈ V, we may write x = u + w where

$$u = \sum_{j=1}^n \frac{\langle x, u_j\rangle}{\|u_j\|^2}\, u_j \in U \qquad\text{and}\qquad w = x - u \in U^\perp$$

are the orthogonal projections.

ᵃI.e. $u_j \neq u_k \implies \langle u_j, u_k\rangle = 0$.

Since $\beta$ is a basis of $U$, we see that $(*)$ essentially calculates the co-ordinate vector $[x]_\beta \in F^n$.

Proof. 1. If $x \in U$, we may write $x = \sum_{j=1}^n a_ju_j$ for some scalars $a_j$. The orthogonality of $\beta$ shows that

$$\langle x, u_k\rangle = \sum_{j=1}^n a_j\langle u_j, u_k\rangle = a_k\|u_k\|^2$$

Since each $u_k \neq 0$, this gives $a_k = \langle x, u_k\rangle/\|u_k\|^2$, which is precisely $(*)$. Now let $x = 0$ to see that $\beta$ is linearly independent.

2. Clearly u ∈ U. For each uk ∈ β, the orthogonality of β tells us that

$$\langle w, u_k\rangle = \langle x - u, u_k\rangle = \langle x, u_k\rangle - \sum_{j=1}^n \frac{\langle x, u_j\rangle}{\|u_j\|^2}\langle u_j, u_k\rangle = \langle x, u_k\rangle - \langle x, u_k\rangle = 0$$

Thus w is perpendicular to a basis of U and thus to any element of U: we conclude that w ∈ U⊥ and that V = U + U⊥. Finally, U ∩ U⊥ = {0} forces the uniqueness of the decomposition.

Examples 6.10. 1. Consider the standard orthonormal basis $\beta = \{e_1, e_2\}$ of $\mathbb{R}^2$. For any $x = \begin{pmatrix}x_1\\x_2\end{pmatrix}$, we easily check that

$$\sum_{j=1}^2 \langle x, e_j\rangle e_j = x_1e_1 + x_2e_2 = x$$

2. $\beta = \{u_1, u_2, u_3\} = \left\{\begin{pmatrix}1\\2\\3\end{pmatrix}, \begin{pmatrix}2\\-1\\0\end{pmatrix}, \begin{pmatrix}3\\6\\-5\end{pmatrix}\right\}$ is an orthogonal set, and thus a basis, of $\mathbb{R}^3$. We compute the co-ordinates of $x = \begin{pmatrix}7\\4\\2\end{pmatrix}$ with respect to $\beta$:
$$x = \sum_{j=1}^3 \frac{\langle x, u_j\rangle}{\|u_j\|^2}\, u_j = \frac{7+8+6}{1+4+9}\begin{pmatrix}1\\2\\3\end{pmatrix} + \frac{14-4+0}{4+1+0}\begin{pmatrix}2\\-1\\0\end{pmatrix} + \frac{21+24-10}{9+36+25}\begin{pmatrix}3\\6\\-5\end{pmatrix}$$
$$= \frac{3}{2}\begin{pmatrix}1\\2\\3\end{pmatrix} + 2\begin{pmatrix}2\\-1\\0\end{pmatrix} + \frac{1}{2}\begin{pmatrix}3\\6\\-5\end{pmatrix} \implies [x]_\beta = \begin{pmatrix}3/2\\2\\1/2\end{pmatrix}$$

Compare this with the slow (augmented matrix) method for finding co-ordinates!

3. Revisiting Example 6.8, let $x = \begin{pmatrix}1\\1\\1\end{pmatrix}$. Since $\beta = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\-1\\3\end{pmatrix}\right\}$ is an orthogonal basis of $U$, we observe that
$$u = \begin{pmatrix}1\\0\\0\end{pmatrix} + \frac{2}{10}\begin{pmatrix}0\\-1\\3\end{pmatrix} = \frac{1}{5}\begin{pmatrix}5\\-1\\3\end{pmatrix}, \qquad w = x - u = \frac{2}{5}\begin{pmatrix}0\\3\\1\end{pmatrix}$$

are the orthogonal projections corresponding to $\mathbb{R}^3 = U \oplus U^\perp$.
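Here is a brief numpy sketch (my addition, not from the notes) of the projection formula of Theorem 6.9, reproducing the numbers of Example 6.10.3:

```python
import numpy as np

# Orthogonal projection onto U = Span{u1, u2} for an *orthogonal* spanning set.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, -1.0, 3.0])
x  = np.array([1.0, 1.0, 1.0])

def project(x, orthogonal_basis):
    return sum((x @ u) / (u @ u) * u for u in orthogonal_basis)

u = project(x, [u1, u2])
w = x - u
print(u)        # [ 1.  -0.2  0.6]  = (1/5)(5, -1, 3)
print(w)        # [ 0.   1.2  0.4]  = (2/5)(0, 3, 1)
print(u @ w)    # ~0: the two pieces are orthogonal
```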

The Gram–Schmidt Process  Theorem 6.9 tells us how to compute the orthogonal projections corresponding to $V = U \oplus U^\perp$, provided $U$ has a finite, orthogonal basis. Our next goal is to see that such a basis exists for any finite-dimensional subspace.

Theorem 6.11 (Gram–Schmidt). Suppose S = {s1,..., sn} is a linearly independent subset of an inner product space V. Construct a sequence of vectors u1,..., un inductively:

• Choose $u_1 = \lambda_1s_1$ where $\lambda_1 \neq 0$

• For each $k \ge 2$, choose $u_k = \lambda_k\left(s_k - \sum_{j=1}^{k-1} \frac{\langle s_k, u_j\rangle}{\|u_j\|^2}\, u_j\right)$ where $\lambda_k \neq 0$  $(\dagger)$

Then β := {u1,..., un} is an orthogonal set of non-zero vectors and Span β = Span S.

The purpose of the non-zero multiple $\lambda_k$ is to give you freedom to simplify your choice of $u_k$. Otherwise you may find you have lots of complicated fractions! If you want a set of orthonormal vectors, it is easier to scale once the algorithm is finished.

Example 6.12. Let $S = \{s_1, s_2, s_3\} = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}2\\-1\\3\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}\right\} \subseteq \mathbb{R}^3$. We compute:

$$u_1 = s_1 = \begin{pmatrix}1\\0\\0\end{pmatrix} \implies \|u_1\|^2 = 1$$
$$s_2 - \frac{\langle s_2, u_1\rangle}{\|u_1\|^2}u_1 = \begin{pmatrix}2\\-1\\3\end{pmatrix} - 2\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}0\\-1\\3\end{pmatrix}: \text{ choose } u_2 = \begin{pmatrix}0\\-1\\3\end{pmatrix} \implies \|u_2\|^2 = 10$$
$$s_3 - \frac{\langle s_3, u_1\rangle}{\|u_1\|^2}u_1 - \frac{\langle s_3, u_2\rangle}{\|u_2\|^2}u_2 = \begin{pmatrix}1\\1\\1\end{pmatrix} - \begin{pmatrix}1\\0\\0\end{pmatrix} - \frac{2}{10}\begin{pmatrix}0\\-1\\3\end{pmatrix} = \frac{2}{5}\begin{pmatrix}0\\3\\1\end{pmatrix}: \text{ choose } u_3 = \begin{pmatrix}0\\3\\1\end{pmatrix}$$

The orthogonality of $\beta = \{u_1, u_2, u_3\}$ should be clear, as should $\operatorname{Span}\{u_1, u_2\} = \operatorname{Span}\{s_1, s_2\}$. One easily observes that $\{u_1, \frac{1}{\sqrt{10}}u_2, \frac{1}{\sqrt{10}}u_3\}$ is an orthonormal basis of $\mathbb{R}^3$.
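For concreteness, here is a minimal classical Gram–Schmidt sketch in numpy (my addition; it takes every $\lambda_k = 1$, so fractions are not cleared away as in the hand computation above):

```python
import numpy as np

# Classical Gram–Schmidt for the standard inner product on R^n.
def gram_schmidt(vectors):
    basis = []
    for s in vectors:
        w = s - sum((s @ u) / (u @ u) * u for u in basis)
        basis.append(w)
    return basis

S = [np.array([1.0, 0.0, 0.0]),
     np.array([2.0, -1.0, 3.0]),
     np.array([1.0, 1.0, 1.0])]

for u in gram_schmidt(S):
    print(u)
# [1. 0. 0.], [ 0. -1.  3.], [0.  1.2 0.4]  -- the last is (2/5)(0, 3, 1)
```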

Proof of Theorem 6.11. For each $k \le n$, define $S_k = \{s_1, \ldots, s_k\}$ and $\beta_k = \{u_1, \ldots, u_k\}$. We prove by induction that each $\beta_k$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$. The Theorem is then the terminal case $k = n$.

(Base case k = 1) Certainly β1 = {λ1s1} is an orthogonal set and Span β1 = Span S1.

(Induction step) Fix $k \ge 2$, assume $\beta_{k-1}$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$. By Theorem 6.9, $u_k \in (\operatorname{Span}\beta_{k-1})^\perp$. We also see that $u_k \neq 0$, for if not,

$$(\dagger) \implies s_k \in \operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$$
and $S_k$ would be linearly dependent. It follows that $\beta_k$ is an orthogonal set of non-zero vectors. Moreover, $s_k \in \operatorname{Span}\beta_k \implies \operatorname{Span}S_k \le \operatorname{Span}\beta_k$. Since these spaces have the same (finite) dimension $k$, we conclude that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$. By induction, the result is proved.

By applying Gram–Schmidt to a basis, we instantly observe:

Corollary 6.13. Every finite-dimensional inner product space has an orthonormal basis.

Indeed if $\beta$ is an orthonormal basis of an $n$-dimensional inner product space $V$, then the co-ordinate isomorphism $\phi_\beta : V \to F^n$ defined by $\phi_\beta(x) = [x]_\beta$ is an isomorphism of inner product spaces, where we use the standard inner product on $F^n$:

$$\langle x, y\rangle = [y]_\beta^*\,[x]_\beta$$

Example 6.14. This time we work in the inner product space $P(\mathbb{R})$ equipped with the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\, dx$. Let $S = \{1, x, x^2\}$ and apply the algorithm:

$$f_1 = 1 \implies \|f_1\|^2 = \int_0^1 1\, dx = 1$$
$$x - \frac{\langle x, f_1\rangle}{\|f_1\|^2}f_1 = x - \int_0^1 x\, dx = x - \frac{1}{2} \implies \text{choose } f_2 = 2x - 1 \text{ with } \|f_2\|^2 = \int_0^1 (2x-1)^2\, dx = \frac{1}{3}$$
$$x^2 - \frac{\langle x^2, f_1\rangle}{\|f_1\|^2}f_1 - \frac{\langle x^2, f_2\rangle}{\|f_2\|^2}f_2 = x^2 - \int_0^1 x^2\, dx - \frac{\int_0^1 x^2(2x-1)\, dx}{1/3}(2x-1) = x^2 - x + \frac{1}{6}$$
$$\implies \text{choose } f_3 = 6x^2 - 6x + 1 \text{ with } \|f_3\|^2 = \int_0^1 (6x^2-6x+1)^2\, dx = \frac{1}{5}$$

This example can be extended to arbitrary degree since $\{1, x, x^2, \ldots\}$ is an infinite linearly independent family in $P(\mathbb{R})$. Indeed this shows that $(P(\mathbb{R}), \langle\,,\,\rangle)$ has an orthonormal basis.
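As a numerical cross-check (my addition, using scipy's quadrature routine), the polynomials produced in Example 6.14 are indeed orthogonal for this inner product:

```python
from scipy.integrate import quad

# L^2 inner product <f, g> = integral_0^1 f(x) g(x) dx, evaluated numerically.
def inner(f, g):
    return quad(lambda x: f(x) * g(x), 0.0, 1.0)[0]

f1 = lambda x: 1.0
f2 = lambda x: 2*x - 1
f3 = lambda x: 6*x**2 - 6*x + 1

print(inner(f1, f2), inner(f1, f3), inner(f2, f3))   # ~0, ~0, ~0
print(inner(f1, f1), inner(f2, f2), inner(f3, f3))   # 1, 1/3, 1/5
```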

Gram–Schmidt also shows that our earlier discussion of orthogonal projections is generic:

Corollary 6.15. If $U$ is a finite-dimensional subspace of an inner product space $V$, then $V = U \oplus U^\perp$ and the orthogonal projections may be computed as in Theorem 6.9.

If $U$ is an infinite-dimensional subspace of $V$, then it is not guaranteed that we have $V = U \oplus U^\perp$, and so the orthogonal projections may not be well-defined. Regardless, if $\beta$ is an orthonormal basis of $U$, it is common to describe the coefficients $\langle x, u\rangle$ for each $u \in \beta$ as the Fourier coefficients, and the infinite sum
$$\sum_{u \in \beta} \langle x, u\rangle u$$
as the orthogonal projection of $x$ onto $U$, provided the sum converges. Situations like these are largely outside the scope of this course.

Exercises.
6.2.1. Apply the Gram–Schmidt algorithm to each set $S$ and thus obtain an orthogonal basis for $\operatorname{Span}S$, then obtain the co-ordinate representation (or Fourier coefficients) of the given vector with respect to your basis.
   (a) $S = \left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\} \subseteq \mathbb{R}^3$, $x = \begin{pmatrix}1\\0\\1\end{pmatrix}$
   (b) $S = \left\{\begin{pmatrix}3 & 5\\-1 & 1\end{pmatrix}, \begin{pmatrix}-1 & 9\\5 & -1\end{pmatrix}, \begin{pmatrix}7 & -17\\2 & -6\end{pmatrix}\right\} \subseteq M_2(\mathbb{R})$ (Frobenius product), $X = \begin{pmatrix}-1 & 27\\-4 & 8\end{pmatrix}$
   (c) $S = \left\{\begin{pmatrix}1\\i\\0\end{pmatrix}, \begin{pmatrix}0\\1\\i\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\} \subseteq \mathbb{C}^3$ with $x = \begin{pmatrix}1\\1\\1\end{pmatrix}$
6.2.2. Let $S = \{s_1, s_2\} = \left\{\begin{pmatrix}1\\0\\3\end{pmatrix}, \begin{pmatrix}2\\-1\\0\end{pmatrix}\right\}$ and let $U = \operatorname{Span}S \le \mathbb{R}^3$. Find the orthogonal projection of $x = \begin{pmatrix}3\\-1\\-2\end{pmatrix}$ onto $U$. (Hint: First apply Gram–Schmidt. . . )
6.2.3. Find the orthogonal complement to $U = \operatorname{Span}\{x^2\} \le P_2(\mathbb{R})$ with respect to the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt$.
6.2.4. Let $V$ be an inner product space with orthonormal basis $\beta = \{v_1, \ldots, v_n\}$ and let $T \in \mathcal{L}(V)$. Prove that the matrix of $T$ with respect to $\beta$ has $jk^{\text{th}}$ entry $\langle T(v_k), v_j\rangle$.
6.2.5. If $U$ is a subspace of an inner product space $V$, prove that $U^\perp$ is a subspace of $V$ and that $U \cap U^\perp = \{0\}$.
6.2.6. Recall Exercise 6.1.8.: $\beta = \{\frac{1}{\sqrt{2\pi}}e^{imx} : m \in \mathbb{Z}\}$ is orthonormal with respect to $\langle f, g\rangle = \int_{-\pi}^{\pi} f(t)\overline{g(t)}\, dt$. The Fourier series of a function $f$ is its orthogonal projection onto $\operatorname{Span}\beta$:
$$F(f) := \sum_{m=-\infty}^{\infty} \left\langle f, \tfrac{1}{\sqrt{2\pi}}e^{imx}\right\rangle \tfrac{1}{\sqrt{2\pi}}e^{imx} = \frac{1}{2\pi}\sum_{m=-\infty}^{\infty}\left(\int_{-\pi}^{\pi} f(x)e^{-imx}\, dx\right)e^{imx}$$
Explicitly compute the integrals $\int_{-\pi}^{\pi} xe^{-imx}\, dx$ for each $m \in \mathbb{Z}$ and hence show that the Fourier series expression for the function $f(x) = x$ is
$$F(x) = \sum_{m=1}^{\infty} \frac{2(-1)^{m+1}}{m}\sin mx$$
Sketch a few of the Fourier approximations (sum up to $m = 5$ or $7$. . . ) and observe, when extended to $\mathbb{R}$, how they approximate a discontinuous periodic function.
6.2.7. (a) Suppose $U$ is a subspace of an inner product space $V$. Prove that $U \subseteq (U^\perp)^\perp$. If $V = U \oplus U^\perp$, prove that $U = (U^\perp)^\perp$.
   (b) Let $\ell^2$ be the set of square-summable sequences of real numbers (see page 6). Consider the sequences $u_1, u_2, u_3, \ldots$, where $u_j$ is the zero sequence except for a single $1$ in the $j^{\text{th}}$ entry. For instance,
$$u_4 = (0, 0, 0, 1, 0, 0, 0, 0, \ldots)$$
      i. Let $U = \operatorname{Span}\{u_j : j \in \mathbb{N}\}$. Prove that $U^\perp$ contains only the zero sequence.
      ii. Show that the sequence $y = \left(\frac{1}{n}\right)$ lies in $\ell^2$, but does not lie in $U$; thus conclude that $U$ is a proper subset of $(U^\perp)^\perp$.
6.2.8. (Hard) Much of Theorem 6.9 remains true, with suitable modifications, even if $\beta$ is an infinite set. Restate and prove as much as you can, and identify the false part(s). (Hint: a linear combination is always a finite sum!)

6.3 The Adjoint of a Linear Operator

Given a linear operator between inner product spaces $V, W$, we can describe a related operator.

Definition 6.16. Let $T \in \mathcal{L}(V, W)$ where $V, W$ are inner product spaces over the same field $F$. The adjoint of $T$ is a function $T^* : W \to V$ (read 'T-star') satisfyingᵃ

$$\forall v \in V,\ w \in W, \qquad \langle T^*(w), v\rangle = \langle w, T(v)\rangle$$

ᵃThe first inner product is computed within $V$ and the second within $W$. If $V = W$, we don't need the distinction.

There is much to unpack here, including the usual obvious questions.

Existence: Does every linear map have an adjoint?
Uniqueness: We wrote 'The adjoint. . . ': is this legitimate?
Linearity: If $T^*$ exists, is it a linear map?
Computation: How do we compute $T^*$ given $T$?

The first question turns out to be the trickiest: as we’ll see shortly, existence is guaranteed provided V is finite-dimensional. For the present, we assume existence and proceed with the other questions.

Lemma 6.17. If a linear map has an adjoint, then it is unique and linear.

Proof. Suppose T∗ and S∗ are adjoints of T. Then

$$\langle T^*(x), y\rangle = \langle x, T(y)\rangle = \langle S^*(x), y\rangle$$

Since this holds for all y, Lemma 6.3 part 4 says that ∀x,T∗(x) = S∗(x), whence T∗ = S∗. It is similarly easy to check that

$$\langle T^*(\lambda x + y), z\rangle = \langle \lambda T^*(x) + T^*(y), z\rangle$$

Another appeal to the lemma establishes the linearity of T∗.

Other basic facts such as $(T^*)^* = T$, $(TS)^* = S^*T^*$, $(\lambda T + S)^* = \overline{\lambda}T^* + S^*$, etc., are proved similarly. The next result shows how to compute the adjoint in finite dimensions: again existence is assumed.

Theorem 6.18. Let T ∈ L(V, W), where β = {v1,..., vn} and γ = {w1,..., wm} are orthonormal bases of V, W respectively.

1. The matrix $A = [T]_\beta^\gamma$ of $T$ with respect to these bases has $jk^{\text{th}}$ entry $A_{jk} = \langle T(v_k), w_j\rangle$.

2. The matrix of the adjoint is the conjugate transpose: $[T^*]_\gamma^\beta = \bigl([T]_\beta^\gamma\bigr)^*$.

3. If $V = W$ then the eigenvaluesᵃ of $T^*$ are the complex conjugates of those of $T$.

aThe eigenvectors need not have any relationship however!

Proof. 1. By definition, $T(v_k) = \sum_{i=1}^m A_{ik}w_i$. Now take inner products with $w_j$.

2. This is straight from the definition and part 1: the $jk^{\text{th}}$ entry of $B = [T^*]_\gamma^\beta$ is

$$B_{jk} = \langle T^*(w_k), v_j\rangle = \langle w_k, T(v_j)\rangle = \overline{\langle T(v_j), w_k\rangle} = \overline{A_{kj}}$$

3. This follows trivially from part 2: take determinants of $[T^* - \overline{t}I]_\beta = \bigl([T - tI]_\beta\bigr)^*$.

Adjoints are very easy to compute in finite dimensions, provided you are in possession of orthonormal bases. Gram–Schmidt shows that such exist, though in practice their computation is tricky. There are other options. One could try to brute-force a calculation straight from the definition, though this is usually difficult. One could also use merely an orthogonal basis $\beta = \{v_1, \ldots, v_n\}$ and compute $T^*(w)$ using projections:
$$T^*(w) = \sum_{i=1}^n \frac{\langle T^*(w), v_i\rangle}{\|v_i\|^2}\, v_i = \sum_{i=1}^n \frac{\langle w, T(v_i)\rangle}{\|v_i\|^2}\, v_i$$
Depending on the context, this might be faster than finding two orthonormal bases.
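A small numerical illustration (my addition) of Theorem 6.18: with respect to the standard orthonormal bases, the matrix of the adjoint is the conjugate transpose, and the defining property $\langle T^*(w), v\rangle = \langle w, T(v)\rangle$ can be checked directly. The matrix used is the one from Example 6.19.2 below.

```python
import numpy as np

# T = L_A : C^2 -> C^3 with the standard Hermitian inner products <x, y> = y* x.
A = np.array([[1j, 2],
              [1, 1 + 1j],
              [-3, 4 - 2j]])
A_star = A.conj().T                 # matrix of the adjoint T*

def inner(x, y):
    return np.conj(y) @ x

rng = np.random.default_rng(0)
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.isclose(inner(A_star @ w, v), inner(w, A @ v)))   # True: <T*(w), v> = <w, T(v)>
```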

Examples 6.19. 1. Let $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix}3 & 2\\-1 & 1\end{pmatrix}$. Since $A = [T]_\beta$ where $\beta = \{i, j\}$ is the standard orthonormal basis, we see that $[T^*]_\beta = A^T$ and so $T^* = L_{A^T}$:

$$[T^*]_\beta = A^T = \begin{pmatrix}3 & -1\\2 & 1\end{pmatrix}$$

2. Let $T = L_A \in \mathcal{L}(\mathbb{C}^2, \mathbb{C}^3)$ where $A = \begin{pmatrix}i & 2\\1 & 1+i\\-3 & 4-2i\end{pmatrix}$. This time $A = [T]_\beta^\gamma$ where $\beta = \{e_1, e_2\}$ and $\gamma = \{e_1, e_2, e_3\}$ are the standard orthonormal bases of $\mathbb{C}^2$ and $\mathbb{C}^3$. We conclude that $T^* = L_{A^*}$:
$$[T^*]_\gamma^\beta = A^* = \begin{pmatrix}-i & 1 & -3\\2 & 1-i & 4+2i\end{pmatrix}$$

3. Let $P_2(\mathbb{R})$ be equipped with the inner product defined so that $\beta = \{1, x, x^2\}$ is an orthonormal basis, and let $T = \frac{d}{dx} \in \mathcal{L}(P_2(\mathbb{R}))$. It is easy to find the matrices of $T$ and its adjoint:

$$[T]_\beta = \begin{pmatrix}0 & 1 & 0\\0 & 0 & 2\\0 & 0 & 0\end{pmatrix} \implies [T^*]_\beta = \begin{pmatrix}0 & 0 & 0\\1 & 0 & 0\\0 & 2 & 0\end{pmatrix}$$

The adjoint is therefore the linear map defined by

$$T^*(a + bx + cx^2) = ax + 2bx^2$$

It doesn't make sense to talk of the adjoint outside the context of inner product spaces: $T^*$ depends both on the linear map $T$ and the inner product(s). Here is an example where we take the same linear map as in the previous example, but with a different inner product on $P_2(\mathbb{R})$: the adjoint will also be different.

Example 6.20. In Example 6.14 we saw that $\beta = \{f_1, f_2, f_3\} = \{1,\ 2x-1,\ 6x^2-6x+1\}$ is orthogonal with respect to the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\, dx$ on $P_2(\mathbb{R})$. Given $T = \frac{d}{dx} \in \mathcal{L}(P_2(\mathbb{R}))$, we compute the adjoint in two different ways. Both rely on knowing the norms of the elements of $\beta$:

$$\|f_1\|^2 = 1, \qquad \|f_2\|^2 = \frac{1}{3}, \qquad \|f_3\|^2 = \frac{1}{5}$$

1. $\gamma = \{g_1, g_2, g_3\} = \{1,\ \sqrt{3}(2x-1),\ \sqrt{5}(6x^2-6x+1)\}$ is orthonormal. Observe that
$$T(g_1) = 0, \qquad T(g_2) = 2\sqrt{3}, \qquad T(g_3) = 6\sqrt{5}(2x-1) = \frac{6\sqrt{5}}{\sqrt{3}}g_2 = 2\sqrt{15}\, g_2$$
$$\implies [T]_\gamma = \begin{pmatrix}0 & 2\sqrt{3} & 0\\0 & 0 & 2\sqrt{15}\\0 & 0 & 0\end{pmatrix} \implies [T^*]_\gamma = \begin{pmatrix}0 & 0 & 0\\2\sqrt{3} & 0 & 0\\0 & 2\sqrt{15} & 0\end{pmatrix}$$
$$\implies T^*(ag_1 + bg_2 + cg_3) = 2\sqrt{3}\,a\,g_2 + 2\sqrt{15}\,b\,g_3$$

This can be transformed to the standard basis: you obtain the result at the end of the example.

2. Use projections with β directly:

$$T^*(p) = \sum_{i=1}^3 \frac{\langle T^*(p), f_i\rangle}{\|f_i\|^2}\, f_i = \sum_{i=1}^3 \frac{\langle p, f_i'\rangle}{\|f_i\|^2}\, f_i = \frac{\langle p, 2\rangle}{1/3}(2x-1) + \frac{\langle p, 12x-6\rangle}{1/5}(6x^2-6x+1)$$

If p(x) is written with respect to β, then the inner products are easily evaluated. If you want the answer with respect to the standard basis, things are nastier. If p(x) = a + bx + cx2, then

$$\langle p, 1\rangle = \int_0^1 a + bx + cx^2\, dx = a + \frac{1}{2}b + \frac{1}{3}c, \qquad \langle p, x\rangle = \frac{1}{2}a + \frac{1}{3}b + \frac{1}{4}c$$
from which
$$T^*(a + bx + cx^2) = 6\left(a + \frac{1}{2}b + \frac{1}{3}c\right)(2x - 1) + 30\left(a + \frac{2}{3}b + \frac{1}{2}c - a - \frac{1}{2}b - \frac{1}{3}c\right)(6x^2 - 6x + 1)$$
$$= (-6a + 2b + 3c) + (12a - 24b - 26c)x + (30b + 30c)x^2$$

With respect to the standard basis e = {1, x, x2}, we therefore have

0 1 0 −6 2 3  ∗ [T]e = 0 0 2 and [T ]e =  12 −24 −26 0 0 0 0 30 30

This is the same result you’d obtain by applying a change of basis to the answer in version 1.

The moral is that adjoints are easy once you have an orthonormal basis and really hard otherwise!

Existence of the adjoint on a finite-dimensional inner product space  We first observe a natural relation between a finite-dimensional inner product space $V$ and its dual space: the space of linear maps $\mathcal{L}(V, F)$.

Theorem 6.21 (Riesz Representation Theorem). If $V$ is finite-dimensional and $g : V \to F$ is linear, there exists a unique $y \in V$ such that $g(x) = \langle x, y\rangle$.

Proof. If $g$ is the zero map, take $y = 0$. Otherwise, $\operatorname{rank} g = 1$ and the rank–nullity theorem says that $U = \mathcal{N}(g)^\perp$ is one-dimensional. Let $u \in U$ be a unit vector and define, independently of $u$,

$$y := \overline{g(u)}\, u \in V$$

Now observe:

• $\langle u, y\rangle = g(u)\|u\|^2 = g(u)$

• $x \in U^\perp \implies \langle x, y\rangle = 0 = g(x)$

We conclude that g(x) = hx, yi for all x ∈ V. The uniqueness of y follows from the cancellation property (Lemma 6.3, part 4).

Corollary 6.22 (Existence of the adjoint). Every linear map on a finite-dimensional inner product space has an adjoint.

Proof. Suppose V is finite-dimensional, let T ∈ L(V, W) and suppose z ∈ W is given. Simply define T∗(z) := y where y ∈ V is the unique vector in Riesz’s Theorem arising from the linear map

$$g : V \to F, \qquad g(x) = \langle T(x), z\rangle$$

Now check the defining property:

$$\langle T^*(z), x\rangle = \langle y, x\rangle = \overline{g(x)} = \overline{\langle T(x), z\rangle} = \langle z, T(x)\rangle$$
whence $T^*$ is the adjoint of $T$.

• Note how only the domain V is required to be finite-dimensional.

• Riesz’s Theorem says that y 7→ g is an isomorphism of vector spaces V =∼ L(V, F). While there are infinitely many isomorphisms between these spaces, the inner product structure identifies a canonical or preferred choice.

• Riesz’s Theorem and the existence Corollary apply to the more general situation of continuous linear operators on Hilbert spaces: here V can be infinite-dimensional. The proof is a little trickier since the rank–nullity theorem isn’t so applicable in infinite-dimensions.

Least Squares Problems  The adjoint provides a method for computing solutions to a certain class of optimization problems. First we observe something about orthogonal projections.

Lemma 6.23. Suppose $V = U \oplus U^\perp$ so that the orthogonal projections $\pi_U$ and $\pi_{U^\perp}$ are well-defined. For any $y \in V$, the projection $\pi_U(y)$ is the unique closest element to $y$ lying in $U$:

$$\forall u \in U, \qquad \|y - \pi_U(y)\| \le \|y - u\|$$

Proof. Simply apply Pythagoras' Theorem (Exercise 6.1.6.): since $y - \pi_U(y) = \pi_{U^\perp}(y) \in U^\perp$,

$$\|y - u\|^2 = \|y - \pi_U(y) + \pi_U(y) - u\|^2 = \|y - \pi_U(y)\|^2 + \|\pi_U(y) - u\|^2 \ge \|y - \pi_U(y)\|^2$$
with equality if and only if $\pi_U(y) = u$.

A least-squares problem is a modified version of the above:

Definition 6.24. Suppose $T \in \mathcal{L}(X, Y)$ and $y \in Y$ are given, and consider the linear equation $T(x) = y$. A least-squares solutionᵃ is a vector $x_0 \in X$ which minimizes $\|y - T(x)\|$.

ᵃMinimizing a norm in $F^n$ means minimizing a sum of squares.

If $y \in \mathcal{R}(T)$, then a least-squares solution is simply a solution: $\|y - T(x_0)\| = 0 \implies T(x_0) = y$. Otherwise, we want $x_0$ to come as close as possible to solving $T(x) = y$. With a little simplification, we can develop a method to attack these problems.

• If $Y = \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$, the Lemma tells us that $x_0$ must satisfy $T(x_0) = \pi_{\mathcal{R}(T)}(y)$.

• Additionally, if $T$ has an adjoint, the Fundamental Subspaces Theorem (Exercise 6.3.4.) tells us that $x_0$ is a least-squares solution if and only if
$$y - T(x_0) = \pi_{\mathcal{R}(T)^\perp}(y) \in \mathcal{R}(T)^\perp = \mathcal{N}(T^*) \iff T^*T(x_0) = T^*(y)$$

These conditions are certainly met when X, Y are finite-dimensional.

Lemma 6.25. If X, Y are finite-dimensional, then rank T∗T = rank T = rank T∗. In particular;

1. $T^*T$ and $T^*$ have the same range and so $T^*T(x_0) = T^*(y)$ always has at least one solution.

2. If $\operatorname{rank} T = \dim X$ (equivalently $\operatorname{null} T = 0$), then $T(x) = y$ has a unique least-squares solution $x_0 = (T^*T)^{-1}T^*(y)$. Moreover, the orthogonal projection onto $\mathcal{R}(T)$ is $\pi_{\mathcal{R}(T)} = T(T^*T)^{-1}T^*$.

Proof. By the rank–nullity theorem, we need only check that $T^*T$ and $T$ have the same nullspace: $T(x) = 0 \implies T^*T(x) = 0$ and
$$T^*T(x) = 0 \implies 0 = \langle T^*T(x), x\rangle = \langle T(x), T(x)\rangle = \|T(x)\|^2 \implies T(x) = 0$$

Least-squares problems are most commonly expressed with matrices: i.e. when $T = L_A$ for some $A \in M_{m\times n}(F)$, so that $T : F^n \to F^m$ is a map between standard inner product spaces. Equivalently this can be expressed in terms of a system of linear equations.

Example 6.26. Find the unique least-squares solution to the overdetermined system
$$\begin{cases}x_1 + 3x_2 = 5\\ x_2 = 1\\ x_1 + 2x_2 = 1\end{cases} \qquad\Longleftrightarrow\qquad Ax = y \quad\text{where}\quad A = \begin{pmatrix}1 & 3\\0 & 1\\1 & 2\end{pmatrix},\ y = \begin{pmatrix}5\\1\\1\end{pmatrix}$$

First note that the system does not have an exact solution. Since $\operatorname{rank} A = 2$, we compute:

$$(A^*A)^{-1}A^* = \left[\begin{pmatrix}1 & 0 & 1\\3 & 1 & 2\end{pmatrix}\begin{pmatrix}1 & 3\\0 & 1\\1 & 2\end{pmatrix}\right]^{-1}\begin{pmatrix}1 & 0 & 1\\3 & 1 & 2\end{pmatrix} = \begin{pmatrix}2 & 5\\5 & 14\end{pmatrix}^{-1}\begin{pmatrix}1 & 0 & 1\\3 & 1 & 2\end{pmatrix}$$
$$= \frac{1}{3}\begin{pmatrix}14 & -5\\-5 & 2\end{pmatrix}\begin{pmatrix}1 & 0 & 1\\3 & 1 & 2\end{pmatrix} = \frac{1}{3}\begin{pmatrix}-1 & -5 & 4\\1 & 2 & -1\end{pmatrix}$$
$$\implies x_0 = \frac{1}{3}\begin{pmatrix}-1 & -5 & 4\\1 & 2 & -1\end{pmatrix}\begin{pmatrix}5\\1\\1\end{pmatrix} = \begin{pmatrix}-2\\2\end{pmatrix}$$

We can check this explicitly using a little multi-variable calculus: let

$$f(x_1, x_2) = \|y - Ax\|^2 = (x_1 + 3x_2 - 5)^2 + (x_2 - 1)^2 + (x_1 + 2x_2 - 1)^2$$
then $f$ is minimized when

$$\frac{\partial f}{\partial x_1} = 4x_1 + 10x_2 - 12 = 0 = \frac{\partial f}{\partial x_2} = 10x_1 + 28x_2 - 36$$
which is precisely when $(x_1, x_2) = (-2, 2)$. Moreover, the orthogonal projection onto the column space of $A$ is left-multiplication by the matrix
$$A(A^*A)^{-1}A^* = \frac{1}{3}\begin{pmatrix}1 & 3\\0 & 1\\1 & 2\end{pmatrix}\begin{pmatrix}-1 & -5 & 4\\1 & 2 & -1\end{pmatrix} = \frac{1}{3}\begin{pmatrix}2 & 1 & 1\\1 & 2 & -1\\1 & -1 & 2\end{pmatrix}$$
which is easy to verify directly.
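Numerically (my addition), the same least-squares solution drops out of the normal equations or numpy's built-in solver:

```python
import numpy as np

# Example 6.26 via the normal equations A*A x = A*y, and via numpy's lstsq.
A = np.array([[1.0, 3.0],
              [0.0, 1.0],
              [1.0, 2.0]])
y = np.array([5.0, 1.0, 1.0])

x0 = np.linalg.solve(A.T @ A, A.T @ y)          # normal equations
x1, *_ = np.linalg.lstsq(A, y, rcond=None)      # library least-squares routine
print(x0, x1)                                   # both give [-2.  2.]

P = A @ np.linalg.inv(A.T @ A) @ A.T            # projection onto the column space
print(np.round(3 * P))                          # 3P = [[2,1,1],[1,2,-1],[1,-1,2]]
```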

A major application of least-squares problems is the computation of the regression line $y = c_0 + c_1t$ for a data set $\{(t_j, y_j) : 1 \le j \le m\}$: this is used to predict $y$ given a value of $t$. By definition, the regression line minimizes the sum of the squares of the vertical deviations
$$\sum_{j=1}^m (y_j - c_1t_j - c_0)^2 = \|y - Ax\|^2 \qquad\text{where}\qquad A = \begin{pmatrix}t_1 & 1\\\vdots & \vdots\\t_m & 1\end{pmatrix},\quad x = \begin{pmatrix}c_1\\c_0\end{pmatrix},\quad y = \begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix}$$

With the indicated notation, we recognize this as a least-squares problem.

Example 6.27. Given the data set $\{(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)\}$, we compute

$$A = \begin{pmatrix}0 & 1\\1 & 1\\2 & 1\\3 & 1\\4 & 1\end{pmatrix}, \quad y = \begin{pmatrix}1\\1\\0\\2\\2\end{pmatrix} \implies x_0 = \begin{pmatrix}c_1\\c_0\end{pmatrix} = (A^*A)^{-1}A^*y = \begin{pmatrix}30 & 10\\10 & 5\end{pmatrix}^{-1}\begin{pmatrix}15\\6\end{pmatrix} = \begin{pmatrix}0.3\\0.6\end{pmatrix}$$

The regression line therefore has equation $y = 0.3t + 0.6$.
The process can be applied more generally to approximate a data set using other functions. For instance, to find the best-fitting quadratic $y = c_2t^2 + c_1t + c_0$, we'd work with

$$A = \begin{pmatrix}t_1^2 & t_1 & 1\\\vdots & \vdots & \vdots\\t_m^2 & t_m & 1\end{pmatrix}, \quad x = \begin{pmatrix}c_2\\c_1\\c_0\end{pmatrix}, \quad y = \begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix} \implies \sum_{j=1}^m (y_j - c_2t_j^2 - c_1t_j - c_0)^2 = \|y - Ax\|^2$$

Provided we have at least three distinct values t1, t2, t3, the matrix A is guaranteed to have rank 3 and there will be a best-fitting least-squares quadratic: in this case

$$y = \frac{1}{70}(15t^2 - 39t + 72)$$
This curve and the best-fitting straight line are shown below.

[Figure: the best-fitting least-squares quadratic and the regression line plotted with the data points of Example 6.27; horizontal axis $t$ (from 0 to 4), vertical axis $y$ (from 0 to 2).]
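A compact numpy version of these fits (my addition), solving the normal equations for both the line and the quadratic of Example 6.27:

```python
import numpy as np

# Regression line and best-fitting quadratic via the normal equations A*A x = A*y.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 0.0, 2.0, 2.0])

A1 = np.column_stack([t, np.ones_like(t)])           # columns t, 1
c1, c0 = np.linalg.solve(A1.T @ A1, A1.T @ y)
print(c1, c0)                                        # 0.3, 0.6  ->  y = 0.3 t + 0.6

A2 = np.column_stack([t**2, t, np.ones_like(t)])     # columns t^2, t, 1
coeffs = np.linalg.solve(A2.T @ A2, A2.T @ y)
print(coeffs * 70)                                   # ~[15, -39, 72]  ->  y = (15 t^2 - 39 t + 72)/70
```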

The process can be extended to a degree $n$ polynomial or to other families of functions. For instance, to find the best-fitting least-squares curve of the form $y = ae^t + b$ to the data, we would apply the method to
$$A = \begin{pmatrix}e^{t_1} & 1\\\vdots & \vdots\\e^{t_m} & 1\end{pmatrix}, \quad x = \begin{pmatrix}a\\b\end{pmatrix}, \quad y = \begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix} \implies \sum_{j=1}^m (y_j - ae^{t_j} - b)^2 = \|y - Ax\|^2$$

Exercises.
6.3.1. For each of the inner product spaces $V$ and linear transformations $g : V \to F$, find a vector $y \in V$ such that $g(x) = \langle x, y\rangle$ for all $x \in V$.
   (a) $V = \mathbb{R}^3$ with the standard inner product, and $g\begin{pmatrix}x\\y\\z\end{pmatrix} = x - 2y + 4z$
   (b) $V = \mathbb{C}^2$ with the standard inner product, and $g\begin{pmatrix}z\\w\end{pmatrix} = iz - 2w$
   (c) $V = P_2(\mathbb{R})$ with the $L^2$ inner product $\langle f, h\rangle = \int_0^1 f(t)h(t)\, dt$, and $g(f) = f(0) + f'(1)$
6.3.2. For each inner product space $V$ and linear operator $T \in \mathcal{L}(V)$, evaluate $T^*$ on the given vector.
   (a) $V = \mathbb{R}^2$ with the standard inner product, $T\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}2x+y\\x-3y\end{pmatrix}$ and $x = \begin{pmatrix}3\\5\end{pmatrix}$
   (b) $V = \mathbb{C}^2$ with the standard inner product, $T\begin{pmatrix}z\\w\end{pmatrix} = \begin{pmatrix}2z+iw\\(1-i)z\end{pmatrix}$ and $x = \begin{pmatrix}3-i\\1+2i\end{pmatrix}$
   (c) $V = P_1(\mathbb{R})$ with $\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt$, $T(f) = f' + 3f$ and $f(t) = 4 - 2t$
6.3.3. Let $T(f) = f''$ be a linear transformation of $P_2(\mathbb{R})$ and let $e = \{1, x, x^2\}$ be the standard basis. Find $T^*(a + bx + cx^2)$:
   (a) With respect to the inner product where $e$ is orthonormal;
   (b) With respect to the inner product $\langle f, g\rangle = \int_{-1}^1 f(t)g(t)\, dt$.
   (Hint: $\{1, x, x^2 - \frac{1}{3}\}$ is orthogonal, so project $T^*(1)$, etc., onto its span. . . )
6.3.4. Prove the Fundamental Subspaces Theorem for an inner product space $V$: let $T \in \mathcal{L}(V)$ have adjoint $T^*$ and let $\mathcal{R}$ and $\mathcal{N}$ denote the range and null space of a linear operator.
   (a) $\mathcal{R}(T^*)^\perp = \mathcal{N}(T)$
   (b) If $V$ is finite-dimensional, $\mathcal{R}(T^*) = \mathcal{N}(T)^\perp$
6.3.5. Let $y, z \in V$ be fixed vectors and define $T \in \mathcal{L}(V)$ by $T(x) = \langle x, y\rangle z$. Show that $T^*$ exists and find an explicit expression.
6.3.6. (a) In the proof of Theorem 6.21, explain why $y$ depends only on $g$ (not $u$) and why the bullet points are enough to show that $g(x) = \langle x, y\rangle$.
   (b) In the proof of Corollary 6.22, check that $g(x) := \langle T(x), z\rangle$ is linear.
6.3.7. Suppose $A \in M_{m\times n}(F)$ is such that no two columns are identical. Prove that $A^*A$ is diagonal if and only if the columns of $A$ are orthogonal. What additionally would it mean if $A^*A = I$?
6.3.8. Check the calculation for the best-fitting least-squares quadratic in Example 6.27.
6.3.9. Find the best-fitting least-squares linear and quadratic approximations to the data set

$$\{(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)\}$$

6.3.10. Consider the regression line with equation $y = ct + d$ for a data set $\{(t_j, y_j) : 1 \le j \le m\}$.
   (a) Show that the equations $A^*Ax_0 = A^*y$ can be written in matrix form
$$\begin{pmatrix}\sum t_j^2 & \sum t_j\\\sum t_j & m\end{pmatrix}\begin{pmatrix}c\\d\end{pmatrix} = \begin{pmatrix}\sum t_jy_j\\\sum y_j\end{pmatrix}$$
   (b) Show that $c = \frac{\operatorname{Cov}(t, y)}{\sigma_t^2}$ and $d = \overline{y} - c\overline{t}$, where $\overline{t} = \frac{1}{m}\sum t_j$, $\overline{y} = \frac{1}{m}\sum y_j$, $\sigma_t^2 = \frac{1}{m}\sum (t_j - \overline{t})^2$ and $\operatorname{Cov}(t, y) = \frac{1}{m}\sum (t_j - \overline{t})(y_j - \overline{y})$ are the means, variance and covariance respectively.

6.4 Normal and Self-Adjoint Operators

We now consider a fundamental question: given a linear operator $T \in \mathcal{L}(V)$, can we find an orthonormal basis of $V$ consisting of eigenvectors of $T$? For a general map this is far too much to hope for: indeed the linear maps for which this is possible are very, very special. We start with the definition and the major result.

Definition 6.28. Suppose T is a linear operator on an inner product space V which has an adjoint. We say that T is:

Normal if TT∗ = T∗T, where juxtaposition is composition of functions;

Self-adjoint if T∗ = T.

The definitions for square matrices are identical, where ∗ denotes the conjugate-transpose.

Note that a real matrix $A \in M_n(\mathbb{R})$ is self-adjoint if and only if it is symmetric: $A^T = A$. If $T$ is self-adjoint then it is certainly normal, but the converse is false, as the next example shows.

Example 6.29. The (non-symmetric) real matrix $A = \begin{pmatrix}2 & -1\\1 & 2\end{pmatrix}$ is normal but not self-adjoint:

$$AA^T = \begin{pmatrix}2 & -1\\1 & 2\end{pmatrix}\begin{pmatrix}2 & 1\\-1 & 2\end{pmatrix} = \begin{pmatrix}5 & 0\\0 & 5\end{pmatrix} = \begin{pmatrix}2 & 1\\-1 & 2\end{pmatrix}\begin{pmatrix}2 & -1\\1 & 2\end{pmatrix} = A^TA$$

More generally, every non-zero skew-hermitian matrix $A^* = -A$ is normal but not self-adjoint:

$$A^* = -A \implies AA^* = -A^2 = A^*A$$
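A two-line numpy check (my addition) of Example 6.29:

```python
import numpy as np

# Example 6.29: A commutes with its transpose (normal) but is not symmetric.
A = np.array([[2.0, -1.0],
              [1.0, 2.0]])

print(np.allclose(A @ A.T, A.T @ A))   # True  -> normal
print(np.allclose(A, A.T))             # False -> not self-adjoint
```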

Our main goal is to prove the following result: it comes in two flavors, depending on whether an inner product space is real or complex.

Theorem 6.30 (Spectral Theorem, version 1). Let $T$ be a linear operator on a finite-dimensional inner product space $V$ over $F$.

Complex case F = C: V has an orthonormal basis of eigenvectors of T if and only if T is normal.

Real case F = R: V has an orthonormal basis of eigenvectors of T if and only if T is self-adjoint.

In particular, every such linear operator is diagonalizable.

One direction of the theorem is very simple. If β is an orthonormal basis of eigenvectors of T, then

$$[T]_\beta = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \implies [T^*]_\beta = \operatorname{diag}(\overline{\lambda_1}, \ldots, \overline{\lambda_n})$$

In the real case, these matrices are identical and so T∗ = T. In the complex case, we clearly have

$$[TT^*]_\beta = \operatorname{diag}(|\lambda_1|^2, \ldots, |\lambda_n|^2) = [T^*T]_\beta$$

The tricky part of the spectral theorem is the converse! Before starting this, we consider several easy examples.

Examples 6.31. 1. We diagonalize the self-adjoint linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix}6 & 3\\3 & -2\end{pmatrix}$.
Characteristic polynomial: $p(t) = (6-t)(-2-t) - 9 = t^2 - 4t - 21 = (t-7)(t+3)$

Eigenvalues: $t_1 = 7$, $t_2 = -3$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{10}}\begin{pmatrix}3\\1\end{pmatrix}$, $w_2 = \frac{1}{\sqrt{10}}\begin{pmatrix}-1\\3\end{pmatrix}$
The basis $\beta = \{w_1, w_2\}$ is orthonormal, with respect to which $[T]_\beta = \begin{pmatrix}7 & 0\\0 & -3\end{pmatrix}$ is diagonal.

2. The map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix}1 & 3\\0 & -2\end{pmatrix}$ is neither self-adjoint nor normal.ᵃ It is diagonalizable:
$$\gamma = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\-1\end{pmatrix}\right\}, \qquad [T]_\gamma = \begin{pmatrix}1 & 0\\0 & -2\end{pmatrix}$$

In accordance with the spectral theorem, $\gamma$ is not orthogonal.

3. Let $A = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}$ and consider $T = L_A$ acting on both $\mathbb{C}^2$ and $\mathbb{R}^2$. Since $T$ is normal but not self-adjoint, we'll see how the field really matters in the spectral theorem.
First the complex case: $T \in \mathcal{L}(\mathbb{C}^2)$ is normal and thus diagonalizable with respect to an orthonormal basis of eigenvectors. Here are the details.
Characteristic polynomial: $p(t) = t^2 + 1 = (t-i)(t+i)$

Eigenvalues: $t_1 = i$, $t_2 = -i$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}i\\1\end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}-i\\1\end{pmatrix}$
Certainly $\langle w_1, w_2\rangle = 0$, $\beta = \{w_1, w_2\}$ is orthonormal, and $[T]_\beta = \begin{pmatrix}i & 0\\0 & -i\end{pmatrix}$ is diagonal.
Now for the real case: $T \in \mathcal{L}(\mathbb{R}^2)$ is not self-adjoint and thus should not be diagonalizable with respect to an orthonormal basis of eigenvectors. Indeed this is trivial: since the characteristic polynomial has no roots in $\mathbb{R}$, there are no eigenvalues! It is also clear geometrically: $T$ is rotation by 90° counter-clockwise around the origin, so it has no eigenvectors.

ᵃ$AA^* = \begin{pmatrix}1 & 3\\0 & -2\end{pmatrix}\begin{pmatrix}1 & 0\\3 & -2\end{pmatrix} = \begin{pmatrix}10 & -6\\-6 & 4\end{pmatrix} \neq \begin{pmatrix}1 & 3\\3 & 13\end{pmatrix} = \begin{pmatrix}1 & 0\\3 & -2\end{pmatrix}\begin{pmatrix}1 & 3\\0 & -2\end{pmatrix} = A^*A$
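For the symmetric matrix of Example 6.31.1, numpy's eigh routine carries out exactly this diagonalization by an orthonormal eigenbasis (my addition):

```python
import numpy as np

# Spectral theorem in action: eigh returns real eigenvalues and an orthonormal
# basis of eigenvectors for a symmetric (self-adjoint) matrix.
A = np.array([[6.0, 3.0],
              [3.0, -2.0]])

eigenvalues, Q = np.linalg.eigh(A)                       # columns of Q are orthonormal
print(eigenvalues)                                       # [-3.  7.]
print(np.allclose(Q.T @ Q, np.eye(2)))                   # True
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))    # True: diagonalized
```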

Proving the Spectral Theorem for Self-Adjoint Operators We first need a few basic properties of self-adjoint operators.

Lemma 6.32. Let T ∈ L(V) be self-adjoint.

1. If $W \le V$ is $T$-invariant, then the restriction $T_W$ is self-adjoint;

2. Every eigenvalue of $T$ is real;

3. If dim V is finite then T has an eigenvalue.

It is irrelevant whether V is real or complex. However, if V is complex, the restriction to self-adjoint operators in part 3 is not required: the characteristic polynomial of every linear map splits over C (fundamental theorem of algebra) and so every linear map has an eigenvalue.

Proof. 1. Let $w_1, w_2 \in W$. Then

$$\langle T_W(w_1), w_2\rangle = \langle T(w_1), w_2\rangle = \langle w_1, T(w_2)\rangle = \langle w_1, T_W(w_2)\rangle$$

2. Suppose $(\lambda, x)$ is an eigenpair. Then

$$\lambda\|x\|^2 = \langle T(x), x\rangle = \langle x, T(x)\rangle = \overline{\lambda}\|x\|^2 \implies \lambda \in \mathbb{R}$$

3. As observed above, we need only prove the result when $V$ is real. Choose any orthonormal basis $\gamma$ of $V$, let $A = [T]_\gamma \in M_n(\mathbb{R})$, and define $S := L_A \in \mathcal{L}(\mathbb{C}^n)$. Observe:
• The characteristic polynomial of $S$ splits over $\mathbb{C}$, whence there exists an eigenvalue $\lambda \in \mathbb{C}$.
• The characteristic polynomials of $S$ and $T$ are identical (to that of $A$).
• $S$ is self-adjoint and thus (part 2) $\lambda \in \mathbb{R}$.

It follows that T has the same real eigenvalue λ.

We are now in a position to prove the spectral theorem for self-adjoint operators on a finite-dimensional inner product space V. The proof is perfectly acceptable if V is real or complex.

Proof of the Spectral Theorem (self-adjoint case). Suppose $T$ is self-adjoint on a finite-dimensional $V$: we prove by induction on $\dim V$.

(Base case) If $\dim V = 1$, the result is trivial: $V = \operatorname{Span}\{x\}$ and $T(x) = \lambda x$ for some unit vector $x$ and scalar $\lambda \in \mathbb{R}$.

(Induction step) Fix $n \in \mathbb{N}$ and assume that every self-adjoint operator on an inner product space of dimension $n$ satisfies the spectral theorem. Suppose $\dim V = n + 1$. By part 3 of the Lemma, $T$ has an eigenpair $(\lambda, x)$ where we may assume $x$ has unit length. Let $W = \operatorname{Span}\{x\}^\perp$. If $w \in W$, then

$$\langle x, T(w)\rangle = \langle T(x), w\rangle = \lambda\langle x, w\rangle = 0 \qquad (*)$$
whence $W$ is $T$-invariant. By part 1 of the Lemma, $T_W$ is self-adjoint on the $n$-dimensional inner product space $W$. By the induction hypothesis, $T_W$ may be diagonalized by an orthonormal basis $\gamma$ of $W$, whence $T$ is diagonalized by the orthonormal basis $\beta = \gamma \cup \{x\}$ of $V$.

Proving the Spectral Theorem for Normal Operators What changes for normal operators on complex inner product spaces? The proof is almost identical:

• We don’t need part 3 of the lemma: every linear operator on a finite-dimensional complex inner product space has an eigenvalue.

• The induction step requires a little more: (∗) doesn’t work for normal operators and we need a stronger result than part 1 of the Lemma to see that TW is normal; this is covered in the exercises.

We finish by summarizing some of the basic properties of normal operators. Note that the lemma applies to self-adjoint operators as a special case.

Lemma 6.33. Let T be normal on V. Then:

1. $\forall x \in V,\ \|T(x)\| = \|T^*(x)\|$;

2. $T - kI$ is normal for any scalar $k$;

3. $T(x) = \lambda x \iff T^*(x) = \overline{\lambda}x$, so that $T$ and $T^*$ have the same eigenvectors and conjugate eigenvalues.ᵃ

4. Distinct eigenvalues of $T$ have orthogonal eigenvectors.

aThis recovers the previously established fact that λ ∈ R if T is self-adjoint.

Proof. 1. $\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle T^*T(x), x\rangle = \langle TT^*(x), x\rangle = \langle T^*(x), T^*(x)\rangle = \|T^*(x)\|^2$.

2. $\langle x, (T - kI)(y)\rangle = \langle x, T(y)\rangle - \overline{k}\langle x, y\rangle = \langle T^*(x), y\rangle - \langle \overline{k}x, y\rangle = \langle (T^* - \overline{k}I)(x), y\rangle$ shows that $T - kI$ has adjoint $T^* - \overline{k}I$. It is trivial to check that these commute.

3. Using parts 1 & 2, $T(x) = \lambda x \iff \|(T - \lambda I)(x)\| = 0 \iff \|(T^* - \overline{\lambda}I)(x)\| = 0 \iff T^*(x) = \overline{\lambda}x$.

4. In part this follows from the spectral theorem, but we can also prove it more straightforwardly: suppose $T(x) = \lambda x$ and $T(y) = \mu y$ where $\lambda \neq \mu$. By part 3,

$$\lambda\langle x, y\rangle = \langle \lambda x, y\rangle = \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, \overline{\mu}y\rangle = \mu\langle x, y\rangle$$

This is a contradiction unless $\langle x, y\rangle = 0$.

Schur's Lemma  We've seen that only normal or self-adjoint operators on finite-dimensional spaces have an orthonormal eigenbasis. It is reasonable to ask how useful an orthonormal basis can be for other operators. Here is one answer.

Lemma 6.34 (Schur). Suppose T is a linear operator on a finite-dimensional inner product space V. If the characteristic polynomial of T splits, then there exists an orthonormal basis β of V such that [T]β is upper-triangular.

Since the proof is very similar to the spectral theorem, we leave it to the exercises. The conclusion of Schur’s Lemma is weaker than the spectral theorem, though it applies to more operators: indeed if V is complex, the result applies to any T! Every example of the spectral theorem we’ve seen is therefore also an example of Schur’s Lemma. Example 6.31.2 provides another, since the matrix A is already upper triangular with respect to the standard orthonormal basis β = {e1, e2}.

Here is a further example:

Example 6.35. Consider $T(f) = 2f'(x) + xf(1)$ as a linear map $T \in \mathcal{L}(P_1(\mathbb{R}))$ with respect to the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt$. We have
$$T(a + bx) = 2b + (a + b)x$$

If $[T]_\beta$ is to be upper-triangular, the first vector in $\beta$ must be an eigenvector of $T$. It is easily checked that $f_1 = 1 + x$ is such with eigenvalue $2$. To find a basis satisfying Schur's Lemma, we need only find $f_2$ orthogonal to this and then normalize. This can be done by brute force since the problem is small, but for the sake of practice we apply Gram–Schmidt to the polynomial $1$:

$$1 - \frac{\langle 1, 1 + x\rangle}{\|1 + x\|^2}(1 + x) = 1 - \frac{1 + \frac{1}{2}}{1 + 1 + \frac{1}{3}}(1 + x) = \frac{1}{14}(5 - 9x) \implies f_2 = 5 - 9x$$
Indeed we obtain an upper-triangular matrix for $T$:

$$T(f_2) = -18 - 4x = -13(1 + x) - (5 - 9x) = -13f_1 - f_2 \implies [T]_{\{f_1, f_2\}} = \begin{pmatrix}2 & -13\\0 & -1\end{pmatrix}$$

We can also work with the corresponding orthonormal basis as posited in the theorem, though the matrix is messier:
$$\beta = \{g_1, g_2\} = \left\{\sqrt{\tfrac{3}{7}}(1 + x),\ \tfrac{1}{\sqrt{7}}(5 - 9x)\right\} \implies [T]_\beta = \begin{pmatrix}2 & -\frac{13}{\sqrt{3}}\\0 & -1\end{pmatrix}$$

Alternatively, we could have started with the other eigenvector h1 = 2 − x: an orthogonal vector to this is h2 = 4 − 9x, with respect to which

$$[T]_{\{h_1, h_2\}} = \begin{pmatrix}-1 & -13\\0 & 2\end{pmatrix}$$

In both cases the eigenvalues are down the diagonal, as must be for an upper-triangular matrix.

In general it is difficult to quickly describe a suitable basis satisfying Schur’s Lemma, though for small matrices it can often be done simply by brute force. After trying the proof in the exercises, you should be able to describe a general method, though it is impractically slow!
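For numerical work (my addition, not part of the notes), the Schur decomposition $A = QTQ^*$ with $Q$ unitary and $T$ upper-triangular is available in scipy; here it is applied to the rotation matrix of Example 6.31.3:

```python
import numpy as np
from scipy.linalg import schur

# Complex Schur form A = Q T Q* with Q unitary and T upper-triangular.
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

T, Q = schur(A, output='complex')
print(np.round(T, 10))                          # upper-triangular with eigenvalues ±i on the diagonal
print(np.allclose(Q @ T @ Q.conj().T, A))       # True: A = Q T Q*
print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: columns of Q are orthonormal
```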

Exercises.
6.4.1. For each linear operator $T$ on an inner product space $V$, decide whether $T$ is normal, self-adjoint, or neither. If the spectral theorem allows it, find an orthonormal basis of $V$ consisting of eigenvectors of $T$.
   (a) $V = \mathbb{R}^2$ and $T\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}2a-b\\-2a+5b\end{pmatrix}$
   (b) $V = \mathbb{R}^3$ and $T\begin{pmatrix}a\\b\\c\end{pmatrix} = \begin{pmatrix}-a-b\\5b\\4a-2b+5c\end{pmatrix}$
   (c) $V = \mathbb{C}^2$ and $T\begin{pmatrix}z\\w\end{pmatrix} = \begin{pmatrix}2z+iw\\z+2w\end{pmatrix}$
   (d) $V = \mathbb{R}^4$ with $T : (e_1, e_2, e_3, e_4) \mapsto (e_3, e_4, e_1, e_2)$
   (e) $V = P_2(\mathbb{R})$ with $\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt$ and $T(f) = f'$
   (Hint: Don't compute $T^*$! Instead assume $T$ is normal and aim for a contradiction. . . )
6.4.2. Let $T(f(x)) = f'(x) + 4xf(0)$ where $T \in \mathcal{L}(P_1(\mathbb{R}))$ and $\langle f, g\rangle = \int_{-1}^1 f(t)g(t)\, dt$. Find an orthonormal basis of $P_1(\mathbb{R})$ with respect to which the matrix of $T$ is upper-triangular.
6.4.3. Let $S, T$ be operators on an inner product space $V$.
   (a) Prove that $(ST)^* = T^*S^*$. (Hint: Consider $\langle ST(x), y\rangle$)
   (b) Suppose $S, T$ are self-adjoint. Prove that $ST$ is self-adjoint if and only if $ST = TS$.
6.4.4. Let $T$ be a normal operator on a finite-dimensional inner product space $V$. Prove that $\mathcal{N}(T^*) = \mathcal{N}(T)$ and that $\mathcal{R}(T^*) = \mathcal{R}(T)$. (Hint: Use Lemma 6.33 and Exercise 6.3.4.)
6.4.5. Let $T$ be self-adjoint on a finite-dimensional inner product space $V$. Prove that

$$\forall x \in V, \qquad \|T(x) \pm ix\|^2 = \|T(x)\|^2 + \|x\|^2$$

Hence prove that $T - iI$ is invertible and that $[(T - iI)^{-1}]^* = (T + iI)^{-1}$.

6.4.6. Let $W$ be a $T$-invariant subspace of an inner product space $V$ and let $T_W \in \mathcal{L}(W)$ be the restriction of $T$ to $W$. Prove:
   (a) $W^\perp$ is $T^*$-invariant.
   (b) If $W$ is both $T$- and $T^*$-invariant, then $(T_W)^* = (T^*)_W$.
   (c) If $W$ is both $T$- and $T^*$-invariant and $T$ is normal, then $T_W$ is normal.
6.4.7. Use the previous question to complete the proof of the spectral theorem for a normal operator on a finite-dimensional complex inner product space.
6.4.8. (a) Suppose $S$ is a normal operator on a finite-dimensional complex inner product space, all of whose eigenvalues are real. Prove that $S$ is self-adjoint.
   (b) Let $T$ be a normal operator on a finite-dimensional real inner product space $V$ whose characteristic polynomial splits. Prove that $T$ is self-adjoint and that there exists an orthonormal basis of $V$ of eigenvectors of $T$. (Hint: Mimic the proof of Lemma 6.32 part 3 and use part (a))
6.4.9. Prove Schur's Lemma by induction, similarly to the proof of the spectral theorem. (Hint: $T^*$ has an eigenvector $x$; why? Now show that $W = \operatorname{Span}\{x\}^\perp$ is $T$-invariant. . . )
