
MAA6617 COURSE NOTES SPRING 2014

19. Normed vector spaces

Let $\mathcal{X}$ be a vector space over a field $\mathbb{K}$ (in this course we always have either $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$).

Definition 19.1. A norm on $\mathcal{X}$ is a function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ satisfying:
(i) (positivity) $\|x\| \ge 0$ for all $x \in \mathcal{X}$, and $\|x\| = 0$ if and only if $x = 0$;
(ii) (homogeneity) $\|kx\| = |k|\,\|x\|$ for all $x \in \mathcal{X}$ and $k \in \mathbb{K}$; and
(iii) (triangle inequality) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathcal{X}$.

Using these three properties it is straightforward to check that the quantity
\[ d(x, y) := \|x - y\| \tag{19.1} \]
defines a metric on $\mathcal{X}$. The resulting topology is called the norm topology. The next proposition is simple but fundamental; it says that the norm and the vector space operations are continuous in the norm topology.

Proposition 19.2 (Continuity of vector space operations). Let $\mathcal{X}$ be a normed vector space over $\mathbb{K}$.
a) If $x_n \to x$ in $\mathcal{X}$, then $\|x_n\| \to \|x\|$ in $\mathbb{R}$.
b) If $k_n \to k$ in $\mathbb{K}$ and $x_n \to x$ in $\mathcal{X}$, then $k_n x_n \to kx$ in $\mathcal{X}$.
c) If $x_n \to x$ and $y_n \to y$ in $\mathcal{X}$, then $x_n + y_n \to x + y$ in $\mathcal{X}$.

Proof. The proofs follow readily from the properties of the norm, and are left as exercises. $\square$

We say that two norms $\|\cdot\|_1, \|\cdot\|_2$ on $\mathcal{X}$ are equivalent if there exist constants $C, c > 0$ such that
\[ c\|x\|_1 \le \|x\|_2 \le C\|x\|_1 \quad \text{for all } x \in \mathcal{X}. \tag{19.2} \]
Equivalent norms define the same topology on $\mathcal{X}$, and the same Cauchy sequences (Problem 20.2). A normed space is called a Banach space if it is complete in the norm topology, that is, every Cauchy sequence in $\mathcal{X}$ converges in $\mathcal{X}$. It follows that if $\mathcal{X}$ is equipped with two equivalent norms $\|\cdot\|_1, \|\cdot\|_2$ then it is complete in one norm if and only if it is complete in the other.

We will want to prove the completeness of the first examples we consider, so we begin with a useful proposition. Say a series $\sum_{n=1}^\infty x_n$ in $\mathcal{X}$ is absolutely convergent if $\sum_{n=1}^\infty \|x_n\| < \infty$. Say that the series converges in $\mathcal{X}$ if the limit $\lim_{N\to\infty} \sum_{n=1}^N x_n$ exists in $\mathcal{X}$ (in the norm topology). (Quite explicitly, we say that the series $\sum_{n=1}^\infty x_n$ converges to $x \in \mathcal{X}$ if $\lim_{N\to\infty} \left\| x - \sum_{n=1}^N x_n \right\| = 0$.)

Proposition 19.3. A normed space $(\mathcal{X}, \|\cdot\|)$ is complete if and only if every absolutely convergent series in $\mathcal{X}$ is convergent.

Proof. First suppose $\mathcal{X}$ is complete and let $\sum_{n=1}^\infty x_n$ be absolutely convergent. Write $s_N = \sum_{n=1}^N x_n$ for the $N$th partial sum and let $\varepsilon > 0$. Since $\sum_{n=1}^\infty \|x_n\|$ is convergent, we can choose $L$ such that $\sum_{n=L}^\infty \|x_n\| < \varepsilon$. Then for all $N > M \ge L$,
\[ \|s_N - s_M\| = \left\| \sum_{n=M+1}^N x_n \right\| \le \sum_{n=M+1}^N \|x_n\| < \varepsilon \tag{19.3} \]
so the sequence $(s_N)$ is Cauchy in $\mathcal{X}$, hence convergent by hypothesis.

Conversely, suppose every absolutely convergent series in $\mathcal{X}$ is convergent. Let $(x_n)$ be a Cauchy sequence in $\mathcal{X}$. The idea is to arrange an absolutely convergent series whose partial sums form a subsequence of $(x_n)$. To do this, first choose $N_1$ such that $\|x_n - x_m\| < 2^{-1}$ for all $n, m \ge N_1$ and put $y_1 = x_{N_1}$. Next choose $N_2 > N_1$ such that $\|x_n - x_m\| < 2^{-2}$ for all $n, m \ge N_2$ and put $y_2 = x_{N_2} - x_{N_1}$. Continuing, we may choose inductively an increasing sequence of integers $(N_k)_{k=1}^\infty$ such that $\|x_n - x_m\| < 2^{-k}$ for all $n, m \ge N_k$ and define $y_k = x_{N_k} - x_{N_{k-1}}$. For $k \ge 2$ we have $\|y_k\| < 2^{-(k-1)}$ by construction, so $\sum_{k=1}^\infty \|y_k\| < \|y_1\| + \sum_{k=2}^\infty 2^{-(k-1)} < \infty$, and the $K$th partial sum of $\sum_{k=1}^\infty y_k$ is $x_{N_K}$. Thus by hypothesis, the series $\sum_{k=1}^\infty y_k$ is convergent in $\mathcal{X}$, which means that the subsequence $(x_{N_k})$ of $(x_n)$ is convergent in $\mathcal{X}$. The proof is finished by invoking a standard fact about convergence in metric spaces: if $(x_n)$ is a Cauchy sequence which has a subsequence converging to $x$, then the full sequence converges to $x$. $\square$
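The telescoping construction in the second half of the proof can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the Cauchy sequence `x(n)` (partial sums of $\sum 1/j^2$ in $\mathbb{R}$) and the index choice `N(k) = 2**k + 1` are hypothetical choices made here, not part of the notes. Exact rational arithmetic is used so that the identity "the $K$th partial sum of $\sum y_k$ equals $x_{N_K}$" holds without rounding error.

```python
from fractions import Fraction

def x(n):
    """n-th term of a Cauchy sequence in R: the partial sums of sum 1/j^2."""
    return sum(Fraction(1, j * j) for j in range(1, n + 1))

def N(k):
    # For n > m we have |x(n) - x(m)| = sum_{j=m+1}^{n} 1/j^2 < 1/m,
    # so N_k = 2**k + 1 gives |x(n) - x(m)| < 2**-k for all n, m >= N_k.
    return 2 ** k + 1

def y(k):
    """Terms of the series from the proof: y_1 = x_{N_1}, y_k = x_{N_k} - x_{N_{k-1}}."""
    return x(N(1)) if k == 1 else x(N(k)) - x(N(k - 1))

K = 6
# The K-th partial sum of sum y_k telescopes to x_{N_K}.
partial_sum = sum(y(k) for k in range(1, K + 1))
# For k >= 2 the terms obey ||y_k|| < 2^{-(k-1)}, which gives absolute convergence.
terms_small = all(abs(y(k)) < Fraction(1, 2 ** (k - 1)) for k in range(2, K + 1))

print(partial_sum == x(N(K)), terms_small)
```

In a general normed space the same bookkeeping applies with absolute values replaced by norms; only the choice of the indices $N_k$ depends on the particular Cauchy sequence.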

19.1. Examples.

a) Of course, $\mathbb{R}^n$ with the usual Euclidean norm $\|(x_1, \dots, x_n)\| = \left( \sum_{k=1}^n |x_k|^2 \right)^{1/2}$ is a Banach space. (Likewise $\mathbb{C}^n$ with the Euclidean norm.) Besides these, $\mathbb{K}^n$ can be equipped with the $\ell^p$-norms
\[ \|(x_1, \dots, x_n)\|_p := \left( \sum_{k=1}^n |x_k|^p \right)^{1/p} \tag{19.4} \]
for $1 \le p < \infty$, and the $\ell^\infty$-norm
\[ \|(x_1, \dots, x_n)\|_\infty := \max(|x_1|, \dots, |x_n|). \tag{19.5} \]
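The norms (19.4) and (19.5) are easy to evaluate numerically. The following sketch is an illustration only; the helper names `p_norm` and `comparison_holds` and the sample vector are assumptions made here. It also checks the standard comparison $\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty$ on that one vector, which shows how a dimension-dependent constant can enter when two of these norms are compared.

```python
import math

def p_norm(x, p):
    """The l^p norm (19.4) of a finite sequence x, or the l^infinity norm (19.5) if p is math.inf."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def comparison_holds(x, p, tol=1e-12):
    """Check ||x||_inf <= ||x||_p <= n**(1/p) * ||x||_inf up to rounding error."""
    sup = p_norm(x, math.inf)
    val = p_norm(x, p)
    return sup <= val + tol and val <= len(x) ** (1.0 / p) * sup + tol

x = [3.0, -4.0, 12.0]
norms = {p: p_norm(x, p) for p in (1, 2, 3, math.inf)}
checks = [comparison_holds(x, p) for p in (1, 2, 3)]

print(norms[1], norms[2], norms[math.inf], checks)
```

A single vector proves nothing, of course; the point is only to make the definitions concrete before the equivalence statement that follows.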

It is not too hard to show that all of the $\ell^p$ norms ($1 \le p \le \infty$) are equivalent on $\mathbb{K}^n$ (though the constants $c, C$ must depend on the dimension $n$). It turns out that any two norms on a finite-dimensional vector space are equivalent. As a corollary, every finite-dimensional normed space is a Banach space. See Problem 20.3.

b) (Sequence spaces) Define
\[ c_0 := \{ f : \mathbb{N} \to \mathbb{K} \mid \lim_{n\to\infty} |f(n)| = 0 \} \]
\[ \ell^\infty := \{ f : \mathbb{N} \to \mathbb{K} \mid \sup_{n \in \mathbb{N}} |f(n)| < \infty \} \]
\[ \ell^1 := \{ f : \mathbb{N} \to \mathbb{K} \mid \sum_{n=1}^\infty |f(n)| < \infty \} \]
It is a simple exercise to check that each of these is a vector space. Define for functions $f : \mathbb{N} \to \mathbb{K}$
\[ \|f\|_\infty := \sup_n |f(n)| \]
\[ \|f\|_1 := \sum_{n=1}^\infty |f(n)|. \]
Then $\|f\|_\infty$ defines a norm on both $c_0$ and $\ell^\infty$, and $\|f\|_1$ is a norm on $\ell^1$. Equipped with these respective norms, each is a Banach space. We sketch the proof for $c_0$; the others are left as exercises (Problem 20.4). The key observation is that $f_n \to f$ in the $\|\cdot\|_\infty$ norm if and only if $f_n \to f$ uniformly as functions on $\mathbb{N}$. Suppose $(f_n)$ is a Cauchy sequence in $c_0$. Then the sequence of functions $f_n$ is uniformly Cauchy on $\mathbb{N}$, and in particular converges pointwise to a function $f$. To check completeness it will suffice to show that also $f \in c_0$, but this is a straightforward consequence of uniform convergence on $\mathbb{N}$. The details are left as an exercise.

Along with these spaces it is also helpful to consider the vector space

\[ c_{00} := \{ f : \mathbb{N} \to \mathbb{K} \mid f(n) = 0 \text{ for all but finitely many } n \} \tag{19.6} \]
Notice that $c_{00}$ is a vector subspace of each of $c_0$, $\ell^1$ and $\ell^\infty$. Thus it can be equipped with either the $\|\cdot\|_\infty$ or $\|\cdot\|_1$ norms. It is not complete in either of these norms, however. What is true is that $c_{00}$ is dense in $c_0$ and $\ell^1$ (but not in $\ell^\infty$). (See Problem 20.9.)

c) ($L^1$ spaces) Let $(X, \mathcal{M}, m)$ be a measure space. The quantity
\[ \|f\|_1 := \int_X |f| \, dm \tag{19.7} \]
defines a norm on $L^1(m)$, provided we agree to identify $f$ and $g$ when $f = g$ a.e. (Indeed the chief motivation for making this identification is that it makes $\|\cdot\|_1$ into a norm.) Note that $\ell^1$ from the last example is a special case of this (what is the measure space?).

Proposition 19.4. $L^1(m)$ is a Banach space.

Proof. We use Proposition 19.3. Suppose $\sum_{n=1}^\infty f_n$ is absolutely convergent. Then the function $g := \sum_{n=1}^\infty |f_n|$ belongs to $L^1$ and is thus finite $m$-a.e. In particular the series $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent in $\mathbb{K}$ for a.e. $x \in X$. Define $f(x) = \sum_{n=1}^\infty f_n(x)$. Then $f$ is an a.e.-defined measurable function, and belongs to $L^1(m)$ since $|f| \le g$. We claim that the partial sums $\sum_{n=1}^N f_n$ converge to $f$ in the $L^1(m)$ norm: indeed,
\[ \left\| f - \sum_{n=1}^N f_n \right\|_1 = \int_X \left| f - \sum_{n=1}^N f_n \right| dm \tag{19.8} \]
\[ \le \int_X \sum_{n=N+1}^\infty |f_n| \, dm \tag{19.9} \]
\[ = \sum_{n=N+1}^\infty \|f_n\|_1 \to 0 \tag{19.10} \]
as $N \to \infty$. (What justifies the equality in the last line?) $\square$

d) ($L^p$ spaces) Again let $(X, \mathcal{M}, m)$ be a measure space. For $1 \le p < \infty$ let $L^p(m)$ denote the set of measurable functions $f$ for which

\[ \|f\|_p := \left( \int_X |f|^p \, dm \right)^{1/p} < \infty \tag{19.11} \]
(again we identify $f$ and $g$ when $f = g$ a.e.). It turns out that this quantity is a norm on $L^p(m)$, and $L^p(m)$ is complete, though we will not prove this yet (it is not immediately obvious that the triangle inequality holds when $p > 1$). The $\ell^p$ space is defined analogously: it is the set of $f : \mathbb{N} \to \mathbb{K}$ for which
\[ \|f\|_p := \left( \sum_{n=1}^\infty |f(n)|^p \right)^{1/p} < \infty \tag{19.12} \]
and this quantity is a norm making $\ell^p$ into a Banach space. When $p = \infty$, we define $L^\infty(m)$ to be the set of all measurable functions $f : X \to \mathbb{K}$ with the following property: there exists $M > 0$ such that
\[ |f(x)| \le M \quad \text{for } m\text{-a.e. } x \in X; \tag{19.13} \]

as for the other $L^p$ spaces we identify $f$ and $g$ when they are equal a.e. When $f \in L^\infty$, let $\|f\|_\infty$ be the smallest $M$ for which (19.13) holds. Then $\|\cdot\|_\infty$ is a norm making $L^\infty(m)$ into a Banach space.

e) ($C(X)$ spaces) Let $X$ be a compact metric space and let $C(X)$ denote the set of continuous functions $f : X \to \mathbb{K}$. It is a standard fact from advanced calculus that the quantity $\|f\|_\infty := \sup_{x \in X} |f(x)|$ is a norm on $C(X)$. A sequence is Cauchy in this norm if and only if it is uniformly Cauchy. It is thus also a standard fact that $C(X)$ is complete in this norm: completeness just means that a uniformly Cauchy sequence of continuous functions on $X$ converges uniformly to a continuous function. This example can be generalized somewhat: let $X$ be a locally compact metric space. Say a function $f : X \to \mathbb{K}$ vanishes at infinity if for every $\varepsilon > 0$, there exists a compact set $K \subset X$ such that $\sup_{x \notin K} |f(x)| < \varepsilon$. Let $C_0(X)$ denote the set of continuous functions $f : X \to \mathbb{K}$ that vanish at infinity. Then $C_0(X)$ is a vector space, the quantity $\|f\|_\infty := \sup_{x \in X} |f(x)|$ is a norm on $C_0(X)$, and $C_0(X)$ is complete in this norm. (Note that $c_0$ from above is a special case.)

f) (Subspaces and direct sums) If $(\mathcal{X}, \|\cdot\|)$ is a normed vector space and $\mathcal{Y} \subset \mathcal{X}$ is a vector subspace, then the restriction of $\|\cdot\|$ to $\mathcal{Y}$ is clearly a norm on $\mathcal{Y}$. If $\mathcal{X}$ is a Banach space, then $(\mathcal{Y}, \|\cdot\|)$ is a Banach space if and only if $\mathcal{Y}$ is closed in the norm topology of $\mathcal{X}$. (This is just a standard fact about metric spaces: a subspace of a complete metric space is complete in the restricted metric if and only if it is closed.) If $\mathcal{X}, \mathcal{Y}$ are vector spaces then the algebraic direct sum is the vector space of ordered pairs
\[ \mathcal{X} \oplus \mathcal{Y} := \{ (x, y) : x \in \mathcal{X},\ y \in \mathcal{Y} \} \tag{19.14} \]
with entrywise operations. If $\mathcal{X}, \mathcal{Y}$ are equipped with norms $\|\cdot\|_{\mathcal{X}}, \|\cdot\|_{\mathcal{Y}}$, then each of the quantities
\[ \|(x, y)\|_\infty := \max(\|x\|_{\mathcal{X}}, \|y\|_{\mathcal{Y}}), \]
\[ \|(x, y)\|_1 := \|x\|_{\mathcal{X}} + \|y\|_{\mathcal{Y}} \]
is a norm on $\mathcal{X} \oplus \mathcal{Y}$. These two norms are equivalent; indeed it follows from the definitions that
\[ \|(x, y)\|_\infty \le \|(x, y)\|_1 \le 2\|(x, y)\|_\infty. \tag{19.15} \]
If $\mathcal{X}$ and $\mathcal{Y}$ are both complete, then $\mathcal{X} \oplus \mathcal{Y}$ is complete in both of these norms. The resulting Banach spaces are denoted $\mathcal{X} \oplus_\infty \mathcal{Y}$, $\mathcal{X} \oplus_1 \mathcal{Y}$ respectively.

g) (Quotient spaces) If $\mathcal{X}$ is a normed vector space and $\mathcal{M}$ is a proper subspace, then one can form the algebraic quotient $\mathcal{X}/\mathcal{M}$, defined as the collection of distinct cosets $\{ x + \mathcal{M} : x \in \mathcal{X} \}$. From linear algebra, $\mathcal{X}/\mathcal{M}$ is a vector space under the standard operations. If $\mathcal{M}$ is a closed subspace of $\mathcal{X}$, then the quantity
\[ \|x + \mathcal{M}\| := \inf_{y \in \mathcal{M}} \|x - y\| \tag{19.16} \]
is a norm on $\mathcal{X}/\mathcal{M}$, called the quotient norm. (Geometrically, $\|x + \mathcal{M}\|$ is the distance in $\mathcal{X}$ from $x$ to the closed set $\mathcal{M}$.) It turns out that if $\mathcal{X}$ is complete, so is $\mathcal{X}/\mathcal{M}$. See Problem 20.20.

More examples are given in the exercises. Shortly we will construct further examples from linear transformations $T : \mathcal{X} \to \mathcal{Y}$; to do this we first need to build up a few facts.
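As a concrete instance of the quotient norm (19.16) from Example 19.1(g), take $\mathcal{X} = \mathbb{R}^2$ with the $\ell^1$ norm and $\mathcal{M} = \{(t, t) : t \in \mathbb{R}\}$; these choices, and the helper names below, are assumptions made purely for illustration. In this case the infimum of $|a - t| + |b - t|$ over $t$ equals $|a - b|$, and the sketch approximates it by sampling $t$ on a finite grid rather than computing it exactly.

```python
def one_norm(v):
    """The l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

def quotient_norm(x, grid):
    """Approximate inf over y in M of ||x - y||_1 for M = {(t, t)}.

    The infimum is only sampled at the points t in `grid`, so this is an
    approximation from above, not an exact computation.
    """
    return min(one_norm((x[0] - t, x[1] - t)) for t in grid)

grid = [i / 100 for i in range(-1000, 1001)]  # t ranges over [-10, 10]

x = (3.0, 1.0)
q = quotient_norm(x, grid)

# Adding an element of M to x does not change the coset x + M,
# so the quotient norm should be unchanged.
shifted = (x[0] + 5.0, x[1] + 5.0)
q_shifted = quotient_norm(shifted, grid)

print(q, q_shifted, abs(x[0] - x[1]))
```

The second computation reflects the fact that (19.16) is defined on cosets: the value depends only on $x + \mathcal{M}$, not on the representative $x$.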

19.2. Linear transformations between normed spaces.

Definition 19.5. Let $\mathcal{X}, \mathcal{Y}$ be normed vector spaces. A linear transformation $T : \mathcal{X} \to \mathcal{Y}$ is called bounded if there exists a constant $C > 0$ such that $\|Tx\|_{\mathcal{Y}} \le C\|x\|_{\mathcal{X}}$ for all $x \in \mathcal{X}$.
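To make Definition 19.5 concrete, the sketch below considers the linear map on $(\mathbb{R}^3, \|\cdot\|_\infty)$ given by a matrix `A` (a hypothetical example chosen here, not taken from the notes) and checks on random samples that the largest absolute row sum works as a constant $C$. That this constant is valid, and cannot be improved, is a standard fact for the sup norm; the code only spot-checks it.

```python
import random

# A hypothetical 2x3 matrix, viewed as a linear map from (R^3, sup norm) to (R^2, sup norm).
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]

def sup_norm(v):
    return max(abs(t) for t in v)

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

# Candidate constant: the largest absolute row sum of A.
C = max(sum(abs(a) for a in row) for row in A)

random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]
bound_holds = all(sup_norm(apply(A, x)) <= C * sup_norm(x) + 1e-12 for x in samples)

# The bound is attained at a vector whose entries are the signs of the
# entries of the row with the largest absolute sum (here the second row).
x_star = [0.0, 1.0, -1.0]
attained = sup_norm(apply(A, x_star))

print(C, bound_holds, attained)
```

Random sampling can only fail to find a counterexample; the actual proof of the bound is the one-line estimate $|\sum_j a_{ij} x_j| \le \sum_j |a_{ij}| \, \|x\|_\infty$.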

Remark: Note that in this definition it would suffice to require that $\|Tx\|_{\mathcal{Y}} \le C\|x\|_{\mathcal{X}}$ just for all $x \ne 0$, or for all $x$ with $\|x\|_{\mathcal{X}} = 1$ (why?).

The importance of boundedness is hard to overstate; the following proposition explains its importance.

Proposition 19.6. Let $T : \mathcal{X} \to \mathcal{Y}$ be a linear transformation between normed spaces. Then the following are equivalent:
(i) $T$ is bounded.
(ii) $T$ is continuous.
(iii) $T$ is continuous at 0.

Proof. Suppose $T$ is bounded and $\|Tx\| \le C\|x\|$ for all $x \in \mathcal{X}$; let $\varepsilon > 0$. Then if $\|x_1 - x_2\| < \delta := \varepsilon/C$, we have
\[ \|Tx_1 - Tx_2\| = \|T(x_1 - x_2)\| \le C\|x_1 - x_2\| < \varepsilon \tag{19.17} \]
so $T$ is continuous, and (i) implies (ii). (ii) implies (iii) is trivial. For (iii) implies (i), we exploit homogeneity of the norm and the linearity of $T$. By hypothesis, given $\varepsilon > 0$ there exists $\delta > 0$ such that if $\|x\| < \delta$, then $\|Tx\| < \varepsilon$. Fix a nonzero vector $x \in \mathcal{X}$ and a real number $0 < \lambda < \delta$. The vector $\lambda x/\|x\|$ has norm less than $\delta$, so
\[ \left\| T\left( \frac{\lambda x}{\|x\|} \right) \right\| = \lambda \frac{\|Tx\|}{\|x\|} < \varepsilon. \tag{19.18} \]
Rearranging this we find $\|Tx\| \le (\varepsilon/\lambda)\|x\|$ for all $x \ne 0$, which shows $T$ is bounded; in fact, letting $\lambda$ increase to $\delta$, we can take $C = \varepsilon/\delta$. $\square$

Definition 19.7 (The operator norm). Let $\mathcal{X}, \mathcal{Y}$ be normed vector spaces and $T : \mathcal{X} \to \mathcal{Y}$ a bounded linear transformation. Define
\[ \|T\| := \sup\{ \|Tx\| : \|x\| = 1 \} \tag{19.19} \]
\[ = \sup\left\{ \frac{\|Tx\|}{\|x\|} : x \ne 0 \right\} \tag{19.20} \]
\[ = \inf\{ C : \|Tx\| \le C\|x\| \text{ for all } x \in \mathcal{X} \} \tag{19.21} \]
(As an exercise, verify that the three quantities on the right-hand side are equal.) The quantity $\|T\|$ is called the operator norm of $T$. We write $B(\mathcal{X}, \mathcal{Y})$ for the set of all bounded linear operators from $\mathcal{X}$ to $\mathcal{Y}$. Note that immediately we have the inequality
\[ \|Tx\| \le \|T\| \|x\| \tag{19.22} \]
for all $x \in \mathcal{X}$. If $T \in B(\mathcal{X}, \mathcal{Y})$ and $S \in B(\mathcal{Y}, \mathcal{Z})$ then from two applications of the inequality (19.22) we have for all $x \in \mathcal{X}$
\[ \|STx\| \le \|S\| \|Tx\| \le \|S\| \|T\| \|x\| \tag{19.23} \]
and it follows that $ST \in B(\mathcal{X}, \mathcal{Z})$ and $\|ST\| \le \|S\| \|T\|$. If $\mathcal{X}$ is a Banach space, then by the next proposition $B(\mathcal{X}, \mathcal{X})$ is complete, and this inequality says that $B(\mathcal{X}, \mathcal{X})$ is a Banach algebra.

Proposition 19.8. The operator norm makes $B(\mathcal{X}, \mathcal{Y})$ into a normed vector space, which is complete if $\mathcal{Y}$ is complete.

Proof. That $B(\mathcal{X}, \mathcal{Y})$ is a normed vector space follows readily from the definitions and is left as an exercise. Suppose now $\mathcal{Y}$ is complete, and let $(T_n)$ be a Cauchy sequence in $B(\mathcal{X}, \mathcal{Y})$. For each $x \in \mathcal{X}$, we have

\[ \|T_n x - T_m x\| = \|(T_n - T_m)x\| \le \|T_n - T_m\| \|x\| \tag{19.24} \]
which shows that $(T_n x)$ is a Cauchy sequence in $\mathcal{Y}$. By hypothesis, $T_n x$ converges in $\mathcal{Y}$. Define $T : \mathcal{X} \to \mathcal{Y}$ by setting $Tx := \lim_n T_n x$. It is straightforward to check that $T$ is linear. We must show that $T$ is bounded and $\|T_n - T\| \to 0$. To see that $T$ is bounded, note first that since $(T_n)$ is a Cauchy sequence in the metric space $B(\mathcal{X}, \mathcal{Y})$, it is bounded as a subset of $B(\mathcal{X}, \mathcal{Y})$; that is, $M := \sup_n \|T_n\| = \sup_n d(0, T_n) < \infty$. But then $\|T_n x\| \le M\|x\|$ for every $x$, and since $Tx = \lim T_n x$, we have $\|Tx\| \le M\|x\|$ for every $x$ also. Finally, given $\varepsilon > 0$ choose $N$ so that $\|T_n - T_m\| < \varepsilon$ for all $n, m \ge N$. Then for all $x \in \mathcal{X}$ with $\|x\| = 1$, we have $\|T_n x - T_m x\| < \varepsilon$. Taking the limit as $m \to \infty$, we see that $\|T_n x - Tx\| \le \varepsilon$ for all $\|x\| = 1$; thus $\|T_n - T\| \le \varepsilon$ for $n \ge N$. $\square$

The following proposition is very useful in constructing bounded operators: at least when the codomain is complete, it suffices to define the operator (and show that it is bounded) on a dense subspace.

Proposition 19.9 (Extending bounded operators). Let $\mathcal{X}, \mathcal{Y}$ be normed vector spaces with $\mathcal{Y}$ complete, and $\mathcal{E} \subset \mathcal{X}$ a dense linear subspace. If $T : \mathcal{E} \to \mathcal{Y}$ is a bounded linear operator, then there exists a unique bounded linear operator $\widetilde{T} : \mathcal{X} \to \mathcal{Y}$ extending $T$ (so $\widetilde{T}|_{\mathcal{E}} = T$), and $\|\widetilde{T}\| = \|T\|$.

Proof. Let $x \in \mathcal{X}$. By hypothesis there is a sequence $(x_n)$ in $\mathcal{E}$ converging to $x$; in particular this sequence is Cauchy. Since $T$ is bounded on $\mathcal{E}$, the sequence $(Tx_n)$ is Cauchy in $\mathcal{Y}$, hence convergent by hypothesis. Define $\widetilde{T}x = \lim_n Tx_n$; using the fact that $T$ is bounded on $\mathcal{E}$ it is not hard to see that $\widetilde{T}$ is well-defined and agrees with $T$ on $\mathcal{E}$ (the proof is an exercise). It also follows readily that $\widetilde{T}$ is linear. To see that $\widetilde{T}$ is bounded (and prove the equality of norms), fix $x \in \mathcal{X}$ with $\|x\| = 1$ and let $(x_n)$ be a sequence in $\mathcal{E}$ converging to $x$. Then $\|x_n\| \to \|x\|$; in particular for any $\varepsilon > 0$ we have $\|x_n\| < 1 + \varepsilon$ for all $n$ sufficiently large. By enlarging $n$ if necessary, we may also assume that $\|\widetilde{T}x - Tx_n\| < \varepsilon$. But then using the triangle inequality,
\[ \|\widetilde{T}x\| < \varepsilon + \|Tx_n\| \le \varepsilon + (1 + \varepsilon)\|T\|. \tag{19.25} \]
Since $\varepsilon$ was arbitrary, we have $\|\widetilde{T}\| \le \|T\|$. As the reverse inequality is trivial, the proof is finished. $\square$

Remark: The completeness of $\mathcal{Y}$ is essential in the above proposition; Problem 20.11 suggests a counterexample.

A bounded linear transformation $T \in B(\mathcal{X}, \mathcal{Y})$ is said to be invertible if it is bijective (so $T^{-1}$ exists as a linear transformation) and $T^{-1}$ is bounded from $\mathcal{Y}$ to $\mathcal{X}$. We can now define two useful notions of equivalence for normed vector spaces. Two normed spaces $\mathcal{X}, \mathcal{Y}$ are said to be (boundedly) isomorphic if there exists an invertible linear transformation $T : \mathcal{X} \to \mathcal{Y}$. The spaces are isometrically isomorphic if additionally $\|Tx\| = \|x\|$ for all $x$. A $T$ with this property is called an isometry; note that an isometry is automatically injective; if it is also surjective then it is automatically invertible, in which case $T^{-1}$ is also an isometry. An isometry need not be surjective, however.

19.3. Examples.

a) If $\mathcal{X}$ is a finite-dimensional normed space and $\mathcal{Y}$ is any normed space, then every linear transformation $T : \mathcal{X} \to \mathcal{Y}$ is bounded.

b) Let $\mathcal{X}$ be $c_{00}$ equipped with the $\|\cdot\|_1$ norm, and $\mathcal{Y}$ be $c_{00}$ equipped with the $\|\cdot\|_\infty$ norm. Then the identity map $\mathrm{id} : c_{00} \to c_{00}$ is bounded as an operator from $\mathcal{X}$ to $\mathcal{Y}$ (in fact its norm is equal to 1), but is unbounded as an operator from $\mathcal{Y}$ to $\mathcal{X}$.

c) Consider $c_{00}$ with the $\|\cdot\|_\infty$ norm. Let $a : \mathbb{N} \to \mathbb{K}$ be any function and define a linear transformation $T_a : c_{00} \to c_{00}$ by
\[ T_a f(n) = a(n) f(n). \tag{19.26} \]

Then $T_a$ is bounded if and only if $M = \sup_{n \in \mathbb{N}} |a(n)| < \infty$, in which case $\|T_a\| = M$. When this happens, $T_a$ extends uniquely to a bounded operator from $c_0$ to $c_0$, and one may check that the formula (19.26) defines the extension. All of these claims remain true if we use the $\|\cdot\|_1$ norm instead of the $\|\cdot\|_\infty$ norm; we then get a bounded operator from $\ell^1$ to itself.

d) For $f \in \ell^1$, define the shift operator by $Sf(1) = 0$ and $Sf(n) = f(n - 1)$ for $n \ge 2$. (Viewing $f$ as a sequence, $S$ shifts the sequence one place to the right and fills in a 0 in the first position.) This $S$ is an isometry, but is not surjective. In contrast, if $\mathcal{X}$ is finite-dimensional, then the rank-nullity theorem from linear algebra guarantees that every injective $T : \mathcal{X} \to \mathcal{X}$ is also surjective.

e) Let $C^\infty[0, 1]$ denote the space of functions on $[0, 1]$ with continuous derivatives of all orders. The differentiation map $f \mapsto \frac{df}{dx}$ is a linear transformation from $C^\infty[0, 1]$ to itself. Since $\frac{d}{dx} e^{tx} = t e^{tx}$ for all real $t$, we find that there is no norm on $C^\infty[0, 1]$ for which $\frac{d}{dx}$ is bounded. (If $\|\frac{d}{dx} f\| \le C\|f\|$ for all $f$, then taking $f = e^{tx}$ would give $|t| \le C$ for every real $t$.)

20. Problems

Problem 20.1. Prove Proposition 19.2.

Problem 20.2. Prove that equivalent norms define the same topology and the same Cauchy sequences.

Problem 20.3. a) Prove that all norms on a finite dimensional vector space $\mathcal{X}$ are equivalent. (Hint: fix a basis $e_1, \dots, e_n$ for $\mathcal{X}$ and define $\|\sum a_k e_k\|_1 := \sum |a_k|$. Compare any given norm $\|\cdot\|$ to this one. Begin by proving that the "unit sphere" $S = \{ x \in \mathcal{X} : \|x\|_1 = 1 \}$ is compact in the $\|\cdot\|_1$ topology.)
b) Combine the result of part (a) with the result of Problem 20.2 to conclude that every finite-dimensional normed vector space is complete.
c) Let $\mathcal{X}$ be a normed vector space and $\mathcal{M} \subset \mathcal{X}$ a finite-dimensional subspace. Prove that $\mathcal{M}$ is closed in $\mathcal{X}$.

Problem 20.4. Finish the proofs from Example 19.1(b).

Problem 20.5. A function $f : [0, 1] \to \mathbb{K}$ is called Lipschitz continuous if there exists a constant $C$ such that
\[ |f(x) - f(y)| \le C|x - y| \tag{20.1} \]
for all $x, y \in [0, 1]$. Define $\|f\|_{\mathrm{Lip}}$ to be the best possible constant in this inequality. That is,
\[ \|f\|_{\mathrm{Lip}} := \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|} \tag{20.2} \]
Let $\mathrm{Lip}[0, 1]$ denote the set of all Lipschitz continuous functions on $[0, 1]$. Prove that $\|f\| := |f(0)| + \|f\|_{\mathrm{Lip}}$ is a norm on $\mathrm{Lip}[0, 1]$, and that $\mathrm{Lip}[0, 1]$ is complete in this norm.

Problem 20.6. Let $C^1[0, 1]$ denote the space of all functions $f : [0, 1] \to \mathbb{R}$ such that $f$ is differentiable in $(0, 1)$ and $f'$ extends continuously to $[0, 1]$. Prove that
\[ \|f\| := \|f\|_\infty + \|f'\|_\infty \tag{20.3} \]
is a norm on $C^1[0, 1]$ and that $C^1$ is complete in this norm. Do the same for the norm $\|f\| := |f(0)| + \|f'\|_\infty$. (Is $\|f'\|_\infty$ a norm on $C^1$?)

Problem 20.7. Let $(X, \mathcal{M})$ be a measurable space. Let $M(X)$ denote the (real) vector space of all signed measures on $(X, \mathcal{M})$. Prove that the total variation norm $\|\mu\| := |\mu|(X)$ is a norm on $M(X)$, and $M(X)$ is complete in this norm.

Problem 20.8. Prove that if $\mathcal{X}, \mathcal{Y}$ are normed spaces, then the operator norm is a norm on $B(\mathcal{X}, \mathcal{Y})$.

Problem 20.9. Prove that $c_{00}$ is dense in $c_0$ and $\ell^1$. (That is, given $f \in c_0$ there is a sequence $f_n$ in $c_{00}$ such that $\|f_n - f\|_\infty \to 0$, and the analogous statement for $\ell^1$.) Using these facts, or otherwise, prove that $c_{00}$ is not dense in $\ell^\infty$. (In fact there exists $f \in \ell^\infty$ with $\|f\|_\infty = 1$ such that $\|f - g\|_\infty \ge 1$ for all $g \in c_{00}$.)

Problem 20.10. Prove that $c_{00}$ is not complete in the $\|\cdot\|_1$ or $\|\cdot\|_\infty$ norms. (After we have studied the Baire Category theorem, you will be asked to prove that there is no norm on $c_{00}$ making it complete.)

Problem 20.11. Consider $c_0$ and $c_{00}$ equipped with the $\|\cdot\|_\infty$ norm. Prove that there is no bounded operator $T : c_0 \to c_{00}$ such that $T|_{c_{00}}$ is the identity map. (Thus the conclusion of Proposition 19.9 can fail if $\mathcal{Y}$ is not complete.)

Problem 20.12. Prove that the $\|\cdot\|_1$ and $\|\cdot\|_\infty$ norms on $c_{00}$ are not equivalent. Conclude from your proof that the identity map on $c_{00}$ is bounded from the $\|\cdot\|_1$ norm to the $\|\cdot\|_\infty$ norm, but not the other way around.

Problem 20.13. a) Prove that $f \in C_0(\mathbb{R}^n)$ if and only if $f$ is continuous and $\lim_{|x| \to \infty} |f(x)| = 0$.
b) Let $C_c(\mathbb{R}^n)$ denote the set of continuous, compactly supported functions on $\mathbb{R}^n$. Prove that $C_c(\mathbb{R}^n)$ is dense in $C_0(\mathbb{R}^n)$ (where $C_0(\mathbb{R}^n)$ is equipped with the sup norm).

Problem 20.14. Prove that if $\mathcal{X}, \mathcal{Y}$ are normed spaces and $\mathcal{X}$ is finite dimensional, then every linear transformation $T : \mathcal{X} \to \mathcal{Y}$ is bounded.

Problem 20.15. Prove the claims in Example 19.3(c).

Problem 20.16. Let $g : \mathbb{R} \to \mathbb{K}$ be a (Lebesgue) measurable function. The map $M_g : f \mapsto gf$ is a linear transformation on the space of measurable functions. Prove that $M_g$ is bounded from $L^1(\mathbb{R})$ to itself if and only if $g \in L^\infty(\mathbb{R})$, in which case $\|M_g\| = \|g\|_\infty$.

Problem 20.17. Prove the claims about direct sums in Example 19.1(f).

Problem 20.18. Let $\mathcal{X}$ be a normed vector space and $\mathcal{M}$ a proper closed subspace. Prove that for every $\varepsilon > 0$, there exists $x \in \mathcal{X}$ such that $\|x\| = 1$ and $\inf_{y \in \mathcal{M}} \|x - y\| > 1 - \varepsilon$. (Hint: take any $u \in \mathcal{X} \setminus \mathcal{M}$ and let $a = \inf_{y \in \mathcal{M}} \|u - y\|$. Choose $\delta > 0$ small enough so that $\frac{a}{a + \delta} > 1 - \varepsilon$, and then choose $v \in \mathcal{M}$ so that $\|u - v\| < a + \delta$. Finally let $x = \frac{u - v}{\|u - v\|}$.)

Problem 20.19. Prove that if $\mathcal{X}$ is an infinite-dimensional normed space, then the unit ball $\mathrm{ball}(\mathcal{X}) := \{ x \in \mathcal{X} : \|x\| \le 1 \}$ is not compact in the norm topology. (Hint: use the result of Problem 20.18 to construct inductively a sequence of vectors $x_n \in \mathcal{X}$ such that $\|x_n\| = 1$ for all $n$ and $\|x_n - x_m\| \ge \frac{1}{2}$ for all $m < n$.)

Problem 20.20. (The quotient norm) Let $\mathcal{X}$ be a normed space and $\mathcal{M}$ a proper closed subspace.
a) Prove that the quotient norm is a norm (see Example 19.1(g)).
b) Show that the quotient map $x \mapsto x + \mathcal{M}$ has norm 1. (Use Problem 20.18.)
c) Prove that if $\mathcal{X}$ is complete, so is $\mathcal{X}/\mathcal{M}$.

Problem 20.21. A normed vector space $\mathcal{X}$ is called separable if it is separable as a metric space (that is, there is a countable subset of $\mathcal{X}$ which is dense in the norm topology). Prove that $c_0$ and $\ell^1$ are separable, but $\ell^\infty$ is not. (Hint: for $\ell^\infty$, show that there is an uncountable collection of elements $\{f_\alpha\}$ such that $\|f_\alpha - f_\beta\| = 1$ for $\alpha \ne \beta$.)

21. Linear functionals and the Hahn-Banach theorem

If there is a "fundamental theorem of functional analysis," it is the Hahn-Banach theorem. The particular version of it we will prove is somewhat abstract-looking at first, but its importance will be clear after studying some of its corollaries.

Let $\mathcal{X}$ be a normed vector space over the field $\mathbb{K}$. A linear functional on $\mathcal{X}$ is a linear map $L : \mathcal{X} \to \mathbb{K}$.
As one might expect, we are especially interested in bounded linear functionals. Since $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ is complete, the vector space of bounded linear functionals $B(\mathcal{X}, \mathbb{K})$ is itself a normed vector space, and is always complete (even if $\mathcal{X}$ is not). This space is called the dual space of $\mathcal{X}$ and is denoted $\mathcal{X}^*$. It is not yet obvious that $\mathcal{X}^*$ need be non-trivial (that is, that there are any bounded linear functionals on $\mathcal{X}$ besides 0). One corollary of the Hahn-Banach theorem will be that there always exist many linear functionals on any normed space $\mathcal{X}$ (in particular, enough to separate points of $\mathcal{X}$).

21.1. Examples.

a) For each of the sequence spaces $c_0$, $\ell^1$, $\ell^\infty$, for each $n$ the map $f \mapsto f(n)$ is a bounded linear functional. If we fix $g \in \ell^1$, then the functional $L_g : c_0 \to \mathbb{K}$ defined by
\[ L_g(f) := \sum_{n=0}^\infty f(n) g(n) \tag{21.1} \]
is bounded, since
\[ |L_g(f)| \le \sum_{n=0}^\infty |f(n) g(n)| \le \|f\|_\infty \sum_{n=0}^\infty |g(n)| = \|g\|_1 \|f\|_\infty. \tag{21.2} \]

This shows that $\|L_g\| \le \|g\|_1$. In fact, equality holds, and every bounded linear functional on $c_0$ is of this form:

Proposition 21.1. The map $g \mapsto L_g$ is an isometric isomorphism from $\ell^1$ onto the dual space $c_0^*$.

Proof. We have already seen that each $g \in \ell^1$ gives rise to a bounded linear functional $L_g \in c_0^*$ via
\[ L_g(f) := \sum_{n=0}^\infty g(n) f(n) \tag{21.3} \]
and that $\|L_g\| \le \|g\|_1$. We will prove simultaneously that this map is onto and that $\|L_g\| \ge \|g\|_1$.

Let $L \in c_0^*$; we will first show that there is a unique $g \in \ell^1$ so that $L = L_g$. Let $e_n \in c_0$ be the indicator function of $n$, that is
\[ e_n(m) = \delta_{nm}. \tag{21.4} \]
Define a function $g : \mathbb{N} \to \mathbb{K}$ by
\[ g(n) = L(e_n). \tag{21.5} \]
We claim that $g \in \ell^1$ and $L = L_g$. To see this, fix an integer $N$ and let $h \in c_{00}$ be the function
\[ h(n) = \begin{cases} \overline{g(n)}/|g(n)| & \text{if } n \le N \text{ and } g(n) \ne 0 \\ 0 & \text{otherwise.} \end{cases} \tag{21.6} \]
By definition $h \in c_0$ and $\|h\|_\infty \le 1$. Note that $h = \sum_{n=0}^N h(n) e_n$. Now
\[ \sum_{n=0}^N |g(n)| = \sum_{n=0}^N h(n) g(n) = L(h) = |L(h)| \le \|L\| \|h\|_\infty \le \|L\|. \tag{21.7} \]
It follows that $g \in \ell^1$ and $\|g\|_1 \le \|L\|$. Moreover, the same calculation shows that $L = L_g$ when restricted to $c_{00}$, so by the uniqueness of extensions of bounded operators (Proposition 19.9, using that $c_{00}$ is dense in $c_0$), $L = L_g$. Uniqueness of $g$ is clear from its construction, since if $L = L_g$, we must have $g(n) = L_g(e_n)$. Thus the map $g \mapsto L_g$ is onto and $\|L_g\|_{c_0^*} = \|g\|_1$. $\square$

Proposition 21.2. $(\ell^1)^*$ is isometrically isomorphic to $\ell^\infty$.

Proof. The proof follows the same lines as the proof of the previous proposition; the details are left as an exercise. $\square$

The same mapping $g \mapsto L_g$ also shows that every $g \in \ell^1$ gives a bounded linear functional on $\ell^\infty$, but it turns out these do not exhaust $(\ell^\infty)^*$ (see Problem 22.7).

b) If $f \in L^1(m)$ and $g$ is a bounded measurable function with $\sup_{x \in X} |g(x)| = M$, then the map
\[ L_g(f) := \int_X fg \, dm \tag{21.8} \]
is a bounded linear functional of norm at most $M$. We will prove in Section 27 that the norm is in fact equal to $M$, and every bounded linear functional on $L^1(m)$ is of this type (at least when $m$ is $\sigma$-finite).

c) If $X$ is a compact metric space and $\mu$ is a finite, signed Borel measure on $X$, then
\[ L_\mu(f) := \int_X f \, d\mu \tag{21.9} \]

is a bounded linear functional on $C_{\mathbb{R}}(X)$ of norm at most $\|\mu\|$.

To state and prove the Hahn-Banach theorem, we first work in the setting $\mathbb{K} = \mathbb{R}$, then extend our results to the complex case.

Definition 21.3. Let $\mathcal{X}$ be a real vector space. A Minkowski functional is a function $p : \mathcal{X} \to \mathbb{R}$ such that $p(x + y) \le p(x) + p(y)$ and $p(\lambda x) = \lambda p(x)$ for all $x, y \in \mathcal{X}$ and nonnegative $\lambda \in \mathbb{R}$.

For example, if $L : \mathcal{X} \to \mathbb{R}$ is any linear functional, then the function $p(x) := |L(x)|$ is a Minkowski functional. Also $p(x) = \|x\|$ is a Minkowski functional. More generally, $p(x) = \|x\|$ is a Minkowski functional whenever $\|x\|$ is a seminorm on $\mathcal{X}$. (A seminorm is a function obeying all the requirements of a norm, except we allow $\|x\| = 0$ for nonzero $x$. Note that both of the given examples of Minkowski functionals come from seminorms.)

Theorem 21.4 (The Hahn-Banach Theorem, real version). Let $\mathcal{X}$ be a normed vector space over $\mathbb{R}$, $p$ a Minkowski functional on $\mathcal{X}$, $\mathcal{M}$ a subspace of $\mathcal{X}$, and $L$ a linear functional on $\mathcal{M}$ such that $L(x) \le p(x)$ for all $x \in \mathcal{M}$. Then there exists a linear functional $L'$ on $\mathcal{X}$ such that $L'(x) \le p(x)$ for all $x \in \mathcal{X}$ and $L'|_{\mathcal{M}} = L$.

Proof. The idea is to show that the extension can be done one dimension at a time, and we then infer the existence of an extension to the whole space by appeal to Zorn's lemma. So, fix a vector $x \in \mathcal{X} \setminus \mathcal{M}$ and consider the subspace $\mathcal{M} + \mathbb{R}x \subset \mathcal{X}$. For any $m_1, m_2 \in \mathcal{M}$, we have by hypothesis
\[ L(m_1) + L(m_2) = L(m_1 + m_2) \le p(m_1 + m_2) \le p(m_1 - x) + p(m_2 + x), \tag{21.10} \]
which rearranges to
\[ L(m_1) - p(m_1 - x) \le p(m_2 + x) - L(m_2) \quad \text{for all } m_1, m_2 \in \mathcal{M} \tag{21.11} \]
so
\[ \sup_{m \in \mathcal{M}} \{ L(m) - p(m - x) \} \le \inf_{m \in \mathcal{M}} \{ p(m + x) - L(m) \}. \tag{21.12} \]
Now choose any $\lambda$ satisfying
\[ \sup_{m \in \mathcal{M}} \{ L(m) - p(m - x) \} \le \lambda \le \inf_{m \in \mathcal{M}} \{ p(m + x) - L(m) \} \tag{21.13} \]
and define $L'(m + tx) = L(m) + t\lambda$. This $L'$ is linear by definition and agrees with $L$ on $\mathcal{M}$. We now check that $L'(y) \le p(y)$ for all $y \in \mathcal{M} + \mathbb{R}x$. This is immediate if $y \in \mathcal{M}$. In general let $y = m + tx$ with $m \in \mathcal{M}$ and $t > 0$. Then
\[ L'(m + tx) = t\left( L\left(\tfrac{m}{t}\right) + \lambda \right) \le t\left( L\left(\tfrac{m}{t}\right) + p\left(\tfrac{m}{t} + x\right) - L\left(\tfrac{m}{t}\right) \right) = p(m + tx) \tag{21.14} \]
and a similar estimate shows that $L'(m + tx) \le p(m + tx)$ for $t < 0$. We have thus successfully extended $L$ to $\mathcal{M} + \mathbb{R}x$.

To finish, let $\mathcal{L}$ denote the set of pairs $(L', \mathcal{N})$ where $\mathcal{N}$ is a subspace of $\mathcal{X}$ containing $\mathcal{M}$, and $L'$ is an extension of $L$ to $\mathcal{N}$ obeying $L'(y) \le p(y)$ on $\mathcal{N}$. Declare $(L'_1, \mathcal{N}_1) \le (L'_2, \mathcal{N}_2)$ if $\mathcal{N}_1 \subset \mathcal{N}_2$ and $L'_2|_{\mathcal{N}_1} = L'_1$. This is a partial order on $\mathcal{L}$. Given any increasing chain $(L'_\alpha, \mathcal{N}_\alpha)$ in $\mathcal{L}$, it has an upper bound $(L', \mathcal{N})$ in $\mathcal{L}$, where $\mathcal{N} := \bigcup_\alpha \mathcal{N}_\alpha$ and $L'(n_\alpha) := L'_\alpha(n_\alpha)$ for $n_\alpha \in \mathcal{N}_\alpha$. By Zorn's lemma, then, the collection $\mathcal{L}$ has a maximal element $(L', \mathcal{N})$ with respect to the order $\le$. Since it is always possible to extend to a strictly larger subspace, the maximal element must have $\mathcal{N} = \mathcal{X}$, and the proof is finished. $\square$

The proof is a typical application of Zorn's lemma: one knows how to carry out a construction one step at a time, but there is no clear way to do it all at once.

In the special case that $p$ is a seminorm, since $L(-x) = -L(x)$ and $p(-x) = p(x)$ the inequality $L \le p$ is equivalent to $|L| \le p$.

Corollary 21.5. Let $\mathcal{X}$ be a normed vector space over $\mathbb{R}$, $\mathcal{M}$ a subspace, and $L$ a bounded linear functional on $\mathcal{M}$ satisfying $|L(x)| \le C\|x\|$ for all $x \in \mathcal{M}$. Then there exists a bounded linear functional $L'$ on $\mathcal{X}$ extending $L$, with $\|L'\| \le C$.

Proof. Apply the Hahn-Banach theorem with the Minkowski functional $p(x) = C\|x\|$. $\square$

Before obtaining further corollaries, we extend these results to the complex case. First, if $\mathcal{X}$ is a vector space over $\mathbb{C}$ then trivially it is also a vector space over $\mathbb{R}$, and there is a simple relationship between the $\mathbb{R}$- and $\mathbb{C}$-linear functionals.

Proposition 21.6. Let $\mathcal{X}$ be a vector space over $\mathbb{C}$. If $L : \mathcal{X} \to \mathbb{C}$ is a $\mathbb{C}$-linear functional, then $u(x) = \mathrm{Re}\, L(x)$ defines an $\mathbb{R}$-linear functional on $\mathcal{X}$, and $L(x) = u(x) - iu(ix)$. Conversely if $u : \mathcal{X} \to \mathbb{R}$ is $\mathbb{R}$-linear then $L(x) := u(x) - iu(ix)$ is $\mathbb{C}$-linear. If in addition $\mathcal{X}$ is normed, then $\|u\| = \|L\|$.
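The identity $L(x) = u(x) - iu(ix)$ in Proposition 21.6 can be sanity-checked numerically. The sketch below uses a hypothetical $\mathbb{C}$-linear functional on $\mathbb{C}^2$ (the coefficient vector `c` and the test vector `x` are arbitrary choices made here) and recovers $L$ from its real part; it is a spot check at one vector, not a proof.

```python
# Hypothetical coefficients defining a C-linear functional L(x) = c[0]*x[0] + c[1]*x[1] on C^2.
c = [2 + 1j, -1 + 3j]

def L(x):
    """A C-linear functional on C^2."""
    return sum(cj * xj for cj, xj in zip(c, x))

def u(x):
    """The R-linear functional u = Re L."""
    return L(x).real

def L_from_u(x):
    """Rebuild L from u via L(x) = u(x) - i*u(ix)."""
    ix = [1j * xj for xj in x]
    return complex(u(x), -u(ix))

x = [1 - 2j, 0.5 + 4j]
direct = L(x)
rebuilt = L_from_u(x)

print(direct, rebuilt)
```

The reason this works is that $\mathrm{Im}\, L(x) = -\mathrm{Re}\,(iL(x)) = -\mathrm{Re}\, L(ix) = -u(ix)$, which uses $\mathbb{C}$-linearity of $L$ only through $L(ix) = iL(x)$.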

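Proof. Problem ?? $\square$

Theorem 21.7 (The Hahn-Banach Theorem, complex version). Let $\mathcal{X}$ be a normed vector space over $\mathbb{C}$, $p$ a seminorm on $\mathcal{X}$, $\mathcal{M}$ a subspace of $\mathcal{X}$, and $L : \mathcal{M} \to \mathbb{C}$ a $\mathbb{C}$-linear functional satisfying $|L(x)| \le p(x)$ for all $x \in \mathcal{M}$. Then there exists a linear functional $L' : \mathcal{X} \to \mathbb{C}$ extending $L$ and satisfying $|L'(x)| \le p(x)$ for all $x \in \mathcal{X}$.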

Proof. The proof consists of applying the real Hahn-Banach theorem to extend the $\mathbb{R}$-linear functional $u = \mathrm{Re}\, L$ to a functional $u' : \mathcal{X} \to \mathbb{R}$ and then defining $L'$ from $u'$ as in Proposition 21.6. The details are left as an exercise. $\square$

The following corollaries are quite important, and when the Hahn-Banach theorem is applied it is usually in one of the following forms:

Corollary 21.8. Let $\mathcal{X}$ be a normed vector space.
(i) (Linear functionals detect norms) If $x \in \mathcal{X}$ is nonzero, there exists $L \in \mathcal{X}^*$ with $\|L\| = 1$ such that $L(x) = \|x\|$.
(ii) (Linear functionals separate points) If $x \ne y$ in $\mathcal{X}$, there exists $L \in \mathcal{X}^*$ such that $L(x) \ne L(y)$.
(iii) (Linear functionals detect distance to subspaces) If $\mathcal{M} \subset \mathcal{X}$ is a closed subspace and $x \in \mathcal{X} \setminus \mathcal{M}$, there exists $L \in \mathcal{X}^*$ such that $L|_{\mathcal{M}} = 0$ and $L(x) = \mathrm{dist}(x, \mathcal{M}) = \inf_{y \in \mathcal{M}} \|x - y\| > 0$.

Proof. (i): Let $\mathcal{M}$ be the one-dimensional subspace of $\mathcal{X}$ spanned by $x$. Define a functional $L : \mathcal{M} \to \mathbb{K}$ by $L\left(t \frac{x}{\|x\|}\right) = t$. For $p(y) = \|y\|$, we have $|L(y)| \le p(y)$ on $\mathcal{M}$, and $L(x) = \|x\|$. By the Hahn-Banach theorem the functional $L$ extends to a functional (still denoted $L$) on $\mathcal{X}$ and satisfies $|L(y)| \le \|y\|$ for all $y \in \mathcal{X}$; thus $\|L\| \le 1$; since $L(x) = \|x\|$ by construction we conclude $\|L\| = 1$.
(ii): Apply (i) to the vector $x - y$.
(iii): Let $\delta = \mathrm{dist}(x, \mathcal{M})$. Define a functional $L : \mathcal{M} + \mathbb{K}x \to \mathbb{K}$ by $L(y + tx) = t\delta$. Since for $t \ne 0$
\[ \|y + tx\| = |t| \, \|t^{-1}y + x\| \ge |t|\delta = |L(y + tx)|, \tag{21.15} \]
by Hahn-Banach we can extend $L$ to a functional $L \in \mathcal{X}^*$ with $\|L\| \le 1$. $\square$

Needless to say, the proof of the Hahn-Banach theorem is thoroughly non-constructive, and in general it is an important (and often difficult) problem, given a normed space $\mathcal{X}$, to find some concrete description of the dual space $\mathcal{X}^*$. Usually this means finding a Banach space $\mathcal{Y}$ and a bounded (or, better, isometric) isomorphism $T : \mathcal{Y} \to \mathcal{X}^*$.

We close with a final corollary, which is also quite important. Note that since $\mathcal{X}^*$ is a normed space, we can form its dual, denoted $\mathcal{X}^{**}$ and called the bidual of $\mathcal{X}$.
There is a canonical relationship between $\mathcal{X}$ and $\mathcal{X}^{**}$. First observe that each fixed $x \in \mathcal{X}$ gives rise to a linear functional $\hat{x} : \mathcal{X}^* \to \mathbb{K}$ via evaluation:
\[ \hat{x}(L) := L(x). \tag{21.16} \]

Corollary 21.9. (Embedding in the bidual) The map $x \mapsto \hat{x}$ is an isometric linear map from $\mathcal{X}$ into $\mathcal{X}^{**}$.

Proof. First, from the definition we see that
\[ |\hat{x}(L)| = |L(x)| \le \|L\| \|x\| \tag{21.17} \]
so $\hat{x} \in \mathcal{X}^{**}$ and $\|\hat{x}\| \le \|x\|$. It is straightforward to check (recalling that the $L$'s are linear) that the map $x \mapsto \hat{x}$ is linear. Finally, to show that $\|\hat{x}\| = \|x\|$, fix a nonzero $x \in \mathcal{X}$. From Corollary 21.8(i) there exists $L \in \mathcal{X}^*$ with $\|L\| = 1$ and $L(x) = \|x\|$. But then for this $x$ and $L$, we have $|\hat{x}(L)| = |L(x)| = \|x\|$ so $\|\hat{x}\| \ge \|x\|$, and the proof is complete. $\square$

Definition 21.10. A Banach space $\mathcal{X}$ is called reflexive if the map $\hat{\ } : \mathcal{X} \to \mathcal{X}^{**}$ is surjective.

In other words, $\mathcal{X}$ is reflexive if the map $\hat{\ }$ is an isomorphism of $\mathcal{X}$ with $\mathcal{X}^{**}$. For example, every finite dimensional Banach space is reflexive (Problem ??). On the other hand, Propositions 21.1 and 21.2 show that $c_0^{**} = \ell^\infty$, so $c_0$ is not reflexive. After we have studied the $L^p$ and $\ell^p$ spaces in more detail, we will see that $L^p$ is reflexive for $1 < p < \infty$.

The embedding into the bidual has many applications; one of the most basic is the following:

Proposition 21.11 (Completion of normed spaces). If $\mathcal{X}$ is a normed vector space, there is a Banach space $\overline{\mathcal{X}}$ and an isometric linear map $\iota : \mathcal{X} \to \overline{\mathcal{X}}$ such that the image $\iota(\mathcal{X})$ is dense in $\overline{\mathcal{X}}$.

Proof. Embed $\mathcal{X}$ into $\mathcal{X}^{**}$ via the map $x \mapsto \hat{x}$ and let $\overline{\mathcal{X}}$ be the closure of the image of $\mathcal{X}$ in $\mathcal{X}^{**}$. Since $\overline{\mathcal{X}}$ is a closed subspace of a complete space, it is complete. $\square$

The space $\overline{\mathcal{X}}$ is called the completion of $\mathcal{X}$. It is unique in the sense that if $\mathcal{Y}$ is another Banach space and $j : \mathcal{X} \to \mathcal{Y}$ embeds $\mathcal{X}$ isometrically as a dense subspace of $\mathcal{Y}$, then $\mathcal{Y}$ is isometrically isomorphic to $\overline{\mathcal{X}}$. The proof of this fact is left as an exercise.

21.2. Dual spaces and adjoint operators.

Let $X, Y$ be normed spaces with duals $X^*, Y^*$.
If $T : X \to Y$ is a linear transformation, then given any linear map $f : Y \to \mathbb{K}$ we can define another linear map $T^*f : X \to \mathbb{K}$ by the formula
\[ (T^*f)(x) = f(Tx). \tag{21.18} \]
In fact, if $T$ and $f$ are bounded then so is $T^*f$, and more is true:

Theorem 21.12. If $T : X \to Y$ is bounded, the formula (21.18) defines a bounded linear transformation $T^* : Y^* \to X^*$, and in fact $\|T^*\| = \|T\|$.

Proof. Let $f \in Y^*$ and $x \in X$. Then
\[ |T^*f(x)| = |f(Tx)| \le \|f\| \|Tx\| \le \|f\| \|T\| \|x\|. \tag{21.19} \]
Taking the supremum over $\|x\| = 1$, we find that $T^*f$ is bounded and $\|T^*f\| \le \|T\| \|f\|$, so $\|T^*\| \le \|T\|$. For the reverse inequality we may assume $T \neq 0$. Let $0 < \varepsilon < 1$ be given and choose $x \in X$ with $\|x\| = 1$ and $\|Tx\| > (1 - \varepsilon)\|T\|$. Now consider $Tx$. By the Hahn-Banach theorem (in the form of Corollary 21.8(i)), there exists $f \in Y^*$ such that $\|f\| = 1$ and $f(Tx) = \|Tx\|$. For this $f$, we have
\[ \|T^*f\| \ge |T^*f(x)| = |f(Tx)| = \|Tx\| > (1 - \varepsilon)\|T\|. \tag{21.20} \]
Since $\varepsilon$ was arbitrary, we conclude that $\|T^*\| := \sup_{\|f\| = 1} \|T^*f\| \ge \|T\|$. □

22. Problems

Problem 22.1. a) Prove that if $X$ is a finite-dimensional normed space, then every linear functional $f : X \to \mathbb{K}$ is bounded. b) Prove that if $X$ is any normed vector space, $\{x_1, \dots, x_n\}$ is a linearly independent set in $X$, and $\alpha_1, \dots, \alpha_n$ are scalars, then there exists a bounded linear functional $f$ on $X$ such that $f(x_j) = \alpha_j$ for $j = 1, \dots, n$.

Problem 22.2. Let $X, Y$ be normed spaces and $T : X \to Y$ a linear transformation. Prove that $T$ is bounded if and only if there exists a constant $C$ such that for all $x \in X$ and $f \in Y^*$,
\[ |f(Tx)| \le C \|f\| \|x\|; \tag{22.1} \]
in which case $\|T\|$ is equal to the best possible $C$ in (22.1).

Problem 22.3. Let $X$ be a normed vector space. Show that if $M$ is a closed subspace of $X$ and $x \notin M$, then $M + \mathbb{K}x$ is closed. Use this to give another proof that every finite-dimensional subspace of $X$ is closed.

Problem 22.4. Prove that if $M$ is a finite-dimensional subspace of a Banach space $X$, then there exists a closed subspace $N \subset X$ such that $M \cap N = \{0\}$ and $M + N = X$. (In other words, every $x \in X$ can be written uniquely as $x = y + z$ with $y \in M$, $z \in N$.)
Hint: Choose a basis $x_1, \dots, x_n$ for $M$ and construct bounded linear functionals $f_1, \dots, f_n$ on $X$ such that $f_i(x_j) = \delta_{ij}$. Now let $N = \bigcap_{i=1}^n \ker f_i$. (Warning: this conclusion can fail badly if $M$ is not assumed finite dimensional, even if $M$ is still assumed closed.)

Problem 22.5. Let $X$ and $Y$ be normed vector spaces and $T \in L(X, Y)$. a) Consider $T^{**} : X^{**} \to Y^{**}$. Identifying $X, Y$ with their images in $X^{**}$ and $Y^{**}$, show that $T^{**}|_X = T$. b) Prove that $T^*$ is injective if and only if the range of $T$ is dense in $Y$. c) Prove that if the range of $T^*$ is dense in $X^*$, then $T$ is injective; if $X$ is reflexive then the converse is true.

Problem 22.6. Prove that if $X$ is a Banach space and $X^*$ is separable, then $X$ is separable. Hint: let $\{f_n\}$ be a countable dense subset of $X^*$. For each $n$ choose $x_n$ with $\|x_n\| = 1$ such that $|f_n(x_n)| \ge \frac{1}{2}\|f_n\|$. Show that the set of linear combinations of $\{x_n\}$ is dense in $X$.

Problem 22.7. a) Prove that there exists a bounded linear functional $L \in (\ell^\infty)^*$ with the following property: whenever $f \in \ell^\infty$ and $\lim_{n \to \infty} f(n)$ exists, then $L(f)$ is equal to this limit. (Hint: first show that the set of such $f$ forms a closed subspace $M \subset \ell^\infty$.) b) Show that such a functional $L$ is not equal to $L_g$ for any $g \in \ell^1$; thus the map $T : \ell^1 \to (\ell^\infty)^*$ given by $T(g) = L_g$ is not surjective. c) Give another proof that $T$ is not surjective, using Problem ??.

23. The Baire Category Theorem and applications

A topological space $X$ is called a Baire space if it has the following property: whenever $(U_n)_{n=1}^\infty$ is a countable sequence of open, dense subsets of $X$, the intersection $\bigcap_{n=1}^\infty U_n$ is dense in $X$. An equivalent formulation can be given in terms of nowhere dense sets: $X$ is Baire if and only if every countable union of nowhere dense sets has empty interior; in particular, a nonempty Baire space is not a countable union of nowhere dense sets. (Recall that a set $E \subset X$ is called nowhere dense if its closure has empty interior.) In particular, note that if $E$ is nowhere dense, then $X \setminus \overline{E}$ is open and dense. The Baire property is used as a kind of pigeonhole principle: the "thick" Baire space $X$ cannot be expressed as a countable union of the "thin" nowhere dense sets $E_n$. Equivalently, if $X$ is a nonempty Baire space and $X = \bigcup_n E_n$, then at least one of the $E_n$ is somewhere dense.

Theorem 23.1 (The Baire Category Theorem). Every complete metric space $X$ is a Baire space.

Proof. Let $(U_n)$ be a sequence of open dense sets in $X$. Note that a set is dense if and only if it has nonempty intersection with every nonempty open set $W \subset X$. Fix such a $W$. Since $U_1$ is open and dense, there is a point $x_1 \in W \cap U_1$ and a radius $0 < r_1 < 1$ such that the closed ball $\overline{B}(x_1, r_1)$ is contained in $W \cap U_1$. Similarly, since $U_2$ is open and dense there is a point $x_2 \in B(x_1, r_1) \cap U_2$ and a radius $0 < r_2 < \frac{1}{2}$ such that
\[ \overline{B}(x_2, r_2) \subset B(x_1, r_1) \cap U_2. \tag{23.1} \]
Continuing inductively, since each $U_n$ is open and dense we obtain a sequence of points $(x_n)_{n=1}^\infty$ and radii $0 < r_n < \frac{1}{n}$ such that
\[ \overline{B}(x_n, r_n) \subset B(x_{n-1}, r_{n-1}) \cap U_n. \tag{23.2} \]

Since $x_m \in B(x_n, r_n)$ for all $m \ge n$ and $r_n \to 0$, the sequence $(x_n)$ is Cauchy, and thus convergent since $X$ is complete. Moreover, since each of the closed sets $\overline{B}(x_n, r_n)$ contains a tail of the sequence, each contains the limit $x$ as well. Thus $x \in W \cap U_1 \cap \dots \cap U_n$ for all $n$, and the proof is complete. □

Remark: In some books a countable intersection of open dense sets is called residual or second category and a countable union of nowhere dense sets is called meager or first category. In $\mathbb{R}$ with the usual topology, $\mathbb{Q}$ is first category and $\mathbb{R} \setminus \mathbb{Q}$ is second category.

We now give three important applications of the Baire category theorem in functional analysis: the Principle of Uniform Boundedness (also known as the Banach-Steinhaus theorem), the Open Mapping Theorem, and the Closed Graph Theorem. (In learning these theorems, keep careful track of what completeness hypotheses are needed.)

Theorem 23.2 (The Principle of Uniform Boundedness (PUB)). Let $X, Y$ be normed spaces with $X$ complete, and let $\{T_\alpha : \alpha \in A\} \subset B(X, Y)$ be a collection of bounded linear transformations from $X$ to $Y$. If

\[ M(x) := \sup_\alpha \|T_\alpha x\| < \infty \tag{23.3} \]
for each $x \in X$, then $\sup_\alpha \|T_\alpha\| < \infty$. (In other words, a pointwise bounded family of linear operators is uniformly bounded.)

Proof. For each integer $n \ge 1$ consider the set
\[ V_n := \{x \in X : M(x) > n\}. \tag{23.4} \]
Since each $T_\alpha$ is bounded, the sets $V_n$ are open. (Indeed, for each $\alpha$ the map $x \to \|T_\alpha x\|$ is continuous from $X$ to $\mathbb{R}$, so if $\|T_\alpha x\| > n$ for some $\alpha$ then also $\|T_\alpha y\| > n$ for all $y$ sufficiently close to $x$.) If each $V_n$ were dense, then since $X$ is complete we would conclude from the Baire Category Theorem that the intersection $V = \bigcap_{n=1}^\infty V_n$ is nonempty. But then for any $x \in V$, we would have $M(x) > n$ for all $n$, which contradicts the pointwise boundedness hypothesis (23.3). Thus for some $N$, $V_N$ is not dense. Choose $x_0 \in X \setminus V_N$ and $r > 0$ so that $x_0 - x \in X \setminus V_N$ for all $\|x\| < r$. Then for every $\alpha$ and every $\|x\| < r$ we have
\[ \|T_\alpha x\| \le \|T_\alpha(x - x_0)\| + \|T_\alpha x_0\| \le N + M(x_0) =: R. \tag{23.5} \]
That is, $\|T_\alpha x\| \le R$ for all $\|x\| < r$, so by rescaling $x$ we conclude that $\|T_\alpha x\| \le R/r$ for all $\|x\| < 1$, so $\sup_\alpha \|T_\alpha\| \le R/r < \infty$. □

Recall that if $X, Y$ are topological spaces, a mapping $f : X \to Y$ is called open if $f(U)$ is open in $Y$ whenever $U$ is open in $X$. In the case of normed linear spaces the condition that a linear map be open can be refined somewhat:

Lemma 23.3 (Translation and Dilation lemma). Let $X, Y$ be normed vector spaces, let $B$ denote the open unit ball of $X$, and let $T : X \to Y$ be a linear map. Then the map $T$ is open if and only if $T(B)$ contains an open ball centered at 0.

Proof. This is more or less immediate from the fact that the translation map $z \to z + z_0$, for fixed $z_0$, and the dilation map $z \to rz$, $r \in \mathbb{K}$, are continuous in any normed vector space. The details are left as an exercise. □

Theorem 23.4 (The Open Mapping Theorem). Suppose that $X, Y$ are Banach spaces and $T : X \to Y$ is a bounded, surjective linear map. Then $T$ is open.

Proof. Let $B_r$ denote the open ball of radius $r$ centered at 0 in $X$. Trivially $X = \bigcup_{n=1}^\infty B_n$, and since $T$ is surjective, we have $Y = \bigcup_{n=1}^\infty T(B_n)$. Since $Y$ is complete, from the Baire category theorem we have that $T(B_N)$ is somewhere dense, for some $N$. That is, $\overline{T(B_N)}$ has nonempty interior; by rescaling we conclude that $\overline{T(B_1)}$ has nonempty interior. We must deduce from this that in fact $T(B_1)$ contains an open ball centered at 0 in $Y$.

First we show that $\overline{T(B_1)}$ contains an open ball centered at 0. There exist $y_0 \in Y$ and $r > 0$ such that $\overline{T(B_1)}$ contains $B_r^Y(y_0)$. Choose $y_1 = Tx_1$ in $T(B_1)$ such that $\|y_1 - y_0\| < r/2$; then by the triangle inequality
\[ B_{r/2}^Y(y_1) \subset B_r^Y(y_0) \subset \overline{T(B_1)}. \tag{23.6} \]
It follows that for all $\|y\| < r/2$,
\[ y = -y_1 + (y + y_1) = -Tx_1 + (y + y_1) \in \overline{T(-x_1 + B_1)} \subset \overline{T(B_2)}. \tag{23.7} \]
Halving the radius again, we conclude that if $\|y\| < r/4$, then $y \in \overline{T(B_1)}$.

Finally, we show that by shrinking the radius further we can find an open ball contained in $T(B_1)$. From the previous paragraph, by scaling again for each $n$, we see that if $\|y\| < r/2^{n+2}$ then $y \in \overline{T(B_{1/2^n})}$. If $\|y\| < r/8$, then there exists $x_1 \in B_{1/2}$ such that $\|y - Tx_1\| < r/16$. Inductively there exist $x_n \in B_{1/2^n}$ with $\left\|y - T\sum_{k=1}^n x_k\right\| < r 2^{-n-3}$. The series $\sum_{k=1}^\infty x_k$ is thus absolutely convergent, and hence convergent, since $X$ is complete. Call its sum $x$. By construction we have the estimate $\|x\| < \sum_{k=1}^\infty 2^{-k} = 1$, so $x \in B_1$, and $y = Tx$ by the continuity of $T$. In conclusion, we have shown that $y \in T(B_1)$ whenever $\|y\| < r/8$, so the proof is finished. □

Corollary 23.5 (The Banach Isomorphism Theorem). If $X, Y$ are Banach spaces and $T : X \to Y$ is a bounded linear bijection, then $T^{-1}$ is also bounded (hence, $T$ is an isomorphism).

Proof. Note that when $T$ is a bijection, $T$ is open if and only if $T^{-1}$ is continuous. The result then follows from the Open Mapping Theorem and Proposition 19.6. □

To state the final result of this section, we need a few more definitions. Let $X, Y$ be normed spaces. The Cartesian product $X \times Y$ is then a topological space in the product topology. (In fact the product topology can be realized by norming $X \times Y$, e.g. with the norm $\|(x, y)\| := \max(\|x\|, \|y\|)$.) The space $X \times Y$ is equipped with the coordinate projections $\pi_X(x, y) = x$, $\pi_Y(x, y) = y$; it is clear that these maps are continuous. Given a linear map $T : X \to Y$, its graph is the set
\[ G(T) := \{(x, y) \in X \times Y : y = Tx\}. \tag{23.8} \]
Observe that since $T$ is a linear map, $G(T)$ is a linear subspace of $X \times Y$. The transformation $T$ is called closed if $G(T)$ is a closed subset of $X \times Y$. This criterion is usually expressed in the following equivalent formulation: whenever $x_n \to x$ in $X$ and $Tx_n \to y$ in $Y$, it holds that $Tx = y$. Note that this is not the same thing as saying that $T$ is continuous: $T$ is continuous if and only if whenever $x_n \to x$ in $X$, then $Tx_n \to Tx$ in $Y$.
Evidently, if $T$ is continuous then $T$ is closed. The converse is false in general (see Problem 24.2). However, the converse does hold when $X$ and $Y$ are complete:

Theorem 23.6 (The Closed Graph Theorem). If $X, Y$ are Banach spaces and $T : X \to Y$ is a closed linear map, then $T$ is bounded.

Proof. Let $\pi_1, \pi_2$ be the coordinate projections $\pi_X, \pi_Y$ restricted to $G(T)$; explicitly $\pi_1(x, Tx) = x$ and $\pi_2(x, Tx) = Tx$. Note that $\pi_1$ is a bijection between $G(T)$ and $X$. Moreover, as maps between sets we have $T = \pi_2 \circ \pi_1^{-1}$. So, it suffices to prove that $\pi_1^{-1}$ is bounded. By the remarks above $\pi_1$ and $\pi_2$ are bounded (from $G(T)$ to $X$, $Y$ respectively). Since $X$ and $Y$ are complete, so is $X \times Y$, and therefore $G(T)$ is complete since it is assumed closed. Since $\pi_1$ is a bounded linear bijection between the Banach spaces $G(T)$ and $X$, its inverse is also bounded (Corollary 23.5). □

24. Problems

Problem 24.1. Show that there exists a sequence of open, dense subsets $U_n \subset \mathbb{R}$ such that $m\left(\bigcap_{n=1}^\infty U_n\right) = 0$.

Problem 24.2. Consider the linear subspace $D \subset c_0$ defined by
\[ D = \{f \in c_0 : \lim_{n \to \infty} |n f(n)| = 0\} \tag{24.1} \]
and the linear transformation $T : D \to c_0$ defined by $(Tf)(n) = n f(n)$. a) Prove that $T$ is closed, but not bounded. b) Prove that $T^{-1} : c_0 \to D$ is bounded and surjective, but not open.

Problem 24.3. Suppose $X$ is a vector space equipped with two norms $\|\cdot\|_1, \|\cdot\|_2$ such that $\|\cdot\|_1 \le \|\cdot\|_2$. Prove that if $X$ is complete in both norms, then the two norms are equivalent.

Problem 24.4. Let $X, Y$ be Banach spaces. Provisionally, say that a linear transformation $T : X \to Y$ is weakly bounded if $f \circ T \in X^*$ whenever $f \in Y^*$. Prove that if $T$ is weakly bounded, then $T$ is bounded.

Problem 24.5. Let $X, Y$ be Banach spaces. Suppose $(T_n)$ is a sequence in $B(X, Y)$ and $\lim_n T_n x$ exists for every $x \in X$. Prove that if $T$ is defined by $Tx = \lim_n T_n x$, then $T$ is bounded.

Problem 24.6. Suppose that $X$ is a vector space with a countably infinite basis. (That is, there is a linearly independent set $\{x_n\} \subset X$ such that every vector $x \in X$ is expressed uniquely as a finite linear combination of the $x_n$'s.) Prove that there is no norm on $X$ under which it is complete. (Hint: consider the finite-dimensional subspaces $X_n := \operatorname{span}\{x_1, \dots, x_n\}$.)

Problem 24.7. The Baire Category Theorem can be used to prove the existence of (very many!) continuous, nowhere differentiable functions on $[0, 1]$. To see this, let $E_n$ denote the set of all functions $f \in C[0, 1]$ for which there exists $x_0 \in [0, 1]$ (which may depend on $f$) such that $|f(x) - f(x_0)| \le n|x - x_0|$ for all $x \in [0, 1]$. Prove that the sets $E_n$ are nowhere dense in $C[0, 1]$; the Baire Category Theorem then shows that the set of nowhere differentiable functions is residual. (To see that $E_n$ is nowhere dense, approximate an arbitrary continuous function $f$ uniformly by piecewise linear functions $g$, whose pieces have slopes greater than $2n$ in absolute value. Any function sufficiently close to such a $g$ will not lie in $E_n$.)

25. Hilbert spaces

25.1. Inner products.

Definition 25.1. Let $X$ be a vector space over $\mathbb{K}$. An inner product on $X$ is a function $u : X \times X \to \mathbb{K}$ such that, for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{K}$,
1) $u(x, x) \ge 0$ and $u(x, x) = 0$ if and only if $x = 0$.
2) $u(x, y) = \overline{u(y, x)}$.
3) $u(\alpha x + \beta y, z) = \alpha u(x, z) + \beta u(y, z)$.
Notice that 2) and 3) together imply
4) $u(x, \alpha y + \beta z) = \overline{\alpha} u(x, y) + \overline{\beta} u(x, z)$.

Remark: A function $u$ satisfying only 3) and 4) is called a bilinear form (when $\mathbb{K} = \mathbb{R}$) or a sesquilinear form (when $\mathbb{K} = \mathbb{C}$). In this case, if 2) is also satisfied then $u$ is called symmetric ($\mathbb{R}$) or Hermitian ($\mathbb{C}$). A Hermitian or symmetric form satisfying $u(x, x) \ge 0$ for all $x$ is called positive semidefinite or a pre-inner product.

25.2. Examples.

a) ($\mathbb{R}^n$ and $\mathbb{C}^n$) It is easy to check that the standard scalar product on $\mathbb{R}^n$ is an inner product; it is defined as usual by
\[ \langle x, y \rangle = \sum_{j=1}^n x_j y_j \tag{25.1} \]
where we have written $x = (x_1, \dots, x_n)$; $y = (y_1, \dots, y_n)$. Similarly, the standard inner product of vectors $z = (z_1, \dots, z_n)$, $w = (w_1, \dots, w_n)$ in $\mathbb{C}^n$ is given by
\[ \langle z, w \rangle = \sum_{j=1}^n z_j \overline{w_j}. \tag{25.2} \]
(Note that it is necessary to take complex conjugates of the $w$'s to obtain positive definiteness.)

b) $\ell^2(\mathbb{N})$ is defined to be the vector space of square-summable sequences of numbers in $\mathbb{K}$:
\[ \ell^2(\mathbb{N}) = \left\{(a_1, a_2, \dots, a_n, \dots) \;\middle|\; a_n \in \mathbb{K}, \ \sum_{n=1}^\infty |a_n|^2 < \infty \right\} \tag{25.3} \]
(Exercise: prove that $\ell^2(\mathbb{N})$ is a vector space. The Cauchy-Schwarz inequality, proved below, will be helpful.) The standard inner product on $\ell^2(\mathbb{N})$ is, for sequences $a = (a_1, a_2, \dots)$ and $b = (b_1, b_2, \dots)$,
\[ \langle a, b \rangle = \sum_{n=1}^\infty a_n \overline{b_n}. \tag{25.4} \]
It is not immediate that this series is absolutely convergent; this will follow easily, however, once we have proven the Cauchy-Schwarz inequality. Assuming convergence for now, it is straightforward to prove that this defines an inner product.

c) $L^2(\mu)$: Let $(X, \mathcal{M}, \mu)$ be a measure space. Consider the set of all measurable functions $f : X \to \mathbb{K}$ such that
\[ \int_X |f|^2 \, d\mu < \infty. \tag{25.5} \]
The space $L^2(\mu)$ is defined to be this set, modulo the equivalence relation which declares $f$ equivalent to $g$ if $f = g$ almost everywhere. The resulting object is a vector space (Exercise: prove this), and as one would expect the inner product is given by
\[ \langle f, g \rangle = \int_X f \overline{g} \, d\mu. \tag{25.6} \]
(Again we must check that the integral is convergent, and again this will follow from Cauchy-Schwarz, or more precisely by imitating the proof of Cauchy-Schwarz.) Notice that the previous examples are special cases of this one, with $\mu$ equal to counting measure on $\{1, \dots, n\}$ and $\mathbb{N}$ respectively.

Theorem 25.2 (The Cauchy-Schwarz inequality). If $u$ is a pre-inner product on the vector space $X$, then for all $x, y \in X$,
\[ |u(x, y)|^2 \le u(x, x) u(y, y). \tag{25.7} \]

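Proof. For $\alpha \in \mathbb{K}$ and $x, y \in X$, we have
\[ 0 \le u(x - \alpha y, x - \alpha y) \tag{25.8} \]
\[ = u(x, x) - \alpha u(y, x) - \overline{\alpha} u(x, y) + |\alpha|^2 u(y, y). \tag{25.9} \]
Write $u(x, y)$ in polar form as $re^{i\theta}$, so that $u(y, x) = re^{-i\theta}$. Fix a real number $t$ for now (we will make an optimal choice later) and let $\alpha = te^{i\theta}$. Then (25.9) becomes
\[ 0 \le u(x, x) - te^{i\theta} re^{-i\theta} - te^{-i\theta} re^{i\theta} + t^2 u(y, y) \tag{25.10} \]
\[ = u(x, x) - 2rt + t^2 u(y, y). \tag{25.11} \]
The polynomial $q(t) = u(y, y)t^2 - 2rt + u(x, x)$ has real coefficients and is nonnegative for all real $t$. If $u(y, y) > 0$, it has either a repeated real root or complex roots; if $u(y, y) = 0$, nonnegativity for all $t$ forces $r = 0$. In either case the discriminant $b^2 - 4ac$ is nonpositive, in other words
\[ 4r^2 - 4u(x, x)u(y, y) \le 0, \tag{25.12} \]
which proves (25.7), since $r = |u(x, y)|$. □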
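As a numerical sanity check of (25.7), the following minimal Python sketch tests the Cauchy-Schwarz inequality for the standard inner product (25.2) on $\mathbb{C}^5$ with randomly chosen vectors, and then the equality case where one vector is a scalar multiple of the other. The helper names (`inner`, `cauchy_schwarz_gap`, `random_vector`), the dimension, and the sample size are illustrative assumptions, not part of the notes; a finite random test illustrates the inequality but proves nothing.

```python
import random

def inner(z, w):
    """Standard inner product on C^n: linear in z, conjugate-linear in w."""
    return sum(zj * wj.conjugate() for zj, wj in zip(z, w))

def cauchy_schwarz_gap(z, w):
    """Return u(z,z) u(w,w) - |u(z,w)|^2; inequality (25.7) says this is >= 0."""
    return inner(z, z).real * inner(w, w).real - abs(inner(z, w)) ** 2

def random_vector(n, rng):
    """A vector in C^n with real and imaginary parts drawn from [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is reproducible
min_gap = min(
    cauchy_schwarz_gap(random_vector(5, rng), random_vector(5, rng))
    for _ in range(1000)
)

# Equality case: w is a scalar multiple of z, so the gap should vanish
# up to floating-point rounding.
z = random_vector(5, rng)
w = [(2 - 3j) * zj for zj in z]
equality_gap = cauchy_schwarz_gap(z, w)

print(min_gap, equality_gap)
```

The smallest gap over the random sample should be nonnegative (up to rounding), and the gap in the proportional case should be numerically zero.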

Remark: Here is a second proof; the "arbitrage" argument given here may help to clarify the optimization trick used in the proof of Cauchy-Schwarz. Expanding out $u(x - y, x - y) \ge 0$ and rearranging, we get
\[ 2\operatorname{Re} u(x, y) \le u(x, x) + u(y, y). \tag{25.13} \]
Notice that if $u(x, y) = re^{i\theta}$ then $\operatorname{Re} u(e^{-i\theta}x, y) = |u(x, y)|$, so replacing $x$ by $e^{-i\theta}x$ (which leaves the right hand side unchanged) we get
\[ 2|u(x, y)| \le u(x, x) + u(y, y), \tag{25.14} \]
which says that $|u(x, y)|$ is dominated by the arithmetic mean of $u(x, x)$ and $u(y, y)$. But the inequality we are after says that $|u(x, y)|$ is in fact controlled by the geometric mean. How can the stronger inequality follow from the weaker one? Notice that for $\lambda$ nonzero and real, the transformation $x \to \lambda x$, $y \to \frac{1}{\lambda} y$ leaves the left hand side unchanged, so in fact
\[ 2|u(x, y)| \le \lambda^2 u(x, x) + \frac{1}{\lambda^2} u(y, y) \quad \text{for all } \lambda \in \mathbb{R} \setminus \{0\}. \tag{25.15} \]
So now we can minimize the right hand side as a function of $\lambda$; doing a little calculus (and taking care of the cases where $u(x, x)$ or $u(y, y)$ is zero) finishes. Moral: inequalities can "self-improve" if there are transformations that leave one side invariant but not the other; apply the transformation and optimize.

Example: Let us go back and verify that the series defining the inner product in $\ell^2(\mathbb{N})$ is absolutely convergent. Let $a, b \in \ell^2(\mathbb{N})$; by assumption we have
\[ \sum_{n=1}^\infty |a_n|^2 = A < \infty, \qquad \sum_{n=1}^\infty |b_n|^2 = B < \infty. \tag{25.16} \]
Fix $N$; by applying the Cauchy-Schwarz inequality to the standard inner product (25.2) on $\mathbb{C}^N$, we have
\[ \sum_{n=1}^N |a_n b_n| \le \left(\sum_{n=1}^N |a_n|^2\right)^{1/2} \left(\sum_{n=1}^N |b_n|^2\right)^{1/2} \tag{25.17} \]
\[ \le \sqrt{AB}. \tag{25.18} \]
So, the partial sums of the series $\sum_{n=1}^\infty |a_n b_n|$ are increasing and bounded above, and therefore the series (25.4) converges absolutely.

25.3. Norms. Given a vector space $X$ over $\mathbb{K}$ and a semi-inner product (that is, a pre-inner product) $\langle \cdot, \cdot \rangle$, define for each $x \in X$
\[ \|x\| := \sqrt{\langle x, x \rangle}. \tag{25.19} \]
This quantity should act something like a "length" of the vector $x$. Clearly $\|x\| \ge 0$ for all $x$, and moreover we have:

Theorem 25.3. Let $X$ be a semi-inner product space over $\mathbb{K}$, with $\|\cdot\|$ defined by equation (25.19). Then for all $x, y \in X$ and $\alpha \in \mathbb{K}$,
1) $\|x + y\| \le \|x\| + \|y\|$
2) $\|\alpha x\| = |\alpha| \|x\|$
If $\langle \cdot, \cdot \rangle$ is an inner product, then also
3) $\|x\| = 0$ if and only if $x = 0$.

Proof. We prove 1) and leave 2) and 3) as exercises. For all $x, y \in X$ we have
\[ \|x + y\|^2 = \langle x + y, x + y \rangle \tag{25.20} \]
\[ = \|x\|^2 + 2\operatorname{Re} \langle x, y \rangle + \|y\|^2 \tag{25.21} \]
\[ \le \|x\|^2 + 2|\langle x, y \rangle| + \|y\|^2 \tag{25.22} \]
\[ \le \|x\|^2 + 2\|x\| \|y\| + \|y\|^2 \tag{25.23} \]
\[ = (\|x\| + \|y\|)^2 \tag{25.24} \]
where we have used the Cauchy-Schwarz inequality in (25.23). Taking square roots finishes the proof. □

When $X$ is an inner product space, the quantity $\|x\|$ will be called the norm of $x$. Property 1) will be referred to as the triangle inequality. On $\mathbb{R}^n$, we get
\[ \|x\| = (x_1^2 + \dots + x_n^2)^{1/2} \tag{25.25} \]
which is of course the usual Euclidean norm. We are now ready to define Hilbert spaces.

Definition 25.4. A Hilbert space over $\mathbb{K}$ is a vector space $X$ over $\mathbb{K}$, equipped with an inner product, and such that $X$ is complete in the metric $d(x, y) = \|x - y\|$.

25.4. Completeness. Our next task is to show that the examples of inner product spaces considered thus far are complete with respect to their norms, and are therefore Hilbert spaces. That $\mathbb{R}^n$ and $\mathbb{C}^n$ are complete is known from elementary analysis. (Note that the complex case follows from the real case, since the Euclidean norms are equal under the natural isomorphism $\mathbb{C}^n \cong \mathbb{R}^{2n}$.)

Theorem 25.5. $L^2(\mu)$ is complete.

Proof. We use Proposition 19.3. Suppose $(f_k)$ is a sequence in $L^2(\mu)$ and $\sum_{k=1}^\infty \|f_k\| = B < \infty$. Define
\[ G_n = \sum_{k=1}^n |f_k| \quad \text{and} \quad G = \sum_{k=1}^\infty |f_k|. \tag{25.26} \]
By the triangle inequality, $\|G_n\| \le \sum_{k=1}^n \|f_k\| \le B$ for all $n$, and so by the Monotone Convergence Theorem,
\[ \int_X G^2 \, d\mu = \lim_{n \to \infty} \int_X G_n^2 \, d\mu \le B^2, \tag{25.27} \]
so $G$ belongs to $L^2(\mu)$ and in particular $G(x) < \infty$ for almost every $x$. By the definition of $G$, this implies that the sum
\[ \sum_{k=1}^\infty f_k(x) \tag{25.28} \]
converges absolutely for almost every $x$; call this (a.e. defined) sum $f$. By construction, $|f| \le G$ and thus $f \in L^2(\mu)$. Moreover, for all $n$ we have
\[ \left| f - \sum_{k=1}^n f_k \right|^2 \le (2G)^2. \tag{25.29} \]
Equation (25.27) says that $G^2$ is integrable, and the left-hand side of (25.29) tends to 0 almost everywhere, so we can apply the Dominated Convergence Theorem to obtain
\[ \lim_{n \to \infty} \left\| f - \sum_{k=1}^n f_k \right\|^2 = \lim_{n \to \infty} \int_X \left| f - \sum_{k=1}^n f_k \right|^2 d\mu = 0. \tag{25.30} \]
□

25.5. Orthogonality. In this section we show that many of the basic features of the Euclidean geometry of $\mathbb{R}^n$ can be extended to the Hilbert space setting.

Definition 25.6. Let $H$ be an inner product space. Two vectors $x, y \in H$ are orthogonal if $\langle x, y \rangle = 0$. When $x$ and $y$ are orthogonal we write $x \perp y$. More generally, if $A$ and $B$ are any two subsets of $H$, we say $A \perp B$ if $x \perp y$ for all $x \in A$, $y \in B$. Also, we let $A^\perp$ denote the set $\{x \in H \mid x \perp A\}$.

Theorem 25.7 (The Pythagorean Theorem). If $H$ is an inner product space and $f_1, \dots, f_n$ are mutually orthogonal vectors in $H$, then
\[ \|f_1 + \dots + f_n\|^2 = \|f_1\|^2 + \dots + \|f_n\|^2. \tag{25.31} \]

Proof. When $n = 2$, we have
\[ \|f_1 + f_2\|^2 = \|f_1\|^2 + 2\operatorname{Re} \langle f_1, f_2 \rangle + \|f_2\|^2 \tag{25.32} \]
\[ = \|f_1\|^2 + \|f_2\|^2. \tag{25.33} \]
The general case follows by induction. □

Theorem 25.8 (The Parallelogram Law). If $H$ is an inner product space and $f, g \in H$, then
\[ \|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2). \tag{25.34} \]

Proof. As usual, from the definition of the norm we have
\[ \|f \pm g\|^2 = \|f\|^2 + \|g\|^2 \pm 2\operatorname{Re} \langle f, g \rangle. \tag{25.35} \]
Now add. □

If we look at the proof of the Parallelogram Law and subtract instead of adding, we find
\[ \|f + g\|^2 - \|f - g\|^2 = 4\operatorname{Re} \langle f, g \rangle. \tag{25.36} \]
This proves

Theorem 25.9 (The Polarization Identity). If $H$ is an inner product space over $\mathbb{R}$, then
\[ \langle f, g \rangle = \frac{1}{4}\left(\|f + g\|^2 - \|f - g\|^2\right). \tag{25.37} \]

Remark: There is also a version of the polarization identity in the complex case; its statement and proof are left as an exercise. An elementary (but tricky) theorem of von Neumann says, in the real case, that if $H$ is any vector space equipped with a norm $\|\cdot\|$ such that the parallelogram law (25.34) holds for all $f, g \in H$, then $H$ is an inner product space; the inner product is then given by the formula (25.37). (The proof is simply to define the inner product by equation (25.37), and check that it is indeed an inner product.) There is of course a complex version as well.
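The parallelogram law (25.34) and the real polarization identity (25.37) are easy to test numerically. The sketch below, whose function names and test vectors are illustrative assumptions rather than anything from the notes, checks both for the Euclidean norm on $\mathbb{R}^3$, and also shows that the $\ell^1$ norm on $\mathbb{R}^2$ violates the parallelogram law, so by the remark above it cannot come from an inner product.

```python
import math

def dot(f, g):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(f, g))

def norm2(f):
    """Euclidean norm, induced by dot."""
    return math.sqrt(dot(f, f))

def norm1(f):
    """l^1 norm (sum of absolute values)."""
    return sum(abs(a) for a in f)

def parallelogram_defect(norm, f, g):
    """||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2); zero when (25.34) holds for f, g."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(f) ** 2 + norm(g) ** 2)

def polarization(norm, f, g):
    """Right-hand side of (25.37): (||f+g||^2 - ||f-g||^2) / 4."""
    s = [a + b for a, b in zip(f, g)]
    d = [a - b for a, b in zip(f, g)]
    return 0.25 * (norm(s) ** 2 - norm(d) ** 2)

f, g = [1.0, 2.0, -1.0], [0.5, -3.0, 4.0]
euclid_defect = parallelogram_defect(norm2, f, g)   # expected to be ~0
recovered = polarization(norm2, f, g)               # expected to match dot(f, g)

# For the l^1 norm the law fails already on the standard basis of R^2.
l1_defect = parallelogram_defect(norm1, [1.0, 0.0], [0.0, 1.0])

print(euclid_defect, recovered, dot(f, g), l1_defect)
```

Here the $\ell^1$ defect is nonzero because $\|e_1 \pm e_2\|_1 = 2$ while $\|e_1\|_1 = \|e_2\|_1 = 1$.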

25.6. Best approximation. The results of the preceding subsection used only the inner product, but to go further we will need to invoke completeness. From now on, then, we work only with Hilbert spaces. We begin with a fundamental approximation theorem. Recall that if X is a vector space over F, a subset K ⊆ X is called convex if whenever a, b ∈ K and 0 ≤ t ≤ 1, we have (1 − t)a + tb ∈ K as well. (Geometrically, this means that when a, b lie in K, so does the line segment joining them.) Theorem 25.10. Suppose H is a Hilbert space, K ⊆ H is a closed, convex, nonempty set, and h ∈ H. Then there exists a unique vector k0 ∈ K such that

\[ \|h - k_0\| = \operatorname{dist}(h, K) := \inf\{\|h - k\| : k \in K\}. \tag{25.38} \]

Proof. First, let's replace $K$ by $K - h = \{k - h \mid k \in K\}$. Then $\operatorname{dist}(h, K) = \operatorname{dist}(0, K - h)$, so we may assume $h = 0$. (Is there anything we need to check here?) Let $d = \operatorname{dist}(0, K) = \inf_{k \in K} \|k\|$. By the definition of the infimum, there exists a sequence $(k_n)$ in $K$ so that $\|k_n\| \to d$. It follows from the parallelogram law that for all $m, n$,
\[ \left\| \frac{k_m - k_n}{2} \right\|^2 = \frac{1}{2}\left(\|k_m\|^2 + \|k_n\|^2\right) - \left\| \frac{k_m + k_n}{2} \right\|^2. \tag{25.39} \]
Since $K$ is convex, $\frac{1}{2}(k_m + k_n) \in K$, and therefore $\left\|\frac{1}{2}(k_m + k_n)\right\|^2 \ge d^2$. Now let $\varepsilon > 0$ and choose $N$ such that for all $n \ge N$, $\|k_n\|^2 < d^2 + \frac{1}{4}\varepsilon^2$. By (25.39), if $m, n \ge N$ then
\[ \left\| \frac{k_m - k_n}{2} \right\|^2 < \frac{1}{2}\left(2d^2 + \frac{1}{2}\varepsilon^2\right) - d^2 = \frac{1}{4}\varepsilon^2. \tag{25.40} \]
This says that $\|k_m - k_n\| < \varepsilon$ for $m, n \ge N$; that is, $(k_n)$ is a Cauchy sequence. Since $H$ is complete, $k_n$ converges to a limit $k_0$, and since $K$ is closed, $k_0 \in K$. Moreover, for all $n$,
\[ d \le \|k_0\| = \|(k_0 - k_n) + k_n\| \tag{25.41} \]
\[ \le \|k_0 - k_n\| + \|k_n\| \to d \quad \text{as } n \to \infty. \tag{25.42} \]
This means $\|k_0\| = d$. It remains to show that $k_0$ is the unique element of $K$ with this property. So, suppose also $k' \in K$ and $\|k'\| = d$. Then $(k_0 + k')/2$ belongs to $K$, and has norm at least $d$. By the parallelogram law again,
\[ 0 \le \left\| \frac{k_0 - k'}{2} \right\|^2 = \frac{1}{2}\left(\|k_0\|^2 + \|k'\|^2\right) - \left\| \frac{k_0 + k'}{2} \right\|^2 \tag{25.43} \]
\[ \le \frac{1}{2}(d^2 + d^2) - d^2 = 0, \tag{25.44} \]
so $\|k_0 - k'\| = 0$, which means $k_0 = k'$. □

The most important application of the preceding approximation theorem is in the case when $K = M$ is a closed subspace of the Hilbert space $H$. (Note that a subspace is always convex.) What is significant is that in the case of a subspace, the minimizer $k_0$ has an elegant geometric description, namely, it is obtained by "dropping a perpendicular" from $h$ to $M$. This is the content of the next theorem. Since we will use the notation often, let us write $M \le H$ to mean that $M$ is a closed subspace of $H$.
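To make the nearest-point theorem (Theorem 25.10) concrete, here is a small sketch in the Hilbert space $\mathbb{R}^4$ with the closed convex set taken to be the unit box $[0, 1]^4$. The choice of this set, the clamping formula for its nearest point (valid because the squared distance splits into one term per coordinate), and all names below are assumptions made for illustration; they do not appear in the notes.

```python
import math
import random

def project_onto_box(h, lo=0.0, hi=1.0):
    """Nearest point to h in the box [lo, hi]^n: clamp each coordinate."""
    return [min(max(x, lo), hi) for x in h]

def dist(a, b):
    """Euclidean distance in R^n."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

h = [1.7, -0.4, 0.3, 2.5]
k0 = project_onto_box(h)   # candidate minimizer in K = [0, 1]^4
d = dist(h, k0)            # candidate value of dist(h, K)

# Compare against many random points of K: none should be closer to h than k0.
rng = random.Random(1)     # fixed seed so the run is reproducible
samples = [[rng.uniform(0.0, 1.0) for _ in h] for _ in range(5000)]
closest_sample = min(dist(h, k) for k in samples)

print(k0, d, closest_sample)
```

The random comparison only illustrates minimality; uniqueness of the minimizer is what the parallelogram-law argument in the proof supplies.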

Theorem 25.11. Let $H$ be a Hilbert space, $M \le H$, and $h \in H$. If $f_0$ is the unique element of $M$ such that $\|h - f_0\| = \operatorname{dist}(h, M)$, then $(h - f_0) \perp M$. Conversely, if $f_0 \in M$ and $(h - f_0) \perp M$, then $\|h - f_0\| = \operatorname{dist}(h, M)$.

Proof. Let $f_0 \in M$ with $\|h - f_0\| = \operatorname{dist}(h, M)$. If $f$ is any vector in $M$, then $f_0 + f$ belongs to $M$ and so
\[ 0 \le \|h - f_0\|^2 \le \|h - (f_0 + f)\|^2 \tag{25.45} \]
\[ = \|(h - f_0) - f\|^2 \tag{25.46} \]
\[ = \|h - f_0\|^2 - 2\operatorname{Re} \langle h - f_0, f \rangle + \|f\|^2. \tag{25.47} \]
This implies that
\[ 2\operatorname{Re} \langle h - f_0, f \rangle \le \|f\|^2 \quad \text{for all } f \in M. \tag{25.48} \]
As in the proof of Cauchy-Schwarz, we optimize over $f$: first, let $re^{i\theta}$ be the polar form of $\langle h - f_0, f \rangle$. Let $t > 0$. If we replace $f$ in (25.48) by $te^{i\theta}f$, we get
\[ 2t|\langle h - f_0, f \rangle| \le t^2 \|f\|^2. \tag{25.49} \]
Dividing by $t$ and taking the limit as $t \to 0$, we see that $\langle h - f_0, f \rangle = 0$, as desired.

Conversely, suppose $f_0 \in M$ and $(h - f_0) \perp M$. In particular, we have $(h - f_0) \perp (f_0 - f)$ for all $f \in M$. Therefore, for all $f \in M$
\[ \|h - f\|^2 = \|(h - f_0) + (f_0 - f)\|^2 \tag{25.50} \]
\[ = \|h - f_0\|^2 + \|f_0 - f\|^2 \quad \text{(why?)} \tag{25.51} \]
\[ \ge \|h - f_0\|^2. \tag{25.52} \]
Thus $\|h - f_0\| = \operatorname{dist}(h, M)$. □
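The characterization in Theorem 25.11 gives a practical recipe in finite dimensions: to find the nearest point $f_0$ of a subspace $M$, impose the conditions $(h - f_0) \perp M$ and solve the resulting linear system. The sketch below does this for a two-dimensional subspace of $\mathbb{R}^3$; the spanning vectors, the helper `project_onto_span`, and the use of Cramer's rule are illustrative assumptions, not part of the notes.

```python
import math
import random

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def project_onto_span(h, v1, v2):
    """Find f0 = c1*v1 + c2*v2 with (h - f0) orthogonal to v1 and v2.

    Assumes v1, v2 are linearly independent, so the 2x2 Gram system
    [[<v1,v1>, <v1,v2>], [<v1,v2>, <v2,v2>]] [c1, c2]^T = [<h,v1>, <h,v2>]^T
    has a unique solution (computed here by Cramer's rule).
    """
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    b1, b2 = dot(h, v1), dot(h, v2)
    det = g11 * g22 - g12 * g12
    c1 = (b1 * g22 - g12 * b2) / det
    c2 = (g11 * b2 - g12 * b1) / det
    return [c1 * x + c2 * y for x, y in zip(v1, v2)]

h, v1, v2 = [1.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
f0 = project_onto_span(h, v1, v2)
residual = [x - y for x, y in zip(h, f0)]
best = math.sqrt(dot(residual, residual))   # ||h - f0||

# No randomly chosen point of M = span{v1, v2} should be closer to h than f0.
rng = random.Random(2)

def random_point_of_M():
    a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
    return [a * x + b * y for x, y in zip(v1, v2)]

def dist_to_h(f):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(h, f)))

closest_sample = min(dist_to_h(random_point_of_M()) for _ in range(2000))

print(f0, dot(residual, v1), dot(residual, v2), best, closest_sample)
```

For these particular vectors the Gram system has solution $c_1 = 0$, $c_2 = 1$, so $f_0 = v_2$ and the residual $h - f_0 = (1, 1, -1)$ is orthogonal to both spanning vectors.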

To reiterate: if $M \le H$ and $h \in H$, there exists a unique $f_0 \in M$ such that $(h - f_0) \perp M$. We thus obtain a well-defined function $P : H \to H$ (or, we could write $P : H \to M$) defined by
\[ P(h) = f_0. \tag{25.53} \]
Henceforth we omit the parentheses and just write $Ph = f_0$. If the space $M$ needs to be emphasized we will write $P_M$ for $P$. The function $P$ is called the orthogonal projection of $H$ on $M$. The following theorem describes the basic properties of $P$.

Theorem 25.12. Let $H$ be a Hilbert space, $M \le H$, and $P$ the orthogonal projection on $M$. Then:
a) $P$ is a linear transformation from $H$ into itself.
b) $\|Ph\| \le \|h\|$ for all $h \in H$.
c) $P^2 = P$.
d) $\ker P = M^\perp$ and $\operatorname{ran} P = M$.

Proof. a) Let $\alpha \in \mathbb{F}$ and $h_1, h_2 \in H$. We must prove that $P(\alpha h_1 + h_2) = \alpha Ph_1 + Ph_2$. To do this we prove that $\alpha Ph_1 + Ph_2$ has the property uniquely characterizing $P(\alpha h_1 + h_2)$, namely, that it is the unique vector $f_0 \in M$ such that $\alpha h_1 + h_2 - f_0$ is orthogonal to $M$. So, let $f \in M$ and compute:
\[ \langle (\alpha h_1 + h_2) - (\alpha Ph_1 + Ph_2), f \rangle \tag{25.54} \]
\[ = \alpha \langle h_1 - Ph_1, f \rangle + \langle h_2 - Ph_2, f \rangle = 0 \tag{25.55} \]
by the definition of $Ph_1$ and $Ph_2$, and Theorem 25.11.
b) If $h \in H$, then $h = (h - Ph) + Ph$. But $(h - Ph) \perp M$ and $Ph \in M$, thus by the Pythagorean Theorem
\[ \|h\|^2 = \|h - Ph\|^2 + \|Ph\|^2, \tag{25.56} \]
so $\|Ph\| \le \|h\|$.
c) If $f \in M$, then evidently $Pf = f$. But $Ph \in M$ always, so $P^2h = P(Ph) = Ph$.
d) If $Ph = 0$, then $h - Ph = h \in M^\perp$, so $\ker P \subseteq M^\perp$. On the other hand, if $h \in M^\perp$, then $h - 0 \in M^\perp$, so by Theorem 25.11 $Ph = 0$. That $\operatorname{ran} P = M$ is evident. □

Corollary 25.13. If $M \le H$, then $(M^\perp)^\perp = M$.

Corollary 25.14. If $E$ is any subset of $H$, then $(E^\perp)^\perp$ is equal to the smallest closed subspace of $H$ containing $E$.

25.7. The Riesz Representation Theorem. In this section we investigate the dual $H^*$ of a Hilbert space $H$. One way to construct bounded linear functionals on Hilbert space is as follows: fix a vector $h_0 \in H$ and define

\[ L(h) = \langle h, h_0 \rangle. \tag{25.57} \]
Indeed, linearity of $L$ is just the linearity of the inner product in the first entry, and the boundedness of $L$ follows from the Cauchy-Schwarz inequality:
\[ |L(h)| = |\langle h, h_0 \rangle| \le \|h_0\| \|h\|. \tag{25.58} \]
So $\|L\| \le \|h_0\|$, but in fact it is easy to see that $\|L\| = \|h_0\|$; for $h_0 \neq 0$, just apply $L$ to the unit vector $h_0 / \|h_0\|$. In fact, it is clear from linear algebra that every linear functional on $\mathbb{K}^n$ is of this form. More generally, every bounded linear functional on a Hilbert space has the form just described.

Theorem 25.15 (The Riesz Representation Theorem). If $H$ is a Hilbert space and $L : H \to \mathbb{F}$ is a bounded linear functional, then there exists a unique vector $h_0 \in H$ such that
\[ L(h) = \langle h, h_0 \rangle \tag{25.59} \]
for all $h \in H$. Moreover, $\|L\| = \|h_0\|$.

Proof. If $L = 0$ we of course have $h_0 = 0$. So, assume $L \neq 0$. Notice that if (25.59) holds, then necessarily $h_0 \perp \ker L$, so this tells us where to look for $h_0$. In particular, since we are assuming $L$ is bounded, it is continuous by Proposition 19.6, and therefore $\ker L = L^{-1}(\{0\})$ is a proper, closed subspace of $H$. We can then find a nonzero vector $f_0 \in (\ker L)^\perp$ (why?), and since $f_0 \notin \ker L$ we have $L(f_0) \neq 0$, so by rescaling we may assume $L(f_0) = 1$. Let $h \in H$ and $\alpha = L(h)$. Then
\[ L(h - \alpha f_0) = L(h) - \alpha = 0, \tag{25.60} \]
so $h - \alpha f_0 \in \ker L$. Therefore
\[ 0 = \langle h - \alpha f_0, f_0 \rangle \tag{25.61} \]
\[ = \langle h, f_0 \rangle - \alpha \|f_0\|^2. \tag{25.62} \]
Solving for $\alpha$, we find
\[ \alpha = L(h) = \left\langle h, \frac{f_0}{\|f_0\|^2} \right\rangle. \tag{25.63} \]
Thus, $h_0 = f_0 / \|f_0\|^2$ works. Uniqueness is immediate, since if $\langle h, h_0 \rangle = \langle h, h_1 \rangle$ for all $h$, then $h_0 = h_1$. Finally, the expression for the norm was proved in the discussion preceding the theorem. □

25.8. Orthonormal Sets and Bases.

Definition 25.16. Let $H$ be a Hilbert space. A set $E \subset H$ is called orthonormal if a) $\|e\| = 1$ for all $e \in E$, and b) if $e, f \in E$ and $e \neq f$, then $e \perp f$. An orthonormal set $E$ is called maximal if it is not contained in any larger orthonormal set. A maximal orthonormal set is called an (orthonormal) basis for $H$.

Observe that an orthonormal set $E$ is maximal if and only if the only vector orthogonal to $E$ is the zero vector.

Remark: It must be stressed that a basis in the above sense need not be a basis in the sense of linear algebra. In particular, it is always true that an orthonormal set is linearly independent (Exercise: prove this), but in general an orthonormal basis need not span $H$.

25.9. Examples.

a) Of course the standard basis $\{e_1, \dots, e_n\}$ is an orthonormal basis of $\mathbb{K}^n$.

b) In much the same way we get an orthonormal basis of $\ell^2(\mathbb{N})$; for each $n$ define
\[ e_n(k) = \begin{cases} 1 & \text{if } k = n \\ 0 & \text{if } k \neq n \end{cases} \tag{25.64} \]

It is straightforward to check that the set $E = \{e_n\}_{n=1}^\infty$ is orthonormal. In fact, it is a basis. To see this, notice that if $h : \mathbb{N} \to \mathbb{F}$ belongs to $\ell^2(\mathbb{N})$, then $\langle h, e_n \rangle = h(n)$, and hence if $h \perp E$, we have $h(n) = 0$ for all $n$, so $h = 0$.

c) Let $H = L^2[0, 1]$. Consider for $n \in \mathbb{Z}$ the functions $e_n(x) = e^{2\pi i n x}$. It is a calculus exercise to check that this set is orthonormal. It is in fact a basis, but this is not obvious; its proof relies on facts from the theory of Fourier series.

d) More generally, given any linearly independent sequence $f_1, f_2, \dots$, we can apply the Gram-Schmidt process to obtain an orthonormal set. More precisely:

Theorem 25.17 (Gram-Schmidt process). If $(f_n)_{n=1}^\infty$ is a linearly independent sequence in $H$, there exists an orthonormal sequence $(e_n)_{n=1}^\infty$ such that for each $n$, $\operatorname{span}\{f_1, \dots, f_n\} = \operatorname{span}\{e_1, \dots, e_n\}$.

Proof. Put $e_1 = f_1 / \|f_1\|$. Assuming $e_1, \dots, e_n$ have been constructed satisfying the conditions of the theorem, define
\[ e_{n+1} = \frac{f_{n+1} - \sum_{j=1}^n \langle f_{n+1}, e_j \rangle e_j}{\left\| f_{n+1} - \sum_{j=1}^n \langle f_{n+1}, e_j \rangle e_j \right\|}. \tag{25.65} \]
The rest of the proof is left as an exercise. □

25.10. Basis expansions. Our ultimate goal in this section is to show that vectors in Hilbert space admit expansions as (possibly infinite) linear combinations of basis vectors. The first step in this direction is Bessel's inequality, of fundamental importance in its own right. Before doing this, we record a useful computation regarding orthogonal projections.
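The recursion (25.65) in the proof of the Gram-Schmidt process translates directly into code. The following is a minimal sketch for finitely many vectors in $\mathbb{C}^n$ with the standard inner product (linear in the first entry, conjugate-linear in the second); the function names, the tolerance `1e-12` used to detect linear dependence, and the sample vectors are assumptions for illustration. This is the classical form of the process, written to mirror the formula, and it is not tuned for numerical stability.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x).real)

def gram_schmidt(fs):
    """Orthonormalize a linearly independent list fs following (25.65)."""
    es = []
    for f in fs:
        # Subtract from f its components along the e_j built so far.
        r = list(f)
        for e in es:
            c = inner(f, e)
            r = [rj - c * ej for rj, ej in zip(r, e)]
        n = norm(r)
        if n < 1e-12:
            raise ValueError("input vectors are not linearly independent")
        es.append([rj / n for rj in r])
    return es

fs = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt(fs)

# Largest deviation of the Gram matrix (<e_i, e_j>) from the identity.
max_error = max(
    abs(inner(es[i], es[j]) - (1 if i == j else 0))
    for i in range(3) for j in range(3)
)
print(max_error)
```

The printed deviation should be at the level of floating-point rounding, confirming that the output vectors are orthonormal.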

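Theorem 25.18. Let $\{e_1, \dots, e_n\}$ be an orthonormal set in $H$, and let $M = \operatorname{span}\{e_1, \dots, e_n\}$. Then $M$ is closed, and the orthogonal projection onto $M$ is given by
\[ P h = \sum_{j=1}^n \langle h, e_j\rangle e_j. \tag{25.66} \]

Proof. The proof that $M$ is closed is left as an exercise. Now the right-hand side of (25.66) belongs to $M$ by definition, so to compute $P$, it is enough to show that if $Ph$ is given by the formula (25.66), then $(h - Ph) \perp M$. But this is easy: for $1 \le m \le n$,
\[ \langle h - Ph, e_m\rangle = \langle h, e_m\rangle - \Big\langle \sum_{j=1}^n \langle h, e_j\rangle e_j, e_m \Big\rangle \tag{25.67} \]
\[ \phantom{\langle h - Ph, e_m\rangle} = \langle h, e_m\rangle - \sum_{j=1}^n \langle h, e_j\rangle \langle e_j, e_m\rangle \tag{25.68} \]
\[ \phantom{\langle h - Ph, e_m\rangle} = \langle h, e_m\rangle - \langle h, e_m\rangle = 0. \tag{25.69} \]
□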
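As an informal numerical illustration of the Gram-Schmidt step (25.65) and the projection formula (25.66), here is a minimal Python sketch. It assumes real scalars and the standard inner product on $\mathbb{R}^3$; the helper names `inner`, `norm`, `gram_schmidt`, and `project`, and the sample vectors, are chosen for this example only and do not come from the notes.

```python
import math

def inner(u, v):
    """Standard real inner product on R^n (assumption: real scalars)."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def gram_schmidt(fs):
    """Orthonormalize a linearly independent list, following (25.65)."""
    es = []
    for f in fs:
        r = list(f)
        # subtract the components of f along the e_j constructed so far
        for e in es:
            c = inner(f, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        n = norm(r)  # nonzero because fs is linearly independent
        es.append([ri / n for ri in r])
    return es

def project(h, es):
    """P h = sum_j <h, e_j> e_j, as in (25.66)."""
    p = [0.0] * len(h)
    for e in es:
        c = inner(h, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

fs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]   # hypothetical sample vectors
es = gram_schmidt(fs)
h = [3.0, -1.0, 2.0]
Ph = project(h, es)
residual = [a - b for a, b in zip(h, Ph)]  # h - Ph, expected to be orthogonal to M
```

Up to rounding, `residual` should be orthogonal to both `es[0]` and `es[1]` (and hence to the original `fs`), which is the content of (25.67) through (25.69).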

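Theorem 25.19 (Bessel's inequality). If $\{e_1, e_2, \dots\}$ is an orthonormal sequence in $H$, then for all $h \in H$
\[ \sum_{n=1}^\infty |\langle h, e_n\rangle|^2 \le \|h\|^2. \tag{25.70} \]

Proof. Fix a vector $h$ and a positive integer $N$. Define
\[ h_N = h - \sum_{j=1}^N \langle h, e_j\rangle e_j. \tag{25.71} \]
As we computed in the proof of Theorem 25.18, $h_N \perp e_j$ for all $j = 1, \dots, N$. Then by the Pythagorean theorem (twice),
\[ \|h\|^2 = \|h_N\|^2 + \Big\| \sum_{j=1}^N \langle h, e_j\rangle e_j \Big\|^2 \tag{25.72} \]
\[ \phantom{\|h\|^2} = \|h_N\|^2 + \sum_{j=1}^N |\langle h, e_j\rangle|^2 \tag{25.73} \]
\[ \phantom{\|h\|^2} \ge \sum_{j=1}^N |\langle h, e_j\rangle|^2. \tag{25.74} \]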

Since this holds for all $N$, we are done. □

Corollary 25.20. If $E \subset H$ is an orthonormal set and $h \in H$, then $\langle h, e\rangle$ is nonzero for at most countably many $e \in E$.

Proof. Fix $h \in H$ and a positive integer $N$, and define
\[ E_N = \Big\{ e \in E : |\langle h, e\rangle| \ge \frac{1}{N} \Big\}. \tag{25.75} \]

We claim that $E_N$ is finite. If not, then it contains a countably infinite subset $\{e_1, e_2, \dots\}$. Applying Bessel's inequality to $h$ and this subset, we get
\[ \|h\|^2 \ge \sum_{n=1}^\infty |\langle h, e_n\rangle|^2 \ge \sum_{n=1}^\infty \frac{1}{N^2} = +\infty, \tag{25.76} \]
a contradiction. But now,
\[ \{ e \in E : \langle h, e\rangle \neq 0 \} = \bigcup_{N=1}^\infty E_N \tag{25.77} \]
which is a countable union of finite sets, and therefore countable. □

We can now extend Bessel's inequality to arbitrary (i.e., not necessarily countable) orthonormal sets, if we're careful.

Theorem 25.21. If $E \subset H$ is an orthonormal set and $h \in H$, then
\[ \sum_{e \in E} |\langle h, e\rangle|^2 \le \|h\|^2. \tag{25.78} \]

Remark: The (possibly uncountable) sum in (25.78) is interpreted as follows: we know from Corollary 25.20 that at most countably many of the terms $\langle h, e\rangle$ are nonzero. Fix an enumeration of these terms, and sum the ordinary infinite series that results. Since the terms of the series are nonnegative, if it is convergent then it is absolutely convergent, and hence its sum is independent of the ordering, that is, independent of the enumeration we chose.

Proof of Theorem 25.21. The proof consists of combining Bessel's inequality and the foregoing remark; the details are left as an exercise. □

At this point we pause to discuss convergence of infinite series in Hilbert space. We have already encountered ordinary convergence and absolute convergence in our discussion of completeness: recall that the series $\sum_{n=1}^\infty h_n$ converges if $\lim_{N\to\infty} \sum_{n=1}^N h_n$ exists; its limit $h$ is called the sum of the series. Say the series converges absolutely if $\sum_{n=1}^\infty \|h_n\| < \infty$. Now, say that the series $\sum_{n=1}^\infty h_n$ is unconditionally convergent if it converges to a sum $h$, and if, whenever $\varphi : \mathbb{N} \to \mathbb{N}$ is a bijection, the series $\sum_{n=1}^\infty h_{\varphi(n)}$ also converges to $h$. (In other words, every reordering of the series converges, and to the same sum.) Note that for ordinary scalar series, unconditional convergence is equivalent to absolute convergence, but this is not so in Hilbert space.

Theorem 25.22. If $E \subset H$ is an orthonormal set and $h \in H$, then the series
\[ \sum_{e \in E} \langle h, e\rangle e \tag{25.79} \]
is unconditionally convergent.

Proof. Let $F = \{e \in E : \langle h, e\rangle \neq 0\}$. By Corollary 25.20, we know $F$ is countable. We divide the proof into two parts: first, we fix an enumeration $e_1, e_2, \dots$ of $F$, and show that the resulting series converges. We then show that the sum is independent of the choice of enumeration.

Part I: Let $\{e_1, e_2, \dots\}$ be an enumeration of $F$ and consider the series
\[ \sum_{n=1}^\infty \langle h, e_n\rangle e_n. \tag{25.80} \]
Let $\epsilon > 0$. By Bessel's inequality, there exists an integer $N$ so that
\[ \sum_{n=N+1}^\infty |\langle h, e_n\rangle|^2 < \epsilon. \tag{25.81} \]
Let $s_j$ denote the partial sum $\sum_{n=1}^j \langle h, e_n\rangle e_n$. Then for $k > j \ge N$, we have by the Pythagorean theorem
\[ \|s_k - s_j\|^2 = \Big\| \sum_{n=j+1}^k \langle h, e_n\rangle e_n \Big\|^2 \tag{25.82} \]
\[ \phantom{\|s_k - s_j\|^2} = \sum_{n=j+1}^k |\langle h, e_n\rangle|^2 \tag{25.83} \]
\[ \phantom{\|s_k - s_j\|^2} \le \sum_{n=N+1}^\infty |\langle h, e_n\rangle|^2 \tag{25.84} \]
\[ \phantom{\|s_k - s_j\|^2} < \epsilon. \tag{25.85} \]

Thus $(s_j)$ is a Cauchy sequence, which converges since $H$ is complete. So, the series (25.80) converges.

Part II: Let $e_1, e_2, \dots$ be the enumeration used in Part I, and let $g$ denote the sum of the series (25.80). Now let $\varphi : \mathbb{N} \to \mathbb{N}$ be any bijection, and consider the corresponding enumeration $e_{\varphi(1)}, e_{\varphi(2)}, \dots$. The proof in Part I shows that the series
\[ \sum_{n=1}^\infty \langle h, e_{\varphi(n)}\rangle e_{\varphi(n)} \tag{25.86} \]
also converges; call its sum $g'$. Let $t_j$ denote the partial sum $\sum_{n=1}^j \langle h, e_{\varphi(n)}\rangle e_{\varphi(n)}$. To show that $g' = g$, it suffices to show that given $\epsilon > 0$, there exists an integer $M$ so that if $j \ge M$, then $\|s_j - t_j\|^2 < \epsilon$ (why?). Given $\epsilon > 0$, choose $N$ so that (25.81) holds. Now choose $M \ge N$ so that
\[ \{1, 2, \dots, N\} \subseteq \{\varphi(1), \varphi(2), \dots, \varphi(M)\}. \tag{25.87} \]

For each $j \ge M$, let $G_j$ be the symmetric difference of the sets $\{1, \dots, j\}$ and $\{\varphi(1), \dots, \varphi(j)\}$ (that is, their union minus their intersection). Since $j \ge M$, the set $G_j$ is disjoint from $\{1, \dots, N\}$. It follows that
\[ \|s_j - t_j\|^2 = \sum_{n \in G_j} |\langle h, e_n\rangle|^2 \tag{25.88} \]
\[ \phantom{\|s_j - t_j\|^2} \le \sum_{n=N+1}^\infty |\langle h, e_n\rangle|^2 \tag{25.89} \]
\[ \phantom{\|s_j - t_j\|^2} < \epsilon, \tag{25.90} \]
as desired. This finishes the proof. □

We are almost ready to prove the main theorem of this section. Given a set of vectors $S \subset H$, the closed span of $S$, denoted $\vee S$, is the closure in $H$ of the vector subspace $\operatorname{span}(S)$. (As an exercise, prove that the closure of a subspace is again a subspace.)

Theorem 25.23. If $E \subset H$ is an orthonormal set, then the following are equivalent:
a) $E$ is a maximal orthonormal set in $H$.
b) If $h \perp E$, then $h = 0$.
c) $\vee E = H$.
d) For all $h \in H$, $h = \sum_{e \in E} \langle h, e\rangle e$.
e) (Plancherel theorem) For all $g, h \in H$, $\langle g, h\rangle = \sum_{e \in E} \langle g, e\rangle \langle e, h\rangle$.
f) (Parseval identity) For all $h \in H$, $\|h\|^2 = \sum_{e \in E} |\langle h, e\rangle|^2$.

Proof. The theorem says that any of these six equivalent conditions can be taken to be the definition of an orthonormal basis.
a) $\Rightarrow$ b): This is immediate from the maximality of $E$.
b) $\Rightarrow$ c): Let $M = \vee E$. Since $M$ is a closed subspace, $M^{\perp\perp} = M$. By b), $M^\perp = \{0\}$, so $M^{\perp\perp} = H$.
c) $\Rightarrow$ b): If $h \perp E$, then $h \perp \vee E$, hence $h = 0$.
b) $\Rightarrow$ d): Let $h' = \sum_{e \in E} \langle h, e\rangle e$. Then $(h - h') \perp E$ (check this carefully!), so $h = h'$.
d) $\Rightarrow$ e): Exercise.
e) $\Rightarrow$ f): Immediate; put $g = h$.
f) $\Rightarrow$ a): If $E$ is not maximal, choose $h \perp E$ with $\|h\| = 1$. Then
\[ 1 = \|h\|^2 = \sum_{e \in E} |\langle h, e\rangle|^2 = 0, \tag{25.91} \]
a contradiction. □

Definition 25.24. An orthonormal basis for a Hilbert space is a maximal orthonormal set.

Theorem 25.25. Every Hilbert space has an orthonormal basis.

Proof. The proof is essentially the same as the Zorn's lemma proof that every vector space has a basis. Let $H$ be a Hilbert space and $\mathcal{E}$ the collection of orthonormal subsets of $H$, partially ordered by inclusion. The Gram-Schmidt process shows that $\mathcal{E}$ is nonempty. If $(E_\alpha)$ is an ascending chain in $\mathcal{E}$, then it is straightforward to verify that $\cup_\alpha E_\alpha$ is an orthonormal set, and is an upper bound for $(E_\alpha)$. Thus by Zorn's lemma, $\mathcal{E}$ has a maximal element. □

Theorem 25.26. Any two bases of a Hilbert space $H$ have the same cardinality.

Proof. Let $E, F$ be two orthonormal bases.
The case where either $E$ or $F$ is finite is left as an exercise. So, assume both $E$ and $F$ are infinite. Fix $e \in E$ and consider the set
\[ F_e = \{ f \in F \mid \langle f, e\rangle \neq 0 \}. \tag{25.92} \]
Since $F$ is orthonormal, each $F_e$ is at most countable (Corollary 25.20), and since $E$ is a basis, each $f \in F$ belongs to at least one $F_e$ (otherwise $f \perp E$, forcing $f = 0$). Thus $\bigcup_{e \in E} F_e = F$, and
\[ |F| = \Big| \bigcup_{e \in E} F_e \Big| \le |E| \cdot \aleph_0 = |E| \tag{25.93} \]
where the last equality holds since $E$ is infinite. By symmetry, $|E| \le |F|$ and the proof is complete. □

In light of this theorem, we make the following definition.

Definition 25.27. The (orthogonal) dimension of a Hilbert space $H$ is the cardinality of any orthonormal basis, and is denoted $\dim H$.

It can be shown that if $H$ has finite dimension as a vector space, then it also has finite orthogonal dimension, and the two dimensions are equal (Exercise). This is false in infinite dimensions, however.

25.11. Weak convergence. In addition to the norm topology, Hilbert spaces carry another topology called the weak topology. In these notes we will stick to the separable case and just study weakly convergent sequences.

Definition 25.28. Let $H$ be a separable Hilbert space and $(h_n)$ a sequence in $H$. Say that $h_n$ converges weakly to $h \in H$ if for all $g \in H$,
\[ \langle h_n, g\rangle \to \langle h, g\rangle. \tag{25.94} \]

From the Cauchy-Schwarz inequality, we see that if $h_n \to h$ in norm, then $h_n \to h$ weakly also. However, when $H$ is infinite-dimensional, the converse can fail: let $\{e_n\}_{n=1}^\infty$ be an orthonormal basis for $H$. Then $e_n \to 0$ weakly. (The proof is an exercise; see Problem 26.10.) On the other hand, the sequence $(e_n)$ is not norm convergent, since it is not Cauchy.

In this section we characterize weak convergence as "bounded coordinate-wise convergence" and show that the unit ball of a separable Hilbert space is weakly sequentially compact.

Proposition 25.29. Let $H$ be a Hilbert space with orthonormal basis $\{e_j\}_{j=1}^\infty$. A sequence $(h_n)$ in $H$ is weakly convergent if and only if
i) $\sup_n \|h_n\| < \infty$, and
ii) $\lim_n \langle h_n, e_j\rangle$ exists for each $j$.

Proof. Suppose $h_n \to h$ weakly. For each $n$ we have a linear functional
\[ L_n(g) = \langle g, h_n\rangle. \tag{25.95} \]
For each fixed $g$, the sequence $(L_n(g))$ converges and is therefore bounded. Thus, the family of linear functionals $(L_n)$ is pointwise bounded, so by the Principle of Uniform Boundedness, we have $\sup_n \|h_n\| = \sup_n \|L_n\| < \infty$. This proves (i), and (ii) is immediate from the definition of weak convergence.

Conversely, suppose (i) and (ii) hold, and let $M = \sup_n \|h_n\|$. Define
\[ \hat{h}_j = \lim_n \langle h_n, e_j\rangle. \tag{25.96} \]

We will show that $\sum_j |\hat{h}_j|^2 \le M^2$ (so that the series $\sum_j \hat{h}_j e_j$ is norm convergent in $H$), where $\hat{h}_j = \lim_n \langle h_n, e_j\rangle$ is the limit from (25.96); we then define $h$ to be the sum of this series and show that $h_n \to h$ weakly. Fix a positive integer $J$. Then
\[ \sum_{j=1}^J |\hat{h}_j|^2 = \sum_{j=1}^J \lim_n |\langle h_n, e_j\rangle|^2 = \lim_n \sum_{j=1}^J |\langle h_n, e_j\rangle|^2 \le \limsup_n \|h_n\|^2 \le M^2 \tag{25.97} \]
using Bessel's inequality. This proves that $\sum_j |\hat{h}_j|^2 \le M^2$ and thus the series $\sum_j \hat{h}_j e_j$ is norm convergent; call its sum $h$ and observe that $\langle h, e_j\rangle = \hat{h}_j$ for each $j$. Also note that $\|h\| \le M$.

Now we prove that $h_n \to h$ weakly. Fix $g \in H$ and let $\epsilon > 0$. Since $g = \sum_j \langle g, e_j\rangle e_j$ (where the series is norm convergent) we can choose an integer $J$ large enough so that

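\[ \Big\| g - \sum_{j=1}^J \langle g, e_j\rangle e_j \Big\| = \Big\| \sum_{j=J+1}^\infty \langle g, e_j\rangle e_j \Big\| < \epsilon. \tag{25.98} \]
Let $g_0 = \sum_{j=1}^J \langle g, e_j\rangle e_j$ and write $g = g_0 + g_1$, so that $\|g_1\| < \epsilon$. Then
\[ |\langle h_n - h, g\rangle| \le |\langle h_n - h, g_0\rangle| + |\langle h_n - h, g_1\rangle|. \tag{25.99} \]
By (ii), the first term on the right-hand side goes to 0 as $n \to \infty$, since $g_0$ is a finite linear combination of the $e_j$'s. By Cauchy-Schwarz, the second term is bounded by $\|h_n - h\| \|g_1\| \le 2M\epsilon$. As $\epsilon$ was arbitrary, we see that the left-hand side goes to 0 as $n \to \infty$. □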

Remark: The above proof shows that if $h_n \to h$ weakly, then $\|h\| \le \limsup \|h_n\|$. In general, however, it will not be the case that $\|h\| = \lim \|h_n\|$; this occurs only if in fact $h_n \to h$ in norm (see Problem 26.10).
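The gap between weak and norm convergence can be made concrete with the standard basis of $\ell^2(\mathbb{N})$. The Python sketch below is only an illustration under stated assumptions: the test vector $g = (1/k)_{k \ge 1}$, the truncation dimension `D`, and all function names are choices made for this example. It uses $\langle e_n, g\rangle = g(n)$ and computes $\|e_n - e_m\|$ coordinate by coordinate.

```python
import math

D = 2000  # truncation dimension used for the norm computation (an assumption of this sketch)

def g(k):
    """k-th coordinate of the fixed test vector g = (1/k)_{k>=1}, which lies in l^2."""
    return 1.0 / k

def pairing_with_e(n):
    """<e_n, g> = g(n) for the n-th standard basis vector e_n (real scalars)."""
    return g(n)

def dist_e(n, m):
    """||e_n - e_m|| computed coordinate by coordinate on the first D coordinates."""
    return math.sqrt(sum(((k == n) - (k == m)) ** 2 for k in range(1, D + 1)))

pairings = [pairing_with_e(n) for n in (1, 10, 100, 1000)]  # tends to 0
gap = dist_e(3, 7)                                          # stays at sqrt(2)
```

Here the pairings against $g$ shrink like $1/n$, consistent with $e_n \to 0$ weakly, while distinct basis vectors stay at distance $\sqrt{2}$, so $(e_n)$ is not Cauchy. The weak limit has norm 0 while every $\|e_n\| = 1$, so the inequality in the remark can be strict.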

Theorem 25.30 (Weak compactness of the unit ball in Hilbert space). If (hn) is a bounded sequence in a Hilbert space H, then (hn) has a weakly convergent subsequence.

Proof. Using the previous proposition, it suffices to fix an orthonormal basis $(e_j)$ and produce a subsequence $(h_{n_k})$ such that $\langle h_{n_k}, e_j\rangle$ converges for each $j$. This is a standard "diagonalization" argument, and the details are left as an exercise (Problem 26.12). □

26. Problems

Problem 26.1. Prove the complex form of the polarization identity: if $H$ is a Hilbert space over $\mathbb{C}$, then for all $g, h \in H$
\[ \langle g, h\rangle = \frac{1}{4}\Big( \|g + h\|^2 - \|g - h\|^2 + i\|g + ih\|^2 - i\|g - ih\|^2 \Big). \tag{26.1} \]

Problem 26.2. (Adjoint operators) Let $H$ be a Hilbert space and $T : H \to H$ a bounded linear operator.
a) Prove that there is a unique bounded operator $T^* : H \to H$ satisfying $\langle Tg, h\rangle = \langle g, T^*h\rangle$ for all $g, h \in H$, and $\|T^*\| = \|T\|$.
b) Prove that if $S, T \in B(H)$, then $(aS + T)^* = \overline{a}S^* + T^*$ for all $a \in \mathbb{K}$, and that $T^{**} = T$.
c) Prove that $\|T^*T\| = \|T\|^2$.
d) Prove that $\ker T$ is a closed subspace of $H$, $\overline{\operatorname{ran} T} = (\ker T^*)^\perp$ and $\ker T^* = (\operatorname{ran} T)^\perp$.

Problem 26.3. Let $H, K$ be Hilbert spaces. A linear transformation $T : H \to K$ is called isometric if $\|Th\| = \|h\|$ for all $h \in H$, and unitary if it is a surjective isometry. Prove the following:
a) $T$ is an isometry if and only if $\langle Tg, Th\rangle = \langle g, h\rangle$ for all $g, h \in H$, if and only if $T^*T = I$ (here $I$ denotes the identity operator on $H$).
b) $T$ is unitary if and only if $T$ is invertible and $T^{-1} = T^*$, if and only if $T^*T = TT^* = I$.
c) Prove that if $E \subset H$ is an orthonormal set and $T$ is an isometry, then $T(E)$ is an orthonormal set in $K$.
d) Prove that if $H$ is finite-dimensional, then every isometry $T : H \to H$ is unitary.
e) Consider the shift operator $S \in B(\ell^2(\mathbb{N}))$ defined by

\[ S(a_0, a_1, a_2, \dots) = (0, a_0, a_1, \dots). \tag{26.2} \]
Prove that $S$ is an isometry, but not unitary. Compute $S^*$ and $SS^*$.

Problem 26.4. For any set $J$, let $\ell^2(J)$ denote the set of all functions $f : J \to \mathbb{K}$ such that $\sum_{j \in J} |f(j)|^2 < \infty$. Then $\ell^2(J)$ is a Hilbert space.
a) Prove that $\ell^2(I)$ is isometrically isomorphic to $\ell^2(J)$ if and only if $I$ and $J$ have the same cardinality. (Hint: use Problem 26.3(c).)
b) Prove that if $H$ is any Hilbert space, then $H$ is isometrically isomorphic to $\ell^2(J)$ for some set $J$.

Problem 26.5. Let $(X, \mathcal{M}, \mu)$ be a $\sigma$-finite measure space. Prove that the simple functions that belong to $L^2(\mu)$ are dense in $L^2(\mu)$.

Problem 26.6. (The Fourier basis) Prove that the set $E = \{e_n(t) := e^{2\pi i n t} \mid n \in \mathbb{Z}\}$ is an orthonormal basis for $L^2[0,1]$. (Hint: use the Stone-Weierstrass theorem to prove that the set of trigonometric polynomials $P = \{\sum_{n=-M}^N c_n e^{2\pi i n t}\}$ is uniformly dense in the space of continuous functions $f$ on $[0,1]$ which satisfy $f(0) = f(1)$. Then show that this space of continuous functions is dense in $L^2[0,1]$. Finally show that if $f_n$ is a sequence in $L^2[0,1]$ and $f_n \to f$ uniformly, then also $f_n \to f$ in the $L^2$ norm.)

Problem 26.7. (The Haar basis) Let $f_0 := 1_{[0,1)}$ and consider the function $f_1 := 1_{[0,\frac12)} - 1_{[\frac12,1)}$ on $\mathbb{R}$. Inductively define a sequence of functions $f_{n+1}(x) = f_n(2x) + f_n(2x - 1)$. (Draw a picture of the first few of these to see what is going on.) Note that each $f_n$ is supported on $[0,1)$. Prove that $(f_n)_{n=0}^\infty$ is an orthonormal basis for $L^2[0,1]$.

Problem 26.8. Let $(g_n)_{n \in \mathbb{N}}$ be an orthonormal basis for $L^2[0,1]$, and extend each function to $\mathbb{R}$ by declaring it to be 0 off of $[0,1]$. Prove that the functions $h_{mn}(x) := 1_{[m,m+1]}(x) g_n(x - m)$, $n \in \mathbb{N}$, $m \in \mathbb{Z}$, form an orthonormal basis for $L^2(\mathbb{R})$. (Thus $L^2(\mathbb{R})$ is separable.)

Problem 26.9. Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be $\sigma$-finite measure spaces, and let $\mu \times \nu$ denote the product measure. Prove that if $(f_m)$ and $(g_n)$ are orthonormal bases for $L^2(\mu)$, $L^2(\nu)$ respectively, then the collection of functions $\{h_{mn}(x, y) = f_m(x) g_n(y)\}$ is an orthonormal basis for $L^2(\mu \times \nu)$. (Problem 26.5 may be helpful.) Use this to construct an orthonormal basis for $L^2(\mathbb{R}^n)$, and conclude that $L^2(\mathbb{R}^n)$ is separable.

Problem 26.10. (Weak Convergence)

a) Prove that if $h_n \to h$ in norm, then also $h_n \to h$ weakly. (Hint: Cauchy-Schwarz.)
b) Prove that if $H$ is infinite-dimensional, and $(e_n)$ is an orthonormal sequence in $H$, then $e_n \to 0$ weakly, but $e_n \not\to 0$ in norm. (Thus weak convergence does not imply norm convergence.)
c) Prove that $h_n \to h$ in norm if and only if $h_n \to h$ weakly and $\|h_n\| \to \|h\|$.

Problem 26.11. Suppose $H$ is infinite-dimensional. Prove that if $h \in H$ and $\|h\| \le 1$, then there is a sequence $h_n$ in $H$ with $\|h_n\| = 1$ for all $n$, and $h_n \to h$ weakly.

Problem 26.12. Prove Theorem 25.30.

27. $L^p$ spaces

Definition 27.1. Let $(X, \mathcal{M}, \mu)$ be a measure space. For $0 < p < \infty$, let $L^p(\mu)$ denote the space of measurable functions $f : X \to \mathbb{C}$ which satisfy
\[ \|f\|_p := \Big( \int_X |f|^p \, d\mu \Big)^{1/p} < \infty, \tag{27.1} \]
where we identify $f$ and $g$ when $f = g$ a.e.

First, by the inequality
\[ |f + g|^p \le (2\max(|f|, |g|))^p \le 2^p(|f|^p + |g|^p) \tag{27.2} \]
and the monotonicity of the integral, we see that $L^p$ is a vector space for all $0 < p < \infty$.

Proposition 27.2. If $1 \le p < \infty$, then $\|f\|_p$ is a norm on $L^p$.

Proof. Trivially $\|f\|_p \ge 0$, and if $\|f\|_p = 0$ then $f = 0$ a.e., which means $f = 0$ since we have identified functions that agree almost everywhere. The homogeneity $\|cf\|_p = |c|\|f\|_p$ is evident from the definition. To prove the triangle inequality, let $f, g \in L^p$; we make several reductions. First, since $|f + g|^p \le (|f| + |g|)^p$, we can assume $f, g \ge 0$. Next, we can scale $f$ and $g$ by the same constant factor so that $\|f\|_p + \|g\|_p = 1$, and using homogeneity again we can write $f = tF$, $g = (1 - t)G$ with $0 \le t \le 1$, $F, G \in L^p$ and $\|F\|_p = \|G\|_p = 1$. With this setup, proving the triangle inequality amounts to proving
\[ \int_X |tF + (1 - t)G|^p \, d\mu \le 1. \tag{27.3} \]
But now note that the function $x \mapsto |x|^p$ is convex on $[0, +\infty)$, that is,
\[ |tx_1 + (1 - t)x_2|^p \le t|x_1|^p + (1 - t)|x_2|^p \tag{27.4} \]
for all $x_1, x_2 \ge 0$. Applying this in our case we have
\[ \int_X |tF + (1 - t)G|^p \, d\mu \le \int_X t|F|^p + (1 - t)|G|^p \, d\mu = 1 \tag{27.5} \]
as desired. □

Remark: Note that the proof of the triangle inequality breaks down when $p < 1$ because the function $x \mapsto |x|^p$ is not convex for these $p$. In fact, one can use the non-convexity to show that the triangle inequality fails in this range: since $x \mapsto |x|^p$ is concave, we have $a^p + b^p > (a + b)^p$ for all $a, b > 0$. Now take two disjoint sets $E, F$ of finite, positive measure. We have
\[ \|1_E + 1_F\|_p = (\mu(E) + \mu(F))^{1/p} > \mu(E)^{1/p} + \mu(F)^{1/p} = \|1_E\|_p + \|1_F\|_p. \tag{27.6} \]

When $p = \infty$, we define the space $L^\infty(\mu)$ to be the space of all essentially bounded measurable functions, again identifying functions that agree almost everywhere. Recall that a function $f$ is essentially bounded if there exists a number $M < \infty$ such that $|f(x)| \le M$ for a.e. $x \in X$. We can then define $\|f\|_\infty$ to be the smallest $M$ with this property. In particular we have

\[ \|f\|_\infty = \inf\{M : |f(x)| \le M \text{ a.e.}\} = \sup\{M : \mu(\{x : |f(x)| > M\}) > 0\}. \tag{27.7} \]
It is straightforward to check that $\|f\|_\infty$ is a norm on $L^\infty(\mu)$, and that $f_n \to f$ in $L^\infty$ if and only if $f_n$ converges to $f$ essentially uniformly; and it follows from this that $L^\infty$ is complete (Problem 28.5).
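The failure of the triangle inequality for $p < 1$ in the remark above can be checked on the smallest possible example. The Python sketch below assumes a two-point measure space with $\mu(E) = \mu(F) = 1$; the names `p_norm` and `triangle_gap` are introduced here for illustration only.

```python
def p_norm(values, weights, p):
    """(sum_i |f_i|^p * mu_i)^(1/p) for a function on a finite measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

weights = [1.0, 1.0]   # mu(E) = mu(F) = 1, with E and F disjoint (assumed for the example)
ind_E = [1.0, 0.0]     # indicator of E
ind_F = [0.0, 1.0]     # indicator of F
total = [a + b for a, b in zip(ind_E, ind_F)]

def triangle_gap(p):
    """||1_E + 1_F||_p - (||1_E||_p + ||1_F||_p); a positive value means the inequality fails."""
    lhs = p_norm(total, weights, p)
    rhs = p_norm(ind_E, weights, p) + p_norm(ind_F, weights, p)
    return lhs - rhs
```

For $p = 1/2$ the gap is $2^2 - 2 = 2 > 0$, matching (27.6), while for $p = 1$ it vanishes and for $p = 2$ it is negative, as Proposition 27.2 requires.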

Example 27.3. Suppose $f = A 1_E$ with $A > 0$ and $0 < \mu(E) < \infty$. Then $f \in L^p$ for every $0 < p \le \infty$. In fact $\|f\|_p = A\mu(E)^{1/p}$ for finite $p$, and $\|f\|_\infty = A$. Thus $\|f\|_\infty = \lim_{p \to \infty} \|f\|_p$. In fact, this is true generically (see Problem ??).

We next show that $L^p$ is complete for $1 \le p < \infty$; the proof is essentially the same as the ones already given in the case of $L^1$ and $L^2$:

Theorem 27.4. For $1 \le p \le \infty$, $L^p$ is a Banach space.

Proof. The $p = \infty$ case is Problem 28.5. For $1 \le p < \infty$, as before we use Proposition 19.3 and the monotone convergence theorem. Suppose $f_n$ is a sequence in $L^p$ and $M = \sum_{n=1}^\infty \|f_n\|_p < \infty$. Then $\|\sum_{n=1}^N |f_n|\|_p \le M$ for all $N$, so by MCT the series $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for almost every $x$. Let $f$ denote its sum; then $\|f\|_p \le M < \infty$, so $f \in L^p$. Finally we need to check that the sum converges to $f$ in the $L^p$ norm. But

\[ \Big\| f - \sum_{n=1}^N f_n \Big\|_p = \Big\| \sum_{n=N+1}^\infty f_n \Big\|_p \le \sum_{n=N+1}^\infty \|f_n\|_p \to 0 \quad \text{as } N \to \infty \tag{27.8} \]
and we are done. □

Suppose $p < \infty$. Then $f \in L^p$ if and only if $|f|^p \in L^1$, so by applying Markov's inequality to $|f|^p$ we obtain

Proposition 27.5 (Chebyshev's inequality). Suppose $f \in L^p(\mu)$ and $t > 0$. Then
\[ \mu(\{x : |f| > t\}) \le \frac{1}{t^p} \int_X |f|^p \, d\mu = \Big( \frac{\|f\|_p}{t} \Big)^p. \tag{27.9} \]

One consequence of Chebyshev's inequality is that if $f \in L^p$ (for finite $p$!) then $f$ is essentially supported on a $\sigma$-finite set. (Say $f$ is essentially supported on $E$ if $f = 0$ a.e. on the complement of $E$.) Indeed, if we let $E_n = \{|f| > \frac{1}{n}\}$ then by Chebyshev each $E_n$ has finite measure, and $f$ is essentially supported on $\bigcup_{n=1}^\infty E_n$. Thus many questions about $L^p$ functions can be reduced to the $\sigma$-finite case. In particular this is usually possible when one is dealing with at most countably many functions at a time.

As in the case of $L^1$, we have the following density result:

Proposition 27.6. Simple functions are dense in $L^p$ for all $1 \le p \le \infty$.

Proof. We prove the $p < \infty$ case and leave $p = \infty$ as an exercise. Let $f \in L^p$. It is straightforward to check that $\operatorname{Re} f$, $\operatorname{Im} f$ are in $L^p$, as are the positive and negative parts when $f$ is real valued. Thus we may assume $f$ is unsigned. Next, by monotone convergence we see that $f$ can be approximated in $L^p$ by bounded functions (take $f_n = \min(f, n)$), so we can assume $f$ is bounded; and again by monotone convergence we can approximate $f$ in $L^p$ by functions supported on sets of finite measure (take $f_n$ to be $1_{E_n} f$, where $E_n = \{|f| > \frac{1}{n}\}$). Thus we may assume $f$ is nonnegative, bounded, and supported on a set of finite measure. But in this case $f$ can be approximated essentially uniformly by simple functions (see ?? from last semester's notes) and it is easy to verify that if $f_n \to f$ essentially uniformly on a finite measure space, then $f_n \to f$ in $L^p$ also (the proof is the same as in the $L^1$ case). □

When we write $L^p(\mathbb{R}^n)$ we always refer to Lebesgue measure.
We then have the following density result for $L^p(\mathbb{R}^n)$; the proof is essentially the same as the $L^1$ case and is left as an exercise.

Proposition 27.7. For $1 \le p < \infty$ the space of continuous functions of compact support $C_c(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$.

It should be clear that $C_c(\mathbb{R}^n)$ is not dense in $L^\infty(\mathbb{R}^n)$ (why?).

There is an important algebraic relation among the $L^p$ spaces, expressed by the following inequality:

Theorem 27.8 (Hölder's inequality). Suppose $1 \le p, q \le \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$ (interpret $1/\infty = 0$). If $f \in L^p$ and $g \in L^q$, then $fg \in L^1$, and

\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \tag{27.10} \]

Proof. The proof is very simple in the case $p = \infty$ or $q = \infty$ (in which case the other exponent is 1), so we assume $1 < p, q < \infty$. We may assume $f, g$ are both nonzero, and by homogeneity we may normalize so that $\|f\|_p = \|g\|_q = 1$. We must now show that
\[ \int_X |fg| \, d\mu \le 1. \tag{27.11} \]
We again use convexity, this time of the exponential function $t \mapsto e^t$. This implies the convexity of the function $H(t) = |f(x)|^{p(1-t)} |g(x)|^{qt}$ for each fixed $x$. (To see this, assume $f(x), g(x) \neq 0$ and rearrange the function as $H(t) = |f(x)|^p \cdot \exp(ct)$, where $c = \log(|g(x)|^q / |f(x)|^p)$.) In particular, the convexity of $H$ means that (using the fact that $\frac{1}{p} + \frac{1}{q} = 1$)
\[ H\Big(\frac{1}{q}\Big) = H\Big(\frac{1}{p} \cdot 0 + \frac{1}{q} \cdot 1\Big) \le \frac{1}{p} H(0) + \frac{1}{q} H(1) \tag{27.12} \]
which says
\[ |f(x) g(x)| \le \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q. \tag{27.13} \]
Integrating this last expression $d\mu$ and applying the normalizations on $p, q, f, g$ gives (27.11). □

Remark: One can get a more intuitive feel for what Hölder's inequality says by examining it in the case of step functions. Let $E, F$ be sets of finite, positive measure and put $f = 1_E$, $g = 1_F$. Then $\|fg\|_1 = \mu(E \cap F)$ and
\[ \|f\|_p \|g\|_q = \mu(E)^{1/p} \mu(F)^{1/q}, \tag{27.14} \]
so Hölder's inequality can be proved easily in this case using the relation $\frac{1}{p} + \frac{1}{q} = 1$ and the fact that $\mu(E \cap F) \le \min(\mu(E), \mu(F))$.

Remark: A more general version of Hölder's inequality says the following: let $1 \le p, q, r \le \infty$ and $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$. If $f \in L^p$ and $g \in L^q$, then $fg \in L^r$, and

\[ \|fg\|_r \le \|f\|_p \|g\|_q. \tag{27.15} \]
(The proof is Problem 28.1.)

One important consequence of Hölder's inequality is the following: given $1 \le p \le \infty$, we let $p'$ denote the conjugate exponent to $p$, namely the $p'$ for which $\frac{1}{p} + \frac{1}{p'} = 1$. Then for any fixed $g \in L^{p'}$, the linear functional $\lambda_g$ defined by
\[ \lambda_g(f) = \int_X fg \, d\mu \tag{27.16} \]
is bounded on $L^p$, and $\|\lambda_g\| \le \|g\|_{p'}$. One of the most important facts about $L^p$ spaces is that, in most cases, the converse is true:

Theorem 27.9. Let $1 \le p < \infty$ and suppose $\mu$ is a $\sigma$-finite measure. Then if $\lambda : L^p(\mu) \to \mathbb{C}$ is a bounded linear functional, there exists a unique $g \in L^{p'}(\mu)$ such that $\lambda = \lambda_g$; moreover $\|\lambda\| = \|g\|_{p'}$.

Proof. The idea of the proof is to use the linear functional $\lambda$ to define a set function $E \mapsto \lambda(1_E)$. One proves that this defines a measure absolutely continuous with respect to $\mu$, from which we obtain $g$ via the Radon-Nikodym theorem. Finally we must show that this $g$ belongs to $L^{p'}$. We now turn to the details.

We first prove the theorem under the assumption that $\mu$ is finite. Let $u = \operatorname{Re}(\lambda)$. Define

\[ \nu(E) := u(1_E). \tag{27.17} \]
Since $\mu$ is finite, $1_E \in L^p$ for all measurable $E$, so $u$ takes only finite, real values. Next, since $\lambda$ is bounded, $u$ is also bounded, hence continuous. If $(E_n)$ is a sequence of disjoint measurable sets, then the monotone convergence theorem shows that the series $\sum_{n=1}^\infty 1_{E_n}$ is convergent in $L^p$, so by the linearity and continuity of $u$ we have $\nu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \nu(E_n)$. Thus $\nu$ is countably additive, and hence defines a signed measure on $(X, \mathcal{M})$. Moreover, if $\mu(E) = 0$, then $1_E = 0$ in $L^p$ so $\nu(E) = 0$. Thus, the measure $\nu$ is absolutely continuous with respect to $\mu$. Let $g_1$ denote the Radon-Nikodym derivative $d\nu/d\mu$. Applying the same arguments to $\operatorname{Im} \lambda$, we define $g_2$ similarly, and put $g = g_1 + i g_2$. Moreover, the finiteness of the measure $\nu$ shows that $g \in L^1(\mu)$.

Next we prove that $g \in L^{p'}$. For this it suffices to treat the real and imaginary parts separately; and in turn considering positive and negative parts we may assume $g \ge 0$. First assume $p > 1$. If we could choose $f \in L^p$ so that $fg = g^{p'}$ (that is, if we knew that $g^{p'-1}$ belonged to $L^p$) we would be done. However from the identity $pp' - p = p'$ it follows that $g^{(p'-1)p} = g^{p'}$, so this argument is circular. However, since we have already shown that $g \in L^1$, by vertical truncation (Problem 28.3) it follows that $\min(g, N)$ belongs to $L^q$ for all $q \ge 1$ and all $N \ge 0$. If we now put $f_N = \min(g, N)^{p'-1}$, then $f_N \in L^p$. Thus, testing $g$ against $f_N$, we have
\[ \|f_N\|_p \|\lambda_g\| \ge \lambda_g(f_N) = \int_X \min(g, N)^{p'-1} g \, d\mu \ge \int_X \min(g, N)^{p'} \, d\mu = \|\min(g, N)\|_{p'}^{p'}. \tag{27.18} \]

On the other hand, $\|f_N\|_p = \|\min(g, N)\|_{p'}^{p'-1}$, and combining this with (27.18) we see that $\|\min(g, N)\|_{p'} \le \|\lambda_g\|$. Letting $N \to \infty$, by monotone convergence we conclude that $\|g\|_{p'} \le \|\lambda_g\| < \infty$, and since Hölder gives the reverse inequality, we have $\|\lambda_g\| = \|g\|_{p'}$.

In the $p = 1$ case, put $E_t = \{g > t\}$. If $\mu(E_t) > 0$ for all $t > 0$, let $f_t = \frac{1}{\mu(E_t)} 1_{E_t}$. Then $\|f_t\|_1 = 1$ for all $t$, while
\[ \|\lambda_g\| = \|\lambda_g\| \|f_t\|_1 \ge \int_X f_t g \, d\mu = \frac{1}{\mu(E_t)} \int_{E_t} g \, d\mu \ge t \tag{27.19} \]

for all $t$, which is a contradiction once $t > \|\lambda_g\|$. It follows that $g \in L^\infty$ and $\mu(E_t) = 0$ for all $t > \|\lambda_g\|$, so $\|g\|_\infty \le \|\lambda_g\|$.

The proof that $g$ is unique is similar to the uniqueness part of the proof of the Lebesgue-Radon-Nikodym theorem and is left as an exercise.

Finally, we move from the finite to the $\sigma$-finite case. Let $(X, \mathcal{M}, \mu)$ be a $\sigma$-finite measure space. We can write $X$ as an increasing union of sets $X_n$ of finite measure. Let $\lambda$ be a bounded linear functional on $L^p(\mu)$ and let $\mu_n$ be the restriction of $\mu$ to $X_n$. By restriction we get bounded linear functionals $\lambda_n$ on each $L^p(\mu_n)$, and $\|\lambda_n\| \le \|\lambda\|$. By the arguments above, on each $X_n$ there is a function $g_n \in L^{p'}(\mu_n)$ such that $\lambda_n = \lambda_{g_n}$. By uniqueness,

when $n \ge m$ we must have $g_n|_{X_m} = g_m$ almost everywhere on $X_m$. The $g_n$ then converge pointwise a.e. on $X$ to a function $g$, and by monotone convergence we have $g \in L^{p'}(\mu)$, since $\|g_n\|_{p'} = \|\lambda_n\| \le \|\lambda\|$ for every $n$. □

Corollary 27.10. For $1 < p < \infty$, $L^p$ is reflexive.

The theorem above fails in general when $p = \infty$. Certainly every $g \in L^1$ defines a bounded linear functional on $L^\infty$ by the formula $\lambda_g(f) = \int_X fg \, d\mu$, but unless $\mu$ is a finite sum of atoms, there exist bounded linear functionals on $L^\infty$ that are not of this form. An abstract way to see this is that when $\mu$ is $\sigma$-finite, $L^1(\mu)$ is separable, but $L^\infty(\mu)$ is not separable if $\mu$ is not a finite sum of atoms. Thus if it were the case that $(L^\infty)^* \cong L^1$, then $L^\infty$ would be separable by the result in Problem 22.6, a contradiction. Problem 28.10 gives a somewhat more explicit argument in the case of $L^\infty(\mathbb{R})$.

27.1. Distribution functions and weak $L^p$. Let $(X, \mathcal{M}, \mu)$ be a measure space and $f : X \to \mathbb{C}$ a measurable function. The distribution function of $f$ is the function $\lambda_f : (0, +\infty) \to [0, +\infty]$ defined by
\[ \lambda_f(t) = \mu(\{x : |f(x)| > t\}). \tag{27.20} \]

To begin building an intuition about $\lambda_f$, we have the following lemma.

Lemma 27.11. Let $f = \sum_{j=1}^n c_j 1_{E_j}$ be a simple function with the $E_j$ disjoint measurable sets, and the $c_j$ ordered as $0 < c_1 < c_2 < \cdots < c_n < +\infty$. Then
\[ \lambda_f(t) = \begin{cases} \mu(E_n) + \cdots + \mu(E_2) + \mu(E_1), & 0 \le t < c_1, \\ \mu(E_n) + \cdots + \mu(E_2), & c_1 \le t < c_2, \\ \quad\cdots & \quad\cdots \\ \mu(E_n), & c_{n-1} \le t < c_n, \\ 0, & t \ge c_n. \end{cases} \tag{27.21} \]

Proof. Problem 28.11. □
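The step-function shape in Lemma 27.11 is easy to tabulate. The following Python sketch is an illustration only: it represents a simple function by its list of values `cs` and the list `measures` of the $\mu(E_j)$ (both hypothetical sample data), and evaluates $\lambda_f(t)$ directly from the definition (27.20).

```python
def distribution_function(cs, measures):
    """Return t -> lambda_f(t) for f = sum_j c_j 1_{E_j} with the E_j disjoint.

    cs: the positive values c_1 < ... < c_n; measures: the numbers mu(E_j).
    """
    def lam(t):
        # mu({|f| > t}) is the total measure of those E_j on which c_j > t
        return sum(m for c, m in zip(cs, measures) if c > t)
    return lam

cs = [1.0, 2.0, 5.0]          # sample values (an assumption of the example)
measures = [0.5, 0.25, 2.0]   # sample measures mu(E_1), mu(E_2), mu(E_3)
lam = distribution_function(cs, measures)
```

With these sample values, `lam` equals 2.75 on $[0, 1)$, 2.25 on $[1, 2)$, 2.0 on $[2, 5)$, and 0 from $t = 5$ onward: a decreasing, right continuous step function, as (27.21) predicts.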

The basic properties of λf are collected in the following proposition:

Proposition 27.12.
a) $\lambda_f$ is decreasing and right continuous.
b) [Monotonicity] If $|f| \le |g|$ a.e., then $\lambda_f \le \lambda_g$ everywhere.
c) [Monotone convergence] If $|f_n|$ increases to $|f|$ pointwise a.e., then $\lambda_{f_n}$ increases to $\lambda_f$.
d) [Subadditivity] If $f = g + h$, then $\lambda_f(t) \le \lambda_g(t/2) + \lambda_h(t/2)$.

Proof. Let $E(t, f) = \{x : |f(x)| > t\}$. $\lambda_f$ is decreasing since $E(t, f) \subset E(s, f)$ when $s < t$, and right continuous since $E(t, f)$ is the increasing union of the sets $E(t + \frac{1}{n}, f)$. (b) is immediate from the definition. For (c) we have that for each fixed $t > 0$, $E(t, f)$ is (up to a null set) the increasing union of the $E(t, f_n)$. Finally (d) is a pigeonhole argument: if $|f(x)| > t$, then either $|g(x)| > t/2$ or $|h(x)| > t/2$. □

The main use of the distribution function is to convert integrals of functions of $f$ into integrals against the measure induced by $\lambda_f$. Indeed, the function $\lambda_f(t)$ is decreasing and right continuous on $[0, +\infty]$ and hence defines a (negative) Borel measure $\nu$ on $[0, +\infty]$ by

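\[ \nu((a, b]) := \lambda_f(b) - \lambda_f(a) \tag{27.22} \]
(which is nonpositive because $\lambda_f$ is decreasing) and passing to the Caratheodory extension. Thus if $\varphi : [0, +\infty] \to \mathbb{C}$ is a Borel function we can consider integrals $\int \varphi \, d\nu = \int \varphi \, d\lambda_f$. The following formula relates these integrals to $f$ and $\mu$.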

Proposition 27.13. Suppose $\lambda_f(t) < \infty$ for all $t > 0$ and $\varphi \ge 0$ is an unsigned Borel function. Then
\[ \int_X \varphi(|f|) \, d\mu = -\int_0^\infty \varphi(t) \, d\lambda_f(t). \tag{27.23} \]

Proof. Let $0 \le a < b$ and $\varphi = 1_{(a,b]}$. Then $\varphi(|f|) = 1_{\{a < |f| \le b\}}$, so
\[ \int_X \varphi(|f|) \, d\mu = \mu(a < |f| \le b) = \lambda_f(a) - \lambda_f(b); \tag{27.24} \]
on the other hand, the Lebesgue-Stieltjes measure $d\lambda_f$ assigns to $(a, b]$ the mass $\lambda_f(b) - \lambda_f(a)$, so
\[ -\int_0^\infty \varphi(t) \, d\lambda_f(t) = -(\lambda_f(b) - \lambda_f(a)) = \lambda_f(a) - \lambda_f(b), \tag{27.25} \]
so (27.23) holds when $\varphi = 1_{(a,b]}$. Since both sides are linear in $\varphi$, it also holds for simple functions, and then for all unsigned Borel functions by monotone convergence. □

The most important case of the above is $\varphi(t) = t^p$, since it will allow us to obtain very useful expressions for the $L^p$ integrals $\int_X |f|^p \, d\mu$. In fact what is most useful is not (27.23) itself but its "integrated-by-parts" form:

Proposition 27.14. If $0 < p < \infty$ then
\[ \int_X |f|^p \, d\mu = p \int_0^\infty t^{p-1} \lambda_f(t) \, dt. \tag{27.26} \]

Proof. This can be proved using the previous proposition and integration by parts for Lebesgue-Stieltjes measures, or directly as follows: first, if $\lambda_f(t) = +\infty$ for some $t$ then both integrals are infinite. Otherwise, first let $f$ be a simple function; then the identity can be verified directly using Lemma 27.11. For general $f$, take a sequence of simple functions $f_n$ increasing to $|f|$; then the formula holds by Lemma 27.11, Proposition 27.12(c), and monotone convergence. □

The distribution function is used to define the so-called "weak $L^p$" spaces, as follows: first observe that if $f \in L^p$, then from Chebyshev's inequality we have
\[ \mu(\{|f| > t\}) \le \frac{1}{t^p} \int_X |f|^p \, d\mu \tag{27.27} \]
or, rearranging,
\[ t^p \lambda_f(t) \le \|f\|_p^p \quad \text{for all } t > 0. \tag{27.28} \]
For general $f$, say that $f$ belongs to weak $L^p$ if
\[ [f]_p := \Big( \sup_{t > 0} t^p \lambda_f(t) \Big)^{1/p} < \infty. \tag{27.29} \]
From what was just said, if $f \in L^p$, then $f$ belongs to weak $L^p$, but the converse does not hold. The standard example is $f(x) = x^{-1/p}$ on $(0, \infty)$, for which $\lambda_f(t) = t^{-p}$. On the other hand, weak $L^p$ functions are "almost" in $L^p$, in the sense that if we use Proposition 27.14 we find
\[ \int_X |f|^p \, d\mu = p \int_0^\infty t^{p-1} \lambda_f(t) \, dt \le p\,[f]_p^p \int_0^\infty t^{-1} \, dt \tag{27.30} \]
and the integral is just barely divergent.
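For a simple function, both sides of (27.26) and the weak $L^p$ quantity in (27.29) can be computed exactly, because $\lambda_f$ is constant on each interval $[c_{k-1}, c_k)$ and $p \int_{c_{k-1}}^{c_k} t^{p-1} \, dt = c_k^p - c_{k-1}^p$. The Python sketch below does this under the conventions of Lemma 27.11 (values sorted increasingly, disjoint sets); the function names and sample data are assumptions made for the illustration.

```python
def lp_integral(cs, measures, p):
    """Left side of (27.26): integral of |f|^p for f = sum_j c_j 1_{E_j}."""
    return sum(c ** p * m for c, m in zip(cs, measures))

def layer_cake(cs, measures, p):
    """Right side of (27.26), integrated exactly on each interval where lambda_f is constant."""
    total, prev = 0.0, 0.0
    for k, c in enumerate(cs):
        lam = sum(measures[k:])               # lambda_f(t) for prev <= t < c
        total += (c ** p - prev ** p) * lam   # p * int_prev^c t^(p-1) dt = c^p - prev^p
        prev = c
    return total

def weak_quantity(cs, measures, p):
    """sup_t t^p lambda_f(t), i.e. [f]_p^p; the sup is approached as t increases to some c_k."""
    return max(c ** p * sum(measures[k:]) for k, c in enumerate(cs))

cs = [1.0, 2.0, 5.0]          # sample values c_1 < c_2 < c_3 (hypothetical)
measures = [0.5, 0.25, 2.0]   # sample measures mu(E_j) (hypothetical)
```

With $p = 3$ both `lp_integral` and `layer_cake` evaluate to 252.5 for this data, and `weak_quantity` gives 250, consistent with the Chebyshev bound (27.28).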

27.2. The Hardy-Littlewood maximal function redux. As an illustration of the usefulness of the distribution function (and the associated idea of splitting $L^p$ functions into their "small" and "large" parts), we reconsider the Hardy-Littlewood maximal function. (This and the next subsection follow sections I.1 and I.4 of Singular Integrals and Differentiability Properties of Functions by Eli Stein.) Let $f$ be a locally integrable function on $\mathbb{R}^n$; then
\[ (Mf)(x) := \sup_{r > 0} \frac{1}{m(B(r, x))} \int_{B(r,x)} |f(y)| \, dy. \tag{27.31} \]
In the language of distribution functions, the Hardy-Littlewood maximal theorem (Theorem 16.4) says that if $f \in L^1(\mathbb{R}^n)$, then $Mf$ belongs to weak $L^1$, with $[Mf]_1 \le 3^n \|f\|_1$. We now investigate what happens for $f \in L^p$, $1 < p \le \infty$, and find the situation is rather better. First, for $p = \infty$ it is trivial that $\|Mf\|_\infty \le \|f\|_\infty$. For finite $p$ we have:

Theorem 27.15. If $f \in L^p(\mathbb{R}^n)$, $1 < p < \infty$, then $Mf \in L^p$ and
\[ \|Mf\|_p \le 2 \Big( \frac{3^n p}{p - 1} \Big)^{1/p} \|f\|_p. \tag{27.32} \]

Proof. Let $f \in L^p(\mathbb{R}^n)$. We fix a parameter $t > 0$ and use it to cut off $f$: let
\[ f_1(x) := \begin{cases} f(x) & \text{if } |f(x)| > t/2, \\ 0 & \text{otherwise.} \end{cases} \tag{27.33} \]

1 Note that by vertical truncation, f1 ∈ L , and we have |f(x)| ≤ |f1(x)|+t/2 and (Mf)(x) ≤ (Mf1)(x) + t/2. It follows that

{x : Mf(x) > t} ⊂ {x : Mf1(x) > t/2} (27.34)

1 so by the Hardy-Littlewood maximal theorem applied to the L function f1, writing Et = {x : Mf(x) > t} we have 2 · 3n 2 · 3n Z m(Et) ≤ kf1k1 = |f(y)| dy. (27.35) t t |f|>t/2

Now write g = Mf, then m(Et) is just the distribution function λg(t). By Proposition 27.14 we have Z Z ∞ p p−1 |Mf(x)| dx = p t λg(t) dt. (27.36) Rn 0 Using (27.35) we obtain Z ∞ Z ∞  n Z  p p−1 p−1 2 · 3 kMfkp = p t m(Et) dt ≤ p t |f(y)| dy dt (27.37) 0 0 t |f|>t/2 We apply Fubini to the double integral and inegrate dt first. This gives

Z ∞  Z  Z Z 2|f(y)| ! p tp−1 2 · 3n |f(y)| dy dt = |f(y)| tp−2 dt dy (27.38) 0 |f|>t/2 Rn 0 The inner integral is Z 2|f(y)| 1 tp−2 dt = 2p−1|f(y)|p−1 (27.39) 0 p − 1 so we have finally p n Z p 2 · 3 p p kMfkp ≤ |f(y)| dy (27.40) p − 1 Rn which finishes the proof.  42 27.3. The Marcinkiewicz interpolation theorem. The idea used in the proof of the Lp boundedness of the Hardy-Littlewood maximal operator can be extended to prove a more general result, called the Marcinkiewicz interpolation theorem. We will not consider the most general version of the theorem here (it may be found in Folland) but prove a special case that is adequate for many purposes. We fix a measureable space (X, M , µ); Lp always refers to Lp(µ). We need a few definitions. Let 1 ≤ p, q ≤ ∞. A mapping T : Lp → Lq is of type (p, q) if there is a constant A > 0 so that

\[ \|Tf\|_q \le A \|f\|_p \tag{27.41} \]
for all $f \in L^p$. The mapping is of weak type $(p, q)$ if
\[ \mu(\{x : |Tf(x)| > t\}) \le \left( \frac{A \|f\|_p}{t} \right)^q \quad \text{for } q < \infty, \tag{27.42} \]
where the constant $A$ does not depend on $f$ or $t$. (In other words, $T$ maps $L^p$ into weak $L^q$, with $[Tf]_q \le A \|f\|_p$.) When $q = \infty$, weak type $(p, q)$ simply means type $(p, q)$.

We say that a transformation $T$ (defined on some space of measurable functions) is sub-linear if $|Tf(x)| \le |Tg(x)| + |Th(x)|$ for $f = g + h$, and $|T(cf)| = |c||Tf|$ for all scalars $c$. Finally we let $L^p + L^q$ denote the vector space of functions of the form $f = g + h$ where $g \in L^p$, $h \in L^q$. Note that if $p < r < q$, then by the truncation lemmas we see that $L^r \subset L^p + L^q$.

Theorem 27.16 (Marcinkiewicz interpolation theorem (special case)). Suppose that $1 < r \le \infty$. If $T$ is a sub-linear transformation from $L^1 + L^r$ to the vector space of measurable functions, and $T$ is of weak type $(1, 1)$ and weak type $(r, r)$, then $T$ is of type $(p, p)$ for all $1 < p < r$.

Proof. The case $r = \infty$ closely parallels the proof given for the Hardy-Littlewood maximal function and is left as an exercise, so we assume $1 < r < \infty$. Fix $f \in L^p$; we wish to estimate the distribution function $\lambda_{Tf}(t)$ of $Tf$. We fix $t > 0$ for the moment and use this to cut off $f$: define
\[ f_1(x) := \begin{cases} f(x) & \text{if } |f(x)| > t, \\ 0 & \text{if } |f(x)| \le t, \end{cases} \tag{27.43} \]
\[ f_2(x) := \begin{cases} f(x) & \text{if } |f(x)| \le t, \\ 0 & \text{if } |f(x)| > t, \end{cases} \tag{27.44} \]
so that $f = f_1 + f_2$, $f_1 \in L^1$ and $f_2 \in L^r$. Since $|Tf| \le |Tf_1| + |Tf_2|$, we have by the subadditivity of $\lambda$
\[ \lambda_{Tf}(t) \le \lambda_{Tf_1}(t/2) + \lambda_{Tf_2}(t/2), \tag{27.45} \]
and so by the assumption that $T$ is of weak type $(1, 1)$ and $(r, r)$,
\[ \lambda_{Tf}(t) \le \frac{A_1}{t/2} \int |f_1(x)|\, d\mu(x) + \frac{A_r^r}{(t/2)^r} \int |f_2(x)|^r\, d\mu(x) \tag{27.46} \]
for some fixed constants $A_1, A_r$ as in the definition of weak type. Because of the choice of splitting $f = f_1 + f_2$, we have
\[ \lambda_{Tf}(t) \le \frac{2A_1}{t} \int_{|f| > t} |f(x)|\, d\mu(x) + \frac{(2A_r)^r}{t^r} \int_{|f| \le t} |f(x)|^r\, d\mu(x). \tag{27.47} \]
Using the formula $\|Tf\|_p^p = p \int_0^\infty t^{p-1} \lambda_{Tf}(t)\, dt$, we multiply both sides of (27.47) by $p t^{p-1}$ and integrate $dt$. To handle the first integral in (27.47) we observe
\[ \int_0^\infty t^{p-1} t^{-1} \int_{|f| > t} |f(x)|\, d\mu(x)\, dt = \int_X |f| \int_0^{|f|} t^{p-2}\, dt\, d\mu(x) \tag{27.48} \]
\[ = \frac{1}{p-1} \int_X |f| |f|^{p-1}\, d\mu \tag{27.49} \]
since $p > 1$; similarly for the second integral
\[ \int_0^\infty t^{p-1} t^{-r} \int_{|f| \le t} |f(x)|^r\, d\mu(x)\, dt = \int_X |f|^r \int_{|f|}^\infty t^{p-1-r}\, dt\, d\mu(x) \tag{27.50} \]
\[ = \frac{1}{r-p} \int_X |f|^r |f|^{p-r}\, d\mu \tag{27.51} \]
since $p < r$. Putting these together we find that
\[ \|Tf\|_p \le A_p \|f\|_p, \quad \text{with } A_p^p = \left[ \frac{2A_1}{p-1} + \frac{(2A_r)^r}{r-p} \right] p. \tag{27.52} \]



27.4. Some inequalities. The following should be viewed as a continuous analog of the triangle inequality for the Lp norm; it says that “the Lp norm of the integral is less than the integral of the Lp norms.”

Theorem 27.17 (Minkowski’s inequality for integrals). Let $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ be σ-finite measure spaces and let $f$ be a jointly measurable function on $X \times Y$.
a) If $f$ is unsigned and $1 \le p < \infty$, then
\[ \left( \int_X \left( \int_Y f(x, y)\, d\nu(y) \right)^p d\mu(x) \right)^{1/p} \le \int_Y \left( \int_X f(x, y)^p\, d\mu(x) \right)^{1/p} d\nu(y). \tag{27.53} \]
b) Suppose $1 \le p \le \infty$, $f(\cdot, y) \in L^p(\mu)$ for a.e. $y$, and the function $y \mapsto \|f(\cdot, y)\|_p$ is in $L^1(\nu)$. Then $f(x, \cdot) \in L^1(\nu)$ for a.e. $x$, the function $x \mapsto \int f(x, y)\, d\nu(y)$ is in $L^p(\mu)$, and
\[ \left\| \int f(\cdot, y)\, d\nu(y) \right\|_p \le \int \|f(\cdot, y)\|_p\, d\nu(y). \tag{27.54} \]

Proof. (b) follows immediately from (a) (replacing $f$ with $|f|$ and applying Fubini’s theorem). To prove (a), note that this is just Tonelli’s theorem when $p = 1$. For the case $1 < p < \infty$, let $h(x) = \int f(x, y)\, d\nu(y)$ and fix $g \in L^q(\mu)$ where $q$ is the conjugate index to $p$. If the right-hand side of (27.53) is infinite, there is nothing to prove. If it is finite, then the function $f(\cdot, y)$ belongs to $L^p(\mu)$ for almost every $y$, so by Fubini and Hölder,
\[ \int h(x) |g(x)|\, d\mu(x) = \iint f(x, y) |g(x)|\, d\mu(x)\, d\nu(y) \tag{27.55} \]
\[ \le \int \left( \int f(x, y)^p\, d\mu(x) \right)^{1/p} d\nu(y)\, \|g\|_q. \tag{27.56} \]
Thus the linear functional $g \mapsto \int hg\, d\mu$ is bounded on $L^q(\mu)$ with norm at most
\[ \int \left( \int f(x, y)^p\, d\mu(x) \right)^{1/p} d\nu(y), \tag{27.57} \]
so by Theorem 27.9, we see that $h \in L^p(\mu)$ and obeys the estimate (27.53).

There are many important linear operators on $L^p$ spaces that are expressible in the form
\[ Tf(x) = \int K(x, y) f(y)\, d\nu(y). \tag{27.58} \]

The next proposition gives a simple sufficient condition for the boundedness of such an operator on $L^p$; it is a special case of a more general criterion known as the Schur test.

Proposition 27.18 (Young’s inequality). Let $\mu, \nu$ be σ-finite positive measures on spaces $X, Y$ respectively and let $K(x, y)$ be a jointly measurable function on $X \times Y$. Suppose there is a constant $C$ so that
\[ \sup_{x \in X} \int_Y |K(x, y)|\, d\nu(y) \le C, \qquad \sup_{y \in Y} \int_X |K(x, y)|\, d\mu(x) \le C. \tag{27.59} \]
If $f \in L^p(\nu)$, $1 \le p \le \infty$, and $Tf$ is defined by (27.58), then $Tf \in L^p(\mu)$ and
\[ \|Tf\|_p \le C \|f\|_p. \tag{27.60} \]

Proof. If either $p = 1$ or $p = \infty$, the proof is straightforward and left as an exercise. Suppose $1 < p < \infty$. The idea is to split $|K|$ as $|K| = |K|^{1/q} |K|^{1/p}$ and apply Hölder’s inequality. So, let $\frac{1}{p} + \frac{1}{q} = 1$; then
\[ |Tf(x)| \le \int_Y |K(x, y)| |f(y)|\, d\nu(y) \le \left( \int_Y |K(x, y)|\, d\nu(y) \right)^{1/q} \left( \int_Y |K(x, y)| |f(y)|^p\, d\nu(y) \right)^{1/p} \le C^{1/q} \left( \int_Y |K(x, y)| |f(y)|^p\, d\nu(y) \right)^{1/p}, \]
so using Tonelli’s theorem
\[ \int_X |Tf(x)|^p\, d\mu(x) \le C^{p/q} \int_X \left( \int_Y |K(x, y)| |f(y)|^p\, d\nu(y) \right) d\mu(x) = C^{p/q} \int_Y |f(y)|^p \left( \int_X |K(x, y)|\, d\mu(x) \right) d\nu(y) \le C^{1 + p/q} \int_Y |f(y)|^p\, d\nu(y) = C^p \|f\|_p^p, \]
where the last equality uses $1 + p/q = p$. Taking $p$-th roots finishes the proof.
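When $X$ and $Y$ are finite sets with counting measure, the kernel $K$ is a matrix, the two conditions in (27.59) bound its absolute row sums and column sums, and (27.60) becomes an inequality between $\ell^p$ norms of vectors. The Python sketch below checks the inequality in that setting. The helper names and the sample matrix and vector are hypothetical, chosen only for illustration.

```python
def apply_kernel(K, f):
    """(Tf)(x) = sum over y of K[x][y] * f[y], i.e. (27.58) for counting measure."""
    return [sum(K[i][j] * f[j] for j in range(len(f))) for i in range(len(K))]


def lp_norm(v, p):
    """The l^p norm of a finite vector, for 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def schur_constant(K):
    """Smallest C satisfying both conditions in (27.59) for counting measure."""
    max_row = max(sum(abs(a) for a in row) for row in K)
    max_col = max(sum(abs(K[i][j]) for i in range(len(K)))
                  for j in range(len(K[0])))
    return max(max_row, max_col)


# Hypothetical sample data.
sample_K = [[0.5, -1.0, 0.0],
            [2.0, 0.25, -0.75],
            [0.0, 1.5, 1.0]]
sample_f = [1.0, -2.0, 0.5]

C = schur_constant(sample_K)
for p in (1.0, 1.5, 2.0, 4.0):
    print(p, lp_norm(apply_kernel(sample_K, sample_f), p), C * lp_norm(sample_f, p))
```

In each printed row the middle number should not exceed the last one. The check says nothing about sharpness of the constant; it only illustrates the statement.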

One context in which Young’s inequality is often used is the following: let $X = Y = \mathbb{R}^n$ with Lebesgue measure, and fix a function $g \in L^1(\mathbb{R}^n)$. If we put $K(x, y) = g(x - y)$, then for a measurable function $f$ on $\mathbb{R}^n$ the function
\[ Tf(x) := \int_{\mathbb{R}^n} g(x - y) f(y)\, dy \tag{27.61} \]
is called the convolution of $f$ and $g$, defined at each $x$ where the integrand is $L^1$. Usually we write $Tf = g * f$. From the translation invariance of Lebesgue measure, we have
\[ \sup_{x \in \mathbb{R}^n} \int |g(x - y)|\, dy = \sup_{y \in \mathbb{R}^n} \int |g(x - y)|\, dx = \|g\|_1, \tag{27.62} \]

so the hypotheses of Proposition 27.18 are satisfied with C = kgk1. We conclude

Corollary 27.19 (Young’s inequality for convolutions). If g ∈ L1(Rn), f ∈ Lp(Rn), 1 ≤ p ≤ ∞, then g ∗ f ∈ Lp(Rn) and

kg ∗ fkp ≤ kgk1kfkp. (27.63)

28. Problems

Problem 28.1. Prove the generalized Hölder inequality (27.15). (Hint: consider $F = |f|^r$, $G = |g|^r$.)

Problem 28.2. Suppose $f, g \ge 0$ with $f \in L^p$, $g \in L^q$, $0 < p, q < \infty$. Show that equality holds in Hölder’s inequality (and the generalized Hölder inequality) if and only if one of the functions $f^p$, $g^q$ is a scalar multiple of the other.

Problem 28.3 (Truncation of $L^p$ functions). Suppose $f$ is an unsigned function in $L^p(\mu)$, $1 < p < \infty$. For $t > 0$ let
\[ E_t = \{x : f(x) > t\}. \tag{28.1} \]
a) Show that for each real number $t > 0$, the horizontal truncation $1_{E_t} f$ belongs to $L^q$ for all $1 \le q \le p$.
b) Show that for each real number $t > 0$, the vertical truncation $f_t := \min(f, t)$ belongs to $L^q$ for all $p \le q \le \infty$.
c) As a corollary, show that every $f \in L^p$, $1 < p < \infty$, can be decomposed as $f = g + h$ where $g \in L^1$ and $h \in L^\infty$.

Problem 28.4. Suppose $f \in L^{p_0} \cap L^\infty$ for some $p_0 < \infty$. Prove that $f \in L^p$ for all $p_0 \le p \le \infty$, and $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.

Problem 28.5. Prove that $f_n \to f$ in the $L^\infty$ norm if and only if $f_n \to f$ essentially uniformly, and that $L^\infty$ is complete.

Problem 28.6. Suppose $p_0 < p < p_1$ and $f \in L^{p_0} \cap L^{p_1}$. Prove that $f \in L^p$ and $\|f\|_p \le \|f\|_{p_0}^{1-\theta} \|f\|_{p_1}^{\theta}$, where $0 < \theta < 1$ is chosen so that $\frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$. When does equality hold?

Problem 28.7 (Containments of $L^p$ spaces).
a) Show that if $\mu$ is a finite measure, then $L^p \subset L^q$ for all $p \ge q$.
b) Show that $\ell^p \supset \ell^q$ for all $p \ge q$.
c) More generally, show that $L^p \subset L^q$ for all $p \ge q$ if and only if $\mu$ does not admit sets of arbitrarily large finite measure, and $L^p \supset L^q$ for all $p \ge q$ if and only if $\mu$ does not admit sets of arbitrarily small positive measure.

Problem 28.8. Show that $L^p(\mathbb{R}^n) \not\subset L^q(\mathbb{R}^n)$ for any pair $p \ne q$.

Problem 28.9 (Convergence in $L^p$ norm). Prove that if $f_n \to f$ in the $L^p$ norm, then $f_n \to f$ in measure, and hence a subsequence converges to $f$ a.e. Conversely, show that if $f_n \to f$ in measure and there exists a $g \in L^p$ such that $|f_n| \le g$ for all $n$, then $f_n \to f$ in the $L^p$ norm. (Hint: go back and look at the results in Section 12 in last semester’s notes, especially Remark 12.18 and Corollary 12.19.)

Problem 28.10. Consider $L^\infty(\mathbb{R})$.
a) Show that $M := C_0(\mathbb{R})$ is a closed subspace of $L^\infty(\mathbb{R})$ (more precisely, that the set of $L^\infty$ functions that are a.e. equal to a $C_0$ function is closed in $L^\infty$). Prove that there is a bounded linear functional $\lambda : L^\infty \to \mathbb{K}$ such that $\lambda|_M = 0$ and $\lambda(1_{\mathbb{R}}) = 1$.
b) Prove that there is no function $g \in L^1(\mathbb{R})$ such that $\lambda(f) = \int_{\mathbb{R}} fg\, dm$ for all $f \in L^\infty$. (Hint: look at the restriction of $\lambda$ to $C_0(\mathbb{R})$.)

Problem 28.11. Prove Lemma 27.11, and use it to carry out the calculation omitted in the proof of Proposition 27.14.

Problem 28.12. Prove that if $f \in L^p$ then
\[ \lim_{t \to 0} t^p \lambda_f(t) = \lim_{t \to \infty} t^p \lambda_f(t) = 0. \]
(Hint: first suppose $f$ is a simple function.)

Problem 28.13. Prove the $r = \infty$ case of the Marcinkiewicz interpolation theorem.

Problem 28.14. Prove the following more general form of Young’s inequality: suppose $p, q, r \ge 1$ satisfy $\frac{1}{q} = \frac{1}{p} + \frac{1}{r} - 1$. Suppose $K(x, y)$ satisfies
\[ \sup_{x \in X} \|K(x, \cdot)\|_{L^r(\nu)} \le C, \qquad \sup_{y \in Y} \|K(\cdot, y)\|_{L^r(\mu)} \le C \]
for some constant $C$. Prove that if $f \in L^p(\nu)$, then $Tf = \int_Y K(x, y) f(y)\, d\nu(y) \in L^q(\mu)$, and
\[ \|Tf\|_q \le C \|f\|_p. \]
(Hint: use the same strategy of splitting $|K| = |K|^\alpha |K|^\beta$, for a suitable choice of $\alpha + \beta = 1$.) Deduce the following corollary for convolutions on $\mathbb{R}^n$: with $p, q, r$ as above, if $f \in L^p(\mathbb{R}^n)$ and $g \in L^r(\mathbb{R}^n)$, then $g * f \in L^q(\mathbb{R}^n)$ and
\[ \|g * f\|_q \le \|g\|_r \|f\|_p. \]

Problem 28.15. Let $f$ be an unsigned measurable function on a σ-finite measure space. Prove that for $1 < p < \infty$, $f$ belongs to weak $L^p$ if and only if there exists a constant $C > 0$ such that for every set $E$ of finite measure,
\[ \frac{1}{\mu(E)} \int_E f\, d\mu \le \frac{C}{\mu(E)^{1/p}}. \tag{28.2} \]
Are these conditions still equivalent when $p = 1$?

29. The Fourier transform

We assume all functions are complex-valued unless stated otherwise.

Definition 29.1 (The Fourier transform). Let $f \in L^1(\mathbb{R})$. The Fourier transform of $f$ is the function $\hat f$ defined at each $t \in \mathbb{R}$ by
\[ \hat f(t) := \int_{-\infty}^{\infty} f(x) e^{-2\pi i t x}\, dx. \tag{29.1} \]

Note that $\hat f$ makes sense, since the integrand belongs to $L^1$ for each $t \in \mathbb{R}$. We sometimes also use the phrase Fourier transform for the mapping that sends $f$ to $\hat f$. The basic properties of the Fourier transform listed in the following proposition stem from two basic facts: first, that Lebesgue measure is translation invariant, and second that, for each $t \in \mathbb{R}$, the function
\[ \chi_t : x \mapsto \exp(2\pi i t x) \tag{29.2} \]
is a character of the additive group $(\mathbb{R}, +)$. This means that $\chi_t$ is a homomorphism from $\mathbb{R}$ into the multiplicative group of unimodular complex numbers; explicitly, for all $x, y \in \mathbb{R}$,
\[ \chi_t(x + y) = \chi_t(x) \chi_t(y). \tag{29.3} \]

Before going further we introduce some notation: for fixed $y \in \mathbb{R}$ and a function $f : \mathbb{R} \to \mathbb{C}$, define $f_y(x) := f(x - y)$.

Proposition 29.2 (Basic properties of the Fourier transform). Let $f, g \in L^1(\mathbb{R})$, let $c \in \mathbb{C}$, and let $\alpha \in \mathbb{R}$.
a) (Linearity) $\widehat{cf + g} = c \hat f + \hat g$.
b) (Translation) $\widehat{f_y}(t) = e^{-2\pi i t y} \hat f(t)$.
c) (Modulation) If $g(x) = e^{2\pi i \alpha x} f(x)$, then $\hat g(t) = \hat f(t - \alpha)$.
d) (Reflection) If $g(x) = \overline{f(-x)}$, then $\hat g(t) = \overline{\hat f(t)}$.
e) (Scaling) If $\lambda > 0$ and $g(x) = f(x/\lambda)$, then $\hat g(t) = \lambda \hat f(\lambda t)$.

Proof. Each of these properties is verified by elementary transformations of the integral defining $\hat f$; the details are left as an exercise.
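Parts (b), (c) and (e) of Proposition 29.2 can be sanity-checked numerically before proving them. The Python sketch below approximates the integral (29.1) by the trapezoid rule on a large interval, which is very accurate for smooth, rapidly decaying integrands. Everything here is an assumption made for illustration: the test function, the truncation to $[-12, 12]$, the step size, and the particular values of $y$, $\alpha$, $\lambda$ and $t$.

```python
import cmath
import math


def fourier(f, t, half_width=12.0, n=4800):
    """Trapezoid-rule approximation of (29.1) on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0j
    for k in range(n + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0  # endpoint weights of the trapezoid rule
        total += weight * f(x) * cmath.exp(-2j * math.pi * t * x)
    return total * h


def f(x):
    """Hypothetical test function: smooth, rapidly decaying, not even."""
    return (1.0 + x) * math.exp(-math.pi * x * x)


y, alpha, lam, t = 0.75, 0.3, 2.0, 0.4

# (b) Translation: the transform of f_y is e^{-2 pi i t y} times the transform of f.
translation_gap = abs(fourier(lambda x: f(x - y), t)
                      - cmath.exp(-2j * math.pi * t * y) * fourier(f, t))
# (c) Modulation: multiplying by e^{2 pi i alpha x} shifts the transform by alpha.
modulation_gap = abs(fourier(lambda x: cmath.exp(2j * math.pi * alpha * x) * f(x), t)
                     - fourier(f, t - alpha))
# (e) Scaling: g(x) = f(x / lam) has transform lam * fhat(lam * t).
scaling_gap = abs(fourier(lambda x: f(x / lam), t) - lam * fourier(f, lam * t))

print(translation_gap, modulation_gap, scaling_gap)
```

All three printed gaps should be at the level of rounding error. A numerical agreement of this kind is of course no substitute for the change-of-variables arguments the exercise asks for.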

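It is immediate from the definition that $\hat f$ is always a bounded function; indeed $|\hat f(t)| \le \|f\|_1$ for all $t$. Our next observation is:

Proposition 29.3. If $f \in L^1(\mathbb{R})$, then $\hat f$ is continuous. Moreover, if $f_n$ is a sequence in $L^1$ and $f_n \to f$ in the $L^1$ norm, then $\hat f_n \to \hat f$ uniformly.

Proof. Fix $t \in \mathbb{R}$ and a sequence $t_n \to t$. Then $f(x) e^{-2\pi i t_n x} \to f(x) e^{-2\pi i t x}$ pointwise on $\mathbb{R}$, and since trivially $|f(x) e^{-2\pi i t_n x}| \le |f(x)|$ for all $n$, we have by dominated convergence
\[ \hat f(t) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i t x}\, dx = \int_{-\infty}^{\infty} \lim_{n \to \infty} [f(x) e^{-2\pi i t_n x}]\, dx = \lim_{n \to \infty} \int_{-\infty}^{\infty} f(x) e^{-2\pi i t_n x}\, dx = \lim_{n \to \infty} \hat f(t_n). \]
The second statement of the proposition follows immediately from the estimate $\sup_{t \in \mathbb{R}} |\hat f(t)| \le \|f\|_1$, applied to $f_n - f$ in place of $f$.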

In fact, $\hat f$ always belongs to $C_0(\mathbb{R})$; this is known as the Riemann-Lebesgue Lemma. To prove it we first need the following result, which we will apply often (recall the notation $f_y(x) := f(x - y)$):

Lemma 29.4 (Translation is continuous on $L^p$). If $1 \le p < \infty$ and $f \in L^p(\mathbb{R})$, then $f_y \to f$ in $L^p$ as $y \to 0$ in $\mathbb{R}$.

Proof. We sketch the proof; the details are left as an exercise. The proof is accomplished by an approximation argument, in the following form: let $X \subset L^p$ denote the set of all $f$ for which the conclusion of the lemma is true. We show that $X$ is a closed subspace of $L^p$ which contains the indicator functions of finite intervals. Since the simple functions are dense in $L^p$, and simple functions can be approximated in $L^p$ by linear combinations of indicator functions of intervals (Littlewood’s principle), the lemma follows. We leave it as an exercise to check that $X$ is a vector space and contains $1_I$ for all finite intervals $I$. To see that $X$ is closed, note that if $f, g \in L^p$ and $\|f - g\|_p < \varepsilon$, then $\|f_y - g_y\|_p < \varepsilon$ for all $y \in \mathbb{R}$, by the translation invariance of Lebesgue measure. The proof finishes with an $\varepsilon/3$ argument: suppose that $g$ is in the closure of $X$ and let $\varepsilon > 0$ be given. Choose $f \in X$ with $\|f - g\|_p < \varepsilon$, and choose $\delta > 0$ so that $\|f_y - f\|_p < \varepsilon$ for all $|y| < \delta$. Then for all $|y| < \delta$,
\[ \|g_y - g\|_p \le \|g_y - f_y\|_p + \|f_y - f\|_p + \|f - g\|_p < 3\varepsilon. \]
Thus $g \in X$ as well, and the proof is finished.

Note that the translation invariance of Lebesgue measure shows that the above lemma implies a more general version of itself: if $y_n \to y$ in $\mathbb{R}$, then $f_{y_n} \to f_y$ in $L^p$.

Lemma 29.5 (The Riemann-Lebesgue Lemma). Let $f \in L^1(\mathbb{R})$. Then $\hat f \in C_0(\mathbb{R})$.

Proof. We already know that $\hat f$ is continuous (Proposition 29.3), so it remains to show that $\hat f(t) \to 0$ as $t \to \pm\infty$. This is accomplished using the continuity of translation in $L^1$, and a simple trick: first, since $e^{-\pi i} = -1$, we can write, for $t \ne 0$,
\[ \hat f(t) = -\int_{-\infty}^{\infty} f(x) e^{-2\pi i t (x + 1/(2t))}\, dx = -\int_{-\infty}^{\infty} f\left( x - \frac{1}{2t} \right) e^{-2\pi i x t}\, dx. \tag{29.4} \]
Combining this with the usual definition of $\hat f$, we have
\[ \hat f(t) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ f(x) - f\left( x - \frac{1}{2t} \right) \right] e^{-2\pi i x t}\, dx, \tag{29.5} \]
so
\[ |\hat f(t)| \le \frac{1}{2} \|f - f_{1/2t}\|_1. \tag{29.6} \]
But by Lemma 29.4, we have $\|f - f_{1/2t}\|_1 \to 0$ as $t \to \pm\infty$.

Continuing our catalog of basic properties, we see that the Fourier transform also interacts nicely with differentiation:

Proposition 29.6 (Multiplication becomes differentiation). Let $f \in L^1(\mathbb{R})$. If $g(x) := x f(x)$ belongs to $L^1$, then $\hat f$ is differentiable for all $t \in \mathbb{R}$, and
\[ \hat g(t) = \frac{-1}{2\pi i} \frac{d}{dt} \hat f(t). \]

Proof. This is another application of dominated convergence. Let $s \ne t$ be real numbers; from the definition of $\hat f$ we have
\[ \frac{\hat f(s) - \hat f(t)}{s - t} = \int_{-\infty}^{\infty} f(x)\, \frac{e^{-2\pi i s x} - e^{-2\pi i t x}}{s - t}\, dx \tag{29.7} \]
and it follows that
\[ \left| \frac{\hat f(s) - \hat f(t)}{s - t} \right| \le \int_{-\infty}^{\infty} |f(x)| \left| \frac{e^{-2\pi i s x} - e^{-2\pi i t x}}{s - t} \right| dx. \tag{29.8} \]
Now the estimate
\[ \left| \frac{e^{-2\pi i s x} - e^{-2\pi i t x}}{s - t} \right| \le 2\pi |x| \tag{29.9} \]
holds for all $s \ne t$, so by the assumption $x f(x) \in L^1$ the integrand in (29.7) is dominated by the integrable function $2\pi |x f(x)|$, and we can apply dominated convergence to take the limit as $s \to t$:
\[ \lim_{s \to t} \frac{\hat f(s) - \hat f(t)}{s - t} = \lim_{s \to t} \int_{-\infty}^{\infty} f(x)\, \frac{e^{-2\pi i s x} - e^{-2\pi i t x}}{s - t}\, dx = \int_{-\infty}^{\infty} (-2\pi i) e^{-2\pi i t x} x f(x)\, dx = -2\pi i\, \hat g(t). \]
Thus $\hat f$ is differentiable and the claimed formula holds.

Note that if $f \in L^1$ and also $g(x) := x^n f(x) \in L^1$ for some integer $n \ge 1$, then $x^k f(x)$ belongs to $L^1$ for all $0 \le k \le n$. The previous proposition can then be applied inductively to conclude:

Corollary 29.7. If $f \in L^1$ and $g := x^n f \in L^1$, then $\hat f$ is $n$ times differentiable, and
\[ \widehat{x^k f} = \left( \frac{-1}{2\pi i} \right)^k \hat f^{(k)} \quad \text{for each } 0 \le k \le n. \tag{29.10} \]

One would also expect a theorem in the opposite direction: the Fourier transform should convert differentiation to multiplication by the independent variable. Under reasonable hypotheses, this is the case:

Proposition 29.8. Let $k \ge 1$ be an integer and $f \in L^1(\mathbb{R})$. Suppose $f^{(j)}$ exists a.e. for each $j = 1, \dots, k$, that $f^{(j)} \in L^1$ for $j = 1, \dots, k$, and that $f, f', \dots, f^{(k-1)} \in C_0(\mathbb{R})$. Then
\[ \widehat{f^{(j)}}(t) = (2\pi i)^j t^j \hat f(t) \quad \text{for } j = 1, \dots, k. \tag{29.11} \]

Proof. We prove the case $k = 1$; the general case follows by induction. Since $f' \in L^1$ and $f \in C_0(\mathbb{R})$ by hypothesis, we can integrate by parts:
\[ \widehat{f'}(t) = \int_{-\infty}^{\infty} f'(x) e^{-2\pi i t x}\, dx \tag{29.12} \]
\[ = \lim_{b \to \infty} \int_{-b}^{b} f'(x) e^{-2\pi i t x}\, dx \tag{29.13} \]
\[ = \lim_{b \to \infty} \left[ f(b) e^{-2\pi i t b} - f(-b) e^{2\pi i t b} \right] + 2\pi i t \int_{-\infty}^{\infty} f(x) e^{-2\pi i x t}\, dx \tag{29.14} \]
\[ = 2\pi i t \hat f(t). \tag{29.15} \]

The last set of basic properties of the Fourier transform concerns its interaction with convolution, which we now introduce. If $f, g$ are measurable functions on $\mathbb{R}$, the convolution of $f$ and $g$ is the function
\[ (f * g)(x) := \int_{-\infty}^{\infty} f(x - y) g(y)\, dy \tag{29.16} \]
defined at each $x$ for which the integral makes sense. The most basic fact about convolution is:

Proposition 29.9. If $f, g \in L^1(\mathbb{R})$ then $f * g$ is defined for almost every $x \in \mathbb{R}$, $f * g$ is measurable, and $f * g \in L^1(\mathbb{R})$.

Proof. Let $H(x, y) = f(x - y) g(y)$. One may check (exercise) that $H$ is jointly measurable as a function of $x$ and $y$. We may then Fubinate:
\[ \iint |H(x, y)|\, dx\, dy = \int_{-\infty}^{\infty} |g(y)| \left( \int_{-\infty}^{\infty} |f(x - y)|\, dx \right) dy = \|f\|_1 \int_{-\infty}^{\infty} |g(y)|\, dy = \|f\|_1 \|g\|_1, \]
where we have used the translation invariance of Lebesgue measure in the second equality. Thus by Fubini, $\int_{-\infty}^{\infty} |f(x - y) g(y)|\, dy = \int_{-\infty}^{\infty} |H(x, y)|\, dy$ is finite for almost every $x \in \mathbb{R}$, so $f * g$ is defined almost everywhere. Now by Fubini again
\[ \int_{-\infty}^{\infty} \left| \int_{-\infty}^{\infty} f(x - y) g(y)\, dy \right| dx \le \iint |H(x, y)|\, dy\, dx = \|f\|_1 \|g\|_1. \]

The following basic properties of convolution are immediate; the proof is left as an exercise.

Proposition 29.10. Let $f, g, h \in L^1(\mathbb{R})$.
a) (Commutativity) $f * g = g * f$.
b) (Associativity) $(f * g) * h = f * (g * h)$.
c) (Distributivity) $(f + g) * h = f * h + g * h$.
d) (Scalar multiplication) If $c \in \mathbb{C}$, then $(cf) * g = c(f * g)$.

Notice that these properties together say that, if we equip $L^1(\mathbb{R})$ with the usual addition of functions and treat convolution as multiplication, then $L^1(\mathbb{R})$ becomes a commutative ring. (In fact it has even more structure, that of a Banach algebra, but we will not pursue this direction in this course.) We can now describe how convolution behaves under the Fourier transform:

Proposition 29.11 (Convolution becomes multiplication). Let $f, g \in L^1(\mathbb{R})$. Then $\widehat{f * g}(t) = \hat f(t) \hat g(t)$.

Proof. By virtue of Proposition 29.9, we can use Fubini’s theorem to compute $\widehat{f * g}(t)$:
\[ \widehat{f * g}(t) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x - y) g(y)\, dy \right) e^{-2\pi i x t}\, dx = \int_{-\infty}^{\infty} g(y) \left( \int_{-\infty}^{\infty} f(x - y) e^{-2\pi i x t}\, dx \right) dy = \int_{-\infty}^{\infty} \hat f(t) e^{-2\pi i y t} g(y)\, dy = \hat f(t) \hat g(t), \]
where we have used Proposition 29.2(b).

Given what we have proved so far, it follows that the Fourier transform is a ring homomorphism from $L^1(\mathbb{R})$ (with addition and convolution) to $C_0(\mathbb{R})$ (with pointwise addition and multiplication). We will see later that the Fourier transform is injective. It turns out that it is not surjective, however.

Let us finish this section by computing an important example. Let $a > 0$ and consider the Gaussian
\[ g_a(x) := e^{-\pi a x^2}. \tag{29.17} \]
(The factor of $\pi$ will be convenient given our choice of normalization in the definition of the Fourier transform.) It should be clear that $x^n g_a(x) \in L^1$ for all $a > 0$ and $n \ge 0$.

Lemma 29.12. $\widehat{g_a}(t) = \frac{1}{\sqrt{a}}\, e^{-\pi t^2 / a}$.

Proof. Rather than computing the integral directly, we exploit Propositions 29.6 and 29.8. We may also assume $a = 1$; the general case follows from this by scaling (Proposition 29.2(e)). Write $g = g_1$. Since $xg \in L^1$, we have
\[ (\hat g)'(t) = \widehat{(-2\pi i x e^{-\pi x^2})}(t) = i\, \widehat{\big( (e^{-\pi x^2})' \big)}(t) = i (2\pi i t) \hat g(t) = -2\pi t\, \hat g(t). \]
It follows from this computation and the product rule that
\[ \frac{d}{dt} \left( e^{\pi t^2} \hat g(t) \right) = 0, \tag{29.18} \]
so the function $e^{\pi t^2} \hat g(t)$ is constant. To evaluate the constant, we set $t = 0$ and use the well-known Gaussian integral
\[ \hat g(0) = \int_{-\infty}^{\infty} e^{-\pi x^2}\, dx = 1. \tag{29.19} \]

30. Digression - convolution and approximate units

In this section we pause to develop further properties of convolutions; these will be necessary in order to study the Fourier inversion problem later. A general principle is that the convolution of two functions inherits the best properties of both. We will see a number of instances of this phenomenon. A simple expression of this principle is the following:

Proposition 30.1. Suppose $f \in L^\infty$, $g \in L^1$, and $f$ is continuous. Then $f * g$ is bounded and continuous.

Proof. Since $f$ is bounded,
\[ |(f * g)(x)| \le \int_{-\infty}^{\infty} |f(x - y)| |g(y)|\, dy \le \|f\|_\infty \|g\|_1. \tag{30.1} \]

To see that $f * g$ is continuous, fix $x \in \mathbb{R}$ and let $x_n \to x$. The functions $h_n(y) := f(x_n - y) g(y)$ are dominated by $\|f\|_\infty |g(y)| \in L^1$, and converge pointwise to $f(x - y) g(y)$, so by the dominated convergence theorem $\lim_{n \to \infty} (f * g)(x_n) = (f * g)(x)$.

In fact, the continuity assumption on $f$ isn’t needed; we can conclude that $f * g$ is continuous assuming only $f$ is bounded, by using the continuity of translation in $L^1$. See Problem 35.4.

Proposition 30.2. If $f, g$ are $L^1$ functions and are both compactly supported, then so is $f * g$.

Proof. Problem 35.3.

The dominated convergence argument can be extended to apply to differentiability:

Proposition 30.3. Suppose $f$ is a compactly supported $C^k$ function ($1 \le k \le \infty$) and $g \in L^1$. Then $f * g \in C^k$.

Proof. First assume $k = 1$. Then one can differentiate under the integral sign as in the proof of Proposition 29.6. The general case is proved by induction; the details are left as an exercise.

Definition 30.4. An $L^1$ approximate unit is a collection of functions $\phi_\lambda \in L^1(\mathbb{R})$ indexed by $\lambda > 0$ such that:
a) $\phi_\lambda(t) \ge 0$ almost everywhere, for each $\lambda$,
b) $\int_{-\infty}^{\infty} \phi_\lambda(t)\, dt = 1$ for all $\lambda$, and
c) for each fixed $\delta > 0$, we have $\|1_{|t| > \delta}\, \phi_\lambda\|_1 \to 0$ as $\lambda \to 0$.

Approximate units are easy to construct: let $\phi$ be any nonnegative $L^1$ function with $\int_{-\infty}^{\infty} \phi(x)\, dx = 1$; then the functions $\phi_\lambda(x) := \frac{1}{\lambda} \phi\left( \frac{x}{\lambda} \right)$ form an $L^1$ approximate unit (Problem (a)). The simplest example comes from taking $\phi(x) = 1_{[-1/2, 1/2]}$; the resulting $\phi_\lambda$ is known as the box kernel. (Draw a few of these for different values of $\lambda$ to see what is going on.) Of course approximate units need not be compactly supported; we will see a very important example later (the Poisson kernel). The significance of approximate units (and their name) is explained by the following theorem:

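Theorem 30.5. Let $\phi_\lambda$ be an $L^1$ approximate unit and let $1 \le p < \infty$. Then for all $f \in L^p(\mathbb{R})$, we have $\|\phi_\lambda * f - f\|_p \to 0$ as $\lambda \to 0$.

To prove the theorem we need two facts. The first is Minkowski’s integral inequality (Theorem 27.17). The second is the following lemma (which is really the heart of the matter); note where each of the properties of the approximate unit is used.

Lemma 30.6. Let $\phi_\lambda$ be an $L^1$ approximate unit and $g \in L^\infty$. If $g$ is continuous at a point $x \in \mathbb{R}$, then
\[ \lim_{\lambda \to 0} (g * \phi_\lambda)(x) = g(x). \tag{30.2} \]

Proof. Using the trick $g(x) = \int_{-\infty}^{\infty} \phi_\lambda(y) g(x)\, dy$, we have
\[ (g * \phi_\lambda)(x) - g(x) = \int_{-\infty}^{\infty} (g(x - y) - g(x))\, \phi_\lambda(y)\, dy, \tag{30.3} \]
so using the positivity of $\phi_\lambda$
\[ |(g * \phi_\lambda)(x) - g(x)| \le \int_{-\infty}^{\infty} |g(x - y) - g(x)|\, \phi_\lambda(y)\, dy. \tag{30.4} \]
To estimate the right-hand side, let $\varepsilon > 0$. By the continuity of $g$ at $x$, choose $\delta > 0$ so that $|g(x - y) - g(x)| < \varepsilon$ when $|y| \le \delta$. We then split the integral in (30.4) into two integrals, over the regions $|y| \le \delta$ and $|y| > \delta$:
\[ \int_{-\infty}^{\infty} |g(x - y) - g(x)|\, \phi_\lambda(y)\, dy = \int_{\{|y| \le \delta\}} |g(x - y) - g(x)|\, \phi_\lambda(y)\, dy + \int_{\{|y| > \delta\}} |g(x - y) - g(x)|\, \phi_\lambda(y)\, dy. \tag{30.5} \]
The first integrand is bounded by $\varepsilon \phi_\lambda$, so
\[ \int_{\{|y| \le \delta\}} |g(x - y) - g(x)|\, \phi_\lambda(y)\, dy \le \varepsilon \int_{\{|y| \le \delta\}} \phi_\lambda(y)\, dy \le \varepsilon \tag{30.6} \]
since $\int_{-\infty}^{\infty} \phi_\lambda(y)\, dy = 1$. The second integrand is bounded by $2 \|g\|_\infty 1_{\{|y| > \delta\}} \phi_\lambda(y)$, so the second integral goes to 0 as $\lambda \to 0$ by property (c) in the definition of approximate unit. This proves the lemma.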
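For the box kernel, $(g * \phi_\lambda)(x)$ is simply the average of $g$ over the interval of length $\lambda$ centered at $x$, so Lemma 30.6 says that these local averages converge to $g(x)$ at points of continuity. The Python sketch below illustrates this with the midpoint rule. The choice $g = \cos$, the point $x = 0.3$, the values of $\lambda$ and the number of quadrature nodes are all assumptions made for the example.

```python
import math


def box_average(g, x, lam, n=2000):
    """(g * phi_lam)(x) for the box kernel phi_lam = (1/lam) * 1_[-lam/2, lam/2],
    approximated with the midpoint rule on n subintervals."""
    h = lam / n
    total = 0.0
    for k in range(n):
        y = -lam / 2 + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += g(x - y)
    return total * h / lam


x0 = 0.3
errors = [abs(box_average(math.cos, x0, lam) - math.cos(x0))
          for lam in (1.0, 0.1, 0.01)]
print(errors)
```

The printed errors should decrease as $\lambda$ decreases. For this particular $g$ the average can be computed in closed form, $\cos(x) \cdot \sin(\lambda/2)/(\lambda/2)$, which shows the error is of order $\lambda^2$.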

Remark: If one assumes that the approximate unit $\phi_\lambda$ was constructed as $\phi_\lambda(y) = \frac{1}{\lambda} \phi\left( \frac{y}{\lambda} \right)$ for some $\phi$ satisfying $\phi \ge 0$ and $\int \phi = 1$, then the lemma has an easier proof; see Problem 35.5(b).

Proof of Theorem 30.5. Let $f \in L^p$. Write
\[ (f * \phi_\lambda)(x) - f(x) = \int_{-\infty}^{\infty} (f(x - y) - f(x))\, \phi_\lambda(y)\, dy. \tag{30.7} \]
Then by Theorem 27.17(b) (with $\mu$ being Lebesgue measure and $\nu$ the measure $\phi_\lambda(y)\, dy$), we have
\[ \|f * \phi_\lambda - f\|_p \le \int_{-\infty}^{\infty} \|f_y - f\|_p\, \phi_\lambda(y)\, dy. \tag{30.8} \]
Consider the function $g(y) := \|f_y - f\|_p$. This function belongs to $L^\infty$ (indeed $\|g\|_\infty \le 2 \|f\|_p$), is continuous (by the continuity of translation in $L^p$), and satisfies $g(0) = 0$. Moreover $g(-y) = g(y)$ by translation invariance, so the right-hand side of (30.8) is $(g * \phi_\lambda)(0)$. Thus by Lemma 30.6, the right-hand side goes to $g(0) = 0$ as $\lambda \to 0$, which proves the theorem.

It is often useful to have approximate units with additional properties, such as smoothness or compact support. In fact it is possible to construct an $L^1$ approximate unit $\{\psi_\lambda\}$ consisting of $C^\infty$ functions with compact support. This is accomplished via bump functions:

Definition 30.7. A bump function is a function $\psi : \mathbb{R} \to \mathbb{R}$ such that:
a) $\psi \in C^\infty(\mathbb{R})$,
b) $\psi$ is compactly supported,
c) $\psi \ge 0$, and
d) $\int_{-\infty}^{\infty} \psi(x)\, dx = 1$.

Lemma 30.8. Bump functions exist.

Proof. The main issue is to construct a $C^\infty$ function with compact support. Consider the function
\[ h(x) = \begin{cases} e^{-1/x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases} \tag{30.9} \]
Clearly $h \ge 0$ and $h$ is infinitely differentiable for all $x \ne 0$; it is a calculus exercise to verify that $h$ is infinitely differentiable at 0 and $h^{(n)}(0) = 0$ for all $n$. It is then straightforward to verify that $\psi(x) = c \cdot h(x + 1) h(1 - x)$ is a bump function (for a suitable normalizing constant $c$), supported on $[-1, 1]$.

Note that if $\psi$ is a bump function supported in $[-1, 1]$, then the functions $\psi_\lambda(x) := \frac{1}{\lambda} \psi\left( \frac{x}{\lambda} \right)$ are also bump functions, supported in $[-\lambda, \lambda]$. (Draw a picture of what these functions look like as $\lambda \to 0$.) Thus there exist approximate units consisting of smooth, compactly supported functions. As an application, we can prove that $C_c^\infty(\mathbb{R})$ is dense in $L^p(\mathbb{R})$ for all $1 \le p < \infty$. A proof is outlined in Problem 35.6.
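The construction in the proof of Lemma 30.8 is concrete enough to compute with. The Python sketch below builds $\psi(x) = c \cdot h(x + 1) h(1 - x)$ and the rescaled family $\psi_\lambda$. The normalizing constant $c$ has no simple closed form, so here it is approximated by a midpoint rule; that quadrature, the number of nodes, and all function names are assumptions of this sketch rather than part of the notes.

```python
import math


def h(x):
    """The function (30.9): exp(-1/x) for x > 0 and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def unnormalized_bump(x):
    """h(x + 1) * h(1 - x): positive on (-1, 1) and zero elsewhere."""
    return h(x + 1.0) * h(1.0 - x)


def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))


# Numerical stand-in for the normalizing constant c in the proof.
C_NORM = 1.0 / integrate(unnormalized_bump, -1.0, 1.0)


def psi(x):
    """Bump function supported on [-1, 1] with (numerical) integral 1."""
    return C_NORM * unnormalized_bump(x)


def psi_lam(x, lam):
    """Rescaled bump (1/lam) * psi(x/lam), supported on [-lam, lam]."""
    return psi(x / lam) / lam


print(psi(0.0), psi_lam(0.0, 0.25), psi(1.0), psi(2.0))
```

As $\lambda$ shrinks, `psi_lam` becomes taller and narrower while keeping total integral 1, which is the picture the text asks the reader to draw.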

31. Inversion and uniqueness

31.1. Fourier inversion: the problem. In this section we study the problem of recovering $f$ from $\hat f$. Loosely, the Fourier transform can be thought of as a resolution of $f$ as a superposition of sinusoidal functions $e^{2\pi i t x}$; the value of $\hat f(t)$ measures the “amplitude” of $f$ in the “frequency” $t$. This suggests that a formula like
\[ f(x) = \int_{-\infty}^{\infty} \hat f(t) e^{2\pi i t x}\, dt \tag{31.1} \]
ought to hold, at least if $\hat f \in L^1$. If we formally substitute the definition of $\hat f$ and switch the order of integration, we are confronted with
\[ \int_{-\infty}^{\infty} f(u) \left( \int_{-\infty}^{\infty} e^{2\pi i (x - u) t}\, dt \right) du, \tag{31.2} \]
and the inner integral is not convergent, regardless of any assumption on $\hat f$. In fact (31.1) does hold when $\hat f \in L^1$, but a more delicate argument is necessary. So, the goal of this section will be to prove:

Theorem 31.1 (Fourier inversion, $L^1$ case). Suppose $f$ and $\hat f$ belong to $L^1$. Then
\[ f(x) = \int_{-\infty}^{\infty} \hat f(t) e^{2\pi i x t}\, dt \tag{31.3} \]
for almost every $x \in \mathbb{R}$.

Once we have the inversion formula, we see that $L^1$ functions are determined by their Fourier transforms:

Corollary 31.2. Suppose $f, g \in L^1$. If $\hat f = \hat g$, then $f = g$ a.e.

Proof. From the inversion theorem, if $f \in L^1$ and $\hat f = 0$, then $f = 0$ a.e. By the linearity of the Fourier transform, $\widehat{f - g} = \hat f - \hat g$, and the corollary follows.

So, in principle, $f$ is fully determined by $\hat f$, even if $\hat f \notin L^1$; and this is often the case. For example, suppose $f = 1_{[0,1]}$. Then
\[ \widehat{1_{[0,1]}}(t) = \int_0^1 e^{-2\pi i x t}\, dx = \frac{1 - e^{-2\pi i t}}{2\pi i t}, \tag{31.4} \]
which does not belong to $L^1$ (check this). To recover $f$ from $\hat f$ in these cases, we turn to the summability methods of Section ??. In fact summability methods will already be of use in proving the inversion theorem. The idea is this: suppose we have a divergent integral
\[ \int_{-\infty}^{\infty} h(t)\, dt \tag{31.5} \]
where the function $h$ is, say, locally $L^1$, but not $L^1$. We might try to make sense of the integral as
\[ \lim_{a \to +\infty} \int_{-a}^{a} h(t)\, dt; \tag{31.6} \]
effectively we have introduced the cutoff function $\psi_a(t) := 1_{[-a,a]}(t)$, which is positive, integrable, and increases to 1 pointwise as $a \to \infty$. Given any family of functions $\psi_a$ with these three properties, we can consider the integrals
\[ \int_{-\infty}^{\infty} h(t) \psi_a(t)\, dt. \tag{31.7} \]

It will turn out that the “square” cutoff $1_{[-a,a]}$ has some undesirable properties; for example its Fourier transform is not $L^1$ (and not of constant sign). We will work first with smoother cutoff functions, in particular the functions $t \mapsto \exp(-a|t|)$ (here we consider $a \to 0$ rather than $a \to \infty$, but this is not important). The first step is to compute the (inverse) Fourier transform of $e^{-a\pi|t|}$ (the extra factor of $\pi$ turns out to be a convenient normalization).

Lemma 31.3. For all $a > 0$,
\[ \int_{-\infty}^{\infty} e^{-a\pi|t|} e^{2\pi i t x}\, dt = \frac{2}{\pi} \frac{a}{a^2 + 4x^2}. \tag{31.8} \]
(As a check, at $x = 0$ both sides equal $2/(a\pi)$.)

Proof. Problem 35.7.

Let us fix the notation
\[ P_a(x) := \frac{2}{\pi} \frac{a}{a^2 + 4x^2}, \tag{31.9} \]
which is the usual Poisson kernel $\frac{1}{\pi} \frac{y}{y^2 + x^2}$ evaluated at $y = a/2$. Notice that $P_1(x)$ is nonnegative and $\int_{-\infty}^{\infty} P_1(x)\, dx = 1$. Moreover, $P_a(x) = \frac{1}{a} P_1\left( \frac{x}{a} \right)$. By the remarks following Definition 30.4, we have:

Lemma 31.4. $\{P_a\}_{a>0}$ is an $L^1$ approximate unit.

The function $P_a(x)$ (viewed as a function of the two arguments $a, x$) is known as the Poisson kernel. We are now able to compute the integral (31.1) modified by the cutoff function $e^{-a\pi|t|}$:

Proposition 31.5. If $f \in L^1$, then for all $a > 0$ and all $x \in \mathbb{R}$
\[ (f * P_a)(x) = \int_{-\infty}^{\infty} e^{-a\pi|t|} \hat f(t) e^{2\pi i t x}\, dt. \tag{31.10} \]

Proof. Because we have introduced the cutoff function, Fubini’s theorem can be applied:
\[ \int_{-\infty}^{\infty} e^{-a\pi|t|} \hat f(t) e^{2\pi i x t}\, dt = \int_{-\infty}^{\infty} e^{-a\pi|t|} \int_{-\infty}^{\infty} f(y) e^{2\pi i (x - y) t}\, dy\, dt \tag{31.11} \]
\[ = \int_{-\infty}^{\infty} e^{-a\pi|t|} \int_{-\infty}^{\infty} f(x - y) e^{2\pi i y t}\, dy\, dt \tag{31.12} \]
\[ = \int_{-\infty}^{\infty} f(x - y) \int_{-\infty}^{\infty} e^{-a\pi|t|} e^{2\pi i y t}\, dt\, dy \tag{31.13} \]
\[ = (f * P_a)(x), \tag{31.14} \]
where the second line is the change of variables $y \mapsto x - y$ and the last line uses Lemma 31.3.

Proof of Theorem 31.1. Assume $f, \hat f \in L^1(\mathbb{R})$. Define
\[ g(x) = \int_{-\infty}^{\infty} \hat f(t) e^{2\pi i t x}\, dt. \tag{31.15} \]
By the Riemann-Lebesgue lemma, $g \in C_0(\mathbb{R})$. We want to show $g = f$ a.e. From Proposition 31.5 we have for all $a > 0$
\[ (f * P_a)(x) = \int_{-\infty}^{\infty} e^{-a\pi|t|} \hat f(t) e^{2\pi i t x}\, dt. \tag{31.16} \]
Fix a sequence $a_n \to 0$. Since $\hat f \in L^1$, we can apply dominated convergence to show that the integral on the right-hand side of (31.16), with $a = a_n$, converges to $g(x)$ as $n \to \infty$, for all $x$. On the other hand, since $P_a$ is an $L^1$ approximate unit, we know from Theorem 30.5 that $f * P_{a_n} \to f$ in $L^1$. Passing to a subsequence, we may assume that $f * P_{a_n} \to f$ almost everywhere, but then by (31.16) we have $f * P_{a_n} \to g$ everywhere, so $f = g$ a.e. and the theorem is proved.

Remark: Observe that the above proof did not really use the explicit form of $P_a$; rather the point was that $\{e^{-a\pi|t|}\}_{a>0}$ was a family of cutoff functions whose Fourier transforms $\{P_a\}$ formed an $L^1$ approximate unit. Any other cutoff function with this property could have been used.

Another corollary of Proposition 31.5 is that we can recover $f$ from $\hat f$ in a weaker sense for any $f \in L^1$ (that is, not assuming $\hat f \in L^1$). Indeed, combining Proposition 31.5 and Theorem 30.5 we have immediately:

Corollary 31.6 (Fourier inversion in the $L^1$ norm). If $f \in L^1$, then
\[ \int_{-\infty}^{\infty} e^{-a\pi|t|} \hat f(t) e^{2\pi i x t}\, dt \tag{31.17} \]
converges to $f$ in the $L^1$ norm as $a \to 0$.

So, we can recover $f$ but only in the $L^1$ norm; the corollary does not say anything about the pointwise convergence of the regularized integrals. In fact, it is true that the integrals (31.17) converge to $f$ a.e., but this requires a more delicate argument which we take up in Section ??.

32. The L2 theory In this section we study the Fourier transform on L2. There is an immediate problem, of course, since by Problem 28.8 L2 6⊂ L1, so the integral (29.1) need not be defined. However, we can observe that L1 ∩ L2 is dense in L2 (why?), and start there.

1 2 2 Lemma 32.1. If f ∈ L ∩ L , then fb belongs to L and kfbk2 = kfk2.

Proof. Let fe(x) := f(−x) and define g = f ∗ fe. Since f, fe ∈ L1 we have g ∈ L1 by Proposition 29.9. Now Z ∞ Z ∞ g(x) = f(x − y)f(−y) dy = f(x + y)f(y) dy (32.1) −∞ −∞

so we can write $g(x) = \langle f_{-x}, f\rangle_{L^2}$. By Lemma 29.4, the map $x \mapsto f_{-x}$ is continuous from $\mathbb{R}$ into $L^2$, so by Cauchy-Schwarz we see that $g$ is a continuous function of $x$, and $g(0) = \|f\|_2^2$. By Cauchy-Schwarz again,
\[ |g(x)| \le \|f_{-x}\|_2 \|f\|_2 = \|f\|_2^2, \tag{32.2} \]
so $g$ is bounded. Now, since $g \in L^1$ we can apply Proposition 31.5 to compute
\[ (g * P_a)(0) = \int_{-\infty}^{\infty} e^{-a\pi|t|}\, \hat g(t)\, dt. \tag{32.3} \]
As $g$ is continuous, we have by Lemma 30.6
\[ \|f\|_2^2 = g(0) = \lim_{a \to 0} (g * P_a)(0) = \lim_{a \to 0} \int_{-\infty}^{\infty} e^{-a\pi|t|}\, \hat g(t)\, dt. \tag{32.4} \]
Let us compute the limit of this last integral in a different way: recall that by definition $g = f * \tilde f$, so by Propositions 29.11 and 29.2(d),
\[ \hat g(t) = |\hat f(t)|^2. \tag{32.5} \]
Making this substitution in the integral in (32.4) and applying the monotone convergence theorem, we have
\[ \|f\|_2^2 = \lim_{a \to 0} \int_{-\infty}^{\infty} e^{-a\pi|t|}\, \hat g(t)\, dt = \lim_{a \to 0} \int_{-\infty}^{\infty} e^{-a\pi|t|}\, |\hat f(t)|^2\, dt = \|\hat f\|_2^2, \tag{32.6} \]
which proves the lemma. $\square$

Theorem 32.2 (The Fourier transform on $L^2$). There is a unique bounded linear transformation $\mathcal{F} : L^2 \to L^2$ satisfying the following conditions:
a) For all $f \in L^1 \cap L^2$, $\mathcal{F}f = \hat f$.
b) (The Plancherel theorem) $\|\mathcal{F}f\|_2 = \|f\|_2$ for all $f \in L^2$.
c) The mapping $f \mapsto \mathcal{F}f$ is a Hilbert space isomorphism of $L^2$ onto $L^2$.
d) (The Parseval identity) $\langle f, g\rangle = \langle \mathcal{F}f, \mathcal{F}g\rangle$ for all $f, g \in L^2$.

Remark: Note that when $f, g \in L^1 \cap L^2$, the Parseval identity reads
\[ \int_{-\infty}^{\infty} f(x)\,\overline{g(x)}\, dx = \int_{-\infty}^{\infty} \hat f(t)\,\overline{\hat g(t)}\, dt. \tag{32.7} \]

Proof of Theorem 32.2. By Lemma 32.1, the map $f \mapsto \hat f$ is a bounded linear transformation from a dense subspace of $L^2$ into $L^2$. Thus, since the codomain $L^2$ is complete, by Proposition 19.9 the map $f \mapsto \hat f$ has a unique bounded linear extension to a map $\mathcal{F} : L^2 \to L^2$. This proves (a), and (b) follows since $\|f\|_2 = \|\mathcal{F}f\|_2$ on a dense set (namely $L^1 \cap L^2$). (d) is also an immediate consequence of (b), by Problem 26.3(a). It remains to prove (c); what we must show is that $\mathcal{F}$ is onto. We show that $\mathcal{F}$ has dense range; combined with the fact that $\mathcal{F}$ is an isometry, this shows that $\mathcal{F}$ is in fact onto. (The proof of this is left as an exercise.)

Let $M$ denote the set of all functions $g \in L^2$ such that $g = \hat f$ for some $f \in L^1 \cap L^2$. Clearly the range of $\mathcal{F}$ contains $M$, so it will suffice to prove that $M$ is dense, or equivalently, that $M^\perp = \{0\}$. Recall the cutoff functions $e^{-a\pi|x|}$, $a > 0$. The functions $e^{2\pi i b x} e^{-a\pi|x|}$ belong to $L^1 \cap L^2$ for all $a > 0$ and $b \in \mathbb{R}$, so their Fourier transforms
\[ P_a(t - b) = \int_{-\infty}^{\infty} e^{2\pi i b x}\, e^{-a\pi|x|}\, e^{-2\pi i t x}\, dx \tag{32.8} \]
belong to $M$. So, let $h \in M^\perp$. Then, since $P_a$ is real-valued and even,
\[ (P_a * h)(b) = \int_{-\infty}^{\infty} P_a(b - t)\, h(t)\, dt = 0 \tag{32.9} \]
for all $b$ and all $a > 0$, and therefore $h = 0$ by Theorem 30.5 (since $P_a * h \to h$ in $L^2$ as $a \to 0$). Thus $M$ is dense in $L^2$ and the proof is finished. $\square$

Theorem 32.3 ($L^2$ inversion). Let $f \in L^2$. Define
\[ \varphi_N(t) = \int_{-N}^{N} f(x)\, e^{-2\pi i x t}\, dx, \qquad \psi_N(x) = \int_{-N}^{N} (\mathcal{F}f)(t)\, e^{2\pi i x t}\, dt. \tag{32.10} \]
Then $\|\varphi_N - \mathcal{F}f\|_2 \to 0$ and $\|\psi_N - f\|_2 \to 0$ as $N \to \infty$.

Proof. Let $f_N := 1_{[-N,N]} f$. Then $f_N \in L^1 \cap L^2$, and $\varphi_N = \widehat{f_N}$. An application of dominated convergence shows that $f_N \to f$ in the $L^2$ norm, and in particular $(f_N)$ is a Cauchy sequence in $L^2$. Since the Fourier transform is an isometry, $(\varphi_N)$ is also a Cauchy sequence in $L^2$, and hence converges to some $\varphi \in L^2$. Now if $f \in L^1 \cap L^2$, we have that $\varphi_N$ converges pointwise to $\hat f$, but since also a subsequence of $\varphi_N$ converges pointwise a.e. to $\varphi$, we have $\varphi = \hat f$. This means that the linear map taking $f$ to $\lim \varphi_N$ is a bounded linear transformation on $L^2$ that agrees with $f \mapsto \hat f$ on $L^1 \cap L^2$, so it follows that $\lim \varphi_N = \mathcal{F}f$.

The statement for $\psi_N$ is proved by similar methods and is left as an exercise (Problem 35.10). $\square$

It is important to note that, for a general function $f \in L^2$, its Fourier transform is defined only as an element of $L^2$; in particular it is defined only a.e., and cannot be evaluated at points. From now on we just write $\hat f$ for $\mathcal{F}f$ when $f \in L^2$, with the understanding that the integral definition is only valid when $f \in L^1 \cap L^2$.
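As a numerical sanity check on Lemma 32.1, the following sketch (Python, standard library only) tests the identity $\|\hat f\|_2 = \|f\|_2$ for one particular function. The choice $f(x) = e^{-|x|}$, its closed-form transform $\hat f(t) = 2/(1 + 4\pi^2 t^2)$ under the convention $\hat f(t) = \int f(x) e^{-2\pi i x t}\,dx$, the helper name `midpoint`, and the truncation ranges are all illustrative assumptions, not part of the notes.

```python
import math

def midpoint(fn, a, b, n):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

def f(x):
    # Illustrative test function lying in both L^1 and L^2.
    return math.exp(-abs(x))

def f_hat_exact(t):
    # Closed form of the Fourier transform of exp(-|x|).
    return 2.0 / (1.0 + 4.0 * math.pi ** 2 * t ** 2)

def f_hat_numeric(t):
    # f is real and even, so only the cosine part of e^{-2 pi i x t} survives.
    return midpoint(lambda x: f(x) * math.cos(2 * math.pi * x * t), -40.0, 40.0, 8000)

# Squared L^2 norms of f and of its transform; the tails beyond the
# truncation points are negligible for this choice of f.
norm_f_sq = midpoint(lambda x: f(x) ** 2, -20.0, 20.0, 40000)
norm_fhat_sq = midpoint(lambda t: f_hat_exact(t) ** 2, -50.0, 50.0, 20000)

print(norm_f_sq, norm_fhat_sq)  # both should be close to 1
```

The comparison of `f_hat_numeric` with `f_hat_exact` at a sample point is a separate check that the closed form used for the transform is the right one.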

33. Fourier Series

We now replace the group $(\mathbb{R}, +)$ with the multiplicative group of unimodular complex numbers
\[ \mathbb{T} = \{e^{i\theta} : -\pi \le \theta < \pi\} \subset \mathbb{C}. \tag{33.1} \]
By the properties of the exponential, this group is isomorphic to the group $[-\pi, \pi)$ with addition mod $2\pi$. (This amounts to the isomorphism of the quotient group $\mathbb{R}/2\pi\mathbb{Z} \cong \mathbb{T}$.) We will typically use $\theta$ to identify elements of $\mathbb{T}$. We will treat functions $f : \mathbb{T} \to \mathbb{C}$ either as functions on the unit circle in $\mathbb{C}$, or, when convenient, as functions on the interval $[-\pi, \pi)$ extended $2\pi$-periodically to $\mathbb{R}$. For each integer $n$, the map
\[ \chi_n : \theta \mapsto e^{in\theta} \tag{33.2} \]
is a homomorphism from $\mathbb{T}$ to $\mathbb{T}$. We equip $\mathbb{T}$ with normalized arc length measure, or equivalently normalized Lebesgue measure on $[-\pi, \pi)$. We write $dm$ for this measure. Note that $\mathbb{T}$ acts on itself by translation: for fixed $\theta$, we have a map $\tau_\theta : \mathbb{T} \to \mathbb{T}$ given by
\[ \tau_\theta(e^{i\psi}) = e^{i(\psi + \theta)}. \tag{33.3} \]

This amounts to rotation of the circle through the angle $\theta$, and $m$ is invariant under $\tau_\theta$ for each $\theta$. Arguing exactly as on the line, one can prove the continuity of translation in $L^p(\mathbb{T})$: for fixed $\varphi \in [-\pi, \pi)$, let $f_\varphi(\theta) = f(\theta - \varphi)$.

Proposition 33.1. Let $1 \le p < \infty$. If $f \in L^p(\mathbb{T})$, then $f_\varphi \to f$ in $L^p(\mathbb{T})$ as $\varphi \to 0$.

Now that we have a translation invariant measure and the characters χn, we can define a Fourier transform.

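Definition 33.2. Let $f \in L^1(\mathbb{T})$. The Fourier coefficients of $f$ are the numbers
\[ \hat f(n) := \int_{\mathbb{T}} f(\theta)\, e^{-in\theta}\, dm(\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx, \qquad n \in \mathbb{Z}. \tag{33.4} \]
The Fourier series of $f$ is the series
\[ f \sim \sum_{n=-\infty}^{\infty} \hat f(n)\, e^{in\theta}. \tag{33.5} \]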

The Fourier coefficients $\hat f(n)$ are the analogs of the pointwise values of $\hat f(t)$ in the case of $\mathbb{R}$, and the (possibly divergent) Fourier series is the analog of the (possibly divergent) integral
\[ \int_{-\infty}^{\infty} \hat f(t)\, e^{2\pi i t x}\, dt. \tag{33.6} \]
The function $\hat f : \mathbb{Z} \to \mathbb{C}$ is called the Fourier transform of $f$. Just as in the case of the real line, the Fourier transform behaves predictably under translation, modulation, reflection, and scaling; we leave the statements and proofs of these facts as exercises. Moreover, the same kind of integral estimates show that $\hat f$ is always a bounded function (that is, $\hat f \in \ell^\infty(\mathbb{Z})$), and in fact the Riemann-Lebesgue lemma holds:

Theorem 33.3 (Riemann-Lebesgue lemma on the circle). If $f \in L^1(\mathbb{T})$, then
\[ \lim_{n \to \pm\infty} |\hat f(n)| = 0. \tag{33.7} \]

In other words, the Fourier transform takes $L^1(\mathbb{T})$ into $c_0(\mathbb{Z})$. As in the case of the line, this map turns out to be injective (which will follow from an appropriate inversion theorem), but not surjective. Also, as we found on the line, typically $\hat f \notin \ell^1(\mathbb{Z})$. Thus, there is an immediate difficulty in interpreting the series (33.5).

As before, the basic problem is to recover $f$ from $\hat f$, which in turn means finding a way to attach meaning to the (in general divergent) Fourier series. Broadly, the method is the same as in the case of $\mathbb{R}$: we introduce a cutoff function whose Fourier series is nicely convergent to an approximate unit for $L^1(\mathbb{T})$. Before doing this we have a look at what can go wrong, even for nice $f$. Let us try to naively sum the Fourier series: fix $f \in L^1(\mathbb{T})$ and consider the partial sums
\[ s_N(\theta) = \sum_{n=-N}^{N} \hat f(n)\, e^{in\theta}. \tag{33.8} \]
Since this is a finite sum, we can expand $\hat f$ as an integral and pull the sum inside:
\[ s_N(\theta) = \int_{\mathbb{T}} f(\varphi) \Big\{ \sum_{n=-N}^{N} e^{-in\varphi}\, e^{in\theta} \Big\}\, dm(\varphi). \tag{33.9} \]
Working with the inner sum, we consider the expression

\[ D_N(t) := \sum_{n=-N}^{N} e^{int} = \frac{\sin\left(N + \frac{1}{2}\right) t}{\sin \frac{t}{2}}. \tag{33.10} \]
Then (33.9) can be written
\[ s_N(\theta) = \int_{\mathbb{T}} f(\varphi)\, D_N(\theta - \varphi)\, dm(\varphi) = (f * D_N)(\theta), \tag{33.11} \]
where we have introduced convolution on the circle group $\mathbb{T}$. (Note that the difference $\theta - \varphi$ is interpreted in the group $\mathbb{T}$, that is, is carried out mod $2\pi$.) Thus, the question of whether the partial sums $s_N$ converge to $f$ in some sense (pointwise a.e., or in $L^1$, etc.) reduces to the question of whether $f * D_N$ converges to $f$ in the same sense. Unfortunately, since $D_N$ is not an $L^1(\mathbb{T})$ approximate unit, the partial sums $s_N$ can be badly behaved, even for nice $f$. For example we have the following:

Theorem 33.4. There exists a continuous function $f$ on $\mathbb{T}$ such that the Fourier series for $f$ diverges at $\theta = 0$.

Proof. We present an outline of the proof; the details are left as an exercise. As noted above, the $N$th partial sum of the Fourier series of $f$ at a point $\theta$ is given by $(f * D_N)(\theta)$. We suppose that $(f * D_N)(0) \to f(0)$ for every $f \in C(\mathbb{T})$ and derive a contradiction. Now
\[ s_N(0) = (f * D_N)(0) = \int_{\mathbb{T}} f(\varphi)\, D_N(\varphi)\, dm(\varphi). \tag{33.12} \]
By the construction of $D_N$, it is clear that $D_N \in L^1(\mathbb{T})$ for each $N$. Thus for each $N$ the map $L_N : f \mapsto \int_{\mathbb{T}} f D_N\, dm$ is a bounded linear functional on $C(\mathbb{T})$, and one can show that the norm of this functional is equal to $\|D_N\|_1$. (To see this, find a sequence of continuous functions $g_n$ such that $\|g_n\|_\infty \le 1$ for all $n$ and $g_n \to \operatorname{sgn} D_N$ pointwise. Then $|L_N(g_n)| \to \|D_N\|_1$.) Next, one can show by direct estimates of the integral that $\|D_N\|_1 \to \infty$ as $N \to \infty$. The proof finishes by appeal to the Principle of Uniform Boundedness: if it were the case that $L_N(f) = s_N(0) \to f(0)$ for all $f \in C(\mathbb{T})$, then the family of linear functionals $L_N$ would be pointwise bounded on $C(\mathbb{T})$, hence uniformly bounded, which is a contradiction. Problem 35.17 gives some hints on filling in the details. $\square$
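The two facts about the Dirichlet kernel used above, the closed form (33.10) and the growth of $\|D_N\|_1$, can be explored numerically. The following Python sketch is only an illustration: the function names, the sample values of $N$, and the midpoint-rule quadrature are assumptions made for this example, and a numerical table is of course not a proof that $\|D_N\|_1 \to \infty$.

```python
import math

def dirichlet_sum(N, t):
    """D_N(t) as the finite sum of e^{int}, n = -N..N (real, since terms pair up)."""
    return 1.0 + 2.0 * sum(math.cos(n * t) for n in range(1, N + 1))

def dirichlet_closed(N, t):
    """Closed form sin((N + 1/2) t) / sin(t / 2), with the limit 2N + 1 at t = 0."""
    s = math.sin(t / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0
    return math.sin((N + 0.5) * t) / s

def l1_norm(N, n_points=100000):
    """Midpoint-rule approximation of (1/2pi) * integral of |D_N| over [-pi, pi]."""
    h = 2.0 * math.pi / n_points
    total = sum(abs(dirichlet_closed(N, -math.pi + (k + 0.5) * h))
                for k in range(n_points))
    return total * h / (2.0 * math.pi)

# The norms grow slowly (roughly logarithmically) with N.
norms = {N: l1_norm(N) for N in (1, 4, 16, 64)}
for N, value in norms.items():
    print(N, round(value, 3))
```

The norm here is taken with respect to the normalized measure $dm$, matching the convention of this section.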
Before going further, let us observe that if we assume f has a certain amount of smoothness at a point, then the Fourier series for f will converge to f at that point. A simple result of this type is the following:

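Proposition 33.5. Suppose $f \in L^1(\mathbb{T})$ and $f$ is differentiable at a point $\theta_0$. Then $s_N(\theta_0) \to f(\theta_0)$.

Proof. By considering real and imaginary parts, we may assume $f$ is real-valued, and by replacing $f(\theta)$ by $f(\theta + \theta_0) - f(\theta_0)$ we may assume that $\theta_0 = 0$ and $f(0) = 0$. As we have already observed, we have
\begin{align}
s_N(0) = (f * D_N)(0) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\varphi)\, D_N(\varphi)\, d\varphi \tag{33.13} \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\varphi)\, \frac{\sin\left(N + \frac{1}{2}\right)\varphi}{\sin \frac{\varphi}{2}}\, d\varphi, \tag{33.14}
\end{align}
and we wish to prove $s_N(0) \to 0$ as $N \to \infty$. The key observation is that the function
\[ g(\varphi) = \frac{f(\varphi)}{\sin \frac{\varphi}{2}} \tag{33.15} \]
belongs to $L^1(\mathbb{T})$. To see this, first note the elementary estimate
\[ \frac{|\varphi|}{\pi} \le \left|\sin \frac{\varphi}{2}\right| \le \frac{|\varphi|}{2} \tag{33.16} \]
for $|\varphi| \le \frac{\pi}{2}$. Now, since $f$ is differentiable at $0$ and $f(0) = 0$, there exist $M > 0$ and $0 < \delta < \frac{\pi}{2}$ such that
\[ \sup_{0 < |\varphi| < \delta} \left| \frac{f(\varphi)}{\varphi} \right| \le M, \tag{33.17} \]
so
\[ |g(\varphi)| = \left| \frac{f(\varphi)}{\varphi} \right| \left| \frac{\varphi}{\sin \frac{\varphi}{2}} \right| \le \pi M \tag{33.18} \]
for $0 < |\varphi| < \delta$. On the other hand, for $\delta \le |\varphi| \le \pi$, since $|\sin \frac{\varphi}{2}|$ is increasing in $|\varphi|$ on $[0, \pi]$, we have $|\sin\frac{\varphi}{2}| \ge \sin\frac{\delta}{2} \ge \frac{\delta}{\pi}$, and hence
\[ \left| \frac{f(\varphi)}{\sin \frac{\varphi}{2}} \right| \le \frac{\pi}{\delta}\, |f(\varphi)|. \tag{33.19} \]
Thus, $g$ is bounded near $0$ and dominated by a multiple of $|f|$ away from $0$, hence $g \in L^1$. Returning to (33.13), we have
\[ s_N(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(\varphi) \sin\left(\left(N + \tfrac{1}{2}\right)\varphi\right) d\varphi. \tag{33.20} \]
Making the change of variable $\theta = \varphi/2$ this becomes
\[ s_N(0) = \frac{1}{\pi} \int_{-\pi/2}^{\pi/2} g(2\theta) \sin((2N + 1)\theta)\, d\theta. \tag{33.21} \]
If we now put
\[ h(\theta) = 2 \cdot 1_{[-\pi/2, \pi/2]}(\theta)\, g(2\theta), \tag{33.22} \]
then $h \in L^1$ and the integral in (33.21) is, up to sign, nothing but the imaginary part of $\hat h(2N + 1)$, which goes to $0$ as $N \to \infty$ by the Riemann-Lebesgue lemma. This finishes the proof. $\square$

So, to recover $f$ from its Fourier series, as before we need to introduce a cutoff function, but since the "square" cutoff $1_{[-N,N]}$ (corresponding to ordinary partial sums) is badly behaved, we choose a smoother cutoff. It turns out that the functions
\[ n \mapsto r^{|n|} \tag{33.23} \]
for $0 \le r < 1$ are a good choice (analogous to $e^{-a\pi|t|}$ on the line). Thus we consider the Abel means of the Fourier series
\[ A(r, \theta) := \sum_{n=-\infty}^{\infty} \hat f(n)\, r^{|n|}\, e^{in\theta}. \tag{33.24} \]
Since $\hat f$ is bounded and $r < 1$, this series is absolutely convergent for all $\theta$, and uniformly convergent on $\mathbb{T}$ for each fixed $r$. Thus, we can again expand $\hat f$ as an integral, and interchange the sum and integral:
\[ A(r, \theta) = \int_{\mathbb{T}} f(\varphi) \Big\{ \sum_{n=-\infty}^{\infty} r^{|n|}\, e^{in(\theta - \varphi)} \Big\}\, dm(\varphi) =: (f * P_r)(\theta), \tag{33.25} \]
where, by summing the geometric series, $P_r(\theta)$ is given by
\[ P_r(\theta) := \sum_{n=-\infty}^{\infty} r^{|n|}\, e^{in\theta} = \frac{1 - r^2}{1 - 2r\cos\theta + r^2}. \tag{33.26} \]
We now observe that the family of functions $P_r$ has the following properties: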

Lemma 33.6.
i) $P_r(\theta) \ge 0$ for all $\theta \in [-\pi, \pi]$ and all $0 \le r < 1$,
ii) For each $r$, $\int_{\mathbb{T}} P_r(\theta)\, dm(\theta) = 1$,
iii) For each fixed $0 < \delta < \pi$,
\[ \frac{1}{2\pi} \int_{\delta \le |\theta| \le \pi} P_r(\theta)\, d\theta \to 0 \quad \text{as } r \to 1. \tag{33.27} \]

Proof. Exercise. $\square$

In other words, $\{P_r\}_{r<1}$ is an $L^1(\mathbb{T})$ approximate unit. Just as on the line, we have

Theorem 33.7. Let $1 \le p < \infty$. If $f \in L^p(\mathbb{T})$, then $\|f * P_r - f\|_p \to 0$ as $r \to 1$.

Proof. This uses the properties of approximate units in the same way as on $\mathbb{R}$; the proof is an exercise. $\square$

With these results in hand, we can obtain Abel summability of Fourier series on the circle:

Corollary 33.8. If $f \in L^p(\mathbb{T})$, $1 \le p < \infty$, then the Abel means $A(r, \theta)$ of the Fourier series for $f$ converge to $f$ in $L^p$ as $r \to 1$.

What happens when $p = \infty$? We have already seen that the Fourier series of a continuous function can diverge at a given point; however if we use the Abel means $A(r, \theta)$ we can do better. The reason is the following lemma, which says that the Poisson kernel $P_r(\theta)$ obeys a stronger condition than that of Lemma 33.6:

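Lemma 33.9. For each $0 < \delta < \pi$, we have $P_r(\theta) \to 0$ uniformly on $\delta \le |\theta| \le \pi$ as $r \to 1$.

Proof. Fix $\delta$. For $\delta \le |\theta| \le \pi$, we have $-1 \le \cos\theta \le \cos\delta < 1$. Thus for such $\theta$
\[ P_r(\theta) = \frac{1 - r^2}{1 - 2r\cos\theta + r^2} \le \frac{1 - r^2}{1 - 2r\cos\delta + r^2}. \tag{33.28} \]
As $r \to 1$, the numerator of this last expression goes to $0$ while the denominator is bounded away from $0$, which proves the lemma. $\square$

Theorem 33.10. If $f \in C(\mathbb{T})$, then $(f * P_r)(\theta) = A(r, \theta) \to f(\theta)$ uniformly as $r \to 1$.

Proof. Fix $f \in C(\mathbb{T})$ and $\epsilon > 0$. Since $f$ is continuous and $\mathbb{T}$ is compact, $f$ is uniformly continuous, so there exists $\delta > 0$ such that $|f(\theta) - f(\varphi)| < \epsilon$ whenever $|\theta - \varphi| < \delta$. Using our usual tricks with approximate units we write $f(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta) P_r(\varphi)\, d\varphi$ to obtain
\[ |(f * P_r)(\theta) - f(\theta)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta - \varphi) - f(\theta)|\, P_r(\varphi)\, d\varphi. \tag{33.29} \]
We split the integral as $\int_{|\varphi| < \delta} + \int_{\delta \le |\varphi| \le \pi}$. For the first integral, we have $|f(\theta - \varphi) - f(\theta)| < \epsilon$ for $|\varphi| < \delta$ by uniform continuity, so
\[ \frac{1}{2\pi} \int_{|\varphi| < \delta} |f(\theta - \varphi) - f(\theta)|\, P_r(\varphi)\, d\varphi < \epsilon\, \frac{1}{2\pi} \int_{-\pi}^{\pi} P_r(\varphi)\, d\varphi = \epsilon. \tag{33.30} \]

For the second integral, for all $r$ sufficiently close to $1$ we have $P_r(\varphi) < \epsilon$ on $\delta \le |\varphi| \le \pi$ by the lemma, while $|f(\theta - \varphi) - f(\theta)| \le 2\|f\|_\infty$, so
\[ \frac{1}{2\pi} \int_{\delta \le |\varphi| \le \pi} |f(\theta - \varphi) - f(\theta)|\, P_r(\varphi)\, d\varphi \le 2\|f\|_\infty\, \epsilon. \tag{33.31} \]
Thus for $r$ sufficiently close to $1$, we get $|(f * P_r)(\theta) - f(\theta)| \le \epsilon(1 + 2\|f\|_\infty)$ for all $\theta$, so $f * P_r \to f$ uniformly on $\mathbb{T}$. $\square$

From the smoothing properties of convolution, we see that if $f \in L^1$ then $f * P_r$ is continuous in $\theta$ for each $r$. Thus there is no hope that $f * P_r \to f$ in the $L^\infty$ norm for general $f \in L^\infty$, since a uniform limit of continuous functions is continuous. However we do have the following weaker form of convergence: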

Proposition 33.11. Let $f \in L^\infty(\mathbb{T})$. Then for each $g \in L^1(\mathbb{T})$,
\[ \lim_{r \to 1} \frac{1}{2\pi} \int_{-\pi}^{\pi} (f * P_r)(\theta)\, g(\theta)\, d\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta)\, g(\theta)\, d\theta. \tag{33.32} \]

Proof. Problem 35.18. $\square$

As for the line, the Fourier transform is especially well-behaved in the $L^2$ setting, though here things are somewhat simpler: we have $L^2(\mathbb{T}) \subset L^1(\mathbb{T})$ since the measure is finite. The proof is left as an exercise.

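Theorem 33.12. The Fourier transform is a unitary transformation from $L^2(\mathbb{T})$ onto $\ell^2(\mathbb{Z})$. In particular,
i) (Plancherel theorem) For all $f \in L^2(\mathbb{T})$, we have $\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat f(n)|^2$.
ii) (Parseval identity) For all $f, g \in L^2(\mathbb{T})$,
\[ \int_{\mathbb{T}} f(\theta)\, \overline{g(\theta)}\, dm(\theta) = \sum_{n=-\infty}^{\infty} \hat f(n)\, \overline{\hat g(n)}. \tag{33.33} \]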

In particular, note that the Fourier transform takes the orthonormal basis $E = \{e^{in\theta}\}_{n \in \mathbb{Z}}$ of $L^2(\mathbb{T})$ onto the standard orthonormal basis of $\ell^2(\mathbb{Z})$, and the Fourier transform of a function $f \in L^2(\mathbb{T})$ is just its sequence of coefficients with respect to this orthonormal basis. Indeed if we write $e_n(\theta) = e^{in\theta}$, then
\[ \hat f(n) := \int_{\mathbb{T}} f(\theta)\, e^{-in\theta}\, dm(\theta) = \langle f, e_n\rangle_{L^2(\mathbb{T})}. \tag{33.34} \]
Thus, if one has already proved that the functions $\{e_n\}$ are an orthonormal basis for $L^2(\mathbb{T})$, the proof of Theorem 33.12 becomes quite simple. (Indeed, the theorem is essentially equivalent to the assertion that the characters $\{e_n\}$ form an orthonormal basis. Why?)
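To make the Plancherel identity of Theorem 33.12 concrete, here is a small Python sketch for one example. The choice of the sawtooth $f(x) = x$ on $[-\pi, \pi)$, the helper name `fourier_coefficient`, and the cutoff $N = 20$ are assumptions made for illustration. For this $f$ a direct integration by parts gives $\hat f(n) = i(-1)^n/n$ for $n \ne 0$ and $\hat f(0) = 0$, while $\|f\|_2^2 = \pi^2/3$, so the partial sums of $\sum |\hat f(n)|^2$ should increase toward $\pi^2/3$.

```python
import cmath
import math

def fourier_coefficient(f, n, n_points=4000):
    """Midpoint-rule approximation of (1/2pi) * integral of f(x) e^{-inx} over [-pi, pi]."""
    h = 2.0 * math.pi / n_points
    total = 0j
    for k in range(n_points):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2.0 * math.pi)

def f(x):
    # Illustrative choice: the sawtooth f(x) = x on [-pi, pi).
    return x

N = 20
coeffs = {n: fourier_coefficient(f, n) for n in range(-N, N + 1)}

# Left side of Plancherel: (1/2pi) * integral of x^2 over [-pi, pi] = pi^2 / 3.
norm_sq = math.pi ** 2 / 3.0
# Partial sum of the right side over |n| <= N.
partial = sum(abs(c) ** 2 for c in coeffs.values())

print(coeffs[1], partial, norm_sq)
```

The partial sum falls short of $\pi^2/3$ by the tail $2\sum_{n > N} n^{-2}$, which is of order $2/N$; this slow decay reflects the jump of the sawtooth at $\pm\pi$.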

34. Schwartz functions and distributions

In this section we introduce the Schwartz space $\mathcal{S}$ and the space of tempered distributions $\mathcal{S}'$. The theory of distributions allows many of the important operations of analysis (such as differentiation and the Fourier transform) to be extended to objects more singular than functions (indeed distributions are sometimes known as generalized functions). The basic idea is this: if $\psi$ is a very smooth function (say $C^\infty$) and vanishes at infinity, then if $f$ is differentiable we have the integration by parts formula
\[ \int_{\mathbb{R}} f'(x)\,\psi(x)\, dx = -\int_{\mathbb{R}} f(x)\,\psi'(x)\, dx. \tag{34.1} \]
However, the second integral will make sense even if the first does not (that is, even if $f'$ does not exist). If we identify $f$ with the linear functional
\[ \psi \mapsto \int_{\mathbb{R}} f\psi, \tag{34.2} \]
then the above calculation suggests that we can interpret "$f'$" as the linear functional
\[ \psi \mapsto -\int_{\mathbb{R}} f\psi' \tag{34.3} \]
even if $f'$ does not exist in the usual sense. The theory of distributions makes this heuristic precise. The first step is to carefully identify the space of smooth functions we wish to use, and topologize it appropriately so that we can speak of continuous linear functionals.

Let $C_b^\infty(\mathbb{R})$ denote the vector space of bounded $C^\infty$ functions on $\mathbb{R}$.

Definition 34.1. The Schwartz space $\mathcal{S}$ consists of all functions $\psi \in C_b^\infty(\mathbb{R})$ such that $x^\alpha \psi^{(\beta)}(x)$ is bounded for all integers $\alpha, \beta \ge 0$.

We say that a function $\psi$ is rapidly decreasing if $x^\alpha \psi(x)$ is bounded for all $\alpha \ge 0$. So $\mathcal{S}$ consists of those $\psi$ such that $\psi$ and all of its derivatives are rapidly decreasing. For example, $\psi(x) = e^{-x^2}$ belongs to $\mathcal{S}$. It is an important fact that $\mathcal{S}$ is closed under differentiation, and under multiplication by polynomials:

Lemma 34.2. $\mathcal{S}$ is a vector space, and if $\psi \in \mathcal{S}$ then $x\psi(x)$ and $\psi'(x)$ belong to $\mathcal{S}$, and in fact $x^\alpha \psi^{(\beta)} \in \mathcal{S}$ for all $\alpha, \beta \ge 0$.

Proof. Exercise. $\square$

Definition 34.3. For integers $\alpha, \beta \ge 0$, define for $\psi \in \mathcal{S}$
\[ \|\psi\|_{\alpha,\beta} := \|x^\alpha \psi^{(\beta)}\|_\infty. \tag{34.4} \]

Lemma 34.4. Each $\|\cdot\|_{\alpha,\beta}$ is a norm on $\mathcal{S}$.

It turns out that it is appropriate to topologize $\mathcal{S}$ not with a single norm, but with the whole family of norms $\|\cdot\|_{\alpha,\beta}$ simultaneously.

Definition 34.5. Say that a sequence $(\psi_n) \subset \mathcal{S}$ is Cauchy if it is Cauchy in each of the norms $\|\cdot\|_{\alpha,\beta}$, and say that a sequence $(\psi_n) \subset \mathcal{S}$ converges if there exists $\psi \in \mathcal{S}$ such that $\|\psi_n - \psi\|_{\alpha,\beta} \to 0$ for all $\alpha, \beta$.

Of course, if $\psi_n$ converges in $\mathcal{S}$ then the limit $\psi$ is unique. (Check this.) We also have:

Proposition 34.6. $\mathcal{S}$ is complete. That is, if $\psi_n$ is Cauchy in $\mathcal{S}$, then there exists $\psi \in \mathcal{S}$ such that $\|\psi_n - \psi\|_{\alpha,\beta} \to 0$ for all $\alpha, \beta \ge 0$.

Proof. Since $C_b(\mathbb{R})$ is complete with respect to the $\|\cdot\|_\infty$ norm, by the definition of $\mathcal{S}$ and the $\|\cdot\|_{\alpha,\beta}$ norms we have that, for each $\alpha, \beta \ge 0$, there is a function $\psi_{\alpha,\beta} \in C_b(\mathbb{R})$ such that $x^\alpha \psi_n^{(\beta)} \to \psi_{\alpha,\beta}$ uniformly on $\mathbb{R}$. Put $\psi = \psi_{0,0}$; the proof is finished if we can show that $\psi_{\alpha,\beta} = x^\alpha \psi^{(\beta)}$ for all $\alpha, \beta$.

From advanced calculus we know that if $f_n$ converges uniformly to $f$ and $f_n'$ converges uniformly to $g$, then $f$ is differentiable and $f' = g$. Applying this fact we conclude that $\psi_{0,1} = \psi_{0,0}'$, and applying it inductively we have $\psi_{0,\beta} = \psi^{(\beta)}$ for all $\beta \ge 0$. (In particular, $\psi_n^{(\beta)} \to \psi^{(\beta)}$ uniformly for all $\beta$.) From this it follows that $x^\alpha \psi_n^{(\beta)} \to x^\alpha \psi^{(\beta)}$ pointwise for all $\alpha, \beta$, but since this sequence also converges to $\psi_{\alpha,\beta}$ uniformly we conclude that $\psi_{\alpha,\beta} = x^\alpha \psi^{(\beta)}$ for all $\alpha, \beta$ as desired. $\square$

We observed earlier that $\mathcal{S}$ is closed under differentiation and multiplication by polynomials; we now see that these operations are continuous:

Lemma 34.7. If $\psi_n \to \psi$ in $\mathcal{S}$, then $x^\alpha \psi_n \to x^\alpha \psi$ and $\psi_n^{(\beta)} \to \psi^{(\beta)}$ in $\mathcal{S}$.

Proof. Exercise. $\square$

It is useful to observe (in connection with our discussion of the Fourier transform later) that convergence in the family of norms $\|\cdot\|_{\alpha,\beta}$ controls $L^p$ convergence:

Proposition 34.8. $\mathcal{S} \subset L^p$ for all $1 \le p \le \infty$, and if $\psi_n \to \psi$ in $\mathcal{S}$ then also $\psi_n \to \psi$ in $L^p(\mathbb{R})$.

Proof. By the previous lemma, we have $(1 + x^2)\psi \in \mathcal{S}$ for all $\psi \in \mathcal{S}$, so in particular for all $x \in \mathbb{R}$
\[ |\psi(x)| = \frac{|\psi(x)| + |x^2 \psi(x)|}{1 + x^2} \le \frac{\|\psi\|_{0,0} + \|\psi\|_{2,0}}{1 + x^2}. \tag{34.5} \]
But $(1 + x^2)^{-1}$ belongs to $L^p(\mathbb{R})$ for all $1 \le p \le \infty$, so $\psi \in L^p$. Applying this same estimate to $\psi_n - \psi$ we see that
\[ \|\psi_n - \psi\|_p \le \left( \|\psi_n - \psi\|_{0,0} + \|\psi_n - \psi\|_{2,0} \right) \|(1 + x^2)^{-1}\|_p \tag{34.6} \]
and the right hand side goes to $0$ as $\psi_n \to \psi$ in $\mathcal{S}$. $\square$

Definition 34.9. The space of tempered distributions $\mathcal{S}'$ consists of all continuous linear maps $F : \mathcal{S} \to \mathbb{C}$. That is, a map $F : \mathcal{S} \to \mathbb{C}$ belongs to $\mathcal{S}'$ if and only if it is linear and $F(\psi_n) \to F(\psi)$ whenever $\psi_n \to \psi$ in $\mathcal{S}$.

It is straightforward to check that $\mathcal{S}'$ is a vector space. To emphasize the role of $\mathcal{S}'$ as the dual space of $\mathcal{S}$, we will write $\langle F, \psi\rangle$ for $F(\psi)$. Tempered distributions $F$ are sometimes called generalized functions. We will topologize $\mathcal{S}'$ as follows: say $F_n \to F$ in $\mathcal{S}'$ if $\langle F_n, \psi\rangle \to \langle F, \psi\rangle$ for all $\psi \in \mathcal{S}$. The following examples are fundamental; the unproved claims are left as exercises.

Examples 34.10.
a) (Tempered functions) A measurable function $f : \mathbb{R} \to \mathbb{C}$ is called tempered if $(1 + |x|)^{-N} f \in L^1$ for some integer $N \ge 0$. Each tempered function $f$ defines a tempered distribution by the formula
\[ \langle f, \psi\rangle = \int_{\mathbb{R}} f\psi. \tag{34.7} \]
(To see that $f\psi \in L^1$ for every $\psi \in \mathcal{S}$, write $f\psi = (1 + |x|)^{-N} f \cdot (1 + |x|)^N \psi$.) The fact that $\psi \mapsto \langle f, \psi\rangle$ is continuous follows from dominated convergence. For examples of tempered functions, note that every $f \in L^p$, $1 \le p \le \infty$, is tempered (apply Hölder's inequality to $(1 + |x|)^{-2} f$). More generally, any polynomial times a tempered function is a tempered function. Let us also observe that if $f, g$ are tempered functions, then the associated tempered distributions are equal if and only if $f = g$ a.e. This justifies the name "generalized functions."
b) (Tempered measures) A (positive, signed, or complex) Borel measure $\mu$ on $\mathbb{R}$ is called tempered if $\int_{\mathbb{R}} (1 + |x|)^{-N}\, d|\mu|(x) < \infty$ for some integer $N \ge 0$. Every tempered measure gives rise to a tempered distribution via the pairing
\[ \langle \mu, \psi\rangle = \int_{\mathbb{R}} \psi\, d\mu. \tag{34.8} \]
If $\mu$ is absolutely continuous with respect to Lebesgue measure $m$, with Radon-Nikodym derivative $f = \frac{d\mu}{dm}$, then $\mu$ is tempered if and only if $f$ is a tempered function, and we are back to example (a).

To give more examples, we first look at ways to obtain new tempered distributions from old ones. One elementary but important way is the following:

Proposition 34.11. If $F \in \mathcal{S}'$ and $T : \mathcal{S} \to \mathcal{S}$ is a continuous linear map, then
\[ \langle T'F, \psi\rangle := \langle F, T\psi\rangle \tag{34.9} \]
defines a tempered distribution.

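Proof. Linearity of $T'F$ is clear. If $\psi_n \to \psi$ in $\mathcal{S}$, then $T\psi_n \to T\psi$ in $\mathcal{S}$ by the continuity of $T$, so
\[ \langle F, T\psi_n\rangle \to \langle F, T\psi\rangle, \tag{34.10} \]
and $T'F$ defines a distribution. $\square$

Before moving on to more general classes of distributions, we consider one more special example:

Proposition 34.12 (The Principal Value integral). For each $\psi \in \mathcal{S}$, the limit
\[ \langle P_{1/x}, \psi\rangle := \lim_{\epsilon \to 0} \int_{|x| \ge \epsilon} \frac{1}{x}\, \psi(x)\, dx \tag{34.11} \]
exists, and defines a tempered distribution.

Proof. We first show that (34.11) is well-defined on $\mathcal{S}$. Let $\psi \in \mathcal{S}$; then by changing variables in the integral on the negative half-line we get for each $\epsilon > 0$
\[ \int_{|x| \ge \epsilon} \frac{1}{x}\, \psi(x)\, dx = \int_{\epsilon}^{\infty} \frac{\psi(x) - \psi(-x)}{x}\, dx. \tag{34.12} \]
Since $\psi$ is differentiable at $0$, the integrand is bounded in a neighborhood of $0$, and since $x\psi(x)$ is bounded, the integrand decays at least as fast as $1/x^2$ near infinity, so the integral is convergent. Thus the limit exists as $\epsilon \to 0$, and equals
\[ \int_{0}^{\infty} \frac{\psi(x) - \psi(-x)}{x}\, dx. \tag{34.13} \]

To see that $P_{1/x}$ is continuous on $\mathcal{S}$, first observe that for $x > 0$
\begin{align}
\left| \frac{\psi(x) - \psi(-x)}{x} \right| &= \left| \frac{1}{x} \int_{-x}^{x} \psi'(t)\, dt \right| \tag{34.14} \\
&\le \frac{1}{x} \int_{-x}^{x} |\psi'(t)|\, dt \tag{34.15} \\
&\le 2\|\psi'\|_\infty. \tag{34.16}
\end{align}
It follows that
\begin{align}
|\langle P_{1/x}, \psi\rangle| &\le \int_{0}^{1} \left| \frac{\psi(x) - \psi(-x)}{x} \right| dx + \int_{1}^{\infty} \left| \frac{\psi(x) - \psi(-x)}{x} \right| dx \tag{34.17} \\
&\le 2\|\psi'\|_\infty + \int_{1}^{\infty} \left( |x\psi(x)| + |x\psi(-x)| \right) \frac{dx}{x^2} \tag{34.18} \\
&\le 2\|\psi'\|_\infty + 2\|x\psi(x)\|_\infty \tag{34.19} \\
&= 2\|\psi\|_{0,1} + 2\|\psi\|_{1,0}. \tag{34.20}
\end{align}

If we consider now $\psi_n \to \psi$ in $\mathcal{S}$, then the above estimate applied to $\psi_n - \psi$ shows that $\langle P_{1/x}, \psi_n\rangle \to \langle P_{1/x}, \psi\rangle$ and the proof is finished. $\square$

Proposition 34.13 (Differentiation of tempered distributions). For any integer $\beta \ge 0$ and any tempered distribution $F \in \mathcal{S}'$, the map
\[ \psi \mapsto \langle F, (-1)^\beta \psi^{(\beta)}\rangle \tag{34.21} \]
defines a tempered distribution, called the $\beta$th distributional derivative of $F$, denoted $F^{(\beta)}$.

Proof. By Lemma ??, the map $T\psi = (-1)^\beta \psi^{(\beta)}$ is continuous on $\mathcal{S}$, and the result follows by Proposition 34.11. $\square$

The reason for including the sign $(-1)^\beta$ in the definition of the distributional derivative is so that our definition is compatible with integration by parts. In particular, if $f$ and $f'$ are tempered functions, then the (formal) integration by parts calculation at the beginning of this section is valid, and shows that the distributional derivative of $\psi \mapsto \langle f, \psi\rangle$ is $\psi \mapsto \langle f', \psi\rangle$.

Proposition 34.14 (The Heaviside function). Let $H$ be the Heaviside function $H(x) = 1_{[0,\infty)}(x)$. Then $H' = \delta$ in the sense of distributions.

Proof. Let $\psi \in \mathcal{S}$. $H$ is a tempered function, so $H'$ is given by
\begin{align*}
\langle H', \psi\rangle &= -\langle H, \psi'\rangle \\
&= -\int_{-\infty}^{\infty} H(x)\, \frac{d\psi}{dx}\, dx \\
&= -\int_{0}^{\infty} \frac{d\psi}{dx}\, dx \\
&= \psi(0) \\
&= \langle \delta, \psi\rangle. \qquad \square
\end{align*}

Notice that every distribution is infinitely differentiable in the sense of distributions. So, we can take another derivative to get $H'' = \delta'$. A quick computation shows that $\langle \delta', \psi\rangle = -\psi'(0)$. It can be shown that $\delta'$ is not given by any tempered measure (see Problem 35.22). Our next use of Proposition 34.11 will allow us to define the convolution of a distribution with a Schwartz function $\varphi$.

Proposition 34.15. Let $\varphi \in \mathcal{S}$ and $F \in \mathcal{S}'$. Then the map
\[ \langle \varphi * F, \psi\rangle := \langle F, \tilde\varphi * \psi\rangle \tag{34.22} \]
defines a tempered distribution, called the convolution of $F$ and $\varphi$. (Here $\tilde\varphi(x) = \varphi(-x)$.)

Proof. Again it suffices to verify that the map $\psi \mapsto \tilde\varphi * \psi$ is continuous on $\mathcal{S}$. The proof is left as Problem 35.24. $\square$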
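Returning briefly to Proposition 34.12, the passage from the truncated integrals in (34.11) to the symmetrized integral (34.13) can be watched numerically. The Python sketch below is an illustration only: the test function $\psi(x) = e^{-(x-1)^2}$, the truncation at $|x| = 10$, and the helper names are assumptions chosen for the example. Since the difference between the truncated integral and the limit is $\int_0^\epsilon \frac{\psi(x) - \psi(-x)}{x}\,dx \approx 2\psi'(0)\,\epsilon$, the printed values should approach the limit at a rate linear in $\epsilon$.

```python
import math

def psi(x):
    # Illustrative Schwartz function, deliberately not even, so that the
    # principal value is nonzero.
    return math.exp(-(x - 1.0) ** 2)

def midpoint(fn, a, b, n):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

L = 10.0  # truncation point; psi is negligible beyond |x| = 10

def truncated(eps, n=100000):
    """Integral of psi(x)/x over eps <= |x| <= L, one half-line at a time."""
    right = midpoint(lambda x: psi(x) / x, eps, L, n)
    left = midpoint(lambda x: psi(-x) / (-x), eps, L, n)  # substitution x -> -x
    return right + left

def symmetrized(n=100000):
    """Integral over (0, L) of (psi(x) - psi(-x)) / x, as in (34.13)."""
    return midpoint(lambda x: (psi(x) - psi(-x)) / x, 0.0, L, n)

limit = symmetrized()
values = {eps: truncated(eps) for eps in (1e-1, 1e-2, 1e-3)}
for eps, value in values.items():
    print(eps, value, limit)
```

Each one-sided integral grows like $\log(1/\epsilon)$ as $\epsilon \to 0$; it is only their sum that converges, which is exactly the cancellation expressed by (34.12).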
If $f \in L^1$, one can also verify that $f * \varphi$, viewed as a distribution, agrees with the distribution induced by the $L^1$ function $f * \varphi$ defined by ordinary convolution (Problem 35.24).

It is instructive to revisit $L^1$ approximate units in the context of distributions. If $\{\varphi_\lambda\}_{\lambda > 0}$ is an $L^1$ approximate unit, then each $\varphi_\lambda$ is a tempered function and hence defines a distribution.

Proposition 34.16. If $\varphi_\lambda$ is an $L^1$ approximate unit, then $\varphi_\lambda \to \delta$ in $\mathcal{S}'$.

Proof. By definition, for any $\psi \in \mathcal{S}$
\[ \langle \varphi_\lambda, \psi\rangle = \int_{-\infty}^{\infty} \varphi_\lambda(y)\, \psi(y)\, dy = (\varphi_\lambda * \tilde\psi)(0), \tag{34.23} \]
where $\tilde\psi(x) = \psi(-x)$. As $\lambda \to 0$, by Lemma 30.6 we have $(\varphi_\lambda * \tilde\psi)(0) \to \tilde\psi(0) = \psi(0) = \langle \delta, \psi\rangle$. $\square$

Finally we consider the Fourier transform. The key fact is the following:

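Lemma 34.17. If $\psi \in \mathcal{S}$, then $\hat\psi \in \mathcal{S}$, and the map $\,\hat{}\, : \mathcal{S} \to \mathcal{S}$ is continuous.

Proof. The fact that $\hat\psi$ belongs to $\mathcal{S}$ follows from repeated application of Propositions 29.6 and 29.8. Continuity follows from the fact that if $\psi_n \to \psi$ in $\mathcal{S}$, then also $\left\| x^\alpha \frac{d^\beta}{dx^\beta} (\psi_n - \psi) \right\|_1 \to 0$ for all $\alpha, \beta$. See Problem 35.23. $\square$

Proposition 34.18 (Fourier transforms of tempered distributions). If $\psi \in \mathcal{S}$, then $\hat\psi \in \mathcal{S}$, and for any $F \in \mathcal{S}'$ the formula
\[ \langle \hat F, \psi\rangle := \langle F, \hat\psi\rangle \tag{34.24} \]
defines a tempered distribution, called the Fourier transform of $F$.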

Examples 34.19.
a) Let $\delta_t$ be the point mass at $t \in \mathbb{R}$. We can compute $\hat\delta_t$: for $\psi \in \mathcal{S}$ we have
\begin{align}
\langle \hat\delta_t, \psi\rangle &= \langle \delta_t, \hat\psi\rangle \tag{34.25} \\
&= \hat\psi(t) \tag{34.26} \\
&= \int_{-\infty}^{\infty} e^{-2\pi i x t}\, \psi(x)\, dx \tag{34.27} \\
&= \langle e^{-2\pi i x t}, \psi\rangle, \tag{34.28}
\end{align}
so $\hat\delta_t = e^{-2\pi i x t}$. The expected inversion $(e^{-2\pi i x t})^{\vee} = \delta_t$ also holds; the proof is left as an exercise.
b) Consider the distribution $P_{1/x}$ of Proposition 34.12. One can show that $\widehat{P_{1/x}}$ is the tempered distribution given by the tempered function
\[ F(t) = -\pi i \operatorname{sgn}(t). \tag{34.29} \]

35. Problems

Problem 35.1. Prove Proposition 29.2.

Problem 35.2. Complete the proof of Lemma 29.4.

Problem 35.3. Prove Proposition 30.2.

Problem 35.4.
a) Prove that if $f \in L^\infty$ and $g \in L^1$, then $f * g$ is continuous.
b) Prove that if $E \subset [0, 1]$ has positive Lebesgue measure, then the set $E - E = \{x - y : x, y \in E\}$ contains an interval centered at the origin. (Hint: let $-E = \{-x : x \in E\}$ and consider the function $h = 1_{-E} * 1_E$.)

Problem 35.5. Suppose $\varphi$ is an unsigned $L^1$ function with $\int \varphi = 1$, and let $\varphi_\lambda(x) = \frac{1}{\lambda} \varphi\left(\frac{x}{\lambda}\right)$.
a) Prove that $\{\varphi_\lambda\}_{\lambda > 0}$ is an $L^1$ approximate unit.
b) Give a simpler proof of Lemma 30.6 by making a change of variables in equation (30.4).

Problem 35.6.
a) Prove that if $f \in C_c^1(\mathbb{R})$ and $g$ is a compactly supported $L^1$ function, then $f * g$ is $C^1$ with compact support. (Hint: justify differentiation under the integral sign.)
b) By induction, conclude that if $f \in C_c^\infty(\mathbb{R})$ and $g \in L^1$ is compactly supported, then $f * g \in C_c^\infty(\mathbb{R})$.
c) Conclude that $C_c^\infty(\mathbb{R})$ is dense in $L^p$ for all $1 \le p < \infty$. (Apply Theorem 30.5 with $\varphi$ a bump function.)
d) Construct a bump function on $\mathbb{R}^n$ and extend the above results to $n > 1$.

Problem 35.7. Compute the integral in Lemma 31.3.

Problem 35.8. This problem gives a proof that the Fourier transform $\,\hat{}\, : L^1 \to C_0(\mathbb{R})$ is not surjective.

a) Compute $h_n := 1_{[-n,n]} * 1_{[-1,1]}$ explicitly.
b) Show that $h_n$ is, up to a multiplicative constant, the Fourier transform of the $L^1$ function
\[ f_n := \frac{\sin 2\pi x\, \sin 2\pi n x}{x^2}. \tag{35.1} \]
(Hint: you can compute integrals, or use the $L^1$ inversion theorem.)
c) Show that $\|f_n\|_1 \to \infty$ as $n \to \infty$. Conclude that the Fourier transform is not surjective. (Hint: if it were surjective...). Prove, however, that the Fourier transform does have dense range.

Problem 35.9. Suppose that $f \in L^1$, $f$ is differentiable a.e., $f' \in L^1$, and $f(x) = \int_{-\infty}^{x} f'(y)\, dy$ for a.e. $x \in \mathbb{R}$. Prove that $\widehat{f'}(t) = 2\pi i t \hat f(t)$.

Problem 35.10. Complete the proof of Theorem 32.3.

Problem 35.11. Let $\varphi_\lambda$ be an $L^1(\mathbb{T})$ approximate unit. Prove that if $f \in C(\mathbb{T})$, then $f * \varphi_\lambda \to f$ uniformly as $\lambda \to 0$.

Problem 35.12. State and prove an analog of Proposition 29.2 for Fourier series.

Problem 35.13. Prove Theorem 33.3.

Problem 35.14. Prove Theorem 33.12. Also prove that $L^2$ inversion is possible in the following sense: for all $f \in L^2(\mathbb{T})$, we have
\[ \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big| f(\theta) - \sum_{n=-N}^{N} \hat f(n)\, e^{in\theta} \Big|^2 d\theta = 0. \tag{35.2} \]
(In other words, the partial sums $s_N$ of the Fourier series converge to $f$ in the $L^2$ norm.)

Problem 35.15. Let $A(\mathbb{T})$ denote the set of functions $f \in L^1(\mathbb{T})$ such that the Fourier transform $\hat f$ belongs to $\ell^1(\mathbb{Z})$.
a) Prove that if $f, g \in A(\mathbb{T})$, then their product $fg$ also belongs to $A(\mathbb{T})$. (Hint: use the $\ell^1$ inversion theorem to write $f$ and $g$ as the sums of their Fourier series, and express the Fourier coefficients of $fg$ in terms of the coefficients of $f$ and $g$.) Thus, $A(\mathbb{T})$ is a ring.
b) Prove that the Fourier transform is a ring isomorphism from $A(\mathbb{T})$ onto $\ell^1(\mathbb{Z})$ (where the multiplication on $\ell^1(\mathbb{Z})$ is convolution).

Problem 35.16. Prove that if $f \in C^k(\mathbb{T})$, $k \ge 1$, then the Fourier coefficients of $f$ satisfy
\[ \lim_{n \to \pm\infty} |n|^k |\hat f(n)| = 0. \tag{35.3} \]
(Hint: first compute the Fourier transform of $f'$ explicitly.)

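Problem 35.17. Fill in the details in the proof of Theorem 33.4. To show that $\|D_N\|_1 \to \infty$, fix $N$, and for each $0 \le k \le 2N$ let $I_k$ denote the interval
\[ \left[ \frac{1}{2} \frac{k\pi}{N + \frac{1}{2}},\ \frac{1}{2} \frac{(k + 1)\pi}{N + \frac{1}{2}} \right]. \]
(These are intervals on which $D_N$ has constant sign.) Then
\[ \int_{-\pi}^{\pi} |D_N(t)|\, dt = 2 \sum_{k=0}^{2N} \int_{I_k} \left| \frac{\sin\left(N + \frac{1}{2}\right) t}{\sin \frac{t}{2}} \right| dt. \tag{35.4} \]
To estimate the integral over $I_k$, first show that there is a universal constant $C > 0$ such that
\[ \frac{1}{\left| \sin \frac{t}{2} \right|} \ge C\, \frac{N + \frac{1}{2}}{k} \quad \text{for all } t \in I_k \]
for each $k > 0$.

Problem 35.18. Prove Proposition 33.11.

Problem 35.19 (Fourier transforms of measures). Let $\mu$ be a finite (signed or complex) Borel measure on $\mathbb{T}$. The Fourier transform of $\mu$ is the function $\hat\mu : \mathbb{Z} \to \mathbb{C}$ defined by
\[ \hat\mu(n) := \int_{\mathbb{T}} e^{-in\theta}\, d\mu(\theta). \tag{35.5} \]
a) Prove that $\hat\mu$ is bounded. Give an example of a measure $\mu$ such that $\hat\mu \notin c_0(\mathbb{Z})$.
b) For fixed $\mu$, define for each $0 \le r < 1$
\[ A(r, \theta) := \sum_{n=-\infty}^{\infty} \hat\mu(n)\, r^{|n|}\, e^{in\theta}. \tag{35.6} \]
Prove that the measures $\mu_r := A(r, \theta)\, dm(\theta)$ converge to $\mu$ as $r \to 1$, in the following sense: for every continuous function $f$ on $\mathbb{T}$,
\[ \lim_{r \to 1} \int_{\mathbb{T}} f(\theta)\, A(r, \theta)\, dm(\theta) = \int_{\mathbb{T}} f(\theta)\, d\mu(\theta). \tag{35.7} \]

Problem 35.20 (The Fejér kernel). Consider the cutoff function on $\mathbb{Z}$
\[ \psi_N(k) = \begin{cases} 0 & \text{if } |k| > N, \\ 1 - \frac{|k|}{N} & \text{if } |k| \le N. \end{cases} \tag{35.8} \]
Find a closed form expression for
\[ F_N(\theta) = \sum_{k=-N}^{N} \psi_N(k)\, e^{ik\theta}, \tag{35.9} \]
and show that the family $\{F_N(\theta)\}_{N \ge 1}$ is an $L^1(\mathbb{T})$ approximate unit.

Problem 35.21. Prove that $\mathcal{S}$ is dense in $L^p(\mathbb{R})$ for $1 \le p < \infty$.

Problem 35.22. Prove that there is no finite Borel measure $\mu$ on $[-1, 1]$ such that $\int_{-1}^{1} f\, d\mu = f'(0)$ for all $f \in C^1[-1, 1]$.

Problem 35.23.
a) Complete the proof of Lemma 34.17.
b) Prove that if $\varphi, \psi \in \mathcal{S}$ then
\[ \int_{\mathbb{R}} \hat\varphi\, \psi = \int_{\mathbb{R}} \varphi\, \hat\psi. \tag{35.10} \]
(This justifies the definition of $\hat F$.)
c) Prove that Fourier inversion holds in $\mathcal{S}$; that is, $(\hat\psi)^{\vee} = \psi$.
d) State and prove a Fourier inversion theorem for the Fourier transform $F \mapsto \hat F$ on $\mathcal{S}'$.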

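Problem 35.24.
a) Prove that if $\varphi \in \mathcal{S}$ and $\psi_n \to \psi$ in $\mathcal{S}$, then $\varphi * \psi_n \to \varphi * \psi$ in $\mathcal{S}$. (Here convolution means ordinary convolution of functions.)
b) Let $f \in L^1$ and $\varphi \in \mathcal{S}$. Prove that the tempered distribution $f * \varphi$ coincides with the distribution defined by the tempered function $f * \varphi$.
c) Show that $\delta * \varphi = \varphi$ in the sense of distributions.
d) Let $\varphi \in \mathcal{S}$ be a nonnegative function with $\int \varphi = 1$, and let $\varphi_\lambda(x) := \frac{1}{\lambda} \varphi\left(\frac{x}{\lambda}\right)$ be the corresponding $L^1$ approximate unit. Prove that for any $F \in \mathcal{S}'$, $F * \varphi_\lambda \to F$ in $\mathcal{S}'$ as $\lambda \to 0$.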

73 36. Some probabilistic concepts In this section we sketch out the fundamentals of the measure-theoretic approach to proba- bility, emphasizing what distinguishes probability from analysis. We begin with the following “dictionary”:

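| analysis | probability |
| --- | --- |
| σ-algebra $\mathcal{M}$ | σ-field $\mathcal{E}$ |
| measurable space $(X, \mathcal{M})$ | sample space $(\Omega, \mathcal{E})$ |
| measure space $(X, \mathcal{M}, \mu)$ with $\mu(X) = 1$ | probability space $(\Omega, \mathcal{E}, P)$ |
| measurable function $f : X \to \mathbb{R}$ | random variable $X : \Omega \to \mathbb{R}$ |
| integral of $f$, $\int_X f\, d\mu$ | expectation of $X$, $\mathbb{E}(X) = \int_\Omega X(\omega)\, dP(\omega)$ |
| $L^p$ function, $\int \lvert f \rvert^p\, d\mu < \infty$ | random variable with finite $p$th moment, $\mathbb{E}(\lvert X \rvert^p) < \infty$ |
| push-forward measure, $f_*\mu(E) = \mu(f^{-1}(E))$ | distribution, $P_X(B) = P(X^{-1}(B)) = P(X \in B)$ |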

For the remainder of this section we will use probabilistic language. There are a few important probabilistic notions that have no counterpart in the analysis we have studied so far.

Definition 36.1. Let $X$ be a real-valued random variable with $\mathbb{E}(|X|^2) < \infty$. The variance of $X$ is the quantity
\[ \operatorname{var}(X) = \sigma^2(X) := \mathbb{E}\big((X - \mathbb{E}(X))^2\big), \tag{36.1} \]
and $\sigma(X)$ is called the standard deviation of $X$.

The concept which most clearly distinguishes probability from analysis is that of independence. To introduce it we first look more closely at the distribution of a random variable. We recall the push-forward measure construction: if $(\Omega, \mathcal{E}, P)$ is a probability space, a random variable $X : \Omega \to \mathbb{R}$ induces a Borel measure $P_X$ on $\mathbb{R}$ via the formula
\[ P_X(B) = P(X^{-1}(B)) \tag{36.2} \]
for each Borel set $B \subset \mathbb{R}$. The set $X^{-1}(B) \in \mathcal{E}$ is interpreted as the event “$X$ lies in $B$.” We then have:

Proposition 36.2. Let $f : \mathbb{R} \to \mathbb{R}$ be a Borel measurable function and $X$ a random variable. Suppose that either $f$ is unsigned, or $\mathbb{E}(|f(X)|) < \infty$. Then
\[ \mathbb{E}(f(X)) := \int_\Omega f(X(\omega))\, dP(\omega) = \int_{\mathbb{R}} f(t)\, dP_X(t). \tag{36.3} \]

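Proof. The equation is true by definition when $f = 1_B$ is the indicator function of a Borel set $B$, and follows for general $f$ by the usual approximation arguments (linearity for simple functions, monotone convergence for unsigned $f$, and splitting into positive and negative parts for integrable $f$). □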
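The identity (36.3) can be checked by hand on a finite probability space, where both integrals are finite sums. The following is a minimal sketch under assumptions chosen purely for illustration: the sample space `omega`, the random variable `X`, and the function `f` are hypothetical and not taken from the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: Omega = {-2, ..., 2} with uniform P.
omega = [-2, -1, 0, 1, 2]
P = {w: Fraction(1, len(omega)) for w in omega}


def X(w):
    """A random variable on Omega (not injective, so P_X merges outcomes)."""
    return w * w


def f(t):
    """A Borel function applied to the values of X."""
    return 3 * t + 1


# Push-forward measure P_X({t}) = P(X^{-1}({t})), as in (36.2).
P_X = defaultdict(Fraction)
for w, p in P.items():
    P_X[X(w)] += p

# Left side of (36.3): integrate f(X(w)) over Omega against P.
lhs = sum(f(X(w)) * p for w, p in P.items())
# Right side of (36.3): integrate f(t) over R against P_X.
rhs = sum(f(t) * p for t, p in P_X.items())

print(dict(P_X))
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides agree exactly rather than up to rounding.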

Likewise, if $X_1, \dots, X_n$ are random variables defined on the same probability space, we can assemble them into a single $\mathbb{R}^n$-valued random variable, and define the joint distribution of $(X_1, \dots, X_n)$ to be the unique Borel measure $P_{(X_1,\dots,X_n)}$ on $\mathbb{R}^n$ so that
\[ P_{(X_1,\dots,X_n)}(B) = P\big((X_1,\dots,X_n)^{-1}(B)\big) \tag{36.4} \]
for each Borel set $B \subset \mathbb{R}^n$. Likewise, for unsigned or absolutely integrable $f : \mathbb{R}^n \to \mathbb{R}$,
\[ \mathbb{E}(f(X_1,\dots,X_n)) = \int_{\mathbb{R}^n} f(t_1,\dots,t_n)\, dP_{(X_1,\dots,X_n)}(t). \tag{36.5} \]
So, for example we have
\[ \mathbb{E}(X) = \int_{\mathbb{R}} t\, dP_X(t), \]
\[ \operatorname{var}(X) = \int_{\mathbb{R}} (t - \mathbb{E}(X))^2\, dP_X(t), \]
\[ \mathbb{E}(X + Y) = \int_{\mathbb{R}^2} (t_1 + t_2)\, dP_{(X,Y)}(t_1, t_2). \]

Let us remark that it is easy to construct random variables with a given joint distribution $\mu$. Indeed, given a probability measure $\mu$ on $\mathbb{R}^n$, we form the probability space $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}, \mu)$; the random variables $X_j(t_1,\dots,t_n) = t_j$ then have joint distribution $\mu$. It is also important to observe that the joint distribution is not determined by the individual distributions. There is an important exception, though, in case the random variables are independent, which we now discuss.

Without discussing it formally, let us recall that two events $E, F$ should be called independent if the probability of $E$ occurring, given that $F$ has occurred, is the same as the probability of $E$ occurring. That is, $P(E \cap F)/P(F) = P(E)$, or $P(E \cap F) = P(E)P(F)$. We may then define:

Definition 36.3. A set of events $E_1, \dots, E_n \in \mathcal{E}$ is called independent if
\[ P(E_1 \cap \cdots \cap E_n) = \prod_{j=1}^{n} P(E_j). \tag{36.6} \]
Now, each random variable $X$ generates a family of events, namely $X^{-1}(B)$ for each Borel set $B \subset \mathbb{R}$. So we will say that a set of random variables $X_1, \dots, X_n$ is independent if they generate independent events:

Definition 36.4. A set of random variables $X_1, \dots, X_n$ is independent if for all Borel sets $B_1, \dots, B_n \subset \mathbb{R}$, the events $X_1^{-1}(B_1), \dots, X_n^{-1}(B_n)$ are independent.

As noted above, when random variables are independent, their joint distribution can be read off from the individual distributions:

Proposition 36.5. Let $X_1, \dots, X_n$ be independent random variables. Then their joint distribution is the product of the individual distributions:
\[ P_{(X_1,\dots,X_n)} = P_{X_1} \times \cdots \times P_{X_n}. \tag{36.7} \]

Proof. To determine $P_{(X_1,\dots,X_n)}$, we must determine $P_{(X_1,\dots,X_n)}(B)$ for every Borel set $B$, but since the Borel σ-algebra is generated by measurable rectangles $B_1 \times \cdots \times B_n$, we may assume $B$ has this form. Using the definition of distribution and independence, we have
\begin{align*}
P_{(X_1,\dots,X_n)}(B_1 \times \cdots \times B_n) &= P\big((X_1,\dots,X_n)^{-1}(B_1 \times \cdots \times B_n)\big) \\
&= P\big(X_1^{-1}(B_1) \cap \cdots \cap X_n^{-1}(B_n)\big) \\
&= \prod_{j=1}^{n} P\big(X_j^{-1}(B_j)\big) \\
&= \prod_{j=1}^{n} P_{X_j}(B_j) \\
&= \Big(\prod_{j=1}^{n} P_{X_j}\Big)(B_1 \times \cdots \times B_n).
\end{align*}
□
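To see Proposition 36.5 in a concrete case, and to see that equal marginals do not force equal joint distributions, one can enumerate a small sample space. The sketch below assumes a hypothetical space of two fair coin flips; the helper names `joint`, `marginal`, and `is_product` are illustrative only. On a finite space it suffices to test the product formula on singletons.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair coin flips, uniform probability.
omega = list(product([0, 1], repeat=2))
P = {w: Fraction(1, 4) for w in omega}


def joint(U, V):
    """Joint distribution of (U, V) as a dict {(u, v): probability}."""
    dist = {}
    for w, p in P.items():
        key = (U(w), V(w))
        dist[key] = dist.get(key, Fraction(0)) + p
    return dist


def marginal(U):
    """Distribution of a single random variable U."""
    dist = {}
    for w, p in P.items():
        dist[U(w)] = dist.get(U(w), Fraction(0)) + p
    return dist


def is_product(U, V):
    """Check P_(U,V)({(u, v)}) = P_U({u}) * P_V({v}) for all values u, v."""
    j, mu, mv = joint(U, V), marginal(U), marginal(V)
    return all(j.get((u, v), Fraction(0)) == mu[u] * mv[v]
               for u in mu for v in mv)


def first(w):
    """Result of the first flip."""
    return w[0]


def second(w):
    """Result of the second flip."""
    return w[1]


print(is_product(first, second))  # two independent flips
print(is_product(first, first))   # same marginals, but fully dependent
```

The pairs `(first, second)` and `(first, first)` have identical individual distributions, yet only the first pair has the product measure as its joint distribution.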

Corollary 36.6. Suppose that $X_1, \dots, X_n$ are independent random variables. Let $f_1, \dots, f_n : \mathbb{R} \to \mathbb{R}$ be measurable functions and suppose that either each $f_j$ is unsigned, or $\mathbb{E}(|f_j(X_j)|) < \infty$ for each $j$. Then
\[ \mathbb{E}(f_1(X_1) \cdots f_n(X_n)) = \prod_{j=1}^{n} \mathbb{E}(f_j(X_j)). \tag{36.8} \]

Proof. Combine equation (36.5), Proposition 36.5, and Fubini's theorem. □

Corollary 36.7. If $X_1, \dots, X_n$ are independent and $\mathbb{E}(|X_j|^2) < \infty$ for each $j$, then
\[ \operatorname{var}(X_1 + \cdots + X_n) = \sum_{j=1}^{n} \operatorname{var}(X_j). \tag{36.9} \]

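Proof. We first replace each $X_j$ by $Y_j := X_j - \mathbb{E}(X_j)$, so that $\mathbb{E}(Y_j) = 0$ for each $j$. Then the $Y_j$ are independent (check this), and we have
\[ \mathbb{E}(Y_j Y_k) = \begin{cases} \operatorname{var}(X_j) & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases} \tag{36.10} \]
Thus
\[ \operatorname{var}\Big(\sum_{j=1}^{n} X_j\Big) = \operatorname{var}\Big(\sum_{j=1}^{n} Y_j\Big) = \sum_{j,k} \mathbb{E}(Y_j Y_k) = \sum_{j=1}^{n} \operatorname{var}(X_j). \tag{36.11} \]
□

36.1. The Law of Large Numbers. We will be interested in an infinite sequence of iid (independent, identically distributed) random variables; in other words, infinitely many independent copies of a single random variable $X$. We introduce the notation
\[ \overline{X}_n := \frac{1}{n} \sum_{j=1}^{n} X_j. \tag{36.12} \]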
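As a sanity check on Corollary 36.7 and the notation (36.12), the variance of a sum and of an empirical mean can be computed exactly for a few independent copies of a simple random variable. The sketch below assumes, purely for illustration, that $X$ is a fair six-sided die; the helpers `expectation` and `variance` are hypothetical names, and the variance is computed through the equivalent formula $\mathbb{E}(g^2) - \mathbb{E}(g)^2$.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair die; this choice of X is only an illustration


def expectation(g, n):
    """E(g(X_1, ..., X_n)) for n independent fair dice, by exact enumeration."""
    p = Fraction(1, 6 ** n)
    return sum(g(*outcome) * p for outcome in product(faces, repeat=n))


def variance(g, n):
    """var(g(X_1, ..., X_n)), computed as E(g^2) - E(g)^2."""
    m = expectation(g, n)
    return expectation(lambda *x: g(*x) ** 2, n) - m ** 2


var_X = variance(lambda x: x, 1)
for n in (1, 2, 3, 4):
    var_sum = variance(lambda *x: sum(x), n)                 # var(X_1+...+X_n)
    var_mean = variance(lambda *x: Fraction(sum(x), n), n)   # var of the mean
    print(n, var_sum, var_mean)
```

In each row the variance of the sum equals $n \operatorname{var}(X)$ and the variance of the empirical mean equals $\operatorname{var}(X)/n$, as Corollary 36.7 predicts.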

Loosely speaking, a law of large numbers asserts that the $\overline{X}_n$ converge to $\mathbb{E}(X)$ in some sense. We will sketch a proof of the following:

Theorem 36.8 (The Strong Law of Large Numbers). If $(X_n)$ is a sequence of iid $L^2$ random variables, then $\overline{X}_n \to \mathbb{E}(X)$ almost surely.

In this section we give only an outline of the proof; the exercises for this section are to fill in all the details. We begin with an elementary but very frequently used result. Recall that if $(E_n)$ is a sequence of sets, their lim sup is defined by
\[ \limsup E_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k. \tag{36.13} \]

In other words, a point belongs to $\limsup E_n$ if and only if it belongs to infinitely many of the $E_n$.

Lemma 36.9 (The Borel-Cantelli Lemma).
a) If $(E_n)$ is a sequence of events, and $\sum_n P(E_n) < \infty$, then $P(\limsup E_n) = 0$.
b) If $(E_n)$ is a sequence of independent events, and $\sum_n P(E_n) = +\infty$, then $P(\limsup E_n) = 1$.

Proof. □

The next lemmas also represent frequently used techniques in probability (the zeroth moment method (or union bound), first moment method, and second moment method respectively).

Lemma 36.10 (Zeroth moment tail bound). $P(\overline{X}_n \neq 0) \le n P(X \neq 0)$.

Proof. □

Lemma 36.11 (First moment tail bound). If $\mathbb{E}(|X|) < \infty$, then for all $t > 0$
\[ P(|\overline{X}_n| \ge t) \le \frac{\mathbb{E}(|X|)}{t}. \tag{36.14} \]

Proof. □

A stronger bound is available if we assume that $X$ also has finite second moment:

Lemma 36.12 (Second moment tail bound). If $\mathbb{E}(|X|^2) < \infty$, then
\[ P(|\overline{X}_n - \mathbb{E}(X)| \ge t) \le \frac{\operatorname{var}(X)}{n t^2}. \tag{36.15} \]

Proof. □

The final ingredients in the proof of the strong law will be some elementary results on truncation of random variables. As we have already seen in the analysis context (e.g. in the proof of the Marcinkiewicz interpolation theorem), it is often useful to split random variables into their small and large parts.

Lemma 36.13. Let $X$ be a random variable and $N > 0$ a real number. Let
\[ X_{>N} := 1_{|X|>N}\, X, \qquad X_{\le N} := 1_{|X|\le N}\, X. \tag{36.16} \]

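Then $X = X_{>N} + X_{\le N}$ and the following hold:
a) $\mathbb{E}(|X_{\le N}|^2) \le N\, \mathbb{E}(|X|)$,
b) $\operatorname{var}(X_{\le N}) \le N\, \mathbb{E}(|X|)$, and
c) $\mathbb{E}(|X_{>N}|) \to 0$ as $N \to \infty$.

Proof. □

Proof of Theorem 36.8. By considering positive and negative parts separately, we may assume that $X \ge 0$ almost surely. Since all the $X_n$ are now nonnegative, we observe that the means $\overline{X}_n$ obey a weak monotonicity: if $n \ge m$, we have
\[ \overline{X}_n \ge \frac{m}{n}\, \overline{X}_m \tag{36.17} \]
and thus if $(1 - \varepsilon) n \le m \le n$, we have
\[ \overline{X}_n \ge (1 - \varepsilon)\, \overline{X}_m. \tag{36.18} \]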

Thus the $\overline{X}_n$ cannot decrease too quickly in $n$. This weak monotonicity allows us to reduce to proving almost sure convergence along “sparse” subsequences of $\overline{X}_n$. More precisely, say that a sequence of integers $1 \le n_1 \le n_2 \le \cdots \le n_j \le \cdots$ is lacunary if there exists a constant $c > 1$ such that $n_{j+1}/n_j \ge c$ eventually. Then we have:

Claim: Let $(X_n)$ be a sequence of unsigned, iid random variables with $\mathbb{E}(|X|) < \infty$. If $\overline{X}_{n_j} \to \mathbb{E}(X)$ almost surely along every lacunary sequence $(n_j)$, then $\overline{X}_n \to \mathbb{E}(X)$ almost surely.

Proof of claim: Assume the hypothesis of the claim holds and let $0 < \varepsilon < 1$. Applying it to the lacunary sequence $n_j = \lfloor (1 + \varepsilon)^j \rfloor$, and using the weak monotonicity (36.18), we see that $\limsup |\overline{X}_n - \mathbb{E}(X)| \le \varepsilon |\mathbb{E}(X)|$ almost surely. Letting $\varepsilon \to 0$ along a countable sequence of values proves the claim. □

Thus, it suffices to prove the “lacunary” version of the strong law. By repeated application of Borel-Cantelli, it suffices to prove that
\[ \sum_{j=1}^{\infty} P(|\overline{X}_{n_j} - \mathbb{E}(X)| \ge \varepsilon) < \infty \tag{36.19} \]
for every lacunary sequence $(n_j)$ and every $\varepsilon > 0$. To control the probabilities $P(|\overline{X}_{n_j} - \mathbb{E}(X)| \ge \varepsilon)$, we truncate the $\overline{X}_{n_j}$ and apply the second and zeroth moment tail bounds. To estimate the average $\overline{X}_{n_j}$, we truncate each copy of $X$ at a cutoff $N_j$; we will leave the values of $N_j$ unspecified and optimize them later. At this point we only insist that $N_j$ is chosen large enough so that $|\mathbb{E}(X_{\le N_j}) - \mathbb{E}(X)| < \varepsilon$. So, let us fix $\varepsilon$ (we may assume $\varepsilon < \mathbb{E}(X)$) and the lacunary sequence $n_j$. For each $j$, from the second moment tail bound we have
\[ P\big(|\overline{(X_{\le N_j})}_{n_j} - \mathbb{E}(X)| \ge \varepsilon/2\big) \le \Big(\frac{2}{\varepsilon}\Big)^2 \frac{1}{n_j}\, \mathbb{E}(|X_{\le N_j}|^2). \tag{36.20} \]

We control the large parts $X_{>N_j}$ using the zeroth moment bound, which is a useful estimate here since $X_{>N_j}$ is mostly zero. We have
\[ P\big(\overline{(X_{>N_j})}_{n_j} \neq 0\big) \le n_j P(X > N_j). \tag{36.21} \]
Putting these together, we have
\[ P(|\overline{X}_{n_j} - \mathbb{E}(X)| \ge \varepsilon) \le \Big(\frac{2}{\varepsilon}\Big)^2 \frac{1}{n_j}\, \mathbb{E}(|X_{\le N_j}|^2) + n_j P(X > N_j). \tag{36.22} \]
Now, it turns out that the simple guess $N_j = n_j$ works: indeed, since the sequence $n_j$ is lacunary one can show that for some constant $C > 0$ (depending only on the lacunary parameter $c$) we have
\[ \sum_j \frac{1}{n_j} |X_{\le n_j}(\omega)|^2 \le C X(\omega) \quad \text{almost surely,} \tag{36.23} \]
and
\[ \sum_j n_j 1_{X > N_j}(\omega) \le C X(\omega) \quad \text{almost surely.} \tag{36.24} \]
Taking expectations of (36.23) and (36.24), and substituting into (36.22), we conclude that (36.19) holds, and the proof is finished. □
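The statement of Theorem 36.8 and the role of lacunary subsequences in its proof can be illustrated numerically. The sketch below is a simulation, not a proof. It assumes for illustration that $X$ is uniform on $[0,1]$ (so $\mathbb{E}(X) = 1/2$) and samples the empirical means along the lacunary sequence $n_j = 2^j$; the function name `empirical_means` and the fixed seed are hypothetical choices.

```python
import random


def empirical_means(sample, checkpoints):
    """Return the running means (1/n) * sum_{j<=n} X_j at the requested n."""
    means, total = {}, 0.0
    wanted = set(checkpoints)
    for n, x in enumerate(sample, start=1):
        total += x
        if n in wanted:
            means[n] = total / n
    return means


rng = random.Random(0)  # fixed seed so that the run is reproducible
N = 2 ** 17
# Illustrative choice: X uniform on [0, 1], so E(X) = 1/2 and X is in L^2.
sample = [rng.random() for _ in range(N)]

lacunary = [2 ** j for j in range(1, 18)]  # n_{j+1} / n_j = 2 > 1
means = empirical_means(sample, lacunary)
for n in lacunary:
    print(n, means[n], abs(means[n] - 0.5))
```

The printed deviations $|\overline{X}_{n_j} - 1/2|$ typically shrink roughly like $n_j^{-1/2}$, consistent with the second moment tail bound (36.15).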

36.2. The Central Limit Theorem. Suppose $(X_n)$ is a sequence of iid random variables with mean 0 and variance $\sigma^2$. Then the empirical means $\overline{X}_n$ are random variables with mean 0 and variance $\sigma^2/n$. Since the variance goes to 0 as $n \to \infty$, we expect the $\overline{X}_n$ to become tightly centered around their mean, and this is the content of the Law of Large Numbers. However, if we instead consider the random variables
\[ Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_j, \tag{36.25} \]
then the $Z_n$ all have mean 0 and variance $\sigma^2$, so we might ask if the $Z_n$ have some limiting distribution. The (surprising) answer is “yes,” and in fact the limiting distribution depends on the distribution of $X$ only through $\sigma^2$. This is the content of the Central Limit Theorem. To state it we first introduce one more mode of convergence:

Definition 36.14. A sequence of $\mathbb{R}$-valued random variables $X_n$ converges in distribution to a random variable $X$ if for every bounded, continuous function $f : \mathbb{R} \to \mathbb{R}$, we have
\[ \mathbb{E}(f(X_n)) \to \mathbb{E}(f(X)). \tag{36.26} \]
From the definition of distribution, this is equivalent to the condition
\[ \int_{\mathbb{R}} f(t)\, dP_{X_n}(t) \to \int_{\mathbb{R}} f(t)\, dP_X(t) \tag{36.27} \]
for all $f \in C_b(\mathbb{R})$. Since the $P_{X_n}$ are all probability measures, by standard approximation arguments it suffices to check smaller classes of continuous functions:

Lemma 36.15. Let $(\mu_n)_{n=1}^{\infty}$, $\mu$ be probability measures on $\mathbb{R}$. Then $\int_{\mathbb{R}} f\, d\mu_n \to \int_{\mathbb{R}} f\, d\mu$ for all $f \in C_b(\mathbb{R})$ if and only if the convergence holds for all $f \in C_c^{\infty}(\mathbb{R})$.

Proof. Exercise. □

If $\mu_n$, $\mu$ are all probability measures on $\mathbb{R}$, we say $\mu_n \to \mu$ vaguely if $\int_{\mathbb{R}} f\, d\mu_n \to \int_{\mathbb{R}} f\, d\mu$ for all $f \in C_b(\mathbb{R})$. Thus $X_n \to X$ in distribution if and only if $P_{X_n} \to P_X$ vaguely.

The Gaussian distribution of mean $\mu$ and variance $\sigma^2$ (or normal distribution $N(\mu, \sigma)$) is the measure on $\mathbb{R}$ given by the density
\[ \nu_\mu^\sigma(E) = \frac{1}{\sigma\sqrt{2\pi}} \int_E \exp\Big(\frac{-(t - \mu)^2}{2\sigma^2}\Big)\, dt. \tag{36.28} \]
One may check that
\[ \int_{\mathbb{R}} t\, d\nu_\mu^\sigma = \mu, \qquad \int_{\mathbb{R}} (t - \mu)^2\, d\nu_\mu^\sigma = \sigma^2. \tag{36.29} \]

Theorem 36.16 (The Central Limit Theorem). Let $(X_n)$ be a sequence of iid $L^2$ random variables with mean $\mu$ and variance $\sigma^2$. Let
\[ Z_n := \frac{1}{\sqrt{n}} \sum_{j=1}^{n} (X_j - \mu). \tag{36.30} \]
Then $Z_n \to N(0, \sigma)$ in distribution.

There are several different proofs of the Central Limit Theorem available; since we have already studied the Fourier transform we will give the Fourier-analytic proof. Given a random variable $X$, we define its Fourier transform (or, as a probabilist would say, its characteristic function) by
\[ F_X(t) := \mathbb{E}(e^{2\pi i t X}) = \int_{\mathbb{R}} e^{2\pi i y t}\, dP_X(y). \tag{36.31} \]
In other words, $F_X$ is just the Fourier transform of the measure $P_X$ (up to the reflection $t \mapsto -t$). As such, it is defined for any random variable $X$; we do not need any integrability assumptions. Also, as we have already seen, $F_X$ is a bounded, continuous function on $\mathbb{R}$. The Fourier-analytic proof of CLT boils down to two facts: first, that the Gaussian is (up to some constants) its own Fourier transform, and second (which allows the first fact to be used), that convergence in distribution is governed by convergence of Fourier transforms. Specifically, we have:

Theorem 36.17 (Lévy continuity theorem, special case). Let $X_n$ be a sequence of $\mathbb{R}$-valued random variables and $X$ another $\mathbb{R}$-valued random variable. The following are equivalent:
a) $X_n \to X$ in distribution.
b) $F_{X_n} \to F_X$ pointwise.

Proof. (a) implies (b) is immediate from the definition of convergence in distribution, applied to the real and imaginary parts of the bounded continuous functions $f_t(x) = e^{2\pi i t x}$. For the converse, we use Lemma 36.15. By the lemma, it suffices to show that

\[ \mathbb{E}(\psi(X_n)) \to \mathbb{E}(\psi(X)) \tag{36.32} \]
for every Schwartz function $\psi \in \mathcal{S}$. Since $\psi \in \mathcal{S}$, we know that $\hat\psi \in L^1$ and Fourier inversion holds:
\[ \psi(x) = \int_{-\infty}^{\infty} \hat\psi(t) e^{2\pi i x t}\, dt. \tag{36.33} \]
For $X_n$ defined on the probability space $(\Omega, \mathcal{E}, P)$ we then have for each $\omega \in \Omega$
\[ \psi(X_n(\omega)) = \int_{-\infty}^{\infty} \hat\psi(t) e^{2\pi i X_n(\omega) t}\, dt. \tag{36.34} \]
Taking expectations (that is, integrating $dP(\omega)$) and using Fubini's theorem, we have
\[ \mathbb{E}(\psi(X_n)) = \int_{-\infty}^{\infty} \hat\psi(t) F_{X_n}(t)\, dt, \tag{36.35} \]
and the theorem follows by dominated convergence. □

Proof of Theorem 36.16. Replacing each $X_n$ by $\frac{1}{\sigma}(X_n - \mu)$, we may assume that $\mu = 0$ and $\sigma = 1$. For brevity let $\lambda = P_X$ denote the common distribution of the $X_n$. By independence, the distribution of $Z_n$ is given by the rescaled $n$-fold convolution
\[ P_{Z_n}(E) = (\lambda * \lambda * \cdots * \lambda)(\sqrt{n}\, E). \tag{36.36} \]
Write $\lambda_n$ for this measure. We are done if we show that $\lambda_n \to \nu_0^1$ vaguely; by the Lévy continuity theorem it suffices to prove the pointwise convergence of the Fourier transforms. By Lemma 29.12, we have $\widehat{\nu_0^1}(t) = e^{-2\pi^2 t^2}$. To investigate the Fourier transform of $\lambda_n$, first observe from our normalizations we have $X \in L^2$ and
\[ \int_{\mathbb{R}} y\, d\lambda(y) = 0, \qquad \int_{\mathbb{R}} y^2\, d\lambda(y) = 1. \tag{36.37} \]
Since $y, y^2$ both lie in $L^1(\lambda)$, it follows (by exactly the same arguments as for the Fourier transform of functions) that $\hat\lambda$ belongs to $C^2$, and
\[ \hat\lambda(0) = 1, \qquad \hat\lambda'(0) = 0, \qquad \hat\lambda''(0) = -4\pi^2. \tag{36.38} \]
(See Corollary 29.7.) Taylor expanding $\hat\lambda$ we have
\[ \hat\lambda(t) = 1 - 2\pi^2 t^2 + o(t^2) \tag{36.39} \]
where $o(x)$ denotes a quantity that satisfies $o(x)/x \to 0$ as $x \to 0$. Since the Fourier transform converts convolution to multiplication, and noting the behavior of the Fourier transform under rescaling, we have
\[ \widehat{\lambda_n}(t) = \Big(1 - \frac{2\pi^2 t^2}{n} + o\Big(\frac{t^2}{n}\Big)\Big)^n. \tag{36.40} \]
Taking logs, and using the fact that $\log(1 + x) = x + o(x)$, we have
\[ \log \widehat{\lambda_n}(t) = n \log\Big(1 - \frac{2\pi^2 t^2}{n} + o\Big(\frac{t^2}{n}\Big)\Big) = -2\pi^2 t^2 + n \cdot o\Big(\frac{t^2}{n}\Big), \tag{36.41} \]
which goes to $-2\pi^2 t^2$ as $n \to \infty$. Thus, $\widehat{\lambda_n} \to e^{-2\pi^2 t^2} = \widehat{\nu_0^1}$ pointwise, and we are done. □

We can improve the conclusion of the central limit theorem somewhat by exploiting the following characterization of vague convergence:

Proposition 36.18. Suppose $\mu_n$, $\mu$ are all Borel probability measures on $\mathbb{R}$. For any Borel measure $\nu$ on $\mathbb{R}$, let
\[ F_\nu(a) = \nu((-\infty, a]). \tag{36.42} \]
Then $\mu_n \to \mu$ vaguely if and only if $F_{\mu_n}(a) \to F_\mu(a)$ at every point where $F_\mu$ is continuous.

Proof. □

Corollary 36.19 (Central Limit Theorem, second version). Let $X_n$ be a sequence of iid $L^2$ random variables with mean $\mu$ and variance $\sigma^2$. Then for every real number $a$, we have
\[ P\Big(\frac{1}{\sigma\sqrt{n}} \sum_{j=1}^{n} (X_j - \mu) \ge a\Big) \to \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-x^2/2}\, dx. \tag{36.43} \]

Proof. This is immediate by combining CLT and Proposition 36.18. □
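Both the Fourier-analytic step (36.40) and the tail statement (36.43) can be checked numerically in a case where everything is explicit. The sketch below assumes, for illustration only, that the $X_j$ are Rademacher variables ($\pm 1$ with probability $1/2$ each, so $\mu = 0$, $\sigma = 1$); then the Fourier transform of $Z_n$ is $\cos(2\pi t/\sqrt{n})^n$ and the tail probability is an exact binomial sum. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def char_fn_Zn(t, n):
    """Fourier transform of Z_n for Rademacher X_j: cos(2*pi*t/sqrt(n))**n."""
    return math.cos(2 * math.pi * t / math.sqrt(n)) ** n


def gaussian_char_fn(t):
    """Limit predicted by the proof: exp(-2*pi^2*t^2)."""
    return math.exp(-2 * math.pi ** 2 * t ** 2)


def tail_exact(a, n):
    """P(n^{-1/2} * (X_1 + ... + X_n) >= a) for Rademacher X_j, exactly."""
    # X_1 + ... + X_n = 2k - n, where k ~ Binomial(n, 1/2) counts the +1's.
    favourable = sum(math.comb(n, k) for k in range(n + 1)
                     if 2 * k - n >= a * math.sqrt(n))
    return favourable / 2 ** n


def tail_gaussian(a):
    """(2*pi)^{-1/2} * integral from a to infinity of exp(-x^2/2) dx."""
    return 0.5 * math.erfc(a / math.sqrt(2))


for n in (10, 100, 1000):
    print(n, char_fn_Zn(0.3, n), gaussian_char_fn(0.3),
          tail_exact(1.0, n), tail_gaussian(1.0))
```

As $n$ grows, the second column approaches the third and the fourth approaches the fifth. The tail converges more slowly because $Z_n$ is supported on a lattice of spacing $2/\sqrt{n}$.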
