
LECTURES ON FUNCTIONAL ANALYSIS

R. SHVYDKOY

Contents

1. Elements of topology 2
1.1. Metric spaces 3
1.2. General topological spaces. Nets 6
1.3. Ultrafilters and Tychonoff's compactness theorem 7
1.4. Exercises 8
2. Banach Spaces 9
2.1. Basic concepts and notation 9
2.2. Geometric objects 9
2.3. Banach Spaces 10
2.4. Classical examples 11
2.5. Subspaces, direct products, quotients 14
2.6. Norm comparison and equivalence 15
2.7. Compactness in finite dimensional spaces 16
2.8. Convex sets 17
2.9. Linear bounded operators 18
2.10. Invertibility 20
2.11. Complemented subspaces 22
2.12. Completion 23
2.13. Extensions, restrictions, reductions 23
2.14. Exercises 24
3. Fundamental Principles 24
3.1. The dual space 24
3.2. Structure of linear functionals 25
3.3. The Hahn-Banach extension theorem 26
3.4. Adjoint operators 28
3.5. Minkowski's functionals 28
3.6. Separation theorems 29
3.7. 30
3.8. Open mapping and closed graph theorem 31
3.9. Exercises 33
4. Hilbert Spaces 34
4.1. Inner product 34
4.2. Orthogonality 36
4.3. Orthogonal and orthonormal 38
4.4. Existence of a basis and Gram-Schmidt orthogonalization 40
4.5. Riesz Representation Theorem 42
4.6. Hilbert-adjoint operator 43
4.7. Exercises 45
5. Weak topologies 46
5.1. Weak topology 46
5.2. Weak∗ topology 49
5.3. Exercises 52
6. Compact sets in Banach spaces 52
6.1. Compactness in spaces 54
6.2. Arzelà-Ascoli Theorem 56
6.3. Compactness in Lp(Ω) 57
6.4. Extreme points and Krein-Milman Theorem 59
6.5. Compact maps 60
6.6. Exercises 60
7. Fixed Point Theorems 60
7.1. Contraction Principles 60
7.2. Brouwer Fixed Point theorem and its relatives 61
8. Spectra of bounded operators 61
8.1. Spectral Mapping Theorem and the Gelfand formula 65
8.2. On the spectrum of self-adjoint operators 67
8.3. On the spectrum of unitary operators 70
8.4. Exercises 70

1. Elements of topology

We start our lectures with a crash course in elementary topology. We will not need the full extent of this section until after we start discussing weak and weak-star topologies on Banach spaces. However, everything related to metric spaces and compactness will become useful right away.

Definition 1.1. A set X is called a topological space if it has a designated family of subsets τ, called a topology, whose elements are called open sets and which satisfy the following axioms:
(i) ∅, X ∈ τ;
(ii) if Uα ∈ τ for α ∈ A is any collection of open sets, then ⋃α∈A Uα ∈ τ;
(iii) if Ui, i = 1, . . . , n, is a finite collection of open sets, then ⋂i=1..n Ui ∈ τ.

We call F ⊂ X a closed set if F^c = X\F ∈ τ. Closed sets verify a complementary set of axioms:
(i) ∅, X are closed;
(ii) if Fα for α ∈ A is any collection of closed sets, then ⋂α∈A Fα is closed;
(iii) if Fi, i = 1, . . . , n, is a finite collection of closed sets, then ⋃i=1..n Fi is closed.

For any subset A ⊂ X, we define the closure of A, denoted Ā, to be the minimal closed set containing A. In other words, if F is a closed set containing A, then Ā ⊂ F. In view of property (ii) of closed sets, we can define the closure equivalently by

Ā = ⋂ {F : F is closed, A ⊂ F}.
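On a finite set the open-set axioms (i)-(iii) can be checked exhaustively. The following snippet is only an illustration, not part of the text; the three-point set and the topology on it are ad hoc examples:

```python
from itertools import combinations

X = frozenset({0, 1, 2})
# an ad hoc topology on a three-point set
tau = {frozenset(), frozenset({0}), frozenset({0, 1}), X}

def union(sets):
    out = frozenset()
    for s in sets:
        out |= s
    return out

def inter(sets):
    out = X
    for s in sets:
        out &= s
    return out

subfamilies = [list(c) for r in range(1, len(tau) + 1) for c in combinations(tau, r)]
# (i) the empty set and X are open
assert frozenset() in tau and X in tau
# (ii) arbitrary (here: all) unions of open sets are open
assert all(union(f) in tau for f in subfamilies)
# (iii) finite intersections of open sets are open
assert all(inter(f) in tau for f in subfamilies)
# the closed sets are exactly the complements of the open sets
closed = {X - U for U in tau}
print(sorted(tuple(sorted(c)) for c in closed))
```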

One of the most important uses of topological spaces is that one can define the notion of a continuous map between them.

Definition 1.2. Let (X, τX) and (Y, τY) be two topological spaces. A map f : X → Y is called continuous if f⁻¹(U) ∈ τX for every U ∈ τY. In other words, the preimage of every open set is open.

One can also define "small sets" that in a sense possess properties of a finite dimensional object.

Definition 1.3. A subset K ⊂ X is called compact if every open cover of K contains a finite subcover; in other words, if one can embed K into a collection of open sets

K ⊂ ⋃α∈A Uα,

then one can find finitely many of them, Uα1, . . . , Uαn, which still cover the entire set: K ⊂ ⋃i=1..n Uαi.

Lemma 1.4. Let f : X → Y be a continuous map. If K ⊂ X is compact, then f(K) is compact in Y.

Proof. Consider an open cover

f(K) ⊂ ⋃α∈A Vα.

Then the sets Uα = f⁻¹(Vα) form an open cover of K. By the compactness of K there exists a finite subcover K ⊂ ⋃i=1..n Uαi. But then f(K) ⊂ ⋃i=1..n Vαi. □

One can define a limit of a sequence, x = lim_{n→∞} xn, by declaring that for every open set U containing x there exists an N ∈ N such that for all n > N, xn ∈ U. However, in general the definitions of continuity and compactness are not equivalent to their sequential counterparts, as they are, say, in Rⁿ. Thus, continuity in the sense that "if xn → x, then f(xn) → f(x)" is not equivalent to Definition 1.2. Likewise, sequential compactness in the sense that "every sequence {xn} ⊂ K has a subsequence convergent to an element of K" is not equivalent to Definition 1.3. These sequential counterparts appeal to the situation when one can choose a "sequence of balls" shrinking towards x, and respectively towards f(x) in Y, which may not be possible if the topology does not have a "countable base". Before we venture into the general settings, let us narrow our discussion down to the more special case of a metric space, where most topological concepts do in fact have sequential analogues.

1.1. Metric spaces. A metric on a set X, or distance, is a map d : X × X → R₊ such that
(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) = d(y, x),
(iii) d(x, z) ≤ d(x, y) + d(y, z).
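For a concrete metric, such as the Euclidean distance on R², the three axioms can be tested numerically on a finite sample of points. This is a sketch for illustration only; the point set is an arbitrary choice:

```python
import math
import itertools

def euclid(x, y):
    # Euclidean distance on R^2, one concrete metric satisfying (i)-(iii)
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5), (2.0, -1.0)]

# (i) d(x, y) = 0 iff x = y
assert all((euclid(p, q) == 0) == (p == q) for p in points for q in points)
# (ii) symmetry
assert all(euclid(p, q) == euclid(q, p) for p in points for q in points)
# (iii) triangle inequality, with a little floating-point slack
for x, y, z in itertools.product(points, repeat=3):
    assert euclid(x, z) <= euclid(x, y) + euclid(y, z) + 1e-12
print("metric axioms hold on the sample")
```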

For any point x there is a family of balls Br(x) = {y : d(x, y) < r} around x, which has a countable base, namely B1/n(x). We define the metric topology on X by declaring that a set U is open if for every x ∈ U there is a ball Br(x) ⊂ U.

Every metric space separates points: for every pair of distinct points x1, x2 ∈ X there are open neighborhoods U(xi) such that xi ∈ U(xi) and U(x1) ∩ U(x2) = ∅. Indeed, pick U(xi) to be the ball centered at xi of radius r = d(x1, x2)/3. Such topologies are called Hausdorff. We say that x = lim_{n→∞} xn if d(x, xn) → 0. Because of the Hausdorff property all limits are unique. We call a sequence Cauchy if ∀ε ∃N such that ∀n, m > N, d(xn, xm) < ε.

Definition 1.5. The space X is called complete if every Cauchy sequence in X has a limit.

Proposition 1.6 (Sequential definitions). Let (X, d) be a metric space.
(a) A subset F ⊂ X is closed iff the limit of every convergent sequence xn ∈ F belongs to F.
(b) A function f : X → Y, where Y is any topological space, is continuous iff for every sequence xn → x one has f(xn) → f(x) (sequentially continuous).
(c) A subset K ⊂ X is compact iff every sequence {xn}n ⊂ K contains a subsequence which converges to a point in K (sequentially compact). Consequently, K is complete as a metric space.

Proof. (a): Suppose F is closed, xn ∈ F, and xn → x. If x ∉ F, then there is a neighborhood U(x) with U(x) ∩ F = ∅. By definition of the limit there exists n such that xn ∈ U(x), hence xn ∉ F, a contradiction. Conversely, if F is sequentially closed but F^c is not open, then there exists a point x ∈ F^c such that B1/n(x) ∩ F ≠ ∅ for every n. Pick a sequence xn ∈ B1/n(x) ∩ F. Then xn → x and all xn ∈ F, yet x ∉ F, a contradiction.

(b): Suppose f is continuous, but there exists xn → x such that f(xn) does not converge

to f(x). Then there exists an open neighborhood U of f(x) and a subsequence xn_k such that f(xn_k) ∉ U. Since f⁻¹(U) is open and contains x, the elements xn_k eventually get into f⁻¹(U), which means f(xn_k) ∈ U, a contradiction.

Conversely, suppose f is sequentially continuous but for some open U ⊂ Y, f⁻¹(U) is not open. Then there exists a point x ∈ f⁻¹(U) and a sequence xn ∈ B1/n(x)\f⁻¹(U). But then xn → x and so f(xn) → f(x), which means that from some n on, f(xn) ∈ U, i.e. xn ∈ f⁻¹(U), a contradiction.

(c): Let K be compact and let {xn} ⊂ K. We call a point y ∈ K a cluster point if every

ball Br(y) contains a subsequence {xn_k} of the given sequence. We show that there exists at least one cluster point. Suppose not; then for every y ∈ K we can find a ball Br(y)(y) and an N(y) such that for n > N(y), all xn ∉ Br(y)(y). These balls form an open cover of K, hence there exists a finite subcover Br(y1)(y1), . . . , Br(ym)(ym). So, for n > maxi N(yi), the elements of our sequence belong to none of these balls, hence not to K, a contradiction. So, let y be a cluster point. We pick a converging subsequence as follows: pick n1 such that xn_1 ∈ B1(y), then n2 > n1 such that xn_2 ∈ B1/2(y), then n3 > n2 such that xn_3 ∈ B1/3(y), and so on. Clearly, xn_m → y.

The converse statement is somewhat more involved, and it highlights several additional useful properties of compact sets in metric spaces. So, suppose K is sequentially compact, and let {Uα}α∈A be an open cover of K. First, we show that this cover has a certain "fatness" everywhere in K; this is called the Lebesgue number lemma. We claim that there exists an ε > 0 such that for every point x ∈ K there exists α such that Bε(x) ⊂ Uα. Indeed, otherwise for every n we would have found an xn ∈ K such that the ball B1/n(xn) is not contained entirely in any Uα. From the sequence {xn} we can extract a converging subsequence xn_k → x ∈ K. The limit x is contained in some Uα, and with it there is a ball Br(x) ⊂ Uα. We can find n_k > 2/r such that d(xn_k, x) < r/2. But then B1/n_k(xn_k) ⊂ Br(x) ⊂ Uα, a contradiction.

Next, we show that for every ε > 0 there exists an ε-net covering K. This means a set of points x1, . . . , xn such that K ⊂ ⋃i Bε(xi). Indeed, if for some ε > 0 the set K is not a union of finitely many such balls, we pick a sequence as follows: take any x1 ∈ K, then pick x2 ∈ K\Bε(x1), then pick x3 ∈ K\(Bε(x1) ∪ Bε(x2)), and so on. This selects a sequence such that d(xn, xk) ≥ ε for all n > k. Such a sequence cannot have any converging subsequence, contradicting sequential compactness.

Finally, for our cover we pick a Lebesgue number ε and find an ε-net {xi}i=1..n. Then each ball Bε(xi) finds itself inside some Uαi. Clearly, K ⊂ ⋃i Bε(xi) ⊂ ⋃i Uαi. So, we have found a finite subcover. □

Corollary 1.7. Let K ⊂ X be compact and f : K → R be a continuous function. Then maxK f and minK f are achieved.

Proof. First, supK f is finite. If not, then there is a sequence xn ∈ K such that f(xn) > n. Choosing a converging subsequence xn_k → x ∈ K, continuity gives f(xn_k) → f(x) < ∞, a contradiction. Now, similarly, pick a sequence such that f(xn) → supK f and select a convergent subsequence. If x is its limit, then clearly f(x) = supK f, so the maximum is achieved; the argument for the minimum is identical. □

Another important characterization of compact sets in metric spaces is provided in terms of ε-nets, which appeared momentarily in the proof of Proposition 1.6 (c).
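The greedy selection used in the proof above (keep picking a point outside the balls chosen so far) can be run directly on a finite sample. The sketch below is an illustration under ad hoc choices (a grid sample of the unit square, ε = 0.3); termination of the loop is exactly the total-boundedness phenomenon:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_eps_net(K, eps):
    """Greedily select centers from K until every point of K lies within eps
    of some center: each point at distance >= eps from all current centers
    becomes a new center, mirroring the construction in the proof."""
    centers = []
    for p in K:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# K: a finite sample of the unit square, a bounded (hence totally bounded) set
K = [(i / 10, j / 10) for i in range(11) for j in range(11)]
net = greedy_eps_net(K, 0.3)
# every point of K is eps-close to some center
assert all(any(dist(p, c) < 0.3 for c in net) for p in K)
# distinct centers are at least eps apart, as in the proof
assert all(dist(a, b) >= 0.3 for a in net for b in net if a != b)
print(len(net), "centers suffice")
```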

Definition 1.8. Let K ⊂ X. We say that {xγ}γ∈Γ ⊂ X is an ε-net for K if for every y ∈ K there exists γ ∈ Γ such that d(y, xγ) < ε.

It is easy to show that if one can find an ε-net for K in X, then there exists in fact a 2ε-net consisting of elements of K itself. Indeed, let us consider the balls Bε(xγ). Consider the subset Γ0 ⊂ Γ consisting of indices γ such that Bε(xγ) ∩ K ≠ ∅. For any such γ ∈ Γ0 pick a yγ ∈ Bε(xγ) ∩ K. Then for any y ∈ K there exists γ ∈ Γ such that y ∈ Bε(xγ). But then γ ∈ Γ0, and consequently, d(y, yγ) ≤ d(y, xγ) + d(xγ, yγ) < 2ε. Hence, {yγ}γ∈Γ0 ⊂ K is a 2ε-net.

Definition 1.9. A subset K ⊂ X is called precompact if its closure K̄ is compact.

As we will see in the context of compact embeddings, most of the sets we will deal with are in fact precompact "out of the box", so it is a very convenient concept to use.

Lemma 1.10. Suppose X is complete. Then the following are equivalent:
(i) K is precompact.
(ii) Every sequence in K contains a subsequence converging to an element of X.
(iii) For every ε > 0 there exists an ε-net for K.

Proof. Let us assume (i). Then (ii) follows immediately from Proposition 1.6(c) due to K̄ being compact. Conversely, if (ii) holds, consider a sequence xn ∈ K̄. For any n, pick a yn ∈ K such that d(yn, xn) < 1/n. There exists a subsequence yn_k → x. Then x ∈ K̄, and at the same time xn_k → x. Proposition 1.6(c) implies that K̄ is compact.

Let us show that (i) implies (iii). For a fixed ε > 0, consider an open cover of K̄ by the balls Bε(x), x ∈ K̄. Since K̄ is compact there exists a finite subcover Bε(xj), j = 1, . . . , n. Clearly, {xj} forms a finite ε-net for K.

Assume (iii). Clearly, K̄ shares the same property. Pick any sequence yn ∈ K. Fix an arbitrary sequence εk → 0. For any k, find an εk-net x_j^k, j = 1, . . . , Jk. Now we start an iteration. Since the balls Bε1(x_j^1) cover K, and there are finitely many of them, there exists a subsequence {y_n^1} of {yn} belonging to one of them, so that d(y_n^1, y_m^1) < 2ε1 for all n, m, and with the first element y_1^1 corresponding to some yp with p > 1. Next, pick a further subsequence {y_n^2} which belongs to a ball of radius ε2, with the first element y_1^2 corresponding to some yp, p > 2, etc. The constructed

diagonal subsequence zn = y_1^n is Cauchy: for any ε > 0 we can find εk < ε, and for n, m > k the elements zn, zm belong to a single ball of radius εk, hence are 2εk-close. Since X is complete, this implies that zn → x ∈ X, which implies (ii). □

Corollary 1.11. In a complete metric space X, a subset K ⊂ X is compact if and only if it is closed and for every ε > 0 there exists an ε-net for K.

There is a separate term in topology for sets satisfying property (iii) of Lemma 1.10: they are called totally bounded. We will rarely use this term because the distinction between totally bounded and precompact sets may only appear in incomplete spaces, which will not be in focus for most of our discussion.

1.2. General topological spaces. Nets. The subject of this section is an arbitrary, not necessarily metric, topological space (X, τ). It turns out that in the case when the topology cannot be defined by a metric, many of the conventional sequential definitions stated in Proposition 1.6 do not apply. Sequences are simply not descriptive enough to characterize features of a topology that does not have a countable base of neighborhoods, see [?] for more on this. A proper "upgrade" of a sequence to the general settings is given by the concept of a net.

Definition 1.12. A subset {xα}α∈A ⊂ X is called a net (not to be confused with the ε-nets of the previous section) if the index set A is partially ordered and directed, i.e. for every pair α, β ∈ A there is γ ∈ A with γ ≥ α, γ ≥ β. A subnet is a net {yβ}β∈B together with a map n : B → A such that yβ = x_{n(β)}, n is monotone, and for every α ∈ A there is β ∈ B with n(β) ≥ α. A net {xα}α∈A is said to converge to x ∈ X, written

x = lim xα = limα∈A xα,

if for every open neighborhood U of x there is α0 ∈ A such that xα ∈ U for all α ≥ α0. Using nets we can state the full analogue of Proposition 1.6.

Proposition 1.13 (Net-based definitions). Let (X, τ) be an arbitrary topological space.
(a) A subset F ⊂ X is closed if and only if the limit of every convergent net inside F is contained in F.
(b) A function f : X → Y is continuous if and only if for any convergent net limα∈A xα = x, one has limα∈A f(xα) = f(x).
(c) K ⊂ X is compact if and only if every net in K contains a subnet convergent to a point in K.

Proof. We will leave (a) as Exercise 1.2.

(b): Suppose f is continuous, and let limα∈A xα = x. For any open G containing f(x), f⁻¹(G) is open and contains x. Since eventually all xα are in f⁻¹(G), eventually all f(xα) will be in G. Conversely, suppose there is an open G ⊂ Y such that f⁻¹(G) is not open. Thus, there is a point x ∈ f⁻¹(G) such that any open neighborhood U of x contains a point outside f⁻¹(G). Let us fix one such point xU for every U. Let A = {U ∈ τ : x ∈ U}, ordered by reverse inclusion; it is a directed set. Clearly, xU → x, since for every open U containing x, all elements xU′ with U′ ≥ U (i.e. U′ ⊂ U) lie in U. Yet f(xU) ∉ G, and thus f(xU) does not converge to f(x).

(c): Suppose X is compact, and let {xα}α∈A ⊂ X be a net. First let us establish the existence of a cluster point. A point y ∈ X is a cluster point of a net if for every U ∈ τ containing y and every α0, there is α ≥ α0 such that xα ∈ U. Suppose that our net does not have cluster points. Then for every y ∈ X there are Uy ∈ τ containing y and αy ∈ A such that xα ∉ Uy for all α ≥ αy. Consider the open cover {Uy}y∈X. By compactness there is a finite subcover

Uy1, . . . , Uyn. Since A is directed, there is an α ∈ A with α ≥ αyi for all i = 1, . . . , n. Then xα belongs to none of the open sets above, which shows that they do not form a cover, a contradiction. So, let y be a cluster point. Let B = {(U, α) : y ∈ U, U ∈ τ, xα ∈ U} be ordered by reverse inclusion on the first component and by the order of A on the second. For β = (U, α), let yβ = xα, and let n(β) = α. It is routine to show that {yβ}β∈B is a subnet converging to y.

Conversely, suppose every net has a converging subnet, and yet, on the contrary, X is not compact. This implies that there is an open cover U which has no finite subcover. Let us define A = {α = (U1, . . . , Un) : Ui ∈ U, n ∈ N}, ordered by α ≥ β if β ⊂ α (as sets of entries); clearly A is directed. By assumption, for any α = (U1, . . . , Un) there is xα ∉ ⋃i Ui. The net {xα}α∈A has a converging subnet {yβ}β∈B, and y = lim yβ. Since U is a cover, there is U ∈ U with y ∈ U. Let α = (U). By the definition of a subnet, there is β′ ∈ B such that n(β′) ≥ α and yβ′ = x_{n(β′)}, and there is another β″ ≥ β′ such that yβ″ ∈ U. By monotonicity of n, n(β″) ≥ α, and yet x_{n(β″)} = yβ″ ∈ U, in contradiction with the construction. □

1.3. Ultrafilters and Tychonoff's compactness theorem. Let X be a set. A family of subsets F ⊂ 2^X is called a filter if
(1) ∅ ∉ F;
(2) if F1, . . . , Fn are elements of F, then ⋂j=1..n Fj ∈ F;
(3) if F ∈ F and F ⊂ S, then S ∈ F.

Let P be the set of all filters in X ordered by inclusion. A routine verification shows that P satisfies the conditions of Zorn's Lemma. Every maximal element of P is called an ultrafilter. In fact, for any filter F there is an ultrafilter containing F, for the subset of P of filters containing the given one satisfies Zorn's Lemma as well. Ultrafilters can be characterized by adding one more condition to the three above: U is an ultrafilter if and only if it is a filter and
(4) for any subset A ⊂ X, either A ∈ U or X\A ∈ U.
Indeed, if (4) holds and U′ is another filter containing U, then any set A ∈ U′ must be in U, for otherwise X\A is in U, and then ∅ = A ∩ (X\A) ∈ U′. Conversely, if A ⊂ X is such that X\A ∉ U, then by (3) every F ∈ U must intersect A. Define a new family U′ = {S : F ∩ A ⊂ S for some F ∈ U}. Clearly, U ⊂ U′, A ∈ U′, and one can easily check that U′ is a filter. By maximality of U, U = U′, and hence A ∈ U. An alternative to (4) is a formally stronger, but equivalent, condition:
(4′) if A1 ∪ . . . ∪ An ∈ U, then some Ai ∈ U.

Indeed, if none of the Ai belongs to U, then all the complements do, and hence so does their intersection, which is X\(A1 ∪ . . . ∪ An). Together with A1 ∪ . . . ∪ An ∈ U, this is incompatible with (1).

The compactness of a topological space can be restated in terms of convergence of ultrafilters. So, let (X, τ) be a topological space and F a filter on it. We say that lim F = x if every neighborhood of x has a non-empty intersection with every element of the filter. If F is an ultrafilter, which will be our standard assumption, we showed above that every set that intersects every element of F must lie in F. Thus, in this case lim F = x iff every open neighborhood of x is contained in F. If every two distinct points in X can be separated by disjoint open neighborhoods (such a space is called Hausdorff), then clearly the limit is unique. What follows, however, does not require this assumption.
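On a finite set every ultrafilter is principal, i.e. of the form {S : a ∈ S} for a fixed point a, and the axioms (1)-(4) can then be verified exhaustively. The sketch below is only an illustration on an ad hoc four-point set; genuinely non-principal ultrafilters, which exist on infinite sets by Zorn's Lemma, cannot be exhibited this way:

```python
from itertools import chain, combinations

X = {0, 1, 2, 3}

def powerset(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# the principal ultrafilter at the point a: all subsets containing a
a = 2
U = {S for S in powerset(X) if a in S}

# (1) the empty set is not a member
assert frozenset() not in U
# (2) closed under finite intersections
assert all(F & G in U for F in U for G in U)
# (3) closed under supersets
assert all(S in U for F in U for S in powerset(X) if F <= S)
# (4) for every A, exactly one of A and its complement belongs to U
assert all((A in U) != (frozenset(X) - A in U) for A in powerset(X))
print("principal ultrafilter verified")
```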

Lemma 1.14. X is compact if and only if every ultrafilter in X converges to a point in X.

Proof. Suppose X is compact, and let U be an ultrafilter on X. If U does not converge to any point of X, then any point x ∈ X is contained in some Ux ∈ τ with Ux ∉ U. Thus, {Ux}x∈X forms an open cover of X, which must contain a finite subcover U1, . . . , Un. But U1 ∪ . . . ∪ Un = X ∈ U, so by (4′) one of these sets must be in U, a contradiction.

Conversely, let C = {U} be an open cover of X, and suppose that it contains no finite subcover. Then any finite intersection (X\U1) ∩ . . . ∩ (X\Un), Ui ∈ C, is non-empty. This shows that the family F = {F ⊂ X : (X\U1) ∩ . . . ∩ (X\Un) ⊂ F for some U1, . . . , Un ∈ C} is a filter; let U be an ultrafilter containing F. By assumption, there exists x = lim U. Since C is a cover, there is U ∈ C with x ∈ U, and hence U ∈ U. But X\U ∈ U as well, a contradiction. □

Ultrafilters are useful for many purposes. In particular, they provide an economical proof of Tychonoff's compactness theorem, which will be used later to establish the Alaoglu Theorem 5.10.

Let {(Xγ, τγ)}γ∈Γ be a collection of topological spaces. The Cartesian product X = ∏γ∈Γ Xγ is the set of functions x : Γ → ⋃γ∈Γ Xγ such that x(γ) ∈ Xγ. We usually denote x(γ) = xγ and write x = {xγ}γ∈Γ. Let πγ : X → Xγ be the usual projection map. We define the product topology on the Cartesian product X to be the topology generated by the sets πγ⁻¹(Uγ), where Uγ ∈ τγ. This is also the minimal topology on X in which all the projection maps are continuous.

Theorem 1.15 (Tychonoff's compactness theorem). If all Xγ, γ ∈ Γ, are compact, then the Cartesian product X = ∏γ∈Γ Xγ is compact in the product topology.

Proof. Let U be an ultrafilter on X. For every γ ∈ Γ, consider Uγ = πγ(U) = {πγ(A) : A ∈ U}. Then Uγ is an ultrafilter on Xγ. Since Xγ is compact, there exists a limit xγ = lim Uγ. Let us show that x = {xγ}γ∈Γ = lim U. Let U be an open neighborhood of x in the product topology. Then, together with x, U contains a finite intersection of basic sets ⋂i=1..n π_{γi}⁻¹(Ui). We have x_{γi} ∈ Ui, and hence Ui ∈ U_{γi}, which means there exists Ai ∈ U such that π_{γi}(Ai) = Ui. Then Ai ⊂ π_{γi}⁻¹(Ui), which implies that π_{γi}⁻¹(Ui) ∈ U for each i, and hence ⋂i=1..n π_{γi}⁻¹(Ui) ∈ U. Since U contains this intersection, it must itself be in the ultrafilter U. □

1.4. Exercises.

Exercise 1.1. Verify all the axioms of open sets for the metric topology defined in Section 1.1.

Exercise 1.2. Let X be an arbitrary topological space. Show that a subset F ⊂ X is closed if and only if the limit of every convergent net inside F is contained in F.

Exercise 1.3. Show that Ā is the set of all limits of convergent nets from within A. In a metric space, Ā is the set of all limits of convergent sequences from within A.

Exercise 1.4. A topology τ1 on X is said to be stronger than another topology τ2 on X if for any point x ∈ X, any open neighborhood of x in τ2 contains an open neighborhood of x in τ1. We denote this by τ1 ≥ τ2. If τ1 ≥ τ2 and τ2 ≥ τ1, then the topologies are called equivalent. Show that τ1 ≥ τ2 if and only if every net converging in τ1 also converges in τ2 (to the same limit).

Exercise 1.5. Show that a net xα → x in the product topology if and only if πγ(xα) → πγ(x) for every γ ∈ Γ.

2. Banach Spaces

2.1. Basic concepts and notation. Let us consider a linear space X over the field K = R or C. We say that X has finite dimension n if there is a system of n linearly independent vectors {x1, . . . , xn} in X which spans X. We denote the linear span of a set S ⊂ X by [S]. If X has no finite dimension, X is called infinite dimensional. A function ‖·‖ : X → R₊ is said to define a norm on X if the following axioms hold:
(i) ‖x‖ = 0 iff x = 0,
(ii) ‖αx‖ = |α|‖x‖ for all x ∈ X, α ∈ K,
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

More relaxed versions of the concept of a norm are also studied in functional analysis. These include a pseudo-norm, that is, a function ‖·‖ satisfying only (ii) and (iii), and a quasi-norm, that is, a function satisfying (i), (ii), and the triangle inequality with a constant coefficient c > 1: ‖x + y‖ ≤ c(‖x‖ + ‖y‖). We write (X, ‖·‖) to indicate that X is equipped with the norm ‖·‖ if it is not clear from the context. We call (X, ‖·‖) a normed space.

Let us notice that a norm generates a metric, called the norm-metric, on the space X via d(x, y) = ‖x − y‖. The corresponding topology is called the norm-topology. Let us recall from Section 1.1 that we identify open sets as sets U with the property that for any x ∈ U there is an open ball Bε(x) ⊂ U. The norm-topology naturally gives rise to the concepts of convergence and continuity described in Section 1.1: a sequence {xn}∞n=1 ⊂ X is said to converge to x in the norm, or strongly, if ‖xn − x‖ → 0 as n → ∞. A function f : X → Y, where Y is a topological space, is continuous if f(xn) → f(x) whenever xn → x, or equivalently, if f⁻¹(U) is open for any open U ⊂ Y, see Lemma ??.

Lemma 2.1. The norm ‖·‖ : X → R is a continuous function on X.

The proof follows readily from the triangle inequality, which yields |‖x‖ − ‖y‖| ≤ ‖x − y‖.

2.2. Geometric objects. In any normed space we can define a line passing through x0 ∈ X in the direction of v ∈ X as the set of vectors

{x0 + tv : t ∈ R}.

Similarly, we define a plane spanned by a linearly independent couple v, w ∈ X:

{x0 + tv + sw : t, s ∈ R},

etc. We introduce the following notation for balls and spheres:

B(X) = {x ∈ X : ‖x‖ < 1}, the open unit ball,
B̄(X) = {x ∈ X : ‖x‖ ≤ 1}, the closed unit ball,
S(X) = {x ∈ X : ‖x‖ = 1}, the unit sphere,
Br(x0) = {x ∈ X : ‖x − x0‖ < r},
B̄r(x0) = {x ∈ X : ‖x − x0‖ ≤ r}.

Exercise 2.1. Show that B(X) is open, while B̄(X) and S(X) are closed subsets of X.

For two sets A, B ⊂ X we denote by A + B their algebraic sum {x + y : x ∈ A, y ∈ B}, and by αA = {αx : x ∈ A} the scalar multiple. Thus, Br(x0) = x0 + rB(X), Sr(x0) = x0 + rS(X), etc. For a subset A ⊂ X its linear span is the set

span A = { Σi=1..k αi xi : xi ∈ A, αi ∈ K, k ∈ N },

and its convex hull is

conv A = { Σi=1..k αi xi : xi ∈ A, 0 ≤ αi ≤ 1, Σi αi = 1, k ∈ N }.

In particular, for any two vectors x, y ∈ X, the convex hull conv{x, y} is simply the segment between them. We can define the concept of a distance between two sets A, B ⊂ X:

dist{A, B} = inf_{a∈A, b∈B} ‖a − b‖.

Finally, we say that x ∈ X is a unit vector if ‖x‖ = 1. For any vector x ≠ 0, we can define the unit vector pointing in the same direction:

x̄ = x/‖x‖, ‖x̄‖ = 1.

A subset A ⊂ X is called bounded if sup_{a∈A} ‖a‖ < ∞. Note that a set is bounded if and only if it is contained in a ball: A ⊂ RB(X) for some R > 0.

2.3. Banach Spaces. We say that a sequence {xn}∞n=1 is Cauchy if for every ε > 0 there exists an N ∈ N such that for all n, m > N

‖xn − xm‖ < ε.

Unlike for real numbers, in a general normed space a Cauchy sequence may not have a limit. We say that (X, ‖·‖) is complete if every Cauchy sequence converges.

Definition 2.2. A complete normed space (X, ‖·‖) is called a Banach space. In other words, X is a Banach space if every Cauchy sequence in X has a limit in X.

Definition 2.3. A series Σ∞n=1 xn is said to converge to its sum x ∈ X, and we write

x = Σn=1..∞ xn,

if the sequence of partial sums Σn=1..N xn tends to x as N → ∞. A series converges absolutely if the numerical series Σn=1..∞ ‖xn‖ converges.

2.4. Classical examples. The simplest example of a normed space is the Euclidean space ℓ2ⁿ = (Kⁿ, ‖·‖2) with the norm given by

‖x‖2 = ( Σi=1..n |xi|² )^{1/2}.

The Euclidean norm is generated by the inner product ⟨x, y⟩ = Σ xi ȳi via ‖x‖2 = ⟨x, x⟩^{1/2}. The triangle inequality in this case is a consequence of the Cauchy-Schwarz inequality:

‖x + y‖2² = ‖x‖2² + 2Re⟨x, y⟩ + ‖y‖2² ≤ ‖x‖2² + 2‖x‖2‖y‖2 + ‖y‖2² = (‖x‖2 + ‖y‖2)².

For 1 ≤ p < ∞, the space ℓp consists of sequences x = {xi}∞i=1 with Σ∞i=1 |xi|^p < ∞, endowed with

‖x‖p = ( Σi=1..∞ |xi|^p )^{1/p}.

For p = ∞, ℓ∞ is the space of bounded sequences endowed with the supremum-norm

‖x‖∞ = sup_{i∈N} |xi|.

The corresponding n-dimensional analogue is denoted ℓpⁿ. At this point, other than in the cases p = 1, 2, ∞, it is not clear whether ℓp is a linear space and whether ‖·‖p defines a norm on it. We will show this next and establish several very important inequalities as we go along the proof.

Lemma 2.4. ℓp is a Banach space for all 1 ≤ p ≤ ∞.

Proof. First, let p = ∞. The axioms of a norm in this case are trivial. To show that ℓ∞ is complete, let xn = {xn(j)}∞j=1 be a Cauchy sequence. Then every numerical sequence {xn(j)}n is Cauchy, which implies that xn(j) → x(j) as n → ∞. Since {xn} is Cauchy, given ε > 0 we have ‖xn − xm‖∞ < ε for all n, m > N. Thus, |xn(j) − xm(j)| < ε for all j ∈ N as well. Let us fix n and j and let m → ∞ in the last inequality. We obtain |xn(j) − x(j)| ≤ ε for all j, and hence ‖xn − x‖∞ ≤ ε for all n > N. We have shown that xn → x.

Now let p < ∞. Let us prove the triangle inequality first. By concavity of ln x, we have

ln(λa + µb) ≥ λ ln a + µ ln b,

for all λ + µ = 1, λ, µ ≥ 0, and a, b > 0. Exponentiating the above inequality we obtain

a^λ b^µ ≤ λa + µb.

Letting λ = 1/p, µ = 1/q, and replacing a → a^p and b → b^q, we obtain

(1) ab ≤ a^p/p + b^q/q (Young's inequality)

whenever 1/p + 1/q = 1, p > 1. Next, consider finite sequences x = {xi}i=1..n, y = {yi}i=1..n and observe by (1)

Σi=1..n |xi yi| ≤ (1/p) Σi=1..n |xi|^p + (1/q) Σi=1..n |yi|^q = (1/p)‖x‖p^p + (1/q)‖y‖q^q.

So, if ‖x‖p = ‖y‖q = 1, then Σi |xi yi| ≤ 1. For general x ≠ 0 and y ≠ 0, we consider the corresponding unit vectors x̄, ȳ, for which, according to the above,

Σi=1..n |xi| |yi| / (‖x‖p ‖y‖q) ≤ 1.

Hence, we obtain

(2) Σi=1..n |xi yi| ≤ ‖x‖p ‖y‖q (Hölder's inequality).

Finally,

Σi=1..n |xi + yi|^p ≤ Σi=1..n |xi + yi|^{p−1} |xi| + Σi=1..n |xi + yi|^{p−1} |yi|
≤ ( Σi=1..n |xi + yi|^{(p−1)q} )^{1/q} [‖x‖p + ‖y‖p] = ‖x + y‖p^{p/q} [‖x‖p + ‖y‖p].

Thus,

‖x + y‖p^p ≤ ‖x + y‖p^{p/q} [‖x‖p + ‖y‖p],

and this implies

‖x + y‖p ≤ ‖x‖p + ‖y‖p (Minkowski's inequality),

which is what we need, so far only for finite sequences. It remains to notice that if x, y ∈ ℓp are arbitrary, then the above inequality shows that the partial sums of the p-series of x + y are uniformly bounded, which in turn implies that x + y ∈ ℓp and the triangle inequality (iii) holds as desired.

To prove that ℓp is complete, let xn = {xn(j)}∞j=1 be Cauchy. Then, as before, we can pass to the limit in every coordinate: xn(j) → x(j). For a fixed J ∈ N, ε > 0, and n, m large enough, we have

Σj=1..J |xn(j) − xm(j)|^p < ε.

Letting m → ∞, we obtain

Σj=1..J |xn(j) − x(j)|^p < ε.

In particular, this implies that all partial sums of the series Σj |x(j)|^p are bounded, and hence x = {x(j)}j ∈ ℓp. Now let J → ∞ in the estimate above. We obtain ‖xn − x‖p ≤ ε^{1/p}, thus xn → x. □

Our next classical example is c0. This is the space of sequences {xj}∞j=1 such that lim_{j→∞} xj = 0, endowed with the uniform norm ‖·‖∞.

Lemma 2.5. c0 is a Banach space.

Proof. Since c0 ⊂ ℓ∞, and c0 is clearly a linear space which shares the same norm with ℓ∞, it is sufficient to show that c0 is closed as a subset of ℓ∞, as any closed subset of a complete metric space is also complete.

So, let xn ∈ c0 and xn → x in ℓ∞. Given ε > 0, there exists n such that

‖xn − x‖∞ < ε/2.

At the same time, since the coordinates of xn tend to zero, xn(k) → 0 as k → ∞, there is K ∈ N such that for k > K we have

|xn(k)| < ε/2.

In view of the two inequalities above,

|x(k)| ≤ |xn(k) − x(k)| + |xn(k)| < ε, for all k > K.

This shows that x ∈ c0. □

Let us introduce the Lebesgue spaces (see also Math 533). Let (Ω, Σ, µ) be a measure space, which is a set Ω with a σ-algebra Σ of subsets and a positive σ-additive measure µ : Σ → R₊ defined on it. For 1 ≤ p < ∞, we define

Lp(Ω, Σ, µ) = { [f] : ∫Ω |f(x)|^p dµ(x) < ∞ },

where [f] is the equivalence class of functions g such that g = f almost everywhere. The norm is defined by

‖f‖p = ( ∫Ω |f(x)|^p dµ(x) )^{1/p}.

Here f is any representative of the class. For p = ∞ we define the space of essentially bounded functions

L∞(Ω, Σ, µ) = { [f] : ∃E ∈ Σ, µ(E) = 0, f|Ω\E is bounded }.

‖f‖∞ = esssupΩ |f| = inf_{E∈Σ: µ(E)=0} sup_{x∈Ω\E} |f(x)|.

Lemma 2.6. All spaces Lp(Ω, Σ, µ) are Banach. Another classical space is C(K). Let us assume that K is a compact topological space (e.g. a closed bounded domain in Rn). We define C(K) as the space of continuous functions f : K → R. The norm is defined by

‖f‖C(K) = max_{x∈K} |f(x)|.

Note that the maximum is always achieved by Corollary 1.7, and for continuous functions ‖f‖∞ = ‖f‖C(K).

Lemma 2.7. The space C(K) is Banach.

This follows the same proof as before: given a Cauchy sequence {fn}, we first find a pointwise limit fn(x) → f(x), then conclude that the limit is in fact uniform,

‖fn − f‖∞ → 0,

and invoke the classical theorem from analysis saying that a uniform limit of continuous functions is continuous.
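To illustrate the C(K)-norm numerically, the sketch below approximates it on a finite grid for K = [−1, 1]; the sequence fn(x) = √(x² + 1/n²), converging uniformly to |x|, is an ad hoc example, and the grid approximation of the maximum is an assumption of the sketch:

```python
import math

def sup_norm_diff(f, g, grid):
    # C(K)-norm distance of f and g, approximated on a finite grid of K = [-1, 1]
    return max(abs(f(x) - g(x)) for x in grid)

grid = [i / 500 for i in range(-500, 501)]
f = abs  # the continuous uniform limit

for n in (1, 10, 100):
    fn = lambda x, n=n: math.sqrt(x * x + 1.0 / n ** 2)
    d = sup_norm_diff(fn, f, grid)
    # sqrt(x^2 + 1/n^2) - |x| is maximized at x = 0, where it equals 1/n
    assert d <= 1.0 / n + 1e-12
print("f_n -> |x| uniformly on [-1, 1]")
```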

Definition 2.8. A normed space X is called separable if it contains a countable dense subset, i.e. if there is S ⊂ X, card S = ω0, such that for every x ∈ X and any ε > 0 there is y ∈ S with ‖x − y‖ < ε.

2.5. Subspaces, direct products, quotients. We say that Y ⊂ X is a subspace if it is closed under the linear operations. We say that Y is closed if it is closed in the norm-topology of X. If in addition X is complete, then so is every closed subspace of X. Thus, any closed subspace of a Banach space is Banach. We say that Y is dense in X if for every x ∈ X and every ball Bε(x) there is a point y ∈ Y ∩ Bε(x). In other words, Y is dense in the norm-topology.

Let us now fix a closed subspace Y ⊂ X and consider the equivalence relation x1 ∼ x2 iff x1 − x2 ∈ Y. This defines an equivalence class [x] = x + Y for every x ∈ X. The space of all such classes is called the quotient-space of X by Y, denoted X/Y, with the natural linear operations inherited from X. We can endow X/Y with a norm too, called the quotient-norm:

(3) ‖[x]‖ = inf{‖x + y‖ : y ∈ Y} = dist{x, Y}.

Notice that we always have

k[x]kX/Y ≤ kxkX. Lemma 2.9. The above defines a norm on X/Y. If X is complete, then X/Y is complete as well in the quotient-norm. Proof. We will use Exercise 3.2 for the proof. Let us suppose that we have an absolutely convergent series in X/Y: Σ_i k[xi]kX/Y < ∞.

We can assume that all k[xi]k > 0. Then for each i we can find a yi ∈ Y such that

kxi + yikX ≤ 2 k[xi]kX/Y. Then Σ_i kxi + yikX < ∞. Since X is a Banach space, the series Σ_i (xi + yi) converges to some x ∈ X. Thus,

k Σ_{i=1}^N [xi] − [x] k_{X/Y} ≤ k Σ_{i=1}^N (xi + yi) − x k_X → 0,
which precisely means that Σ_{i=1}^∞ [xi] = [x]. □
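To make the quotient-norm concrete, here is a small Python sketch of my own (not from the notes): take X = R² with the Euclidean norm and Y = span{(1,1)}, and approximate k[x]k = dist{x, Y} by a crude grid search over the infimum; the inequality k[x]k ≤ kxk is then visible numerically.

```python
import math

def norm(v):
    # Euclidean norm on R^2
    return math.hypot(v[0], v[1])

def quotient_norm(x, y_dir, steps=200001, t_max=100.0):
    # k[x]k = inf{kx + yk : y in Y}, with Y = span{y_dir}; crude grid search in t
    best = float("inf")
    for i in range(steps):
        t = -t_max + 2.0 * t_max * i / (steps - 1)
        best = min(best, norm((x[0] + t * y_dir[0], x[1] + t * y_dir[1])))
    return best

x = (3.0, 1.0)
q = quotient_norm(x, (1.0, 1.0))   # exact value is |3 - 1|/sqrt(2) = sqrt(2)
assert abs(q - math.sqrt(2)) < 1e-3
assert q <= norm(x)                # the inequality k[x]k <= kxk
```

For this particular Y the distance has the closed form |a − b|/√2, which the grid search reproduces to three decimals.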

If X is endowed with a pseudo-norm k · k, consider X0 = {x ∈ X : kxk = 0}. This is a closed linear subspace of X, and moreover, kx + yk = kxk for all x ∈ X, y ∈ X0. It is easy to show that (3) defines a norm on X/X0, i.e. axiom (i) holds. A direct product of two linear spaces X and Y is defined as the space of pairs X × Y = {(x, y) : x ∈ X, y ∈ Y}

endowed with the coordinate-wise operations of addition and multiplication by a scalar. This makes X × Y into a linear space. Identifying elements of the product (x, 0) with x, and (0, y) with y, arranges a natural embedding of X and Y into X × Y. We thus can write x + y = (x, y). If both spaces are normed, (X, k · kX) and (Y, k · kY), there are many ways one can define a norm on the product. For example, let 1 ≤ p < ∞. We can define a new norm on X × Y by
kx + ykp = (kxkX^p + kykY^p)^{1/p}.
The verification that this rule defines a norm is immediate from Minkowski's inequality established above. The obtained normed space is called the `p-sum of X and Y and denoted X ⊕p Y. For p = ∞ we naturally define X ⊕∞ Y equipped with the norm

kx + yk∞ = max{kxkX , kykY }.

Similarly, we define `p-sums of any finite number of spaces, and even countably many spaces, by requiring a member of X1 ⊕p X2 ⊕p ... to be a sequence of vectors x = {x1, ...} such that
kxkp = ( Σ_{j=1}^∞ kxjk_{Xj}^p )^{1/p} < ∞,
or bounded in the case p = ∞. Let us notice that for any pair of vectors x ∈ S(X) and y ∈ S(Y), the span{x, y} will be identical to `p^2 in the `p-product of spaces. So, for example, the unit ball of X ⊕1 R will look like a symmetric tent with B(X) being the base and (0, 1) the top point. The ball of X ⊕∞ R would be the cylinder with base B(X) and height 1. 2.6. Norm comparison and equivalence. Let (X, k · k) be a normed space and Y ⊂ X a subspace with another norm |||·|||. We say that the norm |||·||| is stronger than k · k if there exists a constant C > 0 such that (4) kyk ≤ C |||y|||, for all y ∈ Y. The two norms are equivalent if there are c, C > 0 for which (5) c |||y||| ≤ kyk ≤ C |||y|||, for all y ∈ Y. Geometrically, (4) means that B|||·|||(Y) ⊂ C Bk·k(Y), while (5) means that there are inclusions on both sides, c Bk·k(Y) ⊂ B|||·|||(Y) ⊂ C Bk·k(Y). The stronger norm, therefore, defines a finer topology on Y, while equivalent norms define the same topology.

Example 2.10. We have `p ⊂ `q, for all 1 ≤ p ≤ q ≤ ∞, and

(6) kxkq ≤ kxkp. Indeed, if x = (x1, ...) ∈ S(`p), then all |xi| ≤ 1. Hence, |xi|^q ≤ |xi|^p, and thus, x ∈ `q. Moreover, kxkq ≤ 1. The general inequality (6) follows by homogeneity. Example 2.11. We have the opposite embeddings for the Lebesgue spaces: Lq(dµ) ⊂ Lp(dµ), for all 1 ≤ p ≤ q ≤ ∞, provided µ(Ω) < ∞,

(7) kfkp ≤ µ(Ω)^{1/p − 1/q} kfkq, for all f ∈ Lq(dµ). It readily follows from the Hölder inequality:
∫_Ω |f|^p dµ ≤ ( ∫_Ω |f|^q dµ )^{p/q} µ(Ω)^{1−p/q}.
Thus, kfkp^p ≤ µ(Ω)^{1−p/q} kfkq^p, from which (7) follows.
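Both embedding inequalities (6) and (7) can be sanity-checked numerically. The sketch below (my own illustration, not part of the notes) treats a finite sequence either as an element of `p or as a function on a finite measure space with µ({i}) = 1, so that µ(Ω) = N.

```python
def p_norm(x, p):
    # kxk_p of a finite sequence; p = float("inf") gives the sup-norm
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [0.3, -1.2, 0.05, 2.0, -0.7]
N = len(x)

# (6): on the ell_p scale the norm decreases as p grows: kxk_q <= kxk_p for p <= q
assert p_norm(x, 4) <= p_norm(x, 2) <= p_norm(x, 1)

# (7): on a finite measure space (here mu(Omega) = N) the inclusions reverse:
# kfk_p <= mu(Omega)^(1/p - 1/q) * kfk_q
p, q = 2, 4
assert p_norm(x, p) <= N ** (1.0 / p - 1.0 / q) * p_norm(x, q) + 1e-12
```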

2.7. Compactness in finite dimensional spaces. Suppose that X is a finite dimensional linear space. By definition, the dimension of X is the size of the largest linearly independent set of vectors x1, . . . , xn. So, we write n = dim X. We can always assume that xi’s are unit vectors. Theorem 2.12. B(X) is compact if and only if dim X < ∞. Proof. Let us assume that dim X < ∞. We argue by induction on the dimension n. If n = 1, then fixing a unit vector x0 ∈ X we can see that B(X) = [−1, 1]x0. So, the result follows from the compactness of [−1, 1]. Suppose now the result holds for all n-dimensional spaces. Let X be (n + 1)-dimensional.

For a sequence ym ∈ B(X), consider expansions
ym = Σ_{i=1}^{n+1} y_m^i x_i, y_m^i ∈ R.
We claim that all coordinates are bounded. Indeed, if not, let y_m^+ = max_i |y_m^i|. Then y_{m_k}^+ → ∞ for some subsequence. Let us pick i_0 for which |y_{m_k}^{i_0}| = y_{m_k}^+ infinitely many times. WLOG we can assume that m_k designates a further subsequence on which this occurs. Then
y_{m_k} / y_{m_k}^{i_0} = x_{i_0} + Σ_{i≠i_0} (y_{m_k}^i / y_{m_k}^{i_0}) x_i.
Then
y_{m_k} / y_{m_k}^{i_0} = x_{i_0} − z_k, where z_k = − Σ_{i≠i_0} (y_{m_k}^i / y_{m_k}^{i_0}) x_i.

We have (recalling that all xi’s are unit)

kzkk ≤ n.

So, all zk's belong to an n-multiple of the unit ball of X restricted to the n-dimensional subspace Y = span{x_i}_{i≠i_0}. By the induction hypothesis, such a ball is compact, hence, passing to a further subsequence, zk → z ∈ nB(Y). Yet,
y_{m_k} / y_{m_k}^{i_0} → 0.

Hence, zk → x_{i_0}, from which we conclude that x_{i_0} ∈ Y, a contradiction with linear independence. Now that we know that all the coordinates y_m^i are bounded, we use the Bolzano-Weierstrass Theorem to pass to subsequences y_{m_k}^i → y^i. Then form the vector y = Σ_i y^i x_i. We have
ky_{m_k} − yk ≤ Σ_i |y_{m_k}^i − y^i| → 0.
Conversely, let us assume now that dim X = ∞ and show that the ball is not compact. It suffices to construct a separated sequence of vectors x1, x2, ..., so that all kxnk = 1 and kxn − xmk ≥ 1/2. Indeed, any such sequence would not contain a convergent subsequence.

To this end, let us fix an arbitrary first vector x1 ∈ S(X). Consider the space Y1 = span{x1}, and find x2 ∈ S(X) such that
k[x2]k_{X/Y1} = dist{x2, Y1} > 1/2.

We can do that as follows. Consider any [x] ∈ S(X/Y1). Note that there is a vector y1 ∈ Y1 such that kx + y1k < 2. Then letting x2 = (x + y1)/kx + y1k ∈ S(X) we obtain

k[x2]k_{X/Y1} = inf_{y∈Y1} k x/kx + y1k + y1/kx + y1k + y k = k[x]k_{X/Y1} / kx + y1k ≥ 1/2.

Then consider Y2 = span{x1, x2} and find x3 ∈ S(X) with dist{x3, Y2} > 1/2, and so on. The process will never terminate since X is not a span of finitely many vectors. Now, if n > m, then xm ∈ Ym ⊂ Yn−1, and so dist{xn, Yn−1} > 1/2. Hence, kxn − xmk > 1/2. □ Corollary 2.13. A subset K of a finite-dimensional space X is precompact if and only if it is bounded. Theorem 2.14. On a finite dimensional linear space X all norms are equivalent. Proof. By transitivity, it suffices to show that all norms on X are equivalent to the norm of `1^n. So, let k · k be a norm on X, and let {ei}_{i=1}^n be a unit basis in X. Then for x = Σ_i xi ei,
kxk ≤ Σ_i |xi| keik ≤ Σ_i |xi| := kxk1.
To establish an inequality from below, let us consider the norm-function N(x) = kxk. By compactness of S_{k·k1}(X) and continuity of N, N attains its minimum on S_{k·k1}(X) at x0, see Corollary 1.7. Then N(x0) = c > 0, since N never vanishes on a non-zero vector. So, kxk ≥ c for all x ∈ S_{k·k1}(X), and hence kxk ≥ ckxk1, by homogeneity. □ 2.8. Convex sets. We say that a set A ⊂ X is convex if x, y ∈ A implies λx + (1 − λ)y ∈ A for all 0 < λ < 1, i.e. with every pair of points A contains the segment connecting them. A direct consequence of the homogeneity and triangle inequality of the norm is that any ball is a convex set. Let us recall that for a set A ⊂ X we define the convex hull of A as the set of all convex combinations of elements from A:
conv A = { Σ_{i=1}^N λi ai : ai ∈ A, Σ_{i=1}^N λi = 1, λi ≥ 0, N ∈ N }.
It is the smallest convex subset of X containing A, or equivalently, the intersection
conv A = ∩ {C : A ⊂ C, C convex}.
The topological closure of the convex hull conv A is the same as the smallest closed convex set containing A, or the intersection of such sets.
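As a quick illustration of Theorem 2.14 (a Python sketch of mine, not from the notes): on R^n the norms k·k1 and k·k2 are equivalent, with explicit constants (1/√n) kxk1 ≤ kxk2 ≤ kxk1, and both constants are attained.

```python
import math
import random

def one_norm(x):
    return sum(abs(t) for t in x)

def two_norm(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    # equivalence on R^n: (1/sqrt(n)) * kxk_1 <= kxk_2 <= kxk_1
    assert one_norm(x) / math.sqrt(n) <= two_norm(x) + 1e-12
    assert two_norm(x) <= one_norm(x) + 1e-12

# both constants are attained: on the all-ones vector and on a coordinate vector
assert abs(two_norm([1.0] * n) - one_norm([1.0] * n) / math.sqrt(n)) < 1e-12
assert two_norm([1.0] + [0.0] * (n - 1)) == one_norm([1.0] + [0.0] * (n - 1))
```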

Theorem 2.15 (Carathéodory). Let A ⊂ Rn. Then every point a ∈ conv A can be represented as a convex combination of at most n + 1 elements of A.
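The proof below is constructive, and its reduction step can be sketched in code. This is my own illustration (the helper names are mine, and a crude Gaussian elimination with floating-point tolerances stands in for exact linear algebra): repeatedly find a zero linear combination among the points and use it to drive one weight to zero.

```python
def null_vector(cols, n):
    # a nontrivial t with sum_j t[j]*cols[j] = 0, for len(cols) > n vectors in R^n
    m = len(cols)
    A = [[cols[j][i] for j in range(m)] for i in range(n)]  # n x m, columns = vectors
    piv_cols, row = [], 0
    for col in range(m):
        piv = next((r for r in range(row, n) if abs(A[r][col]) > 1e-12), None)
        if piv is None:               # dependent column found
            t = [0.0] * m
            t[col] = 1.0
            for r, pc in enumerate(piv_cols):
                t[pc] = -A[r][col] / A[r][pc]
            return t
        A[row], A[piv] = A[piv], A[row]
        for r in range(n):            # Gauss-Jordan elimination
            if r != row and A[r][col] != 0.0:
                f = A[r][col] / A[row][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[row])]
        piv_cols.append(col)
        row += 1
    raise ValueError("need more columns than rows")

def caratheodory_reduce(points, weights, n):
    # shrink a convex combination in R^n to at most n + 1 points, as in the proof
    pts, lam = [list(p) for p in points], list(weights)
    while len(pts) > n + 1:
        cols = [[pts[i][d] - pts[0][d] for d in range(n)] for i in range(1, len(pts))]
        t = [0.0] + null_vector(cols, n)
        t[0] = -sum(t[1:])            # now sum_i t_i = 0 and sum_i t_i a_i = 0
        if max(t) <= 0.0:
            t = [-v for v in t]       # make sure some t_i is positive
        eps = min(lam[i] / t[i] for i in range(len(t)) if t[i] > 1e-12)
        lam = [l - eps * v for l, v in zip(lam, t)]
        keep = [i for i, l in enumerate(lam) if l > 1e-10]
        pts, lam = [pts[i] for i in keep], [lam[i] for i in keep]
    return pts, lam

pts, lam = caratheodory_reduce(
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)], [0.2] * 5, 2)
assert len(pts) <= 3 and abs(sum(lam) - 1.0) < 1e-9
assert abs(sum(l * p[0] for l, p in zip(lam, pts)) - 0.5) < 1e-9
assert abs(sum(l * p[1] for l, p in zip(lam, pts)) - 0.5) < 1e-9
```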

Proof. Suppose x = Σ_{i=1}^N λi ai, all λi > 0, Σ λi = 1, and N > n + 1. We will find a way to introduce a correction into the convex combination above so as to reduce the number of elements in the sum by 1. Then the proof follows by iteration. First, let us observe that since N > n + 1, the number of elements in the family a2 − a1, a3 − a1, ..., aN − a1 is larger than the dimension, and hence they are not linearly independent. So, we can find constants ti ∈ R, not all of which are zero, such that Σ_{i=2}^N ti(ai − a1) = 0. Denoting t1 = − Σ_{i=2}^N ti, we can write
Σ_{i=1}^N ti ai = 0.

By reversing the signs of all the ti's if necessary, we can assume that at least one of them is positive. We will now adjust the original convex combination by a constant multiple of the zero sum above, thus not changing x:
x = Σ_{i=1}^N λi ai − ε Σ_{i=1}^N ti ai = Σ_{i=1}^N (λi − εti) ai.

Letting ε = min_{ti>0} {λi/ti} ensures that µi = λi − εti ≥ 0 for all i, and that µ_{i_0} = 0 for some i_0. Yet, clearly, Σ µi = 1. Thus, the new representation
x = Σ_{i=1}^N µi ai
is at least one term shorter. □ Corollary 2.16. If A ⊂ Rn is compact, then conv A is compact too. Indeed, simply use the previous theorem and pass to nested subsequences in all n + 1 terms by compactness. In the infinite-dimensional situation, closedness or even compactness of A is not sufficient to conclude that conv A is automatically closed. Let us consider the following example. Let X = `2, and A = {(1/n) en}n ∪ {0}. It is easy to see that A is compact. Any element of conv A has only finitely many non-zero entries, yet x = Σ_{n=1}^∞ (1/(2^n n)) en belongs to the closure of conv A but not to conv A itself. 2.9. Linear bounded operators. A map T between two linear spaces X → Y is called a linear operator if T(αx + βy) = αT(x) + βT(y). We usually drop the parentheses, T(x) = Tx, when a linear operator is in question. Suppose (X, k · kX) and (Y, k · kY) are normed. A linear operator T : X → Y is called bounded or continuous if there exists a constant C > 0 such that

(8) kT xkY ≤ CkxkX holds for all x ∈ X. We denote the set of all linear bounded operators between X and Y by L(X,Y). If X = Y, we simply denote L(X,X) = L(X). The following theorem justifies the terminology. Theorem 2.17. Let T : X → Y be a linear operator. The following are equivalent: (i) T ∈ L(X,Y); (ii) T maps bounded sets into bounded sets; (iii) T is continuous as a map between X and Y endowed with their norm topologies; (iv) T is continuous at the origin.

Proof. The implication (i) ⇒ (ii) is clear from (8). Conversely, T, in particular, is bounded on the unit ball of X, i.e. there exists a C > 0 such that kT xkY ≤ C for all x ∈ B(X). If x ∈ X is arbitrary, then x̄ = x/kxk ∈ S(X), and hence kT x̄kY ≤ C. So, by linearity, we obtain (8). The implication (i) ⇒ (iv) is also clear directly from (8). If (iv) holds, and x0 ∈ X is arbitrary, then for y → 0, by linearity and continuity at the origin, we have

T (x0 + y) = T x0 + T y → T x0,

showing that T is continuous at x0. Thus, (iii) holds. Finally, since (iii) trivially implies (iv), it remains to deduce (i) from (iv). If (iv) holds, then there is a δ > 0 such that kxkX < δ implies kT xkY < 1. So, if x ≠ 0 is arbitrary, consider x′ = δx/(2kxk). Then kT x′k < 1 implies kT xk ≤ 2kxk/δ, giving us (8). □

If T ∈ L(X,Y) we define the norm of T as follows:

kT k = inf{C > 0 : (8) holds}.

In particular, for any x ∈ X, kT xk ≤ Ckxk holds for all C > 0 for which (8) holds. This shows that the infimum itself provides the bound from above

kT xkY ≤ kT kkxkX , ∀x ∈ X.

The set L(X,Y) of all bounded linear operators between X and Y clearly forms a linear space, and Exercises 2.4–2.5 below show that the operator norm endows it with a norm.
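For a finite-dimensional illustration of the operator norm (a sketch of mine, not from the notes, using the known fact that a matrix acting on (R^n, sup-norm) has operator norm equal to its maximal absolute row sum):

```python
def op_norm_inf(A):
    # operator norm of a matrix on (R^n, sup-norm): the maximal absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def apply(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

A = [[1.0, -2.0], [0.5, 0.25]]
nrm = op_norm_inf(A)                # = 3.0 for this A
for x in ([1.0, -1.0], [0.3, 0.9], [-1.0, 0.2]):
    # the bound kTxk <= kTk kxk of the text
    assert max(abs(t) for t in apply(A, x)) <= nrm * max(abs(t) for t in x) + 1e-12
# equality is reached at the sign vector of the worst row
assert max(abs(t) for t in apply(A, [1.0, -1.0])) == nrm
```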

Theorem 2.18. If (X, k · kX ) is normed and (Y, k · kY ) is Banach, then the space L(X,Y ) is Banach in its operator norm.

Proof. Suppose {Tn} ⊂ L(X,Y ) is Cauchy. Then, in particular, for any fixed x ∈ X,

kTnx − TmxkY ≤ kTn − Tmk kxkX → 0, as n, m → ∞. Thus, {Tnx} is Cauchy in Y. We therefore can define the limit

T(x) = lim_{n→∞} Tnx, for each x ∈ X. By linearity of the Tn's the limit is a linear operator as well. To show that it is bounded, observe that the original sequence of operators is bounded, thus kTnk ≤ M for some M and all n. So, kT xkY ≤ MkxkX for all x ∈ X, which proves boundedness. Finally, to show that Tn → T in operator norm, let us fix ε > 0; then for all n, m large and all x ∈ B(X) we have

kTnx − TmxkY < ε.

Let us keep n fixed and let m → ∞. We already know that Tmx → T x, thus,

kTnx − T xkY ≤ ε holds for all x ∈ B(X) and all n large. This gives kTn − T k ≤ ε for all n large, which completes the proof. □

2.10. Invertibility. Let us introduce some terminology associated with operators. Let T ∈ L(X,Y). The kernel of T is defined by Ker T = T −1(0) = {x ∈ X : T x = 0}, the range is defined by Rg T = {y ∈ Y : ∃x ∈ X, T x = y}. Note that for a bounded operator the kernel is always closed, while the range may not be. We say that T is injective if Ker T = {0}; surjective if Rg T = Y; bijective if T is both injective and surjective. Definition 2.19. We say that T ∈ L(X,Y) is invertible if T is bijective and the inverse T −1 is bounded. We will learn later in Corollary 3.16 that in fact any bijective bounded linear operator between Banach spaces is automatically invertible. So, the fact that T −1 is bounded comes for free. For bijective operators this requirement is equivalent to T being bounded from below, which means the operator satisfies the lower bound (9) ckxk ≤ kT xk, for all x ∈ X, for some c > 0. Indeed, if T −1 is bounded, then kxk = kT −1T xk ≤ kT −1k kT xk. Thus,
kT xk ≥ kxk / kT −1k.
On the other hand, if (9) holds, then for y = T x, kT −1yk = kT −1T xk = kxk ≤ c−1 kT xk = c−1 kyk. Boundedness from below by itself implies several important properties of the operator. Clearly, it implies that T is injective. Injectivity, however, does not imply boundedness from below, as seen for the diagonal operator (13) with an = 1/n. Lemma 2.20. An operator T ∈ L(X,Y) is invertible if and only if there exists another operator S ∈ L(Y,X) such that

(10) S ◦ T = IX, T ◦ S = IY. Proof. Obviously if T is invertible, then S = T −1 is the desired operator satisfying (10). Conversely, if S exists, then S(T x) = x for any x ≠ 0, so T x ≠ 0. Hence, T is injective. At the same time, for any y ∈ Y, we have T x = y for x = Sy. Hence, T is surjective. This means that T −1 exists as an inverse map. However, from (10),
S = S(TT −1) = (ST)T −1 = T −1.
So, T −1 = S is bounded. This finishes the lemma. □ Lemma 2.21. Suppose that T ∈ L(X,Y) and S ∈ L(Y,Z) are invertible operators. Then ST is invertible. Proof. If both S and T are invertible, then it is easy to verify that T −1S −1 is the inverse of ST – simply check the identities (10). □

Let us consider two important examples of operators on `p – the left shift and right shift:

Tlx = (x2, x3,... ), (11) Trx = (0, x1, x2,... ).

Clearly, Tl is not injective, while Tr is not surjective. So, neither of them is invertible. Yet, Tl ◦ Tr = I is obviously invertible. This means that the converse to Lemma 2.21 does not hold – invertibility of the product does not translate into invertibility of the individual product terms. It turns out that the converse does actually hold if we additionally assume that the operators commute. Note that this is not the case in our example, since

Tr ◦ Tl x = (0, x2, x3, ...). Lemma 2.22. Let T, S ∈ L(X). If ST = TS and ST is invertible, then both S and T are invertible. Proof. Denote U = (ST)−1. Let us show that U commutes with S. Indeed, we have S(ST) = (ST)S. Let us multiply this identity by U from the left and from the right: US(ST)U = U(ST)SU. Using that U is the inverse of ST, we obtain US = SU. Now let us check that US is the inverse of T. Indeed,

(US)T = U(ST ) = I, and T (US) = T (SU) = (TS)U = I. Similarly, UT = TU, and UT is the inverse of S. 

Exercise 2.2. Generalize Lemma 2.22 to several operators: let T1, ..., Tn ∈ L(X) with TiTj = TjTi, and suppose T1 ◦ · · · ◦ Tn is invertible. Then all Ti's are invertible. Hint: use induction. Invertible operators are sometimes called isomorphisms between two Banach spaces. If an isomorphism exists between two spaces we call the spaces isomorphic, and write X ≈ Y. T is called an isometric isomorphism if T is invertible and T is an isometry, i.e. it preserves the lengths of vectors: (12) kT xk = kxk, ∀x ∈ X. We write X ≅ Y for isometrically isomorphic spaces. We generally don't distinguish between such spaces, and simply refer to them as equal, although sometimes we specify the identification rule, i.e. T : X → Y, between their elements. We can extend this terminology to operators which are not necessarily surjective yet act as isomorphisms onto their own images. We say that T : X → Y is an isomorphic embedding if (9) holds. Lemma 2.23. Let X be Banach. The range of any operator T ∈ L(X,Y) bounded from below is a closed subspace of Y.

Proof. If yn = T xn → y in Y, then yn is Cauchy. According to (9),
kxn − xmk ≤ c−1 kyn − ymk → 0.

So, xn is Cauchy. Then xn → x in X, and hence T x = y, so y ∈ Rg T. □ According to this lemma the range of a bounded from below operator on a Banach space is a Banach space. So, T establishes an isomorphism X ≈ Rg T. Similarly, T is called an isometric embedding if T is an isometry. This may seem like a redundancy in terminology: isometry = isometric embedding, bounded from below = isomorphic embedding, etc. The "embedding" clause is used more often in a situation when the attention is focused on the properties of the spaces themselves rather than on the operators acting between them. Let us observe that the equivalence between two norms introduced in Section 2.6 is, in the new terms, equivalent to saying that the identity i : (X, k · k) → (X, |||·|||) is an isomorphism. Example 2.24 (The quotient-map). Let Y be a closed proper subspace of a normed space X. Let us consider the quotient-map J : X → X/Y defined by the rule Jx = [x]. From the definition of the quotient-norm, it is clear that kJxk ≤ kxk, thus making J a contraction map. To show that in fact kJk = 1, let us fix one x ∈ X not in Y. Then k[x]k > 0. For a fixed ε > 0, let us find y ∈ Y such that k[x]k ≤ kx + yk ≤ k[x]k + ε. Consider the normalized vector u = (x + y)/kx + yk. Then
kJuk = k[x]k / kx + yk ≥ 1 − ε/k[x]k.
Since x is fixed and ε is arbitrary, we obtain kJk = 1. This observation shows that for any closed proper subspace Y we can find a vector on the sphere S(X) which is almost a distance 1 away from Y. 2.11. Complemented subspaces. Suppose Z is a linear space, X, Y ⊂ Z are subspaces such that Z = X + Y and X ∩ Y = {0}. We call Z a direct sum of X and Y. This is equivalent to the statement that for every z ∈ Z there exists a unique couple of vectors x ∈ X, y ∈ Y such that z = x + y. We thus have two linear maps P z = x, Qz = y, so that P + Q = I, called projections. Suppose now that Z is normed. It is not automatic that the operators P, Q are bounded.
We say that X and Y are complemented, or X is a complement of Y, or Y is a complement of X, and write Z = X ⊕ Y, if P, or equivalently Q, is bounded. The summands in this case are necessarily closed subspaces, since X = Ker Q and Y = Ker P. We also say that P is the projection onto X along Y. Theorem 2.25. We have Z = X ⊕ Y if and only if dist{S(X), S(Y)} > 0. Proof. Suppose the projection P : Z → X is bounded. Let x ∈ S(X) and y ∈ S(Y). Then kx − yk ≥ kP k−1 kP(x − y)k = kP k−1 kxk = kP k−1. Thus, dist{S(X), S(Y)} ≥ kP k−1. Suppose now that dist{S(X), S(Y)} > 0, and yet P is not bounded. It implies that there exists a sequence xn + yn ∈ S(Z) such that kxnk → ∞. By the triangle inequality,

kxnk − 1 ≤ kynk ≤ kxnk + 1. Thus, kynk/kxnk → 1. We also have k xn/kxnk + yn/kxnk k = kxn + ynk/kxnk → 0. On the other hand,
xn/kxnk + yn/kxnk = xn/kxnk + yn/kynk + (yn/kynk)(kynk/kxnk − 1)

≥ k xn/kxnk + yn/kynk k − | kynk/kxnk − 1 |.

In view of all of the above, k xn/kxnk + yn/kynk k → 0, in contradiction with our assumption. □ Corollary 2.26. If Z = X + Y is a direct sum of closed subspaces, and dim X < ∞, then the spaces X and Y are complemented. Proof. Indeed, if the spheres of X and Y are not separated, then there exist sequences yn ∈ S(Y) and xn ∈ S(X) such that kxn − ynk → 0. By compactness we can choose a subsequence x_{n_k} → x ∈ S(X). Thus, y_{n_k} → x as well, which by the closedness of S(Y) implies x ∈ Y, a contradiction. □ In the situation described above, the subspace Y ⊂ Z is called finite co-dimensional. 2.12. Completion. For every incomplete normed space (X, k · k) there is a way to complete it to a full Banach space in which it embeds densely. To do that, consider another space X̃ of all Cauchy sequences {xn}n in X endowed with the `∞ norm. Note that for any Cauchy sequence limn kxnk exists. So, X̃ is in fact a special subspace of X ⊕∞ X ⊕∞ .... One can show that it is complete by a diagonalization procedure. Next, we consider the subspace of vanishing sequences X̃0 = {{xn}n : limn kxnk = 0}, and the quotient space X̄ = X̃/X̃0. One can show that the quotient-norm of a conjugacy class is given by k[{xn}]k = limn kxnk. Then one can embed i : X → X̄ by assigning i(x) = (x, x, ...). One can show that i is an isometric embedding of X into X̄, Rg i is dense in X̄, and of course X̄ is complete by construction. The space X̄ is called a completion of X. 2.13. Extensions, restrictions, reductions. If T : X → Y is a bounded operator and X0 = Ker T, we can construct a new operator T̃ : X/X0 → Y by the rule T̃ ◦ J = T. One can easily check that this definition is not ambiguous. Moreover, one has kT k ≤ kT̃k kJk = kT̃k. And on the other hand, if kT̃[x]k ≥ kT̃k − ε and k[x]k < 1, then for some x0 ∈ X0, kx + x0k < 1, and yet T̃[x] = T(x + x0). This shows the opposite inequality kT̃k ≤ kT k. Thus, kT̃k = kT k. Notice that the reduced operator T̃ has trivial kernel.
If Y ⊂ X is a dense linear subspace of a Banach space X and T : Y → Z is a bounded operator to another Banach space Z, then one can uniquely extend T to the entire X in a linear fashion, preserving the norm of T. Indeed, if x ∈ X, then there exists a sequence yn → x, yn ∈ Y. This implies in particular that {T yn} is a Cauchy sequence in Z. Since Z is complete, it has a limit, which we call T̃x. This limit is independent of the original sequence yn, simply because another such sequence yn′ would satisfy kyn − yn′k → 0, hence kT yn − T yn′k → 0. Linearity follows similarly. Also, if x ∈ Y, then we can pick yn = x, so T̃ = T on Y. Furthermore, one can show that kT̃kX→Z = kT kY→Z. The operator T̃ is called a bounded extension of T. Show that there exists only one such extension. Let us note that if Y is not a dense subspace, more specifically, if Y ⊂ X is a proper closed subspace, then an extension of T : Y → Z to X would not be possible unless we know

some specific information about the space Y itself. For instance, if Y is complemented, Y ⊕ W = X, then one could extend T to X by declaring T w = 0 for all w ∈ W. However, this extension could increase the norm of T.

If T : X → Z, and Y ⊂ X, we define the restriction of T onto Y by T |Y (y) = T y. Note that the norm of the restricted operator may decrease in general. 2.14. Exercises. Exercise 2.3. Show that S(X) is compact for any finite-dimensional space X. Exercise 2.4. Show that

kT k = sup_{x∈B(X)} kT xkY = sup_{x∈S(X)} kT xkY.
These identities say that the norm of an operator is the measure of deformation of the unit ball of X under T. In particular, if kT k ≤ 1, then T is called a contraction. Exercise 2.5. Suppose that T, S ∈ L(X,Y). Then kT + Sk ≤ kT k + kSk. Exercise 2.6. Suppose that T : X → Y and S : Y → Z are bounded. Prove that kS ◦ T k ≤ kSk kT k. Exercise 2.7. Let a = (a1, a2, ...) ∈ `∞ be a bounded sequence. Define T : `p → `p by

(13) T x = (a1x1, a2x2,...).

Show that T is bounded and kT k = kak∞. Exercise 2.8. Prove that if T is invertible, then the sharp constant in (9) is in fact c = kT −1k−1. Exercise 2.9. Show that `1^2 ≅ `∞^2, but `1^n is not isometrically isomorphic to `∞^n for all n ≥ 3. Exercise 2.10. Show that the norm of any projection operator is at least 1. Exercise 2.11. Prove that a bounded operator P : X → X is a projection onto a subspace if and only if it is idempotent, P² = P. Exercise 2.12. Show that if Z = X ⊕ Y, then Z/X ≈ Y. Hint: consider the projection P : Z → Y along X, and its factor by the kernel, P̃ : Z/X → Y.
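As a numerical sanity check of Exercise 2.7 (not a proof; a sketch of mine for p = 2 on finite truncations): the value kak∞ bounds kT xk2/kxk2 and is attained on a coordinate vector.

```python
def two_norm(x):
    return sum(t * t for t in x) ** 0.5

a = [0.5, -3.0, 1.0, 2.0]          # a truncation of a sequence in ell_infty

def T(x):
    # the diagonal multiplier of (13): T x = (a1 x1, a2 x2, ...)
    return [ai * xi for ai, xi in zip(a, x)]

sup_a = max(abs(t) for t in a)     # kak_infty = 3.0 here
for x in ([1.0, 1.0, 1.0, 1.0], [0.2, -0.4, 1.0, 0.0]):
    assert two_norm(T(x)) <= sup_a * two_norm(x) + 1e-12
# the bound is attained on the coordinate vector where |a_i| is largest
assert two_norm(T([0.0, 1.0, 0.0, 0.0])) == sup_a
```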

3. Fundamental Principles 3.1. The dual. If the target space of a linear operator is R, or C, the field over which X itself is defined, the operator is called a linear functional, real or complex respectively. The space of all linear functionals on X is denoted X′, while the space of all linear bounded functionals is denoted by X∗ and called the dual space. It is often possible to identify the dual of a Banach space up to an isometry. Example 3.1. c0∗ ≅ `1, `p∗ ≅ `q, where 1/p + 1/q = 1 and 1 ≤ p < ∞. The dual of `∞ is a much bigger space of finitely additive measures on N.

Let us carry out this construction for `p, p < ∞. First, we notice that the space has what is called a Schauder basis.

Definition 3.2. A collection of vectors e1, e2, ... ⊂ X is called a Schauder basis if for every vector x ∈ X there exists a unique sequence of scalars a1, a2, ... such that
x = Σ_i ai ei,
where the series converges in the usual sense in X.

Indeed, in `p such a basis is given by the coordinate vectors (ei)j = δij. So, if we have a bounded functional f ∈ `p∗, then we can assign to it a unique sequence given by f(ei) = fi. Then the action is given by (14) f(x) = Σ_i fi ai. Let us show that (fi)i ∈ `q. Indeed, let us fix an arbitrary N ∈ N and consider x with coordinates ai = |fi|^{q−1} sgn fi for i ≤ N, and 0 for i > N. Then by the boundedness of f,
f(x) = Σ_{i=1}^N |fi|^q ≤ kfk kxkp.
However, since (q − 1)p = q,
kxkp = ( Σ_{i=1}^N |fi|^q )^{1/p}.
Dividing by kxkp we obtain
( Σ_{i=1}^N |fi|^q )^{1/q} ≤ kfk,
for all N. This shows that (fi) ∈ `q and in fact k(fi)kq ≤ kfk. At the same time, by the Hölder inequality (2), we obtain the opposite:

|f(x)| = | Σ_i fi ai | ≤ k(fi)kq kxkp.
This shows kfk ≤ k(fi)kq. So we have defined an isometric embedding i : `p∗ → `q by associating a sequence in `q to every element of `p∗. But every sequence in `q gives rise to a bounded functional defined by (14). So this embedding is onto. Hence, `p∗ ≅ `q. 3.2. Structure of linear functionals. Suppose f ∈ X′\{0}, and let x0 ∈ X be a vector such that f(x0) ≠ 0. Let x ∈ X be an arbitrary vector. Define y = x − (f(x)/f(x0)) x0. Then clearly, f(y) = 0. It shows that for any x there exist unique y ∈ Ker(f) and λ ∈ R such that

x = λx0 + y. In particular it shows that the kernel of f is one co-dimensional. We will show that the distinction between bounded and unbounded functionals comes in the condition of closedness of the kernel, or, even less restrictively, its density. Lemma 3.3. Let f ∈ X′\{0}. The following are equivalent: (a) f ∈ X∗; (b) Ker(f) is closed, and hence the sum X = Rx0 ⊕ Ker(f) is complemented; (c) Ker(f) is not dense in X.

Proof. The implications (a) ⇒ (b) ⇒ (c) are trivial. Suppose that Ker(f) is not dense. Then for some ball, Br(x0) ∩ Ker(f) = ∅. Let y ∈ S(X). Then g(t) = f(x0 + ty) = f(x0) + tf(y) is a continuous non-vanishing function on (−r, r). This implies that its sign has to agree with that of f(x0). Assuming f(x0) > 0, we then have f(x0) + tf(y) > 0 for |t| < r, and so, |f(y)| ≤ r−1 f(x0). This shows that f is bounded on the unit sphere and completes the proof. □ Geometrically, linear bounded functionals can be identified with affine hyperplanes. Thus, if f ∈ X∗, then H(f) = {x ∈ X : f(x) = 1} defines f uniquely. If f ∈ S(X∗), then the hyperplane is in some sense tangent to the unit sphere of X: namely, it does not dip inside the interior of the unit ball, and it approaches arbitrarily close to S(X). Note that a functional may not necessarily attain its highest values on the sphere, i.e. the norm. For example, let X = `1, and f = (1/2, 2/3, 3/4, ...) ∈ S(`∞). There is no sequence x ∈ S(`1) for which f(x) = 1. If x ∈ S(X) and f ∈ S(X∗) with f(x) = 1, then f is called a supporting functional of x. Existence of supporting functionals is not immediately obvious, and it brings us to an even more fundamental question – does there exist at least one non-zero bounded linear functional on a given normed space?
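The non-attainment example above can be tested on truncations (my own sketch, not from the notes, using exact rational arithmetic): f(en) = n/(n + 1) stays strictly below 1 while approaching it.

```python
from fractions import Fraction

def f_coef(k):
    # k-th coordinate of f = (1/2, 2/3, 3/4, ...)
    return Fraction(k, k + 1)

# f(e_n) = n/(n+1): strictly below 1 yet approaching it, so the sup of f over
# S(ell_1) equals 1 but is never attained
vals = [f_coef(n) for n in range(1, 200)]
assert all(v < 1 for v in vals)
assert vals[-1] == Fraction(199, 200)

# a finitely supported x with kxk_1 = 1 gives f(x) strictly below 1
x = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
fx = sum(f_coef(k + 1) * xk for k, xk in enumerate(x))
assert fx == Fraction(2, 3) < 1
```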

3.3. The Hahn-Banach extension theorem. The essence of the Hahn-Banach extension theorem is to show that a given bounded functional defined on a linear subspace of X can be extended to the whole space X retaining not only its boundedness but also its norm. The boundedness can be expressed as the condition of domination by the norm-function, i.e. if Y ⊂ X and f ∈ Y′ then f ∈ Y∗ if and only if f(y) ≤ Ckyk, for some C > 0. We will in fact need a more general extension result that will be useful when establishing separation theorems later in Section 3.6. We thus consider a positively homogeneous convex functional p : X → R ∪ {∞}, which means that p(tx) = tp(x) for all x ∈ X and t ≥ 0, and p(λx + (1 − λ)y) ≤ λp(x) + (1 − λ)p(y), for all 0 < λ < 1, x, y ∈ X. The latter is equivalent to the triangle inequality, p(x + y) ≤ p(x) + p(y). Note that a norm, or a quasi-norm, is an example of such a functional. We say that p dominates f on Y if f(y) ≤ p(y) for all y ∈ Y. Theorem 3.4 (Hahn-Banach extension theorem). Suppose Y ⊂ X, and p is a positively homogeneous convex functional defined on X. Then every linear functional f ∈ Y′ dominated by p on Y can be extended to a linear functional f ∈ X′ dominated by the same p on all of X. At the core of the proof lies Zorn's Lemma, which we recall briefly. Let P be a partially ordered set. A subset C of P is called a chain if its every two elements are comparable, i.e. ∀a, b ∈ C either a ≤ b or b ≤ a. An upper bound for a set A ⊂ P is an element b ∈ P such that b ≥ a for all a ∈ A. A maximal element m is an element with the property that if a ≥ m, then a = m. Generally, it may not be unique. Lemma 3.5 (Zorn's Lemma). If every chain of P has an upper bound, then P contains a maximal element. Zorn's lemma is equivalent to the Axiom of Choice.

Proof of Theorem 3.4. Let us introduce the set of pairs P = {(f, Y) : f ∈ Y′, f ≤ p on Y}, ordered by (f1, Y1) ≤ (f2, Y2) iff Y1 ⊂ Y2 and f2|Y1 = f1. Let C ⊂ P be a chain. Define Ỹ = ∪(f,Y)∈C Y, and let f̃(y) = f(y) if y ∈ Y for some (f, Y) ∈ C. This defines an upper bound of C. By Zorn's Lemma there exists a maximal element m = (f0, Y0) ∈ P. Let us show that Y0 = X. Indeed, if not, then there is a vector x0 ∈ X\Y0. Thus, for every element x ∈ Z = span{x0, Y0} there exist unique λ ∈ R and y ∈ Y0 such that x = λx0 + y. We construct an extension f of f0 to Z so that (f, Z) ∈ P and run into a contradiction with the maximality of m. In order to do that it suffices to find a value of f only on x0. Let c = f(x0); then to ensure that f is still dominated by p we need to satisfy

λc + f0(y) ≤ p(λx0 + y). For λ > 0 this is equivalent to
(15) c ≤ p(x0 + y′) − f0(y′),
and for λ < 0 to
(16) c ≥ f0(y″) − p(−x0 + y″).
In order for such a c to exist one has to make sure that any number on the right hand side of (15) is no less than any number on the right hand side of (16), i.e.
p(x0 + y′) − f0(y′) ≥ f0(y″) − p(−x0 + y″),
for all y′, y″ ∈ Y0. This is true indeed, since in view of the convexity and dominance, we have
p(x0 + y′) + p(−x0 + y″) ≥ p(y′ + y″) ≥ f0(y′ + y″) = f0(y′) + f0(y″). □
Let us discuss some immediate consequences of the Hahn-Banach theorem. 3.3.1. Supporting functionals. First, every vector in a normed space (X, k·k) has a supporting functional. Indeed, let x ∈ X, and define f ∈ span{x}∗ by f(λx) = λkxk. Then kfk = 1, which means f is dominated by the norm. The extension then has the same norm 1 and supports x. Since every vector can reach its norm on a functional, we can restate the definition of the norm of an operator in the "weak sense":
(17) kT k = sup_{kxkX ≤ 1, ky∗kY∗ ≤ 1} y∗(T x) = sup_{kxkX = 1, ky∗kY∗ = 1} y∗(T x).
3.3.2. Second dual space. For a normed space X, one can consider the dual of the dual space, X∗∗, called the second dual. There is a canonical isometric embedding i : X ↪ X∗∗ defined as follows: i(x)(x∗) = x∗(x). It is convenient to use parentheses to indicate the action of a functional: x∗(x) = (x∗, x). In this notation (i(x), x∗) = (x∗, x), or simply, (x, x∗) = (x∗, x). To show that i is an isometry, notice that |(i(x), x∗)| ≤ kx∗kX∗ kxkX, thus ki(x)kX∗∗ ≤ kxkX. On the other hand, let x∗ be a supporting functional of x. Then kx∗kX∗ = 1, and (i(x), x∗) = x∗(x) = kxkX. We will think of X as a subspace of X∗∗ with the natural identification of elements described above. Definition 3.6. If the embedding X ↪ X∗∗ covers all of X∗∗, i.e. X = X∗∗, then X is called reflexive. We will return to a discussion of reflexive spaces later as they possess very important compactness properties.
SHVYDKOY
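In finite dimensions a supporting functional can often be written down explicitly. A minimal numerical sketch (the helper names below are ours), for the ℓ_1-norm on R^n, where the dual pairing is with the ℓ_∞-norm: f = sign(x) has ‖f‖_∞ = 1 and f(x) = ‖x‖_1, so f supports x.

```python
# Supporting functional for the l1-norm on R^n: f = sign(x) has
# l-infinity norm 1 and satisfies f(x) = ||x||_1, so f "supports" x.

def sign(t):
    return (t > 0) - (t < 0)

def l1_norm(x):
    return sum(abs(t) for t in x)

def linf_norm(f):
    return max(abs(t) for t in f)

def supporting_functional(x):
    # the functional acts by f(y) = sum_i f_i y_i
    return [sign(t) for t in x]

x = [3.0, -1.0, 0.0, 2.0]
f = supporting_functional(x)
action = sum(fi * xi for fi, xi in zip(f, x))

assert linf_norm(f) == 1          # ||f|| = 1 (dominated by the norm)
assert action == l1_norm(x)       # f(x) = ||x||, so f supports x
```

The same recipe with ‖f‖_q = 1, q conjugate to p, produces a supporting functional for any ℓ_p-norm.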

3.4. Adjoint operators. Let T : X → Y be a bounded operator. We define the adjoint, or dual, operator T* : Y* → X* by the rule (T*y*, x) = (y*, Tx). Again using the Hahn-Banach theorem we show that ‖T‖ = ‖T*‖. First, |(T*y*, x)| ≤ ‖y*‖‖Tx‖ ≤ ‖y*‖‖T‖‖x‖. This shows ‖T*‖ ≤ ‖T‖. Now let ε > 0 be given. Find x ∈ S(X) such that ‖Tx‖ ≥ ‖T‖ − ε. Then let y* ∈ S(Y*) be a supporting functional for Tx. We have (T*y*, x) = (y*, Tx) = ‖Tx‖ ≥ ‖T‖ − ε. This shows the opposite inequality.

Lemma 3.7. We have cl(Rg T) ≠ Y if and only if Ker T* is non-trivial.

Proof. Indeed, if Z = cl(Rg T) is a proper subspace of Y, then we can find a point y_0 ∉ Z. Define a bounded linear functional on span{y_0, Z} by y*(z) = 0 for z ∈ Z and y*(y_0) = 1. Then extend it to y* ∈ Y* by the Hahn-Banach theorem. We have (T*y*, x) = (y*, Tx) = 0, because y* vanishes on the range of T. Hence, y* ∈ Ker T*.

Conversely, if y* ∈ Ker T* and y* ≠ 0, then Rg T ⊂ Ker y*, and hence cl(Rg T) ⊂ Ker y* ≠ Y. □

3.5. Minkowski's functionals. Let us recall that a function p : X → R ∪ {∞} is called convex if for any x, y ∈ X one has p(λx + (1−λ)y) ≤ λp(x) + (1−λ)p(y) for all 0 < λ < 1. A function q : X → R ∪ {−∞} is called concave if for any x, y ∈ X one has q(λx + (1−λ)y) ≥ λq(x) + (1−λ)q(y) for all 0 < λ < 1. It is easy to see that if p is convex then the sub-level sets {p ≤ p_0} are convex, and if q is concave then the super-level sets {q ≥ q_0} are convex.

Suppose A ⊂ X is convex and 0 ∈ A. We associate to A a functional p_A, called Minkowski's functional, so that A is "almost" given as a sub-level set of p_A. We define p_A(x) as follows. If there is no t ≥ 0 for which x ∈ tA, then p_A(x) = ∞. If x ∈ tA for some t ≥ 0, we set

p_A(x) = inf{t ≥ 0 : x ∈ tA}.

We list the basic properties of the Minkowski's functional.

(a) p_A is positively homogeneous and convex;
(b) {p_A < 1} ⊂ A ⊂ {p_A ≤ 1}.

For α > 0, x ∈ tA if and only if αx ∈ αtA. This readily implies positive homogeneity. Notice that for positively homogeneous functionals convexity is equivalent to the triangle inequality, p_A(x + y) ≤ p_A(x) + p_A(y). So, let x, y ∈ X. If either p_A(x) or p_A(y) equals ∞, the inequality is trivial. If both are finite, then for every ε > 0 we can find t_1 < p_A(x) + ε and t_2 < p_A(y) + ε such that x ∈ t_1 A and y ∈ t_2 A. Then, by convexity of A,

x + y ∈ t_1 A + t_2 A = (t_1 + t_2) ( (t_1/(t_1 + t_2)) A + (t_2/(t_1 + t_2)) A ) ⊂ (t_1 + t_2) A.

This shows that p_A(x + y) ≤ p_A(x) + p_A(y) + 2ε, for all ε > 0, which proves (a). Finally, (b) follows directly from the definition and the fact that 0 ∈ A.

Suppose now B is another convex set disjoint from a small ball around the origin, i.e. there is δ > 0 such that δB(X) ∩ B = ∅. We can associate a similar, but now concave, functional to B as follows. If x ∈ tB for no t ≥ 0, then q_B(x) = −∞. Otherwise, we define

q_B(x) = sup{t ≥ 0 : x ∈ tB}.

The condition δB(X) ∩ B = ∅ guarantees that the supremum is finite for any x ∈ X. The following list of properties can be established in a similar fashion:

(a) q_B is positively homogeneous and concave;
(b) {q_B > 1} ⊂ B ⊂ {q_B ≥ 1}.

Suppose now that we have two disjoint convex sets A and B satisfying all the assumptions above, and let p_A and q_B be the corresponding Minkowski's functionals. If p_A(x) < ∞, let t ≥ 0 be such that x ∈ tA. Since 0 ∈ A, the whole interval [0, x] is in tA and therefore not in tB. This in turn implies that x ∉ sB for any s ≥ t, for if such an s existed, then (t/s)x ∈ tB, contradicting the previous. As a consequence, q_B(x) ≤ t. We have shown that

(18) q_B(x) ≤ p_A(x), for all x ∈ X.

3.6. Separation theorems.

Theorem 3.8 (Generalized Hahn-Banach theorem). Let p be a convex, and q a concave, functional defined on X. Let Y ⊂ X be a subspace and f ∈ Y′ be such that

(19) f(y) ≤ p(x + y) − q(x), for all y ∈ Y, x ∈ X.

Then f can be extended to all of X, f̃ ∈ X′, satisfying

(20) q(x) ≤ f̃(x) ≤ p(x), for all x ∈ X.

Proof. The proof goes exactly the same way as before. We only need to check that if Y ⊂ X and x_0 ∈ X\Y, then we can extend f from Y to Z = span{x_0, Y} preserving the domination property (19). If c = f(x_0), then we need

λc + f(y) ≤ p(x + λx_0 + y) − q(x), for all x ∈ X, y ∈ Y, and λ ∈ R.

Again, for λ > 0 this is equivalent to

c ≤ p(x′ + x_0 + y′) − q(x′) − f(y′),

while for λ < 0,

c ≥ f(y″) − p(x″ − x_0 + y″) + q(x″).

The existence of c is ensured if

p(x′ + x_0 + y′) − q(x′) − f(y′) ≥ f(y″) − p(x″ − x_0 + y″) + q(x″),

which is true since

p(x′ + x_0 + y′) + p(x″ − x_0 + y″) − q(x′) − q(x″) ≥ p(x′ + x″ + y′ + y″) − q(x′ + x″) ≥ f(y′ + y″). □

Theorem 3.9 (Separation Theorems). Let A, B be two disjoint convex subsets of a normed space X.

(i) If A ⊄ cl B, then there exists f ∈ X′\{0} such that

(21) sup f(A) ≤ inf f(B).

(ii) If A has a non-empty interior, then there exists f ∈ X*\{0} such that (21) holds.

(iii) If A = {x_0} and B is closed, then there exists f ∈ X*\{0} such that

(22) f(x_0) < inf f(B).

Proof. Suppose A ⊄ cl B. Then there exist x_0 ∈ A and δ > 0 such that B_δ(x_0) ∩ B = ∅. By moving x_0 to the origin we satisfy all the conditions on A and B as above, which allows us to define the Minkowski functionals, q_B ≤ p_A. Thus we have (19) for f = 0 and Y = {0}. By Theorem 3.8, there exists an extension f for which (20) holds, and thus, in view of (18), f separates A and B.

If A has a non-empty interior, then clearly the condition of (i) holds. Let us assume that εB(X) ⊂ A and let f be the functional constructed above. Since f(x) ≤ p_A(x), we conclude that whenever ‖x‖ ≤ ε, then f(x) ≤ 1. This shows ‖f‖ ≤ ε^{−1}.

Finally, if A = {x_0} and B is closed, we apply case (ii) to A = B_δ(x_0) for small δ > 0. Since f ≠ 0, there is y ∈ S(X) for which f(y) > 0. Thus, f(x_0) < f(x_0) + δf(y) ≤ inf f(B). □

The condition of (i) is not sufficient to guarantee that a bounded separator exists. Consider the following example. Let X = ℓ_{2,0} be the linear space of finitely supported sequences endowed with the ℓ_2-norm, A = conv{e_n}_n, and B = (1/2)A. These are convex disjoint sets. Notice that for any b ∈ B, writing b = (1/2) Σ_i λ_i e_i with λ_i ≥ 0, Σ_i λ_i = 1, we have

‖b‖² = (1/4) Σ_i |λ_i|² ≤ (1/4) Σ_i λ_i = 1/4.

Thus B ⊂ (1/2)B(X). Since no e_n belongs to (1/2)B(X), the condition of Theorem 3.9 (i) is satisfied. Next observe that 0 lies in the closures of both A and B, by considering the sequences x_n = (e_1 + ··· + e_n)/n and y_n = (1/2)x_n. Suppose f ∈ ℓ_2\{0} is a bounded separator, i.e. sup f(A) ≤ c ≤ inf f(B). Since 0 is in the closure of both sets and f is continuous, we conclude that c = 0. But then f(e_n) ≤ 0 ≤ f(e_n)/2 for every n, so f(e_n) = 0 for all n. Thus f = 0, a contradiction. It is easy to construct an unbounded separator though: take f = (1, 1, ...), for which f ≡ 1/2 on B and f ≡ 1 on A.

Corollary 3.10. Let S ⊂ X be a subset of a normed space. Then

cl conv S = ⋂_{f ∈ X*} {x : f(x) ≤ sup f(S)}.

Indeed, the inclusion ⊂ is obvious. If however x_0 ∉ cl conv S, then Theorem 3.9 (iii) provides a functional such that f(x_0) < inf f(conv S) ≤ inf f(S). Reversing the sign of f shows that x_0 is not in one of the sets in the intersection.

3.7. Baire Category Theorem. Let us consider for a moment a general complete metric space X, without necessarily a given linear structure. Let us ask ourselves how "big" such a space can be. One might say that a singleton X = {x_0} is an obvious example of a "small" space. On the other hand, it is not that small compared to its own standards: that one point is closed and open at the same time, and it is in fact a ball of any radius centered around itself. It is therefore rather "large".

To make the terms "small" and "large" more precise, let us agree on what we mean by a "small" subset first. We say that F is nowhere dense in X if for any open set U there exists an open subset V ⊂ U with no intersection with F. In other words, the closure of F has empty interior.

A subset F ⊂ X is said to be of 1st Baire category if F = ⋃_{i=1}^∞ F_i, where all the F_i's are nowhere dense. A subset is of 2nd Baire category if it is not of the 1st category. Thus, in the example above X itself is clearly of the 2nd category. This in fact holds in general.

Theorem 3.11 (Baire Category Theorem). Any complete metric space is a 2nd Baire category set.

Proof. Let us suppose, on the contrary, that X = ⋃_i F_i, and all the F_i's are nowhere dense. Then there is a closed ball B_{ε_1}(x_1) with ε_1 < 1 disjoint from F_1. Since F_2 is also nowhere dense, there is a closed ball B_{ε_2}(x_2) ⊂ B_{ε_1}(x_1), with ε_2 < 1/2, disjoint from F_2. Continuing in the same manner, we find a sequence of nested closed balls B_{ε_n}(x_n), with ε_n < 1/n. Clearly, d(x_n, x_m) < 1/m for all n > m. Thus the sequence {x_n} is Cauchy. By completeness, there exists a limit x, which belongs to all the balls, and hence to none of the F_j's, a contradiction. □

There are many consequences of the Baire category theorem, some of which are given in the exercises below.

Theorem 3.12 (Banach-Steinhaus uniform boundedness principle). Let F ⊂ L(X, Y) be a family of bounded operators, where X is a Banach space. Suppose that for every x ∈ X,

sup_{T ∈ F} ‖Tx‖ < ∞.

Then the family is uniformly bounded, i.e. sup_{T ∈ F} ‖T‖ < ∞.

Proof. Let F_n = {x ∈ X : sup_{T ∈ F} ‖Tx‖ ≤ n}. Note that each set F_n is closed, and by assumption X = ⋃_n F_n. Hence, by the Baire Category Theorem, one of the F_n's contains a ball B_r(x_0). This implies that for all x ∈ B(X) and all T ∈ F, ‖T(x_0 + rx)‖ ≤ n. Thus,

‖Tx‖ ≤ r^{−1}(n + ‖Tx_0‖) ≤ r^{−1}(n + sup_{T ∈ F} ‖Tx_0‖),

implying the desired result. □

Corollary 3.13. Let S ⊂ X be a subset such that for every x* ∈ X*, sup |x*(S)| < ∞. Then S is bounded.

Indeed, viewed as a subset of X**, S is a pointwise bounded family of operators on X*. Hence it is norm-bounded by the Banach-Steinhaus theorem.

3.8. Open mapping and closed graph theorems.

Lemma 3.14. Suppose T : X → Y is bounded, X is a Banach space, and B(Y) ⊂ cl T(B(X)). Then (1/2)B(Y) ⊂ T(B(X)).

Proof. Note first, by linearity of T, that

(23) rB(Y) ⊂ cl T(rB(X))

for any r > 0. Fix y ∈ (1/2)B(Y), and fix a small ε > 0 to be specified later. By (23) we can find x_1 ∈ (1/2)B(X) such that ‖y − y_1‖ < ε, where Tx_1 = y_1. Since y − y_1 ∈ εB(Y), one finds x_2 ∈ εB(X) such that ‖y − y_1 − y_2‖ < ε/2, where Tx_2 = y_2. Continuing this way, we construct a sequence {x_n}_{n=1}^∞ with ‖x_n‖ ≤ ε/2^{n−2} for n ≥ 2 and ‖y − (y_1 + ··· + y_n)‖ < ε/2^{n−1}. Let x = Σ_n x_n; the series converges absolutely, and X is complete. Then by construction Tx = y, and ‖x‖ ≤ 1/2 + 2ε < 1, provided ε is small. □

Theorem 3.15 (Open mapping theorem). Suppose T ∈ L(X, Y), and both spaces are Banach. If, in addition, T is surjective, then T is an open mapping, i.e. T(U) is open for every open U.

Proof. Suppose U is open. Let x_0 ∈ U, and let B_ε(x_0) ⊂ U. We prove the theorem if we show that T(B_ε(x_0)) contains an open neighborhood of Tx_0. Since T(B_ε(x_0)) = Tx_0 + εT(B(X)), it amounts to showing that T(B(X)) contains a ball centered at the origin. Since T is surjective, we have Y = ⋃_n nT(B(X)). By the Baire Category Theorem, one of the sets nT(B(X)) is dense in some ball, and hence, by rescaling, T(B(X)) is dense in some ball B_δ(y_0). Since cl T(B(X)) is convex and symmetric with respect to the origin,

δB(Y) ⊂ conv{B_δ(y_0), B_δ(−y_0)} ⊂ cl T(B(X)).

Applying Lemma 3.14 to the operator (1/δ)T, we conclude (δ/2)B(Y) ⊂ T(B(X)), and the proof is finished. □

As a direct consequence of the open mapping theorem we obtain the following.

Corollary 3.16. Suppose T ∈ L(X, Y) is bijective, and both spaces are Banach. Then T^{−1} is automatically bounded.

Proof. The open mapping property says that the inverse image of any open set under T^{−1} is open. This is precisely the definition of continuity for T^{−1}.

Alternatively, let δB(Y) ⊂ T(B(X)). This means that if ‖y‖ < δ, then there exists ‖x‖ < 1 such that Tx = y, i.e. x = T^{−1}y. So ‖T^{−1}y‖ ≤ 1, and thus, for any ‖y‖ < 1, we have ‖T^{−1}y‖ ≤ 1/δ. □

Let us consider the following example of a multiplication operator on ℓ_p, p < ∞:

(24) Tx = (x_1, (1/2)x_2, (1/3)x_3, ...).

It is injective, and formally the inverse is given by

T^{−1}x = (x_1, 2x_2, 3x_3, ...),

which is clearly not bounded. The reason the open mapping theorem does not apply is that T is not surjective, even though Rg T is in fact dense in ℓ_p.

Corollary 3.17. Two complete norms on a Banach space are either equivalent or incomparable.

Proof. If C‖·‖_1 ≥ ‖·‖_2, then the identity operator i : (X, ‖·‖_1) → (X, ‖·‖_2) is bounded and bijective. Hence, by Corollary 3.16, the inverse is bounded, which establishes the equivalence. □

Another classical application of the open mapping theorem is the following.

Definition 3.18. We say that an operator T : X → Y is closed if x_n → x and Tx_n → y imply that Tx = y.

Clearly, every continuous operator is closed. However, the property of being closed is formally weaker than continuity, because in the assumption we are asking for Tx_n to converge to some limit.

Why the term "closed"? It turns out one can rephrase the definition in terms of the graph of the operator, given by

Γ = {(x, Tx) : x ∈ X} ⊂ X × Y.

Note that Γ is a linear subspace of the direct product, as a consequence of the linearity of T itself. Now, if we endow X × Y with a norm, for example the ℓ_1-sum norm, and consider Γ as a subspace of X ⊕_1 Y, then the definition can be rephrased by saying that the graph Γ is a closed subspace of X ⊕_1 Y.

The next result proclaims that in fact being closed is not weaker: it is exactly equivalent to being bounded.

Theorem 3.19 (Closed graph theorem). Suppose X and Y are Banach spaces. Then every closed operator T : X → Y is automatically bounded.

Proof. By the assumption, Γ is closed in X ⊕_1 Y, and hence is itself a Banach space. Consider the projection onto X restricted to Γ, P : Γ → X, given by P(x, Tx) = x. The operator is clearly bounded, since ‖x‖ ≤ ‖x‖ + ‖Tx‖. It is also easy to see that P is bijective. Hence, by Corollary 3.16, P^{−1} is bounded, which implies

‖x‖ + ‖Tx‖ ≤ C‖x‖,

and thus ‖Tx‖ ≤ (C − 1)‖x‖. This implies that T is bounded. □

3.9. Exercises.

Exercise 3.1. Show that the Triangle Inequality is equivalent to

(25) |‖x‖ − ‖y‖| ≤ ‖x ± y‖.

Exercise 3.2. The fact that an absolutely convergent series converges does not necessarily hold in general normed spaces. Prove that a normed space (X, ‖·‖) is Banach if and only if every absolutely convergent series in X is convergent.

Exercise 3.3. Show that c_0 and ℓ_p are separable spaces for all 1 ≤ p < ∞.

Exercise 3.4. Show that ℓ_∞ is not a separable space. Hint: consider the set of vectors {x_A}_{A⊂N}, where x_A is the characteristic function of A.

Exercise 3.5. Verify that the ℓ_p-sum of Banach spaces X_1 ⊕_p X_2 ⊕_p ... is a Banach space.

Exercise 3.6. Show that if X = X_1 ⊕_p ... ⊕_p X_n, then X* = X_1* ⊕_q ... ⊕_q X_n*, where p, q are conjugate. Show that the same is true for infinite ℓ_p-sums if 1 ≤ p < ∞.

Exercise 3.7. If a strict inequality holds in (21), then A and B are called strictly separated. Show that if A is compact, B is closed, and they are convex disjoint sets, then they can be strictly separated.

Exercise 3.8. Let T** : X** → Y** be the second adjoint operator, i.e. T** = (T*)*. Show that T**|_X = T.

Exercise 3.9. Show that X* is always complemented in X***. Hint: consider the adjoint i* : X*** → X*.

Exercise 3.10. Let T ∈ L(X, Y) and S ∈ L(Y, Z). Show that (S ∘ T)* = T* ∘ S*.

Exercise 3.11. Show that in Examples 2.10, 2.11 the norms are not equivalent.

Exercise 3.12. Verify the inequality

‖x‖_q ≤ ‖x‖_p ≤ n^{1/p − 1/q} ‖x‖_q, p ≤ q,

for vectors x ∈ R^n. Can you interpret the upper bound as a particular case of (7)?

Exercise 3.13. Show that if μ(Ω) = ∞, then the L_p-norms are not comparable on L_p ∩ L_q, i.e. neither is stronger than the other.

Exercise 3.14. Prove that T : X → Y is an isomorphism if and only if T* : Y* → X* is.

Exercise 3.15. Show that X is reflexive if and only if X* is reflexive.

Exercise 3.16. Let Y ⊂ X be a closed subspace, and X Banach. Define Y^⊥ = {f ∈ X* : f|_Y = 0}. This is a closed subspace of X*, called the annihilator of Y. Show that Y* ≅ X*/Y^⊥ and (X/Y)* ≅ Y^⊥.

Exercise 3.17. Give an example of a vector y ∈ ℓ_p which does not belong to the range of the operator defined in (24).
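Exercise 3.17 concerns the operator (24), whose behavior is easy to probe numerically. A quick sketch (truncating sequences to finitely many coordinates; helper names are ours) showing that the formal inverse blows up on unit vectors, so T^{−1} is unbounded on the dense range of T:

```python
# The operator (24): T x = (x_1, x_2/2, x_3/3, ...), and its formal inverse.
def T(x):
    return [xi / (i + 1) for i, xi in enumerate(x)]

def T_inv(y):
    return [(i + 1) * yi for i, yi in enumerate(y)]

def l2_norm(x):
    return sum(t * t for t in x) ** 0.5

# y_n = n-th unit vector: ||y_n|| = 1, but ||T^{-1} y_n|| = n, so the
# inverse is unbounded even though T itself is injective with dense range.
for n in (1, 10, 100):
    y = [0.0] * n
    y[n - 1] = 1.0
    assert l2_norm(y) == 1.0
    assert l2_norm(T_inv(y)) == n
```

The open mapping theorem is not contradicted: T is not surjective, so no ball around the origin lies inside T(B(X)).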

4. Hilbert Spaces

4.1. Inner product. Let H be a linear space over C. We say that a binary operation ⟨·, ·⟩ : H × H → C is an inner product on H if

(1) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩,
(2) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩,
(3) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

Note that in view of the conjugate symmetry property (2), the inner product is not linear with respect to the second variable, but it is conjugate-linear:

(1*) ⟨z, αx + βy⟩ = ᾱ⟨z, x⟩ + β̄⟨z, y⟩.

Products satisfying (1)-(1*) are called sesquilinear, i.e. "1½-linear". In case H is defined over the reals R, we replace assumption (2) with the usual symmetry ⟨x, y⟩ = ⟨y, x⟩. In this case the inner product is bilinear, i.e. linear with respect to both positions.

Every inner product defines a norm (the inner-product norm) by ‖x‖ = √⟨x, x⟩. Axioms (1)-(2) of the definition of a norm are easily verified. In order to demonstrate the triangle inequality we prove the Cauchy-Schwarz inequality first.

Lemma 4.1 (Cauchy-Schwarz inequality). For any pair of vectors x, y ∈ H one has

|⟨x, y⟩| ≤ ‖x‖‖y‖,

and the equality holds if and only if x, y are linearly dependent.

Proof. If y = 0, then the inequality is trivial. Let us suppose y ≠ 0. Fix a λ ∈ C to be determined later and compute

0 ≤ ‖x + λy‖² = ‖x‖² + 2 Re(λ̄⟨x, y⟩) + |λ|²‖y‖².

Setting λ = −⟨x, y⟩/‖y‖² we obtain

‖x‖² − 2|⟨x, y⟩|²/‖y‖² + |⟨x, y⟩|²/‖y‖² ≥ 0.

This proves the inequality. In case equality holds, then for the above choice of λ we obtain ‖x + λy‖² = 0. Hence, x = −λy. □

With the help of the Cauchy-Schwarz inequality we compute

‖x + y‖² = ‖x‖² + 2 Re⟨x, y⟩ + ‖y‖² ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖² ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².

This proves the triangle inequality.

Definition 4.2. An inner-product linear space H which is complete in the metric of its inner-product norm is called a Hilbert space.
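A quick numerical sanity check of Lemma 4.1 in C^n (a sketch; the inner product below is the standard sesquilinear one, linear in the first slot):

```python
# Standard inner product on C^n, linear in the first argument and
# conjugate-linear in the second: <x, y> = sum_i x_i * conj(y_i).
def inner(x, y):
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

x = [1 + 2j, 0 - 1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, 0 + 4j]

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||.
assert abs(inner(x, y)) <= norm(x) * norm(y)

# Equality for linearly dependent vectors z = (2 - 3j) x, up to rounding.
z = [(2 - 3j) * xi for xi in x]
assert abs(abs(inner(x, z)) - norm(x) * norm(z)) < 1e-9
```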

Example 4.3. All the classical examples are one way or another connected to L_2-spaces. This is not a coincidence; we will make it precise later. For now let us consider the space ℓ_2. Its norm comes from the inner product defined by

⟨x, y⟩ = Σ_{i=1}^∞ x_i ȳ_i.

Since it is also complete, ℓ_2 is a Hilbert space. More generally, L_2(Ω, Σ, μ) is a Hilbert space with the inner product given by

⟨f, g⟩ = ∫_Ω f(x) ḡ(x) dμ(x).

Can we definitively say that all the other examples we have encountered are not Hilbert spaces in their canonical norms? To answer this question we first observe that all inner-product norms satisfy a distinct identity, called the Parallelogram Rule:

(26) ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²), for all x, y ∈ H.

It is remarkable that this formula does not involve the underlying inner product explicitly; rather, it is a consequence of the norm's origin. So, let us check that the ℓ_p-norm, for p ≠ 2, is not an inner-product norm. Simply pick x = (1, 0, 0, ...), y = (0, 1, 0, ...), and verify directly that the Parallelogram Rule fails.

We note that the Parallelogram Rule only proves that those spaces are not Hilbert in their classical norms. It is much harder to show that there is no equivalent norm for which those spaces are Hilbert. This shows that the property of being Hilbert is not just isometric; it determines isomorphic properties of the space too.

Can one restore the inner product from the inner-product norm? In a real space this is done via the identity

(27) ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²),

and in a complex space via the Polarization Identities

(28) Re⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²),
     Im⟨x, y⟩ = (1/4)(‖x + iy‖² − ‖x − iy‖²).

As a consequence of the Cauchy-Schwarz inequality we deduce continuity of the inner product.

Lemma 4.4. If x_n → x and y_n → y, then ⟨x_n, y_n⟩ → ⟨x, y⟩.

Proof. Note that the sequence y_n is bounded, ‖y_n‖ ≤ M. So,

|⟨x_n, y_n⟩ − ⟨x, y⟩| = |⟨x_n, y_n⟩ − ⟨x, y_n⟩ + ⟨x, y_n⟩ − ⟨x, y⟩| ≤ ‖x_n − x‖‖y_n‖ + ‖x‖‖y_n − y‖ → 0. □
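The Parallelogram Rule (26) gives a cheap computational test for whether a norm can come from an inner product. A sketch in R², checking the rule for the ℓ_2-norm and watching it fail for the ℓ_1- and ℓ_∞-norms on the vectors x = (1, 0), y = (0, 1) from the text:

```python
def lp_norm(x, p):
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    # zero iff ||x+y||^2 + ||x-y||^2 = 2(||x||^2 + ||y||^2)
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (lp_norm(s, p) ** 2 + lp_norm(d, p) ** 2
            - 2 * (lp_norm(x, p) ** 2 + lp_norm(y, p) ** 2))

x, y = [1.0, 0.0], [0.0, 1.0]
assert abs(parallelogram_defect(x, y, 2)) < 1e-12        # holds for p = 2
assert abs(parallelogram_defect(x, y, 1)) > 1            # fails for p = 1
assert abs(parallelogram_defect(x, y, float("inf"))) > 1 # fails for p = infinity
```

Of course, passing the test on a few vectors proves nothing; failing it on a single pair already rules the norm out.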

4.2. Orthogonality. Next we discuss the very important concept of orthogonality. We say that x and y are orthogonal, denoted x ⊥ y, if ⟨x, y⟩ = 0. For any two orthogonal vectors x ⊥ y one has the Pythagorean identity:

‖x + y‖² = ‖x‖² + ‖y‖².

Prove it as an exercise!

An important concept from linear algebra is that of an orthogonal projection onto a hyperplane. This concept can in fact be extended to any Hilbert space. To do that, recall that the orthogonal projection of x onto Y ⊂ H can be viewed as a solution to the variational problem:

find y_0 ∈ Y such that ‖x − y_0‖ = min_{y∈Y} ‖x − y‖ =: dist{x, Y}.

If this problem has a solution, then we have a good candidate for an "orthogonal projection". A solution to the minimization problem in fact exists for general closed convex sets.

Lemma 4.5. Let Y be any convex closed subset of H. Then for every x ∈ H there exists a unique y_0 ∈ Y such that

‖x − y_0‖ = min_{y∈Y} ‖x − y‖.

Proof. Let I = inf_{y∈Y} ‖x − y‖. Consider the sequence I_n = I + 1/n, and y_n ∈ Y such that

I_n ≥ ‖x − y_n‖ ≥ I.

Since Y is convex, (1/2)(y_n + y_m) ∈ Y, and so

(1/2)(I_n + I_m) ≥ ‖x − (1/2)(y_n + y_m)‖ ≥ I.

Thus, as n, m → ∞, the averages also form a minimizing sequence:

‖x − (1/2)(y_n + y_m)‖ → I.

By the Parallelogram Rule,

‖2x − (y_n + y_m)‖² = ‖(x − y_n) + (x − y_m)‖² = −‖(x − y_n) − (x − y_m)‖² + 2(‖x − y_n‖² + ‖x − y_m‖²).

So, as n, m → ∞, we have ‖2x − (y_n + y_m)‖² → 4I², while 2(‖x − y_n‖² + ‖x − y_m‖²) → 4I² as well. So,

‖(x − y_n) − (x − y_m)‖² = ‖y_n − y_m‖² → 0.

Since y_n is Cauchy, it converges to a limit y_0. Clearly, from the closedness of Y, y_0 ∈ Y, and ‖x − y_0‖ = I.

Uniqueness follows from the above. If y_0 and y_1 are both minimizers, set y_{2k} = y_0 and y_{2k+1} = y_1. This is another minimizing sequence, to which the argument above applies to show that it is Cauchy. Hence y_0 − y_1 = 0, i.e. y_0 = y_1. □

Lemma 4.6 (Existence of orthogonal projection). Suppose that Y ⊂ H is a closed linear subspace. Let x ∈ H and let y_0 ∈ Y be the minimizer, ‖x − y_0‖ = dist{x, Y}. Then x − y_0 ⊥ Y. Conversely, if for some y_0 ∈ Y one has x − y_0 ⊥ Y, then y_0 is the minimizer.

Here x − y_0 ⊥ Y means that ⟨x − y_0, y⟩ = 0 for all y ∈ Y.

Proof. Geometrically, the idea of the proof is to show that if there is a vector y_1 ∈ Y, ‖y_1‖ = 1, with ⟨x − y_0, y_1⟩ ≠ 0, then moving along the line in the direction of y_1 we can find another vector in Y whose distance to x is smaller than I. Indeed, consider λ ∈ C and

‖x − y_0 + λy_1‖² = I² + 2 Re(λ̄⟨x − y_0, y_1⟩) + |λ|².

Denote α = |⟨x − y_0, y_1⟩|² and let λ = −t⟨x − y_0, y_1⟩, t ∈ R. Then, continuing the above, we obtain

‖x − y_0 + λy_1‖² = I² − 2αt + αt² < I²,

for t = 1, for example. So, we found another vector, namely y_0 − λy_1 ∈ Y, whose distance to x is smaller than I. A contradiction.

Now let us assume that we have a vector y_0 ∈ Y such that x − y_0 ⊥ Y. Denote ‖x − y_0‖ = I. Then for any other vector y ∈ Y, by the Pythagorean identity,

‖x − y_0 − y‖² = I² + ‖y‖² ≥ I².

This proves the last assertion. □

The characterization of the minimizer as an orthogonal projection is especially important. We can show that the map

P : H → Y, Px = y_0,

is linear. Indeed, if Px_1 = y_1 and Px_2 = y_2, then x_1 − y_1 ⊥ Y and x_2 − y_2 ⊥ Y. But then αx_1 + βx_2 − (αy_1 + βy_2) ⊥ Y also. So, P(αx_1 + βx_2) = αy_1 + βy_2. Furthermore, this mapping defines a bounded operator with ‖P‖ = 1. Indeed,

‖x‖² = ‖x − Px + Px‖² = ‖x − Px‖² + ‖Px‖² ≥ ‖Px‖².

So, ‖Px‖ ≤ ‖x‖. Moreover, Py = y for all y ∈ Y trivially. This establishes that P is idempotent, P² = P. These properties tell us that P is a projection operator, called the orthogonal projection onto Y.

We now study the kernel of P, which we denote Y^⊥.

Lemma 4.7 (Orthogonal complement). We have

Y^⊥ = {z ∈ H : z ⊥ y for all y ∈ Y}.

Moreover, Y^⊥ is a closed linear subspace complementary to Y:

(29) H = Y ⊕ Y^⊥.

Proof. The latter assertion is in fact a consequence of the definition of Y^⊥ as the kernel of the projection P with image Y. Let us address the first claim. If z ⊥ Y, then ‖z − y‖² ≥ ‖z‖² for any y ∈ Y. So, 0 is the minimizer in this case, and hence Pz = 0. Conversely, if Pz = 0, then by Lemma 4.6, z − 0 = z ⊥ Y, and the lemma follows. □

We can define an orthogonal complement to any set A in H:

A^⊥ = {x ∈ H : x ⊥ a, for all a ∈ A}.

It is easy to show that the orthogonal complement of A coincides with the orthogonal complement of the entire closed linear span of A, see Exercise 4.6. As a consequence of Exercise 4.3, the double complement returns not the set A itself in general, but its closed linear span:

(30) A^⊥⊥ = cl span A.

As a consequence we obtain the following important characterization of dense sets.

Lemma 4.8. The linear span of a set A is dense in H if and only if A^⊥ = {0}.

Proof. Suppose that the linear span is dense, i.e. cl span A = H. Then according to (30),

{0} = H^⊥ = (cl span A)^⊥ = A^⊥⊥⊥ = A^⊥,

where the last equality follows from the fact that A^⊥ is a closed linear space, so Exercise 4.3 applies.

Conversely, if A^⊥ = {0}, then according to the above, (cl span A)^⊥ = {0}. This means that the orthogonal complement of cl span A is trivial; hence, by the decomposition (29), cl span A = H, which literally means that span A is dense in H. □

Lemma 4.8 provides a valuable tool for showing that a set is, in some sense, total: the linear span has trivial orthogonal complement. Proving density directly would be very hard, as it is difficult to describe all the linear combinations of a set. For future reference we give this term a precise meaning.

Definition 4.9. A set A ⊂ H is called total if cl span A = H.
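Lemmas 4.5 and 4.6 are easy to visualize numerically. A sketch in R³ (plain lists, names ours): projecting x onto the line Y = span{e}, the residual x − Px is orthogonal to Y, and Px beats every candidate te ∈ Y that we try.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

# Y = span{e}, e a unit vector; the orthogonal projection is P x = <x, e> e.
e = [2.0 / 3, 2.0 / 3, 1.0 / 3]          # ||e|| = 1
x = [1.0, 2.0, 3.0]
Px = [dot(x, e) * ei for ei in e]
residual = [a - b for a, b in zip(x, Px)]

# x - Px is orthogonal to Y (Lemma 4.6) ...
assert abs(dot(residual, e)) < 1e-12

# ... and Px minimizes the distance among the points t e in Y (Lemma 4.5).
best = norm(residual)
for t in [k / 10.0 for k in range(-50, 51)]:
    candidate = [t * ei for ei in e]
    assert norm([a - b for a, b in zip(x, candidate)]) >= best - 1e-12
```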

4.3. Orthogonal and Orthonormal sequences. We say that a sequence {x_n}_{n=1}^∞ is orthogonal if

⟨x_n, x_m⟩ = 0, for all n ≠ m.

If, in addition, all ‖x_n‖ = 1, then the sequence is called orthonormal. Let us consider some classical examples. In the space ℓ_2 the sequence of unit basis vectors e_n is orthonormal. In the space L_2[0, 2π] the system of simple harmonics forms an orthogonal sequence:

u_n(θ) = e^{inθ}, n ∈ Z.

To make it orthonormal, we have to normalize it:

v_n = (1/√(2π)) e^{inθ}.

If one is interested in real functions, one can consider the corresponding trigonometric systems:

a_n = cos(nt), b_m = sin(mt),

which together form an orthogonal system. Both systems are referred to as Fourier systems.

If we have an orthonormal sequence {e_n}, it is very easy to find the projection onto the closed space Y spanned by it. We discuss this next. First, let us consider the finite dimensional spaces

Y_n = span{e_1, ..., e_n}.

If x ∈ H and P_n is the orthogonal projection onto Y_n, then

P_n x = Σ_{k=1}^n a_k e_k.

To find the coefficients, recall that x − P_n x ⊥ Y_n. So,

⟨x − P_n x, e_k⟩ = 0, k = 1, ..., n.

Since the system is orthonormal, ⟨P_n x, e_k⟩ = a_k. So,

a_k = ⟨x, e_k⟩.

Thus, we obtain the explicit formula

P_n x = Σ_{k=1}^n ⟨x, e_k⟩ e_k.

Let us compute the norm using the Pythagorean theorem:

‖x‖² ≥ ‖P_n x‖² = Σ_{k=1}^n |⟨x, e_k⟩|².

Let us now consider the entire sequence and the space Y = cl span{e_n : n ∈ N}. Let y = Px, and denote y_n = P_n x. In view of the inequality above, we have

Σ_{k=1}^n |⟨x, e_k⟩|² ≤ ‖x‖²

for all n. This implies that the numerical series converges as n → ∞:

(31) Σ_{k=1}^∞ |⟨x, e_k⟩|² ≤ ‖x‖².

This inequality carries a special name: the Bessel inequality. Looking at the sequence of y_n's, we now obtain, for m > n,

‖y_n − y_m‖² = Σ_{k=n+1}^m |⟨x, e_k⟩|².

This can be made < ε for n, m > N in view of the convergence of the series. Hence {y_n} is Cauchy, which means that the vector series

ỹ = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k

converges. Let us show that y = ỹ. Indeed, by the uniqueness part of Lemma 4.6, it suffices to prove that x − ỹ ⊥ Y. Since the set A = {e_n}_n spans Y, it suffices to show that x − ỹ ⊥ A. For that we pick e_n, and by continuity of the inner product and convergence of the series compute

⟨x − ỹ, e_n⟩ = ⟨x, e_n⟩ − ⟨ Σ_{k=1}^∞ ⟨x, e_k⟩ e_k, e_n ⟩ = ⟨x, e_n⟩ − ⟨x, e_n⟩ = 0.

We have arrived at the following conclusion.

Lemma 4.10. For any orthonormal sequence {e_n}_n and any vector x ∈ H, the orthogonal projection of x onto the space Y = cl span{e_n : n ∈ N} is given by

(32) Px = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k.

This brings us to the main concept of this section.

Definition 4.11. A total orthonormal sequence is called an orthonormal basis.

For an orthonormal basis, Y = H and P = I, and according to Lemma 4.10 the expansion (32) holds not just for the projection but for the vector x itself:

(33) x = Σ_{k=1}^∞ ⟨x, e_k⟩ e_k.

For an orthonormal basis the Bessel inequality turns into the Parseval identity:

(34) ‖x‖² = Σ_{k=1}^∞ |⟨x, e_k⟩|².

Indeed, by convergence of the series and continuity of the norm,

‖x‖² = lim_{N→∞} ‖ Σ_{k=1}^N ⟨x, e_k⟩ e_k ‖² = lim_{N→∞} Σ_{k=1}^N |⟨x, e_k⟩|² = Σ_{k=1}^∞ |⟨x, e_k⟩|².

4.4. Existence of a basis and Gram-Schmidt orthogonalization. There is a general procedure, called Gram-Schmidt orthogonalization, that produces an orthonormal sequence from any linearly independent system of vectors {x_n}_n. So, let us start with an arbitrary linearly independent system {x_n}_n. On the first step we define

e_1 = x_1/‖x_1‖.

Next, we define e_2 using the orthogonal vector from Y_1 = span{e_1} to the tip of x_2:

e_2 = (x_2 − P_1 x_2)/‖x_2 − P_1 x_2‖.

Next, we define e_3 using the orthogonal vector from Y_2 = span{e_1, e_2} to the tip of x_3:

e_3 = (x_3 − P_2 x_3)/‖x_3 − P_2 x_3‖,

and so on. On the n-th step we define e_n using the orthogonal vector from Y_{n−1} = span{e_1, ..., e_{n−1}} to the tip of x_n:

e_n = (x_n − P_{n−1} x_n)/‖x_n − P_{n−1} x_n‖.

Let us note a couple of properties. First, Y_n = span{x_1, ..., x_n}. So, on each step we know what the span of the first n vectors will be. Second, the procedure above is quite explicit on each step. Once we know the vectors e_1, ..., e_{n−1}, we find

P_{n−1} x_n = Σ_{k=1}^{n−1} ⟨x_n, e_k⟩ e_k.

So, by the Pythagorean Theorem,

‖x_n − P_{n−1} x_n‖² = ‖x_n‖² − Σ_{k=1}^{n−1} |⟨x_n, e_k⟩|².

We can thus write the formula for e_n more explicitly:

e_n = ( x_n − Σ_{k=1}^{n−1} ⟨x_n, e_k⟩ e_k ) / ( ‖x_n‖² − Σ_{k=1}^{n−1} |⟨x_n, e_k⟩|² )^{1/2}.
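The procedure above is directly implementable. A minimal sketch in R^n (plain lists, no external libraries), orthonormalizing a linearly independent system exactly as in the n-th step formula:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(xs):
    """Orthonormalize a linearly independent system, as in Section 4.4:
    e_n = (x_n - P_{n-1} x_n) / ||x_n - P_{n-1} x_n||."""
    es = []
    for x in xs:
        # subtract the projection onto span{e_1, ..., e_{n-1}}
        proj = [sum(dot(x, e) * e[i] for e in es) for i in range(len(x))]
        r = [a - b for a, b in zip(x, proj)]
        nr = dot(r, r) ** 0.5
        es.append([t / nr for t in r])
    return es

xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(xs)

# the output is orthonormal: <e_n, e_m> = delta_{nm}
for n in range(3):
    for m in range(3):
        expected = 1.0 if n == m else 0.0
        assert abs(dot(es[n], es[m]) - expected) < 1e-12
```

Note that nr vanishes exactly when x_n is linearly dependent on its predecessors, which is why linear independence of the input system is essential.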

By construction, the new system {e_n} is orthonormal and it spans the same space as the old one. So, if H is a separable Hilbert space, we can start with an arbitrary countable dense set (y_n). We iteratively filter out the vectors y_n which are linearly dependent on the previous ones y_1, ..., y_{n−1}, producing a linearly independent system (x_n) with the same linear span as (y_n), i.e. cl span(x_n) = H. Using Gram-Schmidt orthogonalization we produce an orthonormal sequence (e_n) with the same span, cl span(e_n) = H. This system is total, and hence forms a basis.

Theorem 4.12. Every separable Hilbert space has an orthonormal basis.

As an easy consequence of this theorem we obtain the following result.

Theorem 4.13. Every infinite dimensional separable Hilbert space is isometrically isomorphic to ℓ_2.

Proof. Indeed, let (e_n) be an orthonormal basis in H. Define the operator

T : H → ℓ_2, Tx = (⟨x, e_n⟩)_{n=1}^∞.

According to Parseval's identity, ‖Tx‖_{ℓ_2} = ‖x‖_H. So, T is an isometry. But it is also surjective, because for every sequence a = (a_n) ∈ ℓ_2 we can define

x = Σ_{n=1}^∞ a_n e_n ∈ H,

and then Tx = a. □

As easy as it is to show that a given system is orthonormal, it appears much harder to show that it is a basis. This usually boils down to proving totality, which is an approximation property of the given set; proving it requires elements of approximation theory and harmonic analysis. All the examples we mentioned before, however, are bases.

Example 4.14. We start with the system

x_0 = 1, x_1 = t, x_2 = t², ...

in the space L_2[a, b]. This sequence spans the polynomials, and polynomials are dense in C[a, b], while C[a, b] is dense in L_2[a, b]. Thus polynomials are dense in L_2[a, b], and hence this is a total system. The Gram-Schmidt orthogonalization produces the system of Legendre polynomials. On the interval [−1, 1] those are given by

e_n = √((2n + 1)/2) P_n(t), P_n = (1/(2^n n!)) (d^n/dt^n)[(t² − 1)^n].

Example 4.15. On the whole line, in L_2(R), we start with moments of the Gaussian distribution:

x_0 = e^{−t²/2}, x_1 = t x_0, x_2 = t² x_0, ...

The orthogonalization process results in the Hermite system

e_n = (2^n n! √π)^{−1/2} e^{−t²/2} H_n(t), H_0 = 1, H_n = (−1)^n e^{t²} (d^n/dt^n)(e^{−t²}).

Example 4.16. Finally, on the half-line, in L_2[0, ∞), starting from

x_0 = e^{−t/2}, x_1 = t x_0, x_2 = t² x_0, ...,

we obtain the Laguerre system

e_n = e^{−t/2} L_n, L_0 = 1, L_n = (e^t/n!) (d^n/dt^n)(t^n e^{−t}).

4.5. Riesz Representation Theorem. Linear bounded functionals over Hilbert spaces have a very explicit representation. In fact, it is easy to construct a bounded functional f ∈ H* from an element y ∈ H. Indeed, define

f(x) = ⟨x, y⟩.

Let us compute its norm. First, by the Cauchy-Schwarz inequality we have |f(x)| ≤ ‖x‖‖y‖. So, ‖f‖ ≤ ‖y‖. At the same time, for x = y/‖y‖ ∈ S(H) we have f(x) = ‖y‖. Thus, ‖f‖ = ‖y‖. It turns out that all bounded functionals over H can be constructed this way.

Theorem 4.17 (Riesz Representation Theorem). For every f ∈ H* there exists a unique y ∈ H such that

f(x) = ⟨x, y⟩, for all x ∈ H.

Moreover, ‖f‖_{H*} = ‖y‖_H.

Proof. Let Z = Ker f ⊂ H, and suppose f ≠ 0, for otherwise we can pick y = 0. Then Z is a closed 1-codimensional subspace. Let y_0 ∈ Z^⊥ with ‖y_0‖ = 1, and let f(y_0) = α. Define y = ᾱ y_0. Let us verify that this choice works. Indeed, any x ∈ H has a unique decomposition

x = λ y_0 + z,  z ∈ Z, λ ∈ ℂ.

Then f(x) = λα by construction. At the same time, by orthogonality,

⟨x, y⟩ = λ⟨y_0, y⟩ = λα⟨y_0, y_0⟩ = λα.

Uniqueness: if there are two such vectors y_1, y_2, then

⟨x, y_1⟩ = ⟨x, y_2⟩, ∀x ∈ H.

Then picking x = y_1 − y_2, we obtain ‖y_1 − y_2‖ = 0. The equality of norms follows from the discussion above. □

So, it seems like we have established that H ≅ H*, because for every f ∈ H* we found the corresponding y ∈ H with the same norm. However, this correspondence J : H* → H is not linear – it is conjugate linear, because

J(αf_1 + βf_2) = ᾱ y_1 + β̄ y_2.

It is still true that H ≅ H*, but the conjugate-linear correspondence J is often the one that appears most useful for identifying functionals. To see that H* is a Hilbert space we can introduce the inner product

⟨f, g⟩_* = ⟨Jg, Jf⟩.

Then ‖f‖_* = ‖f‖_{H*} = ‖Jf‖_H.

Another consequence of the Riesz Representation Theorem is an explicit formula for the supporting functional: for every x ≠ 0, set f(y) = ⟨y, x/‖x‖⟩. Then ‖f‖ = 1 and f(x) = ‖x‖. In the Hilbert space setting the formula (17) for the norm of an operator reads

(35) ‖T‖ = sup_{‖x‖=1, ‖y‖=1} ⟨T x, y⟩.

4.6. Hilbert-adjoint operator. Since H* is the same as H via f → Jf, it makes sense to define an adjoint operator to any T : H → H not in the canonical way as T* : H* → H*, but as another operator T* : H → H. This is called the Hilbert-adjoint, or simply the adjoint if the Hilbert context is assumed. To make that definition, let us fix y ∈ H and consider the functional

f(x) = ⟨T x, y⟩, ∀x ∈ H.

It is a bounded linear functional, since |f(x)| ≤ ‖T x‖‖y‖ ≤ ‖T‖‖x‖‖y‖. So, ‖f‖ ≤ ‖T‖‖y‖. According to the Riesz Representation Theorem there exists an element, which we denote T*y ∈ H, with ‖T*y‖ = ‖f‖, such that f(x) = ⟨x, T*y⟩. Thus,

⟨T x, y⟩ = ⟨x, T*y⟩.

This is the defining property of the adjoint operator. Since we can do this for every y ∈ H, the above defines a map T* : H → H. This map is linear, as is easy to verify. Moreover, from the Riesz Representation Theorem, ‖T*y‖ = ‖f‖ ≤ ‖T‖‖y‖. So, ‖T*‖ ≤ ‖T‖. At the same time, by (35),

‖T‖ = sup_{‖x‖=1,‖y‖=1} ⟨T x, y⟩ = sup_{‖x‖=1,‖y‖=1} ⟨x, T*y⟩ ≤ ‖T*‖.

Thus, ‖T‖ = ‖T*‖.

Example 4.18. Consider H = ℂ^n, and T ∼ A = (a_ij). Then T* ∼ A* = (ā_ji).

Definition 4.19. We say that T : H → H is self-adjoint if T = T*. We say that T : H → H is unitary if T is bijective and T^{−1} = T*. We say that T : H → H is normal if T T* = T* T.
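Example 4.18 can be tested numerically. The sketch below (dimension, seed, and sample vectors are arbitrary choices) forms the conjugate transpose of a random complex matrix and checks the defining property ⟨T x, y⟩ = ⟨x, T*y⟩.

```python
import random

# Check Example 4.18 on C^n: the Hilbert-adjoint of T ~ A = (a_ij) is the
# conjugate transpose A* = (conj(a_ji)). Dimension and seed are arbitrary.
random.seed(0)
n = 4

def rand_vec():
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

A = [rand_vec() for _ in range(n)]                       # rows of A
A_star = [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

def inner(u, v):
    # inner product on C^n, conjugate-linear in the second slot
    return sum(a * b.conjugate() for a, b in zip(u, v))

x, y = rand_vec(), rand_vec()
lhs = inner(apply(A, x), y)         # <Tx, y>
rhs = inner(x, apply(A_star, y))    # <x, T*y>
```

Up to floating-point rounding, the two pairings agree, which is exactly the computation behind Example 4.18.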

As an example let us consider the multiplication operator on ℓ2,

T x = (a_1 x_1, a_2 x_2, ...),

where (a_n) is a bounded sequence of scalars. Then

T* x = (ā_1 x_1, ā_2 x_2, ...).

Clearly, T is normal, since multiplication by scalars commutes. If inf |a_i| > 0, then T is invertible and

T^{−1} x = (a_1^{−1} x_1, a_2^{−1} x_2, ...).

So, T = T* if and only if all the coefficients a_i are real, and T* = T^{−1} if and only if |a_i| = 1 for all i. In some sense these properties define these classes of operators in general (this will become clearer in the framework of spectral theory). To proceed with the properties of these special classes of operators, let us establish one simple uniqueness criterion.

Lemma 4.20. Suppose H is a complex Hilbert space and let

⟨T x, x⟩ = ⟨S x, x⟩, ∀x ∈ H.

Then T = S.

Proof. Let us consider arbitrary λ ∈ ℂ, x, y ∈ H and compute, for D = S − T,

0 = ⟨D(x + λy), x + λy⟩ = λ⟨Dy, x⟩ + λ̄⟨Dx, y⟩.

Setting λ = 1 and λ = i we obtain the system

⟨Dy, x⟩ + ⟨Dx, y⟩ = 0,
⟨Dy, x⟩ − ⟨Dx, y⟩ = 0,

which proves that ⟨Dx, y⟩ = 0 for all x, y ∈ H. Thus D = 0. □

Lemma 4.21. An operator T : H → H is self-adjoint if and only if ⟨T x, x⟩ ∈ ℝ for all x ∈ H.

Proof. If T is self-adjoint we have ⟨T x, x⟩ = ⟨x, T x⟩, which is the complex conjugate of ⟨T x, x⟩. So, ⟨T x, x⟩ ∈ ℝ. Conversely, if ⟨T x, x⟩ ∈ ℝ, then

⟨T x, x⟩ = conj⟨T x, x⟩ = ⟨x, T x⟩ = ⟨T* x, x⟩.

By Lemma 4.20 we obtain T* = T. □

Lemma 4.22. The following are equivalent:
(i) T : H → H is unitary.
(ii) T is surjective and T*T = I.
(iii) T is injective and T T* = I.

Proof. If T is unitary, then T is bijective, hence both surjective and injective. Moreover, T*T = T T* = I. Hence (i) implies both (ii) and (iii). If (ii) holds, then T*T = I implies

‖T x‖^2 = ⟨T x, T x⟩ = ⟨T*T x, x⟩ = ⟨x, x⟩ = ‖x‖^2.

Hence, T is injective, and even an isometry. Together with surjectivity this implies that T is bijective. Applying T^{−1} from the right in T*T = I, we obtain

T* = T*T T^{−1} = I T^{−1} = T^{−1}.

This establishes that T is unitary. Suppose now (iii). For every y ∈ H we have T(T*y) = y. So, setting x = T*y, we have T x = y. This proves that T is surjective. Together with the fact that it is injective, this implies that T is invertible. Applying T^{−1} from the left in T T* = I, we obtain

T* = T^{−1} T T* = T^{−1} I = T^{−1}.

This establishes that T is unitary. □
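The gap between conditions (ii) and (iii) can be tested concretely on the right shift. Below is an illustrative sketch on finitely supported sequences (modeled as Python lists; the sample vectors are arbitrary): the right shift satisfies S*S = I but not S S* = I.

```python
# The right shift S x = (0, x1, x2, ...) on l2 and its adjoint, the left
# shift S* x = (x2, x3, ...), modeled on finitely supported sequences.
# S*S = I (S is an isometry) but S S* != I (S is not surjective).

def shift(x):        # right shift
    return [0.0] + list(x)

def shift_adj(x):    # left shift = Hilbert-adjoint of the right shift
    return list(x[1:])

def inner(x, y):
    # pad with zeros so sequences of different finite support can be paired
    n = max(len(x), len(y))
    x = x + [0.0] * (n - len(x))
    y = y + [0.0] * (n - len(y))
    return sum(a * b for a, b in zip(x, y))

x = [3.0, 1.0, 4.0, 1.0, 5.0]
y = [2.0, 7.0, 1.0]

# defining property <Sx, y> = <x, S*y>
assert inner(shift(x), y) == inner(x, shift_adj(y))
# S*S = I, but S S* kills the first coordinate
assert shift_adj(shift(x)) == x
assert shift(shift_adj(x)) == [0.0] + x[1:]
```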

The right shift on ℓ2 is a good example which illustrates that not every isometry is automatically unitary, i.e. the condition T*T = I alone does not imply surjectivity.

4.7. Exercises.

Exercise 4.1. Prove identities (27) and (28).

Exercise 4.2. Show that Lp([0, 1]), p ≠ 2, is not a Hilbert space. Show that C[0, 1] is not a Hilbert space either.

Exercise 4.3. Show that Y^⊥⊥ = Y.

Exercise 4.4. Prove that Y^⊥ ≅ H/Y.

Exercise 4.5. Show that A^⊥ is a closed linear subspace of H.

Exercise 4.6. Show that A^⊥⊥ coincides with the closed linear span of A.

Exercise 4.7. Show that if (33) holds for every x ∈ H, then the orthonormal system is total, and hence is a basis.

Exercise 4.8. Write down the formula for the vectors in the trigonometric orthonormal system.

Exercise 4.9. Show that
(1) T** = T;
(2) (αT)* = ᾱ T*;
(3) (S + T)* = S* + T*;
(4) (S T)* = T* S*;
(5) ‖T‖^2 = ‖T*T‖ = ‖T T*‖.

Exercise 4.10. Show that the supporting functional is unique for every x ∈ H. Hint: recall when the Cauchy-Schwarz inequality becomes an equality.

Exercise 4.11. Show that a projector P : H → H is orthogonal if and only if it is self-adjoint.

Exercise 4.12. Show that T : H → H is an isometry if and only if T*T = I, and if and only if T preserves inner products: ⟨T x, T y⟩ = ⟨x, y⟩ for all x, y ∈ H. So, the only condition that separates an isometry from being unitary is the surjectivity condition, see Lemma 4.22 (ii).

Exercise 4.13. Show that T*T and T T* are both self-adjoint operators for any T.

5. Weak topologies

5.1. Weak topology. Let X be a Banach space. We define the weak topology on X as the topology with the following base of neighborhoods: for x ∈ X, ε_1, ..., ε_n > 0 and f_1, ..., f_n ∈ X*, let

(36) U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) = {y ∈ X : |f_i(y) − f_i(x)| < ε_i, ∀i = 1, ..., n}.

Note that on an infinite dimensional space X these are unbounded sets containing the entire linear plane x + ∩_{i=1}^n Ker f_i.

We say that a set V ⊂ X is weakly open if for every point x ∈ V there exists some weak neighborhood of x contained in the set, U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) ⊂ V. The collection of all weakly open sets forms a topology (exercise!), called the weak topology. We will use the term "strong" for anything related to the norm topology, as opposed to "weak", which refers to anything related to the weak topology. For example, "strongly compact set" vs. "weakly compact set", or "strong convergence" vs. "weak convergence". Examples of weakly open sets are basically anything that can be defined by a finite set of functionals, e.g. {f < a}, {f > b}, {a < f < b}, etc. The corresponding sets with ≤, ≥ are weakly closed.

It is clear that the weak topology is weaker than the norm topology on any normed space. In fact, on an infinite dimensional space it is strictly weaker. To see this, we show that any neighborhood (36) is unbounded. Indeed, let Z = ∩_i Ker f_i. This is a nontrivial subspace, for otherwise the map x ↦ (f_1(x), ..., f_n(x)) would be injective, making X finite dimensional. Then U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) contains all of x + Z.

Lemma 5.1. A sequence x_n → x weakly if and only if f(x_n) → f(x) for all f ∈ X*.

Proof. Suppose x_n → x weakly. For any ε > 0 and f ∈ X* one can find an N ∈ ℕ such that for all n > N, x_n ∈ U^f_ε(x). So, |f(x_n) − f(x)| < ε. This proves the convergence f(x_n) → f(x).

Conversely, for any sets ε_1, ..., ε_n > 0 and f_1, ..., f_n ∈ X*, one can find a common N ∈ ℕ such that for all n > N,

|f_i(x_n) − f_i(x)| < ε_i, ∀i = 1, ..., n.

This implies that x_n ∈ U^{f_1,...,f_n}_{ε_1,...,ε_n}(x). □

One can weaken Lemma 5.1 considerably for bounded sequences by requiring convergence to hold only on a total set of functionals.

Definition 5.2. We say that a family of functionals F ⊂ X* is total if span F is dense in X*.

In the Hilbert space setting, where H ≅ H*, this concept is identical to that of Definition 4.9. In ℓp for 1 < p < ∞ and in c_0 the most basic example of a total system is provided by the set of coordinate basis vectors F = {e_n}_{n=1}^∞. And just like in Hilbert spaces, if f(x) = 0 for all f ∈ F, then x = 0.
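The difference between weak and strong convergence can be seen in a small computation. The classical example: in ℓ2 the coordinate vectors e_n tend to 0 weakly, since f(e_n) = y_n → 0 for every f ∼ y ∈ ℓ2, while ‖e_n‖ = 1 rules out strong convergence. A sketch against one fixed y (the choice y_k = 1/k is arbitrary):

```python
import math

# In l2 the coordinate vectors e_n converge to 0 weakly but not strongly:
# for f ~ y in l2 we get f(e_n) = y_n -> 0, while ||e_n|| = 1 for all n.

def f(x, y):
    """f(x) = <x, y> for a finitely supported x given as {index: value}."""
    return sum(xv * y(k) for k, xv in x.items())

y = lambda k: 1.0 / k            # a fixed element of l2

def e(n):
    return {n: 1.0}              # the n-th coordinate vector, sparsely stored

values = [f(e(n), y) for n in range(1, 101)]   # tends to 0 like 1/n
norms = [math.sqrt(sum(v * v for v in e(n).values())) for n in range(1, 101)]
```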

Lemma 5.3. Let F ⊂ X* be a total family of functionals. A sequence x_n → x weakly if and only if

(i) ‖x_n‖ ≤ M for all n ∈ ℕ;
(ii) f(x_n) → f(x) for all f ∈ F.

Proof. Suppose x_n → x weakly. Then (ii) follows directly from Lemma 5.1. Since x_n is a weakly convergent sequence, (i) holds by the Banach-Steinhaus uniform boundedness principle. Conversely, suppose (i)-(ii) hold. By linearity, f(x_n) → f(x) holds for all f ∈ span F, which is dense in X*. Let g ∈ X* be arbitrary. For any ε > 0 find f ∈ span F such that ‖f − g‖ < ε. Then find N ∈ ℕ such that

|f(x_n) − f(x)| < ε, ∀n > N.

Then

|g(x_n) − g(x)| ≤ |g(x_n) − f(x_n)| + |f(x_n) − f(x)| + |f(x) − g(x)|
              ≤ ‖f − g‖‖x_n‖ + ε + ‖f − g‖‖x‖ < ε(M + 1 + ‖x‖). □

Exercise 5.1. Suppose dim X = ∞. Construct a net {x_α}_{α∈A} in X such that x_α → 0 weakly, yet for every α_0 ∈ A and N > 0 there is α ≥ α_0 such that ‖x_α‖ ≥ N. Hint: let

A = {(f_1, ..., f_n; N) : f_j ∈ X*, n ∈ ℕ, N > 0}.

Define a partial order on A, and for every α ∈ A pick an x_α ∈ ∩_j Ker f_j with ‖x_α‖ > N. Show that x_α → 0 weakly and is (frequently) unbounded.

With the help of Lemma 5.3 we can give a simple characterization of weak convergence in sequence spaces.

Lemma 5.4. Let {x_n}_{n=1}^∞ ⊂ X be a sequence in any of the spaces X = c_0 or ℓp, for 1 < p < ∞. Then x_n → x weakly if and only if {x_n} is norm bounded and converges to x pointwise, i.e. x_n(j) → x(j) for all j ∈ ℕ.

Proof. This follows from Lemma 5.3 by choosing F to be the coordinate basis. In all the spaces in question this family is total. □

What about the space ℓ1?

Lemma 5.5 (Schur). In ℓ1, a sequence {x_n}_{n=1}^∞ converges weakly to x if and only if it converges to x strongly.

Let us note that, in spite of Exercise 1.4, the lemma above does not imply that the norm and weak topologies are equivalent, because it deals only with sequences. Loosely speaking, the reason why weak and strong convergences of sequences in ℓ1 are equivalent is that the dual ℓ∞ is "very large", so large that weak convergence is just as hard to arrange as strong convergence.
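The ℓ1 situation contrasts with ℓ2, in line with Lemma 5.5: the coordinate vectors of ℓ1 do not even tend to 0 weakly, because pairing with the constant-one sequence in ℓ∞ = ℓ1* gives the value 1 on every e_n. A minimal sketch:

```python
# In l1 the coordinate vectors e_n do not converge to 0 weakly: the
# functional f = (1, 1, 1, ...) in l_infty = l1* gives f(e_n) = 1 for all n.

def f(x):
    """Pair a finitely supported x (as {index: value}) with (1, 1, 1, ...)."""
    return sum(x.values())

values = [f({n: 1.0}) for n in range(1, 50)]   # constantly 1, never -> 0
```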

Lemma 5.6 (Weak lower semi-continuity of the norm). If x_n → x weakly, then

lim inf_{n→∞} ‖x_n‖ ≥ ‖x‖.

Proof. Consider a supporting functional for x: ‖f‖ = 1, f(x) = ‖x‖. Then

‖x‖ = f(x) = lim_{n→∞} f(x_n) ≤ lim inf_{n→∞} ‖f‖‖x_n‖ = lim inf_{n→∞} ‖x_n‖. □
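The inequality in Lemma 5.6 can be strict. In ℓ2, take x_n = x + e_n: coordinates converge to those of x and the norms are bounded, so x_n → x weakly, yet ‖x_n‖^2 = ‖x‖^2 + 1 once n passes the support of x. A sketch with an arbitrarily chosen finitely supported x:

```python
import math

# Weak lower semicontinuity can be strict: x_n = x + e_n -> x weakly in l2,
# but ||x_n|| = sqrt(||x||^2 + 1) for n beyond the support of x.

x = {1: 2.0, 2: -1.0}            # x = 2 e_1 - e_2, so ||x||^2 = 5

def norm(v):
    return math.sqrt(sum(t * t for t in v.values()))

def x_n(n):
    v = dict(x)
    v[n] = v.get(n, 0.0) + 1.0   # add e_n
    return v

norms = [norm(x_n(n)) for n in range(3, 50)]
# every ||x_n|| equals sqrt(6) > sqrt(5) = ||x||, so liminf ||x_n|| > ||x||
```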

Sometimes, when x_n → x weakly and ‖x_n‖ → ‖x‖, the sequence has no other choice than to converge strongly, ‖x_n − x‖ → 0. Banach spaces with this property are said to have the Radon-Riesz property. Any Hilbert space belongs to the Radon-Riesz class. Indeed, if x_n → x weakly, then

‖x_n − x‖^2 = ‖x_n‖^2 + ‖x‖^2 − ⟨x_n, x⟩ − ⟨x, x_n⟩.

According to our assumptions, the right hand side converges to ‖x‖^2 + ‖x‖^2 − ⟨x, x⟩ − ⟨x, x⟩ = 0.

Lemma 5.7. The weak topology is not metrizable if dim X = ∞.

Proof. Suppose, on the contrary, that there is a metric d(·, ·) that defines the weak topology. Consider the sequence of balls {d(x, 0) < 1/n}. Each contains a weak neighborhood of the origin, and we have shown that every weak neighborhood is unbounded. Thus, we can find an x_n within the n-th ball with ‖x_n‖ > n. So, on the one hand x_n → 0 weakly, and yet {x_n} is unbounded, in contradiction with Lemma 5.3. □

Lemma 5.8. If X* is separable, then the weak topology is metrizable on any bounded set.

Proof. Since X* is separable, we can pick a sequence of unit functionals φ_n ⊂ S(X*) which is dense on the sphere S(X*). Note that this family is automatically total. Now, define a metric

d(x, y) = Σ_{n=1}^∞ |φ_n(x − y)| / 2^n.

Since the functionals are unit-norm, the series converges. This metric is clearly symmetric, and if d(x, y) = 0, then φ_n(x − y) = 0 for all n, which implies x − y = 0. The triangle inequality is also obvious.

So, now let A ⊂ X be a bounded set. We will show that the weak topology induced on A, i.e. the topology given by the family of traces of weak neighborhoods

U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) ∩ A,

is equivalent to the topology produced by the metric d on A. For this purpose, we have to show that a neighborhood of a point x ∈ A in one topology contains a neighborhood of the same point in the other topology. Let us start with a metric one. So, fix x ∈ A, ε > 0, and consider the ball

B_ε(x) = {y ∈ X : d(x, y) < ε}.

Since A is bounded, we have ‖y‖ ≤ M for some M > 0 and all y ∈ A. Then we can pick N ∈ ℕ such that

Σ_{n=N+1}^∞ |φ_n(x − y)| / 2^n < ε/2,

for all y ∈ A. Then we consider U^{φ_1,...,φ_N}_{ε/2,...,ε/2}(x) ∩ A. For any y in it we have

d(x, y) ≤ Σ_{n=1}^N ε/2^{n+1} + Σ_{n=N+1}^∞ |φ_n(x − y)| / 2^n < ε.

So,

U^{φ_1,...,φ_N}_{ε/2,...,ε/2}(x) ∩ A ⊂ B_ε(x) ∩ A.

Conversely, let us fix a weak neighborhood U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) ∩ A. Let us fix a small ε > 0, to be determined later. For each f_i we can find a φ_{k_i} such that

‖ f_i/‖f_i‖ − φ_{k_i} ‖ < ε.

If d(x, y) < ε, then |φ_{k_i}(x − y)| < 2^{k_i} ε for all i = 1, ..., n. So, we compute

|f_i(x − y)| ≤ 2Mε‖f_i‖ + ‖f_i‖|φ_{k_i}(x − y)| < ε‖f_i‖(2M + 2^{k_i}) < ε_i,

provided ε is sufficiently small. This shows that

B_ε(x) ∩ A ⊂ U^{f_1,...,f_n}_{ε_1,...,ε_n}(x) ∩ A. □

As a weaker topology, the weak topology provides a smaller family of open sets than the norm topology. Thus any weakly closed set is strongly closed as well¹. As a consequence of the Separation Theorem 3.9, it turns out that among the class of convex sets the property of being closed is the same in the weak and strong topologies.

Lemma 5.9. A convex set C ⊂ X is strongly closed if and only if it is weakly closed.

Proof. A weakly closed set is always strongly closed. Conversely, suppose C is strongly closed, and yet there is a point x_0 in the weak closure of C that does not belong to C. By Theorem 3.9, there is a functional f ∈ X* so that f(x_0) > c > sup f(C). Thus x_0 belongs to the weakly open set U = {f > c}, which is disjoint from C, contradicting the fact that x_0 lies in the weak closure of C. □

The exact same argument shows that the weak and strong closures of a convex set C coincide. This has a rather interesting consequence for the relationship between weak and strong convergence of sequences.

5.2. Weak* topology. Consider now the dual space X*. As any Banach space, it has its own weak topology determined by the functionals from "upstairs", i.e. X**. However, one can define a weaker Hausdorff topology on X* determined by the functionals from "downstairs", i.e. X. Let x_1, ..., x_n ∈ X, ε_1, ..., ε_n > 0, and f ∈ X*. We define a weak*-open neighborhood of f to be

(37) U^{x_1,...,x_n}_{ε_1,...,ε_n}(f) = {g ∈ X* : |g(x_i) − f(x_i)| < ε_i, ∀i = 1, ..., n}.

Identifying elements of X with vectors in X**, we see that this is just a special subclass of the neighborhoods defined earlier in (36). It is still a Hausdorff topology, as pairs of distinct functionals in X* can be separated by elements of X. The properties of the weak* topology are entirely similar to those of the weak topology, see the exercises. How much weaker the weak* topology may be than the weak topology is illustrated by the following example (c.f. Lemma 5.5).

x1,...,xn ∗ (37) Uε1,...,εn (x) = {g ∈ X : |g(xi) − f(xi)| < εi, ∀i = 1, n}. Identifying element of X as vectors in X∗∗ we see it is just a special subclass of neighborhoods defined earlier in (36). It is still a Hausdorff topology as pairs of distinct functionals in X∗ can be separated by elements of X. The properties of weak∗-topology are entirely similar to those of the weak-topology, see exercises. How much the weak∗ topology may be weaker than the weak topology is illustrated by the following example (c.f. Lemma 5.5).

Exercise 5.2. Show that in ℓ1 = c_0*, f_n → f weakly* if and only if {f_n} is bounded and f_n → f pointwise. Prove the same statement for weakly* convergent sequences in ℓ∞ = ℓ1*.

Theorem 5.10 (Alaoglu). The unit ball of a dual space is compact in the weak*-topology.

¹ It sounds a bit counterintuitive from the linguistic point of view. But if you think about what it takes for a set to be closed, it becomes clear: if a set "survives" weak limits from within itself, then it should definitely survive strong limits.

Proof. Notice that for any f ∈ B(X*) and x ∈ X, f(x) ∈ [−‖x‖, ‖x‖]. This naturally suggests considering B(X*) as a subset of the product space T = Π_{x∈X} [−‖x‖, ‖x‖]. By Tychonoff's theorem, this product space is compact in the product topology. It suffices to show that B(X*) is closed in T, because convergence of nets in the product topology is equivalent to pointwise convergence, which for elements of B(X*) amounts to weak* convergence. To this end, let {f_α}_{α∈A} be a net in B(X*) with lim f_α = f ∈ T. By linearity of the f_α's and the "pointwise" sense of the limit above, we conclude that

f(λx + µy) ← f_α(λx + µy) = λf_α(x) + µf_α(y) → λf(x) + µf(y).

Thus, f is linear, and since |f_α(x)| ≤ ‖x‖, we also have |f(x)| ≤ ‖x‖ for all x ∈ X, which identifies f as an element of B(X*). □

As an immediate consequence we see that for a reflexive Banach space X the unit ball is weakly compact. This property actually characterizes reflexivity.

Theorem 5.11 (Kakutani). A Banach space X is reflexive if and only if its unit ball is weakly compact.

The theorem will follow from the Goldstein Lemma.

Lemma 5.12 (Goldstein). The unit ball B(X) is weakly* dense in B(X**). In particular, the weak* closure of B(X) in X** is B(X**).

Indeed, let B(X) be weakly compact. In view of Lemma 5.12, for any x** ∈ B(X**) we can find a net x_α → x** in the weak* topology, with x_α ∈ B(X). By weak compactness, there exists a subnet y_β → x weakly for some x ∈ B(X), and yet that same subnet converges to x** in the weak* topology of X**. Thus, for every x* we have

(x**, x*) ← (y_β, x*) = (x*, y_β) → (x*, x),

which identifies x** as x. Thus, B(X) = B(X**), implying X = X**.

In order to prove Lemma 5.12, we have to go back to the Separation Theorem 3.9 and adapt it to the case of the weak* topology. First, we prove Helly's Lemma.

Lemma 5.13 (Helly). Suppose that f, f_1, ..., f_n ∈ X* and ∩_{j=1}^n Ker f_j ⊂ Ker f. Then f ∈ span{f_1, ..., f_n}.

Proof. We will prove the lemma by induction. Suppose n = 1 and f_1 ≠ 0 (otherwise the statement is trivial). By the structure of linear functionals discussed in Section 3.2, there is x_1 ∈ X with f_1(x_1) = 1 such that every x ∈ X can be written as x = λx_1 + y, where y ∈ Ker f_1. Since f(y) = 0 we have f(x) = λf(x_1) = f(x_1)f_1(x), as desired.

Suppose the statement is true for n, and assume ∩_{j=1}^{n+1} Ker f_j ⊂ Ker f. Consider the space Y = Ker f_{n+1}. Then ∩_{j=1}^n Ker f_j|_Y ⊂ Ker f|_Y. By the induction hypothesis, f|_Y = Σ_{j=1}^n a_j f_j|_Y. By the structure of f_{n+1}, for any x ∈ X we have x = λx_{n+1} + y for some y ∈ Y, where f_{n+1}(x_{n+1}) = 1. Since λ = f_{n+1}(x) and f_j(y) = f_j(x) − λf_j(x_{n+1}), we compute

(38)
f(x) = λf(x_{n+1}) + Σ_{j=1}^n a_j f_j(y)
     = f(x_{n+1}) f_{n+1}(x) + Σ_{j=1}^n a_j f_j(x) − λ Σ_{j=1}^n a_j f_j(x_{n+1})
     = ( f(x_{n+1}) − Σ_{j=1}^n a_j f_j(x_{n+1}) ) f_{n+1}(x) + Σ_{j=1}^n a_j f_j(x)
     = Σ_{j=1}^{n+1} a_j f_j(x). □

Lemma 5.14. If x** ∈ X** is continuous in the weak* topology, then x** ∈ X.

Proof. By the assumption, (x**)^{−1}(−1, 1) contains a weak* neighborhood of the origin, say U^{x_1,...,x_n}_{ε_1,...,ε_n}(0). In particular, |x**| < 1 on ∩_j Ker x_j. Since the latter is a linear space, x** must in fact vanish on it. Thus, by Lemma 5.13, x** ∈ span[x_1, ..., x_n] ⊂ X. □

Theorem 5.15 (Separation Theorem for the weak* topology). Suppose B is a weakly*-closed convex subset of X* and f ∉ B. Then there is x ∈ X such that

sup_{g∈B} g(x) < 1 < f(x).

Proof. Let us follow the proof of Theorem 3.9. We may assume f = 0, and let A = U^{x_1,...,x_n}_{ε_1,...,ε_n}(0) be a weak* neighborhood of 0 disjoint from B. We then associate the Minkowski functionals p_A and q_B to A and B, respectively. As a result of the Hahn-Banach Theorem, we find a separating functional F so that q_B ≤ F ≤ p_A. Since f ∈ A implies p_A(f) ≤ 1, we see that F is bounded from above on A. As in the proof of Lemma 5.14, we conclude that F vanishes on the intersection of the kernels of x_1, ..., x_n, and hence F ∈ X. By rescaling F if necessary, we can arrange the constant of separation to be 1. □

Proof of Lemma 5.12. Let B be the weak* closure of B(X) in X**. By Exercise 5.11, B ⊂ B(X**). Suppose there is F ∈ B(X**)\B. By Theorem 5.15, we find an x* ∈ X* such that

sup_{G∈B} G(x*) < 1 < F(x*).

The first inequality holds, in particular, on B(X), which shows that ‖x*‖ ≤ 1. This runs into contradiction with the second inequality. □

Corollary 5.16. Let Y ⊂ X be a closed subspace of a reflexive space X. Then Y and X/Y are reflexive.

Proof. Indeed, B(Y) = Y ∩ B(X). Since this set is convex and closed, it is weakly closed in B(X) and hence compact in the weak topology of X. However, by the Hahn-Banach extension theorem, the topology induced on Y by the weak topology of X is exactly the weak topology of Y. Thus, B(Y) is weakly compact in Y. Now, by Exercise 3.16 and by the previous, (X/Y)* is a subspace of a reflexive space, which makes it reflexive. Then by Exercise 3.15, X/Y itself is reflexive. □
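Returning to Helly's Lemma 5.13, here is a finite-dimensional sanity check (an illustration only; all specific vectors are arbitrary choices): on ℝ^3 we take f = 3f_1 + 2f_2, confirm that it vanishes on Ker f_1 ∩ Ker f_2, and recover the coefficients by evaluating f on a dual pair x_1, x_2 with f_i(x_j) = δ_ij.

```python
# Finite-dimensional sanity check of Helly's Lemma: on R^3 take f1, f2 and
# f with ker f1 ∩ ker f2 ⊂ ker f, and recover f = a1 f1 + a2 f2 by
# evaluating f on a "dual" pair x1, x2 with f_i(x_j) = delta_ij.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

f1, f2 = (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)
f = tuple(3 * a + 2 * b for a, b in zip(f1, f2))   # f = 3 f1 + 2 f2

# ker f1 ∩ ker f2 is spanned by z = (1, 1, -1); f vanishes there:
z = (1.0, 1.0, -1.0)
assert dot(f1, z) == dot(f2, z) == dot(f, z) == 0.0

# Solve for x_j = alpha f1 + beta f2 with f_i(x_j) = delta_ij
# (a 2x2 system, solved by Cramer's rule on the Gram matrix).
g11, g12, g22 = dot(f1, f1), dot(f1, f2), dot(f2, f2)
det = g11 * g22 - g12 * g12

def dual_vector(rhs1, rhs2):
    alpha = (rhs1 * g22 - rhs2 * g12) / det
    beta = (g11 * rhs2 - g12 * rhs1) / det
    return tuple(alpha * a + beta * b for a, b in zip(f1, f2))

x1, x2 = dual_vector(1.0, 0.0), dual_vector(0.0, 1.0)
a1, a2 = dot(f, x1), dot(f, x2)   # coefficients of f in span{f1, f2}
```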

5.3. Exercises.

Exercise 5.3. Show that a net x_α → x weakly if and only if f(x_α) → f(x) for every functional f ∈ X*.

Exercise 5.4. Show that, more generally than in Lemma 5.6, if a net x_α → x weakly, then for any ε > 0 there is α_0 so that for all α ≥ α_0, ‖x_α‖ ≥ ‖x‖ − ε.

Exercise 5.5. Show that if A and B are disjoint convex sets, A strongly closed and B strongly compact, then there are two disjoint weakly open neighborhoods of A and B.

Exercise 5.6. Show that if x_n → x weakly, then there is a sequence of convex combinations of the x_n's that converges to x strongly.

Exercise 5.7. Show that f_n → f weakly* if and only if f_n(x) → f(x) for every x ∈ X.

Exercise 5.8. Show that any weakly* convergent sequence in X* is strongly bounded.

Exercise 5.9. Let F ⊂ X be a total set. Show that f_n → f weakly* if and only if {f_n} is bounded and f_n(x) → f(x) for every x ∈ F.

Exercise 5.10. If X is separable, show that the weak*-topology is metrizable on any bounded subset of X*.

Exercise 5.11 (Weak* lower semi-continuity of the norm). Show that if f_n → f weakly*, then

lim inf_{n→∞} ‖f_n‖ ≥ ‖f‖.

More generally, if a net {f_α}_{α∈A} converges weakly* to f, then for any ε > 0 there is α_0 ∈ A so that for all α ≥ α_0, ‖f_α‖ ≥ ‖f‖ − ε.

Exercise 5.12. Recall that (c_0)** = (ℓ1)* = ℓ∞. Show that B(c_0) is weakly* sequentially dense in B(ℓ∞), i.e. for every F ∈ B(ℓ∞) there is a sequence x_n ∈ B(c_0) converging weakly* to F.

6. Compact sets in Banach spaces

This section is devoted to studying an important class of sets in Banach spaces – compact sets in the strong norm-topology. One of the goals will be to give characterizations of compact or precompact sets (see Definition 1.9). Tools developed in Section 1.1 will be most useful initially, so we encourage the reader to recall that material before reading this section.

We set X to be an arbitrary Banach space, hence a complete metric space in which Lemma 1.10 and Corollary 1.11 apply. Thus, K ⊂ X is precompact if and only if it has an ε-net for every ε > 0. This condition carries an intuitive interpretation of compact sets as being "finite-dimensional". This analogy, however, should not be abused, as there are compact sets which are not subsets of any finite-dimensional subspace. In fact, one can easily construct such sets in any separable Banach space. Let X be such a space, and let {x_n} ⊂ X be a dense sequence of non-zero vectors in X. Consider

y_n = (1/n) x_n/‖x_n‖.

Then the closed linear span of K is all of X, where K = {0} ∪ {y_n : n ∈ ℕ}. And yet we can prove that K is compact. Indeed, let {x_k} ⊂ K be a sequence. If some y_{n_0} is repeated among the x_k's infinitely many times, we pick the stationary subsequence x_{k_l} = y_{n_0}. Otherwise there is a subsequence x_{k_l} = y_{n_l} with n_l → ∞, and since ‖y_n‖ = 1/n, we get x_{k_l} → 0, with 0 ∈ K as the limit.

So, compact sets may span infinite-dimensional spaces, but they are still in some sense finite-dimensional, as seen in the following lemma.

Lemma 6.1. A subset K ⊂ X is precompact if and only if K is bounded and for any ε > 0 there exists a finite-dimensional subspace Y_ε such that

‖[x]‖_{X/Y_ε} < ε, ∀x ∈ K.

In other words, every element of K is no more than ε away from the subspace Y_ε.

Note that in this lemma, as well as in some of the statements below, we choose to state a criterion for precompactness rather than compactness. This is simply because such a statement already contains the key element; in order to upgrade it to compactness one simply has to add "being closed".

Proof. Suppose K is precompact. Then it is clearly bounded. According to Lemma 1.10, for any ε > 0 we can find a finite ε-net {x_j}_{j=1}^J. Consider Y_ε = span{x_j}_{j=1}^J. Then for any x ∈ K there exists an element of Y_ε, namely the one from the ε-net, x_j, such that ‖x − x_j‖ < ε. Hence, ‖[x]‖_{X/Y_ε} < ε.

Conversely, for any ε let us find a space Y_{ε/2}. For any x ∈ K, find y(x) ∈ Y_{ε/2} with

‖x − y(x)‖_X < ε/2.

The set y(K) = {y(x)}_{x∈K} is obviously bounded. And as a bounded subset of a finite-dimensional space, it is also precompact, see Corollary 2.13. Hence, according to Lemma 1.10, we can find an ε/2-net for y(K). Clearly, this same set will be an ε-net for K. □

An important example of a compact set in ℓ2 is provided by the Hilbert cube.

Example 6.2. Let us fix a sequence a ∈ ℓ2 with positive coordinates a_n > 0. The Hilbert cube is the subset of ℓ2 defined by

H = {x ∈ ℓ2 : |x_n| ≤ a_n}.

In other words, it is the infinite Cartesian product

[−a_1, a_1] × [−a_2, a_2] × ··· × [−a_n, a_n] × ...

Note that H is a bounded, closed, convex subset of ℓ2. It is also precompact, which along with closedness makes it compact. Indeed, for any ε let us pick N > 0 such that

Σ_{n≥N} a_n^2 < ε^2.

Define Y_ε = span{e_1, ..., e_N}. Then for any x ∈ H we have

‖[x]‖_{ℓ2/Y_ε} ≤ (Σ_{n≥N} |x_n|^2)^{1/2} ≤ (Σ_{n≥N} a_n^2)^{1/2} < ε.

Lemma 6.3. If K ⊂ X is precompact, then conv K is precompact too.

Let us note that even if K is compact, it does not necessarily follow that conv K is closed, unless dim X < ∞, see Corollary 2.16. However, the lemma implies that the closure of conv K is compact.

Proof. Suppose K ⊂ X is precompact. Then for any ε > 0 there exists Y_ε as described in Lemma 6.1. If x ∈ conv K, x = Σ_i λ_i x_i with x_i ∈ K, λ_i ≥ 0 and Σ_i λ_i = 1, then

‖[x]‖_{X/Y_ε} ≤ Σ_i λ_i ‖[x_i]‖_{X/Y_ε} < ε.

Hence, the same space Y_ε approximates conv K too. Conversely, any subset of a precompact set is clearly precompact. □

6.1. Compactness in sequence spaces. Let us address the general question of compactness in the sequence spaces ℓp, 1 ≤ p < ∞. The example of the Hilbert cube suggests that compact sets should have some sort of uniform decay of coordinates. This can be expressed by the concept of uniform summability.

Definition 6.4. A subset K ⊂ ℓp is called uniformly summable if

(39) lim_{N→∞} sup_{x∈K} Σ_{n>N} |x_n|^p = 0.

Note that (39) holds true for any finite set. But a close examination of the argument we presented for the Hilbert cube shows that this condition, together with boundedness, is exactly what is needed to prove precompactness.

Lemma 6.5. A subset K ⊂ ℓp, for 1 ≤ p < ∞, is precompact if and only if it is bounded and uniformly summable.

Proof. The sufficiency follows the proof of compactness of the Hilbert cube line by line. For any ε we find N such that

sup_{x∈K} Σ_{n>N} |x_n|^p < ε^p.

Define Y_ε = span{e_1, ..., e_N}, and observe that

‖[x]‖_{ℓp/Y_ε} ≤ (Σ_{n>N} |x_n|^p)^{1/p} < ε, ∀x ∈ K.

Conversely, a precompact set is bounded, so it remains to prove uniform summability. Denote c_N = sup_{x∈K} Σ_{n>N} |x_n|^p. For any ε > 0 we can find an ε-net y^1, ..., y^m for K. Since this is a finite set, we have, for some N ∈ ℕ,

sup_{j=1,...,m} Σ_{n>N} |y^j_n|^p < ε^p.

For any x ∈ K, find j such that ‖x − y^j‖_p < ε. Then their tails are also ε-close:

(Σ_{n>N} |x_n − y^j_n|^p)^{1/p} < ε.

By the triangle inequality,

(Σ_{n>N} |x_n|^p)^{1/p} < 2ε.

Thus, c_N ≤ (2ε)^p. This finishes the proof. □
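The quantitative step in the Hilbert cube argument and in Lemma 6.5 can be sketched numerically. In the sketch below (with the arbitrary choice a_n = 2^{−n} and an arbitrary ε), we pick N with tail sum below ε^2 and confirm on sampled points of H that their distance to span{e_1, ..., e_N} stays below ε.

```python
import math
import random

# Sketch of the Hilbert cube argument with a_n = 2^{-n}: given eps, choose N
# with sum_{n>N} a_n^2 < eps^2; then every x in H = {|x_n| <= a_n} lies
# within eps of Y = span{e_1, ..., e_N}.
random.seed(1)
eps = 1e-3
a = lambda n: 2.0 ** (-n)

# tail of sum a_n^2 = 4^{-n}: sum_{n>N} 4^{-n} = 4^{-N}/3
N = 1
while 4.0 ** (-N) / 3 >= eps ** 2:
    N += 1

M = N + 40   # truncation level used to sample points of H
for _ in range(100):
    x = [random.uniform(-a(n), a(n)) for n in range(1, M + 1)]
    # distance from x to Y is the norm of the tail beyond coordinate N
    dist = math.sqrt(sum(t * t for t in x[N:]))
    assert dist < eps
```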

A similar criterion can be established in c_0 (left as Exercise 6.2).

Lemma 6.6. A subset K ⊂ c_0 is precompact if and only if it is bounded and its tails are uniformly small:

(40) lim_{N→∞} sup_{x∈K} sup_{n>N} |x_n| = 0.

A convenient sufficient condition for precompactness is given by the existence of a majorant.

Definition 6.7. We say that a sequence a ∈ ℓp (or c_0) is a majorant for a set K ⊂ ℓp (or c_0, respectively) if |x_n| ≤ a_n for all x ∈ K.

It is clear that in this case c_N ≤ Σ_{n>N} a_n^p → 0, and the uniform summability follows. Consequently, all Hilbert cube-type sets are compact in ℓp, as long as p < ∞, and in c_0.

The question remains: what about ℓ∞? The tails of sequences in ℓ∞ can be arbitrary and oscillatory, so we cannot rely on any kind of smallness at infinity. However, oscillations of any finite number of elements of ℓ∞ can be controlled over finite partitions of ℕ. To be precise, if x ∈ ℓ∞, let us break its range down into small disjoint intervals:

[−‖x‖, ‖x‖] = ∪_{j=1}^J I_j,  |I_j| < 2‖x‖/J.

With J large, the length of each interval can be made arbitrarily small, say less than ε. Then consider the sets A_j = x^{−1}(I_j). The A_j's are disjoint and ℕ = ∪_j A_j, so they form a partition of ℕ. On each such set we have

sup_{n,m∈A_j} |x_n − x_m| < ε.

So, the oscillation of x on each A_j is small. Now, if we have a finite set of vectors x^l ∈ ℓ∞, l = 1, ..., L, then we can consider the above construction for each of them (we can also pick the largest common J for all of them), and consider the sets

A^1_{j_1} ∩ A^2_{j_2} ∩ ... ∩ A^L_{j_L},

for all possible choices of the j_l's. These also form a finite partition of ℕ, denoted {B_p}_{p=1}^P, and moreover

sup_{l≤L} sup_{n,m∈B_p} |x^l_n − x^l_m| < ε, ∀p ≤ P.

So, we can construct a decomposition of ℕ into subsets on which the oscillations of our finite set of vectors are small. This uniformity of oscillations is what characterizes precompactness in ℓ∞.

Lemma 6.8. A set K ⊂ ℓ∞ is precompact if and only if K is bounded and for every ε > 0 there exists a partition {B_p}_{p=1}^P of ℕ such that

(41) sup_{x∈K} sup_{n,m∈B_p} |x_n − x_m| < ε, ∀p ≤ P.

Proof. We have essentially proved the necessity of (41) in the discussion above. Indeed, if we apply the partition construction to an ε-net {x^j}, then (41) follows with 3ε in place of ε, by approximating each x ∈ K with a net element.

Conversely, let M be a global bound on the norms of elements of K, ‖x‖_∞ < M. For any ε, let us find a partition {B_p}_{p=1}^P as in (41). Let us further fix an ε-net of scalars Λ = {λ_1, ..., λ_Q} in the interval [−M, M], and form the set of sequences constant on each B_p with values in Λ:

N = {x ∈ ℓ∞ : x|_{B_p} is constant with value in Λ, p ≤ P}.

Since the partition is finite and Λ is finite, the set N is finite. Yet, in view of (41), for each x ∈ K and any p there exists λ^{(p)} ∈ Λ such that |λ^{(p)} − x_n| < 2ε for all n ∈ B_p. Then the corresponding y ∈ N with y|_{B_p} = λ^{(p)} will fulfill ‖y − x‖_∞ < 2ε. Thus, N is a 2ε-net for K. □
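The partition construction preceding Lemma 6.8 is algorithmic and can be sketched directly. The following is an illustrative sketch for a single bounded sequence (modeled on finitely many coordinates; the sample values and ε are arbitrary): it splits [−‖x‖, ‖x‖] into J intervals and pulls them back to index sets on which x oscillates by less than ε.

```python
from collections import defaultdict

# Partition construction for one x in l_infty (finite model): split
# [-M, M] into J semi-closed intervals of length < eps and pull them back
# to sets A_j = x^{-1}(I_j) on which the oscillation of x is below eps.
x = [0.3, -0.7, 0.31, 0.9, -0.69, 0.29, 0.88]   # arbitrary sample values
eps = 0.1
M = max(abs(t) for t in x)                       # stand-in for ||x||_infty
J = int(2 * M / eps) + 1                         # interval length 2M/J <= eps

def label(t):
    # index of the semi-closed interval of [-M, M] containing t
    j = int((t + M) / (2 * M) * J)
    return min(j, J - 1)                         # put t = M into the last one

parts = defaultdict(list)
for n, t in enumerate(x):
    parts[label(t)].append(n)

# on each part the oscillation of x is below eps
for idxs in parts.values():
    vals = [x[n] for n in idxs]
    assert max(vals) - min(vals) < eps
```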

6.2. Arzelà-Ascoli Theorem. We now discuss compactness criteria in various function spaces. We start with spaces of continuous functions. Let Ω denote a compact metric space. Let us recall the Heine-Cantor Theorem, asserting that every continuous function f ∈ C(Ω) is automatically uniformly continuous. The latter means that for every ε > 0 there exists δ > 0 such that for all d(x, y) < δ one has |f(x) − f(y)| < ε. Indeed, if the latter were not true, then we would have an ε_0 > 0 and two sequences with d(x_n, y_n) → 0 for which

|f(x_n) − f(y_n)| > ε_0.

By compactness we can assume that x_n, y_n → z ∈ Ω. But then f(x_n), f(y_n) → f(z), which is a contradiction.

A characterization of compact sets in C(Ω) can be given by a condition which states continuity of the elements of the set uniformly over the set.

Definition 6.9. A family of functions F ⊂ C(Ω) is called uniformly equicontinuous if for every ε > 0 there exists δ > 0 such that for all d(x, y) < δ one has

sup_{f∈F} |f(x) − f(y)| < ε.

Theorem 6.10 (Arzelà-Ascoli). A set K ⊂ C(Ω) is precompact if and only if K is uniformly equicontinuous and pointwise bounded:

sup_{f∈K} |f(x)| ≤ C_x < ∞, ∀x ∈ Ω.

Proof. Let us start with the easier implication. If K is precompact, then it is bounded and hence pointwise bounded. For any ε > 0 find an ε/3-net {f_p}_{p=1}^P for K. Since every function in this net is uniformly continuous, we can find a common δ > 0 such that for any d(x, y) < δ we have

max_p |f_p(x) − f_p(y)| < ε/3.

For any other function f ∈ K we pick p such that ‖f − f_p‖ < ε/3, and the result follows by the standard application of the triangle inequality.

To start the converse implication, we note that equicontinuity and pointwise boundedness together imply the usual boundedness, see Exercise 6.3. So, we can assume that K is equicontinuous and bounded. Let

sup_{f∈K} ‖f‖ = M < ∞.

Let us fix ε > 0 and partition the interval [−M, M] into disjoint semi-closed intervals I_j, j = 1, ..., J, with |I_j| < ε/2, in the increasing order I_1 < I_2 < ··· < I_J. Next, let us find a finite cover of Ω by balls B_1, ..., B_N such that for any x, y ∈ B_i we have

(42) sup_{f∈K} |f(x) − f(y)| < ε/2.

Now, to any function f ∈ K we associate a map

i_f : {1, ..., N} → {1, ..., J} by defining i_f(n) = max{j : ∃x ∈ B_n with f(x) ∈ I_j}.

Note that, in view of (42), all values of a function f ∈ K on any ball B_n belong to either the top interval or the one below it:

(43) f|_{B_n} ⊂ I_{i_f(n)−1} ∪ I_{i_f(n)}.

Consequently, if i_f = i_g, then on any ball |f − g| < ε, i.e. ‖f − g‖ < ε. However, since i is a map between two finite sets, there are only finitely many such maps, i_{f_1}, ..., i_{f_P}. Any other map i_g, g ∈ K, has to coincide with one of them, i.e. there exists f_p such that i_{f_p} = i_g. Hence ‖g − f_p‖ < ε. This means that {f_p}_{p=1}^P is a finite ε-net for K. □

If Ω is a bounded convex domain in R^n, there is a very simple sufficient condition for uniform equicontinuity of a family F ⊂ C(Ω): boundedness in C^1(Ω). Indeed, if there exists a constant M > 0 so that

sup_{f∈F} ‖∇f‖_∞ ≤ M,

then for any x′, x″ ∈ Ω we have |f(x′) − f(x″)| ≤ |∇f(y)·(x′ − x″)|, for some y ∈ [x′, x″], and hence |f(x′) − f(x″)| ≤ M|x′ − x″|. So, the family is uniformly Lipschitz, and hence equicontinuous.

6.3. Compactness in Lp(Ω). In this section we will establish compactness criteria in Lp-spaces over domains in R^n. Before we discuss the setup and the results, let us first introduce an important procedure called mollification².

6.3.1. Mollification. We consider a non-negative C^∞ function ψ on R^n with supp ψ ⊂ {|x| ≤ 1} and unit total weight:

∫_{R^n} ψ(x) dx = 1.

For any small parameter ε > 0 we consider the rescaled function

ψ_ε(x) = ε^{−n} ψ(x/ε).

Note that, just like ψ, ψ_ε has total weight 1 as well. However, supp ψ_ε ⊂ {|x| ≤ ε}. So, ψ_ε concentrates all its mass in a smaller ball. For any function u ∈ Lp(R^n), 1 ≤ p ≤ ∞, we define its mollification by the convolution

(44) u ∗ ψ_ε(x) = u_ε(x) = ∫_{R^n} ψ_ε(x − y) u(y) dy.

Since ψ_ε has compact support, it is clear that the integral in y converges for all x, although the mollification may not have compact support anymore. However, it is clear that

(45) supp(u ∗ ψ_ε) ⊂ supp u + B_ε(0).

The most important feature of mollification is that it makes functions smooth. Indeed, since ψ ∈ C^∞, the usual convergence theorems can be applied to show that

∂_{x_1}^{j_1} ··· ∂_{x_n}^{j_n} (u ∗ ψ_ε)(x) = ∫_{R^n} ∂_{x_1}^{j_1} ··· ∂_{x_n}^{j_n} ψ_ε(x − y) u(y) dy.

²Other terms used in the literature include filtration, coarse-graining, regularization.

Since each time a derivative falls on ψ_ε it produces a factor of 1/ε, the norm of any derivative of the convolution may grow out of control. However, in terms of its native Lp-norm the mollification is well behaved. To see this, let us estimate, using 1/p + 1/q = 1,

‖u ∗ ψ_ε‖_p^p = ∫_{R^n} |∫_{R^n} ψ_ε(x − y) u(y) dy|^p dx = ∫_{R^n} |∫_{R^n} ψ_ε^{1/q}(x − y) ψ_ε^{1/p}(x − y) u(y) dy|^p dx

≤ ∫_{R^n} (∫_{R^n} ψ_ε(x − y) dy)^{p/q} ∫_{R^n} ψ_ε(x − y) |u(y)|^p dy dx

= ∫_{R^n} ∫_{R^n} ψ_ε(x − y) |u(y)|^p dy dx = ‖u‖_p^p.

Thus,

(46) ‖u ∗ ψ_ε‖_p ≤ ‖u‖_p.

The next important property of mollification is approximation:

(47) ‖u ∗ ψ_ε − u‖_p → 0, as ε → 0.

Indeed, let us recall the following fact from Real Analysis: for any function u ∈ Lp(R^n) we have

(48) ∫_{R^n} |u(x + h) − u(x)|^p dx → 0,

as h → 0. This fact follows by approximating u with a continuous compactly supported function v, for which (48) obviously holds, and running the usual triangle inequality argument. Continuing with the proof of (47), we write

‖u ∗ ψ_ε − u‖_p^p = ∫_{R^n} |∫_{R^n} u(x + h) ψ_ε(h) dh − u(x)|^p dx,

and, recalling that the mass of the mollifier is always 1,

= ∫_{R^n} |∫_{R^n} (u(x + h) − u(x)) ψ_ε(h) dh|^p dx,

and, using the Hölder inequality in the same manner as before,

≤ ∫_{R^n} ∫_{R^n} |u(x + h) − u(x)|^p ψ_ε(h) dh dx = ∫_{R^n} (∫_{R^n} |u(x + h) − u(x)|^p dx) ψ_ε(h) dh.

Noting that |h| < ε on the support of ψ_ε, in view of (48) we see that the inner integral tends to zero uniformly in h as ε → 0. This proves (47).
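As a quick numerical aside (not part of the original text), the basic properties of mollification are easy to observe on a grid in one dimension. The bump ψ and the step function u below are invented for the illustration; the sketch checks the unit mass of ψ_ε, the L¹ case of (46), and the support inclusion (45).

```python
import math

# 1-D sketch of mollification on a grid (n = 1); psi is the standard bump.
def psi(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

h = 0.002                                       # grid step
grid = [i * h for i in range(-1000, 1001)]      # the interval [-2, 2]

Z = sum(psi(x) for x in grid) * h               # normalizing constant
def psi_eps(x, eps):
    return psi(x / eps) / (Z * eps)             # psi_eps(x) = eps^{-1} psi(x/eps)

eps = 0.1
mass = sum(psi_eps(x, eps) for x in grid) * h
assert abs(mass - 1.0) < 1e-4                   # unit weight survives rescaling

u = [1.0 if abs(x) <= 1.0 else 0.0 for x in grid]   # step function, supp u = [-1, 1]

# u * psi_eps on the grid; the kernel is supported in {|x| <= eps}
w = int(eps / h) + 1
u_eps = [sum(psi_eps(grid[i] - grid[j], eps) * u[j]
             for j in range(max(0, i - w), min(len(grid), i + w + 1))) * h
         for i in range(len(grid))]

# (46) for p = 1: mollification does not increase the L^1 norm
assert sum(map(abs, u_eps)) * h <= sum(map(abs, u)) * h + 1e-3
# (45): the support grows by at most eps
assert all(abs(v) < 1e-9 for v, x in zip(u_eps, grid) if abs(x) > 1.0 + eps)
print("mollification checks passed")
```

Plotting u_eps would also show the smoothing: the jump of the step function is replaced by a C^∞ transition of width 2ε.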

6.3.2. Compactness on bounded domains. Let us now assume that Ω ⊂ R^n is an open bounded set, which we call a domain. It is crucial for the compactness criterion formulated below to consider spaces on bounded domains; for the unbounded case see Exercise 6.4. When considering functions u ∈ Lp(Ω), it is clear that any shift u(x + h) may drive the argument x + h outside the domain as x gets close to the boundary. It is therefore conventional to extend u trivially by zero outside Ω for such operations to make sense. So,

without further notice we always assume that functions in Lp(Ω) are extended trivially to the entire R^n: u(x) for x ∈ Ω, and u(x) = 0 for x ∉ Ω.

Compactness in Lp(Ω) is very similar in flavor to the Arzelà–Ascoli Theorem, with the equicontinuity replaced by "equiintegrability" – the condition (48) holding uniformly across a given set.

Theorem 6.11 (Compactness in Lp). Let Ω ⊂ R^n be a bounded domain, and 1 ≤ p < ∞. A subset K ⊂ Lp(Ω) is precompact if and only if K is bounded and

(49) sup_{u∈K} ∫_Ω |u(x + h) − u(x)|^p dx → 0, as h → 0.

Proof. If K is precompact, it is clearly bounded. For every ε > 0 let us pick an ε-net {u_i}_{i=1}^N for it. Since (48) holds for each u_i, we have, for some i,

‖u(· + h) − u‖_p ≤ ‖u(· + h) − u_i(· + h)‖_p + ‖u_i(· + h) − u_i‖_p + ‖u_i − u‖_p

≤ 2ε + ‖u_i(· + h) − u_i‖_p → 2ε, as h → 0. Since this is true for all ε > 0, the result follows.

Now, suppose (49) holds. Let us fix ε and consider the mollification of the set K,

Kε = {u ∗ ψε : u ∈ K}. The proof of (47) shows that under (49) we have a uniform approximation of elements of K by their mollifications

sup_{u∈K} ‖u ∗ ψ_ε − u‖_p = o(1), ε → 0.

So, for any δ let us pick ε so small that o(1) < δ. For that fixed ε the set K_ε ⊂ C(Ω + B_ε(0)) is equicontinuous. Indeed,

|u_ε(x′) − u_ε(x″)| ≤ ∫_Ω |ψ_ε(x′ − y) − ψ_ε(x″ − y)| |u(y)| dy ≤ C_ε |x′ − x″| ‖u‖_{L1(Ω)} ≤ C_{ε,p} |x′ − x″| ‖u‖_{Lp(Ω)} ≤ C |x′ − x″|,

the latter since K is assumed bounded. So, the family K_ε is in fact uniformly Lipschitz. By the Arzelà–Ascoli Theorem we conclude that K_ε is precompact in C(Ω + B_ε(0)). Let {u_ε^j}_{j=1}^N be a δ-net in K_ε. Then for any u_ε there exists u_ε^j with

‖u_ε − u_ε^j‖_{C(Ω+B_ε(0))} < δ.

This means approximation in the L^∞-metric, the strongest of all Lp-metrics on a bounded set. Thus,

‖u − u_ε^j‖_{Lp(Ω)} ≤ ‖u − u_ε‖_{Lp(Ω)} + ‖u_ε − u_ε^j‖_{Lp(Ω)} ≤ δ + C_p ‖u_ε − u_ε^j‖_{L∞(Ω)} ≤ δ + C_p ‖u_ε − u_ε^j‖_{C(Ω+B_ε(0))} < (1 + C_p)δ.

Since δ is arbitrary, this concludes the proof. □

6.4. Extreme points and Krein-Milman Theorem.

6.5. Compact maps. Generally, by a compact map between two metric spaces X and Y we understand a map T : X → Y which maps bounded sets to precompact sets. If X and Y are Banach spaces and T ∈ L(X, Y), then it is sufficient to consider only the unit ball.

Definition 6.12. T ∈ L(X, Y) is called a compact operator if T(B(X)) is precompact in Y.

By linearity this implies that T sends bounded sets to precompact sets. The following lemma follows directly from Lemma 1.10.

Lemma 6.13. T ∈ L(X, Y) is compact if and only if for any bounded sequence {x_n}_{n=1}^∞ ⊂ X

there exists a subsequence with T x_{n_k} → y ∈ Y.

Let us note some obvious properties of compact maps. First, if dim X < ∞, then any bounded linear operator T : X → Y is compact, simply because continuous maps between topological spaces map compact sets to compact sets, see Lemma 1.4. At the same time, if dim Y < ∞, then every bounded subset of Y is precompact by Corollary 2.13, and thus every T ∈ L(X, Y) is compact as well. If, however, dim X = dim Y = ∞, then no isomorphism, or even isomorphic embedding, between such spaces can be a compact operator. This is because in this case T(B(X)) would have a non-empty interior in an infinite dimensional subspace. So, compact operators are not invertible in this case.

6.6. Exercises.

Exercise 6.1. Show that the Hilbert cube with sides a_n = 1/√n is not a compact set. Hint: construct a disjoint sequence of unit vectors in H with supports going off to infinity.

Exercise 6.2. Prove Lemma 6.6.

Exercise 6.3. Show that any uniformly equicontinuous and pointwise bounded set in C(Ω) is automatically bounded. Give an example of a pointwise bounded set which is not bounded. Why does the Uniform Boundedness principle not apply here?

Exercise 6.4. Give an example of a set K ⊂ Lp(R^n) which satisfies (49) yet is not precompact. Hint: consider shifts of a fixed compactly supported function.

Exercise 6.5. Formulate and prove an analogue of the compactness criterion in `∞ stated in Lemma 6.8 in the space L∞(Ω).

7. Fixed Point Theorems

7.1. Contraction Principles.

Definition 7.1. Let (X, d) be a metric space. We say that T : X → X is a contraction if there exists 0 < θ < 1 such that

d(T x, T y) ≤ θ d(x, y), for all x, y ∈ X.

Theorem 7.2 (Contraction Principle). Let (X, d) be a complete metric space and let T : X → X be a contraction. Then T has a unique fixed point.

Proof. Fix an arbitrary x_0 ∈ X and consider the sequence x_n = T^n x_0, n ∈ N. Let us estimate the distance between two consecutive elements:

d(x_{m+1}, x_m) = d(T x_m, T x_{m−1}) ≤ θ d(x_m, x_{m−1}) ≤ ··· ≤ θ^m d(x_1, x_0).

Thus, if n > m are arbitrary,

d(x_n, x_m) ≤ d(x_n, x_{n−1}) + ··· + d(x_{m+1}, x_m) ≤ (θ^{n−1} + ··· + θ^m) d(x_1, x_0).

Since θ < 1 this proves that the sequence xn is Cauchy. In view of completeness, xn → x ∈ X. Since

T x_n = x_{n+1}, by sending n → ∞ we conclude that T x = x, as desired. To show uniqueness, simply observe that if x_1, x_2 are two fixed points, then

d(x_1, x_2) = d(T x_1, T x_2) ≤ θ d(x_1, x_2), and hence d(x_1, x_2) = 0. □

A slightly more general contraction principle can be stated as follows.

Corollary 7.3. Let (X, d) be a complete metric space and let T : X → X be a continuous map such that T^m is a contraction for some m ∈ N. Then T has a unique fixed point.

Proof. Let R = T^m. Then by the Contraction Principle R has a unique fixed point x̃, and moreover, from the proof, every iteration converges to it: R^n x_0 → x̃.

So, starting from x_0 = T x̃ we conclude R^n T x̃ → x̃. At the same time,

R^n T x̃ = T^{nm+1} x̃ = T R^n x̃ = T x̃.

So, x̃ is a fixed point of T also. Finally, every fixed point of T is also a fixed point of R, so there can be only one such point by uniqueness for R. □

7.2. Brouwer Fixed Point theorem and its relatives.

8. Spectral theory of bounded operators

In this section we consider a complex Banach space X. Let T : X → X be a bounded linear operator.

Definition 8.1. We define the spectrum σ(T) of T as the set of points λ ∈ C such that the operator T − λI is not invertible. The complement ρ(T) = C\σ(T) is called the resolvent set of T. If λ ∈ ρ(T), the operator

R(λ, T) = (T − λI)^{−1}

is called the resolvent of T at λ.

There are two ways in which T − λI may fail to be invertible – by failing to be injective or by failing to be surjective. If it is not injective, then λ is called an eigenvalue, and we write λ ∈ σ_p(T). The set of eigenvalues σ_p(T) is called the point spectrum. The following lemma isolates one case when the resolvent is easy to compute.

Lemma 8.2. Suppose ‖T‖ < 1. Then I − T is invertible and

(50) (I − T)^{−1} = Σ_{j=0}^∞ T^j,

where the series converges absolutely in the operator norm of L(X, X).

Proof. First of all, let us note that ‖T^n‖ ≤ ‖T‖^n, and since ‖T‖ < 1, the series converges absolutely. Denote

S = Σ_{j=0}^∞ T^j, S_N = Σ_{j=0}^N T^j.

Then

S_N(I − T) = I + T + ··· + T^N − T − ··· − T^{N+1} = I − T^{N+1},

hence S_N(I − T) → I, and at the same time S_N → S. So, S(I − T) = I. Similarly, (I − T)S = I. By Lemma 2.20, I − T is invertible and (I − T)^{−1} = S, which finishes the lemma. □

A simple consequence of this lemma is that the spectrum of any operator is confined to the ball of radius ‖T‖. Indeed, if |λ| > ‖T‖, then the operator T/λ has norm < 1. Then

λI − T = λ(I − T/λ)

is invertible, and according to (50),

(51) R(λ, T) = −(λI − T)^{−1} = −Σ_{j=0}^∞ T^j / λ^{j+1}.

Next, we show that σ(T) is always closed.

Lemma 8.3. The spectrum of any bounded linear operator is a closed subset of C.

Proof. We prove this by showing that the resolvent set ρ(T) is open. Indeed, let λ_0 ∈ ρ(T), and let us fix another λ such that

|λ − λ_0| < 1/‖R(λ_0, T)‖.

Then

T − λI = T − λ_0 I − (λ − λ_0)I = (T − λ_0 I)[I − (λ − λ_0)R(λ_0, T)].

Since ‖(λ − λ_0)R(λ_0, T)‖ < 1, the above is the product of two invertible operators. Hence, the product is invertible. This establishes that λ ∈ ρ(T), and hence the entire ball B_{1/‖R(λ_0,T)‖}(λ_0) ⊂ ρ(T). □

The proof of the lemma provides an explicit formula for the resolvent in the form of a power series expansion:

R(λ, T) = [I − (λ − λ_0)R(λ_0, T)]^{−1} R(λ_0, T),

so, according to (50),

(52) R(λ, T) = Σ_{j=0}^∞ (λ − λ_0)^j R^{j+1}(λ_0, T), |λ − λ_0| < 1/‖R(λ_0, T)‖.
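As an aside, the Neumann series (50) is easy to test numerically in finite dimensions. The sketch below is not from the text: the 2×2 matrix T is invented for the illustration and chosen with ‖T‖ < 1, and the partial sums S_N of (50) are compared with the exact inverse of I − T.

```python
# Finite dimensional sanity check of the Neumann series (50);
# the 2x2 matrix T below is invented, with ||T|| < 1.
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

I = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.2, 0.1], [0.0, 0.3]]          # upper triangular, norm well below 1

# Partial sums S_N = I + T + ... + T^N of the series (50)
S, power = I, I
for _ in range(200):
    power = mul(power, T)
    S = add(S, power)

# Exact inverse of I - T by the 2x2 cofactor formula
M = add(I, scale(-1.0, T))            # M = I - T
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

# The series converges to (I - T)^{-1}
assert all(abs(S[i][j] - M_inv[i][j]) < 1e-12
           for i in range(2) for j in range(2))
print("Neumann series agrees with the inverse")
```

The same loop with T replaced by (λ − λ_0)R(λ_0, T) would test the resolvent expansion (52).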

There are several useful consequences of this formula. First, the following estimate shows that the resolvent must blow up when nearing the spectrum.

Lemma 8.4. For any λ ∈ ρ(T), we have

(53) ‖R(λ, T)‖ ≥ 1/dist{λ, σ(T)}.

Proof. Indeed, we can see from (52) that for any λ ∈ ρ(T) the ball of radius 1/‖R(λ, T)‖ around λ lies entirely in ρ(T). So, the distance to the spectrum from λ is greater than or equal to that radius: dist{λ, σ(T)} ≥ 1/‖R(λ, T)‖. This proves the result. □

Next, formula (52) demonstrates the complex analytic structure of the resolvent. Indeed, if we fix any x ∈ X and x* ∈ X*, then from (52) we have an expansion for the function f(λ) = (x*, R(λ, T)x):

f(λ) = Σ_{j=0}^∞ c_j (λ − λ_0)^j, c_j = (x*, R^{j+1}(λ_0, T)x).

It proves that f is complex analytic in the resolvent set ρ(T). This defines what is called a weakly analytic map R : ρ(T) → L(X, X). It turns out that every weakly analytic map is automatically analytic with respect to the uniform topology; in other words, the limit

R′(λ, T) = lim_{h→0} [R(λ + h, T) − R(λ, T)]/h, λ ∈ ρ(T),

exists in the operator norm of L(X). This general fact can be found in [?]. But in our case we can prove it directly from (52), because manipulations with the convergent series can be done in the same way as with complex variables. Indeed, let λ_0 ∈ ρ(T); then

[R(λ_0 + h, T) − R(λ_0, T)]/h = (1/h) Σ_{j=1}^∞ h^j R^{j+1}(λ_0, T) = Σ_{j=0}^∞ h^j R^{j+2}(λ_0, T).

Then

‖[R(λ_0 + h, T) − R(λ_0, T)]/h − R²(λ_0, T)‖ ≤ Σ_{j=1}^∞ |h|^j ‖R(λ_0, T)‖^{j+2} = |h| ‖R(λ_0, T)‖³ / (1 − |h| ‖R(λ_0, T)‖) → 0.

Hence,

(54) R′(λ, T) = R²(λ, T).

This proves uniform analyticity and gives an explicit formula for the derivative of the resolvent. Alternatively, formula (54) can be derived from the Hilbert identities, which are useful for various purposes. These state the following: for any λ, µ ∈ ρ(T),

(55) R(µ, T) − R(λ, T) = (µ − λ) R(µ, T) R(λ, T).

To see that, let us write

R(µ, T) − R(λ, T) = R(µ, T)(T − λI)R(λ, T) − R(µ, T)(T − µI)R(λ, T)
= R(µ, T)(T − λI − T + µI)R(λ, T) = (µ − λ)R(µ, T)R(λ, T).

Then (54) follows trivially from (55) and the continuity of the resolvent. We are now ready to prove that the spectrum is not empty.

Theorem 8.5. For any T ∈ L(X), we have σ(T) ≠ ∅.

Proof. Suppose, to the contrary, that σ(T) = ∅ for some T. Let us recall (51) and estimate, for |λ| > 2‖T‖,

‖R(λ, T)‖ ≤ 1/(|λ| − ‖T‖) ≤ 1/‖T‖.

At the same time, the map λ → R(λ, T) is continuous, hence bounded on {|λ| ≤ 2‖T‖}. Together with the above this implies that

sup_{λ∈C} ‖R(λ, T)‖ = M < ∞.

For any fixed x* ∈ X* and x ∈ X, the function f(λ) = x*(R(λ, T)x) is entire and bounded. By Liouville's theorem this implies that f(λ) is constant. So, in other words, for any λ, µ ∈ C and any pair of x* ∈ X* and x ∈ X,

x*(R(λ, T)x) = x*(R(µ, T)x).

This implies that R(λ, T) is independent of λ, and hence so is R^{−1} = T − λI. This is obviously not true. □

With the basic properties of the spectra developed so far, we can already describe the spectra of various operators.

Example 8.6. Let us consider the left-shift operator

T x = (x_2, x_3, ...)

on ℓ^p, 1 ≤ p ≤ ∞. Clearly 0 ∈ σ_p(T), and since ‖T‖ = 1 the whole spectrum is confined to the unit disc. Now, if |λ| < 1 we may try to construct an eigenvector corresponding to λ by considering the equation T x = λx. Read coordinate-wise, it results in an infinite system of linear equations

x_2 = λx_1, x_3 = λx_2, ...

So, if such a vector exists, it must be of the form x_n = λ^{n−1} x_1. Thus, the sequence x = (1, λ, λ², ...) will be an eigenvector, and it belongs to every ℓ^p class due to |λ| < 1 (in fact, for p = ∞ this would also serve as an eigenvector for any |λ| = 1).

We have discovered that the open disc {|λ| < 1} belongs to the spectrum. According to Lemma 8.3 the spectrum is a closed set, so the closed disc {|λ| ≤ 1} also belongs to the spectrum. On the other hand, since ‖T‖ = 1 the spectrum must be a subset of that disc. We conclude that σ(T) = {|λ| ≤ 1}.
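As a quick sanity check of the example (a sketch, not from the text: the value of λ and the truncation length are invented), one can verify on a finite truncation that x = (1, λ, λ², ...) is indeed shifted to λx, and that its ℓ¹ norm is controlled by the geometric series:

```python
# The left shift T x = (x_2, x_3, ...) and its eigenvector (1, lam, lam^2, ...)
# for |lam| < 1, checked on a finite truncation (length 30 is arbitrary).
lam = 0.5 + 0.3j
assert abs(lam) < 1

x = [lam ** n for n in range(30)]      # x_n = lam^{n-1} in 0-based indexing
Tx = x[1:]                              # the shift drops the first coordinate

# T x = lam * x coordinate-wise (on the surviving coordinates)
assert all(abs(Tx[n] - lam * x[n]) < 1e-12 for n in range(len(Tx)))

# x lies in every little-ell^p; e.g. p = 1, by the geometric series bound
assert sum(abs(v) for v in x) < 1 / (1 - abs(lam))
print("eigenvector check passed")
```

For |λ| ≥ 1 and p < ∞ the same candidate fails to be summable, consistent with the absence of eigenvalues there.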

Example 8.7. Consider now the multiplication operator on `p:

T x = (a1x1, a2x2,... ), a ∈ `∞.

Clearly, a_n ∈ σ_p(T), for the corresponding eigenvector is given by the basis vector e_n. Consequently,

{a_n}_{n=1}^∞ ⊂ σ(T),

and, since the spectrum is closed, the closure of {a_n}_{n=1}^∞ lies in σ(T) as well. Let now λ lie outside the closure of {a_n}_{n=1}^∞. This means that there exists δ > 0 such that |λ − a_n| > δ for all n. But then

(T − λI)x = ((a_1 − λ)x_1, (a_2 − λ)x_2, ...)

has a bounded inverse

(T − λI)^{−1} x = ((a_1 − λ)^{−1} x_1, (a_2 − λ)^{−1} x_2, ...).

Hence, λ ∈ ρ(T). We discover that

σ(T) is exactly the closure of {a_n}_{n=1}^∞.

8.1. Spectral Mapping Theorem and the Gelfand formula. Let us consider a polynomial

p(λ) = a_n λ^n + a_{n−1} λ^{n−1} + ··· + a_1 λ + a_0.

We can define the operator p(T) by substituting T for λ:

p(T) = a_n T^n + a_{n−1} T^{n−1} + ··· + a_1 T + a_0 I.

The natural question to ask is: what is the spectrum of this operator? This is answered by the following Spectral Mapping Theorem.

Theorem 8.8. We have the identity

σ(p(T)) = p(σ(T)).

Proof. First let us show the inclusion σ(p(T)) ⊂ p(σ(T)). Let us fix µ ∈ C and consider the polynomial p(λ) − µ. By the Fundamental Theorem of Algebra, we have a factorization

p(λ) − µ = an(λ − λ1)(λ − λ2) ... (λ − λn),

where λ1, . . . , λn are the roots. Then

(56) p(T ) − µI = an(T − λ1I)(T − λ2I) ... (T − λnI).

If all of the λ_i ∈ ρ(T), then all the factors in the product on the right hand side are invertible, and so is the product itself, p(T) − µI, by Lemma 2.21. Hence, µ ∈ ρ(p(T)). So, if µ ∈ σ(p(T)), then one of the λ_i ∈ σ(T). But p(λ_i) = µ, so µ ∈ p(σ(T)).

Let us prove the opposite inclusion

(57) p(σ(T)) ⊂ σ(p(T)).

We start from µ ∈ ρ(p(T)). Then p(T) − µI is invertible. Note that all operators in the product (56) commute. Then by Lemma 2.22, see also Exercise 2.2, each factor T − λ_i I is invertible. Hence, all λ_i ∈ ρ(T). If it were the case that µ ∈ p(σ(T)), then µ = p(λ) for some λ ∈ σ(T). But then λ would be a root of p(λ) − µ. We just established that all such roots are in ρ(T), which is a contradiction. So, µ is in the complement of p(σ(T)). In other words, ρ(p(T)) ⊂ C\p(σ(T)), which is equivalent to (57). □

We now define the spectral radius by

r(T) = sup_{λ∈σ(T)} |λ|.

We prove Gelfand's formula.

Theorem 8.9. The following formula holds for any T ∈ L(X):

(58) r(T) = lim_{n→∞} ‖T^n‖^{1/n}.

So far, we only know that r(T) ≤ ‖T‖. Since σ(T^n) = σ(T)^n, we have r(T^n) = r(T)^n. So, according to the above, r(T) ≤ ‖T^n‖^{1/n} for all n ∈ N. Hence,

r(T) ≤ lim inf_{n→∞} ‖T^n‖^{1/n} ≤ lim sup_{n→∞} ‖T^n‖^{1/n}.

It remains to prove the bound

(59) lim sup_{n→∞} ‖T^n‖^{1/n} ≤ r(T).

Let us recall the power series representation (51) of the resolvent, valid for |λ| > ‖T‖, which we rewrite in the variable ξ = 1/λ:

(60) R_ξ = −ξ Σ_{n=0}^∞ ξ^n T^n.

Then we have

‖ξ Σ_{n=0}^∞ ξ^n T^n‖ ≤ x Σ_{n=0}^∞ x^n ‖T^n‖, x = |ξ|.

From calculus we recall that the power series above converges for all x < r_0, where

(61) 1/r_0 = lim sup_{n→∞} ‖T^n‖^{1/n}.

So, for all |ξ| < r_0 the power series converges absolutely, and therefore defines an analytic function on the ball of radius r_0. At the same time, R_ξ defines an analytic function on the ball of radius 1/r(T). And both functions coincide on the ball of radius 1/‖T‖. This means that both functions coincide up to their common radius of analyticity

r_1 = min{1/r(T), r_0}. Assume now that 1/r(T) > r_0. Then formula (60) holds up to |ξ| < r_0. If we consider the series itself,

f(ξ) = Σ_{n=0}^∞ ξ^n T^n,

it shows that f(ξ) has an analytic extension beyond r_0, say up to the ball of radius r_0 + 2ε, where f remains bounded. Then we can compute the coefficients of the series by the Cauchy formula

T^n = (1/2πi) ∫_{|ξ|=r_0−ε} f(ξ)/ξ^{n+1} dξ = (1/2πi) ∫_{|ξ|=r_0+ε} f(ξ)/ξ^{n+1} dξ,

due to the fact that the integrand is analytic in the annulus r_0 − ε ≤ |ξ| ≤ r_0 + ε. Given that ‖f(ξ)‖ ≤ M, we obtain the estimate

‖T^n‖ ≤ M/(r_0 + ε)^n,

for all n ∈ N. So,

lim sup_{n→∞} ‖T^n‖^{1/n} ≤ 1/(r_0 + ε) < 1/r_0,

in contradiction with (61). This proves that 1/r(T) ≤ r_0, which is what we aimed for.

8.2. On the spectrum of self-adjoint operators. The goal of this section is to establish special spectral properties for the classes of operators we defined on a Hilbert space. So, we fix a Hilbert space H with an inner product ⟨·,·⟩. We start with a description of the spectrum of self-adjoint operators.

Lemma 8.10. Let T ∈ L(H) be a self-adjoint operator, T = T*. Then σ(T) ⊂ R.

Proof. First, let us make an observation known already from linear algebra: if λ ∈ σ_p(T), then λ ∈ R. Indeed, if T x = λx, scalar multiplying with x and using Lemma 4.21 we obtain

⟨T x, x⟩ = λ‖x‖² ∈ R,

and hence λ ∈ R. Next, we prove the following general claim.

Claim 8.11. Denote Tλ = T − λI. If Tλ is bounded from below

(62) kTλxk ≥ c0kxk,

then Tλ is invertible.

Proof. From Lemma 2.23 we learned that Rg T_λ = Y is closed. Suppose that Y ≠ H. Then consider a point x_0 ∈ Y^⊥. We have

0 = ⟨x_0, T_λ x⟩ = ⟨T_λ̄ x_0, x⟩, for all x ∈ H.

So, T_λ̄ x_0 = 0, which means that λ̄ is an eigenvalue, and hence λ̄ ∈ R and λ = λ̄. But then T_λ x_0 = 0, which contradicts the assumption (62). □

Now let us fix λ = α + iβ with β ≠ 0 and show that λ ∈ ρ(T). Indeed, from Lemma 4.21, ⟨T x, x⟩ ∈ R. Thus,

Im⟨T_λ x, x⟩ = −β‖x‖².

Then

|β|‖x‖² = |Im⟨T_λ x, x⟩| ≤ |⟨T_λ x, x⟩| ≤ ‖T_λ x‖‖x‖,

which implies (62). Hence, T_λ is invertible, and λ ∈ ρ(T). □

We now give more precise information about the spectrum.

Lemma 8.12. Let T ∈ L(H) be a self-adjoint operator. Then

(63) σ(T) ⊂ closure{⟨T x, x⟩ : x ∈ S(H)}.

Proof. Let us fix λ outside that closure. This means in particular that there is a positive distance between λ and the set of values ⟨T x, x⟩:

δ = inf_{x∈S(H)} |λ − ⟨T x, x⟩| > 0.

Then, for any x ∈ S(H),

|⟨T_λ x, x⟩| = |⟨T x, x⟩ − λ| ≥ δ.

Then, by the Cauchy-Schwarz inequality,

‖T_λ x‖ ≥ |⟨T_λ x, x⟩| ≥ δ.

This implies by normalization that for any x ≠ 0,

‖T_λ x‖ ≥ δ‖x‖,

and Claim 8.11 applies to give λ ∈ ρ(T). Hence, (63) follows. □

Next we prove a formula for the norm of a self-adjoint operator in terms of the same set of scalar products ⟨T x, x⟩.

Lemma 8.13. We have

(64) ‖T‖ = sup_{‖x‖=1} |⟨T x, x⟩|.

Proof. Let us recall that

‖T‖ = sup_{x,y∈S(H)} |⟨T x, y⟩|.

Let us pick x, y ∈ S(H) such that |⟨T x, y⟩| > ‖T‖ − ε. We can certainly assume that ⟨T x, y⟩ > 0. Then, let us observe the identity

⟨T x, y⟩ = (1/4)(⟨T(x + y), x + y⟩ − ⟨T(x − y), x − y⟩).

Denoting K = sup_{‖x‖=1} |⟨T x, x⟩|, we then have

⟨T x, y⟩ ≤ (K/4)(‖x + y‖² + ‖x − y‖²).

By the Parallelogram Rule (26), ‖x + y‖² + ‖x − y‖² = 4, so we obtain ⟨T x, y⟩ ≤ K. This proves the result. □

Next we prove that the end-points of the set of values hT x, xi lie in the spectrum. Lemma 8.14. Let

s_1 = inf_{‖x‖=1} ⟨T x, x⟩, s_2 = sup_{‖x‖=1} ⟨T x, x⟩.

Then s1, s2 ∈ σ(T ). Since according to (64)

‖T‖ = max{|s_1|, |s_2|}, this proves that the spectral radius is in fact equal to the norm of the operator: r(T) = ‖T‖.

Proof. We focus on proving that s_2 ∈ σ(T), the argument for s_1 being similar. Let us pick a constant L > 0 large enough so that

⟨(T + LI)x, x⟩ > 0, for all x ≠ 0.

If we can prove that s_2(T + LI) ∈ σ(T + LI), then by the Spectral Mapping Theorem 8.8 we achieve the result. In other words, without loss of generality we can assume that s_1 > 0. In this case ‖T‖ = s_2. Let us pick a sequence x_n ∈ S(H) so that

⟨T x_n, x_n⟩ → ‖T‖.

Then

‖T x_n − ‖T‖x_n‖² = ‖T x_n‖² − 2‖T‖⟨T x_n, x_n⟩ + ‖T‖² ≤ 2‖T‖² − 2‖T‖⟨T x_n, x_n⟩ → 0.

This means that the operator T − ‖T‖I is not bounded from below, and as a consequence cannot be invertible. Hence, ‖T‖ ∈ σ(T). □

Let us collect the obtained results in one statement for self-adjoint operators.

Theorem 8.15. Let T ∈ L(H) be a self-adjoint operator. Then

σ(T) ⊂ closure{⟨T x, x⟩ : x ∈ S(H)},

inf_{‖x‖=1} ⟨T x, x⟩, sup_{‖x‖=1} ⟨T x, x⟩ ∈ σ(T),

‖T‖ = r(T) = sup_{‖x‖=1} |⟨T x, x⟩|.

Let us note that the spectrum of a self-adjoint operator is not, in fact, the same as the set {⟨T x, x⟩ : x ∈ S(H)}. This is clear even in the finite dimensional case. In this case we have an orthonormal basis of eigenvectors

T xi = λixi.

Thus, σ(T) = {λ_i}_i. Yet, writing x = Σ_i c_i x_i,

⟨T x, x⟩ = Σ_i λ_i |c_i|²,

where Σ_i |c_i|² = 1. In other words, the set of scalar products in this case consists of the convex combinations of the spectrum:

{⟨T x, x⟩ : x ∈ S(H)} = [min σ(T), max σ(T)].
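As an aside, the statements of Theorem 8.15 are easy to verify in the simplest finite dimensional setting. The sketch below is not from the text: the 2×2 real symmetric matrix is invented for the illustration, and ⟨Tx, x⟩ is sampled over the unit circle.

```python
import math

# A 2x2 real symmetric matrix T = [[a, b], [b, d]]: self-adjoint on R^2.
a, b, d = 2.0, 1.0, -1.0

# Its eigenvalues are the roots of t^2 - (a+d) t + (ad - b^2); both real.
disc = math.sqrt((a - d) ** 2 + 4 * b * b)
lam_min, lam_max = (a + d - disc) / 2, (a + d + disc) / 2

# Sample the numerical range {<Tx, x> : ||x|| = 1} over the unit circle.
vals = []
for k in range(20000):
    t = 2 * math.pi * k / 20000
    x1, x2 = math.cos(t), math.sin(t)
    tx1, tx2 = a * x1 + b * x2, b * x1 + d * x2
    vals.append(tx1 * x1 + tx2 * x2)

# The end-points of the numerical range are the edges of the spectrum ...
assert abs(min(vals) - lam_min) < 1e-3 and abs(max(vals) - lam_max) < 1e-3
# ... and ||T|| = r(T) = sup |<Tx, x>| for self-adjoint T
assert abs(max(abs(v) for v in vals) - max(abs(lam_min), abs(lam_max))) < 1e-3
print("numerical range matches [min sigma, max sigma]")
```

The sampled values fill out the whole interval [min σ(T), max σ(T)], illustrating the convex-combination formula above.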

8.3. On the spectrum of unitary operators. For unitary operators U ∈ L(H), U* = U^{−1}, the spectral theory goes in parallel with the self-adjoint one. In fact, the relationship between the two is very similar to that between angles and unimodular complex numbers expressed by the Euler formula z = e^{iθ}. Indeed, if we plug in T for θ, we can define

U = e^{iT} := Σ_{j=0}^∞ (iT)^j / j!.

One can easily check that the series converges absolutely and U* = e^{−iT*}. So, if T = T*, then U* = e^{−iT}. By superimposing the two converging series one further observes that e^{iT} ∘ e^{−iT} = e^{−iT} ∘ e^{iT} = I. So, U* = U^{−1}. This exponential relationship between self-adjoint and unitary operators translates into the corresponding properties of the spectra. In fact, an extended Spectral Mapping Theorem would tell us that σ(U) = e^{iσ(T)}, and hence |σ(U)| = {1}. We can prove exactly that directly.

Lemma 8.16. If U ∈ L(H) is unitary then σ(U) ⊂ {λ ∈ C : |λ| = 1}. Proof. Let |λ| > 1 first. Then considering x ∈ S(H), we have

|hUλx, xi| = |hUx, xi − λ| ≥ |λ| − 1 = δ > 0. Hence, by the Cauchy-Schwartz inequality

kUλxk ≥ δ. For |λ| < 1 we compute

|hUλx, Uxi| = |hUx, Uxi − λhx, Uxi| ≥ 1 − |λ| = δ > 0.

So, the same lower bound holds for U_λ. In either case, U_λ is bounded from below.

It remains to show that U_λ is surjective. Indeed, if not, then consider x_0 ∈ (Rg U_λ)^⊥. Then (U*)_λ̄ x_0 = 0. But U* is also unitary, hence, from what we have established above, (U*)_λ̄ must be bounded from below, which is a contradiction. □

8.4. Exercises.

Exercise 8.1. Describe the spectrum of the right shift operator

T x = (0, x1, x2,... ) on `p, 1 ≤ p < ∞. Exercise 8.2. Describe the spectrum of the multiplication operator T f(x) = h(x)f(x), on the space C[a, b]. Here h is assumed to be continuous too, h ∈ C[a, b].

Exercise 8.3. An operator T is called nilpotent if T^m = 0 for some m ∈ N. Describe the spectrum of any nilpotent operator.

Exercise 8.4. Describe the spectrum of any projection operator P : X → X.

Exercise 8.5. Show that if ST = TS, then r(ST ) ≤ r(S)r(T ).