
MATH 3210 Metric Spaces

University of Leeds, School of Mathematics. November 29, 2017

Syllabus:
1. Definition and fundamental properties of a metric space. Open sets, closed sets, closure and interior. Convergence of sequences. Continuity of mappings. (6)
2. Real inner-product spaces, orthonormal sequences, the perpendicular to a subspace, applications in approximation theory. (7)
3. Cauchy sequences, completeness of R with the standard metric; uniform convergence and completeness of C[a, b] with the uniform metric. (3)
4. The contraction mapping theorem, with applications in the solution of equations and differential equations. (5)
5. Connectedness and path-connectedness. Introduction to compactness and sequential compactness, including subsets of Rⁿ. (6)

LECTURE 1

Books: Victor Bryant, Metric spaces: iteration and application, Cambridge, 1985. M. Ó Searcóid, Metric Spaces, Springer Undergraduate Mathematics Series, 2006. D. Kreider, An introduction to linear analysis, Addison-Wesley, 1966.

1 Metrics, open and closed sets

We want to generalise the idea of distance between two points on the real line, given by d(x, y) = |x − y|, and the distance between two points in the plane, given by

d(x, y) = d((x1, x2), (y1, y2)) = √((x1 − y1)² + (x2 − y2)²),

to other settings.

[DIAGRAM]

This will include the idea of distances between functions, for example.

1.1 Definition
Let X be a non-empty set. A metric on X, or distance function, associates to each pair of elements x, y ∈ X a real number d(x, y) such that
(i) d(x, y) ≥ 0; and d(x, y) = 0 ⇐⇒ x = y (positive definite);
(ii) d(x, y) = d(y, x) (symmetric);
(iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

Examples: (i) X = R. The standard metric is given by d(x, y) = |x − y|. There are many other metrics on R, for example

d(x, y) = |ex − ey|;

d(x, y) = |x − y| if |x − y| ≤ 1, 1 if |x − y| ≥ 1.
Let X be any set whatsoever; then we can define

d(x, y) = 1 if x ≠ y, 0 if x = y (the discrete metric).

(ii) X = R². The standard metric is the Euclidean metric: if x = (x1, x2) and y = (y1, y2) then

d2(x, y) = √((x1 − y1)² + (x2 − y2)²).

This is linked to the inner-product (scalar product), x.y = x1y1 + x2y2, since it is just √((x − y).(x − y)). We will study inner products more carefully later, so for the moment we won’t prove the (well-known) fact that it is indeed a metric. Other possible metrics include

d∞(x, y) = max{|x1 − y1|, |x2 − y2|}.

Let’s check the axioms. In fact (i) and (ii) are easy (i.e., the distance is positive definite, symmetric); for (iii) let’s write |x1 −y1| = p, |x2 −y2| = q, |y1 −z1| = r and |y2 −z2| = s. Then |x1 − z1| ≤ p + r and |x2 − z2| ≤ q + s; so

d∞(x, z) = max{|x1 − z1|, |x2 − z2|} ≤ max{p + r, q + s}

≤ max{p, q} + max{r, s} = d∞(x, y) + d∞(y, z),

the middle inequality holding by inspection.

Another metric on R² comes from d1(x, y) = |x1 − y1| + |x2 − y2|. These metrics are all translation-invariant (i.e., d(x + z, y + z) = d(x, y)) and homogeneous (i.e., d(kx, ky) = |k| d(x, y)).

(iii) Take X = C[a, b]. Here are three metrics:

d2(f, g) = √( ∫_a^b (f(x) − g(x))² dx ).

Again, this is linked to the idea of an inner product, so we will delay proving that it is a metric.

d1(f, g) = ∫_a^b |f(x) − g(x)| dx, the area between two curves [DIAGRAM].

d∞(f, g) = max{|f(x) − g(x)| : a ≤ x ≤ b}, the maximum separation between two curves. [DIAGRAM].

Example: on C[0, 1] take f(x) = x and g(x) = x2 and calculate

d2(f, g) = √( ∫_0^1 (x − x²)² dx ) = √(1/30),
d1(f, g) = ∫_0^1 |x − x²| dx = 1/6, and
d∞(f, g) = max{|x − x²| : x ∈ [0, 1]} = 1/4.
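The three distances above can be checked numerically. This sketch is an addition to the notes; the helper `riemann` and the step counts are my own choices, with the integrals approximated by the midpoint rule:

```python
import math

# Numerical check of the distances between f(x) = x and g(x) = x^2 on [0, 1].

def riemann(h, a, b, n=100_000):
    # midpoint-rule approximation of the integral of h over [a, b]
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: x
g = lambda x: x * x

d2 = math.sqrt(riemann(lambda x: (f(x) - g(x)) ** 2, 0, 1))
d1 = riemann(lambda x: abs(f(x) - g(x)), 0, 1)
dinf = max(abs(f(i / 10_000) - g(i / 10_000)) for i in range(10_001))

print(d2)    # ≈ √(1/30) ≈ 0.1826
print(d1)    # ≈ 1/6
print(dinf)  # 1/4, attained at x = 1/2
```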

1.2 Definition
A set X together with a metric d is called a metric space, sometimes written (X, d). If A ⊆ X then we can use d to measure distances between points of A, and (A, d) is also a metric space, called a subspace of (X, d).

LECTURE 2

Examples: 1. The interval [a, b] with d(x, y) = |x − y| is a subspace of R.
2. The unit circle {(x1, x2) ∈ R² : x1² + x2² = 1} with d(x, y) = √((x1 − y1)² + (x2 − y2)²) is a subspace of R².
3. The space of polynomials P is a metric space with any of the metrics inherited from C[a, b] above.

1.3 Definition

Let (X, d) be a metric space, let x ∈ X and let r > 0. The open ball centred at x, with radius r, is the set B(x, r) = {y ∈ X : d(x, y) < r}, and the closed ball is the set

B[x, r] = {y ∈ X : d(x, y) ≤ r}.

Note that in R with the usual metric the open ball is B(x, r) = (x − r, x + r), an open interval, and the closed ball is B[x, r] = [x − r, x + r], a closed interval.

For the d2 metric on R², the unit ball B(0, 1) is the open disc centred at the origin, excluding the boundary. You may like to think about what you get for other metrics on R².

1.4 Definition
A subset U of a metric space (X, d) is said to be open if for each x ∈ U there is an r > 0 such that the open ball B(x, r) is contained in U (“room to swing a cat”).

Clearly X itself is an open set, and by convention the empty set ∅ is also considered to be open.

1.5 Proposition Every “open ball” B(x, r) is an open set.

Proof: If y ∈ B(x, r), choose δ = r − d(x, y) > 0. We claim that B(y, δ) ⊂ B(x, r). If z ∈ B(y, δ), i.e., d(z, y) < δ, then by the triangle inequality

d(z, x) ≤ d(z, y) + d(y, x) < δ + d(x, y) = r.

So z ∈ B(x, r).

1.6 Definition A subset F of (X, d) is said to be closed, if its complement X \ F is open.

Note that closed does not mean “not open”. In a metric space the sets ∅ and X are both open and closed. In R we have: (a, b) is open. [a, b] is closed, since its complement (−∞, a) ∪ (b, ∞) is open. [a, b) is not open, since there is no open ball B(a, r) contained in the set. Nor is it closed, since its complement (−∞, a) ∪ [b, ∞) isn’t open (no ball centred at b can be contained in the set).

1.7 Example

If we take the discrete metric,

d(x, y) = 1 if x ≠ y, 0 if x = y,

then each one-point set {x} = B(x, 1/2), so it is an open set. Hence every set U is open, since for each x ∈ U we have B(x, 1/2) ⊆ U.

Hence, by taking complements, every set is also closed.

1.8 Proposition

In a metric space, every one-point set {x0} is closed.

Proof: We need to show that the set U = {x ∈ X : x 6= x0} is open, so take a point x ∈ U. Now d(x, x0) > 0, and the ball B(x, r) is contained in U for every 0 < r < d(x, x0). [DIAGRAM]

1.9 Theorem

Let (Uα)α∈A be any collection of open subsets of a metric space (X, d) (not necessarily finite!). Then ⋃_{α∈A} Uα is open. Let U and V be open subsets of a metric space (X, d). Then U ∩ V is open. Hence (by induction) any finite intersection of open subsets is open.

Proof: If x ∈ ⋃_{α∈A} Uα then there is an α with x ∈ Uα. Now Uα is open, so B(x, r) ⊂ Uα for some r > 0. Then B(x, r) ⊂ ⋃_{α∈A} Uα, so the union is open.

If now U and V are open and x ∈ U ∩ V , then ∃r > 0 and s > 0 such that B(x, r) ⊂ U and B(x, s) ⊂ V , since U and V are open. Then B(x, t) ⊂ U ∩ V if t ≤ min(r, s). [DIAGRAM.]

So the collection of open sets is preserved by arbitrary unions and finite intersections.

However, an arbitrary intersection of open sets is not always open; for example (−1/n, 1/n) is open for each n = 1, 2, 3, ..., but ⋂_{n=1}^∞ (−1/n, 1/n) = {0}, which is not an open set.

LECTURE 3

For closed sets we swap union and intersection.

1.10 Theorem

Let (Fα)α∈A be any collection of closed subsets of a metric space (X, d) (not necessarily finite!). Then ⋂_{α∈A} Fα is closed. Let F and G be closed subsets of a metric space (X, d). Then F ∪ G is closed. Hence (by induction) any finite union of closed subsets is closed.

To prove this we recall de Morgan’s laws. We use the notation Sc for the complement X \ S of a set S ⊂ X.

x ∉ ⋃_α Aα ⇐⇒ x ∉ Aα for all α, so (⋃_α Aα)^c = ⋂_α Aα^c.
x ∉ ⋂_α Aα ⇐⇒ x ∉ Aα for some α, so (⋂_α Aα)^c = ⋃_α Aα^c.

Proof: Write Uα = Fα^c = X \ Fα, which is open. So ⋃_{α∈A} Uα is open by Theorem 1.9. Now, by de Morgan’s laws, (⋂_{α∈A} Fα)^c = ⋃_{α∈A} Fα^c, which is just ⋃_{α∈A} Uα. Since its complement is open, ⋂_{α∈A} Fα is closed. Similarly, the complement of F ∪ G is F^c ∩ G^c, which is the intersection of two open sets and hence open by Theorem 1.9. Hence F ∪ G is closed.

Infinite unions of closed sets do not need to be closed. An example is ⋃_{n=1}^∞ [1/n, ∞) = (0, ∞), which is open but not closed.

1.11 Definition
The closure of S, written S̄, is the smallest closed set containing S: it is closed, and it is contained in all other closed sets containing S. Also S is dense if S̄ = X. A smallest closed set containing S does exist, because we can define

S̄ = ⋂ {F : F ⊃ S, F closed},

the intersection of all closed sets containing S. There is at least one such F, namely X itself.

1.12 Example in R The closure of S = [0, 1) is [0, 1]. This is closed, and there is nothing smaller that is closed and contains S.

1.13 Theorem

The set Q of rationals is dense in R, with the usual metric.

Proof: Suppose that F is a closed subset of R which contains Q: we claim that F = R. For U = R \ F is open and contains no points of Q. But an open set U (unless it is empty) must contain an interval B(x, r) for some x ∈ U, and hence a rational number. Our only conclusion is that U = ∅ and F = R, so that Q̄ = R.

1.14 Proposition

Let S ⊂ X. Then:
(i) S ⊂ S̄.
(ii) S̄ = S ⇐⇒ S is closed (so S̄ equals its own closure).
(iii) S ⊂ T ⇒ S̄ ⊂ T̄.
(iv) ∅̄ = ∅, X̄ = X.
(v) (S ∪ T)̄ = S̄ ∪ T̄.
(vi) (S ∩ T)̄ ⊂ S̄ ∩ T̄.

Proof: All these are quite easy except (v) and (vi) (CHECK).

For (v) note that S ⊂ S̄ and T ⊂ T̄, so S ∪ T ⊂ S̄ ∪ T̄, which is closed; hence (S ∪ T)̄ ⊂ S̄ ∪ T̄. Also S ⊂ S ∪ T and T ⊂ S ∪ T, so S̄ ⊂ (S ∪ T)̄ and T̄ ⊂ (S ∪ T)̄, giving S̄ ∪ T̄ ⊂ (S ∪ T)̄. So the two sets are equal.

For (vi), we have S ∩ T ⊂ S̄ and S ∩ T ⊂ T̄, and S̄ ∩ T̄ is closed, so (S ∩ T)̄ ⊂ S̄ ∩ T̄.

But we do not need to have equality; for example take X = R, S = (0, 1), T = (1, 2). Then (S ∩ T)̄ = ∅̄ = ∅, whereas S̄ ∩ T̄ = [0, 1] ∩ [1, 2] = {1}.

1.15 Definition We say that V is a neighbourhood (nhd) of x if there is an open set U such that x ∈ U ⊆ V ; this means that ∃δ > 0 s.t. B(x, δ) ⊆ V . Thus a set is open precisely when it is a neighbourhood of each of its points.

1.16 Example The half-open interval [0, 1) is a neighbourhood of every point in it except for 0.

1.17 Theorem
For a subset S of a metric space X, we have x ∈ S̄ iff V ∩ S ≠ ∅ for all nhds V of x (i.e., all neighbourhoods of x meet S).

Proof: If there is a neighbourhood of x that doesn’t meet S, then there is an open subset U with x ∈ U and U ∩ S = ∅. [DIAGRAM?] But then X \ U is a closed set containing S, and so S̄ ⊂ X \ U; then x ∉ S̄ because x ∈ U. Conversely, if every neighbourhood of x does meet S, then x ∈ S̄, as otherwise X \ S̄ is an open neighbourhood of x that doesn’t meet S.

LECTURE 4

1.18 Definition

The interior of S, int S, is the largest open set contained in S, and can be written as

int S = ⋃ {U : U ⊂ S, U open},

the union of all open sets contained in S. There is at least one such U, namely ∅.

We see that S is open exactly when S = int S, otherwise int S is smaller.

1.19 Examples in R
int [0, 1) = (0, 1); clearly this is open and there is no larger open set contained in [0, 1).
int Q = ∅, since any non-empty open set must contain an interval B(x, r), and then it contains an irrational number, so isn’t contained in Q.

1.20 Proposition
int S = X \ (X \ S)̄.

Proof: By de Morgan’s laws,

int S = ⋃ {U : U ⊂ S, U open}
= X \ ⋂ {U^c : U ⊂ S, U open}
= X \ ⋂ {F : F ⊃ X \ S, F closed}
= X \ (X \ S)̄.

This is because U ⊂ S if and only if U^c = X \ U ⊃ X \ S. Also F = U^c is closed precisely when U is open. That is, there is a correspondence between open sets contained in S and closed sets containing its complement.

1.21 Corollary (i) int S ⊂ S. (ii) int S = S ⇐⇒ S is open. (iii) S ⊂ T ⇒ int S ⊂ int T . (iv) int (int S) = int S. (v) int(S ∪ T ) ⊃ int S ∪ int T . (vi) int(S ∩ T ) = int S ∩ int T .

Proof: Easy, or take complements and use Prop’s 1.14 and 1.20.

1.22 Definition
The boundary or frontier of S is ∂S = S̄ \ int S = S̄ ∩ (X \ S)̄. This writes ∂S as the intersection of two closed sets, so it is also closed.

1.23 Examples in R

For S = [0, 1) we have int S = (0, 1) and S̄ = [0, 1], so ∂S = {0, 1}.

For S = Q we have int S = ∅ and S̄ = R, so ∂S = R.

1.24 Examples in R²
For S = {(x, y) : x² + y² < 1}, we have int S = S and S̄ = {(x, y) : x² + y² ≤ 1}, so ∂S is the circle {(x, y) : x² + y² = 1}.

For S = [0, 1) regarded as the subset {(x, y) : 0 ≤ x < 1, y = 0} of R², we have S̄ = {(x, y) : 0 ≤ x ≤ 1, y = 0} and int S = ∅, so ∂S = S̄.

2 Convergence and continuity

Let (xn) be a sequence in a metric space (X, d), i.e., x1, x2, .... (Sometimes we may start counting at x0.)

2.1 Definition

We say xn → x (i.e., xn tends to x or converges to x) if d(xn, x) → 0 as n → ∞. That is, for all ε > 0 there is an N such that d(xn, x) < ε for n ≥ N (“for n sufficiently large”).

This is the usual notion of convergence if we think of points in Rᵐ with the Euclidean metric.

2.2 Theorem

(i) The sequence (xn) tends to x if and only if for every open U with x ∈ U, ∃n0 s.t. xn ∈ U for all n ≥ n0.

(ii) Let S be a subset of the metric space X. Then x ∈ S̄ if and only if there is a sequence (xn) of points of S with xn → x.

Proof: (i) If xn → x and x ∈ U, then there is a ball B(x, ε) ⊂ U, since U is open. But xn → x so d(xn, x) < ε for n sufficiently large, i.e., xn ∈ U for n sufficiently large.

Conversely, if the “open set” condition works, and ε > 0, choose U = B(x, ε). Then xn ∈ U for n sufficiently large, and so d(xn, x) < ε for n large.

(ii) If x ∈ S̄, then for each n we have B(x, 1/n) ∩ S ≠ ∅ by Theorem 1.17. So choose xn ∈ B(x, 1/n) ∩ S. Clearly d(xn, x) → 0, i.e., xn → x.

Conversely, if x ∉ S̄, then there is a neighbourhood U of x with U ∩ S = ∅. Now no sequence in S can get into U, so it cannot converge to x.

2.3 Examples

1. Take (R², d1), where d1(x, y) = |x1 − y1| + |x2 − y2|, with x = (x1, x2) and y = (y1, y2), and consider the sequence (1/n, (2n + 1)/(n + 1)). We guess its limit is (0, 2). To see if this is right, look at

d1((1/n, (2n + 1)/(n + 1)), (0, 2)) = 1/n + |(2n + 1)/(n + 1) − 2| = 1/n + 1/(n + 1) → 0

as n → ∞. So the limit is (0, 2).

LECTURE 5

2. In C[0, 1] let fn(t) = tⁿ and f(t) = 0 for 0 ≤ t ≤ 1. Does fn → f, (a) in d1, and (b) in d∞?

(a)

d1(fn, f) = ∫_0^1 tⁿ dt = 1/(n + 1) → 0

as n → ∞. So fn → f in d1.

(b)

d∞(fn, f) = max{tⁿ : 0 ≤ t ≤ 1} = 1 ↛ 0

as n → ∞. So fn ↛ f in d∞.

Note: we say gn → g pointwise on [a, b] as n → ∞ if gn(x) → g(x) for all x ∈ [a, b]. If we define g(x) = 0 for 0 ≤ x < 1 and g(1) = 1, then fn → g pointwise on [0, 1]. But g ∉ C[0, 1], as it is not continuous at 1.
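The contrast between the two modes of convergence can be seen numerically. The sketch below is an addition (the helper names `d1` and `dinf` are mine); it approximates d1(fn, f) by a midpoint rule and d∞(fn, f) on a grid:

```python
# fn(t) = t**n and f = 0 on [0, 1]: d1(fn, f) -> 0 while d∞(fn, f) stays at 1.

def d1(n, steps=100_000):
    # midpoint rule for the integral of t**n over [0, 1]; exact value is 1/(n + 1)
    dx = 1.0 / steps
    return sum(((i + 0.5) * dx) ** n for i in range(steps)) * dx

def dinf(n, steps=10_000):
    # maximum of t**n over a grid on [0, 1]; the true maximum is at t = 1
    return max((i / steps) ** n for i in range(steps + 1))

for n in (1, 10, 100):
    print(n, d1(n), dinf(n))   # d1 shrinks like 1/(n+1); dinf is always 1.0
```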

3. Take the discrete metric d0(x, y) = 1 if x ≠ y, 0 if x = y.

Then xn → x ⇐⇒ d0(xn, x) → 0. But since d0(xn, x) = 0 or 1, this happens if and only if d0(xn, x) = 0 for n sufficiently large. That is, there is an n0 such that xn = x for all n ≥ n0. All convergent sequences in this metric are eventually constant. So, for example, d0(1/n, 0) ↛ 0.

A result on convergence in R².

2.4 Proposition

Take R² with any of the metrics d1, d2 and d∞. Then a sequence xn = (an, bn) converges to x = (a, b) if and only if an → a and bn → b.

Proof: We have d1(xn, x) = |an − a| + |bn − b|. This tends to zero as n → ∞ if and only if each of the terms |an − a| and |bn − b| does. And that’s the same as saying that an → a and bn → b.

Also d2(xn, x) = (|an − a|² + |bn − b|²)^{1/2}, which tends to zero if and only if |an − a|² + |bn − b|² does; this happens if and only if |an − a|² and |bn − b|² tend to zero, which is the same as an → a and bn → b.

Finally, d∞(xn, x) = max{|an − a|, |bn − b|}. If this tends to zero then so do |an − a| and |bn − b|, as they are no larger and still nonnegative; and if they both tend to zero then so does their maximum, which is at most their sum. Again this is the same as saying an → a and bn → b.

A similar result holds for Rᵏ in general.

Now let’s look at continuous functions again.

2.5 Theorem

If fn → f in (C[a, b], d∞), then fn → f in (C[a, b], d1).

(d∞ convergence is stronger than d1 convergence.)

Proof: d∞(fn, f) = max{|fn(x) − f(x)| : a ≤ x ≤ b} → 0 as n → ∞, so, given ε > 0 there is an N so that d∞(fn, f) < ε for n ≥ N. It follows that if n ≥ N then

d1(fn, f) = ∫_a^b |fn(x) − f(x)| dx ≤ ∫_a^b ε dx = ε(b − a),

so d1(fn, f) → 0 as n → ∞.

Note: It is also true that if d∞(fn, f) → 0 then fn → f pointwise on [a, b]. The converse is FALSE.

Now we look at continuous functions between general metric spaces.

2.6 Definition

Let f :(X, dX ) → (Y, dY ) be a map between metric spaces. We say that f is continuous at x0 ∈ X if for each ε > 0 there is a δ > 0 such that dY (f(x), f(x0)) < ε whenever dX (x, x0) < δ.

So f is continuous if it is continuous at all points of X.

2.7 Proposition

For f as above, f is continuous at x0 if and only if, whenever a sequence xn → x0, we have f(xn) → f(x0) (“sequential continuity”).

Proof: Same proof as in real analysis, more or less. Suppose f is continuous at x0 and xn → x0. Then for each ε > 0 we have a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ. Then there’s an n0 with d(xn, x0) < δ for all n ≥ n0, and so d(f(xn), f(x0)) < ε for all n ≥ n0. Thus f(xn) → f(x0).

Conversely, if f is not continuous at x0, then there is an ε for which no δ will do, so we can find xn with d(xn, x0) < 1/n but d(f(xn), f(x0)) ≥ ε. Then xn → x0 but f(xn) ↛ f(x0).
But there is a nicer way to define continuity. For a mapping f : X → Y and a set U ⊂ Y, let f⁻¹(U) be the set

f⁻¹(U) = {x ∈ X : f(x) ∈ U}.

This makes sense even if f⁻¹ is not defined as a function.

2.8 Theorem
A function f : X → Y is continuous if and only if f⁻¹(U) is open in X for every open subset U ⊂ Y.

(“The inverse image of an open set is open.” Note that for f continuous we do not expect f(U) to be open for all open subsets of X, for example f : R → R, f ≡ 0, then f(R) = {0}, not open.)

LECTURE 6

Proof: Suppose that f is continuous, that U is open, and that x0 ∈ f⁻¹(U), so f(x0) ∈ U. Now there is a ball B(f(x0), ε) ⊂ U, since U is open, and then by continuity there is a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ. This means that for d(x, x0) < δ, f(x) ∈ U and so x ∈ f⁻¹(U). That is, f⁻¹(U) is open. [DIAGRAM]

Conversely, if the inverse image of an open set is open, and x0 ∈ X, let ε > 0 be given. We know that B(f(x0), ε) is open, so f⁻¹(B(f(x0), ε)) is open, and contains x0. So it contains some B(x0, δ) with δ > 0.

But now if d(x, x0) < δ, we have x ∈ B(x0, δ) ⊂ f⁻¹(B(f(x0), ε)), so f(x) ∈ B(f(x0), ε) and we have d(f(x), f(x0)) < ε.

2.9 Example

Let X = R with the discrete metric, and Y any metric space. Then all functions f : X → Y are continuous!

(i) Because the inverse image of an open set is an open set, since all sets are open. (ii) Because whenever xn → x0 we have xn = x0 for n large, so obviously f(xn) → f(x0).

2.10 Proposition
(i) A function f : X → Y is continuous if and only if f⁻¹(F) is closed whenever F is a closed subset of Y.

(ii) If f : X → Y and g : Y → Z are continuous, then so is the composition g ◦ f : X → Z defined by (g ◦ f)(x) = g(f(x)).

[DIAGRAM]

Proof: (i) We can do this by complements: if F is closed, then U = F^c is open, and f⁻¹(F) = (f⁻¹(U))^c (a point is mapped into F if and only if it isn’t mapped into U). Then f⁻¹(F) is always closed when F is closed ⇐⇒ f⁻¹(U) is always open when U is open.

(ii) Take U ⊂ Z open; then (g ∘ f)⁻¹(U) = f⁻¹(g⁻¹(U)), for these are the points which map under f into g⁻¹(U), so that they map under g ∘ f into U. Now g⁻¹(U) is open in Y, as g is continuous, and then f⁻¹(g⁻¹(U)) is open in X since f is continuous.

2.11 Definition
A function f : X → Y is a homeomorphism between metric spaces if it is a bijection s.t. f and f⁻¹ are continuous. Then we say X and Y are homeomorphic, or X ∼ Y.

2.12 Example

The real line R is homeomorphic to the open interval (0, 1). For if we take y = tan⁻¹ x, this maps R homeomorphically onto (−π/2, π/2), and this can be mapped homeomorphically onto (0, 1), e.g. by z = (y + π/2)/π.

3 Real inner-product spaces

Notation: vectors written u, v, w, etc. (Sometimes just u, v, w). Scalars written a, b, c, etc. Functions written f, g, h. Coordinates of a vector u normally written u1, u2, u3, etc.

3.1 Inner product in Rⁿ
For vectors u = (u1, u2) and v = (v1, v2) in R² we write ⟨u, v⟩ for the standard inner product ⟨u, v⟩ = u1v1 + u2v2; sometimes written u.v or (u, v). We can do similarly for vectors in Rⁿ where n = 1, 2, 3, ... (i.e., n components), so if u = (u1, ..., un) and v = (v1, ..., vn) we have

⟨u, v⟩ = u1v1 + u2v2 + ... + unvn.

For example,

⟨(1, 2, 3, 4), (0, −1, 5, 2)⟩ = 1·0 + 2·(−1) + 3·5 + 4·2 = 21.

3.2 Standard properties of the scalar product
I. LINEARITY. ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩, for a, b real and u, v, w vectors.
II. SYMMETRY.

⟨u, v⟩ = ⟨v, u⟩.

III. POSITIVE DEFINITENESS. ⟨u, u⟩ ≥ 0 for all u, and we have ⟨u, u⟩ = 0 if and only if u = 0.
The first two are easy to check. For III note that ⟨u, u⟩ = u1² + ... + un² ≥ 0, and it will be zero if and only if u1 = ... = un = 0.

3.3 Definition of a general (real) inner product
Let V be a real vector space, and suppose that we have for each pair of vectors u, v in V a real number written ⟨u, v⟩, such that properties I, II and III of (3.2) hold. Then we call V a real inner-product space, and ⟨u, v⟩ the inner product of u and v.

N.B. In quantum mechanics and elsewhere people use complex inner products. Not in this course.

LECTURE 7

3.4 Examples

1. The usual inner product on Rⁿ.
2. We can define a new inner product on R² by

⟨u, v⟩ = 2u1v1 + 3u2v2.
Easily checked to be linear (do it!) and symmetric. For positive definiteness, note that
⟨u, u⟩ = 2u1² + 3u2² ≥ 0

and is > 0 unless u1 = u2 = 0. The following alternative is not an inner product, e.g. define

⟨u, v⟩ = 2u1v1 − 3u2v2,

so ⟨u, u⟩ = 2u1² − 3u2², and this would be negative if u = (0, 1), say.
3. For a < b define C[a, b] to be the vector space of all continuous real functions on [a, b]. For f, g ∈ C[a, b] define

⟨f, g⟩ = ∫_a^b f(x)g(x) dx.

Example: in C[0, 1], let f(x) = x + 1 and g(x) = 2x. Then

⟨f, g⟩ = ∫_0^1 (x + 1)(2x) dx = ∫_0^1 (2x² + 2x) dx = [2x³/3 + x²]_0^1 = 5/3.

3.5 Other properties of inner products
(a) ⟨u, av + bw⟩ = ⟨av + bw, u⟩ (rule II) = a⟨v, u⟩ + b⟨w, u⟩ (rule I) = a⟨u, v⟩ + b⟨u, w⟩ (rule II again). So it is linear in the second argument as well as the first.
(b) ⟨0, u⟩ = ⟨0u + 0u, u⟩ = 0⟨u, u⟩ + 0⟨u, u⟩ = 0 for all u, using rule I. Also ⟨u, 0⟩ = ⟨0, u⟩ = 0, using rule II. This is for any u ∈ V.

(c) More generally we can check that

⟨a1u1 + a2u2 + ... + aN uN, b1v1 + b2v2 + ... + bM vM⟩

∑_{i=1}^N ∑_{j=1}^M ai bj ⟨ui, vj⟩.
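As a numerical sanity check of the C[0, 1] example in 3.4 above (this snippet is an addition to the notes; the midpoint-rule helper `inner` is my own):

```python
# Check that <f, g> = integral over [0,1] of (x + 1)(2x) dx equals 5/3.

def inner(f, g, a=0.0, b=1.0, n=100_000):
    # midpoint-rule approximation of the integral of f*g over [a, b]
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * g(a + (i + 0.5) * dx) for i in range(n)) * dx

val = inner(lambda x: x + 1, lambda x: 2 * x)
print(val)   # ≈ 1.6667 = 5/3
```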

4 Lengths, angles, orthogonality

4.1 Definition
In an inner product space we define the length of a vector v (sometimes called its size or norm) by ‖v‖ = √⟨v, v⟩. Note that ⟨v, v⟩ is always ≥ 0; also, by property III, ‖v‖ = 0 if and only if v = 0.
This agrees with what we usually do in Rⁿ: e.g. for v = (3, 4, −12) we get ‖v‖² = 3² + 4² + (−12)² = 9 + 16 + 144 = 169, so ‖v‖ = √169 = 13.
Example: in C[−1, 1] let f(x) = x. Then

‖f‖² = ∫_{−1}^1 x² dx = [x³/3]_{−1}^1 = 2/3,

so ‖f‖ = √(2/3). Note that if v ∈ V and a ∈ R, then

‖av‖² = ⟨av, av⟩ = a²⟨v, v⟩ = a²‖v‖²,

so ‖av‖ = √(a²) ‖v‖ = |a| ‖v‖, taking the positive root. For example (−2)v is twice as big as v, but with direction reversed.

4.2 Definition The angle between two non-zero vectors u and v is the unique solution θ to

⟨u, v⟩ = ‖u‖ ‖v‖ cos θ

in the range 0 ≤ θ ≤ π (radians!). It is easy to check that the angle between u and u is 0, and the angle between u and −u is π. We say u and v are orthogonal if ⟨u, v⟩ = 0; in that case the angle between them satisfies cos θ = 0, so θ = π/2. This is sometimes written u ⊥ v. To make sense of our definition we will need to know that

cos θ = ⟨u, v⟩ / (‖u‖ ‖v‖)

lies between −1 and 1; see later.
Example: in C[0, 1] find the number a such that the functions f(t) = t and g(t) = 3t + a are orthogonal. Solution:

⟨f, g⟩ = ∫_0^1 t(3t + a) dt = [t³ + at²/2]_0^1 = 1 + a/2,

so ⟨f, g⟩ = 0 ⇐⇒ 1 + a/2 = 0, i.e., a = −2.
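A quick numerical check of this orthogonality example (an addition to the notes; `inner` is my own midpoint-rule helper):

```python
# f(t) = t and g(t) = 3t + a are orthogonal in C[0, 1] exactly when a = -2.

def inner(f, g, n=100_000):
    # midpoint-rule approximation of the integral of f*g over [0, 1]
    dt = 1.0 / n
    return sum(f(t) * g(t) for t in ((i + 0.5) * dt for i in range(n))) * dt

print(inner(lambda t: t, lambda t: 3 * t - 2))   # ≈ 0: orthogonal
print(inner(lambda t: t, lambda t: 3 * t + 1))   # 1 + a/2 with a = 1, i.e. ≈ 3/2
```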

More generally, a set of vectors {u1, ..., uN} is an orthogonal set if ⟨ui, uj⟩ = 0 whenever i ≠ j.

4.3 Pythagoras’s theorem
If ⟨u, v⟩ = 0 then ‖u + v‖² = ‖u‖² + ‖v‖².

[DIAGRAM – square on the hypotenuse etc.]

Proof:

‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨v, u⟩ + ⟨u, v⟩ + ⟨v, v⟩ = ‖u‖² + 0 + 0 + ‖v‖²,

using orthogonality.

4.4 Parallelogram identity

‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².

[DIAGRAM – draw a parallelogram.] The sum of the squares of the two diagonals equals the sum of the squares of the four sides. Proof: expand the inner products; see the example sheets.

5 Cauchy–Schwarz and its consequences

In order to make sense of (4.2) we need the following.

5.1 Cauchy–Schwarz inequality For u and v in an inner-product space,

⟨u, v⟩² ≤ ⟨u, u⟩ ⟨v, v⟩,

i.e., |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

Example: if u1, ..., un and v1, ..., vn are real numbers, then

|∑_{i=1}^n ui vi| ≤ (∑_{i=1}^n ui²)^{1/2} (∑_{i=1}^n vi²)^{1/2}.

Note that the LHS is |⟨u, v⟩| and the RHS is ‖u‖ ‖v‖, where u = (u1, ..., un), v = (v1, ..., vn), and we use the standard inner product in Rⁿ.
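A spot check of this Rⁿ form of Cauchy–Schwarz (added illustration, not from the notes; the random vectors and seed are arbitrary choices):

```python
import math
import random

# Cauchy-Schwarz in R^n: |<u, v>| <= ||u|| ||v||, tried on random vectors.

random.seed(0)
n = 50
u = [random.uniform(-1, 1) for _ in range(n)]
v = [random.uniform(-1, 1) for _ in range(n)]

lhs = abs(sum(ui * vi for ui, vi in zip(u, v)))
rhs = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
print(lhs <= rhs)   # True

# Equality holds when the vectors are parallel, e.g. w = 3u:
w = [3 * ui for ui in u]
lhs2 = abs(sum(ui * wi for ui, wi in zip(u, w)))
rhs2 = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(wi * wi for wi in w))
print(abs(lhs2 - rhs2) < 1e-9)   # True
```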

LECTURE 8

We give two proofs, and in each we assume that u 6= 0 and v 6= 0 (otherwise the inequality is obvious).

Proof 1:

Take ‖au − bv‖² = a²‖u‖² − 2ab⟨u, v⟩ + b²‖v‖² ≥ 0, with a = ⟨u, v⟩ and b = ‖u‖². We get

‖u‖²(⟨u, v⟩² − 2⟨u, v⟩² + ‖u‖²‖v‖²) ≥ 0,

which gives the result, on dividing by ‖u‖² > 0.

Proof 2:

For real t we have ⟨tu + v, tu + v⟩ ≥ 0, i.e., t²⟨u, u⟩ + 2t⟨u, v⟩ + ⟨v, v⟩ ≥ 0. We’ll minimize this over t; by differentiation, the minimum is where

2t⟨u, u⟩ + 2⟨u, v⟩ = 0.

So we put t = −⟨u, v⟩/⟨u, u⟩, and we get

⟨u, v⟩²/⟨u, u⟩ − 2⟨u, v⟩²/⟨u, u⟩ + ⟨v, v⟩ ≥ 0.

This simplifies to

−⟨u, v⟩²/⟨u, u⟩ + ⟨v, v⟩ ≥ 0,

i.e.,

⟨u, v⟩²/⟨u, u⟩ ≤ ⟨v, v⟩,

which is what is required.

NOW we know that ⟨u, v⟩/(‖u‖ ‖v‖) lies between −1 and 1, and so the definition of angle makes sense.

5.2 Triangle inequality
In an inner product space we have

‖u + v‖ ≤ ‖u‖ + ‖v‖.

For example, in Rⁿ this gives

(∑_{i=1}^n (ui + vi)²)^{1/2} ≤ (∑_{i=1}^n ui²)^{1/2} + (∑_{i=1}^n vi²)^{1/2}.

[DIAGRAM – triangle of vectors]

Proof:

‖u + v‖² = ⟨u + v, u + v⟩ = ‖u‖² + 2⟨u, v⟩ + ‖v‖² ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)².

5.3 Theorem
In an inner-product space the norm (length) of a vector satisfies
(i) ‖u‖ ≥ 0, and ‖u‖ = 0 if and only if u = 0;
(ii) ‖au‖ = |a| ‖u‖;
(iii) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

5.4 Corollary
Let V be an inner-product space, and define d(x, y) = ‖x − y‖. Then d is a metric.

Proof: From Theorem 5.3, we see easily that d(x, y) ≥ 0 and d(x, y) = 0 if and only if x − y = 0, i.e., x = y. Also d(x, y) = ‖x − y‖ = ‖y − x‖ = d(y, x). Finally

d(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = d(x, y) + d(y, z).

So every inner-product space is a metric space. 

5.5 The space ℓ²

The elements of the space ℓ² (also written ℓ2) are real sequences (uk)_{k=1}^∞ such that ∑_{k=1}^∞ uk² < ∞.
So, for example (1/2, 1/4, 1/8, ...) ∈ ℓ², since ∑_{k=1}^∞ (1/2ᵏ)² < ∞ (geometric series); but (1, 2, 3, 4, ...) ∉ ℓ², since ∑_{k=1}^∞ k² = ∞.

We shall get a vector space by adding sequences term-wise; if u = (uk) and v = (vk), then u + v = (uk + vk) and au = (auk), just like vectors with an infinite sequence of components. How do we know that (uk + vk) is still in ℓ²?

Proof: for each N,

(∑_{k=1}^N (uk + vk)²)^{1/2} ≤ (∑_{k=1}^N uk²)^{1/2} + (∑_{k=1}^N vk²)^{1/2}
≤ (∑_{k=1}^∞ uk²)^{1/2} + (∑_{k=1}^∞ vk²)^{1/2} = A,

say, where we used first the triangle inequality in Rᴺ. Since this holds for every N, we let N → ∞ to see that ∑_{k=1}^∞ (uk + vk)² converges, and its limit is at most A².

In fact ℓ² is an inner-product space; define

⟨u, v⟩ = ∑_{k=1}^∞ uk vk.

To see that this sum converges, use Cauchy–Schwarz in Rᴺ:

∑_{k=1}^N |uk vk| ≤ (∑_{k=1}^N uk²)^{1/2} (∑_{k=1}^N vk²)^{1/2}
≤ (∑_{k=1}^∞ uk²)^{1/2} (∑_{k=1}^∞ vk²)^{1/2} = B,

say. Hence ∑_{k=1}^∞ |uk vk| converges to a limit which is at most B. So ∑_{k=1}^∞ uk vk is absolutely convergent.

It is easy now to check that this defines an inner product. Also ‖u‖² = ⟨u, u⟩ = ∑_{k=1}^∞ uk², so it is like Rⁿ with n = ∞. It is an infinite-dimensional vector space, but a very useful one.
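A small numerical illustration (added here, not part of the notes) of the ℓ² example above:

```python
# u = (1/2, 1/4, 1/8, ...) lies in l^2: the partial sums of the sum of u_k^2
# converge. Here sum of (1/2**k)**2 is a geometric series with sum 1/3.

norm_sq = sum((0.5 ** k) ** 2 for k in range(1, 60))
print(norm_sq)   # ≈ 1/3, so ||u|| = 1/sqrt(3)

# By contrast, for (1, 2, 3, ...) the partial sums of the squares just keep
# growing, so that sequence is not in l^2:
print(sum(k ** 2 for k in range(1, 60)))
```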

LECTURE 9

6 Orthonormal sets

6.1 Definition

A set of vectors {e1, ..., en} in an inner product space is orthonormal if it is orthogonal and each vector has norm 1. So

⟨ei, ej⟩ = 0 if i ≠ j, 1 if i = j.

If it’s also a basis for the inner product space, then we call it an orthonormal basis.

Examples: (i) (1, 0, 0), (0, 1, 0), (0, 0, 1) is an orthonormal basis of R³ (the standard basis);
(ii) An unusual orthonormal basis of R² is e1 = (3/5, 4/5) and e2 = (−4/5, 3/5). [DIAGRAM – draw the vectors]

6.2 Proposition

If {e1,..., en} is orthonormal, then

‖∑_{i=1}^n ai ei‖ = (∑_{i=1}^n ai²)^{1/2},

for any scalars a1, ..., an, and so the vectors {e1, ..., en} are linearly independent.
Proof:

⟨∑_{i=1}^n ai ei, ∑_{j=1}^n aj ej⟩ = ∑_{i=1}^n ∑_{j=1}^n ai aj ⟨ei, ej⟩,

by (3.5). All terms except for those with i = j are zero, and we get ∑_{i=1}^n ai², as required.

Also, if ∑_{i=1}^n ai ei = 0, then ∑_{i=1}^n ai² = 0, and so a1 = ... = an = 0; i.e., the vectors are independent.

6.3 The Gram–Schmidt process

We start with a sequence v1, ..., vn of independent vectors and end up with a sequence e1, ..., en of orthonormal vectors such that for each 1 ≤ k ≤ n the set {e1, ..., ek} spans the same subspace as {v1, ..., vk}.

Define w1 = v1 and e1 = w1/‖w1‖.

Let w2 = v2 − ⟨v2, e1⟩e1, and e2 = w2/‖w2‖.

Then w3 = v3 − ⟨v3, e1⟩e1 − ⟨v3, e2⟩e2, and e3 = w3/‖w3‖.

In general

wk+1 = vk+1 − ∑_{i=1}^k ⟨vk+1, ei⟩ei, and ek+1 = wk+1/‖wk+1‖.

Then {e1, ..., en} are orthonormal and for each k the vectors e1, ..., ek span the same space as v1, ..., vk.

Proof: Basically, the orthonormality property is shown by induction.

Suppose that we know that e1, ..., ek are orthonormal (k = 1 is already done). Then we work out ⟨wk+1, ej⟩ for j ≤ k. So

⟨wk+1, ej⟩ = ⟨vk+1, ej⟩ − ∑_{i=1}^k ⟨vk+1, ei⟩⟨ei, ej⟩ = ⟨vk+1, ej⟩ − ⟨vk+1, ej⟩ = 0.

So each new vector wk+1 and hence also ek+1 is orthogonal to the earlier ej. It isn’t zero since vk+1 is independent of v1,..., vk.

Also ek+1 = wk+1/‖wk+1‖ implies that ‖ek+1‖ = ‖wk+1‖/‖wk+1‖ = 1.

The span of e1,..., ek is k-dimensional and contained in span{v1,..., vk}, so must equal it.

Example: Take v1 = (1, 0, 0, 1), v2 = (2, 3, 2, 0) and v3 = (0, 7, −2, 2) in R⁴. Set w1 = v1 and

e1 = w1/‖w1‖ = (1/√2)(1, 0, 0, 1).

Then

w2 = v2 − ⟨v2, e1⟩e1 = (2, 3, 2, 0) − (2/√2)(1/√2)(1, 0, 0, 1) = (1, 3, 2, −1).

Note that w2 ⊥ e1. Then

e2 = w2/‖w2‖ = (1/√15)(1, 3, 2, −1).

Next

w3 = v3 − ⟨v3, e1⟩e1 − ⟨v3, e2⟩e2
= (0, 7, −2, 2) − (2/√2)(1/√2)(1, 0, 0, 1) − (15/√15)(1/√15)(1, 3, 2, −1)
= (0, 7, −2, 2) − (1, 0, 0, 1) − (1, 3, 2, −1) = (−2, 4, −4, 2).

Finally,

e3 = w3/‖w3‖ = (−2, 4, −4, 2)/√40 = (1/√10)(−1, 2, −2, 1).

Having done this, CHECK that

⟨ei, ej⟩ = 1 if i = j, 0 if i ≠ j.
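The R⁴ example can be verified mechanically. The sketch below is an addition (the function names are mine); it implements the Gram–Schmidt recursion from these notes and checks orthonormality:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    # w_{k+1} = v_{k+1} - sum_i <v_{k+1}, e_i> e_i, then normalise
    es = []
    for v in vs:
        w = list(v)
        for e in es:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        es.append([wi / norm for wi in w])
    return es

e1, e2, e3 = gram_schmidt([(1, 0, 0, 1), (2, 3, 2, 0), (0, 7, -2, 2)])
print(e2)   # ≈ (1, 3, 2, -1)/√15
for i, a in enumerate((e1, e2, e3)):
    for j, b in enumerate((e1, e2, e3)):
        assert abs(dot(a, b) - (1.0 if i == j else 0.0)) < 1e-12
```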

Example (Legendre polynomials): Take the functions 1, t, t², t³, ... in C[−1, 1] with inner product

⟨f, g⟩ = ∫_{−1}^1 f(t)g(t) dt.

Now ‖1‖² = ∫_{−1}^1 1 dt = 2, so e1(t) = 1/√2.

Next take

w2(t) = t − ⟨t, e1⟩e1 = t − (1/√2) ∫_{−1}^1 (t/√2) dt = t − 0 = t.

Also

‖w2‖² = ∫_{−1}^1 t² dt = 2/3, so e2(t) = w2(t)/‖w2‖ = √(3/2) t.

Then

w3(t) = t² − ⟨t², e1⟩e1(t) − ⟨t², e2⟩e2(t)
= t² − (1/2) ∫_{−1}^1 t² dt − (3/2) t ∫_{−1}^1 t³ dt
= t² − 1/3 − 0 = t² − 1/3.

But

‖w3‖² = ∫_{−1}^1 (t² − 1/3)² dt = ∫_{−1}^1 (t⁴ − (2/3)t² + 1/9) dt = 2/5 − 4/9 + 2/9 = 8/45,

so

e3(t) = √(45/8) (t² − 1/3) = √(5/8) (3t² − 1).

In general, en(t) has degree n − 1. Lots of useful systems of polynomials are obtained by orthonormalizing 1, t, t², t³, ... with respect to different inner products (e.g. Chebyshev, Hermite, Laguerre, ...).
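As a check (an addition to the notes; numerical quadrature, helper names mine), the three functions e1, e2, e3 computed above are orthonormal for ⟨f, g⟩ = ∫_{−1}^1 f(t)g(t) dt:

```python
import math

def inner(f, g, n=50_000):
    # midpoint-rule approximation of the integral of f*g over [-1, 1]
    dt = 2.0 / n
    return sum(f(t) * g(t) for t in (-1 + (i + 0.5) * dt for i in range(n))) * dt

e1 = lambda t: 1 / math.sqrt(2)
e2 = lambda t: math.sqrt(3 / 2) * t
e3 = lambda t: math.sqrt(5 / 8) * (3 * t * t - 1)

gram = [[inner(f, g) for g in (e1, e2, e3)] for f in (e1, e2, e3)]
for row in gram:
    print([round(x, 6) for x in row])   # ≈ the 3x3 identity matrix
```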

LECTURE 10

7 Orthogonal projections and best approximation

Many approximation problems consist of taking a vector v and a subspace W of an inner-product space, and then finding the closest element w in W to v, i.e., minimizing the size of the error v − w.

Examples: 1. Take R³ with the usual inner product and W a plane through the origin. The closest point of W is obtained by “dropping a perpendicular onto W”.

[DIAGRAM]

2. Find the best approximation to the function f(t) = |t| on [−1, 1] by a quadratic g(t) = a + bt + ct², in the sense of minimizing

‖f − g‖² = ∫_{−1}^1 (f(t) − g(t))² dt.

7.1 Theorem
Let W be a (finite-dimensional) subspace of an inner-product space V, let v ∈ V, and let w ∈ W satisfy

⟨v − w, z⟩ = 0 for all z ∈ W.

Then ‖v − y‖ ≥ ‖v − w‖ for all y ∈ W. That is, w is the closest point in W to v, and it is unique.

[DIAGRAM: plot v, w, y.]

Proof: for y ∈ W write v − y = (v − w) + (w − y) and note that v − w is orthogonal to w − y, since w − y is in W . By Pythagoras’s theorem (4.3),

‖v − y‖² = ‖v − w‖² + ‖w − y‖² ≥ ‖v − w‖²,
as required. Note that if y ≠ w, then ‖v − y‖ > ‖v − w‖, so the closest point is unique.

7.2 Definition If W is a subspace of an inner product space V , then its orthogonal complement, W⊥, is the set of all vectors u that are orthogonal to every vector of W. Clearly 0 ∈ W⊥, and indeed W⊥ is a subspace, since if u1 and u2 are orthogonal to everything in W , then ⟨a1u1 + a2u2, w⟩ = a1⟨u1, w⟩ + a2⟨u2, w⟩ = 0 for all w ∈ W.

Example: if W is the 1-dimensional subspace of R³ spanned by the vector w = (3, 5, 7), then x = (x1, x2, x3) is in W⊥ if and only if ⟨x, w⟩ = 0, i.e., 3x1 + 5x2 + 7x3 = 0. This is the plane perpendicular to W.

It can be checked that (W ⊥)⊥ is W again.

Now in (7.1) we have that if v − w lies in W⊥, then w is the best approximation to v by vectors in W.

7.3 The normal equations

Suppose that w1, ..., wn is a basis for W. Then the best approximant w to v is found by solving ⟨v − w, wi⟩ = 0 for each i, because this makes v − w orthogonal to all linear combinations of the wi. Hence we have ⟨w, wi⟩ = ⟨v, wi⟩ for each i. Suppose now that w = Σₖ₌₁ⁿ ck wk is the best approximant. Then we have

Σₖ₌₁ⁿ ck⟨wk, wi⟩ = ⟨v, wi⟩ for each i = 1, ..., n.

Example. In C[−1, 1] we take f(t) = |t|; to approximate it by a quadratic take w1(t) = 1, w2(t) = t and w3(t) = t². The best approximant c0 + c1t + c2t² satisfies:

c0⟨1, 1⟩ + c1⟨t, 1⟩ + c2⟨t², 1⟩ = ⟨f, 1⟩,
c0⟨1, t⟩ + c1⟨t, t⟩ + c2⟨t², t⟩ = ⟨f, t⟩,
c0⟨1, t²⟩ + c1⟨t, t²⟩ + c2⟨t², t²⟩ = ⟨f, t²⟩.

Now we can easily check that

∫₋₁¹ tᵏ dt = 0 if k is odd, and 2/(k + 1) if k is even,
so we can soon calculate inner products and get

2c0 + 0 + (2/3)c2 = ∫₋₁¹ |t| dt = 1,

0 + (2/3)c1 + 0 = ∫₋₁¹ |t| t dt = 0,
(2/3)c0 + 0 + (2/5)c2 = ∫₋₁¹ t²|t| dt = 1/2.
Note that ∫₋₁¹ |t| dt = ∫₋₁⁰ (−t) dt + ∫₀¹ t dt, etc. The solution to these equations is c0 = 3/16, c1 = 0 and c2 = 15/16, giving the approximation |t| ≈ 3/16 + (15/16)t².

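As a sanity check (ours, not the notes'), the 3×3 linear system of normal equations can be solved numerically; the matrix entries are the inner products ⟨wj, wi⟩ computed above:

```python
import numpy as np

# Normal equations for approximating |t| by c0 + c1 t + c2 t^2 on [-1, 1]
G = np.array([[2.0, 0.0, 2 / 3],
              [0.0, 2 / 3, 0.0],
              [2 / 3, 0.0, 2 / 5]])
b = np.array([1.0, 0.0, 0.5])   # <f, 1>, <f, t>, <f, t^2> for f(t) = |t|
c = np.linalg.solve(G, b)
print(c)   # [3/16, 0, 15/16] = [0.1875, 0, 0.9375]
```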
7.4 Corollary

Suppose that e1, ..., en is an orthonormal basis for W. Then the best approximant of v ∈ V by an element of W is
w = Σₖ₌₁ⁿ ⟨v, ek⟩ek.

Proof: Let w = Σₖ₌₁ⁿ ck ek. Then the normal equations become

Σₖ₌₁ⁿ ck⟨ek, ei⟩ = ⟨v, ei⟩,

which reduces to ci = ⟨v, ei⟩ using orthonormality.

Thus we could have solved the example of approximating f(t) = |t| by using an orthonormal basis for the quadratic polynomials, e.g. the Legendre functions.

7.5 Definition

The orthogonal projection of v onto W , written PW v, is the closest vector w ∈ W to v. In particular,
PW v = Σₖ₌₁ⁿ ⟨v, ek⟩ek,
if {e1, ..., en} is an orthonormal basis of W. Note that PW : V → W is a linear mapping.

LECTURE 11

Example: the plane W = {(x1, x2, x3) ∈ R³ : x1 + x2 + x3 = 0} is a 2-dimensional subspace with orthonormal basis e1 = (1/√2)(1, −1, 0) and e2 = (1/√6)(1, 1, −2). CHECK that these are orthonormal and lie in W (so, since dim W = 2, they are also a basis for it).

Calculate PW (1, 0, 0). It is

PW (1, 0, 0) = ⟨(1, 0, 0), e1⟩e1 + ⟨(1, 0, 0), e2⟩e2 = (1/2)(1, −1, 0) + (1/6)(1, 1, −2) = (2/3, −1/3, −1/3).
Now for some more serious applications of the theory.
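Before moving on, the projection just computed can be verified numerically; a sketch (our addition, not part of the notes):

```python
import numpy as np

e1 = np.array([1, -1, 0]) / np.sqrt(2)
e2 = np.array([1, 1, -2]) / np.sqrt(6)
v = np.array([1.0, 0.0, 0.0])

# P_W v = <v, e1> e1 + <v, e2> e2
p = (v @ e1) * e1 + (v @ e2) * e2
print(p)            # [2/3, -1/3, -1/3]
print(np.sum(p))    # essentially 0: p lies in the plane x1 + x2 + x3 = 0
```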

7.6 Least squares approximation Problem: find the line through (0, 0) (to be varied later) which “best approximates” the data (x1, y1),..., (xn, yn). We would like yi = cxi for each i, but we don’t know c and the points won’t always lie exactly on a line.

[DIAGRAM]

We decide to minimize Σᵢ₌₁ⁿ (yi − cxi)², least squares approximation, useful in statistical applications.

This is the same as taking x = (x1, ..., xn) and y = (y1, ..., yn) in Rⁿ and minimizing ‖y − cx‖. Take V to be Rⁿ, usual inner product, and W to be the one-dimensional subspace {ax : a ∈ R}. This is the same as finding the closest point to y in W.

Solution: take
c = ⟨y, x⟩/⟨x, x⟩,
since this is the orthogonal projection onto W. In detail, w = cx, and the normal equation is ⟨w, x⟩ = ⟨y, x⟩, or c⟨x, x⟩ = ⟨y, x⟩. So
c = (x1y1 + ... + xnyn)/(x1² + ... + xn²).
Example: find the best fit to the data

x  2  1  3  4
y  3  2  3  5

Solution:
c = (2·3 + 1·2 + 3·3 + 4·5)/(2² + 1² + 3² + 4²) = 37/30.

7.7 Generalization

Suppose that y is known (or guessed) to be a linear function of m variables x1, ..., xm, say y = c1x1 + ... + cmxm, so we have experimental data

x1   x2   ...  xm   y
x11  x21  ...  xm1  y1
...
x1n  x2n  ...  xmn  yn

Set up the problem in Rⁿ and choose c1, ..., cm to minimize ‖y − (c1x1 + ... + cmxm)‖. If W = span{x1, ..., xm}, then we want the closest point in W to y. We know from (7.3) that the constants c1, ..., cm are determined by the normal equations:
⟨Σₖ₌₁ᵐ ck xk, xi⟩ = ⟨y, xi⟩ for each i,
i.e.,

c1⟨x1, x1⟩ + ... + cm⟨xm, x1⟩ = ⟨y, x1⟩
...
c1⟨x1, xm⟩ + ... + cm⟨xm, xm⟩ = ⟨y, xm⟩.

To get a unique solution we need the vectors x1, ..., xm to be independent, which requires n ≥ m.

Example: Use the method of least squares approximation to find the best relation of the form y = c1x1 + c2x2 fitting the following experimental data:

      x1   x2   y
i)     1    0   2
ii)    0    1   3
iii)   1    1   2
iv)    1   −1   0

Solution: We work in R⁴ and take x1 = (1, 0, 1, 1), x2 = (0, 1, 1, −1) and y = (2, 3, 2, 0). Normal equations are

3c1 + 0c2 = 4,
0c1 + 3c2 = 5,

so c1 = 4/3 and c2 = 5/3. So the best relation is y = (4/3)x1 + (5/3)x2, giving

x1   x2   y (experimental)   y (theoretical)
1     0   2                  4/3
0     1   3                  5/3
1     1   2                  3
1    −1   0                  −1/3

7.8 Curve fitting

Given (x1, y1), ..., (xn, yn), find a (polynomial) curve which fits these points well in the sense of least squares approximation. Example: Find the parabola y = c0 + c1x + c2x² which best fits the points (0, 0), (1, 4), (−1, 1), (−2, 5).

Solution: Apply the method of least squares approximation to y = c0x0 + c1x1 + c2x2 with x0 = 1, x1 = x and x2 = x². Put x0 = (1, 1, 1, 1), x1 = (0, 1, −1, −2), x2 = (0, 1, 1, 4), y = (0, 4, 1, 5). Note that x0 is the vector with all components 1, x1 the vector of x values, and x2 the vector of x² values. Normal equations are

4c0 − 2c1 + 6c2 = 10,
−2c0 + 6c1 − 8c2 = −7,
6c0 − 8c1 + 18c2 = 25,

from which c0 = 3/10, c1 = 8/5 and c2 = 2.
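The normal equations above can be assembled and solved directly from the four data vectors; a minimal sketch (our addition, not part of the notes):

```python
import numpy as np

x0 = np.array([1.0, 1.0, 1.0, 1.0])
x1 = np.array([0.0, 1.0, -1.0, -2.0])
x2 = np.array([0.0, 1.0, 1.0, 4.0])
y = np.array([0.0, 4.0, 1.0, 5.0])

xs = [x0, x1, x2]
A = np.array([[xj @ xi for xj in xs] for xi in xs])  # Gram matrix <xj, xi>
b = np.array([y @ xi for xi in xs])                  # right-hand sides <y, xi>
c = np.linalg.solve(A, b)
print(c)   # [0.3, 1.6, 2.0]
```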

Example: Find the line y = c0 + c1x which best fits the points (2, 3), (1, 2), (3, 3) and (4, 5). (Data used earlier to get y = cx only.)

Solution: let x0 = (1, 1, 1, 1), x1 = (2, 1, 3, 4) and y = (3, 2, 3, 5). So we want y ≈ c0x0 + c1x1. Normal equations are

4c0 + 10c1 = 13,
10c0 + 30c1 = 37,

giving c0 = 1 and c1 = 9/10, i.e., y = 1 + (9/10)x.
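Equivalently (our addition, not the notes'), `np.linalg.lstsq` minimises ‖y − Ac‖ over c for the design matrix A whose columns are x0 and x1, which amounts to solving the same normal equations:

```python
import numpy as np

# Columns: x0 = (1,1,1,1) and x1 = (2,1,3,4); data y from the example above.
A = np.column_stack([[1, 1, 1, 1], [2, 1, 3, 4]]).astype(float)
y = np.array([3.0, 2.0, 3.0, 5.0])

c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)   # [1.0, 0.9], i.e. y = 1 + (9/10) x
```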

LECTURE 12

8 Cauchy sequences and completeness

Recall that if (X, d) is a metric space, then a sequence (xn) of elements of X converges to x ∈ X if d(xn, x) → 0, i.e., if given ε > 0 there exists N such that d(xn, x) < ε whenever n ≥ N.

Often we think of convergent sequences as ones where xn and xm are close together when n and m are large. This is almost, but not quite, the same thing.

8.1 Definition

A sequence (xn) in a metric space (X, d) is a Cauchy sequence if for any ε > 0 there is an N such that d(xn, xm) < ε for all n, m ≥ N.

Example: take xn = 1/n in R with the usual metric. Now d(xn, xm) = |1/n − 1/m|. Suppose that n and m are both at least as big as N; then d(xn, xm) ≤ 1/N.

[DIAGRAM, showing the points]

Hence if ε > 0 and we take N > 1/ε, we have d(xn, xm) ≤ 1/N < ε whenever n and m are both ≥ N.

In fact all convergent sequences are Cauchy sequences, by the following result.

8.2 Theorem

Suppose that (xn) is a convergent sequence in a metric space (X, d), i.e., there is a limit point x such that d(xn, x) → 0. Then (xn) is a Cauchy sequence.

Proof: take ε > 0. Then there is an N such that d(xn, x) < ε/2 whenever n ≥ N. Now suppose both n ≥ N and m ≥ N. Then

d(xn, xm) ≤ d(xn, x) + d(x, xm) = d(xn, x) + d(xm, x) < ε/2 + ε/2 = ε,

and we are done.

8.3 Proposition Every subsequence of a Cauchy sequence is a Cauchy sequence.

Proof: if (xn) is Cauchy and (xₙₖ) is a subsequence, then given ε > 0 there is an N such that d(xn, xm) < ε whenever n, m ≥ N. Now there is a K such that nₖ ≥ N whenever k ≥ K. So d(xₙₖ, xₙₗ) < ε whenever k, l ≥ K.

Does every Cauchy sequence converge?

Examples: 1. (X, d) = Q, as a subspace of R with the usual metric. Take x0 = 2 and define xn+1 = xn/2 + 1/xn. The sequence continues 3/2, 17/12, 577/408, ..., and indeed xn → x where x = x/2 + 1/x, i.e., x² = 2. But this isn't in Q.
Thus (xn) is Cauchy in R, since it converges to √2 when we think of it as a sequence in R. So it is Cauchy in Q, but doesn't converge to a point of Q.

2. Easier. Take (X, d) = (0, 1). Then (1/n) is a Cauchy sequence in X (since it is Cauchy in R, as seen above), and has no limit in X.

In each case there are “points missing from X”.

8.4 Definition A metric space (X, d) is complete if every Cauchy sequence in X converges to a limit in X.

Is R complete? What do we mean by R? We could regard it as the set of all infinite decimal numbers; but since there is an ambiguity e.g. 0.999. . . = 1.000. . . , we have to allow for this, e.g. by regarding all the recurring-9 numbers as the same as the corresponding recurring-0 numbers.

Cauchy sequences can be awkward, e.g. xn = 1/2 + (−1)ⁿ/10ⁿ, i.e., 0.4, 0.51, 0.499, 0.5001, 0.49999, ..., will converge to 0.5, even though the individual digits do not converge.

8.5 Theorem

R is complete.

We do this in several stages.

A: Every bounded increasing or decreasing sequence in R converges. (Increasing means x1 ≤ x2 ≤ ..., and you can guess what decreasing means. Monotone means either increasing or decreasing.)
B: Every Cauchy sequence in R is bounded.
C: Every sequence in R has a monotone subsequence.
D: If a Cauchy sequence has a convergent subsequence, then the original sequence converges.
E: R is complete.

Proof of E: let (xn) be a Cauchy sequence in R. By (C) it has a monotone subsequence (xₙₖ), which is also Cauchy by (8.3). By (B) this sequence is bounded. So by (A) it converges. Now by (D) the original sequence converges.

Proof of A: we can take this as an axiom of R, or observe that if the numbers are increasing and bounded, then eventually the integer parts are constant, then the first digit after the decimal point, then the second, ..., so it is clear what number we want as our limit. But if xn agrees with x to k decimal places then |xn − x| < 10⁻ᵏ; this shows that xn → x.
Example: 1.0, 1.2, 1.4, 1.41, 1.412, 1.414, 1.4141, 1.4142, ... is homing in on √2.

LECTURE 13

Proof of B: if (xn) is Cauchy, then with ε = 1 there is an N such that |xm − xn| < 1 whenever m, n ≥ N. Now |xn| ≤ |xn − xN| + |xN| < 1 + |xN| for all n ≥ N. Let K = max{|x1|, |x2|, ..., |xN−1|, 1 + |xN|}. Then |xn| ≤ K for all n.

Proof of D: suppose that (xn) is Cauchy in (X, d) and limₖ→∞ xₙₖ = y. Take ε > 0. Then there exists N such that d(xm, xn) < ε/2 whenever m, n ≥ N; and K such that d(xₙₖ, y) < ε/2 whenever k ≥ K. Choose k ≥ K such that nₖ ≥ N. Then for n ≥ N,

d(xn, y) ≤ d(xn, xₙₖ) + d(xₙₖ, y) < ε,

and so d(xn, y) → 0 as n → ∞.

Proof of C: let (xn) be a sequence in R. We say that xm is a peak point of the sequence if xm ≥ xn for all n > m.

[DIAGRAM]

Case 1: only finitely many peak points. Choose n1 large so that xn is not a peak point for any n ≥ n1.

Since xₙ₁ is not a peak point we can find n2 > n1 with xₙ₂ > xₙ₁;

since xₙ₂ is not a peak point we can find n3 > n2 with xₙ₃ > xₙ₂; and so on.

Now (xₙₖ) is strictly increasing.

Case 2: (xn) has infinitely many peak points, say xₙ₁, xₙ₂, ..., with n1 < n2 < ....

Now xₙ₁ ≥ xₙ₂ ≥ ..., so (xₙₖ) is a decreasing subsequence.

We have finished the proof that R is complete.

8.6 Corollary

A subset X ⊂ R is complete if and only if it is closed.

Proof: If X is not closed, then X̄ ≠ X, so there is a point y ∈ X̄ \ X. There is a sequence (xn) in X that converges to y, by Theorem 2.2. Then (xn) is a Cauchy sequence by Theorem 8.2, but it does not have a limit in X, so X is not complete.

Conversely, if X is closed and (xn) is a Cauchy sequence in X, then it has a limit y in R, since R is complete, by Theorem 8.5. But then y ∈ X̄ by Theorem 2.2, so y ∈ X since X is closed. Hence X is complete. □
Examples: open intervals in R are not complete; closed intervals are complete.

What about C[a, b] with d1, d2 or d∞?

Define fn in C[0, 2] by fn(x) = xⁿ for 0 ≤ x ≤ 1, and fn(x) = 1 for 1 ≤ x ≤ 2.
[DIAGRAM]

Then
d1(fn, fm) = ∫₀² |fn(x) − fm(x)| dx = ∫₀¹ |xⁿ − xᵐ| dx = ∫₀¹ (xᵐ − xⁿ) dx (if n ≥ m)
           = 1/(m + 1) − 1/(n + 1) ≤ 1/(m + 1) → 0,

and hence (fn) is Cauchy in (C[0, 2], d1). Does the sequence converge?

If there is an f ∈ C[0, 2] with fn → f as n → ∞, then ∫₀² |fn(x) − f(x)| dx → 0, so ∫₀¹ and ∫₁² both tend to zero. So fn → f in (C[0, 1], d1), which means that f(x) = 0 on [0, 1] (from an example we did earlier). Likewise, f = 1 on [1, 2], which doesn't give a continuous limit.

Similarly, (C[a, b], d1) is incomplete in general. Also it is incomplete in the d2 metric, as the same example shows (a similar calculation with squares of functions).

What about d∞?

8.7 Definition

A sequence (fn) of (not necessarily continuous) functions defined on [a, b] is said to converge uniformly to f if sup{|fn(x) − f(x)| : x ∈ [a, b]} → 0 as n → ∞. (If these are continuous functions, then this is just convergence in the d∞ metric.)

8.8 Theorem

If (fn) are continuous functions and fn → f uniformly, then f is also continuous.

Proof: Take ε > 0 and a point x ∈ [a, b]. Then there is an N such that |fn(t)−f(t)| < ε/3 for all t ∈ [a, b] whenever n ≥ N. Now fN is continuous, so we can choose δ > 0 such that |fN (t) − fN (x)| < ε/3 for all t ∈ [a, b] with |t − x| < δ. Then

|f(t) − f(x)| ≤ |f(t) − fN (t)| + |fN (t) − fN (x)| + |fN (x) − f(x)| ≤ ε/3 + ε/3 + ε/3 = ε whenever t ∈ [a, b] and |t − x| < δ. Hence f is continuous at x.

Thus, for example, the functions fn(t) = tⁿ converge pointwise on [0, 1] to g(t) = 0 for 0 ≤ t < 1, g(1) = 1; but g is not continuous, so the convergence isn't uniform.

LECTURE 14

8.9 Theorem

(C[a, b], d∞) is a complete metric space.

Proof: take a Cauchy sequence (fn) in (C[a, b], d∞). The proof goes in two steps.

I: For each x ∈ [a, b], (fn(x)) is a Cauchy sequence in R, and so has a limit, which we call f(x).

II: fn → f uniformly; hence f ∈ C[a, b] and d∞(fn, f) → 0.

Step I: given ε > 0 there is an N with d∞(fn, fm) < ε for n, m ≥ N, since (fn) is Cauchy. But |fn(x) − fm(x)| ≤ d∞(fn, fm) and so this is also < ε for n, m ≥ N. So (fn(x)) is a Cauchy sequence in R. Since R is complete by (8.5), we see that there is a limiting value f(x).

Step II: take ε > 0 and N as in Step I. Then |fn(x) − fm(x)| < ε for each x, provided that n, m ≥ N. Fix n ≥ N and let m → ∞. We conclude that |fn(x) − f(x)| ≤ ε for each x, provided that n ≥ N. This is just the uniform convergence of fn to f. So f is continuous, i.e., f ∈ C[a, b] by (8.8), and d∞(fn, f) → 0.

8.10 Remark

Note that R² is also complete with any of the metrics d1, d2 and d∞, since a Cauchy/convergent sequence (vn) = (xn, yn) in R² is just one in which both (xn) and (yn) are Cauchy/convergent sequences in R (cf. Prop. 2.4).

Similar arguments show that Rᵏ is also complete for k = 1, 2, 3, ..., and (with the same proof as for Corollary 8.6) all closed subsets of Rᵏ are complete.

9 Contraction mappings

Our aim is to use metric spaces to solve equations by using an iterative method to get approximate solutions.

9.1 Examples

1. x³ + 2x² − 8x + 4 = 0. Rewrite this as x = (1/8)(x³ + 2x² + 4).

Consider the function φ : R → R given by φ(x) = (1/8)(x³ + 2x² + 4). Then x is a root of our equation if and only if φ(x) = x, i.e., x is a fixed point of φ.

Guess a solution x0; then let x1 = φ(x0), x2 = φ(x1), .... This gives a sequence of numbers x0, x1, x2, ..., xn, xn+1 = φ(xn), .... If these terms converge to a limit, then this limit should be a solution. E.g. take x0 = 0; then x1 = 0.5, x2 = 0.578, x3 = 0.608, x4 = 0.621, x5 = 0.626, x6 = 0.629, x7 = 0.630, x8 = 0.630.
2. dy/dx = x(x + y), for 0 ≤ x ≤ 1, with y(0) = 0.

Rewrite as
y(x) = ∫₀ˣ t(t + y(t)) dt.
Define
φ(f)(x) = ∫₀ˣ t(t + f(t)) dt.
So y = f(x) solves the original equation if and only if φ(f) = f. Again, try to find the solution as the limit of a sequence. Take f0(x) = 0 for 0 ≤ x ≤ 1. Then
f1 = φ(f0), i.e., f1(x) = ∫₀ˣ t(t + f0(t)) dt = ∫₀ˣ t² dt = x³/3.

f2 = φ(f1), i.e., f2(x) = ∫₀ˣ t(t + f1(t)) dt = ∫₀ˣ t(t + t³/3) dt = x³/3 + x⁵/15.
f3 = φ(f2), i.e., f3(x) = ∫₀ˣ t(t + t³/3 + t⁵/15) dt = x³/3 + x⁵/15 + x⁷/105.

Suppose we have a metric space (X, d) and a function φ : X → X. Choose x0 ∈ X and define xn = φ(xn−1) for n ≥ 1. This gives a sequence (xn); if it is Cauchy and (X, d) is complete, then x = limn→∞ xn exists and x should solve x = φ(x). How can we guarantee that (xn) will be Cauchy?

Note that d(xn, xn+1) = d(φ(xn−1), φ(xn)), so to get (xn) Cauchy we want φ to shrink distances. Let’s call φ : X → X a shrinking map if d(φ(y), φ(z)) < d(y, z) for all y, z ∈ X with y 6= z.

Example:

Take X = [1, ∞), regarded as a subspace of R, usual metric. It is complete. Define φ : X → X by φ(x) = x + 1/x. How can we check that φ is a shrinking map? Answer: use the Mean Value Theorem (MVT)! |φ(x) − φ(y)| = |φ′(c)| |x − y| for some c between x and y. Now φ′(c) = 1 − 1/c², so |φ′(c)| < 1 for all c ∈ X. Hence |φ(x) − φ(y)| < |x − y| for any x ≠ y, and so φ is a shrinking map.

Take x0 = 1, x1 = φ(x0) = 2, x2 = φ(x1) = 2 + 1/2 = 5/2, x3 = φ(x2) = 29/10, x4 = φ(x3) = 941/290, .... Clearly (xn) is increasing. If it remains bounded, then it has a limit, ℓ, say. Then we shall have ℓ = ℓ + 1/ℓ, which is impossible. So (xn) is unbounded, doesn't converge, isn't Cauchy. Too bad!

LECTURE 15

9.2 Definition Let (X, d) be a metric space. A map φ : X → X is a contraction mapping if there exists a constant k < 1 such that d(φ(x), φ(y)) ≤ kd(x, y) for all x, y ∈ X. Examples: 1. Take X = [0, 1], usual metric, and φ(x) = x²/3. Then

d(φ(x), φ(y)) = |x²/3 − y²/3| = (1/3)|x + y| |x − y| ≤ (2/3)|x − y|.
So φ is a contraction mapping, with k = 2/3.

2. Take X = R and φ(x) = (1/4) sin 3x. So |φ(x) − φ(y)| = (1/4)|sin 3x − sin 3y|. Use MVT! φ′(x) = (3/4) cos 3x, so |φ′(x)| ≤ 3/4, and
|φ(x) − φ(y)| = |φ′(c)| |x − y| ≤ (3/4)|x − y|, etc.
3. Take X = [1, ∞) and φ(x) = x + 1/x. Suppose x = n and y = n + 1; then
φ(y) − φ(x) = n + 1 + 1/(n + 1) − n − 1/n = 1 − 1/(n(n + 1)),
so |φ(y) − φ(x)|/|y − x| can be made as close to 1 as we like by taking x = n and y = n + 1 for n large. Thus φ (which is a shrinking mapping) is not a contraction mapping.

9.3 Theorem Let (X, d) be a metric space and let φ : X → X be a contraction mapping. Then for each x0 ∈ X the sequence defined by xn = φ(xn−1) (for each n ≥ 1) is a Cauchy sequence.

Proof: for some k < 1 we have d(φ(x), φ(y)) ≤ kd(x, y). So

d(x2, x1) ≤ kd(x1, x0) = kd, say,
d(x3, x2) ≤ kd(x2, x1) ≤ k²d,
...
d(xn+1, xn) ≤ kⁿd,
and so on. Hence for n > m we have

d(xn, xm) ≤ d(xm, xm+1) + d(xm+1, xm+2) + ... + d(xn−1, xn)
          ≤ kᵐd + kᵐ⁺¹d + ... + kⁿ⁻¹d, from the above,
          ≤ kᵐd(1 + k + k² + ...) = kᵐd/(1 − k),

which tends to 0 as m → ∞. Thus (xn) is Cauchy.

Note: in the above theorem, if (X, d) is complete, then (xn) will converge to a limit x ∈ X. Note that x is a fixed point of φ, i.e., φ(x) = x, since

d(x, φ(x)) ≤ d(x, xn) + d(xn, φ(xn)) + d(φ(xn), φ(x))

≤ d(x, xn) + d(xn, xn+1) + kd(xn, x), and each term tends to 0 as n → ∞. So d(x, φ(x)) = 0, i.e., x = φ(x).

9.4 Theorem (Banach’s Contraction Mapping Theorem) (CMT) Let (X, d) be a complete metric space, and let φ : X → X be a contraction mapping. Then φ has a unique fixed point. If x0 is any point in X then the sequence defined by xn = φ(xn−1) (for n ≥ 1) converges to the fixed point.

Proof: by Theorem 9.3 and the note following it, we have proved everything except the fact that there is only one fixed point for φ. But if x and y are fixed points, then

d(x, y) = d(φ(x), φ(y)) ≤ kd(x, y), with k < 1; this can only happen if d(x, y) = 0, i.e., x = y.

How to apply the CMT: suppose we want to solve the equation φ(x) = x, where φ is a contraction mapping. Take x0 ∈ X and construct (xn) as above. Then (xn) tends to a solution x.

Note that in an incomplete metric space, there may be problems. For example take X = (0, 1) ⊂ R and φ(x) = x/2. The iterates form a Cauchy sequence but the limit, 0, isn’t in the space, and there is no fixed point in the space.

9.5 How fast does it converge?

Answer: d(x1, x) = d(φ(x0), φ(x)) ≤ kd(x0, x), and in general d(xn, x) ≤ kⁿd(x0, x). Also d(x0, x) ≤ d(x0, x1) + d(x1, x) ≤ d(x0, x1) + kd(x0, x), so (1 − k)d(x0, x) ≤ d(x0, x1), and we conclude that
d(xn, x) ≤ (kⁿ/(1 − k)) d(x0, x1),
so we can choose n large to make this as small as we wish.
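This speed estimate can be watched numerically. A sketch (our addition, not part of the notes) using Example 2 from 9.2, φ(x) = (1/4) sin 3x, a contraction with k = 3/4 whose fixed point is x = 0:

```python
import math

phi = lambda x: math.sin(3 * x) / 4   # contraction on R with k = 3/4
k = 3 / 4

x0 = 1.0
x1 = phi(x0)
x = x0
for n in range(1, 30):
    x = phi(x)
    bound = k ** n / (1 - k) * abs(x0 - x1)
    assert abs(x - 0.0) <= bound      # a-priori bound on d(x_n, x*) holds
print(abs(x))   # close to the fixed point 0
```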

Examples:

1. Show that x³ + 2x² − 8x + 4 = 0 has a unique solution in [−1, 1], and find it correct to within ±10⁻⁶.

Solution: write the equation as x = (1/8)(x³ + 2x² + 4), and let φ(x) = (1/8)(x³ + 2x² + 4) for −1 ≤ x ≤ 1. Note that if |x| ≤ 1 then
|φ(x)| ≤ (1/8)(|x|³ + 2|x| + 4) ≤ 7/8,
so φ does map [−1, 1] to itself. Then

φ′(x) = (1/8)(3x² + 4x), so |φ′(x)| ≤ 7/8

for x ∈ [−1, 1], so φ is a contraction mapping with k = 7/8. It has a unique fixed point, as required.

LECTURE 16

Take x0 = 0. Defining xn = φ(xn−1), we get x1 = 0.5, x2 = 0.578, etc. as in Examples 9.1. The sequence converges to 0.6308976, although convergence is slow, since k = 7/8, so the error after n steps is only bounded by

|xn − x| ≤ (kⁿ/(1 − k)) |x0 − x1| = 4(7/8)ⁿ.
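A short computation (our addition, not part of the notes) confirms both the limit and how many steps the bound 4(7/8)ⁿ guarantees are enough for ±10⁻⁶:

```python
phi = lambda x: (x**3 + 2 * x**2 + 4) / 8

x = 0.0
for _ in range(250):
    x = phi(x)
print(x)   # the fixed point, approximately 0.63090

# smallest n with 4 * (7/8)**n < 1e-6, i.e. guaranteed accuracy 1e-6
n = 0
while 4 * (7 / 8) ** n >= 1e-6:
    n += 1
print(n)
```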

2. Define φ : C[0, 1] → C[0, 1] by

φ(f)(x) = ∫₀ˣ t(t + f(t)) dt.

Show φ is a contraction mapping for the metric d∞, and use φ to find an approximate solution y to the differential equation
dy/dx = x(x + y), y(0) = 0, (0 ≤ x ≤ 1).
Solution: d∞(φ(f), φ(g)) = max{|φ(f)(x) − φ(g)(x)| : 0 ≤ x ≤ 1}.

|φ(f)(x) − φ(g)(x)| = |∫₀ˣ t(t + f(t)) dt − ∫₀ˣ t(t + g(t)) dt|
                    = |∫₀ˣ t(f(t) − g(t)) dt|
                    ≤ ∫₀ˣ t |f(t) − g(t)| dt
                    ≤ ∫₀ˣ t d∞(f, g) dt = (x²/2) d∞(f, g).

Thus d∞(φ(f), φ(g)) ≤ (1/2) d∞(f, g), and φ is a contraction map with k = 1/2.

If y = f(x) is a solution of the diff. eq. then f′(t) = t(t + f(t)) and f(0) = 0. Integrate from 0 to x:
∫₀ˣ f′(t) dt = ∫₀ˣ t(t + f(t)) dt,
i.e., f(x) = φ(f)(x). So f = φ(f) and f is a fixed point of φ.

So CMT says that the d.e. has a unique solution, which we can obtain by iteration. We did this in Examples 9.1 as well. Note that f0 = 0 and f1(x) = x³/3, so d∞(f0, f1) = 1/3, and in general d∞(fn, f) ≤ 1/(3·2ⁿ⁻¹), by 9.5.

Another example: f′(t) = t(1 + f(t)), for t ∈ [0, 1], with f(0) = 0; here f(t) = e^(t²/2) − 1 is the actual (unique) solution.

Take f0(x) = 0, f1(x) = ∫₀ˣ t(1 + f0(t)) dt = x²/2,
f2(x) = ∫₀ˣ t(1 + f1(t)) dt = x²/2 + x⁴/8, etc.
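The iteration can also be carried out numerically on a grid and compared with the exact solution e^(x²/2) − 1; a sketch (our addition, approximating the integral in φ by the trapezoidal rule):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
f = np.zeros_like(t)                  # f0 = 0

for _ in range(12):                   # twelve Picard iterations
    integrand = t * (1 + f)
    steps = (integrand[1:] + integrand[:-1]) / 2 * np.diff(t)
    f = np.concatenate(([0.0], np.cumsum(steps)))   # integral from 0 to t

exact = np.exp(t**2 / 2) - 1
print(np.max(np.abs(f - exact)))      # small uniform error
```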

9.6 General method Solve
dy/dx = F(x, y), y(a) = c, a ≤ x ≤ b,
where F(x, y) is a real-valued function defined for a ≤ x ≤ b and y ∈ R.

If y = f(x) is a solution, then

f′(t) = F(t, f(t)), f(a) = c, (t ∈ [a, b]), (1)
or, equivalently,
f(x) = c + ∫ₐˣ F(t, f(t)) dt. (2)
Define φ : C[a, b] → C[a, b] by

φ(f)(x) = c + ∫ₐˣ F(t, f(t)) dt.
So f is a solution of (1) ⟺ f is a solution of (2) ⟺ f is a fixed point of φ. If φ is a contraction for the d∞ metric on C[a, b], then by CMT (1) has a unique solution, which is the limit of a sequence (fn), where f0 ∈ C[a, b] is arbitrary and fn = φ(fn−1) for n ≥ 1. Also d∞(fn, f) ≤ (kⁿ/(1 − k)) d(f0, f1), where k is the contraction constant of φ.

So when is φ a contraction map?

Note that some differential equations don't have solutions everywhere we might want them; e.g. f′(t) = −f(t)² for t ∈ [0, 2], with f(0) = −1. The only solution is f(t) = 1/(t − 1), which is discontinuous at t = 1.

9.7 Theorem With F and φ as above, suppose there is a constant k < 1 such that
|F(x, y1) − F(x, y2)| ≤ (k/(b − a)) |y1 − y2| for all x ∈ [a, b], y1, y2 ∈ R.

Then φ is a contraction mapping on (C[a, b], d∞).

Proof: For f, g ∈ C[a, b],

|φ(f)(x) − φ(g)(x)| = |∫ₐˣ (F(t, f(t)) − F(t, g(t))) dt|
                    ≤ ∫ₐˣ |F(t, f(t)) − F(t, g(t))| dt
                    ≤ (k/(b − a)) ∫ₐˣ |f(t) − g(t)| dt
                    ≤ (k/(b − a)) ∫ₐˣ d∞(f, g) dt
                    = k ((x − a)/(b − a)) d∞(f, g),

so d∞(φ(f), φ(g)) ≤ kd∞(f, g), as required.

9.8 Definition

A function f :[a, b] → R satisfies the Lipschitz condition with constant m if |f(x1) − f(x2)| ≤ m|x1 − x2| for all x1, x2 ∈ [a, b].

If f is differentiable on [a, b] and m = max{|f′(t)| : t ∈ [a, b]}, then f satisfies the Lipschitz condition with constant m, since the Mean Value Theorem gives, for some c between x1 and x2,

|f(x1) − f(x2)| = |(x1 − x2)f′(c)| ≤ m|x1 − x2|.

LECTURE 17

Similarly, if we have a function F (x, y), we say that it satisfies the Lipschitz condition in y with constant m if

|F (x, y1) − F (x, y2)| ≤ m|y1 − y2|

for all x and for all y1 and y2 for which the above is defined. If we have partial derivatives, we can take
m = max{|∂F/∂y| : a ≤ x ≤ b, y ∈ R}.
9.9 Theorem

If F satisfies the Lipschitz condition in y with a constant m < 1/(b − a), then the differential equation y′ = F(x, y), y(a) = c has a unique solution for a ≤ x ≤ b.

Proof: use Theorem 9.7, writing m = k/(b − a) with k < 1.

In fact if it satisfies the Lipschitz condition with any constant m at all, we can still solve the equation. What we do is to solve it in C[a, a + δ], where m < 1/δ, and obtain a value y(a + δ). We then solve it in C[a + δ, a + 2δ], and keep going until we get to b.

[DIAGRAM: lots of pieces joined together.]

Examples: 1. dy/dx = cos(x²y), y(0) = 2, for 0 ≤ x ≤ 1. Here F(x, y) = cos(x²y), and ∂F/∂y = −x² sin(x²y), so

|∂F/∂y| ≤ x² ≤ 1. (3)
Thus F satisfies the Lipschitz condition in y with constant m = 1. Not good enough for the theorem to apply on [0, 1] (although we could use [0, 1/2] and [1/2, 1], as above).

But if
φ(f)(x) = 2 + ∫₀ˣ cos(t²f(t)) dt,
then
|φ(f)(x) − φ(g)(x)| = |∫₀ˣ (cos(t²f(t)) − cos(t²g(t))) dt|
                    ≤ ∫₀ˣ |F(t, f(t)) − F(t, g(t))| dt
                    ≤ ∫₀ˣ t² |f(t) − g(t)| dt, by (3),
                    ≤ ∫₀ˣ t² d∞(f, g) dt = (x³/3) d∞(f, g).
So d∞(φ(f), φ(g)) ≤ (1/3) d∞(f, g), and so φ is a contraction map.

2. dy/dx = √y, y(0) = 0, 0 ≤ x ≤ 1. (4)
Here F(x, y) = y^(1/2) and ∂F/∂y = (1/2)y^(−1/2), which is unbounded. So F does not satisfy a Lipschitz condition in y at all. For any c ∈ (0, 1] we can define
fc(x) = 0 if x ≤ c, and fc(x) = (1/4)(x − c)² if c ≤ x ≤ 1.
[DIAGRAM: constant on [0, c], parabola rising on [c, 1].]

Then fc(x) satisfies (4), so there is no unique solution.

A new type of example: consider φ : R → R defined by φ(x) = cos x. Then φ is a shrinking map but not a contraction map, since |φ(x) − φ(y)| = |sin z| |x − y| for some z between x and y, by the Mean Value Theorem. This is at most 1 (so shrinking), but can be close to 1 if x and y are close to π/2, for example.

[DIAGRAM: plot y = x, y = cos x; curves meet once between 0 and π/2.]

We shall see that nevertheless φ has a unique fixed point.

We see that φ² : R → R, where φ²(x) = φ(φ(x)), is cos(cos x), and this is a contraction map, since
|cos(cos x) − cos(cos y)| / |x − y| = |sin(cos z) sin z|
for some z, by the MVT, and this is at most sin 1, since cos z lies between −1 and 1. But sin 1 is about 0.8415; anyway, it's less than 1.

The following theorem shows that φ has a unique fixed point, given by iteration: x0 = 0, x1 = φ(x0) = 1, x2 = φ(x1) = 0.54, etc. (keep hitting cos button on calculator, working in radians), and this converges to 0.7390851 ....
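The calculator experiment in a few lines of Python (our addition, not part of the notes):

```python
import math

x = 0.0
for _ in range(100):
    x = math.cos(x)       # keep hitting the cos button (radians)
print(x)                  # approximately 0.7390851
```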

9.10 Theorem If (X, d) is a complete metric space and φ : X → X is a map such that some iterate φᵐ of φ is a contraction map, then φ has a unique fixed point. For any x0 ∈ X the sequence (xn) = (φⁿ(x0)) converges to the fixed point.

Proof: by the CMT applied to φᵐ, we get a unique fixed point x for φᵐ. So x = φᵐ(x). Apply φ; then
φ(x) = φ(φᵐ(x)) = φᵐ⁺¹(x) = φᵐ(φ(x)),

so that φ(x) is also a fixed point of φᵐ. By the uniqueness, φ(x) = x, so x is a fixed point of φ as well.

If x and y are fixed points of φ, then x and y are fixed points of φᵐ, which is a contraction mapping, and so x = y. Hence φ has a unique fixed point.

Sketch of last assertion: let’s do m = 3 for illustration (the general case is similar, with more complicated notation). We have, by the CMT for φm:

x0, x3, x6, . . . , x3k,... → x,

x1, x4, x7, . . . , x3k+1,... → x,

x2, x5, x8 . . . , x3k+2,... → x.

This implies that the single sequence x0, x1, x2, ... tends to x, since given ε > 0, we have d(x3k, x) < ε for k > k0, say, d(x3k+1, x) < ε for k > k1, say, and d(x3k+2, x) < ε for k > k2, say. So d(φᴺ(x0), x) < ε for N > max{3k0, 3k1 + 1, 3k2 + 2}.

LECTURE 18

9.11 The final word on differential equations (calculation non-examinable) Given the differential equation
dy/dx = F(x, y), y(a) = c, (a ≤ x ≤ b), (5)
we define
φ(f)(x) = c + ∫ₐˣ F(t, f(t)) dt,
as usual. Suppose that F satisfies the Lipschitz condition in y with constant m. We shall see that some iterate of φ is a contraction mapping.

As before we calculate
|φ(f)(x) − φ(g)(x)| ≤ ∫ₐˣ |F(t, f(t)) − F(t, g(t))| dt
                   ≤ m ∫ₐˣ |f(t) − g(t)| dt, by the Lipschitz condition (6)
                   ≤ m ∫ₐˣ d∞(f, g) dt = m d∞(f, g)(x − a). (7)

Repeat:

|φ²(f)(x) − φ²(g)(x)| = |φ(φ(f))(x) − φ(φ(g))(x)|
                      ≤ m ∫ₐˣ |φ(f)(t) − φ(g)(t)| dt, by (6) with f, g replaced by φ(f), φ(g)
                      ≤ m ∫ₐˣ m d∞(f, g)(t − a) dt, by (7)
                      = m² d∞(f, g) ∫ₐˣ (t − a) dt = m² d∞(f, g) (x − a)²/2. (8)
Once more:
|φ³(f)(x) − φ³(g)(x)| ≤ m ∫ₐˣ |φ²(f)(t) − φ²(g)(t)| dt, by (6) with f, g replaced by φ²(f), φ²(g)
                      ≤ m ∫ₐˣ m² d∞(f, g) (t − a)²/2 dt, by (8)
                      = m³ d∞(f, g) (x − a)³/3!.
In general we obtain
|φⁿ(f)(x) − φⁿ(g)(x)| ≤ (mⁿ(x − a)ⁿ/n!) d∞(f, g),
i.e.,
d∞(φⁿ(f), φⁿ(g)) ≤ (mⁿ(b − a)ⁿ/n!) d∞(f, g).
Now Σ xⁿ/n! converges for any x, so its terms tend to zero, and so mⁿ(b − a)ⁿ/n! tends to zero for all choices of m, a and b. If we choose N so that mᴺ(b − a)ᴺ/N! < 1, then φᴺ is a contraction mapping. We thus have:

9.12 Theorem If F(x, y) satisfies the Lipschitz condition in y for some constant m, where a ≤ x ≤ b and y ∈ R, then the differential equation (5) has a unique solution, which can be approximated by iteration.

10 Connectedness

10.1 Definition A metric space X is disconnected if there exist open, disjoint, nonempty sets U, V such that X = U ∪ V. Note that U and V will also be closed, as their complements are open. Otherwise X is connected. A subset is connected/disconnected if it is connected/disconnected when we restrict the metric to the subset to get a (smaller) metric space.

[DIAGRAM in R2]

Examples: (i) X with the discrete metric is disconnected if #X > 1. (ii) In R, consider A = [0, 1] ∪ [2, 3]. This splits into A ∩ (−∞, 3/2) and A ∩ (3/2, ∞). (iii) Q is disconnected; it splits into Q ∩ (−∞, √2) and Q ∩ (√2, ∞). (iv) R is connected – see later.

10.2 Definition

An interval in R is a set S such that if s, t ∈ S then [s, t] ⊂ S.

Examples are (a, b), [a, b], (a, b], [a, b), (−∞, b), (−∞, b], (a, ∞), [a, ∞), with a, b finite; also ∅ and R. These are all the examples possible.

We want to show that the connected subsets of R (usual metric) are precisely the intervals.

10.3 Lemma

Let x, y ∈ R with x < y, and let U, V be disjoint open sets in R with x ∈ U and y ∈ V. Then there is a z ∈ (x, y) with z ∉ U ∪ V.

Proof: Let T = {t < y : t ∈ U}. Now x ∈ T and so T 6= ∅, and it is bounded above by y, so it has a least upper bound, z = sup T .

We can’t have z ∈ U or else a neighbourhood (z − δ, z + δ) is contained in U, which means that z isn’t the least upper bound of T .

We can’t have z ∈ V , or else a neighbourhood (z − δ, z + δ) is contained in V (and so doesn’t meet U), and again z isn’t the least upper bound of T .

Thus x < z < y and z ∉ U ∪ V. □
N.B. The same result holds assuming only that [x, y] ∩ U and [x, y] ∩ V are disjoint, which is the form we shall require.

10.4 Theorem

A subset S of R is connected if and only if it is an interval.

Proof: (i) If S is not an interval, then there are x, y ∈ S with [x, y] ⊄ S, so there is a z ∈ (x, y) with z ∉ S.

Now take U = S ∩ (−∞, z) and V = S ∩ (z, ∞); we see that this disconnects S.

(ii) Suppose that S is an interval and U, V are open in R with (U ∩ S) ∩ (V ∩ S) = ∅, but U ∩ S and V ∩ S nonempty, and S ⊂ U ∪ V . Take x ∈ U ∩ S and y ∈ V ∩ S. By Lemma 10.3, there is a z ∈ (x, y) with z 6∈ U ∪ V . But z ∈ S ⊂ U ∪ V , a contradiction. Hence the result.

The intersection of two connected sets needn’t be connected [picture in R2] and nor is the union of two connected sets (e.g. (0, 1) ∪ (2, 3)). (In R, however, the intersection of two connected sets is connected, since these are just intervals.)

Unions are OK if there is a point in common, as we see next.

LECTURE 19

10.5 Remark Let Y ⊂ X, where X is a metric space. Then a subset S ⊂ Y is open (regarded as a subset of Y) if and only if S = U ∩ Y, where U is an open subset of X. This follows easily on noting that every open subset of Y is a union of balls B_Y(s, ε) for s ∈ S, and B_Y(s, ε) = B_X(s, ε) ∩ Y. So, for example, we can say that [0, 1] and [2, 3] are open subsets of the metric space Y = [0, 1] ∪ [2, 3], although not open when regarded as subsets of X = R, since, for example, [0, 1] = (−∞, 3/2) ∩ Y.

10.6 Theorem

Let (X, d) be a metric space and x ∈ X. Let {S_λ : λ ∈ Λ} be a family of connected sets, each containing x. Then ⋃_{λ∈Λ} S_λ is connected.

Proof: Let U, V be open, with ⋃ S_λ ⊂ U ∪ V, and U ∩ V ∩ ⋃ S_λ = ∅. WLOG, x ∈ U and x ∉ V.

For each λ, S_λ = (U ∩ S_λ) ∪ (V ∩ S_λ), where U ∩ S_λ is nonempty since it contains x; so since S_λ is connected, V ∩ S_λ = ∅ for each λ, and so V ∩ ⋃ S_λ = ∅. Hence ⋃ S_λ is connected. □

So every point x ∈ X is contained in a maximal connected subset, the connected component containing x, namely

C_x = ⋃ {S ⊂ X : x ∈ S, S connected}.

Of course {x} itself is one such connected set S, so this is not an empty union.

Now for x, y ∈ X, either C_x = C_y or else C_x ∩ C_y = ∅. For if they meet, then C_x ∪ C_y is connected by Theorem 10.6 (applied at a common point) and contains x, so C_x ∪ C_y ⊂ C_x by maximality of C_x; similarly C_x ∪ C_y ⊂ C_y, giving C_x = C_y. Hence we can write X as a disjoint union of connected components, and these are the maximal connected subsets.

If we apply this to an open subset of R, we see that it is necessarily a countable union of disjoint open intervals (countably many, since each one contains a different rational point and there are only countably many rationals to go round).

10.7 Theorem Let f : X → Y be a continuous mapping between metric spaces, and suppose that X is connected. Then the image f(X) := {f(x): x ∈ X} is also connected.

Proof: If f(X) = U ∪ V with U, V open (in f(X)) and disjoint, then X = f⁻¹(U) ∪ f⁻¹(V), with f⁻¹(U) and f⁻¹(V) disjoint, and open (since f is continuous). Since X is connected this can only happen if one of f⁻¹(U) and f⁻¹(V) is empty, which means that one of U and V is empty. So f(X) is connected. □

10.8 Corollary (Intermediate Value Theorem)

Let f : [a, b] → R be continuous. Then for each y between f(a) and f(b) there is an x between a and b with f(x) = y.

Proof: By Theorem 10.7 we have that f([a, b]) is connected, and hence, by Theorem 10.4, it is an interval. The result is now clear. □
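The Intermediate Value Theorem is the principle behind the bisection method for solving f(x) = y numerically. A minimal sketch in Python (the sample function and tolerance are illustrative choices, not part of the notes):

```python
def bisect(f, a, b, tol=1e-12):
    # Assumes f is continuous on [a, b] with f(a), f(b) of opposite signs.
    # The IVT then guarantees a root in (a, b); we trap it in intervals
    # whose length halves at every step.
    fa = f(a)
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fa < 0) == (fm < 0):  # sign change lies in [m, b]
            a, fa = m, fm
        else:                     # sign change lies in [a, m]
            b = m
    return (a + b) / 2

# solve x^2 = 2 on [1, 2]; the root is sqrt(2)
root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(abs(root - 2 ** 0.5) < 1e-10)  # True
```

Connectedness of [a, b] is exactly what rules out the method "jumping over" the root.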

10.9 Definition Let (X, d) be a metric space. Then X is path-connected if, for all x, y ∈ X, there is a continuous f : [0, 1] → X with f(0) = x, f(1) = y (i.e., a path joining them).

(Of course we can also talk about path-connected subsets of a metric space, as they are metric spaces too.)

10.10 Proposition Let X be a path-connected metric space. Then X is connected.

Proof: Suppose that X = U ∪ V, with U and V open, disjoint and nonempty. Take x ∈ U and y ∈ V. Then there is a path f : [0, 1] → X joining x to y. Hence [0, 1] = f⁻¹(U) ∪ f⁻¹(V), each open in the metric space [0, 1]. But [0, 1] is connected, so one of f⁻¹(U) and f⁻¹(V) is empty. But 0 ∈ f⁻¹(U) and 1 ∈ f⁻¹(V), so we have a contradiction. So X is connected. □

The converse is false. Take X = G ∪ I, where G = {(x, sin 1/x) : x > 0} and I = {(0, y) : −1 ≤ y ≤ 1}. Then G ∪ I is connected, but not path-connected. See the Exercises.

LECTURE 20

10.11 Remark

For open subsets of R^n it is true that connected and path-connected are the same. Suppose that S is open and connected, and take x ∈ S. Then

U = {y ∈ S : we can join x to y by a path} is open [DIAGRAM]. So is

V = {y ∈ S : we can’t join x to y by a path}.

Since S = U ∪ V (open, disjoint) and U is nonempty, since it contains x, we see that by connectedness V = ∅ and U = S.

10.12 Theorem

(i) Let n ≥ 2. Then R^n (usual metric) isn’t homeomorphic to R. (ii) Moreover, no two out of (0, 1), [0, 1) and [0, 1] are homeomorphic.

Proof: (i) Suppose that f : R → R^n were a homeomorphism. Let U = R \ {0}, and V = R^n \ {f(0)}. Then g = f|_U is a homeomorphism from U onto V, and hence V is disconnected, as it splits into the disjoint nonempty open sets f((−∞, 0)) and f((0, ∞)). But V is path-connected, hence connected. Contradiction.

(ii) Similarly, if we delete any point from (0, 1) it becomes disconnected; not true for the others if we delete 0. So (0, 1) is not homeomorphic to the others. If we delete any 2 points from [0, 1) it becomes disconnected; not true for [0, 1] if we delete the two end-points. So the other two aren’t homeomorphic, either. □

Similarly we can see that [0, 1] is not homeomorphic to the square [0, 1] × [0, 1], since removing any three points will disconnect [0, 1], but not the square. This is in spite of the fact that there exist “space-filling curves”, i.e., continuous (non-bijective) maps from [0, 1] onto [0, 1] × [0, 1]. There also exist discontinuous bijections between the two sets.

11 Compactness

Recall that any real continuous function on a closed bounded interval [a, b] is bounded and attains its bounds. We look at this in a more general context.

11.1 Definition Let K ⊆ X, where (X, d) is a metric space. An open cover of K is a family of open sets (U_λ)_{λ∈Λ} such that K ⊂ ⋃_{λ∈Λ} U_λ. We say that K is compact if whenever (U_λ)_{λ∈Λ} is an open cover of K, there is a finite subcover U_{λ_1}, …, U_{λ_N} such that K ⊂ U_{λ_1} ∪ … ∪ U_{λ_N}.

“Every open cover has a finite subcover.”

It doesn’t matter whether we cover K with open sets in K or open sets in X, since open sets in K are just the intersection with K of open sets in X.

11.2 Examples

Clearly R is not compact, as Un = (−n, n) for n = 1, 2,... form an open cover, but we cannot cover the whole of R by taking only finitely many of these sets.

Similarly, nor is (0, 1) = ⋃_{n=2}^∞ (1/n, 1), since (0, 1) is not covered by any finite union of these sets. It will be shown later that [0, 1] is compact. More generally, it turns out that the compact subsets of R^n with the Euclidean metric are just the closed bounded ones. Thus, compact subsets of R include finite sets, and finite unions of closed intervals such as [0, 1] ∪ [2, 3]. But NOT (0, 1), R itself, or Q.
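The failure of finite subcovers for (0, 1) can be checked mechanically: the union of any finite subfamily of the sets U_n = (1/n, 1) is just (1/max n, 1), which misses points near 0. A quick sketch (the particular subfamily chosen is an arbitrary illustration):

```python
def union_covers(ns, x):
    # is x in the union of the sets U_n = (1/n, 1) for n in ns?
    return any(1 / n < x < 1 for n in ns)

ns = [2, 5, 17, 100]          # any finite subfamily of the cover
witness = 1 / (2 * max(ns))   # a point of (0, 1) below every left endpoint 1/n
print(0 < witness < 1, union_covers(ns, witness))  # True False
```

Whatever finite subfamily is tried, the same construction of a witness point works, which is exactly why (0, 1) is not compact.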

11.3 Theorem

Let f :(K, d) → R be continuous with K ⊂ X compact. Then f is bounded on K and it attains its bounds (so that ∃x ∈ K with f(x) = sup{f(k): k ∈ K} < ∞ and similarly for inf).

Proof: Let U_n = {x ∈ K : |f(x)| < n} for n = 1, 2, 3, …, which is f⁻¹((−n, n)) and hence open; we have K ⊂ ⋃ U_n. By compactness K ⊂ U_{n_1} ∪ … ∪ U_{n_N} for some n_1, …, n_N, and now |f(x)| ≤ max{n_1, …, n_N} for x ∈ K.

Also, if s = sup_{x∈K} f(x), we have either that f(x) = s for some x, or else that 1/(s − f(x)) is a continuous function on K and hence bounded by M > 0, say. This means that s − f(x) ≥ 1/M for all x ∈ K; i.e., f(x) ≤ s − 1/M, contradicting the definition of s as the sup. □

11.4 Theorem Let (X, d) be a metric space; then every compact subset K ⊂ X is closed and bounded.

Proof: Let x be a point of X \ K. For each k ∈ K consider the balls B_k = B(k, r_k/2) and C_k = B(x, r_k/2), where r_k = d(x, k) > 0. These are disjoint and the B_k form an open cover of K. By compactness we can find k_1, …, k_N such that K ⊂ B_{k_1} ∪ … ∪ B_{k_N}.

But now C_{k_1} ∩ … ∩ C_{k_N} is an open ball containing x which is disjoint from B_{k_1} ∪ … ∪ B_{k_N} and hence from K. So K is closed. [DIAGRAM]

Also, let x be any point of X and note that K ⊂ ⋃_{n=1}^∞ B(x, n). By compactness ∃n_1, …, n_N such that K ⊂ B(x, n_1) ∪ … ∪ B(x, n_N), and thus d(k, x) < max{n_1, …, n_N} for all k ∈ K. □

LECTURE 21

An alternative way to see why a compact set K is necessarily closed and bounded is to take any x ∈ X \K and consider the continuous function on K given by f(k) = d(k, x). By Theorem 11.3, f attains its lower bound δ ≥ 0. But δ cannot be 0 as then we would have k ∈ K with k = x. Thus δ > 0 and B(x, δ) ∩ K = ∅ so K is closed. Also, since f is bounded we see that K is bounded.

11.5 Example The infinite set S = {1, 1/2, 1/3, …} ∪ {0} is compact. For any open cover (U_λ) of S, there will be a set, say U_{λ_0}, containing 0. Since U_{λ_0} is open, there is an N such that U_{λ_0} will also contain 1/n for all n ≥ N. But then we only need finitely many more U_λ to cover the whole set.

New compact sets from old ones.

11.6 Theorem (i) Let X be a compact metric space and F a closed subset of X. Then F is compact. (ii) Let X be a compact metric space and Y an arbitrary metric space. Suppose that f : X → Y is continuous. Then f(X) is compact.

Proof: (i) If we have an open cover of F, say F ⊂ ⋃_{λ∈Λ} U_λ, then by adding the set X \ F, which is open, we have an open cover of X. Since X is compact we only need finitely many sets, say X ⊂ (X \ F) ∪ U_{λ_1} ∪ … ∪ U_{λ_N}, and now F ⊂ U_{λ_1} ∪ … ∪ U_{λ_N}, so F is compact.

(ii) Given an open cover f(X) ⊂ ⋃_{λ∈Λ} U_λ, we see that X ⊂ ⋃_{λ∈Λ} f⁻¹(U_λ), since for each point x ∈ X there is a λ with f(x) ∈ U_λ, meaning that x ∈ f⁻¹(U_λ). Since f is continuous, this is an open cover of X.

But now we have a finite subcover of X: X ⊂ f⁻¹(U_{λ_1}) ∪ … ∪ f⁻¹(U_{λ_N}), which means that f(X) ⊂ U_{λ_1} ∪ … ∪ U_{λ_N}. Hence f(X) is compact. □

This gives us another way to prove Theorem 11.3. For if K is compact then f(K) ⊂ R is compact, hence a bounded set, and being closed implies that the least upper bound is in the set.

11.7 Theorem (Heine–Borel)

Any closed bounded real interval [a, b] ⊂ R is compact (in the usual metric).

Proof: Given an open cover [a, b] ⊂ ⋃_{λ∈Λ} U_λ, let

S = {x ∈ [a, b] : [a, x] is covered by some finite subcollection of the U_λ}.

Now a ∈ S, as there is a U_λ containing a, so S is a nonempty bounded set with supremum y, say; indeed it’s an interval [a, y) or [a, y], since if x_1 < x_2 and x_2 ∈ S then also x_1 ∈ S. If we can show that y = b and y ∈ S, then we have the result. But if y < b, then y lies in some U_{λ_0}, so that (y − δ, y + δ) ⊂ U_{λ_0} for some δ > 0. Now y − δ/2 ∈ S, so we cover [a, y − δ/2] by finitely many U_λ and then, adding U_{λ_0} to the collection, cover [a, y + δ/2] by finitely many, contradicting the definition of y. The same argument also shows that y ∈ S. Putting this together, we see that we can cover [a, b] by finitely many sets. □

Now here is a concept that, for metric spaces, is equivalent to compactness and a little easier to understand.

11.8 Definition A subset K of a metric space is sequentially compact if every sequence in K has a convergent subsequence with limit in K.

The classical Bolzano–Weierstrass theorem in R says that every bounded sequence has a convergent subsequence, and this implies that all closed bounded subsets F ⊂ R are sequentially compact (“closed” guarantees that the limit is in F ).

11.9 Example The closed unit ball B in ℓ² is not sequentially compact (although closed and bounded). Recall that

B = {(x_n) : ∑_{n=1}^∞ x_n² ≤ 1}.

For let e_1 = (1, 0, 0, …), e_2 = (0, 1, 0, 0, …), e_3 = (0, 0, 1, 0, …), etc. Then (e_n) is a sequence in B with no convergent subsequence, since d(e_n, e_m) = ‖e_n − e_m‖ = √2 for all n ≠ m.
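The √2 separation of the basis vectors can be verified directly on truncations to finitely many coordinates (the representation of sequences as zero-padded lists is just for illustration):

```python
import math

def l2_dist(x, y):
    # Euclidean distance between finitely-supported sequences given as lists
    n = max(len(x), len(y))
    x = x + [0.0] * (n - len(x))
    y = y + [0.0] * (n - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def e(n):
    # n-th standard basis vector e_n, truncated to n coordinates
    return [0.0] * (n - 1) + [1.0]

# every pair of distinct basis vectors lies at distance sqrt(2)
dists = {l2_dist(e(n), e(m)) for n in range(1, 11) for m in range(1, 11) if n != m}
print(dists == {math.sqrt(2)})  # True
```

Since every pairwise distance is √2, no subsequence of (e_n) can be Cauchy, so none can converge.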

We need one more definition before we state the final big theorem of the course.

11.10 Definition A subset K of a metric space is precompact or totally bounded if for each ε > 0 it can be covered with finitely many balls B(xk, ε). [Think of employing finitely-many short-sighted guards to watch over your set.]

Easily, every compact set is precompact, since we can cover K with open balls B(x, ε), where x varies over the whole of K. By compactness we only need finitely many. [DIAGRAM]

LECTURE 22

But the closed unit ball of ℓ² isn’t precompact: if it were covered by finitely many balls of radius 1/2, then each e_n would have to be in a different one – for if d(x, e_n) < 1/2 and d(x, e_m) < 1/2 we get d(e_n, e_m) < 1, which is a contradiction for n ≠ m, since d(e_n, e_m) = √2. So it isn’t compact either.

11.11 Example – the Hilbert cube Consider the subset C ⊂ ℓ² defined by

C = {(x_n)_{n=1}^∞ : |x_n| ≤ 1/n for each n}.

We claim that C is precompact. For given ε > 0, choose N such that ∑_{n=N+1}^∞ 1/n² < ε²/4. Now the set

D = {x = (x_1, …, x_N) ∈ R^N : |x_n| ≤ 1/n for each n}

is easily seen to be precompact: we can cover it with balls of radius ε/2 simply by taking enough centres (y_1, …, y_N) such that |x_j − y_j| < ε/(2N) for each x_j ∈ [−1/j, 1/j]

[DIAGRAM].

Now think of vectors in R^N as being padded with zeroes, so that they lie in ℓ².

That is, D ⊂ ⋃_{k=1}^K B(z_k, ε/2). But now we have that C ⊂ ⋃_{k=1}^K B(z_k, ε), since for every point c ∈ C its truncation c′ to N coordinates lies in D; thus d(c′, z_k) < ε/2 for some k, and, since d(c, c′)² = ∑_{n=N+1}^∞ x_n² < ε²/4, we have d(c, z_k) ≤ d(c, c′) + d(c′, z_k) < ε by the triangle inequality.
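The choice of N in this argument can be made explicit: since 1/n² < 1/(n−1) − 1/n, the tail satisfies ∑_{n>N} 1/n² < 1/N, so any N > 4/ε² works. A numerical sanity check (the particular ε is an arbitrary example):

```python
import math

def tail_cutoff(eps):
    # Any N > 4/eps^2 works: by the telescoping bound
    # 1/n^2 < 1/(n-1) - 1/n, we get sum_{n>N} 1/n^2 < 1/N < eps^2 / 4.
    return math.floor(4 / eps ** 2) + 2   # +2 guards against float rounding

eps = 0.1
N = tail_cutoff(eps)
partial_tail = sum(1 / n ** 2 for n in range(N + 1, 200001))
print(1 / N < eps ** 2 / 4, partial_tail < eps ** 2 / 4)  # True True
```

The partial sum only approximates the infinite tail from below, but the exact bound 1/N already sits under ε²/4.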

In fact the Hilbert cube is also compact, and this is a consequence of the big result that follows.

11.12 Theorem The following are equivalent in a metric space (X, d):
(1) X is compact.
(2) If (E_n) are nonempty closed sets in X with E_1 ⊇ E_2 ⊇ …, then ⋂_{n=1}^∞ E_n ≠ ∅.
(3) X is sequentially compact.
(4) X is complete and precompact.

Proof: (1) ⇒ (2). Suppose that ⋂ E_n = ∅; then ⋃(X \ E_n) = X (de Morgan’s law); this is an open cover of X, so there is a finite subcover, by compactness. So X = (X \ E_1) ∪ … ∪ (X \ E_N) = X \ E_N, as the E_n are decreasing so their complements are increasing. Which means that E_N = ∅, a contradiction.

(2) ⇒ (3). Let (x_n) be any sequence in X and let E_n be the closure of {x_n, x_{n+1}, …}; these are decreasing nonempty closed sets. Thus there is a point y in ⋂_{n=1}^∞ E_n.

Take B(y, 1): this meets {x_1, x_2, …} since y is in its closure. Pick x_{n_1} ∈ B(y, 1).

Now B(y, 1/2) meets {x_{n_1+1}, x_{n_1+2}, …} since y is in its closure. Pick x_{n_2} ∈ B(y, 1/2), and note that n_2 > n_1.

Continuing this way we find x_{n_k} in B(y, 1/k), so the subsequence (x_{n_k}) converges to y.

(3) ⇒ (4). If X is sequentially compact it is certainly complete, for if (x_n) is a Cauchy sequence, let (x_{n_k}) be a convergent subsequence, converging to y, say. Now the original Cauchy sequence also converges to y (see Part D of Theorem 8.5).

Also X will be precompact, since if not then we can find ε > 0 with no finite covering by balls of radius ε; choose x_1 ∈ X, and then inductively we obtain (x_n) such that x_n ∉ ⋃_{k=1}^{n−1} B(x_k, ε). Now it’s clear that d(x_k, x_n) ≥ ε > 0 for all k < n, which means that the (x_n) have no convergent subsequence.

We’ll postpone (4) ⇒ (1) until the next lecture, as it is long.

11.13 Corollary

A subset K ⊂ R^N is compact if and only if it is closed and bounded.

Proof: We saw in Theorem 11.4 that all compact sets (in any metric space) are closed and bounded.

For the converse, we can show that all closed bounded sets K in R^N are sequentially compact. If (x^n) = (x^n_1, …, x^n_N) is a sequence in K, then by passing to a subsequence we can ensure that the sequence (x^n_1) of first coordinates converges (since every bounded sequence in R has a convergent subsequence); then to a further subsequence to ensure that the sequence (x^n_2) of 2nd coordinates converges, and so on.

After finitely many steps we have a subsequence (y^n) = (y^n_1, …, y^n_N) such that y^n_k → z_k, say, as n → ∞ for each 1 ≤ k ≤ N. This implies that y^n → z = (z_1, …, z_N) (cf. Proposition 2.4). Also z ∈ K since K is closed.

Thus K is sequentially compact, and hence compact by Theorem 11.12. □

LECTURE 23

(4) ⇒ (1) in Theorem 11.12. This is the hardest bit and the proof is definitely not examinable. We suppose that X is complete and precompact, and show it is compact. So take an open cover ⋃_{λ∈Λ} U_λ of X. First we reduce it to a countable cover. For each n we can cover X by finitely many balls B(a_{n,1}, 1/n) ∪ … ∪ B(a_{n,r_n}, 1/n), using precompactness. Let A denote the set of centres (this is countable and dense, because for each n the set A comes within 1/n of every point of X) and consider all the balls B(a, 1/k) for a ∈ A and k = 1, 2, …. We claim that every open set U is a union of some of the B(a, 1/k). For if U is open and x ∈ U, there is a ball B(x, 1/j) ⊂ U and a point a ∈ A contained in B(x, 1/2j). But now x ∈ B(a, 1/2j) ⊂ B(x, 1/j) ⊂ U [DIAGRAM].

Thus we can cover X with a countable subcollection of the Uλ, since for each x ∈ X there is a ball B(a, 1/k) with x ∈ B(a, 1/k) ⊂ some Uλ. There are only countably many balls to choose from so select one Uλ for each ball we used.

The upshot, after relabelling, is that we may assume that X = U_1 ∪ U_2 ∪ …. If there is a p such that X = U_1 ∪ … ∪ U_p, we are finished. If not, then for each i we may find x_i ∉ U_1 ∪ … ∪ U_i. We select a Cauchy subsequence as follows (a “diagonal” argument).

Cover X by finitely many balls of radius 1. Then for at least one of these balls there is an infinite subsequence, say x_{11}, x_{12}, …, all in the same ball B(y_1, 1). Now cover X by finitely many balls of radius 1/2. Then for at least one of these balls there is an infinite subsequence of (x_{1k}), say x_{21}, x_{22}, …, all in the same ball B(y_2, 1/2).

Repeat. We obtain nested subsequences (x_{nk})_k, all in the same ball B(y_n, 1/n). But now the diagonal subsequence (x_{nn}) is Cauchy, since for m ≤ n we have d(x_{mm}, y_m) < 1/m and also d(x_{nn}, y_m) < 1/m, as the (x_{nk}) are a subsequence of the (x_{mk}). Hence d(x_{mm}, x_{nn}) < 2/m for n > m; i.e., it is a Cauchy sequence.

By completeness, x_{nn} → z, say. Now z ∈ U_j for some j, so x_{nn} ∈ U_j for all n ≥ some n_0. But this contradicts the construction of our (x_i), as for i ≥ j we have x_i ∉ U_1 ∪ … ∪ U_j. □

11.14 Final Example – the Cantor set We define the Cantor set C ⊂ [0, 1] as follows.

Let C0 = [0, 1], C1 = [0, 1/3] ∪ [2/3, 1], C2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and so on, at each stage deleting the middle open third of every interval that remains. [DIAGRAM]

Then C = ⋂_{n=0}^∞ C_n. This is the intersection of closed subsets of R, and is hence a closed (even compact) set. Remarkably, it is uncountable: indeed it consists of all numbers of the form x = ∑_{j=1}^∞ a_j/3^j, where a_j = 0 or 2 for each j (and not 1). Note that we regard 1/3 = 0.02222….
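The ternary characterisation gives a mechanical membership test. A sketch using exact arithmetic (the sample points are illustrative; note that 1/4 = 0.020202…₃ lies in C even though it is not an endpoint of any removed interval):

```python
from fractions import Fraction

def in_cantor(x, digits=60):
    # x in [0, 1) is in the Cantor set iff it has a ternary expansion
    # using only the digits 0 and 2; we generate digits exactly,
    # treating an exact .1000... tail as .0222... (e.g. 1/3).
    x = Fraction(x)
    for _ in range(digits):
        x *= 3
        d = int(x)          # next ternary digit: 0, 1 or 2
        if d == 1:
            return x == 1   # .1 followed by zeros equals .0222...
        x -= d
        if x == 0:
            return True     # expansion of 0s and 2s has terminated
    return True             # no digit 1 seen in the first `digits` places

print(in_cantor(Fraction(1, 4)))  # True:  1/4 = 0.020202..._3
print(in_cantor(Fraction(1, 3)))  # True:  1/3 = 0.1 = 0.0222..._3
print(in_cantor(Fraction(1, 2)))  # False: 1/2 = 0.111..._3
```

The check is only to finitely many ternary digits, but with exact Fractions a digit 1 is detected exactly whenever it occurs within that range.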

One can use a Cantor diagonal argument (as one does to prove that R is uncountable) to show that C is uncountable.

Note that in fact there is a surjection f : C → [0, 1] defined by

f : ∑_{j=1}^∞ a_j/3^j ↦ ∑_{j=1}^∞ (a_j/2)/2^j.

Paradoxically, the complement of the Cantor set (in [0, 1]) is an open set and so just a countable union of intervals. If one calculates the total length of the intervals removed from [0, 1], it is ∑_{j=1}^∞ 2^{j−1}/3^j, since at stage j we removed 2^{j−1} intervals of length 3^{−j}. This sums to 1, but there are still many points left!
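The removed length can be checked stage by stage with exact arithmetic: after n stages C_n consists of 2^n closed intervals of total length (2/3)^n, so the length removed so far is 1 − (2/3)^n, which tends to 1. A sketch:

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    # replace each closed interval [a, b] by its two outer thirds
    out = []
    for a, b in intervals:
        t = (b - a) / 3
        out += [(a, a + t), (b - t, b)]
    return out

C = [(Fraction(0), Fraction(1))]
for _ in range(12):
    C = remove_middle_thirds(C)

length = sum(b - a for a, b in C)
print(len(C) == 2 ** 12, length == Fraction(2, 3) ** 12)  # True True

# removed so far = sum_{j=1}^{12} 2^(j-1)/3^j = 1 - (2/3)^12
removed = sum(Fraction(2, 3) ** (j - 1) * Fraction(1, 3) for j in range(1, 13))
print(removed == 1 - length)  # True
```

Exact Fractions make the geometric-series identity an equality check rather than a floating-point approximation.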

The set C is “totally disconnected” – it contains no interval of positive length, so between any two points of C there is a point not in C, and all subsets of C consisting of more than one point are disconnected. That is, every component of C consists of a single point.

In a technical sense (outside the scope of this course) C is a fractal set – its “dimension” is log 2/log 3, or about 0.63.

THE END

The exercises on the sheet Extra Examples will be done in lectures, but the solutions will not be put online (if necessary, you can watch the videos).
