
1.4 Cauchy Sequences in R

Definition. (1.4.1)

A sequence xn ∈ R is said to converge to x if
• ∀ε > 0, ∃N s.t. n > N ⇒ |xn − x| < ε.

A sequence xn ∈ R is called a Cauchy sequence if
• ∀ε > 0, ∃N s.t. n > N & m > N ⇒ |xn − xm| < ε.

Proposition. (1.4.2) Every convergent sequence is a Cauchy sequence.

Proof. Assume xn → x. Let ε > 0 be given.
• ∃N s.t. n > N ⇒ |xn − x| < ε/2.
• n, m > N ⇒ |xn − xm| ≤ |x − xn| + |x − xm| < ε/2 + ε/2 = ε.

Theorem. (1.4.3; Bolzano-Weierstrass Property) Every bounded sequence in R has a subsequence that converges to some point in R.

Proof. Suppose xn is a bounded sequence in R. ∃M such that

−M ≤ xn ≤ M, n = 1, 2, ··· . Select xn0 = x1.

• Bisect I0 := [−M, M] into [−M, 0] and [0, M].

• At least one of these (either [−M, 0] or [0, M]) must contain xn for infinitely many indices n.

• Call it I1 and select n1 > n0 with xn1 ∈ I1.

• Continue in this way to get a subsequence xnk such that
  • I0 ⊃ I1 ⊃ I2 ⊃ I3 ···
  • Ik = [ak, bk] with |Ik| = 2^(1−k) M.

• Choose n0 < n1 < n2 < ··· with xnk ∈ Ik.
  • Since ak ≤ ak+1 ≤ M (monotone and bounded), ∃x s.t. ak → x.
  • Since xnk ∈ Ik and |Ik| = 2^(1−k) M, we have
    |xnk − x| ≤ |xnk − ak| + |ak − x| ≤ 2^(1−k) M + |ak − x| → 0 as k → ∞.

Corollary. (1.4.5; Compactness) Every sequence in the closed interval [a, b] has a subsequence that converges to some point in [a, b].

Proof. Assume a ≤ xn ≤ b for n = 1, 2, ··· . By Theorem 1.4.3, ∃ a subsequence xnk and a point x such that xnk → x. Since a ≤ xnk ≤ b for all k, the limit satisfies a ≤ x ≤ b.

Lemma. (1.4.6; Boundedness of Cauchy sequence)

If xn is a Cauchy sequence, then xn is bounded.

Proof. ∃N s.t. n ≥ N ⇒ |xn − xN| < 1. Then supn |xn| ≤ 1 + max{|x1|, ··· , |xN|}. (Why?)

Theorem. (1.4.3; Completeness) Every Cauchy sequence in R converges to an element of R.

Proof. Cauchy seq. ⇒ bounded seq. ⇒ convergent subseq. ⇒ the whole sequence converges to the same limit, because a Cauchy sequence with a convergent subsequence converges.
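The bisection argument in the proof of the Bolzano-Weierstrass property can be imitated on a finite sample. The following is only a sketch: the name bisect_subsequence and the sample sequence are illustrative assumptions, and the condition "contains xn for infinitely many n" is replaced by "contains at least as many of the remaining sample terms", which is a finite stand-in and not the actual proof.

```python
def bisect_subsequence(xs, M, steps):
    """Imitate the bisection proof on a finite sample xs with |x| <= M.

    Returns (indices, intervals): indices n_0 < n_1 < ... and nested
    intervals I_k = (a_k, b_k) with xs[n_k] in I_k.
    Assumption: "infinitely many terms" is replaced by "at least as many
    of the remaining sample terms".
    """
    a, b = -M, M
    indices, intervals = [0], [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        last = indices[-1]
        # Remaining sample terms (index > last) in each closed half.
        left = [n for n in range(last + 1, len(xs)) if a <= xs[n] <= mid]
        right = [n for n in range(last + 1, len(xs)) if mid <= xs[n] <= b]
        if not left and not right:
            break  # the finite sample is exhausted
        if len(left) >= len(right):
            b, chosen = mid, left[0]
        else:
            a, chosen = mid, right[0]
        indices.append(chosen)
        intervals.append((a, b))
    return indices, intervals


# Hypothetical bounded sample: x_n = (-1)^n (1 + 1/(n+1)), so |x_n| <= 2.
xs = [(-1) ** n * (1 + 1 / (n + 1)) for n in range(2000)]
indices, intervals = bisect_subsequence(xs, M=2.0, steps=8)
for n_k, (a_k, b_k) in zip(indices, intervals):
    print(n_k, xs[n_k], (a_k, b_k))
```

Each printed row shows an index nk, the term xnk, and the interval Ik that contains it; the interval length halves at every step.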

1.5. Cluster Points of the sequence xn

Definition. (1.5.1; cluster points)

A point x is called a cluster point of the sequence xn if
• ∀ε > 0, ∃ infinitely many values of n with |xn − x| < ε.

In other words, a point x is a cluster point of the sequence xn iff
∀ε > 0 & ∀N, ∃n > N s.t. |xn − x| < ε.

Example
• Both 1 and −1 are cluster points of the sequence 1, −1, 1, −1, ··· .
• The sequence xn = 1/n has only one cluster point, 0.
• The sequence xn = n does not have any cluster point.

Proposition.

1. x is a cluster point of the sequence xn iff ∃ a subsequence xnk s.t. xnk → x.

2. xn → x iff every subsequence of xn converges to x.

3. xn → x iff the sequence {xn} is bounded and x is its only cluster point.

Proof.
1. (⇒) Assume x is a cluster point. Then we can choose n1 < n2 < n3 < ··· s.t. |xnk − x| < 1/k. (Why?) This gives a subsequence xnk → x.

2. Trivial.

3. (⇐) If not, ∃ε > 0 and ∃ a subseq xnk so that |xnk − x| ≥ ε for all k. Since xnk is bounded, it has a convergent subseq. The limit of that subseq would be a cluster point of the seq xn different from x, but there is no such point. Contradiction.

Definition. (1.5.3; limit superior & limit inferior of seq xn)

Define the limit superior lim sup xn in the following way:
• If xn is bounded above, then lim sup xn = the largest cluster point, and lim sup xn = −∞ if the set of cluster points is empty.
• If xn is NOT bounded above, then lim sup xn = ∞.

Similarly, we can define the limit inferior lim inf xn.

Examples

• For the seq 1, 0, −1, 1, 0, −1, ··· , lim sup xn = 1 and lim inf xn = −1.

• If xn = n, then lim sup xn = ∞ = lim inf xn.

• Let xn = (−1)^n (1 + n)/n. Then lim sup xn = 1 and lim inf xn = −1.
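The last example, xn = (−1)^n (1 + n)/n, can be explored numerically by looking at the sup and inf of the tail {xn : n ≥ N} for growing N. This is only a sketch: the helper name tail_sup_inf and the finite window are assumptions, and a finite window stands in for the infinite tail (here it is exact, because the extreme values of each tail occur at its first terms).

```python
from fractions import Fraction


def x(n):
    # x_n = (-1)^n (1 + n) / n, computed exactly
    return Fraction((-1) ** n * (1 + n), n)


def tail_sup_inf(N, window=1000):
    """Sup and inf of x_n over N <= n < N + window (finite stand-in for the tail)."""
    tail = [x(n) for n in range(N, N + window)]
    return max(tail), min(tail)


for N in (1, 10, 100, 1000):
    s, i = tail_sup_inf(N)
    print(N, float(s), float(i))
```

As N grows, the tail sup decreases toward 1 and the tail inf increases toward −1, matching lim sup xn = 1 and lim inf xn = −1.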

Definition. (1.6.2; Vector space) A real vector space V is a set of elements called vectors, with given operations of vector addition + : V × V → V and scalar multiplication · : R × V → V such that the following hold for all v, u, w ∈ V and all λ, µ ∈ R:
1. v + w = w + v, (v + u) + w = v + (u + w), λ(v + w) = λv + λw, λ(µv) = (λµ)v, (λ + µ)v = λv + µv, 1v = v.
2. ∃0 ∈ V s.t. v + 0 = v. ∃ −v ∈ V s.t. v + (−v) = 0.

• A subset of V is called a subspace if it is itself a vector space with the same operations.
• A nonempty subset W is a vector subspace of V iff λv + µu ∈ W whenever u, v ∈ W and λ, µ ∈ R.

• The straight line W = {(x1, x2) : x1 = 2x2} is a subspace of R².

Euclidean space Rn: Definitions & Properties

The Euclidean n-space Rn with the operations

(x1, ··· , xn) + (y1, ··· , yn) = (x1 + y1, ··· , xn + yn) & λ(x1, ··· , xn) = (λx1, ··· , λxn)

is a vector space of dimension n.
• The standard basis of Rn: e1 = (1, 0, ··· , 0), ··· , en = (0, ··· , 0, 1).
• Unique representation: x = (x1, ··· , xn) ∈ Rn can be expressed uniquely as x = x1e1 + ··· + xnen.
• Inner product of x and y: ⟨x, y⟩ = Σ_{i=1}^n xi yi.
• Norm of x: ‖x‖ = √⟨x, x⟩.
• Distance between x and y: dist(x, y) = ‖x − y‖.
• Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
• Cauchy-Schwarz inequality: |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
• Pythagorean theorem: If ⟨x, y⟩ = 0, then ‖x + y‖² = ‖x‖² + ‖y‖².
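The inner product, norm, and distance on Rn listed above translate directly into code. The following is a minimal sketch; the function names inner, norm, dist and the sample vectors are illustrative choices, not part of the notes.

```python
import math


def inner(x, y):
    """Inner product <x, y> = sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))


def dist(x, y):
    """Distance dist(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])


# Hypothetical sample vectors in R^3, chosen to be orthogonal.
x = [3.0, 4.0, 0.0]
y = [-4.0, 3.0, 12.0]
s = [a + b for a, b in zip(x, y)]

print("inner product:", inner(x, y))
print("norms:", norm(x), norm(y))
print("triangle inequality:", norm(s), "<=", norm(x) + norm(y))
print("Pythagoras:", norm(s) ** 2, "vs", norm(x) ** 2 + norm(y) ** 2)
```

Because ⟨x, y⟩ = 0 for this pair, the last line illustrates the Pythagorean theorem as well as the triangle inequality.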

Definition. (1.7.1; Space (M, d) equipped with d = distance) A metric space (M, d) is a set M and a function d : M × M → R such that
1. d(x, y) ≥ 0 for all x, y ∈ M.
2. d(x, y) = 0 iff x = y.
3. d(x, y) = d(y, x) for all x, y ∈ M.
4. d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ M.
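On a finite sample of points, the four rules of a metric can be checked by brute force. The sketch below is illustrative only: the name satisfies_metric_axioms, the tolerance, and the sample points are assumptions, and passing the check on a finite sample does not prove that d is a metric on all of M.

```python
from itertools import product


def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check the four metric rules for d on every pair/triple of sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                        # rule 1: non-negativity
            return False
        if (d(x, y) == 0) != (x == y):         # rule 2: d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:       # rule 3: symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # rule 4: triangle inequality
            return False
    return True


sample = [0.0, 0.5, 1.0, 3.0]


def absolute(x, y):
    return abs(x - y)


def squared(x, y):
    return (x - y) ** 2


def discrete(x, y):
    return 0.0 if x == y else 1.0


print(satisfies_metric_axioms(sample, absolute))
print(satisfies_metric_axioms(sample, squared))
print(satisfies_metric_axioms(sample, discrete))
```

The squared difference fails rule 4 on this sample, since (0 − 1)² = 1 exceeds (0 − 0.5)² + (0.5 − 1)² = 0.5.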

Example [Fingerprint Recognition] Let M be a data set of fingerprints in the Seoul city police department.
• Motivation: Design an efficient access system to find a target.
• We need to define a dissimilarity function giving the distance between data items. The distance d(x, y) between two data items x and y must satisfy the above four rules.
• Similarity queries. For a given target x∗ ∈ M and ε > 0, arrest everyone whose fingerprint y ∈ M satisfies d(y, x∗) < ε.

Definition. (1.7.3; Normed Space (V, ‖·‖)) A normed space (V, ‖·‖) is a vector space V and a function ‖·‖ : V → R called a norm such that
1. ‖v‖ ≥ 0, ∀v ∈ V.
2. ‖v‖ = 0 iff v = 0.
3. ‖λv‖ = |λ|‖v‖, ∀v ∈ V and every scalar λ.
4. ‖v + w‖ ≤ ‖v‖ + ‖w‖, ∀v, w ∈ V.

Examples
• V = R and ‖x‖ = |x| for all x ∈ R.
• V = R² and ‖v‖ = √(v1² + v2²) for all v = (v1, v2) ∈ R².
• Let V = C([0, 1]) = all continuous functions on the interval [0, 1]. Define ‖f‖ = sup{|f(x)| : x ∈ [0, 1]} (called the supremum norm).

Proposition. If (V, ‖·‖) is a normed space and

d(v, w) = ‖v − w‖,

then d is a metric on V.

Proof. EASY.

Examples
• For V = C([0, 1]), the metric is

d(f, g) = ‖f − g‖ = sup{|f(x) − g(x)| : x ∈ [0, 1]}.

The sup distance between functions is the largest vertical distance between their graphs.
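The sup distance can be approximated by sampling the two graphs on a grid. This is only a sketch: sup_distance, the grid size, and the pair f(x) = x, g(x) = x² are illustrative assumptions, and the maximum over a finite grid is in general only a lower bound for the true supremum.

```python
def sup_distance(f, g, a=0.0, b=1.0, samples=1001):
    """Grid approximation of d(f, g) = sup |f(x) - g(x)| over [a, b].

    The maximum over a finite grid is only a lower bound for the true sup.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))


# Hypothetical pair of functions: the largest vertical gap between the
# graphs of x and x^2 on [0, 1] occurs at x = 1/2 and equals 1/4.
d_fg = sup_distance(lambda x: x, lambda x: x * x)
print(d_fg)
```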

Definition. A vector space V with a function ⟨·, ·⟩ : V × V → R is called an inner product space if
1. ⟨v, v⟩ ≥ 0 for all v ∈ V.
2. ⟨v, v⟩ = 0 iff v = 0.
3. ⟨λv, w⟩ = λ⟨v, w⟩, ∀v, w ∈ V and every scalar λ.
4. ⟨v + w, h⟩ = ⟨v, h⟩ + ⟨w, h⟩.
5. ⟨v, w⟩ = ⟨w, v⟩.

Examples
1. V = R² and ⟨v, w⟩ = v1w1 + v2w2. Two vectors v and w are orthogonal if ⟨v, w⟩ = 0.
2. V = C[0, 1] and ⟨f, g⟩ = ∫_0^1 f(x)g(x) dx.
3. ‖v‖ = √⟨v, v⟩ is a norm on V.

Theorem. (Cauchy-Schwarz inequality) If ⟨·, ·⟩ is an inner product in a real vector space V, then |⟨f, g⟩| ≤ ‖f‖‖g‖.

Proof:
• If g = 0, both sides are 0. Suppose g ≠ 0. Let h = g/‖g‖. It suffices to prove that |⟨f, h⟩| ≤ ‖f‖. (Why? |⟨f, g⟩| ≤ ‖f‖‖g‖ iff |⟨f, h⟩| ≤ ‖f‖.)
• Denote α = ⟨f, h⟩. Then

0 ≤ ‖f − αh‖² = ⟨f − αh, f − αh⟩ = ‖f‖² − α⟨h, f⟩ − α⟨f, h⟩ + |α|²‖h‖² = ‖f‖² − |α|²,

since ‖h‖ = 1 and ⟨h, f⟩ = ⟨f, h⟩ = α.

Hence, |α| = |⟨f, h⟩| ≤ ‖f‖. This completes the proof.
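The key identity of the proof, ‖f − αh‖² = ‖f‖² − α² with h = g/‖g‖ and α = ⟨f, h⟩, can be checked numerically in V = R³ with the dot product. The vectors below are hypothetical sample data, and the check is a sanity test of one instance, not a proof.

```python
import math


def inner(u, v):
    """Dot product on R^n, used here as the inner product."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u))


# Hypothetical sample vectors.
f = [1.0, 2.0, 2.0]
g = [3.0, 0.0, 4.0]

h = [c / norm(g) for c in g]          # unit vector in the direction of g
alpha = inner(f, h)                   # alpha = <f, h>
residual = [a - alpha * b for a, b in zip(f, h)]

lhs = norm(residual) ** 2             # ||f - alpha h||^2
rhs = norm(f) ** 2 - alpha ** 2       # ||f||^2 - alpha^2
print(lhs, rhs)
print(abs(inner(f, g)), "<=", norm(f) * norm(g))
```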

Chapter 2: Topology of M = Rn

Throughout this chapter, assume M = Rn (the Euclidean space) with the metric d(x, y) = √(Σ_{i=1}^n |xi − yi|²) = ‖x − y‖.

Definition. (D(x, ε), open, neighborhood)

• D(x, ε) := {y ∈ M : d(y, x) < ε} is called the ε-ball (or ε-disk) about x.
• A ⊂ M is open if ∀x ∈ A, ∃ε > 0 s.t. D(x, ε) ⊂ A.
• A neighborhood of x is an open set A containing x.

• Open sets: (a, b), D(x, ε), {(x, y) ∈ R² : 0 < x < 1}.
• The union of an arbitrary collection of open subsets of M is open. (Why?)
• The intersection of a finite number of open subsets of M is open. (Note that ∩_{n=1}^∞ (−1/n, 1/n) = {0} is not open, so finiteness is needed.)

2.2 Interior of a set A: int(A)

Definition. (2.2.1; Interior point & interior of A) Let (M, d) be a metric space and A ⊂ M. x is called an interior point of A if ∃ε > 0 s.t. D(x, ε) ⊂ A. Denote

int(A) := the collection of all interior points of A.

Examples. Proofs are very easy.
• If A = [0, 1], then int(A) = (0, 1).
• int{(x, y) ∈ R² : 0 < x ≤ 1} = {(x, y) ∈ R² : 0 < x < 1}.
• If A is open, then int(A) = A.

• Let M = Rn and x0 ∈ M. Then int{y ∈ M : d(y, x0) ≤ 1} = {y ∈ M : d(y, x0) < 1}. (In a general metric space only ⊃ is guaranteed.)

Definition. (2.3-4: Closed sets & Accumulation Points)
• A set B in a metric space M is said to be closed if M \ B is open.
• x ∈ M is an accumulation point (or cluster point) of a set A ⊂ M if ∀ε > 0, D(x, ε) contains some y ∈ A with y ≠ x.

Prove the following:
• Closed sets: [a, b], {y ∈ R² : d(y, x0) ≤ 1}.
• The union of a finite number of closed subsets of M is closed. (Note that ∪_{n=1}^∞ [1/n, 2 − 1/n] = (0, 2) is not closed, so finiteness is needed.)
• The intersection of an arbitrary family of closed subsets of M is closed. Why?
• Every finite set in Rn is closed.
• A set A ⊂ M is closed iff all accumulation points of A belong to A.
• A = {1, 1/2, 1/3, 1/4, ···} ∪ {0} is closed.

Definition. (Closure of A & Boundary of A) Let (M, d) be a metric space and A ⊂ M.
• cl(A) := the intersection of all closed sets containing A.
• ∂A = bd(A) = cl(A) ∩ cl(M \ A) is called the boundary of A.

Examples
• Closure: cl((0, 1)) = [0, 1], cl{(x, y) ∈ R² : x > y} = {(x, y) ∈ R² : x ≥ y}.
• Boundary: bd((0, 1)) = {0, 1}, bd{(x, y) ∈ R² : x > y} = {(x, y) ∈ R² : x = y}.

Let (M, d) be a metric space and A ⊂ M. Prove that
• cl(A) = A ∪ {accumulation points of A}.
• x ∈ cl(A) iff inf{d(x, y) : y ∈ A} = 0.
• x ∈ bd(A) iff ∀ε > 0, D(x, ε) ∩ A ≠ ∅ & D(x, ε) ∩ (M \ A) ≠ ∅.

Definition. (Convergence & Completeness)

Let (M, d) be a metric space and xk a sequence of points in M.

• limk→∞ xk = x iff ∀ε > 0, ∃N s.t. k ≥ N ⇒ d(x, xk) < ε.

• xk is a Cauchy seq. iff ∀ε > 0, ∃N s.t. k, l ≥ N ⇒ d(xk, xl) < ε.

• xk is bounded iff ∃B > 0 & x0 ∈ M s.t. d(xk, x0) < B for all k.

• x is a cluster point of the seq. xk iff ∀ε > 0, ∃ infinitely many k with d(xk, x) < ε.

The space M is called complete if every Cauchy seq. in M converges to a point in M.

In a metric space, it is easy to prove the following:
• Every convergent seq. is a Cauchy seq.
• A Cauchy seq. is bounded.
• If a subseq. of a Cauchy seq. converges to x, then the sequence itself converges to x.

Chapter 3: Compact & Connected sets

Throughout this chapter, we assume that (M, d) is a metric space.

Definition. (3.1.1: Sequentially compact & Compact) Let A ⊂ M.
• A is called sequentially compact if EVERY sequence in A has a subsequence that converges to a point in A.
• A is compact if EVERY open cover of A has a FINITE subcover.

• An open cover of A is a collection {Ui} of open sets such that A ⊂ ∪i Ui.
• An open cover {Ui} of A is said to have a finite subcover if a finite subcollection of {Ui} covers A.

• In chapter 1, we proved that every sequence xn in the closed interval [a, b] has a subsequence that converges to a point in [a, b]. Hence, [a, b] is sequentially compact.

Examples of compact sets
1. Prove that the entire line R is NOT compact.
   Proof. Clearly, {D(n, 1) : n = 0, ±1, ±2, ···} is an open cover of R but does not have a finite subcover (why?).
2. Prove that A = (0, 1] is not compact.
   Proof. Clearly, (0, 1] ⊂ ∪_{n=1}^∞ (1/n, 2). Hence, {(1/n, 2) : n = 1, 2, ···} is an open cover of (0, 1] but does not have a finite subcover.
3. Heine-Borel thm. Let A ⊂ M = Rn. A is compact iff A is closed and bounded.
   Proof. Later.
4. Give an example of a bounded and closed set that is not compact.
   Sol'n. Let M = {en : n = 1, 2, ···} where e1 = (1, 0, 0, ···), e2 = (0, 1, 0, ···), ··· . Let d(ei, ej) = √2 if i ≠ j and d(ei, ei) = 0. Then (M, d) is a metric space.
   • The entire metric space M is closed and bounded (why?).
   • {D(en, 1) : n = 1, 2, ···} is an open cover of M but does not have a finite subcover (why?). Hence, M is not compact.

Theorem. (3.1.3; Bolzano-Weierstrass theorem) A ⊂ M is compact iff A is sequentially compact.

• Lemma 1: Let A ⊂ M. If A is compact, then A is closed.
  Proof. We will show M \ A is open. Let x ∈ M \ A.
  1. A ⊂ ∪_{n=1}^∞ Un where Un = {y ∈ M : d(y, x) > 1/n} is an open set.
  2. Since A is compact and {Un} covers A, ∃ a finite subcover, that is, ∃N s.t. A ⊂ ∪_{n=1}^N Un = UN.
  3. Hence, D(x, 1/N) ⊂ UN^c ⊂ A^c = M \ A and therefore M \ A is open.
• Lemma 2: Let A ⊂ B ⊂ M. If B is compact and A is closed, then A is compact.
  Proof. Let {Ui} be an open cover of A.
  1. Set V = M \ A. Note that V is open.
  2. Thus {Ui, V} is an open cover of B.
  3. Since B is compact, B has a finite subcover, say, {U1, ··· , UN, V}. Since V ∩ A = ∅, A ⊂ U1 ∪ ··· ∪ UN.

• Lemma 4: If A is sequentially compact, then A is totally bounded.
  1. Definition of totally bounded: A ⊂ M is totally bounded if ∀ε > 0, ∃ a finite set {x1, ··· , xN} ⊂ M s.t. A ⊂ ∪_{i=1}^N D(xi, ε).
  2. Proof. If not, then for some ε > 0 we cannot cover A with finitely many disks of radius ε.

  (i) Choose x1 ∈ A and x2 ∈ A \ D(x1, ε).
  (ii) By assumption, we can repeat; choose xn ∈ A \ ∪_{i=1}^{n−1} D(xi, ε) for n = 2, 3, ··· .
  (iii) This seq {xn} satisfies d(xn, xm) ≥ ε for all n ≠ m.
  (iv) Hence, xn has no convergent subseq., a contradiction.
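The selection in steps (i) and (ii) of the proof of Lemma 4 is a greedy procedure, and on a finite set it always stops and produces an ε-net. The sketch below is illustrative only; the name greedy_eps_net and the sample set are assumptions, and a finite sample cannot exhibit the infinite ε-separated sequence used in the contradiction.

```python
def greedy_eps_net(points, eps, d):
    """Pick centers one by one, each outside the eps-disks of earlier centers.

    Mirrors steps (i)-(ii): x_n is chosen in A minus the union of D(x_i, eps).
    On a finite set the process must stop, and the centers then cover A.
    """
    centers = []
    for p in points:
        if all(d(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


def absolute(x, y):
    return abs(x - y)


# Hypothetical finite sample of A = [0, 1].
A = [i / 100 for i in range(101)]
eps = 0.25
centers = greedy_eps_net(A, eps, absolute)
print(centers)
```

By construction the centers are pairwise at distance at least ε, as in step (iii), and every point of the sample lies in some disk D(c, ε).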

• Summary. Let A ⊂ M.
  • A is compact ⇒ A is closed.
  • A is a closed subset of a compact set ⇒ A is compact.
  • A is sequentially compact ⇒ A is totally bounded.

Proof of B-W thm (⇒): If A is compact, then A is sequentially compact.

Let A be compact. Let {xn} be a seq. in A.

1. To derive a contradiction, assume that {xn} has no convergent subseq.

2. Then, {xn} has infinitely many distinct points {yk}, and the set {yk} has no accumulation points. (Why? If not, ∃ a convergent subseq.)

3. Hence, ∃ a neighborhood Uk of each yk containing no other yi.

4. {yk} is closed because it has no accumulation points. Hence, {yk} is compact by Lemma 2 (any closed subset of the compact set A is compact).

5. But {Uk} is an open cover of {yk} that has no finite subcover, a contradiction.

6. Hence, xn has a convergent subsequence. The limit lies in A, since A is closed by Lemma 1.

Hence, xn has a subsequence that converges to a point in A.

Proof of B-W thm (⇐): If A is sequentially compact, then A is compact.

Suppose {Ui} is an open cover of A. We need to prove that {Ui} has a finite subcover.

• ∃r > 0 s.t. ∀y ∈ A, D(y, r) ⊂ Ui for some Ui. Why?

1. If not, ∃yn ∈ A s.t. D(yn, 1/n) is not contained in any Ui.

2. By assumption, {yn} has a subseq., say, ynk → z ∈ A. Since z ∈ A ⊂ ∪i Ui, z ∈ Ui0 for some Ui0.

3. Since Ui0 is open, ∃ε > 0 s.t. D(z, ε) ⊂ Ui0.

4. Since ynk → z, ∃N = nk0 ≥ 2/ε s.t. yN ∈ D(z, ε/2).

5. But D(yN, 1/N) ⊂ D(z, ε) ⊂ Ui0 (why?), a contradiction.

• Since A is totally bounded (see Lemma 4), we can write A ⊂ D(y1, r) ∪ ··· ∪ D(yn, r) for finitely many yi ∈ A.

• Since D(yk, r) ⊂ Uik for some Uik, A ⊂ Ui1 ∪ ··· ∪ Uin, a finite subcover. Hence, A is compact.

Theorem. (3.1.5; Compact ⇔ Complete and Totally Bounded) Let A ⊂ M. A is compact iff A is complete and totally bounded.

(Proof of ⇒) Assume A is compact.
1. A is compact ⇒ A is sequentially compact (B-W thm) ⇒ A is totally bounded (Lemma 4).
2. A is sequentially compact ⇒ A is complete.


(Proof of ⇐) Assume A is complete and totally bounded. It suffices to prove that A is sequentially compact.

Assume that {yn} is a sequence in A.

1. We may assume that the yk are all distinct. (Why? If not, ...)

2. Since A is totally bounded, for each k = 1, 2, ···

∃xk1, ··· , xkLk ∈ M s.t. A ⊂ D(xk1, 1/k) ∪ ··· ∪ D(xkLk, 1/k)

3. For k = 1, infinitely many yn lie in one of these disks D(x1j, 1). Hence, we can select a subseq. {y11, y12, ···} lying entirely in one of these disks.

4. Repeat the previous step for k = 2 and obtain a subseq. {y21, y22, ···} of {y11, y12, ···} lying entirely in one of the disks D(x2j, 1/2).

5. Now choose the diagonal subsequence y11, y22, y33, ··· . This sequence is a Cauchy seq. because, for i ≤ j, both yii and yjj lie in one disk of radius 1/i, so d(yii, yjj) < 2/i = 2 max{1/i, 1/j}.

6. Since A is complete, yii converges to a point in A.

Theorem. (3.2.1, Heine-Borel thm.) Let A ⊂ M = Rn. A is compact iff A is closed and bounded.

Proof.
• Recall Thm 3.1.5: A is compact iff A is complete and totally bounded.
• Since M = Rn is a Euclidean space (hence complete),

A is closed ⇔ A is complete
A is bounded ⇔ A is totally bounded

Caution: If M is not a Euclidean space, the above statement is not true. Example 3.1.8 gives a set A that is bounded but not totally bounded.

Theorem. (3.3.1: Nested Set Property)

Let Fk be a sequence of compact non-empty sets in a metric space M such that F1 ⊇ F2 ⊇ F3 ⊇ ··· . Then,

∩_{k=1}^∞ Fk ≠ ∅.

Proof.

1. For each n, choose xn ∈ Fn.

2. Since {xn} ⊂ F1 and F1 is compact, ∃ a subseq {xnk} that converges to some point z in F1, that is,

xnk → z ∈ F1

3. Relabeling the subsequence, we may assume that xn → z and still xn ∈ Fn. (Why? nk ≥ k, so xnk ∈ Fnk ⊂ Fk.)
4. n > N ⇒ xn ∈ Fn ⊂ FN ⇒ xn ∈ FN.
5. Since limj→∞ xN+j = z & xN+j ∈ FN & FN is closed (being compact), it must be that z ∈ FN for N = 1, 2, 3, ··· . This completes the proof.

Definition. (Path-Connected Sets)

• φ : [a, b] → M is said to be continuous if

tk → t in [a, b] ⇒ φ(tk) → φ(t)

• A continuous path joining x, y ∈ M is a continuous mapping φ : [a, b] → M such that φ(a) = x, φ(b) = y.
• A ⊂ M is said to be path-connected if for any x, y ∈ A, there exists a continuous path φ : [a, b] → M joining x and y such that φ([a, b]) ⊂ A.

Definition. (3.5.1: Separate, Connected Sets) Let A be a subset of a metric space M.
• Two open sets U, V are said to separate A if
  1. U ∩ V ∩ A = ∅
  2. U ∩ A ≠ ∅ & V ∩ A ≠ ∅
  3. A ⊂ U ∪ V.
• A is disconnected if such sets U, V exist.
• A is connected if such sets U, V do not exist.

Theorem. (3.3.1) Path-connected sets are connected.

Proof.
1. Clearly, [a, b] is connected.
2. To derive a contradiction, suppose A is path-connected but not connected. Then ∃ open sets U, V such that
   (i) U ∩ V ∩ A = ∅ & A ⊂ U ∪ V
   (ii) ∃x ∈ U ∩ A & ∃y ∈ V ∩ A
3. Since A is path-connected, ∃ a continuous path φ : [a, b] → M s.t. φ(a) = x, φ(b) = y, φ([a, b]) ⊂ A.
4. From Theorem 4.2.1, which we will learn soon, φ([a, b]) is connected. This is a contradiction since U, V separate φ([a, b]).

Example 3.1
• Show that A := {x ∈ Rn : ‖x‖ ≤ 1} is compact and connected.
Proof.
1. Since A is closed and bounded, A is compact by the Heine-Borel thm.
2. To prove connectedness, let x, y ∈ A.
3. Define φ : [0, 1] → Rn by φ(t) = (1 − t)x + ty. Clearly, φ is a continuous path joining φ(0) = x and φ(1) = y.
4. ‖φ(t)‖ ≤ (1 − t)‖x‖ + t‖y‖ ≤ (1 − t) + t = 1 for t ∈ [0, 1]. Hence, φ([0, 1]) ⊂ A.
5. Hence, A is path-connected, and therefore connected.
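The path used in Example 3.1 is the straight segment from x to y, parametrized here so that φ(0) = x and φ(1) = y. The sketch below samples it for two hypothetical points of the unit ball in R² and checks the norm bound at the sample times; it illustrates step 4 and is not a substitute for the inequality.

```python
import math


def phi(t, x, y):
    """Segment path with phi(0) = x and phi(1) = y."""
    return [(1 - t) * a + t * b for a, b in zip(x, y)]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# Hypothetical points of the closed unit ball in R^2.
x = [0.6, 0.8]
y = [-1.0, 0.0]

norms = [norm(phi(i / 100, x, y)) for i in range(101)]
print(max(norms))
```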

Example 3.2
• Let A ⊂ Rn, x ∈ A and y ∈ Rn \ A. Let φ : [0, 1] → Rn be a continuous path joining x and y. Show that ∃t0 s.t. φ(t0) ∈ bd(A).
1. Let t0 = sup{t : φ([0, t]) ⊂ A}. This is well-defined because φ(0) = x ∈ A.
2. If t0 = 1, clearly y = φ(t0) ∈ bd(A).
3. Assume 0 ≤ t0 < 1. From the definition of t0, for n = 1, 2, ··· , ∃tn s.t. t0 ≤ tn ≤ t0 + 1/n & φ(tn) ∈ A^c.
4. Since φ(tn) ∈ A^c and φ(tn) → φ(t0), we get φ(t0) ∈ cl(A^c). Also φ(t0) ∈ cl(A), because φ(t) ∈ A for t < t0 (and φ(0) = x ∈ A if t0 = 0). Hence φ(t0) ∈ bd(A).

Chapter 4. Continuous Mappings

Throughout this chapter, we assume that M = Rn and N = Rm are Euclidean spaces with the standard metrics

d(x, y) = ‖x − y‖ = √(Σ_{j=1}^n (xj − yj)²), x, y ∈ M

ρ(v, w) = ‖v − w‖ = √(Σ_{j=1}^m (vj − wj)²), v, w ∈ N

Please note that the same symbol ‖·‖ may denote different norms depending on its context. Throughout this chapter, we assume that A ⊂ M = Rn and

f : A → N = Rm is a mapping.

Definition. (4.1.1: Continuity of f : A → N)

• Suppose that x0 ∈ {accumulation points of A}. We write

limx→x0 f(x) = b if ∀ε > 0, ∃δ > 0 s.t.

0 < ‖x − x0‖ < δ & x ∈ A ⇒ ‖f(x) − b‖ < ε

• Let x0 ∈ A. We say that f is continuous at x0 if either

x0 ∉ {accumulation points of A} or limx→x0 f(x) = f(x0).

• Let B ⊂ A. f is called continuous on B if f is continuous at each point of B. If A = B, we just say that f is continuous.

Theorem. (4.1.4: Continuity of f : A → N) The following assertions are equivalent.
1. f is continuous on A.

2. For every convergent seq xk → x0 in A, we have f(xk) → f(x0).
3. For each open set U in N, f⁻¹(U) is open relative to A; that is, f⁻¹(U) = A ∩ V for some open V.
4. For each closed set F in N, f⁻¹(F) is closed relative to A; that is, f⁻¹(F) = A ∩ G for some closed G.

Proof. 1 ⇒ 2 ⇒ 4 ⇒ 3 ⇒ 1, where 1 ⇒ 2 and 4 ⇒ 3 are easy.

Proof of (2 ⇒ 4) Let F ⊂ N be closed. We want to prove that f⁻¹(F) is closed relative to A. We begin by reviewing the definition of closed.
1. B is closed iff B = B ∪ {accumulation points of B}.

2. B is closed iff for every sequence {xk} ⊂ B with xk → x0, we necessarily have x0 ∈ B.
3. B ⊂ A is closed relative to A iff B = (B ∪ {accumulation points of B}) ∩ A.

4. B ⊂ A is closed relative to A iff for every sequence {xk} ⊂ B with xk → x0 ∈ A, we necessarily have x0 ∈ B.
5. Proof of (2 ⇒ 4). Let xk ∈ f⁻¹(F) and let xk → x0 ∈ A. By 2, f(xk) → f(x0). Since F is closed and f(xk) ∈ F, f(x0) ∈ F. ∴ x0 ∈ f⁻¹(F). ∴ f⁻¹(F) is closed relative to A.

Proof of (3 ⇒ 1)

For given x0 ∈ A and ε > 0, we must find δ > 0 such that

‖x − x0‖ < δ & x ∈ A ⇒ ‖f(x) − f(x0)‖ < ε,

that is, x ∈ D(x0, δ) ∩ A ⇒ f(x) ∈ D(f(x0), ε).

1. Since D(f(x0), ε) is open, by 3

f⁻¹(D(f(x0), ε)) is open relative to A.

∴ f⁻¹(D(f(x0), ε)) = A ∩ V for some open set V.

2. Since x0 ∈ V and V is open,

∃δ > 0 s.t. D(x0, δ) ⊂ V.

3. Hence, D(x0, δ) ∩ A ⊂ f⁻¹(D(f(x0), ε)) and this completes the proof.

Theorem. (4.2.1: f(connected) is connected if f ∈ C(M)) Suppose that f : M → N is continuous and let K ⊂ M.
(i) If K is connected, so is f(K).
(ii) If K is path-connected, so is f(K).

Proof of (i). Suppose f(K) is not connected.
1. From the definition of disconnectedness, ∃ open U, V s.t.

f(K) ⊂ U ∪ V, U ∩ V ∩ f(K) = ∅, U ∩ f(K) ≠ ∅, V ∩ f(K) ≠ ∅

2. Since f is continuous, f⁻¹(U) and f⁻¹(V) are open. Moreover, K ⊂ f⁻¹(U) ∪ f⁻¹(V), f⁻¹(U) ∩ f⁻¹(V) ∩ K = ∅, f⁻¹(U) ∩ K ≠ ∅, f⁻¹(V) ∩ K ≠ ∅.
3. Hence, K is disconnected, a contradiction.

Proof of (ii). If K is path-connected, so is f(K).
1. Let v, w ∈ f(K) and let x, y ∈ K s.t. f(x) = v, f(y) = w.
2. Since K is path-connected, ∃ a continuous curve c : [0, 1] → M s.t.

c(t) ∈ K (0 ≤ t ≤ 1), c(0) = x, c(1) = y

3. Since f is continuous, it is easy to show that c̃(t) = f(c(t)) ∈ f(K) for 0 ≤ t ≤ 1 and c̃ : [0, 1] → N is a continuous path joining v and w.
4. Hence, f(K) is path-connected.

Theorem. (4.2.2: f(compact) is compact if f ∈ C(M)) Suppose that f : M → N is continuous and K ⊂ M is compact. Then f(K) is compact.

It suffices to prove that f (K) is sequentially compact.

1. Let vn ∈ f (K). Let xn ∈ K s.t. f (xn) = vn. 2. Since K is compact, ∃ a convergent subsequence, say,

xnk → x0 ∈ K.

3. Since f is continuous, vnk = f (xnk ) → f (x0) ∈ f (K). This proves that f (K) is sequentially compact.

Examples

Let f : R² → R be a continuous map. Denote x = (x1, x2).
• Let f(x) = x1 for x ∈ R². If K ⊂ R² is compact, so is f(K) = {x1 : x = (x1, x2) ∈ K}. (Why? Since f is continuous and K is compact, f(K) is compact.)
• Let f(x) = 7 for x ∈ R². The set {7} is compact, while R² = f⁻¹({7}) is not compact.
• The set A = {f(x) : ‖x‖ = 1} is a closed interval. (Why? K = {x ∈ R² : ‖x‖ = 1} is compact and connected. Hence, A = f(K) is compact and connected.)

Theorem.

(1) Let f : A ⊂ N → M and g : A ⊂ N → M be continuous at x0. Then

• f ± αg is continuous at x0 for any α ∈ R.

• fg is continuous at x0 (for real-valued f and g).

• f/g is continuous at x0 if g(x0) ≠ 0 (for real-valued f and g).

(2) Suppose f : A ⊂ N → M and h : B ⊂ M → Rp are continuous and f(A) ⊂ B. Then h ◦ f : A ⊂ N → Rp is also continuous.

Proof. EASY

Theorem. (4.4.1: Maximum-Minimum Principle) Let f : A ⊂ M → R be continuous and let K be a non-empty compact subset of A. Then,
• f(K) is bounded.

• ∃x0, y0 ∈ K such that

f(x0) = inf f(K) = infx∈K f(x) & f(y0) = sup f(K) = supx∈K f(x).

Proof. Since K is compact and f is continuous on K ⊂ A, f(K) is compact. Hence, f(K) is closed and bounded in R by the Heine-Borel thm. A non-empty closed and bounded subset of R contains its infimum and supremum. This completes the proof.

Theorem. (4.5.1: Intermediate Value Theorem) Let f : A ⊂ M → R be continuous. Assume K is a connected subset of A, x, y ∈ K, and f(x) < f(y). Then,
• For every number c ∈ R such that f(x) < c < f(y),

∃z ∈ K s.t. f(z) = c

Proof. Since K is connected and f is continuous on K ⊂ A, f(K) is connected. A connected subset of R is an interval, hence [f(x), f(y)] ⊂ f(K). ∴ ∃z ∈ K s.t. f(z) = c. This completes the proof.

4.6 Uniform Continuity

Throughout this section, we assume that f : A ⊂ Rn → Rm is continuous.
• Definition. Let B ⊂ A. f is uniformly continuous on B if for every ε > 0, there is δ > 0 s.t.

‖x − y‖ < δ & x, y ∈ B ⇒ ‖f(x) − f(y)‖ < ε.

• Example. Consider f : R → R, f(x) = x². Then f is continuous on R, but it is not uniformly continuous. Why? Let xn = n + 1/n and yn = n. Then |xn − yn| = 1/n → 0, while |f(xn) − f(yn)| = 2 + 1/n² ≥ 1.
• Example. Consider f : (0, 1) → R, f(x) = 1/x. Then f is continuous on (0, 1), but it is not uniformly continuous. Why? Let xn = 1/n for n ≥ 2. Then |xn+1 − xn| < 1/n → 0, while |f(xn+1) − f(xn)| = 1.
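The first example can be checked with exact rational arithmetic: for f(x) = x², xn = n + 1/n and yn = n, the inputs get arbitrarily close while the outputs stay at least 2 apart. The helper name gap is an illustrative assumption.

```python
from fractions import Fraction


def f(x):
    return x * x


def gap(n):
    """Return (|x_n - y_n|, |f(x_n) - f(y_n)|) for x_n = n + 1/n, y_n = n."""
    x_n = Fraction(n) + Fraction(1, n)
    y_n = Fraction(n)
    return abs(x_n - y_n), abs(f(x_n) - f(y_n))


for n in (1, 10, 100, 1000):
    dx, dfx = gap(n)
    print(n, float(dx), float(dfx))
```

No single δ can work for, say, ε = 1: whatever δ is chosen, some n has |xn − yn| = 1/n < δ while |f(xn) − f(yn)| = 2 + 1/n² > 1.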

Theorem. (Uniform Continuity Theorem) Let f : A ⊂ Rn → Rm be continuous and let K ⊂ A be compact. Then f is uniformly continuous on K.

Proof.
1. Let ε > 0 be given. Since f is continuous on K, for each x ∈ K, ∃δx > 0 s.t. f(D(x, δx) ∩ K) ⊂ D(f(x), ε/2).
2. Since K ⊂ ∪x D(x, δx/2) and K is compact, ∃{x1, ··· , xN} ⊂ K s.t. K ⊂ ∪_{j=1}^N D(xj, δxj/2). Let δ = (1/2) min{δx1, ··· , δxN}.
3. If ‖x − y‖ < δ, x, y ∈ K, then ∃xj s.t. ‖x − xj‖ < δxj/2. Since

‖y − xj‖ ≤ ‖y − x‖ + ‖x − xj‖ < δ + δxj/2 ≤ δxj,

both x and y lie in D(xj, δxj) ∩ K, and therefore

‖f(x) − f(y)‖ ≤ ‖f(x) − f(xj)‖ + ‖f(xj) − f(y)‖ < ε/2 + ε/2 = ε.

Chapter 5.

This chapter deals with very important results in physical science:
• a basic iteration technique called the contraction mapping principle (5.7.1)
• some applications to differential and integral equations and some problems in control theory. (5.7.2, 5.7.3, 5.7.10)

To study such results, we need
• compactness in a space of continuous functions (5.5.3)
• uniform convergence, equi-continuity (5.6.2)

Definition. (Pointwise convergence & Uniform Convergence) Let N be a metric space with the metric ρ, A a set, and fk : A → N, k = 1, 2, ···

• fk → f pointwise if for each x ∈ A, limk→∞ fk(x) = f(x), i.e.

∀x ∈ A, limk→∞ ρ(fk(x), f(x)) = 0

• fk → f uniformly if limk→∞ supx∈A ρ(fk(x), f(x)) = 0, i.e.

∀ε > 0, ∃N s.t. k > N ⇒ supx∈A ρ(fk(x), f(x)) < ε

Examples:
• fk(x) = x^k → 0 pointwise in (0, 1). (Why?)
• fk(x) = x^k does NOT converge to 0 uniformly in (0, 1).
• Show that fn(x) = x^n/(1 + x^n) converges pointwise on [0, 2] but that the convergence is not uniform.

Definition. (5.1.3: Does Σk gk make sense?) Denote fn(x) = Σ_{k=1}^n gk(x).
• Σk gk = f (pointwise) if fn → f pointwise.
• Σk gk = f uniformly if fn → f uniformly.

Examples.
• Σ_{k=0}^∞ (−1)^k x^(2k+1)/(2k+1)! = sin x uniformly in the interval [−100, 100].
• Σk x^k = 1/(1 − x) converges uniformly in [−0.9, 0.9].
• Σk x^k = 1/(1 − x) converges pointwise (NOT uniformly) in (−1, 1).
• Σk x^k does not converge in R \ (−1, 1).
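The difference between uniform and pointwise convergence of the geometric series can be seen numerically. The sketch below assumes the sum starts at k = 0, uses a finite grid as a stand-in for the supremum over [−0.9, 0.9], and probes (−1, 1) at the single point x = 1 − 1/(n + 1); the helper names are illustrative.

```python
def partial_sum(n, x):
    """f_n(x) = sum of x^k for k = 0, ..., n."""
    return sum(x ** k for k in range(n + 1))


def error(n, x):
    """|1/(1 - x) - f_n(x)| for |x| < 1."""
    return abs(1.0 / (1.0 - x) - partial_sum(n, x))


# Grid on [-0.9, 0.9]; its maximum only approximates the sup from below.
grid = [-0.9 + 1.8 * i / 200 for i in range(201)]


def sup_error_on_grid(n):
    return max(error(n, x) for x in grid)


def error_near_one(n):
    # A point of (-1, 1) that moves toward 1 as n grows.
    return error(n, 1.0 - 1.0 / (n + 1))


for n in (10, 50, 100, 200):
    print(n, sup_error_on_grid(n), error_near_one(n))
```

On [−0.9, 0.9] the worst-case error shrinks with n, while on (−1, 1) there is always a point where the error stays large, so the sup over (−1, 1) does not tend to 0.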

The Weierstrass M-test

Theorem. (5.2.1: Cauchy Criterion) Let V be a complete normed vector space with norm ‖·‖, and let A be a set. Let fk : A → V be a sequence of functions. Then fk converges uniformly on A iff

∀ε > 0, ∃N s.t. l, k > N ⇒ supx∈A ‖fk(x) − fl(x)‖ < ε

Proof of ⇒.

1. Assume fk → f uniformly. Let ε > 0 be given.
2. Then ∃N s.t. k > N ⇒ ‖fk − f‖ = supx∈A ‖fk(x) − f(x)‖ < ε/2.
3. Hence, l, k > N ⇒ ‖fk − fl‖ ≤ ‖fk − f‖ + ‖fl − f‖ < ε/2 + ε/2 = ε.

Proof of ⇐.

1. From the assumption, fk(x) is a Cauchy sequence for each x ∈ A.

2. Hence, since V is complete, for all x ∈ A, limk fk(x) exists and we can define f(x) = limk fk(x).
3. Let ε > 0 be given. From the assumption, ∃N s.t. l, k > N ⇒ supx∈A ‖fk(x) − fl(x)‖ < ε/2.
4. From 2, ∀x ∈ A, ∃Nx s.t. l > Nx ⇒ ‖f(x) − fl(x)‖ < ε/2.
5. From 3 and 4, if k > N and x ∈ A, then ‖fk(x) − f(x)‖ ≤ ‖fk(x) − fl(x)‖ + ‖fl(x) − f(x)‖ < ε/2 + ε/2 = ε for any l > max{N, Nx}.

6. From 5, k > N ⇒ supx∈A ‖fk(x) − f(x)‖ ≤ ε.

Theorem. (5.2.2: Weierstrass M test) Let V be a complete normed vector space with norm ‖·‖, and let A be a set. Suppose that gk : A → V are functions such that supx∈A ‖gk(x)‖ < Mk and Σ_{k=1}^∞ Mk < ∞. Then Σ_{k=1}^∞ gk converges uniformly.

Proof.
1. Denote fn(x) = Σ_{k=1}^n gk(x).
2. Then ‖fn(x) − fn+l(x)‖ = ‖Σ_{k=n+1}^{n+l} gk(x)‖ ≤ Σ_{k=n+1}^{n+l} Mk.
3. Since limn→∞ Σ_{k=n}^∞ Mk = 0, it follows from 2 and Theorem 5.2.1 that fn converges uniformly.
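Step 2 of the proof, the bound of a block of the series by the corresponding block of Σ Mk, can be observed on a concrete series. The choice gk(x) = sin(kx)/k² with Mk = 2/k² is a hypothetical example (so that sup |gk| = 1/k² < Mk), and the grid maximum only approximates the supremum over A.

```python
import math


def g(k, x):
    # Hypothetical example series: g_k(x) = sin(kx) / k^2
    return math.sin(k * x) / k ** 2


def M(k):
    # Assumed bound: sup |g_k| = 1/k^2 < M_k = 2/k^2, and sum M_k < infinity
    return 2.0 / k ** 2


def partial(n, x):
    """f_n(x) = g_1(x) + ... + g_n(x)."""
    return sum(g(k, x) for k in range(1, n + 1))


grid = [i * 0.01 for i in range(-314, 315)]  # sample points in about [-pi, pi]
n, l = 20, 30

lhs = max(abs(partial(n + l, x) - partial(n, x)) for x in grid)
rhs = sum(M(k) for k in range(n + 1, n + l + 1))
print(lhs, "<=", rhs)
```

The right-hand side does not depend on x, which is exactly what makes the Cauchy criterion hold uniformly.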

5.5 The space of continuous functions

Throughout this section, we assume M = Rm, A ⊂ M, and N = Rn. (N, M: complete normed spaces)
• Denote C(A, N) = {f : f : A → N is continuous}. Then C(A, N) is a vector space.
• For f ∈ C(A, N), f is said to be bounded if there is a constant C such that ‖f(x)‖ < C for all x ∈ A.

• Denote Cb(A, N) = {f ∈ C(A, N) : f is bounded}.
• Define ‖f‖ = supx∈A ‖f(x)‖.
• ‖f‖ is a measure of the size of f and is called the norm of f.

Theorem. (5.5.1-3: Cb(A, N) is a complete normed space) Let A ⊂ M = Rm, N = Rn. The set Cb(A, N) is a complete normed space equipped with the norm ‖f‖ = supx∈A ‖f(x)‖; that is,
1. Cb(A, N) is a normed space.
   • ‖f‖ ≥ 0 and ‖f‖ = 0 iff f = 0.
   • ‖αf‖ = |α|‖f‖ for α ∈ R, f ∈ Cb.
   • ‖f + g‖ ≤ ‖f‖ + ‖g‖.

2. Completeness: Every Cauchy sequence {fk} in Cb(A, N) converges to a function f ∈ Cb(A, N), that is,

limk→∞ ‖fk − f‖ = limk→∞ supx∈A ‖fk(x) − f(x)‖ = 0.

Proof.

• Clearly, Cb(A, N) is a normed space. (EASY!)

• From the definition, fk → f uniformly iff fk → f in Cb.

• From the Cauchy criterion (Theorem 5.2.1), Cb(A, N) is complete, since a uniform limit of bounded continuous functions is bounded and continuous.

Examples
• Let B = {f ∈ C([0, 1], R) : f(x) > 0 for all x ∈ [0, 1]}. Show that B is open in C([0, 1], R).
  Proof.
  1. In order to prove that B is open, we must show that ∀f ∈ B, ∃ε > 0 s.t. D(f, ε) ⊂ B.
  2. Let f ∈ B. Since [0, 1] is compact, f has a minimum value, say m > 0, at some point in [0, 1]. Hence, infx∈[0,1] f(x) = m.
  3. Let ε = m/2. We will show D(f, ε) ⊂ B. If g ∈ D(f, ε), then ‖g − f‖ < ε, and ∴ g(x) ≥ f(x) − |g(x) − f(x)| ≥ f(x) − ‖f − g‖ > m − ε = m/2 for all x ∈ [0, 1]. Hence, g ∈ B. ∴ D(f, ε) ⊂ B.

• Prove that cl(B) is D = {f ∈ C([0, 1], R) : infx∈[0,1] f(x) ≥ 0}.
  Proof.

  1. cl(D) = D because if fn ∈ D and fn → f uniformly, then fn(x) → f(x) pointwise and ∴ infx∈[0,1] f(x) ≥ 0.
  2. If f ∈ D, then fn(x) := f(x) + 1/n ∈ B and ‖fn − f‖ = 1/n → 0. Hence D ⊂ cl(B). ∴ cl(B) ⊂ cl(D) = D ⊂ cl(B).

Examples
• Consider a sequence fn ∈ Cb such that ‖fn+1 − fn‖ ≤ rn, where Σ rn is convergent. Prove that fn converges.
  Proof.
  1. Let ε > 0 be given.
  2. Since Σ rn is convergent, ∃N s.t. n > N ⇒ Σ_{k=n}^∞ rk < ε.
  3. Hence, if n > N, then

  ‖fn+k − fn‖ = ‖Σ_{j=n}^{n+k−1} (fj+1 − fj)‖ ≤ Σ_{j=n}^{n+k−1} ‖fj+1 − fj‖ ≤ Σ_{j=n}^∞ rj < ε

  4. From 3, fn is a Cauchy sequence, so it converges because Cb is complete.

Arzela-Ascoli Theorem

Throughout this section, we assume that M = Rm, A ⊂ M, N = Rn (N, M: complete normed spaces).

Definition. (5.6.1: Equi-continuous) Assume B ⊂ C(A, N).
• We say that B is equi-continuous if

∀ε > 0, ∃δ > 0 s.t. ‖x − y‖ < δ & x, y ∈ A

⇒ supf∈B ‖f(x) − f(y)‖ < ε

• We say B is pointwise compact iff Bx = {f (x): f ∈ B} is compact in N for each x ∈ A.

Example 5.6.4 (Compact sequence) Let fn ⊂ Cb([0, 1], R) be such that fn′ exists and

supn ‖fn‖ ≤ C & supn supx∈(0,1) |fn′(x)| ≤ C

for a positive constant C. Prove that B := {fn} is equi-continuous.

Proof.
• By the mean value theorem,

|fn(x) − fn(y)| ≤ supz∈(0,1) |fn′(z)| |x − y| ≤ C|x − y| for all n.

• Hence, for given ε > 0, we can choose δ = ε/C and

|x − y| < δ & x, y ∈ [0, 1] ⇒ supn |fn(x) − fn(y)| ≤ C|x − y| < ε.

Hence, B := {fn} is equi-continuous. (So, {fn} has a uniformly convergent subsequence. Why? See the Arzela-Ascoli theorem.)

Theorem. (5.6.2: Arzela-Ascoli theorem) Let A be compact and B ⊂ C(A, N). If B is closed, equi-continuous, and pointwise compact, then B is compact, that is, any sequence fn in B has a uniformly convergent subsequence.

The proof strategy is based on the Bolzano-Weierstrass property.

Theorem. (Special case of Arzela-Ascoli theorem) Let B ⊂ C([0, 1], R). If B is closed, equi-continuous, and bounded, then B is compact.

Proof.

1. Assume fn is a sequence in B.

2. Denote C1/n = {1/n, 2/n, ···, (n−1)/n, 1}. Let C = ∪n C1/n.

3. Since C is countable, we can write C = {x1, x2, ···}.

4. Since Bx1 is compact, ∃ a convergent subseq of fn(x1). Let us denote this subsequence by

f11(x1), f12(x1), ··· , f1k (x1), ···

5. Similarly, the sequence f1k (x2) has a subsequence

f21(x2), f22(x2), ··· , f2k(x2), ··· which is convergent.

6. We proceed in this way and then set gn = fnn.

14 Proof of Arzela-Ascoli theorem

7. gn = fnn is obtained by picking out the diagonal

f11 f12 f13 ··· f1n ··· (1st subseq.)
f21 f22 f23 ··· f2n ··· (2nd subseq.)
···
fn1 fn2 fn3 ··· fnn ··· (n-th subseq.)

8. From the construction from the diagonal process,

lim_{n→∞} gn(xi) exists for all xi ∈ C.

9. Now, we are ready to prove

‖gn − gm‖ = sup_{x∈[0,1]} |gn(x) − gm(x)| → 0 as m, n → ∞.


9. Proof of lim_{n,m→∞} sup_{x∈A} |gn(x) − gm(x)| = 0.
a. Let ε > 0 be given.
b. From equi-continuity of {gn} ⊂ B, we can choose δ s.t.

|x − y| < δ & x, y ∈ A = [0, 1] ⇒ sup_n |gn(x) − gn(y)| < ε/3

c. Choose an integer L > 1/δ. Since C1/L is a finite subset of C, it follows from 8 that ∃ N s.t. n, m > N ⇒ sup_{xi ∈ C1/L} |gn(xi) − gm(xi)| < ε/3.

d. For each x ∈ A, there exists yj ∈ C1/L s.t. |x − yj| < δ. Therefore, if n, m > N, then

\[ |g_n(x) - g_m(x)| \le |g_n(x) - g_n(y_j)| + |g_n(y_j) - g_m(y_j)| + |g_m(y_j) - g_m(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon. \]

This proves lim_{n,m→∞} sup_{x∈A} |gn(x) − gm(x)| = 0.

10. From 9, gn is a Cauchy sequence in C([0, 1], N).

11. Since C([0, 1], N) is a complete normed space, gn converges to some g ∈ C([0, 1], N).
12. Since B is closed, it must be that g ∈ B.
13. From 1, 11, and 12, B is sequentially compact, so it is compact.

♣ ♣ ♣ ♣ ♣ ♣ ♣

The proof of the Arzela-Ascoli theorem is exactly the same as the special case discussed above except for step 2. To replace step 2, we use the fact that the compact set A is totally bounded: for each δ > 0, there exists a finite set Cδ = {y1, ··· , yk} such that A ⊂ ∪_{j=1}^k D(yj, δ).

5.7 The contraction mapping principle

Theorem. (5.7.1: Contraction mapping principle) Let M be a complete normed space and Φ: M → M a given mapping. Assume

∃ k ∈ [0, 1) s.t. kΦ(f ) − Φ(g)k ≤ k kf − gk for all f , g ∈ M

Then there exists a unique fixed point f∗ ∈ M s.t. Φ(f∗) = f∗. In fact, if f0 ∈ M and fn+1 = Φ(fn), n = 0, 1, 2, ··· , then

lim kfn − f∗k = 0 n→∞

Key idea: Φ is shrinking distances:

‖fn+1 − fn‖ = ‖Φ(fn) − Φ(fn−1)‖ ≤ k‖fn − fn−1‖ ≤ ··· ≤ kⁿ‖f1 − f0‖

The proof of contraction mapping principle: ∃ f∗ ∈ M s.t. Φ(f∗) = f∗

1. Let f0 ∈ M and fn+1 = Φ(fn), n = 0, 1, 2, ··· .

2. ‖f2 − f1‖ = ‖Φ(f1) − Φ(f0)‖ ≤ k‖f1 − f0‖.
3. ‖f3 − f2‖ = ‖Φ(f2) − Φ(f1)‖ ≤ k‖f2 − f1‖ ≤ k²‖f1 − f0‖.
4. Inductively, ‖fn+1 − fn‖ ≤ kⁿ‖f1 − f0‖.
5. Hence, Σ_{n=0}^∞ ‖fn+1 − fn‖ ≤ ‖f1 − f0‖ Σ_{n=0}^∞ kⁿ = ‖f1 − f0‖ · 1/(1 − k) < ∞.
6. From 5 and the argument in Example 5.5.6, fn is a Cauchy sequence.

7. Since M is complete, lim_{n→∞} fn = f∗ for some f∗ ∈ M.
8. Φ is uniformly continuous because ‖Φ(f) − Φ(g)‖ ≤ k‖f − g‖.

9. From 8, limn→∞ Φ(fn) = Φ(f∗).

10. Hence, f∗ = limn→∞ fn+1 = limn→∞ Φ(fn) = Φ(f∗).

The proof of contraction mapping principle: Uniqueness of the fixed point f∗

11. To prove the uniqueness, assume g∗ is another fixed point, i.e., Φ(g∗) = g∗

12. Then f∗ − g∗ = Φ(f∗) − Φ(g∗) and

kf∗ − g∗k = kΦ(f∗) − Φ(g∗)k ≤ kkf∗ − g∗k

Hence, (1 − k)‖f∗ − g∗‖ ≤ 0. 13. Since 0 ≤ k < 1, it must be

kf∗ − g∗k = 0

Hence, f∗ = g∗
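The iteration fn+1 = Φ(fn) from the theorem can be tried out numerically. The following is a minimal sketch, assuming the simplest complete space M = [0, 1] ⊂ R and the map Φ = cos, which is a contraction there because |cos′(x)| = |sin x| ≤ sin 1 < 1. The helper name `fixed_point` and the tolerance are illustrative choices, not part of the notes.

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = phi(x_n) until successive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and is a contraction there (assumed example map).
root, steps = fixed_point(math.cos, 0.5)
print(root, steps)
```

Starting from a different point of [0, 1] changes the number of steps but not the limit, which is what uniqueness of the fixed point predicts.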

Theorem. (5.7.2: Existence of sol'n of differential equations) Let A ⊂ R² be an open neighborhood of (t0, x0). Assume f : A → R is continuous and satisfies the following Lipschitz condition:

|f (t, x1) − f (t, x2)| ≤ K|x1 − x2| for all (t, x1), (t, x2) ∈ A.

Then, there is a δ > 0 s.t. the equation

\[ \frac{dx(t)}{dt} = f(t, x), \qquad x(t_0) = x_0 \]

has a unique C¹-solution x = φ(t) with φ(t0) = x0, for t ∈ (t0 − δ, t0 + δ), i.e.,

φ′(t) = f(t, φ(t)) for all t ∈ (t0 − δ, t0 + δ) & φ(t0) = x0

(C¹-solution = continuously differentiable solution.)

Get insight: Proof of Theorem 5.7.2

Before the proof, let us get some insight. Imagine that φ is the solution of dx(t)/dt = f(t, x), x(t0) = x0. Since φ′(t) = f(t, φ(t)) with φ(t0) = x0,

\[ \varphi(t) = \varphi(t_0) + \int_{t_0}^{t} \varphi'(s)\,ds = x_0 + \int_{t_0}^{t} f(s, \varphi(s))\,ds. \]

Hence, φ is a fixed point of the map Φ : M → M defined by

\[ \Phi(\varphi)(t) = x_0 + \int_{t_0}^{t} f(s, \varphi(s))\,ds. \]

In order to apply the contraction mapping principle, we need to choose a suitable space M. In practice, the solution φ can be obtained from the following iterative method:

\[ \varphi_{n+1}(t) = \Phi(\varphi_n)(t) = x_0 + \int_{t_0}^{t} f(s, \varphi_n(s))\,ds \quad \& \quad \varphi_0 = x_0. \]

Proof of Theorem 5.7.2

1. Let L = sup_{(t,x)∈Ã} |f(t, x)|, where Ã is a closed and bounded subset of A. Since f is continuous in A, L < ∞.
2. Choose δ such that Kδ < 1 and

{(t, x): |t − t0| < δ, |x − x0| < Lδ} ⊂ A˜

3. Denote C = C([t0 − δ, t0 + δ], R). From Theorem 5.5.3, C is a complete normed space (or Banach space) with norm ‖φ‖ = sup_{t∈[t0−δ, t0+δ]} |φ(t)|.
4. Let

M = {φ ∈ C : φ(t0) = x0 & |φ(t) − x0| ≤ Lδ for all t ∈ [t0 − δ, t0 + δ]}
5. Then, M is also complete. (Why? M is a closed subset of C w.r.t. the norm ‖·‖.)

5. Define Φ: M → C by (Please find its motivation from the previous slide)

\[ \Phi(\varphi)(t) = x_0 + \int_{t_0}^{t} f(s, \varphi(s))\,ds \]

6. Claim: φ ∈ M ⇒ Φ(φ) ∈ M.
Proof. Let φ ∈ M and ψ = Φ(φ).

• ψ(t0) = x0 and ψ ∈ C because

\[ \lim_{h\to 0} |\psi(t+h) - \psi(t)| = \lim_{h\to 0} \Big| \int_{t}^{t+h} f(s, \varphi(s))\,ds \Big| \le \lim_{h\to 0} L|h| = 0 \]

• From 1,

\[ |t - t_0| \le \delta \ \Rightarrow\ |\psi(t) - x_0| = \Big| \int_{t_0}^{t} f(s, \varphi(s))\,ds \Big| \le L|t - t_0| \le L\delta \]

• Hence, ψ ∈ M.

7. From 6, Φ maps M to M. See the condition of Theorem 5.7.1.

Proof of Theorem 5.7.2 (continued)

7. Using the Lipschitz condition,

\[ \|\Phi(\varphi_1) - \Phi(\varphi_2)\| = \sup_{t\in[t_0-\delta,\,t_0+\delta]} \Big| \int_{t_0}^{t} \big( f(s, \varphi_1(s)) - f(s, \varphi_2(s)) \big)\,ds \Big| \le \sup_{t\in[t_0-\delta,\,t_0+\delta]} \Big| \int_{t_0}^{t} K|\varphi_1(s) - \varphi_2(s)|\,ds \Big| \]

≤ δK‖φ1 − φ2‖

8. Since δK < 1,

kΦ(φ1) − Φ(φ2)k ≤ kkφ1 − φ2k, k = δK ∈ [0, 1)

9. From 5.7.1, ∃ φ∗ ∈ M s.t. Φ(φ∗) = φ∗.
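The iterative method φn+1 = Φ(φn) behind this proof can be sketched numerically. The code below is an illustration only: it assumes a uniform grid on the forward half-interval [t0, t0 + δ], replaces the integral by the trapezoidal rule, and uses the test equation x′ = x, x(0) = 1 with δ = 0.5 (so Kδ = 0.5 < 1). The function name `picard` and all numerical parameters are hypothetical choices.

```python
import math

def picard(f, t0, x0, delta, n_grid=200, n_iter=30):
    """Repeatedly apply Phi(phi)(t) = x0 + int_{t0}^t f(s, phi(s)) ds on [t0, t0 + delta],
    with the integral approximated by the trapezoidal rule on a uniform grid."""
    h = delta / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    phi = [x0] * (n_grid + 1)              # phi_0 is the constant function x0
    for _ in range(n_iter):
        g = [f(t, p) for t, p in zip(ts, phi)]
        new = [x0]
        for i in range(1, n_grid + 1):
            new.append(new[-1] + 0.5 * h * (g[i - 1] + g[i]))
        phi = new
    return ts, phi

# Test problem (assumed): x' = x, x(0) = 1, whose exact solution is e^t.
ts, phi = picard(lambda t, x: x, 0.0, 1.0, 0.5)
err = max(abs(p - math.exp(t)) for t, p in zip(ts, phi))
print(err)
```

The remaining error comes from the quadrature rule, not from the fixed-point iteration, which has essentially converged after a few dozen steps.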

Theorem. (5.7.3: Fredholm equation) Assume that K(x, y) is continuous on [a, b] × [a, b] and

\[ M = \sup_{x,y\in[a,b]} |K(x, y)|. \]

If |λ|M|b − a| < 1, then the following Fredholm equation has a unique solution in C([a, b], R):

\[ f(x) = \lambda \int_a^b K(x, y)\, f(y)\, dy + \varphi(x), \qquad x \in [a, b], \]

where λ ∈ R, φ ∈ C([a, b], R).

Proof. For f ∈ C([a, b], R), we define

\[ (\Phi(f))(x) = \lambda \int_a^b K(x, y)\, f(y)\, dy + \varphi(x). \]

Proof of 5.7.3
1. Claim: Φ maps C([a, b], R) to C([a, b], R).
Proof. Let f ∈ C([a, b], R). We need to show that Φ(f) is continuous. Let ε > 0 be given.
• Since [a, b] × [a, b] is compact, K(x, y) is uniformly continuous; likewise φ is uniformly continuous on [a, b].
• Hence, ∃ δ > 0 s.t. |x1 − x2| < δ & x1, x2, y ∈ [a, b] imply |K(x1, y) − K(x2, y)| < ε / (2(|λ| ‖f‖ |b − a| + 1)) and |φ(x1) − φ(x2)| < ε/2.
• If |x1 − x2| < δ and x1, x2 ∈ [a, b], then

\[ |(\Phi(f))(x_1) - (\Phi(f))(x_2)| \le |\lambda| \int_a^b |K(x_1, y) - K(x_2, y)|\,|f(y)|\,dy + |\varphi(x_1) - \varphi(x_2)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]

2. Set k = |λ|M|b − a|. Then k < 1 and

\[ \|\Phi(f) - \Phi(g)\| = \sup_{x\in[a,b]} \Big| \lambda \int_a^b K(x, y)\,(f(y) - g(y))\,dy \Big| \le k\,\|f - g\| \]

3. From 5.7.1, ∃ unique f∗ ∈ C([a, b], R) s.t. Φ(f∗) = f∗.
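As a rough illustration of how the fixed point of Φ can be computed, the sketch below discretizes the Fredholm equation with the midpoint rule and iterates f ← Φ(f). The kernel K(x, y) = xy, the data φ(x) = x, λ = 0.5 and [a, b] = [0, 1] are assumed for the example (so |λ|M|b − a| = 0.5 < 1); for these choices the exact solution is f(x) = 1.2x. The name `solve_fredholm` is hypothetical.

```python
def solve_fredholm(K, phi, lam, a, b, n=200, n_iter=60):
    """Iterate f <- lam * int_a^b K(x, y) f(y) dy + phi(x) on midpoint nodes."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]      # midpoint quadrature nodes
    f = [phi(x) for x in xs]                         # start from f_0 = phi
    for _ in range(n_iter):
        f = [lam * h * sum(K(x, y) * fy for y, fy in zip(xs, f)) + phi(x)
             for x in xs]
    return xs, f

# Assumed example data: K(x, y) = x*y, phi(x) = x, lam = 0.5 on [0, 1].
xs, f = solve_fredholm(lambda x, y: x * y, lambda x: x, 0.5, 0.0, 1.0)
max_err = max(abs(fx - 1.2 * x) for x, fx in zip(xs, f))
print(max_err)
```

The contraction constant controls how fast the iterates settle; the residual error reported here is due to the quadrature rule.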

Theorem. (5.7.4: Volterra integral equation) Assuming K(x, y) is continuous on [a, b] × [a, b], the Volterra integral equation

\[ f(x) = \lambda \int_a^x K(x, y)\, f(y)\, dy + \varphi(x) \]

has a unique solution f(x) for any λ.

Proof. For f ∈ C([a, b], R), we define

\[ (\Phi(f))(x) = \lambda \int_a^x K(x, y)\, f(y)\, dy + \varphi(x). \]

1. As in 5.7.3, Φ maps C([a, b], R) to C([a, b], R).

2. Let M = sup_{x,y∈[a,b]} |K(x, y)|. Then,

\[ |\Phi(f)(x) - \Phi(g)(x)| = |\lambda| \Big| \int_a^x K(x, y)\,(f(y) - g(y))\,dy \Big| \le |\lambda|\,|x - a|\,M\,\|f - g\| \]

Proof of 5.7.4 (continued)
3. From 2,

\[ |\Phi^2(f)(x) - \Phi^2(g)(x)| = |\lambda| \Big| \int_a^x K(x, y)\,(\Phi(f)(y) - \Phi(g)(y))\,dy \Big| \le |\lambda| \int_a^x M\,|y - a|\,|\lambda|\,M\,\|f - g\|\,dy \le |\lambda|^2 M^2 \frac{|b - a|^2}{2!}\,\|f - g\| \]

4. Inductively, we have

\[ \|\Phi^n(f) - \Phi^n(g)\| \le \frac{|\lambda|^n M^n |b - a|^n}{n!}\,\|f - g\| \]

5. By the ratio test, Σ_n |λ|ⁿMⁿ|b − a|ⁿ / n! converges; in particular, its terms tend to 0.
6. Hence, we can choose N so that |λ|^N M^N |b − a|^N / N! < 1. ∴ Φ^N is a contraction!

7. From 6, ∃ unique f∗ ∈ C([a, b], R) s.t. Φ^N(f∗) = f∗.
8. From 7, Φ^{N+1}(f∗) = Φ(f∗), that is, Φ^N(Φ(f∗)) = Φ(f∗).
9. From 8, Φ(f∗) is a fixed point of Φ^N.
10. From 7, 9, and the uniqueness of the fixed point, it must be f∗ = Φ(f∗). Moreover, any fixed point of Φ is also a fixed point of Φ^N, so f∗ is the only fixed point of Φ.

What a cute idea!

Examples
• Example 5.7.5. Let Φ : R → R be defined by Φ(x) = x + 1. Then |Φ(x) − Φ(y)| = |x − y| > k|x − y| for any k ∈ [0, 1) and x ≠ y, so Φ is not a contraction; indeed, Φ has no fixed point.
• Example 5.7.6. Solve x′(t) = x(t), x(0) = 1.
Solution. Let Φ(φ)(t) = 1 + ∫_0^t φ(s) ds. Let φ0 = 1 and φn+1 = Φ(φn), n = 0, 1, ··· . Then φn(t) = Σ_{k=0}^n t^k / k!. Hence, φn(t) → e^t.
• Example 5.7.7. Solve x′(t) = t x(t) for t near 0 and x(0) = 3.
Solution. Let Φ(φ)(t) = 3 + ∫_0^t s φ(s) ds. Let φ0 = 3 and φn+1 = Φ(φn), n = 0, 1, ··· . Then φn(t) = 3 Σ_{k=0}^n (1/k!) (t²/2)^k. Hence, φn(t) → 3e^{t²/2}.
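The Picard iterates for x′(t) = t x(t), x(0) = 3 can be computed exactly, because each iterate is a polynomial. The sketch below assumes the map Φ(φ)(t) = 3 + ∫_0^t s φ(s) ds that corresponds to this equation and stores a polynomial as its list of coefficients; the helper name `picard_step` is illustrative.

```python
from fractions import Fraction

def picard_step(coeffs, x0):
    """Apply Phi(phi)(t) = x0 + int_0^t s*phi(s) ds to the polynomial
    phi(t) = sum_i coeffs[i] * t**i and return the new coefficient list."""
    out = [Fraction(x0), Fraction(0)]
    for i, c in enumerate(coeffs):
        # int_0^t s * c * s**i ds = c * t**(i+2) / (i+2)
        out.append(c / (i + 2))
    return out

phi = [Fraction(3)]                 # phi_0 = 3
for _ in range(3):
    phi = picard_step(phi, 3)
print(phi)
```

After three steps the coefficients of 1, t², t⁴, t⁶ are 3, 3/2, 3/8, 1/16, which are exactly the first terms of 3 Σ (1/k!)(t²/2)^k.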

Examples
• Example 5.7.5. Consider the integral equation

\[ f(x) = a + \int_0^x x e^{-xy} f(y)\,dy. \]

Check directly on which intervals [0, r] we get a contraction.
Solution. Let K(x, y) = x e^{−xy} and let Φ(f)(x) = a + ∫_0^x x e^{−xy} f(y) dy. Then

\[ \|\Phi(f) - \Phi(g)\| = \sup_{x\in[0,r]} \Big| \int_0^x K(x, y)\,(f(y) - g(y))\,dy \Big| \le \sup_{x\in[0,r]} \Big| \int_0^x K(x, y)\,dy \Big|\,\|f - g\| = \sup_{x\in[0,r]} \big| 1 - e^{-x^2} \big|\,\|f - g\|. \]

Since sup_{x∈[0,r]} (1 − e^{−x²}) = 1 − e^{−r²} < 1 for any r > 0, Φ is a contraction on [0, r] for any r.

5.8 The Stone-Weierstrass Theorem

The aim of the Weierstrass theorem is to show that any continuous function can be uniformly approximated by a function that has more easily managed properties, such as a polynomial.

Theorem. (5.8.1: Weierstrass-Bernstein)

Let f ∈ C([0, 1], R). There exists a sequence of polynomials pn such that lim_{n→∞} ‖pn − f‖ = 0. In fact,

\[ p_n(x) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\, x^k (1 - x)^{n-k} f(k/n) \to f \quad \text{uniformly.} \]

• Meaning of r_k(x) := n!/(k!(n−k)!) x^k (1 − x)^{n−k}: Imagine a coin with probability x of getting heads and, consequently, with probability 1 − x of getting tails. In n tosses, the probability of getting exactly k heads is that quantity.

Rough proof: Weierstrass-Bernstein

• Σ_{k=0}^n r_k(x) = 1 and Σ_{k=0}^n (k/n − x)² r_k(x) = x(1 − x)/n. Consequently,

\[ \lim_{n\to\infty} \sum_{|k/n - x| > \delta} r_k(x) = 0 \quad \text{for any } \delta > 0 \]

and

\[ \lim_{n\to\infty} \sum_{|k/n - x| < \delta} r_k(x) = 1 \quad \text{for any } \delta > 0. \]

• Suppose that in a gambling game called n-tosses, f(k/n) dollars is paid out when exactly k heads turn up in n tosses. The average amount (after a long evening of playing n-tosses) paid out per game is

\[ p_n(x) = \sum_{k=0}^{n} r_k(x)\, f(k/n) \approx f(x). \]
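The Bernstein polynomials pn are easy to evaluate directly from the formula. The sketch below is only an illustration: the test function f(x) = |x − 1/2| and the sample grid used to estimate the sup norm are assumptions made for the example.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate p_n(x) = sum_k C(n, k) x^k (1 - x)^(n - k) f(k / n)."""
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) * f(k / n)
               for k in range(n + 1))

def sup_error(f, n, samples=201):
    """Estimate ||p_n - f|| on [0, 1] by sampling on a uniform grid."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

f = lambda x: abs(x - 0.5)          # continuous but not differentiable at 1/2
errors = [sup_error(f, n) for n in (10, 40, 160)]
print(errors)
```

The estimated errors decrease as n grows, though slowly for this non-smooth f, which matches the qualitative statement of the theorem.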

The Weierstrass-Bernstein theorem can be applied to C([a, b], R) because

g ∈ C([a, b], R) ⇒ f(x) = g(x(b − a) + a) ∈ C([0, 1], R).

Theorem. (5.8.2: Stone-Weierstrass) Let M be a metric space, A ⊂ M a compact set, and suppose B ⊂ C(A, R) satisfies the following:
1. B is an algebra: f, g ∈ B & α ∈ R ⇒ f + g, fg, αg ∈ B
2. 1 ∈ B
3. ∀ x, y ∈ A with x ≠ y, ∃ f ∈ B s.t. f(x) ≠ f(y).
Then B is dense in C(A, R), that is, B̄ = C(A, R).

The proof is easy (just technical). I just provide a rough insight.

1. Since B is an algebra, f ∈ B ⇒ pn(f) ∈ B for every polynomial pn.
2. Assume that A is a finite set. Then the proof is trivial.
3. Use the concept of a finite δ-net for the compact set A.

Differentiable Mappings

Definition: Let A be an open set in Rⁿ. A mapping f : A ⊂ Rⁿ → Rᵐ is said to be differentiable at x0 ∈ A if ∃ a linear function (an m × n matrix) Df(x0) : Rⁿ → Rᵐ such that

\[ \lim_{x\to x_0} \frac{\|f(x) - f(x_0) - Df(x_0)(x - x_0)\|}{\|x - x_0\|} = 0. \]

• Theorem 6.2.2. If f : A ⊂ Rⁿ → Rᵐ is differentiable, then the partial derivatives ∂fj/∂xi exist, and

\[ Df(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \cdots & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} \quad \text{(called the Jacobian matrix)} \]

• 1-Dimension. If f : (a, b) → R is differentiable at x0, then ∃ a number m = f′(x0) such that

\[ \lim_{x\to x_0} \frac{|f(x) - f(x_0) - m(x - x_0)|}{|x - x_0|} = 0 \quad \text{or} \quad \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} = m. \]

Thm 6.1.2. If f : A ⊂ Rⁿ → Rᵐ is differentiable at a, then f is continuous at a and Df(a) is uniquely determined.

Proof of uniqueness. Let L1 and L2 be two m × n matrices (or linear mappings) satisfying

\[ \lim_{x\to a} \frac{\|f(x) - f(a) - L_1(x - a)\|}{\|x - a\|} = 0 = \lim_{x\to a} \frac{\|f(x) - f(a) - L_2(x - a)\|}{\|x - a\|}. \]

It suffices to prove that ‖L1ej − L2ej‖ = 0 for j = 1, ··· , n. For h ≠ 0,

\begin{align*}
\|L_1 e_j - L_2 e_j\| &= \frac{1}{|h|}\,\|L_1(h e_j) - L_2(h e_j)\| = \frac{\|L_1(h e_j) - L_2(h e_j)\|}{\|h e_j\|} \\
&= \frac{\|f(a + h e_j) - f(a) - L_1(h e_j) - [f(a + h e_j) - f(a) - L_2(h e_j)]\|}{\|h e_j\|} \\
&\le \frac{\|f(a + h e_j) - f(a) - L_1(h e_j)\|}{\|h e_j\|} + \frac{\|f(a + h e_j) - f(a) - L_2(h e_j)\|}{\|h e_j\|} \\
&\to 0 \quad \text{as } h \to 0.
\end{align*}

• Proof of continuity: Since lim_{y→a} ‖f(y) − f(a) − Df(a)(y − a)‖ = 0 and Df(a)(y − a) → 0 as y → a, we get lim_{y→a} ‖f(y) − f(a)‖ = 0.

Thm 6.2.2. Assume f : A ⊂ Rⁿ → Rᵐ is differentiable at x and Df(x) = [a_ij]. Then ∂fj/∂xi exist and a_ij = ∂fj/∂xi.

Proof. Denote e1 = (1, 0, ··· , 0), e2 = (0, 1, 0, ··· , 0), ··· , en = (0, ··· , 0, 1). We have

\[ \lim_{y\to x} \frac{\|f(y) - f(x) - Df(x)(y - x)\|}{\|y - x\|} = 0 \]

\[ \Rightarrow \lim_{h\to 0} \frac{\|f(x + h e_i) - f(x) - Df(x)(h e_i)\|}{|h|} = 0, \quad i = 1, 2, \cdots, n \]

\[ \Rightarrow \lim_{h\to 0} \frac{\sqrt{\sum_{j=1}^{m} |f_j(x + h e_i) - f_j(x) - a_{ij} h|^2}}{|h|} = 0, \quad i = 1, 2, \cdots, n \]

\[ \Rightarrow \frac{\partial f_j}{\partial x_i} \text{ exists and } a_{ij} = \frac{\partial f_j}{\partial x_i}. \]

Thm 6.4.1. Let f : A ⊂ Rⁿ → Rᵐ. If each ∂fj/∂xi exists and is continuous on A, then f is differentiable on A.

[Proof for the case n = 2, m = 1.] Let Df(x) = [∂f/∂x1 (x), ∂f/∂x2 (x)], x ∈ A. From the mean value theorem,

\[ f(y) - f(x) = f(y_1, y_2) - f(x_1, y_2) + f(x_1, y_2) - f(x_1, x_2) = \frac{\partial f}{\partial x_1}(u_1, y_2)\,(y_1 - x_1) + \frac{\partial f}{\partial x_2}(x_1, u_2)\,(y_2 - x_2) \]

for some ui between xi and yi. Hence,

\[ f(y) - f(x) - Df(x)(y - x) = \alpha\,(y_1 - x_1) + \beta\,(y_2 - x_2) \]

where \( \alpha := \big[ \frac{\partial f}{\partial x_1}(u_1, y_2) - \frac{\partial f}{\partial x_1}(x) \big] \) and \( \beta := \big[ \frac{\partial f}{\partial x_2}(x_1, u_2) - \frac{\partial f}{\partial x_2}(x) \big] \).

Due to continuity of ∂f/∂x1 and ∂f/∂x2, α → 0 & β → 0 as y → x and

\[ \frac{|f(y) - f(x) - Df(x)(y - x)|}{\|y - x\|} = \frac{|\alpha\,(y_1 - x_1) + \beta\,(y_2 - x_2)|}{\sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2}} \le \sqrt{\alpha^2 + \beta^2} \to 0 \]

as y → x. This proves that lim_{y→x} ‖f(y) − f(x) − Df(x)(y − x)‖ / ‖y − x‖ = 0.

Remark. About a differentiable map f : A ⊂ Rⁿ → Rᵐ.

• The proof of Thm 6.4.1 for the general case f : A ⊂ Rⁿ → Rᵐ is almost the same as the special case f : A ⊂ R² → R.

• Intuitively, x → f(x0) + Df(x0)(x − x0) is supposed to be the best affine approximation to f near x0.

• It should be noticed that the existence of ∂fj/∂xi does not imply that the derivative Df exists.

Directional Derivatives. Let f : A ⊂ Rⁿ → R be a real-valued function.

• Let e ∈ Rⁿ be a unit vector. d/dt f(x + te)|_{t=0} = lim_{t→0} (f(x + te) − f(x))/t is called the directional derivative of f at x in the direction e.

• If f is differentiable, then lim_{t→0} (f(x + te) − f(x))/t = Df(x) · e.

• Note that the existence of all directional derivatives at a point need not imply differentiability. Example. Let f(x, y) = xy/(x² + y) for x² ≠ −y and f(x, y) = 0 if x² = −y. Note that f is not continuous at (0, 0), since

\[ \lim_{t\to 0} f(t, t^3 - t^2) = \lim_{t\to 0} \frac{t\,(t^3 - t^2)}{t^2 + t^3 - t^2} = -1 \ne 0 = f(0, 0). \]

But all directional derivatives of f at (0, 0) exist: for a unit vector e = (a, b) with b ≠ 0,

\[ \lim_{t\to 0} \frac{f(ta, tb)}{t} = \lim_{t\to 0} \frac{1}{t}\,\frac{t^2 ab}{t^2 a^2 + tb} = a, \]

and for b = 0 the quotient f(ta, 0)/t is identically 0.

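Chain Rule 6.5.1: Let A ⊂ Rⁿ be open and let f : A → Rᵐ be differentiable. Let B ⊂ Rᵐ be open, f(A) ⊂ B, and g : B → Rᵖ be differentiable. Then h = g ∘ f is differentiable on A and Dh(x) = Dg(f(x))Df(x):

\[ D(g \circ f)(x) = \begin{pmatrix} \frac{\partial g_1}{\partial y_1} & \cdots & \frac{\partial g_1}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_p}{\partial y_1} & \cdots & \frac{\partial g_p}{\partial y_m} \end{pmatrix} \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} \]

Proof. From the assumption, it is easy to see that

\[ \lim_{x\to x_0} \frac{\| \overbrace{(g \circ f)(x) - (g \circ f)(x_0) - Dg(f(x_0))(f(x) - f(x_0))}^{:=\spadesuit} \|}{\|x - x_0\|} = 0 \]

\[ \lim_{x\to x_0} \frac{\| \overbrace{f(x) - f(x_0) - Df(x_0)(x - x_0)}^{:=\clubsuit} \|}{\|x - x_0\|} = 0 \]

Since (g ∘ f)(x) − (g ∘ f)(x0) − Dg(f(x0))Df(x0)(x − x0) = ♠ + Dg(f(x0))♣, it follows from the above identities that

\[ \lim_{x\to x_0} \frac{\| \overbrace{(g \circ f)(x) - (g \circ f)(x_0) - Dg(f(x_0))Df(x_0)(x - x_0)}^{=\,\spadesuit + Dg(f(x_0))\clubsuit} \|}{\|x - x_0\|} = 0. \]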

Directional derivatives and examples

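1. If h(r, θ) = f(r cos θ, r sin θ), then

\[ \begin{pmatrix} \frac{\partial h}{\partial r} & \frac{\partial h}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{pmatrix} \begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix} \]

2. Consider a surface S defined by f(x) = constant. Then ∇f(x) is orthogonal to this surface.
Proof. Let c : [0, 1] → Rⁿ be a curve lying on S with c(0) = x0. Then

\[ 0 = \frac{d}{dt} f(c(t)) = \nabla f(c(t)) \cdot c'(t). \]

This means that ∇f(c(t)) is orthogonal to the tangent vector c′(t). Since this is true for an arbitrary curve on S passing through x0, ∇f(x0) is orthogonal to S at x0.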

3. The direction of greatest rate of increase of f(x) is ∇f(x).

6.7.1. Mean Value Theorem. Suppose f : A ⊂ Rⁿ → R is differentiable on an open set A. For any x, y ∈ A such that the line segment joining x and y lies in A, ∃ c on that segment such that f(y) − f(x) = Df(c) · (y − x).
Proof. Define h(t) = f((1 − t)x + ty). Then ∃ t0 ∈ (0, 1) such that h(1) − h(0) = h′(t0), and therefore

\[ f(y) - f(x) = h(1) - h(0) = h'(t_0) = Df(\underbrace{(1 - t_0)x + t_0 y}_{=c}) \cdot (y - x). \]

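• Definition. A bilinear map B : Rⁿ × Rᵐ → R is given by an n × m matrix [a_ij] such that

\[ B(x, y) = \sum_{i,j} a_{ij} x_i y_j = (x_1, \cdots, x_n) \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \]

• Definition 6.8.4. For positive r, f is said to be of class Cʳ if all partial derivatives up to order r exist and are continuous.

• Let f : A ⊂ Rⁿ → R be of class C². Then

\[ D^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix} \]

• If D²f is continuous, D²f is symmetric.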

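Taylor's Theorem 6.8.5. [Case: f ∈ C³]. Let f : A ⊂ Rⁿ → R be of class C³. Suppose x ∈ A and x + th ∈ A for 0 ≤ t ≤ 1. Then ∃ c = x + t0h, 0 < t0 < 1, such that

\[ f(x + h) - f(x) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x)\,h_i + \frac{1}{2!} \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\,h_i h_j + \frac{1}{3!} \sum_{i,j,k=1}^{n} \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}(x + t_0 h)\,h_i h_j h_k. \]

Proof.

\begin{align*}
f(x + h) - f(x) &= \int_0^1 \frac{d}{dt} f(x + th)\,dt = \int_0^1 \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x + th)\,h_i\,dt \\
&= \sum_{i=1}^{n} \int_0^1 \frac{\partial f}{\partial x_i}(x + th)\,h_i\,\frac{d(t - 1)}{dt}\,dt \qquad \Big(\text{Why? } \frac{d(t-1)}{dt} = 1\Big) \\
&= \sum_{i=1}^{n} \Big[ \frac{\partial f}{\partial x_i}(x)\,h_i - \int_0^1 \frac{d}{dt}\Big( \frac{\partial f}{\partial x_i}(x + th)\,h_i \Big)(t - 1)\,dt \Big] \\
&= \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x)\,h_i + R_1(h, x)
\end{align*}

where

\[ R_1(h, x) = \sum_{i,j=1}^{n} \int_0^1 (1 - t)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(x + th)\,h_i h_j\,dt. \]

Using \( \frac{d}{dt}\big( -\frac{(t-1)^2}{2!} \big) = (1 - t) \) and integration by parts,

\begin{align*}
R_1(h, x) &= -\sum_{i,j=1}^{n} \int_0^1 \frac{d}{dt}\Big( \frac{(t - 1)^2}{2!} \Big) \frac{\partial^2 f}{\partial x_i \partial x_j}(x + th)\,h_i h_j\,dt \\
&= \frac{1}{2!} \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\,h_i h_j + R_2(h, x)
\end{align*}

where

\[ R_2(h, x) := \sum_{i,j,k=1}^{n} \int_0^1 \frac{(t - 1)^2}{2!}\,\frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}(x + th)\,h_i h_j h_k\,dt. \]

Recall the second mean value theorem for integrals (for f ≥ 0 and g continuous):

\[ \int_0^1 f(t) g(t)\,dt = g(t_0) \int_0^1 f(t)\,dt \quad \text{for some } 0 < t_0 < 1. \]

Hence, ∃ t0, 0 < t0 < 1, such that

\[ R_2(h, x) = \sum_{i,j,k=1}^{n} \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}(x + t_0 h)\,h_i h_j h_k\ \underbrace{\int_0^1 \frac{(t - 1)^2}{2!}\,dt}_{= \frac{1}{3!}}. \]

One can proceed by induction, using the same method, to get the general Taylor's theorem.

6.8.5. Taylor's Theorem [General Case: f ∈ Cʳ]. Let f : A ⊂ Rⁿ → R be of class Cʳ. Suppose x ∈ A and x + th ∈ A for 0 ≤ t ≤ 1. Then

\[ f(x + h) = f(x) + Df(x) \cdot h + \cdots + \frac{1}{(r-1)!} D^{r-1} f(x) \cdot (h, \cdots, h) + R_{r-1}(x, h) \]

where R_{r−1}(x, h) is the remainder. Furthermore,

\[ \frac{R_{r-1}(x, h)}{\|h\|^{r-1}} \to 0 \quad \text{as } h \to 0. \]

Another proof of Taylor's formula. Let g(t) = f(x + th) for t ∈ [0, 1]. Applying the one-dimensional Taylor's formula, there exists t̃ ∈ (0, 1) such that

\[ g(1) = g(0) + \sum_{k=1}^{r-1} \frac{1}{k!}\,g^{(k)}(0) + \frac{1}{r!}\,g^{(r)}(\tilde t). \]

Note that R_{r−1}(x, h) = (1/r!) g^{(r)}(t̃), g(1) = f(x + h), g(0) = f(x),

\begin{align*}
g'(0) &= Df(x) \cdot h = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x)\,h_i \\
g''(0) &= D^2 f(x) \cdot (h, h) = \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\,h_i h_j \\
g'''(0) &= D^3 f(x)(h, h, h) = \sum_{i,j,k=1}^{n} \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}(x)\,h_i h_j h_k
\end{align*}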

Theorem 6.9.2. If f : A ⊂ Rⁿ → R is differentiable and x0 ∈ A is an extreme point for f, then Df(x0) = 0.

Proof. Assume Df(x0) ≠ 0. We try to prove that f(x0) is not a local extreme value.

• Let h = Df(x0)/‖Df(x0)‖. Since f is differentiable at x0,

\[ \lim_{\lambda\to 0} \frac{1}{|\lambda|}\,|f(x_0 + \lambda h) - f(x_0) - Df(x_0) \cdot (\lambda h)| = 0. \]

• Hence, (for given ε = ‖Df(x0)‖/2) there exists δ > 0 such that

\[ 0 < |\lambda| < \delta \ \Rightarrow\ |f(x_0 + \lambda h) - f(x_0) - Df(x_0) \cdot (\lambda h)| < \frac{\|Df(x_0)\|}{2}\,|\lambda|. \]

Since Df(x0) · h = ‖Df(x0)‖, we have

\[ -\frac{\|Df(x_0)\|}{2}\,|\lambda| < f(x_0 + \lambda h) - f(x_0) - \|Df(x_0)\|\,\lambda < \frac{\|Df(x_0)\|}{2}\,|\lambda|. \]

This leads to the following:

– for 0 < λ < δ, (‖Df(x0)‖/2) λ < f(x0 + λh) − f(x0). Hence, f(x0) is not a local maximum.

– for −δ < λ < 0, f(x0 + λh) − f(x0) < (‖Df(x0)‖/2) λ. Hence, f(x0) is not a local minimum.

Theorem 6.9.4. Suppose f : A ⊂ Rⁿ → R is a C³-function and x0 is a critical point.

• If f has a local maximum at x0, then Hx0 (f) is negative semi-definite.

• If Hx0 (f) is negative ( positive ) definite, then f has a local maximum (minimum) at x0

Indeed, this theorem holds true for f ∈ C2.

Proof. Since Df(x0) = 0, Taylor’s theorem gives

\[ f(x_0 + h) - f(x_0) = \frac{1}{2} D^2 f(x_0)(h, h) + R_2(x_0, h) \]

where lim_{h→0} R2(x0, h)/‖h‖² = 0. If D²f(x0) is negative definite, then

\[ \frac{1}{2} D^2 f(x_0)(h, h) + R_2(x_0, h) < 0 \quad \text{for sufficiently small } h \ne 0 \]

and therefore f(x0 + h) − f(x0) < 0 for sufficiently small h ≠ 0. Hence, f has a local maximum at x0.

• Example 6.9.5. The matrix \( A = \begin{pmatrix} a & b \\ b & d \end{pmatrix} \) is positive definite if

\[ (x, y) \begin{pmatrix} a & b \\ b & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} > 0 \quad \text{if } (x, y) \ne (0, 0). \]

Hence, A is positive definite iff ax² + 2bxy + dy² > 0 for all (x, y) ≠ (0, 0). Therefore, A is positive definite iff a > 0 and ad − b² > 0.

• Example 6.9.6. Let f(x, y) = x² − xy + y². Then Df(0, 0) = (0, 0) and \( D^2 f(0, 0) = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \). Hence, the Hessian is positive definite. Thus f has a local minimum at (0, 0).
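The 2 × 2 criterion of Example 6.9.5 can be checked against the definition by sampling the quadratic form on the unit circle. This is a small sketch; the function names and the sample count are illustrative assumptions.

```python
import math

def is_positive_definite_2x2(a, b, d):
    """Criterion from Example 6.9.5 for the symmetric matrix [[a, b], [b, d]]."""
    return a > 0 and a * d - b * b > 0

def min_quadratic_form(a, b, d, samples=3600):
    """Smallest sampled value of a x^2 + 2 b x y + d y^2 over unit vectors (x, y)."""
    values = []
    for i in range(samples):
        t = 2 * math.pi * i / samples
        x, y = math.cos(t), math.sin(t)
        values.append(a * x * x + 2 * b * x * y + d * y * y)
    return min(values)

# Hessian of f(x, y) = x^2 - x y + y^2 at (0, 0) is [[2, -1], [-1, 2]].
print(is_positive_definite_2x2(2, -1, 2), min_quadratic_form(2, -1, 2))
```

For the Hessian of Example 6.9.6 both checks agree: the criterion holds and the sampled quadratic form stays strictly positive.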

Chapter 8. Integration

Definition. Let A ⊂ ℝ² be a bounded set and let f : A → ℝ be a bounded function.

∙ We enclose A in some rectangle B = [a1, b1] × [a2, b2] and extend f to the whole rectangle by defining it to be zero outside of A.

∙ Let P be a partition of B obtained by dividing a1 = x0 < x1 < ⋅ ⋅ ⋅ < xn = b1 and a2 = y0 < y1 < ⋅ ⋅ ⋅ < ym = b2:

\[ P = \{ \underbrace{[x_i, x_{i+1}] \times [y_j, y_{j+1}]}_{=\text{subrectangle } R} : i = 0, 1, \cdots, n - 1,\ j = 0, 1, \cdots, m - 1 \}. \]

∙ Define the upper sum of f:
\[ U(f, P) := \sum_{R\in P} \sup\{ f(x, y) \mid (x, y) \in R \} \times (\text{volume of } R) \]

∙ Define the lower sum of f:
\[ L(f, P) := \sum_{R\in P} \inf\{ f(x, y) \mid (x, y) \in R \} \times (\text{volume of } R) \]

∙ Define the upper integral of f on A by
\[ \overline{\int_A} f = \inf\{ U(f, P) : P \text{ is a partition of } B \} \]
and the lower integral of f on A by
\[ \underline{\int_A} f = \sup\{ L(f, P) : P \text{ is a partition of } B \}. \]

∙ We say that f is Riemann integrable or integrable if
\[ \overline{\int_A} f = \underline{\int_A} f. \]

∙ If f is integrable on A, we denote
\[ \int_A f = \overline{\int_A} f = \underline{\int_A} f. \]

Volume and sets of measure zero.

Definition. Let A be a bounded set of ℝn.

∙ The characteristic function 1_A of A is the map defined by 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A.

∙ We say that A has volume if 1_A is Riemann integrable, and the volume is the number
\[ \mathrm{vol}(A) = \int_A 1_A(x)\,dx. \]

∙ The set A is said to have measure zero if for every ε > 0 there is a countable number of rectangles R1, R2, ··· such that
\[ A \subset \bigcup_{n=1}^{\infty} R_n \quad \& \quad \sum_{n=1}^{\infty} \mathrm{vol}(R_n) < \varepsilon. \]

∙ Examples: The set of has measured zero in ℝ. As a 2 subset of ℝ , the has measure zero.

∙ Lebesgue's monotone convergence theorem. Let gn : [0, 1] → ℝ be integrable functions with ∫_0^1 gn(x) dx < ∞. Suppose that 0 ≤ gn+1 ≤ gn and gn(x) → 0 for all x ∈ [0, 1]. Then
\[ \lim_{n\to\infty} \int_0^1 g_n(x)\,dx = 0. \]

∙ Example: lim_{n→∞} ∫_0^1 e^{−nx²} x^p dx = 0 if p > −1.

∙ Fubini's Theorem. Let A = [a, b] × [c, d] ⊂ ℝ², and let f : A → ℝ be continuous. Then
\[ \int_A f = \int_a^b \Big( \int_c^d f(x, y)\,dy \Big) dx = \int_c^d \Big( \int_a^b f(x, y)\,dx \Big) dy. \]
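Fubini's theorem can be illustrated numerically by computing the two iterated integrals with midpoint sums. The sketch below assumes the sample integrand f(x, y) = x y² on [0, 1] × [0, 2], whose integral is 4/3; the function name `iterated_integral` and the grid size are illustrative.

```python
def iterated_integral(f, a, b, c, d, n=200, x_first=False):
    """Midpoint-rule approximation of an iterated integral of f over [a, b] x [c, d].
    x_first=False: inner integral in y, outer in x; x_first=True: the other order."""
    hx, hy = (b - a) / n, (d - c) / n
    xs = [a + (i + 0.5) * hx for i in range(n)]
    ys = [c + (j + 0.5) * hy for j in range(n)]
    if x_first:
        return sum(sum(f(x, y) for x in xs) * hx for y in ys) * hy
    return sum(sum(f(x, y) for y in ys) * hy for x in xs) * hx

f = lambda x, y: x * y**2
dy_then_dx = iterated_integral(f, 0.0, 1.0, 0.0, 2.0)
dx_then_dy = iterated_integral(f, 0.0, 1.0, 0.0, 2.0, x_first=True)
print(dy_then_dx, dx_then_dy)
```

Both orders give the same value up to rounding, and that value is close to 4/3.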

Chapter 10 Fourier Analysis.

Fourier analysis arose historically in connection with problems in mechanics such as heat conduction and wave motion.

∙ Vibrating string. Consider a string of length l with clamped ends that is free to vibrate when plucked. Let y(x, t) be the displacement of the string at time t and position x ∈ [0, l].
– y obeys the wave equation
\[ \frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial x^2} \]
(Force = mass × acceleration = tension)
– That the string has clamped ends entails that y(0, t) = y(l, t) = 0.

∙ It is both important and remarkable that any solution y(x, t) can be decomposed into harmonics:
\[ y(x, t) = \sum_{n=1}^{\infty} c_n \underbrace{y_n(x, t)}_{\text{standing wave}} = \sum_{n=1}^{\infty} c_n \sin\Big( \frac{n\pi}{l} x \Big) \cos(\omega_n t), \qquad \underbrace{\omega_n = \frac{n\pi c}{l}}_{\text{frequency}} \]

∙ Physically, a standing wave is a synchronous up-and-down motion that repeats its shape periodically after time 2π/ω, such as occurs when a string produces a pure note.

∙ Specific standing waves called fundamental solutions (a kind of basis) are given by
\[ y_n(x, t) = \sin\Big( \frac{n\pi}{l} x \Big) \cos(\omega_n t), \qquad n = 1, 2, \cdots \]

∙ Thus a complicated-looking vibration is in reality an infinite linear combination of harmonics.

∙ The purpose of Fourier analysis is to carry out this procedure of decomposition using a general method.

Exercise: Using separation of variables, prove that any solution y(x, t) can be decomposed into harmonics
\[ y(x, t) = \sum_{n=1}^{\infty} c_n y_n(x, t) = \sum_{n=1}^{\infty} c_n \sin\Big( \frac{n\pi}{l} x \Big) \cos(\omega_n t), \qquad \omega_n = \frac{n\pi c}{l}. \]

10.1 Review: Inner Product in ℝⁿ.

∙ For x, y ∈ ℝⁿ, define the inner product and norm:
\[ \langle x, y \rangle = \sum_{j=1}^{n} x(j)\,y(j), \qquad \|x\| = \sqrt{\langle x, x \rangle}. \]

∙ The distance (or metric) between x and y is defined by ∥x − y∥, and hence ∥x − y∥ = 0 implies x = y.

∙ If ⟨x, y⟩ = 0, x and y are said to be orthogonal.

∙ {e1, e2, ··· , en} is said to be an orthonormal basis of ℝⁿ if

1. ℝⁿ = span{e1, e2, ··· , en}

2. ‖ej‖ = 1, j = 1, ··· , n

3. ⟨ej, ei⟩ = 0 if i ≠ j.

∙ For example, e1 = (1, 0, ··· , 0), e2 = (0, 1, 0, ··· , 0), ....

∙ If {e1, e2, ··· , en} is an orthonormal basis, then every x ∈ ℝⁿ can be represented uniquely by
\[ x = \sum_{j=1}^{n} \langle x, e_j \rangle e_j \]

∙ If Vm = span{e1, ··· , em}, the element in Vm closest to x is
\[ x_m = \sum_{j=1}^{m} \langle x, e_j \rangle e_j \]

with the distance \( \|x - x_m\| = \sqrt{\sum_{j=m+1}^{n} \langle x, e_j \rangle^2} \).

These useful properties of Euclidean space can be generalized to infinite dimensional spaces by introducing inner products.

10.1 Inner Product space C[0, 2π]

∙ Let A be the interval (0, 2π).

∙ Let V be the space of all continuous functions f : [0, 2π] → ℂ.
∙ For f, g ∈ V, we define the inner product
\[ \langle f, g \rangle = \int_0^{2\pi} f(x)\,\overline{g(x)}\,dx \]
where \( \overline{g(x)} \) denotes the complex conjugate of g(x). The above inner product can be approximated by
\[ \langle f, g \rangle \approx \sum_{j=1}^{n} f(x_j)\,\overline{g(x_j)}\,\Delta x, \]
where we divide the interval [0, 2π] into n subintervals with endpoints x0 = 0 < x1 < ··· < xn = 2π and equal width Δx = 2π/n.
∙ Two functions f and g are said to be orthogonal if
\[ \langle f, g \rangle = \int_0^{2\pi} f(x)\,\overline{g(x)}\,dx = 0. \]
∙ The norm of f is defined as
\[ \|f\| = \sqrt{\langle f, f \rangle} = \sqrt{\int_0^{2\pi} |f(x)|^2\,dx}. \]

∙ The distance between f and g is defined by d(f, g) = ‖f − g‖.

∙ If {φn} is an orthogonal set of functions on the interval A with the property that ‖φn‖ = 1, then we call {φn} an orthonormal set.

∙ Example.
\[ \Big\{ \frac{1}{\sqrt{2\pi}},\ \frac{1}{\sqrt{\pi}} \cos x,\ \frac{1}{\sqrt{\pi}} \sin x,\ \frac{1}{\sqrt{\pi}} \cos 2x,\ \frac{1}{\sqrt{\pi}} \sin 2x,\ \cdots \Big\} \]
is an orthonormal set in V.

10.1 Inner Product space

Definition. Let V be a complex vector space. An inner product on V is a mapping ⟨·, ·⟩ : V × V → ℂ with the following properties:

1. ⟨αf + βg, h⟩ = α⟨f, h⟩ + β⟨g, h⟩ for all f, g, h ∈ V and α, β ∈ ℂ.

2. \( \langle f, g \rangle = \overline{\langle g, f \rangle} \)

3. ⟨f, f⟩ ≥ 0, and ⟨f, f⟩ = 0 ⇒ f = 0

Theorem 10.1.2. The space V of the continuous functions f : [a, b] → ℂ forms an inner product space if we define
\[ \langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\,dx. \]

10.1 Inner Product space V = C[a, b]

Consider the space V of the continuous functions f : [a, b] → ℂ with the inner product \( \langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\,dx \).

∙ Define the norm of f by ‖f‖ = √⟨f, f⟩.

∙ Define the distance between f and g by d(f, g) = ∥f − g∥.

For f, g, ℎ ∈ V , we have

∙ Cauchy-Schwarz inequality. ∣⟨ f, g ⟩∣ ≤ ∥f∥∥g∥

∙ Minkowski inequality. ∥f + g∥ ≤ ∥f∥ + ∥g∥

∙ Parallelogram law. ∥f + g∥2 + ∥f − g∥2 = 2∥f∥2 + 2∥g∥2

∙ Pythagorean Theorem. If ⟨ f, g ⟩ = 0, then ∥f + g∥2 = ∥f∥2 + ∥g∥2

Cauchy-Schwarz inequality. |⟨f, g⟩| ≤ ‖f‖‖g‖
Proof:

∙ Suppose g ≠ 0. Let h = g/‖g‖. Then |⟨f, g⟩| ≤ ‖f‖‖g‖ ⇔ |⟨f, h⟩| ≤ ‖f‖.

∙ Denote α = ⟨f, h⟩. Then
\[ 0 \le \|f - \alpha h\|^2 = \langle f - \alpha h, f - \alpha h \rangle = \|f\|^2 - \alpha \langle h, f \rangle - \bar\alpha \langle f, h \rangle + |\alpha|^2 = \|f\|^2 - |\alpha|^2. \]
Hence, |α| = |⟨f, h⟩| ≤ ‖f‖. This completes the proof.

Minkowski inequality. ‖f + g‖ ≤ ‖f‖ + ‖g‖
Proof:
\[ \|f + g\|^2 = \langle f + g, f + g \rangle = \|f\|^2 + \langle f, g \rangle + \langle g, f \rangle + \|g\|^2 \le \|f\|^2 + 2\|f\|\|g\| + \|g\|^2 = (\|f\| + \|g\|)^2 \]

Definition of convergence in an inner product space V. Let V be an inner product space and let fn be a sequence in V. We say that fn converges to f (in mean) and write fn → f if ‖fn − f‖ → 0, that is,

∀ ε > 0, ∃ N s.t. n ≥ N ⇒ ‖fn − f‖ < ε.

Similarly, a series Σ_{k=1}^∞ gk converges to f if
\[ \lim_{n\to\infty} \Big\| \sum_{k=1}^{n} g_k - f \Big\| = 0. \]

Examples: Let V = C([0, 1]), the space of continuous functions f : [0, 1] → ℂ. Below, χ_I denotes the characteristic function of the interval I.

∙ Let fn = nx χ_{[0, 1/n]} + (2 − nx) χ_{(1/n, 2/n]}. Then fn → 0 in mean, that is, ∫_0^1 |fn(x) − 0|² dx → 0.

∙ Let fn = n²x χ_{[0, 1/n]} + (2n − n²x) χ_{(1/n, 2/n]}. Then
\[ \lim_{n\to\infty} f_n(x) = 0 \ (\forall x \in \mathbb{R}) \quad \& \quad \lim_{n\to\infty} \int_0^1 |f_n(x) - 0|^2\,dx = \infty. \]

Definition of Cauchy sequence. A sequence fn in an inner product space V is said to be a Cauchy sequence when

∀ ε > 0, ∃ N s.t. n, m ≥ N ⇒ ‖fn − fm‖ < ε.

An inner product space V is called complete if every Cauchy sequence in V converges. A complete inner product space is called a Hilbert space.

Remark: The inner product space V = C([0, 2]) is not complete.

∙ Let fn(x) = xⁿ for 0 ≤ x ≤ 1 and fn(x) = 1 for 1 ≤ x ≤ 2.

∙ Then fn is a Cauchy sequence since ‖fn − fm‖² = ∫_0^1 |xⁿ − x^m|² dx → 0 as n, m → ∞.

∙ However, fn → f in mean, where f(x) = 0 for 0 ≤ x < 1 and f(x) = 1 for 1 ≤ x ≤ 2, and f ∉ V.

12 A complete inner product space. To make the inner product space V = C([a, b]) complete, we need the following theorem and measure theory:

Theorem 8.3.4 If g(x) is integrable, g ≥ 0, and ∫_a^b g(x) dx = 0, then the set {x ∈ [a, b] : g(x) ≠ 0} has measure zero. Proof. TA

♣ For any integrable function f, Theorem 8.3.4 leads to
\[ \int_a^b |f(x)|^2\,dx = 0 \ \Rightarrow\ f = 0 \text{ except for those } x \text{ in a set of measure zero.} \]
Regarding such an f as equivalent to zero, we have the following theorem:

Theorem 10.1.6 Let V = L²([a, b]) be the space of functions f : [a, b] → ℂ such that |f|² is integrable. Then V is an inner product space with inner product \( \langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\,dx \) and norm ‖f‖ = √⟨f, f⟩.

Proof of Theorem 8.3.4: If g(x) is integrable, g ≥ 0, and ∫_a^b g(x) dx = 0, then the set {x ∈ [a, b] : g(x) ≠ 0} has measure zero.

∙ We first show that the set Am = {x ∈ [a, b] : g(x) > 1/m} has measure zero.

∙ Recall ∫_a^b g(x) dx = inf{U(g, P) : P is any partition}.

∙ Let ε > 0 be given. There exists a partition P such that U(g, P) < ε/m.

∙ Let I1, ··· , Ik be the subintervals of the partition P such that Ii ∩ Am ≠ ∅. Then
\[ \sum_{i=1}^{k} |I_i| \le m \sum_{i=1}^{k} \sup_{I_i} g(x)\,|I_i| \le m\,U(g, P) < \varepsilon, \]

where |Ii| is the length of the interval Ii.

∙ Since Am ⊂ ∪_{i=1}^k Ii and Σ_{i=1}^k |Ii| < ε, Am has measure zero.

∙ Since {x ∈ [a, b] : g(x) ≠ 0} ⊂ ∪_{m=1}^∞ Am, the set has measure zero.

Proof of Theorem 10.1.6: Prove that V = L²([a, b]) is an inner product space.

∙ If ‖f‖ = 0, then ∫_a^b |f(x)|² dx = 0. From Theorem 8.3.4, f = 0 since we are identifying functions that agree except on a set of measure zero.

∙ It is easy to see that ⟨f, g⟩ satisfies all the other rules of an inner product space. We only need to prove that |⟨f, g⟩| < ∞ for all f, g ∈ V.

∙ If we split f and g into real and imaginary parts, and into positive and negative parts, we are reduced to the case in which f and g are real and positive.

∙ From the Lebesgue monotone convergence theorem (page 467), it suffices to show that
\[ \lim_{M\to\infty} \int_a^b (fg)_M < \infty \quad (\text{see page 462}) \]

∙ Note that \( 0 \le (fg)_M \le f_{\sqrt{M}}\, g_{\sqrt{M}} + f_{\sqrt{M}}^2 + g_{\sqrt{M}}^2 \).

∙ \( \int_a^b (fg)_M \le \|f_{\sqrt{M}}\|\,\|g_{\sqrt{M}}\| + \|f_{\sqrt{M}}\|^2 + \|g_{\sqrt{M}}\|^2 \).

∙ Hence, \( \int_a^b (fg)_M \le \|f\|\|g\| + \|f\|^2 + \|g\|^2 < \infty \).

Example 10.1.8. If f1, ··· , fn are orthonormal in an inner product space V, prove that f1, ··· , fn are linearly independent.

∙ Definition. f1, ··· , fn are said to be linearly independent if
\[ \sum_{i=1}^{n} c_i f_i = 0 \ \Rightarrow\ c_1 = \cdots = c_n = 0. \]

∙ Assume that Σ_{i=1}^n ci fi = 0. We want to prove c1 = ··· = cn = 0.

∙ Due to orthogonality, we have
\[ c_k = c_k \|f_k\|^2 = \Big\langle \sum_{i=1}^{n} c_i f_i,\ f_k \Big\rangle = \langle 0, f_k \rangle = 0. \]

Example 10.1.8. Let V be an inner product space. Define the projection of f on g to be the vector
\[ h = \frac{\langle f, g \rangle}{\|g\|^2}\,g. \]
Show that h and f − h are orthogonal, and interpret this result geometrically.

Proof: First, let us prove it when ‖g‖ = 1:
\[ \langle h, f - h \rangle = \langle h, f \rangle - \|h\|^2 = \langle \langle f, g \rangle g, f \rangle - |\langle f, g \rangle|^2 = 0. \]

For the general case, repeat the above procedure.

17 10.2 Orthogonal family of functions

∙ Throughout this section, we assume that V is an inner product space with an inner product ⟨⋅, ⋅⟩.

∙ A vector  ∈ V is called normalized if ∥∥ = p⟨, ⟩ = 1.

∙ f and g are called orthogonal if ⟨f, g⟩ = 0.

∙ Definition. An orthonormal family 0, 1, ⋅ ⋅ ⋅ in V is called complete if every f ∈ V can be written ∞ X f = ckk (ck = ⟨f, k⟩) k=0 P∞ We call f = k=0 ckk the of f with respect to 0, 1, ⋅ ⋅ ⋅ and ck = ⟨f, k⟩ the Fourier coefficients.

∙ An orthonormal family $\{\varphi_k\}$ in V is complete iff for every $f \in V$,
$$\lim_{n\to\infty} \left\| f - \sum_{k=0}^{n} \langle f, \varphi_k\rangle \varphi_k \right\| = 0.$$

Theorem 10.2.1: Suppose $f = \sum_{k=0}^{\infty} c_k \varphi_k$ for an orthonormal family $\varphi_0, \varphi_1, \cdots$ in V (convergence in mean). Then $c_k = \langle f, \varphi_k\rangle$.

Proof.

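∙ Set $s_n = \sum_{k=0}^{n} c_k \varphi_k$, so that $\|s_n - f\| \to 0$.

∙ Hence, by the Cauchy-Schwarz inequality and $\|\varphi_i\| = 1$, $|\langle f - s_n, \varphi_i\rangle| \le \|f - s_n\| \to 0$ as $n \to \infty$.

∙ If $n \ge i$, then $\langle s_n, \varphi_i\rangle = \sum_{k=0}^{n} \langle c_k \varphi_k, \varphi_i\rangle = c_i$.

∙ If $n \ge i$, $|\langle f - s_n, \varphi_i\rangle| = |\langle f, \varphi_i\rangle - c_i| \le \|f - s_n\| \to 0$ as $n \to \infty$.

∙ Since $|\langle f, \varphi_i\rangle - c_i|$ does not depend on n, it must be 0. Hence, $\langle f, \varphi_i\rangle = c_i$.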

Examples of complete orthonormal families :

∙ Let $V = L^2([0, 2\pi])$ be the inner product space in Theorem 10.1.6.

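∙ The exponential system $\{\varphi_n(x) = \frac{e^{inx}}{\sqrt{2\pi}} : n = 0, \pm 1, \pm 2, \cdots\}$ is a complete orthonormal system in the space V; that is, the Fourier series of $f \in V$ for this family is given by

$$f = \sum_{k=-\infty}^{\infty} \frac{c_k e^{ikx}}{\sqrt{2\pi}}, \qquad c_k = \langle f, \varphi_k\rangle = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(x) e^{-ikx}\,dx.$$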

∙ The trigonometric system $\frac{1}{\sqrt{2\pi}}, \frac{\cos mx}{\sqrt{\pi}}, \frac{\sin nx}{\sqrt{\pi}}$, $m, n = 1, 2, \cdots$, is a complete orthonormal system in V.

Proof. See Mean completeness theorem 10.3.1. (optional)
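The orthonormality of the exponential system and the coefficient formula can be sanity-checked numerically. The following is a rough sketch, not part of the notes: the integral is approximated by a midpoint rule, and the helper names `inner_L2` and `phi`, the grid size, and the test function f(x) = x are all illustrative assumptions. For f(x) = x, integration by parts gives $c_k = i\sqrt{2\pi}/k$ for $k \ne 0$, which the code compares against.

```python
import cmath
import math

def inner_L2(f, g, a=0.0, b=2 * math.pi, n=4000):
    """Approximate <f, g> = integral over [a, b] of f * conj(g), by the midpoint rule."""
    dx = (b - a) / n
    total = 0j
    for j in range(n):
        x = a + (j + 0.5) * dx
        total += f(x) * complex(g(x)).conjugate()
    return total * dx

def phi(k):
    """k-th function of the exponential system: e^{ikx} / sqrt(2*pi)."""
    return lambda x: cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

# Orthonormality: <phi_2, phi_2> should be near 1, <phi_2, phi_5> near 0.
norm_sq = inner_L2(phi(2), phi(2))
cross = inner_L2(phi(2), phi(5))

# Fourier coefficient of the sample function f(x) = x (hypothetical choice).
c3 = inner_L2(lambda x: x, phi(3))
c3_exact = 1j * math.sqrt(2 * math.pi) / 3  # from integration by parts

print(abs(norm_sq - 1), abs(cross), abs(c3 - c3_exact))
```

All three printed values should be small; the last one is limited by the quadrature error rather than by rounding.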

Gram-Schmidt process :

∙ Let $g_0, g_1, g_2, \cdots$ be linearly independent functions in an inner product space V.

∙ We can form a corresponding orthonormal system $\varphi_0, \varphi_1, \cdots$ as follows:

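$$\varphi_0 = \frac{g_0}{\|g_0\|}$$
$$\varphi_1 = \frac{\tilde\varphi_1}{\|\tilde\varphi_1\|}, \qquad \tilde\varphi_1 = g_1 - \langle g_1, \varphi_0\rangle \varphi_0$$
$$\varphi_{k+1} = \frac{\tilde\varphi_{k+1}}{\|\tilde\varphi_{k+1}\|}, \qquad \tilde\varphi_{k+1} = g_{k+1} - \sum_{i=0}^{k} \langle g_{k+1}, \varphi_i\rangle \varphi_i$$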
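The recursion above can be sketched in a few lines of code. As a simplifying assumption not made in the notes, V is taken to be R^n with the dot product rather than a function space; the function name `gram_schmidt` and the input vectors are illustrative only.

```python
import math

def inner(u, v):
    """Dot product on R^n (assumed stand-in for the inner product on V)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(gs):
    """Turn linearly independent vectors g_0, g_1, ... into an orthonormal list."""
    phis = []
    for g in gs:
        # tilde = g - sum_i <g, phi_i> phi_i over the phi_i built so far
        tilde = list(g)
        for phi in phis:
            c = inner(g, phi)
            tilde = [t - c * p for t, p in zip(tilde, phi)]
        norm = math.sqrt(inner(tilde, tilde))  # nonzero by linear independence
        phis.append([t / norm for t in tilde])
    return phis

# Hypothetical linearly independent input vectors.
phis = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for i, u in enumerate(phis):
    print([round(inner(u, v), 12) for v in phis])
```

The printed rows form the Gram matrix of the output, which should be the identity up to rounding. This is the classical form of the process, matching the formulas above; it is not the numerically more stable modified variant.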

Theorem: Bessel inequality: Let $\varphi_0, \varphi_1, \cdots$ be an orthonormal system in an inner product space V. For each $f \in V$, the real series $\sum_{i=0}^{\infty} |\langle f, \varphi_i\rangle|^2$ converges and
$$\sum_{i=0}^{\infty} |\langle f, \varphi_i\rangle|^2 \le \|f\|^2.$$

Proof.

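∙ Set $s_n = \sum_{k=0}^{n} c_k \varphi_k$ where $c_k = \langle f, \varphi_k\rangle$.

∙ Key idea 1: $f - s_n$ and $s_n$ are orthogonal.

∙ Key idea 2: Apply Pythagoras' theorem: $\|f\|^2 = \|f - s_n\|^2 + \|s_n\|^2$.

∙ Hence, $\|s_n\|^2 \le \|f\|^2$.

∙ Since the $\varphi_i$ are orthonormal, $\|s_n\|^2 = \sum_{i=0}^{n} |\langle f, \varphi_i\rangle|^2$.

∙ The partial sums of the series are therefore nondecreasing and bounded above by $\|f\|^2$, so the series converges and its sum is at most $\|f\|^2$.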
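The two key ideas of the proof can be illustrated with a small numeric check. This is a sketch under assumptions of my own: V is R^3 with the dot product, and the orthonormal system and the vector f are arbitrary sample choices. Because the system has only two elements, it is not complete in R^3, so the inequality is strict here.

```python
import math

def inner(u, v):
    """Dot product on R^3 (assumed stand-in for the inner product on V)."""
    return sum(a * b for a, b in zip(u, v))

# A hypothetical orthonormal system with two elements (not complete in R^3).
phis = [
    [1.0, 0.0, 0.0],
    [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)],
]
f = [3.0, 1.0, 2.0]

coeffs = [inner(f, phi) for phi in phis]  # c_k = <f, phi_k>
s = [sum(c * phi[j] for c, phi in zip(coeffs, phis)) for j in range(len(f))]
r = [a - b for a, b in zip(f, s)]         # f - s_n

bessel_sum = sum(c * c for c in coeffs)   # sum of |<f, phi_k>|^2
f_sq = inner(f, f)                        # ||f||^2

print(inner(r, s))                        # key idea 1: close to 0
print(f_sq, inner(r, r) + inner(s, s))    # key idea 2: the two values agree
print(bessel_sum <= f_sq)                 # Bessel inequality
```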

Parseval's Theorem : Let $\varphi_0, \varphi_1, \cdots$ be an orthonormal system in an inner product space V. Then $\varphi_0, \varphi_1, \cdots$ is complete iff for every $f \in V$, we have
$$\sum_{i=0}^{\infty} |\langle f, \varphi_i\rangle|^2 = \|f\|^2.$$

Proof.

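∙ Set $s_n = \sum_{k=0}^{n} c_k \varphi_k$ where $c_k = \langle f, \varphi_k\rangle$.

∙ Then $\|f\|^2 = \|f - s_n\|^2 + \|s_n\|^2$.

∙ If $\varphi_0, \varphi_1, \cdots$ is complete, $\|f - s_n\|^2 \to 0$. Therefore, letting $n \to \infty$,
$$\|f\|^2 = \lim_{n\to\infty} \left( \|f - s_n\|^2 + \|s_n\|^2 \right) = 0 + \sum_{i=0}^{\infty} |\langle f, \varphi_i\rangle|^2.$$

∙ Conversely, if $\sum_{i=0}^{\infty} |\langle f, \varphi_i\rangle|^2 = \|f\|^2$, then $\|f\|^2 - \|s_n\|^2 \to 0$, and so $\|f - s_n\|^2 \to 0$.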
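As an illustration of Parseval's identity for the exponential system on $[0, 2\pi]$, consider the sample function f(x) = x (my choice, not from the notes). Integration by parts gives $|c_0|^2 = 2\pi^3$ and $|c_k|^2 = 2\pi/k^2$ for $k \ne 0$, while $\|f\|^2 = \int_0^{2\pi} x^2\,dx = (2\pi)^3/3$. The sketch below uses these hand-computed values and shows the gap $\|f\|^2 - \|s_n\|^2$ shrinking as more coefficients are included.

```python
import math

# ||f||^2 for the sample function f(x) = x on [0, 2*pi] (hypothetical example).
norm_sq = (2 * math.pi) ** 3 / 3

def bessel_partial_sum(n):
    """Sum of |c_k|^2 over |k| <= n, using the closed-form coefficients of f(x) = x."""
    total = 2 * math.pi ** 3                  # |c_0|^2
    for k in range(1, n + 1):
        total += 2 * (2 * math.pi / k ** 2)   # |c_k|^2 + |c_{-k}|^2
    return total

# Gap ||f||^2 - ||s_n||^2 = ||f - s_n||^2 for increasing n.
gaps = [norm_sq - bessel_partial_sum(n) for n in (10, 100, 1000, 10000)]
for g in gaps:
    print(g)
```

Each gap stays positive, in line with Bessel's inequality, and decreases roughly like $4\pi/n$, consistent with convergence of the partial sums to $\|f\|^2$.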
