
In this section, we assume that you're familiar with $\mathbb{R}^2$ as the set of pairs of real numbers, as well as the "standard" addition and scalar multiplication that you have seen in your linear algebra class.

1 Analysis on $\mathbb{R}^2$

From the previous section, an important element of analysis is the idea of measuring distance: the definition of convergence of a sequence of real numbers used $|x_n - x|$ to measure how close $x_n$ was to $x$. As it turns out, there are many different ways of measuring distance on $\mathbb{R}^2$.

Definition 1.1. Suppose $\mathbf{x} = (x_1, x_2) \in \mathbb{R}^2$.

1. Euclidean Size: $\|\mathbf{x}\|_2 := (x_1^2 + x_2^2)^{1/2} = \left(\sum_{i=1}^{2} x_i^2\right)^{1/2}$.

2. Maximum Size: $\|\mathbf{x}\|_\infty := \max\{|x_1|, |x_2|\}$.

3. Sum Size: $\|\mathbf{x}\|_1 := |x_1| + |x_2| = \sum_{i=1}^{2} |x_i|$.

Notice that if we think of $\mathbb{R}$ as the subspace of $\mathbb{R}^2$ whose second component is 0, then each of the norms reduces to the absolute value. Thus, they are all generalizations of the absolute value as a measure of size.
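To make Definition 1.1 concrete, here is a minimal numerical sketch (assuming NumPy is available; the sample vectors are arbitrary illustrative choices):

```python
import numpy as np

x = np.array([3.0, -4.0])

# The three "sizes" from Definition 1.1
euclidean = np.sqrt(np.sum(x**2))   # ||x||_2 = (x1^2 + x2^2)^(1/2)
sum_size  = np.sum(np.abs(x))       # ||x||_1 = |x1| + |x2|
max_size  = np.max(np.abs(x))       # ||x||_inf = max{|x1|, |x2|}
print(euclidean, sum_size, max_size)   # 5.0  7.0  4.0

# On a vector of the form (x1, 0), all three reduce to |x1|, as remarked above.
y = np.array([-2.5, 0.0])
print(np.sqrt(np.sum(y**2)), np.sum(np.abs(y)), np.max(np.abs(y)))   # 2.5  2.5  2.5
```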

Theorem 1.2. $\|\cdot\|_2$ satisfies (1-3):

1. $\|\mathbf{x}\|_2 > 0$ for all $\mathbf{x} \in \mathbb{R}^2 \setminus \{0\}$.

2. $\|a\mathbf{x}\|_2 = |a|\,\|\mathbf{x}\|_2$ for all $\mathbf{x} \in \mathbb{R}^2$ and all $a \in \mathbb{R}$.

3. $\|\mathbf{x} + \mathbf{y}\|_2 \le \|\mathbf{x}\|_2 + \|\mathbf{y}\|_2$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$.

These assertions are also true if we replace $\|\cdot\|_2$ by either $\|\cdot\|_\infty$ or $\|\cdot\|_1$. Before we prove Theorem 1.2, we need the following lemma:

Lemma 1.3. Suppose $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (y_1, y_2)$. Then
$$x_1 y_1 + x_2 y_2 \le \|\mathbf{x}\|_2 \|\mathbf{y}\|_2$$
for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$.

Proof. By taking square roots, it will be enough to show that
$$(x_1^2 + x_2^2)(y_1^2 + y_2^2) \ge (x_1 y_1 + x_2 y_2)^2, \quad \text{or equivalently} \quad (x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1 y_1 + x_2 y_2)^2 \ge 0.$$
We have
$$(x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + x_1^2 y_2^2 + x_2^2 y_1^2 + x_2^2 y_2^2 - \left(x_1^2 y_1^2 + 2 x_1 y_1 x_2 y_2 + x_2^2 y_2^2\right) = x_1^2 y_2^2 - 2 x_1 y_1 x_2 y_2 + x_2^2 y_1^2 = (x_1 y_2 - x_2 y_1)^2,$$
and this last quantity is clearly non-negative. We are now in position to prove Theorem 1.2.

Proof. (1.) Suppose that $\mathbf{x} \ne 0$. Then at least one of $x_1$ or $x_2$ is not zero, and so $x_1^2 + x_2^2 > 0$. Taking square roots yields $\|\mathbf{x}\|_2 > 0$.

(2.) Let $\mathbf{x} \in \mathbb{R}^2$ and $a \in \mathbb{R}$ be arbitrary. Since $a\mathbf{x} = (a x_1, a x_2)$, we will have
$$\|a\mathbf{x}\|_2^2 = (a x_1)^2 + (a x_2)^2 = a^2 (x_1^2 + x_2^2) = a^2 \|\mathbf{x}\|_2^2.$$
Taking square roots (and bearing in mind that $\sqrt{a^2} = |a|$) finishes the proof of (2.).

(3.) We will show that $\|\mathbf{x} + \mathbf{y}\|_2^2 \le (\|\mathbf{x}\|_2 + \|\mathbf{y}\|_2)^2$. Since $\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2)$, we have
$$\begin{aligned}
\|\mathbf{x} + \mathbf{y}\|_2^2 &= (x_1 + y_1)^2 + (x_2 + y_2)^2 \\
&= x_1^2 + 2 x_1 y_1 + y_1^2 + x_2^2 + 2 x_2 y_2 + y_2^2 \\
&= (x_1^2 + x_2^2) + 2(x_1 y_1 + x_2 y_2) + (y_1^2 + y_2^2) \\
&\le \|\mathbf{x}\|_2^2 + 2(x_1 y_1 + x_2 y_2) + \|\mathbf{y}\|_2^2 \\
&\le \|\mathbf{x}\|_2^2 + 2 \|\mathbf{x}\|_2 \|\mathbf{y}\|_2 + \|\mathbf{y}\|_2^2 = (\|\mathbf{x}\|_2 + \|\mathbf{y}\|_2)^2,
\end{aligned}$$
where we have used Lemma 1.3 in the last inequality. Taking square roots completes the proof of (3.).

Exercise: Prove the version of Theorem 1.2 for $\|\cdot\|_1$ and $\|\cdot\|_\infty$.

Now that we have a way of measuring size, we can also measure distance between "points" (really the vectors) in $\mathbb{R}^2$:

Definition 1.4. Suppose $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$. Then

1. the Euclidean Distance between $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x} - \mathbf{y}\|_2$.

2. the Max Distance between $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x} - \mathbf{y}\|_\infty$.

3. the Sum Distance between $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x} - \mathbf{y}\|_1$.

This definition simply generalizes our ideas of distance from $\mathbb{R}$: the distance between $x, y \in \mathbb{R}$ is $|x - y|$. It's almost like we're just replacing $|\cdot|$ with $\|\cdot\|$. Notice that the properties of the various sizes are generalizations of the properties of the absolute value - including the triangle inequality! There is an important distinction: unlike $\mathbb{R}$, $\mathbb{R}^2$ has many different distances!

Exercise: Suppose $\mathbf{x} = (x_1, x_2)$. Does $\|\mathbf{x}\| := 2|x_1| + 3|x_2|$ satisfy the three properties of Theorem 1.2? Why or why not?

Exercise: Suppose $A$ is a $2 \times 2$ matrix. Does $\|\mathbf{x}\|_A := \|A\mathbf{x}\|_2$ satisfy (1-3) of Theorem 1.2? Why or why not?

1.1 Convergence in $\mathbb{R}^2$

Once we have a way of measuring distance, we can define convergence:

Definition 1.5. Suppose $\mathbf{x}_n$ is a sequence in $\mathbb{R}^2$ and $\mathbf{x} \in \mathbb{R}^2$. We say that $\mathbf{x}_n$ converges to $\mathbf{x}$ with respect to $\|\cdot\|_i$ ($i = 1, 2$ or $\infty$) and write either $\mathbf{x}_n \to \mathbf{x}$ or $\lim_{n\to\infty} \mathbf{x}_n = \mathbf{x}$ if for every $\varepsilon > 0$, there is an $N$ such that for all $n \in \mathbb{N}$, if $n > N$, then $\|\mathbf{x}_n - \mathbf{x}\|_i < \varepsilon$.

On the face of things, it looks like we could perhaps have a sequence that converges with respect to one of our distances, but not with respect to another. As it turns out, this doesn't happen in $\mathbb{R}^2$.

Lemma 1.6. There are positive constants $C$, $K$ and $M$ such that for all $\mathbf{x} \in \mathbb{R}^2$
$$\frac{1}{C}\|\mathbf{x}\|_1 \le \|\mathbf{x}\|_2 \le C\|\mathbf{x}\|_1, \qquad \frac{1}{K}\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le K\|\mathbf{x}\|_\infty, \qquad \frac{1}{M}\|\mathbf{x}\|_1 \le \|\mathbf{x}\|_\infty \le M\|\mathbf{x}\|_1.$$

Proof. We prove only the first. Suppose $\mathbf{x} = (x, y)$. By definition, $\|\mathbf{x}\|_1 = |x| + |y|$ and $\|\mathbf{x}\|_2 = \sqrt{x^2 + y^2}$. We clearly have
$$|x| \le \sqrt{x^2 + y^2} = \|\mathbf{x}\|_2 \qquad \text{and} \qquad |y| \le \sqrt{x^2 + y^2} = \|\mathbf{x}\|_2,$$
and adding these inequalities yields $\|\mathbf{x}\|_1 \le 2\|\mathbf{x}\|_2$, and so $\frac{1}{2}\|\mathbf{x}\|_1 \le \|\mathbf{x}\|_2$. It remains only to show that $\|\mathbf{x}\|_2 \le 2\|\mathbf{x}\|_1$. Notice that
$$\|\mathbf{x}\|_2 = \sqrt{x^2 + y^2} \le \sqrt{x^2 + 2|x||y| + y^2} = \sqrt{(|x| + |y|)^2} = \|\mathbf{x}\|_1 \le 2\|\mathbf{x}\|_1,$$
as desired. (So the first pair of inequalities holds with $C = 2$.)
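The constant $C = 2$ extracted from the proof above is easy to probe numerically. Here is a minimal sketch (assuming NumPy; random sampling only illustrates the inequalities, it does not prove them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Check (1/2) ||x||_1 <= ||x||_2 <= 2 ||x||_1 on a batch of random vectors in R^2.
for _ in range(1000):
    x = rng.normal(size=2)
    n1 = np.abs(x).sum()
    n2 = np.sqrt((x**2).sum())
    assert 0.5 * n1 <= n2 + 1e-12 and n2 <= 2 * n1 + 1e-12

print("The first pair of inequalities in Lemma 1.6 held on every sample.")
```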

Using the previous lemma, we can show the following:

Proposition 1.7. $\mathbf{x}_n \to \mathbf{x}$ with respect to $\|\cdot\|_2$ if and only if $\mathbf{x}_n \to \mathbf{x}$ with respect to $\|\cdot\|_1$.

Proof. We first show that if $\mathbf{x}_n \to \mathbf{x}$ with respect to $\|\cdot\|_2$, then $\mathbf{x}_n \to \mathbf{x}$ with respect to $\|\cdot\|_1$. Let $\varepsilon > 0$ be given. Since $\mathbf{x}_n \to \mathbf{x}$ with respect to $\|\cdot\|_2$, there is an $N$ such that
$$\|\mathbf{x}_n - \mathbf{x}\|_2 < \frac{\varepsilon}{C} \quad \text{whenever } n > N.$$
Suppose now that $n > N$ is arbitrary. By Lemma 1.6, we then have
$$\frac{1}{C}\|\mathbf{x}_n - \mathbf{x}\|_1 \le \|\mathbf{x}_n - \mathbf{x}\|_2 < \frac{\varepsilon}{C}, \quad \text{and so} \quad \|\mathbf{x}_n - \mathbf{x}\|_1 < \varepsilon.$$
Thus, the $N$ from the convergence of $\mathbf{x}_n$ to $\mathbf{x}$ with respect to $\|\cdot\|_2$ also works to show convergence of $\mathbf{x}_n$ to $\mathbf{x}$ with respect to $\|\cdot\|_1$. The other direction is left to you.

The preceding proposition shows that in $\mathbb{R}^2$, to determine the convergence of $\mathbf{x}_n$ to $\mathbf{x}$, it doesn't matter which measure of size we use - so long as it satisfies (1-3) of Theorem 1.2. Thus, from now on, when we say $\mathbf{x}_n \to \mathbf{x}$, it is unnecessary to specify with respect to which norm we mean! The other advantage of this is that we get the following:

Proposition 1.8. Suppose $\mathbf{x}_n = (x_n, y_n)$ and $\mathbf{x} = (x, y)$. Then $\mathbf{x}_n \to \mathbf{x}$ if and only if $x_n \to x$ and $y_n \to y$.

This proposition says that to figure out if a sequence converges in $\mathbb{R}^2$, we need only look at the behavior of the components. This is a very nice property, since we can then use all of our convergence properties in $\mathbb{R}$.

Proof. Suppose $\mathbf{x}_n \to \mathbf{x}$. We need to show that $x_n \to x$ and $y_n \to y$. Since Proposition 1.7 implies it doesn't matter what size we use in $\mathbb{R}^2$, we use $\|\cdot\|_1$. Notice that
$$|x_n - x| \le |x_n - x| + |y_n - y| = \|\mathbf{x}_n - \mathbf{x}\|_1 \quad \text{for all } n \in \mathbb{N}.$$
Suppose now that $\varepsilon > 0$. Since $\mathbf{x}_n \to \mathbf{x}$, there is an $N_1$ such that
$$\|\mathbf{x}_n - \mathbf{x}\|_1 < \varepsilon \quad \text{whenever } n > N_1.$$
By the preceding inequality, we then have
$$|x_n - x| \le \|\mathbf{x}_n - \mathbf{x}\|_1 < \varepsilon \quad \text{whenever } n > N_1.$$
Thus, the $N_1$ that works for $\mathbf{x}_n$ also works for $x_n$. Thus, $\mathbf{x}_n \to \mathbf{x}$ implies that $x_n \to x$. The proof that $y_n \to y$ is similar.

We next suppose that $x_n \to x$ and $y_n \to y$, and show that $\mathbf{x}_n \to \mathbf{x}$ in $\|\cdot\|_1$. Let $\varepsilon > 0$ be given. By assumption, there exist $N_x$ and $N_y$ such that
$$|x_n - x| < \frac{\varepsilon}{2} \quad \text{whenever } n > N_x \qquad \text{and} \qquad |y_n - y| < \frac{\varepsilon}{2} \quad \text{whenever } n > N_y.$$
Let $N := \max\{N_x, N_y\}$. Then we will have
$$\|\mathbf{x}_n - \mathbf{x}\|_1 = |x_n - x| + |y_n - y| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \quad \text{whenever } n > N.$$
Thus, $\mathbf{x}_n \to \mathbf{x}$ in $\|\cdot\|_1$.

The preceding implies the following, whose proof is left to you:

Proposition 1.9. Suppose $\mathbf{x}_n \to \mathbf{x}$ and $\mathbf{y}_n \to \mathbf{y}$ in $\mathbb{R}^2$, and suppose $a_n \to a$ in $\mathbb{R}$. Then

1. $\mathbf{x}_n + \mathbf{y}_n \to \mathbf{x} + \mathbf{y}$

2. $a_n \mathbf{x}_n \to a\mathbf{x}$.

1.2 Continuity in $\mathbb{R}^2$

Notice that once we have convergence of sequences in $\mathbb{R}^2$, we can define continuity for functions. Unlike the case of $\mathbb{R}$, we have two types of functions to consider: $f : \Omega \to \mathbb{R}$ and $f : \Omega \to \mathbb{R}^2$, where $\Omega \subseteq \mathbb{R}^2$. However, the definition for each is in essence the same: continuous functions must map convergent sequences to convergent sequences.

Definition 1.10. Suppose $\Omega \subseteq \mathbb{R}^2$. Then

1. $f : \Omega \to \mathbb{R}$ is continuous at $\mathbf{a} \in \Omega$ if $f(\mathbf{x}_n) \to f(\mathbf{a})$ for every sequence $\mathbf{x}_n$ in $\Omega$ that converges to $\mathbf{a}$.

2. $f : \Omega \to \mathbb{R}$ is continuous in $\Omega$ if $f$ is continuous at every point in $\Omega$.

3. $f : \Omega \to \mathbb{R}^2$ is continuous at $\mathbf{a} \in \Omega$ if $f(\mathbf{x}_n) \to f(\mathbf{a})$ for every sequence $\mathbf{x}_n$ in $\Omega$ that converges to $\mathbf{a}$.

4. $f : \Omega \to \mathbb{R}^2$ is continuous in $\Omega$ if $f$ is continuous at every point in $\Omega$.

Example 1.11. Suppose $f : (x, y) \mapsto x^2 - 2y$. Then $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous on $\mathbb{R}^2$. To see this, suppose $(x_n, y_n) \to (x, y)$. We need to show that $f(x_n, y_n) \to f(x, y)$. Note that $f(x_n, y_n) = x_n^2 - 2y_n \to x^2 - 2y$ by our rules for $\mathbb{R}$, and so $f$ is continuous everywhere! (See, all that initial work we did was worth it!)

We should be a little careful when we talk about continuity of $f : \Omega \to \mathbb{R}$, since there are functions that satisfy both

• For each fixed $x$, $y \mapsto f(x, y)$ is a continuous function of $y$, and

• For each fixed $y$, $x \mapsto f(x, y)$ is a continuous function of $x$,

and yet the function is NOT continuous as a function of both variables!

Example 1.12. Consider $f : \mathbb{R}^2 \to \mathbb{R}$ given by
$$f(x, y) := \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } xy \ne 0 \\ 0 & \text{if } xy = 0. \end{cases}$$

This function has the property that for each fixed $x$, $y \mapsto f(x, y)$ is continuous and for each fixed $y$, $x \mapsto f(x, y)$ is continuous, and yet $f : \mathbb{R}^2 \to \mathbb{R}$ is not continuous at $(0, 0)$: if we consider the sequence $\left(\tfrac{1}{n}, \tfrac{1}{n}\right)$, then
$$f\!\left(\tfrac{1}{n}, \tfrac{1}{n}\right) = \frac{\frac{1}{n^2}}{\frac{1}{n^2} + \frac{1}{n^2}} = \frac{1}{2} \not\to 0 = f(0, 0).$$
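A quick numerical illustration of Example 1.12 (a minimal sketch, assuming a Python interpreter; the sampled points are illustrative choices):

```python
def f(x, y):
    # The function from Example 1.12
    return x * y / (x**2 + y**2) if x * y != 0 else 0.0

# Along each axis, f is identically 0, so each one-variable slice is continuous at 0 ...
print([f(x, 0.0) for x in (1.0, 0.1, 0.01)])       # [0.0, 0.0, 0.0]

# ... but along the diagonal sequence (1/n, 1/n), the values stay at 1/2,
# so f(1/n, 1/n) does not converge to f(0, 0) = 0.
print([f(1/n, 1/n) for n in (1, 10, 100, 1000)])   # [0.5, 0.5, 0.5, 0.5]
```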

It turns out that determining the continuity of functions $f : \Omega \to \mathbb{R}^2$ is very similar to determining that of functions $f : \Omega \to \mathbb{R}$.

Example 1.13. Let $A = \begin{pmatrix} -2 & 3 \\ 1 & 2 \end{pmatrix}$. Then $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f : \mathbf{x} \mapsto A\mathbf{x}$ is continuous.

To see this, suppose $\mathbf{x}_n \to \mathbf{x}$. We need to show that $A\mathbf{x}_n \to A\mathbf{x}$. Suppose that $\mathbf{x}_n = \begin{pmatrix} x_n \\ y_n \end{pmatrix}$ and $\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}$. Next, we have
$$A\mathbf{x}_n = \begin{pmatrix} -2x_n + 3y_n \\ x_n + 2y_n \end{pmatrix} \quad \text{and} \quad A\mathbf{x} = \begin{pmatrix} -2x + 3y \\ x + 2y \end{pmatrix}.$$
Thus, to show that $A\mathbf{x}_n \to A\mathbf{x}$, it suffices to show that both components converge, i.e., $-2x_n + 3y_n \to -2x + 3y$ and $x_n + 2y_n \to x + 2y$. By Proposition 1.8, $\mathbf{x}_n \to \mathbf{x}$ implies that $x_n \to x$ and $y_n \to y$. Our rules for limits in $\mathbb{R}$ then imply the claim.

Notice that in the preceding example, we really only had to make sure that the functions $\begin{pmatrix} x \\ y \end{pmatrix} \mapsto -2x + 3y$ and $\begin{pmatrix} x \\ y \end{pmatrix} \mapsto x + 2y$ were continuous on $\mathbb{R}^2$. As it turns out, this is true in general, as we now prove in two parts.
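Before proving the general result, here is a quick numerical check of Example 1.13 (a minimal sketch, assuming NumPy; the limit point and the sequence $\mathbf{x}_n = \mathbf{x} + (1/n, 1/n)$ are illustrative choices):

```python
import numpy as np

A = np.array([[-2.0, 3.0],
              [1.0, 2.0]])            # the matrix from Example 1.13

x = np.array([1.0, -1.0])             # an arbitrary limit point

# x_n = x + (1/n, 1/n) converges to x, and each component of A x_n
# converges to the corresponding component of A x.
for n in (1, 10, 100, 1000):
    xn = x + np.array([1.0, 1.0]) / n
    print(n, A @ xn, np.abs(A @ xn - A @ x).max())
```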

Proposition 1.14. Suppose $f_1 : \Omega \to \mathbb{R}$ and $f_2 : \Omega \to \mathbb{R}$ are both continuous at $\mathbf{a} \in \Omega$. Then $f : \Omega \to \mathbb{R}^2$, $f : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} f_1(x, y) \\ f_2(x, y) \end{pmatrix}$, is continuous at $\mathbf{a} \in \Omega$.

Proof. Suppose $\mathbf{x}_n$ is a sequence in $\Omega$ and suppose $\mathbf{x}_n \to \mathbf{a}$. We need to show that $f(\mathbf{x}_n) \to f(\mathbf{a})$. Suppose $\mathbf{x}_n = \begin{pmatrix} x_n \\ y_n \end{pmatrix}$ and $\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$. By definition, $f(\mathbf{x}_n) = \begin{pmatrix} f_1(x_n, y_n) \\ f_2(x_n, y_n) \end{pmatrix}$ and $f(\mathbf{a}) = \begin{pmatrix} f_1(a_1, a_2) \\ f_2(a_1, a_2) \end{pmatrix}$. To show that $f(\mathbf{x}_n) \to f(\mathbf{a})$, Proposition 1.8 implies that it is sufficient to show both
$$f_1(x_n, y_n) \to f_1(a_1, a_2) \quad \text{and} \quad f_2(x_n, y_n) \to f_2(a_1, a_2).$$
But that follows from the continuity of $f_1$ and $f_2$ at $\mathbf{a}$.

Proposition 1.15. Suppose $f : \Omega \to \mathbb{R}^2$ is continuous at $\mathbf{a}$. If $f_1 : \Omega \to \mathbb{R}$ and $f_2 : \Omega \to \mathbb{R}$ are the first and second components of $f$, then both $f_1$ and $f_2$ are continuous at $\mathbf{a}$.

Proof. We will show that $f_1$ is continuous at $\mathbf{a}$; the argument for $f_2$ is identical. Suppose then that $\mathbf{x}_n \to \mathbf{a}$. By assumption, $f(\mathbf{x}_n) \to f(\mathbf{a})$. Proposition 1.8 then implies that the first component of the vector $f(\mathbf{x}_n)$ converges to the first component of $f(\mathbf{a})$. In other words: $f_1(\mathbf{x}_n) \to f_1(\mathbf{a})$.

Combining the two preceding propositions, we have

Theorem 1.16. Suppose $f : \Omega \to \mathbb{R}^2$. Then $f$ is continuous at $\mathbf{a}$ if and only if the component functions $f_1 : \Omega \to \mathbb{R}$ and $f_2 : \Omega \to \mathbb{R}$ are both continuous at $\mathbf{a}$.

Exercise: Suppose $f : \mathbb{R}^2 \to \mathbb{R}^2$ is given by
$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^2 - y^2 \\ 2xy \end{pmatrix}.$$
Show that $f$ is continuous on $\mathbb{R}^2$.

The next theorems imply that the set of functions continuous at $\mathbf{a}$ is a vector space:

Theorem 1.17. Suppose $f, g : \Omega \to \mathbb{R}$ are both continuous at $\mathbf{a}$. Then so too are the functions $f + g$, $f - g$, $f \cdot g$, and $\frac{f}{g}$ provided $g(\mathbf{a}) \ne 0$.

Theorem 1.18. Suppose $f_1, f_2 : \Omega \to \mathbb{R}^2$ are both continuous at $\mathbf{a}$. Then so too are $f_1 + f_2$ and $cf_1$ for any $c \in \mathbb{R}$.

Exercise: In the second theorem above, why don't we make any claims about $f_1 \cdot f_2$ or $\frac{f_1}{f_2}$?

1.3 Cauchy Sequences in $\mathbb{R}^2$

Finally, we can define what it means for a sequence in $\mathbb{R}^2$ to be Cauchy:

Definition 1.19. A sequence $\mathbf{x}_n$ in $\mathbb{R}^2$ is Cauchy with respect to $\|\cdot\|_i$ if for every $\varepsilon > 0$, there is an $N$ such that whenever $n > N$ and $m > N$, then $\|\mathbf{x}_n - \mathbf{x}_m\|_i < \varepsilon$.

Notice that in principle, a sequence could be Cauchy with respect to $\|\cdot\|_2$, but not with respect to $\|\cdot\|_1$. Fortunately, this doesn't happen in $\mathbb{R}^2$, which is a consequence of Lemma 1.6.

Exercise: Prove that a sequence $\mathbf{x}_n$ is Cauchy with respect to $\|\cdot\|_2$ if and only if $\mathbf{x}_n$ is Cauchy with respect to $\|\cdot\|_1$.

With this in mind, in the future we will say only that $\mathbf{x}_n$ is Cauchy, and not bother to specify which norm we're using - since the preceding exercise shows that it doesn't matter which norm!

Exercise: Suppose $\mathbf{x}_n \to \mathbf{x}$ in $\|\cdot\|_2$. Show that $\mathbf{x}_n$ is a Cauchy sequence in $\mathbb{R}^2$ with respect to $\|\cdot\|_2$. (Hint: go back and look at the corresponding statement and its proof for $\mathbb{R}$, and change the measures of distance!)

As in $\mathbb{R}$, it is important to know if a Cauchy sequence in $\mathbb{R}^2$ converges.

Theorem 1.20. Suppose $\mathbf{x}_n$ is Cauchy with respect to $\|\cdot\|_2$. Then there is an $\mathbf{x} \in \mathbb{R}^2$ such that $\mathbf{x}_n \to \mathbf{x}$ in $\|\cdot\|_2$.

Proof. Our proof will have two major components: finding a candidate for $\mathbf{x}$, and then showing $\mathbf{x}_n \to \mathbf{x}$. For finding a candidate, suppose $\mathbf{x}_n = (x_n, y_n)$. Notice that for any $\varepsilon > 0$, there is an $N$ such that
$$\|\mathbf{x}_n - \mathbf{x}_m\|_2 < \varepsilon \quad \text{whenever } n > N \text{ and } m > N.$$
Thus, we will have
$$|x_n - x_m| \le \|\mathbf{x}_n - \mathbf{x}_m\|_2 < \varepsilon \quad \text{whenever } n > N \text{ and } m > N,$$
which means that the sequence $x_n$ of first components is a Cauchy sequence in $\mathbb{R}$! Therefore, there is an $x \in \mathbb{R}$ such that $x_n \to x$ in $\mathbb{R}$. The same argument implies there is a $y \in \mathbb{R}$ such that $y_n \to y$ in $\mathbb{R}$. Let $\mathbf{x} := (x, y)$. This is our candidate for $\mathbf{x}$. We next need to show that $\mathbf{x}_n$ does in fact converge in $\|\cdot\|_2$ to $\mathbf{x}$.

Let $\varepsilon > 0$ be given. Since $x_n \to x$ and $y_n \to y$ in $\mathbb{R}$, there exist $N_x$ and $N_y$ such that
$$|x_n - x| < \frac{\varepsilon}{\sqrt{2}} \quad \text{whenever } n > N_x \qquad \text{and} \qquad |y_n - y| < \frac{\varepsilon}{\sqrt{2}} \quad \text{whenever } n > N_y.$$
Let $N := \max\{N_x, N_y\}$, and suppose $n > N$. We then have
$$\|\mathbf{x}_n - \mathbf{x}\|_2 = \sqrt{(x_n - x)^2 + (y_n - y)^2} < \sqrt{\frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2}} = \varepsilon,$$
since $n > N_x$ and $n > N_y$.

In the previous proof, notice that the heavy lifting (the existence of limits for the sequences of components) was already done by the work we did for the analogous statement in $\mathbb{R}$! Thus, the fact that Cauchy sequences in $\mathbb{R}^2$ converge is a consequence of the same fact in $\mathbb{R}$!

1.4 Topology on $\mathbb{R}^2$

Before we define the various versions of closed, open and compact, we define bounded subsets and sequences in $\mathbb{R}^2$.

Definition 1.21. Suppose $B$ is a non-empty subset of $\mathbb{R}^2$.

1. $B$ is bounded means there is an $M$ such that $\|\mathbf{x}\|_i \le M$ for all $\mathbf{x} \in B$ (where $i$ could be 1, 2 or $\infty$).

2. A sequence $\mathbf{x}_n$ in $\mathbb{R}^2$ is bounded if the set $\{\mathbf{x}_n : n \in \mathbb{N}\}$ is bounded.

Proposition 1.22. Suppose xn is convergent. Then xn is bounded.

Proof. Since $\mathbf{x}_n$ converges to some $\mathbf{x} \in \mathbb{R}^2$, there is an $N$ such that
$$\|\mathbf{x}_n - \mathbf{x}\|_1 < 1 \quad \text{whenever } n > N.$$
In particular, we will have
$$\|\mathbf{x}_n\|_1 \le \|\mathbf{x}_n - \mathbf{x}\|_1 + \|\mathbf{x}\|_1 < 1 + \|\mathbf{x}\|_1 \quad \text{whenever } n > N.$$
Thus, $\{\mathbf{x}_n : n > N\}$ is bounded. Since $\{\mathbf{x}_n : n \in \mathbb{N}\}$ contains only finitely many additional elements, we know $\mathbf{x}_n$ is bounded.

Notice: it doesn't matter which norm we use when we say that a set is bounded, since Lemma 1.6 implies that a set is bounded in $\mathbb{R}^2$ with respect to one norm if and only if it is bounded with respect to another. Note: the bound itself (the value of $M$) may change if you change the norm!

Corollary 1.23. Suppose xn is Cauchy. Then xn is bounded.

Proof. By Theorem 1.20, we know that $\mathbf{x}_n$ converges, and so by the preceding proposition, $\mathbf{x}_n$ must be bounded.

We next prove an $\mathbb{R}^2$ version of the Bolzano-Weierstrass Theorem. Notice that its statement is the same as the version in $\mathbb{R}$ - but its proof will be much easier, since we are able to appeal to the $\mathbb{R}$ version!

Theorem 1.24. Suppose $\mathbf{x}_n$ is a bounded sequence in $\mathbb{R}^2$. Then there exists a subsequence $\mathbf{x}_{n_l}$ that converges to some $\mathbf{x} \in \mathbb{R}^2$.

Proof. By assumption, there is an $M$ such that $\|\mathbf{x}_n\|_2 \le M$ for all $n \in \mathbb{N}$. If we write $\mathbf{x}_n = (x_n, y_n)$, then notice that
$$|x_n| \le \|\mathbf{x}_n\|_2 \le M \quad \text{for all } n \in \mathbb{N}.$$
Thus, the sequence of real numbers $x_n$ is bounded. Therefore, by the Bolzano-Weierstrass Theorem in $\mathbb{R}$, there is a subsequence $x_{n_j}$ of $x_n$ that converges to some $x \in \mathbb{R}$. Now, notice that
$$|y_{n_j}| \le \|\mathbf{x}_{n_j}\|_2 \le M \quad \text{for all } j \in \mathbb{N},$$
which implies that the sequence of real numbers $y_{n_j}$ is bounded. Applying the Bolzano-Weierstrass Theorem to $y_{n_j}$, there is a subsequence $y_{n_{j_k}}$ that converges to some $y \in \mathbb{R}$. We now consider the subsequence $\mathbf{x}_{n_{j_k}}$. Notice that $x_{n_{j_k}} \to x$, since $x_{n_{j_k}}$ is a subsequence of the convergent sequence $x_{n_j}$. Therefore, both components of $\mathbf{x}_{n_{j_k}}$ converge to the components of $\mathbf{x} := (x, y)$, and so by Proposition 1.8, $\mathbf{x}_{n_{j_k}} \to \mathbf{x}$.

We can give the (almost) exact same definitions of closed, open, and compact as we did for subsets of $\mathbb{R}$. (As a matter of fact, to write the following definition, I just copied and pasted, and made the appropriate changes!)

Definition 1.25. Suppose B ⊆ A ⊆ R2.

1. B is closed in A if whenever xn is a sequence in B and xn → x and x ∈ A, it follows that x ∈ B.

2. $B$ is closed means that $B$ is closed in $\mathbb{R}^2$.

3. $B$ is open in $A$ if $B^c \cap A$ is closed in $A$.

4. B is open means that B is open in R2.

5. B is compact if whenever xn is a sequence in B, there is a subsequence xnj which converges to some element in B.

Proposition 1.26. Suppose $B \subseteq \mathbb{R}^2$ is compact. The following are all true:

1. For any set $A$ with $B \subseteq A$, $B$ is closed in $A$.

2. For any set $C$ with $C \subseteq B$, if $C$ is closed in $B$, then $C$ is compact.

Proof. (1.) Suppose $\mathbf{x}_n$ is a sequence in $B$ such that $\mathbf{x}_n \to \mathbf{x}$ for some $\mathbf{x} \in A$. We need to show that $\mathbf{x} \in B$. Since $B$ is compact, there is a subsequence $\mathbf{x}_{n_j}$ that converges to some $\mathbf{y} \in B$. Since $\mathbf{x}_n \to \mathbf{x}$, we may use the same argument as in $\mathbb{R}$ to show that any subsequence of $\mathbf{x}_n$ converges to $\mathbf{x}$. Thus, $\mathbf{x} = \mathbf{y}$, and so $\mathbf{x} \in B$.

(2.) Suppose $\mathbf{x}_n$ is a sequence in $C$. Since $C \subseteq B$ and $B$ is compact, there must be a subsequence $\mathbf{x}_{n_j}$ that converges to some $\mathbf{x} \in B$. We need to show that, in fact, $\mathbf{x} \in C$. Since $C$ is closed in $B$, the limit of a sequence in $C$ that converges in $B$ must belong to $C$, i.e. $\mathbf{x} \in C$, as desired.

The next lemma is the $\mathbb{R}^2$ version of the corresponding statement in $\mathbb{R}$ - the only thing that changes is that we use $\mathbf{x}$ instead of $x$!

Lemma 1.27. Suppose B is compact. Then B is bounded.

Proof. We show the contrapositive: if $B$ is not bounded, then $B$ is not compact. That means we need to find a sequence in $B$ that has no convergent subsequence. Notice: for each $n \in \mathbb{N}$, there must be an $\mathbf{x}_n \in B$ with $\|\mathbf{x}_n\| \ge n$. (Why?) Furthermore, there can be no convergent subsequence, since any such subsequence would have to be bounded . . . which is impossible!

The next theorem details how continuous functions behave with respect to closed and compact sets. These are the same statements as before!

Theorem 1.28. Suppose $\emptyset \ne A \subseteq \mathbb{R}^2$, and suppose $f : A \to \mathbb{R}$ is continuous on $A$.

1. If $A$ is compact, so too is $f(A)$. (The continuous image of a compact set is compact.)

2. If $K$ is closed, then $f^{-1}(K) := \{\mathbf{x} \in A : f(\mathbf{x}) \in K\}$ is closed in $A$. (The continuous pre-image of a closed set is closed.)

3. If $K$ is open, then $f^{-1}(K)$ is open in $A$.

Proof. (1.) To show that $f(A)$ is compact, we need to show that if $y_n$ is a sequence in $f(A)$, then $y_n$ has a subsequence that converges to some element of $f(A)$. By definition of $f(A)$, there is a sequence $\mathbf{x}_n$ in $A$ such that $f(\mathbf{x}_n) = y_n$. Since $A$ is compact, there is a subsequence $\mathbf{x}_{n_j}$ that converges to some $\mathbf{x} \in A$. Since $f$ is continuous, we know that $y_{n_j} = f(\mathbf{x}_{n_j}) \to f(\mathbf{x}) =: y$. Since $\mathbf{x} \in A$, $y \in f(A)$. Therefore, $f(A)$ is compact.

(2.) To show that $f^{-1}(K)$ is closed in $A$, suppose $\mathbf{x}_n$ is a sequence in $f^{-1}(K)$ and $\mathbf{x}_n \to \mathbf{x} \in A$. We need to show that in fact $\mathbf{x} \in f^{-1}(K)$, or equivalently, $f(\mathbf{x}) \in K$. Let $y_n := f(\mathbf{x}_n)$ and $y := f(\mathbf{x})$, and note that $y_n \in K$. Since $\mathbf{x}_n \to \mathbf{x}$, we know that $y_n \to y$. Since $K$ is closed, $y \in K$, i.e. $f(\mathbf{x}) \in K$, as desired.

(3.) To show that $f^{-1}(K)$ is open in $A$, we need to show that $f^{-1}(K)^c \cap A$ is closed in $A$. Thus, suppose $\mathbf{x}_n$ is a sequence in $f^{-1}(K)^c \cap A$ that converges to some $\mathbf{x} \in A$. We need to show that $\mathbf{x} \in f^{-1}(K)^c \cap A$, i.e. that $f(\mathbf{x}) \notin K$. Since $\mathbf{x}_n \in f^{-1}(K)^c$, we know that $f(\mathbf{x}_n) \notin K$ for all $n$, or equivalently $f(\mathbf{x}_n) \in K^c$ for all $n$. Since $f$ is continuous on $A$, $f(\mathbf{x}_n) \to f(\mathbf{x})$. Because $K$ is open, $K^c$ is closed. Thus, $f(\mathbf{x}) \in K^c$, which means that $f(\mathbf{x}) \notin K$, as desired.

We can now show that continuous functions on compact subsets of $\mathbb{R}^2$ have a minimum and maximum:

Theorem 1.29. Suppose $A \subseteq \mathbb{R}^2$ is non-empty and compact. If $f : A \to \mathbb{R}$ is continuous, then there exist $\mathbf{a}, \mathbf{b} \in A$ such that $f(\mathbf{a}) \le f(\mathbf{x}) \le f(\mathbf{b})$ for all $\mathbf{x} \in A$.

Proof. We show the existence of an appropriate $\mathbf{a}$. The proof for the existence of an appropriate $\mathbf{b}$ is left to you. By Theorem 1.28, $f(A) = \{y \in \mathbb{R} : y = f(\mathbf{x}) \text{ for some } \mathbf{x} \in A\}$ is compact, and so Lemma 1.27 implies that $f(A)$ is bounded. Thus, $\inf f(A)$ exists. Now, there is a sequence $y_n \in f(A)$ such that $y_n \to \inf f(A)$. By definition of $f(A)$, $y_n = f(\mathbf{x}_n)$ for some $\mathbf{x}_n \in A$. Since $A$ is compact, there is a subsequence $\mathbf{x}_{n_j}$ that converges to some $\mathbf{a} \in A$. We now show that $f(\mathbf{a}) = \inf f(A)$. Notice that $f(\mathbf{x}_{n_j}) = y_{n_j} \to \inf f(A)$ (since $y_{n_j}$ has the same limit as $y_n$). By continuity, we also have $f(\mathbf{x}_{n_j}) \to f(\mathbf{a})$. Therefore, $f(\mathbf{a}) = \inf f(A)$.

Proposition 1.30. Suppose $A \subseteq \mathbb{R}^2$ is closed, $f : A \to \mathbb{R}$ is continuous, and suppose there is an $m$ such that $m \le f(\mathbf{x})$ for all $\mathbf{x} \in A$. If there is a bounded sequence $\mathbf{x}_n$ such that $f(\mathbf{x}_n) \to \inf f(A)$, then there is an $\mathbf{x}^\star$ such that $f(\mathbf{x}^\star) = \inf f(A)$.

Proof. Because there is an $m$ such that $m \le f(\mathbf{x})$ for all $\mathbf{x} \in A$, $\inf f(A)$ exists. Because $\mathbf{x}_n$ is bounded, the Bolzano-Weierstrass Theorem implies there is a subsequence $\mathbf{x}_{n_j}$ that converges to some $\mathbf{x}^\star \in \mathbb{R}^2$. Since $A$ is closed, $\mathbf{x}^\star \in A$. By continuity, $f(\mathbf{x}^\star) = \lim_{j\to\infty} f(\mathbf{x}_{n_j}) = \lim_{n\to\infty} f(\mathbf{x}_n) = \inf f(A)$.

2 Norms on $\mathbb{R}^d$

In the previous section, we considered several different ways of measuring size and distance in $\mathbb{R}^2$. However, there is nothing magical about $\mathbb{R}^2$, and so we can proceed more generally. Throughout, we will make a small change in notation: we will just use $x$ instead of $\mathbf{x}$ - you should bear in mind that $x$ is a $d$-tuple of real numbers: $x = (x_1, x_2, \ldots, x_d)$.

Definition 2.1. A norm on $\mathbb{R}^d$ is a function $\|\cdot\| : \mathbb{R}^d \to \mathbb{R}$, $\|\cdot\| : x \mapsto \|x\|$, that satisfies

1. $\|u\| > 0$ for all $u \in \mathbb{R}^d \setminus \{0\}$.

2. $\|au\| = |a|\,\|u\|$ for all $u \in \mathbb{R}^d$, $a \in \mathbb{R}$.

3. $\|u + v\| \le \|u\| + \|v\|$ for all $u, v \in \mathbb{R}^d$.

There are lots of norms on $\mathbb{R}^d$. Three of the most popular are:

Definition 2.2.
$$\|x\|_2 = \sqrt{\sum_{j=1}^{d} x_j^2}, \qquad \|x\|_1 = \sum_{j=1}^{d} |x_j|, \qquad \|x\|_\infty = \max\{|x_j| : j = 1, 2, \ldots, d\}.$$

Showing that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are norms is straightforward, because of the properties of the absolute value in $\mathbb{R}$.

Exercise: Show that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are norms on $\mathbb{R}^d$.

2.1 Inner Products

Showing that $\|\cdot\|_2$ is a norm on $\mathbb{R}^d$ is more difficult. To show this, we need the following definition:

Definition 2.3. An inner product on $\mathbb{R}^d$ is a function $\langle \cdot, \cdot \rangle : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ such that

1. (Symmetry) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^d$

2. (Linearity) $\langle a_1 u + a_2 v, y \rangle = a_1 \langle u, y \rangle + a_2 \langle v, y \rangle$ for all $a_1, a_2 \in \mathbb{R}$ and all $u, v, y \in \mathbb{R}^d$

3. (Positivity) $\langle x, x \rangle > 0$ for all $x \in \mathbb{R}^d \setminus \{0\}$

Example 2.4. If $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$, then the standard dot product is an inner product: $\langle x, y \rangle := x \cdot y = \sum x_i y_i$. There are many others, as we will later see. In addition, note that $\sqrt{x \cdot x} = \|x\|_2$!

Notice: given an inner product, $\sqrt{\langle x, x \rangle} > 0$ whenever $x \ne 0$, and so $\sqrt{\langle x, x \rangle}$ is a measure of the size of $x$. As we will see, $\sqrt{\langle x, x \rangle}$ defines a norm on $\mathbb{R}^d$. The most difficult thing to show is that $\sqrt{\langle x, x \rangle}$ satisfies the triangle inequality. This will follow from the following fundamentally important inequality:

Theorem 2.5 (Cauchy-Schwarz-Bunyakovsky Inequality). Suppose $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^d$. Then for any $x, y \in \mathbb{R}^d$
$$\langle x, y \rangle \le \sqrt{\langle x, x \rangle} \cdot \sqrt{\langle y, y \rangle}.$$

Proof. For fixed $x, y \in \mathbb{R}^d$, consider the function $f : t \mapsto \langle x + ty, x + ty \rangle$. By the linearity and symmetry of $\langle \cdot, \cdot \rangle$, this means that
$$f(t) = \langle x, x \rangle + 2t\langle x, y \rangle + t^2 \langle y, y \rangle,$$
so $f$ is a quadratic function of $t$. Note also that $f(t) \ge 0$ for all $t \in \mathbb{R}$. That means there is at most one solution of the quadratic equation $f(t) = 0$, and thus the discriminant satisfies
$$4\langle x, y \rangle^2 - 4\langle x, x \rangle \cdot \langle y, y \rangle \le 0.$$
Therefore, we must have $\langle x, y \rangle^2 \le \langle x, x \rangle \cdot \langle y, y \rangle$, which implies our claim by taking square roots.

We can now show

Theorem 2.6. Suppose $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^d$. Then $x \mapsto \sqrt{\langle x, x \rangle}$ defines a norm on $\mathbb{R}^d$, referred to as the norm induced by the inner product.

Proof. Note that the only difficulty is showing the triangle inequality. Suppose then that $x, y \in \mathbb{R}^d$ are arbitrary. We need to show that
$$\sqrt{\langle x + y, x + y \rangle} \le \sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle}.$$

Note that by the linearity and symmetry of the inner product, we have

$$\langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle,$$
and by the CSB inequality, we then have
$$\langle x + y, x + y \rangle \le \langle x, x \rangle + 2\sqrt{\langle x, x \rangle} \cdot \sqrt{\langle y, y \rangle} + \langle y, y \rangle = \left(\sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle}\right)^2.$$
Taking square roots yields the desired inequality.

A word of caution: not every norm comes from an inner product!
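As a sanity check on the CSB inequality for the standard dot product, here is a minimal numerical sketch (assuming NumPy; the random vectors are illustrative, and the check is of course not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)

# Verify <x, y> <= sqrt(<x, x>) * sqrt(<y, y>) for the standard dot product on R^d.
for _ in range(1000):
    d = int(rng.integers(1, 10))
    x, y = rng.normal(size=d), rng.normal(size=d)
    lhs = np.dot(x, y)
    rhs = np.sqrt(np.dot(x, x)) * np.sqrt(np.dot(y, y))
    assert lhs <= rhs + 1e-12

print("The Cauchy-Schwarz-Bunyakovsky inequality held on every sample.")
```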

2.2 Convergence in $\mathbb{R}^d$

Once we have a way of measuring distance, we can define convergence:

Definition 2.7. Suppose $x_n$ is a sequence in $\mathbb{R}^d$ and $x \in \mathbb{R}^d$. We say that $x_n$ converges to $x$ with respect to $\|\cdot\|_i$ ($i = 1, 2$ or $\infty$) and write either $x_n \to x$ or $\lim_{n\to\infty} x_n = x$ if for every $\varepsilon > 0$, there is an $N$ such that for all $n \in \mathbb{N}$, if $n > N$, then $\|x_n - x\|_i < \varepsilon$.

Just as in $\mathbb{R}^2$, it looks like we could perhaps have a sequence that converges with respect to one of our distances, but not with respect to another. As it turns out, just as in $\mathbb{R}^2$, this doesn't happen in $\mathbb{R}^d$.

Lemma 2.8. There are positive constants $C$, $K$ and $M$ such that for all $x \in \mathbb{R}^d$
$$\frac{1}{C}\|x\|_1 \le \|x\|_2 \le C\|x\|_1, \qquad \frac{1}{K}\|x\|_\infty \le \|x\|_2 \le K\|x\|_\infty, \qquad \frac{1}{M}\|x\|_1 \le \|x\|_\infty \le M\|x\|_1.$$

Proof. We prove only the first. Suppose $x = (x_1, x_2, \ldots, x_d)$. By definition, $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_d|$ and $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_d^2}$. We clearly have
$$|x_i| \le \|x\|_2 \quad \text{for } i = 1, 2, \ldots, d.$$
Adding these inequalities yields $\|x\|_1 \le d\|x\|_2$, and so $\frac{1}{d}\|x\|_1 \le \|x\|_2$. It remains only to show that $\|x\|_2 \le d\|x\|_1$. Notice that
$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_d^2} \le \sqrt{(|x_1| + |x_2| + \cdots + |x_d|)^2} = \|x\|_1 \le d\|x\|_1,$$
as desired. (So the first pair of inequalities holds with $C = d$.)

Using the previous lemma, we can show the following:

Proposition 2.9. $x_n \to x$ with respect to $\|\cdot\|_2$ if and only if $x_n \to x$ with respect to $\|\cdot\|_1$.

Proof. We first show that if $x_n \to x$ with respect to $\|\cdot\|_2$, then $x_n \to x$ with respect to $\|\cdot\|_1$. Let $\varepsilon > 0$ be given. Since $x_n \to x$ with respect to $\|\cdot\|_2$, there is an $N$ such that
$$\|x_n - x\|_2 < \frac{\varepsilon}{C} \quad \text{whenever } n > N.$$
Suppose now that $n > N$ is arbitrary. By Lemma 2.8, we then have
$$\frac{1}{C}\|x_n - x\|_1 \le \|x_n - x\|_2 < \frac{\varepsilon}{C}, \quad \text{and so} \quad \|x_n - x\|_1 < \varepsilon.$$
Thus, the $N$ from the convergence of $x_n$ to $x$ with respect to $\|\cdot\|_2$ also works to show convergence of $x_n$ to $x$ with respect to $\|\cdot\|_1$. The other direction is left to you.

The preceding proposition suggests that in Rd, it doesn’t matter which measure of size we use - so long as it satisfies (1-3) of Definition 2.1. Thus, in the future, we merely say xn → x without specifying the norm. The other advantage of this is that we get the following:

Proposition 2.10. Suppose $x_n = (x_{n,1}, x_{n,2}, \ldots, x_{n,d})$ and $x = (x_1, x_2, \ldots, x_d)$. Then $x_n \to x$ if and only if $x_{n,i} \to x_i$ for each $i = 1, 2, \ldots, d$.

This proposition says that to figure out if a sequence converges in $\mathbb{R}^d$, we need only look at the behavior of the components. This is a very nice property, since we can then use all of our convergence properties in $\mathbb{R}$.

Proof. Suppose $x_n \to x$. We need to show that $x_{n,i} \to x_i$ for each $i = 1, 2, \ldots, d$. Since Proposition 2.9 implies it doesn't matter what size we use in $\mathbb{R}^d$, we use $\|\cdot\|_1$. Notice that
$$|x_{n,i} - x_i| \le \sum_{j=1}^{d} |x_{n,j} - x_j| = \|x_n - x\|_1 \quad \text{for all } n \in \mathbb{N}.$$

Suppose now i is fixed, and suppose ε > 0. Since xn → x, there is an N1 such that

$$\|x_n - x\|_1 < \varepsilon \quad \text{whenever } n > N_1.$$
By the preceding inequality, we then have
$$|x_{n,i} - x_i| \le \|x_n - x\|_1 < \varepsilon \quad \text{whenever } n > N_1.$$
Thus, the $N_1$ that works for $x_n$ also works for $x_{n,i}$. Thus, $x_n \to x$ implies that $x_{n,i} \to x_i$ for every $i = 1, 2, \ldots, d$.

We next suppose that $x_{n,i} \to x_i$ for each $i = 1, 2, \ldots, d$, and show that $x_n \to x$ in $\|\cdot\|_1$. Let $\varepsilon > 0$ be given. By assumption, there exist $N_1, N_2, \ldots, N_d$ such that
$$|x_{n,i} - x_i| < \frac{\varepsilon}{d} \quad \text{whenever } n > N_i.$$
Let $N := \max\{N_1, N_2, \ldots, N_d\}$. Then we will have
$$\|x_n - x\|_1 = \sum_{j=1}^{d} |x_{n,j} - x_j| < \sum_{j=1}^{d} \frac{\varepsilon}{d} = \varepsilon \quad \text{whenever } n > N.$$
Thus, $x_n \to x$ in $\|\cdot\|_1$.

Exercise: Give two important reasons that we are working in finite dimensions in the previous proof.

The preceding implies the following, whose proof is left to you:

Proposition 2.11. Suppose $x_n \to x$ and $y_n \to y$ in $\mathbb{R}^d$, and suppose $a_n \to a$ in $\mathbb{R}$. Then

1. $x_n + y_n \to x + y$

2. $a_n x_n \to ax$.

2.3 Continuity in $\mathbb{R}^d$

Notice that once we have convergence of sequences in $\mathbb{R}^d$, we can define continuity for functions. Since we now have the idea of convergence for any $\mathbb{R}^d$, we simply define continuity for functions $f : \Omega \to \mathbb{R}^k$ for $\Omega \subseteq \mathbb{R}^d$.

Definition 2.12. Suppose $\Omega \subseteq \mathbb{R}^d$. Then

1. $f : \Omega \to \mathbb{R}^k$ is continuous at $a \in \Omega$ if $f(x_n) \to f(a)$ (with respect to a norm on $\mathbb{R}^k$) for every sequence $x_n$ in $\Omega$ that converges to $a$ (with respect to a norm on $\mathbb{R}^d$).

2. $f : \Omega \to \mathbb{R}^k$ is continuous in $\Omega$ if $f$ is continuous at every point in $\Omega$.

Another way to phrase (1.) above is the following: $f$ is continuous at $a$ if
$$\|x_n - a\|_{\mathbb{R}^d} \to 0 \quad \text{implies} \quad \|f(x_n) - f(a)\|_{\mathbb{R}^k} \to 0.$$

As in $\mathbb{R}^2$, we need to be a little careful with continuity: it may in fact be possible for a function to be continuous with respect to each component, but not jointly. It turns out that determining the continuity of functions $f : \Omega \to \mathbb{R}^k$ is very similar to determining that of functions $f : \Omega \to \mathbb{R}$.

Example 2.13. Let $A$ be a $k \times d$ matrix. Then $f : \mathbb{R}^d \to \mathbb{R}^k$, $f : x \mapsto Ax$ is continuous. To see this, suppose $x_n \to x$. We need to show that $Ax_n \to Ax$. Suppose that
$$x_n = \begin{pmatrix} x_{n,1} \\ x_{n,2} \\ \vdots \\ x_{n,d} \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}.$$
If the $ij^{\text{th}}$ entry of $A$ is $a_{ij}$, then by definition of matrix multiplication, we have
$$Ax_n = \begin{pmatrix} a_{11}x_{n,1} + a_{12}x_{n,2} + \cdots + a_{1d}x_{n,d} \\ a_{21}x_{n,1} + a_{22}x_{n,2} + \cdots + a_{2d}x_{n,d} \\ \vdots \\ a_{k1}x_{n,1} + a_{k2}x_{n,2} + \cdots + a_{kd}x_{n,d} \end{pmatrix} \quad \text{and} \quad Ax = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1d}x_d \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2d}x_d \\ \vdots \\ a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kd}x_d \end{pmatrix}.$$
By Proposition 2.10, to show that $Ax_n \to Ax$, it suffices to show that for each $i = 1, 2, \ldots, k$ the $i^{\text{th}}$ component of $Ax_n$ converges to the $i^{\text{th}}$ component of $Ax$, i.e.,
$$a_{i1}x_{n,1} + a_{i2}x_{n,2} + \cdots + a_{id}x_{n,d} \to a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{id}x_d.$$
By Proposition 2.10, $x_n \to x$ implies that $x_{n,i} \to x_i$ for $i = 1, 2, \ldots, d$, and so our rules for limits in $\mathbb{R}$ then imply the claim.

Notice that in the preceding example, we really only needed the component functions to be continuous on $\mathbb{R}^d$. As it turns out, this is true in general, as we now prove in two parts.

Proposition 2.14. Suppose $\Omega \subseteq \mathbb{R}^d$ and suppose $f_i : \Omega \to \mathbb{R}$ is continuous at $a \in \Omega$ for each $i = 1, 2, \ldots, k$. Then $f : \Omega \to \mathbb{R}^k$, $f : x \mapsto \begin{pmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_k(x) \end{pmatrix}$, is continuous at $a \in \Omega$.

Proof. Suppose $x_n$ is a sequence in $\Omega$ and suppose $x_n \to a$. We need to show that $f(x_n) \to f(a)$. By definition, $f(x_n) = \begin{pmatrix} f_1(x_n) \\ f_2(x_n) \\ \vdots \\ f_k(x_n) \end{pmatrix}$ and $f(a) = \begin{pmatrix} f_1(a) \\ f_2(a) \\ \vdots \\ f_k(a) \end{pmatrix}$. Thus, to show that $f(x_n) \to f(a)$, Proposition 2.10 implies that it is sufficient to show that $f_i(x_n) \to f_i(a)$ for each $i = 1, 2, \ldots, k$. But that follows from the continuity of $f_i$ at $a$.

Proposition 2.15. Suppose $\Omega \subseteq \mathbb{R}^d$ and suppose $f : \Omega \to \mathbb{R}^k$ is continuous at $a$. If $f_i : \Omega \to \mathbb{R}$ is the $i^{\text{th}}$ component of $f$, then $f_i$ is continuous at $a$.

Proof. We will show that $f_i$ is continuous at $a$. Suppose then that $x_n \to a$. By assumption, $f(x_n) \to f(a)$. Proposition 2.10 then implies that the $i^{\text{th}}$ component of the vector $f(x_n)$ converges to the $i^{\text{th}}$ component of $f(a)$. In other words: $f_i(x_n) \to f_i(a)$.

Combining the two preceding propositions, we have

Theorem 2.16. Suppose $\Omega \subseteq \mathbb{R}^d$ and suppose $f : \Omega \to \mathbb{R}^k$. Then $f$ is continuous at $a$ if and only if the component functions $f_i : \Omega \to \mathbb{R}$ are continuous at $a$.

Exercise: Suppose $f : \mathbb{R}^2 \to \mathbb{R}^3$ is given by
$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^3 - y \\ \dfrac{3x + 2}{y^2 + 1} \\ 2xy \end{pmatrix}.$$
Show that $f$ is continuous on $\mathbb{R}^2$.

The next theorems imply that the set of functions continuous at $a$ is a vector space:

Theorem 2.17. Suppose $f, g : \Omega \to \mathbb{R}$ are both continuous at $a$. Then so too are the functions $f + g$, $f - g$, $f \cdot g$, and $\frac{f}{g}$ provided $g(a) \ne 0$.

Theorem 2.18. Suppose $f_1, f_2 : \Omega \to \mathbb{R}^k$ are both continuous at $a$. Then so too are $f_1 + f_2$ and $cf_1$ for any $c \in \mathbb{R}$.

Exercise: In the second theorem above, why don't we make any claims about $f_1 \cdot f_2$ or $\frac{f_1}{f_2}$?

2.4 Cauchy Sequences in $\mathbb{R}^d$

Finally, we can define what it means for a sequence in $\mathbb{R}^d$ to be Cauchy:

Definition 2.19. A sequence $x_n$ in $\mathbb{R}^d$ is Cauchy with respect to $\|\cdot\|_i$ if for every $\varepsilon > 0$, there is an $N$ such that whenever $n > N$ and $m > N$, then $\|x_n - x_m\|_i < \varepsilon$.

Notice that in principle, a sequence could be Cauchy with respect to $\|\cdot\|_2$, but not with respect to $\|\cdot\|_1$. Fortunately, this doesn't happen in $\mathbb{R}^d$, which is a consequence of Lemma 2.8.

Exercise: Prove that a sequence $x_n$ is Cauchy with respect to $\|\cdot\|_2$ if and only if $x_n$ is Cauchy with respect to $\|\cdot\|_1$.

Exercise: Suppose $x_n \to x$ in $\|\cdot\|_2$. Show that $x_n$ is a Cauchy sequence in $\mathbb{R}^d$ with respect to $\|\cdot\|_2$. (Hint: go back and look at the corresponding statement and its proof for $\mathbb{R}$, and change the measures of distance!)

As in $\mathbb{R}$, it is important to know if a Cauchy sequence in $\mathbb{R}^d$ converges.

Theorem 2.20. Suppose $x_n$ is Cauchy with respect to $\|\cdot\|_2$. Then there is an $x \in \mathbb{R}^d$ such that $x_n \to x$ in $\|\cdot\|_2$.

Proof. Our proof will have two major components: finding a candidate for $x$, and then showing $x_n \to x$. For finding a candidate, suppose $x_n = (x_{n,1}, x_{n,2}, \ldots, x_{n,d})$. Notice that for any $\varepsilon > 0$, there is an $N$ such that
$$\|x_n - x_m\|_2 < \varepsilon \quad \text{whenever } n > N \text{ and } m > N.$$
Thus, for each $i = 1, 2, \ldots, d$, we will have
$$|x_{n,i} - x_{m,i}| \le \|x_n - x_m\|_2 < \varepsilon \quad \text{whenever } n > N \text{ and } m > N,$$
which means that the sequence $x_{n,i}$ of $i^{\text{th}}$ components is a Cauchy sequence in $\mathbb{R}$! Therefore, for each $i = 1, 2, \ldots, d$, there is an $x_i \in \mathbb{R}$ such that $x_{n,i} \to x_i$ in $\mathbb{R}$. Let $x := (x_1, x_2, \ldots, x_d)$. This is our candidate for $x$. We next need to show that $x_n$ does in fact converge in $\|\cdot\|_2$ to $x$.

Let $\varepsilon > 0$ be given. Since for each $i$, $x_{n,i} \to x_i$ in $\mathbb{R}$, there exists an $N_i$ such that
$$|x_{n,i} - x_i| < \frac{\varepsilon}{\sqrt{d}} \quad \text{whenever } n > N_i.$$
Let $N := \max\{N_1, N_2, \ldots, N_d\}$, and suppose $n > N$. We then have
$$\|x_n - x\|_2 = \sqrt{\sum_{j=1}^{d} (x_{n,j} - x_j)^2} < \sqrt{\sum_{j=1}^{d} \frac{\varepsilon^2}{d}} = \varepsilon,$$
since $n > N_i$ for each $i$.

In the previous proof, notice that the heavy lifting (the existence of limits for the sequences of components) was already done by the work we did for the analogous statement in $\mathbb{R}$! Thus, the fact that Cauchy sequences in $\mathbb{R}^d$ converge is a consequence of the same fact in $\mathbb{R}$!

2.5 Topology on $\mathbb{R}^d$

Before we define the various versions of closed, open and compact, we define bounded subsets and sequences in $\mathbb{R}^d$.

Definition 2.21. Suppose $B$ is a non-empty subset of $\mathbb{R}^d$.

1. $B$ is bounded means there is an $M$ such that $\|x\|_i \le M$ for all $x \in B$ (where $i$ could be 1, 2 or $\infty$).

2. A sequence $x_n$ in $\mathbb{R}^d$ is bounded if the set $\{x_n : n \in \mathbb{N}\}$ is bounded.

Proposition 2.22. Suppose $x_n$ is convergent. Then $x_n$ is bounded.

Exercise: Prove this proposition. (Hint: look up the same proposition in $\mathbb{R}^2$.)

Notice: it doesn't matter which norm we use when we say that a set is bounded, since Lemma 2.8 implies that a set is bounded in $\mathbb{R}^d$ with respect to one norm if and only if it is bounded with respect to another. Note: the bound itself (the value of $M$) may change if you change the norm!

Corollary 2.23. Suppose xn is Cauchy. Then xn is bounded.

Proof. By Theorem 2.20, we know that xn converges, and so by the preceding proposition, xn must be bounded.

We next prove an $\mathbb{R}^d$ version of the Bolzano-Weierstrass Theorem. Notice that its statement is the same as the version in $\mathbb{R}$ - but its proof will be much easier, since we are able to appeal to the $\mathbb{R}$ version!

Theorem 2.24. Suppose $x_n$ is a bounded sequence in $\mathbb{R}^d$. Then there exists a subsequence $x_{n_l}$ that converges to some $x \in \mathbb{R}^d$.

Proof. By assumption, there is an $M$ such that $\|x_n\|_2 \le M$ for all $n \in \mathbb{N}$. If we write $x_n = (x_{n,1}, x_{n,2}, \ldots, x_{n,d})$, then notice that
$$|x_{n,i}| \le \|x_n\|_2 \le M \quad \text{for all } n \in \mathbb{N}.$$

Thus, for each $i = 1, 2, \ldots, d$, the sequence of $i^{\text{th}}$ components $x_{n,i}$ is bounded. Therefore, by the Bolzano-Weierstrass Theorem in $\mathbb{R}$, there is a subsequence $x_{n_j}$ of $x_n$ such that the first components converge. Since the second components of $x_{n_j}$ are bounded, the Bolzano-Weierstrass Theorem in $\mathbb{R}$ implies there is a subsequence $x_{n_{j_k}}$ of $x_{n_j}$ for which the first two components converge. Continuing on in this fashion ($d$ times in all), there will be a subsequence of a subsequence of a . . . subsequence of $x_n$, call it $x_{m_l}$, such that all components converge. Therefore, if $x$ denotes the vector whose entries are these componentwise limits, Proposition 2.10 implies that $x_{m_l} \to x$.

We can give the (almost) exact same definitions of closed, open, and compact as we did for subsets of $\mathbb{R}^2$. (As a matter of fact, to write the following definition, I just copied and pasted, and made the appropriate changes!)

Definition 2.25. Suppose B ⊆ A ⊆ Rd.

1. B is closed in A if whenever xn is a sequence in B and xn → x and x ∈ A, it follows that x ∈ B.

2. $B$ is closed means that $B$ is closed in $\mathbb{R}^d$.

3. $B$ is open in $A$ if $B^c \cap A$ is closed in $A$.

4. B is open means that B is open in Rd.

5. B is compact if whenever xn is a sequence in B, there is a subsequence xnj which converges to some element in B.

Proposition 2.26. Suppose $B \subseteq \mathbb{R}^d$ is compact. The following are all true:

1. For any set $A$ with $B \subseteq A$, $B$ is closed in $A$.

2. For any set $C$ with $C \subseteq B$, if $C$ is closed in $B$, then $C$ is compact.

Proof. (1.) Suppose $x_n$ is a sequence in $B$ such that $x_n \to x$ for some $x \in A$. We need to show that $x \in B$. Since $B$ is compact, there is a subsequence $x_{n_j}$ that converges to some $y \in B$. Since $x_n \to x$, we may use the same argument as in $\mathbb{R}$ to show that any subsequence of $x_n$ converges to $x$. Thus, $x = y$, and so $x \in B$.

(2.) Suppose $x_n$ is a sequence in $C$. Since $C \subseteq B$ and $B$ is compact, there must be a subsequence $x_{n_j}$ that converges to some $x \in B$. We need to show that, in fact, $x \in C$. Since $C$ is closed in $B$, the limit of a sequence in $C$ that converges in $B$ must belong to $C$, i.e. $x \in C$, as desired.

The next lemma is the $\mathbb{R}^d$ version of the corresponding statement in $\mathbb{R}$.

Lemma 2.27. Suppose $B$ is compact. Then $B$ is bounded.

Proof. We show the contrapositive: if $B$ is not bounded, then $B$ is not compact. That means we need to find a sequence in $B$ that has no convergent subsequence. Notice: for each $n \in \mathbb{N}$, there must be an $x_n \in B$ with $\|x_n\| \ge n$. (Why?) Furthermore, there can be no convergent subsequence, since any such subsequence would have to be bounded . . . which is impossible!

The next theorem details how continuous functions behave with respect to closed and compact sets. These are the same statements as before!

Theorem 2.28. Suppose $\emptyset \ne A \subseteq \mathbb{R}^d$, and suppose $f : A \to \mathbb{R}^k$ is continuous on $A$.

1. If $A$ is compact, so too is $f(A)$. (The continuous image of a compact set is compact.)

2. If $K$ is closed, then $f^{-1}(K) := \{x \in A : f(x) \in K\}$ is closed in $A$. (The continuous pre-image of a closed set is closed.)

3. If $K$ is open, then $f^{-1}(K)$ is open in $A$.

Proof. (1.) To show that $f(A)$ is compact, we need to show that if $y_n$ is a sequence in $f(A)$, then $y_n$ has a subsequence that converges to some element of $f(A)$. By definition of $f(A)$, there is a sequence $x_n$ in $A$ such that $f(x_n) = y_n$. Since $A$ is compact, there is a subsequence $x_{n_j}$ that converges to some $x \in A$. Since $f$ is continuous, we know that $y_{n_j} = f(x_{n_j}) \to f(x) =: y$. Since $x \in A$, $y \in f(A)$. Therefore, $f(A)$ is compact.

(2.) To show that $f^{-1}(K)$ is closed in $A$, suppose $x_n$ is a sequence in $f^{-1}(K)$ and $x_n \to x \in A$. We need to show that in fact $x \in f^{-1}(K)$, or equivalently, $f(x) \in K$. Let $y_n := f(x_n)$ and $y := f(x)$, and note that $y_n \in K$. Since $x_n \to x$, we know that $y_n \to y$. Since $K$ is closed, $y \in K$, i.e. $f(x) \in K$, as desired.

(3.) To show that $f^{-1}(K)$ is open in $A$, we need to show that $f^{-1}(K)^c \cap A$ is closed in $A$. Thus, suppose $x_n$ is a sequence in $f^{-1}(K)^c \cap A$ that converges to some $x \in A$. We need to show that $x \in f^{-1}(K)^c \cap A$, i.e. that $f(x) \notin K$. Since $x_n \in f^{-1}(K)^c$, we know that $f(x_n) \notin K$ for all $n$, or equivalently $f(x_n) \in K^c$ for all $n$. Since $f$ is continuous on $A$, $f(x_n) \to f(x)$. Because $K$ is open, $K^c$ is closed. Thus, $f(x) \in K^c$, which means that $f(x) \notin K$, as desired.

We can now show that continuous functions on compact subsets of Rd have a minimum and maximum:

Theorem 2.29. Suppose $A \subseteq \mathbb{R}^d$ is non-empty and compact. If $f : A \to \mathbb{R}$ is continuous, then there exist $a, b \in A$ such that $f(a) \le f(x) \le f(b)$ for all $x \in A$.

Proof. We show the existence of an appropriate $a$. The proof for the existence of an appropriate $b$ is left to you. By Theorem 2.28, $f(A) = \{y \in \mathbb{R} : y = f(x) \text{ for some } x \in A\}$ is compact, and so Lemma 2.27 implies that $f(A)$ is bounded. Thus, $\inf f(A)$ exists. Now, there is a sequence $y_n \in f(A)$ such that $y_n \to \inf f(A)$. By definition of $f(A)$, $y_n = f(x_n)$ for some $x_n \in A$. Since $A$ is compact, there is a subsequence $x_{n_j}$ that converges to some $a \in A$. We now show that $f(a) = \inf f(A)$. Notice that $f(x_{n_j}) = y_{n_j} \to \inf f(A)$ (since $y_{n_j}$ has the same limit as $y_n$). By continuity, we also have $f(x_{n_j}) \to f(a)$. Therefore, $f(a) = \inf f(A)$.

Proposition 2.30. Suppose $A \subseteq \mathbb{R}^d$ is closed, $f : A \to \mathbb{R}$ is continuous, and suppose there is an $m$ such that $m \le f(x)$ for all $x \in A$. If there is a bounded sequence $x_n$ such that $f(x_n) \to \inf f(A)$, then there is an $x^\star$ such that $f(x^\star) = \inf f(A)$.

Proof. Because there is an $m$ such that $m \le f(x)$ for all $x \in A$, $\inf f(A)$ exists. Because $x_n$ is bounded, the Bolzano-Weierstrass Theorem implies there is a subsequence $x_{n_j}$ that converges to some $x^\star \in \mathbb{R}^d$. Since $A$ is closed, $x^\star \in A$. By continuity, $f(x^\star) = \lim_{j\to\infty} f(x_{n_j}) = \lim_{n\to\infty} f(x_n) = \inf f(A)$.

2.6 Minimization: Coercivity and Continuity

In many applications, some quantity is to be minimized. For example, we may want to minimize cost, or minimize work. In many physical applications, we might be interested in minimal energy, or minimizing work. Other very interesting (and often very difficult) minimization problems arise when we seek to minimize some geometric quantity: length, or surface area. In this section, we'll consider some basic conditions that guarantee that a function has a minimizer.

Definition 2.31. Suppose $\Omega \subseteq \mathbb{R}^d$ is non-empty and $f : \Omega \to \mathbb{R}$. We say that $x^\star \in \Omega$ is a minimizer for $f$ in $\Omega$ if $f(x^\star) \le f(x)$ for all $x \in \Omega$. We also call the value of $f$ at $x^\star$ the minimum of $f$ on $\Omega$. If there is an $m \in \mathbb{R}$ such that $f(x) \ge m$ for all $x \in \Omega$, then $f(\Omega) \subseteq \mathbb{R}$ is bounded from below, and so $\inf f(\Omega)$ exists. We say that $x_n$ is a minimizing sequence for $f$ if $f(x_n) \to \inf f(\Omega)$. Recall: you had a homework problem that says that $f$ has a minimizing sequence under these assumptions.

Whether or not f has a minimum is determined by the behavior of minimizing sequences.

Example 2.32. Suppose $\Omega = \mathbb{R}$, and $f : x \mapsto e^x$. Then a minimizing sequence is $x_n = -\log n$, which doesn't converge. Thus, the function has no minimum on $\mathbb{R}$.

Example 2.33. Note also that there is nothing unique about minimizing sequences! For example, if $f : \mathbb{R} \to \mathbb{R}$ is $f : x \mapsto \sin x$, then $x_n = 2\pi n - \frac{\pi}{2} + \frac{1}{n}$ is a minimizing sequence which doesn't converge. Note also that $y_n = -\frac{\pi}{2} + \frac{1}{n}$ is a minimizing sequence which does converge!
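To make Examples 2.32 and 2.33 concrete, here is a minimal numerical sketch (assuming NumPy; the sequences are exactly the ones named above):

```python
import numpy as np

n = np.arange(1, 10001, dtype=float)

# Example 2.32: f(x) = e^x on R has inf f = 0 but no minimizer;
# the minimizing sequence x_n = -log n runs off to -infinity.
xn = -np.log(n)
print(np.exp(xn)[-1])                            # f(x_n) = 1/n -> 0 = inf f

# Example 2.33: f(x) = sin x has inf f = -1; one minimizing sequence diverges,
# another converges to -pi/2.
xn_divergent = 2 * np.pi * n - np.pi / 2 + 1 / n
yn_convergent = -np.pi / 2 + 1 / n
print(np.sin(xn_divergent)[-1], np.sin(yn_convergent)[-1])   # both close to -1
```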

As suggested by Proposition 2.30, for a function defined on a subset of Rd, the existence of a bounded minimizing sequence is very important. A common condition guaranteeing this is coercivity. We need two definitions:

Definition 2.34. A sequence $y_n$ in $\mathbb{R}^d$ goes to infinity if for every $M > 0$, there is an $N$ such that whenever $n > N$, we have $\|y_n\| \ge M$. In this case, we write $y_n \to \infty$.

Notice that the preceding definition also applies to sequences of real numbers. In the case of $\mathbb{R}$, we may also define $y_n \to -\infty$: $y_n \to -\infty$ means that for every $M > 0$, there is an $N$ such that whenever $n > N$, we have $y_n < -M$.

Definition 2.35. Suppose $f : \Omega \to \mathbb{R}$. We say that $f$ is coercive to mean that whenever $x_n \to \infty$, we have $f(x_n) \to \infty$.

Example 2.36. Neither $f : x \mapsto e^x$ nor $f : x \mapsto \sin x$ is coercive.

Proposition 2.37. Suppose Ω ⊆ Rd is closed. If f :Ω → R is coercive and continuous on Ω, then f has a minimizer.

Proof. We consider two cases: $\Omega$ bounded and $\Omega$ unbounded. If $\Omega$ is bounded, then since $\Omega$ is closed and bounded in $\mathbb{R}^d$, we know that $\Omega$ is compact, and so the existence of a minimizer is guaranteed.

Suppose next that $\Omega$ is unbounded. We must first show that $f(\Omega)$ is bounded from below. If $f(\Omega)$ is not bounded from below, there must be a sequence $x_n$ such that $f(x_n) \to -\infty$. If $x_n$ is bounded, then there is a subsequence $x_{n_j}$ such that $x_{n_j} \to x$ for some $x \in \Omega$ (since $\Omega$ is closed). Thus, by the continuity of $f$, we will have
$$f(x) = \lim_{j\to\infty} f(x_{n_j}) = \lim_{n\to\infty} f(x_n) = -\infty,$$
which is impossible. Thus, $x_n$ must be unbounded, so some subsequence of $x_n$ goes to infinity, and coercivity implies that $f$ goes to $\infty$ along that subsequence, which contradicts $f(x_n) \to -\infty$. Therefore, $f(\Omega)$ is bounded from below, and so $\inf_{x\in\Omega} f(x)$ is a real number.

Suppose now that $x_n$ is a minimizing sequence. Since $f(x_n)$ converges, $x_n$ must be bounded. (Otherwise, some subsequence of $x_n$ would go to infinity, and coercivity would force $f$ to go to $\infty$ along it.) Thus, $x_n$ is bounded, and so by the Bolzano-Weierstrass theorem, there is a subsequence $x_{n_j}$ that converges to some $x^\star \in \mathbb{R}^d$. Since $\Omega$ is closed, $x^\star \in \Omega$. By continuity, we have
$$f(x^\star) = \lim_{j\to\infty} f(x_{n_j}) = \lim_{n\to\infty} f(x_n) = \inf\{f(x) : x \in \Omega\},$$
which means $x^\star$ is a minimizer of $f$.
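As a small illustration of Proposition 2.37, here is a minimal sketch (assuming NumPy; the function, the grid search, and the resulting minimizing sequence are illustrative choices, not part of the notes). The function $f(x) = \|x\|_2^2 - 3x_1$ is continuous and coercive on the closed set $\Omega = \mathbb{R}^2$, and its minimizing sequences stay bounded and approach the minimizer $(3/2, 0)$:

```python
import numpy as np

def f(p):
    # Continuous and coercive on R^2: f(x) = ||x||_2^2 - 3 x_1  (illustrative choice)
    return p[0]**2 + p[1]**2 - 3 * p[0]

# Build a crude minimizing sequence by minimizing over finer and finer grids.
minimizing_sequence = []
for m in (11, 21, 41, 81, 161):
    grid = np.linspace(-5.0, 5.0, m)
    pts = np.array([(a, b) for a in grid for b in grid])
    minimizing_sequence.append(pts[np.argmin([f(p) for p in pts])])

print(minimizing_sequence[-1])     # approaches the minimizer (1.5, 0)
print(f(minimizing_sequence[-1]))  # approaches inf f = -2.25
```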

2.7 Uniqueness of Minimizers: Convexity

In the preceding sections, we gave several conditions that guaranteed the existence of a minimizer. However, our methods weren't really constructive, since we didn't consider how to construct a minimizing sequence. In addition, we often had to pass to a subsequence, but how would we get an appropriate such sequence? Therefore, we now turn to a class of functions for which we often don't need to pass to a subsequence.

Definition 2.38. Suppose $\Omega \subseteq \mathbb{R}^d$ is non-empty.

1. We say that the set $\Omega$ is convex if for every pair of points $x, y \in \Omega$, $tx + (1-t)y \in \Omega$ for all $t \in [0, 1]$. (That is: the line segment connecting $x$ and $y$ lies entirely in $\Omega$.)

2. Suppose $\Omega$ is convex. We say that $f : \Omega \to \mathbb{R}$ is convex if for every pair of points $x, y \in \Omega$ and all $t \in [0, 1]$, we have
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y).$$

3. If we may replace $\le$ above with $<$ whenever $x \ne y$ and $t \in (0, 1)$, we say that $f$ is strictly convex.

Exercise: Suppose $\Omega$ is a subspace of $\mathbb{R}^d$. Show that $\Omega$ is convex.

Exercise: Suppose $\Omega := \{x \in \mathbb{R}^d : \|x\| \le 1\}$. Show that $\Omega$ is convex.

Exercise: Suppose $g : \mathbb{R}^d \to \mathbb{R}$ is convex. Show that if $f : \mathbb{R}^d \to \mathbb{R}$ is linear, then $f + g$ is convex.

Exercise: Show that any norm on $\mathbb{R}^d$ is convex.

Exercise: Show that $x \mapsto x^2$ is strictly convex on $\mathbb{R}$ - without using calculus!

Exercise: Suppose $\langle \cdot, \cdot \rangle$ is an inner-product on $\mathbb{R}^d$. Show that $u \mapsto \langle u, u \rangle$ is strictly convex.

Remark: Suppose $f : \mathbb{R} \to \mathbb{R}$, and $f' : \mathbb{R} \to \mathbb{R}$ is increasing. It can then be shown that $f$ is convex (using the Mean Value Theorem). What if $f''(x) > 0$ for all $x \in \mathbb{R}$?

Proposition 2.39. Suppose $f : \Omega \to \mathbb{R}$ is convex. If $x, y$ are two minimizers of $f$, then $f$ is constant along the line segment connecting $x$ and $y$. Moreover, if $f$ is strictly convex, the minimizer is unique.

Proof. Suppose $x$ and $y$ are minimizers of $f$. Then, for any $t \in [0, 1]$, we have
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) = t \inf_{x\in\Omega} f(x) + (1-t) \inf_{x\in\Omega} f(x) = \inf_{x\in\Omega} f(x).$$
Since the reverse inequality always holds, $f$ equals $\inf_{x\in\Omega} f(x)$ along the whole segment. Suppose next that $f$ is strictly convex, and suppose $x$ and $y$ are two minimizers. We want to show that $x = y$. Suppose then that $x \ne y$, and consider the line segment connecting $x$ and $y$. Because $f$ is strictly convex, for every $t \in (0, 1)$, we will have
$$f(tx + (1-t)y) < t f(x) + (1-t) f(y) = \inf_{x\in\Omega} f(x).$$

Taking $t = \frac{1}{2}$ then gives a point $p \in \Omega$ for which $f(p) < \inf_{x\in\Omega} f(x)$, which is impossible.

Theorem 2.40. Suppose $\Omega \subseteq \mathbb{R}^d$ is convex, and suppose $f : \Omega \to \mathbb{R}$ is convex. If $x \in \Omega$ and there exists an $r > 0$ such that $\{y \in \mathbb{R}^d : \|x - y\| < r\} \subseteq \Omega$, then $f$ is continuous at $x$.

Proof. This is surprisingly finicky! Go and look up proofs of this on the web to see just how annoying they are. In addition, it is very important that we're working in a finite dimensional vector space. In infinite dimensions, this theorem may be false!

Combining strict convexity and coercivity gives the following very useful theorem:

Theorem 2.41. Suppose $\Omega$ is convex and closed, and suppose $f : \Omega \to \mathbb{R}$ is coercive and strictly convex. Then any minimizing sequence of $f$ converges to the minimizer of $f$ on $\Omega$.

Exercise: Prove this theorem!
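The convexity inequality in Definition 2.38 is also easy to probe numerically. Here is a minimal sketch (assuming NumPy; random sampling only illustrates the inequality, it does not prove it) checking it for the Euclidean norm and for $u \mapsto \langle u, u \rangle$, two of the functions appearing in the exercises above:

```python
import numpy as np

rng = np.random.default_rng(2)

for _ in range(1000):
    d = int(rng.integers(1, 6))
    x, y = rng.normal(size=d), rng.normal(size=d)
    t = rng.uniform()
    z = t * x + (1 - t) * y

    # Convexity of the Euclidean norm: ||t x + (1-t) y||_2 <= t ||x||_2 + (1-t) ||y||_2
    assert np.linalg.norm(z) <= t * np.linalg.norm(x) + (1 - t) * np.linalg.norm(y) + 1e-12

    # Convexity of u -> <u, u> (in fact strict when x != y and 0 < t < 1)
    assert np.dot(z, z) <= t * np.dot(x, x) + (1 - t) * np.dot(y, y) + 1e-12

print("The convexity inequalities held on every sample.")
```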

2.8 More on Norms

The goal of this section is to prove that all norms on a finite dimensional vector space are equivalent, in the sense that convergence in one norm implies convergence in any other. In particular, we may use any norm we'd like on a finite-dimensional vector space, since it won't affect any of our concepts that are defined in terms of convergence. Recall that $\{e_1, e_2, \ldots, e_d\}$ is a basis of $\mathbb{R}^d$ means that any $u \in \mathbb{R}^d$ can be written as a unique linear combination of the $e_i$: $u = u_1 e_1 + u_2 e_2 + \cdots + u_d e_d$.

Proposition 2.42. Suppose $\|\cdot\| : \mathbb{R}^d \to \mathbb{R}$ is a norm on $\mathbb{R}^d$. Then $\|\cdot\| : x \mapsto \|x\|$ is a continuous function on $\mathbb{R}^d$, with respect to $\|\cdot\|_2$.

Proof. Suppose $\|x_n - x\|_2 \to 0$. We need to show that $\|x_n\| \to \|x\|$. By the reverse triangle inequality, we have
$$\bigl| \|x_n\| - \|x\| \bigr| \le \|x_n - x\|.$$
Suppose now that $\{e_1, e_2, \ldots, e_d\}$ is the standard basis for $\mathbb{R}^d$ (so $e_j$ is all zeros, except for a 1 in the $j^{\text{th}}$ position). By assumption, $x_n - x = \sum_{j=1}^{d} (a_{n,j} - a_j) e_j$, where $x_n = \sum_{j=1}^{d} a_{n,j} e_j$ and $x = \sum_{j=1}^{d} a_j e_j$. But then, by the triangle inequality and the Cauchy-Schwarz inequality applied to the dot product,
$$\bigl| \|x_n\| - \|x\| \bigr| \le \sum_{j=1}^{d} |a_{n,j} - a_j| \, \|e_j\| \le \left( \sum_{j=1}^{d} |a_{n,j} - a_j|^2 \right)^{1/2} \left( \sum_{j=1}^{d} \|e_j\|^2 \right)^{1/2} = C \|x_n - x\|_2,$$
where $C := \left( \sum_{j=1}^{d} \|e_j\|^2 \right)^{1/2}$. By the Squeeze Theorem, $\bigl| \|x_n\| - \|x\| \bigr| \to 0$. Thus, $x \mapsto \|x\|$ is continuous with respect to $\|\cdot\|_2$.

Proposition 2.43. Suppose $\|\cdot\| : \mathbb{R}^d \to \mathbb{R}$ is a norm. Then, there exist constants $0 < m \le M$ such that
$$m\|x\|_2 \le \|x\| \le M\|x\|_2 \quad \text{for all } x \in \mathbb{R}^d.$$

Proof. Let $A := \{u \in \mathbb{R}^d : \|u\|_2 = 1\}$. Notice that $A$ is a closed set, since it is the pre-image of the closed set $\{1\} \subseteq \mathbb{R}$ under the function $x \mapsto \|x\|_2$, which is continuous. Moreover, $A$ is bounded, and so (since we're in $\mathbb{R}^d$) $A$ is compact. But then, since $\|\cdot\|$ is continuous on $A$ by Proposition 2.42, Theorem 2.29 implies there exist $u_1, u_2 \in A$ such that
$$\|u_1\| \le \|u\| \le \|u_2\| \quad \text{for all } u \in A.$$
Note that $\|u_1\| > 0$, since $u_1 \ne 0$. Suppose now that $x \in \mathbb{R}^d$ is non-zero. Let $u := \frac{x}{\|x\|_2}$. By the inequality above, we then have
$$\|u_1\| \le \left\| \frac{x}{\|x\|_2} \right\| \le \|u_2\|,$$
and so we have
$$\|u_1\| \, \|x\|_2 \le \|x\| \le \|u_2\| \, \|x\|_2.$$
Since this inequality is clearly satisfied for $x = 0$, we may take $m := \|u_1\|$ and $M := \|u_2\|$.
