
ANALYSIS LECTURES

JOHN QUIGG

Contents
1. Sets
2. The real numbers
3. Countability
4. Sequences
5. Limits
6. Continuity
7. Differentiation
8. Integration
9. Log and exp
10. Series
11. Sequences of functions
12. Series of functions
13. Power series
Index

1. Sets

The terms set and element will be undefined.

Remark 1.1. All a set A knows about is what its elements are. If x is any object, the sentence "x is an element of A", written "x ∈ A", is a (mathematical) proposition—that is, it is either true or false. Roughly speaking, all a set is good for is to allow the formation of such propositions. In fact, strictly speaking these are the only propositions in mathematics; more precisely, every mathematical proposition can be built up from those of the form "x ∈ A". The rules governing these propositions comprise set theory, which is the foundation of modern mathematics. We won't study the formal theory of sets here; instead, we'll just record a few of the most important facts concerning sets, partly to indicate the level of rigor, and the style of writing, of our definitions, results (theorems, corollaries, propositions, and lemmas), and proofs. Virtually everything in this section, except perhaps for the general concept of families of sets occurring at the end, is prerequisite material for the course.

Date: September 5, 2001.

Many definitions in set theory are merely translations of constructions from logic; for example, the logical biconditional becomes set equality:

Definition 1.2. Let A and B be sets. We say A equals B, written A = B, if for all x we have x ∈ A ⇔ x ∈ B.

Notation and Terminology 1.3. The negation of "x ∈ A" is written "x ∉ A".

Definition 1.4. The empty set ∅ is the unique set such that for all x we have x ∉ ∅.

Remark 1.5. There is no set A such that for all x we have x ∈ A; such a set would be "too big to be"—that is, its existence would lead to (logical) paradoxes, of which the most famous is Russell's Paradox.

The set-theoretic translation of the logical conditional is the subset relation:

Definition 1.6. Let A and B be sets.
(i) A is called a subset of B, written A ⊆ B, if for all x we have x ∈ A implies x ∈ B.
(ii) If A ⊆ B, we also say B is a superset of A, written B ⊇ A.
(iii) A subset A of B is called proper if A ≠ B.

Remark 1.7. In the above definition, we used a phrase of the form "we also say . . . ". It is important to realize that this just introduces a synonym for the primary term—the definition of each synonym is the same as that of the primary term. For example, the definition of "B ⊇ A" is the same as that of "A ⊆ B".

Example 1.8. For every set A, we have:
(i) A ⊆ A;
(ii) ∅ ⊆ A.

Lemma 1.9. Let A and B be sets. Then A = B if and only if both A ⊆ B and B ⊆ A.

Proof. First assume A = B. Since A ⊆ A, we have both A ⊆ B and B ⊆ A.
Conversely, assume A ⊆ B and B ⊆ A. Let x be any object. First assume x ∈ A. Then x ∈ B since A ⊆ B. Conversely, assume x ∈ B. Then x ∈ A since B ⊆ A. Thus x ∈ A if and only if x ∈ B, so A = B. ∎

Remark 1.10. In the forward direction of the above proof, we had to prove A ⊆ B (as well as B ⊆ A); a more typical proof of this would start with "let x ∈ A" and would deduce "x ∈ B"; however, the hypothesis "A = B" was so special that it trivially implied the desired conclusion.
Actually, the above lemma has a much shorter proof, because it is just a set-theoretic version of an equivalence from logic (namely, the biconditional p ⇔ q is equivalent to the conjunction of p ⇒ q and q ⇒ p). However, the whole point here is to indicate typical proofs with sets, not to give the most efficient proofs of the results themselves.
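As an illustrative aside (not part of the notes), the double-inclusion criterion of Lemma 1.9 can be spot-checked on finite sets using Python's built-in set type, whose `==` is exactly extensionality:

```python
# Lemma 1.9 on finite sets: A = B iff A ⊆ B and B ⊆ A.
A = {1, 2, 3}
B = {3, 2, 1, 2}          # same elements, listed differently

mutual_inclusion = A.issubset(B) and B.issubset(A)
assert mutual_inclusion == (A == B)   # both sides are True here
```

Of course this only tests particular finite sets; the lemma itself is about arbitrary sets.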

The set-theoretic versions of the disjunction, conjunction, and negation are union, intersection, and complement:

Definition 1.11. Let A and B be sets.
(i) The union of A and B is defined by A ∪ B := {x : x ∈ A or x ∈ B}.
(ii) The intersection of A and B is defined by A ∩ B := {x : x ∈ A and x ∈ B}.
(iii) A and B are called disjoint if A ∩ B = ∅.
(iv) The difference of A and B is defined by A \ B := {x : x ∈ A and x ∉ B}.
(v) If the set U is understood, for any subset A ⊆ U the complement of A is defined by Aᶜ := U \ A.

Remark 1.12. Rather than state the elementary properties of these operations, we'll wait until we have the more general versions for families of sets (see below). Note that the complement of A is not just {x : x ∉ A}; rather it is {x ∈ U : x ∉ A}. Thus the complement depends upon the choice of the particular "universe" U; there is no single set U that could serve as a universe for taking complements of every set A, because, as we mentioned above, there is no single set U which contains all sets.

Definition 1.13.
(i) Let x and y be objects. The ordered pair with first coordinate x and second coordinate y is defined by (x, y) := {{x}, {x, y}}.
(ii) The Cartesian product of sets A and B is defined by A × B := {(x, y) : x ∈ A and y ∈ B}.

Remark 1.14. Note that ordered pairs (x, y) and (z, w) are equal if and only if x = z and y = w. However, it would be improper to try to define "ordered pair" by this property; it is important to know that we can define all our concepts completely in terms of sets.

Notation and Terminology 1.15. R denotes the set of real numbers, which we will discuss in more detail in the next section.

Definition 1.16. Let A and B be sets, and let f ⊆ A × B. We say f is a function from A to B, written f : A → B, if for all x ∈ A there exists a unique y ∈ B such that (x, y) ∈ f. We call this y the value of f at x, written y = f(x).

Remark 1.17.
It would be improper to try to define a "function" as a "rule" with certain properties, because this raises the question "what is a rule?"; again, it is important to know that we can phrase the definition in terms of sets.
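The "unique y" condition of Definition 1.16 can be checked mechanically for a finite set of ordered pairs. A minimal sketch (the helper name `is_function` is ours, not from the notes):

```python
# Definition 1.16: f ⊆ A × B is a function from A to B when every
# x in A appears as a first coordinate exactly once.
def is_function(pairs, domain):
    """Check the 'exists a unique y' condition for a set of ordered pairs."""
    firsts = [x for (x, _) in pairs]
    return set(firsts) == set(domain) and len(firsts) == len(set(firsts))

f = {(1, 'a'), (2, 'b'), (3, 'a')}   # a function on {1, 2, 3}
g = {(1, 'a'), (1, 'b')}             # fails: 1 has two values

assert is_function(f, {1, 2, 3})
assert not is_function(g, {1})
```

The failing example g is exactly a violation of the "vertical line test" mentioned in Remark 1.22 below.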

Definition 1.18. Let f : A → B.
(i) The set A is called the domain of f, written A = dom f.
(ii) The range of f is defined by ran f := {f(x) : x ∈ A}.
(iii) If C ⊆ A, the image of C under f is defined by f(C) := {f(x) : x ∈ C}.
(iv) If D ⊆ B, the pre-image of D under f is defined by f⁻¹(D) := {x ∈ A : f(x) ∈ D}.
(v) f is called real-valued if ran f ⊆ R.

Remark 1.19. Note that the set A is determined by the function f, since A = dom f. However, the set B is not determined by f—it could be any superset of the range of f. This has an important consequence: if we have a function f : A → B, and if C is any set such that ran f ⊆ C, then we can equally well regard f : A → C; it is still the same function. Actually, we have made a choice here—in some areas of advanced mathematics it is important to regard the set B as an official part of the function, but we have chosen to identify the function with its graph {(x, f(x)) : x ∈ dom f}.

Remark 1.20. The notation f(C) for images is an abuse of the function notation; we must take care to interpret the notation correctly from the context. Even more dangerous is the notation f⁻¹(D) for pre-images, since it can be confused with the concept of an inverse function (see below).

Notation and Terminology 1.21. Occasionally it is convenient to introduce a function but not give it a name, in which case we use the notation "x ↦ (some formula involving x)". For example, we could consider "the function x ↦ x² from R to R".

Remark 1.22. Thus, a function f is just a set of ordered pairs with a certain property, popularly known as the "vertical line test": for all x, y, z, if (x, y) ∈ f and (x, z) ∈ f, then y = z. In particular, to prove two functions are equal is just a case of proving two sets are equal. But since functions are such special kinds of sets, there is a more efficient way to prove function equality:

Lemma 1.23. Let f and g be functions.
Then f = g if and only if dom f = dom g and for all x ∈ dom f we have f(x) = g(x).

Proof. First, if f = g then trivially dom f = dom g and for all x ∈ dom f we have f(x) = g(x).
Conversely, assume dom f = dom g and for all x ∈ dom f we have f(x) = g(x). We will show f ⊆ g and g ⊆ f; since the hypothesis is symmetric in f and g, it suffices to show f ⊆ g. Let (x, y) ∈ f. Then x ∈ dom f and f(x) = y. By hypothesis we have x ∈ dom g and g(x) = y. Thus (x, y) ∈ g, and we have shown f ⊆ g. ∎

Remark 1.24. In the converse direction of the above proof, when we were showing f ⊆ g we did not just start with something like "let z ∈ f" and then deduce that there exist x and y such that z = (x, y); since the elements of f are ordered pairs, it was more efficient to start with ordered-pair notation for an arbitrary element of f. In general, we try to use "notation appropriate for the context".

Definition 1.25. Let f : A → B and C ⊆ A.
(i) The function f|_C : C → B defined by f|_C(x) = f(x) for each x ∈ C is called the restriction of f to C.
(ii) If g is a function with domain C, then f is called an extension of g to A if g = f|_C.

Proposition 1.26. Let f : A → B, and let C, D ⊆ A and E, F ⊆ B. Then:
(i) f(C \ D) ⊇ f(C) \ f(D);
(ii) f⁻¹(E \ F) = f⁻¹(E) \ f⁻¹(F);
(iii) f⁻¹(f(C)) ⊇ C;
(iv) f(f⁻¹(E)) ⊆ E;
(v) C ⊆ D ⇒ f(C) ⊆ f(D);
(vi) E ⊆ F ⇒ f⁻¹(E) ⊆ f⁻¹(F).

Proof. (i) Let y ∈ f(C) \ f(D). Then y ∈ f(C) and y ∉ f(D). Choose x ∈ C such that y = f(x). Then x ∉ D since y ∉ f(D). Thus x ∈ C \ D. Hence y ∈ f(C \ D) since y = f(x).
(ii) First let x ∈ f⁻¹(E \ F). Then f(x) ∈ E \ F. Thus f(x) ∈ E and f(x) ∉ F. Hence x ∈ f⁻¹(E) and x ∉ f⁻¹(F). Therefore x ∈ f⁻¹(E) \ f⁻¹(F), and we have shown f⁻¹(E \ F) ⊆ f⁻¹(E) \ f⁻¹(F). For the opposite inclusion f⁻¹(E) \ f⁻¹(F) ⊆ f⁻¹(E \ F), just reverse the above steps. Explicitly: let x ∈ f⁻¹(E) \ f⁻¹(F). Then x ∈ f⁻¹(E) and x ∉ f⁻¹(F). Thus f(x) ∈ E and f(x) ∉ F. Hence f(x) ∈ E \ F. Therefore x ∈ f⁻¹(E \ F).
(iii) If x ∈ C, then f(x) ∈ f(C), so x ∈ f⁻¹(f(C)).
(iv) Given y ∈ f(f⁻¹(E)), choose x ∈ f⁻¹(E) such that f(x) = y. Then f(x) ∈ E, so y ∈ E.
(v) Given y ∈ f(C), choose x ∈ C such that f(x) = y. Then x ∈ D since C ⊆ D. Hence f(x) ∈ f(D), so y ∈ f(D).
(vi) Let x ∈ f⁻¹(E). Then f(x) ∈ E, so f(x) ∈ F since E ⊆ F. Therefore x ∈ f⁻¹(F). ∎

Remark 1.27.
In the proof of part (ii) above, it would have been logically correct to stop after saying "reverse the above steps". However, this must be done carefully: it is only proper to do so when the steps can really be reversed exactly—if any modifications are necessary in the proof of the reverse direction, they must be explicitly mentioned.

Actually, precisely because the steps are reversible, the proof could have been given for both directions simultaneously by using "if and only if" statements. However, again, the point is to indicate typical proofs with sets, not just prove the result; to prove two sets are equal it is typically necessary, or at least more convenient, to prove separately that each is a subset of the other.

Definition 1.28. Given real-valued functions f and g with the same domain, define
(i) (f + g)(x) := f(x) + g(x);
(ii) (cf)(x) := c·f(x) if c ∈ R;
(iii) (fg)(x) := f(x)g(x);
(iv) (f/g)(x) := f(x)/g(x) if 0 ∉ ran g.

Remark 1.29. Slightly more generally, if dom f ≠ dom g, it is sometimes convenient to still define the above combinations of f and g, with domain dom f ∩ dom g, and for f/g we also subtract g⁻¹({0}) from the domain. In fact, it happens so often that we want to restrict a function to a subset of its domain that we often do it without comment; strictly speaking, we are replacing the old function with a new one, but in practice this causes no confusion.

Definition 1.30. Given f : A → B and g : B → C, define g ∘ f : A → C by (g ∘ f)(x) := g(f(x)).

Remark 1.31. Slightly more generally, if ran f ⊈ dom g, it is sometimes convenient to still define g ∘ f by the above formula, with domain f⁻¹(dom g).

Definition 1.32. (i) The identity function on a set A is defined by
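The inclusions of Proposition 1.26 can be spot-checked on a small non-injective function. A sketch, not part of the notes (the helpers `image` and `preimage` are ours, mirroring Definition 1.18):

```python
# Images and pre-images for finite sets, and a spot-check of
# Proposition 1.26 (i)-(iv) on the non-1-1 function x ↦ x².
def image(f, C):
    return {f(x) for x in C}

def preimage(f, domain, D):
    return {x for x in domain if f(x) in D}

A = {-2, -1, 0, 1, 2}
f = lambda x: x * x          # not 1-1, so the one-sided inclusions can be proper

C, D = {-2, -1, 0}, {1, 2}
E, F = {0, 1, 4}, {4}

assert image(f, C - D) >= image(f, C) - image(f, D)                    # (i)
assert preimage(f, A, E - F) == preimage(f, A, E) - preimage(f, A, F)  # (ii)
assert preimage(f, A, image(f, C)) >= C                                # (iii)
assert image(f, preimage(f, A, E)) <= E                                # (iv)
```

Here Python's `>=`/`<=` on sets are the superset/subset relations ⊇ and ⊆.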

id_A(x) := x for x ∈ A.
(ii) If A and B are sets and c ∈ B, the function f : A → B defined by f(x) = c for x ∈ A is called the constant function from A to B with value c. We also say f is identically c.

Definition 1.33. Let f : A → B.
(i) f is called one-to-one, written 1-1, if for all x, y ∈ A we have f(x) = f(y) ⇒ x = y.
(ii) f is called onto B if ran f = B.
(iii) If f is 1-1 and onto B, the unique function f⁻¹ : B → A such that f⁻¹ ∘ f = id_A and f ∘ f⁻¹ = id_B is called the inverse of f.

Remark 1.34. Note that being onto is not just a property of the function f itself; it depends upon the set B when we regard f : A → B. In fact, "onto-ness" is really a property of the set B relative to f, namely B must be the range of f. However, if the set B is understood we can say "f is onto".

Remark 1.35. The notation f⁻¹ must be used with care: for any function f : A → B it makes sense to consider f⁻¹(C) for any C ⊆ B. But for y ∈ B it only makes sense to consider f⁻¹(y) if we know that the function f⁻¹ exists (i.e., if we know f is 1-1 and onto). Fortunately, the notation is consistent: if f⁻¹ : B → A does exist and C ⊆ B, then the set f⁻¹(C) is the same no matter whether we regard it as the pre-image of C under f or the image of C under the function f⁻¹.

Definition 1.36.
(i) A set whose elements are sets is (also) called a family of sets.
(ii) A function whose values are sets is called an indexed family of sets. If the domain of an indexed family of sets is I and the function is i ↦ A_i, the family is denoted {A_i}_{i∈I}. When I = {1, 2, . . .} we write {A_i}_{i=1}^∞, and when I = {1, . . . , n} we write {A_i}_{i=1}^n.

Remark 1.37. There is no logical need for the term "family of sets"; nevertheless it is handy because it would sometimes be confusing to say "set of sets". Note that to every indexed family {A_i}_{i∈I} of sets, there is an associated family of sets, namely the range {A_i : i ∈ I} (and please note the subtle change of notation!) of the indexed family. We must take care to remember that the indexed family is very different from the associated family, since a function is different from its range. For some purposes, however, it won't matter much whether we use a family or an indexed family. Since "family of sets" is simpler both linguistically and conceptually than "indexed family of sets", we have a slight preference for the former. There is an important notational aspect to consider: it is common to use "index" notation such as {A_i : i ∈ I} for a family of sets, so that the family is given as the range of an indexed family. However, when we don't want to use index notation, the family is commonly denoted with a capital script letter such as F.
The set-theoretic versions of the existential and universal quantifiers from logic are union and intersection for families:

Definition 1.38. Let {A_i}_{i∈I} be an indexed family of sets.
(i) The union of {A_i}_{i∈I} is defined by
⋃_{i∈I} A_i := {x : there exists i ∈ I such that x ∈ A_i}.
When I = {1, 2, . . .} we write ⋃_{i=1}^∞ A_i = A_1 ∪ A_2 ∪ · · ·, and when I = {1, . . . , n} we write ⋃_{i=1}^n A_i = A_1 ∪ · · · ∪ A_n.
(ii) The intersection of {A_i}_{i∈I} is defined by
⋂_{i∈I} A_i := {x : for all i ∈ I, x ∈ A_i}.
When I = {1, 2, . . .} we write ⋂_{i=1}^∞ A_i = A_1 ∩ A_2 ∩ · · ·, and when I = {1, . . . , n} we write ⋂_{i=1}^n A_i = A_1 ∩ · · · ∩ A_n.

(iii) The Cartesian product of {A_i}_{i∈I}, written ∏_{i∈I} A_i, is defined as the set of functions i ↦ x_i from I to ⋃_{i∈I} A_i such that for all i ∈ I we have x_i ∈ A_i. When I = {1, 2, . . .} we write ∏_{i=1}^∞ A_i = A_1 × A_2 × · · ·, and when I = {1, . . . , n} we write ∏_{i=1}^n A_i = A_1 × · · · × A_n.
An element x of A_1 × · · · × A_n is written (x_1, . . . , x_n). If A_1 = · · · = A_n = A we write A^n = A × · · · × A, and call an element (x_1, . . . , x_n) an n-tuple of elements of A.

Remark 1.39. It should be obvious how to define analogous notions of union and intersection for (unindexed) families of sets, although we must introduce the notation; for example, the union of a family F is
⋃ F = ⋃_{A∈F} A := {x : there exists A ∈ F such that x ∈ A}.

If {A_i}_{i∈I} is an indexed family, we use the same notation ⋃_{i∈I} A_i for the union of the indexed family as for the union of the associated family {A_i : i ∈ I}; it is easy to see that the union is the same for both. Similarly for intersection. However, it does not make sense to try to form Cartesian products of (unindexed) families of sets!

Remark 1.40. Keep in mind that although we normally use subscript notation x_i rather than ordinary function notation x(i) for values of an element x of a Cartesian product, x is still just a function.
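An indexed family, being a function on I, is naturally modelled by a dictionary; the following sketch (ours, not from the notes) computes the union and intersection of Definition 1.38 for a small nested family:

```python
# Union and intersection of the indexed family A_i = {i, i+1, ..., 4}
# over the index set I = {1, 2, 3, 4}; here A_1 ⊇ A_2 ⊇ A_3 ⊇ A_4.
from functools import reduce

I = range(1, 5)
A = {i: set(range(i, 5)) for i in I}   # the indexed family as a function on I

union = set().union(*(A[i] for i in I))
intersection = reduce(set.intersection, (A[i] for i in I))

assert union == {1, 2, 3, 4}           # = A_1, the largest set
assert intersection == {4}             # = A_4, the smallest set
```

For a nested decreasing family the union is the first set and the intersection the last, as the assertions confirm.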

Remark 1.41. Note that if A_1 and A_2 are sets we now have two definitions of the Cartesian product A_1 × A_2: originally as a set of ordered pairs, and now as a certain set of functions from {1, 2} into A_1 ∪ A_2. It is important to realize that we could not have avoided the original definition and just left everything concerning Cartesian products until now—the above general definition of Cartesian product uses the concept of function, which was defined in terms of ordered pairs. It turns out that this is a "chicken and egg" problem: we needed the concept of "ordered pair" before we could define "function". However, we could have omitted the original definition of union and intersection of two sets, since this is subsumed by the above definition involving arbitrary families of sets and the original definition was not used to define anything else along the way.

Definition 1.42. A family F of sets is called pairwise disjoint if for all A, B ∈ F we have either A = B or A ∩ B = ∅.

Remark 1.43. How should we define what it means for an indexed family {A_i}_{i∈I} of sets to be pairwise disjoint? It seems simplest to let it mean the same as for the associated family {A_i : i ∈ I}. That is, we say {A_i}_{i∈I} is pairwise disjoint if for all i, j ∈ I, either A_i = A_j or A_i ∩ A_j = ∅. Note that this is not the same as "either i = j or A_i ∩ A_j = ∅".

Proposition 1.44 (Distributive Laws). Let A be a set, and let F be a family of sets. Then

A ∩ ⋃ F = ⋃_{B∈F} (A ∩ B)  and  A ∪ ⋂ F = ⋂_{B∈F} (A ∪ B).

Proof. First let x ∈ A ∩ ⋃ F. Then x ∈ A and x ∈ ⋃ F. Thus we can choose B ∈ F such that x ∈ B. Since x ∈ A, we have x ∈ A ∩ B. Hence x ∈ ⋃_{B∈F} (A ∩ B), and we have shown A ∩ ⋃ F ⊆ ⋃_{B∈F} (A ∩ B). These steps are reversible, so in fact A ∩ ⋃ F = ⋃_{B∈F} (A ∩ B).
For the other part, first let x ∈ A ∪ ⋂ F. Then x ∈ A or x ∈ ⋂ F.
Case 1. x ∈ A. For all B ∈ F, we have x ∈ A ∪ B since x ∈ A. Thus x ∈ ⋂_{B∈F} (A ∪ B) in this case.
Case 2. x ∈ ⋂ F. For all B ∈ F, we have x ∈ A ∪ B since x ∈ B. Thus x ∈ ⋂_{B∈F} (A ∪ B) in this case as well.
We have shown that if x ∈ A ∪ ⋂ F then x ∈ ⋂_{B∈F} (A ∪ B), and therefore A ∪ ⋂ F ⊆ ⋂_{B∈F} (A ∪ B). For the opposite containment, let x ∈ ⋂_{B∈F} (A ∪ B). To show x ∈ A ∪ ⋂ F, we must show either x ∈ A or x ∈ ⋂ F. Equivalently, we must show that if x ∉ A then x ∈ ⋂ F. Assume x ∉ A, and let B ∈ F. Since x ∈ ⋂_{B∈F} (A ∪ B), we have x ∈ A ∪ B. By assumption, x ∉ A, so we must have x ∈ B. We have shown that x ∈ ⋂ F, as desired, and this concludes the proof that ⋂_{B∈F} (A ∪ B) ⊆ A ∪ ⋂ F. ∎
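For a finite family the two distributive laws can be verified mechanically; a minimal sketch (ours, using Python sets, not from the notes):

```python
# Spot-check of the Distributive Laws (Proposition 1.44) for a
# finite family F of finite sets.
A = {1, 2, 3, 4}
F = [{2, 3}, {3, 4, 5}, {0, 3, 6}]

union_F = set().union(*F)          # ⋃F
inter_F = set.intersection(*F)     # ⋂F

assert A & union_F == set().union(*(A & B for B in F))
assert A | inter_F == set.intersection(*(A | B for B in F))
```

This tests only one family, of course; the proposition holds for arbitrary families by the proof above.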

Proposition 1.45 (De Morgan's Laws). Let F be a family of sets. Then

(⋃ F)ᶜ = ⋂_{A∈F} Aᶜ  and  (⋂ F)ᶜ = ⋃_{A∈F} Aᶜ.

Proof. First let x ∈ (⋃ F)ᶜ. Then x ∉ ⋃ F, so it is false that there exists A ∈ F such that x ∈ A. Thus for all A ∈ F we have x ∉ A, hence x ∈ Aᶜ. We have shown x ∈ ⋂_{A∈F} Aᶜ, therefore (⋃ F)ᶜ ⊆ ⋂_{A∈F} Aᶜ. For the opposite containment, just reverse the above steps.
For the other part, first let x ∈ (⋂ F)ᶜ. Then x ∉ ⋂ F, so it is false that for all A ∈ F we have x ∈ A. Thus we can choose A ∈ F such that x ∉ A, hence x ∈ Aᶜ. We have shown x ∈ ⋃_{A∈F} Aᶜ, therefore (⋂ F)ᶜ ⊆ ⋃_{A∈F} Aᶜ. For the opposite containment, just reverse the above steps. ∎
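De Morgan's Laws can likewise be spot-checked inside a finite "universe" U, anticipating Remark 1.46 below. A sketch (ours, not from the notes; the helper `c` is the complement relative to U):

```python
# Spot-check of De Morgan's Laws (Proposition 1.45) with all
# complements taken inside the finite universe U.
U = set(range(10))
F = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

def c(A):
    """Complement of A relative to the universe U."""
    return U - A

assert c(set().union(*F)) == set.intersection(*(c(A) for A in F))
assert c(set.intersection(*F)) == set().union(*(c(A) for A in F))
```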

Remark 1.46. In the statement of the above proposition, it is tacitly understood that there is some "universe" U with respect to which the complements are formed.

Proposition 1.47. Let f : A → B, and let F be a family of subsets of A and G a family of subsets of B. Then
(i) f(⋃_{C∈F} C) = ⋃_{C∈F} f(C);
(ii) f(⋂_{C∈F} C) ⊆ ⋂_{C∈F} f(C);
(iii) f⁻¹(⋃_{D∈G} D) = ⋃_{D∈G} f⁻¹(D); and
(iv) f⁻¹(⋂_{D∈G} D) = ⋂_{D∈G} f⁻¹(D).

Proof. (i) Let y ∈ f(⋃ F). Choose x ∈ ⋃ F such that y = f(x), then choose C ∈ F such that x ∈ C. We have y ∈ f(C) since y = f(x). Therefore y ∈ ⋃_{C∈F} f(C). The converse direction is proven by essentially reversing the steps. More precisely, if y ∈ ⋃_{C∈F} f(C) then we can choose C ∈ F such that y ∈ f(C), then we can choose x ∈ C such that y = f(x), giving y ∈ f(⋃ F) since x ∈ ⋃ F.
(ii) Let y ∈ f(⋂ F). Choose x ∈ ⋂ F such that y = f(x). For each C ∈ F we have x ∈ C and y = f(x), hence y ∈ f(C). Therefore y ∈ ⋂_{C∈F} f(C).
(iii) Let x ∈ f⁻¹(⋃ G). Then f(x) ∈ ⋃ G, so we can choose D ∈ G such that f(x) ∈ D. We have x ∈ f⁻¹(D), hence x ∈ ⋃_{D∈G} f⁻¹(D). These steps are reversible, so f⁻¹(⋃ G) = ⋃_{D∈G} f⁻¹(D).
(iv) Let x ∈ f⁻¹(⋂ G). Then f(x) ∈ ⋂ G. For each D ∈ G we have f(x) ∈ D, hence x ∈ f⁻¹(D). Therefore x ∈ ⋂_{D∈G} f⁻¹(D). These steps are reversible, so f⁻¹(⋂ G) = ⋂_{D∈G} f⁻¹(D). ∎

Remark 1.48. In the proof of (ii) above, the steps are not reversible, because if we take y ∈ ⋂_{C∈F} f(C), then we only know that for every C ∈ F there exists x ∈ C such that y = f(x)—we cannot conclude that there is a single x which works for every C.
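The failure of equality in part (ii), explained in Remark 1.48, is easy to exhibit concretely. A sketch (ours, not from the notes), again with x ↦ x²:

```python
# Proposition 1.47 (ii) and Remark 1.48: f(⋂F) ⊆ ⋂ f(C), with the
# inclusion proper when f is not 1-1.
def image(f, C):
    return {f(x) for x in C}

f = lambda x: x * x
F = [{-1, 0}, {0, 1}]

inter = set.intersection(*F)                       # ⋂F = {0}
lhs = image(f, inter)                              # f(⋂F) = {0}
rhs = set.intersection(*(image(f, C) for C in F))  # ⋂ f(C) = {0, 1}

assert lhs <= rhs and lhs != rhs   # proper inclusion
```

Here 1 lies in every f(C), witnessed by x = −1 in the first set and x = 1 in the second, but no single pre-image works for both sets, exactly as Remark 1.48 warns.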

2. The real numbers

Remark 2.1. The real numbers form the foundation of mathematical analysis. Although the real numbers can be constructed starting only from the counting numbers 1, 2, . . . , we shall not do this; instead we content ourselves with a careful listing of the properties of the real numbers. However, it is important to know that such a construction is possible, and that moreover the following axioms characterize the real numbers—this means that every real number system is just a relabelling of any other one.

The axioms fall into three categories: (1) the Field Axioms, listing the essential arithmetic properties, (2) the Order Axioms, listing the essential properties of inequalities, and (3) the all-important Completeness Axiom, guaranteeing that the "real number line" has no gaps.

Axiom 2.2 (Field Axioms). There exist functions (x, y) ↦ x + y and (x, y) ↦ xy from R² to R such that:
(i) for all x, y, z ∈ R we have (x + y) + z = x + (y + z) and (xy)z = x(yz);
(ii) for all x, y ∈ R we have x + y = y + x and xy = yx;
(iii) for all x, y, z ∈ R we have x(y + z) = xy + xz;
(iv) there exists 0 ∈ R such that for all x ∈ R we have 0 + x = x;
(v) there exists 1 ∈ R such that 1 ≠ 0 and for all x ∈ R we have 1x = x;
(vi) for all x ∈ R there exists −x ∈ R such that x + (−x) = 0;
(vii) for all x ∈ R \ {0} there exists x⁻¹ ∈ R such that xx⁻¹ = 1.

Remark 2.3. The elements 0 and 1 of R are unique. Also, for each x ∈ R the elements −x and x⁻¹ (if x ≠ 0) are unique.

Remark 2.4. It would not have been proper to try to put the universal quantifying phrase "for all x, y, z ∈ R" before the list of items (i)–(vii), because in (iv) and (v) the phrase "for all x ∈ R" comes after an existential quantifying phrase ("there exists . . . such that"), and it changes the logic to reverse the order of mixed quantifiers. Also, in each item we made a point of putting the universal quantifying phrase before the propositional expression involving the quantified variable(s), and we will (usually) continue to do so in the future, again to emphasize the order of the quantifiers. Although it is popular to write things like "there exists x such that P(x, y) for all y", or symbolically "∃x P(x, y) ∀y", this must be interpreted with care—if it is rewritten with all the quantifiers in front, where should the "∀y" go?
The convention is that it should be the same as "∃x ∀y P(x, y)", but in a complicated proposition involving many quantifiers this can cause confusion, so in these notes we try to put all quantifying phrases in front.

Definition 2.5. For each x, y ∈ R define
(i) x − y := x + (−y), and
(ii) x/y := xy⁻¹ (if y ≠ 0).

Notation and Terminology 2.6.
(i) N denotes the set of natural numbers, so that N = {1, 2, 3, . . .} is the smallest (where here "smaller than" means "is a subset of") subset of R satisfying both
(a) 1 ∈ N, and
(b) for all n ∈ N we have n + 1 ∈ N.
This property of N is the Principle of Mathematical Induction; this means that any subset A ⊆ N satisfying both 1 ∈ A and for all n ∈ A we have n + 1 ∈ A must coincide with N.

(ii) Z denotes the set of integers, so that Z = N ∪ (−N) ∪ {0} = {0, ±1, ±2, . . .} is the smallest subset of R containing N and closed under subtraction.
(iii) Q denotes the set of rational numbers, so that Q = {n/k : n, k ∈ Z, k ≠ 0} is the smallest subset of R containing Z and closed under division.
(iv) The set of irrational numbers is R \ Q.

Definition 2.7. For each x ∈ R and n ∈ Z define
x^n := x · x · · · x (n factors) if n ∈ N;
x^n := (x^{−n})^{−1} if −n ∈ N and x ≠ 0;
x^n := 1 if n = 0 and x ≠ 0.

Remark 2.8. All the usual properties of algebra follow from Axiom 2.2. For example:
(i) for all x ∈ R we have 0x = 0;
(ii) for all x ∈ R we have −x = (−1)x;
(iii) (−1)² = 1;
(iv) for all x, y ∈ R, if xy = 0 then either x = 0 or y = 0.

Axiom 2.9 (Order Axioms). There exists a subset P ⊆ R such that:
(i) for all x ∈ R, exactly one of x ∈ P, −x ∈ P, or x = 0 holds;
(ii) for all x, y ∈ P we have x + y ∈ P and xy ∈ P.

Definition 2.10. For each x, y ∈ R define
(i) x < y if y − x ∈ P;
(ii) x > y if y < x;
(iii) x ≤ y if x < y or x = y;
(iv) x ≥ y if y ≤ x.

Remark 2.11. All the usual properties of inequalities follow from Axiom 2.9. For example, for all x, y, z ∈ R we have:
(i) exactly one of x < y, x > y, or x = y holds;
(ii) if x < y then x + z < y + z;
(iii) if z > 0, then x < y implies xz < yz;
(iv) if x < y and y < z, then x < z;
(v) xy > 0 if and only if either x > 0 and y > 0, or x < 0 and y < 0;
(vi) xy < 0 if and only if either x > 0 and y < 0, or x < 0 and y > 0.

Remark 2.12. We must resist the temptation to regard inequalities as mundane—in fact techniques of manipulating inequalities are crucial in the development of the theory of analysis (which includes the material in this course).
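The sign rules (v)–(vi) of Remark 2.11 can be spot-checked numerically; a minimal sketch (ours, not from the notes; the helper names are our own):

```python
# Remark 2.11 (v): xy > 0 iff x and y have the same (nonzero) sign;
# (vi): xy < 0 iff they have opposite signs. Checked on sample values.
def same_sign(x, y):
    return (x > 0 and y > 0) or (x < 0 and y < 0)

def opposite_sign(x, y):
    return (x > 0 and y < 0) or (x < 0 and y > 0)

samples = [-3, -1, 2, 5]
for x in samples:
    for y in samples:
        assert (x * y > 0) == same_sign(x, y)
        assert (x * y < 0) == opposite_sign(x, y)
```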

Definition 2.13. For each x ∈ R define
|x| := x if x ≥ 0, and |x| := −x if x < 0.

Lemma 2.14. For all a, b ∈ R we have:
(i) ±a ≤ |a|;
(ii) |a| ≤ b if and only if −b ≤ a ≤ b.

Proof. (i) More precisely, we must show two inequalities: a ≤ |a| and −a ≤ |a|. We first show a ≤ |a|.
Case 1. a ≥ 0. Then a = |a|, so a ≤ |a|.
Case 2. a < 0. Then a < 0 ≤ |a|.
For the other inequality, we have −a ≤ |−a| = |a|.
(ii) First assume |a| ≤ b. Then a ≤ b and −a ≤ b. Multiplying the latter inequality by −1, we get a ≥ −b. Combining with a ≤ b, we get −b ≤ a ≤ b.
Conversely, assume −b ≤ a ≤ b. Then −a ≤ b since −b ≤ a. Combining with a ≤ b, we get ±a ≤ b. Since |a| is either a or −a, we have |a| ≤ b. ∎

Proposition 2.15 (Triangle Inequality). For all x, y ∈ R,
|x + y| ≤ |x| + |y|;
||x| − |y|| ≤ |x − y| (alternate version).

Proof. First, x ≤ |x| and y ≤ |y|, so
x + y ≤ |x| + |y|.
Next, −x ≤ |x| and −y ≤ |y|, so
−(x + y) ≤ |x| + |y|.
Thus
−(|x| + |y|) ≤ x + y ≤ |x| + |y|,
so |x + y| ≤ |x| + |y|.
For the alternate version,
|x| = |x − y + y| ≤ |x − y| + |y|,
so |x| − |y| ≤ |x − y|. Then also
|y| − |x| ≤ |y − x| = |x − y|.
Thus ||x| − |y|| ≤ |x − y|. ∎

Definition 2.16. (i) Given a, b, x ∈ R, we say x is between a and b if a ≤ x ≤ b or b ≤ x ≤ a. Furthermore, we say x is strictly between if the inequalities are strict (this means equality is not allowed).

(ii) A subset of R which contains every number between any two of its elements is called an interval. Every interval has one of the following forms (where a and b denote real numbers with a ≤ b):
(a) [a, b] := {x : a ≤ x ≤ b} (closed, bounded);
(b) [a, ∞) := {x : a ≤ x} (left-half-closed, unbounded);
(c) (−∞, b] := {x : x ≤ b} (right-half-closed, unbounded);
(d) (−∞, ∞) := R (open, unbounded);
(e) (a, b) := {x : a < x < b} (open, bounded);
(f) (a, ∞) := {x : a < x} (open, unbounded);
(g) (−∞, b) := {x : x < b} (open, unbounded);
(h) [a, b) := {x : a ≤ x < b} (left-half-closed or right-half-open, bounded);
(i) (a, b] := {x : a < x ≤ b} (left-half-open or right-half-closed, bounded).
(iii) An interval [a, b], (a, b), [a, b), or (a, b] is called degenerate if a = b and nondegenerate if a < b. We often assume without comment that our intervals are nondegenerate.

Remark 2.17. Note that if a = b then the closed interval [a, b] is the singleton (which means a set containing exactly one element) {a}, while the intervals (a, b), [a, b), and (a, b] are empty.

Definition 2.18. Let A ⊆ R and x ∈ A. Then x is called a maximum of A if for all y ∈ A we have y ≤ x. Such an x is unique if it exists, and is denoted max A. Similarly for minimum and min A.

Remark 2.19. It is easy to see that min A = −max(−A) (where −A := {−x : x ∈ A}). Using this and the elementary properties of inequalities, from any fact concerning maxima we can more-or-less automatically deduce a corresponding fact concerning minima.

Remark 2.20. If A is finite (and you can look in the next section for the definition of this!) and nonempty, then max A exists. Similarly for min A.

Theorem 2.21 (Well-Ordering Principle). Every nonempty subset of N has a minimum.

Proof. Suppose A ⊆ N has no minimum. Then 1 ∉ A, otherwise 1 = min A since 1 = min N. Let n ∈ N, and assume 1, 2, . . . , n ∉ A. Then n + 1 ∉ A, otherwise n + 1 = min A since n + 1 = min N \ {1, 2, . . . , n}. By induction, for all n ∈ N we have n ∉ A. Thus A = ∅. ∎

Definition 2.22. Let A ⊆ R.
(i) A real number x is called an upper bound of A if for all y ∈ A we have y ≤ x.
(ii) A is called bounded above if it has an upper bound.
(iii) Similarly for lower bound and bounded below. Also similarly elsewhere.

Remark 2.23. If A is bounded above, then the set of upper bounds of A is an interval which is unbounded to the right.

Trivially, every real number is simultaneously an upper and a lower bound for ∅.
As with max and min, it is easy to see that x is a lower bound of A if and only if −x is an upper bound of −A; consequently any fact concerning upper bounds gives more-or-less automatically a corresponding fact concerning lower bounds.

Remark 2.24. The following two statements are very easy generalizations of the Well-Ordering Principle: (1) every nonempty set of integers which is bounded below has a minimum, and (2) every nonempty set of integers which is bounded above has a maximum. In fact, when these facts are invoked, we just say "by the Well-Ordering Principle".

Definition 2.25. For all nonnegative integers n, k with k ≤ n define
\binom{n}{k} := \frac{n!}{k!\,(n-k)!},
where n! := 1 · 2 · · · n if n > 0 and n! := 1 if n = 0.

Proposition 2.26 (Pascal's Triangle). With the above notation, if k > 0 then
\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.

Proof. We have
\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-(k-1))!} + \frac{n!}{k!\,(n-k)!}
= n! \left( \frac{k + (n-k+1)}{k!\,(n-k+1)!} \right)
= \frac{n!\,(n+1)}{k!\,(n+1-k)!}
= \frac{(n+1)!}{k!\,((n+1)-k)!}
= \binom{n+1}{k}. ∎

Theorem 2.27 (Binomial Theorem). For all a, b ∈ R and every nonnegative integer n,
(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.

Proof. First, if n = 0 the left-hand side is (a + b)^0 = 1, while the right-hand side is
\sum_{k=0}^{0} \binom{0}{k} a^{-k} b^k = \binom{0}{0} a^0 b^0 = 1,
so the equation is true.
Next, let n be a nonnegative integer, and assume
(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.
Then
(a + b)^{n+1} = (a + b)(a + b)^n = (a + b) \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
= \sum_{k=0}^{n} \binom{n}{k} a^{n-k+1} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1}
= a^{n+1} + \sum_{k=1}^{n} \binom{n}{k} a^{n-k+1} b^k + \sum_{k=0}^{n-1} \binom{n}{k} a^{n-k} b^{k+1} + b^{n+1}
= a^{n+1} + \sum_{k=1}^{n} \binom{n}{k} a^{n-k+1} b^k + \sum_{k=1}^{n} \binom{n}{k-1} a^{n-(k-1)} b^k + b^{n+1}
= a^{n+1} + \sum_{k=1}^{n} \left( \binom{n}{k} + \binom{n}{k-1} \right) a^{n+1-k} b^k + b^{n+1}
= a^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} a^{n+1-k} b^k + b^{n+1}
= \sum_{k=0}^{n+1} \binom{n+1}{k} a^{n+1-k} b^k,
so the desired equation holds for n + 1. By induction, the equation holds for every nonnegative integer n. ∎
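Both Pascal's rule and the Binomial Theorem are easy to check numerically for small n; a sketch (ours, not from the notes), using the standard library's `math.comb` for binomial coefficients:

```python
# Numerical check of Pascal's Triangle (Proposition 2.26) and the
# Binomial Theorem (Theorem 2.27) for small n.
from math import comb

# Pascal: C(n+1, k) = C(n, k-1) + C(n, k); note comb(n, k) = 0 for k > n.
for n in range(6):
    for k in range(1, n + 2):
        assert comb(n + 1, k) == comb(n, k - 1) + comb(n, k)

# Binomial Theorem with a = 2, b = 3, n = 5: (2 + 3)^5 = 3125.
a, b, n = 2, 3, 5
expansion = sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))
assert expansion == (a + b) ** n == 3125
```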
Corollary 2.28 (Bernoulli's Inequality). For all a ≥ 0 and n ∈ N,
(1 + a)^n ≥ 1 + na.

Proof. By the Binomial Theorem, the left-hand side is a sum of nonnegative terms, and the right-hand side comprises the first two terms. ∎

Definition 2.29. Let A ⊆ R.
(i) A real number x is called a supremum of A if x is an upper bound of A and for every upper bound y of A we have x ≤ y. Such an x is unique if it exists, and is denoted sup A.
(ii) Similarly for infimum and inf A. Also similarly elsewhere.

Remark 2.30. (i) Thus, if sup A exists it is the minimum of the set of upper bounds of A; indeed, the set of upper bounds of A is precisely [sup A, ∞).
(ii) A has a maximum if and only if A has a supremum and sup A ∈ A, in which case max A = sup A. Similarly for min and inf. (The interval (0, 1) shows that a set can have a supremum without having a maximum.)
(iii) If A has a supremum then A is bounded above, but not conversely. The empty set gives the counterexample: we have already noticed that every real number is an upper bound for ∅, which means the set of upper bounds of ∅ is R, and of course R has no minimum, so by (i) the supremum of ∅ cannot exist.
(iv) Thus, if sup A exists then A is nonempty and bounded above.

The last axiom of the real numbers, certainly the deepest (indeed, its crucial role in analysis was not recognized until the 19th century), guarantees that there are no other obstructions to the existence of a supremum:

Axiom 2.31 (Completeness Axiom). Every nonempty subset of R which is bounded above has a supremum.

Remark 2.32. As with max and min, it is easy to see that if A ⊆ R is nonempty and bounded below then A has an infimum, and moreover inf A = −sup(−A); consequently any fact concerning suprema gives more-or-less automatically a corresponding fact concerning infima.

Remark 2.33. If A ⊆ Z is nonempty and bounded above, we do not need the Completeness Axiom to know A has a supremum—the Well-Ordering Principle tells us that in fact A has a maximum.

Proposition 2.34 (Approximation of Suprema). Let A ⊆ R be nonempty and bounded above. Then sup A is the unique number s satisfying both
(i) for all t > s and all x ∈ A we have x < t;
(ii) for all t < s there exists x ∈ A such that t < x.

Proof. We must prove two things: that sup A satisfies properties (i)–(ii), and that there is at most one real number satisfying (i)–(ii). First let t > sup A. Then t is bigger than the upper bound sup A, so for all x ∈ A we have x < t. We have shown sup A satisfies (i). For (ii), let t < sup A.
Then t is not an upper bound of A, so there exists x ∈ A such that t < x.

For the uniqueness, let s₁, s₂ ∈ R both satisfy (i)–(ii), and suppose s₁ ≠ s₂. Without loss of generality s₁ < s₂. Then letting s = s₂ and t = s₁ in (ii) we see that we can choose x ∈ A such that s₁ < x. But then letting s = s₁ and t = x in (i) we see that for all y ∈ A we have y < x, which is a contradiction since x ∈ A. □

Theorem 2.35 (Archimedean Principle). For all x ∈ R there exists n ∈ N such that n > x.

Proof. Suppose not. Then there exists x ∈ R such that for all n ∈ N we have n ≤ x. In particular, N is bounded above. Of course N ≠ ∅, so by the Completeness Axiom sup N exists. Then by Approximation of Suprema there exists n ∈ N such that n > sup N − 1. But then n + 1 ∈ N and n + 1 > sup N, giving a contradiction. □

Remark 2.36. More generally (and this follows very easily from the Archimedean Principle), for every real number x there exist integers k and n such that k < x < n, and when this fact is invoked we just say "by the Archimedean Principle".

Corollary 2.37 (Floor Theorem). For all x ∈ R there exists a unique n ∈ Z such that
\[ n \le x < n + 1. \]

Proof. Put A = {k ∈ Z : k ≤ x}. By the (generalized version of the) Archimedean Principle, the set A is nonempty. Moreover, by its definition, the set A has x as an upper bound. Thus A is a nonempty set of integers which is bounded above. Then by the (generalized version of the) Well-Ordering Principle, A has a maximum, which we call n. So, by construction n is the largest integer less than or equal to x. In particular, n + 1 must be greater than x since n + 1 is an integer larger than n. Thus we have n ≤ x < n + 1.

For the uniqueness, suppose k ∈ Z, k ≤ x < k + 1, and k ≠ n. Without loss of generality n < k. Then n + 1 ≤ k since n and k are integers. But then we have
\[ x < n + 1 \le k \le x, \]
a contradiction. □

Remark 2.38. We called the above result the "Floor Theorem" because the unique integer n is popularly called the "floor" of x.

Theorem 2.39 (Density of Rationals). Strictly between any two distinct real numbers there exists a rational number.

Proof. Let a < b. Then b − a > 0, so by the Archimedean Principle we can choose n ∈ N such that n > 1/(b − a). Then 1/n < b − a. By the Floor Theorem we can choose an integer k such that k − 1 ≤ na < k. Then we have
\[ a < \frac{k}{n} \le a + \frac{1}{n} < b, \]
so k/n is a rational number strictly between a and b. □

Remark 2.40. The Completeness Axiom is what allows for decimal expansions of real numbers: Given x ≥ 0, first define
\[ n = \max\{ l \in Z : l \le x \}, \]
which exists by the Well-Ordering and the Archimedean Principles. Next, define d_1, d_2, … by
\[ d_1 = \max\left\{ l \in Z : n + \frac{l}{10} \le x \right\}, \qquad
   d_2 = \max\left\{ l \in Z : n + \frac{d_1}{10} + \frac{l}{10^2} \le x \right\}, \]
and continuing inductively. Note that each d_i is an integer between 0 and 9, inclusive. Put
\[ A = \left\{ n + \sum_{i=1}^{j} \frac{d_i}{10^i} : j \in N \right\}. \]
Then x = sup A, and we say

x = n.d_1d_2⋯ is a decimal expansion of x. On the other hand, given a nonnegative integer n and integers d_1, d_2, … between 0 and 9, again put
\[ A = \left\{ n + \sum_{i=1}^{j} \frac{d_i}{10^i} : j \in N \right\}. \]
Then A is nonempty and bounded above, so we can put x = sup A, and again we say x = n.d_1d_2⋯ is a decimal expansion of x.

If x < 0 we find a decimal expansion n.d_1d_2⋯ of |x|, and we say x = −n.d_1d_2⋯ is a decimal expansion of x.

We recall (without proof) some elementary properties of decimal expansions:
(i) R coincides with the set of all decimal expansions. That is, not only does every real number have a decimal expansion, but every decimal expansion determines a real number.
(ii) The decimal expansion of a real number x is unique unless x is of the form n/10^k for some integers n and k, in which case x has two decimal expansions, one ending in 0's and the other ending in 9's.
(iii) A real number x is rational if and only if its decimal expansion (more precisely, each of its decimal expansions) is eventually repeating (that is, there is a string of digits which repeats forever starting at some point in the decimal expansion). Thus, for example, the number .1010010001⋯ is irrational since its decimal expansion does not repeat.
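The inductive choice of digits in Remark 2.40 is a greedy algorithm, and for rational x it can be carried out exactly. A Python sketch (our own illustration; the name `decimal_digits` is made up, and exact `Fraction` arithmetic is used to avoid floating-point rounding): each d_i is the floor of the scaled remainder, which is precisely the largest integer keeping the partial sum ≤ x.

```python
import math
from fractions import Fraction

def decimal_digits(x, places):
    # Remark 2.40: d_i is the largest integer l with
    # n + d_1/10 + ... + l/10^i <= x, i.e. the floor of the
    # remainder (x - partial sum) scaled by 10^i.
    n = math.floor(x)
    digits, partial = [], Fraction(n)
    for i in range(1, places + 1):
        d = math.floor((x - partial) * 10 ** i)
        digits.append(d)
        partial += Fraction(d, 10 ** i)
    return n, digits

# 1/7 = 0.142857142857... -- eventually repeating, as befits a rational
assert decimal_digits(Fraction(1, 7), 6) == (0, [1, 4, 2, 8, 5, 7])
```

The eventual repetition visible here is exactly property (iii) above.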

Remark 2.41. In the preceding remark we defined d_1, d_2, … inductively; that is, the choice of each successive d_n depended upon the choices of the preceding d_k's for k < n. The justification for this is actually a subtle point of set theory (because we are making "infinitely many choices"), but we will allow ourselves to do this willy-nilly.
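Returning briefly to Theorem 2.39: its proof is constructive, and the construction can be mirrored in a short Python sketch (illustrative only; `rational_between` is our name, and floating-point arithmetic is used, so extreme inputs could misbehave). The integer n > 1/(b − a) comes from the Archimedean Principle, and k = ⌊na⌋ + 1 is the integer supplied by the Floor Theorem.

```python
import math
from fractions import Fraction

def rational_between(a, b):
    # Follow the proof of Theorem 2.39: pick n > 1/(b - a); the Floor
    # Theorem gives k with k - 1 <= na < k, and then a < k/n < b.
    assert a < b
    n = math.floor(1 / (b - a)) + 1
    k = math.floor(n * a) + 1
    return Fraction(k, n)

q = rational_between(math.sqrt(2), math.sqrt(3))
assert math.sqrt(2) < q < math.sqrt(3)
```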

Corollary 2.42 (Density of Irrationals). Strictly between any two distinct real numbers there exists an irrational number.

Proof. Let a < b, and pick any positive irrational number x. Then ax < bx, so by Density of Rationals we can choose y ∈ Q such that ax < y < bx. Without loss of generality, y ≠ 0. Then
\[ a < \frac{y}{x} < b, \]
and y/x is irrational since y is rational and x is irrational. □

Remark 2.43. The Completeness Axiom is also what guarantees the existence of roots: for all x ∈ R and n ∈ N,
(i) if n is even and x ≥ 0, then there exists a unique y ≥ 0 such that y^n = x, while
(ii) if n is odd then there exists a unique y ∈ R such that y^n = x.
The justification of these facts is deferred until after the Intermediate Value Theorem (and we will not run the risk of circular reasoning since we will only use roots in examples until then). In both of the above cases, y is called the nth root of x, denoted y = x^{1/n}. Then for each n ∈ N and k ∈ Z define
\[ x^{k/n} = (x^{1/n})^k \]
(where x ≥ 0 if n is even). It is natural to ask "is x^{k/n} also equal to (x^k)^{1/n}?", and fortunately the answer is "yes", at least in the cases where everything makes sense. More generally, it can be proven by a tedious algebraic argument that the usual Laws of Exponents hold for rational powers, that is, for all r, s ∈ Q and x, y ∈ R we have:
(i) x^r x^s = x^{r+s};
(ii) x^r / x^s = x^{r−s};
(iii) (x^r)^s = x^{rs};
(iv) (xy)^r = x^r y^r
(as long as we don't try to divide by 0 or take an even root of a negative number). Since we will not need these in the formal development of the theory, we do not prove them here. Much later (after the material on integration) we will have more powerful machinery allowing us to handle even more general exponents and prove the Laws of Exponents without resorting to mundane algebraic manipulation. Nevertheless, it is interesting to observe that rational powers tend to give irrational numbers, in the following precise sense: for all k, n ∈ N, if k^{1/n} is not an integer, then it is irrational. We only prove the case k = n = 2; the general case can be proved using the Fundamental Theorem of Arithmetic.
Since we have not yet proven the existence of square roots, we must state the formal result as follows:

Proposition 2.44. For all x ∈ R, if x² = 2 then x is irrational.

Proof. We argue by contradiction. Let x ∈ R, and suppose x² = 2 and x is rational. Then there exist p, q ∈ Z such that x = p/q, and without loss of generality p and q are not both even. Then 2q² = p². Thus p² is even, hence p itself is even. There exists r ∈ Z with p = 2r. Then
\[ 2q^2 = 4r^2, \]
so q² = 2r². But then q², hence q, is even, giving a contradiction. □

3. Countability

Definition 3.1. A set A is called:
(i) finite if either A = ∅ or for some n ∈ N there exists a 1-1 function from {1, …, n} onto A;
(ii) infinite if A is not finite;
(iii) countable if A is finite or there exists a 1-1 function from N onto A;
(iv) countably infinite if A is countable and infinite;
(v) uncountable if A is not countable.

Remark 3.2. We adopt the point of view that we understand finite sets pretty well, although strictly speaking the familiar properties of finite sets should be proven, which would involve routine induction arguments. Infinite sets, on the other hand, have some amazing properties (for example, A is infinite if and only if there exists a 1-1 function from A onto a proper subset of A), and have mystified a lot of smart people for a long time. The modern theory of infinite sets was only worked out during the late 19th and early 20th centuries. Here we will only need to know the difference between infinite sets which are "manageable" (countable) and which are "unimaginably huge" (uncountable). One fairly obvious fact is that if there is a 1-1 function from A onto B, then A is countable if and only if B is.

Lemma 3.3. For every nonempty set A, the following are equivalent:
(i) A is countable;
(ii) there exist a countable set B and a function from B onto A;
(iii) there exists a function from N onto A;
(iv) there exists a 1-1 function from A to N;
(v) there exist a countable set B and a 1-1 function from A to B.

Proof. (i) ⇒ (ii). This is trivial, since we could take the identity function on A.

(ii) ⇒ (iii). Assume B is countable and f : B → A is onto. Then B ≠ ∅.

Case 1. B is finite. For some n ∈ N we can choose a 1-1 onto function g : {1, …, n} → B. Define h : N → {1, …, n} by
\[ h(k) = \min\{k, n\}. \]
Then h is onto, so f ∘ g ∘ h : N → A is onto.

Case 2. B is infinite. Choose a 1-1 onto function g : N → B. Then f ∘ g : N → A is onto.

(iii) ⇒ (iv). Given an onto function f : N → A, define g : A → N by
\[ g(x) = \min f^{-1}(\{x\}). \]
Note that g is well defined by the Well-Ordering Principle, since onto-ness of f means that for every x ∈ A the pre-image f⁻¹({x}) is nonempty. Then g is 1-1 since if x ≠ y then {x} and {y} are disjoint, hence so are f⁻¹({x}) and f⁻¹({y}), thus these latter two sets must have distinct minima.

(iv) ⇒ (v). This is trivial, since N is countable.

(v) ⇒ (i). By (i) ⇒ (iv), without loss of generality A ⊆ N.

Case 1. A is finite. Then A is countable.

Case 2. A is infinite. Define f : N → A inductively by
\[ f(n) = \begin{cases} \min A & \text{if } n = 1, \\ \min\bigl(A \setminus f(\{1, \dots, n-1\})\bigr) & \text{if } n > 1. \end{cases} \]
Then f is 1-1 onto. □

Theorem 3.4. The following sets are countable:
(i) N²;
(ii) every countable union of countable sets;
(iii) every finite product of countable sets;
(iv) Z;
(v) Z²;
(vi) Q.

Proof. (i) Define f : N² → N by f(n, k) = 2ⁿ3ᵏ. Then f is 1-1 by the Fundamental Theorem of Arithmetic, so N² is countable.

(ii) More precisely, by a "countable union of countable sets" we mean a union ∪F where the family F is countable and for every A ∈ F the set A is countable. Without loss of generality F ≠ ∅ and every A ∈ F is nonempty, because F = ∅ makes the whole union empty, while an empty element of F has no effect on the union. Let g : N → F be onto, and for each A ∈ F let h_A : N → A be onto. Define f : N² → ∪F by
\[ f(n, k) = h_{g(n)}(k). \]
Then f is onto, so ∪F is countable since N² is.

(iii) More precisely, by a "finite product of countable sets" we mean a Cartesian product ∏_{i∈I} A_i where the index set I is finite and each coordinate set A_i is countable. Without loss of generality I ≠ ∅, otherwise the product has exactly one element (because there is exactly one function whose domain is the empty set, namely the empty set of ordered pairs). Then without loss of generality I = {1, …, n} for some n ∈ N. Moreover without loss of generality n > 1, otherwise the product is just A₁, which is countable by hypothesis. Then we have
\[ \prod_{i=1}^{n} A_i = \bigcup_{x \in A_n} \left( \prod_{i=1}^{n-1} A_i \times \{x\} \right) \]
(or, more precisely, there is an obvious 1-1 function from the left-hand set onto the right-hand set), so the result follows by induction.

(iv) We have Z = N ∪ (−N) ∪ {0}, a finite union of countable sets.

(v) This is a finite product of countable sets.

(vi) Define f : Z² → Q by
\[ f(n, k) = \begin{cases} \frac{n}{k} & \text{if } k \ne 0, \\ 0 & \text{if } k = 0. \end{cases} \]
Then f is onto, so Q is countable since Z² is. □

Theorem 3.5. The following sets are uncountable:
(i) every nondegenerate interval;
(ii) R;
(iii) the irrational numbers.

Proof. (i) It suffices to show [0, 1] is uncountable (because every nondegenerate interval contains an image of [0, 1] under a 1-1 function, and a set containing an uncountable subset must itself be uncountable), and moreover it suffices to show the subset

\[ A := \{ .a_1 a_2 \cdots : \text{for all } i \in N \text{ we have } a_i = 1 \text{ or } 2 \} \]
of [0, 1] (chosen to avoid the nuisance of ambiguous decimal expansions) is uncountable. Let f : N → A. We show f is not onto, which suffices. For each n ∈ N let
\[ f(n) = .a_{n1} a_{n2} a_{n3} \cdots. \]
Define b = .b_1b_2⋯ ∈ A by
\[ b_i = \begin{cases} 1 & \text{if } a_{ii} = 2, \\ 2 & \text{if } a_{ii} = 1. \end{cases} \]
Then b differs from each f(n) in the nth decimal place, so b ∉ ran f.

(ii) This follows immediately from (i), again since R contains the uncountable subset [0, 1].

(iii) We have R = Q ∪ Q^c, and Q is countable, so Q^c must be uncountable since R is. □

4. Sequences

Definition 4.1. (i) A sequence is a function with domain N.
(ii) If x is a sequence, then for each n ∈ N the nth term of x is defined as
\[ x_n := x(n). \]

Notation and Terminology 4.2. A sequence x is written (x_n)_{n=1}^∞, or just (x_n). If all the values x_n of a sequence (x_n) are in a set S, we say (x_n) is a sequence in S. Thus, the set of all sequences in S is just the Cartesian product ∏₁^∞ S. Remember that although we use subscript notation x_n rather than ordinary function notation x(n), a sequence is just a function.

Standing Hypothesis 4.3. All our sequences will be in R unless otherwise specified.

Remark 4.4. Actually, it is sometimes convenient to allow sequences to "start somewhere other than 1", that is, to allow the domain to be of the form {n ∈ Z : n ≥ k} for some k ∈ Z, and in fact k = 0 is frequently used. The essential properties of sequences do not depend upon the "starting point", and for convenience we develop the general theory for sequences starting at 1.

Definition 4.5. Let (x_n) be a sequence and x ∈ R. We say (x_n) converges to x if for all ε > 0 there exists k ∈ N such that for all n ∈ N,
\[ \text{if } n \ge k \text{ then } |x_n - x| < \varepsilon. \]

Remark 4.6. In the above ε-condition for convergence, it is crucial to keep track of the dependence: the k depends upon the ε. More precisely, k is not uniquely determined by ε—if we find one k which works, then any larger k will also work. Also, if we have a k which works for a particular ε, then it will also work for any larger ε. The important thing to remember is that if the ε is decreased then we will probably have to increase the k.
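The ε ↦ k dependence can be made concrete for the sequence x_n = 1/n, which converges to 0. A Python sketch (illustrative only; the helper name `witness_k` is ours): given ε > 0, k = ⌊1/ε⌋ + 1 satisfies the definition, and shrinking ε forces k to grow.

```python
import math

def witness_k(eps):
    # For x_n = 1/n and limit 0: every n >= k = floor(1/eps) + 1
    # satisfies |1/n - 0| <= 1/k < eps, as the definition requires.
    return math.floor(1 / eps) + 1

for eps in (0.5, 0.1, 0.003):
    k = witness_k(eps)
    assert all(abs(1 / n - 0) < eps for n in range(k, k + 1000))
```

Note that only a finite stretch of the tail is checked by machine; the inequality 1/n ≤ 1/k < ε is what covers all n ≥ k.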

Notation and Terminology 4.7. When (x_n) converges to x, we write x_n → x as n → ∞, or
\[ x_n \xrightarrow{\,n \to \infty\,} x, \]
or just x_n → x if n → ∞ is understood.

Remark 4.8. There can be at most one x for which x_n → x. This result is completely routine, but we formalize it to indicate a typical "ε-argument":

Lemma 4.9. Let (x_n) be a sequence and x, y ∈ R. If x_n → x and x_n → y, then x = y.

Proof. Let ε > 0. Choose k ∈ N such that if n ≥ k then |x_n − x|, |x_n − y| < ε/2. Then
\[ |x - y| \le |x - x_k| + |x_k - y| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Since ε > 0 was arbitrary, we must have x = y. □

Remark 4.10. In the above proof, we used an expression of the form "s, t < r", which is intended as an abbreviation for "s < r and t < r". Also, we should justify how we could "choose k ∈ N such that if n ≥ k then |x_n − x|, |x_n − y| < ε/2": Since x_n → x we can choose k₁ ∈ N such that n ≥ k₁ implies |x_n − x| < ε/2, and similarly we can choose k₂ ∈ N such that n ≥ k₂ implies |x_n − y| < ε/2, and then we can take k = max{k₁, k₂}. This sort of "choosing something to perform several (compatible) jobs simultaneously" occurs so often that it is normally done without comment, as in the above proof.

Remark 4.11. By the preceding lemma, if x_n → x then x is uniquely determined by (x_n), so it makes sense to formalize this relationship:

Definition 4.12. Let (x_n) be a sequence, and let x ∈ R. If x_n → x, we also say x is the limit of (x_n), written x = lim_{n→∞} x_n, or just lim x_n.

Definition 4.13. If a sequence (x_n) converges, we also say it is convergent; otherwise we say it diverges, or is divergent.

Example 4.14. (i) Every constant sequence converges to its constant value.
(ii) 1/n → 0.

Lemma 4.15. Let (x_n) be a sequence and x ∈ R. Then x_n → x if and only if |x_n − x| → 0.

Proof. Assume x_n → x. Given ε > 0, choose k ∈ N such that if n ≥ k then |x_n − x| < ε. Since for every t ∈ R we have ||t| − 0| = |t|, we have shown that |x_n − x| → 0. The steps are reversible, so the desired equivalence holds. □

Theorem 4.16 (Squeeze Theorem). Let (x_n), (y_n), and (z_n) be sequences. Assume that for all n ∈ N we have x_n ≤ y_n ≤ z_n, and that lim x_n = lim z_n = x. Then y_n → x.

Proof. Given ε > 0, choose k ∈ N such that if n ≥ k then |x_n − x|, |z_n − x| < ε. Then n ≥ k implies
\[ x - \varepsilon < x_n \le y_n \le z_n < x + \varepsilon, \]
so |y_n − x| < ε. □

Definition 4.17. (i) A subset A of R is called bounded if it is bounded above and below.
(ii) A real-valued function is called bounded if its range is bounded in R. More generally, if B ⊆ dom f, we say f is bounded on B if the restriction f|_B is bounded. In particular, a sequence (x_n) is called bounded if its range {x_n : n ∈ N} is bounded in R.

Remark 4.18. For all A ⊆ R, the following are equivalent:
(i) A is bounded;
(ii) there exists M ∈ R such that for all x ∈ A we have |x| ≤ M;
(iii) there exist t, M ∈ R such that for all x ∈ A we have |x − t| ≤ M;
(iv) for all t ∈ R there exists M ∈ R such that for all x ∈ A we have |x − t| ≤ M.

Proposition 4.19. Every convergent sequence is bounded.

Proof. Let x_n → x. Choose k ∈ N such that if n ≥ k then |x_n − x| ≤ 1. Put M = max{1, |x₁ − x|, …, |x_{k−1} − x|} (where without loss of generality k > 1). Then for all n ∈ N we have |x_n − x| ≤ M. □

Remark 4.20. In the above proof we applied the definition of convergence with ε = 1 to choose k such that n ≥ k implies |x_n − x| ≤ 1. However, the inequality given to us by the definition is the strict one: "|x_n − x| < 1". In this particular case we only needed the weak inequality, which can certainly be satisfied since a strict inequality implies the corresponding weak inequality. This sort of "taking a weaker conclusion than the definition (or theorem, hypothesis, etc.) allows" is done often, since it is clearer to use only the property specifically required in the context.

Definition 4.21. Let A ⊆ R and t ∈ R. We say t is a cluster point of A if for all ε > 0 there exists x ∈ A \ {t} such that |x − t| < ε.

Notation and Terminology 4.22. A′ denotes the set of all cluster points of A.

Remark 4.23. In the definition of cluster point, once we specify x ∈ A we can satisfy the requirement that x ≠ t by requiring 0 < |x − t|. Thus we can rephrase the condition as "for all ε > 0 there exists x ∈ A such that 0 < |x − t| < ε".

Thus, t is a cluster point of A if and only if we can find points of A "arbitrarily close to but different from" t. It turns out that we can find lots of them:

Lemma 4.24. If A ⊆ R and t ∈ A′ then for all ε > 0 the set (t − ε, t + ε) ∩ A is infinite.

Proof. Given ε > 0, choose x₁ ∈ A such that 0 < |x₁ − t| < ε, and for all n = 2, 3, … inductively choose x_n ∈ A such that
\[ 0 < |x_n - t| < |x_{n-1} - t|. \]
Then for each n we have |x_n − t| ≤ |x₁ − t| < ε, so x_n ∈ (t − ε, t + ε). Also, if n > k then |x_n − t| < |x_k − t|, so we must have x_n ≠ x_k. Thus the sequence (x_n) is 1-1, hence has infinite range. Therefore {x_n : n ∈ N} is an infinite subset of A ∩ (t − ε, t + ε). □

Proposition 4.25. Let A ⊆ R and t ∈ R. Then t ∈ A′ if and only if there exists a sequence in A \ {t} converging to t.

Proof. First assume t ∈ A′. For each n ∈ N choose x_n ∈ A such that 0 < |x_n − t| < 1/n. Then (x_n) is a sequence in A \ {t}, and |x_n − t| → 0 by the Squeeze Theorem, so x_n → t.

Conversely, let (x_n) be a sequence in A \ {t} converging to t, and let ε > 0. Choose n ∈ N such that |x_n − t| < ε. Since x_n ≠ t, we have 0 < |x_n − t|, and we have shown t ∈ A′. □

Definition 4.26. Given sequences (x_n) and (y_k), we say (y_k) is a subsequence of (x_n) if there exist n₁ < n₂ < ⋯ in N such that for all k ∈ N we have y_k = x_{n_k}.

Remark 4.27. Thus, a subsequence of (x_n) is a composition of the function n ↦ x_n : N → R with a certain function k ↦ n_k : N → N. Note that the requirement n₁ < n₂ < ⋯ implies that for all k ∈ N we have n_k ≥ k.

Of course, (x_n) is a subsequence of itself, taking n_k = k for all k. A moment's reflection reveals that a given sequence has a bewildering profusion of subsequences; certainly there is no simple way to write them all down. Nevertheless, there is some commonality among them:

Lemma 4.28. Let (x_n) be a sequence and x ∈ R. If (x_n) converges to x then every subsequence of (x_n) also converges to x.

Proof. Let x_n → x and n₁ < n₂ < ⋯ in N. Given ε > 0, choose j ∈ N such that if n ≥ j then |x_n − x| < ε. Then k ≥ j implies n_k ≥ k ≥ j, hence |x_{n_k} − x| < ε. Thus x_{n_k} → x. □

Remark 4.29. A given sequence (x_n) may not converge, but it may have many convergent subsequences. It is useful to have a test for which real numbers are limits of subsequences of (x_n):

Lemma 4.30. Let (x_n) be a sequence and x ∈ R. Then the following are equivalent:

(i) (x_n) has a subsequence converging to x;
(ii) either x is a cluster point of the range {x_n : n ∈ N} or the set {n ∈ N : x_n = x} is infinite;
(iii) for all ε > 0 and k ∈ N there exists n ≥ k such that |x_n − x| < ε.

Proof. (i) ⇒ (ii). Let (y_k) be a subsequence of (x_n) converging to x, and further assume x_n = x for only finitely many n. We'll show x ∈ (ran(x_n))′. Let ε > 0, and choose j ∈ N such that k ≥ j implies |y_k − x| < ε. Then |x_n − x| < ε for infinitely many n, and we can have |x_n − x| = 0 for only finitely many n, so there exists n such that 0 < |x_n − x| < ε, hence x is a cluster point of ran(x_n).

(ii) ⇒ (iii). Let ε > 0 and k ∈ N. First assume x ∈ (ran(x_n))′. Then the set {n ∈ N : |x_n − x| < ε} is infinite, hence contains an element n such that n ≥ k. On the other hand, assume the set {n ∈ N : x_n = x} is infinite. Then it contains an element n such that n ≥ k, and we certainly have |x_n − x| < ε.

(iii) ⇒ (i). Choose n₁ ∈ N such that |x_{n₁} − x| < 1, and for each k = 2, 3, … inductively choose n_k > n_{k−1} such that |x_{n_k} − x| < 1/k. Then we have n₁ < n₂ < ⋯, so that (x_{n_k}) is a subsequence. By the Squeeze Theorem, this subsequence converges to x. □

Remark 4.31. With the notation of the above lemma, if x is a cluster point of the set A := {x_n : n ∈ N}, it might seem that Proposition 4.25 tells us there is a subsequence of (x_n) converging to x; not only does it not tell us this, but in fact Proposition 4.25 gives us no information: although it tells us there is a sequence in the set A (in fact in the slightly smaller set A \ {x}) converging to x, there is nothing in the statement of that proposition which gives us a subsequence of (x_n).

Lemma 4.32. For all sequences (x_n) and (y_n), if x_n → 0 and (y_n) is bounded, then x_n y_n → 0.

Proof. Choose M ∈ R such that for all n ∈ N we have |y_n| ≤ M. Without loss of generality M > 0. Given ε > 0, choose k ∈ N such that if n ≥ k then |x_n| < ε/M. Then n ≥ k implies
\[ |x_n y_n| \le M |x_n| < \varepsilon. \qquad\square \]

Remark 4.33. In the above proof we imposed a further condition on M, namely positivity. First of all, there really was no loss of generality: of course we have M ≥ 0 since it is an upper bound for {|y_n| : n ∈ N}; if we happened to initially choose M = 0 (which would only be possible if the sequence (y_n) were identically 0), we could swap it for a larger M and still have an upper bound. Anyway, the reason we imposed the further condition M > 0 was so that ε/M would be defined. An alternative trick would be to use ε/(M + 1), which makes sense no matter what our initial choice of M was. Then we would arrive at the inequalities
\[ |x_n y_n| \le M |x_n| < \frac{M \varepsilon}{M+1} < \varepsilon. \]
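Lemma 4.32 is the reason that, for example, sin(n)/n → 0: the factor 1/n tends to 0 while the sequence (sin n) is bounded, with M = 1. A quick numerical sanity check in Python (illustrative only, and of course only finitely many terms are inspected):

```python
import math

# x_n = 1/n -> 0 and y_n = sin(n) is bounded by M = 1, so by the
# lemma x_n y_n = sin(n)/n -> 0; concretely, |sin(n)/n| <= 1/n.
for n in range(1, 2000):
    assert abs(math.sin(n) / n) <= 1 / n
```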

Proposition 4.34. For all convergent sequences (x_n) and (y_n),

(i) lim(x_n + y_n) = lim x_n + lim y_n;
(ii) lim(x_n y_n) = (lim x_n)(lim y_n);
(iii) lim(c x_n) = c lim x_n if c ∈ R;
(iv) lim(x_n / y_n) = (lim x_n)/(lim y_n) if y_n ≠ 0 for all n and lim y_n ≠ 0.

Proof. Let x = lim x_n and y = lim y_n.

(i) Given ε > 0, choose k ∈ N such that if n ≥ k then |x_n − x|, |y_n − y| < ε/2. Then n ≥ k implies
\[ |(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \varepsilon. \]

(ii) We have

\[ x_n y_n - xy = x_n y_n - x y_n + x y_n - xy = (x_n - x) y_n + x (y_n - y) \to 0, \]
since x_n − x and y_n − y both go to 0, and (y_n) and the constant sequence (x) are both bounded.

(iii) This is immediate from (ii) and convergence of a constant sequence to its constant value.

(iv) By (ii), it suffices to show 1/y_n → 1/y. Given ε > 0, choose l ∈ N such that if n ≥ l then |y_n − y| < |y|/2. Then n ≥ l implies |y_n| ≥ |y|/2. Now choose k ≥ l such that if n ≥ k then |y_n − y| < ε|y|²/2. Then n ≥ k implies
\[ \left| \frac{1}{y_n} - \frac{1}{y} \right| = \frac{|y - y_n|}{|y_n y|} \le \frac{2}{|y|^2}\,|y_n - y| < \varepsilon. \qquad\square \]

Remark 4.35. Note that a couple of times in the above proof we used the shortcut of choosing something to perform several consistent jobs at once.

Proposition 4.36. For all convergent sequences (x_n) and (y_n), if x_n ≤ y_n for all n ∈ N, then lim x_n ≤ lim y_n.

Proof. Put z_n = y_n − x_n and z = lim z_n. Then z_n ≥ 0 for all n, and by the above proposition it suffices to show z ≥ 0. Suppose not, that is, assume z < 0. Choose n ∈ N such that
\[ |z_n - z| < |z|. \]
Then
\[ z_n < z + |z| = 0, \]
a contradiction. □

Definition 4.37. (i) The set of extended real numbers is defined as
\[ \overline{R} := R \cup \{ \pm\infty \}, \]
where +∞ and −∞ are distinct objects which are not real numbers.
(ii) We write ∞ for +∞.
(iii) The ordering of R is extended to R̄ by −∞ < x < ∞ for x ∈ R.
(iv) Terms such as maximum or minimum, upper or lower bound, and supremum or infimum are applied to sets of extended real numbers in the obvious way.

Proposition 4.38. Let A ⊆ R̄. Then:
(i) A is bounded above, and if A ≠ ∅ then A has a supremum.
(ii) If A ⊆ R and A ≠ ∅, then A is bounded above in R if and only if sup A < ∞.
(iii) Similarly for bounded below and infimum.

Proof. (i) Since R̄ has a maximum (namely, ∞), it is bounded above, hence so is every subset, in particular A. For the other part, assume A ≠ ∅.

Case 1. ∞ ∈ A. Then ∞ = max A, so ∞ = sup A.

Case 2. A = {−∞}. Then −∞ = sup A.

Case 3. ∞ ∉ A and B := A ∩ R ≠ ∅. If B is bounded above in R, then s := sup B ∈ R by the Completeness Axiom, and we have s = sup A since −∞ ≤ s. On the other hand, if B is unbounded above in R, then for all M ∈ R there exists x ∈ B such that x > M, so ∞ is the only upper bound of B in R̄, hence the only upper bound of A, thus ∞ = sup A.

(ii) First, if A is bounded above in R, then sup A ∈ R, so sup A < ∞.

Conversely, the proof of (i) shows that if A is unbounded above in R then sup A = ∞.

(iii) Either use an argument similar to the above, or multiply by −1 and use the above results. □

Definition 4.39 (Infinite Limits). Let (x_n) be a sequence.

(i) We say (x_n) diverges to ∞ if for all M ∈ R there exists k ∈ N such that for all n ∈ N,
\[ \text{if } n \ge k \text{ then } x_n > M. \]
(ii) Our notation for infinite limits is similar to that for finite limits; for example, we write x_n → ∞ or lim x_n = ∞.
(iii) Similarly for −∞ (reverse the inequalities).
(iv) We write x_n → x in R̄ if either x ∈ R and (x_n) converges to x or x = ±∞ and (x_n) diverges to x.

Example 4.40. n → ∞.

Proposition 4.41. For any sequence (x_n), we have x_n → ∞ if and only if there exists k ∈ N such that both
(i) for all n ∈ N, if n ≥ k then x_n > 0, and
(ii) the sequence (1/x_n)_{n≥k} converges to 0.
Similarly for −∞.

Proof. We only give the argument for the case ∞; the case −∞ is similar. Since the important properties of the sequence (x_n) do not change if we delete finitely many terms, without loss of generality x_n > 0 for all n ∈ N.

First assume x_n → ∞. Given ε > 0, choose l ∈ N such that n ≥ l implies x_n > 1/ε. Then n ≥ l implies 1/x_n < ε.

Conversely, assume 1/x_n → 0. Given M ∈ R, without loss of generality M > 0, so we can choose l ∈ N such that n ≥ l implies 1/x_n < 1/M. Then n ≥ l implies x_n > M. □

Proposition 4.42. Let (x_n) and (y_n) be sequences, and assume x_n → ∞. Then:

(i) (y_n) bounded below implies x_n + y_n → ∞;
(ii) inf y_n > 0 implies x_n y_n → ∞;
(iii) (y_n) bounded and x_n ≠ 0 for all n implies y_n / x_n → 0.
Similarly for x_n → −∞, etc.

Proof. (i) Choose t ∈ R such that y_n ≥ t for all n ∈ N. Given M ∈ R, choose k ∈ N such that n ≥ k implies x_n > M − t. Then n ≥ k implies
\[ x_n + y_n \ge x_n + t > M. \]

(ii) Choose t > 0 such that y_n ≥ t for all n ∈ N. Given M ∈ R, choose k ∈ N such that n ≥ k implies x_n > M/t. Then n ≥ k implies x_n y_n ≥ x_n t > M.

(iii) By the preceding proposition we have 1/x_n → 0. Then y_n/x_n = y_n (1/x_n) → 0 since (y_n) is bounded. □

Remark 4.43. Many results about convergent sequences, for example the Squeeze Theorem, have obvious analogues for infinite limits.

Corollary 4.44. For all x ∈ R,
(i) if x > 1 then x^n → ∞, and
(ii) if |x| < 1 then x^n → 0.

Proof. (i) We have x = 1 + a with a > 0. Then by Bernoulli's Inequality,
\[ x^n = (1+a)^n \ge 1 + na. \]
Since 1 + na → ∞ by the preceding proposition, we have x^n → ∞ by the Squeeze Theorem for infinite limits.

(ii) Case 1. x = 0. Then x^n → 0 trivially.

Case 2. x ≠ 0. Then 1/|x| > 1, so
\[ \left( \frac{1}{|x|} \right)^n \to \infty \]
by the first part of the corollary. Hence |x|^n → 0 by the rules of infinite limits. Thus x^n → 0, since |x^n| = |x|^n. □

Lemma 4.45. If A is a nonempty subset of R, then there exists a sequence (x_n) in A such that x_n → sup A. Similarly for inf.

Proof. Put s := sup A.

Case 1. s = ∞. For each n ∈ N choose x_n ∈ A such that x_n > n. Then x_n → ∞ by the Squeeze Theorem.

Case 2. s ∈ R. For each n ∈ N choose x_n ∈ A such that x_n > s − 1/n. Then
\[ s - \frac{1}{n} < x_n \le s < s + \frac{1}{n}, \]
so |x_n − s| < 1/n. Hence x_n → s by the Squeeze Theorem. □

Definition 4.46. Let (x_n) be a sequence.
(i) The limit supremum of (x_n) is defined as

\[ \limsup_{n \to \infty} x_n := \inf_{k \in N} \, \sup_{n \ge k} x_n. \]
(ii) Similarly for limit infimum.

Remark 4.47. In more detail: for each k ∈ N let s_k be the supremum of the tail end {x_n : n ≥ k}. Then {s_k : k ∈ N} is a nonempty subset of R̄. As such, this set has an infimum, which is defined to be the lim sup of (x_n).

Lemma 4.48. For any sequence (x_n), we have lim inf x_n = −lim sup(−x_n).

Lemma 4.49. For any sequence (x_n), we have lim sup x_n < ∞ if and only if (x_n) is bounded above. Similarly for lim inf and bounded below.

Proof. lim sup x_n < ∞ if and only if there exists k ∈ N such that sup_{n≥k} x_n < ∞, if and only if sup_n x_n < ∞. □
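The double extremum in Definition 4.46 can be watched numerically. The following Python sketch (our illustration; the truncation at N = 10000 is an assumption standing in for the genuinely infinite tails) tabulates tail suprema s_k for x_n = (−1)^n (1 + 1/n), whose lim sup is 1: the s_k are nonincreasing in k, and their infimum over k is the lim sup.

```python
def x(n):
    # A divergent sequence with lim sup = 1 and lim inf = -1
    return (-1) ** n * (1 + 1 / n)

N = 10_000  # truncation: max over n in [k, N) stands in for sup over n >= k

def tail_sup(k):
    return max(x(n) for n in range(k, N))

sups = [tail_sup(k) for k in range(1, 50)]
assert all(s2 <= s1 for s1, s2 in zip(sups, sups[1:]))  # s_k nonincreasing
assert abs(min(sups) - 1) < 0.05  # inf_k s_k approaches lim sup = 1
```

Note the monotonicity is automatic: the tail {x_n : n ≥ k + 1} is a subset of {x_n : n ≥ k}, so s_{k+1} ≤ s_k.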

Lemma 4.50. Let (x_n) be a sequence, and let x = lim sup x_n. Then:
(i) for all a < x and for all k ∈ N there exists n ≥ k such that x_n > a, and
(ii) for all a > x there exists k ∈ N such that for all n ≥ k we have x_n < a.
Similarly for lim inf.

Proof. (i) Let k ∈ N. Since a < x ≤ sup_{n ≥ k} x_n, by Approximation of Suprema there exists n ≥ k such that x_n > a.
(ii) Since a > inf_{k ∈ N} sup_{n ≥ k} x_n, by Approximation of Infima there exists k ∈ N such that sup_{n ≥ k} x_n < a, so that for all n ≥ k we have x_n < a. 
Theorem 4.51. For any sequence (x_n), lim sup x_n is the maximum extended real number x such that (x_n) has a subsequence (y_k) with y_k → x. Similarly for lim inf and minimum.

Proof. Put x = lim sup x_n.
Case 1. x = ∞. Then for all M ∈ R and for all k ∈ N there exists n ≥ k such that x_n > M. Letting M = k = 1, we can choose n_1 ∈ N such that x_{n_1} > 1. Then for all k > 1 we can inductively choose n_k > n_{k−1} such that x_{n_k} > k. Then x_{n_k} → ∞ by the Squeeze Theorem.
Case 2. x = −∞. Given M ∈ R, we can choose k ∈ N such that for all n ≥ k we have x_n < M. Thus x_n → −∞.
Case 3. x ∈ R. We will show that there exists a subsequence (x_{n_k}) such that for all k ∈ N we have
x − 1/k < x_{n_k} < x + 1/k,
which will imply that x_{n_k} → x by the Squeeze Theorem. First choose j_1 ∈ N such that for all n ≥ j_1 we have x_n < x + 1, and then choose n_1 ≥ j_1 such that x_{n_1} > x − 1. For k > 1, first choose j_k ∈ N such that for all n ≥ j_k we have x_n < x + 1/k, and then inductively choose n_k > max{n_{k−1}, j_k} such that x_{n_k} > x − 1/k.
Now let y > x, and choose z ∈ (x, y). Then there exists k ∈ N such that for all n ≥ k we have x_n < z. Thus no subsequence of (x_n) can converge (or diverge) to y.
This proves the statements concerning the lim sup, and the arguments for the lim inf are similar. 

Corollary 4.52. Let (x_n) be a sequence and let x be an extended real number. Then x_n → x if and only if lim inf x_n = lim sup x_n = x.

Proof. First assume x_n → x. Then every subsequence converges (or diverges) to x, so lim inf x_n = lim sup x_n = x.
Conversely, assume lim inf x_n = lim sup x_n = x. If x = ±∞, an argument from the preceding proof shows x_n → x. On the other hand, if x ∈ R, given ε > 0 choose k ∈ N such that both x − ε < inf_{n ≥ k} x_n and sup_{n ≥ k} x_n < x + ε. Then n ≥ k implies x − ε < x_n < x + ε, hence |x_n − x| < ε. Thus x_n → x. 

Theorem 4.53 (Bolzano-Weierstrass Theorem). Every bounded sequence in R has a convergent subsequence.

Proof. Let (x_n) be a bounded sequence in R. Then lim sup x_n is a real number. As we have seen, (x_n) has a subsequence converging to lim sup x_n. 
Corollary 4.54. Every bounded infinite subset of R has a cluster point.
Proof. Let A be a bounded infinite subset of R. Then there is a 1-1 sequence (x_n) in A. Since A is bounded, so is the sequence (x_n). By the Bolzano-Weierstrass Theorem, (x_n) has a convergent subsequence, say with limit x. Then either x is a cluster point of the range of (x_n), hence of the superset A, or x_n = x for infinitely many n. Since (x_n) is 1-1, we can have x_n = x for at most one value of n. Therefore x must be a cluster point of A. 
Remark 4.55. It turns out that the above corollary is actually equivalent to the Bolzano-Weierstrass Theorem, and in fact the distinction between them is sometimes blurred.

Definition 4.56. A sequence (x_n) is called Cauchy if for all ε > 0 there exists k ∈ N such that for all n, j ∈ N,
if n, j ≥ k then |x_n − x_j| < ε.
Theorem 4.57. A sequence converges if and only if it is Cauchy.

Proof. Let (x_n) be a sequence. First assume (x_n) converges, and let x = lim x_n. Given ε > 0, choose k ∈ N such that n ≥ k implies |x_n − x| < ε/2. Then n, j ≥ k imply
|x_n − x_j| ≤ |x_n − x| + |x − x_j| < ε/2 + ε/2 = ε,
so (x_n) is Cauchy.
Conversely, assume (x_n) is Cauchy. Claim: (x_n) is bounded. For, we can choose k ∈ N such that n, j ≥ k imply |x_n − x_j| < 1. Put M = max{1, |x_1 − x_k|, ..., |x_{k−1} − x_k|} (where without loss of generality k > 1). Then |x_n − x_k| ≤ M for all n ∈ N, proving the claim. So, by the Bolzano-Weierstrass Theorem (x_n) has a convergent subsequence, say x_{n_i} → x. Given ε > 0, choose k ∈ N such that n, j ≥ k imply |x_n − x_j| < ε/2. Also choose l ∈ N such that i ≥ l implies |x_{n_i} − x| < ε/2. Let n ≥ k, and put i = max{l, k}. Then n_i ≥ i ≥ k, so
|x_n − x| ≤ |x_n − x_{n_i}| + |x_{n_i} − x| < ε/2 + ε/2 = ε.
Thus (x_n) itself converges. 
Remark 4.58. Note that the above proof that every Cauchy sequence is bounded is (unsurprisingly) similar to the proof that every convergent sequence is bounded.
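The Cauchy condition can be probed numerically. The sketch below (not from the notes; the series and cutoffs are our choices) contrasts the partial sums of Σ 1/n², whose tails cluster, with the partial sums of the harmonic series Σ 1/n, which fail the Cauchy condition at ε = 1/2 because |x_{2n} − x_n| = Σ_{j=n+1}^{2n} 1/j ≥ 1/2 for every n:

```python
# Partial sums x_N = sum_{n=1}^N term(n), as a finite list.
def partial_sums(term, N):
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += term(n)
        out.append(s)
    return out

sq = partial_sums(lambda n: 1.0 / n ** 2, 4000)    # Cauchy-looking
harm = partial_sums(lambda n: 1.0 / n, 4000)       # not Cauchy

# For sum 1/n^2: past index k, all gaps are tiny (the sequence is increasing,
# so the largest gap among indices >= k is x_last - x_k).
k = 1000
assert sq[-1] - sq[k - 1] < 2.0 / k

# For sum 1/n: the gap |x_{2n} - x_n| never drops below 1/2.
assert all(harm[2 * n - 1] - harm[n - 1] >= 0.5 for n in range(1, 2000))
```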

Proposition 4.59. For all sequences (xn) and (yn), if (yn) converges, then: (i) lim sup(xn + yn) = lim sup xn + lim yn (where + t is interpreted as for all t R); ±∞ ±∞ ∈ (ii) lim sup(xnyn) = lim sup xn lim yn if lim yn > 0 (where ( )t is interpreted as for all t > 0).  ±∞ ±∞ Proposition 4.60. For all sequences (xn) and (yn), if xn yn for all n N ≤ ∈ then lim sup xn lim sup yn and lim inf xn lim inf yn. ≤ ≤ Proof. For all k N, supn k xn supn k yn, so ∈ ≥ ≤ ≥ lim sup xn = inf sup xn inf sup yn = lim sup yn. k n k ≤ k n k ≥ ≥ Similarly for lim inf.  5. Limits Standing Hypothesis 5.1. From now on, all our functions f will satisfy dom f, ran f R unless otherwise specified. ⊆ Definition 5.2. Let f : A R, t A0, and L R. We say f(x) goes to L → ∈ ∈ x t as x goes to t, written f(x) L as x t, or f(x) → L, if for all  > 0 there exists δ > 0 such that for→ all x →A, −−→ ∈ if 0 < x t < δ then f(x) L < . | − | | − | Remark 5.3. With the above notation, a routine Triangle Inequality argu- ment shows that L is uniquely determined by f and t, so it makes sense to formalize this relationship: Definition 5.4. Let f : A R, t A , and L R. If f(x) L as x t, → ∈ 0 ∈ → → we also say L is the limit of f at t, written L = limx t f(x). → Example 5.5. (i) limx t x = t. → (ii) limx t L = L. → Proposition 5.6 (Sequential Characterization of Limits). Let f : A R → and t A0. Then limx t f(x) exists if and only if for every sequence (xn) ∈ → in A t converging to t, the sequence (f(xn)) converges (in which case \{ } limn f(xn) = limx t f(x)). →∞ → Proof. First assume limx t f(x) = L, and let (xn) be a sequence in A t converging to t. Given → > 0, choose δ > 0 such that 0 < x t < δ implies\{ } | − | f(x) L < . Now choose k N such that n k implies xn t < δ. | − | ∈ ≥ | − | Then n k implies f(xn) L < . Thus f(xn) L. Conversely,≥ assume| the condition− | regarding sequences.→ First note that if (xn) and (x ) are two sequences in A t converging to t, then n0 \{ } lim f(xn) = lim f(x0 ). 
n n n →∞ →∞ To see this, define

z_k := x_n if k = 2n − 1, and z_k := x′_n if k = 2n.

Then zk t, so limk f(zk) exists by hypothesis. Since (f(xn)) and → →∞ (f(xn0 )) are both subsequences of (f(zk)), we must have lim f(xn) = lim f(xn0 ). Consequently, we can define L to be the common limit of all the sequences (f(xn)) for (xn) in A t converging to t. We will show limx t f(x) = L. Suppose not. Then there\{ } exists  > 0 such that for all δ > 0→ there exists x A such that ∈ 0 < x t < δ and f(x) L . | − | | − | ≥ In particular, for all n N there exists xn A such that ∈ ∈ 1 0 < xn t < and f(xn) L . | − | n | − | ≥ But then xn t and f(xn) L, a contradiction. → 6→  Definition 5.7. Given f, g : A R, we say f g if f(x) g(x) for all x A, and similarly for , <, and→ >. ≤ ≤ ∈ ≥ Proposition 5.8 (Squeeze Theorem for Limits). Let f, g, h: A R and t A . If f g h and → ∈ 0 ≤ ≤ lim f(x) = lim h(x) = L, x t x t → → then limx t g(x) = L. → Proof. Immediate from the Sequential Characterization of Limits and the Squeeze Theorem for Sequences. More precisely, suppose (xn) is any se- quence in A t converging to t. Then the sequences (f(xn)), (g(xn)), \{ } and (h(xn)) satisfy the hypothesis of the Squeeze Theorem for Sequences, and the Sequential Characterization of Limits tells us limn f(xn) = →∞ limn h(xn) = L, so we have g(xn) L. Again by the Sequential Char- →∞ → acterization of Limits, we conclude that limx t g(x) = L.  → Remark 5.9. In the above proof, the application of the Sequential Charac- terization of Limits and the corresponding result for sequences is so routine that in similar situations we will omit the details.
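Definition 5.2 can be exercised on a concrete function. The sketch below (our choices of function, δ, and sampling; a spot-check, not a proof) verifies lim_{x→2} x² = 4 using δ = min(1, ε/5), which works because |x − 2| < 1 forces |x + 2| < 5:

```python
# Check the epsilon-delta condition for f(x) = x^2 at t = 2 over sampled points.
def delta_works(eps, samples=10000):
    delta = min(1.0, eps / 5.0)
    for i in range(1, samples):
        # Sample points with 0 < |x - 2| < delta (x = 2 itself is excluded
        # by the definition, and satisfies the bound trivially anyway).
        x = 2.0 - delta + 2.0 * delta * i / samples
        if x != 2.0 and not abs(x * x - 4.0) < eps:
            return False
    return True

assert all(delta_works(eps) for eps in [1.0, 0.1, 0.01])
```

The algebra behind the choice: |x² − 4| = |x − 2| · |x + 2| < δ · 5 ≤ ε.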

Proposition 5.10. Let f, g: A → R and t ∈ A′. If lim_{x→t} f(x) = 0 and g is bounded, then lim_{x→t} f(x)g(x) = 0.
Proof. Immediate from the Sequential Characterization of Limits and the corresponding result for sequences. 

Proposition 5.11. Let f, g: A → R and t ∈ A′. If f and g both have limits at t, then:

(i) lim_{x→t} (f(x) + g(x)) = lim_{x→t} f(x) + lim_{x→t} g(x);
(ii) lim_{x→t} cf(x) = c lim_{x→t} f(x) if c ∈ R;
(iii) lim_{x→t} f(x)g(x) = (lim_{x→t} f(x)) (lim_{x→t} g(x));
(iv) lim_{x→t} f(x)/g(x) = (lim_{x→t} f(x)) / (lim_{x→t} g(x)) if lim_{x→t} g(x) ≠ 0 and 0 ∉ ran g.

Proof. Immediate from the Sequential Characterization of Limits and the corresponding result for sequences.  Proposition 5.12. Let A, B R, t A , L B , M R, f : A B L , ⊆ ∈ 0 ∈ 0 ∈ → \{ } and g : B R. If limx t f(x) = L and limy L g(y) = M, then → → → lim g f(x) = M. x t → ◦ Proof. Let (xn) be a sequence in A t converging to t. Then (f(xn)) is a sequence in B L converging to\{L,} by the Sequential Characterization \{ } of Limits. Hence g f(xn) = g(f(xn)) M, again by the Sequential ◦ → Characterization of Limits.  Proposition 5.13. Let f, g : A R and t A , and suppose f and g both → ∈ 0 have limits at t. If f g, then limx t f(x) limx t g(x). ≤ → ≤ → Proof. Immediate from the Sequential Characterization of Limits and the corresponding result for sequences. 

Definition 5.14 (Restricted Limits). Let f : A R, B A, and t B0. The limit of f at t through B is defined as → ⊆ ∈ lim f(x) := lim f B (x). x t x t | x→B → ∈ Definition 5.15 (One-Sided Limits). Let f : A R and t R. → ∈ (i) Put B := A (t, ). If t B0 then the right-hand limit of f at t is defined as ∩ ∞ ∈ f(t+) := lim f(x). x t x→B ∈ (ii) We also write lim f(x) = lim f(x) = f(t+). x t x t+ ↓ → (iii) Similarly for left-hand limit. Lemma 5.16. Let f : A R and t R, and suppose → ∈ t A (t, ) 0 A ( , t) 0. ∈ ∩ ∞  ∩ ∩ −∞  Then f has a limit at t if and only if both one-sided limits exist and are equal, in which case lim f(x) = f(t ) = f(t+). x t → − Proof. One direction is trivial, so assume f(t ) = f(t+) = L. Given  > 0, choose δ > 0 such that for all x A, t δ <− x < t implies f(x) L <  and t < x < t + δ implies f(x)∈ L <− . Then 0 < x |t < δ−implies| f(x) L < . | − | | − | | − |  Definition 5.17 (Monotone Functions). A function f : A R is called: → (i) increasing if for all x, y A we have x < y implies f(x) f(y); (ii) decreasing if for all x, y ∈ A we have x < y implies f(x) ≤ f(y); ∈ ≥ ANALYSIS LECTURES 37

(iii) monotone if it is increasing or decreasing; (iv) strictly increasing if for all x, y A we have x < y implies f(x) < f(y); ∈ (v) strictly decreasing if for all x, y A we have x < y implies f(x) > f(y); ∈ (vi) strictly monotone if it is strictly increasing or strictly decreasing. Proposition 5.18. Let f :[a, b] R be increasing. Then: → (i) a < t b implies f(t ) exists and equals supxt f(x), and moreover f(≤t+) f(t); (iii) a s <≥ t b implies f(s+) f(t ). ≤ ≤ ≤ − Similarly if f is decreasing.

Proof. (i) Put L = supx 0, choose c < t such that f(c) > L . Then ≤ − c < x < t L  < f(c) f(x) L f(x) L < . ⇒ − ≤ ≤ ⇒ | − | Thus f(t ) = L. (ii) Similar− to (i). (iii) Choose c (s, t). Then ∈ f(s+) = inf f(x) f(c) sup f(x) = f(t ). x>s ≤ ≤ x 0 there exists M R ∞such that for all →x A, → ∞ ∈ ∈ if x > M then f(x) L < . | − | (ii) With the notation of (i), L is uniquely determined by f, and is called the limit of f at , written limx f(x). ∞ →∞ (iii) Similarly for limx f(x). →−∞ Definition 5.20 (Infinite Limits). (i) Let f : A R and t A0. We say f(x) goes to as x t, written f(x)→ as x∈ t, or ∞ → → ∞ → limx t f(x) = , if for all M R there exists δ > 0 such that for all x→ A, ∞ ∈ ∈ if 0 < x t < δ then f(x) > M. | − | (ii) Similarly for f(x) , and also for f(x) as x . → −∞ → ±∞ → ±∞ Remark 5.21. (i) The Sequential Characterization of Limits extends appropriately to limits at infinity and to infinite limits. For ex- ample, if dom f is not bounded above, then limx f(x) = L if →∞ and only if for every sequence (xn) in dom f, xn implies → ∞ f(xn) L. → 38 JOHN QUIGG

(ii) The basic properties of limits extend to limits at infinity. (iii) The basic properties of infinite limits of sequences extend to infinite limits of functions.
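Proposition 5.18 can be illustrated on a concrete monotone function. The sketch below (the function and sampling scheme are our choices) approximates the one-sided limits of the increasing function f = floor at t = 1, where f(t−) = sup_{x<t} f(x) = 0 and f(t+) = inf_{x>t} f(x) = 1 = f(1):

```python
import math

# One-sided limits of floor at t = 1, approximated by approaching from each side.
t = 1.0
left = max(math.floor(t - 1.0 / n) for n in range(1, 1000))   # ~ sup_{x < t} f(x)
right = min(math.floor(t + 1.0 / n) for n in range(1, 1000))  # ~ inf_{x > t} f(x)

assert left == 0 and right == 1   # f(1-) = 0 < f(1) = f(1+) = 1: a jump discontinuity
```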

6. Continuity Standing Hypothesis 6.1. Recall that we are assuming that all our func- tions f satisfy dom f, ran f R unless otherwise specified. ⊆ Definition 6.2. Let f : A R and t A. → ∈ (i) f is called continuous at t if for all  > 0 there exists δ > 0 such that for all x A, ∈ if x t < δ then f(x) f(t) < . | − | | − | (ii) Otherwise we say f is discontinuous at t, or has a discontinuity at t. (iii) f is called continuous if it is continuous at each element of A. More generally, if B A, we say f is continuous on B if f is continuous at each element⊆ of B. Theorem 6.3 (Sequential Characterization of Continuity). Let f : A R → and t A. Then f is continuous at t if and only if for every sequence (xn) ∈ in A, if xn t then f(xn) f(t). → → Proof. First assume f is continuous at t, and let (xn) be a sequence in A converging to t. Given  > 0, choose δ > 0 such that for all x A, if x t < δ then f(x) f(t) < . Now choose k N such that n k ∈implies | − | | − | ∈ ≥ xn t < δ. Then n k implies f(xn) f(t) < . Thus f(xn) f(t). | − | ≥ | − | → Conversely, assume xn t implies f(xn) f(t). Suppose f is not continuous at t. Choose → > 0 such that for all→δ > 0 there exists x A such that x t < δ and f(x) f(t) . In particular, for all n N∈we | − | | − | ≥ ∈ can choose xn A such that ∈ 1 xn t < and f(xn) f(t) . | − | n | − | ≥ Then xn t by the Squeeze Theorem, but f(xn) f(t), giving a contra- → 6→ diction.  Proposition 6.4. Let f, g : A R and t A. If f and g are both continu- ous at t, then so are: → ∈ (i) f + g; (ii) cf if c R; (iii) fg; ∈ (iv) f if 0 ran g. g 6∈ Proof. If xn t, then f(xn) f(t) and g(xn) g(t), so: → → → (i) (f + g)(xn) = f(xn) + g(xn) f(t) + g(t) = (f + g)(t). → (ii) (cf)(xn) = cf(xn) cf(t) = (cf)(t). → (iii) (fg)(xn) = f(xn)g(xn) f(t)g(t) = (fg)(t). → ANALYSIS LECTURES 39

(iv) (f/g)(x_n) = f(x_n)/g(x_n) → f(t)/g(t) = (f/g)(t).
The result now follows from the Sequential Characterization of Continuity. 
Theorem 6.5 (Limit Characterization of Continuity). Let f: A → R and t ∈ A.
(i) If t ∉ A′, then f is continuous at t.
(ii) If t ∈ A′, then f is continuous at t if and only if
lim_{x→t} f(x) = f(t).
Proof. (i) Since t ∉ A′, we can choose δ > 0 such that for all x ∈ A \ {t} we have |x − t| ≥ δ. Then for any ε > 0 and x ∈ A, if |x − t| < δ then x = t, so f(x) = f(t), hence |f(x) − f(t)| < ε.
(ii) First assume f is continuous at t. Given ε > 0, choose δ > 0 such that for all x ∈ A, |x − t| < δ implies |f(x) − f(t)| < ε. Then trivially 0 < |x − t| < δ implies |f(x) − f(t)| < ε. Thus lim_{x→t} f(x) = f(t).
Conversely, assume lim_{x→t} f(x) = f(t). Given ε > 0, choose δ > 0 such that for all x ∈ A, 0 < |x − t| < δ implies |f(x) − f(t)| < ε. Trivially, |f(t) − f(t)| < ε. Thus, for all x ∈ A, |x − t| < δ implies |f(x) − f(t)| < ε. Therefore f is continuous at t. 
Corollary 6.6 (Continuity Characterization of Limits). Let f: A → R, t ∈ A′, and L ∈ R. Then lim_{x→t} f(x) = L if and only if the function g: A ∪ {t} → R defined by
g(x) = f(x) if x ≠ t, and g(x) = L if x = t,
is continuous at t.

Proof. Since g(x) = f(x) for all x ∈ A \ {t}, we have lim_{x→t} g(x) = g(t) if and only if lim_{x→t} f(x) = L. By the Limit Characterization of Continuity, g is continuous at t if and only if lim_{x→t} g(x) = g(t). The result follows. 
Proposition 6.7. Let A, B ⊆ R, f: A → B, and g: B → R.
(i) If t ∈ A′, lim_{x→t} f(x) = L, and g is continuous at L, then
lim_{x→t} g∘f(x) = g(L).
(ii) If t ∈ A, f is continuous at t, and g is continuous at f(t), then g∘f is continuous at t.

Proof. (i) If x_n ∈ A \ {t} and x_n → t, then f(x_n) → L by the Sequential Characterization of Limits, so g∘f(x_n) → g(L) by the Sequential Characterization of Continuity.
(ii) If x_n → t, then f(x_n) → f(t) by continuity of f, so g∘f(x_n) → g∘f(t) by continuity of g. 
Definition 6.8. Let f: A → R and x ∈ A.

(i) We say f has a maximum at x if f(x) = max ran f, in which case we write max f for this maximum value. Similarly, we write sup f for sup ran f. (ii) Similarly for minimum, min f, and inf f. Theorem 6.9 (Extreme Value Theorem). Every continuous function on a closed bounded interval has both a maximum and a minimum. Proof. Let f :[a, b] R be continuous. We only give the argument for the → maximum; the minimum is similar. Choose a sequence (yn) in ran f such that yn sup f, and for each n choose xn [a, b] such that yn = f(xn). By → ∈ the Bolzano-Weierstrass Theorem (xn) has a convergent subsequence (xnk ). Put x = lim xn . Since a xn b for all n, x [a, b]. By continuity, k ≤ ≤ ∈

sup f = lim f(xnk ) = f(x). Since f(x) ran f, f(x) = max f. ∈  Theorem 6.10 (Intermediate Value Theorem). If f :[a, b] R is contin- uous, then for all d between f(a) and f(b) there exists c →[a, b] such that f(c) = d. ∈ Proof. Without loss of generality f(a) d f(b). Put ≤ ≤ A := x [a, b]: f(x) d . { ∈ ≤ } Then a A, so A = . Let c = sup A. Choose a sequence (xn) in A such ∈ 6 ∅ that xn c. Since a xn b for all n, c [a, b]. Then f(xn) f(c) by → ≤ ≤ ∈ → continuity. Since f(xn) d for all n, f(c) d. Suppose f(c) < d. Then c < b since f(b) d. Choose≤ δ > 0 such that≤ for all x [a, b], x c < δ implies f(x) f≥(c) < d f(c). Pick x such that ∈ | − | | − | − c < x < min c + δ, b . { } Then f(x) f(c) < d f(c), so f(x) < d. But then x A and x > c, a − − ∈ contradiction.  Remark 6.11. The Intermediate Value Theorem is equivalent to the following statement: if f : A R is continuous and I A is an interval, then f(I) is also an interval. → ⊆ Definition 6.12. f : A R is called uniformly continuous if for all  > 0 there exists δ > 0 such that→ for all x, y A, ∈ if x y < δ then f(x) f(y) < . | − | | − | Remark 6.13. Note that f is continuous if and only if for all x A and  > 0 there exists δ > 0 such that for all y A we have x y <∈ δ implies f(x) f(y) < . The definition of uniform∈ continuity moves| − the| universally quantified| − x| across the existentially quantified δ, producing a stronger con- dition in that now a single δ has to work for all x simultaneously. Therefore, basic logic tells us that uniform continuity implies pointwise continuity. ANALYSIS LECTURES 41
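The Intermediate Value Theorem is the principle behind the bisection method. A minimal sketch (our implementation, not from the notes), assuming f is continuous on [a, b] with f(a) ≤ d ≤ f(b):

```python
# Bisection: repeatedly halve [lo, hi] while keeping f(lo) <= d <= f(hi),
# which the IVT guarantees traps some c with f(c) = d.
def bisect(f, a, b, d, tol=1e-12):
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) <= d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Solve c^2 = 2 on [0, 2]: f(0) = 0 <= 2 <= 4 = f(2).
c = bisect(lambda x: x * x, 0.0, 2.0, 2.0)
assert abs(c * c - 2.0) < 1e-9
```

This is essentially the sup-based argument in the proof above made computational: the interval endpoints play the roles of elements of A and upper bounds of A.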

Theorem 6.14. Every continuous function on a closed bounded interval is in fact uniformly continuous. Proof. Let f :[a, b] R be continuous. Suppose f is not uniformly contin- uous. Then there exists→  > 0 such that for all δ > 0 there exist x, y [a, b] such that x y < δ and f(x) f(y) . In particular, for all n N∈there | − | | − | ≥ ∈ exist xn, yn [a, b] such that ∈ 1 xn yn < and f(xn) f(yn) . | − | n | − | ≥

Choose a convergent subsequence (xnk ) of (xn), and put x := lim xnk . Since xn yn 0, we also have yn x. But then | k − k | → k →  f(xn ) f(yn ) f(x) f(x) = 0, ≤ | k − k | → | − | a contradiction.  Theorem 6.15. If f :[a, b] R is continuous and 1-1, then: → (i) f is strictly monotone; (ii) f([a, b]) is a closed bounded interval [c, d]; 1 (iii) f − :[c, d] [a, b] is strictly monotone (the same way f is); 1 → (iv) f − is continuous. Proof. Without loss of generality f(a) < f(b); for the other case just multi- ply by 1. (i) We− will show f is strictly increasing on [a, b]. Let a x < y b, and suppose f(a) > f(x). Then by the Intermediate Value Theorem≤ there≤ exists z [x, b] such that f(z) = f(a), a contradiction. Thus f(a) < f(x). Similarly,∈f(x) < f(b). Applying this reasoning to [x, b], we get f(x) < f(y). (ii) By the Extreme Value Theorem, f has a maximum d and a minimum c on [a, b]. Then f([a, b]) [c, d]. By the Intermediate Value Theorem, [c, d] ran f. Thus f([a, b])⊆ = [c, d]. ⊆ 1 1 (iii) Let c y < z d, and suppose f − (y) > f − (z). Since f is strictly increasing, ≤ ≤ 1 1 y = f(f − (y)) > f(f − (z)) = z, a contradiction. (iv) Let t [c, d] and  > 0. Without loss of generality it suffices to show that if t∈ < d then there exists δ > 0 such that t y < t + δ implies 1 1 1 1 ≤ 1 f − (y) f − (t) < , or equivalently f − (y) < f − (t) + , since f − is |increasing.− Without| loss of generality f 1(t) +  b. Then − ≤ 1 1 1 1 t y < f(f − (t) + ) implies f − (t) f − (y) < f − (t) + , ≤ ≤ so we can take δ = f(f 1(t) + ) t. − −  Remark 6.16. The above result (with obvious modifications) remains true if the closed bounded interval [a, b] is replaced by any type of interval. The arguments are routine, and we only indicate the idea for an open interval (a, b): so, assume f :(a, b) R is continuous and 1-1. Note that a and b are allowed to be infinite here.→ The conclusion is that: 42 JOHN QUIGG

(i) f is strictly monotone; (ii) f(a, b) is an open interval (c, d); 1 (iii) f − :(c, d) (a, b) is strictly monotone (the same way f is); 1 → (iv) f − is continuous. The argument from the above theorem can be copied almost verbatim, ex- cept for (ii), where the relevant facts are: f(a, b) is an interval by the Intermediate Value Theorem, and f can have neither a maximum nor a minimum, since it is strictly monotone on an open interval. Remark 6.17. The above results can be used to prove the existence of roots (see the discussion at the end of Section 2).
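Theorem 6.14 genuinely needs a closed bounded interval: f(x) = 1/x is continuous on (0, 1) but not uniformly continuous there. The sketch below (the witness pairs are our choice) exhibits points that squeeze together while their values stay far apart, negating Definition 6.12:

```python
# For f(x) = 1/x on (0, 1), take x_n = 1/n and y_n = 1/(2n):
# |x_n - y_n| = 1/(2n) -> 0, but |f(x_n) - f(y_n)| = n -> infinity,
# so no single delta can work for eps = 1 at all points simultaneously.
f = lambda x: 1.0 / x

for n in [10, 100, 1000]:
    x, y = 1.0 / n, 1.0 / (2 * n)
    assert abs(x - y) <= 1.0 / n          # the points get arbitrarily close
    assert abs(f(x) - f(y)) > n / 2       # yet the values drift apart
```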

7. Differentiation Standing Hypothesis 7.1. Recall that we are assuming that all our func- tions f satisfy dom f, ran f R unless otherwise specified. ⊆ Definition 7.2. Let f be defined on some open interval containing t. (i) The derivative of f at t is defined as f(x) f(t) f 0(t) := lim − , x t x t → − provided this limit exists. (ii) f is called differentiable at t if f 0(t) exists. (iii) f is called differentiable if it is differentiable at each element of its domain. More generally, if B dom f, we say f is differentiable on B if f is differentiable at every⊆ element of B. (iv) We also write d f(x) = f (x). dx 0 R d Example 7.3. (i) If c then dx c = 0. d ∈ (ii) dx x = 1. Proposition 7.4. If f is differentiable at t, then f is continuous at t.

Proof. If x ∈ dom f \ {t}, then
f(x) − f(t) = [(f(x) − f(t))/(x − t)] (x − t) → f′(t) · 0 = 0  as x → t,
so lim_{x→t} f(x) = f(t). Thus f is continuous at t. 
Proposition 7.5. If f and g are both differentiable at t, then:

(i) (f + g)′(t) = f′(t) + g′(t);
(ii) (Product Rule) (fg)′(t) = f′(t)g(t) + f(t)g′(t);
(iii) (cf)′(t) = c f′(t) if c ∈ R;
(iv) (Quotient Rule) (f/g)′(t) = (f′(t)g(t) − f(t)g′(t)) / g(t)^2 if 0 ∉ ran g.

Proof. (i) We have (f + g)(x) (f + g)(t) f(x) f(t) g(x) g(t) − = − + − x t x t x t − x t − − → f 0(t) + g0(t). −−→ (ii) We have (fg)(x) (fg)(t) f(x)g(x) f(x)g(t) f(t)g(t) − = ± − x t x t − g(x) g(t)− f(x) f(t) = f(x) − + − g(t) x t x t x t − − → f(t)g0(t) + f 0(t)g(t), −−→ by continuity of f at t. (iii) Immediate from (ii), since the derivative of a constant is 0. 1 g0(t) (iv) By (ii), it suffices to show ( g )0(t) = −g(t)2 : 1 1 g (x) g (t) g(t) g(x) x t g0(t) − = − → − , x t (x t)g(x)g(t) −−→ g(t)2 − − by continuity of g at t.  Corollary 7.6 (Power Rule). d xn = nxn 1 for all n Z. dx − ∈ Proof. Case 1. n = 0. We have d d x0 = 1 = 0 = 0x 1. dx dx − Case 2. n = 1. We have d d x1 = x = 1 = 1x0. dx dx Case 3. n > 1. Inductively, if we assume d xn 1 = (n 1)xn 2, then dx − − − d n d n 1 n 2 n 1 n 1 x = (x − x) = (n 1)x − x + x − 1 = nx − . dx dx − Case 3. n < 0. Then n = k with k > 0, so − k 1 d n d 1 kx − k 1 n 1 x = = − = kx− − = nx − . dx dx xk x2k −  Lemma 7.7. f is differentiable at t if and only if there exists a function q which is continuous at t and satisfies (7.1) f(x) = f(t) + q(x)(x t) for all x dom f, − ∈ in which case f 0(t) = q(t). 44 JOHN QUIGG

Proof. Note that for x dom f t , Equation (7.1) is equivalent to ∈ \{ } f(x) f(t) q(x) = − . x t − Thus, by the Continuity Characterization of Limits, if L R then f 0(t) exists and equals L if and only if the function q : dom f R∈ defined by → f(x) f(t)  − if x = t q(x) =  x t 6 L − if x = t  is continuous at t. Of course, in this case Equation (7.1) holds for all x ∈ dom f, and we also have q(t) = f 0(t).  Remark 7.8. To show f is differentiable at t, it suffices to verify Equation (7.1) for x = t. 6 Theorem 7.9 (Chain Rule). If f is differentiable at t and g is differentiable at f(t), then g f is differentiable at t and ◦ (g f)0(t) = g0(f(t))f 0(t). ◦ Proof. Write f(x) = f(t) + q(x)(x t) − g(y) = g(f(t)) + r(y)(y f(t)), − with q continuous at t and r continuous at f(t). Then g f(x) = g f(t) + r(f(x))(f(x) f(t)) ◦ ◦ − = g f(t) + r(f(x))q(x)(x t). ◦ − Since (r f)q is continuous at t, g f is differentiable at t and ◦ ◦ (g f)0(t) = r f(t)q(t) = g0(f(t))f 0(t). ◦ ◦  Lemma 7.10 (Critical Point Lemma). Let f :(a, b) R and t (a, b). If → ∈ f has a maximum or minimum at t and is differentiable at t, then f 0(t) = 0. Proof. Without loss of generality assume f has a maximum at t. For all x (a, t) we have ∈ f(x) f(t) − 0, x t ≥ − and letting x t we get f 0(t) 0. On the other hand, for all x (t, b) we have ↑ ≥ ∈ f(x) f(t) − 0, x t ≤ and letting x t we get f (t) 0.− Therefore, we must have f (t) = 0. ↓ 0 ≤ 0  Lemma 7.11 (Rolle’s Theorem). Let f be continuous on [a, b] and differ- entiable on (a, b). If f(a) = f(b), then there exists c (a, b) such that ∈ f 0(c) = 0. ANALYSIS LECTURES 45

Proof. By the Extreme Value Theorem, f has a maximum and a minimum on [a, b]. Since f(a) = f(b), at least one of the maximum or minimum must occur at some c (a, b) (even if f is constant). By the Critical Point ∈ Lemma, f 0(c) = 0.  Remark 7.12. As an immediate corollary of Rolle’s Theorem (but which is not really worth recording as one of our official results, although we will use it once), is the following: if f and g are continuous on [a, b] and differentiable on (a, b), with f(a) = g(a) and f(b) = g(b), then there exists c (a, b) ∈ such that f 0(c) = g0(c). To see this, just apply Rolle’s Theorem to f g. Whenever this result is used, it can (by slight abuse of terminology)− be referred to simply as Rolle’s Theorem. Theorem 7.13 (Cauchy’s Mean Value Theorem). If f and g are continuous on [a, b] and differentiable on (a, b), then there exists c (a, b) such that ∈ f 0(c) g(b) g(a) = g0(c) f(b) f(a) . −  −  Proof. Define h:[a, b] R by → h(x) = f(x) g(b) g(a) g(x) f(b) f(a) . −  − −  Then h is continuous, is differentiable on (a, b), and we further have h(a) = h(b). By Rolle’s Theorem, there exists c (a, b) such that ∈ 0 = h0(c) = f 0(c) g(b) g(a) g0(c) f(b) f(a) , −  − −  and this implies the desired equation.  Corollary 7.14 (Mean Value Theorem). If f is continuous on [a, b] and differentiable on (a, b), then there exists c (a, b) such that ∈ f(b) f(a) = f 0(c)(b a). − − Proof. Just apply Cauchy’s Mean Value Theorem with g(x) = x.  Proposition 7.15. If f :[a, b] R is continuous, then: → (i) f 0 > 0 on (a, b) implies f is strictly increasing on [a, b]; (ii) f 0 < 0 on (a, b) implies f is strictly decreasing on [a, b]; (iii) f 0 0 on (a, b) implies f is increasing on [a, b]; (iv) f ≥ 0 on (a, b) implies f is decreasing on [a, b]; 0 ≤ (v) f 0 = 0 on (a, b) implies f is constant on [a, b]. Proof. In each case the desired behavior follows immediately from the Mean Value Theorem.  Definition 7.16 (Higher Order Derivatives). (i) For n = 0, 1,... 
the nth derivative of f is defined inductively as
f^(n) := f if n = 0, and f^(n) := (f^(n−1))′ if n > 0.

(ii) We also write dn f(x) = f (n)(x). dxn Remark 7.17. We have never allowed ourselves to consider differentiating a function at an endpoint of its domain. However, it makes sense to restrict a differentiable function to a closed interval [a, b] contained in its domain. It is occasionally convenient, as in the following theorem, to speak of a derivative on a closed interval [a, b], and whenever we do this we have in mind that the function is differentiable on some open interval containing [a, b]. It is interesting to note, however, that if we are considering a function f which is differentiable on [a, b], we don’t need any information of the values of f outside [a, b], even to compute the derivative at the endpoints, since we can use one-sided limits: for example, f 0(a) is the right-hand limit of (f(x) f(a))/(x a) as x a. The− following− result is a↓ generalization of the Mean Value Theorem to higher order derivatives: Theorem 7.18 (Taylor’s Theorem). Let I be a closed interval with one endpoint a, let J be the open interval with the same endpoints, and let n N. (n 1) ∈ Assume f − is continuous on I and differentiable on J. Then for all x I there exists c between a and x such that ∈ n 1 − f (k)(a)(x a)k f (n)(c)(x a)n f(x) = − + − . X k! n! k=0 Proof. First of all, if n = 1 the result is just the Mean Value Theorem, so without loss of generality assume that n > 1. Fix x I, and define g : I R by ∈ → n 1 − g(t) = f (k)(a)(t a)k/k! + M(t a)n, X − − k=0 where M R is chosen such that ∈ f(x) = g(x). Note that f (k)(a) = g(k)(a) for k = 0, . . . , n 1. Since f(x) = g(x), by − Rolle’s Theorem there exists c1 between a and x such that

f′(c_1) = g′(c_1).

Then since f′(a) = g′(a), again by Rolle's Theorem there exists c_2 between a and c_1 such that f″(c_2) = g″(c_2). Continuing inductively, after n steps we get c := c_n between a and c_{n−1} such that f^(n)(c) = g^(n)(c). But g^(n) is identically Mn!. Thus we get
M = f^(n)(c)/n!.
Coupled with f(x) = g(x), this gives the desired result. 
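Taylor's Theorem can be checked numerically. The sketch below (our choices of function and point) takes f = exp at a = 0, x = 1.5: since every f^(n) is exp, the remainder f(x) − Σ_{k<n} x^k/k! equals e^c x^n/n! for some c between 0 and x, hence is bounded by e^{|x|} |x|^n / n!:

```python
import math

# Remainder of the degree-(n-1) Taylor polynomial of exp at 0, evaluated at x.
def taylor_remainder(x, n):
    poly = sum(x ** k / math.factorial(k) for k in range(n))
    return abs(math.exp(x) - poly)

x = 1.5
for n in range(1, 10):
    # Taylor's Theorem bound with f^(n)(c) = e^c <= e^{|x|}.
    assert taylor_remainder(x, n) <= math.exp(abs(x)) * abs(x) ** n / math.factorial(n)
```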

Theorem 7.19 (L’Hˆopital’sRule). Let a and L be extended real numbers. Assume limx a f(x) = limx a g(x) = 0 or limx a g(x) = . If → → → ±∞ f (x) lim 0 = L, x a g (x) → 0 then f(x) lim = L. x a g(x) → Proof. Without loss of generality it suffices to show that if L < M then there exists b > a such that for all x (a, b) we have ∈ f(x) < M. g(x)

Pick K ∈ (L, M), and choose b_1 > a such that for all x ∈ (a, b_1),
f′(x)/g′(x) < K.

Then whenever a < x < y < b_1, Cauchy's Mean Value Theorem tells us there exists t ∈ (x, y) such that
(7.2)  (f(x) − f(y)) / (g(x) − g(y)) = f′(t)/g′(t) < K.
Case 1. lim f = lim g = 0. Letting x ↓ a in (7.2) gives
f(y)/g(y) ≤ K < M for all y ∈ (a, b_1).

Case 2. lim g = . Choose b2 (a, y) such that for all x (a, b2), ∞ ∈ ∈ g(x), g(x) g(y) > 0. − g(x) g(y) Multiplying (7.2) by g(−x) gives f(x) f(y) (g(x) g(y)) − < K − , g(x) g(x) so f(x) f(y) g(y) x a < + K 1 ↓ K. g(x) g(x)  − g(x) −−→ f(x) Thus there exists b (a, b2) such that for all x (a, b), < M. ∈ ∈ g(x) Similarly for lim g = , limx a, or M < L.  −∞ ↑ Remark 7.20. It is interesting to note that l’Hˆopital’sRule in the “infinite” case only requires the denominator to have an infinite limit; however, if the numerator does not also have an infinite limit, then presumably l’Hˆopital’s Rule is not needed. 48 JOHN QUIGG
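A numerical illustration of l'Hôpital's Rule (our choice of example): for f(x) = e^x − 1 and g(x) = x, both vanish at 0, and the rule predicts f(x)/g(x) → f′(0)/g′(0) = 1 as x → 0:

```python
import math

# Ratios (e^x - 1)/x for x shrinking toward 0; they should approach 1.
ratios = [(math.exp(x) - 1.0) / x for x in [1e-1, 1e-2, 1e-3]]
errors = [abs(r - 1.0) for r in ratios]

assert all(errors[i] > errors[i + 1] for i in range(len(errors) - 1))  # improving
assert errors[-1] < 1e-2
```

The error here is roughly x/2, consistent with the next Taylor term of the numerator.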

Theorem 7.21 (Inverse Function Theorem). Let f be 1-1 and continuous on an open interval containing x, and differentiable at x with f 0(x) = 0. 1 6 Then f − is differentiable at f(x) and

(f⁻¹)′(f(x)) = 1/f′(x).
Proof. By Theorem 6.15, f⁻¹ is continuous on an open interval I containing f(x). For all y ∈ I \ {f(x)},
(f⁻¹(y) − f⁻¹(f(x))) / (y − f(x)) = 1 / [ (f(f⁻¹(y)) − f(x)) / (f⁻¹(y) − x) ] → 1/f′(x)  as y → f(x),
since y → f(x) implies f⁻¹(y) → x by continuity of f⁻¹ at f(x). 
Remark 7.22. The above result can be used to extend the Power Rule to rational exponents.
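The Inverse Function Theorem can be checked numerically. The sketch below (our choice of example) uses f(x) = x³ + x, which is 1-1 with f(1) = 2 and f′(1) = 4, so (f⁻¹)′(2) should equal 1/4; the inverse is computed by bisection since it has no elementary formula here:

```python
def f(x):
    return x ** 3 + x  # strictly increasing: f'(x) = 3x^2 + 1 > 0

def f_inv(y, lo=-10.0, hi=10.0):
    """Invert the strictly increasing f by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Difference quotient of f^{-1} at f(1) = 2 approximates (f^{-1})'(2) = 1/f'(1) = 1/4.
h = 1e-6
approx = (f_inv(2.0 + h) - f_inv(2.0)) / h
assert abs(approx - 0.25) < 1e-4
```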

8. Integration
Standing Hypothesis 8.1. Recall that we are assuming that all our functions f satisfy dom f, ran f ⊆ R unless otherwise specified.
Definition 8.2. Let −∞ < a < b < ∞.
(i) A partition of [a, b] is defined to be a finite set P = {x_i}_{i=0}^n such that

a = x_0 < x_1 < ··· < x_n = b,
and P[a, b] denotes the set of all partitions of [a, b].
(ii) Given P = {x_i}_{i=0}^n ∈ P[a, b], the norm of P is defined as
‖P‖ := max_i Δx_i,
where

Δx_i := x_i − x_{i−1} for i = 1, ..., n.
(iii) If P, Q ∈ P[a, b], we say Q refines P if Q ⊇ P.
Remark 8.3. Thus, a partition of an interval is just a finite subset which includes the endpoints. Actually, in other areas of mathematics, the words “partition” and “norm” have completely different meanings.
Definition 8.4. Let f be bounded on [a, b] and P = {x_i}_{i=0}^n ∈ P[a, b]. The upper sum of f associated to P is defined as
U(P) = U(f, P) := Σ_{i=1}^n M_i Δx_i,
where

Mi = Mi(f) := sup f, [xi 1,xi] − ANALYSIS LECTURES 49 and the lower sum of f associated to P is n L(P ) = L(f, P ) := m ∆x , X i i 1 where mi = mi(f) := inf f. [xi 1,xi] − Lemma 8.5. (i) If Q refines P then L(P ) L(Q) U(Q) U(P ). ≤ ≤ ≤ (ii) For all P,Q [a, b], ∈ P L(P ) U(Q). ≤ Proof. (i) By induction, it suffices to assume Q = P s with xi 1 < s < xi (say). Put ∪{ } −

Mi0 := sup f, Mi00 := sup f, ∆xi0 := s xi 1, and ∆xi00 := xi s. − [xi 1,s] [s,xi] − − − Then M ,M M and ∆xi = ∆x + ∆x , so i0 i00 ≤ i0 i00 U(P ) U(Q) = Mi∆xi M 0∆x0 + M 00∆x00 − − i i i i  = (Mi M 0)∆x0 + (Mi M 00)∆x00 − i i − i i 0. ≥ Similarly for L. (ii) Let R = P Q. Then R refines both P and Q, so by (i) we have ∪ L(P ) L(R) U(R) U(Q). ≤ ≤ ≤  Definition 8.6. Let f be bounded on [a, b]. We say f is Riemann integrable on [a, b] if supP [a,b] L(P ) = infP [a,b] U(P ), in which case this common ∈P ∈P value is called the Riemann integral of f on [a, b], written b f or b f(x) dx. Ra Ra Example 8.7. b 1 = b a. Ra − Remark 8.8. It follows immediately from Lemma 8.5 that < sup L(P ) inf U(P ) < . −∞ P ≤ P ∞ n Definition 8.9. Let f be bounded on [a, b] and P = xi [a, b]. When- { }0 ∈ P ever ti [xi 1, xi] for i = 1, . . . , n, the number ∈ − n f(t )∆x X i i i=1 is called a Riemann sum for f associated to P , and (P ) = (f, P ) denotes the set of all Riemann sums for f associated to P .R R 50 JOHN QUIGG
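The definitions above are easy to experiment with (a sketch — f(x) = x² on [0, 1] and the uniform partitions are illustrative choices, and evaluating sup/inf at the subinterval endpoints is valid only because f is increasing there):

```python
# Upper and lower sums for the increasing function f(x) = x^2 on [0, 1].
# Since f is increasing, sup f on [x_{i-1}, x_i] is f(x_i), inf is f(x_{i-1}).
def upper_sum(f, P):
    return sum(f(P[i]) * (P[i] - P[i - 1]) for i in range(1, len(P)))

def lower_sum(f, P):
    return sum(f(P[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))

f = lambda x: x * x
P = [i / 4 for i in range(5)]   # partition of [0, 1] with norm 1/4
Q = [i / 8 for i in range(9)]   # Q refines P (Q contains P)

# Lemma 8.5(i): refining raises lower sums and lowers upper sums,
# squeezing toward the integral 1/3.
```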

Remark 8.10. For all S ∈ R(P) we have L(P) ≤ S ≤ U(P). Thus, the set R(P) is a subset of the closed interval [L(P), U(P)]; it usually does not coincide with the interval, in fact it may or may not include the endpoints. However, it gets arbitrarily close to those endpoints:

Lemma 8.11. Let f be bounded on [a, b] and P = {x_i}_0^n ∈ P[a, b]. Then:
(i) U(P) = sup R(P);
(ii) L(P) = inf R(P);
(iii) U(P) − L(P) = sup_{S,T ∈ R(P)} |S − T|.

Proof. (i) Clearly U(P) is an upper bound for R(P). Let ε > 0. For each i = 1, . . . , n choose t_i ∈ [x_{i−1}, x_i] such that

M_i − f(t_i) < ε/(b − a).

Then

U(P ) f(ti)∆xi = (Mi f(ti))∆xi − X X −  < ∆x X b a i = . − Thus U(P )  is not an upper bound for (P ). This proves (i), and (ii) is similar. − R (iii) We have sup S T = sup (S T ) S,T (P ) | − | S,T (P ) − ∈R ∈R = sup (P ) (P ) R − R  = sup (P ) + sup (P ) R −R  = sup (P ) inf (P ) R − R = U(P ) L(P ), − by Parts (i) and (ii).  Theorem 8.12. If f is bounded on [a, b], then the following are equivalent: (i) f is integrable on [a, b]. (ii) There exists I R such that for all  > 0 there exists δ > 0 such that for all P ∈ [a, b], if P < δ then for all S (P ) we have ∈ P k k ∈ R S I < . | − | (iii) There exists I R such that for all  > 0 there exists a partition P such that for all∈ S (P ) we have ∈ R S I < . | − | (iv) For all  > 0 there exists P [a, b] such that U(P ) L(P ) < . ∈ P − ANALYSIS LECTURES 51

Proof. (i) (ii). Assume f is integrable, and let  > 0. We will show that ⇒ b there exists δ > 0 such that if P [a, b] with P < δ then U(P ) < a f +. This suffices, for then a similar argument∈ P wouldk showk we could decreaseR δ if b b necessary so that also L(P ) > a f , and then we would have S a f <  R − n | −R | for any S (P ). Claim: if M = sup[a,b] f , P = xi 0 [a, b], s [a, b], and Q = P∈ R s , then | | { } ∈ P ∈ ∪ { } U(P ) U(Q) 2M P . − ≤ k k Let xi 1 < s < xi, and put − Mi0 := sup f, Mi00 := sup f, ∆xi0 := s xi 1, and ∆xi00 := xi s. − [xi 1,s] [s,xi] − − − Then

U(P) − U(Q) = (M_i − M_i′)∆x_i′ + (M_i − M_i″)∆x_i″
≤ 2M∆x_i′ + 2M∆x_i″
= 2M∆x_i ≤ 2M‖P‖,

and this verifies the claim. By induction, if Q refines P and has at most k more points than P, then

U(P) − U(Q) ≤ 2kM‖P‖.

Now, we can choose R ∈ P[a, b] such that U(R) < ∫_a^b f + ε/2. Let R have k points, and choose δ > 0 such that 2kMδ < ε/2. Let P ∈ P[a, b] with ‖P‖ < δ, and put Q = P ∪ R. Then by the above we have

U(P) ≤ U(Q) + 2kM‖P‖ ≤ U(R) + 2kMδ < ∫_a^b f + ε/2 + ε/2 = ∫_a^b f + ε,

as desired.
(ii) trivially implies (iii).
(iii) ⇒ (iv). Given ε > 0 choose P ∈ P[a, b] such that |S − I| < ε/3 for all S ∈ R(P). Then |S − T| < 2ε/3 for all S, T ∈ R(P). Taking the sup over S, T gives U(P) − L(P) ≤ 2ε/3 < ε.
(iv) ⇒ (i). Assume f is not integrable. Then ε := inf_P U(P) − sup_P L(P) > 0, and for every P ∈ P[a, b] we have

U(P) − L(P) ≥ ε.

We have shown that (iv) is false if (i) is, so we are done. □

Remark 8.13. (i) The equivalence (i) ⇔ (ii) in the above theorem is called Darboux's Theorem.
(ii) The condition (iv) in the above theorem is called Riemann's Criterion.
(iii) In (ii) and (iii) of the above theorem, of course we have I = ∫_a^b f.

Theorem 8.14. Every continuous function on [a, b] is integrable on [a, b].

Proof. Let f : [a, b] → R be continuous. Since the interval [a, b] is closed and bounded, f is uniformly continuous. Given ε > 0, choose δ > 0 such that for all x, y ∈ [a, b],

|x − y| < δ implies |f(x) − f(y)| < ε/(2(b − a)).

Now choose P = {x_i}_0^n ∈ P[a, b] such that ‖P‖ < δ. Then

M_i − m_i ≤ ε/(2(b − a)) for all i,

so

U(P) − L(P) = Σ(M_i − m_i)∆x_i ≤ (ε/(2(b − a))) Σ ∆x_i = ε(b − a)/(2(b − a)) = ε/2 < ε. □

Example 8.15. The Dirichlet function f on [0, 1], defined by f(x) = 1 if x ∈ Q and f(x) = 0 if x ∉ Q, is not integrable.

Proposition 8.16. If f and g are both integrable on [a, b], then:
(i) ∫_a^b (f + g) = ∫_a^b f + ∫_a^b g;
(ii) ∫_a^b cf = c ∫_a^b f if c ∈ R.

Proof. (i) Given ε > 0, choose P ∈ P[a, b] such that if t_i ∈ [x_{i−1}, x_i] for all i then

|Σ f(t_i)∆x_i − ∫_a^b f|, |Σ g(t_i)∆x_i − ∫_a^b g| < ε/2.

To see that this is possible, choose δ_1, δ_2 > 0 such that for all P ∈ P[a, b], if ‖P‖ < δ_1 then for all S ∈ R(f, P) we have |S − ∫_a^b f| < ε/2, and if ‖P‖ < δ_2 then for all T ∈ R(g, P) we have |T − ∫_a^b g| < ε/2, and then we can take any P with ‖P‖ < min{δ_1, δ_2}. Then

|Σ (f + g)(t_i)∆x_i − (∫_a^b f + ∫_a^b g)| ≤ |Σ f(t_i)∆x_i − ∫_a^b f| + |Σ g(t_i)∆x_i − ∫_a^b g| < ε.

(ii) Given  > 0, choose P [a, b] such that if ti [xi 1, xi] for all i then ∈ P ∈ − b  f(ti)∆xi Z f < . X − c + 1 a | | ANALYSIS LECTURES 53

Then b b cf(ti)∆xi c Z f = c f(ti)∆xi Z f X − | | X − a a c  | | < . ≤ c + 1 | |  Proposition 8.17. Let a < b < c. Then f is integrable on [a, c] if and only if it is integrable on both [a, b] and [b, c], in which case c b c Z f = Z f + Z f. a a b Proof. First assume f is integrable on [a, c]. Given  > 0, choose P [a, c] such that U(P ) L(P ) < . Without loss of generality b P , since∈ P − ∈ throwing b into P can only decrease U(P ) L(P ). Put P 0 = P [a, b] and P = P [b, c]. Then P [a, b] and P − [b, c], and we have∩ 00 ∩ 0 ∈ P 00 ∈ P U(P ) L(P ) = U(P 0) L(P 0) + U(P 00) L(P 00), − − − so U(P 0) L(P 0),U(P 00) L(P 00) < . − − Conversely, assume f is integrable on both [a, b] and [b, c]. Given  > 0, choose P [a, b] and P [b, c] such that 0 ∈ P 00 ∈ P b c  S0 Z f , S00 Z f < − − 2 a b for all S (P ) and S (P ). Put P = P P [a, c], and 0 ∈ R 0 00 ∈ R 00 0 ∪ 00 ∈ P let S (P ). Then S = S0 + S00, with S0 (P 0) and S00 (P 00), respectively,∈ R so ∈ R ∈ R b c b c S Z f + Z f S0 Z f + S00 Z f < . − ≤ − − a b a b c b Therefore, Theorem 8.12 tells us f is integrable on [a, c], with a f = a f + c R R f. Rb  Definition 8.18. Define b a f if a > b Z f := (− Rb a 0 if a = b. Corollary 8.19. Let f be integrable on a closed interval I. Then for all a, b, c I, ∈ c b c Z f = Z f + Z f. a a b Proof. This follows from considering the cases determined by the possible relative positions of a, b, c.  54 JOHN QUIGG
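Looking back at Riemann's Criterion (Theorem 8.12(iv)) and Theorem 8.14: for an increasing continuous f the quantity U(P) − L(P) can be computed by hand, since on a uniform partition the sum Σ(M_i − m_i)∆x_i telescopes. A sketch (f(x) = x³ on [0, 1] is an illustrative choice):

```python
# U(P) - L(P) for the increasing continuous f(x) = x^3 on [0, 1],
# over the uniform partition with n subintervals: since f is increasing,
# M_i = f(x_i) and m_i = f(x_{i-1}), and each interval has width 1/n,
# so the sum telescopes to (f(1) - f(0))/n = 1/n.
def gap(n):
    f = lambda x: x ** 3
    P = [i / n for i in range(n + 1)]
    return sum((f(P[i]) - f(P[i - 1])) * (1 / n) for i in range(1, n + 1))
```

Here gap(n) equals 1/n, so it can be made smaller than any ε by taking n large — exactly what Riemann's Criterion requires.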

Proposition 8.20. Let f and g be integrable on [a, b]. If f g, then b f ≤ Ra ≤ b g. In particular, if m f M on [a, b], then m(b a) b f M(b a). Ra ≤ ≤ − ≤ Ra ≤ − Proof. For every P [a, b] we have b f U(f, P ) U(g, P ). Thus ∈ P Ra ≤ ≤ b f b g. Ra ≤ Ra  Proposition 8.21. If f is integrable on [a, b], then so is f , and | | b b Z f Z f . ≤ | | a a n Proof. Given  > 0, choose P = xi [a, b] such that U(f, P ) { }0 ∈ P − L(f, P ) < . For each i = 1, . . . , n, if s, t [xi 1, xi] then ∈ − f(s) f(t) f(s) f(t) Mi(f) mi(f), | | − | | ≤ | − | ≤ − so Mi( f ) mi( f ) Mi(f) mi(f). | | − | | ≤ − Hence U( f ,P ) L( f ,P ) U(f, P ) L(f, P ) < . | | − | | ≤ − Thus f is integrable. For| the| other part, f f f on [a, b], so −| | ≤ ≤ | | b b b Z f Z f Z f . − a | | ≤ a ≤ a | | Hence b b Z f Z f . ≤ | | a a  Proposition 8.22. If f and g are integrable on [a, b], then so is fg. Proof. Case 1. f = g. Put M = sup f , and without loss of generality | |n M > 0. Given  > 0, choose P = xi [a, b] such that U(f, P ) { }0 ∈ P − L(f, P ) < /(2M). For each i = 1, . . . , n, if s, t [xi 1, xi] then ∈ − 2 2 f (s) f (t) = f(s) + f(t) f(s) f(t) 2M(Mi(f) mi(f)). −  −  ≤ − Hence 2 2 Mi(f ) mi(f ) 2M(Mi(f) mi(f)), − ≤ − so U(f 2,P ) L(f 2,P ) 2M(U(f, P ) L(f, P )) < . − ≤ − Case 2. f, g arbitrary. Then 1 fg = (f + g)2 (f g)2 , 4 − −  which is integrable by the above.  ANALYSIS LECTURES 55

Theorem 8.23 (Mean Value Theorem for Integrals). Let f, g :[a, b] R. If f is continuous and g is integrable and nonnegative, then there exists→ c [a, b] such that ∈ b b Z fg = f(c) Z g. a a Proof. Put m = min f and M = max f. Then mg fg Mg, so ≤ ≤ b b b m Z g Z fg M Z g. a ≤ a ≤ a

If b g = 0, then b fg = 0 and the conclusion holds. On the other hand, if Ra Ra b g > 0 then Ra b fg m Ra M. ≤ b g ≤ Ra Since f is continuous, by the Intermediate Value Theorem there exists c [a, b] such that ∈ b fg f(c) = Ra , b g Ra which implies the desired equality.  x Theorem 8.24. If f is integrable on [a, b], then the function x a f is continuous on [a, b]. 7→ R

Proof. Put M = sup f . Given  > 0, choose δ > 0 such that Mδ < . Then a y x b and x | |y < δ imply ≤ ≤ ≤ − x y x x Z f Z f = Z f Z f M(x y) < . − ≤ | | ≤ − a a y y



Theorem 8.25 (Fundamental Theorem of Calculus). Let f be integrable on [a, b]. R x (i) Define F :[a, b] by F (x) = a f. For all t (a, b), if f is continuous at t then→ R ∈

F 0(t) = f(t).

(ii) If G is continuous on [a, b] and G0 = f on (a, b), then

b Z f = G(b) G(a). a − 56 JOHN QUIGG

Proof. (i) If x ∈ [a, b] \ {t} then

|(F(x) − F(t))/(x − t) − f(t)| = |(∫_a^x f − ∫_a^t f)/(x − t) − f(t)|
= |∫_t^x f(s) ds/(x − t) − ∫_t^x f(t) ds/(x − t)|
= |∫_t^x (f(s) − f(t)) ds| / |x − t|
≤ sup{|f(s) − f(t)| : s between t and x} |x − t| / |x − t|
= sup{|f(s) − f(t)| : s between t and x} → 0 as x → t.

(ii) Given ε > 0, choose P = {x_i}_0^n ∈ P[a, b] such that for all S ∈ R(P) we have |S − ∫_a^b f| < ε. By the Mean Value Theorem, for each i = 1, . . . , n there exists t_i ∈ (x_{i−1}, x_i) such that

G(b) − G(a) = Σ_{i=1}^n (G(x_i) − G(x_{i−1})) = Σ f(t_i)∆x_i.

Hence

|G(b) − G(a) − ∫_a^b f| < ε.

Letting ε → 0 gives the conclusion. □

Remark 8.26. We stated Part (ii) of the Fundamental Theorem of Calculus in a somewhat fussy way; in most cases, the (only slightly less general) statement "if G′ is integrable on [a, b], then ∫_a^b G′ = G(b) − G(a)" is good enough. However, note that this requires considering a derivative on a closed interval, and that our convention is that this tacitly means G extends to a differentiable function on an open interval containing [a, b]. In the next two theorems we use this less fussy (and more convenient) way of stating the hypotheses.
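Both parts of the Fundamental Theorem can be observed numerically (a sketch — the midpoint-sum integrator and the test function cos are illustrative choices): differentiating x ↦ ∫_0^x cos recovers cos, and ∫_0^1 cos = sin 1 − sin 0.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint Riemann-sum approximation of the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Part (ii): the integral of cos over [0, 1] is sin(1) - sin(0).
part2 = integral(math.cos, 0.0, 1.0)

# Part (i): F(x) = integral of cos over [0, x] has F'(t) = cos(t);
# check at t = 0.5 with a symmetric difference quotient.
F = lambda x: integral(math.cos, 0.0, x)
h = 1e-4
part1 = (F(0.5 + h) - F(0.5 - h)) / (2 * h)
```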

Theorem 8.27 (Integration by Parts). If f 0 and g0 are integrable on [a, b], then b b Z f 0g = f(b)g(b) f(a)g(a) Z fg0. a − − a Proof. Since f and g are differentiable, they are continuous, hence integrable. Thus fg0 and f 0g are integrable. By the Fundamental Theorem of Calculus, b b b f(b)g(b) f(a)g(a) = Z (fg)0 = Z fg0 + Z f 0g, − a a a implying the desired equality.  ANALYSIS LECTURES 57

Theorem 8.28 (Change of Variables Theorem). If φ0 is integrable on [a, b] and f is continuous on φ([a, b]) then

b φ(b) Z f(φ(x))φ0(x) dx = Z f(u) du. a φ(a)

Proof. Define F : φ [a, b] R by F (x) = x f. Then by the Fundamental  → Rφ(a) Theorem of Calculus we have F 0 = f on φ([a, b]) and (F φ)0 = (f φ)φ0 on [a, b] (where we used the Chain Rule for the latter). Since◦φ is differentiable,◦ it is continuous, and since f is also continuous, the composition f φ is ◦ continuous, hence integrable. Since φ0 is also integrable, the product (f φ)φ0 is integrable. We apply the (other part of the) Fundamental Theorem◦ of Calculus (twice) in the following computation to finish:

φ(b) b Z f = F φ(b) F φ(a) = Z (F φ)0 φ(a) ◦ − ◦ a ◦ b b = Z (F 0 φ)φ0 = Z (f φ)φ0. a ◦ a ◦  Remark 8.29. In the above proof, it might seem that we violated our rule prohibiting differentiating at an endpoint: the hypothesis only gave us f on the closed interval φ([a, b]) (which might not have endpoints φ(a) and φ(b), by the way), and so the Fundamental Theorem of Calculus seems to only give us F 0 = f on the corresponding open interval (that is, the open interval with the same endpoints). However, everything is ok, because f can be extended to a continuous function on an open interval containing φ([a, b]) (for example, just make f constant to the right and to the left). Definition 8.30. (i) Let f be Riemann integrable on [a, t] for every t (a, b). If either f is unbounded on [a, b) or b = we define ∈ ∞ b t Z f := lim Z f, t b a ↑ a b provided this limit exists, in which case we say a f exists as an improper integral. Note: throughout this definition,R we require all limits to exist as real numbers, not extended real numbers. (ii) Similarly, if f is Riemann integrable on [t, b] for every t (a, b), and either f is unbounded on (a, b] or a = , we define ∈ −∞ b b Z f := lim Z f, t a a ↓ t b provided this limit exists, in which case we again say a f exists as an improper integral. R 58 JOHN QUIGG

(iii) If f is Riemann integrable on [s, t] whenever a < s < t < b, and either f is unbounded on (a, b) or a or b is infinite, we pick any c (a, b), and define ∈ b c b Z f := Z f + Z f, a a c c b provided each of a f and c f exists as either a Riemann or an R R b improper integral, in which case we again say a f exists as an improper integral. R Remark 8.31. (i) In case (i) of the above definition, if f is bounded, b R, and f is defined at b, then in fact the hypothesis guarantees that∈ f must be Riemann integrable on [a, b], and then, by continuity of the integral as a function of the upper limit of integration, we b t have a f = limt b a f. Similarly in cases (ii) and (iii). (ii) On theR other hand,↑ R again in case (i) above, if f is bounded and b R, but f is not defined at b, then we extend f to g :[a, b] R by∈ assigning the value g(b) arbitrarily, and say (by slight abuse→ of terminology) the original function f is Riemann integrable on [a, b], b b with a f := a g. This latter value does not depend upon the choiceR of g(b).R Similarly in cases (ii) and (iii). b b (iii) In case (i) above, if c (a, b) then a f exists if and only if c f b ∈ c b R R does, in which case a f = a f + c f. Similarly in case (ii). (iv) In case (iii) above, theR choiceR of c R (a, b) does not matter. b b ∈ Proposition 8.32. If each of a f and a g exists as a Riemann or improper integral, then R R (i) b(f + g) = b f + b g; Ra Ra Ra (ii) b cf = c b f if c R. Ra Ra ∈ Proof. If necessary, split the interval [a, b] into two, so that without loss of generality f and g are integrable on [a, t] for all t (a, b). For (i), we have ∈ b b b b Z (f + g) = lim Z (f + g) = limZ f + Z g t b t b a ↑ t ↑ t t b b = lim Z f + lim Z g t b t b ↑ t ↑ t b b = Z f + Z g. a a For (ii), we have b t t t b Z cf = lim Z cf = lim c Z f = c lim Z f = c Z f. t b t b t b a ↑ a ↑ a ↑ a a  ANALYSIS LECTURES 59
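Definition 8.30(i) can be watched in action (a sketch — the midpoint-sum integrator and f(x) = 1/x² are illustrative choices): the truncated integrals ∫_1^t f increase toward the improper integral ∫_1^∞ 1/x² dx = 1 as t grows.

```python
# Improper integral per Definition 8.30(i):
# the integral of x^{-2} over [1, oo) is the limit of integrals over [1, t].
def truncated(t, n=100000):
    """Midpoint Riemann-sum approximation of the integral of 1/x^2 over [1, t]."""
    dx = (t - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * dx) ** 2 for i in range(n)) * dx

# Exactly, the integral over [1, t] is 1 - 1/t, which increases to 1.
values = [truncated(t) for t in (10.0, 100.0, 1000.0)]
```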

Theorem 8.33 (Comparison Theorem for Improper Integrals). Let f be Riemann integrable on [s, t] whenever a < s < t < b, and suppose ∫_a^b g exists and 0 ≤ f ≤ g. Then ∫_a^b f exists, and

∫_a^b f ≤ ∫_a^b g.

Proof. If necessary, split the interval [a, b] into two, so that without loss of generality f and g are integrable on [a, t] for all t ∈ (a, b). Since 0 ≤ f ≤ g, for all t ∈ (a, b) we have

∫_a^t f ≤ ∫_a^t g ≤ ∫_a^b g.

Since f ≥ 0, the function t ↦ ∫_a^t f is increasing and bounded above, so the left hand limit lim_{t↑b} ∫_a^t f exists, and we get

∫_a^b f = lim_{t↑b} ∫_a^t f ≤ ∫_a^b g. □

Example 8.34. ∫_1^∞ 1/x^p dx exists if and only if p > 1. To verify this, first let p > 1. Then

lim_{t→∞} ∫_1^t x^{−p} dx = lim_{t→∞} (t^{1−p} − 1)/(1 − p) = 1/(p − 1),

since lim_{t→∞} t^a = 0 when a < 0. For p = 1,

lim_{t→∞} ∫_1^t dx/x = lim_{t→∞} log t = ∞,

so ∫_1^∞ 1/x dx does not exist. Finally, for p < 1 we have 1/x^p ≥ 1/x for all x ≥ 1, so by the Comparison Theorem ∫_1^∞ 1/x^p dx fails to exist.

Corollary 8.35. If f is Riemann integrable on [s, t] whenever a < s < t < b, and if ∫_a^b |f| exists, then so does ∫_a^b f, and

|∫_a^b f| ≤ ∫_a^b |f|.

Proof. Since

0 ≤ |f| + f ≤ 2|f|

and ∫_a^b 2|f| exists, by the Comparison Theorem so does ∫_a^b (|f| + f), hence so does

∫_a^b f = ∫_a^b ((|f| + f) − |f|).

For the other part, if a < s < t < b then

|∫_s^t f| ≤ ∫_s^t |f| ≤ ∫_a^b |f|,

so letting s ↓ a and t ↑ b we get

|∫_a^b f| ≤ ∫_a^b |f|.

 Remark 8.36. Combining the above corollary with the Comparison Theo- rem, we see that if f is Riemann integrable on [s, t] whenever a < s < t < b, and if b g exists and f g, then b f exists. Ra | | ≤ Ra 9. Log and exp Definition 9.1 (The “Natural” Logarithm Function). Define log: (0, ) R by ∞ → x dt log x = Z . 1 t 1 Proposition 9.2. (i) log0(x) = x ; (ii) log 1 = 0; (iii) log xy = log x + log y; x (iv) log = log x log y; y − (v) log xn = n log x for all n Z; (vi) log is strictly increasing; ∈ (vii) limx 0 log x = and limx log x = ; (viii) log is↓ onto R. −∞ →∞ ∞ Proof. (i) This follows from the Fundamental Theorem of Calculus. (ii) We have 1 dt log 1 = Z = 0. 1 t (iii) Fix y > 0. Then d 1 d 1 d log xy = (xy) = = log x. dx xy dx x dx Hence there exists c R such that log xy = log x + c for all x > 0. Letting x = 1, we find c = log∈y. (iv) This follows from (iii), since x x log + log y = log y = log x. y  y  (v) This follows from (iii) and induction for n > 0, from (ii) for n = 0, and from (ii) and (iv) for n < 0. (vi) Since log0 > 0, log is strictly increasing by the Mean Value Theorem. (vii) Since log is increasing, it is either bounded below or limx 0 log x = . Since log 2 > log 1 = 0, ↓ −∞ n n log 2− = n log 2 →∞ , − −−−→−∞ hence log is not bounded below. Therefore, we must have limx 0 log x = . ↓ −∞ ANALYSIS LECTURES 61

Similarly, to show limx log x = , it suffices to notice →∞ ∞ n n log 2 = n log 2 →∞ . −−−→∞ (viii) This follows from (vii) and the Intermediate Value Theorem, since log is continuous.  Remark 9.3. In the proof of (vii) above, we used the letter n to (lazily) signal that we were considering a sequence. Definition 9.4 (The Exponential Function). Define exp: R (0, ) by 1 → ∞ exp = log− .
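Since Definition 9.1 builds log from an integral, a numeric integrator can verify Proposition 9.2(iii) directly (a sketch — the midpoint rule and the step count are illustrative choices):

```python
import math

def log_via_integral(x, n=100000):
    """Approximate log x as the integral of dt/t over [1, x] by a midpoint sum (x > 0)."""
    dt = (x - 1.0) / n   # dt < 0 when x < 1, giving the oriented integral of Definition 8.18
    return sum(1.0 / (1.0 + (i + 0.5) * dt) for i in range(n)) * dt

lhs = log_via_integral(6.0)                          # log 6
rhs = log_via_integral(2.0) + log_via_integral(3.0)  # log 2 + log 3
```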

Proposition 9.5. (i) exp0(x) = exp(x); (ii) exp(0) = 1; (iii) exp(x) exp(y) = exp(x + y); exp(x) (iv) = exp(x y); exp(y) − (v) exp is strictly increasing; (vi) limx exp(x) = 0 and limx exp(x) = ; (vii) exp →−∞is onto (0, ). →∞ ∞ ∞ Proof. (i) By the Inverse Function Theorem, 1 exp0(x) = = exp(x). log0(exp(x)) (ii) exp(0) = exp(log 1) = 1. (iii) We have exp(x) exp(y) = exp log exp(x) exp(y)   = exp log exp(x) + log exp(y)  = exp(x + y). (iv) We have exp(x) exp(x) = exp log exp(y)  exp(y)  = exp log exp(x) log exp(y) −  = exp(x y). − (v) This follows from Theorem 6.15, since exp is the inverse of the strictly increasing function log. (vi) This follows from the corresponding limits involving log. (vii) The range of exp is the domain of log, which is (0, ). ∞  Definition 9.6 (Arbitrary Powers). For each x > 0 and t R define ∈ xt = exp(t log x). Proposition 9.7. For all x, y > 0 and t, s R: ∈ (i) xtxs = xt+s; 62 JOHN QUIGG

xt (ii) = xt s; xs − (iii) log xt = t log x; (iv) (xt)s = xts; (v) (xy)t = xtyt; d t t 1 (vi) (Power Rule for Arbitrary Exponents) dx x = tx − .

Proof. (i) We have

xtxs = exp(t log x) exp(s log x) = exp(t log x + s log x) = exp((t + s) log x) = xt+s.

(ii) We have

x^t / x^s = exp(t log x)/exp(s log x) = exp(t log x − s log x) = exp((t − s) log x) = x^{t−s}.
(iii) log x^t = log exp(t log x) = t log x.
(iv) (x^t)^s = exp(s log x^t) = exp(st log x) = x^{st} = x^{ts}.
(v) We have

(xy)t = exp(t log xy) = exp(t(log x + log y)) = exp(t log x + t log y) = exp(t log x) exp(t log y) = xtyt.

(vi) We have d d t xt = exp(t log x) = exp(t log x) = txt 1. dx dx x − 

Definition 9.8 (The Number e). e := exp(1). Proposition 9.9. ex = exp(x) for all x R. ∈ x Proof. e = exp(x log e) = exp(x). 

Example 9.10. The exponential and logarithm functions, together with l’Hˆopital’s Rule, allow easy computation of many limits which are in certain “exponen- tial indeterminate” forms, for example: x (i) limx 0 x = 1; ↓ 1/x (ii) limx x = 1; →∞ x t t (iii) limx 1 +  = e if t R. →∞ x ∈ ANALYSIS LECTURES 63
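The limits in Example 9.10 can be sampled numerically (a sketch — no l'Hôpital here, just evaluation at small/large arguments; the sample points are arbitrary):

```python
import math

# (i) x^x -> 1 as x decreases to 0, computed as exp(x log x) per Definition 9.6.
xx = [math.exp(x * math.log(x)) for x in (1e-2, 1e-4, 1e-6)]

# (iii) (1 + t/x)^x -> e^t as x -> infinity, sampled here with t = 2.
et = (1.0 + 2.0 / 1e7) ** 1e7
```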

10. Series

Definition 10.1. (i) Given a sequence (an) in R, define a new se- quence (sk) by k s = a . k X n n=1

The series ∞ an is defined to be the sequence (sk). Pn=1 (ii) We say the series ∞ an has nth term an, and kth partial sum Pn=1 sk. (iii) If the series ∞ an converges (that is, if the sequence (sk) of par- Pn=1 tial sums converges), the limit limk sk is (ambiguously) denoted →∞ ∞ an, and called the sum of the series. Pn=1 Remark 10.2. Series and sequences are just two ways of looking at the same thing. More precisely, not only does every series uniquely determine the sequence of partial sums (as in the above definition), but conversely every sequence of real numbers is the sequence of partial sums of a unique series: given the sequence (sk), define the series an by P

s1 if n = 1 an = ( sn sn 1 if n > 1. − − Remark 10.3. Just as with sequences, it is often convenient to allow a series to “start somewhere other than 1”, for example we frequently encounter series of the form ∞ an. Also, it is frequently convenient to refer to a Pn=0 series using the abbreviated notation “ an”, especially when discussing general properties. This abbreviated notationP allows the starting point of the series to be anything, but in the development of the general theory we usually assume the starting point of the series is 1. n Example 10.4. If x < 1, the geometric series ∞ x converges and | | Pn=0 ∞ 1 xn = . X 1 x n=0 − To see this, note that an easy induction argument shows the identity

k 1 xk+1 xn = − , X 1 x n=0 − and xk+1 0 since x < 1. This is one of the rare instances where we can get a closed→ form expression| | for the partial sums. If c R it is sometimes convenient to abuse the terminology by referring to a∈ series of the form cxn (no matter what the starting point is) as a geometric series as well. P 64 JOHN QUIGG
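Example 10.4 in code (a sketch — x = 1/2 and 30 terms are illustrative choices): the partial sums satisfy the closed-form identity and approach 1/(1 − x).

```python
# Geometric series with ratio x = 1/2: partial sums vs. closed form.
x = 0.5
s = 0.0
partials = []
for n in range(30):          # terms x^0 through x^29
    s += x ** n
    partials.append(s)

closed_form = (1 - x ** 30) / (1 - x)   # the identity with k = 29
limit = 1 / (1 - x)                     # the sum of the series, here 2
```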

2 Example 10.5. The series n∞=1 1/(n +n) converges, because the kth partial sum is P k k 1 1 1 1 k sk = = = 1 →∞ 1. X n2 + n Xn − n + 1 − k + 1 −−−→ n=1 n=1

Again, we got lucky enough to find a closed form for s_k. In this case we say we have a telescoping sum.

Example 10.6. The harmonic series Σ_1^∞ 1/n diverges, because

s_k = Σ_{n=1}^k 1/n ≥ ∫_1^{k+1} dx/x = log(k + 1) → ∞.

The technique of this example is generalized in the Integral Test below.

Remark 10.7. The following result could be phrased as "a series converges if and only if it is Cauchy":

Theorem 10.8. an converges if and only if for all  > 0 there exists P k l N such that if k j l then an < . ∈ ≥ ≥ j P Proof. This follows from the corresponding result for sequences, since k sk sj 1 = an . | − − | X j

 Remark 10.9. The above theorem shows immediately that changing (or re- moving) finitely many terms of a series does not affect whether it converges (although of course it can affect the sum if the series converges).

Corollary 10.10. If an converges, then an 0. P → Proof. This follows immediately from the preceding theorem, since n a = a . n X k n  Remark 10.11. The above corollary only goes one way; it can happen that an 0 but an diverges: the most elementary example of this phenomenon is the→ harmonicP series 1/n. P n Example 10.12. If x 1, the geometric series 0∞ x diverges. This follows immediately from| the| ≥ above corollary, since xP 1 implies xn 0. | | ≥ 6→ Remark 10.13. The following theorem says (roughly) that a series can some- times be viewed as a sort of “infinite Riemann sum” for an improper integral: Theorem 10.14 (Integral Test). Let f : [1, ) R be decreasing and non- ∞ → negative. Then the series ∞ f(n) converges if and only if the improper P1 integral ∞ f exists. R1 ANALYSIS LECTURES 65

Proof. Since f(n) ≥ 0 for all n, the sequence of partial sums

s_k = Σ_{n=1}^k f(n)

is increasing, hence converges if and only if it is bounded above. For k ≥ 2,

Σ_{n=2}^k f(n) ≤ ∫_1^k f ≤ Σ_{n=1}^{k−1} f(n),

so the sequence (Σ_1^k f(n))_{k=1}^∞ is bounded above if and only if the sequence (∫_1^k f)_{k=1}^∞ is. Since f ≥ 0, the function x ↦ ∫_1^x f is increasing, so lim_{x→∞} ∫_1^x f exists (in the real numbers) if and only if the sequence (∫_1^k f) is bounded above. The result follows. □

Remark 10.15. In the Integral Test, the left endpoint of the interval [1, ∞) can be replaced by any positive number; thus it is enough for f to be eventually decreasing and nonnegative (meaning there exists a > 0 such that the desired property holds on [a, ∞)).

Example 10.16. The p-series Σ_1^∞ 1/n^p converges if and only if p > 1. This is most easily seen by considering the cases p > 0 and p ≤ 0. When p > 0, the Integral Test applies with the nonnegative decreasing function x ↦ 1/x^p, and we know ∫_1^∞ 1/x^p dx exists if and only if p > 1. When p ≤ 0, the terms 1/n^p do not go to 0, so the series diverges.
Note that the harmonic series is the special case p = 1.
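The two-sided estimate in the proof of the Integral Test can be checked directly for the p-series with p = 2 (a sketch; here ∫_1^k x^{−2} dx = 1 − 1/k is used in closed form rather than approximated):

```python
# Integral Test bounds for f(n) = 1/n^2:
#   sum_{n=2}^{k} f(n)  <=  integral of f over [1, k]  <=  sum_{n=1}^{k-1} f(n).
k = 1000
f = lambda n: 1.0 / n ** 2
integral_1_to_k = 1.0 - 1.0 / k               # exact value of the integral

left = sum(f(n) for n in range(2, k + 1))     # sum from n=2 to k
right = sum(f(n) for n in range(1, k))        # sum from n=1 to k-1
```

Both sums are bounded, consistent with the convergence of Σ 1/n² (to π²/6, as it happens).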

Proposition 10.17. If an and bn both converge, then: P P (i) (an + bn) = an + bn; P P P (ii) can = c an if c R. P P ∈ k k k Proof. (i) 1(an + bn) = 1 an + 1 bn 1∞ an + 1∞ bn. k P k P P → P P (ii) 1 can = c 1 an c 1∞ an.  P P → P Theorem 10.18 (Comparison Test). If an bn for all n and bn con- | | ≤ P verges, then an converges and an bn. P P ≤ P k Proof. Given  > 0, choose l N such that k j l implies j bn < . Then k j l implies ∈ ≥ ≥ P ≥ ≥ k k k an an bn < . X ≤ X | | ≤ X j j j

This shows an converges. For the otherP part, for all k N we have ∈ k k an bn. X ≤ X 1 1

66 JOHN QUIGG

Letting k , we get → ∞ ∞ ∞ an bn. X ≤ X 1 1



Definition 10.19. We say a series an converges absolutely if an con- verges. P P | |

Corollary 10.20. If an converges absolutely, then an converges and P P an an . P ≤ P | |

Proof. This follows immediately from the Comparison Test. 

Remark 10.21. Thus, in the Comparison Test we can conclude that an P converges absolutely. Moreover, if we only have an bn for large n (mean- ing there exists k N such that for all n k the| | desired ≤ property holds), ∈ ≥ then we can still conclude an converges absolutely, although we no longer P have the inequality an bn. | P | ≤ P Remark 10.22. If both series have nonnegative terms, there is a useful con- trapositive of the Comparison Test: if 0 an bn for large n and an ≤ ≤ P diverges, then bn also diverges. P Corollary 10.23 (Limit Comparison Test). If the sequence of fractions (an/bn) converges to a nonzero real number, then the series an converges P absolutely if and only if bn does. P

Proof. Letting L = lim an/bn, we have

L bn 3 L bn | || | < an < | || | 2 | | 2 for large n, so the result follows from the Comparison Test.  Definition 10.24. We say a series converges conditionally if it converges but not absolutely. Remark 10.25. Conditionally convergent series are delicate (for example, see the remark following the Rearrangement Theorem below); the only general test we will prove which has the power to recognize conditional convergence is the following:

Theorem 10.26 (Alternating Series Test). If an 0, then the alternating n ↓ series ∞( 1) an converges, and moreover the sum satisfies the inequalities P0 −

∞ n 0 ( 1) an a0. ≤ X − ≤ 0

Moreover, these inequalities are strict if (an) is strictly decreasing. ANALYSIS LECTURES 67

Proof. For any k N, ∈ k n ak 1 ak if k odd ( 1) an = (a0 a1) + (a2 a3) + + ( − − X − − − ··· a if k even 0 k 0. ≥ On the other hand, k n ak 1 ak if k even ( 1) an = a0 (a1 a2) ( − − X − − − − · · · − a if k odd 0 k a0. ≤ Given  > 0, choose p N such that ap < . Then k j p implies ∈ ≥ ≥ k n ( 1) an aj ap < , X − ≤ ≤ j

n by the above reasoning. Therefore, ( 1) an converges, and then the inequality P − k n 0 ( 1) an a0 for all k ≤ X − ≤ 0 gives

∞ n 0 ( 1) an a0. ≤ X − ≤ 0 Moreover, if (an) is strictly decreasing, the even partial sums s2k are strictly decreasing and the odd partial sums s2k+1 are strictly increasing, so the sum is strictly between 0 and a0.  Remark 10.27. More generally, a series is called alternating if its signs al- ternate (although this allows for some 0 terms), and the Alternating Series Test says that if the absolute values of the terms of an alternating series decrease to 0 then the series converges and its sum is between 0 and the first term of the series. n+1 Example 10.28. The alternating harmonic series 1∞( 1) /n converges by the Alternating Series Test, but the convergenceP − is only conditional, since the series of absolute values is the harmonic series, which diverges. The Alternating Series Test also says the sum of the alternating harmonic series is strictly between 0 and 1; actually, we’ll see later that the sum is log 2.
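Example 10.28 can be watched happening (a sketch; the number of terms is arbitrary): every partial sum of the alternating harmonic series stays in (0, 1], and the sums approach log 2.

```python
import math

N = 100000
s = 0.0
all_in_range = True
for n in range(1, N + 1):
    s += (-1) ** (n + 1) / n
    if not (0 < s <= 1):
        all_in_range = False

# Alternating series remainder: |s - log 2| is at most the first omitted term.
error = abs(s - math.log(2.0))
```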

Theorem 10.29 (Rearrangement Theorem). If Σ_{n=1}^∞ a_n converges absolutely and f : N → N is 1-1 onto, then Σ_{n=1}^∞ a_{f(n)} converges absolutely and

Σ_{n=1}^∞ a_{f(n)} = Σ_{n=1}^∞ a_n.

Proof. Since Σ |a_{f(n)}| is a rearrangement of Σ |a_n|, it suffices to show Σ a_{f(n)} converges and Σ a_{f(n)} = Σ a_n. Given ε > 0, choose j ∈ N such that Σ_{n=j+1}^∞ |a_n| < ε/2. Now choose l ∈ N such that

f{1, . . . , l} ⊇ {1, . . . , j}.

Then k ≥ l implies

|Σ_{n=1}^k a_{f(n)} − Σ_{n=1}^∞ a_n| ≤ |Σ_{n=1}^k a_{f(n)} − Σ_{n=1}^j a_n| + |Σ_{n=1}^j a_n − Σ_{n=1}^∞ a_n|
= |Σ_{n≤k, f(n)>j} a_{f(n)}| + |Σ_{n=j+1}^∞ a_n|
≤ Σ_{n≤k, f(n)>j} |a_{f(n)}| + Σ_{n=j+1}^∞ |a_n|
≤ 2 Σ_{n=j+1}^∞ |a_n|
< ε. □

Remark 10.30. What if an converges conditionally? Then it turns out that there exist divergentP rearrangements, and moreover for any s R there exists a rearrangement with sum s. This is called Riemann’s Rearrangement∈ Theorem; since we do not need it, we do not include the proof. As an exam- ple of this phenomenon, let s denote the sum of the alternating harmonic n+1 series 1∞( 1) /n. The rearrangement formed by alternately taking two positiveP terms− and one negative term converges to 3s/2.
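The closing example of Remark 10.30 can be tested numerically (a sketch; the block count is arbitrary): summing the positive terms 1, 1/3, 1/5, . . . two at a time against the negative terms −1/2, −1/4, . . . one at a time lands near 3s/2 = (3/2) log 2, not s.

```python
import math

# Rearranged alternating harmonic series: two positive terms, one negative.
total = 0.0
pos, neg = 1, 2              # next odd and next even denominator
for _ in range(200000):      # 200000 blocks of (+, +, -)
    total += 1.0 / pos + 1.0 / (pos + 2) - 1.0 / neg
    pos += 4
    neg += 2

target = 1.5 * math.log(2.0)   # 3s/2, where s = log 2
```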

Theorem 10.31 (Ratio Test). Assume a_n ≠ 0 for all n.
(i) If lim sup |a_{n+1}/a_n| < 1 then Σ a_n converges absolutely, while
(ii) if lim inf |a_{n+1}/a_n| > 1 then Σ a_n diverges.

Proof. (i) If lim sup |a_{n+1}/a_n| < t < 1, then there exists k ∈ N such that n ≥ k implies |a_{n+1}/a_n| < t. Then for all j ∈ N,

|a_{k+j}| ≤ t|a_{k+j−1}| ≤ t²|a_{k+j−2}| ≤ · · · ≤ t^j |a_k|.

Thus the tail k∞+1 an converges absolutely by comparison to the geometric n P series t ak , so 1∞ an converges absolutely also. (ii)P In this| case| thereP exists k N such that n k implies ∈ ≥ an+1 > 1, an so 0 < ak ak+1 . | | ≤ | | ≤ · · · Hence an 0, thus an diverges.  6→ P Remark 10.32. It often happens that L := lim an+1/an exists, and then | | the Ratio Test says an converges absolutely if L < 1 and diverges if L > 1. However, ifPL = 1 the series could either converge or diverge. For example, the harmonic series 1/n diverges, while the series p-series 2 P 1/n converges, but in both cases the ratios an+1/an converge to 1. P 1/n Theorem 10.33 (Root Test). Put ρ = lim sup an . | | (i) If ρ < 1 then an converges absolutely, while P (ii) if ρ > 1 then an diverges. P Proof. (i) If ρ < t < 1, then there exists k N such that n k implies 1/n n ∈ ≥ an < t, so an < t . Hence an converges absolutely by comparison |to the| geometric| series| tn. P P 1/n (ii) In this case for all k N there exists n k such that an > 1, so ∈ ≥ | | an > 1. Hence an 0, so an diverges.  | | 6→ P Remark 10.34. Similarly to the Ratio Test, if ρ = 1 the series could either converge or diverge, and the same examples 1/n and 1/n2 apply. P P 11. Sequences of functions Standing Hypothesis 11.1. Recall that we are assuming that all our functions f satisfy dom f, ran f R unless otherwise specified. ⊆ Definition 11.2. Let f, f1, f2,... : A R. We say the sequence (fn) of functions: →

(i) converges pointwise to f if for every x A we have fn(x) f(x). ∈ → We also write fn f or f = lim fn; (ii) converges uniformly→ to f if for all  > 0 there exists k N such that for all n N and x A, ∈ ∈ ∈ if n k then fn(x) f(x) < . ≥ | − | Remark 11.3. It follows almost immediately from the definition that fn f → uniformly if and only if supx A fn(x) f(x) 0. ∈ | − | → Remark 11.4. Note that fn f pointwise if and only if for all x A and  > 0 there exists k →N such that for all n N we have n ∈ k ∈ ∈ ≥ implies fn(x) f(x) < . The definition of uniform convergence moves the universally| quantified− | x across the existentially quantified k, producing a stronger condition in that now a single k has to work for all x simultaneously. 70 JOHN QUIGG

Therefore, basic logic tells us that uniform convergence implies pointwise convergence.

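The contrast between the two modes of convergence can be seen numerically. The sketch below (not part of the notes; grid-based suprema are only approximations of the true suprema) examines $f_n(x) = x^n$, which converges pointwise to $0$ on $[0,1)$: on all of $[0,1)$ the supremum of $|f_n - 0|$ stays near $1$ for every $n$, so the convergence is not uniform there, while on $[0, 0.9]$ the supremum is $0.9^n \to 0$, so the convergence is uniform.

```python
# Numerical contrast of pointwise vs. uniform convergence for f_n(x) = x**n.
# (Illustrative sketch only; suprema are approximated on a finite grid.)

def sup_error(n, xs):
    """Approximate sup over the grid xs of |f_n(x) - f(x)|, where f = lim f_n = 0."""
    return max(x**n for x in xs)

grid_01 = [i / 1000 for i in range(1000)]        # grid in [0, 1)
grid_09 = [0.9 * i / 1000 for i in range(1001)]  # grid in [0, 0.9]

for n in (5, 50, 500):
    print(n, sup_error(n, grid_01), sup_error(n, grid_09))
# On [0, 1) the sup error stays near 1 (convergence is not uniform),
# while on [0, 0.9] it is 0.9**n -> 0 (convergence is uniform).
```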
Theorem 11.5 (Uniform Cauchy Criterion). Let $f_1, f_2, \ldots : A \to \mathbb R$. Then $(f_n)$ converges uniformly if and only if for all $\epsilon > 0$ there exists $k \in \mathbb N$ such that for all $x \in A$ and $n, j \in \mathbb N$,
$$\text{if } n, j \ge k \text{ then } |f_n(x) - f_j(x)| < \epsilon.$$

Proof. First assume $f_n \to f$ uniformly. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that for all $x \in A$ and $n \in \mathbb N$, $n \ge k$ implies $|f_n(x) - f(x)| < \epsilon/2$. Then for all $x \in A$, if $n, j \ge k$ then
$$|f_n(x) - f_j(x)| \le |f_n(x) - f(x)| + |f(x) - f_j(x)| < \epsilon.$$
Conversely, assume $(f_n)$ satisfies the Uniform Cauchy Criterion. Then for all $x \in A$, the sequence $(f_n(x))$ is Cauchy, so there exists $f(x) \in \mathbb R$ such that $f_n(x) \to f(x)$. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that for all $x \in A$ and $n, j \in \mathbb N$, if $n, j \ge k$ then $|f_n(x) - f_j(x)| < \epsilon/2$. Fix $n \ge k$ and $x \in A$, and let $j \to \infty$ to get
$$|f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon. \qquad \square$$

Theorem 11.6. Let $f, f_1, f_2, \ldots : A \to \mathbb R$ and $t \in A$. If $f_n \to f$ uniformly and each $f_n$ is continuous at $t$, then $f$ is also continuous at $t$.

Proof. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that $|f_k(x) - f(x)| < \epsilon/3$ for all $x \in A$. Now choose $\delta > 0$ such that $|x - t| < \delta$ implies $|f_k(x) - f_k(t)| < \epsilon/3$. Then $|x - t| < \delta$ implies
$$|f(x) - f(t)| \le |f(x) - f_k(x)| + |f_k(x) - f_k(t)| + |f_k(t) - f(t)| < \epsilon. \qquad \square$$

Theorem 11.7. Let $f, f_1, f_2, \ldots : [a,b] \to \mathbb R$. If $f_n \to f$ uniformly and each $f_n$ is integrable, then $f$ is integrable and
$$\int_a^b f_n \to \int_a^b f.$$

Proof. We first show the sequence $\left(\int_a^b f_n\right)$ converges. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that $n, j \ge k$ imply
$$|f_n(x) - f_j(x)| < \frac{\epsilon}{2(b-a)} \quad \text{for all } x \in [a,b].$$
Then $n, j \ge k$ imply
$$\left|\int_a^b f_n - \int_a^b f_j\right| \le \int_a^b |f_n - f_j| \le \frac{\epsilon}{2(b-a)}(b-a) = \frac{\epsilon}{2} < \epsilon.$$
Thus $I := \lim_n \int_a^b f_n$ exists.

Now we show $f$ is integrable and $\int_a^b f = I$. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that both
$$|f_k(x) - f(x)| < \frac{\epsilon}{3(b-a)} \quad \text{for all } x \in [a,b]$$
and
$$\left|\int_a^b f_k - I\right| < \frac{\epsilon}{3}.$$
Choose a partition $P = \{x_i\}_0^n$ such that $t_i \in [x_{i-1}, x_i]$ for all $i$ implies
$$\left|\sum_i f_k(t_i)\,\Delta x_i - \int_a^b f_k\right| < \frac{\epsilon}{3}.$$

Then

$$\left|\sum f(t_i)\,\Delta x_i - I\right| \le \left|\sum f(t_i)\,\Delta x_i - \sum f_k(t_i)\,\Delta x_i\right| + \left|\sum f_k(t_i)\,\Delta x_i - \int_a^b f_k\right| + \left|\int_a^b f_k - I\right|$$
$$< \sum |f(t_i) - f_k(t_i)|\,\Delta x_i + \frac{2\epsilon}{3} < \frac{\epsilon}{3(b-a)}(b-a) + \frac{2\epsilon}{3} = \epsilon. \qquad \square$$


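The theorem above can be illustrated numerically. The sketch below (not part of the notes; `riemann_sum` is a hypothetical helper that crudely approximates the integral) uses $f_n(x) = x^2 + \sin(nx)/n$, which converges uniformly to $f(x) = x^2$ on $[0,1]$ since $\sup |f_n - f| \le 1/n$, so the integrals $\int_0^1 f_n$ should approach $\int_0^1 f = 1/3$.

```python
import math

# Numeric illustration (a sketch under the stated assumptions, not a proof):
# f_n(x) = x**2 + sin(n*x)/n converges uniformly to f(x) = x**2 on [0, 1],
# so the integrals of f_n over [0, 1] should converge to 1/3.

def riemann_sum(g, a, b, m=10000):
    """Left-endpoint Riemann sum of g over [a, b] with m subintervals."""
    dx = (b - a) / m
    return sum(g(a + i * dx) for i in range(m)) * dx

for n in (1, 10, 100):
    integral_fn = riemann_sum(lambda x: x**2 + math.sin(n * x) / n, 0.0, 1.0)
    print(n, integral_fn)  # approaches 1/3 as n grows
```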

Theorem 11.8. Let $(f_n)$ be a sequence of differentiable functions on $(a,b)$. If $(f_n')$ converges uniformly on $(a,b)$ and $(f_n(c))$ converges for some $c \in (a,b)$, then $(f_n)$ converges pointwise on $(a,b)$ to a differentiable function $f$, and

$$f_n' \to f'$$
pointwise on $(a,b)$. Moreover, if $(a,b)$ is bounded then $f_n \to f$ uniformly.

Proof. For each $n \in \mathbb N$ and $t \in (a,b)$ define $g_n^t : (a,b) \to \mathbb R$ by
$$g_n^t(x) = \begin{cases} \dfrac{f_n(x) - f_n(t)}{x - t} & \text{if } x \ne t \\ f_n'(t) & \text{if } x = t. \end{cases}$$
Claim: the sequence $(g_n^t)_{n=1}^{\infty}$ converges uniformly. Given $\epsilon > 0$, choose $k \in \mathbb N$ such that $n, j \ge k$ and $x \in (a,b)$ imply
$$|f_n'(x) - f_j'(x)| < \epsilon.$$

Then $n, j \ge k$ and $x \ne t$ imply
$$|g_n^t(x) - g_j^t(x)| = \left|\frac{(f_n - f_j)(x) - (f_n - f_j)(t)}{x - t}\right| = |(f_n - f_j)'(s)| \quad \text{for some } s \text{ between } x \text{ and } t,$$
which is less than $\epsilon$. Also,
$$|g_n^t(t) - g_j^t(t)| = |f_n'(t) - f_j'(t)| < \epsilon,$$
and the claim is verified. Put $g^t := \lim g_n^t$. Each $g_n^t$ is continuous at $t$, hence so is $g^t$. Note that for all $x, t \in (a,b)$,
$$(*)\qquad f_n(x) = f_n(t) + g_n^t(x)(x - t).$$
Letting $t = c$, we see that $(f_n)$ converges at $x$. Put $f := \lim f_n$. Letting $n \to \infty$ in $(*)$, we get
$$f(x) = f(t) + g^t(x)(x - t).$$
Since $g^t$ is continuous at $t$, $f$ is differentiable at $t$ and
$$f'(t) = g^t(t) = \lim g_n^t(t) = \lim f_n'(t).$$
For the other part, just note that if $(a,b)$ is bounded then $|x - t| \le b - a$ in $(*)$, which implies the convergence of $f_n$ to $f$ is uniform. $\square$

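The hypothesis that $(f_n')$, rather than $(f_n)$, converges uniformly is essential. A quick numerical look at the standard example (a sketch, not part of the notes): $f_n(x) = \sin(nx)/n$ converges uniformly to $0$ since $\sup |f_n| \le 1/n$, yet $f_n'(x) = \cos(nx)$ oscillates and fails to converge at most points.

```python
import math

# Why Theorem 11.8 assumes uniform convergence of (f_n'), not of (f_n):
# f_n(x) = sin(n*x)/n -> 0 uniformly, but f_n'(x) = cos(n*x) does not converge.
# (Illustrative sketch only.)

def f(n, x):
    return math.sin(n * x) / n

def f_prime(n, x):
    return math.cos(n * x)

x = 0.5
print([round(f(n, x), 4) for n in (10, 100, 1000)])        # tends to 0
print([round(f_prime(n, x), 4) for n in (10, 100, 1000)])  # oscillates, no limit
```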
12. Series of functions

Standing Hypothesis 12.1. Recall that we are assuming that all our functions $f$ satisfy $\operatorname{dom} f, \operatorname{ran} f \subseteq \mathbb R$ unless otherwise specified.

Definition 12.2. (i) Given a sequence $(f_n)$ of real-valued functions on $A$, the series $\sum_{n=1}^{\infty} f_n$ is defined to be the sequence $\left(\sum_{n=1}^{k} f_n\right)_{k=1}^{\infty}$ of partial sums.
(ii) If the series $\sum_{n=1}^{\infty} f_n$ converges pointwise (that is, if the sequence of partial sums converges pointwise), the limit $\lim_{k\to\infty} \sum_{n=1}^{k} f_n$ is (ambiguously) denoted $\sum_{n=1}^{\infty} f_n$, and called the sum of the series.
(iii) Of course, we say $\sum_{n=1}^{\infty} f_n$ converges uniformly if the sequence of partial sums converges uniformly.
(iv) We say $\sum_{n=1}^{\infty} f_n$ converges absolutely if for every $x \in A$ the series $\sum_{n=1}^{\infty} f_n(x)$ of real numbers converges absolutely.

Proposition 12.3 (Uniform Cauchy Criterion for Series). $\sum f_n$ converges uniformly on $A$ if and only if for all $\epsilon > 0$ there exists $l \in \mathbb N$ such that for all $x \in A$ and $k, j \in \mathbb N$,
$$\text{if } k \ge j \ge l \text{ then } \left|\sum_{n=j}^{k} f_n(x)\right| < \epsilon.$$

Proof. This follows immediately from the Uniform Cauchy Criterion for sequences of functions. $\square$

Theorem 12.4 (Weierstrass M-Test). Assume there exists a nonnegative sequence $(M_n)$ in $\mathbb R$ such that both
(i) $\sum M_n < \infty$, and
(ii) for all $n \in \mathbb N$ and $x \in A$ we have $|f_n(x)| \le M_n$.
Then $\sum f_n$ converges uniformly and absolutely on $A$.

Proof. Given $\epsilon > 0$, choose $l \in \mathbb N$ such that $k \ge j \ge l$ implies $\sum_{n=j}^{k} M_n < \epsilon$. Then $k \ge j \ge l$ and $x \in A$ imply
$$\left|\sum_{n=j}^{k} f_n(x)\right| \le \sum_{n=j}^{k} |f_n(x)| \le \sum_{n=j}^{k} M_n < \epsilon.$$

$\square$

Example 12.5. $\sum x^n$ converges absolutely on $(-1,1)$, and uniformly on every interval of the form $[-s,s]$ with $0 < s < 1$.

Theorem 12.6. If $\sum f_n$ converges uniformly and each $f_n$ is continuous at $t$, then the sum $\sum f_n$ is continuous at $t$.

Proof. Immediate from Theorem 11.6. $\square$

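The M-Test can be watched working numerically. The sketch below (not part of the notes; the helper names are illustrative) applies it to $\sum_n \sin(nx)/n^2$ on all of $\mathbb R$, with $M_n = 1/n^2$: the tail of $\sum M_n$ bounds the tail of the function series at every $x$ simultaneously, which is exactly what uniform convergence requires.

```python
import math

# Sketch of the Weierstrass M-Test for sum_n sin(n*x)/n**2 on R:
# |sin(n*x)/n**2| <= M_n := 1/n**2, and sum M_n converges, so the tail of
# sum M_n is an x-independent bound on the tail of the function series.

def tail_bound(j, k):
    """Sum of M_n for n = j..k; an x-independent bound on the series tail."""
    return sum(1.0 / n**2 for n in range(j, k + 1))

def partial_sum(x, k):
    return sum(math.sin(n * x) / n**2 for n in range(1, k + 1))

x = 2.0
gap = abs(partial_sum(x, 2000) - partial_sum(x, 1000))
print(gap, "<=", tail_bound(1001, 2000))  # the M-test bound holds at this x
```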
Theorem 12.7. If $\sum f_n$ is a uniformly convergent series of integrable functions on $[a,b]$, then the sum $\sum f_n$ is integrable and
$$\int_a^b \sum f_n = \sum \int_a^b f_n.$$
Proof. Immediate from Theorem 11.7. $\square$

Theorem 12.8. Let $\sum f_n$ be a series of differentiable functions on $(a,b)$. If $\sum f_n'$ converges uniformly on $(a,b)$ and $\sum f_n(c)$ converges for some $c \in (a,b)$, then $\sum f_n$ converges pointwise on $(a,b)$, the sum $\sum f_n$ is differentiable, and
$$\left(\sum f_n\right)' = \sum f_n'$$
on $(a,b)$. Moreover, if $(a,b)$ is bounded then $\sum f_n$ converges uniformly.

Proof. Immediate from Theorem 11.8. $\square$

13. Power series

Definition 13.1. Let $(c_n)_{n=0}^{\infty}$ be a real-valued sequence and let $a \in \mathbb R$. The series $\sum_{n=0}^{\infty} c_n(x-a)^n$ is called the power series with coefficients $c_n$ and center $a$.

Theorem 13.2 (Cauchy-Hadamard Theorem). Let $\sum_{n=0}^{\infty} c_n(x-a)^n$ be a power series, and put
$$r := \frac{1}{\limsup |c_n|^{1/n}},$$
interpreted as $0$ if the $\limsup$ is $\infty$, and $\infty$ if the $\limsup$ is $0$. Then for all $x \in \mathbb R$,
(i) if $|x - a| < r$ then $\sum c_n(x-a)^n$ converges absolutely, while
(ii) if $|x - a| > r$ then $\sum c_n(x-a)^n$ diverges.
Moreover, $\sum c_n(x-a)^n$ converges uniformly on every interval of the form $[a-s, a+s]$ with $0 < s < r$.

Proof. We have

$$\limsup |c_n(x-a)^n|^{1/n} = |x-a| \limsup |c_n|^{1/n} = \frac{|x-a|}{r},$$
so the first part follows from the Root Test. For the other part, let $0 < s < r$, and pick $t \in (s, r)$. Then
$$\frac{1}{t} > \frac{1}{r} = \limsup |c_n|^{1/n},$$
so there exists $k \in \mathbb N$ such that $n \ge k$ implies
$$|c_n|^{1/n} < \frac{1}{t}.$$
Thus $n \ge k$ and $|x - a| < s$ imply
$$|c_n(x-a)^n| \le \left(\frac{s}{t}\right)^n.$$
Since $\sum (s/t)^n$ is a convergent geometric series, the power series $\sum c_n(x-a)^n$ converges uniformly on $[a-s, a+s]$ by the Weierstrass M-Test. $\square$

Definition 13.3. With the above notation, $r$ is called the radius of convergence of the power series $\sum c_n(x-a)^n$.

Example 13.4. (i) The power series $\sum n^n x^n$ has radius of convergence $0$.
(ii) The power series $\sum x^n/n^n$ has radius of convergence $\infty$.
(iii) For every real number $p$ the power series $\sum x^n/n^p$ has radius of convergence $1$ (because $(n^p)^{1/n} = (n^{1/n})^p \to 1^p = 1$).

Remark 13.5. The following theorem tells us differentiating and integrating a power series term-by-term preserves the radius of convergence and gives a series whose sum is the derivative or integral of the original:

Theorem 13.6. If the power series $\sum c_n(x-a)^n$ has radius of convergence $r$, then both power series
$$\sum n c_n(x-a)^{n-1} \quad \text{and} \quad \sum \frac{c_n}{n+1}(x-a)^{n+1}$$
also have radius of convergence $r$. Moreover, if $r > 0$ and we define $f : (a-r, a+r) \to \mathbb R$ by $f(x) = \sum_{n=0}^{\infty} c_n(x-a)^n$, then for all $x \in (a-r, a+r)$ we have
$$f'(x) = \sum_{n=1}^{\infty} n c_n(x-a)^{n-1} \quad \text{and} \quad \int_a^x f = \sum_{n=0}^{\infty} \frac{c_n}{n+1}(x-a)^{n+1}.$$

Proof. First of all, multiplying by $x-a$ term-by-term does not change the radius of convergence, so $\sum n c_n(x-a)^{n-1}$ has the same radius of convergence as $\sum n c_n(x-a)^n$. Since $n^{1/n} \to 1$, the Cauchy-Hadamard Theorem implies that $\sum n c_n(x-a)^n$ has the same radius of convergence as $\sum c_n(x-a)^n$. Thus the power series $\sum n c_n(x-a)^{n-1}$ has radius of convergence $r$. Since the series $\sum c_n(x-a)^n$ is obtained from $\sum \frac{c_n}{n+1}(x-a)^{n+1}$ by differentiating term-by-term, it follows that $\sum \frac{c_n}{n+1}(x-a)^{n+1}$ must also have radius of convergence $r$.
For the other part, fix $x$ in $(a-r, a+r)$, and choose $s \in (0, r)$ such that $x \in (a-s, a+s)$. Since the power series $\sum n c_n(x-a)^{n-1}$ converges uniformly on $(a-s, a+s)$ and the series $\sum c_n(x-a)^n$ converges at some point in $(a-s, a+s)$, Theorem 12.8 (on differentiating series) tells us $f$ is differentiable and

$$f'(x) = \sum_{n=0}^{\infty} \frac{d}{dx}\, c_n(x-a)^n = \sum_{n=1}^{\infty} n c_n(x-a)^{n-1}.$$
Similarly, since $\sum c_n(t-a)^n$ converges uniformly for $t$ in the closed interval with endpoints $a$ and $x$, Theorem 12.7 (on integrating series) tells us

$$\int_a^x f(t)\,dt = \sum_{n=0}^{\infty} \int_a^x c_n(t-a)^n\,dt = \sum_{n=0}^{\infty} \frac{c_n}{n+1}(x-a)^{n+1}. \qquad \square$$

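The Cauchy-Hadamard formula $r = 1/\limsup |c_n|^{1/n}$ can be probed numerically for the three series of Example 13.4. The sketch below (not part of the notes; it crudely evaluates $|c_n|^{1/n}$ at a single large $n$ rather than computing a true $\limsup$, and works in logarithms to avoid overflow) is consistent with radii $0$, $\infty$, and $1$ respectively.

```python
import math

# Rough numeric look at |c_n|**(1/n) for the series of Example 13.4.
# (Illustrative sketch: one large n stands in for the limsup.)

def nth_root_of_abs(log_abs_c_n, n):
    """exp(log|c_n| / n); log-space to survive huge coefficients like n**n."""
    return math.exp(log_abs_c_n / n)

n = 200
print(nth_root_of_abs(math.log(n**n), n))    # c_n = n**n: value n, limsup = infinity, r = 0
print(nth_root_of_abs(-math.log(n**n), n))   # c_n = 1/n**n: value 1/n, limsup = 0, r = infinity
print(nth_root_of_abs(-2 * math.log(n), n))  # c_n = 1/n**2: tends to 1, so r = 1
```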
Corollary 13.7. If $f(x) = \sum c_n(x-a)^n$ for all $x$ in an open interval containing $a$, then the coefficients are given by
$$c_n = \frac{f^{(n)}(a)}{n!} \quad \text{for all } n = 0, 1, \ldots.$$
Proof. An induction argument shows

$$f^{(n)}(x) = \sum_{k=n}^{\infty} \frac{k!}{(k-n)!}\, c_k(x-a)^{k-n}.$$
Thus
$$f^{(n)}(a) = n!\, c_n. \qquad \square$$

Definition 13.8. If $f$ is infinitely differentiable at $a$, the power series

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)(x-a)^n}{n!}$$
is called the Taylor series of $f$ at $a$.

Remark 13.9. The Taylor series of $f$ may have a nonzero radius of convergence but only converge to $f$ at $a$. For example, if

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

then $f^{(n)}(0) = 0$ for every $n$, so the Taylor series of $e^{-1/x^2}$ at $0$ has radius of convergence $\infty$, but this Taylor series converges to $e^{-1/x^2}$ only at $x = 0$. To see this, note that an induction argument using l'Hôpital's Rule shows that for each $n = 0, 1, \ldots$ we have

$$f^{(n)}(x) = \begin{cases} g(x)\, e^{-1/x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0, \end{cases}$$
where $g$ is a rational function.

Example 13.10. The Taylor series of $\frac{1}{1-x}$ at $0$ is $\sum_0^{\infty} x^n$, which converges to $\frac{1}{1-x}$ on the interval $(-1,1)$. Integrating term-by-term, we find the Taylor series of $-\log(1-x)$ at $0$. Multiplying by $-1$ and substituting $-x$ for $x$, we get the Taylor series for $\log(1+x)$ at $0$, which converges to $\log(1+x)$ on $(-1,1)$:
$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots.$$
We can also see that this is valid at $1$, that is,
$$\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \cdots,$$
by using the Alternating Series Test to show that the series $\sum_1^{\infty} (-1)^{n+1} x^n/n$ converges uniformly on $[0,1]$. To see the uniform convergence, given $\epsilon > 0$ choose $l \in \mathbb N$ such that $1/l < \epsilon$. If $x \in [0,1]$ then $\sum (-1)^{n+1} x^n/n$ satisfies the hypotheses of the Alternating Series Test. Hence $k \ge j \ge l$ and $x \in [0,1]$ imply
$$\left|\sum_{n=j}^{k} (-1)^{n+1} \frac{x^n}{n}\right| \le \frac{x^j}{j} \le \frac{1}{l} < \epsilon,$$
so the series satisfies the Uniform Cauchy Criterion.
This also gives an example of a uniformly convergent series which does not satisfy the hypotheses of the Weierstrass M-Test (and this is most easily seen by noting that the series converges conditionally at $x = 1$). However, a more elementary example of this phenomenon is given by any series of constant functions whose constant values comprise the terms of a conditionally convergent series.

Example 13.11. The Taylor series of $e^x$ at $0$ is $\sum_0^{\infty} \frac{x^n}{n!}$. The radius of convergence is $\infty$, but to see that this series converges to $e^x$ requires analyzing the remainder term in Taylor's Theorem: for all $k \in \mathbb N$ there exists $c$ between $0$ and $x$ such that
$$\left|e^x - \sum_{n=0}^{k-1} \frac{x^n}{n!}\right| = \frac{e^c\, |x|^k}{k!} \le \frac{e^{|x|}\, |x|^k}{k!} \xrightarrow{k\to\infty} 0.$$

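The remainder estimate in Example 13.11 can be checked numerically. The sketch below (not part of the notes; `taylor_exp` is a hypothetical helper) compares the Taylor partial sums of $e^x$ at $x = 3$ with the bound $e^{|x|}|x|^k/k!$; the observed error sits under the bound, and both tend to $0$ as $k$ grows.

```python
import math

# Sketch of Example 13.11: Taylor partial sums of e**x and the remainder
# bound e**|x| * |x|**k / k! from Taylor's Theorem.

def taylor_exp(x, k):
    """Partial sum sum_{n=0}^{k-1} x**n / n!."""
    return sum(x**n / math.factorial(n) for n in range(k))

x = 3.0
for k in (5, 10, 20):
    err = abs(math.exp(x) - taylor_exp(x, k))
    bound = math.exp(abs(x)) * abs(x)**k / math.factorial(k)
    print(k, err, bound)  # err <= bound, and both shrink to 0 as k grows
```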
Example 13.12 (Trigonometry). For all $c_1, c_2 \in \mathbb R$ the power series
$$c_1 + c_2 x - \frac{c_1}{2!} x^2 - \frac{c_2}{3!} x^3 + \frac{c_1}{4!} x^4 + \cdots$$
has radius of convergence $\infty$, since the $n$th term has absolute value at most $(|c_1| + |c_2|)\,|x|^n/n!$ and $\sum_0^{\infty} (|c_1| + |c_2|)\,|x|^n/n!$ converges (to $(|c_1| + |c_2|)\, e^{|x|}$) for all $x$.
Letting $c_1 = 1$ and $c_2 = 0$, we define $\cos : \mathbb R \to \mathbb R$ by
$$\cos x = \sum_0^{\infty} \frac{(-1)^n x^{2n}}{(2n)!},$$
while letting $c_1 = 0$ and $c_2 = 1$, we define $\sin : \mathbb R \to \mathbb R$ by
$$\sin x = \sum_0^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}.$$
We will give a brief outline showing that much (all?) of the theory of trigonometry can be easily derived by analyzing the Taylor series of $\cos$ and $\sin$. Differentiating the series for $\sin$ and $\cos$ gives

$$\sin' = \cos \quad \text{and} \quad \cos' = -\sin,$$
so both $\sin$ and $\cos$ satisfy the differential equation $f'' = -f$, hence so does any linear combination $c_1 \cos + c_2 \sin$.
Conversely, suppose we are given a real-valued function $f$ on an open interval $I$ containing $0$, and assume $f'' = -f$ on $I$. Then by induction $f^{(n+2)} = -f^{(n)}$ for $n = 0, 1, \ldots$. Put $c_1 = f(0)$ and $c_2 = f'(0)$. Then the Taylor series of $f$ at $0$ is
$$c_1 + c_2 x - \frac{c_1}{2!} x^2 - \frac{c_2}{3!} x^3 + \frac{c_1}{4!} x^4 + \cdots.$$
Fix $x \in I$, and choose an upper bound $M$ for $|f|$ and $|f'|$ on the closed interval with endpoints $0$ and $x$. By Taylor's Theorem, for all $n \in \mathbb N$ there exists $c$ between $0$ and $x$ such that
$$\left|f(x) - \sum_{k=0}^{n-1} \frac{f^{(k)}(0)\, x^k}{k!}\right| = \left|\frac{f^{(n)}(c)\, x^n}{n!}\right| \le \frac{M\, |x|^n}{n!} \xrightarrow{n\to\infty} 0,$$
since $|f^{(n)}|$ is $|f|$ or $|f'|$ each time. Thus the Taylor series converges to $f$ on $I$. Therefore we have shown that there exist unique $c_1, c_2 \in \mathbb R$ such that $f = c_1 \cos + c_2 \sin$.
Because the Taylor series of $\sin$ has only odd powers,
$$\sin(-x) = -\sin x,$$
and similarly (because the series for $\cos$ has only even powers)
$$\cos(-x) = \cos x.$$
Letting $x = 0$ in the Taylor series, we get $\sin 0 = 0$ and $\cos 0 = 1$. We have
$$\frac{d}{dx}\left(\sin^2 x + \cos^2 x\right) = 2 \sin x \cos x - 2 \cos x \sin x = 0,$$
so $\sin^2 + \cos^2$ is constant. Since $\sin^2 0 + \cos^2 0 = 1$, we get the identity
$$\sin^2 x + \cos^2 x = 1.$$
For fixed $y \in \mathbb R$, we have
$$\frac{d^2}{dx^2} \sin(x+y) = \frac{d}{dx} \cos(x+y) = -\sin(x+y),$$
so there exist unique $c_1, c_2 \in \mathbb R$ such that $\sin(x+y) = c_1 \cos x + c_2 \sin x$. Differentiating, we get $\cos(x+y) = -c_1 \sin x + c_2 \cos x$. Letting $x = 0$, we find $c_1 = \sin y$ and $c_2 = \cos y$, so we get the addition formulas
$$\sin(x+y) = \sin x \cos y + \cos x \sin y$$
$$\cos(x+y) = \cos x \cos y - \sin x \sin y.$$
If $0 < x \le 2$ and $n \in \mathbb N$ then
$$0 < x^2 < 6 = 3 \cdot 2 \le (n+2)(n+1),$$
so
$$\frac{x^n}{n!} > \frac{x^{n+2}}{(n+2)!} > 0,$$
hence the Taylor series for $\sin$ satisfies the (strict version of the) hypotheses of the Alternating Series Test. Thus
$$\sin x > 0 \quad \text{for all } 0 < x \le 2,$$
so $\cos$ is strictly decreasing on $[0, 2]$. Now,

$$\cos 2 = \sum_0^{\infty} \frac{(-1)^n 2^{2n}}{(2n)!} = -1 + \sum_2^{\infty} \frac{(-1)^n 2^{2n}}{(2n)!},$$
and by the Alternating Series Test

$$\left|\sum_2^{\infty} \frac{(-1)^n 2^{2n}}{(2n)!}\right| < \frac{2^4}{4!} = \frac{2}{3},$$
so $\cos 2 < 0$. Hence there exists a unique $c \in (0, 2)$ such that $\cos c = 0$. Define $\pi = 2c$. Then
$$\cos \frac{\pi}{2} = 0.$$
Since $1 = \sin^2 \pi/2 + \cos^2 \pi/2 = \sin^2 \pi/2$ and $\sin \pi/2 > 0$, we have
$$\sin \frac{\pi}{2} = 1.$$

Thus
$$\sin\left(x + \frac{\pi}{2}\right) = \cos x \quad \text{and} \quad \cos\left(x + \frac{\pi}{2}\right) = -\sin x.$$
Therefore, $\sin$ and $\cos$ are both periodic with period $2\pi$, and we get the usual graphs.

Index

absolute convergence, 66, 72
alternate triangle inequality, 13
Alternating Series Test, 66
Approximation of Suprema, 17
Archimedean Principle, 17
Bernoulli’s Inequality, 16
between, 13
binomial coefficients, 15
Binomial Theorem, 15
Bolzano-Weierstrass Theorem, 33
bounded, 25
bounded above, 14
bounded below, 14
Cauchy
  uniform, 70, 72
Cauchy sequence, 33
Cauchy’s Mean Value Theorem, 45
Cauchy-Hadamard Theorem, 73
Chain Rule, 44
Change of Variables Theorem, 57
closed form, 63
cluster point, 26
Comparison Test, 65
Comparison Theorem for Improper Integrals, 59
Completeness Axiom of R, 17
conditional convergence, 66
Continuity Characterization of Limits, 39
continuous, 38
  at a point, 38
  limit characterization of, 39
  sequential characterization of, 38
  uniformly, 40
converge
  absolutely, 66, 72
  conditionally, 66
  pointwise, 69, 72
  sequence, 24
  series, 63
  uniformly, 69, 72
converge pointwise
  series, 72
cos, 77
countability of Q, 22
countable, 21
countably infinite, 21
Critical Point Lemma, 44
Darboux’s Theorem, 51
decimal expansions, 18
decreasing, 36
  strictly, 37
Density of Irrationals, 20
Density of Rationals, 18
derivative, 42
  higher order, 45
differentiable, 42
  at a point, 42
Dirichlet function, 52
discontinuity, 38
discontinuous, 38
diverge
  sequence, 25
  to ∞, 30
e, 62
element, 1
exponential function, 61
extended real numbers, 29
Extreme Value Theorem, 40
family, 7
Field Axioms, 11
finite, 21
floor, 18
function, 3
  1-1, 6
  arithmetic with, 6
  bounded, 25
  composition, 6
  constant, 6
  domain, 4
  extension, 5
  identity, 6
  image, 4
  inverse, 6
  one-to-one, 6
  onto, 6
  pre-image, 4
  range, 4
  real-valued, 4
  restriction, 5
Fundamental Theorem of Calculus, 55
improper integral, 57
increasing, 36
  strictly, 37
indexed family, 7
inf, 16
infimum, 16
infinite, 21
integers, 12
integral, 49
pairwise disjoint, 8
partial sum, 63
Integral Test, 64
Integration by Parts, 56
Intermediate Value Theorem, 40
interval, 14
  bounded, 14
  degenerate, 14
  half-closed, 14
  half-open, 14
  nondegenerate, 14
  open, 14
  unbounded, 14
Inverse Function Theorem, 48
irrational numbers, 12
partition (for integrals), 48
Pascal’s Triangle, 15
pointwise convergence, 69, 72
Power Rule, 43
power series, 73
  center, 73
  coefficients, 73
powers
  arbitrary, 61
  integer, 12
  rational, 20
Product Rule, 42
Q, 12
Quotient Rule, 42

l’Hôpital’s Rule, 47
lim inf, 31
lim sup, 31
limit
  at infinity, 37
  continuity characterization of, 39
  function, 34
  infinite, 30, 37
  left-hand, 36
  one-sided, 36
  restricted, 36
  right-hand, 36
  sequence, 25
  sequential characterization of, 34
Limit Characterization of Continuity, 39
Limit Comparison Test, 66
logarithm function, 60
lower bound, 14
lower sum, 48, 49
Mathematical Induction, 11
maximum, 14
  function, 40
Mean Value Theorem, 45
  Cauchy, 45
Mean Value Theorem for Integrals, 55
minimum, 14
  function, 40
monotone, 37
  strictly, 37
N, 11
natural numbers, 11
Order Axioms, 12
ordered pair, 3
R, 3
radius of convergence, 74
Ratio Test, 68
rational numbers, 12
Rearrangement Theorem, 67
  Riemann, 68
refines, 48
Riemann integrable, 49
Riemann integral, 49
Riemann sum, 49
Riemann’s Criterion, 51
Riemann’s Rearrangement Theorem, 68
Rolle’s Theorem, 44
Root Test, 69
roots, 20
sequence, 23
  bounded, 25
  Cauchy, 33
  converge, 24
  diverge, 25
  term, 23
Sequential Characterization of Continuity, 38
Sequential Characterization of Limits, 34
series, 63, 72
  alternating harmonic, 67
  converge, 63
  converge pointwise, 72
  geometric, 63
  harmonic, 64
  p-, 65
  partial sum, 63
  power, 73
  sum, 63, 72
  Taylor, 75

  term, 63
set, 1
  bounded, 25
  Cartesian product, 3, 8
  complement, 3
  countable, 21
  countably infinite, 21
  De Morgan’s Laws, 9
  difference, 3
  disjoint, 3
  Distributive Laws, 8
  empty, 2
  equal, 2
  family, 7
  finite, 21
  indexed family, 7
  infinite, 21
  intersection, 3, 7
  proper subset, 2
  subset, 2
  superset, 2
  uncountable, 21
  union, 3, 7
sin, 77
Squeeze Theorem
  limits, 35
  sequences, 25
subsequence, 26
sup, 16
supremum, 16

Taylor series, 75
Taylor’s Theorem, 46
telescoping sum, 64
term
  sequence, 23
  series, 63
triangle inequality, 13
  alternate, 13
uncountability of R, 23
uncountable, 21
uniform Cauchy criterion, 70, 72
uniform convergence, 69, 72
uniformly continuous, 40
upper bound, 14
upper sum, 48

Weierstrass M-Test, 73
Well-Ordering Principle, 14

Z, 12