MATH1113 Mathematical Foundations for Statistics (Calculus)

Lilia Ferrario and David Ridout

Last modified: October 31, 2012

Contents

1 Functions, Domains and Ranges
1.1 Functions
1.2 Domains and Ranges
1.3 Domains of Standard Functions
1.4 Domains of Functions Given by Formulae
1.5 Domains of Piecewise-Defined Functions
1.6 Composition of Functions
1.7 Even and Odd Functions

2 Bounds for Functions
2.1 Bounds
2.2 Finding Bounds for Functions
2.3 Finding Bounds for Functions of the Form 1/u(x)

3 Limits
3.1 An Intuitive Approach to Limits
3.2 Rigorous Limits
3.3 Facts about Limits
3.4 The "Squeeze Theorem"
3.5 A Very Important Limit
3.6 Infinite Limits

4 Continuity
4.1 Continuity at a Point
4.2 Algebraic Combinations and Continuity
4.3 Limits and Continuity for Compositions
4.4 Continuity on an Interval
4.5 The Intermediate Value Theorem
4.6 The Min-Max Theorem
4.7 Removable Discontinuities

5 Differentiation
5.1 The Derivative of a Function at a Point
5.2 Continuity and Differentiability
5.3 One-Sided Derivatives
5.4 Differentiation Rules
5.5 Chain Rule for Differentiating Compositions
5.6 Some Simple Applications
5.7 Differentials, ∆-notation and Linear Approximation
5.8 Implicit Differentiation
5.9 Some Important Theorems

6 Inverse Functions
6.1 Monotonic Functions
6.2 One-to-One Functions and Inverses
6.3 Properties of Inverse Functions
6.4 Derivatives of Inverse Functions
6.5 Inverses of Trigonometric Functions
6.6 Logarithms and Exponentials
6.7 Logarithmic Differentiation
6.8 Hyperbolic Functions
6.9 The Inverse Hyperbolic Functions

7 Applications of Derivatives
7.1 Related Rates
7.2 l'Hôpital's Rules
7.3 Extreme Values
7.4 The First Derivative Test
7.5 Functions Defined on General Intervals
7.6 Concavity and Points of Inflection
7.7 The Second Derivative Test
7.8 Sketching Graphs
7.9 Optimisation
7.10 Roots of Functions: Newton's Method

8 Integration
8.1 Indefinite Integrals
8.2 Summation and Σ Notation
8.3 Areas and Sums
8.4 Definite Integrals
8.5 Implementing the Definition of Definite Integration
8.6 Properties of the Definite
8.7 The Fundamental Theorem of Calculus
8.8 Definite Integration in Practice
8.9 Integrating Piecewise-Continuous Functions
8.10 Improper Integrals

9 Integration Techniques
9.1 Integration by Substitution
9.2 Substitution and Definite
9.3 Trigonometric Integrals
9.4 Inverse Trigonometric Substitutions
9.5 Integration by Parts
9.6 Integrating Rational Functions
9.7 Further Partial Fraction Decompositions

10 Taylor Series
10.1 Taylor Polynomials
10.2 Lagrange's Remainder Theorem
10.3 Taylor Series
10.4 Standard Examples of Maclaurin Series
10.5 The Binomial Theorem

11 Differential Equations
11.1 Ordinary Differential Equations and their Solutions
11.2 Initial Value Problems
11.3 Separable Differential Equations
11.4 Applications
11.5 First Order Linear Differential Equations

12 Functions of Several Variables
12.1 Definitions and Geometry
12.2 Domains and Subsets of $\mathbb{R}^n$
12.3 Limits and Continuity

13 Multivariable Differentiation
13.1 Partial Derivatives
13.2 Linear Approximations for Functions of Two Variables
13.3 Differentiability for Multivariable Functions
13.4 The Chain Rule
13.5 Gradients and Directional Derivatives
13.6 Extrema and Optimisation

14 Multiple Integration
14.1 Double Integrals
14.2 Evaluating Double Integrals by Iteration
14.3 Area Integrals
14.4 Changing the Order of Integration
14.5 Polar Coordinates

Chapter 1

Functions, Domains and Ranges

1.1 Functions

A function $f$ is a rule (or mapping) that assigns to each element $x$ in a set $A$ one and only one element $y$ in a set $B$. To fix ideas, think of functions that you have met previously, for example where $A = B = \mathbb{R}$ and $f(x) = 4x + 3$. Here, the symbol $\mathbb{R}$ denotes the set of all real numbers. Figures 1.1 and 1.2 illustrate these concepts. When we write an expression $y = f(x)$, we say that $y$ is the dependent variable, $x$ is the independent variable (the argument of $f$), and $f$ is the function, mapping or rule. Strictly speaking, a function is a symbol such as $f$ and its value at the point $x$ is $f(x)$. It is common (and convenient!), however, to blur the distinction between the two. Common ways of writing a function include

$$f(x) = x^2, \qquad f : x \mapsto x^2, \qquad \text{or just} \qquad x \mapsto x^2,$$
where $f$, in this example, is the function from $\mathbb{R}$ to $\mathbb{R}$ that squares its argument. Until further notice, we will only consider functions for which both the independent and dependent variables are real numbers.

1.2 Domains and Ranges

The domain of a function f consists of all the real numbers x that f will accept. If D is the domain of the function f , then the set

$$f(D) = \{ f(x) : x \in D \}$$
consisting of all the values that $f$ takes is called the range or image of $f$. Simply put, the range is the set of all numbers that the function can produce if you stick in a number from the domain. Thus, the function $f$ can be seen as some sort of machine (see Figure 1.3), producing an output value $f(x)$ in its range whenever it is fed an input value $x$ from its domain. When a function is given an input value outside its domain, the result is not defined.

Example 1.1. Let $f(x) = 2x + 1$ on the domain $[0,1]$. What are $f(\tfrac{1}{2})$ and $f(\tfrac{3}{2})$?
Solution. As $\tfrac{1}{2}$ is in the domain of $f$, we compute that $f(\tfrac{1}{2}) = 2$. However, $\tfrac{3}{2}$ is not in the domain of $f$, and so $f(\tfrac{3}{2}) = f(1.5)$ is not defined.
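To make the "machine" picture concrete, here is a minimal Python sketch of the function in Example 1.1 that refuses inputs outside its domain $[0,1]$. The name `f_restricted` and the choice to signal "not defined" by raising `ValueError` are our own illustrative assumptions, not part of the notes.

```python
def f_restricted(x: float) -> float:
    """f(x) = 2x + 1 on the domain [0, 1], as in Example 1.1."""
    if not 0 <= x <= 1:
        # Inputs outside the domain are rejected: f is simply not defined there.
        raise ValueError(f"{x} is not in the domain [0, 1]")
    return 2 * x + 1


print(f_restricted(0.5))  # 2.0
try:
    f_restricted(1.5)
except ValueError as err:
    print("undefined:", err)  # undefined: 1.5 is not in the domain [0, 1]
```

The domain is part of the function's definition, so the check lives inside the function rather than being left to the caller.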

Figure 1.1: The curve shown in this figure can be represented by a function, since each element x corresponds to one and only one element y.

Figure 1.2: The curve shown in this figure cannot be represented by a function, because a function cannot assign two different values to the same element x.

Figure 1.3: This function machine has the rule "double the number and then add seven".

You could compare this to a soft drink machine: if you put in one dollar's worth of coins, you get a drink. But some machines don't accept 5¢ coins: they are legal tender but are not in the "domain" of the machine.

If we neglect to specify the domain for a given function, then it is natural to assume that the domain consists of all the real numbers for which the function is defined. This is called the implicit domain rule.

Example 1.2. Let $f(x) = 2x + 1$ on the domain $D = [0,1]$ (as in Example 1.1). What is the range of $f$?
Solution. Since $f$ is increasing (its graph is a straight line with positive slope), it suffices to consider the endpoints 0 and 1 of the domain $D$. Since $f(0) = 1$ and $f(1) = 3$, the range of $f$ is $R = [1,3]$.

Example 1.3. Let $f(x) = 2x^2 + 1$ on the domain $D = [-2,1]$. What is the range of $f$?
Solution. We again consider the endpoints $-2$ and $1$ of the domain $D$: $f(-2) = 9$ and $f(1) = 3$. However, this does not suffice in this case because $f(0) = 1$. In fact, the range of $f$ is $R = [1,9]$, as you should check (sketch the graph).

Example 1.4. Let $f(x) = 1 - \sqrt{x}$. What is the range of $f$?
Solution. As the domain has not been specified, we should use the implicit domain rule and conclude that the natural domain of $f$ is $[0,\infty)$, because negative numbers do not have (real) square roots. Since the square root function takes all values in $[0,\infty)$, the range of $f$ is $R = (-\infty,1]$.

1.3 Domains of Standard Functions

We give the domains of many standard functions in Table 1.1, noting that these may be helpful for determining the domains of more complicated functions. You probably already know most of these functions — any that you don’t will be introduced later on in the course.

Function | Formula | Natural domain
Constant functions | $f(x) = k$ | Defined for all $x$.
The identity function | $f(x) = x$ | Defined for all $x$.
Powers ($n \in \mathbb{Z}$) | $f(x) = x^n$ | If $n > 0$, defined for all $x$. If $n < 0$, defined for all $x \neq 0$.
Roots ($n \in \mathbb{Z}$) | $f(x) = \sqrt[n]{x} = x^{1/n}$ | If $n > 0$ is odd, defined for all $x$. If $n > 0$ is even, defined for all $x \geq 0$. If $n < 0$ is odd, defined for all $x \neq 0$. If $n < 0$ is even, defined for all $x > 0$.
Exponentials | $f(x) = e^{ax}$ | Defined for all $x$ and all $a$.
Logarithms | $f(x) = \ln x$ | Defined for all $x > 0$.
 | $f(x) = \log_b x$ | Defined for all $x > 0$ if $b > 0$ (and $b \neq 1$).
Trig. functions | $\cos x$, $\sin x$ | Defined for all $x$.
 | $\tan x$, $\sec x$ | Defined for all $x$ except $(n + \tfrac{1}{2})\pi$, $n \in \mathbb{Z}$.
 | $\csc x$, $\cot x$ | Defined for all $x$ except $n\pi$, $n \in \mathbb{Z}$.
Inverse trig. functions | $\cos^{-1} x$, $\sin^{-1} x$ | Defined for $-1 \leq x \leq 1$.
 | $\tan^{-1} x$ | Defined for all $x$.
Hyperbolic functions | $\cosh x$, $\sinh x$ | Defined for all $x$.
 | $\tanh x$, $\operatorname{sech} x$ | Defined for all $x$.
 | $\operatorname{csch} x$, $\coth x$ | Defined for all $x \neq 0$.
Inverse hyperbolic functions | $\cosh^{-1} x$ | Defined for all $x \geq 1$.
 | $\sinh^{-1} x$ | Defined for all $x$.
 | $\tanh^{-1} x$ | Defined for all $-1 < x < 1$.
The absolute value function | $f(x) = \lvert x \rvert$ | Defined for all $x$.

Table 1.1: Natural domains for some standard functions. Here, $\mathbb{Z}$ denotes the set of all integers.

1.4 Domains of Functions Given by Formulae

Finding the domain of a function generally relies upon checking through the definition of the function and determining those values of the independent variable for which the function is defined. We first note any divisions in the function: in any division, the denominator must not be zero. Each division therefore gives an inequality that must be satisfied, namely denominator $\neq 0$. We also note any standard functions occurring in the definition which are not defined everywhere (see Table 1.1). Such occurrences may result in further inequalities which must be satisfied. One may therefore end up with just one inequality, in which case the inequality defines the domain, or with several inequalities which must be satisfied simultaneously, in which case the domain consists of those values of the variable which satisfy all the inequalities.

Example 1.5. Find the (natural) domain of the function

$$f(x) = \frac{\cos x + e^{3x}}{1 + x^2}.$$

Solution. Here there is a division, but the denominator satisfies $1 + x^2 \geq 1$ and so never vanishes. The standard functions which occur ($\cos x$ and $e^x$) are both defined everywhere. Therefore, this function is defined everywhere (for all $x$) and the natural domain is $\mathbb{R}$.

Example 1.6. Find the domain D of the function

$$f(x) = \frac{\ln(4 - x^2) + \sqrt{x + 1}}{1 - e^{x^2}} + \cos^2\bigl(1 + \tan^{-1}(x)\bigr).$$
Solution. Division occurs just once, with the denominator being $1 - e^{x^2}$. This gives an inequality:
$$1 - e^{x^2} \neq 0 \;\Rightarrow\; e^{x^2} \neq 1 \;\Rightarrow\; x^2 \neq 0 \;\Rightarrow\; x \neq 0.$$
The standard functions $\cos x$, $e^x$, $\tan^{-1} x$ and $x^2$ are each defined everywhere, so they do not lead to further inequalities. However, the standard function $\ln x$ requires a positive argument, so this gives rise to the inequality
$$4 - x^2 > 0 \;\Rightarrow\; x^2 < 4 \;\Rightarrow\; -2 < x < 2.$$
Moreover, the square root function requires a non-negative argument, so it gives rise to the inequality $x + 1 \geq 0 \Rightarrow x \geq -1$.
The domain of $f$ is the set of real numbers $x$ satisfying all of these inequalities. This may be described as the interval $[-1,2)$ with the point 0 missing or, if you prefer, as the union of the two intervals $[-1,0)$ and $(0,2)$. Thus,
$$D = \{x \in \mathbb{R} : -1 \leq x < 2 \text{ and } x \neq 0\} = [-1,0) \cup (0,2).$$
This function is shown in Figure 1.4.
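As a spot check of the domain just found, the following Python sketch (our own illustration; the helper names `in_domain` and `evaluates` are hypothetical) evaluates $f$ with the standard `math` module, which raises `ValueError` for the logarithm of a non-positive number or the square root of a negative number, while float division by zero raises `ZeroDivisionError`. Agreement at a handful of sample points is evidence, not a proof, and floating-point underflow means that points extremely close to 0 may behave differently.

```python
import math


def f(x: float) -> float:
    """The function of Example 1.6, evaluated with the standard library."""
    numerator = math.log(4 - x**2) + math.sqrt(x + 1)
    denominator = 1 - math.exp(x**2)
    return numerator / denominator + math.cos(1 + math.atan(x)) ** 2


def in_domain(x: float) -> bool:
    """The domain found above: -1 <= x < 2 and x != 0."""
    return -1 <= x < 2 and x != 0


def evaluates(x: float) -> bool:
    """True if Python can evaluate f(x) without a math error."""
    try:
        f(x)
        return True
    except (ValueError, ZeroDivisionError):
        return False


for x in [-1.5, -1.0, -0.5, 0.0, 0.5, 1.9, 2.0, 3.0]:
    print(x, in_domain(x), evaluates(x))  # the last two columns should agree
```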

Figure 1.4: A graph of the function f appearing in Example 1.6.

1.5 Domains of Piecewise-Defined Functions

To determine the domain of a function defined by cases, often called a piecewise-defined function, simply take each case in turn and find the values of the argument, within the region specified by that case, for which the corresponding formula is defined. The domain is then the union of the sets found for the individual cases.
Example 1.7. Determine the domain $D$ of the function

$$f(x) = \begin{cases} \sqrt{x + 1} & \text{for } x < 0, \\ \cosh^{-1} x & \text{for } x \geq 0. \end{cases}$$

Solution. In the region $x < 0$, the function is defined to be $\sqrt{x + 1}$, which is defined when $x + 1 \geq 0$, hence $x \geq -1$. For this case, the region $x < 0$ restricts this to $-1 \leq x < 0$.
In the region $x \geq 0$, the function is defined as $\cosh^{-1} x$, which is defined for all $x \geq 1$. For this case, the region $x \geq 0$ does not restrict the domain of definition further: the function is defined, in this case, for $x \geq 1$. Putting the two cases together, the function $f$ is only defined on the two intervals $[-1,0)$ and $[1,\infty)$. Thus,

$$D = \{x \in \mathbb{R} : -1 \leq x < 0 \text{ or } 1 \leq x\} = [-1,0) \cup [1,\infty).$$
This function is shown in Figure 1.5. More generally, a piecewise-defined function is undefined at any point lying outside all of the regions specified by its cases.
Example 1.8. Find the domain $D$ of the function

$$g(x) = \begin{cases} \sqrt{\sin x} & \text{for } -5 \leq x \leq 0, \\ 1/(x^2 - 1) & \text{for } 0 < x \leq 5. \end{cases}$$

Figure 1.5: The function f of Example 1.7

Figure 1.6: The function g considered in Example 1.8.

Solution. In the region $-5 \leq x \leq 0$, we need $\sqrt{\sin x}$ to be defined, which in turn requires that $\sin x \geq 0$. Within this region, this inequality is only satisfied for $-5 \leq x \leq -\pi$ and $x = 0$. In the region $0 < x \leq 5$, we instead require that $1/(x^2 - 1)$ be defined, which in turn requires that $x^2 - 1 \neq 0$. This tells us that $x$ must not be $\pm 1$, and we may discard $-1$ because it does not belong to the region under consideration. Therefore, we are left with the entire interval $(0,5]$, except for the single point 1. Putting this together (the isolated point $x = 0$ from the first case joins the interval $(0,1)$ from the second), the domain of the function consists of the three intervals $[-5,-\pi]$, $[0,1)$ and $(1,5]$. In other words,

$$D = \{x \in \mathbb{R} : -5 \leq x \leq -\pi, \text{ or } 0 \leq x \leq 5 \text{ and } x \neq 1\} = [-5,-\pi] \cup [0,1) \cup (1,5].$$
We graph this function in Figure 1.6.
Note that when defining a piecewise function, one usually specifies the regions in each case so that the individual functions that are to be pieced together are defined everywhere within the region. This is safer and generally makes the definition much easier to understand. The two examples we have given above are therefore rather artificial: it would be much more natural (and common) to write them as

$$f(x) = \begin{cases} \sqrt{x + 1} & \text{for } -1 \leq x < 0, \\ \cosh^{-1} x & \text{for } x \geq 1 \end{cases} \qquad\text{and}\qquad g(x) = \begin{cases} \sqrt{\sin x} & \text{for } -5 \leq x \leq -\pi \text{ or } x = 0, \\ 1/(x^2 - 1) & \text{for } 0 < x < 1 \text{ or } 1 < x \leq 5, \end{cases}$$
respectively.
We conclude this discussion of functions with a very important remark: any function $f$ gives equal treatment to anything that is put inside the function's brackets. The way a function operates on its argument does not depend on the name of the argument, or even on whether the argument happens to be another function.

Example 1.9. Find $f(\sin^2 x)$ when $f(x) = 1 + x^2$.

Solution. We simply replace any occurrences of $x$ in $f(x)$ by $\sin^2 x$:

$$f(\sin^2 x) = 1 + (\sin^2 x)^2 = 1 + \sin^4 x.$$
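A quick numerical check of Example 1.9 (a sketch; the sample points are arbitrary): feeding $\sin^2 x$ into the Python function `f` gives the same values as the simplified formula $1 + \sin^4 x$, precisely because `f` treats whatever it receives in the same way.

```python
import math


def f(x: float) -> float:
    return 1 + x**2


for x in [0.0, 0.3, 1.0, 2.5]:
    lhs = f(math.sin(x) ** 2)    # substitute sin^2 x for the argument of f
    rhs = 1 + math.sin(x) ** 4   # the simplified answer from Example 1.9
    print(x, lhs, rhs, math.isclose(lhs, rhs))
```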

1.6 Composition of Functions

Consider the two basic functions

$$h(x) = x \qquad\text{and}\qquad g(x) = a,$$
where $a$ is a constant. What functions can we construct from these?

• Products of $h$ with itself, meaning powers of $x$, such as $x^2$, $x^3$ and so on.

• Products of powers and constants, known as monomials, such as $2x^2$, $5x^3$, ...

• Sums of the above, known as polynomials, for example $2x^2 + 5x^3$.

• Quotients of polynomials, known as rational functions, for example $\dfrac{2x^2 + 5x^3}{5x + 6}$.

We now introduce the composition of two functions $f$ and $g$ to create a new function $f \circ g$. If the range of $g$ is contained within the domain of $f$, then the composition $f \circ g$ is given by
$$(f \circ g)(x) = f(g(x)).$$

Example 1.10. Let
$$f(x) = \frac{x^2 + x - 6}{x} \qquad\text{and}\qquad g(x) = \sin x.$$
Find the composition $f \circ g$.
Solution. The range of $g$ is $[-1,1]$ and the domain of $f$ is $\{x \in \mathbb{R} : x \neq 0\}$. Since the range of $g$ contains 0, it is not contained in the domain of $f$, so the composition $f \circ g$ is not, strictly speaking, defined. However, if we understand that the domain of $g$ is to be restricted so that $g(x) \neq 0$, then the composition $f \circ g$ is given by
$$(f \circ g)(x) = \frac{(\sin x)^2 + \sin x - 6}{\sin x}.$$
We see that $f \circ g$ is a rational function of $\sin x$.
Exercise. What is $g \circ f$?

Example 1.11. Express $f(x) = \sin^3(x - 3) = (\sin(x - 3))^3$ as a composition of 3 functions.
Solution. First, note that the function can be split up into three operations: subtracting 3, applying $\sin$ and cubing. We therefore define
$$g_1(x) = x - 3, \qquad g_2(x) = \sin x \qquad\text{and}\qquad g_3(x) = x^3.$$
Now, we can write
$$f(x) = (\sin(x - 3))^3 = (\sin(g_1(x)))^3 = (g_2(g_1(x)))^3 = g_3(g_2(g_1(x)))$$
and conclude that $f = g_3 \circ g_2 \circ g_1$, a triple composition.

1.7 Even and Odd Functions

If a function $f$ satisfies
$$f(-x) = f(x)$$
for all $x$ in the domain $D$ of $f$ (this requires that $-x$ lies in $D$ whenever $x$ does), then $f$ is said to be an even function. Examples of even functions include $f(x) = \cos x$ and $g(x) = x^2$. Even functions are those whose graphs look the same when they are reflected about the $y$ axis. Similarly, if a function $f$ satisfies
$$f(-x) = -f(x)$$
for all $x$ in the domain $D$ of $f$ (again, this requires that $-x$ lies in $D$ whenever $x$ does), then $f$ is said to be an odd function. Examples of odd functions include $f(x) = \sin x$ and $g(x) = x^3$. Odd functions are those whose graphs look the same after a 180° rotation about the origin.

Figure 1.7: An even function is symmetric about the $y$ axis. An odd function is symmetric about the origin.
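The two symmetry conditions can be tested numerically at sample points. The sketch below uses our own hypothetical helpers `looks_even` and `looks_odd`; passing such a test only suggests symmetry (it checks finitely many points), whereas the definitions demand the identity for every $x$ in the domain.

```python
import math


def looks_even(f, samples, tol=1e-12):
    """Numerically test f(-x) == f(x) at the sample points (not a proof)."""
    return all(math.isclose(f(-x), f(x), abs_tol=tol) for x in samples)


def looks_odd(f, samples, tol=1e-12):
    """Numerically test f(-x) == -f(x) at the sample points (not a proof)."""
    return all(math.isclose(f(-x), -f(x), abs_tol=tol) for x in samples)


xs = [0.1 * k for k in range(1, 40)]
print(looks_even(math.cos, xs), looks_odd(math.sin, xs))              # True True
print(looks_even(lambda x: x**2, xs), looks_odd(lambda x: x**3, xs))  # True True
print(looks_even(math.exp, xs), looks_odd(math.exp, xs))              # False False
```

The last line is a reminder that most functions, such as $e^x$, are neither even nor odd.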

Chapter 2

Bounds for Functions

Here, we shall introduce some techniques for finding bounds for certain types of functions. The concept of bounding functions will turn out to be very useful later on, particularly when we are studying the idea of limits of functions.

2.1 Bounds

Let f be a function and S be some set that the function is defined on. In other words, take S to be a subset of the domain of f . If there exists a number b such that

$$f(x) \leq b \quad \text{for all } x \in S,$$
then we say that $f$ is bounded above by $b$ on $S$. Similarly, if there exists $b'$ such that

$$f(x) \geq b' \quad \text{for all } x \in S,$$
then we say that $f$ is bounded below by $b'$ on $S$. These numbers are of course not unique: if $f$ is bounded above by 10 on $S$, then it's surely bounded above by 20 on $S$. If $f$ is both bounded above and bounded below on $S$, then we say that $f$ is bounded on $S$. In this case, there exists a number $B > 0$ for which
$$|f(x)| \leq B \quad \text{for all } x \in S.$$
We call $B$ a bound for $f$. Again, bounds are never unique: one can take $B$ to be any positive number greater than or equal to both $|b|$ and $|b'|$, since then $-B \leq -|b'| \leq b' \leq f(x) \leq b \leq |b| \leq B$. This means that the graph of $y = f(x)$ lies, for $x$ in $S$, between (and possibly touching) the horizontal lines $y = -B$ and $y = B$. If we cannot find a number $B$ that satisfies this inequality, then we say that $f$ is unbounded on $S$. Note carefully the role that the set $S$ plays in this definition: we are only interested in the way that $f$ behaves on this set! It always helps to think about a few examples:

• Is f (x)= x bounded on [5,10]?

• Is f (x)= x bounded on the set of real numbers, R?

• Is f (x)= 4 bounded on each of these sets?

Figure 2.1: A bounded function: you can see that $-B \leq f(x) \leq B$.

The following are some useful properties of the absolute value function that can be used to determine bounds of functions:

$$|f(x) + g(x)| \leq |f(x)| + |g(x)|, \tag{2.1}$$
$$|f(x)g(x)| = |f(x)|\,|g(x)|, \tag{2.2}$$
$$\left|\frac{f(x)}{g(x)}\right| = \frac{|f(x)|}{|g(x)|} \quad (g(x) \neq 0). \tag{2.3}$$

The first of these is often referred to as the triangle inequality.

2.2 Finding Bounds for Functions

Example 2.1. Find a bound $B$ for $f(x) = x^2 - 2$ on $-1 \leq x \leq 2$.
Solution. We must find a $B > 0$ such that

$$|x^2 - 2| \leq B \quad \text{for all } x \text{ in } [-1,2].$$

Using the triangle inequality (2.1), we obtain $|x^2 + (-2)| \leq |x^2| + |-2| = x^2 + 2$, noting that $|x^2| = x^2$ as $x^2 \geq 0$ for all real numbers $x$. It is easy to check that on $[-1,2]$, we have $0 \leq x^2 \leq 4$. We therefore obtain

$$|x^2 - 2| \leq 4 + 2 = 6.$$
Hence, a bound for $f(x) = x^2 - 2$ on the interval $[-1,2]$ is $B = 6$.
Note that we don't claim that $B = 6$ is the best bound (meaning the smallest one) in this example. In fact, Figure 2.2 (or knowledge of parabolae) shows that $-2 \leq f(x) \leq 2$ on $[-1,2]$, with $f(0) = -2$ and $f(2) = 2$. Thus, the best bound for $f(x)$ on $[-1,2]$ is $B = 2$.
Figure 2.2: The triangle inequality yields the same bound for the functions of Examples 2.1 and 2.2. However, one function reaches the bound while the other does not.

Example 2.2. Find a bound $B$ for $f(x) = x^2 + 2$ on $-1 \leq x \leq 2$.
Solution. This is almost the same as our last example; we need only replace the $-2$ by $+2$. As before,
$$|x^2 + 2| \leq |x^2| + 2 \leq 6,$$
and so $B = 6$ is again a bound. Unlike Example 2.1, this time the bound is optimal because $f(2) = 6$ (see Figure 2.2).
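A quick numerical sanity check of Examples 2.1 and 2.2 compares the triangle-inequality bound $B = 6$ with the largest value of $|f|$ found on a fine grid. This is a sketch only: the helper `sampled_max_abs` is our own, and sampling a grid can suggest, but never prove, a bound.

```python
def sampled_max_abs(f, a, b, n=10_000):
    """Largest |f(x)| over n + 1 equally spaced points in [a, b] (a numerical estimate)."""
    return max(abs(f(a + (b - a) * k / n)) for k in range(n + 1))


m1 = sampled_max_abs(lambda x: x**2 - 2, -1, 2)  # Example 2.1
m2 = sampled_max_abs(lambda x: x**2 + 2, -1, 2)  # Example 2.2
print(m1, m2)  # 2.0 6.0
```

Both values respect the bound 6, but only the second reaches it, matching the discussion of which bound is optimal.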

Example 2.3. Find a bound for $f(x) = x \sin x$ on $0 \leq x \leq 2\pi$.

Solution. We note that $f(x)$ is the product of two functions, hence (2.2) gives

$$|x \sin x| = |x|\,|\sin x|.$$
Now, $|x| \leq 2\pi$ on $[0,2\pi]$. Moreover, $|\sin x| \leq 1$ for all $x$. We therefore conclude that
$$|f(x)| = |x|\,|\sin x| \leq 2\pi \cdot 1 = 2\pi.$$
Thus, $B = 2\pi$ is a bound for $f$ on $0 \leq x \leq 2\pi$. Here are some things we wish to emphasise:

• 2π is an upper bound, but it is certainly not the smallest upper bound (see Figure 2.3). Finding the smallest upper bound is not easy in general.

• The bound we have found here depends on x being in the set [0,2π]. If we had taken x in [0,r] (with r > 0), then we would have found that f was bounded by r instead.
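To see how far $2\pi \approx 6.28$ is from the smallest bound, one can sample $|x \sin x|$ on a fine grid over $[0, 2\pi]$. This is a rough numerical estimate under our own choice of grid size, not a derivation of the exact maximum.

```python
import math


def sampled_max_abs(f, a, b, n=100_000):
    """Largest |f(x)| over n + 1 equally spaced points in [a, b]."""
    return max(abs(f(a + (b - a) * k / n)) for k in range(n + 1))


peak = sampled_max_abs(lambda x: x * math.sin(x), 0, 2 * math.pi)
print(peak, 2 * math.pi)  # the sampled peak is roughly 4.81, well below 2*pi
```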

Example 2.4. Show that
$$f(x) = \frac{1}{2x - 1}$$
is not bounded on the set $S = (\tfrac{1}{2}, \infty)$.

Figure 2.3: We found that $2\pi$ bounds $x \sin x$ on $[0,2\pi]$, but it is certainly not the best bound possible.

Figure 2.4: $f(x) = 1/(2x - 1)$ is an example of an unbounded function. But don't take the graph's word for it! Follow the reasoning given in the solution to Example 2.4.

Solution. The graph of $f(x)$ shown in Figure 2.4 suggests that the function is indeed unbounded on $S$. We will demonstrate, beyond all doubt, that $f$ is unbounded by showing that no matter which number $B > 0$ we select as a (candidate) bound, we can always find an $x_0 > \tfrac{1}{2}$ such that $f(x_0) > B$. Suppose then that we have selected $B$. Which arguments $x$ yield $f(x) > B$? Since $x > \tfrac{1}{2}$ means that $2x - 1 > 0$, we can solve this inequality as follows:

$$\frac{1}{2x - 1} > B \iff 2x - 1 < \frac{1}{B} \iff 2x < 1 + \frac{1}{B} \iff x < \frac{1}{2} + \frac{1}{2B}.$$
Given a candidate bound $B$, we can therefore choose any $x_0 \in (\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{2B})$, say $x_0 = \tfrac{1}{2} + \tfrac{1}{4B}$, and know that it satisfies $f(x_0) > B$ (indeed, $f(x_0) = 2B$). It follows that $f$ is unbounded.
Notice that this function is bounded below by 0 on $(\tfrac{1}{2}, \infty)$, because $2x - 1 > 0$ there, but it is not bounded above. If we had instead taken $S$ to be the whole real line (with the point $\tfrac{1}{2}$ removed, because $f(\tfrac{1}{2})$ is undefined!), then the function would be neither bounded above nor bounded below.
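The choice $x_0 = \tfrac{1}{2} + \tfrac{1}{4B}$ can be checked with exact rational arithmetic from Python's `fractions` module, avoiding any rounding. The helper name `witness` is our own; the sketch assumes $B > 0$, as in the argument above.

```python
from fractions import Fraction


def f(x):
    return 1 / (2 * x - 1)


def witness(B):
    """The point x0 = 1/2 + 1/(4B) from Example 2.4 (B > 0 assumed)."""
    return Fraction(1, 2) + 1 / (4 * B)


for B in [Fraction(1), Fraction(10), Fraction(10**6)]:
    x0 = witness(B)
    print(B, x0, f(x0), f(x0) > B)  # f(x0) works out to exactly 2B
```

However large the candidate bound, the witness lies in $S$ and produces a function value exceeding it.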

2.3 Finding Bounds for Functions of the Form 1/u(x)

It is useful to recall that an expression like
$$\frac{1}{u(x)}$$
is large if the values of $u(x)$ are close to zero. Hence, in order to show that $1/u(x)$ is bounded, we must find a positive number $A$ such that

$$|u(x)| \geq A.$$
We sometimes say in this circumstance that $u$ is bounded away from 0. This then guarantees that
$$|f(x)| = \left|\frac{1}{u(x)}\right| = \frac{1}{|u(x)|} \leq \frac{1}{A},$$

hence that $B = 1/A$ bounds $f(x) = 1/u(x)$. It is really important to note that you shouldn't normally use the triangle inequality on sums appearing in a denominator, because the resulting inequality goes in the wrong direction:
$$|u(x) + v(x)| \leq |u(x)| + |v(x)| \;\Rightarrow\; \frac{1}{|u(x) + v(x)|} \geq \frac{1}{|u(x)| + |v(x)|}.$$
This gives a lower bound for $1/|u(x) + v(x)|$, not the upper bound we need.
Example 2.5. Find a bound for
$$f(x) = \frac{1}{2x - 1}$$
on $(2,\infty)$.

Solution. Here, $u(x) = 2x - 1$ and we know that on the interval $x > 2$, we have
$$u(x) = 2x - 1 > 3.$$
We therefore set $A = 3$ (we could just as well choose any positive number less than 3 for $A$). It now follows that $f(x)$ is bounded by $B = 1/A = \tfrac{1}{3}$ on the domain $(2,\infty)$. Note here the difference between Examples 2.4 and 2.5: the same function may be bounded on some domains and unbounded on others.

Example 2.6. Find a bound for the function

$$f(x) = \frac{x^2 + 2x - 1}{x^2 - 2}$$
on the domain $(2,5)$.

Solution. We write $f(x)$ as
$$f(x) = g(x)h(x), \qquad\text{where}\quad g(x) = x^2 + 2x - 1 \quad\text{and}\quad h(x) = \frac{1}{x^2 - 2},$$
and attempt to find a bound for both $g$ and $h$. More precisely, we will find upper bounds for both $|g(x)|$ and $|h(x)|$.
We first use the triangle inequality on $|g(x)|$, noting that $|x| < 5$ on $(2,5)$:
$$|g(x)| = |x^2 + 2x - 1| \leq |x|^2 + 2|x| + |-1| < 25 + 2 \cdot 5 + 1 = 36.$$
To bound $|h(x)|$, we show that $x^2 - 2$ is bounded away from 0 on $(2,5)$. One way to do this is as follows:

$$2 < x < 5 \;\Rightarrow\; 4 < x^2 < 25 \;\Rightarrow\; 2 < x^2 - 2 < 23 \;\Rightarrow\; 2 < |x^2 - 2| < 23.$$

Therefore, $A = 2$ is a lower bound for $|u(x)| = |x^2 - 2|$, hence $\tfrac{1}{2}$ is an upper bound for $|h(x)| = 1/|x^2 - 2|$. Combining the two upper bounds, we find that $B = \tfrac{1}{2} \cdot 36 = 18$ is an upper bound for $|f(x)| = |g(x)|\,|h(x)|$. That is, $f$ is bounded by 18 on $(2,5)$.

Of course, we could have used our knowledge of parabolae to show that g(x) is bounded above by 34 and 1/h(x) is bounded below by 2. The point is that for many applications, it is sufficient to just know that a function is bounded, or to give a reasonably good bound (rather than the best possible one). Don’t waste time searching for the best bound, which can be very difficult or even impossible to find exactly, unless you have a good reason for doing so!
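To illustrate how generous the bound 18 is, the sketch below samples $|f|$ at interior grid points of the open interval $(2,5)$. The grid size is an arbitrary choice of ours, and the result is only a numerical estimate.

```python
def f(x):
    return (x**2 + 2 * x - 1) / (x**2 - 2)


n = 100_000
# Interior grid points of the open interval (2, 5): the endpoints are excluded.
samples = [2 + 3 * k / n for k in range(1, n)]
largest = max(abs(f(x)) for x in samples)
print(largest)  # just under 3.5, comfortably below the bound 18 found above
```

The sampled values approach $f(2) = 7/2$ near the left endpoint, so the true supremum is 3.5; the cruder bound 18 is nonetheless perfectly adequate for showing that $f$ is bounded.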

Chapter 3

Limits

3.1 An Intuitive Approach to Limits

When talking about limits of functions, we are interested in the value that a function looks like it "should" have, either at a point or as its argument goes off to $\infty$ or $-\infty$, regardless of whether the function actually takes that value, takes another value or is undefined at the limit point. A rigorous discussion of limits involves what is commonly called an ε-δ (epsilon-delta) definition, which can be difficult to understand at first. In practice, intuition often works. However, it can fail quite badly at times, which is why we make the effort to be rigorous and precise. First, let's take a trivial example: consider the function $f(x) = x^2$ and ask what the limit of $f(x)$ is as $x$ approaches 0. This is written as follows:

$$\lim_{x \to 0} f(x).$$
In this case, it's pretty clear that the value of the function is approaching 0 as $x$ approaches 0, and indeed $f(0) = 0$. For something somewhat less trivial, consider now the function

$$f(x) = \frac{x^2 - 4}{x - 2},$$
graphed in Figure 3.1. What can we say about $\lim_{x \to 2} f(x)$? In this case, note that $f(x)$ is defined for all real numbers except $x = 2$, because $f(2)$ is undefined due to a division by 0. However, when $x \neq 2$, factorising the numerator leads to the simplification
$$f(x) = \frac{(x - 2)(x + 2)}{x - 2} = x + 2 \qquad (x \neq 2).$$
The simplified formula $x + 2$ makes sense at $x = 2$ as well, even though $f$ itself does not. This suggests that $f$ should behave a lot like the function $g(x) = x + 2$. You might even be tempted to say that $f(x)$ should be regarded as being equal to $g(2)$ at $x = 2$. However, it is important in general (though rather artificial in this example) to distinguish $f$ and $g$, because they have different domains. Nevertheless, $f$ and $g$ agree at all points except $x = 2$, hence we are perfectly justified in writing

$$\lim_{x \to 2} \frac{x^2 - 4}{x - 2} = \lim_{x \to 2} (x + 2) = 4.$$
Limits do not care about what happens at the point of interest, only about what is happening as we get arbitrarily close to this point.
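The same conclusion can be seen numerically by evaluating $f$ on both sides of 2 without ever evaluating $f(2)$ (which would raise `ZeroDivisionError` in Python). This is an illustration under our own choice of step sizes, not a proof.

```python
def f(x):
    return (x**2 - 4) / (x - 2)


for h in [0.1, 0.01, 0.001, 1e-6]:
    # Approach 2 from the left and from the right; f(2) itself is never computed.
    print(h, f(2 - h), f(2 + h))
```

Since $f(x) = x + 2$ away from 2, the printed values differ from 4 by about $h$, up to small rounding errors.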

Figure 3.1: The function $f(x) = (x^2 - 4)/(x - 2)$.

Figure 3.2: The function $h$ given by $h(x) = x$ for all $x \neq 1$, and $h(1) = 2$.

As a third example, we consider the function h defined by

$$h(x) = \begin{cases} x & \text{for } x \neq 1, \\ 2 & \text{for } x = 1, \end{cases}$$
and ask what $\lim_{x \to 1} h(x)$ is. In this case, we can see in Figure 3.2 that the function is approaching 1 as $x$ approaches 1, even though $h(1) = 2$. This emphasises the point made above: $\lim_{x \to c} f(x)$ does not depend upon the value of $f(x)$ at $c$ (even when the function is defined at this point). Another thing worth emphasising is that it is customary to allow the point at which we take the limit, as well as the value of the limit itself, to be $\pm\infty$. For example, it is perfectly respectable to write
$$\lim_{x \to \infty} x = \infty, \qquad \lim_{x \to -\infty} x^3 = -\infty \qquad\text{and}\qquad \lim_{x \to 0} \frac{1}{x^2} = \infty.$$
Figure 3.3: The function $\sin(\pi/x)$.

However, mathematicians will object if you say that these limits exist. While the above equalities are viewed as convenient, the existence of a limit is generally restricted to mean that the limit has a value which is a real number. In any case, one should be careful. Finally, we remark that
$$\lim_{x \to 0} \frac{1}{x} \neq \infty.$$
We shall formalise the reason for this in what follows, for now remarking only that the problem arises because we seem to get $\infty$ for the limit when $x$ is taken small and positive, but $-\infty$ when $x$ is taken small and negative. We conclude the section with a less trivial example:

Example 3.1. Try to determine the limit
$$\lim_{x \to 0} \sin\frac{\pi}{x}.$$
Solution. First, we see that $\sin(\pi/x)$ is undefined at $x = 0$. Let's look at the behaviour of this function as we take smaller and smaller values of $x$:

x:          1    2/3    1/2    2/5    1/3    ...    1/1000    2/2001    ...
sin(π/x):   0    -1     0      1      0      ...    0         1         ...

In this case, we take values of $x$ getting closer and closer to 0, but it does not appear that $\sin(\pi/x)$ is approaching any one value. Indeed, $\lim_{x \to 0} \sin(\pi/x)$ does not exist! Looking at Figure 3.3, the wild behaviour of this function about the point $x = 0$ is quite apparent. The intuitive idea behind a limit may be summed up as follows:

$$\lim_{x \to c} f(x) = L$$
means that as we take values of $x$ closer and closer to $c$, $f(x)$ becomes closer and closer to $L$.
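The table above hides two sequences that both tend to 0: $x = 1/n$, where $\sin(\pi/x) = \sin(n\pi) = 0$, and $x = 2/(4n+1)$, where $\sin(\pi/x) = \sin(2n\pi + \pi/2) = 1$. The Python sketch below (our own illustration) evaluates both; because they give different values arbitrarily close to 0, no single limit is possible. Floating-point rounding makes the first column tiny rather than exactly 0.

```python
import math

for n in [1, 10, 100, 1000]:
    a = 1 / n            # sin(pi / a) = sin(n * pi) = 0
    b = 2 / (4 * n + 1)  # sin(pi / b) = sin(2n * pi + pi / 2) = 1
    print(n, math.sin(math.pi / a), math.sin(math.pi / b))
```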

3.2 Rigorous limits

We will now present some rigorous definitions of limits. This is the ε-δ (epsilon-delta) definition mentioned above.

Figure 3.4: Right and left limits.

Definition (Right-sided limit). The limit of f (x) as x approaches c from the right (or sometimes, from above) is defined to be L if, for any ε > 0, there exists some δ > 0 such that

$$|f(x) - L| < \varepsilon \quad\text{whenever}\quad c < x < c + \delta.$$
We write this as
$$\lim_{x \to c^+} f(x) = L.$$
Definition (Left-sided limit). The limit of $f(x)$ as $x$ approaches $c$ from the left (or sometimes, from below) is defined to be $L$ if, for any $\varepsilon > 0$, there exists some $\delta > 0$ such that

$$|f(x) - L| < \varepsilon \quad\text{whenever}\quad c - \delta < x < c.$$
We write this as
$$\lim_{x \to c^-} f(x) = L.$$
Referring back to our intuitive idea of limits, the value of ε in each of the above definitions determines just how close to $L$ we wish the value of the function to be, while δ tells us just how close to $c$ our $x$ must be to guarantee that $f(x)$ is within ε of $L$. These ε-δ definitions can take some time to grasp, so you may want to look at them more than once! Figure 3.4 tries to visualise the situation.

Definition (The limit (or two-sided limit)). The limit (or two-sided limit) of $f(x)$ as $x$ approaches $c$ is defined to be $L$ if, for any $\varepsilon > 0$, there exists some $\delta > 0$ such that

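$$|f(x) - L| < \varepsilon \quad\text{whenever}\quad 0 < |x - c| < \delta,$$
that is, whenever $c - \delta < x < c + \delta$ and $x \neq c$. (The point $x = c$ itself is excluded because, as we saw above, limits do not care about what happens at the point of interest.) We write this as
$$\lim_{x \to c} f(x) = L.$$
These definitions formalise the intuitive idea of a function "getting close" to some value. Note that δ will in general depend on ε and is not unique.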
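The roles of ε and δ can be explored numerically. The sketch below uses a hypothetical helper `check_eps_delta` that samples random points with $0 < |x - c| < \delta$ and tests whether $|f(x) - L| < \varepsilon$. A `True` result is only evidence, never a proof, since finitely many points are checked; we apply it to $f(x) = (x^2 - 4)/(x - 2)$ at $c = 2$, where $|f(x) - 4| = |x - 2|$ for $x \neq 2$, so the choice $\delta = \varepsilon$ should work.

```python
import random


def check_eps_delta(f, c, L, eps, delta, trials=10_000, seed=0):
    """Sample points with 0 < |x - c| < delta and test |f(x) - L| < eps.

    A True result is only evidence, not a proof: it checks finitely many points.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = c + rng.uniform(-delta, delta)
        if x == c:
            continue  # the definition ignores the point c itself
        if not abs(f(x) - L) < eps:
            return False
    return True


def f(x):
    return (x**2 - 4) / (x - 2)  # equals x + 2 whenever x != 2


for eps in [1.0, 0.1, 0.001]:
    print(eps, check_eps_delta(f, 2, 4, eps, delta=eps))  # each should be True
```

Choosing δ too large for a given ε, for instance δ = 1 with ε = 0.1, makes the check fail, which mirrors the fact that δ must generally shrink as ε does.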

Figure 3.5: The limit of this function as x tends to 1 does not exist.

Fact 3.1. The limit $\lim_{x \to c} f(x)$ exists if and only if all of the following hold:
• $\lim_{x \to c^-} f(x)$ exists.
• $\lim_{x \to c^+} f(x)$ exists.
• $\lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x)$.
Moreover, we then have that
$$\lim_{x \to c} f(x) = \lim_{x \to c^+} f(x) = \lim_{x \to c^-} f(x).$$
In other words, the two-sided limit exists precisely when both the right- and left-sided limits exist and are equal; in this case, the two-sided limit equals the common value of the right- and left-sided limits. Curiously, this result is quite useful for showing that a function does not have a limit at a point.
Example 3.2. Consider the function defined by
$$f(x) = \begin{cases} \tfrac{1}{2}x & \text{if } x < 1, \\ 0 & \text{if } x = 1, \\ x^2 & \text{if } x > 1. \end{cases}$$
Find the left-sided, right-sided and two-sided limits, if they exist, as $x$ approaches 1.
Solution. Here, it is easy to check that
$$\lim_{x \to 1^-} f(x) = \frac{1}{2} \qquad\text{and}\qquad \lim_{x \to 1^+} f(x) = 1.$$
As the left- and right-sided limits are different, $\lim_{x \to 1} f(x)$ does not exist (see Figure 3.5).
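A numerical look at Example 3.2 (a sketch with step sizes of our own choosing) shows the two one-sided behaviours separating: values just to the left of 1 cluster near $\tfrac{1}{2}$, values just to the right cluster near 1, and the value $f(1) = 0$ plays no role.

```python
def f(x):
    """The piecewise function of Example 3.2."""
    if x < 1:
        return x / 2
    if x == 1:
        return 0
    return x**2


for h in [0.1, 0.001, 1e-6]:
    print(h, f(1 - h), f(1 + h))  # left values near 0.5, right values near 1
```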

Example 3.3. Find $\lim_{x\to 0}|x|$.

Solution. We first have to find the left- and right-sided limits:
$$\lim_{x\to 0^+}|x| = \lim_{x\to 0^+} x = 0, \qquad \lim_{x\to 0^-}|x| = \lim_{x\to 0^-}(-x) = 0.$$
Since the left- and right-sided limits are both equal to 0, we can conclude that $\lim_{x\to 0}|x| = 0$.

3.3 Facts about Limits

The following facts are useful for determining limits of sums, differences, products and quotients of functions. They are stated here for two-sided limits. However, there are corresponding versions that apply to one-sided (left- or right-sided) limits.

Fact 3.2. Let f and g be two functions with

$$\lim_{x\to c} f(x) = L \quad\text{and}\quad \lim_{x\to c} g(x) = M$$
and let $k$ be a real number. Then, the following hold:

1. $\lim_{x\to c} kf(x) = kL$.
2. $\lim_{x\to c}\big(f(x) + g(x)\big) = L + M$.
3. $\lim_{x\to c} f(x)g(x) = LM$.
4. $\lim_{x\to c}\dfrac{f(x)}{g(x)} = \dfrac{L}{M}$, assuming that $M \neq 0$.

Moreover, if $m$ is an integer and $n$ a positive integer, then

5. $\lim_{x\to c}\big(f(x)\big)^{m/n} = L^{m/n}$,

where we must specify that $L > 0$ if $n$ is even and $L \neq 0$ if $m$ is negative. Finally, if $I = (a,b)$ is an interval containing $c$ (so $a < c < b$), then

6. $f(x) \le g(x)$ on $I$ implies that $L \le M$.

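As an example of how a mathematician uses “epsilon-delta” technology, we will now prove statement 1.

Proof of 1. If $k = 0$, then $kf(x)$ is identically 0 and there is nothing to prove, so assume that $k \neq 0$. First we must choose an $\varepsilon > 0$. Given this $\varepsilon$, we can define
$$\varepsilon' = \frac{\varepsilon}{|k|} > 0.$$
Now, we are told that $\lim_{x\to c} f(x) = L$, so it follows from the definition of two-sided limits that for our $\varepsilon'$, there exists a $\delta' > 0$ such that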

$$|f(x) - L| < \varepsilon' \quad\text{whenever}\quad 0 < |x - c| < \delta'.$$
The first inequality can be used to derive an inequality concerning $kf(x)$ as follows:

$$|kf(x) - kL| = |k|\,|f(x) - L| < |k|\,\varepsilon' = \varepsilon.$$
It therefore follows that

$$|kf(x) - kL| < \varepsilon \quad\text{whenever}\quad 0 < |x - c| < \delta'.$$

If we take $\delta = \delta'$, then this says that $\lim_{x\to c} kf(x) = kL$, as required. The proofs of the other statements follow a similar pattern. It would be a good exercise to try to prove them for yourself.

Example 3.4. Prove that $\lim_{x\to a} k = k$.

Solution. Let $f(x)$ be the constant function which takes value $k$ for all $x$ and choose an $\varepsilon > 0$. From the definition of a limit, we have to find a $\delta > 0$ for which $|f(x) - k| < \varepsilon$ whenever $0 < |x - a| < \delta$. However, this is clearly true for any $\delta$ and any $x$ because $|f(x) - k| = |k - k| = 0$.

Example 3.5. Prove that $\lim_{x\to a} x = a$.

Solution. Here, we let $f(x) = x$ and once again choose some $\varepsilon > 0$. For this $\varepsilon$, all we have to do is find a $\delta > 0$ such that $0 < |x - a| < \delta$ guarantees that $|f(x) - a| < \varepsilon$. But, since $|f(x) - a| = |x - a|$, it is clear that taking $\delta = \varepsilon$ will do the job. The limit is therefore $a$, as required.

Having proven that $\lim_{x\to a} x = a$, we can now use the list of facts about limits given above to show, for example, that

$$\lim_{x\to a} x^2 = \lim_{x\to a}(x)(x) = \Big(\lim_{x\to a} x\Big)\Big(\lim_{x\to a} x\Big) = a^2.$$
It should not be hard to convince yourself now that $\lim_{x\to a} x^n = a^n$. Going further, if we have a polynomial $p(x) = c_0 + c_1x + \cdots + c_nx^n$, then using our facts about limits of constant multiples and sums now gives us
$$\lim_{x\to a} p(x) = \lim_{x\to a}\big(c_0 + c_1x + \cdots + c_nx^n\big) = c_0 + c_1a + \cdots + c_na^n = p(a).$$
We can even extend this to limits of rational functions: let $p(x)$ and $q(x)$ be two polynomials, with $q(a) \neq 0$. The fact about limits of quotients now leads to
$$\lim_{x\to a}\frac{p(x)}{q(x)} = \frac{p(a)}{q(a)}.$$

When we have a rational function for which both the numerator and denominator vanish at the limit point, we must treat the limit somewhat differently. Let $p(x)$ and $q(x)$ be polynomials with $p(a) = q(a) = 0$. To determine
$$\lim_{x\to a}\frac{p(x)}{q(x)},$$
we factorise both the numerator and denominator so as to cancel any common factors (especially those of the form $x - a$). The resulting rational function will have the same limit as $x$ approaches $a$. This is the idea we used when developing our intuitive approach to find the limit
$$\lim_{x\to 2}\frac{x^2 - 4}{x - 2}.$$

Example 3.6. Given $p(x) = x^2 - 1$ and $q(x) = x - 1$, find
$$\lim_{x\to 1}\frac{p(x)}{q(x)}.$$
Solution. Both $p(1)$ and $q(1)$ vanish, so we factorise as directed above:
$$\lim_{x\to 1}\frac{p(x)}{q(x)} = \lim_{x\to 1}\frac{x^2 - 1}{x - 1} = \lim_{x\to 1}\frac{(x + 1)(x - 1)}{x - 1} = \lim_{x\to 1}(x + 1) = 2.$$
We mention that one may also have a rational function for which the denominator vanishes at the limit point, but the numerator does not. The study of this case will be deferred until Section 3.6.
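For polynomials, the cancellation procedure above can be automated. The following Python sketch is our own illustration, not a method taken from these notes: it stores a polynomial as a list of integer coefficients (highest power first), uses synthetic division to strip a factor $x - a$ from numerator and denominator while both vanish at $a$, and then substitutes $x = a$. The helper names `evaluate`, `divide_by_linear` and `rational_limit` are hypothetical.

```python
from fractions import Fraction

def evaluate(coeffs, a):
    """Evaluate the polynomial with coefficients [c_n, ..., c_1, c_0] at x = a (Horner's rule)."""
    result = 0
    for c in coeffs:
        result = result * a + c
    return result

def divide_by_linear(coeffs, a):
    """Divide p(x) by (x - a) by synthetic division; assumes p(a) == 0, so there is no remainder."""
    quotient = []
    carry = 0
    for c in coeffs[:-1]:          # the last coefficient only feeds the (zero) remainder
        carry = carry * a + c
        quotient.append(carry)
    return quotient

def rational_limit(p, q, a):
    """Limit of p(x)/q(x) as x -> a, for polynomials p and q (q not identically zero).

    Cancels common factors (x - a) while numerator and denominator both vanish at a.
    Returns an exact Fraction, or None if only the denominator still vanishes at a
    (the case deferred to Section 3.6).
    """
    while evaluate(p, a) == 0 and evaluate(q, a) == 0:
        p = divide_by_linear(p, a)
        q = divide_by_linear(q, a)
    denominator = evaluate(q, a)
    if denominator == 0:
        return None
    return Fraction(evaluate(p, a)) / denominator

# Example 3.6: (x^2 - 1)/(x - 1) as x -> 1.
print(rational_limit([1, 0, -1], [1, -1], 1))   # 2
```

Exact `Fraction` arithmetic is used so that the test "does this vanish at $a$?" is not spoiled by floating-point rounding.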

3.4 The “Squeeze Theorem”

The calculation of limits can get very tricky, especially if we are only using what we have seen so far. We therefore present two more facts that can sometimes assist in limit computations. The first goes by the name of the Squeeze Theorem (or the “Sandwich Theorem”) for reasons we shall explain shortly.

Fact 3.3 (The “Squeeze Theorem”). Suppose that we have functions $f$, $g$ and $h$ satisfying

$$g(x) \le f(x) \le h(x)$$
for all $x$ in an interval $(a - \varepsilon, a + \varepsilon)$ (where $\varepsilon > 0$) except perhaps at $a$ itself. Suppose further that we know that the limits of $g$ and $h$ at $a$ are equal:

$$\lim_{x\to a} g(x) = \lim_{x\to a} h(x) = L.$$
Then, the limit of $f$ at $a$ exists and is equal to $L$ as well:

$$\lim_{x\to a} f(x) = L.$$
The name of this theorem is derived from the fact that we are squeezing (or sandwiching) the function $f$ between two other functions, one that is never larger than $f$ and one that is never smaller than $f$. Because these two functions have the same limit at $a$, $f$ must have this limit at $a$ as well. We can use the Squeeze Theorem when the limit of $f$ is quite difficult to calculate, but the limits of $g$ and $h$ are easy. This can often save a lot of effort and can even allow us to compute limits that would otherwise be impossible! This is illustrated nicely by the next example.

Example 3.7. Show that $\lim_{x\to 0} f(x) = 0$, where
$$f(x) = x\sin\frac{1}{x}.$$
Solution. Recall that we showed in Example 3.1 that
$$\lim_{x\to 0}\sin\frac{\pi}{x}$$
does not exist, and the same reasoning shows that $\lim_{x\to 0}\sin(1/x)$ does not exist either. We therefore cannot use our fact about limits of products, because the limit of one of the factors does not exist! Instead, we resort to a squeezing argument. For this, we look for functions that bound $x\sin(1/x)$ both above and below. Noting that
$$-1 \le \sin\frac{1}{x} \le 1, \quad\text{for } x \neq 0,$$
we get the bounds
$$-x \le x\sin\frac{1}{x} \le x, \quad\text{for } x > 0, \qquad x \le x\sin\frac{1}{x} \le -x, \quad\text{for } x < 0.$$
Combining these cases, we let $g(x) = -|x|$ and $h(x) = |x|$ to obtain
$$g(x) \le f(x) \le h(x), \quad\text{for all } x \neq 0.$$

Figure 3.6: The function $f(x) = x\sin(1/x)$ being squeezed between $-|x|$ and $|x|$.

But we know from Example 3.3 that $\lim_{x\to 0} h(x) = \lim_{x\to 0}|x| = 0$ and, similarly, $\lim_{x\to 0} g(x) = 0$. Thus, the Squeeze Theorem applies and we conclude that
$$\lim_{x\to 0} f(x) = \lim_{x\to 0} x\sin\frac{1}{x} = 0.$$
This squeezing is illustrated in Figure 3.6. Here is the second useful fact that sometimes comes in handy when computing limits:

Fact 3.4. If we have a function $z(x)$ for which $\lim_{x\to c} z(x) = 0$ and another function $b(x)$ that is bounded on an interval $[c - \varepsilon, c + \varepsilon]$ for some $\varepsilon > 0$, then
$$\lim_{x\to c} b(x)z(x) = 0.$$
You can actually prove this result by using the Squeeze Theorem and some of the basic facts that we covered earlier in the chapter. We leave this proof as a nice exercise.
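For Example 3.7, the Python sketch below (an illustration only; checking finitely many sample points proves nothing) confirms the sandwich $-|x| \le x\sin(1/x) \le |x|$ at points approaching 0 from both sides. The same computation can be read as an instance of Fact 3.4, with $z(x) = x$ tending to 0 and the bounded factor $b(x) = \sin(1/x)$.

```python
import math

def f(x):
    return x * math.sin(1 / x)

# For x != 0, -|x| <= x sin(1/x) <= |x|, so f(x) is trapped between two
# functions that both tend to 0 as x -> 0.
for k in range(1, 7):
    for x in (10.0 ** (-k), -(10.0 ** (-k))):
        assert -abs(x) <= f(x) <= abs(x)
        print(f"x = {x: .0e}   -|x| = {-abs(x): .1e}   f(x) = {f(x): .3e}   |x| = {abs(x): .1e}")
```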

3.5 A Very Important Limit

We next provide another application of the Squeeze Theorem by using it to demonstrate the important result that
$$\lim_{\theta\to 0}\frac{\sin\theta}{\theta} = 1, \qquad (3.1)$$
where $\theta$ is measured in radians. To prove this, we will use some basic geometry, referring to the setup illustrated in Figure 3.7. Here, we have drawn a unit circle (a circle of radius 1) and two right-angled triangles with angle $\theta$ at the origin $O$. With the points as labelled in the picture, the following inequalities between areas are clear:

$$\text{Area of } \triangle OAP < \text{Area of sector } OAP < \text{Area of } \triangle OAQ.$$
Recalling that the area of a triangle is half the base times the height and that the area of a sector of a circle is half the radius squared times the angle subtended, these inequalities become
$$\frac{1}{2}\times 1\times\sin\theta < \frac{1}{2}\times 1^2\times\theta < \frac{1}{2}\times 1\times\tan\theta,$$
that is,
$$\sin\theta < \theta < \tan\theta \qquad \left(0 < \theta < \frac{\pi}{2}\right). \qquad (3.2)$$

Figure 3.7: The similar triangles used in calculating $\lim_{\theta\to 0}\frac{\sin\theta}{\theta}$.

It's important to realise that we've derived Equation (3.2) from a diagram in which $\theta$ was implicitly assumed to be small and positive. This means that $\sin\theta$ is positive, hence that dividing (3.2) throughout by $\sin\theta$ gives
$$1 < \frac{\theta}{\sin\theta} < \frac{1}{\cos\theta}.$$
Since all three terms are positive, inverting each term reverses the inequalities and therefore yields
$$\cos\theta < \frac{\sin\theta}{\theta} < 1.$$
To deal with the case in which $\theta$ is small and negative, note that then $-\theta$ is small and positive, so the previous inequalities apply and we obtain
$$\cos(-\theta) < \frac{\sin(-\theta)}{-\theta} < 1 \quad\Rightarrow\quad \cos\theta < \frac{\sin\theta}{\theta} < 1,$$
again. Combining these results, it follows that for $\theta$ small and non-zero, we can sandwich $(\sin\theta)/\theta$ between $\cos\theta$ and 1. Since $\lim_{\theta\to 0}\cos\theta = 1$, the Squeeze Theorem lets us conclude that
$$\lim_{\theta\to 0}\frac{\sin\theta}{\theta} = 1.$$
This completes the proof of (3.1)!
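The sandwich $\cos\theta < (\sin\theta)/\theta < 1$ can be spot-checked numerically. The sketch below uses Python's `math` module, whose trigonometric functions work in radians, matching the convention required for (3.1); the sample angles are an arbitrary choice and the check illustrates, rather than proves, the inequality.

```python
import math

# Check cos(theta) < sin(theta)/theta < 1 for small non-zero theta (radians),
# and watch the ratio approach 1 from both sides of 0.
for theta in (0.5, 0.1, 0.01, -0.01, -0.1, -0.5):
    ratio = math.sin(theta) / theta
    assert math.cos(theta) < ratio < 1
    print(f"theta = {theta:+.2f}   cos(theta) = {math.cos(theta):.6f}   sin(theta)/theta = {ratio:.6f}")
```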

3.6 Infinite limits

Finally, we return to the case omitted in our study of limits of rational (and more general) functions. This is the case where we have a denominator q(x) which vanishes at x = a and a

numerator $p(x)$ which does not vanish at $x = a$. As mentioned when we were discussing limits intuitively, the limit of the quotient as $x$ tends to $a$ therefore does not exist. However, it is very convenient to bend the rules and write
$$\lim_{x\to a}\frac{p(x)}{q(x)} = \infty \quad\text{or}\quad \lim_{x\to a}\frac{p(x)}{q(x)} = -\infty,$$
when appropriate. Determining whether this rule-bending is appropriate requires looking carefully at the left- and right-limits.

Example 3.8. Find
$$\lim_{x\to 2^-}\frac{1}{x - 2}.$$
Solution. To analyse a limit as $x \to 2^-$, it is useful to employ the substitution $x = 2 - \varepsilon$ and let $\varepsilon$ tend to 0 from the right ($\varepsilon \to 0^+$). This gives
$$\lim_{x\to 2^-}\frac{1}{x - 2} = \lim_{\varepsilon\to 0^+}\frac{1}{2 - \varepsilon - 2} = \lim_{\varepsilon\to 0^+}\frac{1}{-\varepsilon} = -\infty.$$
Still, we should be very careful to note that this limit doesn't actually exist (especially if there may be mathematicians listening).

Example 3.9. Find
$$\lim_{x\to 1^+}\frac{1}{x - 1}.$$
Solution. For $x \to 1^+$, we use the substitution $x = 1 + \varepsilon$ and take $\varepsilon \to 0^+$. Then,
$$\lim_{x\to 1^+}\frac{1}{x - 1} = \lim_{\varepsilon\to 0^+}\frac{1}{1 + \varepsilon - 1} = \lim_{\varepsilon\to 0^+}\frac{1}{\varepsilon} = \infty.$$

Example 3.10. Determine if
$$\lim_{x\to 2}\frac{1}{x - 2}$$
exists, and compute its value if possible.

Solution. To calculate the right-sided limit ($x \to 2^+$), we make the substitution $x = 2 + \varepsilon$ and let $\varepsilon \to 0^+$. We obtain that the right-sided limit is $\infty$. Since this limit doesn't exist, we must conclude that the two-sided limit doesn't exist either. But, we might still be able to bend the rules a little and claim that the value of the limit is $+\infty$ or $-\infty$. To check, notice that the left-sided limit was computed in Example 3.8 with the result $-\infty$. Since the left- and the right-sided limits are different, the two-sided limit cannot be assigned a value.

Example 3.11. Determine if
$$\lim_{x\to -3}\frac{x}{(x + 3)^2}$$
exists, and compute its value if possible.

Solution. We calculate the right-sided limit by substituting $x = -3 + \varepsilon$, getting $(-3 + \varepsilon)/\varepsilon^2$, which tends to $-\infty$ as $\varepsilon \to 0^+$. The left-sided limit follows from substituting $x = -3 - \varepsilon$, leading to $(-3 - \varepsilon)/\varepsilon^2$, which again tends to $-\infty$.
Since the left- and right-sided limits coincide, we may write
$$\lim_{x\to -3}\frac{x}{(x + 3)^2} = -\infty.$$
However, strictly speaking, this limit still doesn't exist.
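To see the signs in Examples 3.8 to 3.11 numerically, the sketch below evaluates a function at $c - h$ and $c + h$ for $h = 10^{-1}, \dots, 10^{-5}$, mirroring the substitutions $x = c \mp \varepsilon$ used above. The helper `one_sided_values` is a hypothetical name introduced here; large values of one sign only suggest, and never prove, that a one-sided limit is $\pm\infty$.

```python
def one_sided_values(g, c, steps=5):
    """Return the lists [g(c - h)] and [g(c + h)] for h = 10^-1, ..., 10^-steps."""
    left, right = [], []
    for k in range(1, steps + 1):
        h = 10.0 ** (-k)
        left.append(g(c - h))
        right.append(g(c + h))
    return left, right

# Examples 3.8 and 3.10: opposite signs on the two sides of x = 2.
left, right = one_sided_values(lambda x: 1 / (x - 2), 2)
print("1/(x-2) near 2:  left", left, " right", right)

# Example 3.11: large and negative on both sides of x = -3.
left, right = one_sided_values(lambda x: x / (x + 3) ** 2, -3)
print("x/(x+3)^2 near -3:  left", left, " right", right)
```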

We can also consider the limit of a function $f(x)$ when $x$ tends to $\pm\infty$. The most basic result here is that, for any positive integer $n$,
$$\lim_{x\to\pm\infty}\frac{1}{x^n} = 0.$$
This is quite easily seen from the graphs of $1/x$ and $1/x^2$, which you are encouraged to draw for yourselves. If we have two polynomials $p(x)$ and $q(x)$, then the limit
$$\lim_{x\to\pm\infty}\frac{p(x)}{q(x)}$$
turns out to be determined, in large part, by the degrees of $p(x)$ and $q(x)$. Here, we recall that the degree of a polynomial is defined to be the highest power of $x$ appearing with a non-zero coefficient. The limit will be:
• 0 if the degree of $p(x)$ is less than the degree of $q(x)$.
• finite (and non-zero) if the degrees of $p(x)$ and $q(x)$ are equal.
• $\pm\infty$ if the degree of $p(x)$ is greater than the degree of $q(x)$.

The reason behind this is that when $|x|$ is very large, the highest power of $x$ completely dominates all the other powers of $x$, so one can essentially ignore the contributions of these other powers. We therefore have

$$\lim_{x\to\pm\infty}\frac{p(x)}{q(x)} = \lim_{x\to\pm\infty}\frac{a_mx^m + a_{m-1}x^{m-1} + \cdots + a_0}{b_nx^n + b_{n-1}x^{n-1} + \cdots + b_0} = \lim_{x\to\pm\infty}\frac{a_mx^m}{b_nx^n} = \frac{a_m}{b_n}\lim_{x\to\pm\infty}x^{m-n},$$
where $m$ and $n$ are the degrees of $p(x)$ and $q(x)$, so that $a_m \neq 0$ and $b_n \neq 0$. This argument, while giving correct answers, is only intuitive. As we are trying to emphasise rigorous methods, the following examples illustrate how to convert this idea into a mathematically respectable deduction.

The method illustrated by the following examples should be used in all assignments, tests and exams during this course! Intuitive arguments of the type discussed above will, sadly, attract zero marks.

Example 3.12. Find
$$\lim_{x\to\infty}\frac{3x^3 - 2x - 1}{-2x^3 + 1}.$$
Solution. Here, the highest power of $x$ present in the denominator is 3, so we divide top and bottom by $x^3$, yielding
$$\lim_{x\to\infty}\frac{3x^3 - 2x - 1}{-2x^3 + 1} = \lim_{x\to\infty}\frac{3 - \frac{2}{x^2} - \frac{1}{x^3}}{-2 + \frac{1}{x^3}} = \frac{\lim_{x\to\infty}\left(3 - \frac{2}{x^2} - \frac{1}{x^3}\right)}{\lim_{x\to\infty}\left(-2 + \frac{1}{x^3}\right)} = -\frac{3}{2}.$$
Here, we have used the fact that $\lim_{x\to\pm\infty} 1/x^n = 0$ for all positive integers $n$.

Example 3.13. Find
$$\lim_{x\to-\infty}\frac{7x^2 - x}{5x^3 + 2}.$$
Solution. The highest power of $x$ in the denominator is again 3, so we proceed as before:
$$\lim_{x\to-\infty}\frac{7x^2 - x}{5x^3 + 2} = \lim_{x\to-\infty}\frac{\frac{7}{x} - \frac{1}{x^2}}{5 + \frac{2}{x^3}} = \frac{\lim_{x\to-\infty}\left(\frac{7}{x} - \frac{1}{x^2}\right)}{\lim_{x\to-\infty}\left(5 + \frac{2}{x^3}\right)} = \frac{0}{5} = 0.$$

Example 3.14. Find
$$\lim_{x\to-\infty}\frac{2x^5 - 4x^2}{-9x^2 + 16}.$$
Solution. This time, the highest power of $x$ in the denominator is 2. Dividing top and bottom by $x^2$ gives
$$\lim_{x\to-\infty}\frac{2x^5 - 4x^2}{-9x^2 + 16} = \frac{\lim_{x\to-\infty}\left(2x^3 - 4\right)}{\lim_{x\to-\infty}\left(-9 + \frac{16}{x^2}\right)} = -\frac{1}{9}\lim_{x\to-\infty}\left(2x^3 - 4\right) = \infty,$$
since $2x^3 - 4 \to -\infty$ as $x \to -\infty$. Mind you, this limit doesn't exist, strictly speaking.

We finish by remarking that the formal definition given for limits in Section 3.2 cannot be used for limits as $x \to \pm\infty$: it simply doesn't make sense to talk about $x$ getting close to infinity. We need a slightly different definition.

Definition. We say that $L$ is the limit
$$\lim_{x\to+\infty} f(x) = L$$
if, given any $\varepsilon > 0$, there exists a number $N > 0$ such that $|f(x) - L| < \varepsilon$ whenever $x > N$. Similarly, we say that $L$ is the limit
$$\lim_{x\to-\infty} f(x) = L$$
if, given any $\varepsilon > 0$, there exists a number $N > 0$ such that $|f(x) - L| < \varepsilon$ whenever $x < -N$.

Our definitions for limits at finite points were loosely saying that if $x$ gets close enough to the limit point, then the value of the function gets close to the limit value. For limits at infinity, we are saying rather that if we make $x$ sufficiently large in magnitude (either positive or negative), then the value of the function gets close to the limit value. Of course, one needs similarly modified definitions to deal with the case that the limit value is infinite. We leave the formulation of these definitions as an exercise.
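As a way of checking answers (and emphatically not as a substitute for the rigorous method required above), the degree rule can be encoded directly. In the Python sketch below, `limit_at_infinity` is a hypothetical helper of our own; it assumes integer coefficients listed highest power first with non-zero leading coefficients, and it reads off the answer from the degrees and leading coefficients only.

```python
import math
from fractions import Fraction

def limit_at_infinity(p, q, direction=1):
    """Limit of p(x)/q(x) as x -> +infinity (direction=1) or x -> -infinity (direction=-1).

    p and q are integer coefficient lists, highest power first, with non-zero
    leading coefficients; e.g. 3x^3 - 2x - 1 is [3, 0, -2, -1].
    Returns an exact Fraction for a finite limit, otherwise math.inf or -math.inf.
    """
    m, n = len(p) - 1, len(q) - 1        # degrees of p and q
    a_m, b_n = p[0], q[0]                # leading coefficients
    if m < n:
        return Fraction(0)
    if m == n:
        return Fraction(a_m, b_n)
    # m > n: p/q behaves like (a_m / b_n) * x^(m - n); as x -> -infinity the
    # sign of x^(m - n) is (-1)^(m - n).
    sign = (1 if a_m * b_n > 0 else -1) * direction ** (m - n)
    return math.inf if sign > 0 else -math.inf

print(limit_at_infinity([3, 0, -2, -1], [-2, 0, 0, 1]))                   # Example 3.12
print(limit_at_infinity([7, -1, 0], [5, 0, 0, 2], direction=-1))          # Example 3.13
print(limit_at_infinity([2, 0, 0, -4, 0, 0], [-9, 0, 16], direction=-1))  # Example 3.14
```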

Chapter 4

Continuity

4.1 Continuity at a Point

Intuitively, we might say that a function $f(x)$ is continuous at a point $x = c$ if $f(c)$ is defined and if there are no holes or gaps in $f(x)$ around the point $c$. Continuity can also be thought of, intuitively, as being able to draw the function's graph without having to lift your pen from the paper. Formally, we define continuity of a function in a manner similar to how we defined limits.

Definition. A function $f$ is said to be left-continuous at a point $c$ if

$$\lim_{x\to c^-} f(x) = f(c).$$
A function $f$ is said to be right-continuous at a point $c$ if

$$\lim_{x\to c^+} f(x) = f(c).$$
These notions are illustrated in Figures 4.1 and 4.2.

Definition. A function $f$ is said to be continuous at a point $c$ if

$$\lim_{x\to c^-} f(x) = f(c) = \lim_{x\to c^+} f(x).$$
In other words, $f$ is continuous at $c$ precisely when it is both left- and right-continuous at $c$. So, to check if a function $f(x)$ is continuous at a point $x = c$, we have to confirm that:

• $\lim_{x\to c} f(x)$ exists.
• $f$ is defined at $x = c$ (no dividing by zero, $c$ is in the domain of $f$, etc.).
• The limit $\lim_{x\to c} f(x)$ is actually equal to $f(c)$.

If one or more of the above tests fail, then the function is said to be discontinuous at $x = c$.

Example 4.1. Consider the function

$$f(x) = \begin{cases} x & \text{if } x > 1,\\ \frac{1}{2} & \text{if } x \le 1. \end{cases}$$
Is this function continuous at $x = 1$?

Figure 4.1: This function is left-continuous, but not right-continuous, at the point $x = 1$.

Figure 4.2: By contrast, this function is right-continuous, but not left-continuous, at x = 1.

Figure 4.3: The function of Example 4.1 is left-continuous (but discontinuous) at $x = 1$.

Solution. We compute the left- and right-sided limits as follows:
$$\lim_{x\to 1^-} f(x) = \lim_{x\to 1^-}\frac{1}{2} = \frac{1}{2}, \qquad \lim_{x\to 1^+} f(x) = \lim_{x\to 1^+} x = 1.$$
Since the left- and right-sided limits are not equal, the limit of $f(x)$ as $x \to 1$ does not exist, so the function $f$ is discontinuous at $x = 1$. Note however, in this example, that $f(1) = \frac{1}{2}$, which is equal to the left-sided limit. $f(x)$ is therefore left-continuous at $x = 1$ (see Figure 4.3).

4.2 Algebraic Combinations and Continuity

Most of the algebraic operations we know and love for functions preserve continuity.

Fact 4.1. If $f$ and $g$ are two functions which are continuous at a point $c$, then
• $f + g$ is continuous at $c$.
• $f - g$ is continuous at $c$.
• $fg$ is continuous at $c$.
• $f/g$ is continuous at $c$, provided that $g(c) \neq 0$.

In particular, it follows that if $f$ is continuous at $c$, then so is $-f$. Moreover, if $f$ is continuous at $c$, then so is $1/f$, provided only that $f(c) \neq 0$. We showed in Section 3.3 that for any polynomial $p(x)$, we have

$$\lim_{x\to a} p(x) = p(a).$$
It follows immediately from this that any polynomial is continuous at any point $x = a$. It now follows from the above fact that rational functions (quotients of polynomials) will also be continuous wherever the denominator is non-zero.

Some trigonometric functions, in particular sin and cos, are continuous everywhere. Others, such as tan, are continuous everywhere that they are defined. For example, $\tan x$ is not continuous at $x = \pi/2$ because it is not even defined there! Rational-power functions, for example $f(x) = x^{m/n}$, are likewise continuous everywhere they are defined. Here, however, there is an annoying subtlety, best illustrated with the square root function $f(x) = x^{1/2}$. Is $f$ continuous at $x = 0$? It is easy to check that it is right-continuous there, but we cannot check left-continuity because the domain of $f$ does not include any negative numbers (we cannot take the limit $x \to 0^-$). When we are prevented from taking a left- (or right-) limit because of the domain, we take the practical way out and declare that continuity means right- (or left-) continuity, respectively. So, because the square root function is right-continuous at 0, we will say that it is continuous at 0. We will meet this again in Section 4.4.

4.3 Limits and Continuity for Compositions

Next, we present a couple of results concerning taking limits of compositions of functions and the related notion of whether continuity is preserved under compositions.

Fact 4.2. Given two functions $f$ and $g$, if $g(x)$ has limit $M$ at $x = c$ and $f$ is continuous at $M$, then
$$\lim_{x\to c}(f\circ g)(x) = f(M).$$
Note that the restrictions on $f$ here are a good deal stronger than those on $g$. For $g$ we only require that the limit exists. For $f$, we require continuity, which itself requires that the limit exists, but also requires that the limit be equal to the value of the function at that point. This is illustrated in the following example.

Example 4.2. Let
$$f(x) = \sqrt{x} \quad\text{and}\quad g(x) = \frac{x^2 - 4}{x - 2}.$$
Find $\lim_{x\to 2}(f\circ g)(x)$, if it exists.

Solution. In order to apply the previous fact, we must first check that the limit of $g$ exists at $x = 2$ and then check that $f$ is continuous at the value of the limit. Now,
$$\lim_{x\to 2} g(x) = \lim_{x\to 2}\frac{x^2 - 4}{x - 2} = \lim_{x\to 2}(x + 2) = 4,$$
so the limit of $g$ does exist. We can also clearly see that $f$ is continuous at $4$ because it is a rational-power function of $x$. Thus,

$$\lim_{x\to 2} f\big(g(x)\big) = f\Big(\lim_{x\to 2} g(x)\Big) = f(4) = \sqrt{4} = 2.$$
This last example demonstrates an alternative way of looking at our fact: under the right conditions, namely that the limit of $g$ exists and $f$ is continuous at the value of this limit, we may take the limit inside $f$:
$$\lim_{x\to a} f\big(g(x)\big) = f\Big(\lim_{x\to a} g(x)\Big).$$
This is essentially what it means for $f$ to be continuous. Be careful, though: you must always check that the condition on $g$ is satisfied as well! Our second result says that composition preserves continuity.
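Example 4.2 can be mirrored numerically. In the Python sketch below (an illustration, not a proof), `g` cannot be evaluated at $x = 2$ at all, since Python raises `ZeroDivisionError` for `0/0`, yet $f(g(x))$ evaluated at points near 2 approaches $f(4) = 2$, just as taking the limit inside $f$ predicts.

```python
import math

def g(x):
    return (x**2 - 4) / (x - 2)   # undefined at x = 2 (division by zero)

def f(x):
    return math.sqrt(x)

# g(2) itself cannot be computed, but f(g(x)) for x near 2 approaches f(4) = 2.
for h in (1e-1, 1e-3, 1e-5):
    print(f"x = 2 - {h:g}: f(g(x)) = {f(g(2 - h)):.6f};   x = 2 + {h:g}: f(g(x)) = {f(g(2 + h)):.6f}")
```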

Fact 4.3. Suppose that $g$ and $f$ are functions that are continuous at $c$ and $g(c)$, respectively. Then, $f\circ g$ is continuous at $c$.

Proof. To see that $f\circ g$ is continuous at $c$, we have to check that its limit as $x \to c$ exists and is equal to $(f\circ g)(c)$. First, as $g$ is continuous at $c$, we know that $g(c)$ is defined and that
$$\lim_{x\to c} g(x) = g(c).$$
Furthermore, $f$ is continuous at $g(c)$, so $f\big(g(c)\big) = (f\circ g)(c)$ is defined and, by Fact 4.2 with $M = g(c)$,
$$\lim_{x\to c}(f\circ g)(x) = f\big(g(c)\big).$$
This means that $f\circ g$ is continuous at $c$.

4.4 Continuity on an Interval

We now turn to what it means for a function to be continuous not just at a point, but on an entire interval. The definition depends on whether the interval includes its endpoints or not.

Definition. A function $f$ is said to be continuous on an open interval $(a,b)$ if $f$ is continuous at every point $c \in (a,b)$. A function $f$ is said to be continuous on a closed interval $[a,b]$ if $f$ is continuous at every point $c \in (a,b)$, right-continuous at $a$, and left-continuous at $b$. Similar definitions hold for intervals which are neither open nor closed, hence have the form $(a,b]$ or $[a,b)$. We simply have to check that the function is continuous on $(a,b)$ and left- or right-continuous at the endpoint which is included in the interval.

Example 4.3. Show that $f(x) = \sqrt{x - 4}$ is continuous on the interval $[4,8]$.

Solution. It is straightforward to check that $f$ is continuous on $(4,8)$, being a translated square root function. We therefore check the endpoints $x = 4$ and $x = 8$:

$$\lim_{x\to 4^+}\sqrt{x - 4} = \sqrt{\lim_{x\to 4^+}(x - 4)} = 0 = f(4), \qquad \lim_{x\to 8^-}\sqrt{x - 4} = \sqrt{\lim_{x\to 8^-}(x - 4)} = 2 = f(8).$$
$f$ is therefore right-continuous at $x = 4$ and left-continuous at $x = 8$. It follows that $f$ is continuous on $[4,8]$.

4.5 The Intermediate Value Theorem

Fact 4.4 (The Intermediate Value Theorem). Let $f$ be a function that is continuous on an interval $[a,b]$ and let $L$ be a real number between $f(a)$ and $f(b)$. Then, there exists at least one number $c$ between $a$ and $b$ for which $f(c) = L$.

Figure 4.4: An illustration of the intermediate value theorem. Here, we have chosen $L$ to lie between $f(a)$ and $f(b)$. Notice that there are two points $c$ and $d$ (actually, there are three!) between $a$ and $b$ for which $f(c) = f(d) = L$.

The intermediate value theorem is a formalisation of a concept that is quite obvious when one draws the graph of a continuous function. It says that if $f$ is continuous, then it must take every value between $f(a)$ and $f(b)$ at least once as $x$ varies between $a$ and $b$ (see Figure 4.4). You can demonstrate this easily to yourself by trying to draw the graph of a continuous function (so without lifting your pen from the paper!) such that $f(0) = 0$, $f(1) = 1$, and in between the function never takes the value $\frac{1}{2}$. It shouldn't take too long to convince yourself that this is impossible.

A nice example illustrating how we might use the intermediate value theorem is what we call the bisection method. This method can be used to find, approximately, the roots of a continuous function, that is, the solutions to $f(x) = 0$. It does so by first finding an interval in which the function is positive at one endpoint and negative at the other. Assuming that our function is continuous on this interval, we can apply the intermediate value theorem to conclude that the interval contains at least one root of the function. To get more information about one of these roots, we can bisect our interval (cut it in two) to find a new interval in which the function is positive at one endpoint and negative at the other. Repeating this process then allows us to compute a root to whatever accuracy is needed. Of course, one has to find such an interval first! This might be rather difficult in practice. In fact, it could happen that no such interval exists because the function has no roots, for example $x^2 + 1$ or $\cosh x$.
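The bisection procedure just described can be sketched in a few lines of Python. This is a minimal version under the stated assumptions ($f$ continuous on $[a,b]$, with $f(a)$ and $f(b)$ of opposite signs); the function name `bisection` and its parameters are our own choices. Applied to the function $x^3 - 3x + 1$ of Example 4.4 on $[-2,-1]$ with four steps, it reproduces the final interval found there.

```python
def bisection(f, a, b, steps):
    """Apply `steps` bisection steps to [a, b] and return the final interval.

    Assumes f is continuous on [a, b] and that f(a) and f(b) have opposite
    signs, so the intermediate value theorem guarantees a root in [a, b].
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a, a
    if fb == 0:
        return b, b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(steps):
        c = (a + b) / 2                # midpoint
        fc = f(c)
        if fc == 0:                    # landed exactly on a root
            return c, c
        if fa * fc < 0:                # sign change on [a, c]: keep the left half
            b = c
        else:                          # otherwise the sign change is on [c, b]
            a, fa = c, fc
    return a, b

def f(x):
    return x**3 - 3*x + 1

print(bisection(f, -2.0, -1.0, 4))     # (-1.9375, -1.875)
```

Each step halves the interval, so $k$ steps shrink an interval of length $b - a$ to length $(b - a)/2^k$.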

Example 4.4. Let $f$ be the function $f(x) = x^3 - 3x + 1$. Find three intervals that contain a root of $f$ and use four steps of the bisection method to home in on one of these roots.

Solution. First, we recall that f is continuous everywhere because it is a polynomial function. To find our three intervals, we draw up a table of the values of f at integer points:

| $x$ | $-3$ | $-2$ | $-1$ | $0$ | $1$ | $2$ | $3$ |
|---|---|---|---|---|---|---|---|
| $f(x)$ | $-17$ | $-1$ | $3$ | $1$ | $-1$ | $3$ | $19$ |

Looking for intervals that have negative values at one endpoint and positive values at the other, this table gives us three: $[-2,-1]$, $[0,1]$ and $[1,2]$.

Figure 4.5: This function is continuous and defined on $[a,b]$. It has an absolute minimum and an absolute maximum, by the Min-Max Theorem.

Let's choose to home in on the root in the interval $[-2,-1]$. At the midpoint $-1.5$, we compute that $f(-1.5) = 2.125$, and so we deduce that there is a root in $[-2,-1.5]$ (since this interval has a positive value at one endpoint and a negative value at the other). Continuing this process, we get the data summarised in the following table:

| Interval $[a,b]$ | $f(a)$ | $f(b)$ | Midpoint $c$ | $f(c)$ |
|---|---|---|---|---|
| $[-2,-1]$ | $-1$ | $3$ | $-1.5$ | $2.125$ |
| $[-2,-1.5]$ | $-1$ | $2.125$ | $-1.75$ | $0.8906$ |
| $[-2,-1.75]$ | $-1$ | $0.8906$ | $-1.875$ | $0.0332$ |
| $[-2,-1.875]$ | $-1$ | $0.0332$ | $-1.9375$ | $-0.4607$ |

Bisecting one last time, we conclude that one of the roots of $x^3 - 3x + 1$ lies in the interval $[-1.9375,-1.875]$.

4.6 The Min-Max Theorem

Fact 4.5 (The Min-Max Theorem). If $f$ is continuous on a finite closed interval $[a,b]$, then there are points $x_1, x_2 \in [a,b]$ for which

$$f(x_1) \le f(x) \le f(x_2), \quad\text{for all } x \text{ in } [a,b].$$

In other words, $f(x_1)$ is the absolute minimum among all values of $f$ on $[a,b]$ and $f(x_2)$ is the absolute maximum among all values of $f$ on $[a,b]$.

Note that the Min-Max Theorem only asserts the existence of the points $x_1$ and $x_2$. It does not give any hint as to how to find them. An example of a function illustrating this fact is given in Figure 4.5. It is also important to realise that if any of the hypotheses ($f$ being continuous, the interval being finite, and the interval being closed) are not satisfied, then the existence of absolute maxima and minima is no longer guaranteed. Examples illustrating this are given in Figure 4.6.

Figure 4.6: Examples illustrating when the Min-Max Theorem fails: In the first example, the interval is not finite. In the second, $f$ is not continuous. The third illustrates what can go wrong when the interval is not closed.

Figure 4.7: The discontinuity at $x = 0$ is removable because the value of the function may be redefined at this point to ensure continuity at $x = 0$.

It follows from the Min-Max Theorem that any function which is continuous on a finite closed interval is bounded on that interval. Recall that this means that there exists a $B > 0$ for which $|f(x)| \le B$ for all $x$ in the interval. This is sometimes called the Boundedness Theorem.

4.7 Removable Discontinuities

Consider the function
$$f(x) = \begin{cases} x + 1 & \text{if } x \neq 0,\\ 0 & \text{if } x = 0. \end{cases}$$
It is continuous everywhere except at $x = 0$. However, if we redefined $f(0)$ so that $f(0) = 1$, then the new function so obtained would be continuous. In fact, it would be the polynomial $x + 1$. A discontinuity such as that of $f$ is said to be removable (see Figure 4.7).

Definition. A discontinuity at $c$ of a function $f$ is called removable if
$$\lim_{x\to c} f(x)$$
exists. In general, a function $f$ with a removable discontinuity at $c$ can be replaced by the function $F$ which agrees with $f$ everywhere except at $c$, where we instead define
$$F(c) = \lim_{x\to c} f(x).$$
Note that there is no requirement for $f$ to be defined at $c$. All we need is for the limit to exist! Finally, if we are lucky enough that all the discontinuities of a function $f$ are removable, then the new function obtained by removing these discontinuities is called the continuous extension of $f$.

Chapter 5

Differentiation

5.1 The Derivative of a Function at a Point

We begin by revising what the derivative of a function is and what it means for a function to be differentiable.

Definition. Let $f$ be a function and let $c$ belong to the domain of $f$. If the limit

$$\lim_{x\to c}\frac{f(x) - f(c)}{x - c}$$
exists, then this limit is called the derivative of $f$ at $c$, written $f'(c)$, and the function $f$ is said to be differentiable at $c$. If this limit does not exist, then the function is said to be non-differentiable at $c$.

The domain of the derivative $f'$ of $f$ consists of all $c$ in the domain of $f$ at which $f$ is differentiable (meaning that $f'(c)$ is defined). The derivative is one of the two basic concepts of calculus; the other is, of course, the integral. You may recall that derivatives and integrals are related to one another via the fundamental theorem of calculus (we will review this in Section 8.7). We remark that a point $c$ where $f$ is not differentiable is often said to be a singular point.

The interpretation that we normally use for differentiation is that the derivative of a function at a point $c$ is the gradient (slope) of the line that is tangent to the graph of the function at $c$. This is pictured graphically in Figure 5.1. Many of the functions that you have encountered up to this point, and indeed, many of the functions that you will continue to use, are differentiable everywhere. Examples include polynomials, exponentials, sin and cos. Most of the other familiar functions will turn out to be differentiable everywhere except for a (relatively) small number of points. Here, the absolute value function is a good example: it is not differentiable at $c = 0$ because the left-sided limit in the definition is $-1$ but the right-sided limit is $+1$. For an example of a different type, the tan function is not differentiable at $\pi/2$ because $\tan(\pi/2)$ is not even defined.

Before we continue, we need to recall some of the alternative notations surrounding differentiation. Above, we have used $f'$ to denote the derivative. The other most frequently used notation is
$$\frac{df}{dx}, \quad\text{or}\quad \frac{dy}{dx} \text{ when } y = f(x).$$
If you are looking at other reference books in the library, you may come across notation for derivatives such as $D_x f$ or $f_x$, and perhaps even $f^{(k)}$ to represent the $k$-th derivative of a function. Throughout these notes, we will try to use either $f'$ or $df/dx$.

Figure 5.1: The derivative as the gradient of the tangent line to a function.

Example 5.1. Using the definition given above for the derivative, calculate $f'(c)$ (for any $c$) when $f(x) = x$.

Solution. We note that the limit in the definition becomes
$$f'(c) = \lim_{x\to c}\frac{f(x) - f(c)}{x - c} = \lim_{x\to c}\frac{x - c}{x - c} = \lim_{x\to c} 1 = 1.$$
This is a result that many of you will already be familiar with. Note that we are allowed to cancel the numerator and denominator when they both equal $x - c$ because the process of taking the limit requires that $x$ be very close, but not equal, to $c$.

Since the derivative of $f(x) = x$ is equal to 1 at every point $c$, the derivative of $f$ is the constant function $f'(x) = 1$. When we want to compute the derivative of $f$ as a function, rather than compute it at some point $c$, it is sometimes convenient to transform the limit appearing in the definition into another equivalent form:
$$f'(x) = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h}.$$
We expect that this form is quite familiar to you. Note that the $h$ in the denominator is really the difference $h = (x + h) - x$ between the points $x + h$ and $x$ at which we are evaluating $f$.

Example 5.2. Prove that the derivative of $f(x) = \sin x$ is $\cos x$.

Solution. No doubt, you all know this derivative, but how can you work it out from first principles? This relies upon the definition of derivative, knowing how to expand the sine of a sum, and two auxiliary limit computations:
$$\lim_{\theta\to 0}\frac{\sin\theta}{\theta} = 1, \qquad \lim_{\theta\to 0}\frac{\cos\theta - 1}{\theta} = 0. \qquad (5.1)$$
The first we calculated in Section 3.5. The second may be checked using similar ideas and is left as a challenge!

The proof now follows from the following computations:

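$$\begin{aligned}
f'(x) &= \lim_{h\to 0}\frac{\sin(x + h) - \sin x}{h}\\
&= \lim_{h\to 0}\frac{\sin x\cos h + \cos x\sin h - \sin x}{h}\\
&= \lim_{h\to 0}\left(\frac{\cos h - 1}{h}\sin x + \frac{\sin h}{h}\cos x\right)\\
&= \sin x\lim_{h\to 0}\frac{\cos h - 1}{h} + \cos x\lim_{h\to 0}\frac{\sin h}{h}\\
&= \sin x\times 0 + \cos x\times 1\\
&= \cos x.
\end{aligned}$$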

Here, the first equality is by definition, the second uses the addition formula for the sine function, the third is just collecting terms, the fourth follows because $\sin x$ and $\cos x$ are constant with respect to $h$, the fifth follows from the auxiliary limits (5.1), and the last equality shouldn't really need explaining. When calculating derivatives, we frequently rely on standard derivatives. The following table presents a few of the most common ones:

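$$\begin{array}{c|ccccccccc}
f(x) & x^n & e^x & \ln x & \sin x & \cos x & \tan x & \sinh x & \cosh x & \tanh x \\ \hline
f'(x) & nx^{n-1} & e^x & 1/x & \cos x & -\sin x & \sec^2 x & \cosh x & \sinh x & \operatorname{sech}^2 x
\end{array}$$

A more complete list may be found at the front of your textbook (or online).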
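The entries of this table can be spot-checked numerically. The Python sketch below uses the symmetric difference quotient $(f(x+h) - f(x-h))/(2h)$, a standard numerical substitute for the limit in the definition that we swap in here because it converges faster than the one-sided quotient. The choices $n = 3$, $x = 0.7$ and $h = 10^{-5}$ are arbitrary assumptions; agreement to many decimal places is reassuring, but proves nothing.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h), approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


n = 3  # arbitrary exponent for the x^n entry
# (f, f') pairs copied from the table of standard derivatives.
TABLE = {
    "x^n": (lambda x: x**n, lambda x: n * x ** (n - 1)),
    "e^x": (math.exp, math.exp),
    "ln x": (math.log, lambda x: 1 / x),
    "sin x": (math.sin, math.cos),
    "cos x": (math.cos, lambda x: -math.sin(x)),
    "tan x": (math.tan, lambda x: 1 / math.cos(x) ** 2),     # sec^2 x
    "sinh x": (math.sinh, math.cosh),
    "cosh x": (math.cosh, math.sinh),
    "tanh x": (math.tanh, lambda x: 1 / math.cosh(x) ** 2),  # sech^2 x
}

x0 = 0.7  # a point where every entry is defined (ln x needs x > 0)
for name, (f, fprime) in TABLE.items():
    print(f"{name:7s} table: {fprime(x0): .10f}   numeric: {central_difference(f, x0): .10f}")
```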

5.2 Continuity and Differentiability

Fact 5.1. If a function f is differentiable at a point c in (a,b), then f must also be continuous at c.

Proof. First, recall from the definition that the differentiability of $f$ at $c$ requires that $f$ is defined at $c$. Choose $x$ to be any point in $(a,b)$ with $x \ne c$, so that we may write
$$f(x) = \frac{f(x) - f(c)}{x - c}\,(x - c) + f(c).$$
Taking the limit as $x \to c$ now gives
$$\lim_{x\to c} f(x) = \left(\lim_{x\to c}\frac{f(x) - f(c)}{x - c}\right)\lim_{x\to c}(x - c) + \lim_{x\to c} f(c) = f'(c)\times 0 + f(c) = f(c).$$

Here, we have used the assumption that $f$ is differentiable at $c$ to conclude that $f'(c)$ exists (and is finite!). Since the limit as $x \to c$ of $f(x)$ is equal to $f(c)$, $f$ is continuous at $c$.

Figure 5.2: The graph on the left shows a continuous function that has “corners” where the function is not differentiable. That on the right shows a continuous function with a vertical tangent line; the function is not differentiable there.

Be careful with this fact! It says that differentiable functions are always continuous. It does not say that continuous functions are always differentiable! Figure 5.2 demonstrates two ways in which a continuous function can fail to be differentiable at a point. In fact, things can get much, much worse: There are some very crazy functions out there that are continuous everywhere but can’t be differentiated anywhere! Luckily, you will not be running into any of these functions in this course. However, you may be interested in looking up a couple of these weird functions. One example is the “Blancmange curve”, shown in Figure 5.3 (named after a European dessert made with almond milk and cream).

5.3 One-Sided Derivatives

Just as limits and continuity have one-sided versions, so do derivatives, and these can be useful in applications.

Definition. Suppose that $f$ is defined on the interval $[c,b)$. Then, $f$ is said to have a right-sided derivative $f'_+(c)$ at $c$ if the limit
$$f'_+(c) = \lim_{x\to c^+}\frac{f(x) - f(c)}{x - c}$$
exists. Similarly, if $f$ is defined on the interval $(a,c]$ and
$$f'_-(c) = \lim_{x\to c^-}\frac{f(x) - f(c)}{x - c}$$
exists, then $f$ is said to have a left-sided derivative $f'_-(c)$ at $c$.

In the graphs of Figure 5.4, we lack either a left- or a right-sided derivative at the points of discontinuity, because the function is not left- or right-continuous there, respectively. The graph on the left of Figure 5.2 is a little different because its function has both a left-sided derivative and a right-sided derivative at each “corner” point, but their values are not equal, so the derivative does not exist.

Figure 5.3: The Blancmange function (also called the Takagi curve). More details can be found on its wikipedia.org page.

Figure 5.4: In the graph on the left, the right-sided derivative is defined at the discontinuity, but the left-sided derivative is not. Similarly, the graph on the right shows an example in which the left-sided derivative is defined at the discontinuity, but the right-sided derivative is not.

Example 5.3. Let f be the function defined by

$$f(x) = \begin{cases} x^2 & \text{if } x \le 1, \\ x^6 & \text{if } x > 1. \end{cases}$$

Determine whether f is differentiable at x = 1.

Solution. First, we shall check to see whether $f$ is continuous at $x = 1$. If it is not, then we can immediately conclude (by Fact 5.1) that it is not differentiable. Now, $f(1) = 1^2 = 1$ and taking the left- and right-sided limits gives

$$\lim_{x\to 1^-} f(x) = \lim_{x\to 1^-} x^2 = 1 \quad\text{and}\quad \lim_{x\to 1^+} f(x) = \lim_{x\to 1^+} x^6 = 1.$$
Therefore, $f$ is continuous at $x = 1$, so we must check the left- and right-sided derivatives. Using the definitions directly yields

$$\lim_{x\to 1^-}\frac{f(x) - f(1)}{x - 1} = \lim_{x\to 1^-}\frac{x^2 - 1}{x - 1} = \lim_{x\to 1^-}(x + 1) = 2,$$
$$\lim_{x\to 1^+}\frac{f(x) - f(1)}{x - 1} = \lim_{x\to 1^+}\frac{x^6 - 1}{x - 1} = \lim_{x\to 1^+}(x^5 + x^4 + x^3 + x^2 + x + 1) = 6.$$
Alternatively, we can realise that the derivative of $f$ satisfies

$$f'(x) = \begin{cases} 2x & \text{if } x < 1, \\ 6x^5 & \text{if } x > 1 \end{cases}$$

(we must exclude x = 1 — remember that we are trying to determine if f is differentiable at 1). Taking limits now effortlessly gives

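$$\lim_{x\to 1^-} f'(x) = \lim_{x\to 1^-}(2x) = 2 \quad\text{and}\quad \lim_{x\to 1^+} f'(x) = \lim_{x\to 1^+}(6x^5) = 6,$$
again. (This shortcut is valid here because $f$ is continuous at $x = 1$; for a function with a jump at the point, the limits of $f'$ need not equal the one-sided derivatives.) However the computation is performed, it shows that the left- and right-sided derivatives of $f$ are not equal at $x = 1$. Thus, $f$ is not differentiable there.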
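Both one-sided derivatives in Example 5.3 can also be estimated straight from their definitions. The Python sketch below (the helper name `one_sided_quotient` is our own) evaluates the difference quotient with small steps on each side of $x = 1$; the left-hand values should approach 2 and the right-hand values 6. Finite steps only suggest the limits, which is why the exact computation above is still needed.

```python
def f(x):
    """The piecewise function of Example 5.3."""
    return x**2 if x <= 1 else x**6


def one_sided_quotient(f, c, h):
    """Difference quotient (f(c + h) - f(c)) / h; h > 0 probes the right, h < 0 the left."""
    return (f(c + h) - f(c)) / h


for h in [1e-2, 1e-4, 1e-6]:
    left = one_sided_quotient(f, 1.0, -h)
    right = one_sided_quotient(f, 1.0, h)
    print(f"h = {h:.0e}:  left quotient = {left:.6f},  right quotient = {right:.6f}")
```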

5.4 Differentiation Rules

If $f$ and $g$ are functions which are both differentiable at $x$, then the functions $f + g$, $f - g$, $kf$ (with $k$ a constant), $fg$ and $f/g$ (with $g(x) \ne 0$) are all differentiable at $x$. Moreover, we have the following rules to compute the derivatives of these functions:

• Sums, differences and constant multipliers:

$$\frac{d}{dx}\bigl(f(x) + g(x)\bigr) = \frac{df(x)}{dx} + \frac{dg(x)}{dx} = f'(x) + g'(x),$$
$$\frac{d}{dx}\bigl(f(x) - g(x)\bigr) = \frac{df(x)}{dx} - \frac{dg(x)}{dx} = f'(x) - g'(x),$$
$$\frac{d}{dx}\bigl(kf(x)\bigr) = k\,\frac{df(x)}{dx} = kf'(x).$$

• Product Rule:
$$\frac{d}{dx}\bigl(f(x)g(x)\bigr) = \frac{df(x)}{dx}\,g(x) + f(x)\,\frac{dg(x)}{dx} = f'(x)g(x) + f(x)g'(x).$$

• Quotient Rule:
$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{\dfrac{df(x)}{dx}\,g(x) - f(x)\,\dfrac{dg(x)}{dx}}{g(x)^2} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$

We repeat the obvious proviso that the quotient rule requires that $g(x) \ne 0$. An important special case of this rule occurs when $f(x) = 1$, in which case it becomes
$$\frac{d}{dx}\left(\frac{1}{g(x)}\right) = -\frac{1}{g(x)^2}\,\frac{dg(x)}{dx} = -\frac{g'(x)}{g(x)^2}.$$
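One way to see that the sum, difference, product and quotient rules are all you need for rational functions is to build them into arithmetic itself. The Python sketch below uses dual numbers, a technique from automatic differentiation that is not part of these notes: each number carries a value and a derivative, and each arithmetic operator applies the corresponding rule above. The class `Dual` and the function `derivative` are our own hypothetical names, and only $+$, $-$, $\times$ and $\div$ are supported.

```python
class Dual:
    """A pair (val, der) standing for val + der*eps, with eps**2 = 0.

    Each arithmetic operator applies one of the differentiation rules above.
    """

    def __init__(self, val, der=0.0):
        self.val = val  # the function value
        self.der = der  # the derivative value

    @staticmethod
    def lift(other):
        """Treat a plain number as a constant, whose derivative is 0."""
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):  # sum rule
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):  # difference rule
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):  # product rule (k*f is the case where k has derivative 0)
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):  # quotient rule; requires g(x) != 0
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / o.val ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def derivative(func, x):
    """Return func'(x) by evaluating func at the dual number x + 1*eps."""
    return func(Dual(x, 1.0)).der


# h(x) = (x^2 + 1)/(x - 3); by the quotient rule, h'(1) = (2*(-2) - 2*1)/(-2)^2 = -1.5.
print(derivative(lambda x: (x * x + 1) / (x - 3), 1.0))  # prints -1.5
```

Because every operation applies an exact rule rather than a limiting process, the result has no truncation error, unlike a difference quotient.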

Example 5.4. Find the derivative of $x^2\sin x$.

Solution. Applying the product rule, we get
$$\frac{d}{dx}(x^2\sin x) = \frac{d}{dx}(x^2)\,\sin x + x^2\,\frac{d}{dx}(\sin x) = 2x\sin x + x^2\cos x.$$

Example 5.5. Find the derivative of $\csc x$.

Solution. Here, we use the special case of the quotient rule given above:
$$\frac{d}{dx}\csc x = \frac{d}{dx}\left(\frac{1}{\sin x}\right) = -\frac{1}{\sin^2 x}\,\frac{d}{dx}(\sin x) = -\frac{\cos x}{\sin^2 x} = -\csc x\cot x.$$

5.5 Chain Rule for Differentiating Compositions

Suppose that y is a function of u — y = f (u) — and u is, in turn, a function of x — u = g(x). Then, y is a function of x. In fact, it is a composition of f and g:

$$y = f(g(x)) = (f\circ g)(x).$$
If, further, $g$ is differentiable at $x$ and $f$ is differentiable at $u = g(x)$, then $f\circ g$ is differentiable at $x$ and we have the following rule.

• Chain Rule:
$$\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}.$$

With “prime notation”, the chain rule takes the alternative form

$$(f\circ g)'(x) = f'(g(x))\,g'(x).$$

Example 5.6. Let $f(x) = \sin x$ and $g(x) = x^2$. Find the derivative of $f\circ g$.

Solution. The composition here is $f(g(x)) = \sin(x^2)$. Since $f'(x) = \cos x$ and $g'(x) = 2x$, the chain rule gives

$$(f\circ g)'(x) = f'(g(x))\,g'(x) = \cos\bigl(g(x)\bigr)\,g'(x) = 2x\cos(x^2).$$

Figure 5.5: Finding the angle between two curves.

5.6 Some Simple Applications

One example where derivatives come into play is the calculation of the angle between two (smooth) curves, given by the graphs of (differentiable) functions $f_1$ and $f_2$, at a point of intersection. By “angle”, we of course mean the angle between the lines tangent to the two curves at the intersection point. Referring to Figure 5.5, we are trying to find the angle $\alpha$. If $\alpha_1$ and $\alpha_2$ denote the (anticlockwise) angles that the tangent lines to $f_1$ and $f_2$ make with the positive $x$-axis, respectively, then the slopes $m_1$ and $m_2$ of these tangent lines are just $\tan\alpha_1$ and $\tan\alpha_2$, respectively. Since $\alpha = \alpha_2 - \alpha_1$, the subtraction formula for the tangent gives

$$\tan\alpha = \tan(\alpha_2 - \alpha_1) = \frac{\tan\alpha_2 - \tan\alpha_1}{1 + \tan\alpha_1\tan\alpha_2} = \frac{m_2 - m_1}{1 + m_1 m_2},$$
so we can obtain the angle of intersection from the derivatives of the functions at the intersection of their graphs. (This requires $m_1 m_2 \ne -1$; when $m_1 m_2 = -1$, the tangent lines are perpendicular.)

Example 5.7. Let $f_1(x) = x^2$ and $f_2(x) = x^3$. Find their points of intersection and the angle between their graphs at each intersection point.

Solution. These two curves intersect when $x^2 = x^3$, that is, at $x = 0$ and $x = 1$. The intersection points of the graphs of $y = x^2$ and $y = x^3$ are therefore $(0,0)$ and $(1,1)$. At $(0,0)$, we compute that the slopes of the two tangent lines are

$$m_1 = f_1'(0) = 0 \quad\text{and}\quad m_2 = f_2'(0) = 0,$$
yielding
$$\tan\alpha = \frac{m_2 - m_1}{1 + m_1 m_2} = 0.$$
The angle between $y = x^2$ and $y = x^3$ at $(0,0)$ is therefore 0.

At (1,1) we have instead

$$m_1 = f_1'(1) = 2 \quad\text{and}\quad m_2 = f_2'(1) = 3,$$
yielding
$$\tan\alpha = \frac{m_2 - m_1}{1 + m_1 m_2} = \frac{1}{7}.$$
Using the inverse tan function, it follows that the angle between $y = x^2$ and $y = x^3$ at $(1,1)$ is approximately 0.14 radians or, if you prefer, approximately 8 degrees.

Another example where derivatives are used is when calculating the equation of a line tangent to a curve. If the curve is represented as the graph of $y = f(x)$ and we want the equation of the tangent to this graph at $x = c$, we use the fact that the slope of the tangent line is the derivative $f'(c)$. The equation of the tangent line to $f$ at $c$ is therefore

$$y = f'(c)(x - c) + f(c).$$

Example 5.8. Determine the equation of the tangent line to $f(x) = x^2$ at $x = 2$.

Solution. Since $f'(x) = 2x$, we obtain $f'(2) = 4$. Because $f(2) = 4$ as well, the equation of the tangent to $y = x^2$ at $x = 2$ is

$$y = 4(x - 2) + 4 \quad\Longrightarrow\quad y = 4x - 4.$$

5.7 Differentials, ∆-notation and Linear Approximation

If we have some fixed point x in mind, we often write ∆x, pronounced “delta x”, to indicate a small change in x. In particular, x + ∆x will be a point that is close to x. If f is a function of x, then we use ∆ f to denote the change in the value of f when x is changed by ∆x:

$$\Delta f = f(x + \Delta x) - f(x).$$
Using this notation, we may rewrite the definition of a derivative as

$$f'(x) = \lim_{\Delta x\to 0}\frac{f(x + \Delta x) - f(x)}{(x + \Delta x) - x} = \lim_{\Delta x\to 0}\frac{\Delta f}{\Delta x}.$$
Looking at this, we see the familiar “rise over run” description of the slope of a tangent line: $\Delta f$ is the small rise and $\Delta x$ is the small run. This is what was being used in the applications of the previous section. We also see the origin of one of the popular notations for the derivative:
$$\frac{df}{dx} = \lim_{\Delta x\to 0}\frac{\Delta f}{\Delta x}.$$
We now use a similar notation to define what is sometimes called a differential. The differential $dx$ is just the same thing as $\Delta x$. However, the differential $df$ of a differentiable function is defined somewhat differently from $\Delta f$. Instead, we set

$$df = f'(x)\,dx.$$

Figure 5.6: A demonstration of linear approximation. In this case, we can see that as we get further away from the point $x = \tfrac{1}{2}$, the quality of the approximation gets worse.

One of the main applications of differentials is to estimating function values using a method known as linear approximation. This relies on the fact that the differentiability of $f$ guarantees that $\Delta f \approx df$ when $\Delta x = dx$ is taken sufficiently small, in the sense that $(\Delta f - df)/\Delta x \to 0$ as $\Delta x \to 0$. It follows that
$$f(x + \Delta x) = f(x) + \Delta f \approx f(x) + df = f(x) + f'(x)\,dx = f(x) + f'(x)\,\Delta x.$$
The interpretation of this is that if $f$ is a differentiable function whose value and derivative are known at some point $x$, then we may obtain good approximations for the values of $f$ at points sufficiently close to $x$. The name “linear approximation” comes from the method of replacing $f$ by a linear function of $\Delta x$ (see Figure 5.6).

Example 5.9. Newton’s law of gravitation states that the force $F$ of attraction between two particles is given by
$$F = \frac{k}{r^2},$$
where $k$ is a constant and $r$ is the distance between the two particles. Assuming the value $r = 20$ cm, use linear approximation to approximate the change in $r$ that will increase the force $F$ by 10%. In other words, find out by how much we need to decrease the distance between the two particles to get this increase in force.

Solution. When we change $r$ from 20 to $20 + \Delta r$, how does $F$ change? Using the method of linear approximation, we have an approximate answer:

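$$\Delta F = F(20 + \Delta r) - F(20) \approx F'(20)\,\Delta r.$$
Now, $F'(r) = -2k/r^3$ and so
$$\Delta F \approx -\frac{2k}{20^3}\,\Delta r.$$
To increase $F$ by 10%, we require $\Delta F/F(20) = 0.1$, which yields
$$0.1 = \frac{\Delta F}{F(20)} \approx \frac{-2k\,\Delta r/20^3}{k/20^2} = -\frac{\Delta r}{10}.$$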

Solving this gives $\Delta r = -1$. This means that we should bring the particles 1 cm closer in order to have the force increase (approximately) by 10%.

Another application of linear approximation, in finance, is to what is known as marginal analysis. If one wants to make good business decisions, it is necessary to know what will happen to production costs, revenues and profits when certain changes are introduced. For instance, it is important to be able to estimate whether the rate of change of profits with respect to an increase in production is positive or negative. The important quantities for this analysis are $R(x)$, the revenue for producing (and selling!) $x$ units, $C(x)$, the cost of producing $x$ units, and $P(x) = R(x) - C(x)$, the profit made on $x$ units. In economics, the word marginal refers to an instantaneous rate of change, that is, to a derivative. Hence, the marginal profit for producing additional units is defined to be $P'(x)$ and the marginal revenue and cost are just $R'(x)$ and $C'(x)$, respectively.

We shall now use linear approximation to approximate the change in production cost, revenue and profit when we increase the production by a single unit. Let's start with the change in cost:
$$\Delta C = C(x + \Delta x) - C(x) \approx \frac{dC(x)}{dx}\,\Delta x = C'(x),$$
since in our case $\Delta x = 1$. Thus, the marginal cost $C'(x)$ is the approximate cost of the next unit. The same argument now gives $\Delta R \approx R'(x)$ and $\Delta P \approx P'(x)$.

Example 5.10. Suppose that the cost function (in dollars) to produce $x$ barrels of a certain beverage is modelled by

$$C(x) = 0.001x^3 - 0.3x^2 + 50x + 100.$$
Find the marginal cost of producing one additional barrel when $x = 100$.

Solution. We compute the derivative of C(x):

$$C'(x) = 0.003x^2 - 0.6x + 50.$$

The marginal cost of the next barrel when x = 100 is therefore C′(100)= 20. Let’s compare this approximate value with the actual value, given by

$$C(101) - C(100) = 3120.001 - 3100 = 20.001.$$
In this case, the marginal cost approximates the actual cost to one-tenth of a cent! This accuracy is to be expected because 1 is significantly smaller than 100 and we are in the region in which the marginal cost function is changing very slowly (see Figure 5.7); indeed, $C''(x) = 0.006x - 0.6$ vanishes at $x = 100$. The behaviour illustrated in this figure is very typical of cost functions, since unit costs normally decrease to start with, but then increase again with production: one may have to hire more people, buy more manufacturing machines, and so on.
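Examples 5.9 and 5.10 can both be checked by comparing the linear estimate $f'(x)\,\Delta x$ with the exact change $f(x + \Delta x) - f(x)$. In the Python sketch below, `linear_change` is a hypothetical helper name, and for Example 5.9 we assume $k = 1$, which is harmless because $k$ cancels from the percentage change.

```python
def linear_change(fprime, x, dx):
    """Linear-approximation estimate f'(x)*dx of the change f(x + dx) - f(x)."""
    return fprime(x) * dx


# Example 5.10: cost (in dollars) of producing x barrels, and the marginal cost.
def C(x):
    return 0.001 * x**3 - 0.3 * x**2 + 50 * x + 100


def C_prime(x):
    return 0.003 * x**2 - 0.6 * x + 50


print("marginal cost C'(100): ", linear_change(C_prime, 100, 1))
print("actual C(101) - C(100):", C(101) - C(100))


# Example 5.9 with k = 1 (k cancels from the percentage change): F(r) = 1/r^2.
def F(r):
    return 1 / r**2


def F_prime(r):
    return -2 / r**3


print("estimated % change in F:", 100 * linear_change(F_prime, 20, -1) / F(20))
print("actual % change in F:   ", 100 * (F(19) - F(20)) / F(20))
```

With these choices the cost estimate matches the exact change to within a tenth of a cent, while bringing the particles 1 cm closer actually raises $F$ by about 10.8% rather than 10%. A plausible reason is that $\Delta r = -1$ is 5% of $r = 20$, whereas $\Delta x = 1$ is only 1% of $x = 100$.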

5.8 Implicit Differentiation

Up to now, the functions we have considered have always been of the form $y = f(x)$. This means that we have been implicitly assuming that we are always able to untangle the independent ($x$) and dependent ($y$) variables.

Figure 5.7: The marginal cost function computed in Example 5.10.

But we're not always this lucky. What if we want to differentiate curves for which it is not so easy to untangle $x$ and $y$? A typical example is the curve
$$x\sin y + y\cos x = \frac{\sqrt{2}\,\pi}{4}. \tag{5.2}$$
Here, the trick is to realise that $y$ may actually be thought of as a function of $x$, $y = y(x)$, and so we can apply the chain, product and quotient rules that we have just introduced. Of course, there is nothing stopping us from declaring instead that $x$ is a function of $y$ and reaching the same conclusion. Sometimes, this latter approach will be easier and sometimes it will not. To see the trick in action, here are the (respective) results of differentiating (5.2) above with respect to $x$ and with respect to $y$:
$$\sin y + x\cos y\,\frac{dy}{dx} + \cos x\,\frac{dy}{dx} - y\sin x = 0, \qquad \frac{dx}{dy}\,\sin y + x\cos y + \cos x - y\sin x\,\frac{dx}{dy} = 0.$$
Notice that we have used the product and chain rules several times each. For example,
$$\frac{d}{dx}(x\sin y) = \frac{d}{dx}(x)\,\sin y + x\,\frac{d}{dx}(\sin y) = \sin y + x\,\frac{d}{dy}(\sin y)\,\frac{dy}{dx} = \sin y + x\cos y\,\frac{dy}{dx}.$$
This technique is known as implicit differentiation, as compared to the explicit differentiation that we have been doing so far (with $y = f(x)$). The following examples demonstrate some situations for which implicit differentiation proves to be quite convenient.

Example 5.11. Consider the curve defined by $y^2 = x$. Find $\dfrac{dy}{dx}$ using explicit differentiation.

Solution. To differentiate explicitly, we need to get the curve into the form $y = f(x)$. We therefore split the curve up into two functions as follows:

$$y_1(x) = \sqrt{x} \quad (x \ge 0), \qquad y_2(x) = -\sqrt{x} \quad (x \ge 0).$$
Note that $y_1 \ge 0$ whereas $y_2 \le 0$. Differentiating and writing in terms of $y_1$ and $y_2$, we obtain

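$$\frac{dy_1}{dx} = \frac{1}{2\sqrt{x}} = \frac{1}{2y_1} \quad (y_1 > 0), \qquad \frac{dy_2}{dx} = -\frac{1}{2\sqrt{x}} = \frac{1}{2y_2} \quad (y_2 < 0).$$
Combining these two cases then finally gives us our answer:
$$\frac{dy}{dx} = \frac{1}{2y} \quad (y \ne 0).$$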

Example 5.12. Consider again the curve $y^2 = x$. This time, find $\dfrac{dy}{dx}$ using implicit differentiation.

Solution. Differentiating both sides implicitly with respect to $x$ immediately gives
$$\frac{d}{dx}(y^2) = \frac{d}{dx}(x) \;\Rightarrow\; 2y\,\frac{dy}{dx} = 1 \;\Rightarrow\; \frac{dy}{dx} = \frac{1}{2y} \quad (y \ne 0).$$
Looking at these two examples, we see that even when it is possible to split a curve into two or more functions in order to apply explicit differentiation, it is much more efficient to use implicit differentiation instead.

In fact, the curve $y^2 = x$ can be differentiated in a third manner. This curve is in the form $x = f(y)$, so we can apply explicit differentiation with respect to $y$ and get
$$\frac{dx}{dy} = 2y.$$
Now, note that if we replace $x$ by $y$ in the chain rule
$$\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx},$$
we obtain
$$1 = \frac{dy}{dy} = \frac{dy}{du}\,\frac{du}{dy} \quad\Longrightarrow\quad \frac{dy}{du} = \left(\frac{du}{dy}\right)^{-1}.$$
Applying this to our example (with $u = x$), we again arrive at

$$\frac{dy}{dx} = \left(\frac{dx}{dy}\right)^{-1} = \frac{1}{2y}.$$

Example 5.13. Find $\dfrac{dy}{dx}$ when $(x,y)$ lies on the circle $x^2 + y^2 = R^2$.

Solution. We implicitly differentiate the equation defining the circle with respect to $x$ to get
$$2x + 2y\,\frac{dy}{dx} = 0.$$
Solving therefore gives
$$\frac{dy}{dx} = -\frac{x}{y} \quad (y \ne 0).$$

Example 5.14. Show that the two curves (a pair of hyperbolae)

$$x^2 - y^2 = a \quad\text{and}\quad xy = b,$$
where $a$ and $b$ are non-zero constants, are orthogonal at their points of intersection.

Solution. We need to find the gradient of each curve at their intersection points. Differentiating implicitly, we get
$$2x - 2y\,\frac{dy}{dx} = 0 \;\Rightarrow\; \frac{dy}{dx} = \frac{x}{y} \quad (y \ne 0)$$
for the first, and
$$y + x\,\frac{dy}{dx} = 0 \;\Rightarrow\; \frac{dy}{dx} = -\frac{y}{x} \quad (x \ne 0)$$
for the second.

Figure 5.8: The two families of hyperbolae analysed in Example 5.14 intersect each other orthogonally.

Now, if $(x_0, y_0)$ is any intersection point for these two curves, then the gradient of the line tangent at $(x_0, y_0)$ to each curve is given by

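$$m_1 = \frac{x_0}{y_0} \quad\text{and}\quad m_2 = -\frac{y_0}{x_0},$$
respectively (both are defined, because $x_0 y_0 = b \ne 0$ forces $x_0 \ne 0$ and $y_0 \ne 0$). It therefore follows that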

$$m_1 m_2 = -1,$$
which is equivalent to the tangent lines being orthogonal (meeting at 90 degrees). We've drawn some examples of these hyperbolae in Figure 5.8 to illustrate that the curves always intersect orthogonally.
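The orthogonality in Example 5.14 can be cross-checked without implicit differentiation. The Python sketch below (the helper names and the sample values of $a$ and $b$ are our own assumptions) finds the intersection points by substituting $y = b/x$ into $x^2 - y^2 = a$, which gives $x^4 - ax^2 - b^2 = 0$, and then estimates each tangent slope with a symmetric difference quotient along an explicit local branch of each curve: $y = \pm\sqrt{x^2 - a}$ (sign matching $y_0$) for the first hyperbola and $y = b/x$ for the second.

```python
import math


def intersection_points(a, b):
    """Intersections of x^2 - y^2 = a and xy = b, for nonzero a and b.

    Substituting y = b/x gives x^4 - a*x^2 - b^2 = 0, whose only
    positive solution for x^2 is (a + sqrt(a^2 + 4b^2)) / 2.
    """
    x = math.sqrt((a + math.sqrt(a * a + 4 * b * b)) / 2)
    return [(x, b / x), (-x, -b / x)]


def slope(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def tangent_slopes(a, b, x0, y0):
    """Numerical slopes of both curves at (x0, y0), using explicit local branches."""
    branch1 = lambda x: math.copysign(math.sqrt(x * x - a), y0)  # x^2 - y^2 = a near (x0, y0)
    branch2 = lambda x: b / x                                    # xy = b
    return slope(branch1, x0), slope(branch2, x0)


for a, b in [(1.0, 2.0), (-3.0, 0.5), (2.0, -1.0)]:  # arbitrary sample constants
    for x0, y0 in intersection_points(a, b):
        m1, m2 = tangent_slopes(a, b, x0, y0)
        print(f"a={a:5.1f} b={b:5.1f} point=({x0:+.4f}, {y0:+.4f}) "
              f"m1={m1:+.6f} (x0/y0={x0 / y0:+.6f})  m1*m2={m1 * m2:+.8f}")
```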

5.9 Some Important Theorems

We now turn to some more theoretical results. The first should be pretty familiar. It specifies some important conditions under which a point can be guaranteed to be critical. Recall that a critical point of a function $f$ is a point $c$ at which the derivative vanishes or is undefined.

Fact 5.2. Suppose that $f$ is a function satisfying:

• $f$ is defined on $(a,b)$,

• $f$ has a maximum or minimum value at $c \in (a,b)$, and

• $f'(c)$ exists.

Then, c is a critical point for f with f ′(c)= 0.

Proof. Suppose that $f$ has a maximum at $c$, so that $f(x) - f(c) \le 0$ for all $x$ in $(a,b)$. If we choose $x > c$, then we see that
$$\frac{f(x) - f(c)}{x - c} \le 0 \;\Rightarrow\; \lim_{x\to c^+}\frac{f(x) - f(c)}{x - c} \le 0.$$
However, if we choose $x < c$, then we obtain instead
$$\frac{f(x) - f(c)}{x - c} \ge 0 \;\Rightarrow\; \lim_{x\to c^-}\frac{f(x) - f(c)}{x - c} \ge 0.$$
But $f'(c)$ exists, so both these limits are equal to $f'(c)$, and the only consistent possibility is $f'(c) = 0$. A similar argument deals with the case when $f$ has a minimum value at $c$, and the proof is complete.

The second result formalises the intuitively obvious fact that on an interval, any “nice” function which takes the same value at both endpoints must have a critical point in between.

Fact 5.3 (Rolle’s Theorem). Suppose that $f$ is a function satisfying:

• f is continuous on [a,b],

• f is differentiable on (a,b), and

• f (a)= f (b).

Then, there exists some $c$ in $(a,b)$ for which $f'(c) = 0$.

Proof. We first deal with the special case in which $f(x) = f(a) = f(b)$ for every $x$ in $(a,b)$. Then, $f$ is a constant function, so $f'(c) = 0$ for all $c$ in $(a,b)$ and, for this case, the theorem is proved. Otherwise, there is a point $c'$ in $(a,b)$ for which $f(c') \ne f(a) = f(b)$. If $f(c') > f(a)$, then the Min-Max theorem of Section 4.6 tells us that there is a point $c$ in $[a,b]$ at which $f$ takes its maximal value. But $c$ cannot be $a$ or $b$, as the maximal value is at least $f(c') > f(a) = f(b)$. Therefore, $f$ takes its maximal value at some $c$ in $(a,b)$. Alternatively, we could have $f(c') < f(a)$, in which case the same argument tells us that $f$ takes its minimal value at some $c$ in $(a,b)$. Either way, the differentiability of $f$ on $(a,b)$ tells us that $f'(c)$ exists, hence that we may apply the previous fact. It tells us that $f'(c) = 0$, as required.

Example 5.15. Verify Rolle’s Theorem for the function $f(x) = \sin x$ on the interval $[0,\pi]$.

Solution. Since $f$ is both continuous and differentiable everywhere and satisfies $f(0) = f(\pi) = 0$, the hypotheses of Rolle’s Theorem are satisfied. We should therefore be able to find at least one $c$ in $(0,\pi)$ for which $f'(c) = 0$. And indeed, $f'(x) = \cos x$ is zero for $x = \tfrac{1}{2}\pi$. This verifies Rolle’s Theorem.

Our third result is somewhat more technical but is, perhaps surprisingly, significantly more important for the theoretical development of calculus than the previous two. It tells us that for any “nice” function $f$ on an interval $[a,b]$, there is a point $c$ where the line tangent to the graph $y = f(x)$ has the same slope as the line joining $(a, f(a))$ and $(b, f(b))$. This is illustrated in Figure 5.9.

Fact 5.4 (The Mean Value Theorem). Suppose that $f$ is a function satisfying:

Figure 5.9: An illustration of the Mean Value Theorem: At the point x = c, the tangent to the curve is parallel to the line passing through the endpoints of the graph.

• $f$ is continuous on $[a,b]$ and

• $f$ is differentiable on $(a,b)$.

Then, there exists some $c$ in $(a,b)$ for which
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof. The equation of the line passing through $(a, f(a))$ and $(b, f(b))$ is
$$\frac{y - f(a)}{x - a} = \frac{f(b) - f(a)}{b - a} \;\Rightarrow\; y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a).$$
We use $f$ to define a new function $g$ by
$$g(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).$$
This new function has the property that $g(a) = g(b) = 0$. Furthermore, $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$, as $f$ is. Hence, we may apply Rolle’s Theorem to $g$ and conclude that there exists some $c$ in $(a,b)$ for which $g'(c) = 0$. But
$$g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a},$$
so $g'(c) = 0$ implies that
$$f'(c) = \frac{f(b) - f(a)}{b - a},$$
as required.

Example 5.16. Consider the function $f(x) = x^{2/3}$ on the interval $[-1,1]$. Show that the Mean Value Theorem cannot be applied to this case.

Solution. The function is certainly continuous on the interval $[-1,1]$, but its derivative $f'(x) = \tfrac{2}{3}x^{-1/3}$ is undefined at $x = 0$; that is, $f$ is not differentiable there. Because the function is not differentiable at every point of $(-1,1)$, the Mean Value Theorem cannot be applied. Indeed, because $f(-1) = f(1) = 1$, the Mean Value Theorem would, if it could be applied, guarantee a $c$ in $(-1,1)$ with $f'(c) = 0$. However, it is easy to see that the derivative never takes this value!
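When the hypotheses do hold, the point $c$ promised by the Mean Value Theorem can often be located numerically. The Python sketch below (`mvt_point` is a hypothetical helper) applies bisection to $f'(x)$ minus the secant slope, taking $f(x) = x^3$ on $[0,2]$ as an illustrative choice: the secant slope is $(8 - 0)/2 = 4$, and $3c^2 = 4$ gives $c = 2/\sqrt{3}$ exactly. Note the extra assumption: bisection needs $f'$ minus the slope to change sign on $[a,b]$, which the theorem itself does not guarantee.

```python
def mvt_point(fprime, a, b, slope, tol=1e-12):
    """Bisection for a point c in (a, b) with fprime(c) = slope.

    Assumes fprime(x) - slope has opposite signs at a and b; the Mean
    Value Theorem promises that some c exists, not that bisection finds it.
    """
    lo, hi = a, b
    g_lo = fprime(lo) - slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = fprime(mid) - slope
        if (g_mid < 0) == (g_lo < 0):  # g_mid on the same side of 0 as g_lo: root lies to the right
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2


def f(x):
    return x**3


def fprime(x):
    return 3 * x**2


a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)  # (8 - 0) / 2 = 4
c = mvt_point(fprime, a, b, secant_slope)
print(c, 2 / 3**0.5)                     # numerical c versus the exact 2/sqrt(3)
```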

Figure 5.10: The polynomial $f(x) = x^5 + 4x^3 + 2x + 3$ has only one real root.

Example 5.17. Use the Mean Value Theorem to prove that $f(x) = x^5 + 4x^3 + 2x + 3$ has no more than one real root.

Solution. We are looking for real solutions to $f(x) = 0$. Let us assume that there are (at least) two different real roots, calling them $a$ and $b$ with $a < b$. Since $f$ is differentiable everywhere, hence continuous everywhere, and $f(a) = f(b) = 0$, we may apply the Mean Value Theorem. It tells us that there must be some $c$ in $(a,b)$ such that $f'(c) = \bigl(f(b) - f(a)\bigr)/(b - a) = 0$. But $f'(x) = 5x^4 + 12x^2 + 2 \ge 2$, so there cannot be such a $c$. This is a contradiction, and the only way out is to reject our original assumption that there are (at least) two different real roots. This proves that $f$ has no more than one real root (see Figure 5.10). [In fact, it is not hard to use the Intermediate Value Theorem to prove that there is precisely one root; for instance, $f(-1) = -4 < 0 < 3 = f(0)$.]
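The closing remark of Example 5.17 can be made concrete. The Python sketch below (`bisect_root` is our own helper name) confirms the sign change $f(-1) = -4 < 0 < 3 = f(0)$, uses bisection to home in on the root that the Intermediate Value Theorem guarantees in $(-1,0)$, and spot-checks the bound $f'(x) \ge 2$ on a grid. The grid check is only a sanity test; the bound itself holds because every term of $f'$ other than the constant 2 is non-negative.

```python
def f(x):
    return x**5 + 4 * x**3 + 2 * x + 3


def fprime(x):
    return 5 * x**4 + 12 * x**2 + 2


def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection; assumes f(lo) and f(hi) have opposite signs (as in the IVT)."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f(-1), f(0))  # a sign change, so a root lies in (-1, 0)
root = bisect_root(f, -1.0, 0.0)
print(root, f(root))
# Spot-check f'(x) >= 2 on a grid over [-5, 5] (not a proof); the minimum occurs at x = 0.
print(min(fprime(k / 100) for k in range(-500, 501)))
```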

Example 5.18. Suppose that you are driving your car past point A at 50km/h at 5pm and, 15km further down the road, you pass point B travelling at 55km/h at 5:12pm (see Figure 5.11). Your speed has only been recorded at points A and B, but you have been issued a traffic speeding ticket for exceeding the speed limit of 60km/h. Why?

Solution. According to the Mean Value Theorem, there must have been a time tC between tA and tB such that your speed was

$$v(t_C) = \left.\frac{dx}{dt}\right|_{t = t_C} = \frac{x(t_B) - x(t_A)}{t_B - t_A} = \frac{15\ \text{km}}{12\ \text{min}} = 1.25\ \text{km/min} = 75\ \text{km/h}.$$
That's why you were fined!

Finally, we present a generalisation of the Mean Value Theorem which is sometimes referred to as Cauchy’s Theorem.

Fact 5.5 (The Generalised Mean Value Theorem). Suppose that $f$ and $g$ are functions satisfying:

• $f$ and $g$ are both continuous on $[a,b]$,

• $f$ and $g$ are both differentiable on $(a,b)$, and

• $g'(x) \ne 0$ whenever $a < x < b$.

Then, there exists some $c$ in $(a,b)$ for which
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.$$

Figure 5.11: The setup for the speeding ticket example.

Proof. The trick to proving this theorem is the construction of a suitable auxiliary function h which mimics the function g that was constructed in the proof of the Mean Value Theorem. The correct choice is to define

$$h(x) = \bigl(f(b) - f(a)\bigr)\bigl(g(x) - g(a)\bigr) - \bigl(g(b) - g(a)\bigr)\bigl(f(x) - f(a)\bigr).$$
Then, $h$ is continuous on $[a,b]$ and differentiable on $(a,b)$, because $f$ and $g$ are, and satisfies $h(a) = h(b) = 0$. We may therefore apply Rolle’s Theorem to conclude that $h'(c) = 0$ for some $c$ in $(a,b)$. But this means that

$$\bigl(f(b) - f(a)\bigr)g'(c) = \bigl(g(b) - g(a)\bigr)f'(c) \quad\Longrightarrow\quad \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)},$$
as required.

Notice that we did not have to assume that $g(a) \ne g(b)$ in the hypotheses of Cauchy’s Theorem, even though the conclusion involves dividing by $g(b) - g(a)$. This is because $g(a) = g(b)$, together with the continuity and differentiability of $g$, would imply (by Rolle’s Theorem) that $g'(c) = 0$ for some $c$ between $a$ and $b$. The third hypothesis above therefore sneakily rules out the possibility that $g(a) = g(b)$.
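Cauchy’s Theorem can also be explored numerically. The Python sketch below (`cauchy_point` is a hypothetical helper) looks for a zero of $h'(x) = \bigl(f(b) - f(a)\bigr)g'(x) - \bigl(g(b) - g(a)\bigr)f'(x)$, the derivative of the auxiliary function in the proof, by bisection. As an illustrative choice we take $f = \sin$ and $g = \cos$ on $[0, \pi/2]$: then $g'(x) = -\sin x \ne 0$ on $(0, \pi/2)$, the left-hand ratio is $(1 - 0)/(0 - 1) = -1$, and $f'(c)/g'(c) = -\cot c = -1$ gives $c = \pi/4$. As before, bisection assumes that $h'$ changes sign on $[a,b]$.

```python
import math


def cauchy_point(f, g, fprime, gprime, a, b, tol=1e-12):
    """Bisection for c in (a, b) with h'(c) = 0, where
    h'(x) = (f(b) - f(a)) g'(x) - (g(b) - g(a)) f'(x)
    is the derivative of the auxiliary function h from the proof.
    Assumes h' has opposite signs at a and b.
    """
    df, dg = f(b) - f(a), g(b) - g(a)

    def h_prime(x):
        return df * gprime(x) - dg * fprime(x)

    lo, hi = a, b
    h_lo = h_prime(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        h_mid = h_prime(mid)
        if (h_mid < 0) == (h_lo < 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return (lo + hi) / 2


# f = sin, g = cos on [0, pi/2]; g'(x) = -sin x is nonzero on (0, pi/2).
c = cauchy_point(math.sin, math.cos, math.cos, lambda x: -math.sin(x), 0.0, math.pi / 2)
print(c, math.pi / 4)
```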

Chapter 6

Inverse Functions

6.1 Monotonic Functions

We start by revising the concept of monotonicity. As far as functions are concerned, monotonic means that the function is either increasing or decreasing. Unfortunately, there are two competing conventions for what these latter terms mean precisely. In particular, “$f$ is increasing” is taken to mean either

$$x < y \;\Rightarrow\; f(x) < f(y) \qquad\text{or}\qquad x \le y \;\Rightarrow\; f(x) \le f(y),$$
and similarly for “$f$ is decreasing”. The difference is annoyingly small, but can be crucial in some statements. Because of this, we will try to avoid these terms unless the convention used is irrelevant to the statement's truth. Instead, we will prefer the following terminology:

Definition. Let f be a function and x and y be two points in the domain of f . Then, f is said to be

• strictly increasing if x < y implies that f (x) < f (y),

• non-decreasing if $x \le y$ implies that $f(x) \le f(y)$,

• strictly decreasing if x < y implies that f (x) > f (y), and

• non-increasing if $x \le y$ implies that $f(x) \ge f(y)$.

The distinction between strictly increasing (decreasing) and non-decreasing (non-increasing) is illustrated in Figure 6.1. Please note that a non-decreasing function is not at all the same thing as a function which is not decreasing. The former never decreases, although it may be constant on some intervals, whereas the latter could have intervals in which it decreases, as long as it increases somewhere else.

Fact 6.1. Let f be a continuous function on [a,b] and differentiable on (a,b). Then:

• If f ′(x) > 0 on (a,b), then f is strictly increasing on [a,b].

• If f ′(x) < 0 on (a,b), then f is strictly decreasing on [a,b].

• If $f'(x) \ge 0$ on $(a,b)$, then $f$ is non-decreasing on $[a,b]$.

• If $f'(x) \le 0$ on $(a,b)$, then $f$ is non-increasing on $[a,b]$.

Figure 6.1: Function (a) is strictly increasing, (b) is strictly decreasing, (c) is non-decreasing and (d) is non-increasing.

Moreover, the continuity may be restricted to (a,b], [a,b) or (a,b) and the conclusion still holds with the increasing and decreasing likewise restricted.

Proof. Suppose that we have two points $x$ and $y$ in $[a,b]$ with $x < y$. Since $f$ is differentiable on $(x,y)$ and continuous on $[x,y]$, we may apply the Mean Value Theorem to conclude that there is a $c$ in $(x,y)$ for which
$$\frac{f(y) - f(x)}{y - x} = f'(c).$$
As $y - x > 0$, we see that $f(y) - f(x) = f'(c)(y - x)$ vanishes if and only if $f'(c) = 0$, and has the same sign as $f'(c)$ otherwise. All four conclusions therefore follow. Changing the interval of continuity changes the argument only negligibly.

6.2 One-to-One Functions and Inverses

The idea behind inverse functions can be illuminated by the following (rather trivial) example. We let
$$y = f(x) = ax + b \quad (\text{with } a \ne 0)$$
and ask if we can express $x$ in terms of $y$: $x = g(y)$. Of course, the answer is “yes!” and we can even write $g$ down explicitly:
$$x = g(y) = \frac{y - b}{a}.$$
The functions $f$ and $g$ are inverse to one another in the sense that

$$g(f(x)) = x \quad (\text{for all } x \text{ in } \mathbb{R}) \qquad\text{and}\qquad f(g(y)) = y \quad (\text{for all } y \text{ in } \mathbb{R}).$$

In other words, the compositions $f\circ g$ and $g\circ f$ are both just the identity function.

Figure 6.2: This is a one-to-one function.

In this case, we were able to solve uniquely for $x$ in terms of $y$. On the other hand, the function $f$ defined on $\mathbb{R}$ by $f(x) = x^2$ has no inverse, because solving $y = x^2$ for $x$ leads to $x = \pm\sqrt{y}$. Here, we don't get a unique solution for $x$ in terms of $y$. However, if we restrict the domain of $f(x) = x^2$ to the non-negative real numbers $[0,\infty)$, then we can solve $y = f(x)$ uniquely for $x$: the solution is just $x = +\sqrt{y}$. Now $f$ does have an inverse and it is given by $g(y) = \sqrt{y}$. The moral of the story is therefore that the existence of an inverse to a function $f$ depends strongly upon the domain that $f$ is given.

Definition. A function $f$ is said to be one-to-one (1-1) if different points of the domain are sent to different values by $f$:
$$x_1 \ne x_2 \;\Rightarrow\; f(x_1) \ne f(x_2).$$
If you graph a one-to-one function, you will find that each horizontal line $y = c$ cuts the graph at most once. This is illustrated in Figures 6.2 and 6.3.

Fact 6.2. If $f$ is a one-to-one function, then it has an inverse function $f^{-1}$ defined by

$$y = f^{-1}(x) \quad\text{if and only if}\quad x = f(y).$$

We would like to point out some unfortunate notational conventions. For trigonometric and hyperbolic functions, for example the cosine function $\cos$, it is important to realise that $\cos^2$ customarily refers to the square of the cosine function, but $\cos^{-1}$ refers to a certain inverse function (to be discussed in Section 6.5). To put it succinctly,

$$\cos^2 x = (\cos x)^2, \quad\text{but}\quad \cos^{-1} x \ne (\cos x)^{-1} = \frac{1}{\cos x}.$$
This is annoying, but it is much too late to try to do anything about it! Remember this, and don't confuse $\cos^{-1}$ with $1/\cos$.

Figure 6.3: This function is not one-to-one.

6.3 Properties of Inverse Functions

The most important property of the relationship between an invertible function $f$ and its inverse $f^{-1}$ is that
$$y = f^{-1}(x) \quad\text{and}\quad x = f(y)$$
are one and the same equation! Here are some more useful properties:

1. The domain of $f^{-1}$ is the range of $f$.
2. The range of $f^{-1}$ is the domain of $f$.
3. $f^{-1}(f(x)) = x$ for all $x$ in the domain of $f$.
4. $f(f^{-1}(y)) = y$ for all $y$ in the domain of $f^{-1}$.
5. $(f^{-1})^{-1}(x) = f(x)$ for all $x$ in the domain of $f$.
6. The graph of $y = f^{-1}(x)$ is the reflection of the graph of $y = f(x)$ about the line $y = x$.

Properties 3 and 4 say that the compositions $f^{-1}\circ f$ and $f\circ f^{-1}$ are both identity functions (on the domain of $f$ and on the domain of $f^{-1}$, respectively). Property 5 says that if $f$ has an inverse $f^{-1}$, then this inverse also has an inverse $(f^{-1})^{-1}$ and, in fact, $(f^{-1})^{-1} = f$. Property 6 is pictured in Figure 6.4.

Example 6.1. Show that $g(x) = \sqrt{2x + 1}$ is invertible and find its inverse.

Solution. Let's first check whether $g$ is one-to-one. The (natural) domain of $g$ consists of all $x$ for which $2x + 1 \ge 0$. That is, the domain is $[-\tfrac{1}{2}, \infty)$. If $x_1$ and $x_2$ are two points of this domain, then we have

$$g(x_1) = g(x_2) \;\Rightarrow\; \sqrt{2x_1 + 1} = \sqrt{2x_2 + 1} \;\Rightarrow\; 2x_1 + 1 = 2x_2 + 1 \;\Rightarrow\; x_1 = x_2,$$
from which we see that $g$ is one-to-one.

Figure 6.4: The graph of $y = f^{-1}(x)$ is the reflection of the graph $y = f(x)$ about $y = x$.

As $g$ is one-to-one, it has an inverse. In this case, the inverse function is found by solving $y = g(x)$ for $x$:

$$y = \sqrt{2x + 1} \;\Rightarrow\; y^2 = 2x + 1 \;\Rightarrow\; x = \frac{y^2 - 1}{2}.$$
Note that the domain of $g$ is $[-\tfrac{1}{2}, \infty)$ and its range is $[0, \infty)$, whilst the domain of $g^{-1}$ is $[0, \infty)$ (not its natural domain!) and its range is $[-\tfrac{1}{2}, \infty)$. The correct inverse function is then
$$g^{-1}(y) = \frac{y^2 - 1}{2}, \quad\text{restricted to the domain } y \ge 0.$$
By the way, it doesn't matter which letter you use to denote an element of the domain or the range of a function. Traditionally, we tend to use $x$ for the domain and $y$ for the range, but it is common to speak of a function $f(x)$ and its inverse $f^{-1}(x)$. For instance, in this example, it's perfectly correct to say that the inverse function of $g(x)$ is $g^{-1}(x) = \tfrac{1}{2}(x^2 - 1)$ with $x \ge 0$.

Example 6.2. Calculate the inverse of
$$f(x) = \frac{x - 2}{x + 2}$$
on its natural domain, showing first that the inverse exists.

Solution. First, note that the natural domain is the set consisting of all the real numbers except for $-2$. The domain is therefore the union of two intervals: $(-\infty, -2) \cup (-2, \infty)$. Next, observe that
$$f(x) = \frac{x + 2 - 4}{x + 2} = 1 - \frac{4}{x + 2} \;\Rightarrow\; f'(x) = \frac{4}{(x + 2)^2} > 0.$$

Figure 6.5: The graph of $g(x) = \sqrt{2x + 1}$ and its inverse.

The positivity of the derivative implies that $f$ is strictly increasing on the intervals $(-\infty, -2)$ and $(-2, \infty)$. This means that $f$ is one-to-one when restricted to each interval. To show that $f$ is one-to-one, we therefore only need to check that the range of values taken by $f(x)$ when $x$ is restricted to the first interval does not overlap with the range of values for $x$ in the second interval. Checking limits, we compute that $x$ in $(-\infty, -2)$ gives $f(x)$ in $(1, \infty)$ and $x$ in $(-2, \infty)$ gives $f(x)$ in $(-\infty, 1)$. $f$ is therefore one-to-one, hence it has an inverse. Calculating the inverse is comparatively easy:
$$y = \frac{x - 2}{x + 2} \;\Rightarrow\; xy + 2y = x - 2 \;\Rightarrow\; x(y - 1) = -2 - 2y \;\Rightarrow\; x = -\frac{2(y + 1)}{y - 1}.$$
This shows that
$$f^{-1}(x) = -\frac{2(x + 1)}{x - 1}.$$
The domain of $f^{-1}$ is the range of $f$: $(-\infty, 1) \cup (1, \infty)$.

We can check the answer given to this example by computing the composition $f^{-1} \circ f$ as follows:
$$f^{-1}(f(x)) = -\frac{2(f(x) + 1)}{f(x) - 1} = -2\,\frac{\frac{x - 2}{x + 2} + 1}{\frac{x - 2}{x + 2} - 1} = -2\,\frac{x - 2 + x + 2}{x - 2 - x - 2} = -2 \cdot \frac{2x}{-4} = x.$$
We leave the computation of $f \circ f^{-1}$ as an exercise.

The above example argues that a strictly increasing or decreasing function is always one-to-one, hence invertible, when restricted to any interval in its domain. However, one has to be very careful when concluding invertibility in this fashion. The function
$$f(x) = \begin{cases} x - 2 & \text{if } x > 0, \\ x + 2 & \text{if } x < 0, \end{cases}$$
has derivative $1$ at all points of its domain, hence is strictly increasing on the intervals $(-\infty, 0)$ and $(0, \infty)$. However, it is not a strictly increasing function and, in particular, it is not one-to-one. For example, we have $f(-1) = 1 = f(3)$.

Figure 6.6: The function $F(x) = x^2$ and its inverse on the restricted domain $x \geq 0$.

When a function $f$ fails to be one-to-one, it also fails to have an inverse. However, it is often true that we can get an inverse by suitably restricting the domain of $f$ to make it one-to-one. Indeed, we saw this with the example of restricting $f(x) = x^2$ to $x \geq 0$ (see Figure 6.6), for which the inverse is $f^{-1}(x) = \sqrt{x}$, a rather useful function. We will see more important examples of inverting functions that are not one-to-one in Sections 6.5 and 6.9.
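The computations in Example 6.2 and the warning about the piecewise function can be spot-checked numerically. The Python sketch below (the function names `f`, `f_inv` and `piecewise` are our own, chosen for illustration) evaluates $f^{-1}(f(x))$ and $f(f^{-1}(y))$ at a few arbitrary sample points taken from both intervals of each domain, and then exhibits two inputs at which the piecewise function takes the same value. A check like this can only catch mistakes; it does not replace the algebra.

```python
def f(x):
    """f(x) = (x - 2)/(x + 2) from Example 6.2, defined for x != -2."""
    return (x - 2) / (x + 2)

def f_inv(y):
    """The claimed inverse -2(y + 1)/(y - 1), defined for y != 1."""
    return -2 * (y + 1) / (y - 1)

def piecewise(x):
    """x - 2 for x > 0 and x + 2 for x < 0; x = 0 is not in the domain."""
    if x == 0:
        raise ValueError("x = 0 is not in the domain")
    return x - 2 if x > 0 else x + 2

# Sample points from both intervals of each domain.
xs = [-10.0, -3.0, -2.5, -1.5, 0.0, 4.0]
ys = [-5.0, 0.0, 0.5, 1.5, 3.0]

worst_left = max(abs(f_inv(f(x)) - x) for x in xs)    # f^{-1} o f should act as the identity
worst_right = max(abs(f(f_inv(y)) - y) for y in ys)   # f o f^{-1} should act as the identity
print("max |f^-1(f(x)) - x| =", worst_left)
print("max |f(f^-1(y)) - y| =", worst_right)

# Derivative 1 on each piece, yet two different inputs share an output.
print(piecewise(-1), piecewise(3))
```

Both discrepancies should be at the level of floating-point rounding, and the last line prints the common value $1$.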

6.4 Derivatives of Inverse Functions

Suppose now that we have a differentiable, one-to-one function $f$ on an interval $I$ and that $f'(x) \neq 0$ on $I$. Then, the graph of $y = f(x)$ has a tangent line at each $x$ in $I$ which is not horizontal. Reflecting about $y = x$, we learn that the graph of $y = f^{-1}(x)$ has a tangent line at each point of its domain (an interval that will be different to $I$ in general) which is not vertical. This is the basis for the following result:

Fact 6.3. If $f$ is one-to-one and differentiable on an interval, with a derivative that never vanishes, then the inverse function $f^{-1}$ is differentiable and

$$\frac{d}{dx} f^{-1}(x) = \frac{1}{f'(f^{-1}(x))}.$$
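Fact 6.3 is most useful when no formula for $f^{-1}$ is available, and it can be sanity-checked numerically in exactly that situation. The Python sketch below takes $f(x) = 2x + \cos x$, the function of Example 6.4, computes $g = f^{-1}$ numerically by bisection (a technique chosen here for simplicity, not one used in these notes), and compares a difference quotient for $g'(1)$ with $1/f'(g(1))$. The helper name `inverse_by_bisection`, the bracket $[-100, 100]$ and the step size are our own assumptions; the bracket only works for inputs of moderate size.

```python
import math

def f(x):
    return 2 * x + math.cos(x)

def f_prime(x):
    return 2 - math.sin(x)   # always at least 1, so f is strictly increasing

def inverse_by_bisection(func, y, lo=-100.0, hi=100.0, steps=200):
    """Solve func(x) = y, assuming func is continuous and strictly increasing
    on [lo, hi] with func(lo) < y < func(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if func(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def g(y):
    return inverse_by_bisection(f, y)

h = 1e-5
numerical = (g(1 + h) - g(1 - h)) / (2 * h)   # central difference approximating g'(1)
from_fact = 1 / f_prime(g(1))                 # the formula of Fact 6.3
print(g(1), numerical, from_fact)
```

Running it should show $g(1)$ indistinguishable from $0$ and both estimates of $g'(1)$ very close to $\tfrac{1}{2}$.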

Proof. Let $y = f^{-1}(x)$, so that $x = f(y)$. Differentiating implicitly with respect to $x$ gives
$$1 = \frac{dx}{dx} = \frac{d f(y)}{dx} = \frac{d f(y)}{dy} \frac{dy}{dx} = f'(y) \frac{dy}{dx},$$

so that
$$\frac{d}{dx} f^{-1}(x) = \frac{dy}{dx} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))}.$$

Example 6.3. Let $f$ be a one-to-one, twice differentiable function with inverse function $g$. Show that
$$g''(x) = -\frac{f''(g(x))}{f'(g(x))^3}.$$
Solution. From the above fact, we know that
$$g'(x) = \frac{1}{f'(g(x))}. \qquad (6.1)$$
To get the second derivative of $g$, we differentiate both sides, obtaining
$$g''(x) = \frac{d}{dx} \frac{1}{f'(g(x))} = \frac{d}{dg(x)}\left(\frac{1}{f'(g(x))}\right) g'(x) = -\frac{f''(g(x))\, g'(x)}{f'(g(x))^2} = -\frac{f''(g(x))}{f'(g(x))^3},$$
where the last equality follows by applying (6.1) again.

Example 6.4. Suppose that $f(x) = 2x + \cos x$. Show that $f$ has an inverse function $g$ and find $g'(1)$.

Solution. Observe first that $f$ is defined everywhere and that $f'(x) = 2 - \sin x$, which is always positive. Therefore, $f$ is strictly increasing, so is one-to-one, hence it has an inverse $g$. The derivative of $g$ is now given by
$$g'(x) = \frac{1}{f'(g(x))},$$
but finding a formula for $g = f^{-1}$ by hand is actually impossible (try it!). Fortunately, we only need its value at $1$ and this is readily guessed:

$$g(1) = x \;\Rightarrow\; f(x) = 2x + \cos x = 1.$$
You can check that $x = 0$ solves this equation, so $g(1) = 0$. As $f$ is one-to-one, this is the only solution and we conclude that
$$g'(1) = \frac{1}{f'(g(1))} = \frac{1}{f'(0)} = \frac{1}{2 - \sin 0} = \frac{1}{2}.$$

6.5 Inverses of Trigonometric Functions

The sine function $\sin$ is obviously not one-to-one on its natural domain because, for example, $\sin 0 = \sin \pi = 0$. We therefore define a function $\operatorname{Sin}$ (with a capital "S") by restricting the domain of $\sin$ to $[-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi]$. On this interval, $\sin$ is strictly increasing, taking values in $[-1, 1]$. In particular, $\operatorname{Sin}$ is invertible! The inverse function is denoted by $\sin^{-1}$ (or $\arcsin$):

$$y = \sin^{-1} x \quad \text{if and only if} \quad x = \operatorname{Sin} y.$$

The domain of $\sin^{-1}$ is the range $[-1, 1]$ of $\operatorname{Sin}$ and the range of $\sin^{-1}$ is the domain $[-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi]$ of $\operatorname{Sin}$. We draw the graphs of $y = \operatorname{Sin} x$ and $y = \sin^{-1} x$ in Figures 6.7 and 6.8.

Figure 6.7: The graph of $\operatorname{Sin} x$.

Figure 6.8: The graph of $\sin^{-1} x$.

Similarly, $\cos^{-1} x$ (or $\arccos x$) is the inverse of the function $\operatorname{Cos} x$, which agrees with $\cos x$ on the interval $[0, \pi]$, and $\tan^{-1} x$ (or $\arctan x$) is the inverse of $\operatorname{Tan} x$, which agrees with $\tan x$ on the interval $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$. The domain of $\cos^{-1}$ is $[-1, 1]$ and its range is $[0, \pi]$, whereas the domain of $\tan^{-1}$ is $(-\infty, \infty)$ and its range is $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$.

Let's find the derivative of $\sin^{-1}$. Applying our rule for differentiating inverse functions, we get

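$$\frac{d}{dx} \sin^{-1}(x) = \frac{1}{\operatorname{Sin}'(\sin^{-1} x)} = \frac{1}{\cos(\sin^{-1} x)} = \frac{1}{\sqrt{1 - \operatorname{Sin}^2(\sin^{-1} x)}} = \frac{1}{\sqrt{1 - x^2}} \qquad (-1 < x < 1),$$
where we have used the fact that $\cos \theta \geq 0$ for $\theta$ in $[-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi]$, the range of $\sin^{-1}$, to decide that the square root is the positive one. We have also recognised that $\operatorname{Sin}^2(\sin^{-1} x) = \bigl(\operatorname{Sin}(\sin^{-1} x)\bigr)^2 = x^2$. The derivatives of the other inverse functions may be calculated in the same way. We leave it as an exercise to check that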

$$\frac{d}{dx} \cos^{-1} x = -\frac{1}{\sqrt{1 - x^2}} \qquad \text{and} \qquad \frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}.$$
We remark that

$$\frac{d}{dx} \cos^{-1} x = -\frac{d}{dx} \sin^{-1} x \;\Rightarrow\; \cos^{-1} x = -\sin^{-1} x + C,$$
for some constant $C$. This constant may be evaluated by substituting any $x$ lying in the common domain of $\sin^{-1}$ and $\cos^{-1}$. Taking $x = 0$, we find that

$$\frac{\pi}{2} = \cos^{-1} 0 = -\sin^{-1} 0 + C = C,$$
hence that $C = \tfrac{1}{2}\pi$. Thus,

$$\cos^{-1} x = \frac{\pi}{2} - \sin^{-1} x \qquad (-1 \leq x \leq 1).$$
This is the "inverse version" of the familiar translation property $\cos x = \sin(x + \tfrac{1}{2}\pi)$.
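The derivative formulas for $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$, and the identity just derived, can be spot-checked with Python's standard `math` module, whose `asin`, `acos` and `atan` return values in the same ranges as $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$ above. This sketch compares symmetric difference quotients with the closed forms at a few arbitrary sample points inside $(-1, 1)$; the helper `central_diff` and the step size $h = 10^{-6}$ are our own choices.

```python
import math

def central_diff(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

sample_points = [-0.9, -0.5, 0.0, 0.3, 0.8]   # all strictly inside (-1, 1)

# Largest gap between each numerical derivative and its closed form.
asin_err = max(abs(central_diff(math.asin, x) - 1 / math.sqrt(1 - x ** 2)) for x in sample_points)
acos_err = max(abs(central_diff(math.acos, x) + 1 / math.sqrt(1 - x ** 2)) for x in sample_points)
atan_err = max(abs(central_diff(math.atan, x) - 1 / (1 + x ** 2)) for x in sample_points)

# The identity cos^{-1} x = pi/2 - sin^{-1} x, checked at the endpoints as well.
identity_err = max(abs(math.acos(x) - (math.pi / 2 - math.asin(x)))
                   for x in sample_points + [-1.0, 1.0])

print(asin_err, acos_err, atan_err, identity_err)
```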

6.6 Logarithms and Exponentials

We all know that the integral of $1/x$ is the natural logarithm function $\ln$. Indeed, you should check that
$$\int_1^x \frac{dt}{t} = \ln x \quad \text{for all } x > 0.$$
Perhaps you didn't know that this is, in fact, the definition of the natural logarithm function! While we haven't yet revised integration in these notes, it's not too hard to use this definition to convince yourself of the following facts:

1. $\ln x$ is continuous and differentiable on $(0, \infty)$ with $\ln 1 = 0$ and $\dfrac{d}{dx} \ln x = \dfrac{1}{x}$.

2. $\lim_{x \to \infty} \ln x = \infty$ and $\lim_{x \to 0^+} \ln x = -\infty$.

3. For all $x, y > 0$, we have the "log laws"

$$\ln(xy) = \ln x + \ln y, \qquad \ln(x/y) = \ln x - \ln y \qquad \text{and} \qquad \ln(x^y) = y \ln x.$$
The first of these facts follows from the fundamental theorem of calculus (Section 8.7) and the second follows by considering the area under the curve $y = 1/t$ to see that
$$\int_1^\infty \frac{dt}{t} > \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \cdots > \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{16} + \cdots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots = \infty.$$
The log laws can be verified by differentiating with respect to $x$ (treating $y$ as a constant). For example, the first log law follows from setting $f(x) = \ln(xy) - \ln x$. Then,
$$f'(x) = \frac{1}{xy}\, y - \frac{1}{x} = 0,$$
hence $f$ must be a constant function as far as $x$ is concerned (this means that the constant may depend on $y$). We find out which constant by substituting $x = 1$ to get $f(1) = \ln y - \ln 1 = \ln y$. Thus,
$$f(x) = \ln y \;\Rightarrow\; \ln(xy) - \ln x = \ln y,$$
which is equivalent to the first log law.

Now, $\ln$ is a strictly increasing function on $(0, \infty)$ because its derivative is always positive there. It therefore has an inverse function which we may as well call $\exp$. Since the domain of $\ln$ is $(0, \infty)$ and its range is all of $\mathbb{R}$, it follows that the domain of $\exp$ is $\mathbb{R}$ and its range is $(0, \infty)$. Being the inverse function of $\ln$, the exponential function $\exp$ has the following properties:

1. $\exp x$ is continuous and differentiable on $\mathbb{R}$ with $\dfrac{d}{dx} \exp x = \exp x$.

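2. $\lim_{x \to \infty} \exp(x) = \infty$ and $\lim_{x \to -\infty} \exp(x) = 0$.

3. For all $x, y$ in $\mathbb{R}$, we have
$$\exp(x + y) = \exp x \exp y, \qquad \exp(x - y) = \frac{\exp x}{\exp y} \qquad \text{and} \qquad (\exp x)^y = \exp(xy).$$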

The first follows from our rule for computing derivatives of inverse functions and the second can be seen by reflecting y = lnx about the line y = x. The relations constituting the third property follow from the log laws. For example,

$$\ln(\exp x \exp y) = \ln \exp x + \ln \exp y = x + y,$$
and applying $\exp$ to both sides gives the first required relation.

What about $e$? We all know that $\exp x$ is actually just an inefficient way of writing $e^x$. Well, we need to first define this number $e$:

Definition. We define the real number $e$ by $e = \exp(1)$, so that $\ln e = 1$.

Figure 6.9: The logarithm and exponential functions (graphs of $y = \exp(x)$, $y = \ln(x)$ and the line $y = x$).

You know that e is one of the most important numbers in calculus. Like π, it can be shown to be irrational (and even transcendental). Its decimal expansion begins with

$$e = 2.718281828459045235360287471352662497757\cdots$$
You won't need to remember any of this expansion for this course! However, with this definition, we can finally deduce that

$$e^x = \exp\bigl(\ln(e^x)\bigr) = \exp(x \ln e) = \exp x \quad \text{for all } x \text{ in } \mathbb{R}.$$

The graphs of $y = \ln x$ and $y = \exp x$ are drawn in Figure 6.9. It remains to discuss general exponentials and logarithms. Given that we know all about base-$e$ logarithms and exponentials, the general case is defined in terms of these.

Definition. For every $a > 0$ and $x$ in $\mathbb{R}$, we define $a^x$ by

$$a^x = \exp(x \ln a).$$

When $\ln b > 0$, which happens exactly when $b > 1$, the function $\exp(x \ln b)$ is strictly increasing, hence it has an inverse. We compute that
$$y = b^x = \exp(x \ln b) \;\Rightarrow\; \ln y = x \ln b \;\Rightarrow\; x = \frac{\ln y}{\ln b}.$$
This tells us how to define a general logarithm.

Definition. For every $b > 1$ and $x > 0$, the inverse to $b^x$ is the base-$b$ logarithm function defined by
$$\log_b x = \frac{\ln x}{\ln b}.$$

We could also define the base-$b$ logarithm for $0 < b < 1$, but usually don't bother, because

$$\log_b x = -\log_{1/b} x.$$
We cannot define a base-1 logarithm as the function $1^x = e^{x \ln 1} = e^0 = 1$ is not one-to-one. The properties of these generalised logarithm and exponential functions are rather similar to their natural counterparts, so we don't need to emphasise them. We only remark that it is easy to compute that
$$\frac{d}{dx} a^x = a^x \ln a \qquad \text{and} \qquad \frac{d}{dx} \log_b x = \frac{1}{x \ln b}.$$

Example 6.5. Find
$$\lim_{x \to 0} \frac{a^x - 1}{x}.$$
Solution. First, notice that $a^0 = e^{0 \ln a} = e^0 = 1$. The trick with this limit is to recognise that it is actually asking for the derivative of $f(x) = a^x$ at $x = 0$:

$$\lim_{x \to 0} \frac{a^x - 1}{x} = \lim_{x \to 0} \frac{a^x - a^0}{x - 0} = \left.\frac{d}{dx} a^x\right|_{x=0} = \left. a^x \ln a \right|_{x=0} = \ln a.$$
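Example 6.5 can be illustrated numerically: as $x$ shrinks, the difference quotient $(a^x - 1)/x$ settles down to $\ln a$. The Python sketch below uses $a = 3$ as an arbitrary sample base, and the helper name `exp_difference_quotient` is our own; it illustrates the limit but of course does not prove it.

```python
import math

def exp_difference_quotient(a, x):
    """(a**x - 1) / x, whose limit as x -> 0 is computed in Example 6.5."""
    return (a ** x - 1) / x

a = 3.0   # an arbitrary sample base
for x in [0.1, 0.001, 1e-5]:
    print(f"x = {x:g}: (a^x - 1)/x = {exp_difference_quotient(a, x):.8f}")
print(f"ln a = {math.log(a):.8f}")
```

The same behaviour appears for bases below $1$, where the limit $\ln a$ is negative.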

Here is a neat identity which is of significant theoretical interest as well as being of practical value. You've probably come across it at some point, but maybe you haven't seen how it can be derived.

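Fact 6.4. For all $x$ in $\mathbb{R}$,
$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x.$$
Proof. First note that if $x = 0$, then both sides are $1$ and there is nothing to prove. We may therefore assume that $x \neq 0$ and proceed as follows, substituting $h = x/n$ (which tends to $0$ as $n \to \infty$):
$$\lim_{n \to \infty} \ln\left[\left(1 + \frac{x}{n}\right)^n\right] = \lim_{n \to \infty} \left[n \ln\left(1 + \frac{x}{n}\right)\right] = x \lim_{n \to \infty} \frac{\ln(1 + x/n)}{x/n} = x \lim_{h \to 0} \frac{\ln(1 + h)}{h} = x \lim_{h \to 0} \frac{\ln(1 + h) - \ln 1}{h} = x \left.\frac{d}{dt} \ln t\right|_{t=1} = x \left.\frac{1}{t}\right|_{t=1} = x.$$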

Now, $\exp$ is continuous, so our rules for taking limits of compositions give
$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = \lim_{n \to \infty} \exp\left(\ln\left[\left(1 + \frac{x}{n}\right)^n\right]\right) = \exp\left(\lim_{n \to \infty} \ln\left[\left(1 + \frac{x}{n}\right)^n\right]\right) = \exp x = e^x,$$
which is the required result.

One obvious (but important!) application is that of continuously compounding interest. Given a fixed interest rate, the money you invest will grow more quickly if the compounding period is shorter. An investment of \$A at r% per year, compounded $n$ times per year, will return after one year the princely sum of
$$\$A\left(1 + \frac{r}{100n}\right)^n.$$
Thus, if $A = 10000$ and $r = 8$, so we are investing \$10000 at 8%, and the interest is compounded yearly, then $n = 1$ and after one year you'll have \$10800. If, however, the interest is compounded daily, then $n = 365$ and after one year you'll have

$$\$10000\left(1 + \frac{8}{100 \times 365}\right)^{365} \approx \$10832.78.$$
The highest return is achieved when interest is compounded continuously, meaning that we let $n$ tend to infinity. In this idealised situation, the one-year return on \$A at r% becomes
$$\lim_{n \to \infty} \left[\$A\left(1 + \frac{r}{100n}\right)^n\right] = \$A\, e^{r/100}.$$
For comparison, $A = 10000$ and $r = 8$ give \$10832.87, to the nearest cent, so the difference between compounding daily and compounding continuously is rather small. Nevertheless, can you guess which option the banks offer?
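The figures quoted above are easy to reproduce. This Python sketch (the function names `compounded` and `compounded_continuously` are our own) evaluates $A(1 + r/(100n))^n$ for yearly, monthly and daily compounding and compares it with the continuous limit $A e^{r/100}$ given by Fact 6.4.

```python
import math

def compounded(principal, rate_percent, periods_per_year):
    """Value after one year at rate_percent % per year, compounded periods_per_year times."""
    return principal * (1 + rate_percent / (100 * periods_per_year)) ** periods_per_year

def compounded_continuously(principal, rate_percent):
    """The n -> infinity limit principal * e^(rate_percent / 100)."""
    return principal * math.exp(rate_percent / 100)

for n in [1, 12, 365]:
    print(f"n = {n:3d}: ${compounded(10000, 8, n):,.2f}")
print(f"continuous: ${compounded_continuously(10000, 8):,.2f}")
```

The yearly, daily and continuous lines should reproduce \$10800.00, \$10832.78 and \$10832.87, and the values increase with $n$ towards the continuous limit.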

6.7 Logarithmic Differentiation

Sometimes it happens that we wish to differentiate a function, but the standard differentiation rules don't seem to be of any use. In such cases, one can try what is known as logarithmic differentiation. This trick can also be useful when computing derivatives of complicated products and quotients with many factors. Here are a couple of examples.

Example 6.6. Find the turning points, meaning those values $x$ for which the derivative is zero, of the function $f(x) = x^x$. [You may assume that the domain of $f$ is $(0, \infty)$, even though there are many other values of $x$ for which $f(x)$ is defined, $x = -1$ for example.]

Solution. We will actually compute the derivative in two different ways:

1. One way is to rewrite $f(x) = x^x$ as $\exp(\ln(x^x)) = \exp(x \ln x)$, so that

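$$f'(x) = \exp(x \ln x) \cdot \frac{d}{dx}(x \ln x) = x^x \left(1 \cdot \ln x + x \cdot \frac{1}{x}\right) = x^x (1 + \ln x).$$
Since $x^x > 0$, the turning points therefore occur when
$$\ln x = -1 \;\Rightarrow\; x = \frac{1}{e}.$$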

2. Alternatively, we could have proceeded using logarithmic differentiation. For this method, let $y = f(x) = x^x$, so that $\ln y = x \ln x$. Differentiating implicitly with respect to $x$ now gives

$$\frac{1}{y} \frac{dy}{dx} = 1 + \ln x \;\Rightarrow\; f'(x) = \frac{dy}{dx} = y(1 + \ln x) = x^x (1 + \ln x),$$
and the turning point analysis follows as before.
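Both computations in Example 6.6 can be double-checked numerically. The Python sketch below compares the formula $f'(x) = x^x(1 + \ln x)$ with a symmetric difference quotient at a few arbitrary sample points, and then searches a grid on $(0, 2]$ for the smallest value of $x^x$, which should sit next to $x = 1/e \approx 0.3679$. The grid spacing of $10^{-4}$ is an arbitrary choice, and a grid search only suggests where the minimum lies.

```python
import math

def f(x):
    return x ** x   # considered on the domain (0, infinity)

def f_prime(x):
    return x ** x * (1 + math.log(x))   # the derivative found in Example 6.6

h = 1e-6
for x in [0.2, 1.0, 2.5]:
    numerical = (f(x + h) - f(x - h)) / (2 * h)
    print(f"x = {x}: formula {f_prime(x):.8f}, difference quotient {numerical:.8f}")

# Locate the smallest value of f on a grid over (0, 2].
grid = [k / 10000 for k in range(1, 20001)]
x_min = min(grid, key=f)
print(x_min, 1 / math.e)
```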

Example 6.7. Differentiate
$$y = \frac{(5x - 4)^3}{\sqrt{2x + 1}}$$
using logarithmic differentiation.

Figure 6.10: The cosh and sinh functions.

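Solution. We take the natural logarithm of both sides and then differentiate implicitly:
$$\ln y = 3 \ln(5x - 4) - \frac{1}{2} \ln(2x + 1)$$
$$\Rightarrow\; \frac{1}{y} \frac{dy}{dx} = 3 \cdot \frac{5}{5x - 4} - \frac{1}{2} \cdot \frac{2}{2x + 1} = \frac{25x + 19}{(5x - 4)(2x + 1)}$$
$$\Rightarrow\; \frac{dy}{dx} = \frac{25x + 19}{(5x - 4)(2x + 1)} \cdot \frac{(5x - 4)^3}{\sqrt{2x + 1}} = \frac{(25x + 19)(5x - 4)^2}{(2x + 1)^{3/2}}.$$

6.8 Hyperbolic Functions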

Here, we review the hyperbolic functions cosh, sinh, tanh, coth, sech and csch. Pronunciations for these important functions vary — cosh is universally pronounced so as to rhyme with “gosh”, but sinh is sometimes pronounced “shine” and sometimes “sinch”. For the others, it’s probably best to emphasise their hyperbolic nature, for example tanh is “hyperbolic tan” or “hyperbolic tangent” because “than” and “tanch” just sound funny. We start with the most important of the hyperbolic functions, the hyperbolic cosine cosh and the hyperbolic sine sinh, defined as follows:

Definition. For all x in R, define

$$\cosh x = \frac{e^x + e^{-x}}{2} \qquad \text{and} \qquad \sinh x = \frac{e^x - e^{-x}}{2}.$$
Note that the domain of both these functions is all of $\mathbb{R}$. We also remark that $\cosh$ is an even function, whereas $\sinh$ is odd and strictly increasing (see Figure 6.10). The graph of $y = \cosh x$ has a shape known as a catenary, which is the shape taken by a flexible cable hanging freely under its own weight. While $(\cos t, \sin t)$ for $t$ in $[0, 2\pi)$ parametrises the unit circle $x^2 + y^2 = 1$, the hyperbolic analogue $(\cosh t, \sinh t)$ for $t$ in $(-\infty, \infty)$ parametrises (one branch of) the hyperbola $x^2 - y^2 = 1$:
$$\cosh^2 t - \sinh^2 t = \left(\frac{e^t + e^{-t}}{2}\right)^2 - \left(\frac{e^t - e^{-t}}{2}\right)^2 = \frac{1}{4}\left[\left(e^{2t} + 2 + e^{-2t}\right) - \left(e^{2t} - 2 + e^{-2t}\right)\right] = 1.$$
This is the reason why these functions are called "hyperbolic functions". Here are a few more basic properties of $\cosh$ and $\sinh$:
$$\cosh 0 = 1, \qquad \sinh 0 = 0, \qquad \frac{d}{dx} \cosh x = \sinh x, \qquad \frac{d}{dx} \sinh x = \cosh x,$$
$$\cosh(x + y) = \cosh x \cosh y + \sinh x \sinh y, \qquad \sinh(x + y) = \sinh x \cosh y + \cosh x \sinh y,$$
$$\cosh(2x) = \cosh^2 x + \sinh^2 x = 2\cosh^2 x - 1 = 2\sinh^2 x + 1, \qquad \sinh(2x) = 2 \sinh x \cosh x.$$

There are many others, most of which look a lot like relations between their trigonometric counterparts (but beware of sign differences!). The other hyperbolic functions have reasonably obvious definitions:

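Definition.
$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad \coth x = \frac{1}{\tanh x} = \frac{e^x + e^{-x}}{e^x - e^{-x}},$$
$$\operatorname{sech} x = \frac{1}{\cosh x} = \frac{2}{e^x + e^{-x}}, \qquad \operatorname{csch} x = \frac{1}{\sinh x} = \frac{2}{e^x - e^{-x}}.$$
We have drawn the graphs of $y = \tanh x$ and $y = \coth x$ in Figure 6.11 for convenience. Note that $\tanh x$ and $\operatorname{sech} x$ are defined everywhere, while $\coth x$ and $\operatorname{csch} x$ are defined everywhere except at $x = 0$. The derivatives of these hyperbolic functions can be easily obtained from those given above for $\cosh x$ and $\sinh x$ using the quotient rule. You should check that the results are:
$$\frac{d}{dx} \tanh x = \operatorname{sech}^2 x, \qquad \frac{d}{dx} \coth x = -\operatorname{csch}^2 x,$$
$$\frac{d}{dx} \operatorname{sech} x = -\operatorname{sech} x \tanh x, \qquad \frac{d}{dx} \operatorname{csch} x = -\operatorname{csch} x \coth x.$$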
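The identities and derivative formulas of this section can be spot-checked numerically. Python's `math` module provides `cosh`, `sinh` and `tanh`; the sketch below defines coth, sech and csch from them (these three helper names are our own), then compares each derivative formula with a symmetric difference quotient at a few arbitrary sample points away from $x = 0$.

```python
import math

def central_diff(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def coth(x):
    return 1 / math.tanh(x)

def sech(x):
    return 1 / math.cosh(x)

def csch(x):
    return 1 / math.sinh(x)

points = [-2.0, -0.5, 0.7, 1.5]   # x = 0 is avoided because coth and csch are undefined there

errors = {
    "cosh^2 - sinh^2 = 1": max(abs(math.cosh(t) ** 2 - math.sinh(t) ** 2 - 1) for t in points),
    "cosh(2x) = 2cosh^2 x - 1": max(abs(math.cosh(2 * t) - (2 * math.cosh(t) ** 2 - 1)) for t in points),
    "(tanh)' = sech^2": max(abs(central_diff(math.tanh, t) - sech(t) ** 2) for t in points),
    "(coth)' = -csch^2": max(abs(central_diff(coth, t) + csch(t) ** 2) for t in points),
    "(sech)' = -sech tanh": max(abs(central_diff(sech, t) + sech(t) * math.tanh(t)) for t in points),
    "(csch)' = -csch coth": max(abs(central_diff(csch, t) + csch(t) * coth(t)) for t in points),
}
for identity, err in errors.items():
    print(f"{identity}: largest discrepancy {err:.1e}")
```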

6.9 The Inverse Hyperbolic Functions

Of course, one can ask if the hyperbolic functions have inverses when restricted to an appropriate domain. It's easy to see that $\sinh$ is one-to-one on $\mathbb{R}$ (its derivative $\cosh x$ is always positive), so it has an inverse function $\sinh^{-1}$ (or $\operatorname{arcsinh}$) which is also defined everywhere. We can even compute this function as follows:

$$y = \sinh x = \frac{e^x - e^{-x}}{2} = \frac{e^{2x} - 1}{2e^x} \;\Rightarrow\; e^{2x} - 2y e^x - 1 = 0.$$
This is a quadratic equation for $e^x$ which we can solve to get

$$e^x = \frac{2y \pm \sqrt{4y^2 + 4}}{2} = y \pm \sqrt{y^2 + 1}.$$

Figure 6.11: The tanh and coth functions.

Now, $e^x > 0$ for all $x$ and $\sqrt{y^2 + 1} > \sqrt{y^2} = |y| \geq y$ for all $y$, so we are forced to take the positive square root in the above. Thus,
$$e^x = y + \sqrt{y^2 + 1} \;\Rightarrow\; x = \ln\left(y + \sqrt{y^2 + 1}\right).$$
In other words, we have shown that the function inverse to $\sinh x$ is

$$\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right).$$
You should probably check again that $x + \sqrt{x^2 + 1}$ is always positive, so we can actually take its logarithm! One can repeat the above steps for $\cosh x$. However, $\cosh x$ is not one-to-one, so we must restrict its domain. The natural choice for a domain on which $\cosh x$ is one-to-one is $[0, \infty)$. The inverse of this restricted function is denoted by $\cosh^{-1} x$ and is given by

$$\cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right).$$
Its domain is $[1, \infty)$ and its range is $[0, \infty)$. Similarly, one can show that

$$\tanh^{-1} x = \frac{1}{2} \ln\left(\frac{1 + x}{1 - x}\right),$$
with domain $(-1, 1)$ and range $\mathbb{R}$. If you are ever unlucky enough to have to deal with the inverse functions to the other hyperbolic functions, it's probably best to look them up.
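The closed forms for $\sinh^{-1}$, $\cosh^{-1}$ and $\tanh^{-1}$ can be compared with `math.asinh`, `math.acosh` and `math.atanh` from Python's standard library. The sketch below does this at a few arbitrary sample points in each domain and also checks a few round trips; the helper names `arsinh`, `arcosh` and `artanh` are our own. It checks the algebra numerically but is no substitute for the derivation.

```python
import math

def arsinh(x):
    # Formula derived above. For large negative x, x + sqrt(x*x + 1) suffers
    # cancellation, so math.asinh is preferable in serious numerical work.
    return math.log(x + math.sqrt(x * x + 1))

def arcosh(x):
    # Valid for x >= 1 (inverse of cosh restricted to [0, infinity)).
    return math.log(x + math.sqrt(x * x - 1))

def artanh(x):
    # Valid for -1 < x < 1.
    return 0.5 * math.log((1 + x) / (1 - x))

asinh_err = max(abs(arsinh(x) - math.asinh(x)) for x in [-3.0, -0.5, 0.0, 2.0, 10.0])
acosh_err = max(abs(arcosh(x) - math.acosh(x)) for x in [1.0, 1.5, 4.0, 10.0])
atanh_err = max(abs(artanh(x) - math.atanh(x)) for x in [-0.9, 0.0, 0.5, 0.99])
print(asinh_err, acosh_err, atanh_err)

# Applying each hyperbolic function after its inverse should return the input.
print(math.sinh(arsinh(2.0)), math.cosh(arcosh(4.0)), math.tanh(artanh(0.5)))
```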

Chapter 7

Applications of Derivatives

7.1 Related Rates

The physical interpretation of derivatives is that they tell you how fast something is changing. In particular, derivatives with respect to time tell us the rate of change of the quantity being differentiated. So, velocity is the time derivative of displacement and acceleration is the time derivative of velocity. A popular application of differentiation, at least among math instructors, goes by the moniker "related rates". Here, one is given some relation between quantities and asked to compute how fast one quantity is changing, given the rates of change of the other quantities. This is simple in theory (one just differentiates the given relation implicitly to get a relation between the rates of change) but often confusing in practice. Here are some examples:

Example 7.1. Air is being pumped into a spherical balloon at the rate of $1\,\text{m}^3/\text{min}$. How fast is the radius of the balloon increasing when its volume is $10\,\text{m}^3$?

Solution. Because the balloon is spherical, we have a relation between the volume $V$ and the radius $r$:
$$V = \frac{4}{3} \pi r^3.$$
Once we have a relation between the quantities whose rates are given or need computing, we differentiate implicitly; in this case, with respect to time $t$:
$$\frac{dV}{dt} = 4\pi r^2 \frac{dr}{dt}.$$

We know that $dV/dt = 1\,\text{m}^3/\text{min}$ and $V = 10\,\text{m}^3$. This means that $r = (3V/4\pi)^{1/3}$, so we can solve for $dr/dt$:

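$$\frac{dr}{dt} = \frac{1}{4\pi r^2} \frac{dV}{dt} = \frac{1}{4\pi} \left(\frac{4\pi}{3V}\right)^{2/3} \frac{dV}{dt} = \frac{1}{(36\pi)^{1/3}} \cdot \frac{1\,\text{m}^3/\text{min}}{10^{2/3}\,\text{m}^2} \approx 0.0446\,\text{m/min}.$$
The radius is therefore increasing at a rate of about 4.5 centimetres per minute.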
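The arithmetic in Example 7.1 can be reproduced step by step. This short Python sketch recovers $r$ from $V$ and then solves the differentiated relation for $dr/dt$; the variable names are our own, and units are tracked only in the comments.

```python
import math

dV_dt = 1.0   # rate of inflation, m^3/min
V = 10.0      # volume at the instant of interest, m^3

r = (3 * V / (4 * math.pi)) ** (1 / 3)   # invert V = (4/3) pi r^3
dr_dt = dV_dt / (4 * math.pi * r ** 2)   # from dV/dt = 4 pi r^2 dr/dt
print(f"r = {r:.4f} m, dr/dt = {dr_dt:.5f} m/min")
```

It should report a rate of roughly 0.045 m/min, agreeing with the closed form $(36\pi)^{-1/3} \, 10^{-2/3}$ obtained above.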

Example 7.2. A 5 m ladder is leaning against a wall. Suddenly, it starts sliding: the base of the ladder starts moving away from the wall and the top of the ladder starts sliding down the wall. If the base moves at 3 m/s when it is 4 m from the wall, how fast is the top of the ladder moving?

Solution. We assume that the wall is vertical and the ground horizontal, and let $x$ be the distance of the ladder's base from the base of the wall and $y$ the distance of the ladder's top from the base of the wall. We are therefore given $dx/dt = 3\,\text{m/s}$ when $x = 4\,\text{m}$ and asked to compute $dy/dt$. The relation between $x$ and $y$ is obviously $x^2 + y^2 = (5\,\text{m})^2 = 25\,\text{m}^2$. Now it is just a matter of differentiating implicitly:
$$2x \frac{dx}{dt} + 2y \frac{dy}{dt} = 0 \;\Rightarrow\; \frac{dy}{dt} = -\frac{x}{y} \frac{dx}{dt} = -\frac{x}{\sqrt{25\,\text{m}^2 - x^2}} \frac{dx}{dt}.$$
Plugging in the numbers gives
$$\frac{dy}{dt} = -\frac{4\,\text{m}}{\sqrt{25\,\text{m}^2 - 16\,\text{m}^2}} \cdot 3\,\text{m/s} = -4\,\text{m/s}.$$
In other words, the top of the ladder is sliding down the wall (note the negative sign!) at 4 m/s.
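The same answer can be recovered without implicit differentiation, which makes a useful cross-check. The sketch below assumes, purely for illustration, that the base moves at a constant 3 m/s near the instant in question (only the speed at that instant affects $dy/dt$ there), computes the height $y(t) = \sqrt{25 - x(t)^2}$ directly, and takes a symmetric difference quotient at $t = 0$. The names `L`, `x0` and `v` are our own.

```python
import math

L = 5.0    # ladder length, m
x0 = 4.0   # distance of the base from the wall at t = 0, m
v = 3.0    # speed of the base at t = 0, m/s

def x(t):
    # Illustrative assumption: the base moves at constant speed v near t = 0.
    return x0 + v * t

def y(t):
    # Height of the top of the ladder, from x^2 + y^2 = L^2.
    return math.sqrt(L ** 2 - x(t) ** 2)

h = 1e-6
dy_dt_numerical = (y(h) - y(-h)) / (2 * h)
dy_dt_formula = -x0 / math.sqrt(L ** 2 - x0 ** 2) * v   # -(x/y) dx/dt
print(dy_dt_numerical, dy_dt_formula)
```

Both values should be very close to $-4$, in line with the implicit-differentiation answer.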

7.2 l'Hôpital's Rules

There are several types of indeterminate forms that one often meets when trying to compute limits. Here are some examples, along with the ill-defined form one arrives at through naïve substitution:

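$$\begin{array}{ll}
\displaystyle \lim_{x \to 0} \frac{\sin x}{x} & [0/0] \\[2ex]
\displaystyle \lim_{x \to \infty} \frac{\ln x}{x} & [\infty/\infty] \\[2ex]
\displaystyle \lim_{x \to 0^+} \left(\frac{1}{x} - \frac{1}{\sin x}\right) & [\infty - \infty] \\[2ex]
\displaystyle \lim_{x \to 0^+} \left(x \ln \frac{1}{x}\right) & [0 \times \infty] \\[2ex]
\displaystyle \lim_{x \to 0^+} x^x & [0^0] \\[2ex]
\displaystyle \lim_{x \to \infty} x^{1/x} & [\infty^0] \\[2ex]
\displaystyle \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n & [1^\infty]
\end{array}$$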

You might like to ponder why we haven't included any examples involving the form $[0^\infty]$. Is it indeterminate? Of course, such substitutions do no good for these limits (substituting $\infty$ never does you any good!). The fact that the form is indeterminate means that the limit in question may or may not exist: our task is to decide which! In some cases, such indeterminate forms can be evaluated by simplification (cancelling common factors, for example). Often, however, we need some better tools. The rules of l'Hôpital, which we will introduce shortly, are examples of these. They are not the only tools available: we have already computed the first example above (with form $[0/0]$) in Section 3.5, using geometry and the Squeeze Theorem, and the last example was also computed in Fact 6.4 (Section 6.6). Moreover, we shall see in Section 10 that the study of Taylor series also gives another powerful tool for evaluating limits. The rules of l'Hôpital apply directly to the calculation of limits when the indeterminate form is of the type $[0/0]$ or $[\infty/\infty]$. The other indeterminate forms can be reduced to one of these by algebraic manipulation, often involving logarithms.

Fact 7.1 (l'Hôpital's First Rule). Suppose that $f$ and $g$ are two differentiable functions defined on an interval $(a, b)$ satisfying $g'(x) \neq 0$ for any $x$ in $(a, b)$,
$$\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = 0 \qquad \text{and} \qquad \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L,$$
where we allow $L = \pm\infty$. Then,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L.$$
There are similar versions (and proofs!) in which $\lim_{x \to a^+}$ is replaced by $\lim_{x \to b^-}$ or $\lim_{x \to c}$ with $a < c < b$. Both $a = -\infty$ and $b = \infty$ are allowed in each of these versions.

Proof. First, we assume that $a$ is finite and define

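$$F(x) = \begin{cases} f(x) & \text{if } a < x < b, \\ 0 & \text{if } x = a, \end{cases} \qquad \text{and} \qquad G(x) = \begin{cases} g(x) & \text{if } a < x < b, \\ 0 & \text{if } x = a. \end{cases}$$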

Then, F and G are continuous on the interval [a,x] and differentiable on the interval (a,x) for all x < b, so we may apply the Generalised Mean Value Theorem (Fact 5.5 in Section 5.9) to conclude that there exists some c in (a,x) for which

$$\frac{f(x)}{g(x)} = \frac{F(x)}{G(x)} = \frac{F(x) - F(a)}{G(x) - G(a)} = \frac{F'(c)}{G'(c)} = \frac{f'(c)}{g'(c)}.$$
Now, $a < c < x$ means that if we let $x$ tend to $a$ from the right, then the $c$ for which the above relation holds will have to tend to $a$ (from the right) as well. It follows that

$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \lim_{c \to a^+} \frac{f'(c)}{g'(c)} = L.$$
It only remains to consider the case $a = -\infty$. This, in fact, follows from what we have already shown by making the substitution $x = -1/t$:
$$\lim_{x \to -\infty} \frac{f(x)}{g(x)} = \lim_{t \to 0^+} \frac{f(-1/t)}{g(-1/t)} = \lim_{t \to 0^+} \frac{f'(-1/t)/t^2}{g'(-1/t)/t^2} = \lim_{x \to -\infty} \frac{f'(x)}{g'(x)} = L.$$
We remark that while there are three conditions required to be satisfied before we can apply l'Hôpital's rule, only the second, which insists that the indeterminate form is of the type $[0/0]$, is truly important. The first, $g'(x) \neq 0$, needs only to be satisfied for all $x$ sufficiently close to the limit point $a$. Most importantly, it does not need to be satisfied at $x = a$. The third, that the limit of the quotient of the derivatives exists, is usually checked after the fact.

Example 7.3. Evaluate the limits
$$\lim_{x \to 0} \frac{\cos x - 1}{x} \qquad \text{and} \qquad \lim_{x \to 0} \frac{\cos x - 1}{x^2}.$$

Solution. Taking $f(x) = \cos x - 1$ and $g(x) = x$, we find that $g'(x) = 1$ is certainly non-zero around $x = 0$ and the limit is indeterminate of type $[0/0]$. We may therefore try l'Hôpital:
$$\lim_{x \to 0} \frac{\cos x - 1}{x} = \lim_{x \to 0} \frac{-\sin x}{1} = 0.$$
Note that the limit of the quotient of the derivatives does exist (it's $0$), so this validates our use of the rule.

For the second limit, we again have $f(x) = \cos x - 1$, but now $g(x) = x^2$, so $g'(x) = 2x$. Now, this derivative is still non-zero around $x = 0$ (the fact that it is zero at $x = 0$ is of no import). The limit is again indeterminate of type $[0/0]$, so we try l'Hôpital once more:
$$\lim_{x \to 0} \frac{\cos x - 1}{x^2} = \lim_{x \to 0} \frac{-\sin x}{2x} = -\frac{1}{2}.$$
Again, this is justified by the fact that the latter limit exists. Note, however, that if we didn't already know this limit, then we could have applied l'Hôpital again to compute it, because $-\sin x/(2x)$ is indeterminate at $x = 0$ of type $[0/0]$! The computation would then read
$$\lim_{x \to 0} \frac{\cos x - 1}{x^2} = \lim_{x \to 0} \frac{-\sin x}{2x} = \lim_{x \to 0} \frac{-\cos x}{2} = -\frac{1}{2}.$$
The justification now proceeds backwards. Applying l'Hôpital to $-\sin x/(2x)$ is justified because the limit of the quotient of the derivatives exists. But then, applying l'Hôpital to the original limit is now justified as we've just shown that the limit of the ratio of the derivatives exists.

It's very important to remember that l'Hôpital only applies when one has a limit of indeterminate form. Things can go very badly wrong if you forget this. For example, we know that
$$\lim_{x \to 0} \frac{1}{x}$$
does not exist. But, if we were to apply l'Hôpital, we would conclude that the limit was $0$. This is obviously wrong and the reason is that the limit has form $[1/0]$, which is not indeterminate.

Example 7.4. Evaluate the limit

$$\lim_{x \to 0} \frac{\sin(2x) - 2\sin x}{\sin x - x}.$$
Solution. This limit is indeterminate of type $[0/0]$, but it requires more than one application of l'Hôpital's rule. You should check that we have an indeterminate limit of type $[0/0]$ every time the rule is used:

$$\lim_{x \to 0} \frac{\sin(2x) - 2\sin x}{\sin x - x} = \lim_{x \to 0} \frac{2\cos(2x) - 2\cos x}{\cos x - 1} = \lim_{x \to 0} \frac{-4\sin(2x) + 2\sin x}{-\sin x} = \lim_{x \to 0} \frac{-8\cos(2x) + 2\cos x}{-\cos x} = 6.$$
Here is the second of l'Hôpital's rules. It applies when the limit has the indeterminate form $[\infty/\infty]$.

Fact 7.2 (l'Hôpital's Second Rule). Suppose that $f$ and $g$ are two differentiable functions defined on an interval $(a, b)$ satisfying $g'(x) \neq 0$ for any $x$ in $(a, b)$,
$$\lim_{x \to a^+} f(x) = \pm\infty, \qquad \lim_{x \to a^+} g(x) = \pm\infty \qquad \text{and} \qquad \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L,$$
where we allow $L = \pm\infty$. Then,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L.$$
Again, there are similar versions for $\lim_{x \to b^-}$ and $\lim_{x \to c}$ with $a < c < b$, with $a = -\infty$ and $b = \infty$ allowed in each.

Example 7.5. Find
$$\lim_{x \to \infty} \frac{e^{3x}}{x^2}.$$
Solution. This is indeterminate of type $[\infty/\infty]$, so we write

$$\lim_{x \to \infty} \frac{e^{3x}}{x^2} = \lim_{x \to \infty} \frac{3e^{3x}}{2x} = \lim_{x \to \infty} \frac{9e^{3x}}{2} = \infty.$$
You should check that the second application of l'Hôpital was also to a limit of indeterminate type.

Example 7.6. Find $\lim_{x \to 0^+} (x \ln x)$.

Solution. This is indeterminate of the form $[0 \times \infty]$. We convert it into the form $[\infty/\infty]$ and compute:
$$\lim_{x \to 0^+} (x \ln x) = \lim_{x \to 0^+} \frac{\ln x}{1/x} = \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} (-x) = 0.$$

Example 7.7. Find
$$\lim_{x \to 0^+} \left(\frac{1}{e^x - 1} - \frac{1}{x}\right).$$
Solution. This is indeterminate of the form $[\infty - \infty]$. However, if we rewrite it as
$$\lim_{x \to 0^+} \left(\frac{1}{e^x - 1} - \frac{1}{x}\right) = \lim_{x \to 0^+} \frac{x - e^x + 1}{x e^x - x},$$
then we convert it into the indeterminate form $[0/0]$. Applying l'Hôpital's rule twice now gives

$$\lim_{x \to 0^+} \frac{x - e^x + 1}{x e^x - x} = \lim_{x \to 0^+} \frac{1 - e^x}{e^x + x e^x - 1} = \lim_{x \to 0^+} \frac{-e^x}{2e^x + x e^x} = -\frac{1}{2}.$$

Example 7.8. Find
$$\lim_{x \to \infty} \frac{\sinh x}{\cosh x},$$
using l'Hôpital.

Solution. This is clearly indeterminate of type $[\infty/\infty]$. However, blindly applying l'Hôpital doesn't work this time:
$$\lim_{x \to \infty} \frac{\sinh x}{\cosh x} = \lim_{x \to \infty} \frac{\cosh x}{\sinh x} = \lim_{x \to \infty} \frac{\sinh x}{\cosh x} = \cdots$$
Luckily, we have a trick up our sleeve. Let $y = e^x$, so $\cosh x = \tfrac{1}{2}(y + y^{-1})$ and $\sinh x = \tfrac{1}{2}(y - y^{-1})$. As $x \to \infty$ obviously implies that $y \to \infty$, we may write
$$\lim_{x \to \infty} \frac{\sinh x}{\cosh x} = \lim_{y \to \infty} \frac{y - y^{-1}}{y + y^{-1}} = \lim_{y \to \infty} \frac{1 + y^{-2}}{1 - y^{-2}} = 1.$$
We've used l'Hôpital for the second equality above as directed. However, it's even easier to compute this limit by dividing the numerator and denominator by $y$. Don't forget that there will be occasions where these sophisticated tools will be less efficient than the elementary methods we introduced in Chapter 3!
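A quick numerical evaluation is a useful guard against algebra slips when applying l'Hôpital's rules repeatedly. The Python sketch below evaluates the functions from Examples 7.3, 7.4, 7.6, 7.7 and 7.8 close to their limit points and prints them next to the limits found above (Example 7.5 is omitted because its limit is infinite). The helper names and sample points are arbitrary choices of ours, and such evaluations only corroborate a limit; they never prove one. For Example 7.7 we use `math.expm1`, which computes $e^x - 1$ accurately for small $x$.

```python
import math

def ex_7_3a(x):
    return (math.cos(x) - 1) / x

def ex_7_3b(x):
    return (math.cos(x) - 1) / x ** 2

def ex_7_4(x):
    return (math.sin(2 * x) - 2 * math.sin(x)) / (math.sin(x) - x)

def ex_7_6(x):
    return x * math.log(x)

def ex_7_7(x):
    return 1 / math.expm1(x) - 1 / x   # expm1(x) = e^x - 1 without cancellation

def ex_7_8(x):
    return math.sinh(x) / math.cosh(x)

# (label, function, sample point near the limit point, limit found in the text)
cases = [
    ("7.3, first limit", ex_7_3a, 1e-4, 0.0),
    ("7.3, second limit", ex_7_3b, 1e-3, -0.5),
    ("7.4", ex_7_4, 1e-3, 6.0),
    ("7.6", ex_7_6, 1e-8, 0.0),
    ("7.7", ex_7_7, 1e-4, -0.5),
    ("7.8", ex_7_8, 20.0, 1.0),
]
for label, func, x, limit in cases:
    print(f"Example {label}: value at x = {x:g} is {func(x):.6f}; limit found above: {limit}")
```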

7.3 Extreme Values

We make the following definition:

Definition. We say that a function $f$ has a global maximum (or absolute maximum) at some point $x_0$ in the domain of $f$ if $f(x) \leq f(x_0)$ for all $x$ in this domain. Similarly, $f$ has a global minimum (or absolute minimum) at $x_0$ when $f(x) \geq f(x_0)$ for all $x$ in the domain of $f$.

Note that global extreme values can be assumed at more than one point. For example, $f(x) = \sin x$ takes its global maximum at every $x$ of the form $\tfrac{1}{2}\pi + 2\pi n$, where $n$ is an integer. For a more extreme example, $f(x) = 1$ takes its global maximum and its global minimum at every point. Note also that a function need not have a global maximum or minimum, $\tan^{-1} x$ being a good example. However, any continuous function defined on a closed (finite) interval has a global maximum and minimum on that interval. In addition to a global minimum or maximum, a function can also have local minima and maxima. We will see in the following section that the global maximum need not be the largest of the local maxima and the global minimum need not be the smallest of the local minima. However, this will be true if the function is continuous and its domain is a closed finite interval. The formal definition goes as follows:

Definition. Let $D$ be the domain of a function $f$ and let $x_0$ and $x_1$ be points in $D$. Then, $f$ is said to have a local maximum at $x_0$ if there exists an $\varepsilon > 0$ for which

$$f(x) \leq f(x_0) \quad \text{for all } x \text{ in } D \text{ with } |x - x_0| < \varepsilon.$$

Similarly, $f$ is said to have a local minimum at $x_1$ if there exists an $\varepsilon > 0$ for which

$$f(x) \geq f(x_1) \quad \text{for all } x \text{ in } D \text{ with } |x - x_1| < \varepsilon.$$
When we don't want (or need) to distinguish whether such points are maxima or minima, we will refer to them as local or global extrema. We illustrate these definitions with an example in Figure 7.1.

Fact 7.3. Let f be defined on a closed interval. Then, any local extremum of f occurs when either

Figure 7.1: This function has local maxima when $x = a, x_2, x_4, x_6$ and local minima when $x = x_1, x_3, x_5, b$. The absolute maximum is the largest of the local maxima, hence it occurs at $x = x_2$, and the absolute minimum is the smallest of the local minima, hence it occurs at $x = x_3$.

• $x$ is a stationary (or turning) point of $f$, meaning that $f'(x) = 0$,

• $x$ is a singular point of $f$, meaning that $f'(x)$ does not exist, or

• $x$ is an endpoint of the interval.

Note that when $x$ is an endpoint of the interval that $f$ is defined on, $f'(x)$ cannot exist: the best that we could hope for would be that either the left-sided or right-sided derivative exists there. So, the endpoints of the interval are also singular points. In general, it is convenient to consider both stationary points and singular points together. We therefore define a critical point to be a point $x$ which is either a stationary point or a singular point. It now follows that, for a continuous function $f$, locating the global extreme values on a closed finite interval amounts to finding all the critical points and evaluating $f$ at each. The largest value is the global maximum and the smallest is the global minimum.
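The procedure in the last paragraph can be sketched in a few lines of Python. The function `closed_interval_extrema` below is our own illustrative helper, not a library routine; it assumes that $f$ is continuous on $[a, b]$ and that the interior critical points have already been found by hand. It evaluates $f$ at these and at the endpoints and reports every candidate at which the largest or smallest value is attained. The test function $f(x) = x^3 - 3x$ on $[-2, 3]$ is a hypothetical example, chosen because its global minimum is attained at two points.

```python
def closed_interval_extrema(f, a, b, interior_critical_points):
    """Global max and min of a continuous f on [a, b] from its critical values.

    interior_critical_points: points in (a, b) where f' is zero or undefined.
    Returns ((max value, maximisers), (min value, minimisers)).
    Ties are detected with exact equality, which is reliable here only because
    the example below produces integer values.
    """
    candidates = [a, b] + [c for c in interior_critical_points if a < c < b]
    values = {c: f(c) for c in candidates}
    biggest = max(values.values())
    smallest = min(values.values())
    maximisers = sorted(c for c, v in values.items() if v == biggest)
    minimisers = sorted(c for c, v in values.items() if v == smallest)
    return (biggest, maximisers), (smallest, minimisers)

def f(x):
    return x ** 3 - 3 * x   # f'(x) = 3x^2 - 3 vanishes at x = -1 and x = 1

print(closed_interval_extrema(f, -2, 3, [-1, 1]))   # ((18, [3]), (-2, [-2, 1]))
```

The global maximum $18$ is attained only at the endpoint $x = 3$, while the global minimum $-2$ is attained both at the endpoint $x = -2$ and at the stationary point $x = 1$.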

7.4 The First Derivative Test

Unless the function $f$ is really unpleasant, we can use the derivative $f'$ to identify local extrema and determine whether they are maxima or minima. Roughly speaking, if the sign of $f'(x)$ changes as $x$ passes through $x_0$, then $x_0$ is a local extremum. More precisely, we have the following result:

Fact 7.4 (The First Derivative Test). Suppose that $f$ is continuous at $x_0$ and that there exists $\varepsilon > 0$ for which $f$ is differentiable at all $x$ in the domain $D$ of $f$ satisfying $0 < |x - x_0| < \varepsilon$. The derivative is then defined at all points of $D$ sufficiently close to $x_0$, but not necessarily at $x_0$ itself. Then,

1. x0 will be a local maximum if we can find such an ε for which

$f'(x) \ge 0$ for all $x$ in $D$ with $x_0 - \varepsilon < x < x_0$, and $f'(x) \le 0$ for all $x$ in $D$ with $x_0 < x < x_0 + \varepsilon$.

2. x0 will be a local minimum if we can find such an ε for which

$f'(x) \le 0$ for all $x$ in $D$ with $x_0 - \varepsilon < x < x_0$, and $f'(x) \ge 0$ for all $x$ in $D$ with $x_0 < x < x_0 + \varepsilon$.

We note that excluding $f'(x_0)$ from being defined in this result allows $x_0$ to be a singular point. It is also important to realise the following. When $f$ is continuous at $x_0$ and $f'(x)$ has the same strict sign on both sides of $x_0$ (positive on both sides, or negative on both sides), then $x_0$ cannot be a local extremum.

Example 7.9. Find the global extrema of the function $f(x) = \sin x$ on the interval $[-\pi, \pi]$.

Solution. This is a continuous function on a closed interval, so it has a global maximum and a global minimum. By Fact 7.3, these occur at critical points. The function is differentiable everywhere on $(-\pi, \pi)$, so the only singular points are the endpoints $x = -\pi$ and $x = \pi$. The stationary points are obtained from
$$f'(x) = \cos x = 0 \quad\Rightarrow\quad x = -\tfrac{1}{2}\pi \text{ and } x = \tfrac{1}{2}\pi.$$
The complete list of critical points is therefore $\{-\pi, -\frac{1}{2}\pi, \frac{1}{2}\pi, \pi\}$. The values of the function at these points, the critical values, are $\{0, -1, 1, 0\}$. We conclude that the global maximum on $[-\pi, \pi]$ is $1$ and the global minimum is $-1$.

Example 7.10. Prove, or disprove, that $\tan x \ge x$ for all $x$ in $[0, \frac{1}{4}\pi]$.

Solution. We consider the function $f(x) = \tan x - x$ and ask whether $f(x) \ge 0$ on the specified interval. The critical points of $f$ include the endpoints $x = 0$ and $x = \frac{1}{4}\pi$, at which $f$ takes the values $0$ and $1 - \frac{1}{4}\pi > 0$, respectively. In fact, there are no other critical points to check, because
$$f'(x) = \sec^2 x - 1 = 0 \quad\Rightarrow\quad \cos^2 x = 1 \quad\Rightarrow\quad x = n\pi \quad (n \text{ any integer}),$$
and only $x = 0$ lies in the domain. The global minimum of $f$ is therefore $0$, which proves that $\tan x \ge x$ on $[0, \frac{1}{4}\pi]$.

Example 7.11. Find all critical points of $f(x) = 3x^4 - 4x^3 - 12x^2 + 5$ and use the First Derivative Test to classify them.

Solution. First, we find the derivative of $f$:

$$f'(x) = 12x^3 - 12x^2 - 24x = 12x(x - 2)(x + 1).$$
Solving $f'(x) = 0$ gives critical points at $x = -1$, $0$ and $2$. The derivative is continuous. Examining it on the intervals $(-\infty, -1)$, $(-1, 0)$, $(0, 2)$ and $(2, \infty)$, we get

$f'(x) < 0$ on $(-\infty, -1)$ because $f'(-2) = -96$,
$f'(x) > 0$ on $(-1, 0)$ because $f'(-\frac{1}{2}) = \frac{15}{2}$,
$f'(x) < 0$ on $(0, 2)$ because $f'(1) = -24$,
and $f'(x) > 0$ on $(2, \infty)$ because $f'(3) = 144$.

The First Derivative Test now tells us that we have local minima at $-1$ and $2$, and a local maximum at $0$.

Example 7.12. What are the global maximum and minimum of $f(x) = x^{2/3}$ on $[-1, 1]$?

Solution. Since $f'(x) = \frac{2}{3}x^{-1/3}$ is undefined at $x = 0$, we have critical points at $x = -1$, $0$ and $1$. The critical values of $f$ are therefore $1$, $0$ and $1$, respectively. Hence the global maximum is $1$ and the global minimum is $0$.

We remark that in this example, $f'(x) < 0$ for $x < 0$ and $f'(x) > 0$ for $x > 0$. Thus, $x = 0$ is a local minimum by the First Derivative Test, even though $f'(0)$ is not defined.
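The sign analysis of Example 7.11 can be mechanised along the following lines. This is a minimal sketch. It assumes that $f'$ is continuous away from the critical points, so that one test point per interval determines its sign there. It also assumes that the test points have been chosen by hand to interleave the critical points. The function name `classify_by_sign_change` is hypothetical.

```python
def classify_by_sign_change(df, critical_points, test_points):
    """Classify critical points with the First Derivative Test.

    Assumes test_points[k] < critical_points[k] < test_points[k + 1], so that
    each critical point has one test point on its left and one on its right.
    """
    results = {}
    for k, c in enumerate(critical_points):
        left, right = df(test_points[k]), df(test_points[k + 1])
        if left < 0 < right:
            results[c] = "local minimum"
        elif left > 0 > right:
            results[c] = "local maximum"
        else:
            results[c] = "no sign change"
    return results

def df(x):
    """f'(x) for f(x) = 3x^4 - 4x^3 - 12x^2 + 5 (Example 7.11)."""
    return 12 * x**3 - 12 * x**2 - 24 * x

print(classify_by_sign_change(df, [-1, 0, 2], [-2, -0.5, 1, 3]))
```

The test points $-2$, $-\frac{1}{2}$, $1$ and $3$ are the same ones used in the solution above.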

7.5 Functions Defined on General Intervals

All of these examples involved functions defined on a closed finite interval. If one has an open, half-open or infinite interval instead, two complications may arise. First, because an endpoint of the interval is not in the domain, we should take the appropriate one-sided limit of $f(x)$ as $x$ approaches the endpoint, rather than just evaluate $f$ at the endpoint. Second, it may happen that $f$ does not have a global maximum or minimum.

Example 7.13. Does $f(x) = 1/x$ have a global minimum or maximum?

Solution. $f$ has no stationary points, as $f'(x) = -1/x^2$ is never zero. In fact, $f$ has no critical points at all. Even though $f'(x)$ is undefined at $x = 0$, $f$ is not defined there either ($0$ is not in the domain of $f$). Moreover, $\lim_{x \to 0^-} f(x) = -\infty$ and $\lim_{x \to 0^+} f(x) = \infty$, so $f$ has neither a global maximum nor a global minimum.

Example 7.14. Does $f(x) = x^2$ have a global maximum or minimum on $(-1, 1)$?

Solution. There is a unique stationary point at $x = 0$ and $f(0) = 0$. This is clearly the global minimum, because $x^2 \ge 0$ and $x^2 = 0$ implies that $x = 0$. As the endpoints $x = \pm 1$ are not in the domain of $f$, we should consider the limits

$$\lim_{x \to -1^+} f(x) = 1 \quad\text{and}\quad \lim_{x \to 1^-} f(x) = 1.$$
Since $f$ is continuous on $(-1, 1)$, it follows that $0 \le f(x) < 1$ there, but there are no points $x$ for which $f(x) = 1$. Thus, $f$ is bounded above by $1$ but does not have a global maximum on $(-1, 1)$.

We remember that functions which are continuous on closed intervals always achieve their extremal values. For (half-)open or infinite intervals, we can sometimes deduce the existence of a global maximum or minimum using the following result.

Fact 7.5. Suppose that $f$ is continuous on $(a, b)$ with

$$\lim_{x \to a^+} f(x) = L \quad\text{and}\quad \lim_{x \to b^-} f(x) = M,$$
where we allow the possibilities $L, M = \pm\infty$. Then:

• If there is a point $u$ in $(a, b)$ such that $f(u) \ge \max\{L, M\}$, then $f$ has a global maximum on $(a, b)$.

• If there is a point $u$ in $(a, b)$ such that $f(u) \le \min\{L, M\}$, then $f$ has a global minimum on $(a, b)$.

Of course, asserting that $f$ has a global extremum on the open interval $(a, b)$ means that $f$ has a critical point there. We used this in the previous example to deduce that $f(x) < 1$ on the intervals $(-1, 0)$ and $(0, 1)$. Suppose $f$ had taken a value of $1$ or greater on either interval. Then we would have known that there was a critical point in that interval, but we had already shown that there were none.

Example 7.15. Find the global extrema of the function $f(x) = x + 4/x$ on the interval $(0, \infty)$.

Solution. First, we note that $\lim_{x \to 0^+} f(x) = \lim_{x \to \infty} f(x) = \infty$, so there cannot be a global maximum. Second, $f$ is continuous on $(0, \infty)$, so the above result with $L = M = \infty$ tells us that $f$ has a global minimum. Searching for critical points, we find no singular points and a single stationary point at $x = 2$, where $f'(x) = 1 - 4/x^2 = 0$ ($x = -2$ is not in the domain). The global minimum is therefore $f(2) = 4$.

Figure 7.2: An example illustrating the meanings of concave up and concave down.

7.6 Concavity and Points of Inflection

The idea of concavity is best demonstrated diagrammatically. In Figure 7.2, we’ve drawn the graph of a function f that is concave up on (a,b) and concave down on (b,c). The next definition makes this concept mathematically precise.

Definition. A function $f$ is said to be concave up on an open interval $(a, b)$ if $f$ is differentiable on $(a, b)$ and $f'$ is strictly increasing there. Similarly, $f$ is said to be concave down on an open interval $(a, b)$ if $f$ is differentiable on $(a, b)$ and $f'$ is strictly decreasing there. Geometrically, one can recognise concavity through the following consequences:

• On concave up intervals, the graph of a function lies above its tangent lines.

• Chords connecting points of concave up intervals lie above the graph.

• On concave down intervals, the graph of a function lies below its tangent lines.

• Chords connecting points of concave down intervals lie below the graph.

• If there is a tangent line to the graph at the point (a, f (a)) and this point separates a concave up from a concave down section, then the curve actually crosses its tangent line at that point.

This last point suggests the concept of an inflection point of a graph.

Definition. A point $(x_0, f(x_0))$ on the graph of a function $f$ is said to be an inflection point if two conditions hold. The graph must have a tangent line at this point, and it must change from being concave up to concave down (or vice versa) as $x$ passes through $x_0$.

Fact 7.6. Let $f$ be a function which can be differentiated twice on $(a, b)$. Then,

• If f ′′(x) > 0 for all a < x < b, then the graph of f is concave up on (a,b).

Figure 7.3: The function $f(x) = x^4$ is concave up everywhere except $x = 0$.

• If f ′′(x) < 0 for all a < x < b, then the graph of f is concave down on (a,b).

• If $x_0$ is a point of inflection for $f$, then $f''(x_0) = 0$.

Note that this theorem tells you where to look for inflection points, namely where $f''$ is zero. However, it does not say that every zero of $f''$ is an inflection point. For example, $f(x) = x^4$ has $f''(x) = 12x^2$, so $f''(0) = 0$. Even so, $f$ does not have an inflection point at $x = 0$, because $y = x^4$ is concave up on both $(-\infty, 0)$ and $(0, \infty)$ (see Figure 7.3).

7.7 The Second Derivative Test

If the graph of a function f is concave up in some interval containing a stationary point, then the stationary point is a local minimum. Similarly, when f is concave down in some interval containing a stationary point, then the stationary point is a local maximum. This means that we can often use the value of the second derivative at the stationary point to decide whether the function has a local minimum or maximum there.

Fact 7.7 (The Second Derivative Test). Let f be twice differentiable. Then:

• If f ′(x)= 0 and f ′′(x) < 0, then f has a local maximum at x.

• If f ′(x)= 0 and f ′′(x) > 0, then f has a local minimum at x.

• If $f'(x) = 0$ and $f''(x) = 0$, then $f$ could have a local maximum, a local minimum or a point of inflection at $x$.

Example 7.16. Let $f(x) = (x^2 - 1)^2$. Find its stationary points and use the Second Derivative Test to classify them.

Solution. First, we calculate that $f'(x) = 4x^3 - 4x$, hence that
$$f''(x) = 12x^2 - 4.$$

Figure 7.4: The graph of the function $f(x) = (x^2 - 1)^2$ analysed in Example 7.16.

Figure 7.5: The graph of the function $f(x) = x^3$ considered in Example 7.17.

$f$ is therefore twice differentiable, and solving $f'(x) = 0$ gives stationary points at $x = -1$, $0$ and $1$. Substituting these values into $f''$, we obtain

$$f''(-1) = 8, \quad f''(0) = -4 \quad\text{and}\quad f''(1) = 8.$$
The Second Derivative Test then tells us that $x = \pm 1$ are local minima and that $x = 0$ is a local maximum. We check this by plotting the graph in Figure 7.4.
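To check the Second Derivative Test numerically, one can approximate $f''(x)$ by the central difference $\big(f(x + h) - 2f(x) + f(x - h)\big)/h^2$. The sketch below does this for Example 7.16 and also tries the function $x^3$ of Example 7.17. The step size `h = 1e-4` is an assumption of this sketch. So is the tolerance `tol`, below which the approximation is treated as zero. A numerical zero only signals that the test is inconclusive, not that $f''$ vanishes exactly.

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def second_derivative_test(f, x, tol=1e-6):
    """Classify a stationary point x of f using the sign of (approximate) f''(x)."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "inconclusive"

def f(x):
    return (x**2 - 1) ** 2   # Example 7.16

for x0 in (-1.0, 0.0, 1.0):
    print(x0, round(second_derivative(f, x0), 3), second_derivative_test(f, x0))

# Example 7.17: f(x) = x^3 at its only stationary point x = 0
print(second_derivative_test(lambda x: x**3, 0.0))
```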

Example 7.17. Find the critical points of $f(x) = x^3$ and characterise them.

Solution. As before, we compute the first two derivatives: $f'(x) = 3x^2$ and $f''(x) = 6x$. As $f$ is clearly twice differentiable, we try the Second Derivative Test. Unfortunately, the only critical point is at $x = 0$ and we find that $f''(0) = 0$. In this case, the Second Derivative Test tells us nothing and we must fall back on the First Derivative Test. We see that $f'(x) > 0$ for $x < 0$ and for $x > 0$, so $x = 0$ is not a local extremum. If we check the concavity on either side of $x = 0$, we find that the graph changes from being concave down to concave up as $x$ passes through $0$. We therefore have a point of inflection for $f$ at $x = 0$ (see Figure 7.5).

7.8 Sketching Graphs

All the ideas that we have covered recently may be put to use when drawing the graph of a function f . Here are a couple of steps to work through when asked to graph an unfamiliar function:

• Check for possible symmetries. For example, is $f$ even ($f(x) = f(-x)$) or odd ($f(x) = -f(-x)$)?

• Find the domain $D$ of $f$ and calculate the left- and right-sided limits at points where $f$ is undefined. This can indicate vertical asymptotes.

• Find $\lim_{x \to \pm\infty} f(x)$ to look for horizontal asymptotes.

• Calculate $f'(x)$ and $f''(x)$. Use them to find and classify the critical points and to locate points of inflection. The sign of $f'$ can also be used to find intervals on which $f$ is one-to-one.

• Calculating the intercepts with the x- and y-axes is often useful (and easy!).

After completing some, or all, of these steps, you will most likely know more than enough to draw a decent sketch of your function f . Your textbook has lots of examples dedicated to curve sketching.

Example 7.18. Sketch the graph of the function $y = (x^2 - 1)^3$.

Solution. This function is even, tending to $\infty$ as $x$ tends to $\pm\infty$. It intercepts the $x$-axis at $x = \pm 1$ and the $y$-axis at $y = -1$. The first derivative is
$$\frac{dy}{dx} = 6x(x^2 - 1)^2 = 6x(x - 1)^2(x + 1)^2,$$
so $f$ has no singular points. The stationary points are at $x = 0$ and $x = \pm 1$, where $y = -1$ and $y = 0$, respectively. The second derivative is

$$\frac{d^2y}{dx^2} = 6(x^2 - 1)^2 + 12x(x^2 - 1) \cdot 2x = 6(x^2 - 1)(5x^2 - 1),$$
so we have candidate points of inflection at $x = \pm 1$ and $x = \pm 1/\sqrt{5}$, where $y = 0$ and $y = -\frac{64}{125}$, respectively.

Evaluating the second derivative at the stationary points, we find that $x = 0$ is a local minimum, but the test is inconclusive at $x = \pm 1$. We therefore resort to the First Derivative Test. The derivative at $x = \frac{1}{2}$ is positive, as is the derivative at $x = 2$. So $x = 1$ is not a local extremum and, because the function is even, $x = -1$ is not one either.

We now analyse the candidate points of inflection. The second derivative is positive at $x = 0$, negative at $x = \frac{1}{2}$ and positive at $x = 2$. Therefore, both $x = 1$ and $x = 1/\sqrt{5}$ are points of inflection. Evenness allows us to conclude the same for $x = -1$ and $x = -1/\sqrt{5}$. This pretty much exhausts the information we can extract. Our sketch for the graph is given in Figure 7.6.
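The concavity analysis in Example 7.18 amounts to a sign chart for $y''$. A minimal Python sketch follows. It assumes $y''$ is continuous, so that its sign between consecutive candidate points can be read off at a single test point. The lists `candidates` and `test_points` are illustrative choices, not part of the notes.

```python
def d2y(x):
    """Second derivative of y = (x^2 - 1)^3 from Example 7.18."""
    return 6 * (x**2 - 1) * (5 * x**2 - 1)

# Candidate inflection points in increasing order, interleaved with test points:
# test_points[k] < candidates[k] < test_points[k + 1].
candidates = [-1.0, -1 / 5**0.5, 1 / 5**0.5, 1.0]
test_points = [-2.0, -0.5, 0.0, 0.5, 2.0]

inflection = {}
for k, c in enumerate(candidates):
    left, right = d2y(test_points[k]), d2y(test_points[k + 1])
    # A change in the sign of y'' across c means a change of concavity at c.
    inflection[c] = (left > 0) != (right > 0)

for c, changes in inflection.items():
    print(f"x = {c:+.4f}: {'inflection point' if changes else 'no change of concavity'}")
```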

Figure 7.6: A sketch of the graph of the function considered in Example 7.18.

7.9 Optimisation

In optimisation problems (see your textbook for a more detailed version), one generally has to:

1. Model the quantity to be optimised with a function f of a suitable variable x.

2. Determine a suitable domain for x.

3. Find the maxima and/or minima of f for values of x in the domain.

4. Re-interpret the (mathematical) results in the language of the original problem.

We are going to see several examples.

Example 7.19. Assume that the cost of running a certain truck (excluding the driver’s wages) is modelled by (10 + v/2)¢/km when the truck travels at a speed of v kilometres per hour. The driver has to be paid \$12.50 per hour, and the maximum permissible speed is 55 km/h. What is the most economical speed at which to operate the truck on a route of fixed distance between two cities?

Solution. We begin by letting $d > 0$ be the distance between the two cities. The two costs involved are
$$c_1 = \left(10 + \frac{v}{2}\right)\text{¢/km} = \$\left(0.1 + \frac{v}{200}\right)/\text{km} \quad\text{and}\quad c_2 = \$12.5/\text{h},$$
being the expense of running the truck and that of paying the driver, respectively. If the trip takes $t$ hours, then our total expense is
$$c = c_1 d + c_2 t = \$\left[\left(0.1 + \frac{v}{200}\right)d + 12.5t\right].$$
Because $d = vt$ and $d$ is constant, we can rewrite $c$ as a function of the variable $v$:
$$c(v) = \$\left[\left(0.1 + \frac{v}{200}\right)d + 12.5\,\frac{d}{v}\right].$$
To find the most economical speed at which to operate the truck, we look for the global minimum of $c(v)$ on the interval $(0, 55]$. Now, $c(v)$ is continuous on this interval and its first derivative is
$$c'(v) = \frac{d}{200} - \frac{12.5d}{v^2} = \frac{d}{200} \cdot \frac{v^2 - 2500}{v^2}.$$
The stationary points are therefore at $v = \pm 50$, and only $v = 50$ lies in $(0, 55]$. Because $c(v)$ has no singular points on $(0, 55]$, the global minimum must be either at $v = 50$ or at the endpoint $v = 55$. This holds provided that $c(v)$ does not approach a smaller value as $v \to 0^+$, since $v = 0$ itself is not in the domain. Evaluating $c(v)$ appropriately gives

$$\lim_{v \to 0^+} c(v) = \infty, \quad c(50) = 0.6000d \quad\text{and}\quad c(55) \approx 0.6023d.$$
The most economical speed at which to operate the truck is therefore 50 km/h.

Example 7.20. Find the largest possible volume for a right circular cylinder that is inscribed within a sphere of radius $R$.

Solution. First, we must find the volume of any cylinder inscribed in this manner. Let the radius of the cylinder be $r$. As the radius of the sphere is $R$, Pythagoras’ Theorem gives the height of the cylinder as $2\sqrt{R^2 - r^2}$ (see Figure 7.7). The volume of the cylinder is then
$$V(r) = 2\pi r^2\sqrt{R^2 - r^2}.$$
The domain of $V$ is the interval $[0, R]$, on which it is continuous. Differentiating, we obtain

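$$V'(r) = 2\pi\left(2r\sqrt{R^2 - r^2} - r^2 \cdot \frac{1}{2} \cdot \frac{2r}{\sqrt{R^2 - r^2}}\right) = \frac{2\pi r}{\sqrt{R^2 - r^2}}\left(2R^2 - 3r^2\right).$$
There is therefore a critical point at $r = \sqrt{2/3}\,R$. Evaluating $V(r)$ here and at each of the endpoints gives
$$V(0) = 0, \quad V\!\left(\frac{\sqrt{2}R}{\sqrt{3}}\right) = \frac{4}{3\sqrt{3}}\pi R^3 \quad\text{and}\quad V(R) = 0.$$
The volume of the sphere is $\frac{4}{3}\pi R^3$. We conclude that the maximum volume you can achieve when inscribing a cylinder within a sphere is $1/\sqrt{3}$ times that of the sphere.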
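As a numerical cross-check on Example 7.20, the sketch below locates the maximiser of $V(r)$ with ternary search. This is a standard technique for functions that increase and then decrease on an interval, as $V$ does on $[0, R]$ by the sign of $V'$. Ternary search is not part of these notes and is swapped in here only for illustration. The radius `R = 1.0` is an arbitrary choice, since the ratio of volumes does not depend on it.

```python
import math

def ternary_search_max(g, lo, hi, iterations=200):
    """Locate the maximiser of a function g that increases then decreases on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1   # the maximum cannot lie in [lo, m1)
        else:
            hi = m2   # the maximum cannot lie in (m2, hi]
    return (lo + hi) / 2

R = 1.0  # sphere radius (arbitrary: the volume ratio is scale-free)

def volume(r):
    # V(r) = 2*pi*r^2*sqrt(R^2 - r^2); max() guards against tiny negative rounding
    return 2 * math.pi * r**2 * math.sqrt(max(R**2 - r**2, 0.0))

r_star = ternary_search_max(volume, 0.0, R)
sphere = 4 / 3 * math.pi * R**3
print(r_star, math.sqrt(2 / 3) * R)               # numerical vs exact maximiser
print(volume(r_star) / sphere, 1 / math.sqrt(3))  # ratio of volumes
```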

Example 7.21. You wish to fence in a rectangular pasture of 30000 m², with one side of the fence against a neighbour’s lot. Your neighbour agrees to pay half the cost of this piece of the fence. What dimensions should you choose for the pasture in order to minimise your costs?

Solution. Let the cost of fencing be p dollars per metre and let the dimensions of the pasture be x metres by y metres deep, so that the length of the fence you share with your neighbour is x metres long. The total cost of the fence is then

$$c = \$\left(xp + \frac{1}{2}xp + yp + yp\right) = \$\left(\frac{3}{2}x + 2y\right)p.$$
Since the area of the pasture fixes $xy = 30000$, we can rewrite the cost as a function of $x$:
$$c(x) = \left(\frac{3}{2}x + \frac{60000}{x}\right)p.$$

Figure 7.7: The geometry of Example 7.20, a cylinder inscribed within a sphere.

The domain is $(0, \infty)$. Again, we compute the first derivative,
$$c'(x) = \frac{3p(x^2 - 40000)}{2x^2},$$
to find the stationary point at $x = 200$ (the root $x = -200$ is not in the domain). We have $\lim_{x \to 0^+} c(x) = \lim_{x \to \infty} c(x) = \infty$, and there are no other critical points on $(0, \infty)$. Hence $x = 200$ is the global minimum. The dimensions of the pasture which minimise the cost are then $x = 200$ metres by $y = 150$ metres.

Example 7.22. A lighthouse at position L is 5km from a straight stretch of beach. There is also a station on the beach at position B which is 10km from the point on the beach which is closest to the lighthouse. The lighthouse and the station are to be joined by a cable whose cost is 5000 dollars per kilometre for underwater cable and 3000 dollars per kilometre for overland cable. Find the minimum cost of laying this cable.

Solution. Let $A$ be the point on the beach closest to $L$. Let $C$ be an arbitrary point on the beach, $x$ kilometres from $A$ and $10 - x$ kilometres from $B$. Referring to the drawing below, consider the cost $T(x)$ of laying cable from $L$ to $B$ via $C$.

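[Drawing: the lighthouse $L$ lies 5 km from $A$. The points $A$, $C$ and $B$ lie in that order along the beach, with $AC = x$ and $CB = 10 - x$. The underwater cable $LC$ has length $\sqrt{x^2 + 25}$.]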

The function we have to minimise is therefore

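$$T(x) = 5000\sqrt{x^2 + 25} + 3000(10 - x) \qquad (0 \le x \le 10).$$
We differentiate as usual:
$$T'(x) = 5000 \cdot \frac{2x}{2\sqrt{x^2 + 25}} - 3000 = \frac{1000}{\sqrt{x^2 + 25}}\left(5x - 3\sqrt{x^2 + 25}\right).$$
There are no singular points. So the minimum we seek is either at the endpoints $x = 0$, $x = 10$ or at the stationary points, which are obtained by solving
$$5x = 3\sqrt{x^2 + 25} \quad\Rightarrow\quad 25x^2 = 9x^2 + 225 \quad\Rightarrow\quad x = \pm\frac{15}{4}.$$
The point $x = -\frac{15}{4}$ is discarded. It lies outside $[0, 10]$, and it does not satisfy $5x = 3\sqrt{x^2 + 25}$, whose right-hand side is positive; it was introduced by squaring. Evaluating $T$ at the critical points $x = 0$, $x = \frac{15}{4}$ and $x = 10$ then gives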

$$T(0) = 55000, \quad T\!\left(\tfrac{15}{4}\right) = 50000 \quad\text{and}\quad T(10) \approx 55901.70.$$
So the cost of laying the cable is minimised when $C$ is situated 3.75 km from $A$, and the minimal cost is \$50000.
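The arithmetic at the end of Example 7.22 is easy to verify. This sketch evaluates $T$ at the three critical points. As an extra sanity check that is not part of the original solution, it also compares the stationary value against $T$ on a grid of 10001 equally spaced points in $[0, 10]$.

```python
import math

def cable_cost(x):
    """T(x) in dollars: underwater cable from L to C, then overland cable from C to B."""
    return 5000 * math.sqrt(x**2 + 25) + 3000 * (10 - x)

candidates = [0.0, 15 / 4, 10.0]   # the two endpoints and the stationary point
for x in candidates:
    print(f"T({x}) = {cable_cost(x):.2f}")

# Sanity check: no grid point in [0, 10] should beat the stationary point.
grid_min = min(cable_cost(10 * k / 10000) for k in range(10001))
print(grid_min >= cable_cost(15 / 4) - 1e-6)
```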

7.10 Roots of Functions: Newton’s Method

Newton’s method provides an iterative method for finding roots of functions, that is, solutions to equations of the form $f(x) = 0$. Newton’s method and other variants are frequently used in software packages such as Maple and Mathematica. To apply Newton’s method, we begin by choosing a value $x_0$ in the domain of $f$. We then form the point $P_0 = (x_0, f(x_0))$ on the curve $y = f(x)$. The equation of the tangent line to the curve at $P_0$ is given by
$$\frac{y - f(x_0)}{x - x_0} = f'(x_0),$$
and this line will intersect the $x$-axis where $y = 0$. Let $x_1$ be the value of $x$ at this intersection. Then,
$$\frac{0 - f(x_0)}{x_1 - x_0} = f'(x_0) \quad\Rightarrow\quad x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
Now repeat this process, using $x_1$ instead of $x_0$ as the starting point. We obtain a tangent line that intersects the $x$-axis at $x = x_2$, where

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$

In general, iterating this procedure leads to a sequence $x_0, x_1, x_2, \ldots$ whose entries are related by
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The only proviso is that we must insist that $f'(x_n) \ne 0$ for every $n$.
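The iteration can be written as a short Python function. This is a minimal sketch. The stopping rule (stop once $|x_{n+1} - x_n|$ is below a tolerance `tol`), the cap `max_iter` on the number of steps and the function name `newton` are all choices made for illustration. The usage line applies it to $f(x) = \cos x - x$ with $x_0 = 1$, the function of Example 7.23.

```python
import math

def newton(f, df, x0, tol=1e-8, max_iter=50):
    """Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n).

    Stops when successive iterates differ by less than tol. Raises an error if
    f'(x_n) = 0 (the step is undefined) or if max_iter steps pass without
    meeting the tolerance.
    """
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}; Newton step undefined")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

# f(x) = cos x - x, f'(x) = -sin x - 1, starting from x0 = 1
root = newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1, 1.0)
print(f"{root:.8f}")
```

The same function approximates $\sqrt{10}$ if it is given $f(x) = x^2 - 10$, $f'(x) = 2x$ and $x_0 = 3$.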

Figure 7.8: An illustration showing why Newton’s method for finding roots might be expected to converge.

Now, suppose this iteration stabilises at some point, meaning that $x_{n+1} = x_n$ for some $n$. Then
$$x_n = x_n - \frac{f(x_n)}{f'(x_n)} \quad\Rightarrow\quad f(x_n) = 0.$$
In other words, $x_n$ is a root of $f$. Unfortunately, we are extremely unlikely to choose $x_0$ so that the iteration stabilises. We can hope, though, that the sequence $(x_n)$ converges, meaning that repeatedly iterating gets us closer and closer to a true root. From a geometric perspective, at each stage we follow the tangent down to the $x$-axis. We hope that this brings us closer and closer to the point where the curve intersects the $x$-axis (the root where $f(x) = 0$). This idea is illustrated in Figure 7.8.

After $n$ applications of Newton’s method, you will have constructed a sequence $(x_0, x_1, \ldots, x_n)$ of approximations to a root of $f(x)$, hopefully each one improving in accuracy. We need some kind of condition for when we should stop. We will often say that we stop when $|x_{n+1} - x_n|$ is within some desired degree of accuracy.

The most important step in applying Newton’s method is the choice of the initial value $x_0$. A good choice will lead to a sequence that converges to a root quite quickly, while a bad choice could lead to major problems. One obvious problem is that one of the $x_n$ could have $f'(x_n) = 0$, in which case $x_{n+1}$ is undefined. A less obvious possibility is that a bad choice could leave you oscillating around the root and not converging, or even diverging to $\pm\infty$. One method that will usually avoid these problems is to find an interval $[a, b]$ on which $f$ is continuous and on which $f(a)$ and $f(b)$ have different signs. The Intermediate Value Theorem then guarantees that there is a root in $[a, b]$. Suppose we can also choose $[a, b]$ so that $f'(x)$ does not change sign in this interval. Then the sequence of approximate roots often converges quickly if we take $x_0 \in [a, b]$.

Example 7.23. Let $f(x) = \cos x - x$. Use Newton’s method, with $x_0 = 1$, to find an approximate solution to $f(x) = 0$.

Solution. First, we compute that $f'(x) = -\sin x - 1$, so that Newton’s method amounts to the recurrence relation
$$x_{n+1} = x_n + \frac{\cos x_n - x_n}{\sin x_n + 1}.$$
Using this relation with $x_0 = 1$, we compute the following approximate roots:

n   : 0           1           2           3           4           ⋯
x_n : 1.00000000  0.75036387  0.73911289  0.73908513  0.73908513  ⋯

Here, $x_3$ and $x_4$ agree to 8 decimal places, so we decide that $x = 0.73908513$ is a sufficiently good approximate solution to $f(x) = 0$. In fact, this value of $x$ gives $f(x) = 5.383 \times 10^{-9}$.

Example 7.24. Use Newton’s method to approximate $\sqrt{10}$.

Solution. Note that if $x = \sqrt{10}$, then $x^2 - 10 = 0$. We may pose this as a root-finding problem by setting $f(x) = x^2 - 10$ and solving $f(x) = 0$. The first derivative is $f'(x) = 2x$, so the iteration formula becomes
$$x_{n+1} = x_n - \frac{x_n^2 - 10}{2x_n}.$$
Starting with $x_0 = 3$, we obtain the following values:

n   : 0         1         2         3         4         ⋯
x_n : 3.000000  3.166667  3.162281  3.162278  3.162278  ⋯

We conclude that $\sqrt{10} \approx 3.162278$.

In fact, the ancient Babylonians used this method successfully to find approximations to $\sqrt{a}$. You should check that Newton’s method in this case amounts to the relation
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right).$$

Example 7.25. Use Newton’s method to attempt to find a root of $f(x) = x^{1/3}$.

Solution. Things are going to go quite badly wrong for this example. To demonstrate how wrong, we will perform the iteration starting with an arbitrary point $x_0$. The first derivative is $f'(x) = \frac{1}{3}x^{-2/3}$, and the sequence of approximations is given by
$$x_{n+1} = x_n - \frac{x_n^{1/3}}{\frac{1}{3}x_n^{-2/3}} = -2x_n.$$

The first few are $x_0, -2x_0, 4x_0, -8x_0, 16x_0, \ldots$. These are clearly never going to converge to the actual root $x = 0$ (unless we start with $x_0 = 0$). If we think about the graph of $y = x^{1/3}$, pictured in Figure 7.9, we might become suspicious of the infinite slope of the graph at the root $x = 0$. However, this is not the culprit. Repeating the above exercise with $y = x^{3/5}$, which also has infinite slope at $x = 0$, gives the iteration $x_{n+1} = -\frac{2}{3}x_n$, and this converges to $0$. A closer look reveals the real problem. The correction term $f(x_n)/f'(x_n) = 3x_n$ in the recursion formula is three times as large as $x_n$, so each step overshoots the root and lands twice as far away on the other side. If the sequence $(x_n)$ is going to converge, then we need these “correction terms” to get smaller and smaller. Unfortunately, this method will just fail for functions for which this doesn’t happen. Other methods of solving $f(x) = 0$ must be sought.
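The behaviour of Newton’s method on power functions can be seen directly. Consider a real odd power $f(x) = \operatorname{sign}(x)|x|^a$, such as $x^{1/3}$ or $x^{3/5}$. For such functions the Newton step simplifies to $x_{n+1} = (1 - 1/a)x_n$. The sketch below iterates the method numerically for $a = 1/3$ and for $a = 3/5$, both of which have infinite slope at the root $x = 0$. The helper name `newton_on_odd_power` is hypothetical.

```python
import math

def newton_on_odd_power(a, x0, steps):
    """Iterate Newton's method for f(x) = sign(x) * |x|**a (a real odd power).

    Here f'(x) = a * |x|**(a - 1), so each step is x -> x - x/a = (1 - 1/a) * x.
    The starting point x0 must be nonzero (f' is undefined at 0 when a < 1).
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        f = math.copysign(abs(x) ** a, x)
        df = a * abs(x) ** (a - 1)
        x = x - f / df
        xs.append(x)
    return xs

print(newton_on_odd_power(1 / 3, 1.0, 5))   # roughly 1, -2, 4, -8, 16, -32: diverges
print(newton_on_odd_power(3 / 5, 1.0, 5))   # roughly 1, -2/3, 4/9, ...: converges to 0
```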

Figure 7.9: The graph of the function $y = x^{1/3}$.

Chapter 8

Integration

8.1 Indefinite Integrals

We begin with a (hopefully) familiar definition:

Definition. Given a function $f$ defined on an interval $I$, any function $F$ with the property that

$$F'(x) = f(x) \quad\text{for all } x \text{ in } I$$
is called an antiderivative (or sometimes a primitive) of $f$. We say “an antiderivative” above because if $F$ is an antiderivative of $f$, then so is $F + C$ for any constant function $C$ (recall that $C'$ will then be the zero function). In fact, any two antiderivatives $F$ and $G$ of $f$ will be such that $F - G$ is a constant function, provided only that the domain of $f$ is an interval.

An antiderivative of $f$ on an interval $I$ is denoted by

$$\int f \quad\text{or}\quad \int f(x)\,dx.$$
The integrals appearing here are said to be indefinite because they have no limits of integration. It is important to remember that $\int f$ does not refer to a single function, but rather to a continuous family of functions which differ from each other by a constant. Here are a few simple antiderivatives which you should be able to easily verify for yourself by differentiating:

$$\int 1\,dx = x + C, \qquad \int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad (n \ne -1),$$
$$\int \frac{dx}{x} = \ln|x| + C, \qquad \int e^x\,dx = e^x + C,$$
$$\int \sin x\,dx = -\cos x + C, \qquad \int \cos x\,dx = \sin x + C,$$
$$\int \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1} x + C = -\cos^{-1} x + C', \qquad \int \frac{dx}{1 + x^2} = \tan^{-1} x + C.$$
These can be extended to many other functions by using the properties of the integral. In particular, we have

$$\int \big(f(x) + g(x)\big)\,dx = \int f(x)\,dx + \int g(x)\,dx \quad\text{and}\quad \int a f(x)\,dx = a\int f(x)\,dx \text{ for all } a \text{ in } \mathbb{R}. \tag{8.1}$$

Example 8.1. Find the function $f$ satisfying $f(1) = 2$ whose derivative is $f'(x) = 2x + 3$.

Solution. Integrating $f'$, we obtain
$$f(x) = 2 \cdot \frac{x^2}{2} + 3x + C = x^2 + 3x + C,$$
for some constant $C$. But we are told that $f(1) = 2$, hence
$$2 = 1^2 + 3 \times 1 + C \quad\Rightarrow\quad C = -2,$$
and we conclude that $f(x) = x^2 + 3x - 2$.

8.2 Summation and Σ Notation

Before we turn to definite integration, let us recall some standard notation that will be needed. A shorthand for writing the sum of the first $n$ positive integers is
$$\sum_{i=1}^{n} i.$$
The “dummy variable” $i$ is said to run from $1$ to $n$. The actual symbol used for the dummy variable is irrelevant, because the value of the sum does not depend upon it; it depends only on $n$. In fact, it is not hard to check that
$$\sum_{i=1}^{n} i = \frac{1}{2}n(n + 1).$$
In any case, this means that
$$\sum_{i=1}^{n} i \quad\text{and}\quad \sum_{j=1}^{n} j$$
refer to exactly the same sum. We also note that it can often be useful to change the dummy variable. For example, set $k = i - 1$, so that $i = k + 1$. Then $i = 1$ means $k = 0$ and $i = n$ means $k = n - 1$, so
$$\sum_{i=1}^{n} i = \sum_{k=0}^{n-1} (k + 1).$$
Of course, we can generalise such sums to those of the form
$$\sum_{i=1}^{n} f(i),$$
where $f$ is some function (defined on the positive integers, say). We have already noted the value of the sum when $f(i) = i$. Further examples include

$$\sum_{i=1}^{n} 1 = n, \qquad \sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6} \qquad\text{and}\qquad \sum_{i=0}^{n} r^i = \frac{1 - r^{n+1}}{1 - r} \quad (r \ne 1). \tag{8.2}$$
The most important for us, theoretically, is the last, so it is the only one that we shall justify.

Fact 8.1. If $r \ne 1$, then
$$\sum_{i=0}^{n} r^i = \frac{1 - r^{n+1}}{1 - r}.$$

Proof. Let

$$S_n(r) = \sum_{i=0}^{n} r^i = 1 + r + r^2 + \cdots + r^n \quad\Rightarrow\quad rS_n(r) = r + r^2 + \cdots + r^n + r^{n+1}.$$
Subtracting these two equations now gives

$$(1 - r)S_n(r) = 1 - r^{n+1},$$
from which the result follows on dividing by $1 - r \ne 0$.

The terms $1, r, r^2, \ldots$ appearing in this sum form a geometric sequence, and their sum is called a geometric series. It is characterised by the fact that the ratio of successive terms is a constant $r$. In general, finding explicit formulae for sums is extremely difficult, if not impossible. It is somewhat peculiar, then, that it can sometimes be easier to find explicit formulae for infinite sums, that is, sums with infinitely many terms. For example,

$$\sum_{i=0}^{\infty} r^i = \lim_{n \to \infty} \sum_{i=0}^{n} r^i = \lim_{n \to \infty} \frac{1 - r^{n+1}}{1 - r}. \tag{8.3}$$
Because $r^{n+1} \to 0$ as $n \to \infty$ whenever $|r| < 1$, we can evaluate this limit as
$$\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r} \qquad (|r| < 1). \tag{8.4}$$
Note that the result is somewhat simpler than the result for finite $n$ given in Fact 8.1.

Finally, we remark that the summation symbol $\sum$ shares several properties with the integration symbol $\int$. In particular, the analogue of Equation (8.1) holds:
$$\sum_{i=1}^{n} (a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i \quad\text{and}\quad \sum_{i=1}^{n} (c\,a_i) = c\sum_{i=1}^{n} a_i \text{ for all } c \text{ in } \mathbb{R}.$$

Here, ai and bi are sequences of real numbers indexed by the positive integers i = 1,2,...,n.
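Fact 8.1 and the limit of the partial sums can be checked with exact rational arithmetic. This sketch uses Python’s `fractions.Fraction` so that the comparison is exact. The helper names are hypothetical. The partial sums start at $i = 0$, so for $|r| < 1$ they approach $1/(1 - r)$.

```python
from fractions import Fraction

def geometric_partial_sum(r, n):
    """Sum r^0 + r^1 + ... + r^n term by term."""
    return sum(r**i for i in range(n + 1))

def geometric_closed_form(r, n):
    """(1 - r^(n+1)) / (1 - r), valid for r != 1 (Fact 8.1)."""
    return (1 - r ** (n + 1)) / (1 - r)

r = Fraction(1, 3)
for n in (1, 5, 10):
    # Exact equality, since Fraction arithmetic involves no rounding.
    assert geometric_partial_sum(r, n) == geometric_closed_form(r, n)

# For |r| < 1 the partial sums approach 1 / (1 - r).
print(float(geometric_partial_sum(r, 30)), float(1 / (1 - r)))
```

The closed form also holds for $|r| \ge 1$ (with $r \ne 1$); only the passage to the infinite sum needs $|r| < 1$.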

8.3 Areas and Sums

Let $f$ be a function defined on an interval $[a, b]$ and suppose that $f(x) \ge 0$ for all $a \le x \le b$. Consider the area between the graph of $f$, the $x$-axis and the vertical lines $x = a$ and $x = b$ (the area under the graph, for short). An estimate of this area may be obtained by finding the areas of certain “thin” rectangles under the graph (see Figure 8.1). The following steps make this precise:

• Take a sequence of numbers x0,x1,...,xn, where

$$x_0 = a < x_1 < x_2 < \cdots < x_n = b.$$
This is called a partition of $[a, b]$.

• Calculate the area $A_n(i)$, for $i = 1, 2, \ldots, n$, of the rectangle with corners at $(x_{i-1}, 0)$, $(x_i, 0)$, $(x_{i-1}, f(x_i))$ and $(x_i, f(x_i))$:

$$A_n(i) = f(x_i)(x_i - x_{i-1}).$$

Figure 8.1: The rectangles whose areas define the Riemann sum approximating the area under the graph. Note that this picture doesn’t actually reflect the method given in the text!

• Sum up the areas of all these rectangles to get the estimate for the area $A$ under the graph:
$$A_n = \sum_{i=1}^{n} A_n(i) = \sum_{i=1}^{n} f(x_i)(x_i - x_{i-1}) = \sum_{i=1}^{n} f(x_i)\,\Delta x_i.$$

Here, we have suggestively written $\Delta x_i$ for $x_i - x_{i-1}$. This estimate is called a Riemann sum for $f$ on $[a, b]$.
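The steps above translate directly into code. This minimal sketch evaluates the Riemann sum on an equipartition of $[a, b]$, using the right endpoint $f(x_i)$ of each subinterval as in the text. The helper names are hypothetical. For $f(x) = x^2$ on $[0, 1]$, which is increasing, the result coincides with the upper sum $(n + 1)(2n + 1)/(6n^2)$ of Example 8.2 (with $a = 1$), and it approaches $1/3$ as $n$ grows.

```python
def right_riemann_sum(f, partition):
    """Sum of f(x_i) * (x_i - x_{i-1}) over a partition x_0 < x_1 < ... < x_n."""
    return sum(f(partition[i]) * (partition[i] - partition[i - 1])
               for i in range(1, len(partition)))

def equipartition(a, b, n):
    """The n + 1 equally spaced points x_i = a + i * (b - a) / n."""
    return [a + i * (b - a) / n for i in range(n + 1)]

for n in (10, 100, 1000):
    print(n, right_riemann_sum(lambda x: x * x, equipartition(0.0, 1.0, n)))
```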

As the lengths $\Delta x_i$ get smaller and smaller, and the number $n$ of rectangles increases, we expect that the combined area $A_n$ of the rectangles will get closer and closer to the true area under the curve. This is exactly the way that integration was done in the past! In most cases (but not all), the best partitions to use are the equipartitions, which are those for which the points are equally spaced. In other words,
$$\Delta x_i = \frac{b - a}{n}.$$
In any case, the limit of $A_n$ will not depend on the sequence of partitions used, as long as $f$ is a sufficiently nice function and the $\Delta x_i$ all tend to $0$ as $n \to \infty$.

Now, let’s have a closer look. Suppose that $f$ is a function which is continuous on an interval $[a, b]$ (we no longer require that $f(x)$ is necessarily non-negative). As $f$ is continuous on the closed interval $[x_{i-1}, x_i]$, there are points $\ell_i$ and $u_i$ in $[x_{i-1}, x_i]$ at which $f$ takes its minimal and maximal values, respectively, for that interval. That is,

$$f(\ell_i) \le f(x) \le f(u_i) \quad\text{for all } x \text{ in } [x_{i-1}, x_i].$$
We can now define
$$L_n(f) = \sum_{i=1}^{n} f(\ell_i)\,\Delta x_i = f(\ell_1)(x_1 - x_0) + f(\ell_2)(x_2 - x_1) + \cdots + f(\ell_n)(x_n - x_{n-1}),$$
the lower sum for $f$ over $[a, b]$. This, of course, depends upon the partition we are using, not just on $n$, but we will not bother to make this dependence explicit. Similarly, the upper sum is given by
$$U_n(f) = \sum_{i=1}^{n} f(u_i)\,\Delta x_i = f(u_1)(x_1 - x_0) + f(u_2)(x_2 - x_1) + \cdots + f(u_n)(x_n - x_{n-1}).$$

Figure 8.2: A lower and an upper Riemann sum for the function $f(x) = x^2 + 1$.

Suppose $f(x) \ge 0$ for all $x$ in $[a, b]$. Then the lower sum $L_n(f)$ is precisely the sum of the areas of the rectangles that just fit under the graph of $f$. The upper sum $U_n(f)$ is the sum of the areas of the rectangles that just fit over the graph of $f$ (see Figure 8.2).

Example 8.2. Find the upper and lower sums of $f(x) = x^2$ on the interval $[0, a]$, where $a > 0$, using an equipartition of $n + 1$ points (so $n$ rectangles).

Solution. The equipartition corresponds to taking
$$x_i = \frac{ia}{n} \quad (i = 0, 1, \ldots, n) \quad\Rightarrow\quad \Delta x_i = \frac{a}{n}.$$

Now, $f$ has positive derivative on $(0, a)$, so $f$ is strictly increasing on $[0, a]$. Thus, $\ell_i = x_{i-1}$ and $u_i = x_i$. The upper and lower sums are therefore given by
$$L_n(f) = \sum_{i=1}^{n} f(x_{i-1})\,\Delta x_i = \sum_{i=1}^{n} x_{i-1}^2 \cdot \frac{a}{n} = \sum_{i=1}^{n} \frac{a^2}{n^2}(i - 1)^2 \cdot \frac{a}{n} = \frac{a^3}{n^3}\sum_{i=0}^{n-1} i^2 = \frac{a^3}{n^3} \cdot \frac{(n - 1)n(2n - 1)}{6} = \frac{a^3(n - 1)(2n - 1)}{6n^2}$$
and
$$U_n(f) = \sum_{i=1}^{n} f(x_i)\,\Delta x_i = \sum_{i=1}^{n} x_i^2 \cdot \frac{a}{n} = \sum_{i=1}^{n} \frac{a^2}{n^2} i^2 \cdot \frac{a}{n} = \frac{a^3}{n^3}\sum_{i=1}^{n} i^2 = \frac{a^3}{n^3} \cdot \frac{n(n + 1)(2n + 1)}{6} = \frac{a^3(n + 1)(2n + 1)}{6n^2}.$$
Here, we have used Equation (8.2) to evaluate these sums explicitly.
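The closed forms in Example 8.2 can be checked against a direct computation. This sketch uses exact `Fraction` arithmetic with the illustrative (hypothetical) choices $a = 2$ and $n = 7$. It relies on $f(x) = x^2$ being increasing on $[0, a]$, so that the minimum and maximum on each subinterval occur at its left and right endpoints. The value $a^3/3$ is the exact area under the graph, and it lies between the two sums.

```python
from fractions import Fraction

def lower_upper_sums_x_squared(a, n):
    """Lower and upper sums of f(x) = x^2 on [0, a] for the equipartition into n parts.

    f is increasing on [0, a], so on [x_{i-1}, x_i] the minimum is at x_{i-1}
    and the maximum is at x_i.
    """
    dx = Fraction(a) / n
    xs = [i * dx for i in range(n + 1)]
    lower = sum(xs[i - 1] ** 2 * dx for i in range(1, n + 1))
    upper = sum(xs[i] ** 2 * dx for i in range(1, n + 1))
    return lower, upper

a, n = 2, 7
lower, upper = lower_upper_sums_x_squared(a, n)
# Closed forms from Example 8.2
assert lower == Fraction(a**3 * (n - 1) * (2 * n - 1), 6 * n**2)
assert upper == Fraction(a**3 * (n + 1) * (2 * n + 1), 6 * n**2)
print(lower, upper, Fraction(a**3, 3))
```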

8.4 Definite Integrals

It is clear that for a given partition, the lower sum is less than or equal to the upper sum. In fact, more is true: the lower sum for any given partition is less than or equal to the upper sum for any other partition! The set of lower sums is therefore bounded above (by any upper sum), so it has a least upper bound, which we will call $\mu$:
$$L_n(f) \leq \mu \quad \text{for all } n \text{ and all partitions into } n \text{ parts.}$$

A similar discussion shows that the set of all upper sums has a greatest lower bound, say $\nu$:

$$\nu \leq U_n(f) \quad \text{for all } n \text{ and all partitions into } n \text{ parts.}$$

Of course, we must always have $\mu \leq \nu$. For nice functions, including all those which we will meet in this course, we actually have $\mu = \nu$. This common value is the unique real number that lies between every lower sum and every upper sum. We define the definite integral of $f$ over $[a,b]$ to be this common value $\mu = \nu$ and denote it, as you no doubt are expecting, by

$$\int_a^b f(x)\,dx \quad \text{or just by} \quad \int_a^b f.$$
Thus, we arrive at the following definition:

Definition. The definite integral of a continuous function $f$ over a closed interval $[a,b]$ is the unique real number $\int_a^b f(x)\,dx$ satisfying
$$L_n(f) \leq \int_a^b f(x)\,dx \leq U_n(f),$$
where the lower and upper sums correspond to arbitrary partitions $a = x_0 < x_1 < x_2 < \cdots < x_n = b$ of $[a,b]$.

We can now use the definite integral to define what is meant by the "area under a graph". Namely, we say that:

Definition. If $f$ is a continuous, non-negative function on an interval $[a,b]$, then the area under the graph of $f$ is defined to be
$$\int_a^b f(x)\,dx.$$
Similarly, if $f(x) \geq g(x)$ for all $x$ in an interval $[a,b]$, then we define the area between the curves to be
$$\int_a^b \bigl(f(x) - g(x)\bigr)\,dx.$$

Here is some terminology that is worth becoming familiar with when discussing definite integrals $\int_a^b f(x)\,dx$.

• The function $f$ is the integrand. It is what is being integrated.

• The variable x is the dummy variable or the variable of integration. We say that we are integrating f with respect to x.

• $dx$ is the differential of $x$. If an integrand depends on more than one variable, then the differential tells you which variable is being integrated over.

• The numbers a and b are the lower and upper limits of integration, respectively. We say that f is being integrated from a to b or that f is integrated over [a,b].

• When $\int_a^b f(x)\,dx$ exists, meaning it is a finite number, we say that $f$ is integrable on $[a,b]$.

It is worth emphasising that in the above definition of a definite integral, we have assumed that $a < b$. However, it is convenient to give a meaning to the symbol

$$\int_a^b f(x)\,dx$$
when $a \geq b$. We therefore agree that

$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \quad \Rightarrow \quad \int_a^a f(x)\,dx = 0.$$

8.5 Implementing the Definition of Definite Integration

To actually use the definition of the definite integral when f is an elementary function, we can usually proceed as follows:

1. Consider the equipartition of $[a,b]$ given by the $n+1$ points
$$x_i = a + \frac{b-a}{n}\,i \quad (0 \leq i \leq n) \quad \Rightarrow \quad \Delta x_i = \frac{b-a}{n}.$$

2. Find formulae for the lower and upper sums Ln( f ) and Un( f ) (this is the hardest part in general).

3. Show that $\lim_{n \to \infty} L_n(f) = \lim_{n \to \infty} U_n(f)$.

4. This common limit is the definite integral $\int_a^b f(x)\,dx$.

Example 8.3. Find the area under the graph of the function $f(x) = c$ over the interval $[a,b]$ using the definition of integrability.

Solution. Using the (n+1)-point equipartition of [a,b], we compute that the lower sum is given by

$$L_n(f) = \sum_{i=1}^{n} f(x_{i-1})\,\Delta x_i = \sum_{i=1}^{n} c \cdot \frac{b-a}{n} = c\,\frac{b-a}{n} \sum_{i=1}^{n} 1 = c\,\frac{b-a}{n} \cdot n = c(b-a).$$

Similarly, we compute that $U_n(f) = c(b-a)$ for all $n$. As the lower and upper sums coincide, we conclude that the definite integral exists and is given by

$$\int_a^b c\,dx = c(b-a).$$
This is therefore the area under the graph. The result agrees with our expectation for this area: after all, the region under a constant function is just a rectangle. If we had defined the area under a graph in such a way that the area of a rectangle was not width times height, then we would have a hard time selling the definition to anyone! Here is a less trivial example:

Example 8.4. Find $\int_0^a x^2\,dx$ using the definition of integrability.

Solution. We have already computed, in Example 8.2, the upper and lower sums of $f(x) = x^2$ corresponding to an equipartition of $[0,a]$ into $n+1$ points. The results were
$$L_n(f) = \frac{a^3(n-1)(2n-1)}{6n^2} \quad \text{and} \quad U_n(f) = \frac{a^3(n+1)(2n+1)}{6n^2}.$$

We therefore compute the limit of $L_n(f)$ as $n \to \infty$:
$$\lim_{n \to \infty} L_n(f) = \lim_{n \to \infty} \frac{a^3(n-1)(2n-1)}{6n^2} = a^3 \lim_{n \to \infty} \frac{\left(1 - \frac{1}{n}\right)\left(2 - \frac{1}{n}\right)}{6} = a^3 \cdot \frac{1 \times 2}{6} = \frac{a^3}{3}.$$
A similar calculation gives $\lim_{n \to \infty} U_n(f) = \frac{1}{3}a^3$ as well, hence we conclude that
$$\int_0^a x^2\,dx = \frac{a^3}{3}.$$

8.6 Properties of the Definite Integral

Here are the most important properties possessed by the definite integral:

• If $a$, $b$, $c$ all belong to an interval on which $f$ is continuous, then
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

• If $k$ is a real constant, then
$$\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx.$$

• If $f$ and $g$ are both continuous on $[a,b]$, then
$$\int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$

• If $f(x) \leq g(x)$ for all $x$ in $[a,b]$, then
$$\int_a^b f(x)\,dx \leq \int_a^b g(x)\,dx.$$

• The following property is a version of the triangle inequality:
$$\left| \int_a^b f(x)\,dx \right| \leq \int_a^b |f(x)|\,dx.$$

• If $f$ is odd on the interval $[-a,a]$, then
$$\int_{-a}^{a} f(x)\,dx = 0.$$
If $f$ is even on $[-a,a]$, then
$$\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx.$$

8.7 The Fundamental Theorem of Calculus

Recall the Mean Value Theorem (Fact 5.4 from Section 5.9). It has a version involving integrals which makes the appellation “Mean Value” much more comprehensible. First, we define what we mean by “mean value”:

Definition. The mean or average value of an integrable function f over an interval [a,b] is

$$\bar{f} = \frac{1}{b-a} \int_a^b f(x)\,dx.$$

Fact 8.2 (Mean Value Theorem for Integrals). If $f$ is continuous and integrable on $[a,b]$, then there exists a $c$ in $(a,b)$ for which

$$f(c) = \bar{f} = \frac{1}{b-a} \int_a^b f(x)\,dx.$$
This version of the Mean Value Theorem therefore says that $f$ takes on its average value over $[a,b]$ at some point $c$ with $a < c < b$.
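Fact 8.2 only asserts that such a $c$ exists. The numerical sketch below makes two assumptions: that $f$ is continuous, and that $f(a) - \bar{f}$ and $f(b) - \bar{f}$ have opposite signs (true, for instance, when $f$ is strictly monotonic). It approximates $\bar{f}$ with the composite midpoint rule, a standard numerical method that is not part of these notes, and then locates one such $c$ by bisection. The helper names `mean_value` and `find_c` are our own. The usage lines anticipate Example 8.5 below.

```python
import math


def mean_value(f, a: float, b: float, n: int = 200_000) -> float:
    """Approximate (1/(b - a)) * (integral of f over [a, b]) by the composite midpoint rule."""
    h = (b - a) / n
    total = sum(f(a + (i + 0.5) * h) for i in range(n))
    return total * h / (b - a)


def find_c(f, a: float, b: float, target: float, tol: float = 1e-12) -> float:
    """Bisection for a point c in [a, b] with f(c) = target.

    Assumes f is continuous and that f(a) - target and f(b) - target differ in sign."""
    lo, hi = a, b
    g_lo = f(lo) - target
    if g_lo * (f(hi) - target) > 0:
        raise ValueError("f(a) - target and f(b) - target must have opposite signs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - target
        if g_lo * g_mid <= 0:      # sign change in [lo, mid]
            hi = mid
        else:                      # sign change in [mid, hi]
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


avg = mean_value(math.sqrt, 0.0, 4.0)
c = find_c(math.sqrt, 0.0, 4.0, avg)
print(avg, c)  # close to 4/3 and 16/9 respectively
```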

Example 8.5. Find the average value of $f(x) = \sqrt{x}$ on $[0,4]$ and find $c$ in $(0,4)$ such that $f(c)$ is this average value.

Solution. The average is given by

$$\bar{f} = \frac{1}{4} \int_0^4 x^{1/2}\,dx = \frac{1}{4}\left[\frac{2}{3}\,x^{3/2}\right]_0^4 = \frac{4}{3}.$$
This is clearly the value of $f(c) = \sqrt{c}$ when $c = \frac{16}{9}$.

The Mean Value Theorem for derivatives is quite an important result theoretically, and the Mean Value Theorem for integrals is similarly important. However, our motivation for introducing it here is that it can be used to prove a far more fundamental result.

Fact 8.3 (Fundamental Theorem of Calculus).

1. If f is continuous on an (open) interval I containing a, define a function F on I by

$$F(x) = \int_a^x f(t)\,dt.$$
Then, $F$ is differentiable on $I$ and
$$F'(x) = \frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$
In other words, $F$ is an antiderivative of $f$ on $I$.

2. If G is any antiderivative of f on I, then

$$\int_a^b f(x)\,dx = G(b) - G(a) \quad \text{for all } b \text{ in } I.$$

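Proof. To prove 1, use the definition of the derivative and the Mean Value Theorem for integrals as follows:
$$\begin{aligned}
F'(x) &= \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} \\
&= \lim_{h \to 0} \frac{1}{h}\left( \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right) \\
&= \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt \\
&= \lim_{h \to 0} f(c) && \text{(for some } c \text{ between } x \text{ and } x+h\text{)} \\
&= \lim_{c \to x} f(c) && \text{(since } c \to x \text{ as } h \to 0\text{)} \\
&= f(x) && \text{(since } f \text{ is continuous).}
\end{aligned}$$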

For 2, note that if G is an arbitrary antiderivative of f on I, then

$$G'(x) = f(x) \quad \text{and} \quad F(x) = G(x) + C,$$
for some constant $C$ (since $F$ is also an antiderivative). Hence,
$$\int_a^x f(t)\,dt = F(x) = G(x) + C.$$
Substituting $x = a$ into this equation now gives

$$0 = G(a) + C \quad \Rightarrow \quad C = -G(a).$$
Consequently,
$$\int_a^x f(t)\,dt = G(x) - G(a).$$
Since this is true for all $x$ in $I$, we may now substitute $x = b$ and the proof is complete.

The Fundamental Theorem of Calculus is so named because it is fundamental to the entire subject: it relates the two most important operations of calculus, differentiating and integrating. Before giving examples, we pause to remark that the statement of the Fundamental Theorem of Calculus may be generalised slightly. Suppose that $g(x)$ and $h(x)$ are differentiable functions. Then, we can derive the following relation that is sometimes useful:

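$$\begin{aligned}
\frac{d}{dx} \int_{h(x)}^{g(x)} f(t)\,dt &= \frac{d}{dx}\left( \int_{h(x)}^{a} f(t)\,dt + \int_a^{g(x)} f(t)\,dt \right) \\
&= \frac{d}{dx}\left( -\int_a^{h(x)} f(t)\,dt + \int_a^{g(x)} f(t)\,dt \right) \\
&= \frac{d}{dg(x)}\left( \int_a^{g(x)} f(t)\,dt \right) \cdot \frac{dg(x)}{dx} - \frac{d}{dh(x)}\left( \int_a^{h(x)} f(t)\,dt \right) \cdot \frac{dh(x)}{dx} \\
&= f\bigl(g(x)\bigr)\,g'(x) - f\bigl(h(x)\bigr)\,h'(x).
\end{aligned} \tag{8.5}$$

Example 8.6. Find the area of the region lying above the line $y = 1$ and below the curve
$$y = \frac{5}{x^2 + 1}.$$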

Figure 8.3: The area between $y = 1$ and $y = 5/(x^2 + 1)$.

Solution. The graphs intersect at the points where
$$\frac{5}{x^2 + 1} = 1 \quad \Rightarrow \quad x = \pm 2,$$
that is, the intersection points are $(-2,1)$ and $(2,1)$ (see Figure 8.3). The area required is therefore given by the definite integral

$$\int_{-2}^{2} \left( \frac{5}{x^2+1} - 1 \right) dx = 2\int_0^2 \frac{5\,dx}{x^2+1} - \int_{-2}^{2} dx = 2\Bigl[ 5\tan^{-1} x \Bigr]_0^2 - \Bigl[ x \Bigr]_{-2}^{2} = 10\tan^{-1} 2 - 4.$$
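As a sanity check on the value $10\tan^{-1} 2 - 4$, the sketch below approximates the same definite integral with the composite Simpson's rule and compares the two results. Simpson's rule is a standard numerical method that is not covered in these notes, and the helpers `simpson` and `height` are our own.

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f from a to b; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


def height(x: float) -> float:
    # vertical extent of the region: curve y = 5/(x^2 + 1) minus line y = 1
    return 5 / (x * x + 1) - 1


numeric = simpson(height, -2.0, 2.0)
exact = 10 * math.atan(2) - 4
print(numeric, exact)  # both approximately 7.0715
```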

Example 8.7. Find the derivative of

$$F(x) = \int_x^3 e^{-t^2}\,dt.$$
Solution. We use the Fundamental Theorem of Calculus:

$$F'(x) = \frac{d}{dx}\left( -\int_3^x e^{-t^2}\,dt \right) = -\frac{d}{dx} \int_3^x e^{-t^2}\,dt = -e^{-x^2}.$$

Example 8.8. Find the derivative of

$$F(x) = \int_{x^2}^{x^3} e^{-t^2}\,dt.$$
Solution. This time, we use Equation (8.5) with $g(x) = x^3$ and $h(x) = x^2$:

$$F'(x) = e^{-(x^3)^2} \cdot 3x^2 - e^{-(x^2)^2} \cdot 2x = 3x^2 e^{-x^6} - 2x e^{-x^4}.$$
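Equation (8.5) can also be checked numerically. The sketch below computes $F(x) = \int_{x^2}^{x^3} e^{-t^2}\,dt$ by quadrature and compares a central-difference estimate of $F'(x)$ with the formula just obtained. Example 8.7 is checked the same way. The composite Simpson's rule, the step size $h = 10^{-5}$ and all helper names are our own choices, not part of the notes.

```python
import math


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule (n even); also valid when b < a, giving a signed integral."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


def gaussian(t: float) -> float:
    return math.exp(-t * t)


def F7(x: float) -> float:  # Example 8.7: integral of e^{-t^2} from x to 3
    return simpson(gaussian, x, 3.0)


def F8(x: float) -> float:  # Example 8.8: integral of e^{-t^2} from x^2 to x^3
    return simpson(gaussian, x**2, x**3)


def dF7(x: float) -> float:
    return -math.exp(-x**2)


def dF8(x: float) -> float:  # Equation (8.5) with g(x) = x^3, h(x) = x^2
    return 3 * x**2 * math.exp(-x**6) - 2 * x * math.exp(-x**4)


def central_difference(func, x: float, h: float = 1e-5) -> float:
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (0.5, 1.0, 1.3):
    print(x, central_difference(F7, x), dF7(x), central_difference(F8, x), dF8(x))
```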

8.8 Definite Integration in Practice

Now that we have the Fundamental Theorem of Calculus, we can calculate definite integrals using antiderivatives rather than directly from the definition (as in Section 8.5). In this section, we illustrate such calculations with area computations. We will discuss other applications in Section 9. We have seen that the area between the graphs of the functions $f$ and $g$ is given by

$$\int_a^b \bigl(f(x) - g(x)\bigr)\,dx,$$
assuming that $f(x) \geq g(x)$ on $[a,b]$. If we do not make this assumption, then the above integral will count some regions as having negative area. In order to make sure that all regions contribute positive area to the total, we should instead compute

$$\int_a^b \bigl| f(x) - g(x) \bigr|\,dx.$$
In practice, this means finding the intervals on which $f(x) - g(x)$ is positive and those on which it is negative. The total area may then be found by evaluating the integral over these intervals separately.

Example 8.9. Find the total area enclosed between the straight line with equation $y = \frac{1}{2}$, the graph $y = \sin x$ and the vertical lines $x = 0$ and $x = 7\pi/3$.

Solution. Here, we are asked to compute the area
$$A = \int_0^{7\pi/3} \left| \sin x - \frac{1}{2} \right| dx.$$

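Referring to Figure 8.4, we find that the graphs intersect when $x = \frac{1}{6}\pi$, $x = \frac{5}{6}\pi$ and $x = \frac{13}{6}\pi$. Therefore,
$$\begin{aligned}
A &= \left( \int_0^{\pi/6} - \int_{\pi/6}^{5\pi/6} + \int_{5\pi/6}^{13\pi/6} - \int_{13\pi/6}^{7\pi/3} \right) \left( \frac{1}{2} - \sin x \right) dx \\
&= \Bigl[ \tfrac{x}{2} + \cos x \Bigr]_0^{\pi/6} + \Bigl[ -\tfrac{x}{2} - \cos x \Bigr]_{\pi/6}^{5\pi/6} + \Bigl[ \tfrac{x}{2} + \cos x \Bigr]_{5\pi/6}^{13\pi/6} + \Bigl[ -\tfrac{x}{2} - \cos x \Bigr]_{13\pi/6}^{7\pi/3} \\
&= \frac{\pi}{3} + 3\sqrt{3} - \frac{3}{2}.
\end{aligned}$$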
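The sketch below gives a numerical cross-check of the closed form $\frac{\pi}{3} + 3\sqrt{3} - \frac{3}{2}$ that results from evaluating the four brackets. It splits $[0, 7\pi/3]$ at the three intersection points, so that $|\sin x - \frac{1}{2}|$ is smooth on each piece, and applies a composite Simpson's rule (our choice of method, with our own helper names) to each piece.

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


def gap(x: float) -> float:
    return abs(math.sin(x) - 0.5)


# Break points: 0, the intersections pi/6, 5pi/6, 13pi/6, and the right end 7pi/3.
breaks = [0.0, math.pi / 6, 5 * math.pi / 6, 13 * math.pi / 6, 7 * math.pi / 3]
numeric = sum(simpson(gap, lo, hi) for lo, hi in zip(breaks, breaks[1:]))
exact = math.pi / 3 + 3 * math.sqrt(3) - 1.5
print(numeric, exact)  # both approximately 4.7433
```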

Example 8.10. Find the area of the region to the right of the parabola $x = y^2 - 12$ and to the left of the line $y = x$.

Solution. While we could try to compute this area in terms of an integral over $x$, it is much easier to instead compute it as an integral over $y$ (see Figure 8.5). Now, the curve $x = y^2 - 12$ and the line $x = y$ intersect when

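$$y = y^2 - 12 \quad \Rightarrow \quad (y-4)(y+3) = 0 \quad \Rightarrow \quad y = 4 \text{ or } y = -3.$$
Moreover, $y \geq y^2 - 12$ for $y$ in $[-3,4]$. Therefore, the area is given by
$$A = \int_{-3}^{4} \bigl( y - (y^2 - 12) \bigr)\,dy = \left[ \frac{y^2}{2} - \frac{y^3}{3} + 12y \right]_{-3}^{4} = \frac{343}{6}.$$

Figure 8.4: The graphs of $y = \sin x$ (in red) and $y = \frac{1}{2}$ (in blue) needed for Example 8.9.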

For comparison, computing A as an integral over x leads to the following expression:

$$A = \int_{-12}^{-3} \Bigl[ \sqrt{12+x} - \bigl(-\sqrt{12+x}\bigr) \Bigr]\,dx + \int_{-3}^{4} \Bigl[ \sqrt{12+x} - x \Bigr]\,dx.$$
This has to give the same result, but it is clearly a more difficult calculation.
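Both expressions for $A$ in Example 8.10 can be evaluated numerically to confirm that they agree with $343/6$. The sketch below uses a composite Simpson's rule (our own helper). Because $\sqrt{12 + x}$ has an infinite slope at $x = -12$, the $x$-integral converges more slowly than the $y$-integral, a small numerical echo of why integrating over $y$ is the better choice.

```python
import math


def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


# Integral over y: right boundary x = y minus left boundary x = y^2 - 12.
area_y = simpson(lambda y: y - (y * y - 12), -3.0, 4.0)

# Integral over x: the two pieces written out above.
area_x = (simpson(lambda x: 2 * math.sqrt(12 + x), -12.0, -3.0)
          + simpson(lambda x: math.sqrt(12 + x) - x, -3.0, 4.0))

print(area_y, area_x, 343 / 6)
```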

8.9 Integrating Piecewise-Continuous Functions

Recall that we discussed piecewise-defined functions in Section 1.5. A special subclass of these functions are those which are said to be piecewise-continuous.

Definition. A function is called piecewise-continuous on an interval if there exists a finite number of points $a_0 < a_1 < a_2 < \cdots < a_n$ so that the function is continuous on each open sub-interval $(a_{i-1}, a_i)$ and has finite left- and right-sided limits at the endpoints $a_{i-1}$ and $a_i$ of each sub-interval.

We refer to Figure 8.6 for an example of a piecewise-continuous function. Basically, this means that the function is continuous at every point of the interval except perhaps at finitely many, and at those exceptional points it has finite one-sided limits (so it may jump, but it cannot blow up). When $f$ is piecewise-continuous, we define the definite integral of $f$ from $a_0$ to $a_n$ to be

$$\int_{a_0}^{a_n} f(x)\,dx = \sum_{i=1}^{n} \int_{a_{i-1}}^{a_i} f(x)\,dx.$$

Figure 8.5: The area computed in Example 8.10.

Figure 8.6: An example of a graph of a piecewise-continuous function.

Example 8.11. Find
$$\int_0^2 f(x)\,dx, \quad \text{where} \quad f(x) = \begin{cases} x^2 & \text{for } 0 \leq x \leq 1, \\ x & \text{for } 1 < x \leq 2. \end{cases}$$
Solution. Here, $f$ is clearly continuous on the sub-intervals $(0,1)$ and $(1,2)$, and the left- and right-sided limits at $x = 0$, $x = 1$ and $x = 2$ clearly exist. Thus,

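$$\int_0^2 f(x)\,dx = \int_0^1 x^2\,dx + \int_1^2 x\,dx = \left[ \frac{x^3}{3} \right]_0^1 + \left[ \frac{x^2}{2} \right]_1^2 = \frac{1}{3} + \frac{3}{2} = \frac{11}{6}.$$

8.10 Improper Integrals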

We have seen thus far how to integrate a continuous function over a finite interval. However, what happens if the interval is infinite, or if the function is not defined at the endpoints of the interval? It can still happen that the definite integral is defined (and corresponds to the area under the curve). We just have to be a little more careful! A definite integral of the form
$$\int_a^b f(x)\,dx$$
is said to be improper if either

• $f$ is not bounded on $[a,b]$, or

• $[a,b]$ is an infinite interval.

You should check that the following integrals are all improper:

$$\int_0^1 \frac{dx}{\sqrt{1-x^2}}, \qquad \int_{-1}^{1} \frac{dx}{x^2}, \qquad \int_1^\infty \frac{dx}{x^3}, \qquad \int_{-\infty}^{\infty} e^{-x^2}\,dx.$$
If $f$ becomes unbounded at some point $c$ of $[a,b]$ which is not one of the endpoints $a$ and $b$, then one should split the integral into one over $[a,c]$ and one over $[c,b]$, as we did for piecewise-defined functions. We may therefore suppose that an integral is improper because the integrand is unbounded near an endpoint, or because an endpoint is infinite. Suppose that $b$ is the endpoint responsible for the integral being improper. We then define

$$\int_a^b f(x)\,dx = \lim_{v \to b^-} \int_a^v f(x)\,dx,$$
assuming that the integrals on the right-hand side exist and that the limit does as well (otherwise the improper integral does not exist either). Similarly, if $a$ is the problematic endpoint, then

$$\int_a^b f(x)\,dx = \lim_{u \to a^+} \int_u^b f(x)\,dx,$$
again assuming that the integrals and the limit on the right-hand side exist. Finally, it could happen that both $a$ and $b$ are problematic, in which case we must use a double limit:

$$\int_a^b f(x)\,dx = \lim_{u \to a^+} \lim_{v \to b^-} \int_u^v f(x)\,dx.$$
With two limits, one has to be doubly careful!
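To see the definition in action, the sketch below evaluates the proper integrals $\int_0^v e^{-x}\,dx$ (the integrand of Example 8.14 below) for increasing $v$ and prints them alongside the exact values $1 - e^{-v}$. The integrals are computed with a composite Simpson's rule, which is our own choice. The values settle down as $v$ grows, which is what the limit $v \to \infty$ formalises. A computer can only sample finitely many $v$, so this suggests the limit rather than proving it.

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


def truncated(v: float) -> float:
    """The proper integral of e^{-x} over [0, v]."""
    return simpson(lambda x: math.exp(-x), 0.0, v)


for v in (1.0, 5.0, 10.0, 20.0, 40.0):
    print(v, truncated(v), 1 - math.exp(-v))
```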

Example 8.12. Evaluate $\int_0^1 \frac{dx}{\sqrt{1-x^2}}$.

Solution. This integral is improper because the integrand is unbounded near the right endpoint $x = 1$. Thus, we compute
$$\int_0^v \frac{dx}{\sqrt{1-x^2}} = \Bigl[ \sin^{-1} x \Bigr]_0^v = \sin^{-1} v.$$
The improper integral is then obtained by taking the limit as $v \to 1^-$:
$$\int_0^1 \frac{dx}{\sqrt{1-x^2}} = \lim_{v \to 1^-} \int_0^v \frac{dx}{\sqrt{1-x^2}} = \lim_{v \to 1^-} \sin^{-1} v = \sin^{-1} 1 = \frac{\pi}{2}.$$

Example 8.13. Evaluate $\int_{-\infty}^{-1} \frac{dx}{x^3}$.

Solution. This time, the integral is improper because the left limit is infinite. Thus,

$$\int_{-\infty}^{-1} \frac{dx}{x^3} = \lim_{u \to -\infty} \int_u^{-1} \frac{dx}{x^3} = \lim_{u \to -\infty} \left[ -\frac{1}{2x^2} \right]_u^{-1} = \lim_{u \to -\infty} \left( -\frac{1}{2} + \frac{1}{2u^2} \right) = -\frac{1}{2}.$$
Once one has had some practice with improper integrals, one usually dispenses with the formal limits as in these examples and just remembers that a limit is implied in the notation. This is quite sloppy and can be rather dangerous; nevertheless, pretty much everyone does it.

Example 8.14. Evaluate $\int_0^\infty e^{-x}\,dx$.

Solution. Here, we do things the sloppy way, leaving the formal limits to the reader to fill in:

$$\int_0^\infty e^{-x}\,dx = \Bigl[ -e^{-x} \Bigr]_0^\infty = 0 - \bigl(-e^{-0}\bigr) = 1.$$
Here's a different kind of example, illustrating why one needs to be alert when computing improper integrals:

Example 8.15. Evaluate $\int_{-1}^{1} \frac{dx}{x^2}$.

Solution. Neither endpoint is problematic, but this is an improper integral because of the pole at $x = 0$. We therefore split the integral into the sum of two improper ones:
$$\int_{-1}^{1} \frac{dx}{x^2} = \int_{-1}^{0} \frac{dx}{x^2} + \int_0^1 \frac{dx}{x^2}.$$
The second one is problematic at the left endpoint and we find that

$$\int_0^1 \frac{dx}{x^2} = \lim_{u \to 0^+} \int_u^1 \frac{dx}{x^2} = \lim_{u \to 0^+} \left[ -\frac{1}{x} \right]_u^1 = -1 + \lim_{u \to 0^+} \frac{1}{u} = +\infty.$$
Because this improper integral is divergent, the required integral cannot exist! Nevertheless, one can check that $\int_{-1}^{0} 1/x^2\,dx$ is also $+\infty$, hence we can write
$$\int_{-1}^{1} \frac{dx}{x^2} = +\infty.$$
The point of this example is that if we had not noticed that there was a pole at $x = 0$, then we would probably have tried to apply the Fundamental Theorem of Calculus and obtained

$$\int_{-1}^{1} \frac{dx}{x^2} = \left[ -\frac{1}{x} \right]_{-1}^{1} = -1 - 1 = -2.$$
The fact that this is negative when $1/x^2$ is positive should set off alarm bells! The Fundamental Theorem of Calculus simply does not apply here, because the integrand is not continuous on $(-1,1)$.
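The trap in Example 8.15 is easy to reproduce. In the sketch below, with hypothetical helper names, `ftc` evaluates $G(b) - G(a)$ for the antiderivative $G(x) = -1/x$ and refuses to do so when $0$ lies in $[a,b]$, unless that check is switched off. With the check off, it returns the bogus value $-2$. The honest computation instead works over $[-1,-\varepsilon]$ and $[\varepsilon,1]$, where the Fundamental Theorem does apply, and gives $2/\varepsilon - 2$, which grows without bound as $\varepsilon \to 0^+$.

```python
def G(x: float) -> float:
    """An antiderivative of 1/x^2 on (-inf, 0) and on (0, inf)."""
    return -1.0 / x


def ftc(a: float, b: float, check: bool = True) -> float:
    """G(b) - G(a); only legitimate when 1/x^2 is continuous on [a, b], i.e. 0 is not in [a, b]."""
    if check and a <= 0.0 <= b:
        raise ValueError("1/x^2 is not continuous on [a, b]; the FTC does not apply")
    return G(b) - G(a)


def excised(eps: float) -> float:
    """Integral of 1/x^2 over [-1, -eps] plus over [eps, 1]: both are proper integrals."""
    return ftc(-1.0, -eps) + ftc(eps, 1.0)


print(ftc(-1.0, 1.0, check=False))  # -2.0: the meaningless "answer"
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, excised(eps))  # equals 2/eps - 2, which blows up as eps -> 0+
```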

Chapter 9

Integration Techniques

9.1 Integration by Substitution

Recall the chain rule for differentiating from Section 5.5: If $f$ and $g$ are sufficiently nice differentiable functions, then
$$\frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\,g'(x).$$
Consequently,

$$\int f'\bigl(g(x)\bigr)\,g'(x)\,dx = f\bigl(g(x)\bigr) + C.$$
The basis of integration by substitution is the following observation: If we write $u = g(x)$, so that
$$\frac{du}{dx} = g'(x),$$
then
$$f(u) + C = \int f'\bigl(g(x)\bigr)\,g'(x)\,dx = \int f'(u)\,\frac{du}{dx}\,dx.$$
But, the Fundamental Theorem of Calculus tells us that

$$f(u) + C = \int f'(u)\,du,$$
so we have the formal "cancellation rule" $\frac{du}{dx}\,dx = du$ for the two $dx$'s appearing above. Integration by substitution then amounts to recognising that the integrand has the form $f'\bigl(g(x)\bigr)\,g'(x)$, for some functions $f$ and $g$, or can be sneakily modified to be of this form.

Example 9.1. Find an antiderivative for the function $h(x) = \sin x\, e^{\cos x}$.

Solution. First, note that
$$h(x) = e^{\cos x}\,\frac{d}{dx}(-\cos x),$$
so we choose $u = \cos x$ to obtain
$$\int h(x)\,dx = \int \sin x\, e^{\cos x}\,dx = -\int e^u\,\frac{du}{dx}\,dx = -\int e^u\,du = -e^u + C = -e^{\cos x} + C.$$

Example 9.2. Find an antiderivative of the function

$$k(x) = \frac{x^2}{2x^3 + 1}.$$
Solution. The idea with this example is to realise that $x^2$ is proportional to the derivative of $2x^3 + 1$, so we can proceed as follows: Let
$$u = 2x^3 + 1, \quad \text{so} \quad \frac{du}{dx} = 6x^2.$$
Then,

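$$\int k(x)\,dx = \int \frac{x^2}{2x^3 + 1}\,dx = \int \frac{1}{u} \cdot \frac{1}{6}\,\frac{du}{dx}\,dx = \frac{1}{6}\int \frac{1}{u}\,du = \frac{1}{6}\ln|u| + C = \frac{1}{6}\ln\bigl|2x^3 + 1\bigr| + C.$$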

Example 9.3. Evaluate $\int \frac{dx}{x^2 + 4x + 5}$.

Solution. Here, we have to find a suitable substitution. The trick in this case is to modify the denominator by "completing the square":

$$x^2 + 4x + 5 = (x+2)^2 + 1.$$
We therefore let $u = x + 2$, so $du = dx$ (which is a naughty way of writing $\frac{du}{dx} = 1$) and
$$\int \frac{dx}{x^2 + 4x + 5} = \int \frac{dx}{1 + (x+2)^2} = \int \frac{du}{1 + u^2} = \tan^{-1} u + C = \tan^{-1}(x+2) + C.$$
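An antiderivative is correct exactly when differentiating it returns the integrand, so answers like those in Examples 9.1 to 9.3 can be spot-checked numerically. The sketch below compares a central-difference derivative of each claimed antiderivative with the integrand. The helper names, the step $h = 10^{-6}$ and the sample points are our own arbitrary choices. Agreement at a handful of points is evidence, not a proof.

```python
import math


def central_difference(F, x: float, h: float = 1e-6) -> float:
    return (F(x + h) - F(x - h)) / (2 * h)


# (integrand, claimed antiderivative) for Examples 9.1, 9.2 and 9.3.
CASES = {
    "9.1": (lambda x: math.sin(x) * math.exp(math.cos(x)),
            lambda x: -math.exp(math.cos(x))),
    "9.2": (lambda x: x**2 / (2 * x**3 + 1),
            lambda x: math.log(abs(2 * x**3 + 1)) / 6),
    "9.3": (lambda x: 1 / (x**2 + 4 * x + 5),
            lambda x: math.atan(x + 2)),
}

# Sample points avoid x = -(1/2)^(1/3), where 2x^3 + 1 = 0 in Example 9.2.
POINTS = (-2.5, -0.3, 0.4, 1.7, 3.0)


def worst_error(name: str) -> float:
    f, F = CASES[name]
    return max(abs(central_difference(F, x) - f(x)) for x in POINTS)


for name in CASES:
    print(name, worst_error(name))
```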

9.2 Substitution and Definite Integrals

The following result tells us how to apply the method of substitution to definite integration.

Fact 9.1. If f is continuous and g is differentiable on [a,b] with g(a)= A and g(b)= B, then

$$\int_a^b f\bigl(g(x)\bigr)\,g'(x)\,dx = \int_A^B f(u)\,du, \tag{9.1}$$
where $u = g(x)$.

Proof. Let $F$ be an antiderivative of $f$. Then,

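$$\begin{aligned}
\int_a^b f\bigl(g(x)\bigr)\,g'(x)\,dx &= \int_{x=a}^{x=b} f(u)\,du = \Bigl[ F(u) \Bigr]_{x=a}^{x=b} = \Bigl[ F\bigl(g(x)\bigr) \Bigr]_{x=a}^{x=b} \\
&= F\bigl(g(b)\bigr) - F\bigl(g(a)\bigr) = F(B) - F(A) = \int_A^B f(u)\,du,
\end{aligned}$$
by the Fundamental Theorem of Calculus.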

The best way to think about Equation (9.1) is to note that the left-hand side is an integral over $x$, so the limits correspond to $x = a$ and $x = b$. By contrast, the right-hand side is an integral over $u$, so the limits correspond to $u = A$ and $u = B$.

Example 9.4. Evaluate $\int_0^8 \frac{\cos\sqrt{x+1}}{\sqrt{x+1}}\,dx$.

Solution. We can proceed in two different ways:

1. It seems reasonable to use the substitution $u = \sqrt{x+1}$, which gives
$$\frac{du}{dx} = \frac{1}{2\sqrt{x+1}} \quad \Rightarrow \quad \frac{dx}{\sqrt{x+1}} = 2\,du.$$
Our integral now becomes

$$\int_0^8 \frac{\cos\sqrt{x+1}}{\sqrt{x+1}}\,dx = \int_{x=0}^{x=8} \cos u \cdot 2\,du = 2\Bigl[ \sin u \Bigr]_{x=0}^{x=8} = 2\Bigl[ \sin\sqrt{x+1} \Bigr]_{x=0}^{x=8} = 2(\sin 3 - \sin 1).$$
Here, we've converted the result of the integration over $u$ back into a function of $x$ and evaluated it at the limits $x = 0$ and $x = 8$.

2. The second method uses the same substitution $u = \sqrt{x+1}$, but changes the limits of integration so that they apply to $u$. We note that $x = 0$ implies that $u = 1$ and that $x = 8$ gives $u = 3$. Hence,

$$\int_0^8 \frac{\cos\sqrt{x+1}}{\sqrt{x+1}}\,dx = 2\int_1^3 \cos u\,du = 2\Bigl[ \sin u \Bigr]_1^3 = 2(\sin 3 - \sin 1).$$
This method is often more efficient!
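Both routes give $2(\sin 3 - \sin 1) \approx -1.4007$. The sketch below confirms this numerically with a composite Simpson's rule of our own (not part of the notes). It evaluates the original $x$-integral and the transformed $u$-integral from Equation (9.1) separately. The two should agree with each other and with the closed form.

```python
import math


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


x_integral = simpson(lambda x: math.cos(math.sqrt(x + 1)) / math.sqrt(x + 1), 0.0, 8.0)
u_integral = simpson(lambda u: 2 * math.cos(u), 1.0, 3.0)  # limits u = 1 and u = 3
exact = 2 * (math.sin(3) - math.sin(1))
print(x_integral, u_integral, exact)  # all approximately -1.4007
```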

9.3 Trigonometric Integrals

Integrals of the form
$$\int \sin^m x \cos^n x\,dx,$$
with $m$ and $n$ non-negative integers, may be evaluated using a mixture of substitution and trigonometric identities. The following examples illustrate some of the possibilities:

Example 9.5. Evaluate $\int \sin^5 x \cos^7 x\,dx$.

Solution. When one of the exponents is odd, it's usually a good idea to split off a single factor of that function, use $\sin^2 x + \cos^2 x = 1$ to rewrite the remaining even power, and then substitute. In this case, $\sin^4 x = (1 - \cos^2 x)^2$, so
$$\int \sin^5 x \cos^7 x\,dx = \int (1 - \cos^2 x)^2 \cos^7 x \sin x\,dx = \int (1 - u^2)^2 u^7\,(-du)$$

(using $u = \cos x$, hence $du = -\sin x\,dx$)
$$= -\int (u^7 - 2u^9 + u^{11})\,du = -\frac{u^8}{8} + \frac{u^{10}}{5} - \frac{u^{12}}{12} + C = -\frac{\cos^8 x}{8} + \frac{\cos^{10} x}{5} - \frac{\cos^{12} x}{12} + C.$$

Example 9.6. Evaluate $\int \sin^4\theta\,d\theta$.

Solution. This time, there are no odd exponents, so we resort to the double angle formulae

$$\cos^2\theta = \frac{1 + \cos(2\theta)}{2}, \qquad \cos(2\theta) = \cos^2\theta - \sin^2\theta \quad \Rightarrow \quad \sin^2\theta = \frac{1 - \cos(2\theta)}{2}.$$
Applying this twice, we deduce that

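$$\begin{aligned}
\int \sin^4\theta\,d\theta &= \int \left( \frac{1 - \cos(2\theta)}{2} \right)^2 d\theta = \frac{1}{4}\int \bigl( 1 - 2\cos(2\theta) + \cos^2(2\theta) \bigr)\,d\theta \\
&= \frac{\theta}{4} - \frac{\sin(2\theta)}{4} + \frac{1}{4}\int \frac{1 + \cos(4\theta)}{2}\,d\theta = \frac{\theta}{4} - \frac{\sin(2\theta)}{4} + \frac{1}{8}\left( \theta + \frac{\sin(4\theta)}{4} \right) + C \\
&= \frac{3\theta}{8} - \frac{\sin(2\theta)}{4} + \frac{\sin(4\theta)}{32} + C.
\end{aligned}$$
Similar tricks can also work for integrals involving secants and tangents or cosecants and cotangents.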
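A definite integral gives a convenient test of an antiderivative: $F(b) - F(a)$ must match a numerical value of $\int_a^b$. The sketch below applies this test to the answers of Examples 9.5 and 9.6 on a few arbitrary intervals. For instance, on $[0,\pi]$ the formula of Example 9.6 gives $\frac{3\pi}{8}$. The helper names are our own, and Simpson's rule is a standard numerical method that is not part of these notes.

```python
import math


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


def F95(x: float) -> float:  # antiderivative from Example 9.5
    c = math.cos(x)
    return -c**8 / 8 + c**10 / 5 - c**12 / 12


def F96(t: float) -> float:  # antiderivative from Example 9.6
    return 3 * t / 8 - math.sin(2 * t) / 4 + math.sin(4 * t) / 32


def mismatch(f, F, a: float, b: float) -> float:
    """|(F(b) - F(a)) - numerical integral of f over [a, b]|."""
    return abs((F(b) - F(a)) - simpson(f, a, b))


for a, b in ((0.0, 1.0), (-2.0, 0.5), (0.0, math.pi)):
    print(a, b,
          mismatch(lambda x: math.sin(x)**5 * math.cos(x)**7, F95, a, b),
          mismatch(lambda t: math.sin(t)**4, F96, a, b))
```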

9.4 Inverse Trigonometric Substitutions

Substitutions involving the inverse trigonometric functions defined in Section 6.5 are very useful when the integrands involve expressions of the following types:

$$\sqrt{a^2 - x^2}, \qquad \sqrt{a^2 + x^2}, \qquad \sqrt{x^2 - a^2}.$$
The three basic inverse substitutions we can try are:

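$$\begin{array}{lll}
\text{If the integrand has\dots} & \text{try the substitution\dots} & \text{and simplify using\dots} \\ \hline
\sqrt{a^2 - x^2} & x = a\sin\theta & 1 - \sin^2\theta = \cos^2\theta \\
\sqrt{a^2 + x^2} & x = a\tan\theta & 1 + \tan^2\theta = \sec^2\theta \\
\sqrt{x^2 - a^2} & x = a\sec\theta & \sec^2\theta - 1 = \tan^2\theta
\end{array}$$
These correspond to the direct substitutions
$$\theta = \sin^{-1}\frac{x}{a}, \qquad \theta = \tan^{-1}\frac{x}{a}, \qquad \text{and} \qquad \theta = \sec^{-1}\frac{x}{a} = \cos^{-1}\frac{a}{x},$$
respectively.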

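Example 9.7. Given $r > 0$, find $\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx$.

Solution. We try the substitution $x = r\sin\theta$, for which $dx = r\cos\theta\,d\theta$. Noting that $x = \pm r$ means $\theta = \sin^{-1}(\pm 1) = \pm\frac{1}{2}\pi$, this gives
$$\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx = \int_{-\pi/2}^{\pi/2} \sqrt{r^2 - r^2\sin^2\theta} \cdot r\cos\theta\,d\theta = \int_{-\pi/2}^{\pi/2} r^2\cos^2\theta\,d\theta$$
(since $\cos\theta \geq 0$ on $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$)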

$$= r^2\int_{-\pi/2}^{\pi/2} \frac{1 + \cos(2\theta)}{2}\,d\theta = \frac{r^2}{2}\left[ \theta + \frac{\sin(2\theta)}{2} \right]_{-\pi/2}^{\pi/2} = \frac{1}{2}\pi r^2.$$
This is, of course, the area enclosed by a semicircle of radius $r$.
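The substitution also pays off numerically. The sketch below approximates two integrals: the original one, whose integrand has vertical tangents at $x = \pm r$, and the transformed one, $\int_{-\pi/2}^{\pi/2} r^2\cos^2\theta\,d\theta$, whose integrand is smooth. Both should approach $\frac{1}{2}\pi r^2$, but the second gets there far more accurately for the same number of function evaluations. Simpson's rule and the radius $r = 2$ are our own choices.

```python
import math


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))


r = 2.0
exact = 0.5 * math.pi * r**2
# max(..., 0.0) guards against a tiny negative value from rounding near x = +-r
x_form = simpson(lambda x: math.sqrt(max(r * r - x * x, 0.0)), -r, r)
theta_form = simpson(lambda t: r * r * math.cos(t) ** 2, -math.pi / 2, math.pi / 2)
print(abs(x_form - exact), abs(theta_form - exact))
```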

Example 9.8. Evaluate $\int \frac{x^3}{\sqrt{1-x^2}}\,dx$.

Solution. Here, the appropriate substitution is $x = \sin\theta$, giving $dx = \cos\theta\,d\theta$. This gives

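$$\int \frac{x^3}{\sqrt{1-x^2}}\,dx = \int \frac{\sin^3\theta\cos\theta}{\sqrt{1 - \sin^2\theta}}\,d\theta = \int \frac{\sin^3\theta\cos\theta}{|\cos\theta|}\,d\theta = \int \sin^3\theta\,d\theta$$
(because $\sqrt{1-x^2}$ is only defined for $x$ in $[-1,1]$, hence $\theta$ in $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$, and $\cos\theta$ is non-negative on this interval)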

$$= \int (1 - \cos^2\theta)\sin\theta\,d\theta = \int (1 - u^2)(-du)$$
(using $u = \cos\theta$, hence $du = -\sin\theta\,d\theta$)
$$= \frac{u^3}{3} - u + C = \frac{\cos^3\theta}{3} - \cos\theta + C.$$
Finally, $\cos\theta \geq 0$ means that

$$\cos\theta = |\cos\theta| = \sqrt{1 - \sin^2\theta} = \sqrt{1 - x^2}.$$
We therefore arrive at

$$\int \frac{x^3}{\sqrt{1-x^2}}\,dx = \frac{(1-x^2)^{3/2}}{3} - \sqrt{1-x^2} + C = -\frac{1}{3}(x^2 + 2)\sqrt{1-x^2} + C.$$

Example 9.9. Evaluate $\int \frac{\sqrt{x^2 - 25}}{x}\,dx$.

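Solution. Here, we use the substitution
$$x = 5\sec\theta = \frac{5}{\cos\theta} \quad \Rightarrow \quad dx = \frac{-5}{\cos^2\theta}(-\sin\theta)\,d\theta = 5\sec\theta\tan\theta\,d\theta.$$
We take $x > 5$, so that $\theta$ lies in $(0, \frac{1}{2}\pi)$ and $\tan\theta > 0$; hence $\sqrt{25\sec^2\theta - 25} = 5\tan\theta$. This gives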

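$$\begin{aligned}
\int \frac{\sqrt{x^2 - 25}}{x}\,dx &= \int \frac{\sqrt{25\sec^2\theta - 25}}{5\sec\theta} \cdot 5\sec\theta\tan\theta\,d\theta = 5\int \tan^2\theta\,d\theta \\
&= 5\int (\sec^2\theta - 1)\,d\theta = 5\tan\theta - 5\theta + C \\
&= \sqrt{x^2 - 25} - 5\sec^{-1}\frac{x}{5} + C.
\end{aligned}$$

Example 9.10. Evaluate $\int \frac{dx}{\sqrt{4 + x^2}}$.

Solution. Here, we let $x = 2\tan\theta$, hence $dx = 2\sec^2\theta\,d\theta$, which gives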

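$$\begin{aligned}
\int \frac{dx}{\sqrt{4 + x^2}} &= \int \frac{2\sec^2\theta\,d\theta}{\sqrt{4 + 4\tan^2\theta}} = \int \sec\theta\,d\theta = \int \frac{\sec\theta\tan\theta + \sec^2\theta}{\sec\theta + \tan\theta}\,d\theta \\
&= \ln\bigl|\sec\theta + \tan\theta\bigr| + C' = \ln\left| \frac{\sqrt{4 + x^2}}{2} + \frac{x}{2} \right| + C' \\
&= \ln\Bigl( \sqrt{4 + x^2} + x \Bigr) + C,
\end{aligned}$$
where $C = C' - \ln 2$. We remark that the absolute value may be dropped in the final result because $\sqrt{4 + x^2} + x > 0$ for all $x$.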
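As with the substitution examples, the answers to Examples 9.8 to 9.10 can be spot-checked by numerical differentiation. The sketch below (helper names are our own) respects the domain of each answer. It uses $|x| < 1$ for Example 9.8 and $x > 5$ for Example 9.9, which is where the substitution $x = 5\sec\theta$ with $\tan\theta > 0$ applies. In Example 9.9, $\sec^{-1}(x/5)$ is computed as $\cos^{-1}(5/x)$, following the identity listed with the inverse substitutions.

```python
import math


def central_difference(F, x: float, h: float = 1e-6) -> float:
    return (F(x + h) - F(x - h)) / (2 * h)


# (integrand, claimed antiderivative, sample points inside the valid domain)
CASES = {
    "9.8": (lambda x: x**3 / math.sqrt(1 - x * x),
            lambda x: -(x * x + 2) * math.sqrt(1 - x * x) / 3,
            (-0.9, -0.2, 0.3, 0.8)),
    "9.9": (lambda x: math.sqrt(x * x - 25) / x,
            lambda x: math.sqrt(x * x - 25) - 5 * math.acos(5 / x),
            (5.5, 7.0, 12.0, 40.0)),
    "9.10": (lambda x: 1 / math.sqrt(4 + x * x),
             lambda x: math.log(math.sqrt(4 + x * x) + x),
             (-3.0, -1.0, 0.0, 2.5)),
}


def worst_error(name: str) -> float:
    f, F, points = CASES[name]
    return max(abs(central_difference(F, x) - f(x)) for x in points)


for name in CASES:
    print(name, worst_error(name))
```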

9.5 Integration by Parts

Suppose that $f(x)$ and $g(x)$ are two differentiable functions. The product rule for differentiation gives
$$\frac{d}{dx}\bigl[ f(x)g(x) \bigr] = f(x)\,\frac{d}{dx}g(x) + g(x)\,\frac{d}{dx}f(x).$$
Therefore,
$$f(x)g'(x) = \frac{d}{dx}\bigl[ f(x)g(x) \bigr] - f'(x)g(x),$$
and integrating both sides with respect to $x$ yields
$$\int f(x)g'(x)\,dx = \int \frac{d}{dx}\bigl[ f(x)g(x) \bigr]\,dx - \int f'(x)g(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx.$$
This is the formula for integration by parts. It is also frequently written as follows:

$$\int u\,dv = uv - \int v\,du.$$

Here, we identify

$$u = f(x), \quad \text{so} \quad du = f'(x)\,dx \qquad \text{and} \qquad v = g(x), \quad \text{so} \quad dv = g'(x)\,dx.$$

Unfortunately, it’s often difficult to know what to choose for u and dv when integrating by parts. Obviously, whatever you choose for dv needs to be something you know how to integrate, so you can find v. On the other hand, you don’t need to be able to integrate u, so it may be helpful to put all the “hard-to-integrate” parts of the integrand into u. There are no hard-and-fast rules for choosing u and dv. Examples, practice and a good dose of luck are your friends here. Just don’t forget that integration by parts is not the ultimate trick — be on the lookout for easy substitutions!

Example 9.11. Use integration by parts to evaluate

$$\int x^3 e^{x^2}\,dx.$$
Solution. We could first try $u = x^3$ and $dv = e^{x^2}\,dx$, but then we wouldn't know how to get $v$. Instead, we recognise that choosing $u = x^2$ and $dv = x e^{x^2}\,dx$ will be better, because we can get $v$ by the substitution $w = x^2$ (we use a new letter, since $u$ is already taken):
$$v = \int dv = \int x e^{x^2}\,dx = \frac{1}{2}\int e^w\,dw = \frac{1}{2}e^w = \frac{1}{2}e^{x^2}.$$
Note that we do not need to add an arbitrary constant to $v$. Any antiderivative of $dv$ will do; just remember to add the constant after the integration by parts has been performed. Since $du = 2x\,dx$, integrating by parts gives

$$\int x^3 e^{x^2}\,dx = \int u\,dv = uv - \int v\,du = x^2 \cdot \frac{1}{2}e^{x^2} - \int \frac{1}{2}e^{x^2} \cdot 2x\,dx = \frac{1}{2}x^2 e^{x^2} - \frac{1}{2}e^{x^2} + C = \frac{1}{2}(x^2 - 1)e^{x^2} + C.$$
Note that the integral we had to compute here was actually the same as the integral we computed to get $v$ above.

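Example 9.12. Evaluate $\int x\ln x\,dx$.

Solution. We apply integration by parts with $u = \ln x$ and $dv = x\,dx$, so
$$du = \frac{dx}{x} \quad \text{and} \quad v = \int x\,dx = \frac{1}{2}x^2.$$
The integral now becomes
$$\int x\ln x\,dx = \frac{1}{2}x^2\ln x - \int \frac{1}{2}x^2 \cdot \frac{dx}{x} = \frac{1}{2}x^2\ln x - \frac{1}{2}\int x\,dx = \frac{1}{2}x^2\ln x - \frac{1}{4}x^2 + C = \frac{1}{4}x^2(2\ln x - 1) + C.$$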

Sometimes integration by parts takes us round in circles, meaning that we arrive back at the integral that we started with. Generally, this would be bad news, but it often happens that the integral comes back with a coefficient different from 1. Then, we can solve the resulting equation for the integral. As always, this is best illustrated with an example.

Example 9.13. Evaluate $\int \sin^4 x\,dx$ using integration by parts.

Solution. Here, the trick is to take $u = \sin^3 x$ and $dv = \sin x\,dx$. Then, $du = 3\sin^2 x\cos x\,dx$ and $v = -\cos x$. Integration by parts now gives
$$\begin{aligned}
\int \sin^4 x\,dx &= -\sin^3 x\cos x - \int (-\cos x)(3\sin^2 x\cos x)\,dx \\
&= -\sin^3 x\cos x + 3\int \sin^2 x\cos^2 x\,dx \\
&= -\sin^3 x\cos x + 3\int \sin^2 x\,dx - 3\int \sin^4 x\,dx,
\end{aligned}$$
where we have used $\cos^2 x = 1 - \sin^2 x$. Solving for the integral we wish to evaluate, we obtain
$$\begin{aligned}
\int \sin^4 x\,dx &= -\frac{\sin^3 x\cos x}{4} + \frac{3}{4}\int \sin^2 x\,dx = -\frac{\sin^3 x\cos x}{4} + \frac{3}{4}\int \frac{1 - \cos(2x)}{2}\,dx \\
&= -\frac{\sin^3 x\cos x}{4} + \frac{3x}{8} - \frac{3}{16}\sin(2x) + C = \frac{3x - 3\sin x\cos x - 2\sin^3 x\cos x}{8} + C.
\end{aligned}$$

9.6 Integrating Rational Functions

Here, we will see that a rational function, which is just a quotient of two polynomials, can be easily integrated once we express it as a sum of simple fractions that we know how to integrate. This is called the method of partial fractions. Given a rational function
$$f(x) = \frac{g(x)}{h(x)},$$
where $g$ and $h$ are polynomials, the first step of the partial fractions method is to check whether the degree of $h$ is larger than the degree of $g$. Recall that the degree of a polynomial is the highest power appearing. If the degree of the numerator $g$ is greater than or equal to the degree of the denominator $h$, then we must perform a long division to write

$$g(x) = q(x)h(x) + r(x),$$
where $q$ and $r$ are polynomials with the degree of $r$ being strictly less than that of $h$. This will reduce $f$ to the sum of a polynomial, which is easily integrated, and a rational function for which the degree of the numerator is strictly less than that of the denominator:

$$f(x) = \frac{g(x)}{h(x)} = q(x) + \frac{r(x)}{h(x)}.$$
Let's see an example illustrating this long division before moving on to the second step.

Example 9.14. Perform the long division corresponding to

$$\frac{5x^4 - 3x^3 + 5x - 6}{x^2 + 1}.$$
Solution. The long division works as follows: We consider the terms of highest degree in both numerator and denominator, $5x^4$ and $x^2$. The term of highest degree in the quotient will then be $5x^2$. Multiplying this by the denominator gives $5x^4 + 5x^2$, so we must subtract this from the numerator to obtain $-3x^3 - 5x^2 + 5x - 6$. We therefore have
$$\frac{5x^4 - 3x^3 + 5x - 6}{x^2 + 1} = 5x^2 + \frac{-3x^3 - 5x^2 + 5x - 6}{x^2 + 1}$$
(check it!). The fraction on the right-hand side still has a numerator whose degree is higher than that of the denominator, so we repeat the process. Usually, this is set up in a table as follows:

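$$\begin{array}{rrrrrr}
 & & & 5x^2 & -3x & -5 \\ \cline{2-6}
x^2 + 1\;\big) & 5x^4 & -3x^3 & +0x^2 & +5x & -6 \\
 & 5x^4 & & +5x^2 & & \\ \cline{2-6}
 & & -3x^3 & -5x^2 & +5x & -6 \\
 & & -3x^3 & & -3x & \\ \cline{3-6}
 & & & -5x^2 & +8x & -6 \\
 & & & -5x^2 & & -5 \\ \cline{4-6}
 & & & & 8x & -1
\end{array}$$
Finally, we arrive at the polynomial $8x - 1$, whose degree is smaller than that of $x^2 + 1$, and we stop. The result of the long division is therefore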

$$\frac{5x^4 - 3x^3 + 5x - 6}{x^2 + 1} = 5x^2 - 3x - 5 + \frac{8x - 1}{x^2 + 1}.$$
The second step of the partial fractions method is to factor the denominator $h(x)$. We will suppose for now that $h$ factors into $n$ distinct linear factors (so $n$ is the degree of $h$):

$$h(x) = A(x - a_1)(x - a_2)\cdots(x - a_n) \qquad (a_i \neq a_j \text{ if } i \neq j).$$
We then use the following fact:

Fact 9.2. Let $h(x)$ be a polynomial which factors into $n$ distinct linear factors and let $g(x)$ be a polynomial of degree smaller than $n$. Then the quotient $g(x)/h(x)$ has a partial fraction decomposition of the form
$$\frac{g(x)}{h(x)} = \frac{A_1}{x - a_1} + \frac{A_2}{x - a_2} + \cdots + \frac{A_n}{x - a_n},$$
where $A_1, A_2, \dots, A_n$ are real constants.

The only problem is to determine these constants $A_1, A_2, \dots, A_n$.

Example 9.15. Evaluate
$$\int \frac{x - 3}{x^2 - 3x + 2}\,dx.$$
Solution. First, note that the degree of the numerator is less than the degree of the denominator, so we do not have to do any long division. We therefore factor the denominator:

$$x^2 - 3x + 2 = (x-1)(x-2).$$
It follows that
$$\frac{x - 3}{x^2 - 3x + 2} = \frac{A}{x - 1} + \frac{B}{x - 2}.$$
To find $A$ and $B$, we can put the right-hand side back over a common denominator and compare coefficients:
$$\frac{x - 3}{x^2 - 3x + 2} = \frac{A(x-2) + B(x-1)}{(x-1)(x-2)} = \frac{(A+B)x - (2A+B)}{x^2 - 3x + 2}.$$
We therefore get two equations, one for the coefficient of $x$ and one for the constant terms, in two unknowns:
$$A + B = 1, \quad 2A + B = 3 \qquad \Rightarrow \qquad A = 2, \quad B = -1.$$
This completes the partial fraction decomposition, so we may now write

$$\int \frac{x - 3}{x^2 - 3x + 2}\,dx = \int \left( \frac{2}{x - 1} - \frac{1}{x - 2} \right) dx = 2\ln|x - 1| - \ln|x - 2| + C.$$
There is actually a slicker method of finding the coefficients in this case, known as the method of residues. Consider the partial fraction decomposition in the above example:
$$\frac{x - 3}{x^2 - 3x + 2} = \frac{x - 3}{(x-1)(x-2)} = \frac{A}{x - 1} + \frac{B}{x - 2}.$$
To find $A$, cancel $x - 1$ from the left-hand side and evaluate what's left at $x = 1$. Similarly, obtain $B$ by cancelling the $x - 2$ and evaluating at $x = 2$. This gives
$$A = \left. \frac{x - 3}{x - 2} \right|_{x=1} = \frac{1 - 3}{1 - 2} = 2 \quad \text{and} \quad B = \left. \frac{x - 3}{x - 1} \right|_{x=2} = \frac{2 - 3}{2 - 1} = -1,$$
directly! (This works because multiplying the decomposition by $x - 1$ gives $\frac{x - 3}{x - 2} = A + \frac{B(x - 1)}{x - 2}$, and the last term vanishes at $x = 1$.)

Example 9.16. Evaluate
$$\int \frac{x^3 - x - 3\sqrt{2}}{x^2 - 2}\,dx.$$
Solution. Because the degree of the numerator is not smaller than that of the denominator, we must do the long division:

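$$\begin{array}{rrrrr}
 & & & x & \\ \cline{2-5}
x^2 - 2\;\big) & x^3 & +0x^2 & -x & -3\sqrt{2} \\
 & x^3 & & -2x & \\ \cline{2-5}
 & & & x & -3\sqrt{2}
\end{array}$$
Therefore, we have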

$$\frac{x^3 - x - 3\sqrt{2}}{x^2 - 2} = x + \frac{x - 3\sqrt{2}}{x^2 - 2} = x + \frac{x - 3\sqrt{2}}{(x - \sqrt{2})(x + \sqrt{2})}.$$
We now determine the partial fraction decomposition of the second term on the right-hand side:

$$\frac{x - 3\sqrt{2}}{(x - \sqrt{2})(x + \sqrt{2})} = \frac{-1}{x - \sqrt{2}} + \frac{2}{x + \sqrt{2}}.$$
The integral is therefore

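$$\int \frac{x^3 - x - 3\sqrt{2}}{x^2 - 2}\,dx = \int \left( x - \frac{1}{x - \sqrt{2}} + \frac{2}{x + \sqrt{2}} \right) dx = \frac{1}{2}x^2 - \ln\bigl|x - \sqrt{2}\bigr| + 2\ln\bigl|x + \sqrt{2}\bigr| + C.$$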
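Both steps of the method can be mechanised. The sketch below uses hypothetical helper names. `poly_divmod` carries out the long division of Example 9.14 on coefficient lists (highest degree first), using exact `fractions.Fraction` arithmetic. `residues` applies the cover-up rule of the method of residues, assuming a denominator of the form $(x - r_1)\cdots(x - r_n)$ (leading coefficient 1) with distinct roots and a numerator of lower degree, as in Fact 9.2. Reproducing Examples 9.14 to 9.16 is a good test.

```python
import math
from fractions import Fraction


def poly_divmod(num, den):
    """Long division of polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder) with deg(remainder) < deg(den)."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("the divisor needs a non-zero leading coefficient")
    quotient, rem = [], num[:]
    while len(rem) >= len(den):
        factor = rem[0] / den[0]       # next term of the quotient
        quotient.append(factor)
        for i, d in enumerate(den):    # subtract factor * den, lined up at the top
            rem[i] -= factor * d
        rem.pop(0)                     # the leading term is now zero
    return quotient or [Fraction(0)], rem or [Fraction(0)]


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def residues(num, roots):
    """A_k in num(x) / prod_j (x - r_j) = sum_k A_k / (x - r_k), for distinct roots r_j.

    Cover-up rule: delete the factor (x - r_k) and evaluate what is left at x = r_k."""
    coeffs = []
    for k, rk in enumerate(roots):
        rest = 1
        for j, rj in enumerate(roots):
            if j != k:
                rest *= rk - rj
        coeffs.append(poly_eval(num, rk) / rest)
    return coeffs


# Example 9.14: (5x^4 - 3x^3 + 5x - 6) / (x^2 + 1)
q, r = poly_divmod([5, -3, 0, 5, -6], [1, 0, 1])
print(q, r)  # quotient 5x^2 - 3x - 5, remainder 8x - 1

# Example 9.15: (x - 3) / ((x - 1)(x - 2))
print(residues([1, -3], [1, 2]))

# Example 9.16: (x - 3*sqrt(2)) / ((x - sqrt(2))(x + sqrt(2)))
s = math.sqrt(2)
print(residues([1, -3 * s], [s, -s]))  # approximately [-1, 2]
```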

9.7 Further Partial Fraction Decompositions

We’re now experts at partial fractions, as long as the denominator factors into distinct linear factors. However, what do we do when the factors are not distinct or if the denominator does not factor nicely? The answer is not too unpleasant:

Fact 9.3. If $h(x)$ has a repeated factor, $(x - a)^n$ say, then we should include the terms
$$\frac{A_1}{x - a} + \frac{A_2}{(x - a)^2} + \cdots + \frac{A_n}{(x - a)^n}$$
in the proposed partial fraction decomposition of $g(x)/h(x)$. If $h(x)$ has a factor $(x^2 + bx + c)^n$, where the quadratic $x^2 + bx + c$ has no real roots, then we should instead include terms of the form
$$\frac{A_1 x + B_1}{x^2 + bx + c} + \frac{A_2 x + B_2}{(x^2 + bx + c)^2} + \cdots + \frac{A_n x + B_n}{(x^2 + bx + c)^n}.$$
You might wonder what to do if you have cubic factors or worse. Fortunately, one can always factor a polynomial with real coefficients into linear and quadratic factors with real coefficients, so these two cases cover everything.

Example 9.17. Evaluate
$$\int \frac{dx}{(x^2 - 1)(x - 1)}.$$
Solution. This integrand may be written as
$$\frac{1}{(x^2 - 1)(x - 1)} = \frac{1}{(x - 1)^2(x + 1)}.$$
As the degree of the denominator is greater than the degree of the numerator, we do not need to divide. We therefore try the decomposition
$$\frac{1}{(x^2 - 1)(x - 1)} = \frac{A}{x + 1} + \frac{B}{x - 1} + \frac{C}{(x - 1)^2}. \tag{9.2}$$
Putting the right-hand side back over a common denominator, we obtain

$$1 = A(x-1)^2 + B(x+1)(x-1) + C(x+1) = (A+B)x^2 + (C-2A)x + (A-B+C).$$
Equating coefficients therefore gives three equations in three unknowns:

$$A + B = 0, \qquad C - 2A = 0 \qquad\text{and}\qquad A - B + C = 1.$$
This gives $A = \frac{1}{4}$, $B = -\frac{1}{4}$ and $C = \frac{1}{2}$, so that
$$\frac{1}{(x^2-1)(x-1)} = \frac{1/4}{x+1} - \frac{1/4}{x-1} + \frac{1/2}{(x-1)^2}$$
and
$$\int \frac{dx}{(x^2-1)(x-1)} = \frac{1}{4}\int \frac{dx}{x+1} - \frac{1}{4}\int\frac{dx}{x-1} + \frac{1}{2}\int \frac{dx}{(x-1)^2} = \frac{1}{4}\ln|x+1| - \frac{1}{4}\ln|x-1| - \frac{1}{2}\cdot\frac{1}{x-1} + C.$$
We remark that the method of residues can be applied to get the unknown $A$ in this example:
$$A = \left.\frac{1}{(x-1)^2}\right|_{x=-1} = \frac{1}{4}.$$
We can similarly obtain $C$ by cancelling all powers of $x-1$ and evaluating at $x=1$:
$$C = \left.\frac{1}{x+1}\right|_{x=1} = \frac{1}{2}.$$

However, this method does not help us to find $B$. Of course, once $A$ and $C$ are known, it is not terribly difficult to find $B$. As this is the only remaining unknown, a quick trick is to substitute any $x \neq \pm 1$ into the decomposition (9.2). Substituting $x = 0$, for example, yields
$$1 = A - B + C \quad\Rightarrow\quad B = A + C - 1 = -\frac{1}{4},$$
as before.

Example 9.18. Evaluate
$$\int \frac{x^2 - x - 21}{2x^3 - x^2 + 8x - 4}\,dx.$$
Solution. Again, the numerator has lower degree than the denominator, so there is no need to do long division. We therefore turn to factoring the denominator. Inspection shows that

$$2x^3 - x^2 + 8x - 4 = x^2(2x-1) + 4(2x-1) = (x^2+4)(2x-1),$$
so we try the partial fraction decomposition

$$\frac{x^2 - x - 21}{2x^3 - x^2 + 8x - 4} = \frac{Ax+B}{x^2+4} + \frac{C}{2x-1}.$$
This gives

$$x^2 - x - 21 = (Ax+B)(2x-1) + C(x^2+4) = (2A+C)x^2 + (2B-A)x + (4C-B). \tag{9.3}$$
The method of residues works on the linear factor as usual:
$$C = \left.\frac{x^2 - x - 21}{x^2+4}\right|_{x=1/2} = \frac{\frac{1}{4} - \frac{1}{2} - 21}{\frac{1}{4} + 4} = -\frac{85}{17} = -5.$$

Equating coefficients in (9.3) then gives
$$A = \frac{1-C}{2} = 3 \quad\text{and}\quad B = 4C + 21 = 1.$$
Of course, we check that $2B - A = -1$, as required by the linear coefficient. Finally, we substitute back into our integral:

$$\int \frac{x^2 - x - 21}{2x^3 - x^2 + 8x - 4}\,dx = \int \left(\frac{3x+1}{x^2+4} - \frac{5}{2x-1}\right) dx = \int \left(\frac{3x}{x^2+4} + \frac{1}{x^2+4} - \frac{5}{2x-1}\right) dx = \frac{3}{2}\ln(x^2+4) + \frac{1}{2}\tan^{-1}\frac{x}{2} - \frac{5}{2}\ln|2x-1| + C.$$
The method of residues obviously doesn't apply directly to the quadratic factors. We conclude with a special procedure for evaluating integrals of the form

$$\int \frac{Ax+B}{(ax^2+bx+c)^k}\,dx,$$
where $k$ is a positive integer and $A \neq 0$. This involves adding and subtracting a constant in the numerator, if necessary, so as to express the integral in the more tractable form
$$C\int \frac{du}{u^k} + D\int \frac{dx}{(ax^2+bx+c)^k},$$
with $u = ax^2+bx+c$. Explicitly, $du = (2ax+b)\,dx$ and $Ax + B = \frac{A}{2a}(2ax+b) + \left(B - \frac{Ab}{2a}\right)$, so $C = \frac{A}{2a}$ and $D = B - \frac{Ab}{2a}$. The first integral is very easy to solve and the second may be solved by completing the square in the quadratic polynomial and then using an appropriate substitution. As usual, this is best illustrated with an example.

Example 9.19. Evaluate
$$\int \frac{x}{x^2+6x+10}\,dx.$$
Solution. We want to make the substitution $u = x^2+6x+10$, for which $du = (2x+6)\,dx$, so we introduce $2x+6$ into the numerator by multiplying and dividing by 2 and then adding and subtracting 6:

$$\begin{aligned}
\int \frac{x}{x^2+6x+10}\,dx &= \frac{1}{2}\int \frac{2x}{x^2+6x+10}\,dx = \frac{1}{2}\int \frac{2x+6-6}{x^2+6x+10}\,dx \\
&= \frac{1}{2}\int \frac{2x+6}{x^2+6x+10}\,dx - \frac{1}{2}\int\frac{6}{x^2+6x+10}\,dx \\
&= \frac{1}{2}\int\frac{du}{u} - 3\int\frac{dx}{(x+3)^2+1} \\
&= \frac{1}{2}\ln(x^2+6x+10) - 3\tan^{-1}(x+3) + C.
\end{aligned}$$
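One way to gain confidence in an antiderivative such as the one in Example 9.19 is to compare $F(b) - F(a)$ with a numerical estimate of the definite integral. The sketch below uses composite Simpson's rule, a numerical technique swapped in here purely as a check (it is not part of these notes); the interval $[-1, 2]$ and the choice of 1000 subintervals are arbitrary.

```python
import math

def F(x):
    # Antiderivative found in Example 9.19 (constant of integration omitted).
    return 0.5 * math.log(x ** 2 + 6 * x + 10) - 3 * math.atan(x + 3)

def integrand(x):
    return x / (x ** 2 + 6 * x + 10)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a, b = -1.0, 2.0
numeric = simpson(integrand, a, b)
exact = F(b) - F(a)
print(numeric, exact)
```

The two printed numbers should agree to many decimal places; a mistake in the antiderivative would typically show up as a visible discrepancy.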

Chapter 10

Taylor Series

10.1 Taylor Polynomials

We saw in Section 5.7 that a differentiable function $f(x)$ could be approximated quite well, when $x$ was close to $a$, by the linear function

$$L(x) = f(a) + f'(a)(x-a).$$
But what about approximating $f$ by a quadratic function? We should expect that a quadratic approximation would be better than a linear one, at least near $a$, because the latter is just a special case of the former. The key to the idea of linear approximation was that the constant term matches the value of $f$ at $a$ and the linear term is chosen to match the slope of $f$ at $a$. In other words, the approximation has the same value as $f$ at $a$ and the same derivative as $f$ at $a$. Clearly, to approximate $f$ by a quadratic, we should furthermore aim to have the same second derivative as $f$ at $a$. Here, we are assuming, of course, that $f$ is twice-differentiable! You should check that there is a unique quadratic function whose value, derivative and second derivative at $a$ match those of $f$ at $a$. It is given by
$$Q(x) = f(a) + f'(a)(x-a) + \frac{1}{2}f''(a)(x-a)^2.$$
The factor of $\frac{1}{2}$ in the quadratic coefficient arises from the fact that differentiating $x^2$ twice results in 2.

But why stop with quadratics? If $f$ has derivatives of all orders up to the $n$-th in some open interval containing $a$, meaning that $f(x), f'(x), f''(x), \ldots, f^{(n)}(x)$ all exist for $x$ close to $a$, then we may define the $n$-th degree Taylor polynomial of $f$ about $a$ to be the following polynomial:

$$P_n(x) = f(a) + f'(a)(x-a) + \frac{1}{2}f''(a)(x-a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x-a)^n = \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!}(x-a)^j. \tag{10.1}$$
Here, we use the common convention that $f^{(0)}(a) = f(a)$. This Taylor polynomial matches $f$ and its first $n$ derivatives at $x = a$, that is,
$$P_n(a) = f(a), \quad P_n'(a) = f'(a), \quad \ldots, \quad P_n^{(n)}(a) = f^{(n)}(a).$$
It is the only polynomial of degree at most $n$ with this property, and in this sense it is the best approximation to $f(x)$ near $x = a$ among polynomials of degree at most $n$. We remark that in the special case when $a = 0$, Taylor polynomials are often called Maclaurin polynomials. Note that our linear approximation $L(x)$ is precisely the first degree Taylor polynomial $P_1(x)$.

Example 10.1. Find the Taylor polynomial of degree $n$ for $f(x) = 1/x$ about $x = 1$.

Solution. We need to find the derivatives of $f$ at 1:

$$f(x) = \frac{1}{x},\quad f'(x) = -\frac{1}{x^2},\quad f''(x) = \frac{2}{x^3},\quad f'''(x) = -\frac{6}{x^4},\quad f^{(4)}(x) = \frac{24}{x^5},\ \ldots$$
$$\Rightarrow\quad f(1) = 1,\quad f'(1) = -1!,\quad f''(1) = 2!,\quad f'''(1) = -3!,\quad f^{(4)}(1) = 4!,\ \ldots$$
You should convince yourself, if you haven't already, that

$$f^{(j)}(1) = (-1)^j j!.$$
Substituting into the formula (10.1) for Taylor polynomials, we conclude that the degree $n$ Taylor polynomial of $f(x) = 1/x$ about 1 is

$$P_n(x) = \sum_{j=0}^{n}\frac{f^{(j)}(1)}{j!}(x-1)^j = \sum_{j=0}^{n}\frac{(-1)^j j!}{j!}(x-1)^j = \sum_{j=0}^{n}(-1)^j(x-1)^j = 1 - (x-1) + (x-1)^2 - (x-1)^3 + \cdots + (-1)^n(x-1)^n.$$

Example 10.2. Find the Taylor polynomial of degree $n$ for $f(x) = e^x$ about 0.

Solution. Since $e^x$ is its own derivative, $f^{(j)}(x) = e^x$ for all $j$ and so $f^{(j)}(0) = 1$ for all $j$. The degree $n$ Maclaurin polynomial for $e^x$ is therefore

$$P_n(x) = \sum_{j=0}^{n}\frac{f^{(j)}(0)}{j!}(x-0)^j = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}.$$
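Formula (10.1) only needs the numbers $f^{(j)}(a)$, so it translates directly into code. The following Python sketch (the helper name `taylor_poly` is hypothetical) evaluates $P_n(x)$ from a list of derivative values and applies it to the derivatives found in Examples 10.1 and 10.2.

```python
import math

def taylor_poly(derivs, a, x):
    """Evaluate P_n(x) = sum_{j=0}^{n} f^(j)(a) / j! * (x - a)^j,
    given derivs = [f(a), f'(a), ..., f^(n)(a)]."""
    return sum(d / math.factorial(j) * (x - a) ** j for j, d in enumerate(derivs))

n = 6
# Example 10.1: f(x) = 1/x about a = 1, where f^(j)(1) = (-1)^j j!.
inv_derivs = [(-1) ** j * math.factorial(j) for j in range(n + 1)]
# Example 10.2: f(x) = e^x about a = 0, where every derivative at 0 equals 1.
exp_derivs = [1] * (n + 1)

print(taylor_poly(inv_derivs, 1, 1.1), 1 / 1.1)
print(taylor_poly(exp_derivs, 0, 0.5), math.exp(0.5))
```

The printed pairs should agree to several decimal places, since $x = 1.1$ and $x = 0.5$ are close to the respective centres $a = 1$ and $a = 0$.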

10.2 Lagrange’s Remainder Theorem

This theorem provides a formula for the error $E_n(x)$ incurred when approximating $f(x)$ by its $n$-th degree Taylor polynomial $P_n(x)$:

$$E_n(x) = f(x) - P_n(x).$$

Recall that the Mean Value Theorem tells us that there is an $x_0$ between $a$ and $x$ such that

$$f(x) = f(a) + f'(x_0)(x-a),$$
so we may conclude that

$$E_0(x) = f(x) - P_0(x) = f(x) - f(a) = f'(x_0)(x-a).$$
Now consider the error for linearisation:

$$E_1(x) = f(x) - P_1(x) = f(x) - f(a) - f'(a)(x-a).$$
To find the error in this case, we will apply the Generalised Mean Value Theorem to the functions $E_1(x)$ and $(x-a)^2$ on the closed interval with endpoints $a$ and $x$. We conclude that there is a number $c$ between $a$ and $x$ such that
$$\frac{E_1(x) - E_1(a)}{(x-a)^2 - (a-a)^2} = \frac{E_1'(c)}{2(c-a)}.$$
Since $E_1(a) = 0$ and $E_1'(x) = f'(x) - f'(a)$, this becomes
$$\frac{E_1(x)}{(x-a)^2} = \frac{f'(c) - f'(a)}{2(c-a)} = \frac{1}{2}f''(x_0),$$
for some $x_0$ between $a$ and $c$ (by the regular Mean Value Theorem applied to $f'$). In other words, the error is:

$$E_1(x) = \frac{1}{2}f''(x_0)(x-a)^2.$$
The generalisation to all $n$ is as follows:

Fact 10.1 (Lagrange's Remainder Theorem). If $f$ is $(n+1)$-times differentiable on an open interval containing $a$ and $x$, then there exists an $x_0$ between $a$ and $x$ for which

$$E_n(x) = f(x) - P_n(x) = \frac{f^{(n+1)}(x_0)}{(n+1)!}(x-a)^{n+1}.$$
Alternatively, we have

$$f(x) = f(a) + f'(a)(x-a) + \frac{1}{2}f^{(2)}(a)(x-a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x-a)^n + \frac{1}{(n+1)!}f^{(n+1)}(x_0)(x-a)^{n+1}.$$
When $n = 0$, this is just the Mean Value Theorem.

A question which may not have occurred to you is the following: What are the Taylor polynomials of $f$ when $f$ is itself a polynomial? Lagrange's Remainder Theorem answers this question effortlessly:

Fact 10.2. If $f(x)$ is a polynomial of degree at most $n$, then the degree $n$ Taylor polynomial of $f$ about $a$ is precisely $P_n(x) = f(x)$.

Proof. If $f$ has degree $n$ or less, then $f^{(n+1)}(x) = 0$ for all $x$. Lagrange's Remainder Theorem therefore says that there is some $x_0$ such that

$$f(x) = P_n(x) + \frac{f^{(n+1)}(x_0)}{(n+1)!}(x-a)^{n+1} = P_n(x).$$

In practical applications, one may need to obtain approximate values of certain quantities to some given accuracy. Alternatively, one might want to know what the size of the error could be for such a computation. Lagrange’s remainder formula can be used for both these tasks. According to the theorem, the error from using the n-th order approximation is given by

$$E_n(x) = \frac{f^{(n+1)}(x_0)}{(n+1)!}(x-a)^{n+1},$$
for some $x_0$ lying between $a$ and $x$. However, we are usually not in a position to determine $x_0$. Instead, it is usually good enough to determine an upper bound $K$ for $\left|f^{(n+1)}(x_0)\right|$ with $x_0$ between $a$ and $x$. This will then give us an upper bound on the error:

$$|E_n(x)| \leq \frac{K}{(n+1)!}|x-a|^{n+1}.$$

Example 10.3. Find the degree 2 Taylor approximation to $\sqrt{x}$ and use it to give an estimate of $\sqrt{26}$ and the error inherent in this approximation.

Solution. Since $\sqrt{26} \approx 5$, we use the second degree Taylor polynomial of $f(x) = \sqrt{x}$ about $x = 25$ and Lagrange's Remainder Theorem. We compute

$$f'(x) = \frac{1}{2}x^{-1/2} \quad\text{and}\quad f''(x) = -\frac{1}{4}x^{-3/2},$$
hence that
$$f(25) = 5, \quad f'(25) = \frac{1}{10} \quad\text{and}\quad f''(25) = -\frac{1}{500}.$$
The approximation is therefore

$$\sqrt{26} \approx P_2(26) = f(25) + f'(25)(26-25) + \frac{f''(25)}{2}(26-25)^2 = 5 + \frac{1}{10} - \frac{1}{1000} = 5.099.$$
To estimate the error in this approximation, remember that this is given by

$$E_n(x) = \frac{f^{(n+1)}(x_0)}{(n+1)!}(x-a)^{n+1},$$
for some $x_0$ between $x$ and $a$. In our case, $n = 2$, $a = 25$ and $x = 26$, so we compute the third derivative,
$$f'''(x) = \frac{3}{8}x^{-5/2},$$
and bound it on the interval $[25, 26]$ (it is positive and decreasing there):
$$f'''(x_0) \leq f'''(25) = \frac{3/8}{5^5}.$$

The error is therefore
$$|E_2(26)| \leq \frac{3}{8}\cdot\frac{1}{5^5}\cdot\frac{1}{3!}|26-25|^3 = \frac{1}{16\cdot 5^5} = 0.00002.$$
It follows that $\sqrt{26} = 5.09900 \pm 0.00002$. This compares very well with the calculator's answer (which uses a more sophisticated approximation scheme) of $5.099019514\ldots$
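The arithmetic of Example 10.3 can be checked numerically. This Python sketch recomputes $P_2(26)$ and the Lagrange bound from the values derived above, then compares them with the library square root; it assumes nothing beyond the standard `math` module.

```python
import math

a, x = 25.0, 26.0
f0, f1, f2 = 5.0, 1 / 10, -1 / 500       # f(25), f'(25), f''(25) for f(x) = sqrt(x)
p2 = f0 + f1 * (x - a) + f2 / 2 * (x - a) ** 2

# f'''(x0) = (3/8) x0^(-5/2) is decreasing, so on [25, 26] it is largest at x0 = 25.
K = 3 / 8 * a ** (-2.5)
bound = K / math.factorial(3) * abs(x - a) ** 3

actual_error = math.sqrt(x) - p2
print(p2, bound, actual_error)
```

Because $f''' > 0$ on $[25, 26]$, the remainder is positive, so the true value lies just above 5.099; this is consistent with the printed actual error being positive and below the bound.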

10.3 Taylor Series

Given that calculus is based on the notion of limits, a very natural (and useful!) question to ask is what happens to the degree $n$ Taylor polynomials of a function $f$ about $x = a$ as we let $n$ tend to infinity. One would hope to recover the function $f$ in this limit. Something like this is often true, but there are subtleties to address. First, we remark that our hope actually constitutes two separate questions:

1. Does the limit $\lim_{n\to\infty} P_n(x)$ exist?
2. When this limit exists, is it equal to $f(x)$?

Note that the first question may be rephrased as "For which $x$ does $\lim_{n\to\infty} P_n(x)$ exist?". Clearly, it exists for $x = a$ because $P_n(a) = f(a)$ for all $n$. The key to answering this question lies in a comparison with the geometric series we introduced in Fact 8.1 and Equation (8.4). The second question may often be answered using Lagrange's Remainder Theorem.

Example 10.4. Determine the degree $n$ Maclaurin polynomial $P_n(x)$ for
$$f(x) = \frac{1}{1-x},$$
deduce for which values of $x$ the limit $\lim_{n\to\infty} P_n(x)$ exists, and check for which $x$ this limit coincides with $f(x)$.

Solution. It is reasonably easy to convince yourself that the $j$-th derivative of $f$ is given by
$$f^{(j)}(x) = \frac{j!}{(1-x)^{j+1}},$$
hence that $f^{(j)}(0) = j!$. The Maclaurin polynomial of degree $n$ is therefore

$$P_n(x) = \sum_{j=0}^{n}\frac{f^{(j)}(0)}{j!}x^j = \sum_{j=0}^{n}x^j.$$
By Fact 8.1, we may rewrite these polynomials (for $x \neq 1$) in the form

$$P_n(x) = \frac{1 - x^{n+1}}{1-x}$$
and, as in Equation (8.4), the limit as $n \to \infty$ exists for $|x| < 1$ and is
$$\lim_{n\to\infty} P_n(x) = \frac{1}{1-x} \qquad (|x| < 1).$$
If $|x| > 1$, the limit diverges because the limit of $x^{n+1}$ does. Finally, putting $x = 1$ gives $P_n(1) = \sum_{j=0}^{n} 1 = n+1$, which diverges to $\infty$ as $n \to \infty$, and putting $x = -1$ gives $P_n(-1) = \sum_{j=0}^{n}(-1)^j$, which diverges by oscillation (the sum evaluates to 1 if $n$ is even and 0 if $n$ is odd). We therefore conclude that
$$\lim_{n\to\infty} P_n(x) = f(x)$$
precisely when $|x| < 1$.

In general, we call the infinite sum

$$\sum_{j=0}^{\infty}\frac{f^{(j)}(a)}{j!}(x-a)^j$$
the Taylor series of $f$ about $a$, regardless of whether it converges for $x \neq a$. The values of $x$ for which Taylor series do converge are almost completely specified by the following result, a consequence of the well-known ratio test for series convergence:

Fact 10.3. Suppose that we have a sequence $c_0, c_1, c_2, \ldots$ of nonzero real numbers for which

$$\lim_{j\to\infty}\left|\frac{c_{j+1}}{c_j}\right| = r.$$

Then, the series $\sum_{j=0}^{\infty} c_j(x-a)^j$ converges for all $x$ satisfying $|x-a| < r^{-1}$ and diverges for all $x$ satisfying $|x-a| > r^{-1}$. If $r$ is 0 or $\infty$, then we should understand $r^{-1}$ in this conclusion to be $\infty$ or 0, respectively.
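Fact 10.3 suggests a rough numerical experiment: compute $|c_j/c_{j+1}|$ for one large $j$ and read it as an estimate of the radius $r^{-1}$. The Python sketch below does this with exact rational arithmetic for the coefficients of $1/(1-x)$, of $1/(4+5x)$ (Example 10.7) and of $e^x$. It is a heuristic only: a single index cannot prove that the limit exists, and the helper names are our own.

```python
from fractions import Fraction
from math import factorial

def radius_estimate(c, j):
    """Heuristic: approximate r^{-1} by |c_j / c_{j+1}| at a single large index j.
    Only meaningful if the limit in Fact 10.3 exists and is nearly reached at this j."""
    return abs(c(j) / c(j + 1))

def geometric(j):            # coefficients of 1/(1 - x)
    return Fraction(1)

def reciprocal_linear(j):    # coefficients of 1/(4 + 5x), namely (-5)^j / 4^(j+1)
    return Fraction((-5) ** j, 4 ** (j + 1))

def exponential(j):          # coefficients of e^x
    return Fraction(1, factorial(j))

print(radius_estimate(geometric, 50))          # 1
print(radius_estimate(reciprocal_linear, 50))  # 4/5
print(radius_estimate(exponential, 50))        # 51, growing with j: infinite radius
```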

The proof is not difficult, involving a careful comparison of the series $\sum_{j=0}^{\infty} c_j(x-a)^j$ and the geometric series $\sum_{j=0}^{\infty} r^j(x-a)^j$. Note that this result does not tell us whether the series converges or diverges when $|x-a| = r^{-1}$. The number $r^{-1}$ is known as the radius of convergence of the series $\sum_{j=0}^{\infty} c_j(x-a)^j$.

Example 10.5. For which $x$ do the Maclaurin polynomials of $e^x$ converge to $e^x$?

Solution. As we saw in Example 10.2, the Maclaurin polynomials of $e^x$ are

$$P_n(x) = \sum_{j=0}^{n}\frac{x^j}{j!}.$$

The ratio test will tell us the $x$ for which the Maclaurin series
$$\lim_{n\to\infty}\sum_{j=0}^{n}\frac{x^j}{j!} = \sum_{j=0}^{\infty}\frac{x^j}{j!}$$
is convergent. The $c_j$ for this series take the form $1/j!$, so we compute

$$\lim_{j\to\infty}\left|\frac{c_{j+1}}{c_j}\right| = \lim_{j\to\infty}\frac{1/(j+1)!}{1/j!} = \lim_{j\to\infty}\frac{1}{j+1} = 0.$$

By the ratio test, the Maclaurin series has an infinite radius of convergence, meaning that it converges for all $x$.

However, we also need to check that $\sum_{j=0}^{\infty} x^j/j!$ converges to $e^x$ for all $x$. To do this, we apply Lagrange's Remainder Theorem to show that the error, which is the difference between $e^x$ and its degree $n$ Maclaurin polynomial, tends to 0 as $n$ tends to $\infty$. We start by noting that the degree $n$ error has the form
$$E_n(x) = \frac{e^{x_0}}{(n+1)!}x^{n+1},$$
for some $x_0$ lying between $x$ and 0. Fix an $x$, so $e^{x_0}$ is bounded above by $K = \max\{1, e^x\}$ and

$$|E_n(x)| \leq \frac{K}{(n+1)!}|x|^{n+1} = K\,\frac{|x|}{1}\cdot\frac{|x|}{2}\cdots\frac{|x|}{n+1}.$$
Since $x$ is fixed, every factor $|x|/j$ with $j > 2|x|$ is less than $\frac{1}{2}$, so once $n$ is large, each additional factor at least halves the product. The product therefore tends to zero as $n$ tends to infinity. That is, the bound on $|E_n(x)|$ tends to zero as $n \to \infty$ and we conclude that

$$\lim_{n\to\infty}E_n(x) = 0.$$
This conclusion holds for all $x$, so the error between $e^x$ and its Maclaurin polynomials tends to zero for all $x$. In other words, the Maclaurin series converges to $e^x$ for all $x$:

$$e^x = \sum_{j=0}^{\infty}\frac{x^j}{j!} = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \frac{1}{24}x^4 + \cdots \qquad (\text{for all } x \text{ in } \mathbb{R}).$$
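The argument above can be illustrated numerically. The Python sketch below evaluates the degree $n$ Maclaurin polynomial of $e^x$ at $x = 5$, together with the bound $K|x|^{n+1}/(n+1)!$ with $K = \max\{1, e^x\}$; the point $x = 5$ and the chosen degrees are arbitrary.

```python
import math

def maclaurin_exp(x, n):
    """Degree-n Maclaurin polynomial of e^x, evaluated at x."""
    return sum(x ** j / math.factorial(j) for j in range(n + 1))

def lagrange_bound_exp(x, n):
    """The bound K |x|^(n+1) / (n+1)! with K = max(1, e^x), as in the argument above."""
    K = max(1.0, math.exp(x))
    return K * abs(x) ** (n + 1) / math.factorial(n + 1)

for n in (5, 10, 20, 30):
    error = abs(math.exp(5.0) - maclaurin_exp(5.0, n))
    print(n, error, lagrange_bound_exp(5.0, n))
```

The bound is large for small $n$ (the early factors $|x|/j$ exceed one) but collapses once $n$ is well beyond $|x|$, as the argument predicts.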

10.4 Standard Examples of Maclaurin Series

We now give a list of the Maclaurin series of certain elementary functions with their radii of convergence. Note that in each case, the series does actually converge to the elementary function in this convergence region (the error given by Lagrange’s Remainder Theorem can be shown to tend to zero).

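$$\begin{aligned}
\frac{1}{1-x} &= 1 + x + x^2 + x^3 + \cdots = \sum_{j=0}^{\infty} x^j && (|x| < 1),\\
e^x &= 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots = \sum_{j=0}^{\infty}\frac{x^j}{j!} && (\text{for all } x),\\
\ln(1-x) &= -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots = -\sum_{j=1}^{\infty}\frac{x^j}{j} && (|x| < 1),\\
\sin x &= x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040} + \cdots = \sum_{j=0}^{\infty}\frac{(-1)^j x^{2j+1}}{(2j+1)!} && (\text{for all } x),\\
\cos x &= 1 - \frac{x^2}{2} + \frac{x^4}{24} - \frac{x^6}{720} + \cdots = \sum_{j=0}^{\infty}\frac{(-1)^j x^{2j}}{(2j)!} && (\text{for all } x),\\
\sinh x &= x + \frac{x^3}{6} + \frac{x^5}{120} + \frac{x^7}{5040} + \cdots = \sum_{j=0}^{\infty}\frac{x^{2j+1}}{(2j+1)!} && (\text{for all } x),\\
\cosh x &= 1 + \frac{x^2}{2} + \frac{x^4}{24} + \frac{x^6}{720} + \cdots = \sum_{j=0}^{\infty}\frac{x^{2j}}{(2j)!} && (\text{for all } x),\\
\tan^{-1} x &= x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{j=0}^{\infty}\frac{(-1)^j x^{2j+1}}{2j+1} && (|x| < 1).
\end{aligned}$$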
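As a quick numerical sanity check of the table (not a proof of convergence), the Python sketch below compares the partial sums up to $j = 20$ with the standard library's values at $x = 0.5$, a point inside every radius of convergence listed.

```python
import math

def partial_sum(term, n):
    """Sum term(j) for j = 0, 1, ..., n."""
    return sum(term(j) for j in range(n + 1))

x, n = 0.5, 20
checks = {
    "1/(1-x)": (partial_sum(lambda j: x ** j, n), 1 / (1 - x)),
    "e^x": (partial_sum(lambda j: x ** j / math.factorial(j), n), math.exp(x)),
    "ln(1-x)": (-partial_sum(lambda j: x ** (j + 1) / (j + 1), n), math.log(1 - x)),
    "sin x": (partial_sum(lambda j: (-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1), n), math.sin(x)),
    "cos x": (partial_sum(lambda j: (-1) ** j * x ** (2 * j) / math.factorial(2 * j), n), math.cos(x)),
    "sinh x": (partial_sum(lambda j: x ** (2 * j + 1) / math.factorial(2 * j + 1), n), math.sinh(x)),
    "cosh x": (partial_sum(lambda j: x ** (2 * j) / math.factorial(2 * j), n), math.cosh(x)),
    "arctan x": (partial_sum(lambda j: (-1) ** j * x ** (2 * j + 1) / (2 * j + 1), n), math.atan(x)),
}
for name, (series, exact) in checks.items():
    print(f"{name:9s} series={series:.10f} exact={exact:.10f}")
```

The slowest agreement is for $1/(1-x)$ and $\ln(1-x)$, whose terms shrink only geometrically; the factorial denominators make the other series converge much faster.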

We can obtain the Maclaurin series for new functions from those we know because:

• The Maclaurin series of a constant multiple of f is just that multiple times the Maclaurin series of f . The radius of convergence stays the same (unless the multiple is zero!).

• The Maclaurin series of the sum of f and g is just the sum of the Maclaurin series of f and g. The radius of convergence is (at least) the minimum of the radii of convergence of f and g.

• The Maclaurin series of the product of f and g is just the product of the Maclaurin series of f and g. The radius of convergence is (at least) the minimum of the radii of convergence of f and g.

• The Maclaurin series of the derivative of f is obtained by differentiating the Maclaurin series of f , term-by-term. The radius of convergence stays the same.

• The Maclaurin series of an antiderivative of f is obtained by integrating the Maclaurin series of f , term-by-term. The radius of convergence stays the same.

Something similar is true for the ratio f /g of functions, but dividing Maclaurin series is not much fun! The radius of convergence is also non-trivial if g has zeroes within its radius of convergence. We won’t study such ratios.
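The rules above act directly on coefficient lists, which makes them easy to experiment with. The following Python sketch stores a truncated Maclaurin series as a list $[c_0, c_1, \ldots, c_N]$ of exact fractions. The function names are our own, and only the coefficients common to all inputs are kept, since later ones would be incomplete after truncation. The product uses the same regrouping $\ell = j + k$ that appears in Example 10.9.

```python
from fractions import Fraction

# A truncated Maclaurin series is stored as its first coefficients [c_0, c_1, ..., c_N].

def scale(k, f):
    return [k * c for c in f]

def add(f, g):
    n = min(len(f), len(g))          # only the common terms are reliable
    return [f[j] + g[j] for j in range(n)]

def multiply(f, g):
    """Cauchy product: the coefficient of x^l is sum_{j=0}^{l} f_j g_{l-j}."""
    n = min(len(f), len(g))
    return [sum(f[j] * g[l - j] for j in range(l + 1)) for l in range(n)]

def differentiate(f):
    return [j * f[j] for j in range(1, len(f))]

def integrate(f):
    """Term-by-term antiderivative, choosing the constant term to be 0."""
    return [Fraction(0)] + [Fraction(f[j]) / (j + 1) for j in range(len(f))]

geometric = [Fraction(1)] * 8               # 1/(1 - x) = 1 + x + x^2 + ...
square = multiply(geometric, geometric)     # 1/(1 - x)^2 via the product
derivative = differentiate(geometric)       # 1/(1 - x)^2 via differentiation
minus_log = integrate(geometric)            # -ln(1 - x) = x + x^2/2 + x^3/3 + ...
print([str(c) for c in square])             # ['1', '2', '3', '4', '5', '6', '7', '8']
```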

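Example 10.6. Find the Maclaurin series of $e^{-x^2}$ and its radius of convergence.

Solution. We know that
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots = \sum_{j=0}^{\infty}\frac{x^j}{j!} \qquad (\text{for all } x).$$

Replacing $x$ by $-x^2$ therefore gives us the required series:
$$e^{-x^2} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6} + \frac{x^8}{24} - \cdots = \sum_{j=0}^{\infty}\frac{(-1)^j x^{2j}}{j!} \qquad (\text{for all } x),$$
so the radius of convergence is infinite.

Example 10.7. Find the Maclaurin series of
$$\frac{1}{4+5x}.$$
Solution. We write
$$\frac{1}{4+5x} = \frac{1}{4\left(1 + \frac{5x}{4}\right)} = \frac{1}{4}\cdot\frac{1}{1 - \left(-\frac{5x}{4}\right)}$$
and let $y = -5x/4$, so that we can use the known series for $1/(1-y)$. Thus,
$$\begin{aligned}
\frac{1}{4+5x} = \frac{1}{4}\cdot\frac{1}{1-y} &= \frac{1}{4}\sum_{j=0}^{\infty}y^j && (|y| < 1)\\
&= \frac{1}{4}\sum_{j=0}^{\infty}\left(\frac{-5x}{4}\right)^j && (|-5x/4| < 1)\\
&= \sum_{j=0}^{\infty}\frac{(-5)^j x^j}{4^{j+1}} && (|x| < 4/5).
\end{aligned}$$

The radius of convergence is therefore $\frac{4}{5}$.

Example 10.8. Check the given Maclaurin series for $\ln(1-x)$ and $\tan^{-1}x$ using only that of $1/(1-x)$.

Solution. We note that
$$\ln(1-x) = -\int_0^x\frac{dt}{1-t} \qquad\text{for } |x| < 1.$$
Therefore, we can integrate the Maclaurin series of $1/(1-t)$ term-by-term to get
$$\ln(1-x) = -\int_0^x\sum_{j=0}^{\infty}t^j\,dt = -\sum_{j=0}^{\infty}\int_0^x t^j\,dt = -\sum_{j=0}^{\infty}\left[\frac{t^{j+1}}{j+1}\right]_0^x$$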

$$= -\sum_{j=0}^{\infty}\frac{x^{j+1}}{j+1} = -\sum_{k=1}^{\infty}\frac{x^k}{k},$$
as required. Because the radius of convergence of $1/(1-t)$ is 1, so is that of $\ln(1-x)$.

For $\tan^{-1}x$, we instead note that
$$\tan^{-1}x = \int_0^x\frac{dt}{1+t^2}.$$
The Maclaurin series of the integrand is easily obtained from that of $1/(1-x)$ by substituting $x = -t^2$:
$$\frac{1}{1+t^2} = \sum_{j=0}^{\infty}(-t^2)^j = \sum_{j=0}^{\infty}(-1)^j t^{2j} \qquad (|t| < 1).$$
We again integrate term-by-term to arrive at the required series:

$$\tan^{-1}x = \sum_{j=0}^{\infty}(-1)^j\int_0^x t^{2j}\,dt = \sum_{j=0}^{\infty}\frac{(-1)^j x^{2j+1}}{2j+1} \qquad (|x| < 1).$$

Example 10.9. Find the Maclaurin series of
$$\frac{1}{(1-x)^2},$$
first by multiplying that of $1/(1-x)$ by itself, and second by differentiating that of $1/(1-x)$.

Solution. Multiplying Maclaurin series is a little painful. The trick is to remember to use different summation indices:
$$\frac{1}{(1-x)^2} = \frac{1}{1-x}\cdot\frac{1}{1-x} = \sum_{j=0}^{\infty}x^j\sum_{k=0}^{\infty}x^k = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty}x^{j+k}.$$
To get this into the form of a Maclaurin series, we let $\ell = j+k$ and note that as $k$ runs from 0 to $\infty$, $\ell$ will run from $j$ to $\infty$. Thus,
$$\frac{1}{(1-x)^2} = \sum_{j=0}^{\infty}\sum_{\ell=j}^{\infty}x^\ell.$$
The final step is to change the order of the summations and evaluate the sum over $j$. To do this, draw the region specified by $j \geq 0$ and $\ell \geq j$ in the plane. The current order of the summations tells us that once $j$ is fixed, $\ell$ runs from $j$ to $\infty$. We want the opposite order, so suppose that $\ell$ is fixed. Your picture should make it clear that $\ell$ can range from 0 to $\infty$. However, once we fix $\ell$, $j$ can only range from 0 to $\ell$. In other words, we have

$$\frac{1}{(1-x)^2} = \sum_{\ell=0}^{\infty}\sum_{j=0}^{\ell}x^\ell = \sum_{\ell=0}^{\infty}(\ell+1)x^\ell,$$
as the sum over $j$ is now trivial to evaluate.

Using derivatives is much easier. Simply differentiate term-by-term:
$$\frac{1}{(1-x)^2} = \frac{d}{dx}\frac{1}{1-x} = \frac{d}{dx}\sum_{j=0}^{\infty}x^j = \sum_{j=0}^{\infty}\frac{d}{dx}x^j = \sum_{j=1}^{\infty}jx^{j-1} = \sum_{\ell=0}^{\infty}(\ell+1)x^\ell.$$
Here, we have noted that $j = 0$ contributes nothing in the second-last summation, so we have removed it and substituted $j = \ell+1$. Either way, we get the same answer. The radius of convergence is still 1.

Finally, we conclude with a quick application of this technology to computing limits.

Example 10.10. Evaluate the following limit using Maclaurin polynomials:
$$\lim_{x\to 0}\frac{2\sin x - \sin(2x)}{2e^x - 2 - 2x - x^2}.$$
Solution. We begin by replacing the sine and exponential functions with their degree 3 Maclaurin polynomials (plus error terms denoted by $O(x^n)$, where $x^n$ is the first term we omit from the Maclaurin series):

$$2\sin x = 2x - x^3/3 + O(x^5), \qquad \sin(2x) = 2x - 4x^3/3 + O(x^5),$$
$$2e^x = 2 + 2x + x^2 + x^3/3 + O(x^4).$$

The limit therefore becomes
$$\lim_{x\to 0}\frac{2\sin x - \sin(2x)}{2e^x - 2 - 2x - x^2} = \lim_{x\to 0}\frac{2x - x^3/3 - 2x + 4x^3/3 + O(x^5)}{2 + 2x + x^2 + x^3/3 - 2 - 2x - x^2 + O(x^4)} = \lim_{x\to 0}\frac{x^3 + O(x^5)}{x^3/3 + O(x^4)}.$$
Dividing numerator and denominator by $x^3$ now gives the answer:

$$\lim_{x\to 0}\frac{2\sin x - \sin(2x)}{2e^x - 2 - 2x - x^2} = \lim_{x\to 0}\frac{1 + O(x^2)}{1/3 + O(x)} = 3.$$
Note that we needed polynomials of degree at least 3 because all the lower degree terms cancelled out in both the numerator and denominator.
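A numerical check of this limit is easy, with one caveat: both numerator and denominator are differences of nearly equal numbers, so evaluating at extremely small $x$ loses precision to cancellation. The Python sketch below therefore stops at $x = 0.001$; the sample points are arbitrary.

```python
import math

def quotient(x):
    """The expression inside the limit of Example 10.10."""
    numerator = 2 * math.sin(x) - math.sin(2 * x)
    denominator = 2 * math.exp(x) - 2 - 2 * x - x ** 2
    return numerator / denominator

for x in (0.1, 0.01, 0.001):
    print(x, quotient(x))
```

The printed values should approach 3 as $x$ decreases.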

10.5 The Binomial Theorem

There is one standard function missing from the examples of Maclaurin series given in the previous section. This is the square root function (and its generalisations). This series is extraordinarily important and thus warrants a section all of its own. Consider therefore the function

$$f(x) = (1+x)^n,$$
where $n$ is an arbitrary, but fixed, real number. Let's differentiate:

$$f'(x) = n(1+x)^{n-1}, \quad f''(x) = n(n-1)(1+x)^{n-2}, \quad f'''(x) = n(n-1)(n-2)(1+x)^{n-3}, \quad \ldots$$
The pattern is clear: the $j$-th derivative at 0 is given by

$$f^{(j)}(0) = n(n-1)\cdots(n-j+1),$$
so the Maclaurin series of $(1+x)^n$ is

$$\sum_{j=0}^{\infty}\frac{n(n-1)\cdots(n-j+1)}{j!}x^j.$$

The radius of convergence is easy to compute using Fact 10.3. The coefficients in the Maclaurin series have the form $c_j = n(n-1)\cdots(n-j+1)/j!$ and
$$\lim_{j\to\infty}\left|\frac{c_{j+1}}{c_j}\right| = \lim_{j\to\infty}\left|\frac{n(n-1)\cdots(n-j)/(j+1)!}{n(n-1)\cdots(n-j+1)/j!}\right| = \lim_{j\to\infty}\frac{|n-j|}{j+1} = 1,$$
so the radius of convergence is 1. One can show that the Maclaurin series indeed converges to $(1+x)^n$ within this radius, so we have

$$(1+x)^n = \sum_{j=0}^{\infty}\frac{n(n-1)\cdots(n-j+1)}{j!}x^j \qquad (|x| < 1).$$
This result is known as the Binomial Series. Note that when $n$ is a non-negative integer, the coefficients may be simplified to

$$c_j = \begin{cases} \dfrac{n!}{j!(n-j)!} = \dbinom{n}{j} & \text{if } j \leq n,\\[1ex] 0 & \text{otherwise,}\end{cases}$$
where we recognise the binomial coefficients $\binom{n}{j}$. The binomial series therefore truncates to the polynomial expansion
$$(1+x)^n = \sum_{j=0}^{n}\frac{n!}{j!(n-j)!}x^j = \sum_{j=0}^{n}\binom{n}{j}x^j \qquad (n = 0, 1, 2, \ldots).$$
There is no constraint upon $x$ now because the series is just a polynomial. Using the trick
$$(x+y)^n = x^n\left(1 + \frac{y}{x}\right)^n$$
and expanding in $y/x$, we arrive at the following useful result:

Fact 10.4 (The Binomial Theorem). If $n$ is a non-negative integer, then

$$(x+y)^n = \sum_{j=0}^{n}\binom{n}{j}x^j y^{n-j}.$$
Of course, the binomial series for general $n$ is only guaranteed to converge when $|x| < 1$. For some $n$, the series will converge for $x = 1$ and/or $x = -1$ as well.

Example 10.11. Compute the binomial series for $n = -\frac{1}{2}$ and use it to compute the Maclaurin series of $\sin^{-1}x$ and its radius of convergence.

Solution. With $n = -\frac{1}{2}$, the coefficients of the binomial series become
$$c_j = \frac{1}{j!}\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)\left(-\frac{5}{2}\right)\cdots\left(-\frac{2j-1}{2}\right) = \frac{(-1)^j(2j-1)!!}{2^j j!},$$
where we introduce the double factorial, given by

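$$m!! = \begin{cases} m(m-2)(m-4)\cdots 2 & \text{if } m \text{ is even and positive},\\ m(m-2)(m-4)\cdots 1 & \text{if } m \text{ is odd and positive},\\ 1 & \text{if } m = 0 \text{ or } m = -1.\end{cases}$$
We therefore have
$$\frac{1}{\sqrt{1+x}} = \sum_{j=0}^{\infty}\frac{(-1)^j(2j-1)!!}{2^j j!}x^j = 1 - \frac{1}{2}x + \frac{3}{8}x^2 - \frac{5}{16}x^3 + \frac{35}{128}x^4 - \cdots \qquad (|x| < 1).$$

To derive the Maclaurin series for $\sin^{-1}x$, we replace $x$ by $-t^2$ and then integrate:
$$\frac{1}{\sqrt{1-t^2}} = \sum_{j=0}^{\infty}\frac{(-1)^j(2j-1)!!}{2^j j!}(-t^2)^j = \sum_{j=0}^{\infty}\frac{(2j-1)!!}{2^j j!}t^{2j}$$
$$\Rightarrow\quad \sin^{-1}x = \int_0^x\frac{dt}{\sqrt{1-t^2}} = \sum_{j=0}^{\infty}\frac{(2j-1)!!}{2^j j!(2j+1)}x^{2j+1} = x + \frac{1}{6}x^3 + \frac{3}{40}x^5 + \frac{5}{112}x^7 + \cdots \qquad (|x| < 1),$$
so the radius of convergence is 1.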
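The coefficients in Example 10.11 can be generated mechanically from the generalised binomial coefficient $n(n-1)\cdots(n-j+1)/j!$. The Python sketch below (helper names are our own) uses exact fractions to reproduce the coefficients of $1/\sqrt{1+x}$ and of $\sin^{-1}x$, using the identity $(2j-1)!!/(2^j j!) = (-1)^j c_j$ with $c_j$ the $n = -\frac{1}{2}$ coefficient, and compares a 40-term partial sum with the library value at $x = 0.5$.

```python
import math
from fractions import Fraction

def binomial_coeff(n, j):
    """Generalised binomial coefficient n(n-1)...(n-j+1)/j! for a rational n."""
    c = Fraction(1)
    for i in range(j):
        c = c * (n - i) / (i + 1)
    return c

half = Fraction(-1, 2)
print([str(binomial_coeff(half, j)) for j in range(5)])   # ['1', '-1/2', '3/8', '-5/16', '35/128']

def asin_coeff(j):
    """Coefficient of x^(2j+1) in sin^{-1} x: (2j-1)!!/(2^j j! (2j+1))."""
    return (-1) ** j * binomial_coeff(half, j) / (2 * j + 1)

def asin_series(x, terms=40):
    return sum(float(asin_coeff(j)) * x ** (2 * j + 1) for j in range(terms))

print([str(asin_coeff(j)) for j in range(4)])   # ['1', '1/6', '3/40', '5/112']
print(asin_series(0.5), math.asin(0.5))
```

For a non-negative integer $n$ the same function returns the ordinary binomial coefficients, and it returns 0 once $j > n$, matching the truncation described above.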

Chapter 11

Differential Equations

11.1 Ordinary Differential Equations and their Solutions

A differential equation is the name given to any equation in which the variable to be solved for appears as a derivative with respect to an independent variable. Examples include
$$\frac{dy}{dx} = 7y \qquad\text{and}\qquad \frac{d^2y}{dx^2} - 3\left(\frac{dy}{dx}\right)^2 = 0.$$
When there is a single independent variable, differential equations are said to be ordinary. One therefore speaks of ordinary differential equations (often abbreviated as ODEs). The order of a differential equation is that of the highest derivative present in the equation. Thus, the above differential equations are first order and second order, respectively.

Given a first order ordinary differential equation of the form
$$\frac{dy}{dx} = f(x,y),$$
we can consider its solution curves. A solution curve is the graph of a function $y(x)$ whose slope at each point $(x,y)$ is given by $dy/dx = f(x,y)$. The direction field is a geometric object formed by a grid of arrows which are always tangential to the solution curves. We will take each arrow to be centered at a point $(x,y)$ and to have slope $dy/dx = f(x,y)$ there. The point of direction fields is that they are significantly easier to calculate than the actual solution curves. Moreover, direction fields give one an idea of what the solution curves look like, since the tangent line at each point of a solution curve is given by the direction field at that point (see Figure 11.1).

If the function $y = f(x)$ satisfies a differential equation on some open interval, then it is said to be a solution to that differential equation on that interval. This means that if we substitute $y = f(x)$ into the differential equation, we get an identity. For example, the function
$$y = \frac{9}{4}x^4$$
is a solution of the ordinary differential equation
$$\frac{dy}{dx} = 6xy^{1/2}$$
on the interval $-\infty < x < \infty$. Verifying this is very easy. We just substitute the solution into the differential equation:
$$\frac{d}{dx}\left(\frac{9}{4}x^4\right) = 6x\left(\frac{9}{4}x^4\right)^{1/2} \quad\Rightarrow\quad 9x^3 = 9x^3.$$

Figure 11.1: An illustration of a single arrow of the vector field of a differential equation. The curve $y = f(x)$ is assumed to be a solution curve which passes through $(X_0, Y_0)$.

This is indeed an identity, demonstrating that we have found a solution.

Differential equations generally have many solutions. This shouldn't surprise us, because integration tends to introduce arbitrary constants. For example, the differential equation
$$\frac{dy}{dx} = x + \frac{1}{2}$$
is really just asking us to find an antiderivative of the right-hand side. Integrating both sides gives
$$y = \int\left(x + \frac{1}{2}\right)dx = \frac{x^2}{2} + \frac{x}{2} + C,$$
where $C$ is an arbitrary constant. The functions $y(x) = \frac{x^2}{2} + \frac{x}{2} + C$, for any choice of $C$, are therefore all solutions to the differential equation. Moreover, it is clear that these are the only solutions. Solutions like this one, involving arbitrary constants, are called general solutions. They correspond to an infinite family of solution curves, one for each $C$. If, however, we choose a specific $C$ to obtain a specific solution, then the result is called a particular solution of the differential equation.

11.2 Initial Value Problems

An ordinary differential equation, equipped with a condition specifying a point $(x_0, y_0)$ which must lie on the solution curve, is called an initial value problem. A common means of expressing an initial value problem is as follows:
$$\frac{dy}{dx} = f(x,y), \qquad\text{with } y(x_0) = y_0.$$
Here, $x_0$ and $y_0$ are given values that are interpreted as the initial values from which the required solution of the equation should be derived. As an example, consider
$$\frac{dy}{dx} = x + \frac{1}{2}, \qquad\text{with } y(1) = 2.$$
This is identical to the example treated in the previous section, but now we have the initial value $y(1) = 2$ (so $x_0 = 1$ and $y_0 = 2$). We have already seen that the general solution is
$$y(x) = \frac{x^2}{2} + \frac{x}{2} + C.$$

Figure 11.2: The vector field (top) and solution curves (bottom) of the differential equation $dy/dx = x + \frac{1}{2}$.

What the initial value does is pick out a particular solution by requiring that the solution curve pass through $(x_0, y_0) = (1, 2)$. To find which $C$ is distinguished by this requirement, we simply substitute $x = x_0 = 1$ and $y = y_0 = 2$ into our general solution:
$$2 = \frac{1}{2} + \frac{1}{2} + C \quad\Rightarrow\quad C = 1.$$
The solution to the initial value problem is therefore the particular solution

$$y(x) = \frac{x^2}{2} + \frac{x}{2} + 1.$$

11.3 Separable Differential Equations

Many first order differential equations can be put in the following form:
$$f(y)\frac{dy}{dx} = g(x) \quad\Rightarrow\quad f(y)\,dy = g(x)\,dx.$$
Such a differential equation is said to be separable because the variables $x$ and $y$ can be separated from one another so that $x$ appears only on the right-hand side and $y$ appears only on the left-hand side (or vice versa). To solve an equation of this type, we just integrate both sides with respect to the variable appearing on that side:

$$\int f(y)\,dy = \int g(x)\,dx + C.$$
This gives the general solution, in which $C$ is an arbitrary constant of integration.

Example 11.1. Find the general solution of
$$\frac{dy}{dx} + 3x^2y = 0.$$
Solution. This equation is separable because we can write
$$\frac{dy}{y} = -3x^2\,dx.$$
Integrating both sides gives

$$\int\frac{dy}{y} = -3\int x^2\,dx \quad\Rightarrow\quad \ln|y| = -x^3 + C'.$$
The general solution is therefore

$$y = e^{-x^3 + C'} = Ce^{-x^3},$$

where we have written $C = e^{C'}$. Note that we should really have written $|y| = e^{-x^3 + C'}$. However, allowing $C$ to take negative as well as positive values lets us remove the absolute value signs without consequence. Taking $C = 0$ also gives a solution, $y = 0$, which was excluded when we divided by $y$.

Sometimes, the algebraic manipulation that allows us to separate the variables introduces division by one or more expressions. In such cases, the results are valid where the denominators are different from zero, but care should be taken when a denominator vanishes.
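Direction fields also suggest a simple numerical method: starting from a point, repeatedly take a short straight step in the direction of the arrow there. This is forward Euler's method, which is not discussed in these notes and is swapped in here only as an illustration. The Python sketch below applies it to Example 11.1 with the initial value $y(0) = 2$ (an arbitrary choice, which picks out $C = 2$) and compares with the exact solution $2e^{-x^3}$ at $x = 1$.

```python
import math

def euler(slope, x0, y0, x_end, steps):
    """Forward Euler: follow the direction field in `steps` equal straight steps."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        y += h * slope(x, y)   # move along the arrow at (x, y)
        x += h
    return y

def slope(x, y):
    return -3 * x ** 2 * y       # dy/dx = -3x^2 y, from Example 11.1

def exact(x):
    return 2 * math.exp(-x ** 3)   # C = 2, so that y(0) = 2

for steps in (10, 100, 1000):
    approx = euler(slope, 0.0, 2.0, 1.0, steps)
    print(steps, approx, abs(approx - exact(1.0)))
```

Each tenfold reduction of the step size should reduce the error roughly tenfold, as expected of a first order method.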

11.4 Applications

Our first application of ordinary differential equations is to radioactive decay. Specifically, we treat the decay of carbon-14, which is important in dating artefacts from prehistory. The carbon in the atmosphere is mostly present as carbon dioxide and consists largely of the stable isotopes $^{12}$C and $^{13}$C with a small amount of the radioactive isotope $^{14}$C (carbon-14). This isotope is formed by the action of cosmic rays, or rather the neutrons that they produce, on the stable isotope $^{14}$N (nitrogen-14) in the upper atmosphere. It then decays back into $^{14}$N through beta decay. In the atmosphere, this process is in equilibrium, so the proportions of the three naturally occurring isotopes of carbon in the atmosphere are very nearly constant. Living organisms absorb carbon from the atmosphere and so have $^{12}$C, $^{13}$C and $^{14}$C in this constant ratio. However, death halts this absorption, so no more $^{14}$C is incorporated into the remains. Being radioactive, the old $^{14}$C slowly decays away and the ratio of $^{14}$C to $^{12}$C slowly decreases. If we know how this ratio decreases, then we can determine how long ago the remains of something that was alive (e.g. bone, wood, rope, amber, coral) died.

Radioactive material such as $^{14}$C is usually assumed to decay at a rate directly proportional to the amount of material. This results in a very simple ordinary differential equation:
$$\frac{dx}{dt} = -kx.$$
Here, $x$ is the amount of $^{14}$C, $t$ is the time since death, and $k > 0$ is a constant of proportionality that is characteristic of the material. The growth rate is explicitly assumed to be negative, expressing the fact that we are studying a decay. This differential equation is separable, so the general solution is easily found:
$$\frac{dx}{x} = -k\,dt \quad\Rightarrow\quad \int\frac{dx}{x} = -k\int dt \quad\Rightarrow\quad \ln x = -kt + C'.$$
Note that $x$ is positive, being the amount of $^{14}$C present in the sample, hence we can dispense with the absolute values in the logarithm. Simplifying, we obtain

kt+C kt C x(t)= e− ′ = Ce− where C = e ′ .

14 You can check that x(0), the amount of C at time t = 0, is just C. After some time t1, half of this initial amount will have decayed and so half will remain. Mathematically, x(0) x(t )= . 1 2 Thus, x 0 1 ln2 ( ) kt1 kt1 = x(0)e− e− = t1 = . 2 ⇒ 2 ⇒ k We thereby learn that the length of time taken for half of a sample of radioactive material to decay depends only on the constant k and not on the initial amount x(0). This length of time is called the half-life of the material h. We have shown above that ln2 ln2 h = , so k = , k h and that the amount of radioactive material left at any time t will be given by

t ln2/h x(t)= x(0)e− .

140 The half-life of 14C has been measured to be approximately 5730 years. So, if you find a bone that is measured to have 40% of the 14C that living things contained, we can calculate its age as follows: x(t)/x(0)= 0.4, so

$$0.4 = e^{-t\ln 2/(5730\,\text{y})} \quad\Rightarrow\quad t = -(5730\,\text{y})\,\frac{\ln 0.4}{\ln 2} \approx 7575\,\text{y}.$$

In other words, the bone is around seven and a half thousand years old.
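The half-life relation $k = \ln 2/h$ and the dating calculation above are easy to reproduce numerically. The following Python sketch is illustrative only. The function names `decay_constant`, `remaining_fraction` and `age_from_fraction` are our own hypothetical choices, and the only data used is the 5730-year half-life quoted above.

```python
import math

HALF_LIFE_C14 = 5730.0  # years, the measured half-life quoted in the text


def decay_constant(half_life):
    """k = ln 2 / h, obtained by rearranging h = ln 2 / k."""
    return math.log(2) / half_life


def remaining_fraction(t, half_life):
    """x(t)/x(0) = exp(-t ln 2 / h)."""
    return math.exp(-t * math.log(2) / half_life)


def age_from_fraction(fraction, half_life):
    """Solve fraction = exp(-t ln 2 / h) for t, giving t = -h ln(fraction) / ln 2."""
    return -half_life * math.log(fraction) / math.log(2)


age = age_from_fraction(0.4, HALF_LIFE_C14)
print(round(age))  # 7575, matching the hand calculation
print(remaining_fraction(HALF_LIFE_C14, HALF_LIFE_C14))  # approximately 0.5 after one half-life
```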

11.5 First Order Linear Differential Equations

A first order differential equation is said to be linear if it can be written in the form

y′ + p(x)y = q(x).

This equation is linear in the unknown function $y$ and its derivative $y'$, but $p$ and $q$ can be arbitrary functions of $x$ (though we must impose conditions if we wish the equation to have solutions!). Note that if $q(x) = 0$, then the equation is separable. However, any first order linear differential equation may be solved using the integrating factor method. To describe this, choose an antiderivative $P(x)$ of $p(x)$ and multiply both sides of the differential equation by $e^{P(x)}$:

$$e^{P(x)}y' + e^{P(x)}p(x)\,y = e^{P(x)}q(x).$$

Because $P'(x) = p(x)$, we recognise the left-hand side as the derivative of $e^{P(x)}y$ (check this using the product rule!). Integrating both sides with respect to $x$ therefore gives

$$e^{P(x)}y = \int e^{P(x)}q(x)\,dx \quad\Rightarrow\quad y = e^{-P(x)}\int e^{P(x)}q(x)\,dx.$$

The function $e^{P(x)}$ is called an integrating factor for the differential equation, because multiplying the equation by this factor allows us to integrate (solve) it. Of course, we might not be able to do the integration required in the solution, but it's the solution nonetheless.
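The integrating factor formula can also be evaluated numerically when the integrals cannot be done by hand. The sketch below is one way to do this, under the following assumptions:

- The helper name `solve_linear_first_order` is hypothetical.
- The antiderivative $P$ is chosen so that $P(x_0) = 0$, which fixes the constant through an initial condition $y(x_0) = y_0$.
- Both integrals are approximated with the trapezoidal rule, a numerical method not developed in these notes.

```python
import math


def solve_linear_first_order(p, q, x0, y0, x1, n=10_000):
    """Approximate y(x1) for y' + p(x) y = q(x) with y(x0) = y0.

    Uses the integrating factor e^{P(x)}, where P is the antiderivative of p
    with P(x0) = 0, so that e^{P(x)} y(x) = y0 + (integral from x0 to x of
    e^{P(s)} q(s) ds). Both integrals use the trapezoidal rule with n steps.
    """
    h = (x1 - x0) / n
    xs = [x0 + i * h for i in range(n + 1)]
    # Running trapezoidal sum for P(x) = integral of p from x0 to x.
    P = [0.0]
    for i in range(1, n + 1):
        P.append(P[-1] + 0.5 * h * (p(xs[i - 1]) + p(xs[i])))
    # Integrand e^{P(s)} q(s) of the integrating-factor formula, on the grid.
    g = [math.exp(P_i) * q(x) for P_i, x in zip(P, xs)]
    integral = sum(0.5 * h * (g[i - 1] + g[i]) for i in range(1, n + 1))
    # Multiply back by e^{-P(x1)} to recover y(x1).
    return math.exp(-P[-1]) * (y0 + integral)


# y' + 3y = x with y(0) = 0 (initial condition chosen for illustration);
# the exact solution is then x/3 - 1/9 + e^{-3x}/9.
approx = solve_linear_first_order(lambda x: 3.0, lambda x: x, 0.0, 0.0, 1.0)
exact = 1 / 3 - 1 / 9 + math.exp(-3.0) / 9
print(approx, exact)
```

Computing $P$ as a running sum on a single grid keeps the cost proportional to the number of steps $n$. The alternative, recomputing $\int_{x_0}^{s} p$ separately for every $s$, would cost of order $n^2$.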

Example 11.2. Find the general solution of the differential equation

y′ + 3y = x.

Solution. This is in the standard form for first order linear differential equations, with $p(x) = 3$ and $q(x) = x$. An integrating factor is therefore

$$e^{\int p(x)\,dx} = e^{\int 3\,dx} = e^{3x}.$$

Notice that we don't need an arbitrary constant: any antiderivative will do! The general solution is therefore

$$y = e^{-3x}\int e^{3x}q(x)\,dx = e^{-3x}\int e^{3x}x\,dx.$$

The remaining integration can be evaluated using integration by parts. Set $u = x$ and $dv = e^{3x}\,dx$, so that $du = dx$ and $v = \frac13 e^{3x}$. Thus,

$$y = e^{-3x}\left(x\cdot\frac13 e^{3x} - \int\frac13 e^{3x}\,dx\right) = e^{-3x}\left(\frac{x}{3}e^{3x} - \frac19 e^{3x} + C\right) = \frac{x}{3} - \frac19 + Ce^{-3x}.$$

Example 11.3. At time $t = 0$, a tank contains 4 grams of salt dissolved in 100 litres of water. Suppose that brine containing 2 grams of salt per litre of water enters the tank at a rate of 5 litres per minute and that the mixed solution is drained from the tank at the same rate. Find the amount of salt in the tank after 10 minutes.

Solution. If $y(t)$ is the amount of salt, measured in grams, in the tank at time $t$, measured in minutes, we want to find $y(10)$. The rate of change of the amount of salt in the tank is $dy/dt$. Thus,

$$\frac{dy}{dt} = \text{rate of salt in} - \text{rate of salt out}.$$

The rate in is easy to compute:

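$$\text{rate of salt in} = 2\,\text{g/L}\times 5\,\text{L/min} = 10\,\text{g/min}.$$

To get the rate out, note that brine enters and leaves at the same rate, so the tank always holds 100 litres. At time $t$, the mixture therefore contains $y(t)$ grams of salt in 100 litres of water, so the concentration of salt at time $t$ is just $y(t)/(100\,\text{L})$. We therefore have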

$$\text{rate of salt out} = \frac{y(t)}{100\,\text{L}}\times 5\,\text{L/min} = \frac{y(t)}{20\,\text{min}}.$$

In this way, we arrive at our differential equation:

$$\frac{dy}{dt} = 10\,\text{g/min} - \frac{y}{20\,\text{min}} \quad\Rightarrow\quad \frac{dy}{dt} + \frac{y}{20\,\text{min}} = 10\,\text{g/min}.$$

As this is a first order linear differential equation with initial condition $y(0) = 4\,\text{g}$, we can solve it with an integrating factor. Here, $p(t) = 1/(20\,\text{min})$ and $q(t) = 10\,\text{g/min}$. The integrating factor may therefore be chosen as

$$e^{\int p(t)\,dt} = e^{\int dt/(20\,\text{min})} = e^{t/(20\,\text{min})}.$$

Note that the units cancel out as they must (you can't exponentiate a dimensionful quantity!). The general solution is now

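$$y(t) = e^{-t/(20\,\text{min})}\int e^{t/(20\,\text{min})}\times 10\,\text{g/min}\,dt = e^{-t/(20\,\text{min})}\left(10\,\text{g/min}\times 20\,\text{min}\; e^{t/(20\,\text{min})} + C\right) = 200\,\text{g} + Ce^{-t/(20\,\text{min})}.$$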

Substituting in the initial condition, we obtain

$$4\,\text{g} = 200\,\text{g} + C \quad\Rightarrow\quad C = -196\,\text{g}.$$

The amount of salt in the tank after 10 minutes is therefore

$$y(10\,\text{min}) = 200\,\text{g} - 196\,\text{g}\times e^{-(10\,\text{min})/(20\,\text{min})} \approx 81.12\,\text{g}.$$
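As a sanity check on the salt-tank answer, the sketch below compares the closed-form solution $y(t) = 200 - 196e^{-t/20}$ (with $t$ in minutes and $y$ in grams) against a direct numerical integration of $dy/dt = 10 - y/20$. The numerical method is the classical fourth-order Runge-Kutta scheme, which is a swapped-in technique that these notes do not cover. The step count of 100 and the helper names are arbitrary, hypothetical choices.

```python
import math


def salt_closed_form(t):
    """y(t) = 200 - 196 e^{-t/20} grams, with t in minutes (the solution above)."""
    return 200.0 - 196.0 * math.exp(-t / 20.0)


def rate(t, y):
    """dy/dt = (rate in) - (rate out) = 10 - y/20, in grams per minute."""
    return 10.0 - y / 20.0


def rk4(f, t0, y0, t1, steps=100):
    """Classical fourth-order Runge-Kutta method for a scalar ODE y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


print(round(salt_closed_form(10.0), 2))      # 81.12
print(round(rk4(rate, 0.0, 4.0, 10.0), 2))   # 81.12 as well
```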

Chapter 12

Functions of Several Variables

12.1 Definitions and Geometry

Recall that we defined a function $f$ as a rule of correspondence between two subsets of $\mathbb{R}$, the domain and the range, which assigns a single element of the range to each element of the domain. We now generalise this definition to consider functions in which the domain is a subset of $\mathbb{R}^n$, for some $n$. The range will still be assumed to be a subset of $\mathbb{R}$, so we will speak of real-valued functions of $n$ real variables. One can also consider ranges in $\mathbb{R}^m$, leading to vector-valued functions (and more), but we shall not do so here. Typical examples of the multivariable functions that we will deal with are those given by

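$$f\colon\mathbb{R}^2\longrightarrow\mathbb{R},\quad f(x,y) = \sqrt{x^2 + y^2}, \qquad g\colon\mathbb{R}^3\longrightarrow\mathbb{R},\quad g(x,y,z) = x^2y^3 + x\cos z + yz\sin x.$$

We can visualise a function $f$ of two variables geometrically by considering the surface in $\mathbb{R}^3$ consisting of those points $(x,y,z)$ for which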

z = f (x,y).

The function $f$ simply gives the height of the surface above the $xy$-plane. This surface is therefore the generalisation to three dimensions of the graph of the function $f$. We sketch the graph of a particular function of two variables in Figure 12.1. An alternative way of visualising a function of two variables is through what are called its level curves. These are contours in the $xy$-plane along which $f(x,y)$ has a fixed value (height above the $xy$-plane). Thus, each plane parallel to the $xy$-plane meets the surface in a curve (or curves), with equation $f(x,y) = k$. These curves are usually plotted in a single plane for a number of values of $k$. As with a map, only a representative set can be drawn on any one picture. If the contours are drawn for equally spaced values of $k$, then the spacing of the contours gives information about the relative steepness at various places: the surface is steepest where the contour lines are closest together. At a local maximum or minimum of a differentiable function, the tangent plane to its graph is horizontal. Around these extrema, the contours are concentric ovals (or something similar). This is because the value $f(x,y)$ decreases (or increases) as $(x,y)$ moves away from the maximum (or minimum). However, functions of two variables may also have saddle points. These are generalisations of those stationary points of a function of one variable which are neither local maxima nor minima; $f(x) = x^3$ has such a point at $x = 0$. At these points, functions increase in some directions and decrease in other directions. This is the characteristic property of saddle points. Some examples of the graphs of a function of two variables are shown in Figure 12.2. In these examples, you should try to recognise the minima, maxima and saddle points, both from the surfaces and from their level curves. We'll see in Section 13.6 how to find these points.

Figure 12.1: A graph of the hyperbolic paraboloid $f(x,y) = y^2 - x^2$.

Example 12.1. Consider the function $f(x,y) = y^2 - x^2$. The level curves of $f$ are then the curves $y^2 - x^2 = k$, for $k$ constant. There are three cases:

(i) For $k > 0$, the level curves are the hyperbolae
$$y^2 = x^2 + k \quad\Rightarrow\quad y = \pm\sqrt{x^2 + k} \qquad (-\infty < x < \infty).$$

(ii) For $k < 0$, the level curves are the hyperbolae
$$x^2 = y^2 - k \quad\Rightarrow\quad x = \pm\sqrt{y^2 - k} \qquad (-\infty < y < \infty).$$

(iii) For $k = 0$, the level curves are the straight lines
$$y^2 = x^2 \quad\Rightarrow\quad y = \pm x \qquad (-\infty < x < \infty).$$

We draw these level curves in Figure 12.3.
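The three cases of Example 12.1 can be checked by generating points on each family of level curves and evaluating $f$ there. This is a minimal Python sketch. The helper `level_curve_points` and the sample parameter values are hypothetical choices made for illustration.

```python
import math


def f(x, y):
    return y**2 - x**2


def level_curve_points(k, ts):
    """Points on the level curve y^2 - x^2 = k, parametrised as in cases (i)-(iii).

    For k > 0 the branches are y = +-sqrt(x^2 + k); for k < 0 they are
    x = +-sqrt(y^2 - k); for k = 0 they are the lines y = +-x.
    The parameter values ts play the role of x (when k >= 0) or of y (when k < 0).
    """
    points = []
    for t in ts:
        if k > 0:
            r = math.sqrt(t**2 + k)
            points += [(t, r), (t, -r)]
        elif k < 0:
            r = math.sqrt(t**2 - k)
            points += [(r, t), (-r, t)]
        else:
            points += [(t, t), (t, -t)]
    return points


for k in (-2.0, 0.0, 3.0):
    pts = level_curve_points(k, [-2.0, -0.5, 0.0, 1.0, 2.5])
    # Largest deviation of f from k over the sampled points: rounding error only.
    print(k, max(abs(f(x, y) - k) for x, y in pts))
```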

When we consider a function $F$ of three variables, it's natural to refer not to level curves, but to level surfaces. These satisfy

$$F(x,y,z) = k,$$

where $k$ is a constant. An example is the gravitational potential of a body of mass $M$, given by

$$V(x,y,z) = \frac{GM}{\sqrt{x^2 + y^2 + z^2}} = \frac{GM}{r}.$$

Here, $G$ is the gravitational constant and $r$ is the distance from the origin $(0,0,0)$. The level surfaces are given by $V(x,y,z) = k$, and a simple rearrangement shows that (for $k > 0$) they are concentric spheres centred on the origin:

$$\sqrt{x^2 + y^2 + z^2} = \frac{GM}{k}.$$

Figure 12.2: Graphs of various two-variable functions.

Figure 12.3: The level curves of the function $f(x,y) = y^2 - x^2$ considered in Example 12.1.

12.2 Domains and Subsets of $\mathbb{R}^n$

We know what is meant by an open or closed interval of $\mathbb{R}$. How does this generalise to higher dimensions? An arbitrary set $S$ in $\mathbb{R}^n$ is said to be open if around every point of $S$, we can find a ball which is also completely contained in $S$. This applies to the case $n = 1$ ($S \subseteq \mathbb{R}$), in which case a “ball” is just an interval, and to the case $n = 2$ ($S \subseteq \mathbb{R}^2$), in which case a “ball” is a disc. Define the complement of a set $S \subseteq \mathbb{R}^n$ to be the set $S^c$ of points of $\mathbb{R}^n$ that do not belong to $S$. A closed set may then be defined as a set whose complement is open. A somewhat less circuitous, but equivalent, definition may be obtained by using the concept of boundary points. The boundary points of an arbitrary set $S \subseteq \mathbb{R}^n$ are those points $x$ in $\mathbb{R}^n$ for which every ball centred at $x$ is partly in $S$ and partly not in $S$. The set of all boundary points of $S$ is called the boundary of $S$, denoted $\partial S$. The boundary points of $S$ need not belong to $S$; in fact, the boundary points of an open set are never part of the set. The promised alternative definition is that a closed set is one which includes all of its boundary points.
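The ball-based definitions above can be probed numerically for a concrete set. The following sketch takes $S$ to be the closed unit disc $x^2 + y^2 \le 1$, and it is only a heuristic. Instead of checking every point of a ball, it checks the point itself and a finite sample of points on a small circle of radius `eps` around it. As a result, points lying within `eps` of the boundary may be misclassified. The names `in_closed_disc` and `classify`, and the values of `eps` and `samples`, are hypothetical.

```python
import math


def in_closed_disc(x, y):
    """Membership test for S = {(x, y) : x^2 + y^2 <= 1}."""
    return x * x + y * y <= 1.0


def classify(x, y, eps=1e-6, samples=64):
    """Heuristically classify (x, y) relative to the closed unit disc S.

    Tests the point and `samples` points on the circle of radius eps around it.
    Mixed answers suggest a boundary point; otherwise the point is treated as
    interior (all tested points in S) or exterior (none in S).
    """
    pts = [(x, y)] + [
        (x + eps * math.cos(2 * math.pi * j / samples),
         y + eps * math.sin(2 * math.pi * j / samples))
        for j in range(samples)
    ]
    inside = [in_closed_disc(px, py) for px, py in pts]
    if all(inside):
        return "interior"
    if not any(inside):
        return "exterior"
    return "boundary"


print(classify(0.0, 0.0))  # interior
print(classify(1.0, 0.0))  # boundary (and it belongs to S, since S is closed)
print(classify(0.0, 2.0))  # exterior
```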

Figure 12.4: A closed disc $S$ consisting of points $(x,y) \in \mathbb{R}^2$ satisfying $x^2 + y^2 \le 1$. The boundary of $S$ is the circle $x^2 + y^2 = 1$, the interior is the open disc $x^2 + y^2 < 1$ and the exterior is the open set $x^2 + y^2 > 1$.

Finally, we define the interior of a set $S$ to be the set of points in $S$ which are not in its boundary $\partial S$. Similarly, the exterior of $S$ is the set of points in its complement $S^c$ which do not belong to its boundary. Both the interior and the exterior of any set are open sets (assuming we're happy to regard the empty set as being open, which we are!). Notice that the boundaries of $S$ and its complement $S^c$ are always the same. The boundary of any set is closed (assuming we're happy to regard the empty set as being closed, which we are!). Figure 12.4 describes a simple example illustrating these concepts.

Now that we have some set-theoretic terminology under our belts, we can discuss open and closed balls in more detail. Recall that in one dimension, balls are just intervals. Given a centre $c$ and radius $r$, the corresponding open and closed balls are the intervals

$$(c - r, c + r) = \{x : (x - c)^2 < r^2\} \qquad\text{and}\qquad [c - r, c + r] = \{x : (x - c)^2 \le r^2\},$$

respectively. The corresponding balls in $\mathbb{R}^2$ are the open and closed discs with centre $(a,b)$ and radius $r$:

$$\{(x,y) : (x - a)^2 + (y - b)^2 < r^2\} \qquad\text{and}\qquad \{(x,y) : (x - a)^2 + (y - b)^2 \le r^2\}.$$

The story is exactly the same in $\mathbb{R}^3$, except that we use the squared euclidean distance $(x - a)^2 + (y - b)^2 + (z - c)^2$ from a centre $(a,b,c)$. It should be clear now how this generalises to $\mathbb{R}^n$.

Recall that the set of points to which a function may be applied is called its domain. For example, the following multivariable function $f\colon D \to \mathbb{R}$ has the indicated domain as a subset of $\mathbb{R}^2$:

$$f(x,y) = \frac{x^2 - y^2}{x^2 + y^2}, \qquad D = \{(x,y) \neq (0,0)\}.$$

It's quite common for a domain to consist of a nice open set (like $\mathbb{R}^2$) from which certain isolated points have been removed. This removal is known as puncturing, for (hopefully) obvious reasons. Thus, the domain of the function $f$ given above is sometimes referred to as a “punctured plane”.

Example 12.2. Find the domain of the function $f(x,y) = \sqrt{4 - x^2 - 4y^2}$.

Figure 12.5: The graph of the function $f(x,y) = \sqrt{4 - x^2 - 4y^2}$.

Solution.
The function $f$ is clearly defined for all $(x,y)$ in $\mathbb{R}^2$ satisfying

$$4 - x^2 - 4y^2 \ge 0 \quad\Leftrightarrow\quad x^2 + 4y^2 \le 4.$$

Thus, the domain consists of all points inside or on the ellipse (see Figure 12.5)

$$\frac{x^2}{2^2} + \frac{y^2}{1^2} = 1.$$

Formally, the domain may be written in the form $\{(x,y) : x^2 + 4y^2 \le 4\}$.

12.3 Limits and Continuity

As one would expect, all the tools of multivariable calculus derive from the notion of a limit, so we turn to what this means in higher dimensions. In one dimension, we had the limit as $x$ tends to a point $a$. In two dimensions, this is replaced by the pair $(x,y)$ tending to a point $(a,b)$ in $\mathbb{R}^2$. The generalisation to higher dimensions should be obvious. Here is the definition of limit in two dimensions:

Definition. Let $f(x,y)$ be a real-valued function. We write
$$\lim_{(x,y)\to(a,b)} f(x,y) = L$$
if for any given $\varepsilon > 0$, we can find a $\delta > 0$ such that if $(x,y) \neq (a,b)$ belongs to the domain of $f$ and
$$\sqrt{(x - a)^2 + (y - b)^2} < \delta,$$
then
$$|f(x,y) - L| < \varepsilon.$$

What does it mean? It means that the limit of $f$ as $(x,y)$ tends to $(a,b)$ is $L$ when we can make the value of the function at $(x,y)$ as close as we like to $L$ by restricting $(x,y)$ to lie inside a sufficiently small punctured disc centred at $(a,b)$. The “puncturing” here is motivated by the maxim that limits should not care about what is happening at the limit point, in this case $(a,b)$, but about what is happening as one approaches it. The generalisation to higher dimensions just involves replacing “disc” by “ball”. Now that we know what a limit is, here's the definition of continuity for a two-variable function:

Definition. The real-valued function f (x,y) is said to be continuous at the point (a,b) if:

• $f(a,b)$ is defined,

• $\lim_{(x,y)\to(a,b)} f(x,y)$ exists, and

• $\lim_{(x,y)\to(a,b)} f(x,y) = f(a,b)$.

The same definition works in higher dimensions. One can use the above definitions to show that the functions

$$f_1(x,y) = x \qquad\text{and}\qquad f_2(x,y) = y$$

are continuous, as are their higher-dimensional generalisations. Here are some important results, familiar in one dimension, which allow one to construct new continuous functions from old ones:

Fact 12.1. If $f$ and $g$ are real-valued functions on a common domain $D \subseteq \mathbb{R}^n$ which are both continuous at the point $r \in D$, then each of $kf$ ($k \in \mathbb{R}$), $f + g$, $f - g$ and $fg$ is likewise continuous at $r$. Moreover, if $g(r) \neq 0$, then $f/g$ is continuous at $r$.

Fact 12.2. If $f$ is a continuous real-valued function on a domain $D \subseteq \mathbb{R}^n$ and $g$ is a continuous real-valued function on a domain $S \subseteq \mathbb{R}$ that contains the range of $f$, then the composition $g \circ f$ is a continuous real-valued function on $D$.

It follows from the first result that all polynomials in $x$ and $y$ (or $x$, $y$ and $z$) are continuous functions on $\mathbb{R}^2$ (or $\mathbb{R}^3$). The second lets us conclude that so are functions such as $\cos(x^2 + y^3)$, $\sqrt{x^2 + y^2}$, and so on. Beware, however! Limits in more than one dimension can be surprisingly subtle, as the following examples show.

Example 12.3. Evaluate the limit

$$\lim_{(x,y)\to(2,1)}\frac{x^2}{x^2 + y^2}.$$

Solution. Here, we note that the function is continuous everywhere except where the denominator vanishes: $(x,y) = (0,0)$. Direct substitution therefore gives

$$\lim_{(x,y)\to(2,1)}\frac{x^2}{x^2 + y^2} = \frac{2^2}{2^2 + 1^2} = \frac45.$$

Example 12.4. Show that
$$\lim_{(x,y)\to(0,0)}\frac{x^2y}{x^2 + y^2} = 0.$$

Solution. This function is not defined at $(0,0)$. However,
$$x^2 \le x^2 + y^2 \quad\Rightarrow\quad \frac{x^2}{x^2 + y^2} \le 1 \quad\Rightarrow\quad 0 \le \left|\frac{x^2y}{x^2 + y^2} - 0\right| \le |y|.$$

As $(x,y) \to (0,0)$, it is clear that $|y| \to 0$. Thus,
$$\left|\frac{x^2y}{x^2 + y^2} - 0\right| \longrightarrow 0 \qquad\text{as } (x,y) \longrightarrow (0,0),$$

which is another way of saying that the limit of $x^2y/(x^2 + y^2)$ as $(x,y)$ tends to $(0,0)$ is $0$, as required. [By the way, the value of the limit is the zero inside the absolute values on the left, not the zero after the arrow! The zero after the arrow indicates that the distance between the value of the function and the limit becomes vanishingly small.]

Note that in Example 12.4, what we're really doing is squeezing the absolute difference between the function and its limit between the values of the functions $z = 0$ and $z = |y|$. In other words, we are using the Squeeze Theorem (Fact 3.3) to show that the limit exists.

Example 12.5. Show that the function
$$f(x,y) = \begin{cases}\dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x,y) \neq (0,0),\\[1ex] 0 & \text{if } (x,y) = (0,0)\end{cases}$$
is continuous at the origin.

Solution. The function is defined at the origin: $f(0,0) = 0$. To demonstrate continuity, we must show that
$$\lim_{(x,y)\to(0,0)}\frac{xy(x^2 - y^2)}{x^2 + y^2}$$
exists and is equal to $f(0,0) = 0$. Again, we use the Squeeze Theorem to show that the difference between the function value and the proposed limit becomes vanishingly small:
$$\left|\frac{xy(x^2 - y^2)}{x^2 + y^2} - 0\right| = |x||y|\,\frac{|x^2 - y^2|}{x^2 + y^2} \le |x||y|.$$

We can therefore squeeze this difference between $0$ and $|xy|$, both of which tend to $0$ as $(x,y) \to (0,0)$. This proves that the difference tends to $0$, hence that
$$\lim_{(x,y)\to(0,0)} f(x,y) = 0 = f(0,0).$$
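The squeeze bound $|f(x,y)| \le |x||y|$ used in Example 12.5 can be illustrated, though of course not proved, by sampling points on shrinking circles around the origin. On the circle of radius $r$, one finds $f = \tfrac14 r^2\sin 4\theta$, so the sampled values should shrink like $r^2$. The sketch below is illustrative only. The helper `circle_points` and the choice of 360 sample points are hypothetical.

```python
import math


def f(x, y):
    """Example 12.5: xy(x^2 - y^2)/(x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def circle_points(r, n=360):
    """n equally spaced points on the circle of radius r about the origin."""
    return [(r * math.cos(2 * math.pi * j / n), r * math.sin(2 * math.pi * j / n))
            for j in range(n)]


for r in (1e-1, 1e-2, 1e-3):
    pts = circle_points(r)
    worst = max(abs(f(x, y)) for x, y in pts)
    # The squeeze bound |f(x, y) - 0| <= |x||y|, allowing for floating-point rounding.
    bound_holds = all(abs(f(x, y)) <= abs(x * y) * (1 + 1e-12) for x, y in pts)
    print(r, worst, bound_holds)  # worst shrinks like r**2; bound_holds is True
```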

Example 12.6. Show that the function $f\colon\mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x,y) = \begin{cases}\dfrac{x^2 - y^2}{x^2 + y^2} & \text{if } (x,y) \neq (0,0),\\[1ex] 0 & \text{if } (x,y) = (0,0)\end{cases}$$
does not have a limit at $(0,0)$.

Figure 12.6: Graphs of the functions considered in Example 12.4 (left) and Example 12.5 (right).

Solution. One way to show that a function does not have a limit is to look at the values the function takes when restricted to lines through the limit point. For the limit to exist, it can't depend on which line we take. In this case, we need to look at lines through the origin. Along the $x$-axis ($y = 0$), the function takes values
$$f(x,0) = \frac{x^2}{x^2} = 1 \qquad (x \neq 0),$$
so it's pretty clear that the limit of $f(x,y)$ as $(x,y) \to (0,0)$ along this line is $1$. However, if we restrict to the $y$-axis ($x = 0$), then the function takes the values
$$f(0,y) = \frac{-y^2}{y^2} = -1 \qquad (y \neq 0).$$
The limit of $f(x,y)$ along this line is therefore $-1$. Because the limits along different lines are different, the limit as $(x,y) \to (0,0)$ in general cannot exist. This can be seen in more generality if one switches to polar coordinates ($x = r\cos\theta$, $y = r\sin\theta$):
$$f(x,y) = \frac{r^2\cos^2\theta - r^2\sin^2\theta}{r^2\cos^2\theta + r^2\sin^2\theta} = \cos(2\theta) \qquad (r \neq 0).$$
The function is therefore constant along each ray of constant angle $\theta$. Since different rays (different $\theta$) give different results as $r \to 0$, we again conclude that the limit as $(x,y) \to (0,0)$ cannot exist.
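The polar-coordinate argument can also be seen numerically. Along each ray of fixed angle $\theta$, the values of $f$ equal $\cos 2\theta$ no matter how small $r$ is. This short Python sketch evaluates $f$ along a few rays; the particular angles and radii are arbitrary choices.

```python
import math


def f(x, y):
    """Example 12.6: (x^2 - y^2)/(x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**2 - y**2) / (x**2 + y**2)


for theta in (0.0, math.pi / 6, math.pi / 4, math.pi / 2):
    along_ray = [f(r * math.cos(theta), r * math.sin(theta)) for r in (1e-1, 1e-3, 1e-6)]
    # Each list is constant and equal to cos(2 theta), up to rounding.
    print(round(theta, 4), [round(v, 6) for v in along_ray], round(math.cos(2 * theta), 6))
```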

Example 12.7. Does the function defined by
$$f(x,y) = \frac{x^3y}{x^6 + y^2}$$
have a limit as $(x,y) \to (0,0)$?

Figure 12.7: Graphs of the functions considered in Example 12.6 (left) and Example 12.7 (right).

Solution. The function is clearly not defined at the origin. Let's try looking at the lines $y = mx$ through the origin and see what happens as $m$ varies:
$$f(x,mx) = \frac{mx^4}{x^6 + m^2x^2} = \frac{mx^2}{x^4 + m^2} \longrightarrow 0 \qquad\text{as } x \to 0, \text{ for any } m.$$
Even along $x = 0$, we have $f(x,y) = f(0,y) = 0$. Thus, the limit exists when we restrict to any straight line through the origin, and its value is $0$. You might think that this is pretty strong evidence that the limit as $(x,y) \to (0,0)$ exists and has value $0$. However, if you try to prove it using the Squeeze Theorem, you will meet with failure. The limit does not exist, and the way to see it is to look at the values of the function as we approach the origin along the curve $y = x^3$:
$$f(x,x^3) = \frac{x^6}{x^6 + x^6} = \frac12.$$
Along this curve, the limiting value is clearly $\frac12$, which differs from the limiting value along any straight line. In fact, along the cubic curves $y = cx^3$ we find $f(x,cx^3) = c/(1 + c^2)$, so by choosing $c$ appropriately one can get any limiting value between $-\frac12$ and $\frac12$. The moral of the story is then that limits can be really tough in higher dimensions. This is the real reason why mathematicians developed these $\varepsilon$-$\delta$ definitions for limits and continuity. They are tough, but they allow you to be absolutely sure. And at the end of the day, that's what matters (if you're a mathematician).
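Example 12.7 can be explored in the same way. The sketch below evaluates $f$ at a small value of $x$ along several lines $y = mx$ and several cubics $y = cx^3$; the particular values of $x$, $m$ and $c$ are arbitrary. Along the lines the values are tiny, while along each cubic they stay at $c/(1 + c^2)$.

```python
def f(x, y):
    """Example 12.7: x^3 y / (x^6 + y^2), defined away from the origin."""
    return x**3 * y / (x**6 + y**2)


x = 1e-3
for m in (-2.0, 1.0, 5.0):
    # Along the line y = m x the values are already tiny and tend to 0 as x -> 0.
    print(f"line  y = {m} x:   f = {f(x, m * x):.3e}")
for c in (-1.0, 0.5, 1.0, 3.0):
    # Along the cubic y = c x^3 the value is c / (1 + c^2), whatever x is.
    print(f"cubic y = {c} x^3: f = {f(x, c * x**3):.6f}, c/(1+c^2) = {c / (1 + c**2):.6f}")
```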

Chapter 13

Multivariable Differentiation

13.1 Partial Derivatives

A function of $n$ variables has $n$ first-order partial derivatives. To partially differentiate a function of several variables with respect to a given variable, one just uses ordinary differentiation while treating all other variables as constants. For a function $f(x,y)$ of two variables, there are two partial derivatives of $f$, one with respect to $x$ and one with respect to $y$. They are defined by

$$f_x(x,y) = \lim_{h\to0}\frac{f(x + h,y) - f(x,y)}{h} \qquad\text{(the derivative of } f \text{ with respect to } x\text{)},$$
$$f_y(x,y) = \lim_{k\to0}\frac{f(x,y + k) - f(x,y)}{k} \qquad\text{(the derivative of } f \text{ with respect to } y\text{)}.$$

The partial derivatives of a function of three variables are similarly given by
$$f_x(x,y,z) = \lim_{h\to0}\frac{f(x + h,y,z) - f(x,y,z)}{h},$$
$$f_y(x,y,z) = \lim_{k\to0}\frac{f(x,y + k,z) - f(x,y,z)}{k},$$
$$f_z(x,y,z) = \lim_{\ell\to0}\frac{f(x,y,z + \ell) - f(x,y,z)}{\ell},$$
and the generalisation to more variables should be clear. The most popular notation for partial derivatives mimics that of ordinary derivatives:
$$\frac{\partial f}{\partial x},\quad \frac{\partial f}{\partial y},\quad \frac{\partial f}{\partial z},\quad \ldots$$
There is also a rather uncommon notation which some textbooks prefer:

$$f_1(x,y), \quad f_2(x,y).$$

Here, the subscripts 1 and 2 refer to differentiation with respect to the first ($x$) and second ($y$) variable, respectively. For functions of two variables, the partial derivative $f_x(a,b)$ measures the rate of change of $f(x,y)$ as $x$ varies around $a$ while $y$ is kept fixed at $b$. Similarly, $f_y(a,b)$ represents the rate of change of $f(x,y)$ as $y$ varies around $b$ while $x$ is kept fixed at $a$.

Example 13.1. Find the partial derivatives of the function $f(x,y) = x^3y^2 - 2x^2\cos y$.

Solution. We differentiate with respect to $x$, treating $y$ as a constant, and vice-versa:
$$\frac{\partial f}{\partial x} = 3x^2y^2 - 4x\cos y, \qquad \frac{\partial f}{\partial y} = 2x^3y + 2x^2\sin y.$$

Example 13.2. Find the partial derivatives of $g(x,y) = \sin(xy)$ and evaluate them at the point $(1,\pi)$.

Solution. Since

$$g_x(x,y) = y\cos(xy) \qquad\text{and}\qquad g_y(x,y) = x\cos(xy),$$
substitution yields

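$$g_x(1,\pi) = \pi\cos\pi = -\pi \qquad\text{and}\qquad g_y(1,\pi) = \cos\pi = -1.$$

Now, $\partial f/\partial x = f_x$ and $\partial f/\partial y = f_y$ are themselves functions of the two variables $x$ and $y$, so they may themselves be differentiable and so have partial derivatives. Naturally, these are referred to as the second partial derivatives of $f$. There are four of them:
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial x}f_x = f_{xx}, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial x}f_y = f_{yx},$$
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial y}f_x = f_{xy}, \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}f_y = f_{yy}.$$
A function of three variables will have nine second partial derivatives, and so on.

Example 13.3. Find the second partial derivatives of the functions $f$ and $g$ of Examples 13.1 and 13.2.

Solution. For $f(x,y) = x^3y^2 - 2x^2\cos y$, we differentiate each partial derivative with respect to both $x$ and $y$:
$$\frac{\partial^2 f}{\partial x^2} = 6xy^2 - 4\cos y, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 6x^2y + 4x\sin y,$$
$$\frac{\partial^2 f}{\partial y\,\partial x} = 6x^2y + 4x\sin y, \qquad \frac{\partial^2 f}{\partial y^2} = 2x^3 + 2x^2\cos y.$$
The second derivatives of $g(x,y) = \sin(xy)$ are, similarly,
$$\frac{\partial^2 g}{\partial x^2} = -y^2\sin(xy), \qquad \frac{\partial^2 g}{\partial x\,\partial y} = \cos(xy) - xy\sin(xy),$$
$$\frac{\partial^2 g}{\partial y\,\partial x} = \cos(xy) - xy\sin(xy), \qquad \frac{\partial^2 g}{\partial y^2} = -x^2\sin(xy).$$
Notice that in both examples, the mixed partial derivatives are equal:
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}.$$
This is the case for almost all functions that one meets in practice. Moreover, it generalises in an obvious fashion to functions of more than two variables and to higher derivatives. For example, if $f$ is a function of three variables $x$, $y$ and $z$, then we have $f_{xxy} = f_{xyx} = f_{yxx}$ and $f_{xyy} = f_{yxy} = f_{yyx}$, at least when $f$ is sufficiently well-behaved. Here is the precise guarantee: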

Fact 13.1. If $f$ is a real-valued function on $D \subseteq \mathbb{R}^n$ whose second partial derivatives $f_{x_ix_j}$ and $f_{x_jx_i}$ exist on an open ball around a point $r$ and are continuous at $r$, then these partial derivatives are equal at $r$:
$$f_{x_ix_j}(r) = f_{x_jx_i}(r).$$
For $n = 2$, this says that
$$\frac{\partial^2 f}{\partial x\,\partial y}(a,b) = \frac{\partial^2 f}{\partial y\,\partial x}(a,b),$$
provided only that these mixed partials exist near $(a,b)$ and are continuous at $(a,b)$.

Example 13.4. Show that the mixed partial derivatives of the function $f\colon\mathbb{R}^2 \to \mathbb{R}$ given by
$$f(x,y) = \begin{cases}1 & \text{if } y \neq 0,\\ 0 & \text{if } y = 0\end{cases}$$
are not equal at $(0,0)$.

Solution. It is clear that $f_x(x,y) = 0$ everywhere because $f$ does not depend on $x$. Thus, $f_{xy}(x,y) = 0$ everywhere as well. In particular, $f_{xy}(0,0) = 0$. However, $f_y$ does not exist at any point $(x,y)$ with $y = 0$, because there $f(x,k) - f(x,0) = 1$ for every $k \neq 0$, so the difference quotient $1/k$ has no limit. Hence $f_{yx}$ certainly cannot exist at $(0,0)$.

Example 13.5. Show that the function
$$f(x,y) = \begin{cases}\dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x,y) \neq (0,0),\\[1ex] 0 & \text{if } (x,y) = (0,0)\end{cases}$$
has mixed partial derivatives at $(0,0)$, but that they are not equal.

Solution. We showed in Example 12.5 that this function is continuous at the origin. To find its first partial derivatives away from the origin, we use the quotient rule:

$$f_x(x,y) = \frac{y(x^4 + 4x^2y^2 - y^4)}{(x^2 + y^2)^2} \quad\Rightarrow\quad f_x(0,y) = -y \qquad ((x,y) \neq (0,0)),$$
$$f_y(x,y) = \frac{x(x^4 - 4x^2y^2 - y^4)}{(x^2 + y^2)^2} \quad\Rightarrow\quad f_y(x,0) = x \qquad ((x,y) \neq (0,0)).$$
This gives us the first partial derivatives everywhere except at $(0,0)$. There, we have to use the definition. Luckily, $f(x,0)$ and $f(0,y)$ both vanish identically, so
$$f_x(0,0) = \lim_{h\to0}\frac{f(h,0) - f(0,0)}{h} = 0 \qquad\text{and}\qquad f_y(0,0) = \lim_{k\to0}\frac{f(0,k) - f(0,0)}{k} = 0.$$
We now have both first partial derivatives. We could check to make sure that these derivatives are continuous, but we may as well just go ahead and see if their derivatives, which will be the second partial derivatives of $f$, exist at $(0,0)$. Again, we have to use the definition:

$$f_{xy}(0,0) = \lim_{k\to0}\frac{f_x(0,k) - f_x(0,0)}{k} = \lim_{k\to0}\frac{-k - 0}{k} = -1,$$
$$f_{yx}(0,0) = \lim_{h\to0}\frac{f_y(h,0) - f_y(0,0)}{h} = \lim_{h\to0}\frac{h - 0}{h} = 1.$$
Thus, both the mixed partial derivatives exist at $(0,0)$, but they are not equal.
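The values $f_{xy}(0,0) = -1$ and $f_{yx}(0,0) = 1$ can be approximated by finite differences. In this sketch, the inner first partial derivatives are approximated by central differences with step `h`, and the outer derivative by a one-sided difference quotient with step `k`, mirroring the limits used above. Both step sizes are arbitrary choices, and finite differences only approximate the limits. Following the notes' convention, `f_xy` means $f_x$ differentiated with respect to $y$.

```python
def f(x, y):
    """Example 13.5: xy(x^2 - y^2)/(x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def fx(x, y, h=1e-7):
    """Central-difference approximation to the partial derivative with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def fy(x, y, h=1e-7):
    """Central-difference approximation to the partial derivative with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


k = 1e-3
f_xy = (fx(0.0, k) - fx(0.0, 0.0)) / k  # f_x differentiated with respect to y at (0, 0)
f_yx = (fy(k, 0.0) - fy(0.0, 0.0)) / k  # f_y differentiated with respect to x at (0, 0)
print(f_xy, f_yx)  # close to -1 and 1 respectively
```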

So, why does Fact 13.1 not apply to this example? A rather dull computation gives an explicit formula for $f_{xy}$ away from $(0,0)$:

$$f_{xy}(x,y) = \frac{(x^4 + 12x^2y^2 - 5y^4)(x^2 + y^2) - 4y^2(x^4 + 4x^2y^2 - y^4)}{(x^2 + y^2)^3} \qquad ((x,y) \neq (0,0)).$$

Taking the limit of $f_{xy}$, restricted to the line $y = 0$, as $(x,y) \to (0,0)$, we obtain
$$\lim_{x\to0} f_{xy}(x,0) = \lim_{x\to0}\frac{x^6 - 0}{x^6} = 1.$$

Since this is not the value $f_{xy}(0,0) = -1$, we conclude that $f_{xy}$ is not continuous at the origin. Thus, Fact 13.1 does not apply because (at least) one of the mixed partial derivatives is not continuous! Recall that we drew the graph of $f$ in Figure 12.6 (right). Who could have known then that its second partial derivatives were so weird?

13.2 Linear Approximations for Functions of Two Variables

For a function of one variable, we have seen that the tangent line to the graph $y = f(x)$ at $x = a$ gives a reasonable approximation of the values of $f(x)$ when $x$ is near $a$:

$$f(x) \approx L(x) = f(a) + f'(a)(x - a).$$

We called $L(x)$ the linearisation of $f$ at $a$. Similarly, if the first partial derivatives of a two-variable function $f$ exist at a point $(a,b)$, then the tangent plane to the graph $z = f(x,y)$ at $(a,b)$ provides a reasonable approximation for the values of $f(x,y)$ when $(x,y)$ is near $(a,b)$. (We will see in Section 13.3 that this really requires a little more than the existence of the partial derivatives.)

$$f(x,y) \approx L(x,y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b).$$

You should check that this definition of $L(x,y)$ really does define the tangent plane: $L$ agrees with $f$ at $(a,b)$, as do $L_x$ with $f_x$ and $L_y$ with $f_y$. It is therefore possible to estimate the change in $f$, denoted by $\Delta f$, corresponding to a change $\Delta x$ in $x$ and a change $\Delta y$ in $y$. Rewriting our linearisation above in the form

$$f(x + \Delta x, y + \Delta y) \approx f(x,y) + f_x(x,y)\,\Delta x + f_y(x,y)\,\Delta y,$$

we arrive at the formula

$$\Delta f \approx f_x(x,y)\,\Delta x + f_y(x,y)\,\Delta y.$$

This motivates the following definition.

Definition. The differential of a two-variable function $f$ at a point $(a,b)$ where it is differentiable is defined to be
$$df = f_x(a,b)\,dx + f_y(a,b)\,dy.$$
You will often see differentials $df$ used instead of $\Delta f$, especially in physics, with the author assuming, implicitly, that $df$ and $\Delta f$ are the same thing. This is generally harmless, provided one manipulates the quantities only in ways that one's own definition allows. Note too that this easily generalises to functions of more than two variables.
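The linearisation $L(x,y)$ can be built mechanically from $f$ and a base point. In the following sketch, the partial derivatives are approximated by central differences. This is an assumption made so that any function can be passed in; for $f(x,y) = x^2 - y^2$ one could of course use the exact partials $2x$ and $-2y$. The helper names `partials` and `linearisation` are hypothetical. Applied to $x^2 - y^2$ at $(2,1)$, the sketch reproduces the numbers of Example 13.6.

```python
def partials(f, a, b, h=1e-6):
    """Central-difference estimates of f_x(a, b) and f_y(a, b)."""
    fx = (f(a + h, b) - f(a - h, b)) / (2 * h)
    fy = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return fx, fy


def linearisation(f, a, b):
    """Return the function L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)."""
    f0 = f(a, b)
    fx, fy = partials(f, a, b)
    return lambda x, y: f0 + fx * (x - a) + fy * (y - b)


def f(x, y):
    return x**2 - y**2


L = linearisation(f, 2.0, 1.0)
print(round(L(2.1, 1.3), 6), round(f(2.1, 1.3), 6))  # 2.8 2.72
```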

Example 13.6. Use linear approximation to estimate the value of $f(x,y) = x^2 - y^2$ at $(2.1, 1.3)$.

Solution. Here, we can use the point $(a,b) = (2,1)$, at which $f(2,1) = 3$. Since

$$f_x(x,y) = 2x \qquad\text{and}\qquad f_y(x,y) = -2y,$$

we have $f_x(2,1) = 4$ and $f_y(2,1) = -2$. Our formula for linear approximation therefore becomes
$$f(x,y) \approx L(x,y) = 3 + 4(x - 2) - 2(y - 1).$$
Substituting $x = 2.1$ and $y = 1.3$ now gives

$$f(2.1,1.3) \approx L(2.1,1.3) = 3 + 4\times0.1 - 2\times0.3 = 2.8,$$

which compares reasonably well with the exact value: $f(2.1,1.3) = 2.72$.

Example 13.7. The radius $r$ of a right circular cylinder is measured with an error of at most 2% and the height $h$ is measured with an error of at most 4%. Use linear approximation to estimate the maximum possible percentage error in the volume $V$ calculated from these measurements.

Solution. The volume of the cylinder is given by

$$V(r,h) = \pi r^2h,$$
where $r$ and $h$ are the true radius and height values. We are told that the errors in these quantities satisfy
$$\left|\frac{dr}{r}\right| \le 0.02 \qquad\text{and}\qquad \left|\frac{dh}{h}\right| \le 0.04,$$

respectively (here we are using differential notation just for a change). The differential of $V$ is given by
$$dV = V_r(r,h)\,dr + V_h(r,h)\,dh = 2\pi rh\,dr + \pi r^2\,dh, \qquad\text{so}\qquad \frac{dV}{V} = \frac{2\pi rh\,dr + \pi r^2\,dh}{\pi r^2h} = 2\frac{dr}{r} + \frac{dh}{h}.$$
The maximal percentage error is therefore estimated by
$$\left|\frac{dV}{V}\right| = \left|2\frac{dr}{r} + \frac{dh}{h}\right| \le 2\left|\frac{dr}{r}\right| + \left|\frac{dh}{h}\right| \le 2\times0.02 + 0.04 = 0.08,$$

that is, the maximum percentage error in $V$ is approximately 8%.

Example 13.8. Two adjacent sides and the included angle of a triangular bracket are measured to be 4 cm, 5 cm and 30°. Find the area and use differentials to estimate the error in this area if the errors in the lengths are ±0.2 cm and the error in the angle is ±5°.

Solution. Let $x$ and $y$ be the sides that were measured and $\theta$ the included angle. The area of the bracket is then given by
$$A(x,y,\theta) = \frac12 xy\sin\theta = \frac12(4\,\text{cm})(5\,\text{cm})\sin\frac{\pi}{6} = 5\,\text{cm}^2.$$
Since the area is a function of three variables, we use the three-variable generalisation of our definition of differentials:
$$dA = \frac12 y\sin\theta\,dx + \frac12 x\sin\theta\,dy + \frac12 xy\cos\theta\,d\theta.$$
Since $|dx| = |dy| = 0.2\,\text{cm}$ and $|d\theta| = 5° = \pi/36$, we estimate the error in the area as
$$\begin{aligned}|dA| &\le \frac12|y||\sin\theta||dx| + \frac12|x||\sin\theta||dy| + \frac12|x||y||\cos\theta||d\theta|\\ &= \frac12(5\,\text{cm})\sin\frac{\pi}{6}(0.2\,\text{cm}) + \frac12(4\,\text{cm})\sin\frac{\pi}{6}(0.2\,\text{cm}) + \frac12(4\,\text{cm})(5\,\text{cm})\cos\frac{\pi}{6}\,\frac{\pi}{36}\\ &= 0.25\,\text{cm}^2 + 0.2\,\text{cm}^2 + \frac{5\pi\sqrt3}{36}\,\text{cm}^2 \approx 1.21\,\text{cm}^2.\end{aligned}$$

13.3 Differentiability for Multivariable Functions

For functions of one variable, we have seen in Fact 5.1 that differentiability implies continuity. In other words, the derivative of a function cannot exist at a given point unless the function is already continuous there. For functions of $n$ variables, we have introduced the notion of partial derivatives. However, in an ugly twist to our tale, it turns out that the existence of all partial derivatives at a point need not imply differentiability; it need not even imply continuity! To see why, consider the two-dimensional case. There, the existence of $f_x(a,b)$ and $f_y(a,b)$ depends only upon the behaviour of $f(x,y)$ as we approach $(a,b)$ along the horizontal and vertical lines through $(a,b)$, that is, through the points $(a + h,b)$ and $(a,b + k)$, respectively. As we have seen, limits in two or more dimensions are decidedly tricky, so we should not be surprised that good behaviour of a function along two lines is not enough to conclude that the function behaves itself on all curves (or is even continuous). We need a much stronger condition:

Definition. We say that a function of two variables f (x,y) is differentiable at the point (a,b) if

$$\lim_{(h,k)\to(0,0)}\frac{f(a + h,b + k) - f(a,b) - h f_x(a,b) - k f_y(a,b)}{\sqrt{h^2 + k^2}} = 0.$$
The generalisation to functions of more than two variables is similar.

Note that the limit $(h,k) \to (0,0)$ means that this pair can tend to zero in any manner we choose. Note also that this definition requires that $f(a,b)$, $f_x(a,b)$ and $f_y(a,b)$ be defined. We can therefore adapt our proof of Fact 5.1 to show that this notion of differentiability implies continuity:

Fact 13.2. If f is differentiable at a point, then it is continuous there.
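To see concretely that the existence of partial derivatives does not give differentiability, take the function of Example 12.7 and assign it the value $0$ at the origin. This extension is our own assumption, since the example leaves $f(0,0)$ undefined. The extended function vanishes on both axes, so both partial derivatives at the origin are $0$. Yet it is not continuous there, so by Fact 13.2 it cannot be differentiable. The sketch below evaluates the quotient from the definition of differentiability along the curve $(h,k) = (t,t^3)$. There the quotient equals $\tfrac12/\sqrt{t^2 + t^6}$, which blows up instead of tending to $0$.

```python
import math


def g(x, y):
    """Example 12.7's function, extended (as an assumption) by g(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**3 * y / (x**6 + y**2)


# Both partial derivatives at the origin are 0, because g vanishes on both axes.
gx0 = (g(1e-8, 0.0) - g(0.0, 0.0)) / 1e-8
gy0 = (g(0.0, 1e-8) - g(0.0, 0.0)) / 1e-8


def quotient(h, k):
    """The expression inside the limit in the definition of differentiability at (0, 0)."""
    return (g(h, k) - g(0.0, 0.0) - h * gx0 - k * gy0) / math.sqrt(h**2 + k**2)


for t in (1e-1, 1e-2, 1e-3):
    print(t, quotient(t, t**3))  # grows roughly like 1/(2t) instead of tending to 0
```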

The differentiability of a two-variable function at the point $(a,b)$ is designed to guarantee that the tangent plane
$$L(h,k) = f(a,b) + h f_x(a,b) + k f_y(a,b)$$
is a “good” linear approximation to $f$ around $(a,b)$ (for $(h,k)$ around $(0,0)$). Unfortunately, it is not particularly easy to use this definition to determine whether a given function is differentiable. As usual, however, mathematicians have given us (and themselves) a break:

Fact 13.3. If f is a function whose partial derivatives all exist at r and are continuous in an open ball centered on r, then f is differentiable at r.

Example 13.9. Show that
$$f(x,y) = \frac{\cos x + e^{xy}}{x^2 + y^2}$$
is differentiable at all points $(x,y) \neq (0,0)$.

Solution. We first find the partial derivatives:

$$\frac{\partial f}{\partial x} = \frac{(x^2 + y^2)(ye^{xy} - \sin x) - 2x(\cos x + e^{xy})}{(x^2 + y^2)^2}, \qquad \frac{\partial f}{\partial y} = \frac{(x^2 + y^2)xe^{xy} - 2y(\cos x + e^{xy})}{(x^2 + y^2)^2}.$$
Because these are continuous everywhere except $(x,y) = (0,0)$, $f$ is differentiable everywhere but $(0,0)$, by Fact 13.3.
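Hand-computed partial derivatives, such as those in Example 13.9, are easy to get wrong. It can therefore be worth comparing them against central-difference approximations at a few points away from the origin. The sample points and the step `h` below are arbitrary choices, and agreement at finitely many points is evidence, not proof.

```python
import math


def f(x, y):
    return (math.cos(x) + math.exp(x * y)) / (x**2 + y**2)


def f_x(x, y):
    """The x-partial derivative computed above by the quotient rule."""
    s = x**2 + y**2
    return (s * (y * math.exp(x * y) - math.sin(x))
            - 2 * x * (math.cos(x) + math.exp(x * y))) / s**2


def f_y(x, y):
    """The y-partial derivative computed above by the quotient rule."""
    s = x**2 + y**2
    return (s * x * math.exp(x * y) - 2 * y * (math.cos(x) + math.exp(x * y))) / s**2


h = 1e-6
for (x, y) in [(1.0, 2.0), (-0.5, 0.3), (0.2, -1.5)]:
    num_x = (f(x + h, y) - f(x - h, y)) / (2 * h)
    num_y = (f(x, y + h) - f(x, y - h)) / (2 * h)
    # Both discrepancies should be tiny (limited by rounding and the step h).
    print((x, y), abs(num_x - f_x(x, y)), abs(num_y - f_y(x, y)))
```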

13.4 The Chain Rule

Recall that the Chain Rule for functions of a single variable gives the derivative of a composition $(f \circ g)(x) = f(g(x))$ of two functions $f$ and $g$:
$$\frac{d}{dx}f(g(x)) = f'(g(x))\cdot g'(x).$$
Let's see, in preparation for what will follow, how the Chain Rule looks when we use the definition of the derivative on the left-hand side:
$$\lim_{h\to0}\frac{f(g(x + h)) - f(g(x))}{h} = f'(g(x))\cdot g'(x). \qquad (13.1)$$
We now turn to the multivariable generalisation of this rule. Consider therefore a function $f$ of two variables $x$ and $y$, where $x$ and $y$ are themselves functions of another variable $t$:

$$f = f(x,y), \qquad x = u(t), \qquad y = v(t).$$

Let us define $g$ to be the function $f$ regarded as a function of $t$ alone:

$$g(t) = f(u(t), v(t)).$$

How does one calculate the rate of change of $g$ with respect to $t$? Well, we can always fall back on the definition:
$$\frac{d}{dt} g(t) = \lim_{h \to 0} \frac{g(t+h) - g(t)}{h} = \lim_{h \to 0} \frac{f(u(t+h), v(t+h)) - f(u(t), v(t))}{h}.$$
Now, we add and subtract a certain quotient in a creative manner!

$$\frac{d}{dt} g(t) = \lim_{h \to 0} \frac{f(u(t+h), v(t+h)) - f(u(t), v(t+h))}{h} + \lim_{h \to 0} \frac{f(u(t), v(t+h)) - f(u(t), v(t))}{h}.$$
Assuming that $f$ has continuous partial derivatives, we can use (13.1) to recognise the first term as $f_x(u,v) \cdot u'(t)$ and the second term as $f_y(u,v) \cdot v'(t)$. (The continuity of $f_x$ matters because the second argument $v(t+h)$ of the first term also varies with $h$.) That is, we have derived a chain rule for two-variable functions:
$$\frac{d}{dt} f(u(t), v(t)) = f_x(u(t), v(t)) \cdot u'(t) + f_y(u(t), v(t)) \cdot v'(t).$$
It should be clear how this generalises to functions of $n$ variables. Notice that we can also express it in the form
$$\frac{df}{dt} = \frac{\partial f}{\partial u}\frac{du}{dt} + \frac{\partial f}{\partial v}\frac{dv}{dt},$$
which may appear more, or less, natural, depending upon your attachment to pretending that derivatives are fractions.

Example 13.10. Find the $t$-derivative of $z = x^2 y$ as a function of $t$, when $x = \cos t$ and $y = \sin t$.

Solution. We compute the following derivatives:
$$\frac{\partial z}{\partial x} = 2xy, \qquad \frac{\partial z}{\partial y} = x^2, \qquad \frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = \cos t.$$
The two-variable Chain Rule therefore gives
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = 2xy(-\sin t) + x^2 \cos t = -2\cos t \sin^2 t + \cos^3 t.$$
Of course, we could have just substituted $x = \cos t$ and $y = \sin t$ into $z = x^2 y$ to directly obtain
$$z = \cos^2 t \sin t \quad \Rightarrow \quad \frac{dz}{dt} = -2\cos t \sin^2 t + \cos^3 t.$$
However, we should also recall that the whole point of Chain Rules is to render such direct substitutions redundant, especially when the direct approach would be tedious.

Example 13.11. Suppose that the surface temperature of a stretch of water depends on the position, given in terms of two coordinates $x$ and $y$, and the time $t$ of the day: $T = T(x,y,t)$. What is the rate of change of the temperature $T$ as observed by someone who is in a boat, the position of which is changing (differentiably) with time? If the surface temperature is modelled by $T(x,y,t) = 18 + (5 + x^2 + y^2)\sin t$ and the boat's position is given by $x(t) = t$ and $y(t) = t^2/2$, compute $dT/dt$ as seen by the observer.

Solution. The rate of change of temperature may be attributed to two different things:

• The temperature at a given place changing with time.

• The temperature varying from place to place.
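By using the Chain Rule, we immediately obtain
$$\frac{dT}{dt} = \frac{\partial T}{\partial x}\frac{dx}{dt} + \frac{\partial T}{\partial y}\frac{dy}{dt} + \frac{\partial T}{\partial t},$$
where we have made use of the obvious fact that $dt/dt = 1$. The first two terms account for the boat moving from place to place and the last accounts for the temperature changing with time at a fixed place. To answer the second part of the question, we compute
$$\frac{\partial T}{\partial x} = 2x\sin t, \quad \frac{\partial T}{\partial y} = 2y\sin t, \quad \frac{\partial T}{\partial t} = (5 + x^2 + y^2)\cos t, \quad \frac{dx}{dt} = 1, \quad \frac{dy}{dt} = t.$$
Substituting into the Chain Rule, with $x = t$ and $y = t^2/2$, therefore gives
$$\frac{dT}{dt} = 2x\sin t + 2yt\sin t + (5 + x^2 + y^2)\cos t = \left(5 + t^2 + \frac{t^4}{4}\right)\cos t + (2t + t^3)\sin t.$$

Another multivariable generalisation of the Chain Rule concerns the case where $f = f(u,v)$ again, but both $u$ and $v$ are functions of two independent variables: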

u = u(x,y), v = v(x,y).

In this case, the Chain Rule is naturally formulated as the statements

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}, \qquad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}.$$
This version of the Chain Rule can actually be deduced from the earlier two-variable version just by allowing $u$ and $v$ to depend on two variables $x$ and $y$, rather than just on one variable $t$: one holds one of the two variables fixed while differentiating with respect to the other. More generally, if $f$ is a function of several 'primary' variables, each of which depends on other 'secondary' variables, then the partial derivative of $f$ with respect to one of the 'secondary' variables will have, potentially, as many terms as there are primary variables. We'll see an example which shows how this works shortly.

Example 13.12. Consider the function $z = u^2 + 2uv$, where $u = x\ln y$ and $v = 2x + y$. Find $\dfrac{\partial z}{\partial x}$ and $\dfrac{\partial z}{\partial y}$.

Solution. This is a straightforward application of the above Chain Rule:

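$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial x} = (2u + 2v)\cdot\ln y + 2u\cdot 2 = 2(x\ln y + 2x + y)\ln y + 4x\ln y,$$
$$\frac{\partial z}{\partial y} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial y} = (2u + 2v)\cdot\frac{x}{y} + 2u\cdot 1 = \frac{2x(x\ln y + 2x + y)}{y} + 2x\ln y.$$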
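The Chain Rule answers to Examples 13.11 and 13.12 can be sanity-checked by differentiating the fully substituted functions numerically. This Python sketch is only illustrative: the evaluation points `t0` and `(x0, y0)` and the step size are arbitrary assumptions, and `y0` must be positive so that $\ln y$ is defined.

```python
import math

def central_diff(g, t, h=1e-6):
    # Central-difference approximation to g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

# Example 13.11: temperature seen from the boat, with x(t) = t and y(t) = t^2/2
def T_along_path(t):
    x, y = t, t**2 / 2
    return 18 + (5 + x**2 + y**2) * math.sin(t)

def dT_dt_chain_rule(t):
    # Closed form obtained from the Chain Rule in the text
    return (5 + t**2 + t**4 / 4) * math.cos(t) + (2 * t + t**3) * math.sin(t)

# Example 13.12: z = u^2 + 2uv with u = x ln y and v = 2x + y (needs y > 0)
def z(x, y):
    u, v = x * math.log(y), 2 * x + y
    return u**2 + 2 * u * v

def dz_dx(x, y):
    return 2 * (x * math.log(y) + 2 * x + y) * math.log(y) + 4 * x * math.log(y)

def dz_dy(x, y):
    return 2 * x * (x * math.log(y) + 2 * x + y) / y + 2 * x * math.log(y)

t0 = 1.3
print(dT_dt_chain_rule(t0), central_diff(T_along_path, t0))

x0, y0 = 0.8, 2.5
print(dz_dx(x0, y0), central_diff(lambda s: z(s, y0), x0))   # vary x, hold y fixed
print(dz_dy(x0, y0), central_diff(lambda s: z(x0, s), y0))   # vary y, hold x fixed
```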

Example 13.13. Write down the version of the Chain Rule which applies to ∂ f /∂x and ∂ f /∂y when

• f is a function of u, v and w,

• u and v depend on x, y and w, and

• w depends on x and y.

Solution. We have the following graphical representation of the situation, in which each variable points to the variables it depends on:

f → u, v, w
u → x, y, w
v → x, y, w
w → x, y

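Tracing the paths down from $f$ to the $x$'s and $y$'s, we arrive at
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial w}\frac{\partial w}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial w}\frac{\partial w}{\partial x} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial x},$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial w}\frac{\partial w}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial w}\frac{\partial w}{\partial y} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial y}.$$
This previous example shows the inherent limitations of the notation that we, and almost everybody else, use for partial derivatives (Adams is a notable exception). Consider $u = u(x,y,w) = u(x,y,w(x,y))$. What does $\partial u/\partial x$ represent? Is it the derivative when $y$ and $w$ are held constant, or is it the derivative when $y$ is held constant and we substitute $w = w(x,y)$? Conventionally, we would choose the former. However, you can see that this can get very confusing when there are lots of variables with lots of different dependences. An alternative notation, which is particularly popular in thermodynamics, is to explicitly indicate which variables are being held constant. For example, the two possibilities for interpreting $\partial u/\partial x$ mentioned above would be denoted by
$$\left(\frac{\partial u}{\partial x}\right)_{y,w} \quad\text{and}\quad \left(\frac{\partial u}{\partial x}\right)_{y},$$
according to whether $w$ is to be treated as a constant, or as a function of $x$ (and $y$).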

13.5 Gradients and Directional Derivatives

Recall that the partial derivative of a function $f = f(x,y)$ with respect to $x$ measures the rate of change of $f$ when $y$ is held constant, that is, the rate of change in the $x$-direction. Similarly, $f_y(x,y)$ measures the rate of change of $f$ in the $y$-direction. One might imagine that it could be useful to know how to compute rates of change in other directions as well! To do this, it is useful to take the first steps towards combining our calculus knowledge with a little linear algebra.

Definition. Suppose that the first partial derivatives of $f \colon \mathbb{R}^2 \to \mathbb{R}$ exist at the point $(x,y)$. Then, we define the gradient of $f$ at $(x,y)$ to be the vector
$$\nabla f(x,y) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} = f_x(x,y)\,\mathbf{i} + f_y(x,y)\,\mathbf{j},$$
where $\mathbf{i}$ and $\mathbf{j}$ are the usual unit vectors in the $x$- and $y$-directions, respectively. One has similar definitions for functions of three and more variables. The symbol $\nabla$ is called del or nabla and is a vector version of the derivative symbols $d$ and $\partial$.

Example 13.14. Compute the gradient vector of $f(x,y) = x^2 + y^2$ at the point $(-1,3)$.

Solution. The partial derivatives are $f_x(x,y) = 2x$ and $f_y(x,y) = 2y$, hence the gradient is $\nabla f(x,y) = 2x\,\mathbf{i} + 2y\,\mathbf{j}$. Substituting $x = -1$ and $y = 3$ then gives the required result:
$$\nabla f(-1,3) = -2\,\mathbf{i} + 6\,\mathbf{j}.$$

Suppose that $\mathbf{u} = u\,\mathbf{i} + v\,\mathbf{j}$ is a unit vector in $\mathbb{R}^2$, meaning that $\|\mathbf{u}\|^2 = u^2 + v^2 = 1$. Unit vectors are generally used to indicate directions. Then, the rate of change of a function $f$ in the direction of $\mathbf{u}$ is defined as follows:

Definition. The directional derivative of $f(x,y)$ at $(a,b)$ in the direction of the unit vector $\mathbf{u} = u\,\mathbf{i} + v\,\mathbf{j}$ is the one-sided limit

$$D_{\mathbf{u}} f(a,b) = \lim_{h \to 0^+} \frac{f(a + hu, b + hv) - f(a,b)}{h},$$
provided that this limit exists.

Note that if $f_x$ and $f_y$ exist at $(a,b)$, then

$$D_{\mathbf{i}} f(a,b) = f_x(a,b), \quad D_{\mathbf{j}} f(a,b) = f_y(a,b), \quad D_{-\mathbf{i}} f(a,b) = -f_x(a,b), \quad D_{-\mathbf{j}} f(a,b) = -f_y(a,b).$$
Computing more general directional derivatives is easy, thanks to the following result:

Fact 13.4. Let u be a unit vector and let f be differentiable at (a,b). The directional derivative of f at (a,b) in the direction of u is then the dot product of u with the gradient at (a,b):

$$D_{\mathbf{u}} f(a,b) = \mathbf{u}\cdot\nabla f(a,b).$$

Example 13.15. Find the derivative of $f(x,y) = x^2 + xy$ at $(1,2)$, in the direction of the unit vector
$$\mathbf{u} = \frac{1}{\sqrt2}\,\mathbf{i} + \frac{1}{\sqrt2}\,\mathbf{j},$$
using, first, the definition and, second, Fact 13.4.

Solution. Using the definition, we have

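$$\begin{aligned} D_{\mathbf{u}} f(1,2) &= \lim_{h \to 0^+} \frac{f(1 + h/\sqrt2,\, 2 + h/\sqrt2) - f(1,2)}{h} \\ &= \lim_{h \to 0^+} \frac{(1 + h/\sqrt2)^2 + (1 + h/\sqrt2)(2 + h/\sqrt2) - 3}{h} \\ &= \lim_{h \to 0^+} \frac{h^2 + 5h/\sqrt2}{h} = \lim_{h \to 0^+}\left(h + \frac{5}{\sqrt2}\right) = \frac{5}{\sqrt2}. \end{aligned}$$
We may use Fact 13.4 because $f$ is differentiable at $(1,2)$: its partial derivatives $2x + y$ and $x$ are continuous everywhere, so Fact 13.3 applies. The gradient is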

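$$\nabla f(x,y) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} = (2x + y)\,\mathbf{i} + x\,\mathbf{j} \quad\Rightarrow\quad \nabla f(1,2) = 4\,\mathbf{i} + \mathbf{j}.$$
The directional derivative is therefore
$$D_{\mathbf{u}} f(1,2) = \left(\frac{1}{\sqrt2}\,\mathbf{i} + \frac{1}{\sqrt2}\,\mathbf{j}\right)\cdot(4\,\mathbf{i} + \mathbf{j}) = \frac{5}{\sqrt2},$$
as before.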
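The two routes of Example 13.15 can also be compared numerically. In this minimal sketch (our own illustration), the limit in the definition is replaced by a single one-sided difference quotient with a small, arbitrarily chosen step `h`; since the exact quotient computed above is $h + 5/\sqrt2$, the discrepancy should be roughly of size `h`.

```python
import math

def f(x, y):
    # Function of Example 13.15
    return x**2 + x * y

def grad_f(x, y):
    # Partial derivatives f_x = 2x + y and f_y = x
    return (2 * x + y, x)

def one_sided_quotient(g, a, b, u, v, h=1e-7):
    # Difference quotient from the definition of D_u g(a, b), for one small h > 0
    return (g(a + h * u, b + h * v) - g(a, b)) / h

u, v = 1 / math.sqrt(2), 1 / math.sqrt(2)
fx, fy = grad_f(1, 2)
via_fact = u * fx + v * fy                          # Fact 13.4: u . grad f(1, 2)
via_definition = one_sided_quotient(f, 1, 2, u, v)
print(via_fact, via_definition, 5 / math.sqrt(2))
```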

A very useful property of gradients and directional derivatives may be derived as follows: if $\phi$ denotes the angle between a unit vector $\mathbf{u}$ and the gradient vector $\nabla f$ at a point $(a,b)$, then

$$D_{\mathbf{u}} f(a,b) = \mathbf{u}\cdot\nabla f(a,b) = \|\nabla f(a,b)\|\cos\phi,$$
since $\|\mathbf{u}\| = 1$. Since $-1 \le \cos\phi \le 1$, it follows that the directional derivative is bounded by the magnitude of the gradient:
$$-\|\nabla f(a,b)\| \le D_{\mathbf{u}} f(a,b) \le \|\nabla f(a,b)\|.$$
This leads to the following characterisation:

Fact 13.5. If f is differentiable at (a,b), then:

• At $(a,b)$, $f$ is increasing most rapidly in the direction of $\nabla f(a,b)$ and this maximal rate of increase is $\|\nabla f(a,b)\|$.

• At $(a,b)$, $f$ is decreasing most rapidly in the direction of $-\nabla f(a,b)$ and this maximal rate of decrease is $\|\nabla f(a,b)\|$.

• At $(a,b)$, the rate of change of $f$ is zero in precisely those directions which are orthogonal to $\nabla f(a,b)$. These directions are tangent to the level curves of $f$ through $(a,b)$, so the level curve through $(a,b)$ is always orthogonal to the gradient at $(a,b)$.

Example 13.16. If $f(x,y) = x^2 e^y$, find the unit vector pointing in the direction of maximal decrease at $(1,0)$ and the unit vectors tangent to the level curve through $(1,0)$.

Solution. The gradient is $\nabla f(x,y) = 2xe^y\,\mathbf{i} + x^2 e^y\,\mathbf{j}$, hence at the point $(1,0)$, we obtain

$$\nabla f(1,0) = 2\,\mathbf{i} + \mathbf{j}.$$

The direction of maximal decrease is that of $-\nabla f(1,0) = -2\,\mathbf{i} - \mathbf{j}$. However, this is not a unit vector. Its norm is $\sqrt5$, hence the unit vector pointing in the direction of maximal decrease is
$$-\frac{2}{\sqrt5}\,\mathbf{i} - \frac{1}{\sqrt5}\,\mathbf{j}.$$
The unit vectors which are tangent to the level curve through $(1,0)$ are those which are orthogonal to $\nabla f(1,0)$. It is easy to see that there are precisely two such unit vectors (each is the negative of the other) and just as easy to check that they are given by
$$\frac{1}{\sqrt5}\,\mathbf{i} - \frac{2}{\sqrt5}\,\mathbf{j} \quad\text{and}\quad -\frac{1}{\sqrt5}\,\mathbf{i} + \frac{2}{\sqrt5}\,\mathbf{j}.$$
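The recipe of Example 13.16 (negate and normalise the gradient for the steepest descent, rotate it by a right angle for the tangents) is short enough to sketch in Python. The helper names are hypothetical, and mapping $(p, q)$ to $(q, -p)$ is just one convenient way of producing a vector orthogonal to $(p, q)$ in the plane.

```python
import math

def grad_f(x, y):
    # f(x, y) = x^2 e^y, so grad f = (2x e^y, x^2 e^y)
    return (2 * x * math.exp(y), x**2 * math.exp(y))

def normalise(px, py):
    # Scale a nonzero plane vector to unit length
    length = math.hypot(px, py)
    return (px / length, py / length)

gx, gy = grad_f(1.0, 0.0)                      # (2, 1)
steepest_descent = normalise(-gx, -gy)         # direction of -grad f
tangent_1 = normalise(gy, -gx)                 # (p, q) -> (q, -p) is orthogonal to (p, q)
tangent_2 = (-tangent_1[0], -tangent_1[1])     # the other unit tangent
print(steepest_descent, tangent_1, tangent_2)
```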

All of this generalises in an obvious fashion to higher dimensions. Rather than rewrite everything, we illustrate this with a function of three variables.

Example 13.17. What is the rate of change of $f(x,y,z) = x^3 - xy^2 - z$ at $(1,1,0)$ in the direction of $2\,\mathbf{i} - 3\,\mathbf{j} + 6\,\mathbf{k}$? In which direction is the rate of change of $f$ greatest and what is this maximal rate of change?

Solution. First, we normalise the direction vector to make it a unit vector:
$$\mathbf{u} = \frac{2\,\mathbf{i} - 3\,\mathbf{j} + 6\,\mathbf{k}}{\sqrt{4 + 9 + 36}} = \frac27\,\mathbf{i} - \frac37\,\mathbf{j} + \frac67\,\mathbf{k}.$$
Next, we compute the gradient:

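$$\nabla f(x,y,z) = (3x^2 - y^2)\,\mathbf{i} - 2xy\,\mathbf{j} - \mathbf{k} \quad\Rightarrow\quad \nabla f(1,1,0) = 2\,\mathbf{i} - 2\,\mathbf{j} - \mathbf{k}.$$
The rate of change in the direction of $\mathbf{u}$ is therefore
$$D_{\mathbf{u}} f(1,1,0) = \mathbf{u}\cdot\nabla f(1,1,0) = \left(\frac27\,\mathbf{i} - \frac37\,\mathbf{j} + \frac67\,\mathbf{k}\right)\cdot(2\,\mathbf{i} - 2\,\mathbf{j} - \mathbf{k}) = \frac47.$$
Finally, the rate of change is greatest in the direction of $\nabla f(1,1,0) = 2\,\mathbf{i} - 2\,\mathbf{j} - \mathbf{k}$ (you can normalise this to get a unit vector if you want) and this maximal rate of change is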

$$\|\nabla f(1,1,0)\| = \|2\,\mathbf{i} - 2\,\mathbf{j} - \mathbf{k}\| = \sqrt{4 + 4 + 1} = 3.$$

13.6 Extrema and Optimisation

In this section, we briefly describe how to extend our single-variable techniques for finding and classifying local extrema to multivariable functions. The definitions of local maxima and minima for a two-variable function $f$ are essentially the same as in the single-variable case and they generalise to more variables immediately: a local maximum at $(a,b)$ has $f(x,y) \le f(a,b)$ for all $(x,y)$ sufficiently close to $(a,b)$, and a local minimum at $(a,b)$ has $f(x,y) \ge f(a,b)$ for all $(x,y)$ sufficiently close to $(a,b)$. As with the single-variable case, local extrema can only occur at critical points, which are defined to be any of the following three types of points:

• If $\nabla f(a,b) = \mathbf{0}$, then $(a,b)$ is a stationary point.

• If $\nabla f(a,b)$ does not exist, then $(a,b)$ is a singular point.

• If $(a,b)$ belongs to the boundary of the domain of $f$, then $(a,b)$ is a boundary point.

We illustrate this with a couple of simple examples:

Example 13.18. Find the critical points and local extrema of $f(x,y) = x^2 - 6x + y^2 - 4y$.

Solution. The gradient vector is given by

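$$\nabla f(x,y) = (2x - 6)\,\mathbf{i} + (2y - 4)\,\mathbf{j},$$
which always exists. We therefore look for stationary points:
$$\nabla f(x,y) = \mathbf{0} \quad\Rightarrow\quad \begin{cases} 2x - 6 = 0, \\ 2y - 4 = 0 \end{cases} \quad\Rightarrow\quad \begin{cases} x = 3, \\ y = 2. \end{cases}$$
The point $(3,2)$, with $f(3,2) = -13$, is therefore our only candidate for a local extremum of $f$. In fact, completing the square gives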

$$f(x,y) = x^2 - 6x + 9 - 9 + y^2 - 4y + 4 - 4 = (x - 3)^2 + (y - 2)^2 - 13,$$
so it is clear that $(3,2)$ gives a local minimum (actually, the global minimum).

Example 13.19. Find the critical points and local extrema of $f(x,y) = 4 - 2\sqrt{x^2 + y^2}$.

Solution. Again, we compute the gradient:
$$\nabla f(x,y) = -\frac{2x}{\sqrt{x^2 + y^2}}\,\mathbf{i} - \frac{2y}{\sqrt{x^2 + y^2}}\,\mathbf{j}.$$

This exists everywhere except $(x,y) = (0,0)$. That the partial derivatives fail to exist at the origin can be seen from the restrictions

$$f(x,0) = 4 - 2|x| \quad\text{and}\quad f(0,y) = 4 - 2|y|,$$
neither of which is differentiable at $0$. There is therefore a singular point at the origin. Elsewhere, $\nabla f(x,y)$ is non-zero, so there are no stationary points. Our only candidate for a local extremum is therefore $(0,0)$, where $f(0,0) = 4$. In fact, it follows from $\sqrt{x^2 + y^2} \ge 0$ that $f(x,y) \le 4$, so $(0,0)$ gives a local maximum (again, it is the global maximum).

As in the single-variable case, one can have critical points which do not give rise to local extrema. A stationary or singular point $(a,b)$ which is in the interior of the domain of $f$ is said to be a saddle point if it is neither a local maximum nor a local minimum. This means that in any ball centered on $(a,b)$, one can find points $(x,y)$ for which $f(x,y) > f(a,b)$ and points $(x',y')$ for which $f(x',y') < f(a,b)$.

Example 13.20. Find the critical points and local extrema of $f(x,y) = \dfrac{x^2 - y^2}{4}$.

Solution. The gradient is $\nabla f(x,y) = \frac14(2x\,\mathbf{i} - 2y\,\mathbf{j})$, hence the only critical point is $(0,0)$, where $f(0,0) = 0$. However, the origin is a local minimum if we restrict to the $x$-axis and a local maximum if we restrict to the $y$-axis:

$$f(x,0) = \frac{x^2}{4}, \qquad f(0,y) = -\frac{y^2}{4}.$$
Thus, $(0,0)$ is a saddle point.

In the single-variable case, we learnt how to classify stationary points as local maxima, minima or saddles. One method for doing this is the second derivative test, which was discussed in Section 7.7. This test generalises to the multivariable case as follows. First, recall that the proper generalisation of the derivative of a real-valued function $f$ was a vector, the gradient vector $\nabla f$. When we consider the second partial derivatives of $f$, it is natural to arrange them as a matrix!

Definition. If $f \colon D \to \mathbb{R}$ is twice-differentiable on a domain $D \subseteq \mathbb{R}^2$, then we can define the hessian matrix $H_f(x,y)$ to be the matrix of second partial derivatives:

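$$H_f(x,y) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[2ex] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}.$$
Of course, the hessian generalises to an $n \times n$ matrix when $f$ is a function of $n$ variables. One very important fact about hessian matrices is the following: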

Figure 13.1: Graphs of the functions $f(x,y) = x^2 - 6x + y^2 - 4y$ (top-left), $f(x,y) = 4 - 2\sqrt{x^2 + y^2}$ (top-right) and $f(x,y) = \frac14(x^2 - y^2)$ (bottom), considered in Examples 13.18, 13.19 and 13.20.

Fact 13.6. When $f$ has continuous second partial derivatives, the hessian matrix is symmetric: $H_f^T = H_f$. This is really just Fact 13.1 translated into matrix language.

Why is Fact 13.6 relevant? Because of an extremely important result of linear algebra that goes by the name of the spectral theorem:

Fact 13.7 (The Spectral Theorem). Any real symmetric matrix can be diagonalised by an orthogonal change of basis, and its eigenvalues are always real.

What this means is that an orthonormal basis may be chosen in which the hessian matrix becomes diagonal, with real diagonal entries. The basis vectors (the eigenvectors of $H_f$) define directions in which the hessian acts as multiplication by the corresponding eigenvalues. This is the sense in which the second derivative test generalises.

Fact 13.8 (The Second Derivative Test). Let $(a,b)$ be a stationary point of a function $f \colon \mathbb{R}^2 \to \mathbb{R}$ whose second partial derivatives are continuous. Then,

• If all the eigenvalues of $H_f(a,b)$ are positive, then $(a,b)$ is a local minimum.

• If all the eigenvalues of $H_f(a,b)$ are negative, then $(a,b)$ is a local maximum.

• If $H_f(a,b)$ has both positive and negative eigenvalues, then $(a,b)$ is a saddle point.

However, if $H_f(a,b)$ has a zero eigenvalue and does not have both positive and negative eigenvalues, then the test fails and we must classify the stationary point using another method. Let's have a look at a few examples to see how this works.

Example 13.21. Find and classify the stationary points of $f(x,y) = 3y^2 - 2y^3 - 3x^2 + 6xy$.

Solution. The gradient vector is

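$$\nabla f(x,y) = (6y - 6x)\,\mathbf{i} + (6x + 6y - 6y^2)\,\mathbf{j},$$
which vanishes at the points $(0,0)$ and $(2,2)$ (the first component forces $y = x$, and then the second becomes $12x - 6x^2 = 6x(2 - x)$). To classify these stationary points, we compute the hessian at each point: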

$$H_f(x,y) = \begin{pmatrix} -6 & 6 \\ 6 & 6 - 12y \end{pmatrix} \quad\Rightarrow\quad H_f(0,0) = \begin{pmatrix} -6 & 6 \\ 6 & 6 \end{pmatrix}, \qquad H_f(2,2) = \begin{pmatrix} -6 & 6 \\ 6 & -18 \end{pmatrix}.$$
We need the eigenvalues of these matrices. Luckily, there is a neat trick for getting the eigenvalues of $2 \times 2$ matrices. There are two of them, $\lambda$ and $\mu$ say, and their sum $\lambda + \mu$ is the trace of the matrix, whereas their product $\lambda\mu$ is the determinant of the matrix. At $(0,0)$, the hessian has trace $0$ and determinant $-72$. To get a negative product, one of the eigenvalues must be positive and the other has to be negative (if you like, you can even check that the eigenvalues must be $\pm\sqrt{72}$). It follows that $(0,0)$ is a saddle point. At $(2,2)$, the hessian has trace $-24$ and determinant $72$. To get a positive product, both eigenvalues must have the same sign. Since their sum is negative, both eigenvalues are negative, hence $(2,2)$ is a local maximum. The graph is shown in Figure 13.2.

Note that when the hessian of a two-variable function at a stationary point has a negative determinant, the eigenvalues must have different signs, hence the stationary point is a saddle.
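The trace and determinant trick translates directly into code. For a symmetric matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$ the eigenvalues are the roots of $\lambda^2 - (a+d)\lambda + (ad - b^2) = 0$, whose discriminant $(a-d)^2 + 4b^2$ is never negative, in line with the Spectral Theorem. The following Python sketch uses hypothetical function names; testing eigenvalues against exactly zero is only reasonable here because the hessian entries are small integers.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    # Eigenvalues of [[a, b], [b, d]]: their sum is the trace a + d and their
    # product is the determinant a*d - b*b, so they solve
    # lam^2 - (a + d) lam + (a*d - b*b) = 0.
    trace = a + d
    root = math.sqrt((a - d)**2 + 4 * b * b)   # square root of the discriminant
    return (trace - root) / 2, (trace + root) / 2

def classify(a, b, d):
    lam, mu = eigenvalues_2x2_symmetric(a, b, d)   # lam <= mu
    if lam > 0:
        return "local minimum"
    if mu < 0:
        return "local maximum"
    if lam < 0 < mu:
        return "saddle point"
    return "test fails (zero eigenvalue)"

# Hessians of Example 13.21: H_f(x, y) = [[-6, 6], [6, 6 - 12y]]
print(classify(-6, 6, 6))      # at (0, 0)
print(classify(-6, 6, -18))    # at (2, 2)
print(classify(0, 0, 2))       # a matrix with a zero eigenvalue
```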

Example 13.22. Find and classify the stationary points of $f(x,y) = xye^{-(x^2+y^2)/2}$.

Solution. Once again, we compute the gradient

$$\nabla f(x,y) = e^{-(x^2+y^2)/2}\left[y(1 - x^2)\,\mathbf{i} + x(1 - y^2)\,\mathbf{j}\right]$$
and the hessian

$$H_f(x,y) = e^{-(x^2+y^2)/2}\begin{pmatrix} xy(x^2 - 3) & (1 - x^2)(1 - y^2) \\ (1 - x^2)(1 - y^2) & xy(y^2 - 3) \end{pmatrix}.$$
From the gradient, we see that the stationary points occur when

$$(x = \pm1 \ \text{or}\ y = 0) \quad\text{and}\quad (x = 0 \ \text{or}\ y = \pm1),$$
which yields $(0,0)$, $(1,1)$, $(1,-1)$, $(-1,1)$ and $(-1,-1)$. The hessians at the stationary points are then:

$$H_f(0,0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\Rightarrow\quad \det H_f(0,0) < 0 \quad\Rightarrow\quad \text{saddle}.$$

$$H_f(1,1) = H_f(-1,-1) = e^{-1}\begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix} \quad\Rightarrow\quad \text{both eigenvalues} < 0 \quad\Rightarrow\quad \text{local max},$$
$$H_f(1,-1) = H_f(-1,1) = e^{-1}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad\Rightarrow\quad \text{both eigenvalues} > 0 \quad\Rightarrow\quad \text{local min}.$$
See Figure 13.2 for the graph.

We should also discuss global maxima and minima, which are more likely to be important in applications. Recall from the single-variable case (Section 7.3) that global maxima and minima of a continuous function are not guaranteed unless the domain is a closed bounded interval. In higher dimensions, the generalisation is that if $f$ is continuous and its domain is closed and bounded (where bounded now means contained in a disc or ball of sufficiently large radius), then $f$ is guaranteed to have a global maximum and a global minimum. Moreover, these global extrema must occur at critical points.
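When second partial derivatives are messy, as in Example 13.22, the classification can at least be cross-checked numerically. This Python sketch is an illustration under our own assumptions, not a method from the notes: it approximates the hessian by central differences with step `h` and then uses the determinant and trace, as in the remark after Example 13.21 (a negative determinant means a saddle; a positive determinant means both eigenvalues share the sign of the trace). The tolerance `tol` is an arbitrary guard against round-off.

```python
import math

def f(x, y):
    # Function of Example 13.22
    return x * y * math.exp(-(x**2 + y**2) / 2)

def numerical_hessian(g, x, y, h=1e-4):
    # Central-difference approximations to g_xx, g_xy and g_yy
    gxx = (g(x + h, y) - 2 * g(x, y) + g(x - h, y)) / h**2
    gyy = (g(x, y + h) - 2 * g(x, y) + g(x, y - h)) / h**2
    gxy = (g(x + h, y + h) - g(x + h, y - h)
           - g(x - h, y + h) + g(x - h, y - h)) / (4 * h**2)
    return gxx, gxy, gyy

def classify(gxx, gxy, gyy, tol=1e-6):
    det = gxx * gyy - gxy**2
    trace = gxx + gyy
    if det < -tol:
        return "saddle point"
    if det > tol:
        return "local maximum" if trace < 0 else "local minimum"
    return "inconclusive"

for p in [(0, 0), (1, 1), (-1, -1), (1, -1), (-1, 1)]:
    print(p, classify(*numerical_hessian(f, *p)))
```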

Example 13.23. Find the global maximum of $f(x,y) = xye^{-x-y}$ on the (closed) triangular region bounded by $x = 0$, $y = 0$ and $x + y = 4$.

Solution. We first find the stationary points:

$$\nabla f(x,y) = y(1 - x)e^{-x-y}\,\mathbf{i} + x(1 - y)e^{-x-y}\,\mathbf{j} = \mathbf{0} \quad\text{for}\quad (x,y) = (0,0),\ (1,1).$$
Since $f(0,0) = 0$ and $f(1,1) = e^{-2} > 0$ (and both points are in the triangular domain), $(1,1)$ is a candidate for the global maximum. One can check that it is a local maximum by computing the hessian, but this isn't required. Since $\nabla f$ is defined everywhere on the domain, there are no singular points. There are, however, boundary points corresponding to the triangle's three edges. We analyse the values that $f$ takes on each edge:

1. Along $x = 0$ (with $0 \le y \le 4$), $f$ takes the value $f(0,y) = 0 < e^{-2}$, so the global maximum cannot occur here.

Figure 13.2: Graphs of the functions $f(x,y) = 3y^2 - 2y^3 - 3x^2 + 6xy$ (top-left), $f(x,y) = (y - x^2)(y - 3x^2)$ (top-right) and $f(x,y) = xye^{-(x^2+y^2)/2}$ (bottom).

2. Along $y = 0$ (with $0 \le x \le 4$), $f$ takes the value $f(x,0) = 0 < e^{-2}$, so the global maximum cannot occur here either.

3. Along $x + y = 4$ (with $0 \le x \le 4$), $f$ takes the values $f(x, 4 - x) = x(4 - x)e^{-4}$. This is a quadratic function of $x$, so it is easily checked that its maximum occurs when $x = 2$. However,
$$f(2,2) = 4e^{-4} \quad\Rightarrow\quad \frac{f(2,2)}{f(1,1)} = \frac{4e^{-4}}{e^{-2}} = \frac{4}{e^2} < 1,$$
since $e > 2$. Thus, $f(2,2) < f(1,1)$.

As we have exhausted all the critical points, it follows that the global maximum is found at the stationary point $(x,y) = (1,1)$, where $f$ has the value $e^{-2}$.

Finally, we discuss an example in which the second derivative test fails. This makes life a bit tricky, as we'll see!

Example 13.24. Find and classify the stationary points of $f(x,y) = (y - x^2)(y - 3x^2)$.

Solution. Since $f(x,y) = y^2 - 4x^2y + 3x^4$, the gradient and hessian are
$$\nabla f(x,y) = (12x^3 - 8xy)\,\mathbf{i} + (2y - 4x^2)\,\mathbf{j} \quad\text{and}\quad H_f(x,y) = \begin{pmatrix} 36x^2 - 8y & -8x \\ -8x & 2 \end{pmatrix}.$$
Setting the second component to zero forces $y = 2x^2$, and then the first component becomes $-4x^3$. The only stationary point is therefore the origin and the hessian there is

$$H_f(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.$$
Because this has a positive eigenvalue and a zero eigenvalue, the second derivative test cannot tell us the nature of this stationary point. To investigate it, we can look at how the function behaves when we restrict $(x,y)$ to lie on curves through the origin in the $xy$-plane. If we restrict to the straight line $x = 0$, then the function becomes $f(0,y) = y^2$, which has a local minimum at $y = 0$. This tells us that $(0,0)$ cannot be a local maximum because there are points $(0,y)$ arbitrarily close to $(0,0)$ with $f(0,y) > f(0,0)$. More generally, if we restrict to the straight line $y = mx$, then the function becomes
$$f(x, mx) = (mx - x^2)(mx - 3x^2) = 3x^4 - 4mx^3 + m^2x^2.$$
This function of $x$ also has a local minimum at $x = 0$, no matter which $m$ we choose (the second derivative there is $2m^2 > 0$ for $m \neq 0$, and when $m = 0$, the function becomes $f(x,0) = 3x^4$). Again, this rules out $(0,0)$ being a local maximum. However, the fact that the function has a local minimum when restricted to every straight line through the origin doesn't prove that $(0,0)$ is a local minimum; our experience with tricky multivariable limits has made us wise to falling into that trap. Indeed, if we try restricting to the curve $y = 2x^2$, we find that the function becomes
$$f(x, 2x^2) = (2x^2 - x^2)(2x^2 - 3x^2) = -x^4,$$
which clearly has a local maximum at $x = 0$. This curve therefore rules out $(0,0)$ being a local minimum because we now know that there are points $(x, 2x^2)$ arbitrarily close to $(0,0)$ with $f(x, 2x^2) < f(0,0)$. The only remaining possibility is that $(0,0)$ is a saddle point. The graph of $f$, pictured in Figure 13.2, illustrates this.
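Restricting to curves through the origin is easy to explore numerically. This Python sketch (illustrative only; the slopes `m` and the sample `x`-values are arbitrary choices) evaluates the function of Example 13.24 at points approaching the origin along several lines $y = mx$ and along the parabola $y = 2x^2$. Finitely many samples cannot prove the nature of a stationary point; they merely display the sign pattern established algebraically above.

```python
def f(x, y):
    # Function of Example 13.24
    return (y - x**2) * (y - 3 * x**2)

# x-values approaching 0 (an arbitrary choice for illustration)
xs = [10.0**(-k) for k in range(1, 6)]

for m in [0.0, 0.5, 1.0, -2.0]:
    # Along y = m x: f(x, m x) = x^2 (m - x)(m - 3x), positive for small enough x != 0
    print("y = %.1f x:" % m, [f(x, m * x) for x in xs])

# Along y = 2x^2: f(x, 2x^2) = -x^4 < 0
print("y = 2x^2:", [f(x, 2 * x**2) for x in xs])
```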

Chapter 14

Multiple Integration

Here, we will extend the concept of the definite integral, discussed in Section 8.4, to functions of several variables. Usually, we will restrict ourselves to two variables, noting that much of what we do can be easily generalised to three or more variables.

14.1 Double integrals

Recall that the definition of the definite integral $\int_a^b f(x)\,dx$ was motivated by the problem of finding the (signed) area of the plane region bounded by the curve $y = f(x)$, the $x$-axis and the lines $x = a$ and $x = b$. We can similarly motivate the definition of a definite integral of a function $f$ of two variables $x$ and $y$ over a two-dimensional region $R$ by considering the (signed) volume of the solid bounded by the surface $z = f(x,y)$, the $xy$-plane and the prism (parallel to the $z$-axis) whose cross-section on the $xy$-plane is the region $R$. This will be referred to as a double integral, because we have two variables to integrate and, instead of integrating over an interval $[a,b]$, we will have to integrate over a region $R$ in the $xy$-plane. The notation for this double integral is
$$\int_R f(x,y)\,dA \quad\text{or}\quad \iint_R f(x,y)\,dx\,dy,$$
where $dA = dx\,dy$ stands for an "infinitesimal area", much as $dx$ denotes an "infinitesimal length" in one-dimensional integration. To actually compute the volume of the solid and thence the definite integral, we may proceed as follows. First, we discuss the case in which the region $R$ is a rectangle:
$$R = \{(x,y) \in \mathbb{R}^2 : a \le x \le b \ \text{and}\ c \le y \le d\}.$$

• We construct an equipartition of the rectangle $R$, meaning we divide it into little rectangles $R_n(i,j)$, $0 \le i, j \le n - 1$, whose lower-left corner is at $(x_i, y_j) = (a + i(b-a)/n,\ c + j(d-c)/n)$ and whose side lengths are $\Delta x = (b-a)/n$ and $\Delta y = (d-c)/n$.

• For each $R_n(i,j)$, we then compute the volume $L_n(i,j)$ of the rectangular prism over it whose height is the minimum of the values of $f(x,y)$ when $x$ and $y$ lie in the little rectangle. We similarly compute the volume $U_n(i,j)$ using the maximum of the values of $f(x,y)$.

• We add these volumes up to obtain the lower and upper Riemann sums, $L_n = \sum_{i,j} L_n(i,j)$ and $U_n = \sum_{i,j} U_n(i,j)$.

• If the limits of $L_n$ and $U_n$ as $n \to \infty$ exist and are equal, then we define the definite integral $\iint_R f(x,y)\,dA$ over $R$ to be this common value.
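The procedure above can be sketched in Python for a simple case. Finding the minimum and maximum of $f$ on each little rectangle is hard in general, so this sketch makes an extra assumption not made in the notes: $f$ is nondecreasing in $x$ and in $y$, so the minimum sits at the lower-left corner and the maximum at the upper-right corner. The integrand $x^2 + y$ on the unit square is a hypothetical choice whose integral is $\frac13 + \frac12 = \frac56$; for it, $U_n - L_n = 2/n$.

```python
def riemann_sums(f, a, b, c, d, n):
    """Lower and upper sums L_n, U_n for the equipartition of [a, b] x [c, d].

    Assumption: f is nondecreasing in x and in y, so on each little
    rectangle its minimum is at the lower-left corner and its maximum
    at the upper-right corner.
    """
    dx, dy = (b - a) / n, (d - c) / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            x_i, y_j = a + i * dx, c + j * dy
            lower += f(x_i, y_j) * dx * dy              # L_n(i, j)
            upper += f(x_i + dx, y_j + dy) * dx * dy    # U_n(i, j)
    return lower, upper

# Hypothetical integrand, increasing in both variables on [0, 1] x [0, 1];
# its integral over the unit square is 1/3 + 1/2 = 5/6.
def g(x, y):
    return x**2 + y

for n in [10, 100]:
    print(n, riemann_sums(g, 0.0, 1.0, 0.0, 1.0, n))
```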

What should we do when the region is not a rectangle? Well, if the region of integration is not too unpleasant, then we can try to approximate it using small rectangles.

• First, overlay $R$ by a rectangular grid of side lengths $\Delta x$ and $\Delta y$, both of which are inversely proportional to $m$.

• For each rectangle $R_a$ in the interior of $R$, meaning that the intersection of $R_a$ with $R$ is $R_a$, compute the integral $\iint_{R_a} f(x,y)\,dA$ using the above method (assuming it exists!).

• Add up these integrals to get the inner sum:

$$\mathrm{Inn}_R(m) = \sum_{a \,:\, R_a \cap R = R_a} \iint_{R_a} f(x,y)\,dA.$$

• The outer sum is given by summing over the rectangles $R_a$ whose intersection with $R$ is non-empty:
$$\mathrm{Out}_R(m) = \sum_{a \,:\, R_a \cap R \neq \emptyset} \iint_{R_a} f(x,y)\,dA.$$

• If $\lim_{m \to \infty} \mathrm{Inn}_R(m) = \lim_{m \to \infty} \mathrm{Out}_R(m)$, then define the integral $\iint_R f(x,y)\,dA$ to be this common limit.
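For a concrete feel for the inner and outer sums, take $f = 1$ and let $R$ be the closed unit disc, so both sums just add up areas of grid squares and should squeeze the area $\pi$. This Python sketch is our own illustration: the grid covers $[-1,1]^2$ with $m \times m$ squares of side $2/m$, and it relies on the disc being convex, so that a square lies inside it exactly when its farthest corner does and meets it exactly when its nearest point to the origin does.

```python
import math

def inner_outer_area_of_unit_disc(m):
    """Inn(m) and Out(m) for f = 1 over the closed unit disc R.

    [-1, 1] x [-1, 1] is covered by an m x m grid of squares of side 2/m.
    """
    side = 2.0 / m
    cell_area = side * side          # integral of f = 1 over one grid square
    inner = outer = 0.0
    for i in range(m):
        for j in range(m):
            x0, y0 = -1.0 + i * side, -1.0 + j * side
            x1, y1 = x0 + side, y0 + side
            farthest_sq = max(x0**2, x1**2) + max(y0**2, y1**2)
            nx = min(max(0.0, x0), x1)   # nearest point of the square to the origin
            ny = min(max(0.0, y0), y1)
            if farthest_sq <= 1.0:       # R_a is contained in R
                inner += cell_area
            if nx**2 + ny**2 <= 1.0:     # R_a meets R
                outer += cell_area
    return inner, outer

for m in [20, 200]:
    print(m, inner_outer_area_of_unit_disc(m), math.pi)
```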

You can probably imagine how this double procedure may be generalised to deal with integrals over three- or higher-dimensional regions $R$. You are also probably wondering why anyone would bother with such a long and tedious procedure (and are suspecting, quite rightly, that they don't!). However, the above constitutes an algorithm for computing integrals and, by replacing the limits by sufficiently fine grids, this can be implemented on a computer. Rest assured that we will not try to make you compute anything using the above method; what's important are the ideas which go into it.

Before we start on actual methods to compute higher-dimensional integrals, here are a few properties that should be familiar from the one-dimensional case. Let $f$ and $g$ be integrable functions on $R \subseteq \mathbb{R}^n$, meaning that their integrals over $R$ are defined (and finite!). Then:

• $\iint_R c f(x,y)\,dA = c\iint_R f(x,y)\,dA$ ($c$ constant).

• $\iint_R [f(x,y) + g(x,y)]\,dA = \iint_R f(x,y)\,dA + \iint_R g(x,y)\,dA$.

• $f(x,y) \le g(x,y) \ \Rightarrow\ \iint_R f(x,y)\,dA \le \iint_R g(x,y)\,dA$.

• The Triangle Inequality: $\left|\iint_R f(x,y)\,dA\right| \le \iint_R |f(x,y)|\,dA$.

• If $R = R_1 \cup R_2$, with $R_1 \cap R_2 = \emptyset$, then
$$\iint_R f(x,y)\,dA = \iint_{R_1} f(x,y)\,dA + \iint_{R_2} f(x,y)\,dA.$$

Figure 14.1: Integration over a $y$-simple region means integrating in the $y$-direction first.

14.2 Evaluating Double Integrals by Iteration

If $f(x,y)$ is defined and bounded on $R$, then a double integral of $f$ over $R$ may be computed in the following ways:

1. If the region of integration is $y$-simple (see Figure 14.1), meaning that it is bounded by two vertical lines $x = x_1$ and $x = x_2$ and two continuous graphs $y = h_1(x)$ and $y = h_2(x)$, then we can integrate first in the $y$-direction and then in the $x$-direction:

$$\iint_R f(x,y)\,dA = \int_{x_1}^{x_2}\left(\int_{y=h_1(x)}^{y=h_2(x)} f(x,y)\,dy\right)dx.$$
In other words, we can first evaluate the inner integral

$$\int_{y=h_1(x)}^{y=h_2(x)} f(x,y)\,dy$$
to get a function of $x$, $k(x)$ say, and then integrate this function to get the result:

$$\int_{x_1}^{x_2} k(x)\,dx = \iint_R f(x,y)\,dA.$$

2. Alternatively, if the region of integration is $x$-simple (see Figure 14.2), meaning that it is bounded by two horizontal lines $y = y_1$ and $y = y_2$ and two continuous graphs $x = g_1(y)$ and $x = g_2(y)$, then we can proceed by integrating first in the $x$-direction and then in the $y$-direction:
$$\iint_R f(x,y)\,dA = \int_{y_1}^{y_2}\left(\int_{x=g_1(y)}^{x=g_2(y)} f(x,y)\,dx\right)dy.$$
In other words, we can first evaluate the inner integral

$$\int_{x=g_1(y)}^{x=g_2(y)} f(x,y)\,dx$$

to get a function of $y$, $\ell(y)$ say, and then integrate this function to get the result:

$$\int_{y_1}^{y_2} \ell(y)\,dy = \iint_R f(x,y)\,dA.$$

Figure 14.2: Integration over an $x$-simple region means integrating in the $x$-direction first.

3. If the region of integration is neither x-simple nor y-simple, then one should try to write it as a (preferably disjoint) union of x-simple and y-simple regions. This is not particularly pleasant, so it pays to look carefully at the region to check if it is x-simple or y-simple first!

Of course, now we must illustrate these procedures with a few examples.

Example 14.1. Evaluate the double integral

$$\iint_R (2x + 6x^2y)\,dA,$$
where $R$ is the rectangular region $\{(x,y) \in \mathbb{R}^2 : 1 < x < 4 \ \text{and}\ -1 < y < 2\}$.

Solution. Rectangular regions, like the one in this example, are easy to integrate over because they are both $x$-simple and $y$-simple. We can evaluate this integral by integrating over $x$ first as follows:

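$$\begin{aligned} \iint_R (2x + 6x^2y)\,dA &= \int_{-1}^{2}\left(\int_1^4 (2x + 6x^2y)\,dx\right)dy = \int_{-1}^{2}\Big[x^2 + 2x^3y\Big]_{x=1}^{x=4}\,dy \\ &= \int_{-1}^{2}\big[(16 + 128y) - (1 + 2y)\big]\,dy = \int_{-1}^{2}(126y + 15)\,dy \\ &= \Big[63y^2 + 15y\Big]_{-1}^{2} = 234. \end{aligned}$$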

In this case, we could have instead integrated over $y$ first. This strategy results in

$$\begin{aligned} \iint_R (2x + 6x^2y)\,dA &= \int_1^4\left(\int_{-1}^{2} (2x + 6x^2y)\,dy\right)dx = \int_1^4\Big[2xy + 3x^2y^2\Big]_{y=-1}^{y=2}\,dx \\ &= \int_1^4\big[(4x + 12x^2) - (-2x + 3x^2)\big]\,dx = \int_1^4 (6x + 9x^2)\,dx \\ &= \Big[3x^2 + 3x^3\Big]_1^4 = 234, \end{aligned}$$
in agreement with our first calculation. Of course, the value of a double integral cannot depend on how we choose to evaluate it. However, it is often the case that one evaluation method is significantly easier than the other.

Example 14.2. Evaluate the double integral

$$\iint_R x^2y\,dA,$$
where $R$ is the following semicircle:

$$R = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1 \ \text{and}\ y \ge 0\}.$$
Solution. Let's begin by integrating over $x$ first. If we pick a given fixed value for $y$, then the limits on $x$ will be (see Figure 14.3) $x = -\sqrt{1 - y^2}$ and $x = \sqrt{1 - y^2}$. Of course, we should pick $y$ to lie between $0$ and $1$ in order to capture the region $R$. The integral is therefore evaluated as follows:

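$$\begin{aligned} \iint_R x^2y\,dA &= \int_0^1\left(\int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}} x^2y\,dx\right)dy = \int_0^1\Big[\tfrac13 x^3y\Big]_{x=-\sqrt{1-y^2}}^{x=\sqrt{1-y^2}}\,dy \\ &= \int_0^1 \frac{2y}{3}(1 - y^2)^{3/2}\,dy = \Big[-\tfrac{2}{15}(1 - y^2)^{5/2}\Big]_0^1 = \frac{2}{15}. \end{aligned}$$
Here, we have employed a simple substitution $u = 1 - y^2$. If we instead compute this by integrating over $y$ first, then we pick a fixed value of $x$ between $-1$ and $1$, noting that the limits for values of $y$ are then $0$ and $\sqrt{1 - x^2}$. Thus,
$$\begin{aligned} \iint_R x^2y\,dA &= \int_{-1}^{1}\left(\int_0^{\sqrt{1-x^2}} x^2y\,dy\right)dx = \int_{-1}^{1}\Big[\tfrac12 x^2y^2\Big]_{y=0}^{y=\sqrt{1-x^2}}\,dx \\ &= \int_{-1}^{1}\frac12 x^2(1 - x^2)\,dx = \frac12\Big[\tfrac13 x^3 - \tfrac15 x^5\Big]_{-1}^{1} = \frac{2}{15}, \end{aligned}$$
as before. Notice that this computation was slightly easier because the final integration required no substitution.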
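The recipe for $y$-simple regions (compute the inner integral $k(x)$, then integrate it in $x$) can be mirrored numerically. This Python sketch is our own illustration: it swaps in composite Simpson's rule, a standard one-dimensional quadrature rule not discussed in these notes, for both integrations, and the number of subintervals `n` is an arbitrary choice. It reproduces Examples 14.1 and 14.2 (a rectangle is $y$-simple with constant $h_1$ and $h_2$).

```python
import math

def simpson(g, a, b, n=100):
    # Composite Simpson's rule on [a, b] with an even number n of subintervals
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * step)
    return total * step / 3

def iterated_integral_y_first(f, x1, x2, h1, h2, n=100):
    """Approximate the double integral of f over the y-simple region
    x1 <= x <= x2, h1(x) <= y <= h2(x): integrate in y, then in x."""
    def k(x):
        # Inner integral k(x) = integral of f(x, y) dy from h1(x) to h2(x)
        return simpson(lambda y: f(x, y), h1(x), h2(x), n)
    return simpson(k, x1, x2, n)

# Example 14.1: the rectangle 1 < x < 4, -1 < y < 2
ex_14_1 = iterated_integral_y_first(lambda x, y: 2 * x + 6 * x**2 * y,
                                    1.0, 4.0, lambda x: -1.0, lambda x: 2.0)

# Example 14.2: the semicircle x^2 + y^2 <= 1, y >= 0
ex_14_2 = iterated_integral_y_first(lambda x, y: x**2 * y,
                                    -1.0, 1.0, lambda x: 0.0,
                                    lambda x: math.sqrt(max(0.0, 1 - x**2)))
print(ex_14_1, ex_14_2)   # compare with 234 and 2/15
```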

Example 14.3. Evaluate
$$\iint_R x^2e^y\,dA,$$
where $R$ is the region bounded by the lines $x = 1$, $y = x$ and $y = 2x$.

Figure 14.3: Integration over a semicircle in Example 14.2.

Solution. Here, it is easier to integrate in the $y$-direction first (see Figure 14.4 at left):
$$\iint_R x^2e^y\,dA = \int_0^1\left(\int_x^{2x} x^2e^y\,dy\right)dx = \int_0^1\Big[x^2e^y\Big]_{y=x}^{y=2x}\,dx = \int_0^1 x^2\left(e^{2x} - e^x\right)dx.$$
The latter integral yields to integration by parts. Take $u = x^2$ and $dv = (e^{2x} - e^x)\,dx$, so that $du = 2x\,dx$ and $v = \frac12 e^{2x} - e^x$, leading to
$$\int_0^1 x^2\left(e^{2x} - e^x\right)dx = \Big[x^2\left(\tfrac12 e^{2x} - e^x\right)\Big]_0^1 - \int_0^1\left(xe^{2x} - 2xe^x\right)dx.$$
The latter integral again requires integration by parts for both terms. When this annoyance is overcome, the result is
$$\iint_R x^2e^y\,dA = \frac{e^2}{4} - e + \frac74.$$
We remark that if we had chosen to integrate in the $x$-direction first, then we would have had to split $R$ into two regions, because its right-hand boundary is $x = y$ for $0 \le y \le 1$ but $x = 1$ for $1 \le y \le 2$ (see Figure 14.4 at right):
$$\iint_R x^2e^y\,dA = \int_0^1\left(\int_{y/2}^{y} x^2e^y\,dx\right)dy + \int_1^2\left(\int_{y/2}^{1} x^2e^y\,dx\right)dy.$$
This is clearly much more complicated.

Example 14.4. Find the volume of the solid under the graph of $f(x,y) = 2 - x^2 - y^2$ over the region $R$ bounded by $x = 1$, $y = x$ and $y = 0$.

Solution. Here, the region is a triangle which is both $x$- and $y$-simple. We will integrate in the $y$-direction first (see Figure 14.5). The volume is then given by the following integral:
$$\begin{aligned} \iint_R (2 - x^2 - y^2)\,dA &= \int_0^1\int_0^x (2 - x^2 - y^2)\,dy\,dx = \int_0^1\Big[2y - x^2y - \tfrac13 y^3\Big]_{y=0}^{y=x}\,dx \\ &= \int_0^1\left(2x - \tfrac43 x^3\right)dx = \Big[x^2 - \tfrac13 x^4\Big]_0^1 = \frac23. \end{aligned}$$

Figure 14.4: Integration over a triangular region in Example 14.3. Integration in y first and then in x (left) and integration in x first and then in y (right).

Figure 14.5: Integrating over y, then x, for the triangular region of Example 14.4.

Figure 14.6: Integration over the shaded region; it is better to integrate in y first and then in x.

14.3 Area Integrals

Consider the double integral $$\iint_R 1\,dA$$ in which we are integrating the function whose value is always $1$. By definition, this integral gives the volume of the prism of height $1$ and cross-sectional area equal to the area of $R$. In other words, this integral gives the area of $R$. Of course, one can compute areas using single-variable calculus. However, it may be more convenient in some cases to use two-dimensional methods!

Example 14.5. Determine the area bounded between the curves $2y = 16 - x^2$ and $x + 2y = 4$.

Solution. These curves intersect when $4 - x = 16 - x^2$, hence when $x = -3$ and $x = 4$. Sketching the region bounded between the curves, as in Figure 14.6, we conclude that our computation will be easier if we integrate over $y$ first:
$$\text{Area} = \int_{-3}^{4} \int_{2 - x/2}^{8 - x^2/2} dy\,dx = \int_{-3}^{4} \left( 6 + \frac{x}{2} - \frac{x^2}{2} \right) dx = \left[ 6x + \frac{x^2}{4} - \frac{x^3}{6} \right]_{-3}^{4} = \frac{343}{12}.$$
This is an example where it is just as easy to use single-variable methods. Indeed, single-variable methods tell us directly that the area is given by the integral appearing after the second equality above!

Example 14.6. Determine the area bounded by the curves $x = y^3$, $x + y = 2$ and $y = 0$.

Solution. We again find the intersection points of the curves, displaying the region whose area is to be calculated in Figure 14.7. Clearly $(0,0)$ and $(2,0)$ are the intersection points of $x = y^3$ with $y = 0$ and of $x + y = 2$ with $y = 0$, respectively. The remaining point is obtained from $x = y^3$ and $x + y = 2$:
$$y^3 = 2 - y \implies y^3 + y - 2 = 0 \implies (y-1)(y^2 + y + 2) = 0 \implies y = 1,$$
since $y^2 + y + 2$ has no real roots.

Figure 14.7: Integration over the shaded region; it is better to integrate in x first and then in y.

Thus, $(1,1)$ is also an intersection point. From Figure 14.7, it is clearly better to integrate over $x$ first. The required area is then
$$\text{Area} = \int_0^1 \int_{y^3}^{2-y} dx\,dy = \int_0^1 (2 - y - y^3)\,dy = \left[ 2y - \frac{1}{2}y^2 - \frac{1}{4}y^4 \right]_0^1 = \frac{5}{4}.$$
Of course, we could also find this area by integrating first in the $y$-direction and dividing the region into two pieces as follows:

$$\text{Area} = \int_0^1 \int_0^{x^{1/3}} dy\,dx + \int_1^2 \int_0^{2-x} dy\,dx = \int_0^1 x^{1/3}\,dx + \int_1^2 (2 - x)\,dx$$
$$= \left[ \frac{3}{4} x^{4/3} \right]_0^1 + \left[ 2x - \frac{1}{2}x^2 \right]_1^2 = \frac{5}{4}.$$
This is probably how one would try to compute the area using single-variable methods (unless one was clever enough to set it up as a single integration over $y$).
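For Examples 14.5 and 14.6, the single-variable integrands left after the inner integration are polynomials of degree at most 3, so one application of Simpson's rule gives their integrals exactly. The sketch below exploits this with Python's `fractions.Fraction` for exact rational arithmetic; the helper name `simpson_exact` is our own, and the $x^{1/3}$ piece of the split calculation is only approximated with a midpoint sum.

```python
from fractions import Fraction


def simpson_exact(p, a, b):
    """One-panel Simpson's rule in exact rational arithmetic.

    Simpson's rule integrates every polynomial of degree <= 3 exactly,
    so for such integrands this returns the exact value of the integral.
    """
    a, b = Fraction(a), Fraction(b)
    m = (a + b) / 2
    return (b - a) / 6 * (p(a) + 4 * p(m) + p(b))


# Example 14.5: after integrating in y, the integrand is 6 + x/2 - x^2/2 on [-3, 4].
area_145 = simpson_exact(lambda x: 6 + x / 2 - x**2 / 2, -3, 4)

# Example 14.6, x first: after integrating in x, the integrand is 2 - y - y^3 on [0, 1].
area_146 = simpson_exact(lambda y: 2 - y - y**3, 0, 1)

# Example 14.6, y first (split at x = 1): x**(1/3) is not a polynomial, so that
# piece uses a plain midpoint sum; the linear piece 2 - x is handled exactly.
n = 100_000
cube_root_part = sum(((i + 0.5) / n) ** (1 / 3) for i in range(n)) / n
area_146_split = cube_root_part + float(simpson_exact(lambda x: 2 - x, 1, 2))

print(area_145, area_146, area_146_split)
```

It should print `343/12` and `5/4` exactly, followed by a floating-point number close to $1.25$.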

14.4 Changing the Order of Integration

Sometimes a double integral of the form $$\int_a^b \int_c^d f(x,y)\,dx\,dy$$ is easier to evaluate (or perhaps can only be evaluated) by changing the order of integration. This means replacing it by an integral of the form

$$\int_{c'}^{d'} \int_{a'}^{b'} f(x,y)\,dy\,dx.$$
You might think that $a' = a$, $b' = b$, $c' = c$ and $d' = d$, but this is only true if these are all constants. If any of them are functions, for example if $c$ is a function of $y$, then one has to work to determine $a'$, $b'$, $c'$ and $d'$. Indeed, it can happen that reversing the order of integration requires splitting the double integral into several integrals over $x$- or $y$-simple regions. The main tool in understanding how to reverse the order of a double integral is sketching the region in the $(x,y)$ plane over which the integral is taken. This means looking at the lower and upper limits given in the integral and then using the sketch to work out the new limits of integration after the order of integration has been changed. As always, it's easier to follow with an example.

Example 14.7. Consider the following double integral:

$$\int_0^1 \int_{3y}^{3} e^{x^2}\,dx\,dy.$$ Sketch the region in the $(x,y)$ plane over which this double integral is taken and then evaluate it by reversing the order of integration.

Solution. In this example, it's not possible to evaluate the integral as given because the antiderivatives of $e^{x^2}$ cannot be expressed in terms of elementary functions. Let's sketch the region of integration as a precursor to reversing the order. We see that $y$ varies freely from $0$ to $1$, but once a value for $y$ is assumed, $x$ must vary from $3y$ to $3$. The region is therefore the triangle bounded by the lines $x = 3y$, $x = 3$ and $y = 0$, which meets the line $y = 1$ only at its corner $(3,1)$ (see Figure 14.8). To reverse the order, we must let $x$ vary freely over the range allowed by the region and then determine the range of $y$ when we fix a given $x$. This is not too difficult: $x$ can vary from $0$ to $3$ and, for a given $x$, $y$ is constrained to vary between $0$ and $x/3$. Thus,

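$$\int_0^1 \int_{3y}^{3} e^{x^2}\,dx\,dy = \int_0^3 \int_0^{x/3} e^{x^2}\,dy\,dx = \int_0^3 \Big[ y e^{x^2} \Big]_0^{x/3} dx = \int_0^3 \frac{x e^{x^2}}{3}\,dx$$
(now, let $u = x^2$, so $du = 2x\,dx$)

$$= \int_0^9 \frac{e^u}{6}\,du = \left[ \frac{e^u}{6} \right]_0^9 = \frac{1}{6}\left(e^9 - 1\right).$$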
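Sketching is the real tool here, but the reversed limits can also be sanity-checked by computer: test many random points against both descriptions of the region, then compare the two orders numerically. A numerical rule does not care that $e^{x^2}$ has no elementary antiderivative, so even the original order can be approximated. The sketch below repeats the hypothetical `iterated_midpoint` helper; the sampling box, random seed and cell count are arbitrary choices.

```python
import math
import random


def iterated_midpoint(f, outer, inner, n=400):
    """Midpoint rule for an iterated integral: outer = (a, b) constant limits,
    inner(t) -> (lo, hi) limits of the inner variable, f(t, s) the integrand."""
    a, b = outer
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        lo, hi = inner(t)
        k = (hi - lo) / n
        total += h * k * sum(f(t, lo + (j + 0.5) * k) for j in range(n))
    return total


def in_original(x, y):
    # Description used by the given integral: 0 <= y <= 1 and 3y <= x <= 3.
    return 0 <= y <= 1 and 3 * y <= x <= 3


def in_reversed(x, y):
    # Description after reversing the order: 0 <= x <= 3 and 0 <= y <= x/3.
    return 0 <= x <= 3 and 0 <= y <= x / 3


# Random points in a box slightly larger than the triangle (fixed seed for repeatability).
rng = random.Random(0)
points = [(rng.uniform(-0.5, 3.5), rng.uniform(-0.5, 1.5)) for _ in range(10_000)]
mismatches = sum(in_original(x, y) != in_reversed(x, y) for x, y in points)

# Reversed order: outer x in [0, 3], inner y in [0, x/3].
reversed_order = iterated_midpoint(lambda x, y: math.exp(x**2), (0.0, 3.0), lambda x: (0.0, x / 3))

# Original order: outer y in [0, 1], inner x in [3y, 3].
original_order = iterated_midpoint(lambda y, x: math.exp(x**2), (0.0, 1.0), lambda y: (3 * y, 3.0))

exact = (math.exp(9) - 1) / 6
print(mismatches, reversed_order, original_order, exact)
```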

Example 14.8. Evaluate the following integral by interchanging the order of integration:

$$\int_0^1 \int_{2x^2}^{2} x^3 \sin(3y^3)\,dA.$$ Solution. First, note that because the inner integral has limits which depend on $x$, it must be the $y$-integral. That is, $dA = dy\,dx$. I'll leave you to sketch the region over which we are integrating and thereby confirm that

$$\int_0^1 \int_{2x^2}^{2} x^3 \sin(3y^3)\,dy\,dx = \int_0^2 \int_0^{\sqrt{y/2}} x^3 \sin(3y^3)\,dx\,dy.$$

Figure 14.8: Reversing the order of integration over the shaded region.

Make sure that you understand why we are taking the positive square root in the upper limit of the x-integral. Now, we evaluate as per usual:

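$$\int_0^1 \int_{2x^2}^{2} x^3 \sin(3y^3)\,dy\,dx = \int_0^2 \left[ \frac{1}{4} x^4 \sin(3y^3) \right]_0^{\sqrt{y/2}} dy = \frac{1}{16} \int_0^2 y^2 \sin(3y^3)\,dy = \frac{1}{144} \int_0^{24} \sin u\,du$$
$$= \frac{1}{144} \Big[ -\cos u \Big]_0^{24} = \frac{1 - \cos 24}{144}.$$
Here, we've substituted $u = 3y^3$.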
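A numerical sketch along the same lines confirms that both orders in Example 14.8 give $(1 - \cos 24)/144 \approx 0.0040$. As before, `iterated_midpoint` is a hypothetical helper and `n = 400` is arbitrary; note that `math.cos` works in radians, which is what the substitution above requires.

```python
import math


def iterated_midpoint(f, outer, inner, n=400):
    """Midpoint rule for an iterated integral: outer = (a, b) constant limits,
    inner(t) -> (lo, hi) limits of the inner variable, f(t, s) the integrand."""
    a, b = outer
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        lo, hi = inner(t)
        k = (hi - lo) / n
        total += h * k * sum(f(t, lo + (j + 0.5) * k) for j in range(n))
    return total


def integrand(x, y):
    return x**3 * math.sin(3 * y**3)


# Given order: outer x in [0, 1], inner y in [2x^2, 2].
given_order = iterated_midpoint(integrand, (0.0, 1.0), lambda x: (2 * x**2, 2.0))

# Reversed order: outer y in [0, 2], inner x in [0, sqrt(y/2)]
# (the positive root, because the region has x >= 0).
reversed_order = iterated_midpoint(
    lambda y, x: integrand(x, y), (0.0, 2.0), lambda y: (0.0, math.sqrt(y / 2))
)

exact = (1 - math.cos(24)) / 144
print(given_order, reversed_order, exact)
```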

14.5 Polar Coordinates

For many double integrals, it is downright inconvenient to have to use Cartesian coordinates. For example, if we wanted to check our double integration moxie with something simple like the area enclosed by the unit circle, we'd have to compute

$$\iint_{x^2+y^2 \le 1} 1\,dA = \int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} 1\,dy\,dx.$$
This is not particularly difficult, but you can see that it will boil down to making a trigonometric substitution. Why not set the integration up in the first place so as to take advantage of the circular symmetry? In particular, we should change our variables to the polar coordinates $r \in [0,\infty)$ and $\theta \in [0,2\pi)$:
$$x = r\cos\theta,\quad y = r\sin\theta \iff r = \sqrt{x^2+y^2},\quad \theta = \tan^{-1}\left(\frac{y}{x}\right).$$
(Since $\tan^{-1}$ technically has range $(-\pi/2,\pi/2)$, the definition of $\theta$ is not, strictly speaking, correct: one may have to add a multiple of $\pi$ to get the correct angle.) The integral for the area enclosed within the unit circle would then look something like

$$\int_{\theta=0}^{2\pi} \int_{r=0}^{1} 1\,dA.$$
We therefore need to think about how we should express $dA$ in polar coordinates. Unfortunately, it is not as simple as $dA = dr\,d\theta$ (how could it be?). Instead, we recall that double integrals are defined as Riemann sums over small rectangles of side lengths $dx$ and $dy$. In polar coordinates, what we want is to do Riemann sums with small regions defined by quantities $dr$ and $d\theta$. Here's what such a small region looks like:

[Figure: a small polar region of area $dA$ lying between radii $r$ and $r + dr$ and between angles $\theta$ and $\theta + d\theta$; its sides have lengths $dr$ and $r\,d\theta$.]

Because dr and dθ are supposed to be infinitesimally small, one can see that the area dA will be approximated very well by a rectangle of side lengths dr and r dθ. Thus,

$$dA = r\,dr\,d\theta.$$

This is the infinitesimal area element in polar coordinates! With it, we can now check that the area enclosed by the unit circle is indeed

$$\int_{\theta=0}^{2\pi} \int_{r=0}^{1} r\,dr\,d\theta = 2\pi \left[ \frac{r^2}{2} \right]_{r=0}^{r=1} = \pi,$$
as it must be. Let's see some less trivial examples of polar coordinates at work.
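Before moving on, two ingredients of this section can be illustrated in a few lines of Python. First, the caveat about $\tan^{-1}$: the standard library function `math.atan2(y, x)` takes the signs of both coordinates into account, so reducing its result modulo $2\pi$ gives an angle in $[0, 2\pi)$ directly. Second, the exact area of the small region between radii $r$ and $r + dr$ and angles $\theta$ and $\theta + d\theta$ is $\frac{1}{2}\,d\theta\,\big((r+dr)^2 - r^2\big) = r\,dr\,d\theta + \frac{1}{2}(dr)^2\,d\theta$, so its ratio to $r\,dr\,d\theta$ tends to $1$ as $dr \to 0$. The helper names `to_polar` and `polar_cell_area` are our own.

```python
import math


def to_polar(x, y):
    """Return (r, theta) with theta in [0, 2*pi), using atan2 to get the quadrant right."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta


def polar_cell_area(r, dr, dtheta):
    """Exact area of the region r <= radius <= r + dr spanning an angle dtheta."""
    return dtheta * ((r + dr) ** 2 - r**2) / 2


# tan^{-1}(y/x) alone would put (-1, -1) at angle pi/4 (the wrong quadrant);
# atan2 returns -3*pi/4, which reduces to 5*pi/4 in [0, 2*pi).
r, theta = to_polar(-1.0, -1.0)
naive_theta = math.atan(-1.0 / -1.0)
print(r, theta, naive_theta)

# Ratio of the exact cell area to r dr dtheta at r = 1 for shrinking dr.
for dr in (0.1, 0.01, 0.001):
    ratio = polar_cell_area(1.0, dr, 0.01) / (1.0 * dr * 0.01)
    print(dr, ratio)
```

For $r = 1$ the printed ratios are $1 + dr/2$ up to rounding, so the correction term does become negligible.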

Example 14.9. Use polar coordinates to find the area of that part of the disc $x^2 + y^2 \le 4$ for which $x \ge 1$.

Solution. We first convert the equations of the boundary curves into polar coordinates:

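$$x^2 + y^2 = 4 \implies r = 2 \quad\text{and}\quad x = 1 \implies r = \sec\theta.$$
Their intersection points therefore occur when $\sec\theta = 2$, meaning $\cos\theta = \frac{1}{2}$, which gives $\theta = \pm\pi/3$. We next draw a picture:

[Figure: the circle $x^2 + y^2 = 4$ (that is, $r = 2$) and the line $x = 1$ (that is, $r = \sec\theta$), meeting at $\theta = \pm\pi/3$; the required region lies between them.]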

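Since $\theta$ runs from $-\pi/3$ to $\pi/3$ and $r$ runs from $\sec\theta$ to $2$, the area of the required region is
$$\text{Area} = \int_{-\pi/3}^{\pi/3} \int_{\sec\theta}^{2} 1 \cdot r\,dr\,d\theta = \int_{-\pi/3}^{\pi/3} \left[ \frac{r^2}{2} \right]_{r=\sec\theta}^{r=2} d\theta = \int_{-\pi/3}^{\pi/3} \left( 2 - \frac{\sec^2\theta}{2} \right) d\theta$$
$$= \left[ 2\theta - \frac{\tan\theta}{2} \right]_{-\pi/3}^{\pi/3} = \frac{4\pi}{3} - \tan\frac{\pi}{3} = \frac{4\pi}{3} - \sqrt{3}.$$

Example 14.10. Compute the integral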

$$\iint_R y\,dA,$$ where $R$ is the region bounded between the $x$-axis and the cardioid $r = 1 + \cos\theta$. Solution. The cardioid curve looks as follows:

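It follows that we should integrate $\theta$ from $0$ to $\pi$ and $r$ from $0$ to $1 + \cos\theta$:
$$\iint_R y\,dA = \int_0^\pi \int_0^{1+\cos\theta} r\sin\theta \cdot r\,dr\,d\theta = \int_0^\pi \left[ \frac{r^3 \sin\theta}{3} \right]_{r=0}^{r=1+\cos\theta} d\theta$$
$$= \int_0^\pi \frac{(1+\cos\theta)^3 \sin\theta}{3}\,d\theta = \int_{u=2}^{u=0} -\frac{u^3}{3}\,du = \left[ \frac{u^4}{12} \right]_0^2 = \frac{4}{3}.$$
Here, we have used the substitution $u = 1 + \cos\theta$.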
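Polar integrals can be checked with the same hypothetical midpoint helper, provided the extra factor $r$ from $dA = r\,dr\,d\theta$ is included in the integrand. The sketch below treats $\theta$ as the outer variable and $r$ as the inner one, and also recomputes the area of Example 14.9 in Cartesian coordinates for comparison; the cell count is arbitrary.

```python
import math


def iterated_midpoint(f, outer, inner, n=400):
    """Midpoint rule for an iterated integral: outer = (a, b) constant limits,
    inner(t) -> (lo, hi) limits of the inner variable, f(t, s) the integrand."""
    a, b = outer
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        lo, hi = inner(t)
        k = (hi - lo) / n
        total += h * k * sum(f(t, lo + (j + 0.5) * k) for j in range(n))
    return total


# Example 14.9 in polar form: outer theta in [-pi/3, pi/3], inner r in [sec(theta), 2],
# integrand 1 * r (the factor r comes from dA = r dr dtheta).
area_polar = iterated_midpoint(
    lambda t, r: r, (-math.pi / 3, math.pi / 3), lambda t: (1 / math.cos(t), 2.0)
)

# The same area in Cartesian form: outer x in [1, 2], inner y in [-sqrt(4 - x^2), sqrt(4 - x^2)].
area_cartesian = iterated_midpoint(
    lambda x, y: 1.0, (1.0, 2.0), lambda x: (-math.sqrt(4 - x**2), math.sqrt(4 - x**2))
)

# Example 14.10: outer theta in [0, pi], inner r in [0, 1 + cos(theta)],
# integrand y * r = (r sin(theta)) * r.
moment_1410 = iterated_midpoint(
    lambda t, r: r * math.sin(t) * r, (0.0, math.pi), lambda t: (0.0, 1 + math.cos(t))
)

print(area_polar, area_cartesian, 4 * math.pi / 3 - math.sqrt(3))
print(moment_1410, 4 / 3)
```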

Finally, we conclude with an integral that is quite significant for probability and statistics because it shows you how to correctly normalise a normal distribution.

Example 14.11. Demonstrate that

$$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$$ by evaluating the square of this integral using polar coordinates.

Solution. This is a sneaky trick. Let $I$ denote the value of the required integral. As this value does not depend on $x$, we may replace $x$ by $y$ without harm. We do this to one of the factors in $I^2$:
$$I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.$$
Converting now to polar coordinates, we discover that the extra $r$ in the infinitesimal area element magically leads to an evaluation upon substituting $u = r^2/2$:

$$I^2 = \int_0^{2\pi} \int_0^\infty e^{-r^2/2} \cdot r\,dr\,d\theta = \int_0^{2\pi} \left( \int_0^\infty e^{-u}\,du \right) d\theta = \int_0^{2\pi} 1\,d\theta = 2\pi.$$
The required integral is now obtained by taking the square root. We know to take the positive square root because $e^{-x^2/2}$ is a strictly positive function. One can conclude from this computation that the standard normal distribution (with mean $0$ and standard deviation $1$) is correctly normalised by taking

$$N_{0,1}(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
From this, it is a simple matter to generalise to normal distributions of mean $\mu \in \mathbb{R}$ and standard deviation $\sigma > 0$ as follows:

$$N_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}.$$
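A numerical sketch of Example 14.11 and of the normalisation of $N_{\mu,\sigma}$ follows. It truncates the infinite ranges at 12 standard deviations (an assumption: the neglected tails are of order $e^{-72}$, far below double-precision rounding) and uses a simple midpoint sum; the values $\mu = 3$ and $\sigma = 2.5$ and the helper names are arbitrary choices of ours.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """N_{mu,sigma}(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def midpoint(f, a, b, n=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The Gaussian integral I, truncated to [-12, 12].
I = midpoint(lambda x: math.exp(-x**2 / 2), -12.0, 12.0)

# I^2 via polar coordinates: 2*pi times the radial integral of exp(-r^2/2) * r.
I_squared_polar = 2 * math.pi * midpoint(lambda r: math.exp(-r**2 / 2) * r, 0.0, 12.0)

# Total probability of N_{mu,sigma} for an arbitrary mean and standard deviation.
mu, sigma = 3.0, 2.5
total_mass = midpoint(lambda x: normal_pdf(x, mu, sigma), mu - 12 * sigma, mu + 12 * sigma)

print(I, math.sqrt(2 * math.pi))
print(I_squared_polar, 2 * math.pi)
print(total_mass)
```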
