
Lecture 27

Indeterminate forms

(Relevant section from Stewart, Seventh Edition: Section 4.4)

Many of you are familiar with the idea of “indeterminate forms” as well as “l’Hôpital’s Rule”. If you are not, there is no problem, since we shall develop the subject from first principles.

You will recall that we showed the following result,

    lim_{x→0} (sin x)/x = 1,    (1)

by means of a geometric argument. But let’s step back a bit and pretend that we don’t know this result. If we encounter the problem

    lim_{x→0} (sin x)/x    (2)

for the first time, we see that there is indeed a problem: the numerator sin x tends to 0 as x → 0, as does the denominator x. Had the denominator been x + 1, we could have easily concluded, from the limit law for quotients, that

    lim_{x→0} (sin x)/(x + 1) = [lim_{x→0} sin x]/[lim_{x→0} (x + 1)] = 0.    (3)

This is possible because the limit of the denominator is not 0. But what happens if the limit of the denominator is zero? As you should know by now, the answer is, “It depends.” For example, from Eq. (1) and a change of variable, one can show that

    lim_{x→0} (sin 5x)/x = 5.    (4)

In this case again, the limits of the numerator and denominator are both 0, but we obtain a different result. The natural question is: What happens in general when we are faced with the following problem,

    lim_{x→a} f(x)/g(x),    (5)

when

    lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0.    (6)

Such a problem is known as an indeterminate form of type 0/0. It is “indeterminate” because we just can’t conclude, on the basis of f(x) and g(x) alone, what the limit is, or even if it exists.

You have also encountered another indeterminate form, namely ∞/∞. For example, consider the limit problem,

    lim_{x→∞} (x² + 5)/(2x² + 4).    (7)

Both numerator and denominator approach ∞ as x → ∞. You are already aware of a method to determine this limit: divide out the highest power of x in both the numerator and denominator:

    lim_{x→∞} (x² + 5)/(2x² + 4) = lim_{x→∞} (1 + 5/x²)/(2 + 4/x²) = [lim_{x→∞} (1 + 5/x²)]/[lim_{x→∞} (2 + 4/x²)] = 1/2.    (8)

A “calculus-based” approach does exist to treat indeterminate forms – it is known as “l’Hôpital’s Rule”:

L’Hôpital’s Rule: Assume that f and g are differentiable on an open interval that contains the point a ∈ R (except possibly at a). Also suppose that either

1. lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, or

2. lim_{x→a} f(x) = ±∞ and lim_{x→a} g(x) = ±∞.

Then

    lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x),    (9)

if the limit on the RHS exists (it can be ∞ or −∞).

Example 1: Let’s return to the first example considered in this lecture, i.e.,

    lim_{x→0} (sin x)/x.    (10)

Here

    f(x) = sin x, g(x) = x,    (11)

so that

    f′(x) = cos x, g′(x) = 1.    (12)

We already know the following, but let’s repeat it here, to show that it satisfies the requirements of l’Hôpital’s Rule:

    lim_{x→0} f(x) = 0, lim_{x→0} g(x) = 0.    (13)

Now note that

    lim_{x→0} f′(x)/g′(x) = lim_{x→0} (cos x)/1 = 1.    (14)

The limit exists, so we can conclude, by l’Hôpital’s Rule, that

    lim_{x→0} (sin x)/x = 1.    (15)
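Although the Rule settles this limit analytically, it is easy to corroborate numerically. A quick sanity check (a sketch in Python, with the sample x-values chosen arbitrarily):

```python
import math

# Tabulate sin(x)/x for x values shrinking toward 0.
# The quotient should approach the limit 1.
for x in [0.5, 0.1, 0.01, 0.001]:
    print(f"x = {x:>6}: sin(x)/x = {math.sin(x) / x:.9f}")
```

By x = 0.001 the quotient agrees with 1 to roughly six decimal digits (the error behaves like x²/6).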

Example 2: Before proving a simple version of l’Hôpital’s Rule, we return to the second example mentioned earlier,

    lim_{x→∞} (x² + 5)/(2x² + 4).    (16)

(We know this limit to be 1/2.) Here,

    f(x) = x² + 5, g(x) = 2x² + 4,    (17)

so that

    f′(x) = 2x, g′(x) = 4x.    (18)

Then

    lim_{x→∞} f′(x)/g′(x) = lim_{x→∞} 2x/4x = lim_{x→∞} 2/4 = 1/2,    (19)

in agreement with our earlier result using another method.

Note: If necessary, l’Hôpital’s Rule can be applied more than once. In fact, in the above example, if we decided that we didn’t want to divide out the x’s from the ratio 2x/4x, we could have considered the numerator 2x and the denominator 4x as new functions to which the Rule could be applied. Differentiating again, we obtain the ratio 2/4, which yields our limit 1/2.

Finally, we mention that l’Hôpital’s Rule is also applicable to one-sided limits, i.e., x → a⁺ and x → a⁻.

Proof of l’Hôpital’s Rule for a special case

In what follows, we provide a proof of l’Hôpital’s Rule for a special case – the simplest or “nicest” case, which is very often encountered in applications. We assume that the functions f′(x) and g′(x) are continuous in an interval containing the limit point a, that f(a) = g(a) = 0, and that g′(a) ≠ 0. From the continuity of f′(x) and g′(x), it follows that

    lim_{x→a} f′(x)/g′(x) = f′(a)/g′(a).    (20)

But the quotient on the right may also be expressed as follows, by definition of the derivative,

    f′(a)/g′(a) = [lim_{x→a} (f(x) − f(a))/(x − a)] / [lim_{x→a} (g(x) − g(a))/(x − a)] = lim_{x→a} (f(x) − f(a))/(g(x) − g(a)) = lim_{x→a} (f(x) − 0)/(g(x) − 0).    (21)

It therefore follows that

    lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x).    (22)

In the case that f(a) or g(a) (or both) is not defined, but lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, a slightly more complicated proof is required. It can be found in Appendix F of the text by Stewart.

We now consider some additional examples of the use of l’Hôpital’s Rule as well as other indeterminate forms.

Example 3: lim_{x→∞} x e^{−x}.

This doesn’t look like an indeterminate form of type 0/0 or ∞/∞. The term x goes to ∞ and e^{−x} goes to zero, so it is actually the indeterminate form “0 · ∞”, which we’ll discuss a little later. We can turn the above into an ∞/∞ indeterminate form by rewriting it as

    lim_{x→∞} x/e^x.    (23)

Here, f(x) = x and g(x) = e^x. This is now an ∞/∞ indeterminate form: as x → ∞, f(x) → ∞ and g(x) → ∞. Since f′(x) = 1 and g′(x) = e^x, we have that

    lim_{x→∞} f′(x)/g′(x) = lim_{x→∞} 1/e^x = 0,    (24)

so it follows that

    lim_{x→∞} x/e^x = 0.    (25)

Example 4: lim_{x→∞} x²/e^x.

Here, f(x) = x² and g(x) = e^x. Once again, we have an ∞/∞ indeterminate form: as x → ∞, f(x) → ∞ and g(x) → ∞. Since f′(x) = 2x and g′(x) = e^x, we have that

    lim_{x→∞} f′(x)/g′(x) = lim_{x→∞} 2x/e^x = 0,    (26)

where we have used the result of Example 3. (If we didn’t have this result at hand, we would simply apply l’Hôpital’s Rule one more time.) Therefore

    lim_{x→∞} x²/e^x = 0.    (27)

Note: From Examples 3 and 4 above, we should be able to see that lim_{x→∞} x^n/e^x = 0, for n = 1, 2, 3, ⋯. Once again, if we didn’t have these examples at hand, we would, for any given n, apply l’Hôpital’s Rule n times. This result implies that e^x grows faster than any power of x as x → ∞.
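This domination is easy to see numerically. A small sketch (the sample power n = 3 and the x-values are arbitrary choices):

```python
import math

# The ratio x^n / e^x collapses toward 0 as x grows, for any fixed power n:
# the exponential eventually dominates.
def ratio(n: int, x: float) -> float:
    return x**n / math.exp(x)

for x in [10.0, 50.0, 100.0]:
    print(f"x = {x:>5}: x^3/e^x = {ratio(3, x):.3e}")
```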

Other indeterminate forms

There are other indeterminate forms, for example,

    0 · ∞, ∞ − ∞, 0⁰, ∞⁰, 1^∞.    (28)

As with the 0/0 and ∞/∞ indeterminate forms, we must investigate each case separately. For example, in response to a question by a student, it does not necessarily follow that 0 · ∞ is ∞. Consider the following cases, in all of which

    lim_{x→∞} f(x) = 0 and lim_{x→∞} g(x) = ∞.    (29)

1. f(x) = 1/x, g(x) = x. Then lim_{x→∞} f(x)g(x) = lim_{x→∞} 1 = 1.

2. f(x) = 1/x², g(x) = x. Then lim_{x→∞} f(x)g(x) = lim_{x→∞} 1/x = 0.

3. f(x) = 1/x, g(x) = x². Then lim_{x→∞} f(x)g(x) = lim_{x→∞} x = ∞.

As Stewart writes in his textbook, for the indeterminate forms in (28), for example ∞ − ∞, the question is “Which one wins?” The answer is, “We have to treat each case separately.”

Example 5: lim_{x→0⁺} x ln x.

(We must consider the right-sided limit because ln x is undefined for x ≤ 0.) This is a 0 · ∞ indeterminate form. In an effort to use l’Hôpital’s Rule, we rewrite x ln x as a quotient:

    x ln x = (ln x)/x⁻¹ = f(x)/g(x).    (30)

Since f(x) = ln x → −∞ and g(x) = x⁻¹ → ∞ as x → 0⁺, this quotient is an ∞/∞ indeterminate form. We now try to apply l’Hôpital’s Rule,

    f′(x)/g′(x) = x⁻¹/(−x⁻²) = −x,    (31)

so we have

    lim_{x→0⁺} f′(x)/g′(x) = lim_{x→0⁺} (−x) = 0.    (32)

It follows that

    lim_{x→0⁺} x ln x = 0.    (33)
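A quick numerical look at the product (a sketch; the sample points are arbitrary) shows x overpowering the logarithm:

```python
import math

# x*ln(x) as x -> 0+: ln(x) -> -infinity, but the factor x wins,
# and the product tends to 0, as l'Hopital's Rule predicts.
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(f"x = {x:>7}: x*ln(x) = {x * math.log(x):.6f}")
```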

There remains the question: Could we have used the other possible quotient, i.e.,

    x ln x = x/(1/ln x) = f(x)/g(x)?    (34)

Unfortunately, the derivative of the denominator gets quite complicated:

    f′(x) = 1, g′(x) = −1/(x(ln x)²),    (35)

so that

    f′(x)/g′(x) = −x(ln x)².    (36)

This problem is more complicated than the original one, suggesting that this is not the way to proceed.

Another note on Example 5: You may recall that we encountered this limit problem in a previous lecture – the lecture on logarithmic differentiation. The problem was to compute the derivative of the function f(x) = x^x. To do so, we took logarithms:

    y = x^x ⇒ ln y = x ln x ⇒ (1/y)(dy/dx) = 1 + ln x ⇒ dy/dx = y(1 + ln x) = x^x(1 + ln x).    (37)

But to get an idea of the graph of x^x, it was necessary to determine its behaviour as x → 0⁺. Since y = x^x, it follows that

    ln y = x ln x,    (38)

so that

    lim_{x→0⁺} ln y = lim_{x→0⁺} x ln x.    (39)

At that time, we simply stated that the limit on the RHS was 0, with the understanding that we would prove it later in the course, which is what we have done in Example 5. From this result, it follows that

    lim_{x→0⁺} ln y = 0.    (40)

We now rewrite the LHS as follows,

    ln( lim_{x→0⁺} y ) = 0.    (41)

This follows from the continuity of the ln function. Since ln 1 = 0, we can conclude that

    lim_{x→0⁺} y = lim_{x→0⁺} x^x = 1.    (42)
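The conclusion x^x → 1 can also be observed directly (a small numerical sketch):

```python
# x^x = exp(x ln x) -> exp(0) = 1 as x -> 0+.
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(f"x = {x:>7}: x^x = {x**x:.6f}")
```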

Example 6: lim_{x→∞} (√(x² + x) − x).

This is an ∞ − ∞ indeterminate form. Once again, we try to express this function in terms of a quotient. We might try to remove the square root in the usual way:

    √(x² + x) − x = (√(x² + x) − x) · (√(x² + x) + x)/(√(x² + x) + x) = ((x² + x) − x²)/(√(x² + x) + x) = x/(√(x² + x) + x).    (43)

Then

    lim_{x→∞} (√(x² + x) − x) = lim_{x→∞} x/(√(x² + x) + x) = lim_{x→∞} 1/(√(1 + 1/x) + 1) = 1/2.    (44)

This result might be a little surprising. As x → ∞, both √(x² + x) and x go to infinity, yet their difference approaches a finite number, and a small one. In MATH 138, you will learn another method

to obtain this limit, with the help of Taylor series. For the moment, consider two particular numerical values of the function in this example, i.e.,

    f(x) = √(x² + x) − x.    (45)

We find that

    f(100) ≈ 0.498756, f(1000) ≈ 0.499875.    (46)

Even at x = 1000, which is far from ∞, f(1000) is quite close to the limiting value of 0.5. In MATH 138, with the help of Taylor series, you’ll be able to estimate the difference between f(x) and the limiting value 0.5.
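The two numerical values quoted in (46) are easy to reproduce (a small sketch, with x = 10000 added as an extra arbitrary sample):

```python
import math

# f(x) = sqrt(x^2 + x) - x creeps up toward the limiting value 1/2.
def f(x: float) -> float:
    return math.sqrt(x*x + x) - x

for x in [100.0, 1000.0, 10000.0]:
    print(f"f({x:>7.0f}) = {f(x):.6f}")
```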

Example 7: We return to the basic limit for e: lim_{x→∞} (1 + 1/x)^x = e.

We derived this limit in a previous lecture. It corresponds to a 1^∞ indeterminate form. We now consider a slight modification of the above indeterminate form,

    lim_{x→∞} (1 + a/x)^x.    (47)

We’ll try to express the above limit in terms of the known limit for e as follows: Let

    a/x = 1/y ⇒ x = ay,    (48)

so that

    (1 + a/x)^x = (1 + 1/y)^{ay} = [(1 + 1/y)^y]^a.    (49)

Then

    lim_{x→∞} (1 + a/x)^x = lim_{y→∞} [(1 + 1/y)^y]^a = [lim_{y→∞} (1 + 1/y)^y]^a = e^a.    (50)
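The limit (50) can be checked numerically for a sample value of a (the choice a = 3 is arbitrary):

```python
import math

# (1 + a/x)^x should approach e^a as x grows.
a = 3.0
for x in [1e2, 1e4, 1e6]:
    print(f"x = {x:.0e}: (1 + a/x)^x = {(1 + a/x)**x:.6f}")
print(f"e^a           = {math.exp(a):.6f}")
```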

Lecture 28

Optimization problems

(This lecture was presented using a slide presentation of the lecture notes below.)

(Relevant section from Stewart, Seventh Edition: 4.7)

We now enter – albeit briefly – the territory of “Applied Calculus,” the use of Calculus to solve some “practical” problems. These problems involve the maximizing or minimizing of something, i.e., a quantity “Q”, for example, a distance, a time, an area, a volume, or perhaps the cost of constructing something (which, in turn, may depend upon its area or volume). In real life, the quantities Q that one seeks to minimize or maximize will involve many variables. This will be the subject of a future calculus course on functions of several variables (e.g., MATH 237 or MATH 227). Since this course deals with functions of a single real variable, we’ll examine some problems that can be transformed into such functions. Indeed, this is often the stumbling block – how to transform a problem with a number of components into one with a single variable.

It is recommended that you read Section 4.7 of Stewart’s textbook thoroughly for a good understanding of this topic. It begins with a helpful outline of the strategy behind solving optimization problems. A number of illustrative problems are then solved. In this lecture, we have time only to consider a few problems, starting with a simple one and then working our way up to a very interesting problem from Physics: the refraction of light.

Example 1: Find the area of the largest rectangle that can be inscribed in a semicircle of radius R.

The first step is to sketch a picture of the situation, if only to get a feel for the problem. The next step is to introduce a quantity that could serve as the unifying variable in the problem. In this case, we may wish to consider the x-coordinate of the side of the rectangle, as indicated in the figure below. The domain of this x variable will be [0, R]. By symmetry, the rectangle will extend from −x to x. As such, its base length b is

    b = 2x.    (51)

[Figure: rectangle of base b = 2x and height h inscribed under the semicircle y = √(R² − x²), −R ≤ x ≤ R.]

The height h of the rectangle will then be determined by the semicircle:

    h = √(R² − x²).    (52)

The area of the rectangle now becomes a function of x:

    A = bh = 2x√(R² − x²) = A(x).    (53)

The goal is now to maximize A(x) over the interval [0, R]. Note that

    A(0) = 0, A(R) = 0,    (54)

as expected from the sketch: when x = 0, the rectangle becomes a vertical line from (0, 0) to (0, R). When x = R, the rectangle becomes the horizontal line y = 0, −R ≤ x ≤ R. We look for critical points of A(x):

    A′(x) = 2√(R² − x²) + 2x · (−2x)/(2√(R² − x²))
          = 2√(R² − x²) − 2x²/√(R² − x²)
          = 2[(R² − x²) − x²]/√(R² − x²)
          = 2(R² − 2x²)/√(R² − x²).    (55)

We see that

    A′(x) = 0 ⇒ 2x² = R² ⇒ x² = R²/2 ⇒ x = R/√2.    (56)

Evaluating A at this critical point, we see that

    A(R/√2) = 2 · (R/√2) · √(R² − R²/2) = 2 · (R/√2) · (R/√2) = R².    (57)

Since this value is greater than the value of A(x) at the endpoints, i.e., A(0) = A(R) = 0, we conclude that this is the maximum value of A(x) on [0, R].

That being said, let’s just step back and examine another method to ascertain that, in fact, the above point corresponds to a local maximum and perhaps even an absolute maximum. To see that x = R/√2 is a local maximum, let’s examine the derivative A′(x) on both sides of this point. If we can show that

    A′(x) > 0 for x < R/√2 and A′(x) < 0 for x > R/√2,    (58)

then x = R/√2 is a local maximum. The above implies that A(x) is increasing until we get to the critical point, and then decreasing as we move away from it to the right. To show the above inequality, let us rewrite the derivative A′(x) slightly as follows,

    A′(x) = [4/√(R² − x²)] (R²/2 − x²).    (59)

We see that

    x² < R²/2 ⇒ A′(x) > 0 and x² > R²/2 ⇒ A′(x) < 0,    (60)

which agrees with the inequalities in (58). Therefore the critical point x = R/√2 is a local maximum. In fact, since the inequalities hold for all appropriate x-values in the interval [0, R], we can conclude that the critical point is an absolute maximum on the interval. (This idea is discussed in Stewart’s textbook, Section 4.7, p. 324, as the “First Derivative Test for Absolute Extrema”.)

In Eq. (59), we see that the derivative A′(x) → −∞ as x → R⁻. This suggests that the graph of A(x) exhibits a cusp-like nature at x = R. This is confirmed by the following plot for the case R = 1, courtesy of MAPLE.

An alternate method: Instead of using the Cartesian coordinate x, we can introduce an angle variable θ as shown below. The base length b and height l of the rectangle are now given in terms of θ:

    b = 2R cos θ, l = R sin θ, 0 ≤ θ ≤ π/2,    (61)

so that the area A is now a function of θ,

    A = bl = (2R cos θ)(R sin θ) = 2R² cos θ sin θ = A(θ).    (62)

[Plot: A(x) = 2x√(1 − x²) on [0, 1] for the case R = 1; the graph rises from 0 to its maximum 1 at x = 1/√2, then drops steeply, with a vertical tangent at x = 1.]

[Figure: the inscribed rectangle described by the angle θ, with base b = 2R cos θ and height R sin θ under the semicircle y = √(R² − x²).]

We now seek to maximize A(θ). One could compute the θ-derivative of the above expression to do this. But recalling the double-angle formula for sin 2θ, we may rewrite A(θ) as

    A(θ) = R² sin 2θ.    (63)

Over the interval θ ∈ [0, π/2], the sin 2θ function increases to 1 at θ = π/4 and then decreases to 0 at π/2. So we could simply write down that θ = π/4 yields the maximum value of A(θ). But just to be complete, let’s compute the derivative,

    A′(θ) = 2R² cos 2θ.    (64)

We see that

    A′(θ) = 0 when cos 2θ = 0 ⇒ θ = π/4.    (65)

Thus, θ = π/4 is the only critical point on [0, π/2]. At this value of θ,

    A(π/4) = R² sin(π/2) = R²,    (66)

as before.

Note that the critical point θ = π/4 found above corresponds to the value x = R/√2, which is in agreement with the previous method. This is good! It is often possible to solve a problem using several methods – if they are correct, they should yield the same result! That being said, note that the graphs of A(x) and A(θ) are quite different. The former was plotted above, and we saw that A′(x) → −∞ as x → R⁻. This comes from the derivative of the square root function √(R² − x²) which, in turn, arises from the use of the Cartesian variable. On the other hand, the function A(θ) = R² sin 2θ is a simple one-half sine wave over [0, π/2]. The square root function from the Cartesian representation is replaced by a trigonometric function. It is often – but not always! – easier to work with angles.
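As a closing numerical check on Example 1 (a sketch for the case R = 1; the grid resolution is an arbitrary choice), the analytic maximum A(R/√2) = R² can be compared with a brute-force search over [0, R]:

```python
import math

# A(x) = 2x*sqrt(R^2 - x^2): area of the inscribed rectangle, here R = 1.
R = 1.0

def A(x: float) -> float:
    return 2.0 * x * math.sqrt(R*R - x*x)

x_crit = R / math.sqrt(2.0)
grid_max = max(A(k * R / 100000) for k in range(100001))
print(f"A(R/sqrt(2)) = {A(x_crit):.9f}")   # analytic maximum, R^2 = 1
print(f"grid maximum = {grid_max:.9f}")
```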

Example 2: We now consider a three-dimensional version of the above two-dimensional problem: find the cylinder of largest volume that can be inscribed in a hemisphere of radius R. The situation is sketched below.

[Figure: a cylinder of base radius r inscribed in a hemisphere of radius R.]

For simplicity, however, we consider a cross-sectional view of this problem, which basically gives us the same type of picture as in Example 1. In this problem, however, we seek to maximize the volume of the inscribed cylinder, which will be given by

    V = (area of circular base) × (height) = πr²h.    (67)

We’ll introduce the angle θ again, so that the base radius and height of the cylinder are given by

    r = R cos θ, h = R sin θ.    (68)

[Figure: cross-section of the inscribed cylinder, with base radius r = R cos θ and height h = R sin θ under the semicircle y = √(R² − x²).]

The volume V, as a function of θ, becomes

    V(θ) = πR³ cos² θ sin θ, 0 ≤ θ ≤ π/2.    (69)

Let’s just step back for a moment and check whether the dimensionality of this result is correct: the trigonometric terms are dimensionless, so the dimensionality of the RHS comes from the term R³, which is length³, written as “L³”. This is the correct dimension for a volume. Also note that V(0) = V(π/2) = 0. Since we are looking for the maximum value of V, which will be attained on the interval (0, π/2), we search for critical points on that interval. Just to reduce some work, let’s write the volume function as

    V(θ) = πR³ f(θ), where f(θ) = cos² θ sin θ.    (70)

We now look for critical points of f(θ) by computing f ′(θ). There are several ways to proceed in the computation of f ′(θ). We could simply differentiate the above expression, i.e.,

    f′(θ) = −2 cos θ sin² θ + cos³ θ = cos θ(−2 sin² θ + cos² θ).    (71)

Since it is the term in brackets that will determine our critical point, we could convert the sin function into a cos function or vice versa. If we use cos² θ = 1 − sin² θ, we obtain

    f′(θ) = cos θ(1 − 3 sin² θ).    (72)

We see that f′(θ) = 0 if either cos θ = 0 or the term in brackets vanishes. But cos θ = 0 implies that θ = π/2, for which the volume V = 0, which will not be a maximum volume. The term in brackets vanishes if

    sin² θ = 1/3.    (73)

Before going on, let’s examine another way to obtain this result. We go back to Eq. (70) for the definition of f(θ) and rewrite it as follows,

    f(θ) = (1 − sin² θ) sin θ = sin θ − sin³ θ.    (74)

Differentiating,

    f′(θ) = cos θ − 3 sin² θ cos θ = cos θ(1 − 3 sin² θ),    (75)

which is in agreement with Eq. (72). It is not necessary to solve for θ in Eq. (73). All we need to do is to find the values of sin θ and cos θ from this equation, which is easy:

    sin θ = √(1/3) = 1/√3, cos² θ = 1 − sin² θ = 1 − 1/3 = 2/3.    (76)

We then substitute these values into Eq. (70),

    V = πR³ · (2/3) · (1/√3) = [2/(3√3)] πR³.    (77)

Clearly, this value of V is greater than the values V = 0 at the endpoints of the interval, so we can comfortably conclude that this is the maximum cylinder volume. That being said, we could once again investigate how the first derivative changes at this critical point. We’ll express f ′(θ) as follows,

    f′(θ) = 3 cos θ (1/3 − sin² θ).    (78)

Then it can be seen that

    sin² θ < 1/3 ⇒ f′(θ) > 0, and sin² θ > 1/3 ⇒ f′(θ) < 0.    (79)

Once again, the critical point is a local maximum - in fact, it is an absolute maximum.

The result in Eq. (77) is our desired result, but perhaps we can make it more meaningful if we relate it to the volume of the hemisphere in which the cylinder is embedded. You may or may not know (it doesn’t matter – you’ll derive this formula in MATH 138) that the volume of a sphere of radius R is

    V_sphere = (4/3) πR³,    (80)

implying that the volume of the hemisphere is

    V_hemi = (2/3) πR³.    (81)

Let us now rewrite the maximum cylinder volume in terms of V_hemi:

    [2/(3√3)] πR³ = (1/√3) · (2/3) πR³ = (1/√3) V_hemi ≈ 0.577 V_hemi.    (82)

This gives us a better idea of the size of the cylinder.
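A numerical check of Example 2 (a sketch, again for R = 1 and with an arbitrarily chosen grid):

```python
import math

# V(theta) = pi*R^3*cos^2(theta)*sin(theta): volume of the inscribed cylinder.
R = 1.0

def V(theta: float) -> float:
    return math.pi * R**3 * math.cos(theta)**2 * math.sin(theta)

grid_max = max(V(k * (math.pi/2) / 100000) for k in range(100001))
v_max = 2.0 * math.pi * R**3 / (3.0 * math.sqrt(3.0))
v_hemi = 2.0 * math.pi * R**3 / 3.0
print(f"grid maximum         = {grid_max:.9f}")
print(f"2*pi*R^3/(3*sqrt(3)) = {v_max:.9f}")
print(f"fraction of V_hemi   = {v_max / v_hemi:.6f}")   # about 0.577
```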

Example 3: Snell’s Law of Refraction

We now consider the very important phenomenon of refraction – the “bending” of light rays as light moves from one medium to another, which is due to the difference in the velocity of light in the two media. (This is presented as Problem No. 63 on Page 331 of the text by Stewart.)

Let v1 be the velocity of light in air and v2 the velocity of light in water. It is a fact that v1 > v2, i.e., light travels faster in air than in water. Now suppose that light rays are emitted from A in all directions. Ideally, there is only one ray of light that travels from A and hits point B – during its travel, it is “bent downward” (toward the normal) as it passes from air to water. This phenomenon is sketched below.

[Figure: a ray from A in air meets the water surface at C, making angle θ1, and is refracted toward B in water, making angle θ2.]

The basic physical principle behind this “bending” or refraction is known as Fermat’s principle: The path ACB taken by the ray is the one that minimizes the time required to travel from A to B. (Note that it is the time that is minimized and not the distance. If it were the latter, then the path would be the straight line AB.)

Our goal is to show that this minimizing path implies that the angles θ1 and θ2 are related as follows,

    sin θ1 / sin θ2 = v1 / v2.    (83)

This is Snell’s Law of Refraction.

To prove Snell’s Law, we shall let x denote the position of point C on the x-axis, which is defined by the horizontal water surface. Furthermore, introduce points O and P on the x-axis as shown below.

[Figure: A lies at height a above the surface and B at depth b below it; the ray travels a distance d1 from A to C = (x, 0) in air and a distance d2 from C to B in water, with O the origin and P = (L, 0).]

We shall let O define the origin of a coordinate system and let L denote the length of the line OP. In this way, 0 ≤ x ≤ L. Also let a and b denote the distances from, respectively, A and B to the air-water interface, as shown in the diagram. We now wish to find the value of x that minimizes the time T taken for the ray to travel from A to B. For convenience, let d1 = |AC| and d2 = |CB| denote the distances travelled by the ray in, respectively, air and water. Then

    T = t1 + t2,    (84)

where

    t1 = d1/v1 (time taken for light ray to travel from A to C),    (85)

    t2 = d2/v2 (time taken for light ray to travel from C to B).    (86)

In terms of x, the distances d1 and d2 become

    d1 = √(a² + x²), d2 = √((L − x)² + b²).    (87)

Therefore, the total time T may be expressed as a function of the variable x,

    T(x) = (1/v1) √(a² + x²) + (1/v2) √((L − x)² + b²).    (88)

We now compute T′(x) in order to look for critical points,

    T′(x) = (1/v1) · x/√(a² + x²) − (1/v2) · (L − x)/√((L − x)² + b²).    (89)

But this result may easily be expressed in terms of the angles θ1 and θ2 as follows,

    T′(x) = sin θ1/v1 − sin θ2/v2.    (90)

The condition for a critical point is that T′(x) = 0. In this case, the RHS is zero and a rearrangement produces Snell’s Law in (83). Even though the above method gives the desired answer, namely Snell’s Law, we should check if the critical point corresponds to a minimum or a maximum. Differentiation of the RHS in Eq. (89) yields, after a little rearrangement,

    T″(x) = (1/v1) · a²/(a² + x²)^{3/2} + (1/v2) · b²/[(L − x)² + b²]^{3/2} > 0.    (91)

Therefore, the graph of T(x) is concave upward on [0, L], implying that the critical point is a global minimum: T(x) will decrease as x increases from 0 until it attains a minimum value at the critical point, after which it will increase until x reaches L.
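Since T(x) is convex (T″ > 0), its minimizer can be found numerically, and Snell's Law can then be verified for concrete numbers. The geometry (a, b, L) and the speeds below are made-up sample values, not from the notes:

```python
import math

a, b, L = 1.0, 1.0, 2.0     # hypothetical geometry
v1, v2 = 1.0, 0.75          # v1 > v2: light is slower in water

def T(x: float) -> float:
    # Travel time: air leg at speed v1, water leg at speed v2.
    return math.sqrt(a*a + x*x) / v1 + math.sqrt((L - x)**2 + b*b) / v2

# T is convex, so a ternary search on [0, L] homes in on the minimum.
lo, hi = 0.0, L
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if T(m1) < T(m2):
        hi = m2
    else:
        lo = m1
x_min = 0.5 * (lo + hi)

sin1 = x_min / math.sqrt(a*a + x_min*x_min)
sin2 = (L - x_min) / math.sqrt((L - x_min)**2 + b*b)
print(f"sin(theta1)/sin(theta2) = {sin1/sin2:.6f}")
print(f"v1/v2                   = {v1/v2:.6f}")
```

The two printed ratios agree, as Eq. (90) predicts at the critical point.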

A few notes on this refraction problem are in order before closing this section.

1. First of all, the angles θ1 and θ2 are normally displayed as the angles of incidence and refraction with respect to the normal to the water surface at the point x, as shown below.

[Figure: angles of incidence and refraction, θ1 and θ2, for a light ray travelling from air (speed v1) to water (speed v2), or vice versa, measured from the normal.]

2. Even more interesting is that an observer at A will perceive point B as being higher in the water than it actually is, for example, at point B′. Likewise, an observer at B will perceive point A as being higher than it actually is, for example at point A′. This is sketched below.

[Figure: apparent positions of each point from the other point’s perspective – A is perceived from B at the higher point A′, and B is perceived from A at the higher point B′.]

3. Finally, let us imagine a collection of light rays emanating from point B in all directions, and where some of these will hit the line normal to the air-water interface that passes through point A. For convenience, we’ll let this normal define the y-axis as shown below.

[Figure: light rays emanating from B in the water cross the interface and intersect the y-axis, the normal through A; the ray at the critical angle is drawn with a thicker line.]

As the light rays from B become more vertical, i.e., the angle θ2 → 0⁺, the y-coordinates of the intersection points increase to infinity. This follows from the fact that the velocity of light in

air, v1, is greater than the velocity of light in water, v2. From Snell’s Law, it follows that

    sin θ1 = (v1/v2) sin θ2, with v1/v2 > 1.    (92)

Because the function sin θ is monotonically increasing on [0, π/2], it follows that θ1 > θ2. Nevertheless, as θ2 → 0⁺, we see that θ1 → 0⁺.

Now consider the light rays that emanate from B with increasing θ2, i.e., they start out from B travelling leftward and more horizontally. Since v1/v2 > 1, implying that θ1 > θ2, there will be a critical value of θ2 – call it θ2* < π/2 – at which θ1 = π/2. This critical value is easily determined as follows,

    sin(π/2) = 1 = (v1/v2) sin θ2* ⇒ sin θ2* = v2/v1 ⇒ θ2* = arcsin(v2/v1).    (93)

The ray associated with this critical value θ2* is identified in the above figure with a thicker line. At this critical value, light rays no longer travel into the air.

As you probably know, light rays that emanate from B with θ2 values higher than this critical value θ2* are reflected back into the water. This is known as total internal reflection. However, Snell’s Law is not sufficient to explain this phenomenon – we must examine the wave picture of light to account for the reflection.

“Newton’s method” for finding the zeros of a function/roots of an equation

(Relevant section from Stewart, Seventh Edition: Section 4.8, p. 338)

This subject is treated very well in Stewart’s textbook, Section 4.8, but these lecture notes will provide some supplementary ideas and different perspectives which you may find interesting and helpful. First of all, the history of Newton’s method, also referred to as the Newton-Raphson method, is interesting: you can find a brief outline in – where else? – Wikipedia:

http://en.wikipedia.org/wiki/Newton’s method

Very briefly, it was described by Isaac Newton in a book that was written in 1669 but not published until 1711. His method, however, was different from the form used today – it was based on polynomial approximations. By the time of the publication of his work, several other people, including Joseph Raphson, had also published variations of the method.

The idea of the Newton-Raphson method is to find successive approximations xn of a zero x̄ of a function f(x), i.e., f(x̄) = 0. As n increases, it is hoped that the xn values provide better and better approximations to x̄ – in other words, that they converge to x̄, i.e.,

    lim_{n→∞} xn = x̄.    (94)

As we’ll see, these approximations xn are produced by the iteration of a function that we shall call the Newton function. There are advantages and disadvantages to this approach, however.

We begin with a generic illustration of the problem, as shown in the figure below. From the graph of the function f, we see that it has a zero x̄. Our goal is to start with a “reasonable” approximation or guess of x̄ – we’ll call it x0 – and then try to come up with a better approximation x1. Hopefully, x1 is closer to x̄ than x0 is.

Starting with our initial guess x0, we evaluate f(x0) and then construct the tangent line to the graph of f at the point (x0, f(x0)), as shown in the figure below. Recall that this is the linearization

223 of f at x0, given by the formula,

    L_{x0}(x) = f(x0) + f′(x0)(x − x0).    (95)

Of course, we must assume that f′(x0) exists – for this reason, we’ll assume that f′(x) exists for all x, or at least for all x in an interval of suitable length containing the zero x̄.

[Figure: the tangent line y = L_{x0}(x) to the graph of y = f(x) at x0 crosses the x-axis at x1, which lies closer to the zero x̄ than x0 does.]

Our next approximation x1 is taken to be the point where the linearization L_{x0}(x) intersects the x-axis, as shown in the above figure. We can easily solve for this point:

    L_{x0}(x1) = 0 ⇒ f(x0) + f′(x0)(x1 − x0) = 0.    (96)

We now solve for x1:

    x1 = x0 − f(x0)/f′(x0).    (97)

Of course, we may now continue the procedure, using x1 as our new “guess” and producing the next approximation, x2, by constructing the linearization of f at x1, as sketched below. The result is

    x2 = x1 − f(x1)/f′(x1).    (98)

We may keep repeating this procedure to produce a sequence of approximations x0, x1, x2, ⋯. In general, if we know element xn, then the next approximation is given by

    x_{n+1} = xn − f(xn)/f′(xn).    (99)

At this point it is convenient to summarize the Newton-Raphson procedure as follows: given a function f with a zero x̄, i.e., f(x̄) = 0, that we wish to approximate, we start with an initial

[Figure: a second tangent-line step, taken at x1, produces the next approximation x2, still closer to x̄.]

approximation x0 and then perform the iteration process,

    x_{n+1} = N(xn), where N(x) = x − f(x)/f′(x).    (100)

We shall refer to N(x) as the Newton function associated with the function f.

Eq. (100) represents the iteration of a function. If we consider x0 as the “input” into the

Newton function N(x), the associated “output” is x1. We then use x1 as an “input” and put it back into the Newton function to produce the “output” x2. The procedure is then repeated. We’ll return to this concept of the iteration of a function a little later.

Note the appearance of f′(x) in the denominator of the Newton function in (100). This means that we should avoid critical points in our iteration procedure. It would be desirable if no critical points exist in our interval I that contains the zero x̄. This includes the zero x̄ itself. (If f(x̄) = 0 and f′(x̄) = 0, then x̄ is a multiple zero of f, e.g., x = 2 is a double zero of f(x) = (x − 2)².)
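The iteration (100) translates almost line-for-line into code. A minimal sketch (the function name, step cap, and stopping tolerance are my own choices):

```python
def newton(f, fprime, x0, max_steps=50, tol=1e-12):
    """Iterate the Newton function N(x) = x - f(x)/f'(x) from x0."""
    x = x0
    for _ in range(max_steps):
        step = f(x) / fprime(x)   # breaks down at a critical point (f'(x) = 0)
        x = x - step
        if abs(step) < tol:       # successive iterates agree: stop early
            break
    return x

# Zero of f(x) = x^2 - 2, starting from x0 = 2.
root = newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 2.0)
print(f"{root:.9f}")   # close to sqrt(2) = 1.414213562...
```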

There are a good number of illustrative examples in Stewart’s textbook, so we’ll consider only a couple here.

Example 1: Use Newton’s method to estimate √2.

An appropriate function to use here is f(x) = x² − 2, the zeros of which are ±√2. It is sketched below. The Newton function N(x) associated with this function is

    N(x) = x − f(x)/f′(x) = x − (x² − 2)/(2x).    (101)

[Figure: graph of y = x² − 2 on [−2, 2], with zeros at ±√2.]

We could leave N(x) in this form or simplify it further as follows,

    N(x) = x − x/2 + 1/x = x/2 + 1/x.    (102)

We’ll use this final form. Newton’s method then becomes the iteration procedure

    x_{n+1} = xn/2 + 1/xn    (103)

for some starting point x0. Knowing that √2 lies between 1 and 2, let’s start with x0 = 2. Then, to nine decimal digits,

    x1 = x0/2 + 1/x0 = 1 + 1/2 = 3/2 = 1.5
    x2 = x1/2 + 1/x1 = 3/4 + 2/3 = 17/12 = 1.41666⋯
    x3 = x2/2 + 1/x2 ≈ 1.414215686
    x4 = x3/2 + 1/x3 ≈ 1.414213562
    x5 = x4/2 + 1/x4 ≈ 1.414213562.    (104)

Notice that, to nine decimal digits, there is no difference between x4 and x5, so there is no point in going further, at least if we are displaying results to nine decimal digits. And, indeed, the result x4 is √2 to nine decimal digits. That was fast!

Let’s now start with x0 = 1, on the other side of √2. Then

    x1 = x0/2 + 1/x0 = 1/2 + 1 = 1.5.    (105)

Note that this value of x1 is the same as was produced by starting with x0 = 2, so we know the result of this iteration procedure. This doesn’t always happen – you may wish to investigate the graph of f(x) a little more closely to see why it happens in this particular case.

It can be shown – but we won’t do it here – that for any x0 > 0, the Newton iteration sequence {xn} converges to √2. Of course, the point x0 = 0 is to be avoided, since f′(0) = 0. And if we start with x0 < 0, the Newton iteration sequence, as you may expect, will converge to −√2, the other zero of f(x) = x² − 2. Therefore, the critical point of f(x), x = 0, serves as the boundary between the regions x > 0 and x < 0 that are associated with the respective zeros √2 and −√2.

A further note: With the above example in mind, the set of points x ∈ R which, when used as starting points for Newton's method, yield iteration sequences converging to a given zero x̄ of a function f is known as the basin of attraction of x̄. In the above example:

1. The basin of attraction of the zero x̄_1 = √2 is the set (0, ∞).

2. The basin of attraction of the zero x̄_2 = −√2 is the set (−∞, 0).

Finally, we mention that in the above example, you may have noticed that since our starting points x_0 were integers, and therefore rational numbers, the first few iterates were also rational numbers. In fact, from a look at the Newton function in Eq. (103), we see that if x_n is rational, then so is x_{n+1}. In other words, the Newton function N(x) maps rational numbers to rational numbers. Therefore, if we start with a rational number x_0, all higher iterates x_n are rational. Newton's method therefore generates a sequence of rational approximations x_n to the root x̄ – in this case the irrational number √2 – which converge to it in the limit n → ∞.
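This observation can be checked directly with exact rational arithmetic. Here is a small sketch (ours, not from the notes) using Python's standard fractions module:

```python
from fractions import Fraction

# Iterate N(x) = x/2 + 1/x (Eq. (103)) with exact rational arithmetic,
# starting from the rational seed x0 = 2.
x = Fraction(2)
approximations = []
for _ in range(4):
    x = x / 2 + 1 / x  # a rational x_n always yields a rational x_{n+1}
    approximations.append(x)
print(approximations)
```

The iterates are the exact rational numbers 3/2, 17/12, 577/408 and 665857/470832 – each a better rational approximation to √2 than the last.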

Example 2: This is probably belabouring the point, but let's use Newton's method to estimate p = 7^{1/5}. Since p^5 = 7, the appropriate function to use here is f(x) = x^5 − 7. The Newton function associated with f(x) is

N(x) = x − f(x)/f'(x) = x − (x^5 − 7)/(5x^4), (106)

which can be simplified to

N(x) = 4x/5 + 7/(5x^4). (107)

Newton's method then becomes the iteration procedure

x_{n+1} = 4x_n/5 + 7/(5x_n^4) (108)

for some starting point x_0. We note that f(1) = −6 and f(2) = 25. If we start with x_0 = 2, then the first few iterates x_n are, to nine decimal places,

x1 = 1.6875

x2 = 1.522644564

x3 = 1.478571365

x4 = 1.475783733

x5 = 1.475773161

x_6 = 1.475773162 = 7^{1/5} to nine decimal digits. (109)

The convergence was perhaps not as rapid as in the previous example, but still quite good.

Example 3: This example can be found in the textbook (Example 3 on p. 337). It was introduced only briefly in the lecture, but we'll come back to it a little later in these notes. The problem is to find the root of the equation

cos x = x. (110)

We must rewrite this equation in terms of a function f(x), the zero(s) of which will be the solution of the equation. In this case,

f(x) = cos x − x (111)

will do. (We could also use f(x) = x − cos x.) The Newton function N(x) associated with f(x) is

N(x) = x − f(x)/f'(x) = x − (cos x − x)/(−sin x − 1) = x + (cos x − x)/(sin x + 1). (112)

The solution of Eq. (110) represents the intersection of the graphs of the functions cos x and x. From an examination of these graphs (see textbook, p. 338), it would seem that x_0 = 1 is a reasonable

choice for an initial approximation. We then find, using nine-decimal-digit accuracy, that

x1 = 0.7503638679

x2 = 0.7391128909

x3 = 0.7390851334

x4 = 0.7390851332

x5 = 0.7390851332. (113)

Therefore, to nine decimal places, the root of Eq. (110) is x = 0.7390851332.
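A quick way to reproduce these iterates is to code the iteration (112) directly. A minimal Python sketch (ours, not from the notes):

```python
import math

# Newton iteration for f(x) = cos(x) - x:
# x_{n+1} = x_n + (cos(x_n) - x_n) / (sin(x_n) + 1), starting from x0 = 1.
x = 1.0
for n in range(1, 6):
    x = x + (math.cos(x) - x) / (math.sin(x) + 1)
    print(n, x)
```

The printed values should reproduce the iterates in (113), settling at 0.7390851332... after a few steps.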

Some problems with Newton’s method

Newton's method is based on local analysis – the behaviour of iterates of the Newton function N(x) in a sufficiently small neighbourhood of a simple zero x̄ of a function f(x). It works extremely well if we start close to a zero. But that's about it – even though you may think that you are close to the zero of a function, you may not be close enough, and Newton's method may not necessarily take you there. As such, care must be exercised when using Newton's method.

Here is an illustration of what can go wrong. In the figure below is sketched the graph of a function with several zeros. If you start with the seed point x_0, which is quite close to the leftmost zero in the figure, x̄_1, then Newton's method will take you to x̄_1. But if you start with the more distant seed point p_0, the point p_1 = N(p_0) actually jumps over the first two zeros, to be in a position to converge to the third zero, x̄_3. Graphically, of course, we see the problem – as we move leftward away from the zero x̄_1, the graph of the function begins to level off, so the tangent lines become nearly horizontal and send the next iterate far away.

[Figure: graph of y = f(x) with three zeros x̄_1, x̄_2, x̄_3; the seed points p_0, q_0, x_0 and the first iterate p_1 = N(p_0) are marked on the x-axis.]

And, indeed, the situation is even more complicated. Between p_0 and x_0, there is a region over which the slope of the graph of f(x) is such that starting points, such as q_0, will be mapped very close to the second zero, x̄_2. As a result, the basins of attraction of these three zeros are not simple intervals on the real line – they are made up of subintervals. As such, the basins of attraction of the three zeros sketched above will be intermingled with each other. And then follows the natural question: What is the boundary of these basins of attraction? In the case of three or more roots, this is an extremely interesting question that is certainly beyond the scope of this course. To understand this problem, one must actually extend Newton's method into the complex plane. Very briefly, for the case of three or more zeros, the boundary separating the basins of attraction of the zeros has a fractal structure. There is a remarkable result involving the theory of complex functions stating that if you take a point z from such a boundary, and draw a tiny circle of radius ε > 0 around it in the complex plane, then this circular region must contain points from all basins of attraction! To get an idea of the fascinating structure of these sets, you are invited to look at the following research paper,

E.R. Vrscay, Julia Sets and Mandelbrot-like Sets Associated With Higher Order Schröder Rational Iteration Functions: A Computer-Assisted Study, Mathematics of Computation, Vol. 46, No. 173, 151-169 (1986),

a copy of which has been posted after this set of lecture notes on UW-ACE.

Newton’s method as an iteration procedure, and the role of “fixed points”

As discussed earlier, Newton’s method is an iteration procedure. Given a function f(x), the zero(s) of which we are interested in approximating, its associated Newton function N(x) is given by

N(x) = x − f(x)/f'(x). (114)

Given a seed point x_0 sufficiently close to a zero x̄ of f, we form the iteration sequence x_1, x_2, ..., as follows,

x_{n+1} = N(x_n). (115)

If the method “works,” then the iterates x_n approach the zero x̄, i.e.,

lim_{n→∞} x_n = x̄. (116)

There is something deeper going on here. What happens if we set x_0 = x̄, the zero of f(x)? In this case,

x_1 = N(x_0) = N(x̄) = x̄ − f(x̄)/f'(x̄) = x̄, (117)

since f(x̄) = 0. In summary,

x̄ = N(x̄), (118)

i.e., N maps the point x̄ to itself. Such a point is known as a fixed point.

Fixed points play an extremely important role in mathematics, from both a theoretical and a practical, especially computational, viewpoint. Many algorithms for the solution of problems are based on the fact that an iteration procedure will produce a sequence of iterates x_n that converge to a fixed point x̄ of a function.

In the Newton iteration procedure outlined above, the zero x̄ of the function f(x) was seen to be a fixed point of the Newton function N(x). Moreover, the fixed point x̄ is attractive since, according to Eq. (116), the iterates x_n converge to it. For the sake of completeness, we provide a more precise definition of an attractive fixed point. We'll consider a function g(x) instead of f(x), in order to avoid any confusion with the function f(x) used earlier.

A fixed point p of a function g(x) is a point for which

g(p)= p. (119)

Moreover, the fixed point p is attractive if there exists an open interval I which contains p and for which

|g(x) − p| < |x − p| for all x ∈ I. (120)

In other words, for any x ∈ I, the point g(x) is closer to p than x is.

Graphically, the situation is sketched below. The fixed point p is the intersection of the graph of g(x), i.e., y = g(x), and the line y = x. For an x ∈ I, the distance |g(x) − p| marked on the y-axis is less than the distance |x − p| marked on the x-axis.

[Figure: graphs of y = g(x) and y = x, intersecting at the fixed point p; for a point x ∈ I, the distance |g(x) − p| on the y-axis is smaller than the distance |x − p| on the x-axis. Caption: An attractive fixed point p = g(p).]

The astute reader may start to wonder if the slope of the function g(x) around the point p has something to do with the attractiveness of the fixed point p. If the magnitude of the slope were too large, then the point g(x) could actually be repelled from p. The answer to this conjecture is yes. We state the result a little more formally below.

Let g be a function with continuous first derivative, i.e., the function g'(x) is a continuous function of x. Furthermore, suppose that p is a fixed point of g and |g'(p)| < 1. Then p is an attractive fixed point of g (as defined earlier).

This result, which may be proved with the help of the Intermediate and Mean Value Theorems, is left as an exercise. (It is posted as a bonus problem in the current assignment (No. 9).)
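As a small numerical illustration (ours, not from the notes): g(x) = cos x has a fixed point p ≈ 0.739085 (the root of Eq. (110)), and |g'(p)| = |−sin p| ≈ 0.67 < 1, so by the result above p is attractive, and plain iteration of g converges to it – although much more slowly than Newton's method:

```python
import math

# Fixed-point iteration x_{n+1} = g(x_n) for g(x) = cos(x).
# At the fixed point p, |g'(p)| = sin(p) ≈ 0.67 < 1, so the iteration converges.
x = 1.0
for _ in range(100):
    x = math.cos(x)
print(x)  # the fixed point p, about 0.7390851332
```

Each step shrinks the error only by a factor of roughly 0.67, which is why 100 iterations are used here, whereas Newton's method needed only five.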

Returning to the Newton function N(x) associated with a function f(x), the situation is even better than that of an attractive fixed point. Let's compute the derivative of N(x) at the fixed point x̄, i.e., N'(x̄):

N'(x) = 1 − d/dx [ f(x) · 1/f'(x) ]
      = 1 − f'(x)/f'(x) + f(x) f''(x)/[f'(x)]^2
      = f(x) f''(x)/[f'(x)]^2. (121)

Recalling that f(x̄) = 0, this implies that

N'(x̄) = 0. (122)

Obviously, |N'(x̄)| < 1, making x̄ an attractive fixed point. But the fact that N'(x̄) = 0 actually makes it superattractive. We'll simply state the fact that there exists an interval I containing x̄, and a constant K ≥ 0, such that

|N(x) − x̄| ≤ K|x − x̄|^2, for all x ∈ I. (123)

This means that if the Newton iterate x_n lies a distance ε from x̄, i.e., |x_n − x̄| = ε, then the next Newton iterate x_{n+1} = N(x_n) lies a distance of at most Kε^2 from x̄. For ε very small, this is a significant improvement. Because the error is a multiple of ε^2, this behaviour is known as quadratic convergence. If you go back and look at the iterates involved in some of our early illustrative examples of Newton's method, you'll see the quadratic convergence at work. This convergence is much faster than the 2^{−n} convergence of the bisection method examined earlier in this course.
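Quadratic convergence can be seen numerically. For the √2 iteration (103), a short algebraic computation gives N(x) − √2 = (x − √2)^2/(2x), so the error ratios |x_{n+1} − √2| / |x_n − √2|^2 should settle near 1/(2√2) ≈ 0.354. A small Python sketch (ours, not from the notes):

```python
import math

root = math.sqrt(2)
x = 2.0
errors = []
for _ in range(4):
    x = x / 2 + 1 / x          # Eq. (103)
    errors.append(abs(x - root))

# Successive ratios |x_{n+1} - root| / |x_n - root|^2
ratios = [e2 / e1 ** 2 for e1, e2 in zip(errors, errors[1:])]
print(ratios)  # approaching 1/(2*sqrt(2)) ≈ 0.354
```

The near-constant ratio of error to squared previous error is exactly the behaviour described by Eq. (123), with K ≈ 1/(2√2) here.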

Some final comments:

1. A proof of quadratic convergence for Newton’s method can be performed using Taylor series expansion of the Newton function N(x) about the fixed pointx ¯. You’ll learn more about Taylor series in MATH 138. But that being said, you can prove quadratic convergence using the quadratic approximation to Newton’s method – you have already encountered the quadratic approximation to a function g(x) in this course.

2. The iteration of a function has an interesting graphical interpretation, which was introduced very briefly in the lecture. For more information, you may consult the supplementary notes, “Graphical interpretation of iteration,” which are posted after this week's lecture notes on Waterloo LEARN.

Lecture 29

Antiderivatives

(Relevant section from Stewart, Seventh Edition: Section 4.9)

We have already discussed the idea of antiderivatives earlier in this course, because of the need to relate the velocity function v(t) = x'(t) to the position function x(t). Obviously, the velocity function v(t) is the derivative of the position function x(t). But in reverse, since the derivative of the position function x(t) is the velocity function v(t), x(t) is the antiderivative of v(t). As a particular example, given the function f(x) = x^2, the function F(x) = (1/3)x^3 is an antiderivative of f(x) since F'(x) = f(x). As we proved in an earlier lecture, the set of all antiderivatives of x^2 may be given by the set of functions,

(1/3)x^3 + C, where C is an arbitrary constant. (124)

This is a one-parameter family of functions. As you may already know, we may summarize this example as follows

∫ x^2 dx = (1/3)x^3 + C. (125)

The indefinite integral on the LHS of the above equation represents the general antiderivative of the function x^2. And the RHS is the set of all such antiderivatives.
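A quick numerical sanity check (ours, not from the notes): the central difference quotient of F(x) = x^3/3 + C approximates x^2 for any constant C, illustrating that every member of the family (124) is an antiderivative of x^2.

```python
def diff_quotient(F, x, h=1e-6):
    """Central difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for C in (0.0, 5.0, -2.0):
    F = lambda x, C=C: x ** 3 / 3 + C
    print(diff_quotient(F, 1.5))  # each ≈ 2.25 = 1.5**2, independent of C
```

The constant C drops out of the difference F(x + h) − F(x − h), which is exactly why antiderivatives are only determined up to an additive constant.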

You are encouraged to read Section 4.9 in detail and study the examples discussed therein.

Integration

Definite

(Relevant sections from Stewart, Seventh Edition: Sections 5.1 and 5.2)

The material presented in this lecture closely follows the presentation of Section 5.1 of the textbook and therefore will not be reproduced here. Please read this and Section 5.2 for an excellent discussion of integration along with numerous helpful examples. Here is a very brief summary of what was covered in the lecture: Let f(x) be a continuous (or piecewise continuous) function defined over the interval [a, b]. We assumed that f(x) was positive, i.e., f(x) > 0 on [a, b], and that the goal was to find the area A of the region in the plane enclosed by the graph of f(x), the x-axis and the lines x = a and x = b. We first divided this region into n strips of equal width ∆x = (b − a)/n. This is conveniently done by introducing the following set of partition points x_k on the interval [a, b]:

x_k = a + k·∆x = a + k(b − a)/n. (126)

In this way the endpoints of the interval [a, b] are x_0 = a and x_n = b. We also let I_k = [x_{k−1}, x_k] denote the kth subinterval produced by these partition points. The area A_k lies above the interval I_k. For each interval I_k, 1 ≤ k ≤ n, pick a sample point x_k* ∈ I_k. This sample point could be the left endpoint of I_k, i.e., x_{k−1}, the right endpoint x_k, or even the midpoint of I_k. (In fact, in computations, it might be desirable to use the midpoint.) Then evaluate the function f(x) at this sample point, i.e., compute f(x_k*). The idea is that the area of the kth strip is now approximated by the rectangle of width ∆x and height f(x_k*), i.e.,

A_k ≈ f(x_k*)∆x, 1 ≤ k ≤ n. (127)

Therefore, the total area A is approximated as follows

A = Σ_{k=1}^{n} A_k ≈ Σ_{k=1}^{n} f(x_k*)∆x. (128)

The sum on the right-hand side, which we'll denote as

S_n = Σ_{k=1}^{n} f(x_k*)∆x (129)

is known as a Riemann sum. It is an approximation to the true area. Note the subscript n, which indicates the number of subintervals that are being used to approximate the area. We now claim that in the limit n → ∞, which implies that ∆x → 0, the Riemann sums S_n converge to a limit, and the limit will be the desired area A, i.e.,

A = lim_{n→∞} S_n. (130)

This limit will be independent of the choice of sample points x_k* ∈ I_k employed. This seems plausible, since the width ∆x of the subintervals I_k tends to zero as n → ∞. In general, the function f(x) does not have to be positive over the interval [a, b], in which case this procedure does not yield the area, but rather a signed area. But more on this later. The final result is that if f(x) is a piecewise continuous function on the interval [a, b], then the sequence of Riemann sums defined in Eq. (129) converges to a limit, denoted as follows,

lim_{n→∞} S_n = lim_{n→∞} Σ_{k=1}^{n} f(x_k*)∆x = ∫_a^b f(x) dx. (131)

The quantity on the right represents the so-called Riemann integral of the function f(x) over the interval [a, b]. The Riemann integral of a function will have many interesting applications in physics, as we’ll see very shortly.
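The construction above translates directly into code. Here is a minimal Python sketch (ours, not from the notes) of the Riemann sum (129) using midpoint sample points, applied to f(x) = x^2 on [0, 1], where the exact value of the integral is 1/3:

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum S_n = sum_{k=1}^{n} f(x_k*) * dx (Eq. (129))."""
    dx = (b - a) / n
    # sample point x_k* = midpoint of I_k = [x_{k-1}, x_k]
    return sum(f(a + (k - 0.5) * dx) * dx for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```

As n grows, the printed sums approach 1/3, in accordance with Eq. (130).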
