
0.1

Before discussing the mean value theorem, it is worthwhile to note an interesting result about the nature of derivatives. When we differentiate a function f : I → ℝ, there is no guarantee that the result is a continuous function. For instance, consider

\[
f(x) = \begin{cases} x^2 \sin(1/x), & x \neq 0, \\ 0, & x = 0. \end{cases}
\]

For x ≠ 0, we find that

\[
f'(x) = 2x \sin(1/x) + x^2 \cos(1/x) \cdot (-x^{-2}) = 2x \sin(1/x) - \cos(1/x).
\]

In order to find f′(0) we will need to appeal directly to the definition of the derivative, and find

\[
f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} h \sin(1/h) = 0.
\]

Thus, we have a function that is differentiable everywhere, but its derivative is not continuous at x = 0, as f′(x) has no limit as x → 0 (the cosine term keeps oscillating between −1 and 1, ever more rapidly). This example may seem a bit artificial, but it illustrates the fact that the derivative of a function may not behave nearly as nicely as the function itself. Even though we are not guaranteed continuity of a derivative, we are guaranteed one useful property of derivatives.
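The behavior of this example can be probed numerically. The following is a minimal sketch, not part of the original notes; the sample points x_n = 1/(nπ) and the step sizes h are choices made here purely for illustration. It checks that the difference quotient at 0 shrinks with h, while f′ keeps returning to values near +1 and −1 arbitrarily close to 0.

```python
import math

def f(x):
    """x^2 sin(1/x) for x != 0, extended by f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def f_prime(x):
    """Closed-form derivative for x != 0; f'(0) = 0 from the limit definition."""
    if x == 0:
        return 0.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# The difference quotient at 0 is h*sin(1/h), which is bounded by |h|.
for h in (1e-1, 1e-3, 1e-5):
    quotient = (f(h) - f(0.0)) / h
    print(f"h = {h:g}: difference quotient = {quotient:+.3e}")

# Along x_n = 1/(n*pi) (illustrative sample points), sin(1/x_n) is essentially 0
# and cos(1/x_n) = (-1)^n, so f'(x_n) alternates near +1 and -1 as x_n -> 0.
for n in range(1, 7):
    x_n = 1.0 / (n * math.pi)
    print(f"x = {x_n:.5f}: f'(x) = {f_prime(x_n):+.6f}")
```

Because the sampled derivative values alternate in sign no matter how close the sample points get to 0, they cannot settle toward f′(0) = 0, which is the discontinuity described above.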

Theorem 0.1.1 (Intermediate Value Property of Derivatives). Let f : [a, b] → ℝ be differentiable on [a, b]. For any value y₀ between f′(a) and f′(b), there exists some c ∈ [a, b] such that f′(c) = y₀.

This theorem simply states that any derivative has the intermediate value property. Thus, even though a derivative may not be continuous, it cannot be just any function. For instance, the above theorem implies that |x|/x cannot be the derivative of a function that is differentiable over an interval containing 0, because over any interval containing x = 0 it does not have the intermediate value property (it skips the values between −1 and 1). We know in fact that the derivative of f(x) = |x| is f′(x) = |x|/x, except at x = 0, where the derivative does not exist. It is the way that |x|/x jumps at x = 0 that makes it impossible for it to be the derivative of any function that is differentiable over an entire interval, which f(x) = |x| is not. We will keep this intuition in mind while considering the mean value theorem.

The mean value theorem provides us with a means of relating a function’s instantaneous rate of change inside an interval with its average (or mean) rate of change over that interval. The power of the mean value theorem does not truly shine in introductory calculus, as its main applications are in proving other results. Nevertheless, we will discuss this very useful theorem here, along with some of its implications.

Theorem 0.1.2 (The Mean Value Theorem). Let f be continuous on [a, b], and differentiable on the interval’s interior (a, b). It follows that there exists at least one c ∈ (a, b) at which

\[
\frac{f(b) - f(a)}{b - a} = f'(c).
\]

Verbally, the mean value theorem states: for a function f that is continuous on a closed interval and differentiable on its interior, we can find a point inside the interval at which the instantaneous rate of change of f is equal to the average rate of change of f over that interval. In other words, if we draw the secant line between the points (a, f(a)) and (b, f(b)), we can find a point c ∈ (a, b) so that the slope of the tangent line at c, f′(c), is the same as the slope of the secant line through the endpoints of the interval. If we think of the average rate of change over an interval as the mean rate of change, then the name of this theorem makes sense. A graphical representation of the mean value theorem is given in figure 1.

Figure 1: Geometrically, the mean value theorem states that there is some point c ∈ (a, b) where the tangent line is parallel to the secant line through (a, f(a)) and (b, f(b)).

In order for a function to have a given average rate of change, it must either be constantly changing at that rate, or at some instants in time change faster, and at others change slower. In this way the mean value theorem follows from the intermediate value property of derivatives: if the derivative is larger than the average rate of change at some time, and smaller at another time, there must be a time in between when it is exactly equal.

It is important to emphasize that the mean value theorem does not apply to functions which are not continuous on the closed interval [a, b], or not differentiable on the open interval (a, b). Consider the function

\[
f(x) = \begin{cases} 0, & x = 0, \\ 0.5, & 0 < x < 1, \\ 1, & x = 1. \end{cases}
\]

If we look at the average rate of change on [0, 1] we find

\[
\frac{f(1) - f(0)}{1 - 0} = 1,
\]

and the secant line through the points (0, f(0)) and (1, f(1)) is just the line of slope one that passes through the origin. What about the derivative of this function? It turns out f′(x) does not exist at x = 0 and x = 1, because the function is not continuous at those points. However, for 0 < x < 1 the function is continuous, and just a constant, so the derivative is f′(x) = 0. It can be seen that there are no points in this interval for which the slope of the tangent line is 1, because for all points in (0, 1) the tangent line is horizontal, having slope 0. We cannot apply the mean value theorem to this function because it is not continuous on [0, 1], even though it is differentiable on (0, 1). It is noteworthy that we could still apply the mean value theorem on any closed interval contained in (0, 1), and it would tell us that somewhere in the interval the slope of the tangent line is 0 (which is not particularly useful, as we already know that the slope of the tangent line is 0 everywhere in the interval).
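The contrast between a function that satisfies the hypotheses and the step function above can be sketched numerically. This is an illustrative sketch only: the helper `mean_value_point`, the sample function f(x) = x³ on [0, 2], and the subinterval [0.25, 0.75] are assumptions introduced here, not part of the notes. The helper uses bisection, which additionally assumes that f′ is continuous and that f′ minus the secant slope changes sign between the endpoints; the theorem itself needs neither assumption.

```python
import math

def mean_value_point(fprime, slope, lo, hi, tol=1e-12):
    """Bisection search for c in [lo, hi] with fprime(c) == slope.

    Hypothetical helper: assumes fprime is continuous and that
    fprime - slope has opposite signs (or a zero) at the endpoints.
    """
    g_lo = fprime(lo) - slope
    g_hi = fprime(hi) - slope
    if g_lo * g_hi > 0:
        raise ValueError("f' - slope does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = fprime(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid                 # a root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # a root lies in [mid, hi]
    return 0.5 * (lo + hi)

# A function meeting the hypotheses: f(x) = x^3 on [0, 2].
a, b = 0.0, 2.0
slope = (b ** 3 - a ** 3) / (b - a)            # average rate of change
c = mean_value_point(lambda x: 3.0 * x ** 2, slope, a, b)
print("c =", c, " f'(c) =", 3.0 * c ** 2, " secant slope =", slope)

# The step function from the text: secant slope 1, but f' = 0 on (0, 1).
def step(x):
    return 0.0 if x == 0 else (1.0 if x == 1 else 0.5)

step_slope = (step(1.0) - step(0.0)) / (1.0 - 0.0)
try:
    mean_value_point(lambda x: 0.0, step_slope, 0.25, 0.75)
    step_has_point = True
except ValueError:
    step_has_point = False
print("step function has a point with f'(c) = 1:", step_has_point)
```

For f(x) = x³ the search lands on c = 2/√3, where 3c² = 4 matches the secant slope. For the step function the search fails, which is consistent with the discussion above: the derivative is 0 throughout (0, 1) and never equals 1.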

Now let us consider a function that is continuous on a closed interval, but not differentiable at every point of the open interval. Take f(x) = |x| on the interval [−1, 1]. If we look at the average rate of change over the interval we find

\[
\frac{f(1) - f(-1)}{1 - (-1)} = \frac{0}{2} = 0,
\]

yet there are no points c ∈ (−1, 1) such that f′(c) = 0. For x < 0 we have f′(x) = −1 and for x > 0 we have f′(x) = 1 (of course f′(0) does not exist). If the function |x| were differentiable at x = 0, then in a sense, we would have to have f′(0) = 0, as this would be the point where the derivative would change from negative to positive. However, in this case we don’t have any points with a zero derivative, and because the derivative does not exist at x = 0, it is possible for it to jump from negative to positive without ever being zero.

We gain further information from the mean value theorem in the following two corollaries (a corollary is a minor result that follows almost immediately from a major theorem):

Corollary 0.1.3. If f′(x) = 0 for all x in an open interval (a, b), then f(x) = C for all x ∈ (a, b), where C is a constant.

Corollary 0.1.4. If f′(x) = g′(x) for all x in an open interval (a, b), then there exists some constant C so that f(x) = g(x) + C for all x ∈ (a, b).

Both of these results are things that we already knew intuitively, but they follow rigorously from the mean value theorem. For the first, take any two points x₁ < x₂ in (a, b); the mean value theorem on [x₁, x₂] gives f(x₂) − f(x₁) = f′(c)(x₂ − x₁) = 0, so f takes the same value at both points. To move from the first of these results to the second, we consider the function (f − g)(x). For this function, (f − g)′(x) = f′(x) − g′(x) = 0 on the interval (a, b), so (f − g)(x) = C for some constant C, and thus f(x) = g(x) + C.

If we apply the mean value theorem on an interval for which f(a) = f(b), it follows that there exists a point c ∈ (a, b) so that

\[
f'(c) = \frac{f(b) - f(a)}{b - a} = 0.
\]

In particular, if a and b are any two points at which our function of interest crosses the x-axis, we can find a point between them which is a critical point (at which we could look for a local extremum). This special case of the mean value theorem is called Rolle’s theorem, and is used in the proof of the mean value theorem. If we think about the function

\[
f(x) = \frac{x^3}{3} - 3x,
\]

which is 0 at −3, 0, and 3, then we know it has a critical point somewhere in each of the intervals (−3, 0) and (0, 3). Thus, we know it has at least two critical points, but it could have more.

The mean value theorem is also useful in the classification of critical points. Consider

f : [−3, 3] → ℝ, where f(x) is defined as above. Differentiating f we find that

\[
f'(x) = x^2 - 3,
\]

and that f′(x) = 0 when x = ±√3, meaning that ±√3 are critical points. Now we evaluate the function at each of the critical points and endpoints, yielding f(−3) = 0, f(−√3) = 2√3, f(√3) = −2√3, and f(3) = 0. We can think of these points as partitioning our larger interval into

subintervals, over which we want to consider the sign of the derivative. The sign of the derivative cannot change over these subintervals: a sign change would force f′ to vanish somewhere inside the subinterval, by the intermediate value property, and every zero of f′ is already one of the partition points. Knowing the sign of the derivative at a single point in a subinterval therefore tells us the sign of the derivative over the entire subinterval.

If we apply the mean value theorem to the endpoints of one of these subintervals, say [√3, 3], we find that there exists some c ∈ (√3, 3) so that

\[
f'(c) = \frac{f(3) - f(\sqrt{3})}{3 - \sqrt{3}} = \frac{0 - (-2\sqrt{3})}{3 - \sqrt{3}} > 0.
\]

Since the derivative is positive at some point in the subinterval, and the sign of the derivative cannot change over the subinterval, it follows that the derivative is positive over the entire subinterval. Note that the denominator will always be positive when we apply the mean value theorem, and since we only care about the sign of the derivative, we only need to compare the function values at the two endpoints to see whether the derivative will be positive or negative. Doing so we can see that f(√3) = −2√3 < 2√3 = f(−√3), which implies the derivative is negative on the subinterval (−√3, √3), and finally f(−√3) > f(−3) implies that the derivative is positive over the subinterval (−3, −√3). It follows that f(−√3) is a local maximum and f(√3) is a local minimum.

The mean value theorem allows us to transform a bounded derivative into a bound for a function defined on an interval. Let’s suppose that we have a function f which is continuous on the interval [a, b], differentiable on (a, b), and that |f′(x)| < B for all x ∈ (a, b) (this means that the derivative of f is bounded). This bound on the derivative tells us that the function can only change so fast: the magnitude of its rate of change never exceeds B. Intuitively, if the function can only change so fast, then it should not be able to become too much larger or smaller than f(a) (or f(b) for that matter) if we don’t move too far from x = a. Let us consider a point x ∈ (a, b).
Applying the mean value theorem on the interval [a, x] we find

\[
\frac{f(x) - f(a)}{x - a} = f'(c)
\]

for some c ∈ (a, x). Rearranging terms it follows

\[
f(x) = f'(c)(x - a) + f(a).
\]

Although this looks like the equation for a line, remember that x is a fixed value, and for any value of x, we might actually find a different value for c. Normally we would find f′(c) using information about f(x) and f(a), but here we want to gain some information about f(x). This means that we don’t know exactly what f′(c) is, but we do know that −B < f′(c) < B. Since x − a > 0, it then follows that

\[
-B(x - a) + f(a) \le f(x) \le B(x - a) + f(a),
\]

which is an inequality that holds for all x ∈ (a, b). In fact, if we consider the cases of equality, we get the equations of two lines. When x = a, both lines pass through f(a). From there, one line increases with slope B and the other decreases with slope −B, and together they describe bounds on the function f(x). At any given point x, the value f(x) must lie between the two lines, which implies the function is bounded on the interval. The only way the function could be equal to one of the two lines is if the derivative were constantly B or −B up to that point, meaning that the magnitude of the derivative was equal to the bound, which is the largest possible magnitude of the derivative (and may not even be achievable, if the bound is larger than any value the derivative ever achieves). Because in this region the rate of change of the function is bounded, it is only possible for the function to change by so much, which forces it to be within these bounds. This is an important concept for visualizing functions, and is useful in many proofs.
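The two bounding lines can be checked on a concrete case. The sketch below is an illustration under assumptions chosen here, not an example from the notes: it takes f(x) = sin x on [0, 3] with B = 1, which satisfies |f′(x)| = |cos x| < 1 on (0, 3) because 3 < π, and samples the interval on a uniform grid.

```python
import math

# Illustrative choice: f(x) = sin(x) on [a, b] = [0, 3] with bound B = 1.
# On the open interval (0, 3) we have |cos(x)| < 1 because 3 < pi.
a, b, B = 0.0, 3.0, 1.0

def lower(x):
    """Bounding line with slope -B through (a, f(a))."""
    return -B * (x - a) + math.sin(a)

def upper(x):
    """Bounding line with slope +B through (a, f(a))."""
    return B * (x - a) + math.sin(a)

# Sample interior points and test  lower(x) <= f(x) <= upper(x).
xs = [a + (b - a) * k / 1000 for k in range(1, 1000)]
inside = all(lower(x) <= math.sin(x) <= upper(x) for x in xs)
print("f stays between the two lines:", inside)

# Both lines start from the same value f(a) at x = a.
print(lower(a), upper(a), math.sin(a))
```

Here the inequality reduces to −x ≤ sin x ≤ x for 0 < x < 3, a familiar consequence of the same argument.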
