
Seunghee Ye Ma 8: Week 8 Nov 17

Week 8 Summary

Last week, we saw how we can use Taylor polynomials to approximate integrals. This week, we will see a few other numerical methods of approximating integrals. Then, we will discuss optimization: given a function f(x), how do we find the value of x that maximizes f(x)? This discussion will bring us to convex functions, which will prove to be quite useful.

Topics

1 Numerical Integration
  1.1 Midpoint rule
  1.2 Simpson's rule

2 Optimization and Convex Functions
  2.1 Optimization
  2.2 Convex Functions

1 Numerical Integration

Last week, we saw how we could use Taylor polynomials to approximate integrals. Now, we see a few other numerical methods we can use to approximate integrals. Recall that the integral was defined as the greatest lower bound of all the upper Riemann sums. Equivalently, we saw that it was the limit of the upper Riemann sums associated to the uniform partition as n → ∞. Numerical integration tries to approximate the integral by a sum that resembles these Riemann sums.

1.1 Midpoint rule

Let f be a function on [a, b] and consider the uniform partition P_n. Then, define

$$m_j = a + \frac{\left(j - \tfrac{1}{2}\right)(b - a)}{n}$$

In other words, we define m_j to be the midpoint of the subinterval (x_{j-1}, x_j). Now, define

$$J_2 = \sum_{j=1}^{n} f(m_j)\,\frac{b - a}{n}$$

Note that J_2 looks like the upper Riemann sum, except that we have replaced sup f(x) on each subinterval with f(m_j). It turns out that J_2 provides a decent approximation of ∫_a^b f(x) dx. In fact, we have

$$J_2 - \int_a^b f(x)\,dx = O\!\left(\frac{1}{n^2}\right)$$

This means precisely that the difference between J_2 and the true value of the integral is bounded by a constant times 1/n^2; in particular, it approaches 0 as n → ∞. Instead of proving this theorem, let's look at a quick numerical illustration and then move on to a more advanced numerical integration method.
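The following is a minimal code sketch of the midpoint rule, included only as an illustration; the function name `midpoint_rule` and the test integrand e^x on [0, 1] are our own choices, not part of the notes.

```python
import math

# Minimal sketch of the midpoint sum J_2 on the uniform partition of [a, b].
def midpoint_rule(f, a, b, n):
    h = (b - a) / n                                   # width of each subinterval
    # m_j = a + (j - 1/2) * (b - a) / n for j = 1, ..., n
    return sum(f(a + (j - 0.5) * h) for j in range(1, n + 1)) * h

# Test on an integral we know exactly: the integral of e^x over [0, 1] is e - 1.
exact = math.e - 1
for n in (10, 20, 40):
    error = abs(midpoint_rule(math.exp, 0.0, 1.0, n) - exact)
    print(n, error)
```

Doubling n should cut the printed error by roughly a factor of 4, which is what the O(1/n^2) bound predicts.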


1.2 Simpson’s rule

Again, we start with a function f(x) defined on the interval [a, b], and let P_n be the uniform partition of [a, b]. Hence, we have

$$x_j = a + \frac{j(b - a)}{n}, \qquad m_j = \frac{x_{j-1} + x_j}{2} = a + \frac{\left(j - \tfrac{1}{2}\right)(b - a)}{n}$$

Now, consider

$$J_3 = \sum_{j=1}^{n} \frac{f(x_{j-1}) + 4f(m_j) + f(x_j)}{6} \cdot \frac{b - a}{n}$$

Note that this time, we chose a weighted average of f(x_{j-1}), f(m_j), and f(x_j) instead of choosing a particular point. Why is this a good idea? Let's look at the following proposition.

Proposition 1.1. Let q(x) be a quadratic polynomial on the interval I = [a, b], and let m = (a + b)/2 be the midpoint of I. Then,

$$\int_a^b q(x)\,dx = \frac{b - a}{6}\left(q(a) + 4q(m) + q(b)\right)$$

Before we prove the proposition, note that it says that if f is a quadratic polynomial, then J_3 is in fact equal to ∫_a^b f(x) dx.

Proof. Suppose q(x) is quadratic. By considering p(x) = q(x + m), we can assume without loss of generality that the interval is of the form [−a, a] (reusing the letter a for its right endpoint). Suppose p(x) = c_0 + c_1 x + c_2 x^2. Then, we have

$$\int_{-a}^{a} \left(c_0 + c_1 x + c_2 x^2\right) dx = \left[c_0 x + \frac{c_1 x^2}{2} + \frac{c_2 x^3}{3}\right]_{-a}^{a} = 2c_0 a + \frac{2c_2 a^3}{3}$$

Computing the right-hand side, where now the midpoint is 0 and the interval has length 2a, we get

$$\frac{2a}{6}\left(p(-a) + 4p(0) + p(a)\right) = \frac{a}{3}\left(6c_0 + 2c_2 a^2\right) = 2c_0 a + \frac{2c_2 a^3}{3},$$

so the equality is indeed satisfied.
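As a quick sanity check (ours, not part of the notes), the proposition can be tested numerically on an arbitrary quadratic and interval; the coefficients below are made up.

```python
# Verify (b - a)/6 * (q(a) + 4*q(m) + q(b)) = integral of q over [a, b]
# for a quadratic q, up to floating-point rounding.
c0, c1, c2 = 1.7, -0.4, 2.3                            # arbitrary quadratic coefficients
q = lambda x: c0 + c1 * x + c2 * x**2
Q = lambda x: c0 * x + c1 * x**2 / 2 + c2 * x**3 / 3   # an antiderivative of q

a, b = -1.0, 2.5
m = (a + b) / 2
lhs = Q(b) - Q(a)                                      # exact integral
rhs = (b - a) / 6 * (q(a) + 4 * q(m) + q(b))           # Simpson-style weighted average
print(abs(lhs - rhs))                                  # ~1e-15, i.e. zero up to rounding
```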

Now, suppose f(x) is C^2. Then, we can consider the second-order Taylor polynomial of f(x) about m_j:

$$T_2(f)(x) = f(m_j) + f'(m_j)(x - m_j) + \frac{f''(m_j)}{2}(x - m_j)^2$$

We know that R_2(f)(x) is o((x − m_j)^2). In fact, R_2(f)(x) is O((x − m_j)^3). Since T_2(f)(x) is a quadratic polynomial, we know that

$$\int_{x_{j-1}}^{x_j} T_2(f)(x)\,dx = \frac{T_2(f)(x_{j-1}) + 4T_2(f)(m_j) + T_2(f)(x_j)}{6} \cdot \frac{b - a}{n}$$

In particular, the difference between ∫_{x_{j-1}}^{x_j} f(x) dx and the j-th term of J_3 is O((b − a)/n · 1/n^3). Hence, the difference between ∫_a^b f(x) dx and J_3 is O(1/n^3).
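Here is a short illustrative implementation of J_3 (again with our own names and test integrand, not taken from the notes); comparing its errors with those of the midpoint sketch above shows the faster decay.

```python
import math

# Sketch of J_3: on each subinterval use the weighted average
# (f(x_{j-1}) + 4*f(m_j) + f(x_j)) / 6, then multiply by (b - a)/n.
def simpson_rule(f, a, b, n):
    h = (b - a) / n
    total = 0.0
    for j in range(1, n + 1):
        x_left = a + (j - 1) * h
        x_right = a + j * h
        m = (x_left + x_right) / 2
        total += (f(x_left) + 4 * f(m) + f(x_right)) / 6 * h
    return total

exact = math.e - 1                                   # integral of e^x over [0, 1]
for n in (10, 20, 40):
    print(n, abs(simpson_rule(math.exp, 0.0, 1.0, n) - exact))
```

The printed errors shrink much faster than those of the midpoint rule, consistent with the estimate above.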

2 Optimization and Convex Functions

2.1 Optimization

Definition (Critical points and local extrema)


Let f be a C^1 function. Then, we say that a is a critical point of f if f'(a) = 0. We say that a is a local minimum (resp. local maximum) of f if there exists δ > 0 such that for all x ∈ (a − δ, a + δ), we have f(x) ≥ f(a) (resp. f(x) ≤ f(a)). If a is either a local minimum or a local maximum, we say that a is a local extremum.

What we care about are local extrema: we want to find values a such that f is locally minimized or maximized at a. Then why did we introduce critical points? As you might already know, we care about critical points because local extrema are always critical points.

Proposition 2.1. Let f be a C^1 function and suppose a is a local extremum of f. Then, f'(a) = 0.

Proof. We proceed by contradiction. We will only prove that when a is a local minimum, f'(a) = 0. The case of local maxima follows either by symmetry of argument or by considering g(x) = −f(x).

Suppose a is a local minimum and suppose f'(a) ≠ 0. Then, either f'(a) > 0 or f'(a) < 0. Suppose f'(a) > 0. Since f is C^1, we know that f'(x) is continuous. In particular, we can find δ_1 > 0 such that f'(x) > 0 for all x ∈ (a − δ_1, a + δ_1). Since a is a local minimum, we can find δ_2 > 0 such that f(x) ≥ f(a) for all x ∈ (a − δ_2, a + δ_2). Now, let δ = min(δ_1, δ_2). Since f is continuous and has a local minimum at a, we can find b_1, b_2 ∈ (a − δ, a + δ) with b_1 < b_2 such that f(b_1) = f(b_2). By the Mean Value Theorem, we can find c ∈ (b_1, b_2) ⊂ (a − δ, a + δ) such that

$$f'(c) = \frac{f(b_2) - f(b_1)}{b_2 - b_1} = 0$$

However, we chose δ such that whenever x ∈ (a − δ, a + δ), we have f'(x) > 0, which is a contradiction. The case f'(a) < 0 is handled in the same way. Hence, we conclude that f'(a) = 0.

So we see that local extrema must always be critical points of f(x). But is the converse also true? In other words, are all critical points local extrema? Unfortunately, the converse is not true in general. For example, consider f(x) = x^3. Then, f'(0) = 0 and thus, 0 is a critical point of f. However, we also know that f(x) is a strictly increasing function on R. Hence, 0 cannot be a local extremum. However, with a few extra conditions on f, we can check whether a critical point a is a local extremum of f.

Theorem 2.1 (First Derivative Test). Let a be a critical point of f.

• Suppose there exists δ > 0 such that f'(x) < 0 for all x ∈ (a − δ, a) and f'(x) > 0 for all x ∈ (a, a + δ). Then, a is a local minimum of f.

• Suppose there exists δ > 0 such that f'(x) > 0 for all x ∈ (a − δ, a) and f'(x) < 0 for all x ∈ (a, a + δ). Then, a is a local maximum of f.

If f is a C^2 function, we have an even better test.

Theorem 2.2 (Second Derivative Test). Let a be a critical point of f.

• Suppose f''(a) > 0. Then, a is a local minimum of f.

• Suppose f''(a) < 0. Then, a is a local maximum of f.

Using the first and second derivative tests, finding local extrema becomes very easy. Let's end this section with an optimization problem.

Example 2.1. Suppose that each week, Caltech Bookstore sells 200 iPad Minis with Retina Display™ (hereinafter referred to as iPads) for $350 each. A market survey indicates that for each $10 rebate offered, the number of iPads sold per week will increase by 20 units. Write the price and the revenue as functions of the number of units sold per week. How large should the rebate be if Caltech Bookstore wanted to maximize the revenue?


Solution. Let’s first write the price and the revenue as functions of the number of units sold per week. Let x be the total number of iPads sold in a week. Then, the increase in sales by offering a rebate is x − 200. The market survey says that for each increment of $10 in rebate offered, x increases by 20. Therefore, we can write the price as a function of x as:

$$P(x) = 350 - \frac{10}{20}(x - 200) = 450 - \frac{x}{2}$$

Now, the revenue is simply the price multiplied by sales. Hence,

$$R(x) = xP(x) = 450x - \frac{x^2}{2}$$

The goal is to find the value of x which maximizes R(x), i.e. we want to find the global maximum of R(x). To do that, first we need to find all the local extrema of R(x). And to do that, we first find the critical points of R(x). But that's not hard at all.

$$R'(x) = 450 - x \quad \Rightarrow \quad R'(450) = 0$$

Hence, x = 450 is the unique critical point of R(x). Noting that R''(x) = −1 < 0 for all x, we conclude that 450 is indeed a local maximum of R(x). Since 450 is the only local extremum of R(x), it is in fact the global maximum. Therefore, to maximize revenue, Caltech Bookstore must offer a rebate which would sell 450 iPads. In other words, Caltech Bookstore must offer a rebate which would increase sales by 450 − 200 = 250 units. This corresponds to a rebate of ($10/20) · 250 = $125. In other words, Caltech Bookstore should be selling iPad Minis at $350 − $125 = $225 after the rebate!
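A small numerical check of the worked example (purely illustrative, using our own variable names) confirms the algebra:

```python
# Revenue from the example: R(x) = 450x - x^2/2, with R'(x) = 450 - x.
R = lambda x: 450 * x - x**2 / 2
R_prime = lambda x: 450 - x

print(R_prime(450))                  # 0.0: x = 450 is the critical point
print(R(449), R(450), R(451))        # R(450) is the largest of the three values
print(10 / 20 * (450 - 200))         # 125.0: the rebate in dollars ($10 per 20 extra units)
```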

2.2 Convex Functions

You might have seen convex and concave functions before (some teachers say "concave up" and "concave down"). An example of a convex function is f(x) = x^2 and an example of a concave function is g(x) = −x^2. Intuitively, we think of convex functions as those functions that can be minimized, and concave functions as those that can be maximized. Using the second derivative test, this means that convex functions should have nonnegative second derivative and concave functions should have nonpositive second derivative. Our intuition serves us well this time, and we will see in a bit that f(x) is convex (resp. concave) if and only if f''(x) ≥ 0 (resp. f''(x) ≤ 0). But first, let's give a formal definition of convex and concave functions.

Definition (Convex and Concave Functions) Let f be a function. f is called convex if for all θ ∈ [0, 1] and for all x, y we have

$$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 - \theta) f(y)$$

f is called concave if for all θ ∈ [0, 1] and for all x, y we have

$$f(\theta x + (1 - \theta)y) \ge \theta f(x) + (1 - \theta) f(y)$$

As we said in the beginning of this section, we have the following proposition.

Proposition 2.2. f is convex (resp. concave) if and only if f''(x) ≥ 0 (resp. f''(x) ≤ 0) for all x.

Solution. You will need to prove this for this week’s problem set!
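Since the proof is deferred, here is only a numerical illustration of the proposition (not a proof, and with our own choice of sample points): f(x) = x^2 has f''(x) = 2 ≥ 0, and the defining inequality of convexity holds at every sampled triple.

```python
# Check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for f(x) = x^2
# over a small grid of points x, y and weights t in [0, 1].
f = lambda x: x**2

points = [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
thetas = [i / 10 for i in range(11)]

violations = 0
for x in points:
    for y in points:
        for t in thetas:
            if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-12:
                violations += 1
print(violations)   # 0: no sampled triple violates the convexity inequality
```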


Theorem 2.3 (Jensen's Inequality). Let f be a convex function and let a_1, . . . , a_n > 0. Then, for all x_1, . . . , x_n, we have

$$f\!\left(\frac{\sum_{i=1}^{n} a_i x_i}{\sum_{j=1}^{n} a_j}\right) \le \frac{\sum_{i=1}^{n} a_i f(x_i)}{\sum_{j=1}^{n} a_j}$$

If g is a concave function, we have

$$g\!\left(\frac{\sum_{i=1}^{n} a_i x_i}{\sum_{j=1}^{n} a_j}\right) \ge \frac{\sum_{i=1}^{n} a_i g(x_i)}{\sum_{j=1}^{n} a_j}$$
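To see the statement in action, here is an illustrative numerical check of Jensen's inequality for the convex function f(x) = x^2 with some arbitrary positive weights (all sample values below are made up).

```python
# Jensen's inequality for convex f: f(weighted mean of the x_i) <= weighted mean of the f(x_i).
f = lambda x: x**2

a = [1.0, 2.0, 0.5, 3.0]          # positive weights a_i
x = [-1.0, 2.0, 4.0, 0.5]         # points x_i

total = sum(a)
mean_x = sum(ai * xi for ai, xi in zip(a, x)) / total
mean_fx = sum(ai * f(xi) for ai, xi in zip(a, x)) / total

print(f(mean_x), mean_fx, f(mean_x) <= mean_fx)   # 1.0  2.73...  True
```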

You should think of Jensen's inequality as an extension of the definition of convexity/concavity of a function to more than two points. In short, Jensen's inequality says that if you have a convex function, the function's value at the weighted average of the x_i's is at most the weighted average of the f(x_i)'s. Using Jensen's inequality, we can prove the generalized AM-GM inequality. Let's recall the AM-GM inequality:

Theorem 2.4. Let a, b > 0. Then,

$$\frac{a + b}{2} \ge \sqrt{ab}$$

Proof. We could prove this using Jensen's inequality, but that would be very inefficient. Instead, we give a simpler proof. We know that

$$0 \le (a - b)^2 = a^2 - 2ab + b^2 \quad \Rightarrow \quad 4ab \le a^2 + 2ab + b^2 = (a + b)^2$$

By taking square roots on both sides, we get the desired inequality.

That was easy! And we didn’t even need to use Jensen’s inequality. Now, let’s try to prove the generalized AM-GM inequality.

Theorem 2.5 (Generalized AM-GM Inequality). Let x_1, . . . , x_n > 0. Then,

$$\frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdots x_n}$$

Proof. This time, we will use Jensen's inequality to prove the inequality. Which convex/concave function do we use? Let's take f(x) = log x.

First, we claim that f(x) is a concave function that is strictly increasing on its domain of definition, which is (0, ∞). First, we have that (log x)' = 1/x > 0 for all x ∈ (0, ∞). Hence, log x is strictly increasing. Differentiating one more time, we get

$$(\log x)'' = \left(\frac{1}{x}\right)' = -\frac{1}{x^2} < 0 \quad \text{for all } x \in (0, \infty)$$

By Proposition 2.2, we conclude that log x is a concave function.

Now, let x_1, . . . , x_n > 0. Then, by Jensen's inequality for concave functions (where we let a_1 = ··· = a_n = 1), we get

$$\log\!\left(\frac{x_1 + \cdots + x_n}{n}\right) \ge \frac{\log x_1 + \cdots + \log x_n}{n} = \frac{\log(x_1 \cdots x_n)}{n} = \log\!\left((x_1 \cdots x_n)^{1/n}\right)$$


However, we showed that log x is increasing on (0, ∞). Therefore, the above inequality implies

$$\frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdots x_n}$$

which is precisely the generalized AM-GM inequality. (Alternatively, we can apply e^x to both sides and use the fact that e^x is strictly increasing.)
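Finally, here is a short numerical check of the generalized AM-GM inequality on made-up positive values (illustrative only; math.prod requires Python 3.8+):

```python
import math

xs = [0.3, 2.0, 5.0, 1.2, 7.5]                # arbitrary positive numbers
am = sum(xs) / len(xs)                        # arithmetic mean
gm = math.prod(xs) ** (1 / len(xs))           # geometric mean
print(am, gm, am >= gm)                       # 3.2  1.93...  True
```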
