Calculus 2502A - Advanced Calculus I Fall 2014 §14.7: Local minima and maxima

Martin Frankland

November 17, 2014

In these notes, we discuss the problem of finding the local minima and maxima of a function of several variables.

1 Background and terminology

Definition 1.1. Let f : D → R be a function, with domain D ⊆ Rⁿ. A point ~a ∈ D is called:

• a local minimum of f if f(~a) ≤ f(~x) holds for all ~x in some neighborhood of ~a.

• a local maximum of f if f(~a) ≥ f(~x) holds for all ~x in some neighborhood of ~a.

• a global minimum (or absolute minimum) of f if f(~a) ≤ f(~x) holds for all ~x ∈ D.

• a global maximum (or absolute maximum) of f if f(~a) ≥ f(~x) holds for all ~x ∈ D.

An extremum means a minimum or a maximum.

Definition 1.2. An interior point ~a ∈ D is called a:

• critical point of f if f is not differentiable at ~a or if the condition ∇f(~a) = ~0 holds, i.e., all first-order partial derivatives of f are zero at ~a.

• saddle point of f if it is a critical point which is not a local extremum. In other words, for every neighborhood U of ~a, there is some point ~x ∈ U satisfying f(~x) < f(~a) and some point ~x ∈ U satisfying f(~x) > f(~a).

Remark 1.3. Global minima or maxima are not necessarily unique. For example, the function f : R → R defined by f(x) = sin x has a global maximum with value 1 at the points:

{..., −3π/2, π/2, 5π/2, ...} = {π/2 + 2kπ | k ∈ Z}

and a global minimum with value −1 at the points:

{..., −π/2, 3π/2, 7π/2, ...} = {3π/2 + 2kπ | k ∈ Z}.

For a sillier example, consider a function g(~x) = c for all ~x ∈ D. Then every point ~x ∈ D is simultaneously a global minimum and a global maximum of g.

Proposition 1.4. Let ~a be an interior point of the domain D. If ~a is a local extremum of f, then any partial derivative of f which exists at ~a must be zero.

Proof. Assume that the partial derivative Dmf(~a) exists, for some 1 ≤ m ≤ n. Then the function of a single variable

g(x) := f(a1, a2, ..., x, ..., an−1, an),

where x sits in the mth coordinate, is differentiable at x = am and has a local extremum at x = am. Therefore its derivative vanishes at that point:

g'(am) = Dmf(~a) = 0.

Remark 1.5. The converse does not hold: a critical point of f need not be a local extremum. For example, the function f(x) = x³ has a critical point at x = 0 which is a saddle point.

Upshot: In order to find the local extrema of f, it suffices to consider the critical points of f.

2 One-dimensional case

Recall the second derivative test from single-variable calculus.

Proposition 2.1 (Second derivative test). Let f : D → R be a function of a single variable, with domain D ⊆ R. Let a ∈ D be a critical point of f such that the first derivative f' exists in a neighborhood of a, and the second derivative f''(a) exists.

• If f''(a) > 0 holds, then f has a local minimum at a.

• If f''(a) < 0 holds, then f has a local maximum at a.

• If f''(a) = 0 holds, then the test is inconclusive.

Remark 2.2. The following examples illustrate why the test is inconclusive when the critical point a satisfies f''(a) = 0.

• The function f(x) = x⁴ has a local minimum at 0.

• The function f(x) = −x⁴ has a local maximum at 0.

• The function f(x) = x³ has a saddle point at 0.
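These examples are easy to check by computer. Here is a minimal sketch in Python using the sympy library (the tooling is an assumption; the notes themselves do not use software): it confirms that all three functions satisfy f'(0) = f''(0) = 0, even though their behavior at 0 differs.

import sympy as sp

x = sp.symbols('x')

# Three functions with a critical point at 0 where the second derivative also vanishes.
for f in (x**4, -x**4, x**3):
    first = sp.diff(f, x)
    second = sp.diff(f, x, 2)
    print(f, first.subs(x, 0), second.subs(x, 0))
    # Each line prints the function followed by f'(0) = 0 and f''(0) = 0,
    # yet x**4 has a minimum, -x**4 a maximum, and x**3 a saddle point at 0.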

Example 2.3. Find all local extrema of the function f : R → R defined by f(x) = 3x⁴ − 4x³ − 12x² + 10.

Solution. Note that f is twice differentiable on all of R, and moreover the domain of f is open in R (so that there are no boundary points to check). Let us find the critical points of f:

f'(x) = 12x³ − 12x² − 24x = 0
⇔ x³ − x² − 2x = 0
⇔ x(x² − x − 2) = 0
⇔ x(x + 1)(x − 2) = 0
⇔ x = −1, 0, or 2.

Let us use the second derivative test to classify these critical points:

f''(x) = 12(3x² − 2x − 2)
f''(−1) = 12(3 + 2 − 2) = 12(3) > 0
f''(0) = 12(−2) < 0
f''(2) = 12(12 − 4 − 2) = 12(6) > 0

from which we conclude:

• x = −1 is a local minimum of f, with value f(−1) = 3 + 4 − 12 + 10 = 5.

• x = 0 is a local maximum of f, with value f(0) = 10.

• x = 2 is a local minimum of f, with value f(2) = 3(16) − 4(8) − 12(4) + 10 = −22.

Remark 2.4. In this example, the function f has no global maximum, because it grows arbitrarily large: lim_{x→+∞} f(x) = +∞.

Moreover, the condition lim_{x→−∞} f(x) = +∞, together with the extreme value theorem, implies that f reaches a global minimum. Therefore, x = 2 is in fact the global minimum of f, with value f(2) = −22, as illustrated in Figure 1.
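The computation in Example 2.3 can also be carried out symbolically. The following sketch uses Python's sympy library (an assumption on tooling, not part of the original notes) to find the critical points and apply the second derivative test.

import sympy as sp

x = sp.symbols('x')
f = 3*x**4 - 4*x**3 - 12*x**2 + 10

critical_points = sp.solve(sp.diff(f, x), x)   # solves f'(x) = 0, giving -1, 0, 2
f2 = sp.diff(f, x, 2)                          # second derivative

for a in critical_points:
    sign = f2.subs(x, a)
    kind = "local min" if sign > 0 else ("local max" if sign < 0 else "inconclusive")
    print(a, kind, f.subs(x, a))
# Prints (in some order): -1 local min 5, 0 local max 10, 2 local min -22.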

Figure 1: Graph of the function f in Example 2.3.

3 Two-dimensional case

Theorem 3.1 (Second derivatives test). Let f : D → R be a function of two variables, with domain D ⊆ R². Let (a, b) ∈ D be a critical point of f such that the second partial derivatives fxx, fxy, fyx, fyy exist in a neighborhood of (a, b) and are continuous at (a, b), which implies in particular fxy(a, b) = fyx(a, b). The discriminant of f is the 2 × 2 determinant

D := | fxx  fxy |
     | fyx  fyy |  = fxx fyy − (fxy)².

Now consider the discriminant D = D(a, b) at the point (a, b).

• If D > 0 and fxx(a, b) > 0 hold, then f has a local minimum at (a, b).

• If D > 0 and fxx(a, b) < 0 hold, then f has a local maximum at (a, b).

• If D < 0 holds, then f has a saddle point at (a, b).

• If D = 0 holds, then the test is inconclusive.
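The decision logic of the test is straightforward to encode. The helper below is a hypothetical Python sketch (the function name is mine, not from the notes); it takes the values D(a, b) and fxx(a, b) at a critical point and returns the conclusion of the test.

def classify_critical_point(D, fxx):
    """Second derivatives test at a critical point, given D(a, b) and fxx(a, b)."""
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D == 0 (note that D > 0 with fxx == 0 cannot occur)

For instance, with the values computed in Example 3.3 below, classify_critical_point(-144, 0) returns "saddle point" and classify_critical_point(432, 12) returns "local minimum".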

Remark 3.2. The following examples illustrate why the test is inconclusive when the discriminant satisfies D = 0.

• The function f(x, y) = x² + y⁴ has a (strict) local minimum at (0, 0).

• The function f(x, y) = x² has a (non-strict) local minimum at (0, 0).

• The function f(x, y) = −x² − y⁴ has a (strict) local maximum at (0, 0).

• The function f(x, y) = −x² has a (non-strict) local maximum at (0, 0).

• The function f(x, y) = x² + y³ has a saddle point at (0, 0).

• The function f(x, y) = −x² + y³ has a saddle point at (0, 0).

• The function f(x, y) = x² − y⁴ has a saddle point at (0, 0).

Example 3.3 (# 14.7.11). Find all local extrema and saddle points of the function f : R² → R defined by f(x, y) = x³ − 12xy + 8y³.

Solution. Note that f is infinitely differentiable on its domain R². Let us find the critical points of f:

fx(x, y) = 3x² − 12y = 0
fy(x, y) = −12x + 24y² = 0

⇔  4y = x²  and  x = 2y²

⇒ 4y = (2y²)² = 4y⁴
⇔ y = y⁴
⇔ y(y³ − 1) = 0
⇔ y = 0 or y³ = 1
⇔ y = 0 or y = 1

y = 0 ⇒ x = 2(0)² = 0
y = 1 ⇒ x = 2(1)² = 2

which implies that (0, 0) and (2, 1) are the only points that could be critical points of f, and indeed they are. Let us compute the discriminant of f:

fxx(x, y) = 6x

fxy(x, y) = −12

fyy(x, y) = 48y

D(x, y) = fxx(x, y) fyy(x, y) − (fxy(x, y))² = (6x)(48y) − (−12)² = 12²(2xy − 1).

At the critical point (0, 0), the discriminant is:

D(0, 0) = 12²(0 − 1) < 0

so that (0, 0) is a saddle point. At the critical point (2, 1), the discriminant is:

D(2, 1) = 12²(4 − 1) > 0

and we have fxx(2, 1) = 6(2) = 12 > 0, so that (2, 1) is a local minimum with value f(2, 1) = 8 − 24 + 8 = −8, as illustrated in Figure 2.
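As in the one-variable case, this computation can be verified symbolically. Here is a minimal sketch using Python's sympy library (an assumption on tooling): it solves the system fx = fy = 0 and evaluates the discriminant and fxx at each real critical point.

import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 12*x*y + 8*y**3

fx, fy = sp.diff(f, x), sp.diff(f, y)
solutions = sp.solve([fx, fy], [x, y], dict=True)      # includes complex solutions

fxx, fxy, fyy = sp.diff(f, x, 2), sp.diff(f, x, y), sp.diff(f, y, 2)
D = fxx*fyy - fxy**2                                   # discriminant of f

for p in solutions:
    if not (p[x].is_real and p[y].is_real):
        continue                                       # keep only real critical points
    print((p[x], p[y]), D.subs(p), fxx.subs(p), f.subs(p))
# (0, 0): D = -144 < 0, a saddle point.
# (2, 1): D = 432 > 0 and fxx = 12 > 0, a local minimum with value -8.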

Figure 2: Graph of the function f in Example 3.3.

4 Why does it work?

4.1 Quadratic approximation

The second derivatives test relies on the quadratic approximation of f at the point (a, b), which is the Taylor polynomial of degree 2 at that point. Recall that for functions of a single variable f(x), the quadratic approximation of f at a is:

f(x) ≈ f(a) + f'(a)(x − a) + f''(a) (x − a)²/2.

Since the variable of interest is the displacement from the basepoint a, let us write h := x − a and thus:

f(a + h) ≈ f(a) + f'(a)h + f''(a) h²/2.

If a is a critical point of f, then we have f'(a) = 0 and the quadratic approximation becomes:

f(a + h) ≈ f(a) + f''(a) h²/2.

If f''(a) ≠ 0 holds, then the quadratic term will determine the local behavior of f around a: f''(a) > 0 yields a local minimum while f''(a) < 0 yields a local maximum.

Idea:

• If f''(a) > 0 holds, then f qualitatively behaves like f(a + h) = f(a) + h² when h is small.

• If f''(a) < 0 holds, then f behaves like f(a + h) = f(a) − h².
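To see this numerically, one can compare f with its quadratic approximation near a critical point. The sketch below is plain Python; the choice of test point is mine, taken from Example 2.3 with a = 2, where f''(2) = 72. It shows that f(a + h) − f(a) is close to f''(a)h²/2 for small h.

# Quadratic approximation near a critical point, illustrated on f from Example 2.3.
def f(x):
    return 3*x**4 - 4*x**3 - 12*x**2 + 10

a = 2.0      # critical point of f
f2a = 72.0   # f''(2) = 12*(12 - 4 - 2) = 72

for h in (0.1, 0.01, 0.001):
    exact = f(a + h) - f(a)
    quadratic = f2a * h**2 / 2
    print(h, exact, quadratic)   # the two values agree better and better as h shrinks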

For functions of two variables f(x, y), the quadratic approximation at (a, b) is:

f(x, y) ≈ f(a, b) + fx(a, b)(x − a) + fy(a, b)(y − b)
         + fxx(a, b) (x − a)²/2 + fxy(a, b)(x − a)(y − b) + fyy(a, b) (y − b)²/2.

Again, consider the displacement from the basepoint (a, b) and write

~h := (h, k) := (x − a, y − b)

and thus:

f(a + h, b + k) ≈ f(a, b) + fx(a, b)h + fy(a, b)k + fxx(a, b) h²/2 + fxy(a, b)hk + fyy(a, b) k²/2.

In vector notation, which is convenient in higher dimensions, this can be rewritten as:

f(a + h, b + k) ≈ f(a, b) + ∇f(a, b) · ~h + (1/2) ~hᵀ Hf(a, b) ~h

where Hf(a, b) is the Hessian of f at the point (a, b), defined as the matrix of second partial derivatives:

Hf = [ fxx  fxy ]
     [ fyx  fyy ].

Here ~hᵀ denotes the row vector [h  k], the transpose of the column vector ~h.

If (a, b) is a critical point of f, then we have fx(a, b) = 0 and fy(a, b) = 0, and the quadratic approximation becomes:

f(a + h, b + k) ≈ f(a, b) + fxx(a, b) h²/2 + fxy(a, b)hk + fyy(a, b) k²/2
               = f(a, b) + (1/2) ~hᵀ Hf(a, b) ~h.
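Here is a small symbolic check of this last formula (a sketch using Python's sympy; the setup is an assumption, taking f and the critical point (2, 1) from Example 3.3): the difference between f(a + h, b + k) and the quadratic approximation contains only terms of degree 3 and higher in h and k.

import sympy as sp

x, y, h, k = sp.symbols('x y h k')
f = x**3 - 12*x*y + 8*y**3
a, b = 2, 1                                   # critical point of f from Example 3.3

H = sp.hessian(f, (x, y)).subs({x: a, y: b})  # Hessian matrix Hf(a, b)
hvec = sp.Matrix([h, k])

quadratic = f.subs({x: a, y: b}) + (hvec.T * H * hvec)[0, 0] / 2
remainder = sp.expand(f.subs({x: a + h, y: b + k}) - quadratic)
print(remainder)   # prints h**3 + 8*k**3: only cubic terms are left over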

4.2 Behavior close to a critical point

The Hessian Hf(a, b) should be viewed as a symmetric bilinear form on the tangent space of the domain D at the point (a, b). Given two direction vectors ~v and ~w, the Hessian outputs the number

~vᵀ Hf(a, b) ~w = D~w D~v f(a, b)

where D~v f(a, b) denotes the derivative of f in the (non-normalized) direction ~v at the point (a, b). For example, taking standard basis vectors ~v = ~ı and ~w = ~ı yields the number

~ıᵀ Hf(a, b) ~ı = D~ı D~ı f(a, b) = fxx(a, b)

and taking ~v = ~ı and ~w = ~ȷ yields the number

~ıᵀ Hf(a, b) ~ȷ = D~ȷ D~ı f(a, b) = fxy(a, b).

From linear algebra, we know that a symmetric bilinear form is classified, up to a change of basis, by its signature: how many of its eigenvalues are positive, zero, or negative. Let λ1, λ2 be the eigenvalues of Hf(a, b). Depending on the signs of λ1 and λ2, the bilinear form Hf(a, b) will be equivalent (up to a change of basis) to one of the following:

• [ 1  0 ; 0  1 ] if λ1 and λ2 are both positive.

• [ 1  0 ; 0  −1 ] if one λi is positive and the other is negative.

• [ −1  0 ; 0  −1 ] if λ1 and λ2 are both negative.

• [ 1  0 ; 0  0 ] if one λi is positive and the other is zero.

• [ −1  0 ; 0  0 ] if one λi is negative and the other is zero.

• [ 0  0 ; 0  0 ] if λ1 and λ2 are both zero.

Upshot: To analyze the behavior of f close to a critical point (a, b), the first step is to find the signs of the eigenvalues of the Hessian Hf (a, b). The discriminant of f was defined as the determinant of the Hessian, which is the product of the eigenvalues: D(a, b) = det Hf (a, b) = λ1λ2.

Definition 4.1. A critical point (a, b) of f is called non-degenerate if the Hessian Hf (a, b) is non-degenerate, i.e., has only non-zero eigenvalues. Otherwise, the critical point is called degenerate, which means that the Hessian Hf (a, b) has an eigenvalue 0.

Note that a critical point is degenerate if and only if the product of the eigenvalues is zero: λ1λ2 = 0; equivalently, the discriminant is zero: D(a, b) = 0.

Example 4.2. For all the functions described in Remark 3.2, the origin (0, 0) is a degenerate critical point of f. As we have seen, the function f can have all kinds of behaviors close to a degenerate critical point.

In contrast, close to a non-degenerate critical point, the behavior of f is dictated by the signature of the Hessian. If the discriminant satisfies D(a, b) ≠ 0, then the critical point (a, b) is non-degenerate, and its signature is one of the following:

• [ 1  0 ; 0  1 ]

• [ 1  0 ; 0  −1 ]

• [ −1  0 ; 0  −1 ].

The case of one positive eigenvalue and one negative eigenvalue happens if and only if the discriminant is negative: D = λ1λ2 < 0. In the case D > 0, there are still two possible signatures:

• [ 1  0 ; 0  1 ]

• [ −1  0 ; 0  −1 ].

To distinguish between the two cases, Sylvester’s criterion (from linear algebra) tells us that both eigenvalues are positive if and only if the top left entry of the matrix is positive: fxx(a, b) > 0.

Remark 4.3. For this last step, one could also use the equivalent condition fyy(a, b) > 0, or that the trace of the matrix is positive: fxx(a, b) + fyy(a, b) = λ1 + λ2 > 0.

Summary: Here is a reinterpretation of the second derivatives test.

• D = 0 ⇔ one of the eigenvalues λi is zero.

• D < 0 ⇔ one of the eigenvalues is positive and the other is negative.

• D > 0 and fxx > 0 ⇔ both eigenvalues λi are positive.

• D > 0 and fxx < 0 ⇔ both eigenvalues λi are negative.

Idea:

• If both eigenvalues are positive, then f qualitatively behaves like f(a + h, b + k) = f(a, b) + h² + k² when h and k are small.

• If one eigenvalue is positive and the other is negative, then f behaves like f(a + h, b + k) = f(a, b) + h² − k².

• If both eigenvalues are negative, then f behaves like f(a + h, b + k) = f(a, b) − h² − k².
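To tie this back to Example 3.3, one can compute the eigenvalues of the Hessian numerically. The sketch below uses Python with numpy (an assumption on tooling; the Hessian entries are the second partials computed in Example 3.3) and checks the signs of the eigenvalues at both critical points.

import numpy as np

def hessian_f(x, y):
    # Hessian of f(x, y) = x**3 - 12*x*y + 8*y**3 at the point (x, y).
    return np.array([[6.0 * x, -12.0],
                     [-12.0, 48.0 * y]])

for (a, b) in [(0.0, 0.0), (2.0, 1.0)]:
    eigenvalues = np.linalg.eigvalsh(hessian_f(a, b))   # eigenvalues of a symmetric matrix
    print((a, b), eigenvalues)
# At (0, 0): one negative and one positive eigenvalue (so D = λ1λ2 < 0), a saddle point.
# At (2, 1): two positive eigenvalues (so D > 0 and fxx > 0), a local minimum.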
