
The Second Derivative Test


Hessians and the Second Derivative Test

Learning goals: students investigate the analog of concavity for multivariable functions and apply it to critical points to determine their nature.

In one variable, at a point where the derivative is zero we can look to the second derivative to determine if the point is a minimum or maximum. Geometrically, the second derivative can tell us if the graph is concave up or concave down. If, at a point where the derivative is zero, the function is concave up, the graph will “bend upward” in both directions as you move away from the critical point, meaning you will be at a minimum. Algebraically we can look at the Taylor expansion near the critical point: f(x + h) = f(x) + f′(x)h + ½f′′(x)h² + …. Now if the derivative term is zero, we get the graph of a + bh² near f(x). And this is a parabola that opens upward (making the vertex a minimum) if b is positive and a max if b is negative. Hence we have the second derivative test.
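As a quick numerical sketch of the one-variable test (illustrative only; the helper name is made up, and central differences stand in for the exact second derivative):

```python
# Estimate f''(x) with a central difference and read off min vs. max.
def second_derivative(f, x, h=1e-5):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# x**2 has a critical point at 0 with f''(0) = 2 > 0: a minimum.
assert second_derivative(lambda x: x**2, 0.0) > 0

# -x**2 has f''(0) = -2 < 0: a maximum.
assert second_derivative(lambda x: -x**2, 0.0) < 0
```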

Now let’s move to more variables, where we have f(x0 + Δx) ≈ f(x0) + H(Δx) near a critical point. The Hessian function H is quadratic in all the pieces of Δx. So is there some kind of second derivative test for multivariable functions?

Definition: a function is called positive definite if its output is always positive, except perhaps at the origin. If the function is always positive or zero (i.e. nonnegative) for all x then it is called positive semidefinite. Negative (semi)definite has analogous definitions. Functions that take on both positive and negative values are called indefinite.

For instance, f(x, y) = x2 + y2 is positive definite, for as long as (x, y) ≠ (0, 0) the output is obviously positive. By comparison, g(x, y) = (x + y)2 is positive semidefinite, because it is positive unless y = –x and then its output is zero.

Now it’s all very simple—at a critical point, the function has a minimum if the Hessian is positive definite! For then any way that you move away from the critical point, you are adding something to the function, so it gets larger, so where you were must have been the smallest value—a minimum.

Theorem: (multivariable second derivative test) At a critical point, if the Hessian function is positive (negative) definite, then the function has a minimum (maximum). If the Hessian is indefinite, the critical point is a saddle—you go up in some directions and down in others. If the Hessian is semidefinite, you cannot tell what is happening without further analysis, though if it is positive semidefinite you cannot have a maximum, and if it is negative semidefinite you cannot have a minimum.

We’ve already discussed the definite parts of this theorem—if the Hessian is positive definite, you have a minimum because you go up anytime you go away from the point. The indefinite part is equally clear, because indefinite tells you that some directions cause the function to go up, others down. The semidefinite parts are equally simple. For positive semidefinite you can’t have a maximum because in some directions you go up. In others, the Hessian stays zero, and you might go up or down depending on higher-order terms there.

Example: in f(x, y) = cos(x) + cos(y), the origin is a critical point, as is easy enough to check. What is the Hessian function there? Well, ∂²f/∂x² = –cos(x), and at the origin this is –1. Same for ∂²f/∂y². The mixed partials are both zero. So the Hessian function is –½(Δx² + Δy²). This is always negative for Δx and/or Δy ≠ 0, so the Hessian is negative definite and the function has a maximum. This should be obvious since cosine has a max at zero.

Example: for h(x, y) = x² + y⁴, the origin is clearly a minimum, but the Hessian is just Δx², which is positive semidefinite (it remains zero for inputs where Δx = 0 but Δy ≠ 0). On the other hand, the function j(x, y) = x² – y⁴ has exactly the same Hessian, but clearly has neither a min nor a max at the origin.
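The contrast can be probed numerically; this sketch (not part of the notes) evaluates both functions along the y-axis, the direction the Hessian misses:

```python
# h and j share the same positive semidefinite Hessian at the origin,
# yet h has a minimum there and j does not.
h = lambda x, y: x**2 + y**4
j = lambda x, y: x**2 - y**4

eps = 0.1
print(h(0, eps) - h(0, 0))  # positive: h rises along the y-axis
print(j(0, eps) - j(0, 0))  # negative: j falls along the y-axis
```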

So now the question becomes: how can we tell if the Hessian function is some kind of definite? Let’s analyze the two-variable version.

We have

H(Δx, Δy) = ½ (Δx  Δy) ⎛ a  b ⎞ ⎛ Δx ⎞ = ½ (aΔx² + 2bΔxΔy + cΔy²).
                       ⎝ b  c ⎠ ⎝ Δy ⎠

Is there a way for us to tell whether this will be positive (say) for all possible inputs Δx and Δy?

Clearly, we can put in Δy = 0, and get aΔx²/2. The only way that this is always positive is to make a > 0. Similarly c > 0. But that is not enough! For instance, a = c = 1 and b = 2 gives you Δx² + 4ΔxΔy + Δy², which is negative if you plug in Δx = –Δy. We need to look more carefully.
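To see the failure concretely, here is a small sketch (illustrative only) evaluating that quadratic form with a = c = 1 and b = 2:

```python
# Positive diagonal entries alone do not guarantee positive definiteness.
def Q(dx, dy, a=1.0, b=2.0, c=1.0):
    """The quadratic form a*dx**2 + 2*b*dx*dy + c*dy**2 (the 1/2 is dropped)."""
    return a * dx**2 + 2 * b * dx * dy + c * dy**2

print(Q(1.0, 1.0))   # 6.0: positive here...
print(Q(1.0, -1.0))  # -2.0: ...but negative here, so the form is indefinite
```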

I know—let’s get rid of the mixed term by completing the square. (By the way, we can also ignore the ½, since all we care about is positive vs. negative.) So we get

aΔx² + 2bΔxΔy + (b²/a)Δy² − (b²/a)Δy² + cΔy² = a(Δx + (b/a)Δy)² + ((ac − b²)/a)Δy².

Now we have a sum of squares. It is easy to tell what happens. Both coefficients must be positive to guarantee positive definiteness. So a > 0 and also ac − b² > 0. Notice something interesting: this last combination is the determinant of the Hessian matrix.

So, given the Hessian matrix, we can tell if the Hessian function is positive definite if the top left entry and then the determinant are both positive. What about a function of three or more variables? Well, a 3 × 3 Hessian matrix would look like

⎛ a b d ⎞
⎜ b c e ⎟
⎝ d e f ⎠ .

If we set Δz = 0, we would need just the stuff dealing with Δx and Δy to be positive definite, which is the stuff in the upper left corner of this matrix. So we need a > 0 and ac − b² > 0 like before. Unsurprisingly, we’d also need the 3 × 3 determinant to be positive as well.

We will call a matrix positive definite, positive semidefinite, etc. if the corresponding function is, and vice versa. That is, a matrix A is positive definite if and only if every non-zero vector x leads to xᵀAx positive. Then a positive definite matrix gives us a positive definite Hessian function. Though we haven’t proven it, we have seen that it is reasonable for the following theorem to be true:

Theorem: a matrix

⎛ a11 a12 … a1n ⎞
⎜ a21 a22 … a2n ⎟
⎜  ⋮   ⋮   ⋱  ⋮ ⎟
⎝ an1 an2 … ann ⎠

is positive definite if and only if the determinants of each square matrix in the upper left,

a11 ,   | a11 a12 | , … ,   | a11 a12 … a1n |
        | a21 a22 |         | a21 a22 … a2n |
                            |  ⋮   ⋮   ⋱  ⋮ |
                            | an1 an2 … ann | ,

are all positive.
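The theorem turns directly into a computation; here is a sketch using NumPy for the determinants (an assumption—any determinant routine would do, and the function name is made up):

```python
import numpy as np

def is_positive_definite(A):
    """Test: every upper-left square submatrix must have positive determinant."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))  # True: determinants 2, 3
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))  # False: determinants 1, -3
```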

We don’t have to start in the top left and work our way down. It works with any nested sequence of centrally symmetric submatrices. For instance, in the 5 × 5 matrix, we could take

a22 ,   | a22 a25 | ,   | a11 a12 a15 | ,   | a11 a12 a13 a15 |
        | a52 a55 |     | a21 a22 a25 |     | a21 a22 a23 a25 |
                        | a51 a52 a55 |     | a31 a32 a33 a35 |
                                            | a51 a52 a53 a55 |

and the determinant of the whole matrix all positive.

Example: Test whether

⎛ 3 2 0 ⎞
⎜ 2 4 1 ⎟
⎝ 0 1 1 ⎠

is positive definite.

Solution: For variety, we start in the middle. The middle entry is 4, which is > 0. The 2 × 2 in the lower right has determinant 4⋅1 – 1⋅1 = 3 > 0, and the determinant of the entire matrix is 5, which is also positive, so this matrix is positive definite.
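The same example can be checked starting from the top left instead; a quick sketch (NumPy assumed):

```python
import numpy as np

A = np.array([[3.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [0.0, 1.0, 1.0]])

# Upper-left determinants: 3, then 3*4 - 2*2 = 8, then det(A) = 5.
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(minors)  # all three are positive, so A is positive definite
```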

Now, of course, the question becomes how to tell when a matrix is negative definite and gives us a maximum. Hint: it is not all the determinants being negative! Looking back to the formula where we completed the square, we would need a < 0, and also (ac − b²)/a < 0. But since a < 0 that makes the 2 × 2 determinant positive. Hmm.

A nifty trick is this: if g has a maximum, then –g has a minimum. So we change the signs of all the terms in the Hessian matrix and determine if this new matrix is positive definite. But we know that to compute determinants, we can pull constant factors—like –1—out of each row. If we pull it out of one row, we get negative of the answer. But two rows, and the two negatives cancel—we would get the same sign back. Three negatives is back to negative again. And so forth. So we have learned that:
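The sign-flip trick translates directly into code; this sketch (NumPy assumed, name made up) reuses the upper-left determinant test on –A:

```python
import numpy as np

def is_negative_definite(A):
    """A is negative definite exactly when -A is positive definite."""
    B = -np.asarray(A, dtype=float)  # flip the sign of every entry
    n = B.shape[0]
    return all(np.linalg.det(B[:k, :k]) > 0 for k in range(1, n + 1))

# The Hessian matrix of cos(x) + cos(y) at the origin is negative definite.
print(is_negative_definite([[-1.0, 0.0], [0.0, -1.0]]))  # True
```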

Theorem: a matrix

⎛ a11 a12 … a1n ⎞
⎜ a21 a22 … a2n ⎟
⎜  ⋮   ⋮   ⋱  ⋮ ⎟
⎝ an1 an2 … ann ⎠

is negative definite if and only if the determinants of each square matrix in the upper left,

a11 ,   | a11 a12 | , … ,   | a11 a12 … a1n |
        | a21 a22 |         | a21 a22 … a2n |
                            |  ⋮   ⋮   ⋱  ⋮ |
                            | an1 an2 … ann | ,

alternate signs, starting with a negative.

That is, the 1 × 1 matrix should be negative, the 2 × 2 should have positive determinant, and so forth.

Theorem: Look at the string of signs of the determinants of increasing size along the diagonal of the Hessian matrix. If:

a) they are all positive, the matrix is positive definite, and we have a minimum

b) they alternate –, +, –, +, … starting with a negative, the matrix is negative definite and we have a maximum

c) any sign is wrong, the matrix is indefinite and we have a saddle

d) no sign is wrong but one or more terms in the string is zero, then the Hessian function is positive or negative semidefinite (according as the signs match a) or b)) and more delicate work is needed.
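The four cases can be bundled into one classifier; a sketch under the theorem’s assumptions (NumPy for determinants; the function name is made up):

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the signs of the upper-left determinants."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    minors = [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]
    if all(d > 0 for d in minors):
        return "minimum"                                 # case a)
    if all(d * (-1) ** k > 0 for k, d in enumerate(minors, start=1)):
        return "maximum"                                 # case b): -, +, -, ...
    if all(d != 0 for d in minors):
        return "saddle"                                  # case c): a sign is wrong
    return "needs more work"                             # case d): a zero appears

print(classify_critical_point([[2.0, 0.0], [0.0, 3.0]]))    # minimum
print(classify_critical_point([[-1.0, 0.0], [0.0, -1.0]]))  # maximum
print(classify_critical_point([[1.0, 0.0], [0.0, -1.0]]))   # saddle
```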

You might wonder if it is possible to measure the signs in one way (say from the top-left down) and get one kind of string, while going in a different order gives a different answer. The answer is no, but it is well beyond our scope to prove this. Take linear algebra next semester!