Math 233, Fall 2001: Hessians and Unconstrained Optimization
The Big Picture: Second derivatives, whether in single- or multi-variable calculus, measure the rate of change in slopes (i.e. the curvature of the function f). What makes problems harder in multivariable calculus is that we have slopes in infinitely many directions (directional derivatives). So we somehow need to examine how this infinite collection of slopes changes in order to determine the curvature and shape of the function f near its critical points. This brings to mind something like second directional derivatives. We summarized the information about slopes by creating a vector of partial derivatives, the gradient. In a similar way, we can summarize the known information about the rate of change of slopes by creating a matrix of second partial derivatives, the Hessian. So here is what we know:

Function: $f(x, y)$

Gradient: $\nabla f(x, y) = f_x\,\hat{i} + f_y\,\hat{j}$

Hessian: $H_f(x, y) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$

For example, take the function $f(x, y) = 5xy^3$. Then the gradient is
$$\nabla f(x, y) = 5y^3\,\hat{i} + 15xy^2\,\hat{j}$$
and the Hessian is
$$H_f(x, y) = \begin{pmatrix} 0 & 15y^2 \\ 15y^2 & 30xy \end{pmatrix}$$
(note that $f_{xy} = f_{yx}$, as it almost surely will be*).

* Mathematicians use the phrase "almost surely" when we know that there is a special case (not dealt with at this level) that prevents us from saying "always".

We can evaluate these functions at specific points, for example when $x = -3$ and $y = 2$:
$$\nabla f(-3, 2) = 40\,\hat{i} - 180\,\hat{j} \quad \text{(so it isn't a critical point)} \qquad \text{and} \qquad H_f(-3, 2) = \begin{pmatrix} 0 & 60 \\ 60 & -180 \end{pmatrix}.$$

Some Matrix Theory: Suppose that we have an n row by n column (i.e. square) matrix M. Then M is:

Positive Definite if the determinants of all of its leading principal submatrices (the submatrices made up of its first k rows and columns, for k = 1, ..., n) are positive. Another test for positive definiteness is that the eigenvalues of M are all positive real numbers.

Negative Definite if the determinants of all of its leading principal submatrices are nonzero and alternate in sign, with the first being negative. Another test for negative definiteness is that the eigenvalues of M are all negative real numbers.

If we happen to have a few zero determinants (which will imply a zero eigenvalue; we'll see why next semester!), then we can say that M is:

Positive Semi-definite if the determinants of all of its principal submatrices are non-negative, or if all of its eigenvalues are non-negative real numbers. Note that any positive definite matrix also satisfies the definition of positive semi-definite.

Negative Semi-definite if the determinants of all of its principal submatrices alternate in sign, starting with a negative (with the allowance here of 0 determinants replacing one or more of the positive or negative values). A better test is to check whether all of its eigenvalues are non-positive real numbers. Note that any negative definite matrix also satisfies the definition of negative semi-definite.

Why we care: Let's look at the function $f(x, y) = 40 + x^3(x - 4) + 3(y - 5)^2$, which has first partial derivatives $f_x = x^2(4x - 12)$ and $f_y = 6(y - 5)$, and the critical points (3, 5) and (0, 5). According to the test from the book, we would look at the "discriminant" $f_{xx} f_{yy} - f_{xy}^2$, which is exactly the determinant of the Hessian matrix!

Let's calculate the second partial derivatives: $f_{xx} = 12x^2 - 24x$, $f_{yy} = 6$, and $f_{xy} = 0$. This gives us the Hessian matrix
$$H_f(x, y) = \begin{pmatrix} 12x^2 - 24x & 0 \\ 0 & 6 \end{pmatrix}.$$

At the critical point (3, 5), here is the Hessian matrix:
$$H_f(3, 5) = \begin{pmatrix} 36 & 0 \\ 0 & 6 \end{pmatrix}.$$
Note that the discriminant is $(36)(6) - 0 = 216$, which is greater than 0.
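For a quick numerical sanity check, here is a minimal sketch using numpy (an illustration added here, with my own variable names) that applies both tests from the matrix theory section, the leading principal minor test and the eigenvalue test, to this Hessian:

```python
import numpy as np

# Hessian of f(x, y) = 40 + x^3(x - 4) + 3(y - 5)^2 at the critical point (3, 5)
H = np.array([[36.0, 0.0],
              [0.0, 6.0]])

# Test 1: determinants of the leading principal submatrices (first k rows and columns)
leading_minors = [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]
print("leading principal minors:", leading_minors)  # ~[36.0, 216.0], both positive

# Test 2: eigenvalues (H is symmetric, so eigvalsh applies)
eigenvalues = np.linalg.eigvalsh(H)
print("eigenvalues:", eigenvalues)                   # [6., 36.], both positive

# Both tests agree: this Hessian is positive definite
print("positive definite:", bool(np.all(eigenvalues > 0)))
```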
Thus we know we have a maximum or a minimum (we've only ruled out a saddle point). If we think in terms of the matrix theory, at this point we can't rule out Positive Definite or Negative Definite (since both would have a determinant > 0 for the whole matrix). Looking at the sign of $f_{xx}(3, 5) = 36 > 0$, we see that we have a minimum. This is analogous to checking the sign of the determinant of the first (1 by 1) leading principal submatrix. Thus, all of the leading principal submatrices have positive determinants and the Hessian matrix is positive definite at the critical point (3, 5).

A Pause for some Theory: The Hessian matrix is negative semi-definite at every unconstrained local maximum and positive semi-definite at every unconstrained local minimum. A critical point of a function f is an unconstrained local maximum if the Hessian matrix at the critical point is negative definite. A critical point of a function f is an unconstrained local minimum if the Hessian matrix at the critical point is positive definite. A critical point of a function f is a saddle point if the Hessian matrix at the critical point is neither positive semi-definite nor negative semi-definite.

Back to our example: Let's look at the critical point (0, 5). Here is the Hessian matrix:
$$H_f(0, 5) = \begin{pmatrix} 0 & 0 \\ 0 & 6 \end{pmatrix}.$$
Note that the discriminant is $(0)(6) - 0$, which is equal to 0. Thus, according to the book, we are unable to classify the critical point. Using our new-found powers, we can discriminate further. A quick check of the set of eigenvalues for this matrix yields {0, 6}. Thus, the Hessian matrix is positive semi-definite at the critical point (0, 5). We therefore know that we cannot have a local maximum (since all local maxima have negative semi-definite Hessians), so we have either a local minimum or a saddle point.

The Big Picture Again: So, no matter how many variables we have in our problem, all we have to do is determine whether the Hessian matrix is positive or negative definite at a critical point to classify that critical point exactly as a minimum or a maximum. If we are lucky enough to have the Hessian matrix be positive semi-definite or negative semi-definite at all points (not just the critical point we are looking at), then we even know that we have a global minimum or maximum, respectively.

Another Tangent into the Realm of Mathworld… If the Hessian matrix H(x) is positive semi-definite at all x in its domain, then the function f(x) is convex. If a function is convex, then any critical points will be global minima (if there are multiple critical points, they will all have the same function value). If the Hessian matrix H(x) is negative semi-definite at all x in its domain, then the function f(x) is concave. If a function is concave, then any critical points will be global maxima (if there are multiple critical points, they will all have the same function value). The ties between positive/negative semi-definiteness and convexity/concavity are actually stronger than I stated them above: they are equivalent statements (i.e. you could reverse the if and then parts of the sentences). Only for the few souls who wish to journey into the mathematical land of functional theory….
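To tie the whole procedure together, here is a minimal sketch using sympy that rebuilds the gradient and the Hessian of the example $f(x, y) = 40 + x^3(x - 4) + 3(y - 5)^2$ symbolically, solves for the critical points, and reports the eigenvalues of the Hessian at each one (the code and variable names are my own illustrative choices):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = 40 + x**3 * (x - 4) + 3 * (y - 5)**2

# Gradient: the vector of first partial derivatives
grad = [sp.diff(f, v) for v in (x, y)]

# Hessian: the matrix of second partial derivatives
H = sp.hessian(f, (x, y))

# Critical points: where both first partials vanish
critical_points = sp.solve(grad, (x, y), dict=True)

for cp in critical_points:
    H_cp = H.subs(cp)
    eigenvalues = sorted(H_cp.eigenvals().keys())
    print(cp, H_cp.tolist(), eigenvalues)
# Expected: at (3, 5) the eigenvalues are {6, 36} -> positive definite, a local minimum
#           at (0, 5) the eigenvalues are {0, 6}  -> positive semi-definite, inconclusive
```

The same eigenvalue check works for any number of variables, which is exactly why the Hessian test scales beyond the book's two-variable discriminant.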