Constrained Optimization
Chapter 5

Constrained Optimization

Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest in general nonlinearly constrained optimization theory and methods in this chapter.

Recall the statement of a general optimization problem,

    minimize        $f(x)$                                                  (5.1)
    with respect to $x \in \mathbb{R}^n$                                    (5.2)
    subject to      $\hat{c}_j(x) = 0, \quad j = 1, \ldots, \hat{m}$        (5.3)
                    $c_k(x) \geq 0, \quad k = 1, \ldots, m$                 (5.4)

Example 5.21. Graphical Solution of a Constrained Optimization Problem

Suppose we want to solve the following optimization problem,

    minimize        $f(x) = 4x_1^2 - x_1 - x_2 - 2.5$                       (5.5)
    with respect to $x_1, x_2$                                              (5.6)
    subject to      $c_1(x) = x_2^2 - 1.5x_1^2 + 2x_1 - 1 \geq 0$           (5.7)
                    $c_2(x) = x_2^2 + 2x_1^2 - 2x_1 - 4.25 \leq 0$          (5.8)

How can we solve this? One intuitive way would be to make a contour plot of the objective function, overlay the constraints, and determine the feasible region, as shown in Fig. 5.1. By inspection, it is easy to see where the constrained optimum is. At the optimum, only one of the constraints is active. However, we want to be able to find the minimum numerically, without having to plot the functions, since plotting is impractical in the general case.

We can see that in this case the feasible space comprises two disconnected regions. In addition, although all the functions involved are smooth, the boundaries of the feasible regions are nonsmooth. These characteristics complicate the numerical solution of constrained optimization problems.

5.1 Optimality Conditions for Constrained Problems

The optimality conditions for nonlinearly constrained problems are important because they form the basis for algorithms for solving such problems.

Figure 5.1: Example contours and feasible regions for a simple constrained optimization problem.

5.1.1 Nonlinear Equality Constraints

Suppose we have the following optimization problem with equality constraints,

    minimize        $f(x)$
    with respect to $x \in \mathbb{R}^n$
    subject to      $\hat{c}_j(x) = 0, \quad j = 1, \ldots, \hat{m}$

To solve this problem, we could solve for $\hat{m}$ components of $x$ by using the equality constraints to express them in terms of the other components. The result would be an unconstrained problem with $n - \hat{m}$ variables. However, this procedure is only feasible for simple explicit functions. Joseph Louis Lagrange is credited with developing a more general method to solve this problem, which we now review.

At a stationary point, the total differential of the objective function has to be equal to zero, i.e.,

    $df = \frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2 + \cdots + \frac{\partial f}{\partial x_n} dx_n = \nabla f^T dx = 0.$    (5.9)

Unlike in unconstrained optimization, the infinitesimal vector $dx = [dx_1, dx_2, \ldots, dx_n]^T$ is not arbitrary here; if $x$ is the sought-after minimum, then the perturbation $x + dx$ must be feasible: $\hat{c}_j(x + dx) = 0$. Consequently, the above equation does not imply that $\nabla f = 0$.

For a feasible point, the total differential of each of the constraints $(\hat{c}_1, \ldots, \hat{c}_{\hat{m}})$ must also be zero:

    $d\hat{c}_j = \frac{\partial \hat{c}_j}{\partial x_1} dx_1 + \cdots + \frac{\partial \hat{c}_j}{\partial x_n} dx_n = \nabla \hat{c}_j^T dx = 0, \quad j = 1, \ldots, \hat{m}$    (5.10)

To interpret the above equation, recall that the gradient of a function is orthogonal to its contours. Thus, since the displacement $dx$ satisfies $\hat{c}_j(x + dx) = 0$ (the equation for a contour), it follows that $dx$ is orthogonal to the gradient $\nabla \hat{c}_j$.
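As a small numerical illustration of (5.10), the sketch below (a minimal example, not part of the original notes; it assumes NumPy is available, and the helper name grad_fd and the chosen point are arbitrary) treats constraint $c_1$ from (5.7) as an equality, builds a displacement tangent to its contour at a point where $c_1 = 0$, and confirms that $\nabla c_1^T dx$ vanishes while $c_1(x + dx)$ changes only to second order.

```python
# Sketch: a displacement tangent to a constraint contour is orthogonal to the
# constraint gradient and preserves feasibility to first order (eq. 5.10).
import numpy as np

def c1(x):
    # Equality form of constraint (5.7): c1(x) = x2^2 - 1.5 x1^2 + 2 x1 - 1
    return x[1]**2 - 1.5 * x[0]**2 + 2.0 * x[0] - 1.0

def grad_fd(func, x, h=1e-6):
    """Forward-difference gradient of a scalar function (illustrative helper)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += h
        g[i] = (func(xp) - func(x)) / h
    return g

x = np.array([0.0, 1.0])      # a point on the contour c1(x) = 0
g = grad_fd(c1, x)            # approximately (2, 2) here

# Unit displacement orthogonal to the gradient, i.e. tangent to the contour.
t = np.array([g[1], -g[0]])
t /= np.linalg.norm(t)

eps = 1e-3
dx = eps * t
print("grad(c1)^T dx =", g @ dx)       # ~ 0
print("c1(x + dx)    =", c1(x + dx))   # O(eps^2), about -2.5e-7
```

The feasible perturbations are thus confined, to first order, to the tangent space of the constraints, which is exactly the restriction exploited in the argument that follows.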
Lagrange suggested that one could multiply each constraint variation by a scalar $\hat{\lambda}_j$ and subtract it from the objective function,

    $df - \sum_{j=1}^{\hat{m}} \hat{\lambda}_j \, d\hat{c}_j = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} - \sum_{j=1}^{\hat{m}} \hat{\lambda}_j \frac{\partial \hat{c}_j}{\partial x_i} \right) dx_i = 0.$    (5.11)

Notice what has happened: the components of the infinitesimal vector $dx$ have become independent and arbitrary, because we have accounted for the constraints. Thus, for this equation to be satisfied, we need a vector $\hat{\lambda}$ such that the expression inside the parentheses vanishes, i.e.,

    $\frac{\partial f}{\partial x_i} - \sum_{j=1}^{\hat{m}} \hat{\lambda}_j \frac{\partial \hat{c}_j}{\partial x_i} = 0, \quad i = 1, 2, \ldots, n,$    (5.12)

which is a system of $n$ equations in $n + \hat{m}$ unknowns. To close the system, we recognize that the $\hat{m}$ constraints must also be satisfied.

Suppose we define a function as the objective function minus a weighted sum of the constraints,

    $\mathcal{L}(x, \hat{\lambda}) = f(x) - \sum_{j=1}^{\hat{m}} \hat{\lambda}_j \hat{c}_j(x) = f(x) - \hat{\lambda}^T \hat{c}(x).$    (5.13)

We call this function the Lagrangian of the constrained problem, and the weights the Lagrange multipliers. A stationary point of the Lagrangian with respect to both $x$ and $\hat{\lambda}$ satisfies

    $\frac{\partial \mathcal{L}}{\partial x_i} = \frac{\partial f}{\partial x_i} - \sum_{j=1}^{\hat{m}} \hat{\lambda}_j \frac{\partial \hat{c}_j}{\partial x_i} = 0, \quad i = 1, \ldots, n,$    (5.14)

    $\frac{\partial \mathcal{L}}{\partial \hat{\lambda}_j} = -\hat{c}_j = 0, \quad j = 1, \ldots, \hat{m}.$    (5.15)

Thus, a stationary point of the Lagrangian encapsulates our required conditions: the constraints are satisfied and the gradient conditions (5.12) hold. These first-order conditions are known as the Karush–Kuhn–Tucker (KKT) conditions. They are necessary conditions for the optimum of a constrained problem.

As in the unconstrained case, the first-order conditions are not sufficient to guarantee a local minimum. For this, we turn to the second-order sufficient conditions (which, as in the unconstrained case, are not necessary). For equality constrained problems we are concerned with the behaviour of the Hessian of the Lagrangian, denoted $\nabla_{xx}^2 \mathcal{L}(x, \hat{\lambda})$, at locations where the KKT conditions hold. In particular, we look for positive definiteness in the subspace defined by the linearized constraints. Geometrically, if we move away from a stationary point $(x^*, \hat{\lambda}^*)$ along a direction $w$ that satisfies the linearized constraints, the Lagrangian should look like a quadratic along this direction. To be precise, the second-order sufficient conditions are

    $w^T \nabla_{xx}^2 \mathcal{L}(x^*, \hat{\lambda}^*) w > 0$

for all $w \in \mathbb{R}^n$ such that

    $\nabla \hat{c}_j(x^*)^T w = 0, \quad j = 1, \ldots, \hat{m}.$

Figure 5.2: Contour plot for constrained problem (5.18).

Example 5.22. Problem with Single Equality Constraint

Consider the following equality constrained problem:

    minimize        $f(x) = x_1 + x_2$                                      (5.16)
    with respect to $x_1, x_2$                                              (5.17)
    subject to      $\hat{c}_1(x) = x_1^2 + x_2^2 - 2 = 0$                  (5.18)

The objective function and constraint of the above problem are shown in Fig. 5.2. By inspection we can see that the feasible region for this problem is a circle of radius $\sqrt{2}$. The solution $x^*$ is obviously $(-1, -1)^T$. From any other point on the circle it is easy to find a way to move within the feasible region (the boundary of the circle) while decreasing $f$.

In this example, the Lagrangian is

    $\mathcal{L} = x_1 + x_2 - \hat{\lambda}_1 (x_1^2 + x_2^2 - 2),$    (5.19)

and the optimality conditions are

    $\nabla_x \mathcal{L} = \begin{bmatrix} 1 - 2\hat{\lambda}_1 x_1 \\ 1 - 2\hat{\lambda}_1 x_2 \end{bmatrix} = 0 \quad \Rightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{2\hat{\lambda}_1} \\ \frac{1}{2\hat{\lambda}_1} \end{bmatrix},$

    $\nabla_{\hat{\lambda}_1} \mathcal{L} = -(x_1^2 + x_2^2 - 2) = 0 \quad \Rightarrow \quad \hat{\lambda}_1 = \pm \frac{1}{2}.$

To establish which of these stationary points are minima, as opposed to other types of stationary points, we need to look at the second-order conditions.
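Before doing so, note that the stationary points can also be obtained numerically by treating the first-order conditions as a root-finding problem in $(x_1, x_2, \hat{\lambda}_1)$. The sketch below is one way to do this (not part of the original notes; it uses scipy.optimize.fsolve, and the starting guesses are arbitrary choices), recovering both roots $\hat{\lambda}_1 = \pm\frac{1}{2}$.

```python
# Sketch: solve the first-order system of Example 5.22 as a root-finding
# problem in the unknowns (x1, x2, lambda1).
import numpy as np
from scipy.optimize import fsolve

def kkt_residual(z):
    x1, x2, lam = z
    return [1.0 - 2.0 * lam * x1,       # dL/dx1 = 0
            1.0 - 2.0 * lam * x2,       # dL/dx2 = 0
            x1**2 + x2**2 - 2.0]        # constraint c1(x) = 0

# One starting guess near each stationary point (arbitrary choices).
for z0 in ([-1.2, -0.8, -0.4], [1.2, 0.8, 0.4]):
    x1, x2, lam = fsolve(kkt_residual, z0)
    print(f"x = ({x1:+.3f}, {x2:+.3f}), lambda1 = {lam:+.3f}")
```

Both roots satisfy the first-order conditions; the second-order analysis that follows determines which one is the minimum.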
Directions $w = (w_1, w_2)^T$ that satisfy the linearized constraints are given by

    $\nabla \hat{c}_1(x^*)^T w = \frac{1}{\hat{\lambda}_1} (w_1 + w_2) = 0 \quad \Rightarrow \quad w_2 = -w_1,$

while the Hessian of the Lagrangian at the stationary points is

    $\nabla_{xx}^2 \mathcal{L} = \begin{bmatrix} -2\hat{\lambda}_1 & 0 \\ 0 & -2\hat{\lambda}_1 \end{bmatrix}.$

Consequently, the Hessian of the Lagrangian in the subspace defined by $w$ is

    $w^T \nabla_{xx}^2 \mathcal{L}(x^*) w = \begin{bmatrix} w_1 & -w_1 \end{bmatrix} \begin{bmatrix} -2\hat{\lambda}_1 & 0 \\ 0 & -2\hat{\lambda}_1 \end{bmatrix} \begin{bmatrix} w_1 \\ -w_1 \end{bmatrix} = -4\hat{\lambda}_1 w_1^2.$

In this case $\hat{\lambda}_1^* = -\frac{1}{2}$ corresponds to a positive-definite Hessian (in the subspace defined by $w$) and, therefore, the solution to the problem is $(x_1, x_2)^T = \left( \frac{1}{2\hat{\lambda}_1}, \frac{1}{2\hat{\lambda}_1} \right)^T = (-1, -1)^T$.

Also note that at the solution the constraint normal $\nabla \hat{c}_1(x^*)$ is parallel to $\nabla f(x^*)$, i.e., there is a scalar $\hat{\lambda}_1^*$ such that

    $\nabla f(x^*) = \hat{\lambda}_1^* \nabla \hat{c}_1(x^*).$

We can derive this expression by examining the first-order Taylor series approximations to the objective and constraint functions. To retain feasibility with respect to $\hat{c}_1(x) = 0$ we require that $\hat{c}_1(x + d) = 0$, and expanding,

    $\hat{c}_1(x + d) = \underbrace{\hat{c}_1(x)}_{= 0} + \nabla \hat{c}_1^T(x) d + \mathcal{O}(d^T d).$

Linearizing, we get

    $\nabla \hat{c}_1^T(x) d = 0.$

We also know that a direction of improvement must result in a decrease in $f$, i.e.,

    $f(x + d) - f(x) < 0.$

Thus to first order we require that

    $f(x) + \nabla f^T(x) d - f(x) < 0 \quad \Rightarrow \quad \nabla f^T(x) d < 0.$

A necessary condition for optimality is that there be no direction satisfying both of these conditions. The only way such a direction cannot exist is if $\nabla f(x)$ and $\nabla \hat{c}_1(x)$ are parallel, that is, if $\nabla f(x) = \hat{\lambda}_1 \nabla \hat{c}_1(x)$ holds.

By defining the Lagrangian function

    $\mathcal{L}(x, \hat{\lambda}_1) = f(x) - \hat{\lambda}_1 \hat{c}_1(x),$    (5.20)

and noting that $\nabla_x \mathcal{L}(x, \hat{\lambda}_1) = \nabla f(x) - \hat{\lambda}_1 \nabla \hat{c}_1(x)$, we can state the necessary optimality condition as follows: at the solution $x^*$ there is a scalar $\hat{\lambda}_1^*$ such that $\nabla_x \mathcal{L}(x^*, \hat{\lambda}_1^*) = 0$.
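As a closing sanity check, the conditions derived above for Example 5.22 can be verified directly. The short sketch below (not part of the original notes; it assumes NumPy, and the gradients are hard-coded from $f = x_1 + x_2$ and $\hat{c}_1 = x_1^2 + x_2^2 - 2$) confirms that $\nabla f(x^*) = \hat{\lambda}_1^* \nabla \hat{c}_1(x^*)$ and that the Hessian of the Lagrangian is positive along the feasible direction $w = (1, -1)^T$.

```python
# Sketch: check the first- and second-order conditions of Example 5.22
# at x* = (-1, -1) with lambda1* = -1/2.
import numpy as np

x_star = np.array([-1.0, -1.0])
lam_star = -0.5

grad_f = np.array([1.0, 1.0])      # gradient of f(x) = x1 + x2
grad_c1 = 2.0 * x_star             # gradient of c1(x) = x1^2 + x2^2 - 2

# First-order condition: grad f = lambda1 * grad c1
print(np.allclose(grad_f, lam_star * grad_c1))   # True

# Second-order condition: w^T H_L w > 0 for feasible directions, w2 = -w1
H_L = np.array([[-2.0 * lam_star, 0.0],
                [0.0, -2.0 * lam_star]])         # Hessian of the Lagrangian
w = np.array([1.0, -1.0])
print(w @ H_L @ w)                               # 2.0 > 0, so x* is a minimum
```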