Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Robert M. Freund April, 2014 ⃝c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive in nonlinear (and linear) optimization models in a wide variety of engineering and mathematical settings. Some elementary examples of models where duality is plays an important role are: Electrical networks. The current flows are \primal variables" and the voltage differences are the \dual variables" that arise in consideration of optimization (and equilibrium) in electrical networks. Economic markets. The \primal" variables are production levels and consumption levels, and the \dual" variables are prices of goods and services. Structural design. The tensions on the beams are \primal" variables, and the nodal displacements are the \dual" variables. Nonlinear (and linear) duality is extremely useful both as a conceptual as well as a computational tool. For example, dual problems and their solutions are used in connection with: (a) Identifying near-optimal solutions. A good dual solution can be used to bound the values of primal solutions, and so can be used to actually identify when a primal solution is optimal or near-optimal. (b) Proving optimality. Using a strong duality theorem, one can prove optimality of a primal solution by constructing a dual solution with the same objective function value. (c) Sensitivity analysis of the primal problem. The dual variable on a constraint represents the incremental change in the optimal solution value per unit increase in the right-hand-side (RHS) of the constraint. (d) Karush-Kuhn-Tucker (KKT) conditions. The optimal solution to the dual problem is a vector of KKT multipliers. (e) Convergence of algorithms. The dual problem is often used in the convergence analysis of algorithms. (f) Discovering and Exploiting Problem Structure. Quite often, the dual problem has some useful mathematical, geometric, or compu- 3 tational structure that can exploited in computing solutions to both the primal and the dual problem. 2 The Dual Problem 2.1 Constructing the Dual Problem Recall the basic constrained optimization model: OP : minimumx f(x) s.t. g1(x) ≤ 0 · = · ≥ . gm(x) ≤ 0 x 2 X: n n In this model, we have f(x): R 7! R and gi(x): R 7! R; i = 1; : : : ; m. We can always convert the constraints involving gi(x) to have the format \gi(x) ≤ 0" , and so we will now presume that our problem is of the form: 4 ∗ OP : z = minimumx f(x) s.t. g1(x) ≤ 0 . gm(x) ≤ 0 x 2 X: We will refer to X as the \ground-set". Whereas in previous developments X was often assumed to be Rn, in our study of duality it will be useful to presume that X itself is composed of a structural part of the feasible region of the optimization problem. A typical example is when the feasible region of our problem of interest is the intersection of k inequality constraints: n F = fx 2 R : gi(x) ≤ 0; i = 1; : : : ; kg : Depending on the structural properties of the functions gi(·), we might wish re-order the constraints so that the first m of these k constraints (where of course m ≤ k) are kept explicitly as inequalities, and the last k − m constraints are formulated into the ground-set n X := fx 2 R : gi(x) ≤ 0; i = m + 1; : : : ; kg : Then it follows that: n F = fx 2 R : gi(x) ≤ 0; i = 1; : : : ; m; g \ X; where now there are only m explicit inequality constraints. More generally, the ground-set X could be of any form, including the following: 5 (i) X = fx 2 Rn : x ≥ 0g, (ii) X = K where K is a convex cone, (iii) X = fx j gi(x) ≤ 0; i = m + 1; : : : ; m + kg, { } j 2 n (iv) X = x x Z+ , (v) X = fx 2 Rn : Nx = b; x ≥ 0g where \Nx = b" models flows in a network, (vi) X = fx j Ax ≤ bg, (vii) X = Rn . We now describe how to construct the dual problem of OP. The first step is to form the Lagrangian function. For a nonnegative vector u, the Lagrangian function is defined simply as follows: Xm T L(x; u) := f(x) + u g(x) = f(x) + uigi(x) : i=1 The Lagrangian essentially takes the constraints out of the description of the feasible region of the model, and instead places them in the the objective function with multipliers ui for i = 1; : : : ; m. It might be helpful to think of these multipliers as costs or prices or penalties. The second step is to construct the dual function L∗(u), which is defined as follows: ∗ T L (u) := minimumx L(x; u) = minimumx f(x) + u g(x) (1) s.t. x 2 X s.t. x 2 X: 6 Note that L∗(u) is defined for any given u. In order for the dual problem to be computationally viable, it will be necessary to be able to compute L∗(u) efficiently for any given u. Put another way, it will be necessary to be able to solve the optimization problem (1) efficiently for any given u. The third step is to write down the dual problem D, which is defined as follows: ∗ ∗ D: v = maximumu L (u) s.t. u ≥ 0 : 2.2 Summary: Steps in the Construction of the Dual Prob- lem Here is a summary of the process of constructing the dual problem of OP. The starting point is the primal problem: ∗ OP : z = minimumx f(x) s.t. gi(x) ≤ 0; i = 1; : : : ; m x 2 X: The dual problem D is constructed in the following three-step procedure: 1. Create the Lagrangian: L(x; u) := f(x) + uT g(x) : 7 2. Create the dual function: ∗ T L (u) := minimumx f(x) + u g(x) s.t. x 2 X: 3. Create the dual problem: ∗ ∗ D: v = maximumu L (u) s.t. u ≥ 0 : 2.3 Example: the Dual of a Linear Problem Consider the linear optimization problem: T LP : minimumx c x s.t. Ax ≥ b x ≥ 0 : We can rearrange the inequality constraints and re-write LP as: T LP : minimumx c x s.t. b − Ax ≤ 0 x ≥ 0 : 8 Let us define X := fx 2 Rn : x ≥ 0g and identify the linear inequality constraints \b−Ax ≤ 0" as the constraints \gi(x) ≤ 0, i = 1; : : : ; m". Therefore the Lagrangian for this problem is: L(x; u) = cT x + uT (b − Ax) = uT b + (c − AT u)T x : The dual function L∗(u) is defined to be: ∗ L (u) = minx2X L(x; u) = minx≥0 L(x; u) : Let us now evaluate L∗(u). For a given value of u, L∗(u) is evaluated as follows: ∗ L (u) = minx≥0 L(x; u) T T T = minx≥0 u b + (c − A u) x 8 < uT b if AT u ≤ c = : −∞ if AT u ̸≤ c : The dual problem (D) is defined to be: ∗ ∗ (D) v = maxu≥0 L (u) ; and is therefore constructed as: ∗ T (D) v = maxu u b s.t. AT u ≤ c u ≥ 0 : 9 2.4 Remarks on Conventions involving ±∞ It is often the case when constructing dual problems that functions such as L∗(u) takes the values of ±∞ for certain values of u. Here we discuss conventions in this regard. Suppose we have an optimization problem: ∗ (P) : z = maximumx f(x) s.t. x 2 S: We could define the function: 8 < f(x) if x 2 S h(x) := : −∞ if x2 = S: Then we can rewrite our problem as: ∗ (P) : z = maximumx h(x) s.t. x 2 Rn : Conversely, suppose that we have a function k(·) that takes on the value −∞ outside of a certain region S, but that k(·) is finite for all x 2 S. Then the problem: ∗ (P) : z = maximumx k(x) s.t. x 2 Rn is equivalent to: ∗ (P) : z = maximumx k(x) s.t. x 2 S: Similar logic applies to minimization problems over domains where the function values might take on the value +1. 10 3 More Examples of Dual Constructions of Opti- mization Problems 3.1 The Dual of a Convex Quadratic Problem Consider the following convex quadratic optimization problem: 1 T T QP : minimumx 2 x Qx + c x s.t. Ax ≥ b ; where Q is symmetric and positive definite. To construct a dual of this problem, let us re-write the inequality constraints as b − Ax ≤ 0 and dualize these constraints. We construct the Lagrangian: 1 T T T − L(x; u) = 2 x Qx + c x + u (b Ax) T − T T 1 T = u b + (c A u) x + 2 x Qx : The dual function L∗(u) is: ∗ L (u) = minx2Rn L(x; u) T − T T 1 T = minx2Rn u b + (c A u) x + 2 x Qx T − T T 1 T = u b + minx2Rn (c A u) x + 2 x Qx : For a given value of u, letx ~ be the optimal value of x in this last expression. Then since the expression is a convex quadratic function of x, it follows that x~ must satisfy: T 0 = rxL(~x; u) = (c − A u) + Qx~ ; wherebyx ~ is given by: 11 x~ = −Q−1(c − AT u) : Substituting this value of x into the Lagrangian, we obtain: ∗ T − 1 − T T −1 − T L (u) = L(~x; u) = u b 2 (c A u) Q (c A u) : The dual problem (D) is defined to be: ∗ ∗ (D) v = maxu≥0 L (u) ; and is therefore constructed as: ∗ T − 1 − T T −1 − T (D) v = maxu u b 2 (c A u) Q (c A u) s.t.

Load more