Convex Optimization 5: Duality

Prof. Ying Cui

Department of Electrical Engineering, Shanghai Jiao Tong University

2018

SJTU Ying Cui 1 / 46

Outline

Lagrange dual function

Lagrange dual problem

Geometric interpretation

Optimality conditions

Perturbation and sensitivity analysis

Examples

Generalized inequalities

Lagrangian

standard form problem (not necessarily convex)

    min  f0(x)
    s.t. fi(x) ≤ 0, i = 1, ..., m
         hi(x) = 0, i = 1, ..., p

domain D = (∩_{i=0}^m dom fi) ∩ (∩_{i=1}^p dom hi), optimal value p∗

◮ basic idea in Lagrangian duality: take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions

Lagrangian L : Rⁿ × Rᵐ × Rᵖ → R, with dom L = D × Rᵐ × Rᵖ:

    L(x, λ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x)

◮ weighted sum of objective and constraint functions
◮ λi is the Lagrange multiplier associated with fi(x) ≤ 0
◮ νi is the Lagrange multiplier associated with hi(x) = 0

Lagrange dual function

Lagrange dual function (or dual function) g : Rᵐ × Rᵖ → R:

    g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x) )

◮ g is concave even when the problem is not convex, as it is the pointwise infimum of a family of affine functions of (λ, ν)
    ◮ the pointwise minimum or infimum of concave functions is concave
◮ g can be −∞ when L is unbounded below in x

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λ ⪰ 0 and any ν,

    g(λ, ν) ≤ p∗

◮ the inequality holds but is vacuous when g(λ, ν) = −∞
◮ the dual function gives a nontrivial lower bound only when λ ⪰ 0 and (λ, ν) ∈ dom g, i.e., g(λ, ν) > −∞
◮ refer to (λ, ν) with λ ⪰ 0 and (λ, ν) ∈ dom g as dual feasible

proof: suppose x̃ is feasible, i.e., fi(x̃) ≤ 0 and hi(x̃) = 0, and λ ⪰ 0. Then

    ∑_{i=1}^m λi fi(x̃) + ∑_{i=1}^p νi hi(x̃) ≤ 0  ⟹  L(x̃, λ, ν) ≤ f0(x̃)

Hence,

    g(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f0(x̃)

Minimizing over all feasible x̃ gives p∗ ≥ g(λ, ν).

Examples

Least-norm solution of linear equations

    min  xᵀx
    s.t. Ax = b

dual function:
◮ to minimize L(x, ν) = xᵀx + νᵀ(Ax − b) over x (an unconstrained convex problem), set the gradient equal to zero:

    ∇x L(x, ν) = 2x + Aᵀν = 0  ⟹  x = −(1/2)Aᵀν

◮ plug into L(x, ν) to obtain g:

    g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4)νᵀAAᵀν − bᵀν

  which is a concave quadratic function of ν, as −AAᵀ ⪯ 0

lower bound property: p∗ ≥ −(1/4)νᵀAAᵀν − bᵀν, for all ν
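As a quick numerical sanity check on the least-norm example (a hand-built instance, not from the slides), the sketch below takes A = [1 1], b = 1, so the primal optimum is x = (1/2, 1/2) with p∗ = 1/2, and verifies that g(ν) is a lower bound for every ν and is tight at ν = −1:

```python
# Least-norm example check: g(nu) = -(1/4) nu^T A A^T nu - b^T nu.
# With A = [1 1] we have A A^T = 2, and p* = 1/2 at x = (1/2, 1/2).
# All names and the instance are illustrative assumptions.

def g(nu, AAt=2.0, b=1.0):
    """Dual function for the single-equality least-norm instance."""
    return -0.25 * AAt * nu * nu - b * nu

p_star = 0.5  # optimal value of min x1^2 + x2^2 s.t. x1 + x2 = 1

# lower bound property: g(nu) <= p* for every nu
for nu in [-3.0, -1.0, 0.0, 1.0, 2.5]:
    assert g(nu) <= p_star + 1e-12

# the bound is tight at nu = -1 (strong duality for this convex problem)
assert abs(g(-1.0) - p_star) < 1e-12
```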

Examples

Standard form LP

    min  cᵀx
    s.t. Ax = b, x ⪰ 0

dual function:
◮ Lagrangian

    L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx = −bᵀν + (c + Aᵀν − λ)ᵀx

  is affine in x (bounded below only when identically zero)
◮ dual function

    g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν,  Aᵀν − λ + c = 0
                                  { −∞,    otherwise

lower bound property: nontrivial only when λ ⪰ 0 and Aᵀν − λ + c = 0; hence p∗ ≥ −bᵀν if Aᵀν + c ⪰ 0

Examples

Two-way partitioning problem (W ∈ Sⁿ)

    min  xᵀWx
    s.t. xi² = 1, i = 1, ..., n

◮ a nonconvex problem with 2ⁿ discrete feasible points
◮ find the two-way partition of {1, ..., n} with least total cost
    ◮ Wij is the cost of assigning i, j to the same set
    ◮ −Wij is the cost of assigning i, j to different sets

dual function:

    g(ν) = inf_x ( xᵀWx + ∑_i νi(xi² − 1) ) = inf_x xᵀ(W + diag(ν))x − 1ᵀν
         = { −1ᵀν,  W + diag(ν) ⪰ 0
           { −∞,    otherwise

lower bound property: p∗ ≥ −1ᵀν if W + diag(ν) ⪰ 0
example: ν = −λmin(W)·1 gives the bound p∗ ≥ n·λmin(W)

Lagrange dual function and conjugate function

◮ conjugate f∗ of a function f : Rⁿ → R:

    f∗(y) = sup_{x∈dom f} (yᵀx − f(x))

◮ dual function of the problem

    min  f0(x)
    s.t. x = 0

  is

    g(ν) = inf_x ( f0(x) + νᵀx ) = −sup_x ( (−ν)ᵀx − f0(x) )

◮ relationship: g(ν) = −f0∗(−ν)

◮ conjugate of any function is convex
◮ dual function of any problem is concave

Lagrange dual function and conjugate function

more generally (and more usefully), consider an optimization problem with linear inequality and equality constraints

    min  f0(x)
    s.t. Ax ⪯ b, Cx = d

dual function:

    g(λ, ν) = inf_{x∈dom f0} ( f0(x) + λᵀ(Ax − b) + νᵀ(Cx − d) )
            = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
            = −f0∗(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν

domain of g follows from the domain of f0∗:

    dom g = {(λ, ν) | −Aᵀλ − Cᵀν ∈ dom f0∗}

◮ simplifies derivation of the dual function if the conjugate of f0 is known

Examples

Equality constrained norm minimization

    min  ‖x‖
    s.t. Ax = b

dual function:

    g(ν) = −bᵀν − f0∗(−Aᵀν) = { −bᵀν,  ‖Aᵀν‖∗ ≤ 1
                               { −∞,    otherwise

◮ conjugate of f0 = ‖·‖:

    f0∗(y) = { 0,  ‖y‖∗ ≤ 1
             { ∞,  otherwise

  i.e., the indicator function of the dual norm unit ball, where ‖y‖∗ = sup_{‖u‖≤1} uᵀy is the dual norm of ‖·‖

Lagrange dual problem

    max  g(λ, ν)
    s.t. λ ⪰ 0

◮ find the best lower bound on p∗ obtainable from the Lagrange dual function
◮ always a convex optimization problem (maximize a concave function over a convex set), regardless of convexity of the primal problem; optimal value denoted by d∗
◮ λ, ν are dual feasible if λ ⪰ 0 and g(λ, ν) > −∞ (i.e., (λ, ν) ∈ dom g = {(λ, ν) | g(λ, ν) > −∞})
◮ can often be simplified by making the implicit constraint (λ, ν) ∈ dom g explicit, e.g.,
    ◮ standard form LP and its dual:

        min  cᵀx                 max  −bᵀν
        s.t. Ax = b, x ⪰ 0       s.t. Aᵀν + c ⪰ 0

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems, e.g., solving the SDP

    max  −1ᵀν
    s.t. W + diag(ν) ⪰ 0

gives a lower bound for the two-way partitioning problem

strong duality: d∗ = p∗
◮ does not hold in general
◮ (usually) holds for convex problems
◮ conditions that guarantee strong duality in convex problems are called constraint qualifications
◮ there exist many types of constraint qualifications
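The two-way partitioning bound can be checked by brute force on a toy instance. The sketch below (the 2×2 symmetric matrix W is an arbitrary assumption) compares the exhaustive primal optimum over {−1, +1}ⁿ with the bound p∗ ≥ n·λmin(W), i.e., the dual value at the feasible choice ν = −λmin(W)·1:

```python
# Weak-duality check for the two-way partitioning problem on a toy instance.
# For a symmetric 2x2 matrix, lambda_min has a closed form, so no linear
# algebra library is needed.  W below is an arbitrary illustrative choice.
import itertools
import math

W = [[1.0, 3.0], [3.0, -2.0]]  # symmetric cost matrix (assumption)

def quad(x, W):
    """x^T W x for a small dense matrix."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))

# brute-force primal optimum over the 2^n points x in {-1, +1}^n
p_star = min(quad(x, W) for x in itertools.product([-1, 1], repeat=2))

# lambda_min of a symmetric 2x2 matrix [[a, b], [b, c]]
a, b, c = W[0][0], W[0][1], W[1][1]
lam_min = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# dual lower bound p* >= n * lambda_min(W) from nu = -lambda_min(W) * 1
assert 2 * lam_min <= p_star + 1e-9
```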

Slater’s constraint qualification

One simple constraint qualification is Slater’s condition (Slater’s constraint qualification): the convex problem is strictly feasible, i.e., there exists an x ∈ int D such that

    fi(x) < 0, i = 1, ..., m,    Ax = b

◮ can be refined, e.g.,
    ◮ int D can be replaced with relint D (interior relative to affine hull)
    ◮ affine inequalities do not need to hold with strict inequality
    ◮ reduces to feasibility when the constraints are all affine equalities and inequalities
◮ implies strong duality for convex problems
◮ implies that the dual optimum is attained when d∗ > −∞, i.e., there exists a dual feasible (λ∗, ν∗) with g(λ∗, ν∗) = d∗ = p∗

Examples

Inequality form LP

primal problem:

    min  cᵀx
    s.t. Ax ⪯ b

dual function:

    g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = { −bᵀλ,  Aᵀλ + c = 0
                                          { −∞,    otherwise

dual problem:

    max  −bᵀλ
    s.t. Aᵀλ + c = 0, λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds for any LP provided the primal problem is feasible; applied to the dual, strong duality holds for LPs if the dual is feasible
◮ in fact, p∗ = d∗ except when both the primal and dual problems are infeasible

Examples

Quadratic program (P ∈ Sⁿ₊₊)

    min  xᵀPx
    s.t. Ax ⪯ b

dual function:

    g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem:

    max  −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
    s.t. λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds provided the primal problem is feasible
◮ in fact, p∗ = d∗ always holds
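A 1-D instance makes the QP dual concrete (the instance P = 1, A = 1, b = −1 is a hand-picked assumption): the primal min x² s.t. x ≤ −1 has p∗ = 1 at x = −1, and the dual is max −(1/4)λ² + λ over λ ≥ 0, which attains 1 at λ = 2:

```python
# 1-D check of the QP dual: g(lam) = -(1/4) lam A P^{-1} A^T lam - b^T lam
# with P = A = 1 and b = -1 reduces to -(1/4) lam^2 + lam.

def g(lam):
    return -0.25 * lam * lam + lam

p_star = 1.0  # min x^2 s.t. x <= -1, attained at x = -1

# weak duality: every dual-feasible lam gives a lower bound
assert all(g(lam) <= p_star + 1e-12 for lam in [0.0, 0.5, 2.0, 7.0])

# strong duality: the dual optimum over a coarse grid of lam >= 0 hits p*
d_star = max(g(k / 100.0) for k in range(0, 1001))  # grid contains lam = 2
assert abs(d_star - p_star) < 1e-9
```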

Examples

A nonconvex problem with strong duality (A ⋡ 0):

    min  xᵀAx + 2bᵀx
    s.t. xᵀx ≤ 1

dual function:

    g(λ) = inf_x ( xᵀ(A + λI)x + 2bᵀx − λ )
         = { −bᵀ(A + λI)†b − λ,  A + λI ⪰ 0, b ∈ R(A + λI)
           { −∞,                 otherwise

dual problem and equivalent SDP:

    max  −bᵀ(A + λI)†b − λ                    max  −t − λ
    s.t. A + λI ⪰ 0, b ∈ R(A + λI)            s.t. [ A + λI   b ] ⪰ 0
                                                   [ bᵀ       t ]

◮ strong duality holds although the primal problem is nonconvex (difficult to show)

Geometric interpretation

geometric interpretation via the set of values G
◮ set of values taken on by the constraint and objective functions:

    G = {(f1(x), ..., fm(x), h1(x), ..., hp(x), f0(x)) ∈ Rᵐ × Rᵖ × R | x ∈ D}

◮ optimal value: p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
◮ dual function: g(λ, ν) = inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G}
    ◮ if the infimum is finite, then (λ, ν, 1)ᵀ(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to G
◮ weak duality: for all λ ⪰ 0,

    p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
       ≥ inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G, u ⪯ 0, v = 0}
       ≥ inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G}
       = g(λ, ν)

Geometric interpretation

Example: consider a simple problem with one constraint

    min  f0(x)          p∗ = inf{t | (u, t) ∈ G, u ≤ 0}
    s.t. f1(x) ≤ 0      g(λ) = inf{t + λu | (u, t) ∈ G}

where G = {(f1(x), f0(x)) | x ∈ D}
◮ λu + t = g(λ) is a (nonvertical) supporting hyperplane to G
◮ the hyperplane intersects the t-axis at t = g(λ)


[Figure 5.3: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one (inequality) constraint. Given λ, we minimize (λ, 1)ᵀ(u, t) over G = {(f1(x), f0(x)) | x ∈ D}; this yields a supporting hyperplane with slope −λ, whose intersection with the u = 0 axis gives g(λ).]

[Figure 5.4: supporting hyperplanes corresponding to three dual feasible values of λ, including the optimum λ⋆. Strong duality does not hold; the optimal duality gap p⋆ − d⋆ is positive.]

Geometric interpretation

geometric interpretation via the epigraph form A
◮ epigraph form of G:

    A = G + (R₊ᵐ × {0} × R₊)
      = {(u, v, t) | ∃x ∈ D : fi(x) ≤ ui, i = 1, ..., m, hi(x) = vi, i = 1, ..., p, f0(x) ≤ t}

  includes all points with larger objective or inequality constraint function values
◮ optimal value: p∗ = inf{t | (0, 0, t) ∈ A}
◮ dual function: if λ ⪰ 0, then g(λ, ν) = inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ A}
    ◮ if the infimum is finite, then (λ, ν, 1)ᵀ(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to A
◮ weak duality: p∗ = (λ, ν, 1)ᵀ(0, 0, p∗) ≥ g(λ, ν)
◮ strong duality: holds iff there exists a nonvertical supporting hyperplane to A at its boundary point (0, 0, p∗)
    ◮ for a convex problem, A is convex, hence has a supporting hyperplane at (0, 0, p∗)
    ◮ Slater’s condition guarantees the supporting hyperplane to be nonvertical

Geometric interpretation

Example: consider a simple problem with one constraint

    min  f0(x)          p∗ = inf{t | (0, t) ∈ A}
    s.t. f1(x) ≤ 0      g(λ) = inf{(λ, 1)ᵀ(u, t) | (u, t) ∈ A}

where A = {(u, t)|∃x ∈ D, f1(x) ≤ u, f0(x) ≤ t}


[Figure 5.5: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one (inequality) constraint. Given λ, we minimize (λ, 1)ᵀ(u, t) over A = {(u, t) | ∃x ∈ D : f0(x) ≤ t, f1(x) ≤ u}; this yields a supporting hyperplane with slope −λ, whose intersection with the u = 0 axis gives g(λ).]

Certificate of suboptimality and stopping criteria

do not assume the primal problem is convex; let x and (λ, ν) be a primal feasible point and a dual feasible point, respectively
◮ (λ, ν) provides a proof or certificate that p∗ ≥ g(λ, ν)
◮ (λ, ν) bounds how suboptimal x is, without knowing p∗:

    f0(x) − p∗ ≤ f0(x) − g(λ, ν)

◮ provides nonheuristic stopping criteria in optimization algorithms
◮ x and (λ, ν) localize p∗ and d∗ to an interval:

    p∗, d∗ ∈ [g(λ, ν), f0(x)]

  whose width is the duality gap f0(x) − g(λ, ν) associated with x and (λ, ν)
◮ if f0(x) = g(λ, ν), then x is primal optimal and (λ, ν) is dual optimal
    ◮ (λ, ν) is a certificate that proves x is optimal
    ◮ x is a certificate that proves (λ, ν) is dual optimal
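The certificate idea can be sketched numerically on a toy problem (the instance and the chosen points are assumptions, not from the slides): for min x² s.t. x ≥ 1, with p∗ = 1, any primal feasible x and dual feasible λ localize p∗ to [g(λ), f0(x)] and bound the suboptimality of x by the gap:

```python
# Duality-gap certificate sketch for min x^2 s.t. x >= 1 (p* = 1).

def f0(x):
    return x * x

def g(lam):
    # L(x, lam) = x^2 + lam*(1 - x) is minimized at x = lam/2,
    # so g(lam) = lam - lam^2/4 (dual feasible for lam >= 0)
    return lam - 0.25 * lam * lam

x, lam = 1.2, 1.5      # feasible but suboptimal points, chosen by hand
gap = f0(x) - g(lam)   # duality gap associated with this pair

p_star = 1.0
assert g(lam) <= p_star <= f0(x)       # p* localized to [g(lam), f0(x)]
assert f0(x) - p_star <= gap + 1e-12   # suboptimality of x bounded by the gap
```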

Complementary slackness

Let x∗ and (λ∗, ν∗) be any primal optimal and dual optimal points, and assume strong duality holds. Then

    f0(x∗) = g(λ∗, ν∗) = inf_x ( f0(x) + ∑_{i=1}^m λi∗ fi(x) + ∑_{i=1}^p νi∗ hi(x) )
           ≤ f0(x∗) + ∑_{i=1}^m λi∗ fi(x∗) + ∑_{i=1}^p νi∗ hi(x∗)
           ≤ f0(x∗)

Hence, the two inequalities hold with equality, implying:
◮ x∗ minimizes L(x, λ∗, ν∗) over x (L(x, λ∗, ν∗) can have other minimizers)
◮ complementary slackness: λi∗ fi(x∗) = 0, i = 1, ..., m, i.e.,

    λi∗ > 0 ⟹ fi(x∗) = 0,    fi(x∗) < 0 ⟹ λi∗ = 0

  λi∗ = 0 unless the ith constraint is active at the optimum

Karush-Kuhn-Tucker (KKT) conditions

Consider any optimization problem with differentiable objective and constraint functions. The following four conditions are called the KKT conditions:
◮ primal constraints:

    fi(x) ≤ 0, i = 1, ..., m,    hi(x) = 0, i = 1, ..., p
◮ dual constraints: λ ⪰ 0
◮ complementary slackness:

    λi fi(x) = 0, i = 1, ..., m
◮ gradient of L(x, λ, ν) with respect to x vanishes:

    ∇f0(x) + ∑_{i=1}^m λi ∇fi(x) + ∑_{i=1}^p νi ∇hi(x) = 0
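The four conditions can be checked mechanically at a candidate point. The sketch below uses an illustrative instance (not from the slides): min x1² + x2² s.t. x1 + x2 ≥ 2, i.e., f1(x) = 2 − x1 − x2 ≤ 0, with candidate optimum x∗ = (1, 1) and λ∗ = 2:

```python
# Verifying the four KKT conditions at x* = (1, 1), lam* = 2 for
# min x1^2 + x2^2 s.t. f1(x) = 2 - x1 - x2 <= 0 (illustrative instance).

x = (1.0, 1.0)
lam = 2.0

f1 = 2.0 - x[0] - x[1]              # inequality constraint value
grad_f0 = (2 * x[0], 2 * x[1])      # gradient of the objective
grad_f1 = (-1.0, -1.0)              # gradient of the constraint

assert f1 <= 1e-12                  # 1) primal feasibility
assert lam >= 0                     # 2) dual feasibility
assert abs(lam * f1) < 1e-12        # 3) complementary slackness
# 4) stationarity: grad f0 + lam * grad f1 = 0
assert all(abs(grad_f0[i] + lam * grad_f1[i]) < 1e-12 for i in range(2))
```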

Karush-Kuhn-Tucker (KKT) conditions

consider any optimization problem with differentiable objective and constraint functions

KKT conditions for nonconvex/convex problems
◮ for any optimization problem, if strong duality holds, any pair of primal and dual optimal points x∗, (λ∗, ν∗) must satisfy the KKT conditions
◮ proof: the first and second conditions hold since x∗ is primal feasible and λ∗ ⪰ 0 is dual feasible. The third condition is shown on page 23. The fourth condition follows from the fact that x∗ = argmin_x L(x, λ∗, ν∗) (shown on page 23) and L(x, λ∗, ν∗) is differentiable.

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions for convex problems
◮ for any convex optimization problem, any points x̃ and (λ̃, ν̃) that satisfy the KKT conditions are primal and dual optimal, with zero duality gap
◮ proof: the first and second conditions state that x̃ and (λ̃, ν̃) are primal and dual feasible, respectively. Since L(x, λ̃, ν̃) is convex in x (as λ̃ ⪰ 0), the fourth condition implies that x̃ = argmin_x L(x, λ̃, ν̃), i.e.,

    g(λ̃, ν̃) = L(x̃, λ̃, ν̃) = f0(x̃) + ∑_{i=1}^m λ̃i fi(x̃) + ∑_{i=1}^p ν̃i hi(x̃) = f0(x̃)

  where the last equality follows from the first and third conditions. g(λ̃, ν̃) = f0(x̃) means zero duality gap, implying that x̃ and (λ̃, ν̃) are primal and dual optimal.
◮ if a convex optimization problem satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality, i.e.,
    ◮ x is optimal iff there exist (λ, ν) that, together with x, satisfy the KKT conditions

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions play an important role in optimization
◮ in a few special cases, it is possible to solve the KKT conditions (and therefore the optimization problem) analytically
◮ more generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions

Example

water-filling (assume αi > 0)

    min  −∑_{i=1}^n log(xi + αi)
    s.t. x ⪰ 0, 1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ Rⁿ, ν ∈ R such that

    λ ⪰ 0,    λi xi = 0,    1/(xi + αi) + λi = ν

◮ if ν < 1/αi: λi = 0 and xi = 1/ν − αi
◮ if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
◮ determine ν from 1ᵀx = ∑_{i=1}^n max{0, 1/ν − αi} = 1

thus, the optimal point is given by

    xi∗ = max{0, 1/ν∗ − αi}

where ν∗ satisfies ∑_{i=1}^n max{0, 1/ν∗ − αi} = 1

Example

interpretation:
◮ n patches; the level of patch i is at height αi
◮ flood the area with a unit amount of water
◮ the resulting level is 1/ν∗
◮ the depth of water above patch i is xi∗


[Figure 5.7: illustration of water-filling. The height of each patch is given by αi. The region is flooded to a level 1/ν⋆, which uses a total quantity of water equal to one. The height of the water (shown shaded) above each patch is the optimal value of xi⋆.]
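The water-filling equation ∑ᵢ max{0, 1/ν − αi} = 1 has no closed form in general, but since the left-hand side is decreasing in ν it can be solved by bisection. A minimal sketch (the patch heights are arbitrary example values):

```python
# Water-filling via bisection on nu: find nu* with total_water(nu*) = 1.

def total_water(nu, alphas):
    """Amount of water used at level 1/nu: sum_i max(0, 1/nu - a_i)."""
    return sum(max(0.0, 1.0 / nu - a) for a in alphas)

def water_fill(alphas):
    # total_water is decreasing in nu; bracket the root, then bisect
    lo, hi = 1e-9, 1.0 / min(alphas)        # total_water(hi) = 0 <= 1
    while total_water(lo, alphas) < 1.0:     # ensure the low end overfills
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_water(mid, alphas) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alphas = [0.3, 0.6, 1.2]                     # example patch heights
nu = water_fill(alphas)
x = [max(0.0, 1.0 / nu - a) for a in alphas]
assert abs(sum(x) - 1.0) < 1e-9              # unit amount of water used
assert all(xi >= 0 for xi in x)              # primal feasibility
```

For these heights the first two patches are flooded and the third stays dry: the water level 1/ν∗ solves (L − 0.3) + (L − 0.6) = 1, i.e., L = 0.95 < 1.2.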

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    min  f0(x)                        max  g(λ, ν)
    s.t. fi(x) ≤ 0, i = 1, ..., m     s.t. λ ⪰ 0
         hi(x) = 0, i = 1, ..., p

perturbed problem and its dual

    min  f0(x)                        max  g(λ, ν) − uᵀλ − vᵀν
    s.t. fi(x) ≤ ui, i = 1, ..., m    s.t. λ ⪰ 0
         hi(x) = vi, i = 1, ..., p

Perturbation and sensitivity analysis

◮ x is the primal variable; u, v are parameters
    ◮ tighten (ui < 0) or loosen (ui > 0) the ith inequality constraint by ui
    ◮ change the righthand side of the ith equality constraint by vi
◮ p∗(u, v) is the optimal value of the perturbed problem, as a function of the perturbations to the righthand sides of the constraints
    ◮ p∗(0, 0) = p∗
    ◮ when p∗(u, v) = ∞, the perturbations of the constraints result in infeasibility
◮ when the unperturbed problem is convex, p∗(u, v) is a convex function of u and v
◮ we are interested in information about p∗(u, v) obtained from the solution of the unperturbed problem and its dual

Perturbation and sensitivity analysis

global sensitivity

Assume that strong duality holds for the unperturbed problem and that the dual optimum is attained. Let (λ∗, ν∗) be dual optimal for the unperturbed problem. Then for all u and v,

    p∗(u, v) ≥ p∗(0, 0) − uᵀλ∗ − vᵀν∗

global sensitivity interpretation:
◮ large λi∗: p∗ increases greatly if constraint i is tightened (ui < 0)
◮ small λi∗: p∗ does not decrease much if constraint i is loosened (ui > 0)
◮ large positive νi∗: p∗ increases greatly if vi < 0; large negative νi∗: p∗ increases greatly if vi > 0
◮ small positive νi∗: p∗ does not decrease much if vi > 0; small negative νi∗: p∗ does not decrease much if vi < 0

Perturbation and sensitivity analysis

proof: apply weak duality to the perturbed problem and then strong duality to the unperturbed problem

    p∗(u, v) ≥ g(λ∗, ν∗) − uᵀλ∗ − vᵀν∗ = p∗(0, 0) − uᵀλ∗ − vᵀν∗

example: p∗(u) for a problem with one inequality constraint:

[Figure 5.10: optimal value p⋆(u) of a convex problem with one constraint f1(x) ≤ u, as a function of u. For u = 0 we have the original unperturbed problem; for u < 0 the constraint is tightened, and for u > 0 the constraint is loosened. The affine function p⋆(0) − λ⋆u is a lower bound on p⋆(u).]

Perturbation and sensitivity analysis

local sensitivity

If (in addition) p∗(u, v) is differentiable at (0, 0), then

    λi∗ = −∂p∗(0, 0)/∂ui,    νi∗ = −∂p∗(0, 0)/∂vi

local sensitivity interpretation:
◮ the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations
◮ tightening (loosening) the ith inequality constraint a small amount ui yields an increase (a decrease) in p∗ of approximately −λi∗ui (λi∗ui)
◮ the local sensitivity result gives a quantitative measure of how active a constraint is at the optimum x∗
    ◮ fi(x∗) < 0: the constraint can be tightened or loosened a small amount without affecting the optimal value, as λi∗ = 0
    ◮ fi(x∗) = 0: small (large) λi∗ means that the constraint can be loosened or tightened a bit without much (with great) effect on the optimal value
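The local sensitivity result is easy to confirm numerically on a toy problem (the instance is an assumption, not from the slides): for min x² s.t. 1 − x ≤ u, the perturbed optimal value is p∗(u) = (1 − u)² for u ≤ 1, and KKT gives λ∗ = 2 at u = 0, so −dp∗/du at u = 0 should equal λ∗:

```python
# Local sensitivity check: -dp*/du |_{u=0} = lam* for
# min x^2 s.t. 1 - x <= u  (i.e. x >= 1 - u), where lam* = 2.

def p_star(u):
    # for u >= 1 the unconstrained minimum x = 0 becomes feasible
    return (1.0 - u) ** 2 if u < 1.0 else 0.0

lam_star = 2.0
h = 1e-6
dpdu = (p_star(h) - p_star(-h)) / (2 * h)  # central difference at u = 0

assert abs(-dpdu - lam_star) < 1e-6
```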

Perturbation and sensitivity analysis

proof (for λi∗): choosing u = t·ei and v = 0, the global sensitivity result gives

    p∗(t·ei, 0) − p∗(0, 0) ≥ −λi∗ t
    ⟹  (p∗(t·ei, 0) − p∗(0, 0))/t ≥ −λi∗ for t > 0,  (p∗(t·ei, 0) − p∗(0, 0))/t ≤ −λi∗ for t < 0
    ⟹  lim_{t↓0} (p∗(t·ei, 0) − p∗(0, 0))/t ≥ −λi∗,  lim_{t↑0} (p∗(t·ei, 0) − p∗(0, 0))/t ≤ −λi∗

Thus, ∂p∗(0, 0)/∂ui = −λi∗.

Duality and problem reformulations

◮ equivalent formulations of a problem can lead to very different dual problems
◮ reformulating the primal problem can be useful when the dual problem is difficult to derive, or uninteresting

common reformulations
◮ introduce new variables and associated equality constraints
◮ replace the objective with an increasing function of the original objective
◮ make explicit constraints implicit (i.e., incorporate them into the domain of the objective), or vice versa

Introducing new variables and equality constraints

unconstrained problem: min_x f0(Ax + b)
◮ dual function is a constant: g = inf_x f0(Ax + b) = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

    min  f0(y)                max  bᵀν − f0∗(ν)
    s.t. Ax + b − y = 0       s.t. Aᵀν = 0

dual function follows from

    g(ν) = inf_{x,y} ( f0(y) + νᵀ(Ax + b − y) )
         = { inf_y ( f0(y) − νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                           otherwise
         = { −f0∗(ν) + bᵀν,  Aᵀν = 0
           { −∞,             otherwise

Introducing new variables and equality constraints

minimum norm problem: min_x ‖Ax − b‖
◮ dual function is a constant: g = inf_x ‖Ax − b‖ = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

    min  ‖y‖               max  bᵀν
    s.t. y = Ax − b        s.t. Aᵀν = 0, ‖ν‖∗ ≤ 1

dual function follows from

    g(ν) = inf_{x,y} ( ‖y‖ + νᵀ(y − Ax + b) )
         = { inf_y ( ‖y‖ + νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                         otherwise
         = { bᵀν,  Aᵀν = 0, ‖ν‖∗ ≤ 1
           { −∞,   otherwise

Transforming the objective

replace the objective with an increasing function of the original objective

minimum norm problem: min_x ‖Ax − b‖

reformulated problem and its dual:

    min  (1/2)‖y‖²            max  −(1/2)‖ν‖∗² + bᵀν
    s.t. y = Ax − b           s.t. Aᵀν = 0

dual function follows from

    g(ν) = inf_{x,y} ( (1/2)‖y‖² + νᵀ(y − Ax + b) )
         = { inf_y ( (1/2)‖y‖² + νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                               otherwise
         = { −(1/2)‖ν‖∗² + bᵀν,  Aᵀν = 0
           { −∞,                 otherwise

last equality: the conjugate of (1/2)‖·‖² is (1/2)‖·‖∗² (Ex. 3.27, p. 93)

Implicit constraints

make explicit constraints implicit (i.e., incorporate them into the domain of the objective), or vice versa

LP with box constraints: primal and dual problem

    min  cᵀx             max  −bᵀν − 1ᵀλ1 − 1ᵀλ2
    s.t. Ax = b          s.t. c + Aᵀν + λ1 − λ2 = 0
         −1 ⪯ x ⪯ 1           λ1 ⪰ 0, λ2 ⪰ 0

reformulated problem (box constraints made implicit) and its dual:

    min  f0(x) = { cᵀx,  −1 ⪯ x ⪯ 1          max  −bᵀν − ‖Aᵀν + c‖1
                 { ∞,    otherwise
    s.t. Ax = b

dual function follows from:

    g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) ) = −bᵀν − ‖Aᵀν + c‖1
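A scalar instance makes the box-constrained dual concrete (A = 1, b = 0.5, c = 2 is a hand-built assumption): the primal min c·x s.t. x = 0.5, −1 ≤ x ≤ 1 has the single feasible point x = 0.5 with p∗ = 1, and the reformulated dual −bν − |ν + c| attains 1 at ν = −2:

```python
# Scalar check of the box-LP dual: max over nu of -b*nu - |a*nu + c|
# for a = 1, b = 0.5, c = 2 (illustrative instance).

def dual(nu, a=1.0, b=0.5, c=2.0):
    return -b * nu - abs(a * nu + c)

p_star = 2.0 * 0.5  # c * x at the only feasible point x = 0.5

# weak duality on a grid of nu, and strong duality at nu = -2
grid = [n / 10.0 for n in range(-100, 101)]
assert all(dual(nu) <= p_star + 1e-12 for nu in grid)
assert abs(dual(-2.0) - p_star) < 1e-12
```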

Problems with generalized inequality constraints

do not assume convexity of the problem

    p∗ ≜ min  f0(x)

         s.t. fi(x) ⪯_{Ki} 0, i = 1, ..., m
              hi(x) = 0, i = 1, ..., p

Ki ⊆ R^{ki} is a proper cone; ⪯_{Ki} is a generalized inequality on R^{ki}
◮ Lagrange multiplier vector associated with fi(x) ⪯_{Ki} 0: λi ∈ R^{ki}; Lagrange multiplier associated with hi(x) = 0: νi ∈ R
◮ Lagrangian L : Rⁿ × R^{k1} × ... × R^{km} × Rᵖ → R:

    L(x, λ1, ..., λm, ν) = f0(x) + ∑_{i=1}^m λiᵀ fi(x) + ∑_{i=1}^p νi hi(x)

◮ (concave) dual function g : R^{k1} × ... × R^{km} × Rᵖ → R:

    g(λ1, ..., λm, ν) = inf_{x∈D} L(x, λ1, ..., λm, ν)

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λi ⪰_{Ki∗} 0 and any ν,

    p∗ ≥ g(λ1, ..., λm, ν)

proof: if x̃ is feasible and λi ⪰_{Ki∗} 0, then

    f0(x̃) ≥ f0(x̃) + ∑_{i=1}^m λiᵀ fi(x̃) + ∑_{i=1}^p νi hi(x̃)
          ≥ inf_{x∈D} L(x, λ1, ..., λm, ν) = g(λ1, ..., λm, ν)

where the first inequality follows from the definition of the dual cone. Minimizing over all feasible x̃ gives p∗ ≥ g(λ1, ..., λm, ν).

dual problem:

    d∗ ≜ max  g(λ1, ..., λm, ν)
         s.t. λi ⪰_{Ki∗} 0, i = 1, ..., m

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems

strong duality: d∗ = p∗
◮ does not hold in general
◮ holds for convex problems with a constraint qualification, e.g.,
    ◮ Slater’s condition: the primal problem is strictly feasible

Examples

Semidefinite program

primal SDP (Fi, G ∈ Sᵏ, positive semidefinite cone K1 = S₊ᵏ):

    min  cᵀx

    s.t. x1F1 + ... + xnFn ⪯ G

◮ Lagrange multiplier is a matrix Z ∈ Sᵏ, and the Lagrangian is

    L(x, Z) = cᵀx + tr(Z(x1F1 + ... + xnFn − G))

            = x1(c1 + tr(F1Z)) + ... + xn(cn + tr(FnZ)) − tr(GZ)

◮ dual function

    g(Z) = inf_x L(x, Z) = { −tr(GZ),  tr(Fi Z) + ci = 0, i = 1, ..., n
                           { −∞,       otherwise

dual SDP:

    max  −tr(GZ)

    s.t. Z ⪰ 0, tr(Fi Z) + ci = 0, i = 1, ..., n

Examples

Cone program in standard form

primal CP (proper cone K ⊆ Rⁿ):

    min  cᵀx

    s.t. x ⪰_K 0, Ax = b

◮ Lagrange multipliers λ ∈ Rⁿ, ν ∈ Rᵐ, and the Lagrangian is

    L(x, λ, ν) = cᵀx − λᵀx + νᵀ(Ax − b) = (Aᵀν − λ + c)ᵀx − bᵀν

◮ dual function

    g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν,  Aᵀν − λ + c = 0
                                  { −∞,    otherwise

dual CP:

    max  −bᵀν
    s.t. Aᵀν + c ⪰_{K∗} 0

KKT conditions

differentiable fi, hi
◮ primal constraints:

    fi(x) ⪯_{Ki} 0, i = 1, ..., m,    hi(x) = 0, i = 1, ..., p
◮ dual constraints: λi ⪰_{Ki∗} 0
◮ complementary slackness: λiᵀ fi(x) = 0, i = 1, ..., m, implying

    λi ≻_{Ki∗} 0 ⟹ fi(x) = 0,    fi(x) ≺_{Ki} 0 ⟹ λi = 0

◮ gradient of the Lagrangian with respect to x vanishes:

    ∇f0(x) + ∑_{i=1}^m Dfi(x)ᵀλi + ∑_{i=1}^p νi ∇hi(x) = 0

KKT conditions for nonconvex/convex problems: if strong duality holds, any primal optimal and any dual optimal points must satisfy the KKT conditions

KKT conditions for convex problems: if strong duality holds, the KKT conditions provide necessary and sufficient conditions for optimality