Convex Optimization 5: Duality

Prof. Ying Cui

Department of Electrical Engineering, Shanghai Jiao Tong University

2018

SJTU Ying Cui 1 / 46

Outline

Lagrange dual function

Lagrange dual problem

Geometric interpretation

Optimality conditions

Perturbation and sensitivity analysis

Examples

Generalized inequalities

Lagrangian

standard form problem (not necessarily convex)

    min  f0(x)
    s.t. fi(x) ≤ 0, i = 1, ..., m
         hi(x) = 0, i = 1, ..., p

domain D = (∩_{i=0}^m dom fi) ∩ (∩_{i=1}^p dom hi), optimal value p∗

◮ basic idea in Lagrangian duality: take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions

Lagrangian L : Rⁿ × Rᵐ × Rᵖ → R, with dom L = D × Rᵐ × Rᵖ:

    L(x, λ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x)

◮ weighted sum of objective and constraint functions
◮ λi is the Lagrange multiplier associated with fi(x) ≤ 0
◮ νi is the Lagrange multiplier associated with hi(x) = 0

Lagrange dual function

Lagrange dual function (or dual function) g : Rᵐ × Rᵖ → R:

    g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x) )

◮ g is concave even when the problem is not convex, as it is the pointwise infimum of a family of affine functions of (λ, ν)
    ◮ the pointwise minimum or infimum of concave functions is concave
◮ g can be −∞ when L is unbounded below in x

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λ ⪰ 0 and any ν,

    g(λ, ν) ≤ p∗

◮ the inequality holds but is vacuous when g(λ, ν) = −∞
◮ the dual function gives a nontrivial lower bound only when λ ⪰ 0 and (λ, ν) ∈ dom g, i.e., g(λ, ν) > −∞
◮ refer to (λ, ν) with λ ⪰ 0 and (λ, ν) ∈ dom g as dual feasible

proof: suppose x̃ is feasible, i.e., fi(x̃) ≤ 0 and hi(x̃) = 0, and λ ⪰ 0. Then

    ∑_{i=1}^m λi fi(x̃) + ∑_{i=1}^p νi hi(x̃) ≤ 0  ⟹  L(x̃, λ, ν) ≤ f0(x̃)

Hence,

    g(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f0(x̃)

Minimizing over all feasible x̃ gives p∗ ≥ g(λ, ν).

Examples

Least-norm solution of linear equations

    min  xᵀx
    s.t. Ax = b

dual function:
◮ to minimize L(x, ν) = xᵀx + νᵀ(Ax − b) over x (an unconstrained convex problem), set the gradient equal to zero:

    ∇x L(x, ν) = 2x + Aᵀν = 0  ⟹  x = −(1/2)Aᵀν

◮ plug into L(x, ν) to obtain g:

    g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4)νᵀAAᵀν − bᵀν

  which is a concave quadratic function of ν, as −AAᵀ ⪯ 0

lower bound property: p∗ ≥ −(1/4)νᵀAAᵀν − bᵀν, for all ν
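As a quick numerical sanity check on the least-norm example (a hand-built instance, not from the slides), the sketch below takes A = [1 1], b = 1, so the primal optimum is x = (1/2, 1/2) with p∗ = 1/2, and verifies that g(ν) is a lower bound for every ν and is tight at ν = −1:

```python
# Least-norm example check: g(nu) = -(1/4) nu^T A A^T nu - b^T nu.
# With A = [1 1] we have A A^T = 2, and p* = 1/2 at x = (1/2, 1/2).
# All names and the instance are illustrative assumptions.

def g(nu, AAt=2.0, b=1.0):
    """Dual function for the single-equality least-norm instance."""
    return -0.25 * AAt * nu * nu - b * nu

p_star = 0.5  # optimal value of min x1^2 + x2^2 s.t. x1 + x2 = 1

# lower bound property: g(nu) <= p* for every nu
for nu in [-3.0, -1.0, 0.0, 1.0, 2.5]:
    assert g(nu) <= p_star + 1e-12

# the bound is tight at nu = -1 (strong duality for this convex problem)
assert abs(g(-1.0) - p_star) < 1e-12
```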

Examples

Standard form LP

    min  cᵀx
    s.t. Ax = b, x ⪰ 0

dual function:
◮ Lagrangian

    L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx = −bᵀν + (c + Aᵀν − λ)ᵀx

  is affine in x (bounded below only when identically zero)
◮ dual function

    g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν,  Aᵀν − λ + c = 0
                                  { −∞,    otherwise

lower bound property: nontrivial only when λ ⪰ 0 and Aᵀν − λ + c = 0; hence p∗ ≥ −bᵀν if Aᵀν + c ⪰ 0

Examples

Two-way partitioning problem (W ∈ Sⁿ)

    min  xᵀWx
    s.t. xi² = 1, i = 1, ..., n

◮ a nonconvex problem with 2ⁿ discrete feasible points
◮ find the two-way partition of {1, ..., n} with least total cost
    ◮ Wij is the cost of assigning i, j to the same set
    ◮ −Wij is the cost of assigning i, j to different sets

dual function:

    g(ν) = inf_x ( xᵀWx + ∑_i νi(xi² − 1) ) = inf_x xᵀ(W + diag(ν))x − 1ᵀν
         = { −1ᵀν,  W + diag(ν) ⪰ 0
           { −∞,    otherwise

lower bound property: p∗ ≥ −1ᵀν if W + diag(ν) ⪰ 0
example: ν = −λmin(W)·1 gives the bound p∗ ≥ n·λmin(W)

Lagrange dual function and conjugate function

◮ conjugate f∗ of a function f : Rⁿ → R:

    f∗(y) = sup_{x∈dom f} (yᵀx − f(x))

◮ dual function of the problem

    min  f0(x)
    s.t. x = 0

  is

    g(ν) = inf_x ( f0(x) + νᵀx ) = −sup_x ( (−ν)ᵀx − f0(x) )

◮ relationship: g(ν) = −f0∗(−ν)

◮ conjugate of any function is convex
◮ dual function of any problem is concave

Lagrange dual function and conjugate function

more generally (and more usefully), consider an optimization problem with linear inequality and equality constraints

    min  f0(x)
    s.t. Ax ⪯ b, Cx = d

dual function:

    g(λ, ν) = inf_{x∈dom f0} ( f0(x) + λᵀ(Ax − b) + νᵀ(Cx − d) )
            = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
            = −f0∗(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν

domain of g follows from the domain of f0∗:

    dom g = {(λ, ν) | −Aᵀλ − Cᵀν ∈ dom f0∗}

◮ simplifies derivation of the dual function if the conjugate of f0 is known

Examples

Equality constrained norm minimization

    min  ‖x‖
    s.t. Ax = b

dual function:

    g(ν) = −bᵀν − f0∗(−Aᵀν) = { −bᵀν,  ‖Aᵀν‖∗ ≤ 1
                               { −∞,    otherwise

◮ conjugate of f0 = ‖·‖:

    f0∗(y) = { 0,  ‖y‖∗ ≤ 1
             { ∞,  otherwise

  i.e., the indicator function of the dual norm unit ball, where ‖y‖∗ = sup_{‖u‖≤1} uᵀy is the dual norm of ‖·‖

Lagrange dual problem

    max  g(λ, ν)
    s.t. λ ⪰ 0

◮ find the best lower bound on p∗ obtainable from the Lagrange dual function
◮ always a convex optimization problem (maximize a concave function over a convex set), regardless of convexity of the primal problem; optimal value denoted by d∗
◮ λ, ν are dual feasible if λ ⪰ 0 and g(λ, ν) > −∞ (i.e., (λ, ν) ∈ dom g = {(λ, ν) | g(λ, ν) > −∞})
◮ can often be simplified by making the implicit constraint (λ, ν) ∈ dom g explicit, e.g.,
    ◮ standard form LP and its dual:

        min  cᵀx                 max  −bᵀν
        s.t. Ax = b, x ⪰ 0       s.t. Aᵀν + c ⪰ 0

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems, e.g., solving the SDP

    max  −1ᵀν
    s.t. W + diag(ν) ⪰ 0

gives a lower bound for the two-way partitioning problem

strong duality: d∗ = p∗
◮ does not hold in general
◮ (usually) holds for convex problems
◮ conditions that guarantee strong duality in convex problems are called constraint qualifications
◮ there exist many types of constraint qualifications
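The two-way partitioning bound can be checked by brute force on a toy instance. The sketch below (the 2×2 symmetric matrix W is an arbitrary assumption) compares the exhaustive primal optimum over {−1, +1}ⁿ with the bound p∗ ≥ n·λmin(W), i.e., the dual value at the feasible choice ν = −λmin(W)·1:

```python
# Weak-duality check for the two-way partitioning problem on a toy instance.
# For a symmetric 2x2 matrix, lambda_min has a closed form, so no linear
# algebra library is needed.  W below is an arbitrary illustrative choice.
import itertools
import math

W = [[1.0, 3.0], [3.0, -2.0]]  # symmetric cost matrix (assumption)

def quad(x, W):
    """x^T W x for a small dense matrix."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))

# brute-force primal optimum over the 2^n points x in {-1, +1}^n
p_star = min(quad(x, W) for x in itertools.product([-1, 1], repeat=2))

# lambda_min of a symmetric 2x2 matrix [[a, b], [b, c]]
a, b, c = W[0][0], W[0][1], W[1][1]
lam_min = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# dual lower bound p* >= n * lambda_min(W) from nu = -lambda_min(W) * 1
assert 2 * lam_min <= p_star + 1e-9
```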

Slater’s constraint qualification

One simple constraint qualification is Slater’s condition (Slater’s constraint qualification): the convex problem is strictly feasible, i.e., there exists an x ∈ int D such that

    fi(x) < 0, i = 1, ..., m,    Ax = b

◮ can be refined, e.g.,
    ◮ int D can be replaced with relint D (interior relative to affine hull)
    ◮ affine inequalities do not need to hold with strict inequality
    ◮ reduces to feasibility when the constraints are all affine equalities and inequalities
◮ implies strong duality for convex problems
◮ implies that the dual optimum is attained when d∗ > −∞, i.e., there exists a dual feasible (λ∗, ν∗) with g(λ∗, ν∗) = d∗ = p∗

Examples

Inequality form LP

primal problem:

    min  cᵀx
    s.t. Ax ⪯ b

dual function:

    g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = { −bᵀλ,  Aᵀλ + c = 0
                                          { −∞,    otherwise

dual problem:

    max  −bᵀλ
    s.t. Aᵀλ + c = 0, λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds for any LP provided the primal problem is feasible; applied to the dual, strong duality holds for LPs if the dual is feasible
◮ in fact, p∗ = d∗ except when both the primal and dual problems are infeasible

Examples

Quadratic program (P ∈ Sⁿ₊₊)

    min  xᵀPx
    s.t. Ax ⪯ b

dual function:

    g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem:

    max  −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
    s.t. λ ⪰ 0

◮ from a weaker form of Slater’s condition: strong duality holds provided the primal problem is feasible
◮ in fact, p∗ = d∗ always holds
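A 1-D instance makes the QP dual concrete (the instance P = 1, A = 1, b = −1 is a hand-picked assumption): the primal min x² s.t. x ≤ −1 has p∗ = 1 at x = −1, and the dual is max −(1/4)λ² + λ over λ ≥ 0, which attains 1 at λ = 2:

```python
# 1-D check of the QP dual: g(lam) = -(1/4) lam A P^{-1} A^T lam - b^T lam
# with P = A = 1 and b = -1 reduces to -(1/4) lam^2 + lam.

def g(lam):
    return -0.25 * lam * lam + lam

p_star = 1.0  # min x^2 s.t. x <= -1, attained at x = -1

# weak duality: every dual-feasible lam gives a lower bound
assert all(g(lam) <= p_star + 1e-12 for lam in [0.0, 0.5, 2.0, 7.0])

# strong duality: the dual optimum over a coarse grid of lam >= 0 hits p*
d_star = max(g(k / 100.0) for k in range(0, 1001))  # grid contains lam = 2
assert abs(d_star - p_star) < 1e-9
```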

Examples

A nonconvex problem with strong duality (A ⋡ 0):

    min  xᵀAx + 2bᵀx
    s.t. xᵀx ≤ 1

dual function:

    g(λ) = inf_x ( xᵀ(A + λI)x + 2bᵀx − λ )
         = { −bᵀ(A + λI)†b − λ,  A + λI ⪰ 0, b ∈ R(A + λI)
           { −∞,                 otherwise

dual problem and equivalent SDP:

    max  −bᵀ(A + λI)†b − λ                    max  −t − λ
    s.t. A + λI ⪰ 0, b ∈ R(A + λI)            s.t. [ A + λI   b ] ⪰ 0
                                                   [ bᵀ       t ]

◮ strong duality holds although the primal problem is nonconvex (difficult to show)

Geometric interpretation

geometric interpretation via the set of values G
◮ set of values taken on by the constraint and objective functions:

    G = {(f1(x), ..., fm(x), h1(x), ..., hp(x), f0(x)) ∈ Rᵐ × Rᵖ × R | x ∈ D}

◮ optimal value: p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
◮ dual function: g(λ, ν) = inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G}
    ◮ if the infimum is finite, then (λ, ν, 1)ᵀ(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to G
◮ weak duality: for all λ ⪰ 0,

    p∗ = inf{t | (u, v, t) ∈ G, u ⪯ 0, v = 0}
       ≥ inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G, u ⪯ 0, v = 0}
       ≥ inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ G}
       = g(λ, ν)

Geometric interpretation

Example: consider a simple problem with one constraint

    min  f0(x)          p∗ = inf{t | (u, t) ∈ G, u ≤ 0}
    s.t. f1(x) ≤ 0      g(λ) = inf{t + λu | (u, t) ∈ G}

where G = {(f1(x), f0(x)) | x ∈ D}
◮ λu + t = g(λ) is a (nonvertical) supporting hyperplane to G
◮ the hyperplane intersects the t-axis at t = g(λ)


[Figure 5.3: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one (inequality) constraint. Given λ, we minimize (λ, 1)ᵀ(u, t) over G = {(f1(x), f0(x)) | x ∈ D}; this yields a supporting hyperplane with slope −λ, whose intersection with the u = 0 axis gives g(λ).]

[Figure 5.4: supporting hyperplanes corresponding to three dual feasible values of λ, including the optimum λ⋆. Strong duality does not hold; the optimal duality gap p⋆ − d⋆ is positive.]

Geometric interpretation

geometric interpretation via the epigraph form A
◮ epigraph form of G:

    A = G + (R₊ᵐ × {0} × R₊)
      = {(u, v, t) | ∃x ∈ D : fi(x) ≤ ui, i = 1, ..., m, hi(x) = vi, i = 1, ..., p, f0(x) ≤ t}

  includes all points with larger objective or inequality constraint function values
◮ optimal value: p∗ = inf{t | (0, 0, t) ∈ A}
◮ dual function: if λ ⪰ 0, then g(λ, ν) = inf{(λ, ν, 1)ᵀ(u, v, t) | (u, v, t) ∈ A}
    ◮ if the infimum is finite, then (λ, ν, 1)ᵀ(u, v, t) ≥ g(λ, ν) defines a nonvertical supporting hyperplane to A
◮ weak duality: p∗ = (λ, ν, 1)ᵀ(0, 0, p∗) ≥ g(λ, ν)
◮ strong duality: holds iff there exists a nonvertical supporting hyperplane to A at its boundary point (0, 0, p∗)
    ◮ for a convex problem, A is convex, hence has a supporting hyperplane at (0, 0, p∗)
    ◮ Slater’s condition guarantees the supporting hyperplane to be nonvertical

Geometric interpretation

Example: consider a simple problem with one constraint

    min  f0(x)          p∗ = inf{t | (0, t) ∈ A}
    s.t. f1(x) ≤ 0      g(λ) = inf{(λ, 1)ᵀ(u, t) | (u, t) ∈ A}

where A = {(u, t)|∃x ∈ D, f1(x) ≤ u, f0(x) ≤ t}


[Figure 5.5: geometric interpretation of the dual function and lower bound g(λ) ≤ p⋆ for a problem with one (inequality) constraint. Given λ, we minimize (λ, 1)ᵀ(u, t) over A = {(u, t) | ∃x ∈ D : f0(x) ≤ t, f1(x) ≤ u}; this yields a supporting hyperplane with slope −λ, whose intersection with the u = 0 axis gives g(λ).]

Certificate of suboptimality and stopping criteria

do not assume the primal problem is convex; let x and (λ, ν) be a primal feasible point and a dual feasible point, respectively
◮ (λ, ν) provides a proof or certificate that p∗ ≥ g(λ, ν)
◮ (λ, ν) bounds how suboptimal x is, without knowing p∗:

    f0(x) − p∗ ≤ f0(x) − g(λ, ν)

◮ provides nonheuristic stopping criteria in optimization algorithms
◮ x and (λ, ν) localize p∗ and d∗ to an interval:

    p∗, d∗ ∈ [g(λ, ν), f0(x)]

  whose width is the duality gap f0(x) − g(λ, ν) associated with x and (λ, ν)
◮ if f0(x) = g(λ, ν), then x is primal optimal and (λ, ν) is dual optimal
    ◮ (λ, ν) is a certificate that proves x is optimal
    ◮ x is a certificate that proves (λ, ν) is dual optimal
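The certificate idea can be sketched numerically on a toy problem (the instance and the chosen points are assumptions, not from the slides): for min x² s.t. x ≥ 1, with p∗ = 1, any primal feasible x and dual feasible λ localize p∗ to [g(λ), f0(x)] and bound the suboptimality of x by the gap:

```python
# Duality-gap certificate sketch for min x^2 s.t. x >= 1 (p* = 1).

def f0(x):
    return x * x

def g(lam):
    # L(x, lam) = x^2 + lam*(1 - x) is minimized at x = lam/2,
    # so g(lam) = lam - lam^2/4 (dual feasible for lam >= 0)
    return lam - 0.25 * lam * lam

x, lam = 1.2, 1.5      # feasible but suboptimal points, chosen by hand
gap = f0(x) - g(lam)   # duality gap associated with this pair

p_star = 1.0
assert g(lam) <= p_star <= f0(x)       # p* localized to [g(lam), f0(x)]
assert f0(x) - p_star <= gap + 1e-12   # suboptimality of x bounded by the gap
```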

Complementary slackness

Let x∗ and (λ∗, ν∗) be any primal optimal and dual optimal points, and assume strong duality holds. Then

    f0(x∗) = g(λ∗, ν∗) = inf_x ( f0(x) + ∑_{i=1}^m λi∗ fi(x) + ∑_{i=1}^p νi∗ hi(x) )
           ≤ f0(x∗) + ∑_{i=1}^m λi∗ fi(x∗) + ∑_{i=1}^p νi∗ hi(x∗)
           ≤ f0(x∗)

Hence, the two inequalities hold with equality, implying:
◮ x∗ minimizes L(x, λ∗, ν∗) over x (L(x, λ∗, ν∗) can have other minimizers)
◮ complementary slackness: λi∗ fi(x∗) = 0, i = 1, ..., m, i.e.,

    λi∗ > 0 ⟹ fi(x∗) = 0,    fi(x∗) < 0 ⟹ λi∗ = 0

  λi∗ = 0 unless the ith constraint is active at the optimum

Karush-Kuhn-Tucker (KKT) conditions

Consider any optimization problem with differentiable objective and constraint functions. The following four conditions are called the KKT conditions:
◮ primal constraints:

    fi(x) ≤ 0, i = 1, ..., m,    hi(x) = 0, i = 1, ..., p
◮ dual constraints: λ ⪰ 0
◮ complementary slackness:

    λi fi(x) = 0, i = 1, ..., m
◮ gradient of L(x, λ, ν) with respect to x vanishes:

    ∇f0(x) + ∑_{i=1}^m λi ∇fi(x) + ∑_{i=1}^p νi ∇hi(x) = 0
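The four conditions can be checked mechanically at a candidate point. The sketch below uses an illustrative instance (not from the slides): min x1² + x2² s.t. x1 + x2 ≥ 2, i.e., f1(x) = 2 − x1 − x2 ≤ 0, with candidate optimum x∗ = (1, 1) and λ∗ = 2:

```python
# Verifying the four KKT conditions at x* = (1, 1), lam* = 2 for
# min x1^2 + x2^2 s.t. f1(x) = 2 - x1 - x2 <= 0 (illustrative instance).

x = (1.0, 1.0)
lam = 2.0

f1 = 2.0 - x[0] - x[1]              # inequality constraint value
grad_f0 = (2 * x[0], 2 * x[1])      # gradient of the objective
grad_f1 = (-1.0, -1.0)              # gradient of the constraint

assert f1 <= 1e-12                  # 1) primal feasibility
assert lam >= 0                     # 2) dual feasibility
assert abs(lam * f1) < 1e-12        # 3) complementary slackness
# 4) stationarity: grad f0 + lam * grad f1 = 0
assert all(abs(grad_f0[i] + lam * grad_f1[i]) < 1e-12 for i in range(2))
```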

Karush-Kuhn-Tucker (KKT) conditions

consider any optimization problem with differentiable objective and constraint functions

KKT conditions for nonconvex/convex problems
◮ for any optimization problem, if strong duality holds, any pair of primal and dual optimal points x∗, (λ∗, ν∗) must satisfy the KKT conditions
◮ proof: the first and second conditions hold since x∗ is primal feasible and λ∗ ⪰ 0 is dual feasible. The third condition is shown on page 23. The fourth condition follows from the fact that x∗ = argmin_x L(x, λ∗, ν∗) (shown on page 23) and L(x, λ∗, ν∗) is differentiable.

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions for convex problems
◮ for any convex optimization problem, any points x̃ and (λ̃, ν̃) that satisfy the KKT conditions are primal and dual optimal, with zero duality gap
◮ proof: the first and second conditions state that x̃ and (λ̃, ν̃) are primal and dual feasible, respectively. Since L(x, λ̃, ν̃) is convex in x (as λ̃ ⪰ 0), the fourth condition implies that x̃ = argmin_x L(x, λ̃, ν̃), i.e.,

    g(λ̃, ν̃) = L(x̃, λ̃, ν̃) = f0(x̃) + ∑_{i=1}^m λ̃i fi(x̃) + ∑_{i=1}^p ν̃i hi(x̃) = f0(x̃)

  where the last equality follows from the first and third conditions. g(λ̃, ν̃) = f0(x̃) means zero duality gap, implying that x̃ and (λ̃, ν̃) are primal and dual optimal.
◮ if a convex optimization problem satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality, i.e.,
    ◮ x is optimal iff there exist (λ, ν) that, together with x, satisfy the KKT conditions

Karush-Kuhn-Tucker (KKT) conditions

KKT conditions play an important role in optimization
◮ in a few special cases, it is possible to solve the KKT conditions (and therefore the optimization problem) analytically
◮ more generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions

Example

water-filling (assume αi > 0)

    min  −∑_{i=1}^n log(xi + αi)
    s.t. x ⪰ 0, 1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ Rⁿ, ν ∈ R such that

    λ ⪰ 0,    λi xi = 0,    1/(xi + αi) + λi = ν

◮ if ν < 1/αi: λi = 0 and xi = 1/ν − αi
◮ if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
◮ determine ν from 1ᵀx = ∑_{i=1}^n max{0, 1/ν − αi} = 1

thus, the optimal point is given by

    xi∗ = max{0, 1/ν∗ − αi}

where ν∗ satisfies ∑_{i=1}^n max{0, 1/ν∗ − αi} = 1

Example

interpretation:
◮ n patches; the level of patch i is at height αi
◮ flood the area with a unit amount of water
◮ the resulting level is 1/ν∗
◮ the depth of water above patch i is xi∗


[Figure 5.7: illustration of water-filling. The height of each patch is given by αi. The region is flooded to a level 1/ν⋆, which uses a total quantity of water equal to one. The height of the water (shown shaded) above each patch is the optimal value of xi⋆.]
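The water-filling equation ∑ᵢ max{0, 1/ν − αi} = 1 has no closed form in general, but since the left-hand side is decreasing in ν it can be solved by bisection. A minimal sketch (the patch heights are arbitrary example values):

```python
# Water-filling via bisection on nu: find nu* with total_water(nu*) = 1.

def total_water(nu, alphas):
    """Amount of water used at level 1/nu: sum_i max(0, 1/nu - a_i)."""
    return sum(max(0.0, 1.0 / nu - a) for a in alphas)

def water_fill(alphas):
    # total_water is decreasing in nu; bracket the root, then bisect
    lo, hi = 1e-9, 1.0 / min(alphas)        # total_water(hi) = 0 <= 1
    while total_water(lo, alphas) < 1.0:     # ensure the low end overfills
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_water(mid, alphas) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alphas = [0.3, 0.6, 1.2]                     # example patch heights
nu = water_fill(alphas)
x = [max(0.0, 1.0 / nu - a) for a in alphas]
assert abs(sum(x) - 1.0) < 1e-9              # unit amount of water used
assert all(xi >= 0 for xi in x)              # primal feasibility
```

For these heights the first two patches are flooded and the third stays dry: the water level 1/ν∗ solves (L − 0.3) + (L − 0.6) = 1, i.e., L = 0.95 < 1.2.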

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    min  f0(x)                        max  g(λ, ν)
    s.t. fi(x) ≤ 0, i = 1, ..., m     s.t. λ ⪰ 0
         hi(x) = 0, i = 1, ..., p

perturbed problem and its dual

    min  f0(x)                        max  g(λ, ν) − uᵀλ − vᵀν
    s.t. fi(x) ≤ ui, i = 1, ..., m    s.t. λ ⪰ 0
         hi(x) = vi, i = 1, ..., p

Perturbation and sensitivity analysis

◮ x is the primal variable; u, v are parameters
    ◮ tighten (ui < 0) or loosen (ui > 0) the ith inequality constraint by ui
    ◮ change the righthand side of the ith equality constraint by vi
◮ p∗(u, v) is the optimal value of the perturbed problem, as a function of the perturbations to the righthand sides of the constraints
    ◮ p∗(0, 0) = p∗
    ◮ when p∗(u, v) = ∞, the perturbations of the constraints result in infeasibility
◮ when the unperturbed problem is convex, p∗(u, v) is a convex function of u and v
◮ we are interested in information about p∗(u, v) obtained from the solution of the unperturbed problem and its dual

Perturbation and sensitivity analysis

global sensitivity

Assume that strong duality holds for the unperturbed problem and that the dual optimum is attained. Let (λ∗, ν∗) be dual optimal for the unperturbed problem. Then for all u and v,

    p∗(u, v) ≥ p∗(0, 0) − uᵀλ∗ − vᵀν∗

global sensitivity interpretation:
◮ large λi∗: p∗ increases greatly if constraint i is tightened (ui < 0)
◮ small λi∗: p∗ does not decrease much if constraint i is loosened (ui > 0)
◮ large positive νi∗: p∗ increases greatly if vi < 0; large negative νi∗: p∗ increases greatly if vi > 0
◮ small positive νi∗: p∗ does not decrease much if vi > 0; small negative νi∗: p∗ does not decrease much if vi < 0

Perturbation and sensitivity analysis

proof: apply weak duality to the perturbed problem and then strong duality to the unperturbed problem

    p∗(u, v) ≥ g(λ∗, ν∗) − uᵀλ∗ − vᵀν∗ = p∗(0, 0) − uᵀλ∗ − vᵀν∗

example: p∗(u) for a problem with one inequality constraint:

[Figure 5.10: optimal value p⋆(u) of a convex problem with one constraint f1(x) ≤ u, as a function of u. For u = 0 we have the original unperturbed problem; for u < 0 the constraint is tightened, and for u > 0 the constraint is loosened. The affine function p⋆(0) − λ⋆u is a lower bound on p⋆(u).]

Perturbation and sensitivity analysis

local sensitivity

If (in addition) p∗(u, v) is differentiable at (0, 0), then

    λi∗ = −∂p∗(0, 0)/∂ui,    νi∗ = −∂p∗(0, 0)/∂vi

local sensitivity interpretation:
◮ the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations
◮ tightening (loosening) the ith inequality constraint a small amount ui yields an increase (a decrease) in p∗ of approximately −λi∗ui (λi∗ui)
◮ the local sensitivity result gives a quantitative measure of how active a constraint is at the optimum x∗
    ◮ fi(x∗) < 0: the constraint can be tightened or loosened a small amount without affecting the optimal value, as λi∗ = 0
    ◮ fi(x∗) = 0: small (large) λi∗ means that the constraint can be loosened or tightened a bit without much (with great) effect on the optimal value
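The local sensitivity result is easy to confirm numerically on a toy problem (the instance is an assumption, not from the slides): for min x² s.t. 1 − x ≤ u, the perturbed optimal value is p∗(u) = (1 − u)² for u ≤ 1, and KKT gives λ∗ = 2 at u = 0, so −dp∗/du at u = 0 should equal λ∗:

```python
# Local sensitivity check: -dp*/du |_{u=0} = lam* for
# min x^2 s.t. 1 - x <= u  (i.e. x >= 1 - u), where lam* = 2.

def p_star(u):
    # for u >= 1 the unconstrained minimum x = 0 becomes feasible
    return (1.0 - u) ** 2 if u < 1.0 else 0.0

lam_star = 2.0
h = 1e-6
dpdu = (p_star(h) - p_star(-h)) / (2 * h)  # central difference at u = 0

assert abs(-dpdu - lam_star) < 1e-6
```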

Perturbation and sensitivity analysis

proof (for λi∗): choosing u = t·ei and v = 0, the global sensitivity result gives

    p∗(t·ei, 0) − p∗(0, 0) ≥ −λi∗ t
    ⟹  (p∗(t·ei, 0) − p∗(0, 0))/t ≥ −λi∗ for t > 0,  (p∗(t·ei, 0) − p∗(0, 0))/t ≤ −λi∗ for t < 0
    ⟹  lim_{t↓0} (p∗(t·ei, 0) − p∗(0, 0))/t ≥ −λi∗,  lim_{t↑0} (p∗(t·ei, 0) − p∗(0, 0))/t ≤ −λi∗

Thus, ∂p∗(0, 0)/∂ui = −λi∗.

Duality and problem reformulations

◮ equivalent formulations of a problem can lead to very different dual problems
◮ reformulating the primal problem can be useful when the dual problem is difficult to derive, or uninteresting

common reformulations
◮ introduce new variables and associated equality constraints
◮ replace the objective with an increasing function of the original objective
◮ make explicit constraints implicit (i.e., incorporate them into the domain of the objective), or vice versa

Introducing new variables and equality constraints

unconstrained problem: min_x f0(Ax + b)
◮ dual function is a constant: g = inf_x f0(Ax + b) = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

    min  f0(y)                max  bᵀν − f0∗(ν)
    s.t. Ax + b − y = 0       s.t. Aᵀν = 0

dual function follows from

    g(ν) = inf_{x,y} ( f0(y) + νᵀ(Ax + b − y) )
         = { inf_y ( f0(y) − νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                           otherwise
         = { −f0∗(ν) + bᵀν,  Aᵀν = 0
           { −∞,             otherwise

Introducing new variables and equality constraints

minimum norm problem: min_x ‖Ax − b‖
◮ dual function is a constant: g = inf_x ‖Ax − b‖ = p∗
◮ strong duality holds, i.e., p∗ = d∗, but the dual is not useful

reformulated problem and its dual:

    min  ‖y‖               max  bᵀν
    s.t. y = Ax − b        s.t. Aᵀν = 0, ‖ν‖∗ ≤ 1

dual function follows from

    g(ν) = inf_{x,y} ( ‖y‖ + νᵀ(y − Ax + b) )
         = { inf_y ( ‖y‖ + νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                         otherwise
         = { bᵀν,  Aᵀν = 0, ‖ν‖∗ ≤ 1
           { −∞,   otherwise

Transforming the objective

replace the objective with an increasing function of the original objective

minimum norm problem: min_x ‖Ax − b‖

reformulated problem and its dual:

    min  (1/2)‖y‖²            max  −(1/2)‖ν‖∗² + bᵀν
    s.t. y = Ax − b           s.t. Aᵀν = 0

dual function follows from

    g(ν) = inf_{x,y} ( (1/2)‖y‖² + νᵀ(y − Ax + b) )
         = { inf_y ( (1/2)‖y‖² + νᵀy ) + bᵀν,  Aᵀν = 0
           { −∞,                               otherwise
         = { −(1/2)‖ν‖∗² + bᵀν,  Aᵀν = 0
           { −∞,                 otherwise

last equality: the conjugate of (1/2)‖·‖² is (1/2)‖·‖∗² (Ex. 3.27, p. 93)

Implicit constraints

make explicit constraints implicit (i.e., incorporate them into the domain of the objective), or vice versa

LP with box constraints: primal and dual problem

    min  cᵀx             max  −bᵀν − 1ᵀλ1 − 1ᵀλ2
    s.t. Ax = b          s.t. c + Aᵀν + λ1 − λ2 = 0
         −1 ⪯ x ⪯ 1           λ1 ⪰ 0, λ2 ⪰ 0

reformulated problem (box constraints made implicit) and its dual:

    min  f0(x) = { cᵀx,  −1 ⪯ x ⪯ 1          max  −bᵀν − ‖Aᵀν + c‖1
                 { ∞,    otherwise
    s.t. Ax = b

dual function follows from:

    g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) ) = −bᵀν − ‖Aᵀν + c‖1
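A scalar instance makes the box-constrained dual concrete (A = 1, b = 0.5, c = 2 is a hand-built assumption): the primal min c·x s.t. x = 0.5, −1 ≤ x ≤ 1 has the single feasible point x = 0.5 with p∗ = 1, and the reformulated dual −bν − |ν + c| attains 1 at ν = −2:

```python
# Scalar check of the box-LP dual: max over nu of -b*nu - |a*nu + c|
# for a = 1, b = 0.5, c = 2 (illustrative instance).

def dual(nu, a=1.0, b=0.5, c=2.0):
    return -b * nu - abs(a * nu + c)

p_star = 2.0 * 0.5  # c * x at the only feasible point x = 0.5

# weak duality on a grid of nu, and strong duality at nu = -2
grid = [n / 10.0 for n in range(-100, 101)]
assert all(dual(nu) <= p_star + 1e-12 for nu in grid)
assert abs(dual(-2.0) - p_star) < 1e-12
```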

Problems with generalized inequality constraints

do not assume convexity of the problem

    p∗ ≜ min  f0(x)

         s.t. fi(x) ⪯_{Ki} 0, i = 1, ..., m
              hi(x) = 0, i = 1, ..., p

Ki ⊆ R^{ki} is a proper cone; ⪯_{Ki} is a generalized inequality on R^{ki}
◮ Lagrange multiplier vector associated with fi(x) ⪯_{Ki} 0: λi ∈ R^{ki}; Lagrange multiplier associated with hi(x) = 0: νi ∈ R
◮ Lagrangian L : Rⁿ × R^{k1} × ... × R^{km} × Rᵖ → R:

    L(x, λ1, ..., λm, ν) = f0(x) + ∑_{i=1}^m λiᵀ fi(x) + ∑_{i=1}^p νi hi(x)

◮ (concave) dual function g : R^{k1} × ... × R^{km} × Rᵖ → R:

    g(λ1, ..., λm, ν) = inf_{x∈D} L(x, λ1, ..., λm, ν)

Lower bound property

The dual function yields lower bounds on the optimal value of the primal problem: for any λi ⪰_{Ki∗} 0 and any ν,

    p∗ ≥ g(λ1, ..., λm, ν)

proof: if x̃ is feasible and λi ⪰_{Ki∗} 0, then

    f0(x̃) ≥ f0(x̃) + ∑_{i=1}^m λiᵀ fi(x̃) + ∑_{i=1}^p νi hi(x̃)
          ≥ inf_{x∈D} L(x, λ1, ..., λm, ν) = g(λ1, ..., λm, ν)

where the first inequality follows from the definition of the dual cone. Minimizing over all feasible x̃ gives p∗ ≥ g(λ1, ..., λm, ν).

dual problem:

    d∗ ≜ max  g(λ1, ..., λm, ν)
         s.t. λi ⪰_{Ki∗} 0, i = 1, ..., m

Weak duality and strong duality

weak duality: d∗ ≤ p∗
◮ always holds (for convex and nonconvex problems)
◮ can be used to find nontrivial lower bounds for difficult problems

strong duality: d∗ = p∗
◮ does not hold in general
◮ holds for convex problems with a constraint qualification, e.g.,
    ◮ Slater’s condition: the primal problem is strictly feasible

Examples

Semidefinite program

primal SDP (Fi, G ∈ Sᵏ, positive semidefinite cone K1 = S₊ᵏ):

    min  cᵀx

    s.t. x1F1 + ... + xnFn ⪯ G

◮ Lagrange multiplier is a matrix Z ∈ Sᵏ, and the Lagrangian is

    L(x, Z) = cᵀx + tr(Z(x1F1 + ... + xnFn − G))

            = x1(c1 + tr(F1Z)) + ... + xn(cn + tr(FnZ)) − tr(GZ)

◮ dual function

    g(Z) = inf_x L(x, Z) = { −tr(GZ),  tr(Fi Z) + ci = 0, i = 1, ..., n
                           { −∞,       otherwise

dual SDP:

    max  −tr(GZ)

    s.t. Z ⪰ 0, tr(Fi Z) + ci = 0, i = 1, ..., n

Examples

Cone program in standard form

primal CP (proper cone K ⊆ Rⁿ):

    min  cᵀx

    s.t. x ⪰_K 0, Ax = b

◮ Lagrange multipliers λ ∈ Rⁿ, ν ∈ Rᵐ, and the Lagrangian is

    L(x, λ, ν) = cᵀx − λᵀx + νᵀ(Ax − b) = (Aᵀν − λ + c)ᵀx − bᵀν

◮ dual function

    g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν,  Aᵀν − λ + c = 0
                                  { −∞,    otherwise

dual CP:

    max  −bᵀν
    s.t. Aᵀν + c ⪰_{K∗} 0

KKT conditions

differentiable fi, hi
◮ primal constraints:

    fi(x) ⪯_{Ki} 0, i = 1, ..., m,    hi(x) = 0, i = 1, ..., p
◮ dual constraints: λi ⪰_{Ki∗} 0
◮ complementary slackness: λiᵀ fi(x) = 0, i = 1, ..., m, implying

    λi ≻_{Ki∗} 0 ⟹ fi(x) = 0,    fi(x) ≺_{Ki} 0 ⟹ λi = 0

◮ gradient of the Lagrangian with respect to x vanishes:

    ∇f0(x) + ∑_{i=1}^m Dfi(x)ᵀλi + ∑_{i=1}^p νi ∇hi(x) = 0

KKT conditions for nonconvex/convex problems: if strong duality holds, any primal optimal and any dual optimal points must satisfy the KKT conditions

KKT conditions for convex problems: if strong duality holds, the KKT conditions provide necessary and sufficient conditions for optimality