
CS675: Convex and Combinatorial Optimization, Fall 2019

Instructor: Shaddin Dughmi

Outline

1 The Lagrange Dual Problem

2 Duality

3 Optimality Conditions

Recall: Optimization Problem in Standard Form

minimize    f0(x)
subject to  fi(x) ≤ 0,  for i = 1, . . . , m
            hi(x) = 0,  for i = 1, . . . , k

For convex optimization problems in standard form, each fi is convex and each hi is affine. Let D denote the common domain of all these functions (i.e. the points where their values are finite).

This Lecture + Next
We will develop duality theory for convex optimization problems, generalizing linear programming duality.

Running Example: Linear Programming

We have already seen the standard form LP below

maximize    cᵀx               −minimize   −cᵀx
subject to  Ax ≤ b            subject to  Ax − b ≤ 0
            x ≥ 0                         −x ≤ 0

Along the way, we will recover the following standard form dual

minimize    yᵀb
subject to  Aᵀy ≥ c
            y ≥ 0

The Lagrangian

minimize    f0(x)
subject to  fi(x) ≤ 0,  for i = 1, . . . , m
            hi(x) = 0,  for i = 1, . . . , k

Basic idea of Lagrangian duality is to relax/soften the constraints by replacing each with a linear “penalty term” or “cost” in the objective.

The Lagrangian Function
L(x, λ, ν) = f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{k} νi hi(x)

λi is the Lagrange multiplier for the i'th inequality constraint; required to be nonnegative.
νi is the Lagrange multiplier for the i'th equality constraint; allowed to be of arbitrary sign.
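The Lagrangian above is just a weighted sum and is easy to write down concretely. A minimal pure-Python sketch; the one-dimensional objective and constraint below are made-up illustrations, not from the lecture:

```python
def lagrangian(f0, fs, hs, x, lam, nu):
    """L(x, lam, nu) = f0(x) + sum_i lam[i]*f_i(x) + sum_i nu[i]*h_i(x)."""
    assert all(l >= 0 for l in lam), "inequality multipliers must be nonnegative"
    return (f0(x)
            + sum(l * f(x) for l, f in zip(lam, fs))
            + sum(n * h(x) for n, h in zip(nu, hs)))

# Hypothetical example: minimize x^2 subject to 1 - x <= 0 (i.e. x >= 1).
f0 = lambda x: x * x
f1 = lambda x: 1 - x

# For a feasible x and lam >= 0, the penalty term is nonpositive,
# so L(x, lam, nu) <= f0(x).
x_feas = 2.0
val = lagrangian(f0, [f1], [], x_feas, [3.0], [])
assert val <= f0(x_feas)
```

This is exactly the "penalty term" reading of the slide: at feasible points the soft penalty can only lower the objective.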

The Lagrange Dual Function

minimize    f0(x)
subject to  fi(x) ≤ 0,  for i = 1, . . . , m
            hi(x) = 0,  for i = 1, . . . , k

The Lagrange dual function gives the optimal value of the primal problem subject to the softened constraints.

The Lagrange Dual Function
g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{k} νi hi(x) )

Observe: g is a concave function of the Lagrange multipliers (it is a pointwise infimum of functions affine in (λ, ν)).
We will see: it's quite common for the Lagrange dual to be unbounded (−∞) for some λ and ν.
By convention, the domain of g is the set of (λ, ν) such that g(λ, ν) > −∞.
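As a concrete illustration on a made-up toy problem (not from the slides): for minimize x² subject to 1 − x ≤ 0 (so p∗ = 1, attained at x = 1), the inner infimum has a closed form, giving g(λ) = λ − λ²/4, and weak duality can be checked numerically:

```python
def g(lam):
    """Dual function of: minimize x^2 subject to 1 - x <= 0.
    L(x, lam) = x^2 + lam*(1 - x) is minimized at x = lam/2 (set 2x - lam = 0)."""
    x = lam / 2.0
    return x * x + lam * (1 - x)

p_star = 1.0  # primal optimum, attained at x = 1

# Weak duality: g(lam) <= p* for every lam >= 0.
for lam in (0.0, 0.5, 1.0, 2.0, 5.0):
    assert g(lam) <= p_star + 1e-12

# The bound is tight at lam = 2 (strong duality holds for this toy problem).
assert abs(g(2.0) - p_star) < 1e-12
```

Note that g(λ) = λ − λ²/4 is concave in λ, exactly as the observation above predicts.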

Lagrange Dual of LP

minimize    −cᵀx
subject to  Ax − b ≤ 0
            −x ≤ 0

First, the Lagrangian function:

L(x, λ) = −cᵀx + λ1ᵀ(Ax − b) − λ2ᵀx
        = (Aᵀλ1 − c − λ2)ᵀx − λ1ᵀb

And the Lagrange dual:

g(λ) = inf_x L(x, λ) = { −∞      if Aᵀλ1 − c − λ2 ≠ 0
                       { −λ1ᵀb   if Aᵀλ1 − c − λ2 = 0

So we restrict the domain of g to λ satisfying Aᵀλ1 − c − λ2 = 0.

Interpretation: “Soft” Lower Bound

minimize    f0(x)
subject to  fi(x) ≤ 0,  for i = 1, . . . , m
            hi(x) = 0,  for i = 1, . . . , k

The Lagrange Dual Function
g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{k} νi hi(x) )

Fact
g(λ, ν) is a lower bound on OPT(primal) for every λ ≥ 0 and ν ∈ Rᵏ.

Proof
Every primal feasible x incurs a nonpositive penalty in L(x, λ, ν).
Therefore, L(x∗, λ, ν) ≤ f0(x∗).
So g(λ, ν) ≤ L(x∗, λ, ν) ≤ f0(x∗) = OPT(primal).


Interpretation
A "hard" feasibility constraint can be thought of as imposing a penalty of +∞ if violated, and a penalty/reward of 0 if satisfied.
The Lagrangian imposes a "soft" linear penalty for violating a constraint, and a reward for slack.
The Lagrange dual finds the optimal value subject to these soft constraints.

Interpretation: Geometric

Most easily visualized in the presence of a single inequality constraint

minimize f0(x) subject to f1(x) ≤ 0

Let G be the set of attainable constraint/objective value tuples,
i.e. (u, t) ∈ G if there is an x such that f1(x) = u and f0(x) = t.
p∗ = inf {t : (u, t) ∈ G, u ≤ 0}
g(λ) = inf {λu + t : (u, t) ∈ G}

The line λu + t = g(λ) is a supporting hyperplane to G pointing northeast.
It must intersect the vertical axis at or below p∗.
Therefore g(λ) ≤ p∗.

The Lagrange Dual Problem

This is the problem of finding the best lower bound on OPT(primal) implied by the Lagrange dual function

maximize    g(λ, ν)
subject to  λ ≥ 0

Note: this is a convex optimization problem, regardless of whether the primal problem was convex.
By convention, we sometimes add "dual feasibility" constraints to impose nontrivial lower bounds (i.e. g(λ, ν) > −∞).
A pair (λ∗, ν∗) solving the above is referred to as a dual optimal solution.

Lagrange Dual Problem of LP

maximize    cᵀx               −minimize   −cᵀx
subject to  Ax ≤ b            subject to  Ax − b ≤ 0
            x ≥ 0                         −x ≤ 0

Recall: our Lagrange dual function for the above minimization LP (on the right), defined over the domain Aᵀλ1 − c − λ2 = 0.

g(λ) = −λ1ᵀb

The Lagrange dual problem can then be written as

−maximize   −λ1ᵀb
subject to  Aᵀλ1 ≥ c      (the constraint Aᵀλ1 − c − λ2 = 0, with λ2 ≥ 0 eliminated)
            λ1 ≥ 0

Writing y = λ1, this is exactly the standard form dual:

minimize    yᵀb
subject to  Aᵀy ≥ c
            y ≥ 0
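This primal/dual pair can be sanity-checked numerically. A pure-Python sketch with made-up data (the particular A, b, c, and the feasible points x and y below are illustrative, chosen so the bound happens to be tight):

```python
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, v): return [dot(row, v) for row in M]
def transpose(M): return [list(col) for col in zip(*M)]

# Primal: max c^T x  s.t. Ax <= b, x >= 0
# Dual:   min b^T y  s.t. A^T y >= c, y >= 0
A = [[1, 1], [1, 0]]
b = [4, 2]
c = [3, 2]

x = [2, 2]  # primal feasible point
y = [2, 1]  # dual feasible point

assert all(xi >= 0 for xi in x) and all(ai <= bi for ai, bi in zip(matvec(A, x), b))
assert all(yi >= 0 for yi in y) and all(ti >= ci for ti, ci in zip(matvec(transpose(A), y), c))

assert dot(c, x) <= dot(b, y)        # weak duality
assert dot(c, x) == dot(b, y) == 10  # the bound is tight here: both points are optimal
```

Any dual feasible y certifies an upper bound bᵀy on the maximization primal; equality certifies optimality of both points.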

Another Example: Conic Optimization Problem

minimize    cᵀx               maximize    νᵀb
subject to  Ax = b            subject to  Aᵀν − c ∈ K°
            x ∈ K

x ∈ K can equivalently be written as zᵀx ≤ 0, ∀z ∈ K°.

L(x, λ, ν) = cᵀx + νᵀ(b − Ax) + Σ_{z∈K°} λz zᵀx
           = (c − Aᵀν + Σ_{z∈K°} λz z)ᵀx + νᵀb

We can think of choosing λ ≥ 0 as choosing some s = Σ_{z∈K°} λz z ∈ K°:

L(x, s, ν) = (c − Aᵀν + s)ᵀx + νᵀb

The Lagrange dual function g(s, ν) is bounded only when the coefficient of x is zero (i.e. s = Aᵀν − c), in which case it has value νᵀb.
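A quick sanity check of this algebra for the special case where K is the nonnegative orthant (so the polar cone K° is the nonpositive orthant); the data below is made up for illustration:

```python
def dot(u, v): return sum(a * b for a, b in zip(u, v))

# K = nonnegative orthant, so K° = nonpositive orthant.
A = [[1, 2]]          # one equality constraint, two variables
b = [3]
c = [2, 3]
nu = [1.0]

# Choose s = A^T nu - c and verify it lies in K° (all entries <= 0),
# which is exactly the dual feasibility condition A^T nu - c in K°.
At_nu = [sum(A[i][j] * nu[i] for i in range(len(A))) for j in range(len(c))]
s = [At_nu[j] - c[j] for j in range(len(c))]
assert all(sj <= 0 for sj in s)

def L(x):
    # L(x, s, nu) = c^T x + nu^T (b - Ax) + s^T x
    Ax = [dot(row, x) for row in A]
    return dot(c, x) + dot(nu, [bi - axi for bi, axi in zip(b, Ax)]) + dot(s, x)

# With this choice the coefficient of x vanishes, so L is constant in x,
# with value nu^T b:
for x in ([0, 0], [5, 7], [-2, 4]):
    assert abs(L(x) - dot(nu, b)) < 1e-12
```

This mirrors the derivation: once s absorbs the coefficient of x, the dual function is just νᵀb on its domain.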

Outline

1 The Lagrange Dual Problem

2 Duality

3 Optimality Conditions

Weak Duality

Primal Problem:
minimize    f0(x)
subject to  fi(x) ≤ 0,  ∀i = 1, . . . , m
            hi(x) = 0,  ∀i = 1, . . . , k

Dual Problem:
maximize    g(λ, ν)
subject to  λ ≥ 0

Weak Duality
OPT(dual) ≤ OPT(primal).

We have already argued that this holds for every optimization problem.
Duality gap: the difference between the optimal primal and dual values.

Recall: Geometric Interpretation of Weak Duality

minimize f0(x) subject to f1(x) ≤ 0

Let G be attainable constraint/objective function value tuples

i.e. (u, t) ∈ G if there is an x such that f1(x) = u and f0(x) = t.
p∗ = inf {t : (u, t) ∈ G, u ≤ 0}
g(λ) = inf {λu + t : (u, t) ∈ G}

Fact
The equation λu + t = g(λ) defines a supporting hyperplane to G, intersecting the t-axis at g(λ) ≤ p∗.


Strong Duality
We say strong duality holds if OPT(dual) = OPT(primal).

Equivalently: there exists a setting of Lagrange multipliers so that g(λ, ν) gives a tight lower bound on the primal optimal value.
In general, does not hold for non-convex optimization problems.
Usually, but not always, holds for convex optimization problems; mild assumptions, such as Slater's condition, are needed.

Geometric Proof of Strong Duality

minimize f0(x) subject to f1(x) ≤ 0

Let A be everything northeast of (i.e. "worse" than) G,
i.e. (u, t) ∈ A if there is an x such that f1(x) ≤ u and f0(x) ≤ t.
p∗ = inf {t : (0, t) ∈ A}
g(λ) = inf {λu + t : (u, t) ∈ A}

Fact
The equation λu + t = g(λ) defines a supporting hyperplane to A, intersecting the t-axis at g(λ) ≤ p∗.

Geometric Proof of Strong Duality

minimize f0(x) subject to f1(x) ≤ 0

Fact

When f0 and f1 are convex, A is convex.

Proof
Assume (u, t) and (u′, t′) are in A.
Then there exist x, x′ with (f1(x), f0(x)) ≤ (u, t) and (f1(x′), f0(x′)) ≤ (u′, t′).
By Jensen's inequality, for any α ∈ [0, 1]:
(f1(αx + (1−α)x′), f0(αx + (1−α)x′)) ≤ (αu + (1−α)u′, αt + (1−α)t′)
Therefore, the segment connecting (u, t) and (u′, t′) is also in A.

Geometric Proof of Strong Duality

minimize f0(x) subject to f1(x) ≤ 0

Theorem (Informal)
There is a choice of λ so that g(λ) = p∗. Therefore, strong duality holds.

Proof
Recall that (0, p∗) is on the boundary of A.
By the supporting hyperplane theorem, there is a supporting hyperplane to A at (0, p∗).
The direction of this supporting hyperplane gives us an appropriate λ.

I Lied (A little)

minimize f0(x) subject to f1(x) ≤ 0

In our proof, we ignored a technicality that can prevent strong duality from holding.
What if our supporting hyperplane H at (0, p∗) is vertical?
Then the normal to H is perpendicular to the t-axis.
In this case, no finite λ exists such that (λ, 1) is normal to H.
Somewhat counterintuitively, this can happen even in simple convex optimization problems (though it's somewhat rare in practice).

Duality 17/25 2 S A = R++ ({0} × [1, ∞]) Therefore, any supporting hyperplane to A at (0, 1) must be vertical. Optimal dual value is 0; a duality gap of 1.

Violation of Strong Duality

minimize e−x x2 subject to y ≤ 0

Let domain of constraint be region y ≥ 1 Problem is convex, with given by x = 0 Optimal value is 1, at x = 0 and y = 1

Duality 18/25 Violation of Strong Duality

minimize    e^(−x)
subject to  x²/y ≤ 0

Let the domain of the constraint be the region y ≥ 1.
The problem is convex, with feasible region given by x = 0.
The optimal value is 1, attained at x = 0 and y = 1.
Here A = R²₊₊ ∪ ({0} × [1, ∞)).
Therefore, any supporting hyperplane to A at (0, 1) must be vertical.
The optimal dual value is 0: a duality gap of 1.
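The gap can be seen numerically. A small sketch, taking the example to be minimize e^(−x) subject to x²/y ≤ 0 on the domain y ≥ 1: along the path x = t, y = t³ the Lagrangian tends to 0 for any λ ≥ 0, while p∗ = 1:

```python
import math

lam = 1.0     # any lam >= 0 gives the same infimum
p_star = 1.0  # primal optimum: feasibility forces x = 0, value e^0 = 1

def L(x, y):
    """Lagrangian e^(-x) + lam * x^2 / y over the domain y >= 1."""
    return math.exp(-x) + lam * x * x / y

# Along x = t, y = t^3 both terms vanish as t grows:
vals = [L(t, t ** 3) for t in (1.0, 10.0, 100.0, 1000.0)]
assert all(v2 < v1 for v1, v2 in zip(vals, vals[1:]))  # strictly decreasing
assert vals[-1] < 1e-2                                  # approaching g(lam) = 0
# So g(lam) = 0 for every lam >= 0, while p* = 1: a duality gap of 1.
```

The dual optimum 0 is never attained by the primal, matching the vertical supporting hyperplane picture.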

Duality 18/25 Can be weakened to requiring strict feasibility only of non-affine constraints

Slater’s Condition There exists a point x ∈ D where all inequality constraints are strictly satisfied (i.e. fi(x) < 0). I.e. the optimization problem is strictly feasible.

A sufficient condition for strong duality. Forces supporting hyperplane to be non-vertical

Duality 19/25 Slater’s Condition There exists a point x ∈ D where all inequality constraints are strictly satisfied (i.e. fi(x) < 0). I.e. the optimization problem is strictly feasible.

A sufficient condition for strong duality: it forces the supporting hyperplane to be non-vertical.
It can be weakened to requiring strict feasibility only of the non-affine constraints.

Outline

1 The Lagrange Dual Problem

2 Duality

3 Optimality Conditions

Recall: Lagrangian Duality

Primal Problem:
minimize    f0(x)
subject to  fi(x) ≤ 0,  ∀i = 1, . . . , m
            hi(x) = 0,  ∀i = 1, . . . , k

Dual Problem:
maximize    g(λ, ν)
subject to  λ ≥ 0

Weak Duality
OPT(dual) ≤ OPT(primal).

Strong Duality
OPT(dual) = OPT(primal).

Dual Solution as a Certificate

Primal Problem:
minimize    f0(x)
subject to  fi(x) ≤ 0,  ∀i = 1, . . . , m
            hi(x) = 0,  ∀i = 1, . . . , k

Dual Problem:
maximize    g(λ, ν)
subject to  λ ≥ 0

A dual solution serves as a certificate of optimality:

If f0(x) = g(λ, ν), and both are feasible, then both are optimal.
If f0(x) − g(λ, ν) ≤ ε, then both are within ε of optimality: OPT(primal) and OPT(dual) lie in the interval [g(λ, ν), f0(x)].
Primal-dual algorithms use dual certificates to recognize optimality, or to bound sub-optimality.
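A toy illustration of the certificate idea; the problem (minimize x² subject to x ≥ 1, with p∗ = 1 and dual function g(λ) = λ − λ²/4) is made up for this sketch:

```python
# Toy problem: minimize x^2 subject to x >= 1, so p* = 1,
# with dual function g(lam) = lam - lam^2/4.
f0 = lambda x: x * x
g = lambda lam: lam - lam * lam / 4.0

x, lam = 1.1, 1.8          # a feasible primal point and a dual point
gap = f0(x) - g(lam)       # certified suboptimality, computable without knowing p*

p_star = 1.0
assert g(lam) <= p_star <= f0(x)   # OPT is bracketed by [g(lam), f0(x)]
assert f0(x) - p_star <= gap       # x is provably within `gap` of optimal
```

The point of the certificate is the second assertion: the gap bounds suboptimality even when p∗ itself is unknown.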

Implications of Strong Duality

Primal Problem:
minimize    f0(x)
subject to  fi(x) ≤ 0,  ∀i = 1, . . . , m
            hi(x) = 0,  ∀i = 1, . . . , k

Dual Problem:
maximize    g(λ, ν)
subject to  λ ≥ 0

Facts
If strong duality holds, and x∗ and (λ∗, ν∗) are feasible & optimal, then:
x∗ minimizes L(x, λ∗, ν∗) over all x.
λi∗ fi(x∗) = 0 for all i = 1, . . . , m. (Complementary Slackness)

Proof
f0(x∗) = g(λ∗, ν∗) = min_x L(x, λ∗, ν∗)
       ≤ L(x∗, λ∗, ν∗) = f0(x∗) + Σ_{i=1}^{m} λi∗ fi(x∗) + Σ_{i=1}^{k} νi∗ hi(x∗)
       ≤ f0(x∗)
So every inequality is tight: x∗ minimizes L(x, λ∗, ν∗), and since each term λi∗ fi(x∗) is nonpositive yet their sum is 0, each must be 0.

Interpretation
Lagrange multipliers (λ∗, ν∗) "simulate" the primal feasibility constraints.
Interpreting λi as the "value" of the i'th constraint, at optimality only the binding constraints are "valuable".
Recall the economic interpretation of LP.

Primal Problem:
minimize    f0(x)
subject to  fi(x) ≤ 0,  ∀i = 1, . . . , m
            hi(x) = 0,  ∀i = 1, . . . , k

Dual Problem:
maximize    g(λ, ν)
subject to  λ ≥ 0

KKT Conditions
Suppose the primal problem is convex and defined on an open domain, and moreover the constraint functions are differentiable everywhere in the domain. If strong duality holds, then x∗ and (λ∗, ν∗) are optimal iff:
x∗ and (λ∗, ν∗) are feasible
λi∗ fi(x∗) = 0 for all i (Complementary Slackness)
∇x L(x∗, λ∗, ν∗) = ∇f0(x∗) + Σ_{i=1}^{m} λi∗ ∇fi(x∗) + Σ_{i=1}^{k} νi∗ ∇hi(x∗) = 0

Why are KKT Conditions Useful?
Derive an analytical solution to some convex optimization problems.
Gain structural insights.

Example: Equality-constrained Quadratic Program

minimize    (1/2) xᵀPx + qᵀx + r
subject to  Ax = b

KKT Conditions: Ax∗ = b and Px∗ + q + Aᵀν∗ = 0.
This is simply a linear system in the variables x∗ and ν∗: m + n equations in m + n variables.
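A minimal pure-Python sketch of this recipe on a made-up instance (minimize ½(x₁² + x₂²) subject to x₁ + x₂ = 1, so P = I and q = 0): assemble the KKT linear system and solve it by Gaussian elimination:

```python
def solve(M, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# minimize (1/2)(x1^2 + x2^2)  subject to  x1 + x2 = 1
P = [[1.0, 0.0], [0.0, 1.0]]
q = [0.0, 0.0]
A = [[1.0, 1.0]]
b = [1.0]

# KKT system:  [P  A^T] [x ]   [-q]
#              [A   0 ] [nu] = [ b]
M = [P[0] + [A[0][0]],
     P[1] + [A[0][1]],
     A[0] + [0.0]]
rhs = [-q[0], -q[1], b[0]]

x1, x2, nu = solve(M, rhs)
assert abs(x1 - 0.5) < 1e-12 and abs(x2 - 0.5) < 1e-12 and abs(nu + 0.5) < 1e-12
```

Here the analytical solution x∗ = (1/2, 1/2), ν∗ = −1/2 drops straight out of one linear solve, which is exactly the point of the slide.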

Example: Market Equilibria (Fisher's Model)

Buyers B, and goods G.

Buyer i has utility uij for each unit of good j ∈ G.

Buyer i has budget mi, and there's one divisible unit of each good.

Does there exist a market equilibrium?

Prices pj on items, such that each player can buy his favorite bundle that he can afford and the market clears (supply = demand).

Eisenberg-Gale Convex Program
maximize    Σ_{i∈B} mi log( Σ_{j∈G} uij xij )
subject to  Σ_{i∈B} xij ≤ 1,  for j ∈ G
            x ≥ 0

Using KKT conditions, we can prove that the dual variables corresponding to the item supply constraints are market-clearing prices!
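A tiny hand-checkable instance (made up for illustration): two buyers who each value only their own good. The stated allocation and prices satisfy market clearing, budget balance, and the KKT stationarity conditions of the Eisenberg-Gale program:

```python
# A tiny Fisher market: two buyers, two goods, each buyer values only "their" good.
u = [[1.0, 0.0], [0.0, 1.0]]   # utilities u[i][j]
m = [3.0, 5.0]                 # budgets
x = [[1.0, 0.0], [0.0, 1.0]]   # candidate EG-optimal allocation
p = [3.0, 5.0]                 # candidate prices = duals of the supply constraints

# Market clearing: each good is fully allocated.
for j in range(2):
    assert sum(x[i][j] for i in range(2)) == 1.0

# Budgets are exactly spent.
for i in range(2):
    assert sum(p[j] * x[i][j] for j in range(2)) == m[i]

# KKT stationarity of sum_i m_i log(sum_j u_ij x_ij):
# "bang per buck" m_i u_ij / (sum_j u_ij x_ij) <= p_j, with equality when x_ij > 0.
for i in range(2):
    util = sum(u[i][j] * x[i][j] for j in range(2))
    for j in range(2):
        marginal = m[i] * u[i][j] / util
        assert marginal <= p[j] + 1e-12
        if x[i][j] > 0:
            assert abs(marginal - p[j]) < 1e-12
```

The KKT multipliers on the supply constraints come out to exactly the budgets here, which is the market-clearing-prices claim in miniature.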
