
Lagrangian Duality and Convex Optimization

David Rosenberg

New York University

July 26, 2017


Introduction

Why Convex Optimization?

Historically: linear programs (linear objectives & constraints) were the focus.
Nonlinear programs: some easy, some hard.
By the early 2000s, the main distinction was between convex and non-convex problems.
Convex problems are the ones we know how to solve efficiently.
Mostly batch methods until... around 2010? (earlier if you were into neural nets)
By 2010 ± a few years, most people
understood the optimization / estimation / approximation error tradeoffs, and
accepted that stochastic methods were often faster at getting good results (especially on big data sets).
Now nobody's scared to try convex optimization machinery on non-convex problems.

Your Reference for Convex Optimization

Boyd and Vandenberghe (2004). Very clearly written, but has a ton of detail for a first pass. See the Extreme Abridgement of Boyd and Vandenberghe.

Notation from Boyd and Vandenberghe

We write f : R^p → R^q to mean that f maps from some subset of R^p, namely dom f ⊂ R^p, where dom f is the domain of f.


Convex Sets and Functions

Convex Sets

Definition
A set C is convex if for any x1, x2 ∈ C and any θ with 0 ≤ θ ≤ 1 we have

θx1 + (1 − θ)x2 ∈ C.
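As a quick numerical illustration (not a proof), here is a minimal Python sketch that samples pairs of points in the Euclidean unit ball, which is a convex set, and checks that their convex combinations stay inside it:

```python
import numpy as np

# Spot-check convexity of the unit ball C = {x : ||x||_2 <= 1}:
# theta*x1 + (1-theta)*x2 should remain in C for all 0 <= theta <= 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    x1 /= max(1.0, np.linalg.norm(x1))  # project into the ball if outside
    x2 /= max(1.0, np.linalg.norm(x2))
    theta = rng.uniform()
    assert np.linalg.norm(theta * x1 + (1 - theta) * x2) <= 1.0 + 1e-12
```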

KPM Fig. 7.4

Convex and Concave Functions

Definition
A function f : R^n → R is convex if dom f is a convex set and if for all x, y ∈ dom f and 0 ≤ θ ≤ 1, we have

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
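As a quick numerical spot-check of this inequality (again, not a proof), the sketch below samples random pairs x, y and mixing weights θ for f(x) = eˣ, which is convex on R:

```python
import numpy as np

# Numerically spot-check the convexity inequality for f(x) = exp(x):
# f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).
rng = np.random.default_rng(0)
f = np.exp
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-9  # small tolerance for floating-point rounding
```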

[Figure: the chord between points A = (x, f(x)) and B = (y, f(y)), with weights λ and 1 − λ, lies above the graph.]

KPM Fig. 7.5

Examples of Convex Functions on R

Examples
x ↦ ax + b is both convex and concave on R, for all a, b ∈ R.
x ↦ |x|^p for p ≥ 1 is convex on R.
x ↦ e^{ax} is convex on R, for all a ∈ R.
Every norm on R^n is convex (e.g. ‖x‖₁ and ‖x‖₂).
Max: (x1,...,xn) ↦ max{x1,...,xn} is convex on R^n.

Convex Functions and Optimization

Definition
A function f is strictly convex if the line segment connecting any two points on the graph of f lies strictly above the graph (excluding the endpoints).

Consequences for optimization:
convex: if there is a local minimum, then it is a global minimum.
strictly convex: if there is a local minimum, then it is the unique global minimum.


The General Optimization Problem


General Optimization Problem: Standard Form

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1,...,m
            hi(x) = 0,  i = 1,...,p

where x ∈ R^n are the optimization variables and f0 is the objective function.
Assume the domain D = (⋂_{i=0}^m dom fi) ∩ (⋂_{i=1}^p dom hi) is nonempty.
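To make the standard form concrete, here is a minimal sketch that solves a small made-up instance (the particular f0 and f1 below are invented for illustration) with scipy. Note that scipy's 'ineq' constraint convention is g(x) ≥ 0, so a standard-form constraint fi(x) ≤ 0 is passed as −fi:

```python
import numpy as np
from scipy.optimize import minimize

# A made-up instance of the standard form, for illustration only:
#   minimize   f0(x) = (x1 - 1)^2 + (x2 - 2)^2
#   subject to f1(x) = x1 + x2 - 1 <= 0
f0 = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
f1 = lambda x: x[0] + x[1] - 1.0

res = minimize(f0, x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda x: -f1(x)}])
print(res.x)    # approx [0, 1]: the projection of (1, 2) onto {x1 + x2 <= 1}
print(res.fun)  # optimal value p* = 2
```

This same instance is reused in the sketches later in the deck.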

General Optimization Problem: More Terminology

The set of points satisfying the constraints is called the feasible set. A point x in the feasible set is called a feasible point.

If x is feasible and fi(x) = 0, then we say the inequality constraint fi(x) ≤ 0 is active at x.

The optimal value p∗ of the problem is defined as

p∗ = inf {f0(x) | x satisfies all constraints}.

x∗ is an optimal point (or a solution to the problem) if x∗ is feasible and f0(x∗) = p∗.

Do We Need Equality Constraints?

Note that h(x) = 0 ⟺ (h(x) ≥ 0 AND h(x) ≤ 0).
Consider an equality-constrained problem:

minimize    f0(x)
subject to  h(x) = 0

Can be rewritten as

minimize    f0(x)
subject to  h(x) ≤ 0
            −h(x) ≤ 0

For simplicity, we'll drop equality constraints from this presentation.


Lagrangian Duality: Convexity not required

The Lagrangian

The general [inequality-constrained] optimization problem is:

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1,...,m

Definition
The Lagrangian for this optimization problem is

L(x,λ) = f0(x) + ∑_{i=1}^m λi fi(x).

The λi's are called Lagrange multipliers (also called the dual variables).

The Lagrangian Encodes the Objective and Constraints

Supremum over Lagrangian gives back encoding of objective and constraints:

sup_{λ⪰0} L(x,λ) = sup_{λ⪰0} ( f0(x) + ∑_{i=1}^m λi fi(x) )
                 = f0(x)  when fi(x) ≤ 0 for all i,
                 = ∞      otherwise.

Equivalent primal form of the optimization problem:

p∗ = inf_x sup_{λ⪰0} L(x,λ)
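A minimal numeric illustration of this encoding, using the made-up instance from the standard-form sketch above: at a feasible point the supremum is attained at λ = 0 and equals f0(x), while at an infeasible point L(x,λ) grows without bound as λ increases.

```python
import numpy as np

# Running made-up example: f0(x) = (x1-1)^2 + (x2-2)^2, f1(x) = x1 + x2 - 1.
f0 = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
f1 = lambda x: x[0] + x[1] - 1.0
L = lambda x, lam: f0(x) + lam * f1(x)

x_feas = np.array([0.0, 0.5])    # f1(x) = -0.5 <= 0: sup over lam >= 0 is f0(x) = 3.25, at lam = 0
x_infeas = np.array([2.0, 2.0])  # f1(x) = 3 > 0: L = 1 + 3*lam, unbounded above in lam
for lam in [0.0, 1.0, 10.0, 100.0]:
    print(lam, L(x_feas, lam), L(x_infeas, lam))
```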

The Primal and the Dual

Original optimization problem in primal form:

p∗ = inf_x sup_{λ⪰0} L(x,λ)

Get the Lagrangian dual problem by “swapping the inf and the sup”:

d∗ = sup_{λ⪰0} inf_x L(x,λ)

We will show: p∗ ≥ d∗ for any optimization problem.

Weak Max-Min Inequality

Theorem
For any f : W × Z → R, we have

sup_{z∈Z} inf_{w∈W} f(w,z) ≤ inf_{w∈W} sup_{z∈Z} f(w,z).

Proof.

For any w0 ∈ W and z0 ∈ Z, we clearly have

inf_{w∈W} f(w,z0) ≤ f(w0,z0) ≤ sup_{z∈Z} f(w0,z).

Since inf_{w∈W} f(w,z0) ≤ sup_{z∈Z} f(w0,z) for all w0 and z0, we must also have

sup_{z0∈Z} inf_{w∈W} f(w,z0) ≤ inf_{w0∈W} sup_{z∈Z} f(w0,z).

Weak Duality

For any optimization problem (not just convex), the weak max-min inequality implies weak duality:

" m # ∗ p =inf sup f0(x) + λi fi (x) x λ0 i=1 " Xm # ∗ sup inf f0(x) + λi fi (x) = d > x λ0,ν i=1 X The difference p∗ − d∗ is called the . For convex problems, we often have : p∗ = d∗.

The Lagrange Dual Function

The Lagrangian dual problem:

d∗ = sup_{λ⪰0} inf_x L(x,λ)

Definition
The Lagrange dual function (or just dual function) is

g(λ) = inf_x L(x,λ) = inf_x ( f0(x) + ∑_{i=1}^m λi fi(x) ).

The dual function may take on the value −∞ (e.g. f0(x) = x).
The dual function is always concave, since it is a pointwise min of functions that are affine in λ.
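For the running made-up example, the inner infimum has a closed form: minimizing L(x,λ) over x gives x = (1 − λ/2, 2 − λ/2), hence g(λ) = 2λ − λ²/2, which is indeed concave. A quick sketch checking the algebra against a numerical minimization:

```python
import numpy as np
from scipy.optimize import minimize

# Running made-up example: f0(x) = (x1-1)^2 + (x2-2)^2, f1(x) = x1 + x2 - 1.
f0 = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
f1 = lambda x: x[0] + x[1] - 1.0

def g(lam):
    # Dual function: minimize the Lagrangian over x numerically.
    return minimize(lambda x: f0(x) + lam * f1(x), x0=np.zeros(2)).fun

for lam in [0.0, 1.0, 2.0, 3.0]:
    print(lam, g(lam), 2 * lam - lam ** 2 / 2)  # numeric and closed-form values agree
```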

The Lagrange Dual Problem: Search for Best Lower Bound

In terms of Lagrange dual function, we can write weak duality as

p∗ ≥ sup_{λ⪰0} g(λ) = d∗

So for any λ with λ ⪰ 0, the Lagrange dual function gives a lower bound on the optimal value:

p∗ ≥ g(λ) for all λ ⪰ 0.

The Lagrange Dual Problem: Search for Best Lower Bound

The Lagrange dual problem is a search for best lower bound on p∗:

maximize    g(λ)
subject to  λ ⪰ 0.

λ is dual feasible if λ ⪰ 0 and g(λ) > −∞.
λ∗ is dual optimal (or a set of optimal Lagrange multipliers) if it is optimal for the Lagrange dual problem.
The Lagrange dual problem is often easier to solve (simpler constraints).
d∗ can be used as a stopping criterion for primal optimization.
The dual can reveal hidden structure in the solution.
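Solving the dual for the running made-up example: g(λ) = 2λ − λ²/2 peaks at λ∗ = 2, so d∗ = g(2) = 2, matching p∗ = 2 from the primal sketch earlier (strong duality holds for this convex problem). A short numerical check:

```python
from scipy.optimize import minimize_scalar

# Maximize g(lam) = 2*lam - lam**2/2 over lam >= 0 by minimizing its negative.
res = minimize_scalar(lambda lam: -(2 * lam - lam ** 2 / 2),
                      bounds=(0.0, 10.0), method="bounded")
print(res.x, -res.fun)  # approx 2.0 and 2.0: lam* = 2, d* = 2 = p*
```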


Convex Optimization


Convex Optimization Problem: Standard Form

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1,...,m

where f0,...,fm are convex functions.

Strong Duality for Convex Problems

For convex optimization problems, we usually have strong duality, but not always. For example:

minimize    e^{−x}
subject to  x²/y ≤ 0
            y > 0

The additional conditions needed are called constraint qualifications.

Example from Laurent El Ghaoui's EE 227A: Lecture 8 Notes, Feb 9, 2012.

Slater's Constraint Qualifications for Strong Duality

Sufficient conditions for strong duality in a convex problem. Roughly: the problem must be strictly feasible. Qualifications when the problem domain¹ D ⊂ R^n is an open set:

Strict feasibility is sufficient (∃x with fi(x) < 0 for i = 1,...,m).
For affine inequality constraints, fi(x) ≤ 0 (without strictness) is sufficient.
Otherwise, see notes or BV Section 5.2.3, p. 226.

¹D is the set where all functions are defined, NOT the feasible set.

Complementary Slackness


Consider a general optimization problem (i.e. not necessarily convex). If we have strong duality, we get an interesting relationship between

the optimal λi∗ and
the ith constraint at the optimum, fi(x∗).
The relationship is called "complementary slackness":

λi∗ fi(x∗) = 0

We always have: the Lagrange multiplier is zero, or the constraint is active at the optimum, or both.
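For the running made-up example this can be checked directly: the optimal multiplier is λ∗ = 2 > 0 and the constraint is active at x∗ = (0, 1), so the product vanishes. A tiny sketch:

```python
# Running made-up example: x* = (0, 1), lam* = 2.
x_star = (0.0, 1.0)
lam_star = 2.0
f1_at_opt = x_star[0] + x_star[1] - 1.0  # constraint value at the optimum: 0 (active)
print(lam_star * f1_at_opt)              # 0.0, as complementary slackness requires
```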

Complementary Slackness "Sandwich Proof"

Assume strong duality: p∗ = d∗ in a general optimization problem. Let x∗ be primal optimal and λ∗ be dual optimal. Then:

f0(x∗) = g(λ∗) = inf_x L(x,λ∗)    (strong duality and definition of g)
        ≤ L(x∗,λ∗)
        = f0(x∗) + ∑_{i=1}^m λi∗ fi(x∗)
        ≤ f0(x∗)    (since each λi∗ fi(x∗) ≤ 0)

Since the chain starts and ends with f0(x∗), every inequality is an equality, so each term in the sum ∑_{i=1}^m λi∗ fi(x∗) must actually be 0. That is,

λi∗ fi(x∗) = 0, i = 1,...,m.

This condition is known as complementary slackness.

Consequences of our "Sandwich Proof"

Let x∗ be primal optimal and λ∗ be dual optimal. If we have strong duality, then

p∗ = d∗ = f0(x∗) = g(λ∗) = L(x∗,λ∗)

and we have complementary slackness:

λi∗ fi(x∗) = 0, i = 1,...,m.

From the proof, we can also conclude that

L(x∗,λ∗) = inf_x L(x,λ∗).

If x ↦ L(x,λ∗) is differentiable, then we must have ∇x L(x∗,λ∗) = 0.

Karush-Kuhn-Tucker (KKT) Necessary Conditions

Suppose we have strong duality: p∗ = d∗ = f0(x∗) = g(λ∗) = L(x∗,λ∗),

and f0,...,fm are differentiable, but not necessarily convex. Then x∗, λ∗ satisfy the following Karush-Kuhn-Tucker (KKT) conditions:
1. Primal and dual feasibility: fi(x∗) ≤ 0 and λi∗ ≥ 0 for all i.
2. Complementary slackness: λi∗ fi(x∗) = 0 for all i.
3. First-order condition: ∇x L(x∗,λ∗) = ∇f0(x∗) + ∑_{i=1}^m λi∗ ∇fi(x∗) = 0.

Only complementary slackness is not obvious.
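As a concrete check, the sketch below verifies all three conditions at (x∗, λ∗) = ((0, 1), 2) for the running made-up example:

```python
import numpy as np

# Running made-up example: f0(x) = (x1-1)^2 + (x2-2)^2, f1(x) = x1 + x2 - 1.
x = np.array([0.0, 1.0])   # primal optimal x*
lam = 2.0                  # dual optimal lam*

f1 = x[0] + x[1] - 1.0                                    # = 0
grad_f0 = np.array([2 * (x[0] - 1.0), 2 * (x[1] - 2.0)])  # = (-2, -2)
grad_f1 = np.array([1.0, 1.0])

print(f1 <= 0 and lam >= 0)                     # 1. primal and dual feasibility
print(np.isclose(lam * f1, 0.0))                # 2. complementary slackness
print(np.allclose(grad_f0 + lam * grad_f1, 0))  # 3. first-order condition
```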

KKT Sufficient Conditions for Convex, Differentiable Problems

Suppose

f0,...,fm are differentiable and convex, and
x̃ and λ̃ satisfy the KKT conditions.
Then we have strong duality, and (x̃, λ̃) are primal and dual optimal, respectively.

Proof.
Convexity and the first-order condition imply x̃ ∈ argmin_x L(x, λ̃). So

g(λ̃) = inf_x L(x,λ̃) = L(x̃,λ̃) = f0(x̃) + ∑_{i=1}^m λ̃i fi(x̃) = f0(x̃)    (by complementary slackness).

But g(λ̃) ≤ sup_{λ⪰0} g(λ) ≤ inf_{feasible x} f0(x) ≤ f0(x̃) (the middle inequality is weak duality).

So g(λ̃) = sup_{λ⪰0} g(λ) = inf_{feasible x} f0(x) = f0(x̃).
