Convex Analysis, Duality and Optimization

Yao-Liang Yu
[email protected]
Dept. of Computing Science, University of Alberta
March 7, 2010

Outline

- Prelude
- Basic Convex Analysis
- Convex Optimization
- Fenchel Conjugate
- Minimax Theorem
- Lagrangian Duality
- References

Notations Used Throughout

- $C$ for convex set, $S$ for arbitrary set, $K$ for convex cone,
- $g(\cdot)$ is for arbitrary functions, not necessarily convex,
- $f(\cdot)$ is for convex functions; for simplicity, we assume $f(\cdot)$ is closed, proper, continuous, and differentiable when needed,
- min (max) means inf (sup) when needed,
- w.r.t.: with respect to; w.l.o.g.: without loss of generality; u.s.c.: upper semi-continuous; l.s.c.: lower semi-continuous; int: interior point; RHS: right hand side; w.p.1: with probability 1.

Historical Note

- 60s: Linear Programming, Simplex Method
- 70s-80s: (Convex) Nonlinear Programming, Ellipsoid Method, Interior-Point Method
- 90s: Convexification almost everywhere
- Now: Large-scale optimization, First-order gradient method

But... Neither of poly-time solvability and convexity implies the other. NP-Hard convex problems abound.

Convex Sets and Functions

Definition (Convex set)
A point set $C$ is said to be convex if $\forall \lambda \in [0, 1]$, $x, y \in C$, we have $\lambda x + (1 - \lambda) y \in C$.

Definition (Convex function)
A function $f(\cdot)$ is said to be convex if
1. $\operatorname{dom} f$ is convex, and,
2. $\forall \lambda \in [0, 1]$, $x, y \in \operatorname{dom} f$, we have
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y);$$

Or equivalently, $f(\cdot)$ is convex if its epigraph $\left\{ \begin{pmatrix} x \\ t \end{pmatrix} : f(x) \le t \right\}$ is a convex set.

- Function $h(\cdot)$ is concave iff $-h(\cdot)$ is convex,
- $h(\cdot)$ is called affine (linear) iff it's both convex and concave,
- No concave set. Affine set: drop the constraint on $\lambda$.
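The defining inequality for convex functions can be probed numerically. The sketch below is only an illustration, not part of the original slides: the helper name `violates_convexity`, the sampling ranges, and the tolerance are all assumptions. Random sampling can expose a violation but can never prove convexity.

```python
import math
import random

def violates_convexity(f, sample, trials=2000, tol=1e-9, seed=0):
    """Return True if some sampled (x, y, lam) breaks
    f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sample(rng), sample(rng)
        lam = rng.random()  # lam in [0, 1)
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + tol:
            return True
    return False

def square(x):
    return x * x

def neg_sin(x):
    return -math.sin(x)

def on_real_line(rng):
    return rng.uniform(-10.0, 10.0)

def on_zero_pi(rng):
    return rng.uniform(0.0, math.pi)

# x^2 is convex on the real line: no violation should be found.
square_violated = violates_convexity(square, on_real_line)
# sin is concave on [0, pi], so the convexity inequality fails there.
sin_violated = violates_convexity(math.sin, on_zero_pi)
# h concave iff -h convex: -sin passes the same test on [0, pi].
neg_sin_violated = violates_convexity(neg_sin, on_zero_pi)

print(square_violated, sin_violated, neg_sin_violated)
```

The three calls mirror the bullets above: a convex function, a concave function, and the negation of the concave function.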
More on Convex Functions

Definition (Strongly Convex Function)
$f(x)$ is said to be $\mu$-strongly convex with respect to a norm $\|\cdot\|$ iff $\operatorname{dom} f$ is convex and $\forall \lambda \in [0, 1]$, $x, y \in \operatorname{dom} f$,
$$f(\lambda x + (1 - \lambda) y) + \mu \cdot \frac{\lambda (1 - \lambda)}{2} \|x - y\|^2 \le \lambda f(x) + (1 - \lambda) f(y).$$

Proposition (Sufficient Conditions for $\mu$-Strong Convexity)
1. Zero Order: definition
2. First Order: $\forall x, y \in \operatorname{dom} f$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|x - y\|^2.$$
3. Second Order: $\forall x \in \operatorname{dom} f$ and all $y$,
$$\langle \nabla^2 f(x) y, y \rangle \ge \mu \|y\|^2.$$

Elementary Convex Functions (to name a few)

- Negative entropy $x \log x$ is convex on $x > 0$,
- $\ell_p$-norm $\|x\|_p := \left[ \sum_i |x_i|^p \right]^{1/p}$ is convex when $p \ge 1$, concave otherwise (except $p = 0$) when restricted to the positive orthant,
- Log-sum-exp function $\log \sum_i \exp(x_i)$ is convex, same is true for the matrix version $\log \operatorname{Tr} \exp(X)$ on symmetric matrices,
- Quadratic-over-linear function $x^T Y^{-1} x$ is jointly convex in $x$ and $Y \succ 0$, what if $Y \succeq 0$?
- Log-determinant $\log \det X$ is concave on $X \succ 0$, what about $\log \det X^{-1}$?
- $\operatorname{Tr} X^{-1}$ is convex on $X \succ 0$,
- The largest element $x_{[1]} = \max_i x_i$ is convex; moreover, the sum of the largest $k$ elements is convex; what about the analogous statements for the smallest elements?
- The largest eigenvalue of symmetric matrices is convex; moreover, the sum of the largest $k$ eigenvalues of symmetric matrices is also convex; can we drop the condition "symmetric"?

Compositions

Proposition (Affine Transform)
$AC := \{Ax : x \in C\}$ and $A^{-1}C := \{x : Ax \in C\}$ are convex sets. Similarly, $(Af)(x) := \min_{Ay = x} f(y)$ and $(fA)(x) := f(Ax)$ are convex.

Proposition (Sufficient but NOT Necessary)
$f \circ g$ is convex if
- $g(\cdot)$ is convex and $f(\cdot)$ is non-decreasing, or
- $g(\cdot)$ is concave and $f(\cdot)$ is non-increasing.

Proof. For simplicity, assume $f \circ g$ is twice differentiable. Use the second-order sufficient condition.

Remark: One needs to check if $\operatorname{dom} f \circ g$ is convex! However, this is unnecessary if we consider extended-value functions.

Operators Preserving Convexity

Proposition (Algebraic)
For $\theta > 0$, $\theta C := \{\theta x : x \in C\}$ is convex; $\theta f(x)$ is convex; and $f_1(x) + f_2(x)$ is convex.

Proposition (Intersection v.s. Supremum)
- Intersection of an arbitrary collection of convex sets is convex;
- Similarly, pointwise supremum of an arbitrary collection of convex functions is convex.

Proposition (Sum v.s. Infimal Convolution)
- $C_1 + C_2 := \{x_1 + x_2 : x_i \in C_i\}$ is convex;
- Similarly, $(f_1 \,\square\, f_2)(x) := \inf_y \{f_1(y) + f_2(x - y)\}$ is convex.

Proof. Consider affine transform.

What about union v.s. infimum? Needs extra convexification.

Convex Hull

Definition (Convex Hull)
The convex hull of $S$, denoted $\operatorname{conv} S$, is the smallest convex set containing $S$, i.e. the intersection of all convex sets containing $S$.
Similarly, the convex hull of $g(x)$, denoted $\operatorname{conv} g$, is the greatest convex function dominated by $g$, i.e. the pointwise supremum of all convex functions dominated by $g$.

Theorem (Carathéodory, 1911)
The convex hull of any set $S \subseteq \mathbb{R}^n$ is:
$$\left\{ x : x = \sum_{i=1}^{n+1} \lambda_i x_i, \; x_i \in S, \; \lambda_i \ge 0, \; \sum_{i=1}^{n+1} \lambda_i = 1 \right\}.$$

We will see how to compute $\operatorname{conv} g$ later.

Cones and Conic Hull

Definition (Cone and Positively Homogeneous Function)
A set $S$ is called a cone if $\forall x \in S$, $\theta \ge 0$, we have $\theta x \in S$. Similarly, a function $g(x)$ is called positively homogeneous if $\forall \theta \ge 0$, $g(\theta x) = \theta g(x)$.

$K$ is a convex cone if it is a cone and is convex, specifically, if $\forall x_1, x_2 \in K$, $\theta_1, \theta_2 \ge 0 \Rightarrow \theta_1 x_1 + \theta_2 x_2 \in K$.
Similarly, $f(x)$ is positively homogeneous convex if it is positively homogeneous and convex, specifically, if $\forall x_1, x_2 \in \operatorname{dom} f$, $\theta_1, \theta_2 \ge 0 \Rightarrow f(\theta_1 x_1 + \theta_2 x_2) \le \theta_1 f(x_1) + \theta_2 f(x_2)$.

Remark: Under the above definitions, cones always contain the origin, and positively homogeneous functions equal 0 at the origin.

Definition (Conic Hull)
The conic hull of $S$ is the smallest convex cone containing $S$. Similarly, the conic hull of $g(x)$, denoted $\operatorname{cone} g$, is the greatest positively homogeneous convex function dominated by $g$.

Conic Hull cont'

Theorem (Carathéodory, 1911)
The conic hull of any set $S \subseteq \mathbb{R}^n$ is:
$$\left\{ x : x = \sum_{i=1}^{n} \theta_i x_i, \; x_i \in S, \; \theta_i \ge 0 \right\}.$$

For convex function $f(x)$, its conic hull is:
$$(\operatorname{cone} f)(x) = \min_{\theta \ge 0} \theta \cdot f(\theta^{-1} x).$$

How to compute $\operatorname{cone} g$?
Hint: $\operatorname{cone} g = \operatorname{cone} \operatorname{conv} g$, why?

Elementary Convex Sets (to name a few)

- Hyperplane $a^T x = \alpha$ is convex,
- Half space $a^T x \le \alpha$ is convex,
- Affine set $Ax = b$ is convex (proof?),
- Polyhedral set $Ax \le b$ is convex (proof?),

Proposition (Level sets)
(Sub)level sets of $f(x)$, defined as $\{x : f(x) \le \alpha\}$, are convex.

Proof. Consider the intersection of the epigraph of $f(x)$ and the hyperplane $\begin{pmatrix} 0 \\ 1 \end{pmatrix}^T \begin{pmatrix} x \\ t \end{pmatrix} = \alpha$.

Remark: A function, with all level sets being convex, is not necessarily convex! We call such functions, with convex domain, quasi-convex.

Convince yourself the $\ell_0$-norm, defined as $\|x\|_0 = \sum_i \mathbb{I}[x_i \ne 0]$, is not convex. Show that $-\|x\|_0$ on $x \ge 0$ is quasi-convex.

Elementary Convex Sets cont'

- Ellipsoid $\{x : (x - x_c)^T P^{-1} (x - x_c) \le 1, \; P \succ 0\}$ or $\{x_c + P^{1/2} u : \|u\|_2 \le 1\}$ is convex,
- Nonnegative orthant $x \ge 0$ is a convex cone,
- All positive (semi)definite matrices compose a convex cone (positive (semi)definite cone) $X \succ 0$ ($X \succeq 0$),
- All norm cones $\left\{ \begin{pmatrix} x \\ t \end{pmatrix} : \|x\| \le t \right\}$ are convex; in particular, for the Euclidean norm, the cone is called second order cone or Lorentz cone or ice-cream cone.

Remark: This is essentially saying that all norms are convex. $\ell_0$-norm is not convex? No, but it's not a "norm" either. People call it "norm" unjustly.

Unconstrained

Consider the simple problem:
$$\min_x f(x), \qquad (1)$$
where $f(\cdot)$ is defined in the whole space.

Theorem (First-order Optimality Condition)
A sufficient and necessary condition for $x^\star$ to be the minimizer of (1) is:
$$0 \in \partial f(x^\star). \qquad (2)$$
When $f(\cdot)$ is differentiable, (2) reduces to $\nabla f(x^\star) = 0$.

Remark:
- The minimizer is unique when $f(\cdot)$ is strictly convex,
- For general nonconvex functions $g(\cdot)$, the condition in (2) gives only critical (stationary) points, which could be minimizer, maximizer, or nothing (saddle-point).
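The unconstrained optimality condition can be illustrated on a small example. The sketch below is an assumption-laden illustration rather than anything from the slides: the quadratic $f(x) = (x_1 - 1)^2 + 2(x_2 + 3)^2$, the plain gradient descent loop, the step size, and the iteration count are all chosen here for demonstration. The cubic at the end shows a nonconvex function whose stationary point is not a minimizer.

```python
def grad_f(x):
    """Gradient of the strictly convex f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]

def gradient_descent(grad, x0, step=0.1, iters=200):
    """Plain fixed-step gradient descent (illustrative choice of method)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_star = gradient_descent(grad_f, [0.0, 0.0])
grad_norm = sum(gi * gi for gi in grad_f(x_star)) ** 0.5
print(x_star, grad_norm)  # gradient is numerically zero at the minimizer (1, -3)

# Nonconvex counterexample: g(x) = x^3 has g'(0) = 0,
# yet 0 is not a minimizer because g(-1) < g(0).
def g(x):
    return x ** 3

def g_prime(x):
    return 3.0 * x ** 2

stationary_at_zero = g_prime(0.0) == 0.0
zero_is_not_min = g(-1.0) < g(0.0)
```

For the convex quadratic, the vanishing gradient certifies the unique minimizer; for the cubic, the same condition holds at a point that is neither a minimizer nor a maximizer.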
Simply Constrained

Consider the constrained problem:
$$\min_{x \in C} f(x), \qquad (3)$$
where $f(\cdot)$ is defined in the whole space.

Is $\nabla f(x^\star) = 0$ still the optimality condition? If you think so, consider the example:
$$\min_{x \in [1, 2]} x.$$

Theorem (First-order Optimality Condition)
A sufficient and necessary condition for $x^\star$ to be the minimizer of (3) is (assuming differentiability):
$$\forall x \in C, \quad (x - x^\star)^T \nabla f(x^\star) \ge 0. \qquad (4)$$
Verify this condition is indeed satisfied by the example above.

General Convex Program

We say a problem is convex if it is of the following form:
$$\begin{aligned} \min_{x \in C} \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & Ax = b, \end{aligned}$$
where $f_i(x)$, $i = 0, \ldots, m$ are all convex.

Remark:
- The equality constraint must be affine! But affine functions are free to appear in inequality constraints.
- The objective function, being convex, is to be minimized. Sometimes we see maximizing a concave function, no difference (why?).
- The inequality constraints are $\le$, which lead to a convex feasible region (why?).
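Condition (4) can be checked numerically on the example $\min_{x \in [1, 2]} x$. The sketch below is only an illustration under stated assumptions: projected gradient descent is a method swapped in here to locate the minimizer (the slides do not prescribe it), and the helper name `projected_gradient`, the step size, and the grid resolution are hypothetical choices.

```python
def projected_gradient(grad, lo, hi, x0, step=0.1, iters=200):
    """Gradient step followed by projection (clipping) onto [lo, hi]."""
    x = x0
    for _ in range(iters):
        x = min(max(x - step * grad(x), lo), hi)
    return x

def grad(x):
    """Gradient of f(x) = x, which is 1 everywhere."""
    return 1.0

x_star = projected_gradient(grad, 1.0, 2.0, x0=2.0)

# Condition (4): (x - x_star) * f'(x_star) >= 0 for every x in C = [1, 2],
# checked here on a finite grid of points in C.
grid = [1.0 + k / 100.0 for k in range(101)]
condition_holds = all((x - x_star) * grad(x_star) >= 0.0 for x in grid)

print(x_star, grad(x_star), condition_holds)
```

The minimizer sits on the boundary of $C$ with a nonzero gradient, so $\nabla f(x^\star) = 0$ fails while the variational inequality (4) holds.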
