Convex Analysis, Duality and Optimization
Yao-Liang Yu
[email protected]
Dept. of Computing Science, University of Alberta
March 7, 2010

Contents

1 Prelude
2 Basic Convex Analysis
3 Convex Optimization
4 Fenchel Conjugate
5 Minimax Theorem
6 Lagrangian Duality
7 References

1 Prelude

Notations Used Throughout

• $\mathcal{C}$ for a convex set, $S$ for an arbitrary set, $\mathcal{K}$ for a convex cone,
• $g(\cdot)$ is for arbitrary functions, not necessarily convex,
• $f(\cdot)$ is for convex functions; for simplicity, we assume $f(\cdot)$ is closed, proper, continuous, and differentiable when needed,
• min (max) means inf (sup) when needed,
• w.r.t.: with respect to; w.l.o.g.: without loss of generality; u.s.c.: upper semi-continuous; l.s.c.: lower semi-continuous; int: interior; RHS: right hand side; w.p.1: with probability 1.

Historical Note

• 60s: Linear Programming, Simplex Method
• 70s–80s: (Convex) Nonlinear Programming, Ellipsoid Method, Interior-Point Method
• 90s: Convexification almost everywhere
• Now: Large-scale optimization, first-order gradient methods

But... neither polynomial-time solvability nor convexity implies the other. NP-hard convex problems abound.

2 Basic Convex Analysis

Convex Sets and Functions

Definition 1 (Convex set). A point set $\mathcal{C}$ is said to be convex if $\forall\, \lambda \in [0,1],\ x, y \in \mathcal{C}$, we have $\lambda x + (1-\lambda) y \in \mathcal{C}$.

Definition 2 (Convex function). A function $f(\cdot)$ is said to be convex if
1. $\operatorname{dom} f$ is convex, and
2. $\forall\, \lambda \in [0,1],\ x, y \in \operatorname{dom} f$, we have $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$.
Equivalently, $f(\cdot)$ is convex if its epigraph $\{(x, t) : f(x) \le t\}$ is a convex set.

• A function $h(\cdot)$ is concave iff $-h(\cdot)$ is convex,
• $h(\cdot)$ is called affine (linear) iff it is both convex and concave,
• There is no "concave set". Affine set: drop the constraint on $\lambda$.

More on Convex Functions

Definition 3 (Strongly Convex Function). $f(x)$ is said to be $\mu$-strongly convex with respect to a norm $\|\cdot\|$ iff $\operatorname{dom} f$ is convex and $\forall\, \lambda \in [0,1]$,
$$f(\lambda x + (1-\lambda) y) + \mu\,\frac{\lambda(1-\lambda)}{2}\,\|x - y\|^2 \le \lambda f(x) + (1-\lambda) f(y).$$

Proposition 1 (Sufficient Conditions for $\mu$-Strong Convexity).
1. Zero order: the definition.
2. First order: $\forall\, x, y \in \operatorname{dom} f$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2}\,\|x - y\|^2.$$
3. Second order: $\forall\, x, y \in \operatorname{dom} f$,
$$\langle \nabla^2 f(x)\, y, y \rangle \ge \mu\,\|y\|^2.$$

Elementary Convex Functions (to name a few)

• Negative entropy $x \log x$ is convex on $x > 0$,
• The $\ell_p$-norm $\|x\|_p := \big[\sum_i |x_i|^p\big]^{1/p}$ is convex when $p \ge 1$, concave otherwise (except $p = 0$),
• The log-sum-exp function $\log \sum_i \exp(x_i)$ is convex; the same is true for the matrix version $\log \operatorname{Tr} \exp(X)$ on symmetric matrices,
• The quadratic-over-linear function $x^T Y^{-1} x$ is jointly convex in $x$ and $Y \succ 0$; what if $Y \succeq 0$?
• The log-determinant $\log\det X$ is concave on $X \succ 0$; what about $\log\det X^{-1}$?
• $\operatorname{Tr} X^{-1}$ is convex on $X \succ 0$,
• The largest element $x_{[1]} = \max_i x_i$ is convex; moreover, the sum of the largest $k$ elements is convex; what about the smallest analogues?
• The largest eigenvalue of a symmetric matrix is convex; moreover, the sum of the largest $k$ eigenvalues of a symmetric matrix is also convex; can we drop the condition "symmetric"?

Compositions

Proposition 2 (Affine Transform). $A\mathcal{C} := \{Ax : x \in \mathcal{C}\}$ and $A^{-1}\mathcal{C} := \{x : Ax \in \mathcal{C}\}$ are convex sets. Similarly, $(Af)(x) := \min_{Ay = x} f(y)$ and $(fA)(x) := f(Ax)$ are convex.

Proposition 3 (Sufficient but NOT Necessary). $f \circ g$ is convex if
• $g(\cdot)$ is convex and $f(\cdot)$ is non-decreasing, or
• $g(\cdot)$ is concave and $f(\cdot)$ is non-increasing.

Proof. For simplicity, assume $f \circ g$ is twice differentiable. Use the second-order sufficient condition.

Remark: One needs to check whether $\operatorname{dom}(f \circ g)$ is convex! However, this is unnecessary if we consider extended-value functions.
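As a quick numerical sanity check of Definition 2 (evidence, not a proof), the short script below samples random pairs and verifies the defining inequality for the log-sum-exp function listed above. NumPy is assumed, and the helper names log_sum_exp and violates_convexity are illustrative choices, not part of the notes.

```python
import numpy as np

def log_sum_exp(x):
    # log sum_i exp(x_i), computed stably; one of the convex functions listed above.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def violates_convexity(f, x, y, lam, tol=1e-9):
    # Definition 2: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    return lhs > rhs + tol  # small tolerance absorbs floating-point error

rng = np.random.default_rng(0)
for _ in range(10000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    lam = rng.uniform()
    assert not violates_convexity(log_sum_exp, x, y, lam)
print("no violations of the convexity inequality found (evidence, not a proof)")
```

Swapping in a non-convex function (say, $-\|x\|_2$) makes the assertion fail almost immediately, which is the point of the check.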
Operators Preserving Convexity

Proposition 4 (Algebraic). For $\theta > 0$, $\theta\mathcal{C} := \{\theta x : x \in \mathcal{C}\}$ is convex; $\theta f(x)$ is convex; and $f_1(x) + f_2(x)$ is convex.

Proposition 5 (Intersection vs. Supremum).
• The intersection of an arbitrary collection of convex sets is convex;
• Similarly, the pointwise supremum of an arbitrary collection of convex functions is convex.

Proposition 6 (Sum vs. Infimal Convolution).
• $\mathcal{C}_1 + \mathcal{C}_2 := \{x_1 + x_2 : x_i \in \mathcal{C}_i\}$ is convex;
• Similarly, the infimal convolution $(f_1 \,\square\, f_2)(x) := \inf_y \{f_1(y) + f_2(x - y)\}$ is convex.

Proof. Consider an affine transform (Proposition 2).

What about union vs. infimum? Needs extra convexification.

Convex Hull

Definition 4 (Convex Hull). The convex hull of $S$, denoted $\operatorname{conv} S$, is the smallest convex set containing $S$, i.e. the intersection of all convex sets containing $S$. Similarly, the convex hull of $g(x)$, denoted $\operatorname{conv} g$, is the greatest convex function dominated by $g$, i.e. the pointwise supremum of all convex functions dominated by $g$.

Theorem 5 (Carathéodory, 1911). The convex hull of any set $S \subseteq \mathbb{R}^n$ is
$$\Big\{x : x = \sum_{i=1}^{n+1} \lambda_i x_i,\ x_i \in S,\ \lambda_i \ge 0,\ \sum_{i=1}^{n+1} \lambda_i = 1\Big\}.$$

We will see how to compute $\operatorname{conv} g$ later.

Cones and Conic Hull

Definition 6 (Cone and Positively Homogeneous Function). A set $S$ is called a cone if $\forall\, x \in S,\ \theta \ge 0$, we have $\theta x \in S$. Similarly, a function $g(x)$ is called positively homogeneous if $\forall\, \theta \ge 0$, $g(\theta x) = \theta g(x)$. $\mathcal{K}$ is a convex cone if it is a cone and is convex; specifically, if $\forall\, x_1, x_2 \in \mathcal{K},\ \theta_1, \theta_2 \ge 0 \Rightarrow \theta_1 x_1 + \theta_2 x_2 \in \mathcal{K}$. Similarly, $f(x)$ is positively homogeneous convex if it is positively homogeneous and convex; specifically, if $\forall\, x_1, x_2 \in \operatorname{dom} f,\ \theta_1, \theta_2 \ge 0 \Rightarrow f(\theta_1 x_1 + \theta_2 x_2) \le \theta_1 f(x_1) + \theta_2 f(x_2)$.

Remark: Under the above definitions, cones always contain the origin, and positively homogeneous functions equal 0 at the origin.

Definition 7 (Conic Hull). The conic hull of $S$ is the smallest convex cone containing $S$. Similarly, the conic hull of $g(x)$, denoted $\operatorname{cone} g$, is the greatest positively homogeneous convex function dominated by $g$.

Theorem 8 (Carathéodory, 1911). The conic hull of any set $S \subseteq \mathbb{R}^n$ is
$$\Big\{x : x = \sum_{i=1}^{n} \theta_i x_i,\ x_i \in S,\ \theta_i \ge 0\Big\}.$$
For a convex function $f(x)$, its conic hull is
$$(\operatorname{cone} f)(x) = \min_{\theta \ge 0}\ \theta \cdot f(\theta^{-1} x).$$

How to compute $\operatorname{cone} g$? Hint: $\operatorname{cone} g = \operatorname{cone} \operatorname{conv} g$, why?

Elementary Convex Sets (to name a few)

• Hyperplane $a^T x = \alpha$ is convex,
• Half space $a^T x \le \alpha$ is convex,
• Affine set $Ax = b$ is convex (proof?),
• Polyhedral set $Ax \le b$ is convex (proof?),

Proposition 7 (Level sets). (Sub)level sets of $f(x)$, defined as $\{x : f(x) \le \alpha\}$, are convex.

Proof. Consider the intersection of the epigraph of $f(x)$ with the hyperplane $\{(x, t) : t = \alpha\}$.

Remark: A function with all level sets convex is not necessarily convex! We call such functions, with convex domain, quasi-convex. Convince yourself that the $\ell_0$-norm, defined as $\|x\|_0 = \sum_i \mathbb{I}[x_i \ne 0]$, is not convex. Show that $-\|x\|_0$ on $x \ge 0$ is quasi-convex.

• Ellipsoid $\{x : (x - x_c)^T P^{-1} (x - x_c) \le 1,\ P \succ 0\}$, or equivalently $\{x_c + P^{1/2} u : \|u\|_2 \le 1\}$, is convex,
• Nonnegative orthant $x \ge 0$ is a convex cone,
• All positive (semi)definite matrices compose a convex cone (the positive (semi)definite cone) $X \succ 0$ ($X \succeq 0$),
• All norm cones $\{(x, t) : \|x\| \le t\}$ are convex; in particular, for the Euclidean norm, the cone is called the second order cone, Lorentz cone, or ice-cream cone.

Remark: This is essentially saying that all norms are convex. The $\ell_0$-norm is not convex? No, but it is not a "norm" either. People call it a "norm" unjustly.
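The two exercises in the quasi-convexity remark above, that $\|x\|_0$ is not convex and that $-\|x\|_0$ is quasi-convex on $x \ge 0$, can be explored numerically. The sketch below (assuming NumPy; the helper name l0 is an illustrative choice) exhibits one pair violating the convexity inequality, then randomly probes the quasi-convexity inequality $f(\lambda x + (1-\lambda) y) \le \max\{f(x), f(y)\}$ on the nonnegative orthant. It is evidence, not a proof.

```python
import numpy as np

def l0(x):
    # "l0-norm": the number of nonzero entries (not a true norm).
    return np.count_nonzero(x)

# 1) ||.||_0 is not convex: a single violating pair suffices.
x, y, lam = np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.5
lhs = l0(lam * x + (1 - lam) * y)       # = 2
rhs = lam * l0(x) + (1 - lam) * l0(y)   # = 1
print("convexity violated:", lhs > rhs)

# 2) -||.||_0 looks quasi-convex on x >= 0: probe
#    f(lam*x + (1-lam)*y) <= max(f(x), f(y)) at random nonnegative points.
f = lambda z: -l0(z)
rng = np.random.default_rng(0)
for _ in range(10000):
    x = rng.uniform(size=4) * (rng.uniform(size=4) > 0.5)  # sparse, nonnegative
    y = rng.uniform(size=4) * (rng.uniform(size=4) > 0.5)
    lam = rng.uniform()
    assert f(lam * x + (1 - lam) * y) <= max(f(x), f(y))
print("no quasi-convexity violations found on the nonnegative orthant")
```

The random probe is consistent with quasi-convexity because, on the nonnegative orthant, the support of a convex combination is the union of the two supports, so $\|\lambda x + (1-\lambda) y\|_0 \ge \max\{\|x\|_0, \|y\|_0\}$.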
3 Convex Optimization

Unconstrained

Consider the simple problem:
$$\min_x\ f(x), \qquad (1)$$
where $f(\cdot)$ is defined on the whole space.

Theorem 9 (First-order Optimality Condition). A necessary and sufficient condition for $x^\star$ to be a minimizer of (1) is
$$0 \in \partial f(x^\star). \qquad (2)$$
When $f(\cdot)$ is differentiable, (2) reduces to $\nabla f(x^\star) = 0$.

Remark:
• The minimizer is unique when $f(\cdot)$ is strictly convex,
• For general nonconvex functions $g(\cdot)$, the condition in (2) gives only critical (stationary) points, which could be a minimizer, a maximizer, or neither (a saddle point).

Simply Constrained

Consider the constrained problem:
$$\min_{x \in \mathcal{C}}\ f(x), \qquad (3)$$
where $f(\cdot)$ is defined on the whole space. Is $\nabla f(x^\star) = 0$ still the optimality condition? If you think so, consider the example:
$$\min_{x \in [1,2]}\ x.$$

Theorem 10 (First-order Optimality Condition). A necessary and sufficient condition for $x^\star$ to be a minimizer of (3) is (assuming differentiability):
$$\forall\, x \in \mathcal{C},\quad (x - x^\star)^T \nabla f(x^\star) \ge 0. \qquad (4)$$
Verify that this condition is indeed satisfied by the example above.

General Convex Program

We say a problem is convex if it is of the following form:
$$\begin{aligned}
\min_{x \in \mathcal{C}}\quad & f_0(x) \\
\text{s.t.}\quad & f_i(x) \le 0,\quad i = 1, \dots, m, \\
& Ax = b,
\end{aligned}$$
where $f_i(x),\ i = 0, \dots, m$ are all convex.

Remark:
• The equality constraints must be affine! But affine functions are free to appear in inequality constraints.
• The objective function, being convex, is to be minimized. Sometimes we see maximizing a concave function instead; there is no difference (why?).
• The inequality constraints are $\le$, which leads to a convex feasible region (why?).
• To summarize, convex programs minimize a convex function over a convex feasible region.

Two Optimization Strategies

Usually, unconstrained problems are easier to handle than constrained ones, and there are two typical ways to convert constrained problems into unconstrained ones.

Example 11 (Barrier Method). Given the convex program, determine the feasible region (which needs to be compact), and then construct a barrier function, say $b(x)$, which is convex and quickly grows to $\infty$ when $x$, the decision variable, approaches the boundary of the feasible region.
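To make Example 11 concrete, here is a minimal sketch (not the notes' own implementation) of a log-barrier applied to the toy problem $\min_{x \in [1,2]} x$ used for Theorem 10: minimize $\varphi_t(x) = t\,x - \log(x - 1) - \log(2 - x)$ by damped Newton steps and let $t$ grow. The function name barrier_newton and the particular damping rule are illustrative choices; only plain Python is assumed.

```python
def barrier_newton(t, x=1.5, iters=50):
    # Minimize phi_t(x) = t*x - log(x - 1) - log(2 - x) over the open interval (1, 2).
    # The two log terms form the barrier b(x): convex, and blowing up to infinity
    # as x approaches the boundary of the feasible region [1, 2].
    for _ in range(iters):
        g = t - 1.0 / (x - 1.0) + 1.0 / (2.0 - x)        # phi_t'(x)
        h = 1.0 / (x - 1.0) ** 2 + 1.0 / (2.0 - x) ** 2  # phi_t''(x) > 0
        step = g / h
        while not (1.0 < x - step < 2.0):                # damp so the iterate stays interior
            step *= 0.5
        x -= step
    return x

# As t grows, the barrier minimizer approaches x* = 1, the solution of min_{x in [1,2]} x.
for t in (1.0, 10.0, 100.0, 1000.0):
    print(f"t = {t:7.1f}   x_t = {barrier_newton(t):.6f}")
```

For $t = 1$ the barrier minimizer sits near $1.38$; by $t = 1000$ it is within about $10^{-3}$ of the true solution $x^\star = 1$, which is how the barrier weight trades strict feasibility against optimality.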