2. Subgradients
L. Vandenberghe ECE236C (Spring 2020)

Outline:
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative

Basic inequality

recall the basic inequality for differentiable convex functions:

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   for all y ∈ dom f

[figure: first-order approximation of f at x, with hyperplane normal (∇f(x), −1) at (x, f(x))]

• the first-order approximation of f at x is a global lower bound
• ∇f(x) defines a non-vertical supporting hyperplane to the epigraph of f at (x, f(x)):

    (∇f(x), −1)ᵀ (y − x, t − f(x)) ≤ 0   for all (y, t) ∈ epi f

Subgradient

g is a subgradient of a convex function f at x ∈ dom f if

    f(y) ≥ f(x) + gᵀ(y − x)   for all y ∈ dom f

[figure: affine lower bounds f(x₁) + g₁ᵀ(y − x₁), f(x₁) + g₂ᵀ(y − x₁), f(x₂) + g₃ᵀ(y − x₂); g₁, g₂ are subgradients at x₁; g₃ is a subgradient at x₂]

Subdifferential

the subdifferential ∂f(x) of f at x is the set of all subgradients:

    ∂f(x) = {g | gᵀ(y − x) ≤ f(y) − f(x) for all y ∈ dom f}

Properties
• ∂f(x) is a closed convex set (possibly empty); this follows from the definition: ∂f(x) is an intersection of halfspaces
• if x ∈ int dom f, then ∂f(x) is nonempty and bounded (proof on the next two pages)

Proof: we show that ∂f(x) is nonempty when x ∈ int dom f
• (x, f(x)) is in the boundary of the convex set epi f
• therefore there exists a supporting hyperplane to epi f at (x, f(x)):

    ∃(a, b) ≠ 0:   (a, b)ᵀ (y − x, t − f(x)) ≤ 0   for all (y, t) ∈ epi f

• b > 0 gives a contradiction as t → ∞
• b = 0 gives a contradiction for y = x + εa with small ε > 0
• therefore b < 0, and g = a/|b| is a subgradient of f at x

Proof: ∂f(x) is bounded when x ∈ int dom f
• for small r > 0, define a set of 2n points

    B = {x ± r eₖ | k = 1, …, n} ⊂ dom f

  and define M = max_{y ∈ B} f(y) < ∞
• for every g ∈ ∂f(x), there is a point y ∈ B with

    r ‖g‖∞ = gᵀ(y − x)

  (choose an index k with |gₖ| = ‖g‖∞, and take y = x + r sign(gₖ) eₖ)
• since g is a subgradient, this implies that

    f(x) + r ‖g‖∞ = f(x) + gᵀ(y − x) ≤ f(y) ≤ M

• we conclude that ∂f(x) is bounded:

    ‖g‖∞ ≤ (M − f(x)) / r   for all g ∈ ∂f(x)

Example

f(x) = max {f₁(x), f₂(x)}, with f₁, f₂ convex and differentiable

[figure: graph of the pointwise maximum of f₁ and f₂]

• if f₁(x̂) = f₂(x̂), the subdifferential at x̂ is the line segment [∇f₁(x̂), ∇f₂(x̂)]
• if f₁(x̂) > f₂(x̂), the subdifferential at x̂ is {∇f₁(x̂)}
• if f₁(x̂) < f₂(x̂), the subdifferential at x̂ is {∇f₂(x̂)}

Examples

Absolute value: f(x) = |x|

[figure: graphs of f(x) = |x| and ∂f(x); ∂f(x) = {−1} for x < 0, [−1, 1] at x = 0, {1} for x > 0]

Euclidean norm: f(x) = ‖x‖₂

    ∂f(x) = {x / ‖x‖₂}  if x ≠ 0;    ∂f(x) = {g | ‖g‖₂ ≤ 1}  if x = 0

Monotonicity

the subdifferential of a convex function is a monotone operator:

    (u − v)ᵀ(x − y) ≥ 0   for all x, y, u ∈ ∂f(x), v ∈ ∂f(y)

Proof: by definition,

    f(y) ≥ f(x) + uᵀ(y − x),    f(x) ≥ f(y) + vᵀ(x − y)

combining the two inequalities shows monotonicity
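A quick numerical check of the subgradient inequality and of monotonicity, using the Euclidean-norm example above (a minimal sketch, not part of the original slides; it assumes NumPy, and the helper name subgrad_norm2 is illustrative):

```python
import numpy as np

def subgrad_norm2(x):
    """Return one subgradient of f(x) = ||x||_2.

    For x != 0, f is differentiable with gradient x/||x||_2;
    at x = 0, any g with ||g||_2 <= 1 is a subgradient, so 0 works.
    """
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else np.zeros_like(x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    g = subgrad_norm2(x)
    # subgradient inequality: f(y) >= f(x) + g^T (y - x)
    assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12
    # monotonicity: (u - v)^T (x - y) >= 0 for u in df(x), v in df(y)
    u, v = g, subgrad_norm2(y)
    assert (u - v) @ (x - y) >= -1e-12
```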
Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at x = 0:
• f : ℝ → ℝ, dom f = ℝ₊,  f(x) = 1 if x = 0, f(x) = 0 if x > 0
• f : ℝ → ℝ, dom f = ℝ₊,  f(x) = −√x

the only supporting hyperplane to epi f at (0, f(0)) is vertical

Subgradients and sublevel sets

if g is a subgradient of f at x, then

    f(y) ≤ f(x)   ⟹   gᵀ(y − x) ≤ 0

[figure: a subgradient g at x, normal to a supporting hyperplane of the sublevel set]

the nonzero subgradients at x define supporting hyperplanes to the sublevel set {y | f(y) ≤ f(x)}

Subgradient calculus

Weak subgradient calculus: rules for finding one subgradient
• sufficient for most nondifferentiable convex optimization algorithms
• if you can evaluate f(x), you can usually compute a subgradient

Strong subgradient calculus: rules for finding ∂f(x) (all subgradients)
• some algorithms, optimality conditions, etc., need the entire subdifferential
• can be quite complicated

we will assume that x ∈ int dom f

Basic rules

Differentiable functions: ∂f(x) = {∇f(x)} if f is differentiable at x

Nonnegative linear combination: if f(x) = α₁f₁(x) + α₂f₂(x) with α₁, α₂ ≥ 0, then

    ∂f(x) = α₁ ∂f₁(x) + α₂ ∂f₂(x)

(the right-hand side is an addition of sets)

Affine transformation of variables: if f(x) = h(Ax + b), then ∂f(x) = Aᵀ ∂h(Ax + b)

Pointwise maximum

    f(x) = max {f₁(x), …, fₘ(x)}

define I(x) = {i | fᵢ(x) = f(x)}, the 'active' functions at x

Weak result: to compute a subgradient at x, choose any k ∈ I(x), and any subgradient of fₖ at x

Strong result:

    ∂f(x) = conv ⋃_{i ∈ I(x)} ∂fᵢ(x)

• the convex hull of the union of the subdifferentials of the 'active' functions at x
• if the fᵢ are differentiable, ∂f(x) = conv {∇fᵢ(x) | i ∈ I(x)}

Example: piecewise-linear function

    f(x) = max_{i=1,…,m} (aᵢᵀx + bᵢ)

[figure: a piecewise-linear function as the maximum of affine functions aᵢᵀx + bᵢ]

the subdifferential at x is a polyhedron

    ∂f(x) = conv {aᵢ | i ∈ I(x)},   with I(x) = {i | aᵢᵀx + bᵢ = f(x)}

Example: ℓ₁-norm

    f(x) = ‖x‖₁ = max_{s ∈ {−1,1}ⁿ} sᵀx

the subdifferential is a product of intervals:

    ∂f(x) = J₁ × ⋯ × Jₙ,   where
        Jₖ = [−1, 1]   if xₖ = 0
        Jₖ = {1}       if xₖ > 0
        Jₖ = {−1}      if xₖ < 0

for example, ∂f(0, 0) = [−1, 1] × [−1, 1],  ∂f(1, 0) = {1} × [−1, 1],  ∂f(1, 1) = {(1, 1)}

Pointwise supremum

    f(x) = sup_{α ∈ A} f_α(x),   f_α(x) convex in x for every α

Weak result: to find a subgradient at x̂,
• find any β for which f(x̂) = f_β(x̂) (assuming the maximum is attained)
• choose any g ∈ ∂f_β(x̂)

(Partial) strong result: define I(x) = {α ∈ A | f_α(x) = f(x)}; then

    conv ⋃_{α ∈ I(x)} ∂f_α(x) ⊆ ∂f(x)

equality requires extra conditions (for example, A compact, f_α continuous in α)

Exercise: maximum eigenvalue

Problem: explain how to find a subgradient of

    f(x) = λ_max(A(x)) = sup_{‖y‖₂=1} yᵀA(x)y

where A(x) = A₀ + x₁A₁ + ⋯ + xₙAₙ with symmetric coefficients Aᵢ

Solution: to find a subgradient at x̂,
• choose any unit eigenvector y with eigenvalue λ_max(A(x̂))
• the gradient of yᵀA(x)y at x̂ is a subgradient of f:

    (yᵀA₁y, …, yᵀAₙy) ∈ ∂f(x̂)
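The recipe in this exercise is straightforward to implement; below is a minimal NumPy sketch (not from the slides; subgrad_lambda_max is an illustrative name, and A_list is assumed to hold [A₀, A₁, …, Aₙ]):

```python
import numpy as np

def subgrad_lambda_max(A_list, x):
    """One subgradient of f(x) = lambda_max(A0 + x1*A1 + ... + xn*An).

    A_list = [A0, A1, ..., An] with symmetric A_i. Pick a unit
    eigenvector y for the largest eigenvalue of A(x); then
    (y^T A1 y, ..., y^T An y) is a subgradient of f at x.
    """
    A0, As = A_list[0], A_list[1:]
    Ax = A0 + sum(xk * Ak for xk, Ak in zip(x, As))
    w, V = np.linalg.eigh(Ax)   # eigenvalues in ascending order
    y = V[:, -1]                # unit eigenvector for lambda_max
    return np.array([y @ Ak @ y for Ak in As])
```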
Minimization

    f(x) = inf_y h(x, y),   h jointly convex in (x, y)

Weak result: to find a subgradient at x̂,
• find ŷ that minimizes h(x̂, y) (assuming the minimum is attained)
• find a subgradient (g, 0) ∈ ∂h(x̂, ŷ)

Proof: for all x, y,

    h(x, y) ≥ h(x̂, ŷ) + gᵀ(x − x̂) + 0ᵀ(y − ŷ) = f(x̂) + gᵀ(x − x̂)

therefore

    f(x) = inf_y h(x, y) ≥ f(x̂) + gᵀ(x − x̂)

Exercise: Euclidean distance to convex set

Problem: explain how to find a subgradient of

    f(x) = inf_{y ∈ C} ‖x − y‖₂

where C is a closed convex set

Solution: to find a subgradient at x̂,
• if f(x̂) = 0 (that is, x̂ ∈ C), take g = 0
• if f(x̂) > 0, find the projection ŷ = P(x̂) of x̂ on C and take

    g = (x̂ − ŷ) / ‖ŷ − x̂‖₂ = (x̂ − P(x̂)) / ‖x̂ − P(x̂)‖₂

Composition

    f(x) = h(f₁(x), …, fₖ(x)),   h convex and nondecreasing, fᵢ convex

Weak result: to find a subgradient at x̂,
• find z ∈ ∂h(f₁(x̂), …, fₖ(x̂)) and gᵢ ∈ ∂fᵢ(x̂)
• then g = z₁g₁ + ⋯ + zₖgₖ ∈ ∂f(x̂)

this reduces to the standard chain rule for differentiable h, fᵢ

Proof:

    f(x) ≥ h( f₁(x̂) + g₁ᵀ(x − x̂), …, fₖ(x̂) + gₖᵀ(x − x̂) )
         ≥ h( f₁(x̂), …, fₖ(x̂) ) + zᵀ( g₁ᵀ(x − x̂), …, gₖᵀ(x − x̂) )
         = h( f₁(x̂), …, fₖ(x̂) ) + (z₁g₁ + ⋯ + zₖgₖ)ᵀ(x − x̂)
         = f(x̂) + gᵀ(x − x̂)

(the first inequality uses fᵢ(x) ≥ fᵢ(x̂) + gᵢᵀ(x − x̂) and the monotonicity of h; the second uses the subgradient inequality for h at (f₁(x̂), …, fₖ(x̂)))

Optimal value function

define f(u, v) as the optimal value of the convex problem

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ uᵢ,  i = 1, …, m
                Ax = b + v

(the functions fᵢ are convex; the optimization variable is x)

Weak result: suppose f(û, v̂) is finite and strong duality holds with the dual

    maximize    inf_x ( f₀(x) + Σᵢ λᵢ (fᵢ(x) − ûᵢ) + νᵀ(Ax − b − v̂) )
    subject to  λ ⪰ 0

if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂), then (−λ̂, −ν̂) ∈ ∂f(û, v̂)

Proof: by weak duality for the problem with right-hand sides u, v,

    f(u, v) ≥ inf_x ( f₀(x) + Σᵢ λ̂ᵢ (fᵢ(x) − uᵢ) + ν̂ᵀ(Ax − b − v) )
            = inf_x ( f₀(x) + Σᵢ λ̂ᵢ (fᵢ(x) − ûᵢ) + ν̂ᵀ(Ax − b − v̂) ) − λ̂ᵀ(u − û) − ν̂ᵀ(v − v̂)
            = f(û, v̂) − λ̂ᵀ(u − û) − ν̂ᵀ(v − v̂)

Expectation

    f(x) = E h(x, u),   u random, h convex in x for every u

Weak result: to find a subgradient at x̂,
• choose a function u ↦ g(u) with g(u) ∈ ∂ₓh(x̂, u)
• then g = Eᵤ g(u) ∈ ∂f(x̂)

Proof: by convexity of h and the definition of g(u),

    f(x) = E h(x, u) ≥ E ( h(x̂, u) + g(u)ᵀ(x − x̂) ) = f(x̂) + gᵀ(x − x̂)

Optimality conditions — unconstrained

x⋆ minimizes f(x) if and only if 0 ∈ ∂f(x⋆)

this follows directly from the definition of subgradient:

    f(y) ≥ f(x⋆) + 0ᵀ(y − x⋆) for all y   ⟺   0 ∈ ∂f(x⋆)

Example: piecewise-linear minimization

    f(x) = max_{i=1,…,m} (aᵢᵀx + bᵢ)

Optimality condition:

    0 ∈ conv {aᵢ | i ∈ I(x⋆)},   where I(x) = {i | aᵢᵀx + bᵢ = f(x)}

• in other words, x⋆ is optimal if and only if there is a λ with

    λ ⪰ 0,   𝟙ᵀλ = 1,   Σ_{i=1}^m λᵢaᵢ = 0,   λᵢ = 0 for i ∉ I(x⋆)

• these are the optimality conditions for the equivalent linear program and its dual:

    (primal)  minimize t   subject to  Ax + b ⪯ t𝟙
    (dual)    maximize bᵀλ   subject to  Aᵀλ = 0,  λ ⪰ 0,  𝟙ᵀλ = 1

Optimality conditions — constrained

    minimize    f₀(x)
    subject to  fᵢ(x) ≤ 0,  i = 1, …, m

assume dom fᵢ = ℝⁿ, so the functions fᵢ are subdifferentiable everywhere

Karush–Kuhn–Tucker conditions: if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if

1. x⋆ is primal feasible
2. λ⋆ ⪰ 0
3. λ⋆ᵢ fᵢ(x⋆) = 0 for i = 1, …, m
4. x⋆ minimizes L(x, λ⋆) = f₀(x) + Σᵢ λ⋆ᵢ fᵢ(x), that is, 0 ∈ ∂f₀(x⋆) + Σᵢ λ⋆ᵢ ∂fᵢ(x⋆)
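To make the piecewise-linear optimality condition above concrete, here is a minimal NumPy sketch (not from the slides; subgrad_pwl is an illustrative name) that returns one subgradient via the weak pointwise-maximum rule and checks the condition 0 ∈ conv {aᵢ | i ∈ I(x⋆)} by hand on f(x) = |x|:

```python
import numpy as np

def subgrad_pwl(A, b, x):
    """One subgradient of f(x) = max_i (a_i^T x + b_i).

    Weak pointwise-maximum rule: any active row a_k works.
    """
    k = int(np.argmax(A @ x + b))   # index of an active function
    return A[k]

# f(x) = max(x, -x) = |x| on R: rows a_1 = 1, a_2 = -1, b = 0
A, b = np.array([[1.0], [-1.0]]), np.zeros(2)
print(subgrad_pwl(A, b, np.array([2.0])))   # [1.], active row is a_1
# at x* = 0 both rows are active, and
# conv{a_1, a_2} = conv{1, -1} = [-1, 1] contains 0,
# confirming the optimality condition 0 in conv{a_i | i in I(x*)}
```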