2. Subgradients

L. Vandenberghe, ECE236C (Spring 2020)

Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Basic inequality

recall the basic inequality for differentiable convex functions:

$$f(y) \geq f(x) + \nabla f(x)^T (y - x) \quad \text{for all } y \in \operatorname{dom} f$$

• the first-order approximation of $f$ at $x$ is a global lower bound
• $\nabla f(x)$ defines a non-vertical supporting hyperplane to the epigraph of $f$ at $(x, f(x))$:

$$\begin{bmatrix} \nabla f(x) \\ -1 \end{bmatrix}^T \begin{bmatrix} y - x \\ t - f(x) \end{bmatrix} \leq 0 \quad \text{for all } (y, t) \in \operatorname{epi} f$$

Subgradient

$g$ is a subgradient of a convex function $f$ at $x \in \operatorname{dom} f$ if

$$f(y) \geq f(x) + g^T (y - x) \quad \text{for all } y \in \operatorname{dom} f$$

[figure: affine lower bounds $f(x_1) + g_1^T (y - x_1)$, $f(x_1) + g_2^T (y - x_1)$, $f(x_2) + g_3^T (y - x_2)$; $g_1$, $g_2$ are subgradients at $x_1$; $g_3$ is a subgradient at $x_2$]

Subdifferential

the subdifferential $\partial f(x)$ of $f$ at $x$ is the set of all subgradients:

$$\partial f(x) = \{ g \mid g^T (y - x) \leq f(y) - f(x) \ \ \forall y \in \operatorname{dom} f \}$$

Properties
• $\partial f(x)$ is a closed convex set (possibly empty); this follows from the definition: $\partial f(x)$ is an intersection of halfspaces
• if $x \in \operatorname{int} \operatorname{dom} f$, then $\partial f(x)$ is nonempty and bounded (proofs on the next two pages)

Proof: we show that $\partial f(x)$ is nonempty when $x \in \operatorname{int} \operatorname{dom} f$

• $(x, f(x))$ is in the boundary of the convex set $\operatorname{epi} f$
• therefore there exists a supporting hyperplane to $\operatorname{epi} f$ at $(x, f(x))$:

$$\exists (a, b) \neq 0: \qquad \begin{bmatrix} a \\ b \end{bmatrix}^T \begin{bmatrix} y - x \\ t - f(x) \end{bmatrix} \leq 0 \quad \forall (y, t) \in \operatorname{epi} f$$

• $b > 0$ gives a contradiction as $t \to \infty$
• $b = 0$ gives a contradiction for $y = x + \epsilon a$ with small $\epsilon > 0$
• therefore $b < 0$, and $g = a / |b|$ is a subgradient of $f$ at $x$

Proof: $\partial f(x)$ is bounded when $x \in \operatorname{int} \operatorname{dom} f$

• for small $r > 0$, define a set of $2n$ points

$$B = \{ x \pm r e_k \mid k = 1, \dots, n \} \subset \operatorname{dom} f$$

and define $M = \max_{y \in B} f(y) < \infty$

• for every $g \in \partial f(x)$, there is a point $y \in B$ with

$$r \|g\|_\infty = g^T (y - x)$$

(choose an index $k$ with $|g_k| = \|g\|_\infty$, and take $y = x + r \operatorname{sign}(g_k) e_k$)

• since $g$ is a subgradient, this implies that

$$f(x) + r \|g\|_\infty = f(x) + g^T (y - x) \leq f(y) \leq M$$

• we conclude that $\partial f(x)$ is bounded:

$$\|g\|_\infty \leq \frac{M - f(x)}{r} \quad \text{for all } g \in \partial f(x)$$

Example

$f(x) = \max \{ f_1(x), f_2(x) \}$ with $f_1$, $f_2$ convex and differentiable

• if $f_1(\hat x) = f_2(\hat x)$, the subdifferential at $\hat x$ is the line segment $[\nabla f_1(\hat x), \nabla f_2(\hat x)]$
• if $f_1(\hat x) > f_2(\hat x)$, the subdifferential at $\hat x$ is $\{ \nabla f_1(\hat x) \}$
• if $f_1(\hat x) < f_2(\hat x)$, the subdifferential at $\hat x$ is $\{ \nabla f_2(\hat x) \}$

Examples

Absolute value: $f(x) = |x|$, with

$$\partial f(x) = \{-1\} \text{ if } x < 0, \qquad \partial f(0) = [-1, 1], \qquad \partial f(x) = \{1\} \text{ if } x > 0$$

Euclidean norm: $f(x) = \|x\|_2$, with

$$\partial f(x) = \left\{ \frac{x}{\|x\|_2} \right\} \text{ if } x \neq 0, \qquad \partial f(x) = \{ g \mid \|g\|_2 \leq 1 \} \text{ if } x = 0$$
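the Euclidean-norm example is easy to test numerically; the sketch below (a minimal check, where the dimension and sample counts are arbitrary choices, not from the slides) draws random $g$ in the unit ball and random points $y$, and verifies the subgradient inequality $f(y) \geq f(0) + g^T (y - 0)$, i.e. $\|y\|_2 \geq g^T y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

for _ in range(1000):
    # candidate subgradient at x = 0: any g with ||g||_2 <= 1
    g = rng.standard_normal(n)
    g /= max(1.0, np.linalg.norm(g))   # scale into the unit ball
    y = 10 * rng.standard_normal(n)    # arbitrary test point
    # subgradient inequality at x = 0: f(y) >= f(0) + g^T (y - 0)
    assert np.linalg.norm(y) >= g @ y - 1e-9

print("subgradient inequality verified for all sampled (g, y)")
```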
Monotonicity

the subdifferential of a convex function is a monotone operator:

$$(u - v)^T (x - y) \geq 0 \quad \text{for all } x, y, \; u \in \partial f(x), \; v \in \partial f(y)$$

Proof: by definition,

$$f(y) \geq f(x) + u^T (y - x), \qquad f(x) \geq f(y) + v^T (x - y)$$

combining the two inequalities shows monotonicity

Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at $x = 0$:

• $f : \mathbf{R} \to \mathbf{R}$, $\operatorname{dom} f = \mathbf{R}_+$:
$$f(x) = 1 \text{ if } x = 0, \qquad f(x) = 0 \text{ if } x > 0$$
• $f : \mathbf{R} \to \mathbf{R}$, $\operatorname{dom} f = \mathbf{R}_+$:
$$f(x) = -\sqrt{x}$$

the only supporting hyperplane to $\operatorname{epi} f$ at $(0, f(0))$ is vertical

Subgradients and sublevel sets

if $g$ is a subgradient of $f$ at $x$, then

$$f(y) \leq f(x) \implies g^T (y - x) \leq 0$$

the nonzero subgradients at $x$ define supporting hyperplanes to the sublevel set $\{ y \mid f(y) \leq f(x) \}$

Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative

Subgradient calculus

Weak subgradient calculus: rules for finding one subgradient
• sufficient for most nondifferentiable convex optimization algorithms
• if you can evaluate $f(x)$, you can usually compute a subgradient

Strong subgradient calculus: rules for finding $\partial f(x)$ (all subgradients)
• some algorithms, optimality conditions, etc., need the entire subdifferential
• can be quite complicated

we will assume that $x \in \operatorname{int} \operatorname{dom} f$

Basic rules

Differentiable functions: $\partial f(x) = \{ \nabla f(x) \}$ if $f$ is differentiable at $x$

Nonnegative linear combination: if $f(x) = \alpha_1 f_1(x) + \alpha_2 f_2(x)$ with $\alpha_1, \alpha_2 \geq 0$, then

$$\partial f(x) = \alpha_1 \partial f_1(x) + \alpha_2 \partial f_2(x)$$

(the right-hand side is an addition of sets)

Affine transformation of variables: if $f(x) = h(Ax + b)$, then

$$\partial f(x) = A^T \partial h(Ax + b)$$

Pointwise maximum

$$f(x) = \max \{ f_1(x), \dots, f_m(x) \}$$

define $I(x) = \{ i \mid f_i(x) = f(x) \}$, the 'active' functions at $x$

Weak result: to compute a subgradient at $x$, choose any $k \in I(x)$ and any subgradient of $f_k$ at $x$

Strong result:

$$\partial f(x) = \operatorname{conv} \bigcup_{i \in I(x)} \partial f_i(x)$$

• the convex hull of the union of the subdifferentials of the 'active' functions at $x$
• if the $f_i$ are differentiable, $\partial f(x) = \operatorname{conv} \{ \nabla f_i(x) \mid i \in I(x) \}$

Example: piecewise-linear function

$$f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)$$

the subdifferential at $x$ is a polyhedron

$$\partial f(x) = \operatorname{conv} \{ a_i \mid i \in I(x) \}, \qquad I(x) = \{ i \mid a_i^T x + b_i = f(x) \}$$

Example: $\ell_1$-norm

$$f(x) = \|x\|_1 = \max_{s \in \{-1, 1\}^n} s^T x$$

the subdifferential is a product of intervals

$$\partial f(x) = J_1 \times \cdots \times J_n, \qquad J_k = \begin{cases} [-1, 1] & x_k = 0 \\ \{1\} & x_k > 0 \\ \{-1\} & x_k < 0 \end{cases}$$

for example, in $\mathbf{R}^2$:

$$\partial f(0, 0) = [-1, 1] \times [-1, 1], \qquad \partial f(1, 0) = \{1\} \times [-1, 1], \qquad \partial f(1, 1) = \{(1, 1)\}$$

Pointwise supremum

$$f(x) = \sup_{\alpha \in \mathcal{A}} f_\alpha(x), \qquad f_\alpha(x) \text{ convex in } x \text{ for every } \alpha$$

Weak result: to find a subgradient at $\hat x$,
• find any $\beta$ for which $f(\hat x) = f_\beta(\hat x)$ (assuming the supremum is attained)
• choose any $g \in \partial f_\beta(\hat x)$

(Partial) strong result: define $I(x) = \{ \alpha \in \mathcal{A} \mid f_\alpha(x) = f(x) \}$; then

$$\operatorname{conv} \bigcup_{\alpha \in I(x)} \partial f_\alpha(x) \subseteq \partial f(x)$$

equality requires extra conditions (for example, $\mathcal{A}$ compact, $f_\alpha$ continuous in $\alpha$)

Exercise: maximum eigenvalue

Problem: explain how to find a subgradient of

$$f(x) = \lambda_{\max}(A(x)) = \sup_{\|y\|_2 = 1} y^T A(x) y$$

where $A(x) = A_0 + x_1 A_1 + \cdots + x_n A_n$ with symmetric coefficients $A_i$

Solution: to find a subgradient at $\hat x$,
• choose any unit eigenvector $y$ with eigenvalue $\lambda_{\max}(A(\hat x))$
• the gradient of $y^T A(x) y$ at $\hat x$ is a subgradient of $f$:

$$(y^T A_1 y, \dots, y^T A_n y) \in \partial f(\hat x)$$
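this recipe translates directly into code; the sketch below is a minimal implementation (the function name and the random test data are illustrative, not from the slides), assuming small dense symmetric matrices; numpy.linalg.eigh returns eigenvalues in ascending order, so the last eigenvector corresponds to $\lambda_{\max}$:

```python
import numpy as np

def lambda_max_subgradient(A_list, x):
    """Subgradient of f(x) = lambda_max(A0 + x1*A1 + ... + xn*An).

    A_list = [A0, A1, ..., An] with symmetric matrices; len(x) = n.
    Returns f(x) and g = (y^T A1 y, ..., y^T An y) for a unit
    eigenvector y of the largest eigenvalue of A(x).
    """
    A = A_list[0] + sum(xi * Ai for xi, Ai in zip(x, A_list[1:]))
    w, V = np.linalg.eigh(A)    # eigenvalues in ascending order
    y = V[:, -1]                # unit eigenvector for lambda_max
    g = np.array([y @ Ai @ y for Ai in A_list[1:]])
    return w[-1], g

# small example with random symmetric coefficients A0, A1, A2
rng = np.random.default_rng(1)
def sym(m=4):
    B = rng.standard_normal((m, m))
    return (B + B.T) / 2

A_list = [sym(), sym(), sym()]
fval, g = lambda_max_subgradient(A_list, np.array([0.5, -1.0]))
print(fval, g)
```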
Minimization

$$f(x) = \inf_y h(x, y), \qquad h \text{ jointly convex in } (x, y)$$

Weak result: to find a subgradient at $\hat x$,
• find $\hat y$ that minimizes $h(\hat x, y)$ (assuming the minimum is attained)
• find a subgradient $(g, 0) \in \partial h(\hat x, \hat y)$

Proof: for all $x$, $y$,

$$h(x, y) \geq h(\hat x, \hat y) + g^T (x - \hat x) + 0^T (y - \hat y) = f(\hat x) + g^T (x - \hat x)$$

therefore

$$f(x) = \inf_y h(x, y) \geq f(\hat x) + g^T (x - \hat x)$$

Exercise: Euclidean distance to convex set

Problem: explain how to find a subgradient of

$$f(x) = \inf_{y \in C} \|x - y\|_2$$

where $C$ is a closed convex set

Solution: to find a subgradient at $\hat x$,
• if $f(\hat x) = 0$ (that is, $\hat x \in C$), take $g = 0$
• if $f(\hat x) > 0$, find the projection $\hat y = P(\hat x)$ of $\hat x$ on $C$ and take

$$g = \frac{1}{\|\hat y - \hat x\|_2} (\hat x - \hat y) = \frac{1}{\|\hat x - P(\hat x)\|_2} (\hat x - P(\hat x))$$

Composition

$$f(x) = h(f_1(x), \dots, f_k(x)), \qquad h \text{ convex and nondecreasing, } f_i \text{ convex}$$

Weak result: to find a subgradient at $\hat x$,
• find $z \in \partial h(f_1(\hat x), \dots, f_k(\hat x))$ and $g_i \in \partial f_i(\hat x)$
• then $g = z_1 g_1 + \cdots + z_k g_k \in \partial f(\hat x)$

this reduces to the standard chain rule when $h$ and the $f_i$ are differentiable

Proof:

$$\begin{aligned} f(x) &\geq h\big( f_1(\hat x) + g_1^T (x - \hat x), \dots, f_k(\hat x) + g_k^T (x - \hat x) \big) \\ &\geq h(f_1(\hat x), \dots, f_k(\hat x)) + z^T \big( g_1^T (x - \hat x), \dots, g_k^T (x - \hat x) \big) \\ &= h(f_1(\hat x), \dots, f_k(\hat x)) + (z_1 g_1 + \cdots + z_k g_k)^T (x - \hat x) \\ &= f(\hat x) + g^T (x - \hat x) \end{aligned}$$

(the first inequality uses convexity of the $f_i$ and monotonicity of $h$; the second uses $z \in \partial h$)

Optimal value function

define $f(u, v)$ as the optimal value of the convex problem

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_i(x) \leq u_i, \quad i = 1, \dots, m \\ & Ax = b + v \end{array}$$

(the functions $f_i$ are convex; the optimization variable is $x$)

Weak result: suppose $f(\hat u, \hat v)$ is finite and strong duality holds with the dual

$$\begin{array}{ll} \text{maximize} & \displaystyle \inf_x \Big( f_0(x) + \sum_i \lambda_i (f_i(x) - \hat u_i) + \nu^T (Ax - b - \hat v) \Big) \\ \text{subject to} & \lambda \succeq 0 \end{array}$$

if $\hat\lambda$, $\hat\nu$ are optimal dual variables (for right-hand sides $\hat u$, $\hat v$), then

$$(-\hat\lambda, -\hat\nu) \in \partial f(\hat u, \hat v)$$

Proof: by weak duality for the problem with right-hand sides $u$, $v$:

$$\begin{aligned} f(u, v) &\geq \inf_x \Big( f_0(x) + \sum_i \hat\lambda_i (f_i(x) - u_i) + \hat\nu^T (Ax - b - v) \Big) \\ &= \inf_x \Big( f_0(x) + \sum_i \hat\lambda_i (f_i(x) - \hat u_i) + \hat\nu^T (Ax - b - \hat v) \Big) - \hat\lambda^T (u - \hat u) - \hat\nu^T (v - \hat v) \\ &= f(\hat u, \hat v) - \hat\lambda^T (u - \hat u) - \hat\nu^T (v - \hat v) \end{aligned}$$

Expectation

$$f(x) = \operatorname{E} h(x, u), \qquad u \text{ random}, \; h \text{ convex in } x \text{ for every } u$$

Weak result: to find a subgradient at $\hat x$,
• choose a function $u \mapsto g(u)$ with $g(u) \in \partial_x h(\hat x, u)$
• then $g = \operatorname{E}_u g(u) \in \partial f(\hat x)$

Proof: by convexity of $h$ and the definition of $g(u)$,

$$f(x) = \operatorname{E} h(x, u) \geq \operatorname{E} \big( h(\hat x, u) + g(u)^T (x - \hat x) \big) = f(\hat x) + g^T (x - \hat x)$$

Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative

Optimality conditions — unconstrained

$x^\star$ minimizes $f(x)$ if and only if $0 \in \partial f(x^\star)$

this follows directly from the definition of subgradient:

$$f(y) \geq f(x^\star) + 0^T (y - x^\star) \text{ for all } y \quad \Longleftrightarrow \quad 0 \in \partial f(x^\star)$$

Example: piecewise-linear minimization

$$f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)$$

Optimality condition:

$$0 \in \operatorname{conv} \{ a_i \mid i \in I(x^\star) \}, \qquad I(x) = \{ i \mid a_i^T x + b_i = f(x) \}$$

• in other words, $x^\star$ is optimal if and only if there is a $\lambda$ with

$$\lambda \succeq 0, \qquad \mathbf{1}^T \lambda = 1, \qquad \sum_{i=1}^m \lambda_i a_i = 0, \qquad \lambda_i = 0 \text{ for } i \notin I(x^\star)$$

• these are the optimality conditions for the equivalent pair of linear programs

$$\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & Ax + b \preceq t \mathbf{1} \end{array} \qquad\qquad \begin{array}{ll} \text{maximize} & b^T \lambda \\ \text{subject to} & A^T \lambda = 0 \\ & \lambda \succeq 0, \quad \mathbf{1}^T \lambda = 1 \end{array}$$

Optimality conditions — constrained

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_i(x) \leq 0, \quad i = 1, \dots, m \end{array}$$

assume $\operatorname{dom} f_i = \mathbf{R}^n$, so the functions $f_i$ are subdifferentiable everywhere

Karush–Kuhn–Tucker conditions: if strong duality holds, then $x^\star$, $\lambda^\star$ are primal, dual optimal if and only if

1. $x^\star$ is primal feasible: $f_i(x^\star) \leq 0$ for $i = 1, \dots, m$
2. $\lambda^\star \succeq 0$
3. complementary slackness holds: $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1, \dots, m$
4. $x^\star$ minimizes the Lagrangian $f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x)$, that is,

$$0 \in \partial f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \partial f_i(x^\star)$$
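the piecewise-linear optimality condition above can be checked mechanically, since $0 \in \operatorname{conv} \{ a_i \mid i \in I(x) \}$ is a linear feasibility problem in $\lambda$; the sketch below (the function name and tolerance are illustrative choices, assuming scipy is available) tests it with a feasibility LP:

```python
import numpy as np
from scipy.optimize import linprog

def is_pwl_optimal(A, b, x, tol=1e-8):
    """Check 0 in conv{a_i : i in I(x)} for f(x) = max_i (a_i^T x + b_i).

    A is m-by-n with rows a_i^T, b has length m. Solves the feasibility
    LP: find lambda >= 0 with sum(lambda) = 1 and A_active^T lambda = 0.
    """
    vals = A @ x + b
    active = vals >= vals.max() - tol      # the active set I(x)
    Aact = A[active]                       # shape (p, n)
    p, n = Aact.shape
    A_eq = np.vstack([Aact.T, np.ones((1, p))])
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res = linprog(c=np.zeros(p), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * p)
    return res.success

# f(x) = max(x, -x) = |x| is minimized at x = 0 but not at x = 1
A = np.array([[1.0], [-1.0]])
b = np.zeros(2)
print(is_pwl_optimal(A, b, np.array([0.0])))   # True
print(is_pwl_optimal(A, b, np.array([1.0])))   # False
```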
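the expectation rule also lends itself to a Monte Carlo check; for example, with $f(x) = \operatorname{E} |x - u|$ and $u$ uniform on $[0, 1]$, a per-scenario subgradient is $\operatorname{sign}(x - u)$, and its sample average should approach the exact derivative $2x - 1$ for $x \in (0, 1)$ (a small sketch; the sample size is an arbitrary choice):

```python
import numpy as np

# f(x) = E |x - u|, u ~ Uniform[0, 1]; sign(x - u) is a subgradient of
# |x - u| in x, so g = E sign(x - u) = 2x - 1 on (0, 1)
rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)
x_hat = 0.3
g_mc = np.mean(np.sign(x_hat - u))   # sample average of scenario subgradients
print(g_mc, 2 * x_hat - 1)           # approximately equal
```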
