An Approximate Shapley-Folkman Theorem

An Approximate Shapley-Folkman Theorem. Alexandre d'Aspremont, CNRS & D.I., Ecole´ Normale Supérieure. With Thomas Kerdreux, Igor Colin (ENS). Alex d'Aspremont Newton Institute, June 2018. 1/34 Jobs Postdoc position (1.5 years) in ML / Optimization. At Ecole Normale Supérieure in Paris, competitive salary, travel funds. Alex d'Aspremont Newton Institute, June 2018. 2/34 Introduction Minimizing finite sums. Pn minimize i=1 fi(xi) subject to x 2 C Ubiquitous in statistics, machine learning. Better computational complexity (SAGA, SVRG, MISO, etc.). Today. More robust to nonconvexity issues. Alex d'Aspremont Newton Institute, June 2018. 3/34 Introduction Minimizing finite sums. Penalized regression. Given a penalty function g(x) such as `1, `0, SCAD, solve Pn 2 Pp minimize i=1 zi + λ i=1 g(xi) subject to z = Ax − b Empirical Risk Minimization. In the linear case, Pn Pp minimize i=1 `(yi; zi) + λ i=1 g(wi) subject to z = Aw − b Multi-Task Learning. Same format, by blocks. Resource Allocation. Aka unit commitment problem. Pn maximize i=1 f(xi) subject to Ax ≤ b Alex d'Aspremont Newton Institute, June 2018. 4/34 Introduction Minimizing finite sums. Pn minimize i=1 fi(xi) subject to x 2 C Better complexity bounds for stochastic gradient. SAG [Schmidt et al., 2013], SVRG [Johnson and Zhang, 2013], SDCA [Shalev-Shwartz and Zhang, 2013], SAGA [Defazio et al., 2014]. Non convexity has a milder impact. Weakly convex penalties for M-estimators [Loh and Wainwright, 2013, Chen and Gu, 2014]. Equilibrium in economies where consumers have non-convex preferences. [Starr, 1969, Guesnerie, 1975]. Unit commitment problem with non-convex costs. [Bertsekas et al., 1981]. Alex d'Aspremont Newton Institute, June 2018. 5/34 Introduction This talk. Focus on problems with separable linear constraints Pn minimize i=1 fi(xi) subject to Ax ≤ b; xi 2 Yi; i = 1; : : : ; n; Many results generalize to the nonlinear case, Pn minimize i=1 fi(xi) Pn subject to i=1 gij(xi) ≤ 0; j = 1; : : : ; m xi 2 Yi; i = 1; : : : ; n; Alex d'Aspremont Newton Institute, June 2018. 6/34 Outline Shapley-Folkman theorem Duality gap bounds Stable bounds Alex d'Aspremont Newton Institute, June 2018. 7/34 Introduction Minkowski sum. Given sets X; Y ⊂ Rd, we have X + Y = fx + y : x 2 X; y 2 Y g (CGAL User and Reference Manual) d Convex hull. Given subsets Vi ⊂ R , we have ! X X Co Vi = Co(Vi) i i Alex d'Aspremont Newton Institute, June 2018. 8/34 Shapley-Folkman The `1=2 ball, Minkowsi average of two and ten balls, convex hull. + + + + = 5× Minkowsi average of five first digits (obtained by sampling). Alex d'Aspremont Newton Institute, June 2018. 9/34 Shapley-Folkman Shapley-Folkman Theorem [Starr, 1969] d If Vi ⊂ R , i = 1; : : : ; n; and n ! n X X x 2 Co Vi = Co(Vi) i=1 i=1 then X X x 2 Vi + Co(Vi) [1;n]nS S where jSj ≤ d. Alex d'Aspremont Newton Institute, June 2018. 10/34 Shapley-Folkman Pn Proof. Suppose x 2 i=1 Co (Vi), by Carathéodory's theorem we have n d+1 X X z = λijzij i=1 j=1 where z 2 Rd+n, λ ≥ 0, and x vij z = ; zij = ; for i = 1; : : : ; n and j = 1; : : : ; d + 1, 1n ei n with ei 2 R is the Euclidean basis. Conic Carathéodory on z means n d+1 X X z = µijzij i=1 j=1 where n + d nonzero coefficients µij are spread among n sets (cf. constraints), with at least one nonzero coefficient per set. Pd+1 This means µij = 1 for at least n − d indices i, for which j=1 µijzij 2 Vi. Alex d'Aspremont Newton Institute, June 2018. 11/34 Shapley-Folkman Proof. Write X X x 2 Vi + Co(Vi) [1;n]nS S where jSj ≤ d, or n d+1 x X X vij = λij : 1n ei i=1 j=1 λij } d } xi ∈ Co(Vi) n xi ∈ Vi Number of nonzero λij controls distance to convex hull. Alex d'Aspremont Newton Institute, June 2018. 12/34 Shapley-Folkman: consequences Consequences. d If the sets Vi ⊂ R are uniformly bounded with rad(Vi) ≤ R, then ! !! X X p dH Vi ; Co Vi ≤ R minfn; dg i i where rad(V ) = infx2V supy2V kx − yk: In particular, when d is fixed and n ! 1 Pn V Pn V i=1 i ! Co i=1 i n n in the Hausdorff metric. Alex d'Aspremont Newton Institute, June 2018. 13/34 Shapley-Folkman: consequences In the limit. When n ! 1, Lyapunov Theorem [Berliocchi and Lasry, 1973, Ekeland and Temam, 1999]. Hilbert, Banach space versions [Cassels, 1975, Puri and Ralescu, 1985, Schneider and Weil, 2008]. Bound Hausdorff distance with convex hull in terms of radius. Strong law of large numbers for [Artstein and Vitale, 1975]. Alex d'Aspremont Newton Institute, June 2018. 14/34 Outline Shapley-Folkman theorem Duality gap bounds Stable bounds Alex d'Aspremont Newton Institute, June 2018. 15/34 Nonconvex Optimization Optimization problem. Focus on separable problem with linear constraints Pn minimize i=1 fi(xi) subject to Ax ≤ b; (P) xi 2 Yi; i = 1; : : : ; n; di Pn in the variables xi 2 R with d = i=1 di, where fi are lower semicontinuous m×d (but not necessarily convex), Yi ⊂ dom fi are compact, and A 2 R . Take the dual twice to form a convex relaxation, minimize Pn (f + 1 )∗∗(x ) i=1 i Yi i (CoP) subject to Ax ≤ b d in the variables xi 2 R i. Alex d'Aspremont Newton Institute, June 2018. 16/34 Nonconvex Optimization Convex envelope. ∗∗ Biconjugate f of f (aka convex envelope of f): pointwise supremum of all affine functions majorized by f (see e.g. [Rockafellar, 1970, Th. 12.1] or [Hiriart-Urruty and Lemaréchal,1993, Th. X.1.3.5]) ∗∗ We have epi(f ) = Co(epi(f)), which means that f ∗∗(x) and f(x) match at extreme points x of epi(f ∗∗). |x| 1 Card(x) −1 0 1 x The l1 norm is the convex envelope of Card(x) in [−1; 1]. Alex d'Aspremont Newton Institute, June 2018. 17/34 Nonconvex Optimization Writing the epigraph of problem (P) as in [Lemaréchaland Renaud, 2001], ( n ) d+1+m X G , (x; r0; r) 2 R : fi(xi) + 1Yi(xi) ≤ r0; Ax − b ≤ r ; i=1 and its projection on the last m + 1 coordinates, m+1 Gr , (r0; r) 2 R :(x; r0; r) 2 G ; we can write the dual function of (P) as > ∗∗ Ψ(λ) , inf r0 + λ r :(r0; r) 2 Gr ; in the variable λ 2 Rm, where G∗∗ = Co(G) is the closed convex hull of the epigraph G. [Lemaréchaland Renaud, 2001, Th. 2.11]: affine constraints means the dual functions of (P) and (CoP) are equal. The (common) dual of (P) and (CoP) is then sup Ψ(λ) (D) λ≥0 in the variable λ 2 Rm. Alex d'Aspremont Newton Institute, June 2018. 18/34 Nonconvex Optimization Epigraph. Define ∗∗ di Fi = ((fi + 1Yi) (xi);Aixi): xi 2 R m×d th where Ai 2 R i is the i block of A. ∗∗ The epigraph Gr can be written as a Minkowski sum of Fi n ∗∗ X m+1 Gr = Fi + (0; −b) + R+ i=1 Lack of convexity. Define ∗∗ ρ(f) , sup ff(x) − f (x)g: x2dom(f) Alex d'Aspremont Newton Institute, June 2018. 19/34 Bound on duality gap A priori bound on duality gap of Pn minimize i=1 fi(xi) subject to Ax ≤ b; xi 2 Yi; i = 1; : : : ; n; where A 2 Rm×d. Proposition [Aubin and Ekeland, 1976, Ekeland and Temam, 1999] A priori bounds on the duality gap Suppose the functions fi in (P) satisfy Assumption (. ). There is a point x? 2 Rd at which the primal optimal value of (CoP) is attained, such that n n n m+1 X ∗∗ ? X ? X ∗∗ ? X fi (xi ) ≤ fi(^xi ) ≤ fi (xi ) + ρ(f[i]) i=1 i=1 i=1 i=1 | CoP{z } | {zP } | CoP{z } | gap{z } ? where x^ is an optimal point of (P) and ρ(f[1]) ≥ ρ(f[2]) ≥ ::: ≥ ρ(f[n]). Alex d'Aspremont Newton Institute, June 2018. 20/34 Bound on duality gap ? ∗∗ Proof sketch. Pick optimal z in Gr , closed convex hull of epigraph which is a Minkowski sum, n ∗∗ X m+1 ∗∗ di m+1 Gr = Fi +(0; −b)+R+ ; where Fi = (fi (xi);Aixi): xi 2 R ⊂ R i=1 ∗∗ Pn m+1 Krein-Milman shows Gr = i=1 Co (Ext(Fi)) + (0; −b) + R+ : m+1 ? ∗∗ Fi ⊂ R so Shapley-Folkman shows that for any z 2 Gr , ? X X z 2 Ext(Fi) + Co (Ext(Fi)) [1;n]nS S for some index set S ⊂ [1; n] with jSj ≤ m + 1. ? ∗∗ ? ? ? ∗∗ ? Then, fi(xi ) = fi (xi ) when xi 2 Ext(Fi), and f(xi ) − f (xi ) ≤ ρ(fi) otherwise. Alex d'Aspremont Newton Institute, June 2018. 21/34 Bound on duality gap Shapley-Folkman. A priori bound on duality gap based on tractable quantities. Vanishingly small if n ! 1, m fixed and ρ is uniformly bounded. However, the bound is written in terms of unstable quantities which lack meaning (dimension, rank, etc.) Significantly tighten gap bound using stable quantities? Alex d'Aspremont Newton Institute, June 2018. 22/34 Outline Shapley-Folkman theorem Duality gap bounds Stable bounds Alex d'Aspremont Newton Institute, June 2018. 23/34 Stable bounds on duality gap kth-nonconvexity measure.

An Approximate Shapley-Folkman Theorem

Preface Chapter 1 Chaos Theory and the Dynamics of Narrative

Selected New Aspects of the Calculus of Variations in the Large

“Selecta” Books Scrapbook Forewords & Excerpts from Reviews

The Nonconvex Geometry of Linear Inverse Problems

Arxiv:2002.02208V1

UNIVERSITY of CALIFORNIA, SAN DIEGO Operator

Notices of the American Mathematical Society ABCD Springer.Com

Vanishing Price of Anarchy in Large Coordinative Nonconvex Optimization

Feature Selection & the Shapley-Folkman Theorem

Discretization of Functionals Involving the Monge-Amp\Ere Operator

OPTIMIZATION PROBLEMS in BANACH Spacest1)

RESTON for MATH======:::---"", New and Recent Titles in Mathematics