arXiv:2006.16784v1 [cs.DM] 27 Jun 2020 omto hoy umdlrFntosaedfie over In- defined and are Learning Functions a Machine subsets in Submodular including popularity Theory. areas gaining formation new is of submodularity number theory, resea game operations and economics, optimization, combinatorial ver shorter and [3]. concise preprint t a longer using our is of maximization paper submodular This prov for superdifferentials. and between bounds to conditions upper relationship superdi with try optimality associated the the polyhedra we characterize on and We paper, entials picture concavity. this with complete submodularity In more submodular. a Submodular are provide w concavity. functions composed fac functions constant to modular concave admits and similar hard, guarantees NP approximation signs Submodu though maximization, show functions. function convex also which of functions of a both characteristics and minimization, fundamental for Submodul [2] algorithms [1]. subdifferentials time information and polynomial mutual subgradients and have entropy functions as information-theoret such several tities generalize which functions, ifrnil,CneiyadConcavity and Convexity differentials, variables f to e fies hnastfunction set a then items, of set ihcne ucin,t h xetta umdlrt is [10] submodularity convexity of that analogue extent discrete a the as regarded to sometimes functions, convex with few. a sen name and to selection [9] observation placement selection [6]–[8], data summarization prevalent including [5], applications becoming learning increasingly machine in are Functions Submodular eoetemria oto element of cost marginal the denote qiaety umdlrstfnto satisfies returns function marginal set ing submodular a Equivalently, aibe nee ysubset by indexed variables ulInformation tual ayt e that see to easy rudset ground ,T S, ( 1 j ogkont ea motn rpryfrpolm in problems for property important an be to known Long Terms Index Abstract umdlrfntoshv ensrnl associated strongly been have functions Submodular eas s hsntto o sets for notation this use also We S | S ⊆ ) ⊆ ≥ V X thlsthat, holds it , X V Sboua ucin r pca ls fset of class special a are Functions —Submodular f ocv set fSboua Functions Submodular of Aspects Concave V nvriyo ea tDla,CEDepartment CSE Dallas, at Texas of University 1 ( . faground-set a of 1 j , · · · | Sboua ucin,Sbdfeetas Super- Sub-differentials, Functions, —Submodular = T h iiihn eun rprysae that, states property returns diminishing The ) H { , I , 1 ∀ ( X ( mi:[email protected] Email: , X X S 2 n 61N ly d SEC31 MS Rd. Floyd N. 2601 S S , .I I. define , Let : ⊆ · · · ; ) ihrsn X75083 TX Richardson, X f ssboua 1.Smlry h Mu- the Similarly, [1]. submodular is T NTRODUCTION V ( n , S \ and S + ) ihb Iyer Rishabh } f ,B A, ) V S ( H is j sas umdlrfunction. submodular a also is i.e. : Let . / j f | ( S submodular X ( ∈ sin as T = ) S T ) ) f X ie e frandom of set a Given . ≥ V f stejitEtoyof Entropy joint the as S ( A j f 2 : f = | = ( B ( ∈ S S = ) V ∪ { {X ffralsubsets all for if { ∪ V 1 T , f → i · · · + ) ( i , ihrespect with A j } ∪ R ∈ n , f ) B diminish- ( cquan- ic − ) S } S vra over − } ∩ f f ea be dmit Its . ffer- sion ( rch, ( [4], T are B sor ide tor lar S ith he ar ) ) ) . . . rprisealn hmt e h eto ohcassof classes Contributions pap both Our this of in A. relationships best these the are all get formalize functions We to functions. submodular them stran some that enabling have and properties concavity, indicate and convexity to both directio to particular related seem is a to these maximization, restricted submodular All when of concave be context to the useful known upper become in has multi-line [16] modular which function, The recently superdifferentials submodular tight functions. a form of concave show, have extension like we modular to supergradients as over and shown and concave [18]–[22], been bounds and submodula has addition concavity, In submodular. function property be to to known returns are akin functions diminishing is a Furthermore which have [17]. continuous difficult functions where recent hopelessly submodular convexity be unlike some factor can is local This maximization and constant [16]. or of methods [13]–[15] NP greedy approximation number simple hueristics be on a to search based exist concavity. known algorithms to there approximation is akin However, maximization more hard. perhaps function but Submodular convexity, unlike are optimize. and extension, [12] Lov´asz evaluate the to as easy admit known is convex which extension, also to convex functions from natural analogous submodular a conditions addition, manner In a (KKT) programming. in Karush-Kuhn-Tucker conditi minimization, the optimality provide submodular to [11] for possible is modular characterizatio it sub-differential Moreover, tight a to admit have and similar bounds functions example, lower make submodular For however, formal. functions, submodular convex results, more convex to that recent much relationship (akin of fact this number time the A polynomial by minimization). is evident minimization is function relationship This ne oyerlbud,alotial nplnma time also polynomial and in outer obtainable tighter) all (successively bounds, of polyhedral series inner a Ho general. provide in we hard NP ch is that superdiffereitial t show this We of acterizing function. existence submodular the a of proving superdifferential thereby modula bounds, tight con- upper have functions (additive) to submodular connections that show aspect and We cavity. polyhedral maximization to function related submodular study theoretical systematic umdlrfntosas aesm rpris which properties, some have also functions Submodular h ancnrbtoso hswr si rvdn h first the providing in is work this of contributions main The nvriyo ahntn C Department ECE Washington, of University mi:[email protected] Email: 8 tvn a NE Way Stevens E 185 etl,W 98195 WA Seattle, efBilmes Jeff wever, [2]. n of s er. ons ar- ge he ar n. r r , , and also show that we can obtain some specific practically a permutation of V that assigns the elements in X to the useful supergradients in polynomial time. We then show how first |X| positions (i ≤ |X| if and only if σ(i) ∈ X) and σ σ we can define forms of optimality conditions for submodular S|X| = X. This chain defines an extreme point hX of ∂f (X) σ σ σ maximization through the submodular superdifferential. We with entries: hX (σ(i)) = f(Si ) − f(Si−1). Note that for also show how optimality conditions related to approximations every subgradient hX ∈ ∂f (X), we can define a modular to the superdifferential lead to a number of familiar approxi- lower bound mX (Y ) = f(X)+ hX (Y ) − hX (X), ∀Y ⊆ V , mation guarantees for these problems. which satisfies mX (Y ) ≤ f(Y ), ∀Y ⊆ V . Moreover, we have that mX (X) = f(X), and hence the subdifferential II. SUBMODULARITY AND CONVEXITY exactly correspond to the set of tight modular lower bounds Most of the results in this section are from [10], [12] and the of a submodular function, at a given set X. If we choose references contained therein, so for more details please refer hX to be an extreme subgradient, the modular lower bound to these texts. We use this section to review existing work becomes mX (Y )= hX (Y ), resulting in a normalized modular on the connections between submodularity and convexity, and function2. to help contrast these with the corresponding results on the The subdifferential is defined via an exponential number connections between submodularity and concavity. of inequalities. A key observation however is that many A. Submodular (Lower) Polyhedron of these inequalities are redundant. Define three polyhedra: 1 2 3 ∂f (X), ∂f (X) and ∂f (X) where the first polyhedra is defined For a submodular function f, the (lower) submodular poly- via inequalities of all subsets Y ⊆ X, the second via hedron and the base polytope of a submodular function [10] inequalities of all subsets Y ⊇ X and the third comprising are defined, respectively, as: Pf = {x : x(S) ≤ f(S), ∀S ⊆ of all other inequalities. We immediately have that ∂f (X)= V }, and Bf = Pf ∩{x : x(V ) = f(V )}. The submodular ∂1(X)∩∂2(X)∩∂3(X). However, when f is submodular, the polyhedron has a number of interesting properties. An impor- f f f inequalities in ∂3(X) are in fact redundant in characterizing tant property of the polyhedron is that the extreme points and f ∂ (X). facets can easily be characterized even though the polyhedron f itself is described by a exponential number of inequalities. In Theorem 2. [10] Given a submodular function f, ∂f (X)= 1 2 fact, surprisingly, every extreme point of the submodular poly- ∂f (X) ∩ ∂f (X). hedron is an extreme point of the base polytope. These extreme The subdifferential at the emptyset has a special relationship points admit an interesting characterization in that they can be # since ∂ (∅)= P . Similarly ∂ (V )= P # , where f (X)= computed via a simple — let be a permu- f f f f σ f(V ) − f(V \X) is the submodular dual of f. Furthermore, tation of . Each such permutation defines V = {1, 2, ··· ,n} since f # is a supermodular function, it holds that ∂ (V ) is a a chain with elements σ , σ f S0 = ∅ Si = {σ(1), σ(2),...,σ(i)} supermodular polyhedron (for a supermodular function g, the such that σ σ σ. This chain defines an extreme S0 ⊆ S1 ⊆···⊆ Sn supermodular polyhedron is defined as P = {x : x(X) ≥ point σ of with entries: σ σ σ . g h Pf h (σ(i)) = f(Si ) − f(Si−1) g(X), ∀X ⊆ V }). Each permutation of V characterizes an extreme point of P △(1,1) f Finally define: ∂ (X) = {x ∈ RV : ∀j ∈ and all possible extreme points of P can be characterized in f f X,f(j|X\j) ≤ x(j) and ∀j∈ / X,f(j|X) ≥ x(j)}. No- this manner [10]. Given a submodular function f such that tice that ∂△(1,1)(X) ⊇ ∂ (X) since we are reducing the f(∅)= 0, the condition that x ∈ P can be checked in f f f △(1,1) polynomial time for every x. constraints of the subdifferential. In particular ∂f (X) just considers n inequalities, by choosing the sets Y in the Proposition 1. [10] Given a submodular function f, checking definition of ∂f (X) such that |Y △ X| = 1 (i.e., Hamming if x ∈ Pf is equivalent to the condition minX⊆V [f(X) − distance one away from X). This polyhedron will be useful in x(X)] ≥ 0, which can be checked in poly-time. characterizing local minimizers of a submodular function (see B. The Submodular Subdifferential Section II-C) and motivating analogous constructs for local maxima (see, for example, Proposition 13). The subdifferential ∂f (X) of a submodular set function f :2V → R for a set Y ⊆ V is defined [2], [10] analogously to C. Subdifferentials and Optimality Conditions the subdifferential of a continuous convex function: ∂f (X)= Fujishige [11] provides some interesting characterizations Rn {x ∈ : f(Y ) − x(Y ) ≥ f(X) − x(X) for all Y ⊆ V }. to the optimality conditions for unconstrained submodular The polyhedra above can be defined for any (not necessarily minimization, which can be thought of as a discrete analog submodular) set function. When the function is submodular to the KKT conditions. The result shows that a set A ⊆ V is however, it can be characterized efficiently. Firstly, note that V a minimizer of f : 2 → R if and only if: 0 ∈ ∂f (A). This for normalized submodular functions, for any hX ∈ ∂f (X), immediately provides necessary and sufficient conditions for we have f(X) − hX (X) ≤ 0 which follows by the constraint optimality of f: A set A minimizes a submodular function at Y = ∅. The extreme points of the submodular subdiffer- f if and only if f(A) ≤ f(B) for all sets B such that ential admit interesting characterizations. We shall denote a B ⊆ A or A ⊆ B. Analogous characterizations have also been subgradient at X by hX ∈ ∂f (X). The extreme points of 2 ∂f (Y ) may be computed via a greedy algorithm: Let σ be A set function h is said to be normalized if h(∅) = 0). provided for constrained forms of submodular minimization, Proof. Observe that f(X) − x(X) ≤ i∈X [f(i) − x(i)]. and interested readers may look at [11]. Finally, we can Since the l.h.s. is greater than 0, it implies that P provide a simple characterization on the local minimizers of i∈X [f(i) − x(i)] > 0. Hence there should exist an a submodular function. A set A ⊆ V is a local minimizer3 of i ∈ X such that f(i) − x(i) > 0. a submodular function if and only if 0 ∈ ∂△(1,1)(A). As was P f An interesting corollary of the above, is that it is in fact shown in [18], a local minimizer of a submodular function, in easy to check if the maximizer of a submodular function the unconstrained setting, can be found efficiently in O(n2) is greater than equal to zero. Given a submodular function complexity. f, the problem is whether maxX⊆V f(X) ≥ 0. This can III. SUBMODULARITY AND CONCAVITY easily be checked without resorting to submodular function In this section, we investigate several polyhedral aspects maximization. of submodular functions relating them to concavity, thus Corollary 5. Given a submodular function f with f(∅)=0, complementing the results from Section II. This provides a maxX⊆V f(X) > 0 if and only if there exists an i ∈ V such complete picture on the relationship between submodularity, that f(i) > 0. convexity and concavity. This fact is true only for a submodular function. For general A. The submodular upper polyhedron set functions, even when f(∅)=0, it could potentially require A first step in characterizing the concave aspects of a an exponential search to determine if maxX⊆V f(X) > 0. submodular function is the submodular upper polyhedron. B. The Submodular Superdifferentials Intuitively this is the set of tight modular upper bounds of the function, and we define it as follows: Given a submodular function f we can characterize its superdifferential as follows. We denote for a set X, the Pf = {x ∈ Rn : x(S) ≥ f(S), ∀S ⊆ V } (1) superdifferential with respect to X as ∂f (X). The above polyhedron can in fact be defined for any set ∂f (X)= {x ∈ Rn : f(Y )−x(Y ) ≤ f(X)−x(X), ∀Y ⊆ V } function. In particular, when f is supermodular, we get what This characterization is analogous to the subdifferential of a is known as the supermodular polyhedron [10]. Presently, we submodular function (section II-B). This is also akin to the are interested in the case when f is submodular and hence we superdifferential of a continuous concave function. call this the submodular upper polyhedron. Interestingly this Each supergradient f , defines a modular up- has a very simple characterization. gX ∈ ∂ (X) per bound of a submodular function. In particular, define X X Theorem 3. Given a submodular function f, m (Y )= gX (Y )+ f(X) − gX (X). Then m (Y ) is a mod- X f n ular function which satisfies m (Y ) ≥ f(Y ), ∀Y ⊆ X and P = {x ∈ R : x(j) ≥ f(j)} (2) X m (X) = f(X). We note that (x(v1), x(v2),...,x(vn)) = f Proof. Given x ∈ Pf and a set S, we have x(S) = (f(v1),f(v2),...,f(vn)) ∈ ∂ (∅) which shows that at least ∂f (∅) exists. A bit further below (specifically Theorem 10) we i∈S x(i) ≥ i∈S f(i), since ∀i, x(i) ≥ f(i) by Eqn. (1). show that for any submodular function, ∂f (X) is non-empty Hence x(S) ≥ i∈S f(i) ≥ f(S). Thus, the irredundant Pinequalities areP the singletons. for all X ⊆ V . P Note that the superdifferential is defined by an exponential The submodular upper polyhedron has a particularly simple (i.e., 2|V |) number of inequalities. However owing to the characterization due to the submodularity of f. In other words, submodularity of f and akin to the subdifferential of f, we can this polyhedron is not polyhedrally tight in that many of the reduce the number of inequalities. Define three polyhedrons as: defining inequalities are redundunt. This polyhedron alone is f Rn ∂1 (X)= {x ∈ : f(Y )−x(Y ) ≤ f(X)−x(X), ∀Y ⊆ X}, not particularly interesting to define a concave extension. f Rn ∂2 (X)= {x ∈ : f(Y )−x(Y ) ≤ f(X)−x(X), ∀Y ⊇ X}, We end this subsection by investigating the submodular f Rn and ∂3 (X) = {x ∈ : f(Y ) − x(Y ) ≤ f(X) − upper polyhedron membership problem. Owing to its sim- x(X), ∀Y : Y 6⊆ X, Y 6⊇ X}. A trivial observation is that: plicity, this problem is particularly simple which might seem f f f f ∂ (X)= ∂1 (X) ∩ ∂2 (X) ∩ ∂3 (X). As we show below for a surprising at first glance since x ∈ Pf is equivalent to, f f submodular function f, ∂1 (X) and ∂2 (X) are actually very Eqn. (1), checking if maxX⊆V [f(X) − x(X)] ≤ 0. This simple polyhedra. involves maximization of a submodular function which is NP hard. However, surprisingly this particular instance is easy! Theorem 6. For a submodular function f, f Rn Lemma 4. Given a submodular function f and vector x, let ∂1 (X)= {x ∈ : f(j|X\j) ≥ x(j), ∀j ∈ X} (3) X be a set such that f(X) − x(X) > 0. Then there exists an ∂f (X)= {x ∈ Rn : f(j|X) ≤ x(j), ∀j∈ / X}. (4) i ∈ X : f(i) − x(i) > 0. 2 f 3 Proof. Consider ∂1 (X). Notice that the inequalities defining A set A is a local minimizer of a submodular function if f(X) ≥ f Rn f(A), ∀X : |X\A| ≤ 1, and |A\X| = 1, that is all sets X no more the polyhedron can be rewritten as ∂1 (X) = {x ∈ : than hamming-distance one away from A. x(X\Y ) ≤ f(X) − f(Y ), ∀Y ⊆ X}. We then have that x(X\Y ) = j∈X\Y x(j) ≤ j∈X\Y f(j|X\j), since ∀j ∈ While it is difficult to characterize the superdifferentials for X, x(j) ≤ f(j|X\j) (this follows by considering only the sets ∅ ⊂ X ⊂ V , we can provide inner and outer bounds of P f P subset of inequalities of ∂1 (X) with sets Y such that |X\Y | = the superdifferentials. 1). Hence x(X\Y ) ≤ j∈X\Y f(j|X\j) ≤ f(X) − f(Y ). 1) Outer Bounds on the Superdifferential: It is possible to Hence an irredundant set of inequalities include those defined provide a number of useful and practical outer bounds on the P f f only through the singletons. superdifferential. Recall that ∂1 (Y ) and ∂2 (Y ) are already f f In order to show the characterization for ∂2 (X), we have simple polyhedra. We can then provide outer bounds on ∂3 (Y ) f Rn f f that ∂2 (X) = {x ∈ : x(Y \X) ≥ f(Y ) − f(X), ∀Y ⊇ that, together with ∂1 (Y ) and ∂2 (Y ), provide simple bounds . It then follows that, f f Rn X} x(Y \X) = j∈Y \X x(j) ≥ on ∂ (Y ). Define for 1 ≤ k,l ≤ n: ∂3,△(k,l)(X)= {x ∈ : f(j|X), since ∀j∈ / X, x(j) ≥ f(j|X). Hence f(Y )−x(Y ) ≤ f(X)−x(X), ∀Y : Y 6⊆ X, Y 6⊇ X, |Y \X|≤ j∈Y \X P x(X\Y ) ≥ f(j|X) ≥ f(Y ) − f(X), and again, k − 1, |X\Y |≤ l − 1}. P j∈Y \X an irredundant set of inequalities include those defined only We can then define the outer bound: through the singletons.P f f f f ∂△(k,l)(X)= ∂1 (X) ∩ ∂2 (X) ∩ ∂3,△(k,l)(X). (5) The above result significantly reduces the inequalities gov- erning ∂f (X), and in fact, the polytopes ∂f (X) and ∂f (X) f k+l 1 2 Observe that ∂△(k,l)(X) is expressed in terms of O(n ) are very simple polyhedra. Recall that this is analogous inequalities, and hence for a given k,l we can obtain the to the submodular subdifferential (Theorem 2), where again f representation of ∂△(k,l)(X) in polynomial time. We will see owing to submodularity the number of inequalities are reduced that this provides us with a heirarchy of outer bounds on the significantly. In that case, we just need to consider the sets superdifferential: Y which are subsets and supersets of X. It is interesting to note the contrast between the redunduncy of inequalities in the Theorem 9. For a submodular function f the following hold: a) f f f , b) ′ ′ subdifferentials and the superdifferentials. In particular, here, ∂△(1,1)(X) = ∂1 (X) ∩ ∂2 (X) ∀1 ≤ k ≤ k, 1 ≤ l ≤ f f f f the inequalities corresponding to sets Y being the subsets and l, ∂ (X) ⊆ ∂△(k,l)(X) ⊆ ∂△(k′ ,l′)(X) ⊆ ∂△(1,1)(X) and c) supersets of X are mostly redundunt, while the non-redundunt f f ∂△(n,n)(X)= ∂ (X). ones are the rest of the inequalities. In other words, in the case of the subdifferential, ∂1(X) and ∂2(X) were non-redundunt, Proof. The proofs of items 1 and 3 follow directly from f f definitions. To see item 2, notice that the polyhedra ∂f while ∂3(X) was entirely redundunt given the first two. In the △(k,l) f become tighter as k and l increase finally approaching the case of the superdifferentials, ∂f (X) and ∂f (X) are mostly 1 2 superdifferential. internally redundant (they can be represented using only by n f inequalities), while ∂3 (X) has no redundancy in general. 2) Inner Bounds on the Superdifferential: While it is hard Unlike the subdifferentials, we cannot expect a closed form to characterize the extreme points of the superdifferential, f expression for the extreme points of ∂ (Y ) (we provide sev- we can provide some specific supergradients. Define three eral examples in the extended version of this paper). Moreover, vectors as follows: they also seem to be hard to characterize algorithmically. For example, the superdifferential membership problem is NP f(j|X − j) if j ∈ X gˆX (j)= (6) hard. (f(j) if j∈ / X Lemma 7. Given a submodular function f and a set Y : ∅⊂ f(j|V − j) if j ∈ X Y ⊂ V , the membership problem y ∈ ∂f (Y ) is NP hard. gˇX (j)= (7) (f(j|X) if j∈ / X f Proof. Notice that the membership problem y ∈ ∂ (Y ) is f(j|V − j) if j ∈ X equivalent to asking max f(X) − y(X) ≤ f(Y ) − y(Y ). g¯X (j)= (8) X⊆V f(j) if j∈ / X In other words, this is equivalent to asking if Y is a maximizer ( of f(X) − y(X) for a given vector y. This is the decision Then we have the following theorem: version of the submodular maximization problem and corre- spondingly is NP hard when ∅⊂ Y ⊂ V . Theorem 10. For a submodular function f, gˆX , gˇX , g¯X ∈ ∂f (X). Hence for every submodular function f and set X, Given that the membership problem is NP hard, it is also NP ∂f (X) is non-empty. hard to solve a linear program over this polyhedron [23], [24]. The superdifferential of the empty and ground set, however, Proof. For submodular f, the following bounds are known to can be characterized easily: hold [15]: Lemma 8. For any submodular function f such that f(∅)= f(Y ) ≤ f(X) − f(j|X\j)+ f(j|X ∩ Y ), f Rn 0, ∂ (∅) = {x ∈ : f(j) ≤ x(j), ∀j ∈ V }. Similarly j∈XX\Y j∈XY \X f Rn ∂ (V )= {x ∈ : f(j|V \j) ≥ x(j), ∀j ∈ V }. f(Y ) ≤ f(X) − f(j|X ∪ Y \j)+ f(j|X) This lemma is a direct consequence of Theorem 6. j∈XX\Y j∈XY \X Using submodularity, we can loosen these bounds further to characterizations. An important such subclass if the class of provide tight modular: M ♮-concave functions [25]. These include a number of special cases like rank functions, concave over cardinality f(Y ) ≤ f(X) − f(j|X −{j})+ f(j|∅) functions etc. In some sense, these functions very closely j∈XX\Y j∈XY \X resemble concave functions. The following result provides f(Y ) ≤ f(X) − f(j|V −{j})+ f(j|X) a compact representation of the superdifferential of these j∈X\Y j∈Y \X functions. X X f(Y ) ≤ f(X) − f(j|V −{j})+ f(j|∅). Lemma 12. Given a submodular function f which is M ♮- V j∈XX\Y j∈XY \X concave on {0, 1} , its superdifferential satisfies, From the three bounds above, and substituting the expressions f ∂f (X)= ∂ (X) (14) of the supergradients, we may immediately verify that these △(2,2) f are supergradients, namely that gˆX , gˇX , g¯X ∈ ∂ (X). In particular, it can be characterized via O(n2) inequalities. Next, we provide inner bounds using the super-gradients C. Optimality Conditions for submodular maximization defined above. Define two polyhedra: Just as the subdifferential of a submodular function pro- f Rn ∂∅ (X)= {x ∈ : f(j) ≤ x(j), ∀j∈ / X}, (9) vides optimality conditions for submodular miinimization, f Rn ∂V (X)= {x ∈ : f(j|V \j) ≥ x(j), ∀j ∈ X}. (10) the superdifferential provides the optimality conditions for f f f f f submodular maximization. Then define: ∂i,1(X)= ∂1 (X) ∩ ∂V (X), ∂i,2(Y )= ∂2 (Y ) ∩ We start with unconstrained submodular maximization: f f f f f ∂∅ (Y ) and ∂i,3(Y )= ∂V (Y )∩∂∅ (Y ). Then note that ∂i,1(Y ) f max f(X) (15) is a polyhedron with gˆY as an extreme point. Similarly ∂i,2(Y ) X⊆V has gˇ , while ∂f (Y ) has g¯ as its extreme points. All Y i,3 Y Given a submodular function, we can give the KKT like these are simple polyhedra, with a single extreme point. Also conditions for submodular maximization: For a submodular define: ∂f (Y )= conv(∂f (Y ), ∂f (Y )), where conv(., .) i,(1,2) i,1 i,2 function f, a set A is a maximizer of f, if 0 ∈ ∂f (A). represents the convex combination of two polyhedra4. Then f However as expected, finding the set A, with the property is a polyhedron which has and as its extreme f ∂i,(1,2)(Y ) gˆY gˇY above, or even verifying if for a given set A, 0 ∈ ∂ (A) points. The following lemma characterizes the inner bounds of are both NP hard problems (from Lemma 7). However thanks the superdifferential: to the submodularity, we show that the outer bounds on the Lemma 11. Given a submodular function f, super-differential provide approximate optimality conditions for submodular maximization. Moreover, these bounds are f f f f ∂i,3(Y ) ⊆ ∂i,2(Y ) ⊆ ∂i,(1,2)(Y ) ⊆ ∂ (Y ), (11) easy to obtain. f f f f f ∂i,3(Y ) ⊆ ∂i,1(Y ) ⊆ ∂i,(1,2)(Y ) ⊆ ∂ (Y ) (12) Proposition 13. For a submodular function f, if 0 ∈ ∂(1,1)(A) Proof. The proof of this lemma follows directly from the then A is a local maxima of f (that is, ∀B ⊇ A, f(A) ≥ definitions of the supergradients, corresponding polyhedra and f(B), &∀C ⊆ A, f(A) ≥ f(C)). Furthermore, if we define 1 submodularity. S = argmaxX∈{A,V \A}f(A), then f(S) ≥ 3 OPT where OPT is the optimal value. Finally we point out interesting connections between ∂f (X) and ∂f (X). Firstly, it is clear from the definitions that Proof. The local optimality condition follows directly from f △(1,1) f f the definition of ∂ (A) and the approximation guarantee ∂f (X) ⊆ ∂f (X) and ∂ (X) ⊆ ∂△(1,1)(X). Notice also (1,1) △(1,1) f follows from Theorem 3.4 in [13]. that both ∂f (X) and ∂△(1,1)(X) are simple polyhedra with a single extreme point, The above result is interesting observation, since a very simple outer bound on the superdifferential, leads us to an f(j|X − j) if j ∈ X g˜X (j)= (13) approximate optimality condition for submodular maximiza- (f(j|X) if j∈ / X tion. We can also provide an interesting sufficient condition

The point g˜X , is in general, neither a subgradient nor a for the maximizers of a submodular function. supergradient at X. However both the semidifferentials are 0 f Lemma 14. If for any set A, ∈ ∂i,(1,2)(A), then A is the contained within (different) polyhedra defined via g˜X . Please global maxima of the submoduar function. In particular, if refer to the extended version for more details and figures a local maxima A is found (which is typically easy to do), demonstrating this. it is guaranteed to be a global maxima, if it happens that While it is hard to characterize superdifferentials of gen- 0 ∈ ∂f (A). eral submodular functions, certain subclasses have some nice i,(1,2) Proof. This proof follows from the fact that ∂f (A) ⊆ 4Given two polyhedra P1, P2, P = conv(P1, P2) = {λx1 + (1 − i,(1,2) f f f 2 1 1 2 2 λ)x , λ ∈ [0, 1],x ∈ P ,x ∈ P } ∂ (A) ⊆ ∂(1,1)(A). Thus, if 0 ∈ ∂i,(1,2)(A), it must also belong to ∂f (A), which means A is the global optimizer of [23] M. Grotschel, L. Lov´asz, and A. Schrijver, “Geometric methods in f. combinatorial optimization,” in Silver Jubilee Conf. on Combinatorics, 1984, pp. 167–183. [24] A. Schrijver, Combinatorial optimization: polyhedra and efficiency. We can also provide similar results for constrained submod- Springer Verlag, 2003, vol. 24. ular maximization, which we omit in the interest of space. For [25] K. Murota, Discrete Convex Analysis. Mathematical Programming, more details see the longer version of this paper [3]. 2003.

REFERENCES

[1] J. Bilmes and A. Karbasi, “Submodularity in information and data science,” in International Symposium on Information Theory Tutorial, 2018. [2] S. Fujishige, “On the subdifferential of a submodular function,” Math- ematical programming, vol. 29, no. 3, pp. 348–360, 1984. [3] R. Iyer and J. Bilmes, “Polyhedral aspects of submodularity, convexity and concavity,” arXiv preprint arXiv:1506.07329, 2015. [4] K. Wei, R. Iyer, and J. Bilmes, “Submodularity in data subset selection and active learning,” in International Conference on , 2015, pp. 1954–1963. [5] Y. Liu, R. Iyer, K. Kirchhoff, and J. Bilmes, “Svitchboard ii and fisver i: High-quality limited-complexity corpora of conversational english speech,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015. [6] M. Gygli, H. Grabner, and L. Van Gool, “Video summarization by learning submodular mixtures of objectives,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3090– 3098. [7] S. Tschiatschek, R. Iyer, H. Wei, and J. Bilmes, “Learning mixtures of submodular functions for image collection summarization,” in Neural Information Processing Society (NIPS), Montreal, CA, December 2014. [8] R. Bairi, R. Iyer, G. Ramakrishnan, and J. Bilmes, “Summarization of multi-document topic hierarchies using submodular mixtures,” in Proceedings of the 53rd Annual Meeting of the Association for Computa- tional Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 553–563. [9] A. Krause, J. Leskovec, C. Guestrin, J. VanBriesen, and C. Faloutsos, “Efficient sensor placement optimization for securing large water distri- bution networks,” Journal of Water Resources Planning and Manage- ment, vol. 134, no. 6, pp. 516–526, 2008. [10] S. Fujishige, Submodular functions and optimization. Elsevier Science, 2005, vol. 58. [11] ——, “Theory of submodular programs: A fenchel-type min-max theo- rem and subgradients of submodular functions,” Mathematical program- ming, vol. 29, no. 2, pp. 142–155, 1984. [12] L. Lov´asz, “Submodular functions and convexity,” Mathematical Pro- gramming, 1983. [13] U. Feige, V. Mirrokni, and J. Vondr´ak, “Maximizing non-monotone submodular functions,” SIAM J. COMPUT., vol. 40, no. 4, pp. 1133– 1155, 2007. [14] J. Lee, V. Mirrokni, V. Nagarajan, and M. Sviridenko, “Non-monotone submodular maximization under matroid and knapsack constraints,” in STOC. ACM, 2009, pp. 323–332. [15] G. Nemhauser, L. Wolsey, and M. Fisher, “An analysis of approximations for maximizing submodular set functions—i,” Mathematical Program- ming, vol. 14, no. 1, pp. 265–294, 1978. [16] C. Chekuri, J. Vondr´ak, and R. Zenklusen, “Submodular function maximization via the multilinear relaxation and contention resolution schemes,” STOC, 2011. [17] S. Sahni, “Computationally related problems,” SIAM Journal on Com- puting, vol. 3, no. 4, pp. 262–279, 1974. [18] R. Iyer, S. Jegelka, and J. Bilmes, “Fast Semidifferential based Submod- ular function optimization,” in ICML, 2013. [19] S. Jegelka and J. Bilmes, “Submodularity beyond submodular energies: coupling edges in graph cuts,” in Computer Vision and Pattern Recog- nition (CVPR), 2011. [20] R. Iyer and J. Bilmes, “The submodular Bregman and Lov´asz-Bregman divergences with applications,” in NIPS, 2012. [21] ——, “Algorithms for approximate minimization of the difference between submodular functions, with applications,” In UAI, 2012. [22] ——, “Submodular Optimization with Submodular Cover and Submod- ular Knapsack Constraints,” in NIPS, 2013.