On Bisubmodular Maximization

Ajit P. Singh Andrew Guillory Jeff Bilmes University of Washington, Seattle University of Washington, Seattle University of Washington, Seattle [email protected] [email protected] [email protected]

Abstract polynomial-time approximation algorithms exist when the constraints consist only of a cardinality con- Bisubmodularity extends the concept of sub- straint [22], or under multiple matroid constraints [6]. modularity to set functions with two argu- Our interest lies in enriching the scope of selection ments. We show how bisubmodular maxi- problems which can be efficiently solved. We discuss mization leads to richer value-of-information a richer class of properties on the objective, known as problems, using examples in sensor placement bisubmodularity [24, 1] to describe biset optimizations: and . We present the first constant-factor for max f(A, B) (2) A,B V a wide class of bisubmodular maximizations. ⊆ subject to (A, B) , i 1 . . . r . ∈ Ii ∀ ∈ { } 1 Introduction Bisubmodular function maximization allows for richer problem structures than submodular max: i.e., simul- Central to decision making is the problem of selecting taneously selecting and partitioning elements into two informative observations, subject to constraints on the groups, where the value of adding an element to a set cost of data acquisition. For example: can depend on interactions between the two sets. Sensor Placement: One is given a set of n locations, To illustrate the potential of bisubmodular maximiza- V , and a set of k n sensors, each of which can cover tion, we consider two distinct problems: a fixed area around it. Which subset X V should be ⊂ Coupled Sensor Placement (Section5): One is instrumented, to maximize the total area covered? given a set of n locations, V . There are two different Feature Selection: Given a regression model on n kinds of sensors, which differ in cost and area covered. features, V , and an objective which measures feature Given a total sensor budget, k, select sites to instru- quality, f : 2V R, select the best k features. ment with sensors of each type (A, B), to maximize → the total area covered. Both examples are selection problems, for which a near-optimal approximation can be found efficiently Coupled Feature Selection (Section6): Given a when the objective is submodular (see Definition1, regression model on n features, V , select a set of k below). Many selection problems involve a submodular features and partition them into disjoint sets A and objective: e.g., sensor placement [16, 15], document B, such that a joint measure of feature quality f : A B R is maximized. summarization [20], influence maximization in social × → networks [12], and feature selection [13]. All can be This paper is the first to (i) propose bisubmodular framed as submodular function maximization: maximization to describe a richer class of value-of- max f(S) (1) information problems; (ii) define sufficient conditions S V ⊆ under which bisubmodular max is efficiently solvable; subject to S i, i 1 . . . r . (iii) provide an algorithm for such instances (algorithms ∈ I ∀ ∈ { } for bisubmodular min exist [7]); (iv) prove, by con- The constraint sets are independent sets of , Ii struction, the existence of an embedding of a directed which include knapsack or cardinality constraints. Fully bisubmodular function into a submodular one. Ancil-

th lary to our study of bisubmodular maximization is a Appearing in Proceedings of the 15 International Con- new result on the extension of submodular functions ference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands. Volume 22 of JMLR: defined over matroids (Theorem2). W&CP 22. Copyright 2012 by the authors. Our key observation is that many bisubmodular 1055 On Bisubmodular Maximization maximization problems can be reduced to matroid- 3 Simple Bisubmodularity constrained submodular maximization, for which ef- ficient, constant-factor approximation algorithms al- A natural analogue to submodularity for biset functions ready exist. We first introduce the concept of simple is simple bisubmodularity: bisubmodularity (Section3), which describes the biset Definition 4 (Simple Bisubmodularity). f : 22V analogue of submodularity. Directed bisubmodular- 2→V R is simple bisubmodular iff for each (A, B) 2 , ity [23] allows for more complex interaction between 2V ∈ (A0,B0) 2 with A A0, B B0 we have for the set arguments. The question of whether or not ∈ ⊆ ⊆ s / A0 and s / B0: directed bisubmodular functions can be efficiently max- ∈ ∈ imized was has been an open problem for over twenty f(A + s, B) f(A, B) f(A0 + s, B0) f(A0,B0), years. Our reduction approach proves that a wide range − ≥ − f(A, B + s) f(A, B) f(A0,B0 + s) f(A0,B0). of directed bisubmodular objectives can be efficiently − ≥ − maximized (Section4). 2V Equivalently, (A, B), (A0,B0) 2 , ∀ ∈ 2 Preliminaries f(A, B)+f(A0,B0) f(A A0,B B0)+f(A A0,B B0) ≥ ∪ ∪ ∩ ∩ We first introduce basic concepts and notation. Definition 1 (Submodularity). A set-valued function Fixing one of the coordinates of f(A, B) yields a sub- over ground set V , f : 2V R is submodular if for modular function in the free coordinate. → any A A0 V v, ⊆ ⊆ \ 3.1 Reduction to Submodular Maximization f(A + v) f(A) f(A0 + v) f(A0). (3) − ≥ − The reduction uses a duplication of the ground set Equivalently, A, A0 V , ¯ ∀ ⊆ (c.f., [10, 2]). Define V to be an extended ground set formed by taking the disjoint union of 2 copies of the f(A) + f(A0) f(A A0) + f(A A0). (4) ≥ ∪ ∩ original ground set V . For an element s V , we use ∈ ¯ + is used to denote adding an element to a set. si where i 1, 2 to refer to the ith copy in V . Then ¯ ∈ { } ¯ V = V1 V2 where Vi , i si. For an element s V A set function f(A, B) with two arguments A V and we use abs∪ (s) to refer to the corresponding original∈ ⊆ B V is a biset function. In some cases we assume element in V (i.e. abs(s S) s). For a set S V¯ we ⊆ i , the domain of f to be all ordered pairs of subsets of V : similarly use abs(S) abs(s): s S . ⊆ , { ∈ } 22V (A, B): A V,B V . , { ⊆ ⊆ } Given any simple bisubmodular function f, define a single-argument set function g for S V¯ : In other cases we assume the domain of f to be ordered ⊆ pairs of disjoint subsets of V : g(S) , f(abs(S V1), abs(S V2)). 3V (A, B): A V,B V,A B = . ∩ ∩ , { ⊆ ⊆ ∩ ∅} Note that there is a one-to-one mapping between A biset function is normalized if f( , ) = 0. In some f(A, B) for (A, B) 22V and g(S) for S V¯ and ∅ ∅ cases we will also assume f(A, B) is monotone. therefore maximizing∈g(S) is equivalent to maximizing⊆ Definition 2 (Monotonicity). A biset function f over f(A, B). Furthermore, 2V 2 is monotone (monotone non-decreasing) if for any Lemma 1. g(S) is submodular if f(A, B) is a simple s V , (A, B) 22V : ∈ ∈ bisubmodular function. f(A + s, B) f(A, B) and f(A, B + s) f(A, B). ≥ ≥ Note also that g is monotone when f is monotone. V A biset function f over 3 is monotone (monotone non- Based on this connection, maximizing a simple bisub- decreasing) if for any s V (A B), (A, B) 3V : ∈ \ ∪ ∈ modular function f reduces to maximizing a normal f(A + s, B) f(A, B) and f(A, B + s) f(A, B). submodular function g. We can then exploit constant- ≥ ≥ factor approximation algorithms for submodular func- Monotone submodular maximization is nontrivial only tion maximization. when constrained. We focus on matroid constraints: Corollary 1. If f(A, B) is non-negative and simple Definition 3 (Matroid). A matroid is a set of sets bisubmodular, then there exists a constant-factor ap- I with three properties: (i) , (ii) if A and proximation algorithm for solving Equation2 when the ∅ ∈ I ∈ I B A then B (i.e., is an independence system), constraints are either: (i) non-existent; (ii) A k , ⊆ ∈ I I 1 and (iii) if A and B and B > A then there B k ; (iii) A + B k; (iv) A B = | (v)| ≤ any ∈ I ∈ I | | | | 2 is an s B such that s / A and (A + s) . combination| | ≤ of| the| above.| | ≤ ∩ ∅ ∈ ∈ ∈ I 1056 Singh, Guillory, Bilmes

Proof. With no constraints, maximizing f(A, B) corre- The previous section answers this question for simple sponds to maximizing non-negative, submodular g(S), bisubmodular functions. solvable using Feige et al. [5]. The only question is In many situations, including those described in Sec- how to represent the other constraints on f as ma- tions5 and6, simple bisubmodularity is sufficient. The troid constraints on g. Cases (ii) and (iii) reduce contributions of this section are primarily theoretical: to uniform matroids. In case (ii), S V k , 1 1 we (i) connect simple to directed bisubmodularity, (ii) S V k ; in case (iii) S V +| S∩ V| ≤ k. 2 2 1 2 give a method for embedding a directed bisubmodular For| ∩ case| (iv) ≤ the constraint is| a∩ partition| | matroid∩ | ≤ con- function into a submodular one; (iii) provide sufficient straint v V, S v , v 1. For any intersection 1 2 conditions under which directed bisubmodular maxi- of a constant∀ ∈ number| ∩ { of matroid}| ≤ constraints we can use mization can be reduced to submodular maximization. the algorithms of Lee et al. [18]. For the special case of monotone f (and therefore g) we can use the simple Definition 5 (Directed Bisubmodularity [24]). Biset of Fisher et al. [6] for a constant-factor function f : 3V R is directed bisubmodular iff → approximation. For the special case of a single matroid constraint and monotone f and g, the algorithm of f(A, B) + f(A0,B0) f(A A0,B B0)+ ≥ ∩ ∩ Calinescu et al. [2] gives an optimal approximation f((A A0) (B B0), (B B0) (A A0)). ratio of 1 1/e. ∪ \ ∪ ∪ \ ∪ − The set subtraction can be seen as enforcing the con- 3.2 Coordinate-wise Maximization straint that the two arguments of f remain disjoint.

Simple bisubmodular functions can also be maximized To connect simple to directed bisubmodularity, we use using a coordinate-wise procedure. Consider a characterization of such functions by Ando et al. [1]. Definition 6. f : 3V is said to be submodular max f(A, B) in every orthant if for→ R every partition of V given by A,B A, B with A B = V and A B = the function subject to (A, B) 22V , A k , B k . ∪ ∩ ∅ ∈ | | ≤ 1 | | ≤ 2 fˆ(S) , f(A S, B S) If f is simple then it suffices to solve the following pair ∩ ∩ of submodular optimizations: is a submodular function. Theorem 1 (Ando Conditions [1]). A function f is A∗ = argmax f(A, ), A V : A k ∅ directed bisubmodular iff ⊆ | |≤ 1 B∗ = argmax f(A∗,B), 1. f is submodular in every orthant. B V : B k V ⊆ | |≤ 2 2. For any (A, B) 3 , s V (A B) we have f(A + s, B) + f(∈A, B + s)∈ 2\f(A,∪ B). which due to Corollary1, corresponds to the local ≥ greedy algorithm, which is approximately optimal [6]. The second condition is satisfied if f is monotone. It Budget constraints of the form A + B k may is not hard to show the following. | | | | ≤ be handled by converting the constraint into A Proposition 1. If f is simple bisubmodular then f | | ≤ V k1, B k2, and then taking the best solution across restricted to 3 is submodular in every orthant. | | ≤ all possible (k1, k2) division of the budget k1 + k2 = k. One possible approach to this would require O(k) Therefore we have this relationship between simple and submodular optimizations. Budget constraints of the directed bisubmodular functions. form c A + c B k with integer costs c , c are 1| | 2| | ≤ 1 2 Corollary 2. If f is simple bisubmodular and mono- handled in a similar fashion. The search-over-budget- tone then f restricted to 3V is directed bisubmodular. divisions approach is still approximately optimal, since the domain under one of the budget divisions contains 4.1 Embedding into a Submodular Function the optimal solution. However, searching over budget divisions requires O(k2) runs of the greedy algorithm. At first glance, the techniques of Section 3.1 might ap- pear to yield an approximation algorithm for directed 4 Directed Bisubmodularity bisubmodular maximization, but unfortunately they do not: f is only defined over 3V while Section 3.1 Simple bisubmodular functions are related to a different made use of the fact that simple bisubmodular func- class of functions called directed bisubmodular func- tions are defined over the larger set 22V . We prove, tions. Qi [23] posed the question of whether directed by construction, that any bisubmodular function can bisubmodular functions can be efficiently maximized. be embedded into a submodular one. However, for 1057 On Bisubmodular Maximization directed bisubmodular functions the resulting submod- partial function g which is submodular over the inde- ular function is not guaranteed to be monotone and pendent sets of a matroid I . Is it possible to define ∈ I non-negative, which precludes direct use of a submod- an extension of g, g0: a submodular function defined V¯ ular maximization oracle for directed bisubmodular on 2 , where I , g0(I) = g(I)? maximization. ∀ ∈ I A recent result by Seshadri and Vondr´ak [25, Theo- As before, define V¯ to be an extended ground set formed rem 1.7] shows that, for functions g defined on arbitrary by taking the disjoint union of 2 copies of the original 2V¯ , an extension is not guaranteed to exist. But, S ⊂ ground set V . Let g : S V¯ R, where every submodular partial function that comes up in ⊆ → this paper is one defined over a matroid; not an arbi- g(S) f(abs(S V ), abs(S V )). , ∩ 1 ∩ 2 trary set system. We refine the non-extendability result of Seshadri and Vondr´ak [25], by proving that a partial Since f is directed bisubmodular, it is only defined over function over a matroid can always be extended: disjoint pairs of subsets, and therefore g is not defined for each S V¯ . For example, if both s1 V and Theorem 2 (Submodular Extension). For any func- s V for some⊆ s V , then abs(S V ) and abs∈ (S V ) tion g(S) which is submodular over independence sys- 2 ∈ ∈ ∩ 1 ∩ 2 are not disjoint. When f is directed bisubmodular, tem , there exists an extension g0(S) which is submod- I there is no guarantee of a one-to-one mapping between ular, and has g0(S) = g(S) for S . ∈ I the domains of f and g. While Theorem2 is presented in the context of bisub- While g is not submodular over set system 2V¯ , it is V¯ modularity, the result is of independent interest, espe- submodular over the values in 2 where it is defined, cially in the context of testing whether a function is which are the independent sets of a matroid. submodular. Definition 7. g(S) is submodular over independence However, the extension g0 is not guaranteed to be non- system if for every A A0 (A0 s) we have I ⊆ ⊂ ∪ ∈ I negative and monotone. While a directed bisubmodular

g(A s) g(A) g(A0 s) g(A0). function can always be reduced a submodular one, ∪ − ≥ ∪ − efficient greedy algorithms cannot be directly used to Equivalently, g is submodular over if for any A, A0 maximize the resulting submodular function. I with (A A0) we have ∪ ∈ I Lemma 2. There exists g(S) that is non-negative and submodular over a matroid for which no extension g0(S) g(A) + g(A0) g(A A0) + g(A A0). ≥ ∪ ∩ is non-negative and submodular. A directed bisubmodular f is defined over 3V , so the Lemma 3. There exists g(S) that is non-negative, values where g is defined form a partition matroid. A monotone, and submodular over a matroid for which directed bisubmodular function is submodular in each no extension g0(S) is non-negative, monotone, and orthant, so g is submodular over its partition matroid: submodular. Corollary 3. If f(A, B) is directed bisubmodular then Even in light of Lemmas2 and3, we provide suffi- g(S) f(abs(S V ), abs(S V )) is both defined and , 1 2 cient conditions under which directed bisubmodular submodular over∩ the partition∩ matroid maximization can be efficiently solved.

= S : s V s1, s2 S 1 . Corollary 4. Let f(A, B) be a monotone, non- I { ∀ ∈ |{ } ∩ | ≤ } negative, directed bisubmodular function. If there ex- 2V Seemingly, maximizing f(A, B) is equivalent to maxi- ists an extension f 0(A, B) on 2 which is monotone, mizing g over a partition matroid, which ensures that non-negative, and simple bisubmodular with f 0(A, B) = the solution found is in 3V . Fisher et al. [6] provides f(A, B) for (A, B) 3V , then there is a constant-factor ∈ an algorithm for maximizing g subject to a matroid approximation algorithm for maximizing f. constraint, which requires only evaluating g for values in the matroid. The subtle flaw in this argument is that the proof of the approximation guarantee in Fisher et al. Proof. Maximizing f(A, B) reduces to maximizing g(S) [6] involves evaluating g outside of the constraints. over a partition matroid. g(S) is monotone, non- negative, and submodular over the partition matroid, 4.1.1 Extending Submodular Functions but undefined elsewhere. The existence of f 0 implies there exists a non-negative, monotone, and submodular V¯ A function g defined on a subset of 2 is known extension of g(S), g0(S), which is defined for all S V . S ⊂ ⊆ as a partial function. The embedding of a non-negative The existence of g0 suffices to ensure Fisher et al. [6] directed bisubmodular function leads to a non-negative yield a near-optimal solution. 1058 Singh, Guillory, Bilmes

Note that while f 0 must exist for the corollary to hold, goal is to cover a floor plan using sensors with a fixed we do not need to be able to construct or evaluate f 0. sensing, or coverage, radius. Figure1 depicts two of the layouts tested. The potential sensing locations V Testing whether g(S) has a monotone, non-negative consist of points on an 50 by 50 grid of a floor plan. extension is a problem. However, it is an open question as to whether the resulting (ex- Selected locations can be instrumented with sensors ponentially large) linear program is one which can be of type A or type B. Sensors cover a circular area, efficiently solved. but walls in the environment block coverage. Type Theorem 3. Given a matroid and a submodular A sensors have a sensing radius of rA units; type B I function g defined I , testing whether g has an sensors have a radius of rB units. The costs to deploy ∀ ∈ I a sensor of each type are cA, cB Z+. Each location extension g0 which is submodular, monotone, and non- ∈ negative is a linear programming feasibility problem. can be instrumented with at most one sensor. The goal is to maximize the overall coverage f(A, B) given a budget k Z+. In our experiments, rA = 20 and Proof. The proof is constructive. For g0(S) to be sub- ∈ modular it is both necessary and sufficient to have for rB = 10, with deployment costs cA = 3, cB = 1. In every S, i (V (S + j)) terms of cost-per-area covered, large sensors are better, ∈ \ but less flexible—i.e., using larger sensors to cover g0(S + i) g0(S) g0(S + i + j) g0(S + j) narrow hallways would be wasteful. − ≥ − This is an alternate definition of submodularity [22]. The objective f(A, B), the surface area covered, is 2V Then a non-negative, submodular extension exists iff monotone and simple bisubmodular over 2 . Since the following linear feasibility constraints are satisfied each location can hold at most one sensor, f is restricted to 3V , and is therefore also directed bisubmodular. The (gS = g(S), gS0 = g0(S)): optimization is g0 = g S , S S ∀ ∈ I g0 0 S/ , max f(A, B) S ≥ ∀ ∈ I A,B gS0 +i gS0 gS0 +i+j gS0 +j S V, i (V (S + j)). subject to (A, B) 3V , c A + c B k. − ≥ − ∀ ⊂ ∈ \ ∈ A| | B| | ≤ If monotonicity is also required this can be encoded as We compare three approaches: (i) only use small sen- additional constraints g0 g0 A B. A ≤ B ∀ ⊆ sors, B = ; (ii) only use large sensors, A = ; (iii) allow a mix∅ of both types of sensors. Approaches∅ (i) 4.2 Coordinate-wise Maximization and (ii) are instances of submodular maximization un- If f is directed bisubmodular and we wish to maximize der a cardinality constraint; approach (iii) is solved under the cardinality constraint A k , B k for using the reduction technique (Section 3.1). | | ≤ 1 | | ≤ 2 k1, k2 Z+, then an alternate approach, which makes The reduction yields a submodular optimization subject ∈ no use of the theoretical results is Section 4.1, is to to a matroid and a knapsack constraint, which could be solve the following set of submodular optimizations: solved using Chekuri et al. [4]. However, this method is more complicated than the standard greedy algorithm. A = argmax f(A, ), ∗ Instead, we convert the knapsack constraint into a A V : A k1 ∅ ⊆ | |≤ partition matroid constraint by searching over divisions B = argmax f(A ,B A ), ∗ ∗ ∗ of the budget (e.g., if k = 11, the budget divisions B V : B k2 \ ⊆ | |≤ include one large sensor and 10 small ones; or 2 large B = argmax f( ,B), ∗∗ sensors, and 9 small ones; etc.). Within a budget B V : B k2 ∅ ⊆ | |≤ division, we simultaneously maximize A and B, without A∗∗ = argmax f(A B∗∗,B∗∗), A V : A k \ resorting to coordinate maximization. ⊆ | |≤ 1 taking the better of (A∗,B∗) and (A∗∗,B∗∗). Budget 5.1 Results constraints of the form A + B k can be handled us- ing a search over budget| divisions| | | ≤ (Section 3.2). Unlike Table 5.1 shows our results for the two example layouts coordinate-wise maximization on a simple bisubmodu- and budgets of 15 and 30 (the first row shows results lar function, no near-optimality guarantee is provided. for layout 1 and a budget of 15). We report results in terms of percentage of coverage relative to the best 5 Coupled Sensor Placement method for that layout / budget combination (i.e., the rows of the table are normalized). Not surprisingly, In this section, we generalize the sensor placement the method using both kinds of sensors performs the problem to allow for two different kinds of sensors. The best. This method in fact runs the other two methods 1059 On Bisubmodular Maximization

A1 B1 . . C1 C2 A B k1 k2

Figure 3: Coupled Feature Selection model.

Figure 1: Layouts For Sensor Placement Experiments of the underlying Gaussian graphical model:

f(A, B) = I(A, B; C) = H(A, B) H(A, B C) − | = H(A, B) H(A C ) H(B C ), − i| 1 − j| 2 Table 1: Percent Coverage (Relative to Best Method) i j Problem Small Sensors Large Sensors Both X X 1 / 15 89.27 82.52 100.00 which we refer to as biset . Max- 1 / 30 97.35 96.29 100.00 imizing f is equivalent to choosing features for the 2 / 15 91.20 86.92 100.00 two tasks that are maximally informative about both 2 / 30 95.91 88.07 100.00 tasks. Using f as the coupled feature selection criterion yields a bisubmodular function maximization under the budget constraint A + B k. | | | | ≤ Without assumptions on the form of f, the problem as subroutines (using only large sensors is one possible is an instance of subset selection in a polytree di- allocation of the budget) and is therefore always at PP rected graphical model, which is known to be NP - least as good as the other two methods. Figure2 shows complete [14]. However, we can show that when f is the results for layout 1 and a budget of 15. Sensor restricted to 3V it is directed bisubmodular, under a locations are shown in red, and covered area is shown relatively broad class of models. in blue. Here the best placement of sensors uses small sensors to cover the small rooms and narrow hallways Theorem 4. Assume A B are mutually condition- ∪ of the environment and large sensors for the larger ally independent given C for any A V and B V . ⊆ ⊆ rooms. The two type sensor method seems to perform f(A, B) = H(A, B) H(A, B C) is directed bisubmod- − | significantly better in situations like this. ular, normalized, and monotone non-decreasing. The differences in performance between the three ap- proaches is necessarily dependent on the floor layout, Proof. Evaluate f( , ) to establish normalization. the budget, and the relative cost/coverage of the sensor Monotonicity follows∅ ∅ from the chain rule for mutual types. There are layouts where the value of using both information. Let R = A B and R0 = A0 B0 2V ∪ ∪ sensors is less dramatic, or non-existent. for (A, B), (A0,B0) 2 . I(R,R0; C) I(R; C) = ∈ − I(R0; C R) 0. Consider f(A, B) = H(A, B) H(A, B| C).≥ By the conditional independence assump-− 6 Coupled Feature Selection tion, H(|A, B C) is modular in its arguments. H(S) where S = A| B is submodular, so H(S) is simple ∪ We are given a Gaussian graphical model, depicted in bisubmodular. The difference of a simple bisubmodular Figure3, with two variables to predict: C1,C2. Given function and a modular one is simple bisubmodular. a set of features V , the goal is to select and partition Restricting f(A, B) to 3V yields a directed bisubmod- the features into two sets, A and B, such that C1 is ular function, by Corollary1. predicted using only features in A; and C2 is predicted using only features in B. Communication constraints Restricting A and B to be disjoint may make sense for preclude transmitting features between nodes C and 1 some feature selection problems. For example, if C C . However, local predictions (the value of C and 1 2 1 and C predict the weather for two different geographic C ) can be transmitted between nodes. 2 2 regions and the features V correspond to different phys- If we ignore the correlation between C1 and C2, then ical sensors, selecting a feature for both tasks would one criterion for feature selection is mutual informa- correspond to placing the same sensor in two different tion, which is submodular under the Na¨ıve Bayes model: geographic regions, which is clearly impossible. A simi- 2V I(A; C1) = H(A) i H(Ai C1). To exploit the cor- lar proof shows that when f is defined over 2 it is relation between tasks,− we use| the mutual information also simple bisubmodular. P 1060 Singh, Guillory, Bilmes

Figure 2: Left: Only Small Sensors, Middle: Only Large Sensors Right: Both

6.1 Results

We have proposed two algorithms for maximizing a simple or directed bisubmodular function: a slow, coordinate-wise maximization, and a fast reduction to submodular maximization. The reduction to sub- modular maximization is always faster, and it can also yield a result closer to the optimum in practice. In Section5 the comparison was between the same algo- rithm on different ground sets (only small sensors, only large sensors, both type of sensors). In this section, Figure 4: Coupled Feature Selection, comparison of the the comparison is between two different algorithms on approximation quality of the fast reduction algorithm the same instance of bisubmodular max. (red) vs. the slow coordinate-wise optimization (blue) To illustrate, we generate random instances of Gaus- across all budgets k. Error bars cover 2-std errors. sian graphical models by randomly generating inverse covariance matrices which respect the structure in Fig- approximation guarantees for a greedy algorithm with- ure3. There are twenty features, half connected to C1; out connecting the problem to submodularity. Leskovec half connected to C2. A positive correlation is fixed et al. [19] and Mutlu et al. [21] do make this connection; in the potential on (C1,C2). All other parameters are drawn from [ 0.5, 0.5], with a rejection test to ensure these authors pose the problem as a submodular max- that the resultingU − matrix is positive semi-definite. imization problem with a knapsack constraint. The knapsack constraint allows for different sensors to have Figure4 compares the quality of the approximation different costs. Our work is distinct from this previ- produced by the two algorithms. The y-axis is scaled ous work, however, in that we pose our problems as so that performance is measured as a percent of the optimization problems over two argument set functions unconstrained maximum of f(A, B). The standard (specifically bisubmodular set functions). error bars reflect variation due to averaging results across randomly generated Gaussian graphical models. Other work has also considered applications of sub- The faster reduction based algorithm achieves a result modular maximization subject to partition matroid closer to the optimum, constraints [9, 17]. We note that Golovin et al. [9] in particular considers maximization algorithms which Note that the fast algorithm is approximately optimal only evaulate f(S) within the constraint set. However, while the slower coordinate-wise algorithm is likely not. this algorithm still requires that f is submodular ev- This is because the coordinate-wise algorithm has only erywhere (i.e. that you can reason about the value of been shown approximately optimal when maximizing f(S) outside of the constraint set). over 22V while here we maximize over 3V 8 Conclusions 7 Related Work We believe that bisubmodularity is theoretically inter- Other related work has considered sensor placement esting, and potentially, a broadly useful approach to problems involving more than one type of sensor generalizing value-of-information problems. We have [11, 8, 3, 19, 21]. However, the majority of this work derived the first efficient algorithms for a wide range doesn’t make a connection to submodular function max- of bisubmodular maximizations—a requisite step in imization. Note that Fusco and Gupta [8] even derive promulgating this class of problems in the machine 1061 On Bisubmodular Maximization

learning community. We show that g0(S) is submodular. Consider any A, B with A = B and therefore A A B and B A B. However, there are still theoretical and applied con- If (A B6 ) then we have⊂ that∪ ⊂ ∪ tributions to be made: (i) we have provided sufficient ∪ ∈ I conditions under which directed bisubmodular func- g0(A) + g0(B) = g(A) + g(B) tions can be efficiently maximized, but the necessary g(A B) + g(A B) conditions remain unknown; (ii) building a catalogue ≥ ∪ ∩ = g0(A B) + g0(A B) of directed bisubmodular functions of interest to the ∪ ∩ community, especially ones that have using the submodularity of g over . If (A B) / no simple bisubmodular analogue. A topic of particular then we have that I ∪ ∈ I interest is developing an analogue of directed bisub- modularity for multiset functions, where the objective g0(A B) g0(A) + g0(B) g0(A B) is a function of r > 2 set arguments. ∪ ≤ − ∩ which implies the desired inequality. For any A, B Appendix with A = B we have trivially

g0(A) + g0(B) = g0(A B) + g0(A B) Proof of Lemma1. We show that for any A B V¯ ∪ ∩ and s / B, ⊆ ⊆ ∈ g(A + s) g(A) g(B + s) g(B) − ≥ − We show that it is not always possible to extend a The result follows directly from the definition of simple function g(S) that is submodular over a matroid such bisubmodularity. In particular, define that non-negativity is preserved. We also show a similar result for monotonicity. (A1,A2) , (abs(A V1), abs(A V2)), ∩ ∩ Proof of Lemma2. Define g(S) = k S . This is a (B1,B2) (abs(B V1), abs(B V2)). − | | , ∩ ∩ submodular function (in fact modular), and for all S Then either with S k (i.e. all S in a uniform matroid), g(S) is non-negative.| | ≤ However, to extend g(S) to S with g(A + s) g(A) S > k we must necessarily use negative values. − | | = f(A1 + abs(s),A2) f(A1,A2) − Proof of Lemma3. Consider the following submodular f(B + abs(s),B ) f(B ,B ) ≥ 1 2 − 1 2 g(S) defined for all S 2 with S 1, 2, 3 . = g(B + s) g(B) | | ≤ ⊆ { } − g( ) = 0 or g(∅1 ) = 1 g( 2 ) = 1 g( 3 ) = 1 g({1}, 2 ) = 1 g({2}, 3 ) = 2 g({1,}3 ) = 1 g(A + s) g(A) { } { } { } − = f(A ,A + abs(s)) f(A ,A ) An extension g0(S) must assign a value to g0( 1, 2, 3 ) 1 2 − 1 2 { } f(B ,B + abs(s)) f(B ,B ) We must have ≥ 1 2 − 1 2 = g(B + s) g(B) g0( 1, 2, 3 ) g0( 2, 3 ) = 2 − { } ≥ { } in order to maintain monotonicity. But we must also have Proof of Theorem2. For subsets of size S 1 define | | ≤ 1 g0( 1, 2, 3 ) g0( 1, 2 ) + g0( 1, 3 ) g0( 1 ) = 1 g0(S) = g(S) (we assume contains all singletons ). I { } ≤ { } { } − { } For S > 1 we define g0(S) recursively over subsets | | in order to maintain submodularity. Clearly we can’t of increasing size. For any size k > 2 define g0(S) for have both. S = k as follows: if S then | | ∈ I Acknowledgements g0(S) , g(S) else This material is based upon work supported by NSF under grant IIS-0535100, by NIH award R01 EB007057, g0(S) , min g0(X) + g0(Y ) g0(X Y ) and by an Intel, a Microsoft, and a Google research X S,Y S:S=X Y − ∩ ⊂ ⊂ ∪ award. The opinions expressed in this work are those 1If doesn’t contain all singletons, we can simply shrink of the authors and do not necessarily reflect the views the groundI set by removing all singletons not in of the funders. I 1062 Singh, Guillory, Bilmes

References [15] A. Krause, H. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation se- [1] K. Ando, S. Fujishige, and T. Naitoh. A char- lection. JMLR, 2008. acterization of bisubmodular functions. Discrete [16] A. Krause, A. Singh, and C. Guestrin. Near- Mathematics, 148(1-3):299 – 303, 1996. optimal sensor placements in Gaussian processes: [2] G. Calinescu, C. Chekuri, M. P´al,and J. Vondr´ak. Theory, efficient algorithms and empirical studies. Maximizing a monotone submodular function sub- JMLR, 9:235–284, 2008. ject to a matroid constraint. SIAM Journal on [17] A. Krause, R. Rajagopal, A. Gupta, and Computing, 40(6), 2011. C. Guestrin. Simultaneous placement and schedul- [3] K. Chakrabarty, S. Iyengar, H. Qi, and E. Cho. ing of sensors. In Information Processing in Sensor Grid coverage for surveillance and target location Networks, 2009. in distributed sensor networks. Computers, IEEE [18] J. Lee, V. S. Mirrokni, V. Nagarajan, and M. Sviri- Transactions on, 51(12):1448–1453, 2002. denko. Maximizing nonmonotone submodular [4] C. Chekuri, J. Vondr´ak,and R. Zenklusen. Depen- functions under matroid or knapsack constraints. dent randomized rounding via exchange properties SIAM Journal on Discrete Mathematics, 23(4), of combinatorial structures. In FOCS, 2010. 2010. [5] U. Feige, V. S. Mirrokni, and J. Vondrak. Maxi- [19] J. Leskovec, A. Krause, C. Guestrin, C. Falout- mizing non-monotone submodular functions. In sos, J. VanBriesen, and N. Glance. Cost-effective FOCS, 2007. outbreak detection in networks. In KDD, 2007. [6] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. [20] H. Lin and J. Bilmes. Multi-document summa- An analysis of approximations for maximizing sub- rization via budgeted maximization of submodular modular set functions II. Polyhedral combinatorics, functions. In NAACL/HLT, 2010. pages 73–87, 1978. [21] B. Mutlu, A. Krause, J. Forlizzi, C. Guestrin, [7] S. Fujishige and S. Iwata. Bisubmodular func- and J. Hodgins. Robust, low-cost, non-intrusive tion minimization. In Integer Programming and sensing and recognition of seated postures. In Combinatorial Optimization, 2001. UIST, 2007. [8] G. Fusco and H. Gupta. Selection and orientation [22] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. of directional sensors for coverage maximization. An analysis of approximations for maximizing sub- In 6th Annual IEEE Communications Society Con- modular set functions I. Mathematical Program- ference on Sensor, Mesh and Ad Hoc Communi- ming, 14:265–294, 1978. cations and Networks, 2009. [23] L. Qi. Directed submodularity, ditroids and di- [9] D. Golovin, A. Krause, and M. Streeter. Online rected submodular flows. Mathematical Program- learning of assignments that maximize submodular ming, 42:579–599, 1988. functions. In NIPS, 2009. [24] L. Qi. Bisubmodular functions. CORE Discussion [10] P. R. Goundan and A. S. Schulz. Revisiting the Papers 1989001, Universit´ecatholique de Louvain, greedy approach to submodular set function max- Center for Operations Research and Econometrics imization. Working paper, MIT, 2007. (CORE), 1989. [11] E. H¨orsterand R. Lienhart. On the optimal place- [25] C. Seshadri and J. Vondr´ak. Is submodularity ment of multiple visual sensors. In 4th ACM In- testable? In Innovations in Computer Science ternational Workshop on Video Surveillance and (ICS), 2011. Sensor Networks, 2006. [12] D. Kempe, J. Kleinberg, and E. Tardos. Maxi- mizing the spread of influence through a social network. In KDD, 2003. [13] C. Ko, J. Lee, and M. Queyranne. An exact al- gorithm for maximum entropy sampling. Ops Research, 43(4):684–691, 1995. [14] A. Krause and C. Guestrin. Optimal nonmyopic value of information in graphical models - efficient algorithms and theoretical limits. In Proceedings of the 19th International Joint Conference on Arti- ficial Intelligence (IJCAI), pages 1339–1345, 2005. 1063