Submodular Analysis, Duality and Optimization

Yao-Liang Yu ([email protected])
Dept. of Computing Science, University of Alberta

December 14, 2015

This note is intended to present some fundamental results about submodular functions and their applications in discrete optimization (with special interest in machine learning applications). Most results are taken from various sources, with some occasional improvements. The algorithm section is expected to go through a large update soon. Pages 38–43 are not available currently, due to conflict with some ongoing work.

Contents

1 Distributive Lattice 2
2 Submodular Functions 4
3 Basic Properties 11
4 Greedy Algorithm 17
5 The Lovász Extension 27
6 The Choquet Integral 29
7 Duality 45
8 Algorithms 52
9 Graph Theorems and Algorithms 54
10 Matroid 65
11 Integer Polyhedra 76

1 Distributive Lattice

Alert 1.1: Skipping this section

If one is willing to restrict oneself to the full domain 2Ω, the power set of Ω, then this section can be skipped without much harm. Nevertheless, it is recommended to read at least Theorem 1.1 so that one understands how to deal with general (distributive) lattices.

Let (L, ≤) be a partially ordered set. For any pair x, y ∈ L, its least upper bound (the join operator w.r.t. the order ≤), if it exists, is denoted as x ∨ y, and its greatest lower bound (the meet operator), if it exists, is similarly denoted as x ∧ y. When all pairs have a least upper bound (supremum) and a greatest lower bound (infimum), we call (L, ≤) a lattice—the central domain for us. We will focus on distributive lattices—those enjoying the distributive law: for all x, y, z ∈ L,

x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z),
x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z).

We equip the product of ordered sets with the natural pointwise order; it is a (distributive) lattice if the factors are.

Example 1.1: Not all lattices are distributive

Let L = {⊥, ⊤, x, y, z} where ⊤ is largest, ⊥ is smallest, and {x, y, z} are not directly comparable. Then x ∨ (y ∧ z) = x ∨ ⊥ = x while (x ∨ y) ∧ (x ∨ z) = ⊤ ∧ ⊤ = ⊤.

Remark 1.1: Lattice operators characterize the order

It is clear that the lattice operators ∧ and ∨ are completely determined by the underlying order, with the following properties:

• Idempotent: ∀x ∈ L, x ∧ x = x, x ∨ x = x;
• Symmetric: ∀x, y ∈ L, x ∧ y = y ∧ x, x ∨ y = y ∨ x;
• Absorptive: ∀x, y ∈ L, (x ∧ y) ∨ x = x, (x ∨ y) ∧ x = x;
• Associative: ∀x, y, z ∈ L, (x ∧ y) ∧ z = x ∧ (y ∧ z), (x ∨ y) ∨ z = x ∨ (y ∨ z).

On the other hand, given two operators ∧ and ∨ on some set L with the above four properties, we can define an order on L: x ≤ y ⇐⇒ x ∧ y = x (or x ≤ y ⇐⇒ x ∨ y = y), and the lattice operators associated with the defined order are exactly the ∧ and ∨ that we began with.

Perhaps the most important example of a distributive lattice is the power set, denoted as 2Ω, of a nonempty ground set Ω, ordered by set inclusion. A bit surprisingly, the converse is also true. Recall that two lattices L1 and L2 are isomorphic if there exists some bijective function f : L1 → L2 such that f(x ∧ y) = f(x) ∧ f(y) and f(x ∨ y) = f(x) ∨ f(y). We say L0 ⊆ L is a sublattice if L0 is itself a lattice with the inherited lattice operators. Note that it is possible for L0 to have different lattice operators than L, in which case we call L0 a lattice subspace.

Example 1.2: Not all lattice subspaces are sublattices

Let L = C([0, 1]) be the set of continuous functions on the interval [0, 1], equipped with the pointwise order. It is clearly a lattice. Take L0 to be all affine functions. Again L0 is a lattice, but its own lattice operators are different from those of L: the pointwise maximum of two affine functions is generally not affine, so the join in L0 is not the join in L.

Theorem 1.1: [Birkhoff, 1948, p. 140]

Any distributive lattice is isomorphic to a sublattice of 2Ω for some ground set Ω.

Proof: Proof will be added.


More definitions. We call a lattice L bounded if it has a largest element ⊤ and a smallest element ⊥; complete if any subset has an infimum and a supremum; complemented if for any x ∈ L there exists y ∈ L such that x ∨ y = ⊤, x ∧ y = ⊥; finite if the cardinality |L| < ∞. Moreover, the product set is bounded, complete, complemented if the factors are, and finite if we only have finitely many factors which themselves are finite. A bounded, complemented, distributive lattice is called a Boolean algebra.

Proposition 1.1: Finite lattices are nice

Any finite lattice is bounded and complete.

Remark 1.2: A special ordered set

We consider decomposing a complete sublattice L ⊆ 2Ω with ∅, Ω ∈ L. Let I[x] := ⋂_{x∈X∈L} X be the smallest element in L that contains x ∈ Ω. I[x] is well defined since Ω ∈ L and L is complete. Clearly, y ∈ I[x] ⇐⇒ I[y] ⊆ I[x]. Define the equivalence class [x] = {y : I[y] = I[x]}. Then P = {[x] : x ∈ Ω} is a partition of Ω, and we order its elements by [x] ⪯ [y] ⇐⇒ I[x] ⊆ I[y]. Note that the cardinality of P may be strictly smaller than that of Ω. The resulting ordered set (P, ⪯) will appear freely many times in our later development.
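To make the construction concrete, here is a small Python sketch (ours, not part of the original note) that computes I[x], the partition P and the order ⪯, assuming L is handed to us as an explicit list of frozensets; the five-element lattice is an arbitrary illustration.

def smallest_containing(L, x):
    """I[x]: intersection of all members of L containing x."""
    out = None
    for X in L:
        if x in X:
            out = X if out is None else out & X
    return out  # well defined since Omega is in L

Omega = {1, 2, 3}
L = [frozenset(s) for s in [(), (1,), (1, 2), (1, 3), (1, 2, 3)]]

I = {x: smallest_containing(L, x) for x in Omega}
# Equivalence classes [x] = {y : I[y] = I[x]} form the partition P of Omega.
blocks = {frozenset(y for y in Omega if I[y] == I[x]) for x in Omega}
# Order the blocks by [x] <= [y] iff I[x] is a subset of I[y].
order = {(B, C) for B in blocks for C in blocks
         if I[next(iter(B))] <= I[next(iter(C))]}
print(blocks)       # three singleton blocks: {1}, {2}, {3}
print(len(order))   # 5 comparable pairs (3 reflexive + {1} below {2}, {3})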

Alert 1.2: Notation

Following set theory, when I = {Aj : j ∈ J} is a set of sets, we use ⋃I as a shorthand for ⋃_{j∈J} Aj.

Recall that for an ordered set (O, ≤), I ⊆ O is called a (lower) ideal if x ≤ y ∈ I =⇒ x ∈ I, i.e., an ideal contains all of its dominated elements. Similarly, I is called an upper ideal if I ∋ x ≤ y =⇒ y ∈ I. We verify that I ⊆ O is a lower ideal iff O \ I is an upper ideal. Besides, all ideals themselves, under the inclusion order, form a lattice whose join and meet operators are simply the set union and intersection, respectively. An ideal of the form {x ∈ O : x ≤ y} for some y ∈ O is called a principal ideal and denoted as Iy. The collection of all principal ideals forms a lattice subspace (not necessarily a sublattice!) of the set of all ideals; in fact, it is isomorphic to the original ordered set O under the identification x ↦ Ix. Every ideal in a finite ordered set is a union of principal ideals, and an ideal remains an ideal after removing any of its maximal elements (which always exist for a finite set).

Theorem 1.2: Distributive lattices correspond to ideals

Let L ⊆ 2Ω be a complete distributive sublattice that contains ∅, Ω. Consider the ordered set (P, ⪯) constructed in Remark 1.2. Then for each ideal I ⊆ P, ⋃I ∈ L. Conversely, for any X ∈ L, I := {I ∈ P : I ⊆ X} is an ideal of P that forms a partition of X.

Proof: Suppose I is an ideal in P. Fix x ∈ X := ⋃I. Thus [x] ∈ I. For each y ∈ I[x], I[y] ⊆ I[x] =⇒ [y] ⪯ [x] =⇒ [y] ∈ I since I is an ideal in (P, ⪯). Therefore y ∈ X and consequently X = ⋃_{x∈X} I[x] ∈ L due to the completeness of L. Conversely, let X ∈ L. Then for any I ∈ P, either I ⊆ X or I ∩ X = ∅. Since ⋃P = Ω, I := {I ∈ P : I ⊆ X} forms a partition of X. Clearly, if P ∋ J ⪯ I ∈ I, then J ⊆ X, hence J ∈ I, meaning that I thus defined is indeed an ideal of (P, ⪯).

In other words, any complete distributive lattice (L, ≤) consists merely of the unions of the ideals of a potentially different ground set P equipped with a potentially different order ⪯.

Theorem 1.3: Maximal increasing sequence determines the partition

Let L ⊆ 2Ω be a finite distributive sublattice that contains ∅, Ω. Let

∅ = S0 ⊂ S1 ⊂ · · · ⊂ Sk = Ω


be any maximal increasing sequence in L. Then

P = {Si \ Si−1, i = 1, . . . , k}. (1)

In particular, all maximal increasing sequences are equally long.

Proof: According to Theorem 1.2, each Si is the union of an ideal in (P, ⪯), i.e., Si = ⋃Ii with Ii = {I1, . . . , Iji} ⊆ P. Let I be a maximal element in Ii \ Ii−1 (w.r.t. the order ⪯). Clearly I is also maximal in Ii since Si ⊃ Si−1. Let J := Ii \ {I}. By the maximality of I, J is again an ideal and J ⊇ Ii−1. Thus by the maximality of {Si} we must have J = Ii−1, i.e., I = Si \ Si−1 ∈ P. This proves that {Si \ Si−1, i = 1, . . . , k} ⊆ P. Since S0 = ∅, Sk = Ω and |Ii \ Ii−1| = 1, we must have k = |P|, i.e., the equality in (1).

From the proof it is clear that for any j > i we cannot have Sj \ Sj−1 ⪯ Si \ Si−1.

Therefore we can deduce the set P from any maximal increasing sequence in L. For the extreme case where P = Ω, we say the lattice L is simple. Pleasantly, the simple lattice (L ⊆ 2Ω, ≤) is just the collection of all ideals of the ordered set (Ω, ⪯). On the other hand, the collection of all ideals of a finite set (Ω, ⪯), equipped with the set inclusion order, is a simple lattice: successively adding minimal elements one by one in the remaining ground set we arrive at a maximal increasing sequence. Besides, the lattice L (simple or not) is a Boolean algebra iff the order ⪯ is trivial, i.e., no two elements are comparable.

Remark 1.3: “Simplification”

Let F : L → R be a function defined on the finite distributive lattice L. If L is not simple, then we construct the simple lattice L˜ that consists of all ideals (without taking unions) of the ordered set (P, ⪯). Define F˜ : L˜ → R by F˜(I) = F(⋃I). Conveniently, many good properties of F, such as monotonicity or submodularity defined below, transfer to F˜. Note that the ground set for L˜ is P, potentially a collection of subsets of Ω. By simplification, a Boolean algebra can be taken simple.

There is a beautiful theory on vector lattices (those compatible with a linear structure), Banach lattices (those also compatible with a norm), and Boolean algebras, but we shall not need any such knowledge.

2 Submodular Functions

Thanks to Theorem 1.1 and Proposition 1.1, we know any finite distributive lattice is a complete sublattice of 2Ω for some ground set Ω, with ∅, Ω ∈ L. Thus from now on, the following convention should be kept in mind.

Alert 2.1: Notation

Our domain L is a sublattice of 2Ω for some nonempty finite ground set Ω, with always ∅, Ω ∈ L. We use 1X to denote the characteristic function of the set X, i.e., 1X(x) = 1 if x ∈ X, otherwise 1X(x) = 0. The shorthand 1 := 1Ω (all ones) is also adopted whenever the ground set Ω is clear from context. All empty sums, unless stated otherwise, are understood to take value 0. As usual, R denotes the real line (although many results extend immediately to any totally ordered vector space) and {ei}i∈Ω denotes the canonical basis in RΩ.

We start with the definition of submodularity.

Definition 2.1: Submodular function

The set function F : L → R is called submodular if

∀X ∈ L, ∀Y ∈ L, F(X ∪ Y) + F(X ∩ Y) ≤ F(X) + F(Y). (2)

Similarly, we can define supermodular functions by reversing the inequality in (2). Clearly, F is supermodular iff −F is submodular, which allows us to focus exclusively on submodularity.
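Definition 2.1 is easy to check mechanically on small ground sets. The following Python sketch (ours; the weights are arbitrary illustrative choices) brute-forces inequality (2) over all pairs of subsets, and tests the max-weight function of Example 2.1 below:

from itertools import combinations

def powerset(ground):
    s = list(ground)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force check of inequality (2) over all pairs X, Y."""
    P = powerset(ground)
    return all(F(X | Y) + F(X & Y) <= F(X) + F(Y) + tol
               for X in P for Y in P)

# Example 2.1: F(A) = max_{a in A} w_a, with F(empty) = c <= min w.
w = {1: 3.0, 2: -1.0, 3: 2.0}
F = lambda A: max(w[a] for a in A) if A else min(w.values())
print(is_submodular(F, w.keys()))  # True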


Alert 2.2: Domain

To verify submodularity, one must first check whether or not the domain is a lattice.

Proposition 2.1: Minimizers constitute a sublattice

For a submodular function F : L → R, its minimizing set {Y ∈ L : F(Y) = min_{X∈L} F(X)} is a sublattice of the domain L.

Let us illustrate the ubiquity of submodularity through a few examples, each of which can be verified directly from Definition 2.1.

Example 2.1: Maximal element is submodular

Consider a weight function w : Ω → R and define for any A ⊆ Ω

F(A) = max_{a∈A} wa if A ≠ ∅, and F(∅) = c. (3)

As long as c ≤ min_{a∈Ω} wa, F is easily verified to be submodular.

Example 2.2: Entropy is (increasing) submodular

Consider a collection of random variables Xi, i ∈ Ω := {1, . . . , n}. For any nonempty subset A ⊆ Ω, let us denote XA as the set {Xi : i ∈ A} and define

E(A) := H(XA) := E(− log p(XA)), E(∅) := 0. (4)

The submodularity of E follows from the nonnegativity of the conditional mutual information:

E(A) + E(B) − E(A ∪ B) − E(A ∩ B) = E[log (p(XA∪B) p(XA∩B)) / (p(XA) p(XB))]
= E[log p(XA, XB | XA∩B) / (p(XA | XA∩B) p(XB | XA∩B))]
= I(XA; XB | XA∩B) ≥ 0.

Note also that E is increasing since the conditional entropy is nonnegative as well.
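As a quick numerical illustration (ours, with an arbitrarily chosen joint distribution of three bits), one can evaluate the entropies H(XA) from an explicit joint table and confirm inequality (2) directly:

import itertools
from math import log

# Joint distribution of three correlated bits (an arbitrary example).
p = {x: w for x, w in zip(itertools.product((0, 1), repeat=3),
                          (0.20, 0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.15))}

def H(A):
    """Entropy of the marginal of the coordinates in A; H(empty) = 0."""
    A = sorted(A)
    marg = {}
    for x, w in p.items():
        key = tuple(x[i] for i in A)
        marg[key] = marg.get(key, 0.0) + w
    return -sum(w * log(w) for w in marg.values() if w > 0)

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(range(3), r)]
assert all(H(X | Y) + H(X & Y) <= H(X) + H(Y) + 1e-12
           for X in subsets for Y in subsets)  # passes: entropy is submodular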

Example 2.3: Graph cut is (strictly) submodular

Let G = (V, c) be a complete directed graph, with vertices V and capacity function c : V × V → R+ ∪ {∞}. The graph cut for any subset X ⊆ V is defined as

Cut(X) = Σ_{u∈X} Σ_{v∈V\X} c(u, v). (5)

Note that Cut(X) = Cut(V \ X) if c(u, v) = c(v, u) for all u, v ∈ V. Writing Σ_{A,B} := Σ_{u∈A} Σ_{v∈B} c(u, v), simple algebra yields

Cut(X) = Σ_{X∩Y, Y\X} + Σ_{X∩Y, V\(X∪Y)} + Σ_{X\Y, V\X}
Cut(Y) = Σ_{X∩Y, X\Y} + Σ_{X∩Y, V\(X∪Y)} + Σ_{Y\X, V\Y}
Cut(X ∩ Y) = Σ_{X∩Y, Y\X} + Σ_{X∩Y, V\(X∪Y)} + Σ_{X∩Y, X\Y}
Cut(X ∪ Y) = Σ_{X∪Y, V\(X∪Y)}

therefore

Cut(X) + Cut(Y) − Cut(X ∩ Y) − Cut(X ∪ Y)
= Σ_{X\Y, V\X} + Σ_{X∩Y, V\(X∪Y)} + Σ_{Y\X, V\Y} − Σ_{X∪Y, V\(X∪Y)}
≥ Σ_{X\Y, V\(X∪Y)} + Σ_{X∩Y, V\(X∪Y)} + Σ_{Y\X, V\(X∪Y)} − Σ_{X∪Y, V\(X∪Y)}
= 0,

meaning that the cut function, when restricted to the lattice {X : Cut(X) < ∞}, is submodular. Observe that the nonnegativity of the capacity is needed in deriving the inequality. If the weights are bounded away from 0, we have

min_{X⊈Y, Y⊈X} Cut(X) + Cut(Y) − Cut(X ∩ Y) − Cut(X ∪ Y) ≥ 2 · min_{u≠v} c(u, v) > 0.

Such functions will be called strictly submodular.
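A small Python sketch (ours; the vertex set and random positive capacities are arbitrary) that confirms the strict submodularity bound above on a complete directed graph:

import itertools, random

random.seed(0)
V = [0, 1, 2, 3]
# Strictly positive capacities on a complete directed graph (no self-loops).
c = {(u, v): random.uniform(1.0, 2.0)
     for u in V for v in V if u != v}

def cut(X):
    return sum(c[u, v] for u in X for v in V if v not in X)

subsets = [frozenset(s) for r in range(len(V) + 1)
           for s in itertools.combinations(V, r)]
gap = min(cut(X) + cut(Y) - cut(X & Y) - cut(X | Y)
          for X in subsets for Y in subsets
          if not X <= Y and not Y <= X)
print(gap >= 2 * min(c.values()))  # True: strict submodularity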

Remark 2.1: Hardness of submodular maximization

Example 2.3 immediately implies that there exists some constant 0 ≤ α < 1 beyond which submodular maximization cannot be approximated, modulo certain complexity conjectures. On the other hand, as we will discuss later on, the tractability of submodular minimization implies, in particular, the tractability of min-cut.

Definition 2.2: Modular function

The function M : L → R is called modular if ∀X ∈ L, ∀Y ∈ L,

M(X ∪ Y ) + M(X ∩ Y ) = M(X) + M(Y ). (6)

Clearly, M is modular iff it is both submodular and supermodular. If L = 2Ω, inducting from (6) we obtain

M(X) = M(∅) + Σ_{ω∈X} [M({ω}) − M(∅)]. (7)

In other words, a modular function is determined by its values on singletons (and the empty set). On the other hand, one easily verifies that (7) indeed defines a modular function on 2Ω. Moreover, assuming M(∅) = 0 and denoting q(X) := Σ_{ω∈X} qω for any q ∈ RΩ, we can identify any modular function M with a vector q ∈ RΩ, in such a way that M({ω}) = q({ω}) and consequently M(X) = q(X). Later on we will see that this conclusion generalizes to any simple lattice, and modular functions are exactly those corresponding to linear functions.

Example 2.4: Cardinality is modular

The cardinality of a set X ⊆ Ω is simply the number of elements in X. It is elementary to verify that indeed the cardinality function is modular. We can extend the definition of cardinality to any real vector x ∈ RΩ:

|x| := |Supp(x)|, (8)

where

Supp(x) := {i ∈ Ω : xi ≠ 0} (9)

is the support of x. In essence, the cardinality of a real vector is the number of its nonzero components. If we treat the vector space RΩ as a lattice equipped with the pointwise order, then the cardinality


function defined in (8) is indeed modular.
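The identification below (7) of centered modular functions on 2Ω with vectors q ∈ RΩ is immediate to check numerically; a minimal sketch (ours, with an arbitrary q):

import itertools

q = {1: 0.5, 2: -2.0, 3: 4.0}

def M(X):
    """Centered modular function M(X) = q(X), per the discussion below (7)."""
    return sum(q[w] for w in X)

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(list(q), r)]
assert all(M(X | Y) + M(X & Y) == M(X) + M(Y)
           for X in subsets for Y in subsets)  # equality (6) holds exactly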

Example 2.5: log det is submodular

Let Ω = {1, . . . , n} and fix X ∈ S^n_{++}, the set of n × n positive definite matrices. Denote XA as the submatrix [X]i,j where (i, j) ∈ A × A. Then

F (A) := log det(XA) (10)

is submodular. Indeed, the (differential) entropy of a multivariate Gaussian vector x with covariance matrix X ∈ S^n_{++} is known to be

Hg(X) = ½ log[(2πe)^n det(X)] = ½ log(2πe) · n + ½ log det X,

hence F(A) = 2Hg(XA) − log(2πe) · |A|, but Example 2.2 showed that entropy is submodular while Example 2.4 showed that cardinality is modular.
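A numerical spot-check of Example 2.5 (ours; it assumes NumPy is available, and the positive definite matrix is randomly generated):

import itertools
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)   # a positive definite matrix

def F(A):
    """F(A) = log det(X_A), with F(empty) = 0."""
    A = sorted(A)
    return np.linalg.slogdet(X[np.ix_(A, A)])[1] if A else 0.0

subsets = [frozenset(s) for r in range(5)
           for s in itertools.combinations(range(4), r)]
assert all(F(S | T) + F(S & T) <= F(S) + F(T) + 1e-9
           for S in subsets for T in subsets)  # passes: log det is submodular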

The following proposition will be convenient in determining submodularity.

Proposition 2.2: Determining submodularity

Let L ⊆ 2Ω be a simple sublattice. The following are equivalent:

(I). F : L → R is submodular;

(II). ∀ L ∋ X ⊆ Y ∈ L, ∀S ⊆ Ω \ Y such that X ∪ S ∈ L, Y ∪ S ∈ L, we have F(X ∪ S) − F(X) ≥ F(Y ∪ S) − F(Y);

(III). ∀ L ∋ X ⊆ Y ∈ L, ∀ω ∈ Ω \ Y such that X ∪ {ω} ∈ L, Y ∪ {ω} ∈ L, we have F(X ∪ {ω}) − F(X) ≥ F(Y ∪ {ω}) − F(Y);

(IV). ∀x, y ∈ Ω, ∀Z ∈ L such that Z ∪ {x} ∈ L, Z ∪ {y} ∈ L, we have F(Z ∪ {x}) − F(Z) ≥ F(Z ∪ {x, y}) − F(Z ∪ {y}).

Proof: (I) ⇒ (II) ⇒ (III) ⇒ (IV) is clear. (IV) ⇒ (I): Let X,Y ∈ L, we want to prove

F (X) + F (Y ) ≥ F (X ∪ Y ) + F (X ∩ Y ). (11)

If the symmetric difference |X∆Y| ≤ 2, then either X ⊆ Y or Y ⊆ X or X \ Y = {x}, Y \ X = {y}. (11) holds trivially in the first two cases. For the last case, taking Z = X ∩ Y and applying (IV) we again obtain (11). We now perform induction on |X∆Y|. Assume w.l.o.g. |X \ Y| ≥ 2. Take any maximal element in X \ Y, say ω. Since L is simple, we know ω ⪯ z holds for no z ∈ X ∩ Y, for otherwise ω ∈ Y as Y is an ideal. Thus ω is a maximal element of X, hence X \ {ω} ∈ L. By the induction hypothesis

F(X ∪ Y) − F(X) ≤ F((X \ {ω}) ∪ Y) − F(X \ {ω}) ≤ F(Y) − F(X ∩ Y).

Rearranging completes the induction hence our proof.

Intuitively the quantity F (X ∪ {ω}) − F (X) denotes the “gain” of adding the element ω into the set X, and the above proposition states that submodular functions have diminishing gain (as the set X gets bigger).

Thanks to Proposition 2.2, we can present (and verify) more examples about submodularity.

Example 2.6: Rank is submodular

Let us consider a fixed matrix A ∈ R^{m×n}. Define Ω := {1, . . . , n} and the rank function R on X ⊆ Ω as the number of linearly independent columns of A indexed by the elements of X. Using


Proposition 2.2 we easily verify that R is submodular: first note that Rj(X) := R(X ∪ {j}) − R(X) ∈ {0, 1} for any X ⊆ Ω and j ∈ Ω. Take j ∉ Y ⊃ X; then

Rj(X) = 0 =⇒ Rj(Y) = 0

(if column j lies in the span of the columns indexed by X, it lies in the span of those indexed by Y), hence Rj(X) ≥ Rj(Y), proving the submodularity of R.

Alert 2.3: Rank vs. cardinality

The (matrix) rank function is usually treated as a matrix generalization of the cardinality function. But compare Example 2.4 and Example 2.6: cardinality is modular while rank is merely submodular.

Example 2.7: Union coverage is submodular

Let Ω = {A1, . . . , An} be a collection of (possibly overlapping) sets and µ : 2^{∪Ω} → R be some increasing set function that is additive over disjoint sets (e.g., a measure). The coverage function on 2Ω, defined as

U(X) := µ(∪X) (12)

is submodular: Let X ⊂ Y ⊆ Ω \ {Aj}; then

U(X ∪ {Aj}) − U(X) = [µ((∪X) ∩ Aj) + µ((∪X) \ Aj) + µ(Aj \ (∪X))] − [µ((∪X) ∩ Aj) + µ((∪X) \ Aj)]
= µ(Aj \ (∪X))
≥ µ(Aj \ (∪Y))
= U(Y ∪ {Aj}) − U(Y).
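A minimal sketch of Example 2.7 (ours), taking µ to be the counting measure on an arbitrary small collection of overlapping sets:

import itertools

A = {'A1': {1, 2, 3}, 'A2': {3, 4}, 'A3': {4, 5, 6}}

def U(X):
    """U(X) = size of the union of the sets named in X (counting measure)."""
    return len(set().union(*(A[a] for a in X))) if X else 0

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(list(A), r)]
assert all(U(X | Y) + U(X & Y) <= U(X) + U(Y)
           for X in subsets for Y in subsets)  # passes: coverage is submodular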

Example 2.8: Intersection coverage is supermodular

Similarly to Example 2.7, consider the intersection

I(X) := µ(∩X), (13)

which is supermodular:   I(X ∪ {Aj}) − I(X) = −µ (∩X) \ Aj ≤ −µ (∩Y ) \ Aj = I(Y ∪ {Aj}) − I(Y ).

The next theorem about composition is occasionally useful.

Theorem 2.1: Concave composing monotonic modular is submodular

Let |Ω| ≥ 3, M : 2Ω → R and g : R → R. F := g ∘ M is (strictly) submodular for every monotone modular M iff g is (strictly) concave.

Proof: Notice that a real-valued function g is concave iff ∀x < y < z < w,

(g(y) − g(x)) / (y − x) ≥ (g(z) − g(w)) / (z − w). (14)

⇐: Let g be concave (hence satisfying (14)) and M be increasing modular. For any X ⊂ Y ⊆ Ω, we verify (III) in Proposition 2.2 for any ω ∈ Ω \ Y:

F(X ∪ {ω}) − F(X) = g(M(X) + M({ω}) − M(∅)) − g(M(X))
≥ g(M(Y) + M({ω}) − M(∅)) − g(M(Y)) (15)
= F(Y ∪ {ω}) − F(Y).

If M is decreasing, simply consider −M and g(−·), which are again increasing modular and concave, respectively. ⇒: If g is not concave, (14) is violated at some x < y = z < w. Based on this observation we can easily construct a monotonic modular function M that does not satisfy (15).

By (7) a modular function M on 2Ω is increasing iff

M(∅) ≤ min_{ω∈Ω} M({ω}). (16)

A more general result is in Proposition 4.1 below. Note that this theorem does not generalize to simple lattices (or σ-algebras): take L = {∅, {1}, {1, 2}, {1, 2, 3}}, then any function defined on L is trivially submodular.
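A quick check of Theorem 2.1 (ours; g = √· and the increasing modular M are arbitrary choices):

import itertools

g = lambda t: t ** 0.5          # any concave g works, per Theorem 2.1
q = {1: 1.0, 2: 2.0, 3: 0.5}    # increasing modular: M(X) = q(X), M(empty) = 0
M = lambda X: sum(q[w] for w in X)
F = lambda X: g(M(X))

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(list(q), r)]
assert all(F(X | Y) + F(X & Y) <= F(X) + F(Y) + 1e-12
           for X in subsets for Y in subsets)  # passes: g(M(.)) is submodular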

Our last examples of submodularity require an important definition.

Definition 2.3: Base polyhedron and subbase polyhedron

For any set function (not necessarily submodular) F with F(∅) = 0, we associate with it the subbase polyhedron

PF := ⋂_{X∈L} {q ∈ RΩ : q(X) ≤ F(X)}, (17)

and the base polyhedron

BF := PF ∩ {q ∈ RΩ : q(Ω) = F(Ω)}. (18)

Note that PF is unbounded with nonempty interior while BF can be empty, although we shall see this cannot happen if F is submodular. By definition PF and BF are closed and convex (which has nothing to do with submodularity). For supermodular functions we reverse the inequality in (17).

Alert 2.4: Centering

Whenever we are referring to the subbase or base polyhedron, it is implicitly assumed that the underlying function satisfies F (∅) = 0. This is merely for convenience and we can always achieve it by subtracting F (∅), without affecting properties such as submodularity or monotonicity.

Theorem 2.2: Pointed base polyhedron

The (sub)base polyhedron (of any function), if nonempty, is pointed (i.e., has extreme points) iff L is simple.

Proof: A closed convex set is pointed iff it does not contain a line. In our setting, this is further equivalent to requiring

{0} = ⋂_{X∈L} {q ∈ RΩ : q(X) = 0}.

Using Theorem 1.3 the above is satisfied iff

{0} = ⋂_{X∈P} {q ∈ RΩ : q(X) = 0}.

This is possible iff |X| ≡ 1 for all X ∈ P, i.e., L is simple.

The next result about lattices is useful at times.


Proposition 2.3: Coupling

For any ∅ ≠ X ∉ L, there exist x ∈ X and y ∉ X such that x ∈ Y ∈ L =⇒ y ∈ Y.

Proof: X ∉ L implies X is not the union of an ideal in (P, ⪯). Thus either some block I ∈ P satisfies I ∩ X ≠ ∅ ≠ I \ X, in which case we pick x ∈ I ∩ X and y ∈ I \ X, or X is a union of blocks that do not form an ideal, i.e., there exist J ⪯ I ∈ P with X ∩ J = ∅ and I ⊆ X, in which case we pick arbitrary x ∈ I and y ∈ J.

Theorem 2.3: Bounded base polyhedron

The base polyhedron (of any function), if nonempty, is bounded iff L is a simple Boolean algebra, i.e., L = 2Ω.

Proof: If L = 2Ω, then BF is contained in {q ∈ RΩ : F(Ω) − Σ_{y≠x} F({y}) ≤ q({x}) ≤ F({x}), ∀x ∈ Ω}, clearly bounded. On the other hand, if L ⊂ 2Ω, by Proposition 2.3 there exist x, y ∈ Ω so that x ∈ Y ∈ L =⇒ y ∈ Y. Take an arbitrary b ∈ BF; the half line {b + α(ex − ey) : α ≥ 0} ⊆ BF, showing the unboundedness of the base polyhedron BF.

Now we can proceed with more examples of submodularity.

Example 2.9: Permutahedron

Consider the concave function g(x) = (n + ½)x − ½x² and the cardinality function |·| : 2Ω → ℕ, where Ω := {1, . . . , n}. By Theorem 2.1 we know the composition

F(X) := g(|X|) = Σ_{i=1}^{|X|} (n − i + 1) (19)

is submodular. Note that the extreme points of the base polyhedron BF are exactly all permutations of the vector (n, n − 1, . . . , 1), i.e., BF is the permutahedron.

Example 2.10: Majorization

Take an arbitrary w ∈ RΩ, where Ω := {1, . . . , n}. Define for all X ⊆ Ω,

Fw(X) := Σ_{i=1}^{|X|} wi, (20)

where for the empty sum we take Fw(∅) = 0. By Proposition 2.2, Fw is submodular iff for all X ⊆ Y, w_{|X|+1} ≥ w_{|Y|+1}, i.e., w is ordered decreasingly: w1 ≥ · · · ≥ wn, in which case b ∈ RΩ is majorized by w iff b ∈ B_{Fw}, and p is weakly majorized by w iff p ∈ P_{Fw} (whereas the if parts hold even when w is not ordered). Note that Fw depends only on the cardinality of its input. In fact, any such function F (that vanishes at the empty set) is easily seen to be of the form (20).

There are many other important submodular functions, but we shall content ourselves with what we have discussed, for the time being. We end this section with an important definition.

Proposition 2.4: Saturation sublattice

For any subbase p ∈ PF of a submodular function F : L → R,

Sp := {X ∈ L : p(X) = F (X)}, (21)

if nonempty, is a sublattice of L.

Proof: Simply note that Sp is the minimizing set of the nonnegative submodular function F (X)−p(X).


Apply Proposition 2.1.

Definition 2.4: Capacity function

For any subbase p ∈ PF of an arbitrary function F : L → R and for any x, y ∈ Ω,

c(p, x, y) := min{F(X) − p(X) : x ∈ X ∈ L, y ∉ X}, (22)

and

c(p, x) := min{F (X) − p(X): x ∈ X ∈ L}. (23)

Clearly c(p, x, y) ≥ c(p, x) ≥ 0. The importance of the capacity function is that it determines how much we can add to p before it leaves the subbase polyhedron PF :

• p + αex ∈ PF for all 0 ≤ α ≤ c(p, x);

• p + α(ex − ey) ∈ PF for all 0 ≤ α ≤ c(p, x, y).

Let p ∈ PF and F be submodular; the unique maximal element in Sp (see (21)) coincides with

sat(p) := {x ∈ Ω : c(p, x) = 0} ∈ L. (24)

Define the dependent set dep(p, x) ∈ L to be the unique smallest element in Sp that contains x. We verify that

c(p, x, y) = 0 ⇐⇒ x ∈ sat(p) and y ∉ dep(p, x). (25)

Proposition 2.5: Bases are maximal

Let F : L → R be submodular and b ∈ PF. Then b ∈ BF iff b is maximal in PF (under the pointwise order). In particular, BF ≠ ∅.

Proof: Clearly, any b ∈ BF is maximal in PF . On the other hand, suppose b is maximal in PF , then c(b, x) = 0 for any x ∈ Ω for otherwise b + αex with some small positive α will contradict the maximality of b. Thus sat(b) = Ω and b(Ω) = b(sat(b)) = F (sat(b)) = F (Ω).

3 Basic Properties

We study basic properties of submodular functions in this section, starting with trivialities.

Theorem 3.1: Submodular functions form convex cone

The set of submodular functions is a convex cone, i.e., if Fi : L → R are submodular, so is λ1F1 + λ2F2 for any λi ≥ 0, i ∈ {1, 2}.

Therefore the set of submodular functions on the same domain L induces a pre-order on R^L, the vector space of all real-valued set functions on L. In fact, this pre-order is generating.

Theorem 3.2: Submodular-Supermodular decomposition

Any function G : L → R can be written as F1 − F2, with F1 and F2 being submodular on L.

Proof: Define

δG := min_{X∈L, Y∈L, X⊈Y, Y⊈X} G(X) + G(Y) − G(X ∪ Y) − G(X ∩ Y),

and let H be a strictly submodular function on L whose strict gap δH exceeds the maximum of G(X) + G(Y) − G(X ∪ Y) − G(X ∩ Y) over incomparable X, Y ∈ L (such an H exists by scaling the one discussed in Example 2.3). If δG ≥ 0, then let F1 = G and F2 = 0; otherwise let F1 = H and F2 = H − G.

Of course, the decomposition need not be unique. It remains a tricky question to define certain minimal decompositions that are useful in applications.

Denote L1 ∧ L2 as the largest sublattice contained in both L1 and L2. In fact, L1 ∧ L2 = L1 ∩ L2. Similarly, denote L1 ∨ L2 as the smallest lattice that contains L1 ∪ L2. It is easily seen that L1 ∨ L2 can be obtained by taking all disjoint unions of elements in L1 ∪ L2.

Theorem 3.3: Summation

Let F : L1 → R and G : L2 → R be centered submodular, then F + G : L1 ∧ L2 → R with X ↦ F(X) + G(X) is submodular, and

P_{F+G} = PF + PG. (26)

Proof: Clearly LHS ⊇ RHS. For the converse, note that q ∈ LHS iff ∀X ∈ L1 ∧ L2, q(X) ≤ F (X) + G(X), i.e., q(X) − F (X) ≤ G(X). According to Theorem 4.9 below, there exists p ∈ RΩ such that q − F ≤ p on L1 and p ≤ G on L2. So we have found q − p ∈ PF and p ∈ PG.

This result is a bit surprising because we do not, although we could, restrict F (or G) on the RHS of (26) to L1 ∧ L2.

Theorem 3.4: Basic set operations preserve submodularity

Let F : L → R be submodular. Then for any S ⊆ Ω

US(X) := F (X ∪ S) defined on {Z ⊆ Ω: Z ∪ S ∈ L}

IS(X) := F (X ∩ S) defined on {Z ⊆ Ω: Z ∩ S ∈ L}

CS(X) := F (S \ X) defined on {Z ⊆ Ω: S \ Z ∈ L}

are all submodular.

However, some caution must also be taken (especially when treating submodularity as discrete convexity).

Example 3.1: Pointwise maximum/minimum does not preserve submodularity

Consider the modular functions defined on 2Ω with Ω := {ω1, ω2, ω3}:

M1(∅) = 0, M1({ω1}) = 1, M1({ω2}) = −1, M1({ω3}) = 1;
M2(∅) = 0, M2({ω1}) = 2, M2({ω2}) = −2, M2({ω3}) = 2.

Taking X = {ω1, ω2} and Y = {ω2, ω3} we verify that M1 ∨ M2 is not submodular, while taking X = {ω1} and Y = {ω2} we see that M1 ∧ M2 is not submodular either.

The disappointment brought by Example 3.1 is that we cannot meaningfully talk about the tightest submodular envelope of a general set function. Nevertheless, we do have the following theorem. Note also that for F1 ∨ F2 : L1 ∧ L2 → R,

P_{F1∨F2} ⊇ conv(P_{F1} ∪ P_{F2}). (27)

Theorem 3.5: Monotonic minimum preserves submodularity

Let F,G : L → R be submodular. If F − G is monotonic, then F ∧ G is submodular.


Proof: Fix two arbitrary sets X,Y ∈ L. Denote H = F ∧ G. If H agrees with, say F on {X,Y }, then

H(X) + H(Y ) = F (X) + F (Y ) ≥ F (X ∪ Y ) + F (X ∩ Y ) ≥ H(X ∪ Y ) + H(X ∩ Y ).

Otherwise assume that H(X) = F (X),H(Y ) = G(Y ) and that F − G is increasing, hence

H(X) + H(Y ) = F (X) + G(Y ) ≥ F (X ∪ Y ) − F (Y ) + F (X ∩ Y ) + G(Y ) ≥ G(X ∪ Y ) − G(Y ) + F (X ∩ Y ) + G(Y ) ≥ H(X ∪ Y ) + H(X ∩ Y ).

The case when F − G is decreasing is proved similarly.

Corollary 3.1: Monotone saturation preserves submodularity

Let F : L → R be monotone, then F is submodular iff ∀c ∈ R, F ∧ c is submodular.

Proof: ⇒: Follows from Theorem 3.5 since F − c is monotone. ⇐: Take c ≥ max_{X∈L} F(X).

Example 3.2: Monotonicity is not essential in Corollary 3.1

Let Ω := {ω1, ω2} and define the non-monotone submodular function F(∅) = 0, F({ω1}) = −1, F({ω2}) = 1, F(Ω) = −100. We notice that F ∧ c is submodular for all c ∈ R. On the other hand, changing F(Ω) to 0 we observe that F becomes modular, but for all −1 < c < 1, F ∧ c is not submodular.

Definition 3.1: (Infimal) convolution

For any functions F : L1 → R and G : L2 → R, we define their convolution F □ G : L1 ∨ L2 → R as

(F □ G)(X) = min{F(Y) + G(Z) − F(∅) − G(∅) : Y ∈ L1, Z ∈ L2, Y ∩ Z = ∅, Y ∪ Z = X}. (28)
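Since (28) is a finite minimization, the convolution can be computed by brute force on small examples. The following sketch (ours) does so for L1 = L2 = 2Ω and reproduces the numbers of Example 3.3 below:

import itertools

def convolve(F, G):
    """Infimal convolution (28) with L1 = L2 = 2^X and F, G centered."""
    def conv(X):
        parts = [frozenset(c) for r in range(len(X) + 1)
                 for c in itertools.combinations(sorted(X), r)]
        return min(F(Y) + G(X - Y) for Y in parts)
    return conv

vals_F = {(): 0, (1,): 3, (2,): 3, (3,): 3, (1, 2): 3, (1, 3): 4, (2, 3): 4, (1, 2, 3): 4}
vals_G = {(): 0, (1,): 3, (2,): 3, (3,): 2, (1, 2): 4, (1, 3): 3, (2, 3): 4, (1, 2, 3): 4}
F = lambda X: vals_F[tuple(sorted(X))]
G = lambda X: vals_G[tuple(sorted(X))]
FG = convolve(F, G)
print(FG(frozenset({1, 2})) + FG(frozenset({1, 3})))  # 6
print(FG(frozenset({1})) + FG(frozenset({1, 2, 3})))  # 7 > 6: not submodular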

Proposition 3.1: Convolution is commutative and associative

F □ G = G □ F, (F □ G) □ H = F □ (G □ H), and (F − F(∅)) □ (G − G(∅)) = F □ G. If F ≤ G are centered submodular on the same lattice, then F □ G = F.

Therefore when discussing the convolution we may assume the functions are centered. Clearly F □ G ≤ F − F(∅) on L1 and similarly F □ G ≤ G − G(∅) on L2, namely the convolution always brings down the function values. Thus P_{F□G} ⊆ PF ∩ PG. The converse is also true.

Theorem 3.6: Polyhedron of the convolution is intersection of polyhedra

Let F : L1 → R and G : L2 → R be centered, then

P_{F□G} = PF ∩ PG. (29)

Proof: We need only prove LHS ⊇ RHS. Indeed, by definition, each defining inequality of LHS is simply a summation of the defining inequalities of RHS.

Quite unfortunately, submodularity in general is not preserved under convolution.


Example 3.3: Convolution of submodular functions need not be submodular

Let L1 = L2 = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}. Let F = {0, 3, 3, 3, 3, 4, 4, 4} and G = {0, 3, 3, 2, 4, 3, 4, 4}. Clearly both F and G are submodular. But F □ G = {0, 3, 3, 2, 3, 3, 4, 4}, which is not submodular: (F □ G)({1, 2}) + (F □ G)({1, 3}) = 3 + 3 < 3 + 4 = (F □ G)({1}) + (F □ G)({1, 2, 3}).

However, things are different when one of the functions is modular.

Theorem 3.7: Submodular convolving modular is submodular

The convolution of a submodular function F : L → R and a modular function M : 2Ω → R, F □ M : 2Ω → R, remains submodular.

Proof: Let X, Y ∈ 2Ω and X1, Y1 ∈ L, X2, Y2 ∈ 2Ω so that X1 ∪ X2 = X, X1 ∩ X2 = ∅, Y1 ∪ Y2 = Y, Y1 ∩ Y2 = ∅, and

(F □ M)(X) + (F □ M)(Y) = F(X1) + M(X2) + F(Y1) + M(Y2)
≥ F(X1 ∪ Y1) + F(X1 ∩ Y1) + M(X2 ∪ Y2) + M(X2 ∩ Y2)
= F(X1 ∪ Y1) + F(X1 ∩ Y1) + M((X ∪ Y) \ (X1 ∪ Y1)) + M((X ∩ Y) \ (X1 ∩ Y1))
≥ (F □ M)(X ∪ Y) + (F □ M)(X ∩ Y),

where the second equality follows from the modularity of M. Note that we need M to be defined on the whole Boolean algebra 2Ω since we have little control over where, say, (X ∪ Y) \ (X1 ∪ Y1) sits.

The requirement that M be defined on all of 2Ω cannot be dropped: consider the simple lattice L = {∅, {1}, {1, 2}, {1, 3}, {1, 2, 3}}, define M as the modular function {0, 0, 1, 1, 2} and F as the submodular function {0, 0, 3, 0, 2}. Then F □ M = {0, 0, 1, 0, 2} is not submodular. Clearly, in the setting of Theorem 3.7, F □ M is the largest centered submodular function that is majorized by both F and M.

Corollary 3.2: Monotonization

Let F : L → R be submodular, then for all X ⊆ Ω,

F↓(X) := min_{L∋Y⊆X} F(Y) (30)

is monotonically decreasing and submodular.

Proof: F↓(X) = (F □ 0)(X) + F(∅), where 0 denotes the zero modular function on 2Ω. Apply Theorem 3.7.

The next theorem provides a decomposition rule for submodular functions.

Theorem 3.8: Restriction and contraction are submodular

If F : L → R is submodular, then ∀S ∈ L,

L_S := {X ∈ L : X ⊆ S}, F_S(Z) := F(Z), (31)
L^S := {X \ S : X ∈ L}, F^S(Z) := F(Z ∪ S) − F(S), (32)

are both submodular, called the restriction and contraction of F w.r.t. the set S, respectively.

For any vector q ∈ RΩ, we denote by q_S the subvector in R^S that restricts to the components in S, and by q^S the subvector in R^{Ω\S} that restricts to the complement of S.

Theorem 3.9: (Sub)base decomposition

Let F : L → R be submodular and S ∈ L. The direct sum p := p_S ⊕ p^S ∈ PF if p_S ∈ P_{F_S} and p^S ∈ P_{F^S}. Moreover, the converse is also true if p_S is additionally maximal, and any two of the following imply the third:

• p is maximal in PF;

• p_S is maximal in P_{F_S};

• p^S is maximal in P_{F^S}.

Proof: First assume p_S ∈ P_{F_S} and p^S ∈ P_{F^S}. For any X ∈ L,

p(X) = p(X ∩ S) + p(X \ S) = p_S(X ∩ S) + p^S(X \ S) ≤ F_S(X ∩ S) + F^S(X \ S) = F(X ∩ S) + F(X ∪ S) − F(S) ≤ F(X).

Hence p ∈ PF. Conversely, let p ∈ PF and p_S(S) = F_S(S) = F(S). For X ∈ L_S, p_S(X) = p(X) ≤ F(X) = F_S(X), while for Y ∈ L^S, p^S(Y) = p(Y ∪ S) − p(S) ≤ F(Y ∪ S) − F(S) = F^S(Y).

Moreover, if p_S and p^S are maximal, then p(Ω) = p_S(S) + p^S(Ω \ S) = F(S) + F(Ω) − F(S) = F(Ω), proving the maximality of p; if p and p^S are maximal, then p_S(S) = p(Ω) − p^S(Ω \ S) = F(Ω) − F(Ω) + F(S) = F(S), proving the maximality of p_S; finally, if p_S and p are maximal, then p^S(Ω \ S) = p(Ω) − p_S(S) = F(Ω) − F(S) = F^S(Ω \ S), proving the maximality of p^S.

Remark 3.1: Decomposition and divide and conquer

Under our definition, for any A, B ∈ L, it makes sense to talk about (F_A)^B only when B ⊆ A, and similarly (F^B)_A only when A ∩ B = ∅. Therefore the notation F_A^B has a unique meaning. Moreover, for any A, B ∈ L, if A ⊆ B, then (F_B)_A = F_A and (F^A)^{B\A} = F^B. For an increasing sequence A1 ⊂ A2 ⊂ · · · ⊂ Ak in L, we have the decomposition (in the sense of Theorem 3.9) F = F_{Ak} ⊕ F^{Ak}, and of course F_{Ak} = (F_{Ak})_{Ak−1} ⊕ (F_{Ak})^{Ak−1} = F_{Ak−1} ⊕ F_{Ak}^{Ak−1}. Recursively this gives

F = F_{A1} ⊕ F_{A2}^{A1} ⊕ F_{A3}^{A2} ⊕ · · · ⊕ F_{Ak}^{Ak−1} ⊕ F^{Ak}. (33)

The importance of the above decomposition is self-evident: divide and conquer now comes into play.

Proposition 3.2: Attainability

Let F : L → R be submodular. For any X ∈ L, there exists a (sub)base b of F such that b(X) = F(X); while for any X ∉ L and any K > 0, there exists a subbase p such that p(X) ≥ K.

Proof: Suppose X ∈ L. Take a base of F_X and a (sub)base of F^X; their direct sum is a (sub)base of F that satisfies our requirement, according to Theorem 3.9. If X ∉ L, by Proposition 2.3 we can find x ∈ X, y ∉ X so that x ∈ Y ∈ L =⇒ y ∈ Y. Thus for any subbase p, p + α(ex − ey) for α large enough satisfies our requirement.

Suppose we have a submodular function F : L → R. Of course F is determined by its values on all sets in L, but is it possible to identify a smaller subset of L, namely a “base” or “subbase”, that nevertheless would still allow us to recover F ? The answer is a pleasant yes, but definitions first.

Definition 3.2: Intersecting family

A family of subsets D of Ω is called intersecting if for all X, Y ∈ D, X ∩ Y ≠ ∅ =⇒ X ∩ Y ∈ D, X ∪ Y ∈ D. The function F : D → R is called intersecting submodular if for all X, Y ∈ D with X ∩ Y ≠ ∅ we have F(X) + F(Y) ≥ F(X ∩ Y) + F(X ∪ Y).


Clearly, an intersecting family D need not be a lattice since for X, Y ∈ D, if X ∩ Y = ∅, then we may not have X ∪ Y ∈ D. We fix this issue by enlarging D with all unions of its elements. Denote this enlargement as Dˇ. Since D is intersecting, we easily verify that Dˇ consists of all disjoint unions of elements in D. Note that we allow taking an empty union, which results in the empty set, and an empty sum is taken to be 0. Recall that a partition is called proper if none of its members is empty.

Theorem 3.10: Extending an intersecting submodular function

Let D be an intersecting family and F : D → R be intersecting submodular. Then Dˇ is a lattice. If for each X ∈ Dˇ,

Fˇ(X) := min{Σ_{A∈A} F(A) : A ⊆ D is a proper partition of X}, (34)

then Fˇ is submodular (on Dˇ) and Fˇ(∅) = 0.

Proof: It is clear that Dˇ is a lattice. Also, ∅ ∈ Dˇ and Fˇ(∅) = 0. By definition, each X ∈ Dˇ is a disjoint union of elements in D, therefore Fˇ is well-defined. Let X, Y ∈ Dˇ and A, B ⊆ D be partitions of X and Y, respectively, so that Fˇ(X) = Σ_{A∈A} F(A) and Fˇ(Y) = Σ_{B∈B} F(B). For any A, B ∈ C := A ∪ B, if A ∩ B ≠ ∅ and A ⊈ B ⊈ A, we replace A with A ∩ B and B with A ∪ B. Since

|A| · |Ω \ A| + |B| · |Ω \ B| > |A ∩ B| · |Ω \ (A ∩ B)| + |A ∪ B| · |Ω \ (A ∪ B)|,

after a finite number of these steps, we must have either A ∩ B = ∅ or A ⊆ B or B ⊆ A for any A, B ∈ C, i.e., a laminar system. All maximal elements in C are disjoint and their union is X ∪ Y, i.e., a partition, while the remaining elements form a partition of X ∩ Y (since during our re-shuffling we never change the total number of elements). Since F is intersecting submodular, we have

Fˇ(X) + Fˇ(Y) = Σ_{A∈A} F(A) + Σ_{B∈B} F(B) ≥ Σ_{C∈C} F(C) ≥ Fˇ(X ∩ Y) + Fˇ(X ∪ Y).

Very pleasantly, if F(∅) ≥ 0, by construction we have

PF = P_{Fˇ}, (35)

and Fˇ is integer valued if F is.

Frequently we will need to center a submodular function F so that F (∅) = 0. This can be achieved by simply subtracting the constant function F (∅) without affecting submodularity. It turns out that there is another fancier way.

Corollary 3.3: Dilworth truncation

For any submodular function F : L → R, its Dilworth truncation ( ) X Fˇ(X) = min F (A): A ⊆ L is a proper partition of X (36) A∈A

is the unique maximal submodular function that vanishes at ∅ and that is majorized by F .
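The Dilworth truncation (36) can likewise be computed by brute force on small lattices; a sketch (ours, with an arbitrary non-centered F on the power set of {1, 2}):

def dilworth(F, L):
    """Brute-force Dilworth truncation (36): minimize the sum of F over proper
    partitions of X into members of L (frozensets); the empty partition gives 0."""
    members = [A for A in L if A]
    def Fc(X):
        if not X:
            return 0.0
        x = next(iter(X))
        return min((F(A) + Fc(X - A) for A in members if x in A and A <= X),
                   default=float('inf'))
    return Fc

vals = {(): 1.0, (1,): 2.0, (2,): 2.0, (1, 2): 2.5}
F = lambda X: vals[tuple(sorted(X))]
L = [frozenset(s) for s in [(), (1,), (2,), (1, 2)]]
Fc = dilworth(F, L)
print([Fc(frozenset(s)) for s in [(), (1,), (2,), (1, 2)]])  # [0.0, 2.0, 2.0, 2.5]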

Recall that we can generate a topology by taking (arbitrary) unions of a basis; Theorem 3.10 is similar in spirit. Quite naturally, one wonders if we can generate a submodular function from a subbase, perhaps by taking intersections first and then unions?

Definition 3.3: Crossing family

A family of subsets D of Ω is called crossing if for all X, Y ∈ D, X ∩ Y ≠ ∅ and X ∪ Y ≠ Ω =⇒ X ∩ Y ∈ D, X ∪ Y ∈ D. The function F : D → R is called crossing submodular if for all X, Y ∈ D with


X ∩ Y ≠ ∅ and X ∪ Y ≠ Ω, we have

F (X) + F (Y ) ≥ F (X ∩ Y ) + F (X ∪ Y ).

As suggested, we enlarge D by taking all intersections of its elements, denoted as Dˆ. We expect Dˆ to be intersecting. Note that we allow taking empty intersections, which result in the full set Ω. To understand the structure of Dˆ, recall that a family of sets A ⊆ 2Ω is called a (proper) copartition of a set S ⊆ Ω if {Ω \ A : A ∈ A} forms a (proper) partition of S. For each x ∈ Ω, define Dx := {X ∈ D : x ∈ X}; then D = ⋃_{x∈Ω} Dx, and any two members of D with nonempty intersection lie in a common Dx. So we need only take intersections within each Dx. Moreover, since any two sets in Dx intersect and D is crossing, we need only take intersections of any two sets in Dx whose union is the full set Ω. In other words, each set X ∈ Dˆx admits a copartition of Ω \ X whose elements come from Dx.

Theorem 3.11: Extending a crossing submodular function

Let D be a crossing family and F : D → R be crossing submodular. Then Dˆ is intersecting. If for each X ∈ Dˆ,

Fˆ(X) := min{Σ_{A∈A} F(A) : A ⊆ D is a proper copartition of Ω \ X}, (37)

then Fˆ is intersecting submodular (on Dˆ) and Fˆ(Ω) = 0.

Proof: For each x ∈ Ω define Ex := {Ω \ X : x ∈ X ∈ D}. Fix any Ω \ Y, Ω \ Z ∈ Ex where x ∈ Y ∈ D, x ∈ Z ∈ D. If (Ω \ Y) ∩ (Ω \ Z) ≠ ∅, meaning Y ∪ Z ≠ Ω, then x ∈ Y ∩ Z ∈ D since D is crossing. Thus (Ω \ Y) ∪ (Ω \ Z) = Ω \ (Y ∩ Z) ∈ Ex and similarly (Ω \ Y) ∩ (Ω \ Z) = Ω \ (Y ∪ Z) ∈ Ex, i.e., Ex is intersecting. Therefore Ěx is a lattice, and so is Dˆx = {Ω \ E : E ∈ Ěx}. Since Dˆ = ⋃_{x∈Ω} Dˆx and any two members of Dˆ with nonempty intersection lie in a common Dˆx, Dˆ is intersecting.

To see that Fˆ is intersecting submodular, it suffices to consider its restriction Fˆx to the lattice Dˆx. Define G : Ex → R by G(X) = F(Ω \ X) for each X ∈ Ex. Since F is crossing submodular, G is intersecting submodular. By Theorem 3.10, Gˇ is submodular on Ěx. As it can be verified that Fˆx(X) = Gˇ(Ω \ X), we know Fˆx is submodular.

Pleasantly, if F(Ω) = 0, then for any b with b(Ω) = 0 and any proper copartition A of Ω \ X: b(X) ≤ Fˆ(X) ⇐⇒ b(X) ≤ Σ_{A∈A} F(A) ⇐⇒ b(Ω \ X) ≥ −Σ_{A∈A} F(A) ⇐⇒ Σ_{A∈A} b(Ω \ A) ≥ −Σ_{A∈A} F(A) ⇐⇒ Σ_{A∈A} b(A) ≤ Σ_{A∈A} F(A), thus

BF = B_{Fˆ}. (38)

More generally, for any given F(Ω), we translate back and forth to get a similar result. Clearly Fˆ is integer valued if F is. On the other hand, if F(Ω) is not given, we may not maintain even the subbase polyhedron: consider Ω = {1, 2, 3}, L = {{1, 2}, {1, 3}, {2, 3}} and F = {1, 1, 1}. Its subbase polyhedron is not even integral, with (1/2, 1/2, 1/2) being a vertex.

4 Greedy Algorithm

We consider the following maximization problem in this section:

σ♭F(w) := max_{b∈BF} ⟨b, w⟩, (39)

namely the support function of the base polyhedron of the centered function F : L → R. Here and throughout we use ⟨·, ·⟩ for the inner product of the underlying space.

Remark 4.1: “Simplification” in the primal

If L is not simple, then P ≠ Ω. By considering b + α(ex − ey), we know (39) is finite only when wx = wy for all x, y ∈ X ∈ P. Therefore we can take the simplification of F, that is, F˜ constructed in Remark 1.3, and consider (at least in theory) the equivalent problem:

max_{b̃∈B_{F˜}} ⟨b̃, w̃⟩,

where for all X ∈ P, w̃({X}) = w({x}) for any x ∈ X, and b̃({X}) = Σ_{x∈X} b({x}).

To motivate the next result, recall from Theorem 2.3 that the base polyhedron is bounded iff L is a simple Boolean algebra; thus for a simple L we still need some assumption on the “weight” vector w so that (39) admits a maximizer. This is achieved by a beautiful result that characterizes the polar cone of the isotonic cone, defined as follows. The ordered set (Ω = P, ⪯) induces the (anti)isotonic (convex) cone

K := {w ∈ RΩ : wx ≥ wy, ∀x ⪯ y}, (40)

which is generating as it clearly has nonempty interior. Our goal is to characterize its polar cone: K◦ := {z ∈ RΩ : ⟨w, z⟩ ≤ 0, ∀w ∈ K}. Recall that a subset Y of Ω is called an ideal if x ⪯ y ∈ Y =⇒ x ∈ Y.

Theorem 4.1: Characterizing the polar cone

K = {w ∈ RΩ : ⟨w, ey − ex⟩ ≤ 0, ∀y that is an immediate successor of x}, (41)
K◦ = {z ∈ RΩ : z(Ω) = 0, z(X) ≤ 0, ∀ proper ideal X in (Ω, ⪯)} =: 0⁺BF. (42)

Proof: The first equality is standard. Thus K◦ consists of all conic combinations of the vectors ey − ex, each of which clearly belongs to the right-hand side, meaning LHS ⊆ RHS in (42). For the other direction, we slightly abuse the notation to let

w1 > w2 > ··· > wk (43)

be the distinct elements of w ∈ K. Define for each i ∈ {1, . . . , k},

Ai := {x ∈ Ω: wx ≥ wi}. (44)

Thanks to the monotonicity of w, Ak = Ω and all other Ai’s are proper ideals. Let A0 = ∅, then for any fixed z ∈ RHS,

⟨w, z⟩ = Σ_{x∈Ω} wx zx = Σ_{i=1}^k wi Σ_{x∈Ai\Ai−1} zx = Σ_{i=1}^k wi (z(Ai) − z(Ai−1)) = Σ_{i=1}^{k−1} (wi − wi+1) z(Ai) ≤ 0,

where we have used the fact that z(Ω) = 0 and z(Ai) ≤ 0. Therefore RHS ⊆ LHS too.

Note that (41) is a minimal characterization while (42) may contain redundant constraints. Indeed, we need only consider all principal ideals (since any ideal is a union of principal ideals when |Ω| < ∞).

Now we are ready to discuss when (39) has a maximizer.

Theorem 4.2: Existence of maximizer: Base polyhedron

Let L be simple, then problem (39) has a solution iff BF ≠ ∅ and w : Ω → R is monotonically decreasing from (Ω, ⪯) to R (equipped with its usual order), namely w ∈ K.

Proof: Problem (39) is bounded from above iff w is in the polar of the recession cone of the base polyhedron, that is, ⟨b, w⟩ ≤ 0 for all b ∈ {b ∈ RΩ : b(Ω) = 0, b(X) ≤ 0 for all proper ideals X in (Ω, ⪯)}. By Theorem 4.1, w ∈ (K◦)◦ = K, where the isotonic cone K is defined in (40).


Note that Theorem 4.2 is mostly about the underlying lattice: we have not used any information about the function F except the nonemptiness of its base polyhedron. In particular, when L is a simple Boolean algebra, the ordered set (Ω, ⪯) is trivial, hence no assumption on the weight vector w is needed, matching the conclusion we have drawn from Theorem 2.3. Before continuing, let us point out an easy observation about the support function of the subbase polyhedron:

σF(w) := max_{p∈PF} ⟨p, w⟩. (45)

Theorem 4.3: Existence of maximizer: Subbase polyhedron

Let L be simple. (45) has a maximizer iff w ∈ K ∩ RΩ₊, whose polar is K◦ + RΩ₋ = 0⁺PF. If F is submodular, (45) reduces to (39), and for any t ≤ min_i wi,

σ♭F(w) = t · F(Ω) + σF(w − t1). (46)

Proof: (45) has a maximizer iff w ∈ (0⁺PF)◦, with the recession cone 0⁺PF := {z ∈ RΩ : z(X) ≤ 0 for all ideals X in (Ω, ⪯)}. Thus all we need is to prove K◦ + RΩ₋ = 0⁺PF. Thanks to Theorem 4.1, LHS ⊆ RHS is clear. Fix any w ∈ K ∩ RΩ₊; a similar argument as in the proof of Theorem 4.1 yields ⟨w, z⟩ ≤ 0 for all z ∈ 0⁺PF. Therefore K ∩ RΩ₊ ⊆ (0⁺PF)◦ =⇒ K◦ + RΩ₋ ⊇ 0⁺PF. If F is submodular, w.l.o.g. we assume F(∅) ≥ 0. By Proposition 2.5, p ∈ PF ⇐⇒ ∃b ∈ BF such that p ≤ b. The rest follows from Theorem 4.2.

Edmonds [1970] showed that for submodular functions there is a simple greedy algorithm that computes the support functions (39) and (45).

Algorithm 4.1: Greedy algorithm for (39)

• Find a maximal increasing sequence ∅ = S0 ⊂ S1 ... ⊂ Sn = Ω that contains each Ai in (44);

• Set b⋆(Si \ Si−1) = F(Si) − F(Si−1).

The algorithm is greedy in the following sense: since b⋆ is kept tight on Si−1, the constraint at Si forces b⋆(Si \ Si−1) = b⋆(Si) − b⋆(Si−1) ≤ F(Si) − F(Si−1), and we go for this upper bound at every step (when Si \ Si−1 ∈ L, e.g., L = 2Ω, one also has b(Si \ Si−1) ≤ F(Si \ Si−1) ≤ F(Si) − F(Si−1) + F(∅) for every b ∈ BF, whose constant term F(∅) we neglect).
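For the special case L = 2Ω, Algorithm 4.1 amounts to visiting the elements in decreasing order of weight; a Python sketch (ours), tested on Example 2.9 against brute-force enumeration of the permutahedron's extreme points:

import itertools

def greedy_base(F, order):
    """Algorithm 4.1 on L = 2^Omega: visit elements in decreasing-weight order
    and set b[x] = F(S + {x}) - F(S). Returns a maximizer of (39)."""
    b, S = {}, frozenset()
    for x in order:
        T = S | {x}
        b[x] = F(T) - F(S)
        S = T
    return b

# Example 2.9: F(X) = g(|X|); the greedy output is a permutation of (n, ..., 1).
n = 4
F = lambda X: sum(n - i + 1 for i in range(1, len(X) + 1))
w = {1: 0.9, 2: 0.1, 3: 0.7, 4: 0.4}
order = sorted(w, key=w.get, reverse=True)
b = greedy_base(F, order)
print(b)  # {1: 4, 3: 3, 4: 2, 2: 1}
# Sanity check against brute force over the extreme points (permutations):
vals = [n - i for i in range(n)]  # (4, 3, 2, 1)
best = max(sum(w[x] * vals[i] for i, x in enumerate(p))
           for p in itertools.permutations(order))
assert abs(sum(w[x] * b[x] for x in w) - best) < 1e-12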

Theorem 4.4: Correctness of Algorithm 4.1

Let L be simple and w be monotonically decreasing from (Ω, ⪯) to R. Algorithm 4.1 always returns a maximizer of (39) iff F is submodular and F(∅) ≥ 0.

Proof: Let b⋆ be the solution returned by Algorithm 4.1 and b ∈ BF be arbitrary. Then

⟨b − b⋆, w⟩ = Σ_{x∈Ω} wx (bx − b⋆x) = Σ_{i=1}^k wi Σ_{x∈Ai\Ai−1} (bx − b⋆x)
= Σ_{i=1}^k wi (b(Ai) − b(Ai−1) − b⋆(Ai) + b⋆(Ai−1))
= Σ_{i=1}^k wi (b(Ai) − b(Ai−1) − F(Ai) + F(Ai−1))
= Σ_{i=1}^{k−1} (wi − wi+1) (b(Ai) − F(Ai)) ≤ 0. (47)

Therefore we need only prove that b⋆ ∈ BF iff F is submodular. The correctness of Algorithm 4.1 implies b⋆ ∈ BF. For any incomparable X, Y ∈ L, let {Si}_{i=0}^n


be a maximal increasing sequence that contains X ∩ Y, X ∪ Y and of course each Ai, i = 1, . . . , k. Thus F(X) + F(Y) ≥ b⋆(X) + b⋆(Y) = b⋆(X ∩ Y) + b⋆(X ∪ Y) ≥ F(X ∩ Y) + F(X ∪ Y).

For the reverse direction, we note that F_{Ai}^{Ai−1} is defined on {X \ Ai−1 : L ∋ X ⊆ Ai} and sends Z = X \ Ai−1 to F(Z ∪ Ai−1) − F(Ai−1), see Theorem 3.8 and Remark 3.1. We claim that b⋆_{Ai\Ai−1} is a base of F_{Ai}^{Ai−1}. Indeed, let S_{j0} = Ai−1, S_{j_{m+1}} = Ai, and Z = {z_{j1}, . . . , z_{jm}} with z_{jt} = S_{jt} \ S_{jt−1}, t = 1, . . . , m. Since L is simple and {Sj}_{j=0}^n is maximal, by Theorem 1.3 |Sj \ Sj−1| = 1 for all 1 ≤ j ≤ n. Thanks to the submodularity of F, for all m ≥ t ≥ 1,

F(S_{j0} ∪ {z_{j1}, . . . , z_{jt}}) + F(S_{jt−1}) ≥ F(S_{j0} ∪ {z_{j1}, . . . , z_{jt−1}}) + F(S_{jt}).

Here notice that S_{j0} ∪ {z_{j1}, . . . , z_{jt}} ∈ L implies S_{j0} ∪ {z_{j1}, . . . , z_{jt−1}} ∈ L. Thus summing from t = m to t = 1 we obtain

F(S_{j0} ∪ {z_{j1}, . . . , z_{jm}}) − F(S_{j0}) ≥ Σ_{t=1}^m [F(S_{jt}) − F(S_{jt−1})],

that is, F(Ai−1 ∪ Z) − F(Ai−1) ≥ b⋆(Z). By definition b⋆(Ai \ Ai−1) = F(Ai) − F(Ai−1). So we have proved that b⋆_{Ai\Ai−1} is a base of F_{Ai}^{Ai−1}. Since 1 ≤ i ≤ k is arbitrary, we know from Remark 3.1 that b⋆ ∈ BF.

Theorem 4.5: Characterizing the maximizer by base decomposition

If L is simple, F : L → R is submodular, and w is monotonically decreasing from (Ω, ⪯) to R, then the maximizers of (39) are given by

B_{F_{A1}} ⊕ B_{F_{A2}^{A1}} ⊕ · · · ⊕ B_{F_{Ak}^{Ak−1}} ⊕ B_{F^{Ak}}, (48)

where the sets {Ai}_{i=1}^k are defined in (44).

Proof: It is clear from the inequality (47) that any vector of the form (48) is a maximizer for (39). On the other hand, if b is a maximizer, then for all x ∈ Ai we have dep(b, x) ⊆ Ai, for otherwise b + α(ex − ey) with any y ∈ dep(b, x) \ Ai and small positive α gives a larger objective than b (note that wx > wy). Thus Ai = ⋃_{x∈Ai} dep(b, x). By definition b(dep(b, x)) = F(dep(b, x)), hence by Proposition 2.4 b(Ai) = F(Ai). Due to feasibility b is a base of F, and applying Remark 3.1 completes our proof.

Corollary 4.1: Uniqueness of maximizer

Let L be simple and F be submodular. (39) has a unique solution iff F (∅) ≥ 0 and w ∈ K is 1-1.

Proof: If w is not 1-1, then one of B_{F_{Ai}^{Ai−1}}, i = 1, . . . , k, will have multiple bases.

Corollary 4.2: Extreme points of base polyhedron

Let L be simple and F be submodular. b ∈ BF is an extreme point iff F (∅) ≥ 0 and there is a maximal increasing sequence ∅ = S0 ⊂ S1 ⊂ · · · ⊂ Sn = Ω in L so that b(Si \Si−1) = F (Si)−F (Si−1) for all i = 1, . . . , n. Proof: If b is an extreme point, then any of its supporting hyperplanes will give us the desired maximal increasing sequence. On the other hand, given the maximal sequence, we construct a 1-1 w so that its associated sets Ai := {x ∈ Ω: wx ≥ wi} = Si. According to Corollary 4.1 b is a unique solution hence an extreme point.


Theorem 4.6: Characterizing the maximizer by local optimality

If L is simple, F : L → R is submodular with F(∅) ≥ 0, and w is monotonically decreasing from (Ω, ⪯) to R, then b ∈ BF is a maximizer of (39) iff for all x, y ∈ Ω, y ∈ dep(b, x) =⇒ wy ≥ wx.

Proof: We have seen the necessity in the proof of Theorem 4.5: simply consider b + α(ex − ey). For the sufficiency, note that the given condition implies that for the set Ai defined in (44), we have Ai = ⋃_{x∈Ai} dep(b, x), hence b(Ai) = F(Ai) by Proposition 2.4. Appeal to Theorem 4.5.

Recall that the tangent cone at the point b in a set C ⊆ RΩ is defined as cl(⋃_{λ>0} λ(C − {b})) while the normal cone is defined as {z ∈ RΩ : ⟨z, w − b⟩ ≤ 0, ∀w ∈ C}. Clearly the tangent cone and the normal cone are polar to each other.

Corollary 4.3: Tangent cone of base polyhedron

Let L be simple and F be submodular with F(∅) ≥ 0. For any b ∈ BF, its tangent cone is the set of conic combinations of 1x − 1y for all x ∈ Ω, y ∈ dep(b, x).

Proof: From convex analysis we know that b is optimal for (39) iff w is in the normal cone at b, while from Theorem 4.6 we know b is optimal iff w is in the polar of the cone spanned by 1x − 1y for all x ∈ Ω, y ∈ dep(b, x).

To discuss the monotonicity of a submodular function F defined on a simple lattice L, let us define for each x ∈ Ω, Ix := ⋂{X ∈ L : x ∈ X}, i.e., the (union of the) principal ideal of [x] in (P, ⪯), and let

∀x ∈ Ω, ux := F (Ix) − F (Ix \{x}). (49)

Note that x is a maximal element in Ix, hence Ix \{x} is an ideal therefore belongs to L. Since F is submodular, for all X ∈ L so that x ∈ X and X \{x} ∈ L, we have

F (X) − F (X \{x}) ≤ F (Ix) − F (Ix \{x}) = ux.

Thanks to Corollary 4.2, u is the least upper bound of all extreme points, hence all points, in BF. In particular, for all X ∈ L, F(X) ≤ u(X) since there exists a base b with b(X) = F(X), see Proposition 3.2. Similarly, we define I^x := ⋃{X ∈ L : x ∉ X} and

∀x ∈ Ω, ℓx := F(I^x ∪ {x}) − F(I^x), (50)

i.e., ℓ is the greatest lower bound of all extreme points, hence all points, in BF.

Proposition 4.1: Characterizing monotonicity

Let L be simple and F : L → R be submodular, then F is (strictly) increasing iff ℓ is (strictly) positive, and (strictly) decreasing iff u is (strictly) negative.

Proof: We prove the if part for the decreasing case; the others are similar. Let X ⊂ Y, both in L, and let ω ∈ Ω be a maximal element in Y \ X. Since X is an ideal, ω is in fact a maximal element of Y, thus Y \ {ω} is an ideal, hence in L. Then F(Y) − F(Y \ {ω}) ≤ uω ≤ 0. Continue removing elements until we reach X.

Proposition 4.2: Characterizing modularity

Let L be simple and F : L → R be centered submodular, then F is modular iff BF has one and only one extreme point iff there exists a base b ∈ BF such that b = ℓ or b = u.

Proof: If BF has exactly one extreme point b, then for all X ∈ L, F(X) = σ♭F(1X) = b(X), a modular function. On the other hand, if F is modular, any b ∈ BF coincides with ℓ (or u) by definition.


In fact, each centered modular function M can be identified with its unique extreme base b so that M(X) = b(X) for all X ∈ L.

Remark 4.2: Monotone-Modular decomposition

Here we show how to decompose a submodular function into the difference of two increasing submodular functions. First observe that the map F ↦ ℓ, see (50), is clearly linear. Therefore, for all X in a simple lattice L,

F(X) = [F(X) − M(X)] + M(X), (51)

where the first part is increasing submodular and the second part is decreasing modular, provided that we pick a modular function M whose (unique) extreme base is dominated by both the lower bound ℓ of F and 0. It is obvious that we can make both parts strictly monotonic. Additionally, if F is centered and integer valued, then the decomposition can be made of two polymatroid functions, that is, monotonically increasing centered submodular functions which take integral values. Besides, if we are interested in minimizing F, then we know the minimizer may not be written as the union of two disjoint sets X, Y ∈ L such that, say, ℓy ≥ 0 for all y ∈ Y; after all, ℓy ≥ 0 intuitively means that F is increasing along the “coordinate” y. Indeed, following the proof of Proposition 4.1, removing each element of Y can only decrease F. Note that it may well happen that the minimizer still contains some coordinate y with ℓy ≥ 0, as long as we do not have a partition of the minimizer in L: take L = {∅, {1}, {1, 2}, {1, 2, 3}} and consider F = {2, 1, 2, 0}.

Remark 4.3: The greedy Algorithm 4.1 is primal-dual

The reasoning in the proof of Theorem 4.4 actually shows that the greedy Algorithm 4.1 is primal-dual. Indeed, consider the dual problem of (39):

min_λ Σ_{X∈L} λX F(X) (52)
s.t. ∀ L ∋ X ≠ Ω, λX ≥ 0, (53)
∀x ∈ Ω, wx = Σ_{L∋X∋x} λX. (54)

Let us set for each Ai in (44),

λ⋆_{Ai} = wi − wi+1, (55)

with the understanding that wk+1 = 0, and for all other X ∈ L, λ⋆X = 0. Clearly (53) is satisfied since wi > wi+1 by definition. Moreover, for each x ∈ Ω there is some j with wx = wj = Σ_{i≥j} (wi − wi+1) = Σ_{L∋X∋x} λ⋆X, satisfying (54). Finally,

⟨b⋆, w⟩ = Σ_{i=1}^k wi (b⋆(Ai) − b⋆(Ai−1)) = Σ_{i=1}^k (wi − wi+1) F(Ai) = Σ_{X∈L} λ⋆X F(X). (56)

Therefore by duality λ⋆ defined in (55) is indeed a minimizer of the dual problem (52). In particular, if w is integral, so is the dual minimizer λ⋆.
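A numerical sanity check of (55)-(56) (ours), reusing the L = 2Ω data from the sketch after Algorithm 4.1; the dual objective (52) built from λ⋆ matches the primal value ⟨b⋆, w⟩:

n = 4
F = lambda X: sum(n - i + 1 for i in range(1, len(X) + 1))
w = {1: 0.9, 2: 0.1, 3: 0.7, 4: 0.4}
levels = sorted(set(w.values()), reverse=True)               # w_1 > ... > w_k
A = [frozenset(x for x in w if w[x] >= t) for t in levels]   # the sets A_i in (44)
lam = {A[i]: levels[i] - (levels[i + 1] if i + 1 < len(levels) else 0.0)
       for i in range(len(levels))}                          # (55), with w_{k+1} = 0
dual = sum(l * F(X) for X, l in lam.items())                 # objective (52)
print(round(dual, 12))  # 6.6, matching the primal value <b*, w> above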

Remark 4.4: “Simplification” in the dual

The simplification we mentioned in Remark 4.1 extends to the dual problem (52)-(54). In fact, the dual problem of max{⟨w, b⟩ : b ∈ BF}, when finite, is exactly the dual of max{⟨w̃, b̃⟩ : b̃ ∈ B_{F˜}}, where F˜ is the simplification constructed in Remark 4.1.


The above primal-dual result is in fact way more far-reaching than it appears to be. We need some background first. Recall that the linear system P := {z ∈ RΩ : A1z ≤ s1, A2z = s2} is called box totally dual integral (box-TDI) if for any ℓ, u ∈ RΩ and any integral c ∈ ZΩ, the dual problem of max{⟨c, z⟩ : z ∈ P, ℓ ≤ z ≤ u} always has an integral solution (whenever there exists one). The matrix A is called totally unimodular (TUM) if the determinants of all its square submatrices are either 0 or ±1. We will thoroughly discuss the many consequences of box-TDI and TUM later. For now, the following two propositions, also due to Edmonds [1970], are enough.

Proposition 4.3: TUM support in the dual implies box-TDI

Consider the linear system P := {z ∈ RΩ : A1z ≤ s1, A2z = s2} for some matrix A = [A1; A2] and vector s = [s1; s2]. If for any w ∈ RΩ, the problem max{⟨z, w⟩ : z ∈ P} has an optimal dual minimizer λ⋆ = [λ⋆1; λ⋆2] such that the rows of A corresponding to the support of λ⋆ form a totally unimodular submatrix, then the system P is box totally dual integral.

Proof: Fix any ℓ, u ∈ RΩ, c ∈ ZΩ and consider the primal problem

max{⟨z, c⟩ : z ∈ P, ℓ ≤ z ≤ u},

whose dual is

min{⟨λ1, s1⟩ + ⟨λ2, s2⟩ − ⟨α, ℓ⟩ + ⟨β, u⟩ : A1⊤λ1 + A2⊤λ2 − α + β = c, λ1, α, β ≥ 0}. (57)

Fix any minimizer (α⋆, β⋆) of (57); with α⋆, β⋆ ≥ 0 fixed, (57) is the dual problem of

max{⟨z, c + α⋆ − β⋆⟩ − ⟨α⋆, ℓ⟩ + ⟨β⋆, u⟩ : z ∈ P}.

Therefore by assumption the minimizer λ⋆ = [λ⋆1; λ⋆2] in (57) can be chosen so that the rows of A = [A1; A2] corresponding to the support of λ⋆ form a TUM submatrix. Call this submatrix Ã = [Ã1; Ã2] and denote λ̃i the restriction of λ⋆i to the support of λ⋆i; then (α⋆, β⋆, λ̃1, λ̃2) is feasible for

min{⟨λ̃1, s̃1⟩ + ⟨λ̃2, s̃2⟩ − ⟨α, ℓ⟩ + ⟨β, u⟩ : Ã1⊤λ̃1 + Ã2⊤λ̃2 − α + β = c, λ̃1, α, β ≥ 0}. (58)

Clearly the constraints in (58) are TUM, thus it has an integral minimizer, which, after padding zeros (in λ̃), must also be optimal for (57) (since it has no larger objective than (α⋆, β⋆, λ⋆), a minimizer of (57)). Thus (57) has an integral minimizer, meaning that P is box-TDI.

The proof technique, although straightforward, is very useful and will reappear a few times later. We remark that box-TDI is a property of the system parameterized by A and s; it is not entirely, although close to, a property of the underlying polyhedron.

A collection F of subsets of Ω is called a laminar system if for all X, Y ∈ F we have either X ⊆ Y or Y ⊆ X or X ∩ Y = ∅.

Proposition 4.4: Incidence of two laminar systems is TUM

Let F = F_1 ∪ F_2 be the union of two laminar systems F_1, F_2. Then the incidence matrix A ∈ R^{Ω×F}, defined as A_{ω,X} = 1 if ω ∈ X and 0 otherwise, is totally unimodular.
Proof: Let B be any square submatrix of A. Note that a subset of a laminar system is clearly laminar. Pick any column of B indexed by a set X in, say, F_1. For every other column of B indexed by a set in F_1 containing X, we replace that set by its difference with X (i.e., subtract the column of X). This changes neither the determinant of B nor the laminarity of the system. Repeating, we can assume the sets in F_1, and likewise those in F_2, are pairwise disjoint. Therefore each row of B contains at most two 1's. If there exists an all-zero row, then det(B) = 0, while if there exists a row with a single 1, we expand along it and induct. Finally, if every row has two 1's, then they must come one from F_1 and one from F_2; hence the sum of the columns indexed by F_1 equals the sum of the columns indexed by F_2, and again det(B) = 0.

December 14, 2015 revision: 1 SADO_main 23 4 Greedy Algorithm

This proposition is clearly false for more than two laminar systems: take the matrix

⎡1 1 0⎤
⎢1 0 1⎥
⎣0 1 1⎦

whose determinant is −2.
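Proposition 4.4 is easy to sanity-check by brute force on a tiny ground set; the Python sketch below enumerates every square submatrix of the incidence matrix (the two laminar families `F1`, `F2` are hypothetical examples).

```python
from itertools import combinations

def det(M):
    # Laplace expansion along the first row; fine for tiny matrices.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

Omega = [1, 2, 3, 4]
F1 = [{1}, {1, 2}, {1, 2, 3, 4}]     # a laminar family
F2 = [{3}, {4}, {3, 4}]              # another laminar family
Fam = F1 + F2
A = [[1 if v in X else 0 for X in Fam] for v in Omega]

ok = True
for k in range(1, min(len(A), len(Fam)) + 1):
    for rows in combinations(range(len(A)), k):
        for cols in combinations(range(len(Fam)), k):
            ok &= det([[A[i][j] for j in cols] for i in rows]) in (-1, 0, 1)
print("totally unimodular:", ok)     # True, as Proposition 4.4 predicts
```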

Theorem 4.7: (Sub)base system is box-TDI

The (sub)base system of a submodular function is box totally dual integral.
Proof: Thanks to Remark 4.4, we can assume the underlying lattice is simple. According to Remark 4.3, the dual problem (52) admits a minimizer λ* whose support is an increasing sequence in L, in particular a laminar system. As a simple consequence of Proposition 4.4, the rows corresponding to the support form a TUM submatrix. Apply Proposition 4.3. Thanks to Theorem 4.3, the same conclusion for the subbase polyhedron is clear.

Theorem 4.7 can be further improved; after all, we have not used the full power of Proposition 4.4. Note that we do not discuss the intersection of two base systems since it can easily be empty, unless the functions coincide at the ground set Ω, in which case Theorem 4.8 and Corollary 4.4 continue to hold with essentially the same proof.

Theorem 4.8: Intersection of two subbase systems is box-TDI

The intersection of the subbase systems of two submodular/supermodular functions (not necessarily defined on the same lattice) is box totally dual integral. Proof: We prove the case for two submodular functions; the other cases are completely analogous (recall that the subbase system for a supermodular function is defined by reversing the inequalities, without taking negation of the set). The dual problem is

min { Σ_{i=1}^2 Σ_{X∈L_i} λ_i(X) F_i(X) : ∀ X ∈ L_i, λ_i(X) ≥ 0; ∀ x ∈ Ω, Σ_{i=1}^2 Σ_{x∈X∈L_i} λ_i(X) = w_x }.    (59)

Fix any minimizer λ* of (59), and define α_i ∈ R^Ω by (α_i)_x = Σ_{x∈X∈L_i} λ*_i(X). Consider the restricted subproblem, for i = 1, 2:

min { Σ_{X∈L_i} λ_i(X) F_i(X) : ∀ X ∈ L_i, λ_i(X) ≥ 0; ∀ x ∈ Ω, Σ_{x∈X∈L_i} λ_i(X) = (α_i)_x }.    (60)

As before, we assume w.l.o.g. that the lattices L_i are simple. Since F_i is submodular, from Remark 4.3 we can choose a minimizer λ̃_i for (60) whose support is an increasing sequence in L_i. Clearly (λ̃_1, λ̃_2) is also optimal for the dual problem (59), proving the existence of a dual minimizer whose support is the union of two increasing sequences. Apply Proposition 4.3 and Proposition 4.4.


Corollary 4.4: Min-max duality

Let F : L_1 → R and G : L_2 → R be centered submodular; then for any X ∈ L_1 ∨ L_2 we have

max{p(X) : p ∈ P_F ∩ P_G} = (F □ G)(X),    (61)

where □ denotes the convolution. Moreover, if F and G are integer valued, then we can choose an integral maximizer on the LHS.

Proof: Clearly the LHS equals the optimal value of the dual problem (59) with w = 1_X. According to Theorem 4.8, we have an integral minimizer for the dual, but this can only happen when λ_1(Y) = 1 and λ_2(X \ Y) = 1 for some Y ∈ L_1, X \ Y ∈ L_2, Y ⊆ X, and the λ_i vanish on all other sets. When F, G are integer valued, the box-TDI property of the system P_F ∩ P_G implies it is integral.


The result is not entirely trivial since F □ G need not be submodular.

We need a handy construction that sends a submodular function to a supermodular one, and vice versa.

Definition 4.1: Negation

For any centered function F : L → R, we associate with it the negated function F¬ : L̄ → R, where

L̄ := {Ω \ X : X ∈ L},    (62)
F¬(Z) := F(Ω) − F(Ω \ Z).    (63)

One easily verifies that indeed F is submodular iff F¬ is supermodular, and F is supermodular iff F¬ is submodular.
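These claims are quickly verified numerically; a minimal Python sketch on the full lattice 2^Ω with a hypothetical concave-of-cardinality F (so F is submodular and centered):

```python
from itertools import chain, combinations

Omega = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]

F = {S: len(S) ** 0.5 for S in subsets}                 # submodular, F(empty) = 0
Fneg = {S: F[Omega] - F[Omega - S] for S in subsets}    # negation (63)

sub = all(F[X | Y] + F[X & Y] <= F[X] + F[Y] + 1e-12
          for X in subsets for Y in subsets)
sup = all(Fneg[X | Y] + Fneg[X & Y] >= Fneg[X] + Fneg[Y] - 1e-12
          for X in subsets for Y in subsets)
dbl = all(abs((F[Omega] - Fneg[Omega - S]) - F[S]) < 1e-12 for S in subsets)
print(sub, sup, dbl)   # True True True: F submodular, F-neg supermodular, (F-neg)-neg = F
```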

Remark 4.5: Reversed order

As shown in Theorem 1.2, L consists of all lower ideals of the underlying ordered set (P, ⪯). Not surprisingly, L̄ consists of exactly all upper ideals of the same ordered set. Or we can treat L̄ as the set of all lower ideals of the reversed ordered set (P, ⪯̃), i.e., x ⪯̃ y ⟺ y ⪯ x. Therefore, if K is the isotonic cone of L, then the isotonic cone of L̄ is simply −K.

Proposition 4.5: Double negation cancels

For any centered F : L → R, B_F = B_{F¬} and (F¬)¬ = F.
Proof: One first verifies that the double complement of L is L itself; hence for X ∈ L, (F¬)¬(X) = F¬(Ω) − F¬(Ω \ X) = F(Ω) − F(∅) − F(Ω) + F(X) = F(X), since we assume F(∅) = 0.
For any b ∈ B_F and X ∈ L, with Z = Ω \ X, b(X) = b(Ω \ Z) = b(Ω) − b(Z) ≤ F(X) = F(Ω) − F¬(Z). Since F(Ω) = b(Ω), we have F¬(Z) ≤ b(Z).

The next theorem, due to Frank [1982], resembles a classic result in convex analysis; the real difference lies in its integrality part.

Theorem 4.9: Discrete sandwich theorem

Let F : L_1 → R be submodular and G : L_2 → R be supermodular. If G(X) ≤ F(X) for all X ∈ L_1 ∧ L_2, then there exists p* ∈ R^Ω so that G(X) ≤ p*(X) for all X ∈ L_2, while p*(X) ≤ F(X) for all X ∈ L_1. Moreover, if F and G are integer valued, we can choose p* ∈ Z^Ω.
Proof: By translation we can assume w.l.o.g. F(∅) = 0 and G(∅) ≤ 0. Redefine G(∅) = 0, and note that this ruins neither the supermodularity of G nor the theorem. By Corollary 4.4 we have

max{p(Ω) : p ∈ P_F ∩ P_{G¬}} = min{F(Y) + G¬(Ω \ Y) : Y ∈ L_1, Ω \ Y ∈ L̄_2}
 = min{F(Y) + G(Ω) − G(Y) : Y ∈ L_1 ∩ L_2}    (64)
 = G(Ω),

since F ≥ G on L_1 ∧ L_2 and they are centered. Any maximizer for the LHS, say p*, satisfies our requirement. Indeed, by feasibility, p*(X) ≤ F(X) for all X ∈ L_1, and for all X ∈ L_2, p*(Ω \ X) ≤ G¬(Ω \ X) = G(Ω) − G(X), that is, p*(X) ≥ G(X) since p*(Ω) = G(Ω). When F, G are integer valued, we can choose an integral p*, according to Corollary 4.4.

Of course, we could have assumed G ≥ F on L1 ∧ L2 and get a similar result.


5 The Lovász Extension

Motivated by the greedy Algorithm 4.1, in this section we extend a set function F : L → R to f : K → R, where K ⊆ R^Ω is the (anti)isotonic cone defined in (40). The result, generally known as the Lovász extension, appeared in Lovász [1982], but a more general theory had been developed earlier by Choquet [1954]. The following convention will be adopted.

Remark 5.1: Centering and simplification

If L is not simple, we first identify (F, L) with its "simplification" (F̃, L̃), which is defined in Remark 4.1. In case F(∅) ≠ 0, we first translate to F − F(∅), compute the Lovász extension, and finally add F(∅) back to the extension.

For any w ∈ K, as before we identify its distinct values w_1 > w_2 > ··· > w_k and define A_i := {x ∈ Ω : w_x ≥ w_i}. Then we have the decomposition

w = Σ_{i=1}^k λ_i · 1_{A_i},    (65)

where, with the understanding that w_{k+1} = 0,

λ_i := w_i − w_{i+1}    (66)

is uniquely determined, thanks to the monotonicity of w. Next, we define the Lovász extension of F as

f(w) = Σ_{i=1}^k λ_i · F(A_i) = Σ_{i=1}^k (w_i − w_{i+1}) · F(A_i) = Σ_{i=1}^k w_i · (F(A_i) − F(A_{i−1})),    (67)

with the understanding that F(A_0) = F(∅) = 0. Clearly, f is a positively homogeneous real-valued function defined on the isotonic cone K. More importantly, f is an "extension" of F in the following sense.

Theorem 5.1: Lovász extension is an extension

Consider F : L → R and f its Lovász extension. For all X ∈ L, f(1_X) = F(X).

Proof: We first show that 1_X ∈ K for all X ∈ L. Indeed, fix any y ⪯ x. If x ∈ X, then y ∈ X since each X in the simple lattice L is an ideal; thus 1_X(y) ≥ 1_X(x), which is also true when x ∉ X. Finally, observing that for w = 1_X we have λ_i = 0 for all i < k, λ_k = 1, and A_k = X completes the proof.
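On the full lattice L = 2^Ω, formula (67) is a few lines of code for nonnegative w; the Python sketch below (with a hypothetical cut-type F) evaluates f by sorting the distinct values of w and checks Theorem 5.1 on an indicator vector.

```python
def lovasz_extension(F, w):
    """Evaluate (67) for w >= 0: telescope F along the level sets
    A_i = {x : w_x >= w_i}, with the distinct values w_1 > ... > w_k."""
    levels = sorted(set(w.values()), reverse=True)
    total, A = 0.0, set()
    for i, wi in enumerate(levels):
        A |= {x for x, v in w.items() if v == wi}
        w_next = levels[i + 1] if i + 1 < len(levels) else 0.0
        total += (wi - w_next) * F(A)
    return total

def F(S):                         # hypothetical cut function on Omega = {1,2,3}
    return len(S) * (3 - len(S))  # |S| * |Omega \ S|, submodular and centered

w = {1: 1.0, 2: 1.0, 3: 0.0}      # w = 1_X for X = {1, 2}
print(lovasz_extension(F, w), F({1, 2}))   # both 2.0, i.e. f(1_X) = F(X)
```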

Theorem 5.2: Linearity

The map that sends F : L → R to its Lovász extension f : K → R is linear and 1-1, but not onto the space of positively homogeneous functions (unless |Ω| = 1).
Proof: The linearity is clear, while the 1-1 property follows from Theorem 5.1. Lastly, the space of all set functions from L to R is of finite dimension, while the space of all positively homogeneous f : K → R is of infinite dimension (whenever |Ω| ≥ 2).

An explicit counterexample: take f(w) = max{w_1, w_2 + w_3}. It is not clear to us what the Lovász extension of the convolution is.

Similarly, the map that sends a submodular function to its Lovász extension is not onto the space of all sublinear functions. A complete characterization of the Lovász extension will be given in Corollary 6.2 below. For now, let us observe that we do have a nice 1-1 correspondence for modular functions. In the following we call a function ℓ : K → R linear if for all α, β ≥ 0 and w, z ∈ K, we have ℓ(αw + βz) = αℓ(w) + βℓ(z).


Theorem 5.3: Modularity corresponds to linearity

The Lovász extension of any centered modular function M : L → R is linear. Conversely, any linear function on K is the Lovász extension of such a modular function M.
Proof: Clearly if M is modular, it is both submodular and supermodular, hence by Theorem 5.5 below its Lovász extension is both convex and concave, i.e., linear. Conversely, a linear function ℓ : K → R is uniquely determined by its values on {1_X : X ∈ L} through (65). For any X ∈ L, define M(X) := ℓ(1_X); we easily verify that M is modular and ℓ is its Lovász extension.

As a sanity check, we note that the space of linear functions on K is of finite dimension, hence the possibility of Theorem 5.3. Besides, we can easily extend ℓ from the cone K to the linear subspace K − K = R^Ω (recall that K is generating), thus the Lovász extension of a modular function is given by ⟨·, z⟩ for some z ∈ R^Ω, the unique extreme base (of the base polyhedron). The identification with z is clearly unique. Interestingly, this immediately implies that the space of all centered modular functions defined on any (finite distributive) simple lattice is isomorphic to R^Ω—a result which does not seem to be entirely trivial (although it is indeed trivial for Boolean algebras, see (7)). The next few results should convince us of the importance of the Lovász extension.

Theorem 5.4: Lovász extension preserves the minimum

Let F : L → R be any function and f its Lovász extension, then

min_{X∈L} F(X) = min_{w∈C} f(w),    (68)

where C = K ∩ [0, 1]^Ω is the convex hull of {1_X : X ∈ L}.
Proof: Following Remark 5.1 we assume w.l.o.g. that F is centered. Thanks to Theorem 5.1, we clearly have LHS ≥ RHS in (68). On the other hand,

RHS = min_{w∈C} Σ_{i=1}^k (w_i − w_{i+1}) F(A_i) ≥ min_{w∈C} Σ_{i=1}^k (w_i − w_{i+1}) · LHS = min_{w∈C} w_1 · LHS ≥ LHS,

since LHS ≤ 0. To see that C is the convex hull of {1_X : X ∈ L}, first note that C clearly contains it. Conversely, any w ∈ C can be written uniquely as in (65), where λ_i is given in (66) and satisfies λ_i ≥ 0, Σ_i λ_i = w_1 ≤ 1. Since ∅ ∈ L, w is in the convex hull of {1_X : X ∈ L}.

Theorem 5.5: Submodularity is equivalent to convexity

Let F : L → R be centered; then F is submodular iff its Lovász extension f is convex iff f equals the support function σ_{B_F} of the base polyhedron.
Proof: Firstly, f = σ_{B_F} implies f is convex, since a support function is always convex. Secondly, if f is convex, then it is subadditive due to its built-in positive homogeneity. Thus for all X, Y ∈ L, using Theorem 5.1,

F(X) + F(Y) = f(1_X) + f(1_Y) ≥ f(1_X + 1_Y) = f(1_{X∪Y} + 1_{X∩Y}) = F(X ∪ Y) + F(X ∩ Y),

where the last equality follows from (67). Lastly, if F is submodular, according to Remark 4.3 we know f = σ_{B_F}.

Corollary 5.1: Lovász extension is d.c.

The Lovász extension of any F : L → R is the difference of two positively homogeneous convex functions.


Proof: By Theorem 3.2 any set function F can be written as the difference of two submodular functions, and the rest follows from Theorem 5.2 and Theorem 5.5.

Of course, the decomposition need not be unique, even for a submodular F .

Remark 5.2: Algorithmic consequence of the Lovász extension

Thanks to Theorem 5.4, we can turn the minimization of a submodular function into the minimization of its Lovász extension, which, by Theorem 5.5, is a convex program, hence can be solved efficiently. Moreover, the definition of the Lovász extension in (65) and (67) allows us to recover a minimizer of the original submodular function: any set A_i, i = 0, . . . , k − 1, in the decomposition of an optimal w must be optimal. A minimal numerical sketch follows.
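The sketch below (projected subgradient descent on f over C = [0,1]^Ω, full lattice, pure Python; the toy F is a hypothetical example, and the rounding simply takes the best level set of the final iterate) illustrates the recipe. An extreme base from the greedy order is a subgradient of f, since f is the support function of B_F.

```python
def greedy_subgradient(F, w, Omega):
    # Extreme base of B_F maximizing <., w>: a subgradient of f at w.
    order = sorted(Omega, key=lambda x: -w[x])
    g, A, prev = {}, set(), 0.0
    for x in order:
        A.add(x)
        g[x] = F(A) - prev
        prev = F(A)
    return g

def minimize_submodular(F, Omega, steps=3000, lr=0.02):
    w = {x: 0.5 for x in Omega}                  # start inside C = [0,1]^Omega
    for _ in range(steps):
        g = greedy_subgradient(F, w, Omega)
        w = {x: min(1.0, max(0.0, w[x] - lr * g[x])) for x in Omega}
    # Round: try every level set {x : w_x >= t} of the final iterate.
    candidates = [frozenset(x for x in Omega if w[x] >= t)
                  for t in sorted(set(w.values())) + [2.0]]
    return min(candidates, key=F)

F = lambda S: len(S) * (3 - len(S)) - (1 if 1 in S else 0)   # submodular toy
print(minimize_submodular(F, {1, 2, 3}))   # frozenset({1, 2, 3}), where F = -1
```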

6 The Choquet Integral

It turns out that the Lovász extension is a very special case of a much more general theory—the non-additive integration theory originated by Choquet [1954]. For this section only, Ω is an arbitrary nonempty set and L ⊆ 2^Ω always includes the empty set ∅ and the full set Ω. At first we consider a monotonically increasing set function F : L → R̄₊, where R̄ := R ∪ {±∞} is the extended real line. With the additional assumption that F(∅) = 0, such an increasing set function F will be called a capacity. A function g : Ω → R̄ is called L-measurable if for all γ ∈ R the upper level set {x ∈ Ω : g(x) ≥ γ} ∈ L. Not surprisingly, all of our results still hold if we consider instead the strict upper level sets {x ∈ Ω : g(x) > γ}. Note that the two definitions coincide when L is a σ-algebra. From now on we use the shorthand [[g ≥ γ]] for the upper level set. It is an easy exercise to verify that if g is L-measurable, then so are g ∧ µ, g ∨ µ, (g − µ)₊ and λg + µ for all µ ∈ R, λ ∈ R₊. Our initial goal is to build an integration theory for any L and any capacity F. As usual, we start with nonnegative functions, in particular, step functions.

Definition 6.1: Choquet integral for nonnegative function

Fix any capacity F : L → R̄₊. For any L-measurable function g : Ω → R̄₊, its Choquet integral w.r.t. F is defined as

∫– g dF := ∫₀^∞ F([[g ≥ γ]]) dγ,    (69)

where the right integral is of the Riemann type. Since the integrand on the RHS is a decreasing function (of γ), the Choquet integral is well-defined and always nonnegative.
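On a finite ground set the Riemann integral in (69) collapses to a finite sum over the level sets of g; a minimal Python sketch (the capacity `F` and the function `g` are hypothetical examples):

```python
def choquet(F, g):
    """Choquet integral (69) of a nonnegative g on a finite ground set:
    integrate gamma -> F([[g >= gamma]]) over the levels of g."""
    levels = sorted(set(g.values()) | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        level_set = frozenset(x for x, v in g.items() if v >= hi)
        total += (hi - lo) * F(level_set)   # the integrand is constant on (lo, hi]
    return total

F = lambda S: len(S) ** 0.5                # a capacity: increasing, F(empty) = 0
g = {'a': 3.0, 'b': 1.0, 'c': 1.0}
# F([[g >= gamma]]) equals sqrt(3) for gamma in (0,1] and 1 for gamma in (1,3]:
print(choquet(F, g), 3 ** 0.5 + 2.0)       # both equal sqrt(3) + 2
```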

If F is a measure (on a σ-algebra L), then the Choquet integral coincides with the Lebesgue integral, since the RHS in (69) is nothing but the familiar “integrating the tail”. The next result is immediate.

Proposition 6.1: Increasing and positively homogeneous

Let g, h : Ω → R̄₊ be L-measurable and F : L → R̄₊ be a capacity. Then h ≥ g ⟹ ∫– h dF ≥ ∫– g dF, and for all λ ≥ 0, ∫–(λg) dF = λ · ∫– g dF. Moreover, if F̄ is an extension of F, we have ∫– g dF = ∫– g dF̄.

Abstract as it is, let us compute the Choquet integral for step functions, whose range is a finite set. We use M = M(L) and S = S(L) to denote L-measurable functions and step functions, respectively, with (·)₊ signaling the nonnegative ones. For any s ∈ S₊, we identify it with its range {0 ≤ w_k < ··· < w_1} and the increasing sequence A_i := [[s ≥ w_i]], i = 1, . . . , k. Clearly s = Σ_{i=1}^k (w_i − w_{i+1}) 1_{A_i}, where w_{k+1} := 0. We verify that ∫– s dF = Σ_{i=1}^k (w_i − w_{i+1}) F(A_i). Recall that this is exactly how we defined the Lovász extension in Section 5. We can extend the centered set function F : L → R to the power set 2^Ω in the following way:

∀ X ⊆ Ω, F⋆(X) = sup{F(Y) : L ∋ Y ⊆ X},    (70)


which is always increasing and nonnegative. Clearly any function is 2^Ω-measurable, hence ∫– g dF⋆ is well-defined for all functions g : Ω → R̄₊. The next three results are fundamental.

Theorem 6.1: Regularity

Fix any capacity F : L → R̄₊. For all g : Ω → R̄₊ we have

∫– g dF⋆ = sup{ ∫– s dF : g ≥ s ∈ S₊(L) }.    (71)

Proof: Due to monotonicity, it is clear that LHS ≥ RHS. For the converse, suppose first that F⋆([[g ≥ γ]]) = ∞ for some γ. Then for each α ≥ 0, there exists L ∋ A ⊆ [[g ≥ γ]] so that γF(A) ≥ α. Clearly γ1_A ∈ S₊, thus RHS ≥ ∫– γ1_A dF = γF(A) ≥ α. Since α is arbitrary, we have RHS = ∞ and there is nothing to prove. Now fix 0 < a < b < ∞ and ε > 0. By the definition of the Riemann integral, there exists a subdivision a = t_{k+1} < t_k < . . . < t_1 = b such that

∫_a^b F⋆([[g ≥ γ]]) dγ ≤ ε + Σ_{i=1}^k (t_i − t_{i+1}) F⋆([[g ≥ t_i]]),

and for each 1 ≤ i ≤ k, there exists L ∋ A_i ⊆ [[g ≥ t_i]] (which we may take to form a chain) such that F⋆([[g ≥ t_i]]) ≤ F(A_i) + ε/(b − a). Note that we can assume w.l.o.g. that F⋆([[g ≥ γ]]) < ∞ for all γ ∈ R. Therefore

∫_a^b F⋆([[g ≥ γ]]) dγ ≤ 2ε + Σ_{i=1}^k (t_i − t_{i+1}) F(A_i) = 2ε + ∫– Σ_{i=1}^k (t_i − t_{i+1}) 1_{A_i} dF,

where the step function Σ_{i=1}^k (t_i − t_{i+1}) 1_{A_i} ≤ g. Taking limits we are done.

Note that F⋆ extends F due to the monotonicity of the latter. Of course, if g is L-measurable, we can replace F⋆ in (71) with F, thanks to the observation made in Proposition 6.1.

Recall that a set function F : L → R̄ is called continuous from below if L ∋ A_i ↑ A ∈ L and F(A_i) > −∞ imply lim_i F(A_i) ↑ F(A); similarly, it is called continuous from above if L ∋ A_i ↓ A ∈ L and F(A_i) < ∞ imply lim_i F(A_i) ↓ F(A).

Theorem 6.2: Monotone convergence

Let the capacity F : L → R̄₊ be continuous from below (above, resp.) and M₊(L) ∋ g_n ↑ (↓, resp.) g ∈ M₊(L) (with ∫– g_n dF < ∞ in the decreasing case). Then

lim_n ∫– g_n dF = ∫– g dF.    (72)

Proof: Simply note that, for example, g_n ↓ g ⟹ [[g_n ≥ γ]] ↓ [[g ≥ γ]], hence F([[g_n ≥ γ]]) ↓ F([[g ≥ γ]]), thanks to the continuity of F and the fact that ∫– g_n dF < ∞ ⟹ γF([[g_n ≥ γ]]) < ∞. Apply the usual monotone convergence theorem for the Riemann integral.

It is clear that we can replace all sequences with nets.

When the capacity F is submodular (on a lattice L), resorting to the definition we have the inequality

∫–(g ∧ h) dF + ∫–(g ∨ h) dF ≤ ∫– g dF + ∫– h dF,    (73)

which is reversed if F is supermodular.

Proposition 6.2: Choquet[1954, Theorem 54.1]


Let K be a lattice convex cone and Γ: K → R be positively homogeneous. Then

∀ g, h ∈ K, Γ(g ∨ h) + Γ(g ∧ h) ≤ Γ(g) + Γ(h) ⟹ Γ(g + h) ≤ Γ(g) + Γ(h).    (74)

Proof: This result is not entirely true. Repair it later (see Marinacci and Montrucchio [2008] and König [2003]).

For the Choquet integral, we shall provide a direct proof.

Theorem 6.3: Subadditive = submodular, Choquet[1954, Theorem 54.2]

Let L be a lattice and F : L → R̄₊ be a capacity; then the Choquet integral is subadditive (superadditive) iff F is submodular (supermodular).
Proof: Suppose first that the Choquet integral is subadditive. Let A, B ∈ L; then

F(A∩B) + F(A∪B) = ∫–(1_{A∩B} + 1_{A∪B}) dF = ∫–(1_A + 1_B) dF ≤ ∫– 1_A dF + ∫– 1_B dF = F(A) + F(B).

Conversely, suppose F is submodular. Clearly we can assume w.l.o.g. that ∫– g dF < ∞ and ∫– h dF < ∞. Consequently (assuming g + h ∈ M₊),

∫–(g + h) dF = ∫₀^∞ F([[g + h ≥ γ]]) dγ ≤ ∫₀^∞ F([[g ≥ γ/2]] ∪ [[h ≥ γ/2]]) dγ
 ≤ ∫₀^∞ F([[g ≥ γ/2]]) dγ + ∫₀^∞ F([[h ≥ γ/2]]) dγ < ∞.

Note that for any A ∈ L and g ∈ M₊, g + 1_A ∈ M₊ since L is a lattice. Consider first a step function s ∈ S₊ with maximal value s_max. Then

∫–(s + 1_A) dF = ∫₀^1 F(A ∪ [[s ≥ γ]]) dγ + ∫₁^{s_max+1} F([[s ≥ γ]] ∪ ([[s ≥ γ − 1]] ∩ A)) dγ
 ≤ ∫₀^1 ( F(A) + F([[s ≥ γ]]) − F([[s ≥ γ]] ∩ A) ) dγ + ∫₁^{s_max+1} ( F([[s ≥ γ]]) + F([[s ≥ γ − 1]] ∩ A) − F([[s ≥ γ]] ∩ A) ) dγ
 ≤ F(A) + ∫₀^{s_max} F([[s ≥ γ]]) dγ = ∫– s dF + ∫– 1_A dF.

Now let g ∈ M₊. According to Theorem 6.1 we can find s ∈ S₊ so that ∫–(g + 1_A) dF ≤ ε + ∫– s dF and s ≤ g + 1_A. Clearly t := s ∨ 1_A ≤ g + 1_A, hence 0 ≤ t − 1_A ≤ g; moreover t − 1_A ∈ S₊. Thus

∫–(g + 1_A) dF ≤ ε + ∫– s dF ≤ ε + ∫– t dF ≤ ε + ∫–(t − 1_A) dF + ∫– 1_A dF ≤ ε + ∫– g dF + ∫– 1_A dF.

Iterating the argument for each component of a step function we can replace 1A in the above inequality with any step function s. Finally applying a similar approximation we can replace the step function s with any h ∈ M+ (such that g + h ∈ M+).

In general, the Choquet integral is not even subadditive, which is not too surprising; after all, we are dealing with a general capacity F. When F is a charge (namely, finitely additive) on an algebra L, it is modular, thus we gain additivity for the Choquet integral. When F is a measure (namely, countably additive) on a σ-algebra, the Choquet integral reduces to the Lebesgue integral, which, of course, is additive. The next notion is extremely important.

Definition 6.2: Comonotone


Two functions g, h : Ω → R̄ are called comonotone iff there are no pairs x, y ∈ Ω such that

g(x) < g(y) and h(x) > h(y). (75)

When g, h are real valued, the above definition can be written as: for all x, y ∈ Ω,

(g(x) − g(y)) · (h(x) − h(y)) ≥ 0,

hence the name comonotone.

Recall that a subset of an ordered set is a chain if it is totally ordered, i.e., all pairs are comparable.

Theorem 6.4: Characterizing comonotonicity, Denneberg[1994]

Let g, h : Ω → R̄ be L-measurable. The following are equivalent:

(I). g, h are comonotone;
(II). the set {[[g ≥ γ]] : γ ∈ R} ∪ {[[h ≥ γ]] : γ ∈ R} is a chain;
(III). the set {(g(x), h(x)) : x ∈ Ω} ⊆ R̄² is a chain.

If g, h are real valued, then we have the additional equivalences:

(IV). there exist f : Ω → R and increasing functions i, j : R → R such that g = i ∘ f, h = j ∘ f;
(V). there exist continuous increasing functions i, j : R → R such that i + j = Id and g = i ∘ (g + h), h = j ∘ (g + h).

Proof: (I) ⟺ (II) is easily proved by considering the contrapositive. The implications (I) ⟺ (III), (V) ⟹ (IV) ⟹ (I) are clear. Only (I) ⟹ (V) is left. Note first that for t = (g + h)(z) with some z ∈ Ω, we can decompose it into t = p + q with p ∈ g(Ω) and q ∈ h(Ω). From comonotonicity, we know p and q are uniquely determined by t. Defining i(t) = p and j(t) = q, we satisfy the identity i + j = Id. By comonotonicity again, we know both i and j are increasing, which in turn implies their continuity: for example, for δ ≥ 0, i(t) ≤ i(t + δ) = t + δ − j(t + δ) ≤ t + δ − j(t) = i(t) + δ. So we have constructed i and j on (g + h)(Ω). Extend to the closure cl((g + h)(Ω)) by continuity. On R \ cl((g + h)(Ω)), we apply linear interpolation in each of its connected components. The extensions maintain the identity i + j = Id, as well as the monotonicity and continuity.

This theorem is very handy for verifying comonotonicity (of real valued functions). For instance, from (IV) it is clear that g₊ := g ∨ 0 and −g₋ := g ∧ 0 are comonotone (which also pleasantly fits (V)). Similarly, using (IV), if g and h are comonotone, so are g and g + h.
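For functions on a finite domain, (75) can be checked directly; a small Python sketch (the dictionaries are hypothetical examples):

```python
def comonotone(g, h):
    """No pair x, y with g(x) < g(y) and h(x) > h(y); cf. (75)."""
    return all((g[x] - g[y]) * (h[x] - h[y]) >= 0 for x in g for y in g)

g = {1: 0.0, 2: 1.0, 3: 2.0}
print(comonotone(g, {1: 5.0, 2: 5.0, 3: 7.0}))   # True: both increase along 1,2,3
print(comonotone(g, {1: 1.0, 2: 0.0, 3: 2.0}))   # False: order reversed on {1, 2}

s = {1: -1.0, 2: 3.0, 3: 0.0}
g_plus = {x: max(v, 0.0) for x, v in s.items()}   # s v 0
g_minus = {x: min(v, 0.0) for x, v in s.items()}  # s ^ 0
print(comonotone(g_plus, g_minus))                # True, matching (IV) above
```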

Theorem 6.5: Comonotone additivity, Bassanezi and Greco[1984, Theorem 2.1]

Let g, h and g + h belong to M₊. Then

∫–(g + h) dF = ∫– g dF + ∫– h dF for every capacity F : L → R̄₊ ⟺ g, h are comonotone.

Proof: ⟸: Let L̊ := {[[g ≥ γ]] : γ ∈ R} ∪ {[[h ≥ γ]] : γ ∈ R}. Since g, h are comonotone, by Theorem 6.4, L̊ is a chain, hence trivially a lattice, and the restriction F̊ of F to L̊ is modular. Thus, using Theorem 6.3, we have

∫– g dF + ∫– h dF = ∫– g dF̊ + ∫– h dF̊ = ∫–(g + h) dF̊ = ∫–(g + h) dF,

where the last equality follows from Proposition 6.1.


⟹: Suppose g and h are not comonotone, implying the existence of α ≠ β such that there exist x ∈ [[g ≥ α]] \ [[h ≥ β]] and x ≠ y ∈ [[h ≥ β]] \ [[g ≥ α]]. Define the capacity F : L → {0, 1}, X ↦ 1_{{x,y}⊆X}; in fact, F is supermodular. An easy calculation reveals that ∫– g dF = g(y) < α and ∫– h dF = h(x) < β, while ∫–(g + h) dF = min{(g+h)(x), (g+h)(y)} > g(y) + h(x), violating additivity.

As an application, consider g = Σ_{i=1}^k w_i 1_{A_i} for a chain A_1 ⊂ ··· ⊂ A_k. Clearly the functions 1_{A_i} are comonotone, hence immediately ∫– g dF = Σ_{i=1}^k w_i ∫– 1_{A_i} dF = Σ_{i=1}^k w_i F(A_i).

Alert 6.1: Non-existence interpretation

From now on the integrals we consider might not exist in pathological cases, due to the possibility of ∞ − ∞. Instead of tediously repeating the existence assumption each time, we adopt the convention that if something does not make sense, the related result shall not be interpreted.

It is time to consider the integral of a real valued function. It turns out there are two different ways to proceed. One is familiar: we split g into g₊ := g ∨ 0 and g₋ := g₊ − g. This yields the symmetric integral of Šipoš [1979]:

∫··· g dF := ∫– g₊ dF − ∫– g₋ dF = ∫₀^∞ F([[g ≥ γ]]) dγ − ∫₀^∞ F([[−g ≥ γ]]) dγ.    (76)

Note that we now need −g to be L-measurable too. Let s = Σ_{i=1}^k (a_i − a_{i+1}) 1_{A_i} + Σ_{j=1}^m (b_j − b_{j+1}) 1_{B_j}, where 0 = a_{k+1} < a_k < ··· < a_1, b_1 < ··· < b_m < b_{m+1} = 0, and A_i = [[s ≥ a_i]], B_j = [[s ≤ b_j]]; then

∫··· s dF = Σ_{i=1}^k (a_i − a_{i+1}) F(A_i) + Σ_{j=1}^m (b_j − b_{j+1}) F(B_j).    (77)

The following result about the symmetric integral is useful to know.

Proposition 6.3: Basic properties of the symmetric integral

Let ±g, ±h ∈ M and F : L → R̄₊ be a capacity. Then

• g ≥ h ⟹ ∫··· g dF ≥ ∫··· h dF;

• ∀ µ ∈ R, ∫···(µg) dF = µ · ∫··· g dF;

• ∫··· g dF = sup{∫··· s dF : g ≥ s ∈ S};

• ∀ λ ≥ 0, ∫··· g dF = ∫···(g ∧ λ) dF + ∫···(g − λ)₊ dF.

Proof: The first two claims are clear, while the third follows from the definition and Theorem 6.1. We prove the last one:

RHS = ∫–(g ∧ λ)₊ dF − ∫–(g ∧ λ)₋ dF + ∫–(g − λ)₊ dF
 = ∫–((g ∧ λ)₊ + (g − λ)₊) dF − ∫– g₋ dF
 = ∫– g₊ dF − ∫– g₋ dF = LHS,

where the second equality follows from Theorem 6.5 and the nonnegativity of λ.

However, the symmetric integral does not maintain comonotone additivity. Indeed, consider a bounded function g with sup_x |g(x)| ≤ g_max. If we had comonotone additivity, then

∫··· g dF = ∫···(g + g_max·1) dF − g_max ∫··· 1 dF = ∫₀^{g_max} F([[g ≥ γ]]) dγ + ∫_{−g_max}^0 (F([[g ≥ γ]]) − F(Ω)) dγ,

since any function is comonotone with the constant function 1. Thus we are motivated to extend the Choquet integral in a different way.


Definition 6.3: Choquet integral w.r.t. capacity

Fix any capacity F : L → R̄₊. For any L-measurable g : Ω → R̄, its Choquet integral w.r.t. F is defined as

∫– g dF := ∫₀^∞ F([[g ≥ γ]]) dγ + ∫_{−∞}^0 (F([[g ≥ γ]]) − F(Ω)) dγ,    (78)

where the right integrals are of the Riemann type, and are well-defined respectively due to monotonicity. Unless we have ∞ − ∞ on the RHS, the Choquet integral always exists in R̄.

Again, let us compute the Choquet integral for step functions s = Σ_{i=1}^k (w_i − w_{i+1}) 1_{A_i}, where w_k < ··· < w_1, w_{k+1} := 0 and A_i = [[s ≥ w_i]]. An easy calculation gives

∫– s dF = Σ_{i=1}^k (w_i − w_{i+1}) F(A_i).    (79)

Note the difference with the symmetric integral in (77). We remind the reader that two measurable functions that agree almost everywhere need not have the same Choquet integral: take Ω = {1, 2} and L = 2^Ω. Set F = (0, 0, 0, 1) on (∅, {1}, {2}, Ω) and consider g = (0, 1), h = (2, 1). Clearly F([[g ≠ h]]) = F({1}) = 0, but ∫– g dF = 0 while ∫– h dF = 1. We need a stronger notion of "almost everywhere".

Definition 6.4: s-null set

A set N ⊆ Ω is called s-null w.r.t. the capacity F : L → R+ if for all A, B ∈ L, A ⊆ B ∪ N =⇒ F (A) ≤ F (B).

Note that an s-null set N need not be L-measurable; when it is, however, we have F(N) = 0 (take A = N and B = ∅ in the definition). Also, the empty set is s-null, and a subset of an s-null set is s-null.

Proposition 6.4: Choquet integals agree up to an s-null set

Let F : L → R̄₊ be a capacity. If two L-measurable functions g and h agree up to an s-null set N, then ∫– g dF = ∫– h dF.
Proof: The definition of an s-null set allows us to restrict the set function F and the measurable functions g, h to L̊ := {A ∩ (Ω \ N) : A ∈ L} and Ω \ N, respectively, without changing the Choquet integral.

Next, we remove the increasing assumption on the set function F , which requires a new concept.

Definition 6.5: Bounded (chain) variation

We define the chain variation of a centered set function F : L → R at X ∈ L as

|F|(X) := sup{ Σ_i |F(A_i) − F(A_{i−1})| : ∅ = A_0 ⊂ A_1 ⊂ ··· ⊂ A_k = X }.    (80)

Clearly |F| is a capacity on the same domain L, and |F(X)| ≤ |F|(X). The set function F is said to have bounded variation if ‖F‖ := |F|(Ω) < ∞.

Denote by BV(L) the set of all centered set functions on L ⊆ 2^Ω that have bounded variation.

Proposition 6.5: BV(L) is Banach

The space BV(L), equipped with the chain variation norm ‖·‖ := |·|(Ω), is Banach.
Proof: Clearly ‖F + G‖ ≤ ‖F‖ + ‖G‖, therefore BV(L) is a vector space and ‖·‖ is a norm on it.


The completeness of the norm is proved as usual.

Theorem 6.6: Monotone decomposition, Aumann and Shapley[1974]

F ∈ BV(L) iff F = F_1 − F_2 where F_1, F_2 are capacities. Moreover, for F ∈ BV(L),

‖F‖ = min{F_1(Ω) + F_2(Ω) : F = F_1 − F_2, F_1, F_2 are capacities}.    (81)

Proof: As usual, define

F₊(X) = sup{ Σ_i (F(A_i) − F(A_{i−1}))₊ : ∅ = A_0 ⊂ A_1 ⊂ ··· ⊂ A_k = X }    (82)
F₋ = F₊ − F.    (83)

We easily verify that F₊ and F₋ are increasing, hence the first claim is clear. For the second claim, note first that ‖F‖ ≤ ‖F_1‖ + ‖F_2‖ = F_1(Ω) + F_2(Ω). On the other hand, F₊(Ω) + F₋(Ω) ≥ |F|(Ω) is clear. Let ∅ = A_0 ⊂ ··· ⊂ A_k = Ω be a chain on which F₊ attains its norm (up to some ε ≥ 0). Divide the chain into two groups of links: S_1 = {(A_{i−1}, A_i) : F(A_{i−1}) ≤ F(A_i)} and S_2 the rest. Clearly F₊ still attains its norm on S_1, hence F₋ attains its (negated) norm on S_2 too, since Σ_{S_1}(F(A_i) − F(A_{i−1})) + Σ_{S_2}(F(A_i) − F(A_{i−1})) = Σ_i (F(A_i) − F(A_{i−1})) = F(Ω) = F₊(Ω) − F₋(Ω). Thus F₊(Ω) + F₋(Ω) = Σ_{S_1}(F(A_i) − F(A_{i−1})) − Σ_{S_2}(F(A_i) − F(A_{i−1})) = Σ_i |F(A_i) − F(A_{i−1})| ≤ |F|(Ω).

We actually proved that ‖F‖ = F₊(Ω) + F₋(Ω) = ‖F₊‖ + ‖F₋‖, from which we can also show that the norm is submultiplicative: ‖FG‖ ≤ ‖F‖ · ‖G‖.
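On a tiny ground set, the chain variation (80) and the decomposition (82)-(83) can be brute-forced by enumerating all chains; a Python sketch (the alternating F below is a hypothetical centered, non-monotone example):

```python
from itertools import chain, combinations

Omega = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]

def chains_to(X):
    """All chains empty = A_0 c A_1 c ... c A_k = X (strict inclusions)."""
    if not X:
        return [[frozenset()]]
    return [c + [X] for Y in subsets if Y < X for c in chains_to(Y)]

def chain_sup(F, X, weight):
    return max(sum(weight(F[c[i]] - F[c[i-1]]) for i in range(1, len(c)))
               for c in chains_to(X))

F = {S: len(S) % 2 for S in subsets}          # centered, alternating, non-monotone
var = chain_sup(F, Omega, abs)                                    # |F|(Omega), (80)
Fp = {S: chain_sup(F, S, lambda d: max(d, 0)) for S in subsets}   # F_plus, (82)
Fm = {S: Fp[S] - F[S] for S in subsets}                           # F_minus, (83)
print(var, Fp[Omega] + Fm[Omega])             # both 3: ||F|| = F_plus(Omega) + F_minus(Omega)
```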

Theorem 6.6 allows us to extend the Choquet integral in a linear way.

Definition 6.6: Choquet integral for bounded variation set function

Fix any centered set function F ∈ BV(L). For any L-measurable g : Ω → R̄, its Choquet integral w.r.t. F is defined as (see (82) and (83) for the definitions of F₊ and F₋)

∫– g dF := ∫– g dF₊ − ∫– g dF₋    (84)
 = ∫₀^∞ F([[g ≥ γ]]) dγ + ∫_{−∞}^0 (F([[g ≥ γ]]) − F(Ω)) dγ,    (85)

where the right integrals in (84) are defined in Definition 6.3. Again, when we encounter ∞ − ∞, the Choquet integral does not exist. Clearly, the Choquet integral is linear in the set function F. Moreover, the integral is even w.r.t. g iff F is symmetric, i.e., for all A ∈ L, Ω \ A ∈ L and F(A) = F(Ω \ A). (Introduce the complement capacity and firm this claim up.)

Of course we could extend the symmetric integral in a similar way, but that is of less interest to us. Judging from the end result in (85), it might appear that we could directly drop the monotonicity assumption on F without going through the notion of bounded variation. This is of course inappropriate: how would we know the integrals in (85) are sensible at all? We summarize some useful results about the general Choquet integral.

Theorem 6.7: Basic properties of the general Choquet integral

Let F, G : L → R be centered and of bounded variation. For any L-measurable g, h : Ω → R̄,

(I). for all α, β ∈ R, ∫– g d(αF + βG) = α ∫– g dF + β ∫– g dG;

(II). for all λ ∈ R₊, ∫–(λg) dF = λ ∫– g dF;

December 14, 2015 revision: 1 SADO_main 35 6 The Choquet Integral

(III). g, h are comonotone iff ∫–(g + h) dF = ∫– g dF + ∫– h dF for all such F.

If F is additionally increasing, i.e. a capacity, then we also have

(IV). g ≥ h ⟹ ∫– g dF ≥ ∫– h dF;

(V). for all g : Ω → R̄, ∫– g dF⋆ = sup{∫– s dF : g ≥ s ∈ S(L)};

(VI). provided that L is a lattice, F is submodular iff for all L-measurable g and h, ∫–(g + h) dF ≤ ∫– g dF + ∫– h dF. A similar claim for a supermodular F also holds.

Proof: (V), (VI), and (III) can be proved (sequentially) as Theorem 6.1, Theorem 6.3 and Theorem 6.5, respectively. The rest are obvious.

Comparing Proposition 6.3 and the current theorem we see that, despite their coincidence for nonnegative functions and the many similarities, the symmetric integral enjoys complete homogeneity but only partial comonotone additivity (see the last bullet point of Proposition 6.3), while the Choquet integral enjoys full comonotone additivity but only positive homogeneity. Both reduce to the usual Lebesgue integral when we have a (signed) measure.

We now come to a very natural question: what kind of functionals I : R̄^Ω → R̄ can be represented by some set function F as the Choquet integral I(·) = ∫– · dF? The next representation theorem, usually attributed to Greco [1982], gives a satisfactory answer. Recall that a (nonempty) class of functions G ⊆ R̄^Ω (equipped with the pointwise order) is Stonean if g ∈ G ⟹ ∀λ ∈ R₊, λg, g ∧ λ, (g − λ)₊ ∈ G; in particular 0 ∈ G. A functional Γ : G → R̄ is monotonically increasing if g ≥ h ⟹ Γ(g) ≥ Γ(h), and comonotone additive if Γ(g + h) = Γ(g) + Γ(h) for all comonotone g, h ∈ G. We use M = M(G) and L = L(G) to denote the minimal set system {[[g ≥ γ]] : g ∈ G, γ ∈ R} and the minimal lattice under which G is measurable, respectively. For any h : Ω → R̄, define

Γ^⋆(h) := inf{Γ(g) : h ≤ g ∈ G}    (86)
Γ_⋆(h) := sup{Γ(g) : h ≥ g ∈ G},    (87)

which are clearly increasing functionals on R̄^Ω. Similarly, for A ∈ L, we let

F^max(A) := Γ^⋆(1_A) = inf{Γ(g) : 1_A ≤ g ∈ G}    (88)
F^min(A) := Γ_⋆(1_A) = sup{Γ(g) : 1_A ≥ g ∈ G},    (89)

which are centered (if 0 ∈ G), increasing and potentially infinite.

Theorem 6.8: Representing functionals as Choquet integral

Let G ⊆ R̄^Ω be a class of real valued functions on Ω that is nonnegative and Stonean, and let Γ : G → R̄₊ be a functional that is increasing and comonotone additive. Then

(I). Γ is represented by some capacity F : M → R̄₊ iff Γ is truncation friendly:

• ∀ g ∈ G, Γ(g ∧ t) ↓ 0 as t ↓ 0, and Γ(g ∧ t) ↑ Γ(g) as t ↑ ∞.

Also, F can only take ∞ at the full set Ω, which is avoided if there exist g ∈ G, λ ≥ 0 such that λ·g ≥ 1_Ω;

(II). under (I), a centered F (not necessarily increasing) represents Γ iff F^min ≤ F ≤ F^max on M;

(III). if F representing Γ is continuous from above (below), then F = F^max (F = F^min) on M;

(IV). under (I), if G is convex and Γ is superadditive, then for all h : Ω → R̄, Γ_⋆(h) = ∫– h d(F^min)⋆ (see (70) for the definition of F⋆).


Proof: (I): Assume first that F : M → R̄₊ represents Γ. Then for t > 0,

Γ(g ∧ t) = ∫–(g ∧ t) dF = ∫₀^∞ F([[g ∧ t ≥ γ]]) dγ = ∫₀^t F([[g ≥ γ]]) dγ,

meaning Γ is truncation friendly. Conversely, for g ∈ G and 0 < γ < λ, we have

1_{[[g ≥ λ]]} ≤ (1/(λ − γ)) ((g ∧ λ) − (g ∧ γ)) ≤ 1_{[[g ≥ γ]]},

whereas (g ∧ λ) − (g ∧ γ) = (g ∧ λ) − (g ∧ λ) ∧ γ = ((g ∧ λ) − γ)₊ ∈ G since G is Stonean. By comonotone additivity,

Γ( (1/(λ − γ)) ((g ∧ λ) − (g ∧ γ)) ) = (1/(λ − γ)) (Γ(g ∧ λ) − Γ(g ∧ γ)),

whence it follows from the definitions (88)-(89) that

F^max([[g ≥ λ]]) ≤ (1/(λ − γ)) (Γ(g ∧ λ) − Γ(g ∧ γ)) ≤ F^min([[g ≥ γ]]).

Fix the subdivision 0 < a = t_{k+1} < ··· < t_1 = b < ∞ and apply the previous inequality:

Σ_{i=1}^k (t_i − t_{i+1}) F^max([[g ≥ t_i]]) ≤ Γ(g ∧ b) − Γ(g ∧ a) ≤ Σ_{i=1}^k (t_i − t_{i+1}) F^min([[g ≥ t_{i+1}]]).

Taking limits, applying the truncation property of Γ, and using F^max ≥ F^min on M (which is implied by the monotonicity of Γ) we obtain

Γ(g) = ∫₀^∞ F^max([[g ≥ γ]]) dγ = ∫₀^∞ F^min([[g ≥ γ]]) dγ,

i.e., Γ is represented by both F^max and F^min, hence also by any F satisfying 0 ≤ F^min ≤ F ≤ F^max. Note that since G is Stonean, for all A ∈ M, F^min(A) ≤ F^max(A) < ∞ except perhaps for A = Ω, which is further avoided if there exist g ∈ G, λ ≥ 0 such that λ·g ≥ 1_Ω.

(II): Assume F represents Γ. Consider any A ∈ M and any g ∈ G with 1_A ≥ g. Note that 1_A need not be in G. If 1 ≥ γ > 0 we have [[g ≥ γ]] ⊆ A, while if γ > 1 we have [[g ≥ γ]] = ∅. Thus

F(A) ≥ ∫₀^1 F([[g ≥ γ]]) dγ + ∫₁^∞ F([[g ≥ γ]]) dγ = Γ(g)  ⟹  F(A) ≥ F^min(A).

Similarly we prove F(A) ≤ F^max(A).

(III): By (II), F ≤ F^max. Fix g ∈ G and put p(γ) = F([[g ≥ γ]]), q(γ) = F^max([[g ≥ γ]]), which are both decreasing functions. The continuity assumption implies the left continuity of p. But ∫₀^t p(γ) dγ = ∫₀^t q(γ) dγ = Γ(g ∧ t), hence p = q almost everywhere. Since p is left continuous, p ≥ q.

(IV): For all G ∋ g ≤ h, Γ(g) = ∫– g dF^min = ∫– g d(F^min)⋆ ≤ ∫– h d(F^min)⋆, hence Γ_⋆(h) ≤ ∫– h d(F^min)⋆. On the other hand, by Theorem 6.1, ∫– h d(F^min)⋆ = sup{∫– s dF^min : h ≥ s ∈ S₊(L)}. Take an arbitrary s = Σ_{i=1}^k a_i 1_{A_i} for some A_1 ⊂ ··· ⊂ A_k so that h ≥ s ∈ S₊(L); then

∫– s dF^min = Σ_{i=1}^k a_i F^min(A_i) = Σ_{i=1}^k a_i Γ_⋆(1_{A_i}) ≤ Γ_⋆(Σ_{i=1}^k a_i 1_{A_i}) = Γ_⋆(s) ≤ Γ_⋆(h),

where the superadditivity of Γ_⋆ follows from that of Γ and the convexity of G. Thus we have the other inequality Γ_⋆(h) ≥ ∫– h d(F^min)⋆.

It is tempting to prove a similar claim as in (IV) when Γ is subadditive. However, this does not seem plausible, simply because in Theorem 6.1 we can only approximate from below, not from above.

Note that comonotone additivity and monotonicity together imply positive homogeneity: the former


The model function for d is the decreasing distribution function: let g be an L-measurable function and F : L → R̄₊ an increasing set function, and define d(t) = F([[g ≥ t]]); its inverse q := d† is known as the quantile function.

Note that if t is a continuity point of d†, then d(s) ≥ t ⟹ d†(t) = d†_max(t) ≥ s ⟹ (d†)†_max(s) ≥ t, meaning d ≤ (d†)† except at countably many points (e.c.). Similarly, if t is a continuity point of d†, then d(s) ≤ t ⟹ d†(t) = d†_min(t) ≤ s ⟹ (d†)†_min(s) ≤ t, meaning d ≥ (d†)† e.c. Therefore (d†)† = d e.c.

Proposition 6.6: Quantile retains expectation

Let d : R̄₊ → R̄₊ be decreasing and d† its inverse. Then

∫₀^∞ d(t) dt = ∫₀^∞ d†(t) dt.    (104)

Proof: d, as a decreasing function, is Lebesgue measurable. Denote by µ the Lebesgue measure on the real line and apply the "integrating the tail" trick:

LHS = ∫₀^∞ µ({s : d(s) ≥ t}) dt ≤ ∫₀^∞ µ({s : d†(t) ≥ s}) dt = ∫₀^∞ d†(t) dt = RHS;

the reverse inequality follows by symmetry, since (d†)† = d e.c. Intuitively, this result says that switching the x- and y-axes does not change the (Riemann) integral of a nonnegative decreasing function.
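Proposition 6.6 is easy to check numerically for a decreasing step function; a small Python sketch (the step function `d` and its generalized inverse `d_dagger` are hypothetical, and the Riemann sums are crude but sufficient):

```python
def riemann(f, hi, n=200000):
    # Midpoint Riemann sum of f over [0, hi].
    h = hi / n
    return sum(f(h * (i + 0.5)) for i in range(n)) * h

d = lambda t: 3.0 if t < 1 else (1.0 if t < 2 else 0.0)           # decreasing step
d_dagger = lambda t: 2.0 if t <= 1 else (1.0 if t <= 3 else 0.0)  # sup{s : d(s) >= t}

print(riemann(d, 5.0), riemann(d_dagger, 5.0))   # both approximately 4.0, cf. (104)
```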

Theorem 6.14: Change-of-variables formula

Let d : [0, b] → R̄ be decreasing, where 0 < b < ∞. Then

∫₀^b d(t) dt = ∫₀^∞ d†(t) dt + ∫_{−∞}^0 (d†(t) − b) dt.    (105)

Proof: Due to monotonicity, we can split the LHS into two parts ∫₀^a d(t) dt + ∫_a^b d(t) dt so that d ≥ 0 on the first part and d ≤ 0 on the other. Apply Proposition 6.6 to the first part directly, and to the second part after an appropriate translation and sign change.

Thus we could define the Choquet integral of g w.r.t. the capacity F as

∫– g dF := ∫₀^{F(Ω)} d†_g(t) dt, where d_g(t) := F([[g ≥ t]]).    (106)

At least when F(Ω) < ∞, the above definition coincides with our previous Definition 6.3.

Proposition 6.7: Composition rule

Let F : L → R̄₊ be an increasing set function, g : Ω → R̄ be L-measurable, i : R → R be increasing, and define d_g(t) := F([[g ≥ t]]). If d_g has no common discontinuity point with i, then

d†_{i∘g} = i ∘ d†_g  e.c.    (107)

Proof: For convenience we take d† = d†_max, which clearly does not affect the result. Let i†(s) := inf{t : i(t) ≥ s}. We claim that

d†_{i∘g}(t) = sup{s : F([[g ≥ i†(s)]]) > t} ≤ i(d†_g(t)).

Indeed, d_g(i†(s)) > t ⟹ d†_g(t) ≥ i†(s). If d†_g(t) > i†(s), then i(d†_g(t)) ≥ s; while if d†_g(t) = i†(s), we know d_g is discontinuous at d†_g(t) (otherwise d_g(d†_g(t)) = d_g(i†(s)) = t), hence i is continuous at d†_g(t) and i(d†_g(t)) = i(i†(s)) = s.


On the other hand,

i(d†_g(t)) = i(sup{s : d_g(s) ≥ t}) = sup{i(s) : d_g(s) ≥ t}.

Indeed, due to monotonicity the set {s : d_g(s) ≥ t} is an interval whose rightmost endpoint a may or may not be attained. We need only consider the case when a is not attained, in which case d_g is discontinuous at a; thus by assumption i is continuous at a, whence it follows that both sides above equal i(a). Therefore we have the other direction

i(d†_g(t)) = sup{i(s) : F([[g ≥ s]]) ≥ t} ≤ sup{r : F([[i(g) ≥ r]]) ≥ t} = d†_{i∘g}(t),

by taking r = i(s).

We have an alternative proof of the comonotone additivity of the Choquet integral, under the equivalent definition in (106).

Theorem 6.15: Comonotone additivity, revisited

Let F : L → R be centered and of bounded variation. If g, h ∈ M(L) are comonotone and real valued, then ∫–(g + h) dF = ∫– g dF + ∫– h dF.
Proof: We assume w.l.o.g. that F is increasing. According to Theorem 6.4, g = i(g + h), h = j(g + h) for some continuous increasing functions i, j : R → R with i + j = Id. Using Proposition 6.7 and its notation,

∫– g dF + ∫– h dF = ∫₀^{F(Ω)} ( d†_{i∘(g+h)}(t) + d†_{j∘(g+h)}(t) ) dt = ∫₀^{F(Ω)} ( (i ∘ d†_{g+h})(t) + (j ∘ d†_{g+h})(t) ) dt
 = ∫₀^{F(Ω)} d†_{g+h}(t) dt = ∫–(g + h) dF.

7 Duality

In this section we develop a duality theory for submodular functions by mimicking that for convex functions. Most of our results are taken from Fujishige [2005]. An important definition first.

Definition 7.1: Conjugate function

For any function F : L → R, define its conjugate f*_C : R^Ω → R ∪ {∞} as

f*_C(z) := max_{X∈L} z(X) − F(X)    (108)
 = z(Ω) − (F □ z)(Ω).    (109)

In some cases, such as a supermodular F, we change the max to min.

Theorem 7.1: Conjugate is supermodular

The conjugate f*_C is always convex and supermodular on the distributive lattice (R^Ω, ≤).

In fact, as the notation suggests, f*_C is the Fenchel conjugate of the Lovász extension, restricted to some convex set C.

Theorem 7.2: Justifying the conjugate

Consider F : L → R; then

f*_C(z) = max_{w∈C} ⟨w, z⟩ − f(w),    (110)


where f is the Lovász extension of F and C = K ∩ [0, 1]^Ω is the convex hull of {1_X : X ∈ L}.
Proof: By the definition in (108), we have

f*_C(z) = max_{X∈L} z(X) − F(X) = −min_{X∈L} F(X) − z(X) = −min_{w∈C} f(w) − ⟨w, z⟩ = max_{w∈C} ⟨w, z⟩ − f(w),

where the second-to-last equality follows from Theorem 5.4 and from identifying the modular function z(·) with its Lovász extension ⟨·, z⟩.
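For a finite lattice, (108) is a direct maximization over L; the brute-force Python sketch below (hypothetical submodular F on 2^Ω) also probes the supermodularity of the conjugate claimed in Theorem 7.1.

```python
from itertools import chain, combinations

Omega = (1, 2, 3)
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]
F = {S: len(S) ** 0.5 for S in subsets}        # hypothetical submodular F

def conj(z):
    """f*_C(z) = max_{X in L} z(X) - F(X), cf. (108)."""
    return max(sum(z[x] for x in S) - F[S] for S in subsets)

z1 = {1: 0.9, 2: 0.1, 3: -0.5}
z2 = {1: 0.2, 2: 0.6, 3: 0.3}
up = {x: max(z1[x], z2[x]) for x in Omega}     # z1 v z2
dn = {x: min(z1[x], z2[x]) for x in Omega}     # z1 ^ z2
print(conj(up) + conj(dn) >= conj(z1) + conj(z2) - 1e-12)   # True (Theorem 7.1)
```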

Corollary 7.1: Double conjugation

Let F : L → R be submodular; then

(f*_C)*(w) := sup_{z∈R^Ω} ⟨w, z⟩ − f*_C(z) = f(w) + ι_C(w),    (111)

where ι_C(w) = 0 if w ∈ C and ∞ otherwise. In particular, for all X ∈ L,

(f*_C)*(1_X) = sup_{z∈R^Ω} z(X) − f*_C(z) = F(X).    (112)

Proof: Following Remark 5.1 we assume w.l.o.g. that F is centered. When F is submodular, the Lovász extension f coincides with the support function σ_{B_F}, hence f + ι_C is closed and convex. Then (111) follows from the classic biconjugation result in convex analysis, while (112) follows from Theorem 5.1.

Proposition 7.1: Conjugate of supermodular

Let G : L → R be supermodular; then its conjugate g*_C : R^Ω → R ∪ {−∞} satisfies

g*_C(z) := min_{X∈L} {z(X) − G(X)} = z(Ω) − G(Ω) − (g¬_C̃)*(z),    (113)

where C = K ∩ [0, 1]^Ω, C̃ = (−K) ∩ [0, 1]^Ω, and (g¬_C̃)* is the conjugate of G¬ (see Definition 4.1).
Proof: Clearly G¬ : L̄ → R is submodular and the isotonic cone of L̄, as discussed in Remark 4.5, is −K. By the definition of G¬ and Theorem 7.2,

g*_C(z) = min_{X∈L} z(X) − G(X)
 = z(Ω) − G(Ω) − max_{X∈L} { z(Ω \ X) − (G(Ω) − G(X)) }
 = z(Ω) − G(Ω) − (g¬_C̃)*(z).

The Lovász extension nicely bridges convex functions in the continuous domain and submodular functions in the discrete domain. Let us illustrate this point with another duality result.

Theorem 7.3: Discrete Fenchel duality

Let F : L_1 → R be submodular and G : L_2 → R be supermodular; then for their conjugate functions f*_{C_1} and g*_{C_2} (for the latter we use min instead of max in Definition 7.1), we have

min{F(X) − G(X) : X ∈ L_1 ∧ L_2} = max{g*_{C_2}(z) − f*_{C_1}(z) : z ∈ R^Ω}.    (114)

Moreover, if F, G are integer valued, the maximizer of the RHS can be chosen in Z^Ω.
Proof: Through translation we may assume w.l.o.g. that G(Ω) = 0. By Corollary 4.4 and Proposition 7.1, we have (after restricting to L_1 ∧ L_2)


LHS = min{F(X) + G¬(Ω \ X) : X ∈ L_1 ∧ L_2} = (F □ G¬)(Ω) = max{p(Ω) : p ∈ P_F ∩ P_{G¬}},
RHS = max{(G¬ □ z)(Ω) + (F □ z)(Ω) − z(Ω) : z ∈ R^Ω}
 = max{s(Ω) + t(Ω) − z(Ω) : s ∈ P_F, s ≤ z, t ∈ P_{G¬}, t ≤ z}
 = max{(s ∧ t)(Ω) : s ∈ P_F, t ∈ P_{G¬}},

where in the last equality we used the fact that we may take z = s ∨ t and that s + t = s ∨ t + s ∧ t. It is now clear that by taking p = s = t = z we have LHS = RHS. When F, G are integer valued, p can be taken integral, thanks to Corollary 4.4, hence also z on the RHS.

Definition 7.2: Subgradient

w ∈ R^Ω is a subgradient of F : L → R at X ∈ L if for all Y ∈ L,

F(Y) ≥ F(X) + w(Y) − w(X).    (115)

The subdifferential ∂F(X) consists of all subgradients of F at X ∈ L. Clearly the subdifferential is a polyhedron.

Theorem 7.4: Duality

Let F : L → R, w ∈ R^Ω, and X ∈ L; then

w ∈ ∂F(X) ⟺ f*_C(w) + F(X) = w(X) ⟺ w ∈ ∂f_C(1_X).    (116)

If F is submodular, we can add the equivalence 1_X ∈ ∂f*_C(w).

Theorem 7.5: Optimality condition

X ∈ L minimizes F : L → R iff 0 ∈ ∂F (X).

Proposition 7.2: Irrelevance of disjoint sets

Let F : L → R be submodular and X ∈ L; then w ∈ R^Ω belongs to the subdifferential ∂F(X) iff (115) holds for all L ∋ Y ⊆ X and L ∋ Y ⊇ X.
Proof: We need only prove the sufficiency. Fix any Z ∈ L; then

w(X ∪ Z) − w(X) ≤ F(X ∪ Z) − F(X),
w(X ∩ Z) − w(X) ≤ F(X ∩ Z) − F(X).

Adding the inequalities and applying the submodularity of F completes the proof.

Theorem 7.6: Decomposing the subdifferential

Let F : L → R be submodular and X ∈ L; then

∂F(X) = ∂F_X(X) ⊗ ∂F^X(∅),    (117)

where F_X and F^X are defined in Theorem 3.8.

December 14, 2015 revision: 1 SADO_main 47 7 Duality

Proof: By definition, p ∈ ∂F_X(X) and q ∈ ∂F^X(∅) iff for all L ∋ Y ⊆ X and L ∋ Z ⊇ X,

p(Y) − p(X) ≤ F_X(Y) − F_X(X) = F(Y) − F(X),
q(Z − X) ≤ F^X(Z − X) − F^X(∅) = F(Z) − F(X).

Apply Proposition 7.2.

Theorem 7.7: Divide and conquer

Let F : L → R be submodular; then X ∈ L minimizes F iff X minimizes F_X and ∅ minimizes F^X.
Proof: Apply Theorem 7.5.

Clearly we can iterate the argument in the spirit of Remark 3.1.

Proposition 7.3: Subdifferential of extremal sets

Let F : L → R be centered; then

∂F(∅) = P_F    (118)
∂F(Ω) = P_{F¬}.    (119)

If F is additionally submodular, then for all X ∈ L, ∂F(X) ∩ B_F ≠ ∅.
Proof: We only prove the last claim. If F is submodular, by Proposition 3.2 there exists some base b such that b(X) = F(X). Verify (115).

Theorem 7.8: Subdifferential is positively homogeneous

For any function F : L → R and λ > 0, ∂(λF )(X) = λ · ∂F (X) for all X ∈ L.

Theorem 7.9: Subdifferential is additive

Let F : L_1 → R and G : L_2 → R be submodular. For all X ∈ L := L_1 ∧ L_2,

∂(F + G)(X) = ∂F(X) + ∂G(X).    (120)

Proof: According to Theorem 7.6 and Proposition 7.3,

∂(F + G)(X) = ∂(F + G)_X(X) ⊗ ∂(F + G)^X(∅)
 = ∂(F_X + G_X)(X) ⊗ ∂(F^X + G^X)(∅)
 = P_{(F_X + G_X)¬} ⊗ P_{F^X + G^X}
 = P_{(F_X)¬ + (G_X)¬} ⊗ P_{F^X + G^X}
 = (P_{(F_X)¬} + P_{(G_X)¬}) ⊗ (P_{F^X} + P_{G^X})
 = (∂F_X(X) + ∂G_X(X)) ⊗ (∂F^X(∅) + ∂G^X(∅))
 = (∂F_X(X) ⊗ ∂F^X(∅)) + (∂G_X(X) ⊗ ∂G^X(∅)) = ∂F(X) + ∂G(X).


Again, the surprising part of this result is that we do not, although we could, restrict F or G to L.

Theorem 7.10: Summation is dual to convolution

Let F and G be submodular on L, then

(F + G)* = (f_C + g_C)* = f*_C □ g*_C,    (121)

where recall that C := K ∩ [0, 1]^Ω is the convex hull of {1_X : X ∈ L}, and □ here denotes the infimal convolution.
Proof: The first equality is always true due to Theorem 7.2 and Theorem 5.2. In case of submodularity, the Lovász extensions are convex, thanks to Theorem 5.5.

Theorem 7.11: Extreme points of subdifferential

Let L be simple and F : L → R be submodular. For any X ∈ L, w ∈ R^Ω is an extreme point of ∂F(X) iff there exists a maximal increasing sequence

∅ = S_0 ⊂ S_1 ⊂ ··· ⊂ S_n = Ω

in L that contains X, such that for all i ∈ {1, . . . , n}, w(S_i \ S_{i−1}) = F(S_i) − F(S_{i−1}).
Proof: Following Remark 5.1 we assume w.l.o.g. that F is centered. By Proposition 7.3 we have ∂F(∅) = P_F. Thanks to submodularity, the current theorem for X = ∅ is true by Proposition 2.5 and Corollary 4.2. Similarly for X = Ω. Finally apply Theorem 7.6.

Theorem 7.12: Intersection of subdifferentials

Consider F : L → R. Every w ∈ C can be written as w = Σ_i λ_i 1_{A_i} for some A_i ∈ L, λ_i > 0 with Σ_i λ_i = 1, and

∂f_C(w) = ∩_i ∂F(A_i).    (122)

Proof: By Theorem 7.4, z ∈ ∂f_C(w) iff

⟨z, w⟩ = f_C(w) + f*_C(z) ⟺ Σ_i λ_i ⟨z, 1_{A_i}⟩ = Σ_i λ_i (F(A_i) + f*_C(z)) ⟺ ∀ i, ⟨z, 1_{A_i}⟩ = F(A_i) + f*_C(z).

Apply Theorem 7.4 once more.

Definition 7.3: Normal cone

Following convex analysis, we define the normal cone of a lattice L at the set X ∈ L as

N_L(X) := {w ∈ R^Ω : ⟨w, 1_Y − 1_X⟩ ≤ 0, ∀ Y ∈ L}.

Theorem 7.13: Constrained optimality

Let F : L → R be submodular and L_0 ⊆ L a sublattice; then X ∈ L_0 minimizes F on L_0 iff

0 ∈ ∂F(X) + N_{L_0}(X).    (123)


Proof: The sufficiency is immediate from the definitions. On the other hand, if X minimizes F on L_0, we consider the constant F(X) as a supermodular function on the sublattice L_0. Applying the sandwich Theorem 4.9, we obtain some p ∈ R^Ω so that F ≥ p on L while p ≥ F(X) on L_0. Necessarily F(X) = p(X). Thus for all Z ∈ L, F(Z) ≥ F(X) + p(Z) − p(X), i.e., p ∈ ∂F(X), while for all Y ∈ L_0, p(X) − p(Y) ≤ F(X) − F(X) = 0, i.e., −p ∈ N_{L_0}(X). Clearly 0 = p + (−p) verifies (123).

Usually the sublattice L_0 is given explicitly as the constraint set

L_0 := {X ∈ L : F_i(X) = α_i, i = 1, . . . , m},    (124)

where the F_i : L → R are submodular and α_i = min{F_i(X) : X ∈ L}. Thanks to Proposition 2.1, L_0 thus defined is indeed a sublattice of L. Very interestingly, we can also develop a theory of Lagrangian duality for the (discrete) constrained minimization problem

min_{X∈L_0} F(X).    (125)

First recall that the Lagrangian is defined as

L(X, µ) := F(X) + Σ_{i=1}^m µ_i (F_i(X) − α_i),    (126)

where µ ∈ R₊^m is called a Lagrangian multiplier if

min_{X∈L} L(X, µ) = min_{X∈L_0} F(X).    (127)

Note that the constraint Fi(X) = αi is equivalent to Fi(X) ≤ αi since αi is the minimum of Fi. That partly explains why we take µ to be nonnegative, but the true reason will become clear only later.

Theorem 7.14: Existence of a Lagrangian multiplier

There always exists a Lagrangian multiplier for problem (125), where L_0 is given by (124) and α_i = min_{X∈L} F_i(X).
Proof: Let

γ := min_{i=1,...,m} min{F_i(X) − α_i : X ∈ L, F_i(X) ≠ α_i}.

Clearly γ > 0, due to the minimality of α_i. Let β = min_{X∈L} F(X). If X ∉ L_0, then

L(X, µ) ≥ β + γµ_i

for some 1 ≤ i ≤ m. Therefore, if

min_{i=1,...,m} µ_i ≥ (1/γ) ( min_{X∈L_0} F(X) − β ),

we must have equality in (127).

No assumption of submodularity is needed here, thanks to the inherent finite structure. Note that our proof is semi-constructive, as it requires knowledge of min_{X∈L} F(X) and min_{X∈L_0} F(X), while the latter is our true interest! Also clear from the proof is that any upper bound of a Lagrangian multiplier remains a Lagrangian multiplier.

Given a Lagrangian multiplier, clearly any minimizer of the RHS in (127) also minimizes its LHS. Conversely, any minimizer of the LHS lying in L_0 minimizes the RHS too. Recall that (X*, µ*) is called a saddle-point of the Lagrangian iff

sup_{µ∈R₊^m} L(X*, µ) ≤ L(X*, µ*) ≤ min_{X∈L} L(X, µ*).    (128)

Note that the inequalities obviously imply equalities.


Theorem 7.15: Saddle-point and optimality

Let F, F_i : L → R, i = 1, . . . , m, be submodular. Then X* ∈ L_0 minimizes F on L_0 and µ* ∈ R₊^m is a Lagrangian multiplier iff (X* ∈ L, µ* ∈ R₊^m) is a saddle-point of the Lagrangian, iff X* ∈ L_0, µ* ∈ R₊^m and

0 ∈ ∂F(X*) + Σ_{i=1}^m µ*_i ∂F_i(X*),    (129)

where the subdifferentials are taken on L.
Proof: If (X*, µ*) is a saddle-point of the Lagrangian, then (128) holds, hence we have

sup_{µ∈R₊^m} L(X*, µ) = L(X*, µ*) = min_{X∈L} L(X, µ*),    (130)

from which we immediately know X* ∈ L_0, for otherwise the LHS would be unbounded from above. On the other hand, clearly RHS ≤ min_{X∈L_0} F(X). Thus X* minimizes F on L_0 and µ* is a Lagrangian multiplier. The converse is obvious, hence we have proved the first equivalence.
If X* minimizes F on L_0 and µ* is a Lagrangian multiplier, then applying Theorem 7.5, Theorem 7.8 and Theorem 7.9 to the RHS of (130) yields (129). Note that we need the Lagrangian multiplier µ to be nonnegative in order to apply Theorem 7.8. For the converse, note that (129) implies X* is optimal for the RHS of (130), whence it follows that X* minimizes F on L_0 and µ* is a Lagrangian multiplier.

Theorem 7.16: Characterizing the Lagrangian multipliers

Let F, F_i : L → R, i = 1, . . . , m, be submodular. The Lagrangian multipliers for (125) are given by

argmax_{µ∈R₊^m} { g(µ) := min_{X∈L} L(X, µ) }.    (131)

Proof: Theorem 7.14 proves the existence of a Lagrangian multiplier µ*. Let X* be any minimizer of F on L_0. For any µ̂ that maximizes (131), according to (130),

min_{X∈L} L(X, µ̂) = g(µ̂) ≤ sup_{µ∈R₊^m} L(X*, µ) = L(X*, µ*) = min_{X∈L_0} F(X),

verifying that µ̂ is a Lagrangian multiplier, since the reverse inequality g(µ̂) ≥ g(µ*) = min_{X∈L_0} F(X) holds by the maximality of µ̂. By (128), any Lagrangian multiplier clearly belongs to (131).

Since the function g is by definition increasing w.r.t. µ, we see again that any upper bound of a Lagrangian multiplier remains a Lagrangian multiplier.

Remark 7.1: Optimizing the dual

Thanks to Theorem 7.16 and Theorem 7.14, to solve the constrained problem (125) we can first optimize the dual problem (131) to find a Lagrangian multiplier, based on which we minimize the "unconstrained" Lagrangian w.r.t. X ∈ L. If the minimizer X* happens to be in L_0, we have found a minimizer of the primal problem (125). Happily, the dual problem (131) is to maximize a bounded (from above) continuous concave function g on R₊^m. A brute-force sketch on a tiny instance follows.
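The Python sketch below (hypothetical F and a single modular constraint F_1 on the full lattice) computes a multiplier from the bound in the proof of Theorem 7.14 and then minimizes the resulting Lagrangian over all of L.

```python
from itertools import chain, combinations

Omega = (1, 2, 3)
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]

F = {S: len(S) ** 0.5 - 0.6 * len(S) for S in subsets}   # submodular objective
F1 = {S: len(S - {1}) for S in subsets}                   # modular constraint function
a1 = min(F1.values())                                     # alpha_1 = 0
L0 = [S for S in subsets if F1[S] == a1]                  # feasible sublattice (124)

# Multiplier bound from the proof of Theorem 7.14:
gamma = min(F1[S] - a1 for S in subsets if F1[S] != a1)
beta = min(F.values())
mu = (min(F[S] for S in L0) - beta) / gamma               # any larger mu also works

Lag = lambda S: F[S] + mu * (F1[S] - a1)                  # Lagrangian (126)
Xstar = min(subsets, key=Lag)
print(Xstar, F[Xstar] == min(F[S] for S in L0))           # frozenset() in L_0, True
```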


8 Algorithms

We consider the separable minimization problem first:

min_{b∈B_F} Σ_{x∈Ω} ϕ_x(b_x),    (132)

where F : L → R is a centered submodular function with B_F its base polyhedron, and each ϕ_x is a univariate (finite-valued) convex function.

Theorem 8.1: Optimality for separable convex minimization

b ∈ B_F is optimal for (132) iff for all x ∈ Ω, y ∈ dep(b, x) we have

ϕ⁺_x(b_x) ≥ ϕ⁻_y(b_y),    (133)

where ϕ⁻ and ϕ⁺ denote the left and right derivatives of ϕ, respectively.

Proof: Clearly b is optimal iff 0 ∈ ∂ϕ(b) + N_{B_F}(b), namely, according to Corollary 4.3,

∀ x ∈ Ω, ∀ y ∈ dep(b, x), ⟨−∂ϕ(b), 1_x − 1_y⟩ ≤ 0,

which is (133).

Taking ϕ_x(b_x) = w_x b_x, we recover Theorem 4.6.

The next result is important for developing a decomposition algorithm.

Theorem 8.2: Optimality certificate

b ∈ B_F is optimal for (132) iff there exists a chain ∅ = A_0 ⊂ A_1 ⊂ ··· ⊂ A_k = Ω in L such that

(I). ∀ i, b(A_i) = F(A_i);

(II). ∀ i ≥ j, ∀ x ∈ A_i \ A_{i−1}, ∀ y ∈ A_j \ A_{j−1}, ϕ⁺_x(b_x) ≥ ϕ⁻_y(b_y).

Proof: ⟸: We verify Theorem 8.1. Let x ∈ Ω, y ∈ dep(b, x). Thanks to (I), we know x ∈ A_i \ A_{i−1}, y ∈ A_j \ A_{j−1} for some i ≥ j. Thus (II) implies (133).
⟹: Consider the saturation lattice S_b := {X ∈ L : b(X) = F(X)}. Following Remark 1.2 we construct the ordered set P(S_b) = {[X_1], . . . , [X_k]}. For each i, define g⁻_i = max{ϕ⁻_y(b_y) : y ∈ [X_i]} and similarly g⁺_i = min{ϕ⁺_x(b_x) : x ∈ [X_i]}. Note that x ∈ [X_i], y ∈ [X_j] with [X_j] ⪯ [X_i] implies y ∈ dep(b, x), hence by Theorem 8.1 ϕ⁺_x(b_x) ≥ ϕ⁻_y(b_y); thus g⁺_i ≥ g⁻_j. Finally define g↑_i := max{g⁻_j : [X_j] ⪯ [X_i]}. Clearly g⁻_i ≤ g↑_i ≤ g⁺_i, and [X_j] ⪯ [X_i] ⟹ g↑_j ≤ g↑_i. Assume w.l.o.g. that g↑_1 ≤ . . . ≤ g↑_k and let A_i = ∪_{j≤i}[X_j]. (I) is clearly met since [X_j] ∈ S_b. For (II), consider x ∈ [X_i], y ∈ [X_j] with i ≥ j; we have ϕ⁺_x(b_x) ≥ g⁺_i ≥ g↑_i ≥ g↑_j ≥ g⁻_j ≥ ϕ⁻_y(b_y), as required.

In the following algorithm, we assume each ϕ_x, x ∈ Ω, is super-coercive, i.e., ϕ_x(b)/|b| → ∞ as |b| → ∞, which guarantees that the Fenchel conjugate ϕ*_x is finite, hence subdifferentiable, everywhere.

Algorithm 8.1: Decomposition algorithm

(I). Find η ∈ R such that F(Ω) ∈ Σ_{x∈Ω} ∂ϕ*_x(η);

(II). Find b ∈ B_F such that for all x, y ∈ Ω:
a) ϕ⁺_x(b_x) < η and ϕ⁻_y(b_y) > η imply y ∉ dep(b, x);
b) ϕ⁺_x(b_x) < η, ϕ⁻_y(b_y) = η and y ∈ dep(b, x) imply b_y = inf ∂ϕ*_y(η);
c) ϕ⁺_x(b_x) = η, ϕ⁻_y(b_y) > η and y ∈ dep(b, x) imply b_x = sup ∂ϕ*_x(η);


(III). Let Ω₊ = {x ∈ Ω : ϕ⁺_x(b_x) < η}, Ω₋ = {x ∈ Ω : ϕ⁻_x(b_x) > η} and Ω₀ = Ω \ Ω₊ \ Ω₋. Set b*(Ω₀) = b(Ω₀) and repeat the algorithm for the subproblems on Ω₊ and Ω₋. Needless to say, the current algorithm generalizes the greedy Algorithm 4.1.

Let us explain Algorithm 8.1: according to Theorem 8.1, we need to satisfy the optimality condition (133); hence (II)a rules out those pairs which could potentially violate this optimality condition, while (II)b and (II)c are needed merely for technical reasons.

Proposition 8.1: Algorithm 8.1 is well-defined and terminates

Algorithm 8.1 is well-defined and terminates in finitely many steps.

Proof: (I): Equivalently, we may consider the problem

    min_{q ∈ R^Ω} ∑_{x∈Ω} ϕ_x(q_x)  s.t.  q(Ω) = F(Ω),    (134)

whose Fenchel dual is

    max_{η ∈ R}  η · F(Ω) − ∑_{x∈Ω} ϕ_x^∗(η).    (135)

Therefore (I) is nothing but the optimality condition of (135). Since each ϕ_x by assumption is finite everywhere, (135) admits a maximizer.

(II): We provide an implementation for (II). Note first that ϕ_x^+(b_x) < η ⇐⇒ b_x < inf ∂ϕ_x^∗(η), and similarly ϕ_y^−(b_y) > η ⇐⇒ b_y > sup ∂ϕ_y^∗(η). Start with any base b ∈ B_F and perform the following procedure:

1). Find x ∈ Ω so that b_x < inf ∂ϕ_x^∗(η); if no such x exists, b is optimal;

2). If there exists y ∈ dep(b, x) such that b_y > sup ∂ϕ_y^∗(η), update b ← b + α(e_x − e_y) with

    0 ≤ α = min{ c(b, x, y), sup ∂ϕ_x^∗(η) − b_x, b_y − inf ∂ϕ_y^∗(η) };

if there is no such y, remove dep(b, x) from later considerations and go to 1);

3). If α = sup ∂ϕ_x^∗(η) − b_x, remove x from later considerations and go to 1); otherwise repeat 2).

Some explanations: If we find no x in step 1), then b ∈ B_F is optimal for the relaxation (134), hence also optimal for (132). Step 2) makes one of (II)a, (II)b and (II)c happen for the pair (x, y). We need to argue that subsequent updates never ruin previous updates. Indeed, note first that if α = sup ∂ϕ_x^∗(η) − b_x (resp. α = b_y − inf ∂ϕ_y^∗(η)), then b_x (resp. b_y), which meets (II)c (resp. (II)b), is never updated again. Then we prove that step 2) is repeated for each x at most |Ω| times: We need only consider α = c(b, x, y); after the immediate update of b, y ∉ dep(b, x), and in subsequent repetitions of step 2) (for the same x), dep(b, x) can only shrink. Finally observe that we either remove x from our consideration because b_x = sup ∂ϕ_x^∗(η), in which case (II)c is satisfied, or we remove dep(b, x), in which case no element of dep(b, x) is ever updated again: For all x′ ≠ x in dep(b, x), either b_{x′} = inf ∂ϕ_{x′}^∗(η) (which clearly remains intact) or b_{x′} < inf ∂ϕ_{x′}^∗(η) (but dep(b, x′) ⊆ dep(b, x), thus step (II)b is never executed for x′).
(III):

Proposition 8.2: Algorithm 8.1 is correct


Fujishige [2005] minimum-point algorithm.

The problem of maximizing a symmetric non-monotone submodular function subject to no constraints admits a 1/2-approximation algorithm [Feige et al., 2011]. Computing the maximum cut of a graph is a special case of this problem. The more general problem of maximizing an arbitrary non-monotone submodular function subject to no constraints also admits a 1/2-approximation algorithm [Buchbinder et al., 2012]. The problem of maximizing a monotone submodular function subject to a cardinality constraint admits a (1 − 1/e)-approximation algorithm [Nemhauser et al., 1978]. The maximum coverage problem is a special case of this problem. The more general problem of maximizing a monotone submodular function subject to a matroid constraint also admits a (1 − 1/e)-approximation algorithm.

9 Graph Theorems and Algorithms

We collect here some of the most important graph algorithms in combinatorial optimization. Recall that a graph is defined as the pair (V ; E) where V is the set of vertices and E is the set of edges. If the graph is directed, we will use A to denote the arcs (instead of E for the edges).

Definition 9.1: Walk, Path, Cycle and Circuit

A walk in a graph is simply an alternating sequence of vertices and edges (arcs) v_0, e_1, v_1, . . . , e_k, v_k, where {v_i : 0 ≤ i ≤ k} are vertices and {e_i : 1 ≤ i ≤ k} are edges (arcs) connecting v_{i−1} and v_i. A path is simply a walk whose vertices (hence edges/arcs) are all distinct. A cycle is a walk with v_0 = v_k, and a circuit is a cycle whose vertices, except the first one, are all distinct.

Alert 9.1: Simple Graph

Note that when we say graph we allow multiple edges/arcs between two vertices. When this is disallowed, we will explicitly use the phrase simple graph.

Algorithm 9.1: Graph Scanning

Let G = (V ; E) be a graph and s a vertex. The following procedure scans all vertices and edges/arcs reachable from s. Start with Q = {s}, R = {s}, T = ∅.

(I). If Q = ∅ stop; otherwise choose v from Q.

(II). Choose w ∈ V − R with vw ∈ E and set Q ← Q ∪ {w}, R ← R ∪ {w}, T ← T ∪ {vw}; if there is no such w, set Q ← Q − {v}.

(III). Repeat.

When the algorithm stops, R is the set of vertices reachable from s and T is a set of edges/arcs forming a tree that spans R. The overall complexity is O(|V | + |E|) since each vertex/edge is scanned at most once.
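As an illustration (ours, not taken from the source), here is a minimal Python sketch of the scanning procedure; choosing the FIFO discipline for Q specializes it to breadth first search (Algorithm 9.2 below) and the LIFO discipline to depth first search (Algorithm 9.3). The adjacency-list input adj is an assumption of the sketch.

    from collections import deque

    def scan(adj, s, fifo=True):
        # adj: dict mapping each vertex to a list of its neighbours
        Q, R, T = deque([s]), {s}, []
        while Q:
            v = Q[0] if fifo else Q[-1]      # step (I): choose v from Q
            for w in adj[v]:
                if w not in R:               # step (II): reach a new vertex
                    R.add(w); T.append((v, w)); Q.append(w)
                    break
            else:                            # no such w: retire v from Q
                if fifo:
                    Q.popleft()
                else:
                    Q.pop()
        return R, T

For brevity this version rescans adjacency lists, so it runs in O(|V| · |E|) rather than the O(|V| + |E|) of the text; keeping a per-vertex iterator recovers the linear bound.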

Algorithm 9.2: Breadth First Search

This is a specialization of Algorithm 9.1, where the query set Q follows first in, first out. Fix a vertex s and denote V_i as the set of vertices at distance i from s. Clearly V_0 = {s} and V_{i+1} = {v ∈ V − ∪_{j≤i} V_j : ∃u ∈ V_i such that uv ∈ E}. The partition V = ∪_i V_i can be computed in time O(|V | + |E|). Clearly we can use breadth first search to find a shortest path from the vertex s to any other vertex t: simply check j := min{i : t ∈ V_i}. The V_i's also give all the distances from s to the other vertices.


Algorithm 9.3: Depth First Search

This is another specialization of Algorithm 9.1, where the query set Q follows first in, last out. Given the vertex s we can also perform a depth first search by recursion, using the scan operator:

scan(v): Delete all edges/arcs pointing to v. For each u ∈ V such that vu ∈ E, scan(u).

We order the vertices by

    v_i ≺ v_j : scan(v_i) finishes earlier than scan(v_j),

which is a strict partial ordering on all vertices that are reachable from s. Moreover, this order enjoys the property that if v_i ≺ v_j and v_j is reachable from v_i, then v_i must be reachable from v_j too, i.e., there exists a circuit in the graph. It is easy to see that the complexity of determining this ordering is O(m), where m is the number of edges/arcs reachable from s. By adding an extra vertex t to the graph and connecting it to all vertices, scan(t) will give us an ordering of all vertices. Clearly the complexity is O(|V | + |E|).
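A minimal recursive sketch (ours) of the scan operator recording the finishing order, which is what Algorithm 9.4 below consumes; looping over all vertices plays the role of the extra vertex t:

    import sys

    def finish_order(adj):
        sys.setrecursionlimit(10**6)     # the recursion can be deep
        seen, order = set(), []
        def scan(v):
            seen.add(v)
            for u in adj[v]:
                if u not in seen:
                    scan(u)
            order.append(v)              # v finishes after all its descendants
        for v in adj:                    # mimics scan(t) with t joined to all
            if v not in seen:
                scan(v)
        return order                     # order[i] finishes earlier than order[i+1]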

Algorithm 9.4: Strong Components

To find the strong components in a digraph, we first order the vertices such that v_1 ≻ v_2 ≻ · · · ≻ v_n. Scan the largest vertex v_1 and let V_1 be the set of vertices from which v_1 is reachable. Due to the property of the ordering, V_1 is exactly the component containing v_1. Delete from the graph the component V_1 and repeat the procedure (no need to reorder the remaining vertices). The overall complexity is dominated by the ordering, that is, O(|V | + |A|).

Definition 9.2: Eulerian Cycle

An Eulerian cycle in a (di)graph is a cycle which traverses each edge/arc exactly once. It is well-known that a connected undirected graph has an Eulerian cycle iff each vertex has even degree, while a connected digraph has an Eulerian cycle iff for each vertex the in-degree and out-degree agree.

Algorithm 9.5: Eulerian Cycle

Choose a non-isolated vertex and make a walk as long as possible so that no edge/arc is traversed twice. If there is an Eulerian cycle we must end in the starting vertex again. Delete the traversed edges/arcs and repeat the procedure. Since each edge is traversed at most once, the complexity is O(|V | + |E|).

Algorithm 9.6: Dijkstra’s Shortest Path

Given a digraph G = (V ; A) and a nonnegative length function ℓ : A → R_+, we want to find a shortest path from the vertex s to another vertex t.

Start with the distance vector d ∈ R^V_+ with d(s) = 0 and d(v) = ∞, ∀v ≠ s. Set U = V and repeat the following: Find u ∈ U such that d(u) is minimal over U. For each uv ∈ A set d(v) = d(v) ∧ (d(u) + ℓ(uv)). Delete u from U. Clearly the complexity is at most O(|V |^2).

Using induction it is clear that the distance vector d maintained by the algorithm is always an upper bound of the true distance dist in the graph. In each iteration, we claim that d(u) = dist(u) for the chosen u. Use induction: d(s) = 0 = dist(s). Suppose d(u) > dist(u). Let s = v_0, . . . , v_k = u be a shortest s − u path and i ≥ 1 be the smallest index such that v_i ∈ U. Then the fact that v_{i−1} has been chosen previously leads to d(v_i) ≤ d(v_{i−1}) + ℓ(v_{i−1}v_i) = dist(v_{i−1}) + ℓ(v_{i−1}v_i) = dist(v_i) ≤ dist(u) < d(u), contradicting our choice of u ∈ U. Note that the nonnegativity of the length function is used to derive the second inequality. By using a fancier data structure (the Fibonacci heap), Dijkstra's algorithm can be accelerated to


O(|A| + |V | log |V |), i.e., almost linear time.
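A minimal heap-based sketch of Dijkstra's algorithm (ours; adj maps each vertex to a list of (neighbour, length) pairs with nonnegative lengths — these conventions are our assumptions):

    import heapq

    def dijkstra(adj, s):
        dist = {v: float('inf') for v in adj}
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:              # stale heap entry: u already settled
                continue
            for v, length in adj[u]:
                nd = d + length
                if nd < dist[v]:         # d(v) = d(v) ∧ (d(u) + ℓ(uv))
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

With a binary heap this runs in O(|A| log |V|); the O(|A| + |V| log |V|) bound of the text requires the Fibonacci heap.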

Algorithm 9.7: Bellman-Ford Shortest Path

This is an instance of dynamic programming. Note first that it is NP-hard to find the shortest path between two vertices in a digraph when the length function is arbitrary (reduction from Hamiltonian path). However, if there is no negative-length directed circuit reachable from s, the following algorithm will do.

Fix s ∈ V and denote d_k ∈ R^V as the distance vector where d_k(v) is the minimum length of s − v walks traversing at most k arcs. Clearly d_0(s) = 0 and d_0(v) = ∞, ∀v ≠ s. Given d_k, d_{k+1} is given by d_{k+1}(v) = d_k(v) ∧ (min_{uv∈A} d_k(u) + ℓ(uv)). The complexity is easily seen to be O(|V ||A|).

Moreover, by comparing the distance vectors d_{|V|} and d_{|V|−1}, we can detect the existence of a negative-length directed circuit reachable from s: If d_{|V|} = d_{|V|−1}, denote any circuit reachable from s as v_0, v_1, . . . , v_k = v_0; then d_{|V|}(v_i) ≤ d_{|V|−1}(v_{i−1}) + ℓ(v_{i−1}v_i) for all 1 ≤ i ≤ k. Summing up, we get that the circuit must have nonnegative length. Conversely, if ∃t such that d_{|V|}(t) < d_{|V|−1}(t), then there exists a shortest s − t walk which traverses |V | arcs, i.e., it contains a circuit. Of course, the circuit can only have negative length.
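A minimal sketch (ours) of the recursion together with the negative-circuit test; representing the digraph by a list of (u, v, length) triples is our assumption:

    def bellman_ford(vertices, arcs, s):
        INF = float('inf')
        d = {v: INF for v in vertices}
        d[s] = 0.0
        for _ in range(len(vertices) - 1):       # computes d_{|V|-1}
            for u, v, length in arcs:
                if d[u] + length < d[v]:
                    d[v] = d[u] + length
        # one more round: any improvement reveals a negative circuit
        for u, v, length in arcs:
            if d[u] + length < d[v]:
                raise ValueError("negative-length circuit reachable from s")
        return d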

Definition 9.3: Potential

Let G := (V ; A) be a digraph with length function ℓ : A → R. The function p : V → R is called a potential for G if ∀uv ∈ A, ℓ(uv) ≥ p(v) − p(u). Add a new vertex s to the graph, connect it to every vertex by a zero-length arc, and compute the shortest distance from s to each vertex: the resulting distance function is trivially a potential for the graph. Conversely, given a potential p, we can change the length function to ℓ(uv) − p(v) + p(u) to make it nonnegative, without affecting the relative lengths of paths between any pair of vertices. When a potential can be found cheaply, we will apply this trick so that the cheaper Dijkstra's algorithm can be employed to find shortest paths.

Algorithm 9.8: Floyd-Warshall Shortest Path

To compute the shortest distances between all pairs of vertices in a digraph (with arbitrary length function ℓ but without negative-length circuits), we proceed as follows. Arbitrarily order the vertices as v_1, . . . , v_n. Denote D_k ∈ R^{V×V} as the distance matrix where D_k(s, t) is the minimum length of s − t paths using only intermediate vertices from {v_1, . . . , v_k}. Define D_0(s, t) = ℓ(s, t) if st ∈ A, otherwise D_0(s, t) = ∞. Given D_k, D_{k+1}(s, t) = D_k(s, t) ∧ (D_k(s, v_{k+1}) + D_k(v_{k+1}, t)). This is yet another instance of dynamic programming. The complexity is O(|V |^3). However, we can easily get a faster algorithm: use the Bellman-Ford algorithm to compute a potential and then employ Dijkstra's algorithm for each vertex. This gives us O(|V |(|A| + |V | log |V |)). Using the Floyd-Warshall algorithm, it is easy to find a minimum-length circuit (provided that there is no negative-length circuit): simply solve min_{s∈V} min_{t∈V} D_n(s, t) + D_n(t, s), where n := |V |.
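A minimal sketch (ours) of the triple loop; D is assumed to be an n × n nested list initialized as D_0 above (ℓ(ij) if ij ∈ A, else ∞), modified in place:

    def floyd_warshall(D):
        n = len(D)
        for k in range(n):                       # allow v_{k+1} as intermediate
            for i in range(n):
                for j in range(n):
                    if D[i][k] + D[k][j] < D[i][j]:
                        D[i][j] = D[i][k] + D[k][j]
        return D

A minimum-length circuit (when no negative circuit exists) is then min over s, t of D[s][t] + D[t][s], as noted above.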

Algorithm 9.9: Karp’s Minimum Average-Length Directed Circuit

Given a digraph G = (V ; A) (with arbitrary length function ℓ), we want to find a directed circuit whose average length is minimal. Define d_0(v) = 0 for all v ∈ V, and d_k(v) as the minimum length of walks ending in v with exactly k arcs, that is,

    d_{k+1}(v) = min_{uv∈A} d_k(u) + ℓ(uv).


Set d_{k+1}(v) = ∞ if no such walk exists. Denote n := |V |. We prove that the minimum average length of cycles is equal to

    min_{v∈V} max_{0≤k≤n−1 : d_k(v)<∞} (d_n(v) − d_k(v)) / (n − k).    (136)

Note first that if there is no cycle (i.e., the minimum average length is ∞), then d_n ≡ ∞, hence the formula is correct. By subtracting some constant from each arc, we can assume w.l.o.g. that the minimum average length of cycles is 0. Let the minimum in (136) be attained by u. It is easy to see that the minimum is nonnegative: if d_n(u) = ∞ this is trivially true; otherwise the walk that leads to d_n(u) must be composed of a walk with k arcs ending at u and a nonnegative-length cycle with n − k arcs. Conversely, let C := {v_0, v_1, . . . , v_t = v_0} be a zero-length cycle. Then argmin_r d_r(v_0) can be chosen in the interval n − t ≤ r < n (since d_n(v_0), containing a nonnegative-length cycle, can be safely ignored). Fix such an r and split C into P := {v_0, . . . , v_{n−r}} and Q := {v_{n−r}, . . . , v_t = v_0}. We have d_k(v_{n−r}) + ℓ(Q) ≥ d_{k+(t−(n−r))}(v_0) ≥ d_r(v_0) ≥ d_n(v_{n−r}) − ℓ(P ), hence d_n(v_{n−r}) − d_k(v_{n−r}) ≤ ℓ(C) = 0.

Algorithmically, we compute the distance vectors up to d_n in time O(|V ||A|); find the minimizer u ∈ V which achieves the minimum in (136); and finally any circuit in the walk that leads to d_n(u) is the one we are looking for: by subtracting the circuit we get an upper bound on d_k(u) for some k, which in turn is an upper bound on d_n(u) due to the minimality of u. The overall complexity is dominated by the first step, hence O(|V ||A|).
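A minimal sketch (ours) of the recursion and of formula (136); vertices labelled 0..n−1 and the arc-list representation are our assumptions:

    def karp_min_mean_cycle(n, arcs):
        INF = float('inf')
        d = [[INF] * n for _ in range(n + 1)]
        d[0] = [0.0] * n                          # d_0 ≡ 0
        for k in range(n):
            for u, v, length in arcs:
                if d[k][u] + length < d[k + 1][v]:
                    d[k + 1][v] = d[k][u] + length
        best = INF                                # evaluates formula (136)
        for v in range(n):
            if d[n][v] < INF:
                worst = max((d[n][v] - d[k][v]) / (n - k)
                            for k in range(n) if d[k][v] < INF)
                best = min(best, worst)
        return best                               # INF iff there is no cycle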

Recall that for a collection of subsets Q = {∅ ≠ P_i ⊆ Ω : i ∈ I}, elements p_i ∈ Ω, i ∈ I, are called a system of representation for Q if there exists a bijection π : I → I such that p_i ∈ P_{π(i)}. Note that we might have p_i = p_j for some i ≠ j ∈ I; otherwise it is called a system of distinct representation.

Theorem 9.1: Hall-Rado Theorem

Let F : 2^Ω → R be a polymatroid function and Q = {∅ ≠ P_i ⊆ Ω : i ∈ I}. Fix d ∈ N; then Q has a system of representation p_i, i ∈ I, with F(∪_{i∈J} {p_i}) ≥ |J| + d, ∀J ⊆ I, iff

    F(∪_{i∈J} P_i) ≥ |J| + d, ∀J ⊆ I.    (137)

Proof: ⇒: Trivial, since ∪_{i∈J} {p_i} ⊆ ∪_{i∈J} P_i.

⇐: If |P_i| = 1 ∀i ∈ I, then the implication is trivial. Otherwise let, say, |P_{i_1}| > 1, hence there exist x_1, x_2 ∈ P_{i_1} with x_1 ≠ x_2. We claim that ∃k ∈ {1, 2} such that the system {P̂_{i_1} := P_{i_1} − {x_k}, P̂_i := P_i, i_1 ≠ i ∈ I} satisfies (137). Indeed, suppose not; then there exist J_1, J_2 ⊆ I with i_1 ∉ J_1 ∪ J_2 such that

    F((P_{i_1} − {x_k}) ∪ P(J_k)) ≤ |J_k| + d,  k ∈ {1, 2},

where we introduce the notation P(J) := ∪_{i∈J} P_i. But since F is polymatroid, we have

    F((P_{i_1} − {x_1}) ∪ P(J_1)) + F((P_{i_1} − {x_2}) ∪ P(J_2)) ≥ F(P({i_1} ∪ J_1 ∪ J_2)) + F(P(J_1 ∩ J_2))
        ≥ |J_1 ∪ J_2| + 1 + |J_1 ∩ J_2| + 2d
        = |J_1| + |J_2| + 1 + 2d,

contradiction. Therefore we can keep deleting elements until |P_i| = 1 ∀i ∈ I.

The case d = 0 is the most interesting. If we let F = | · |, then the system of representation is forced to be distinct, and (137) becomes a necessary and sufficient condition for the existence of transversals, usually known as Hall's Marriage Theorem (marrying p_i to P_{π(i)}, one husband and one wife!). If we let F be the rank function of some matroid, then (137) becomes a necessary and sufficient condition for the existence of transversals satisfying {p_i : i ∈ J} ∈ I, ∀J ⊆ I.


We add a few other theorems, each related to Hall's and König's theorems.

Theorem 9.2: Birkhoff Theorem

Let S ∈ R^{n×n}_+ be a doubly stochastic matrix (i.e., each row and column sums to 1). Then S can be written as a convex combination of permutation matrices.

Proof: For each i ∈ Ω := {1, . . . , n}, define P_i := {j ∈ Ω : S_{ij} > 0}, which is non-empty due to the assumption ∑_j S_{ij} = 1. Consider any subset J ⊆ Ω. Denoting P(J) := ∪_{j∈J} P_j, we have

    |P(J)| = ∑_{i=1}^n ∑_{j∈P(J)} S_{ij} ≥ ∑_{i∈J} ∑_{j∈P_i} S_{ij} = |J|,

where the first equality is due to ∑_i S_{ij} = 1 and the last equality follows from the definition of P_i. Applying the Hall-Rado theorem (with F = | · | and d = 0) we get a transversal from Ω to {P_i}_{i∈Ω}. The transversal can be represented as a permutation matrix, a suitable multiple of which, subtracted from S, will zero out one more element of S. Iterate the above procedure until S becomes 0.
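A small numerical sketch (ours, not from the source) of the procedure in the proof, assuming SciPy is available; linear_sum_assignment is used only to find a permutation supported on the positive entries of S, which exists by the Hall condition verified above:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def birkhoff(S, tol=1e-9):
        S = np.array(S, dtype=float)
        terms = []                               # (weight, permutation) pairs
        while S.max() > tol:
            # cost 0 on the support, 1 elsewhere: a zero-cost assignment
            # exists by Hall's condition and uses only positive entries
            cost = np.where(S > tol, 0.0, 1.0)
            rows, cols = linear_sum_assignment(cost)
            lam = S[rows, cols].min()            # largest feasible coefficient
            P = np.zeros_like(S)
            P[rows, cols] = 1.0
            terms.append((lam, P))
            S = S - lam * P                      # zeroes out one more entry
        return terms

Each iteration zeroes out at least one more entry of S, so the loop terminates after finitely many (at most n^2) iterations.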

It is not clear if Hall's theorem (or any of the above equivalents) follows directly from Birkhoff's Theorem, although for a regular (i.e., each node has the same degree) bipartite graph, the biadjacency matrix divided by the common degree is doubly stochastic, hence Hall's theorem does follow from Birkhoff's in this special case.

Theorem 9.3: König’s Theorem

Let G = (U, V ; E) be a bipartite graph; then

    min_{C a vertex cover} |C| = max_{M a matching} |M|.    (138)

Proof: The inequality ≥ is easily seen to hold for any graph. Now let C be a minimum cover with C ∩ U = X, C ∩ V = Y. For each x ∈ X, define V_x := {v ∈ V − Y : xv ∈ E}. The minimality of C allows us to apply Hall's theorem to {V_x}_{x∈X}, hence we can find a matching M_X that connects X into V − Y. Similarly another matching M_Y connecting Y into U − X exists. Apparently M_X ∪ M_Y is a matching of the same size as the minimum cover C.

Conversely, Hall’s theorem follows from König’s theorem: Build the bipartite graph G := (U, V ; E) where U is the disjoint union of {pi}, V is the collection of {Pi}, and uv ∈ E iff u ∈ v. Let C be a minimum cover and X = U − C. All the edges between X and V , whose number denoted as k, must S be covered by nodes in V , collectively denoted as Y . Hence k ≥ | i∈Y Pi| ≥ |Y | = |X|. Therefore |C| ≥ |U|, and by König’s theorem a transversal exists.


Algorithm 9.10: Bipartite Maximum-Size Matching/Minimum-Size Vertex Cover

To get a maximum-size matching in a bipartite graph G_b = (U, V ; E), start with a maximal matching M. Let U_M denote the vertices in U that are not covered by M; similarly define V_M. Build a digraph G_M = (U, V ; D) where the edges in M are oriented from V to U while the remaining edges in E are oriented from U to V. Find a (directed) path from U_M to V_M and augment M with it (by taking their symmetric difference). Repeat until there is no augmenting path. The overall complexity is clearly O(ν · |E|), where ν is the size of a maximum matching. A better algorithm is to add a source s pointing to U, add a sink t pointed to by V, and orient the edges in E from U to V. Use Algorithm 9.13 below to find the maximum number of vertex-disjoint s − t paths, which naturally leads to a maximum matching. The overall complexity is brought down to O(ν^{1/2} · |E|) (König's theorem equates ν with the minimum size of a vertex cover).

After getting a maximum-size matching M, we can construct a minimum-size vertex cover as follows: As before, build the digraph G_M. Denote R_M as the set of vertices in G_M that can be


reached from some vertex in U_M. By the maximality of M we know R_M ∩ V_M = ∅. Moreover there is no edge connecting U ∩ R_M and V − R_M (for otherwise the latter would be reachable). Hence C := (U − R_M) ∪ (V ∩ R_M) is a vertex cover, all of whose vertices are covered by the matching M. However, the two endpoints of an edge in M cannot both be present in C (if the V-endpoint lies in R_M then so does its M-partner in U, which then cannot lie in U − R_M). Therefore |C| ≤ |M|, i.e., C is a minimum-size vertex cover. The complexity is dominated by finding the maximum matching, hence can be bounded by O(τ^{1/2} · |E|), where τ is the size of a minimum vertex cover. Note that the above construction does not generalize to the weighted setting, although the minimum-weight vertex cover in a bipartite graph can be solved using the linear programming relaxation.
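A minimal augmenting-path sketch (ours) for the first part of Algorithm 9.10; the recursive call augment plays the role of searching for a U_M − V_M path in G_M. The adjacency-list convention is our assumption.

    def max_bipartite_matching(U, adj):
        # adj[u]: the neighbours in V of u ∈ U
        match_v = {}                     # v -> u for the current matching
        def augment(u, seen):
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    # v is free, or its partner can be rematched elsewhere
                    if v not in match_v or augment(match_v[v], seen):
                        match_v[v] = u
                        return True
            return False
        size = 0
        for u in U:
            if augment(u, set()):
                size += 1
        return size, match_v

This realizes the O(ν · |E|) bound; the O(ν^{1/2} · |E|) improvement mentioned above additionally organizes the searches in phases (the Hopcroft-Karp scheme).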

Algorithm 9.11: Bipartite Maximum-Weight Matching

To find a maximum-weight matching in any bipartite graph, we can use the Hungarian method: Start with the empty matching M in the bipartite graph G_b = (U, V ; E). Let U_M denote the vertices in U that are not covered by M; similarly define V_M. As before, build a digraph G_M = (U, V ; D) where the edges in M are oriented from V to U with length equal to the weight, while the remaining edges in E are oriented from U to V with length equal to the negation of the weight. Among all the paths (if any) in G_M that start from U_M and end in V_M, pick a shortest one and augment M with it. Iterate the procedure. Using induction it can easily be shown that M is always a maximum-weight matching among all matchings of size |M|. We terminate when the shortest path we find has positive length. If we want a maximum-weight perfect matching, we stop only when all vertices are covered. The complexity can be bounded as O(ν · |V ||E|) if we use the Bellman-Ford algorithm (cf. Algorithm 9.7) to find shortest paths, where ν is the minimum size of a maximum-weight matching.

A refined algorithm maintains a potential (cf. Definition 9.3) so that the faster Dijkstra's algorithm (cf. Algorithm 9.6) can be applied to find shortest paths. Indeed, denote R_M as the set of vertices in G_M that can be reached from some vertex in U_M. Initially, when M = ∅, we can set the potential p as p(v) := max_{e∈E} w_e if v ∈ U and p(v) = 0 otherwise. This gives us a potential for the graph G_M(R_M), the restriction of the graph G_M to the vertex set R_M. Then for each v ∈ R_M we calculate p̂(v) = dist(U_M, v) using Dijkstra's algorithm. We find a shortest U_M − V_M path P and obtain the bigger matching M̂ = M∆P. We claim that p̂ is a potential for G_M̂(R_M̂): Clearly U_M̂ ⊆ U_M. Moreover R_M̂ ⊆ R_M, for otherwise there exists a path starting from U_M̂ (hence from U_M) and ending in R_M̂ but not in R_M; no arc of G_M leaves R_M, so there must be an arc of P that leaves R_M, which is impossible by the definition of R_M. So p̂, defined on R_M, also covers R_M̂. Now take an arc uv ∈ G_M̂(R_M̂). Either uv ∈ G_M, hence ℓ(uv) ≥ p̂(v) − p̂(u) due to the definition of p̂, or vu ∈ P, the augmenting path, hence also p̂(v) − p̂(u) = −ℓ(vu) = ℓ(uv) since P is shortest. In

conclusion, p̂ is a potential for G_M̂(R_M̂). The overall complexity is thus O(ν · (|E| + |V | log |V |)), where ν is the minimum size of a maximum-weight matching.

Theorem 9.4: Gallai’s Theorem

For any graph without isolated vertices, the sum of the maximum size of an independent set and the minimum size of a vertex cover equals the sum of the maximum size of a matching and the minimum size of an edge cover, and both equal the number of vertices.

Proof: Consider a maximum matching M; augmenting it with edges that cover the vertices not covered by M gives us an edge cover of size |V | − 2|M| + |M| = |V | − |M|. Conversely, for any edge cover C, find a maximum matching M in it. Denote the vertices in V that are not covered by M as X. Then |X| ≤ |C| − |M|, since every vertex is covered by C while no two vertices connected in C are missed by M. Therefore |X| = |V | − 2|M| ≤ |C| − |M|. Finally note that a vertex set is independent iff its complement is a vertex cover.

It is clear from the proof that if we can find a maximum matching then we can easily grow from it a minimum edge cover. Also, the complement of a maximum independent set is a minimum vertex cover, and vice versa (although it is unlikely that polynomial-time algorithms exist for these two problems in general graphs).


Algorithm 9.12: Minimum-Weight Edge Cover

We reduce the minimum-weight edge cover problem (in any graph without isolated vertices) to minimum-weight perfect matching: Simply build a copy of the graph, and for each vertex v, connect it to its twin sister v′ with weight 2w_v, where w_v := min_{e∋v} w(e). Find a minimum-weight perfect matching (which exists by construction) in the twin-graph, retain the edges within one copy of the graph, and replace each crossing edge vv′ with a lightest edge incident to v in the chosen copy. Since the matching is perfect, we get an edge cover whose weight is half of the matching's. Conversely, any edge cover induces a perfect matching in the twin-graph of at most twice its weight (provided that the weights are nonnegative). The complexity is O(|V | · (|E| + |V | log |V |)).

Theorem 9.5: Dilworth’s Theorem

Let (P, ⪯) be a partially ordered set. The minimal number m of disjoint chains covering P equals the maximal size M of an antichain.

Proof: Clearly m ≥ M. We prove König ⇒ Dilworth: Build the bipartite graph G = (P, P ; E) where ab ∈ E iff a ≺ b. A chain with n elements in (P, ⪯) corresponds to a matching of size n − 1 in G. Therefore a covering by j disjoint chains in P corresponds to a matching of size |P | − j, and vice versa. Now consider a minimum covering by m disjoint chains, which corresponds to a maximum matching in G of size |P | − m. By König's theorem, there exists a minimum vertex cover in G of size |P | − m. The elements of P neither of whose two copies lies in this minimum vertex cover form an antichain of size (at least) m.

Conversely, we show Dilworth ⇒ Hall: Let P_i ⊆ Ω, i ∈ I, satisfy (137) (with d = 0, F = card). Build the partially ordered set (P := ∪_{i∈I} {P_i} ∪ Ω, ⪯) where ω ⪯ P_i iff ω ∈ P_i, while all other pairs are not comparable. Let S ∪ {P_i}_{i∈J} be a maximum antichain, where S ⊆ Ω, J ⊆ I. By Dilworth's theorem, P can be partitioned into |S| + |J| disjoint chains. In particular, |J| of these chains cover {P_i}_{i∈J} hence also cover ∪_{i∈J} P_i (because the other chains cover S, hence cannot cover elements of ∪_{i∈J} P_i). Therefore |J| ≥ |∪_{i∈J} P_i| (each chain can contain at most one element of Ω), while by assumption |∪_{i∈J} P_i| ≥ |J|. So we find from these |J| chains a system of distinct representatives for P_i, i ∈ J. On the other hand, {P_i}_{i∈I−J} are covered by the remaining |S| chains, each containing one element of S. We then easily find a system of distinct representatives for P_i, i ∈ I − J. Combining the results completes the proof of Hall's theorem.

Theorem 9.6: Menger’s Theorem

The maximum number m of vertex-disjoint S − T paths equals the minimum size n of S − T separating vertex sets (possibly containing vertices in S, T ).

Proof: Clearly we have n ≥ m. Consider the digraph G_d := (V, D) with S, T ⊆ V. For each node in S ∩ T, we create "twins", one responsible for outward arcs (assigned to S) and one for inward arcs (assigned to T); add also arcs in both directions between the twins. One can verify that by doing so we do not change n − m, hence we can assume S ∩ T = ∅.

For each v ∈ V − S introduce v^t and for each v ∈ V − T introduce v^s. Construct the bipartite graph G_b = (V^s, V^t ; E), where for u^s ∈ V^s, v^t ∈ V^t, u^s v^t ∈ E iff u = v or uv ∈ D. By construction, N := {v^s v^t : v ∈ V − (S ∪ T)} is a matching in G_b. Take a maximum matching M in G_b with size µ and consider N∆M, the symmetric difference. Pick a component K of N∆M; there are three possibilities:

(I). |K ∩ M| < |K ∩ N|. However, by replacing K ∩ M with K ∩ N in M we would obtain a larger matching, contradicting the maximality of M;

(II). |K ∩ M| = |K ∩ N|. Similarly as above, by (repeatedly) replacing K ∩ M with K ∩ N in M we can assume this case does not happen either;

(III). |K ∩ M| > |K ∩ N|. Due to our construction of the special matching N, it must be


true that |K ∩ M| = |K ∩ N| + 1 and this component is an S^s − T^t path. Of course the components of N∆M are vertex-disjoint. So we have |M| − |N| = µ − |V − (S ∪ T)| vertex-disjoint S^s − T^t paths in G_b, which easily translate to the same number of vertex-disjoint S − T paths in G_d.

Now let X ⊆ V − T and Y ⊆ V − S be such that C := X^s ∪ Y^t is a minimum vertex cover in G_b. Denote U := (X ∩ S) ∪ (Y ∩ T) ∪ (X ∩ Y). Let P := (v_0, v_1, . . . , v_k) be an S − T path in G_d; w.l.o.g. we assume only v_0 ∈ S and only v_k ∈ T. The path P induces the path Q := (v_0^s, v_1^t, v_1^s, . . . , v_{k−1}^t, v_{k−1}^s, v_k^t) in G_b. Since Q has 2k − 1 edges, the vertex cover C must intersect it in at least k vertices: either v_0^s ∈ C, hence v_0 ∈ X ∩ S ⊆ U; or v_k^t ∈ C, hence v_k ∈ Y ∩ T ⊆ U; or v_i^s, v_i^t ∈ C for some 1 ≤ i ≤ k − 1, hence v_i ∈ X ∩ Y ⊆ U. In any case the vertex set U intersects each S − T path in G_d, i.e., U is S − T separating. The size of U is

    |U| = |X ∩ S| + |Y ∩ T| + |X ∩ Y|
        = |X ∩ S| + |Y ∩ T| + |X| + |Y| − |X ∪ Y|
        = |X| + |Y| − |V − (S ∪ T)|
        = |C| − |V − (S ∪ T)| = µ − |V − (S ∪ T)|.

The third equality follows from (X ∪ Y) − (S ∪ T) = V − (S ∪ T) (since ∀v ∈ V − (S ∪ T), v^s v^t ∈ E and C is a vertex cover), while the last equality follows from König's theorem. Therefore n ≤ m, thus m = n.

Two variations of Menger’s theorem are also useful. The maximum number of internally vertex-disjoint paths connecting two distinct non-adjacent vertices s and t equals the minimum size of s − t vertex cuts (not containing s, t). Proof: Let S = {v ∈ V : sv ∈ D},T = {v ∈ V : vt ∈ D} and delete s, t.

The maximum number of arc-disjoint paths connecting two distinct non-adjacent vertices s and t equals the minimum size of s − t arc cuts. Proof: Consider the line graph and define S and T similarly as above.

Algorithm 9.13: Maximum Collection of Arc Disjoint Paths

Given a digraph G := (V ; A) and vertices s and t, we want to find a maximum number of arc-disjoint s − t paths. Denote dist_G(s, t) as the minimum length of an s − t path in G. The idea is to reverse a blocking collection of arc-disjoint s − t paths at a time, so that dist_G(s, t) increases. Start with i = 0 and G_0 = G.

First we use breadth first search (cf. Algorithm 9.2) to identify the arcs A_i^{st} that appear in some shortest s − t path in G_i: simply compute the distance d_s from s and the distance d_t to t for each vertex; then an arc uv ∈ A_i^{st} iff uv ∈ A(G_i) and d_s(u) + d_t(v) + 1 = dist_{G_i}(s, t). Notice that the arcs in A_i^{st} form a DAG.

Next we recursively find a blocking collection of s − t paths in the DAG G̃_i := (V, A_i^{st}): use depth first search (cf. Algorithm 9.3) to scan s and stop immediately when t is reached. Delete all arcs that have been scanned (every arc other than the ones in the found s − t path must have finished scanning; therefore if it were contained in another s − t path, it would be reachable by t, i.e., there would be a circuit, contradicting the DAG property). Repeat until there is no s − t path. Using induction we see that the paths we find are indeed blocking, i.e., after reversing them in G̃_i, there is no other s − t path. The complexity of this step is O(|A_i^{st}|).

Reverse the arcs in the blocking collection of s − t paths in G_i and call the new graph G_{i+1}. We show that dist_{G_{i+1}}(s, t) > dist_{G_i}(s, t). We need the graph Ĝ_i := (V ; A(G_i) ∪ (A_i^{st})^{−1}), obtained by adding the reversed arcs of A_i^{st} to the graph G_i. It is not hard to see that dist_{Ĝ_i}(s, t) = dist_{G_i}(s, t), hence the reversed arcs of A_i^{st} cannot appear in any shortest s − t path in Ĝ_i (due to the DAG nature of arcs appearing


in shortest paths). Since G_{i+1} ⊆ Ĝ_i, we have dist_{G_{i+1}}(s, t) ≥ dist_{Ĝ_i}(s, t). Let P be an s − t path in G_{i+1} with length dist_{Ĝ_i}(s, t). Then P is arc-disjoint from the (reversed) blocking collection we found in G̃_i (otherwise contradicting the DAG property of A_i^{st}). But then P is also an s − t path in Ĝ_i, hence in G_i, hence also in G̃_i, contradicting the blocking property. Thus we have strict inequality.

Increment i and repeat until there is no s − t path in the graph, say, G_k.

We show how to extract a minimum s − t cut first. Recall that an s − t cut is defined as the set of arcs pointing from U ∋ s to (V − U) ∋ t for some vertex set U ⊆ V. Each time we reverse an s − t path, the size of any s − t cut decreases by 1. At the end of the iterations there is no s − t path left, hence there exists a zero s − t cut. Therefore the minimum size of an s − t cut equals the number of times we reversed an s − t path. Moreover, the set of vertices reachable from s in the final graph G_k yields a minimum s − t cut.

Finally we show that the arcs reversed in the final graph G_k constitute a maximum collection of arc-disjoint s − t paths in the graph G. Let R_j be the arcs remaining reversed in G when we reverse the j-th s − t path. Using induction we can easily show that the graph (V ; R_j), with j direct arcs from s to t added, is Eulerian. Therefore we can find as many arc-disjoint s − t paths in G_k as the number of times we reversed s − t paths in G. But this number coincides with the minimum size of an s − t cut found above, hence by Menger's theorem it is maximum. In fact we have proved something stronger: each time we reverse an s − t path, we add to our repository one more arc-disjoint s − t path. Note that we choose to reverse arcs instead of deleting them because the graph in general can have multiple arcs between vertices.

The overall complexity of the above procedure is O(k · |A|), where k ≤ dist_G(s, t) is the terminating index; an easy upper bound is thus O(|V | · |A|). On the other hand, let p := ⌈|A|^{1/2}⌉; then each s − t path in G_p has length at least p + 1 ≥ |A|^{1/2}, i.e., there are at most |A|/(p + 1) ≈ |A|^{1/2} arc-disjoint s − t paths in G_p. Therefore there are at most |A|^{1/2} iterations after G_p, since each iteration afterwards adds at least one more arc-disjoint path, while by Menger's theorem the minimum size of a cut equals the maximum number of arc-disjoint paths. Hence k ≤ 2|A|^{1/2}, i.e., the complexity of the procedure is also bounded by O(|A|^{3/2}).

Furthermore, if G is simple, we can tighten the analysis. Let p = ⌈|V |^{2/3}⌉. Denote U_i as the set of vertices in G_p at distance i from s. Clearly ∑_{i=1}^p (|U_i| + |U_{i+1}|) ≤ 2|V |, hence there exists some j such that |U_j| + |U_{j+1}| ≤ 2|V |^{1/3}. Since the graph is simple, each arc-disjoint s − t path in G_p uses a distinct arc from U_j to U_{j+1}, of which there are at most |U_j| · |U_{j+1}| ≤ |V |^{2/3}. Then G_p has at most |V |^{2/3} arc-disjoint paths, therefore the number of iterations after G_p is at most |V |^{2/3}, i.e., the complexity is bounded by O(|V |^{2/3}|A|) for simple graphs.

To find a maximum collection of vertex-disjoint s − t paths, we split each vertex v into v_1 and v_2, where v_1v_2 is added to the arc set while any uv ∈ A is replaced by u_2v_1. A maximum collection of arc-disjoint s_2 − t_1 paths in the new graph suffices for our purpose. However, we can tighten the complexity analysis a bit by exploiting the special structure of the new graph: denote τ as the size of a minimum vertex cover C in G, and set p = ⌈τ^{1/2}⌉. Each vertex-disjoint s_2 − t_1 path in G_p has length at least p + 1, hence contains at least p/2 vertices of C; therefore there are at most 2τ^{1/2} vertex-disjoint paths in G_p. It follows that the number of iterations after G_p is at most 2τ^{1/2}, i.e., the complexity is bounded by O(τ^{1/2}|A|) ≤ O(|V |^{1/2}|A|).

Theorem 9.7: Ford-Fulkerson Theorem

For any network (a digraph with a single source and a single sink), the minimum cut equals the maximum flow.

Proof: Let us assume the capacities are all integral (the general case will be handled in ??). Replace each arc of capacity c by c parallel arcs, all pointing in the original direction. Clearly, by construction, the maximum number of arc-disjoint s − t paths equals the (value of the) maximum flow, while the minimum number of arcs separating s and t equals the (value of the) minimum cut. The theorem then follows from (the edge version of) Menger's theorem.

Conversely, the Ford-Fulkerson theorem easily implies König's theorem: Let G = (U, V ; E) be a bipartite graph. Add a source vertex s joined to U and a sink vertex t joined to V. Put infinite (or a sufficiently


large) capacity on each edge between U and V, unit capacity on each arc from the source to U, and similarly unit capacity from V to the sink t. Clearly, under our construction a maximum matching corresponds to a maximum flow. On the other hand, given a minimum cut C = C_s ∪ C_t with value c (note that c ≤ |U| ∧ |V|, hence finite), we define U_s := U ∩ C_s and V_s := V ∩ C_s. Then one verifies that (U − U_s) ∪ V_s forms a vertex cover for G of size c, since a minimum cut cannot afford any edge between U_s and V − V_s. Similarly, given a minimum vertex cover, we can find a cut with the same value. Therefore (the integral version of) the Ford-Fulkerson theorem implies König's theorem.

Algorithm 9.14: Edmonds-Karp Algorithm for Max-Flow

The algorithm is extremely simple: Start with any flow, build the residual graph, find a shortest augmenting path, augment the flow, and repeat. Note that if we find an arbitrary path instead, the algorithm might not be correct for irrational capacities, although it remains correct for rational capacities at the price of an exponential slowdown (in the extreme case) [?, page 152 & figure 10.1]. To see the correctness: Let f be the current flow and G_f the residual graph. Denote A(G_f) as the set of arcs in G_f which are on at least one shortest path in G_f. Reverse the arcs in A(G_f) and add them to G_f; call the result Ĝ_f. Note that we have A(G_f) = A(Ĝ_f), since the added arcs in Ĝ_f clearly cannot lie on any shortest path. Consequently ℓ(G_f) = ℓ(Ĝ_f), i.e., the length of the shortest paths does not change. After augmenting f with a shortest path P, we get a bigger flow g whose residual graph G_g is a subgraph of Ĝ_f, hence ℓ(G_g) ≥ ℓ(Ĝ_f). If we actually have equality, then A(G_g) ⊆ A(Ĝ_f) = A(G_f). The inclusion is in fact proper, since at least one arc of P is no longer in G_g after the augmentation. In summary, after each augmentation we either increase the length of the shortest path of the residual graph by one or we lose one arc from A(G_f). Therefore the algorithm terminates after at most |V | · |A| augmentations. The overall complexity of the Edmonds-Karp algorithm is O(|V | · |A|^2), and the best algorithm so far achieves O(|V | · |E|) [?].
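A minimal sketch (ours) of Edmonds-Karp with an adjacency-matrix residual network (an assumption made for brevity); cap is modified in place and becomes the residual capacities:

    from collections import deque

    def edmonds_karp(cap, s, t):
        n = len(cap)
        value = 0.0
        while True:
            parent = {s: None}            # BFS for a shortest augmenting path
            Q = deque([s])
            while Q and t not in parent:
                u = Q.popleft()
                for v in range(n):
                    if v not in parent and cap[u][v] > 0:
                        parent[v] = u
                        Q.append(v)
            if t not in parent:
                return value              # no augmenting path: flow is maximum
            path, v = [], t               # recover the path and its bottleneck
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            delta = min(cap[u][v] for u, v in path)
            for u, v in path:             # augment and update the residuals
                cap[u][v] -= delta
                cap[v][u] += delta
            value += delta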

Algorithm 9.15: Tardos’ Minimum Cost Circulation

TO BE ADDED.

Theorem 9.8: Tutte-Berge Formula

For an arbitrary graph G = (V, E), the maximum size m(G) of its matchings satisfies

    m(G) = min_{U⊆V} (1/2) [ |U| + |V | − o(G − U) ],    (139)

where G − U is the graph with all nodes in U removed and o(G) denotes the number of components of G with an odd number of vertices.

Proof: Clearly ∀U ⊆ V, m(G) ≤ |U| + m(G − U) ≤ |U| + (1/2)[|V − U| − o(G − U)] = (1/2)[|U| + |V | − o(G − U)]. For the other direction, we do induction on |V |. The case V = ∅ trivially holds, and we may assume G is connected (otherwise apply the induction hypothesis to each component). If there exists a vertex v that is contained in every maximum matching, then we can delete the vertex and apply the induction hypothesis. If we prove (by contradiction) that there exists a maximum matching that misses at most one node, then m(G) ≥ (1/2)[|V | − o(G)], while the r.h.s. of (139) is at most (1/2)[|V | − o(G)] (by setting U = ∅), hence the theorem will be proved. Indeed, suppose every maximum matching misses at least two distinct nodes. Let

    (u, v) = argmin_{M, p ∉ M, q ∉ M, p ≠ q} dist(p, q),

where M ranges over all maximum matchings, p and q range over the nodes missed by M, and dist(p, q) denotes the distance between nodes p and q in G. Note that dist(u, v) ≥ 2, for otherwise we could augment M to get a bigger matching.


Choose a third node w on the shortest path from u to v, distinct from u and v, and select among the maximum matchings that miss w one that intersects M the most, denoted N. By the minimality of (u, v) (in the sense of dist), N must cover both u and v. Therefore there exists z ≠ w that is covered by M but not by N (for they have the same size). Let zx be the edge of M containing z. Then there exists an edge xy contained in N (for otherwise we could augment N with zx). But replacing xy in N by zx increases the intersection of N with M, contradicting the choice of N.

Algorithm 9.16: Maximum-Size Matching

We describe Edmonds’ maximum-size matching algorithm for any graph.

Theorem 9.9: Brualdi Formula

Theorem 9.10: Edmonds-Gallai Decomposition

Algorithm 9.17: Maximum-Weight Matching

We describe Edmonds' maximum-weight matching algorithm for any graph.

10 Matroid

Definition 10.1: Matroid by Independent Sets

Let Ω be a nonempty finite set (called the ground set) and call the pair M = (Ω, I) a matroid, where I ⊆ 2^Ω defines the independent sets:

(I). (Non-empty) ∅ ∈ I;

(II). (Inheritable) J ∈ I and I ⊆ J =⇒ I ∈ I;

(III). (Augmentable) If I ∈ I, J ∈ I and |J| > |I|, then ∃ω ∈ J − I such that I ∪ {ω} ∈ I.

The temptation to assume Ω = ∪I must be resisted for good reasons (cf. Definition 10.11 below). The last property of I allows us to consider maximal independent sets, which will be called bases. It should be clear that all bases are equipotent, and their common cardinality is called the rank of the matroid. (Do not confuse the rank of the matroid with the rank function that is defined below.)

Remark 10.1: Weakening the Augmentable Property

For a nonempty, inheritable collection of sets I, the augmentable property can be weakened to

(III'). I ∈ I, J ∈ I, |I − J| = 1, |J − I| = 2 =⇒ ∃ω ∈ J − I s.t. I ∪ {ω} ∈ I.

Proof: Take I ∈ I, J ∈ I, |J| > |I|, and we show (III') =⇒ (III) in Definition 10.1. Induction on |I − J|:

• |I − J| = 0: Trivial.

• |I − J| = 1: Take a subset of J and apply (III');

• |I − J| = k + 1: Consider I − {i} where i ∈ I − J. Note that |J| > |I| > |I − {i}| while |I − {i} − J| = k, hence we can apply the induction hypothesis to conclude that ∃j ∈ J − I such that (I − {i}) ∪ {j} ∈ I. Now consider (I − {i}) ∪ {j} and J: we have |(I − {i}) ∪ {j} − J| = k while |J| > |(I − {i}) ∪ {j}|. Therefore we can apply the induction hypothesis again to conclude


that ∃j′ ∈ J − I − {j} such that (I − {i}) ∪ {j, j′} ∈ I. Finally consider (I − {i}) ∪ {j, j′} and I; applying (III') we get either I ∪ {j} ∈ I or I ∪ {j′} ∈ I, finishing the induction.

As will be seen, (III’) is usually more convenient than (III) to check.

Definition 10.2: Polymatroid and Rank Function

We call any nonnegative integer-valued, nondecreasing, submodular function F′ with F′(∅) = 0 a polymatroid function. Any polymatroid function satisfying the "1-Lipschitz" property (denote X∆Y := (X − Y) ∪ (Y − X), the symmetric difference)

    |F′(X) − F′(Y)| ≤ |X∆Y|    (140)

is called a rank function, usually denoted R′.

Every matroid M = (Ω, I) is equipped with a rank function defined as

    R′_M(X) := max{|I| : I ⊆ X and I ∈ I}.    (141)

One need only apply ?? to verify the submodularity of (141): Let X ⊆ Y ⊆ Ω − ω. Note that (R′_M)^ω(·) ∈ {0, 1}. We show that (R′_M)^ω(Y) = 1 =⇒ (R′_M)^ω(X) = 1. Let I ∋ I_X ⊆ X be such that R′(X) = |I_X|, and similarly define I_Y such that R′(Y ∪ {ω}) = |I_Y ∪ {ω}| > |I_X|. By the augmentable property of independent sets we can build an independent set I_X ∪ J, where J ⊆ I_Y ∪ {ω}, so that |I_X ∪ J| = |I_Y ∪ {ω}|. We claim ω ∈ J, for otherwise I_X ∪ J ⊆ Y, hence R′(Y) ≥ |I_Y ∪ {ω}| = R′(Y) + 1, a contradiction. Therefore I_X ∪ {ω} is independent due to the inheritable property, i.e., (R′_M)^ω(X) = 1. Conversely, we can also define a matroid from a given rank function.

Definition 10.3: Matroid by Rank Function

Given a rank function R′ on Ω, we associate with it the matroid M = (Ω, I), where I ∈ I iff R′(I) = |I|. Such a definition indeed yields a matroid in the sense of Definition 10.1:

(I). R′(∅) = 0 = |∅|, hence ∅ ∈ I;

(II). Let I ⊆ J ∈ I. From the 1-Lipschitz property of R′ we have R′(I) − R′(∅) = R′(I) ≤ |I∆∅| = |I|. Similarly, R′(J) − R′(I) = |J| − R′(I) ≤ |J∆I| = |J − I|, hence R′(I) ≥ |J| − |J − I| = |I|. Therefore R′(I) = |I|, i.e., I ∈ I;

(III'). We verify (III') in Remark 10.1 instead of (III) in Definition 10.1. Let I ∈ I, J ∈ I be such that I − J = {i} while J − I = {j_1, j_2}. Suppose ∀k ∈ {1, 2}, I ∪ {j_k} ∉ I, i.e., R′(I ∪ {j_k}) = R′(I). Then R′(J) ≤ R′(I ∪ {j_1, j_2}) ≤ R′(I ∪ {j_1}) + R′(I ∪ {j_2}) − R′(I) = R′(I) = R′(J) − 1, contradiction.

Finally we show that the rank function (141) of the constructed matroid coincides with the rank function R′ we started with: Note first R′_M(X) := max{|I| : I ⊆ X and |I| = R′(I)} ≤ R′(X), since R′ is nondecreasing. The converse can be shown by induction: R′(∅) = R′_M(∅) = 0. Let X ≠ ∅ and consider X − {x} for x ∈ X. By the induction hypothesis, there exists I ∈ I, I ⊆ X − {x} such that R′_M(X − {x}) = |I| = R′(X − {x}). If R′(X) = R′(X − {x}) = R′_M(X − {x}) ≤ R′_M(X) then we are done; otherwise assume R′(X) = R′(X − {x}) + 1. We claim I ∪ {x} ∈ I, since R′(I ∪ {x}) ≥ R′(I) + R′(X) − R′(X − {x}) = |I| + 1. Therefore R′(X) ≤ R′(X − {x}) + R′({x}) ≤ |I| + 1 ≤ R′_M(X), finishing the induction.

More generally, we call a set Ω, equipped with a polymatroid function P′, a polymatroid. As mentioned before, a maximal independent set is called a basis, while a minimal dependent set is called a circuit. We can also define a matroid by specifying its bases or circuits.


Definition 10.4: Matroid by Bases

Let B ⊆ 2^Ω be a collection of bases, which satisfies

(I). (Non-empty) B ≠ ∅;

(II). (Exchangeable) For any B_1, B_2 ∈ B and any x ∈ B_1, there exists y ∈ B_2 such that (B_1 − {x}) ∪ {y} ∈ B;

(II'). For any B_1, B_2 ∈ B and any x ∈ B_1, there exists y ∈ B_2 such that (B_2 − {y}) ∪ {x} ∈ B.

The independent sets are precisely those contained in some basis. We only prove (II) ⇒ (II') (the other direction is completely analogous): Take B_1, B_2 ∈ B and fix x ∈ B_1 − B_2 (the interesting case). For each z ∈ B_1 − B_2 − {x}, by (II) there exists w ∈ B_2 − B_1 such that (B_1 − {z}) ∪ {w} ∈ B. Repeat the replacement until B_1 − B_2 − {x} = ∅, at which point B_1 has been transformed into a basis of the form (B_2 − {y}) ∪ {x}.

Definition 10.5: Matroid by Circuits

Let C ⊆ 2Ω be a collection of circuits, which satisfies

(I). (Non-containing) ∀C1,C2 ∈ C, C1 ⊆ C2 =⇒ C1 = C2;

(II). ∀C1,C2 ∈ C with C1 6= C2, ∀x ∈ C1 ∩ C2, ∃C ∈ C such that C ⊆ (C1 ∪ C2) − {x};

(II’). If C1,C2 ∈ C, x ∈ C1 ∩ C2, y ∈ C1 − C2, then ∃C ∈ C such that y ∈ C ⊆ (C1 ∪ C2) − {x}.

Circuits are precisely "bases" for the dependent sets. Define the independent sets to be those containing no member of C.

We show first that (II) indeed guarantees that the new definition of independent sets satisfies Definition 10.1. We verify only (III') in Remark 10.1. Let I ∈ I, J ∈ I, I − J = {i}, J − I = {j_1, j_2}, and suppose to the contrary that I ∪ {j_k} ∉ I for k ∈ {1, 2}. Then I ∪ {j_k} ⊆ J ∪ {i} ∉ I, i.e., ∃C_1 ∈ C such that C_1 ⊆ J ∪ {i}. Let C_2 ∈ C be another circuit contained in J ∪ {i}. We must have i ∈ C_1 ∩ C_2 (otherwise C_k ⊆ J, contradicting J ∈ I). If C_1 ≠ C_2, by (II) we can construct C ∈ C such that C ⊆ J, again contradicting J ∈ I. Therefore C_1 = C_2, i.e., the circuit contained in J ∪ {i} is unique. Moreover, I ∈ I implies C_1 is not contained in I, hence there exists, say, j_1 ∈ C_1 ∩ (J − I). It follows from the uniqueness of C_1 that I ∋ (J ∪ {i}) − {j_1} = I ∪ {j_2}, a contradiction.

Next we prove that (II') is satisfied by the collection of circuits of any matroid (defined through Definition 10.1). Consider the submatroid M_s = (C_1 ∪ C_2, I_s) (cf. Definition 10.6 below). Obviously C_1 − {y} does not contain any circuit (otherwise contradicting (I)), hence there exists some basis B_1 ⊇ C_1 − {y} with y ∉ B_1, and similarly another basis B_2 ⊇ C_2 − {x} with x ∉ B_2. If y ∈ B_2, then by the exchangeability of bases ∃z ∈ B_1 − B_2 such that (B_2 − {y}) ∪ {z} is still a basis. Note that z ∉ {x, y}, for otherwise B_2 ⊇ C_2 (as y ∉ C_2). Replacing B_2 with (B_2 − {y}) ∪ {z}, we can assume {x, y} ∩ B_2 = ∅. Therefore B_2 ∪ {y} ∉ I_s, i.e., ∃C ∈ C_s such that C ⊆ B_2 ∪ {y}. Apparently y ∈ C (for otherwise C ⊆ B_2) and x ∉ C (for x ∉ B_2), establishing (II'). Needless to say, (II') =⇒ (II). Another consequence of (II) is that for any I ∈ I, if I ∪ {j} ∉ I then there exists a unique circuit in I ∪ {j} (since all circuits in I ∪ {j} must contain j).

Definition 10.6: Submatroid

Given a matroid M = (Ω, I) and any set S ⊆ Ω we can define a submatroid M_s = (S, I_s), where I_s := {I ∈ I : I ⊆ S}. It is easy to verify that I_s indeed satisfies Definition 10.1. The rank function of M_s is simply that of M restricted to subsets of S, and the circuits of M_s are precisely those of M which are contained in S.

Definition 10.7: Matroid by Closure Operator

Fix a matroid M = (Ω, I) with its rank function R′_M defined in (141); we define its closure operator


cl : 2^Ω → 2^Ω by

    cl(X) := {ω ∈ Ω : R′_M(X) = R′_M(X ∪ {ω})}.    (142)

Sets satisfying cl(X) = X are called flats (closed sets), while sets satisfying cl(X) = Ω are called spanning (dense). We note first that R′_M(cl(X)) = R′_M(X), hence cl(cl(X)) = cl(X): Let I ∈ I, I ⊆ X satisfy |I| = R′_M(X); if R′_M(cl(X)) > R′_M(X) there must exist ω ∈ cl(X) − X such that I ∪ {ω} ∈ I, contradicting R′_M(X ∪ {ω}) = R′_M(X). Similarly we can prove X ⊆ Y =⇒ cl(X) ⊆ cl(Y), and consequently cl(X ∩ Y) ⊆ cl(X) ∩ cl(Y). One easily verifies that the closure operator satisfies

(I). X ⊆ cl(X);

(II). X ⊆ cl(Y) =⇒ cl(X) ⊆ cl(Y);

(III). ∀X ⊆ Ω, ∀ω, ω′ ∈ Ω: ω′ ∈ cl(X ∪ {ω}) − cl(X) =⇒ ω ∈ cl(X ∪ {ω′}) − cl(X).

Conversely, we now show that a closure operator with these properties defines a matroid, by specifying

    I := {I ⊆ Ω : ∀ω ∈ I, ω ∉ cl(I − {ω})}.

We first prove that

    I ∈ I =⇒ cl(I) = I ∪ {ω : I ∪ {ω} ∉ I}.    (143)

If ω ∈ cl(I) − I, then I ∪ {ω} ∉ I. Conversely, if I ∪ {ω} ∉ I then ∃ω′ ∈ I ∪ {ω} such that ω′ ∈ cl((I ∪ {ω}) − {ω′}). If ω = ω′ then ω ∈ cl(I); otherwise ω′ ∈ I, and since I ∈ I we have ω′ ∉ cl(I − {ω′}), hence by (III) (with X = I − {ω′}) we get again ω ∈ cl(I).

Now we can prove (III') in Remark 10.1 (the other two properties are easy). Let I ∈ I, J ∈ I, I − J = {i}, J − I = {j_1, j_2}. Assume that I ∪ {j_1} ∉ I, i.e., (J ∪ {i}) − {j_2} ∉ I. Applying (143) we know i ∈ cl(J − {j_2}) (for J − {j_2} ∈ I). Therefore I ⊆ cl(J − {j_2}) and consequently cl(I) ⊆ cl(J − {j_2}). Since J ∈ I we have j_2 ∉ cl(J − {j_2}), hence j_2 ∉ cl(I). Applying (143) once more we obtain I ∪ {j_2} ∈ I.

Finally we show that the closure operator cl(·) we started with coincides with the closure operator cl_M(·) of the constructed matroid. Consider an arbitrary X ⊆ Ω and let I ∈ I, I ⊆ X satisfy R′_M(X) = |I|. Then cl_M(X) = I ∪ {ω : I ∪ {ω} ∉ I}, which by (143) is also equal to cl(I) ⊆ cl(X). Let ω ∈ X − I; then I ∪ {ω} ∉ I (by our choice of I), hence ω ∈ cl(I) due to (143), i.e., X ⊆ cl(I). Therefore cl(X) ⊆ cl(I), implying cl_M(X) = cl(I) = cl(X).


Definition 10.8: Matroid by Flats

Recall that flats are closed sets, F : cl(F) = F, which collectively, denoted as F, satisfy:

(I). Ω ∈ F;

(II). F_1 ∈ F, F_2 ∈ F =⇒ F_1 ∩ F_2 ∈ F;

(III). ∀F ∈ F, ∀ω ∈ Ω, denoting F_s as the smallest flat containing F ∪ {ω}, there is no F′ ∈ F with F ⊂ F′ ⊂ F_s.

It is easy to verify that bona fide flats do satisfy (I) and (II). Note also that due to (II), it is meaningful to talk about smallest containing flats in (III). To show (III), assume such an F′ exists. Then ∃ω′ ∈ F′ such that ω′ ∉ cl(F) = F. Moreover ω ∈ F_s − F′ (if ω ∈ F′ then F_s ⊆ F′) and ω ∉ cl(F ∪ {ω′}) (since cl(F ∪ {ω′}) ⊆ cl(F′) = F′, which does not contain ω). By property (III) of the closure operator we then have ω′ ∉ cl(F ∪ {ω}) = F_s ⊇ F′, a contradiction.

Conversely, a collection of sets satisfying the above three properties defines a matroid, through the closure operator

    cl(X) := ∩_{F∈F, F⊇X} F.


We need only prove the last property of the closure operator. Let X ⊆ Ω, ω ∈ Ω, and ω′ ∈ cl(X ∪ {ω}) − cl(X). Then cl(X) ⊂ cl(X ∪ {ω′}) ⊆ cl(X ∪ {ω}); therefore by property (III) of the flats we have cl(X ∪ {ω′}) = cl(X ∪ {ω}), i.e., ω ∈ cl(X ∪ {ω′}).

We can also specify the independent sets through flats (and vice versa):

    I = {I ⊆ Ω : ∀ω ∈ I, ∃F ∈ F such that ω ∉ F and I − {ω} ⊆ F}.

Indeed, if I is independent and ω ∈ I, let F = cl(I − {ω}); then R′_M(F ∪ {ω}) ≥ R′_M(I) = R′_M(I − {ω}) + 1 = R′_M(F) + 1, meaning ω ∉ F. Conversely, if I is not independent, then ∃ω ∈ I such that ω ∈ cl(I − {ω}), hence all flats containing I − {ω} must also contain ω.

Remark 10.2: Polynomial-time Implications

The following polynomial-time implications are easy to check:

    Closure ⇔ Independence ⇔ Rank  =⇒  {Basis, Circuit, Flat}

Definition 10.9: Dual Matroid

Given a matroid M = (Ω, I) we can define its dual

    M^∗ := (Ω, I^∗) where I^∗ := {I ⊆ Ω : cl_M(Ω − I) = Ω},    (144)

i.e., the independent sets of M^∗ are the complements of spanning sets. Equivalently we can define

    B^∗ := {B ⊆ Ω : Ω − B ∈ B},    (145)

or

    R′_{M^∗}(X) = |X| − R′_M(Ω) + R′_M(Ω − X).    (146)

The equivalence can be easily established by noting that (146) is indeed a rank function, whose independent sets are precisely given by (144), and whose bases are precisely (145). To justify the name dual matroid, note simply from (145) that

    M = (M^∗)^∗.    (147)

Definition 10.10: Deletion, Contraction and Truncation

Let M = (Ω, I) be a matroid and ω ∈ Ω; then M\{ω} is the submatroid with ground set Ω − {ω}, i.e., the result of deleting ω. The contraction is M/{ω} := (M^∗\{ω})^∗, and the truncation is M_k := (Ω, I_k), where I_k := {I ∈ I : |I| ≤ k}. It is easily verified that deletion is commutative, and as a consequence of the deliberate definition, contraction is also commutative. Therefore we can extend the definitions of deletion and contraction from singletons to subsets by successive application to each element (or simply revise the definition directly). It follows from (146) that ∀S ⊆ Ω,

    ∀X ⊆ Ω − S, R′_{M/S}(X) = R′_M(X ∪ S) − R′_M(S),    (148)

which implies further that deletion commutes with contraction (by verifying the equality of the rank functions). Any matroid arising from deletions and contractions of M is called a minor of M.


Checking again the rank functions, we verify the duality between deletion and contraction:

    (M\S)^∗ = M^∗/S,    (149)
    (M/S)^∗ = M^∗\S.    (150)

Definition 10.11: Matroid Isomorphism

We call two matroids Mi = (Ωi, Ii), i ∈ {1, 2} isomorphic if there exists a bijection f :Ω1 → Ω2 such that I1 ∈ I1 ⇐⇒ f(I1) ∈ I2. Necessarily isomorphic matroids satisfy |Ω1| = |Ω2|.

Now it is time to see some examples.

Example 10.1: Uninteresting Matroids

M_t = (Ω, {∅}) is called the trivial matroid while M_f = (Ω, 2^Ω) is called the free matroid. Clearly the two are dual to each other.

Example 10.2: k-Uniform Matroid

Denote n = |Ω| ≥ k. Define the uniform matroid

    U_{k,n} := (Ω, I) where I := {I ⊆ Ω : |I| ≤ k}.    (151)

Obviously the bases are B = {I ⊆ Ω : |I| = k} and the rank function is R′(X) = |X| ∧ k. It is exactly the truncation of the free matroid.

Example 10.3: Linear Matroid

This example is extremely important due to its historical significance and the many deep questions related to it. Let Ω be the disjoint union of {a_i}, i ∈ {1, . . . , n}, where a_i ∈ V and V is an arbitrary vector space over some field F. A subset is declared independent iff its elements are linearly independent over F. One should have no difficulty in verifying Definition 10.1. From the matroid-structure point of view, we can take V = F^r, where r is the rank of the matroid. Hence a linear matroid can be simply represented as a matrix A ∈ F^{r×n}, which is easily seen to be isomorphic to [I_r, B], where I_r is the r × r identity matrix, whereas the dual matroid can be shown to be isomorphic to [B^⊤, I_{n−r}], hence is also linear; consequently so is any minor. Surprisingly, it is still an open problem to characterize linear matroids up to isomorphism. The whole theory of matroids started from Hassler Whitney's (also independently Takeo Nakasawa's) efforts in abstracting the example above. The power of abstraction is again witnessed!

Example 10.4: Matching Matroid

Let G = (V,E) be a given graph and Ω ⊆ V . Define

MM = (Ω, I) where I := {I ⊆ Ω: I is covered by some matching in G}. (152)

It is easy to verify (III') in Remark 10.1, and M_M so defined is called the matching matroid.

Example 10.5: Transversal Matroid

Let Q ⊆ 2Ω. Define the bipartite graph G := (Ω, Q; E) where ∀ω ∈ Ω,Q ∈ Q,E(ω, Q) = 1 iff ω ∈ Q. Define

MT := (Ω, I) where I := {I ⊆ Ω: ∃1-1 map π : I → Q such that ∀i ∈ I, i ∈ π(i)}, (153)


i.e., I is the collection of all partial transversals of Q (w.r.t. Ω). Clearly, M_T is a special matching matroid, called the transversal matroid. Using Theorem 9.3, the rank function of M_T equals the minimum size of vertex covers in the bipartite graph, which takes the following form (S is the part of X not chosen by the vertex cover and P is the part of Q not chosen by the vertex cover):

    R′_{M_T}(X) = min_{S⊆X} |X − S| + |{Q ∈ Q : Q ∩ S ≠ ∅}|    (154)
               = min_{P⊆Q} |(∪P) ∩ X| + |Q| − |P|.    (155)

Transversal matroids are linearly representable in all but finitely many finite fields [].

Remark 10.3: Matching Matroid = Transversal Matroid

Example 10.6: Partition Matroid

Let Ω = ∪_{i=1}^n P_i be a partition and {k_1, . . . , k_n} be given natural numbers. Define

    M_P := (Ω, I) where I := {I ⊆ Ω : |I ∩ P_i| ≤ k_i, ∀i}.    (156)

By setting Q to be the disjoint union of the P_i, each with k_i copies, we see that M_P is a special transversal matroid, called the partition matroid. Note that the k-uniform matroid U_{k,n} is a special partition matroid (with P_1 = Ω and k_1 = k). The rank function is easily seen to be

    R′_{M_P}(X) = ∑_{i=1}^n |X ∩ P_i| ∧ k_i.

Example 10.7: Algebraic Matroid

Let E be a field extension of the field F and Ω be a finite subset of E. Define

MA := (Ω, I) where I := {I ⊆ Ω: I is algebraically independent over F} (157)

To verify 3' in Remark 10.1, let I ∋ I = {en} ∪ K, I ∋ J = K ∪ {e1, e2}, K = {e3, . . . , en−1}. Assume to the contrary that I ∪ {ei} ∉ I, i ∈ {1, 2}. Then there exist nonzero polynomials pi ∈ F[x, x3, . . . , xn] such that pi(ei, e3, . . . , en) = 0, i ∈ {1, 2}. W.l.o.g. assume p1 and p2 are irreducible. Since J ∈ I, p1 and p2 are relatively prime. Define L := F(x1, . . . , xn−1), so pi ∈ L[xn]. Let r be the g.c.d. of p1 and p2 in L[xn]. As the pi are relatively prime, we know r ∈ L, hence w.l.o.g. r ∈ F[x1, . . . , xn−1]. Now r = Σ_{i=1}^{2} αi pi for some αi ∈ L[xn]. So r(e1, . . . , en−1) = 0, contradicting J ∈ I.

Linear matroids are algebraic. [?] gives an example of an algebraic non-linear matroid and another example of a non-algebraic matroid. Compared to linear matroids, much less is known about algebraic matroids. In particular, it is not known if the dual of an algebraic matroid is algebraic. However, algebraic matroids over any field F are closed under taking minors.

As it turns out, greedy algorithms are surprisingly effective in discrete optimization. Let us illustrate this with an example. Given a matroid M = (Ω, B) and a weight vector w ∈ RΩ, we want to find a basis B ∈ B with maximum weight w(B) := Σ_{i∈B} wi. Given an independent set I ∈ I with I ∉ B, let us call the set

∅ ≠ A^I ⊆ A^I_max := {i ∈ Ω − I : I ∪ {i} ∈ I} (158)

admissible iff for all maximum-weight bases B ⊇ I the following basis exchange property holds:

∀i ∈ A^I ∃j ∈ B ∩ A^I s.t. (B − {j}) ∪ {i} ∈ B. (159)


Notice that A^I_max itself is admissible in a strong sense: for any basis B ⊇ I we have B ∩ A^I_max ≠ ∅, and for any i ∈ A^I_max, I ∋ I ∪ {i} ⊆ B ∪ {i} ∉ I, hence ∃j ∈ B ∩ A^I_max such that (B − {j}) ∪ {i} ∈ B. The greedy algorithm, shown in Algorithm 1, only requires an oracle for testing bases and another oracle for constructing the admissible set A^I. If we set A^I = A^I_max then both oracles reduce to testing independence. An apparent dual version of Algorithm 1 exists (which finds a minimum-weight basis in the dual matroid).

Algorithm 1: The greedy algorithm for finding a maximum-weight basis
I = ∅.
while I is not a basis do
    i∗ ← arg max_{i∈A^I} wi
    I ← I ∪ {i∗}

We now prove that the greedy algorithm works, and more importantly, we show that it is precisely the matroid structure that makes the greedy algorithm work. Note that for A^I = A^I_max, one easily verifies that the weights picked by the greedy algorithm are non-increasing; moreover, the k-th weight picked is never smaller than the k-th largest weight of any basis (for otherwise the greedy algorithm would have picked differently at the k-th stage).

Theorem 10.1: Correctness of the Greedy Algorithm

The greedy algorithm is correct for any matroid. Conversely, given any collection of sets I satisfying (I) and (II) in Definition 10.1, if the greedy algorithm, equipped with the admissible set A^I_max, always returns a maximum-weight element of I for any nonnegative weight, then I defines a matroid.
Proof: We show by induction that the independent set I maintained by the greedy algorithm is always contained in some maximum-weight basis. Initially this is vacuously true. Suppose at some intermediate stage I ∈ I is contained in some maximum-weight basis B, and suppose the newly added i∗ does not belong to B (otherwise we are done). By the admissibility of A^I there exists j ∈ B ∩ A^I such that I ∪ {i∗} ⊆ (B − {j}) ∪ {i∗} ∈ B, which is also a maximum-weight basis due to the optimality of i∗.
Conversely, let I, J ∈ I be given with |J| > |I| = k, and assume I ∪ {j} ∉ I for every j ∈ J − I. Define wi := k + 2 for i ∈ I, wi := k + 1 for i ∈ J − I, and wi := 0 otherwise. The greedy algorithm picks all elements of I first and then only elements outside I ∪ J, hence it returns a set in I with weight k(k + 2). On the other hand, any set in I containing J has weight at least (k + 1)|J| ≥ (k + 1)² > k(k + 2), contradicting the optimality of the greedy algorithm.
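A minimal sketch of Algorithm 1 with A^I = A^I_max, assuming only an independence oracle (scanning elements by decreasing weight is equivalent to repeatedly maximizing over A^I_max when weights are nonnegative):

```python
def greedy_max_weight_basis(ground, w, is_independent):
    """Greedy algorithm: grow an independent set by always keeping
    the heaviest element that preserves independence."""
    I = set()
    for i in sorted(ground, key=lambda i: w[i], reverse=True):
        if is_independent(I | {i}):
            I.add(i)
    return I

# Example: 2-uniform matroid on {0,1,2,3} (independent iff size <= 2).
w = {0: 5.0, 1: 1.0, 2: 3.0, 3: 2.0}
B = greedy_max_weight_basis({0, 1, 2, 3}, w, lambda S: len(S) <= 2)
print(B)  # {0, 2}, the two heaviest elements
```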

We observe that for a nonnegative weight, finding the maximum-weight independent set is the same as finding the maximum-weight basis, while for an arbitrary weight, finding the maximum-weight independent set reduces to the subproblem on the submatroid with all negatively weighted elements deleted. Another observation about the greedy Algorithm 1 (using A^I_max) is that every element that is not chosen is dominated in some circuit (i.e., it has the smallest weight in that circuit): indeed, suppose j is not chosen; then there exists a smallest number k such that {i1, . . . , ik}, chosen sequentially by the greedy algorithm, becomes dependent when augmented with j. If k = 0 there is nothing to prove; otherwise the fact that the greedy algorithm chose ik implies that w_{ik} ≥ wj. On the other hand, if the element j is dominated in some circuit C, then it can be safely discarded: assume j is contained in some maximum-weight basis B. Since C − {j} is independent, it is contained in some basis B'. By the enhanced basis exchange property (cf. Remark 10.4) we know that for some i, (B − {j}) ∪ {i} and (B' − {i}) ∪ {j} are again bases. But then we must have i ∈ C − {j}, i.e., (B − {j}) ∪ {i} is again a maximum-weight basis (due to the minimality of j in C), hence j can be deleted. Note that the maximum-weight basis in the matroid and the minimum-weight basis in the dual matroid can be chosen to complement each other. Under this duality we know the dominating element in some circuit of the dual matroid can be safely picked and then contracted. This variation of Algorithm 1 is summarized in Algorithm 2, while yet another similar variation is summarized in Algorithm 3. The slight difference lies in how to avoid picking the same circuit. Whether it is easier to work with the matroid or its dual, to test independence or construct circuits, to perform deletion or contraction, depends of course on the particular problem at hand.

Algorithm 2: Variation 1 of the greedy Algorithm 1
I = ∅, J = ∅.
repeat
    Perform Procedure a or a′.
until convergence
begin Procedure a
    if ∃C ∈ C_M then
        ω ← arg min_{c∈C} wc
        J ← J ∪ {ω}
        M ← M\{ω}
    else
        I ← Ω_M
        return
begin Procedure a′
    if ∃C∗ ∈ C_{M∗} then
        ω∗ ← arg max_{c∈C∗} wc
        I ← I ∪ {ω∗}
        M ← M/{ω∗}
    else
        J ← Ω_M
        return

Definition 10.12: Regular Matroid

A matroid is called regular iff it can be linearly represented in every field, or equivalently, iff it can be represented as a TUM over R [?, Theorem 6.6.3, page 205]. Clearly the dual, hence any minor, of a regular matroid is regular.

Example 10.8: Graphic Matroid

Let G = (V,E) be a graph. Define

MG := (E, I) where I := {I ⊆ E : I contains no cycle }. (160)

One easily verifies Remark 10.1 hence MG is a matroid, whose independent sets are exactly forests of the graph G. The rank function for the graphic matroid is given by

R0_MG(X) = |V| − κ(V, X),

where κ(V, X) is the number of connected components in the subgraph GS := (V, X). Note that if the graph G is connected then the bases of MG are exactly the spanning trees.

Graphic matroids are regular. Indeed, orient each edge arbitrarily and construct the matrix A ∈ R|V|×|E|, where Av,e = 1 iff the edge e ∈ E leaves the node v ∈ V, Av,e = −1 iff the edge e enters the node v, and Av,e = 0 iff e does not meet v. It is easy to see that X ⊆ E contains no cycle iff the columns AX are linearly independent. Therefore graphic matroids are linearly representable in R by a TUM A, i.e., they are regular.

Consider a connected graph G and apply the greedy Algorithm 1 to it. If we choose A^I = A^I_max we end up with Kruskal's algorithm, while if we choose A^I to be the set of all edges in E − I that


Algorithm 3: Variation 2 of the greedy Algorithm 1
I = ∅, J = ∅, K = Ω.
repeat
    Perform Procedure b or b′.
until convergence
begin Procedure b
    if ∃C ∈ C_M and C ⊆ I ∪ K then
        ω ← arg min_{c∈C} wc
        J ← J ∪ {ω}
        K ← K − {ω}
    else
        I ← I ∪ K
        return
begin Procedure b′
    if ∃C∗ ∈ C_{M∗} and C∗ ⊆ J ∪ K then
        ω∗ ← arg max_{c∈C∗} wc
        I ← I ∪ {ω∗}
        K ← K − {ω∗}
    else
        J ← J ∪ K
        return

are connected to I we get Prim's algorithm, and finally if we choose A^I to be the edges in E − I that connect different components of the subgraph (V, I) we recover Borůvka's algorithm (assuming all weights are different).
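For the graphic matroid the independence oracle is cycle detection, which a union-find structure makes nearly free; a minimal Kruskal sketch for a minimum-weight spanning forest (sorting by decreasing weight would give the maximum-weight basis instead; the graph below is illustrative):

```python
def kruskal(n, edges):
    """Kruskal = greedy on the graphic matroid.
    edges: list of (weight, u, v); returns a min-weight spanning forest."""
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):  # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:               # adding (u, v) creates no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (0.5, 2, 3)]
print(kruskal(4, edges))  # [(0.5, 2, 3), (1.0, 0, 1), (2.0, 1, 2)]
```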

Definition 10.13: Matroid Intersection

For two (or more) matroids, we can define their intersection by intersecting their ground sets and independence families, respectively. Note that the intersection need not satisfy (III) in Definition 10.1, as shown in the following example: take Ω1 = Ω2 = {1, 2, 3}, I1 = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}}, I2 = {∅, {1}, {2}, {3}, {1, 3}, {2, 3}}, so that I1 ∩ I2 = {∅, {1}, {2}, {3}, {2, 3}}: the independent set {1} cannot be augmented from the larger independent set {2, 3}.

Definition 10.14: Matroid Union and Direct Sum

We can define the union (by taking the union of the ground sets, with independent sets {I1 ∪ I2 : I1 ∈ I1, I2 ∈ I2}) and the direct sum (by taking disjoint unions of the ground sets and independent sets, respectively) of two (or more) matroids. The direct sum is trivially a bona fide matroid; that the union is again a matroid is the content of the (nontrivial) matroid union theorem.

Remark 10.4: Strengthening the Basis Exchangeability

The basis exchangeability properties (II) and (II') in Definition 10.4 can be unified and strengthened as:

(II”). Let B1,B2 ∈ B, for any partition B1 = X1 + Y1 there exists a partition B2 = X2 + Y2 such that X1 ∪ Y2 ∈ B,X2 ∪ Y1 ∈ B.


11 Integer Polyhedra

Our main reference in this section is Schrijver [1986].

Definition 11.1: Polyhedron and Polytope

A polyhedron is a point set in, say, Rn, defined by finitely many linear inequalities Ax ≤ b where A ∈ Rm×n and b ∈ Rm. When A, b can be chosen with rational entries, we call the polyhedron rational. By definition, polyhedra are closed and convex. A bounded (in the sense of any norm) polyhedron is called a polytope.

Let us describe the (nonempty) polyhedron P as the set of linear inequalities {x ∈ Rn : ai>x ≤ bi, i ∈ I}. Denote I= := {i ∈ I : ∀x ∈ P, ai>x = bi} and denote I≤ := {i ∈ I : ∃x ∈ P, ai>x < bi}. Clearly I = I= ∪ I≤ and ∀i ∈ I≤, (ai>, bi) is linearly independent of {(aj>, bj) : j ∈ I=}. Collectively we will also use the notation A≤, A=, b≤, b=. A point x ∈ P is called an inner point if A≤x < b≤.

Proposition 11.1: Inner Point Exists

Every nonempty polyhedron has an inner point.
Proof: For i ∈ I≤, by definition the inequality ai>x ≤ bi admits a point xi ∈ P such that ai>xi < bi. Take a convex combination of the xi's.

Theorem 11.1: Dimension Formula

For any polyhedron P ⊆ Rn, dim(P) + rank(A=, b=) = n.
Proof: Note first that the affine space H := {x ∈ Rn : A=x = b=} satisfies dim(H) = n − rank(A=, b=). Centering H at an inner point of P, we see that dim(P) ≥ n − rank(A=, b=). The other direction n − rank(A=, b=) ≥ dim(P) is apparent.

Definition 11.2: Face and Facet

The inequality c>x ≤ d is called a valid inequality for the polyhedron P if it is satisfied by all points in P. A face F of P is defined as F := {x ∈ P : c>x = d}, where c>x ≤ d is a valid inequality. We say the inequality c>x ≤ d represents the face F. A maximal proper face is called a facet.

Proposition 11.2: Description of Faces

For any face F of the polyhedron P, there exists I= ⊆ J ⊆ I such that F = {x ∈ P : aj>x = bj, ∀j ∈ J}.
Proof: By definition F = arg max_{Ax≤b} c>x. Consider the dual problem y∗ ∈ arg min_{y≥0, A>y=c} b>y and let I∗ := {i ∈ I : y∗i > 0}. It is easy to see that the polyhedron F∗ := {x ∈ Rn : ai>x = bi, ∀i ∈ I∗, aj>x ≤ bj, ∀j ∈ I\I∗} coincides with the face F.

It follows immediately that a face of a polyhedron is a polyhedron, and a face of a face is still a face. Moreover, there are only finitely many faces.

Alert 11.1: Face of Convex Sets

The definition we gave for the face of a polyhedron can be generalized to arbitrary convex sets, and in that case it is usually called the exposed face. However, in general, an exposed face of an exposed face (of a convex set) need not be an exposed face (think about the convex hull of a donut).


Proposition 11.3: Dimension of Facets

Each facet F is represented by the inequality ai>x ≤ bi for some i ∈ I≤, hence dim(F) = dim(P) − 1.

Proof: Since facets by definition are maximal and proper, it follows from Proposition 11.2 that they are represented by exactly one inequality. The dimension formula follows from Theorem 11.1.

Proposition 11.4: Necessity of Facets

For each facet F of P, one of its representing inequalities is necessary for describing P.
Proof: Let ar>x ≤ br be some representing inequality for the facet F and PF be the polyhedron after removing all representing inequalities of the facet F. We prove that PF\P ≠ ∅. Clearly (ar>, br) is linearly independent of (A=, b=). Moreover, since ar>x ≤ br represents a facet, ar must be linearly independent of A=. Therefore there exists y ∈ Rn such that A=y = 0, ar>y > 0. Take an inner point x of F; then for some small ε > 0 we have x + εy ∈ PF while x + εy ∉ P (since ar>x = br).

Proposition 11.5: Sufficiency of Facets

Every inequality ar>x ≤ br with r ∈ I≤ that represents a face F of P with dim(F) < dim(P) − 1 is irrelevant for describing P.
Proof: Suppose ar>x ≤ br is not irrelevant; then ∃x such that A=x = b=, ai>x ≤ bi, i ∈ I≤\{r}, and ar>x > br. Take an inner point y of P; then there is some z on the segment between x and y such that A=z = b=, ai>z ≤ bi, i ∈ I≤\{r}, and ar>z = br, i.e., z ∈ F. Therefore dim(F) ≥ n − rank(A=, b=; ar>, br) = dim(P) − 1, contradiction.

Combining Proposition 11.4 and Proposition 11.5, we get

Theorem 11.2: Minimal Description of a Polyhedron

Any polyhedron P ⊆ Rn admits a minimal description {x ∈ Rn : ai>x = bi, i = 1, . . . , n − dim(P), aj>x ≤ bj, j = n − dim(P) + 1, . . . , n − dim(P) + t}, where t is the number of facets of P. In particular, (A=, b=) is unique up to nonsingular linear transformation while (A≤, b≤) is unique up to positive scalar multiplication and addition of elements of the row space of (A=, b=).
Proof: The existence of a minimal description is clear. If there exist two different minimal descriptions of the same polyhedron P = P1 = P2, then also P = P1 ∩ P2, whence the uniqueness claim follows.

Proposition 11.6: Facet Characterization

Consider the polyhedron P with equality set (A=, b=). Then the proper face F := {x ∈ P : c>x = d} is a facet iff

∀x ∈ F, e>x = f =⇒ (e>, f) ∈ Range(A=, b=; c>, d).

Proof: ⇒: Follows from Theorem 11.2. ⇐: Since F is proper, (c>, d) is linearly independent of (A=, b=). It follows that rank(A=, b=; c>, d) = n − dim(P) + 1. On the other hand, the system e>x = f, ∀x ∈ F, with (e; f) indeterminate, has a solution space whose dimension equals n − dim(F). Therefore dim(F) = dim(P) − 1, i.e., F is a facet.


Example 11.1: Permutahedron

Let X be the set of all permutations of {1, . . . , n}; we claim its convex hull is

P = { x ∈ Rn : Σ_i xi = n(n + 1)/2, Σ_{i∈S} xi ≥ |S|(|S| + 1)/2, ∀S ⊂ {1, . . . , n} }. (161)

Indeed, it is easy to see that conv(X) ⊆ P (for X ⊆ P). For the reverse inclusion, we verify two things: 1) dim(conv(X)) = dim(P), which is clear and ensures that we did not miss any equality constraint in conv(X); 2) each facet of conv(X) is represented by some inequality in P. Once 2) is verified, together with conv(X) ⊆ P we know that conv(X) = conv(X) ∩ P, hence by Theorem 11.2 the equality constraints in conv(X) and P are equivalent and therefore P ⊆ conv(X). To verify 2), let F be a proper nonempty face of conv(X), represented by some linear inequality a>x ≤ b. Consider Y = argmax{a>x : x ∈ X} and let S be the set of indices of the smallest entries of a. Note that |S| ≠ n, for otherwise the face F = conv(X) would not be proper. By an exchange argument we know any permutation y ∈ Y must satisfy yi ∈ {1, . . . , |S|} for any i ∈ S, i.e., it satisfies the inequality Σ_{i∈S} xi ≥ |S|(|S| + 1)/2 with equality. Since |S| ≠ n, we have found an inequality in P that represents the face F.
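A tiny brute-force check of (161) for n = 3 (purely illustrative; it verifies that every permutation lies in P and that a non-permutation integral point violates some subset inequality):

```python
from itertools import permutations, combinations

n = 3

def in_P(x):
    """Membership test for the right-hand side of (161)."""
    if sum(x) != n * (n + 1) // 2:          # the single equality
        return False
    for k in range(1, n):                   # proper nonempty subsets
        for S in combinations(range(n), k):
            if sum(x[i] for i in S) < k * (k + 1) // 2:
                return False
    return True

# Every permutation of {1, 2, 3} lies in P ...
print(all(in_P(p) for p in permutations(range(1, n + 1))))  # True
# ... while (1, 1, 4) sums to 6 but violates the case S = {0, 1}.
print(in_P((1, 1, 4)))  # False: 1 + 1 = 2 < 3
```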

Proposition 11.7: Minimal Face

Any minimal face of the polyhedron P has the form {x : aj>x = bj, j ∈ J} for some I= ⊆ J ⊆ I. In particular, if rank(A=; A≤) = n − k, then P has a minimal face of dimension k.
Proof: Take a minimal face, which by Proposition 11.2 can be written as F := {x : ai>x ≤ bi, i ∈ J′, aj>x = bj, j ∈ J}. If there exist r ∈ J′ and xr ∈ F such that ar>xr < br, then take an outside point y which satisfies ai>y ≤ bi ∀i ∈ J′\{r}, ar>y > br, and aj>y = bj ∀j ∈ J. There exists a point z on the segment between xr and y such that ai>z ≤ bi, i ∈ J′ − {r}, and aj>z = bj, j ∈ J ∪ {r}, i.e., F contains a smaller face, contradiction. The last claim follows from the fact that P has at least one minimal face, and the number of (linearly independent) equality constraints that define any face cannot exceed the rank of A.

Proposition 11.8: Extreme Point and Ray

p is an extreme point of the polyhedron P iff it is a zero-dimensional face of P; r is an extreme ray iff cone(r) is a one-dimensional face of the associated polyhedral cone P0+ := {x ∈ Rn : Ax ≤ 0}.

Proof: We only prove the extreme-ray case. ⇒: If cone(r) is a one-dimensional face, then for some A′ with rank(A′) = n − 1 we have A′r = 0. Let r1, r2 ∈ P0+ satisfy r = (r1 + r2)/2; then A′r1 ≤ 0, A′r2 ≤ 0 while A′r1 + A′r2 = 0, hence A′r1 = A′r2 = 0. But rank(A′) = n − 1, therefore r1 and r2 are both nonnegative multiples of r. ⇐: Suppose cone(r) is not a one-dimensional face and consider the minimal face containing r, whose dimension is at least 2. We can find an inner point r′ in the same minimal face such that r′ ∉ cone(r), hence r = ½(r + εr′) + ½(r − εr′) for some small ε > 0.

Theorem 11.3: Minkowski-Weyl Theorem

Polyhedral sets are precisely those in the form

{x ∈ Rn : x = Σ_i λi pi + Σ_j µj rj + Σ_k νk ℓk, Σ_i λi = 1, λi ≥ 0, µj ≥ 0}. (162)


In particular, the ℓk can be chosen as a basis of the lineality space L of the polyhedron P, and the pi, rj can be chosen as the extreme points and extreme rays of P ∩ L⊥, respectively.
Proof: Consider the polyhedron P with lineality space L; then P = L + (P ∩ L⊥). We only need to consider the set P′ := P ∩ L⊥, which contains no lines. Let Q := {x ∈ Rn : x = Σ_i λi pi + Σ_j µj rj, Σ_i λi = 1, λi ≥ 0, µj ≥ 0} with pi, rj being the extreme points and rays of P′. Clearly Q ⊆ P′. Suppose there exists z ∈ P′\Q, i.e., the system

Σ_i λi pi + Σ_j µj rj = z, Σ_i λi = 1, λi ≥ 0, µj ≥ 0

has no solution. By Farkas' lemma, there exists (y>, c) such that pi>y − c ≥ 0 ∀i, rj>y ≥ 0 ∀j, and z>y − c < 0. Consider the linear program min_{x∈P′} x>y; if the optimal value is finite then it is attained at some extreme point (since P′ contains no lines, hence no face of it contains lines either). Therefore the optimal value is lower bounded by c, contradicting z>y < c, z ∈ P′. On the other hand, if the linear program is unbounded (from below), then it is unbounded along some extreme ray (since P′ contains no lines), i.e., rk>y < 0 for some rk, again a contradiction. Conversely, if a set is written in the claimed form, it is the projection of a polyhedron, hence is itself a polyhedron.

To explicitly find the representing vectors ℓk, pi, rj for the polyhedron P = {x ∈ Rn : Ax ≤ b}, we first find a basis of the lineality space by solving Ax = 0. Then consider P′ = {x ∈ Rn : Ax ≤ b, ℓk>x = 0, ∀k} and find its zero-dimensional faces (by solving A′x = b′, ℓk>x = 0, ∀k). Lastly consider P′0+ and find its one-dimensional faces (by solving A″x = 0, ℓk>x = 0, ∀k). Clearly, if A, b are rational, all representing vectors ℓk, pi, rj can be chosen rational too.

Definition 11.3: Blocking and Antiblocking Polar

Let P = {x ∈ Rn+ : Ax ≥ 1} where A ∈ Rm×n+ has nonzero rows; define its blocking polar as

Pb := {y ∈ Rn+ : y>x ≥ 1, ∀x ∈ P}. (163)

Similarly, let Q = {x ∈ Rn+ : Ax ≤ 1} where A ∈ Rm×n+ has nonzero rows; define its antiblocking polar as

Qa := {y ∈ Rn+ : y>x ≤ 1, ∀x ∈ Q}. (164)

Proposition 11.9: (Anti)blocking Polar is Self-dual

Let pi, qj be the extreme points of P and Q respectively; then

Pb = {y ∈ Rn+ : y>pi ≥ 1, ∀i}, (165)
(Pb)b = P, (166)
Qa = {y ∈ Rn+ : y>qj ≤ 1, ∀j}, (167)
(Qa)a = Q. (168)

Proof: Note that P = conv({pi}) + Rn+ (for A ∈ Rm×n+). Since Pb ⊆ Rn+, (165) follows. Clearly P ⊆ (Pb)b, while on the other hand each row ai of A belongs to Pb, hence also (Pb)b ⊆ P; the antiblocking case is analogous.

For A ∈ Rm×n+ and B ∈ Rr×n+ with nonzero rows, we call them a blocking pair if {x ∈ Rn+ : Ax ≥ 1} and {y ∈ Rn+ : By ≥ 1} are blocking polars of each other. The antiblocking pair is defined similarly.


Theorem 11.4: Min-Max Duality for (Anti)Blocking Pair

A ∈ Rm×n+ and B ∈ Rr×n+ with nonzero rows are a blocking pair iff ∀z ∈ Rn+,

max{1>y : A>y ≤ z, y ∈ Rm+} = min_{1≤j≤r} z>bj, (169)

and an antiblocking pair iff ∀z ∈ Rn+,

min{1>y : A>y ≥ z, y ∈ Rm+} = max_{1≤j≤r} z>bj, (170)

where bj denotes the j-th row of B.

Proof: Straightforward linear programming duality.

Proposition 11.10: When does Ax = b Have an Integral Solution?

Fix A ∈ Qm×n and b ∈ Qm. The system Ax = b has an integral solution iff ∀y ∈ Qm,A>y ∈ Zn =⇒ b>y ∈ Z.

Remark 11.1: Finding an Integral Solution

We can also decide whether Ax = b has an integral solution in polynomial time: use Gaussian elimination to decide if Ax = b has any solution; if so, find a maximal set of linearly independent rows of A, denoted A′ (with corresponding right-hand side b′). Employ (integral) elementary column transformations to reduce A′ to its Hermite canonical form, i.e., find a nonsingular integral U such that A′U = [B 0] where B is lower triangular and shares the same rank with A. Then Ax = b has an integral solution iff U[B−1b′; 0] is integral, in which case it is itself an integral solution (note that for any solution y, B−1b′ = B−1A′y = [Id 0]U−1y, where U−1 is also integral).
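The following is a from-scratch sketch of this procedure for a full-row-rank integer matrix (helper names are ours; production code would call a library Hermite normal form instead):

```python
def integral_solution(A, b):
    """Sketch of Remark 11.1 for a full-row-rank integer matrix A:
    column-reduce [A; I] by integral operations so that A becomes
    lower triangular [B 0]; the identity block then records U."""
    m, n = len(A), len(A[0])
    # Stack the n x n identity under A; column operations act on both.
    T = [row[:] for row in A] + [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(m):
        for j in range(i + 1, n):
            # Euclidean algorithm on columns i and j to zero out T[i][j].
            while T[i][j] != 0:
                if T[i][i] == 0 or abs(T[i][j]) < abs(T[i][i]):
                    for r in range(m + n):       # swap columns i and j
                        T[r][i], T[r][j] = T[r][j], T[r][i]
                    continue
                q = T[i][j] // T[i][i]
                for r in range(m + n):           # column j -= q * column i
                    T[r][j] -= q * T[r][i]
    B = [T[r][:m] for r in range(m)]             # lower triangular block
    U = [T[r][:] for r in range(m, m + n)]       # accumulated transform
    # Forward-substitute B y = b; every pivot division must be exact.
    y = []
    for i in range(m):
        s = b[i] - sum(B[i][k] * y[k] for k in range(i))
        if s % B[i][i] != 0:                     # no integral solution
            return None
        y.append(s // B[i][i])
    # x = U [y; 0] is then an integral solution of A x = b.
    return [sum(U[r][k] * y[k] for k in range(m)) for r in range(n)]

print(integral_solution([[2, 3]], [1]))  # [-1, 1]: 2*(-1) + 3*1 = 1
print(integral_solution([[2, 4]], [3]))  # None: gcd(2, 4) does not divide 3
```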

The next result tells us when the matrix A admits an integral solution of Ax = b for every integral b (whenever the system is solvable at all).

Proposition 11.11: Pseudo-Unimodular (PUM)

Let A ∈ Zm×n with rank(A) = r. The following are equivalent:

(I). A can be converted into [Idr 0; 0 0] using (integral) elementary operations;
(II). the g.c.d. of the order-r subdeterminants of A is 1;
(III). ∀b ∈ Zm, the system Ax = b either has no solution or has an integral solution;
(IV). ∃A† ∈ Zn×m such that AA†A = A.

An integral matrix satisfying any of the above is called pseudo-unimodular (PUM). Moreover, if r = m, i.e., A is of full row rank, we have an additional equivalence:

(V). A>y ∈ Zn =⇒ y ∈ Zm.

Proof: Note that if we interchange two columns, multiply some column by −1, or add one column to another, we do not change the g.c.d. of the subdeterminants: each transformation is invertible and each subdeterminant after the transformation is an integral combination of the subdeterminants of the original matrix. Therefore we can convert A to [B 0; A′ 0], where B ∈ Zr×r is lower triangular with positive diagonal entries. Similarly, using elementary row transformations (acting from the left) we can further convert A to [D 0; 0 0], where D ∈ Zr×r is diagonal. From here it is clear that (I), (II) and (III) are all equivalent.

(III) =⇒ (IV): We can, without loss of generality, assume A = [B; CB] where B ∈ Zr×n is of full row rank. The linear system AX = [Id; C] has solutions, hence by (III) it has an integral solution X∗ ∈ Zn×r. Now one easily verifies that A† := [X∗ 0n,m−r] satisfies AA†A = A.

(IV) =⇒ (III): Suppose the linear system has a solution z (not necessarily integral); then AA†Az = Az = b, i.e., A†Az = A†b is an integral solution.

(IV) =⇒ (V): Suppose A is of full row rank; then AA† is nonsingular, hence ∀y ∃w such that y = (AA†)>w. Therefore if A>y = A>(AA†)>w = (AA†A)>w = A>w is integral, so is (A†)>A>w = y.

Finally (V) =⇒ (I) can be proved using again elementary transformations.

From the proof it is clear that (integral) elementary transformations preserve PUM. Note that the equivalence (V) is false if A is not of full row rank: for example A = (2, 3)>.

Definition 11.4: Unimodular (UM)

Matrix A ∈ Zm×n with rank(A) = r is called unimodular (UM) if for each submatrix B consisting of r linearly independent columns of A, the g.c.d. of the order r subdeterminants of B is 1.

Unimodularity is strictly stronger than pseudo-unimodularity: A = [1 0 0; 0 2 1] is not unimodular but apparently satisfies (II) in Proposition 11.11.

Remark 11.2: Operations Preserving Unimodularity

The following operations clearly preserve unimodularity:

• deleting columns;
• subtracting one row from another;
• permuting, replicating or negating columns/rows;
• adding zero rows/columns;
• taking direct products.

The first property is the main motivation to introduce unimodularity, as it is not possessed by pseudo-unimodular matrices. Note that [Id; A] is always unimodular while [Id A] need not be: take A = [1 2; 0 1]. Also, subtracting one column from another does not preserve unimodularity.

We will use the notion of integer polyhedron to characterize unimodularity. Some basic properties of integral hull and integer polyhedra can be found in [Korte and Vygen, 2012, Chapter 5].

Definition 11.5: Integral Hull and Integer Polyhedra

The integral hull of the polyhedron P is the convex hull of all integral points in P, denoted as PI . Clearly PI ⊆ P and we call P integral if PI = P.

Recall that a column submatrix is called a basis if its rank is maximal.


Theorem 11.5: Characterizing Unimodularity

Let A ∈ Zm×n. The following are equivalent:

(I). A is unimodular;
(II). ∀ℓ, u ∈ Zn±∞, ∀b ∈ Zm, the polyhedron P := {x ∈ Rn : Ax = b, ℓ ≤ x ≤ u} is integral;
(III). ∃ℓ, u ∈ Zn±∞ with u ≥ 1 + ℓ and |ℓ| ∧ |u| < ∞ such that ∀b ∈ Zm the polyhedron P is integral;
(IV). ∀c ∈ Zn, the polyhedron Q := {y ∈ Rm : A>y ≥ c} is integral;
(V). ∃ a basis B that is UM and a unique (T)UM matrix C such that BC = A;
(VI). ∀ bases B: B is UM and ∃ a unique (T)UM matrix C such that BC = A.

Proof: (I) =⇒ (II): Let A be unimodular and F be a minimal face of P, which is determined by A′x′ = c where A′ is some column submatrix of A (cf. Proposition 11.7) and c ∈ Zm. From Remark 11.2 we know A′ is unimodular, hence it follows from Proposition 11.11 that F has an integral point. Since F is chosen arbitrarily, P is integral.

(I) =⇒ (IV): Let F be a minimal face of Q; then F = {y ∈ Rm : B>y = c′}, where B consists of linearly independent columns of A and is therefore unimodular. Proposition 11.11 implies that F has an integral point, hence Q is integral.

(III) =⇒ (I): We prove the contrapositive. Suppose A is not unimodular; w.l.o.g. let A = BC with rank(A) = r, where B ∈ Zm×r has full column rank and the order-r subdeterminants of B have g.c.d. greater than 1. Therefore by Proposition 11.11 ∃y′ ∉ Zr such that By′ ∈ Zm. Adding integers if necessary, we may assume ℓ1:r ≤ y′ ≤ u1:r. Set y = [y′; y″] where y″i equals the finite entry in {ℓr+i, ur+i}. Consider the polyhedron {x ∈ Rn : Ax = Ay, ℓ ≤ x ≤ u}; it has a non-integral extreme point (determined by Bx1:r = By′, xr+1:n = y″), a contradiction.

(IV) =⇒ (I): We prove the contrapositive. Suppose A is not unimodular; then by definition ∃B ∈ Zm×r consisting of r linearly independent columns of A such that the order-r subdeterminants of B have g.c.d. greater than 1. Therefore by Proposition 11.11 ∃c ∈ Zr such that the system B>y = c has no integral solution, although some real-valued solution y′ does exist (since B has full column rank). Consider the polyhedron Q := {y : A>y ≥ A>y′}. Its face F := {y ∈ Q : B>y = c} is nonempty (y′ ∈ F) and contains no integral points, contradicting the fact that Q is integral.

Finally we turn to the equivalence between (I), (V) and (VI). From Remark 11.2 we know that (integral) elementary row transformations preserve unimodularity. Moreover, the claims in (V) and (VI) are robust w.r.t. (integral) elementary row transformations as well. Therefore we may assume A = [U D; 0 0], where U is a nonsingular matrix sharing the same rank with A. Take B = [U; 0]; then C = [Id U−1D]. Under (V), B is UM and C is UM (equivalently TUM; see (I) and (II) in Theorem 11.6 below). Checking the determinants verifies that BC = A is also UM, hence (V) =⇒ (I). On the other hand, if A is UM, then U = Id and any basis of [Id D] must have determinant ±1. Therefore w.l.o.g. B = [Id; 0] is UM and C = [Id D] is UM (equivalently TUM), proving (I) =⇒ (VI).

From the proof of (I) =⇒ (II) and (I) =⇒ (IV) it is clear that the column inheritability of unimodularity is the key (as compared to pseudo-unimodularity). On the other hand, pseudo-unimodularity is sufficient for proving ¬(I) =⇒ ¬(III) and ¬(I) =⇒ ¬(IV).

Remark 11.3: Testing Unimodularity

The UM factorization shown in (VI) of Theorem 11.5 can be employed to reduce testing UM to testing TUM (defined below): find a basis B in A (using Gaussian elimination); use (integral) elementary column transformations to convert B to its Hermite canonical form, i.e., find a nonsingular UM matrix U such that B>U = [Id 0]; finally check if U>A = U>BC = [C; 0] is TUM.

Definition 11.6: Totally Unimodular (TUM)

A ∈ {1, 0, −1}m×n is totally unimodular (TUM) if all its subdeterminants belong to {1, 0, −1}.

There is an important property of TUM that we would like to mention. Recall that the columns of A ∈ Fm×n are linearly independent (over the scalar field F) iff there exists a square submatrix A′ with detF(A′) ≠ 0. Now let A ∈ {1, 0, −1}m×n; if its columns are linearly independent over GF(2), i.e., the field with two elements 0 and 1, then they are also linearly independent over R, since for any square submatrix A′, detR(A′) ≡ detGF(2)(A′) mod 2. If A is furthermore TUM, then the reverse implication is also true because detR(A′) ∈ {1, 0, −1}.
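A brute-force check of Definition 11.6 below is straightforward, if exponential in the matrix size; a sketch for tiny examples (the helper names are ours):

```python
from itertools import combinations

def det_int(M):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det_int([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def is_tum(A):
    """A is TUM iff every square subdeterminant lies in {-1, 0, 1}."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det_int(sub) not in (-1, 0, 1):
                    return False
    return True

# An incidence-style matrix of a directed path is TUM ...
print(is_tum([[1, -1, 0], [0, 1, -1]]))  # True
# ... while this one has a 2x2 subdeterminant of -2 (cf. (VIII) below).
print(is_tum([[1, 1], [1, -1]]))         # False
```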

Remark 11.4: Operations Preserving TUM

The following operations, which can be verified using (V) in Theorem 11.6 below, preserve TUM:

• transposing;
• permuting, negating, replicating or deleting columns/rows;
• adding a row/column with at most one nonzero entry, being ±1;
• pivoting, i.e., replacing [1 c>; b D] by either [1 c>; 0 D − bc>], [1 0; b D − bc>], or [−1 c>; b D − bc>];
• 1-sum: A ⊕1 B := [A 0; 0 B];
• 2-sum: [A a] ⊕2 [b>; B] := [A ab>; 0 B];
• 3-sum: [A a a; c> 0 1] ⊕3 [1 0 b>; d d B] := [A ab>; dc> B].

A beautiful result of Seymour shows that every totally unimodular matrix arises from these operations applied to the so-called network matrices and the following two special totally unimodular matrices:

[ 1 −1  0  0 −1]       [1 1 1 1 1]
[−1  1 −1  0  0]       [1 1 1 0 0]
[ 0 −1  1 −1  0] , and [1 0 1 1 0].   (171)
[ 0  0 −1  1 −1]       [1 0 0 1 1]
[−1  0  0 −1  1]       [1 1 0 0 1]

Moreover, it implies a polynomial-time algorithm for determining total unimodularity!

Example 11.2: 2-sum or 3-sum do not preserve UM

[1 1] and [1; 2] are both UM while their 2-sum [1 1; 0 2] is not UM. Similarly, [2 1 1; 1 0 1] and [1 0 1; 1 1 2] are both UM while their 3-sum [2 1; 1 2] is not UM. Also, the last two pivoting rules do not preserve UM: take [1 2; 1 3], which the last rule pivots to [−1 2; 1 1] with determinant −3.


Proposition 11.12

Let A ∈ Rm×n be of full row rank. The following are equivalent:

(I). ∀ basis B, the matrix B−1A is integral;
(II). ∀ basis B, the matrix B−1A is (T)UM;
(III). ∃ basis B such that the matrix B−1A is (T)UM.

Proof: Simply note that all claims are invariant under left-multiplying A by any nonsingular matrix, hence we can assume w.l.o.g. that A = [Idm C] (after permuting the columns if necessary). Then all claims are easily seen to be equivalent to requiring C to be TUM.


Theorem 11.6: Characterizing TUM

Let A ∈ {1, 0, −1}m×n, the following are equivalent:

(I). A is TUM;
(II). [Id A] is (T)UM;
(III). ∀ℓ, u ∈ Zn±∞, ∀c, d ∈ Zm±∞, the polyhedron {x ∈ Rn : c ≤ Ax ≤ d, ℓ ≤ x ≤ u} is integral;
(IV). ∃ℓ, u ∈ Zn±∞ with u ≥ 1 + ℓ, |u| ∧ |ℓ| < ∞, and ∃c, d ∈ Zm±∞ with d ≥ 1 + c, |c| ∧ |d| < ∞, such that ∀b ∈ Zm the polyhedron {x ∈ Rn : c ≤ Ax − b ≤ d, ℓ ≤ x ≤ u} is integral;
(V). ∀S ⊆ {1, . . . , n}, ∃ a partition S1 + S2 = S such that Σ_{j∈S1} A:,j − Σ_{j∈S2} A:,j ∈ {1, 0, −1}m;
(VI). ∀ nonsingular submatrix B of A, ∃i such that |{j : Bi,j ≠ 0}| is odd;
(VII). ∀ square (hence also rectangular) submatrix B of A, if Σ_i Bi,: and Σ_j B:,j are even vectors then Σ_{ij} Bij is divisible by 4;
(VIII). ∀ nonsingular submatrix B of A, |det(B)| ≠ 2;
(IX). ∀ integral b, y ≥ 0 and ∀k ∈ N such that Ay ≤ kb, ∃ integral xi ∈ {x ≥ 0 : Ax ≤ b}, i = 1, . . . , k, such that y = Σ_{i=1}^{k} xi;
(X). ∀ nonsingular submatrix B of A, the g.c.d. of the entries in y>B is 1 for y = 1 (hence also for all {1, 0, −1}-valued y with y>B ≠ 0).

Proof: (I) ⇐⇒ (II) is obvious, while (II) =⇒ (III) =⇒ (IV) =⇒ (II) follows from Theorem 11.5.

(III) =⇒ (V): Consider the polyhedron {x ∈ Rn : 0 ≤ x ≤ 1S, ⌊(1/2)A1S⌋ ≤ Ax ≤ ⌈(1/2)A1S⌉}. Clearly it is not empty, and since A is TUM it is integral by (III). Let y be an extreme point, which is integral; then 1S − 2y gives the desired partition.

(V) =⇒ (VI): By (V) there exists a {±1}-valued vector z such that Bz is {1, 0, −1}-valued. If |{j : Bi,j ≠ 0}| were even for all i, then B|z| would be an even vector, hence Bz = 0. But z ≠ 0, contradicting the nonsingularity of B.

(V) =⇒ (VII): Since Σ_j B:,j is an even vector, by (V) we can partition the columns of B into two classes whose column sums coincide. Since Σ_i Bi,: is an even vector, it follows that Σ_{ij} Bij is divisible by 4.

(VIII) =⇒ (I): We prove the contrapositive. Suppose A is not TUM; then ∃ a square submatrix B ∈ Rt×t with |det(B)| ≥ 2. We will exhibit a further square submatrix of B that has determinant either 2 or −2. Consider the matrix C := [B Idt]. By iteratively adding or subtracting rows from other rows and multiplying columns by −1, we convert C to C′ so that 1) C′ remains {1, 0, −1}-valued; 2) Idt remains a column submatrix of C′ (not necessarily in the same positions as in C); 3) C′ contains among its first t columns as many columns of Idt as possible. Note that the subdeterminants of the first t columns of C′ remain the same as those of C (up to sign changes). Permuting if necessary we may assume C′ = [Idk B1 0k,t−k; 0t−k,k B2 Idt−k]. Clearly k < t, since otherwise |det(Idk)| = |det(B)| ≥ 2. Therefore ∃ k + 1 ≤ i, j ≤ t with C′ij = ±1. By the maximality of C′ there must then exist a submatrix [1 ±1; ±1 −1] (up to signs) in the first t columns; this submatrix has determinant ±2.

(VI) or (VII) =⇒ (VIII): We use induction. Suppose the theorem is true for all proper submatrices of A and suppose (VIII) is false, so detR(A) = ±2 ≡ 0 mod 2; therefore A is singular over GF(2) (since A is {1, 0, −1}-valued). On the other hand, any proper column submatrix of A is linearly independent over R (since detR(A) ≠ 0), hence also linearly independent over GF(2) (since any proper submatrix of A is TUM by the induction hypothesis). Therefore we must have A1 ≡ 0 mod 2, which contradicts (VI). Similarly one can show that A>1 ≡ 0 mod 2. Partition A as [α b>; c A′]. By the induction hypothesis, the submatrix B := [c A′] is TUM, hence by (V) ∃x ∈ {1, −1}n such that Bx ∈ {1, 0, −1}m−1. However B1 ≡ 0 mod 2, hence we conclude that Bx = 0. The equality AC = [β b>; 0 A′], where C := [x (0; Idn−1)], tells us that |β| = |det(A)| = 2 because det(A′) = ±1. Moreover, 1>A(1 − x) ≡ 0 mod 4, due to the fact that 1 − x ≡ 0 mod 2 and the previously established result A>1 ≡ 0 mod 2. Finally note that 1>Ax = β ≡ 2 mod 4, hence 1>A1 ≡ 2 mod 4, contradicting (VII).

(III) =⇒ (IX): We use induction; k = 1 holds trivially. Let k ≥ 2 and consider the polyhedron {x : 0 ≤ x ≤ y, Ay − kb + b ≤ Ax ≤ b}, which is nonempty (containing k−1y) hence integral. Take any extreme point xk and note that y′ := y − xk ≥ 0 and Ay′ ≤ (k − 1)b.

(IX) =⇒ (IV): Let x0 be an arbitrary extreme point of P := {x ≥ 0 : Ax ≤ b}. Take k ∈ N so that y := kx0 is integral. Since y ≥ 0 and Ay ≤ kb, by (IX) kx0 = Σ_{i=1}^{k} xi with the xi integral vectors in P. Since x0 is extreme we must then have xi = x0 for each i, hence x0 is integral.

(I) =⇒ (X): Clearly B−1 is integral. Let k be the g.c.d. of the entries in y>B; then k−1y> = (k−1y>B)B−1 is integral. Since y is {1, 0, −1}-valued, we must have k = 1.

(X) =⇒ (VI): Simply take y = 1.

Definition 11.7: (Box) Totally Dual Integral (TDI)

The system Ax ≤ b is called box totally dual integral (box-TDI) iff ∀ℓ, u ∈ Fn, ∀c ∈ Zn, the dual of max{c>x : Ax ≤ b, ℓ ≤ x ≤ u} has an integral optimal solution (whenever it has an optimal solution at all). The system is simply called totally dual integral (TDI) if we fix ℓ ≡ −∞, u ≡ ∞.

Proposition 11.13: Box-TDI Implies TDI

Box-TDI for all ℓ, u ∈ Fn implies box-TDI for all ℓ, u ∈ Fn±∞ (in particular, TDI).
Proof: Let ℓ, u ∈ Fn±∞ and assume the dual of max{c>x : Ax ≤ b, ℓ ≤ x ≤ u} has a minimizer. By LP duality the primal problem has a maximizer, say x∗. Consider max{c>x : Ax ≤ b, x∗ − 1_{ℓ=−∞} ≤ x ≤ x∗ + 1_{u=∞}}, of which x∗ is still a maximizer; hence by assumption its dual has an integral solution. By the KKT conditions, this dual integral solution (after dropping zeros if necessary) remains optimal for the dual of the original LP.

Theorem 11.7: (Box) TDI Implies Primal Integrality

If the system Ax ≤ b,A ∈ Qm×n, b ∈ Zm is (box) TDI, then the underlying polyhedron is integral.

Proof: By Proposition 11.13, we only need to consider the TDI case. By definition the dual LP always has an integral optimal solution, hence the primal LP always has an integral optimal value, whence the integrality of the polyhedron.


Theorem 11.8: Characterization of Box-TDI

A rational polyhedron P ⊆ Rn is box-TDI iff ∀c ∈ Qn, ∃d ∈ Zn such that ⌊c⌋ ≤ d ≤ ⌈c⌉ and every maximizer of max{c>x : x ∈ P} also maximizes max{d>x : x ∈ P}.
Proof: ⇒: We use induction to prove that ∀k ≤ n there exists d ∈ Zn such that di ≤ ⌈ci⌉, di = ci if ci ∈ Z, di ≥ ⌊ci⌋ for all i ≤ k, and the maximizer claim above holds. Let Ax ≤ b be a box-TDI representation of P, and let x∗ ∈ Fn be an arbitrary point in the relative interior of the optimal set of max{c>x : x ∈ P}. Let k = 0 and consider

max{x>⌈c⌉ : Ax ≤ b, xi ≤ x∗i if ci ∉ Z}.

Clearly, x∗ remains feasible, and because (⌈c⌉ − c)>(x∗ − x) ≥ 0 it is actually optimal. By box-TDI, the dual LP has an integral optimum y, z ≥ 0 such that A>y + z = ⌈c⌉, b>y + z>x∗ = ⌈c⌉>x∗, and zi = 0 if ci ∈ Z. Take d := ⌈c⌉ − z; then using y as a dual certificate we verify that x∗ is optimal for max{x>d : x ∈ P}. Since x∗ is chosen in the relative interior, all maximizers of max{c>x : x ∈ P} remain optimal for max{x>d : x ∈ P}. This completes the proof for k = 0.

Now we prove the claim for k + 1. By the induction hypothesis ∃d ∈ Zn such that di ≤ ⌈ci⌉, di = ci if ci ∈ Z, di ≥ ⌊ci⌋ for all i ≤ k, and the maximizer claim holds. Suppose dk+1 < ⌊ck+1⌋. Consider the convex combination f of d and c such that fk+1 = ⌊ck+1⌋. Note that x∗ remains optimal for max{f>x : x ∈ P}, therefore by the induction hypothesis ∃e ∈ Zn such that ei ≤ ⌈fi⌉, ei = fi if fi ∈ Z, ei ≥ ⌊fi⌋ for all i ≤ k, and the maximizer claim holds. Clearly, ei ≤ ⌈fi⌉ ≤ ⌈ci⌉; if ci ∈ Z, then ci = di = fi = ei; ek+1 = fk+1 = ⌊ck+1⌋; and finally ei ≥ ⌊fi⌋ ≥ ⌊ci⌋ for i ≤ k. Thus we have proved the case for k + 1. Letting k = n completes the proof.

⇐: We know every rational polyhedron admits a TDI representation [Korte and Vygen, 2012, Theorem 5.17, page 110]. Let Ax ≤ b be such a TDI representation; we need only prove that Ax ≤ b is box-TDI. Consider max{e>x : Ax ≤ b, ℓ ≤ x ≤ u} with some maximizer x∗. By LP duality, there exist y, z, w ≥ 0 such that

A>y + z − w = e,
b>y + u>z − ℓ>w = e>x∗,
zi wi = 0, ∀i.

Define c = A>y = e − z + w. By assumption ∃d ∈ Zn such that ⌊c⌋ ≤ d ≤ ⌈c⌉ and such that each maximizer of max{c>x : Ax ≤ b} also maximizes max{d>x : Ax ≤ b}. Using y as a dual certificate we verify that x∗ is optimal for max{c>x : Ax ≤ b}, hence also optimal for max{d>x : Ax ≤ b}. Since Ax ≤ b is TDI, there exists v ∈ Zm+ such that A>v = d and b>v = d>x∗. Define g = (e − d)+ and h = (d − e)+. Note that x∗i < ui =⇒ zi = 0 =⇒ ci ≥ ei =⇒ di ≥ ei =⇒ gi = 0, hence g>(u − x∗) = 0. Similarly h>(x∗ − ℓ) = 0. Therefore

A>v + g − h = A>v + e − d = e,
b>v + u>g − ℓ>h = d>x∗ + u>g − ℓ>h = d>x∗ + g>x∗ − h>x∗ = e>x∗,

i.e., (v, g, h) is an integral solution of the dual of max{e>x : Ax ≤ b, ℓ ≤ x ≤ u} (for integral e).

It follows from the proof that any TDI representation of a box-TDI polyhedron is actually box-TDI.

Next, we discuss the "robustness" of TDI under various operations. Clearly, multiplying both sides by an arbitrary scalar does not preserve TDI; division, however, is a different story:

Proposition 11.14

If Ax ≤ b is (box)TDI then (k−1A)x ≤ αb is also (box)TDI for all k ∈ N and α ∈ R+.


Even more interestingly, we have

Theorem 11.9

For each rational system Ax ≤ b there exists k ∈ N such that (k−1A)x ≤ (k−1b) is TDI.
Proof: As mentioned before, there is a rational TDI system Cx ≤ d that yields the same polyhedron as Ax ≤ b. For any row c of C, min{b>y : y ≥ 0, A>y = c} has a rational minimizer; let kc be the least common multiple of the denominators of its components, and take k as the least common multiple of all the kc. Consider e ∈ Zn with δ = max{e>x : Ax ≤ b} = min{b>y : y ≥ 0, A>y = e} < ∞. Since Cx ≤ d is TDI, (e>, δ) is a nonnegative integral combination of the rows (c>, dc), each of which is itself a nonnegative integral combination of the rows (k−1ai>, k−1bi). Therefore ∃z ∈ k−1Zm+ such that A>z = e and b>z = δ. This easily implies the TDI of (k−1A)x ≤ (k−1b).

Proposition 11.15

If Ax ≤ b is (box)TDI, then ∀w ∈ Rn, Ax ≤ b − Aw is (box)TDI.

Proposition 11.16

Let Aix ≤ bi, i ∈ {1, 2} represent the same polyhedron. If each inequality in A1x ≤ b1 is a nonnegative integral combination of the inequalities in A2x ≤ b2, then (box)TDI of the former implies (box)TDI of the latter.

Proposition 11.17: Adding/Removing Slacks

Consider the rational system Ax ≤ b. Let a ∈ Zn, β ∈ Q. The system Ax ≤ b, a>x ≤ β is (box)TDI iff Ax ≤ b, a>x + η = β, η ≥ 0 is (box)TDI (with η an added slack variable).

Proposition 11.18: Intersection and Projection

Intersection with and projection onto a coordinate hyperplane preserve box-TDI.

Proof: Consider intersecting with, say, x1 = α: this amounts to adding α ≤ x1 ≤ α, hence maintains box-TDI. For projection, apply Theorem 11.8 (padding c with zeros).

Proposition 11.19: Repetition

If Ax ≤ b is (box)TDI, then ax0 + Ax ≤ b is also (box)TDI, where a is the first column of A and x0 is a new variable. Proof: The claim for the TDI case follows from the definition. Apply Theorem 11.8 for box-TDI.

The next result shows that box-TDI polyhedra are quite special.

Proposition 11.20

Every box-TDI polyhedron admits a representation Ax ≤ b with some {1, 0, −1}-valued matrix A.
Proof: Let Cx ≤ d be some box-TDI system representing the box-TDI polyhedron P. Note that p ∈ P iff

max{−1>z + 1>w : Cx + Cw + Cz ≤ d, x = p, z ≥ 0, w ≤ 0} ≥ 0. (172)


By Proposition 11.19, Cx + Cw + Cz ≤ d is box-TDI, hence (172) is equivalent to

min{d>y − y>Cp : y ≥ 0, −1 ≤ C>y ≤ 1, y integral,C>y integral} ≥ 0.

Therefore

P = {x :(y>C)x ≤ d>y, for all y ≥ 0, −1 ≤ C>y ≤ 1, y integral,C>y integral}.

Note that there are finitely many possible values of C>y satisfying the above, while the right-hand side d>y is attainable by some y satisfying the above. In summary, P is defined by some {1, 0, −1}-valued matrix A.

Proposition 11.21: Domination

If P is box-TDI, then its dominant Q := {z : z ≥ x for some x ∈ P} is also box-TDI. Proof: This also follows easily from Theorem 11.8.


Example 11.3: Operations Not Preserving TDI

The system Ax ≤ b with A = [1 5; 1 6], b = (1, 1)> is TDI; however, Ax − Ay ≤ b, x ≥ 0, y ≥ 0 is not TDI. The systems Ax ≤ bi with A = [1 0; 1 2; 0 1; 0 −1], b1 = (0, 2, 1, 0)>, b2 = (0, 0, 1, 0)> are both TDI; however, Ax ≤ b1 + b2 is not TDI.

Theorem 11.10: Carathéodory’s Theorem

(173)

References

Robert J. Aumann and Lloyd S. Shapley. Values of Non-Atomic Games. Princeton University Press, 1974.

Francis Bach. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.


Rodney C. Bassanezi and Gabriele H. Greco. Sull’additività dell’integrale. Rendiconti del Seminario Matematico della Università di Padova, 72:249–275, 1984.

Garrett Birkhoff. Lattice Theory. AMS, 2nd edition, 1948.

Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 649–658, 2012.

Gustave Choquet. Theory of capacities. Annales de l'institut Fourier, 5:131–295, 1954.

Claude Dellacherie. Quelques commentaires sur les prolongements de capacités. Séminaire de Probabilités (Strasbourg), 5:77–81, 1971.

Dieter Denneberg. Non-Additive Measure and Integral. Kluwer Academic, 1994.

Jack Edmonds. Submodular functions, matroids, and certain polyhedra. In Combinatorial Structures and Their Applications, pages 69–87, 1970.

Uriel Feige, Vahab Mirrokni, and Jan Vondrák. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.

András Frank. An algorithm for submodular functions on graphs. In Bonn Workshop on Combinatorial Optimization, volume 66, pages 97–120. North-Holland, 1982.

Satoru Fujishige. Submodular Functions and Optimization. Elsevier, 2nd edition, 2005.

Gabriele H. Greco. Sulla rappresentazione di funzionali mediante integrali. Rendiconti del Seminario Matematico della Università di Padova, 66:21–42, 1982.

Heinz König. The (sub/super)additivity assertion of Choquet. Studia Mathematica, 157:171–197, 2003.

Bernhard Korte and Jens Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 5th edition, 2012.

László Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235–257, 1982.

Massimo Marinacci and Luigi Montrucchio. On concavity and supermodularity. Journal of Mathematical Analysis and Applications, 344:642–654, 2008.

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14:265–294, 1978.

David Schmeidler. Integral representation without additivity. Proceedings of the American Mathematical Society, 97(2):255–261, 1986.

Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.

Lloyd S. Shapley. Cores of convex games. International Journal of Game Theory, 1(1):11–26, 1971.

Ján Šipoš. Integral representations of non-linear functionals. Mathematica Slovaca, 29(4):333–345, 1979.

Yaoliang Yu. On decomposing the proximal map. In Advances in Neural Information Processing Systems, 2013.
