Submodular Functions, Optimization, and Applications to Machine Learning — Spring Quarter, Lecture 11 —
http://www.ee.washington.edu/people/faculty/bilmes/classes/ee596b_spring_2016/
Prof. Jeff Bilmes
University of Washington, Seattle
Department of Electrical Engineering
http://melodi.ee.washington.edu/~bilmes
May 9th, 2016

[Cover figure: Venn-diagram illustration of the submodularity inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B), decomposed over A_r = A \ B, C = A ∩ B, and B_r = B \ A. Clockwise from top left: László Lovász, Jack Edmonds, Satoru Fujishige, George Nemhauser, Laurence Wolsey, András Frank, Lloyd Shapley, H. Narayanan, Robert Bixby, William Cunningham, William Tutte, Richard Rado, Alexander Schrijver, Garrett Birkhoff, Hassler Whitney, Richard Dedekind.]

Prof. Jeff Bilmes EE596b/Spring 2016/Submodularity - Lecture 11 - May 9th, 2016 F1/59 (pg.1/60)

Logistics / Review

Cumulative Outstanding Reading
- Read chapters 2 and 3 from Fujishige's book.
- Read chapter 1 from Fujishige's book.

Announcements, Assignments, and Reminders
- Homework 4, soon available at our assignment dropbox (https://canvas.uw.edu/courses/1039754/assignments).
- Homework 3, available at our assignment dropbox (https://canvas.uw.edu/courses/1039754/assignments), due (electronically) Monday (5/2) at 11:55pm.
- Homework 2, available at our assignment dropbox (https://canvas.uw.edu/courses/1039754/assignments), due (electronically) Monday (4/18) at 11:55pm.
- Homework 1, available at our assignment dropbox (https://canvas.uw.edu/courses/1039754/assignments), due (electronically) Friday (4/8) at 11:55pm.
- Weekly Office Hours: Mondays, 3:30-4:30, or by skype or google hangout (set up a meeting via our discussion board (https://canvas.uw.edu/courses/1039754/discussion_topics)).
Class Road Map - IT-I
- L1(3/28): Motivation, Applications, & Basic Definitions
- L2(3/30): Machine Learning Apps (diversity, complexity, parameter, learning target, surrogate)
- L3(4/4): Info theory exs, more apps, definitions, graph/combinatorial examples, matrix rank example, visualization
- L4(4/6): Graph and Combinatorial Examples, matrix rank, Venn diagrams, examples of proofs of submodularity, some useful properties
- L5(4/11): Examples & Properties, Other Defs., Independence
- L6(4/13): Independence, Matroids, Matroid Examples, matroid rank is submodular
- L7(4/18): Matroid Rank, More on Partition Matroid, System of Distinct Reps, Transversals, Transversal Matroid
- L8(4/20): Transversals, Matroid and representation, Dual Matroids
- L9(4/25): Dual Matroids, Properties, Combinatorial Geometries, Matroid and Greedy
- L10(4/27): Matroid and Greedy, Polyhedra, Matroid Polytopes
- L11(5/2): From Matroids to Polymatroids, Polymatroids
- L12(5/4):
- L13(5/9):
- L14(5/11):
- L15(5/16):
- L16(5/18):
- L17(5/23):
- L18(5/25):
- L19(6/1):
- L20(6/6): Final Presentations maximization.

Finals Week: June 6th-10th, 2016.

The greedy algorithm

In combinatorial optimization, the greedy algorithm is often useful as a heuristic that can work quite well in practice. The goal is to choose a good subset of items, and the fundamental tenet of the greedy algorithm is to choose next whatever currently looks best, without the possibility of later recall or backtracking. Sometimes this gives the optimal solution (we saw three greedy algorithms that can find the maximum weight spanning tree). Greedy is also attractive because it can be made to run very fast, in O(n log n) time. Often, however, greedy is only a heuristic: it might work well in practice, but its worst-case performance can be unboundedly poor.
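The maximum weight spanning tree mentioned above is one setting where greedy is provably optimal. Below is a minimal sketch (not from the slides; the graph and weights are illustrative assumptions): a Kruskal-style greedy that scans edges in decreasing weight order and keeps any edge that does not close a cycle.

```python
# Sketch: Kruskal-style greedy for a maximum weight spanning tree.
# Cycle detection uses a simple union-find (disjoint-set) structure.

def max_weight_spanning_tree(n, edges):
    """edges: list of (weight, u, v) tuples on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Greedy: consider edges in order of decreasing weight.
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # keeping this edge preserves acyclicity (independence)
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Illustrative 4-vertex graph.
edges = [(4, 0, 1), (2, 1, 2), (5, 0, 2), (1, 2, 3)]
tree = max_weight_spanning_tree(4, edges)
print(sum(w for w, _, _ in tree))  # prints 10: edges of weight 5, 4, 1
```

Acyclic edge sets are exactly the independent sets of the graphic matroid, which is why this greedy is a special case of the matroid greedy algorithm discussed next.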
We will next see that the greedy algorithm working optimally is a defining property of a matroid, and is also a defining property of a polymatroid function.

Matroid and the greedy algorithm

Let (E, I) be an independence system, and suppose we are given a non-negative modular weight function w : E → R_+.

Algorithm 1: The Matroid Greedy Algorithm
1  Set X ← ∅ ;
2  while ∃ v ∈ E \ X s.t. X ∪ {v} ∈ I do
3      v ∈ argmax { w(v) : v ∈ E \ X, X ∪ {v} ∈ I } ;
4      X ← X ∪ {v} ;

This is the same as sorting the items by decreasing weight w, and then choosing items in that order whenever they retain independence.

Theorem 11.2.7
Let (E, I) be an independence system. Then the pair (E, I) is a matroid if and only if for each weight function w ∈ R_+^E, Algorithm 1 leads to a set I ∈ I of maximum weight w(I).

Summary of Important (for us) Matroid Definitions

Given an independence system, matroids are defined equivalently by any of the following:
- All maximally independent sets have the same size.
- A monotone non-decreasing submodular integral rank function with unit increments.
- The greedy algorithm achieves the maximum weight independent set for all weight functions.

Matroid Polyhedron in 2D

P_r^+ = { x ∈ R^E : x ≥ 0, x(A) ≤ r(A), ∀ A ⊆ E }    (11.10)

Consider this in two dimensions. We have inequalities of the form:

x1 ≥ 0 and x2 ≥ 0    (11.11)
x1 ≤ r({v1}) ∈ {0, 1}    (11.12)
x2 ≤ r({v2}) ∈ {0, 1}    (11.13)
x1 + x2 ≤ r({v1, v2}) ∈ {0, 1, 2}    (11.14)

Because r is submodular, we have

r({v1}) + r({v2}) ≥ r({v1, v2}) + r(∅)    (11.15)

so since r({v1, v2}) ≤ r({v1}) + r({v2}), the last inequality (11.14) is either touching (so inactive) or active.
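Returning to Algorithm 1: a minimal sketch (the partition matroid and the weights are illustrative assumptions, not from the slides) implementing its equivalent sort-by-decreasing-weight form against an arbitrary independence oracle.

```python
# Sketch of the matroid greedy algorithm: scan elements in decreasing
# weight order, keeping each element that preserves independence.

def matroid_greedy(E, w, is_independent):
    """E: ground set; w: dict of weights; is_independent: oracle on sets."""
    X = set()
    # Equivalent to Algorithm 1's argmax: decreasing-weight order.
    for v in sorted(E, key=w.get, reverse=True):
        if is_independent(X | {v}):
            X |= {v}
    return X

# Illustrative partition matroid: blocks with per-block capacities.
blocks = [({"a", "b"}, 1), ({"c", "d", "e"}, 2)]

def is_independent(S):
    return all(len(S & B) <= cap for B, cap in blocks)

w = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
X = matroid_greedy(set(w), w, is_independent)
print(sorted(X))  # prints ['a', 'c', 'd'] -- weight 10, the maximum
```

Since the oracle here defines a matroid, Theorem 11.2.7 guarantees the returned set has maximum weight; with a non-matroid independence system, the same code would only be a heuristic.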
Matroid Polyhedron in 2D

[Figure: the 2D matroid polytope P_r^+ when r({v1}) = r({v2}) = 1, shown for x1 + x2 ≤ r({v1, v2}) = 2 (the unit square; the constraint is inactive) and for r({v1, v2}) = 1 (a triangle; the constraint is active). And, if v2 is a loop (r({v2}) = 0), the constraints x2 ≤ 0 and x1 ≤ r({v1}) = 1 collapse the polytope to a segment on the x1 axis.]

Independence Polyhedra

For each I ∈ I of a matroid M = (E, I), we can form the incidence vector 1_I. Taking the convex hull, we get the independent set polytope, that is

P_ind.set = conv { ∪_{I ∈ I} {1_I} } ⊆ [0, 1]^E    (11.10)

Since {1_I : I ∈ I} ⊆ P_ind.set ⊆ P_r^+, we have

max { w(I) : I ∈ I } ≤ max { w^T x : x ∈ P_ind.set } ≤ max { w^T x : x ∈ P_r^+ }

Now take the rank function r of M, and define the following polyhedron:

P_r^+ ≜ { x ∈ R^E : x ≥ 0, x(A) ≤ r(A), ∀ A ⊆ E }    (11.11)

Now take any x ∈ P_ind.set; then we have that x ∈ P_r^+ (that is, P_ind.set ⊆ P_r^+). We show this next.

P_ind.set ⊆ P_r^+

If x ∈ P_ind.set, then

x = Σ_i λ_i 1_{I_i}    (11.10)

for some appropriate convex-combination vector λ = (λ1, λ2, ..., λn), with λ_i ≥ 0 and Σ_i λ_i = 1. Clearly, for such x, x ≥ 0. Now, for any A ⊆ E, since 1_{I_i}^T 1_A = |I_i ∩ A| and I_i ∩ A is itself an independent set contained in A,

x(A) = x^T 1_A = Σ_i λ_i 1_{I_i}^T 1_A    (11.11)
     ≤ Σ_i λ_i max_{j : I_j ⊆ A} 1_{I_j}(E)    (11.12)
     = max_{j : I_j ⊆ A} 1_{I_j}(E) = max_{I ∈ I} |A ∩ I|    (11.13)
     = r(A)    (11.14)

where (11.13) uses Σ_i λ_i = 1. Thus, x ∈ P_r^+ and hence P_ind.set ⊆ P_r^+.

Matroid Independence Polyhedron

So recall from a moment ago that we have

P_ind.set = conv { ∪_{I ∈ I} {1_I} } ⊆ P_r^+ = { x ∈ R^E : x ≥ 0, x(A) ≤ r(A), ∀ A ⊆ E }    (11.19)

In fact, the two polyhedra are identical (and thus both are polytopes). We'll show this in the next few theorems.
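The containment P_ind.set ⊆ P_r^+ can be checked numerically on a small example. This sketch (the uniform matroid U_{2,4} is an illustrative assumption, not from the slides) builds a random convex combination x of incidence vectors of independent sets and verifies every rank inequality x(A) ≤ r(A).

```python
# Check that a convex combination of independent-set incidence vectors
# satisfies x(A) <= r(A) for all A, i.e. it lies in P_r^+.
from itertools import chain, combinations
import random

E = [0, 1, 2, 3]
k = 2  # uniform matroid U_{2,4}: A is independent iff |A| <= 2

def r(A):
    # rank function of the uniform matroid
    return min(len(A), k)

def subsets(S):
    return chain.from_iterable(combinations(S, m) for m in range(len(S) + 1))

indep = [I for I in subsets(E) if len(I) <= k]  # all independent sets

random.seed(0)
lam = [random.random() for _ in indep]
s = sum(lam)
lam = [l / s for l in lam]  # convex weights: lam_i >= 0, sum_i lam_i = 1

# x = sum_i lam_i * 1_{I_i}, a point of P_ind.set
x = [sum(l for l, I in zip(lam, indep) if v in I) for v in E]

ok = all(sum(x[v] for v in A) <= r(A) + 1e-9 for A in subsets(E))
print(ok)  # prints True: every rank inequality holds
```

By the argument in the proof above, `ok` is True for any choice of convex weights, not just this random draw.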
Maximum weight independent set via greedy weighted rank

Theorem 11.2.5
Let M = (V, I) be a matroid with rank function r. Then for any weight function w ∈ R_+^V, there exists a chain of sets U1 ⊂ U2 ⊂ ··· ⊂ Un ⊆ V such that

max { w(I) | I ∈ I } = Σ_{i=1}^n λ_i r(U_i)    (11.19)

where the λ_i ≥ 0 satisfy

w = Σ_{i=1}^n λ_i 1_{U_i}    (11.20)

Linear Program LP

Consider the linear programming primal problem:

maximize w^T x
subject to x_v ≥ 0  (∀ v ∈ V)    (11.19)
           x(U) ≤ r(U)  (∀ U ⊆ V)

And its convex dual (note y ∈ R_+^{2^n}; y_U is a scalar element within this exponentially big vector):

minimize Σ_{U ⊆ V} y_U r(U)
subject to y_U ≥ 0  (∀ U ⊆ V)    (11.20)
           Σ_{U ⊆ V} y_U 1_U ≥ w

By strong duality, the optimal values of the primal and the dual are equal.
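One standard way to realize the chain of Theorem 11.2.5 (a sketch under assumptions; the matroid and weights below are illustrative, not from the slides): order the elements so that w(v1) ≥ w(v2) ≥ ··· ≥ w(vn), take U_i = {v1, ..., vi} and λ_i = w(vi) − w(v_{i+1}) (with λ_n = w(vn)); then Σ_i λ_i r(U_i) should equal the greedy (hence maximum) weight of an independent set.

```python
# Verify sum_i lambda_i * r(U_i) against the greedy optimum on a
# small uniform matroid U_{2,4} with illustrative weights.

E = ["a", "b", "c", "d"]
w = {"a": 4, "b": 3, "c": 3, "d": 1}
k = 2

def r(A):
    # rank of the uniform matroid: min(|A|, k)
    return min(len(A), k)

order = sorted(E, key=lambda v: -w[v])  # decreasing weight
# On a uniform matroid, greedy just takes the k heaviest elements.
greedy_val = sum(w[v] for v in order[:k])

# Chain value: U_i are the prefixes of the decreasing-weight order.
chain_val = 0.0
for i, v in enumerate(order):
    lam = w[v] - (w[order[i + 1]] if i + 1 < len(order) else 0)
    chain_val += lam * r(order[: i + 1])

print(greedy_val, chain_val)  # prints 7 7.0
```

The same prefix construction also yields a feasible dual solution y for the LP above (y_{U_i} = λ_i, zero elsewhere), matching the primal optimum as strong duality requires.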