Submodular Functions, Optimization, and Applications to Machine Learning — Spring Quarter, Lecture 6 —
Total Page:16
File Type:pdf, Size:1020Kb
Submodular Functions, Optimization, and Applications to Machine Learning — Spring Quarter, Lecture 6 — http://www.ee.washington.edu/people/faculty/bilmes/classes/ee563_spring_2018/ Prof. Jeff Bilmes University of Washington, Seattle Department of Electrical Engineering http://melodi.ee.washington.edu/~bilmes April 11th, 2018 f (A) + f (B) f (A B) + f (A B) Clockwise from top left:v Lásló Lovász Jack Edmonds ∪ ∩ Satoru Fujishige George Nemhauser ≥ Laurence Wolsey = f (A ) + 2f (C) + f (B ) = f (A ) +f (C) + f (B ) = f (A B) András Frank r r r r Lloyd Shapley ∩ H. Narayanan Robert Bixby William Cunningham William Tutte Richard Rado Alexander Schrijver Garrett Birkho Hassler Whitney Richard Dedekind Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F1/46 (pg.1/163) Logistics Review Cumulative Outstanding Reading Read chapter 1 from Fujishige’s book. Read chapter 2 from Fujishige’s book. Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F2/46 (pg.2/163) Logistics Review Announcements, Assignments, and Reminders If you have any questions about anything, please ask then via our discussion board (https://canvas.uw.edu/courses/1216339/discussion_topics). Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F3/46 (pg.3/163) Logistics Review Class Road Map - EE563 L1(3/26): Motivation, Applications, & L11(4/30): Basic Definitions, L12(5/2): L2(3/28): Machine Learning Apps L13(5/7): (diversity, complexity, parameter, learning L14(5/9): target, surrogate). L15(5/14): L3(4/2): Info theory exs, more apps, L16(5/16): definitions, graph/combinatorial examples L17(5/21): L4(4/4): Graph and Combinatorial Examples, Matrix Rank, Examples and L18(5/23): Properties, visualizations L–(5/28): Memorial Day (holiday) L5(4/9): More Examples/Properties/ L19(5/30): Other Submodular Defs., Independence, L21(6/4): Final Presentations L6(4/11): Matroids, Matroid Examples, maximization. Matroid Rank, Partition/Laminar Matroids L7(4/16): Laminar Matroids, System of Distinct Reps, Transversals, Transversal Matroid, Matroid Representation, Dual Matroids L8(4/18): L9(4/23): L10(4/25): Last day of instruction, June 1st. Finals Week: June 2-8, 2018. Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F4/46 (pg.4/163) Logistics Review Composition of non-decreasing submodular and non-decreasing concave Theorem 6.2.1 Given two functions, one defined on sets V f : 2 R (6.1) ! and another continuous valued one: φ : R R (6.2) ! V the composition formed as h = φ f : 2 R (defined as ◦ ! h(S) = φ(f(S))) is nondecreasing submodular, if φ is non-decreasing concave and f is nondecreasing submodular. Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F5/46 (pg.5/163) Logistics Review Monotone difference of two functions Let f and g both be submodular functions on subsets of V and let (f g)( ) be either monotone non-decreasing or monotone non-increasing − · Then h : 2V R defined by ! h(A) = min(f(A); g(A)) (6.1) is submodular. Proof. If h(A) agrees with f on both X and Y (or g on both X and Y ), and since h(X) + h(Y ) = f(X) + f(Y ) f(X Y ) + f(X Y ) (6.2) ≥ [ \ or h(X) + h(Y ) = g(X) + g(Y ) g(X Y ) + g(X Y ); (6.3) ≥ [ \ the result (Equation ?? being submodular) follows since f(X) + f(Y ) min(f(X Y ); g(X Y )) + min(f(X Y ); g(X Y )) g(X) + g(Y ) ≥ [ [ \ \ (6.4) ::: Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F6/46 (pg.6/163) Logistics Review Arbitrary functions: difference between submodular funcs. Theorem 6.2.1 Given an arbitrary set function h, it can be expressed as a difference V between two submodular functions (i.e., h 2 R, 8 2 ! f; g s.t. A; h(A) = f(A) g(A) where both f and g are submodular). 9 8 − Proof. Let h be given and arbitrary, and define: ∆ α = min h(X) + h(Y ) h(X Y ) h(X Y ) (6.4) X;Y :X6⊆Y;Y 6⊆X − [ − \ If α 0 then h is submodular, so by assumption α < 0. Now let f be an arbitrary≥ strict submodular function and define ∆ β = min f(X) + f(Y ) f(X Y ) f(X Y ) : (6.5) X;Y :X6⊆Y;Y 6⊆X − [ − \ Strict means that β > 0. ::: Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F7/46 (pg.7/163) Logistics Review Many (Equivalent) Definitions of Submodularity f(A) + f(B) f(A B) + f(A B); A; B V (6.16) ≥ [ \ 8 ⊆ f(j S) f(j T ); S T V; with j V T (6.17) j ≥ j 8 ⊆ ⊆ 2 n f(C S) f(C T ); S T V; with C V T (6.18) j ≥ j 8 ⊆ ⊆ ⊆ n f(j S) f(j S k ); S V with j V (S k ) (6.19) j ≥ j [ f g 8 ⊆ 2 n [ f g f(A B A B) f(A A B) + f(B A B); A; B V (6.20) [ j \ ≤ Xj \ Xj \ 8 ⊆ f(T ) f(S) + f(j S) f(j S T j ); S; T V ≤ j − j [ − f g 8 ⊆ j2T nS j2SnT (6.21) X f(T ) f(S) + f(j S); S T V (6.22) ≤ j 8 ⊆ ⊆ j2T nS X X f(T ) f(S) f(j S j ) + f(j S T ) S; T V ≤ − j n f g j \ 8 ⊆ j2SnT j2T nS (6.23) X f(T ) f(S) f(j S j ); T S V (6.24) ≤ − j n f g 8 ⊆ ⊆ j2SnT Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F8/46 (pg.8/163) Logistics Review On Rank V Let rank : 2 Z+ be the rank function. ! In general, rank(A) A , and vectors in A are linearly independent if ≤ j j and only if rank(A) = A . j j If A; B are such that rank(A) = A and rank(B) = B , with j j j j A < B , then the space spanned by B is greater, and we can find a j j j j vector in B that is linearly independent of the space spanned by vectors in A. To stress this point, note that the above condition is A < B , not j j j j A B which is sufficient (to be able to find an independent vector) but⊆ not required. In other words, given A; B with rank(A) = A & rank(B) = B , then j j j j A < B an b B such that rank(A b ) = A + 1. j j j j , 9 2 [ f g j j Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F9/46 (pg.9/163) Logistics Review Spanning trees/forests & incidence matrices 5 A directed version of the graph 2 7 1 12 (right) and its adjacency matrix 3 9 8 8 (below). 1 4 6 11 Orientation can be arbitrary. 4 7 2 10 Note, rank of this matrix is 7. 3 6 5 1 2 3 4 5 6 7 8 9 10 11 12 1 0 110000000000 1 − 2B 1 0 1 0 1 0 0 0 0 0 0 0 C B − C 3B 0 1 0 1 0 1 0 0 0 0 0 0 C B − − C 4B 0 0 1 1 0 0 1 1 0 0 0 0 C B − − C 5B 0 0 0 0 0 1 1 0 0 1 0 0 C B − C 6B 0 0 0 0 0 0 0 1 1 0 1 0 C B − − C 7@ 0 0 0 0 1 0 0 0 1 0 0 1 A 8 0 0 0 0− 0 0 0 0 0 1 1 1 − − Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F10/46 (pg.10/163) Hence, is down-closed or “subclusive”, under subsets. In other words, I A B and B A (6.1) ⊆ 2 I ) 2 I Let = I1;I2;::: be a set of all subsets of V such that for any I , I f g 2 I the vectors indexed by I are linearly independent. Given a set B of linearly independent vectors, then any subset A B is also linearly2 independent. I ⊆ maxInd: Inclusionwise maximal independent subsets (i.e., the set of bases of) of any set B V defined as: ⊆ maxInd(B) A B : A and v B A; A v = (6.2) , f ⊆ 2 I 8 2 n [ f g 2 Ig Given any set B V of vectors, all maximal (by set inclusion) subsets of ⊂ linearly independent vectors are the same size. That is, for all B V , ⊆ A1;A2 maxInd(B); A1 = A2 = rank(B) (6.3) 8 2 j j j j Matroids Matroid Examples Matroid Rank More on Partition Matroid From Matrix Rank Matroid ! So V is set of column vector indices of a matrix. Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 6 - April 11th, 2018 F11/46 (pg.11/163) Hence, is down-closed or “subclusive”, under subsets. In other words, I A B and B A (6.1) ⊆ 2 I ) 2 I Given a set B of linearly independent vectors, then any subset A B is also linearly2 independent. I ⊆ maxInd: Inclusionwise maximal independent subsets (i.e., the set of bases of) of any set B V defined as: ⊆ maxInd(B) A B : A and v B A; A v = (6.2) , f ⊆ 2 I 8 2 n [ f g 2 Ig Given any set B V of vectors, all maximal (by set inclusion) subsets of ⊂ linearly independent vectors are the same size.