Convex Analysis for Minimizing and Learning Submodular Set Functions

Thesis by Peter Stobbe

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

California Institute of Technology Pasadena, California

2013 (Defended May 16, 2013)

© 2013

Peter Stobbe
All Rights Reserved

To Aura

Abstract

The connections between convexity and submodularity are explored, for purposes of minimizing and learning submodular set functions.

First, we develop a novel method for minimizing a particular class of submodular functions, which can be expressed as a sum of concave functions composed with modular functions. The basic algorithm uses an accelerated first order method applied to a smoothed version of its convex extension. The smoothing algorithm is particularly novel as it allows us to treat general concave potentials without needing to construct a piecewise linear approximation as with graph-based techniques.

Second, we derive the general conditions under which it is possible to find a minimizer of a submodular function via a convex problem. This provides a framework for developing submodular minimization algorithms. The framework is then used to develop several algorithms that can be run in a distributed fashion. This is particularly useful for applications where the submodular objective function consists of a sum of many terms, each term dependent on a small part of a large data set.

Lastly, we approach the problem of learning set functions from an unorthodox perspective: sparse reconstruction. We demonstrate an explicit connection between the problem of learning set functions from random evaluations and that of recovering sparse signals. Based on the observation that the Fourier transform for set functions satisfies exactly the conditions needed for sparse reconstruction algorithms to work, we examine some different function classes under which uniform reconstruction is possible.

Contents

List of Figures

List of Algorithms

1 Introduction
  1.1 Main Contributions
  1.2 Outline of Thesis

2 Background of Convex Analysis
  2.1 Basic Concepts
  2.2 Proximal Operators

3 Set Functions and Submodularity
  3.1 Overview
  3.2 General Set Functions
    3.2.1 Set Function Derivatives
    3.2.2 Monotone and Low Order Functions
    3.2.3 Fourier Analysis of Set Functions
    3.2.4 Tensor Product Bases of Set Functions
  3.3 Properties of Submodular Set Functions
    3.3.1 Convex Analysis of Submodularity
    3.3.2 Lovász Extension
    3.3.3 Examples of Base Polytopes
  3.4 Submodular Minimization
    3.4.1 and Polynomial Time Algorithms
    3.4.2 Fujishige's Minimal Norm Algorithm
    3.4.3 Special Cases

4 Smoothed Methods for Decomposable Functions
  4.1 Introduction
  4.2 Background on Submodular Function Minimization
  4.3 The Decomposable Submodular Minimization Problem
  4.4 Classification of Submodular Functions
    4.4.1 Submodularity of Decomposable Functions
    4.4.2 Set Cover Functions as Threshold Potentials
    4.4.3 Reformulation of a Class of Functions
    4.4.4 Strict Generality of Threshold Potentials
  4.5 The SLG Algorithm for Threshold Potentials
    4.5.1 The Smoothed Extension of a Threshold Potential
    4.5.2 The SLG Algorithm for Minimizing Sums of Threshold Potentials
    4.5.3 Early Stopping based on Discrete Certificates of Optimality
  4.6 Extension to General Concave Potentials
    4.6.1 Formula Derivation
  4.7 Experiments
  4.8 Conclusion

5 Distributed Submodular Minimization
  5.1 Introduction
  5.2 Submodular Minimization with General Barrier Functions
  5.3 Consensus Algorithms for Submodular Minimization
    5.3.1 Notation
    5.3.2 Outline of Algorithms
  5.4 Fast Proximal Threshold Potentials
    5.4.1 General Plane Intersection Projection
    5.4.2 Box-Plane Projection Algorithms
  5.5 Experiments

6 Learning Fourier Sparse Set Functions
  6.1 Introduction
  6.2 Background
    6.2.1 The Fourier transform on set functions
  6.3 Conditions for Recovery
  6.4 Classes of Set Functions
    6.4.1 Symmetric functions
    6.4.2 Low order functions
    6.4.3 Submodular functions
  6.5 Reconstruction Algorithms
    6.5.1 Exploiting structure in the Fourier domain
  6.6 Applications and Experiments
    6.6.1 Sketching graph evolution
    6.6.2 Approximate submodular optimization
    6.6.3 Synthetic Submodular Recovery
  6.7 Related Work
  6.8 Conclusion

7 Conclusion

A Theory

Bibliography

List of Figures

4.1 Example Regions and Comparison of Submodular Minimization Running Times
4.2 Segmentation Experimental Results

5.1 Synthetic Problem
5.2 Parallelization Speedup
5.3 FISTA convergence
5.4 Barzilai–Borwein Convergence

6.1 Graph Reconstruction and Approximate Optimization Results
6.2 Empirical Study of Submodular Constraints

List of Algorithms

4.1 SLG: Smoothed Lovász Gradient
4.2 Set Generation by Rounding the Continuous Solution
4.3 Set Optimality Check
4.4 Gradient for General Concave Functions

5.1 Projection onto a Box Intersecting a Plane
5.2 Piecewise Linear Monotonic Root: Sorted Version
5.3 Piecewise Linear Monotonic Root: Unsorted Version

Chapter 1

Introduction

There is no doubt that convexity has proven to be an invaluable tool throughout all of the applied sciences and engineering. Consider though: the formal definition of convexity is a completely abstract concept, yet somehow it has proven to be key in the development of numerical algorithms for countless real-world applications. Given the tremendous track record of such a powerful abstract idea, the mandate of the applied community must then be to ask: “Can convexity be generalized? Can we discover similar abstract concepts that hold the key for solving new, important problems?” And while one search direction of this quest is to look into the realm of continuous functions for quasiconvex or invex functions as generalizations, the other is to look for discrete analogues of convexity. Indeed, many different such discrete generalizations have been discovered

[Mur03], yet none of them could accurately be described as a perfect mirror image of convexity. But we would claim that there is one such concept from discrete optimization that in recent years has proven to be the most similar to convexity, not only in terms of its salient abstract features, but also its empirical problem solving utility: submodularity. Submodularity is a property of set functions, meaning functions of some subset of a finite set of objects. It has numerous equivalent definitions, but perhaps the easiest to parse is the following inequality. It says that the change in value of the set function when adding a particular element must be smaller when adding it to a larger set:

f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B),  for all elements e and sets A, B such that e ∉ B ⊇ A.
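To make the inequality concrete, here is a small brute-force check of the diminishing-returns property for a coverage function, a standard example of a submodular set function. (This illustration is ours, not part of the thesis; the data is made up.)

```python
from itertools import combinations

# Coverage function: f(A) = size of the union of the sets indexed by A.
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd', 'e'}}
ground = set(cover)

def f(A):
    return len(set().union(*(cover[e] for e in A)))

def is_submodular(f, ground):
    """Brute-force test of f(A+e) - f(A) >= f(B+e) - f(B) for e not in B ⊇ A."""
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(sorted(ground), r)]
    return all(f(A | {e}) - f(A) >= f(B | {e}) - f(B)
               for B in subsets for A in subsets if A <= B
               for e in ground - B)

print(is_submodular(f, ground))
```

For contrast, a supermodular function such as A ↦ |A|² fails the same test.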

(By “larger” we mean a superset, not just any set of greater cardinality.)

One amazing property of submodularity is that there are powerful algorithms for both maximization and minimization with provable guarantees. While both are important, we focus on the latter, as it is in this domain that the connections with convexity are most pronounced. The exact minimum of a submodular function can be found in strongly polynomial time

[IFF01]. This is similar to the fact that convex functions are hard to maximize, but easy to minimize (assuming the convex sets which characterize the problem are not too esoteric). This similarity is not just a coincidence; the connection between submodularity and convexity goes beyond mere analogy. In fact, submodular functions give rise to a convex interpolant (the Lovász extension) that can be used to minimize the submodular function itself. That is, submodular minimization can be cast as a special type of convex minimization. From a dual perspective, every submodular function corresponds to a convex set, called the base polytope, with a unique and subtle property. The base polytope is defined by exponentially many inequalities, and yet despite that, one can find the set of maximizing vertices with respect to any linear function without having to compute any inner products. The resulting exposed face of the base polytope depends only on the order of the components of the linear function. Clearly, the connections between submodularity and convexity run deep, and have been known for some time, dating back to at least 1981 with the work of Grötschel, Lovász, and

Schrijver [GLS81, Lov83]. Base polytopes (the union of the faces of a polymatroid) were discovered by Edmonds even earlier [Edm70]. Our main goal in this thesis is to explore these connections, but with insight gained from recent advances in convex analysis and sparse reconstruction.

The specific problem that we address, for most of the thesis, is submodular minimization. Despite the known existence of polynomial-time submodular minimization algorithms, the best known exact techniques require a number of function evaluations on the order of n⁵ [IO09], where n is the number of variables in the problem. This renders these algorithms impractical for many real-world problems. However, there are various special cases of submodular functions which admit extremely efficient minimization algorithms. One of our objectives in this work has been to bridge the gap between these two extremes of submodular functions. As such, much of our work presented in this thesis has been aimed at developing minimization algorithms for submodular functions which have enough structure to be amenable to optimization, but are not so restrictive that they have little modeling power. That is, we explore the trade-off between specificity and generality of algorithms and submodular function classes, not unlike the trade-off of generalization in statistics and learning.

In this way, the last major part of the thesis is related to what precedes it. Therein we examine a learning problem, that of learning set functions. We consider submodular functions in particular as a class of objects to learn, but what sets this work apart from classical research on learning set functions is that we use the tools of convex analysis and sparse recovery.

1.1 Main Contributions

In Chapter 4, we develop a novel method for minimizing a particular class of submodular functions, which can be expressed as a sum of concave functions composed with modular functions. The basic algorithm uses an accelerated first order method applied to a smoothed version of the Lovász extension. The smoothing algorithm is particularly novel as it allows us to treat general concave potentials without needing to construct a piecewise linear approximation as with graph-based techniques. This is a fully expanded version of work presented previously [SK10], which did not originally contain the fast method of smoothing.

In Chapter 5, our main technical contribution is elucidating the conditions under which it is possible to find a minimizer of a submodular function via a convex problem. In general one minimizes the Lovász extension together with a separable barrier function, and our Theorem 5.4 gives the weakest conditions yet presented in the literature to guarantee that the convex problem gives a minimizer of the submodular problem; we demonstrate why these conditions are necessary, given some mild assumptions. This provides a general framework for developing submodular minimization algorithms. The framework is then used to develop several algorithms that can be run in a distributed fashion. This is particularly useful for applications where the submodular objective function consists of a sum of many terms, each term dependent on a small part of a large data set.

In Chapter 6 we approach the problem of learning set functions from an unorthodox perspective: sparse reconstruction. We demonstrate an explicit connection between the problem of learning set functions from random evaluations and that of recovering sparse signals. Based on the observation that the Fourier transform for set functions satisfies exactly the conditions needed for sparse reconstruction algorithms to work, we examine some different function classes under which uniform reconstruction is possible.
In particular, given the values of the cut function of a graph, we show the graph can be reconstructed. Furthermore, we show how the assumption of submodularity can be encoded as a property of a third order set function, and demonstrate its utility in the reconstruction of a set function.

1.2 Outline of Thesis

• In Chapter 2 we review some concepts from convex analysis and introduce the notation relevant to convex functions and sets.

• Chapter 3 is dedicated to the background material on submodular set functions. Section 3.2 is devoted to the theory of general set functions, whereas in Section 3.3, we review properties of submodular set functions, deriving several of the main relations to convexity. In Section 3.4 we discuss submodular minimization.

• In Chapter 4 we describe our Smoothed Lovász Gradient algorithm for the minimization of decomposable submodular functions. Section 4.4 is dedicated to the classification and representation of decomposable functions. This was originally presented in [SK10].

• In Chapter 5, we develop consensus algorithms for distributed submodular minimization. In Section 5.2, we detail the relationship between convex optimization problems and the minimizers of a submodular function.

• In Chapter 6, we apply the tools of convex analysis and sparse approximation to the problem of learning set functions. This was originally presented in [SK12].

Chapter 2

Background of Convex Analysis

2.1 Basic Concepts

Throughout this thesis, we will use the convention of convex analysis [Roc70, HUL93, AT03] that functions are defined everywhere in ℝⁿ, but can take the value +∞. When we refer to the domain of a function, we mean the set of points where it is finite valued:

dom(f) := { x ∈ ℝⁿ | f(x) < +∞ }

We define the indicator function of a set C to be the function that has that set as its domain, and equals zero there:

δ_C(x) := { 0 if x ∈ C;  +∞ if x ∉ C }    (2.1.1)

This means that there is no loss of generality in treating constrained optimization problems as unconstrained and vice-versa. When we discuss solving min_{x∈C} f(x), where C is some subset of ℝⁿ, it is equivalent to solving the unconstrained problem min_x f(x) + δ_C(x). (Clearly we can treat an unconstrained problem as a constrained one with empty constraints.)

Convex sets are those which contain every line segment connecting any pair of points in the set. The convex hull of a set is defined as the intersection of all convex sets containing it:

conv S := ⋂_{C ⊇ S, C convex} C

We can define convex functions in terms of convex sets. The epigraph of a function f on ℝⁿ is the set of points in ℝⁿ⁺¹ that lie above its graph:

epi f = { (x, t) ∈ ℝⁿ⁺¹ | x ∈ dom f, f(x) ≤ t }

Convex functions are defined as those with a convex epigraph. Convex functions are not necessarily differentiable, but instead have subgradients. For convex f, we define the subdifferential to be the set-valued map ∂f : ℝⁿ → 2^{ℝⁿ} which is the set of subgradients: linear functionals which bound the function from below.

∂f(x) := { λ ∈ ℝⁿ | f(y) ≥ f(x) + ⟨y − x, λ⟩ for all y ∈ ℝⁿ }

Since it is an intersection of half-spaces, the subdifferential is always closed and convex, but it might be empty, even if the function is convex and finite-valued at that point. At any point where a convex function is differentiable, the gradient at that point is the unique subgradient: ∂f(x) = {∇f(x)}. Subdifferentials are linear and positive homogeneous, where addition is in the sense of a Minkowski sum:

∂(f₁ + αf₂) = ∂f₁ + α ∂f₂,  α ≥ 0

A key tool in analyzing the dual of a convex program is the convex conjugate, also known as the Legendre–Fenchel transform of a function. It is defined as:

f*(λ) := sup_{x ∈ ℝⁿ} ⟨x, λ⟩ − f(x)

This is always convex even if f is not. It is immediate from the definition that:

f(x) + f*(λ) ≥ ⟨x, λ⟩  for all x, λ ∈ ℝⁿ    (2.1.2)

Furthermore, note that if λ is a subgradient of f at x, then ⟨x, λ⟩ − f(x) ≥ ⟨y, λ⟩ − f(y) for all y ∈ ℝⁿ. So Equation 2.1.2 holds with equality if and only if λ ∈ ∂f(x). We denote the set of convex, proper, lower semi-continuous functions as Conv. These functions obey the useful property that they are equal to their biconjugate.

f ∈ Conv ⇔ f** = f    (2.1.3)

So, for f ∈ Conv, we can make a stronger statement about when Equation 2.1.2 holds with equality:

f(x) + f*(λ) = ⟨x, λ⟩ ⇔ x ∈ ∂f*(λ) ⇔ λ ∈ ∂f(x)

One interpretation of Equation 2.1.3 is that all f ∈ Conv admit the following sort of self-description:

f(x) = sup_{y,λ} f(y) + ⟨x − y, λ⟩  s.t. λ ∈ ∂f(y)

In Section 3.3, we show that submodular functions enjoy an analogous property. The fundamental result for expressing the conjugate of a sum of functions is Fenchel's theorem.

Theorem 2.1 (Fenchel Duality). Suppose the functions fᵢ ∈ Conv satisfy ⋂ᵢ relint dom fᵢ ≠ ∅. Then

min_{x ∈ ℝⁿ} ∑ᵢ fᵢ(x) = max_{λᵢ ∈ ℝⁿ} −∑ᵢ fᵢ*(λᵢ)  s.t. ∑ᵢ λᵢ = 0    (2.1.4)

Furthermore, the arguments (x*, λᵢ*) are optimal for the above problems if and only if:

λᵢ* ∈ ∂fᵢ(x*),  x* ∈ ∂fᵢ*(λᵢ*).    (2.1.5)

Relation with Lagrangian Duality. We reformulate as a constrained problem, form the Lagrangian of the constrained problem, and then minimize the Lagrangian to obtain a bound valid for all λᵢ ∈ ℝⁿ:

min_{x ∈ ℝⁿ} ∑ᵢ fᵢ(x) ≥ min_{x, xᵢ ∈ ℝⁿ} ∑ᵢ fᵢ(xᵢ) + ⟨x − xᵢ, λᵢ⟩ = { −∑ᵢ fᵢ*(λᵢ) if ∑ᵢ λᵢ = 0;  −∞ if ∑ᵢ λᵢ ≠ 0 }

Note that by assumption, the dual problem is feasible, so the bound involves finite numbers. Likewise, for all x ∈ ℝⁿ, we have:

max_{∑λᵢ=0} −∑ᵢ fᵢ*(λᵢ) ≤ ∑ᵢ max_{λᵢ ∈ ℝⁿ} ⟨x, λᵢ⟩ − fᵢ*(λᵢ) = ∑ᵢ fᵢ**(x)

Since fᵢ** = fᵢ, this means that the theorem gives conditions under which strong duality holds, and a mechanical formula for deriving the dual program.

Another way to state Fenchel's theorem is through infimal convolutions. We denote the infimal convolution of functions with the symbol □ and define it by the following formula:

(f □ g)(x) := inf_{y ∈ ℝⁿ} f(x − y) + g(y)

If f and g satisfy conditions sufficient for Fenchel's duality theorem to hold, then infimal convolution is essentially the operation which is dual to addition under the Legendre–Fenchel transform: (f □ g)* = f* + g* and (f + g)* = f* □ g*.

2.2 Proximal Operators

A key step in many convex minimization algorithms is solving for the infimal convolution of an objective function with a quadratic function at the current iterate. For any function f ∈ Conv, we define its proximal operator or ‘prox’ as:

prox_f(x) := arg min_{y ∈ ℝⁿ} f(y) + ½‖x − y‖²    (2.2.1)

For a thorough explanation of the proximal operator, see [CP11]. We review a few important ideas and identities. By strong convexity of the quadratic term, the minimum is a unique point, so the prox is well-defined. (In its most general form, one can use any Bregman distance [Brè67] in place of the quadratic, but the present definition is sufficient for our purposes.) The point p = prox_f(x) is uniquely determined by the optimality conditions given by subgradients:

p − x ∈ −∂f(p)    (2.2.2)

One way to interpret this equation is that the proximal operator performs an implicit gradient step. That is, a basic gradient descent method for convex minimization might use an explicit update rule such as x^{k+1} = x^k − ε∇f(x^k), where f is a convex differentiable objective function, x^k is the iterate at step k, and ε is some step size. Suppose instead we use an implicit update rule: x^{k+1} = x^k − ε∇f(x^{k+1}). This update rule is actually a proximal operator: by Equation 2.2.2, it is equivalent to x^{k+1} = prox_{εf}(x^k). So in this sense, the prox is a form of implicit numerical integration of the dynamical system ẋ = −∇f, but it can be applied to nondifferentiable convex functions.

From another point of view, the proximal operator is a generalization of projection onto a convex set. When the function f in Equation 2.2.1 is the indicator function of a closed convex set (as defined in Equation 2.1.1), then the prox is exactly the projection operator, which we denote with the letter Π.
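The implicit-step interpretation above can be checked numerically for a quadratic f(x) = ½xᵀAx, where the implicit update solves in closed form to x⁺ = (I + εA)⁻¹x (a sketch of ours; the matrix and step size are arbitrary):

```python
import numpy as np

# For f(x) = 0.5 * x^T A x (A symmetric PSD), the implicit update
#   x_next = x - eps * grad f(x_next) = x - eps * A @ x_next
# solves to x_next = (I + eps*A)^{-1} x, i.e. prox_{eps*f}(x).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T                       # make A symmetric positive semidefinite
x = rng.standard_normal(4)
eps = 0.1

x_next = np.linalg.solve(np.eye(4) + eps * A, x)

# Prox optimality condition (2.2.2): x_next - x = -eps * A @ x_next
assert np.allclose(x_next - x, -eps * (A @ x_next))

# x_next should minimize eps*f(y) + 0.5*||x - y||^2: test random perturbations.
obj = lambda y: eps * 0.5 * y @ A @ y + 0.5 * np.sum((x - y) ** 2)
for _ in range(100):
    assert obj(x_next) <= obj(x_next + 0.01 * rng.standard_normal(4)) + 1e-12
print("implicit gradient step equals the proximal step for a quadratic")
```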

Π_C(x) := arg min_{y ∈ C} ‖x − y‖

That is, prox_{δ_C}(x) = Π_C(x). If we apply Fenchel's theorem to the proximal problem, we get an important identity relating the prox of a function with the prox of its conjugate. By Equation 2.1.4, we have for all f ∈ Conv:

min_{y ∈ ℝⁿ} ½‖x − y‖² + f(y) = max_{λ ∈ ℝⁿ} −½‖λ‖² − ⟨x, λ⟩ − f*(−λ)

Furthermore, by Equation 2.1.5, the optimal arguments satisfy λ* = y* − x, y* ∈ ∂f*(−λ*), and −λ* ∈ ∂f(y*), so by Equation 2.2.2, we have y* = prox_f(x) and −λ* = prox_{f*}(x). We conclude that:

prox_f(x) + prox_{f*}(x) = x    (2.2.3)
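A quick numeric check of this identity (our own example): for f the ℓ₁ norm, prox_f is soft thresholding, and f* is the indicator of the ℓ∞ unit ball, so prox_{f*} is projection onto that ball.

```python
import numpy as np

# f = l1 norm: prox_f is soft thresholding; f* = indicator of the
# l-infinity unit ball, so prox_{f*} is projection onto that ball.
def soft_threshold(x):
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def project_linf_ball(x):
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(1)
x = 3.0 * rng.standard_normal(6)

# Equation (2.2.3): prox_f(x) + prox_{f*}(x) = x
assert np.allclose(soft_threshold(x) + project_linf_ball(x), x)
print("Moreau decomposition verified for the l1 norm")
```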

The practical implication of this, from a computational standpoint, is that computing the prox of a function is of the same complexity as computing the prox of its conjugate function. In particular, consider convex indicator functions and their conjugates, which we denote with the letter σ. These are called the support functions of a set:

σ_C(x) := sup_{λ ∈ C} ⟨x, λ⟩

We assume C is a closed convex set, so δ_C** = σ_C* = δ_C. By Equation 2.2.3, we can compute the prox for σ_C by projecting onto C:

prox_{σ_C}(x) = x − Π_C(x)

It is clear by definition that support functions are positive homogeneous: βσ_C(x) = σ_C(βx) for β > 0. This means that the proximal operators of σ_C obey a scaling property that other convex functions do not: prox_{βσ_C}(x) = β prox_{σ_C}(x/β) = x − βΠ_C(x/β).

An important special case of a support function is when C is symmetric about the origin and has nonempty interior; if this is true, then σ_C is a norm and C is the unit ball of the corresponding dual norm. For example, in the context of sparse approximation, one often minimizes the ℓ₁ norm to promote sparsity of a vector. The proximal step thus involves subtracting off a projection onto the ℓ∞ ball, which is equivalent to soft thresholding the components of a vector.

Another interesting case is when the support function of a convex set is itself an indicator function of another convex set. It is not hard to see that this is true if and only if the sets are cones. A set K is a cone if it contains all positive multiples of itself: βK ⊆ K for any β > 0. The corresponding polar cone K° is defined as the set of the dual vectors which have nonpositive inner product for all points in the cone: K° := { λ ∈ ℝⁿ | ⟨x, λ⟩ ≤ 0 for all x ∈ K }. If K is also closed and convex, then δ_K ∈ Conv, and we have δ_K* = σ_K = δ_{K°}. So by Equation 2.2.3, we can rederive a basic result in conic analysis: for any closed convex cone, every point in ℝⁿ is uniquely decomposed as a sum of a point in the cone and a point in the polar cone: x = Π_K(x) + Π_{K°}(x).
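The cone decomposition can be checked numerically; taking K to be the nonnegative orthant (whose polar is the nonpositive orthant), both projections are componentwise clipping (an illustration of ours):

```python
import numpy as np

# K = nonnegative orthant, a closed convex cone; its polar K° is the
# nonpositive orthant.  Both projections are componentwise clipping.
rng = np.random.default_rng(2)
x = rng.standard_normal(8)

p_K = np.maximum(x, 0.0)        # projection onto K
p_polar = np.minimum(x, 0.0)    # projection onto K°

assert np.allclose(p_K + p_polar, x)   # x = Pi_K(x) + Pi_{K°}(x)
assert abs(p_K @ p_polar) < 1e-12      # the two pieces are orthogonal
print("polar cone decomposition verified")
```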

Prox of a Sum. In general, there is no way to combine and simplify expressions for proximal operators in closed form. However, we can get an implicit characterization of the proximal operator of a sum of functions.

Proposition 2.2. Suppose g is a positive combination of convex functions. Specifically, let g := ∑ᵢ ωᵢfᵢ where ∑ᵢ ωᵢ = Ω, ωᵢ > 0, and fᵢ ∈ Conv. Then x = prox_g(y) if and only if there are dual vectors λᵢ such that:

∑ᵢ ωᵢλᵢ = 0    (2.2.4)

x = prox_{Ωfᵢ}(y + λᵢ)  for all i    (2.2.5)

Proof. To show the forward direction, note that if x = prox_g(y), optimality implies:

0 ∈ x − y + ∂g(x) = (1/Ω) ∑ᵢ ωᵢ (x − y + Ω ∂fᵢ(x))

By linearity of subgradients, there must exist λᵢ satisfying Equation 2.2.4 such that

λᵢ ∈ x − y + Ω ∂fᵢ(x)  for all i

By Equation 2.2.2, this is exactly the optimality condition needed to imply x = prox_{Ωfᵢ}(y + λᵢ).

To show the backward direction, note that if x and λᵢ satisfy Equation 2.2.4 and Equation 2.2.5, then

x = arg min_{z ∈ ℝⁿ} (1/2Ω)‖z − (y + λᵢ)‖² + fᵢ(z)  for all i    (2.2.6)
  = arg min_{z ∈ ℝⁿ} ∑ᵢ ωᵢ [ (1/2Ω)‖z − (y + λᵢ)‖² + fᵢ(z) ]
  = arg min_{z ∈ ℝⁿ} ½‖z − y‖² + g(z) = prox_g(y)

Hence, x minimizes each individual term of the sum on the right hand side of Equation 2.2.6, so x must minimize the sum of those terms. Within the sum, the dual variables λᵢ cancel each other out.

Chapter 3

Set Functions and Submodularity

3.1 Overview

In Section 3.2, we review some basic concepts from the study of general set functions (not necessarily submodular). We also introduce notation related to set functions that we use throughout the thesis. In subsection 3.2.1, we define the set function derivative in a way that emphasizes the symmetric shift operator. In subsection 3.2.2, we define and discuss monotonic functions of general order; this is the natural generalization of submodularity to higher order differences. In subsection 3.2.3, we define the Fourier transform for set functions, a key tool in the learning theory of set functions. In subsection 3.2.4, we define other linear transforms of variables and show the common connection between them in terms of tensor products. Finally in Section 3.3, we focus on submodularity in detail. As this material may not be nearly as well-known outside of specialists in combinatorial optimization, we attempt to be more thorough by proving some of the key relationships between submodularity and convexity. In Section 3.4, we specifically review the problem of submodular minimization, which is the primary subject of Chapter 4 and Chapter 5. We derive an equivalent of a duality gap, and then review some of the existing algorithms.

3.2 General Set Functions

While convex analysis provides much of our notation and terminology, our main object of study in this work is functions over the Boolean cube ℤ₂ⁿ = {0, 1}ⁿ. The term Boolean function refers to functions on the cube which take on Boolean values; real-valued functions of Boolean vectors are a generalization called pseudoboolean functions. However, since the power set of a finite set is equivalent to the Boolean cube, in other technical literature the term set function is used. We will use the terminology and notation of sets and set functions, which is more common when the functions are submodular. However, it will be useful to use double brackets [[·]] to denote the Boolean value of some statement:

[[statement]] := { 1 if the statement is true;  0 if the statement is false }

Throughout this thesis, we work in some context where there is some finite ground set E of cardinality n. We let H = ℝ^{2^E} be the space of real-valued functions on subsets of E. That is, if f ∈ H then it is a function f : 2^E → ℝ, where 2^E is the power set of E. We use upper-case letters for subsets of E, and lowercase letters for elements of E. Also, we drop brackets for small sets: denoting b or bc rather than {b} or {b, c} when the context is clear. We occasionally use + to mean union, as in A + b + c = A ∪ {b} ∪ {c}, but only when the sets are disjoint. When we say that a collection of sets is disjoint, we mean they are pairwise disjoint.

We treat the elements of E as the indices of vectors in ℝⁿ. Then for any A ∈ 2^E, we define 1_A ∈ ℝⁿ to be the indicator vector of that set:

1_A[e] := [[e ∈ A]] = { 1 if e ∈ A;  0 if e ∈ E \ A }

For example, {1_e}_{e∈E} is the set of standard unit vectors. Clearly 2^E is isomorphic to the commutative group ℤ₂ⁿ under the mapping of indicator vectors: 1_{A △ B} ≡ 1_A + 1_B mod 2. That is, addition over the group 2^E is the symmetric set difference (△), defined by:

A △ B := (A \ B) ∪ (B \ A)

There are some straightforward consequences of this isomorphism. For example, since ∅ acts as the identity element, and every element of the group is of order two, A △ B = ∅ if and only

if A = B. Furthermore, the group of characters of ℤ₂ⁿ is used to define the Fourier transform for set functions, as shown in subsection 3.2.3.

3.2.1 Set Function Derivatives

We now define linear operators on set functions in a way that parallels conventions in, for example, signal processing; this is a slightly different approach than that usually seen in the literature on set functions. We define the (symmetric) shift operator S_B : H → H as the operator which applies a symmetric difference to the argument of a set function:

S_B f(A) := f(A △ B)

If we shift with respect to a singleton we write S_b = S_{{b}}. Clearly these operators are linear and are isomorphic to the group ℤ₂ⁿ, meaning that S_B S_C = S_{B △ C}. Hence, the operators

commute since ℤ₂ⁿ is commutative. We define the discrete derivative with respect to a single element as the difference between the shift operator of the element and the identity:

∆_b := S_b − 1

Note that the derivative operator has the same function space H as its domain and range. A common definition of the discrete derivative uses union and intersection rather than symmetric difference; the advantage of using the symmetric difference is that it is diagonalized by the Fourier transform (i.e., it is a convolution), while the operator defined in terms of unions is not. In any case, the interpretation of the derivative is straightforward; if we evaluate ∆_b f on sets that do not contain the element b, we get the change in value of the function due to adding the element:

∆_b f(A) = f(A + b) − f(A)

So if f is interpreted as a valuation function, then ∆_b f(A) is the marginal value of item b relative to the set A. If we evaluate on a set that already contains b, then the derivative is the change in value from removing the element, which is −1 times the marginal value.

We define the derivative operator with respect to a set B as the product of derivative operators with respect to each element in the set:

∆_B := ∏_{b∈B} ∆_b = ∏_{b∈B} (S_b − 1)    (3.2.1)

Because the shift operators commute, this equation does not depend on the ordering of the product, so the above expression for the derivative is well-defined. As is the standard convention for empty products, we define ∆_∅ to be the identity. It is obvious from our definition that derivatives with respect to disjoint sets combine to form the derivative with respect to their union. That is, assuming B, C disjoint,¹ we have:

∆_B ∆_C = ∏_{b∈B} ∆_b ∏_{c∈C} ∆_c = ∏_{e∈B∪C} ∆_e = ∆_{B∪C}

If we expand out the terms in the product in Equation 3.2.1, we can get an equivalent definition of the derivative as a sum of shift operators:

∆_B = ∑_{C ∈ 2^E} [[C ⊆ B]] (−1)^{|B|+|C|} S_C
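The equivalence between the product form (3.2.1) and this sum-of-shifts expansion can be checked by brute force on a random set function (our own sketch; the ground set and values are made up):

```python
from itertools import combinations
import random

# A random set function on ground set E = {0, 1, 2, 3}.
E = [0, 1, 2, 3]
random.seed(0)
f = {frozenset(s): random.random()
     for r in range(len(E) + 1) for s in combinations(E, r)}

def delta(B, A):
    """Delta_B f(A) from the product form: peel off one (S_b - 1) factor."""
    if not B:
        return f[frozenset(A)]
    b = min(B)
    rest = frozenset(B) - {b}
    return delta(rest, frozenset(A) ^ {b}) - delta(rest, A)

def delta_sum(B, A):
    """Delta_B f(A) from the expansion: sum over C ⊆ B of (-1)^(|B|+|C|) S_C."""
    return sum((-1) ** (len(B) + len(C)) * f[frozenset(A) ^ frozenset(C)]
               for r in range(len(B) + 1) for C in combinations(sorted(B), r))

for A in [frozenset(), frozenset([0]), frozenset([2, 3])]:
    for B in [frozenset([1]), frozenset([0, 1]), frozenset([1, 2, 3])]:
        assert abs(delta(B, A) - delta_sum(B, A)) < 1e-12
print("product and sum forms of the derivative agree")
```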

For example, the derivative with respect to a pair of elements ∆_{bc} = ∆_{{b,c}} equals:

∆_{bc} f(A) = f(A △ bc) − f(A △ b) − f(A △ c) + f(A)

This is also called the second order difference operator. Note that there is also a product rule for discrete derivatives:

Lemma 3.1 (Product Rule for Set Function Derivatives).

∆_B(fg) = ∑_{C ∈ 2^E} [[C ⊆ B]] (∆_C f)(S_C ∆_{B\C} g)    (3.2.2)

Proof. When |B| = 1, this is true by the following identity:

∆_b(fg) = (S_b f − f) S_b g + f (S_b g − g) = (∆_b f)(S_b ∆_∅ g) + (∆_∅ f)(S_∅ ∆_b g)

¹Since (∆_b)² = −2∆_b, the general formula is ∆_B ∆_C = (−2)^{|B∩C|} ∆_{B∪C}.

If |B| > 1, iterating over all elements b ∈ B results in Equation 3.2.2.

3.2.2 Monotone and Low Order Functions

With our definition of derivative, we can define the cones of order q monotone functions, which we denote H_q^+ (resp. H_q^−): the subset of functions with all order q derivatives nonnegative (resp. nonpositive). With a slight overload of notation, we will use the symbol ± rather than state all equations for H_q^+ and H_q^− separately.

H_q^± := { f ∈ H | ±∆_B f(A) ≥ 0 for all A, B ∈ 2^E with A ∩ B = ∅, |B| = q }

The first few of these cones have more common names, and simple alternate characterizations:

• $H_0^\pm$ nonnegative/nonpositive: $\pm f(A) \geq 0$

• $H_1^\pm$ nondecreasing/nonincreasing: $\pm(f(A \cup B) - f(A)) \geq 0$

• $H_2^\pm$ supermodular/submodular: $\pm(f(A \cup B \cup C) - f(A \cup B) - f(A \cup C) + f(A)) \geq 0$

Note that the characterizations we list involve adding and removing entire sets rather than just single elements. This fact generalizes nicely to higher-order monotone functions. Informally, the following proposition states that for a $q$-monotone function, we can replace the singleton sets that occur in the definition of the discrete derivative (Equation 3.2.1) with general disjoint sets, and still get a valid inequality.

Proposition 3.2. The function $f$ is in $H_q^\pm$ if and only if for all collections of $q+1$ disjoint sets $A, B_1, \ldots, B_q$, we have:

$$\pm \prod_{i=1}^{q} \left( S_{B_i} - 1 \right) f(A) \geq 0 \tag{3.2.3}$$

Proof. Clearly if $C$ is an arbitrary subset of size $q$, disjoint from $A$, we can choose each of the sets $B_i$ to be a singleton such that $C = B_1 + \ldots + B_q$, which implies the operator product in Equation 3.2.3 is the set derivative with respect to $C$. Hence $f \in H_q^\pm$.

Conversely, if $f \in H_q^\pm$, we wish to prove that Equation 3.2.3 holds for arbitrary choices of disjoint sets. Fix the disjoint sets, and for each set $B_i$ choose an ordering of its elements, resulting in a chain of strict subsets. That is, let $B(i, 0) = \emptyset$ and $B(i, j-1) + c(i,j) := B(i,j) \subseteq B_i$ for $j = 1 \ldots |B_i|$. Then each term in the product can be expressed as a telescoping sum:

$$S_{B_i} - 1 = \sum_{j=1}^{|B_i|} \left( S_{B(i,j)} - S_{B(i,j-1)} \right) = \sum_{j=1}^{|B_i|} S_{B(i,j-1)} \Delta_{c(i,j)}$$

By substituting this equivalence into each term of the product in Equation 3.2.3, and then expanding the sum out to $|B_1| \cdots |B_q|$ terms, we get:

$$\pm \prod_{i=1}^{q} \left( S_{B_i} - 1 \right) f(A) = \pm \prod_{i=1}^{q} \sum_{j=1}^{|B_i|} S_{B(i,j-1)} \Delta_{c(i,j)} f(A) = \sum_{j_1, \ldots, j_q} \pm\, S_{B'(j_1,\ldots,j_q)} \Delta_{C'(j_1,\ldots,j_q)} f(A),$$
$$B'(j_1, \ldots, j_q) := \bigcup_{i=1}^{q} B(i, j_i - 1), \qquad C'(j_1, \ldots, j_q) := \{ c(i, j_i) \}_{i=1}^{q}.$$

Note that for each term in the series, the argument $A$, the shifting sets $B'$, and the derivative sets $C'$ are disjoint. Also, each set $C'$ is of size $q$. Therefore, the expression from Equation 3.2.3 is equivalent to a sum of order-$q$ derivatives. Since $f \in H_q^\pm$ implies that order-$q$ derivatives are uniformly nonnegative (resp. nonpositive), the entire sum is nonnegative (resp. nonpositive). □

This result is included in [FH05], but given as a symmetric statement over collections of $q$ general sets, not necessarily disjoint. Given general sets $C_1, \ldots, C_q$, this defines $q+1$ disjoint sets as follows: $A = \bigcap_{j=1}^q C_j$, $B_i = \bigcap_{j \neq i} C_j \setminus C_i$. Then by applying the above Proposition to the disjoint sets, we get the result in the form stated in [FH05]:

$$\pm \left( (-1)^q f\Big(\bigcap_{j=1}^q C_j\Big) + \sum_{\substack{J \subseteq \{1,\ldots,q\} \\ J \neq \emptyset}} (-1)^{|J|} f\Big(\bigcup_{i \in J} \bigcap_{j \neq i} C_j\Big) \right) \geq 0.$$

Products of Monotonic Functions. Due to the product rule of Equation 3.2.2, we can get simple rules for classifying products of functions if they obey certain patterns of monotonicities.

To express the following lemma, we will need variables that are signs: $\{+, -\}$. For these purposes they are equivalent to the unit numbers $\{+1, -1\}$.

Lemma 3.3. Suppose $f, g \in H$ are monotonic for every order up to order $q$:

$$f \in H_0^{s_0} \cap \ldots \cap H_q^{s_q}, \qquad g \in H_0^{t_0} \cap \ldots \cap H_q^{t_q}.$$

If the signs of the monotonicities $s_k, t_k \in \{+, -\}$ satisfy $\gamma = s_0 t_q = s_1 t_{q-1} = \ldots = s_q t_0$, then $fg \in H_q^\gamma$.

Proof. Let $B$ be any set of size $q$. We use the product rule (Equation 3.2.2) to take the derivative of $fg$ with respect to $B$. For simplicity we do not write the argument $A$, but assume it is disjoint from $B$. Within the resulting sum, we use the factorization $\gamma = s_k t_{q-k}$:

$$\gamma \Delta_B(fg) = \sum_{C \in 2^E} [[C \subseteq B]]\, \gamma\, (\Delta_C f)(S_C \Delta_{B \setminus C}\, g) = \sum_{k=0}^{q} \sum_{C \in 2^E} [[C \subseteq B,\ |C| = k]]\, (s_k \Delta_C f)(t_{q-k} S_C \Delta_{B \setminus C}\, g)$$

For $|C| = k$, we have $s_k \Delta_C f \geq 0$ since $f \in H_k^{s_k}$, and $t_{q-k} S_C \Delta_{B \setminus C}\, g \geq 0$ since $g \in H_{q-k}^{t_{q-k}}$ and $|B \setminus C| = q - k$. So each term in the sum is a product of two nonnegative factors. Thus $\gamma \Delta_B(fg) \geq 0$ for arbitrary $B$ of size $q$, so indeed $fg \in H_q^\gamma$. □

For example, if $f$ is nonnegative, nondecreasing, submodular ($H_0^+ \cap H_1^+ \cap H_2^-$), and $g$ is nonnegative, nonincreasing, submodular ($H_0^+ \cap H_1^- \cap H_2^-$), then the product $fg$ is nonnegative submodular ($H_0^+ \cap H_2^-$), but neither increasing nor decreasing in general. Another example of Lemma 3.3 is that the cone of nonnegative, nondecreasing, supermodular functions

($H_0^+ \cap H_1^+ \cap H_2^+$) is closed under multiplication.

We define $q$-th order functions as functions with all derivatives beyond order $q$ equal to zero:

$$H_q := \{ f \in H \mid \Delta_B f(A) = 0 \text{ for all } A, B \in 2^E \text{ with } |B| > q \}$$

It is easy to see from the definition that $H_q = H_{q+1}^+ \cap H_{q+1}^-$ and $H_q \subset H_{q+1}$. The set of zeroth-order functions are the constant functions. In the context of set functions, first-order functions are known as modular, and admit the description $f(A) = f(\emptyset) + \sum_{a \in A} (f(a) - f(\emptyset))$.

3.2.3 Fourier Analysis of Set Functions

In this section, we briefly introduce the Fourier transform for set functions; it is primarily expanded upon in Chapter 6. The characters of the group $2^E$ can be written as $\psi_B(A) := (-1)^{|A \cap B|}$, since $\psi_B(A_1)\psi_B(A_2) = (-1)^{|A_1 \cap B|}(-1)^{|A_2 \cap B|} = (-1)^{|(A_1 \triangle A_2) \cap B|} = \psi_B(A_1 \triangle A_2)$. These oscillate between $1$ and $-1$ as elements of the particular set $B$ are added to or removed from the argument. Thus, we define the Fourier transform of a set function by taking the appropriately normalized inner product with such functions:

$$\hat{f}(B) := \frac{1}{2^{|E|}} \sum_{A \in 2^E} f(A)\, \psi_B(A)$$

This is an orthogonal transform, so though the inverse formula has a different normalization constant, it is otherwise the same: $f(A) = \sum_{B \in 2^E} \hat{f}(B) \psi_B(A)$. One way to interpret the Fourier coefficient for a set is that it is the average of all the derivatives with respect to that set (modulo a factor of $-1$ for odd sets).

$$\hat{f}(B) = \frac{(-1)^{|B|}}{2^{|E \setminus B|}} \sum_{A \in 2^E} [[A \subseteq E \setminus B]]\, \Delta_B f(A) \tag{3.2.4}$$

Next, we define convolutions of set functions:

$$f * g\,(A) := \sum_{B \in 2^E} f(A \triangle B)\, g(B)$$

This satisfies the standard properties of a convolution: commutativity $f * g = g * f$, associativity $(f * g) * h = f * (g * h)$, and linearity $(\alpha f + \beta g) * h = \alpha(f * h) + \beta(g * h)$. Also, convolution in the time domain is multiplication in the Fourier domain, and vice versa.

$$\widehat{f * g}(B) = 2^n \hat{f}(B)\hat{g}(B), \qquad \widehat{fg}(B) = (\hat{f} * \hat{g})(B).$$

The advantage of our definition for set function derivatives is that a derivative is just a convolution. That is, $\Delta_C f = f * g$ where $g(A) = [[A \subseteq C]](-1)^{|A|+|C|}$, and $2^n \hat{g}(B) = [[C \subseteq B]](-2)^{|C|}$. So a derivative can be treated as a sort of high-pass filter: after taking the derivative with respect to $C$, all coefficients on sets that do not contain $C$ are zeroed out: $\widehat{\Delta_C f}(B) = [[C \subseteq B]](-2)^{|C|} \hat{f}(B)$. This gives us a formula to express a derivative as a sum of Fourier coefficients:

$$\Delta_C f(A) = (-2)^{|C|} \sum_{B \in 2^E} [[C \subseteq B]]\, \hat{f}(B) \psi_B(A) \tag{3.2.5}$$

Another important linear operator that can be expressed as a convolution is the projection of a function onto the space of low-order functions. By Equation 3.2.4, the subspace $H_q$ can be characterized as those functions with Fourier transform supported only on sets of size up to $q$. That is:

$$H_q = \{ f \in H \mid \hat{f}(B) = 0 \text{ for all } |B| > q \}$$

The best $q$-th order approximation of a set function in an $\ell_2$ sense is given by setting all higher-order Fourier coefficients to zero: $g = \arg\min_{g' \in H_q} \|f - g'\| \Leftrightarrow \hat{g}(B) = [[|B| \leq q]]\, \hat{f}(B)$. This is immediate from the fact that the Fourier transform is an isometry. See [HH92] and [GMR00] for the formulas of this operation in terms of different function bases. In elementary signal processing, the domains analyzed are the continuous/discrete circle/line ($\mathbb{Z}_n$, $[0,1]$, $\mathbb{Z}$, or $\mathbb{R}$). Exactly as there, (periodic) shifting, derivatives, and low-pass filtering can all be expressed as convolutions, so unsurprisingly these operations all commute with each other.
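The transform, its inverse, and the convolution theorem can all be checked numerically on a small ground set. In this sketch, subsets are encoded as bitmasks and the test functions are random (these encoding and test choices are mine, not from the text):

```python
import random

n = 3
masks = list(range(1 << n))
pc = lambda A: bin(A).count("1")
psi = lambda B, A: (-1) ** pc(A & B)      # character psi_B(A) = (-1)^{|A cap B|}

def fourier(f):
    # fhat(B) = 2^{-n} sum_A f(A) psi_B(A)
    return {B: sum(f[A] * psi(B, A) for A in masks) / 2 ** n for B in masks}

def inverse(fh):
    # f(A) = sum_B fhat(B) psi_B(A)
    return {A: sum(fh[B] * psi(B, A) for B in masks) for A in masks}

def convolve(f, g):
    # (f * g)(A) = sum_B f(A xor B) g(B); xor is symmetric difference of bitmasks
    return {A: sum(f[A ^ B] * g[B] for B in masks) for A in masks}

random.seed(0)
f = {A: random.random() for A in masks}
g = {A: random.random() for A in masks}

fr = inverse(fourier(f))                  # round trip: inverse of analysis
assert all(abs(fr[A] - f[A]) < 1e-12 for A in masks)

fg_hat = fourier(convolve(f, g))          # convolution theorem with the 2^n factor
fh, gh = fourier(f), fourier(g)
assert all(abs(fg_hat[B] - 2 ** n * fh[B] * gh[B]) < 1e-12 for B in masks)
```

The $2^n$ factor in the convolution theorem comes directly from the asymmetric normalization of the analysis formula.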

3.2.4 Tensor Product Bases of Set Functions

In addition to the standard basis and the Fourier transform, there are several other useful bases for representing set functions. For a thorough discussion, see [GMR00], but let us review two of the more important ones. The feature common to all the bases that we present is that they can be expressed easily through tensor products.

Möbius Transform and Boolean Polynomials. One way to represent set functions is as a multilinear (a.k.a. polynomial) function over the Boolean cube: $p(x) = p_0 + p_1 x[1] + p_2 x[2] + \ldots + p_{1 \ldots n} x[1] \cdots x[n]$. This gives a simple way to interpolate a set function continuously over the unit cube by letting the variable $x$ take on values in $[0,1]^n \subset \mathbb{R}^n$. In this case, the interpolation is multilinear by convention, since Booleans satisfy $x^2 = x$, and so there is no reason to include nonlinear terms such as $x[i]^2$. To calculate the coefficients of the polynomial that represents a set function, we evaluate set function derivatives at the empty set. A set function derivative is exactly the stencil of a mixed derivative of an $n$-dimensional function; an order-$q$ discrete derivative is the forward difference operator tensored in some set of $q$ dimensions. Since the Boolean polynomial is multilinear, set function derivatives equal the continuous partial derivatives exactly (provided the argument set and derivative set are disjoint). For example: suppose $E = \{1, 2, 3\}$, and the set function $f$ is related to the Boolean polynomial $p$ by $f(A) = p(\mathbf{1}_A)$. Then the set function derivative with respect to the set $\{1, 2\}$ evaluated at the set $\{3\}$ equals the mixed partial derivative of $p$ at the corresponding corner: $\Delta_{12} f(\{3\}) = D_{12}\, p(\mathbf{1}_{\{3\}}) = p_{12} + p_{123}$. The transformation from set function values to polynomial coefficients is sometimes called the Möbius transform. We can express it not only as a set derivative, but also, using Equation 3.2.5, as a sum of Fourier coefficients.

$$g(B) = \Delta_B f(\emptyset) = (-2)^{|B|} \sum_{C} [[B \subseteq C]]\, \hat{f}(C)$$
$$f(A) = \sum_{B} [[B \subseteq A]]\, g(B)$$
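A direct sketch of the Möbius analysis and synthesis formulas, using bitmask subsets (the test function $f(A) = |A|^2$ is an illustrative choice, not from the text):

```python
n = 3
masks = range(1 << n)
pc = lambda A: bin(A).count("1")

def mobius(f):
    # g(B) = Delta_B f(empty) = sum_{C subseteq B} (-1)^{|B|-|C|} f(C)
    return {B: sum((-1) ** (pc(B) - pc(C)) * f[C]
                   for C in masks if C & B == C) for B in masks}

def synthesis(g):
    # f(A) = sum_{B subseteq A} g(B)
    return {A: sum(g[B] for B in masks if B & A == B) for A in masks}

f = {A: pc(A) ** 2 for A in masks}   # f(A) = |A|^2
g = mobius(f)
assert synthesis(g) == f             # the two transforms invert each other

# |A|^2 = sum of 1 over singletons of A plus 2 over pairs of A, so the
# polynomial coefficients are 1 on singletons, 2 on pairs, and 0 above
assert all(g[B] == {0: 0, 1: 1, 2: 2}.get(pc(B), 0) for B in masks)
```

The second assertion illustrates that $|A|^2$ is a second-order function: all Möbius coefficients on sets larger than pairs vanish, consistent with the characterization of $H_q$.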

Disjunction. Functions of the form $\min(1, |A \cap B|)$ are equivalent to a disjunction (Boolean OR) operation. A sum of such functions is equivalent to a set cover function, as described in subsection 4.4.2. These functions do not quite form a basis of $H$, since they all have $f(\emptyset) = 0$. But by including a constant offset, we get a basis for $H$. The corresponding analysis and synthesis formulas are:

$$h(B) = -\Delta_B f(E) = -2^{|B|} \Delta_{E \setminus B} \hat{f}(E)$$
$$f(A) = f(\emptyset) + \sum_{B} \min(1, |A \cap B|)\, h(B)$$

There is an interesting consequence of the formula relating the set cover coefficients to the Fourier coefficients. The original function $f$ is set cover representable (meaning it can be expressed as a nonnegative combination of such functions, so that $h(B) \geq 0$ for all nonempty $B$) if and only if the Fourier coefficients $\hat{f}(B)$ for nonempty sets are nonpositive and nondecreasing.

Tensor Product Notation. The above formulas may seem a bit mysterious and hard to check for correctness. However, the calculations are fairly simple if you consider set functions as vectors in $\mathbb{R}^{2^n}$, since all of the bases we have described can be expressed in terms of $n$-fold tensor products of $2 \times 2$ matrices. Let us denote by $(\cdot)^{\otimes n}$ the operation of tensoring a matrix with itself $n$ times. Then the main formulas of this section can be written as:

$$\text{Fourier:}\quad \hat{f} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}^{\otimes n} f, \qquad f = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{\otimes n} \hat{f}.$$
$$\text{Möbius:}\quad g = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}^{\otimes n} f, \qquad f = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}^{\otimes n} g.$$
$$\text{Disjunction:}\quad h = -\begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}^{\otimes n} f, \qquad f = f(\emptyset)\mathbf{1} + \left( \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}^{\otimes n} - \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{\otimes n} \right) h.$$

Therefore, it is straightforward to derive the basic formulas relating different tensor product bases by simply inverting and/or multiplying $2 \times 2$ matrices.
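Assuming subsets are indexed by integers whose $i$-th bit records membership of element $i$ (an indexing convention chosen here for illustration, not stated in the text), the tensor-product formulas can be checked with Kronecker products:

```python
import numpy as np

n = 3

def kron_n(M, n):
    """M tensored with itself n times."""
    T = np.array([[1.0]])
    for _ in range(n):
        T = np.kron(T, M)
    return T

# Test vector: f(A) = |A|^2, indexed by bitmask
f = np.array([bin(A).count("1") ** 2 for A in range(2 ** n)], dtype=float)

F  = kron_n(np.array([[0.5, 0.5], [0.5, -0.5]]), n)   # Fourier analysis
Fi = kron_n(np.array([[1, 1], [1, -1]]), n)           # Fourier synthesis
M  = kron_n(np.array([[1, 0], [-1, 1]]), n)           # Mobius analysis
Mi = kron_n(np.array([[1, 0], [1, 1]]), n)            # Mobius synthesis

assert np.allclose(Fi @ (F @ f), f)                   # synthesis inverts analysis
assert np.allclose(Mi @ (M @ f), f)

# Cross-check Mobius against the direct formula g(B) = Delta_B f(empty)
pc = lambda A: bin(A).count("1")
direct = [sum((-1) ** (pc(B) - pc(C)) * f[C] for C in range(2 ** n) if C & B == C)
          for B in range(2 ** n)]
assert np.allclose(M @ f, direct)
```

Because the $2 \times 2$ factors are identical in every dimension, the bit-ordering convention of `np.kron` does not affect the result.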

3.3 Properties of Submodular Set Functions

There are many equivalent characterizations of submodularity. As we have already introduced the notion of set derivative, we give a few which are directly related to set derivatives:

1. $f(A + b + c) - f(A + b) - f(A + c) + f(A) \leq 0$, for all $b, c \in E$, $A \subseteq E - b - c$.

2. $f(A \cup B \cup C) - f(A \cup B) - f(A \cup C) + f(A) \leq 0$, for all $A, B, C$ disjoint.

3. $f(A \cap B) + f(A \cup B) \leq f(A) + f(B)$ for all $A, B \in 2^E$.

4. $f(A + c) - f(A) \geq f(B + c) - f(B)$ for all $c \in E$, $A \subseteq B \subseteq E - c$.

There are $\binom{n}{2} 2^{n-2}$ different inequalities to check in item 1. This is the minimum number of inequalities needed to check submodularity in general; that is, no subset of these inequalities implies any other subset of them. Proposition 3.2 gives the characterization of item 2. The interpretation of item 4 is that first-order differences are decreasing functions; it is an easy consequence of item 3. To test submodularity, it is generally easiest to use item 1: besides the fact that it has fewer inequalities to check, each inequality involves the change in value when adding only 2 elements to a set.

These properties seem fairly similar in that they involve inequalities of the set derivatives. However, there are several other characterizations, some of which appear to have nothing to do with second-order differences being negative. For example, a set function $f$ is submodular if and only if for all $c \in \mathbb{R}^n$ it holds that $A, B \in \mathcal{Q}_c \Rightarrow A \cap B, A \cup B \in \mathcal{Q}_c$, where $\mathcal{Q}_c := \arg\min_A f(A) + \langle \mathbf{1}_A, c \rangle$. In words, this means that the collection of minimizers of $f$ plus any modular function is a lattice. See [Fuj05] for a proof. Another example of a rather striking necessary and sufficient condition for submodularity is given in Proposition 3.4.
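Characterizations 1 and 3 are straightforward to test by brute force on a small ground set. The two test functions below (a concave and a convex function of cardinality) are illustrative choices, not examples from the text:

```python
n = 4
E = range(n)
masks = range(1 << n)
popcount = lambda A: bin(A).count("1")

def is_submodular_pairs(f, tol=1e-12):
    """Item 1: f(A+b+c) - f(A+b) - f(A+c) + f(A) <= 0 for b != c, A disjoint."""
    for b in E:
        for c in E:
            if b >= c:
                continue
            for A in masks:
                if A & (1 << b) or A & (1 << c):
                    continue
                if f(A | 1 << b | 1 << c) - f(A | 1 << b) - f(A | 1 << c) + f(A) > tol:
                    return False
    return True

def is_submodular_lattice(f, tol=1e-12):
    """Item 3: f(A & B) + f(A | B) <= f(A) + f(B) for all A, B."""
    return all(f(A & B) + f(A | B) <= f(A) + f(B) + tol
               for A in masks for B in masks)

f_concave = lambda A: popcount(A) ** 0.5   # concave of cardinality: submodular
f_square  = lambda A: popcount(A) ** 2     # convex of cardinality: not submodular

assert is_submodular_pairs(f_concave) and is_submodular_lattice(f_concave)
assert not is_submodular_pairs(f_square) and not is_submodular_lattice(f_square)
```

The pair-based test touches far fewer inequalities than the lattice test, which reflects the count $\binom{n}{2}2^{n-2}$ versus $4^n$ noted above.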

Maximization. While exact maximization of general submodular functions is a hard problem, maximizing a nonnegative nondecreasing submodular function under a cardinality constraint can be done to within a constant factor of the optimum. That is, if one wishes to find a maximal set of size $k$, then a simple greedy algorithm² will give a set of cardinality $k$ with function value no less than $1 - 1/e$ times that of the maximal set of cardinality $k$ [NWF78]. This useful property of approximate maximization is a major driving force behind the interest in submodular functions, as it has many practical applications and useful generalizations. However, aside from our experiments showcased in Figure 6.1, we otherwise focus on the problem of submodular minimization.
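A minimal sketch of the greedy rule, applied to a toy coverage function (the sensor/area data is invented for illustration; coverage functions are nonnegative, nondecreasing, and submodular, so the $1 - 1/e$ guarantee applies):

```python
def greedy_max(f, E, k):
    """Greedy maximization: repeatedly add the element with largest marginal gain."""
    S = frozenset()
    for _ in range(k):
        best = max((e for e in E if e not in S), key=lambda e: f(S | {e}) - f(S))
        S = S | {best}
    return S

# Coverage function: f(A) = size of the union of the areas covered by sensors in A
areas = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}
f = lambda A: len(set().union(*(areas[a] for a in A))) if A else 0

S = greedy_max(f, areas, 2)
assert f(S) == 7                            # here greedy finds the optimal pair {0, 2}
assert f(S) >= (1 - 1 / 2.718281828) * 7    # and certainly meets the (1 - 1/e) bound
```

In this tiny instance greedy happens to be exactly optimal; in general only the $1 - 1/e$ fraction is guaranteed.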

3.3.1 Convex Analysis of Submodularity

We give an overview of how convex analysis can be used to analyze submodular functions.

While this presentation is original, the material is standard [Sch03, Fuj05, Bac11]. We start by defining some important polyhedra used in the analysis of submodular functions. We can define the subdifferential of a set function in a form analogous to continuous functions, except modular set functions play the role of linear functionals. Any vector $\lambda \in \mathbb{R}^n$ defines a modular function on $E$, which we can denote in different ways, depending on what is convenient for the context:

$$A \mapsto \lambda(A) := \langle \mathbf{1}_A, \lambda \rangle = \sum_{a \in A} \lambda[a]$$

Then the subdifferential of a set function at a set is defined via modular functions that lower bound the change in value of the function relative to that set. That is, $\lambda$ is a subgradient for $f$ at the set $A$ if for all sets $B$ the difference $f(B) - f(A)$ is bounded below by the difference $\lambda(B) - \lambda(A)$.

$$\partial f(A) := \{ \lambda \in \mathbb{R}^n \mid f(B) \geq f(A) + \langle \mathbf{1}_B - \mathbf{1}_A, \lambda \rangle \text{ for all } B \in 2^E \}$$

We also refer to this as the discrete subdifferential. Unlike a continuous subdifferential, which may be empty if a function is nonconvex, the discrete subdifferential is always nonempty. In fact, the subdifferential at the set $A$ is unbounded along the ray in the direction $\mathbf{1}_A - \mathbf{1}_{E \setminus A}$. This is true for any set function, because $\partial f(A)$ is a polyhedron defined by the normal directions $\mathbf{1}_B - \mathbf{1}_A$ where $B \in 2^E \setminus \{A\}$, and for any such direction, $\langle \mathbf{1}_B - \mathbf{1}_A, \mathbf{1}_A - \mathbf{1}_{E \setminus A} \rangle = -|A \triangle B| < 0$. So moving far enough in the direction $\mathbf{1}_A - \mathbf{1}_{E \setminus A}$ will always result in a subgradient. Precisely, for any $\lambda \in \mathbb{R}^n$, we have:

$$\lambda + t(\mathbf{1}_A - \mathbf{1}_{E \setminus A}) \in \partial f(A) \;\Leftrightarrow\; t \geq \max_{\substack{B \in 2^E \\ B \neq A}} \frac{f(A) - f(B) + \lambda(B) - \lambda(A)}{|A \triangle B|}$$

²Add elements one at a time by choosing whichever element increases the function the most.

In what follows, we will assume that the set function $f$ is 'normalized', i.e., $f(\emptyset) = 0$, possibly after subtracting an offset. The submodular polyhedron $P_f$ and base polytope $B_f$ are then defined by:

$$P_f := \{ \lambda \in \mathbb{R}^n \mid \langle \mathbf{1}_A, \lambda \rangle \leq f(A) \text{ for all } A \in 2^E \} \tag{3.3.1}$$
$$B_f := \{ \lambda \in \mathbb{R}^n \mid \langle \mathbf{1}, \lambda \rangle = f(E), \text{ and } \langle \mathbf{1}_A, \lambda \rangle \leq f(A) \text{ for all } A \in 2^E \} \tag{3.3.2}$$

It is easy to see that $P_f = \partial f(\emptyset)$. Furthermore,

$$P_f \cap \partial f(A) = P_f \cap \{ \lambda \in \mathbb{R}^n \mid \lambda(A) = f(A) \}$$

In particular, $B_f = \partial f(\emptyset) \cap \partial f(E)$. Note that all vectors in the set $B_f$ must satisfy $\lambda[a] \leq f(a)$ and $\langle \mathbf{1}, \lambda \rangle = f(E)$, and thus $\lambda[b] \geq f(E) - \sum_{a \in E - b} f(a)$. So the set $B_f$ is bounded and we are justified in referring to it as a polytope. The elements of the base polytope are called the bases of the function. Note that when $f$ is the rank function of a matroid (cf. Appendix A), each extreme point of the base polytope corresponds to the indicator vector of a base (a.k.a. basis) of the matroid. The base polytope essentially contains all the subgradients needed to fully characterize a submodular function. While it is defined by exponentially many constraints, there is a simple formula to optimize a linear function over it given oracle access to the function, if and only if the function is submodular. The key insight, due to Edmonds [Edm70], is that solving $\max_{\lambda \in B_f} \langle x, \lambda \rangle$ depends only on the ordering of the components of $x$. We say that a permutation $\pi \in S_n$ ($S_n$ is the $n$th symmetric group, the set of all bijections from $E$ to itself) is consistent with a vector $x$ if the components of $x$ are nonincreasing with respect to that permutation. Let $K_\pi$ be the cone of all vectors consistent with $\pi$, and $K_\pi^\circ$ be its polar cone.

$$K_\pi := \{ x \in \mathbb{R}^n \mid x[\pi_1] \geq x[\pi_2] \geq \ldots \geq x[\pi_n] \} \tag{3.3.3}$$
$$K_\pi^\circ := \{ \lambda \in \mathbb{R}^n \mid \langle \mathbf{1}, \lambda \rangle = 0, \text{ and } \textstyle\sum_{i=1}^{k} \lambda[\pi_i] \leq 0 \text{ for } k = 1 \ldots n \} \tag{3.3.4}$$

Note that these cones are proper with respect to the subspace of vectors orthogonal to $\mathbf{1}$. Given a permutation $\pi$, consider a sequence of sets starting with the empty set, adding one element at a time in the order specified by $\pi$. Then for a set function $f$, define $\nu_{\pi,f} \in \mathbb{R}^n$ to be the vector with coordinates equal to the changes in value of $f$ over this sequence of sets. For this, it will be useful to define $\varpi^k$ to be the subset of $E$ consisting of the first $k$ elements of $\pi$.

$$\nu_{\pi,f}[\pi_k] = \Delta_{\pi_k} f(\varpi^{k-1}) = f(\varpi^k) - f(\varpi^{k-1}) \tag{3.3.5}$$
$$\varpi^k := \{ \pi_1, \ldots, \pi_k \} \tag{3.3.6}$$

The points νπ,f are called the extreme bases of the function f ; we will justify the name by proving that they are indeed the vertices of the base polytope. First, we need to prove that they are even in the base polytope.

Proposition 3.4. The set function $f$ is submodular if and only if $\nu_{\pi,f} \in B_f$ for all $\pi \in S_n$.

Proof. Suppose $f$ is not submodular. Then $f(A + b + c) + f(A) > f(A + b) + f(A + c)$ for some $b, c \in E$, $A \subseteq E - b - c$. Let $\pi$ be a permutation with $\varpi^k = A,\ A + b,\ A + b + c$ for $k = |A|,\ |A| + 1,\ |A| + 2$ respectively. This means $\langle \mathbf{1}_A, \nu_{\pi,f} \rangle = f(A)$ and $\nu_{\pi,f}[c] = f(A + b + c) - f(A + b)$, and thus:

$$\langle \mathbf{1}_{A+c}, \nu_{\pi,f} \rangle = \langle \mathbf{1}_A, \nu_{\pi,f} \rangle + \langle \mathbf{1}_c, \nu_{\pi,f} \rangle = f(A) + f(A + b + c) - f(A + b) > f(A + c)$$

So we conclude $\nu_{\pi,f}$ violates a constraint and is not in the polytope $B_f$.

Otherwise, suppose $f$ is submodular. Clearly $\langle \mathbf{1}, \nu_{\pi,f} \rangle = f(E)$ for all permutations $\pi$, so the equality constraint in the definition of $B_f$ is satisfied. Let $A$ be any set and $\pi$ be any permutation. We wish to show that $\langle \mathbf{1}_A, \nu_{\pi,f} \rangle \leq f(A)$. Let the increasing sequence $k(\cdot)$ correspond to the indices of $A$ relative to the permutation $\pi$. That is, $A = \{ \pi_{k(1)}, \pi_{k(2)}, \ldots, \pi_{k(|A|)} \}$.

This means we have, for $j = 1 \ldots |A|$: $A \cap \varpi^{k(j)-1} = \{ \pi_{k(1)}, \pi_{k(2)}, \ldots, \pi_{k(j-1)} \} = A \cap \varpi^{k(j-1)}$.

$$\langle \mathbf{1}_A, \nu_{\pi,f} \rangle = \sum_{j=1}^{|A|} f(\varpi^{k(j)}) - f(\varpi^{k(j)-1})$$
$$\leq \sum_{j=1}^{|A|} f(A \cap \varpi^{k(j)}) - f(A \cap \varpi^{k(j)-1}) \qquad \text{(submodularity)}$$
$$= \sum_{j=1}^{|A|} f(A \cap \varpi^{k(j)}) - f(A \cap \varpi^{k(j-1)}) \qquad \text{(definition of } k(\cdot))$$
$$= f(A \cap \varpi^{k(|A|)}) = f(A)$$

So we conclude $\langle \mathbf{1}_A, \nu_{\pi,f} \rangle \leq f(A)$ for all $A \in 2^E$, so $\nu_{\pi,f} \in B_f$. □

Note that Proposition 3.4 gives yet another characterization of submodularity. For the rest of this section, we will assume that $f$ is submodular. Having established that the points defined by Equation 3.3.5 are bases, we can characterize them further:

Proposition 3.5. The point $\nu_{\pi,f}$ is the unique element of $B_f$ which maximizes the inner product with every element of $K_\pi$:

$$\bigcap_{x \in K_\pi} \left( \arg\max_{\lambda \in B_f} \langle x, \lambda \rangle \right) = \{ \nu_{\pi,f} \}$$

Furthermore, $\nu_{\pi,f}$ is extreme in $B_f$.

Proof. First we show that $\nu_{\pi,f}$ is maximal when $x$ is a corner of the unit cube. Let $A$ be any set such that $\mathbf{1}_A \in K_\pi$; that is, $A = \varpi^k$ for $k = |A|$. Note that by definition of $B_f$, the maximal value $\max_{\lambda \in B_f} \langle \mathbf{1}_A, \lambda \rangle$ must be no more than $f(A)$. By construction, $\langle \mathbf{1}_{\varpi^k}, \nu_{\pi,f} \rangle = \sum_{j=1}^{k} f(\varpi^j) - f(\varpi^{j-1}) = f(\varpi^k) = f(A)$, so $\nu_{\pi,f}$ achieves this maximum value. By Proposition 3.4, $\nu_{\pi,f}$ is an element of $B_f$. So we conclude $\nu_{\pi,f} \in \arg\max_{\lambda \in B_f} \langle \mathbf{1}_{\varpi^k}, \lambda \rangle$ for $k = 1 \ldots n$.

Next, suppose $x$ is an arbitrary point in $K_\pi$. Then we can decompose it into a sum $x = x[\pi_n]\mathbf{1} + \sum_{k=1}^{n-1} \alpha_k \mathbf{1}_{\varpi^k}$ with nonnegative coefficients $\alpha_k = x[\pi_k] - x[\pi_{k+1}] \geq 0$. Since $\langle \mathbf{1}, \lambda \rangle$ is constant over $B_f$, and $\nu_{\pi,f}$ is maximal for each term in the sum, we conclude that $\nu_{\pi,f} \in \arg\max_{\lambda \in B_f} \langle x, \lambda \rangle$.

To show uniqueness, suppose $\lambda, \lambda' \in B_f$ both maximize the inner product for all $x \in K_\pi$. Then $\langle x, \lambda \rangle \leq \langle x, \lambda' \rangle$ for all $x \in K_\pi$, which is equivalent to $\lambda - \lambda' \in K_\pi^\circ$. Likewise, $\lambda' - \lambda \in K_\pi^\circ$,

but since $K_\pi^\circ$ is pointed, this implies that $\lambda = \lambda'$.

Similarly, to see that $\nu_{\pi,f}$ is extreme in $B_f$, suppose that $\nu_{\pi,f} = \theta\lambda_1 + (1 - \theta)\lambda_2$ for some $\lambda_1, \lambda_2 \in B_f$, with $\theta \in (0, 1)$. The optimality of $\nu_{\pi,f}$ implies that $\langle x, \lambda_2 \rangle \leq \langle x, \nu_{\pi,f} \rangle$ for all $x \in K_\pi$, which is equivalent to $\lambda_2 - \nu_{\pi,f} = \theta(\lambda_2 - \lambda_1) \in K_\pi^\circ$. Likewise, $\lambda_1 - \nu_{\pi,f} = (1 - \theta)(\lambda_1 - \lambda_2) \in K_\pi^\circ$. Again using the fact that $K_\pi^\circ$ is a pointed cone, we have $\lambda_2 - \lambda_1 = 0$, and thus $\nu_{\pi,f} = \lambda_1 = \lambda_2$. □

So we have justified the name extreme base for the vectors $\nu_{\pi,f}$. In fact, these are the only extreme elements of $B_f$. That is, we can characterize the base polytope as the convex hull of the collection $\{\nu_{\pi,f}\}_{\pi \in S_n}$.

Proposition 3.6. $B_f = \operatorname{conv} \{\nu_{\pi,f}\}_{\pi \in S_n}$.

Proof. Clearly every $x \in \mathbb{R}^n$ is contained in some cone $K_\pi$. Thus Proposition 3.5 implies that $\max_{\lambda \in B_f} \langle x, \lambda \rangle = \max_{\pi \in S_n} \langle x, \nu_{\pi,f} \rangle$ for all $x \in \mathbb{R}^n$. So if $C = \operatorname{conv} \{\nu_{\pi,f}\}_{\pi \in S_n}$, we have $\sigma_{B_f} = \sigma_C$ everywhere. Both sets are closed and convex, thus $\delta_{B_f} = \sigma^*_{B_f} = \sigma^*_C = \delta_C$, so we conclude they are identical. □
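Proposition 3.4 can be checked exhaustively on a small ground set. In this sketch (the square-root-of-cardinality function and its convex counterpart are illustrative choices), every greedy point lands in $B_f$ for the submodular function, and at least one falls outside for the non-submodular one:

```python
from itertools import permutations

n = 4
pc = lambda A: bin(A).count("1")
f = lambda A: pc(A) ** 0.5          # normalized submodular: concave of cardinality
g = lambda A: pc(A) ** 2            # convex of cardinality: not submodular

def extreme_base(h, pi):
    """nu_{pi,h}[pi_k] = h(first k elements) - h(first k-1 elements)  (Eq. 3.3.5)."""
    nu, mask = [0.0] * n, 0
    for e in pi:
        nu[e] = h(mask | 1 << e) - h(mask)
        mask |= 1 << e
    return nu

def in_base_polytope(nu, h, tol=1e-9):
    lam = lambda A: sum(nu[e] for e in range(n) if A >> e & 1)
    full = (1 << n) - 1
    return (abs(lam(full) - h(full)) <= tol
            and all(lam(A) <= h(A) + tol for A in range(1 << n)))

# Proposition 3.4: submodular  <=>  every greedy point lies in B_f
assert all(in_base_polytope(extreme_base(f, p), f) for p in permutations(range(n)))
assert not all(in_base_polytope(extreme_base(g, p), g) for p in permutations(range(n)))
```

The equality constraint $\langle \mathbf{1}, \nu \rangle = f(E)$ holds automatically by telescoping, so only the exponentially many inequality constraints need checking.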

3.3.2 Lovász Extension

Let us review what we have established about $\sigma_{B_f}$, the support function of the base polytope of $f$. First, what does it evaluate to at a corner of the unit cube? By Proposition 3.4, the points $\nu_{\pi,f}$ are contained in the base polytope; therefore for any set $A \in 2^E$, the inequality $\langle \mathbf{1}_A, \lambda \rangle \leq f(A)$ is satisfied with equality for some $\lambda \in B_f$ (namely any $\nu_{\pi,f}$ for which $\mathbf{1}_A \in K_\pi$). Thus $\sigma_{B_f}(\mathbf{1}_A) = \max_{\lambda \in B_f} \langle \mathbf{1}_A, \lambda \rangle = f(A)$. So $\sigma_{B_f}$ is an interpolation of $f$ to the vertices of the unit cube. Furthermore, since it is a support function, it is convex by construction, making it a natural tool for use in minimization. Since this function is so important, we will denote it simply $\tilde{f}$, and we refer to it as the Lovász extension of $f$.

$$\tilde{f}(x) := \sigma_{B_f}(x) = \max_{\lambda \in B_f} \langle x, \lambda \rangle$$

On a historical note, this is named after Lovász, who was the first to consider it as a tool for minimization [GLS81, Lov83]. It appeared implicitly in the earlier work of Edmonds [Edm70], in that he showed how maximizing a linear function over the base polytope can be done in a greedy fashion. Since $B_f$ is bounded, $\tilde{f}$ has full domain. The subgradient of the Lovász extension at a point $x$ is the convex hull of all base vertices for permutations consistent with that vector:

$$\partial \tilde{f}(x) = \operatorname{conv} \{ \nu_{\pi,f} \mid x \in K_\pi \} \tag{3.3.7}$$
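Equation 3.3.7 together with Proposition 3.5 gives a concrete recipe for evaluating $\tilde f$: sort the components of $x$ decreasingly and accumulate greedy gains. A sketch, with an illustrative square-root-of-cardinality function (not an example from the text):

```python
n = 4
pc = lambda A: bin(A).count("1")
f = lambda A: pc(A) ** 0.5          # a normalized submodular function

def lovasz(f, x):
    """Evaluate f~(x) = <x, nu_pi> where pi sorts the components of x decreasingly."""
    value, mask = 0.0, 0
    for e in sorted(range(n), key=lambda e: -x[e]):
        new = mask | 1 << e
        value += x[e] * (f(new) - f(mask))
        mask = new
    return value

# f~ interpolates f at the corners of the unit cube
for A in range(1 << n):
    xA = [float(A >> e & 1) for e in range(n)]
    assert abs(lovasz(f, xA) - f(A)) < 1e-12

# and it is convex: a midpoint spot-check on a random segment
import random
random.seed(1)
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]
mid = [(u + v) / 2 for u, v in zip(x, y)]
assert lovasz(f, mid) <= (lovasz(f, x) + lovasz(f, y)) / 2 + 1e-12
```

Each evaluation costs one sort plus $n$ oracle calls to $f$, which is the efficiency that makes the extension practical for minimization.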

In particular, the subgradients at a corner of the unit cube are also discrete subgradients for f .

$$\partial f(A) \cap B_f = \operatorname{conv} \{ \nu_{\pi,f} \mid A = \{\pi_1, \ldots, \pi_{|A|}\} \} \tag{3.3.8}$$

Consider what Equation 3.3.7 implies about the relationship between subdifferentials at different points. If the point $y$ is consistent with at least the permutations with which $x$ is consistent, then all subgradients at $x$ are also subgradients at $y$:

$$y \in K_\pi \text{ for all } \pi \text{ such that } x \in K_\pi \;\Rightarrow\; \partial \tilde{f}(x) \subseteq \partial \tilde{f}(y)$$

In particular, by relating the subdifferential at $x$ to that at the corner point $\mathbf{1}_A$, we can derive a condition for when the continuous subgradients of $\tilde{f}$ are also discrete subgradients for $f$.

Given a point $x$, define $X_\alpha^+$ to be the subset of components of $x$ greater than or equal to $\alpha$. It is not hard to see that $x \in K_\pi$ if and only if $\mathbf{1}_{X_\alpha^+} \in K_\pi$ for all $\alpha \in \mathbb{R}$. So by Equation 3.3.7, we get:

$$X_\alpha^+ := \{ e \in E \mid x[e] \geq \alpha \}$$
$$\partial \tilde{f}(x) = \bigcap_{\alpha \in \mathbb{R}} \partial f(X_\alpha^+) \cap B_f \tag{3.3.9}$$

Note that there is a subtlety to this characterization that is important for algorithms that use the Lovász extension to minimize the set function. In order to conclude that the continuous subdifferential $\partial \tilde{f}(x)$ is included in the set function subdifferential $\partial f(A)$, it is necessary that the components of $x$ in $A$ be strictly separated from the complementary components.

$$\min_{a \in A} x[a] > \max_{b \in E \setminus A} x[b] \;\Rightarrow\; \partial \tilde{f}(x) \subseteq \partial f(A) \tag{3.3.10}$$
$$\min_{a \in A} x[a] \geq \max_{b \in E \setminus A} x[b] \;\not\Rightarrow\; \partial \tilde{f}(x) \subseteq \partial f(A) \tag{3.3.11}$$

Note that the premise of Equation 3.3.11 does imply that $\partial \tilde{f}(x) \cap \partial f(A)$ is nonempty.

3.3.3 Examples of Base Polytopes

Which is more important: a submodular function or its base polytope? Of course, given the dual relationship between the two, one cannot really be more important than the other, but historically, base polytopes were discovered first, by Edmonds. Any base polytope is the union of the faces of a polymatroid. (Interestingly, polymatroids can be defined without mentioning submodularity.) We start with a short proof of an important property of submodular functions: for any submodular function, minimizing out a subset of the variables results in another submodular function. This is exactly analogous to convex functions. Suppose the function $g(A, B)$ is a submodular function on $2^E \times 2^F$, where $A \in 2^E$, $B \in 2^F$. Then suppose $f$ is given by minimizing $g$ over all arguments $B$:

$$f(A) = \min_{B \subseteq F} g(A, B) \tag{3.3.12}$$

It is simple to show that if $g$ is submodular, then $f$ must be as well:

$$f(A_1) + f(A_2) = g(A_1, B_1) + g(A_2, B_2) \qquad \text{where } B_i = \arg\min_{B \subseteq F} g(A_i, B)$$
$$\geq g(A_1 \cup A_2, B_1 \cup B_2) + g(A_1 \cap A_2, B_1 \cap B_2) \qquad \text{(submodularity of } g)$$
$$\geq f(A_1 \cup A_2) + f(A_1 \cap A_2) \qquad \text{(minimality of } f)$$
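This preservation property can be verified by brute force on small ground sets. In the sketch below, the jointly submodular $g$ (a concave function of total cardinality minus a modular charge on $B$) is an illustrative choice, not an example from the text:

```python
nE, nF = 3, 2
pc = lambda A: bin(A).count("1")

# g is submodular in (A, B) jointly: concave of total cardinality plus a modular part
g = lambda A, B: (pc(A) + pc(B)) ** 0.5 - 0.4 * pc(B)

# Partial minimization as in Equation 3.3.12
f = lambda A: min(g(A, B) for B in range(1 << nF))

# f inherits submodularity (lattice inequality, item 3 of Section 3.3)
assert all(f(A & B) + f(A | B) <= f(A) + f(B) + 1e-12
           for A in range(1 << nE) for B in range(1 << nE))
```

Here the inner minimum is genuinely active: for $|A| = 2$ the optimal $B$ is all of $F$, while for $A = \emptyset$ it is empty.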

In particular, when a function $f$ is graph representable, it can be written as a minimum over a submodular $g(A, B)$, where $g$ is a fully general quadratic function:

$$f(A) = c(A) + \min_B \left[ s(B) + \mathbf{1}_A^T W_{11} \mathbf{1}_{E \setminus A} + \mathbf{1}_A^T W_{12} \mathbf{1}_{F \setminus B} + \mathbf{1}_B^T W_{22} \mathbf{1}_{F \setminus B} \right]$$

The term graph representable refers to the fact that a second-order submodular function is a cut function of a graph. We can express points in the base polytope of $f$ through $2^{|E| + |F|}$ linear inequalities. That is, $\lambda \in B_f$ if:

$$-\mathbf{1}_A^T(\lambda - c) + s^T \mathbf{1}_B + \mathbf{1}_A^T W_{11} \mathbf{1}_{E \setminus A} + \mathbf{1}_A^T W_{12} \mathbf{1}_{F \setminus B} + \mathbf{1}_B^T W_{22} \mathbf{1}_{F \setminus B} \geq 0$$
$$\text{for all } A \in 2^E,\ B \in 2^F.$$

Now consider a simpler case where all entries of $W_{11}$ and $W_{22}$ equal zero. This means the graph is bipartite: one partition for $E$ and the other for the variables to be minimized over. Such a function can then be expressed as:

$$f(A) = \mathbf{1}_A^T c + \min_B \left[ \mathbf{1}_A^T W (\mathbf{1} - \mathbf{1}_B) + s^T \mathbf{1}_B \right] = c(A) + \sum_j \min(s_j, w_j(A))$$

For this function, we can introduce new variables to give a simple description of the base polytope. That is, $\lambda \in B_f$ if there exists some $Y \in \mathbb{R}^{|E| \times |F|}$ such that:

$$\lambda = c + Y\mathbf{1}, \qquad 0 \leq Y \leq W, \qquad Y^T \mathbf{1} = s.$$

So testing membership in this base polytope is much simpler than in the general case. What happens if we iterate Equation 3.3.12 by introducing a new bipartite graph between sets of variables? If we do this $k$ times, we get a graph representable function, only now the graph is a trellis with $k$ levels:

$$f(A) = \mathbf{1}_A^T c + \min_{B_1, \ldots, B_k} \left[ \mathbf{1}_A^T W_1 (\mathbf{1} - \mathbf{1}_{B_1}) + \sum_{j=2}^{k} \mathbf{1}_{B_{j-1}}^T W_j (\mathbf{1} - \mathbf{1}_{B_j}) + s^T \mathbf{1}_{B_k} \right] \tag{3.3.13}$$

Again, this function has an efficiently representable base polytope. That is, we have $\lambda \in B_f$ if for some matrices $Y_1, \ldots, Y_k$ we have:

$$\lambda = c + Y_1 \mathbf{1}, \qquad 0 \leq Y_1 \leq W_1, \tag{3.3.14}$$
$$Y_j^T \mathbf{1} = Y_{j+1} \mathbf{1}, \qquad 0 \leq Y_{j+1} \leq W_{j+1} \quad \text{for } j = 1 \ldots k-1,$$
$$Y_k^T \mathbf{1} = s.$$

3.4 Submodular Minimization

The general problem we are interested in is finding a global minimizer of a submodular function. Before discussing specific algorithmic techniques, let us review the optimality conditions and duality gap for solving submodular minimization as a convex minimization problem. Recall that $\sigma_{[0,1]^n}(x) = \max_{\lambda \in [0,1]^n} \langle x, \lambda \rangle = \sum_{e \in E} \max(x[e], 0)$ is the support function of the unit cube.

Theorem 3.7 (Submodular Minimization Optimality Conditions). We have, for all $A \in 2^E$, $\lambda \in B_f$:

$$f(A) + \sigma_{[0,1]^n}(-\lambda) \geq 0 \tag{3.4.1}$$

This equation is satisfied with equality if and only if $(\mathbf{1}_A - \mathbf{1}_{E \setminus A}) \bullet \lambda \leq 0$,³ and $\lambda \in \partial f(A)$. Furthermore, this is true if and only if $A$ minimizes $f$ over $2^E$, and $\lambda$ maximizes $-\sigma_{[0,1]^n}(-\lambda)$ over $B_f$.

Proof. For any $A \in 2^E$, $\lambda \in B_f$:

$$-\sigma_{[0,1]^n}(-\lambda) = \min_{x \in [0,1]^n} \langle x, \lambda \rangle \qquad \text{(def. of } \sigma_{[0,1]^n}) \tag{3.4.2}$$
$$\leq \langle \mathbf{1}_A, \lambda \rangle \qquad \text{(minimality of } x) \tag{3.4.3}$$
$$\leq f(A) \qquad \text{(def. of } B_f) \tag{3.4.4}$$

This demonstrates Equation 3.4.1. If $(\mathbf{1}_A - \mathbf{1}_{E \setminus A}) \bullet \lambda \leq 0$, then $\lambda[a] \leq 0$ for all $a \in A$ and $\lambda[b] \geq 0$ for all $b \in E \setminus A$, so $\mathbf{1}_A$ is an optimal $x$ in Equation 3.4.2, and thus Equation 3.4.3 is an equality. Conversely, if $\lambda[a] > 0$ for some $a \in A$ or $\lambda[b] < 0$ for some $b \in E \setminus A$, then Equation 3.4.3 must be a strict inequality. Finally, a base vector $\lambda$ is contained in $\partial f(A)$ if and only if $\langle \mathbf{1}_A, \lambda \rangle = f(A)$, i.e., Equation 3.4.4 is satisfied with equality. Note that since the bound in Equation 3.4.1 is universal, it can only hold with equality for optimal $A$ and $\lambda$. This establishes conditions for when equality can hold, but we need to demonstrate that it always will. That is, Equation 3.4.1 is equivalent to the following statement of weak duality, but we additionally need strong duality:

$$\max_{\lambda \in B_f} \min_{x \in [0,1]^n} \langle x, \lambda \rangle \leq \min_{x \in [0,1]^n} \max_{\lambda \in B_f} \langle x, \lambda \rangle \tag{3.4.5}$$

Since the constraints are polyhedral, strong duality holds by Fenchel's Duality Theorem. □
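The optimality conditions can be exercised numerically: every extreme base yields a dual lower bound on the minimum of $f$. In the toy instance below (square-root of cardinality plus an invented modular part; the minimizer is the empty set), the bound happens to be tight at a vertex of $B_f$:

```python
from itertools import permutations

n = 4
pc = lambda A: bin(A).count("1")
m = [-0.8, -0.3, 0.2, 0.5]                         # illustrative modular weights
f = lambda A: pc(A) ** 0.5 + sum(m[a] for a in range(n) if A >> a & 1)

def extreme_base(pi):
    nu, mask = [0.0] * n, 0
    for e in pi:
        nu[e] = f(mask | 1 << e) - f(mask)
        mask |= 1 << e
    return nu

f_min = min(f(A) for A in range(1 << n))           # primal optimum (here f(empty) = 0)
dual = lambda nu: -sum(max(-v, 0.0) for v in nu)   # -sigma_{[0,1]^n}(-lambda)
best_dual = max(dual(extreme_base(p)) for p in permutations(range(n)))

# Weak duality (Equation 3.4.1): every dual value lower-bounds every primal value
assert best_dual <= f_min + 1e-12
# In this instance the gap closes at a vertex of B_f, certifying optimality
assert abs(best_dual - f_min) < 1e-12
```

In general the optimal dual point need not be a vertex of $B_f$, so the second assertion is specific to this instance; the first holds for any normalized submodular function.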

3.4.1 Ellipsoid Method and Polynomial Time Algorithms

The ellipsoid method has been of historical importance in that it can be used to establish the possibility of polynomial-time optimization of general convex functions. For example, Hačijan used it in the late 1970s to prove that linear programs are solvable in polynomial time [Hač79]. While its generality is a great strength for proving theorems, in practice the problems it can be applied to have specialized methods that work much faster.

Suppose we wish to minimize a convex function using an oracle which takes as input any point in the function's domain and returns the function value and a subgradient. We can use the fact that for any current iterate $x^k$ with subgradient $\lambda^k \in \partial f(x^k)$, a minimizer $x^* \in \arg\min f$ must be contained in the half-space $\langle x^*, \lambda^k \rangle \leq \langle x^k, \lambda^k \rangle + f_{\text{best}}^k - f(x^k)$, where $f_{\text{best}}^k$ is the smallest value of $f$ encountered up to step $k$. Splitting algorithms are a general class of algorithms that maintain a feasible region, such as a polytope or an ellipsoid, then use oracle information to split the feasible region and reduce it somehow.

In particular, a simple variant of the ellipsoid algorithm might proceed as follows. Start with an ellipsoid $E^0$ large enough to be guaranteed to contain a minimizer of $f$, set $k = 0$, and begin: Let $x^k$ be the centroid of $E^k$, and use the oracle to get a subgradient $\lambda^k \in \partial f(x^k)$. This gives a half-space that splits the current ellipsoid, namely $H^k := \{ x \in \mathbb{R}^n \mid \langle x, \lambda^k \rangle \leq \langle x^k, \lambda^k \rangle + f_{\text{best}}^k - f(x^k) \}$. Let $E^{k+1}$ be the ellipsoid of minimal volume which contains the intersection $H^k \cap E^k$. Increment $k$ and repeat these iterations until an iterate is sufficiently close to optimality. To represent ellipsoids numerically, one can use a positive definite matrix and a vector. Under this representation, computing the minimal-volume ellipsoid for the next iterate amounts to a rank-one update rule of the matrix.

³We use the symbol $\bullet$ to denote the elementwise or Hadamard product of vectors.
The algorithm can be analyzed by bounding the rate at which the volume of these ellipsoids decreases. The Lovász extension $\tilde{f}$ is a convex function, and by Equation 3.3.7, a subgradient at any point can be generated efficiently, provided evaluation of the set function $f$ is efficient. So one can use a generic convex minimization algorithm such as an ellipsoid method to minimize $\tilde{f}$ over the unit cube. Note that the original ellipsoid method proposed for submodular minimization in [GLS81] differs somewhat from the algorithm outlined above, but the important ideas are the same: using the convexity of $\tilde{f}$ and an efficient subgradient oracle to minimize the set function. However, the asymptotic running time of the ellipsoid method depends on the values of the function itself, not just the size of the problem; hence its running time is only weakly polynomial. Since then, several strongly polynomial algorithms for submodular minimization have been developed [IFF01][IO09]. Unfortunately, these are all of complexity at least $O(n^5)$, and they are not empirically fast enough to be of practical use for all but very small problems.

3.4.2 Fujishige’s Minimal Norm Algorithm

As originally shown by Fujishige [FI11], finding the point of the base polytope $B_f$ of minimal $\ell_2$ norm is equivalent to solving the submodular minimization problem. (We prove a more general version of this equivalence in Theorem 5.4.) With access to an oracle that maximizes a linear function over a base polytope, one can use the Frank–Wolfe algorithm to find the minimum norm point in that polytope. The algorithm is guaranteed to converge in finite time because there are only finitely many possible subsets of vertices, and it is a descent algorithm which never returns to the same state. In general the algorithm could require exponentially many calls to the oracle, although no problem instance exhibiting this behavior is known. To satisfy the role of the oracle, we need an efficient subroutine which can solve $\arg\max_{\lambda \in B_f} \langle x, \lambda \rangle$ for any $x \in \mathbb{R}^n$. By Proposition 3.5, we know that if $x \in K_\pi$, the point $\nu_{\pi,f}$ is a maximizing vertex of the polytope for that functional. So, we just need to sort the components of $x$ to find a permutation with which $x$ is consistent, and then our oracle returns the base vertex given by the formula in Equation 3.3.5. This procedure is known as the greedy algorithm, because if $f$ is the rank function of a matroid, it is equivalent to greedily adding elements to an independent set according to their weight $x[e]$. Despite there being no subexponential upper bounds on the running time, this algorithm works quite well in practice. The exact performance will depend on some subtleties, such as how expensive it is to compute the submodular function. If $f$ is a complicated function such as a log-determinant, then computing $\nu_{\pi,f}$ given $\pi$ may actually be the bottleneck of the program.
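The greedy oracle just described amounts to a sort followed by a sweep of marginal values. A minimal Python sketch (the naming is ours; `f` is assumed to be a callable on frozensets with $f(\emptyset) = 0$):

```python
import numpy as np

def greedy_base_vertex(f, x, n):
    """Maximize <x, lambda> over the base polytope of a submodular f.

    Returns the vertex nu_{pi, f} for a permutation pi sorting x in
    decreasing order (ties are broken arbitrarily by the sort, which
    is the source of nondeterminacy mentioned in the footnote).
    """
    pi = sorted(range(n), key=lambda e: -x[e])
    nu = np.zeros(n)
    prev, prev_val = frozenset(), 0.0
    for e in pi:
        cur = prev | {e}
        cur_val = f(cur)
        nu[e] = cur_val - prev_val          # marginal gain of adding e
        prev, prev_val = cur, cur_val
    return nu
```

By construction the output satisfies $\nu(E) = f(E)$, as a base must.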

3.4.3 Special Cases

There are certain special cases of submodular minimization for which faster algorithms are known to exist. Even these are not necessarily useful in practice, but they are always asymptotically faster than the algorithms which make no assumptions other than submodularity.

⁴This can be a major source of nondeterminacy in an implementation if a sorting routine does not specify how ties are broken. Ties are not unlikely, because the minimum norm base is expected to have many components of the same magnitude.

Symmetric Unconstrained minimization of a symmetric submodular function (one with $f(A) = f(E \setminus A)$) is trivial, since for such functions submodularity gives $2f(\emptyset) = f(\emptyset) + f(E) \le f(A) + f(E \setminus A) = 2f(A)$ for all $A$; so $\emptyset$ and $E$ trivially minimize $f$. However, if we restrict the search domain to exclude these extreme sets, the minimization problem is very much of interest. In this case, we can use an algorithm discovered by Queyranne [Que98] to minimize a symmetric submodular function over the range $2^E \setminus \{\emptyset, E\}$.

Quadratic Potentials and Extensions The minimization of second-order submodular functions can be reformulated as finding the smallest cut of a graph with positive edge weights which separates two specific vertices. This is dual to finding the maximum flow between those two vertices, if the edge weights are interpreted as capacities. This is a classic problem in computer science, and there is a great variety of algorithms for solving it efficiently. One of the conceptually simplest of these is the Ford–Fulkerson algorithm, which, in its simplest form, requires the weights to be integral. Yet even if a submodular function is not second order, it may still be possible to use min-cut/max-flow algorithms to minimize it. This is done by introducing extra Boolean variables to create a higher dimensional second-order function, such that the original function is given by minimizing the second-order function over the extra variables. Functions that admit such a representation are called graph representable. Much work has been done over the years to determine which functions are graph representable, and, furthermore, how to most efficiently represent the function, or at least approximate it efficiently; see for example [KZ04], [JLB11]. Graph representable functions are necessarily submodular.
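For concreteness, the flow subroutine underlying these methods can be sketched compactly. The following is our own generic sketch of the Edmonds–Karp variant (Ford–Fulkerson with breadth-first augmenting paths), which, unlike the plain Ford–Fulkerson mentioned above, does not require integral capacities; the graph construction that encodes a particular second-order submodular function is not shown.

```python
from collections import deque, defaultdict

def max_flow(capacity, s, t):
    """Edmonds-Karp: Ford-Fulkerson with BFS-chosen augmenting paths.

    capacity is a dict mapping directed edges (u, v) to nonnegative
    capacities.  The BFS choice bounds the number of augmentations
    polynomially regardless of the capacity values.
    """
    residual = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                     # reverse edges for the residual graph
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                   # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(residual[e] for e in path)
        for (u, v) in path:               # push flow along the bottleneck
            residual[(u, v)] -= aug
            residual[(v, u)] += aug
        flow += aug
```

By max-flow/min-cut duality, the value returned equals the weight of the smallest $s$–$t$ cut, which is the quantity the reduction above needs.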

Chapter 4

Smoothed Gradient Methods for Decomposable Functions

4.1 Introduction

Several submodular minimization problems arising in machine learning and other domains have structure that allows them to be solved more efficiently. Examples include symmetric functions, which can be minimized in $O(n^3)$ evaluations using Queyranne's algorithm [Que98], and functions that decompose into attractive pairwise potentials, which can be minimized using graph cut techniques [FD05]. In this chapter, we introduce a novel class of submodular minimization problems that can be solved efficiently with convex optimization. In particular, we develop an algorithm, SLG, that can minimize a class of submodular functions that we call decomposable. These are functions that can be decomposed into sums of concave functions applied to modular (additive) functions. Our algorithm is based on recent techniques of smoothed convex minimization [Nes05] applied to the Lovász extension. We demonstrate the usefulness of our algorithm on a joint classification-and-segmentation task involving tens of thousands of variables, and show that it outperforms state-of-the-art algorithms for general submodular function minimization by several orders of magnitude.

4.2 Background on Submodular Function Minimization

We are interested in minimizing set functions that map subsets of some base set $E$ to real numbers; i.e., given $f: 2^E \to \mathbb{R}$ we wish to solve for $A^* \in \arg\min_A f(A)$. For simplicity of notation, we use the base set $E = \{1, \dots, n\}$, but in an application the base set may consist of nodes of a graph, pixels of an image, etc. Without loss of generality, we assume $f(\emptyset) = 0$. If the function $f$ has no structure, then there is no way to solve the problem other than checking all $2^n$ subsets. In this chapter, we consider functions that satisfy a key property that arises in many applications: submodularity (cf. [Lov83]). A set function $f$ is called submodular if and only if, for all $A, B \in 2^E$, we have

$f(A \cup B) + f(A \cap B) \le f(A) + f(B).$ (4.2.1)

Submodular functions can alternatively, and perhaps more intuitively, be characterized in terms of their discrete derivatives. First, we define $\Delta_c f(A) = f(A \cup \{c\}) - f(A)$ to be the discrete derivative of $f$ with respect to $c \in E$ at $A$; intuitively, this is the change in $f$'s value from adding the element $c$ to the set $A$. Then, $f$ is submodular if and only if:

$\Delta_c f(A) \ge \Delta_c f(B)$, for all $A \subseteq B \subseteq E - c$.

Note the analogy to concave functions; the discrete derivative is smaller for larger sets, in the same way that $\phi(x+h) - \phi(x) \ge \phi(y+h) - \phi(y)$ for all $x \le y$, $h \ge 0$ if and only if $\phi$ is a concave function on $\mathbb{R}$. Thus a simple example of a submodular function is $f(A) = \phi(|A|)$ where $\phi$ is any concave function. Yet despite this connection to concavity, minimizing a submodular function can be done in polynomial time, whereas maximizing it is NP-hard. This is similar to the fact that it is easier to minimize a convex function than to maximize one. One explanation for this is that submodular minimization can be reformulated as a convex minimization problem. To see this, consider taking a set function minimization problem and reformulating it as a minimization problem over the unit cube $[0,1]^n \subset \mathbb{R}^n$. Define $\mathbf{1}_A \in \mathbb{R}^n$ to be the indicator vector of the set $A$, i.e., $\mathbf{1}_A[e] = 0$ if $e \notin A$ and $\mathbf{1}_A[e] = 1$ if $e \in A$. We use the notation $x[e]$ for the component of the vector $x$ corresponding to element $e$. Also, we drop brackets and commas in subscripts, so $\mathbf{1}_{ab} = \mathbf{1}_{\{a,b\}}$ and $\mathbf{1}_a = \mathbf{1}_{\{a\}}$. For any vector $\lambda \in \mathbb{R}^n$, we use the shorthand $\lambda(A)$ for the corresponding modular function:

$\lambda(A) := \sum_{a \in A} \lambda[a] = \langle \mathbf{1}_A, \lambda \rangle$

A continuous extension of a set function is a function of a continuous argument that interpolates the values of the set function at the corners of the unit cube. That is, we should have $[0,1]^n \subseteq \operatorname{dom} \tilde{f} \subseteq \mathbb{R}^n$ and $\tilde{f}(\mathbf{1}_A) = f(A)$. Furthermore, we do not want the minimum of the continuous function to be completely unrelated to the minimum of the set function. That is, if a particular set is a minimizer of the set function, then that should necessarily imply that the corresponding indicator vector is a minimizer of the continuous extension over the unit cube:

$A^* \in \arg\min_{A \in 2^E} f(A) \;\Rightarrow\; \mathbf{1}_{A^*} \in \arg\min_{x \in [0,1]^n} \tilde{f}(x).$ (4.2.2)

A key result due to Lovász [Lov83] states that each submodular function $f$ has an extension $\tilde{f}$ that not only satisfies the above property, but is also convex and efficient to evaluate. We can define the Lovász extension in terms of the base polytope $B_f$:

$\tilde{f}(x) = \max_{\lambda \in B_f} \langle \lambda, x \rangle, \qquad B_f = \{\lambda \in \mathbb{R}^n \mid \lambda(E) = f(E),\ \lambda(A) \le f(A) \text{ for all } A \in 2^E\}.$

The base polytope $B_f$ is defined by exponentially many inequalities, and evaluating $\tilde{f}$ apparently requires solving a linear program over this polyhedron. Perhaps surprisingly, as shown by Lovász, $\tilde{f}$ can be computed very efficiently as follows. For a fixed $x \in \mathbb{R}^n$, let $\pi: E \to E$ be a permutation such that $x[\pi_1] \ge \dots \ge x[\pi_n]$, and then define the sets $\pi^k = \{\pi_1, \dots, \pi_k\}$. Then we have a formula for $\tilde{f}$ and a subgradient:

$\tilde{f}(x) = \sum_{k=1}^n x[\pi_k]\,(f(\pi^k) - f(\pi^{k-1})), \qquad \partial \tilde{f}(x) \ni \sum_{k=1}^n \mathbf{1}_{\pi_k}\,(f(\pi^k) - f(\pi^{k-1})).$ (4.2.3)
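Equation 4.2.3 translates directly into code: sort the components of $x$, then accumulate the marginal values of $f$ along the resulting chain of sets. A short Python sketch (the names are ours; `f` is a callable on frozensets with $f(\emptyset) = 0$):

```python
import numpy as np

def lovasz_extension(f, x):
    """Evaluate the Lovász extension and one subgradient at x (Eq. 4.2.3).

    Sorts x in decreasing order and accumulates marginal differences of
    f along the chain of prefix sets of the sorted order.
    """
    n = len(x)
    pi = sorted(range(n), key=lambda e: -x[e])    # decreasing order of x
    value, subgrad = 0.0, np.zeros(n)
    S, prev_val = set(), 0.0
    for e in pi:
        S.add(e)
        cur_val = f(frozenset(S))
        value += x[e] * (cur_val - prev_val)      # marginal value times x[e]
        subgrad[e] = cur_val - prev_val
        prev_val = cur_val
    return value, subgrad
```

In particular, evaluating at an indicator vector recovers the set function value, as the extension property requires.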

Note that if two components of $x$ are equal, the above formula for $\tilde{f}$ is independent of the permutation chosen, but the subgradient is not unique. Equation 4.2.2 was used to show that submodular minimization can be achieved in polynomial time [Lov83]. However, algorithms which directly minimize the Lovász extension are generally regarded as impractical. Despite being convex, the Lovász extension is non-smooth, and hence a simple subgradient descent algorithm would need $O(1/\varepsilon^2)$ steps to achieve $O(\varepsilon)$ accuracy. Recently, Nesterov showed that if knowledge about the structure of a particular non-smooth convex function is available, it can be exploited to achieve a running time of $O(1/\varepsilon)$

[Nes05]. One way this is done is to construct a smooth approximation of the non-smooth function, and then use an accelerated gradient descent algorithm which is highly effective for smooth functions. Connections of this work with submodularity and combinatorial optimization are also explored in [CN07] and [Bac10]. In fact, in [Bac10], Bach shows that computing the smoothed Lovász gradient of a general submodular function is equivalent to solving a submodular minimization problem. In this chapter, we do not treat general submodular functions, but rather a large class of submodular functions that we call decomposable. (To apply the smoothing technique of [Nes05], special structural knowledge about the convex function is required, so it is natural that we would need special structural knowledge about the submodular function to leverage those results.) We further show that we can exploit the discrete structure of submodular minimization in a way that allows terminating the algorithm early with a certificate of optimality, which leads to dramatic performance improvements.

4.3 The Decomposable Submodular Minimization Problem

In this chapter, we consider the problem of minimizing submodular functions of the following form: X f (A) = c(A) + φj(wj(A)), (4.3.1) j

where $c, w_j \in \mathbb{R}^n$ with $w_j \ge 0$, and the $\phi_j: [0, w_j(E)] \to \mathbb{R}$ are arbitrary one-dimensional concave functions. Without loss of generality, we will assume, by rescaling if necessary, that $\|w_j\|_\infty = 1$. It is shown in Section 4.4.1 that functions of this form are submodular. We call this class of functions decomposable submodular functions, as they decompose into a sum of concave functions applied to nonnegative modular functions. Below, we give examples of decomposable submodular functions arising in applications.

We first focus on the special case where all the concave functions $\phi_j$ are positive multiples of threshold functions $\min(\tau, \cdot)$ for some threshold $\tau > 0$. Since these functions are of key importance, we write $\Psi(\tau, w, A)$ for the submodular function given by thresholding the modular function $w(A)$ at threshold $\tau$:

$\Psi(\tau, w, A) := \min(\tau, w(A))$

We refer to these submodular functions as threshold potentials. In Section 4.6, we will show how to generalize our approach to arbitrary decomposable submodular functions.

Examples. The simplest example is a 2-potential, which has the form $\phi(|A \cap \{b, c\}|)$, where $\phi(1) - \phi(0) \ge \phi(2) - \phi(1)$. It can be expressed as a sum of a modular function and a threshold potential:

$\phi(|A \cap \{b, c\}|) = \phi(0) + (\phi(2) - \phi(1))\,\langle \mathbf{1}_A, \mathbf{1}_{bc} \rangle + (2\phi(1) - \phi(0) - \phi(2))\,\Psi(1, \mathbf{1}_{bc}, A)$
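The identity above is easy to verify exhaustively on a small ground set. The following Python snippet is our own check, with a hypothetical concave $\phi$ chosen for illustration:

```python
import itertools

def two_potential(phi, A, b, c):
    """phi(|A ∩ {b, c}|) evaluated directly."""
    return phi(len(A & {b, c}))

def two_potential_decomposed(phi, A, b, c):
    """Constant plus modular part plus one unit-threshold potential."""
    k = len(A & {b, c})                   # <1_A, 1_bc>
    threshold = min(1, k)                 # Psi(1, 1_bc, A)
    return (phi(0) + (phi(2) - phi(1)) * k
            + (2 * phi(1) - phi(0) - phi(2)) * threshold)

# a concave cardinality profile: phi(1) - phi(0) >= phi(2) - phi(1)
phi = lambda k: [0.0, 3.0, 4.0][k]

# check the identity on every subset of a small ground set
ok = all(
    abs(two_potential(phi, set(A), 1, 2)
        - two_potential_decomposed(phi, set(A), 1, 2)) < 1e-12
    for r in range(5) for A in itertools.combinations(range(4), r)
)
```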

An example application where such functions arise is in image classification schemes such as [SWRC09]. In order to compute the maximum a posteriori configuration of a pairwise Markov Random Field model, one must minimize a sum of these potentials. At a high level, such an algorithm computes a value $p[b]$ that corresponds to the log-likelihood of pixel $b$ being of one class vs. another, and for each pair of adjacent pixels, a value $r_{bc}$ related to the log-likelihood that pixels $b$ and $c$ are of the same class. Then the algorithm classifies pixels by minimizing a sum of 2-potentials: $f(A) = c(A) + \sum_{b,c} r_{bc}(1 - |1 - \langle \mathbf{1}_A, \mathbf{1}_{bc} \rangle|)$. If the value $r_{bc}$ is large, this encourages the pixels $b$ and $c$ to be classified similarly. More generally, consider a higher order potential function: a concave function $\phi(|A \cap D|)$ of the number of elements in some activation set $D$. It can be shown that this can be written as a sum of a modular function and a positive linear combination of $|D| - 1$ threshold potentials with Boolean weights ($w_j \in \{0,1\}^n$). However, threshold potentials with nonuniform weights are strictly more general than those with Boolean weights. That is, there exist $\tau$ and $w$ such that $\Psi(\tau, w, A)$ cannot be expressed as $\sum_j \phi_j(|D_j \cap A|)$ for any collection of concave $\phi_j$ and sets $D_j$. Recent work [KLT09] has shown that classification performance can be improved by adding terms corresponding to such higher order potentials $\phi_j(|D_j \cap A|)$ to the objective function, where the functions $\phi_j$ are piecewise linear concave functions, and the regions $D_j$, of various sizes, are generated from a segmentation algorithm. Minimization of these particular potential functions can then be reformulated as a graph cut problem [KKT07], but this is less general than our approach. Another canonical example of a submodular function is a set cover function.
Such a function can be reformulated as a combination of threshold potentials with Boolean weights and unit thresholds (details in Section 4.4.2). Thus all functions that are weighted combinations of set cover functions can be expressed as sums of threshold potentials.

Another example of decomposable functions arises in multiclass queueing systems [II07]. These are of the form $f(A) = u(A)\phi(v(A))$, where $u, v$ are nonnegative weight vectors and $\phi$ is a nonpositive nonincreasing concave function. We show in Section 4.4.3 that, with the proper choice of $\phi_j$ and $w_j$, this can in fact be reformulated as a sum of the type in Equation 4.3.1 with $n$ terms. In our own experiments, shown in Section 4.7, we use an implementation of TextonBoost

[SWRC09] and augment it with quadratic higher order potentials. That is, we use TextonBoost to generate per-pixel scores $c$, and then minimize $f(A) = c(A) + \sum_j |A \cap D_j|\,|D_j \setminus A|$, where the regions $D_j$ are regions of pixels that we expect to be of the same class (e.g., by running a cheap region-growing heuristic). The potential function $|A \cap D_j|\,|D_j \setminus A|$ is smallest when $A$ contains all of $D_j$ or none of it, and it gives the largest penalty when exactly half of $D_j$ is contained in $A$. This encourages the classification scheme to classify most of the pixels in a region $D_j$ the same way. We generate regions with a basic region-growing algorithm with random seeds. See Figure 4.1a for an illustration of examples of regions that we use. In our experience, this simple idea of using higher-order potentials can dramatically increase the quality of the classification over one using only 2-potentials, as can be seen in Figure 4.2.

4.4 Classification of Submodular Functions

In this section we explain why decomposable functions are submodular, describe some examples of decomposable functions, and show that decomposable functions are more general than some other classes of submodular functions that have been studied.

4.4.1 Submodularity of Decomposable Functions

Since the sum of submodular functions is submodular, we need only prove the submodularity of $f(A) = \phi(w(A))$, where $\phi$ is an arbitrary one-dimensional concave function and $w \ge 0$. By definition of concavity, for all $\theta \in [0,1]$, we have:

$\phi(\theta(y+h) + (1-\theta)x) + \phi((1-\theta)(y+h) + \theta x) \ge \phi(y+h) + \phi(x)$

If $x \le y$ and $h \ge 0$, then setting $\theta = h/(y - x + h)$ in the above gives us:

$\phi(x+h) - \phi(x) \ge \phi(y+h) - \phi(y)$ (4.4.1)

Then, for all $c \in E \setminus A$, we compute the discrete derivative $\Delta_c f(A)$:

$\Delta_c f(A) = \phi(w(A) + w[c]) - \phi(w(A))$ (4.4.2)

Thus if $A \subseteq B \subseteq E$ and $c \in E \setminus B$, then $w(A) \le w(B)$, so by Equations 4.4.1 and 4.4.2, $\Delta_c f(A) \ge \Delta_c f(B)$, and hence $f$ is submodular.
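The claim can also be sanity-checked numerically. The snippet below (our own check) brute-forces Equation 4.2.1 for $f(A) = \sqrt{w(A)}$ on a small ground set, with weights chosen arbitrarily for illustration, and confirms that a convex function of cardinality fails the same test:

```python
import itertools
import math

def is_submodular(f, n):
    """Brute-force check of Equation 4.2.1 on every pair of subsets."""
    subsets = [frozenset(c) for r in range(n + 1)
               for c in itertools.combinations(range(n), r)]
    return all(f(A | B) + f(A & B) <= f(A) + f(B) + 1e-12
               for A in subsets for B in subsets)

# a concave function of a nonnegative modular function is submodular
w = [0.5, 1.0, 0.25, 0.8]
f = lambda A: math.sqrt(sum(w[e] for e in A))
```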

4.4.2 Set Cover Functions as Threshold Potentials

Suppose we are given a finite collection of finite sets $\mathcal{C} = \{C_e\}_{e \in E} \subseteq 2^J$, where $E$ is an index set for the collection, and $J$ is some universal set in which all sets of the collection live. The collection $\mathcal{C}$ then defines a set cover function—the cardinality of the union of a subcollection. We represent subcollections by sets of indices $A \in 2^E$.

$2^E \ni A \mapsto f(A) := \left|\bigcup_{e \in A} C_e\right|$

For every $j \in J$, we define the vector $w_j \in \{0,1\}^{|E|}$ as the indicator vector of the subcollection of sets that contain $j$. That is, $w_j[e] = 1$ if $j \in C_e$ and $w_j[e] = 0$ if $j \notin C_e$. We claim:

$f(A) = \sum_{j \in J} \min(1, w_j(A))$

Each term in the sum equals 1 if $j \in \bigcup_{e \in A} C_e$ and 0 otherwise. Thus summing over all $j$ must give the cardinality of $\bigcup_{e \in A} C_e$, which is exactly the set cover function. Hence we conclude that all set cover functions can be represented as sums of threshold potentials with Boolean-valued weight vectors and unit thresholds. When $n \ge 3$, this is strictly less general than potentials with Boolean weights and nonunit thresholds. For example,

$f(A) = \min(2, |A|) = \Psi(2, \mathbf{1}, A)$ cannot be represented as a nonnegative combination of set cover functions.

4.4.3 Reformulation of a Class of Functions

Another example of decomposable functions is given by the problems under consideration in [II07], which have the following form:

$f(A) = u(A)\,\phi(v(A))$

where $u, v$ are nonnegative weight vectors and $\phi$ is a nonincreasing concave function.

Suppose we can construct a vector $w_e$ for each $e \in E$ and a concave function $\phi'$ such that the following holds for all $A \in 2^E$:

$\phi'(w_e(A)) = \phi(v(A)) - \phi(0)$ if $e \in A$, and $\phi'(w_e(A)) = 0$ if $e \in E \setminus A$. (4.4.3)

Then we claim the following is an equivalent formulation for f in decomposable form:

$g(A) = \langle \mathbf{1}_A, \phi(0)u \rangle + \sum_{e \in E} u[e]\,\phi'(w_e(A))$ (4.4.4)

Indeed, plugging Equation 4.4.3 into the above gives:

$g(A) = \langle \mathbf{1}_A, \phi(0)u \rangle + \sum_{e \in A} u[e]\,(\phi(v(A)) - \phi(0)) = f(A)$

To satisfy Equation 4.4.3, we define $\phi'$ as follows:

$\phi'(t) = 0$ if $t \le v(E)$, and $\phi'(t) = \phi(t - v(E)) - \phi(0)$ if $t > v(E)$,

and we let $w_e = v + v(E)\mathbf{1}_e$. It is straightforward to check that these definitions satisfy Equation 4.4.3. Note that $\phi'$ is concave because $\phi$ is nonincreasing and concave. Incidentally, the decomposition in Equation 4.4.4 proves that $f$ is submodular.

4.4.4 Strict Generality of Threshold Potentials

Any concave cardinality function can be decomposed into the sum of several threshold potentials with Boolean weight vectors. Equivalently, it is representable as a nonnegative combination of set cover functions.

$\phi(|A \cap B|) = \phi(0) + (\phi(|B|) - \phi(|B|-1))\,\langle \mathbf{1}_A, \mathbf{1}_B \rangle + \sum_{k=1}^{|B|-1} (2\phi(k) - \phi(k-1) - \phi(k+1))\,\min(k, \langle \mathbf{1}_A, \mathbf{1}_B \rangle)$

Since $\phi$ is concave, the coefficients $(2\phi(k) - \phi(k-1) - \phi(k+1))$ are nonnegative. So without loss of generality, any sum of concave cardinality functions can be expressed as a sum of a modular function and a positive combination of threshold potentials with Boolean weights.

$f(A) = p_0 + p_1(A) + \sum_{\substack{B \in 2^E \\ |B| \ge 2}} \sum_{k=1}^{|B|-1} r(B,k)\,\min(k, \langle \mathbf{1}_A, \mathbf{1}_B \rangle)$ (4.4.5)

There are $\sum_{k=2}^n \binom{n}{k}(k-1) = 1 + 2^{n-1}(n-2)$ coefficients $r(B,k)$, and they must all be nonnegative. If $n$ is small enough, we can evaluate all $2^n$ values of a set function and use a linear program solver to find a representation of the above form, if one exists. When $n = 4$, this is an LP feasibility problem with 5 unconstrained variables (the offset $p_0$ and the modular part $p_1$), 17 nonnegative variables (the coefficients $r(B,k)$), and 16 equality constraints (the values of $f$). We discovered that the threshold potential $f(A) = \min(\tau, w(A))$ with $w = [1/2, 1/2, 1/2, 1]$ and $\tau = 1$ does not admit a feasible solution to Equation 4.4.5. Note that any decomposable function can be represented as a finite sum of threshold potentials. This is because $w(A)$ can only take finitely many values, and for any concave function, we can construct a positive combination of threshold potentials that agrees with the concave function on those values. For any one-dimensional function $\phi$ and any finite set of arguments $z_1 < \dots < z_m$, define the piecewise linear interpolant $\bar{\phi}$:

$\bar{\phi}(z) = p_0 + p_1 z + \sum_{i=2}^{m-1} r_i \min(z, z_i)$ (4.4.6)

$p_0 = \phi(z_1) - z_1 \frac{\phi(z_2) - \phi(z_1)}{z_2 - z_1}, \qquad p_1 = \frac{\phi(z_m) - \phi(z_{m-1})}{z_m - z_{m-1}},$

$r_i = \frac{\phi(z_i) - \phi(z_{i-1})}{z_i - z_{i-1}} - \frac{\phi(z_{i+1}) - \phi(z_i)}{z_{i+1} - z_i} \quad \text{for } i = 2, \dots, m-1$ (4.4.7)

It is straightforward to check that $\bar{\phi}(z_i) = \phi(z_i)$ for all $z_i$, which justifies our claim that $\bar{\phi}$ is an interpolant for $\phi$. By construction in Equation 4.4.6, $\bar{\phi}(z)$ is a linear function plus a linear combination of threshold functions. The coefficients $r_i$ in Equation 4.4.7 are nonnegative for all choices of $z_i$ if and only if $\phi$ is concave. Hence to represent a general concave potential as a sum of threshold potentials, we take $\{z_i\}_{i=1}^m = \{w(A) \mid A \in 2^E\}$, where $m$ is the number of possible values of $w(A)$. Thus $\bar{\phi}(w(A)) = \phi(w(A))$ for all $A \in 2^E$. However, for general weight vectors, $m$ could equal $2^n$. This means that to use the formula in Equation 4.4.6, we may need exponentially many threshold potentials to represent the function $\phi(w(A))$ exactly. So even though general concave functions do not induce a strictly more general class of submodular functions, they can provide a much more compact representation for functions in this class.
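The coefficients of Equations 4.4.6 and 4.4.7 can be computed in a few lines; the following is our own Python sketch of that construction:

```python
def interpolant_coeffs(phi, z):
    """Coefficients of Equation 4.4.6 for breakpoints z[0] < ... < z[m-1].

    Returns (p0, p1, r), where r[i] is the weight on min(z, z[i]) for
    the interior breakpoints i = 1, ..., m-2 (0-indexed here).
    """
    m = len(z)
    slope = lambda i, j: (phi(z[j]) - phi(z[i])) / (z[j] - z[i])
    p1 = slope(m - 2, m - 1)                 # slope of the final segment
    p0 = phi(z[0]) - z[0] * slope(0, 1)
    # r_i: drop in slope across breakpoint i (nonnegative iff phi concave)
    r = {i: slope(i - 1, i) - slope(i, i + 1) for i in range(1, m - 1)}
    return p0, p1, r

def interpolant_eval(p0, p1, r, z, t):
    """Evaluate the piecewise linear interpolant at t."""
    return p0 + p1 * t + sum(ri * min(t, z[i]) for i, ri in r.items())
```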

4.5 The SLG Algorithm for Threshold Potentials

We now present our algorithm for efficient minimization of a decomposable submodular function f based on smoothed convex minimization. We first show how we can efficiently smooth the Lovász extension of f . We then apply accelerated gradient descent to the gradient of the smoothed function. Lastly, we demonstrate how we can often obtain a certificate of optimality that allows us to stop early, drastically speeding up the algorithm in practice.

4.5.1 The Smoothed Extension of a Threshold Potential

The key challenge in our algorithm is to efficiently smooth the Lovász extension of $f$, so that we can resort to algorithms for accelerated convex minimization. We now show how we can efficiently smooth the threshold potentials $\Psi(\tau, w, A) = \min(\tau, w(A))$ of Section 4.3, which are simple enough to allow efficient smoothing, but rich enough in combination to express a large class of submodular functions. We denote the base polytope of a single threshold potential $\Psi(\tau, w, \cdot)$ by $B_{\Psi(\tau,w,\cdot)} = \mathcal{B}(\tau, w)$. The Lovász extension of $\Psi(\tau, w, \cdot)$ is thus:

$\tilde{\Psi}(\tau, w, x) = \max_{\lambda \in \mathcal{B}(\tau,w)} \langle x, \lambda \rangle$ (4.5.1)

$\mathcal{B}(\tau, w) := \{\lambda \in \mathbb{R}^n \mid \langle \mathbf{1}, \lambda \rangle = \tau,\ 0 \le \lambda \le w\}$

We define a smoothed Lovász extension by taking the infimal convolution with the quadratic function $\frac{1}{2\mu}\|x\|^2$, where $\mu$ is a scale parameter.

$\tilde{\Psi}^\mu(\tau, w, x) := \tilde{\Psi}(\tau, w, x) \,\square\, \frac{1}{2\mu}\|x\|^2 = \min_{z \in \mathbb{R}^n} \tilde{\Psi}(\tau, w, z) + \frac{1}{2\mu}\|x - z\|^2 = \max_{\lambda \in \mathcal{B}(\tau,w)} \langle x, \lambda \rangle - \frac{\mu}{2}\|\lambda\|^2$ (4.5.2)

To give a simple justification for this definition, we note that for small $\mu$, this function approximates the nonsmooth function uniformly in $x$. We can take an optimal solution to Equation 4.5.1 and use it in Equation 4.5.2 to get the following bound:

$0 \le \tilde{\Psi}(\tau, w, x) - \tilde{\Psi}^\mu(\tau, w, x) \le \mu\tau/2 \quad \text{for all } x \in \mathbb{R}^n$

Here we use the fact that $\|\lambda\|^2 \le \|\lambda\|_1 \|\lambda\|_\infty \le \tau$ for all $\lambda \in \mathcal{B}(\tau, w)$, since we assumed $\|w\|_\infty = 1$. We conclude that the smoothed extension converges uniformly to the nonsmooth one as $\mu \to 0$. To compute this function, we note that the maximization in Equation 4.5.2 is equivalent to the projection of $x/\mu$ onto $\mathcal{B}(\tau, w)$. Due to the simple structure of the base polytope, this can be solved efficiently. The optimal base $\lambda^*$ is also the gradient of the smoothed extension:

$\nabla \tilde{\Psi}^\mu(\tau, w, x) = \arg\max_{\lambda \in \mathcal{B}(\tau,w)} \langle x, \lambda \rangle - \frac{\mu}{2}\|\lambda\|^2 = \Pi_{\mathcal{B}(\tau,w)}(x/\mu).$ (4.5.3)

Recall that $\Pi_C(x) := \arg\min_{x' \in C} \|x - x'\|$ denotes the projection of $x$ onto the convex set $C$. To solve for $\lambda^*$, we form the Lagrangian and derive the dual problem:

$\tilde{\Psi}^\mu(\tau, w, x) = \min_{s \in \mathbb{R},\, y_1, y_2 \ge 0}\ \max_{\lambda \in \mathbb{R}^n}\ \langle x, \lambda \rangle - \frac{\mu}{2}\|\lambda\|^2 + \langle y_1, \lambda \rangle + \langle y_2, w - \lambda \rangle + s(\tau - \langle \mathbf{1}, \lambda \rangle)$

$= \min_{s \in \mathbb{R},\, y_1, y_2 \ge 0}\ \frac{1}{2\mu}\|x - s\mathbf{1} + y_1 - y_2\|^2 + \langle y_2, w \rangle + s\tau.$

In the second step, the optimal $\lambda = \frac{1}{\mu}(x - s\mathbf{1} + y_1 - y_2)$. By strong duality, if the variables $s, y_1, y_2$ are optimal for the dual problem, then this $\lambda$ must also be optimal. If we fix $s$, then we can solve for the optimal variables $y_1^*$ and $y_2^*$ componentwise. So we have:

$y_1^* = \max(0,\, s^*\mathbf{1} - x)$
$y_2^* = \max(0,\, x - s^*\mathbf{1} - \mu w)$
$\Rightarrow\ \lambda^* = \max(0, \min(w, (x - s^*\mathbf{1})/\mu)).$ (4.5.4)

This expresses λ∗ as a function of the unknown optimal variable s∗. For the simple case of

2-potentials, we can solve for s∗ explicitly and get a closed form expression for the gradient of the smoothed Lovász extension:

$\nabla \tilde{\Psi}^\mu(1, \mathbf{1}_{bc}, x) = \begin{cases} \mathbf{1}_b & \text{if } x[b] \ge x[c] + \mu \\ \mathbf{1}_c & \text{if } x[c] \ge x[b] + \mu \\ \frac{1}{2}\left(\mathbf{1}_{bc} + \frac{1}{\mu}(x[b] - x[c])(\mathbf{1}_b - \mathbf{1}_c)\right) & \text{if } |x[b] - x[c]| < \mu \end{cases}$

For higher dimensions, it is easier to compute $s^*$ implicitly from the constraint $\langle \mathbf{1}, \lambda^* \rangle = \tau$. We define $\rho^\mu_{x,w}(s)$ to be the value of $\langle \mathbf{1}, \lambda(s) \rangle$, where $\lambda(s)$ is given by the formula of Equation 4.5.4.

$\rho^\mu_{x,w}(s) := \sum_{e \in E} \max(0, \min(w[e], (x[e] - s)/\mu))$

This function is a monotonic, continuous, piecewise linear function of $s$, so one approach is to use a simple root-finding algorithm to solve $\rho^\mu_{x,w}(s^*) = \tau$. Additionally, in Section 5.4, we outline two algorithms for solving this problem. First, we derive an explicit formula that requires sorting $2n$ elements to compute. Second, we show how it is possible to find $s^*$ with a linear number of operations (in expectation) via a randomized algorithm.
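A simple bisection on $\rho^\mu_{x,w}$ already suffices to compute the smoothed gradient of a single threshold potential via Equation 4.5.4. The sketch below is our own (it assumes $0 < \tau \le w(E)$, so that a root exists); it brackets the root and halves the interval:

```python
import numpy as np

def smoothed_threshold_grad(tau, w, x, mu, iters=100):
    """Gradient of the smoothed threshold extension (Equation 4.5.4).

    Computes lambda* = max(0, min(w, (x - s)/mu)), where s solves
    rho(s) = tau by bisection; rho is continuous, piecewise linear,
    and nonincreasing in s.
    """
    rho = lambda s: np.sum(np.maximum(0.0, np.minimum(w, (x - s) / mu)))
    # bracket the root: rho(lo) = w(E) >= tau and rho(hi) = 0 <= tau
    lo, hi = np.min(x) - mu * np.max(w) - tau, np.max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho(mid) >= tau:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return np.maximum(0.0, np.minimum(w, (x - s) / mu))
```

The sorting-based and randomized algorithms of Section 5.4 replace this bisection loop with exact computations of $s^*$.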

4.5.2 The SLG Algorithm for Minimizing Sums of Threshold Potentials

Stepping beyond a single threshold potential, we now assume that the submodular function to be minimized can be written as a nonnegative linear combination of threshold potentials and a modular function, i.e.,

$f(A) = c(A) + \sum_j r_j \Psi(\tau_j, w_j, A).$ (4.5.5)

So the smoothed Lovász extension of f , and its gradient (cf. Equation 4.5.3) are given by:

$\tilde{f}^\mu(x) = \langle x, c \rangle + \sum_j r_j \tilde{\Psi}^\mu(\tau_j, w_j, x),$ (4.5.6)

$\nabla \tilde{f}^\mu(x) = c + \sum_j r_j \Pi_{\mathcal{B}(\tau_j, w_j)}(x/\mu)$

We can use the accelerated gradient descent algorithm of [Nes05] to minimize this function. This algorithm requires that the smoothed objective have a Lipschitz continuous gradient; that is, for some constant $L$, it must hold that $\|\nabla \tilde{f}^\mu(x) - \nabla \tilde{f}^\mu(y)\| \le L\|x - y\|$ for all $x, y \in \mathbb{R}^n$. Fortunately, by construction, the smoothed threshold extensions $\tilde{\Psi}^\mu(\tau_j, w_j, x)$ all have $1/\mu$-Lipschitz gradients, a direct consequence of the characterization in Equation 4.5.3. Hence we have a loose upper bound for the Lipschitz constant of $\tilde{f}^\mu$: $L \le D/\mu$, where $D = \sum_j r_j$. Furthermore, the smoothed threshold extensions approximate the threshold extensions uniformly: $|\tilde{\Psi}^\mu(\tau_j, w_j, x) - \tilde{\Psi}(\tau_j, w_j, x)| \le \frac{\mu \tau_j}{2}$ for all $x$, so $|\tilde{f}^\mu(x) - \tilde{f}(x)| \le \frac{\mu D_2}{2}$, where $D_2 = \sum_j r_j \tau_j$. One way to use the smoothed gradient is to specify an accuracy $\varepsilon$, then minimize $\tilde{f}^\mu$ for sufficiently small $\mu$ to guarantee that the solution will also be an approximate minimizer of $\tilde{f}$. Then we simply apply the accelerated gradient descent algorithm of [Nes05]. See also

[BBC11] for a description. Note that the projection onto the unit cube is trivial: $\Pi_{[0,1]^n}(x) = \max(0, \min(\mathbf{1}, x))$. Algorithm 4.1 formalizes our Smoothed Lovász Gradient (SLG) algorithm.

The optimality gap of a smooth convex function at the iterate $y^k$ can be computed from its gradient:

$\text{gap}^k = \max_{x \in [0,1]^n} \langle y^k - x, \nabla \tilde{f}^\mu(y^k) \rangle = \langle y^k, \nabla \tilde{f}^\mu(y^k) \rangle + \langle \mathbf{1}, \max(0, -\nabla \tilde{f}^\mu(y^k)) \rangle.$

In summary, as a consequence of the results of [Nes05], we have the following guarantee about SLG:

Theorem 4.1. SLG is guaranteed to provide an $\varepsilon$-optimal solution after running for $O(\frac{D}{\varepsilon})$ iterations.

Algorithm 4.1: SLG: Smoothed Lovász Gradient
Input: Accuracy $\varepsilon$; decomposable function $f$.
begin
  $\mu = \frac{\varepsilon}{2 D_2}$, $L = \frac{D}{\mu}$, $x^0 = z^0 = \frac{1}{2}\mathbf{1}$;
  for $k = 1, 2, \dots$ do
    $g^k \leftarrow \nabla \tilde{f}^\mu(x^{k-1})/L$;
    $y^k \leftarrow \Pi_{[0,1]^n}(x^{k-1} - g^k)$;
    $z^k \leftarrow \Pi_{[0,1]^n}\!\left(z^0 - \sum_{j=1}^k \frac{j}{2} g^j\right)$;
    if $\text{gap}^k \le \varepsilon/2$ then stop;
    $x^k \leftarrow (k\, y^k + 2 z^k)/(k+2)$;
  $x_\varepsilon \leftarrow y^k$;
Output: $\varepsilon$-optimal $x_\varepsilon$ to $\min_{x \in [0,1]^n} \tilde{f}(x)$
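A compact Python rendering of the scheme can be assembled from the pieces above. This is our own sketch, not the thesis implementation: the smoothed threshold gradients are computed by bisection as in Section 4.5.1, and the iteration follows the update pattern of Algorithm 4.1, falling back to an iteration cap if the gap test never fires.

```python
import numpy as np

def threshold_grad(tau, w, x, mu):
    """Bisection-based gradient of one smoothed threshold extension."""
    rho = lambda s: np.sum(np.maximum(0.0, np.minimum(w, (x - s) / mu)))
    lo, hi = np.min(x) - mu * np.max(w) - tau, np.max(x)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rho(mid) >= tau else (lo, mid)
    return np.maximum(0.0, np.minimum(w, (x - 0.5 * (lo + hi)) / mu))

def slg(c, potentials, eps, n, max_iters=10000):
    """Accelerated projected gradient on the smoothed extension.

    potentials is a list of triples (r_j, tau_j, w_j) as in Eq. 4.5.5.
    """
    D = sum(r for r, _, _ in potentials)
    D2 = sum(r * tau for r, tau, _ in potentials)
    mu = eps / (2 * D2)
    L = D / mu
    grad = lambda x: c + sum(r * threshold_grad(tau, w, x, mu)
                             for r, tau, w in potentials)
    proj = lambda x: np.clip(x, 0.0, 1.0)
    x = z0 = np.full(n, 0.5)
    acc = np.zeros(n)                     # running sum of (j/2) g^j
    for k in range(1, max_iters):
        g = grad(x) / L
        y = proj(x - g)
        acc += k * g / 2
        z = proj(z0 - acc)
        gy = grad(y)
        gap = y @ gy + np.sum(np.maximum(0.0, -gy))   # gap^k over the cube
        if gap <= eps / 2:
            return y
        x = (k * y + 2 * z) / (k + 2)
    return y
```

The returned point is then rounded to a set with Algorithm 4.2.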

SLG is only guaranteed to provide an $\varepsilon$-optimal solution to the continuous optimization problem. Fortunately, once we have an $\varepsilon$-optimal point for the Lovász extension, we can efficiently round it to a set which is $\varepsilon$-optimal for the original submodular function, using Algorithm 4.2. As shown in Equation 4.2.3, the extension $\tilde{f}$ is a convex combination of values of the set function, so it cannot be smaller than the smallest of those values.

Algorithm 4.2: Set Generation by Rounding the Continuous Solution
Input: Vector $x \in [0,1]^n$; submodular function $f$.
begin
  By sorting, find any permutation $\pi$ satisfying: $x[\pi_1] \ge \dots \ge x[\pi_n]$;
  $\pi^k \leftarrow \{\pi_1, \dots, \pi_k\}$;
  $K^* \leftarrow \arg\min_{k \in \{0,1,\dots,n\}} f(\pi^k)$;
  $\mathcal{C} \leftarrow \{\pi^k \mid k \in K^*\}$;
Output: Collection of sets $\mathcal{C}$, such that $f(A) \le \tilde{f}(x)$ for all $A \in \mathcal{C}$
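In code (our own naming), the rounding step costs one sort plus $n + 1$ evaluations of $f$, since only the prefix sets of the sorted order need to be considered:

```python
def round_to_sets(f, x):
    """Algorithm 4.2: return the best prefix sets of the sorted order.

    f maps a frozenset to a float with f(empty) = 0.  Every returned
    set A satisfies f(A) <= f_tilde(x).
    """
    n = len(x)
    pi = sorted(range(n), key=lambda e: -x[e])
    prefixes = [frozenset()]
    for e in pi:
        prefixes.append(prefixes[-1] | {e})   # the chain of prefix sets
    vals = [f(S) for S in prefixes]
    best = min(vals)
    return [S for S, v in zip(prefixes, vals) if v == best], best
```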

4.5.3 Early Stopping based on Discrete Certificates of Optimality

In general, if the minimum of $f$ is not unique, the output of SLG may be in the interior of the unit cube. However, if $f$ admits a unique minimizer $A^*$, then the iterates will tend toward the corner $\mathbf{1}_{A^*}$. Of course, since we are actually interested in solving a discrete problem, it is not necessary to wait for the iterates to converge to the optimal corner. Below, we show that it is possible to use information about the current iterates to check optimality of a set and terminate the algorithm before the continuous problem has converged. To prove optimality of a candidate set $A$, we can use a subgradient of $\tilde{f}$ at $\mathbf{1}_A$, which is also a discrete subgradient for $f$ at $A$. If $\lambda \in \partial f(A)$, then we can compute an optimality gap:

$f(A) - f^* \le \max_{x \in [0,1]^n} \langle \mathbf{1}_A - x, \lambda \rangle = \sum_{a \in A} \max(0, \lambda[a]) + \sum_{b \in E \setminus A} \max(0, -\lambda[b]).$ (4.5.7)

In particular, if $\lambda[a] \le 0$ for all $a \in A$ and $\lambda[b] \ge 0$ for all $b \in E \setminus A$, then $A$ is optimal. With only knowledge of the candidate set $A$, finding a subgradient $\lambda \in \partial f(A)$ which demonstrates optimality may be extremely difficult, as the subdifferential is a polyhedron with exponentially many extreme points. But our algorithm naturally suggests the subgradient we could use: the gradient of the smoothed extension is also a discrete subgradient—provided that a particular condition holds, as described in the following lemma.

Lemma 4.2. Suppose f is a decomposable submodular function with Lovász extension f̃ and smoothed extension f̃^μ as in Equation 4.5.5 and Equation 4.5.6, respectively. If x ∈ R^n and A ∈ 2^E satisfy

\[ \min_{a \in A} x[a] \;\ge\; \mu + \max_{b \in E \setminus A} x[b], \tag{4.5.8} \]
then ∇f̃^μ(x) ∈ ∂f(A).

Proof. By linearity of subdifferentials, it is sufficient to consider the case where f is a single threshold potential. Recall the formula for the gradient:

\[ \lambda = \nabla \tilde\Psi^{\mu}(\tau, w, x) = \max\bigl(0, \min(w, (x - s^*\mathbf{1})/\mu)\bigr), \qquad s^* \text{ chosen so that } \langle \mathbf{1}, \lambda \rangle = \tau. \]

There are two possibilities to consider. First, if min_{a∈A} x[a] − μ ≤ s*, then Equation 4.5.8 implies that for all b ∈ E∖A we have x[b] ≤ s*, and thus λ[b] = 0. This means that λ(E∖A) = 0 and so λ(A) = λ(E) = τ. So clearly λ is a maximizing base for the set A. That is, λ ∈ argmax_{λ′ ∈ B_{(τ,w)}} ⟨1_A, λ′⟩ = ∂Ψ̃(τ, w, 1_A) ⊆ ∂Ψ(τ, w, A).

On the other hand, if min_{a∈A} x[a] − μ > s*, then for all a ∈ A we have (x[a] − s*)/μ ≥ 1 ≥ w[a], and thus λ[a] = w[a]. Therefore we must have λ(A) = w(A), and since λ′(A) ≤ w(A) for any λ′ ∈ B_{(τ,w)}, again it is true that λ is a maximizing base for A.

Lemma 4.2 states that if the components of the point x corresponding to elements of A are all larger than all the other components by at least μ, then the gradient at x is a subgradient for f at 1_A (which by Equation 4.5.7 allows us to compute an optimality gap). In practice, this separation of components occurs naturally as the iterates move in the direction of the point 1_A, long before they ever actually reach the point 1_A. But even if the components are not separated, we can easily add a positive multiple of 1_A to separate them and then compute the gradient there to get an optimality gap. In summary, we have the following algorithm to check the optimality of a candidate set:

Algorithm 4.3: Set Optimality Check
Input: Set A; decomposable function f; scale μ; x ∈ R^n.
begin
    γ ← μ + max_{a ∈ A, b ∈ E∖A} (x[b] − x[a]);
    g ← ∇f̃^μ(x + γ1_A);
    δ ← Σ_{a ∈ A} max(0, g[a]) + Σ_{b ∈ E∖A} max(0, −g[b]);
Output: Optimality gap δ ≥ f(A) − f*

Of critical importance is how to choose the candidate set A. By Equation 4.5.7, for a set to be optimal, we want the components of the gradient ∇f̃^μ(x + γ1_A)[a] to be nonpositive for a ∈ A and nonnegative for b ∈ E∖A. So it is natural to choose A = {a ∈ E | ∇f̃^μ(x)[a] ≤ 0}. Thus, if adding γ1_A does not change the signs of the components of the gradient, then, in fact, we have found an optimal set. We have found this stopping criterion to be quite effective in practice, and we use it in all of our experiments.
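The gap computation in Algorithm 4.3 is cheap once a discrete subgradient is in hand. A small Python sketch (our own illustration; `lam` is assumed to be any subgradient in ∂f(A), e.g. the smoothed gradient evaluated at x + γ1_A):

```python
import numpy as np

def optimality_gap(A, lam):
    """Upper bound on f(A) - f* from a subgradient lam of f at A,
    following Equation 4.5.7: positive parts of lam on A, plus
    positive parts of -lam off A."""
    n = len(lam)
    in_A = np.zeros(n, dtype=bool)
    in_A[list(A)] = True
    return np.maximum(0.0, lam[in_A]).sum() + np.maximum(0.0, -lam[~in_A]).sum()
```

For a modular function f(A) = Σ_{a∈A} c[a], the vector c is a subgradient at every A, and the bound is tight, which makes a convenient sanity check.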

4.6 Extension to General Concave Potentials

To extend our algorithm to work on general concave functions, we note that an arbitrary smooth concave function can be expressed as an integral of threshold potential functions min(x, y). Informally, this is because \( \frac{d^2}{dy^2}\min(x,y) = -\delta(y-x) \), the negative Dirac delta. (To see why, note that min(x, y) is a piecewise linear function of y with a jump of −1 in its derivative at y = x.) So, ignoring boundary terms and integrating by parts, we should expect that
\[ -\int \phi''(y)\,\min(x,y)\,dy \;=\; \int \phi'(y)\,\frac{d}{dy}\min(x,y)\,dy \;=\; \int \phi(y)\,\delta(y-x)\,dy \;=\; \phi(x). \]
We state this formally in the following Lemma:

Lemma 4.3. For φ ∈ C²([0, T]),
\[ \phi(x) = \phi(0) + \phi'(T)\,x - \int_0^T \min(x, y)\,\phi''(y)\,dy, \qquad \forall x \in [0, T]. \]

Proof. This is a straightforward application of integration by parts:

\begin{align*}
\int_0^T \min(x,y)\,\phi''(y)\,dy &= \int_0^x y\,\phi''(y)\,dy + \int_x^T x\,\phi''(y)\,dy \\
&= \bigl[\,y\phi'(y) - \phi(y)\,\bigr]_0^x + \bigl[\,x\phi'(y)\,\bigr]_x^T \\
&= x\phi'(x) - \phi(x) + \phi(0) + x\phi'(T) - x\phi'(x) \\
&= \phi(0) + x\phi'(T) - \phi(x).
\end{align*}
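As a quick numerical sanity check of Lemma 4.3, one can verify the identity for a concrete concave function. Here we use φ(t) = −t² on [0, T] (a toy choice of ours, with φ′(t) = −2t and φ″(t) = −2) and a simple trapezoidal quadrature:

```python
import numpy as np

# Check phi(x) = phi(0) + phi'(T) x - \int_0^T min(x, y) phi''(y) dy
# for phi(t) = -t^2 on [0, T].
T, x = 3.0, 1.25
y = np.linspace(0.0, T, 200001)
fy = np.minimum(x, y) * (-2.0)                      # integrand min(x, y) * phi''(y)
integral = ((fy[:-1] + fy[1:]) / 2.0 * np.diff(y)).sum()  # trapezoid rule
lhs = -x**2                                          # phi(x)
rhs = 0.0 + (-2.0 * T) * x - integral                # phi(0) + phi'(T) x - integral
assert abs(lhs - rhs) < 1e-6
```

The trapezoid rule is exact on the linear pieces of the integrand, so the only quadrature error comes from the single interval containing the kink at y = x.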

This Lemma motivates our definition of the smoothed Lovász extension for a general sum of concave potentials. If f is decomposable as in Equation (4.3.1), we define the smoothed extension as:

\[ \tilde f^{\,\mu}(x) := \langle c, x \rangle + \sum_j \Bigl( \phi_j(0) + \phi_j'(T_j)\,\langle x, w_j \rangle - \int_0^{T_j} \tilde\Psi^{\mu}(\tau, w_j, x)\,\phi_j''(\tau)\,d\tau \Bigr). \tag{4.6.1} \]

Here T_j := w_j(E). By Lemma 4.3, as μ → 0 we have f̃^μ → f̃ pointwise, since Ψ̃^μ → Ψ̃. We can apply SLG to f̃^μ essentially unchanged; the conditions for optimality still hold, and so on. Conceptually, we just use a different smoothed gradient, but calculating it is more involved: we need to compute integrals of the form ∫₀^{T_j} ∇Ψ̃^μ(τ, w_j, x) φ″_j(τ) dτ. Since the components of the gradient ∇Ψ̃^μ(τ, w_j, x) are each piecewise linear functions with respect to τ, we can evaluate the integral by parts. The resulting formula evaluates only φ itself, but not its derivatives.

4.6.1 Formula Derivation

Let f(A) = φ(w(A)) be a general concave potential. For ease of notation, in the following let g(τ) = ∇Ψ̃^μ(τ, w, x) be the gradient of the smoothed extension of a threshold potential. Then by Equation 4.6.1, the gradient of the smoothed extension of f is given by:

\[ \nabla \tilde f^{\,\mu}(x) = \phi'(T)\,w - \int_0^T g(\tau)\,\phi''(\tau)\,d\tau \tag{4.6.2} \]

Note that g is a piecewise linear function of τ. Let [τ_i, τ_{i+1}], with 0 = τ_0 ≤ ... ≤ τ_N = T = w(E), be the intervals on which g is linear, and let g_i := g(τ_i). In particular, g_0 = 0 and g_N = w.

Denote the derivative of g(τ) with respect to τ (wherever it exists) as g′(τ). Within the interval (τ_i, τ_{i+1}) it is:

\[ g'(\tau_i + \varepsilon) = \frac{g_{i+1} - g_i}{\tau_{i+1} - \tau_i} = g'(\tau_{i+1} - \varepsilon) \tag{4.6.3} \]
So then our smoothed gradient can be evaluated:

\begin{align*}
\nabla \tilde f^{\,\mu}(x) &= \phi'(\tau_N)\,w - \sum_{i=1}^{N} \int_{\tau_{i-1}}^{\tau_i} g(\tau)\,\phi''(\tau)\,d\tau \\
&= \phi'(\tau_N)\,w + \sum_{i=1}^{N} g'(\tau_i - \varepsilon)\bigl(\phi(\tau_i) - \phi(\tau_{i-1})\bigr) - \sum_{i=1}^{N} \bigl( g_i \phi'(\tau_i) - g_{i-1}\phi'(\tau_{i-1}) \bigr) \\
&= \phi'(\tau_N)\,w + \sum_{i=1}^{N} \frac{(g_i - g_{i-1})\bigl(\phi(\tau_i) - \phi(\tau_{i-1})\bigr)}{\tau_i - \tau_{i-1}} - \bigl( g_N \phi'(\tau_N) - g_0 \phi'(\tau_0) \bigr) \\
&= \sum_{i=1}^{N} \frac{(g_i - g_{i-1})\bigl(\phi(\tau_i) - \phi(\tau_{i-1})\bigr)}{\tau_i - \tau_{i-1}}.
\end{align*}
Interestingly, this formula no longer requires φ to be differentiable. To compute the breakpoints for τ, note that τ is itself a piecewise linear function of the parameter s, implicitly defined by the equation ⟨1, max(0, min(w, (x − s1)/μ))⟩ = τ. The discontinuities of the derivative occur where x[e] − s equals 0 or μw[e]. Let s_i be the value of s that corresponds to τ_i. For a particular value of s, we say that the component of x indexed by e is an active component if 0 ≤ x[e] − s ≤ μw[e]. Thus the difference g_i − g_{i−1} is nonzero only in the active components, and there it equals (s_i − s_{i−1})/μ. Let A_i be the set of active components for τ ∈ [τ_{i−1}, τ_i]. Then, in that interval we have dτ/ds = |A_i|/μ, and we can simplify a part of the above sum:
\[ \frac{g_i - g_{i-1}}{\tau_i - \tau_{i-1}} = \frac{\mathbf{1}_{A_i}\,(s_i - s_{i-1})/\mu}{(s_i - s_{i-1})\,|A_i|/\mu} = \frac{\mathbf{1}_{A_i}}{|A_i|}. \tag{4.6.4} \]
In summary, the formulas reduce to:

\[ P = \bigl\{ (-x[e],\, e),\; (\mu w[e] - x[e],\, e) \bigr\}_{e \in E} = \{ (s_i, a_i) \}, \qquad \text{where } s_i \le s_{i+1} \tag{4.6.5} \]

\[ A_i = \{a_1\} \,\triangle\, \{a_2\} \,\triangle\, \cdots \,\triangle\, \{a_i\} \tag{4.6.6} \]
(the symmetric difference: each element toggles between active and inactive as its two breakpoints are passed)

\[ \tau_j = \sum_{i=1}^{j} |A_i|\,(s_{i+1} - s_i)/\mu \tag{4.6.7} \]
\[ \nabla \tilde f^{\,\mu}(x) = \sum_{i=1}^{2n-1} \mathbf{1}_{A_i}\, \frac{\phi(\tau_i) - \phi(\tau_{i-1})}{|A_i|} \tag{4.6.8} \]

A naïve implementation would compute and add each vector in turn, and thus require O(n²) operations in the worst case. This is because there is no guarantee that the sets A_i are small; in fact, the entire ground set may be active, in which case 1_{A_i} is fully dense. Despite this, it is possible to compute the sum with only O(n) operations (not counting the cost of sorting). To see how, first define z_j := Σ_{i=1}^{j} (φ(τ_i) − φ(τ_{i−1}))/|A_i|. Then any component of g is a difference z_{j′} − z_j for some indices j, j′; namely, j is the step at which the component activates, and j′ is the step at which it deactivates. The entire procedure is listed in Algorithm 4.4.

Algorithm 4.4: Gradient for General Concave Functions
Input: Domain vector x; weight vector w; concave function φ; smoothing scale μ.
begin
    P ← ∪_{e ∈ E} {(−x[e], e), (μw[e] − x[e], e)};
    Sort P = {(s_i, a_i)}, where s_i ≤ s_{i+1};
    τ_0 ← 0; z ← 0; A ← ∅;
    for i = 1 ... 2n − 1 do
        if a_i ∉ A then
            g[a_i] ← −z;
            A ← A + a_i;
        else
            g[a_i] ← g[a_i] + z;
            A ← A − a_i;
        τ_i ← τ_{i−1} + |A| (s_{i+1} − s_i)/μ;
        if A ≠ ∅ then
            z ← z + (φ(τ_i) − φ(τ_{i−1}))/|A|;
    g[a_{2n}] ← g[a_{2n}] + z;
Output: g = gradient of the μ-smoothed Lovász extension of φ(w(A)) at x
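A Python sketch of Algorithm 4.4 follows. The explicit activation/deactivation tags are our own addition (they make ties between coincident breakpoints unambiguous); otherwise the event loop mirrors the pseudocode above:

```python
import numpy as np

def concave_gradient(x, w, phi, mu):
    """Gradient of the mu-smoothed Lovasz extension of f(A) = phi(w(A)) at x,
    via the sorted breakpoints of Algorithm 4.4 (O(n log n) overall)."""
    n = len(x)
    # Element e is active while 0 <= x[e] - s <= mu * w[e]; storing negated
    # breakpoints makes an ascending sort traverse s in the right direction.
    events = []                                   # (negated breakpoint, element, kind)
    for e in range(n):
        events.append((-x[e], e, 0))              # kind 0: activation
        events.append((mu * w[e] - x[e], e, 1))   # kind 1: deactivation
    events.sort()
    g = np.zeros(n)
    z = 0.0        # running sum of (phi(tau_i) - phi(tau_{i-1})) / |A_i|
    tau = 0.0
    active = 0     # |A|
    for i in range(2 * n - 1):
        s_i, e, deact = events[i]
        if deact:
            g[e] += z        # component deactivates: add the current prefix sum
            active -= 1
        else:
            g[e] = -z        # component activates: subtract the prefix sum so far
            active += 1
        if active > 0:
            tau_next = tau + active * (events[i + 1][0] - s_i) / mu
            z += (phi(tau_next) - phi(tau)) / active
            tau = tau_next
    g[events[-1][1]] += z    # the final event is the last deactivation
    return g
```

Two useful checks: for linear φ the gradient is φ′·w exactly, and for any φ the components sum to φ(w(E)) − φ(0) (the telescoping in Equation 4.6.8); as μ → 0, the result approaches the exact Lovász subgradient.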

4.7 Experiments

Synthetic Data. We reproduce the experimental setup of [FI11] designed to compare submodular minimization algorithms. Our goal is to find the minimum cut of a randomly 54

[Figure 4.1 panels: (a) Example Regions for Potentials; (b) Results for genrmf-long; (c) Results for genrmf-wide. Panels (b) and (c) plot Running Time (s) against Problem Size (n) on log–log axes for SLG, MinNorm, LEX2, HYBRID, PR, and SFM3.]

Figure 4.1: (a) Example regions used for our higher-order potential functions. (b-c) Comparison of running times of submodular minimization algorithms on synthetic problems from DIMACS [JM93].

generated graph (which requires submodular minimization of a sum of 2-potentials), with the graph generated by the specifications in [JM93]. We compare against the state-of-the-art combinatorial algorithms (LEX2, HYBRID, SFM3, PR [FI03]) that are guaranteed to find the exact solution in polynomial time, as well as the Minimum Norm algorithm of [FI11], a practical alternative with unknown running time. Figures 4.1b and 4.1c compare the running time of SLG against the running times reported in [FI11]. In some cases, SLG was 6 times faster than the MinNorm algorithm. However, the comparison to the MinNorm algorithm is inconclusive in this experiment: while we used a faster machine, we also used a simple MATLAB implementation. What is clear is that SLG scales at least as well as MinNorm on these problems, and is practical for problem sizes that the combinatorial algorithms cannot handle.

Image Segmentation Experiments. We also tested our algorithm on the joint image segmentation-and-classification task introduced in Section 4.3. We used an implementation of TextonBoost [SWRC09], trained and tested on subsampled images from [EVGW+]. As seen in Figures 4.2e and 4.2g, using only the per-pixel score from our TextonBoost implementation gets the general area of the object, but does not do a good job of identifying the shape of a classified object; compare to the ground truth in Figures 4.2b and 4.2d. We then perform MAP inference in a Markov Random Field with 2-potentials (as done in [SWRC09]). While this regularization, as shown in Figures 4.2f and 4.2h, leads to improved performance, it still performs poorly on classifying the boundary.

(a) Original Image (b) Ground truth (c) Original Image (d) Ground Truth

(e) Pixel-based (f) Pairwise Potentials (g) Pixel-based (h) Pairwise Potentials

(i) Concave Potentials (j) Continuous (k) Concave Potentials (l) Continuous

Figure 4.2: Segmentation Experimental Results

Finally, we used SLG to regularize with higher-order potentials. To generate regions for our potentials, we randomly picked seed pixels and grew the regions based on the HSV channels of the image. We picked our seed pixels with a preference for pixels which were included in the fewest previously generated regions. Figure 4.1a shows what the regions typically looked like. For our experiments, we used 90 total regions. We used SLG to minimize f(A) = c(A) + Σ_j |A ∩ D_j| · |D_j ∖ A|, where c was the output from TextonBoost, scaled appropriately. Figures 4.2i and 4.2k show the classification output. The continuous variables x at the end of each run are shown in Figures 4.2j and 4.2l; while it has no formal meaning, in general one can interpret a very high or low value of x[e] as high confidence in the classification of the pixel e. To generate the result shown in Figure 4.2k, a problem with 10^4 variables and 90 concave potentials, our MATLAB/mex implementation of SLG took 71.4 seconds. In comparison, the MinNorm implementation of the SFO toolbox

[Kra10] gave the same result, but took 6900 seconds. Similar problems on an image of twice the resolution (4 × 10^4 variables) were tested using SLG, resulting in runtimes of roughly 1600 seconds.

4.8 Conclusion

We have developed a novel method for efficiently minimizing a large class of submodular functions of practical importance. We do so by decomposing the function into a sum of threshold potentials, whose Lovász extensions are convenient for using modern smoothing techniques of convex optimization. This allows us to solve submodular minimization problems with thousands of variables that cannot be expressed using only pairwise potentials. Thus we have achieved a middle ground between graph-cut-based algorithms, which are extremely fast but only able to handle very specific types of submodular minimization problems, and combinatorial algorithms, which assume nothing but submodularity but are impractical for large-scale problems.

Chapter 5

Distributed Submodular Minimization

5.1 Introduction

Without any doubt, there have been tremendous gains in the efficiency of convex optimization algorithms over the past century. The fact that one can now routinely solve nonlinear constrained programs with thousands of variables is not due to advances in hardware alone. However, there are still domains where the problem is so large that a straightforward application of optimization algorithms is not practical. This is often due to an abundance of data; the entire convex optimization problem cannot even be loaded into memory on a single computer, let alone be solved in a reasonable amount of time. For such problems, the only option is to develop algorithms that can take advantage of parallel computation. This is a very active field of research in convex optimization. Given the strong connections between convex and submodular minimization established in Chapter 3, a natural idea, then, is to use techniques of distributed convex optimization for distributed submodular minimization. First, in Section 5.2, we derive a general framework for submodular minimization using minimization of the Lovász extension with barrier functions. This is not specific to distributed computation, but it suggests a general technique for developing submodular minimization algorithms. Then, in Section 5.3, we use this framework by applying accelerated first-order minimization methods to the resulting convex problems to develop submodular minimization algorithms that can be run in parallel. Finally, in Section 5.5, we give some basic empirical results of these algorithms on synthetic test problems.

5.2 Submodular Minimization with General Barrier Functions

In this section, we dissect the different ways one can formulate submodular minimization as a convex optimization program. Let f be a submodular set function over the set E with

|E| = n. We assume f is normalized, meaning that f(∅) = 0. We wish to find a minimizer of f, i.e., any subset of E that achieves the minimal value of f:

\[ f^* := \min_{A \in 2^E} f(A) \tag{5.2.1} \]
Recall that a submodular function f defines a convex function over R^n called the Lovász extension, which we denote f̃. It is defined as the support function of the base polytope B_f; that is, f̃(x) = σ_{B_f}(x) = max_{λ ∈ B_f} ⟨x, λ⟩. The base polytope is defined by:

\[ B_f = \bigl\{ \lambda \in \mathbb{R}^n \;\big|\; \langle \mathbf{1}, \lambda \rangle = f(E), \text{ and } \langle \mathbf{1}_A, \lambda \rangle \le f(A) \text{ for all } A \in 2^E \bigr\}. \]

Furthermore, minimizing the Lovász extension over the unit cube [0, 1]^n is equivalent to minimizing the submodular function in the following sense: given some x* ∈ [0, 1]^n that minimizes f̃ and any threshold α ∈ (0, 1), the set of components of x* greater than or equal to that threshold must be a minimizer of f. Note that by using an indicator function, we can express the constrained program of minimizing over the unit cube as an unconstrained convex

program: min_x f̃(x) + δ_{[0,1]^n}(x). In the unconstrained program, the unit cube indicator δ_{[0,1]^n} acts as a barrier to ensure that a minimizer x exists. Without it, the Lovász extension has no finite minimizer in general: unless 0 ∈ B_f (or equivalently f(E) = f* = 0), we have inf_{x ∈ R^n} f̃(x) = −∞. Hence we can consider minimizing f̃ over the unit cube to be a particular case of a more general technique of minimizing f̃ plus a convex barrier function φ:
\[ \min_x \; \tilde f(x) + \phi(x) \tag{5.2.2} \]
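The support-function view is computationally convenient: Edmonds' greedy algorithm maximizes ⟨x, λ⟩ over B_f, so f̃(x) and a maximizing base can be computed with one sort and n evaluations of f. A Python sketch (f given as a callable on frozensets):

```python
import numpy as np

def lovasz_and_base(x, f):
    """Evaluate the Lovasz extension f~(x) = max_{lam in B_f} <x, lam> by the
    greedy algorithm: sort x descending and give each element its marginal
    gain f(S_k) - f(S_{k-1}); the resulting lam is a maximizing base."""
    order = np.argsort(-x)
    lam = np.zeros(len(x))
    S, prev = [], 0.0
    for e in order:
        S.append(int(e))
        cur = f(frozenset(S))
        lam[e] = cur - prev          # marginal gain of element e
        prev = cur
    return float(x @ lam), lam
```

For f(A) = min(1, |A|), the Lovász extension on [0, 1]^n is simply the maximum coordinate, which gives a quick check.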

We refer to Equation 5.2.2 as the primal program. Its corresponding dual program is:

\[ \max_{\lambda \in B_f} \; -\phi^*(-\lambda) \tag{5.2.3} \]
It is not immediately obvious that, if we let φ be anything other than the indicator of the unit cube, solving Equation 5.2.2 will lead to a solution of Equation 5.2.1. Indeed, we do need a few conditions beyond convexity to ensure that this is the case. The first three of the following Properties will ensure that a minimizer of f can be computed by thresholding a solution to the primal problem. The fourth Property is only necessary to ensure that solutions of the dual problem will also give minimizers of f via thresholding.

Properties 5.1. Barrier Function φ with Threshold α

5.1.a The function φ is separable; it can be expressed as a sum of functions each depending

on a single coordinate. This means its conjugate φ∗ is separable as well.

5.1.b All vectors in R^n are subgradients of φ somewhere, which means φ* has full domain.

5.1.c At some multiple of the ones vector, α1, the only subgradient of φ is the zero vector. This means that α1 is a strict subgradient of φ* at the origin.

5.1.d [Optional] The function φ is uniquely minimized at α1. This means that α1 is the unique subgradient of φ* at the origin.

In summary:

\begin{align}
\phi(x) = \sum_{e \in E} \phi_e(x[e]) \;&\Leftrightarrow\; \phi^*(\lambda) = \sum_{e \in E} \phi_e^*(\lambda[e]) \tag{5.2.4a}\\
\bigcup_{x \in \operatorname{dom}(\phi)} \partial\phi(x) = \mathbb{R}^n \;&\Leftrightarrow\; \operatorname{dom}(\phi^*) = \mathbb{R}^n \tag{5.2.4b}\\
\partial\phi(\alpha\mathbf{1}) = \{0\} \;&\Leftrightarrow\; \phi^*(\lambda) > \langle \alpha\mathbf{1}, \lambda \rangle + \phi^*(0) \text{ for all } \lambda \ne 0 \tag{5.2.4c}\\
\phi(x) > \phi(\alpha\mathbf{1}) \text{ for all } x \ne \alpha\mathbf{1} \;&\Leftrightarrow\; \partial\phi^*(0) = \{\alpha\mathbf{1}\} \tag{5.2.4d}
\end{align}

For example, φ(x) = δ_{[0,1]^n}(x) satisfies Properties 5.1.a through 5.1.c with any α ∈ (0, 1), but violates Property 5.1.d. The functions φ(x) = ½‖x‖² and φ(x) = −Σ_{e∈E} log(1 − x[e]²) satisfy all the Properties with α = 0. As counterexamples, note that φ(x) = δ_{{0}}(x) violates 5.1.c, whereas φ(x) ≡ 0 violates 5.1.b and 5.1.d, and φ(x) = ‖x‖₁ violates 5.1.b and 5.1.c. The purpose of Property 5.1.c is to ensure that the ordering of the components of the optimal dual vector is nearly the same as the ordering of the components of the optimal primal vector. Specifically, we will make use of the following Lemma, whose proof is obvious:

Lemma 5.1. If g is a one-dimensional convex function with ∂g(α) = {β}, and y ∈ ∂g(x), then y < β only if x < α, and y > β only if x > α. Conversely, we have x < α only if y ≤ β, and x > α only if y ≥ β.

We will first state what we can conclude about the solutions to Equation 5.2.2 and Equation 5.2.3 under the weaker assumption that all but the last of Properties 5.1 hold. We state these results separately but prove them together, because the convex programs are dual to each other.

Proposition 5.2 (Primal form). Suppose f̃ is the Lovász extension of the submodular function f, and φ ∈ Conv satisfies Properties 5.1.a through 5.1.c with threshold α. Let x* be a minimizer of Equation 5.2.2. Define the set X_α^{++} (resp. X_α^{+}) to be the set of components of x* greater than (resp. greater than or equal to) the threshold α:
\[ X_\alpha^{++} := \{ e \in E \mid x^*[e] > \alpha \}, \qquad X_\alpha^{+} := \{ e \in E \mid x^*[e] \ge \alpha \}. \tag{5.2.5} \]
Then X_α^{++} and X_α^{+} are minimizers of f. That is, f(X_α^{++}) = f(X_α^{+}) = f*.

Proposition 5.3 (Dual form). Suppose φ ∈ Conv satisfies Properties 5.1.a through 5.1.c. Let λ* be a maximizer of Equation 5.2.3, wherein B_f is the base polytope of the submodular function f. Define the set Λ_0^{−−} (resp. Λ_0^{−}) to be the set of components of λ* which are negative (resp. nonpositive):
\[ \Lambda_0^{--} := \{ e \in E \mid \lambda^*[e] < 0 \}, \qquad \Lambda_0^{-} := \{ e \in E \mid \lambda^*[e] \le 0 \}. \tag{5.2.6} \]
Then Λ_0^{−} and Λ_0^{−−} are respectively upper and lower bounds for minimizers of f. That is, f(A) = f* only if Λ_0^{−−} ⊆ A ⊆ Λ_0^{−}.

Proof. Since dom f̃* is a polyhedron and dom φ* = R^n, we can apply Fenchel's Duality Theorem to the functions f̃ and φ, which gives us:

\[ \min_{x \in \mathbb{R}^n} \tilde f(x) + \phi(x) \;=\; \max_{\lambda \in \mathbb{R}^n} -\tilde f^*(\lambda) - \phi^*(-\lambda). \]
Since f̃ = σ_{B_f} and f̃* = δ_{B_f}, the right-hand side is equivalent to Equation 5.2.3. By

Equation 2.1.5, the optimal solutions (x∗, λ∗) of these programs are characterized by:

\[ \lambda^* \in \partial \tilde f(x^*) \cap \bigl( -\partial\phi(x^*) \bigr), \qquad x^* \in \partial \tilde f^*(\lambda^*) \cap \partial\phi^*(-\lambda^*). \]

This implies −λ*[e] ∈ ∂φ_e(x*[e]), where φ_e is the one-dimensional convex function from the decomposition φ(x) = Σ_{e∈E} φ_e(x[e]). Thus by Lemma 5.1 and Property 5.1.c we have:

\begin{align*}
e \in \Lambda_0^{--} \;\Leftrightarrow\; \lambda^*[e] < 0 \;\Rightarrow\; x^*[e] > \alpha \;\Leftrightarrow\; e \in X_\alpha^{++}\\
e \notin \Lambda_0^{-} \;\Leftrightarrow\; \lambda^*[e] > 0 \;\Rightarrow\; x^*[e] < \alpha \;\Leftrightarrow\; e \notin X_\alpha^{+}
\end{align*}

This means that Λ_0^{−−} ⊆ X_α^{++} ⊆ X_α^{+} ⊆ Λ_0^{−}, hence:
\[ \lambda^*[e] \le 0 \;\text{ for all } e \in X_\alpha^{+}, \qquad \lambda^*[e] \ge 0 \;\text{ for all } e \in E \setminus X_\alpha^{++}. \]

Also note that by Equation 3.3.9, we have λ* ∈ ∂f̃(x*) ⊆ ∂f(X_α^{+}). So the set X_α^{+} and the base λ* satisfy the premise of Theorem 3.7, and hence X_α^{+} must be a minimizer of f. Likewise, since X_α^{++} = X_β^{+} for some β > α, we can use Equation 3.3.9 to conclude λ* ∈ ∂f(X_α^{++}), and so by the same argument X_α^{++} is a minimizer of f.

To prove Proposition 5.3, we use the same facts established in the proof of Proposition 5.2. Namely, since f(X_α^{+}) = f*, λ* ∈ ∂f(X_α^{+}), and Λ_0^{−−} ⊆ X_α^{+} ⊆ Λ_0^{−}, we can bound f(A) for any A ∈ 2^E:
\[ f(A) - f^* \;\ge\; \langle \mathbf{1}_A - \mathbf{1}_{X_\alpha^+}, \lambda^* \rangle \;=\; \sum_{e \in A \setminus X_\alpha^+} \lambda^*[e] - \sum_{e \in X_\alpha^+ \setminus A} \lambda^*[e] \;=\; \sum_{e \in A \setminus \Lambda_0^-} |\lambda^*[e]| + \sum_{e \in \Lambda_0^{--} \setminus A} |\lambda^*[e]|. \]

Since |λ*[e]| > 0 for e ∈ E ∖ (Λ_0^{−} ∖ Λ_0^{−−}), if either A ∖ Λ_0^{−} or Λ_0^{−−} ∖ A is nonempty, we have f(A) > f*. Therefore, any minimizer of f must be sandwiched between Λ_0^{−−} and Λ_0^{−}, as claimed.

Note that if no component of λ* equals zero, then Λ_0^{−−} = Λ_0^{−} is the unique minimizer of f. In general, though, a solution to the dual problem might only give nonstrict bounds for minimizers of f. For example, suppose E = {1, 2} and f(A) = min(1, |A|). Then A* = ∅ is the unique minimizer of f, and B_f = {(θ, 1 − θ) | θ ∈ [0, 1]}. Suppose we use the barrier function φ = δ_{[0,1]²}. Then φ*(−λ) = σ_{[0,1]²}(−λ) = 0 for all λ ∈ B_f. If we take λ* = (1, 0), then Λ_0^{−} = {2}, which is not a minimizer of f. (This does not contradict the Theorem, since Λ_0^{−−} = ∅, hence Λ_0^{−−} ⊆ A* ⊆ Λ_0^{−}.) Likewise, it is also possible to construct an example where Λ_0^{−−} is not a minimizer. Consider an example identical to the previous one, except that f(A) = min(1, |A|) − |A|. In this case, A* = {1, 2} is the unique minimizer of f, and λ* = (−1, 0) is maximal for Equation 5.2.3. But then Λ_0^{−−} = {1} is not a minimizer of f. Finally, note that if we take the direct product of these two examples, then we get a case where Λ_0^{−−} ⊊ A* ⊊ Λ_0^{−}. These counterexamples are due to the fact that if φ does not have a unique minimizer, we do not have a strong enough condition on the components x*[e] for which λ*[e] = 0. However, when φ is uniquely minimized at α1, the primal and dual problems give identical results.
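The first counterexample is small enough to check by brute force (a Python sketch; the dual maximizer λ* = (1, 0) is taken from the discussion above):

```python
from itertools import chain, combinations

E = (1, 2)
f = lambda A: min(1, len(A))                      # f(A) = min(1, |A|)
subsets = list(chain.from_iterable(combinations(E, k) for k in range(3)))
f_star = min(f(A) for A in subsets)
assert f_star == 0 and f(()) == 0                 # the empty set attains the minimum

lam = (1.0, 0.0)                                  # a maximizer of the dual, lying in B_f
Lambda_minus = tuple(e for e, v in zip(E, lam) if v <= 0.0)
assert Lambda_minus == (2,) and f(Lambda_minus) == 1   # {2} is NOT a minimizer
```

This confirms that Λ_0^{−} only upper-bounds minimizers here: ∅ ⊆ {2}, exactly as Proposition 5.3 permits.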

Theorem 5.4. Suppose φ ∈ Conv satisfies Properties 5.1. Let x* be a minimizer of Equation 5.2.2 and use it to define X_α^{++}, X_α^{+} as in Equation 5.2.5. Let λ* be a maximizer of Equation 5.2.3 and use it to define Λ_0^{−−}, Λ_0^{−} as in Equation 5.2.6. Then X_α^{++} = Λ_0^{−−} is the unique minimal minimizer of f, and X_α^{+} = Λ_0^{−} is the unique maximal minimizer of f.

Proof. The hypothesis of the Theorem is identical to that of the Propositions, with the additional assumption that Property 5.1.d holds. This extra condition implies 0 ∉ ∂φ_e(x) for x ≠ α. Hence:
\[ -\lambda \in \partial\phi_e(x) \;\Rightarrow\; \begin{cases} x < \alpha \;\Leftrightarrow\; \lambda > 0, \\ x = \alpha \;\Leftrightarrow\; \lambda = 0, \\ x > \alpha \;\Leftrightarrow\; \lambda < 0. \end{cases} \]
Thus, since −λ* ∈ ∂φ(x*), we must have X_α^{++} = Λ_0^{−−} and X_α^{+} = Λ_0^{−}. The theorem then follows immediately from the conclusions of Proposition 5.2 and Proposition 5.3.

Discussion The implication of this result is that we have a great deal of flexibility in how we can solve Equation 5.2.1 through convex optimization. Any barrier function which satisfies the conditions of the theorem can be used. Properties 5.1 are, in some sense, necessary for the conclusions of the Theorem to hold.

Clearly B_f must be contained in dom φ*, and Property 5.1.b is needed to ensure this is true for all submodular functions f. Because Property 5.1.d is not necessary to find a minimizer of f with the primal problem, one might suspect by symmetry that Properties 5.1.a, 5.1.b, and 5.1.d would be sufficient to find a minimizer of f with the dual problem. This is not the case, however. For example, if we tried to use φ(x) = ε‖x‖₁ + ½‖x‖² with B_f ⊆ [−ε, ε]^n, then we would have x* = 0 and λ* an arbitrary point in B_f, so solving this convex program would tell us nothing useful about the submodular minimization problem.

Note that we define the Lovász extension to be the support function of the base polytope B_f rather than the submodular polyhedron P_f. Using the base polytope is strictly more general in the following sense: the Theorem still holds with P_f used in place of B_f, provided that the threshold α > 0. To see this, first note that σ_{P_f} = σ_{B_f} + δ_{R_+^n}. Hence minimizing σ_{P_f} + φ₁ is equivalent to minimizing σ_{B_f} + φ₂, where φ₂ = φ₁ + δ_{R_+^n} and φ₂* = φ₁* □ δ_{R_−^n} (infimal convolution). Then φ₂ satisfies Properties 5.1 if and only if φ₁ does, with α > 0.

5.3 Consensus Algorithms for Submodular Minimization

5.3.1 Notation

Using convex analysis, we can develop consensus algorithms for submodular minimization. In particular, suppose our submodular function can be represented as a sum of a modular function plus M submodular functions, each of which depends on a subset of the ground set:

\[ \min_{A} \; c(A) + \sum_{i=1}^{M} f_i(A \cap E_i) \]

The assumption that each subfunction f_i depends only on a subset of the variables confers no loss of generality, since we can just take each E_i to be the full ground set E if necessary. However, it can lead to a substantially more efficient algorithm (in time and memory) in the case where the average coverage (1/M) Σ_{i=1}^M |E_i| / |E| is small. By Theorem 5.4, we can minimize f by computing the proximal operator of the Lovász extension f̃:
\[ \min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|x\|^2 + \langle x, c \rangle + \sum_i \tilde f_i(P_i x) \]

Here the projection matrices P_i select out the coordinates corresponding to the sets E_i. That is, for d ∈ E_i and e ∈ E, P_i[d, e] = [[d = e]]. Then by forming the Lagrangian we get the dual problem:
\[ \max_{\lambda} \min_{x, x_i} \; \tfrac{1}{2}\|x\|^2 + \langle x, c \rangle + \sum_i \tilde f_i(x_i) + \sum_i \lambda_i^T (P_i x - x_i) \]
\[ \max_{\lambda} \; -\tfrac{1}{2} \Bigl\| c + \sum_i P_i^T \lambda_i \Bigr\|^2 - \sum_i \tilde f_i^*(\lambda_i) \]

The dual functions f̃_i* are indicator functions of the corresponding base polytopes B_{f_i}. We scale and concatenate the λ_i into a single vector z of size N = Σ_{i=1}^M |E_i|:

\[ \lambda_i = S_i Q_i z \]

\[ R = \sum_i P_i^T S_i Q_i \]

The projection matrices Q_i select out the appropriate coordinates of z, and we choose the scales S_i so that the matrix R is a projection: R² = R. So then the problem can be written in a form with a single vector variable:

\[ \min_{z \in \mathbb{R}^N} \; \tfrac{1}{2}\| c + Rz \|^2 \qquad \text{subject to} \qquad S_i Q_i z \in B_{f_i} \;\text{ for } i = 1, \ldots, M \tag{5.3.1} \]

Note that any feasible z gives us a base vector for f: λ = c + Rz ∈ B_f. We then apply accelerated gradient methods to this formulation to derive algorithms. However, in our implementations we maintain the vectors λ_i separately, because that is how we parallelize.

5.3.2 Outline of Algorithms

Note that Equation 5.3.1 is in a form to which we can simply apply standard first-order methods. We implemented and tested three different first-order methods:

1. Proximal Gradient

2. Accelerated Proximal Gradient (FISTA)

3. Proximal Gradient with Barzilai-Borwein stepsizes

The last technique simply applies proximal gradient, but uses aggressive stepsizes. The idea is that if L_k is an estimate of the Lipschitz constant used in the algorithm, O(1/k²) convergence is guaranteed if we ensure that our updates satisfy:

\[ L_k \;\ge\; \frac{\| R(z_{k+1} - z_k) \|^2}{\| z_{k+1} - z_k \|^2} \tag{5.3.2} \]
Here R is the Hessian of our objective function. A conservative option to guarantee convergence is to backtrack whenever this constraint is violated; that is, discard the update z_{k+1}, increase L_k, and project again. It is then difficult to achieve a balance between making the stepsizes large enough and not wasting too many updates on backtracking. Instead, by using the Barzilai–Borwein stepsize, we simply set L_{k+1} equal to the quantity on the right-hand side of Equation 5.3.2 for step k. The overall behavior is to take large steps when the projection is preventing progress in the objective, at the expense of occasionally overstepping. This results in non-monotonic convergence.
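A minimal sketch of this stepsize rule, on a toy box-constrained least-squares problem of the same shape as Equation 5.3.1 (the instance, initialization, and stopping rule are ours, for illustration only):

```python
import numpy as np

def bb_projected_gradient(A, b, lo, hi, iters=200):
    """Projected gradient for min 0.5*||A z - b||^2 over a box, with a
    Barzilai-Borwein style estimate in the spirit of Equation 5.3.2:
    L_{k+1} = ||A dz||^2 / ||dz||^2 for the step just taken.
    Non-monotone in general; here it converges quickly."""
    z = np.clip(np.zeros(A.shape[1]), lo, hi)
    L = np.linalg.norm(A, 2) ** 2            # safe (conservative) initial estimate
    for _ in range(iters):
        grad = A.T @ (A @ z - b)
        z_new = np.clip(z - grad / L, lo, hi)
        dz = z_new - z
        if np.linalg.norm(dz) < 1e-12:       # fixed point of the projected step
            break
        L = max((np.linalg.norm(A @ dz) / np.linalg.norm(dz)) ** 2, 1e-12)
        z = z_new
    return z

# Toy instance: diagonal A, so the box-constrained solution is just a clip.
A = np.diag([1.0, 2.0, 3.0])
b = np.array([2.0, -1.0, 5.0])
z = bb_projected_gradient(A, b, 0.0, 1.0)
```

For this diagonal instance the solution is clip(b / diag(A)) = (1, 0, 1), reached in a handful of iterations despite the aggressive stepsizes.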

Accelerating the Min-Norm Algorithm. Can we use accelerated methods even when there is no algorithm for projection onto the submodular polytope? We outline a procedure that has worked well in some limited experiments. The problem is to maximize −‖Σ_i λ_i‖² subject to λ_i ∈ B_{f_i}. Initially let λ̂_i⁰ = λ_i⁰.

1. g^k = Σ_i λ̂_i^k

2. Choose descent directions by maximizing the first-order approximation of the objective with the greedy method: y_i^k = argmax_{y ∈ B_{f_i}} −(y − λ̂_i^k)^T g^k.

3. Find coefficients β_i^k through box-constrained least squares:

\[ \min_{\beta_i \in [0,1]} \; \frac{1}{2} \Bigl\| \sum_i \beta_i y_i^k + (1 - \beta_i)\hat\lambda_i^k \Bigr\|^2. \]

4. Update the vectors using an accelerated update rule:
\[ \lambda_i^{k+1} = \beta_i^k y_i^k + (1 - \beta_i^k)\hat\lambda_i^k, \qquad \hat\lambda_i^{k+1} = \lambda_i^{k+1} + \frac{\alpha_k - 1}{\alpha_{k+1}}\bigl(\lambda_i^{k+1} - \lambda_i^k\bigr), \qquad \alpha_{k+1} = \frac{1 + \sqrt{1 + 4\alpha_k^2}}{2}. \]

5. Repeat until the convergence criterion is met.

In our experiments, we use another accelerated method to find the β_i^k in step 3. Unfortunately, the vectors λ̂_i can actually leave the constraint set, although this appears to actually speed up convergence.

5.4 Fast Proximal Threshold Potentials

The general principle of our consensus algorithms is to combine simple submodular functions for which projection onto the base polytope is efficient, in order to project onto the base polytope of the sum. One class of functions for which projection is efficient are the threshold potentials introduced in Section 4.3. To review, a threshold potential is a positive modular function truncated at some threshold: f(A) = min(τ, w(A)), where w ≥ 0 and 0 < τ < w(E). The base polytope for such a function is given by the intersection of a box with a plane: B_f = {λ ∈ R^n | ⟨1, λ⟩ = τ, 0 ≤ λ ≤ w}. In this section, we outline two algorithms for the general problem of projecting onto a box-plane intersection (not necessarily a base polytope). The first method requires sorting the components of the vector to be projected, but is simple to implement. Our second method is a randomized algorithm that runs in expected linear time.

5.4.1 General Plane Intersection Projection

The following proposition will be useful for the characterization of projections onto convex subsets of planes. It suggests a general method for computing the projection of a single plane intersected with any convex set.

Proposition 5.5. Let C be any closed convex set in R^n, and let P be the plane given by ⟨x, a⟩ = b. Then, for t ∈ R,
\[ \langle \Pi_C(x + ta), a \rangle = b \;\Leftrightarrow\; \Pi_C(x + ta) = \Pi_{C \cap P}(x) \tag{5.4.1} \]

Proof. The backward direction of the proposition is obvious, since Π_{C∩P}(x) ∈ P. To prove the forward direction, we use the following facts:

• If C is any convex set, then y = x − Π_C(x) implies that Π_C(x + y) = Π_C(x).

• If P is a plane normal to the vector a, then Π_P(x + a) = Π_P(x).

• If C₁, C₂ are any convex sets, then z = Π_{C₁}(x + λ) = Π_{C₂}(x − λ) implies that z = Π_{C₁ ∩ C₂}(x). (This is a special case of Proposition 2.2.)

To prove the proposition, we use λ = x + ta − Π_C(x + ta), and by the above facts Π_C(x + λ) = Π_P(x − λ) = Π_C(x + ta) = Π_{C∩P}(x).

How can we interpret the proposition? Suppose we start at a point x, move in the direction normal to the plane for some distance, and then project onto C. If the resulting point is on the plane, then it must be the projection of x onto the intersection of the plane with C. Therefore, calculating the projection is a matter of finding this correct distance. We conclude that if a convex set C is easy to project onto, then we can project almost as easily onto C ∩ P. In some cases, this gives a closed-form formula; for example, when C itself is an intersection of planes, this reduces to the matrix-inversion problem of underdetermined least squares. But if there is no closed-form solution, then we can at least say that the Proposition reduces a multi-dimensional optimization problem to a one-dimensional root-finding problem. If we let ρ(t) = ⟨Π_C(x + ta), a⟩, then we can use any numerical root-finding method to solve ρ(t*) = b. Of course, in order for this to work, the projection Π_C must be computationally inexpensive to compute.

5.4.2 Box-Plane Projection Algorithms

For the particular case of Proposition 5.5 in which we are interested, the convex set $C$ is a box: $\{x \in \mathbb{R}^n \mid \ell \le x \le u\}$. This is, of course, an extremely simple set to project onto: each component of the vector to be projected need only be compared to the boundaries of the box and adjusted accordingly. That is, $\Pi_C(x) = \max(\ell, \min(u, x))$. As a consequence of the proposition, we can project onto a box–plane intersection easily, provided that we can efficiently solve the following equation:

$$\langle \max(\ell, \min(u, x + t^* a)), a \rangle = b \tag{5.4.2}$$

Once the satisfying value $t^*$ is known, the projected point is given by:

$$\Pi_{C \cap P}(x) = \max(\ell, \min(u, x + t^* a)) \tag{5.4.3}$$

Though we could use a general numerical method to solve Equation 5.4.2, we can do better with a routine specialized for this particular problem. In this case, the function $\rho(t)$ is monotonic and piecewise linear, and we can write it as:
$$\rho(t) = -b + \langle a, \max(\ell, \min(u, x + ta)) \rangle$$

Algorithm 5.1: Projection onto a Box Intersecting a Plane
Input: Vectors $\ell, u, a, x \in \mathbb{R}^n$, scalar $b$, where $C := \{y \in \mathbb{R}^n \mid \ell \le y \le u,\ \langle y, a \rangle = b\} \ne \emptyset$.
begin
  $P \leftarrow \emptyset$
  for $i \leftarrow 1$ to $n$ do
    if $a[i] \ne 0$ and $\ell[i] < u[i]$ then
      $s \leftarrow a[i]^2 \,\mathrm{sign}(a[i]) / 2$
      $t_\ell \leftarrow (\ell[i] - x[i]) / a[i]$
      $t_u \leftarrow (u[i] - x[i]) / a[i]$
      $P \leftarrow P \cup \{(s, t_\ell), (-s, t_u)\}$
  $r \leftarrow -b + \langle \ell + u, a \rangle / 2$
  $t^* \leftarrow \mathrm{PLM\_Root}(0, r, P)$
  $y^* \leftarrow \max(\ell, \min(u, x + t^* a))$
Output: $y^* = \Pi_C(x)$.

$$\rho(t) = r + \sum_{(s_i, t_i) \in P} s_i \,|t - t_i|. \tag{5.4.4}$$

The formulas for the breakpoints $t_i$, slopes $s_i$, and offset $r$ in terms of the problem data $(\ell, u, a, b, x)$ are derived through simple algebraic manipulation; we thus list them in Algorithm 5.1 without further comment. The problem has now been reduced to finding the root of a function of the form in Equation 5.4.4. There are at least two ways to handle this subproblem. Our first suggestion: sort the breakpoints $t_i$; then evaluate each value $\rho(t_i)$ in sequence until the root is bracketed in an interval where $\rho$ is linear; finally, solve the locally linear equation for the root $t^*$. This requires $O(n \log n)$ operations for the sort and $O(n)$ operations on the sorted data. The details of this procedure are listed as Algorithm 5.2. (Note that if we use Algorithm 5.2 within Algorithm 5.1, we can skip the first for loop and set $\tilde q \leftarrow 0$ and $\tilde r \leftarrow -b + \langle \min(\ell \bullet a, u \bullet a), \mathbf{1} \rangle$.)

Alternatively, we can use a randomized algorithm that finds the root in expected $O(n)$ operations. The earliest publication of this idea is in [PK90], but we were inspired by the more recent article [DSSSC08] on linear-time projection onto the $\ell_1$ ball in a machine learning context. The principle behind this algorithm is the same as that of finding a quantile of a list of numbers in $O(n)$ operations. We randomly choose breakpoints as pivots, eliminating a fraction of the data from consideration at each iteration. We do this by simplifying the description of the input function on the range in which the root must lie. That is, given input parameters $q, r, s_i, t_i$, we keep track of $\tilde q$, $\tilde r$ and a subset of indices $\tilde I$ such that for all $t$ near the root of the function we have:

Algorithm 5.2: PLM_Root: Sorted Version
Input: Piecewise linear monotonic function $\rho(t) = qt + r + \sum_i s_i |t - t_i|$, represented by: scalars $q, r$; list of ordered pairs $P = \{(s_i, t_i)\}_i$, sorted so that $t_i \le t_{i+1}$.
begin
  $\tilde q \leftarrow q$; $\tilde r \leftarrow r$
  for $i = 1$ to $|P|$ do
    $\tilde q \leftarrow \tilde q - s_i$
    $\tilde r \leftarrow \tilde r + s_i t_i$
  $t \leftarrow 0$
  for $i = 1$ to $|P|$ do
    $\tilde r \leftarrow \tilde r + \tilde q (t_i - t)$
    $t \leftarrow t_i$
    if $\tilde r \ge 0$ then stop
    $\tilde q \leftarrow \tilde q + 2 s_i$
  if $\tilde q \ne 0$ then
    $t \leftarrow t - \tilde r / \tilde q$
Output: $t$ satisfying $\rho(t) = 0$

$$qt + r + \sum_{i \in I} s_i |t - t_i| = \tilde q t + \tilde r + \sum_{i \in \tilde I} s_i |t - t_i|$$
The details are shown in Algorithm 5.3.
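As a concrete illustration, the following Python sketch carries out the box–plane projection by the sorting approach of Algorithms 5.1 and 5.2. The function name and problem instance are hypothetical, and for clarity it evaluates $\rho$ at every breakpoint directly, costing $O(n^2)$ rather than the $O(n \log n)$ of Algorithm 5.2:

```python
import numpy as np

def project_box_plane(l, u, a, x, b):
    """Project x onto {y : l <= y <= u, <y, a> = b} (assumed nonempty).

    Collect the breakpoints of the piecewise linear monotone function
    rho(t) = <a, clip(x + t a, l, u)> - b, then bracket its root.
    """
    rho = lambda t: a @ np.clip(x + t * a, l, u) - b
    mask = (a != 0) & (l < u)
    ts = np.sort(np.concatenate([(l[mask] - x[mask]) / a[mask],
                                 (u[mask] - x[mask]) / a[mask]]))
    vals = np.array([rho(t) for t in ts])  # nondecreasing: rho is monotone
    if vals[0] >= 0:            # root lies at or before the first breakpoint
        t_star = ts[0]
    elif vals[-1] < 0:          # root lies at or after the last breakpoint
        t_star = ts[-1]
    else:                       # root is inside a linear segment: interpolate
        i = int(np.argmax(vals >= 0))
        t1, t2, v1, v2 = ts[i - 1], ts[i], vals[i - 1], vals[i]
        t_star = t1 - v1 * (t2 - t1) / (v2 - v1)
    return np.clip(x + t_star * a, l, u)

l, u = np.zeros(4), np.ones(4)
a = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.5, -0.3, 0.8, 0.2])
y = project_box_plane(l, u, a, x, b=3.0)
assert abs(a @ y - 3.0) < 1e-9 and np.all(y >= 0) and np.all(y <= 1)
```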

5.5 Experiments

Since we can easily compute the proximal operator for a threshold potential, we use our algorithms to minimize a sum of threshold potentials. We tested on a synthetic problem designed to mimic semi-supervised learning. Given $n$ points in some metric space, we define $n$ weight vectors $w_i[j] = w_0 \exp(-(d(i,j)/d_0)^2)$, where $d(i,j)$ is the distance between points $i$ and $j$, and $w_0$ and $d_0$ are model parameters. Then, given a vector of partial (possibly noisy) labels $c \in \{-1, 0, +1\}^E$, we minimize the following to classify the points:
$$f(A) = c(A) + \sum_i \min(w_i(A), w_i(E \setminus A))$$
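For concreteness, a small hypothetical instance of this objective can be evaluated as follows, representing a set $A$ by a boolean mask and assuming the squared-exponential weights above (points on a line, a few labels at each end):

```python
import numpy as np

rng = np.random.default_rng(0)
n, w0, d0 = 30, 1.0, 2.0
pts = np.sort(rng.uniform(0, 10, n))             # points on a line, d = |.|
W = w0 * np.exp(-((pts[:, None] - pts[None, :]) / d0) ** 2)  # W[i, j] = w_i[j]
c = np.zeros(n)
c[:3], c[-3:] = -1, +1                            # a few partial labels

def objective(A):
    """f(A) = c(A) + sum_i min(w_i(A), w_i(E \\ A)), A a boolean mask."""
    wA = W @ A.astype(float)                      # w_i(A) for every i
    wAc = W.sum(axis=1) - wA                      # w_i(E \ A)
    return c @ A + np.minimum(wA, wAc).sum()

A = np.zeros(n, dtype=bool)
A[: n // 2] = True
# f(empty) = 0, and each threshold potential is symmetric in A vs E \ A:
assert objective(np.zeros(n, dtype=bool)) == 0.0
assert np.isclose(objective(A) - c @ A, objective(~A) - c @ (~A))
```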

See Figure 5.1 for an example problem. Figures 5.3 and 5.4 show examples of program output using two of our techniques (FISTA and the proximal gradient method with Barzilai–Borwein stepsizes). Note the nonmonotonic convergence of the latter method.

Algorithm 5.3: PLM_Root: Unsorted Version
Input: Piecewise linear monotonic function $\rho(t) = qt + r + \sum_i s_i |t - t_i|$, represented by: scalars $q, r$; unsorted list of ordered pairs $P = \{(s_i, t_i)\}_i$.
begin
  $\tilde q \leftarrow q$; $\tilde r \leftarrow r$; $\tilde P \leftarrow P$
  while $\tilde P \ne \emptyset$ do
    Choose $(s_0, t_0)$ at random from $\tilde P$
    $dq_\ell \leftarrow 0$; $dr_\ell \leftarrow 0$; $P_\ell \leftarrow \emptyset$
    $dq_u \leftarrow 0$; $dr_u \leftarrow 0$; $P_u \leftarrow \emptyset$
    for $(s_i, t_i) \in \tilde P$ do
      if $t_i > t_0$ then
        $P_u \leftarrow P_u \cup \{(s_i, t_i)\}$
      else
        $dq_u \leftarrow dq_u + s_i$; $dr_u \leftarrow dr_u - s_i t_i$
      if $t_i < t_0$ then
        $P_\ell \leftarrow P_\ell \cup \{(s_i, t_i)\}$
      else
        $dq_\ell \leftarrow dq_\ell - s_i$; $dr_\ell \leftarrow dr_\ell + s_i t_i$
    $v_0 \leftarrow (\tilde q + dq_u + dq_\ell) t_0 + \tilde r + dr_u + dr_\ell$
    if $v_0 < 0$ then
      $\tilde q \leftarrow \tilde q + dq_u$; $\tilde r \leftarrow \tilde r + dr_u$; $\tilde P \leftarrow P_u$
    else if $v_0 > 0$ then
      $\tilde q \leftarrow \tilde q + dq_\ell$; $\tilde r \leftarrow \tilde r + dr_\ell$; $\tilde P \leftarrow P_\ell$
    else
      $t \leftarrow t_0$; stop while
  if $\tilde P = \emptyset$ then $t \leftarrow -\tilde r / \tilde q$
Output: $t$ satisfying $\rho(t) = 0$
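The randomized pivoting of Algorithm 5.3 can be sketched in a few lines of Python; this is a hypothetical direct transcription, not tuned code:

```python
import random

def plm_root(q, r, pairs):
    """Root of the piecewise linear monotone rho(t) = q*t + r + sum_i s_i*|t - t_i|.

    Sketch of Algorithm 5.3: pivot on a random breakpoint t0; depending on the
    sign of rho(t0), absorb the breakpoints on the far side of the root into
    the running linear part (q, r) and recurse on the rest.
    """
    pairs = list(pairs)
    while pairs:
        s0, t0 = random.choice(pairs)
        dql = drl = dqu = dru = 0.0
        P_lo, P_up = [], []
        for s, t in pairs:
            if t > t0:
                P_up.append((s, t))
            else:               # t <= t0: slope +s to the right of t0
                dqu += s
                dru -= s * t
            if t < t0:
                P_lo.append((s, t))
            else:               # t >= t0: slope -s to the left of t0
                dql -= s
                drl += s * t
        v0 = (q + dqu + dql) * t0 + r + dru + drl   # v0 = rho(t0)
        if v0 < 0:              # root lies above t0
            q, r, pairs = q + dqu, r + dru, P_up
        elif v0 > 0:            # root lies below t0
            q, r, pairs = q + dql, r + drl, P_lo
        else:
            return t0
    return -r / q

# rho(t) = 3t - 3 + |t - 1| + |t + 1| has its root at t = 1/3:
assert abs(plm_root(3.0, -3.0, [(1.0, 1.0), (1.0, -1.0)]) - 1.0 / 3.0) < 1e-12
```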

Figure 5.1: Example synthetic semi-supervised learning problem. 10000 total points; 90 labeled points.

In Figure 5.2, we show the results of dividing the potentials across different simulated processors to examine the effect of parallelization. Since the main bottleneck of the implementation is computing the projections, which parallelizes perfectly across processors, we see close to ideal speedups.


Figure 5.2: Speedup on example problem with 10000 total points, using proximal gradient with Barzilai–Borwein stepsizes.


Figure 5.3: Convergence with the FISTA method. a) Duality gap; b) resulting classification; c) norm of $\lambda$; d) residual. Final objective value: $-389.6998$; classification error: 0.082.


Figure 5.4: Convergence with the Barzilai–Borwein method. a) Duality gap; b) resulting classification; c) norm of $\lambda$; d) residual. Final objective value: $-387.1702$; classification error: 0.0821.

Chapter 6

Learning Fourier Sparse Set Functions

6.1 Introduction

Suppose we wish to sketch the evolution of a massive network: we are given a sequence of networks, where between each step only a few edges are added or removed. Can we compute a small number of statistics which allow us, in hindsight, to reconstruct which edges were added or removed, without storing the entire network? In this chapter, we show that this is possible by observing and storing a small number of random cuts and their values. Formally, we consider the problem of learning a set function $f$ (mapping subsets of some finite set $E$ of size $n$ to the real numbers) by observing its value on a few sets. Without assuming any structure, we clearly need an exponential (in $n$) number of observations to approximate the function well over all sets, so we need an appropriate regularity condition. To that end, we consider the situation where $f$ is smooth in the sense of having a decaying Fourier (Hadamard–Walsh) spectrum. One natural example of this is the cut function of a

(possibly directed) graph, or generalized additively independent (GAI) functions [Fis67], which decompose into a sum of local terms.

By leveraging recent results from sparse recovery [Ver12], we show that if the function is sparse in the Fourier domain, having at most $k$ nonzero coefficients with support contained in a known collection $P$ of size $p$, then it is possible to efficiently recover the function exactly from very few samples. In particular, suppose we pick $O(k \log^4 p)$ sets uniformly at random. Then with very high probability (over this random choice), observing the values of the function on these sets is sufficient to exactly reconstruct it.

Besides a decaying Fourier spectrum, many set functions encountered in practice satisfy additional properties. In particular, we consider submodular functions, which form a natural discrete analogue of convex functions [Lov83]. Submodularity is satisfied by numerous set functions encountered in practice, such as the cut function in graphs [Sch03], entropy [KK80, KSG08], etc. The problem of learning submodular functions has received considerable attention recently [GHIM09, BH11]. However, approximating a submodular function by a factor better than $\sqrt{n}/\log n$ uniformly over all sets requires an exponential number of function evaluations, even if those can be adaptively chosen [GHIM09]. We show that submodularity implies certain structure in the Fourier domain, which can be exploited to reduce the number of required samples even further.

Besides allowing us to sketch the evolution of large graphs by observing the value of a few random cuts, as mentioned above, our results show that practically relevant set functions, such as certain valuation functions (a fundamental concept in economics capturing substitutability of products), can be efficiently learned from few examples. Another natural application is in speeding up submodular optimization: standard algorithms assume that the function $f$ is presented by an oracle, which evaluates $f$ on any set.
In general, evaluating $f$ can be very costly (requiring the solution of a large linear system, or performing large-scale simulations). In such a setting, if $f$ is Fourier-sparse, we can approximate it compactly using a small number of random sets, and then optimize the compact representation instead. In summary, our main contributions are:

• We show that it is possible to learn Fourier $k$-sparse set functions exactly using $O(k \log^4 p)$ random samples. This reconstruction is robust to noise.

• We show that properties such as symmetry and submodularity of $f$ imply structure in the Fourier domain, which can be exploited to obtain further reductions in sample complexity.

• We demonstrate our algorithm on a problem of sketching the evolution of a graph, and on approximate submodular optimization.

6.2 Background

Recall that we define $H = \mathbb{R}^{2^E}$ to be the space of set functions with ground set $E$, which is of size $n$. Suppose we are given the value of a set function $f \in H$ on some collection of subsets. For now, let us assume that these observations are noise-free; we will relax this condition later. Under what conditions can we hope to recover $f$? Clearly, without any assumptions about $f$, we need an exponential number (in $n$) of samples in order to obtain exact reconstruction. However, if $f$ is smooth in some way, we may hope to do better. As with continuous functions, a natural smoothness condition is a decaying Fourier spectrum.

6.2.1 The Fourier transform on set functions.

Set functions can equivalently be represented as real-valued functions of boolean vectors, known as pseudo-boolean functions. Just as the set of boolean vectors $\{0,1\}^n$ forms the commutative group $\mathbb{Z}_2^n$ under addition modulo 2, the power set $2^E$ forms an equivalent group under the operation of symmetric set difference: $A \triangle B := (A \setminus B) \cup (B \setminus A)$. So the space $H$ has a natural Fourier (also called Hadamard–Walsh) basis, and in our set function notation the corresponding Fourier basis vectors, or Fourier modes, are defined as:

$$\psi_B(A) := (-1)^{|A \cap B|}.$$

The space $H$ is endowed with the standard inner product $\langle f, g \rangle := 2^{-n} \sum_{A \in 2^E} f(A) g(A)$. The Fourier transform of a function is given by its inner products with the Fourier modes:
$$\hat f(B) := \langle f, \psi_B \rangle = 2^{-n} \sum_{A \in 2^E} f(A) (-1)^{|A \cap B|}.$$
Note that the sum in this definition has exponentially many terms, so it is not practical to evaluate directly. As with any orthonormal basis, we have a reconstruction formula: $f(A) = \sum_{B \in 2^E} \hat f(B) \psi_B(A)$.

The Fourier support of a set function is the collection of subsets with nonzero Fourier coefficient: $\mathrm{Supp}[\hat f] := \{B \in 2^E \mid \hat f(B) \ne 0\}$. Given a collection of subsets $P \subseteq 2^E$, let $H_P := \{f \in H \mid \mathrm{Supp}[\hat f] \subseteq P\}$ be the subspace of functions with Fourier support contained in $P$. We assume we have some a priori knowledge about the Fourier support which gives a natural choice for $P$. We discuss this in further detail in Section 6.4, but for now assume $P$ is some known collection of polynomial size. One illustrative example is the collection of sets of size $q$ or less: $P_q := \{B \in 2^E \mid |B| \le q\}$. As it is particularly important, we denote this function space, consisting of all functions of order $q$ or less, by the symbol $H_q$. The number of free parameters is $p = \sum_{j=0}^{q} \binom{n}{j}$, which is not too large when $q = 2$.

Now, with $P$ fixed, suppose we restrict ourselves to $f \in H_P$. Can we recover $f$ with a subexponential number of samples? In the next section, we show that if the Fourier support is small, then this is indeed possible by leveraging recent results from sparse reconstruction.

6.3 Conditions for Recovery

Since a set function is uniquely determined by its Fourier transform, recovering a Fourier-sparse function can be thought of as recovery of a sparse vector in $\mathbb{R}^{2^n}$. For large $n$, even representing such vectors is intractable. However, if we know that the Fourier support of a function is contained in $P$, then we may instead treat $\hat f$ as a sparse vector in $\mathbb{R}^p$. We will show that it is possible to uniquely recover any $f \in H_P$ with $|\mathrm{Supp}[\hat f]| \le k$ by observing the values $\mathbf{f}_M$ (with high probability over the choice of measurement sets $M$), provided that

$$m = O(k \log^4(p)).$$

Matrix-vector notation. In the problems that we consider, we observe the function $f$ evaluated on sets from a measurement collection $M = \{A_i\}$ of size $m$. We arrange these measurements in a vector $\mathbf{f}_M \in \mathbb{R}^m$, where $\mathbf{f}_M[i] := f(A_i)$ for $i = 1 \dots m$. Note the bold typeface used to distinguish vectors from set functions. Furthermore, we will assume that the Fourier support is contained in a known potential support collection $P = \{B_j\}$ of size $p$. We denote by $\hat{\mathbf{f}}_P \in \mathbb{R}^p$ the corresponding vector of Fourier coefficients, where $\hat{\mathbf{f}}_P[j] := \hat f(B_j)$ for $j = 1 \dots p$. Lastly, we denote by $\Psi_{M,P}$ the $m \times p$ matrix which relates the two vectors:
$$\Psi_{M,P}[i,j] = \psi_{B_j}(A_i) = (-1)^{|A_i \cap B_j|}. \tag{6.3.1}$$

Then for $f \in H_P$ we have:
$$\mathbf{f}_M = \Psi_{M,P} \hat{\mathbf{f}}_P. \tag{6.3.2}$$
So recovery of $f$ is equivalent to recovery of a sparse vector from linear measurements.
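A small sketch of this measurement model (with hypothetical sizes), confirming that Equation 6.3.2 matches direct evaluation of $f$ on each measurement set:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 10
# Potential support P: all sets of size <= 2; a random sparse f_hat on P.
P = [frozenset(s) for r in range(3) for s in itertools.combinations(range(n), r)]
fhat = rng.standard_normal(len(P)) * (rng.random(len(P)) < 0.3)

# Measurement sets chosen uniformly at random, and the matrix of Eq. 6.3.1.
M = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(m)]
Psi = np.array([[(-1) ** len(A & B) for B in P] for A in M], dtype=float)

# f_M = Psi @ fhat reproduces direct evaluation of f on each measurement set.
def f(A):
    return sum(c * (-1) ** len(A & B) for B, c in zip(P, fhat))

fM = Psi @ fhat
assert np.allclose(fM, [f(A) for A in M])
```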

Restricted Isometry. The problem of finding a $k$-sparse vector from an underdetermined linear system has received significant attention in the context of compressive sensing [CRT06, Don06]. A sufficient condition for recovery is that the sensing matrix satisfies a key property, the Restricted Isometry Property (RIP). In order to ensure that our measurement matrix $\Psi_{M,P}$ satisfies this property, we simply choose the measurement sets $M = \{A_1, \dots, A_m\}$ uniformly at random. Then, as we will see below, results in random matrix theory imply that with high probability (for any fixed $P$), the measurement matrix indeed satisfies RIP. This insight opens up a vast collection of tools from compressive sensing for the purpose of recovering set functions. The idea behind RIP is that a matrix acts approximately as an isometry on sparse vectors. Specifically, suppose for some sparsity level $k$ and $\delta \ge 0$ we have:
$$(1 - \delta)\|x\|^2 \le \|\Phi x\|^2 \le (1 + \delta)\|x\|^2 \quad \text{for all } x \in \mathbb{R}^p \text{ with } |\mathrm{Supp}[x]| \le k \tag{6.3.3}$$

We define $\delta_k(\Phi)$ as the smallest value of $\delta$ such that Equation 6.3.3 holds. An easy consequence of this definition is that the linear measurement vector $y = \Phi x_0$ uniquely determines any $k$-sparse $x_0$ if and only if $\delta_{2k} < 1$. Furthermore, with a stronger assumption on the isometry constants, the original vector can be recovered by solving a convex optimization problem:
$$\min_{x \in \mathbb{R}^p} \|x\|_1, \quad \Phi x = y \tag{6.3.4}$$
Originally, [CT05] showed that Equation 6.3.4 recovers any $k$-sparse $x_0$ if $\delta_{3k} + 3\delta_{4k} < 2$, but this condition has been weakened several times, most recently in [Fou10], which gives the condition $\delta_{2k}(\Phi) < 3/(4 + \sqrt{6}) \approx 0.465$. Furthermore, as discussed below, this result can be generalized to noisy measurements.

Main Reconstruction Result. As discussed above, RIP is a very powerful property, but it is not easy to check that any given matrix satisfies it. In fact, most constructions are based on choosing measurements randomly and then bounding the probability that RIP fails. Perhaps the simplest such case is for random matrices with independent subgaussian entries. In our case, however, we are randomly sampling rows from an orthonormal matrix with bounded entries. Fortunately, as shown by Rudelson and Vershynin [RV08, Ver12], even in this setting, as long as $m = O(k \log^4(p))$, the expectation of the $k$th RIP constant is small. More recently, [Rau10] demonstrated that RIP for such matrices holds with high probability. Our result below is essentially Theorem 4.4 of [Rau10] as applied to our case of set functions.

Theorem 6.1. For a fixed collection $P = \{B_j\}_{j=1}^p \subset 2^E$, suppose a measurement collection $M = \{A_i\}_{i=1}^m \subset 2^E$ is chosen by selecting the sets $A_i$ uniformly at random. Define the matrix $\Psi_{M,P} \in \mathbb{R}^{m \times p}$ as in Equation 6.3.1. Then there exist universal constants $C_1, C_2 > 0$ such that if $k \le p/2$ and $m \ge \max(C_1 k \log^4(p),\ C_2 k \log(1/\delta))$, then, except for an event of probability no more than $\delta$, the following holds for all $f \in H_P$:

For any noise level $\eta \ge 0$ and any noisy vector of measurements $y \in \mathbb{R}^m$ within that noise level, $\|y - \mathbf{f}_M\|_2 \le \eta$, suppose $g \in H_P$ has Fourier transform vector $\hat{\mathbf{g}}_P \in \mathbb{R}^p$ given by:

$$\hat{\mathbf{g}}_P = \arg\min_{x \in \mathbb{R}^p} \|x\|_1, \quad \|y - \Psi_{M,P} x\|_2 \le \eta. \tag{6.3.5}$$

Then the following bound holds for some universal constants $C_3, C_4$:

$$\|f - g\|_2 \le \frac{C_3}{\sqrt{k}}\, \mu_k(\hat{\mathbf{f}}) + \frac{C_4}{\sqrt{m}}\, \eta, \tag{6.3.6}$$
where the quantity $\mu_k(\cdot)$ is defined as the $\ell_1$ error of the best $k$-sparse approximation:

$$\mu_k(x) := \min_{|\mathrm{Supp}(z)| \le k} \|x - z\|_1 \tag{6.3.7}$$
In particular, if $\hat{\mathbf{f}}_P$ is $k$-sparse and $\eta = 0$, then $g = f$.

Therefore, we obtain a strong guarantee for efficiently (using convex optimization) learning Fourier-sparse set functions, robust against measurement noise. Note that, up to log factors, this matches lower bounds of [CJK11], who show that $\Omega(k \log n)$ measurements are necessary for recovery of a $k$-sparse function in $H_q$ with $q$ fixed.
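As an illustration of the noise-free case, the basis pursuit problem of Equation 6.3.4 can be posed as a linear program by splitting $x$ into positive and negative parts. The sketch below uses SciPy's `linprog` on a hypothetical small instance; the feasibility and objective checks below are guaranteed, and for a problem of this size the minimizer typically coincides with the true sparse vector:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n = 8
P = [frozenset(s) for r in range(3) for s in itertools.combinations(range(n), r)]
p, k, m = len(P), 3, 25

# A k-sparse Fourier vector and m random measurement sets (noise-free case).
fhat = np.zeros(p)
fhat[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
M = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(m)]
Psi = np.array([[(-1) ** len(A & B) for B in P] for A in M], dtype=float)
y = Psi @ fhat

# Basis pursuit (Eq. 6.3.4) as an LP: x = xp - xm with xp, xm >= 0.
c = np.ones(2 * p)
res = linprog(c, A_eq=np.hstack([Psi, -Psi]), b_eq=y,
              bounds=[(0, None)] * (2 * p), method="highs")
x_rec = res.x[:p] - res.x[p:]
assert res.success and np.allclose(Psi @ x_rec, y, atol=1e-6)
assert np.sum(np.abs(x_rec)) <= np.sum(np.abs(fhat)) + 1e-6
```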

6.4 Classes of Set Functions

In general, $p$ is superlinear in $n$, so even though Equation 6.3.4 is equivalent to a linear program, it will not necessarily lead to an efficient recovery algorithm. In the extreme case, if $P = 2^E$, then even calculating a single matrix-vector product $\Psi_{M,P}^T y$ is difficult. So even though the recovery guarantees of Theorem 6.1 apply to arbitrary collections $P$, we need to make some further assumptions about our function to get a practical recovery algorithm.

6.4.1 Symmetric functions.

One natural structural property, obeyed by set functions commonly arising in practice, is symmetry: for all sets $A$ it holds that $f(A) = f(E \setminus A)$. Examples of functions satisfying this property are the cut function in undirected graphs, as well as the mutual information, both considered in our experiments (cf. Section 6.6). It turns out that symmetry already implies interesting structure in the Fourier domain:

Proposition 6.2. Let $f$ be a symmetric set function. Then for all sets $B$ of odd cardinality, it holds that $\hat f(B) = 0$.

Therefore, symmetry already implies that we can restrict $P$ to sets of even cardinality, omitting all sets of odd cardinality.
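Proposition 6.2 is easy to verify numerically. The following hypothetical sketch symmetrizes a random set function and checks that all odd-cardinality Fourier coefficients vanish:

```python
import itertools
import numpy as np

n = 5
subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
rng = np.random.default_rng(2)

# Build a symmetric set function by symmetrizing random values: f(A) = f(E\A).
raw = {A: rng.standard_normal() for A in subsets}
full = frozenset(range(n))
f = {A: raw[A] + raw[full - A] for A in subsets}

# Fourier coefficients of odd-cardinality sets vanish (Proposition 6.2).
for B in subsets:
    fhatB = 2.0 ** (-n) * sum(f[A] * (-1) ** len(A & B) for A in subsets)
    if len(B) % 2 == 1:
        assert abs(fhatB) < 1e-10
```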

6.4.2 Low order functions.

In Section 3.2.2, we defined the subspace $H_q$ of order $q$ functions. The following characterizations are all equivalent:

Properties 6.1. Low Order Characterizations

1. $\Delta_B f(A) = 0$ for all $A, B \in 2^E$ with $|B| > q$.
2. $\hat f(B) = 0$ for all $B \in 2^E$ with $|B| > q$.
3. $f(A) = \sum_B [[B \subseteq A,\ |B| \le q]]\, \Delta_B f(\emptyset)$ for all $A \in 2^E$.
4. $\sum_{J \subseteq \{0, 1, \dots, q+1\}} (-1)^{|J|} f\big(\bigcup_{i \in J} A_i\big) = 0$ for all $A_0, A_1, \dots, A_{q+1} \in 2^E$.
5. There exist $g_i \in H$ and $|B_i| \le q$ such that $f(A) = \sum_i g_i(A \cap B_i)$ for all $A \in 2^E$.

Many set functions $f$ are low-order, or well-approximated by a low-order function. Recovery of an order 1 function is equivalent to classical compressed sensing with a Bernoulli measurement matrix (assuming we ignore the constant offset $f(\emptyset)$). Recovery of a symmetric order 2 function can be thought of as recovering a graph from the values of its cut function, a problem which has received considerable interest, partly due to several problems arising in computational biology [ABK+04, GK00, CK10]. We can see the correspondence as follows: given a weighted undirected graph, defined by a matrix of edge weights $W$, define the symmetric cut function over sets of vertices of the graph:

$$\phi_W(A) := \sum_{u \in A,\, v \notin A} W[u, v]. \tag{6.4.1}$$
Then the Fourier transform can be computed explicitly:

$$\hat\phi_W(B) = \begin{cases} \frac{1}{2} \sum_{u < v} W[u, v], & B = \emptyset \\[2pt] -\frac{1}{2} W[u, v], & B = \{u, v\} \\[2pt] 0, & \text{otherwise.} \end{cases} \tag{6.4.2}$$

Hence there is a simple linear correspondence between the weights of a graph and the 2nd order Fourier coefficients of $\phi_W$. Clearly, this correspondence works in reverse, i.e., given any symmetric 2nd order function $f$, there is a unique graph such that $f(A) - f(\emptyset) = \phi_W(A)$. In the general case, functions in $H_q$ can be thought of as cut functions of hypergraphs of degree $q$, as considered in [BM10].
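The correspondence of Equation 6.4.2 can be checked by brute force on a small random graph. This is a hypothetical sketch; the $\frac{1}{2}\sum_{u<v}$ constant term assumes the sum runs over unordered pairs:

```python
import itertools
import numpy as np

n = 5
rng = np.random.default_rng(4)
T = np.triu(rng.random((n, n)), 1)
W = T + T.T                                     # symmetric edge weights

def cut(A):
    return sum(W[u, v] for u in A for v in range(n) if v not in A)

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
fhat = {B: 2.0 ** (-n) * sum(cut(A) * (-1) ** len(A & B) for A in subsets)
        for B in subsets}

assert abs(fhat[frozenset()] - 0.5 * T.sum()) < 1e-10       # B = empty set
for u, v in itertools.combinations(range(n), 2):            # B = {u, v}
    assert abs(fhat[frozenset({u, v})] + 0.5 * W[u, v]) < 1e-10
for B in subsets:                                           # all other B vanish
    if len(B) not in (0, 2):
        assert abs(fhat[B]) < 1e-10
```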

6.4.3 Submodular functions.

Another structural property exhibited by many set functions of practical importance is submodularity, a natural discrete analogue of convexity [Lov83]. We present a definition of submodular functions emphasizing the Fourier transform. The Fourier modes are eigenfunctions of the shift and discrete difference operators: the shift operator applied to a Fourier mode gives $S_B \psi_C(A) = \psi_C(B)\, \psi_C(A)$, and the discrete difference of a Fourier mode is $\Delta_B \psi_C(A) = [[B \subseteq C]]\, (-2)^{|B|} \psi_C(A)$. Submodular functions are those with nonpositive second order differences: $\Delta_B f(A) \le 0$, where $A$ and $B$ are disjoint and $B$ is a set of size two. We can drop the restriction that $A$ and $B$ are disjoint by multiplying by the corresponding Fourier mode, giving us the following characterization of the cone of submodular functions:

$$H_2^- = \{f \in H \mid \psi_B(A)\, \Delta_B f(A) \le 0 \text{ for all } A, B \in 2^E,\ |B| = 2\}.$$
While submodularity does not immediately restrict the set $P$ of candidate supports, it does immediately imply dependence among the Fourier coefficients, (a subset of) which can be encoded as constraints in the convex program solved during recovery. In particular, submodularity can be characterized in the Fourier domain. By using Equation 3.2.5 to express a second order difference as a sum of Fourier coefficients, we see that $f$ is submodular if and only if for all $|B| = 2$ and $A \subseteq E \setminus B$:
$$\hat f(B) + \sum_{C \in 2^E,\ C \supsetneq B} \hat f(C)\, \psi_C(A) \le 0 \tag{6.4.3}$$
Checking submodularity is no easier in the Fourier domain; it still requires checking at least $\binom{n}{2} 2^{n-2}$ inequalities. However, we can get a necessary condition for submodularity.

Proposition 6.3. For all $f \in H_2^-$, and $|B| = 2$, $B \subsetneq C$,
$$\hat f(B) + |\hat f(C)| \le 0. \tag{6.4.4}$$

Proof. Given a particular $C_1$ such that $B \subsetneq C_1$, let $Q := \{A \in 2^E \mid \psi_{C_1}(A) = \mathrm{sign}(\hat f(C_1))\}$. Summing Equation 6.4.3 over all $A \in Q$, we have:
$$\sum_{A \in Q} \Big( \hat f(B) + \sum_{C \supsetneq B} \hat f(C)\, \psi_C(A) \Big) \le 0$$
$$|Q|\, \hat f(B) + \sum_{C \supsetneq B} \hat f(C) \sum_{A \in Q} \psi_C(A) \le 0 \tag{6.4.5}$$
Fix a set $C_2$ from the inner sum of Equation 6.4.5. We claim that if $C_1 \ne C_2$, then there is some set $D$ such that $A \in Q \Leftrightarrow A \triangle D \in Q$ and $\psi_{C_2}(A) = -\psi_{C_2}(A \triangle D)$, so $\sum_{A \in Q} \psi_{C_2}(A) = 0$ by symmetry. To construct a set $D$ satisfying the claim, at least one of the following two choices must work:

1. Choose any $c \in C_2 \setminus C_1$, and let $D = \{c\}$.

2. Choose any $b \in B$ and $c \in C_1 \setminus C_2$, and let $D = \{b, c\}$.

In summary, we have:
$$\sum_{A \in Q} \psi_{C_2}(A) = \begin{cases} 0 & \text{if } B \subseteq C_2 \text{ and } C_2 \ne C_1 \\ |Q|\, \mathrm{sign}(\hat f(C_1)) & \text{if } C_2 = C_1 \end{cases}$$

Thus Equation 6.4.5 reduces to $|Q|\, \hat f(B) + |Q|\, \hat f(C_1)\, \mathrm{sign}(\hat f(C_1)) \le 0$, which is equivalent to Equation 6.4.4, as desired.

This has an immediate simple implication about the support of a submodular function:

Corollary 6.4. For $f \in H_2^-$, if $C \in \mathrm{Supp}[\hat f]$, then $B \in \mathrm{Supp}[\hat f]$ for all $B \subset C$ with $|B| = 2$.

Besides providing some intuition about the Fourier support of submodular functions, Equation 6.4.4 gives a relatively simple convex constraint that can be incorporated into our recovery program. In general, adding any valid convex constraint can never increase our recovery error (a simple consequence of convexity), and in practice it often decreases it. There is another such useful constraint for any function which is low order in addition to being submodular. We can fully characterize third order submodular functions in terms of $\binom{n}{2}$ inequalities.
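A quick numerical check of Proposition 6.3 on a hypothetical small coverage function, which is submodular:

```python
import itertools

# A small coverage function f(A) = |union of S_i for i in A| (submodular).
S = [{0, 1}, {1, 2}, {2, 3}, {0, 3, 4}]
n = len(S)
subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
f = {A: len(set().union(*(S[i] for i in A))) if A else 0 for A in subsets}
fhat = {B: 2.0 ** (-n) * sum(f[A] * (-1) ** len(A & B) for A in subsets)
        for B in subsets}

# Proposition 6.3: fhat(B) + |fhat(C)| <= 0 whenever |B| = 2 and B ⊊ C.
for B in subsets:
    if len(B) != 2:
        continue
    for C in subsets:
        if B < C:   # proper subset
            assert fhat[B] + abs(fhat[C]) <= 1e-9
```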

Proposition 6.5. For all $f \in H_3$, we have $f \in H_2^-$ if and only if for all $|B| = 2$:
$$\hat f(B) + \sum_{e \in E \setminus B} |\hat f(B + e)| \le 0. \tag{6.4.6}$$
Proof. For third order submodular functions, Equation 6.4.3 reduces to the following, which must hold for all $|B| = 2$ and $A \subseteq E \setminus B$:
$$\hat f(B) + \sum_{e \in E \setminus B} \hat f(B + e)\, \psi_{B+e}(A) \le 0 \tag{6.4.7}$$
To show that Equation 6.4.6 is necessary for submodularity, apply Equation 6.4.7 to the set

$A^* = \{e \in E \setminus B \mid \hat f(B + e) < 0\}$. By construction, we have:

$$\hat f(B + e)\, \psi_{B+e}(A^*) = |\hat f(B + e)|,$$

and this implies Equation 6.4.6. For the converse statement, note that $A^*$ is the choice of $A$ that maximizes the left-hand side of Equation 6.4.7. That is, for all $A \subseteq E \setminus B$, we have $\hat f(B + e)\, \psi_{B+e}(A) \le |\hat f(B + e)|$, which means that Equation 6.4.6 implies Equation 6.4.7.

Similarly, for fourth order submodular functions, we can get a necessary but not sufficient condition which is stronger than Equation 6.4.4. For all $f \in H_4 \cap H_2^-$ and $|B| = 2$, we have:

$$\hat f(B) + |\hat f(B + s) \pm \hat f(B + t)| \pm \hat f(B + s + t) \le 0$$

The problem of recovering a submodular function has been studied in [CKKL12] through noise stability. One consequence of this property is that for any submodular function we have:
$$\sum_{B \in 2^E} [[\,|B| \ge 2,\ |B| \text{ even}\,]]\, \hat f(B) + \left( \sum_{B \in 2^E} [[\,|B| \ge 2\,]]\, (|B| - 1)\, \hat f(B)^2 \right)^{1/2} \le 0$$

6.5 Reconstruction Algorithms

In Section 6.3, we showed that the problem of learning Fourier-sparse set functions can be reduced to the compressed sensing paradigm of recovering a sparse vector from RIP measurements. This insight opens up a cornucopia of algorithms that have been developed for this setup [TW10]. In particular, several greedy algorithms, such as Orthogonal Matching Pursuit, can explicitly take advantage of RIP to guarantee recovery, as shown in

[Tro04]. For our experiments, we take the approach of convex optimization. Rather than solving Equation 6.3.4 exactly, we minimize the Lagrangian formulation, so that we can apply an accelerated proximal method such as that of [AT06]:

$$\min_{x \in \mathbb{R}^p} \|x\|_1 + \frac{1}{2\mu} \|\Psi_{M,P}\, x - y\|^2. \tag{6.5.1}$$

In our experiments, we use the toolbox TFOCS [BCG11], which requires only that we supply a method to apply $\Psi_{M,P}$ and $\Psi_{M,P}^T$. In the case of second order set functions, we do not need to store the entire $m \times p$ matrix; there is a formula that requires only $O(mn)$ storage. Let $\Psi_{M,q} := \Psi_{M,P_q}$ be the subsampled $m \times \binom{n}{q}$ Fourier matrix whose columns correspond to the sets of size $q$. The matrix $\Psi_{M,1} \in \mathbb{R}^{m \times n}$ is given by the formula $\Psi_{M,1}[i,j] = 1 - 2[[j \in A_i]]$. If the 2nd order Fourier coefficients from $x \in \mathbb{R}^{\binom{n}{2}}$ are arranged in the off-diagonal elements of an $n \times n$ matrix $X$, then the elements of $\Psi_{M,2}\, x$ are the diagonal elements of $\Psi_{M,1} X \Psi_{M,1}^T$, and the transpose operation is $\Psi_{M,2}^T\, r = \Psi_{M,1}^T \,\mathrm{diag}(r)\, \Psi_{M,1}$.
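A sketch of these implicit operators in Python (hypothetical names; we assume the coefficient vector $x$ is indexed by pairs $u < v$ and stored in the upper triangle of $X$), verified against the explicit matrix:

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n, m = 7, 12
pairs = list(itertools.combinations(range(n), 2))   # sets of size 2, u < v
A_ind = rng.random((m, n)) < 0.5                    # random measurement sets
Psi1 = 1.0 - 2.0 * A_ind                            # Psi_{M,1}[i,j] = 1 - 2[[j in A_i]]

def psi2_matvec(x):
    """Apply Psi_{M,2} with O(mn) storage: diag(Psi1 @ X @ Psi1.T)."""
    X = np.zeros((n, n))
    for (u, v), val in zip(pairs, x):
        X[u, v] = val                               # upper triangle only
    return np.einsum("ij,jk,ik->i", Psi1, X, Psi1)

def psi2_rmatvec(r):
    """Apply Psi_{M,2}^T: upper triangle of Psi1.T @ diag(r) @ Psi1."""
    G = Psi1.T @ (Psi1 * r[:, None])
    return np.array([G[u, v] for u, v in pairs])

# Compare against the explicit matrix Psi_{M,2}[i,(u,v)] = Psi1[i,u]*Psi1[i,v].
Psi2 = np.array([[Psi1[i, u] * Psi1[i, v] for (u, v) in pairs] for i in range(m)])
x = rng.standard_normal(len(pairs))
r = rng.standard_normal(m)
assert np.allclose(psi2_matvec(x), Psi2 @ x)
assert np.allclose(psi2_rmatvec(r), Psi2.T @ r)
```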

In Section 6.4, we showed that submodularity implies constraints on the relative magnitudes of the Fourier coefficients. In addition to encoding this structure into the convex program to improve recovery, this structure can further be exploited to extend our technique to higher order functions (where the collection $P$ can become intractably large). The key step in most sparse recovery algorithms is to find the largest magnitude elements of $\Psi_{M,P}^T\, r$ given a residual vector $r$. For example, first order methods applied to Equation 6.5.1, such as the ones we use, are equivalent to iterative soft-thresholding. While we could find the largest magnitude elements of $\Psi_{M,P}^T\, r$ by simply applying the full transformation and sorting, one can use submodularity to avoid having to compute the entire set of higher-order coefficients. For example, if the function is 3rd order and submodular, we can apply Equation 6.4.6 and note that for $|B| = 3$,
$$|\hat f(B)| \le \min_{b \in B} \Big( -\hat f(B - b) - \sum_{a \in E \setminus B} |\hat f(B - b + a)| \Big).$$
So these constraints can be used to speed up the identification of the largest magnitude coefficients, as we need only compute the 3rd order coefficients with sufficient slack. We leave a detailed investigation of this direction open for future work.

6.6 Applications and Experiments

We evaluate our approach to learning set functions on two real-world data sets. We also use synthetic data to demonstrate our claim that enforcing submodularity through convex constraints can improve recovery of submodular functions.

6.6.1 Sketching graph evolution

We consider the problem of reconstructing (differences between) graphs by observing random cuts. Suppose we are given a sequence of weighted undirected graphs with weight matrices $W_1, \dots, W_T$ that, without loss of generality, share the same set of vertices. Let $f_i(A) = \phi_{W_i}(A)$ be the corresponding symmetric cut functions as defined in Equation 6.4.1. Note that by Equation 6.4.2, knowing $f_i$ uniquely determines $W_i$. (To handle the case of directed graphs, we can use a correspondence with undirected bipartite graphs of twice the size.)


Figure 6.1: Experimental results. (a) Graph sketching of transitions of the Autonomous Systems graph. We plot the number of random cuts observed vs. the reconstruction error (combined Type I + Type II error). During different transitions, the number $\Delta$ of changing edges varies. Notice how approximately $8\Delta$ random observations suffice for perfect reconstruction. (b) Approximate submodular maximization in environmental monitoring. We wish to choose sets of locations with maximum mutual information. We compare the greedy algorithm optimizing the true function, Fourier-sparse reconstructions obtained from $n$, $2n$, $4n$, and $8n$ samples, and random selection. Notice that $8n$ samples already provide performance very close to the true objective.

As we observed in Section 6.5, cut functions are contained in $H_2$, and there is one nonzero Fourier coefficient for each edge. We can thus use Theorem 6.1 to reconstruct the graph by observing $O(k^{(t)} \log^4 n)$ values of random cuts, where $k^{(t)}$ is the number of edges at time $t$. Note that in practice, typically $k = \Omega(n)$, so for large graphs we would require a proportionally large number of observations. If, however, we are interested in how a graph changes over time, and this change happens slowly, we can use the fact that the difference $W^{(t)} - W^{(t+1)}$ is sparse.

In our experiments, we take a sequence of five snapshots of the Autonomous Systems graph.¹ Our experiments are performed on the subgraph induced by the 128 nodes with largest degree. We first pick an increasing number of sets at random. We then sketch the graphs at different time steps by computing the cut values associated with those sets. Since the cut function is linear in the edge weights, the difference in cut values corresponds to the cuts in the symmetric graph differences. We can therefore reconstruct the difference in the edge sets using the reconstruction algorithm described in Section 6.5. Note that the number of changing edges varies from 62 to 245. Figure 6.1a presents the reconstruction error (in terms of the fraction of edges correctly classified as changing or not changing). For all transitions, exact recovery is possible using a number of samples approximately a factor of 8 larger than the number of changing edges. Also, we observe that, consistent with results in compressive sensing, a sharp phase transition occurs between a regime in which the error is close to 100% and the regime in which perfect reconstruction occurs.

6.6.2 Approximate submodular optimization

Suppose a submodular function to be optimized is extremely expensive to evaluate, but can be approximated with our reconstruction methods from random samples. Then one can evaluate the function on random samples to construct an approximation, and optimize the approximation instead. We test this approach on submodular function maximization in an environmental monitoring application. We consider the problem of selecting a small number of most informative observations for the purpose of spatial prediction. We take temperature data from the NIMS sensor node [HAG+06] deployed at a lake near the University of California, Merced. The environment is discretized into a set E of n = 86 locations. We train a nonstationary Gaussian process using data from a single scan of the lake by the NIMS sensor node, using a

¹Downloaded from http://snap.stanford.edu

method described by [KSG08]. In order to quantify the informativeness of a set of locations A ∈ 2^E, we use the mutual information

f(A) = I(X_A; X_{E∖A}) = H(X_{E∖A}) − H(X_{E∖A} | X_A)

which quantifies the reduction of uncertainty in the unobserved locations E∖A by taking into account the observations X_A at the selected locations. As shown by [KSG08], f is submodular and approximately monotonic (for small sets A). Therefore, an efficient greedy algorithm produces a set A_G with near-maximal informativeness. The algorithm proceeds by adding observations that maximally increase f(A) until k observations have been selected [NWF78]. Unfortunately, computing the mutual information f(A) for the case of Gaussian processes requires solving a linear system in n variables, which is very expensive for large n. We consider approximating f by a low-order function. We evaluate f on an increasing number of sets, chosen uniformly at random, and then use the algorithm described in Section 6.5 to approximate f ∈ H2. Notice that even though f is not exactly sparse, it appears to be well-approximated by an order 2 function: the best order 2 approximation explains approximately 86% of its variance. In order to determine how well suited the approximate function is for optimization, we run the greedy algorithm on the approximation, and compare the resulting sets with the (provably near-optimal) solutions obtained by running the greedy algorithm on the original (expensive to evaluate) function f. As a baseline, we also compare against the performance of sets chosen uniformly at random. Figure 6.1b presents the results of the experiment, using approximations obtained from n, 2n, 4n and 8n random function evaluations. Not surprisingly, n and 2n function evaluations lead to poor performance. However, even 4n samples lead to strong performance, and 8n samples lead to solutions almost as good as those obtained when working with the true objective. These results indicate that the proposed approximations can perform very well even though the assumption of exact sparsity in the Fourier domain is not met.
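The greedy selection rule described above can be sketched as follows. The coverage objective here is a stand-in toy example (a simple monotone submodular function), not the Gaussian-process mutual information of the experiment:

```python
def greedy_maximize(f, ground_set, k):
    """Greedy algorithm [NWF78]: repeatedly add the element with the
    largest marginal gain f(A + e) - f(A), until k elements are chosen."""
    A = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in A),
                   key=lambda e: f(A | {e}) - f(A))
        A.add(best)
    return A

# Toy monotone submodular objective (set coverage), standing in for the
# expensive mutual information objective of the experiment.
regions = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd', 'e'}, 4: {'a'}}
f = lambda A: len(set().union(*(regions[e] for e in A))) if A else 0

print(greedy_maximize(f, regions.keys(), 2))  # picks element 3 first, then 1
```

The same skeleton runs unchanged on an approximation of f, which is exactly how the experiment substitutes the cheap Fourier-sparse surrogate for the expensive objective.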

6.6.3 Synthetic Submodular Recovery

We claimed in Section 6.4 that if a function is known to be submodular, then incorporating convex constraints implied by submodularity can improve the recovery of the function. We now describe some experiments on synthetic functions that demonstrate this claim empirically. We attempt to recover a third-order submodular function by incorporating the constraints from Equation 6.4.6 into a convex recovery algorithm.

We take n = 16 and restrict to H3, so p = 697. By Proposition 5, we can check submodularity with 120 constraints. These numbers are small enough that we can use a standard interior point method solver to get accurate solutions. We construct f by taking a function with i.i.d. Gaussian entries and then projecting it onto the cone H−2 ∩ H3 to get a target synthetic function. The resulting projection is not exactly sparse; on average it has 200 ± 10 (out of 697 possible) nonzero Fourier coefficients. However, it is compressible, and so we can expect a small error even without recovering the support exactly. Then, given random function samples, we reconstruct the target function by minimizing the Fourier ℓ1 norm, while varying which sets of additional constraints we apply. First, we simply solve Equation 6.3.4 with no additional constraints. Second, we assume oracle access to the signs of the Fourier coefficients and enforce the known signs. Lastly, we enforce the constraints of Equation 6.4.6. The results are plotted in Figure 6.2. Enforcing submodularity significantly improves the recovery. It gives a relative error of less than 10⁻³ with only 300 measurements, and it recovers the support exactly with about 350. Using the ℓ1 norm alone requires about 450 measurements just to get a relative error of 10⁻³. The method with oracle access to the signs of the coefficients performs better than standard ℓ1, but is still not as good as the submodularity-enforcing method.
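The unconstrained variant of the recovery problem (minimizing an ℓ1 norm subject to matching the measurements) can be posed as a linear program. A generic basis-pursuit sketch with Gaussian measurements follows; the measurement matrix, dimensions, and seed here are illustrative assumptions, not the actual Fourier measurement setup or problem sizes of the experiment:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
p, m, k = 40, 25, 3          # ambient dimension, measurements, sparsity

# Sparse ground truth and random Gaussian measurements.
x_true = np.zeros(p)
x_true[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, p)) / np.sqrt(m)
b = A @ x_true

# Basis pursuit: min ||x||_1 s.t. Ax = b, written as a linear program
# via the split x = u - v with u >= 0, v >= 0.
res = linprog(c=np.ones(2 * p), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=(0, None))
x_hat = res.x[:p] - res.x[p:]
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Additional linear constraints, such as known coefficient signs or the submodularity inequalities of Equation 6.4.6, would simply be appended as further rows of the LP.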

6.7 Related Work

Fourier analysis on the Boolean cube The problem of learning Boolean and pseudo-Boolean functions has a long history with many special cases that have been studied, and the use of discrete Fourier analysis dates back to the work of [Man94] and [LMN93]. The specific problem of reconstructing graphs from few observations has received attention due to important applications in bioinformatics [AC04]. The literature distinguishes additive queries (computing the weight of all edges in a subgraph) and less powerful cross-additive queries (computing the weight of edges between two sets of vertices). Cuts are a special case of the latter. The literature also distinguishes adaptive queries (which can choose observations based on past observations) and less powerful nonadaptive queries (which have to commit to all


Figure 6.2: Empirical study of submodular constraints. Synthetic functions on a base set of 16 elements, with recovery attempted under differing types of constraints (ℓ1 minimization alone, coefficient signs known and enforced, third-order submodularity known and enforced). The experiments were repeated with different random synthetic functions and different random measurements, and the mean relative error ‖f − g‖₂/‖f‖₂ is plotted vs. the number of random measurements.

observations in advance). In general, nonadaptive algorithms requiring only cross-additive queries are preferred (as these make the fewest assumptions, can be parallelized, etc.). For graphs with n nodes and k edges, an information-theoretic lower bound of Ω(k log(n²/k)/log k) additive (possibly adaptive) queries is known. [Maz10] provides an adaptive polynomial time algorithm that attains this optimal complexity in O(log n) nonadaptive rounds. To our knowledge, the only existing nonadaptive algorithms with linear dependence on k are nonconstructive (i.e., not polynomial time) [BM10]; this approach also requires additive queries. To our knowledge, ours is the first efficient nonadaptive approach (and it furthermore only requires cross-additive queries). Learning of pseudo-Boolean functions (and associated hypergraphs) has been considered in [CJK11], which provides an almost tight adaptive algorithm for computing the Fourier coefficients of k-bounded pseudo-Boolean functions. [BM10] provide a nonadaptive, but also nonconstructive, approach requiring additive queries.

Learning submodular functions Unfortunately, even without noise, there are strong lower bounds limiting our expectations for learning general submodular functions. Without access to a data set of exponential size, it is not possible to approximate general submodular functions to a factor better than Ω(√n / log n) [GHIM09]. On the more positive side, if the function is Lipschitz and sets are sampled uniformly at random, then for any ε > 0, an O(log(1/ε)) approximation can be achieved on a fraction of at least 1 − ε of all sets [BH11]. However, for optimization purposes, a guarantee that the approximation is of high quality on only a subset (even a large subset) of sets is problematic, since typically nothing can be inferred about the resulting minimizer. The problem of approximating a general submodular function by a simpler one for the purpose of efficient minimization is studied in [JLB11], who do not exploit the special structure of Fourier-sparse functions.

Compressive sensing There has been vast interest in sparse reconstruction and compressive sensing [Don06, CW08], but traditionally this has been motivated by sparsity of signals as trigonometric polynomials or in the wavelet domain. We are unaware of any work directly applying these ideas to the discrete cube. If work on sublinear Fourier transforms such as

[GGI+02] can be thought of as applying ideas from learning sparse Boolean functions to sparse trigonometric polynomials, then our work can be thought of as doing the reverse. Opening up a toolbox of new methods for this domain is the main contribution of this chapter.

6.8 Conclusion

We have considered the problem of reconstructing set functions with decaying Fourier (Hadamard-Walsh) spectrum from a small number of possibly noisy observations. By leveraging recent results from random matrices and sparse reconstruction, we have shown that standard algorithms can be used to obtain perfect reconstruction, with a number of samples that scales linearly with the support size of the Fourier spectrum. This insight allows us to open up a vast toolbox of modern optimization methods for learning set functions, which has previously been mostly the domain of purely theoretical investigation. For example, our results imply that standard ℓ1 minimization can be used to reconstruct a sparse graph from observing the values of a number of random cuts which (up to logarithmic factors) matches information-theoretic lower bounds in [Maz10]. Furthermore, we show that other properties, such as submodularity and symmetry, imply structure among the Fourier coefficients that can be exploited to reduce sample complexity, as well as to speed up reconstruction algorithms. We demonstrate the effectiveness of our approach on two applications, showing that we can indeed sketch changes in real-world networks by measuring random cuts, and that we can obtain useful approximations of expensive-to-compute set functions for the purpose of optimization.
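The structural claim underlying the graph-sketching application (cut functions lie in H2, with one nonzero Fourier coefficient per edge, plus the empty-set coefficient) can be verified by brute force on a small graph. A sketch, assuming the basis convention ψ_B(A) = (−1)^|A∩B| with uniform normalization; the thesis's convention may differ in signs and scaling:

```python
from itertools import chain, combinations

def subsets(E):
    """All subsets of E, as tuples."""
    return chain.from_iterable(combinations(E, r) for r in range(len(E) + 1))

n = 4
edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 3.0}
cut = lambda A: sum(w for (u, v), w in edges.items() if (u in A) != (v in A))

# Hadamard-Walsh coefficient: fhat(B) = 2^-n * sum_A f(A) * (-1)^{|A ∩ B|}.
def fourier_coeff(f, B, n):
    return sum(f(set(A)) * (-1) ** len(set(A) & set(B))
               for A in subsets(range(n))) / 2 ** n

support = [B for B in subsets(range(n))
           if abs(fourier_coeff(cut, B, n)) > 1e-9]
# Support is the empty set plus exactly one pair per edge: cuts lie in H2.
print(support)
```

Under this convention a short calculation gives fhat(∅) = Σ_e w_e / 2 and fhat({u,v}) = −w_uv / 2 for each edge, which is what the brute-force transform recovers.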

Chapter 7

Conclusion

One of our main objectives in this work has been to bridge the gap between the two extremes of submodular functions: on one hand, there are very basic or specialized functions that admit simple and practical minimization algorithms but are fairly limited in what they can describe; on the other, there are those for which no specialized minimization algorithms are known. For the latter case, the only algorithms that have polynomial upper bounds on the running time are impractical for all but very small problems, and the methods which seem to work have no guarantees. Our intent has been to extend the classes of submodular functions for which there exist fairly efficient algorithms, and to that end, in Chapters 4 and 5 we developed several novel techniques for submodular minimization using convex analysis. The SLG algorithm of Chapter 4 gives a method for minimizing sums of functions of the form φ(w(A)): these are compositions of nonnegative modular functions w with concave functions φ. Admittedly, such functions are always representable as a minimum of a second order submodular function, so in principle they can be solved via graph cut algorithms. However, if the concave function φ is not piecewise linear, then in order to represent the function exactly, one needs to introduce a breakpoint (corresponding to an extra vertex in the graph) for every possible value of w(A), which, in the worst case, is 2^n. The main innovation of our algorithm is that it remains efficient and independent of the number of breakpoints. We do this by computing a gradient of a smoothed version of the Lovász extension and applying an accelerated descent method. In Chapter 5, our main result demonstrates that there is a great deal of flexibility in choosing a barrier function to minimize a submodular function via convex programming. It might be useful, in fact, to adapt the barrier function to the submodular function.
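(For reference, the unsmoothed Lovász extension that SLG smooths can be evaluated by the standard sorting formula: sort the coordinates of w in decreasing order and accumulate marginal values of F along the resulting chain of sets. A minimal sketch on a toy cut function, not the thesis's implementation:)

```python
def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a set function F (with F(∅) = 0)
    at w, by sorting coordinates in decreasing order and accumulating
    the marginal values F(S_i) - F(S_{i-1}) along the chain S_1 ⊂ S_2 ⊂ ..."""
    order = sorted(range(len(w)), key=lambda i: -w[i])
    value, prev, S = 0.0, 0.0, set()
    for i in order:
        S.add(i)
        FS = F(frozenset(S))
        value += w[i] * (FS - prev)
        prev = FS
    return value

# Toy example: the cut function of the path graph 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
F = lambda S: sum(1 for u, v in edges if (u in S) != (v in S))

# On indicator vectors the extension agrees with F itself.
assert lovasz_extension(F, [1.0, 0.0, 1.0]) == F({0, 2})
```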
In analogy to matrix preconditioning: would this help address scaling issues, and how could we estimate an appropriate choice for the barrier function? The major question we attempted to address, and one which remains unresolved, is: how can we discover better representations of submodular functions? A common theme of this thesis is the utility of having a simple description of the base polytope of a submodular function. We expect such descriptions to be useful both for learning a submodular function and for projecting onto the base polytope (essentially the same problem as submodular minimization). Most of the functions in our experiments are variations of graph representable functions such as in Equation 3.3.13. Our interest has not been in trying to extend graph cut algorithms, but to take advantage of the efficient base polytope representation of Equation 3.3.14. By efficient, we refer to the fact that the number of variables and additional constraints scales linearly with the number of nonzero coefficients in the specification. Can we find any other classes of submodular functions that also have efficient lifted representations, possibly in terms of the second order cone or the semidefinite cone? And if such classes exist, do any of the results in Chapter 6 generalize to allow one to learn such a representation in a reasonable time with a reasonable number of observations? Lastly, we would like to pose one core question that seems simple but remains open to our knowledge: what is the smallest family of submodular functions such that every submodular function is in their conic hull? Equivalently, what is the smallest number of half-spaces that intersect to form the submodular polar cone (H−2)°? We know this family must exist, and it must be finite, albeit exponential in size.

Appendix A

Matroid Theory

Matroids are a class of set systems which are intimately connected to submodular set functions. The name "matroid" stems from the fact that they can be treated as an abstract generalization of matrices. However, there are several different equivalent definitions of matroids, some of which are quite unexpected. For a gentle introduction to the subject, see [GM12]. Since our focus is on submodularity, we give merely the original definition of a matroid introduced by

Whitney [Whi35] (in terms of independent sets), as well as a formulation (in terms of the existence of a rank function) which directly invokes submodularity.

A matroid M = (E, I) is defined by a ground set E together with a collection of subsets I ⊆ 2^E that satisfies the following properties:

Properties A.1. Matroid Independence Axioms

I-1. ∅ ∈ I.

I-2. A ⊂ B and B ∈ I ⇒ A ∈ I.

I-3. A, B ∈ I with |A| < |B| ⇒ ∃ b ∈ B ∖ A such that A + b ∈ I.

It is an exercise in elementary linear algebra to show that if E is a set of vectors from a common vector space, then the collection of independent subsets of vectors satisfies Properties A.1. For general matroids, the sets in the collection I are called the independent sets of the matroid. If a set of vectors is expressed in the form of a matrix, and the independent sets of those vectors are in one-to-one correspondence with the independent sets of a matroid, then that matrix is said to represent the matroid. Matroids are strictly more general than matrices in that not every matroid can be represented as a matrix, though the simplest example requires |E| ≥ 8. An alternative characterization of a matroid is through the properties of an associated set function. One can show that a set E is the ground set for a matroid if and only if it admits a set function r ∈ H with the following properties:

Properties A.2. Matroid Rank Axioms

R-1. r(∅) = 0.

R-2. r(A + b) − r(A) ∈ {0, 1} for all A ∈ 2^E, b ∈ E ∖ A.

R-3. r(A + b + c) − r(A + b) − r(A + c) + r(A) ≤ 0 for all A ∈ 2^E, b, c ∈ E ∖ A.

Of particular note is the third property: submodularity. As an example, if E is a set of vectors, then Properties A.2 are satisfied by the set function equal to the dimension of the subspace spanned by the vectors in a set (a.k.a. the rank). For general matroids, the rank of a set is defined as the size of the largest independent subset:

r(A) = max { |B| : B ∈ I, B ⊆ A }        (A.0.1)
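Equation A.0.1 and the rank axioms can be checked by brute force on a small example. A sketch using the uniform matroid U(2, 4), in which all subsets of size at most two are independent (a standard example, not one from the text):

```python
from itertools import combinations

def rank(I, A):
    """Matroid rank as in Equation A.0.1: size of the largest
    independent subset of A."""
    return max(len(B) for B in I if B <= A)

# Uniform matroid U(2, 4): independent sets are all subsets of size <= 2.
E = frozenset(range(4))
all_subsets = [frozenset(c) for r in range(5) for c in combinations(E, r)]
I = [S for S in all_subsets if len(S) <= 2]

# Check the rank axioms R-1 to R-3 (normalization, unit increments,
# submodularity) exhaustively.
assert rank(I, frozenset()) == 0                                    # R-1
for A in all_subsets:
    for b in E - A:
        assert rank(I, A | {b}) - rank(I, A) in (0, 1)              # R-2
        for c in E - A - {b}:
            assert (rank(I, A | {b, c}) - rank(I, A | {b})
                    - rank(I, A | {c}) + rank(I, A)) <= 0           # R-3
```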

I = { A ∈ 2^E : r(A) = |A| }        (A.0.2)

Then one can check that this collection forms the independence system of a matroid; that is, Properties A.1 are satisfied. Other defining features of matroids are their circuits (minimal dependent sets) and their bases (maximal independent sets). The latter is significant for the theory of submodular functions, since the set of bases of a matroid corresponds to the vertices of the base polytope (as defined in Equation 3.3.2) of the rank function. One extremely important generalization of matroids is the class of polymatroids, introduced by

Edmonds [Edm70]. If the independence system of a matroid is considered to be a subset of the Boolean cube {0, 1}^n, then a polymatroid is an analogous subset of Z^n_+ or R^n_+. In the case of real vectors, it is equivalent to the base polytope of a nonnegative nondecreasing submodular function.

Bibliography

[ABK+04] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov, Learning a hidden matching, SIAM J. Comput. 33 (2004), no. 2, 487–501. 6.4.2

[AC04] D. Angluin and J. Chen, Learning a hidden graph using O(log n) queries per edge, Learning Theory (2004), 210–223. 6.7

[AT03] A. Auslender and M. Teboulle, Asymptotic cones and functions in optimization and variational inequalities, Springer Monographs in Mathematics, Springer-Verlag, New York, 2003. 2.1

[AT06] , Interior gradient and proximal methods for convex and conic optimization, SIAM Journal on Optimization 16 (2006), no. 3, 697–725. 6.5

[Bac10] F. Bach, Structured sparsity-inducing norms through submodular functions, Ad- vances in Neural Information Processing Systems 23 (J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, eds.), 2010, pp. 118–126. 4.2

[Bac11] , Learning with submodular functions: A convex optimization perspective, CoRR abs/1111.6453 (2011). 3.3.1

[BBC11] S. Becker, J. Bobin, and E.J. Candès, Nesta: A fast and accurate first-order method for sparse recovery, SIAM J. Img. Sci. 4 (2011), no. 1, 1–39. 4.5.2

[BCG11] S. Becker, E. J. Candès, and M. Grant, Templates for convex cone problems with applications to sparse signal recovery, Mathematical Programming Computation 3 (2011), 165–218. 6.5

[BH11] M.F. Balcan and N.J.A. Harvey, Learning submodular functions, Proceedings of the 43rd annual ACM symposium on Theory of computing, ACM, 2011, pp. 793–802. 6.1, 6.7

[BM10] N.H. Bshouty and H. Mazzawi, Optimal query complexity for reconstructing hypergraphs, STACS 2010: 27th International Symposium on Theoretical Aspects of Computer Science, LIPIcs., vol. 5, Leibniz-Zent. Inform., 2010, pp. 143–154. 6.4.2, 6.7

[Brè67] L. M. Brègman, A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming, Ž. Vyčisl. Mat. i Mat. Fiz. 7 (1967), 620–631. 2.2

[CJK11] S.S. Choi, K. Jung, and J.H. Kim, Almost tight upper bound for finding fourier coefficients of bounded pseudo-boolean functions, Journal of Computer and System Sciences 77 (2011), no. 6, 1039–1053. 6.3, 6.7

[CK10] S.S. Choi and J.H. Kim, Optimal query complexity bounds for finding graphs, Artificial Intelligence 174 (2010), no. 9, 551–569. 6.4.2

[CKKL12] M. Cheraghchi, A. Klivans, P.Kothari, and H.K. Lee, Submodular functions are noise stable, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, SIAM, 2012, pp. 1586–1592. 6.4.3

[CN07] F. A. Chudak and K. Nagano, Efficient solutions to relaxations of combinatorial problems with submodular penalties via the lovász extension and non-smooth convex optimization, SODA, 2007, pp. 79–88. 4.2

[CP11] P.L.Combettes and J.C. Pesquet, Proximal splitting methods in signal processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering (2011), 185–212. 2.2

[CRT06] E.J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, Information Theory, IEEE Transactions on 52 (2006), no. 2, 489–509. 6.3

[CT05] E.J. Candès and T. Tao, Decoding by linear programming, Information Theory, IEEE Transactions on 51 (2005), no. 12, 4203–4215. 6.3

[CW08] E.J. Candès and M.B. Wakin, An introduction to compressive sampling, Signal Processing Magazine, IEEE 25 (2008), no. 2, 21–30. 6.7

[Don06] D.L. Donoho, Compressed sensing, Information Theory, IEEE Transactions on 52 (2006), no. 4, 1289–1306. 6.3, 6.7

[DSSSC08] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, ICML ’08, 2008, pp. 272–279. 5.4.2

[Edm70] J. Edmonds, Submodular functions, matroids, and certain polyhedra, Combinatorial Structures and their Applications (Proc. Calgary Internat. Conf., Calgary, Alta., 1969), Gordon and Breach, New York, 1970, pp. 69–87. 1, 3.3.1, 3.3.2, A

[EVGW+] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, The PAS- CAL Visual Object Classes Challenge 2009 (VOC2009) Results, http://www.pascal- network.org/challenges/VOC/voc2009/workshop/index.html. 4.7

[FD05] D. Freedman and P. Drineas, Energy minimization via graph cuts: settling what is possible, Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2, June 2005, pp. 939–946. 4.1

[FH05] S. Foldes and P.L.Hammer, Submodularity, Supermodularity, and Higher-Order Monotonicities of Pseudo-Boolean Functions, Mathematics of Operations Research 30 (2005), no. 2, 453. 3.2.2

[FI03] L. Fleischer and S. Iwata, A push-relabel framework for submodular function minimization and applications to parametric optimization, Discrete Appl. Math. 131 (2003), no. 2, 311–322, Submodularity. 4.7

[FI11] S. Fujishige and S. Isotani, A submodular function minimization algorithm based on the minimum-norm base, Pac. J. Optim. 7 (2011), no. 1, 3–17. 3.4.2, 4.7

[Fis67] P.C. Fishburn, Interdependence and additivity in multivariate, unidimensional expected utility theory, International Economic Review 8 (1967), no. 3, 335–342. 6.1

[Fou10] S. Foucart, A note on guaranteed sparse recovery via ℓ1-minimization, Applied and Computational Harmonic Analysis 29 (2010), no. 1, 97–103. 6.3

[Fuj05] S. Fujishige, Submodular functions and optimization, second ed., Annals of Discrete Mathematics, vol. 58, Elsevier B. V.,Amsterdam, 2005. MR 2171629 (2006d:90098) 3.3, 3.3.1

[GGI+02] A.C. Gilbert, S. Guha, P.Indyk, S. Muthukrishnan, and M. Strauss, Near-optimal sparse fourier representations via sampling, Proc. STOC, 2002. 6.7

[GHIM09] M.X. Goemans, N.J.A. Harvey, S. Iwata, and V. Mirrokni, Approximating sub- modular functions everywhere, Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathe- matics, 2009, pp. 535–544. 6.1, 6.7

[GK00] V.Grebinski and G. Kucherov, Optimal Reconstruction of Graphs under the Additive Model., Algorithmica 28 (2000), no. 1, 104–124. 6.4.2

[GLS81] M. Grötschel, L. Lovász, and A. Schrijver, The ellipsoid method and its consequences in combinatorial optimization, Combinatorica 1 (1981), no. 2, 169–197. 1, 3.3.2, 3.4.1

[GM12] G. Gordon and J. McNulty, Matroids: a geometric introduction, Cambridge Uni- versity Press, Cambridge, 2012.A

[GMR00] M. Grabisch, J.L. Marichal, and M. Roubens, Equivalent representations of set functions, Mathematics of Operations Research 25 (2000), no. 2, 157–178. 3.2.3, 3.2.4

[Hač79] L.G. Hačijan, A polynomial algorithm in linear programming, Dokl. Akad. Nauk SSSR 244 (1979), no. 5, 1093–1096. 3.4.1

[HAG+06] T.C. Harmon, R.F. Ambrose, R.M. Gilbert, J.C. Fisher, M. Stealey, and W.J. Kaiser, High Resolution River Hydraulic and Water Quality Characterization using Rapidly Deployable Networked Infomechanical Systems (NIMS RD), Tech. Report 60, CENS, 2006. 6.6.2

[HH92] P.L. Hammer and R. Holzman, Approximations of pseudo-Boolean functions; applications to game theory, ZOR Zeitschrift für Operations Research Methods and Models of Operations Research 36 (1992), no. 1, 3–21. 3.2.3

[HUL93] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal, Convex analysis and mini- mization algorithms. II, Grundlehren der Mathematischen Wissenschaften [Fun- damental Principles of Mathematical Sciences], vol. 306, Springer-Verlag, Berlin, 1993, Advanced theory and bundle methods. 2.1

[IFF01] S. Iwata, L. Fleischer, and S. Fujishige, A combinatorial strongly polynomial algorithm for minimizing submodular functions, J. ACM 48 (2001), no. 4, 761–777. 1, 3.4.1

[II07] T. Itoko and S. Iwata, Computational geometric approach to submodular function minimization for multiclass queueing systems, Integer Programming and Combinatorial Optimization (Matteo Fischetti and David P. Williamson, eds.), Lecture Notes in Computer Science, vol. 4513, Springer Berlin Heidelberg, 2007, pp. 267–279. 4.3, 4.4.3

[IO09] S. Iwata and J. B. Orlin, A simple combinatorial algorithm for submodular function minimization, Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, PA, USA), SODA '09, Society for Industrial and Applied Mathematics, 2009, pp. 1230–1237. 1, 3.4.1

[JLB11] Stefanie Jegelka, Hui Lin, and Jeff A. Bilmes, On fast approximate submodular minimization, Advances in Neural Information Processing Systems 24 (J. Shawe- Taylor, R.S. Zemel, P.Bartlett, F.C.N. Pereira, and K.Q. Weinberger, eds.), 2011, pp. 460–468. 3.4.3, 6.7

[JM93] D.S. Johnson and C.C. McGeoch (eds.), Network flows and matching: First dimacs implementation challenge, American Mathematical Society, Boston, MA, USA, 1993. 4.1, 4.7

[KK80] A.K. Kelmans and B.N. Kimelfeld, Multiplicative submodularity of a matrix’s prin- cipal minor as a function of the set of its rows and some combinatorial applications, Discrete Mathematics 44 (1980), no. 1, 113–116. 6.1

[KKT07] P. Kohli, M.P. Kumar, and P.H.S. Torr, P3 & beyond: Solving energies with higher order cliques, Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, June 2007, pp. 1–8. 4.3

[KLT09] P.Kohli, L. Ladický, and P.H.Torr, Robust higher order potentials for enforcing label consistency, Int. J. Comput. Vision 82 (2009), no. 3, 302–324. 4.3

[Kra10] A. Krause, Sfo: A toolbox for submodular function optimization, J. Mach. Learn. Res. 11 (2010), 1141–1144. 4.7

[KSG08] A. Krause, A. Singh, and C. Guestrin, Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies, Journal of Machine Learning Research 9 (2008), 235–284. 6.1, 6.6.2

[KZ04] V.Kolmogorov and R. Zabin, What energy functions can be minimized via graph cuts?, Pattern Analysis and Machine Intelligence, IEEE Transactions on 26 (2004), no. 2, 147–159. 3.4.3

[LMN93] N. Linial, Y. Mansour, and N. Nisan, Constant depth circuits, Fourier transform, and learnability, Journal of the ACM (JACM) 40 (1993), no. 3, 607–620. 6.7

[Lov83] L. Lovász, Submodular functions and convexity, Mathematical programming: the state of the art, Springer, Berlin, 1983, pp. 235–257. 1, 3.3.2, 4.2, 4.2, 4.2, 6.1, 6.4.3

[Man94] Y. Mansour, Learning boolean functions via the fourier transform, Theoretical advances in neural computation and learning (1994), 391–424. 6.7

[Maz10] H. Mazzawi, Optimally reconstructing weighted graphs using queries, Proc. SODA, 2010. 6.7, 6.8

[Mur03] K. Murota, Discrete convex analysis, SIAM Monographs on Discrete Mathematics and Applications, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2003. 1

[Nes05] Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Pro- gramming 103 (2005), 127–152. 4.1, 4.2, 4.5.2, 4.5.2

[NWF78] G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher, An analysis of approximations for maximizing submodular set functions - I, Mathematical Programming 14 (1978), no. 1, 265–294. 3.3, 6.6.2

[PK90] P.M. Pardalos and N. Kovoor, An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds, Mathematical Program- ming 46 (1990), 321–328. 5.4.2

[Que98] M. Queyranne, Minimizing symmetric submodular functions, Mathematical Pro- gramming 82 (1998), 3–12. 3.4.3, 4.1

[Rau10] H. Rauhut, Compressive sensing and structured random matrices, Theoretical foundations and numerical methods for sparse recovery, Radon Ser. Comput. Appl. Math., vol. 9, Walter de Gruyter, Berlin, 2010, pp. 1–92. 6.3

[Roc70] R.T. Rockafellar, Convex analysis, Princeton Mathematical Series, No. 28, Prince- ton University Press, Princeton, N.J., 1970. 2.1

[RV08] M. Rudelson and R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Communications on Pure and Applied Mathematics 61 (2008), no. 8, 1025–1045. 6.3

[Sch03] A. Schrijver, Combinatorial optimization. Polyhedra and efficiency. Vol. B, Algo- rithms and Combinatorics, vol. 24, Springer-Verlag, Berlin, 2003, Matroids, trees, stable sets, Chapters 39–69. 3.3.1, 6.1

[SK10] P. Stobbe and A. Krause, Efficient minimization of decomposable submodular functions, NIPS, 2010, pp. 2208–2216. 1.1, 1.2

[SK12] , Learning fourier sparse set functions, Journal of Machine Learning Re- search - Proceedings Track 22 (2012), 1125–1133. 1.2

[SWRC09] J. Shotton, J. Winn, C. Rother, and A. Criminisi, Textonboost for image under- standing: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context, International Journal of Computer Vision 81 (2009), 2–23. 4.3, 4.7

[Tro04] J.A. Tropp, Greed is good: algorithmic results for sparse approximation., IEEE Transactions on Information Theory 50 (2004), no. 10, 2231–2242. 6.5

[TW10] J.A. Tropp and S.J. Wright, Computational Methods for Sparse Solution of Linear Inverse Problems, Proceedings of the IEEE 98 (2010), no. 6, 948–958. 6.5

[Ver12] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, Compressed sensing, Cambridge Univ. Press, Cambridge, 2012, pp. 210–268. 6.1, 6.3

[Whi35] H. Whitney, On the Abstract Properties of Linear Dependence, Amer. J. Math. 57 (1935), no. 3, 509–533. A