Graphs and Beyond: Faster Algorithms for High Dimensional Convex Optimization

Jakub Pachocki
CMU-CS-16-107
May 2016

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Gary L. Miller, Chair
Anupam Gupta
Daniel Sleator
Shang-Hua Teng, University of Southern California

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2016 Jakub Pachocki

This research was sponsored by the National Science Foundation under grant numbers CCF-1018463 and CCF-1065106 and by the Sansom Graduate Fellowship in Computer Science. Parts of this work were done while at the Simons Institute for the Theory of Computing, UC Berkeley. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Convex Optimization, Linear System Solvers, Spectral Graph Theory

To my parents

Abstract

Convex optimization is one of the most robust tools for automated data analysis. It has widespread applications in fields such as machine learning, computer vision, combinatorial optimization and scientific computing. However, the rapidly increasing volume and complexity of data that needs to be processed often renders general-purpose algorithms unusable. This thesis aims to address this issue through the development of very fast algorithms for core convex optimization problems, with a focus on graphs. To this end, we:

• Develop nearly optimal algorithms for the Fermat-Weber (geometric median) problem, a fundamental task in robust estimation and clustering.
• Investigate how to approximate massive amounts of high-dimensional data in a restrictive streaming setting, achieving optimal tradeoffs.
• Develop the fastest algorithm for solving linear systems on undirected graphs.
This is achieved through improved clustering algorithms and a better understanding of the interplay between the combinatorial and numerical aspects of previous approaches.
• Show the existence of clustering and oblivious routing algorithms for a broad family of directed graphs, in hopes of advancing the search for faster maximum flow algorithms.

Most of the presented algorithms run in time nearly linear in the sparsity of the input data. The unifying theme of these results is the careful analysis of optimization algorithms in high-dimensional spaces, mixing combinatorial and numerical approaches.

Acknowledgments

First and foremost, I would like to thank my advisor, Gary Miller. His mentorship and support have been invaluable and fundamentally shaped the course of my studies.

I am extremely grateful to the many co-authors I have been fortunate to work with over the past three years. Many of the ideas of this thesis have been shaped by long and fruitful collaborations with Michael Cohen, Richard Peng and Aaron Sidford. The results presented here represent joint work with Michael Cohen, Alina Ene, Rasmus Kyng, Yin-Tat Lee, Gary Miller, Cameron Musco, Richard Peng, Anup Rao, Aaron Sidford and Shen Chen Xu.

I would like to thank my thesis committee members, Anupam Gupta, Daniel Sleator and Shang-Hua Teng, for their help and advice during the dissertation process. I am grateful to Daniel Sleator for his guidance throughout my entire stay in graduate school.

I am indebted to my high school teacher, Ryszard Szubartowski, for igniting my interest in computer science. I am grateful for the countless discussions, his infectious enthusiasm and unwavering support.

Finally, I would like to thank my friends for their help and encouragement, and my family for their love and support.

Contents

1 Introduction
  1.1 Dealing with High Dimensional Data
  1.2 Convex Optimization and Solving Linear Systems
  1.3 Structure of This Thesis
2 Geometric Median
  2.1 Introduction
  2.2 Notation
  2.3 Properties of the Central Path
  2.4 Nearly Linear Time Geometric Median
  2.5 Pseudo Polynomial Time Algorithm
  2.6 Weighted Geometric Median
  Appendices
  2.A Properties of the Central Path (Proofs)
  2.B Nearly Linear Time Geometric Median (Proofs)
  2.C Derivation of Penalty Function
  2.D Technical Facts
3 Online Row Sampling
  3.1 Introduction
  3.2 Overview
  3.3 Analysis of sampling schemes
  3.4 Asymptotically optimal algorithm
  3.5 Matching Lower Bound
4 Faster SDD Solvers I: Graph Clustering Algorithms
  4.1 Introduction
  4.2 Background
  4.3 Overview
  4.4 From Bartal Decompositions to Embeddable Trees
  4.5 Two-Stage Tree Construction
  Appendices
  4.A Sufficiency of Embeddability
5 Faster SDD Solvers II: Iterative Methods
  5.1 Introduction
  5.2 Overview
  5.3 Expected Inverse Moments
  5.4 Application to Solving SDD linear systems
  Appendices
  5.A Chebyshev Iteration with Errors
  5.B Finding Electrical Flows
  5.C Relation to Matrix Chernoff Bounds
  5.D Propagation and Removal of Errors
6 Clustering and Routing for Directed Graphs
  6.1 Introduction
  6.2 Overview
  6.3 Oblivious Routing on Balanced Graphs
  6.4 Lower Bounds
  6.5 Maximum Flow and Applications
  Appendices
  6.A Missing proofs from Section 6.2
  6.B Analysis of the clustering algorithm
Bibliography

List of Figures

1.1 Social network as graph; corresponding matrix B
3.1 The basic online sampling algorithm
3.1 The low-memory online sampling algorithm
3.1 The Online BSS Algorithm
4.1 Bartal decomposition and final tree produced
4.1 Bartal's Decomposition Algorithm
4.2 Using a generic decomposition routine to generate an embeddable decomposition
4.3 Constructing a Steiner tree from a Bartal decomposition
4.1 The routine for generating AKPW decompositions
4.2 Pseudocode of two pass algorithm for finding a Bartal decomposition
4.3 Overall decomposition algorithm
5.1 Sampling Algorithm
5.1 Generation of a Randomized Preconditioner
5.2 Randomized Richardson Iteration
5.3 Recursive Solver
5.A.1 Preconditioned Chebyshev Iteration
5.D.1 Preconditioned Richardson Iteration
6.1 The low-radius decomposition algorithm for directed graphs
6.2 An unweighted Eulerian graph where a particular edge is very likely to be cut by the scheme of [MPX13]
6.3 The low-stretch arborescence finding algorithm
6.4 The decomposition algorithm with a specified root
6.5 The algorithm for finding single-source oblivious routings on balanced graphs (adapted from [Räc08])
6.1 The example from Theorem 6.2.6
6.2 The example from Theorem 6.2.7
6.1 The algorithm for computing the maximum flow and minimum congested cut
6.2 Faster algorithm for computing the maximum flow and minimum congested cut

Chapter 1

Introduction

The amount and complexity of objects subject to automated analysis increase at a rapid pace. Tasks such as processing satellite imagery in real time, predicting social interactions and semi-automated medical diagnostics are becoming a reality of everyday computing. In this thesis, we shall explore multiple facets of the fundamental problem of analyzing such data effectively.

1.1 Dealing with High Dimensional Data

In order to enable rigorous reasoning, we will associate our problems with a space R^d such that all the features of each object in our problem class can be conveniently represented as d real numbers. We refer to the integer d as the dimensionality of our problem. The different dimensions can correspond to pixels in an image, users in a social network, or the characteristics of a hospital patient. Often, the complexity of the problems of interest translates directly to high dimensionality.
1.1.1 Representation

The most common way to describe a d-dimensional problem involving n objects is simply as an n × d real matrix. In the case of images, the numerical values might describe the colors of pixels. In a graph, they might be nonzero only for the endpoints of a particular edge (see Figure 1.1). In machine learning applications, they might be 0-1, describing the presence or absence of binary features.

[Graph drawing: social network on the nodes Alice, Bob, Charles and Daria]

          A   B   C   D
    B = [ 1  -1   0   0
         -1   0   0   1
          0   1   0  -1
          0  -1   1   0 ]

Figure 1.1: Social network as graph; corresponding matrix B. Each row of B corresponds to an edge, with a +1 and a -1 in the columns of its endpoints.

1.1.2 Formulating Convex Objectives

Given a dataset, we want to conduct some form of reasoning. Examples of possible questions include:

• What is the community structure of this social network?
• What does this image look like without the noise?
• What are the characteristics of a typical patient?
• How likely is this user to like a given movie?

To make these problems concrete, we turn such questions into optimization problems. This amounts to defining a set S of feasible solutions and an objective function f. For the above examples:

• (Clustering a social network.) S is the set of possible clusterings of the nodes of the network; f measures the quality of a clustering.
• (Denoising an image.) S is the set of all images of a given size; f is the similarity of the image to the original, traded off against its smoothness.
• (Finding a typical patient.) S is the set of possible characteristics of a patient; f measures the averaged difference with the patients in the database.
• (Predicting whether a user likes a movie.) S is the set of possible models for predicting user preferences; f measures the prediction quality of a model, traded off against its simplicity.

The problem can then be stated succinctly as

    minimize    f(x)                (1.1)
    subject to  x ∈ S.

In order to design efficient algorithms for finding the optimal (or otherwise satisfactory) solution x to the above problem, we need to make further assumptions on f and S.
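The graph representation of Section 1.1.1 can be sketched in a few lines of Python. The edge list below is read off the rows of the matrix B in Figure 1.1; the node ordering (Alice, Bob, Charles, Daria, one column each) is an assumption about the figure's column layout.

```python
def incidence_matrix(n, edges):
    """Build the edge-vertex incidence matrix of a graph on n nodes:
    one row per directed edge (u, v), with +1 in column u and -1 in
    column v, and zeros elsewhere."""
    B = []
    for (u, v) in edges:
        row = [0] * n
        row[u] = 1
        row[v] = -1
        B.append(row)
    return B

# Nodes: 0 = Alice, 1 = Bob, 2 = Charles, 3 = Daria.
# Edge list read off the rows of the matrix B in Figure 1.1.
edges = [(0, 1), (3, 0), (1, 3), (2, 1)]
B = incidence_matrix(4, edges)
```

Note that every row sums to zero, since each edge contributes exactly one +1 and one -1; this is the property that makes B (and the associated graph Laplacian B^T B) useful in the spectral methods of later chapters.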
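As a concrete instance of the minimization template (1.1), the sketch below computes an (unconstrained) geometric median, the Fermat-Weber problem mentioned in the abstract, using the classical Weiszfeld fixed-point iteration. This is only an illustrative sketch for small inputs, not the nearly linear time algorithm developed in Chapter 2; the starting point, iteration count and tolerance are arbitrary choices.

```python
import math

def geometric_median(points, iters=200, eps=1e-12):
    """Approximately minimize f(x) = sum_i ||x - p_i|| over x in R^d
    via Weiszfeld's iteration: repeatedly replace x with the average of
    the points weighted by their inverse distances to x."""
    d = len(points[0])
    # Start from the coordinate-wise mean of the points.
    x = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        num = [0.0] * d
        den = 0.0
        for p in points:
            dist = math.dist(x, p)
            if dist < eps:
                # The iterate landed (numerically) on a data point.
                return list(p)
            w = 1.0 / dist
            for j in range(d):
                num[j] += w * p[j]
            den += w
        x = [c / den for c in num]
    return x
```

For example, the geometric median of the four corners of a square is its center, so `geometric_median([(0, 0), (0, 2), (2, 0), (2, 2)])` returns a point near (1, 1). Each Weiszfeld step never increases the objective f, which is a convenient sanity check on any implementation.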