Streaming Algorithms Via Reductions
University of Massachusetts Amherst
ScholarWorks@UMass Amherst
Doctoral Dissertations, Dissertations and Theses
Fall November 2014

Streaming Algorithms Via Reductions
Michael S. Crouch, University of Massachusetts Amherst

Follow this and additional works at: https://scholarworks.umass.edu/dissertations_2
Part of the Theory and Algorithms Commons

Recommended Citation
Crouch, Michael S., "Streaming Algorithms Via Reductions" (2014). Doctoral Dissertations. 172.
https://doi.org/10.7275/6050904.0
https://scholarworks.umass.edu/dissertations_2/172

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

STREAMING ALGORITHMS VIA REDUCTIONS
A Dissertation Presented by
MICHAEL STEVEN CROUCH
Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
September 2014
Computer Science

© Copyright by Michael Steven Crouch 2014
All Rights Reserved

Approved as to style and content by:
Andrew McGregor, Chair
Neil Immerman, Member
Ramesh Sitaraman, Member
Marco Duarte, Member
Lori A. Clarke, Chair, Computer Science

To my parents, for their tireless support and unwavering love.

ACKNOWLEDGEMENTS
We thank an anonymous reviewer for mentioning that ideas from the pattern matching result in Ergün et al. [41] can be applied to cyclic shifts in the time-series model without requiring a second pass (§2.5). We also thank Graham Cormode for suggesting a simplification to the cyclic shift algorithm presented in §2.5.2.
ABSTRACT
STREAMING ALGORITHMS VIA REDUCTIONS
SEPTEMBER 2014
MICHAEL STEVEN CROUCH
B.S., CARNEGIE MELLON UNIVERSITY
B.S., CARNEGIE MELLON UNIVERSITY
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Andrew McGregor

In the streaming algorithms model of computation we must process data "in order" and without enough memory to remember the entire input. We study reductions between problems in the streaming model with an eye to using reductions as an algorithm design technique. Our contributions include:

• "Linear Transformation" reductions, which compose with existing linear sketch techniques. We use these for small-space algorithms for numeric measurements of distance-from-periodicity, finding the period of a numeric stream, and detecting cyclic shifts.

• The first streaming graph algorithms in the "sliding window" model, where we must consider only the most recent L elements for some fixed threshold L. We develop basic algorithms for connectivity and unweighted maximum matching, then develop a variety of other algorithms via reductions to these problems.

• A new reduction from maximum weighted matching to maximum unweighted matching. This reduction immediately yields improved approximation guarantees for maximum weighted matching in the semistreaming, sliding window, and MapReduce models, and extends to the more general problem of finding maximum independent sets in p-systems.

• Algorithms in a "stream-of-samples" model which exhibit clear sample vs. space tradeoffs. These algorithms are also inspired by examining reductions. We provide algorithms for calculating F_k frequency moments and graph connectivity.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
  1.1 Notation
    1.1.1 ℓ_p norms
  1.2 The Streaming Model: History
  1.3 The Data Stream Model
    1.3.1 Numeric Streams
    1.3.2 Linear Sketches
    1.3.3 Graph Streams
    1.3.4 Approximation and Randomization
    1.3.5 Sampling Problems
      1.3.5.1 Streaming Example: ℓ_p Sampling
  1.4 Example Reduction in the Streaming Model
    1.4.1 Precision Sampling: ℓ_1-Sampling Reduces to Heaviest Hitter
  1.5 Contributions
    1.5.1 Linear Transformation Reductions
      1.5.1.1 Periodicity Results
    1.5.2 Polylog-parallel Reductions
    1.5.3 Sliding Window Model
    1.5.4 Weighted Matching
    1.5.5 Sampling vs. Space
  1.6 Other Reduction Models
  1.7 Chernoff-Hoeffding Bounds
  1.8 Organization

2. PERIODICITY
  2.1 Introduction
    2.1.1 Results and Related Work
    2.1.2 Notation
    2.1.3 Precision
  2.2 Fourier Preliminaries and Choice of Distance Function
    2.2.1 Discrete Fourier Transform and Sketches
    2.2.2 Choice of Distance Function
  2.3 Reductions Using the Discrete Fourier Transform
    2.3.1 Distance from Fixed Periodicity
    2.3.2 Determining Perfect Periodicity: Noiseless Case
    2.3.3 Determining Perfect Periodicity: Noisy Case
      2.3.3.1 Fourier Sampling
      2.3.3.2 Application to the Noiseless Case
      2.3.3.3 Application to the Noisy Case
  2.4 Distance from Fixed Periodicity
  2.5 Cyclic Shifts
    2.5.1 Time-Series Model
    2.5.2 Cyclic Shift Distance
  2.6 Conclusion

3. SLIDING WINDOW GRAPH STREAMS
  3.1 Introduction
    3.1.1 Sliding-Window Model
    3.1.2 Results
  3.2 Connectivity and Graph Sparsification
    3.2.1 Algorithm
    3.2.2 Analysis
    3.2.3 Applications: Bipartiteness and Graph Sparsification
      3.2.3.1 Bipartiteness
      3.2.3.2 Graph Sparsification
  3.3 Matchings
    3.3.1 Maximum Cardinality Matching
      3.3.1.1 Smooth Histograms
      3.3.1.2 Matchings are Almost Smooth
      3.3.1.3 Space Usage
      3.3.1.4 Approximation Factor
    3.3.2 Weighted Matching
  3.4 Minimum Spanning Tree
  3.5 Graph Spanners
  3.6 Conclusions

4. MATCHING
  4.1 Introduction
  4.2 Definitions and Results
    4.2.1 Independence Systems
    4.2.2 Streaming Reductions
    4.2.3 Main Result
  4.3 Algorithm
  4.4 Extensions
  4.5 Lower Bounds for Graph Matching
  4.6 Conclusion

5. SAMPLE VS. SPACE COMPLEXITY
  5.1 Introduction
    5.1.1 Sufficient Statistics and Data Streams
    5.1.2 Subsampling vs. Supersampling
    5.1.3 Results