New Directions in Streaming Algorithms
by
Ali Vakilian
B.S., Sharif University of Technology (2011)
M.S., University of Illinois at Urbana-Champaign (2013)
Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, September 2019.
© Massachusetts Institute of Technology 2019. All rights reserved.

Author ...... Department of Electrical Engineering and Computer Science, August 30, 2019
Certified by ...... Erik D. Demaine, Professor of Electrical Engineering and Computer Science, Thesis Supervisor
Certified by ...... Piotr Indyk, Professor of Electrical Engineering and Computer Science, Thesis Supervisor
Accepted by ...... Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students

New Directions in Streaming Algorithms
by Ali Vakilian

Submitted to the Department of Electrical Engineering and Computer Science on August 30, 2019, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science

Abstract

Large volumes of available data have led to the emergence of new computational models for data analysis. One such model is captured by the notion of streaming algorithms: given a sequence of 푁 items, the goal is to compute the value of a given function of the input items using a small number of passes and an amount of space sublinear in 푁. Streaming algorithms have applications in many areas, such as networking and large-scale machine learning. Despite a large amount of work in this area over the last two decades, several aspects of streaming algorithms have remained poorly understood, such as (a) streaming algorithms for combinatorial optimization problems and (b) incorporating modern machine learning techniques into the design of streaming algorithms. In the first part of this thesis, we describe (essentially) optimal streaming algorithms for set cover and maximum coverage, two classic problems in combinatorial optimization. In the second part, we show how to augment classic streaming algorithms for the frequency estimation and low-rank approximation problems with machine learning oracles in order to improve their space-accuracy tradeoffs. The new algorithms combine the benefits of machine learning with the formal guarantees available through algorithm design theory.

Thesis Supervisor: Erik D. Demaine Title: Professor of Electrical Engineering and Computer Science

Thesis Supervisor: Piotr Indyk Title: Professor of Electrical Engineering and Computer Science

Acknowledgments

First and foremost, I would like to thank my advisors, Erik Demaine and Piotr Indyk. I had the chance to learn from Erik even before joining MIT, through his lectures on “Introduction to Algorithms” when I was a freshman at Sharif UT. His inspiring style of teaching played a significant role in my passion to pursue TCS in grad school. For me, it was very valuable to have the opportunity to work with Erik. Besides, Erik gave me the freedom to explore different areas and pursue my interests, while always being there to support and advise. The main topic of this thesis, streaming algorithms, and all results in this thesis are in collaboration with Piotr Indyk. His guidance and supervision have shaped my academic character, and his support during my PhD studies was significant both academically and personally. He always had time for me to discuss both research and non-research questions, and his continuous passion for science, research and mentoring young researchers will always be a perfect example for me. Next, I would like to thank Ronitt Rubinfeld, who kindly accepted to be on my thesis committee. Ronitt introduced me to other closely related areas of research in massive data analysis, namely Local Computation Algorithms and Sublinear Algorithms, both through her courses and through collaborations. Besides, I benefited a lot from her advice during my PhD years. I would also like to thank my MS advisor at UIUC, Chandra Chekuri. My first experience in doing research in TCS was in collaboration with Chandra, and I will always be grateful for his support and supervision. He always had time for me to discuss any topic, even after my graduation from UIUC. During my years in grad school, I had the chance to collaborate with many wonderful researchers, and this thesis would not exist without their collaboration. I would like to thank all my co-authors: Anders Aamand, Arturs Backurs, Chandra Chekuri, Yodsawalai Chodpathumwan, Erik Demaine, Alina Ene, Timothy Goodrich, Christoph Gunru, Sariel Har-Peled, Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Kyle Kloster, Nitish Korula, Brian Lavallee, Quanquan Liu, Sepideh Mahabadi, Slobodan Mitrović, Amir Nayyeri, Krzysztof Onak, Merav Parter, Ronitt Rubinfeld, Baruch Schieber, Blair Sullivan, Arash Termehchy, Jonathan Ullman, Andrew van der Poel, Tal Wagner, Marianne Winslett, David Woodruff,

Anak Yodpinyanee and Yang Yuan. Thanks to all members of the Theory of Computation group, and in particular Maryam Aliakbarpour, Arturs, Mohammad Bavarian, Mina Dalirroyfard, Dylan Foster, Mohsen Ghaffari, Themis Gouleakis, Siddhartha Jayanti, Gautam Kamath, Pritish Kamath, William M. Leiserson, Quanquan, Sepideh, Saeed Mehraban, Slobodan, Cameron Musco, Christopher Musco, Merav, Madalina Persu, Ilya Razenshteyn, Nicholas Schiefer, Ludwig Schmidt, Adam Sealfon, Tal, Anak, Yang and Henry Yuen: I had a great time at MIT. Being away from home, I had wonderful years during my PhD thanks to my friends in the Boston area. I would like to thank all my friends, and in particular those at MIT: Ehsan, Ardavan, Asma, Maryam, Fereshteh, Elaheh, Sajjad, Mina, Alireza, Amir, Mehrdad, Zahra and Ali. Last but not least, I would like to thank my family. My parents, Amir and Maryam, were always sources of inspiration for me, and I am thankful for all their support and efforts for me since my childhood. Words are not enough to acknowledge what they have done for me. I would also like to thank my siblings, Mohsen and Fatemeh, who always care about me. Mohsen was my very first mentor since elementary school and had a significant role in shaping my interest in Math and CS. I have benefited a lot from his advice and help. Mohsen and Fatemeh, thanks for being so perfect! Sepideh, my beloved wife, is the most valuable gift I received from God during my PhD at MIT. Besides our several collaborations on different projects, which contribute to half of this thesis, I was very fortunate to have her honest and unconditional support in my ups and downs during my PhD.

“In the name of God, the most gracious, the most merciful”

To my beloved family

Contents

1 Introduction 23 1.1 Streaming Algorithms for Coverage Problems...... 24 1.1.1 Streaming Algorithms for Set Cover...... 25 1.1.2 Streaming Algorithms for Fractional Set Cover...... 25 1.1.3 Sublinear Algorithms for Set Cover...... 26 1.1.4 Streaming Algorithms for Maximum Coverage...... 26 1.2 Learning-Based Streaming Algorithms...... 27 1.2.1 Learning-Based Algorithms for Frequency Estimation...... 28 1.2.2 Learning-Based Algorithms for Low-Rank Approximation...... 29

I Sublinear Algorithms for Coverage Problems 31

2 Streaming Set Cover 33 2.1 Introduction...... 33 2.1.1 Related Work...... 33 2.1.2 Our Results...... 35 2.2 Streaming Algorithm for Set Cover...... 39 2.2.1 Analysis of IterSetCover ...... 41 2.3 Lower Bound for Single Pass Algorithms...... 44 2.4 Geometric Set Cover...... 50 2.4.1 Preliminaries...... 50

2.4.2 Description of GeomSC Algorithm...... 52 2.4.3 Analysis...... 52 2.5 Lower bound for Multipass Algorithms...... 55 2.6 Lower Bound for Sparse Set Cover in Multiple Passes...... 61

3 Streaming Fractional Set Cover 65 3.1 Introduction...... 65 3.1.1 Our Results...... 65 3.1.2 Other Related Work...... 66 3.1.3 Our Techniques...... 68 3.2 MWU Framework of the Streaming Algorithm for Fractional Set Cover... 70

3.2.1 Preliminaries of the MWU method for solving covering LPs...... 71
3.2.2 Semi-streaming MWU-based algorithm for fractional Set Cover...... 73
3.2.3 First Attempt: Simple Oracle and Large Width...... 74
3.3 Max Cover Problem and its Application to Width Reduction...... 76
3.3.1 The Maximum Coverage Problem...... 78
3.3.2 Sampling-Based Oracle for Fractional Max Coverage...... 82
3.3.3 Final Step: Running Several MWU Rounds Together...... 85
3.3.4 Extension to general covering LPs...... 86

4 Sublinear Set Cover 91 4.1 Introduction...... 91 4.1.1 Our Results...... 92 4.1.2 Related work...... 94 4.1.3 Our Techniques...... 94 4.2 Preliminaries...... 96 4.3 Lower Bounds for the Set Cover Problem...... 97

4.3.1 The Set Cover Lower Bound for Small Optimal Value 푘 ...... 99 4.3.2 The Set Cover Lower Bound for Large Optimal Value 푘...... 104 4.4 Sub-Linear Algorithms for the Set Cover Problem...... 111

4.4.1 First Algorithm: small values of 푘 ...... 113 4.4.2 Second Algorithm: large values of 푘 ...... 118 4.5 Lower Bound for the Cover Verification Problem...... 120 4.5.1 Underlying Set Structure...... 122 4.6 Generalized Lower Bounds for the Set Cover Problem...... 124

4.6.1 Construction of the Median Instance 퐼*...... 126
4.6.2 Distribution 풟(퐼*) of the Modified Instances Derived from 퐼*...... 130
4.7 Omitted Proofs from Section 4.3...... 133

5 Streaming Maximum Coverage 135 5.1 Introduction...... 135 5.1.1 Our Results...... 136 5.1.2 Our Techniques...... 136 5.1.3 Other Related Work...... 139 5.2 Preliminaries and Notations...... 139

5.2.1 Sampling Methods for Max 푘-Cover and Set Cover ...... 139 5.2.2 HeavyHitters and Contributing Classes...... 140

5.2.3 퐿0-Estimation...... 143 5.3 Estimating Optimal Coverage of Max 푘-Cover ...... 144 5.3.1 Universe Reduction...... 145

5.4 (훼, 훿, 휂)-Oracle of Max 푘-Cover ...... 148
5.4.1 Multi-layered Set Sampling: ∃훽 ≤ 훼 s.t. |풰^cmn_{훽푘}| ≥ 휎훽|풰|/훼 ...... 150
5.4.2 Heavy Hitters and Contributing Classes: |풞(OPT_large)| ≥ |풞(OPT_small)| ...... 152
5.4.3 Element Sampling: |풞(OPT_large)| < |풞(OPT_small)| ...... 157
5.5 Approximating Max 푘-Cover Problem in Edge Arrival Streams...... 161
5.6 Lower Bound for Estimating Max 푘-Cover in Edge Arrival Streams...... 165
5.7 Chernoff Bound for Applications with Limited Independence...... 167

5.7.1 An Application: Set Sampling with Θ(log(푚푛))-wise Independence. 168

5.8 Generalization of Section 5.4.2...... 169

II Learning-Based Streaming Algorithms 178

6 The Frequency Estimation Problem 179 6.1 Introduction...... 179 6.1.1 Related Work...... 181 6.2 Preliminaries...... 183 6.2.1 Algorithms for Frequency Estimation...... 184 6.2.2 Zipfian Distribution...... 184 6.3 Learning-Based Frequency Estimation Algorithms...... 185 6.4 Analysis...... 186 6.4.1 Tight Bounds for Count-Min with Zipfians...... 187 6.4.2 Learned Count-Min with Zipfians...... 190 6.4.3 (Nearly) tight Bounds for Count-Sketch with Zipfians...... 194 6.4.4 Learned Count-Sketch for Zipfians...... 201 6.5 Experiments...... 203 6.5.1 Internet Traffic Estimation...... 204 6.5.2 Search Query Estimation...... 207 6.5.3 Analyzing Heavy Hitter Models...... 209

7 The Low-Rank Approximation Problem 211 7.1 Introduction...... 211 7.1.1 Our Results...... 212 7.1.2 Related work...... 213 7.2 Preliminaries...... 214 7.3 Training Algorithm...... 215 7.4 Worst Case Bound...... 216 7.5 Experimental Results...... 219

7.5.1 Average test error...... 221 7.5.2 Comparing Random, Learned and Mixed...... 222

7.6 The case of 푚 = 1 ...... 222 7.7 Optimization Bounds...... 223 7.8 Generalization Bounds...... 226 7.8.1 Generalization Bound for Robust Solutions...... 228

List of Figures

2.1.1 A collection of 푛2/4 distinct rectangles (i.e., sets), each containing two points. 38

2.5.1 (a) shows an example of the communication Set Chasing(4, 3) and (b) is an instance of the communication Intersection Set Chasing(4, 3)...... 55

2.5.2 The gadgets used in the reduction of the communication Intersection Set Chasing problem to the communication Set Cover problem. (a) and (b) show the construction of the gadget for players 1 to 푝, and (c) and (d) show the construction of the gadget for players 푝 + 1 to 2푝...... 56

2.5.3 In (b) two Set Chasing instances merge in their first set of vertices and (c) shows the corresponding gadgets of these merged vertices in the communication Set Cover...... 58

2.5.4 In (푎), path 푄 is shown with black dashed arcs and (푏) shows the corresponding cover of path 푄...... 58

3.2.1 LP relaxation of Set Cover...... 70

3.2.2 LP relaxations of the feasibility variant of set cover and general covering problems...... 73

3.3.1 LP relaxation of weighted Max 푘-Cover...... 79

4.5.1 A basic slab and an example of a swapping operation...... 123

4.5.2 An example structure of a Yes instance; all elements are covered by the first 3 sets...... 123

4.6.1 Tables illustrating the representation of a slab under EltOf and SetOf before and after a swap; cells modified by swap(푒2, 푒4) between 푆3 and 푆9 are highlighted in red...... 125

6.3.1 Pseudo-code and block-diagram representation of our algorithms...... 185 6.5.1 Frequency of Internet flows...... 204 6.5.2 Comparison of our algorithms with Count-Min and Count-Sketch on Internet traffic data...... 206 6.5.3 Frequency of search queries...... 207 6.5.4 Comparison of our algorithms with Count-Min and Count-Sketch on search query data...... 208 6.5.5 ROC curves of the learned models...... 209

7.3.1 The 푖-th iteration of the power method...... 216
7.5.1 Test error by datasets and sketching matrices for 푘 = 10, 푚 = 20 ...... 220
7.5.2 Test error for Logo (left), Hyper (middle) and Tech (right) when 푘 = 10...... 220
7.5.3 Low rank approximation results for Logo video frame: the best rank-10 approximation (left), and rank-10 approximations reported by Algorithm 27 using a sparse learned sketching matrix (middle) and a sparse random sketching matrix (right)...... 221

List of Tables

2.1.1 Summary of past work and our results. Parameters 휌 and 휌g respectively denote the approximation factor of off-line algorithms for Set Cover and its geometric variant...... 34

4.1.1 A summary of our algorithms and lower bounds. We use the following notation: 푘 ≥ 1 denotes the size of the optimum cover; 훼 ≥ 1 denotes a parameter that determines the trade-off between the approximation quality and query/time complexities; 휌 ≥ 1 denotes the approximation factor of a “black box” algorithm for set cover used as a subroutine...... 93

5.1.1 The summary of known results on the space complexity of single-pass streaming algorithms for Max 푘-Cover...... 137
5.4.1 The values of parameters used in our algorithms...... 149

6.4.1 The performance of different algorithms on streams with frequencies obeying Zipf's Law. 푘 is the number of hash functions, 퐵 is the number of buckets, and 푛 is the number of distinct elements. The space complexity of all algorithms is Θ(퐵)...... 186

7.5.1 Test error in various settings...... 222 7.5.2 Performance of mixed sketches...... 222

List of Algorithms

1 A tight streaming algorithm for the (unweighted) Set Cover problem. Here, OfflineSC is an offline solver for Set Cover that provides a 휌-approximation, and 푐 is some appropriate constant...... 39
2 RecoverBit uses a protocol for (Many vs One)-Set Disjointness(푚, 푛) to recover Alice's sets ℱ_퐴 on Bob's side...... 47
3 A streaming algorithm for the Points-Shapes Set Cover problem...... 53
4 FracSetCover returns a (1 + 휀)-approximate solution of SetCover-LP...... 71
5 A generic implementation of FeasibilityTest. Its performance depends on the implementations of Oracle and UpdateProb. We will investigate different implementations of Oracle...... 75

6 HeavySetOracle computes 푝푆 of every set given the set system in a stream or stored memory, then returns the solution x that optimally places value ℓ on the corresponding entry. It reports INFEASIBLE if there is no sufficiently good solution...... 76

7 MaxCoverOracle returns a fractional ℓ-cover with weighted coverage at least 1 − 훽/3 w.h.p. if ℓ ≥ 푘. It provides no guarantee on its behavior if ℓ < 푘...... 80
8 The Prune subroutine lifts a solution in ℱ to a solution in ℱ˘ with the same MaxCover-LP objective value and width 1. The subroutine returns z, the amount by which members of ℱ˘ cover each element. The actual pruned solution x˘ may be computed but has no further use in our algorithm and thus is not returned...... 81

9 An efficient implementation of FeasibilityTest which performs 푝 passes and consumes 푂̃︀(푚푛^{푂(1/(푝휀))} + 푛) space...... 87

10 The procedure of constructing a modified instance of 퐼*...... 100
11 IterSetCover is the main procedure of the SmallSetCover algorithm for the Set Cover problem...... 115
12 OfflineSC(S, ℓ) invokes a black box that returns a cover of size at most 휌ℓ (if there exists a cover of size ℓ for S) for the Set Cover instance that is the projection of ℱ over S...... 116
13 The description of the SmallSetCover algorithm...... 117

14 A (휌+휀)-approximation algorithm for the Set Cover problem. We assume that the algorithm has access to EltOf and SetOf oracles for Set Cover(풰, ℱ), as well as |풰| and |ℱ|...... 119
15 The procedure of constructing a modified instance of 퐼*...... 131
16 A single pass streaming algorithm that, given a vector 푎 in a stream, for each contributing class 푅 returns an index 푗 along with a (1 ± (1/2))-estimate of its frequency...... 143

17 A single-pass streaming algorithm that uses an (훼, 훿, 휂)-oracle of Max 푘-Cover to compute a (1/푂̃︀(훼))-approximation of the optimal coverage size of Max 푘-Cover...... 146
18 An (훼, 훿, 휂)-oracle of Max 푘-Cover...... 150
19 An (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the number of common elements is large...... 151

20 An (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the majority of the coverage in an optimal solution is by the sets whose coverage contributions are at least a 1/(푠훼) fraction of the optimal coverage size...... 156
21 A single pass streaming algorithm that estimates the optimal coverage size of Max 푘-Cover(풰, ℱ) within a factor of 1/푂̃︀(훼) in 푂̃︀(푚/훼²) space. To return an 푂̃︀(푘/훼)-cover that achieves the computed estimate, it suffices to change the line marked by (⋆⋆) to return the corresponding 푂̃︀(푘/훼)-cover as well (see Section 5.5)...... 160
22 A single pass streaming algorithm that reports a (1/훼)-approximate solution of Max 푘-Cover(풰, ℱ) in 푂̃︀(훽 + 푘) space when the set of common elements is large...... 162

23 An (훼, 훿, 휂)-oracle of Max 푘-Cover that reports a (1/푂̃︀(훼))-approximate 푘-cover...... 164
24 An (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the majority of the coverage in an optimal solution is by the sets whose coverage contributions are at least a 1/(푠훼) fraction of the optimal coverage size. By modifying the lines marked by (⋆⋆) slightly, the algorithm returns the 푘-cover that achieves the returned coverage estimate (see Section 5.5 for more explanation)...... 172
25 A single pass streaming algorithm...... 176
26 Learning-Based Frequency Estimation...... 185

27 Rank-푘 approximation of a matrix 퐴 using a sketch matrix 푆 (refer to Section 4.1.1 of [52])...... 214
28 Differentiable implementation of SVD...... 216

Chapter 1

Introduction

In recent decades, massive datasets have arisen in numerous application areas such as genomics, finance, social networks and the Internet of Things. The main challenge of massive datasets is that traditional data-processing architectures are not capable of analyzing them efficiently. This state of affairs has led to the emergence of new computational models for data analysis. In particular, since a large fraction of datasets are generated as streams, algorithms for such data have been studied extensively in the streaming model. Formally, given a

sequence of 푁 items, the goal is to compute the value of a given function of the input items using a small number of passes and an amount of space sublinear in 푁. In this thesis, we advance the development of streaming algorithms in two new directions: (a) designing streaming algorithms for new classes of problems and (b) designing streaming algorithms using modern machine learning approaches. The streaming model has been studied since the early 80s. The focus of the early work was on processing numerical data, such as estimating heavy hitters, the number of distinct elements, and quantiles [74, 35, 13, 129]. Over the last two decades, there has been a significant body of work on designing streaming algorithms for discrete optimization and in particular graph problems [102, 143, 73]. However, until recently not much was known about the complexity of coverage problems, a well-studied class of problems in discrete optimization, in the streaming setting. In the first part of this thesis (Chapters 2-5), we provide optimal streaming algorithms for several variants of the set cover and maximum coverage problems, two

well-studied problems in discrete optimization. Since the classical algorithms have formal worst-case guarantees, they tend to work equally well on all inputs. However, in most applications, the underlying data has certain properties that, if discovered and harnessed, could enable much better performance. Inspired by the success of machine learning, in the second part of this thesis (Chapters 6-7), we design such “learning-based” algorithms for two fundamental tasks in data analysis: frequency estimation and low-rank approximation. We remark that these are the first learning-based streaming algorithms. For both of these problems, we design learning-based algorithms that, in theory and practice, achieve better performance (e.g., space complexity or approximation quality) than the existing algorithms, as long as the input stream has certain patterns. At the same time, the algorithms (usually) retain the worst-case guarantees of the existing algorithms for these problems even on adversarial inputs.

1.1. Streaming Algorithms for Coverage Problems

A fundamental class of problems in the area of combinatorial optimization involves minimizing a given cost function under a set of covering constraints. Perhaps the most well-studied problem in this family is set cover: given a collection 풮 of sets whose union is [푛], the goal is to identify the smallest sub-collection of 풮 whose union is [푛]. In the presence of massive data, new techniques are required to solve even this classic problem. The first part of this thesis is devoted to the design of efficient sublinear (space or time) algorithms for Set Cover and a closely related problem, Max 푘-Cover, here collectively called coverage problems, which have applications in many areas, including operations research, machine learning, information retrieval and data mining [86, 155, 48, 2, 115, 26, 117].

Although both Set Cover and Max 푘-Cover are NP-complete, a natural greedy algorithm, which iteratively picks the “best” remaining set, achieves a provably optimal approximation guarantee for these problems unless P = NP [72, 152, 14, 138, 63], and it is widely used in practice (a short reference implementation is sketched below). The algorithm often finds solutions that are very close to optimal. Unfortunately, due to its sequential nature, this algorithm does not scale very well to massive data sets (e.g., see Cormode et al. [56] for an experimental evaluation). This difficulty has motivated a considerable research effort whose goal is to design algorithms that are capable of handling large data efficiently on

modern architectures. Of particular interest are data stream algorithms, which compute the solution using only a small number of sequential passes over the data and a limited amount of memory.
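For concreteness, the following short Python sketch implements the classical (offline) greedy algorithm mentioned above. The instance encoding and function name are illustrative choices, not notation from this thesis.

```python
def greedy_set_cover(universe, sets):
    """Classical greedy Set Cover: repeatedly pick the set that covers the most
    yet-uncovered elements; returns the indices of the chosen sets."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # index of the set covering the most still-uncovered elements
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets do not cover the universe")
        cover.append(best)
        uncovered -= sets[best]
    return cover

if __name__ == "__main__":
    U = set(range(1, 8))
    F = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {6, 7}, {2, 5, 7}]
    print(greedy_set_cover(U, F))  # e.g. [0, 2, 3], an approximately optimal cover
```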

The study of the streaming Set Cover problem was initiated in [155], and since then there has been a large body of work on coverage problems in massive data analysis models, and in particular the streaming model [67, 61, 42, 97, 20, 29, 17, 105, 106, 107, 18, 123, 131, 19, 147, 5].

1.1.1. Streaming Algorithms for Set Cover

In the Set Cover problem, given a ground set of 푛 elements 풰 = {푒1, ··· , 푒푛} and a family of 푚 sets ℱ = {푆1, . . . , 푆푚} where 푚 ≥ 푛, the goal is to select a subset ℐ ⊆ ℱ such that ℐ covers 풰 and the number of sets in ℐ is as small as possible. In the streaming Set Cover problem as introduced in [155], the set of elements 풰 is stored in memory in advance; the sets 푆1, ··· , 푆푚 are stored consecutively in a read-only repository, and an algorithm can access the sets only by performing sequential scans of the repository. However, the amount of read-write memory available to the algorithm is limited, and is smaller than the input size (which could be as large as 푚푛). The objective is to design efficient approximation algorithms for the Set Cover problem that perform few passes over the data and use as little memory as possible. In Chapter 2, we present an efficient streaming algorithm for this problem, which has been proved to be optimal [17].

1.1.2. Streaming Algorithms for Fractional Set Cover

The LP relaxation of Set Cover is one of the most well-studied mathematical programs in approximation algorithm design. It is a continuous relaxation of the problem where each set 푆 ∈ ℱ can be selected “fractionally”, i.e., assigned a number 푥_푆 from [0, 1], such that for each element 푒 its “fractional coverage” ∑_{푆: 푒∈푆} 푥_푆 is at least 1, and the sum ∑_푆 푥_푆 is minimized (the full LP is written out below). To the best of our knowledge, it is not known whether there exists an efficient and accurate algorithm for this problem that uses only a logarithmic (or even a polylogarithmic) number of passes. This state of affairs is perhaps surprising, given the many recent developments on fast LP solvers [119, 174, 124, 12, 11, 167]. The only prior results on streaming Packing/Covering LPs were presented in [6], which studied the LP relaxation of Maximum Matching. In Chapter 3, we present the first (1 + 휀)-approximation algorithm for fractional Set Cover in the streaming model with a constant number of passes and a sublinear amount

25 of space.
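For reference, the covering LP discussed above can be written explicitly as follows (this is the standard formulation; Chapter 3 may use slightly different notation):

```latex
\begin{align*}
\text{minimize}   \quad & \sum_{S \in \mathcal{F}} x_S \\
\text{subject to} \quad & \sum_{S \in \mathcal{F} :\, e \in S} x_S \ge 1 && \forall e \in \mathcal{U}, \\
                        & 0 \le x_S \le 1 && \forall S \in \mathcal{F}.
\end{align*}
```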

1.1.3. Sublinear Algorithms for Set Cover

Another natural question related to streaming Set Cover is the following: “is it possible to solve minimum set cover in sub-linear time?” This question was previously addressed in [145, 172], who showed that one can design constant running-time algorithms by simulating the greedy algorithm, under the assumption that the sets are of constant size and each element occurs in a constant number of sets. However, those constant-time algorithms have a few drawbacks: they only provide a mixed multiplicative/additive guarantee (the output cover size is guaranteed to be at most 푘 · ln 푛 + 휀푛), the dependence of their running times on the maximum set size is exponential, and they only output the (approximate) minimum set cover size, not the cover itself. From a different perspective, [119] (building on [84]) showed

that an 푂(1)-approximate solution to the fractional version of the problem can be found in 푂̃︀(푚푘² + 푛푘²) time.¹ Combining this algorithm with randomized rounding yields an 푂(log 푛)-approximate solution to Set Cover with the same complexity.

In Chapter 4, we initiate a systematic study of the complexity of sub-linear time algorithms for set cover with multiplicative approximation guarantees. Our upper bounds complement the aforementioned result of [119] by presenting algorithms which are fast when 푘 is large, as well as algorithms that provide more accurate solutions (even with a constant-factor approximation guarantee) using a sub-linear number of queries.² Equally importantly, we establish nearly matching lower bounds, some of which even hold for estimating the optimal cover size.

¹The method can be further improved to 푂̃︀(푚 + 푛푘) (N. Young, personal communication).
²Note that polynomial time algorithms with sub-logarithmic approximation are unlikely to exist.

1.1.4. Streaming Algorithms for Maximum Coverage

In Max 푘-Cover, given a ground set 풰 of 푛 elements, a family ℱ of 푚 sets (each a subset of 풰), and a parameter 푘, the goal is to select 푘 sets in ℱ whose union has the largest cardinality. Max 푘-Cover is also an important problem in submodular maximization.

The initial algorithms were developed in the set arrival model, where the input sets are listed contiguously. This restriction is natural from the perspective of submodular optimization, but limits the applicability of the algorithms.³ Avoiding this limitation can be difficult, as streaming algorithms can no longer operate on sets as “unit objects”. As a result, the first maximum coverage algorithms for the general edge arrival model, where (set, element) pairs can arrive in arbitrary order, have been developed only recently. In particular, [29] was the first to (explicitly) present a one-pass algorithm with space linear in 푚 and a constant approximation factor. We remark that many of the prior bounds (both upper and lower bounds) on the set cover and max 푘-cover problems in set-arrival streams also work in edge arrival streams (e.g., [61, 97, 20, 131, 17, 105]).

A particularly interesting line of research in set arrival streaming set cover and max 푘-cover is to design efficient algorithms that only use 푂̃︀(푛) space [155, 22, 67, 42, 131]. Previous work has shown that one can adapt the existing greedy algorithm of Max 푘-Cover to achieve a constant factor approximation in 푂̃︀(푛) space [155, 22] (later improved to 푂̃︀(푘) by [131]). However, the complexity of the problem in the “low space” regime is very different in edge-arrival streams: [29] showed that as long as the approximation factor is a constant, any algorithm must use Ω(푚) space. Still, our understanding of approximation/space trade-offs in the general case is far from complete. In Chapter 5, we provide tight bounds for the approximation/space tradeoffs of Max 푘-Cover in general edge-arrival streams. (The difference between the two arrival models is illustrated by the small example below.)
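To make the two arrival models concrete, the toy snippet below shows the same small instance encoded once as a set-arrival stream and once as an edge-arrival stream; the variable names are illustrative only and do not come from the thesis.

```python
# The same instance U = {1,...,5}, F = {S1, S2, S3} in the two streaming models.

# Set-arrival stream: each set arrives contiguously, as a whole object.
set_arrival_stream = [
    ("S1", {1, 2, 3}),
    ("S2", {3, 4}),
    ("S3", {4, 5}),
]

# Edge-arrival stream: (set, element) pairs may arrive in arbitrary order,
# so the elements of a single set can be scattered across the stream.
edge_arrival_stream = [
    ("S2", 4), ("S1", 1), ("S3", 5), ("S1", 3),
    ("S2", 3), ("S3", 4), ("S1", 2),
]
```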

1.2. Learning-Based Streaming Algorithms

Classical algorithms are best known for providing formal guarantees on their performance, but often fail to leverage useful patterns in their input data to improve their output. However, in most applications, the underlying data has certain properties that can lead to much better performance if exploited by algorithms. On the other hand, machine learning approaches (in particular, deep learning models) are highly successful at capturing and utilizing complex data patterns, but often lack formal error bounds. The last few years have witnessed a growing effort to bridge this gap and introduce algorithms that can adapt to data properties while delivering worst-case guarantees. For example, “learning-based” models have been integrated into the design of data structures [120, 137, 91], online algorithms [128, 151, 81], graph optimization [59], similarity search [156, 169] and compressive sensing [33]. This learning-based approach to algorithm design has attracted considerable attention due to its potential to significantly improve the efficiency of some of the most widely used algorithmic tasks. Indeed, many applications involve processing streams of data (e.g., videos, data logs, customer activities and social media) by executing the same algorithm on an hourly, daily or weekly basis. These data sets are typically not “random” or “worst-case”; instead, they come from some distribution with useful properties which does not change rapidly from execution to execution. This makes it possible to design better streaming algorithms tailored to the specific data distribution, trained on past instances of the problem. In the second part of this thesis, we develop the first learning-based algorithms in the streaming model, designing such algorithms for two basic problems in this area: frequency estimation (Chapter 6) and low-rank approximation (Chapter 7). For both of these problems, we design learning-based algorithms that empirically (and provably) achieve better performance (e.g., space complexity or approximation quality) compared to the existing algorithms if the input stream has certain patterns, while retaining the worst-case guarantees of the existing algorithms for these problems even on adversarial inputs.

³For example, consider a situation where the sets correspond to neighborhoods of vertices in a directed graph. Depending on the input representation, for each vertex, either the ingoing edges or the outgoing edges might be placed non-contiguously.

1.2.1. Learning-Based Algorithms for Frequency Estimation

Estimating the frequencies of elements in a data stream is one of the most fundamental subroutines in data analysis. It has applications in many areas of machine learning, including feature selection [4], ranking [66], semi-supervised learning [164] and natural language processing [82]. It has also been used for network measurements [70, 175, 127] and security [158]. Frequency estimation algorithms have been implemented in popular data processing libraries, such as Algebird at Twitter [36]. They can answer practical questions like: what are the most searched words on the Internet? or how much traffic is sent between any two machines in a network?

The frequency estimation problem is formalized as follows: given a sequence 푆 of elements from some universe 푈, for any element 푖 ∈ 푈, estimate 푓_푖, the number of times 푖 occurs in 푆. If one could store all arrivals from the stream 푆, one could sort the elements and compute

their frequencies. However, in big data applications, the stream is too large (and may be infinite) and cannot be stored. This challenge has motivated the development of streaming

algorithms, which read the elements of 푆 in a single pass and compute a good estimate of the frequencies using a limited amount of space. Over the last two decades, many such streaming algorithms have been developed, including Count-Sketch [43], Count-Min [57] and multi-stage filters [70]. The performance guarantees of these algorithms are well understood, with upper and lower bounds matching up to 푂(·) factors [110].

However, such streaming algorithms typically assume generic data and do not leverage useful patterns or properties of their input. For example, in text data, the word frequency is known to be inversely correlated with the length of the word. Analogously, in network data, certain applications tend to generate more traffic than others. If such properties can be harnessed, one could design frequency estimation algorithms that are much more efficient than the existing ones. Yet, it is important to do so in a general framework that can harness various useful properties, instead of using handcrafted methods specific to a particular pattern or structure (e.g., word length, application type). In Chapter 6, we introduce “learning-based” frequency estimation streaming algorithms.
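As a concrete baseline for Chapter 6, here is a minimal (non-learned) Count-Min sketch in Python. The salted built-in hash is a simplified stand-in for the pairwise-independent hash functions assumed in the analysis, so this is an illustrative sketch rather than the exact construction studied in the thesis.

```python
import random

class CountMin:
    """Basic Count-Min sketch: k hash rows, each with B buckets.
    Estimates never underestimate: f_i <= estimate(i)."""

    def __init__(self, k, B, seed=0):
        rng = random.Random(seed)
        self.B = B
        self.salts = [rng.getrandbits(64) for _ in range(k)]  # one salt per row
        self.tables = [[0] * B for _ in range(k)]

    def _bucket(self, salt, item):
        return hash((salt, item)) % self.B

    def update(self, item, count=1):
        for salt, table in zip(self.salts, self.tables):
            table[self._bucket(salt, item)] += count

    def estimate(self, item):
        return min(table[self._bucket(salt, item)]
                   for salt, table in zip(self.salts, self.tables))

# usage: feed the stream once, then query frequencies
cm = CountMin(k=4, B=256)
for word in ["a", "b", "a", "c", "a", "b"]:
    cm.update(word)
print(cm.estimate("a"))  # >= 3; equal to 3 unless hash collisions occur
```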

1.2.2. Learning-Based Algorithms for Low-Rank Approximation

Low-rank approximation is one of the most widely used tools in massive data analysis, machine learning and statistics, and has been a subject of many algorithmic studies. Formally,

in low-rank approximation, given an 푛 × 푑 matrix 퐴 and a parameter 푘, the goal is to compute a rank-푘 matrix

$$[A]_k = \operatorname*{argmin}_{A' :\ \mathrm{rank}(A') \le k} \|A - A'\|_F.$$

In particular, multiple algorithms developed over the last decade use the “sketching” approach; see e.g., [157, 171, 92, 52, 53, 144, 132, 34, 54]. The idea is to use efficiently computable random projections (a.k.a. “sketches”) to reduce the problem size before performing the low-rank decomposition, which makes the computation more space and time efficient. For example, [157, 52] show that if 푆 is a random matrix of size 푚 × 푛 chosen from an appropriate distribution⁴, for 푚 depending on 휀, then one can recover a rank-푘 matrix 퐴′ such that

$$\|A - A'\|_F \le (1 + \epsilon)\, \|A - [A]_k\|_F$$

by performing an SVD on 푆퐴 ∈ ℝ^{푚×푑} followed by some post-processing (a small NumPy sketch of this pipeline is given at the end of this section). Typically the sketch length 푚 is small, so the matrix 푆퐴 can be stored using little space (in the context of streaming algorithms) or efficiently communicated (in the context of distributed algorithms).

Furthermore, the SVD of 푆퐴 can be computed efficiently, especially after another round of sketching, reducing the overall computation time. In light of the success of learning-based approaches, it is natural to ask whether similar improvements in performance could be obtained for other sketch-based algorithms, notably

for low-rank decompositions. In particular, reducing the sketch length 푚 while preserving its accuracy would make sketch-based algorithms more efficient. Alternatively, one could

make sketches more accurate for the same values of 푚. In Chapter 7, we design the first “learning-based” (streaming) algorithm for the low-rank approximation problem.
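To make the sketch-and-solve pipeline above concrete, the following NumPy sketch computes a rank-푘 approximation by projecting 퐴 onto the row space of 푆퐴 and truncating. It is a simplified stand-in for Algorithm 27, with a dense Gaussian sketch used purely for illustration.

```python
import numpy as np

def sketched_rank_k(A, k, m, rng=np.random.default_rng(0)):
    """Rank-k approximation via a random sketch S (m x n): project A onto the
    row space of SA, then take the best rank-k matrix inside that subspace."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)           # dense Gaussian sketch
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)   # row space of SA
    V = Vt.T                                               # d x min(m, d), orthonormal columns
    AV = A @ V                                             # A restricted to that subspace
    U, s, Wt = np.linalg.svd(AV, full_matrices=False)
    Ak = (U[:, :k] * s[:k]) @ Wt[:k, :] @ V.T              # truncate to rank k
    return Ak

# usage: ||A - Ak||_F approaches the optimal rank-k error as m grows
A = np.random.default_rng(1).standard_normal((200, 50))
Ak = sketched_rank_k(A, k=5, m=20)
print(np.linalg.norm(A - Ak, "fro"))
```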

⁴Initial algorithms used matrices with independent sub-gaussian entries or randomized Fourier/Hadamard matrices [157, 171, 92]. Starting from the seminal work of [53], researchers began to explore sparse binary matrices, see e.g., [144, 132].

Part I

Sublinear Algorithms for Coverage Problems

Chapter 2

Streaming Set Cover

2.1. Introduction

In the streaming Set Cover problem as defined in [155], the set of elements 풰 = {푒1, ··· , 푒푛} is stored in memory in advance; the sets 푆1, ··· , 푆푚 are stored consecutively in a read-only repository, and an algorithm can access the sets only by performing sequential scans of the repository. However, the amount of read-write memory available to the algorithm is limited, and is smaller than the input size (which could be as large as 푚푛). The objective is to design an efficient approximation algorithm for the Set Cover problem that performs few passes over the data and uses as little memory as possible. The last few years have witnessed a rapid development of new streaming algorithms for the Set Cover problem, in both the theory and applied communities; see [155, 56, 122, 67, 61, 42]. Table 2.1.1 presents the approximation and space bounds achieved by those algorithms, as well as the lower bounds.¹

¹Note that the simple greedy algorithm can be implemented by either storing the whole input (in one pass), or by iteratively updating the set of yet-uncovered elements (in at most 푛 passes).

2.1.1. Related Work

The semi-streaming Set Cover problem was first studied by Saha and Getoor [155]. Their result for the Max 푘-Cover problem implies an 푂(log 푛)-pass 푂(log 푛)-approximation algorithm for the Set Cover problem that uses 푂̃︀(푛²) space. Adapting the standard greedy algorithm of Set Cover with a thresholding technique leads to an 푂(log 푛)-pass 푂(log 푛)-approximation using 푂̃︀(푛) space.

Result | Approximation | Passes | Space | Note
Greedy Algorithm | ln 푛 | 1 | 푂(푚푛) | Deterministic
Greedy Algorithm | ln 푛 | 푛 | 푂(푛) | Deterministic
[155] | 푂(log 푛) | 푂(log 푛) | 푂(푛² ln 푛) | Deterministic
[67] | 푂(√푛) | 1 | Θ̃(푛) | Randomized (tight bounds) & deterministic algorithm
[42] | 푂(푛^훿/훿) | 1/훿 − 1 | Θ̃(푛) | Randomized (tight bounds) & deterministic algorithm
[146] | (1/2) log 푛 | 푂(log 푛) | Ω̃(푚) | Randomized
[61] | 푂(4^{1/훿} 휌) | 푂(4^{1/훿}) | 푂̃︀(푚푛^훿) | Randomized
[61] | 푂(1) | 푂(log 푛) | Ω̃(푚푛) | Deterministic
Theorem 2.2.8 | 푂(휌/훿) | 2/훿 | 푂̃︀(푚푛^훿) | Randomized
Theorem 2.3.8 | 3/2 | 1 | Ω(푚푛) | Randomized
Theorem 2.5.4 | 1 | 1/(2훿) − 1 | Ω̃(푚푛^훿) | Randomized
Geometric Set Cover (Theorem 2.4.6) | 푂(휌_g) | 푂(1) | 푂̃︀(푛) | Randomized
푠-Sparse Set Cover (Theorem 2.6.6) | 1 | 1/(2훿) − 1 | Ω̃(푚푠) | Randomized

Table 2.1.1: Summary of past work and our results. Parameters 휌 and 휌_g respectively denote the approximation factor of off-line algorithms for Set Cover and its geometric variant.

In the 푂̃︀(푛)-space regime, Emek and Rosen studied one-pass streaming algorithms for the Set Cover problem [67] and gave a deterministic greedy-based 푂(√푛)-approximation for the problem. Moreover, they proved that their algorithm is tight, even for randomized algorithms. The lower/upper bound results of [67] also apply to a generalization

of the Set Cover problem, the 휀-Partial Set Cover(풰, ℱ) problem, in which the goal is to cover a (1 − 휀) fraction of the elements of 풰 while the size of the solution is compared to the size of an optimal cover of Set Cover(풰, ℱ). Very recently, Chakrabarti and Wirth extended the result of [67] and gave a trade-off streaming algorithm for the Set Cover problem in multiple passes [42]. They gave a deterministic algorithm with 푝 passes over the data stream that returns a (푝 + 1)푛^{1/(푝+1)}-approximate solution of the Set Cover problem in 푂̃︀(푛) space. Moreover, they

proved that achieving 0.99 푛^{1/(푝+1)}/(푝 + 1)² in 푝 passes using 푂̃︀(푛) space is not possible even for randomized protocols, which shows that their algorithm is tight up to a factor of (푝 + 1)³. Their result also works for the 휀-Partial Set Cover problem.

In a different regime, first studied by Demaine et al., the goal is to design “low”-approximation algorithms (depending on the computational model, this could mean 푂(log 푛) or 푂(1)) in the smallest possible space [61]. They proved that any constant-pass deterministic (log 푛/2)-approximation algorithm for the Set Cover problem requires Ω̃(푚푛) space. This shows that, unlike the results in the 푂̃︀(푛)-space regime, to obtain a sublinear "low"-approximation streaming algorithm for the Set Cover problem in a constant number of passes, randomness is necessary. Moreover, [61] presented an 푂(4^{1/훿})-approximation algorithm that makes 푂(4^{1/훿}) passes and uses 푂̃︀(푚푛^훿) memory space.

The Set Cover problem is not polynomially solvable even in restricted instances with points in ℝ² as elements and geometric objects (all disks, axis-parallel rectangles, or fat triangles) in the plane as sets [71, 75, 99]. As a result, there has been a large body of

work on designing approximation algorithms for geometric Set Cover problems; see for example [142, 3, 15, 51] and references therein.

2.1.2. Our Results

Despite the progress outlined above, some basic questions remained open. In particular:

(A) Is it possible to design a single-pass streaming algorithm with a “low” approximation factor² that uses sublinear (i.e., 표(푚푛)) space?

(B) If such single pass algorithms are not possible, what are the achievable trade-offs between the number of passes and space usage?

(C) Are there special instances of the problem for which more efficient algorithms can be designed?

²Note that the lower bound in [61] excluded this possibility only for deterministic algorithms, while the upper bounds in [67, 42] suffered from a polynomial approximation factor.

In this chapter, we make significant progress on each of these questions. Our upper and lower bounds are depicted in Table 2.1.1.

On the algorithmic side, we give an 푂(1/훿)-pass algorithm with strongly sub-linear 푂̃︀(푚푛^훿) space and a logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [61], which used an exponentially larger number of passes. The trade-off offered by our algorithm matches the lower bound of Nisan [146] that holds at the endpoint of the trade-off curve, i.e., for 훿 = Θ(1/ log 푛), up to poly-logarithmic factors in space³. Furthermore, our algorithm is very simple and succinct, and therefore easy to implement and deploy. Our algorithm exhibits a natural tradeoff between the number of passes and space, which resembles tradeoffs achieved for other problems [87, 88, 90]. It is thus natural to conjecture that this tradeoff might be tight, at least for “low enough” approximation factors. We present a first step in this direction by showing a lower bound for the case when the approximation factor is equal to 1, i.e., the goal is to compute an optimal set cover. In particular, via an information-theoretic lower bound, we show that any streaming algorithm that computes set cover in (1/(2훿) − 1) passes must use Ω̃(푚푛^훿) space (even assuming exponential computational power) in the regime 푚 = 푂(푛). Furthermore, we show that a stronger lower bound holds if all the input sets are sparse, that is, if their cardinality is at most 푠. We prove a lower bound of Ω̃(푚푠) for 푠 = 푂(푛^훿) and 푚 = 푂(푛). We also consider the problem in the geometric setting in which the elements are points in

ℝ² and the sets are either discs, axis-parallel rectangles, or fat triangles in the plane. We show that a slightly modified version of our algorithm achieves the optimal 푂̃︀(푛) space to find an 푂(휌)-approximation in 푂(1) passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., Ω(푚푛)) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a sub-linear space bound.

Recently, Assadi et al. [20] generalized this lower bound to any approximation ratio 훼 = 푂(√푛). More precisely, they showed that approximating Set Cover within any factor 훼 = 푂(√푛) in a single pass requires Ω(푚푛/훼) space.

³Note that to achieve a logarithmic approximation ratio we can use an off-line algorithm with approximation ratio 휌 = 1, i.e., one that runs in exponential time (see Theorem 2.2.8).

Our techniques: basic idea. Our algorithm is based on the idea that whenever a large enough set is encountered, we can immediately add it to the cover. Specifically, we guess

(up to a factor of two) the size of the optimal cover 푘. Thus, a set is “large” if it covers at least a 1/푘 fraction of the remaining elements. A small set, on the other hand, can cover only a “few” elements, and we can store (approximately) which elements it covers by storing (in memory) an appropriate random sample. At the end of the pass, we have (in memory) the projections of “small” sets onto the random sample, and we compute the optimal set cover for this projected instance using an offline solver. By carefully choosing the size of the random sample, this guarantees that only a small fraction of the set system remains uncovered. The algorithm then makes an additional pass to find the residual set system (i.e., the yet-uncovered elements), making two passes in each iteration, and continuing to the next iteration.

Thus, one can think about the algorithm as being based on a simple iterative “dimensionality reduction” approach. Specifically, in two passes over the data, the algorithm selects a “small” number of sets that cover all but an 푛^{−훿} fraction of the uncovered elements, while using only 푂̃︀(푚푛^훿) space. By performing the reduction step 1/훿 times we obtain a complete cover. The dimensionality reduction step is implemented by computing a small cover for a random subset of the elements, which also covers the vast majority of the elements in the ground set. This ensures that the remaining sets, when restricted to the random subset of the elements, occupy only 푂̃︀(푚푛^훿) space. As a result, the procedure avoids the complex set of recursive calls used in Demaine et al. [61], which leads to a simpler and more efficient algorithm.

Geometric results. Further using techniques and results from computational geometry we show how to modify our algorithm so that it achieves almost optimal bounds for the

Set Cover problem on geometric instances. In particular, we show that it gives an 푂(1)-pass 푂(휌)-approximation algorithm using 푂̃︀(푛) space when the elements are points in ℝ² and the sets are either discs, axis-parallel rectangles, or fat triangles in the plane. In particular, we

use the following surprising property of the set systems that arise out of points and disks: the number of sets is nearly linear as long as one considers only sets that contain “a few” points. More surprisingly, this property extends, with a twist, to certain geometric range spaces that might have a quadratic number of shallow ranges. Indeed, it is easy to show an example of 푛 points in the plane where there are Ω(푛²) distinct rectangles, each one containing exactly two points; see Figure 2.1.1.

Figure 2.1.1: A collection of 푛²/4 distinct rectangles (i.e., sets), each containing two points.

However, one can “split” such ranges into a small number of canonical sets, such that the number of shallow sets in the canonical set system is near linear. This enables us to store the small canonical sets encountered during the scan explicitly in memory, and still use only near-linear space. We note that the idea of splitting ranges into small canonical ranges is an old idea in

orthogonal range searching. It was used by Aronov et al. [15] for computing small 휀-nets for these range spaces. The idea, in the form we use, was further formalized by Ene et al. [68].

Lower bounds. The lower bounds for multi-pass algorithms for the Set Cover problem are obtained via a careful reduction from Intersection Set Chasing. The latter is a communication complexity problem in which 푛 players need to solve a certain “set-disjointness-like” problem in rounds. A recent paper [90] showed that this problem requires 푛^{1+Ω(1/푝)}/푝^{푂(1)} bits of communication for 푝 rounds. This yields our desired trade-off of Ω̃(푚푛^훿) space in 1/(2훿) passes for exact protocols for Set Cover in the communication model, and hence in the streaming model, for 푚 = 푂(푛). Furthermore, we show a stronger lower bound on the memory space of sparse instances of Set Cover in which all input sets have cardinality at most 푠. By a reduction from a variant of Equal Pointer Chasing, which maps the problem to a sparse instance of Set Cover, we show that in order to have an exact streaming algorithm for 푠-Sparse Set Cover with 표(푚푠) space, Ω(log 푛) passes are necessary. More precisely, any (1/(2훿) − 1)-pass exact randomized algorithm for 푠-Sparse Set Cover requires Ω̃(푚푠) memory space, if 푠 ≤ 푛^훿 and 푚 = 푂(푛).

Our single-pass lower bound proceeds by showing a lower bound for a one-way communication complexity problem in which one party (Alice) has a collection of sets, and the other party (Bob) needs to determine whether the complement of his set is covered by one of Alice's sets. We show that if Alice's sets are chosen at random, then Bob can decode Alice's input by employing a small collection of “query” sets. This implies that the amount of communication needed to solve the problem is linear in the description size of Alice's sets, which is Ω(푚푛).

Algorithm 1 A tight streaming algorithm for the (unweighted) Set Cover problem. Here, OfflineSC is an offline solver for Set Cover that provides a 휌-approximation, and 푐 is some appropriate constant.
1: procedure IterSetCover((풰, ℱ), 훿)
2:    ◁ try (in parallel) all possible (2-approximate) sizes of the optimal cover
3:    for 푘 ∈ {2^푖 | 0 ≤ 푖 ≤ log 푛} in parallel do    ◁ 푛 = |풰|
4:        sol ← ∅
5:        for 푖 = 1 to 1/훿 do
6:            let S be a sample of 풰 of size 푐휌푘푛^훿 log 푚 log 푛
7:            L ← S, ℱ_S ← ∅
8:            for 푆 ∈ ℱ do    ◁ by doing one pass
9:                if |L ∩ 푆| ≥ |S|/푘 then    ◁ size test
10:                   sol ← sol ∪ {푆}
11:                   L ← L ∖ 푆
12:               else
13:                   ℱ_S ← ℱ_S ∪ {푆 ∩ L}    ◁ store the set 푆 ∩ L explicitly in memory
14:           풟 ← OfflineSC(L, ℱ_S, 푘)
15:           sol ← sol ∪ 풟
16:           풰 ← 풰 ∖ ⋃_{푆 ∈ sol} 푆    ◁ by doing an additional pass over the data
17:       return best sol computed in all parallel executions.
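As a rough executable companion to Algorithm 1, the Python sketch below mimics its structure for a single guess of 푘. It is illustrative only: it reads an in-memory list of sets rather than a read-only stream, and it uses a simple greedy routine in place of OfflineSC.

```python
import math
import random

def greedy_cover(universe, indexed_sets):
    """Stand-in for OfflineSC: greedily cover `universe` using (index, set) pairs."""
    uncovered, chosen = set(universe), []
    while uncovered and indexed_sets:
        i, s = max(indexed_sets, key=lambda t: len(t[1] & uncovered))
        if not s & uncovered:
            break  # no remaining set makes progress
        chosen.append(i)
        uncovered -= s
    return chosen

def iter_set_cover(U, F, delta, k, c=4, rho=1.0):
    """One parallel branch of Algorithm 1, for a fixed guess k of the optimum cover size."""
    U, sol = set(U), set()
    n = len(U)
    for _ in range(max(1, round(1 / delta))):
        if not U:
            break
        sample_size = min(len(U), int(c * rho * k * n ** delta
                                      * math.log(len(F) + 2) * math.log(n + 2)))
        S = set(random.sample(sorted(U), sample_size))  # sampled uncovered elements
        L, projected = set(S), []
        for i, s in enumerate(F):                 # "first pass" over the sets
            if len(s & L) >= len(S) / k:          # size test: heavy set, pick it right away
                sol.add(i)
                L -= s
            else:
                projected.append((i, s & L))      # keep only the projection onto the sample
        sol.update(greedy_cover(L, projected))    # offline solve on the projected instance
        for i in sol:                             # "second pass": recompute uncovered elements
            U -= F[i]
    return sol
```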

2.2. Streaming Algorithm for Set Cover

In this section, we design an efficient streaming algorithm for the Set Cover problem that matches the known lower bounds for the problem. In the Set Cover problem, for a given set system (풰, ℱ), the goal is to find a subset ℐ ⊆ ℱ such that ℐ covers 풰 and its cardinality is minimum. In Algorithm 1, we sketch the IterSetCover algorithm.

In the IterSetCover algorithm, we have access to the OfflineSC subroutine, which solves the given Set Cover instance offline (using linear space) and returns a 휌-approximate solution, where 휌 could be anywhere between 1 and Θ(log 푛) depending on the computational model one assumes. With exponential computational power, we can achieve the optimal cover of the given Set Cover instance (휌 = 1); however, with polynomial computational power and under the P ≠ NP assumption, 휌 cannot be better than 푐 · ln 푛 where 푐 is a constant [72, 152, 14, 138, 63].

( , ). To this end, the algorithm , in parallel, all values 푘 in 2푖 0 푖 log 푛 . This 풰 ℱ { | ≤ ≤ } step will only increase the memory space requirement by a factor of log 푛. Consider the run of the IterSetCover algorithm, in which the guess 푘 is correct (i.e., OPT 푘 < 2 OPT , where OPT is an optimal solution). The idea is to go through 푂(1/훿) | | ≤ | | iterations such that each iteration only makes two passes and at the end of each iteration the number of uncovered elements reduces by a factor of 푛훿. Moreover, the algorithm is allowed to use 푂̃︀(푚푛훿) space. In each iteration, the algorithm starts with the current ground set of uncovered elements

, and copies it to a leftover set L. Let S be a large enough uniform sample of elements 풰 . In a single pass, using S, we estimate the size of all large sets in and add 푆 to 풰 ℱ ∈ ℱ the solution sol immediately (thus avoiding the need to store it in memory). Formally, if 푆 covers at least Ω( /푘) yet-uncovered elements of L then it is a heavy set, and the algorithm |풰| immediately adds it to the output cover. Otherwise, if a set is small, i.e., its covers less than

/푘 uncovered elements of L, the algorithm stores the set 푆 in memory. Fortunately, it is |풰| enough to store its projection over the sampled elements explicitly (i.e., 푆 L) – this requires ∩ remembering only the 푂( S /푘) indices of the elements of 푆 L. | | ∩ In order to show that a solution of the Set Cover problem over the sampled elements is a good cover of the initial Set Cover instance, we apply the relative (푝, 휀)-approximation sampling result of [100] (see Definition 2.2.4) and it is enough for S to be of size 푂̃︀(휌푘푛훿). Using relative (푝, 휀)-approximation sampling, we show that after two passes the number of

40 uncovered elements is reduced by a factor of 푛훿. Note that the relative (푝, 휀)-approximation sampling improves over the Element Sampling technique used in [61] with respect to the number of passes.

Since in each iteration we pick 푂(휌푘) sets and the number of uncovered elements decreases by a factor of 푛훿, after 1/훿 iterations the algorithm picks 푂(휌푘/훿) sets and covers all elements. Moreover, the memory space of the whole algorithm is 푂̃︀(휌푚푛훿) (see Lemma 2.2.2).

2.2.1. Analysis of IterSetCover

In the rest of this section we prove that the IterSetCover algorithm with high probabil- ity returns a 푂(휌/훿)-approximate solution of Set Cover( , ) in 2/훿 passes using 푂̃︀(푚푛훿) 풰 ℱ memory space.

Lemma 2.2.1. The number of passes the IterSetCover algorithm makes is 2/훿.

Proof: In each of the 1/훿 iterations of the IterSetCover algorithm, the algorithm makes two passes. In the first pass, based on the set of sampled elements S, it decides whether to pick a set or keep its projection over S (i.e., 푆 L) in the memory. Then the algorithm calls ∩ OfflineSC which does not require any passes over . The second pass is for computing ℱ the set of uncovered elements at the end of the iteration. We need this pass because we only know the projection of the sets we picked in the current iteration over S and not over the original set of uncovered elements. Thus, in total we make 2/훿 passes. Also note that for different guesses for the value of 푘, we run the algorithm in parallel and hence the total number of passes remains 2/훿. 

Lemma 2.2.2. The memory space used by the IterSetCover algorithm is 푂̃︀(푚푛훿).

Proof: In each iteration of the algorithm, it picks during the first pass at most 푚 sets (more precisely at most 푘 sets) which requires 푂(푚 log 푚) memory. Moreover, in the first pass we keep the projection of the sets whose projection over the uncovered sampled elements has

size at most S /푘. Since there are at most 푚 such sets, the total required space for storing | | the projections is bounded by 푂(︀휌푚푛훿 log 푚 log 푛)︀. Since in the second pass the algorithm only updates the set of uncovered elements, the amount of space required in the second pass is 푂(푛). Thus, the total required space to

41 perform each iteration of the IterSetCover algorithm is 푂̃︀(푚푛훿). Moreover, note that the algorithm does not need to keep the memory space used by the earlier iterations; thus,

훿 the total space consumed by the algorithm is 푂̃︀(푚푛 ).  Next we show the sets we picked before calling OfflineSC has large size on . 풰 Lemma 2.2.3. With probability at least 1 푚−푐 all sets that pass the “size test” in the − IterSetCover algorithm have size at least /푐푘. |풰| Proof: Let 푆 be a set of size less than /푐푘. In expectation, 푆 S is less than ( /푐푘) |풰| | ∩ | |풰| · ( S / ) = 휌푛훿 log 푚 log 푛. By Chernoff bound for large enough 푐, | | |풰|

Pr( 푆 S 푐휌푛훿 log 푚 log 푛) 푚−(푐+1). | ∩ | ≥ ≤

Applying the union bound, with probability at least 1 푚−푐, all sets passing “size test” have − size at least /(푐푘). |풰|  In what follows we define the relative (푝, 휀)-approximation sample of a set system and men- tion the result of Har-Peled and Sharir [100] on the minimum required number of sampled elements to get a relative (푝, 휀)-approximation of the given set system.

Definition 2.2.4. Let (V, ) be a set system, i.e., V is a set of elements and 2V is a ℋ ℋ ⊆ family of subsets of the ground set V. For given parameters 0 < 휀, 푝 < 1, a subset 푍 V is ⊆ a relative (푝, 휀)-approximation for (V, ), if for each 푆 , we have that if 푆 푝 V then ℋ ∈ ℋ | | ≥ | | 푆 푆 푍 푆 (1 휀)| | | ∩ | (1 + 휀)| |. − V ≤ 푍 ≤ V | | | | | | If the range is light (i.e., 푆 < 푝 V ) then it is required that | | | | 푆 푆 푍 푆 | | 휀푝 | ∩ | | | + 휀푝. V − ≤ 푍 ≤ V | | | | | | Namely, 푍 is (1 휀)-multiplicative good estimator for the size of ranges that are at least ± 푝-fraction of the ground set. The following lemma is a simplified variant of a result in Har-Peled and Sharir [100]–

indeed, a set system with 푀 sets, can have VC dimension at most log 푀. This simplified form

42 also follows by a somewhat careful but straightforward application of Chernoff’s inequality.

Lemma 2.2.5. Let ( , ) be a finite set system, and 푝, 휀, 푞 be parameters. Then, a random 풰 ℱ ′ (︁ )︁ sample of such that = 푐 log log 1 + log 1 , for an absolute constant 푐′ is a relative 풰 |풰| 휀2푝 |ℱ| 푝 푞 (푝, 휀)-approximation, for all ranges in , with probability at least (1 푞). ℱ − Lemma 2.2.6. Assuming OPT 푘 2 OPT , after any iteration, with probability at | | ≤ ≤ | | least 1 푚1−푐/4 the number of uncovered elements decreases by a factor of 푛훿, and this − iteration adds 푂(휌 OPT ) sets to the output cover. | | Proof: Let V be the set of uncovered elements at the beginning of the iteration and ⊆ 풰 note that the total number of sets that is picked during the iteration is at most (1 + 휌)푘 (see Lemma 2.2.3). Consider all possible such covers, that is = ′ ′ (1 + 휌)푘 , 풢 {ℱ ⊆ ℱ| |ℱ | ≤ } and observe that 푚(1+휌)푘. Let be the collection that contains all possible sets |풢| ≤ ℋ {︀ ⋃︀ ⃒ }︀ of uncovered elements at the end of the iteration, defined as = V 푆 ⃒ . ℋ ∖ 푆∈풞 풞 ∈ 풢 Moreover, set 푝 = 2/푛훿, 휀 = 1/2 and 푞 = 푚−푐 and note that 푚(1+휌)푘. Since |ℋ| ≤ |풢| ≤ ′ 푐 (log log 1 + log 1 ) 푐휌푘푛훿 log 푚 log 푛 = S for large enough 푐, by Lemma 2.2.5, S is a 휀2푝 |ℋ| 푝 푞 ≤ | | relative (푝, 휀)-approximation of (V, ) with (1 푞) probability. Let be the collection ℋ − 풟 ⊆ ℱ of sets picked during the iteration which covers all elements in S. Since S is a relative (푝, 휀)- approximation sample of (V, ) with probability at least 1 푚−푐, the number of uncovered ℋ − elements of V (or ) by is at most 휀푝 V = /푛훿. 풰 풟 | | |풰| Hence, in each iteration we pick 푂(휌푘) sets and at the end of iteration the number of 훿 uncovered elements reduces by 푛 .  Lemma 2.2.7. The IterSetCover algorithm computes a set cover of ( , ), whose size 풰 ℱ is within a 푂(휌/훿) factor of the size of an optimal cover with probability at least 1 푚1−푐/4. − Proof: Consider the run of IterSetCover for which the value of 푘 is between OPT and | | 2 OPT . In each of the (1/훿) iterations made by the algorithm, by Lemma 2.2.6, the number | | of uncovered elements decreases by a factor of 푛훿 where 푛 is the number of initial elements to be covered by the sets. Moreover, the number of sets picked in each iteration is 푂(휌푘). Thus after (1/훿) iterations, all elements would be covered and the total number of sets in the solution is 푂(휌 OPT /훿). Moreover by Lemma 2.2.6, the success probability of all the | | 1 푐 −1 iterations, is at least 1 1 (1/푚) 4 . − 훿푚푐/4 ≥ − 

43 Theorem 2.2.8. The IterSetCover( , , 훿) algorithm makes 2/훿 passes, uses 푂̃︀(푚푛훿) 풰 ℱ memory space, and finds a 푂(휌/훿)-approximate solution of the Set Cover problem with high probability. Furthermore, given enough number of passes the IterSetCover algorithm matches

the known lower bound on the memory space of the streaming Set Cover problem up to a polylog(푚) factor where 푚 is the number of sets in the input. Proof: The first part of the proof implied by Lemma 2.2.1, Lemma 2.2.2, and Lemma 2.2.7. As for the lower bound, note that by a result of Nisan [146], any randomized ( log 푛 )- 2 approximation protocol for Set Cover( , ) in the one-way communication model requires 풰 ℱ Ω(푚) bits of communication, no matter how many number of rounds it makes. This implies that any randomized 푂(log 푛)-pass, ( log 푛 )-approximation algorithm for Set Cover( , ) re- 2 풰 ℱ quires Ω(̃︀ 푚) space, even under the exponential computational power assumption. By the above, the IterSetCover algorithm makes 푂(1/훿) passes and uses 푂̃︀(푚푛훿) space to return a 1 -approximate solution under the exponential computational power 푂( 훿 ) assumption ( ). Thus by letting , we will have a log 푛 -approximation 휌 = 1 훿 = 푐/ log 푛 ( 2 ) streaming algorithm using 푂̃︀(푚) space which is optimal up to a factor of polylog(푚).  Theorem 2.2.8 provides a strong indication that our trade-off algorithm is optimal.

2.3. Lower Bound for Single Pass Algorithms

In this section, we study the Set Cover problem in the two-party communication model and give a tight lower bound on the communication complexity of the randomized protocols

solving the problem in a single round. In the two-party Set Cover, we are given a set of elements and there are two players Alice and Bob where each of them has a collection of 풰 subsets of , 퐴 and 퐵. The goal for them is to find a minimum size cover 퐴 퐵 풰 ℱ ℱ 풞 ⊆ ℱ ∪ ℱ covering while communicating the fewest number of bits from Alice to Bob (In this model 풰 Alice communicates to Bob and then Bob should report a solution).

Our main lower bound result for the single pass protocols for Set Cover is the following theorem which implies that the naive approach in which one party sends all of its sets to the the other one is optimal.

44 Theorem 2.3.1. Any single round randomized protocol that approximates Set Cover( , ) 풰 ℱ within a factor better than 3/2 and error probability 푂(푚−푐) requires Ω(푚푛) bits of commu- nication where 푛 = and 푚 = and 푐 is a sufficiently large constant. |풰| |ℱ| We consider the case in which the parties want to decide whether there exists a cover of size 2 for in 퐴 퐵 or not. If any of the parties has a cover of size at most 2 for , then 풰 ℱ ∪ ℱ 풰 it becomes trivial. Thus the question is whether there exist 푆푎 퐴 and 푆푏 퐵 such that ∈ ℱ ∈ ℱ 푆푎 푆푏. 풰 ⊆ ∪ A key observation is that to decide whether there exist 푆푎 퐴 and 푆푏 퐵 such that ∈ ℱ ∈ ℱ 푆푎 푆푎, one can instead check whether there exists 푆푎 퐴 and 푆푏 퐵 such that 풰 ⊆ ∪ ∈ ℱ ∈ ℱ 푆푎 푆푏 = . In other words we need to solve OR of a series of two-party Set Disjointness ∩ ∅ problems. In two-party Set Disjointness problem, Alice and Bob are given subsets of , 푆푎 풰 and 푆푏 and the goal is to decide whether 푆푎 푆푏 is empty or not with the fewest possible ∩ bits of communication. Set Disjointness is a well-studied problem in the communication complexity and it has been shown that any randomized protocol for Set Disjointness with 푂(1) error probability requires Ω(푛) bits of communication where 푛 = [27, 111, 153]. |풰| We can think of the following extensions of the Set Disjointness problem.

I. Many vs One: in this variant, Alice has 푚 subsets of , 퐴 and Bob is given a single set 풰 ℱ 푆푏. The goal is to determine whether there exists a set 푆푎 퐴 such that 푆푎 푆푏 = . ∈ ℱ ∩ ∅ II. Many vs Many: in this variant, each of Alice and Bob are given a collection of subsets

of and the goal for them is to determine whether there exist 푆푎 퐴 and 푆푏 퐵 풰 ∈ ℱ ∈ ℱ such that 푆푎 푆푏 = . ∩ ∅

Note that deciding whether two-party Set Cover has a cover of size 2 is equivalent to solving the (Many vs Many)-Set Disjointness problem. Moreover, any lower bound for (Many vs One)-Set Disjointness clearly implies the same lower bound for the (Many vs Many)-Set Disjointness problem. In the following theorem we show that any single-round randomized protocol that solves (Many vs One)-Set Disjointness(푚, 푛) with 푂(푚−푐) error probability requires Ω(푚푛) bits of communication.

Theorem 2.3.2. Any randomized protocol for (Many vs One)-Set Disjointness(푚, 푛) with

45 −푐 error probability that is 푂(푚 ) requires Ω(푚푛) bits of communication if 푛 푐1 log 푚 where ≥ 푐 and 푐1 are large enough constants. The idea is to show that if there exists a single-round randomized protocol for the problem with 표(푚푛) bits of communication and error probability 푂(푚−푐), then with constant proba- bility one can distinguish Ω(2푚푛) distinct inputs using 표(푚푛) bits which is a contradiction. Suppose that Alice has a collection of 푚 uniformly and independently random subsets of (in each of her subsets the probability that 푒 is in the subset is 1/2). Lets assume that 풰 ∈ 풰 there exists a single round protocol I for (Many vs One)-Set Disjointness(푛, 푚) with error probability 푂(푚−푐) using 표(푚푛) bits of communication. Let ExistsDisj be Bob’s algorithm in protocol I. Then we show that one can recover 푚푛 random bits with constant probability using ExistsDisj subroutine and the message 푠 sent by the first party in protocol I. The RecoverBit which is shown in Algorithm2, is the algorithm to recover random bits using protocol I and ExistsDisj. To this end, Bob gets the message 푠 communicated by protocol I from Alice and considers all subsets of size 푐1 log 푚 and 푐1 log 푚 + 1 of . Note that 푠 is communicated only once and 풰 thus the same 푠 is used for all queries that Bob makes. Then at each step Bob picks a random subset 푆푏 of size 푐1 log 푚 of and solve the (Many vs One)-Set Disjointness problem with 풰 input ( 퐴, 푆푏) by running ExistsDisj(푠, 푆푏). Next we show that if 푆푏 is disjoint from a set ℱ in 퐴, then with high probability there is exactly one set in 퐴 which is disjoint from 푆푏 ℱ ℱ (see Lemma 2.3.3). Thus once Bob finds out that his query, 푆푏, is disjoint from a set in 퐴, ℱ + he can query all sets 푆 푆푏 푒 푒 푆푏 and recover the set (or union of sets) in 퐴 푏 ∈ { ∪ | ∈ 풰 ∖ } ℱ that is disjoint from 푆푏. By a simple pruning step we can detect the ones that are union of more than one set in 퐴 and only keep the sets in 퐴. ℱ ℱ In Lemma 2.3.6, we show that the number of queries that Bob is required to make to

푐 recover 퐴 is 푂(푚 ) where 푐 is a constant. ℱ

Lemma 2.3.3. Let 푆푏 be a random subset of of size 푐 log 푚 and let 퐴 be a collection 풰 ℱ of m random subsets of . The probability that there exists exactly one set in 퐴 that is 풰 ℱ disjoint from is at least 1 . 푆푏 푚푐+1

46 Algorithm 2 RecoverBit uses a protocol for (Many vs One)-Set Disjointness(푚, 푛) to recover Alice’s sets, 퐴 in Bob’s side. ℱ 1: procedure RecoverBit( , 푠) 풰 2: 푎 3: forℱ ←푖 = ∅ 1 to 푚푐 log 푚 do 4: let 푆푏 be a random subset of of size 푐1 log 푚 풰 5: if ExistsDisj(푠, 푆푏) = true then 6: ◁ discovering the set (or union of sets) 7: ◁ in 퐴 disjoint from 푆푏 8: 푆 ℱ ← ∅ 9: for 푒 푆푏 do ∈ 풰 ∖ 10: if ExistsDisj(푆푏 푒, 푠) = false then 11: 푆 푆 푒 ∪ ′ ← ∪ ′ 12: if 푆 푎 s.t. 푆 푆 then ◁ pruning step ∃ ∈ ℱ ′ ⊂ 13: 푎 푎 푆 ℱ ← ℱ ∖ { } 14: 푎 푎 푆 15: elseℱ if← ℱ′ ∪ { s.t.} ′ then @푆 푎 푆 푆 ∈ ℱ ⊂ 16: 푎 푎 푆 ℱ ← ℱ ∪ { } 17: return 푎 ℱ Proof: The probability that 푆푏 is disjoint from exactly one set in 퐴 is ℱ

Pr(푆푏 is disjoint from 1 set in 퐴) Pr(푆푏 is disjoint from 2 sets in 퐴) ≥ ℱ − ≥ ℱ 1 (︂푚)︂ 1 1 ( )푐 log 푚 ( )2푐 log 푚 . ≥ 2 − 2 2 ≥ 푚푐+1

First we prove the first term in the above inequality. For an arbitrary set 푆 퐴, since any ∈ ℱ element is contained in with probability 1 , the probability that is disjoint from is 푆 2 푆 푆푏 (1/2)푐 log 푚.

−푐 log 푚 Pr(푆푏 is disjoint from at least one set in 퐴) 2 . ℱ ≥

(︀푚)︀ Moreover since there exist pairs of sets in 퐴, and for each 푆1, 푆2 퐴, the probability 2 ℱ ∈ ℱ −2푐 that 푆1 and 푆2 are disjoint from 푆푏 is 푚 ,

−(2푐−2) Pr(푆푏 is disjoint from at least two sets in 퐴) 푚 . ℱ ≤ 

A family of sets is called intersecting if and only if for any sets 퐴, 퐵 either both ℳ ∈ ℳ

47 퐴 퐵 and 퐵 퐴 are non-empty or both 퐴 퐵 and 퐵 퐴 are empty; in other words, there ∖ ∖ ∖ ∖ exists no 퐴, 퐵 such that 퐴 퐵. Let 퐴 be a collection of subsets of . We show ∈ ℳ ⊆ ℱ 풰 that with high probability after testing 푂(푚푐) queries for sufficiently large constant 푐, the

RecoverBit algorithm recovers 퐴 completely if 퐴 is intersecting. First we show that ℱ ℱ with high probability the collection 퐴 is intersecting. ℱ

Observation 2.3.4. Let 퐴 be a collection of 푚 uniformly random subsets of where ℱ 풰 −푐/4+2 푐 log 푚. With probability at least 1 푚 , 퐴 is an intersecting family. |풰| ≥ − ℱ 3 푛 Proof: The probability that 푆1 푆2 is ( ) and there are at most 푚(푚 1) pairs of sets in ⊆ 4 − 2 3 푛 푐 −2 퐴. Thus with probability at least 1 푚 ( ) 1 1/푚 4 , 퐴 is intersecting. ℱ − 4 ≥ − ℱ  Observation 2.3.5. The number of distinct inputs of Alice (collections of random subsets of ), that is distinguishable by RecoverBit is Ω(2푚푛). 풰 Proof: There are 2푚푛 collections of 푚 random subsets of . By Observation 2.3.4, Ω(2푚푛) 풰 of them are intersecting. Since we can only recover the sets in the input collection and not their order, the distinct number of input collection that are distinguished by RecoverBit

푚푛 is Ω( 2 ) which is Ω(2푚푛) for 푛 푐 log 푚. 푚! ≥ 

By Observation 2.3.4 and only considering the case such that 퐴 is intersecting, we have the ℱ following lemma.

Lemma 2.3.6. Let 퐴 be a collection of 푚 uniformly random subsets of and suppose ℱ 풰 푐 that 푐 log 푚. After testing at most 푚푐 queries, with probability at least (1 1 )푝푚 , |풰| ≥ − 푚 퐴 is fully recovered, where 푝 is the success rate of protocol I for the (Many vs One)-Set ℱ Disjointness problem.

Proof: By Lemma 2.3.3, for each 푆푏 of size 푐1 log 푚 the probability that 푆푏 is disjoint ⊂ 풰 푐1+1 from exactly one set in a random collection of sets 퐴 is at least 1/푚 . Given 푆푏 is disjoint ℱ from exactly one set in 퐴, due to symmetry of the problem, the chance that 푆푏 is disjoint ℱ from a specific set is at least 1 . After 푐1+2 queries where is a large 푆 퐴 푐 +2 훼푚 log 푚 훼 ∈ ℱ 푚 1 enough constant, for any 푆 퐴, the probability that there is not a query 푆푏 that is only ∈ ℱ 1 훼푚푐1+2 log 푚 −훼 log 푚 1 disjoint from 푆 is at most (1 푐 +2 ) 푒 = . − 푚 1 ≤ 푚훼 Thus after trying 훼푚푐1+2 log 푚 queries, with probability at least (1 1 ) (1 1 ), − 2푚훼−1 ≥ − 푚

48 for each 푆 퐴 we have at least one query that is only disjoint from 푆 (and not any other ∈ ℱ sets in 퐴 푆). ℱ ∖ Once we have a query subset 푆푏 which is only disjoint from a single set 푆 퐴, we can ∈ ℱ ask 푛 푐 log 푚 queries of size 푐1 log 푚+1 and recover 푆. Note that if 푆푏 is disjoint from more − than one sets in 퐴 simultaneously, the process (asking 푛 푐 log 푚 queries of size 푐1 log 푚+1) ℱ − will end up in recovering the union of those sets. Since 퐴 is an intersecting family with ℱ high probability (Observation 2.3.4), by pruning step in the RecoverBit algorithm we are

guaranteed that at the end of the algorithm, what we returned is exactly 퐴. Moreover the ℱ total number of queries the algorithm makes is at most

푛 (훼푚푐1+2 log 푚) 훼푚푐1+3 log 푚 푚푐 × ≤ ≤

for 푐 푐1 + 4. ≥ 푐 1 푚푐 Thus after testing 푚 queries, 퐴 will be recovered with probability at least (1 )푝 ℱ − 푚 where 푝 is the success probability of the protocol I for (Many vs One)-Set Disjointness(푚, 푛). Corollary 2.3.7. Let I be a protocol for (Many vs One)-Set Disjointness(푚, 푛) with error probability 푂(푚−푐) and 푠 bits of communication such that 푛 푐 log 푚 for large enough 푐. ≥ Then RecoverBit recovers 퐴 with constant success probability using 푠 bits of communi- ℱ cation.

By Observation 2.3.5, since RecoverBit distinguishes Ω(2푚푛) distinct inputs with con- stant probability of success (by Corollary 2.3.7), the size of message sent by Alice, should be

Ω(푚푛). This proves Theorem 2.3.2.

Proof of Theorem 2.3.1: As we showed earlier, the communication complexity of (Many vs One)-Set Disjointness is a lower bound for the communication complexity of Set Cover.

Theorem 2.3.2 showed that any protocol for (Many vs One)-Set Disjointness(푛, 퐴) with |ℱ | error probability less than 푂(푚−푐) requires Ω(푚푛) bits of communication. Thus any single- round randomized protocol for Set Cover with error probability 푂(푚−푐) requires Ω(푚푛) bits of communication. 

Since any 푝-pass streaming 훼-approximation algorithm for problem P that uses 푂(푠)

49 memory space, is a 푝-round two-party 훼-approximation protocol for problem P using 푂(푠푝) bits of communication [88], and by Theorem 2.3.1, we have the following lower bound for

Set Cover problem in the streaming model.

Theorem 2.3.8. Any single-pass randomized streaming algorithm for Set Cover( , ) that 풰 ℱ computes a (3/2)-approximate solution with probability Ω(1 푚−푐) requires Ω(푚푛) memory − space (assuming 푛 푐1 log 푚). ≥ 2.4. Geometric Set Cover

In this section, we consider the streaming Set Cover problem in the geometric settings. We present an algorithm for the case where the elements are a set of 푛 points in the plane R2 and the 푚 sets are either all disks, all axis-parallel rectangles, or all 훼-fat triangles (which for simplicity we call shapes) given in a data stream. As before, the goal is to find the minimum size cover of points from the given sets. We call this problem the Points-Shapes Set Cover problem.

Note that, the description of each shape requires 푂(1) space and thus the Points-Shapes Set Cover problem is trivial to be solved in 푂(푚 + 푛) space. In this setting the goal is to design an algorithm whose space is sub-linear in 푂(푚 + 푛). Here we show that almost the same algorithm as IterSetCover (with slight modifications) uses 푂̃︀(푛) space to find an 푂(휌)-approximate solution of the Points-Shapes Set Cover problem in constant passes.

2.4.1. Preliminaries

A triangle is called 훼-fat (or simply fat) if the ratio between its longest edge and its height △ on this edge is bounded by a constant 훼 > 1 (there are several equivalent definitions of 훼-fat triangles).

Definition 2.4.1. Let ( , ) be a set system such that is a set of points and is a 풰 ℱ 풰 ℱ collection of shapes, in the plane R2. The canonical representation of ( , ) is a collection ′ 풰 ℱ ℱ of regions such that the following conditions hold. First, each 푆′ ′ has 푂(1) description. ∈ ℱ Second, for each 푆′ ′, there exists 푆 such that 푆′ 푆 . Finally, for each ∈ ℱ ∈ ℱ ∩ 풰 ⊆ ∩ 풰 ′ ′ ′ ′ ′ 푆 , there exists 푐1 sets 푆 , , 푆 such that 푆 = (푆 푆 ) for some ∈ ℱ 1 ··· 푐1 ∈ ℱ ∩ 풰 1 ∪ · · · ∪ 푐1 ∩ 풰 constant 푐1.

50 The following two results are from [68] which are the formalization of the ideas in [15].

Lemma 2.4.2. (Lemma 4.18 in [68]) Given a set of points in the plane R2 and a parameter 풰 푤, one can compute a set ′ of 푂( 푤2 log ) axis-parallel rectangles with the following ℱtotal |풰| |풰| property. For an arbitrary axis-parallel rectangle 푆 that contains at most 푤 points of , there 풰 exist two axis-parallel rectangles 푆′ , 푆′ ′ whose union has the same intersection with 1 2 ∈ ℱtotal as 푆, i.e., 푆 = (푆′ 푆′ ) . 풰 ∩ 풰 1 ∪ 2 ∩ 풰 Lemma 2.4.3. (Theorem 5.6 in [68]) Given a set of points in R2, a parameter 푤 and 풰 a constant 훼, one can compute a set ′ of 푂( 푤3 log2 ) regions each having 푂(1) ℱtotal |풰| |풰| description with the following property. For an arbitrary 훼-fat triangle 푆 that contains at most 푤 points of , there exist nine regions from ′ whose union has the same intersection 풰 ℱtotal with as 푆. 풰 Using the above lemmas we get the following lemma.

Lemma 2.4.4. Let be a set of points in R2 and let be a set of shapes (discs, axis-parallel 풰 ℱ rectangles or fat triangles), such that each set in contains at most 푤 points of . Then, in ℱ 풰 a single pass over the stream of sets , one can compute the canonical representation ′ of ℱ ℱ ( , ). Moreover, the size of the canonical representation is at most 푂( 푤3 log2 ) and 풰 ℱ |풰| |풰| the space requirement of the algorithm is 푂̃︀( ′ ) = 푂̃︀( 푤3). |ℱ | |풰| Proof: For the case of axis-parallel rectangles and fat triangles, first we use Lemma 2.4.2 and

Lemma 2.4.3 to get the set ′ offline which require 푂̃︀( ′ ) = 푂̃︀( 푤3 log2 ) memory ℱtotal ℱtotal |풰| |풰| space. Then by making one pass over the stream of sets , we can find the canonical ℱ representation ′ by picking all the sets 푆′ ′ such that 푆′ 푆 for some ℱ ∈ ℱtotal ∩ 풰 ⊆ ∩ 풰 푆 . For discs however, we just make one pass over the sets and keep a maximal subset ∈ ℱ ℱ ′ such that for each pair of sets 푆′ , 푆′ ′ their projection on are different, i.e., ℱ ⊆ ℱ 1 2 ∈ ℱ 풰 푆′ = 푆′ . By a standard technique of Clarkson and Shor [50], it can be proved that 1 ∩ 풰 ̸ 2 ∩ 풰 the size of the canonical representation, i.e., 푆′ , is bounded by 푂( 푤2). Note that this is | | |풰| just counting the number of discs that contain at most 푤 points, namely the at most 푤-level discs. 

51 2.4.2. Description of GeomSC Algorithm

The outline of the GeomSC algorithm (shown in Algorithm3) is very similar to the Iter- SetCover algorithm presented earlier in Section 2.2. In the first pass, the algorithm picks all the sets that cover a large number of yet-uncovered

elements. Next, we sample S. Since we have removed all the ranges that have large size, in the first pass, the size of the remaining ranges restricted to the sample S is small. Therefore

by Lemma 2.4.4, the canonical representation of (S, S) has small size and we can afford to ℱ store it in the memory. We use Lemma 2.4.4 to compute the canonical representation S in ℱ one pass. The algorithm then uses the sets in S to find a cover sol for the points of S. ℱ S Next, in one additional pass, the algorithm replaces each set in by one of its supersets solS in . ℱ Finally, note that in IterSetCover, we are assuming that the size of the optimal

solution is 푂(푘). Thus it is enough to stop the iterations once the number of uncovered elements is less than 푘. Then we can pick an arbitrary set for each of the uncovered elements. This would add only 푘 more sets to the solution. Using this idea, we can reduce the size of the sampled elements down to 푛 훿 which would help us in getting near-linear 푐휌푘( 푘 ) log 푚 log 푛 space in the geometric setting. Note that the final pass of the algorithm can be embedded into the previous passes but for the sake of clarity we write it separately.

2.4.3. Analysis

By a similar approach to what we used in Section 2.2 to analyze the pass count and approx- imation guarantee of IterSetCover algorithm, we can show that the number of passes of

the GeomSC algorithm is 3/훿 + 1 (which can be reduced to 3/훿 with minor changes), and the algorithm returns an 푂(휌/훿)-approximate solution. Next, we analyze the space usage and the correctness of the algorithm. Note that our analysis in this section only works for

훿 1/4. ≤ Lemma 2.4.5. The algorithm uses 푂̃︀(푛) space. Proof: Consider an iteration of the algorithm. The memory space used in the first pass of

each iteration is 푂̃︀(푛). The size of S is 푐휌푘(푛/푘)훿 log 푚 log 푛 and after the first pass the size

52 Algorithm 3 A streaming algorithm for the Points-Shapes Set Cover problem. 1: procedure GeomSC(( , ), 훿) 2: for 푘 2푖 0 푖 풰logℱ푛 in parallel do ◁ 푛 = 3: let∈L { | and≤ ≤sol } |풰| 4: for 푖 =← 1 풰to 1/훿 do ← ∅ 5: for all 푆 do ◁ one pass 6: if 푆 ∈L ℱ /푘 then 7: |sol∩ | ≥sol |풰| 푆 8: L ←L 푆 ∪ { } ← ∖ 9: S sample of L of size 푐휌푘(푛/푘)훿 log 푚 log 푛 10: ← CompCanonicalRep |S| one pass S (S, , 푘 ) ◁ 11: ℱ ← OfflineSC ℱ solS (S, S) 12: for 푆← do ◁ one passℱ 13: if ∈ ℱ′ s.t. ′ then 푆 solS 푆 S 푆 S 14: ∃sol∈ sol 푆 ∩ ⊆ ∩ 15: ← ∪ { }′ solS solS 푆 16: L L← 푆 ∖ { } ← ∖ 17: for 푆 do ◁ final pass 18: if 푆∈ ℱL = then 19: sol∩ ̸ sol∅ 푆 20: L ←L 푆 ∪ { } ← ∖ 21: return smallest sol computed in parallel of each set is at most /푘. Thus using Chernoff bound for each set 푆 sol, |풰| ∈ ℱ ∖ S (︂ 4 S )︂ 1 Pr 푆 S > (1 + 2)|풰| | | exp | | ( )푐+1. | ∩ | 푘 × ≤ − 3푘 ≤ 푚 |풰| Thus, with probability at least 1 푚−푐 (by the union bound), all the sets that are not picked − in the first pass, cover at most 3 S /푘 = 푐휌(푛/푘)훿 log 푚 log 푛 elements of S. Therefore, we can | | use Lemma 2.4.4 to show that the number of sets in the canonical representation of (S, S) ℱ is at most

(︂3 S )︂3 푂( S | | log2 S ) = 푂(휌4푛 log4 푚 log6 푛), | | 푘 | | as long as 훿 1/4. To store each set in a canonical representation of (S, ) only constant ≤ ℱ space is required. Moreover, by Lemma 2.4.4, the space requirement of the second pass is

푂̃︀( S ) = 푂̃︀(푛). Therefore, the total required space is 푂̃︀(푛) and the lemma follows. |ℱ |  Theorem 2.4.6. Given a set system defined over a set of 푛 points in the plane, and a 풰 53 set of 푚 ranges (which are either all disks, axis-parallel rectangles, or fat triangles). Let ℱ 휌 be the quality of approximation to the offline set-cover solver we have, and let 0 < 훿 < 1/4 be an arbitrary parameter.

Setting 훿 = 1/4, the algorithm GeomSC with high probability returns an 푂(휌)-approximate solution of the optimal set cover solution for the instance ( , ). This algorithm uses 푂̃︀(푛) 풰 ℱ space, and performs constant passes over the data.

Proof: As before consider the run of the algorithm in which OPT 푘 < 2 OPT . Let | | ≤ | | V be the set of uncovered elements L at the beginning of the iteration and note that the

total number of sets that is picked during the iteration is at most (1 + 푐1휌)푘 where 푐1 is the constant defined in Definition 2.4.4. Let denote all possible such covers, that is 풢 {︀ ′ ⃒ ′ }︀ = ⃒ (1 + 푐1휌)푘 . Let be the collection that contains all possible set of 풢 ℱ ⊆ ℱ |ℱ | ≤ ℋ {︀ ⋃︀ ⃒ }︀ uncovered elements at the end of the iteration, defined as = V 푆 ⃒ . Set ℋ ∖ 푆∈풞 풞 ∈ 풢 ′ 푝 = (푘/푛)훿, 휀 = 1/2 and 푞 = 푚−푐. Since for large enough 푐, 푐 (log log 1 + log 1 ) 휀2푝 |ℋ| 푝 푞 ≤ 푐휌푘(푛/푘)훿 log 푚 log 푛 = S with probability at least 1 푚−푐, by Lemma 2.2.5, the set of | | − sampled elements S is a relative (푝, 휀)-approximation sample of (V, ). ℋ Let be the collection of sets picked in the third pass of the algorithm that covers all 풞 ⊆ ℱ elements in S. By Lemma 2.4.4, 푐1휌푘 for some constant 푐1. Since with high probability |풞| ≤ S is a relative (푝, 휀)-approximation sample of (V, ), the number of uncovered elements of ℋ V (or L) after adding to sol is at most 휀푝 V (푘/푛)훿. Thus with probability at least 풞 | | ≤ |풰| (1 푚−푐), in each iteration and by adding 푂(휌푘) sets, the number of uncovered elements − reduces by a factor of (푛/푘)훿. Therefore, after 4 iterations (for 훿 = 1/4) the algorithm picks 푂(휌푘) sets and with high probability the number of uncovered elements is at most 푛(푘/푛)훿/훿 = 푘. Thus, in the final pass the algorithm only adds 푘 sets to the solution sol, and hence the approximation factor

of the algorithm is 푂(휌).  Remark 2.4.7. The result of Theorem 2.4.6 is similar to the result of Agarwal and Pan [3]

– except that their algorithm performs 푂(log 푛) iterations over the data, while the algorithm of Theorem 2.4.6 performs only a constant number of iterations. In particular, one can use the algorithm of Agarwal and Pan [3] as the offline solver.

54 f3 f2 f1 f f f f f f 1 3 2 1 10 20 30 2 3 4

P3 P2 P1 P3 P2 P1 P4 P5 P6 (a) (b) Figure 2.5.1: (a) shows an example of the communication Set Chasing(4, 3) and (b) is an instance of the communication Intersection Set Chasing(4, 3). 2.5. Lower bound for Multipass Algorithms

In this section we give lower bound on the memory space of multipass streaming algorithms

for the Set Cover problem. Our main result is Ω(푚푛훿) space for streaming algorithms that return an optimal solution of the Set Cover problem in 푂(1/훿) passes for 푚 = 푂(푛). Our approach is to reduce the communication Intersection Set Chasing(푛, 푝) problem introduced by Guruswami and Onak [90] to the communication Set Cover problem.

Consider a communication problem P with 푛 players 푃1, , 푃푛. The problem P is a ··· (푛, 푟)-communication problem if players communicate in 푟 rounds and in each round they

speak in order 푃1, , 푃푛. At the end of the 푟th round 푃푛 should return the solution. ··· Moreover we assume private randomness and public messages. In what follows we define the

communication Set Chasing and Intersection Set Chasing problems.

Definition 2.5.1. The Set Chasing(푛, 푝) problem is a (푝, 푝 1) communication problem in − [푛] which the player 푖 has a function 푓푖 :[푛] 2 and the goal is to compute 푓⃗1(푓⃗2( 푓⃗푝( 1 ) )) → ··· { } ··· where ⃗ ⋃︀ . Figure 2.5.1(a) is an instance of the communication . 푓푖(푆) = 푠∈푆 푓푖(푠) Set Chasing(4, 3)

Definition 2.5.2. The Intersection Set Chasing(푛, 푝) is a (2푝, 푝 1) communication prob- − lem in which the first 푝 players have an instance of the Set Chasing(푛, 푝) problem and the other 푝 players have another instance of the Set Chasing(푛, 푝) problem. The output of the In- tersection Set Chasing(푛, 푝) is 1 if the solutions of the two instances of the Set Chasing(푛, 푝) intersect and 0 otherwise. Figure 2.5.1(b) shows an instance of the Intersection Set Chasing

(4, 3). The function 푓푖 of each player 푃푖 is specified by a set of directed edges form a copy of vertices labeled 1, , 푛 to another copy of vertices labeled 1, , 푛 . { ··· } { ··· } The communication Set Chasing problem is a generalization of the well-known commu-

55 fi 1 1 1 1 out(vi+1) in(vi ) vi+1 v i 2 2 2 2 out(vi+1) in(vi ) vi+1 v i 3 3 3 3 out(vi+1) in(vi ) vi+1 vi out(v4 ) in(v4) 4 4 i+1 i vi+1 vi Pi ei (a) (c)

fi0 1 1 in(u1) out(u1 ) ui ui+1 i i+1 2 2 2 2 in(ui ) out(ui+1) ui ui+1 3 3 3 3 in(ui ) out(ui+1) ui ui+1 4 4 4 4 in(ui ) out(ui+1) ui ui+1 Pp+i ep+i (b) (d) Figure 2.5.2: The gadgets used in the reduction of the communication Intersection Set Chasing problem to the communication Set Cover problem. (a) and (c) shows the construc- tion of the gadget for players 1 to 푝 and (c) and (d) shows the construction of the gadget for players 푝 + 1 to 2푝. nication Pointer Chasing problem in which player 푖 has a function 푓푖 :[푛] [푛] and the → goal is to compute 푓1(푓2( 푓푝(1) )). ··· ··· [90] showed that any randomized protocol that solves Intersection Set Chasing(푛, 푝) with error probability less than , requires 푛1+1/(2푝) bits of communication where 1/10 Ω( 푝16 log3/2 푛 ) 푛 is sufficiently large and 푝 log 푛 . In Theorem 2.5.4, we reduce the communication ≤ log log 푛 Intersection Set Chasing problem to the communication Set Cover problem and then give the first superlinear memory lower bound for the streaming Set Cover problem.

Definition 2.5.3 (Communication Set Cover(풰, ℱ, 푝) Problem). The communication Set Cover(푛, 푝) is a (푝, 푝 1) communication problem in which a collection of elements − 풰 is given to all players and each player 푖 has a collection of subsets of , 푖. The goal is to 풰 ℱ solve Set Cover( , 1 푝) using the minimum number of communication bits. 풰 ℱ ∪ · · · ∪ ℱ Theorem 2.5.4. Any (1/2훿 1) passes streaming algorithm that solves the Set Cover( , ) − 풰 ℱ optimally with constant probability of error requires Ω(̃︀ 푚푛훿) memory space where 훿 log log 푛 ≥ log 푛 and 푚 = 푂(푛).

56 Consider an instance ISC of the communication Intersection Set Chasing(푛, 푝). We con- struct an instance of the communication Set Cover( , , 2푝) problem such that solving Set 풰 ℱ Cover( , ) optimally determines whether the output of ISC is 1 or not. 풰 ℱ [푛] The instance ISC consists of 2푝 players. Each player 1, , 푝 has a function 푓푖 :[푛] 2 ··· → and each player 푝 + 1, , 2푝 has a function 푓 ′ :[푛] 2[푛] (see Figure 2.5.1). In ISC, each ··· 푖 → 1 푛 1 푛 function 푓푖 is shown by a set of vertices 푣 , , 푣 and 푣 , , 푣 such that there is a 푖 ··· 푖 푖+1 ··· 푖+1 푗 ℓ ′ directed edge from 푣 to 푣 if and only if ℓ 푓푖(푗). Similarly, each function 푓 is denoted 푖+1 푖 ∈ 푖 by a set of vertices 푢1, , 푢푛 and 푢1 , , 푢푛 such that there is a directed edge from 푢푗 푖 ··· 푖 푖+1 ··· 푖+1 푖+1 to 푢ℓ if and only if ℓ 푓 ′(푗) (see Figure 2.5.4(a) and Figure 2.5.4(b)). 푖 ∈ 푖 In the corresponding communication Set Cover instance of ISC, we add two elements in(푣푗) and out(푣푗) per each vertex 푣푗 where 푖 푝 + 1, 푗 푛. We also add two elements 푖 푖 푖 ≤ ≤ in(푢푗) and out(푢푗) per each vertex 푢푗 where 푖 푝 + 1, 푗 푛. In addition to these elements, 푖 푖 푖 ≤ ≤ for each player 푖, we add an element 푒푖 (see Figure 2.5.4(c) and Figure 2.5.4(d)). Next, we define a collection of sets in the corresponding Set Cover instance of ISC. For 푗 푗 ℓ each player 푃푖, where 1 푖 푝, we add a single set 푆 containing out(푣 ) and in(푣 ) for ≤ ≤ 푖 푖+1 푖 all out-going edges 푗 ℓ . Moreover, all 푗 sets contain the element . Next, for each (푣푖+1, 푣푖 ) 푆푖 푒푖 vertex 푗 we add a set 푗 that contains the two corresponding elements of 푗, 푗 and 푣푖 푅푖 푣푖 in(푣푖 ) 푗 . In Figure 2.5.4(c), the red rectangles denote -type sets and the curves denote out(푣푖 ) 푅 푆-type sets for the first half of the players.

Similarly to the sets corresponding to players 1 to 푝, for each player 푃푝+푖 where 1 푖 푝, ≤ ≤ we add a set 푗 containing 푗 and ℓ for all in-coming edges ℓ 푗 of 푗 푆푝+푖 in(푢푖 ) out(푢푖+1) (푢푖+1, 푢푖 ) 푢푖 (denoting ′−1 ). The set 푗 contains the element too. Next, for each vertex 푗 we 푓푖 (푗) 푆푝+푖 푒푝+푖 푢푖 add a set 푗 that contains the two corresponding elements of 푗, 푗 and 푗 . In 푇푝+푖 푢푖 in(푢푖 ) out(푢푖 ) Figure 2.5.4(d), the red rectangles denote 푇 -type sets and the curves denote 푆-type sets for the second half of the players.

At the end, we merge 푖 s and 푖 s as shown in Figure 2.5.3. After merging the corre- 푣1 푢1 sponding sets of 푣푗s (푅1, , 푅푛) and the corresponding sets of 푢푗 s (푇 1, , 푇 푛), we call 1 1 ··· 1 1 1 ··· 1 the merged sets 푇 1, , 푇 푛. 1 ··· 1 The main claim is that if the solution of ISC is 1 then the size of an optimal solution of its corresponding Set Cover instance SC is (2푝 + 1)푛 + 1; otherwise, it is (2푝 + 1)푛 + 2.

57 f1 f10 f1 f 0 1 f1 f10 1 1 1 1 v1 u1 v2 v1 u1 u2 2 2 2 2 2 2 2 2 u u2 v2 v1 u1 2 v2 3 3 3 3 3 u u3 v u2 v2 v1 1 2 2 4 4 4 4 4 4 u v u u2 v2 2 4 4 2 v1 1 in(v1) in(u1) P1 Pp+1 P1 Pp+1 P1 Pp+1 (a) (b) (c) Figure 2.5.3: In (b) two Set Chasing instances merge in their first set of vertices and (c) shows the corresponding gadgets of these merged vertices in the communication Set Cover.

f3 f2 f1 f10 f20 f30 1 2 3 4 P3 P2 P1 P4 P5 P6 (a) (b) Figure 2.5.4: In (푎), path 푄 is shown with black dashed arcs and (푏) shows the correspond- ing cover of path 푄. Lemma 2.5.5. The size of any feasible solution of SC is at least (2푝 + 1)푛 + 1.

Proof: For each player 푖 (1 푖 푝), since out(푣푗 )s are only covered by 푅푗 and 푆푗, at least ≤ ≤ 푖+1 푖+1 푖 1 푛 푗 푛 sets are required to cover out(푣 ), , out(푣 ). Moreover for player 푃푝, since in(푣 )s 푖+1 ··· 푖+1 푝+1 푗 1 1 푛 1 are only covered by 푅 and 푒푝 is only covered by 푆 , all 푛 + 1 sets 푅 , , 푅 , 푆 must 푝+1 푝 푝+1 ··· 푝+1 푝 be selected in any feasible solution of SC. Similarly for each player 푝+푖 (1 푖 푝), since in(푢푗)s are only covered by 푇 푗 and 푆푗 , at ≤ ≤ 푖 푖 푝+푖 least 푛 sets are required to cover in(푢1), , in(푢푛). Moreover, considering 푢1 , , 푢푛 , 푖 ··· 푖 푝+1 ··· 푝+1 since in(푢푗 ) is only covered by 푇 푗 , all 푛 sets 푇 1 , , 푇 푛 must be selected in any 푝+1 푝+1 푝+1 ··· 푝+1 feasible solution of SC.

All together, at least (2푝+1)푛+1 sets should be selected in any feasible solution of SC. Lemma 2.5.6. Suppose that the solution of ISC is 1. Then the size of an optimal solution of its corresponding Set Cover instance is exactly (2푝 + 1)푛 + 1.

Proof: By Lemma 2.5.5, the size of an optimal solution of S is at least (2푝 + 1)푛 + 1. Here we prove that (2푝 + 1)푛 + 1 sets suffice when the solution of ISC is 1. Let 푄 =

58 1 푗푝 푗2 푗1 ℓ1 ℓ2 ℓ푝 1 be a path in such that (since the solu- 푣푝+1, 푣푝 , . . . , 푣2 , 푣1 , 푢1 , 푢2 , . . . , 푢푝 , 푢푝+1 ISC 푗1 = ℓ1 tion of ISC is 1 such a path exists). The corresponding solution to 푄 can be constructed as follows (See Figure 2.5.4):

Pick 푆1 and all 푅푗 s (푛 + 1 sets). ∙ 푝 푝+1 For each 푣푗푖 in 푄 where 1 < 푖 푝, pick the set 푆푗푖 in the solution. Moreover, for each ∙ 푖 ≤ 푖−1 푗 such 푖 pick all sets 푅 where 푗 = 푗푖 (푛(푝 1) sets). 푖 ̸ − 푗1 ℓ1 푗1 푗 For 푣 (or 푢 ), pick the set 푆 . Moreover, pick all sets 푇 where 푗 = 푗1 (푛 sets). ∙ 1 1 푝+1 1 ̸ For each 푢ℓ푖 in 푄 where 1 < 푖 푝, pick the set 푆ℓ푖 in the solution. Moreover, for each ∙ 푖 ≤ 푝+푖 ℓ such 푖 pick all sets 푇 where ℓ = ℓ푖 (푛(푝 1) sets). 푖 ̸ − Pick all 푇 푗 s (푛 sets). ∙ 푝+1 It is straightforward to see that the solution constructed above is a feasible solution.  Lemma 2.5.7. Suppose that the size of an optimal solution of the corresponding Set Cover instance of ISC, SC, is (2푝 + 1)푛 + 1. Then the solution of ISC is 1.

Proof: As we proved earlier in Lemma 2.5.5, any feasible solution of SC picks 푅1 , , 푅푛 , 푆1 푝+1 ··· 푝+1 푝 and 푇 1 , , 푇 푛 . Moreover, we proved that for each 1 푖 < 푝, at least 푛 sets should 푝+1 ··· 푝+1 ≤ be selected from 푅1 , , 푅푛 , 푆1, , 푆푛. Similarly, for each 1 푖 푝, at least 푛 sets 푖+1 ··· 푖+1 푖 ··· 푖 ≤ ≤ should be selected from 푇 1, , 푇 푛, 푆1 , , 푆푛 . Thus if a feasible solution of SC, OPT, 푖 ··· 푖 푝+푖 ··· 푝+1 is of size (2푝 + 1)푛 + 1, it has exactly 푛 sets from each specified group. Next we consider the first half of the players and second half of the players separately.

Consider 푖 such that 1 푖 < 푝. Let 푆푗1 , , 푆푗푘 be the sets picked in the optimal solution ≤ 푖 ··· 푖 (because of there should be at least one set of form 푗 in OPT). Since each 푗 푒푖 푆푖 out(푣푖+1) 푗 푗 푗 is only covered by 푆 and 푅 , for all 푗 / 푗1, . . . , 푗푘 , 푅 should be selected in OPT. 푖 푖+1 ∈ { } 푖+1 푗 Moreover, for all 푗 푗1, , 푗푘 , 푅 should not be contained in OPT (otherwise the size ∈ { ··· } 푖+1 푗 of OPT would be larger than (2푝 + 1)푛 + 1). Consider 푗 푗1, . . . , 푗푘 . Since 푅 is not ∈ { } 푖+1 in OPT, there should be a set ℓ selected in OPT such that 푗 is contained in ℓ . 푆푖+1 in(푣푖+1) 푆푖+1 Thus by considering s in a decreasing order and using induction, if 푗 is in OPT then 푗 푆푖 푆푖 푣푖+1 is reachable form 1 . 푣푝+1 Next consider a set 푆푗 that is selected in OPT (1 푖 푝). By similar argument, 푝+푖 ≤ ≤ 푗 is not in OPT and there exists a set ℓ (or ℓ if ) in OPT such that 푗 푇푖 푆푝+푖−1 푆1 푖 = 1 out(푢푖 )

59 is contained in 푆ℓ . Let 푢ℓ1 , , 푢ℓ푘 be the set of vertices whose corresponding out 푝+푖−1 푖+1 ··· 푖+1 elements are in 푗 . Then by induction, there exists an index such that 푟 is reachable 푆푝+푖 푟 푣1 from 푣1 and 푢푟 is also reachable from all 푢ℓ1 , , 푢ℓ푘 . Moreover, the way we constructed 푝+1 1 푖+1 ··· 푖+1 the instance SC guarantees that all sets 푆1 , , 푆푛 contains out(푢1 ). Hence if the size of 2푝 ··· 2푝 푝+1 an optimal solution of SC is (2푝 + 1)푛 + 1 then the solution of ISC is 1.  Corollary 2.5.8. Intersection Set Chasing(푛, 푝) returns 1 if and only if the size of optimal solution of its corresponding Set Cover instance (as described here) is (2푝 + 1)푛 + 1.

Observation 2.5.9. Any streaming algorithm for Set Cover, , that in ℓ passes solves the ℐ problem optimally with a probability of error err and consumes 푠 memory space, solves the corresponding communication Set Cover problem in ℓ rounds using 푂(푠ℓ2) bits of communi- cation with probability error err.

Proof: Starting from player 푃1, each player runs over its input sets and once 푃푖 is done ℐ with its input, she sends the working memory of publicly to other players. Then next ℐ player starts the same routine using the state of the working memory received from the previous player. Since solves the Set Cover instance optimally after ℓ passes using 푂(푠) ℐ space with probability error err, applying as a black box we can solve P in ℓ rounds using ℐ 2 푂(푠ℓ ) bits of communication with probability error err.  Proof of Theorem 2.5.4: By Observation 2.5.9, any ℓ-round 푂(푠)-space algorithm that solves streaming Set Cover ( , ) optimally can be used to solve the communication Set Cover( , , 푝) 풰 ℱ 풰 ℱ problem in ℓ rounds using 푂(푠ℓ2) bits of communication. Moreover, by Corollary 2.5.8, we can decide the solution of the communication Intersection Set Chasing(푛, 푝) by solving its corresponding communication Set Cover problem. Note that while working with the corre- sponding Set Cover instance of Intersection Set Chasing(푛, 푝), all players know the collection ′ of elements and each player can construct its collection of sets 푖 using 푓푖 (or 푓 ). 풰 ℱ 푖 However, by a result of [90], we know that any protocol that solves the communication

Intersection Set Chasing(푛, 푝) problem with probability of error less than 1/10, requires 푛1+1/(2푝) bits of communication. Since in the corresponding instance of the Ω( 푝16 log3/2 푛 ) Set Cover communication Intersection Set Chasing(푛, 푝), = (2푝 + 1) 2푛 + 2푝 = 푂(푛푝) and |풰| × (2푝 + 1)푛 + 2푝푛 = 푂(푛푝), any (푝 1)-pass streaming algorithm that solves the |ℱ| ≤ −

60 problem optimally with a probability of error at most , requires 푛1+1/(2푝) Set Cover 1/10 Ω( 푝18 log3/2 푛 ) bits of communication. Then using Observation 2.5.9, since 훿 log log 푛 , any ( 1 1)-pass ≥ log 푛 2훿 − streaming algorithm of Set Cover that finds an optimal solution with error probability less than 1/10, requires Ω(̃︀ 훿) space. |ℱ| · |풰| 

2.6. Lower Bound for Sparse Set Cover in Multiple Passes

In this part we give a stronger lower bound for the instances of the streaming Set Cover problem with sparse input sets. An instance of the Set Cover problem is 푠-Sparse Set Cover, if for each set 푆 we have 푆 푠. We can us the same reduction approach described ∈ ℱ | | ≤ earlier in Section 2.5 to show that any (1/2훿 1)-pass streaming algorithm for 푠-Sparse Set − Cover requires Ω( 푠) memory space if 푠 < 훿 and = 푂( ). To prove this, we need |ℱ| |풰| ℱ 풰 to explain more details of the approach of [90] on the lower bound of the communication

Intersection Set Chasing problem. They first obtained a lower bound for Equal Pointer Chasing(푛, 푝) problem in which two instances of the communication Pointer Chasing(푛, 푝) are given and the goal is to decide whether these two instances point to a same value or not;

′ ′ 푓푝( 푓1(1) ) = 푓 ( 푓 (1) ). ··· ··· 푝 ··· 1 ··· Definition 2.6.1푟 ( -non-injective functions). A function 푓 :[푛] [푛] is called 푟-non- → injective if there exists 퐴 [푛] of size at least 푟 and 푏 [푛] such that for all 푎 퐴, ⊆ ∈ ∈ 푓(푎) = 푏.

Definition 2.6.2Pointer ( Chasing Problem). Pointer Chasing(푛, 푝) is a (푝, 푝 1) com- − munication problem in which the player 푖 has a function 푓푖 :[푛] [푛] and the goal is to → compute 푓1(푓2( 푓푝(1) )). ··· ··· Definition 2.6.3Equal ( Limited Pointer Chasing Problem). Equal Pointer Chasing(푛, 푝) is a (2푝, 푝 1) communication problem in which the first 푝 players have an instance of the − Pointer Chasing(푛, 푝) problem and the other 푝 players have another instance of the Pointer Chasing(푛, 푝) problem. The output of the Equal Pointer Chasing(푛, 푝) is 1 if the solutions of the two instances of Pointer Chasing(푛, 푝) have the same value and 0 otherwise. Further- more in another variant of pointer chasing problem, Equal Limited Pointer Chasing(푛, 푝, 푟),

if there exists 푟-non-injective function 푓푖, then the output is 1. Otherwise, the output is the

61 same as the value in Equal Pointer Chasing(푛, 푝).

For a boolean communication problem P, OR푡(P) is defined to be OR of 푡 instances of P

and the output of OR푡(P) is true if and only if the output of any of the 푡 instances is true.

Using a direct sum argument, [90] showed that the communication complexity of OR푡(Equal Limited Pointer Chasing(푛, 푝, 푟)) is 푡 times the communication complexity of Equal Limited Pointer Chasing(푛, 푝, 푟).

Lemma 2.6.4 ([90]). Let 푛, 푝, 푡 and 푟 be positive integers such that 푛 5푝, 푡 푛 and ≥ ≤ 4 푟 = 푂(log 푛). Then the amount of bits of communication to solve OR푡( Equal Limited ) with error probability less than is 푡푛 2 . Pointer Chasing(푛, 푝, 푟) 1/3 Ω( 16 ) 푂(푝푡 ) 푝 log 푛 − Lemma 2.6.5 ([90]). Let 푛, 푝, 푡 and 푟 be positive integers such that 푡2푝푟푝−1 < 푛/10. Then if there is a protocol that solves Intersection Set Chasing(푛, 푝) with probability of error less

than 1/10 using 퐶 bits of communication, there is a protocol that solves OR푡( Equal Lim- ited Pointer Chasing(푛, 푝, 푟)) with probability of error at most 2/10 using 퐶 + 2푝 bits of communication.

훿 Consider an instance of OR푡(Equal Limited Pointer Chasing (푛, 푝, 푟)) in which 푡 푛 , 푟 = ≤ log(푛), 푝 = 1 1 where 1 = 표(log 푛). By Lemma 2.6.4, the required amount of bits of com- 2훿 − 훿 munication to solve the instance with constant success probability is Ω(̃︀ 푡푛). Then,applying Lemma 2.6.5, to solve the corresponding Intersection Set Chasing, Ω(̃︀ 푡푛) bits of communi- cation is required.

In the reduction from OR푡(Equal Limited Pointer Chasing(푛, 푝, 푟)) to Intersection Set Chasing(푛, 푝) (proof of Lemma 2.6.5), the 푟-non-injective property is preserved. In other

words, in the corresponding Intersection Set Chasing instance each player’s functions 푓푖 : [푛] 4 [푛] 2 is union of 푡 푟-non-injective functions 푓푖(푎) := 푓푖,1(푎) 푓푖,푡(푎) . Given that → ∪ · · · ∪ none of the 푓푖,푗 functions is 푟-non-injective, the corresponding Set Cover instance will have sets of size at most 푟푡 (푆-type sets are of size at most 푡 for 1 푖 푝 and of size at most 푟푡 for ≤ ≤ 푝+1 푖 2푝). Since 푟 = 푂(log 푛), the corresponding Set Cover instance is 푂̃︀(푡)-sparse. As ≤ ≤ we showed earlier in the reduction from Intersection Set Chasing to Set Cover, the number

4The Intersection Set Chasing instance is obtained by overlaying the 푡 instances of Equal Pointer . To be more precise, the function of player in instance is −1 ( are Chasing(푛, 푝, 푟) 푖 푗 휋푖,푗 푓푖,푗 휋푖+1,푗 휋 randomly chosen permutation functions) and then stack the functions on top of each other.∘ ∘

62 of elements (and sets) in the corresponding Set Cover instance is 푂(푛푝). Thus we have the following result for 푠-Sparse Set Cover problem.

Theorem 2.6.6. For 푠 훿, any streaming algorithm that solves 푠-Sparse Set Cover( , ) ≤ |풰| 풰 ℱ optimally with probability of error less than 1/10 in ( 1 1) passes requires Ω(̃︀ 푠) memory 2훿 − |ℱ| space for = 푂( ). ℱ 풰

63 64 Chapter 3

Streaming Fractional Set Cover

3.1. Introduction

The LP relaxation of Set Cover (called SetCover-LP) is a continuous relaxation of the prob- lem where each set 푆 can be selected “fractionally”, i.e., assigned a number 푥푆 from ∈ ℱ , such that for each element its “fractional coverage” ∑︀ is at least , and the [0, 1] 푒 푆:푒∈푆 푥푆 1 sum ∑︀ is minimized. 푆 푥푆 Despite the significant developments in streaming Set Cover [67, 42, 61, 97, 20, 29, 17], the results for the fractional variant of the problem are still unsatisfactory. To the best of our knowledge, it is not known whether there exists an efficient and accurate algorithm for this problem that uses only a logarithmic (or even a poly logarithmic) number of passes. This state of affairs is perhaps surprising, given the many recent developments on fastLP solvers [119, 174, 124, 12, 11, 167]. To the best of our knowledge, the only prior results on streaming Packing/Covering LPs were presented in [6], which studied the LP relaxation of

Maximum Matching.

3.1.1. Our Results

In this chapter, we present the first (1 + 휀)-approximation algorithm for the fractional Set Cover in the streaming model with constant number of passes. Our algorithm performs 푂( 1 ) 푝 passes over the data stream and uses 푂̃︀(푚푛 푝휀 + 푛) memory space to return a (1 + 휀) approximate solution of the LP relaxation of Set Cover for positive parameter 휀 1/2. ≤ 65 We emphasize that similarly to the previous work on variants of Set Cover in streaming

setting, our result also holds for the edge arrival stream in which the pair of (푆푖, 푒푗) (edges) are stored in the read-only repository and all elements of a set are not necessarily stored consecutively.

3.1.2. Other Related Work

Set Cover. The Set Cover problem was first studied in the streaming model in[155], which presented an 푂(log 푛)-approximation algorithm in 푂(log 푛) passes and using 푂̃︀(푛) space. This approximation factor and the number of passes can be improved to 푂(log 푛) by adapting the greedy algorithm thresholding idea presented in [56] . In the low space

regime (푂̃︀(푛) space), Emek and Rosen [67] designed a deterministic single pass algorithm that achieves an 푂(√푛)-approximation. This is provably the best guarantee that one can hope for in a single pass even considering randomized algorithms. Later Chakrabarti and

Wirth [42] generalized this result and provided a tight trade-off bounds for Set Cover in multiple passes. More precisely, they gave an 푂(푝푛1/(푝+1))-approximate algorithm in 푝- passes using 푂̃︀(푛) space and proved that this is the best possible approximation ratio up to a factor of poly(푝) in 푝 passes and 푂̃︀(푛) space. A different line of work started by Demaine etal.[61] focused on designing a “low” ap- proximation algorithm (between Θ(1) and Θ(log 푛)) in the smallest possible amount of space. In contrast to the results in the 푂̃︀(푛) space regime, [61] showed that randomness is neces- sary: any constant pass deterministic algorithm requires Ω(푚푛) space to achieve constant approximation guarantee. Further, they provided a 푂(4푝 log 푛)-approximation algorithm that makes 푂(4푝) passes and uses 푂̃︀(푚푛1/푝 + 푛). Later Har-Peled et al. [97] improved the 1 algorithm to a 2푝-pass 푂(푝 log 푛)-approximation with memory space 푂̃︀(푚푛1/푝 + 푛) . The result was further improved by Bateni et al. where they designed a 푝-pass algorithm that returns a (1 + 휀) log 푛-approximate solution using 푚푛Θ(1/푝) memory [29]. As for the lower bounds, Assadi et al. [20] presented a lower bound of Ω(푚푛/훼) memory for any single pass streaming algorithm that computes a 훼-approxime solution. For the prob- lem of estimating the size of an optimal solution they prove Ω(푚푛/훼2) memory lower bound. 1In streaming model, space complexity is of interest and one can assume exponentital computation power. In this case the algorithms of [61, 97] save a factor of log 푛 in the approximation ratio.

66 For both settings, they complement the results with matching tight upper bounds. Further- more, Assadi [17] proved a lower bound on the space complexity of streaming algorithms for

Set Cover with multiple passes which is tight up to polylog factors: any 훼-approximation algorithm for Set Cover requires Ω(푚푛1/훼) space, even if it is allowed polylog(푛) passes over the stream, and even if the sets are arriving in a random order in the stream. Fur- ther, [17] provided the matching upper bound: a (2훼 + 1)-pass algorithm that computes a -approximate solution in 푚푛1/훼 푛 memory (assuming exponential computational (훼 + 휀) 푂̃︀( 휀2 + 휀 ) resource).

Maximum Coverage. The first result on streaming Max 푘-Cover showed how to com- pute a (1/4)-approximate solution in one pass using 푂̃︀(푘푛) space [155]. It was improved by Badanidiyuru et al. [22] to a (1/2 휀)-approximation algorithm that requires 푂̃︀(푛/휀) − space. Moreover, their algorithm works for a more general problem of Submodular Max- imization with cardinality constraints. This result was later generalized for the problem of non-monotone submodular maximization under constraints beyond cardinality [46]. Re- cently, McGregor and Vu [131] and Bateni et al. [29] independently obtained single pass

(1 1/푒 휀)-approximation with 푂̃︀(푚/휀2) space. On the lower bound side, [131] showed − − a lower bound of Ω(̃︀ 푚) for constant pass algorithm whose approximation is better than (1 1/푒). Moreover, [17] proved that any streaming (1 휀)-approximation algorithm of − − Max 푘-Cover in polylog(푛) passes requires Ω(̃︀ 푚/휀2) space even on random order streams and the case 푘 = 푂(1). This bound is also complemented by the 푂̃︀(푚푘/휀2) and 푂̃︀(푚/휀3) algorithms of [29, 131]. For more detailed survey of the results on streaming Max 푘-Cover refer to [29, 131, 17, 107].

Covering/Packing LPs. The study of LPs in streaming model was first discussed in the

work of Ahn and Guha [6] where they used multiplicative weights update (MWU) based tech- niques to solve the LP relaxation of Maximum (Weighted) Matching problem. They used the fact that MWU returns a near optimal fractional solution with small size support: first they solve the fractional matching problem, then solve the actual matching only considering the edges in the support of the returned fractional solution.

67 Our algorithm is also based on the MWU method, which is one of the main key techniques in designing fast approximation algorithms for Covering and Packing LPs [150, 173, 76, 16].

We note that the MWU method has been previously studied in the context of streaming and distributed algorithms, leading to efficient algorithms for a wide range of graph optimization problems [6, 23,7]. For a related problem, covering integer LP (covering ILP), Assadi et al. [20] designed

a one pass streaming algorithm that estimates the optimal solution of min c⊤x A⊤x { | ≥ 푛 within a factor of using 푚푛 where denotes the b, x 0, 1 훼 푂̃︀( 2 푏max + 푚 + 푛 푏max) 푏max ∈ { } } 훼 · · largest entry of b. In this problem, they assume that columns of A, constriants, are given one by one in the stream. In a different regime, [65] studied approximating the feasibility LP in streaming model with additive approximation. Their algorithm performs two passes and is most efficient when the input is dense.

3.1.3. Our Techniques

Preprocessing. Let 푘 denote the value of the optimal solution. The algorithm starts by picking a uniform fractional vector (each entry of value 푘 ) which covers all frequently 푂( 푚 ) occurring elements (those appearing in 푚 sets), and updates the uncovered elements in Ω( 푘 ) one pass. This step considerably reduces the memory usage as the uncovered elements have now lower occurrence (roughly 푚 ). Note that we do not need to assume the knowledge of 푘 the correct value 푘: in parallel we try all powers of (1 + 휀), denoting our guess by ℓ.

Multiplicative Weight Update (MWU). To cover the remaining elements, we employ

the MWU framework and show how to implement it in the streaming setting. In each iteration of MWU, we have a probability distribution p corresponding to the constraints (elements) and we need to satisfy the average covering constraint. More precisely, we need ∑︀ an oracle that assigns values to 푥푆 for each set 푆 so that 푝푆푥푆 1 subject to x 1 ℓ, 푆 ≥ ‖ ‖ ≤ where 푝푆 is the sum of probabilities of the elements in the set 푆. Then, the algorithm needs to update p according to the amount each element has been covered by the oracle’s solution. The simple greedy realization of the oracle can be implemented in the streaming

68 setting efficiently by computing all 푝푆 while reading the stream in one pass, then choosing

the heaviest set (i.e., the set with largest 푝푆) and setting its 푥푆 to ℓ. This approach works, except that the number of rounds 푇 required by the MWU framework is large. In fact, 휑 log 푛 , where is the width parameter (the maximum amount an oracle solution 푇 = Ω( 휀2 ) 휑 may over-cover an element), which is Θ(ℓ) in this naïve realization. Next, we show how to decrease 푇 in two steps.

Step 1. A first hope would be that there is a more efficient implementation oftheoracle which gives a better width parameter. Nonetheless, no matter how the oracle is implemented, if all sets in contain a fixed element 푒, then the width is inevitably Ω(ℓ). This observation ℱ implies that we need to work with a different set system that has small width, but atthe same time, it has the same objective value as of the optimal solution. Consequently, we consider the extended set system where we replace with all subsets of the sets in . This ℱ ℱ extended system preserves the optimality, and under this system we may avoid over-covering elements and obtain 푇 = 푂(log 푛) (for constant 휀). In order to turn a solution in our set system into a solution in the extended set system with small width, we need to remove the repeated elements from the sets in the solution so that every covered element appears exactly once, and thereby getting constant width. However, as a side effect, this reduces the total weight of the solution (∑︀ ), and 푆∈sol 푝푆푥푆 thus the average covering constraint might not be satisfied anymore. In fact, we need to come up with a guarantee that, on one hand, is preserved under the pruning step, and on the other hand, implies that the solution has large enough total weight Therefore, to fulfill the average constraint under the pruning step, the oracle must instead solve the maximum coverage problem: given a budget, choose sets to cover the largest (frac- tional) amount of elements. We first show that this problem can be solved approximately via the MWU framework using the simple oracle that picks the heaviest set, but this MWU algorithm still requires 푇 passes over the data. To improve the number of passes, we perform element sampling and apply the MWU algorithm to find an approximate maximum cover- age of a small number of sampled elements, whose subproblem can be stored in memory. Fortunately, while the number of fractional solutions to maximum coverage is unbounded,

69 by exploiting the structure of the solutions returned by the MWU method, we can limit the number of plausible solutions of this oracle and approximately solve the average constraint, thereby reducing the space usage to for a log 푛 -pass algorithm. 푂̃︀(푚) 푂( 휀2 )

Step 2. To further reduce the number of required passes, we observe that the weights of the constraints change slowly. Thus, in a single pass, we can sample the elements for multiple rounds in advance, and then perform rejection (sub-)sampling to obtain an unbiased set of samples for each subsequent round. This will lead to a streaming algorithm with 푝 passes and 푚푛푂(1/푝) space.

Extension. We also extend our result to handle general covering LPs. More specifically, in the LP relaxation of Set Cover, maximize c⊤x subject to Ax b and x 0, A has ≥ ≥ entries from 0, 1 whereas entries of b and c are all ones. If the non-zero entries instead { } belong to a range [1, 푀], we increase the number of sampled elements by poly(푀) to handle discrepancies between coefficients, leading toa poly(푀)-multiplicative overhead in the space usage.

3.2. MWU Framework of the Streaming Algorithm for Fractional Set Cover

In this section, we present a basic streaming algorithm that computes a (1 + 휀)-approximate solution of the LP relaxation of Set Cover for any 휀 > 0 via the MWU framework. We will, in the next section, improve it into an efficient algorithm that achieves the claimed 푂(푝) passes and 푂̃︀(푚푛^{1/푝}) space complexity.

SetCover-LP ◁ Input: 풰, ℱ
    minimize    ∑_{푆∈ℱ} 푥푆
    subject to  ∑_{푆:푒∈푆} 푥푆 ≥ 1    ∀푒 ∈ 풰
                푥푆 ≥ 0              ∀푆 ∈ ℱ

Figure 3.2.1: LP relaxation of Set Cover.

Algorithm 4 FracSetCover returns a (1 + 휀)-approximate solution of SetCover-LP.
1: procedure FracSetCover(휀)
2:   ◁ finds a feasible (1 + 휀)-approximate solution in 푂(log 푛/휀) iterations
3:   for ℓ ∈ {(1 + 휀/3)^푖 | 0 ≤ 푖 ≤ log_{1+휀/3} 푛} in parallel do
4:     xℓ ← FeasibilityTest(ℓ, 휀/3)
5:     ◁ returns a solution of objective value at most (1 + 휀/3)ℓ when ℓ ≥ 푘.
6:   return xℓ* where ℓ* ← min{ℓ : xℓ is not INFEASIBLE}

Let 풰 and ℱ be the ground set of elements and the collection of sets, respectively, and recall that |풰| = 푛 and |ℱ| = 푚. Let x ∈ R^푚 be a vector indexed by the sets in ℱ, where 푥푆 denotes the value assigned to the set 푆. Our goal is to compute an approximate solution to the LP in Figure 3.2.1. Throughout the analysis we assume 휀 ≤ 1/2, and ignore the case where some element never appears in any set, as it is easy to detect in a single pass that no cover is valid. For ease of reading, we write 푂̃︀ and Θ̃︀ to hide polylog(푚, 푛, 1/휀) factors.

Outline of the algorithm. Let 푘 denote the optimal objective value, and 0 < 휀 ≤ 1/2 be a parameter. The outline of the algorithm is shown in FracSetCover (Algorithm 4). This algorithm makes calls to the subroutine FeasibilityTest that, given a parameter ℓ, with high probability either returns a solution of objective value at most (1 + 휀/3)ℓ, or detects that the optimal objective value exceeds ℓ. Consequently, we may search for the right value of ℓ by considering all values in {(1 + 휀/3)^푖 | 0 ≤ 푖 ≤ log_{1+휀/3} 푛}. As for some value of ℓ it holds that 푘 ≤ ℓ ≤ 푘(1 + 휀/3), we obtain a solution of size at most (1 + 휀/3)ℓ ≤ (1 + 휀/3)(1 + 휀/3)푘 ≤ (1 + 휀)푘, which gives an approximation factor of (1 + 휀). This whole process of searching for 푘 increases the space complexity of the algorithm by at most a multiplicative factor of log_{1+휀/3} 푛 ≈ (3/휀) log 푛. The FeasibilityTest subroutine employs the multiplicative weights update method

(MWU) which is described next.
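To make the search over ℓ concrete, here is a minimal Python sketch of the outer loop of Algorithm 4, under the assumption that a callback feasibility_test(ℓ, 휀) (a stand-in for the FeasibilityTest subroutine described below) returns a solution vector when ℓ ≥ 푘 and None when it reports INFEASIBLE; in the streaming algorithm the guesses are run in parallel over the same passes, whereas the sketch runs them sequentially.

import math

def frac_set_cover(feasibility_test, n, eps):
    """Sketch of Algorithm 4: try guesses l = (1 + eps/3)^i in increasing order.

    feasibility_test(l, eps') is assumed to return a solution of objective
    value at most (1 + eps') * l when l >= k, or None when infeasible.
    """
    base = 1 + eps / 3
    num_guesses = int(math.log(n, base)) + 1
    for i in range(num_guesses + 1):
        l = base ** i
        x = feasibility_test(l, eps / 3)
        if x is not None:          # smallest feasible guess found
            return x
    return None                    # no cover exists (some element uncovered)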

3.2.1. Preliminaries of the MWU method for solving covering LPs

In the following, we describe the MWU framework. The claims presented here are standard

results of the MWU method. For more details, see e.g. Section 3 of [16]. Note that we introduce the general LP notation as it simplifies the presentation later on.

Let Ax ≥ b be a set of linear constraints, and let 풫 ≜ {x ∈ R^푚 : x ≥ 0} be the polytope of the non-negative orthant. For a given error parameter 0 < 훽 < 1, we would like to solve an approximate version of the feasibility problem by doing one of the following:

∙ Compute x̂ ∈ 풫 such that A푖x̂ − 푏푖 ≥ −훽 for every constraint 푖.

∙ Correctly report that the system Ax ≥ b has no solution in 풫.

The MWU method solves this problem assuming the existence of the following oracle that takes a distribution p over the constraints and finds a solution x̂ that satisfies the constraints on average over p.

Definition 3.2.1. Let 휑 ≥ 1 be a width parameter and 0 < 훽 < 1 be an error parameter. A (1, 휑)-bounded (훽/3)-approximate oracle is an algorithm that takes as input a distribution p and does one of the following:

∙ Returns a solution x̂ ∈ 풫 satisfying
  – p⊤Ax̂ ≥ p⊤b − 훽/3, and
  – A푖x̂ − 푏푖 ∈ [−1, 휑] for every constraint 푖.

∙ Correctly reports that the inequality p⊤Ax ≥ p⊤b has no solution in 풫.

The MWU algorithm for solving covering LPs involves 푇 rounds. It maintains the (non-negative) weight of each constraint in Ax ≥ b, which measures how much it has been satisfied by the solutions chosen so far. Let w^푡 denote the weight vector at the beginning of round 푡, and initialize the weights to w¹ ≜ 1. Then, for rounds 푡 = 1, . . . , 푇, define the probability vector p^푡 proportional to those weights w^푡, and use the oracle above to find a solution x^푡. If the oracle reports that the system p⊤Ax ≥ p⊤b is infeasible, the MWU algorithm also reports that the original system Ax ≥ b is infeasible, and terminates. Otherwise, define the cost vector incurred by x^푡 as m^푡 ≜ (1/휑)(Ax^푡 − b), then update the weights so that 푤^{푡+1}_푖 ≜ 푤^푡_푖(1 − 훽푚^푡_푖/6) and proceed to the next round. Finally, the algorithm returns the average solution x = (1/푇) ∑_{푡=1}^{푇} x^푡.

The theorem below (e.g., Theorem 3.5 of [16]) shows that 푇 = 푂(휑 log 푛/훽²) is sufficient for the MWU algorithm to correctly solve the problem, yielding A푖x̂ − 푏푖 ≥ −훽 for every constraint, where 푛 is the number of constraints. In particular, the algorithm requires 푇 calls to the oracle.

Theorem 3.2.2 (MWU Theorem [16]). For every 0 < 훽 < 1 and 휑 ≥ 1, the MWU algorithm either solves the Feasibility-Covering-LP problem up to an additive error of 훽 (i.e., attains A푖x − 푏푖 ≥ −훽 for every 푖) or correctly reports that the LP is infeasible, making only 푂(휑 log 푛/훽²) calls to a (1, 휑)-bounded (훽/3)-approximate oracle of the LP.
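As a concrete illustration of this loop (not of the streaming implementation developed below), the following Python sketch runs the generic MWU iteration against a user-supplied oracle callback; the names and the dense-matrix representation are assumptions made only for the sake of the example.

import numpy as np

def mwu_covering(A, b, oracle, phi, beta, T):
    """Generic MWU loop for the approximate covering feasibility problem.

    oracle(p) is assumed to return a solution x with p @ (A @ x) >= p @ b - beta/3
    and A @ x - b entrywise in [-1, phi], or None if no such x exists.
    Returns the average solution, or None if the LP is reported infeasible.
    """
    w = np.ones(A.shape[0])               # one weight per covering constraint
    solutions = []
    for _ in range(T):
        p = w / w.sum()                   # probability vector over constraints
        x = oracle(p)
        if x is None:                     # average constraint is infeasible
            return None
        m = (A @ x - b) / phi             # cost vector incurred by x
        w = w * (1 - beta * m / 6)        # under-covered constraints gain weight
        solutions.append(x)
    return np.mean(solutions, axis=0)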

3.2.2. Semi-streaming MWU-based algorithm for fractional Set Cover

Setting up our MWU algorithm. As described in the overview, we wish to solve, as a subroutine, the decision variant of SetCover-LP known as Feasibility-SC-LP given in Figure 3.2.2a, where the parameter ℓ serves as the guess for the optimal objective value.

Feasibility-SC-LP ◁ Input: 풰, ℱ, ℓ
    ∑_{푆∈ℱ} 푥푆 ≤ ℓ
    ∑_{푆:푒∈푆} 푥푆 ≥ 1    ∀푒 ∈ 풰
    푥푆 ≥ 0              ∀푆 ∈ ℱ
(a) LP relaxation of Feasibility Set Cover.

Feasibility-Covering-LP ◁ Input: A, b, c, ℓ
    c⊤x ≤ ℓ    (objective value)
    Ax ≥ b     (covering)
    x ≥ 0      (non-negativity)
(b) LP relaxation of Feasibility Covering.

Figure 3.2.2: LP relaxations of the feasibility variant of set cover and general covering problems.

To follow the conventional notation for solving LPs in the MWU framework, consider

the more standard form of covering LPs denoted as Feasibility-Covering-LP given in Figure 3.2.2b. For our purpose, A_{푛×푚} is the element-set incidence matrix indexed by 풰 × ℱ; that is, 퐴_{푒,푆} = 1 if 푒 ∈ 푆, and 퐴_{푒,푆} = 0 otherwise. The vectors b and c are both all-ones vectors indexed by 풰 and ℱ, respectively. We emphasize that, unconventionally for our system Ax ≥ b, there are 푛 constraints (i.e. elements) and 푚 variables (i.e. sets). Employing the MWU approach for solving covering LPs, we define the polytope

    풫ℓ ≜ {x ∈ R^푚 : c⊤x ≤ ℓ and x ≥ 0}.

Observe that by applying the MWU algorithm to this polytope 풫ℓ and the constraints Ax ≥ b, we obtain a solution x ∈ 풫ℓ such that A_푒(x/(1 − 훽)) ≥ (푏푒 − 훽)/(1 − 훽) = 1 = 푏푒, where A_푒 denotes the row of A corresponding to 푒. This yields a (1 + 푂(휀))-approximate solution for 훽 = 푂(휀). Unfortunately, we cannot implement the MWU algorithm on the full input in our streaming context. Therefore, the main challenge is to implement the following two subtasks of the MWU algorithm in the streaming setting. First, we need to design an oracle that solves the average constraint in the streaming setting. Moreover, we need to be able to efficiently update the weights for the subsequent rounds.

Covering the common elements. Before we proceed to applying the MWU framework, we add a simple first step to our implementation of FeasibilityTest (Algorithm 5) that will greatly reduce the amount of space required in implementing the MWU algorithm. This can be interpreted as the fractional version of Set Sampling described in [61]. In our subroutine, we partition the elements into the common elements that occur more frequently, which will be covered if we simply choose a uniform vector solution, and the rare elements

that occur less frequently, for which we perform the MWU algorithm to compute a good solution. In one pass we can find all frequently occurring elements by counting the number of sets containing each element. The amount of required space to perform this task is

푂(푛 log 푚). We call an element that appears in at least 푚/(훼ℓ) sets common, and we call it rare otherwise, where 훼 = Θ(휀). Since we are aiming for a (1 + 휀)-approximation, we can define x^cmn as a vector all of whose entries are 훼ℓ/푚. The total cost of x^cmn is 훼ℓ and all common elements are covered by x^cmn. Thus, throughout the algorithm we may restrict our attention to the rare elements.
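A minimal Python sketch of this preprocessing step, assuming the stream is presented as (set id, element list) pairs; the helper names are illustrative only.

from collections import Counter

def split_common_rare(stream, m, ell, alpha):
    """One pass: count element occurrences, then classify elements.

    An element is 'common' if it occurs in at least m / (alpha * ell) sets;
    such elements are covered by the uniform solution whose entries all equal
    alpha * ell / m, so only the rare elements are left for the MWU phase.
    """
    freq = Counter()
    for _, elements in stream:            # single pass over the set stream
        for e in elements:
            freq[e] += 1
    threshold = m / (alpha * ell)
    common = {e for e, c in freq.items() if c >= threshold}
    rare = {e for e in freq if e not in common}
    x_cmn_entry = alpha * ell / m         # weight placed on every set
    return common, rare, x_cmn_entry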

Our goal now is to construct an efficient MWU-based algorithm, which finds a solution x^rare covering the rare elements, with objective value at most ℓ/(1 − 훽) ≤ (1 + 휀 − 훼)ℓ. We note that our implementation does not explicitly maintain the weight vector w^푡 described in Section 3.2.1, but instead updates (and normalizes) its probability vector p^푡 in every round.

3.2.3. First Attempt: Simple Oracle and Large Width

A greedy solution for the oracle. We implement the oracle for the MWU algorithm such that 휑 = ℓ, thus requiring Θ(ℓ log 푛/훽²) iterations (Theorem 3.2.2). In each iteration, we need an oracle that finds some solution x ∈ 풫ℓ satisfying p⊤Ax ≥ p⊤b − 훽/3, or decides that no solution in 풫ℓ satisfies p⊤Ax ≥ p⊤b.

Algorithm 5 A generic implementation of FeasibilityTest. Its performance depends on the implementations of Oracle and UpdateProb. We will investigate different implementations of Oracle.
1: procedure FeasibilityTest(ℓ, 휀)
2:   훼, 훽 ← 휀/3, p^curr ← 1_{푛×1}  ◁ the initial prob. vector for the MWU algorithm on 풰
3:   ◁ compute a cover of common elements in one pass
4:   x^cmn ← (훼ℓ/푚) · 1_{푚×1}, freq ← 0_{푛×1}
5:   for all sets 푆 in the stream do
6:     for all elements 푒 ∈ 푆 do
7:       freq_푒 ← freq_푒 + 1
8:   if 푒 appears in more than 푚/(훼ℓ) sets (i.e. freq_푒 > 푚/(훼ℓ)) then  ◁ common element
9:     푝^curr_푒 ← 0
10:  p^curr ← p^curr/‖p^curr‖  ◁ p^curr represents the current prob. vector
11:  x^total ← 0_{푚×1}
12:  ◁ multiplicative weight update algorithm for covering rare elements
13:  for 푖 = 1 to 푇 do
14:    ◁ solve the corresp. oracle of MWU and decide if the solution is feasible
15:    try x ← Oracle(p^curr, ℓ, ℱ)
16:    x^total ← x^total + x  ◁ in one pass, update p according to x
17:    z ← 0_{푛×1}
18:    for all sets 푆 in the stream do
19:      for all elements 푒 ∈ 푆 do
20:        푧푒 ← 푧푒 + 푥푆
21:    if (p^curr)⊤z < 1 − 훽/3 then  ◁ detect infeasible solutions returned by Oracle
22:      report INFEASIBLE
23:    p^curr ← UpdateProb(p^curr, z)
24:  x^rare ← x^total/((1 − 훽)푇)  ◁ scale up the solution to cover rare elements
25:  return x^cmn + x^rare

Observe that p⊤Ax is maximized when we place value ℓ on 푥_{푆*}, where 푆* achieves the maximum value 푝푆 ≜ ∑_{푒∈푆} 푝푒. Further, for our application, b = 1, so p⊤b = 1. Our implementation HeavySetOracle of Oracle, given in Algorithm 6 below, is a deterministic greedy algorithm that finds a solution based on this observation. As A_푒x ≤ ‖x‖₁ ≤ ℓ, HeavySetOracle implements a (1, ℓ)-bounded (훽/3)-approximate oracle. Therefore, the implementation of FeasibilityTest with HeavySetOracle computes a solution of objective value at most (훼 + 1/(1 − 훽))ℓ < (1 + 휀)ℓ when ℓ ≥ 푘, as promised.

Finally, we track the space usage, which concludes the complexities of the current version

of our algorithm: it only stores vectors of length 푚 or 푛, whose entries each require a logarithmic number of bits, yielding the following theorem.

Algorithm 6 HeavySetOracle computes 푝푆 of every set given the set system in a stream or stored memory, then returns the solution x that optimally places value ℓ on the corresponding entry. It reports INFEASIBLE if there is no sufficiently good solution.
1: procedure HeavySetOracle(p, ℓ, ℱ)
2:   compute 푝푆 for all 푆 ∈ ℱ while reading the set system
3:   푆* ← argmax_{푆∈ℱ} 푝푆
4:   if 푝_{푆*} < (1 − 훽/3)/ℓ then
5:     report INFEASIBLE

6:   x ← 0_{푚×1}, 푥_{푆*} ← ℓ
7:   return x

Theorem 3.2.3. There exists a streaming algorithm that w.h.p. returns a (1 + 휀)-approximate fractional solution of SetCover-LP(풰, ℱ) in 푂(푘 log 푛/휀²) passes and using 푂̃︀(푚 + 푛) memory for any positive 휀 ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.

The presented algorithm suffers from a large number of passes over the input. In particular,

we are interested in solving the fractional Set Cover in a constant number of passes using sublinear space. To this end, we first reduce the required number of rounds in MWU by a more complicated implementation of Oracle.
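For illustration, a single-pass Python sketch of the greedy oracle of Algorithm 6, with the same (set id, element list) stream representation as before and None standing in for INFEASIBLE:

def heavy_set_oracle(stream, p, ell, beta):
    """Greedy (1, ell)-bounded oracle: place the whole budget ell on the set S*
    maximizing p_S = sum of p_e over e in S.  Returns {set_id: value}, or None
    (INFEASIBLE) if even the heaviest set has p_S < (1 - beta/3) / ell.
    """
    best_set, best_weight = None, -1.0
    for set_id, elements in stream:       # compute p_S while reading the stream
        p_S = sum(p.get(e, 0.0) for e in elements)
        if p_S > best_weight:
            best_set, best_weight = set_id, p_S
    if best_weight < (1 - beta / 3) / ell:
        return None                       # no sufficiently good solution exists
    return {best_set: ell}                # x is zero on all other sets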

3.3. Max Cover Problem and its Application to Width Reduction

In this section, we improve the algorithm described in the previous section and prove the following result.

Theorem 3.3.1. There exists a streaming algorithm that w.h.p. returns a (1 + 휀)-approximate fractional solution of SetCover-LP(풰, ℱ) in 푝 passes and uses 푂̃︀(푚푛^{푂(1/(푝휀))} + 푛) memory for any 2 ≤ 푝 ≤ polylog(푛) and 0 < 휀 ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.

Recall that in implementing Oracle, we must find a solution x of total size ‖x‖₁ ≤ ℓ with a sufficiently large weight p⊤Ax. Our previous implementation chooses only one good

entry 푥푆 and places its entire budget ℓ on this entry. As the width of the solution is roughly the maximum amount an element is over-covered by x, this implementation induces a width of ℓ. In this section, we design an oracle that returns a solution in which the budget is distributed more evenly among the entries of x to reduce the width. To this end, we design

an implementation of Oracle of the MWU approach based on the Max ℓ-Cover problem (whose precise definition will be given shortly). The solution to our Max ℓ-Cover aids in reducing the width of our Oracle solution to a constant, so the required number of rounds of the MWU algorithm decreases to 푂(log 푛/휀²), independent of ℓ. Note that, if the objective value of an optimal solution of Set Cover(풰, ℱ) is ℓ, then a solution of width 표(ℓ) may not exist, as shown in Lemma 3.3.2. This observation implies that we need to work with a different set system. Besides having small width, an optimal solution of the Set Cover instance on the new set system should have the same objective value as the optimal solution of Set Cover(풰, ℱ).

Lemma 3.3.2. There exists a set system in which the direct application of the MWU framework in computing a (1 + 휀)-approximate solution induces width 휑 = Ω(푘), where 푘 is the optimal objective value. Moreover, there exists a set system in which the approach from the previous section (which handles the frequent and rare elements differently) has width 휑 = Θ(푛) = Θ(√(푚/휀)).

Proof: For the first claim, we consider an arbitrary set system, then modify it by adding a common element 푒 to all sets. Recall that the MWU framework returns an average of the solutions from all rounds. Thus there must exist a round where the oracle returns a solution x of size ‖x‖₁ = Θ(푘). For the added element 푒, this solution has ∑_{푆:푒∈푆} 푥푆 = ∑_{푆∈ℱ} 푥푆 = Θ(푘), inducing width 휑 = Ω(푘).

For the second claim, consider the following set system with 푘 = √(푚/휀) and 푛 = 2푘 + 1. For 푖 = 1, . . . , 푘, let 푆푖 = {푒푖, 푒_{푘+푖}, 푒_{2푘+1}}, whereas the remaining 푚 − 푘 sets are arbitrary subsets of {푒1, . . . , 푒푘}. Observe that 푒_{푘+푖} is contained only in 푆푖, so 푥_{푆푖} = 1 in any valid set cover. Consequently, the solution x with 푥_{푆1} = ··· = 푥_{푆푘} = 1 and all other entries 0 forms the unique (fractional) minimum set cover of size 푘 = √(푚/휀). Next, recall that an element is considered rarely occurring if it appears in at most 푚/(훼ℓ) > 푚/(휀푘) sets. As 푒_{푘+1}, . . . , 푒_{2푘} each occur only once, and 푒_{2푘+1} appears in only 푘 = √(푚/휀) = 푚/(휀푘) sets, these 푘 + 1 elements are deemed rare and thus handled by the MWU framework. The solution computed by the MWU framework satisfies ∑_{푆:푒∈푆} 푥푆 ≥ 1 − 훽 for every 푒, and in particular, for each 푒 ∈ {푒_{푘+1}, . . . , 푒_{2푘}}. Therefore, the average solution places a total weight of at least (1 − 훽) · Θ(푘) on 푥_{푆1}, . . . , 푥_{푆푘}, so there must exist a round that places at least the same total weight on these sets. However, these 푘 sets all contain 푒_{2푘+1}, yielding ∑_{푆:푒_{2푘+1}∈푆} 푥푆 ≥ (1 − 훽) · Θ(푘) = Ω(푘), implying a width of Ω(푘) = Ω(√(푚/휀)). □

Extended set system. First, we consider the extended set system (풰, ℱ̆), where ℱ̆ is the collection containing all subsets of sets in ℱ; that is,

    ℱ̆ ≜ {푅 : 푅 ⊆ 푆 for some 푆 ∈ ℱ}.

It is straightforward to see that the optimal objective value of Set Cover over (풰, ℱ̆) is equal to that of (풰, ℱ): we only add subsets of the original sets to create ℱ̆, and we may replace any subset from ℱ̆ in our solution with its original set in ℱ. Moreover, we may prune any collection of sets from ℱ into a collection from ℱ̆ of the same cardinality so that this pruned collection not only covers the same elements, but also each of these elements is covered exactly once. This extended set system is defined for the sake of analysis only: we will never explicitly handle an exponential number of sets throughout our algorithm.

We define an ℓ-cover as a collection of sets of total weight ℓ. Although the pruning of an ℓ-cover reduces the width, the total weight p⊤Ax of the solution will decrease. Thus, we consider the weighted constraint of the form

    ∑_{푒∈풰} 푝푒 · min{1, ∑_{푆:푒∈푆} 푥푆} ≥ 1;

that is, we can only gain the value 푝푒 once, without any multiplicity larger than 1. The problem of maximizing the left-hand side is known as the weighted max coverage problem: for a parameter ℓ, find an ℓ-cover such that the total value of the 푝푒's of the covered elements is maximized.
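For a given fractional solution the truncated objective is easy to evaluate directly; the following small Python helper (with an assumed dictionary representation of the set system) does exactly that.

def weighted_coverage(x, p, sets_of):
    """Evaluate sum_e p_e * min(1, sum_{S: e in S} x_S).

    x       : dict mapping set id -> fractional value x_S
    p       : dict mapping element -> weight p_e
    sets_of : dict mapping element -> iterable of set ids containing it
    """
    total = 0.0
    for e, p_e in p.items():
        covered = sum(x.get(S, 0.0) for S in sets_of.get(e, ()))
        total += p_e * min(1.0, covered)   # each element counts at most once
    return total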

3.3.1. The Maximum Coverage Problem

In the design of our algorithm, we consider the weighted Max 푘-Cover problem, which is closely related to Set Cover. Extending upon the brief description given earlier, we fully specify the LP relaxation of this problem. In the weighted Max 푘-Cover(풰, ℱ, ℓ, p), given a ground set of elements 풰, a collection of sets ℱ over the ground set, a budget parameter ℓ, and a weight vector p, the goal is to return ℓ sets in ℱ whose weighted coverage, the total weight of all covered elements, is maximized.

MaxCover-LP ◁ Input: 풰, ℱ, ℓ, p
    maximize    ∑_{푒∈풰} 푝푒푧푒
    subject to  ∑_{푆:푒∈푆} 푥푆 ≥ 푧푒    ∀푒 ∈ 풰
                ∑_{푆∈ℱ} 푥푆 = ℓ
                0 ≤ 푧푒 ≤ 1           ∀푒 ∈ 풰
                푥푆 ≥ 0               ∀푆 ∈ ℱ

Figure 3.3.1: LP relaxation of weighted Max 푘-Cover.

Moreover, since we are aiming for a

fractional solution of Set Cover, we consider the LP relaxation of weighted Max 푘-Cover,

MaxCover-LP (see Figure 3.3.1); in this LP relaxation, 푧푒 denotes the fractional amount that an element is covered, and hence is capped at 1. As an intermediate goal, we aim to compute an approximate solution of MaxCover-LP, given that the optimal solution covers all elements in the ground set, or to correctly detect that no solution has weighted coverage of more than (1 − 휀). In our application, the vector p is always a probability vector: p ≥ 0 and ∑_{푒∈풰} 푝푒 = 1. We make the following useful observation.

Observation 3.3.3. Let 푘 be the value of an optimal solution of SetCover-LP(풰, ℱ) and let p be an arbitrary probability vector over the ground set. Then there exists a fractional solution of MaxCover-LP(풰, ℱ, ℓ, p) whose weighted coverage is one if ℓ ≥ 푘.

훿-integral near-optimal solutions of MaxCover-LP. Our plan is to solve MaxCover-LP over a randomly projected set system, and argue that with high probability this will result in a valid Oracle. Such an argument requires an application of the union bound over the set of solutions, which is generally of unbounded size. To this end, we consider a more restrictive domain of 훿-integral solutions: this domain has bounded size, but is still guaranteed to contain a sufficiently good solution.

Definition 3.3.4 (훿-integral solution). A fractional solution x_{푛×1} of an LP is 훿-integral if (1/훿) · x is an integral vector. That is, for each 푖 ∈ [푛], 푥푖 = 푣푖훿 where each 푣푖 is an integer.

Algorithm 7 MaxCoverOracle returns a fractional ℓ-cover with weighted coverage at least 1 − 훽/3 w.h.p. if ℓ ≥ 푘. It provides no guarantee on its behavior if ℓ < 푘.
1: procedure MaxCoverOracle((풰, ℱ), ℓ)
2:   x ← solution returned by MWU using HeavySetOracle on MaxCover-LP
3:   return x

Next we claim that MaxCoverOracle, given in Algorithm 7 above, which is the MWU algorithm with HeavySetOracle for solving MaxCover-LP, results in a 훿-integral solution.

Lemma 3.3.5. Consider a MaxCover-LP with the optimal objective value OPT (where the weights of elements form a probability vector). There exists a Θ(휀²MC/log 푛)-integral solution of MaxCover-LP whose objective value is at least (1 − 휀MC)OPT. In particular, if an optimal solution covers all elements of 풰 (ℓ ≥ 푘), MaxCoverOracle returns a solution whose weighted coverage is at least 1 − 휀MC in polynomial time.

Proof: Let (x*, z*) denote the optimal solution of value OPT to MaxCover-LP, which implies that ‖x*‖₁ ≤ ℓ and Ax* ≥ z*. Consider the following covering LP: minimize ‖x‖₁ subject to Ax ≥ z* and x ≥ 0. Clearly there exists an optimal solution of objective value ℓ, namely x*. This covering LP may be solved via the MWU framework. In particular, we may use the oracle that picks one set 푆 with maximum weight (as maintained in the MWU framework) and places its entire budget on 푥푆. For an accurate guess ℓ′ = Θ(ℓ) of the optimal value, this algorithm returns an average x of 푇 = Θ(ℓ′ log 푛/휀²MC) = Θ(ℓ log 푛/휀²MC) oracle solutions. Observe that the outputted solution is of the form 푥푆 = 푣푆 ℓ′/푇 = 푣푆훿, where 푣푆 is the number of rounds in which 푥푆 is chosen by the oracle, and 훿 = ℓ′/푇 = 휀²MC ℓ′/(ℓ log 푛) = Θ(휀²MC/log 푛). In other words, x is Θ(휀²MC/log 푛)-integral. By Theorem 3.2.2, x satisfies Ax ≥ (1 − 휀MC)z*. Then in MaxCover-LP, the solution (x, (1 − 휀MC)z*) yields coverage at least p⊤((1 − 휀MC)z*) = (1 − 휀MC)p⊤z* = (1 − 휀MC)OPT. □

Pruning a fractional ℓ-cover. In our analysis, we aim to solve the Set Cover problem under the extended set system. We claim that any solution x with coverage z in the actual set system may be turned into a pruned solution x̆ in the extended set system that provides the same coverage z, but satisfies the strict equality ∑_{푆̆∈ℱ̆:푒∈푆̆} 푥̆_푆̆ = 푧푒. Since 푧푒 ≤ 1, the pruned solution satisfies the condition for an oracle with width one.

Algorithm 8 The Prune subroutine lifts a solution in ℱ to a solution in ℱ̆ with the same MaxCover-LP objective value and width 1. The subroutine returns z, the amount by which members of ℱ̆ cover each element. The actual pruned solution x̆ may be computed but has no further use in our algorithm and thus is not returned.
1: procedure Prune(x)
2:   x̆ ← 0_{|ℱ̆|×1}, z ← 0_{푛×1}  ◁ maintain the pruned solution and its coverage amount
3:   for all 푆 ∈ ℱ do
4:     푆̆ ← 푆
5:     while 푥푆 > 0 do
6:       푟 ← min{푥푆, min_{푒∈푆̆}(1 − 푧푒)}  ◁ weight to be moved from 푥푆 to 푥̆_푆̆
7:       푥푆 ← 푥푆 − 푟, 푥̆_푆̆ ← 푥̆_푆̆ + 푟  ◁ move weight to the pruned solution
8:       for all 푒 ∈ 푆̆ do
9:         푧푒 ← 푧푒 + 푟  ◁ update coverage accordingly
10:      푆̆ ← 푆̆ ∖ {푒 ∈ 푆̆ : 푧푒 = 1}  ◁ remove 푒 with 푧푒 = 1 from 푆̆
11:  return z

Lemma 3.3.6. A fractional ℓ-cover x of (풰, ℱ) can be converted, in polynomial time, to a fractional ℓ-cover x̆ of (풰, ℱ̆) such that for each element 푒, its coverage 푧푒 = ∑_{푆̆∈ℱ̆:푒∈푆̆} 푥̆_푆̆ = min(∑_{푆:푒∈푆} 푥푆, 1).

Proof: Consider the algorithm Prune in Algorithm 8. As we pick a valid amount 푟 ≤ 푥푆 to move from 푥푆 to 푥̆_푆̆ at each step, x̆ must be an ℓ-cover (in the extended set system) when Prune finishes. Observe that if ∑_{푆:푒∈푆} 푥푆 < 1 then 푒 will never be removed from any 푆̆, so 푧푒 is increased by 푥푆 for every 푆, and thus 푧푒 = ∑_{푆:푒∈푆} 푥푆. Otherwise, the condition 푟 ≤ 1 − 푧푒 ensures that 푧푒 stops increasing precisely when it reaches 1. Each 푆 takes up to 푛 + 1 rounds in the while loop, as one element 푒 ∈ 푆̆ is removed at the end of each round. There are at most 푚 sets, so the algorithm must terminate (in polynomial time). We note that in Section 3.3.4, we need to adjust Prune to instead achieve the condition 푧푒 = min(A_푒x, 1) where entries of A are arbitrary non-negative values. We simply make the following modifications: choose 푟 ← min(푥푆, min_{푒∈푆̆} (1 − 푧푒)/퐴_{푒,푆}) and update 푧푒 ← 푧푒 + 푟 · 퐴_{푒,푆}, and the same proof follows. □

Remark that to update the weights in the MWU framework, it is sufficient to have the coverages ∑_{푆̆∈ℱ̆:푒∈푆̆} 푥̆_푆̆, which are the 푧푒's returned by Prune; the actual solution x̆ is not necessary. Observe further that our MWU algorithm can still use x instead of x̆ as its solution because x has no worse coverage than x̆ in every iteration, and so does the final, average solution. Lastly, notice that the coverage z returned by Prune has the simple formula 푧푒 = min(∑_{푆:푒∈푆} 푥푆, 1). That is, we introduce Prune to show the existence of x̆, but we never actually run Prune in our algorithm.
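For concreteness, a direct Python transcription of Prune, under an assumed dictionary representation of x and of the sets; as remarked above it is an analysis device only, and it returns just the coverage vector z.

def prune(x, elements_of):
    """Lift a fractional ell-cover x over F to one over the extended system.

    x           : dict set id -> x_S
    elements_of : dict set id -> list of elements of S
    Returns z, the coverage of each element by the pruned solution.
    """
    z = {}
    for S, val in x.items():
        remaining = val
        S_cur = [e for e in elements_of[S] if z.get(e, 0.0) < 1.0]
        while remaining > 1e-12 and S_cur:
            # largest weight movable without over-covering any element of S_cur
            r = min([remaining] + [1.0 - z.get(e, 0.0) for e in S_cur])
            remaining -= r
            for e in S_cur:
                z[e] = z.get(e, 0.0) + r
            S_cur = [e for e in S_cur if z[e] < 1.0]   # drop fully covered elements
    return z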

3.3.2. Sampling-Based Oracle for Fractional Max Coverage

In the previous section, we simply needed to compute the values 푝푆 in order to construct a solution for the Oracle. Here, as we aim to bound the width of Oracle, our new task is to find a fractional ℓ-cover x whose weighted coverage is at least 1 − 훽/3. The element sampling technique, which is also known from prior work in streaming Set Cover and Max 푘-Cover, is to sample a few elements and solve the problem over the sampled elements only. Then, by applying the union bound over all possible candidate solutions, it is shown that w.h.p. a nearly optimal cover of the sampled elements also covers a large fraction of the whole ground set. This argument applies to the aforementioned problems precisely because there are standard ways of bounding the number of all integral candidate solutions (e.g. ℓ-covers). However, in the fractional setting, there are infinitely many solutions. Consequently, we employ the notion of 훿-integral solutions, where the number of such solutions is bounded. In Lemma 3.3.5, we showed that there always exists a 훿-integral solution to MaxCover-LP whose coverage is at least a (1 − 휀MC)-fraction of an optimal solution. Moreover, the number of all possible solutions is bounded by the number of ways to divide the budget ℓ into ℓ/훿 equal parts of value 훿 and distribute them (possibly with repetition) among 푚 entries:

Observation 3.3.7. The number of feasible 훿-integral solutions to MaxCover-LP(풰, ℱ, ℓ, p) is 푂(푚^{ℓ/훿}) for any multiple ℓ of 훿.

Next, we design our algorithm using the element sampling technique: we show that a (1 − 훽/3)-approximate solution of MaxCover-LP can be computed using the projection of all sets in ℱ over a set of elements of size Θ(ℓ log 푛 log 푚푛/훽⁴) picked according to p. For every fractional solution (x, z) and subset of elements 풱 ⊆ 풰, let 풞_풱(x) ≜ ∑_{푒∈풱} 푝푒푧푒 denote the coverage of elements in 풱, where 푧푒 = min(1, ∑_{푆:푒∈푆} 푥푆). We may omit the subscript 풱 in 풞_풱 if 풱 = 풰.

The following lemma, which is essentially an extension of the Element Sampling lemma of [61] for our application, MaxCover-LP, shows that a (1 − 휀MC)-approximate ℓ-cover over a set of sampled elements of size Θ(ℓ log 푛 log 푚푛/훾⁴) w.h.p. has a weighted coverage of at least (1 − 2훾)(1 − 휀MC) if there exists a fractional ℓ-cover whose coverage is 1. Thus, choosing 휀MC = 훾 = 훽/9 yields the desired guarantee for MaxCoverOracle, leading to the performance given in Theorem 3.3.9.

Lemma 3.3.8. Let 휀MC and 훾 be parameters. Consider the MaxCover-LP(풰, ℱ, ℓ, p) with optimal solution of value OPT, and let ℒ be a multi-set of 푠 = Θ(ℓ log 푛 log(푚푛)/훾⁴) elements sampled independently at random according to the probability vector p. Let x^sol be a (1 − 휀MC)-approximate Θ(훾²/log 푛)-integral ℓ-cover over the sampled elements. Then with high probability, 풞(x^sol) ≥ (1 − 2훾)(1 − 휀MC)OPT.

Proof: Consider the MaxCover-LP(풰, ℱ, ℓ, p) with optimal solution (x^OPT, z^OPT) of value OPT, and let x^sol be a (1 − 휀MC)-approximate Θ(훾²/log 푛)-integral ℓ-cover over the sampled elements and z^sol be its corresponding coverage vector. Denote the sampled elements by ℒ = {푒̂₁, . . . , 푒̂푠}. Observe that by defining each X푖 as a random variable that takes the value 푧^OPT_{푒̂푖} with probability 푝_{푒̂푖} and 0 otherwise, the expected value of X = ∑_{푖=1}^{푠} X푖 is

    E[X] = ∑_{푖=1}^{푠} E[X푖] = 푠 · ∑_{푒∈풰} 푝푒 푧^OPT_푒 = 푠 · 풞(x^OPT) = 푠 · OPT.

Let 휏 = 푠(1 − 훾)OPT. Since X푖 ∈ [0, 1], by applying the Chernoff bound to X, we obtain

    Pr[풞_ℒ(x^OPT) ≤ 휏] = Pr[X ≤ (1 − 훾)E[X]] ≤ 푒^{−훾²E[X]/3} ≤ 푒^{−Ω(ℓ log(푚푛) log 푛/훾²)} = (푚푛)^{−Ω(ℓ log 푛/훾²)}.

Therefore, since x^sol is a (1 − 휀MC)-approximate solution of MaxCover-LP(ℒ, ℱ, ℓ, p), with probability 1 − (푚푛)^{−Ω(ℓ log 푛/훾²)}, we have 풞_ℒ(x^sol) ≥ (1 − 휀MC)휏.

Next, by a similar approach, we show that for any fractional solution x, if 풞_ℒ(x) ≥ 풞_ℒ(x^OPT), then with probability 1 − (푚푛)^{−Ω(ℓ log 푛/훾²)}, 풞(x) ≥ ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT. Consider a fractional ℓ-cover (x, z) whose coverage is less than ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT. Let Y푖 denote a random variable that takes value 푧_{푒̂푖} with probability 푝_{푒̂푖} and 0 otherwise, and define Y = ∑_{푖=1}^{푠} Y푖. Then, E[Y푖] = 풞(x) < ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT. For ease of analysis, let each Y̅푖 ∈ [0, 1] be an auxiliary random variable that stochastically dominates Y푖 with expectation E[Y̅푖] = ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT, and Y̅ = ∑_{푖=1}^{푠} Y̅푖, which stochastically dominates Y with expectation E[Y̅] = 푠 · ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT = (1 − 휀MC)휏/(1 + 훾). We then have

    Pr[풞_ℒ(x) > (1 − 휀MC)휏] = Pr[Y > (1 − 휀MC)휏] = Pr[Y > (1 + 훾)E[Y̅]]
        ≤ Pr[Y̅ > (1 + 훾)E[Y̅]] ≤ 푒^{−훾²E[Y̅]/3} ≤ (푚푛)^{−Ω(ℓ log 푛/훾²)},

using the fact that ((1 − 훾)/(1 + 훾))(1 − 휀MC) = Θ(1) for the range of parameters of interest. Thus,

    Pr[풞(x) ≤ ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT and 풞_ℒ(x) > (1 − 휀MC)휏] ≤ (푚푛)^{−Ω(ℓ log 푛/훾²)}.

In other words, except with probability (푚푛)^{−Ω(ℓ log 푛/훾²)}, a chosen solution x that offers at least as good empirical coverage over ℒ as x^OPT (namely x^sol) does have actual coverage of at least ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT. Since the total number of Θ(훾²/log 푛)-integral ℓ-covers is 푂(푚^{ℓ log 푛/훾²}) (Observation 3.3.7), applying the union bound, with probability at least 1 − 푂(푚^{ℓ log 푛/훾²}) · (푚푛)^{−Ω(ℓ log 푛/훾²)} = 1 − 1/poly(푚푛), a (1 − 휀MC)-approximate Θ(훾²/log 푛)-integral solution of MaxCover-LP(ℒ, ℱ, ℓ, p) has weighted coverage of at least ((1 − 훾)/(1 + 훾))(1 − 휀MC)OPT > (1 − 2훾)(1 − 휀MC)OPT over 풰. □

Theorem 3.3.9. There exists a streaming algorithm that w.h.p. returns a (1 + 휀)-approximate fractional solution of SetCover-LP(풰, ℱ) in 푂(log 푛/휀²) passes and uses 푂̃︀(푚/휀⁶ + 푛) memory for any positive 휀 ≤ 1/2. The algorithm works in both set arrival and edge arrival streams.

Proof: The algorithm clearly requires Θ(푇) passes to simulate the MWU algorithm. The required amount of memory, besides 푂̃︀(푛) for counting elements, is dominated by the projected set system. In each pass over the stream, we sample Θ(ℓ log 푚푛 log 푛/휀⁴) elements, and since they are rarely occurring, each is contained in at most Θ(푚/(휀ℓ)) sets. Finally, we run log_{1+Θ(휀)} 푛 = 푂(log 푛/휀) instances of the MWU algorithm in parallel to compute a (1 + 휀)-approximate solution. In total, our space complexity is Θ(ℓ log 푚푛 log 푛/휀⁴) · Θ(푚/(휀ℓ)) · 푂(log 푛/휀) = 푂̃︀(푚/휀⁶). □

3.3.3. Final Step: Running Several MWU Rounds Together

We complete our result by further reducing the number of passes at the expense of increasing the required amount of memory, yielding our full algorithm FastFeasibilityTest in Algorithm 9. More precisely, aiming for a 푝-pass algorithm, we show how to execute 푅 ≜ Θ(푇/푝) = Θ(log 푛/(푝훽²)) rounds of the MWU algorithm in a single pass. We show that this task may be accomplished with a multiplicative factor of 푓 · Θ(log 푚푛) increase in memory usage, where 푓 ≜ 푛^{Θ(1/(푝훽))}.

Advance sampling. Consider a sequence of 푅 consecutive rounds 푖 = 1, . . . , 푅. In order to implement the MWU algorithm for these rounds, we need (multi-)sets of sampled elements

1 푅 푖 1,..., 푅 according to probabilities p ,..., p , respectively (where p is the probability ℒ ℒ corresponding to round 푖). Since the probabilities of subsequent rounds are not known in advance, we circumvent this problem by choosing these sets 푖’s with probabilities according ℒ 1 to p , but the number of samples in each set will be 푖 = 푠 푓 Θ(log 푚푛) instead of 푠. |ℒ | · · 푖 ′ Then, once p is revealed, we sub-sample elements from 푖 to obtain as follow: for a (copy ℒ ℒ푖 푝푖 of) sampled element , add to ′ with probability 푒^ ; otherwise, simply discard it. 푒^ 푖 푒^ 푖 푝1푓 ∈ ℒ ℒ 푒^ Note that it is still left to be shown that the probability above is indeed at most 1. Since each 푒 was originally sampled with probability 푝1, then in ′ , the probability that a 푒 ℒ푖 sampled element is exactly 푖 . By having times the originally required 푒^ = 푒 푝푒/푓 푓 Θ(log 푚푛) · 푖 ′ ∑︀ 푝푒 number of samples 푠 in the first place, in expectation we still have E[ ] = 푖 = |ℒ푖| |ℒ | 푒∈풰 푓 (푠 푓 Θ(log 푚푛)) 1 = 푠 Θ(log 푚푛). Due to the Θ(log 푚푛) factor, by the Chernoff bound, we · · 푓 · conclude that with w.h.p. ′ 푠. Thus, we have a sufficient number of elements sampled |ℒ푖| ≥ with probability according to p푖 to apply Lemma 3.3.8, as needed.
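A short Python sketch of this advance-sampling-plus-rejection step, under the assumptions stated above (p1 and p_i are the round-1 and round-푖 probability vectors, and f bounds their ratio, as guaranteed by the probability-change bound shown below); the helper names are illustrative.

import random

def advance_sample(p1, s, f, c_log):
    """Draw s * f * c_log copies of elements according to p1 (round-1 probabilities)."""
    elements = list(p1.keys())
    weights = [p1[e] for e in elements]
    return random.choices(elements, weights=weights, k=int(s * f * c_log))

def rejection_subsample(L_i, p1, p_i, f):
    """Keep each pre-drawn copy e with probability p_i[e] / (p1[e] * f).

    Assuming p_i[e] <= f * p1[e] for every element (so the ratio is a valid
    probability), the kept copies are distributed as samples from p_i.
    """
    kept = []
    for e in L_i:
        if random.random() < p_i[e] / (p1[e] * f):
            kept.append(e)
    return kept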

Change in probabilities. As noted above, we must show that the probability that we

sub-sample each element is at most 1; that is, 푝^푖_푒/푝¹_푒 ≤ 푓 = 푛^{Θ(1/(푝훽))} for every element 푒 and every round 푖 = 1, . . . , 푅. We bound the multiplicative difference between the probabilities of two consecutive rounds as follows.

Lemma 3.3.10. Let p and p′ be the probability vectors of the elements before and after an update. Then for every element 푒, 푝′_푒 ≤ (1 + 푂(훽))푝푒.

Proof: Recall the weight update formula 푤^{푡+1}_푒 = 푤^푡_푒 (1 − 훽(Ă_푒x̆ − 푏푒)/(6휑)) of the MWU framework, where Ă_{푛×|ℱ̆|} represents the membership matrix corresponding to the extended set system (풰, ℱ̆). In our case, the desired coverage amount is 푏푒 = 1. By construction, we have Ă_푒x̆ = 푧푒 ≤ 1; therefore, our width is 휑 = 1, and −1 ≤ Ă_푒x̆ − 푏푒 ≤ 0. That is, the weight of each element cannot decrease, but may increase by at most a multiplicative factor of 1 + 훽/6, before normalization. Thus even after normalization no weight may increase by more than a factor of 1 + 훽/6 = 1 + 푂(훽). □

Therefore, after 푅 = Θ(log 푛/(푝훽²)) rounds, the probability of any element may increase by at most a factor of (1 + 푂(훽))^{Θ(log 푛/(푝훽²))} ≤ 푒^{Θ(log 푛/(푝훽))} = 푛^{Θ(1/(푝훽))} = 푓, as desired. This concludes the proof of Theorem 3.3.1.

Implementation details. We make a few remarks about the implementation given in Algorithm 9. First, even though we perform all sampling in advance, the decisions of MaxCoverOracle do not depend on any ℒ푖 of later rounds, and UpdateProb is entirely deterministic: there is no dependency issue between rounds. Next, we only need to perform

UpdateProb on the sampled elements ℒ = ℒ₁ ∪ · · · ∪ ℒ_푅 during the current 푅 rounds. We therefore denote the probabilities with a different vector q^푖 over the sampled elements ℒ only. Probabilities of elements outside ℒ are not required by MaxCoverOracle during these rounds, but we simply need to spend one more pass after executing 푅 rounds of MWU to aggregate the new probability vector p over all (rare) elements. Similarly, since MaxCoverOracle does not have the ability to verify, during the MWU algorithm, that each solution x^푖 returned by the oracle indeed provides sufficient coverage, we check all of them during this additional pass. Lastly, we again remark that this algorithm operates on the extended set system: the solution x returned by MaxCoverOracle has at least the same coverage as x̆. While x̆ is not explicitly computed, its coverage vector z can be computed exactly.

3.3.4. Extension to general covering LPs

Algorithm 9 An efficient implementation of FeasibilityTest which performs 푝 passes and consumes 푂̃︀(푚푛^{푂(1/(푝휀))} + 푛) space.
1: procedure FastFeasibilityTest(ℓ, 휀)
2:   훼, 훽 ← 휀/3, p^curr ← 1_{푛×1}  ◁ the initial prob. vector for the MWU algorithm on 풰
3:   ◁ see Algorithm 5 for implementation of the line below
4:   compute a cover of common elements in one pass
5:   x^total ← 0_{푚×1}
6:   ◁ MWU algorithm for covering rare elements
7:   for 푖 = 1 to 푝 do
8:     푅 ← Θ(log 푛/(푝훽²))  ◁ number of MWU iterations performed together
9:     ◁ in one pass, project all sets in ℱ over the collections of samples ℒ₁, . . . , ℒ_푅
10:    sample ℒ₁, . . . , ℒ_푅 according to p^curr, each of size ℓ푛^{Θ(1/(푝훽))} poly(log 푚푛)
11:    ℒ ← ℒ₁ ∪ · · · ∪ ℒ_푅, ℱ_ℒ ← ∅  ◁ ℒ is a set whereas ℒ₁, . . . , ℒ_푅 are multi-sets
12:    for all sets 푆 in the stream do
13:      ℱ_ℒ ← ℱ_ℒ ∪ {푆 ∩ ℒ}
14:    ◁ each pass simulates 푅 rounds of MWU
15:    for all 푒 ∈ ℒ do
16:      푞¹_푒 ← 푝^curr_푒  ◁ project p^curr_{푛×1} to q¹_{|ℒ|×1} over sampled elements
17:    q¹ ← q¹/‖q¹‖
18:    for all rounds 푖 = 1, . . . , 푅 do
19:      ℒ′푖 ← sample each elt 푒 ∈ ℒ푖 with probab. 푞^푖_푒/(푞¹_푒 푛^{Θ(1/(푝훽))})  ◁ rejection sampling
20:      x^푖 ← MaxCoverOracle(ℒ′푖, ℱ_ℒ, ℓ)  ◁ w.h.p. 풞(x^푖) ≥ 1 − 훽/3 when ℓ ≥ 푘
21:      ◁ in no additional pass, update probab. q over sampled elts accord. to x^푖
22:      z ← 0_{|ℒ|×1}  ◁ compute coverage over sampled elements
23:      for all element-set pairs 푒 ∈ 푆 where 푆 ∈ ℱ_ℒ do
24:        푧푒 ← min(푧푒 + 푥^푖_푆, 1)
25:      q^{푖+1} ← UpdateProb(q^푖, z)  ◁ only update weights of elements in ℒ
26:    ◁ in one pass, update probab. p^curr over all (rare) elts according to x¹, . . . , x^푅
27:    z¹, . . . , z^푅 ← 0_{푛×1}  ◁ compute coverage over all (rare) elements
28:    for all element-set pairs 푒 ∈ 푆 in the stream do
29:      for all rounds 푖 = 1, . . . , 푅 do
30:        푧^푖_푒 ← min(푧^푖_푒 + 푥^푖_푆, 1)
31:    for all rounds 푖 = 1, . . . , 푅 do
32:      if (p^curr)⊤z^푖 < 1 − 훽/3 then  ◁ detect infeasible solutions
33:        report INFEASIBLE
34:      x^total ← x^total + x^푖, p^curr ← UpdateProb(p^curr, z^푖)  ◁ perform actual updates
35:  x^rare ← x^total/((1 − 훽)푇)  ◁ scale up the solution to cover rare elements
36:  return x^cmn + x^rare

We remark that our MWU-based algorithm can be extended to solve a more general class of covering LPs. Consider the problem of finding a vector x that minimizes c⊤x subject to the constraints Ax ≥ b and x ≥ 0. In terms of the Set Cover problem, 퐴_{푒,푆} ≥ 0 indicates the multiplicity of an element 푒 in the set 푆, 푏푒 > 0 denotes the number of times we wish 푒 to

be covered, and 푐푆 > 0 denotes the cost per unit for the set 푆. Now define

    퐿 ≜ min_{(푒,푆):퐴_{푒,푆}≠0} 퐴_{푒,푆}/(푏푒푐푆)    and    푈 ≜ max_{(푒,푆)} 퐴_{푒,푆}/(푏푒푐푆).

Then, we may modify our algorithm to obtain the following result.

Theorem 3.3.11. There exists a streaming algorithm that w.h.p. returns a (1 + 휀)-approximate fractional solution to general covering LPs in 푝 passes and using 푂̃︀((푚푈/(휀⁶퐿)) · 푛^{푂(1/(푝휀))} + 푛) memory for any 3 ≤ 푝 ≤ polylog(푛), where the parameters 퐿 and 푈 are defined above. The algorithm works in both set arrival and edge arrival streams.

Proof: We modify our algorithm and provide an argument of its correctness as follows. First,

observe that we can convert the input LP into an equivalent LP with all entries 푏푒 = 푐푆 = 1

by simply replacing each 퐴_{푒,푆} with 퐴_{푒,푆}/(푏푒푐푆). Namely, let the new parameters be A′, b′ and c′, and consider the variable x′ where 푥′_푆 = 푐푆푥푆. It is straightforward to verify that c′⊤x′ = c⊤x and A′_푒x′ = A_푒x/푏푒, reducing the LP to the desired case. Thus, we may afford to record b and c, so that each value 퐴_{푒,푆}/(푏푒푐푆) may be computed on the fly. Henceforth we assume that all entries 푏푒 = 푐푆 = 1 and 퐴_{푒,푆} ∈ {0} ∪ [퐿, 푈]. Observe as well that the optimal objective value 푘 may be in the expanded range [1/푈, 푛/퐿], so the number of guesses must be increased from (log 푛)/휀 to (log(푛푈/퐿))/휀. Next consider the process for covering the rare elements. We instead use a uniform

solution x^cmn = (훼ℓ/푚) · 1. Observe that if an element occurs in at least 푚/(훼ℓ퐿) sets, then A_푒x^cmn = ∑_{푆:푒∈푆} 퐴_{푒,푆} · 훼ℓ/푚 ≥ (푚/(훼ℓ퐿)) · 퐿 · (훼ℓ/푚) = 1. That is, we must adjust our definition so that an element is considered common if it appears in at least 푚/(훼ℓ퐿) sets. Consequently, whenever we perform element sampling, the required amount of memory to store information of each element increases by a factor of 1/퐿.

Next consider Lemma 3.3.5, where we show the existence of integral solutions via the MWU algorithm with a greedy oracle. As the greedy implementation chooses a set 푆 and places the entire budget ℓ on 푥푆, the amount of coverage 퐴_{푒,푆}푥푆 may be as large as ℓ푈, as 퐴_{푒,푆} is no longer bounded by 1. Thus this application of the MWU algorithm has width 휑 = Θ(ℓ푈) and requires 푇 = Θ(ℓ푈 log 푛/휀²MC) rounds. Consequently, its solution becomes Θ(ℓ/푇) = Θ(휀²MC/(푈 log 푛))-integral. As noted in Observation 3.3.7, the number of potential solutions from the greedy oracle increases by a power of 푈. Then, in Lemma 3.3.8, we must reduce the error probability of each solution by the same power. We increase the number of samples 푠 by a factor of 푈 to account for this change, increasing the required amount of memory by the same factor.

As in the previous case, any solution x may always be pruned so that the width is reduced to 1: our algorithm Prune still works as long as the entries of A are non-negative. Therefore, the fact that entries of A may take on values other than 0 or 1 does not affect the number of rounds (or passes) of our overall application of the MWU framework. Thus, we may handle general covering LPs using a factor of 푂̃︀(푈/퐿) larger memory within the same number of passes. In particular, if the non-zero entries of the input are bounded in the range [1, 푀], this introduces a factor of 푂̃︀(푈/퐿) ≤ 푂̃︀(푀³) overhead in memory usage. □

Chapter 4

Sublinear Set Cover

4.1. Introduction

It is well-known that although the problem of finding an optimal solution is NP-complete, a natural greedy algorithm which iteratively picks the “best” remaining set is provably optimal.

The algorithm finds a solution of size at most 푘 ln 푛 where 푘 is the optimum cover size, and can be implemented to run in time linear in the input size. However, the input size itself

could be as large as Θ(푚푛), so for large data sets even reading the input might be infeasible. This raises a natural question: is it possible to solve minimum set cover in sub-linear time? This question was previously addressed in [145, 172], who showed that one can design constant running-time algorithms by simulating the greedy algorithm, under the assumption that the sets are of constant size and each element occurs in a constant number of sets. However, those constant-time algorithms have a few drawbacks: they only provide a mixed

multiplicative/additive guarantee (the output cover size is guaranteed to be at most 푘 · ln 푛 + 휖푛), the dependence of their running times on the maximum set size is exponential, and they only output the (approximate) minimum set cover size, not the cover itself. From a different perspective, [119] (building on [84]) showed that an 푂(1)-approximate solution to the fractional version of the problem can be found in 푂̃︀(푚푘² + 푛푘²) time.¹ Combining this algorithm with randomized rounding yields an 푂(log 푛)-approximate solution to Set Cover with the same complexity.

1 The method can be further improved to 푂̃︀(푚 + 푛푘) (N. Young, personal communication).

In this chapter we study the complexity of sub-linear time algorithms for set cover with multiplicative approximation guarantees. Our upper bounds complement the aforementioned result of [119] by presenting algorithms which are fast when 푘 is large, as well as algorithms that provide more accurate solutions that use a sub-linear number of queries.² Equally importantly, we establish nearly matching lower bounds, some of which even hold for estimating the optimal cover size. Our upper and lower bounds are presented in Table 4.1.1.

Data access model. As in the prior work [145, 172] on Set Cover, our algorithms and lower bounds assume that the input can be accessed via the adjacency-list oracle.3 More precisely, the algorithm has access to the following two oracles:

1. EltOf: Given a set 푆푖 and an index 푗, the oracle returns the 푗th element of 푆푖. If 푗 > |푆푖|, ⊥ is returned.

2. SetOf: Given an element 푒푖 and an index 푗, the oracle returns the 푗th set containing 푒푖. If 푒푖 appears in fewer than 푗 sets, ⊥ is returned.

This is a natural model, providing a “two-way” connection between the sets and the elements. Furthermore, for some graph problems modeled by Set Cover (such as Dominating Set or Vertex Cover), such oracles are essentially equivalent to the aforementioned incidence-list model studied in sub-linear graph algorithms. We also note that the other popular access model employing the membership oracle, where we can query whether an element 푒 is contained in a set 푆, is not suitable for Set Cover, as it can be easily seen that even checking whether a feasible cover exists requires Ω(푚푛) time.
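For concreteness, here is a minimal Python sketch of the two oracles built from an explicitly stored set system; this is only an illustration of the interface, since the point of the model is that the algorithm may touch the instance solely through these queries (None plays the role of ⊥).

class AdjacencyListOracle:
    """EltOf/SetOf access to a set system given as {set id: list of elements}."""

    def __init__(self, sets):
        self.elt_of = {S: list(elems) for S, elems in sets.items()}
        self.set_of = {}
        for S, elems in sets.items():
            for e in elems:
                self.set_of.setdefault(e, []).append(S)

    def EltOf(self, S, j):
        """Return the j-th element of S (1-indexed), or None if j > |S|."""
        elems = self.elt_of.get(S, [])
        return elems[j - 1] if 1 <= j <= len(elems) else None

    def SetOf(self, e, j):
        """Return the j-th set containing e (1-indexed), or None if e is in fewer than j sets."""
        sets = self.set_of.get(e, [])
        return sets[j - 1] if 1 <= j <= len(sets) else None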

4.1.1. Our Results

In this chapter, we present algorithms and lower bounds for the Set Cover problem. The results are summarized in Table 4.1.1. The NP-hardness of this problem (or even its 표(log 푛)-approximate version [72, 152, 14, 138, 63]) precludes the existence of highly accurate algorithms with fast running times, while (as we show) it is still possible to design algorithms

²Note that polynomial time algorithms with sub-logarithmic approximation factors are unlikely to exist. ³In the context of graph problems, this model is also known as the incidence-list model, and has been studied extensively; see e.g., [45, 80, 31].

Problem               Approximation   Constraints                      Query Complexity                          Section

Set Cover             훼휌 + 휀          훼 ≥ 2                            푂̃︀((1/휀)(푚(푛/푘)^{1/(훼−1)} + 푛푘))          4.4.1
                      휌 + 휀           –                                푂̃︀(푚푛/(푘휀²))                              4.4.2
                      훼               푘 < (푛/log 푚)^{1/(4훼+1)}         Ω̃(푚(푛/푘)^{1/(2훼)})                       4.6
                      훼 ≤ 1.01        푘 = 푂(푛/log 푚)                   Ω̃(푚푛/푘)                                  4.3.2

Cover Verification    –               푘 ≤ 푛/2                          Ω(푛푘)                                     4.5

Table 4.1.1: A summary of our algorithms and lower bounds. We use the following notation: 푘 ≥ 1 denotes the size of the optimum cover; 훼 ≥ 1 denotes a parameter that determines the trade-off between the approximation quality and query/time complexities; 휌 ≥ 1 denotes the approximation factor of a “black box” algorithm for set cover used as a subroutine.

Ω(̃︀ 푚푛/푘) if the approximation ratio is a small constant.

93 4.1.2. Related work

Sub-linear algorithms for Set Cover under the oracle model have been previously studied as an estimation problem; the goal is only to approximate the size of the minimum set cover

rather than constructing one. Nguyen and Onak [145] consider Set Cover under the oracle model we employ in this chapter, in a specific setting where both the maximum cardinal- ity of sets in , and the maximum number of occurrences of an element over all sets, are ℱ bounded by some constants 푠 and 푡; this allows algorithms whose time and query complex-

4 푠 ities are constant, (2(푠푡) /휀)푂(2 ), containing no dependency on 푛 or 푚. They provide an algorithm for estimating the size of the minimum set cover when, unlike our work, allowing both ln 푠 multiplicative and 휀푛 additive errors. Their result has been subsequently improved to (푠푡)푂(푠)/휀2 by Yoshida et al. [172]. Additionally, the results of Kuhn et al. [121] on general packing/covering LPs in the distributed model, together with the reduction method ℒ풪풞풜ℒ of Parnas and Ron [149], implies estimating set cover size to within a 푂(ln 푠)-multiplicative factor (with 휀푛 additive error), can be performed in (푠푡)푂(log 푠 log 푡)/휀4 time/query complexi- ties.

Set Cover can also be considered as a generalization of the Vertex Cover problem. The estimation variant of Vertex Cover under the adjacency-list oracle model has been studied in [149, 130, 148, 172]. Set Cover has been also studied in the sub-linear space context, most notably for the streaming model of computation [155, 67, 42, 20, 17, 29, 105, 61, 97]. In this model, there are algorithms that compute approximate set covers with only multiplicative errors. Our algorithms use some of the ideas introduced in the last two papers [61, 97].

4.1.3. Our Techniques

Overview of the algorithms. The algorithmic results presented in Section 4.4, use the techniques introduced for the streaming Set Cover problem by [61, 97] to get new results in the context of sub-linear time algorithms for this problem. Two components previously used for the set cover problem in the context of streaming are Set Sampling and Element Sampling. Assuming the size of the minimum set cover is 푘, Set Sampling randomly samples 푂̃︀(푘) sets and adds them to the maintained solution. This ensures that all the elements that are well represented in the input (i.e., appearing in at least 푚/푘 sets) are covered by the

94 sampled sets. On the other hand, the Element Sampling technique samples roughly 푂̃︀(푘/훿) elements, and finds a set cover for the sampled elements. It can be shown that the cover for

the sampled elements covers a (1 훿) fraction of the original elements. − Specifically, the first algorithm performs a constant number of iterations. Each iteration uses element sampling to compute a “partial” cover, removes the elements covered by the sets selected so far and recurses on the remaining elements. However, making this process work in sub-linear time (as opposed to sub-linear space) requires new technical development. For example, the algorithm of [97] relies on the ability to test membership for a set-element pair, which generally cannot be efficiently performed in our model. The second algorithm performs only one round of set sampling, and then identifies the elements that are not covered by the sampled sets, without performing a full scan of those sets. This is possible because with high probability only those elements that belong to few input sets are not covered by the sample sets. Therefore, we can efficiently enumerate all pairs (푒푖, 푆푗), 푒푖 푆푗, for those elements 푒푖 that were not covered by the sampled sets. We ∈ then run a black box algorithm only on the set system induced by those pairs. This approach lets us avoid the 푛푘 term present in the query and runtime bounds for the first algorithm, which makes the second algorithm highly efficient for large values of 푘.

The Set Cover lower bound for smaller optimal value 푘. We establish our lower bound for the problem of estimating the size of the minimum set cover, by constructing two distributions of set systems. All systems in the same distribution share the same optimal set cover size, but these sizes differ by a factor 훼 between the two distributions; thus, the algorithm is required to determine from which distribution its input set system is drawn, in order to correctly estimate the optimal cover size. Our distributions are constructed by a novel use of the probabilistic method. Specifically, we first probabilistically construct a set system called median instance (see Lemma 4.3.6): this set system has the property that

(a) its minimum set cover size is 훼푘 and (b) a small number of changes to the instance reduces the minimum set cover size to 푘. We set the first distribution to be always this median instance. Then, we construct the second distribution by a random process that performs the changes (depicted in Algorithm 10) resulting in a modified instance. This

95 process distributes the changes almost uniformly throughout the instance, which implies that the changes are unlikely to be detected unless the algorithm performs a large number of queries. We believe that this construction might find applications to lower bounds for other combinatorial optimization problems.

The Set Cover lower bound for larger optimal value 푘. Our lower bound for the problem of computing an approximate set cover leverages the construction above. We create a combined set system consisting of multiple modified instances all chosen independently at random, allowing instances with much larger 푘. By the properties of the random process generating modified instances, we observe that most of these modified instances have different optimal set cover solution, and that distinguishing these instances from one another requires many queries. Thus, it is unlikely for the algorithm to be able to compute an optimal solution to a large fraction of these modified instances, and therefore it fails to achieve the desired approximation factor for the overall combined instance.

The Cover Verification lower bound for a cover of size 푘. For Cover Verification, however, we instead give an explicit construction of the distributions. We first create an

underlying set structure such that initially, the candidate sets contain all but 푘 elements. Then we may swap in each uncovered element from a non-candidate set. Our set structure is systematically designed so that each swap only modifies a small fraction of the answers from all possible queries; hence, each swap is hard to detect without Ω(푛) queries. The distribution of valid set covers is composed of instances obtained by swapping in every uncovered element, and that of non-covers is similarly obtained but leaving one element uncovered.

4.2. Preliminaries

First, we formally specify the representation of the set structures of input instances, which applies to both Set Cover and Cover Verification. Our lower bound proofs rely mainly on the construction of instances that are hard to distinguish by the algorithm. To this end, we define the swap operation that exchanges a pair of elements between two sets, and how this is implemented in the actual representation.

96 Definition 4.2.1swap ( operation). Consider two sets 푆 and 푆′. A swap on 푆 and 푆′ is defined over two elements 푒, 푒′ such that 푒 푆 푆′ and 푒′ 푆′ 푆, where 푆 and 푆′ exchange 푒 ∈ ∖ ∈ ∖ and 푒′. Formally, after performing swap(푒, 푒′), 푆 = (푆 푒′ ) 푒 and 푆′ = (푆′ 푒 ) 푒′ . As ∪{ } ∖{ } ∪{ } ∖{ } for the representation via EltOf and SetOf, each application of swap only modifies 2 entries for each oracle. That is, if previously 푒 = EltOf(푆, 푖), 푆 = SetOf(푒, 푗), 푒′ = EltOf(푆′, 푖′), and 푆′ = SetOf(푒′, 푗′), then their new values change as follows: 푒′ = EltOf(푆, 푖), 푆′ = SetOf(푒, 푗), 푒 = EltOf(푆′, 푖′), and 푆 = SetOf(푒′, 푗′). In particular, we extensively use the property that the amount of changes to the oracle’s

answers incurred by each swap is minimal. We remark that when we perform multiple swaps on multiple disjoint set-element pairs, every swap modifies distinct entries and do not interfere with one another. Lastly, we define the notion of query-answer history, which is a common tool forestab- lishing lower bounds for sub-linear algorithms under query models.

Definition 4.2.2. By query-answer history, we denote the sequence of query-answer pairs

⟨(푞1, 푎1), (푞2, 푎2), . . . , (푞푟, 푎푟)⟩ recording the communication between the algorithm and the oracles, where each new query 푞_{푖+1} may only depend on the query-answer pairs (푞1, 푎1), . . . ,

(푞푖, 푎푖). In our case, each 푞푖 represents either a SetOf query or an EltOf query made by

the algorithm, and each 푎푖 is the oracle’s answer to that respective query according to the set structure instance.

4.3. Lower Bounds for the Set Cover Problem

In this section, we present lower bounds for Set Cover both for small values of the optimal cover size 푘 (in Section 4.3.1), and for large values of 푘 (in Section 4.3.2). For low values of 푘, we prove the following theorem whose proof is postponed to Section 4.6.

Theorem 4.3.1. For 2 ≤ 푘 ≤ (푛/(16훼 log 푚))^{1/(4훼+1)} and 1 < 훼 ≤ log 푛, any randomized algorithm that solves the Set Cover problem with approximation factor 훼 and success probability at least 2/3 requires Ω̃(푚(푛/푘)^{1/(2훼)}) queries.

Instead, in Section 4.3.1 we focus on the simple setting of this theorem, which applies to approximation protocols for distinguishing between instances with minimum set cover sizes 2 and 3, and show a lower bound of Ω̃(푚푛) (which is tight up to a polylogarithmic factor) for approximation factor 3/2. This simplification serves both clarity and the fact that the result for this case is used in Section 4.3.2 to establish our lower bound

for large values of 푘.

High level idea. Our approach for establishing the lower bound is as follows. First, we construct a median instance 퐼* for Set Cover, whose minimum set cover size is 3. We then apply a randomized procedure GenModifiedInst, which slightly modifies the median instance into a new instance containing a set cover of size 2. Applying Yao's principle, the distribution of the input to the deterministic algorithm is either 퐼* with probability 1/2, or a modified instance generated through GenModifiedInst(퐼*), whose distribution is denoted by 풟(퐼*), again with probability 1/2. Next, we consider the execution of the deterministic algorithm. We show that unless the algorithm asks at least Ω̃(푚푛) queries, the resulting query-answer history generated over 퐼* would be the same as those generated over instances constituting a constant fraction of 풟(퐼*), reducing the algorithm's success probability to below 2/3. More specifically, we will establish the following theorem.

Theorem 4.3.2. Any algorithm that can distinguish whether the input instance is 퐼* or belongs to 풟(퐼*) with probability of success greater than 2/3 requires Ω(푚푛/ log 푚) queries.

Corollary 4.3.3. For 1 < 훼 < 3/2 and 푘 ≤ 3, any randomized algorithm that approximates, within a factor of 훼, the size of the optimal cover for the Set Cover problem with success probability at least 2/3 requires Ω̃(푚푛) queries.

For simplicity, we assume that the algorithm has the knowledge of our construction (which may only strengthen our lower bounds); this includes 퐼* and 풟(퐼*), along with their representation via EltOf and SetOf. The objective of the algorithm is simply to distinguish them. Since we are distinguishing a distribution of instances 풟(퐼*) against a single instance 퐼*, we may individually upper bound the probability that each query-answer pair reveals the modified part of the instance, and then apply the union bound directly. However, establishing such a bound requires a certain set of properties that we obtain through a careful design of 퐼* and GenModifiedInst. We remark that our approach shows the hardness of distinguishing instances with different cover sizes. That is, our lower bound on the query complexity also holds for the problem of approximating the size of the minimum set cover (without explicitly finding one). Lastly, in Section 4.3.2 we provide a construction utilizing Theorem 4.3.2 to extend Corollary 4.3.3 and establish the following theorem on lower bounds for larger minimum set cover sizes.

Theorem 4.3.4. For any sufficiently small approximation factor 훼 ≤ 1.01 and 푘 = 푂(푚/log 푛), any randomized algorithm that computes an 훼-approximation to the Set Cover problem with success probability at least 0.99 requires Ω̃(푚푛/푘) queries.

4.3.1. The Set Cover Lower Bound for Small Optimal Value 푘

Construction of the Median Instance 퐼*. Let ℱ be a collection of 푚 sets such that (independently for each set-element pair (푆, 푒)) 푆 contains 푒 with probability 1 − 푝0, where 푝0 = √(9 log 푚 / 푛) (note that since we assume log 푚 ≤ 푛/푐 for large enough 푐, we can assume that 푝0 ≤ 1/2). Equivalently, we may consider the incidence matrix of this instance: each entry is either 0 (indicating 푒 ∉ 푆) with probability 푝0, or 1 (indicating 푒 ∈ 푆) otherwise. We write ℱ ∼ ℐ(풰, 푝0) to denote the collection of sets obtained from this construction.

Definition 4.3.5 (Median instance). An instance of Set Cover, 퐼, is a median instance if it satisfies all the following properties.

(a) No two sets cover all the elements. (The size of its minimum set cover is at least 3.)
(b) For any two sets, the number of elements not covered by the union of these sets is at most 18 log 푚.
(c) The intersection of any two sets has size at least 푛/8.
(d) For any pair of elements 푒, 푒′, the number of sets 푆 such that 푒 ∈ 푆 but 푒′ ∉ 푆 is at least (푚/4)√(9 log 푚/푛).
(e) For any triple of sets 푆, 푆1 and 푆2, |(푆1 ∩ 푆2) ∖ 푆| ≤ 6√(푛 log 푚).
(f) For each element, the number of sets that do not contain that element is at most 6푚√(log 푚/푛).

Lemma 4.3.6. There exists a median instance 퐼* satisfying all properties from Definition 4.3.5. In fact, with high probability, an instance drawn from the distribution in which Pr[푒 ∈ 푆] = 1 − 푝0 independently at random satisfies the median properties.

The proof of the lemma follows from standard applications of concentration bounds. Specifically, it follows from the union bound and Lemmas 4.7.1–4.7.6, appearing in Section 4.7.
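To make the random model concrete, here is a small sketch (Python) that draws ℱ ∼ ℐ(풰, 푝0) and empirically checks two of the median properties on the sampled instance; the parameters, the natural logarithm, and the helper names are illustrative assumptions used only for this example, not code from the thesis.

```python
import math
import random
from itertools import combinations

def sample_instance(n, m, seed=0):
    """Draw F ~ I(U, p0): each set contains each element independently with prob. 1 - p0."""
    rng = random.Random(seed)
    p0 = math.sqrt(9 * math.log(m) / n)
    sets = [{e for e in range(n) if rng.random() > p0} for _ in range(m)]
    return sets, p0

def check_some_median_properties(sets, n):
    """Check property (a) (no two sets cover everything) and property (c)
    (every pairwise intersection has size at least n/8) on the sample."""
    for S, T in combinations(sets, 2):
        if len(S | T) == n:
            return False
        if len(S & T) < n / 8:
            return False
    return True

sets, p0 = sample_instance(n=2000, m=40)
print(p0, check_some_median_properties(sets, 2000))
```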

Distribution 풟(퐼*) of Modified Instances 퐼′ Derived from 퐼*. Fix a median instance 퐼*. We now show that we may perform 푂(log 푚) swap operations on 퐼* so that the size of the minimum set cover in the modified instance becomes 2. Moreover, its incidence matrix differs from that of 퐼* in 푂(log 푚) entries. Consequently, the number of queries to EltOf and SetOf that induce different answers from those of 퐼* is also at most 푂(log 푚). We define 풟(퐼*) as the distribution of instances 퐼′ generated from a median instance 퐼* by GenModifiedInst(퐼*), given below in Algorithm 10, as follows. Assume that 퐼* = (풰, ℱ). We select two different sets 푆1, 푆2 from ℱ uniformly at random; we aim to turn these two sets into a set cover. To do so, we swap out some of the elements in 푆2 and bring in the uncovered elements. For each uncovered element 푒, we pick an element 푒′ ∈ 푆2 that is also covered by 푆1. Next, consider the candidate sets whose copy of 푒 we may exchange with 푒′ ∈ 푆2:

Definition 4.3.7 (Candidate set). For any pair of elements 푒, 푒′, the candidate sets of (푒, 푒′) are all sets that contain 푒 but not 푒′. The collection of candidate sets of (푒, 푒′) is denoted by Candidate(푒, 푒′). Note that Candidate(푒, 푒′) ≠ Candidate(푒′, 푒) (in fact, these two collections are disjoint).

Algorithm 10 The procedure of constructing a modified instance of 퐼*.
1: procedure GenModifiedInst(퐼* = (풰, ℱ))
2:   ℳ ← ∅
3:   pick two different sets 푆1, 푆2 from ℱ uniformly at random
4:   for all 푒 ∈ 풰 ∖ (푆1 ∪ 푆2) do
5:     pick 푒′ ∈ (푆1 ∩ 푆2) ∖ ℳ uniformly at random
6:     ℳ ← ℳ ∪ {푒′}
7:     pick a random set 푆 in Candidate(푒, 푒′)
8:     swap(푒, 푒′) between 푆, 푆2

We choose a random set 푆 from Candidate(푒, 푒′), and swap 푒 ∈ 푆 with 푒′ ∈ 푆2 so that 푆2 now contains 푒. We repeatedly apply this process for all initially uncovered 푒 so that eventually 푆1 and 푆2 form a set cover. We show that the proposed algorithm, GenModifiedInst, can indeed be executed without getting stuck.

Lemma 4.3.8. The procedure GenModifiedInst is well-defined under the precondition that the input instance 퐼* is a median instance.

Proof: To carry out the algorithm, we must ensure that the number of initially uncovered elements is at most the number of elements covered by both 푆1 and 푆2. This follows from the properties of median instances (see Definition 4.3.5): |풰 ∖ (푆1 ∪ 푆2)| ≤ 18 log 푚 by property (b), and the intersection of 푆1 and 푆2 has size at least 푛/8 by property (c). That is, in our construction there are sufficiently many possible choices for 푒′ to be matched and swapped with each uncovered element 푒. Moreover, by property (d) there are plenty of candidate sets 푆 for performing swap(푒, 푒′) with 푆2. □
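The following is a minimal sketch (Python) of the GenModifiedInst procedure of Algorithm 10. Sets are represented as Python sets of element identifiers, and the helper candidate is an illustrative implementation of Candidate(푒, 푒′), not code from the thesis; under the median properties the random choices below never run out of options.

```python
import random

def candidate(family, e, ep):
    """Candidate(e, e'): indices of sets containing e but not e'."""
    return [i for i, S in enumerate(family) if e in S and ep not in S]

def gen_modified_inst(universe, family, rng=random):
    """Sketch of GenModifiedInst: pick S1, S2 at random and swap elements so that
    {S1, S2} becomes a set cover; returns a modified copy of `family`."""
    fam = [set(S) for S in family]
    i1, i2 = rng.sample(range(len(fam)), 2)
    S1, S2 = fam[i1], fam[i2]
    matched = set()
    for e in universe - (S1 | S2):                    # initially uncovered elements
        ep = rng.choice(sorted((S1 & S2) - matched))  # partner covered by both S1 and S2
        matched.add(ep)
        j = rng.choice(candidate(fam, e, ep))         # a candidate set: contains e but not ep
        fam[j].remove(e); fam[j].add(ep)              # swap(e, ep) between fam[j] and S2
        S2.remove(ep);   S2.add(e)
    return fam
```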

Bounding the Probability of Modification. Let 풟(퐼*) denote the distribution of instances generated by GenModifiedInst(퐼*). If an algorithm were to distinguish between 퐼* and 퐼′ ∼ 풟(퐼*), it must find some cell in the EltOf or SetOf tables that would have been modified by GenModifiedInst, to confirm that GenModifiedInst was indeed executed; otherwise it would make wrong decisions half of the time. We will show an additional property of this distribution: no entry of EltOf or SetOf is significantly more likely than the others to be modified during the execution of GenModifiedInst. Consequently, no algorithm may strategically detect the difference between 퐼* and 퐼′ with the desired probability, unless the number of queries is asymptotically the reciprocal of the maximum probability of modification over all cells.

Define 푃Elt−Set : 풰 × ℱ → [0, 1] as the probability that an element is swapped by a set. More precisely, for an element 푒 ∈ 풰 and a set 푆 ∈ ℱ, if 푒 ∉ 푆 in the median instance 퐼*, then 푃Elt−Set(푒, 푆) = 0; otherwise, it is equal to the probability that 푆 swaps 푒. We note that these probabilities are taken over 퐼′ ∼ 풟(퐼*), where 퐼* is a fixed median instance. That is, as per Algorithm 10, they correspond to the random choices of 푆1, 푆2, the random matching ℳ between 풰 ∖ (푆1 ∪ 푆2) and 푆1 ∩ 푆2, and the random choice of a candidate set 푆 for each swap. We bound the values of 푃Elt−Set via the following lemma.

Lemma 4.3.9. For any 푒 ∈ 풰 and 푆 ∈ ℱ, 푃Elt−Set(푒, 푆) ≤ 4800 log 푚 / (푚푛), where the probability is taken over 퐼′ ∼ 풟(퐼*).

Proof: Let 푆1, 푆2 denote the first two sets picked (uniformly at random) from ℱ to construct a modified instance of 퐼*. For each element 푒 and set 푆 such that 푒 ∈ 푆 in the median instance 퐼*,

푃Elt−Set(푒, 푆) = Pr[푆 = 푆2] · Pr[푒 ∈ 푆1 ∩ 푆2] · Pr[푒 is matched to 풰 ∖ (푆1 ∪ 푆2) | 푒 ∈ 푆1 ∩ 푆2]
              + Pr[푆 ∉ {푆1, 푆2}] · Pr[푒 ∈ 푆 ∖ (푆1 ∪ 푆2) | 푒 ∈ 푆] · Pr[푆 swaps 푒 with 푆2 | 푒 ∈ 푆 ∖ (푆1 ∪ 푆2)],

where all probabilities are taken over 퐼′ ∼ 풟(퐼*). Next we bound each of the above six terms. Since we choose the sets 푆1, 푆2 randomly, Pr[푆 = 푆2] = 1/푚. We bound the second term by 1. For the third term, since we pick a matching uniformly at random among all possible (maximum) matchings between 풰 ∖ (푆1 ∪ 푆2) and 푆1 ∩ 푆2, by symmetry, the probability that a certain element 푒 ∈ 푆1 ∩ 푆2 is in the matching is (by properties (b) and (c) of median instances)

|풰 ∖ (푆1 ∪ 푆2)| / |푆1 ∩ 푆2| ≤ (18 log 푚)/(푛/8) = 144 log 푚 / 푛.

We bound the fourth term by 1. To compute the fifth term, let 푑푒 denote the number of sets in ℱ that do not contain 푒. By property (f) of median instances, the probability that 푒 ∈ 푆 is in 푆 ∖ (푆1 ∪ 푆2) given that 푆 ∉ {푆1, 푆2} is at most

푑푒(푑푒 − 1) / ((푚 − 1)(푚 − 2)) ≤ (36푚^2 · (log 푚)/푛) / (푚^2/2) = 72 log 푚 / 푛.

Finally, for the last term, note that by symmetry, each pair of matched elements 푒푒′ is picked by GenModifiedInst equiprobably. Thus, for any 푒 ∈ 푆 ∖ (푆1 ∪ 푆2), the probability that each element 푒′ ∈ 푆1 ∩ 푆2 is matched to 푒 is 1/|푆1 ∩ 푆2|. By properties (c)–(e) of median instances, the last term is at most

∑_{푒′ ∈ (푆1 ∩ 푆2) ∖ 푆} Pr[푒푒′ ∈ ℳ] · 1/|Candidate(푒, 푒′)|
  = |(푆1 ∩ 푆2) ∖ 푆| · (1/|푆1 ∩ 푆2|) · (1/|Candidate(푒, 푒′)|)
  ≤ 6√(푛 log 푚) · (1/(푛/8)) · (1/((푚/4)√(9 log 푚/푛))) = 64/푚.

Therefore,

푃Elt−Set(푒, 푆) ≤ (1/푚) · 1 · (144 log 푚 / 푛) + 1 · (72 log 푚 / 푛) · (64/푚) ≤ 4800 log 푚 / (푚푛). □

Proof of Theorem 4.3.2. Now we consider a median instance 퐼* and its corresponding family of modified instances 풟(퐼*). To prove the promised lower bound for randomized protocols distinguishing 퐼* and 퐼′ ∼ 풟(퐼*), we apply Yao's principle and instead show that no deterministic algorithm 풜 may determine whether the input is 퐼* or 퐼′ ∼ 풟(퐼*) with success probability at least 2/3 using 푟 = 표(푚푛/log 푚) queries. Recall that if 풜's query-answer history ⟨(푞1, 푎1), . . . , (푞푟, 푎푟)⟩ when executed on 퐼′ is the same as that of 퐼*, then 풜 must unavoidably return a wrong decision for the probability mass corresponding to 퐼′. We bound the probability of this event as follows.

Lemma 4.3.10. Let 푄 be the set of queries made by 풜 on 퐼*. Let 퐼′ ∼ 풟(퐼*), where 퐼* is a given median instance. Then the probability that 풜 returns different outputs on 퐼* and 퐼′ is at most (4800 log 푚 / (푚푛)) |푄|.

Proof: Let 풜(퐼) denote the algorithm's output for input instance 퐼 (whether the given instance is 퐼* or drawn from 풟(퐼*)). For each query 푞, let ans퐼(푞) denote the answer of 퐼 to query 푞. Observe that since 풜 is deterministic, if all of the oracle's answers to its previous queries are the same, then it must make the same next query. Combining this fact with the union bound, we may upper bound the probability that 풜 returns different outputs on 퐼* and 퐼′ ∼ 풟(퐼*) as follows:

Pr[풜(퐼*) ≠ 풜(퐼′)] ≤ ∑_{푡=1}^{|푄|} Pr[ans_{퐼*}(푞푡) ≠ ans_{퐼′}(푞푡)].

For each 푞 ∈ 푄, let 푆(푞) and 푒(푞) denote respectively the set and element queried by 푞. Applying Lemma 4.3.9, we obtain

Pr[풜(퐼*) ≠ 풜(퐼′)] ≤ ∑_{푡=1}^{|푄|} Pr[ans_{퐼*}(푞푡) ≠ ans_{퐼′}(푞푡)] ≤ ∑_{푡=1}^{|푄|} 푃Elt−Set(푒(푞푡), 푆(푞푡)) ≤ (4800 log 푚 / (푚푛)) |푄|. □

Proof of Theorem 4.3.2. If 풜 does not output correctly on 퐼*, the probability of success of 풜 is less than 1/2; thus, we can assume that 풜 returns the correct answer on 퐼*. This implies that 풜 returns an incorrect solution on the fraction of 퐼′ ∼ 풟(퐼*) for which 풜(퐼*) = 풜(퐼′). Now recall that the distribution in which we apply Yao's principle consists of 퐼* with probability 1/2, and an instance drawn from 풟(퐼*) also with probability 1/2. Then over this distribution, by Lemma 4.3.10,

Pr[풜 succeeds] ≤ 1 − (1/2) Pr_{퐼′∼풟(퐼*)}[풜(퐼*) = 풜(퐼′)] ≤ 1 − (1/2)(1 − (4800 log 푚 / (푚푛)) |푄|) = 1/2 + (2400 log 푚 / (푚푛)) |푄|.

Thus, if the number of queries made by 풜 is less than 푚푛/(14400 log 푚), then the probability that 풜 returns the correct answer over the input distribution is less than 2/3 and the proof is complete. □

4.3.2. The Set Cover Lower Bound for Large Optimal Value 푘.

Our construction of the median instance 퐼* and its associated distribution 풟(퐼*) of modified instances also leads to a lower bound of Ω̃(푚푛/푘) for the problem of computing an approximate solution to Set Cover. This lower bound matches the performance of our algorithm for large optimal value 푘 and shows that it is tight for some range of values of 푘, although it only applies to sufficiently small approximation factors 훼 ≤ 1.01.

Proof overview. We construct a distribution over compounds: a compound is a Set Cover instance that consists of 푡 = Θ(푘) smaller instances 퐼1, . . . , 퐼푡, where each of these 푡 instances is either the median instance 퐼* or a random modified instance drawn from 풟(퐼*). By our construction, a large majority of our distribution is composed of compounds that contain at least 0.2푡 modified instances 퐼푖 such that any deterministic algorithm 풜 must fail to distinguish 퐼푖 from 퐼* when it is only allowed to make a small number of queries. A deterministic 풜 can safely cover these modified instances with three sets, incurring a cost (sub-optimality) of 0.2푡. Still, 풜 may choose to cover such an 퐼푖 with two sets to reduce its cost, but it then must err on a different compound where 퐼푖 is replaced with 퐼*. We track the trade-off between the amount of cost that 풜 saves on these compounds by covering these 퐼푖's with two sets, and the amount of error its scheme incurs on other compounds. 풜 is allowed a small probability 훿 of making errors, which we then use to upper-bound the expected cost that 풜 may save, and conclude that 풜 still incurs an expected cost of 0.1푡 overall. We apply Yao's principle (for algorithms with errors) to obtain that randomized algorithms also incur an expected cost of 0.05푡, on compounds with optimal solution size 푘 ∈ [2푡, 3푡], yielding the impossibility result for computing solutions with approximation factor 훼 = (푘 + 0.05푡)/푘 > 1.01 when given insufficient queries. Next, we provide a high-level overview of the lower bound argument.

Compounds. Consider the median instance 퐼* and its associated distribution 풟(퐼*) of modified instances for Set Cover with 푛 elements and 푚 sets, and let 푡 = Θ(푘) be a positive integer parameter. We define a compound I = I(퐼1, 퐼2, . . . , 퐼푡) as a set structure instance consisting of 푡 median or modified instances 퐼1, 퐼2, . . . , 퐼푡, forming a set structure (풰^푡, ℱ^푡) of 푛′ = 푛푡 elements and 푚′ = 푚푡 sets, in such a way that each instance 퐼푖 occupies separate elements and sets. Since the optimal solution to each instance 퐼푖 is 3 if 퐼푖 = 퐼*, and 2 if 퐼푖 is any modified instance, the optimal solution for the compound is 2푡 plus the number of occurrences of the median instance; this optimal objective value is always Θ(푘).

Random distribution over compounds. Employing Yao's principle, we construct a distribution D of compounds I(퐼1, 퐼2, . . . , 퐼푡): it will be applied against any deterministic algorithm 풜 for computing an approximate minimum set cover, which is allowed to err on at most a 훿-fraction of the compounds from the distribution (for some small constant 훿 > 0). For each 푖 ∈ [푡], we pick 퐼푖 = 퐼* with probability 푐/(푚 choose 2), where 푐 > 2 is a sufficiently large constant. Otherwise, we simply draw a random modified instance 퐼푖 ∼ 풟(퐼*). We aim to show that, in expectation over D, 풜 must output a solution of size Θ(푡) more than the optimal set cover size of the given instance I ∼ D.

풜 frequently leaves many modified instances undetected. Consider an instance I containing at least 0.95푡 modified instances. These instances constitute at least a 0.99-fraction of D: the expected number of occurrences of the median instance in each compound is only 푐/(푚 choose 2) · 푡 = 푂(푡/푚^2), so by Markov's inequality, the probability that there are more than 0.05푡 median instances is at most 푂(1/푚^2) < 0.01 for large 푚. We make use of the following useful lemma, whose proof is deferred to the end of Section 4.3.2. In what follows, we say that the algorithm "distinguishes" or "detects the difference" between 퐼푖 and 퐼* if it makes a query that induces different answers, and thus may deduce that one of 퐼푖 or 퐼* cannot be the input instance. In particular, if 퐼푖 = 퐼*, then detecting the difference between them would be impossible.

Lemma 4.3.11. Fix 푀 ⊆ [푡] and consider the distribution over compounds I(퐼1, . . . , 퐼푡) with 퐼푖 ∼ 풟(퐼*) for 푖 ∈ 푀 and 퐼푖 = 퐼* for 푖 ∉ 푀. If 풜 makes at most 표(푚푛푡/log 푚) queries to I, then it may detect the differences between 퐼* and at least 0.75푡 of the modified instances {퐼푖}_{푖∈푀} with probability at most 0.01.

We apply this lemma for any |푀| ≥ 0.95푡 (although the statement holds for any 푀, even vacuously for |푀| < 0.75푡). Thus, for a 0.99 · 0.99 > 0.98-fraction of D, 풜 fails to identify, for at least 0.95푡 − 0.75푡 = 0.2푡 modified instances 퐼푖 in I, whether 퐼푖 is a median instance or a modified instance. Observe that the query-answer history of 풜 on such I would not change if we were to replace any combination of these 0.2푡 modified instances by copies of 퐼*. Consequently, if the algorithm were to correctly cover I by using two sets for some of these 퐼푖, it must unavoidably err (return a non-cover) on the compound where these 퐼푖's are replaced by copies of the median instance.

Charging argument. We call a compound I tough if 풜 does not err on I, and 풜 fails to detect at least 0.2푡 modified instances; denote by D_tough the conditional distribution of D restricted to tough instances. For tough I, let cost(I) denote the number of modified instances 퐼푖 that the algorithm decides to cover with three sets. That is, for each tough compound I, cost(I) measures how far the solution returned by 풜 is from the optimal set cover size. Then, there are at least 0.2푡 − cost(I) modified instances 퐼푖 that 풜 chooses to cover with only two sets despite not being able to verify whether 퐼푖 = 퐼* or not. Let 푅I denote the set of the indices of these modified instances, so |푅I| = 0.2푡 − cost(I). By doing so, 풜 then errs on the replaced compound 푟(I, 푅I), denoting the compound similar to I, except that each modified instance 퐼푖 for 푖 ∈ 푅I is replaced by 퐼*. In this event, we say that the tough compound I charges the replaced compound 푟(I, 푅I) via 푅I. Recall that the total error of 풜 is 훿: this quantity upper-bounds the total probability mass of charged instances, which we will then manipulate to obtain a lower bound on E_{I∼D}[cost(I)].

Instances must share optimal solutions for 푅 to charge the same replaced instance. Observe that many tough instances may charge the same replaced instance: we must handle these duplicities. First, consider two tough instances I^1 ≠ I^2 charging the same I_r = 푟(I^1, 푅) = 푟(I^2, 푅) via the same 푅 = 푅_{I^1} = 푅_{I^2}. As I^1 ≠ I^2 but 푟(I^1, 푅) = 푟(I^2, 푅), these tough instances differ on some modified instances with indices in 푅. Nonetheless, the query-answer histories of 풜 operating on I^1 and I^2 must be the same, as their instances in 푅 are both indistinguishable from 퐼* by the deterministic 풜. Since 풜 does not err on tough instances (by definition), both tough I^1 and I^2 must share the same optimal set cover on every instance in 푅. Consequently, for each fixed 푅, only tough instances that have the same optimal solution for the modified instances in 푅 may charge the same replaced instance via 푅.

Charged instance is much heavier than the charging instances combined. By our construction of I(퐼1, . . . , 퐼푡) drawn from D, Pr[퐼푖 = 퐼*] = 푐/(푚 choose 2) for the median instance. On the other hand, ∑_{푗=1}^{ℓ} Pr[퐼푖 = 퐼^푗] ≤ (1 − 푐/(푚 choose 2)) · (1/(푚 choose 2)) < 1/(푚 choose 2) for modified instances 퐼^1, . . . , 퐼^ℓ sharing the same optimal set cover, because they are all modified instances constructed to have the two sets chosen by GenModifiedInst as their optimal set cover: each pair of sets is chosen uniformly with probability 1/(푚 choose 2). Thus, the probability that 퐼* is chosen is more than 푐 times the total probability that any 퐼^푗 is chosen. Generalizing this observation, we consider tough instances I^1, I^2, . . . , I^ℓ charging the same I_r via 푅, and bound the difference in the probabilities that I_r and any I^푗 are drawn. For each index in 푅, it is more than 푐 times more likely for D to draw the median instance than any modified instance with a fixed optimal solution. Then, for the replaced compound I_r on which 풜 errs, 푝(I_r) ≥ 푐^{|푅|} · ∑_{푗=1}^{ℓ} 푝(I^푗) (where 푝 denotes the probability mass in D, not in D_tough). In other words, the probability mass of the replaced instance charged via 푅 is always at least 푐^{|푅|} times the total probability mass of the charging tough instances.

Bounding the expected cost using 훿. In our charging argument by tough instances above, we only bound the amount of charges on the replaced instances via a fixed 푅. As there are up to 2^푡 choices for 푅, we scale down the total amount charged to a replaced instance by a factor of 2^푡, so that ∑_{tough I} 푐^{|푅_I|} 푝(I)/2^푡 lower bounds the total probability mass of the replaced instances on which 풜 errs.

Let us first focus on the conditional distribution D_tough restricted to tough instances. Recall that at least a (0.98 − 훿)-fraction of the compounds in D are tough: 풜 fails to detect differences between 0.2푡 modified instances and the median instance with probability 0.98, and among these compounds, 풜 may err on at most a 훿-fraction. So in the conditional distribution D_tough over tough instances, the individual probability mass is scaled up to 푝^tough(I) ≤ 푝(I)/(0.98 − 훿). Thus,

∑_{tough I} 푐^{|푅_I|} 푝(I) / 2^푡 ≥ ∑_{tough I} 푐^{|푅_I|} (0.98 − 훿) 푝^tough(I) / 2^푡 = (0.98 − 훿) E_{I∼D_tough}[푐^{|푅_I|}] / 2^푡.

As the probability mass above cannot exceed the total allowed error 훿, we have

(훿/(0.98 − 훿)) · 2^푡 ≥ E_{I∼D_tough}[푐^{|푅_I|}] ≥ E_{I∼D_tough}[푐^{0.2푡 − cost(I)}] ≥ 푐^{0.2푡 − E_{I∼D_tough}[cost(I)]},

where Jensen's inequality is applied in the last step above. So,

E_{I∼D_tough}[cost(I)] ≥ 0.2푡 − (푡 + log(훿/(0.98 − 훿)))/log 푐 = (0.2 − 1/log 푐) 푡 − log(훿/(0.98 − 훿))/log 푐 ≥ 0.11푡,

for sufficiently large 푐 (and 푚) when choosing 훿 = 0.02. We now return to the expected cost over the entire distribution D. For simplicity, define cost(I) = 0 for any non-tough I. This yields E_{I∼D}[cost(I)] ≥ (0.98 − 훿) E_{I∼D_tough}[cost(I)] ≥ (0.98 − 훿) · 0.11푡 ≥ 0.1푡, establishing the expected cost of any deterministic 풜 with probability of error at most 0.02 over D.

Establishing the lower bound for randomized algorithms. Lastly, we apply Yao's principle⁴ to obtain that, for any randomized algorithm with error probability 훿/2 = 0.01, its expected cost under the worst input is at least (1/2) · 0.1푡 = 0.05푡. Recall now that our cost here lower-bounds the sub-optimality of the computed set cover (that is, the algorithm uses at least cost more sets to cover the elements than the optimal solution does). Since our input instances have optimal solution 푘 ∈ [2푡, 3푡] and the randomized algorithm returns a solution with cost at least 0.05푡 in expectation, it achieves an approximation factor of no better than 훼 = (푘 + 0.05푡)/푘 > 1.01 with 표(푚푛푡/log 푚) queries. Theorem 4.3.4 then follows, noting the substitution of our problem size: 푚푛푡/log 푚 = (푚′/푡)(푛′/푡)푡/log(푚′/푡) = Θ(푚′푛′/(푘′ log 푚′)).

⁴Here we use the Monte Carlo version where the algorithm may err, and use cost instead of the time complexity as our measure of performance. See, e.g., Proposition 2.6 in [139] and the description therein.

Proof of Lemma 4.3.11. First, we recall the following consequence of Lemma 4.3.10 for distinguishing between 퐼* and a random 퐼′ ∼ 풟(퐼*).

Corollary 4.3.12. Let 푞 be the number of queries made by 풜 on 퐼푖 ∼ 풟(퐼*) over 푛 elements and 푚 sets, where 퐼* is a median instance. Then the probability that 풜 detects a difference between 퐼푖 and 퐼* in one of its queries is at most 4800푞 log 푚 / (푚푛).

Marbles and urns. Fix a compound I(퐼1, . . . , 퐼푡). Let 푠 = 푚푛/(4800 log 푚), and then consider the following, entirely different, scenario. Suppose that we have 푡 urns, where each urn contains 푠 marbles. In the 푖th urn, in case 퐼푖 is a modified instance, we put one red marble and 푠 − 1 white marbles; otherwise, if 퐼푖 = 퐼*, we put in 푠 white marbles. Observe that the probability of obtaining a red marble by drawing 푞 marbles from a single urn without replacement is exactly 푞/푠 (for 푞 ≤ 푠). Now, we will relate the probability of drawing red marbles to the probability of successfully distinguishing instances. We emphasize that we are only comparing the probabilities of events for the sake of analysis, and we do not imply or suggest any direct analogy between the events themselves.

Corollary 4.3.12 above bounds the probability that the algorithm successfully distinguishes a modified instance 퐼푖 from 퐼* by 4800푞 log 푚 / (푚푛) = 푞/푠. Then, the probability of distinguishing between 퐼푖 and 퐼* using 푞 queries is bounded from above by the probability of obtaining a red marble after drawing 푞 marbles from an urn. Consequently, the probability that the algorithm distinguishes 3푡/4 instances is bounded from above by the probability of drawing the red marbles from at least 3푡/4 urns. Hence, to prove that the event of Lemma 4.3.11 occurs with probability at most 0.01, it is sufficient to upper-bound by 0.01 the probability that an algorithm obtains 3푡/4 red marbles.

Consider an instance of 푡 urns; for each urn 푖 ∈ [푡] corresponding to a modified instance 퐼푖, exactly one of its 푠 marbles is red. An algorithm may draw marbles from each urn, one by one without replacement, for potentially up to 푠 times. By the principle of deferred decisions, the red marble is equally likely to appear in any of these 푠 draws, independently of the events for other urns. Thus, we can create a tuple of 푡 random variables 풯 = (푇1, . . . , 푇푡) such that for each 푖 ∈ [푡], 푇푖 is chosen uniformly at random from {1, . . . , 푠}. The variable 푇푖 represents the number of draws required to obtain the red marble in the 푖th urn; that is, only the 푇푖th draw from the 푖th urn finds the red marble of that urn. In case 퐼푖 is a median instance, we simply set 푇푖 = 푠 + 1, indicating that the algorithm never detects any difference, as 퐼푖 and 퐼* are the same instance. We now show the following two lemmas in order to bound the number of red marbles the algorithm may encounter throughout its execution.

Lemma 4.3.13. Let 푏 > 3 be a fixed constant and define 풯high = {푖 | 푇푖 ≥ 푠/푏}. If 푡 ≥ 14푏, then |풯high| ≥ (1 − 2/푏)푡 with probability at least 0.99.

Proof: Let 풯low = {1, . . . , 푡} ∖ 풯high. Notice that for the 푖th urn, Pr[푖 ∈ 풯low] < 1/푏 independently of the other urns, and thus |풯low| is stochastically dominated by B(푡, 1/푏), the binomial distribution with 푡 trials and success probability 1/푏. Applying the Chernoff bound, we obtain

Pr[|풯low| ≥ 2푡/푏] ≤ 푒^{−푡/(3푏)} < 0.01.

Hence, |풯high| ≥ 푡 − 2푡/푏 = (1 − 2/푏)푡 with probability at least 0.99, as desired. □

Lemma 4.3.14. If the total number of draws made by the algorithm is less than (1 − 3/푏) · 푠푡/푏, then with probability at least 0.99, the algorithm will not obtain red marbles from at least 푡/푏 urns.

Proof: If the total number of such draws is less than (1 − 3/푏) · 푠푡/푏, then the number of draws from at least 3푡/푏 urns is less than 푠/푏 each. Assume the condition of Lemma 4.3.13: for at least (1 − 2/푏)푡 urns, 푇푖 ≥ 푠/푏. That is, the algorithm will not encounter a red marble if it makes less than 푠/푏 draws from such an urn. Then, there are at least 푡/푏 urns with 푇푖 ≥ 푠/푏 from which the algorithm makes less than 푠/푏 draws, and thus does not obtain a red marble. Overall this event holds with probability at least 0.99 due to Lemma 4.3.13. □

We substitute 푏 = 4 and assume sufficiently large 푡. Suppose that the deterministic algorithm makes less than (1 − 3/4) · 푠푡/4 = 푠푡/16 queries; then for a 0.99 fraction of all possible tuples 풯, there are 푡/4 instances 퐼푖 for which the algorithm fails to detect their differences from 퐼*: the probability of this event is lower-bounded by that of the event where the red marbles from those corresponding urns 푖 are not drawn. Therefore, the probability that the algorithm makes queries that detect differences between 퐼* and more than 3푡/4 of the instances 퐼푖 is bounded by 0.01, concluding our proof of Lemma 4.3.11. □
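A small Monte Carlo sketch (Python) of the urn experiment behind Lemma 4.3.13; the parameters 푡, 푠, 푏 and the number of repetitions below are arbitrary illustrative values, not tied to the actual problem sizes, and the code only illustrates the strategy-independent bound on |풯low|.

```python
import random

# Monte Carlo illustration of Lemma 4.3.13 (illustrative parameters):
# with T_i uniform on {1, ..., s}, the number of urns whose red marble would
# appear within the first s/b draws rarely reaches 2t/b.

def trial(t, s, b):
    """Return True if at least 2t/b urns have T_i < s/b in one sample."""
    low = sum(1 for _ in range(t) if random.randint(1, s) < s / b)
    return low >= 2 * t / b

def estimate(t=200, s=10_000, b=4, reps=20_000):
    """Empirical probability that |T_low| >= 2t/b; the lemma bounds it by e^{-t/(3b)}."""
    return sum(trial(t, s, b) for _ in range(reps)) / reps

if __name__ == "__main__":
    print(estimate())  # typically far below 0.01 once t >= 14b
```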

4.4. Sub-Linear Algorithms for the Set Cover Problem

In this section, we present two different approximation algorithms for Set Cover with sub-linear query complexity in the oracle model: SmallSetCover and LargeSetCover. Both of our algorithms rely on techniques from the recent developments on Set Cover in the streaming model. However, adapting those techniques to the oracle model requires novel insights and technical development.

Throughout the description of our algorithms, we assume that we have access to a black box subroutine that, given the full Set Cover instance (where all members of all sets are revealed), returns a 휌-approximate solution.⁵ The first algorithm (SmallSetCover) returns an (훼휌 + 휀)-approximate solution of the instance using 푂̃((1/휀)(푚(푛/푘)^{1/(훼−1)} + 푛푘)) queries, while the second algorithm (LargeSetCover) achieves an approximation factor of (휌 + 휀) using 푂̃(푚푛/(푘휀^2)) queries, where 푘 is the size of the minimum set cover. These algorithms can be combined so that the number of queries of the algorithm becomes asymptotically the minimum of the two:

Theorem 4.4.1. There exists a randomized algorithm for Set Cover in the oracle model that w.h.p.⁶ computes an 푂(휌 log 푛)-approximate solution and uses 푂̃(min{푚(푛/푘)^{1/log 푛} + 푛푘, 푚푛/푘}) = 푂̃(푚 + 푛√푚) queries.

Our algorithms use the following two sampling techniques developed for Set Cover in the streaming model [61]: Element Sampling and Set Sampling. The first technique, Element Sampling, states that in order to find a (1 − 훿)-cover of 풰 w.h.p., it suffices to solve Set Cover on a subset of elements of size 푂̃(휌푘 log 푚/훿) picked uniformly at random. It shows that we may restrict our attention to a subproblem with a much smaller number of elements, and our solution to the reduced instance will still cover a good fraction of the elements in the original instance. The next technique, Set Sampling, shows that if we pick ℓ sets uniformly at random from ℱ into the solution, then w.h.p. each element that is not covered by any of the picked sets only occurs in 푂̃(푚/ℓ) sets of ℱ; that is, we are left with a much sparser subproblem to solve. The formal statements of these sampling techniques are as follows. See [61] for the proofs.

⁵The approximation factor 휌 may take on any value between 1 and Θ(log 푛) depending on the computational model one assumes.
⁶An algorithm succeeds with high probability (w.h.p.) if its failure probability can be decreased to 푛^{−푐} for any constant 푐 > 0 without affecting its asymptotic performance, where 푛 denotes the input size.

Lemma 4.4.2 (Element Sampling). Consider an instance of Set Cover on (풰, ℱ) whose optimal cover has size at most 푘. Let 풰smp be a subset of 풰 of size Θ(휌푘 log 푚/훿) chosen uniformly at random, and let 풞smp ⊆ ℱ be a 휌-approximate cover for 풰smp. Then, w.h.p. 풞smp covers at least (1 − 훿)|풰| elements.

Lemma 4.4.3 (Set Sampling). Consider an instance (풰, ℱ) of Set Cover. Let ℱrnd be a collection of ℓ sets picked uniformly at random. Then, w.h.p. ℱrnd covers all elements that appear in Ω(푚 log 푛/ℓ) sets of ℱ.
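The two sampling primitives are simple to state in code. The sketch below (Python) shows how Element Sampling reduces the universe and how Set Sampling picks a random partial cover; offline_set_cover stands in for the 휌-approximate black box solver and is an assumption for illustration, not part of the thesis.

```python
import random

def element_sampling(universe, sets, sample_size, offline_set_cover):
    """Element Sampling: solve Set Cover on a random subset of the universe.

    `offline_set_cover(sub_universe, sets)` is an assumed rho-approximate black box;
    by Lemma 4.4.2 the returned cover w.h.p. covers all but a delta-fraction of
    `universe` when sample_size = Theta(rho * k * log m / delta).
    """
    sample = random.sample(sorted(universe), min(sample_size, len(universe)))
    return offline_set_cover(set(sample), sets)

def set_sampling(universe, sets, ell):
    """Set Sampling: pick ell random sets; by Lemma 4.4.3, w.h.p. every element
    still uncovered afterwards occurs in only O~(m / ell) sets of the family."""
    picked = random.sample(list(sets), min(ell, len(sets)))
    covered = set().union(*picked) if picked else set()
    uncovered = universe - covered
    return picked, uncovered
```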

4.4.1. First Algorithm: small values of 푘

The algorithm of this section is a modified variant of the streaming algorithm for Set Cover in [97] that works in the sub-linear query model. Similarly to the algorithm of [97], our algorithm SmallSetCover considers different guesses of the value of an optimal solution (휀^{−1} log 푛 guesses) and performs the core iterative algorithm IterSetCover for all of them in parallel. For each guess ℓ of the size of an optimal solution, IterSetCover goes through 훼 iterations and, by applying Element Sampling, guarantees that w.h.p. at the end of each iteration the number of uncovered elements is reduced by a factor of 푛^{1/훼}. Hence, after these iterations all elements will be covered. Furthermore, since the number of sets picked in each iteration is at most ℓ, the final solution has at most 휌ℓ sets, where 휌 is the performance of the offline block OfflineSC that IterSetCover uses to solve the reduced instances constructed by Element Sampling.

Although our general approach in IterSetCover is similar to the iterative core of the streaming algorithm for Set Cover, there are challenges that we need to overcome so that it works efficiently in the query model. Firstly, the approach of [97] relies on the ability to test membership for a set-element pair when executing its set filtering subroutine: given a subset S, the algorithm of [97] needs to compute |푆 ∩ S|, which cannot be implemented efficiently in the query model (in the worst case, it requires 푚|S| queries). Instead, here we employ Set Sampling, which w.h.p. guarantees that the number of sets containing any (yet) uncovered element is small.

The next challenge is achieving the 푚(푛/푘)^{1/(훼−1)} + 푛푘 query bound for computing an 훼-approximate solution. As mentioned earlier, both our approach and the algorithm of [97] need to run the algorithm in parallel for different guesses ℓ of the size of an optimal solution. However, since IterSetCover performs 푚(푛/ℓ)^{1/(훼−1)} + 푛ℓ queries, if SmallSetCover invokes IterSetCover with guesses in increasing order, then the query complexity becomes 푚푛^{1/(훼−1)} + 푛푘; on the other hand, if it invokes IterSetCover with guesses in decreasing order, then the query complexity becomes 푚(푛/푘)^{1/(훼−1)} + 푚푛. To solve this issue, SmallSetCover proceeds in two stages: in the first stage, it finds a (log 푛)-estimate of 푘 by invoking IterSetCover using 푚 + 푛푘 queries (assuming guesses are evaluated in increasing order), and then in the second stage it only invokes IterSetCover with approximation factor 훼 in the smaller 푂(log 푛)-approximate region around the (log 푛)-estimate of 푘 computed in the first stage. Thus, in our implementation, besides the desired approximation factor, IterSetCover receives an upper bound and a lower bound on the size of an optimal solution.

Now, we provide a detailed description of IterSetCover. It receives 훼, 휀, 푙 and 푢 as its arguments, and it is guaranteed that the size of an optimal cover of the input instance, 푘, is in [푙, 푢]. Note that the algorithm does not know the value of 푘, and the sampling techniques described above rely on 푘. Therefore, the algorithm needs to find a (1 + 휀)-estimate⁷ of 푘, denoted ℓ. This can be done by trying all powers of (1 + 휀) in [푙, 푢]. The parameter 훼 controls the trade-off between the query complexity and the approximation guarantee that the algorithm achieves. Moreover, we assume that the algorithm has access to a 휌-approximate black box solver of Set Cover.

IterSetCover first performs Set Sampling to cover all elements that occur in Ω̃(푚/ℓ) sets. Then it goes through 훼 − 2 iterations and in each iteration it performs Element Sampling with parameter 훿 = 푂̃((ℓ/푛)^{1/(훼−1)}). By Lemma 4.4.2, after (훼 − 2) iterations, w.h.p. only ℓ(푛/ℓ)^{1/(훼−1)} elements remain uncovered, for which the algorithm finds a cover by invoking the offline set cover solver. The parameters are set so that all (훼 − 1) instances that are required to be solved by the offline set cover solver (the (훼 − 2) instances constructed by Element Sampling and the final instance) are of size 푂̃(푚(푛/ℓ)^{1/(훼−1)}).

In the rest of this section, we show that SmallSetCover w.h.p. returns an almost (휌훼)-approximate solution of Set Cover(풰, ℱ) with query complexity 푂̃(푚(푛/푘)^{1/(훼−1)} + 푛푘), where 푘 is the size of a minimum set cover.

⁷The exact estimate that the algorithm works with is a (1 + 휀/(2휌훼))-estimate.

Theorem 4.4.4. SmallSetCover outputs an (훼휌 + 휀)-approximate solution of Set Cover(풰, ℱ) using 푂̃((1/휀)(푚(푛/푘)^{1/(훼−1)} + 푛푘)) queries w.h.p., where 푘 is the size of an optimal solution of (풰, ℱ).

Algorithm 11 IterSetCover is the main procedure of the SmallSetCover algorithm for the Set Cover problem.
1: procedure IterSetCover(훼, 휀, 푙, 푢)
2:   ◁ Try all (1 + 휀/(2훼휌))-approximate guesses of 푘
3:   for ℓ ∈ {(1 + 휀/(2훼휌))^푖 | log_{1+휀/(2훼휌)} 푙 ≤ 푖 ≤ log_{1+휀/(2훼휌)} 푢} do
4:     solℓ ← collection of ℓ sets picked uniformly at random ◁ Set Sampling
5:     풰rem ← 풰 ∖ ⋃_{푆∈solℓ} 푆 ◁ 푛ℓ EltOf queries
6:     for 푖 = 1 to 훼 − 2 do
7:       S ← sample of 풰rem of size 푂̃(휌ℓ(푛/ℓ)^{1/(훼−1)})
8:       풟 ← OfflineSC(S, ℓ)
9:       if 풟 = null then
10:        break ◁ Try the next value of ℓ
11:      solℓ ← solℓ ∪ 풟
12:      풰rem ← 풰rem ∖ ⋃_{푆∈풟} 푆 ◁ 휌푛ℓ EltOf queries
13:    if |풰rem| ≤ ℓ(푛/ℓ)^{1/(훼−1)} then ◁ Feasibility Test
14:      풟 ← OfflineSC(풰rem, ℓ)
15:      if 풟 ≠ null then
16:        solℓ ← solℓ ∪ 풟
17:        return solℓ

To analyze the performance of SmallSetCover, we first need to analyze the procedures invoked by SmallSetCover: IterSetCover and OfflineSC.

The OfflineSC procedure receives as input a subset of elements S and an estimate ℓ of the size of an optimal cover of S using sets in ℱ. OfflineSC first determines all occurrences of the elements of S in ℱ. Then it invokes a black box subroutine that returns a cover of size at most 휌ℓ (if there exists a cover of size ℓ for S) for the reduced Set Cover instance over S. Moreover, we assume that all subroutines have access to the EltOf and SetOf oracles, as well as to |풰| and |ℱ|.

Lemma 4.4.5. Suppose that each 푒 ∈ S appears in 푂̃(푚/ℓ) sets of ℱ, and assume that there exists a collection of ℓ sets in ℱ that covers S. Then OfflineSC(S, ℓ) returns a cover of S of size at most 휌ℓ using 푂̃(푚|S|/ℓ) queries.

Proof: Since each element of S is contained in 푂̃(푚/ℓ) sets of ℱ, the information required to solve the reduced instance on S can be obtained with 푂̃(푚|S|/ℓ) queries (i.e., 푂̃(푚/ℓ) SetOf queries per element in S). □

Lemma 4.4.6. The cover solℓ constructed by the outer loop of IterSetCover(훼, 휀, 푙, 푢) with parameter ℓ ≥ 푘 w.h.p. covers 풰.

Algorithm 12 OfflineSC(S, ℓ) invokes a black box that returns a cover of size at most 휌ℓ (if there exists a cover of size ℓ for S) for the Set Cover instance that is the projection of ℱ over S.
1: procedure OfflineSC(S, ℓ)
2:   ℱS ← ∅
3:   for all elements 푒 ∈ S do
4:     ℱ푒 ← the collection of sets containing 푒
5:     ℱS ← ℱS ∪ ℱ푒
6:   풟 ← solution of size at most 휌ℓ for Set Cover on (S, ℱS) constructed by a solver
7:     ◁ If there exists no such cover, then 풟 = null
8:   return 풟

Proof (of Lemma 4.4.6): After picking ℓ sets uniformly at random, by Set Sampling (Lemma 5.2.3), w.h.p. each element that is not covered by the sampled sets appears in 푂̃(푚/ℓ) sets of ℱ. Next, by Element Sampling (Lemma 4.4.2 with 훿 = (ℓ/푛)^{1/(훼−1)}), at the end of each inner iteration, w.h.p. the number of uncovered elements decreases by a factor of (ℓ/푛)^{1/(훼−1)}. Thus after at most (훼 − 2) iterations, w.h.p. fewer than ℓ(푛/ℓ)^{1/(훼−1)} elements remain uncovered. Finally, OfflineSC is invoked on the remaining elements; hence, solℓ w.h.p. covers 풰. □

Next we analyze the query complexity and the approximation guarantee of IterSetCover. As we only apply Element Sampling and Set Sampling polynomially many times, all invocations of the corresponding lemmas during an execution of the algorithm succeed w.h.p., so we assume their high probability guarantees for the proofs in the rest of this section.

Lemma 4.4.7. Given that 푙 ≤ 푘 ≤ 푢/(1 + 휀/(2훼휌)), w.h.p. IterSetCover(훼, 휀, 푙, 푢) finds a (휌훼 + 휀)-approximate solution of the input instance using 푂̃((1/휀)(푚(푛/푙)^{1/(훼−1)} + 푛푘)) queries.

Proof: Let ℓ푘 = (1 + 휀/(2훼휌))^{⌈log_{1+휀/(2훼휌)} 푘⌉} be the smallest power of 1 + 휀/(2훼휌) greater than or equal to 푘. Note that it is guaranteed that ℓ푘 ∈ [푙, 푢]. By Lemma 4.4.6, IterSetCover terminates with a guess value ℓ ≤ ℓ푘. In the following we compute the query complexity of a run of IterSetCover with a parameter ℓ ≤ ℓ푘.

The Set Sampling component picks ℓ sets and then updates the set of elements not covered by those sets, 풰rem, using 푂(푛ℓ) EltOf queries. Next, in each iteration of the inner loop, the algorithm samples a subset S of size 푂̃(ℓ(푛/ℓ)^{1/(훼−1)}) from 풰rem. Recall that, by Set Sampling (Lemma 5.2.3), each 푒 ∈ S ⊂ 풰rem appears in at most 푂̃(푚/ℓ) sets. Since each element in 풰rem appears in 푂̃(푚/ℓ) sets, OfflineSC returns a cover 풟 of size at most 휌ℓ using 푂̃(푚(푛/ℓ)^{1/(훼−1)}) SetOf queries (Lemma 4.4.5). By the guarantee of Element Sampling (Lemma 4.4.2), the number of elements in 풰rem that are not covered by 풟 is at most (ℓ/푛)^{1/(훼−1)} |풰rem|. Finally, at the end of each inner loop, the algorithm updates the set of uncovered elements 풰rem by using 푂̃(푛ℓ) EltOf queries. The Feasibility Test, which is passed w.h.p. for ℓ ≤ ℓ푘, ensures that the final run of OfflineSC performs 푂̃(푚(푛/ℓ)^{1/(훼−1)}) SetOf queries. Hence, the total number of queries performed in each iteration of the outer loop of IterSetCover with parameter ℓ ≤ ℓ푘 is 푂̃(푚(푛/ℓ)^{1/(훼−1)} + 푛ℓ).

By Lemma 4.4.6, if ℓ푘 ≤ 푢, then the outer loop of IterSetCover is executed for 푙 ≤ ℓ ≤ ℓ푘 before it terminates. Thus, the total number of queries made by IterSetCover is:

∑_{푖=⌈log_{1+휀/(2훼휌)} 푙⌉}^{log_{1+휀/(2훼휌)} ℓ푘} 푂̃( 푚 (푛/(1 + 휀/(2훼휌))^푖)^{1/(훼−1)} + 푛(1 + 휀/(2훼휌))^푖 )
  = 푂̃( 푚 (푛/푙)^{1/(훼−1)} log_{1+휀/(2훼휌)}(ℓ푘/푙) + 푛ℓ푘/(휀/(휌훼)) )
  = 푂̃( (1/휀)( 푚(푛/푙)^{1/(훼−1)} + 푛푘 ) ).

Now, we show that the number of sets returned by IterSetCover is not more than (훼휌 + 휀)ℓ푘. Set Sampling picks ℓ sets and each run of OfflineSC returns at most 휌ℓ sets. Thus the size of the solution returned by IterSetCover is at most (1 + (훼 − 1)휌)ℓ푘 < (훼휌 + 휀)푘. □

Next, we prove the main theorem of the section.

Algorithm 13 The description of the SmallSetCover algorithm.
1: procedure SmallSetCover(훼, 휀)
2:   sol ← IterSetCover(log 푛, 1, 1, 푛) ◁ Find a 휌 log 푛 estimate of 푘
3:   푘′ ← |sol|
4:   return IterSetCover(훼, 휀, ⌊푘′/(휌 log 푛)⌋, ⌈푘′(1 + 휀/(2훼휌))⌉)

Proof of Theorem 4.4.4. The algorithm SmallSetCover first finds a (휌 log 푛)-approximate solution sol of Set Cover(풰, ℱ) with 푂̃(푚 + 푛푘) queries by calling IterSetCover(log 푛, 1, 1, 푛). Having 푘 ≤ 푘′ = |sol| ≤ (휌 log 푛)푘, the algorithm calls IterSetCover with 훼 as the approximation factor and [⌊푘′/(휌 log 푛)⌋, ⌈푘′(1 + 휀/(2훼휌))⌉] as the range containing 푘. By Lemma 4.4.7, the second call to IterSetCover in SmallSetCover returns an (훼휌 + 휀)-approximate solution of Set Cover(풰, ℱ) using the following number of queries:

푂̃( (1/휀)( 푚 (푛/(푘/(휌 log 푛)))^{1/(훼−1)} + 푛푘 ) ) = 푂̃( (1/휀)( 푚(푛/푘)^{1/(훼−1)} + 푛푘 ) ). □

4.4.2. Second Algorithm: large values of 푘

The second algorithm, LargeSetCover, works strictly better than SmallSetCover for large values of 푘 (푘 ≥ √푚). The advantage of LargeSetCover is that it does not need to update the set of uncovered elements at any point, and it thereby avoids the additive 푛푘 term in the query complexity bound; the result of Section 4.5 suggests that the 푛푘 term may be unavoidable if one wishes to maintain the uncovered elements. Note that the guarantee of LargeSetCover is that at the end of the algorithm, w.h.p. the ground set 풰 is covered.

The algorithm LargeSetCover, given in Algorithm 14, first randomly picks 휀ℓ/3 sets. By Set Sampling (Lemma 5.2.3), w.h.p. every element that occurs in Ω̃(푚/(휀ℓ)) sets of ℱ will be covered by the picked sets. It then solves the Set Cover instance over the elements that occur in 푂̃(푚/(휀ℓ)) sets of ℱ by an offline solver of Set Cover using 푂̃(푚/(휀ℓ)) queries; note that this set of elements may include some already covered elements. In order to get the promised query complexity, LargeSetCover enumerates the guesses ℓ of the size of an optimal set cover in decreasing order. The algorithm returns feasible solutions for ℓ ≥ 푘, and once it cannot find a feasible solution for ℓ, it returns the solution constructed for the previous guess of 푘, i.e., ℓ(1 + 휀/(3휌)). Since LargeSetCover performs Set Sampling for 푂̃(휀^{−1}) iterations, w.h.p. the total query complexity of LargeSetCover is 푂̃(푚푛/(푘휀^2)). Note that testing whether the number of occurrences of an element is 푂̃(푚/(휀ℓ)) only requires a single query, namely SetOf(푒, 푐푚 log 푛/(휀ℓ)). We now prove the desired performance of LargeSetCover.

Lemma 4.4.8. LargeSetCover returns a (휌 + 휀)-approximate solution of Set Cover(풰, ℱ) w.h.p.

Proof: The algorithm LargeSetCover tries to construct set covers of decreasing sizes until

Algorithm 14 A (휌 + 휀)-approximation algorithm for the Set Cover problem. We assume that the algorithm has access to the EltOf and SetOf oracles for Set Cover(풰, ℱ), as well as to |풰| and |ℱ|.
1: procedure LargeSetCover(휀)
2:   ◁ try all (1 + 휀/(3휌))-approximate guesses of 푘
3:   for ℓ ∈ {(1 + 휀/(3휌))^푖 | 0 ≤ 푖 ≤ log_{1+휀/(3휌)} 푛} in decreasing order do
4:     rndℓ ← collection of 휀ℓ/3 sets picked uniformly at random ◁ Set Sampling
5:     ℱrare ← ∅; S ← ∅ ◁ rare elements and the sets containing them
6:     for all 푒 ∈ 풰 do
7:       if 푒 appears in < 푐푚 log 푛/(휀ℓ) sets then ◁ size test: SetOf(푒, 푐푚 log 푛/(휀ℓ))
8:         ℱ푒 ← collection of sets containing 푒 ◁ 푂̃(푚/(휀ℓ)) SetOf queries
9:         ℱrare ← ℱrare ∪ ℱ푒
10:        S ← S ∪ {푒}
11:    풟 ← solution of Set Cover(S, ℱrare) returned by a 휌-approximation algorithm
12:    if |풟| ≤ 휌ℓ then
13:      sol ← rndℓ ∪ 풟
14:    else
15:      return sol ◁ solution for the previous value of ℓ

it fails. Clearly, if 푘 ≤ ℓ, then the black box algorithm finds a cover of size at most 휌ℓ for any subset of 풰, because 푘 sets are sufficient to cover 풰. In other words, the algorithm does not terminate with ℓ ≥ 푘. Moreover, since the algorithm terminates when ℓ is smaller than 푘, the size of the set cover found by LargeSetCover is at most (휀/3 + 휌)(1 + 휀/(3휌))ℓ < (휀/3 + 휌)(1 + 휀/(3휌))푘 < (휌 + 휀)푘. □

Lemma 4.4.9. The number of queries made by LargeSetCover is 푂̃(푚푛/(푘휀^2)).

Proof: The value of ℓ in any successful iteration of the algorithm is greater than 푘/(휌 + 휀); otherwise, the size of the solution constructed by the algorithm is at most (휌 + 휀)ℓ < 푘, which is a contradiction.

Set Sampling guarantees that w.h.p. each uncovered element appears in 푂̃(푚/(휀ℓ)) sets, and thus the algorithm needs to perform 푂̃(푚푛/(휀ℓ)) SetOf queries to construct ℱrare. Moreover, the number of queries required in the size test step is 푂(푛), because we only need one SetOf query per element in 풰. Thus, the query complexity of LargeSetCover(휀) is bounded by

∑_{푖=log_{1+휀/(3휌)} (푘/(휌+휀))}^{log_{1+휀/(3휌)} 푛} 푂̃( 푛 + 푚푛/(휀(1 + 휀/(3휌))^푖) ) = 푂̃( (푛 + 푚푛/(휀푘)) log_{1+휀/(3휌)}(푛/푘) ) = 푂̃( 푚푛/(푘휀^2) ). □
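For contrast with the IterSetCover sketch above, the following is a compact sketch (Python) of the LargeSetCover loop; set_of_count, sets_containing, and offline_solver are assumed oracle and solver hooks for illustration, not part of the thesis, and the constant 푐 and polylogarithmic factors are dropped.

```python
import math
import random

def large_set_cover(universe, sets, eps, rho, m, offline_solver,
                    set_of_count, sets_containing):
    """Sketch of LargeSetCover: enumerate guesses ell in decreasing order,
    set-sample eps*ell/3 sets, solve the instance restricted to rare elements,
    and return the last feasible solution once a guess fails."""
    base = 1 + eps / (3 * rho)
    sol = None
    guesses = [base ** i for i in range(int(math.log(len(universe), base)) + 1)]
    for ell in reversed(guesses):
        rnd = random.sample(list(sets), min(int(eps * ell / 3) + 1, len(sets)))
        threshold = m * math.log(len(universe)) / (eps * ell)
        rare = [e for e in universe if set_of_count(e) < threshold]  # one SetOf query each
        family = {S for e in rare for S in sets_containing(e)}       # O~(m/(eps*ell)) queries per rare element
        cover = offline_solver(set(rare), family, ell)               # rho-approximate black box
        if cover is not None and len(cover) <= rho * ell:
            sol = set(rnd) | set(cover)
        else:
            return sol   # solution built for the previous (larger) guess
    return sol
```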

4.5. Lower Bound for the Cover Verification Problem

In this section, we give a tight lower bound on a feasibility variant of the Set Cover problem, which we refer to as Cover Verification. In Cover Verification(풰, ℱ, 푘), besides a collection ℱ of 푚 sets and 푛 elements 풰, we are given the indices of 푘 sets ℱ푘 ⊆ ℱ, and the goal is to determine whether they cover the whole universe 풰 or not. We note that, throughout this section, the parameter 푘 is a candidate for, but not necessarily the value of, the size of the minimum set cover.

A naive approach for this decision problem is to query all elements in the given 푘 sets and then check whether they cover 풰 or not; this approach requires 푂(푛푘) queries. However, in what follows we show that this approach is tight and no randomized protocol can decide whether the given 푘 sets cover the whole universe with probability of success at least 0.9 using 표(푛푘) queries.

Theorem 4.5.1. Any (randomized) algorithm for deciding whether a given collection of 푘 = Ω(log 푛) sets covers all elements with probability of success at least 0.9 requires Ω(푛푘) queries.

Proof: Observe that according to our instance construction, the algorithm may verify, with a single query, whether a certain swap occurs in a certain slab. Namely, it is sufficient to query an entry of EltOf or SetOf that would have been modified by that swap, and check whether it is actually modified or not. For simplicity, we assume that the algorithm has the knowledge of our construction. Further, without loss of generality, the algorithm does not make multiple queries about the same swap, or make a query that does not correspond to any swap.

We employ Yao's principle as follows: to prove a lower bound for randomized algorithms, we show a lower bound for any deterministic algorithm on a fixed distribution of input instances. Let 푠 = 푛 − 푘 be the number of possible swaps in each slab; assume 푠 = Θ(푛). We define our distribution of instances as follows: each of the 푠^푘 possible Yes instances occurs with probability 1/(2푠^푘), and each of the 푘푠^{푘−1} possible No instances occurs with probability 1/(2푘푠^{푘−1}). Equivalently speaking, we create a random Yes instance by making one swap on each basic slab. Then we make a coin flip: with probability 1/2 we pick a random slab and undo the swap on that slab to obtain a No instance; otherwise we leave it as a Yes instance. To prove by contradiction, assume there exists a deterministic algorithm that solves the

Cover Verification problem over this distribution of instances with 푟 = 표(푠푘) queries. Consider the Yes instances portion of the distribution, and observe that we may alternatively interpret the random process generating them as follows. For each slab, one of its 푠 possible swaps is chosen uniformly at random. This condition again follows the scenario considered in Section 4.3.2: we are given 푘 urns (slabs), each consisting of 푠 marbles (possible swap locations), and aim to draw the red marble (swapped entry) from a large fraction of these urns. Following the proof of Lemmas 4.3.13-4.3.14, we obtain that if the total number of queries made by the algorithm is less than (1 − 3/푏) · 푠푘/푏, then with probability at least 0.99, the algorithm will not see any swaps from at least 푘/푏 slabs.

Then, consider the corresponding No instances obtained by undoing the swap in one of the slabs of the Yes instance. Suppose that the deterministic algorithm makes less than (1 − 3/푏) · 푠푘/푏 queries; then for a 0.99 fraction of all possible tuples 풯, the output of the Yes instance is the same as the output of a 1/푏 fraction of No instances, namely when the slab containing no swap is one of the 푘/푏 slabs in which the algorithm has not detected a swap in the corresponding Yes instance; the algorithm must answer incorrectly on half of the corresponding weight in our distribution of input instances. Thus the probability of success for any algorithm with less than (1 − 3/푏) · 푠푘/푏 queries is at most

1 − Pr[|풯high| ≥ (1 − 2/푏)푘] · (1/푏) · (1/2) ≤ 1 − 0.495/푏 < 0.9,

for a sufficiently small constant 푏 > 3 (e.g., 푏 = 4). As 푠 = Θ(푛), by Yao's principle this implies the lower bound of Ω(푛푘) for the Cover Verification problem. □

While this lower bound does not directly lead to a lower bound on Set Cover, it suggests that verifying the feasibility of a solution may be even more costly than finding the approximate solution itself; any algorithm bypassing this Ω(푛푘) lower bound may not solve Cover Verification as a subroutine.

We prove our lower bound by designing Yes and No instances that are hard to distinguish, such that for a Yes instance the union of the given 푘 sets is 풰, while for a No instance their union covers only 푛 − 1 elements. Each Yes instance is indistinguishable from a good fraction of the No instances. Thus any algorithm must unavoidably answer incorrectly on half of these fractions, and fail to reach the desired probability of success.

4.5.1. Underlying Set Structure

Our instance contains 푛 sets and 푛 elements (so 푚 = 푛), where the first 푘 sets form ℱ푘, the candidate for the set cover we wish to verify. We first consider the incidence matrix representation, in which the rows represent the sets and the columns represent the elements. We focus on the first 푛/푘 elements, and consider a slab, composed of 푛/푘 columns of the incidence matrix. We define a basic slab as the structure illustrated in Figure 4.5.1 (for 푛 = 12 and 푘 = 3), where the cell (푖, 푗) is white if 푒푗 ∈ 푆푖, and is gray otherwise. The rows are divided into blocks of size 푘, where the first block, the query block, contains the rows whose sets we wish to check for coverage; notice that only the last element of the slab is not covered. More specifically, in a basic slab, the query block contains the sets 푆1, . . . , 푆푘, each of which is equal to {푒1, . . . , 푒_{푛/푘−1}}. The subsequent rows form the swapper blocks, each consisting of 푘 sets. The 푟th swapper block consists of the sets 푆_{푟푘+1}, . . . , 푆_{(푟+1)푘}, each of which is equal to {푒1, . . . , 푒_{푛/푘}} ∖ {푒푟}.

We perform one swap in this slab. Consider a parameter (푥, 푦) representing the index of a white cell within the query block. We exchange the color of this white cell with the gray cell on the same row, and similarly exchange the same pair of cells in row 푥 of swapper block 푦. An example is given in Figure 4.5.1; the dashed blue rectangle corresponds to the indices parameterizing possible swaps, and the red squares mark the modified cells. This modification corresponds to a single swap operation; in this example, choosing the index (3, 2) swaps (푒2, 푒4) between 푆3 and 푆9. Observe that there are 푘 × (푛/푘 − 1) = 푛 − 푘 possible swaps on a single slab, and any single swap allows the query sets to cover all 푛/푘 elements of the slab.

Lastly, we may create the full instance by placing all 푘 slabs together, as shown in Figure 4.5.2, shifting the elements' indices as necessary.

Figure 4.5.1: A basic slab and an example of a swapping operation; panel (a) shows a basic slab over 푒1, . . . , 푒4 with the query block 푆1–푆3 and swapper blocks 1–3, and panel (b) shows the result of performing a (3, 2)-swap.

Figure 4.5.2: An example structure of a Yes instance (three slabs over 푒1, . . . , 푒12 and sets 푆1, . . . , 푆12); all elements are covered by the first 3 sets.

The structure of our sets may be specified solely by the swaps made on these slabs. We define the structure of our instances as follows.

1. For a Yes instance, we make one random swap on each slab. This allows the first 푘 sets to cover all elements.

2. For a No instance, we make one random swap on each slab except for exactly one of them. In that slab, the last element is not covered by any of the first 푘 sets.

Now, to properly define an instance, we must describe our structure via EltOf and SetOf. We first create a temporary instance consisting of 푘 basic slabs, where none of the cells are swapped. We create the EltOf and SetOf lists by sorting each list in increasing order of indices. Each instance from the above construction can then be obtained by applying up to 푘 swaps on this temporary instance. Figure 4.6.1 provides a sample realization of a basic slab with EltOf and SetOf, as well as a sample result of applying a swap on this basic slab; these correspond to the incidence matrices in Figure 4.5.1a and Figure 4.5.1b, respectively. Such a construction can be extended to include all 푘 slabs. Observe here that no two distinct swaps modify the same entry; that is, the swaps do not interfere with one another on these two functions. We also note that many entries do not participate in any swap.
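The following is a small sketch (Python) that builds one basic slab in the block layout described above and applies an (푥, 푦)-swap to it; the parameters and representation (sets as lists of 1-based element indices) are illustrative assumptions for this example only, with 푘 dividing 푛.

```python
# A sketch of one basic slab of the Cover Verification construction and an (x, y)-swap on it.

def basic_slab(n, k):
    """Return the slab's sets as lists of element indices (1-based), for a single slab
    on elements e_1, ..., e_{n/k}: the query block S_1..S_k misses only e_{n/k}, and
    the r-th swapper block S_{rk+1}..S_{(r+1)k} misses only e_r."""
    w = n // k                                    # number of elements in the slab
    sets = {}
    for i in range(1, k + 1):                     # query block
        sets[i] = [e for e in range(1, w)]        # {e_1, ..., e_{w-1}}
    for r in range(1, w):                         # swapper blocks 1, ..., w-1
        for i in range(r * k + 1, (r + 1) * k + 1):
            sets[i] = [e for e in range(1, w + 1) if e != r]
    return sets

def apply_swap(sets, n, k, x, y):
    """Swap (x, y): query set S_x trades e_y for e_{n/k}, and set S_{y*k+x}
    (row x of swapper block y) trades e_{n/k} for e_y, so S_1..S_k cover the slab."""
    w = n // k
    sx, sy = x, y * k + x
    sets[sx] = [e for e in sets[sx] if e != y] + [w]
    sets[sy] = [e for e in sets[sy] if e != w] + [y]

slab = basic_slab(12, 3)
apply_swap(slab, 12, 3, 3, 2)                     # the (3, 2)-swap: e_2 <-> e_4 between S_3 and S_9
assert set(slab[3]) == {1, 3, 4} and set(slab[9]) == {1, 2, 3}
covered = set().union(*(slab[i] for i in range(1, 4)))
assert covered == {1, 2, 3, 4}                    # the query block now covers the whole slab
```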

4.6. Generalized Lower Bounds for the Set Cover Problem

In this section we generalize the approach of Section 4.3 and prove our main lower bound result (Theorem 4.3.1) for the number of queries required for approximating, within factor 훼, the size of an optimal solution to the Set Cover problem, where the input instance contains 푚 sets, 푛 elements, and a minimum set cover of size 푘. The structure of our proof is largely the same as in the simplified case, but the definitions and the details of our analysis will be more complicated. The size of the minimum set cover of the median instance will instead be at least 훼푘 + 1, and GenModifiedInst reduces this down to 푘. We now aim to prove the following statement, which implies the lower bound in Theorem 4.3.1.

Theorem 4.6.1. Let 푘 be the size of an optimal solution of 퐼* such that 1 < 훼 log 푛 and 1 ≤ (︁ )︁ 4훼+1 2 푘 푛 . Any algorithm that distinguishes whether the input instance is 퐼* ≤ ≤ 16훼 log 푚 124 Before: EltOf table for a basic slab After: EltOf table after applying a swap

EltOf 1 2 3 EltOf 1 2 3

푆1 푒1 푒2 푒3 푆1 푒1 푒2 푒3

푆2 푒1 푒2 푒3 푆2 푒1 푒2 푒3

푆3 푒1 푒2 푒3 푆3 푒1 푒4 푒3

푆4 푒2 푒3 푒4 푆4 푒2 푒3 푒4

푆5 푒2 푒3 푒4 푆5 푒2 푒3 푒4

푆6 푒2 푒3 푒4 푆6 푒2 푒3 푒4

푆7 푒1 푒3 푒4 푆7 푒1 푒3 푒4

푆8 푒1 푒3 푒4 푆8 푒1 푒3 푒4

푆9 푒1 푒3 푒4 푆9 푒1 푒3 푒2

푆10 푒1 푒2 푒4 푆10 푒1 푒2 푒4

푆11 푒1 푒2 푒4 푆11 푒1 푒2 푒4

푆12 푒1 푒2 푒4 푆12 푒1 푒2 푒4

Before: SetOf table for a basic slab

SetOf 1 2 3 4 5 6 7 8 9

푒1 푆1 푆2 푆3 푆7 푆8 푆9 푆10 푆11 푆12

푒2 푆1 푆2 푆3 푆4 푆5 푆6 푆10 푆11 푆12

푒3 푆1 푆2 푆3 푆4 푆5 푆6 푆7 푆8 푆9

푒4 푆4 푆5 푆6 푆7 푆8 푆9 푆10 푆11 푆12

After: SetOf table after applying a swap

SetOf 1 2 3 4 5 6 7 8 9

푒1 푆1 푆2 푆3 푆7 푆8 푆9 푆10 푆11 푆12

푒2 푆1 푆2 푆9 푆4 푆5 푆6 푆10 푆11 푆12

푒3 푆1 푆2 푆3 푆4 푆5 푆6 푆7 푆8 푆9

푒4 푆4 푆5 푆6 푆7 푆8 푆3 푆10 푆11 푆12

Figure 4.6.1: Tables illustrating the representation of a slab under EltOf and SetOf before and after a swap; cells modified by swap(푒2, 푒4) between 푆3 and 푆9 are highlighted in red. 125 or belongs to (퐼*) with probability of success at least 2/3 requires Ω(̃︀ 푚( 푛 )1/(2훼)) queries. 풟 푘 Proof: Applying the same argument as that of Lemma 4.3.10, we derive that the probability

that returns different outputs on 퐼* and 퐼′ is at most 풜

|푄| |푄| 푘 * ′ ∑︁ ∑︁ 64푝0 * ′ Pr (퐼 ) = (퐼 ) Pr ans퐼 (푞푡) = ans퐼 (푞푡) 푃Elt−Set(푒(푞푡), 푆(푞푡)) 2 푄 , 풜 ̸ 풜 ≤ ̸ ≤ ≤ 푚(1 푝0) | | 푡=1 푡=1 − via the result of Lemma 4.6.12. Then, over the distribution in which we applied Yao’s lemma, we have

(︂ 푘 )︂ 1 * ′ 1 64푝0 succeeds ′ * Pr 1 Pr퐼 ∼풟(퐼 )[ (퐼 ) = (퐼 )] 1 1 2 푄 풜 ≤ − 2 풜 풜 ≤ − 2 − 푚(1 푝0) | | 푘 − 1 32푝0 = + 2 푄 2 푚(1 푝0) | | − 1 1 32 (︂8(푘훼 + 2) log 푚)︂ 2훼 + 푄 ≤ 2 푚 푛 | |

where the last inequality follows from Lemma 4.6.2. Thus, if the number of queries made by

is less than 푚 ( 푛 )1/(2훼), then the probability that returns the correct answer 풜 192 8(푘훼+2) log 푚 풜 over the input distribution is less than 2/3 and the proof is complete. 

4.6.1. Construction of the Median Instance 퐼*.

Let be a collection of 푚 sets such that independently for each set-element pair (푆, 푒), 푆 con- ℱ 1/(훼푘) (︁ 8(훼푘+2) log 푚 )︁ tains 푒 with probability 1 푝0, where we modify the probability to 푝0 = . − 푛 We start by proving some inequalities involving 푝0 that will be useful later on, which hold for any 푘 in the assumed range. 1 (︁ )︁ 4훼+1 Lemma 4.6.2. For 2 푘 푛 , we have that ≤ ≤ 16훼 log 푚 푘/4 (a) 1 푝0 푝 , − ≥ 0 (b) 푘/4 , 푝0 1/2 ≤ 1 푝푘 (︁ )︁ 2훼 (c) 0 8(훼푘+2) log 푚 . 2 (1−푝0) ≤ 푛 Proof: Recall as well that 훼 > 1. In the given range of 푘, we have 푘4훼 푛 ≤ 16훼푘 log 푚 ≤ 126 푛 because 푘훼 2. Thus 8(훼푘+2) log 푚 ≥

1 1 (︂ )︂ 훼푘 (︂ )︂ 훼푘 8(훼푘 + 2) log 푚 1 −4/푘 푝0 = = 푘 . 푛 ≤ 푘4훼

−4/푘 − 4 ln 푘 4 ln 푘 4 −푥 푥 Next, rewrite 푘 = 푒 푘 and observe that < 1.5. Since 푒 1 for 푘 ≤ 푒 ≤ − 2 − 4 ln 푘 2 ln 푘 푘/4 − ln 푘 any 푥 < 1.5, we have 푝0 푒 푘 < 1 . Further, 푝 푒 = 1/푘. Hence ≤ − 푘 0 ≤ 푘/4 2 ln 푘 1 푝0 + 푝 1 + 1, implying the first statement. 0 ≤ − 푘 푘 ≤ The second statement easily follows as 푝푘/4 1/푘 1/2 since 푘 2. For the last 0 ≤ ≤ ≥ statement, we make use of the first statement:

1 푘 푘 (︂ )︂ 2훼 푝0 푝0 푘/2 8(훼푘 + 2) log 푚 2 푘/4 = 푝0 = (1 푝0) ≤ 2 푛 − (푝0 )

which completes the proof of the lemma.  Next, we give the new, generalized definition of median instances.

Definition 4.6.3 (Median instance). An instance of Set Cover, 퐼 = ( , ), is a median 풰 ℱ instance if it satisfies all the following properties.

(a) No 훼푘 sets cover all the elements. (The size of its minimum set cover is greater than 훼푘.) (b) The number of uncovered elements of the union of any sets is at most 푘. 푘 2푛푝0 (c) For any pair of elements 푒, 푒′, the number of sets 푆 s.t. 푒 푆 but 푒′ / 푆 is at least ∈ ℱ ∈ ∈ (1 푝0)푝0푚/2. − 푘−1 (d) For any collection of 푘 sets 푆1, , 푆푘, 푆푘 (푆1 푆푘−1) (1 푝0)(1 푝 )푛/2. ··· | ∩ ∪ · · · ∪ | ≥ − − 0 (e) For any collection of 푘 + 1 sets 푆, 푆1, , 푆푘, (푆푘 (푆1 푆푘−1)) 푆 2푝0(1 ··· | ∩ ∪ · · · ∪ ∖ | ≤ − 푘−1 푝0)(1 푝 )푛. − 0 (f) For each element, the number of sets that do not contain the element is at most (1 + 1 . 푘 )푝0푚

√︀ 푚 푛 1 * Lemma 4.6.4. For 푘 min , ( ) 4훼+1 , there exists a median instance 퐼 ≤ { 27 ln 푚 16훼 log 푚 } satisfying all the median properties from Definition 4.6.3. In fact, most of the instances constructed by the described randomized procedure satisfy the median properties.

127 Proof: The lemma follows from applying the union bound on the results of Lemmas 4.6.5–

4.6.10.  The proofs of the Lemmas 4.6.5–4.6.10 follow from standard applications of concentration bounds. We include them here for the sake of completeness.

−2 Lemma 4.6.5. With probability at least 1 푚 over ( , 푝0), the size of the minimum − ℱ ∼ ℐ 풰 set cover of the instance ( , ) is at least 훼푘 + 1. ℱ 풰 Proof: The probability that an element 푒 is covered by a specific collection of 훼푘 sets ∈ 풰 in is at most 1 푝훼푘 = 1 8(훼푘+2) log 푚 . Thus, the probability that the union of the 훼푘 ℱ − 0 − 푛 sets covers all elements in is at most (1 8(훼푘+2) log 푚 )푛 < 푚−8(훼푘+2). Applying the union 풰 − 푛 bound, with probability at least 1 푚−2 the size of an optimal set cover is at least 훼푘 + 1. −  −2 Lemma 4.6.6. With probability at least 1 푚 over ( , 푝0), any collection of 푘 sets − ℱ ∼ ℐ 풰 has at most 푘 uncovered elements. 2푛푝0

Proof: Let 푆1, , 푆푘 be a collection of 푘 sets from . For each element 푒 , the proba- ··· ℱ ∈ 풰 bility that is not covered by the union of the sets is 푘. Thus, 푒 푘 푝0

푘 훼푘 E[ (푆1 푆푘) ] = 푝0푛 푝0 푛 = 8(훼푘 + 2) log 푚. |풰 ∖ ∪ · · · ∪ | ≥

By Chernoff bound,

푝푘푛 푘 − 0 −(훼푘+2) log 푚 −푘−2 Pr (푆1 푆푘) 2푝 푛 푒 3 푒 푚 . |풰 ∖ ∪ · · · ∪ | ≥ 0 ≤ ≤ ≤

Thus with probability at least 1 푚−2, for any collection of 푘 sets in , the number of − ℱ uncovered elements by the union of the sets is at most 푘 . 2푝0푛 

′ Lemma 4.6.7. Suppose that ( , 푝0) and let 푒, 푒 be two elements in . Given 푘 1 ℱ ∼ ℐ 풰 풰 ≤ (︁ )︁ 4훼+1 푛 , with probability at least 1 푚−2, the number of sets 푆 such that 푒 푆 16훼 log 푚 − ∈ ℱ ∈ ′ but 푒 / 푆 is at least 푚푝0(1 푝0)/2. ∈ − ′ Proof: For each set 푆, Pr 푒 푆 and 푒 / 푆 = (1 푝0)푝0. This implies that the expected ∈ ∈ −

128 number of such sets 푆 satisfying the condition for 푒 and 푒′ is

푘/4 훼푘 푝0(1 푝0)푚 푝0 푝 푚 푝 푛 = 8(훼푘 + 2) log 푚 − ≥ · 0 · ≥ 0

by Lemma 4.6.2 and 푚 푛. By Chernoff bound, the probability that the number of sets ≥ ′ containing 푒 but not 푒 is less than 푚푝0(1 푝0)/2 is at most −

− 푝0(1−푝0)푚 −(훼푘+2) log 푚 −훼푘−2 푒 8 푒 푚 . ≤ ≤

Thus with probability at least 1 푚−2 property (c) holds for any pair of elements in . − 풰 

Lemma 4.6.8. Suppose that ( , 푝0) and let 푆1, , 푆푘 be 푘 different sets in . 1 ℱ ∼ ℐ 풰 ··· ℱ (︁ 푛 )︁ 4훼+1 −2 Given 푘 , with probability at least 1 푚 , 푆푘 (푆1 푆푘−1) ≤ 16훼 log 푚 − | ∩ ∪ · · · ∪ | ≥ 푘−1 (1 푝0)(1 푝 )푛/2. − − 0 푘−1 Proof: For each element 푒, Pr 푒 푆푘 (푆1 푆푘−1) = (1 푝0)(1 푝 ). This implies ∈ ∩ ∪ · · · ∪ − − 0 that the expected size of 푆푘 (푆1 푆푘−1) is ∩ ∪ · · · ∪

푘−1 푘/4 푘/4 훼푘 (1 푝0)(1 푝 )푛 푝 푝 푛 푝 푛 = 8(훼푘 + 2) log 푚. − − 0 ≥ 0 · 0 · ≥ 0

by Lemma 4.6.2. By Chernoff bound, the probability that 푆푘 (푆1 푆푘−1) (1 | ∩ ∪ · · · ∪ | ≤ − 푘−1 푝0)(1 푝 )푛/2 is at most − 0

(1−푝 )(1−푝푘−1)푛 − 0 0 −(훼푘+2) log 푚 −훼푘−2 푒 8 푒 푚 . ≤ ≤

−2 Thus with probability at least 1 푚 property (d) holds for any sets 푆1, , 푆푘 in . − ··· ℱ 

Lemma 4.6.9. Suppose that ( , 푝0) and let 푆1, , 푆푘 and 푆 be 푘+1 different sets in 1 ℱ ∼ ℐ 풰 ··· (︁ 푛 )︁ 4훼+1 −2 . Given 푘 , with probability at least 1 푚 , (푆푘 (푆1 푆푘−1)) 푆 ℱ ≤ 16훼 log 푚 − | ∩ ∪· · ·∪ ∖ | ≤ 푘−1 2푝0(1 푝0)(1 푝 )푛. − − 0

129 푘−1 Proof: For each element 푒, Pr 푒 (푆푘 (푆1 푆푘−1)) 푆 = 푝0(1 푝0)(1 푝 ). Then, ∈ ∩ ∪ · · · ∪ ∖ − − 0

푘−1 푘/4 푘/4 훼푘 E( (푆푘 (푆1 푆푘−1)) 푆 ) = 푝0(1 푝0)(1 푝0 )푛 푝0 푝0 푝0 푝0 푛 | ∩ ∪ · · · ∪ ∖ | − − ≥ · · ≥ = 8(훼푘 + 2) log 푚

by Lemma 4.6.2. By Chernoff bound, the probability that (푆푘 (푆1 푆푘−1)) 푆 | ∩ ∪ · · · ∪ ∖ | ≥ 푘−1 2푝0(1 푝0)(1 푝 )푛 is − − 0

푝 (1−푝 )(1−푝푘−1)푛 − 0 0 0 −2(훼푘+2) log 푚 −2훼푘−4 푒 3 푒 푚 . ≤ ≤

−2 Thus with probability at least 1 푚 property (e) holds for any sets 푆1, , 푆푘 and 푆 in − ··· .  ℱ 1 (︁ )︁ 4훼+1 Lemma 4.6.10. Given that 푘 푛 , for each element, the number of sets that ≤ 16훼 log 푚 do not contain the element is at most 1 . (1 + 푘 )푝0푚 1 (︁ )︁ 4훼+1 Proof: First, note that 푘 푛 √︀ 푚 as 푚 푛 and 훼 1. ≤ 16훼 log 푚 ≤ 27 ln 푚 ≥ ≥ Next, for each element , . This implies that . 푒 Pr푆∼ℱ [푒 / 푆] = 푝0 E푆( 푆 푒 / 푆 ) = 푝0푚 ∈ |{ | ∈ }| −푚푝0 1 2 By Chernoff bound, the probability that 푆 푒 / 푆 (1 + )푝0푚 is at most 푒 3푘 . Now |{ | ∈ }| ≥ 푘 if , then and thus this probability would be at most −푚 −3 for 푘 log 푛 푝0 1/푒 exp( 2 ) 푚 ≥ ≥ 3푒푘 ≤ √︀ 푚 −푚푛−1/훼푘 any 푘 . Otherwise, we have that the above probability is at most exp( 2 ) ≤ 27 ln 푚 3 log 푛 ≤ −푚1−1/훼푘 −3 exp( 2 ) 푚 given 푚 푛 and sufficiently large 푛. Thus with probability at least 3 log 푚 ≤ ≥ 1 푚−2 property (f) holds for any element 푒 . − ∈ 풰 

4.6.2. Distribution (퐼*) of the Modified Instances Derived from 퐼*. 풟 Fix a median instance 퐼*. We now show that we may perform 푂˜(푛1−1/훼푘1/훼) swap operations on 퐼* so that the size of the minimum set cover in the modified instance becomes 푘. So, the number of queries to EltOf and SetOf that induce different answers from those of 퐼* is at most 푂̃︀(푛1−1/훼푘1/훼). We define (퐼*) as the distribution of instances 퐼′ that is generated 풟 from a median instance 퐼* by GenModifiedInst(퐼*) given below in Algorithm 15. The main difference from the simplified version are that we nowselect 푘 different sets to turn

them into a set cover, and the swaps may only occur between 푆푘 and the candidates.

130 Algorithm 15 The procedure of constructing a modified instance of 퐼*. 1: procedure GenModifiedInst(퐼* = ( , )) 2: 풰 ℱ ℳ ← ∅ 3: pick 푘 different sets 푆1, 푆푘 from uniformly at random ··· ℱ 4: for all 푒 (푆1 푆푘) do ∈′ 풰 ∖ ∪ · · · ∪ 5: pick 푒 (푆푘 (푆1 푆푘−1)) uniformly at random 6: ∈ 푒푒∩′ ∪ · · · ∪ ∖ ℳ 7: pickℳ ←a ℳ random ∪ { } set 푆 in Candidate(푒, 푒′) ′ 8: swap(푒, 푒 ) between 푆, 푆푘 Lemma 4.6.11. The procedure GenModifiedInst is well-defined under the precondition

that the input instance 퐼* is a median instance. Proof: To carry out the algorithm, we must ensure that the number of the initially uncov-

ered elements is at most that of the elements covered by both 푆푘 and some other set from * 푆1, . . . , 푆푘−1. Since 퐼 is a median instance, by properties (b) and (d) from Definition 4.6.3, 푘 푘−1 these values satisfy (푆1 푆푘) 2푝 푛 and 푆푘 (푆1 푆푘−1) (1 푝0)(1 푝 )푛/2, |풰∖ ∪· · ·∪ | ≤ 0 | ∩ ∪· · ·∪ | ≥ − − 0 respectively. By Lemma 4.6.2, 푝푘/4 1/2. Using this and Lemma 4.6.2 again, 0 ≤

푘−1 푘/4 푘/4 푘/2 푘 (1 푝0)(1 푝 )푛/2 푝 푝 푛/2 푝 푛/2 2푝 푛. − − 0 ≥ 0 · 0 · ≥ 0 ≥ 0

That is, in our construction there are sufficiently many possible choices for 푒′ to be matched and swapped with each uncovered element 푒. Moreover, since 퐼* is a median instance, ′ Candidate(푒, 푒 ) (1 푝0)푝0푚/2 (by property (c)), and there are plenty of candidates for | | ≥ − each swap. 

Bounding the Probability of Modification. Similarly to the simplified case, define

푃Elt−Set : [0, 1] as the probability that an element is swapped by a set, and upper 풰 × ℱ → bound it via the following lemma.

64푝푘 Lemma 4.6.12. For any and , 0 where the probability 푒 푆 푃Elt−Set(푒, 푆) 2 ∈ 풰 ∈ ℱ ≤ (1−푝0) 푚 is taken over the random choices of 퐼′ (퐼*). ∼ 풟

Proof: Let 푆1, . . . , 푆푘 denote the first 푘 sets picked (uniformly at random) from to construct ℱ a modified instance of 퐼*. For each element 푒 and a set 푆 such that 푒 푆 in the basic instance ∈ 131 퐼*,

푃Elt−Set(푒, 푆) = Pr 푆 = 푆푘 Pr 푒 푖∈[푘−1]푆푖 푒 푆푘 · ∈ ∪ | ∈

Pr 푒 matches to ( 푖∈[푘]푆푖) 푒 푆푘 ( 푖∈[푘−1]푆푖) · 풰 ∖ ∪ | ∈ ∩ ∪

+ Pr 푆 / 푆1, . . . , 푆푘 Pr 푒 푆 ( 푖∈[푘]푆푖) 푒 푆 ∈ { }· ∈ ∖ ∪ | ∈

Pr 푆 swaps 푒 with 푆푘 푒 푆 (푆1 푆푘), · | ∈ ∖ ∪ · · · ∪ where all probabilities are taken over 퐼′ (퐼*). Next we bound each of the above six ∼ 풟 terms. Clearly, since we choose the sets 푆1, , 푆푘 randomly, Pr[푆 = 푆푘] = 1/푚. We bound ··· the second term by 1. Next, by properties (b) and (d) of median instances, the third term is at most

푘 푘 ( 푖∈[푘]푆푖) 2푝0푛 4푝0 |풰 ∖ ∪ | 푘−1 푛 2 . 푆푘 ( 푖∈[푘−1]푆푖) ≤ (1 푝0)(1 푝 ) ≤ (1 푝0) | ∩ ∪ | − − 0 2 −

We bound the fourth term by 1. Let 푑푒 denote the number of sets in that do not contain ℱ 푒. Using property (f) of median instances, the fifth term is at most

(︂ )︂푘 푑푒(푑푒 1) (푑푒 푘 + 1) 푑푒 (1 + 1/푘)푝0푚 − ··· − ( )푘 푒2푝푘, (푚 1)(푚 2) (푚 푘) ≤ 푚 1 ≤ 푚(1 1 ) ≤ 0 − − ··· − − − 푘+1

Finally for the last term, note that by symmetry, each pair of matched elements 푒푒′ is picked by GenModifiedInst equiprobably. Thus, for any 푒 푆 (푆1 푆푘), the ∈ ∖ ∪ · · · ∪ ′ 1 probability that each element 푒 푆푘 (푆1 푆푘−1) is matched to 푒 is . ∈ ∩ ∪ · · · ∪ |푆푘∩(푆1∪···∪푆푘−1)| By properties (c)-(e) of median instances, the last term is at most

∑︁ ′ ′ Pr 푒푒 Pr (푆, 푆푘) swap (푒, 푒 ) ′ ∈ ℳ · 푒 ∈(푆푘∩(∪푖∈[푘−1]푆푖))∖푆 1 1 (푆푘 ( 푖∈[푘−1]푆푖)) 푆 ′ ≤ | ∩ ∪ ∖ | · 푆푘 ( 푖∈[푘−1]푆푖) · Candidate(푒, 푒 ) | ∩ ∪ | | | 푘−1 1 1 2푝0(1 푝0)(1 푝0 )푛 푘−1 ≤ − − · (1 푝0)(1 푝 )푛/2 · 푝0(1 푝0)푚/2 − − 0 − 8

≤ (1 푝0)푚 −

132 Therefore,

푘 1 4푝0 2 푘 8 푃Elt−Set(푒, 푆) 1 2 + 1 푒 푝0 ≤ 푚 · · (1 푝0) · · (1 푝0)푚 푘 − 푘 −푘 4푝0 60푝0 64푝0 2 + 2 .  ≤ (1 푝0) (1 푝0)푚 ≤ (1 푝0) 푚 − − −

4.7. Omitted Proofs from Section 4.3

−1 Lemma 4.7.1. With probability at least 1 푚 over ( , 푝0), the size of the minimum − ℱ ∼ ℐ 풰 set cover of the instance ( , ) is greater than 2. ℱ 풰 Proof: The probability that an element 푒 is covered by two sets selected from is at ∈ 풰 ℱ most:

2 9 log 푚 Pr[푒 푆1 푆2] = 1 푝 = 1 . ∈ ∪ − 0 − 푛

9 log 푚 푛 −9 Thus, the probability that 푆1 푆2 covers all elements in is at most (1 ) < 푚 . ∪ 풰 − 푛 Applying the union bound, with probability at least 1 푚−1 the size of optimal set cover is − greater than 2. 

Lemma 4.7.2. Let 푆1 and 푆2 be two sets in where ( , 푝0). Then with probability ℱ ℱ ∼ ℐ 풰 −1 at least 1 푚 , (푆1 푆2) 18 log 푚. − |풰 ∖ ∪ | ≤ Proof: For an element , 2 9 log 푚 . So, . 푒 Pr[푒 / 푆1 푆2] = 푝0 = E[ (푆1 푆2) ] = 9 log 푚 ∈ ∪ 푛 |풰 ∖ ∪ | −9 log 푚/3 −3 By Chernoff bound, Pr[ (푆1 푆2) 18 log 푚] is at most 푒 푚 . Thus with |풰 ∖ ∪ | ≥ ≤ probability at least 1 푚−1, for any pair of sets in , the number of element not covered − ℱ by their union is at most 18 log 푚. 

Lemma 4.7.3. Let 푆1 and 푆2 be two sets in where ( , 푝0). Then 푆1 푆2 푛/8 ℱ ℱ ∼ ℐ 풰 | ∩ | ≥ with probability at least 1 푚−1. −

Proof: For each element 푒, it is either covered by both 푆1, 푆2, one of 푆1, 푆2 or none of them.

Since 푝0 1/2, the probability that an element is covered by both sets is greater than other ≤ cases, i.e., . Thus, . By Chernoff bound, Pr [푒 푆1 푆2] > 1/4 E[ (푆1 푆2) ] > 푛/4 ∈ ∩ |풰 ∖ ∩ | −1 Pr[ (푆1 푆2) 푛/8] is exponentially small. Thus with probability at least 1 푚 , the |풰 ∖ ∩ | ≤ − 133 intersection of any pairs of sets in is greater than 푛/8. ℱ  ′ Lemma 4.7.4. Suppose that ( , 푝0) and let 푒, 푒 be two elements in . With prob- ℱ ∼ ℐ 풰 풰 ability at least 1 푚−1, the number of sets 푆 such that 푒 푆 but 푒′ / 푆 is at least √ − ∈ ℱ ∈ ∈ 푚 9√ log 푚 . 4 푛

′ Proof: For each set 푆, Pr[푒 푆 and 푒 / 푆] = (1 푝0)푝0 푝0/2. This implies that the ∈ ∈ − ≥ √︁ expected number of 푆 satisfying the condition for 푒 and 푒′ is at least 푚 9 log 푚 and by 2 · 푛 Chernoff bound, the probability that the number of sets containing 푒 but not 푒′ is less than √ 푚 9√ log 푚 is exponentially small. Thus with probability at least 1 푚−1 property (d) holds 4 푛 − for any pair of elements in . 풰 

Lemma 4.7.5. Suppose that ( , 푝0) and let 푆1, 푆2 and 푆 be sets in . With proba- ℱ ∼ ℐ 풰 ℱ −1 bility at least 1 푛 , (푆1 푆2) 푆 6√푛 log 푚. − | ∩ ∖ | ≤ 2 Proof: For each element 푒, Pr[푒 (푆1 푆2) 푆] = (1 푝0) 푝0 푝0. This implies that the ∈ ∩ ∖ − ≤ expected size of (푆1 푆2) 푆 is less than √9푛 log 푚 and by Chernoff bound, the probability ∩ ∖ −1 that (푆1 푆2) 푆 6√푛 log 푚 is exponentially small. Thus with probability at least 1 푚 | ∩ ∖ | ≥ − property (푒) holds for any sets 푆1, 푆2 and 푆 in . ℱ  Lemma 4.7.6. For each element, the number of sets that do not contain the element is at √︁ most log 푚 . 6푚 푛 Proof: For each element , . This implies that is less 푒 Pr푆[푒 / 푆] = 푝0 E푆( 푆 푒 / 푆 ) √︁ ∈ |{ | ∈ }|√︁ than 푚 9 log 푚 and by Chernoff bound, the probability that 푆 푒 / 푆 2푚 9 log 푚 푛 |{ | ∈ }| ≥ 푛 is exponentially small. Thus with probability at least 1 푚−1 property (f) holds for any − element 푒 . ∈ 풰 

134 Chapter 5

Streaming Maximum Coverage

5.1. Introduction

In maximum 푘-coverage (Max 푘-Cover), given a ground set of 푛 elements, a family of 푚 풰 sets (each subset of ), and a parameter 푘, the goal is to select 푘 sets in whose union ℱ 풰 ℱ has the largest cardinality. The initial streaming algorithms for this problem were developed in the set arrival model, where the input sets are listed contiguously. This restriction is natural from the perspective of submodular optimization, but limits the applicability of the algorithms.1 Avoiding this limitation can be difficult, as streaming algorithms can no longer operate on sets as “unit objects”. As a result, the first maximum coverage algorithm for the general edge arrival model, where pairs of (set, element) can arrive in arbitrary order, have been developed recently. In particular [29] presented a one-pass algorithm with space linear

in 푚 and constant approximation factor.2 A particularly interesting line of research in set arrival streaming set cover and max

푘-cover is to design efficient algorithms that only use 푂̃︀(푛) space [155, 22, 67, 42, 131]. Pre- vious work have shown that we can adopt the existing greedy algorithm of Max 푘-Cover to achieve constant factor approximation in 푂̃︀(푛) space [155, 22] (which later improved to 1For example, consider a situation where the sets correspond to neighborhoods of vertices in a directed graph. Depending on the input representation, for each vertex, either the ingoing edges or the outgoing edges might be placed non-contiguously. 2We remark that many of the prior bounds (both upper and lower bounds) on set cover and max 푘-cover problems in set-arrival streams also work in edge arrival streams (e.g. [61, 97, 20, 131, 17, 105]). However, the design of efficient streaming algorithms for the coverage problems on edge arrival streams was firststudied explicitly in [29].

135 푂̃︀(푘) by [131]). However, the complexity of the problem in the “low space” regime is very different in edge-arrival streams: [29] showed that as long as the approximation factor is

a constant, any algorithm must use Ω(푚) space. Still, our understanding of approxima- tion/space tradeoffs in the general case is far from complete. Table 5.1.1 summarizes the known results.

5.1.1. Our Results

In this chapter, we complement the work of [29] by designing space-efficient (1/훼)-approximation algorithms for super-constant values of 훼. In fact, we show a tight (up to polylogarithmic factors) tradeoff between the two: the optimal space bound is 푂̃︀(푚/훼2) for estimating the 3 maximum coverage value, and 푂̃︀(푚/훼2+푘) for reporting an approximately optimal solution. The approximation factor 훼 can take any value in [1/Θ(̃︀ √푚), 1 1/푒). −

5.1.2. Our Techniques

In the edge arrival model, elements of each set can arrive irregularly and out of order. This necessitates the use of methods that aggregate the information about the input sets, or their coverage. In particular, distinct element sketches were used both in [29] (implicitly) and [131] (explicitly). In this chapter we expand the use of sketching toolkit. Specifically, in addition to distinct element estimation [13, 28, 112, 113, 32], we also need algorithms for heavy hitters

with respect to the 퐿2 norm [44, 165, 38, 37], as well as a frequency-based partitioning of elements, and detecting sets that “substantially contribute” to the solution [108]. Application

of vector-sketching techniques (e.g. 퐿푝-sampling/estimation and heavy hitters) in graph streaming settings have been studied extensively (e.g. [8,9, 89, 21, 49, 114]). We believe that our algorithms can lead to further connections between vector sketching methods and streaming algorithms for the coverage problems.

3We note that similar tradeoffs were previously obtained for the set cover problem, as[20] showed a Θ(푚푛/훼2) bound for estimation, and a Θ(푚푛/훼) bound for reporting. Interestingly, the 1/훼2 vs. 1/훼 gap does not occur for our problem. 4Their result works for the general submodular maximization assuming access to a value oracle that given a collection of sets computes their coverage. A careful adoption of their result to Max 푘-Cover (without the value oracle) uses 푂̃︀(푛) space.

136 Problem Stream Model Approximation Upper Bound Lower Bound

푚 [17] 1 휀 Ω(̃︀ 2 ) Estimation Edge Arrival − – 휀 1 1 휀 Ω( 푚 )[131] − 푒 − 푘2 푚 [29] 푂̃︀( 3 ) Edge Arrival 1 1 휀 휀 − 푒 − 푚 [131] Reporting 푂̃︀( 휀2 ) – 1 [155], 1 [22]4 푂(푛) Set Arrival 4 2 ̃︀ 1 푘 [131] 휀 푂̃︀( 3 ) 2 − 휀 Estimation Edge Arrival 푚 [here] 푚 [here],[29] 1/훼 푂̃︀( 훼2 ) Ω( 훼2 ) Reporting Edge Arrival 푚 [here] – 1/훼 푂̃︀(푘 + 훼2 ) Table 5.1.1: The summary of known results on the space complexity of single-pass stream- ing algorithms of Max 푘-Cover. Lower bound. Our algorithm was inspired by the lower bound. Specifically, it was pre-

viously shown by [29] that approximating Max 푘-Cover by a factor better than 2 requires Ω(푚) space. Similar approach works for larger values of 훼, by showing a reduction from the 훼-player set disjointess problem (DSJ[m]) with unique intersection guarantee (i.e., either players’ sets are disjoint or there is a unique item that appears in all sets) to the task of

훼-approximating Max 푘-Cover. The specific hard instances in the aforementioned lower bound can be distinguished in

the streaming model using space 푂(푚/훼2). To this end, we compute an 훼-approximation to

the 퐿∞-norm of a vector 푣 that, for each element 푒, counts the number of sets 푒 belongs to. 2 This problem can be solved in 푂(푚/훼 ) space, by using 퐿2-norm sketches [13]. This suggests that it might be possible to solve the general Max 푘-Cover using sketching techniques as well.

Upper bound. We start our algorithm with a “coverage boosting” universe reduction technique which constructs a reduced size instance (i.e., with reduced ground set) whose

optimal 푘-cover has constant fraction coverage (see Section 5.3.1). This step is particularly important as the space complexity of the existing methods for Max 푘-Cover is proportional to the reciprocal of the fraction of covered elements in an optimal solution.

137 Once we have a constant fraction coverage guarantee, our algorithm exploits three differ- ent approaches so that on any instance, at least one of them reports a “good” approximate solution.

Multi-layered set sampling. By extending the set sampling approach (see Section 5.2.1) and trying a larger range of sampling rate, 푘 훼푘 , we design a smooth variant of set sampling: [ 푚 , 푚 ] 5 a collection of sets sampled uniformly and independently at rate 푂̃︀(훽푘/푚) w.h.p., covers all elements that appear in at least 푚/(훽푘) sets. Besides expanding the application of set sampling in finding (1/훼)-approximate 푘-cover, this smooth variant implies more structure on the number of elements in a wider range of frequency levels which is specifically crucial in our approach for detecting sets with “low contribution”.

Unlike the set sampling based technique whose success in finding an 훼-approximate 푘- cover only depends on the structure of the set system, the performance of the next two approaches rely on the structure of optimal solutions as well: whether the majority of the coverage (in a specific optimal 푘-cover) is due to (few) “large” sets or, (many) “small” sets.

Heavy hitters and contributing frequencies. The high level idea in this approach is that if in an optimal solution, a sufficiently6 small number of sets cover the majority of the elements (covered by the optimal solution), it is enough to find a single large set, which naturally hints the use of ideas related to heavy hitters. For the sake of efficiency (in space complexity), we randomly partition sets into supersets of size at most 푘. However, once we merge sets into a single superset, we can no longer distinguish between their coverage and their total size. Since we combine sets at random, if all elements have “low” frequency in the set system, then the gap between the total size of all sets in a superset and their coverage is just 푂̃︀(1). This observation implies that if there is no “common” element in the set system, then we can use the total size of the sets in a superset as an estimate of its coverage size. To get around the case with (many) “common” elements, we show that performing the heavy hitter based algorithm on a sampled set of elements will find a sufficiently large superset as desired (see Section 5.8).

5In fact, 푂(log 푚푛)-wise independent is sufficient for all applications in this chapter. 6Depending on how large the value of 훼 is.

138 Detecting 푘-covers with many small sets. Finally, we address the case in which an optimal 푘-cover consists of many “small” sets. In this case, we can show that after subsam- pling sets uniformly with probability , a 푘 -cover with coverage at least times 1/훼 ( 훼 ) Θ(1/훼) the coverage of an optimal 푘-cover survives. This sampling method will save us a factor of 훼 in the memory usage of the algorithm. Further, by exploiting the structural prop- erty guaranteed due to the multi-layered set sampling7, we can show that element sampling

can save another factor of 훼 in the space complexity once applied to find a constant factor approximate 푘 of the subsampled sets. Max ( 훼 )-Cover 5.1.3. Other Related Work

Another important related question in this area is to design a “low-approximation” (i.e., better than the 2-approximation guarantee of the greedy approach) streaming algorithm for the max 푘-cover problem in the set arrival setting. Recently, Norouzi-Fard et al. [147] presented the first streaming algorithm that improves upon 2-approximation guarantee of greedy approach on random arrival streams. Very recently, Agrawal et al. [5] achieved an

8 almost (1 1/푒)-approximation in 푂̃︀(푛) space which is essentially the optimal bound [131]. − Still it is an important question to design such algorithms on adversarial order streams. We also remark that the algorithms of [147,5] do not work on edge arrival streams. In many scenarios, space is the most critical factor, and thus the question becomes: what approximation guarantees are possible within the given space bounds? This question has been studied before in the context of set cover in set arrival streams (e.g. [67, 42]), leading to poly(푛, 푚)-factor approximation algorithms.

5.2. Preliminaries and Notations

5.2.1. Sampling Methods for Max 푘-Cover and Set Cover Here we describe two sampling methods that have been used widely in the design of streaming algorithms for Max 푘-Cover and Set Cover [123, 61, 97, 20, 131, 17, 29, 105]. For a collection 7If the multi-layered set sampling fails to return a (1/훼)-approximate estimate, we can infer strong conditions on the maximum number of elements that belong to each frequency level in 푚 푚 . [ 푘 , 훼푘 ] 8Both [147,5] study the more general problem of submodular maximization and their results are stated with different notation and assuming oracle access. Here, we state their guarantees for max cover on set arrival streams

139 of sets , we define ( ) to denote the set of elements that are covered by ; ( ) := 풬 풞 풬 풬 풞 풬 ⋃︀ 푆. Moreover, we denote an optimal 푘-cover of ( , ) by OPT. 푆∈풬 풰 ℱ

Set Sampling. Roughly speaking, it says that by selecting sets uniformly at random, with high probability, all elements that appear in large number of sets will be covered.

Definition 5.2.1. An element 푒 is called 휆-common if it appears in at least 푐푚 polylog(푚, 푛)/휆 ∈ 풰 sets in . Furthermore, We denote the set of 휆-common elements by cmn. ℱ 풰휆 cmn cmn Observation 5.2.2. For any 0 휆1 휆2, . ≤ ≤ 풰휆1 ⊆ 풰휆2 Lemma 5.2.3 (Set Sampling [61]). Consider a set system ( , ) and let rnd be 풰 ℱ ℱ ⊆ ℱ a collection of sets such that each set 푆 is picked in rnd with probability 휆 . With high ℱ 푚 probability, rnd covers all elements that appear in Ω(̃︀ 푚/휆) sets (i.e. 휆-common elements). ℱ

Element Sampling for Max 푘-Cover . This sampling method shows that if we sample elements of uniformly with a large enough rate (i.e. proportional to (푘 )/ (OPT) ), 풰 |풰| |풞 | then a constant factor approximate 푘-cover over the sampled elements w.h.p., is a constant factor approximate solution of the original instance.

Lemma 5.2.4 (Element Sampling Lemma [123, 61]). Consider an instance of Max 푘-Cover( , ). Let’s assume that an optimal 푘-cover of ( , ) covers (1/휂)-fraction of . 풰 ℱ 풰 ℱ 풰 Let be a set of elements of size Θ(̃︀ 휂푘) picked uniformly at random. Then, with high ℒ ⊂ 풰 probability, a Θ(1)-approximate 푘-cover of ( , ) is a Θ(1)-approximate 푘-cover of ( , ). ℒ ℱ 풰 ℱ Observation 5.2.5. Let be a collection of (훽푘)-cover in . Then, in any partitioning of 풬 ℱ into 훽 groups, there exists one group with coverage at least ( ) /훽. In particular, an 풬 |풞 풬 | optimal 푘-cover in covers at least ( ) /훽. 풬 |풞 풬 | This simple observation is in particular interesting because it relates the task of 훼-approximating Max 푘-Cover to solving instances of Max (훽푘)-Cover where 훽 훼. ≤ 5.2.2. HeavyHitters and Contributing Classes

Suppose that a sequence of items 푝1, , 푝푇 arrive in a data stream where for each 푗 푇 , ··· ≤ 푝푗 [푚]. We can think of the stream as a sequence of (insertion only) updates on an initially ∈ zero vector ⃗푎 such that upon arrival of 푝푗 in the stream, ⃗푎[푗] ⃗푎[푗] + 1. Here, we review ← 140 the notion of 퐹2-heavy hitter and contributing coordinates that are used in our algorithm for approximating Max 푘-Cover.

Definition 5.2.6. Given an 푚-dimensional vector ⃗푎, an item 푗 (corresponding to ⃗푎[푗]) is a 2 ∑︀ 2 휑-HeavyHitter of 퐹2(⃗푎), if ⃗푎[푗] 휑 퐹2(⃗푎) = 휑 ⃗푎[푗] . Intuitively, the set of items ≥ · · 푗∈[푚] that appear frequently in the stream are the heavy hitters.

푖−1 푖 We conceptually partition coordinates of ⃗푎 into classes 푅푖 = 푗 2 < ⃗푎[푗] 2 . { | ≤ } 2푡 Definition 5.2.7. A class of coordinates 푅푡 is 훾-contributing if 푅푡 2 훾퐹2(⃗푎) = | | · ≥ ∑︀ 2. 훾 푗∈[푚] ⃗푎[푗]

Let 푅푡* be a 훾-contributing class and let 푛푡* denote the size of 푅푡* ; 푛푡* = 푅푡* . Further, | | * 푖*−1 푖* 푖* let’s assume that 푖 = log 푛푡* ; 2 < 푛푡* 2 . Let ℎ :[푚] [(12푚 log 푚)/2 ] be a ⌈ ⌉ ≤ → function chosen uniformly at random from a family of Θ(log(푚푛))-wise independent hash 푖* functions. We define 푖* as a sampled substream of the input stream with rate 1/2 . More 풮 precisely, 푖* only contains the updates corresponding to the coordinates 푖* = 푗 ℎ(푗) = 1 풮 ℱ { | } that are mapped to one under ℎ. Next, we show that the survived coordinates of 푅푡*

(푗 푅푡* ) in ⃗푎푖* , which is the vector ⃗푎 restricted to the items in 푖* , are Ω(̃︀ 훾)-HeavyHitters ∈ ℱ 2 of 퐹2(⃗푎푖* ); ⃗푎[푗] Ω(̃︀ 훾) 퐹2(⃗푎푖* ). Roughly speaking, if we subsample the stream so that ≥ · only polylog(푚) coordinates of 푅푡* survive, then with high probability these coordinates are Ω(̃︀ 훾)-HeavyHitters of the sampled substream.

Claim 5.2.8. With probability at least 1 푚−1, the number of survived coordinates in the − 푖* sampled substream 푖* is at least (6푚 log 푚)/2 . 풮

Proof: Let 푋푗 be a random variable which is one if item 푖푗 푖* and zero otherwise. We ∈ ℱ define 푋 := 푋1 + + 푋푚. Note that 푋푗 are Θ(log(푚푛))-wise independent and E[푋] = ··· * (12푚 log 푚)/2푖 . Then, by an application of Chernoff bound with limited independence (Lemma 5.7.3),

* Pr(푋 < (6푚 log 푚)/2푖 ) 푚−1. ≤

푖* Hence, with high probability, 푖* has size at least (6푚 log 푚)/2 . ℱ  2 푐 Lemma 5.2.9. With probability at least 1 2/(9 log 푛 log 푚), a coordinate 푗 푅푡* is a − ∈

141 훾 - in the sampled substream . ( 2 푐+1 ) HeavyHitter 푡* 162 log 푛 log 푚 풮

Proof: Let 푋푖 be a random variable corresponding to the 푖-th coordinate in 푅푡* such that if and zero otherwise. Moreover, we define . Then, 푋푖 = 1 푖 푖* 푋 := 푋1+ +푋푛 * E[푋] = ∈ ℱ ··· 푡 푖* (12푛푡* log 푚)/2 and by an application of Chernoff bound with limited independence,

√︀ Pr(푋 < 1) < Pr(푋 < (1 1/6)E[푋]) 푚−1. − ≤

푐 푖* Next, we show that with probability at least 1 1/ log 푚, 퐹2(⃗푎푖* ) 푂̃︀(퐹2(⃗푎)/2 ). It is − ≤ 푖* straightforward to check that E[퐹2(⃗푎푖* )] = (12퐹2(⃗푎) log 푚)/2 . Hence, by Markov’s inequal- ity,

2 푐+1 푖* 2 푐 Pr(퐹2(⃗푎푖* ) (108퐹2(⃗푎) log 푛 log 푚)/2 ) 1/(9 log 푛 log 푚). ≥ ≤

2 푐 Hence, with probability at least 1 2/(9 log 푛 log 푚), a coordinate 푗 푅푡* survives − ∈ 2 푐+1 푖* in 푖* and 퐹2(⃗푎푖* ) (108퐹2(⃗푎) log 푛 log 푚)/2 . Thus, with probability at least 1 ℱ ≤ − 2/(9 log2 푛 log푐 푚),

푖* 2 푡*−1 2 훾 훾 2 훾 (⃗푎[푖]) (2 ) 퐹2(⃗푣) ( )퐹2(⃗푣) = ( )퐹2(⃗푣) ≥ ≥ 4푛푡* ≥ 4 2푖* 108 log2 푛 log푐+1 푚 432 log2 푛 log푐+1 푚 ·

2 푐 In other words, with probability at least 1 2/(9 log 푛 log 푚), a coordinate 푖 푆푡* is a − ∈ 훾 - of * . ( 432 log2 푛 log푐+1 푚 ) HeavyHitter 퐹2(⃗푎푖 ) 

Next, we can use the exiting algorithms for 퐹2-HeavyHitters to complete this section.

Theorem 5.2.10 (퐹2-heavy hitters [44, 165, 38, 37]). Let’s assume that ⃗푎 is an 푚- dimensional vector initialized to zero. Let be a stream of items 푝1, , 푝푇 where for 풮 ··· each 푗 [푇 ], 푝푗 [푚]. Then, there is a single pass algorithm 퐹2-HeavyHitter that uses ∈ ∈ 2 푂̃︀(1/훾) space and with high probability returns all coordinates 푖 such that ⃗푎[푖] 훾퐹2(⃗푎). In ≥ addition, it returns (1 1 )-approximate values of these coordinates. ± 2 Finally, there exists an algorithm that with probability at least 1 2/(9 log 푛 log푐 푚), − finds at least one coordinate in each 훾-contributing class of ⃗푎 using 푂̃︀(1/훾) space.

Theorem 5.2.11 (훾-contributing [108]). Let’s assume that ⃗푎 is an 푚-dimensional vector

142 Algorithm 16 A single pass streaming algorithm that given a vector 푎 in a stream, for each contributing class 푅, returns an index 푗 along with a (1 (1/2))-estimate of its frequency. ± 1: procedure 퐹2-Contributing(훾, 푆) 2: ◁ Input: stream of updates 푖 [푚] on vector ⃗푎 3: ◁ 푆 is an upper bound∈ on the size of a 훾-contributing class 푖 4: for all 푛푡 2 푖 [log 푆] in parallel do ∈ { | ∈ } 5: ◁ 푛푡 : #coordinates in a 훾-contributing class 6: 훾 parameter of 휑 ( 푐+1 ) ◁ HeavyHitter ← 432 log 푛 log 푚 7: let HH휑 be a instance of 퐹2-HeavyHitter(휑) 8: 휌 (12 log 푚)/2푖 ◁ sample rate of the substream 9: pick← ℎ :[푚] [ 푚 ] from a family of Θ(log(푚푛))-wise independent hash functions → 휌 10: for all item 푝푖 in the input stream do 11: if ℎ(푝푖) = 1 then 12: feed 푝푖 to HH휑

13: return output of HH휑 14: ◁ returns heavy coordinates together with their approximate frequencies

initialized to zero. Let be a stream of items 푝1, , 푝푇 where for each 푗 [푇 ], 푝푗 [푚]. 풮 ··· ∈ ∈ Moreover, let’s assume no item in has frequency more than 푛. There exists a single 풮 pass algorithm 퐹2-Contributing that uses 푂̃︀(1/훾) space and with probability at least 1 − 2/(9 log 푛 log푐 푚) returns a coordinate 푖 from each 훾-contributing class. In addition, it returns (1 1 )-approximate frequency of these coordinates. ± 2 Proof: There are at most log 푛 (the total number of classes) 훾-contributing classes for

⃗푎 and for each 훾-contributing class 푅푡, by Lemma 5.2.9, with probability at least 1 − 2 푐 * 2/(9 log 푛 log 푚), a coordinate in 푅푡 will be a Ω(̃︀ 훾)-HeavyHitter of 퐹2(⃗푎푖* ) (where 푖 = ). By trying all values of * , with probability at least 2 log(푛푡) 푖 [log 푛] 1 log 푛( 2 푐 ) ⌈ ⌉ ∈ − 9 log 푛 log 푚 ≥ 2 1 푐 , 퐹2-Contributing algorithm outputs a coordinate from each 훾-contributing − 9 log 푛 log 푚 class. 

5.2.3. 퐿0-Estimation

Norm estimation is one of the fundamental problems in the area of streaming algorithms

where we are given an 푚-dimensional vector ⃗푎 which is initialized to zero and a sequence

of items 푝1, , 푝푇 (updates for the vector ⃗푎) where for each 푗 [푇 ], 푝푗 [푚] arrive ··· ∈ ∈ in a data stream. In the well-studied task 퐿0-estimation (also known as Count-distinct problem), the goal is to output a (1 휀)-estimate of the number of distinct elements (i.e., ± 143 퐿0(⃗푎) := 푖 ⃗푎[푖] = 0 ) after reading the whole stream. |{ | ̸ }|

Theorem 5.2.12 (퐿0-estimation [13, 28, 112, 113, 32]). Let’s assume that ⃗푎 is an 푚-

dimensional vector initialized to zero. Let be a stream of items 푝1, , 푝푇 where for each 풮 ··· 푗 [푇 ], 푖푗 [푚]. There exists a single pass algorithm that returns a (1 1/2)-approximation ∈ ∈ ± of 퐿0(⃗푎) and uses 푂̃︀(1) space.

5.3. Estimating Optimal Coverage of Max 푘-Cover

In this section, we describe the outline of our single-pass algorithm that approximates the

coverage size of an optimal 푘-cover of ( , ) within a factor of 훼 using 푂̃︀(푚/훼2) space in 풰 ℱ arbitrary order edge arrival streams. The input to our algorithm is 푘, 훼, 푛 = and 푚 = . |풰| |ℱ| In high level, we perform three different subroutines in parallel and show that for any given

Max 푘-Cover instance, at least one of the subroutines estimates the optimal coverage size within the desired factor in the promised space.

Theorem 5.3.1. For any 훼 [1/Θ(̃︀ √푚), 1/Θ(1))̃︀ , there exists one pass streaming algorithm ∈ that uses 푂̃︀(푚/훼2) space and with probability at least 3/4 computes the size an optimal coverage of Max 푘-Cover( , ) within a factor of 1/훼 in edge-arrival streams. 풰 ℱ Note that Theorem 5.3.1 together with the 푂(1)-approximation algorithms of [131, 29] that use 푂̃︀(푚) space, implies that for any 훼 [1/Θ(̃︀ √푚), 1 1/푒), there exists a single-pass ∈ − streaming algorithm that computes an 훼-approximation of the optimal coverage size of Max 푘-Cover( , ) in 푂̃︀(푚/훼2) space. Later in Section 5.5, we extend our approach fur- 풰 ℱ ther to achieve a single pass algorithm that computes an (1/훼)-approximate 푘-cover in 푂̃︀(푚/훼2 + 푘) space.

Theorem 5.3.2. For any 훼 [1/Θ(̃︀ √푚), 1/Θ(1))̃︀ , there exists a single-pass algorithm that ∈ uses 푂̃︀(푚/훼2 + 푘) space and with probability at least 3/4 returns an (1/훼)-approximate solution of Max 푘-Cover ( , ) in edge-arrival streams. 풰 ℱ Finally, we complement our upper bounds with a matching lower bound in Section 5.6.

Theorem 5.3.3. Any single pass (possibly randomized) algorithm on edge-arrival streams that (1/훼)-approximates the optimal coverage size of Max 푘-Cover requires Ω(푚/훼2) space.

144 As a first step, we provide a mapping from the ground set to a small size set of pseudo- 풰 elements such that the optimal 푘-cover on the pseudo-elements covers a constant fraction of the pseudo-elements. This reduction is in particular useful for bounding the number of required samples in methods such as element sampling.

5.3.1. Universe Reduction

In this section we show that in order to solve Max 푘-Cover on edge-arrival streams, it suffices to solve the instances whose optimal coverage size are at least a constant fraction of . |풰| To this end, suppose that we have an algorithm for Max 푘-Cover in edge-arrival streams 풜 with the following properties:

Definition 5.3.4훼, (( 훿, 휂)-oracle for Max 푘-Cover). An algorithm is an (훼, 훿, 휂)-oracle 풜 for Max 푘-Cover if it satisfies the following properties (훼 denotes the approximation guaran- tee, 훿 denotes the failure probability and 휂 is the promised coverage of an optimal 푘-cover):

If the optimal coverage size of Max 푘-Cover( , ) is at least /휂, then with proba- ∙ 풰 ℱ |풰| bility at least 1 훿, returns a (1/훼)-approximation of the optimal coverage size. − 풜 If returns 푧, then an optimal solution of Max 푘-Cover ( , ) with high probability, ∙ 풜 풰 ℱ has coverage at least 푧.

Using an (훼, 훿, 휂)-oracle of Max 푘-Cover, we design an (1/푂̃︀(훼))-approximation algorithm EstimateMaxCover for general Max 푘-Cover with success probability at least 1 푂̃︀(훿) − as in Algorithm 17.

As in EstimateMaxCover, let ℎ : [푧] be a hash function picked uniformly at 풰 → random from a family of 4-wise independent hash functions mapping the ground set 풰 onto pseudo-elements = 1, , 푧 . Furthermore, for a subset of elements 푆, we define 풱 { ··· } ⋃︀ . ℎ(푆) := 푒∈푆 ℎ(푒) Lemma 5.3.5. Let ℎ : [푧] be a hash function picked uniformly at random from a family 풰 → of 4-wise independent hash functions where 푧 32. Further, let 푆 be a subset of of size ≥ 풰 at least 푧. Then, with probability at least 3/4, ℎ(푆) 푧/4. | | ≥

Proof: For any pair of elements 푒푖, 푒푗 푆, let 푋푖,푗 be a random variable which is one if ∈ (i.e. they collide) and zero otherwise. Let ∑︀ denote the the ℎ(푒푖) = ℎ(푒푗) 푋 := 푒푖,푒푗 ∈푆 푋푖,푗

145 Algorithm 17 A single-pass streaming algorithm that uses an (훼, 훿, 휂)-oracle of Max 푘- Cover to compute an (1/푂̃︀(훼))-approximation of the optimal coverage size of Max 푘-Cover. 1: procedure EstimateMaxCover(푘, 훼) 2: ◁ Input: is an (훼, 훿, 휂)-oracle of Max 푘-Cover 3: ◁ 푆풜is an upper bound on the size of a 훾-contributing class 4: if 푘훼 푚 then 5: return≥ 푛/훼◁ trivial bound ◁ run for different guesses on the optimal coverage size in parallel 6: for all 푧 2푖 푖 [log 푛] do ∈ { | ∈ } 7: est푧 0 8: for 푖←= 1 to log(1/훿) do ◁ boost the success probability 9: pick ℎ푖 : [푧] from a family of 4-wise independent hash functions. 10: for all (푆,풰 푒 →) in the data stream do 11: feed (푆, ℎ푖(푒)) to (훼, 훿, 휂)-oracle 풜 12: est푧 max(output of on the stream constructed by ℎ푖, est푧) ← 풜 13: return max est푧 est푧 푧/(4훼) { | ≥ } total number of collision among the elements of 푆 under ℎ.

2 First, we show that if 푋 푆 /훾, then ℎ(푆) 훾/4. Let’s assume ℎ(푆) = 푣1, , 푣푞 ≤ | | | | ≥ { ··· } and let 푛푖 denote the number of elements in 푆 that are mapped to 푣푖 by ℎ. Then, the total number of collision, 푋, is

푞 (︂ )︂ 푞 2 ∑︁ 푛푖 ∑︁ 푛푖 1 푆 푆 푋 = ( )2 푞 (| |)2 = | | . 2 2 4 푞 4푞 푖=1 ≥ 푖=1 ≥ · ·

This implies that 푞 = ℎ(푆) 훾/4. Using this observation, it only remains to show that | | ≥ with probability at least 3/4, 푋 푆 2/푧. Since ℎ is selected from a family of 4-wise | | ≤ | | independent hash functions, 푋푖,푗 푒 ,푒 ∈푆 are pairwise independent. Hence, { } 푖 푗

∑︁ (︂ 푆 )︂ 1 푆 2 E[푋] = E[푋푖,푗] = | | ( ) | | , 2 · 푧 ≤ 2푧 푒푖,푒푗 ∈푆 ∑︁ (︂ 푆 )︂ 1 1 푧 Var[푋] = Var[푋푖,푗] = | | ( ) . 2 · 푧 − 푧2 ≥ 8 푒푖,푒푗 ∈푆

Applying Chebyshev’s inequality,

Pr(푋 > 푆 2/푧) Pr(푋 > E[푋] + Var[푋]) 1/Var[푋] 8/푧 1/4. | | ≤ ≤ ≤ ≤

146 Hence, with probability at least 3/4, 푋 푆 2/푧 which implies that with probability at least ≤ | | 3/4, ℎ(푆) 푧/4. | | ≥  Theorem 5.3.6. If there exists a single pass (훼, 훿, 휂)-oracle for Max 푘-Cover( , ) on 풰 ℱ edge-arrival streams that uses 푓(푚, 훼) space with 휂 4, then EstimateMaxCover is a ≥ (1/푂(훼))-approximation algorithm for Max 푘-Cover with failure probability at most 4훿 log 푛 that uses 푂̃︀(푓(푚, 훼)) space on edge-arrival streams.

Proof: Let OPT be an optimal solution of Max 푘-Cover( , ). First, we show that with high 풰 ℱ probability, EstimateMaxCover returns Ω( (OPT) /훼). Note that for each guess on the |풞 | optimal coverage size 푧 (OPT) , by Lemma 5.3.5, the probability that in none of log(1/훿) ≤ |풞 | iterations ℎ( (OPT)) > (OPT)/4 is at most 훿 (i.e., none of the iterations preserve the | 풞 | 풞 optimal coverage size up to a factor of 4). Moreover, by the guarantee of (훼, 훿, 휂)-oracles for Max 푘-Cover, each run of fails with probability at most 훿. Thus, by an application of union 풜 bound, with probability at least 1 2훿 log 푛, est푧 is at least 푧/(4훼) for all 푧 (OPT) . − ≤ |풞 | This in particular implies that the solution returned by EstimateMaxCover is at least

(OPT) /(8훼). Moreover, since the coverage of a 푘-cover never increases after applying the |풞 | “universe reduction” step (i.e. for each 푆 , ℎ( (푆)) 푆 ) and the estimate returned ⊆ 풰 | 풞 | ≤ | | by the (훼, 훿, 휂)-oracle is with high probability less than the optimal coverage size, the 풜 output of EstimateMaxCover is in [ (OPT) /(8훼), (OPT) ] with probability at least |풞 | |풞 | 1 4훿 log 푛. − Finally, since EstimateMaxCover runs (log 푛)(log 1/훿) instances of with parameter 풜 (훼, 훿, 휂) in parallel and each instance has 푚 sets, the total space of EstimateMaxCover

is 푂̃︀(푓(푚, 훼)). 

The universe reduction step basically enables us to only focus on the instances of Max 푘- Cover in which the optimal solution covers a constant fraction of the ground set, namely at least /4 elements. Next, in Section 5.4, we design an 푂̃︀(푚/훼)-space (훼, 훿, 휂)-oracle |풰| for Max 푘-Cover with 훼 = 1/Ω(1)̃︀ , 휂 4 and 훿 휀/ log 푛 (휀 < 1), which together with ≥ ≤ Theorem 5.3.6 completes the proof of Theorem 5.3.1. Our (훼, 훿, 휂)-oracle for Max 푘-Cover performs three different subroutines in parallel that together guarantee the required proper-

ties of (훼, 훿, 휂)-oracles and only use 푂̃︀(푚/훼2) space:

147 Set sampling based approach. This subroutine which provides the guarantee of ∙ (훼, 훿, 휂)-oracles when the number of common elements is large (see Definition 5.2.1) is an application of a “multi-layered” variant of set sampling. This subroutine is presented in Section 5.4.1.

HeavyHitter based approach. We relate the problem of 훼-estimating/approximating ∙ of Max 푘-Cover to the problem of finding contributing classes and heavy hitters on properly sampled substreams (see Section 5.2.2) when the main contribution of an

optimal solution of Max 푘-Cover is due to “large” sets. In particular, this subroutine finds an (1/훼)-estimation of the optimal coverage size when 훼 = Ω(푘). This approach is presented in Section 5.4.2.

Element sampling based approach. Finally, we employ element sampling together ∙ with a new sampling technique that samples a collection of sets to find a desired

estimate of Max 푘-Cover on instances for which the main contribution to an optimal solution comes from “small” sets. Here, we also take advantage of the structure guaran- teed by the multi-layered set sampling on the number of elements in different frequency levels. This subroutine is presented in Section 5.4.3.

5.4. (훼, 훿, 휂)-Oracle of Max 푘-Cover

In this section, we design the promised (훼, 훿, 휂)-oracle for Max 푘-Cover. Let OPT denote an optimal solution of Max 푘-Cover ( , ). As described in Definition 5.3.4, the solu- 풰 ℱ tion returned by a (훼, 훿, 휂)-oracle with high probability, is smaller than (OPT) and if |풞 | (OPT) /휂, with probability at least (1 훿), it outputs a value not smaller than |풞 | ≥ |풰| − (OPT) /훼. The following theorem together with Theorem 5.3.6 proves Theorem 5.3.1. |풞 | Theorem 5.4.1. Oracle(훼, 푘) performs a single pass on edge arrival streams of the set system ( , ) and implements a (푂̃︀(훼), 1/(log 푛 log푐 푚), 휂)-oracle of Max 푘-Cover( , ) us- 풰 ℱ 풰 ℱ ing 푂̃︀(푚/훼2) space. Proof: The proof follows from the guarantees of LargeCommon (Theorem 5.4.4), Large- Set (Theorem 5.4.8) and SmallSet (Theorem 5.4.22). The total space of the algorithm

148 is clearly 푂̃︀(푚/훼2) which is the space complexity of the each of subroutines invoked by Oracle. 

To design the promised (훼, 훿, 휂)-oracle, we design different subroutines such that each guarantees the properties required by the oracle if certain conditions based on the the size/value of following notions hold.

Common elements. An important property in design of our oracle is whether there exists

훽 훼 such that the number of (훽푘)-common elements is relatively large (see Definition 5.2.1). ≤ We also take advantage of another useful notion which is a property of a 푘-cover (though, here we only describe it for optimal 푘-covers).

Contribution to the optimal coverage. Given the input argument 훼 and a parameter s as defined in Table 5.4.1, we define the following notion of contribution for the sets in an (optimal) 푘-cover.

Definition 5.4.2. For a 푘-cover OPT = 푂1, , 푂푘 , we consider an arbitrary ordering { ··· } ′ ′ of the sets in OPT and define the contribution of 푂푖 to (OPT) as 푂 where 푂 := 풞 | 푖| 푖 ⋃︀ ′ ⋃︀ ′ 푂푖 푂푖. Note that 푂 are disjoint and 푂 = (OPT) = 푧. We (conceptually) ∖ 1≤푗<푖 푖 | 푖∈[푘] 푖| |풞 | define OPTlarge to be the collection of all sets in OPT that contribute more than 푧/(s훼) ′ ′ to (OPT) according to 푂 s; OPTlarge = 푂푖 OPT 푂 푧/(s훼) for s < 1 (as in 풞 푖 { ∈ | | 푖| ≥ } ′ Table 5.4.1). Note that since 푂 s are disjoint, OPTlarge s훼. 푖 | | ≤ 9 w 2500 log2(푚푛) 1 w = min 푘, 훼 , s = , t = , f = 7 log(푚푛), 휎 = 2 , 휂 = 4 { } 2500√2휂 log(s훼) log2(푚푛) · 훼 s 2500 log (푚푛) Table 5.4.1: The values of parameters used in our algorithms.

Design of (훿, 훼, 휂)-oracle of max 푘-cover. Here we sketch a high-level outline of our (훿, 훼, 휂)-oracle for Max 푘-Cover (refer to Algorithm 18 for a formal description). In the following cases, 1 (as in Table 5.4.1). 휎 = Ω( log2(푚푛) )

If there exists a 훽 훼 such that cmn 휎훽|풰| . In this case, by Observation 5.2.5, ∙ ≤ |풰훽푘 | ≥ 훼 to approximate Max 푘-Cover( , ) within a factor of 푂̃︀(훼), it suffices to find 훽푘 sets 풰 ℱ that cover cmn which can be done via set sampling (see Section 5.4.1). 풰훽푘

cmn 휎훽|풰| (OPTlarge) (OPT) /2 and 훽 훼, < . The subroutine for this case ∙ |풞 | ≥ |풞 | ∀ ≤ |풰훽푘 | 훼 which is presented in Section 5.4.2, handles the instances of the problem in which s훼 ≥ 149 Algorithm 18 An (훼, 훿, 휂)-oracle of Max 푘-Cover. 1: procedure Oracle(푘, 훼) 2: ◁ for instances in which 훽 훼 s.t. cmn 휎훽|풰| ∃ ≤ |풰훽푘 | ≥ 훼 3: solcmn LargeCommon(푘, 훼) 4: if s훼 ←2푘 then ≥ 5: ◁ if s훼 2푘, then OPTlarge OPT /2 ≥ | | ≥ | | 6: solHH LargeSet(푘, 훼, 푘) 7: else ←

8: ◁ for instances with s훼 < 2푘 and OPTlarge OPT /2 | | ≥ | | 9: solHH LargeSet(푘, 훼, 훼) ← 10: ◁ for instances with OPTlarge < OPT /2 | | | | 11: solsmall SmallSet(푘, 훼) ← 12: return max(solcmn, solHH, solsmall)

2푘 or, s훼 < 2푘 and there exists an optimal solution OPT such that (OPTlarge) |풞 | ≥ (OPT) /2. |풞 |

Claim 5.4.3. If s훼 2푘, then (OPTlarge) (OPT) /2. ≥ |풞 | ≥ |풞 | Proof: Consider the optimal solution OPT and ignore the sets in OPT whose contri-

bution to the coverage is less than (OPT) /(2푘). Note that the survived sets belong |풞 | |풞(OPT)| to OPTlarge and their total coverage is at least (OPT) 푘 (OPT) /2. |풞 |− · 2푘 ≥ |풞 | 

cmn 휎훽|풰| (OPTlarge) < (OPT) /2 and 훽 훼, < . In this case, the main ∙ |풞 | |풞 | ∀ ≤ |풰훽푘 | 훼 contribution to the coverage of OPT comes from “small” sets. This enables us to

show that if we sample sets with probability 1/훼, then Ω(1̃︀ /훼)-fraction of sets in OPT survive and with high probability, their coverage is Ω(̃︀ (OPT) /훼). In Section 5.4.3, |풞 | we show that element sampling method with some new ideas can take care of this case

which can only happen when s훼 < 2푘.

cmn 휎훽 5.4.1. Multi-layered Set Sampling: 훽 훼 s.t. |풰| ∃ ≤ |풰훽푘 | ≥ 훼 Here, we first guess the value of 훽 (more precisely, a 2-approximate estimate of 훽) and then pick 훽푘 sets rnd at random and compute their coverage in one pass using 푂̃︀(1) space. To get ℱ훽 the desired space complexity, we use the implementation of set sampling with 푂(log(푚푛)) random bits as described in Section 5.7.1.

Theorem 5.4.4. Consider an instance ( , ) of Max 푘-Cover. The LargeCommon al- 풰 ℱ 휎훽|풰| gorithm uses 푂̃︀(1) space and if there exists 훽 훼 such that cmn , then with ≤ |풰훽푘 | ≥ 훼 150 Algorithm 19 A (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the number of common elements is large. 1: procedure LargeCommon((푘, 훼)) 푖 2: for all 훽g 2 1 푖 log 훼 in parallel do ∈ { | ≤ ≤ } 3: ◁ perform set sampling in one pass using 푂̃︀(1) space. 푐푚 log 푚 4: pick ℎg : [ ] from Θ(log(푚푛))-wise independent hash functions ℱ → 훽g푘 5: let DEg be a (1 1/2)-approximation streaming algorithm of 퐿0-estimation 6: for all (푆, 푒) in± the data stream do 7: if ℎg(푆) = 1 then rnd 8: feed 푒 to DEg ◁ computing the coverage of ℱ훽g 9: if VAL(DEg) 휎훽g /(4훼) then ≥ |풰| 10: return 2VAL(DEg)/(3훽g) 11: return infeasible s.t. cmn 휎훽|풰| ◁ @훽 [훼] 훽푘 ∈ |풰 | ≥ 훼푘 high probability, the algorithm returns at least 휎 /(6훼). Moreover, with high probability |풰| the output of LargeCommon is smaller than the coverage size of an optimal solution of

Max 푘-Cover( , ). 풰 ℱ 푖 rnd Claim 5.4.5. For each 훽g 2 1 푖 log 훼 , with high probability, 훽g푘. ∈ { | ≤ ≤ } |ℱ훽g | ≤ Lemma 5.4.6. If there exists 훽 훼 such that cmn 휎훽 /훼, then with high probability ≤ |풰훽푘 | ≥ |풰| the output of LargeCommon is at least 휎 /(6훼). |풰| Proof: Let 2푖 be the smallest power of two which is larger than or equal to 훽; 푖 := log 훽 . ⌈ ⌉ 푖 Consider the iteration of LargeCommon in which 훽g = 2 . Since 2훽 > 훽g 훽 and by ≥ Observation 5.2.2,

휎훽 휎훽g cmn cmn |풰| |풰|. |풰훽g푘 | ≥ |풰훽푘 | ≥ 훼 ≥ 2훼

Hence, by the guarantee of existing streaming algorithms for 퐿0-estimation (Theorem 5.2.12)

1 휎훽g|풰| 휎훽g|풰| and set sampling (Lemma 5.2.3 and 5.7.7), w.h.p., VAL(DEg) = . Hence, ≥ 2 · 2훼 4훼 the estimate returned by the algorithm which is a lower bound on the coverage of the best

푘 sets in rnd (see Observation 5.2.5), w.h.p., is at least 2 1 휎훽g|풰| = 휎|풰| . ℱ훽g 3 · 훽g · 4훼 6훼 Moreover, it is straightforward to check that by the guarantee of the streaming algorithm for 퐿0-estimation (Theorem 5.2.12), the value returned by LargeCommon with high prob- ability is not more than the actual coverage of the best 푘-cover in the collection of sampled sets using . ℎg 

151 Lemma 5.4.7. If LargeCommon returns infeasible, then with high probability, for all

훽 훼, cmn 휎훽 /훼. ≤ |풰훽푘 | ≤ |풰| Proof: Since the algorithm returns infeasible, by the guarantee of the (1 1/2)-approximation ± algorithm for 퐿0-estimation (Theorem 5.2.12) and set sampling (Lemma 5.2.3), for all values 푖 of 훽g 2 푖 log 훼 , with high probability, ∈ { | ≤ }

cmn rnd ( ) 2VAL(DEg) < 휎훽g /(2훼). (5.4.1) |풰훽g푘 | ≤ |풞 ℱ훽g | ≤ |풰|

⌈log 훽⌉ Now, for any given value 훽 훼, consider 훽g := 2 (i.e. set 훽g to be the smallest power ≤ of two which is larger than or equal to 훽). By Observation 5.2.2, cmn cmn which |풰훽푘 | ≤ |풰훽g푘 | cmn together with Eq. 5.4.1 implies that 휎훽g /(2훼) 휎훽 /훼. |풰훽푘 | ≤ |풰| ≤ |풰|  Proof of Theorem 5.4.4: The guarantee on the output of LargeCommon follows from Lemma 5.4.6. Moreover, by Theorem 5.2.12, the total amount of space to compute the cover-

rnd age of each collection (via existing 퐿0-estimation algorithms in streams) is 푂̃︀(1). Hence, ℱ훽g the total space to compute the coverage of all log 훼 collections considered in LargeCommon

is 푂̃︀(1).  5.4.2. Heavy Hitters and Contributing Classes:

(OPTlarge) (OPTsmall) |풞 | ≥ |풞 | In this section, we show that if there exists an optimal solution OPT of Max 푘-Cover( , ) 풰 ℱ such that the main contribution in the coverage of OPT is due to large sets, which are

formally defined to be the sets whose contribution to (OPT) is at least (OPT) /(s훼), 풞 |풞 | then we can approximate the optimal coverage size within a factor of 1/푂̃︀(훼) by detecting Ω(̃︀ 훼2/푚)-HeavyHitters in properly sampled substreams. Following is the main result of this section.

Theorem 5.4.8. Consider an instance ( , ) of Max 푘-Cover. In a single pass, LargeSet 풰 ℱ uses 푂̃︀(푚/훼2) space and if the optimal coverage size of the instance is Ω( ), then with |풰| probability at least 1 1/(log 푛 log푐 푚), it returns at least Ω(̃︀ /훼). Moreover, with high − |풰| probability, the estimate returned by LargeSet is smaller than the optimal coverage size.

We defer the proof of Theorem 5.4.8 to Section 5.8. In this section, we prove the same

152 guarantees on the performance of a simplified variant of LargeSet, LargeSetSimple, when contains no “common” elements (wee will define them formally later in this section) 풰 which essentially presents the main technical ideas.

Partitioning sets into supersets. We partition sets of randomly into (푐푚 log 푚)/w ℱ supersets := 1, , (푐푚 log 푚)/w via a hash function ℎ : [(푐푚 log 푚)/w] chosen 풬 {풟 ··· 풟 } ℱ → from a family of Θ(log(푚푛))-wise independent hash functions. More precisely, each set

푆 belongs to the superset ℎ(푆). ∈ ℱ 풟 The parameter w denotes the desired upper bound on the maximum number of sets in a superset in defined by ℎ and is set to min(훼, 푘). In fact, given w, we define ℎ to be 풬 a function picked uniformly at random from a family of Θ(log(푚푛))-wise independent hash functions [(푐푚 log 푚)/w] . {ℱ → } Claim 5.4.9. With high probability, no superset in has more than w sets. 풬 Claim 5.4.10. With high probability, for each 푒 cmn and , the number of sets ∈ 풰 ∖ 풰w 풟 ∈ 풬 in that contain 푒 is at most f where f = Θ(log(푚푛)). 풟 This implies that assuming cmn = , to get a (1/푂̃︀(훼))-approximation of Max 푘-Cover( , ), 풰w ∅ 풰 ℱ it suffices to find a superset whose total size of its setsis Ω(1̃︀ /훼) times the optimal cover- age size. Now, we are ready to exploit the results on 퐹2-heavy hitters and 퐹2-contributing classes mentioned in Section 5.2.2 to describe our (훼, 훿, 휂)-oracle for Max 푘-Cover assuming cmn = . Later in Section 5.8, we show how to remove this assumption by performing our 풰w ∅ algorithm on a set of sampled elements in instead. 풰

Partitioning supersets by their total size. First, setting 푧 = (OPT) , we partition |풞 | the supersets in (conceptually) according to the total size of their sets into 푂(log 훼) classes 풬 153 as follows:

∑︁ 0 = 푆 푧/2 , 풬 {풟 | | | ≥ } 푆∈풟 푖+1 ∑︁ 푖 푖 = 푧/2 푆 < 푧/2 , 푖 [1, log(훼)) 풬 {풟 | ≤ | | } ∀ ∈ 푆∈풟 ∑︁ small = 푆 < 푧/훼 . 풬 {풟 | | | } 푆∈풟

Further, let 푛푖 denote the number of supersets in 푖; 푛푖 = 푖 . Next, we define the 풬 |풬 | vector ⃗푣 of size (푐푚 log 푚)/w such that ⃗푣[푖] = ∑︀ 푆 denotes the total size of the sets 푆∈풟푖 | | in 푖. In the following, we show that a subset of supersets with large total size form an 풟 2 2 Ω(̃︀ 푚/훼 )-contributing class of 퐹2(⃗푣) and any superset in this Ω(̃︀ 푚/훼 )-contributing class is a (1/훼)-approximate 푘-cover of ( , ). 풰 ℱ We consider the following two cases depending on whether the coordinates corresponding

to small supersets, small, contribute to 퐹2(⃗푣); 퐹2(⃗푣small) 퐹2(⃗푣)/2 where ⃗푣small denotes the 풬 ≥ vector ⃗푣 restricted to the coordinates corresponding to supersets in small. 풬

Case 1: Supersets with total size less than 푧/훼 contribute to 퐹2(⃗푣). This implies that 푐푚 log 푚 푧 2 2푐푚 log 푚 푧2 . 퐹2(⃗푣) 2퐹2(⃗푣small) 2 ( ) ( ) = 2 ≤ ≤ · w · 훼 w · 훼 2 Claim 5.4.11. If 퐹2(⃗푣small) 퐹2(⃗푣)/2, then there exists an Ω(̃︀ 훼 /푚)-contributing class 푖* ≥ 풬 * of 퐹2(⃗푣) for an index 푖 < log(s훼).

Proof: Since each set in OPTlarge has contribution at least 푧/(s훼) to the optimal coverage,

each set of OPTlarge lands in one of 0, , log(s훼)−1. Moreover, since OPTlarge has coverage 풬 ··· 풬 at least 푧/2,

log(s훼)−1 ∑︁ 푧 ∑︁ 푛푖 푂푖 (OPTlarge) 푧/2, · 2푖 ≥ | | ≥ |풞 | ≥ 푖=0 푂푖∈OPTlarge

* 푖* which implies that there exists an index 푖 < log(s훼) such that 푛푖* 2 /(2 log(s훼)). Hence, ≥ 154 훼2 푖* is an Ω(̃︀ )-contributing class of 퐹2(⃗푣): 풬 푚

푖* 2 푧 2 2 푧 w 2 1 푖*+1 푖* ( ) = 훼 퐹2(⃗푣) (2 s훼) |풬 | · 2푖*+1 ≥ 2 log(s훼) · 4(2푖* )2 2푐푚 log 푚 · · 2푖*+3 log(s훼) · B ≤ w 1 훼2 ( ) 퐹2(⃗푣) ≥ s훼 · 8푐 log(s훼) log 푚 · 푚 ·

w More formally, since = Ω(1)̃︀ (see Table 5.4.1), 푖* is a 휑1-contributing class of 퐹2(⃗푣) s훼 풬 where

w 1 훼2 훼2 휑1 = ( ) = Ω(̃︀ ). (5.4.2) s훼 · 8푐 log(s훼) log 푚 · 푚 푚 

Hence, by Theorem 5.2.11, a superset of total size at least 2 1 푧 will be identified by 3 · 2 · s훼 2 the subroutine 퐹2-Contributing(휑1, s훼) using 푂̃︀(푚/훼 ) space.

Remark 5.4.12. Recall that in order to find a coordinate in a 휑-contributing class 푅푡* ,

퐹2-Contributing subsamples the stream proportional to 1/ 푅푡* (so that only 푂(1) co- | | ordinates of 푅푡* survive) and then with high probability any survived coordinate of 푅푡* becomes a Ω(̃︀ 휑)-HeavyHitter in the sampled substream. However, here we show that there exists a 휑-contributing class 푅푡* whose intersection with OPTlarge is a 휑-contributing class of coordinates too. Hence, it suffices to only search for a coordinate ina 휑-contributing class

of size at most OPTlarge s훼. | | ≤

Case 2. Supersets with coverage less than 푧/훼 do not contribute to 퐹2(⃗푣).

Claim 5.4.13. If 퐹2(⃗푣small) < 퐹2(⃗푣)/2, then there exists an Ω(1)̃︀ -contributing class 푖* of 풬 * 퐹2(⃗푣) for an index 푖 < log 훼.

Proof: In this case, since supersets in small are not contributing, there exists an index 풬 * 푖 < log(훼) (note that we consider all classes 0, , log 훼−1 in this case) such that 풬 ··· 풬

푖* 2 푛푖* (푧/2 ) 퐹2(⃗푣)/(2 log 훼); · ≥

155 Algorithm 20 An (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the ma- jority of the coverage in an optimal solution is by the sets whose coverage contributions are at least 1/(s훼) fraction of the optimal coverage size.

1: procedure LargeSetSimple(( , w, thr1, thr2)) 2: ◁ Input: w is an upper bound풱 on the desired number of sets in a superset 2 3: ◁ Parameters: 휑1 = Ω(̃︀ 훼 /푚) and 휑2 = Ω(1)̃︀ 4: let Cntrsmall be an instance of 퐹2-Contributing(휑1, s훼) ◁ for Case 1 5: let be an instance of -Contributing 푐푚 log 푚 for Case Cntrlarge 퐹2 (휑2, w ) ◁ 2 6: pick ℎ : [(푐푚 log 푚)/w] from Θ(log(푚푛))-wise independent hash functions 7: for all (푆,ℱ 푒 →) in the data stream do 8: if 푒 then ∈ 풱 9: feed ℎ(푆) to both Cntrsmall and Cntrlarge ◁ output(Cntr) contains coordinates along with (1 1/2)-estimate of frequencies ±1 10: if there exists 푖 output(Cntrsmall) such that 푣˜푖 thr1 then ∈ ≥ 2 · 11: return 2˜푣푖/(3f) 1 12: if there exists 푖 output(Cntrlarge) such that 푣˜푖 thr2 then ∈ ≥ 2 · 13: return 2˜푣푖/(3f) 14: return infeasible

1 in other words, 푖* is a 휑2-contributing class of 퐹2(⃗푣) where 휑2 = ( ). 풬 2 log 훼  Note that, by Theorem 5.2.11, a superset of total size at least 2 1 푧 will be identified by 3 · 2 · 훼 the subroutine -Contributing 푐푚 log 푚 using space. 퐹2 (휑2, w ) 푂̃︀(1) Lemma 5.4.14. If (OPT) |풰| , then with probability at least 1 1/(3 log 푛 log푐 푚), the |풞 | ≥ 휂 − |풰| |풰| estimate returned by LargeSetSimple with parameters ( = , w, thr1 = , thr2 = ) 풱 풰 휂s훼 휂훼 has coverage at least /(3f휂훼) = Ω(̃︀ /훼). |풰| |풰| Proof: With probability at least 1 2/(9 log 푛 log푐 푚), the algorithm returns a superset whose − total size is at least 2 |풰| |풰| (in fact, if it is in Case 1, then the estimate is at least |풰| ). 3 · 2휂훼 ≥ 3휂훼 3s훼휂 Then, by Claim 5.4.10 and assumption cmn = , the coverage of the reported superset is at 풰w ∅ least 1 |풰| . f · 3휂훼  Lemma 5.4.15. The amount of space used by LargeSetSimple is 푂̃︀(푚/훼2).

Proof: By Theorem 5.2.11, the amount of space to perform Cntrsmall and Cntrlarge as defined in Algorithm 25 is respectively 2 and . 푂̃︀(1/휑1) = 푂̃︀(푚/훼 ) 푂̃︀(1/휑2) = 푂̃︀(1) 

156 5.4.3. Element Sampling: (OPTlarge) < (OPTsmall) |풞 | |풞 | Here we design an (훼, 훿, 휂)-oracle of Max 푘-Cover with the desired parameters for the case in which (OPTsmall) > (OPTlarge) . Intuitively speaking, in this case, after sampling |풞 | |풞 | each set in with probability Θ(1̃︀ /훼) still we can find 푂(푘/훼) sets whose coverage is at ℱ least Ω(̃︀ (OPT) /훼). As proved in Claim 5.4.3, if 훼 = Ω(̃︀ 푘), then the main contribution to |풞 | the coverage of OPT is due to OPTlarge and we can (1/훼)-approximate the optimal coverage size by LargeSet. Hence, in this section we assume that 훼 = 푂̃︀(푘). Moreover, throughout this section we assume that for all 훽 훼, cmn < 휎훽|풰| (otherwise, our multi-layered set ≤ |풰훽푘 | 훼 sampling approach described in Section 5.4.1 returns a (1/훼)-approximation of the optimal coverage size).

Lemma 5.4.16. Consider an instance of Max 푘-Cover ( , ). Suppose that is a collec- 풰 ℱ 풟 tion of 푘 disjoint sets with coverage 푧 such that no 푆 has size more than 푧/(s훼) where ∈ 풟 s < 1 and s = Ω(1)̃︀ . Let’s assume that smp := where each 푆 survives in with 풟 풟 ∩ℳ ∈ ℱ ℳ probability 푐/(s훼) where 푐 > 1 is a fixed constant. Then, with probability at least (1 6/푐), − smp has size at most (2푐푘)/(s훼) and covers at least (푐푧)/(2s훼) elements. 풟 ′ ′ Proof: Let = 푆 , , 푆 and for each 푖, let 푋푖 to be the random variable corresponding 풟 { 1 ··· 푘} ′ ′ ′ to 푆 such that 푋푖 = 푆 if 푆 smp and zero otherwise. 푖 | 푖| 푖 ∈ 풟 푐 ′ 푐 ′ 2 Claim 5.4.17. E[푋푖] = 푆 and Var[푋푖] 푆 . s훼 · | 푖| ≤ s훼 · | 푖|

Next, we define 푋 := 푋1 + + 푋푘. Note that E[푋] = (푐푧)/(s훼) and, by the pairwise ··· ′ independence of 푋푖s and the assumption that 푆 푧/(s훼), | 푖| ≤

푘 푐 ∑︁ Var[푋] 푆′ 2 s훼 푖 ≤ · 푖=1 | | 푐 푧 푧 s훼 ( )2 = 푐 ( )2, ≤ s훼 · · s훼 · s훼

Finally, applying Chebyshev inequality,

푐푧 푐푧 √푐 √푐 Pr[푋 < ] = Pr[푋 < ( ( 푧)] < 4/푐. 2s훼 s훼 − 2 · s훼 ·

Hence, with probability at least 1 4/푐, smp covers at least (푐푧)/(2s훼) elements. − 풟 157 Next, we show that with probability at least 1 2/푐, 0 < smp < (2푐푘)/(s훼). For each − |풟 | 푖, let 푌푖 denote the random variable corresponding to 푆푖 which is equal to one if 푆푖 smp ∈ 풟 and zero otherwise.

Claim 5.4.18. E[푌푖] = (푐/s훼) and Var[푌푖] (푐/s훼). ≤

We define 푌 = 푌1 + + 푌ℓ which denotes the size of smp. Then by pairwise independence ··· 풟 of 푌푖s, E[푌 ] = (푐푘)/(s훼) and Var[푌 ] (푐푘)/(s훼). Applying Chebyshev inequality (Pr[ 푌 ≤ | − E[푌 ] 푡Var[푌 ]] 1/(푡2Var[푌 ])), with probability at least 1 (s훼)/(푐푘) 1 2/푐 (since | ≥ ≤ − ≥ − in this case, 훼 2푘/s), 0 < smp < (2푐푘)/(s훼). ≤ |풟 | Hence, with probability at least (1 6/푐), smp is a subset of size at most (2푐푘)/(s훼) − 풟 that covers at least (푐푧)/(2s훼) elements.  Corollary 5.4.19. Consider an instance ( , ) of Max 푘-Cover and let OPT be an opti- 풰 ℱ 1 mal solution of this instance such that (OPTsmall) (OPT) /(2휂). Moreover, |풞 | ≥ 2 · |풞 | ≥ |풰| let be a collection of 푂̃︀( /훼) pairwise independent sets picked uniformly at random ℳ ⊂ ℱ |ℱ| such that each 푆 belongs to with probability 18/(s훼). With probability at least 2/3, ∈ ℱ ℳ Max ( 36푘 )-Cover( , ) has an optimal solution with coverage at least 9 /(s훼휂). s훼 풰 ℳ |풰|

Proof: By definition of OPTsmall, for each 푂 OPTsmall, the contribution of 푂 to OPT (i.e. ∈ 푂′) is at most 푧/(s훼) (see Definition 5.4.2). Then, the result follows from an application ′ of Lemma 5.4.16 on collection := 푂 푂 OPTsmall by setting 푐 = 18 and 푧 = 풟 { | ∈ } OPTsmall /(2휂). | | ≥ |풰|  Next, we show that we can perform “element sampling” and find an 푂̃︀(1)-approximation of Max ( 36푘 )-Cover of the specified instance in Corollary 5.4.19, ( , ), in one pass and s훼 풰 ℳ using 푂̃︀(푚/훼2) space. To this end, first we compute the space complexity of ( , ) where ℒ ℱ is a subset of size 푂̃︀(푘) which is picked by element sampling. ℒ ⊆ 풰 Lemma 5.4.20. Suppose that an optimal solution of Max ( 푘 )-Cover( , ) has coverage 훼 풰 ℱ /훾 = Ω(̃︀ /훼). Let be a collection of elements of size 푂̃︀( 푘훾 ) picked uniformly at |풰| |풰| ℒ ⊂ 풰 훼 random. With high probability, the total amount of space to store the set system ( , ) is ℒ ℱ 푂̃︀(푚/훼).

Proof: Recall that ( , ) has the property that for all 훽 훼, cmn < 휎훽 /훼 (otherwise, 풰 ℱ ≤ |풰훽푘 | |풰|

158 the result of Section 5.4.1 can be applied). Next, we (conceptually) partition the elements

in into log 훼 + 1 groups as follows: 풰

cmn 0 = 풲 풰 ∖ 풰훼푘 cmn cmn 푖 = 푖−1 푖 푖 [log 훼]. 풲 풰(훼/2 )푘 ∖ 풰(훼/2 )푘 ∀ ∈

푖−1 Note that 0 and for each 푖 [log 훼], 푖 휎 /2 . Since each element 푒 |풲 | ≤ |풰| ∈ |풲 | ≤ |풰| ∈ 풰 survives in with probability 푘훾 , w.h.p., for each , 휎훾푘 . 푂̃︀( ) 푖 [log 훼] 푖 = 푂̃︀(1 + 푖−1 ) ℒ 훼|풰| ∈ |풲 ∩ ℒ| 훼2 2푖푚 Furthermore, since each element in 푖 appears in at most 푂̃︀( ) sets in , the total amount 풲 훼푘 ℱ of space required to store ( , ) is at most ℒ ℱ

log 훼 log 훼 ∑︁ 푘훾 푚 ∑︁ 휎훾푘 2푖푚 푖 max freq(푒) = 푂̃︀( ) 푂̃︀( ) + 푂̃︀(1 + 푖−1 ) 푂̃︀( ) 푒∈풲푖 훼 훼푘 훼2 훼푘 푖=0 |풲 ∩ ℒ| · · 푖=1 · log 훼 훾푚 ∑︁ 2푖푚 휎훾푚 = 푂̃︀( ) + 푂̃︀( + ) 훼2 훼푘 훼2 푖=1

= 푂̃︀(푚/훼). 

Next, we show that after subsampling the sets by a factor of Θ(1̃︀ /훼), we can save another factor of Ω(̃︀ 훼) in the space complexity; in other words, ( , ) uses 푂̃︀(푚/훼2) space. Note ℒ ℳ that since 푘훼 may be as large as Ω(̃︀ 푚) we cannot hope to show directly that each element 푖 in 푖 appears in at most 푂̃︀(푚/(2 훼푘)). However, we can show that the total size of the 풲 intersection of all sets in with is 푂̃︀(푚/훼2) using the properties of the max cover ℳ ℒ instance.

Lemma 5.4.21. Suppose that an optimal solution of Max ( 푘 )-Cover( , ) has coverage 훼 풰 ℱ /훾 = Ω(̃︀ /훼). Let be a collection of elements of size 푂̃︀( 푘훾 ) picked uniformly at |풰| |풰| ℒ ⊂ 풰 훼 random and let be a collection of sets of size 푂̃︀(푚/훼) picked uniformly at random. ℳ ⊂ ℱ With high probability, the total amount of space required to store the set system ( , ) is ℒ ℳ 푂̃︀(푚/훼2).

Proof: First note that since an optimal ( 푘 )-cover of ( , ) has coverage /훾, with high 훼 풰 ℱ |풰| probability, for each set 푆 , 푆 = 푂̃︀(푘/훼). Moreover, by Lemma 5.4.20, the size ∈ ℳ | ∩ ℒ|

159 Algorithm 21 A single pass streaming algorithm that estimates the optimal coverage size of within a factor of in 2 space. To return a 푘 -cover Max 푘-Cover( , ) 1/푂̃︀(훼) 푂̃︀(푚/훼 ) 푂̃︀( 훼 ) that achieves the풰 computedℱ estimate it suffices to change the line marked by (⋆⋆) to return the corresponding 푘 -cover as well (see Section 5.5). 푂̃︀( 훼 ) 1: procedure SmallSet((푘, 훼)) −푖 2: for all 훾g 2 푖 [log 훼] in parallel do 3: is∈ an { estimate| ∈ of the} optimal coverage of 푘 -cover ◁ 훾g 푂̃︀( 훼 ) 4: for log 푛 times in parallel do 5: uniformly selected samples of size Θ(̃︀ 푚/훼) from 6: ℳ ←uniformly selected samples of size 푘 fromℱ Θ(̃︀ 훾g ( 훼 )) 7: initializeℒ ← S( , ) to be an empty set ◁ S( ·, ) stores풰( , ) 8: for all (푆, 푒)ℒinℳ the data stream do ℒ ℳ ℒ ℳ 9: if 푆 and 푒 푖 then 10: add∈ ℳ(푆, 푒) to∈S( ℒ , ) ℒ ℳ 11: if S( , ) > 푂̃︀(푚/훼2) then 12: terminateℒ ℳ 36푘 13: sol훾 maxℒ,ℳ 푂(1)-approx solution of Max ( )-Cover(S( , )) (⋆⋆) g ← { s훼 ℒ ℳ } 14: return |풰| sol sol max훾g ( 훾g ) 훾g = Ω(̃︀ 푘/훼) { 훾g(푘/훼) · | } of the intersection of all sets in with is 푂̃︀(푚/훼). Next, we (conceptually) partition the ℱ ℒ sets of into 푂(log 푘) groups based on their intersection size with as follows (푐 = 푂̃︀(1)): ℱ ℒ 1 푐푘 1 푐푘 푖 = 푆 푆 < , 1 푖 log 푘 풬 { ∈ ℱ | 2푖 · 훼 ≤ | ∩ ℒ| 2푖−1 · 훼 } ∀ ≤ ≤

Since the total size of the intersection of all sets with the sampled set is w.h.p. 푂̃︀(푚/훼), ℒ for each , 푂̃︀(푚/훼) 2푖·푚 . Since we have the assumption that 푚 (we 푖 log 푘 푖 푖 = 푂̃︀( ) 1 ≤ |풬 | ≤ (푐푘)/(2 훼) 푘 푘훼 ≥ took care of the case 푚 < 푘훼 in the first line of EstimateMaxCover separately), w.h.p., 2푖푚 for each 푖, 푖 = 푂̃︀( ). Hence, the total amount of space to store ( , ) is at 풬 |풬 ∩ ℳ| 푘훼 ℒ ℳ most

log 푘 ∑︁ 1 푐푘 2푖푚 푚 푂̃︀( ) = 푂̃︀( ). 2푖−1 훼 푘훼 훼2  푖=1 · ·

cmn 휎훽|풰| Theorem 5.4.22. If (OPTsmall) /(2휂) and for all 훽 훼, < , then with |풞 | ≥ |풰| ≤ |풰훽푘 | 훼 high probability, SmallSet outputs a (1/푂̃︀(훼))-approximate of the optimal coverage size of Max 푘-Cover( , ) in 푂̃︀(푚/훼2) space. 풰 ℱ Proof: By Corollary 5.4.19, for any sampled collection of sets of size Θ(̃︀ 푚/훼) (as in ℳ 160 SmallSet), with probability at least 2/3, there exists a subset of size at most ( 36푘 ) in s훼 ℳ whose coverage is 9 /(s훼휂) = /훾. Moreover, by the guarantee of element sampling, |풰| |풰| Lemma 5.2.4, when 훾/2 < 훾푔 훾, an 푂(1)-approximate solution of Max 푘-Cover( , ) ≤ ℒ ℳ with high probability is an 푂(1)-approximate solution of Max 푘-Cover( , ). Hence, with 풰 ℳ −1 probability 1 푛 , in at least one of the (log 푛) instances with the desired 훾g, sol훾 has − g |풰| 훾 (푘/훼) coverage Ω(̃︀ g ) = Ω(̃︀ 푘/훼) over the sampled elements . Note that we need to scale 훾 · |풰| ℒ sol by a factor of 푘 to reflects its coverage on . 훾g Θ(̃︀ /(훾g )) |풰| · 훼 풰 Further, by Lemma 5.4.21, the amount of space required to store each ( , ) with high ℒ ℳ probability is 푂̃︀(푚/훼2) and since SmallSet stores 푂̃︀(1) different instances ( , ), the total ℒ ℱ 2 space of the algorithm is 푂̃︀(푚/훼 ). 

To complete the SmallSet is indeed an (훼, 훿, 휂)-oracle with the desired parameters, we need to show that it never overestimates the optimal coverage size.

Lemma 5.4.23. The output of SmallSet with high probability is not larger than the op- timal coverage size of Max 푘-Cover( , ). 풰 ℱ 푘 Proof: Note that if the optimal coverage size Max 푂̃︀( )-Cover( , ) is less than Ω(̃︀ /(훼훾g)), 훼 ℒ ℳ |풰| then with high probability in none of the iterations sol . Hence, SmallSet log 푛 훾g = Ω(̃︀ 푘/훼) with high probability does not overestimate the size of an optimal 푂̃︀(푘/훼)-cover in ( , ). 풰 ℱ  5.5. Approximating Max 푘-Cover Problem in Edge Arrival Streams

In this section, we show that we can modify Oracle(푘, 훼) and the subroutines within it slightly to report a (1/훼)-approximate solution as well. Currently, in the algorithms for estimating maximum coverage size, we only maintain the coverage of a collection of sets. To

provide the actual (1/훼)-approximate solution, we also need to report the indices of the sets that belong to the collection.

Reporting a 푘-cover in LargeCommon. In this subroutine, the coverage of different covers with possibly size more than 푘 are computed. For the task of estimating the maximum coverage size, this does not hurt as long as we scale the coverage by a factor that bounds

how much larger the size of cover is compared to 푘. However, in the task of reporting an

161 Algorithm 22 A single pass streaming algorithm that reports a (1/훼)-approximate solution of Max 푘-Cover( , ) in 푂̃︀(훽 + 푘) space when the set of common elements is large. 풰 ℱ 1: procedure ReportLargeCommon((푘, 훼, 훽)) 푖 2: for all 훽g 2 0 푖 log 훽 in parallel do ∈ { | ≤ ≤ } 3: ◁ perform set sampling in one pass using 푂̃︀(1) space. 4: pick ℎ : [ 푐푚 log 푚 ] from Θ(log(푚푛))-wise independent hash functions ℱ → (훽g푘) 5: pick ℎ2 : [훽g log 푚] from Θ(log(푚푛))-wise independent hash functions ℱ → 6: let DEg,푖 푖 [훽g] be 훽g instances of (1/2)-approximation of 퐿0-estimation 7: for{ all (푆,| 푒) ∈in the} data stream do 8: if ℎ(푆) = 1 then rnd 9: feed 푒 to DEg,ℎ (푆) ◁ computing ( ) 2 풞 ℱ훽g,푖 * 10: if 푖 [log 훽] s.t. VAL(DEg,푖* ) 휎 /(4훼) then ∃ ∈ ≥ |풰| * 11: return 2VAL(DEg)/3 and 푆 ℎ(푆) = 1 and ℎ2(푆) = 푖 { | 3휎훽|풰| } 12: return infeasible ◁ no 훽 [훼] s.t. cmn = Ω(̃︀ ) ∃ ∈ |풰훽푘 | 훼푘 actual 푘-cover, it becomes crucial to have at most 푘 sets. To this end, instead of computing

the coverage of a randomly selected (훽g푘)-cover, we partition the collection into 푂̃︀(훽g) sub- collections each of size at most 푘 and by Observation 5.2.5, at least one of them achieves the reported coverage size up to polylogarithmic factors. Moreover, we need to maintain the coverage of 푂̃︀(훽g) = Θ(̃︀ 훽) covers which add an extra 훽g term in the space complexity (see Figure 22).

Lemma 5.5.1. The space complexity of ReportLargeCommon(푘, 훼, 훽) is 푂̃︀(훽 + 푘). Proof: Similarly to the space analysis of LargeCommon (Theorem 5.4.4) and by employ-

ing an existing (1 1/2)-approximation algorithm of 퐿0-estimation that uses 푂̃︀(1) space ± (Theorem 5.2.12), the total amount of space used in the algorithm to maintain DEg,푖 for

all 훽 log 훽 possible choices of (훽g, 푖) is 푂̃︀(훽). Moreover, the algorithm uses 푂̃︀(푘) space to compute the indices of the best 푘-cover. In total, the space complexity is 푂̃︀(훽 + 푘).  Note that the approximation analysis of ReportLargeCommon is essentially the same as the analysis of LargeCommon in Theorem 5.4.4.

Reporting a 푘-cover in LargeSet. This subroutine invokes LargeSetComplete (Algorithm 24) on different sets of sampled elements and in order to report anactual 푘- cover along with a (1/푂̃︀(훼))-approximation of the optimal coverage size, we modify the LargeSetComplete subroutine slightly.

162 In LargeSetComplete, it is always guaranteed that a superset with sufficiently large coverage is detected and its coverage is computed and reported. To get the actual collection

of sets that achieve the coverage, we need to try ℎ on all sets (which belong to [푚]) and report the indices of those that are mapped to the superset with large coverage. More precisely, anywhere in Figure 24 that the algorithm returns a (non infeasible) value (the lines that

are marked by (⋆⋆)), it also returns the set 푆 ℎ(푆) = 푖* which is guaranteed to have size { | } at most w = min(훼, 푘). Lemma 5.5.2. The space complexity of the modified LargeSetComplete with the sug- gested modification at lines is 푚 . (⋆⋆) 푂̃︀( 훼2 + 푘) Proof: Similarly to the space analysis of LargeSetComplete without the modification

(Lemma 5.8.7), the modified algorithm consumes 푂̃︀(푚/훼2) space to compute a (1/푂̃︀(훼))- approximation of the optimal coverage size. Moreover, the new modified algorithm uses

additional 푂̃︀(w) = 푂̃︀(푘) space to find the 푘-cover corresponding to the returned estimate of the optimal coverage size. Hence, the total space complexity of LargeSetComplete with the suggested modification at lines is 푚 . (⋆⋆) 푂̃︀( 훼2 + 푘)  Corollary 5.5.3. The space complexity of the LargeSet that invokes LargeSetCom- plete with the suggested modification at lines is 푚 . (⋆⋆) 푂̃︀( 훼2 + 푘)

Reporting a 푘-cover in SmallSet. This is the most straightforward case. The only required change is to modify the line marked by (⋆⋆) in SmallSet to return both an -approximation of the 푘 instance at line and its corresponding 푂(1) Max 푂̃︀( 훼 )-Cover (⋆⋆) 푘 -cover. Clearly, the extra amount of space to implement this modification is 푘 . 푂̃︀( 훼 ) 푂̃︀( 훼 ) Lemma 5.5.4. The space complexity of the modified SmallSet with the suggested modifi- cation at line is 푚 푘 . (⋆⋆) 푂̃︀( 훼2 + 훼 ) We also need to slightly modify the described Oracle in Algorithm 18 so that, besides

a (1/푂̃︀(훼))-estimate of the optimal coverage size, it reports a (1/푂̃︀(훼))-approximate 푘- cover. Note that it is crucial to run ReportLargeCommon with different parameters depending on whether s훼 2푘 and s훼 < 2푘. The main reason is that the space complexity ≥ of ReportLargeCommon as described above in Algorithm 22 has an additive Θ(훽) term

163 Algorithm 23 An (훼, 훿, 휂)-oracle of Max 푘-Cover that reports a (1/푂̃︀(훼))-approximate 푘-cover. 1: procedure ReportOracle((푘, 훼)) 2: if s훼 2푘 then 3: ◁ if≥ cmn 휎|풰| |풰푘 | ≥ 훼 4: solcmn ReportLargeCommon(푘, 훼, 1) ← 5: ◁ if s훼 2푘, then OPTlarge OPT /2 ≥ | | ≥ | | 6: solHH LargeSet(푘, 훼, 푘) ◁ the described modified LargeSetComplete 7: else ← 8: ◁ for instances in which 훽 훼 s.t. cmn 휎훽|풰| ∃ ≤ |풰훽푘 | ≥ 훼 9: solcmn ReportLargeCommon(푘, 훼, 훼) ← 10: ◁ for instances with s훼 < 2푘 and OPTlarge OPT /2 | | ≥ | | 11: solHH LargeSet(푘, 훼, 훼) ◁ the described modified LargeSetComplete ← 12: ◁ for instances with OPTlarge < OPT /2 | | | | 13: solsmall SmallSet(푘, 훼) ◁ with the described modification in line (⋆⋆) ← 14: ◁ each sol computed above is of from (estimate of the optimal coverage size, 푘-cover) 15: return max(solcmn, solHH, solsmall) ◁ max is over the estimate size of sols which can be as large as 훼 in our implementation of LargeCommon in Algorithm 19. By all minor modifications in subroutines LargeCommon, LargeSet and SmallSet, we design a modified variant of Oracle called ReportOracle as in Algorithm 23.

Theorem 5.5.5. ReportOracle(푘, 훼) performs a single pass on the edge arrival stream of the set system ( , ) and implements an (푂̃︀(훼), 1/(log 푛 log푐 푚), 휂)-oracle of Max 푘-Cover( , ) 풰 ℱ 풰 ℱ using 푂̃︀(푚/훼2 + 푘) space. Moreover, the modified oracle also reports a 푘-cover with the promised coverage size.

Proof: The approximation analysis of the algorithm is basically the same as the analysis of Oracle and we do not repeat it here. We only need show that structural property on the number of elements in different frequency levels provided by LargeCommon with input

parameters (푘, 훼, 1) is sufficient for the (modified variant of) LargeSet in the case s훼 푘. ≥ In LargeSet with input 훼 = Ω(̃︀ 푘), the only bound on the number of common elements that is used in the analysis is cmn = cmn 휎|풰| (in Theorem 5.8.6). In other words, in |풰w | |풰푘 | ≤ 훼 this case, unlike the case 훼 = 푂̃︀(푘), we do not need the guarantee on the size of cmn for 풰훽푘 all values of 훽 훼. Hence, for the case 훼 = Ω(̃︀ 푘), it is sufficient to invoke LargeCommon ≤ with 훽 = 1. Note that the values of the rest of parameters are the same as in Oracle. To bound the space complexity, we consider two cases: 1) s훼 2푘 and 2) s훼 < 2푘. ≥

164 1. s훼 ≥ 2푘. By Lemma 5.5.1 and Corollary 5.5.3, ReportLargeCommon(푘, 훼, 1) and LargeSet that invokes the modified variant of LargeSetComplete respectively use and 푚 space. Hence, the total amount of space used in the case is 푂̃︀(푘) 푂̃︀( 훼2 + 푘) 푚 . 푂̃︀( 훼2 + 푘)

2. s훼 < 2푘. In this case, by Lemma 5.5.1, ReportLargeCommon(푘, 훼, 훼) uses 푂̃︀(푘 + 훼) = 푂̃︀(푘) space. Moreover, by Corollary 5.5.3 and Lemma 5.5.4, the space complexity of the other two subroutines with the described modification is 푚 . Hence, the 푂̃︀( 훼2 + 푘) total amount of space that the algorithm uses in this case is 푚 . 푂̃︀( 훼2 + 푘) Finally, by the modifications we described above, ReportOracle correctly reports an

actual 푘-cover whose coverage is within a 푂̃︀(훼) factor of the optimal coverage as well.  The theorem above together with Theorem 5.3.6 completes the proof of Theorem 5.3.2.

5.6. Lower Bound for Estimating Max 푘-Cover in Edge Ar- rival Streams

By the result of [29], it is known that estimating the size of an optimal coverage of Max 푘- Cover within a factor of two requires Ω(푚) space. Their argument relies on a reduction from Set Disjointness problem and implies the mentioned bound for 1-cover instances. In the following, we generalize their approach and provide lower bounds for the all range of

approximation guarantees 훼 smaller than √푚. We remark that both our lower bound result and the lower bound result of [29] are basically similar to the lower bound of 퐿∞ and 퐿푘 estimation first proved in [13, 27].

The lower bound result we explain in this section is based on the well-known 푟-player Set Disjointness problem with unique set intersection promise which has been studied extensively in communication complexity (e.g. [28, 41, 85]). The setting of the problem is as follows:

There are 푟 players and each has a set 푇푖 [푚]. The promise is that the input is in one of ⊆ the following forms:

No Case: There is a unique element 푗 [푚] such that for all 푖 푟, 푗 푇푖. ∙ ∈ ≤ ∈ Yes Case: All sets are pair-wise disjoint. ∙ 165 Moreover, a round of communication consists of each player 푖 sending a message to player 푖 + 1 in order from 푖 = 1 to 푟 1. The goal is that at end of a single round, player 푟 − be able to correctly output whether the input belongs to the family of Yes instances or No instances. Chakrabarti et al. [41] showed the following tight lower bound on the one-

way communication complexity of the 푟-player Set Disjointness problem with unique set intersection promise.

Theorem 5.6.1 (From [41]). Any randomized one-way protocol that solves 푟-player Set Disjointness(푚) with success probability at least 2/3 requires Ω(푚/푟) bits of communication.

We remark that the same Ω(푚/푟) communication lower bound was later proved for the general model (i.e. with multiple rounds) by Gronemeier [85]. However, for our application, the lower bound on the one-way communication model suffices.

Corollary 5.6.2. Any single-pass streaming algorithm that solves 푟-player Set Disjointness(푚) with success probability at least 2/3 consumes Ω(푚/푟2) space.

Next, we sketch a reduction from 푟-player Set Disjointness(푚) to Max 푘-Cover with 푚 sets such that an 훼-approximation protocol of Max 푘-Cover solves the corresponding instance of 푟-player Set Disjointness(푚). To this end, consider an arbitrary instance of 훼-player Set Disjointness(푚) problem ℐ in which each player 푖 has a set 푇푖 [푚]. Define ℐ = 푒1, , 푒훼 to be the set of ⊂ 풰 { ··· } elements in the Max 1-Cover instance and for each player 푖 if 푗 푇푖 then add (푒푖, 푆푗,) ∈ to the stream. In other words, in the constructed Max 1-Cover( ℐ , ℐ := 푆1, , 푆푚 ) 풰 ℱ { ··· } instance, we have an element 푒푖 corresponding to each player 푖 and there exists a set 푆푗 corresponding to each item 푗 [푚]. Moreover, each set 푆푗 in the Max 1-Cover instance ∈ ( ℐ , ℐ ) denotes the set of players in the Set Disjointness(푚) instance whose input sets 풰 ℱ ℐ contain 푗; 푆푗 := 푖 [훼] 푗 푇푖 . { ∈ | ∈ } Claim 5.6.3. If is a No instance, then the optimal coverage of the Max 1-Cover instance ℐ ( ℐ , ℐ ) is 훼. 풰 ℱ Proof: In this case, by the unique intersection promise, there exists an item 푗 that belongs to all 푇푖 (for 푖 [훼]). Hence, by the construction of the Max 1-Cover instance, 푆푗 covers ∈ the whole ℐ . Thus, the optimal 1-cover has size 훼. 풰  166 Claim 5.6.4. If is a Yes instance, then the optimal coverage of the Max 1-Cover instance ℐ ( ℐ , ℐ ) is 1. 풰 ℱ

Proof: Since 푇푖s are disjoint, for each 푗 [푚], the set 푆푗 has cardinality one. ∈  Corollary 5.6.2 together with Claims 5.6.3 and 5.6.4 implies the stated lower bound on

훼-approximating the optimal coverage size of Max 푘-Cover in edge-arrival streams in The- orem 5.3.3: Any single pass (possibly randomized) algorithm on edge-arrival streams that

(1/훼)-approximates the optimal coverage size of Max 푘-Cover requires Ω(푚/훼2) space.

5.7. Chernoff Bound for Applications with Limited Inde- pendence

In this section, we mention some of the results in [159] on applications of Chernoff bound with limited independence that are used in our analysis.

Definition 5.7.1 (Family of 푑-wise Independent Hash Functions). A family of func- tions = ℎ :[푚] [푛] is 푑-wise independent, if for any set of 푑 distinct values 푥1, , 푥푑, ℋ { → } ··· the random variables ℎ(푥1), , ℎ(푥푑) are independent and uniformly distributed in [푛] when ··· ℎ is picked uniformly at random from . ℋ Next, we exploit the results that show for small values of 푑, we can store a family of 푑-wise hash function in small space and in the same time it suffices for our application of Chernoff bound.

Lemma 5.7.2 (Corollary 3.34 in [166]). For every values of 푚, 푛, and 푑, there is a fam- ily of 푑-wise independent hash functions = ℎ :[푚] [푛] such that a selecting a random ℋ { → } function from only requires 푑 log(푚푛). ℋ ·

Lemma 5.7.3 (Theorem 5 in [159]). Let 푋1, 푋푛 be binary 푑-wise independent ran- ··· dom variables and let 푋 := 푋1 + + 푋푛. Then, ···

⎧ 2 ⎨푒−E[푋]훿 /3 : if 훿 < 1 and 푑 = Ω(훿2E[푋]); Pr( 푋 E[푋] 훿E[푋]) | − | ≥ ≤ ⎩푒−E[푋]훿/3 : if 훿 1 and 푑 = Ω(훿E[푋]). ≥

Lemma 5.7.4 (Theorem 6 in [159]). Let 푋1, , 푋푛 and 푌1, , 푌푛 be Bernoulli trials ··· ··· 167 such that for each 푖, E[푋푖] = E[푌푖] = 푝푖. Let assume that 푌푖s are independent, but 푋푖s are only -wise independent. Further, let and respectively denote ∑︀푛 and 푑 푝(푟) 푝푑(푟) Pr( 푖=1 푌푖 = 푟) ∑︀푛 and let ∑︀푛 be the expected number of success in the trials. Pr( 푖=1 푋푖 = 푟) 휇 = 푖=1 푝푖 −퐷 If 푑 푒휇 + ln(1/푝(0)) + 푟 + 퐷, then 푝푑(푟) 푝(푟) 푒 푝(푟). ≥ | − | ≤ 5.7.1. An Application: Set Sampling with Θ(log(푚푛))-wise Indepen- dence

Consider a set system ( , ) and let ℎ : [(푐푚 log 푚)/훾] be a function selected uni- 풰 ℱ ℱ → formly at random from a family of Θ(log(푚푛))-wise independent hash functions where 푐 is a sufficiently large constant. Then, we think of our randomly selected sets rnd to be the ℱ collection of sets in that are mapped to one by ℎ; rnd := 푆 ℎ(푆) = 1 . ℱ ℱ { ∈ ℱ | } Lemma 5.7.5. Assuming 훾 6푐 log2 푚, with probability at least 1 푚−1, rnd 훾. ≥ − |ℱ | ≤ rnd Proof: Let 푋푖 be a random variable which is one if 푆푖 and zero otherwise. We define ∈ ℱ 푋 := 푋1 + +푋푚. Note that 푋푖 are Θ(log(푚푛))-wise independent and E[푋] = 훾/(푐 log 푚). ··· Then, by an application of Chernoff bound with limited independence (Lemma 5.7.3),

√︂6푐 Pr(푋 > 훾) Pr(푋 > (1 + log 푚)E[푋]) < 푚−1. ≤ 훾 ⏟ ⏞ ≤2훾/푐

Hence, with high probability, rnd has size at most 훾. ℱ  Next, we show that rnd covers the set of elements cmn (see Definition 5.2.1). ℱ 풰훾 Lemma 5.7.6. With probability at least 1 푛−1, rnd covers cmn. − ℱ 풰훾 cmn Proof: Let 푒 and let 푆1, , 푆푞 be the sets in that cover 푒: for each 푖 푞, 푒 푆푖. ∈ 풰훾 ··· ℱ ≤ ∈ rnd We define 푋푖 to be a random variable which is one if 푆푖 and is zero otherwise. We ∈ ℱ rnd also define 푋 := 푋1 + + 푋푞 to denote the number of sets in that cover 푒. Note that ··· ℱ 푋푖 are Θ(log(푚푛))-wise independent and E[푋] = (훾/(푐푚 log 푚)) 푞 log 푛 log(푚푛). Then, · ≥ applying Chernoff bound on random variables with limited independence (Lemma 5.7.3),

√︀ Pr(푋 < 1) < Pr(푋 < (1 (6 log 푛)/E[푋])E[푋]) < 푛−2. − ⏟ ⏞ ≤1/2

168 Hence, by union bound over all elements in cmn, with high probability, rnd covers cmn. 풰훾 ℱ 풰훾  Lemma 5.7.7. Θ(log(푚푛)) random bits suffice to implement set sampling method.

5.8. Generalization of Section 5.4.2

In this section we generalize the approach of Section 5.4.2 which relates the results on heavy hitters and contributing classes to approximating the optimal coverage size. In Section 5.4.2,

to simplify the presentation, we had the assumption that cmn is empty. This is in particular 풰w important since we can then assume that the total size of 푘-covers are roughly the same as their coverage size (up to polylogarithmic factors).

Here, we take care of the case in which cmn is non-empty and complete the description 풰w of our (훼, 훿, 휂)-oracle of Max 푘-Cover for the case (OPTlarge) (OPT) /2. The high |풞 | ≥ |풞 | level idea is to sample enough number of elements so that the algorithm using heavy hitters

and contributing classes still work but in the same time no w-common element is among the sampled elements with at least a constant probability.

Step 1. Sampling Elements. We sample a subset in which each element 푒 is in ℒ ⊂ 풰 ℒ with probability 휌 = (t s훼 휂)/ where t = 푂̃︀(1) (see Table 5.4.1 for the exact values). We · · |풰| implement the process of sampling via a hash function from a family of Θ(log(푚푛))-wise ℒ independent functions = ℎ : [ 푛 ] such that = 푒 ℎ(푒) = 1 . ℋ { 풰 → t·s훼·휂 } ℒ { ∈ 풰 | } Claim 5.8.1. With high probability, (휌 )/2 (3휌 )/2. |풰| ≤ |ℒ| ≤ |풰| For each collection of set , we define ′ to be the intersection of with ; ′ := 푆 푆 풟 풟 풟 ℒ 풟 { ∩ℒ | ∈ . 풟} Claim 5.8.2. If ( ) /(54f휂훼), then with probability at least 1 푚−2, ( ′) |풞 풟 | ≥ |풰| − |풞 풟 | ≥ 휌 ( ) /2. Moreover, if ( ) < /(54f휂훼), then with probability at least 1 푚−2, |풞 풟 | |풞 풟 | |풰| − ( ′) < ts/(36f). |풞 풟 | Proof: Since ts 27 24f log 푚 (as in Table 5.4.1), it follows from two applications of Chernoff ≥ · bound on random variables with limited independence (Lemma 5.7.3).  Similarly to Lemma 5.4.14, we have the following guarantee for LargeSetSimple using the sampled set of elements . ℒ 169 Lemma 5.8.3. If (OPT) /휂 and cmn = , then with probability at least 1 |풞 | ≥ |풰| ℒ ∩ 풰 ∅ − 푐 1/(2 log 푛 log 푚), the output of LargeSetSimple with parameters ( , w, 푆1 = sℒ훼, 푆2 = ℒ 푐푚 log 푚 |ℒ| |ℒ| , thr1 = , thr2 = ) is a superset whose coverage is at least /(54f휂훼). w 18휂s훼 6휂훼 |풰| Proof: By Claim 5.8.1, (휌 )/2 (3휌 )/2. Moreover, by Claim 5.8.2 and since |풰| ≤ |ℒ| ≤ |풰| −2 (OPT) /휂, with probability at least 1 푚 , (OPTlarge) 휌 (OPT) /4 |풞 | ≥ |풰| − |풞 ∩ ℒ| ≥ |풞 | ≥ /(6휂). We define 휂ℒ := 6휂 to denote the coverage of OPTlarge over the sampled elements |ℒ| . ℒ Now, consider the collection OPTlarge := 푂1, , 푂푞 . Since for each 푖 푞, the con- { ··· } ≤ ⋃︀ tribution of 푂푖 to the coverage of OPT is at least 1/(s훼) fraction (i.e. 푂푖 푂푗 | ∖ 푗<푖 | ≥ (OPT) /(s훼) /(휂s훼)), by Claim 5.8.2, with probability at least 1 푚−1, for all 푖 푞, |풞 | ≥ |풰| − ≤

⋃︁ (푂푖 푂푗) 휌 (OPT) /(2s훼). | ∖ 푗<푖 ∩ ℒ| ≥ |풞 |

This implies that 푂푖 푂푖 OPTlarge is a collection of sets whose contribution to { ∩ ℒ | ∈ } 휌|풞(OPT)| 3휌|풞(OPT)| (OPT) w.h.p. is at least ( )/( ) = 1/(3s훼). We define sℒ := 3s which 풞 ∩ ℒ 2s훼 2 denotes the contribution of sets in OPTlarge compared to the coverage of OPT over the sampled elements . ℒ |ℒ| |ℒ| By an application of Lemma 5.4.14 with parameters ( := , thr1 := , thr2 := ), 풱 ℒ 휂ℒsℒ훼 휂ℒ훼 with probability at least 1 1/(3 log 푛 log푐 푚), the algorithm returns a superset ′ whose − 풟푖 coverage on the sampled set is at least ℒ 휌 ts |ℒ| |풰| = . 3f휂ℒ훼 ≥ 36f훼휂 36f

−2 Then, by Claim 5.8.2, with probability at least 1 푚 , 푖 has coverage at least /(54f휂훼). − 풟 |풰| 

Step 2. Handling Common Elements. Next, we turn our attention to the case ℒ ∩ cmn = . Although common elements may be covered Ω(1)̃︀ times within a single superset, 풰w ̸ ∅ it is important to note that the contribution of common elements to all supersets are roughly the same.

Claim 5.8.4. Let cmn := cmn be the set of w-common elements that are sampled in . ℒw ℒ ∩ 풰w ℒ 170 Then, with high probability, for each superset , the total number of times that elements of 풟 cmn appear in (counting duplicates) belongs to [푃, 2푃 ] where 푃 is a fixed number larger ℒw 풟 than log 푛.

cmn Proof: Let 푒 be a w-common element and let 푆1, , 푆푞 be the collection of sets ∈ 풰w ··· that contain 푒. For a superset 푗, define 푋푖,푗 to be a binary random variable that denotes 풟 whether 푆푖 푗. Moreover, let 푌푗,푒 := 푋1,푗 + + 푋푞,푗. Then, E[푌푗,푒] = w푞/(푐푚 log 푚). ∈ 풟 ··· By an application of Chernoff bound with limited independence (Lemma 5.7.3) and since w푞 푐푚 log 푚 log 푛 log(푚푛) (see Definition 5.2.1), ≥ √︃ 6푐 log(푚푛)푚 log 푚 −2 Pr( 푌푗,푒 E[푌푗,푒] E[푌푗,푒]) (푚푛) . (5.8.1) | − | ≥ w푞 ≤ ⏟ ⏞ ≤1/3

Note that for any w-common element 푒 and any pair of supersets 푗, 푖, E[푌푗,푒] = E[푌푖,푒]. 풟 풟 In particular, we define 푌푒 := E[푌푗,푒] whose value is independet of the supersets. Next, we define ∑︀ to denote the expected contribution of -common elements to any 푌cmn := 푒∈풰 cmn 푌푒 w 2 superset. Hence, for each superset 푗, with probability at least 1 1/(푛푚 ), the total number 풟 − of times that w-common elements are covered by 푗 belongs to the range [2푌cmn/3, 4푌cmn/3]. 풟 Hence, with probability at least 1 (푚푛)−1, the contribution of sampled w-common elements − cmn to each superset belongs to [2푌cmn/3, 4푌cmn/3] where 푌cmn log 푛 log(푚푛) log 푛. ≥ |ℒw | ≥  Next, we show that if a w-common element is picked in the algorithm still does not ℒ return a superset with small coverage (though it may missed all large supersets). To this end, we modify LargeSetSimple and design a new subroutine LargeSetComplete as in Figure 24. The high level idea is to guarantee that if the main contribution of a superset is just from the duplicate counts of w-common elements ( 푃 ), it will not be returned. ∝ To achieve this, unlike LargeSetSimple we do not allow 퐹2-Contributing to check for all contributing class of any size (up to ). Instead, we set parameters 푆1 and 푆2 which |풬| denotes how large the size of a contributing class that we are looking for is. To handle the case in which the size of a contributing class is large (i.e. larger than 푆2), we sample supersets proportional to 1/푆2 and compute their coverage by existing 퐿0-estimation algorithms.

171 Algorithm 24 A (훼, 훿, 휂)-oracle of Max 푘-Cover that handles the case in which the majority of the coverage in an optimal solution is by the sets whose coverage contributions are at least 1/(s훼) fraction of the optimal coverage size. By modifying the lines marked by (⋆⋆) slightly, the algorithm returns the 푘-cover that achieves the returned coverage estimate (see Section 5.5 for more explanation).

1: procedure LargeSetComplete(( , w, 푆1, 푆2, thr1, thr2)) 2: ◁ Input: w is an upper bound on풱 the desired number of sets in a superset 2 3: ◁ Parameters: 휑1 = Ω(̃︀ 훼 /푚) and 휑2 = Ω(1)̃︀ 4: let Cntrsmall be an instance of 퐹2-Contributing(휑1, 푆1) ◁ for Case 1 5: let Cntrlarge be an instance of 퐹2-Contributing(휑2, 푆2) ◁ for Case 2 6: pick ℎ : [(푐푚 log 푚)/w] from Θ(log(푚푛))-wise independent hash functions 7: for all (푆,ℱ 푒 →) in the data stream do 8: if 푒 then ∈ 풱 9: feed ℎ(푆) to both Cntrsmall and Cntrlarge 10: ◁ output(Cntr) contains coordinates along with (1 1/2)-estimate of frequencies * 1 ± 11: if there exists s.t. * then 푖 output(Cntrsmall) 푣˜푖 2 thr1 ∈ ≥ · * 12: return 2˜푣푖* /(3f) (⋆⋆) ◁ add return 푆 ℎ(푆) = 푖 to get the 푘-cover * {1 | } 13: if there exists s.t. * then 푖 output(Cntrlarge) 푣˜푖 2 thr2 ∈ ≥ · * 14: return 2˜푣푖* /(3f) (⋆⋆) ◁ add return 푆 ℎ(푆) = 푖 to get the 푘-cover { | } 15: ◁ case 2: if size of the contributing class is large; Ω(̃︀ ) |풬| 16: let be a collection of size 12 log 푚/푆2 picked uniformly at random 17: forℳ all ⊂푖 풬 do ◁ DE to estimate the|풬| coverage of the supersets in ∈ ℳ ℒ 18: let DE푖 be a (1/2)-approximation algorithm of 퐿0-estimation initialized to zero 19: pick ℎ : [(푐푚 log 푚)/w] from Θ(log(푚푛))-wise independent hash functions 20: for all (푆,ℱ 푒 →) in the data stream do 21: if 푒 and ℎ(푆) then feed ℎ(푆) to DEℎ(푆) ∈ 풱 ∈ ℳ * 1 22: if there exists such that VAL * then 푖 (DE푖 ) 2 thr2 ∈ ℳ ≥ · * 23: return 2VAL(DE푖* )/3 (⋆⋆) ◁ add return 푆 ℎ(푆) = 푖 to get the 푘-cover { | } 24: return infeasible Lemma 5.8.5. Even if cmn = , with probability at least 1 푚−1, none of the solutions ℒ∩풰w ̸ ∅ − 푐푚 log 푚 returned by LargeSetComplete with parameters ( , w, 푆1 = sℒ훼, 푆2 = Θ(̃︀ ), thr1 = ℒ w |ℒ| |ℒ| , thr2 = ) is a superset whose coverage is at less than /(54f휂훼). 18휂s훼 6휂훼 |풰| Proof: Here, we need to revisit Case 1 and Case 2 of Section 5.4.2 and redo the calculations with respect to the sampled set of elements . ℒ

2 Case 1. Suppose that 퐹2-Contributing(Ω(̃︀ 훼 /푚), sℒ훼) returns a solution whose cover- age is less than (54f휂훼). |풰| Let ⃗푟 be a vector of size (푐푚 log 푚)/w whose 푖th entry denotes the total size of the

172 rare cmn ∑︀ rare intersection of sets in 푖 and := ; ⃗푟[푖] := 푆 . By Claim 5.8.2, 풟 ℒw ℒ ∖ 풰w 푆∈풟푖 | ∩ ℒw | −2 rare if ( 푗) < /(54f휂훼), with probability at least 1 푚 , ( 푖) < ( 푖) < |풞 풟 | |풰| − |풞 풟 ∩ ℒw | |풞 풟 ∩ ℒ| ts/(36f). Hence, together with Claim 5.4.10, with probability at least 1 2푚−2, ⃗푟[푖] < ts/36. − Similarly, let ⃗푣 be a vector of size (푐푚 log 푚)/w whose 푖th entry denotes the total size of ∑︀ the intersection of sets in 푖 with the sampled elements ; ⃗푣[푖] := 푆 . Note that 풟 ℒ 푆∈풟푖 | ∩ ℒ| Since 푃 1, for each superset 푖 with coverage less than /(54f휂훼), ⃗푣[푖] 2푃 + ⃗푟[푖] ≥ 풟 |풰| ≤ ≤ (ts/36)푃 . On the other hand, by Claim 5.8.4, with probability at least 1 푚−1, the value − of ⃗푣[푗] for all 푗 in the sampled substream is at least 푃 .

Moreover, since the size of contributing classes in this case is at most sℒ훼 = 3s훼 (more pre- cisely, we can always find at most sets that are 훼2 -contributing), by Claim 5.2.8, with 3s훼 Ω(̃︀ 푚 ) log 푚 probability at least 1 , all sampled substreams considered by 퐹2-Contributing(휑1, sℒ훼) − 푚 have size at least 푐푚 log 푚/(ws훼). Hence, by Claim 5.8.4, with probability at least 1 2 log 푚 , − 푚 푐푚 log 푚 2 퐹2(⃗푣smp) ( )푃 where ⃗푣smp is a vector corresponding to a sampled substream consid- ≥ ws훼 4 2 ered in 퐹2-Contributing(휑1, sℒ훼). Since s t 81/(2휂 log(s훼)) (see Table 5.4.1) and by ≤ the value of 휑1 (in Eq. 5.4.2), the following holds:

ts 2 2 푐푚 log 푚 2 ( ) 푃 < 휑1 푃 , 36 · ws훼

which implies that with probability at least 1 3 log 푚/푚, an entry corresponding to a − superset with coverage less than /(54f휂훼) cannot be a 휑1-HeavyHitter in any of the |풰| sampled substreams considered in 퐹2-Contributing(휑1, sℒ훼).

Case 2. The high level idea in this case is similar to the previous case. In Case 1, we heavily 2 use the fact that there exists a class containing at most sℒ훼 coordinates that is Ω(̃︀ 훼 /푚)- contributing. This observation is crucial because then we could argue that all sampled

substreams considered in 퐹2-Contributing(휑1, sℒ훼) have size at least Ω(̃︀ 푚/(wsℒ훼)) which rules out the possibility that a coordinate corresponding to a small superset is a Ω(̃︀ 훼2/푚)-

HeavyHitter for sufficiently small values of s (recall that sℒ = 3s). However, in this case, a contributing class may have size Ω(̃︀ 푚/w) which results in a

sampled substream with only 푂̃︀(1) coordinates in the run of 퐹2-Contributing! To address

173 the issue, we handle the case in which a contributing class has more than 푆2 coordinates separately (푆2 is a parameter to be determined shortly):

푐푚 log 푚 Ω(1)̃︀ -contributing class has size less than 푆2 := 훾. Since 푃 1 and by ∙ w · ≥ Claim 5.8.2 and 5.4.10, for each superset 푗 with coverage less than /(54f휂훼), with 풟 |풰| probability at least 1 2푚−2, ⃗푣[푗] (ts/36)푃 . On the other hand, by Claim 5.8.4, with − ≤ probability at least 1 푚−1, the value of ⃗푣[푗] for all 푗 in the sampled substream is at least − 푃 . Moreover, by Claim 5.2.8, with probability at least 1 log 푚 , all sampled substreams − 푚 considered in -Contributing 1 invoked by LargeSetComplete 퐹2 (휑2 = 2 log(훼) , 푆2) 3푐푚 log 푚 3 (which is to handle Case 2) have at least ( ) = coordinates. Hence, 퐹2(⃗푣smp) w·푆2 훾 ≥ 3 푃 2. By setting 훾 ·

3휑 1944 훾 < 2 = , (5.8.2) (ts/36)2 log(훼)t2s2

2 2 2 1 ⃗푣[푗] (ts/36) 푃 < ( 퐹2(⃗푣smp)), which implies that an entry corresponding to a ≤ 2 log 훼 superset with coverage less than /(27f휂훼) cannot be a 휑2-HeavyHitter in any of |풰| the sampled substream considered in 퐹2-Contributing(휑2, 푆2).

푐푚 log 푚 Ω(1)̃︀ -contributing class has size at least 푆2 := 훾. Here, we also need to ∙ w · consider an extra case compared to Lemma 5.4.14 and 5.8.3 because we do not allow 푆2 to try all values up to 푐푚 log 푚 . To address the case in which the number of coordinates ( w )

in a contributing class is larger than 푆2, we sample ℓ = (12 log 푚) /푆2 supersets |풬| ℳ uniformly at random from ; with high probability, contains a superset from the 풬 ℳ contributing class. Then, we compute the coverage of sampled supersets via an existing

algorithm for 퐿0-estimation. By Claim 5.4.13, the coverage of supersets corresponding

to the 휑2-contributing class whose size is larger than 푆2 on the sampled set is at ℒ least /(휂ℒ훼) ts/(12f). Hence, the algorithm finds a superset with coverage at least |ℒ| ≥ ts/(36f) on which by Claim 5.8.2, it implies that the returned superset has coverage ℒ at least /(54f휂s훼). |풰| 

Theorem 5.8.6. If (OPT) /휂, then with probability at least 1 1/(log 푛 log푐 푚), |풞 | ≥ |풰| − LargeSet(푘, 훼) returns at least /(54f휂훼). Moreover, if LargeSet(푘, 훼) returns a value |풰| 174 other than infeasible, then with probability at least 1 4푚−1, (OPT) /(54f휂훼). − |풞 | ≥ |풰| Proof: By Lemma 5.8.5, with probability at least 1 3푚−1, LargeSet will never return − a superset with coverage less than /(27f휂훼); either it returns a large enough estimate or |풰| returns infeasible. Here, we show that if (OPT) > /휂, then the algorithm will return |풞 | |풰| an estimate at least /(54f휂훼) with probability at least 1 1/(2 log 푛 log푐 푚) 푛−1. To this |풰| − − end, we show that with high probability, in one of the 푂(log 푛) parallel runs of LargeSet, the sampled sets of elements does not contain any common element. Then, by Lemma 5.8.3, ℒ the algorithm with probability at least 1 1/(2 log 푛 log푐 푚) returns /(54f휂훼) in the − |풰| iteration in which the sampled set of elements that does not contain any common element.

Now, we show that with probability at least 1 푛−1, the sampled set of one of 푂(log 푛) − parallel runs of LargeSet does not contain any common element. Let 푞 = cmn and define |풰w | 푌1, , 푌푞 to be independent Bernoulli trials with probability of success equal to 휌. Recall ··· that, we have the assumption that cmn 휎|풰| and since w 푘, cmn cmn 휎|풰| , |풰푘 | ≤ 훼 ≤ |풰w | ≤ |풰푘 | ≤ 훼

푞 ∑︁ 휎 휇 = E[ 푌푖] |풰| 휌 = ts휂휎 훼 푖=1 ≤ · 푞 ∑︁ 푞 −2휌푞 −2ts휂휎 Pr( 푌푖 = 0) = (1 휌) 푒 푒 푖=1 − ≥ ≥

cmn Next, let’s assume that = 푒1, , 푒푞 . Further, define 푋1, , 푋푞 to be random 풰w { ··· } ··· variables such that 푋푖 = 1 if 푒푖 . Hence, 푋1, , 푋푞 are Θ(log(푚푛))-wise independent ∈ ℒ ··· Bernoulli trials with success probability E[푋푖] = 휌. Next, we apply Lemma 5.7.4 with the following parameters:

푟 = 0, ln(1/푝(0)) 2ts휂휎, 휇 = ts휂휎, 퐷 = Θ(log(푚푛)) = 12 log(푚푛), ≤ and show that

푞 푞 cmn ∑︁ ∑︁ −퐷 −2ts휂휎 Pr( w = ) = Pr( 푋푖 = 0) Pr( 푌푖)(1 푒 ) 푒 /2. ℒ ∩ 풰 ∅ 푖=1 ≥ 푖=1 − ≥

175 Algorithm 25 A single pass streaming algorithm. 1: procedure LargeSet((푘, 훼, w)) 2: ◁ run LargeSetComplete on sampled elements in parallel for 푂(log 푛) instances 3: for 푂(log 푛) times in parallel do 4: let s.t. each 푒 (Θ(log(푚푛))-wise independent) w.p. 휌 = t·s훼·휂 ℒ ⊆ 풰 ∈ ℒ |풰| 5: , 푐푚 log 푚 , , 1944 푆1 sℒ훼 푆2 훾 thr1 /(18휂s훼) thr2 /(6휂훼) ◁ 훾 := 2 2 ← ← w · ← |ℒ| ← |ℒ| t s log 훼 6: sol LargeSetComplete( , w, 푆1, 푆2, thr1, thr2) 7: if sol←= infeasible then ℒ 8: return̸ /(54f휂훼) |풰| 9: return infeasible Hence, since ts휂휎 = 휂 = 푂(1) (see Table 5.4.1),

Pr(In all runs, cmn = ) (1 푒−2ts휂휎)푂(log 푛) 푛−1. ℒ ∩ 풰 ̸ ∅ ≤ − ≤

The second property follows from Lemma 5.8.5: if the algorithm returns a value other than infeasible, then with probability at least 1 4푚−1, (OPT) /(54f휂훼). − |풞 | ≥ |풰|  Lemma 5.8.7. The amount of space used by LargeSet is 푂̃︀(푚/훼2).

Proof: Note that LargeSet performs 푂(log 푛) instances of LargeSetComplete in par- allel. Hence, the total amount of space use by LargeSet is 푂(log 푛) times the space complexity of LargeSetComplete. Similarly to the space analysis of LargeSetSimple, the amount of space to perform

2 Cntrsmall and Cntrlarge as defined in LargeSetComplete is respectively 푂̃︀(1/휑1) = 푂̃︀(푚/훼 )

and 푂̃︀(1/휑2) = 푂̃︀(1). Moreover, for the last case in which the contributing class has size 푚 larger than 푆2, by Theorem 5.2.12, in total 푂( ) = 푂(1) space is required to compute the ̃︀ w·푆2 ̃︀ coverage of sampled supersets in . Note that, in all cases, by Lemma 5.7.2, the algorithm ℳ can store ℎ in 푂̃︀(1) space. 2 Hence, the total amount of space required to implement LargeSet is 푂̃︀(푚/훼 ).  Proof of Theorem 5.4.8: The guarantee on the quality of the returned estimate follows from Theorem 5.8.6 with w = min 훼, 푘 and s = 푂̃︀(w/훼) (as in Table 5.4.1). Moreover, { } Lemma 5.8.7 shows that the space complexity of LargeSet is 푂̃︀(푚/훼2). Moreover, since with high probability the estimate returned by the algorithm is a lower bound on the coverage size of a 푘-cover of , the output of LargeSet with high probability, ℱ

176 is smaller than the optimal coverage size of Max 푘-Cover( , ). 풰 ℱ 

177 Part II

Learning-Based Streaming Algorithms

178 Chapter 6

The Frequency Estimation Problem

6.1. Introduction

The frequency estimation problem is formalized as follows: given a sequence 푆 of elements

from some universe 푈, for any element 푖 푈, estimate 푓푖, the number of times 푖 occurs in 푆. ∈ If one could store all arrivals from the stream 푆, one could sort the elements and compute their frequencies. However, in big data applications, the stream is too large (and may be infinite) and cannot be stored. This challenge has motivated the development of streaming algorithms, which read the elements of 푆 in a single pass and compute a good estimate of the frequencies using a limited amount of space. Specifically, the goal of the problem is as follows.

Given a sequence 푆 of elements from 푈, the desired algorithm reads 푆 in a single pass while writing into memory 퐶 (whose size can be much smaller than the length of 푆). Then, given any element 푖 푈, the algorithm reports an estimation of 푓푖 based only on the content of 퐶. ∈ Over the last two decades, many such streaming algorithms have been developed, including Count-Sketch [43], Count-Min [57] and multi-stage filters [70]. The performance guarantees of these algorithms are well-understood, with upper and lower bounds matching up to 푂( ) · factors [110]. However, such streaming algorithms typically assume generic data and do not leverage useful patterns or properties of their input. For example, in text data, the word frequency is known to be inversely correlated with the length of the word. Analogously, in network data, certain applications tend to generate more traffic than others. If such properties

179 can be harnessed, one could design frequency estimation algorithms that are much more efficient than the existing ones. Yet, it is important to do so in a general framework that can harness various useful properties, instead of using handcrafted methods specific to a particular pattern or structure (e.g., word length, application type). In this chapter, we introduce learning-based frequency estimation streaming algorithms. Our algorithms are equipped with a learning model that enables them to exploit data prop- erties without being specific to a particular pattern or knowing the useful property apriori. We further provide theoretical analysis of the guarantees associated with such learning-based algorithms. We focus on the important class of “hashing-based” algorithms, which includes some of the most used algorithms such as Count-Min, Count-Median and Count-Sketch. Informally,

these algorithms hash data items into 퐵 buckets, count the number of items hashed into each bucket, and use the bucket value as an estimate of item frequency. The process can be repeated using multiple hash functions to improve accuracy. Hashing-based algorithms have several useful properties. In particular, they can handle item deletions, which are implemented by decrementing the respective counters. Furthermore, some of them (notably

Count-Min) never underestimate the true frequencies, i.e., 푓˜푖 푓푖 holds always. However, ≥ hashing algorithms lead to estimation errors due to collisions: when two elements are mapped to the same bucket, they affect each others’ estimates. Although collisions are unavoidable given the space constraints, the overall error significantly depends on the pattern of collisions. For example, collisions between high-frequency elements (“heavy hitters”) result in a large estimation error, and ideally should be minimized. The existing algorithms, however, use random hash functions, which means that collisions are controlled only probabilistically.

Our idea is to use a small subset of 푆, call it 푆′, to learn the heavy hitters. We can then assign heavy hitters their own buckets to avoid the more costly collisions. It is important to emphasize that we are learning the properties that identify heavy hitters as opposed to the identities of the heavy hitters themselves. For example, in the word frequency case, shorter words tend to be more popular. The subset 푆′ itself may miss many of the popular words, but whichever words popular in 푆′ are likely to be short. Our objective is not to learn the identity of high frequency words using 푆′. Rather, we hope that a learning model trained

180 on 푆′ learns that short words are more frequent, so that it can identify popular words even if they did not appear in 푆′. Our main contributions are as follows:

We introduce learning-based frequency estimation streaming algorithms, which learn ∙ the properties of heavy hitters in their input and exploit this information to reduce errors.

We provide performance guarantees showing that our learned algorithms can deliver ∙ a logarithmic factor improvement in the error bound over their non-learning coun- terparts. Furthermore, we show that our learning-based instantiation of Count-Min, a widely used algorithm, is asymptotically optimal among all instantiations of that algorithm. See Table 6.4.1 in Section 6.4 for the details.

We evaluate our learning-based algorithms using two real-world datasets: network ∙ traffic and search query popularity. In comparison to their non-learning counterparts, our algorithms yield performance gains that range from 18% to 71%.

6.1.1. Related Work

Frequency estimation in data streams. Frequency estimation, and the closely related problem of finding frequent elements in a data stream, are some of the most fundamental and well-studied problems in streaming algorithms, see [55] for an overview. Hashing-based algorithms such as Count-Sketch [43], Count-Min [57] and multi-stage filters [70] are widely used solutions for these problems. These algorithms also have close connections to sparse recovery and compressed sensing [40, 64], where the hashing output can be considered as a compressed representation of the input data [79]. Several “non-hashing” algorithms for frequency estimation have been also proposed [136, 62, 116, 133]. These algorithms do not possess many of the properties of hashing-based methods listed in the introduction (such as the ability to handle deletions), but they often have better accuracy/space tradeoffs. For a fair comparison, our evaluation focuses only on hashing algorithms. However, our approach for learning heavy hitters should be useful for non-hashing algorithms as well.

181 Some papers have proposed or analyzed frequency estimation algorithms customized to data that follows Zipf Law [43, 58, 133, 135, 154]; the last algorithm is somewhat similar to the “lookup table” implementation of the heavy hitter oracle that we use as a baseline in our experiments. Those algorithms need to know the data distribution a priori, and apply only to one distribution. In contrast, our learning-based approach applies to any data property or distribution, and does not need to know that property or distribution a priori.

Learning-based algorithms. Recently, researchers have begun exploring the idea of in- tegrating machine learning models into algorithm design. In particular, researchers have proposed improving compressed sensing algorithms, either by using neural networks to im- prove sparse recovery algorithms [140, 33], or by designing linear measurements that are optimized for a particular class of vectors [25, 141], or both. The latter methods can be viewed as solving a problem similar to ours, as our goal is to design “measurements” of

the frequency vector (푓1, 푓2 . . . , 푓|푈|) tailored to a particular class of vectors. However, the aforementioned methods need to explicitly represent a matrix of size 퐵 푈 , where 퐵 is the × | | number of buckets. Hence, they are unsuitable for streaming algorithms which, by definition, have space limitations much smaller than the input size. Another class of problems that benefited from machine learning is distance estimation, i.e., compression of high-dimensional vectors into compact representations from which one can estimate distances between the original vectors. Early solutions to this problem, such as Locality-Sensitive Hashing, have been designed for worst case vectors. Over the last decade, numerous methods for learning such representations have been developed [156, 169, 109, 168]. Although the objective of those papers is similar to ours, their techniques are not usable in our applications, as they involve a different set of tools and solve different problems. More broadly, there have been several recent papers that leverage machine learning to design more efficient algorithms. The authors of[59] show how to use reinforcement learn- ing and graph embedding to design algorithms for graph optimization (e.g., TSP). Other learning-augmented combinatorial optimization problems are studied in [101, 24, 128]. More recently, [120, 137] have used machine learning to improve indexing data structures, includ- ing Bloom filters that (probabilistically) answer queries of the form “is a given element inthe

182 data set?” As in those papers, our algorithms use neural networks to learn certain properties of the input. However, we differ from those papers both in our design and theoretical analy- sis. Our algorithms are designed to reduce collisions between heavy items, as such collisions greatly increase errors. In contrast, in existence indices, all collisions count equally. This also leads to our theoretical analysis being very different from that in[137].

6.2. Preliminaries

Estimation Error. To measure and compare the overall accuracy of different frequency estimation algorithms, we will use the expected estimation error which is defined as follows:

Let ℱ = {f_1, ..., f_n} and ℱ̃^𝒜 = {f̃_1, ..., f̃_n} respectively denote the actual frequencies and the estimated frequencies obtained from algorithm 𝒜 of the items in the input stream. We remark that when 𝒜 is clear from the context we denote ℱ̃^𝒜 as ℱ̃. Then we define

Err(ℱ, ℱ̃^𝒜) := E_{i ∼ 𝒟} [ |f_i − f̃_i| ],    (6.2.1)

where 𝒟 denotes the query distribution of the items. Here, we assume that the query distribution 𝒟 is the same as the frequency distribution of the items in the stream, i.e., for any i* ∈ [n], Pr_{i∼𝒟}[i = i*] ∝ f_{i*} (more precisely, for any i* ∈ [n], Pr_{i∼𝒟}[i = i*] = f_{i*}/N, where N = Σ_{i∈[n]} f_i denotes the total sum of all frequencies in the stream).

We note that the theoretical guarantees of frequency estimation algorithms are typically

phrased in the "(ε, δ)-form", e.g., Pr[ |f̃_i − f_i| > εN ] < δ for every i (see e.g., [57]). However, this formulation involves two objectives (ε and δ). We believe that the (single objective) expected error in Equation (6.2.1) is more natural from the machine learning perspective.
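To make the metric concrete, the following small Python sketch (not part of the original text) computes the expected error of Equation (6.2.1) for given true frequencies and estimates; by default it uses the frequency-proportional query distribution assumed above.

import numpy as np

def expected_error(f, f_est, query_probs=None):
    # Expected estimation error E_{i ~ D} |f_i - f_est_i| (Equation (6.2.1)).
    # By default the query distribution D is proportional to the frequencies,
    # i.e. Pr[i] = f_i / N with N = sum_i f_i.
    f = np.asarray(f, dtype=float)
    f_est = np.asarray(f_est, dtype=float)
    if query_probs is None:
        query_probs = f / f.sum()
    return float(np.sum(query_probs * np.abs(f - f_est)))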

Remark 6.2.1. As all upper/lower bounds in this paper are proved by bounding the expected error of estimating the frequency of a single item, E[|f̃_i − f_i|], then using linearity of expectation, we in fact obtain analogous bounds for any query distribution (p_i)_{i∈[n]}. In particular this means that the bounds of Table 6.4.1 for CM and CS hold for any query distribution. For L-CM and L-CS the factor of log(n/B)/log n gets replaced by Σ_{i=B_h+1}^{n} p_i, where B_h = Θ(B) is the number of buckets reserved for heavy hitters.

6.2.1. Algorithms for Frequency Estimation

In this section, we recap three variants of hashing-based algorithms for frequency estimation.

Single Hash Function. We have a single hash function h : U → [B] and an array C of size B. The algorithm maintains C such that at the end of the stream C[b] = Σ_{j : h(j)=b} f_j. For each i ∈ U, the frequency estimate f̃_i is equal to C[h(i)], and always satisfies f̃_i ≥ f_i.

Count-Min. We have k distinct hash functions h_i : U → [B] and an array C of size k × B. The algorithm maintains C, such that at the end of the stream we have C[ℓ, b] = Σ_{j : h_ℓ(j)=b} f_j. For each i ∈ U, the frequency estimate f̃_i is equal to min_{ℓ≤k} C[ℓ, h_ℓ(i)], and always satisfies f̃_i ≥ f_i.

Count-Sketch. Similarly to Count-Min, we have k distinct hash functions h_i : U → [B] and an array C of size k × B. Additionally, in Count-Sketch, we have k sign functions g_i : U → {−1, 1}, and the algorithm maintains C such that C[ℓ, b] = Σ_{j : h_ℓ(j)=b} f_j · g_ℓ(j). For each i ∈ U, the frequency estimate f̃_i is equal to the median of {g_ℓ(i) · C[ℓ, h_ℓ(i)]}_{ℓ≤k}. Note that unlike the previous two methods, here we may have f̃_i < f_i.
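For concreteness, here are minimal Python sketches of the two k-row algorithms just described. They are illustrative only: truly random hash and sign functions are simulated with random lookup tables, and the universe is assumed to be the integers [0, universe).

import numpy as np

class CountMin:
    # Count-Min with k rows of B counters; estimates never underestimate.
    def __init__(self, k, B, universe, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, B, size=(k, universe))   # k simulated hash functions
        self.C = np.zeros((k, B))

    def update(self, i, delta=1):
        for l in range(self.h.shape[0]):
            self.C[l, self.h[l, i]] += delta

    def query(self, i):
        return min(self.C[l, self.h[l, i]] for l in range(self.h.shape[0]))

class CountSketch:
    # Count-Sketch: adds random signs and reports the median estimate.
    def __init__(self, k, B, universe, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, B, size=(k, universe))
        self.g = rng.choice([-1, 1], size=(k, universe))   # sign functions
        self.C = np.zeros((k, B))

    def update(self, i, delta=1):
        for l in range(self.h.shape[0]):
            self.C[l, self.h[l, i]] += self.g[l, i] * delta

    def query(self, i):
        ests = [self.g[l, i] * self.C[l, self.h[l, i]] for l in range(self.h.shape[0])]
        return float(np.median(ests))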

6.2.2. Zipfian Distribution

In our analysis we assume that the frequency distribution of the items follows Zipf's law. That is, if we sort the items according to their frequencies, with no loss of generality assuming that f_1 ≥ f_2 ≥ ··· ≥ f_n, then for any j ∈ [n], f_j ∝ 1/j. Given that the frequencies of the items follow Zipf's law and assuming that the query distribution is the same as the distribution of the frequency of the items in the input stream (i.e., Pr_{i∼𝒟}[i*] = f_{i*}/N = 1/(i* · H_n), where H_n denotes the n-th harmonic number), we can write the expected error defined in (6.2.1) as follows:

Err(ℱ, ℱ̃^𝒜) = E_{i∼𝒟}[ |f_i − f̃_i| ] = (1/N) · Σ_{i∈[n]} |f̃_i − f_i| · f_i = (1/H_n) · Σ_{i∈[n]} |f̃_i − f_i| · f_i.    (6.2.2)

Throughout this paper, we present our results with respect to the objective function on the right hand side of (6.2.2), i.e., (1/H_n) · Σ_{i=1}^{n} |f̃_i − f_i| · f_i. We further assume that in fact f_i = 1/i. At first sight this assumption may seem strange since it says that item i appears a non-integral number of times in the stream. This is however just a matter of scaling, and the assumption is convenient as it removes the dependence on the length of the stream in our bounds.

Algorithm 26 Learning-Based Frequency Estimation
1: procedure LearnedSketch(B, B_r, HH, SketchAlg)
2:   for each stream element i do
3:     if HH(i) = 1 then                ◁ if i is a heavy item
4:       if i is already stored in a unique bucket then
5:         count_i ← count_i + 1
6:       else
7:         initialize count_i ← 1
8:     else
9:       feed i to SketchAlg

Figure 6.3.1: Pseudo-code and block-diagram representation of our algorithms. In the block diagram, each element i is first sent to the learned oracle; items predicted to be heavy are stored in unique buckets, while the remaining items are fed to SketchAlg (e.g., Count-Min).

6.3. Learning-Based Frequency Estimation Algorithms

We aim to develop frequency estimation algorithms that exploit data properties for better performance. To do so, we learn an oracle that identifies heavy hitters, and use the oracle to assign each heavy hitter its unique bucket to avoid collisions. Other items are simply hashed using any classic frequency estimation algorithm (e.g., Count-Min, or Count-Sketch), as shown in the block-diagram in Figure 6.3.1. This design has two useful properties: first, it allows us to augment a classic frequency estimation algorithm with learning capabilities, producing a learning-based counterpart that inherits the original guarantees of the classic algorithm. For example, if the classic algorithm is Count-Min, the resulting learning-based algorithm never underestimates the frequencies. Second, it provably reduces the estimation errors, and for the case of Count-Min it is (asymptotically) optimal.

Algorithm 26 provides pseudo-code for our design. The design assumes an oracle HH(i) that attempts to determine whether an item i is a "heavy hitter" or not. All items classified as heavy hitters are assigned to one of the B_r unique buckets reserved for heavy items. All other items are fed to the remaining B − B_r buckets using a conventional frequency estimation algorithm SketchAlg (e.g., Count-Min or Count-Sketch).

The estimation procedure is analogous. To compute f̃_i, the algorithm first checks whether i is stored in a unique bucket, and if so, reports its count. Otherwise, it queries the SketchAlg procedure. Note that if the element is stored in a unique bucket, its reported count is exact, i.e., f̃_i = f_i.
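A minimal Python sketch of Algorithm 26 together with its query procedure is given below. The names hh_oracle and sketch_alg are illustrative, and the explicit capacity cap on the unique buckets is an added assumption (the pseudo-code does not spell out what happens once the B_r unique buckets are exhausted).

class LearnedSketch:
    # Algorithm 26: items the oracle marks as heavy get exact unique buckets,
    # everything else is fed to a conventional sketch (e.g. Count-Min above).
    def __init__(self, hh_oracle, sketch_alg, num_unique_buckets):
        self.oracle = hh_oracle            # HH(i): predicted heavy / not heavy
        self.sketch = sketch_alg           # SketchAlg for the remaining items
        self.capacity = num_unique_buckets # B_r unique buckets (assumed cap)
        self.unique = {}                   # item id -> exact count

    def update(self, i, delta=1):
        if self.oracle(i) and (i in self.unique or len(self.unique) < self.capacity):
            self.unique[i] = self.unique.get(i, 0) + delta
        else:
            self.sketch.update(i, delta)

    def query(self, i):
        if i in self.unique:               # exact count, so f_est = f_i
            return self.unique[i]
        return self.sketch.query(i)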

Algorithm              | Expected Error, k = 1         | Expected Error, k > 1
Count-Min              | Θ(log n / B)                  | Θ(k · log(kn/B) / B)
Learned Count-Min      | Θ(log²(n/B) / (B · log n))    | Ω(log²(n/B) / (B · log n))
Count-Sketch           | Θ(log B / B)                  | Ω(k^{1/2} / (B · log k)) and O(k^{1/2} / B)
Learned Count-Sketch   | Θ(log(n/B) / (B · log n))     | Ω(log(n/B) / (B · log n))

Table 6.4.1: The performance of different algorithms on streams with frequencies obeying Zipf's law. Here k is the number of hash functions, B is the number of buckets, and n is the number of distinct elements. The space complexity of all algorithms is Θ(B).

The oracle is constructed using machine learning and trained with a small subset of S, call it S′. Note that the oracle learns the properties that identify heavy hitters, as opposed to the identities of the heavy hitters themselves. For example, in the case of word frequency, the oracle would learn that shorter words are more frequent, so that it can identify popular words even if they did not appear in the training set S′.

6.4. Analysis

Our algorithms combine simplicity with strong error bounds. Below, we summarize our theoretical results; the full statements and proofs appear in the subsections that follow. In particular, Table 6.4.1 lists the results proven in this chapter, where each row refers to a specific streaming algorithm and its corresponding error bound. First, we show that if the heavy hitter oracle is accurate, then the errors of the learned variants of Count-Min and Count-Sketch are, up to logarithmic factors, smaller than those of their standard counterparts. Second, we show that, asymptotically, our learned Count-Min algorithm cannot be improved any further by designing a better hashing scheme. Specifically, for the case of Learned Count-Min with a perfect oracle, our design achieves the same asymptotic error as the "Ideal Count-Min", which optimizes its hash function for the given input.

We remark that for both Count-Min and Count-Sketch we aim at analyzing the expected value of the variable Σ_{i∈[n]} f_i · |f̃_i − f_i|, where f_i = 1/i and f̃_i is the estimate of f_i output by the relevant sketching algorithm. Throughout this paper we use the following notation: for an event E we denote by [E] the random variable in {0, 1} which is 1 if and only if E occurs.

6.4.1. Tight Bounds for Count-Min with Zipfians

We begin by presenting our tight analysis of Count-Min with Zipfians. The main theorem is the following.

Theorem 6.4.1. Let n, B, k ∈ ℕ with k ≥ 2 and B ≤ n/k. Let further h_1, ..., h_k : [n] → [B] be independent and truly random hash functions. For i ∈ [n], define the random variable f̃_i = min_{ℓ∈[k]} ( Σ_{j∈[n]} [h_ℓ(j) = h_ℓ(i)] f_j ). For any i ∈ [n] it holds that

E[ |f̃_i − f_i| ] = Θ( log(n/B) / B ).
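Before turning to the discussion and proof, the following small numpy simulation (not part of the thesis) can be used to sanity-check the Θ(log(n/B)/B) behavior of Theorem 6.4.1 under the normalization f_j = 1/j; the parameter values in the trailing comment are arbitrary.

import numpy as np

def cm_point_error(n, B, k, trials=200, seed=0):
    # Average |f_est_i - f_i| for the item i = n // 2 under Count-Min with
    # truly random hash functions and Zipfian weights f_j = 1/j.
    rng = np.random.default_rng(seed)
    f = 1.0 / np.arange(1, n + 1)
    i = n // 2
    errs = []
    for _ in range(trials):
        h = rng.integers(0, B, size=(k, n))
        # f[h[l] == h[l, i]].sum() includes f_i itself, so est >= f_i.
        est = min(f[h[l] == h[l, i]].sum() for l in range(k))
        errs.append(est - f[i])
    return float(np.mean(errs))

# Expected to scale roughly like log(n/B)/B, e.g. cm_point_error(n=10_000, B=200, k=2).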

Replacing B by B/k in Theorem 6.4.1 and using linearity of expectation, we obtain the desired bound for Count-Min in the upper right-hand entry of Table 6.4.1. The natural assumption that B ≤ n/k simply says that the total number of buckets is upper bounded by the number of items. To prove Theorem 6.4.1 we start with the following special case of the theorem.

Lemma 6.4.2. Suppose that we are in the setting of Theorem 6.4.1 and further that¹ n = B. Then

E[ |f̃_i − f_i| ] = O(1/n).

¹In particular we dispense with the assumption that B ≤ n/k.

Proof: It suffices to show the result when k = 2, since adding more hash functions and corresponding tables only decreases the value of |f̃_i − f_i|. Define Z_ℓ = Σ_{j∈[n]∖{i}} [h_ℓ(j) = h_ℓ(i)] f_j for ℓ ∈ [2] and note that these variables are independent. For a given t ≥ 3/n we wish to upper bound Pr[Z_ℓ ≥ t]. Let s < t and note that if Z_ℓ ≥ t then either of the following two events must hold:

E_1: There exists a j ∈ [n] ∖ {i} with f_j > s and h_ℓ(j) = h_ℓ(i).
E_2: The set {j ∈ [n] ∖ {i} : h_ℓ(j) = h_ℓ(i)} contains at least t/s elements.

Union bounding we find that

Pr[Z_ℓ ≥ t] ≤ Pr[E_1] + Pr[E_2] ≤ 1/(ns) + (n choose t/s) · n^{−t/s} ≤ 1/(ns) + (es/t)^{t/s}.

Choosing s = t / log(tn), a simple calculation yields that Pr[Z_ℓ ≥ t] = O( log(tn) / (tn) ). As Z_1 and Z_2 are independent, Pr[Z ≥ t] = O( (log(tn)/(tn))² ), where Z = min(Z_1, Z_2) = f̃_i − f_i, so

E[Z] = ∫_0^∞ Pr[Z ≥ t] dt ≤ 3/n + ∫_{3/n}^∞ O( (log(tn)/(tn))² ) dt = O(1/n). □

Before proving the full statement of Theorem 6.4.1 we recall Bennett’s inequality.

Theorem 6.4.3 (Bennett's inequality [30]). Let X_1, ..., X_n be independent, mean zero random variables. Let S = Σ_{i=1}^{n} X_i, and let σ², M > 0 be such that Var[S] ≤ σ² and |X_i| ≤ M for all i ∈ [n]. For any t ≥ 0,

Pr[S ≥ t] ≤ exp( −(σ²/M²) · h(tM/σ²) ),

where h : ℝ_{≥0} → ℝ_{≥0} is defined by h(x) = (x + 1) log(x + 1) − x. The same tail bound holds on the probability Pr[S ≤ −t].

Remark 6.4.4. It is well known and easy to check that for x ≥ 0,

(1/2) · x log(x + 1) ≤ h(x) ≤ x log(x + 1).

We will use these asymptotic bounds repeatedly in this paper.

Proof of Theorem 6.4.1. We start out by proving the upper bound. Let N_1 = [B] ∖ {i} and N_2 = [n] ∖ ([B] ∪ {i}). Let b ∈ [k] be such that Σ_{j∈N_1} f_j · [h_b(j) = h_b(i)] is minimized. Note that b is itself a random variable. We also define

Y_1 = Σ_{j∈N_1} f_j · [h_b(j) = h_b(i)],
Y_2 = Σ_{j∈N_2} f_j · [h_b(j) = h_b(i)].

Clearly |f̃_i − f_i| ≤ Y_1 + Y_2. Using Lemma 6.4.2, we obtain that E[Y_1] = O(1/B). For Y_2 we observe that

E[Y_2 | b] = Σ_{j∈N_2} f_j / B = O( log(n/B) / B ).

We conclude that

E[ |f̃_i − f_i| ] ≤ E[Y_1] + E[Y_2] = E[Y_1] + E[E[Y_2 | b]] = O( log(n/B) / B ),

as desired.

Next we show the lower bound. For j ∈ [n] and ℓ ∈ [k] we define X_ℓ^{(j)} = f_j · ( [h_ℓ(j) = h_ℓ(i)] − 1/B ). Note that the variables (X_ℓ^{(j)})_{j∈[n]} are independent. We also define S_ℓ = Σ_{j∈N_2} X_ℓ^{(j)} for ℓ ∈ [k]. Observe that |X_ℓ^{(j)}| ≤ f_j ≤ 1/B for j ≥ B, E[X_ℓ^{(j)}] = 0, and that

Var[S_ℓ] = Σ_{j∈N_2} f_j² (1/B − 1/B²) ≤ 2/B².

Applying Bennett's inequality with σ² = 2/B² and M = 1/B thus gives that

Pr[S_ℓ ≤ −t] ≤ exp( −2 h(tB/2) ).

Defining W_ℓ = Σ_{j∈N_2} f_j · [h_ℓ(j) = h_ℓ(i)], it holds that E[W_ℓ] = Θ( log(n/B) / B ) and S_ℓ = W_ℓ − E[W_ℓ], so putting t = E[W_ℓ]/2 in the inequality above we obtain that

Pr[W_ℓ ≤ E[W_ℓ]/2] = Pr[S_ℓ ≤ −E[W_ℓ]/2] ≤ exp( −2 h( Ω(log(n/B)) ) ).

Appealing to Remark 6.4.4 and using that B ≤ n/k, the above bound becomes

Pr[W_ℓ ≤ E[W_ℓ]/2] ≤ exp( −Ω( log(n/B) · log(log(n/B) + 1) ) ) = exp( −Ω( log k · log(log k + 1) ) ) = k^{−Ω(log(log k + 1))}.    (6.4.1)

By the independence of the events (W_ℓ > E[W_ℓ]/2)_{ℓ∈[k]} we have that

Pr[ |f̃_i − f_i| ≥ E[W_ℓ]/2 ] ≥ (1 − k^{−Ω(log(log k+1))})^k = Ω(1),

and so E[ |f̃_i − f_i| ] = Ω(E[W_ℓ]) = Ω( log(n/B) / B ), as desired. □

6.4.2. Learned Count-Min with Zipfians

One hash function. By the tight analysis of Count-Min, we have the following theorem on the expected error of the learned Count-Min with k = 1. By taking B_1 = B_h = Θ(B) and B_2 = B − B_h = Θ(B) in the theorem below, the result on learned Count-Min for k = 1 claimed in Table 6.4.1 follows immediately.

Theorem 6.4.5. Let n, B_1, B_2 ∈ ℕ and let h : [n] ∖ [B_1] → [B_2] be an independent and truly random hash function. For i ∈ [n] ∖ [B_1], define the random variable f̃_i = Σ_{j∈[n]∖[B_1]} [h(j) = h(i)] f_j. For any i ∈ [n] ∖ [B_1] it holds that

E[ |f̃_i − f_i| ] = Θ( log(n/B_1) / B_2 ).

Multiple hash functions. Next we compute an asymptotic lower bound on the expected error of Count-Min, no matter how the hash functions are picked. In particular, this implies a lower bound on the expected error of the learned Count-Min.

Claim 6.4.6. For a set of items I, let f(I) denote the total frequency of all items in I; f(I) := Σ_{i∈I} f_i. In any hash function of the form h : I → [B_I] with minimum expected error, any item with frequency at least f(I)/B_I does not collide with any other items in I.

Proof: For each bucket b ∈ [B_I], let f_h(b) denote the total frequency of the items mapped to b under h; f_h(b) := Σ_{i∈I : h(i)=b} f_i. Then we can rewrite Err(ℱ(I), ℱ̃_h(I)) as

Err(ℱ(I), ℱ̃_h(I)) = Σ_{i∈I} f_i · (f_h(h(i)) − f_i) = Σ_{i∈I} f_i · f_h(h(i)) − Σ_{i∈I} f_i² = Σ_{b∈[B_I]} f_h(b)² − Σ_{i∈I} f_i².    (6.4.2)

Note that in (6.4.2) the second term is independent of h and is a constant. Hence, an optimal hash function minimizes the first term, Σ_{b∈[B_I]} f_h(b)².

Suppose that an item i* with frequency at least f(I)/B_I collides with a (non-empty) set of items I* ⊆ I ∖ {i*} under an optimal hash function h*. Since the total frequency of the items mapped to the bucket b* containing i* is greater than f(I)/B_I (i.e., f_{h*}(b*) > f(I)/B_I), there exists a bucket b such that f_{h*}(b) < f(I)/B_I. Next, we define a new hash function h with smaller estimation error compared to h*, which contradicts the optimality of h*:

h(i) = h*(i) if i ∈ I ∖ I*, and h(i) = b otherwise.

Formally,

Err(ℱ(I), ℱ̃_{h*}(I)) − Err(ℱ(I), ℱ̃_h(I)) = f_{h*}(b*)² + f_{h*}(b)² − f_h(b*)² − f_h(b)²
= (f_{i*} + f(I*))² + f_{h*}(b)² − f_{i*}² − (f_{h*}(b) + f(I*))²
= 2 f_{i*} · f(I*) − 2 f_{h*}(b) · f(I*)
= 2 f(I*) · (f_{i*} − f_{h*}(b)) > 0,

since f_{i*} > f(I)/B_I > f_{h*}(b). □

Next, we show that in any hash function h* : [n] → [B] with minimum expected error and assuming a Zipfian distribution, the Θ(B) most frequent items do not collide with any other items under h*.

Lemma 6.4.7. Suppose that B = n/γ where γ is a large enough constant, and assume that the items follow a Zipfian distribution. In any hash function h* : [n] → [B] with minimum expected error, none of the B/(2 ln γ) most frequent items collide with any other items (i.e., they are mapped to a singleton bucket).

Proof: Let i_{j*} be the most frequent item that is not mapped to a singleton bucket under h*. If j* > B/(2 ln γ) then the statement holds. Suppose it is not the case and j* ≤ B/(2 ln γ). Let I denote the set of items with frequency at most f_{j*} = 1/j* (i.e., I = {i_j | j ≥ j*}), and let B_I denote the number of buckets that the items with index at least j* are mapped to; B_I = B − (j* − 1). It is straightforward to show that f(I) < ln(n/j*) + 1. Next, by Claim 6.4.6, we show that h* does not hash the items {j*, ..., n} to B_I optimally. In particular, we show that the frequency of item j* is more than f(I)/B_I. To prove this, first we observe that the function g(j) := j · (ln(n/j) + 1) is strictly increasing in [1, n]. Hence, for any j* ≤ B/(2 ln γ),

j* · (ln(n/j*) + 1) ≤ (B/(2 ln γ)) · (ln(2γ ln γ) + 1)
≤ B · (1 − 1/(2 ln γ))    (since ln(2 ln γ) + 2 < ln γ for sufficiently large γ)
< B_I.

Thus, f_{j*} = 1/j* > (ln(n/j*) + 1)/B_I > f(I)/B_I, which implies that an optimal hash function must map j* to a singleton bucket. □

Theorem 6.4.8. If n/B is sufficiently large, then the expected error of any hash function that maps a set of n items following a Zipfian distribution to B buckets is Ω( ln²(n/B) / B ).

Proof: Let γ = n/B. By Lemma 6.4.7, in any hash function with minimum expected error, the B/(2 ln γ) most frequent items do not collide with any other items (i.e., they are mapped into a singleton bucket), where γ is a sufficiently large constant. Hence, the goal is to minimize (6.4.2) for the set of items I which consists of all items other than the B/(2 ln γ) most frequent items. Since the sum of squares of m items that sum to S is at least S²/m, the expected error of any optimal hash function is at least:

Err(ℱ(I), ℱ̃_{h*}(I)) = Σ_{b∈[B]} f_{h*}(b)² − Σ_{i∈I} f_i²    (by (6.4.2))
≥ (Σ_{i∈I} f_i)² / ( B (1 − 1/(2 ln γ)) ) − Σ_{i∈I} f_i²
≥ (ln(2γ ln γ) − 1)² / B − ( 2 ln γ / B + 1/n )
= Ω( ln² γ / B )    (for sufficiently large γ)
= Ω( ln²(n/B) / B ). □

Next, we show a more general statement which basically says that the estimation error of any Count-Min with B buckets is Ω( ln²(n/B) / B ), no matter how many rows it has.

Theorem 6.4.9. If n/B is sufficiently large, then the estimation error of any Count-Min that maps a set of n items following a Zipfian distribution to B buckets is Ω( ln²(n/B) / B ).

Proof: We prove the statement by showing a reduction that, given a Count-Min sketch CM(k) with hash functions h_1, ..., h_k ∈ {h : [n] → [B/k]}, constructs a single hash function h* : [n] → [B] whose estimation error is at most the estimation error of CM(k).

For each item i, let j* = argmin_{j∈[k]} C[j, h_j(i)] denote the row whose counter is returned by CM(k) as the estimate of f_i. Since the total number of buckets in CM(k) is B, the set of buckets that CM(k) uses to report the estimates of {f_i | i ∈ [n]} trivially has size at most B. We define h* as follows:

h*(i) = (j*, h_{j*}(i)) for each i ∈ [n], where j* = argmin_{j∈[k]} C[j, h_j(i)].

Unlike CM(k), h* maps each item to exactly one bucket in {C[ℓ, j] | ℓ ∈ [k], j ∈ [B/k]}. Hence, denoting by C′[b] the total frequency of the items that h* maps to bucket b, we have for each item i that C′[h*(i)] ≤ C[h*(i)] = f̃_i, where f̃_i is the estimate of f_i returned by CM(k). Moreover, since for each i, C′[h*(i)] ≥ f_i, we get Err(ℱ, ℱ̃_{h*}) ≤ Err(ℱ, ℱ̃_{CM(k)}). Finally, by Theorem 6.4.8, the estimation error of h* is Ω( ln²(n/B) / B ), which implies that the estimation error of CM(k) is Ω( ln²(n/B) / B ) as well. □

6.4.3. (Nearly) Tight Bounds for Count-Sketch with Zipfians

In this section we proceed to analyze Count-Sketch for Zipfians, using either a single hash function or several. We start with two simple lemmas which, for certain frequencies (f_i)_{i∈[n]} of the items in the stream, can be used to obtain good upper and lower bounds, respectively, on E[ |f̃_i − f_i| ] in Count-Sketch with a single hash function. We will use these two lemmas both in our analysis of standard and of learned Count-Sketch for Zipfians.

Lemma 6.4.10. Let w = (w_1, ..., w_n) ∈ ℝ^n, let η_1, ..., η_n be independent Bernoulli variables taking value 1 with probability p, and let σ_1, ..., σ_n ∈ {−1, 1} be independent Rademachers, i.e., Pr[σ_i = 1] = Pr[σ_i = −1] = 1/2. Let S = Σ_{i=1}^{n} w_i η_i σ_i. Then

E[ |S| ] = O( √p · ‖w‖_2 ).

Proof: Using that E[σ_i σ_j] = 0 for i ≠ j and Jensen's inequality,

E[ |S| ]² ≤ E[S²] = E[ Σ_{i=1}^{n} w_i² η_i² ] = p ‖w‖_2²,

from which the result follows. □

Lemma 6.4.11. Suppose that we are in the setting of Lemma 6.4.10. Let I ⊂ [n] and let w_I ∈ ℝ^n be defined by (w_I)_i = [i ∈ I] · w_i. Then

E[ |S| ] ≥ (1/2) p (1 − p)^{|I|−1} ‖w_I‖_1.

Proof: Let J = [n] ∖ I, S_1 = Σ_{i∈I} w_i η_i σ_i, and S_2 = Σ_{i∈J} w_i η_i σ_i. Let E denote the event that S_1 and S_2 have the same sign or S_2 = 0. Then Pr[E] ≥ 1/2 by symmetry. For i ∈ I we denote by A_i the event that {j ∈ I : η_j ≠ 0} = {i}. Then Pr[A_i] = p (1 − p)^{|I|−1}, and furthermore A_i and E are independent. If A_i ∩ E occurs, then |S| ≥ |w_i|, and as the events (A_i ∩ E)_{i∈I} are disjoint it thus follows that

E[ |S| ] ≥ Σ_{i∈I} Pr[A_i ∩ E] · |w_i| = (1/2) p (1 − p)^{|I|−1} ‖w_I‖_1. □

One hash function. We are now ready to commence our analysis of Count-Sketch for Zipfians. As in the discussion succeeding Theorem 6.4.1, the following theorem yields the desired result for a single hash function as presented in Table 6.4.1.

Theorem 6.4.12. Suppose that B ≤ n and let h : [n] → [B] and s : [n] → {−1, 1} be truly random hash functions. Define the random variable f̃_i = Σ_{j∈[n]} [h(j) = h(i)] s(j) f_j for i ∈ [n]. Then

E[ |f̃_i − s(i) f_i| ] = Θ( log B / B ).

Proof: Let i ∈ [n] be fixed. We start by defining N_1 = [B] ∖ {i} and N_2 = [n] ∖ ([B] ∪ {i}) and note that

|f̃_i − s(i) f_i| ≤ | Σ_{j∈N_1} [h(j) = h(i)] s(j) f_j | + | Σ_{j∈N_2} [h(j) = h(i)] s(j) f_j | := X_1 + X_2.

Using the triangle inequality,

E[X_1] ≤ (1/B) Σ_{j∈N_1} f_j = O( log B / B ).

Also, by Lemma 6.4.10, E[X_2] = O(1/B), and combining the two bounds we obtain the desired upper bound.

For the lower bound we apply Lemma 6.4.11 with I = N_1, concluding that

E[ |f̃_i − s(i) f_i| ] ≥ (1/(2B)) (1 − 1/B)^{|N_1|−1} Σ_{i∈N_1} f_i = Ω( log B / B ). □

Multiple hash functions. Let k ∈ ℕ be odd. For a tuple x = (x_1, ..., x_k) ∈ ℝ^k we denote by median x the median of the entries of x. The following theorem immediately leads to the result on CS with k ≥ 3 hash functions claimed in Table 6.4.1.

Theorem 6.4.13. Let k ≥ 3 be odd, n ≥ kB, and let h_1, ..., h_k : [n] → [B] and s_1, ..., s_k : [n] → {−1, 1} be truly random hash functions. For each i ∈ [n], define the random variable f̃_i = median_{ℓ∈[k]} ( Σ_{j∈[n]} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j ). Assume that k ≤ B.² Then

E[ |f̃_i − s(i) f_i| ] = Ω( 1 / (B √k log k) ), and E[ |f̃_i − s(i) f_i| ] = O( 1 / (B √k) ).

²This very mild assumption can probably be removed at the cost of a more technical proof. In our proof it can even be replaced by k ≤ B^{2−ε} for any ε = Ω(1).

The assumption n ≥ kB simply says that the total number of buckets is upper bounded by the number of items. Again using linearity of expectation for the summation over i ∈ [n] and replacing B by B/k, we obtain the claimed lower bound of Ω( √k / (B log k) ) and upper bound of O( √k / B ). We note that even if the bounds above are only tight up to a factor of log k, they still imply that it is asymptotically optimal to choose k = O(1). To settle the correct asymptotic growth is thus of merely theoretical interest. We need the following claim to prove the theorem.

Claim 6.4.14. Let I ⊂ ℝ be the closed interval centered at the origin of length 2t, i.e., I = [−t, t]. Suppose that 1/(√k B) ≤ t ≤ 1/(2B). For ℓ ∈ [k], Pr[X^{(ℓ)} ∈ I] = Ω(tB).

Proof: We first show that with probability Ω(1), X_2^{(ℓ)} lies in the interval [1/B, γ/B] for some constant γ. To see this we note that by Lemma 6.4.10, E[ |X_2^{(ℓ)}| ] = O(1/B), so it follows by Markov's inequality that if γ = O(1) is large enough, the probability that |X_2^{(ℓ)}| ≥ γ/B is at most 1/200. For a constant probability lower bound on |X_2^{(ℓ)}| we write

X_2^{(ℓ)} = Σ_{j ∈ N_2 ∩ {B+1,...,2B}} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j ∈ N_2 ∩ {2B+1,...,n}} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := S_1 + S_2.

Condition on S_2. If B ≥ 4, the probability that there exist exactly two j ∈ N_2 ∩ {B+1, ..., 2B} with h_ℓ(j) = h_ℓ(i) is at least

( |N_2 ∩ {B+1,...,2B}| choose 2 ) · (1/B²) · (1 − 1/B)^{B−2} ≥ (B−1 choose 2) · 1/(eB²) ≥ 1/(8e).

With probability 1/4 the corresponding signs s_ℓ(j) are both the same as that of S_2. By independence of s_ℓ and h_ℓ, the probability that this occurs is at least 1/(32e), and if it does, |X_2^{(ℓ)}| ≥ 1/B. Combining these two bounds it follows that |X_2^{(ℓ)}| ∈ [1/B, γ/B] with probability at least 1/(32e) − 1/200 ≥ 1/200. By symmetry, Pr[ X_2^{(ℓ)} ∈ [1/B, γ/B] ] ≥ 1/400 = Ω(1). Denote this event by E. Also let F be the event that |{j ∈ N_1 : h_ℓ(j) = h_ℓ(i)}| = 1. Then Pr[F] = Ω(1), and as E and F are independent, Pr[E ∩ F] = Ω(1). Conditioned on E ∩ F we now lower bound the probability that X^{(ℓ)} ∈ I. For this it suffices to fix X_2^{(ℓ)} = η/B for some 1 ≤ η ≤ γ and lower bound the probability that η/B + σ f ∈ [−t, t], where σ is a Rademacher and f ∈ {1/j : j ∈ N_1} is chosen uniformly at random. Let m_1, m_2 ∈ ℝ_{>0} be such that

η/B − 1/m_1 = −t and η/B − 1/m_2 = t.

Then m_2 − m_1 = 2tB² / (η² − t²B²) = Ω(tB²). Using that B is larger than a big enough constant and the mild assumption k ≤ B, we have that m_2 − m_1 = Ω(B/√k) = Ω(√B) ≥ 1, and so ⌊m_2 − m_1⌋ = Ω(m_2 − m_1) = Ω(tB²) as well. As we have |N_1| ≥ B − 1 options for f, it follows that

Pr[ η/B + σ f ∈ [−t, t] ] ≥ ⌊m_2 − m_1⌋ / (B − 1) = Ω(tB),

as desired. □

Proof of Theorem 6.4.13. If B (and hence k) is a constant, the result follows easily from Lemma 6.4.10, so in what follows we assume that B is larger than a sufficiently large constant.

We first prove the upper bound. Define N_1 = [B] ∖ {i} and N_2 = [n] ∖ ([B] ∪ {i}). For ℓ ∈ [k], let X_1^{(ℓ)} = Σ_{j∈N_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j and X_2^{(ℓ)} = Σ_{j∈N_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j. Finally write X^{(ℓ)} = X_1^{(ℓ)} + X_2^{(ℓ)}. As the absolute error in Count-Sketch with one pair of hash functions (h, s) is always upper bounded by the corresponding error in Count-Min with the single hash function h, we can use the bound in the proof of Lemma 6.4.2 to conclude that

Pr[ |X_1^{(ℓ)}| ≥ t ] = O( log(tB) / (tB) ),

when t ≥ 3/B. Also,

Var[X_2^{(ℓ)}] = Σ_{j∈N_2} f_j² (1/B − 1/B²) ≤ 2/B²,

so by Bennett's inequality (with M = 1/B and σ² = 2/B²) and Remark 6.4.4,

Pr[ |X_2^{(ℓ)}| ≥ t ] ≤ 2 exp( −2 h(tB/2) ) ≤ 2 exp( −(tB/2) log(tB/2 + 1) ) = O( log(tB) / (tB) ),

for t ≥ 3/B. It follows that for t ≥ 3/B,

Pr[ |X^{(ℓ)}| ≥ 2t ] ≤ Pr[ |X_1^{(ℓ)}| ≥ t ] + Pr[ |X_2^{(ℓ)}| ≥ t ] = O( log(tB) / (tB) ).

Let C be the implicit constant in the O-notation above. If |f̃_i − s(i) f_i| ≥ 2t, at least half of the values (|X^{(ℓ)}|)_{ℓ∈[k]} are at least 2t. For t ≥ 3/B it thus follows by a union bound that

Pr[ |f̃_i − s(i) f_i| ≥ 2t ] ≤ 2 (k choose ⌈k/2⌉) ( C log(tB) / (tB) )^{⌈k/2⌉} ≤ 2 ( 4C log(tB) / (tB) )^{⌈k/2⌉}.    (6.4.3)

If α = O(1) is chosen sufficiently large it thus holds that

∫_{α/B}^∞ Pr[ |f̃_i − s(i) f_i| ≥ t ] dt = 2 ∫_{α/(2B)}^∞ Pr[ |f̃_i − s(i) f_i| ≥ 2t ] dt ≤ (4/B) ∫_{α/2}^∞ ( 4C log t / t )^{⌈k/2⌉} dt ≤ 1/(B 2^k) ≤ 1/(B √k).

Here the first inequality uses (6.4.3) and a change of variable. The second inequality uses that (4C log t / t)^{⌈k/2⌉} ≤ (C′/t)^{2k/5} for some constant C′, followed by a calculation of the integral. For our upper bound it therefore suffices to show that ∫_0^{α/B} Pr[ |f̃_i − s(i) f_i| ≥ t ] dt = O( 1/(B√k) ).

For this, let 1/(√k B) ≤ t ≤ 1/B be fixed. If |f̃_i − s(i) f_i| ≥ t, at least half of the values (X^{(ℓ)})_{ℓ∈[k]} are at least t or at most −t. Let us focus on bounding the probability that at least half are at least t, the other bound being symmetric and giving an extra factor of 2 in the probability bound. By symmetry and Claim 6.4.14, Pr[X^{(ℓ)} ≥ t] = 1/2 − Ω(tB). For ℓ ∈ [k] we define Y_ℓ = [X^{(ℓ)} ≥ t], and we put S = Σ_{ℓ∈[k]} Y_ℓ. Then E[S] = k (1/2 − Ω(tB)). If at least half of the values (X^{(ℓ)})_{ℓ∈[k]} are at least t then S ≥ k/2. By Hoeffding's inequality we can bound the probability of this event by

Pr[S ≥ k/2] = Pr[S − E[S] = Ω(ktB)] ≤ exp( −Ω(k t² B²) ).

It follows that Pr[ |f̃_i − s(i) f_i| ≥ t ] ≤ 2 exp( −Ω(k t² B²) ). Thus

∫_0^{α/B} Pr[ |f̃_i − s(i) f_i| ≥ t ] dt ≤ 1/(B√k) + ∫_{1/(B√k)}^{1/(2B)} 2 exp( −Ω(k t² B²) ) dt + ∫_{1/(2B)}^{α/B} 2 exp( −Ω(k) ) dt
= O( 1/(B√k) ) + (1/(B√k)) ∫_1^∞ exp(−t²) dt = O( 1/(B√k) ).

This completes the proof of the upper bound, and we proceed with the lower bound. Fix ℓ ∈ [k] and let M_1 = [B log k] ∖ {i} and M_2 = [n] ∖ ([B log k] ∪ {i}). Write

S := Σ_{j∈M_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j∈M_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := S_1 + S_2.

We also define J := {j ∈ M_1 : h_ℓ(j) = h_ℓ(i)}. Let I ⊆ ℝ be the closed interval around s_ℓ(i) f_i of length 1/(B√k log k). We now upper bound the probability that S ∈ I conditioned on the value of S_2. To ease notation, the conditioning on S_2 is left out in what follows. Note first that

Pr[S ∈ I] = Σ_{r=0}^{|M_1|} Pr[S ∈ I | |J| = r] · Pr[|J| = r].

For a given r ≥ 1 we now proceed to bound Pr[S ∈ I | |J| = r]. This probability is the same as the probability that S_2 + Σ_{j∈R} σ_j f_j ∈ I, where R ⊆ M_1 is a uniformly random r-subset and the σ_j's are independent Rademachers. Suppose that we sample the elements from R as well as the corresponding signs (σ_j)_{j∈R} sequentially, and let us condition on the values and signs of the first r − 1 sampled elements. At this point at most (B log k)/√k + 1 possible samples for the last element in R bring S into I. Indeed, the distance between consecutive points in {f_j : j ∈ M_1} is at least 1/(B log k)², so at most

(1/(B√k log k)) · (B log k)² + 1 = (B log k)/√k + 1

samples bring S into I. For 1 ≤ r ≤ (B log k)/2 we can thus upper bound

Pr[S ∈ I | |J| = r] ≤ ( (B log k)/√k + 1 ) / ( |M_1| − r + 1 ) ≤ 2/√k + 2/(B log k) ≤ 3/√k.

By a standard Chernoff bound,

Pr[ |J| ≥ (B log k)/2 ] = exp( −Ω(B log k) ) = k^{−Ω(B)}.

If we assume that B is larger than a constant, then Pr[ |J| ≥ (B log k)/2 ] ≤ k^{−1}. Finally, Pr[|J| = 0] = (1 − 1/B)^{B log k} ≤ k^{−1}. Combining these three bounds,

Pr[S ∈ I] ≤ Pr[|J| = 0] + Σ_{r=1}^{(B log k)/2} Pr[S ∈ I | |J| = r] · Pr[|J| = r] + Σ_{r=(B log k)/2}^{|M_1|} Pr[|J| = r] = O(1/√k),

which holds even after removing the conditioning on S_2. We now show that with probability Ω(1) at least half the values (X^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B√k log k). Let p_0 be the probability that X^{(ℓ)} ≥ 1/(2B√k log k). This probability does not depend on ℓ ∈ [k], and by symmetry and what we showed above, p_0 = 1/2 − O(1/√k). Define the function f : {0, ..., k} → ℝ by

f(t) = (k choose t) p_0^t (1 − p_0)^{k−t}.

Then f(t) is the probability that exactly t of the values (X^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B√k log k). Using that p_0 = 1/2 − O(1/√k), a simple application of Stirling's formula gives that f(t) = Θ(1/√k) for t = ⌈k/2⌉, ..., ⌈k/2⌉ + √k when k is larger than some constant C. It follows that with probability Ω(1) at least half the (X^{(ℓ)})_{ℓ∈[k]} are at least 1/(2B√k log k), and in particular

E[ |f̃_i − f_i| ] = Ω( 1/(B√k log k) ).

Finally we handle the case where k ≤ C. It is easy to check (e.g. with Lemma 6.4.11) that X^{(ℓ)} = Ω(1/B) with probability Ω(1). Thus this happens for all ℓ ∈ [k] with probability Ω(1), and in particular E[ |f̃_i − f_i| ] = Ω(1/B), which is the desired bound for constant k. □

6.4.4. Learned Count-Sketch for Zipfians

We now proceed to analyze the learned Count-Sketch algorithm. First, we estimate the expected error when using a single hash function, and next we show that the expected error only increases when using more hash functions. Recall our assumption on the number of buckets B_h used to store the heavy hitters: B_h = Θ(B − B_h) = Θ(B).

One hash function. By taking B_1 = B_h = Θ(B) and B_2 = B − B_h = Θ(B) in the theorem below, the result on L-CS for k = 1 claimed in Table 6.4.1 follows immediately.

Theorem 6.4.15. Let h : [n] ∖ [B_1] → [B_2] and s : [n] → {−1, 1} be truly random hash functions, where n, B_1, B_2 ∈ ℕ and³ n − B_1 ≥ B_2 ≥ B_1. Define the random variable f̃_i = Σ_{j=B_1+1}^{n} [h(j) = h(i)] s(j) f_j for i ∈ [n] ∖ [B_1]. Then

E[ |f̃_i − s(i) f_i| ] = Θ( log((B_2 + B_1)/B_1) / B_2 ).

Proof: Let N_1 = [B_1 + B_2] ∖ ([B_1] ∪ {i}) and N_2 = [n] ∖ ([B_1 + B_2] ∪ {i}). Let X_1 = Σ_{j∈N_1} [h(j) = h(i)] s(j) f_j and X_2 = Σ_{j∈N_2} [h(j) = h(i)] s(j) f_j. By the triangle inequality and linearity of expectation,

E[ |X_1| ] = O( log((B_2 + B_1)/B_1) / B_2 ).

Moreover, it follows directly from Lemma 6.4.10 that E[ |X_2| ] = O(1/B_2). Thus

E[ |f̃_i − s(i) f_i| ] ≤ E[ |X_1| ] + E[ |X_2| ] = O( log((B_2 + B_1)/B_1) / B_2 ),

as desired.

3The first inequality is the standard assumption that we have at least as many items as buckets. The second inequality says that we use at least as many buckets for non-heavy items as for heavy items (which doesn’t change the asymptotic space usage).

For the lower bound on E[ |f̃_i − s(i) f_i| ] we apply Lemma 6.4.11 to obtain that

E[ |f̃_i − s(i) f_i| ] ≥ (1/(2B_2)) (1 − 1/B_2)^{|N_1|−1} Σ_{i∈N_1} f_i = Ω( log((B_2 + B_1)/B_1) / B_2 ). □

Corollary 6.4.16. Let h : [n] ∖ [B_h] → [B − B_h] and s : [n] → {−1, 1} be truly random hash functions, where n, B ∈ ℕ, B_h = Θ(B) and B_h ≤ B/2. Define the random variable f̃_i = Σ_{j=B_h+1}^{n} [h(j) = h(i)] s(j) f_j for i ∈ [n] ∖ [B_h]. Then

E[ |f̃_i − s(i) f_i| ] = Θ(1/B).

More hash functions. We now show that, like for Count-Min, using more hash functions does not decrease the expected error. We first state the Littlewood-Offord lemma as strengthened by Erdős.

Theorem 6.4.17 (Littlewood-Offord [126], Erdős [69]). Let a_1, ..., a_n ∈ ℝ with |a_i| ≥ 1 for i ∈ [n]. Let further σ_1, ..., σ_n ∈ {−1, 1} be random variables with Pr[σ_i = 1] = Pr[σ_i = −1] = 1/2, and define S = Σ_{i=1}^{n} σ_i a_i. For any v ∈ ℝ it holds that

Pr[ |S − v| ≤ 1 ] ≤ (n choose ⌊n/2⌋) / 2^n = O(1/√n).

Setting B_1 = B_h = Θ(B) and B_2 = B − B_h = Θ(B) in the theorem below gives the final bound from Table 6.4.1 on L-CS with k ≥ 3.

Theorem 6.4.18. Let n ≥ B_1 + B_2 ≥ 2B_1, let k ≥ 3 be odd, and let h_1, ..., h_k : [n] ∖ [B_1] → [B_2/k] and s_1, ..., s_k : [n] ∖ [B_1] → {−1, 1} be independent and truly random. Define the random variable f̃_i = median_{ℓ∈[k]} ( Σ_{j∈[n]∖[B_1]} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j ) for i ∈ [n] ∖ [B_1]. Then

E[ |f̃_i − s(i) f_i| ] = Ω(1/B_2).

Proof: Like in the proof of the lower bound of Theorem 6.4.13, it suffices to show that for each i the probability that the sum S_ℓ := Σ_{j∈[n]∖([B_1]∪{i})} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j lies in the interval I = [−1/(2B_2), 1/(2B_2)] is O(1/√k). Then at least half the (S_ℓ)_{ℓ∈[k]} are at least 1/(2B_2) with probability Ω(1) by an application of Stirling's formula, and it follows that E[ |f̃_i − s(i) f_i| ] = Ω(1/B_2).

Let ℓ ∈ [k] be fixed, N_1 = [2B_2] ∖ ([B_2] ∪ {i}), and N_2 = [n] ∖ (N_1 ∪ {i}), and write

S_ℓ = Σ_{j∈N_1} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j + Σ_{j∈N_2} [h_ℓ(j) = h_ℓ(i)] s_ℓ(j) f_j := X_1 + X_2.

Now condition on the value X_2 = x_2 of X_2. Letting J = {j ∈ N_1 : h_ℓ(j) = h_ℓ(i)}, it follows by Theorem 6.4.17 that

Pr[S_ℓ ∈ I | X_2 = x_2] = Σ_{J′ ⊆ N_1} O( Pr[J = J′] / √(|J′| + 1) ) = O( Pr[|J| < k/2] + 1/√k ).

An application of Chebyshev's inequality gives that Pr[|J| < k/2] = O(1/k), so Pr[S_ℓ ∈ I] = O(1/√k). Since this bound holds for any possible value of x_2, we may remove the conditioning and the desired result follows. □

Remark 6.4.19. The bound above is probably only tight for B_1 = Θ(B_2). Indeed, we know that it cannot be tight for all B_1 ≤ B_2, since when B_1 becomes very small, the bound from the standard Count-Sketch with k ≥ 3 takes over, and this is certainly worse than the bound in the theorem. It is an interesting open problem (that requires a better anti-concentration inequality than the Littlewood-Offord lemma) to settle the correct bound when B_1 ≪ B_2.

6.5. Experiments

Baselines. We compare our learning-based algorithms with their non-learning counterparts. Specifically, we augment Count-Min with a learned oracle as in Algorithm 26, and call the learning-augmented algorithm "Learned Count-Min". We then compare Learned Count-Min with traditional Count-Min. We also compare it with "Learned Count-Min with Ideal Oracle", where the neural-network oracle is replaced with an ideal oracle that knows the identities of the heavy hitters in the test data, and "Count-Min with Lookup Table", where the heavy hitter oracle is replaced with a lookup table that memorizes heavy hitters in the training set. The comparison with the latter baseline allows us to show the ability of Learned Count-Min to generalize and detect heavy items unseen in the training set. We repeat the evaluation where we replace Count-Min (CM) with Count-Sketch (CS) and the corresponding variants.

Figure 6.5.1: Frequency of Internet flows (log-log plot of flow frequency against sorted items).

Training a Heavy Hitter Oracle. We construct the heavy hitter oracle by training a neural network to predict the heaviness of an item. Note that the prediction of the network is not the final estimation. It is used in Algorithm 26 to decide whether to assign an item to a unique bucket. We train the network to predict the item counts (or the log of the counts) and minimize the squared loss of the prediction. Empirically, we found that when the counts of heavy items are a few orders of magnitude larger than the average counts (as is the case for the Internet traffic data set), predicting the log of the counts leads to more stable training and better results. Once the model is trained, we can select the optimal cutoff threshold using validation data, and use the model as the oracle described in Algorithm 26.
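A schematic PyTorch training loop for such an oracle is sketched below; the model, the featurization provided by loader, and all hyperparameters are placeholders rather than the exact setup used in the experiments.

import torch
import torch.nn as nn

def train_oracle(model, loader, epochs=10, lr=1e-3, predict_log=True):
    # Regress item counts (or log-counts), minimizing squared loss.
    # `loader` is assumed to yield (features, count) batches.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            target = torch.log1p(y) if predict_log else y   # log1p used for numerical safety
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), target)
            loss.backward()
            opt.step()
    return model

# After training, a cutoff on the predicted (log-)count, chosen on validation
# data, turns the regressor into the binary oracle HH(i) used in Algorithm 26.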

6.5.1. Internet Traffic Estimation

For our first experiment, the goal is to estimate the number of packets for each network flow. A flow is a sequence of packets between two machines on the Internet. It is identified by the IP addresses of its source and destination and the application ports. Estimating the size of each flow i, i.e., the number of its packets f_i, is a basic task in network management [161].

Dataset. The traffic data is collected at a backbone link of a Tier 1 ISP between Chicago and Seattle in 2016 [39]. Each recording session is around one hour. Within each minute, there are around 30 million packets and 1 million unique flows. For a recording session, we use the first 7 minutes for training, the following minute for validation, and estimate the packet counts in subsequent minutes. The distribution of packet counts over Internet flows is heavy tailed, as shown in Figure 6.5.1.

Model. The patterns of the Internet traffic are very dynamic, i.e., the flows with heavy traffic change frequently from one minute to the next. However, we hypothesize that the space of IP addresses should be smooth in terms of traffic load. For example, data centers at large companies and university campuses with many students tend to generate heavy traffic. Thus, though the individual flows from these sites change frequently, we could still discover regions of IP addresses with heavy traffic through a learning approach. We trained a neural network to predict the log of the packet counts for each flow. The model takes as input the IP addresses and ports in each packet. We use two RNNs to encode the source and destination IP addresses separately. The RNN takes one bit of the IP address at each step, starting from the most significant bit. We use the final states of the RNN as the feature vector for an IP address. The reason to use an RNN is that the patterns in the bits are hierarchical, i.e., the more significant bits govern larger regions in the IP space. Additionally, we use two-layer fully-connected networks to encode the source and destination ports. We then concatenate the encoded IP vectors, encoded port vectors, and the protocol type as the final features to predict the packet counts.⁴ The inference time is 2.8 microseconds per item on a single GPU without any optimizations.⁵

⁴We use RNNs with 64 hidden units. The two-layer fully-connected networks for the ports have 16 and 8 hidden units. The final layer before the prediction has 32 hidden units.
⁵Note that new specialized hardware such as Google TPUs, hardware accelerators and network compression [93, 163, 47, 94, 95] can drastically improve the inference time. Further, Nvidia has predicted that GPUs will get 1000x faster by 2025. Because of these trends, the overhead of neural network inference is expected to be less significant in the future [120].
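The following PyTorch sketch shows one way the described architecture could be wired together. The hidden sizes follow footnote 4, but the exact input encodings (32 IP bits, 16-bit port vectors, a scalar protocol feature) and layer choices are assumptions, not the precise model used in the experiments.

import torch
import torch.nn as nn

class FlowModel(nn.Module):
    # Predicts the log packet count of a flow from its header fields.
    def __init__(self, ip_hidden=64, port_hidden=(16, 8), head_hidden=32, proto_dim=1):
        super().__init__()
        self.src_rnn = nn.RNN(input_size=1, hidden_size=ip_hidden, batch_first=True)
        self.dst_rnn = nn.RNN(input_size=1, hidden_size=ip_hidden, batch_first=True)
        def port_encoder():                       # two-layer encoder per port
            return nn.Sequential(nn.Linear(16, port_hidden[0]), nn.ReLU(),
                                 nn.Linear(port_hidden[0], port_hidden[1]), nn.ReLU())
        self.src_port = port_encoder()
        self.dst_port = port_encoder()
        self.head = nn.Sequential(
            nn.Linear(2 * ip_hidden + 2 * port_hidden[1] + proto_dim, head_hidden),
            nn.ReLU(), nn.Linear(head_hidden, 1))

    def forward(self, src_bits, dst_bits, src_port_bits, dst_port_bits, proto):
        # src_bits, dst_bits: (batch, 32, 1), most significant bit first
        _, hs = self.src_rnn(src_bits)
        _, hd = self.dst_rnn(dst_bits)
        feats = torch.cat([hs[-1], hd[-1],
                           self.src_port(src_port_bits),
                           self.dst_port(dst_port_bits), proto], dim=-1)
        return self.head(feats).squeeze(-1)       # predicted log packet count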

Results. We plot the results of two representative test minutes (the 20th and 50th) in Figure 6.5.2. All plots in the figure refer to the estimation error (6.2.2) as a function of the used space. The space includes space for storing the buckets and the model. Since we use the same model for all test minutes, the model space is amortized over the 50-minute testing period.

Figure 6.5.2: Comparison of our algorithms with Count-Min and Count-Sketch on Internet traffic data. (a) Learned Count-Min, 20th test minute; (b) Learned Count-Min, 50th test minute; (c) Learned Count-Sketch, 20th test minute; (d) Learned Count-Sketch, 50th test minute. Each panel plots the average error per item (frequency) against space (MB) for the standard sketch, the lookup-table variant, the learned variant (NNet), and the learned variant with an ideal oracle.

Figure 6.5.2 reveals multiple findings. First, the figure shows that our learning-based algorithms exhibit a better performance than their non-learning counterparts. Specifically, Learned Count-Min, compared to Count-Min, reduces the error by 32% with space of 0.5 MB and 42% with space of 1.0 MB (Figure 6.5.2a). Learned Count-Sketch, compared to Count-Sketch, reduces the error by 52% at 0.5 MB and 57% at 1.0 MB (Figure 6.5.2c). In our experiments, each regular bucket takes 4 bytes. For the learned versions, we account for the extra space needed for the unique buckets to store the item IDs and the counts. One unique bucket takes 8 bytes, twice the space of a normal bucket.⁶

Second, the figure also shows that our neural-network oracle performs better than memorizing the heavy hitters in a lookup table. This is likely due to the dynamic nature of Internet traffic, i.e., the heavy flows in the training set are significantly different from those in the test data. Hence, memorization does not work well. On the other hand, our model is able to extract structures in the input that generalize to unseen test data.

Third, the figure shows that our model's performance stays roughly the same from the 20th to the 50th minute (Figure 6.5.2b and Figure 6.5.2d), showing that it learns properties of the heavy items that generalize over time.

Lastly, although we achieve significant improvement over Count-Min and Count-Sketch, our scheme can potentially achieve even better results with an ideal oracle, as shown by the dashed green line in Figure 6.5.2. This indicates potential gains from further optimizing the neural network model.

⁶By using hashing with open addressing, it suffices to store IDs hashed into log B_r + t bits (instead of whole IDs) to ensure there is no collision with probability 1 − 2^{−t}. log B_r + t is comparable to the number of bits per counter, so the space for a unique bucket is twice the space of a normal bucket.

Figure 6.5.3: Frequency of search queries (log-log plot of query frequency against sorted queries).

6.5.2. Search Query Estimation

For our second experiment, the goal is to estimate the number of times a search query appears.

Dataset. We use the AOL query log dataset, which consists of 21 million search queries collected from 650 thousand users over 90 days. The users are anonymized in the dataset. There are 3.8 million unique queries. Each query is a search phrase with multiple words (e.g., “periodic table element poster”). We use the first 5 days for training, the following day for validation, and estimate the number of times different search queries appear in subsequent days. The distribution of search query frequency follows the Zipfian law, as shown in Figure 6.5.3.

Model. Unlike traffic data, popular search queries tend to appear more consistently across multiple days. For example, "google" is the most popular search phrase on most days in the dataset. Simply storing the most popular words can easily yield a reasonable heavy hitter predictor. However, beyond remembering the popular words, other factors also contribute to the popularity of a search phrase that we can learn. For example, popular search phrases appearing in slightly different forms may be related to similar topics. Though not included in the AOL dataset, in general, metadata of a search query (e.g., the location of the search) can provide useful context for its popularity.

To construct the heavy hitter oracle, we trained a neural network to predict the number of times a search phrase appears. To process the search phrase, we train an RNN with LSTM cells that takes the characters of a search phrase as input. The final states encoded by the RNN are fed to a fully-connected layer to predict the query frequency. Our character vocabulary includes lower-case English alphabets, numbers, punctuation marks, and a token for unknown characters. We map the character IDs to embedding vectors before feeding them to the RNN.⁷ We choose an RNN due to its effectiveness in processing sequence data [162, 83, 120].

Figure 6.5.4: Comparison of our algorithms with Count-Min and Count-Sketch on search query data. (a) Learned Count-Min, 50th test day; (b) Learned Count-Min, 80th test day; (c) Learned Count-Sketch, 50th test day; (d) Learned Count-Sketch, 80th test day. Each panel plots the average error per item (frequency) against space (MB).

7We use an embedding size of 64 dimensions, an RNN with 256 hidden units, and a fully-connected layer with 32 hidden units.
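A rough PyTorch sketch of the described character-level model follows; the widths come from footnote 7, while the vocabulary size and other details are assumptions.

import torch
import torch.nn as nn

class QueryModel(nn.Module):
    # Character-level LSTM that predicts how often a search phrase appears.
    def __init__(self, vocab_size=64, emb=64, hidden=256, fc=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, fc), nn.ReLU(), nn.Linear(fc, 1))

    def forward(self, char_ids):                  # char_ids: (batch, seq_len) integer IDs
        x = self.embed(char_ids)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)       # predicted query frequency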

Results. We plot the estimation error vs. space for two representative test days (the 50th and 80th day) in Figure 6.5.4. As before, the space includes both the bucket space and the space used by the model. The model space is amortized over the test days since the same model is used for all days. Similarly, our learned sketches outperform their conventional counterparts. Learned Count-Min, compared to Count-Min, reduces the loss by 18% at 0.5 MB and 52% at 1.0 MB (Figure 6.5.4a). Learned Count-Sketch, compared to Count-Sketch, reduces the loss by 24% at 0.5 MB and 71% at 1.0 MB (Figure 6.5.4c). Further, our algorithm performs similarly on the 50th and the 80th day (Figure 6.5.4b and Figure 6.5.4d), showing that the properties it learns generalize over a long period.

The figures also show an interesting difference from the Internet traffic data: memorizing the heavy hitters in a lookup table is quite effective in the low space region. This is likely because the search queries are less dynamic compared to Internet traffic (i.e., top queries in the training set are also popular on later days). However, as the algorithm is allowed more space, memorization becomes ineffective.

Figure 6.5.5: ROC curves of the learned models. (a) Internet traffic, 50th test minute (AUC = 0.900); (b) Search query, 50th test day (AUC = 0.808).

6.5.3. Analyzing Heavy Hitter Models

We analyze the accuracy of the neural network heavy hitter models to better understand the results on the two datasets. Specifically, we use the models to predict whether an item is a heavy hitter (top 1% in counts) or not, and plot the ROC curves in Figure 6.5.5. The figures show that the model for the Internet traffic data has learned to predict heavy items more effectively, with an AUC score of 0.9. As for the model for search query data, the AUC score is 0.8. This also explains why we see larger improvements over non-learning algorithms in Figure 6.5.2.

Chapter 7

The Low-Rank Approximation Problem

7.1. Introduction

In the low-rank decomposition problem, given an n × d matrix A and a parameter k, the goal is to compute a rank-k matrix [A]_k defined as

[A]_k = argmin_{A' : rank(A') ≤ k} ‖A − A'‖_F.

Low-rank approximation is one of the most widely used tools in massive data analysis, machine learning and statistics, and has been a subject of many algorithmic studies. In particular, multiple algorithms developed over the last decade use the "sketching" approach, see e.g., [157, 171, 92, 52, 53, 144, 132, 34, 54]. Its idea is to use efficiently computable random projections (a.k.a., "sketches") to reduce the problem size before performing the low-rank decomposition, which makes the computation more space and time efficient. For example, [157, 52] show that if S is a random matrix of size m × n chosen from an appropriate distribution¹, for m depending on ε, then one can recover a rank-k matrix A' such that

‖A − A'‖_F ≤ (1 + ε) ‖A − [A]_k‖_F

1Initial algorithms used matrices with independent sub-gaussian entries or randomized Fourier/Hadamard matrices [157, 171, 92]. Starting from the seminal work of [53], researchers began to explore sparse binary matrices, see e.g., [144, 132]. In this chapter we mostly focus on the latter distribution.

by performing an SVD on SA ∈ ℝ^{m×d} followed by some post-processing. Typically the sketch length m is small, so the matrix SA can be stored using little space (in the context of streaming algorithms) or efficiently communicated (in the context of distributed algorithms).

Furthermore, the SVD of SA can be computed efficiently, especially after another round of sketching, reducing the overall computation time. See the survey [170] for an overview of these developments. In light of recent developments on learning-based algorithms [168, 59, 120, 24, 128, 151, 137, 141, 25, 33, 134, 96, 103, 1, 118], it is natural to ask whether similar improvements in performance could be obtained for other sketch-based algorithms, notably for low-rank decompositions. In particular, reducing the sketch length m while preserving its accuracy would make sketch-based algorithms more efficient. Alternatively, one could make sketches more accurate for the same values of m. This is the problem we address in this chapter.

7.1.1. Our Results

Our main finding is that learned sketch matrices can indeed yield (much) more accurate low-rank decompositions than purely random matrices. We focus our study on a streaming algorithm for low-rank decomposition due to [157, 52], described in more detail in Section 7.2.

Specifically, suppose we have a training set of matrices Tr = {A_1, ..., A_N} sampled from some distribution 𝒟. Based on this training set, we compute a matrix S* that (locally) minimizes the empirical loss

Σ_i ‖A_i − SCW(S*, A_i)‖_F    (7.1.1)

where SCW(S*, A_i) denotes the output of the aforementioned Sarlos-Clarkson-Woodruff streaming low-rank decomposition algorithm on matrix A_i using the sketch matrix S*. Once the sketch matrix S* is computed, it can be used instead of a random sketch matrix in all future executions of the SCW algorithm. We demonstrate empirically that, for multiple types of data sets, an optimized sketch matrix S* can substantially reduce the approximation loss compared to a random matrix S, sometimes by one order of magnitude (see Figure 7.5.1 or 7.5.2). Equivalently, the optimized sketch matrix can achieve the same approximation loss for lower values of m.

A possible disadvantage of learned sketch matrices is that an algorithm that uses them no longer offers worst-case guarantees. As a result, if such an algorithm is applied to an input matrix that does not conform to the training distribution, the results might be worse than if random matrices were used. To alleviate this issue, we also study mixed sketch matrices, where (say) half of the rows are trained and the other half are random. We observe that if such matrices are used in conjunction with the SCW algorithm, its results are no worse than if only the random part of the matrix was used (Theorem 7.4.1 in Section 7.4)². Thus, the resulting algorithm inherits the worst-case performance guarantees of the random part of the sketching matrix. At the same time, we show that mixed matrices still substantially reduce the approximation loss compared to random ones, in some cases nearly matching the performance of "pure" learned matrices with the same number of rows. Thus, mixed random matrices offer "the best of both worlds": improved performance for matrices from the training distribution, and worst-case guarantees otherwise.

Finally, in order to understand the theoretical aspects of our approach further, we study the special case of m = 1. This corresponds to the case where the sketch matrix S is just a single vector. Our results are two-fold:

∙ We give an approximation algorithm for minimizing the empirical loss as in Equation 7.1.1, with an approximation factor depending on the stable rank of matrices in the training set. See Appendix 7.7.

∙ Under certain assumptions about the robustness of the loss minimizer, we show generalization bounds for the solution computed over the training set. See Appendix 7.8.

7.1.2. Related work

As outlined in the introduction, over the last few years there have been multiple papers exploring the use of machine learning methods to improve the performance of "standard" algorithms. Among those, the closest to the topic of this chapter are the works on learning-based compressive sensing, such as [141, 25, 33, 134], and on learning-based streaming algorithms [103]. Since neither of these two lines of research addresses computing matrix spectra, the technical development therein was quite different from ours.

2We note that this property is non-trivial, in the sense that it does not automatically hold for all sketching algorithms. See Section 7.4 for further discussion.

Algorithm 27 Rank-k approximation of a matrix A using a sketch matrix S (refer to Section 4.1.1 of [52])
1: Input: A ∈ ℝ^{n×d}, S ∈ ℝ^{m×n}
2: U, Σ, V^⊤ ← CompactSVD(SA)    ◁ r = rank(SA), U ∈ ℝ^{m×r}, V ∈ ℝ^{d×r}
3: return [AV]_k V^⊤

In this chapter we focus on learning-based optimization of low-rank approximation algorithms that use linear sketches, i.e., that map the input matrix A into SA and perform computation on the latter. There are other sketching algorithms for low-rank approximation that involve non-linear sketches [125, 78, 77]. The benefit of linear sketches is that they are easy to update under linear changes to the matrix A, and (in the context of our work) that they are easy to differentiate, making it possible to compute the gradient of the loss function as in Equation 7.1.1. We do not know whether it is possible to use our learning-based approach for non-linear sketches, but we believe this is an interesting direction for future research.

7.2. Preliminaries

Notation. Consider a distribution 𝒟 on matrices A ∈ ℝ^{n×d}. We define the training set as {A_1, ..., A_N} sampled from 𝒟. For a matrix A, its singular value decomposition (SVD) can be written as A = UΣV^⊤ such that both U ∈ ℝ^{n×n} and V ∈ ℝ^{d×n} have orthonormal columns and Σ = diag{λ_1, ..., λ_d} is a diagonal matrix with nonnegative entries. Moreover, if rank(A) = r, then the first r columns of U are an orthonormal basis for the column space of A (we denote it as colsp(A)), the first r columns of V are an orthonormal basis for the row space of A (we denote it as rowsp(A)), and λ_i = 0 for i > r.³ In many applications it is quicker and more economical to compute the compact SVD, which only contains the rows and columns corresponding to the non-zero singular values of Σ: A = U^c Σ^c (V^c)^⊤, where U^c ∈ ℝ^{n×r}, Σ^c ∈ ℝ^{r×r} and V^c ∈ ℝ^{d×r}.

How sketching works. We start by describing the SCW algorithm for low-rank matrix approximation, see Algorithm 27. The algorithm computes the singular value decomposition of SA = UΣV^⊤, and computes the best rank-k approximation of AV. Finally it outputs [AV]_k V^⊤ as a rank-k approximation of A.

3The remaining columns of 푈 and 푉 respectively are orthonormal bases for the nullspace of 퐴 and 퐴⊤.

Note that if m is much smaller than d and n, the space bound of this algorithm is significantly better than when computing a rank-k approximation for A in the naïve way. Thus, minimizing m automatically reduces the space usage of the algorithm.
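A direct numpy transcription of Algorithm 27 (illustrative, using numpy's SVD in place of CompactSVD) looks as follows.

import numpy as np

def scw_rank_k(A, S, k):
    # Algorithm 27: rank-k approximation of A computed from the sketch SA.
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)   # rows of Vt span rowsp(SA)
    V = Vt.T
    AV = A @ V                                             # project A onto that subspace
    U2, s2, V2t = np.linalg.svd(AV, full_matrices=False)   # best rank-k approx of AV
    AV_k = (U2[:, :k] * s2[:k]) @ V2t[:k, :]
    return AV_k @ Vt                                       # [AV]_k V^T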

Sketching matrix. We use a matrix S that is sparse.⁴ Specifically, each column of S has exactly one non-zero entry, which is either +1 or −1. This means that the fraction of non-zero entries in S is 1/m. Therefore, one can use a vector to represent S, which is very memory efficient. It is worth noting, however, that after multiplying the sketching matrix S with other matrices, the resulting matrix (e.g., SA) is in general not sparse.
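A minimal constructor for such a sparse sketch, stored as one row index and one sign per column, might look as follows (illustrative).

import numpy as np

def make_sparse_sketch(m, n, seed=0):
    # Each of the n columns has a single nonzero entry, +1 or -1, in a
    # uniformly random row; S can therefore be stored as (rows, signs).
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    S = np.zeros((m, n))
    S[rows, np.arange(n)] = signs
    return S, rows, signs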

7.3. Training Algorithm

In this section, we describe our learning-based algorithm for computing a data-dependent sketch $S$. The main idea is to use back-propagation to compute the stochastic gradient of $S$ with respect to the rank-$k$ approximation loss in Equation (7.1.1), where the initial value of $S$ is the same random sparse matrix used in SCW. Once we have the stochastic gradient, we can run stochastic gradient descent (SGD) to optimize $S$ in order to improve the loss. Our algorithm maintains the sparse structure of $S$, and only optimizes the values of its $n$ non-zero entries (initially $+1$ or $-1$).

However, the standard SVD implementation (step 2 in Algorithm 27) is not differentiable, which means we cannot obtain the gradient in the straightforward way. To make the SVD step differentiable, we use the fact that the SVD procedure can be represented as $m$ individual top singular value decompositions (see e.g. [10]), and that every top singular value decomposition can be computed using the power method. See Figure 7.3.1 and Algorithm 28. We store the result of the $i$-th iteration in the $i$-th entry of the lists $U, \Sigma, V$, and finally concatenate all entries together to obtain the matrix (or, for $\Sigma$, diagonal) form of $U, \Sigma, V$. This allows gradients to flow easily.

Due to the extremely long computational chain, it is infeasible to write down the explicit form of the loss function or its gradients. However, just like modern deep neural networks compute their gradients, we use the autograd feature in PyTorch to numerically compute the gradient with respect to the sketching matrix $S$.

⁴The original papers [157, 52] used dense matrices, but the work of [53] showed that sparse matrices work as well. We use sparse matrices since they are more efficient to train and to operate on.

Algorithm 28 Differentiable implementation of SVD

1: Input: $A_1 \in \mathbb{R}^{m\times d}$ ($m < d$)
2: $U, \Sigma, V \leftarrow \{\}, \{\}, \{\}$
3: for $i \leftarrow 1 \ldots m$ do
4:   $v_1 \leftarrow$ random initialization in $\mathbb{R}^d$
5:   for $t \leftarrow 1 \ldots T$ do
6:     $v_{t+1} \leftarrow \frac{A_i^\top A_i v_t}{\|A_i^\top A_i v_t\|_2}$    ◁ power method
7:   $V[i] \leftarrow v_{T+1}$
8:   $\Sigma[i] \leftarrow \|A_i V[i]\|_2$
9:   $U[i] \leftarrow \frac{A_i V[i]}{\Sigma[i]}$
10:  $A_{i+1} \leftarrow A_i - \Sigma[i]\, U[i]\, V[i]^\top$
11: return $U, \Sigma, V$

Figure 7.3.1: The $i$-th iteration of the power method.

We emphasize again that our method only optimizes $S$ during the training phase. After $S$ is fully trained, we still call Algorithm 27 for low-rank approximation, which has exactly the same running time as the SCW algorithm, but with better performance.
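As a rough illustrative sketch of this training procedure (our own simplification, not the exact experimental code), the following PyTorch snippet keeps the sparsity pattern of $S$ fixed, recomputes a rank-$k$ approximation with a differentiable power-method SVD inside the loss, and updates only the non-zero values of $S$ by SGD. The helper names differentiable_scw_loss and train_sketch, the Frobenius-norm loss, and all hyperparameters are our own choices.

import torch

def power_method_svd(M, num_components, n_iters=20):
    # Differentiable top singular triplets of M via power iterations and deflation.
    Us, sigmas, Vs = [], [], []
    for _ in range(num_components):
        v = torch.randn(M.shape[1], dtype=M.dtype)
        for _ in range(n_iters):
            w = M.t() @ (M @ v)
            v = w / (w.norm() + 1e-12)
        sigma = (M @ v).norm()
        u = (M @ v) / (sigma + 1e-12)
        Us.append(u); sigmas.append(sigma); Vs.append(v)
        M = M - sigma * torch.outer(u, v)                  # deflate, continue with A_{i+1}
    return torch.stack(Us, 1), torch.stack(sigmas), torch.stack(Vs, 1)

def differentiable_scw_loss(A, values, rows, cols, m, k):
    # ||A - [AV]_k V^T||_F with S rebuilt from its trainable non-zero values.
    A = A.to(values.dtype)
    S = torch.zeros(m, A.shape[0], dtype=values.dtype)
    S[rows, cols] = values                                 # fixed sparsity, learned values
    _, _, V = power_method_svd(S @ A, m)                   # V: d x m, differentiable
    AV = A @ V
    U2, s2, V2 = power_method_svd(AV, k)
    AVk = U2 @ torch.diag(s2) @ V2.t()
    return (AVk @ V.t() - A).norm()

def train_sketch(train_set, m, k, epochs=10, lr=1.0):
    # SGD over the non-zero entries of S, initialized to random +/-1 as in SCW.
    n = train_set[0].shape[0]
    rows, cols = torch.randint(0, m, (n,)), torch.arange(n)
    values = (torch.randint(0, 2, (n,)).float() * 2 - 1).requires_grad_()
    opt = torch.optim.SGD([values], lr=lr)
    for _ in range(epochs):
        for A in train_set:
            opt.zero_grad()
            differentiable_scw_loss(A, values, rows, cols, m, k).backward()
            opt.step()
    return rows, cols, values.detach()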

7.4. Worst Case Bound

In this section, we show that concatenating two sketching matrices $S_1$ and $S_2$ (of sizes $m_1 \times n$ and $m_2 \times n$, respectively) into a single matrix $S_*$ (of size $(m_1+m_2)\times n$) will not increase the approximation loss of the final rank-$k$ solution computed by Algorithm 27, compared to the case in which only one of $S_1$ or $S_2$ is used as the sketching matrix. In the rest of this section, the sketching matrix $S_*$ denotes the concatenation of $S_1$ and $S_2$ as follows:

$$S_{*\,\left((m_1+m_2)\times n\right)} \;=\; \begin{bmatrix} S_{1\,(m_1\times n)} \\ S_{2\,(m_2\times n)} \end{bmatrix}$$

Formally, we prove the following theorem in this section.

Theorem 7.4.1. Let $U_*\Sigma_*V_*^\top$ and $U_1\Sigma_1V_1^\top$ respectively denote the SVD of $S_*A$ and $S_1A$. Then,

$$\|[AV_*]_k V_*^\top - A\|_F \;\le\; \|[AV_1]_k V_1^\top - A\|_F.$$

In particular, the above theorem implies that the output of Algorithm 27 with the sketching matrix $S_*$ is a better rank-$k$ approximation to $A$ than the output of the algorithm with $S_1$. In the rest of this section we prove Theorem 7.4.1.

Fact 7.4.2 (Pythagorean Theorem). If $A$ and $B$ are matrices with the same number of rows and columns, then $AB^\top = 0$ implies $\|A + B\|_F^2 = \|A\|_F^2 + \|B\|_F^2$.

Claim 7.4.3. Let $V_1$ and $V_2$ be two matrices such that $\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V_2)$. Then,

$$\min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_1);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F^2 \;\ge\; \min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_2);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F^2.$$

Before proving the main theorem, we state the following helpful lemma from [52].

Lemma 7.4.4 (Lemma 4.3 in [52]). Suppose that $V$ is a matrix with orthonormal columns. Then, a best rank-$k$ approximation to $A$ in $\mathrm{colsp}(V)$ is given by $[AV]_k V^\top$.

Proof: Note that $AVV^\top$ is the row projection of $A$ onto $\mathrm{colsp}(V)$. Then, for any conforming $Y$,

$$(A - AVV^\top)(AVV^\top - YV^\top)^\top = A(I - VV^\top)V(AV - Y)^\top = A(V - VV^\top V)(AV - Y)^\top = 0,$$

where the last equality follows from the fact that if $V$ has orthonormal columns then $VV^\top V = V$ (e.g., see Lemma 3.5 in [52]). Then, by the Pythagorean Theorem (Fact 7.4.2), we have

$$\|A - YV^\top\|_F^2 = \|A - AVV^\top\|_F^2 + \|AVV^\top - YV^\top\|_F^2. \qquad (7.4.1)$$

Since $V$ has orthonormal columns, $\|x^\top V^\top\| = \|x\|$ for any conforming $x$. Thus, for any $Z$ of rank at most $k$,

$$\|AVV^\top - [AV]_kV^\top\|_F = \|(AV - [AV]_k)V^\top\|_F = \|AV - [AV]_k\|_F \le \|AV - Z\|_F = \|AVV^\top - ZV^\top\|_F. \qquad (7.4.2)$$

Hence,

$$\|A - [AV]_kV^\top\|_F^2 = \|A - AVV^\top\|_F^2 + \|AVV^\top - [AV]_kV^\top\|_F^2 \qquad \text{(by (7.4.1))}$$
$$\le \|A - AVV^\top\|_F^2 + \|AVV^\top - ZV^\top\|_F^2 \qquad \text{(by (7.4.2))}$$
$$= \|A - ZV^\top\|_F^2. \qquad \text{(by (7.4.1) with } Y = Z\text{)}$$

This implies that $[AV]_kV^\top$ is a best rank-$k$ approximation of $A$ in $\mathrm{colsp}(V)$. $\square$

Since the above statement is a transposed version of the lemma from [52], we included the proof for completeness.

Proof (of Theorem 7.4.1): First, we show that $\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V_*)$. By the properties of the (compact) SVD, $\mathrm{colsp}(V_1) = \mathrm{rowsp}(S_1A)$ and $\mathrm{colsp}(V_*) = \mathrm{rowsp}(S_*A)$. Since $S_*$ contains all rows of $S_1$,

$$\mathrm{colsp}(V_1) \subseteq \mathrm{colsp}(V_*). \qquad (7.4.3)$$

By Lemma 7.4.4,

$$\|A - [AV_*]_kV_*^\top\|_F = \min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_*);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F$$

$$\|A - [AV_1]_kV_1^\top\|_F = \min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_1);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F$$

Finally, together with (7.4.3) and Claim 7.4.3,

$$\|A - [AV_*]_kV_*^\top\|_F = \min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_*);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F \;\le\; \min_{\substack{\mathrm{rowsp}(X)\subseteq\mathrm{colsp}(V_1);\\ \mathrm{rank}(X)\le k}} \|X - A\|_F = \|A - [AV_1]_kV_1^\top\|_F,$$

which completes the proof. $\square$

Finally, we note that the property of Theorem 7.4.1 is not universal, i.e., it does not hold for all sketching algorithms for low-rank decomposition. For example, an alternative algorithm proposed in [54] proceeds by letting $Z$ be the top $k$ singular vectors of $SA$ and then reports $AZZ^\top$. It is not difficult to see that, by adding extra rows to the sketching matrix $S$, one can skew the output of the algorithm so that it is far from optimal.
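As a quick numerical sanity check of Theorem 7.4.1 (ours, not part of the original analysis), the following self-contained NumPy snippet verifies on random data that stacking a second sketch on top of $S_1$ never increases the error of Algorithm 27:

import numpy as np

def scw(A, S, k):
    # Algorithm 27: SVD of SA, then best rank-k approximation of AV, times V^T.
    V = np.linalg.svd(S @ A, full_matrices=False)[2].T
    AV = A @ V
    U, s, Vt = np.linalg.svd(AV, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k] @ V.T

rng = np.random.default_rng(2)
n, d, k, m1, m2 = 400, 200, 10, 20, 20
A = rng.standard_normal((n, d))

def sparse_sketch(m):
    S = np.zeros((m, n))
    S[rng.integers(0, m, n), np.arange(n)] = rng.choice([-1.0, 1.0], n)
    return S

S1, S2 = sparse_sketch(m1), sparse_sketch(m2)
S_star = np.vstack([S1, S2])                      # the stacked sketch S_* of Theorem 7.4.1

err = lambda S: np.linalg.norm(scw(A, S, k) - A)
assert err(S_star) <= err(S1) + 1e-8              # never worse than S_1 alone
print(err(S1), err(S_star))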

7.5. Experimental Results

The main question considered here is whether, for natural matrix datasets, optimizing the sketch matrix $S$ can improve the performance of the sketching algorithm for the low-rank decomposition problem. To answer this question, we implemented and compared the following methods for computing $S \in \mathbb{R}^{m\times n}$.

∙ Sparse Random. Sketching matrices are generated at random as in [53]. Specifically, we select a random hash function $h : [n] \to [m]$, and for all $i = 1 \ldots n$, $S_{h[i],i}$ is selected to be either $+1$ or $-1$ with equal probability. All other entries in $S$ are set to $0$. Therefore, $S$ has exactly $n$ non-zero entries.

∙ Dense Random. All the $nm$ entries in the sketching matrices are sampled from a Gaussian distribution (we include this method for comparison).

∙ Learned. Using the sparse random matrix as the initialization, we run Algorithm 28 to optimize the sketching matrix using the training set, and return the optimized matrix.

∙ Mixed (J). We first generate two sparse random matrices $S_1, S_2 \in \mathbb{R}^{\frac{m}{2}\times n}$ (assuming $m$ is even), and define $S$ to be their combination. We then run Algorithm 28 to optimize $S$ using the training set, but only $S_1$ will be updated, while $S_2$ is fixed. Therefore, $S$ is a mixture of a learned matrix and a random matrix, and the first matrix is trained jointly with the second one.

∙ Mixed (S). We first compute a learned matrix $S_1 \in \mathbb{R}^{\frac{m}{2}\times n}$ using the training set, and then append another sparse random matrix $S_2$ to get $S \in \mathbb{R}^{m\times n}$. Therefore, $S$ is a mixture of a learned matrix and a random matrix, but the learned matrix is trained separately.

Figure 7.5.1: Test error by datasets and sketching matrices for $k = 10$, $m = 20$ (methods: Learned, Sparse Random, Dense Random; datasets: Logo, Eagle, Friends, Hyper, Tech).

Figure 7.5.2: Test error for Logo (left), Hyper (middle) and Tech (right) when $k = 10$, as a function of the sketch size $m$ (methods: Learned, Sparse Random, Dense Random).

Datasets. We used a variety of datasets to test the performance of our methods:

∙ Videos⁵: Logo, Friends, Eagle. We downloaded three high-resolution videos from YouTube: a logo video, the Friends TV show, and an eagle nest cam. From each video, we collect 500 frames of size $1920 \times 1080 \times 3$ pixels, and use 400 (100) matrices as the training (test) set. We resize each frame into a $5760 \times 1080$ matrix.

∙ Hyper. We use matrices from HS-SOD, a dataset of hyperspectral images from natural scenes [104]. Each matrix has $1024 \times 768$ pixels, and we use 400 (100) matrices as the training (test) set.

∙ Tech. We use matrices from TechTC-300, a dataset for text categorization [60]. Each matrix has 835,422 rows, but on average only 25,389 of the rows contain non-zero entries. On average each matrix has 195 columns. We use 200 (95) matrices as the training (test) set.

⁵They can be downloaded from http://youtu.be/L5HQoFIaT4I, http://youtu.be/xmLZsEfXEgE and http://youtu.be/ufnf_q_3Ofg

Evaluation metric. To evaluate the quality of a sketching matrix $S$, it suffices to evaluate the output of Algorithm 27 using the sketching matrix $S$ on different input matrices $A$. We first define the optimal approximation loss for the test set Te as

$$\mathrm{App}^*_{\mathrm{Te}} \,:=\, \mathbb{E}_{A\sim \mathrm{Te}}\, \|A - [A]_k\|_F.$$

Note that $\mathrm{App}^*_{\mathrm{Te}}$ does not depend on $S$, and in general it is not achievable by any sketch $S$ with $m < d$, because of information loss. Based on the definition of the optimal approximation loss, we define the error of the sketch $S$ for Te as

$$\mathrm{Err}(\mathrm{Te}, S) \,:=\, \mathbb{E}_{A\sim \mathrm{Te}}\, \|A - \mathrm{SCW}(S, A)\|_F - \mathrm{App}^*_{\mathrm{Te}}.$$

In our datasets, some of the matrices have much larger singular values than others. To avoid imbalance in the dataset, we normalize the matrices so that their top singular values are all equal.
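For concreteness, a minimal sketch (ours) of these evaluation quantities; here scw_output is assumed to be a callable wrapping Algorithm 27 with a fixed sketch $S$:

import numpy as np

def best_rank_k(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def app_star(test_set, k):
    # App*_Te: average optimal rank-k approximation loss over the test set
    return np.mean([np.linalg.norm(A - best_rank_k(A, k)) for A in test_set])

def err(test_set, scw_output, k):
    # Err(Te, S) = E_A ||A - SCW(S, A)||_F - App*_Te
    return np.mean([np.linalg.norm(A - scw_output(A)) for A in test_set]) - app_star(test_set, k)

def normalize(A):
    # normalization used above: rescale so the top singular value equals 1
    return A / np.linalg.svd(A, compute_uv=False)[0]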

Figure 7.5.3: Low rank approximation results for Logo video frame: the best rank-10 approximation (left), and rank-10 approximations reported by Algorithm 27 using a sparse learned sketching matrix (middle) and a sparse random sketching matrix (right).

7.5.1. Average test error

We first test all methods on different datasets, with various combinations of $k$ and $m$. See Figure 7.5.1 for the results when $k = 10$, $m = 20$. As we can see, for the video datasets, learned sketching matrices can get $20\times$ better test error than the sparse random or dense random sketching matrices. For the other datasets, learned sketching matrices are still more than $2\times$ better. Similar improvements of the learned sketching matrices over the random sketching matrices can be observed when $k = 10$ and $m = 10, 20, 30, 40, \cdots, 80$; see Figure 7.5.2. We also include the test error results in Table 7.5.1 for the cases $k = 20, 30$. Finally, in Figure 7.5.3, we visualize an example output of the algorithm for the case $k = 10$, $m = 20$ on the Logo dataset.

Table 7.5.1: Test error in various settings.

k, m, Sketch       Logo    Eagle   Friends   Hyper   Tech
10, 10, Learned    0.39    0.31    1.03      1.25    6.70
10, 10, Random     5.22    6.33    11.56     7.90    17.08
10, 20, Learned    0.10    0.18    0.22      0.52    2.95
10, 20, Random     2.09    4.31    4.11      2.92    7.99
20, 20, Learned    0.61    0.66    1.41      1.68    7.79
20, 20, Random     4.18    5.79    9.10      5.71    14.55
20, 40, Learned    0.18    0.41    0.42      0.72    3.09
20, 40, Random     1.19    3.50    2.44      2.23    6.20
30, 30, Learned    0.72    1.06    1.78      1.90    7.14
30, 30, Random     3.11    6.03    6.27      5.23    12.82
30, 60, Learned    0.21    0.61    0.42      0.84    2.78
30, 60, Random     0.82    3.28    1.79      1.88    4.84

Table 7.5.2: Performance of mixed sketches.

k, m, Sketch         Logo    Hyper   Tech
10, 10, Learned      0.39    1.25    6.70
10, 10, Random       5.22    7.90    17.08
10, 20, Learned      0.10    0.52    2.95
10, 20, Mixed (J)    0.20    0.78    3.73
10, 20, Mixed (S)    0.24    0.87    3.69
10, 20, Random       2.09    2.92    7.99
10, 40, Learned      0.04    0.28    1.16
10, 40, Mixed (J)    0.05    0.34    1.31
10, 40, Mixed (S)    0.05    0.34    1.20
10, 40, Random       0.45    1.12    3.28
10, 80, Learned      0.02    0.16    0.31
10, 80, Random       0.09    0.32    0.80

7.5.2. Comparing Random, Learned and Mixed

In Table 7.5.2, we investigate the performance of the mixed sketching matrices by comparing them with random and learned sketching matrices. In all scenarios, mixed sketching matrices yield much better results than random sketching matrices, and sometimes the results are comparable to those of learned sketching matrices. This means that in most cases it suffices to train one half of the sketching matrix to obtain good empirical results, while at the same time, by our Theorem 7.4.1, the remaining random half of the sketch matrix provides a worst-case guarantee.

7.6. The case of 푚 = 1

In this section, we denote the SVD of $A$ as $U^A\Sigma^A(V^A)^\top$, where both $U^A$ and $V^A$ have orthonormal columns and $\Sigma^A = \mathrm{diag}\{\lambda^A_1, \cdots, \lambda^A_d\}$ is a diagonal matrix with nonnegative entries. For simplicity, we assume that for all $A \sim \mathcal{D}$, $1 = \lambda^A_1 \ge \cdots \ge \lambda^A_d$. We use $U^A_i$ to denote the $i$-th column of $U^A$, and similarly for $V^A_i$.

We want to find $[A]_k$, the rank-$k$ approximation of $A$. In general, it is hard to obtain a closed-form expression for the output of Algorithm 27. However, for $m = 1$, such an expression can be calculated. Indeed, if $m = 1$, the sketching matrix becomes a vector $s \in \mathbb{R}^{1\times n}$. Therefore $AV$ in Algorithm 27 has rank at most one, so it suffices to set $k = 1$. Consider a matrix $A \sim \mathcal{D}$ as the input to Algorithm 27. By calculation, $SA = \sum_i \lambda^A_i \langle s, U^A_i\rangle (V^A_i)^\top$, which is a (row) vector. For example, if $s = (U^A_1)^\top$, we obtain $\lambda^A_1 (V^A_1)^\top$. Since $SA$ is a vector, applying SVD to it is equivalent to performing normalization. Therefore,

$$V = \frac{\sum_{i=1}^d \lambda^A_i \langle s, U^A_i\rangle (V^A_i)^\top}{\sqrt{\sum_{i=1}^d (\lambda^A_i)^2 \langle s, U^A_i\rangle^2}}.$$

Ideally, we hope that $V$ is as close to $V^A_1$ as possible, because that means $[AV]_1V^\top$ is close to $\lambda^A_1 U^A_1 (V^A_1)^\top$, which captures the top singular component of $A$, i.e., the optimal solution. More formally,

$$AV = \frac{\sum_{i=1}^d (\lambda^A_i)^2 \langle s, U^A_i\rangle\, U^A_i}{\sqrt{\sum_{i=1}^d (\lambda^A_i)^2 \langle s, U^A_i\rangle^2}}.$$

We want to maximize its norm, which is:

$$\frac{\sum_{i=1}^d (\lambda^A_i)^4 \langle s, U^A_i\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2 \langle s, U^A_i\rangle^2}. \qquad (7.6.1)$$

We note that one can simplify (7.6.1) by considering only the contribution from the top left singular vector $U^A_1$, which corresponds to the maximization of the following expression:

$$\frac{\langle s, U^A_1\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2 \langle s, U^A_i\rangle^2}. \qquad (7.6.2)$$
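The derivation above can be checked numerically; the following snippet (ours, for illustration) confirms that for $m = 1$ the squared norm of $AV$ produced by Algorithm 27 equals expression (7.6.1):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 40))
A /= np.linalg.svd(A, compute_uv=False)[0]        # enforce lambda_1^A = 1
s = rng.standard_normal(50)                       # sketch vector, m = 1

SA = s @ A                                        # the 1 x d sketch
V = SA / np.linalg.norm(SA)                       # SVD of a vector = normalization
AV = A @ V

U, lam, _ = np.linalg.svd(A, full_matrices=False)
proj = (s @ U) ** 2                               # <s, U_i^A>^2
obj = np.sum(lam ** 4 * proj) / np.sum(lam ** 2 * proj)   # expression (7.6.1)
assert np.isclose(AV @ AV, obj)                   # ||AV||^2 matches (7.6.1)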

7.7. Optimization Bounds

Motivated by the empirical success of sketch optimization, we investigate the complexity of optimizing the loss function. We focus on the simple case where $m = 1$ and therefore $S$ is just a (dense) vector. Our main observation is that a vector $s$ picked uniformly at random from the $d$-dimensional unit sphere achieves an approximately optimal solution, with the approximation factor depending on the maximum stable rank of the matrices $A_1, \cdots, A_N$. This algorithm is not particularly useful for our purpose, as our goal is to improve over the random choice of the sketching matrix $S$. Nevertheless, it demonstrates that an algorithm with a non-trivial approximation factor exists.

Definition 7.7.1 (stable rank $r'$). For a matrix $A$, the stable rank of $A$ is defined as the squared ratio of the Frobenius and operator norms of $A$, i.e.,

$$r'(A) = \frac{\|A\|_F^2}{\|A\|^2} = \frac{\sum_i (\lambda^A_i)^2}{\max_i (\lambda^A_i)^2}.$$
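A one-line helper (ours) for computing the stable rank:

import numpy as np

def stable_rank(A):
    # r'(A) = ||A||_F^2 / ||A||^2 = (sum of squared singular values) / (largest squared)
    sv = np.linalg.svd(A, compute_uv=False)
    return np.sum(sv ** 2) / sv[0] ** 2

print(stable_rank(np.random.default_rng(4).standard_normal((100, 80))))   # between 1 and rank(A)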

Note that since we assume that for all matrices $A \sim \mathcal{D}$, $1 = \lambda^A_1 \ge \cdots \ge \lambda^A_d > 0$, for all these matrices $r'(A) = \sum_i (\lambda^A_i)^2$. First, we consider the simplified objective function as in (7.6.2).

Lemma 7.7.2. A vector $s$ picked uniformly at random from the $d$-dimensional unit sphere is an $O(r')$-approximation to the optimum value of the simplified objective function in Equation (7.6.2), where $r'$ is the maximum stable rank of the matrices $A_1, \cdots, A_N$.

Proof: We will show that

$$\mathbb{E}\left[\frac{\langle s, U^A_1\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2}\right] = \Omega(1/r'(A))$$

for all $A \sim \mathcal{D}$, where $s$ is a vector picked uniformly at random from $\mathbb{S}^{d-1}$. Since for all $A \sim \mathcal{D}$ we have $\frac{\langle s, U^A_1\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2} \le 1$, by the linearity of expectation the vector $s$ achieves an $O(r')$-approximation to the maximum value of the objective function

$$\sum_{j=1}^N \frac{\langle s, U^{A_j}_1\rangle^2}{\sum_{i=1}^d (\lambda^{A_j}_i)^2 \langle s, U^{A_j}_i\rangle^2}.$$

First, recall that to sample $s$ uniformly at random from $\mathbb{S}^{d-1}$ we can generate $s$ as $\sum_{i=1}^d \alpha_i U^A_i \big/ \sqrt{\sum_{i=1}^d \alpha_i^2}$, where $\alpha_i \sim \mathcal{N}(0,1)$ for all $i \le d$. This helps us evaluate $\mathbb{E}\big[\frac{\langle s, U^A_1\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2}\big]$ for an arbitrary matrix $A \sim \mathcal{D}$:

$$\mathbb{E}\left[\frac{\langle s, U^A_1\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2}\right] = \mathbb{E}\left[\frac{(\alpha_1)^2}{\sum_{i=1}^d (\lambda^A_i)^2\cdot(\alpha_i)^2}\right] \ge \mathbb{E}\left[\frac{(\alpha_1)^2}{\sum_{i=1}^d (\lambda^A_i)^2\cdot(\alpha_i)^2}\,\Big|\,\Psi_1\cap\Psi_2\right]\cdot\Pr(\Psi_1\cap\Psi_2),$$

where the events $\Psi_1, \Psi_2$ are defined as:

$$\Psi_1 \,:=\, \left[\,|\alpha_1| \ge \tfrac{1}{2}\,\right] \qquad\text{and}\qquad \Psi_2 \,:=\, \left[\,\sum_{i=2}^d (\lambda^A_i)^2(\alpha_i)^2 \le 2\cdot r'(A)\,\right].$$

Since the $\alpha_i$'s are independent, we have

$$\mathbb{E}\left[\frac{(\alpha_1)^2}{\sum_{i=1}^d (\lambda^A_i)^2(\alpha_i)^2}\right] \ge \mathbb{E}\left[\frac{(\alpha_1)^2}{(\alpha_1)^2 + 2\cdot r'(A)}\,\Big|\,\Psi_1\cap\Psi_2\right]\cdot\Pr(\Psi_1)\cdot\Pr(\Psi_2) \ge \frac{1}{8\cdot r'(A)+1}\cdot\Pr(\Psi_1)\cdot\Pr(\Psi_2),$$

where we used that $\frac{(\alpha_1)^2}{(\alpha_1)^2 + 2\cdot r'(A)}$ is increasing in $(\alpha_1)^2$ for $(\alpha_1)^2 \ge \frac{1}{4}$. It remains to prove that $\Pr(\Psi_1), \Pr(\Psi_2) = \Theta(1)$. We observe that, since $\alpha_1 \sim \mathcal{N}(0,1)$, we have

$$\Pr(\Psi_1) = \Pr\left(|\alpha_1| \ge \tfrac{1}{2}\right) = \Theta(1).$$

Similarly, by Markov's inequality, we have

$$\Pr(\Psi_2) = \Pr\left(\sum_{i=1}^d (\lambda^A_i)^2(\alpha_i)^2 \le 2r'(A)\right) \ge 1 - \Pr\left(\sum_{i=1}^d (\lambda^A_i)^2(\alpha_i)^2 > 2r'(A)\right) \ge \frac{1}{2}. \qquad\square$$

Next, we prove that a random vector $s \in \mathbb{S}^{d-1}$ achieves an $O(r'(A))$-approximation to the optimum of the main objective function in Equation (7.6.1).

Lemma 7.7.3. A vector $s$ picked uniformly at random from the $d$-dimensional unit sphere is an $O(r')$-approximation to the optimum value of the objective function in Equation (7.6.1), where $r'$ is the maximum stable rank of the matrices $A_1, \cdots, A_N$.

Proof: We assume that the vector $s$ is generated via the same process as in the proof of Lemma 7.7.2. It follows that

$$\mathbb{E}\left[\frac{\sum_{i=1}^d (\lambda^A_i)^4\langle s, U^A_i\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2}\right] \ge \mathbb{E}\left[\frac{(\alpha_1)^2}{\sum_{i=1}^d (\lambda^A_i)^2\cdot(\alpha_i)^2}\right] = \Omega(1/r'(A)). \qquad\square$$

7.8. Generalization Bounds

Define the loss function as

$$L(s) \,:=\, -\,\mathbb{E}_{A\sim\mathcal{D}}\left[\frac{\sum_{i=1}^d (\lambda^A_i)^4\langle s, U^A_i\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2}\right].$$

We want to find a vector $s \in \mathbb{S}^{d-1}$ to minimize $L(s)$, where $\mathbb{S}^{d-1}$ is the $d$-dimensional unit sphere. Since $\mathcal{D}$ is unknown, we instead optimize the empirical (training) loss

$$L_{\mathrm{Tr}}(s) \,:=\, -\,\frac{1}{N}\sum_{j=1}^N \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^2\langle s, U^{A_j}_i\rangle^2}.$$
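The empirical loss can be evaluated directly from the SVDs of the training matrices, as in the following sketch (ours):

import numpy as np

def empirical_loss(s, train_set):
    # L_Tr(s): negative average of the per-matrix objective (7.6.1)
    vals = []
    for A in train_set:
        U, lam, _ = np.linalg.svd(A, full_matrices=False)
        proj = (s @ U) ** 2                        # <s, U_i^A>^2
        vals.append(np.sum(lam ** 4 * proj) / np.sum(lam ** 2 * proj))
    return -np.mean(vals)

rng = np.random.default_rng(5)
train = [rng.standard_normal((50, 40)) for _ in range(8)]
train = [A / np.linalg.svd(A, compute_uv=False)[0] for A in train]   # lambda_1^A = 1
s = rng.standard_normal(50); s /= np.linalg.norm(s)                  # s on the unit sphere
print(empirical_loss(s, train))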

The importance of robust solutions. We start by observing that if $s$ minimizes the training loss $L_{\mathrm{Tr}}$, it is not necessarily true that $s$ is the optimal solution for the population loss $L$. For example, it could be the case that the matrices $\{A_j\}_{j=1,\cdots,N}$ are diagonal with only a single non-zero entry, located in the top row, while $s = (\varepsilon, \sqrt{1-\varepsilon^2}, 0, \cdots, 0)$ for $\varepsilon$ close to $0$. In this case, we know that $L_{\mathrm{Tr}}(s) = -1$, which is its minimum value.

However, such a solution is not robust. In the population distribution, if there exists a matrix $A$ such that $A = \mathrm{diag}(\sqrt{1-100\varepsilon^2}, 10\varepsilon, 0, 0, \cdots, 0)$, then inserting $s$ into (7.6.1) gives

$$\frac{\sum_{i=1}^d (\lambda^A_i)^4\langle s, U^A_i\rangle^2}{\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2} = \frac{(1-100\varepsilon^2)^2\varepsilon^2 + 10^4\varepsilon^4(1-\varepsilon^2)}{(1-100\varepsilon^2)\varepsilon^2 + 100\varepsilon^2(1-\varepsilon^2)} < \frac{\varepsilon^2 + 10^4\varepsilon^4}{101\varepsilon^2 - 100\varepsilon^4} = \frac{1 + 10^4\varepsilon^2}{101 - 100\varepsilon^2}.$$

The upper bound is very close to $0$ if $\varepsilon$ is small enough. This is because when the denominator is extremely small, the whole expression is susceptible to minor perturbations of $A$. This is a typical example showing the importance of finding a robust solution. Because of this issue, we will show a generalization guarantee for robust solutions $s$.

Definition of a robust solution. First, define the event $\zeta_{A,\delta,s} \,:=\, \mathbb{1}\left[\sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2 < \delta\right]$, i.e., the event that the denominator in the loss function is small. Ideally, we want this event to happen with small probability, which indicates that for most matrices the denominator is large and therefore $s$ is robust in general. We have the following definition of robustness.

Definition 7.8.1 ($(\rho,\delta)$-robustness). $s$ is $(\rho,\delta)$-robust with respect to $\mathcal{D}$ if $\mathbb{E}_{A\sim\mathcal{D}}[\zeta_{A,\delta,s}] \le \rho$. $s$ is $(\rho,\delta)$-robust with respect to Tr if $\mathbb{E}_{A\sim\mathrm{Tr}}[\zeta_{A,\delta,s}] \le \rho$.

For a given $\mathcal{D}$, we can define the robust solution set that includes all robust vectors.

Definition 7.8.2 ($(\rho,\delta)$-robust set). $M_{\mathcal{D},\rho,\delta}$ is defined to be the set of all vectors $s \in \mathbb{S}^{d-1}$ such that $s$ is $(\rho,\delta)$-robust with respect to $\mathcal{D}$.

Estimating $M_{\mathcal{D},\rho,\delta}$. The drawback of the above definition is that $M_{\mathcal{D},\rho,\delta}$ is defined by the unknown distribution $\mathcal{D}$, so for fixed $\delta$ and $\rho$, we cannot tell whether $s$ is in $M_{\mathcal{D},\rho,\delta}$ or not. However, we can estimate the robustness of $s$ using the training set. Specifically, we have the following lemma:

Lemma 7.8.3 (Estimating robustness). For a training set Tr of size $N$ sampled uniformly at random from $\mathcal{D}$, a given $s \in \mathbb{R}^d$, and a constant $1 > \eta > 0$, if $s$ is $(\rho,\delta)$-robust with respect to Tr, then with probability at least $1 - e^{-\frac{\eta^2\rho N}{2}}$, $s$ is $\left(\frac{\rho}{1-\eta}, \delta\right)$-robust with respect to $\mathcal{D}$.

Proof: Suppose that $\Pr_{A\sim\mathcal{D}}[\zeta_{A,\delta,s}] = \rho_1$, which means $\mathbb{E}\left[\sum_{A_i\in\mathrm{Tr}} \zeta_{A_i,\delta,s}\right] = \rho_1 N$. Since the events $\zeta_{A_i,\delta,s}$ are 0-1 random variables, by the Chernoff bound,

$$\Pr\left(\sum_{A_i\in\mathrm{Tr}} \zeta_{A_i,\delta,s} \le (1-\eta)\rho_1 N\right) \le e^{-\frac{\eta^2\rho_1 N}{2}}.$$

If $\rho_1 < \rho < \rho/(1-\eta)$, our claim is immediately true. Otherwise, we know $e^{-\frac{\eta^2\rho_1 N}{2}} \le e^{-\frac{\eta^2\rho N}{2}}$. Hence, with probability at least $1 - e^{-\frac{\eta^2\rho N}{2}}$, $N\rho \ge \sum_{A_i\in\mathrm{Tr}} \zeta_{A_i,\delta,s} > (1-\eta)\rho_1 N$. This implies that with probability at least $1 - e^{-\frac{\eta^2\rho N}{2}}$, $\rho_1 \le \frac{\rho}{1-\eta}$. $\square$

Lemma 7.8.3 implies that for a fixed solution $s$, if it is $(\rho,\delta)$-robust on Tr, then it is also $(O(\rho),\delta)$-robust on $\mathcal{D}$ with high probability. However, Lemma 7.8.3 only works for a single solution $s$, while there are infinitely many potential $s$ on the $d$-dimensional unit sphere. To remedy this problem, we discretize the unit sphere to bound the number of potential solutions. Classical results tell us that discretizing the unit sphere into a grid of edge length

$\sqrt{\varepsilon}$ gives $\frac{C_d}{\varepsilon^d}$ points on the grid for some constant $C_d$ (e.g., see Section 3.3 in [98] for more details). We will only consider these points as potential solutions, denoted $\hat{B}^d$. Thus, we can find a “robust” solution $s \in \hat{B}^d$ with decent probability, using Lemma 7.8.3 and a union bound.

Lemma 7.8.4 (Picking a robust $s$). For fixed constants $\rho > 0$ and $1 > \eta > 0$, with probability at least $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{2}}$, any $s \in \hat{B}^d$ that is $(\rho,\delta)$-robust with respect to Tr is $\left(\frac{\rho}{1-\eta}, \delta\right)$-robust with respect to $\mathcal{D}$.

Since we are working with discretized solutions, we need a new definition of the robust set.

Definition 7.8.5 (Discretized $(\rho,\delta)$-robust set). $\hat{M}_{\mathcal{D},\rho,\delta}$ is defined to be the set of all vectors $s \in \hat{B}^d$ such that $s$ is $(\rho,\delta)$-robust with respect to $\mathcal{D}$.

Using arguments similar to Lemma 7.8.4, we know all solutions from $\hat{M}_{\mathcal{D},\rho,\delta}$ are robust with respect to Tr as well.

Lemma 7.8.6. With probability at least $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3}}$, for a constant $\eta > 0$, all solutions in $\hat{M}_{\mathcal{D},\rho,\delta}$ are $((1+\eta)\rho, \delta)$-robust with respect to Tr.

Proof: Consider a fixed solution $s \in \hat{M}_{\mathcal{D},\rho,\delta}$. Note that $\mathbb{E}\left[\sum_{A_i\in\mathrm{Tr}} \zeta_{A_i,\delta,s}\right] = \rho N$ and the $\zeta_{A_i,\delta,s}$ are 0-1 random variables. Therefore, by the Chernoff bound,

$$\Pr\left(\sum_{A_i\in\mathrm{Tr}} \zeta_{A_i,\delta,s} \ge (1+\eta)\rho N\right) \le e^{-\frac{\eta^2\rho N}{3}}.$$

Hence, with probability at least $1 - e^{-\frac{\eta^2\rho N}{3}}$, $s$ is $((1+\eta)\rho, \delta)$-robust with respect to Tr. By a union bound over all points in $\hat{M}_{\mathcal{D},\rho,\delta} \subseteq \hat{B}^d$, the proof is complete. $\square$

7.8.1. Generalization Bound for Robust Solutions

Finally, we show generalization bounds for robust solutions. To this end, we use Rademacher complexity to prove the generalization bound. Define the Rademacher complexity $R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr})$ as

$$R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr}) \,:=\, \frac{1}{N}\,\mathbb{E}_{\sigma\sim\{\pm1\}^N}\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j\, \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4 \langle s, U^{A_j}_i\rangle^2}{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^2 \langle s, U^{A_j}_i\rangle^2}\right].$$

$R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr})$ is handy because of the following theorem (notice that the loss function takes values in $[-1, 0]$):

Theorem 7.8.7 (Theorem 26.5 in [160]). Given a constant $\delta > 0$, with probability at least $1-\delta$, for all $s \in \hat{M}_{\mathcal{D},\rho,\delta}$,

$$L(s) - L_{\mathrm{Tr}}(s) \le 2R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr}) + 4\sqrt{\frac{2\log(4/\delta)}{N}}.$$

That means it suffices to bound $R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr})$ to get the generalization bound.

Lemma 7.8.8 (Bound on $R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr})$). For a constant $\eta > 0$, with probability at least $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3}}$, $R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr}) \le (1+\eta)\rho + \frac{1-\delta}{2\delta} + \frac{d}{\sqrt{N}}$.

Proof: Define $\rho' = (1+\eta)\rho$. By Lemma 7.8.6, we know that with probability $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3}}$, any $s \in \hat{M}_{\mathcal{D},\rho,\delta}$ is $(\rho',\delta)$-robust with respect to Tr, hence $\sum_{A\in\mathrm{Tr}} \zeta_{A,\delta,s} \le \rho' N$. The analysis below is conditioned on this event.

Define $h_{A,\delta,s} \,:=\, \max\left\{\delta,\; \sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2\right\}$. We know that, with probability $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3}}$,

$$N\cdot R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr}) = \mathbb{E}_{\sigma\sim\{\pm1\}^N}\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j\, \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^2\langle s, U^{A_j}_i\rangle^2}\right] \le \rho' N + \mathbb{E}_{\sigma}\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j\, \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{h_{A_j,\delta,s}}\right], \qquad (7.8.1)$$

where (7.8.1) holds because, by definition, $h_{A,\delta,s} > \sum_{i=1}^d (\lambda^A_i)^2\langle s, U^A_i\rangle^2$ if and only if $\zeta_{A,\delta,s} = 1$, which happens for at most $\rho' N$ matrices. Note that for any matrix $A_j$,

$$\sigma_j\, \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^2\langle s, U^{A_j}_i\rangle^2} \le 1.$$

Now,

$$\mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j\, \frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{h_{A_j,\delta,s}}\right] \le \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \left(\mathbb{1}_{\sigma_j=1}\,\frac{\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2}{\delta} - \mathbb{1}_{\sigma_j=-1}\,\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2\right)\right] \qquad (7.8.2)$$

$$= \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \left(\sigma_j \sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2 + \mathbb{1}_{\sigma_j=1}\,\sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2\left(\frac{1}{\delta} - 1\right)\right)\right]$$

$$\le \frac{N}{2\delta} - \frac{N}{2} + \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j \sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2\right]. \qquad (7.8.3)$$

The first inequality (7.8.2) holds since $\frac{\sum_{i=1}^d (\lambda^{A_j}_i)^4\langle s, U^{A_j}_i\rangle^2}{h_{A_j,\delta,s}} \in [\delta, 1]$. It remains to bound the last term in (7.8.3).

$$\mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j \sum_{i=1}^d \left(\lambda^{A_j}_i\right)^4\langle s, U^{A_j}_i\rangle^2\right] \le \sum_{i=1}^d \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j \left\langle s, \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\rangle^2\right] \qquad (7.8.4)$$

By the contraction lemma for Rademacher complexity, we have

$$\mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j \left\langle s, \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\rangle^2\right] \le \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \sum_{j=1}^N \sigma_j \left\langle s, \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\rangle\right] = \mathbb{E}_\sigma\left[\sup_{s\in\hat{M}_{\mathcal{D},\rho,\delta}} \left\langle s, \sum_{j=1}^N \sigma_j \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\rangle\right] \le \mathbb{E}_\sigma\left\|\sum_{j=1}^N \sigma_j \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\|_2,$$

where the last inequality is by the Cauchy–Schwarz inequality. Now, by Jensen's inequality,

$$\mathbb{E}_\sigma\left\|\sum_{j=1}^N \sigma_j \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\|_2 \le \left(\mathbb{E}_\sigma\left\|\sum_{j=1}^N \sigma_j \left(\lambda^{A_j}_i\right)^2 U^{A_j}_i\right\|_2^2\right)^{1/2} = \left(\sum_{j=1}^N \left(\lambda^{A_j}_i\right)^4\right)^{1/2} \le \sqrt{N}. \qquad (7.8.5)$$

Combining (7.8.1), (7.8.3), (7.8.4) and (7.8.5), we have $R(\hat{M}_{\mathcal{D},\rho,\delta} \circ \mathrm{Tr}) \le \rho' + \frac{1-\delta}{2\delta} + \frac{d}{\sqrt{N}}$. $\square$

Combining with Theorem 7.8.7, we get our main theorem:

Theorem 7.8.9 (Main Theorem). Given a training set $\mathrm{Tr} = \{A_j\}_{j=1}^N$ sampled uniformly from $\mathcal{D}$, and fixed constants $1 > \rho \ge 0$, $\delta > 0$, $1 > \eta > 0$, if there exists a $(\rho,\delta)$-robust solution $s \in \hat{B}^d$ with respect to Tr, then with probability at least $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{2}} - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3(1-\eta)}}$, for any $s \in \hat{B}^d$ that is a $(\rho,\delta)$-robust solution with respect to Tr,

$$L(s) \le L_{\mathrm{Tr}}(s) + \frac{2(1+\eta)\rho}{1-\eta} + \frac{1-\delta}{\delta} + \frac{2d}{\sqrt{N}} + 4\sqrt{\frac{2\log(4/\delta)}{N}}.$$

Proof: Since we can find $s \in \hat{B}^d$ such that $s$ is $(\rho,\delta)$-robust with respect to Tr, by Lemma 7.8.4, with probability $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{2}}$, $s$ is $\left(\frac{\rho}{1-\eta},\delta\right)$-robust with respect to $\mathcal{D}$. Therefore, $s \in \hat{M}_{\mathcal{D},\frac{\rho}{1-\eta},\delta}$. By Lemma 7.8.8, with probability at least $1 - \frac{C_d}{\varepsilon^d}\, e^{-\frac{\eta^2\rho N}{3(1-\eta)}}$, $R(\hat{M}_{\mathcal{D},\frac{\rho}{1-\eta},\delta} \circ \mathrm{Tr}) \le \frac{(1+\eta)\rho}{1-\eta} + \frac{1-\delta}{2\delta} + \frac{d}{\sqrt{N}}$. Combined with Theorem 7.8.7, the proof is complete. $\square$

In summary, Theorem 7.8.9 states that if we can find a robust solution 푠 which “fits” the training set then it generalizes to the test set.

Bibliography

[1] A. Aamand, P. Indyk, and A. Vakilian. (learned) frequency estimation algorithms under zipfian distribution. arXiv preprint arXiv:1908.05198, 2019.

[2] Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 32–40, 2013.

[3] P. K. Agarwal and J. Pan. Near-linear algorithms for geometric hitting sets and set covers. In Proc. 30th Annu. Sympos. Comput. Geom. (SoCG), pages 271–279, 2014.

[4] A. Aghazadeh, R. Spring, D. LeJeune, G. Dasarathy, A. Shrivastava, and R. G. Bara- niuk. Mission: Ultra large-scale feature selection using count-sketches. In International Conference on Machine Learning (ICML), 2018.

[5] S. Agrawal, M. Shadravan, and C. Stein. Submodular secretary problem with shortlists. In 10th Innovations in Theoretical Computer Science Conference (ITCS), pages 1:1– 1:19, 2019.

[6] K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with appli- cation to the maximum matching problem. In Proc. 38th Int. Colloq. Automata Lang. Prog. (ICALP), pages 526–538, 2011.

[7] K. J. Ahn and S. Guha. Access to data and number of iterations: Dual primal algo- rithms for maximum matching under resource constraints. In Proc. 27th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 202–211, 2015.

[8] K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear mea- surements. In Proc. 23rd ACM-SIAM Sympos. Discrete Algs. (SODA), pages 459–467, 2012.

[9] K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Proc. 31st ACM Sympos. on Principles of Database Systems (PODS), pages 5–14, 2012.

[10] Z. Allen-Zhu and Y. Li. Lazysvd: even faster svd decomposition yet without agonizing pain. In Advances in Neural Information Processing Systems, pages 974–982, 2016.

[11] Z. Allen-Zhu and L. Orecchia. Nearly-linear time positive LP solver with faster convergence rate. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 229–236, 2015.

[12] Z. Allen-Zhu and L. Orecchia. Using optimization to break the epsilon barrier: A faster and simpler width-independent algorithm for solving positive linear programs in parallel. In Proc. 26th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1439–1456, 2015.

[13] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and system sciences, 58(1):137–147, 1999.

[14] N. Alon, D. Moshkovitz, and S. Safra. Algorithmic construction of sets for 푘- restrictions. ACM Trans. Algo., 2(2):153–177, 2006.

[15] B. Aronov, E. Ezra, and M. Sharir. Small-size 휀-nets for axis-parallel rectangles and boxes. SIAM J. Comput., 39(7):3248–3282, 2010.

[16] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta- algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

[17] S. Assadi. Tight Space-Approximation Tradeoff for the Multi-Pass Streaming Set Cover Problem. In Proc. 36th ACM Sympos. on Principles of Database Systems (PODS), pages 321–335, 2017.

[18] S. Assadi, N. Karpov, and Q. Zhang. Distributed and streaming linear programming in low dimensions. In Proc. 38th ACM Sympos. on Principles of Database Systems (PODS), pages 236–253, 2019.

[19] S. Assadi and S. Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2412–2431, 2018.

[20] S. Assadi, S. Khanna, and Y. Li. Tight bounds for single-pass streaming complexity of the set cover problem. In Proc. 48th Annu. ACM Sympos. Theory Comput. (STOC), pages 698–711, 2016.

[21] S. Assadi, S. Khanna, Y. Li, and G. Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1345–1364, 2016.

[22] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 671–680, 2014.

[23] B. Bahmani, A. Goel, and K. Munagala. Efficient primal-dual graph algorithms for mapreduce. In International Workshop on Algorithms and Models for the Web-Graph, pages 59–78, 2014.

[24] M.-F. Balcan, T. Dick, T. Sandholm, and E. Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), pages 353–362, 2018.

[25] L. Baldassarre, Y.-H. Li, J. Scarlett, B. Gözcü, I. Bogunovic, and V. Cevher. Learning-based compressive subsampling. IEEE Journal of Selected Topics in Signal Processing, 10(4):809–822, 2016.

[26] N. Bansal, A. Caprara, and M. Sviridenko. A new approximation method for set covering problems, with applications to multidimensional bin packing. SIAM Journal on Computing, 39(4):1256–1278, 2009.

[27] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Sys. Sci., 68(4):702–732, 2004.

[28] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 1–10, 2002.

[29] M. Bateni, H. Esfandiari, and V. S. Mirrokni. Almost optimal streaming algorithms for coverage problems. In Proc. 29th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 13–23, 2017.

[30] G. Bennett. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57(297):33–45, 1962.

[31] S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. Tsourakakis. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 173–182, 2015.

[32] J. Blasiok. Optimal streaming and tracking distinct elements with high probability. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2432–2448, 2018.

[33] A. Bora, A. Jalal, E. Price, and A. G. Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning (ICML), pages 537–546, 2017.

[34] C. Boutsidis and A. Gittens. Improved matrix algorithms via the subsampled randomized hadamard transform. SIAM Journal on Matrix Analysis and Applications, 34(3):1301–1340, 2013.

[35] R. S. Boyer and J. S. Moore. Mjrty—a fast majority vote algorithm. In Automated Reasoning, pages 105–117. 1991.

[36] O. Boykin, A. Bryant, E. Chen, ellchow, M. Gagnon, M. Nakamura, S. Noble, S. Ritchie, A. Singhal, and A. Zymnis. Algebird. https://twitter.github.io/algebird/, 2016.

[37] V. Braverman, S. R. Chestnut, N. Ivkin, J. Nelson, Z. Wang, and D. P. Woodruff.

Bptree: An ℓ2 heavy hitters algorithm using constant memory. In Proc. 36th ACM Sympos. on Principles of Database Systems (PODS), pages 361–376, 2017.

[38] V. Braverman, S. R. Chestnut, N. Ivkin, and D. P. Woodruff. Beating countsketch for heavy hitters in insertion streams. In Proc. 48th Annu. ACM Sympos. Theory Comput. (STOC), pages 740–753, 2016.

[39] CAIDA. Caida internet traces 2016 chicago. http://www.caida.org/data/monitors/passive-equinix-chicago.xml.

[40] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489–509, 2006.

[41] A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In 18th Annual IEEE Conference on Computational Complexity, pages 107–117, 2003.

[42] A. Chakrabarti and A. Wirth. Incidence geometries and the pass complexity of semi- streaming set cover. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1365–1373, 2016.

[43] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 693–703, 2002.

[44] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 693–703, 2002.

[45] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning weight in sublinear time. SIAM Journal on computing, 34(6):1370–1379, 2005.

[46] C. Chekuri, S. Gupta, and K. Quanrud. Streaming algorithms for submodular function maximization. In International Colloquium on Automata, Languages, and Program- ming, pages 318–330, 2015.

[47] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze. Eyeriss: An energy-efficient reconfig- urable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1):127–138, 2017.

[48] F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In Proc. 19th Int. Conf. World Wide Web (WWW), pages 231–240, 2010.

[49] R. Chitnis, G. Cormode, H. Esfandiari, M. Hajiaghayi, A. McGregor, M. Monem- izadeh, and S. Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In Proc. 27th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1326–1344, 2016.

[50] K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, II. Discrete Comput. Geom., 4:387–421, 1989.

[51] K. L. Clarkson and K. Varadarajan. Improved approximation algorithms for geometric set cover. Discrete Comput. Geom., 37(1):43–58, 2007.

[52] K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proc. 41st Annu. ACM Sympos. Theory Comput. (STOC), pages 205–214, 2009.

[53] K. L. Clarkson and D. P. Woodruff. Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):54, 2017.

[54] M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), pages 163–172, 2015.

[55] G. Cormode and M. Hadjieleftheriou. Finding frequent items in data streams. Pro- ceedings of the VLDB Endowment, 1(2):1530–1541, 2008.

[56] G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In Proc. 19th ACM Conf. Info. Know. Manag. (CIKM), pages 479–488, 2010.

[57] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count- min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.

[58] G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 44–55. SIAM, 2005.

[59] H. Dai, E. Khalil, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimiza- tion algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6351–6361, 2017.

[60] D. Davidov, E. Gabrilovich, and S. Markovitch. Parameterized generation of labeled datasets for text categorization based on a hierarchical directory. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’04, pages 250–257, 2004.

[61] E. D. Demaine, P. Indyk, S. Mahabadi, and A. Vakilian. On streaming and commu- nication complexity of the set cover problem. In Proc. 28th Int. Symp. Dist. Comp. (DISC), volume 8784, pages 484–498, 2014.

[62] E. D. Demaine, A. López-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. In Proc. 10th Annu. European Sympos. Algorithms (ESA), pages 348–360, 2002.

[63] I. Dinur and D. Steurer. Analytical approach to parallel repetition. In Proc. 46th Annu. ACM Sympos. Theory Comput. (STOC), pages 624–633, 2014.

[64] D. L. Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306, 2006.

[65] P. Drineas, R. Kannan, and M. W. Mahoney. Sampling sub-problems of heteroge- neous Max-Cut problems and approximation algorithms. In Annual Symposium on Theoretical Aspects of Computer Science, pages 57–68, 2005.

[66] F. Dzogang, T. Lansdall-Welfare, S. Sudhahar, and N. Cristianini. Scalable preference learning from data streams. In Proc. 24th Int. Conf. World Wide Web (WWW), pages 885–890, 2015.

[67] Y. Emek and A. Rosén. Semi-streaming set cover. In Proc. 41st Int. Colloq. Automata Lang. Prog. (ICALP), volume 8572 of Lect. Notes in Comp. Sci., pages 453–464, 2014.

[68] A. Ene, S. Har-Peled, and B. Raichel. Geometric packing under non-uniform con- straints. In Proc. 28th Annu. Sympos. Comput. Geom. (SoCG), pages 11–20, 2012.

[69] P. Erdös. On a lemma of littlewood and offord. Bulletin of the American Mathematical Society, 51(12):898–902, 1945.

[70] C. Estan and G. Varghese. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems (TOCS), 21(3):270–313, 2003.

[71] T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In Proc. 20th Annu. ACM Sympos. Theory Comput. (STOC), pages 434–444, 1988.

[72] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634–652, 1998.

[73] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.

[74] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applica- tions. Journal of computer and system sciences, 31(2):182–209, 1985.

[75] R. J. Fowler, M. S. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. Inform. Process. Lett., 12(3):133–137, 1981.

[76] N. Garg and J. Koenemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. SIAM Journal on Computing, 37(2):630–652, 2007.

[77] M. Ghashami, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent directions: Simple and deterministic matrix sketching. SIAM Journal on Computing, 45(5):1762– 1792, 2016.

[78] M. Ghashami and J. M. Phillips. Relative errors for deterministic low-rank matrix approximations. In Proc. 25th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 707–717, 2014.

[79] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.

[80] A. Goel, M. Kapralov, and S. Khanna. Perfect matchings in 푂(푛 log 푛) time in regular bipartite graphs. SIAM Journal on Computing, 42(3):1392–1404, 2013.

[81] S. Gollapudi and D. Panigrahi. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning (ICML), pages 2319–2327, 2019.

[82] A. Goyal, H. Daumé III, and G. Cormode. Sketch algorithms for estimating point queries in nlp. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1093–1103, 2012.

[83] A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

[84] M. D. Grigoriadis and L. G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58, 1995.

[85] A. Gronemeier. Asymptotically optimal lower bounds on the NIH-Multi-Party in- formation complexity of the AND-function and disjointness. In 26th International Symposium on Theoretical Aspects of Computer Science, (STACS), pages 505–516, 2009.

[86] T. Grossman and A. Wool. Computational experience with approximation algorithms for the set covering problem. Euro. J. Oper. Res., 101(1):81–92, 1997.

[87] S. Guha and A. McGregor. Lower bounds for quantile estimation in random-order and multi-pass streaming. In Proc. 34th Int. Colloq. Automata Lang. Prog. (ICALP), volume 4596 of Lect. Notes in Comp. Sci., pages 704–715, 2007.

[88] S. Guha and A. McGregor. Tight lower bounds for multi-pass stream computation via pass elimination. In Proc. 35th Int. Colloq. Automata Lang. Prog. (ICALP), volume 5125 of Lect. Notes in Comp. Sci., pages 760–772, 2008.

[89] S. Guha, A. McGregor, and D. Tench. Vertex and hyperedge connectivity in dy- namic graph streams. In Proc. 34th ACM Sympos. on Principles of Database Systems (PODS), pages 241–247, 2015.

[90] V. Guruswami and K. Onak. Superlinear lower bounds for multipass graph processing. In Proc. 28th Conf. Comput. Complexity (CCC), pages 287–298, 2013.

[91] A. Hadian and T. Heinis. Interpolation-friendly b-trees: Bridging the gap between algorithmic and learned indexes. In Advances in Database Technology - 22nd Interna- tional Conference on Extending Database Technology (EDBT), pages 710–713, 2019.

[92] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288, 2011.

[93] S. Han, J. Kang, H. Mao, Y. Hu, X. Li, Y. Li, D. Xie, H. Luo, S. Yao, Y. Wang, et al. Ese: Efficient speech recognition engine with sparse lstm on fpga. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pages 75–84, 2017.

[94] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. Eie: efficient inference engine on compressed deep neural network. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on, pages 243–254, 2016.

[95] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.

[96] P. Hand and V. Voroninski. Global guarantees for enforcing deep generative priors by empirical risk. In Conference On Learning Theory (COLT), pages 970–978, 2018.

[97] S. Har-Peled, P. Indyk, S. Mahabadi, and A. Vakilian. Towards tight bounds for the streaming set cover problem. In Proc. 35th ACM Sympos. on Principles of Database Systems (PODS), pages 371–383, 2016.

[98] S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of computing, 8(1):321–350, 2012.

[99] S. Har-Peled and K. Quanrud. Approximation algorithms for polynomial-expansion and low-density graphs. In Proc. 23rd Annu. European Sympos. Algorithms (ESA), volume 9294 of Lect. Notes in Comp. Sci., pages 717–728.

[100] S. Har-Peled and M. Sharir. Relative (푝, 휀)-approximations in geometry. Discrete Comput. Geom., 45(3):462–496, 2011.

[101] H. He, H. Daume III, and J. M. Eisner. Learning to search in branch and bound algorithms. In Advances in neural information processing systems (NeurIPS), pages 3293–3301, 2014.

[102] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. pages 107–118, 1998.

[103] C.-Y. Hsu, P. Indyk, D. Katabi, and A. Vakilian. Learning-based frequency estimation algorithms. International Conference on Learning Representations (ICLR), 2019.

[104] N. Imamoglu, Y. Oishi, X. Zhang, G. Ding, Y. Fang, T. Kouyama, and R. Nakamura. Hyperspectral image dataset for benchmarking on salient object detection. In Tenth International Conference on Quality of Multimedia Experience, (QoMEX), pages 1–3, 2018.

[105] P. Indyk, S. Mahabadi, R. Rubinfeld, J. Ullman, A. Vakilian, and A. Yodpinyanee. Fractional set cover in the streaming model. Approximation, Randomization, and Combinatorial Optimization (APPROX/RANDOM), pages 198–217, 2017.

[106] P. Indyk, S. Mahabadi, R. Rubinfeld, A. Vakilian, and A. Yodpinyanee. Set cover in sub-linear time. In Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 2467–2486, 2018.

[107] P. Indyk and A. Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In Proc. 38th ACM Sympos. on Principles of Database Systems (PODS), pages 200–217, 2019.

[108] P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In Proc. 37th Annu. ACM Sympos. Theory Comput. (STOC), pages 202–208, 2005.

[109] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117–128, 2011.

[110] H. Jowhari, M. Sağlam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD- SIGACT-SIGART symposium on Principles of database systems, pages 49–58, 2011.

[111] B. Kalyanasundaram and G. Schintger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992.

[112] D. M. Kane, J. Nelson, and D. P. Woodruff. Revisiting norm estimation in data streams. arXiv preprint arXiv:0811.3648, 2008.

[113] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. 29th ACM Sympos. on Principles of Database Systems (PODS), pages 41–52, 2010.

[114] M. Kapralov, Y. T. Lee, C. Musco, C. Musco, and A. Sidford. Single pass spectral sparsification in dynamic streams. SIAM Journal on Computing, 46(1):456–477, 2017.

[115] N. Karmarkar and R. M. Karp. An efficient approximation scheme for the one- dimensional bin-packing problem. In Foundations of Computer Science, 1982. SFCS’08. 23rd Annual Symposium on, pages 312–320, 1982.

[116] R. M. Karp, S. Shenker, and C. H. Papadimitriou. A simple algorithm for finding fre- quent elements in streams and bags. ACM Transactions on Database Systems (TODS), 28(1):51–55, 2003.

[117] M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT press, 1994.

[118] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming. Adaptive neural signal detection for massive MIMO. CoRR, abs/1906.04610, 2019.

[119] C. Koufogiannakis and N. E. Young. A nearly linear-time PTAS for explicit fractional packing and covering linear programs. Algorithmica, 70(4):648–674, 2014.

[120] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 489–504, 2018.

[121] F. Kuhn, T. Moscibroda, and R. Wattenhofer. The price of being near-sighted. In Proc. 17th ACM-SIAM Sympos. Discrete Algs. (SODA), 2006.

[122] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in MapReduce and streaming. In Proc. 25th ACM Sympos. Parallel Alg. Arch. (SPAA), pages 1–10, 2013.

[123] S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In Proc. 23rd ACM Sympos. Parallel Alg. Arch. (SPAA), pages 85–94, 2011.

[124] Y. T. Lee and A. Sidford. Path finding methods for linear programming: Solving linear programs in 푂(푣푟푎푛푘) iterations and faster algorithms for maximum flow. In Proc. 55th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 424–433, 2014.

[125] E. Liberty. Simple and deterministic matrix sketching. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581–588, 2013.

[126] J. E. Littlewood and A. C. Offord. On the number of real roots of a random alge- braic equation. ii. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 35, pages 133–148. Cambridge University Press, 1939.

[127] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, and V. Braverman. One sketch to rule them all: Rethinking network flow monitoring with univmon. In Proceedings of the 2016 ACM SIGCOMM Conference, pages 101–114, 2016.

[128] T. Lykouris and S. Vassilvitskii. Competitive caching with machine learned advice. In International Conference on Machine Learning ICML, pages 3302–3311, 2018.

[129] G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Record, 27(2):426–435, 1998.

[130] S. Marko and D. Ron. Distance approximation in bounded-degree and general sparse graphs. In Approximation, Randomization, and Combinatorial Optimization. Algo- rithms and Techniques, pages 475–486. 2006.

[131] A. McGregor and H. T. Vu. Better streaming algorithms for the maximum coverage problem. In 20th International Conference on Database Theory (ICDT), pages 22:1– 22:18, 2017.

[132] X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proc. 45th Annu. ACM Sympos. Theory Comput. (STOC), pages 91–100, 2013.

[133] A. Metwally, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. In International Conference on Database Theory, pages 398–412, 2005.

[134] C. Metzler, A. Mousavi, and R. Baraniuk. Learned d-amp: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems (NeurIPS), pages 1772–1783, 2017.

[135] G. T. Minton and E. Price. Improved concentration bounds for count-sketch. In Proc. 25th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 669–686, 2014.

[136] J. Misra and D. Gries. Finding repeated elements. Science of computer programming, 2(2):143–152, 1982.

[137] M. Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems (NeurIPS), pages 464–473, 2018.

[138] D. Moshkovitz. The projection games conjecture and the NP-hardness of ln 푛- approximating set-cover. In Approximation, Randomization, and Combinatorial Opti- mization. Algorithms and Techniques, pages 276–287. 2012.

[139] R. Motwani and P. Raghavan. Randomized algorithms. Chapman & Hall/CRC, 2010.

[140] A. Mousavi, G. Dasarathy, and R. G. Baraniuk. Deepcodec: Adaptive sensing and recovery via deep convolutional neural networks. arXiv preprint arXiv:1707.03386, 2017.

[141] A. Mousavi, A. B. Patel, and R. G. Baraniuk. A deep learning approach to structured signal recovery. In Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on, pages 1336–1343, 2015.

[142] N. H. Mustafa, R. Raman, and S. Ray. Settling the apx-hardness status for geometric set cover. In Proc. 55th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 541–550, 2014.

[143] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117–236, 2005.

[144] J. Nelson and H. L. Nguyên. Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Proc. 54th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 117–126, 2013.

[145] H. N. Nguyen and K. Onak. Constant-time approximation algorithms via local im- provements. In Proc. 49th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 327–336, 2008.

[146] N. Nisan. The communication complexity of approximate set packing and covering. In Proc. 29th Int. Colloq. Automata Lang. Prog. (ICALP), pages 868–875, 2002.

[147] A. Norouzi-Fard, J. Tarnawski, S. Mitrovic, A. Zandieh, A. Mousavifar, and O. Svens- son. Beyond 1/2-approximation for submodular maximization on massive data streams. In International Conference on Machine Learning (ICML), pages 3826–3835, 2018.

[148] K. Onak, D. Ron, M. Rosen, and R. Rubinfeld. A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size. In Proc. 23rd ACM-SIAM Sympos. Discrete Algs. (SODA), pages 1123–1131, 2012.

[149] M. Parnas and D. Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1):183–196, 2007.

[150] S. A. Plotkin, D. B. Shmoys, and É. Tardos. Fast approximation algorithms for frac- tional packing and covering problems. Mathematics of Operations Research, 20(2):257– 301, 1995.

[151] M. Purohit, Z. Svitkina, and R. Kumar. Improving online algorithms via ml pre- dictions. In Advances in Neural Information Processing Systems (NeurIPS), pages 9661–9670, 2018.

[152] R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub- constant error-probability PCP characterization of NP. In Proc. 29th Annu. ACM Sympos. Theory Comput. (STOC), 1997.

[153] A. A. Razborov. On the distributional complexity of disjointness. Theo. Comp. Sci., 106(2):385–390, 1992.

[154] P. Roy, A. Khan, and G. Alonso. Augmented sketch: Faster and more accurate stream processing. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD), pages 1449–1463, 2016.

[155] B. Saha and L. Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In Proc. SIAM Int. Conf. Data Mining (SDM), pages 697–708, 2009.

[156] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approx- imate Reasoning, 50(7):969–978, 2009.

[157] T. Sarlos. Improved approximation algorithms for large matrices via random projec- tions. In Proc. 47th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 143–152, 2006.

[158] S. Schechter, C. Herley, and M. Mitzenmacher. Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks. In Proceedings of the 5th USENIX conference on Hot topics in security, pages 1–8, 2010.

[159] J. P. Schmidt, A. Siegel, and A. Srinivasan. Chernoff–hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223–250, 1995.

[160] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[161] V. Sivaraman, S. Narayana, O. Rottenstreich, S. Muthukrishnan, and J. Rexford. Heavy-hitter detection entirely in the data plane. In Proceedings of the Symposium on SDN Research, pages 164–176, 2017.

[162] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems (NeurIPS), pages 3104–3112, 2014.

[163] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.

[164] P. Talukdar and W. Cohen. Scaling graph-based semi supervised learning to large number of labels using count-min sketch. In Artificial Intelligence and Statistics, pages 940–947, 2014.

[165] M. Thorup and Y. Zhang. Tabulation-based 5-independent hashing with applica- tions to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293–331, 2012.

[166] S. Vadhan. Pseudorandomness. Foundations and Trends in Theoretical Computer Science, 7(1–3):1–336, 2012.

[167] D. Wang, S. Rao, and M. W. Mahoney. Unified acceleration method for packing and covering problems via diameter reduction. In Proc. 43rd Int. Colloq. Automata Lang. Prog. (ICALP), pages 50:1–50:13, 2016.

[168] J. Wang, W. Liu, S. Kumar, and S.-F. Chang. Learning to hash for indexing big data - a survey. Proceedings of the IEEE, 104(1):34–57, 2016.

[169] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Advances in neural infor- mation processing systems (NeurIPS), pages 1753–1760, 2009.

[170] D. P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, 2014.

[171] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Applied and Computational Harmonic Analysis, 25(3):335–366, 2008.

[172] Y. Yoshida, M. Yamamoto, and H. Ito. Improved constant-time approximation algo- rithms for maximum matchings and other optimization problems. SIAM Journal on Computing, 41(4):1074–1093, 2012.

[173] N. E. Young. Randomized rounding without solving the linear program. In Proc. 6th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 170–178, 1995.

[174] N. E. Young. Nearly linear-work algorithms for mixed packing/covering and facility- location linear programs. arXiv preprint arXiv:1407.3015, 2014.

[175] M. Yu, L. Jose, and R. Miao. Software defined traffic measurement with opensketch. In NSDI, volume 13, pages 29–42, 2013.
