Lower Bound Techniques for Data Structures Mihai

Lower Bound Techniques for Data Structures by Mihai Pˇatra¸scu Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2008 c Massachusetts Institute of Technology 2008. All rights reserved. Author.................................................................... Department of Electrical Engineering and Computer Science August 8, 2008 Certified by . Erik D. Demaine Associate Professor Thesis Supervisor Accepted by . Professor Terry P. Orlando Chair, Department Committee on Graduate Students 2 Lower Bound Techniques for Data Structures by Mihai Pˇatra¸scu Submitted to the Department of Electrical Engineering and Computer Science on August 8, 2008, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences: • the first Ω(lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989; • for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show Ω(lg n= lg lg n) bounds when the space is O(n · polylog n). Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following: • the partial-sums problem (a fundamental application of augmented binary search trees); • the predecessor problem (which is equivalent to IP lookup in Internet routers); • dynamic trees and dynamic connectivity; • orthogonal range stabbing. • orthogonal range counting, and orthogonal range reporting; • the partial match problem (searching with wild-cards); • (1 + )-approximate near neighbor on the hypercube; • approximate nearest neighbor in the `1 metric. Our new techniques lead to surprisingly non-technical proofs. For several problems, we obtain simpler proofs for bounds that were already known. Thesis Supervisor: Erik D. Demaine Title: Associate Professor 3 4 Acknowledgments This thesis is based primarily on ten conference publications: [84] from SODA'04, [83] from STOC'04, [87] from ICALP'05, [89] from STOC'06, [88] and [16] from FOCS'06, [90] from SODA'07, [81] from STOC'07, [13] and [82] from FOCS'08. It seems appropriate to sketch the story of each one of these publications. It all began with two papers I wrote in my first two undergraduate years at MIT, which appeared in SODA'04 and STOC'04, and were later merged in a single journal version [86]. I owe a lot of thanks to Peter Bro Miltersen, whose survey of cell probe complexity [72] was my crash course into the field. Without this survey, which did an excellent job of taking clueless readers to the important problems, a confused freshman would never have heard about the field, nor the problems. And, quite likely, STOC would never have seen a paper proving information-theoretic lower bounds from an author who clearly did not know what \entropy" meant. Many thanks also go to Erik Demaine, who was my research advisor from that freshman year until the end of this PhD thesis. Though it took years before we did any work together, his unconditional support has indirectly made all my work possible. Erik's willingness to pay a freshman to do whatever he wanted was clearly not a mainstream idea, though in retrospect, it was an inspired one. Throughout the years, Erik's understanding and tolerance for my unorthodox style, including in the creation of this thesis, have provided the best possible research environment for me. My next step would have been far too great for me to take alone, but I was lucky enough to meet Mikkel Thorup, now a good friend and colleague. In early 2004 (if my memory serves me well, it was in January, during a meeting at SODA), we began thinking about the predecessor problem. It took us quite some time to understand that what had been labeled \optimal bounds" were not optimal, that proving an optimal bound would require a revolution in static lower bounds (the first bound to beat communication complexity), and to finally find an idea to break this barrier. This 2-year research journey consumed us, and I would certainly have given up along the way, had it not been for Mikkel's contagious and constant optimism, constant influx of ideas, and the many beers that we had together. This research journey remained one of the most memorable in my career, though, unfor- tunately, the ending was underwhelming. Our STOC 2006 [89] paper went largely unnoticed, with maybe 10 people attending the talk, and no special issue invitation. I consoled myself with Mikkel's explanation that ours had been paper with too many new ideas to be digested soon after publication. Mikkel's theory got some support through our next joint paper. I proposed that we look at some lower bounds via the so-called richness method for hard problems like partial match or nearest neighbor. After 2 years of predecessor lower bounds, it was a simple exercise to obtain better lower bound by richness; in fact, we felt like we were simplifying our technique for beating communication complexity, in order to make it work here. Despite my opinion that this paper was too simple to be interesting, Mikkel convinced me to submit it to FOCS. Sure enough, the paper was accepted to FOCS'06 [88] with raving reviews and a special issue invitation. One is reminded to listen to senior people once in while. 5 After my initial interaction with Mikkel, I had begun to understand the field, and I was able to make progress on several fronts at the same time. Since my first paper in 2003, I kept working on a very silly dynamic problem: prove a lower bound for partial sums in the bit-probe model. It seemed doubtful that anyone would care, but the problem was so clean and simple, that I couldn't let go. One day, as I was sitting in an MIT library (I was still an undergraduate and didn't have an office), I discovered something entirely unexpected. You see, before my first paper proving an Ω(lg n) dynamic lower bound, there had been just one technique for proving dynamic bounds, dating back to Fredman and Saks in 1989. Everybody had tried, and failed, to prove a logarithmic bound by this technique. And there I was, seeing a clear and simple proof of an Ω(lg n) bound by this classic technique. This was a great surprise to me, and it had very interesting consequences, including a new bound for my silly little problem, as well as a new record lower bound in the bit-probe model. With the gods clearly on my side (Miltersen was on the PC), this paper [87] got the Best Student Paper award at ICALP. My work with Mikkel continued with a randomized lower bound for predecessor search (our first bound only applied to deterministic algorithms). We had the moral obligation to “finish” the problem, as Mikkel put it. This took quite some work, but in the end, we succeeded, and the paper [90] appeared in SODA'07. At that time, I also wrote my first paper with Alex Andoni, an old and very good friend, with whom I would later share an apartment and an office. Surprisingly, it took us until 2006 to get our first paper, and we have only written one other paper since then, despite our very frequent research discussions over beer, or while walking home. These two papers are a severe underestimate of the entertaining research discussions that we have had. I owe Alex a great deal of thanks for the amount of understanding of high dimensional geometry that he has passed on to me, and, above all, for being a highly supportive friend. Our first paper was, to some extent, a lucky accident. After a visit to my uncle in Philadelphia, I was stuck on a long bus ride back to MIT, when Alex called to say that he had some intuition about why (1 + ")-approximate nearest neighbor should be hard. As always, intuition is hard to convey, but I understood at least that he wanted to think in very high dimensions, and let the points of the database be at constant pairwise distance. Luckily, on that bus ride I was thinking of lower bounds for lopsided set disjointness (a problem left open by the seminal paper of Miltersen et al. [73] from STOC'95). It didn't take long after passing to realize the connection, and I was back on the phone with Alex explaining how his construction can be turned in a reduction from lopsided set disjointness to nearest neighbor. Back at MIT, the proof obviously got Piotr Indyk very excited. We later merged with another result of his, yielding a FOCS'06 paper [16]. Like in the case of Alex, the influence that Piotr has had on my thinking is not conveyed by our number of joint papers (we only have one). Nonetheless, my interaction with him has been both very enjoyable and very useful, and I am very grateful for his help. My second paper with Alex Andoni was [13] from FOCS'08. This began in spring 2007 as a series of discussions between me and Piotr Indyk about approximate near neighbor in `1, discussions which got me very excited about the problem, but didn't really lead to any 6 solution. By the summer of 2007, I had made up my mind that we had to seek an asymmetric communication lower bound. That summer, both I and Alex were interns at IBM Almaden, and I convinced him to join on long walks on the beautiful hills at Almaden, and discuss this problem.

Lower Bound Techniques for Data Structures Mihai

Persistent Predecessor Search and Orthogonal Point Location on the Word

Lecture 1 — 24 January 2017 1 Overview 2 Administrivia 3 The

On Data Structures and Memory Models

Predecessor Search

Advanced Data Structures 1 Introduction 2 Setting the Stage 3

Dynamic Integer Sets with Optimal Rank, Select, and Predecessor

MIT 6.851 Advanced Data Structures Prof

Advanced Rank/Select Data Structures: Succinctness, Bounds, and Applications

Computational Geometry Through the Information Lens Mihai Pdtracu

26 January, 2017 1 Overview 2 Y-Fast Tries

Minimal Indices for Predecessor Search ∗ Sarel Cohen , Amos Fiat, Moshik Hershcovitch, Haim Kaplan

Predecessor Data Structures