AARHUS UNIVERSITY

FINAL REPORT Center for Massive Data Algorithmics 2007 - 201

MADALGO - a Center of the Danish National Research Foundation / Department of Computer Science / Aarhus University

When initiated in 2007, the ambitious goal of the Danish National Research Foundation Center for Massive Data Algorithmics (MADALGO) was to become a world-leading center in algorithms for handling massive data. The high-level objectives of the center were:
- To significantly advance fundamental algorithmic knowledge in the area of efficient processing of massive datasets;
- To educate the next generation of researchers in a world-leading and international environment;
- To be a catalyst for multidisciplinary collaborative research on massive dataset issues in commercial and scientific applications.

Building on the research strength at the main center site at Aarhus University in Denmark and at the sites of the other participants (the Max Planck Institute for Informatics and Frankfurt University in Germany, and the Massachusetts Institute of Technology in the US), as well as on significant international research collaboration, multidisciplinary and industry collaboration, and in general on a vibrant international environment at the main center site, I believe we have met this goal.

Following the guidelines of the Foundation, this report contains a short reflection on the center’s results, impact and international standing (section A), along with appendices with statistics on patents and spin-off companies (section B) and the statistics collected by the Foundation (section C). It also contains 10 publications reflecting the breadth and depth of center research.

Lars Arge
Center Director
May 2016

Final report

A. Center results, impact and international standing

Center Results. A key motivation for the establishment of the Center for Massive Data Algorithmics was the inadequacy of traditional algorithms theory in many applications involving massive amounts of data. In particular, traditional theory uses simplistic machine models (the mathematical models in which algorithms are designed and analyzed) that do not take the hierarchical memory organization of modern machines into account. This translates into software inadequacies when processing massive data. Based on an ambitious research plan, center researchers have obtained a large number of results centered around four key areas, all relating to the design of algorithms in more realistic models of computation than those of traditional algorithms theory. Below we briefly highlight a few results (corresponding to the 10 selected papers).

In the area of I/O-efficient algorithms, the goal is to design algorithms in a model of computation that takes the large access time of disk (compared to main memory) into account. Center results on fundamental problems in the area include sorting algorithms that are provably optimal in terms of both I/O and internal-memory computation time (selected paper 1, which won the best paper award at the 2013 ISAAC conference). Center researchers have also obtained results on the processing of massive terrain data, including several results on modeling water flow on terrains, such as algorithms for computing how depressions in a terrain fill with water as it rains (paper 2, which appeared in the top computational geometry conference SoCG). Overall, we believe center work in the area has been very successful.

In the area of cache-oblivious algorithms, the aim is to develop algorithms that automatically adapt to the unknown multiple levels of modern memory hierarchies.
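As an illustration of the I/O-efficient paradigm discussed above, the classic two-phase external merge sort pattern (sort memory-sized runs, then merge many runs at once) underlies I/O-efficient sorting. The following is a minimal in-memory sketch of that textbook pattern under an illustrative memory-size parameter M; it is not the I/O- and CPU-optimal algorithm of paper 1:

```python
import heapq

def external_merge_sort(data, M=8):
    """Two-phase external merge sort sketch.
    Phase 1: sort runs of at most M items (one 'memory load' each).
    Phase 2: merge all runs with a heap-based k-way merge; on disk,
    merging M/B runs at a time per pass yields the classic
    O((N/B) log_{M/B}(N/B)) I/O bound for block size B."""
    runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
    # Heap of (head value, run index, position) tuples; the run index
    # breaks ties so tuples stay comparable for equal values.
    heap = [(run[0], i, 0) for i, run in enumerate(runs)]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out
```

On real disks the runs would be written out and merged in block-sized reads; the in-memory lists here only stand in for those files.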
Developing cache-oblivious algorithms is very challenging; at the start of the center, techniques for obtaining cache-oblivious algorithms were not very developed and the fundamental limitations in the area were not well understood. Center researchers have obtained a good number of results in the area, and e.g. developed new fundamental techniques used to obtain an optimal cache-oblivious data structure that allows for fast search for a data element among a set of elements (paper 3, which appeared in the top algorithms conference SODA). However, overall we have not made as much progress on cache-oblivious problems as we would have liked.

In the area of streaming algorithms, the aim is to develop algorithms that perform only one sequential pass over the input data, processing each input element as fast as possible while using significantly less space than the input data size. Center researchers have obtained a large number of results in the area. For example, we have developed the first optimal algorithm for the well-studied and fundamental problem of estimating the number of distinct elements in a stream (paper 4, which won the best paper award at the top theoretical database conference PODS in 2010). We have also obtained a large number of results in relation to the so-called linear sketching technique, and recently we showed the very first known lower bounds on how fast data stream elements can be processed (paper 5, which appeared in the top theoretical computer science conference STOC). Overall, we believe center work in the area has been very successful.

The area of algorithm engineering covers the design and analysis of practical algorithms, their efficient implementation, as well as experimentation that provides insight into their applicability and further improvement. Center researchers have in particular engineered many I/O-efficient algorithms.
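As background for the distinct-elements problem mentioned above, the sketch below implements the classic k-minimum-values (KMV) estimator, a simple folklore method using O(k) space independent of the stream length. It is not the optimal algorithm of paper 4, and the parameter k (which trades space for accuracy, with relative error roughly 1/sqrt(k)) is an illustrative choice:

```python
import hashlib
import heapq

def _hash01(x):
    """Hash an element to a pseudo-random point in [0, 1)."""
    h = hashlib.sha256(repr(x).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2.0**64

def kmv_distinct(stream, k=64):
    """k-minimum-values sketch: keep only the k smallest hash values
    seen.  If the k-th smallest is v, roughly k - 1 of the distinct
    hashes landed below v, so the number of distinct elements is
    estimated as (k - 1) / v."""
    heap = []          # max-heap via negation: the k smallest hash values
    members = set()    # hash values currently kept, used to skip duplicates
    for x in stream:
        v = _hash01(x)
        if v in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            members.add(v)
        elif v < -heap[0]:
            # Evict the current largest of the k kept values.
            members.discard(-heapq.heappushpop(heap, -v))
            members.add(v)
    if len(heap) < k:  # fewer than k distinct elements seen: exact count
        return len(heap)
    return (k - 1) / (-heap[0])
```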
For example, the center has had great success with engineering algorithms for flood risk analysis using detailed (and thus massive) terrain data. This work includes the implementation of the results on how depressions fill during a rain event mentioned above (paper 2) and recent work on modelling and computing flood risk from water rising in rivers (paper 6, which appeared in the good geographic information systems conference GIScience). It also formed the basis of the center startup company SCALGO, which has very successfully marketed flood risk analysis products based on the openly available detailed terrain model of Denmark. For example, the depression risk analysis results (paper 2) formed the basis of analyses purchased by more than half of the local Danish governments. Center algorithm engineering work has also formed the basis for collaboration with several other companies, as well as for extensive collaboration with researchers in other fields, in particular biology (bioinformatics and biodiversity) researchers. The river flood risk research (paper 6) is e.g. performed with a number of biology researchers, and the collaboration has also led to a Science paper (paper 7). Overall, we believe center algorithm engineering work has been very successful.

In addition to the above four core areas, center researchers have obtained a large number of results on research questions relating to e.g. models of parallel computation, flash memory, fault tolerance, and memory-efficient (succinct) data structures, as well as to fundamental problems in more traditional models of computation, in particular data structure and lower bound problems. As a few examples, center researchers have solved longstanding open problems in succinct (paper 8, which won the best student paper award at the top theoretical computer science conference FOCS in 2008) and priority queue (paper 9, which appeared in the top theoretical computer science conference STOC and is co-authored by Turing Award winner

Tarjan) data structures. Much of the center’s data structure work (in various models of computation) has been on fundamental range searching problems (storing a set of d-dimensional points such that the points in a query range can be identified efficiently), and we have e.g. shown very strong lower bound tradeoffs between search and update time for several variants of the problem (paper 10, which won both the best paper and best student paper awards at the top theoretical computer science conference STOC in 2012).
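To make the search/update tension concrete, the toy structure below answers one-dimensional range counting queries with two binary searches in O(log n) time, but pays O(n) per insertion to keep its array sorted. The d-dimensional structures and lower bounds studied at the center are far more sophisticated; this is only an illustrative baseline, not a center result:

```python
import bisect

class SortedRangeCount:
    """1-d range counting over a sorted array: count(lo, hi) answers
    'how many points lie in [lo, hi]' with two binary searches,
    while insert() pays O(n) to keep the array sorted -- a toy
    instance of the query/update tradeoff that dynamic range
    searching lower bounds make precise."""

    def __init__(self, points):
        self.pts = sorted(points)

    def count(self, lo, hi):
        # Points in [lo, hi] = (# points <= hi) - (# points < lo).
        return bisect.bisect_right(self.pts, hi) - bisect.bisect_left(self.pts, lo)

    def insert(self, x):
        bisect.insort(self.pts, x)  # O(n) shift: updates are the bottleneck
```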

A key goal of the center has been to educate the next generation of researchers in a world-leading and international environment. Altogether, 28 PhD students have graduated from the center (19 from AU), and the center currently houses 14 PhD students (7 at AU). Among the 26 students at AU, 12 are international. Additionally, the center has hosted 24 Post Docs (22 at AU). The center has emphasized the creation of a vibrant and international environment at the main center site at AU. The center has hosted numerous research visitors, and we have organized annual summer schools at the center. The summer school series has been very successful, with around 70 participants from 30+ institutions attending every year. The center also hosted the top computational geometry conference (SoCG) in 2009, and is hosting the top European algorithms event ALGO (including the top European algorithms conference ESA) in 2016. The center also initiated the Workshop on Massive Data Algorithmics (MASSIVE) in 2009, and this successful workshop has been held every year since in connection with SoCG or ALGO.

Center international standing and impact. Overall, we believe the center has been very successful. From a purely bibliometric point of view, this can e.g. be seen from the fact that in the last four years, center researchers have published 90+ peer-reviewed papers a year, with consistently 10+ papers in the top three general theoretical computer science conferences STOC, FOCS and SODA. According to Google Scholar, center papers have received 12,000+ citations (steadily increasing over the years and reaching 3,000 in 2015). This is an exceptionally good record in theoretical computer science, and the center’s strength is confirmed by the many center best paper awards, by the around 30 annual invited presentations at research conferences, workshops and seminars given by center researchers, as well as by the numerous awards and recognitions received by center members. On a more subjective level, we believe that the center is among the top algorithms research centers in Europe (possibly only rivaled by the Max Planck Institute for Informatics). We believe the center is the top institution worldwide in I/O-efficient algorithms and possibly also in cache-oblivious algorithms, among the top in streaming algorithms and algorithm engineering, as well as among the top in data structures (including lower bounds) in general. This was confirmed by the international evaluation of the center in 2011, where the panel concluded that “MADALGO is a truly international research center with high visibility and respect in the worldwide computer-science community. There is no doubt that it is the world-leading center in the area of massive dataset algorithmics”.

In terms of broader impact, we believe that the center has contributed significantly to the transformation of the algorithms field, from a very theoretical field where results are proved in elegant but somewhat unrealistic models of computation, to a field where the practical applicability of theoretical results is a key focus. This general transformation has occurred worldwide during the last decade, and the center’s focus on new and more practically realistic models of computation has certainly contributed to it. Similarly, the center’s focus on understanding practical performance through algorithm engineering has contributed to the field of algorithm engineering gaining worldwide recognition. Supported by algorithm engineering work, the center has certainly also had an impact in terms of interdisciplinary and industry collaboration, including by contributing to the transformation of parts of the biological sciences to being more data driven, and by introducing innovative research-based products in the market through SCALGO (which e.g. is also involved in producing sea-charts for Greenland and contour maps for Denmark in collaboration with Danish authorities). On several occasions, the center’s collaborative activities and success in transforming research into innovative products have been highlighted as examples of the broader impact of basic research, for example when the Minister for Higher Education and Science visited the center in 2013. In general, the center has prioritized outreach events and maintaining good public visibility. The center has also had an impact in terms of the education of PhD students and Post Docs.
The many center PhDs and Post Docs are now excellent ambassadors in research institutions and industry worldwide, and in general we believe that the international visibility and recognition of the center, supported also by the many international events the center has organized and participated in, will have a lasting impact on Danish algorithms research. Concretely, the center has locally at AU contributed to the development of the streaming algorithms area (which was not really present in Denmark at the start of the center), and is now contributing to the build-up of related areas such as Machine Learning and “Big Data”. For example, recent Big Data faculty candidates have specifically mentioned center strength and visibility as a reason for applying to AU.

B. Appendices

Appendix a: Patents and spin-off companies

Number of inventions reported to institution | Number of submitted patent applications | Number of granted patents | Number of mutually agreed licence, sale and option agreements | Names of spin-off companies established

Scalable Algorithmics (SCALGO)

Appendix b: Center publications

Number of publications (May 2016)

                        Peer-reviewed (OA)         Not peer-reviewed (OA)
Journal articles        204 (137) + 11 accepted    0
Conference proceedings  413 (226)                  30 (20)
Monographs              0 (0)                      2 (0)
Book chapters           2 (0)                      0 (0)
Others                  0 (0)                      96 (24)

The 10 most prestigious conferences within the Center's research area
1. ACM Symposium on Theory of Computing (STOC)
2. IEEE Symposium on Foundations of Computer Science (FOCS)
3. ACM-SIAM Symposium on Discrete Algorithms (SODA)
4. Symposium on Computational Geometry (SoCG)
5. International Colloquium on Automata, Languages, and Programming (ICALP)
6. European Symposium on Algorithms (ESA)
7. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)
8. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX) / International Workshop on Randomization and Computation (RANDOM)
9. Scandinavian Workshop on Algorithm Theory (SWAT) / Algorithms and Data Structures Symposium (WADS), previously Workshop on Algorithms and Data Structures
10. Workshop on Algorithm Engineering and Experiments (ALENEX)

The 10 most prestigious journals in the Center's research area
1. Journal of the ACM
2. SIAM Journal on Computing
3. ACM Transactions on Algorithms
4. Discrete & Computational Geometry
5. Algorithmica
6. Journal of Computer and System Sciences
7. Computational Geometry: Theory and Applications
8. ACM Journal of Experimental Algorithmics
9. Theoretical Computer Science
10. Journal of Discrete Algorithms

Bibliometric information

Distribution of center publications on the 10 most prestigious conferences:

               2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
STOC             1    0    0    0    3    4    1    1    4    ?
FOCS             1    7    3    2    3    1    0    3    4    ?
SODA             0    6    5    6    6   11   11    7    5    5
SoCG             0    5    3    3    2    3    2    2    3    5
ICALP            1    0    7    1    3    2    5    4    0    4
ESA              3    1    0    3    1    5    3    2    2    ?
SPAA             3    1    0    3    1    1    0    2    0    ?
APPROX/RANDOM    0    1    0    1    3    0    0    0    0    ?
SWAT/WADS        1    3    6    1    3    1    2    2    5    ?
ALENEX           0    0    1    0    1    0    1    2    1    1

STOC, FOCS and SODA can be rated as "best non-specialized" conferences; SoCG and ALENEX can be rated as "best specialized" conferences.

Center publications have been authored by 924 unique authors - 130 associated with the center and 794 not. Only 203 center publications are by center researchers only. Citations to center publications (according to Google Scholar, which is the most reliable - but certainly not perfect - source of citation information in the area) can be found at http://scholar.google.com/citations?user=fRowhXcAAAAJ

Conference proceedings

C1 2007 (PR)(CO) B. Escoffier, G. Moruz and A. Ribichini. Adapting Parallel Algorithms to the W-Stream Model, with Applications to Graph Problems. Proc. International Symposium on Mathematical Foundations of Computer Science (MFCS)

C2 2007 (PR)(CO) S. Guha, P. Indyk and A. McGregor. Sketching Information Divergences. Proc. Annual Conference on Learning Theory (COLT)

C3 2007 (PR)(CO) G. S. Brodal and A. G. Jørgensen. A Linear Time Algorithm for the k Maximal Sums Problem. Proc. International Symposium on Mathematical Foundations of Computer Science (MFCS)

C4 2007 (PR)(CO) G. S. Brodal, L. Georgiadis, K. A. Hansen and I. Katriel. Dynamic Matchings in Convex Bipartite Graphs. Proc. International Symposium on Mathematical Foundations of Computer Science (MFCS)

C5 2007 (PR) G. Jørgensen, G. Moruz and T. Mølhave. Resilient Priority Queues. Proc. International Workshop on Algorithms and Data Structures (WADS)

C6 2007 (PR)(CO) G. S. Brodal, R. Fagerberg, I. Finocchi, F. Grandoni, G. Italiano, A. G. Jørgensen, G. Moruz and T. Mølhave. Optimal Resilient Dynamic Dictionaries. Proc. European Symposium on Algorithms (ESA)

C7 2007 (PR)(CO) P. K. Agarwal, L. Arge, A. Danner, H. Mitasova, T. Mølhave and K. Yi. TerraStream: From Elevation Data to Watershed Hierarchies. Proc. ACM International Symposium on Advances in Geographical Information Systems (ACM-GIS)

C8 2007 (PR)(CO) M. Patrascu and M. Thorup. Planning for Fast Connectivity Updates. Proc. IEEE Symposium on Foundations of Computer Science (FOCS)

C9 2007 (PR)(CO) G. Franceschini, S. Muthukrishnan and M. Patrascu. Radix Sorting With No Extra Space. Proc. European Symposium on Algorithms (ESA)

C10 2007 (PR)(CO) E. D. Demaine, S. Mozes, B. Rossman and O. Weimann. An Optimal Decomposition Algorithm for Tree Edit Distance. Proc. International Colloquium on Automata, Languages and Programming (ICALP)

C11 2007 (PR)(CO) M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. Fogel, B. C. Kuszmaul and J. Nelson. Cache-Oblivious Streaming B-trees. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

C12 2007 (PR)(CO) E. D. Demaine, M. Ghodsi, M. Hajiaghayi, A. S. Sayedi-Roshkhar and M. Zadimoghaddam. Scheduling to Minimize Gaps and Power Consumption. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

C13 2007 (PR) M. Patrascu. Lower Bounds for 2-Dimensional Range Counting. Proc. ACM Symposium on Theory of Computing (STOC)

C14 2007 (PR)(CO) G. M. Landau, D. Tsur and O. Weimann. Indexing a Dictionary for Subset Matching Queries. Proc. Symposium on String Processing and Information Retrieval (SPIRE)

C15 2007 (PR) T. Friedrich and D. Ajwani. Average-Case Analysis of Online Topological Ordering. Proc. International Symposium on Algorithms and Computation (ISAAC)

C16 2007 (PR) K. Chang. Multiple Pass Streaming Algorithms for Learning Mixtures of Distributions in R^d. Proc. Algorithmic Learning Theory (ALT)

C17 2007 (PR) M. Westergaard, L. M. Kristensen, G. S. Brodal and L. Arge. The ComBack Method - Extending Hash Compaction with Backtracking. Proc. International Conference on Applications and Theory of Petri Nets and Other Models of Concurrency (ICATPN)

C18 2007 (PR)(CO) M. A. Bender, G. S. Brodal, R. Fagerberg, R. Jacob and E. Vicari. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

C19 2007 (PR)(CO) A. Golynski, R. Grossi, A. Gupta, R. Raman and S. S. Rao. On the Size of Succinct Indices. Proc. European Symposium on Algorithms (ESA)

C20 2007 (PR) M. Olsen. Nash Stability in Additively Separable Hedonic Games is NP-hard. Proc. Conference on Computability in Europe (CiE)

C21 2008 (PR)(CO) M. Ruzic and P. Indyk. Near-Optimal Sparse Recovery in the L1 Norm. Proc. Symposium on Foundations of Computer Science (FOCS)

C22 2008 (PR) M. Patrascu. (Data) STRUCTURES. Proc. Symposium on Foundations of Computer Science (FOCS)

C23 2008 (PR) M. Patrascu. Succincter. Proc. Symposium on Foundations of Computer Science (FOCS)

C24 2008 (PR)(CO) E. Demaine, S. Langerman and E. Price. Confluently Persistent Tries for Efficient Version Control. Proc. Scandinavian Workshop on Algorithm Theory (SWAT)

C25 2008 (PR)(CO) D. Ajwani, I. Malinger, U. Meyer and S. Toledo. Characterizing the Performance of Flash Memory Storage Devices and Its Impact on Algorithm Design. Proc. Workshop on Experimental Algorithms (WEA)

C26 2008 (PR) U. Meyer. On Dynamic Breadth-First Search in External-Memory. Proc. Symposium on Theoretical Aspects of Computer Science (STACS)

C27 2008 (PR) U. Meyer. On Trade-Offs in External-Memory Diameter Approximation. Proc. Scandinavian Workshop on Algorithm Theory (SWAT)

C28 2008 (PR) G. S. Brodal and A. G. Jørgensen. Selecting Sums in Arrays. Proc. International Symposium on Algorithms and Computation (ISAAC)

C29 2008 (PR) L. Arge, G. S. Brodal and S. S. Rao. External Memory Planar Point Location with Logarithmic Updates. Proc. Symposium on Computational Geometry (SoCG)

C30 2008 (PR)(CO) A. Golynski, R. Raman and S. S. Rao. On the Redundancy of Succinct Data Structures. Proc. Scandinavian Workshop on Algorithm Theory (SWAT)

C31 2008 (PR) M. Olsen. The Computational Complexity of Link Building. Proc. International Conference on Computing and Combinatorics (COCOON)

C32 2008 (PR)(CO) M. A. Abam, M. de Berg and J. Gudmundsson. A Simple and Efficient Kinetic Spanner. Proc. Symposium on Computational Geometry (SoCG)

C33 2008 (PR)(CO) L. Arge, M. T. Goodrich, M. Nelson and N. Sitchinava. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA)

C34 2008 (PR)(CO) L. Arge, T. Moelhave and N. Zeh. Cache-Oblivious Red-Blue Line Segment Intersection. Proc. European Symposium on Algorithms (ESA)

C35 2008 (PR)(CO) P. K. Agarwal, L. Arge, T. Moelhave and B. Sadri. I/O-efficient Algorithms for Computing Contour Lines on a Terrain. Proc. Symposium on Computational Geometry (SoCG)

C36 2008 (PR)(CO) J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein and Z. Svitkina. On Distributing Symmetric Streaming Computations. Proc. Symposium on Discrete Algorithms (SODA)

C37 2008 (PR) P. Indyk. Explicit Constructions for Compressed Sensing of Sparse Signals. Proc. Symposium on Discrete Algorithms (SODA)

C38 2008 (PR)(CO) A. Andoni, P. Indyk and R. Krauthgamer. Earth Mover Distance over High-Dimensional Spaces. Proc. Symposium on Discrete Algorithms (SODA)

C39 2008 (PR)(CO) P. Indyk and A. McGregor. Declaring Independence via the Sketching of Sketches. Proc. Symposium on Discrete Algorithms (SODA)

C40 2008 (PR)(CO) K. Onak and A. Sidiropoulos. Circular Partitions with Applications to Visualization and Embeddings. Proc. Symposium on Computational Geometry (SoCG)

C41 2008 (PR)(CO) J. Matousek and A. Sidiropoulos. Inapproximability for Metric Embeddings into R^d. Proc. Symposium on Foundations of Computer Science (FOCS)

C42 2008 (PR)(CO) N. J. A. Harvey, J. Nelson and K. Onak. Sketching and Streaming Entropy via Approximation Theory. Proc. Symposium on Foundations of Computer Science (FOCS)

C43 2008 (PR)(CO) A. Andoni, D. Croitoru and M. Patrascu. Hardness of Nearest Neighbor under L-infinity. Proc. Symposium on Foundations of Computer Science (FOCS)

C44 2008 (PR)(CO) T. Chan, M. Patrascu and L. Roditty. Dynamic Connectivity: Connecting to Networks and Geometry. Proc. Symposium on Foundations of Computer Science (FOCS)

C45 2008 (PR)(CO) S. Mozes, K. Onak and O. Weimann. Finding an Optimal Tree Searching Strategy in Linear Time. Proc. Symposium on Discrete Algorithms (SODA)

C46 2008 (PR)(CO) A. Chakrabarti, T. S. Jayram and M. Patrascu. Tight Lower Bounds for Selection in Randomly Ordered Streams. Proc. Symposium on Discrete Algorithms (SODA)

C47 2008 (PR)(CO) E. Demaine, T. Ito, N. J. A. Harvey, C. H. Papadimitriou, M. Sideri, R. Uehara and Y. Uno. On the Complexity of Reconfiguration Problems. Proc. International Symposium on Algorithms and Computation (ISAAC)

C48 2008 (PR)(CO) E. Demaine, G. Aloupis, S. Collette, S. Langerman, V. Sacristan and S. Wuhrer. Reconfiguration of Cube-Style Modular Robots Using O(log n) Parallel Moves. Proc. International Symposium on Algorithms and Computation (ISAAC)

C49 2008 (PR)(CO) E. Demaine, M. Buadoiu, M. Hajiaghayi, A. Sidiropoulos and M. Zadimoghaddam. Ordinal Embedding: Approximation Algorithms and Dimensionality Reduction. Proc. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX)

C50 2008 (PR)(CO) E. Demaine, T. G. Abbott, Z. Abel, D. Charlton, M. L. Demaine and S. D. Kominers. Hinged Dissections Exist. Proc. Symposium on Computational Geometry (SoCG)

C51 2008 (PR)(CO) E. R. Hansen, S. S. Rao and P. Tiedemann. Compressing Binary Decision Diagrams. European Conference on Artificial Intelligence (ECAI)

C52 2008 (CO) R. Berinde, P. Indyk and M. Ruzic. Practical Near-Optimal Sparse Recovery in the L1 Norm (invited paper). Proc. Allerton Conference

C53 2008 (CO) R. Berinde, A. Gilbert, P. Indyk, H. Karloff and M. Strauss. Combining Geometry and Combinatorics: A Unified Approach to Sparse Signal Recovery (invited paper). Proc. Allerton Conference

C54 2008 (CO) M. A. Abam, M. de Berg and S-H. Poon. Fault-Tolerant Conflict-Free Coloring. Proc. Canadian Conference on Computational Geometry

C55 2009 (PR)(CO) R. Berinde, G. Cormode, P. Indyk and M. Strauss. Space-optimal Heavy Hitters with Strong Error Bounds. Proc. Symposium on Principles of Database Systems (PODS)

C56 2009 (PR)(CO) V. Cevher, C. Hegde, P. Indyk and R. G. Baraniuk. Recovery of Clustered Sparse Signals from Compressive Measurements. Proc. International Conference on Sampling Theory and Applications (SAMPTA)

C57 2009 (PR)(CO) E. Demaine, G. Landau and O. Weimann. On Cartesian Trees and Range Minimum Queries. Proc. International Colloquium on Automata, Languages and Programming (ICALP)

C58 2009 (PR)(CO) D. Hermelin, G. M. Landau, S. Landau and O. Weimann. A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS)

C59 2009 (PR) A. Kovacs, U. Meyer, G. Moruz and A. Negoescu. Online Paging for Flash Memory Devices. Proc. International Symposium on Algorithms and Computation (ISAAC)

C60 2009 (PR) G. Brodal, A. Jørgensen, G. Moruz and T. Mølhave. Counting in the Presence of Memory Faults. Proc. International Symposium on Algorithms and Computation (ISAAC)

C61 2009 (PR)(CO) D. Ajwani, A. Beckmann, R. Jacob, U. Meyer and G. Moruz. On Computational Models for Flash Memory Devices. Proc. Symposium on Experimental Algorithms (SEA)

C62 2009 (PR) U. Meyer and V. Osipov. Design and Implementation of a Practical I/O-efficient Shortest Paths Algorithm. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX)

C63 2009 U. Meyer. Via Detours to I/O-Efficient Shortest Paths. Proc. Efficient Algorithms - Essays Dedicated to Kurt Mehlhorn on the Occasion of His 60th Birthday

C64 2009 (PR) D. Ajwani, R. Dementiev, U. Meyer and V. Osipov. Breadth First Search on Massive Graphs. Proc. Ninth DIMACS Implementation Challenge: The Shortest Path Problem

C65 2009 (PR) A. Beckmann, R. Dementiev and J. Singler. Building a Parallel Pipelined External Memory Algorithm Library. Proc. International Symposium on Parallel and Distributed Processing (IPDPS)

C66 2009 (PR) G. S. Brodal and A. Jørgensen. Data Structures for Range Median Queries. Proc. International Symposium on Algorithms and Computation (ISAAC)

C67 2009 (PR)(CO) G. S. Brodal, R. Fagerberg, M. Greve and A. López-Ortiz. Online Sorted Range Reporting. Proc. International Symposium on Algorithms and Computation (ISAAC)

C68 2009 (PR)(CO) G. S. Brodal, A. Kaporis, S. Sioutas, K. Tsakalidis and K. Tsichlas. Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time. Proc. International Symposium on Algorithms and Computation (ISAAC)

C69 2009 (PR) G. S. Brodal, A. Jørgensen and T. Mølhave. Fault Tolerant External Memory Algorithms. Proc. Algorithms and Data Structures Symposium (WADS)

C70 2009 (PR)(CO) A. Kaporis, A. N. Papadopoulos, S. Sioutas, K. Tsakalidis and K. Tsichlas. Efficient Processing of 3-Sided Range Queries with Probabilistic Guarantees. Proc. International Conference on Database Theory (ICDT)

C71 2009 (PR)(CO) M. Abam, M. de Berg, M. Farshi, J. Gudmundsson and M. Smid. Geometric Spanners for Weighted Point Sets. Proc. European Symposium on Algorithms (ESA)

C72 2009 (PR)(CO) M. Abam and M. de Berg. Kinetic Spanners in R^d. Proc. Symposium on Computational Geometry (SoCG)

C73 2009 (PR)(CO) M. Abam, P. Carmi, M. Farshi and M. Smid. On the Power of the Semi-Separated Pair Decomposition. Proc. Algorithms and Data Structures Symposium (WADS)

C74 2009 D. Ajwani. On P-complete Problems in Memory Hierarchy Models. Proc. Workshop on Massive Data Algorithmics (MASSIVE)

C75 2009 (PR)(CO) A. Farzan, R. Raman and S. Srinivasa Rao. Universal Succinct Representations of Trees? Proc. International Colloquium on Automata, Languages and Programming (ICALP)

C76 2009 (PR)(CO) R. Pagh and S. Srinivasa Rao. Secondary Indexing in One Dimension: Beyond B-trees and Bitmap Indexes. Proc. Symposium on Principles of Database Systems (PODS)

C77 2009 (PR)(CO) R. Grossi, A. Orlandi, R. Raman and S. Srinivasa Rao. More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS)

C78 2009 (PR) J. E. Moeslund, P. K. Bøcher, J.-C. Svenning, T. Mølhave and L. Arge. Impacts of 21st Century Sea-level Rise on a Danish Major City – An Assessment Based on Fine-resolution Digital Topography and a New Flooding Algorithm. IOP Conference Series: Earth and Environmental Science 8

C79 2009 (PR)(CO) M. de Berg and P. Hachenberger. Rotated-Box Trees: A Lightweight c-Oriented Bounding-Volume Hierarchy. Proc. International Symposium on Experimental Algorithms (SEA)

C80 2009 (PR) P. Afshani, L. Arge and K. Dalgaard Larsen. Orthogonal Range Reporting in Three and Higher Dimensions. Proc. Symposium on Foundations of Computer Science (FOCS)

C81 2009 (PR)(CO) P. Afshani, C. Hamilton and N. Zeh. A Unified Approach for Cache-Oblivious Range Reporting and Approximate Range Counting. Proc. Symposium on Computational Geometry (SoCG)

C82 2009 (PR)(CO) P. Afshani, C. Hamilton and N. Zeh. Cache-Oblivious Range Reporting With Optimal Queries Requires Superlinear Space. Proc. Symposium on Computational Geometry (SoCG)

C83 2009 (PR)(CO) P. Afshani, J. Barbay and T. Chan. Instance-optimal Geometric Algorithms. Proc. Symposium on Foundations of Computer Science (FOCS)

C84 2009 L. Arge, M. T. Goodrich and N. Sitchinava. Parallel External Memory Model. Proc. Workshop on Theory and Many-Cores

C85 2009 (PR) L. Arge and M. Revsbæk. I/O-Efficient Contour Tree Simplification. Proc. International Symposium on Algorithms and Computation (ISAAC)

C86 2009 (PR)(CO) A. Andoni, P. Indyk, R. Krauthgamer and H. L. Nguyen. Approximate Line Nearest Neighbor in High Dimensions. Proc. Symposium on Discrete Algorithms (SODA)

C87 2009 (PR)(CO) A. Andoni, P. Indyk and R. Krauthgamer. Overcoming the L1 Non-embeddability Barrier: Algorithms for Product Metrics. Proc. Symposium on Discrete Algorithms (SODA)

C88 2009 (PR)(CO) R. Berinde and P. Indyk. Sequential Sparse Matching Pursuit. Proc. Allerton Conference

C89 2009 (PR)(CO) A. Andoni, K. Do Ba, P. Indyk and D. Woodruff. Efficient Sketches for Earth-Mover Distance, with Applications. Proc. Symposium on Foundations of Computer Science (FOCS)

C90 2009 (PR)(CO) A. Andoni, P. Indyk, K. Onak and R. Rubinfeld. External Sampling. Proc. International Colloquium on Automata, Languages and Programming (ICALP)

C91 2009 (PR)(CO) E. Demaine, M. Demaine, G. Konjevod and R. Lang. Folding a Better Checkerboard. Proc. International Symposium on Algorithms and Computation (ISAAC)

C92 2009 (PR)(CO) J. Cardinal, E. Demaine, M. Demaine, S. Imahori, S. Langerman and R. Uehara. Algorithmic Folding Complexity. Proc. International Symposium on Algorithms and Computation (ISAAC)

C93 2009 (PR)(CO) E. Demaine, M. Hajiaghayi and D. Marx. Minimizing Movement: Fixed-Parameter Tractability. Proc. European Symposium on Algorithms (ESA)

C94 2009: B. Ballinger, D. Charlton, E. Demaine, M. Demaine, J. Iacono, C-H. Liu and S-H. Poon. Minimal Locked Trees. Proc. Algorithms and Data Structures Symposium (WADS). (PR)(CO)

C95 2009: E. Demaine, D. Kane and G. Price. A Pseudopolynomial Algorithm for Alexandrov's Theorem. Proc. Algorithms and Data Structures Symposium (WADS). (PR)(CO)
C96 2009: T. Ito, M. Kaminski and E. Demaine. Reconfiguration of List Edge-Colorings in a Graph. Proc. Algorithms and Data Structures Symposium (WADS). (PR)(CO)
C97 2009: E. Demaine, M. Hajiaghayi and K. Kawarabayashi. Approximation Algorithms via Structural Results for Apex-Minor-Free Graphs. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)
C98 2009: E. Demaine, M. Hajiaghayi and P. Klein. Node-Weighted Steiner Tree and Group Steiner Tree in Planar Graphs. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)
C99 2009: E. Demaine, G. Borradaile and S. Tazari. Polynomial-Time Approximation Schemes for Subset-Connectivity Problems in Bounded-Genus Graphs. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)
C100 2009: E. Demaine, D. Harmon, J. Iacono, D. Kane and M. Patrascu. The Geometry of Binary Search Trees. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)
C101 2009: E. Demaine, K. Kawarabayashi and M. Hajiaghayi. Additive Approximation Algorithms for List-Coloring Minor-Closed Class of Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C102 2009: E. Demaine, M. Hajiaghayi, H. Mahini and M. Zadimoghaddam. The Price of Anarchy in Cooperative Network Creation Games. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)

C103 2009: J. Cardinal, E. Demaine, S. Fiorini, G. Joret, I. Newman and O. Weimann. The Stackelberg Minimum Spanning Tree Game on Planar and Bounded-Treewidth Graphs. Proc. Workshop on Internet and Network Economics (WINE). (PR)(CO)
C104 2009: J. McLurkin and E. Demaine. A Distributed Boundary Detection Algorithm for Multi-Robot Systems. Proc. International Conference on Intelligent Robots and Systems. (PR)(CO)
C105 2009: G. Aloupis, N. Benbernou, M. Damian, E. Demaine, R. Flatland, J. Iacono and S. Wuhrer. Efficient Reconfiguration of Lattice-Based Modular Robots. Proc. European Conference on Mobile Robots. (PR)(CO)

C106 2009: M. Ajtai, V. Feldman, A. Hassidim and J. Nelson. Sorting and Selection with Imprecise Comparisons. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)
C107 2009: R. Yuster and O. Weimann. Computing the Girth of a Planar Graph in O(n log n) Time. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)
C108 2009: R. Backofen, G. Landau, M. Möhl, D. Tsur and O. Weimann. Fast RNA Structure Alignment for Crossing Input Structures. Proc. Symposium on Combinatorial Pattern Matching (CPM). (PR)(CO)
C109 2009: P. Klein, S. Mozes and O. Weimann. Shortest Paths in Directed Planar Graphs with Negative Lengths: A Linear-Space O(n log n)-Time Algorithm. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C110 2010: K. Do Ba, P. Indyk, E. Price and D.P. Woodruff. Lower Bounds for Sparse Recovery. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C111 2010: P. Indyk, H.Q. Ngo and A. Rudra. Efficiently Decodable Non-adaptive Group Testing. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C112 2010: D.M. Kane, J. Nelson and D.P. Woodruff. An Optimal Algorithm for the Distinct Elements Problem. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)
C113 2010: J. Nelson and D.P. Woodruff. Fast Manhattan Sketches in Data Streams. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)
C114 2010: I. Diakonikolas, D.M. Kane and J. Nelson. Bounded Independence Fools Degree-2 Threshold Functions. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)
C115 2010: D.M. Kane, J. Nelson and D.P. Woodruff. On the Exact Space Complexity of Sketching and Streaming Small Norms. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C116 2010: A. Beckmann, U. Meyer, P. Sanders and J. Singler. Energy-Efficient Sorting using Solid State Disks. Proc. International IEEE Green Computing Conference. (PR)(CO)

C117 2010: M. Greve, A.G. Jørgensen, K.D. Larsen and J. Truelsen. Cell Probe Lower Bounds and Approximations for Range Mode. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)
C118 2010: M. Olsen. Maximizing PageRank with new Backlinks. Proc. International Conference on Algorithms and Complexity (CIAC). (PR)
C119 2010: G.S. Brodal, E. Demaine, J.T. Fineman, J. Iacono, S. Langerman and J.I. Munro. Cache-Oblivious Dynamic Dictionaries with Optimal Update/Query Tradeoff. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C120 2010: A. Kaporis, A.N. Papadopoulos, S. Sioutas, K. Tsakalidis and K. Tsichlas. Efficient Processing of 3-Sided Range Queries with Probabilistic Guarantees. Proc. International Conference on Database Theory (ICDT). (PR)(CO)
C121 2010: M.A. Abam and S. Har-Peled. New constructions of SSPDs and their applications. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)
C122 2010: M.B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D.L. Christensen and K. Grønbæk. Indoor Positioning using GPS Revisited. Proc. International Conference on Pervasive Computing (Pervasive). (PR)
C123 2010: L. Arge, M.T. Goodrich and N. Sitchinava. Parallel external memory graph algorithms. Proc. International Parallel & Distributed Processing Symposium (IPDPS). (PR)(CO)

C124 2010: P. Afshani, L. Arge and K.D. Larsen. Orthogonal Range Reporting: Query Lower Bounds, Optimal Structures in 3-d, and Higher Dimensional Improvements. Proc. Symposium on Computational Geometry (SoCG). (PR)

C125 2010: P. Afshani, L. Arge and K.D. Larsen. I/O-Efficient Orthogonal Range Reporting in Three and Higher Dimensions. Proc. Workshop on Massive Data Algorithmics (MASSIVE).
C126 2010: T. Mølhave, P.K. Agarwal, L. Arge and M. Revsbæk. Scalable Algorithms for Large High-Resolution Terrain Data. Proc. International Conference on Computing for Geospatial Research & Application (COM.GEO). (PR)(CO)
C127 2010: L. Arge, M. Revsbæk and N. Zeh. I/O-Efficient Computation of Water Flow Across a Terrain. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)
C128 2010: G.S. Brodal, P. Davoodi and S.S. Rao. On Space Efficient Two Dimensional Range Minimum Data Structures. Proc. European Symposium on Algorithms (ESA). (PR)(CO)
C129 2010: D. Ajwani, N. Sitchinava and N. Zeh. Geometric Algorithms for Private-Cache Chip Multiprocessors. Proc. European Symposium on Algorithms (ESA). (PR)(CO)
C130 2010: Z. Abel, N. Benbernou, M. Damian, E.D. Demaine, M.L. Demaine, R. Flatland, S. Kominers and R. Schweller. Shape Replication Through Self-Assembly and RNase Enzymes. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)

C131 2010: E.D. Demaine, M. Hajiaghayi and K. Kawarabayashi. Decomposition, Approximation, and Coloring of Odd-Minor-Free Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)
C132 2010: N. Gershenfeld, D. Dalrymple, K. Chen, A. Knaian, F. Green, E.D. Demaine, S. Greenwald and P. Schmidt-Nielsen. Reconfigurable Asynchronous Logic Automata. Proc. Symposium on Principles of Programming Languages (POPL). (PR)(CO)

C133 2010: G. Aloupis, J. Cardinal, S. Collette, E.D. Demaine, M.L. Demaine, M. Dulieu, R. Fabila-Monroy, V. Hart, F. Hurtado, S. Langerman, M. Saumell, C. Seara and P. Taslakian. Matching Points with Things. Proc. Latin American Theoretical Informatics Symposium (LATIN). (PR)(CO)

C134 2010: E.D. Demaine and M. Zadimoghaddam. Scheduling to Minimize Power Consumption using Submodular Functions. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)
C135 2010: S. Gilbert, R. Guerraoui, F. Malakouti and M. Zadimoghaddam. Collaborative Scoring in the Presence of Malicious Players. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)

C136 2010: N. Alon, E.D. Demaine, M. Hajiaghayi and T. Leighton. Basic Network Creation Games. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)

C137 2010: E.D. Demaine and M. Zadimoghaddam. Minimizing the Diameter of a Network using Shortcut Edges. Proc. Scandinavian Workshop on Algorithm Theory (SWAT). (PR)

C138 2010: M. Bateni, M.H. Hajiaghayi and M. Zadimoghaddam. Submodular Secretary Problem and Extensions. Proc. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX). (PR)(CO)

C139 2010: B. Ballinger, N. Benbernou, P. Bose, M. Damian, E.D. Demaine, V. Dujmović, R. Flatland, F. Hurtado, J. Iacono, A. Lubiw, P. Morin, V. Sacristán, D. Souvaine and R. Uehara. Coverage with k-Transmitters in the Presence of Obstacles. Proc. International Conference on Combinatorial Optimization and Applications (COCOA). (PR)(CO)

C140 2010: E.D. Demaine and M. Zadimoghaddam. Constant Price of Anarchy in Network Creation Games via Public Service Advertising. Proc. International Workshop on Algorithms and Models for the Web-Graph. (PR)

C141 2010: G.S. Brodal, C. Kejlberg-Rasmussen and J. Truelsen. A Cache-oblivious Implicit Dictionary with the Working Set Property. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)
C142 2010: L. Arge, K.D. Larsen, T. Mølhave and F. van Walderveen. Cleaning Massive Sonar Point Clouds. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)
C143 2010: G.S. Brodal, S. Sioutas, K. Tsichlas and C. Zaroliagis. D2-Tree: A New Overlay with Deterministic Bounds. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)
C144 2010: F. Gieseke, G. Moruz and J. Vahrenhold. Resilient kd-trees: K-means in space revisited. Proc. Conference on Data Mining (ICDM). (PR)(CO)
C145 2010: J. Brody and E. Verbin. The Coin Problem and Pseudorandomness for Branching Programs. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)
C146 2011: H. Blunck, M.B. Kjærgaard and T.S. Toftegaard. Sensing and Classifying Impairments of GPS Reception on Mobile Devices. Proc. International Conference on Pervasive Computing (Pervasive). (PR)(CO)

C147 2011: A.G. Jørgensen and K.G. Larsen. Range Selection and Median: Tight Cell Probe Lower Bounds and Adaptive Data Structures. Proc. Symposium on Discrete Algorithms (SODA). (PR)
C148 2011: P. Afshani, P.K. Agarwal, L. Arge, K.G. Larsen and J.M. Phillips. (Approximate) Uncertain Skylines. Proc. International Conference on Database Theory (ICDT). (PR)(CO)

C149 2011: T.M. Chan, K.G. Larsen and M. Patrascu. Orthogonal Range Searching on the RAM, Revisited. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)
C150 2011: K.G. Larsen. On Range Searching in the Group Model and Combinatorial Discrepancy. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)

C151 2011: M. de Berg and C. Tsirogiannis. Exact and Approximate Computations of Watersheds on Triangulated Terrains. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)
C152 2011: H. Haverkort and C. Tsirogiannis. Flow on Noisy Terrains: An Experimental Evaluation. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)
C153 2011: D. Ajwani, N. Sitchinava and N. Zeh. I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors. Proc. International Symposium on Parallel and Distributed Processing (IPDPS). (PR)(CO)
C154 2011: M.T. Goodrich, N. Sitchinava and Q. Zhang. Sorting, Searching, and Simulation in the MapReduce Framework. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)
C155 2011: M.A. Abam, S. Daneshpajouh, L. Deleuran, S. Ehsani and M. Ghodsi. Computing Homotopic Line Simplification in a Plane. Proc. European Workshop on Computational Geometry (EuroCG). (CO)
C156 2011: P. Afshani and N. Zeh. Improved Space Bounds for Cache-Oblivious Range Reporting. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)
C157 2011: P. Afshani, G.S. Brodal and N. Zeh. Ordered and Unordered Top-K Range Reporting in Large Data Sets. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)
C158 2011: G.S. Brodal, G. Moruz and A. Negoescu. OnlineMin: A Fast Strongly Competitive Randomized Paging Algorithm. Proc. Workshop on Approximation and Online Algorithms (WAOA). (PR)

C159 2011: G.S. Brodal, P. Davoodi and S.S. Rao. Path Minima Queries in Dynamic Weighted Trees. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)
C160 2011: G.S. Brodal and K. Tsakalidis. Dynamic Planar Range Maxima Queries. Proc. International Colloquium on Automata, Languages, and Programming (ICALP). (PR)
C161 2011: G.S. Brodal, M. Greve, V. Pandey and S.S. Rao. Integer Representations towards Efficient Counting in the Bit Probe Model. Proc. Conference on Theory and Applications of Models of Computation (TAMC). (PR)(CO)

C162 2011: H.L. Chan, T.W. Lam, L.K. Lee, J. Pan, H.F. Ting and Q. Zhang. Edit Distance to Monotonicity in Sliding Windows. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)
C163 2011: D. Ajwani, A. Cosgaya-Lozano and N. Zeh. Engineering a Topological Sorting Algorithm for Massive Graphs. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(CO)

C164 2011: S.H. Chan, T.W. Lam, L.K. Lee, C.M. Liu and H.F. Ting. Sleep management on multiple machines for energy and flow time. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)
C165 2011: A.G. Jørgensen, M. Löffler and J. Phillips. Geometric Computations on Indecisive Points. Proc. International Workshop on Algorithms and Data Structures (WADS). (PR)(CO)

C166 2011: P. Davoodi and S. Srinivasa Rao. Succinct Dynamic Cardinal Trees with Constant Time Operations for Small Alphabet. Proc. Theory and Applications of Models of Computation (TAMC). (PR)(CO)
C167 2011: E. Verbin and W. Yu. The Streaming Complexity of Cycle Counting, Sorting By Reversals, and Other Problems. Proc. Symposium on Discrete Algorithms (SODA). (PR)

C168 2011: U. Meyer, A. Negoescu and V. Weichert. New bounds for old algorithms: On the average-case behavior of classic single-source shortest path approaches. Proc. Conference on Theory and Practice of Algorithms in (Computer) Systems (TAPAS). (PR)
C169 2011: M. Manjunath, K. Mehlhorn, K. Panagiotou and H. Sun. Approximate Counting of Cycles in Streams. Proc. European Symposium on Algorithms (ESA). (PR)(CO)
C170 2011: E. Price. Efficient Sketches for the Set Query Problem. Proc. Symposium on Discrete Algorithms (SODA). (PR)

C171 2011: P. Indyk and E. Price. K-Median Clustering, Model-Based Compressive Sensing, and Sparse Recovery for Earth Mover Distance. Proc. Symposium on Theory of Computing (STOC). (PR)

C172 2011: P. Indyk, E. Price and D.P. Woodruff. On the Power of Adaptivity in Sparse Recovery. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)
C173 2011: R. Gupta, P. Indyk, E. Price and Y. Rachlin. Compressive Sensing with Local Geometric Features. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)
C174 2011: E. Price and D.P. Woodruff. (1+eps)-approximate sparse recovery. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)
C175 2011: D.M. Kane, J. Nelson, E. Porat and D.P. Woodruff. Fast Moment Estimation in Data Streams in Optimal Space. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)
C176 2011: D.M. Kane, R. Meka and J. Nelson. Almost Optimal Explicit Johnson-Lindenstrauss Transformations. Proc. International Workshop on Randomization and Computation (RANDOM). (PR)(CO)

C177 2011: D.B. Khanh and P. Indyk. Sparse recovery with partial support knowledge. Proc. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX). (PR)(CO)

C178 2011: K. Kawarabayashi, P.N. Klein and C. Sommer. Linear-Space Approximate Distance Oracles for Planar, Bounded-Genus, and Minor-Free Graphs. Proc. International Colloquium on Automata, Languages, and Programming (ICALP). (PR)(CO)

C179 2011: C. Gavoille and C. Sommer. Sparse Spanners vs. Compact Routing. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)
C180 2011: H.N. Djidjev and C. Sommer. Approximate Distance Queries for Weighted Polyhedral Surfaces. Proc. European Symposium on Algorithms (ESA). (PR)(CO)
C181 2011: D. Alistarh, J. Aspnes, K. Censor-Hillel, S. Gilbert and M. Zadimoghaddam. Optimal-Time Adaptive Tight Renaming, with Applications to Counting. Proc. Symposium on Principles of Distributed Computing (PODC). (PR)(CO)

C182 2011: A. Karbasi and M. Zadimoghaddam. Compression with Graphical Constraints: An Interactive Browser. Proc. International Symposium on Information Theory (ISIT). (PR)(CO)
C183 2011: B. Haeupler, V. Mirrokni and M. Zadimoghaddam. Online Stochastic Weighted Matching: Improved Approximation Algorithms. Proc. Workshop on Internet & Network Economics. (PR)(CO)

C184 2011: Z. Abel, E.D. Demaine, M.L. Demaine, S. Eisenstat, J. Lynch, T.B. Schardl and I. Shapiro-Ellowitz. Folding Equilateral Plane Graphs. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)
C185 2011: E.D. Demaine, S. Eisenstat, M. Ishaque and A. Winslow. One-Dimensional Staged Self-Assembly. Proc. International Conference on DNA Computing and Molecular Programming. (PR)(CO)
C186 2011: E.D. Demaine, M.L. Demaine, S. Eisenstat, A. Lubiw and A. Winslow. Algorithms for Solving Rubik's Cubes. Proc. European Symposium on Algorithms (ESA). (PR)(CO)

C187 2011: E.D. Demaine and S. Eisenstat. Flattening Fixed-Angle Chains Is Strongly NP-Hard. Proc. International Workshop on Algorithms and Data Structures (WADS). (PR)

C188 2011: P. Christiano, E.D. Demaine and S. Kishore. Lossless Fault-Tolerant Data Structures with Additive Overhead. Proc. International Workshop on Algorithms and Data Structures (WADS). (PR)(CO)

C189 2011: P. Berman, E.D. Demaine and M. Zadimoghaddam. O(1)-Approximations for Maximum Movement Problems. Proc. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX). (PR)(CO)

C190 2011: G. Aloupis, E.D. Demaine, M.L. Demaine, V. Dujmovic and J. Iacono. Meshes preserving minimum feature size. Proc. Spanish Meeting on Computational Geometry. (CO)

C191 2011: E.D. Demaine and A. Lubiw. A generalization of the source unfolding of convex polyhedra. Proc. Spanish Meeting on Computational Geometry. (CO)
C192 2011: E.D. Demaine, M. Hajiaghayi and K. Kawarabayashi. Contraction Decomposition in H-Minor-Free Graphs and Algorithmic Applications. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)

C193 2011: E.D. Demaine, M.J. Patitz, R.T. Schweller and S.M. Summers. Self-Assembly of Arbitrary Shapes Using RNAse Enzymes: Meeting the Kolmogorov Bound with Small Scale Factor. Proc. Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)
C194 2011: E.D. Demaine and A. Schulz. Embedding Stacked Polytopes on a Polynomial-Size Grid. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)
C195 2012: P. Davoodi, M. Smid and F. van Walderveen. Two-Dimensional Range Diameter. Proc. Latin American Symposium on Theoretical Informatics (LATIN). (PR)(CO)(OA)

C196 2012: L. Arge, M.T. Goodrich and F. van Walderveen. Computing betweenness centrality in external memory. Workshop on Massive Data Algorithmics (MASSIVE). (CO)(OA)

C197 2012: K.G. Larsen and R. Pagh. I/O-Efficient Data Structures for Colored Range and Prefix Reporting. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C198 2012: T.M. Chan, S. Durocher, K.G. Larsen, J. Morrison and B.T. Wilkinson. Linear-Space Data Structures for Range Mode Query in Arrays. Proc. Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)(OA)

C199 2012: K.G. Larsen. The Cell Probe Complexity of Dynamic Range Counting. Proc. Symposium on Theory of Computing (STOC). (PR)(OA)

C200 2012: P. Afshani, L. Arge and K.G. Larsen. Higher-dimensional Orthogonal Range Reporting and Rectangle Stabbing in the Pointer Machine Model. Proc. Symposium on Computational Geometry (SoCG). (PR)(OA)

C201 2012: K.G. Larsen and H.L. Nguyen. Improved Range Searching Lower Bounds. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C202 2012: K.G. Larsen. Higher Cell Probe Lower Bounds for Evaluating Polynomials. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(OA)
C203 2012: L. Arge, L. Deleuran, T. Mølhave, M. Revsbæk and J. Truelsen. Simplifying Massive Contour Maps. Proc. European Symposium on Algorithms (ESA). (PR)(OA)

C204 2012: Z. Huang, K. Yi and Q. Zhang. Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)
C205 2012: D.P. Woodruff and Q. Zhang. Tight Bounds for Distributed Functional Monitoring. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)

C206 2012: J.M. Phillips, E. Verbin and Q. Zhang. Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C207 2012: E. Verbin and Q. Zhang. Rademacher-Sketch: A Dimensionality-Reducing Embedding for Sum-Product Norms, with an Application to Earth-Mover Distance. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(OA)

C208 2012: H.L. Chan, S.H. Chan, T.W. Lam, L.K. Lee and J. Zhu. Non-clairvoyant weighted flow time scheduling with rejection penalty. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)(OA)
C209 2012: G.S. Brodal, J.A.S. Nielsen and J. Truelsen. Finger search in the implicit model. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(OA)
C210 2012: G.S. Brodal and C. Kejlberg-Rasmussen. Cache-Oblivious Implicit Predecessor Dictionaries with the Working-Set Property. Proc. Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(OA)

C211 2012: X. Sun, C. Wang and W. Yu. The Relationship between Inner Product and Counting Cycles. Proc. Latin American Theoretical Informatics Symposium (LATIN). (PR)(CO)(OA)
C212 2012: P. Davoodi, R. Raman and S.S. Rao. Succinct Representations of Binary Trees for Range Minimum Queries. Proc. International Computing and Combinatorics Conference (COCOON). (PR)(CO)(OA)
C213 2012: G.S. Brodal, S. Sioutas, K. Tsakalidis and K. Tsichlas. Fully Persistent B-trees. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C214 2012: G.S. Brodal, G. Lagogiannis and R.E. Tarjan. Strict Fibonacci Heaps. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)
C215 2012: G.S. Brodal, P. Davoodi, M. Lewenstein, R. Raman and S.S. Rao. Two Dimensional Range Minimum Queries and Fibonacci Lattices. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)

C216 2012: D. Ajwani, A. Beckmann, U. Meyer and D. Veith. I/O-efficient approximation of graph diameter by parallel cluster growing - a first experimental study. Proc. Workshop on Parallel Systems and Algorithms (PASA). (PR)(CO)(OA)

C217 2012: A. Beckmann, J. Fedorowicz, J. Keller and U. Meyer. A structural analysis of the A5/1 state transition graph. Proc. Workshop on Graph Inspection and Traversal Engineering (GRAPHite). (PR)(CO)(OA)
C218 2012: G. Moruz and A. Negoescu. Outperforming LRU via Competitive Analysis on Parametrized Inputs for Paging. Proc. Symposium on Discrete Algorithms (SODA). (PR)(OA)
C219 2012: G. Moruz, A. Negoescu, C. Neumann and V. Weichert. Engineering Efficient Paging Algorithms. Proc. Symposium on Experimental Algorithms (SEA). (PR)(OA)

C220 2012: H. Hassanieh, P. Indyk, D. Katabi and E. Price. Simple and Practical Algorithm for Sparse Fourier Transform. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C221 2012: H. Hassanieh, P. Indyk, D. Katabi and E. Price. Nearly Optimal Sparse Fourier Transform. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)

C222 2012: S. Mozes and C. Sommer. Exact Distance Oracles for Planar Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C223 2012: T. Akiba, C. Sommer and K-i. Kawarabayashi. Shortest-Path Queries for Complex Networks: Exploiting Low Tree-width Outside the Core. Proc. International Conference on Extending Database Technology (EDBT). (PR)(CO)(OA)
C224 2012: S. Kreutzer and S. Tazari. Directed Nowhere Dense Classes of Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C225 2012: V.S. Mirrokni, S.O. Gharan and M. Zadimoghaddam. Simultaneous approximations for adversarial and stochastic online budgeted allocation. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C226 2012: D.M. Kane and J. Nelson. Sparser Johnson-Lindenstrauss Transforms. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C227 2012: C. Tsirogiannis, B. Sandel and D. Cheliotis. Efficient Computation of Popular Phylogenetic Tree Measures. Proc. Workshop on Algorithms in Bioinformatics (WABI). (PR)(CO)(OA)
C228 2012: L. Arge, H. Haverkort and C. Tsirogiannis. Fast Generation of Multiple Resolution Instances of Raster Data Sets. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)(OA)
C229 2012: P. Afshani and N. Zeh. Lower Bounds for Sorted Geometric Queries in the I/O Model. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)
C230 2012: P. Afshani. Improved pointer machine and I/O lower bounds for simplex range reporting and related problems. Proc. ACM Symposium on Computational Geometry (SoCG). (PR)(OA)

C231 2012: T.M. Chan, S. Durocher, M. Skala and B.T. Wilkinson. Linear-Space Data Structures for Range Minority Query in Arrays. Proc. Scandinavian Workshop on Algorithm Theory (SWAT). (PR)(CO)(OA)
C232 2012: H. Jowhari. Efficient Communication Protocols for Deciding Edit Distance. Proc. European Symposium on Algorithms (ESA). (PR)(OA)
C233 2012: L.K. Lee, M. Lewenstein and Q. Zhang. Parikh matching in the streaming model. Proc. International Symposium on String Processing and Information Retrieval (SPIRE). (PR)(CO)(OA)

C234 2012: D. Belazzougui and R. Venturini. Compressed String Dictionary Look-up with Edit Distance One. Proc. Symposium on Combinatorial Pattern Matching (CPM). (PR)(CO)(OA)
C235 2012: B. Ammitzbøll Jurik and J.A.S. Nielsen. Audio Quality Assurance: An Application of Cross Correlation. Proc. iPRES Conference. (PR)(CO)(OA)
C236 2012: N. Sitchinava and N. Zeh. A parallel buffer tree. Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)(OA)

C237 2012: D. Ajwani, U. Meyer and D. Veith. I/O-efficient Hierarchical Diameter Approximation. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)

C238 2012: D. Kane, K. Mehlhorn, T. Sauerwald and H. Sun. Counting Arbitrary Subgraphs in Data Streams. Proc. International Colloquium on Automata, Languages, and Programming (ICALP). (PR)
C239 2012: M. Wibral, P. Wollstadt, U. Meyer, N. Pampu, V. Priesemann and R. Vicente. Revisiting Wiener's principle of causality: interaction-delay reconstruction using transfer entropy and multivariate analysis on delay-weighted graphs. Proc. International Conference of the Engineering in Medicine & Biology Society (EMBC). (PR)(CO)(OA)
C240 2012: P. Indyk, R. Levi and R. Rubinfeld. Approximating and Testing k-Histogram Distributions in Sub-linear Time. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)

C241 2012: J. Wang, H. Hassanieh, D. Katabi and P. Indyk. Efficient and Reliable Low-Power Backscatter Networks. Proc. ACM SIGCOMM Conference (SIGCOMM). (PR)(CO)(OA)
C242 2012: H. Hassanieh, F. Adib, D. Katabi and P. Indyk. Faster GPS Via the Sparse Fourier Transform. Proc. International Conference on Mobile Computing and Networking (MOBICOM). (PR)(CO)(OA)
C243 2012: E. Price and D. Woodruff. Applications of the Shannon-Hartley Theorem to Data Streams and Sparse Recovery. Proc. International Symposium on Information Theory (ISIT). (PR)(CO)(OA)

C244 2012: L. Hamilton, D. Parker, C. Yu and P. Indyk. Focal Plane Array Folding for Efficient Information Extraction and Tracking. Proc. Applied Imagery Pattern Recognition Workshop (AIPR). (PR)(CO)(OA)
C245 2013: E. Price and D. Woodruff. Lower Bounds for Adaptive Sparse Recovery. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C246 2013: A. Andoni, H. Hassanieh, P. Indyk and D. Katabi. Shift Finding in Sub-linear Time. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C247 2012: E.D. Demaine, M.L. Demaine, Y.N. Minsky, J.S.B. Mitchell, R.L. Rivest and M. Patrascu. Picture-Hanging Puzzles. Proc. International Conference on Fun with Algorithms. (CO)(OA)

C248 2012: E.D. Demaine, M.L. Demaine, J-i. Itoh, A. Lubiw, C. Nara and J. O'Rourke. Refold Rigidity of Convex Polyhedra. Proc. European Workshop on Computational Geometry. (CO)(OA)
C249 2012: S. Lim, C. Sommer, E. Nikolova and D. Rus. Practical Route Planning Under Delay Uncertainty: Stochastic Shortest Path Queries. Proc. Robotics: Science and Systems VIII. (PR)(CO)(OA)
C250 2012: C. Ratti and C. Sommer. Approximating Shortest Paths in Spatial Social Networks. Proc. International Conference on Social Computing. (PR)(CO)(OA)
C251 2012: M. Zadimoghaddam and A. Roth. Efficiently Learning from Revealed Preference. Proc. International Workshop on Internet and Network Economics. (PR)(CO)(OA)
C252 2012: C. Guo, Y. Ma, B. Yang, C.S. Jensen and M. Kaul. EcoMark: Evaluating Models of Vehicular Environmental Impact. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)(OA)

C253 2012: X. Li, P. Karras, L. Shi, K.-L. Tan and C.S. Jensen. Cooperative Scalable Moving Continuous Query Processing. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)
C254 2012: D. Šidlauskas, C.S. Jensen and S. Šaltenis. A Comparison of the Use of Virtual Versus Physical Snapshots for Supporting Update-Intensive Workloads. Proc. International Workshop on Data Management on New Hardware (DaMoN). (PR)(CO)(OA)

C255 2012: J. Rishede, M.L. Yiu and C.S. Jensen. Effective Caching of Shortest Paths for Location-Based Services. Proc. International Conference on the Management of Data (SIGMOD). (PR)(CO)(OA)
C256 2012: D. Šidlauskas, S. Šaltenis and C.S. Jensen. Parallel Main-Memory Indexing for Moving-Object Query and Update Workloads. Proc. International Conference on the Management of Data (SIGMOD). (PR)(CO)(OA)
C257 2012: H. Lu, X. Cao and C.S. Jensen. A Foundation for Efficient Indoor Distance-Aware Query Processing. Proc. International Conference on Data Engineering (ICDE). (PR)(CO)(OA)
C258 2012: H. Lu and C.S. Jensen. Upgrading Uncompetitive Products Economically. Proc. International Conference on Data Engineering (ICDE). (PR)(CO)(OA)
C259 2012: X. Cao, L. Chen, G. Cong, C.S. Jensen, Q. Qu, A. Skovsgaard, D. Wu and M.L. Yiu. Spatial Keyword Querying (invited paper). Proc. International Conference on Conceptual Modeling (ER). (CO)(OA)

C260 2013: M. Olsen and M. Revsbæk. Alliances and Bisection Width for Planar Graphs. Proc. International Workshop on Algorithms and Computation (WALCOM). (PR)(OA)

C261 2013: K.G. Larsen and F. van Walderveen. Near-Optimal Range Reporting Structures for Categorical Data. Proc. Symposium on Discrete Algorithms (SODA). (PR)(OA)
C262 2013: K. Bringmann and K.G. Larsen. Succinct Sampling from Discrete Distributions. Proc. ACM Symposium on Theory of Computing (STOC). (PR)(CO)(OA)

C263 2013: Chr. Tsirogiannis and Con. Tsirogiannis. Uncovering the Missing Routes: An Algorithmic Study on the Illicit Antiquities Trade Network. Proc. Conference on Computer Applications and Quantitative Methods in Archaeology (CAA). (PR)(CO)(OA)
C264 2013: C. Tsirogiannis and B. Sandel. Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time. Proc. Workshop on Algorithms in Bioinformatics (WABI). (PR)(CO)(OA)

C265 2013: L. Arge, G.S. Brodal, J. Truelsen and C. Tsirogiannis. An Optimal and Practical Cache-Oblivious Algorithm for Computing Multiresolution Rasters. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)
C266 2013: L. Arge, M. de Berg and C. Tsirogiannis. Algorithms for Computing Prominence on Grid Terrains. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)(OA)

C267 2013: L. Arge and M. Thorup. RAM-Efficient External Memory Sorting. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)(OA)

C268 2013 L. Arge, M.T. Goodrich and F. van Walderveen: Computing Betweenness Centrality in External Memory. Proc. IEEE International Symposium on Big Data. (PR)(CO)(OA)

C269 2013 L. Arge, J. Fischer, P. Sanders and N. Sitchinava: On (Dynamic) Range Maximum Queries in External Memory. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)(OA)
C270 2013 L. Arge, F. van Walderveen and N. Zeh: Multiway Simple Cycle Separators and I/O-Efficient Algorithms for Planar Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C271 2013 G.S. Brodal, A. Brodnik and P. Davoodi: The Encoding Complexity of Two Dimensional Range Minimum Data Structures. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)

C272 2013 A. Sand, G.S. Brodal, R. Fagerberg, C.N.S. Pedersen and T. Mailund: A Practical O(n log n) Time Algorithm for Computing the Triplet Distance on Binary Trees. Proc. Asia Pacific Bioinformatics Conference (APBC). (PR)(CO)(OA)
C273 2013 G.S. Brodal, R. Fagerberg, C.N.S. Pedersen, T. Mailund and A. Sand: Efficient Algorithms for Computing the Triplet and Quartet Distance Between Trees of Arbitrary Degree. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C274 2013 S. Pettie and H.-H. Su: Fast Distributed Coloring Algorithms for Triangle-Free Graphs. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C275 2013 C. Kejlberg-Rasmussen, Y. Tao, J. Yoon, K. Tsichlas and K. Tsakalidis: I/O-Efficient Planar Range Skyline and Attrition Priority Queues. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)

C276 2013 Z. Wei and K. Yi: The Space Complexity of 2-Dimensional Approximate Range Counting. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C277 2013 E. Verbin and W. Yu: Data Structure Lower Bounds on Random Access to Grammar-Compressed Strings. Proc. Symposium on Combinatorial Pattern Matching (CPM). (PR)(OA)
C278 2013 D. Belazzougui and R. Venturini: Compressed Static Functions with Applications. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C279 2013 T.M. Chan and B.T. Wilkinson: Adaptive and Approximate Orthogonal Range Counting. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C280 2013 S. Alamdari, P. Angelini, T.M. Chan, G. Di Battista, F. Frati, A. Lubiw, M. Patrignani, V. Roselli, S. Singla and B.T. Wilkinson: Morphing Planar Graph Drawings with a Polynomial Number of Steps. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C281 2013 T. Jurkiewicz and K. Mehlhorn: The Cost of Address Translation. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(OA)

C282 2013 G. Moruz and A. Negoescu: Improved Space Bounds for Strongly Competitive Randomized Paging Algorithms. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(OA)
C283 2013 A. Beckmann, U. Meyer and D. Veith: An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)
C284 2013 L. Radaelli and C.S. Jensen: Towards Fully Organic Indoor Positioning. Proc. International Workshop on Indoor Spatial Awareness (ISA). (PR)(CO)(OA)
C285 2013 X. Li, V. Ceikute, C.S. Jensen and K.-L. Tan: Trajectory Based Optimal Segment Computation in Road Network Databases. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)(OA)

C286 2013 M.B. Kjærgaard, M.V. Krarup, A. Stisen, T.S. Prentow, H. Blunck, K. Grønbæk and C.S. Jensen: Indoor Positioning Using Wi-Fi: How Well Is the Problem Understood? Proc. International Conference on Indoor Positioning and Indoor Navigation (IPIN). (PR)(CO)(OA)
C287 2013 V. Ceikute and C.S. Jensen: Routing Service Quality: Local Driver Behavior Versus Routing Services. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)
C288 2013 M. Kaul, B. Yang and C.S. Jensen: Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)

C289 2013 L. Radaelli, D. Sabonis, H. Lu and C.S. Jensen: Identifying Typical Movements Among Indoor Objects: Concepts and Empirical Study. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)

C290 2013 A. Baniukevic, C.S. Jensen and H. Lu: Hybrid Indoor Positioning With Wi-Fi and Bluetooth: Architecture and Performance. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)

C291 2013 O. Andersen, C.S. Jensen, K. Torp and B. Yang: EcoTour: Reducing the Environmental Footprint of Vehicles Using Eco-Routes. Proc. International Conference on Mobile Data Management (MDM). (PR)(CO)(OA)

C292 2013 L.R.A. Derczynski, B. Yang and C.S. Jensen: Towards Context-Aware Search and Analysis on Social Media Data. Proc. International Conference on Extending Database Technology (EDBT). (PR)(CO)(OA)

C293 2013 B. Yang, N. Fantini and C.S. Jensen: iPark: Identifying Parking Spaces from Trajectories. Proc. International Conference on Extending Database Technology (EDBT). (PR)(CO)(OA)

C294 2013 E. Grant, C. Hegde and P. Indyk: Nearly Optimal Linear Embeddings into Very Low Dimensions. Proc. Global Conference on Signal and Information Processing (GlobalSIP). (PR)(CO)(OA)

C295 2013 S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi and K.R. Varadarajan: Diverse Near Neighbor Problem. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C296 2013 E. Grant and P. Indyk: Compressive Sensing Using Locality-Preserving Matrices. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C297 2013 L. Schmidt, C. Hegde and P. Indyk: The Constrained Earth Mover's Distance Model, with Applications to Compressive Sensing. Proc. International Conference on Sampling Theory and Applications (SampTA). (PR)(CO)(OA)
C298 2013 S. Abbar, S. Amer-Yahia, P. Indyk and S. Mahabadi: Real-Time Recommendation of Diverse Related Articles. Proc. International Conference on World Wide Web (WWW). (PR)(CO)(OA)
C299 2013 P. Indyk and I. Razenshteyn: On Model-Based RIP-1 Matrices. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C300 2013 B. Ghazi, H. Hassanieh, P. Indyk, D. Katabi, E. Price and L. Shi: Sample-Optimal Average-Case Sparse Fourier Transform in Two Dimensions. Proc. Allerton Conference. (PR)(CO)(OA)

C301 2013 E. Price and D. Woodruff: Lower Bounds for Adaptive Sparse Recovery. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C302 2013 A. Andoni, H. Hassanieh, P. Indyk and D. Katabi: Shift Finding in Sub-linear Time. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C303 2013 S. Har-Peled, P. Indyk and A. Sidiropoulos: Euclidean Spanners in High Dimensions. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C304 2013 E.D. Demaine, P. Panchekha, D. Wilson and E.Z. Yang: Blame Trees. Proc. Algorithms and Data Structures Symposium (WADS). (PR)(CO)(OA)
C305 2013 E.D. Demaine, M.J. Patitz, T.A. Rogers, R.T. Schweller, S.M. Summers and D. Woods: The Two-Handed Tile Assembly Model is Not Intrinsically Universal. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)

C306 2013 E.D. Demaine, J. Iacono, S. Langerman and O. Ozkan: Combining Binary Search Trees. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C307 2013 S. Cannon, E.D. Demaine, M.L. Demaine, S. Eisenstat, M.J. Patitz, R. Schweller, S.M. Summers and A. Winslow: Two Hands Are Better Than One (up to constant factors): Self-Assembly in the 2HAM vs. aTAM. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)(OA)

C308 2013 Z. Abel, E.D. Demaine, M.L. Demaine, S. Eisenstat, A. Lubiw, A. Schulz, D. Souvaine, G. Viglietta and A. Winslow: Algorithms for Designing Pop-Up Cards. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)(OA)

C309 2013 E.D. Demaine and M. Zadimoghaddam: Learning Disjunctions: Near-Optimal Trade-off between Mistakes and "I Don't Know"s. Proc. Symposium on Discrete Algorithms (SODA). (PR)(OA)
C310 2013 A. Karbasi and M. Zadimoghaddam: Constrained Binary Identification Problem. Proc. International Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)(OA)

C311 2013 M. Bateni, N. Haghpanah, B. Sivan and M. Zadimoghaddam: Revenue Maximization with Nonexcludable Goods. Proc. International Conference on Web and Internet Economics (WINE). (PR)(CO)(OA)

C312 2013 N. Korula, V.S. Mirrokni and M. Zadimoghaddam: Bicriteria Online Matching: Maximizing Weight and Cardinality. Proc. International Conference on Web and Internet Economics (WINE). (PR)(CO)(OA)
C313 2013 G.S. Brodal: A Survey on Priority Queues. Proc. Conference on Space Efficient Data Structures, Streams and Algorithms. (OA)

C314 2013 P. Afshani, M. Agrawal, B. Doerr, C. Doerr, K.G. Larsen and K. Mehlhorn: The Query Complexity of Finding a Hidden Permutation. Proc. Conference on Space Efficient Data Structures, Streams and Algorithms. (CO)(OA)

C315 2013 E.D. Demaine, M.L. Demaine, S. Eisenstat, T.D. Morgan and R. Uehara: Variations on Instant Insanity. Proc. Conference on Space Efficient Data Structures, Streams and Algorithms. (CO)(OA)
C316 2013 N. Sundaram, A. Turmukhametova, N. Satish, T. Mostak, P. Indyk, S. Madden and P. Dubey: Streaming Similarity Search over One Billion Tweets Using Parallel Locality-Sensitive Hashing. Proc. International Conference on Very Large Data Bases (VLDB). (PR)(CO)(OA)
C317 2014 A. Skovsgaard, D. Sidlauskas and C.S. Jensen: Scalable Top-k Spatio-Temporal Term Querying. Proc. International Conference on Data Engineering (ICDE). (PR)(OA)
C318 2014 A. Skovsgaard, D. Sidlauskas and C.S. Jensen: A Clustering Approach to the Discovery of Points of Interest from Geo-Tagged Microblog Posts. Proc. International Conference on Mobile Data Management (MDM). (PR)(OA)
C319 2014 S. Alstrup, E.B. Halvorsen and K.G. Larsen: Near-Optimal Labeling Schemes for Nearest Common Ancestors. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C320 2014 K.G. Larsen, I.J. Munro, J.S. Nielsen and S.V. Thankachan: On Hardness of Several String Indexing Problems. Proc. Symposium on Combinatorial Pattern Matching (CPM). (PR)(CO)(OA)
C321 2014 G.S. Brodal and K.G. Larsen: Optimal Planar Orthogonal Skyline Counting Queries. Proc. Scandinavian Workshop on Algorithm Theory (SWAT). (PR)(OA)

C322 2014 P. Afshani: Fast Computation of Output-Sensitive Maxima in a Word RAM. Proc. Symposium on Discrete Algorithms (SODA). (PR)(OA)
C323 2014 P. Afshani and K. Tsakalidis: Optimal Deterministic Shallow Cuttings for 3D Dominance Ranges. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C324 2014 P. Afshani, C. Sheng, Y. Tao and B.T. Wilkinson: Concurrent Range Reporting in Two-Dimensional Space. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C325 2014 P. Afshani, T.M. Chan and K. Tsakalidis: Deterministic Rectangle Enclosure and Offline Dominance Reporting on the RAM. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C326 2014 P. Afshani and N. Sitchinava: I/O-Efficient Range Minima Queries. Proc. Scandinavian Workshop on Algorithm Theory (SWAT). (PR)(CO)(OA)
C327 2014 B.T. Wilkinson: Amortized Bounds for Dynamic Orthogonal Range Reporting. Proc. European Symposium on Algorithms (ESA). (PR)(OA)
C328 2014 Z. Huang and K. Yi: The Communication Complexity of Distributed epsilon-Approximations. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)

C329 2014 K.-M. Chung, S. Pettie and H.-H. Su: Distributed Algorithms for the Lovász Local Lemma and Graph Coloring. Proc. Symposium on Principles of Distributed Computing (PODC). (PR)(CO)(OA)

C330 2014 S. Gilbert, V. King, S. Pettie, E. Porat, J. Saia and M. Young: (Near) Optimal Resource-Competitive Broadcast with Jamming. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(CO)(OA)
C331 2014 H.-H. Su: Brief Announcement: A Distributed Minimum Cut Approximation Scheme. Proc. Symposium on Parallelism in Algorithms and Architectures (SPAA). (PR)(OA)
C332 2014 D. Belazzougui, G.S. Brodal and J.S. Nielsen: Expected Linear Time Sorting for Word Size Ω(log^2 n log log n). Proc. Scandinavian Workshop on Algorithm Theory (SWAT). (PR)(CO)(OA)
C333 2014 M.K. Holt, J. Johansen and G.S. Brodal: On the Scalability of Computing Triplet and Quartet Distances. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(OA)
C334 2014 Z. Wei and K. Yi: Equivalence between Priority Queues and Sorting in External Memory. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)

C335 2014 L. Radaelli, Y. Moses and C.S. Jensen: Using Cameras to Improve Wi-Fi Based Indoor Positioning. Proc. International Symposium on Web and Wireless Geographical Information Systems. (PR)(CO)(OA)
C336 2014 B. Yang, C. Guo, C.S. Jensen, M. Kaul and S. Shang: Multi-Cost Optimal Route Planning under Time-Varying Uncertainty. Proc. International Conference on Data Engineering (ICDE). (PR)(CO)(OA)
C337 2014 C. Silvestri, F. Lettich, S. Orlando and C.S. Jensen: GPU-based Computing of Repeated Range Queries over Moving Objects. Proc. Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. (PR)(CO)(OA)

C338 2014 P. Indyk, M. Kapralov and E. Price: (Nearly) Sample-Optimal Sparse Fourier Transform. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C339 2014 A. Andoni, P. Indyk, H.L. Nguyen and I. Razenshteyn: Beyond Locality-Sensitive Hashing. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C340 2014 C. Hegde, P. Indyk and L. Schmidt: Approximation-Tolerant Model-Based Compressive Sensing. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C341 2014 Z. Abel, E.D. Demaine, M.L. Demaine, D. Eppstein, A. Lubiw and R. Uehara: Flat Foldings of Plane Graphs with Prescribed Angles and Edge Lengths. Proc. International Symposium on Graph Drawing (GD). (PR)(CO)(OA)

C342 2014 E.D. Demaine, M. Hajiaghayi, H. Mahini, D.L. Malec, S. Raghavan, A. Sawant and M. Zadimoghaddam: How to Influence People with Partial Incentives. Proc. International World Wide Web Conference. (PR)(CO)(OA)

C343 2014 C. Tsirogiannis, B. Sandel and A. Kalvisa: New Algorithms for Computing Phylogenetic Biodiversity. Proc. Workshop on Algorithms in Bioinformatics (WABI). (PR)(CO)(OA)
C344 2014 D. Sidlauskas and C.S. Jensen: Spatial Joins in Main Memory: Implementation Matters! Proc. International Conference on Very Large Data Bases (VLDB). (PR)(CO)(OA)
C345 2014 K.S. Bøgh, S. Chester, D. Sidlauskas and I. Assent: Hashcube: A Data Structure for Space- and Query-Efficient Skycube Compression. Proc. International Conference on Information and Knowledge Management (CIKM). (PR)(CO)(OA)

C346 2014 A. Skovsgaard and C.S. Jensen: Top-k Point of Interest Retrieval Using Standard Indexes. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(OA)

C347 2014 D. Chen, C. Konrad, K. Yi, W. Yu and Q. Zhang: Robust Set Reconciliation. Proc. International Conference on Management of Data (SIGMOD). (PR)(CO)(OA)

C348 2014 A. Grønlund and S. Pettie: Threesomes, Degenerates, and Love Triangles. Proc. IEEE Symposium on Foundations of Computer Science (FOCS). (PR)(OA)
C349 2014 D. Nanongkai and H.-H. Su: Almost-Tight Distributed Minimum Cut Algorithms. Proc. International Symposium on Distributed Computing (DISC). (PR)(CO)(OA)
C350 2014 J.I. Munro, G. Navarro, J.S. Nielsen, R. Shah and S.V. Thankachan: Top-k Term-Proximity in Succinct Space. Proc. International Symposium on Algorithms and Computation (ISAAC). (PR)(CO)(OA)

C351 2014 P. Indyk, S. Mahabadi, M. Mahdian and V.S. Mirrokni: Composable Core-sets for Diversity and Coverage Maximization. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)
C352 2014 E.D. Demaine, P. Indyk, S. Mahabadi and A. Vakilian: On Streaming and Communication Complexity of the Set Cover Problem. Proc. International Symposium on Distributed Computing (DISC). (PR)(CO)(OA)

C353 2014 P. Indyk and M. Kapralov: Sample-Optimal Fourier Sampling in Any Constant Dimension. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)
C354 2014 L. Schmidt, C. Hegde, P. Indyk, J. Kane, L. Lu and D. Hohl: Automatic Fault Localization Using the Generalized Earth Mover's Distance. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). (PR)(CO)(OA)
C355 2014 C. Hegde, P. Indyk and L. Schmidt: A Fast Approximation Algorithm for Tree-Sparse Recovery. Proc. International Symposium on Information Theory (ISIT). (PR)(CO)(OA)
C356 2014 C. Hegde, P. Indyk and L. Schmidt: Nearly Linear-Time Model-Based Compressive Sensing. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C357 2014 A. Backurs and P. Indyk: Better Embeddings for Planar Earth-Mover Distance over Sparse Sets. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C358 2014 H. Hassanieh, L. Shi, O. Abari, E. Hamed and D. Katabi: GHz-wide Sensing and Decoding Using the Sparse Fourier Transform. Proc. INFOCOM. (PR)(CO)
C359 2014 E.D. Demaine, Y. Huang, C.-S. Liao and K. Sadakane: Canadians Should Travel Randomly. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C360 2014 E.D. Demaine, M.L. Demaine, S.P. Fekete, M.J. Patitz, R.T. Schweller, A. Winslow and D. Woods: One Tile to Rule Them All: Simulating Any Tile Assembly System with a Single Universal Tile. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C361 2014 G. Aloupis, E.D. Demaine, A. Guo and G. Viglietta: Classic Nintendo Games are (NP-)Hard. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C362 2014 E.D. Demaine, F. Ma and E. Waingarten: Playing Dominoes is Hard, Except by Yourself. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C363 2014 K. Yamanaka, E.D. Demaine, T. Ito, J. Kawahara, M. Kiyomi, Y. Okamoto, T. Saitoh, A. Suzuki, K. Uchizawa and T. Uno: Swapping Labeled Tokens on Graphs. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C364 2014 E.D. Demaine and M.L. Demaine: Fun with Fonts: Algorithmic Typography. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C365 2014 Z. Abel, E.D. Demaine, M.L. Demaine, J.-I. Itoh, A. Lubiw, C. Nara and J. O'Rourke: Continuously Flattening Polyhedra Using Straight Skeletons. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)

C366 2014 B. An, S. Miyashita, M.T. Tolley, D.M. Aukes, L. Meeker, E.D. Demaine, M.L. Demaine, R.J. Wood and D. Rus: An End-To-End Approach to Making Self-Folded 3D Surface Shapes by Uniform Heating. Proc. International Conference on Robotics and Automation. (PR)(CO)(OA)

C367 2014 A. Becker, E.D. Demaine, S. Fekete and J. McLurkin: Particle Computation: Designing Worlds to Control Robot Swarms with Only Global Signals. Proc. International Conference on Robotics and Automation. (PR)(CO)(OA)
C368 2014 Y. Bachrach, O. Lev, S. Lovett, J.S. Rosenschein and M. Zadimoghaddam: Cooperative Weakest Link Games. Proc. International Conference on Autonomous Agents and Multiagent Systems. (PR)(CO)(OA)
C369 2014 A. Termehchy, A. Vakilian, Y. Chodpathumwan and M. Winslett: Which Concepts are Worth Extracting? Proc. International Conference on the Management of Data (SIGMOD). (PR)(CO)(OA)
C370 2014 A. Ene and A. Vakilian: Improved Approximation Algorithms for Degree-Bounded Network Design Problems with Node Connectivity Requirements. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)

C371 2014 L. Arge, J. Truelsen and J. Yang: Simplifying Massive Planar Subdivisions. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(OA)

C372 2015 Z. Huang, B. Radunovic, M. Vojnovic and Q. Zhang: Communication Complexity of Approximate Maximum Matching in Distributed Graphs. Proc. Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(CO)(OA)

C373 2015 G.S. Brodal, J.S. Nielsen and J. Truelsen: Strictly Implicit Priority Queues: On the Number of Moves and Worst-Case Time. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(OA)

C374 2015 M. Elkin and S. Pettie: A Linear-Size Logarithmic Stretch Path-Reporting Distance Oracle for General Graphs. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C375 2015 S. Pettie: Sharp Bounds on Formation-free Sequences. Proc. Symposium on Discrete Algorithms (SODA). (PR)(OA)

C376 2015 M. Elkin, S. Pettie and H.-H. Su: (2Δ-1)-Edge Coloring is Much Easier than Maximal Matching in the Distributed Setting. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C377 2015 T. Kopelowitz, S. Pettie and E. Porat: Dynamic Set Intersection. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)(OA)
C378 2015 P.K. Agarwal, T. Mølhave, M. Revsbæk, I. Safa, Y. Wang and J. Yang: Maintaining Contour Trees of Dynamic Terrains. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C379 2015 S. Chester, D. Sidlauskas, I. Assent and K.S. Bøgh: Scalable Parallelization of Skyline Computation for Multi-core Processors. Proc. IEEE Conference on Data Engineering (ICDE). (PR)(CO)(OA)
C380 2015 W. Son and P. Afshani: Streaming Algorithms for Smallest Intersecting Ball of Disjoint Balls. Proc. Conference on Theory and Applications of Models of Computation (TAMC). (PR)(OA)

C381 2015 M. Goswami, A. Grønlund, K.G. Larsen and R. Pagh: Approximate Range Emptiness in Constant Time and Optimal Space. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C382 2015 K.G. Larsen, J. Nelson and H.L. Nguyen: Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)
C383 2015 D. Ajwani, U. Meyer and D. Veith: An I/O-efficient Distance Oracle for Evolving Real-World Graphs. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(CO)(OA)

C384 2015 A. Andoni and I. Razenshteyn: Optimal Data-Dependent Hashing for Approximate Near Neighbors. Proc. Symposium on Theory of Computing (STOC). (PR)(CO)(OA)
C385 2015 Z. Allen-Zhu, R. Gelashvili and I. Razenshteyn: Restricted Isometry Property for General p-Norms. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C386 2015 S. Lattanzi, S. Leonardi, V. Mirrokni and I. Razenshteyn: Robust Hierarchical k-Center Clustering. Proc. Innovations in Theoretical Computer Science (ITCS). (PR)(CO)(OA)
C387 2015 A. Andoni, R. Krauthgamer and I. Razenshteyn: Sketching and Embedding are Equivalent for Norms. Proc. Symposium on the Theory of Computing (STOC). (PR)(CO)(OA)
C388 2015 C. Hegde, P. Indyk and L. Schmidt: A Nearly-Linear Time Framework for Graph-Structured Sparsity. Proc. International Conference on Machine Learning (ICML). (PR)(CO)(OA)
C389 2015 L. Schmidt, C. Hegde, P. Indyk, L. Lu, X. Chi and D. Hohl: Seismic Feature Extraction Using Steiner Tree Methods. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP). (PR)(CO)(OA)
C390 2015 A. Andoni, P. Indyk, T. Laarhoven, I.P. Razenshteyn and L. Schmidt: Practical and Optimal LSH for Angular Distance. Proc. Advances in Neural Information Processing Systems (NIPS). (PR)(CO)(OA)

C391 2015 A. Kim, E. Blais, A.G. Parameswaran, P. Indyk, S. Madden and R. Rubinfeld: Rapid Sampling for Visualizations with Ordering Guarantees. Proc. International Conference on Very Large Data Bases (VLDB). (PR)(CO)(OA)
C392 2015 J. Acharya, I. Diakonikolas, C. Hegde, J.Z. Li and L. Schmidt: Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)

C393 2015 A. Backurs and P. Indyk: Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false). Proc. Symposium on the Theory of Computing (STOC). (PR)(CO)(OA)
C394 2015 A. Kovacs, U. Meyer and C. Ventre: Mechanisms with Monitoring for Truthful RAM Allocation. Proc. Conference on Web and Internet Economics (WINE). (PR)(CO)(OA)

C395 2015 P. Chalermsook, M. Goswami, L. Kozma, K. Mehlhorn and T. Saranurak: Self-Adjusting Binary Search Trees: What Makes Them Tick? Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)
C396 2015 P. Chalermsook, M. Goswami, L. Kozma, K. Mehlhorn and T. Saranurak: Pattern-Avoiding Access in Binary Search Trees. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)
C397 2015 P. Chalermsook, M. Goswami, L. Kozma, K. Mehlhorn and T. Saranurak: Greedy Is an Almost Optimal Deque. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)(OA)
C398 2015 L. Arge, M. Rav, S. Raza and M. Revsbæk: I/O-Efficient Event Based Depression Flood Risk. Proc. Workshop on Massive Data Algorithmics (MASSIVE). (OA)

C399 2015 M. de Berg, C. Tsirogiannis and B.T. Wilkinson: Fast Computation of Categorical Richness on Raster Data Sets and Related Problems. Proc. International Conference on Advances in Geographic Information Systems (ACM-GIS). (PR)(CO)(OA)

C400 2015 M. de Berg, C. Tsirogiannis and B.T. Wilkinson: Fast Computation of Categorical Richness on Raster Data Sets and Related Problems. Proc. Workshop on Massive Data Algorithmics (MASSIVE). (CO)(OA)
C401 2015 C. Alexander, L. Arge, P.K. Bøcher, M. Revsbæk, B. Sandel, J.-C. Svenning, C. Tsirogiannis and J. Yang: Computing River Floods Using Massive Terrain Data. Proc. Workshop on Massive Data Algorithmics (MASSIVE). (OA)

C402 2015 P. Afshani and N. Sitchinava: Sorting and Permuting without Bank Conflicts on GPUs. Proc. European Symposium on Algorithms (ESA). (PR)(CO)(OA)
C403 2015 A. Abboud, A. Backurs and V.A. Williams: If the Current Clique Algorithms are Optimal, So is Valiant's Parser. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)
C404 2015 R. Clifford, A. Grønlund and K.G. Larsen: New Unconditional Hardness Results for Dynamic and Online Problems. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)

C405 2015 E.D. Demaine, S. Fekete, C. Scheffer and A. Schmidt: New Geometric Algorithms for Fully Connected Staged Self-Assembly. Proc. International Conference on DNA Computing and Molecular Programming. (PR)(CO)(OA)
C406 2015 E.D. Demaine, T. Kaler, Q. Liu, A. Sidford and A. Yedidia: Polylogarithmic Fully Retroactive Priority Queues via Hierarchical Checkpointing. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)(OA)

C407 2015 E.D. Demaine, V. Gopal and W. Hasenplaugh: Cache-Oblivious Iterated Predecessor Queries via Range Coalescing. Proc. Workshop on Algorithms and Data Structures (WADS). (PR)(CO)(OA)
C408 2015 A.T. Becker, E.D. Demaine, S.P. Fekete, H.M. Shad and R. Morris-Wright: Tilt: The Video -- Designing Worlds to Control Robot Swarms with Only Global Signals. Multimedia Exposition in Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C409 2015 H.M. Shad, R. Morris-Wright, E.D. Demaine, S.P. Fekete and A. Becker: Particle Computation: Device Fan-out and Binary Memory. Proc. IEEE International Conference on Robotics and Automation (ICRA). (PR)(CO)(OA)
C410 2015 E.D. Demaine, S. Fekete and A. Schmidt: New Geometric Algorithms for Staged Self-Assembly. Proc. European Workshop on Computational Geometry (EuroCG). (CO)(OA)

C411 2015 E.D. Demaine, D. Eppstein, A. Hesterberg, H. Ito, A. Lubiw, R. Uehara and Y. Uno: Folding a Paper Strip to Minimize Thickness. Proc. International Workshop on Algorithms and Computation (WALCOM). (PR)(CO)(OA)

C412 2015 S.M. Oh, G.T. Toussaint, E.D. Demaine and M.L. Demaine: A Dissimilarity Measure for Comparing Origami Crease Patterns. Proc. International Conference on Pattern Recognition Applications and Methods (ICPRAM). (PR)(CO)(OA)
C413 2015 A. Mehta, B. Waggoner and M. Zadimoghaddam: Online Stochastic Matching with Unequal Probabilities. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C414 2015 A. Bhaskara, A. Theertha and M. Zadimoghaddam: Sparse Solutions to Nonnegative Linear Systems and Applications. Proc. International Conference on Artificial Intelligence and Statistics (AISTATS). (PR)(CO)(OA)
C415 2015 A. Abboud, A. Backurs and V.A. Williams: Tight Hardness Results for LCS and Other Sequence Similarity Measures. Proc. Symposium on Foundations of Computer Science (FOCS). (PR)(CO)(OA)

C416 2016 U. Meyer and M. Penschuck: Generating Massive Scale-Free Networks under Resource Constraints. Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). (PR)(OA)

C417 2016 S. Ashkiani, A. Davidson, U. Meyer and J. Owens: GPU Multisplit. Proc. Symposium on Principles and Practice of Parallel Programming (PPoPP). (PR)(CO)(OA)

C418 2016 R. Duan, J. Garg and K. Mehlhorn: An Improved Combinatorial Polynomial Algorithm for the Linear Arrow-Debreu Market. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C419 2016 Z. Abel, E.D. Demaine, M.L. Demaine, S. Eisenstat, J. Lynch and T.B. Schardl: Who Needs Crossings? Hardness of Plane Graph Rigidity. Proc. International Conference on Computational Geometry (SoCG). (PR)(CO)(OA)

C420 2016 E.D. Demaine, G. Viglietta and A. Williams: Super Mario Bros. is Harder/Easier than We Thought. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C421 2016 E.D. Demaine, F. Ma, E. Waingarten, A. Schvartzman and S. Aaronson: The Fewest Clues Problem. Proc. International Conference on Fun with Algorithms. (CO)(OA)
C422 2016 E.D. Demaine, J. Lynch, G.J. Mirano and N. Tyagi: Energy-Efficient Algorithms. Proc. ACM Conference on Innovations in Theoretical Computer Science. (PR)(CO)(OA)

C423 2016 K.G. Larsen and J. Nelson: The Johnson-Lindenstrauss Lemma is Optimal for Linear Dimensionality Reduction. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C424 2016 A. Grønlund and K.G. Larsen: Towards Tight Lower Bounds for Range Reporting on the RAM. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(OA)
C425 2016 C. Baum, I. Damgård, K.G. Larsen and M. Nielsen: How to Prove Knowledge of Small Secrets. Proc. International Cryptology Conference (CRYPTO). (PR)(CO)
C426 2016 P. Afshani and J.A.S. Nielsen: Data Structure Lower Bounds for Document Indexing Problems. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(OA)
C427 2016 P. Afshani, E. Berglin, I. van Duijn and J.A.S. Nielsen: Applications of Incidence Bounds in Point Covering Problems. Proc. International Conference on Computational Geometry (SoCG). (PR)(OA)

C428 2016 P. Afshani, D.R. Sheehy and Y. Stein: Approximating the Simplicial Depth in High Dimensions. Proc. European Workshop on Computational Geometry (EuroCG). (CO)(OA)

C429 2016 J. Yon, S.W. Bae, S.-W. Cheng, O. Cheong and B.T. Wilkinson: Approximating Convex Shapes with respect to Symmetric Difference under Homotheties. Proc. International Conference on Computational Geometry (SoCG). (PR)(CO)(OA)
C430 2016 C. Tsirogiannis and B. Sandel: Fast Phylogenetic Biodiversity Computations Under a Non-Uniform Random Distribution. Proc. International Conference on Research in Computational Molecular Biology (RECOMB). (PR)(OA)

C431 2016 C. Alexander, L. Arge, P.K. Bøcher, M. Revsbæk, B. Sandel, J.-C. Svenning, C. Tsirogiannis and J. Yang: Computing River Floods Using Massive Terrain Data. Proc. International Conference on Geographic Information Science (GIScience). (PR)(OA)

C432 2016 M. Löffler, F. Staals and J. Urhausen: New Results on Trajectory Grouping under Geodesic Distance. Proc. European Workshop on Computational Geometry (EuroCG). (CO)(OA)

C433 2016 M. van Kreveld, M. Löffler, F. Staals and L. Wiratma: A Refined Definition for Groups of Moving Entities and its Computation. Proc. European Workshop on Computational Geometry (EuroCG). (CO)(OA)

C434 2016 A. van Goethem, M. van Kreveld, M. Löffler, B. Speckmann and F. Staals: Grouping Time-varying Data for Interactive Exploration. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)

C435 2016 I. Kostitsnya, M. Löffler, V. Polishchuk and F. Staals: On the Complexity of Minimum-Link Path Problems. Proc. Symposium on Computational Geometry (SoCG). (PR)(CO)(OA)
C436 2016 T. Kopelowitz, S. Pettie and E. Porat: Higher Lower Bounds from the 3SUM Conjecture. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C437 2016 G.S. Brodal: External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates. Proc. Symposium on Theoretical Aspects of Computer Science (STACS). (PR)(OA)
C438 2016 Z. Huang and P. Peng: Dynamic Graph Stream Algorithms in o(n) Space. Proc. International Colloquium on Automata, Languages and Programming (ICALP). (PR)(CO)(OA)
C439 2016 P. Brandes, Z. Huang, H.-H. Su and R. Wattenhofer: Clairvoyant Mechanisms for Online Auctions. Proc. International Conference on Computing and Combinatorics (COCOON). (PR)(CO)(OA)

C440 2016 A. Backurs, P. Indyk, I. Razenshteyn and D.P. Woodruff: Nearly-optimal bounds for sparse recovery in generic norms, with applications to k-median sketching. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)
C441 2016 M. Cheraghchi and P. Indyk: Nearly Optimal Deterministic Algorithm for Sparse Walsh-Hadamard Transform. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

C442 2016 S. Har-Peled, P. Indyk, S. Mahabadi and A. Vakilian: Towards Tight Bounds for the Streaming Set Cover Problem. Proc. Symposium on Principles of Database Systems (PODS). (PR)(CO)(OA)
C443 2016 A. Abboud, A. Backurs, T.D. Hansen, V.V. Williams and O. Zamir: Subtree Isomorphism Revisited. Proc. Symposium on Discrete Algorithms (SODA). (PR)(CO)(OA)

Journals

J1 2007 G.S. Brodal, R. Fagerberg and G. Moruz: On the Adaptiveness of Quicksort. ACM Journal of Experimental Algorithmics 12. (PR)(CO)

J2 2008 D. Ajwani, T. Friedrich and U. Meyer: An O(n^{2.75}) Algorithm for Incremental Topological Ordering. ACM Transactions on Algorithms 4(4). (PR)

J3 2008 M. Stissing, T. Mailund, C.N.S. Pedersen, G.S. Brodal and R. Fagerberg: Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees. Journal of Bioinformatics and Computational Biology 6(1). (PR)(CO)

J4 2008 L. Arge, M. de Berg, H.J. Haverkort and K. Yi: The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. ACM Transactions on Algorithms 4(1). (PR)(CO)

J5 2009 M. Olsen: Nash Stability in Additively Separable Hedonic Games and Community Structures. Theory of Computing Systems 45(4). (PR)

J6 2009 M. Abam, M. de Berg, M. Farshi and J. Gudmundsson: Region-Fault Tolerant Geometric Spanners. Discrete & Computational Geometry 41(4). (PR)(CO)
J7 2009 M. Abam, M. de Berg and B. Speckmann: Kinetic kd-Trees and Longest-Side kd-Trees. SIAM Journal of Computing 39(4). (PR)(CO)
J8 2009 L. Arge, V. Samoladas and K. Yi: Optimal External-Memory Planar Point Enclosure. Algorithmica 54(3). (PR)(CO)

J9 2009 L. Arge, M. de Berg and H. Haverkort: Cache-Oblivious R-Trees. Algorithmica 53(1). (PR)(CO)
J10 2009 H. Iben, J. O'Brien and E. Demaine: Refolding Planar Polygons. Discrete & Computational Geometry 41(3). (PR)(CO)
J11 2009 E. Demaine, M. Hajiaghayi, H. Mahini, A. Sayedi-Roshkhar, S. Oveisgharan and M. Zadimoghaddam: Minimizing Movement. ACM Transactions on Algorithms 5(3). (PR)(CO)
J12 2009 E. Demaine, M. Hajiaghayi and K. Kawarabayashi: Algorithmic Graph Minor Theory: Improved Grid Minor Bounds and Wagner's Contraction. Algorithmica 54(2). (PR)(CO)
J13 2009 T. Abbott, M. Burr, T. Chan, E. Demaine, M. Demaine, J. Hugg, D. Kane, S. Langerman, J. Nelson, E. Rafalin, K. Seyboth and V. Yeung: Dynamic Ham-Sandwich Cuts in the Plane. Computational Geometry: Theory and Applications 42(5). (PR)(CO)

J14 2009 E.D. Demaine, M. Hajiaghayi, H. Mahini and M. Zadimoghaddam: The Price of Anarchy in Network Creation Games. ACM SIGECOM Exchanges 8(2). (PR)(CO)

J15 2009 E.D. Demaine, M.L. Demaine, J. Iacono and S. Langerman: Wrapping Spheres with Flat Paper. Computational Geometry: Theory and Applications 42(8). (PR)(CO)
J16 2010 P. Indyk and A. Gilbert: Sparse Recovery Using Sparse Matrices. Proceedings of the IEEE, June 2010. (PR)(CO)
J17 2010 E.D. Demaine, S. Langerman and E. Price: Confluently Persistent Tries for Efficient Version Control. Algorithmica 57(3). (PR)(CO)
J18 2010 M.A. Abam, M. de Berg, P. Hachenberger and A. Zarei: Streaming Algorithms for Line Simplification. Discrete & Computational Geometry 43(3). (PR)(CO)

J19 2010 M.A. Abam, M. de Berg and J. Gudmundsson: A Simple and Efficient Kinetic Spanner. Computational Geometry: Theory and Applications 43(3). (PR)(CO)

J20 2010 D. Ajwani and T. Friedrich: Average-case Analysis of Incremental Topological Ordering. Discrete Applied Mathematics 158. (PR)(CO)
J21 2010 H. Blunck and J. Vahrenhold: In-Place Algorithms for Computing (Layers of) Maxima. Algorithmica 57(1). (PR)(CO)
J22 2010 P. Indyk, Z. Syed, C. Stultz, M. Kellis and J. Guttag: Motif discovery in physiological datasets: A methodology for inferring predictive elements. ACM Transactions on Knowledge Discovery in Data 4(1). (PR)(CO)

J23 2010 E. Hawkes, B. An, N.M. Benbernou, H. Tanaka, S. Kim, E.D. Demaine, D. Rus and R.J. Wood: Programmable matter by folding. Proceedings of the National Academy of Sciences of the United States of America 107(28). (PR)(CO)

J24 2010 J.L. Bredin, E.D. Demaine, M. Hajiaghayi and D. Rus: Deploying Sensor Networks with Guaranteed Fault Tolerance. IEEE/ACM Transactions on Networking 18(1). (PR)(CO)

J25 2010 E.D. Demaine, J. Iacono and S. Langerman: Grid Vertex-Unfolding Orthostacks. International Journal of Computational Geometry and Applications 20(3). (PR)(CO)
J26 2010 E.D. Demaine, S.P. Fekete, G. Rote, N. Schweer, D. Schymura and M. Zelke: Integer Point Sets Minimizing Average Pairwise L1 Distance: What is the Optimal Shape of a Town? Computational Geometry: Theory and Applications 44(2). (PR)(CO)

J27 2010 R. Connelly, E.D. Demaine, M.L. Demaine, S. Fekete, S. Langerman, J.S.B. Mitchell, A. Ribó and G. Rote: Locked and Unlocked Chains of Planar Shapes. Discrete & Computational Geometry 44(2). (PR)(CO)

J28 2010 P.K. Agarwal, L. Arge and K. Yi: I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis. ACM Transactions on Algorithms 7(1). (PR)(CO)

J29 2010 P. Afshani, C. Hamilton and N. Zeh: A General Approach for Cache-Oblivious Range Reporting and Approximate Range Counting. Computational Geometry: Theory and Applications 43(8). (PR)(CO)

J30 2010 J. Katajainen and S.S. Rao: A compact data structure for representing a dynamic multiset. Information Processing Letters 110(23). (PR)(CO)
J31 2010 M.A. Bender, G.S. Brodal, R. Fagerberg, R. Jacob and E. Vicari: Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model. Theory of Computing Systems 47(4). (PR)(CO)

J32 2010 C. Demetrescu, B. Escoffier, G. Moruz and A. Ribichini: Adapting Parallel Algorithms to the W-Stream Model, with Applications to Graph Problems. Theoretical Computer Science 411(44-46). (PR)(CO)

J33 2011 J.E. Moeslund, L. Arge, P.K. Bøcher, B. Nygaard and J.-C. Svenning: Geographically Comprehensive Assessment of Salt-Meadow Vegetation-Elevation Relations Using LiDAR. Wetlands 31(3). (PR)(CO)

J34 2011 B. Sandel, L. Arge, B. Dalsgaard, R. Davies, K. Gaston, W. Sutherland and J.-C. Svenning: The influence of Late Quaternary climate-change velocity on species endemism. Science 334. (PR)(CO)
J35 2011 B. Dalsgaard, E. Magård, J. Fjeldså, A.M. Martín González, C. Rahbek, J. Olesen, J. Ollerton, R. Alarcón, A.C. Araujo, P.A. Cotton, C. Lara, C.G. Machado, I. Sazima, M. Sazima, A. Timmermann, S. Watts, B. Sandel, W. Sutherland and J.-C. Svenning: Specialization in Plant-Hummingbird Networks Is Associated with Species Richness, Contemporary Precipitation and Quaternary Climate-Change Velocity. PLoS ONE 6. (PR)(CO)

J36 2011 B. Sandel, M. Krupa and J. Corbin: Using plant functional traits to guide restoration: A case study in California coastal grassland. Ecosphere 2. (PR)(CO)

J37 2011 P. Afshani, C. Hamilton and N. Zeh: Cache-Oblivious Range Reporting With Optimal Queries Requires Superlinear Space. Discrete & Computational Geometry 45(4). (PR)(CO)
J38 2011 G.S. Brodal, B. Gfeller, A.G. Jørgensen and P. Sanders: Towards Optimal Range Medians. Theoretical Computer Science 412(24). (PR)(CO)

J39 2011 M. Kutz, G.S. Brodal, K. Kaligosi and I. Katriel: Faster Algorithms for Computing Longest Common Increasing Subsequences. Journal of Discrete Algorithms 9(4). (PR)(CO)

J40 2011 M.A. Bender, G.S. Brodal, R. Fagerberg, D. Ge, S. He, H. Hu, J. Iacono and A. López-Ortiz: The Cost of Cache-Oblivious Searching. Algorithmica 61(2). (PR)(CO)

J41 2011 H.L. Chan, T.W. Lam, L.K. Lee and H.F. Ting: Approximating frequent items in asynchronous data stream over a sliding window. Algorithmica 4(3). (PR)(CO)

J42 2011 C. Daskalakis, R.M. Karp, E. Mossel, S. Riesenfeld and E. Verbin: Sorting and Selection in Posets. SIAM Journal of Computing. (PR)(CO)

J43 2011 M.A. Abam and M. de Berg: Kinetic Spanners in R^d. Discrete & Computational Geometry 45(4). (PR)(CO)
J44 2011 M.A. Abam, M. de Berg, M. Farshi, J. Gudmundsson and M.H.M. Smid: Geometric Spanners for Weighted Point Sets. Algorithmica 61(1). (PR)(CO)
J45 2011 M.A. Abam, P.K. Agarwal, M. de Berg and H. Yu: Out-of-Order Event Processing in Kinetic Data Structures. Algorithmica 60(2). (PR)(CO)
J46 2011 J. Freixas, X. Molinero, M. Olsen and M.J. Serna: On the Complexity of Problems on Simple Games. RAIRO - Operations Research 45(4). (PR)(CO)

J47 2011 A. Beckman, U. Meyer, P. Sanders and J. Singler: Energy-Efficient Sorting using Solid State Disks. Sustainable Computing: Informatics and Systems 1(2). (PR)(CO)

J48 2011 E.D. Demaine, S.P. Fekete, G. Rote, N. Schweer, D. Schymura and M. Zelke: Integer Point Sets Minimizing Average Pairwise L1 Distance: What is the Optimal Shape of a Town? Computational Geometry: Theory and Applications 44(2). (PR)(CO)

J49 2011 B. An, N. Benbernou, E.D. Demaine and D. Rus: Planning to Fold Multiple Objects from a Single Self-Folding Sheet. Robotica 29(1). (PR)(CO)
J50 2011 G. Aloupis, S. Collette, M. Damian, E.D. Demaine, R. Flatland, S. Langerman, J. O'Rourke, V. Pinciu, S. Ramaswami, V. Sacristan and S. Wuhrer: Efficient constant-velocity reconfiguration of crystalline robots. Robotica 29(1). (PR)(CO)

J51 2011 E.D. Demaine, M.L. Demaine, V. Hart, G.N. Price and T. Tachi: (Non)existence of Pleated Folds: How Paper Folds Between Creases. Graphs and Combinatorics 27(3). (PR)(CO)

J52 2011 E.D. Demaine, M.L. Demaine, V. Hart, J. Iacono, S. Langerman and J. O'Rourke: Continuous Blooming of Convex Polyhedra. Graphs and Combinatorics 27(3). (PR)(CO)

J53 2011 J. Cardinal, E.D. Demaine, M.L. Demaine, S. Imahori, T. Ito, M. Kiyomi, S. Langerman, R. Uehara and T. Uno: Algorithmic Folding Complexity. Graphs and Combinatorics 27(3). (PR)(CO)

J54 2011 K.C. Cheung, E.D. Demaine, J. Bachrach and S. Griffith: Programmable Assembly With Universally Foldable Strings (Moteins). IEEE Transactions on Robotics 27(4). (PR)(CO)
J55 2011 G. Aloupis, P. Bose, E.D. Demaine, S. Langerman, H. Meijer, M. Overmars and G.T. Toussaint: Computing Signed Permutations of Polygons. International Journal of Computational Geometry and Applications 21(1). (PR)(CO)

J56 2011 T. Ito, E.D. Demaine, N.J.A. Harvey, C.H. Papadimitriou, M. Sideri, R. Uehara and Y. Uno: On the Complexity of Reconfiguration Problems. Theoretical Computer Science 412(12-14). (PR)(CO)

J57 2011 H. Ahn, S. Bae, E.D. Demaine, M.L. Demaine, S. Kim, M. Korman, I. Reinbacher and W. Son: Covering points by disjoint boxes with outliers. Computational Geometry: Theory and Applications 44(3). (PR)(CO)

J58 2011 J. Cardinal, E.D. Demaine, S. Fiorini, G. Joret, S. Langerman, I. Newman and O. Weimann: The Stackelberg Minimum Spanning Tree Game. Algorithmica 59(2). (PR)(CO)

J59 2011 H. Haverkort and F. van Walderveen: Four-Dimensional Hilbert Curves for R-Trees. Journal of Experimental Algorithmics 16. (PR)(CO)
J60 2012 B. Sandel and J. Corbin: Scale and diversity following manipulation of productivity and disturbance in Californian coastal grasslands. Journal of Vegetation Science 23(5). (PR)(CO)(OA)
J61 2012 M. Schleuning, J. Fründ, A.-M. Klein, S. Abrahamczyk, R. Alarcón, M. Albrecht, G.K.S. Andersson, S. Bazarian, K. Böhning-Gaese, R. Bommarco, B. Dalsgaard, D.M. Dehling, A. Gotlieb, M. Hagen, T. Hickler, A. Holzschuh, C.N. Kaiser-Bunbury, H. Kreft, R.J. Morris, B. Sandel, W.J. Sutherland, J.-C. Svenning, T. Tscharntke, S. Watts, C.N. Weiner, M. Werner, N.M. Williams, C. Winqvist, C.F. Dormann and N. Blüthgen: Specialization of Mutualistic Interaction Networks Decreases toward Tropical Latitudes. Current Biology 22(20). (PR)(CO)(OA)

J62 2012 B. Sandel and E. Dangremond: Climate change and the invasion of California by grasses. Global Change Biology 18(1). (PR)(CO)(OA)
J63 2012 P. Afshani, P.K. Agarwal, L. Arge, K.G. Larsen and J.M. Phillips: (Approximate) Uncertain Skylines. Theory of Computing Systems 52(3). (PR)(CO)(OA)

J64 2012 P.K. Agarwal, L. Arge, H. Kaplan, E. Molad, R.E. Tarjan and K. Yi: An Optimal Dynamic Data Structure for Stabbing-Semigroup Queries. SIAM Journal on Computing 41(1). (PR)(CO)(OA)

J65 2012 L. Arge, G.S. Brodal and S.S. Rao: External memory planar point location with logarithmic updates. Algorithmica 63(1-2). (PR)(CO)(OA)
J66 2012 G.S. Brodal, P. Davoodi and S.S. Rao: On Space Efficient Two Dimensional Range Minimum Data Structures. Algorithmica 63(4). (PR)(CO)(OA)
J67 2012 G.S. Brodal, G. Moruz and A. Negoescu: OnlineMin: A Fast Strongly Competitive Randomized Paging Algorithm. Theory of Computing Systems 56(1). (PR)(OA)

J68 2012 H.L. Chan, T.W. Lam, L.K. Lee and H.F. Ting: Continuous monitoring of distributed data streams over a time-based sliding window. Algorithmica 62(3-4). (PR)(CO)(OA)
J69 2012 G. Cormode, S. Muthukrishnan, K. Yi and Q. Zhang: Continuous sampling from distributed streams. Journal of the ACM 59(2). (PR)(CO)(OA)
J70 2012 U. Meyer and N. Zeh: I/O-efficient shortest path algorithms for undirected graphs with random and bounded edge lengths. ACM Transactions on Algorithms 8(3). (PR)(CO)(OA)

J71 2012 F. Gieseke, G. Moruz and J. Vahrenhold: Resilient K-d Trees: K-Means in Space Revisited. Frontiers of Computer Science 6(2). (PR)(CO)(OA)

J72 2012 E.D. Demaine, M. Hajiaghayi, H. Mahini and M. Zadimoghaddam: The Price of Anarchy in Network Creation Games. ACM Transactions on Algorithms 8(2). (PR)(CO)(OA)

J73 2012 O. Aichholzer, F. Aurenhammer, E.D. Demaine, F. Hurtado, P. Ramos and J. Urrutia: On k-convex polygons. Computational Geometry: Theory and Applications 45(3). (PR)(CO)(OA)

J74 2012 T.G. Abbott, Z. Abel, D. Charlton, E.D. Demaine, M.L. Demaine and S.D. Kominers: Hinged Dissections Exist. Discrete & Computational Geometry 47(1). (PR)(CO)(OA)
J75 2012 M. Greve, A.M. Lykke, C.W. Fagg, J. Bogaert, I. Friis, R. Marchant, A.R. Marshall, J. Ndayishimiye, B. Sandel, C. Sandom, M. Schmidt, J.R. Timberlake, J.J. Wieringa, G. Zizka and J.-C. Svenning: Continental-scale variability in browser diversity is a major driver of diversity patterns in acacias across Africa. Journal of Ecology 100. (PR)(CO)(OA)

J76 2012 R. Gupta, P. Indyk, E. Price and Y. Rachlin: Compressive Sensing with Local Geometric Features. International Journal of Computational Geometry and Applications 22(4). (PR)(CO)(OA)
J77 2012 D. Charlton, E.D. Demaine, M.L. Demaine, V. Dujmovic, P. Morin and R. Uehara: Ghost Chimneys. International Journal of Computational Geometry and Applications 22(3). (PR)(CO)(OA)

J78 2012 E.D. Demaine, M.L. Demaine and R. Uehara: Any Monotone Boolean Function Can Be Realized by Interlocked Polygons. Algorithms 5(1). (PR)(CO)(OA)

J79 2012 E.D. Demaine and M. Zadimoghaddam: Constant Price of Anarchy in Network-Creation Games via Public-Service Advertising. Internet Mathematics 8(1-2). (PR)(OA)

J80 2012 L. Moll, S. Tazari and M. Thurley: Computing hypergraph width measures exactly. Information Processing Letters 112(6). (PR)(CO)(OA)
J81 2012 S. Tazari: Faster approximation schemes and parameterized algorithms on (odd-)H-minor-free graphs. Theoretical Computer Science 417. (PR)(OA)

J82 2012 D. Wu, G. Cong and C.S. Jensen: A Framework for Efficient Spatial Web Object Retrieval. VLDB Journal 21(6). (PR)(CO)(OA)

J83 2012 D. Wu, M.L. Yiu, G. Cong and C.S. Jensen: Joint Top-K Spatial Keyword Query Processing. IEEE Transactions on Knowledge and Data Engineering 24(10). (PR)(CO)(OA)
J84 2012 X. Cao, G. Cong, B. Cui, C.S. Jensen and Q. Yuan: Approaches to Exploring Category Information for Question Retrieval in Community Question Answer Archives. ACM Transactions on Information Systems 30(2). (PR)(CO)(OA)
J85 2012 M.L. Yiu, I. Assent, C.S. Jensen and P. Kalnis: Outsourced Similarity Search on Metric Data Assets. IEEE Transactions on Knowledge and Data Engineering 24(2). (PR)(CO)(OA)
J86 2012 X. Cao, G. Cong, C.S. Jensen, J.J. Ng, B.C. Ooi, N.-T. Phan and D. Wu: SWORS: A System for the Efficient Retrieval of Relevant Spatial Web Objects. Proceedings of the VLDB Endowment 5(12). (PR)(CO)(OA)
J87 2013 J.E. Moeslund, L. Arge, P.K. Bøcher, T. Dalgaard, R. Ejrnæs, M.V. Odgaard and J.-C. Svenning: Topographically controlled soil moisture drives plant diversity patterns within grasslands. Biodiversity and Conservation 22(10). (PR)(CO)(OA)

J88 2013 J.E. Moeslund, L. Arge, P.K. Bøcher, T. Dalgaard and J.-C. Svenning: Topography as a driver of local terrestrial vascular plant diversity patterns. Nordic Journal of Botany 31(2). (PR)(CO)(OA)

J89 2013 J.E. Moeslund, L. Arge, P.K. Bøcher, T. Dalgaard, M.V. Odgaard, B. Nygaard and J.-C. Svenning: Topographically controlled soil moisture is the primary driver of local vegetation patterns across a lowland region. Ecosphere 4(7). (PR)(CO)(OA)
J90 2013 C. Alexander, J.E. Moeslund, P.K. Bøcher, L. Arge and J.-C. Svenning: Airborne laser scanner (LiDAR) proxies for understory light conditions. Remote Sensing of Environment 134. (PR)(CO)(OA)
J91 2013 C. Sandom, L. Dalby, C. Fløjgaard, W.D. Kissling, J. Lenoir, B. Sandel, K. Trøjelsgaard, R. Ejrnæs and J.-C. Svenning: Mammal predator and prey species richness are strongly linked at macroscales. Ecology 94(5). (PR)(CO)(OA)

J92 2013 B. Dalsgaard, K. Trøjelsgaard, A.M. Martín González, D. Nogués-Bravo, J. Ollerton, T. Petanidou, B. Sandel, M. Schleuning, Z. Wang, C. Rahbek, W.J. Sutherland, J.-C. Svenning and J.M. Olesen: Historical climate-change influences modularity of pollination networks. Ecography 36(12). (PR)(CO)(OA)

J93 2013 J.-C. Svenning and B. Sandel: Disequilibrium vegetation dynamics under future climate change. American Journal of Botany. (PR)(OA)

J94 2013 A.B. Smith, B. Sandel, N.J.B. Kraft and S. Carey: Characterizing scale-dependent community assembly using the functional-diversity-area relationship. Ecology 94(11). (PR)(CO)(OA)
J95 2013 B. Sandel and J.-C. Svenning: Human impacts drive a global topographic signature in tree cover. Nature Communications 4. (PR)(OA)
J96 2013 M. Olsen and M. Revsbæk: Alliance Partitions and Bisection Width for Planar Graphs. Journal of Graph Algorithms and Applications 17(6). (PR)(OA)
J97 2013 P.K. Agarwal, L. Arge, S. Govindarajan, J. Yang and K. Yi: Efficient external memory structures for range-aggregate queries. Computational Geometry: Theory and Applications 46(3). (PR)(CO)(OA)
J98 2013 A. Sand, G.S. Brodal, R. Fagerberg, C.N.S. Pedersen and T. Mailund: A practical O(n log n) time algorithm for computing the triplet distance on binary trees. BMC Bioinformatics 14. (PR)(CO)(OA)
J99 2013 G.S. Brodal, M. Greve, V. Pandey and S.S. Rao: Integer Representations towards Efficient Counting in the Bit Probe Model. Journal of Discrete Algorithms. (PR)(CO)(OA)

J100 2013 A. Sand, M.K. Holt, J. Johansen, R. Fagerberg, G.S. Brodal, C.N.S. Pedersen and T. Mailund: Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees. MDPI Biology - Special Issue on Developments in Bioinformatic Algorithms 2(4). (PR)(CO)(OA)

J101 2013 K. Yi and Q. Zhang: Optimal Tracking of Distributed Heavy Hitters and Quantiles. Algorithmica 65(1). (PR)(CO)(OA)
J102 2013 P.K. Agarwal, G. Cormode, Z. Huang, J.M. Phillips, Z. Wei and K. Yi: Mergeable Summaries. ACM Transactions on Database Systems 38(4). (PR)(CO)(OA)

J103 2013 R. Pagh, Z. Wei, K. Yi and Q. Zhang: Cache-Oblivious Hashing. Algorithmica. (PR)(CO)(OA)
J104 2013 U. Meyer and V. Weichert: Algorithm Engineering für moderne Hardware. Informatik-Spektrum. (PR)(OA)
J105 2013 X. Li, V. Ceikute, C.S. Jensen and K.-L. Tan: Effective Online Group Discovery in Trajectory Databases. IEEE Transactions on Knowledge and Data Engineering 25(12). (PR)(CO)(OA)
J106 2013 M. Kaul, R.C.-W. Wong, B. Yang and C.S. Jensen: Finding Shortest Paths on Terrains by Killing Two Birds with One Stone. Proceedings of the VLDB Endowment 7(1). (PR)(CO)(OA)
J107 2013 K.S. Bøgh, A. Skovsgaard and C.S. Jensen: GroupFinder: A New Approach to Top-K Point-of-Interest Group Retrieval. Proceedings of the VLDB Endowment 6(12). (PR)(CO)(OA)
J108 2013 B. Yang, C. Guo and C.S. Jensen: Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models. Proceedings of the VLDB Endowment 6(9). (PR)(CO)(OA)

J109 2013 D. Wu, M.L. Yiu and C.S. Jensen: Moving Spatial Keyword Queries: Formulation, Methods, and Analysis. ACM Transactions on Database Systems 38(1). (PR)(CO)(OA)
J110 2013 L. Chen, G. Cong, C.S. Jensen and D. Wu: Spatial Keyword Query Processing: An Experimental Evaluation. Proceedings of the VLDB Endowment 6(3). (PR)(CO)(OA)
J111 2013 K. Tzoumas, A. Deshpande and C.S. Jensen: Efficiently Adapting Graphical Models for Cardinality Estimation. The VLDB Journal 22(1). (PR)(CO)(OA)
J112 2013 B. Ballinger, N. Benbernou, P. Bose, M. Damian, E.D. Demaine, V. Dujmovic, R. Flatland, F. Hurtado, J. Iacono, A. Lubiw, P. Morin, V. Sacristan, D. Souvaine and R. Uehara: Coverage with k-Transmitters in the Presence of Obstacles. Journal of Combinatorial Optimization 25(2). (PR)(CO)(OA)

J113 2013 S. Butler, E.D. Demaine, R. Graham and T. Tachi: Constructing Points through Folding and Intersection. International Journal of Computational Geometry and Applications 23(1). (PR)(CO)(OA)
J114 2013 G. Barequet, N. Benbernou, D. Charlton, E.D. Demaine, M.L. Demaine, M. Ishaque, A. Lubiw, A. Schulz, D.L. Souvaine, G.T. Toussaint and A. Winslow: Bounded-Degree Polyhedronization of Point Sets. Computational Geometry: Theory and Applications 46(2). (PR)(CO)(OA)

J115 2013 J. Cardinal, E.D. Demaine, S. Fiorini, G. Joret, I. Newman and O. Weimann: The Stackelberg Minimum Spanning Tree Game on Planar and Bounded-Treewidth Graphs. Journal of Combinatorial Optimization 25(1). (PR)(CO)(OA)

J116 2013 G. Aloupis, J. Cardinal, S. Collette, E.D. Demaine, M.L. Demaine, M. Dulieu, R. Fabila-Monroy, V. Hart, F. Hurtado, S. Langerman, M. Saumell, C. Seara and P. Taslakian: Non-crossing matchings of points with geometric objects. Computational Geometry: Theory and Applications 46(1). (PR)(CO)(OA)

J117 2013 N. Alon, E.D. Demaine, M. Hajiaghayi and T. Leighton: Basic Network Creation Games. SIAM Journal on Discrete Mathematics 27(2). (PR)(CO)(OA)

J118 2013 E.D. Demaine, S. Eisenstat, M. Ishaque and A. Winslow: One-Dimensional Staged Self-Assembly. Natural Computing 12(2). (PR)(CO)(OA)
J119 2013 E.D. Demaine, M.L. Demaine, J. Itoh, A. Lubiw, C. Nara and J. O'Rourke: Refold Rigidity of Convex Polyhedra. Computational Geometry: Theory and Applications 46(8). (PR)(CO)(OA)
J120 2013 G. Aloupis, N. Benbernou, M. Damian, E.D. Demaine, R. Flatland, J. Iacono and S. Wuhrer: Efficient Reconfiguration of Lattice-Based Modular Robots. Computational Geometry: Theory and Applications 46(8). (PR)(CO)(OA)

J121 2013 Z. Abel, E.D. Demaine, M.L. Demaine, S. Eisenstat, J. Lynch and T.B. Schardl: Finding a Hamiltonian Path in a Cube with Specified Turns is Hard. Journal of Information Processing 21(3). (PR)(CO)(OA)
J122 2013 E.D. Demaine, M. Ghodsi, M. Hajiaghayi, A.S. Sayedi-Roshkhar and M. Zadimoghaddam: Scheduling to Minimize Gaps and Power Consumption. Journal of Scheduling 16(2). (PR)(CO)(OA)

J123 2013 Z. Abel, E.D. Demaine, M.L. Demaine, S. Eisenstat, J. Lynch, T.B. Schardl and I. Shapiro-Ellowitz: Folding Equilateral Plane Graphs. International Journal of Computational Geometry and Applications 23(2). (PR)(CO)(OA)

J124 2013 M. Bateni, M. Hajiaghayi and M. Zadimoghaddam: Submodular secretary problem and extensions. ACM Transactions on Algorithms 9(4). (PR)(CO)(OA)

J125 2013 A. Elmasry, A. Farzan and J. Iacono: On the hierarchy of distribution-sensitive properties for data structures. Acta Informatica 50(4). (PR)(CO)(OA)

J126 2013 P. Afshani: Improved pointer machine and I/O lower bounds for simplex range reporting and related problems. International Journal of Computational Geometry and Applications 23(4-5). (PR)(OA)

J127 2014 C. Alexander, P.K. Bøcher, L. Arge and J.-C. Svenning: Regional-scale mapping of tree cover, height and main phenological tree types using airborne laser scanning data. Remote Sensing of Environment 147. (PR)(CO)(OA)

J128 2014 M. Schleuning, L. Ingmann, R. Strauß, S. Fritz, B. Dalsgaard, D.M. Dehling, M. Plein, F.V. Saavedra, B. Sandel, J.-C. Svenning, K. Böhning-Gaese and C.F. Dormann: Ecological, historical and evolutionary determinants of modularity in weighted seed-dispersal networks. Ecology Letters 17(4). (PR)(CO)(OA)

J129 2014 G. Feng, X.C. Mi, P.K. Bøcher, L.F. Mao, B. Sandel, M. Cao, W.H. Ye, Z.Q. Hao, H.D. Gong, Y.T. Zhang, X.H. Zhao, G.Z. Jin, K.P. Ma and J.-C. Svenning: Relative roles of local disturbance, current climate and palaeoclimate in determining phylogenetic and functional diversity in Chinese forests. Biogeosciences 11. (PR)(CO)(OA)

J130 2014 R.Ø. Pedersen, B. Sandel and J.-C. Svenning: Macroecological evidence for competitive regional-scale interactions between the two major clades of mammal carnivores (Feliformia and Caniformia). PLoS ONE 9(6). (PR)(CO)(OA)

J131 2014 B. Dalsgaard, D.W. Carstensen, J. Fjeldså, P.K. Maruyama, C. Rahbek, B. Sandel, J. Sonne, J.-C. Svenning, Z. Wang and W.J. Sutherland: Does geography, current climate or historical climate determine bird species richness, endemism and island network roles in Wallacea and the West Indies? Ecology and Evolution 4(20). (PR)(CO)(OA)

J132 2014 J.Y. Barnagaud, W.D. Kissling, B. Sandel, W. Eiserhardt, H. Balslev, C.H. Sekercioglu, B. Enquist, C. Tsirogiannis and J.-C. Svenning: Ecological traits influence the phylogenetic structure of bird species co-occurrences worldwide. Ecology Letters 17(7). (PR)(CO)(OA)
J133 2014 T. Amano, B. Sandel, H. Eager, E. Bulteau, J.-C. Svenning, B. Dalsgaard, C. Rahbek, R.G. Davies and W.J. Sutherland: Global distribution and drivers of language extinction risk. Proceedings of the Royal Society B: Biological Sciences 281(1793). (PR)(CO)(OA)

J134 2014 C. Lamanna, B. Blonder, C. Violle, N.J.B. Kraft, B. Sandel, I. Simova, J. Donoghue, J.-C. Svenning, B. McGill, B. Boyle, S. Dolins, P.M. Jørgensen, A. Marcuse-Kubitza, N. Morueta-Holme, R.K. Peet, W.H. Piel, J. Regetz, M. Schildhauer, N. Spencer, B. Theirs, S.K. Wiser and B.J. Enquist: The latitudinal species richness gradient does not arise from a larger functional trait space in the tropics. Proceedings of the National Academy of Sciences 111(38). (PR)(CO)(OA)

J135 2014 C. Tsirogiannis and B. Sandel: Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time. Algorithms for Molecular Biology 9(15). (PR)(OA)

J136 2014 D. Sidlauskas, S. Saltenis and C.S. Jensen: Processing of Extreme Moving-Object Update and Query Workloads in Main Memory. The VLDB Journal 23(5). (PR)(CO)(OA)
J137 2014 T.M. Chan, S. Durocher, K.G. Larsen, J. Morrison and B.T. Wilkinson: Linear-Space Data Structures for Range Mode Query in Arrays. Theory of Computing Systems 55(4). (PR)(CO)(OA)

J138 2014 K.G. Larsen: On Range Searching in the Group Model and Combinatorial Discrepancy. SIAM Journal on Computing 43(2). (PR)(OA)

J139 2014 G. Cormode and H. Jowhari: A second look at counting triangles in graph streams. Theoretical Computer Science 552. (PR)(CO)(OA)

J140 2014 G.S. Brodal, A.C. Kaporis, A.N. Papadopoulos, S. Sioutas, K. Tsakalidis and K. Tsichlas: Dynamic 3-sided planar range queries with expected doubly-logarithmic time. Theoretical Computer Science 526. (PR)(CO)(OA)
J141 2014 A. Sand, M.K. Holt, J. Johansen, G.S. Brodal, T. Mailund and C.N.S. Pedersen: tqDist: A Library for Computing the Quartet and Triplet Distances Between Binary or General Trees. Bioinformatics 30(14). (PR)(CO)(OA)

J142 2014 K. Yi, L. Wang and Z. Wei: Indexing for Summary Queries: Theory and Practice. ACM Transactions on Database Systems. (PR)(CO)(OA)

J143 2014 M. Abam, S. Daneshpajouh, L. Deleuran, S. Ehsani and M. Ghodsi: Computing Homotopic Line Simplification in a Plane. Computational Geometry: Theory and Applications 47(7). (PR)(CO)
J144 2014 S. Shang, R. Ding, K. Zheng, C.S. Jensen, P. Kalnis and X. Zhou: Personalized Trajectory Matching in Spatial Networks. The VLDB Journal 23(3). (PR)(CO)
J145 2014 B. Yang, M. Kaul and C.S. Jensen: Using Incomplete Information for Complete Weight Annotation of Road Networks. IEEE Transactions on Knowledge and Data Engineering 26(5). (PR)(CO)(OA)

J146 2014 G. Moruz, A. Negoescu, C. Neumann and V. Weichert: Engineering Efficient Paging Algorithms. Journal of Experimental Algorithmics 19. (PR)(CO)(OA)
J147 2014 E.D. Demaine, M. Hajiaghayi and D. Marx: Minimizing Movement: Fixed-Parameter Tractability. ACM Transactions on Algorithms 11(2). (PR)(CO)(OA)

J148 2014 E.D. Demaine, M.L. Demaine, R. Uehara, T. Uno and Y. Uno: UNO is hard, even for a single player. Theoretical Computer Science 521. (PR)(CO)(OA)
J149 2014 M. Damian, E.D. Demaine and R. Flatland: Unfolding Orthogonal Polyhedra with Quadratic Refinement: The Delta-Unfolding Algorithm. Graphs and Combinatorics 30(1). (PR)(CO)(OA)
J150 2014 G. Borradaile, E.D. Demaine and S. Tazari: Polynomial-Time Approximation Schemes for Subset-Connectivity Problems in Bounded-Genus Graphs. Algorithmica 68(2). (PR)(CO)(OA)
J151 2014 G.S. Brodal, M. Greve, V. Pandey and S.S. Satti: Integer representations towards efficient counting in the bit probe model. Journal of Discrete Algorithms 26. (PR)(CO)(OA)

J152 2014 C. Sandom, S. Faurby, B. Sandel and J.-C. Svenning: Global Late Quaternary megafauna extinctions linked to humans, not climate change. Proceedings of the Royal Society B: Biological Sciences 281(1787). (PR)(CO)(OA)
J153 2014 D. Kissling, L. Dalby, C. Fløjgaard, J. Lenoir, B. Sandel, C. Sandom, K. Trøjelsgaard and J.-C. Svenning: Establishing macroecological trait datasets: digitalization, extrapolation and validation of diet preferences in terrestrial mammals worldwide. Ecology and Evolution 4(14). (PR)(CO)(OA)

J154 2014 H.-K. Ahn, H.-S. Kim, S.-S. Kim and W. Son: Computing k Centers over Streaming Data for Small k. International Journal of Computational Geometry and Applications 24(2). (PR)(CO)(OA)
J155 2014 T. Jurkiewicz and K. Mehlhorn: On a Model of Virtual Address Translation. Journal of Experimental Algorithmics 19. (PR)(CO)(OA)
J156 2014 A.C. Gilbert, P. Indyk, M. Iwen and L. Schmidt: Recent Developments in the Sparse Fourier Transform: A compressed Fourier transform for big data. Signal Processing Magazine 31(5). (PR)(CO)(OA)

J157 2014 S. Felton, M. Tolley, E. Demaine, D. Rus and R. Wood: A method for building self-folding machines. Science 345(6197). (PR)(CO)(OA)
J158 2014 E.D. Demaine, M.L. Demaine, Y.N. Minsky, J.S.B. Mitchell, R.L. Rivest and M. Patrascu: Picture-Hanging Puzzles. Theory of Computing Systems 54(4). (PR)(CO)(OA)

J159 2014 O. Aichholzer, G. Aloupis, E.D. Demaine, M.L. Demaine, S.P. Fekete, M. Hoffmann, A. Lubiw, J. Snoeyink and A. Winslow: Covering Folded Shapes. Journal of Computational Geometry 5(1). (PR)(CO)(OA)

J160 2014 E.D. Demaine, Y. Okamoto, R. Uehara and Y. Uno: Computational complexity and an integer programming model of Shakashaka. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 97-A(6). (PR)(CO)(OA)

J161 2014 Z. Abel, E.D. Demaine, M.L. Demaine, T. Horiyama and R. Uehara: Computational Complexity of Piano-Hinged Dissections. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 97-A(6). (PR)(CO)(OA)

J162 2014 T. Ito and E.D. Demaine: Approximability of the Subset Sum Reconfiguration Problem. Journal of Combinatorial Optimization 28(3). (PR)(CO)(OA)
J163 2015 W. Son, H.-K. Ahn and S.W. Bae: Group Nearest-Neighbor Queries in the L_1 Plane. Theoretical Computer Science 592. (PR)(CO)(OA)
J164 2015 G.S. Brodal, S. Sioutas, K. Tsichlas and C. Zaroliagis: D^2-Tree: A New Overlay with Deterministic Bounds. Algorithmica 72(3). (PR)(CO)(OA)
J165 2015 S. Pettie and H.-H. Su: Distributed Coloring Algorithms for Triangle-Free Graphs. Information and Computation 243. (PR)(OA)
J166 2015 S. Pettie: Sensitivity Analysis of Minimum Spanning Trees in Sub-Inverse-Ackermann Time. Journal of Graph Algorithms and Applications 19(1). (PR)(OA)

J167 2015 S. Pettie. Three Generalizations of Davenport-Schinzel Sequences. SIAM Journal on Discrete Mathematics 29(4). (PR)(OA)
J168 2015 E. Sebastián-González, B. Sandel, B. Dalsgaard and P. Guimarães. Macroecological trends in nestedness and modularity of seed-dispersal networks. Global Ecology and Biogeography 24(3). (PR)(CO)(OA)
J169 2015 K. Engemann, B. Enquist, B. Sandel, B. Boyle, P. Jørgensen, N. Morueta-Holme, R. Peet, C. Violle and J.-C. Svenning. Limited sampling hampers 'big data' estimation of species richness in a tropical biodiversity hotspot. Ecology and Evolution 5(3). (PR)(CO)(OA)
J170 2015 B. Sandel. Towards a taxonomy of spatial scale-dependence. Ecography 38(4). (PR)
J171 2015 T. M. Chan, S. Durocher, M. Skala and B. T. Wilkinson. Linear-Space Data Structures for Range Minority Query in Arrays. Algorithmica 72(4). (PR)(CO)(OA)
J172 2015 K. G. Larsen, J. I. Munro, J. S. Nielsen and S. Thankachan. On Hardness of Several String Indexing Problems. Theoretical Computer Science 582. (PR)(CO)(OA)
J173 2015 A. Kovacs, U. Meyer, G. Moruz and A. Negoescu. The optimal structure of algorithms for alpha-paging. Information Processing Letters 115(12). (PR)(OA)
J174 2015 A. Termehchy, A. Vakilian, Y. Chodpathumwan and M. Winslett. Cost-Effective Database Design For Information Extraction. ACM Transactions on Database Systems 40(2). (PR)(CO)(OA)
J175 2015 E. D. Demaine, J. Iacono and S. Langerman. Worst-Case Optimal Tree Layout in External Memory. Algorithmica 72(2). (PR)(CO)(OA)
J176 2015 B. Blonder, D. Nogués-Bravo, M.K. Borregaard, J. Donoghue II, P.M. Jørgensen, N.J.B. Kraft, J.-P. Lessard, N. Morueta-Holme, B. Sandel, J.-C. Svenning, C. Violle, C. Rahbek and B.J. Enquist. Linking environmental filtering and disequilibrium to biogeography with a community climate framework. Ecology 96(4). (PR)(CO)
J177 2015 B. Sandel, A.G. Gutiérrez, P.B. Reich, F. Schrodt, J. Dickie and J. Kattge. Estimating the missing species bias in plant trait measurements. Journal of Vegetation Science 26(5). (PR)(CO)
J178 2015 B. Sandel, A.G. Gutiérrez, P.B. Reich, F. Schrodt, J. Dickie and J. Kattge. Late Cenozoic climate change and the phylogenetic structure of regional conifer floras worldwide. Global Ecology and Biogeography 24(10). (PR)(CO)
J179 2015 J.-C. Svenning, W. Eiserhardt, S. Normand, A. Ordonez and B. Sandel. The influence of paleoclimate on present-day patterns in biodiversity and ecosystems. Annual Reviews in Ecology, Evolution and Systematics 46. (PR)(CO)(OA)
J180 2015 M. Pasgaard, N. Strange, B. Dalsgaard, P.K.M. Mendonça and B. Sandel. Geographical imbalances and divides in the scientific production of climate change knowledge. Global Environmental Change 35. (PR)(CO)(OA)
J181 2015 L. Arge and M. Thorup. RAM-efficient External Memory Sorting. Algorithmica 73(4). (PR)(CO)(OA)
J182 2015 G. S. Brodal, G. Moruz and A. Negoescu. OnlineMin: A Fast Strongly Competitive Randomized Paging Algorithm. Theory of Computing Systems 56(1). (PR)(OA)

J183 2015 J. Brody and K. G. Larsen. Adapt or Die: Polynomial Lower Bounds for Non-Adaptive Dynamic Data Structures. Theory of Computing 11(19). (PR)(CO)(OA)
J184 2015 P. Wollstadt, U. Meyer and M. Wibral. A Graph Algorithmic Approach to Separate Direct from Indirect Neural Interactions. PLoS ONE 10(10). (PR)(CO)(OA)
J185 2015 C. Hegde, P. Indyk and L. Schmidt. Approximation Algorithms for Model-Based Compressive Sensing. IEEE Transactions on Information Theory 61(9). (PR)(CO)(OA)
J186 2015 E. D. Demaine, F. Ma, M. Susskind and E. Waingarten. You Should Be Scared of German Ghost. Journal of Information Processing 23(3). (PR)(CO)(OA)
J187 2015 G. Aloupis, E. D. Demaine, A. Guo and G. Viglietta. Classic Nintendo Games are (Computationally) Hard. Theoretical Computer Science 586. (PR)(CO)(OA)
J188 2015 K. Yamanaka, E. D. Demaine, T. Ito, J. Kawahara, M. Kiyomi, Y. Okamoto, T. Saitoh, A. Suzuki, K. Uchizawa and T. Uno. Swapping Labeled Tokens on Graphs. Theoretical Computer Science 586. (PR)(CO)(OA)
J189 2015 A. B. Adcock, E. D. Demaine, M. L. Demaine, M. P. O'Brien, F. Reidl, F. S. Villaamil and B. D. Sullivan. Zig-Zag Numberlink is NP-Complete. Journal of Information Processing 23(3). (PR)(CO)(OA)
J190 2015 E. D. Demaine and M. L. Demaine. Fun with Fonts: Algorithmic Typography. Theoretical Computer Science 586. (PR)(OA)
J191 2015 E. D. Demaine, M. L. Demaine, E. Fox-Epstein, D. A. Hoang, T. Ito, H. Ono, Y. Otachi, R. Uehara and T. Yamada. Linear-time algorithm for sliding tokens on trees. Theoretical Computer Science 600. (PR)(CO)(OA)

J192 2016 E.D. Demaine, M.J. Patitz, T.A. Rogers, R.T. Schweller, S.M. Summers and D. Woods. The Two-Handed Tile Assembly Model is not Intrinsically Universal. Algorithmica 74(2). (PR)(CO)
J193 2016 E.D. Demaine, D. Eppstein, A. Hesterberg, H. Ito, A. Lubiw, R. Uehara and Y. Uno. Folding a paper strip to minimize thickness. Journal of Discrete Algorithms 36. (PR)(CO)
J194 2016 J.-C. Svenning, P.B.M. Pedersen, C.J. Donlan, R. Ejrnaes, S. Faurby, M. Galetti, D.M. Hansen, B. Sandel, C.J. Sandom, J.W. Terborgh and F.W.M. Vera. Science for a wilder Anthropocene - synthesis and future directions for rewilding research. Proceedings of the National Academy of Sciences 113. (PR)(CO)(OA)
J195 2016 J.-C. Svenning, P.B.M. Pedersen, C.J. Donlan, R. Ejrnaes, S. Faurby, M. Galetti, D.M. Hansen, B. Sandel, C.J. Sandom, J.W. Terborgh and F.W.M. Vera. Reply to Rubenstein and Rubenstein: Time to move on from ideological debates on rewilding. Proceedings of the National Academy of Sciences, USA 113. (PR)(CO)(OA)
J196 2016 J. Sonne, A.M. Martin González, P.K. Maruyama, B. Sandel, S. Abrahamczyk, R. Alarcon, A.C. Araujo, F.P. Araujo, S.M. de Azevedo Jr., A.C. Baquero, D. Nogues-Bravo, P.A. Cotton, T.T. Ingversen, G. Kohler, C. Lara, F.M.G. Las-Casas, A.O. Machado, C.G. Machado, M.A. Maglianesi, A.C. Moura, G.M. Olivera, P.E. Oliveira, J.F. Ornelas, L. da Cruz Rodrigues, L. Rosero-Lasprilla, A.M. Rui, M. Sazima, M. Schleuning, A. Timmermann, I.G. Varassin, J. Vinzentin-Bugoni, Z. Wang, S. Watts, J. Fjeldså, J.-C. Svenning, C. Rahbek and B. Dalsgaard. High proportion of smaller ranged hummingbird species coincides with ecological specialization across the Americas. Proceedings of the Royal Society B 283. (PR)(CO)(OA)
J197 2016 G. Feng, L. Mao, B. Sandel, N.G. Swenson and J.-C. Svenning. High plant endemism in China is partially linked to reduced glacial-interglacial climate change. Journal of Biogeography 43. (PR)(CO)(OA)
J198 2016 C. Doughty, A. Wolf, N. Morueta-Holme, P.M. Jørgensen, B. Sandel, C. Violle, B. Boyle, N.J.B. Kraft, R.K. Peet, B.J. Enquist, J.-C. Svenning, S. Blake and M. Galetti. Megafauna extinction, tree species range reduction, and carbon storage in Amazonian forests. Ecography 39. (PR)(CO)(OA)
J199 2016 D. N. Karger, A. Cord, M. Kessler, H. Kreft, I. Kühn, S. Pompe, B. Sandel, J. Sarmento-Cabral, A. Smith, J.-C. Svenning, H. Tuomisto, P. Weigelt and K. Wesche. Delineating probabilistic species pools in ecology and biogeography. Global Ecology and Biogeography 25. (PR)(CO)(OA)
J200 2016 K. Engemann, B. Sandel, B.J. Enquist, P.M. Jørgensen, N.J.B. Kraft, A. Marcuse-Kubitza, B. McGill, N. Morueta-Holme, R.K. Peet, C. Violle, S. Wiser and J.-C. Svenning. Patterns and drivers of plant functional group dominance across the Western Hemisphere - a macroecological re-assessment based on a massive novel botanical dataset. Botanical Journal of the Linnean Society 180. (PR)(CO)(OA)
J201 2016 Y. Li, X. Li, B. Sandel, D. Blank, Z. Liu, X. Liu and S. Yan. Climate and topography explain range sizes of terrestrial vertebrates. Nature Climate Change 6. (PR)(CO)(OA)
J202 2016 K. Milton, D.A. Nolin, K. Ellis, J. Lozier, B. Sandel and E. Lacey. Genetic, spatial, and social relationships among adults in a group of Howler Monkeys (Alouatta palliata) from Barro Colorado Island, Panama. Primates 57. (PR)(CO)(OA)
J203 2016 Z. Ma, B. Sandel and J.-C. Svenning. Phylogenetic assemblage structure of North American trees is more strongly shaped by glacial-interglacial climate variability in gymnosperms than in angiosperms. Ecology and Evolution 6. (PR)(CO)(OA)
J204 2016 D. Arroyuelo, P. Davoodi and S.S. Rao. Succinct Dynamic Cardinal Trees. Algorithmica 74(2). (PR)(CO)(OA)
To appear: C. Tsirogiannis and B. Sandel. PhyloMeasures: A package for computing phylogenetic biodiversity measures and their statistical moments. Ecography. (PR)(CO)

To appear: N. Morueta-Holme, B. Blonder, B. Sandel, B. McGill, R.K. Peet, J. Ott, C. Violle, B. Enquist, P.M. Jørgensen and J.-C. Svenning. A network approach for inferring species associations from co-occurrence data. Ecography. (PR)(CO)
To appear: T. K. Nielsen, B.M. Benito, J.-C. Svenning, B. Sandel, L. McKerracher, F. Reide and P.C. Kjaergaard. Investigating Neanderthal dispersal above 55N in Europe during Marine Isotope Stage 5. Quaternary International. (PR)(CO)
To appear: G.R. Goldsmith, N. Morueta-Holme, B. Sandel, E.D. Fitz, S.D. Fitz, B. Boyle, N. Casler, R. Condit, S. Dolins, J. Donoghue, K. Engemann, P.M. Jørgensen, N.J.B. Kraft, A. Marcuse-Kubitza, B. McGill, R.K. Peet, W. Piel, J. Regetz, M. Schildhauer, N. Spencer, J.-C. Svenning, B. Thiers, C. Violle, S. Wiser and B. Enquist. Plant-O-Matic: A dynamic and mobile field guide to all plants of the Americas. Methods in Ecology and Evolution. (PR)(CO)
To appear: B. Sandel, A.-C. Monnet and M. Vorontsova. Multidimensional structure of grass functional traits among species and assemblages. Journal of Vegetation Science. (PR)(CO)
To appear: J.S. Ku and E.D. Demaine. Folding Flat Crease Patterns With Thick Materials. Journal of Mechanisms and Robotics. (PR)
To appear: T.M. Chan and B.T. Wilkinson. Adaptive and Approximate Orthogonal Range Counting. ACM Transactions on Algorithms. (PR)(CO)
To appear: B. Sandel and C. Tsirogiannis. Species Introductions and the Phylogenetic and Functional Structure of California's Grasses. Ecology. (PR)
To appear: B. Sandel and C. Tsirogiannis. Fast Computation of Measures of Phylogenetic Beta Diversity. PLoS ONE. (PR)
To appear: A. Kalvisa, C. Tsirogiannis, I. Silamikelis, G. Skenders, L. Broka, A. Zirnitis, D. Bandere, I. Jansone and R. Ranka. MIRU-VNTR genotype diversity and indications of homoplasy in M. avium strains isolated from humans and slaughter pigs in Latvia. Journal of Infection, Genetics and Evolution. (PR)(CO)
To appear: L. Barenboim, M. Elkin, S. Pettie and J. Schneider. The Locality of Distributed Symmetry Breaking. Journal of the ACM. (PR)(CO)

Theses
T1 2007 I. Brudaru. Heuristics for Average Diameter Approximation with External Memory Algorithms. MPI MS Thesis.

T2 2007 G. Moruz. Hardware-Aware Algorithms and Data Structures. AU PhD Thesis.
T3 2008 M. Patrascu. Lower Bound Techniques for Data Structures. MIT PhD Thesis.
T4 2008 A. Sidiropoulos. Computational metric embeddings. MIT PhD Thesis.
T5 2008 D. Ajwani. Traversing large graphs in realistic settings. MPI PhD Thesis.
T6 2008 K. Do Ba. Testing closeness of distributions under the EMD metric. MIT MS Thesis.
T7 2008 K. Lai. Complexity of Union-Split-Find Problems. MIT MS Thesis.
T8 2008 J. M. Larsen and M. Nielsen. En undersøgelse af algoritmer til løsning af generalized movers problem i 3D [A study of algorithms for solving the generalized mover's problem in 3D]. AU MS Thesis.
T9 2008 C. Andersen. An optimal minimum spanning tree algorithm. AU MS Thesis.
T10 2008 M. Revsbæk. I/O-efficient Algorithms for Batched Union-Find with Dynamic Set Properties and its Applications to Hydrological Conditioning. AU MS Thesis.
T11 2008 A. H. Jensen. I/O-efficient Processing of LIDAR Data. AU MS Thesis.
T12 2009 M. Olsen. Link Building. AU PhD Thesis.
T13 2009 T. Mølhave. Handling Massive Terrains and Unreliable Memory. AU PhD Thesis.
T14 2009 H. B. Kirk. Searching with Dynamic Optimality: In Theory and Practice. AU MS Thesis.
T15 2009 K. Piatkowski. Implementering og udvikling af maksimum delsum algoritmer [Implementation and development of maximum sum subsequence algorithms]. AU MS Thesis.
T16 2009 O. Weimann. Accelerating Dynamic Programming. MIT PhD Thesis.
T17 2009 V. Weichert. Radiation parameterization of the climate model COSMO/CLM in CUDA. FRA MS Thesis.
T18 2009 R. Berinde. Advances in Sparse Signal Recovery Methods. MIT MS Thesis.
T19 2009 P. Davoodi. Two Dimensional Range Minimum Queries. AU MS Thesis.
T20 2009 K. Tsakalidis. External Memory 3-sided Planar Range Reporting and Persistent B-Trees. AU MS Thesis.
T21 2009 L. Deleuran. Polygonal Line Simplification. AU MS Thesis.
T22 2010 A. G. Jørgensen. Data Structures: Sequence Problems, Range Queries, and Fault Tolerance. AU PhD Thesis.
T23 2010 J. Moeslund. Fine-resolution geospatial modelling of contemporary and potential future plant diversity in Denmark. AU MS Thesis.
T24 2010 J. Truelsen. Working Set Implicit Dictionaries and Range Mode Lower Bounds and Approximations. AU MS Thesis.
T25 2010 M. Greve. Online Sorted Range Reporting and Approximating the Mode. AU MS Thesis.
T26 2010 D. Kjær. Range Median Algorithms. AU MS Thesis.
T27 2010 J. Suhr Christensen. Experimental Study of Kinetic Geometric t-Spanner Algorithms. AU MS Thesis.
T28 2011 K. G. Larsen. Optimal Orthogonal Range Reporting in 3-d. AU MS Thesis.
T29 2011 C. Kejlberg-Rasmussen. On Implicit Dictionaries with the Working-Set Property and Catenable Priority Queues with Attrition. AU MS Thesis.

T30 2011 P. Davoodi. Data Structures: Range Queries and Space Efficiency. AU PhD Thesis.
T31 2011 K. Tsakalidis. Dynamic Data Structures: Orthogonal Range Queries and Update Efficiency. AU PhD Thesis.
T32 2011 J. Nelson. Sketching and Streaming High-Dimensional Vectors. MIT PhD Thesis.
T33 2012 J. E. Moeslund. The role of topography in determining local plant diversity patterns across Denmark. AU PhD Thesis.
T34 2012 F. van Walderveen. External Memory Graph Algorithms and Range Searching Data Structures. AU PhD Thesis.
T35 2012 L. Deleuran. Homotopic Polygonal Line Simplification. AU PhD Thesis.
T36 2012 C. Neumann. Practical Paging Algorithms. FRA MS Thesis.
T37 2012 D. Veith. Implementation of an External-Memory Diameter Approximation. FRA MS Thesis.
T38 2012 M. Sturmann. k-Dimensionale Orthogonale Bereichsanfragen für GPUs auf großen Instanzen [k-dimensional orthogonal range queries for GPUs on large instances]. FRA MS Thesis.
T39 2012 P. Wollstadt. A Graph Algorithmic Approach to Separate Direct from Indirect Neural Interactions by Identifying Alternative Paths with Similar Weights. FRA BS Thesis.
T40 2012 E. Deza. An efficient implementation of the optimal paging algorithm. FRA BS Thesis.
T41 2012 T. Morgan. Map Folding. MIT MS Thesis.
T42 2012 R. Gupta. A Compressive Sensing Algorithm for Attitude Determination. MIT MS Thesis.
T43 2012 A. Koefoed-Hansen. Representations for Path Finding in Planar Environments. AU MS Thesis.
T44 2013 K.G. Larsen. Models and Techniques for Proving Data Structure Lower Bounds. AU PhD Thesis. (OA)
T45 2013 C. Kejlberg-Rasmussen. Dynamic Data Structures: The Interplay of Invariants and Algorithm Design. AU PhD Thesis. (OA)

T46 2013 J. Fogh. Engineering a Fast Fourier Transform. AU MS Thesis. (OA)
T47 2013 M. Holt and J. Johansen. Computing Triplet and Quartet Distances. AU MS Thesis. (OA)
T48 2013 J. Schou. Range Minimum Data Structures. AU MS Thesis.
T49 2013 A. Negoescu. Design of Competitive Paging Algorithms with Good Behaviour in Practice. FRA PhD Thesis. (OA)
T50 2013 D. Pick. Effiziente Algorithmen auf Eingebetteten Plattformen [Efficient algorithms on embedded platforms]. FRA MS Thesis.
T51 2013 T. Timmer. I/O-effiziente Durchmesser-Approximierung auf gewichteten Graphen [I/O-efficient diameter approximation on weighted graphs]. FRA MS Thesis.
T52 2013 D. Frascaria. Improved results for (h,k)-paging. FRA MS Thesis.
T53 2013 S. Försch. An efficient implementation of PARTITION2. FRA MS Thesis.
T54 2013 A. Kehlenbach. Interaktive Stadtplanungsmaßnahmen auf der GPU [Interactive urban planning measures on the GPU]. FRA MS Thesis.
T55 2013 S. Bechtold. Shortest Paths with Multiple Constraints in Flight Networks. FRA MS Thesis.
T56 2013 V. Ceikute. Inferring Groups of Objects, Preferred Routes, and Facility Locations from Trajectories. AU PhD Thesis. (OA)
T57 2013 E. Price. Sparse Recovery and Fourier Sampling. MIT PhD Thesis.
T58 2013 L. Schmidt. Model-Based Compressive Sensing with Earth Mover's Distance Constraints. MIT MS Thesis.
T59 2013 S. Mahabadi. Approximate Nearest Neighbor And Its Many Variants. MIT MS Thesis.
T60 2013 F. Mogensen. Locating Points of Interest Based on Geo-tagged Tweets. AU MS Thesis.

T61 2013 C.W. Schmidt. Indicering af spatio-tekstuelle data – et empirisk studie [Indexing of spatio-textual data – an empirical study]. AU MS Thesis.
T62 2014 J.M. Friis and S.B. Olesen. An Experimental Comparison of Max Flow Algorithms. AU MS Thesis. (OA)
T63 2014 D. W. Petersen. Orthogonal Range Skyline Counting Queries. AU MS Thesis. (OA)
T64 2014 J. Kunert. Hashing and Random Graphs. AU MS Thesis. (OA)
T65 2014 B. Mortensen. Algorithms for Computing Convex Hulls Using Linear Programming. AU MS Thesis. (OA)
T66 2014 M. Revsbæk. Handling Massive and Dynamic Terrain Data. AU MS Thesis. (OA)
T67 2014 A. Skovsgaard. Indexing, Query Processing, and Clustering of Spatio-Temporal Text Objects. AU PhD Thesis.
T68 2014 Samir van de Sand. Eine adaptive Tabusuche für das Vehicle Routing Problem mit Zeitfenstern [An adaptive tabu search for the vehicle routing problem with time windows]. FRA MS Thesis.
T69 2014 Morteza Zadimoghaddam. Online Allocation Algorithms with Applications in Computational Advertising. MIT PhD Thesis.
T70 2015 J. Truelsen. Space Efficient Data Structures and External Terrain Algorithms. AU PhD Thesis.
T71 2015 J. Yang. Efficient Algorithms for Handling Massive Terrains. AU PhD Thesis.
T72 2015 J. S. Nielsen. Implicit Data Structures, Sorting, and Text Indexing. AU PhD Thesis.
T73 2015 C. Jespersen. Monte Carlo Evaluation of Financial Options Using a GPU. AU MS Thesis. (OA)
T74 2015 B. Mortensen. Algorithms for Computing Convex Hulls Using Linear Programming. AU MS Thesis. (OA)

T75 2015 K. V. Ebbesen. On the Practicality of Data-Oblivious Sorting. AU MS Thesis. (OA)
T76 2015 J. H. Knudsen and R. L. Pedersen. Engineering Rank and Select Queries on Wavelet Trees. AU MS Thesis. (OA)
T77 2015 J. C. C. Jensen. Event Detection in Soccer using Spatio-Temporal Data. AU MS Thesis. (OA)
T78 2015 M. E. Hougaard. On the Complexity of Red-Black Trees for Higher Dimensions. AU MS Thesis. (OA)
T79 2015 S. N. Madsen and R. Hallenberg-Larsen. Computing Set Operations on Simple Polygons Using Binary Space Partition Trees. AU MS Thesis. (OA)
T80 2015 L. Walther. Intersection of Convex Objects in the Plane. AU MS Thesis. (OA)
T81 2015 A. S-H. Vinther and M. S-H. Vinther. Pathfinding in Two-dimensional Worlds. AU MS Thesis. (OA)
T82 2015 T. Sandholt and M. B. Christensen. Geometric Measures of Depth. AU MS Thesis. (OA)
T83 2015 M. Ravn. Orthogonal Range Searching in 2D with Ball Inheritance. AU MS Thesis. (OA)
T84 2015 H. Hassanieh. The Sparse Fourier Transform: Theory & Practice. MIT PhD Thesis.
T85 2015 N. Schepsen. STXXL - Paging-Algorithmen [STXXL paging algorithms]. FRA BS Thesis.
T86 2015 D. Roß. Cohesion in scientific collaboration and citation networks. FRA MS Thesis.
T87 2016 J. Landbo and C. Green. Range Mode Queries in Arrays. AU MS Thesis. (OA)
T88 2016 H.K. Christensen. Algorithms for Finding Dominators in Directed Graphs. AU MS Thesis. (OA)

Other
O1 2008 E. Demaine, B. Gassend, J. O'Rourke and G. T. Toussaint. All Polygons Flip Finitely ... Right? In "Surveys on Discrete and Computational Geometry: Twenty Years Later", Contemporary Mathematics 453. (CO)
O2 2008 A. Andoni and P. Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Communications of the ACM 51(1). (CO)
O3 2008 K. Mehlhorn and P. Sanders. Algorithms and Data Structures: The Basic Toolbox. Springer Verlag. (CO)
O4 2009 D. Ajwani and U. Meyer. Design and Engineering of External Memory Traversal Algorithms for General Graphs. In Algorithmics of Large and Complex Networks, Springer Verlag. (PR)
O5 2009 L. Arge and N. Zeh. External-memory Algorithms and Data Structures. In Algorithms and Theory of Computation Handbook, CRC Press. (PR)(CO)
O6 2009 R. Hearn and E. Demaine. Games, Puzzles, and Computation. A.K. Peters. (CO)
O7 2010 D. Ajwani and H. Meyerhenke. Realistic Computer Models. In Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice, Springer Verlag. (CO)
O8 2011 H. Balslev, L. Arge, J.-C. Svenning, M. H. Schierup and C. S. Jensen. Abstracts of Royal Danish Academy of Sciences Symposium on Biodiversity in the Silicon Age. (CO)
O9 2012 L. Arge and K. G. Larsen. I/O-Efficient Spatial Data Structures for Range Queries. Invited abstract in SIGSPATIAL Special, July 2012.
O10 2012 B. Sandel, L. Arge, B. Dalsgaard, R.G. Davies, K.J. Gaston, W.J. Sutherland and J.-C. Svenning. Response - Global endemism needs spatial integration. Science 335. (CO)
O11 2014 U. Meyer and N. Zeh. I/O-model. Encyclopedia of Algorithms, second edition. (CO)
O12 2014 U. Meyer and N. Zeh. List-Ranking. Encyclopedia of Algorithms, second edition. (CO)

C. Center statistics

In the following, the center statistics collected by the Foundation are enclosed. Note that since the same statistics are collected for all centers, not all of them are particularly relevant or meaningful for MADALGO. For example, the tradition in the algorithms field is that only researchers who contributed very substantially to a result appear as co-authors of the resulting paper, and authors are (almost) always listed alphabetically; the statistics on "center leader as co-author" and "center leader is either first or last author" are therefore not very relevant. Note also that publication and citation statistics are given in Section B.

Center 84: Center for Massive Data Algorithmics (DG.84/26-5-2016). Date: 26-05-2016

[Chart: Publications, and publications per VIP FTE, by type (book chapters, conference proceedings, journal articles, monographs and other means of publication), each split into peer reviewed and not peer reviewed]

[Chart: International conferences: number of conferences organized and invited talks]

No journal impact factor data available. No citation data available.

[Chart: External relations (Danish and international companies and universities/research groups), public outreach, and external funding (mill. DKK)]

[Chart: Educational activities (ECTS points taught, number of courses taught), patents (submitted patent applications, granted patents, spin-off companies established) and awards]

[Chart: Faculty, 2007-2015: total and international staff and full-time equivalents]

[Chart: Post-docs, 2007-2015: total and international staff and full-time equivalents]

[Chart: Ph.D. students, 2007-2015: total and international staff, full-time equivalents and PhD degrees]

[Chart: Guest scientists, 2007-2015: total and international staff and full-time equivalents]

[Chart: TAP & administration, 2007-2015: technical and administrative personnel]

[Chart: Center leader as co-author, 2007-2015: total number of center papers and number co-authored by the center leader]

[Chart: Center leader as first or last author, 2007-2015: percentage of papers]

[Chart: Number of authors per paper, 2007-2015: max, average and min]

[Chart: 2015 gender distribution by scientific degree (Master students, Ph.D. students, post-docs, associate professors, professors, faculty), compared to the national distribution for the natural sciences]

Number of scientists included: Ph.D. students: 20; Post-docs: 8; Associate Professors: 1; Professors: 6; Not included: 0.

Selected publications

The ten papers below were selected in an attempt both to cover the breadth of the center's research and to showcase some of the main center results. This is obviously not a simple task, and many important and interesting papers could not be selected. The number of each paper in the center publication list is given below each paper, along with a few keywords describing the paper.

1. L. Arge and M. Thorup. RAM-Efficient External Memory Sorting. Proc. International Symposium on Algorithms and Computation (ISAAC), 2013. C267 (I/O and internal memory optimality – best paper award)

2. L. Arge, M. Revsbæk and N. Zeh. I/O-Efficient Computation of Water Flow Across a Terrain. Proc. Symposium on Computational Geometry (SoCG), 2010. C127 (I/O and terrain flood risk)

3. G.S. Brodal, E. Demaine, J.T. Fineman, J. Iacono, S. Langerman and J.I. Munro. Cache-Oblivious Dynamic Dictionaries with Optimal Update/Query Tradeoff. Proc. Symposium on Discrete Algorithms (SODA), 2010. C119 (Cache-oblivious data structures)

4. D.M. Kane, J. Nelson and D.P. Woodruff. An Optimal Algorithm for the Distinct Elements Problem. Proc. Symposium on Principles of Database Systems (PODS), 2010. C112 (Streaming algorithms – best paper award)

5. K. G. Larsen, J. Nelson and H. L. Nguyen. Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms. Proc. Symposium on Theory of Computing (STOC), 2015. C382 (Streaming algorithms)

6. C. Alexander, L. Arge, P.K. Bøcher, M. Revsbæk, B. Sandel, J.-C. Svenning, C. Tsirogiannis and J. Yang. Computing River Floods Using Massive Terrain Data. Proc. International Conference on Geographic Information Science (GIScience), 2016. C431 (I/O, algorithm engineering and terrain flood risk)

7. B. Sandel, L. Arge, B. Dalsgaard, R. Davies, K. Gaston, W. Sutherland and J.-C. Svenning. The influence of Late Quaternary climate-change velocity on species endemism. Science 334. J34 (Interdisciplinary collaboration and algorithm engineering)

8. M. Patrascu. Succincter. Proc. Symposium on Foundations of Computer Science (FOCS), 2008. C23 (Succinct data structures – best student paper award)

9. G.S. Brodal, G. Lagogiannis and R.E. Tarjan. Strict Fibonacci Heaps. Proc. Symposium on Theory of Computing (STOC), 2012. C214 (Classical data structures)

10. K. G. Larsen. The Cell Probe Complexity of Dynamic Range Counting. Proc. Symposium on Theory of Computing (STOC), 2012. C199 (Lower bounds – best paper and best student paper award)

RAM-Efficient External Memory Sorting

Lars Arge¹ and Mikkel Thorup²

¹ MADALGO, Aarhus University, Aarhus, Denmark
² University of Copenhagen,† Copenhagen, Denmark

Abstract. In recent years a large number of problems have been considered in external memory models of computation, where the complexity measure is the number of blocks of data that are moved between slow external memory and fast internal memory (also called I/Os). In practice, however, internal memory time often dominates the total running time once I/O-efficiency has been obtained. In this paper we study algorithms for fundamental problems that are simultaneously I/O-efficient and internal memory efficient in the RAM model of computation.

1 Introduction

In the last two decades a large number of problems have been considered in the external memory model of computation, where the complexity measure is the number of blocks of elements that are moved between external and internal memory. Such movements are also called I/Os. The motivation behind the model is that random access to external memory, such as disks, is often many orders of magnitude slower than random access to internal memory; on the other hand, if external memory is accessed sequentially in large enough blocks, then the cost per element is small. In fact, disk systems are often constructed such that the time spent on a block access is comparable to the time needed to access each element in a block in internal memory. Although the goal of external memory algorithms is to minimize the number of costly blocked accesses to external memory when processing massive datasets, it is also clear from the above that if the internal processing time per element in a block is large, then the practical running time of an I/O-efficient algorithm is dominated by internal processing time. Often I/O-efficient algorithms are in fact not only efficient in terms of I/Os, but can also be shown to be internal memory efficient in the comparison model. Still, in many cases the practical running time of I/O-efficient algorithms is dominated by the internal computation time. Thus both from a practical and a theoretical point of view it is interesting to investigate

- Supported in part by the Danish National Research Foundation and the Danish National Advanced Technology Foundation.
- Supported in part by an Advanced Grant from the Danish Council for Independent Research under the Sapere Aude research career program.
- Center for Massive Data Algorithmics—a center of the Danish National Research Foundation.
† Part of this work was done while the author was at AT&T Labs–Research.

L. Cai, S.-W. Cheng, and T.-W. Lam (Eds.): ISAAC 2013, LNCS 8283, pp. 491–501, 2013. © Springer-Verlag Berlin Heidelberg 2013

how internal-memory efficient algorithms can be obtained while simultaneously ensuring that they are I/O-efficient. In this paper we consider algorithms that are both I/O-efficient and efficient in the RAM model in internal memory.
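To get a feel for the magnitudes involved, the following sketch compares N random disk accesses with the number of blocked I/Os an external merge-sort performs. The parameter values are our own, chosen purely for illustration; they do not come from the paper:

```python
import math

# Hypothetical parameters (chosen for illustration only):
N = 10**9  # input elements
M = 10**8  # elements fitting in main memory
B = 10**4  # elements per disk block

scan_ios = N // B                                 # one scan costs N/B I/Os
merge_phases = math.ceil(math.log(N / M, M / B))  # O(log_{M/B}(N/M)) merge phases
sort_ios = scan_ios * (1 + merge_phases)          # run formation + merge phases

# Sorting needs only a few scans' worth of blocked I/Os, while N random
# accesses would cost several orders of magnitude more.
print(f"random accesses: {N:,}")
print(f"I/Os for one scan: {scan_ios:,}")
print(f"I/Os for external merge-sort: {sort_ios:,}")
```

With these numbers a single merge phase suffices, so sorting a billion elements takes on the order of 2·10^5 block transfers rather than 10^9 random accesses.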

Previous results. We will be working in the standard external memory model of computation, where M is the number of elements that fit in main memory and an I/O is the process of moving a block of B consecutive elements between external and internal memory [1]. We assume that N ≥ 2M, M ≥ 2B and B ≥ 2. Computation can only be performed on elements in main memory, and we will assume that each element consists of one word. We will sometime assume the comparison model in internal memory, that is, that the only computation we can do on elements are comparisons. However, most of the time we will assume the RAM model in internal memory. In particular, we will assume that we can use elements for addressing, e.g. trivially implementing permuting in linear time. Our algorithms will respect the standard so-called indivisibility assumption,which states that at any given time during an algorithm the original N input elements are stored somewhere in external or internal memory. Our internal memory time measure is simply the number of performed operations; note that this includes the number of elements transferred between internal and external memory. N N Aggarwal and Vitter [1] described sorting algorithms using O( B logM/B B ) I/Os. One of these algorithms, external merge-sort, is based on Θ(M/B)-way merging. First O(N/M) sorted runs are formed by repeatedly sorting M elements in main memory, and then these runs are merged together Θ(M/B)atatime N to form longer runs. The process continues for O(logM/B M ) phases until one is left with one sorted list. Since the initial run formation and each phase can be O N/B O N N performed in ( ) I/Os, the algorithm uses ( BlogM/B B ) I/Os. Another algorithm, external distribution-sort, is based on Θ( M/B)-way splitting. The N input elements are first split into Θ( M/B) sets of roughly equal size, such that the elements in the first set are all smaller than the elements in the second O √ N set, and so on. 
Each of the sets are then split recursively. After O(log_{√(M/B)}(N/M)) = O(log_{M/B}(N/M)) split phases each set can be sorted in internal memory. Although performing the split is somewhat complicated, each phase can still be performed in O(N/B) I/Os. Thus also this algorithm uses O((N/B) log_{M/B}(N/B)) I/Os. Aggarwal and Vitter [1] proved that external merge- and distribution-sort are I/O-optimal when the comparison model is used in internal memory, and in the following we will use sort_E(N) to denote the number of I/Os per block of elements of these optimal algorithms, that is, sort_E(N) = O(log_{M/B}(N/B)) and external comparison model sort takes Θ((N/B)·sort_E(N)) I/Os. (As described below, the I/O-efficient algorithms we design will move O(N·sort_E(N)) elements between internal and external memory, so O(sort_E(N)) will also be the per element internal memory cost of obtaining external efficiency.) When no assumptions other than the indivisibility assumption are made about internal memory computation (i.e. covering our definition of the use of the RAM model in internal memory), Aggarwal and Vitter [1] proved that permuting N elements according to a given permutation requires Ω(min{N, (N/B)·sort_E(N)}) I/Os. Thus this is also a lower bound for RAM model sorting. For all practical values of N, M and B the bound is Ω((N/B)·sort_E(N)). Subsequently, a large number of I/O-efficient algorithms have been developed. Of particular relevance for this paper, several priority queues have been developed where insert and deletemin operations can be performed in O((1/B)·sort_E(N)) I/Os amortized [2,4,8]. The structure by Arge [2] is based on the so-called buffer-tree technique, which uses O(M/B)-way splitting, whereas the other structures also use O(M/B)-way merging. In the RAM model the best known sorting algorithm uses O(N log log N) time [6].
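To get a feel for these bounds, here is a small Python sketch (function names are my own, chosen for illustration) that counts the merge passes behind sort_E(N) = O(log_{M/B}(N/B)) and evaluates the Aggarwal–Vitter permuting bound min{N, (N/B)·sort_E(N)} for realistic parameters, where the second term is the smaller one.

```python
def sort_E(N, M, B):
    """Number of Theta(M/B)-way passes over the data: ceil(log_{M/B}(N/B)).

    Computed with integers to avoid floating-point log rounding: start from
    runs of one block and multiply the run length by the fan-out M/B until
    a single run covers all N elements.
    """
    passes, run_len = 0, B
    while run_len < N:
        run_len *= M // B
        passes += 1
    return passes

def permute_lower_bound(N, M, B):
    """Aggarwal-Vitter permuting bound: min{N, (N/B) * sort_E(N)} I/Os."""
    return min(N, (N // B) * sort_E(N, M, B))

# Realistic parameters: 10^9 elements, 10^6 fitting in memory, blocks of 10^3.
N, M, B = 10**9, 10**6, 10**3
print(sort_E(N, M, B))              # 2 passes suffice
print(permute_lower_bound(N, M, B)) # 2000000, far below the trivial bound N
```

For these values the (N/B)·sort_E(N) term is three orders of magnitude below N, which is the "practical case" the paper refers to.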
Similar to the I/O-case, we use sort_I(N) = O(log log N) to denote the per element cost of the best known sorting algorithm. If randomization is allowed then this can be improved to O(√(log log N)) expected time [7]. A priority queue can also be implemented so that the cost per operation is O(sort_I(N)) [9].

Our results. In Section 2 we first discuss how both external merge-sort and external distribution-sort can be implemented to use optimal O(N log N) time if the comparison model is used in internal memory, by using an O(N log N) sorting algorithm and (in the merge-sort case) an O(log N) priority queue. We also show how these algorithms can relatively easily be modified to use

O(N · (sort_I(N) + sort_I(M/B) · sort_E(N))) and

O(N · (sort_I(N) + sort_I(M) · sort_E(N))) time, respectively, if the RAM model is used in internal memory, by using an O(N · sort_I(N)) sorting algorithm and an O(sort_I(N)) priority queue. The question is of course if the above RAM model sorting algorithms can be improved. In Section 2 we discuss how it seems hard to improve the running time of the merge-sort algorithm, since it uses a priority queue in the merging step. By using a linear-time internal-memory splitting algorithm, however, rather than an O(N · sort_I(N)) sorting algorithm, we manage to improve the running time of external distribution-sort to

O(N · (sort_I(N) + sort_E(N))). Our new split-sort algorithm still uses O((N/B)·sort_E(N)) I/Os. Note that for small values of M/B the N·sort_E(N)-term, that is, the time spent on moving elements between internal and external memory, dominates the internal time. Given the conventional wisdom that merging is superior to splitting in external memory, it is also surprising that a distribution algorithm outperforms a merging algorithm.

In Section 3 we develop an I/O-efficient RAM model priority queue by modifying the buffer-tree based structure of Arge [2]. The main modification consists of removing the need for sorting of O(M) elements every time a so-called buffer-emptying process is performed. The structure supports insert and deletemin operations in O((1/B)·sort_E(N)) I/Os and O(sort_I(N) + sort_E(N)) time. Thus it can be used to develop another O((N/B)·sort_E(N)) I/O and O(N·(sort_I(N) + sort_E(N))) time sorting algorithm.

Finally, in Section 4 we show that when (N/B)·sort_E(N) = o(N) (and our sorting algorithms are I/O-optimal), any I/O-optimal sorting algorithm must transfer a number of elements between internal and external memory equal to Θ(B) times the number of I/Os it performs, that is, it must transfer Ω(N · sort_E(N)) elements and thus also use Ω(N · sort_E(N)) internal time. In fact, we show a lower bound on the number of I/Os needed by an algorithm that transfers b ≤ B elements on the average per I/O, significantly extending the lower bound of Aggarwal and Vitter [1]. The result implies that (in the practically realistic case) when our split-sort and priority queue sorting algorithms are I/O-optimal, they are in fact also CPU optimal in the sense that their running time is the sum of an unavoidable term and the time used by the best known RAM sorting algorithm.
As mentioned above, the lower bound also means that the time spent on moving elements between internal and external memory, resulting from the fact that we are considering I/O-efficient algorithms, can dominate the internal computation time; that is, considering I/O-efficient algorithms implies that less internal-memory efficient algorithms can be obtained than if not considering I/O-efficiency. Furthermore, we show that when B ≤ M^{1−ε} for some constant ε > 0 (the tall cache assumption) the same Ω(N·sort_E(N)) number of transfers is needed for any algorithm using less than εN/4 I/Os (even if it is not I/O-optimal).

To summarize our contributions, we open up a new area of algorithms that are both RAM-efficient and I/O-efficient. The area is interesting from both a theoretical and practical point of view. We illustrate that existing algorithms, in particular multiway merging based algorithms, are not RAM-efficient, and develop a new sorting algorithm that is both efficient in terms of I/O and RAM time, as well as a priority queue that can be used in such an efficient algorithm. We prove a lower bound that shows that our algorithms are both I/O and internal-memory RAM model optimal. The lower bound significantly extends the Aggarwal and Vitter lower bound [1], and shows that considering I/O-efficient algorithms influences how efficient internal-memory algorithms can be obtained.

2 Sorting

External merge-sort. In external merge-sort Θ(N/M) sorted runs are first formed by repeatedly loading M elements into main memory, sorting them, and writing them back to external memory. In the first merge phase these runs are merged together Θ(M/B) at a time to form longer runs. The merging is continued for O(log_{M/B}(N/M)) = O(sort_E(N)) merge phases until one is left with one sorted run. It is easy to realize that M/B runs can be merged together in O(N/B) I/Os: We simply load the first block of each of the runs into main memory, find and output the B smallest elements, and continue this process while loading a new block from the relevant run every time all elements in main memory from that particular run have been output. Thus external merge-sort uses O((N/B) log_{M/B}(N/M)) = O((N/B)·sort_E(N)) I/Os.

In terms of internal computation time, the initial run formation can trivially be performed in O(N/M · M log M) = O(N log M) time using any O(N log N) internal sorting algorithm. Using an O(log(M/B)) priority queue to hold the minimal element from each of the M/B runs during a merge, each of the O(log_{M/B}(N/M)) merge phases can be performed in O(N log(M/B)) time. Thus external merge-sort can be implemented to use O(N log M + log_{M/B}(N/M) · N log(M/B)) = O(N log M + N log(N/M)) = O(N log N) time, which is optimal in the comparison model.

When the RAM model is used in internal memory, we can improve the internal time by using a RAM-efficient O(M · sort_I(M)) algorithm in the run formation phase and by replacing the O(log(M/B)) priority queue with an O(sort_I(M/B)) time priority queue [9]. This leads to an O(N · (sort_I(M) + sort_I(M/B) · sort_E(N))) algorithm. There seems no way of avoiding the extra sort_I(M/B)-term, since that would require an O(1) priority queue.
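One merge phase of the algorithm just described can be sketched as follows; Python's heapq stands in for the priority queue over run minima, and in-memory lists stand in for the on-disk runs, so this illustrates only the time analysis, not the I/O behaviour. Names are my own.

```python
import heapq

def merge_runs(runs):
    """Merge k sorted runs into one sorted run.

    A heap holds the current minimum of each run, so producing each of the
    N output elements costs O(log k) heap work -- with k = M/B runs per
    merge, this is the O(N log(M/B)) time per merge phase from the text.
    """
    # Seed the heap with the first element of every non-empty run.
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(runs[i]):
            # Advance the cursor of run i (analogous to loading the next
            # block of that run once its in-memory elements are consumed).
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

print(merge_runs([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))  # [1, 2, ..., 9]
```

Replacing the binary heap with an O(sort_I(M/B))-per-operation RAM priority queue is exactly the improvement the paragraph above describes.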

External distribution-sort. In external distribution-sort the input set of N elements is first split into √(M/B) sets X_0, X_1, …, X_{√(M/B)−1}, defined by s = √(M/B) − 1 split elements x_1 < … < x_s …

Split-sort. While it seems hard to improve the RAM running time of the external merge-sort algorithm, we can actually modify the external distribution-sort algorithm further and obtain an algorithm that in most cases is optimal both in terms of I/O and time. This split-sort algorithm basically works like the distribution-sort algorithm with the split algorithm modification described above. However, we need to modify the algorithm further in order to avoid the sort_I(M)-term in the time bound that appears due to the repeated sorting of O(M) elements in the split element finding algorithm, as well as in the actual split algorithm. First of all, instead of sorting each batch of M/2 elements in the split algorithm to split them over s = M/B − 1 < M/2 split elements, we use a previous result that shows that we can actually perform the split in linear time.

Lemma 1 (Han and Thorup [7]). In the RAM model N elements can be split over N^{1−ε} split elements in linear time and space for any constant ε > 0.

Secondly, in order to avoid the sorting in the split element finding algorithm of Aggarwal and Vitter [1], we design a new algorithm that finds the split elements on-line as part of the actual split algorithm, that is, we start the splitting with no split elements at all and gradually add at most s = M/B − 1 split elements one at a time. An online split strategy was previously used by Frigo et al. [5] in a cache-oblivious algorithm setting. More precisely, our algorithm works as follows. To split N input elements we, as previously, repeatedly bring M/2 elements into main memory and distribute them to buffers using the current split elements and Lemma 1, while outputting the B elements in a buffer when it runs full. However, during the process we keep track of how many elements are output to each subset.
If the number of elements in a subset X_i becomes 2N/s we pause the split algorithm, compute the median of X_i, add it to the set of splitters, and split X_i at the median element into two sets of size N/s. Then we continue the splitting algorithm. It is easy to see that the above splitting process results in at most s + 1 subsets containing between N/s and 2N/s − 1 elements each, since a set is split when it has 2N/s elements and each new set (defined by a new split element) contains at least N/s elements. The actual median computation and the split of X_i can be performed in O(|X_i|) = O(N/s) time and O(|X_i|/B) = O(N/(sB)) I/Os [1]. Thus if we charge this cost to the at least N/s elements that were inserted in X_i since it was created, each element is charged O(1) time and O(1/B) I/Os. Thus each distribution phase is performed in linear time and O(N/B) I/Os, leading to an O(N · (sort_I(M) + sort_E(N))) time algorithm.

Theorem 1. The split-sort algorithm can be used to sort N elements in O(N · (sort_I(M) + sort_E(N))) time and O((N/B)·sort_E(N)) I/Os.

Remarks. Since sort_I(M) + sort_E(N) ≥ sort_I(N), our split-sort algorithm uses Ω(N · sort_I(N)) time. In Section 4 we prove that the algorithm in some sense is optimal both in terms of I/O and time. Furthermore, we believe that the algorithm is simple enough to be of practical interest.

3 Priority Queue

In this section we discuss how to implement an I/O- and RAM-efficient priority queue by modifying the I/O-efficient buffer tree priority queue [2].

Structure. Our external priority queue consists of a fan-out √(M/B) B-tree [3] T over O(N/M) leaves containing between M/2 and M elements each. In such a tree, all leaves are on the same level and each node (except the root) has fan-out between (1/2)√(M/B) and √(M/B) and contains at most √(M/B) splitting elements defining the element ranges of its children. Thus T has height O(log_{√(M/B)}(N/M)) = O(sort_E(N)). To support insertions efficiently in a “lazy” manner, each internal node is augmented with a buffer of size M and an insertion buffer of size at most B is maintained in internal memory. To support deletemin operations efficiently, a RAM-efficient priority queue [9] supporting both deletemin and deletemax¹, called the mini-queue, is maintained in main memory containing the up to M/2 smallest elements in the priority queue.

Insertion. To perform an insertion we first check if the element to be inserted is smaller than the maximal element in the mini-queue, in which case we insert the new element in the mini-queue and continue the insertion process with the currently maximal element in the mini-queue. Next we insert the element to be inserted in the insertion buffer. When we have collected B elements in the insertion buffer we insert them in the buffer of the root. If this buffer now contains more than M/2 elements we perform a buffer-emptying process on it, “pushing” elements in the buffer one level down to buffers on the next level of T: We load the M/2 oldest elements into main memory along with the fewer than √(M/B) splitting elements, distribute the elements among the splitting elements, and finally output them to the buffers of the relevant children. Since the splitting and buffer elements fit in memory and the buffer elements are distributed to √(M/B) buffers one level down, the buffer-emptying process is performed in O(M/B) I/Os. Since we distribute M/2 elements using √(M/B) splitters the process can

be performed in O(M) time (Lemma 1). After emptying the buffer of the root some of the nodes on the next level may contain more than M/2 elements. If they do we perform recursive buffer-emptying processes on these nodes. Note that this way buffers will never contain more than M elements. When (between 1 and M/2) elements are pushed down to a leaf (when performing a buffer-emptying process on its parent) resulting in the leaf containing more than M (and less than 3M/2) elements, we split it into two leaves containing between M/2 and 3M/4 elements each. We can easily do so in O(M/B) I/Os and O(M) time [1]. As a result of the split the parent node v gains a child, that is, a new leaf is inserted. If needed, T is then rebalanced using node splits as in a normal B-tree, that is, if the parent node now has √(M/B) children it is split into two nodes with (1/2)√(M/B) children each, while also distributing the elements in v's buffer among the two new nodes. This can easily be accomplished in O(M/B) I/Os and O(M) time. The rebalancing may propagate up along the path to the root (when the root splits, a new root with two children is constructed).

During buffer-emptying processes we push Θ(M) elements one level down the tree using O(M/B) I/Os and O(M) time. Thus each element inserted in the root buffer pays O(1/B) I/Os and O(1) time amortized, or O((1/B) log_{M/B}(N/B)) = O((1/B)·sort_E(N)) I/Os and O(log_{M/B}(N/B)) = O(sort_E(N)) time amortized on buffer-emptying processes on a root-leaf path. When a leaf splits we may use O(M/B) I/Os and O(M) time in each node of a leaf-root path of length O(sort_E(N)). Amortizing among the at least M/4 elements that were inserted in the leaf since it was created, each element is charged an additional O((1/B)·sort_E(N)) I/Os and O(sort_E(N)) time on insertion in the root buffer.

¹ A priority queue supporting both deletemin and deletemax can easily be obtained using two priority queues supporting deletemin and delete, such as the one by Thorup [9].
Since insertion of an element in the root buffer is always triggered by an insertion operation, we can charge the O((1/B)·sort_E(N)) I/Os and O(sort_E(N)) time cost to the insertion operation.
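The core of the buffer-emptying process, distributing the oldest buffered elements among the splitting elements and pushing them to the children, can be sketched as follows. This is an illustrative toy (dict-based nodes of my own design; bisect again stands in for the linear-time split of Lemma 1), not the paper's implementation.

```python
import bisect
from collections import deque

def empty_buffer(node, M):
    """Push the M/2 oldest buffered elements one level down the tree.

    A node is a dict {'splitters': sorted list, 'buffer': deque (oldest
    first), 'children': list of child nodes}; child i receives the
    elements between splitters[i-1] and splitters[i].
    """
    batch = [node['buffer'].popleft()
             for _ in range(min(M // 2, len(node['buffer'])))]
    for x in batch:
        i = bisect.bisect_right(node['splitters'], x)
        node['children'][i]['buffer'].append(x)
    # Recursively empty internal children whose buffers ran over M/2.
    for child in node['children']:
        if child['children'] and len(child['buffer']) > M // 2:
            empty_buffer(child, M)
```

Note that, as in the text, the elements are only distributed, never sorted; that is precisely what keeps the process at O(M) time.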

Deletemin. To perform a deletemin operation we first check if the mini-queue contains any elements. If it does we simply perform a deletemin operation on it and return the retrieved element using O(sort_I(M)) time and no I/Os. Otherwise we perform buffer-emptying processes on all nodes on the leftmost path in T, starting at the root and moving towards the leftmost leaf. After this the buffers on the leftmost path are all empty and the smallest elements in the structure are stored in the leftmost leaf. We load the between M/2 and M elements in the leaf into main memory, sort them, and remove the smallest M/2 elements and insert them in the mini-queue in internal memory. If this results in the leaf having less than M/2 elements we insert the elements in a sibling and delete the leaf. If the sibling now has more than M elements we split it. As a result of this the parent node v may lose a child. If needed, T is then rebalanced using node fusions as in a normal B-tree, that is, if v now has (1/2)√(M/B) children it is fused with its sibling (possibly followed by a split). As with splits after insertion of a new leaf, the rebalancing may propagate up along the path to the root (when the root only has one leaf left it is removed). Note that no buffer merging is needed since the buffers on the leftmost path are all empty.

If buffer-emptying processes are needed during a deletemin operation we spend O((M/B) log_{M/B}(N/B)) = O((M/B)·sort_E(N)) I/Os and O(M log_{M/B}(N/B)) = O(M·sort_E(N)) time on such processes that are not paid for by buffers running full (containing more than M/2 elements). We also use O(M/B) I/Os and O(M·sort_I(M)) time to load and sort the leftmost leaf, and another O(M·sort_I(M)) time is used to insert the M/2 smallest elements in the mini-queue. Then we may spend O(M/B) I/Os and O(M) time on each of at most O(log_{M/B}(N/B)) nodes on the leftmost path that need to be fused or split. Altogether the filling up of the mini-queue requires O((M/B)·sort_E(N)) I/Os and O(M·(sort_I(M) + sort_E(N))) time. Since we only fill up the mini-queue when M/2 deletemin operations have been performed since the last fill up, we can amortize this cost over these M/2 deletemin operations such that each deletemin is charged O((1/B)·sort_E(N)) I/Os and O(sort_E(N) + sort_I(M)) time.

Theorem 2. There exists a priority queue supporting an insert operation in O((1/B)·sort_E(N)) I/Os and O(sort_E(N)) time amortized and a deletemin operation in O((1/B)·sort_E(N)) I/Os and O(sort_I(M) + sort_E(N)) time amortized.

Remarks. Our priority queue obviously can be used in a simple O((N/B)·sort_E(N)) I/O and O(N · (sort_I(M) + sort_E(N))) time sorting algorithm. Note that it is essential that a buffer-emptying process does not require sorting of the elements in the buffer. In normal buffer-trees [2] such a sorting is indeed performed, mainly to be able to support deletions and (batched) rangesearch operations efficiently. Using a more elaborate buffer-emptying process we can also support deletions without the need for sorting of buffer elements.
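The sorting reduction in the remark is simply "N inserts followed by N deletemins". Sketched here with Python's heapq standing in for the structure of Theorem 2; with the real structure the same loop runs in O((N/B)·sort_E(N)) I/Os and O(N·(sort_I(M) + sort_E(N))) time.

```python
import heapq

def pq_sort(elems):
    """Sort via a priority queue: insert everything, then extract minima.

    heapq is only a stand-in here; any structure with insert/deletemin
    yields a sorting algorithm whose cost is N times the per-operation
    cost, which is how Theorems 1 and 2 give matching sorting bounds.
    """
    pq = []
    for x in elems:
        heapq.heappush(pq, x)                 # insert
    return [heapq.heappop(pq) for _ in elems]  # deletemin, N times

print(pq_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```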

4 Lower Bound

Assume that (N/B)·sort_E(N) = o(N) and for simplicity also that B divides N. Recall that under the indivisibility assumption we assume the RAM model in internal memory but require that at any time during an algorithm the original N elements are stored somewhere in memory; we allow copying of the original elements. The internal memory contains at most M elements and the external memory is divided into N blocks of B elements each; we only need to consider N blocks, since we are considering algorithms doing less than N I/Os. During an algorithm, we let X denote the set of original elements (including copies) in internal memory and Y_i the set of original elements (including copies) in the i'th block; an I/O transfers up to B elements between an Y_i and X. Note that in terms of CPU time, an I/O can cost anywhere between 1 and B (transfers).

In the external memory permuting problem, we are given N elements in the first N/B blocks and want to rearrange them according to a given permutation; since we can always rearrange the elements within the N/B blocks in O(N/B) I/Os, a permutation is simply given as an assignment of elements to blocks (i.e. we ignore the order of the elements within a block). In other words, we start with a distribution of N elements in X, Y_1, Y_2, …, Y_N such that |Y_1| = |Y_2| = … = |Y_{N/B}| = B and X = Y_{(N/B)+1} = Y_{(N/B)+2} = … = Y_N = ∅, and should produce another given distribution of the same elements such that |Y_1| = |Y_2| = … = |Y_{N/B}| = B and X = Y_{(N/B)+1} = Y_{(N/B)+2} = … = Y_N = ∅.

To show that any permutation algorithm that performs O((N/B)·sort_E(N)) I/Os has to transfer Ω(N·sort_E(N)) elements between internal and external memory, we first note that at any given time during a permutation algorithm we can identify a distribution (or more) of the original N elements (or copies of them) in X, Y_1, Y_2, …, Y_N.
We then first want to bound the number of distributions that can be created using T I/Os, given that b_i, 1 ≤ i ≤ T, is the number of elements transferred in the i'th I/O; any correct permutation algorithm needs to be able to create at least N!/(B!)^{N/B} = Ω((N/B)^N) distributions. Consider the i'th I/O. There are at most N possible choices for the block Y_j involved in the I/O; the I/O either transfers b_i ≤ B elements from X to Y_j or from Y_j to X. In the first case there are at most (M choose b_i) ways of choosing the b_i elements, and each element is either moved or copied. In the second case there are at most (B choose b_i) ways of choosing the elements to move or copy. Thus the I/O can at most increase the number of distributions that can be created by a factor of

N · ((M choose b_i) + (B choose b_i)) · 2^{b_i}.

Two I/Os transferring b_1 and b_2 elements, with average b = (b_1 + b_2)/2, can thus increase the number of distributions by a factor of at most

N·(2eM/b_1)^{2b_1} · N·(2eM/b_2)^{2b_2} ≤ N²·(2eM)^{2(b_1+b_2)} / b^{2(b_1+b_2)} ≤ (N·(2eM/b)^{2b})².

Next we consider the number of distributions that can be created using T I/Os for all possible values of b_i, 1 ≤ i ≤ T, with a given average b. This can trivially be bounded by multiplying the above bound by B^T (since this is a bound on the total number of possible sequences b_1, b_2, …, b_T). Thus the number of distributions is bounded by B^T · (N·(2eM/b)^{2b})^T = ((BN)(2eM/b)^{2b})^T. Since any permutation algorithm needs to be able to create Ω((N/B)^N) distributions, we get the following lower bound on the number of I/Os T(b) needed by an algorithm that transfers b ≤ B elements on the average per I/O:

T(b) = Ω( N log(N/B) / (log N + b log(M/b)) ).
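The starting point of the counting argument, that a permutation algorithm must be able to realize N!/(B!)^{N/B} distinct assignments of elements to blocks, can be sanity-checked by brute force for toy sizes (the function name is mine; this is only a numerical check, not part of the proof):

```python
from itertools import permutations
from math import factorial

def block_assignments(N, B):
    """Count distinct assignments of N labeled elements to N/B blocks of
    size B, ignoring order within a block, by brute force over all N!
    orderings. Feasible only for tiny N."""
    seen = set()
    for p in permutations(range(N)):
        blocks = tuple(frozenset(p[i:i + B]) for i in range(0, N, B))
        seen.add(blocks)
    return len(seen)

# N = 4, B = 2: the formula gives 4!/(2!)^2 = 6 distributions.
print(block_assignments(4, 2), factorial(4) // factorial(2) ** 2)  # 6 6
```

Dividing N! by (B!)^{N/B} removes exactly the orderings within each of the N/B blocks, which is why the brute-force count matches.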

Now T(B) = Ω(min{N, (N/B)·sort_E(N)}) corresponds to the lower bound proved by Aggarwal and Vitter [1]. Thus when (N/B)·sort_E(N) = o(N) we get T(B) = Ω((N/B)·sort_E(N)) = Ω((N/B)·log(N/B)/log(M/B)). Since 1 ≤ b ≤ B ≤ M/2, we have T(b) = ω(T(B)) for b = o(B). Thus any algorithm performing optimal O((N/B)·sort_E(N)) I/Os must transfer Ω(N · sort_E(N)) elements between internal and external memory.
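Numerically, the bound T(b) = Ω(N log(N/B) / (log N + b log(M/b))) indeed grows as the average transfer size b shrinks, which is the content of T(b) = ω(T(B)) for b = o(B). A quick check of the bound's shape (constants dropped, names mine):

```python
import math

def T(b, N, M, B):
    """The lower-bound expression, up to constant factors, for an
    algorithm transferring b <= B elements per I/O on average."""
    assert 1 <= b <= B <= M / 2
    return N * math.log(N / B) / (math.log(N) + b * math.log(M / b))

N, M, B = 10**9, 10**6, 10**3
# Fewer elements moved per I/O => more I/Os are forced.
print(T(1, N, M, B) > T(B // 32, N, M, B) > T(B, N, M, B))  # True
```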

Reconsider the above analysis under the tall cache assumption B ≤ M^{1−ε} for some constant ε > 0. In this case, we have that the number of distributions any permutation algorithm needs to be able to create is Ω((N/B)^N) = Ω(N^{εN}). Above we proved that with T I/Os transferring an average number of b keys an algorithm can create at most (BN(2eM/b)^{2b})^T distributions. … Thus for any constant ε > 0, any permuting algorithm using less than εN/4 I/Os must transfer Ω(N · sort_E(N)) elements between internal and external memory under the indivisibility assumption.

Remark. The above means that in practice, where (N/B)·sort_E(N) = o(N), our O((N/B)·sort_E(N)) I/O and O(N · (sort_I(N) + sort_E(N))) time split-sort and priority queue sort algorithms are not only I/O-optimal but also CPU optimal in the sense that their running time is the sum of an unavoidable term and the time used by the best known RAM sorting algorithm.

References

1. Aggarwal, A., Vitter, J.S.: The Input/Output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)
2. Arge, L.: The buffer tree: A technique for designing batched external data structures. Algorithmica 37(1), 1–24 (2003)
3. Comer, D.: The ubiquitous B-tree. ACM Computing Surveys 11(2), 121–137 (1979)
4. Fadel, R., Jakobsen, K.V., Katajainen, J., Teuhola, J.: Heaps and heapsort on secondary storage. Theoretical Computer Science 220(2), 345–362 (1999)
5. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 285–298 (1999)
6. Han, Y.: Deterministic sorting in O(n log log n) time and linear space. In: Proc. ACM Symposium on Theory of Computing, pp. 602–608 (2002)
7. Han, Y., Thorup, M.: Integer sorting in O(n √(log log n)) expected time and linear space. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 135–144 (2002)
8. Kumar, V., Schwabe, E.: Improved algorithms and data structures for solving graph problems in external memory. In: Proc. IEEE Symp. on Parallel and Distributed Processing, pp. 169–177 (1996)
9. Thorup, M.: Equivalence between priority queues and sorting. Journal of the ACM 54(6), Article 28 (2007)

I/O-Efficient Computation of Water Flow Across a Terrain

Lars Arge∗ (MADALGO, University of Aarhus, Aarhus, Denmark, [email protected]), Morten Revsbæk∗ (MADALGO, University of Aarhus, Aarhus, Denmark, [email protected]), Norbert Zeh† (Faculty of Computer Science, Dalhousie University, Halifax, Canada, [email protected])

ABSTRACT

Consider rain falling at a uniform rate onto a terrain T represented as a triangular irregular network. Over time, water collects in the basins of T, forming lakes that spill into adjacent basins. Our goal is to compute, for each terrain vertex, the time this vertex is flooded (covered by water). We present an I/O-efficient algorithm that solves this problem using O(sort(X) log(X/M) + sort(N)) I/Os, where N is the number of terrain vertices, X is the number of pits of the terrain, sort(N) is the cost of sorting N data items, and M is the size of the computer's main memory. Our algorithm assumes that the volumes and watersheds of the basins of T have been precomputed using existing methods.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—Geometrical problems and computations

General Terms: Algorithms, Theory

Keywords: Terrains, geographical information systems, I/O-efficient algorithms

1. INTRODUCTION

An important problem in terrain analysis is the prediction of water flow across a terrain. Traditional approaches focus on computing the river network of the terrain under the assumption that water does not collect in the basins of the terrain. In reality, water does collect in the terrain's basins, particularly during heavy rainfall. This may cause basins to spill into adjacent basins, changing the river network of the terrain as a result. Thus, to model the flow of water across a terrain over time, it is necessary to compute the times at which the basins of the terrain spill. In this paper, we solve the more general problem of computing, for every vertex v of a terrain T, the time t_v at which it is flooded. We assume the terrain T is represented as a triangular irregular network (TIN) and the amount of rain is uniform across the terrain.

The accuracy of predictions of natural phenomena, such as flooding, depends on the precision of the data used in these predictions. High-resolution elevation models even of fairly small geographic regions often exceed the size of a computer's main memory. Current GIS tools cannot handle data sets of this size efficiently and therefore need to work with models of lower resolutions. This sacrifices accuracy because lower-resolution models do not capture all terrain features. For example, in experiments we conducted to predict the flooding of coastal regions of Denmark due to rising sea levels, an elevation model with one data point every 10m failed to capture a dike around an island, and the island was predicted to be flooded if the sea rises by 2m, while the dike can in fact withstand 2m higher sea levels. Using a higher-resolution model with one data point every 2m, we obtained the correct prediction, but storing elevation data for Denmark alone requires 500GB of space at this resolution, an input size well beyond the size of main memory and beyond the reach of current GIS tools.

To process data sets beyond the size of main memory efficiently, it is necessary to develop I/O-efficient algorithms, that is, algorithms that focus on minimizing the number of disk accesses they perform to swap data between disk and internal memory, as a disk access is orders of magnitude slower than an internal-memory computation step.

In this paper, we propose an I/O-efficient algorithm for computing the flooding times of all vertices of a terrain T. The core of the problem is computing the spill times of all basins of T. A simple method to compute these spill times is to simulate the entire sequence of spill events of the basins of T and maintain the watersheds of all basins that have yet to spill, as well as predicted spill times for these basins based on their current watersheds. An internal-memory solution

∗ Supported in part by the Danish National Research Foundation, the Danish Strategic Research Council, and by the US Army Research Office. MADALGO is the Center for Massive Data Algorithmics, a center of the Danish National Research Foundation.
† Supported in part by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs programme. Part of this work was done while on sabbatical at MADALGO.

SCG'10, June 13–16, 2010, Snowbird, Utah, USA. Copyright 2010 ACM 978-1-4503-0016-2/10/06.

based on this idea has been presented in [14]. In Section 3.2, we discuss an O(N log N) time implementation of this approach using a priority queue and a union-find structure. This simulation approach does not translate into an I/O-efficient algorithm, as the only known I/O-efficient union-find structure [1] requires the sequence of Union operations to be known in advance. In simulating the sequence of spill events, the set of Union operations is known, but their order depends on the spill times of the basins. Using an internal-memory union-find structure and an I/O-efficient priority queue (e.g., [3, 12]), a cost of O(Xα(X) + sort(N)) disk accesses can be achieved, where α(·) is the inverse of Ackermann's function, sort(N) ≪ N is the I/O complexity of sorting N data items (see next section), and X is the number of pits of the terrain. In contrast, the algorithm presented in this paper achieves an I/O complexity of O(sort(X) log(X/M) + sort(N)) disk accesses.

In the remainder of this section, we formally define the computational model we use, discuss previous work, and give an overview of our algorithm. Section 2 introduces the terminology and notation we use throughout the paper. Section 3 discusses our algorithm for computing the spill times of the terrain's basins and the flooding times of all terrain vertices. This algorithm makes use of an I/O-efficient meldable priority queue, which we discuss in Section 4.

1.1 I/O Model

We use the I/O model with one (logical) disk [2] to design and analyze our algorithm. In this model, the computer is equipped with a two-level memory hierarchy consisting of an internal memory and a (disk-based) external memory. The internal memory is capable of holding M data items (vertices, edges, etc.), while the external memory is of conceptually unlimited size. All computation has to happen on data in internal memory. Data is transferred between internal and external memory in blocks of B consecutive data items. Such a transfer is referred to as an I/O operation or I/O. The cost of an algorithm is the number of I/Os it performs. The number of I/Os required to read N contiguous items from disk is N/B. The number of I/Os required to sort N items is sort(N) = Θ((N/B) log_{M/B}(N/B)) [2]. For all realistic values of N, M, and B, we have N/B < sort(N) ≪ N.

1.2 Related Work

Due to its importance, the problem of computing water flow across a terrain (usually in the form of a river network) has been studied extensively. Most existing methods for computing river networks assume that once water flows into a small basin of the terrain, it never flows out; in other words, basins do not spill. Therefore, to avoid water getting caught in small local basins, most flow modelling approaches first remove all basins by flooding the terrain, that is, conceptually pouring water onto the terrain until all basins are filled [4, 13, 15]. However, this often leads to unrealistic flow patterns, since many important geographical features are removed. Recent papers [1, 5] suggested partial flooding algorithms that flood only "small" basins, where the size of a basin can be defined using different measures, such as height, volume or area. Agarwal, Arge and Yi [1] described an O(sort(N)) I/O partial flooding algorithm that removes all basins of small height. This is done by computing the topological persistence [10, 11] of each basin and removing the ones with persistence below a small threshold. The key to obtaining an O(sort(N)) I/O solution to this problem is an algorithm that can process a sequence of N Union and Find operations using O(sort(N)) I/Os, assuming the entire sequence of operations is provided in advance. Arge and Revsbæk [5] extended the result of [1] to removing basins based on different geometric measures, including volume and area.

Partial flooding methods provide a basis only for approximate solutions to flow modelling, as the underlying assumption is that all "small" basins are flooded at a certain time, while all "big" basins are not. This assumption may not be true, as the spill time of a basin depends on its volume and its watershed, and the watershed of a basin may grow over time as a result of other basins spilling into it; in particular, the watershed of a big basin may grow to a size far exceeding the size of the watershed of a small basin and, thus, the big basin may spill before the small basin. To model the flow network at time t accurately, it is necessary to compute the times the basins of T fill and remove the basins that are full at time t. This is much harder than the partial flooding approaches discussed above, as the above methods are based on local measures associated with each basin, while the computation of the actual spill time of each basin β depends on the spill times of other basins that spill into β. As mentioned in the introduction, Liu and Snoeyink [14] presented an internal-memory algorithm for computing actual spill times and used it for flow prediction. While the paper does not present the algorithm in detail, we discuss an efficient implementation using a union-find structure and a priority queue in Section 3.2; this implementation is needed as part of our I/O-efficient algorithm.

We also require a tree structure that represents the nesting of the basins of T. This tree has been termed the merge tree of T in [7, 8] (also see Section 2) and can be computed using O(sort(N)) I/Os using the topological persistence algorithm of [1]. This algorithm is easily augmented to compute the lowest saddle on the boundary of each basin and, as shown in [5], the volume of each basin.

Computing the watershed sizes of all basins of T is harder. The watershed sizes of all basins of T are easily computed from the watershed sizes of all pits by summing watershed sizes bottom up in the merge tree. However, the only I/O-efficient algorithm for computing the watersheds of all pits of a terrain exactly is the one of [9]. This algorithm performs O(sort(N + S)) I/Os for fat terrains, where S is the size of the terrain's strip map. The size of the strip map of a fat terrain is Θ(N²) in the worst case [9], in which case the exact computation of watersheds becomes infeasible. An approach often taken in practice [16–18] is to assume that water flows only along terrain edges, allowing the computation of watersheds using O(sort(N)) I/Os. While this may lead to poor approximations of the watersheds for irregular terrains [9], TINs derived from LIDAR data, for example, are rather regular, and the computed watersheds match the real watersheds rather closely.

1.3 New Result

We present an algorithm that computes the flooding times of all terrain vertices using O(sort(X) log(X/M) + sort(N)) I/Os, where N is the size of the terrain and X is the number of pits. This assumes that the watershed sizes and volumes of all basins are given and that every vertex is labelled with the pit whose watershed contains it.
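To make the scanning and sorting bounds of Section 1.1 concrete, the following sketch compares N/B, sort(N) = Θ((N/B) log_{M/B}(N/B)), and N numerically. The parameter values are hypothetical but realistic; the functions are illustrations of the bounds, not part of the paper's algorithm:

```python
import math

def scan_ios(n: int, b: int) -> int:
    # I/Os needed to read n contiguous items in blocks of b items
    return math.ceil(n / b)

def sort_ios(n: int, m: int, b: int) -> int:
    # external-memory sorting bound: (n/b) blocks times ceil(log_{m/b}(n/b)) merge passes
    blocks = math.ceil(n / b)
    passes = max(1, math.ceil(math.log(blocks, m // b)))
    return blocks * passes

# assumed example parameters: 2^30 items, memory holding 2^27 items, 2^13-item blocks
N, M, B = 2**30, 2**27, 2**13
assert scan_ios(N, B) < sort_ios(N, M, B) < N   # N/B < sort(N) << N
```

With these values, scanning costs 2^17 I/Os and sorting roughly twice that, both far below the N I/Os that item-by-item access would incur.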

The cost of computing this information has been discussed in the previous section.

Our algorithm operates on the merge tree M of the given terrain T. Given a node β of M, we employ a recursive strategy to compute the spill times of all basins represented by descendants of β. If there are at most M such descendants, we use an augmented version of the internal-memory algorithm mentioned in the introduction to compute the spill times of their corresponding basins using O(sort(Y)) I/Os, where Y is the number of descendants of β plus the number of basins that spill into β. Otherwise we consider an appropriate path P from β to a descendant leaf and prove that we can compute the spill times of all basins whose parents belong to P using O(sort(Y)) I/Os, where Y is defined as above. The path P is chosen so that every subtree attached to P contains at most half of β's descendants, and we invoke our algorithm recursively on the root of each such subtree. This ensures that log(X/M) levels of recursion suffice to break M into subtrees of size at most M, to which the above internal-memory algorithm can be applied. The cost per level of recursion is O(sort(X)) I/Os. Hence, the total cost of the algorithm is O(sort(X) log(X/M)) I/Os. This gives only the spill times of the basins. To compute the flooding times of all terrain vertices, we use a post-processing step that can be seen as a simpler version of the recursive step of our algorithm and performs O(sort(N)) I/Os.

The intuition behind the computation in each recursive step is the following. In general, there are two directions water can flow across any saddle of the terrain T. Given the flow direction for each saddle, spill times could be computed rather easily, but the direction for each saddle is determined by the spill times of the two basins that merge at this saddle. For the saddles between the basins corresponding to the path P in each recursive step, we can characterize each flow direction as either "confluent" (toward the basin represented by the leaf of P) or "diffluent" (away from the leaf), and we can show that confluent spill events that influence other spill events (we call these watershed events) can themselves depend only on confluent watershed events. This allows us to perform a sweep in the confluent direction first to compute the times of all confluent watershed events. In a second phase, we sweep in the diffluent direction and use the times computed in the first phase to compute the correct times for all spill events.

2. PRELIMINARIES

In this section, we introduce the basic terminology used throughout this paper. In the following, we make some assumptions about the structure of the terrain that simplify the exposition in the rest of the paper. All these assumptions can be removed using standard perturbation techniques.

Terrains, pits, and basins. We consider the terrain T to be represented as a triangular irregular network, which is a planar triangulation each of whose vertices v has an associated elevation T(v). The elevation of a point interior to a triangle is a linear interpolation of the elevations of the triangle vertices. In this manner, the terrain is represented as a continuous piecewise linear surface, and we use T(p) to refer to the elevation of a point p ∈ R². We assume that no two adjacent vertices have the same elevation. A pit of T is a local minimum, that is, a terrain vertex all of whose neighbours have higher elevations. A saddle point of T is a vertex v with four vertices w1, w2, w3, w4 among its neighbours that satisfy max(T(w1), T(w3)) < T(v) < min(T(w2), T(w4)) and appear in the order w1, w2, w3, w4 clockwise around v.

A basin is a maximal connected set of points β ⊆ R³ such that T((x, y)) ≤ z ≤ hβ, for all (x, y, z) ∈ β, where hβ is a fixed elevation chosen as the upper boundary of β. A basin β is maximal if there exists a saddle point pβ on the boundary of β such that T(pβ) = hβ. The point pβ is called the spill point of basin β, since water poured into β spills over pβ into an adjacent basin once β becomes full. We assume pβ is unique, that is, there are no two saddles with elevation hβ on the boundary of β. We also assume exactly two basins meet in each spill point. Throughout this paper, we are interested almost exclusively in maximal basins and refer to them simply as basins. Every basin contains at least one pit, and for every pit there exists a unique (maximal) basin that contains only this pit. We call such a basin elementary.

Trickle paths, watersheds, and tributaries. The trickle path of a point q ∈ T is the path that starts at q, continues in the direction of steepest descent for every point it visits, and ends either in a pit p or at the boundary of T. In the former case, water falling onto q collects in the elementary basin corresponding to p; in the latter, it flows off the edge of the terrain. The watershed of a pit p is the set of points whose trickle paths end in p. The watershed W^0_β of a basin β is the union of the watersheds of all pits contained in β. More generally, we use W^t_β to denote the watershed of basin β at time t, which is the area such that water falling onto W^t_β at time t collects in basin β. A tributary of β is a basin τ such that τ ∩ β = ∅, τ spills before β, and water falling onto W^{tτ}_τ at the time tτ when τ spills collects in β; that is, τ spills into β and the watershed of τ at this time becomes part of the watershed of β. Note that τ is usually a tributary of more than one basin. In particular, if β is the smallest basin that has τ as a tributary, then τ is a tributary of every basin that contains β but not τ.

Merge tree and flow tree. The basins of T form a hierarchy that is easily represented using a rooted tree, the merge tree M of T. The leaves of M are the elementary basins of T. A basin β1 is the parent of a basin β2 if and only if β2 ⊂ β1 and there exists no basin β3 such that β2 ⊂ β3 ⊂ β1. Under the assumptions we made about T, M is a binary tree. For a subset S of nodes of M, let M(S) be the subgraph of M induced by S. For a node α of M, Mα denotes the subtree of M induced by α and its descendants. We do not distinguish between merge tree nodes and their corresponding basins. For a basin β, we use Vβ to denote β's volume, W^t_β to denote β's watershed at time t or its size (which will be clear from the context), Eβ to denote the event that β spills into an adjacent basin, and tβ to denote the time of event Eβ. We call Eβ the spill event associated with β. See Figure 1 for an illustration of these definitions.

Now consider a subset S of nodes of M such that M(S) is a tree. The flow paths between the basins in S form a flow tree F(S) defined as follows. Let S' be the set of leaves of M(S). The elements of S' are the vertices of F(S). There is an edge between two such vertices α1 and α2 if their watersheds touch in the common spill point of two basins β1, β2 ∈ S. See Figure 2.

3. COMPUTING FLOODING TIMES

Our algorithm for computing the flooding times of all terrain vertices has two phases. The first phase (Sections 3.1–3.3) computes the spill times tβ of all basins of T using


Figure 1: (a) A 1-d terrain with its corresponding merge tree and the volumes and initial watersheds of its basins. (b) The spill times of the basins and the water levels in the basins at time t = 2.
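The merge tree of a 1-d terrain like the one in Figure 1(a) can be built by uniting adjacent basins at their saddles in increasing order of saddle elevation. The sketch below is a simple internal-memory stand-in (not the paper's O(sort(N))-I/O topological persistence construction), and all names are illustrative:

```python
def merge_tree(pits, saddles):
    """pits[i] is the elevation of pit i; saddles[i] is the elevation of the
    saddle between pits i and i+1. Leaves 0..len(pits)-1 are the elementary
    basins; internal nodes get ids len(pits), len(pits)+1, ... in order of
    creation (increasing spill-point elevation). Returns {node: parent}."""
    k = len(pits)          # pit elevations themselves do not affect the tree shape
    up = {}                # union-find parent pointers (no compression; fine for a sketch)
    parent = {}            # merge-tree parent links

    def find(x):
        while x in up:
            x = up[x]
        return x

    next_id = k
    # unite the basins on either side of each saddle, lowest saddle first
    for _, i in sorted((h, i) for i, h in enumerate(saddles)):
        a, b = find(i), find(i + 1)
        parent[a] = parent[b] = next_id
        up[a] = up[b] = next_id
        next_id += 1
    return parent

# three pits separated by saddles at heights 3.0 and 2.5:
# pits 1 and 2 merge first (node 3), then node 3 merges with pit 0 (node 4)
links = merge_tree([1.0, 0.0, 2.0], [3.0, 2.5])
```

Because the terrain is 1-d, each saddle always joins two distinct current basins, so the result is a binary tree, matching the assumption made about M above.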


Figure 2: Figure (a) shows the merge tree M(S) of a set S = {α0, α1, β1, ..., α7, β7} of basins. Figure (b) illustrates the nesting of these basins using contour lines through their spill points. Figure (c) shows the flow tree of basins α7, β1, β2, ..., β7. Every edge corresponds to a spill point and joins the two basins containing the endpoints of the trickle paths starting at this spill point (shown as dashed lines in Figure (b)). Note that the trickle paths corresponding to different edges incident to a flow tree node βi may end in different pits inside the basin βi. This is the case for basin β6 in Figure (b).

O(sort(X) log(X/M)) I/Os. The second phase (Section 3.4) then computes the flooding times of all vertices of T using O(sort(N)) I/Os.

3.1 Computing Spill Times

We assume the merge tree M is given and every node β ∈ M stores Vβ and W^0_β, as well as identifiers of the two elementary basins λf(β) and λr(β) such that λf(β) ∈ Mβ and W^0_{λf(β)} and W^0_{λr(β)} touch in pβ; λf(β) is needed for some data structure queries in our algorithm, while λr(β) is the basin where water spilling from β would collect if we poured water only into β. We further assume the elementary basins of T have been numbered using a preorder traversal of the flow tree F of T starting at an arbitrary root, and every elementary basin stores the preorder interval of its descendants. This information can be computed from the input of our algorithm using O(sort(X)) I/Os using standard techniques. Details appear in the full paper.

Our algorithm for computing the spill times of all basins is recursive. Every recursive call takes a node β ∈ M and a list Rβ of all tributaries of β as input. Each tributary τ ∈ Rβ stores its spill time tτ, its watershed Wτ := W^{tτ}_τ at time tτ, and the preorder number of an elementary basin it contains. The task of the recursive call on a node β is to compute the spill times of all proper descendants of β in M. The top-level invocation takes the root ρ of M and an empty list of tributaries as input (since ρ has no tributaries). We distinguish two cases for each recursive call.

If |Mβ| ≤ M, we use the algorithm in Section 3.2 below to solve the problem using O(sort(X)) I/Os, where X is the total input size of the invocation, that is, X := |Mβ| + |Rβ|.

If |Mβ| > M, we compute a heaviest path P in Mβ. This path consists of a sequence of nodes α0, α1, ..., αk, where α0 = β, αk is a leaf, and, for 1 ≤ i ≤ k, αi is the child of αi−1 with the bigger subtree Mαi among the two children of αi−1. We use βi to denote the other child of αi−1; see Figure 2(a).
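The heaviest-path choice can be sketched directly. This is an illustrative internal-memory version; the paper computes P I/O-efficiently using the Euler tour technique and list ranking:

```python
def heaviest_path(children, root):
    """Walk from `root` to a leaf, always descending into the child whose
    subtree is larger. children[v] lists v's children (binary tree)."""
    # subtree sizes via an iterative post-order pass
    size, order, stack = {}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    # descend, picking the heavier child at each step
    path = [root]
    while children.get(path[-1]):
        path.append(max(children[path[-1]], key=lambda c: size[c]))
    return path
```

By construction, every subtree hanging off the returned path contains at most half of the root's descendants, which is exactly the property that bounds the recursion depth by log(X/M).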

The first step of the algorithm is to compute the spill times tαi and tβi and the set Rβi of tributaries of βi, for all 1 ≤ i ≤ k. To finish the computation of spill times, we recursively invoke the algorithm on each node βi, 1 ≤ i ≤ k. As we show in Section 3.3, the computation of the spill times tαi and tβi, for 1 ≤ i ≤ k, and of the lists of tributaries of the basins β1, β2, ..., βk takes O(sort(k + |Rβ|)) = O(sort(X)) I/Os, where X := |Mβ| + |Rβ|. The path P can be computed using O(sort(X)) I/Os using the Euler tour technique and list ranking [6]. This gives the following result.

Theorem 1. The spill times of all basins of a terrain T can be computed using O(sort(X) log(X/M)) I/Os, where X is the number of elementary basins of T.

Proof. We prove in Section 3.2 that, given the tributaries of β, we compute the correct spill times for all basins in Mβ in the case |Mβ| ≤ M. In Section 3.3, we prove that, in the case |Mβ| > M, we correctly compute the spill times of the basins αi and βi and the tributary lists Rβi, for 1 ≤ i ≤ k. The I/O complexity of the algorithm is given by the recurrence T(X, Y) = O(sort(X)) + Σ_{i=1}^{k} T(Xi, Yi), where Yi := |Mβi| and Xi := Yi + |Rβi|. By the definition of a tributary, every basin τ with τ ∈ Mβ or τ ∈ Rβ can be the tributary of at most one basin βi. Hence, we have Σ_{i=1}^{k} Xi ≤ X. Furthermore, the choice of the path P as a heaviest path ensures that Yi ≤ Y/2, for all 1 ≤ i ≤ k. Together, these two facts imply that the recurrence solves to T(X, Y) = O(sort(X) log(Y/M)). For the root of M, we have X = Y, that is, the overall complexity of the algorithm is O(sort(X) log(X/M)), as claimed.

3.2 Small Basins

To solve the case |Mβ| ≤ M, we provide an implementation of the algorithm of [14] using a union-find structure and a priority queue and extend it to take the tributaries of the basin β into account when computing the spill times of all sub-basins of β. Before running the actual algorithm, we compute, for each tributary τ of β, the elementary sub-basin λr(τ) of β that τ spills into; that is, τ spills into β across a saddle on the boundary of λr(τ). These basins can be computed from the preorder numbers stored with the tributaries of β and the preorder intervals of the elementary sub-basins of β using O(sort(X)) I/Os. Details appear in the full paper.

The algorithm maintains a set of active basins in Mβ, which are the basins that have not spilled yet but whose children have already spilled. A basin that has spilled is finished, and a basin with at least one unfinished child is inactive. The set of active basins always contains the next basin in Mβ to spill. We maintain the set of active basins using two data structures: a priority queue Q and a union-find structure U. The priority queue stores the active basins with priorities equal to their predicted spill times; these times decrease over time as we discover more tributaries of active basins. The union-find structure stores the elementary basins (leaves) in Mβ and allows us to find, for each such basin α, the active basin α' such that W^0_α is currently part of Wα'. More precisely, a Find(α) operation returns a representative elementary sub-basin of α', and it is easy to ensure that at all times, the representative of α' stores the ID of α'.

Initially, all elementary basins are active and all other basins are inactive; the predicted spill time of an elementary basin α is t'α := Vα/W^0_α. To allow the updating of predicted spill times, each active basin α also stores the time uα when its watershed changed last (that is, the spill time of its most recent tributary) as well as its residual volume V^r_α at time uα, which is the portion of Vα left to be filled at this time. Initially, V^r_α = Vα and uα = 0, for every elementary basin α of T.

After initializing the algorithm's data structures as just described, we process the events in Q and Rβ by increasing time until Q contains only the basin β. The details of each iteration are as follows. Let τ be the next tributary in Rβ to be processed, and let α be the active basin with minimum priority in Q. If tτ < t'α, we process Eτ as a watershed event with time tτ, as described below. If t'α < tτ, we process Eα with time t'α, either as a watershed event or as a basin event, depending on whether α's sibling in Mβ is still unfinished or has already spilled.

Watershed event: A watershed event occurs when a basin α spills and its watershed becomes part of the watershed of another active basin α'. To process a watershed event Eα with time t, we find the active basin α' that α spills into using a Find(λr(α)) operation on U. We update the information for α' as V^r_{α'} := V^r_{α'} − Wα'(t − uα'), Wα' := Wα' + Wα, uα' := t, and t'α' := t + V^r_{α'}/Wα', and then update the priority of α' in Q accordingly. If α ∈ Rβ, this finishes the processing of Eα. If α ∈ Mβ, we need to update U to reflect that water falling or flowing onto Wα after time t flows into α'. We do this by performing a Union(λf(α), λr(α)) operation on U and ensuring that the representative of the resulting set of elementary basins points to α'. We also label the node α in Mβ as finished.

Basin event: A basin event occurs when the second of two sibling basins becomes full, leaving its parent basin in Mβ to fill. When processing a basin event Eα with time t, for some α ∈ Mβ, both α and its sibling σ in Mβ are full at time t. Thus, we label α as finished in Mβ and mark its parent γ as active. Water that flowed into α before time t now collects in γ. Thus, we set Wγ := Wα and ensure that the representative of α now points to γ. Since both α and σ are full at time t, we set V^r_γ := Vγ − Vα − Vσ, uγ := t, and t'γ := t + V^r_γ/Wγ. Then we insert γ into Q with priority t'γ.

The following lemma states the correctness and I/O complexity of the above procedure.

Lemma 1. Let β be a basin with at most M sub-basins, and let X := |Mβ| + |Rβ|. The spill times of all basins α ∈ Mβ can be computed using O(sort(X)) I/Os.

Proof. To bound the I/O complexity of the procedure, we observe that Q, U, and Mβ fit in memory and that scanning the sorted list of tributaries Rβ requires us to keep only one buffer block of size B in memory. Thus, apart from sorting the tributaries in Rβ by their spill times using O(sort(X)) I/Os, the algorithm only loads Mβ into memory and scans Rβ, which takes O(X/B) I/Os. All other processing happens in memory.
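Stripped of I/O-efficiency and of tributaries, the simulation just described can be sketched in a few lines: a heap orders active basins by predicted spill time, and redirect pointers (a stand-in for the union-find structure U) answer which active basin water spilling from a finished basin reaches. The unit rainfall rate per unit of watershed area and all names are assumptions of this sketch, not the paper's code:

```python
import heapq

def spill_times(children, volume, W0):
    """children: {internal basin: (left, right)}; volume[v] for every basin;
    W0[leaf]: initial watershed size of each elementary basin.
    Returns {basin: spill time} under unit rainfall per unit watershed area."""
    parent, sibling = {}, {}
    for p, (l, r) in children.items():
        parent[l] = parent[r] = p
        sibling[l], sibling[r] = r, l

    def a_leaf(b):                       # an elementary basin inside b
        while b in children:
            b = children[b][0]
        return b

    redirect = {}                        # finished basin -> basin its water reaches now
    def find(b):
        while b in redirect:
            b = redirect[b]
        return b

    state = {l: (0.0, volume[l], W0[l]) for l in W0}   # (u, residual volume, watershed)
    heap = [(volume[l] / W0[l], l) for l in W0]
    heapq.heapify(heap)
    t_spill = {}

    while heap:
        t, a = heapq.heappop(heap)
        if a not in state:
            continue                     # basin already spilled via a fresher entry
        u, vr, w = state[a]
        if t > u + vr / w + 1e-12:
            continue                     # stale priority; updated prediction was re-pushed
        t_spill[a] = t                   # basin a spills at time t
        del state[a]
        if a not in parent:
            continue                     # root basin: water leaves the terrain
        sib, par = sibling[a], parent[a]
        if sib in t_spill:
            # basin event: both siblings full, their parent starts filling
            state[par] = (t, volume[par] - volume[a] - volume[sib], w)
            redirect[a] = par
            heapq.heappush(heap, (t + state[par][1] / state[par][2], par))
        else:
            # watershed event: a's watershed joins the active basin across the saddle
            tgt = find(a_leaf(sib))
            ut, vt, wt = state[tgt]
            state[tgt] = (t, vt - wt * (t - ut), wt + w)
            redirect[a] = tgt
            heapq.heappush(heap, (t + state[tgt][1] / state[tgt][2], tgt))
    return t_spill
```

On the two-pit example {basin a: volume 1, basin b: volume 4, parent c: volume 10, each pit catching area 1}, basin a spills at time 1, b (now fed by both watersheds) at time 2.5, and c at time 5.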

To prove the correctness of the algorithm, we consider the sequence of events E1, E2, ..., EX affecting basin β and its sub-basins, sorted by their times t1, t2, ..., tX. That is, every event Ei is an event Eα with α ∈ Mβ, or such that α ∉ Mβ and α is a tributary of a basin α' ∈ Mβ, in which case α is also a tributary of β and belongs to Rβ.

We use t'i to denote the "current" predicted time of the event Ei. Consider the basin α such that Ei = Eα. Then we define t'i := ti if α is a tributary of β and t'i := ∞ if α ∈ Mβ and α is inactive; if α ∈ Mβ and α is active, we define t'i to be its priority in Q. We use induction on i to prove that ti = t'i when event Ei is processed. To do so, we prove that the following holds after processing event Ei:

(i) Events E1, E2, ..., Ei have been processed. No other events have been processed.

(ii) Wα = W^{ti}_α, for all active basins α ∈ Mβ.

(iii) t'_{i+1} = t_{i+1}, while t'j ≥ tj, for all j > i + 1.

The base case is i = 0, setting t0 = 0. Property (i) holds because no events have been processed yet. Property (ii) holds because initially Wα = W^0_α, for every elementary basin in Mβ, which is exactly the set of active basins at the beginning of the procedure. To see that (iii) holds, observe that t'j = tj if Ej = Eτ, for some tributary τ of β. If Ej = Eα, for α ∈ Mβ, and α is inactive, then t'j = ∞; if α is active, then t'j = Vα/W^0_α ≥ tj. Moreover, if E1 = Eα, for some α ∈ Mβ, then α has no tributaries and t1 = Vα/W^0_α = t'1.

For the inductive step, consider i > 0 and assume the claim holds for all j < i. In particular, t'i = ti and t'j ≥ ti, for all j > i, after processing Ei−1. Hence, Ei is the ith event to be processed. This establishes (i).

To prove (ii), we consider the two possible types of event Ei separately. Let Ei = Eα. If Ei is a watershed event, part (ii) of the inductive hypothesis implies that we identify the correct watershed Wα' into which α spills at time ti and, hence, that we update Wα' to W^{ti}_{α'}; all other watersheds remain unchanged. If Ei is a basin event, α and its sibling are full at time ti, while α's parent γ has to fill. Furthermore, all water flowing into γ immediately before time ti collects in α because α's sibling is already full. Therefore, W^{ti}_γ = W^{ti}_α, and we correctly set Wγ := Wα.

Finally, consider (iii). We consider only events Ej = Eα with α an active basin in Mβ because, by definition, t'j = tj, for every spill event Ej of a tributary of β, and t'j = ∞, for every spill event Ej of an inactive basin in Mβ. It is easily verified that we update t'α correctly whenever we increase Wα. By (ii), each event Eh with h ≤ i updates Wα if and only if Eh = Eτ, for some tributary τ of α. Thus, t'j ≥ tj, for all j ≥ i. For j = i + 1, the spill events of the tributaries of α are a subset of E1, E2, ..., Ei, as they have all been processed already; hence, t'_{i+1} = t_{i+1}.

3.3 Large Basins

It remains to show how a recursive call on a basin β with |Mβ| > M computes the spill times of α1, β1, α2, β2, ..., αk, βk, as well as the lists of tributaries Rβ1, Rβ2, ..., Rβk of the basins β1, β2, ..., βk. Our algorithm for this problem operates on the flow tree F := F(S) of the set S of basins in P. Note that every edge in F corresponds to the spill point connecting two basins αi and βi, for 1 ≤ i ≤ k, and that λf(αi) = λr(βi) and λf(βi) = λr(αi) in this case. So the tree is easy to construct using a constant number of sorting and scanning steps of S. Details appear in the full paper.

We call a spill event Eβi confluent, as the water spilling from βi at time tβi spills towards αk. Similarly, we call a spill event Eαi diffluent, as the water spilling from αi at time tαi spills away from αk (αk is a sub-basin of αi). We define Fβi to be the subtree of F rooted in βi, assuming αk is chosen as the root of F.

Our algorithm proceeds in two phases. The confluent phase computes times t'β1, t'β2, ..., t'βk. These times satisfy t'βi = tβi if Eβi is a watershed event and t'βi ≥ tβi if Eβi is a basin event. In addition, this phase constructs a list Lβi, for each 1 ≤ i ≤ k, which is a superset of those tributaries of βi that belong to Fβi or spill into β across a saddle on the boundary of a basin βj ∈ Fβi. The second, diffluent phase computes times tα1, tα2, ..., tαk and tβ1, tβ2, ..., tβk from the times and potential tributary lists computed in the confluent phase, and we prove that t'αi = tαi and t'βi = tβi, for all 1 ≤ i ≤ k. In the process, the diffluent phase computes the list Rβi of tributaries of βi, for every 1 ≤ i ≤ k.

Intuitively, this two-phase approach works because the confluent phase ensures that the spill times of all basins βi that are tributaries of other basins are computed correctly (since βi can be a tributary only if its spill event is a watershed event), and because αi is a sub-basin of αj, for all j < i, and a super-basin of αh, for all h > i. Thus, by processing the basins "outwards" from αk, that is, by decreasing index i, in the diffluent phase, we can ensure that the spill times of all tributaries of a basin are known by the time the basin is processed.

3.3.1 From Tributaries to Spill Times

Both phases of our algorithm use the same elementary procedure to compute a predicted spill time for a basin. To avoid duplication, we describe this procedure here and refer to it as procedure FindSpillTime later. The input of this procedure is a basin α, a time uα, the residual volume V^r_α of α at time uα, and the watershed Wα = W^{uα}_α of α at time uα. In addition, we are given a priority queue Q containing a set of potential tributaries of α; each entry τ ∈ Q satisfies t'τ > uα, where t'τ denotes the priority of τ. The output of the procedure is a predicted spill time t'α of α and a list L of entries removed from Q while computing t'α.

Initially, we set t'α := uα + V^r_α/Wα. Then we repeat the following steps to update t'α until either Q is empty or the minimum entry τ in Q satisfies t'τ ≥ t'α. We remove the minimum entry τ (which satisfies t'τ < t'α) from Q and append it to L; then we set V^r_α := V^r_α − Wα(t'τ − uα), uα := t'τ, Wα := Wα + Wτ, and t'α := uα + V^r_α/Wα.
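The procedure just described admits a direct sketch. Names are illustrative, and a plain binary heap stands in for the meldable priority queue Q of the paper:

```python
import heapq

def find_spill_time(u, vr, w, q):
    """Sketch of procedure FindSpillTime: q is a min-heap of
    (priority t'_tau, name, watershed W_tau) entries, all with priority > u.
    Returns the predicted spill time t'_alpha and the list L of removed names."""
    t = u + vr / w
    removed = []
    while q and q[0][0] < t:
        t_tau, tau, w_tau = heapq.heappop(q)
        removed.append(tau)
        vr -= w * (t_tau - u)     # volume filled between u and t'_tau
        u = t_tau
        w += w_tau                # tau's watershed now drains into alpha
        t = u + vr / w            # updated prediction
    return t, removed
```

Each removed tributary can only lower the prediction further, so the loop terminates as soon as the cheapest remaining entry can no longer spill before alpha does.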
Lemma 2. Consider an invocation of procedure FindSpillTime on a basin α, and assume that every tributary τ ∈ Q of α satisfies t'τ = tτ. Then t'α ≥ tα when the procedure terminates. If Q contains every tributary τ of α with tτ ≥ uα, then t'α = tα and L = {τ ∈ Rα | tτ ≥ uα}.

Proof. First assume for the sake of contradiction that [...]. Now assume Q contains all tributaries of α and t'α > tα when procedure FindSpillTime finishes. This is possible only if we do not process all tributaries of α. An unprocessed tributary τ would have to remain in Q by the end of the procedure and, thus, satisfies t'τ > t'α. However, t'τ = tτ < tα < t'α, a contradiction. Hence, t'α = tα, and the entries removed from Q are exactly the tributaries of α that satisfy tτ ≥ uα.

3.3.2 Confluent Phase

To implement the confluent phase of the algorithm, we root the flow tree F in αk and process its nodes in post-order. With every node α ∈ F we associate a list Sα ⊆ Rβ [...]. Our implementation uses priority queues that support Insert, DeleteMin, Create, and Meld operations using O(sort(N)) I/Os (see Section 4).

The processing of a node βi distinguishes two cases. If βi is a leaf, we create a new priority queue Qβi at the end of the sequence of priority queues of active nodes. If βi is an internal node with children βj1, βj2, ..., βjh, the corresponding priority queues [are melded into a single queue Qβi]. Then we use procedure FindSpillTime to compute t'βi and Lβi; the input to the procedure is βi, Qβi, uβi := 0, Wβi := W^0_βi, and V^r_βi := Vβi. Once this procedure terminates, we insert βi into Qβi, with priority t'βi. The confluent phase terminates when the traversal reaches the root αk of F; since Eαk is a diffluent spill event, [...].

Lemma 3. Consider a tributary τ of a basin α ∈ {αi, βi}, and assume that τ ∈ {βj} ∪ Sβj, for some j. Then Eβh is a watershed event if βh is not a sub-basin of α but belongs to the path from βj to α in F.

Proof. For j > i, the lemma holds vacuously because βj is a sub-basin of αi in this case and α ∈ {αi, βi}; this implies that every basin βh on the path from βj to α is a sub-basin of αi. For j ≤ i, assume there exists a basin βh on the path from βj to α that is not a sub-basin of αi and such that Eβh is a basin event. Since βh is not a sub-basin of αi, we have h ≤ i. If α = αi, this implies that α is a sub-basin of αh and tα ≤ tαh ≤ tβh. If α = βi, then h < i [...].

Lemma 4. Consider a tributary τ of βi with τ ∈ {βj} ∪ Sβj, for some βj ∈ Fβi, and assume that t'βh = tβh for every basin βh ∈ Fβi whose spill event is a watershed event. Then τ ∈ Lβi.

Proof. First assume τ is a tributary of βi and βj ∈ Fβi. Then Eτ is a watershed event and either τ ∈ Sβj or τ = βj. In both cases, we have t'τ = tτ, in the latter case by the assumption that t'βh = tβh for every watershed event on the path from βj to γ. If γ belongs to the path from βj to βi and γ ≠ βi, then we have t'γ = tγ [...]. Thus, the computation of t'βi removes τ from Qβi, and τ ∈ Lβi.
Now assume that τ is a tributary of a basin α ∈ {αi, βi} and, if α = βi, that βj ∉ Fβi. Then, if γ belongs to the path from βj to αi, we obtain t'γ = tγ on the one hand, because τ is a tributary of α and, by Lemma 3, Eγ is a watershed event; on the other hand, tγ > tτ, since τ ∉ Lγ. Thus, we obtain a contradiction, and γ is a sub-basin of αi.

Lemma 5. For all 1 ≤ i ≤ k, we have t'βi ≥ tβi. If Eβi is a watershed event, equality holds.

Proof. By induction on |Fβi|. The base case is |Fβi| = 0, in which case there is nothing to prove. So consider a node βi and assume the claim holds for all its descendants. Then, by Lemma 4, every tributary τ ∈ {βj} ∪ Sβj, for some descendant βj of βi, is inspected when computing t'βi, and each such tributary satisfies t'τ = tτ by the inductive hypothesis. In particular, each such tributary belongs to Qβi when invoking procedure FindSpillTime to compute t'βi. We prove that any other element α ∈ Qβi satisfies t'α ≥ tβi. By Lemma 2, this implies that t'βi ≥ tβi, with t'βi = tβi if Qβi contains all tributaries of βi, which is the case if Eβi is a watershed event. Indeed, if there was a tributary τ such that τ ∈ {βj} ∪ Rβj, for some βj ∉ Fβi, the water spilling from τ into βi would have to flow through a point interior [...] a watershed event, αi is not full at time tτ, that is, the path from τ to βi cannot be down-hill at time tτ, a contradiction.

So consider an element α ∈ Qβi, and assume that α ∈ {βj} ∪ Sβj, for some βj ∈ Fβi, and t'α [...].

3.3.3 Diffluent Phase

[...] we ignore in iteration i all elements αj or βj in Q with j > i. (Note that we ignore only these basins, not the elements of the lists Sαj and Rβj with j > i.)

Lemma 6. For all 1 ≤ i ≤ k, we have t'αi = tαi, t'βi = tβi, R'βi = Rβi, and R'αi = Rαk ∪ ⋃_{j=i}^{k−1} Rβj.

Proof. By induction on i. For i = k, Lemmas 4 and 5 imply that every tributary τ of αk belongs to Lαk and satisfies t'τ = tτ. Any other basin α ∈ Lαk satisfies t'α ≥ tα, and the same arguments as in the proof of Lemma 5 show that t'α ≥ tαk in this case. Hence, by Lemma 2, the first iteration computes t'αk = tαk and processes only tributaries of αk, that is, R'αk = Rαk.

Now consider βk. By Lemmas 4 and 5, every tributary of βk belongs to {αk} ∪ Lαk ∪ Lβk, and we have just argued that the computation of αk processes only tributaries of αk. Hence, Q contains all tributaries of βk. We have just argued that t'αk = tαk and, by Lemma 5, any other tributary τ of βk satisfies t'τ = tτ. For an element α ∈ Q that is not a tributary of βk, the same arguments as in the proof of Lemma 5 show that t'α ≥ tβk. (However, we have to distinguish the cases βj ∈ Fβk and βj ∉ Fβk, where α ∈ {βj} ∪ Rβj.) Thus, by Lemma 2, we compute t'βk = tβk and R'βk = Rβk.

Now assume that i < k and that the claim holds for all indices greater than i. By Lemmas 4 and 5, every tributary τ of αi is contained in L := Lαi ∪ ⋃_{j=i+1}^{k} Lβj = Q ∪ ⋃_{j=i+1}^{k} (Rαj ∪ Rβj) and satisfies t'τ = tτ. Using the arguments from the proof of Lemma 5 again, every α ∈ L [...]

increasing elevation. Then the water from every tributary in R_β first collects in T(v_1); once T(v_1) is full, the water collects in T(v_2), and so on. This is just a simplified version of the diffluent flow computation in Section 3.3.3, and computing the spill times of basins T(v_1), T(v_2), ..., T(v_k) — that is, the flooding times of vertices v_1, v_2, ..., v_k — from the list R_β takes O(sort(N)) I/Os in total for all basins of T. The computation of the sets V(β) can be incorporated in the computation of the basins of T without increasing the I/O complexity of that phase. Details appear in the full paper. Thus, we have the following result.

Theorem 2. The flooding times of all vertices of a terrain T with N vertices and X pits can be computed using O(sort(X) log(X/M) + sort(N)) I/Os, given the watersheds and volumes of all basins of T.

4. A MELDABLE PRIORITY QUEUE

In this section, we discuss how to extend the external heap of Fadel et al. [12] to obtain a meldable priority queue that can be used in the procedure in Section 3.3.2. This structure maintains a sequence of priority queues Q_1, Q_2, ..., Q_k under Create, Insert, DeleteMin, and Meld operations. Given the current sequence Q_1, Q_2, ..., Q_k, a Create operation creates a new, empty priority queue Q_{k+1} and appends it to the end of the sequence; an Insert(x) operation inserts element x into Q_k; a DeleteMin operation deletes and returns the minimum element in Q_k; a Meld operation replaces Q_{k−1} and Q_k with a new priority queue Q′_{k−1} := Q_{k−1} ∪ Q_k containing the elements from Q_{k−1} and Q_k. We prove the following result.

Theorem 3. There exists a linear-space data structure that uses O(sort(N)) I/Os to process a sequence of N Create, Insert, DeleteMin, and Meld operations.

4.1 The Structure

Each priority queue Q_i is represented as an external heap similar to the one presented in [12]. The underlying structure is a rooted tree whose internal nodes have between a = M/(4B) and b = M/B children. There are two types of leaves: regular leaves are on the lowest level of the tree, which we call the leaf level; leaves above the leaf level are special.

Every node v has an associated buffer X_v. If v is an internal node, X_v has size M. If v is a leaf, there is no bound on the size of X_v. The elements in X_v are sorted, and the buffer contents of adjacent nodes satisfy the heap property: for every node v with parent u and every pair of elements x ∈ X_u and y ∈ X_v, we have x ≤ y.

To support Insert and DeleteMin operations efficiently, every priority queue Q_i is equipped with an insert/delete buffer, or I/D buffer for short. This buffer is capable of holding up to M elements and stores the minimum elements in Q_i as well as newly inserted elements. To distinguish newly inserted elements in Q_i's I/D buffer from minimum elements, Q_i has an associated priority p_i, which is the minimum priority of the elements stored in the external part of Q_i.

The I/D buffers of priority queues Q_1, Q_2, ..., Q_k are kept on a buffer stack, with the buffer of Q_1 at the bottom and the buffer of Q_k at the top. The I/D buffers of consecutive priority queues are separated by special marker elements. We always keep the topmost M elements of this stack in memory, which ensures in particular that the I/D buffer of Q_k is in memory.

4.2 Operations

Create. To create a new priority queue Q_{k+1}, a Create operation pushes a new marker element onto the buffer stack.

Insert. An Insert operation on Q_k adds the inserted element x to Q_k's I/D buffer. If the buffer now contains M elements, we Flush the buffer as described below. Then we Fill the I/D buffer with the M/2 minimum elements in Q_k and update p_k accordingly.

DeleteMin. A DeleteMin operation on Q_k returns the minimum element in Q_k's I/D buffer. Before doing so, however, it checks whether the I/D buffer of Q_k is empty or its minimum element is greater than p_k. If so, it Flushes the I/D buffer and then Fills it with the M/2 minimum elements in Q_k.

Meld. A Meld operation behaves differently depending on the structure of the two involved priority queues, Q_{k−1} and Q_k. During different stages of its lifetime, a priority queue may have no external portion because all its elements fit in the I/D buffer. If at least one of the two priority queues, say Q_k, has no external portion, we destroy its I/D buffer and Insert its elements into Q_{k−1}. If both priority queues have external portions, we Flush and destroy their I/D buffers, Merge their two trees as described below, and then Fill the I/D buffer of the merged priority queue Q′_{k−1} := Q_{k−1} ∪ Q_k with the M/2 minimum elements in Q′_{k−1}.

Flush. A Flush operation of an I/D buffer containing K elements creates a new leaf l at the leaf level and stores these K elements in l. Then it applies a Heapify operation to the path from l to the root to restore the heap property. If the addition of l increased the degree of l's parent p to M/B + 1, we split p into two nodes p′ and p′′, each with half of p's children. We associate the buffer of p with p′ and populate the buffer of p′′ with the M smallest elements in its subtree using a Fill operation. If this split increases the degree of p's parent to M/B + 1, we apply this rebalancing procedure recursively until we reach the root. If the root has degree greater than M/B, we split it into two nodes with a new parent.

Fill. A Fill operation applied to a node v repeatedly takes the smallest element stored in the buffers of v's children, removes it from the corresponding child's buffer, and stores it in v's buffer. This continues until M elements have been collected in v's buffer or there are no elements left in the buffers of v's children. Whenever a child buffer runs empty while filling v's buffer, its buffer is filled recursively before continuing to fill v's buffer. Once there are no more elements left in a node v's subtree, we mark v as exhausted, in order to avoid repeatedly trying to fill v with elements. More precisely, a node is marked as exhausted once its buffer becomes empty and all its children are exhausted.

To fill the I/D buffer of a priority queue Q_k with the M/2 smallest elements in Q_k, we transfer the M/2 smallest elements from the root of Q_k into the I/D buffer, Filling the root's buffer whenever it runs empty. If the root becomes exhausted in the process, we delete the entire external portion of the priority queue.
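The interplay between the I/D buffer, the priority p_k, and the Flush/Fill operations above can be sketched in a purely in-memory toy model. The class name `ToyMeldablePQ` and all code below are our own simplification, not the paper's block-based structure: a binary heap stands in for the external tree of buffers, so no I/O costs are modeled, only the buffering discipline.

```python
import heapq

class ToyMeldablePQ:
    """In-memory toy of the external heap with an insert/delete buffer."""

    def __init__(self, M=8):
        self.M = M                  # I/D buffer capacity
        self.idbuf = []             # I/D buffer: new inserts + current minima
        self.external = []          # stands in for the external tree of buffers
        self.p = float('inf')       # p_k: min priority in the external part

    def _flush(self):
        # Flush: move the I/D buffer's contents into the external part.
        for x in self.idbuf:
            heapq.heappush(self.external, x)
        self.idbuf = []
        self.p = self.external[0] if self.external else float('inf')

    def _fill(self):
        # Fill: pull the M/2 smallest external elements into the I/D buffer.
        for _ in range(self.M // 2):
            if not self.external:
                break
            self.idbuf.append(heapq.heappop(self.external))
        self.p = self.external[0] if self.external else float('inf')

    def insert(self, x):
        self.idbuf.append(x)
        if len(self.idbuf) == self.M:
            self._flush()
            self._fill()

    def delete_min(self):
        # If the buffer is empty or its minimum exceeds p, the true minimum
        # lives in the external part: Flush, then Fill, as described above.
        if not self.idbuf or min(self.idbuf) > self.p:
            self._flush()
            self._fill()
        m = min(self.idbuf)
        self.idbuf.remove(m)
        return m

    def meld(self, other):
        if not self.external and not other.external:
            # Neither queue has an external portion: reinsertion suffices.
            for x in other.idbuf:
                self.insert(x)
            return self
        # Otherwise flush both I/D buffers, merge the external parts,
        # and refill the merged queue's I/D buffer.
        self._flush()
        other._flush()
        for x in other.external:
            heapq.heappush(self.external, x)
        self.p = self.external[0]
        self._fill()
        return self
```

The two branches of `meld` mirror the case analysis in the Meld description: a buffer-only queue is absorbed by reinsertion, while two queues with external portions are flushed, merged, and refilled.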

Merge. A Merge operation between two trees T_1 and T_2 of heights h_1 > h_2 proceeds as follows. We locate a node v at height h_2 + 1 in T_1 and make the root r_2 of T_2 a child of v. We also create a new special leaf node l that is a child of v. Now let v = v_0, v_1, ..., v_h be the ancestors of v, by increasing distance from v, let K_i be the number of elements in X_{v_i}, and let K = Σ_{i=0}^{h} K_i. First we collect the elements in X_{v_0}, X_{v_1}, ..., X_{v_h} in sorted order and store them in l. Then we repeatedly remove the minimum element from X_{r_2} and X_l, refilling X_{r_2} as necessary, until we have collected the K smallest elements in the subtrees rooted at r_2 and l. We store these elements in buffers X_{v_0}, X_{v_1}, ..., X_{v_h}, sorted top-down and so that each node v_i receives K_i elements. We clear the "exhausted" labels of all nodes among v_0, v_1, ..., v_h. If v has degree greater than M/B as a result of gaining two children, r_2 and l, we rebalance the tree using node splits starting at v, similar to the rebalancing done after a Flush operation.

Heapify. The final operation to discuss is the Heapify operation. Given a node v and its path v = v_0, v_1, ..., v_h to the root, a Heapify operation ensures that the elements stored on the path satisfy the heap property. It assumes that the elements in v_1, v_2, ..., v_h already do so, but some elements in v_0 may be less than elements stored at higher nodes. To restore the heap property, we sort the elements in v_0 and collect the elements in v_1, v_2, ..., v_h in sorted order. Then we merge the two sorted sequences to obtain a sorted sequence of the elements stored in v_0, v_1, ..., v_h and distribute these elements over the buffers of nodes v_0, v_1, ..., v_h, assigning the same number of elements to each node as it had before the operation and storing the elements sorted top-down on the path. If some of the nodes on the path were marked as exhausted before this operation, we unmark them.

4.3 Analysis

To prove the correctness of all priority queue operations, we need to verify that the heap property is maintained at all times. This is little more than an exercise and is therefore omitted. It is easy to see that every Create, Insert, DeleteMin, and Meld operation makes only O(1) changes to the buffer stack and thus has an amortized cost of O(1/B) I/Os, excluding the manipulations performed by the Flush, Fill, Merge, and Heapify operations they trigger. The following two lemmas bound the cost of all Flush, Fill, Merge, and Heapify operations performed during a sequence of N priority queue operations. Due to lack of space, their proofs are omitted.

Lemma 8. During a sequence of N priority queue operations, at most O(N/M) Flush, Fill, Merge, and Heapify operations are performed.

Lemma 9. The amortized cost per Flush, Fill, Merge, or Heapify operation is O((M/B) log_{M/B}(N/M)) I/Os.

5. REFERENCES

[1] P. Agarwal, L. Arge, and K. Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SCG, pp. 167–176, 2006.
[2] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Comm. of the ACM, 31(9):1116–1127, 1988.
[3] L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1–24, 2003.
[4] L. Arge, J. Chase, P. Halpin, L. Toma, D. Urban, J. S. Vitter, and R. Wickremesinghe. Flow computation on massive grid terrains. In Proc. 9th ACM GIS, pp. 82–87, 2001.
[5] L. Arge and M. Revsbæk. I/O-efficient contour tree simplification. In Proc. 20th ISAAC, pp. 1155–1165, 2009.
[6] Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Proc. 6th SODA, pp. 139–149, 1995.
[7] A. Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Dept. of Comp. Sci., Duke University, 2006.
[8] A. Danner, K. Yi, T. Mølhave, P. K. Agarwal, L. Arge, and H. Mitasova. TerraStream: From elevation data to watershed hierarchies. Manuscript, 2007.
[9] M. de Berg, O. Cheong, H. Haverkort, J.-G. Lim, and L. Toma. I/O-efficient flow modelling on fat terrains. In Proc. 10th WADS, pp. 239–250, 2007.
[10] H. Edelsbrunner, J. Harer, and A. Zomorodian. Hierarchical Morse complexes for piecewise linear 2-manifolds. In Proc. 17th SCG, pp. 70–79, 2001.
[11] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. In Proc. 41st FOCS, pp. 454–463, 2000.
[12] R. Fadel, K. V. Jakobsen, J. Katajainen, and J. Teuhola. Heaps and heapsort on secondary storage. Theor. Comp. Sci., 220(2):345–362, 1999.
[13] S. Jenson and J. Domingue. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Eng. and Remote Sensing, 54(11):1593–1600, 1988.
[14] Y. Liu and J. Snoeyink. Flooding triangulated terrain. In Proc. 11th Int. Symp. on Spatial Data Handling, pp. 137–148, 2005.
[15] J. F. O'Callaghan and D. M. Mark. The extraction of drainage networks from digital elevation data. Computer Vision, Graphics and Image Proc., 28(3):323–344, 1984.
[16] O. Palacios-Velez, W. Gandoy-Bernasconi, and B. Cuevas-Renaud. Geometric analysis of surface runoff and the computation order of unit elements in distributed hydrological models. J. Hydrology, 211:266–274, 1998.
[17] D. M. Theobald and M. F. Goodchild. Artifacts of TIN-based surface flow modeling. In Proc. GIS/LIS'90, pp. 955–964, 1990.
[18] G. Tucker, S. Lancaster, N. Gasparini, and S. Rybarczyk. An object-oriented framework for distributed hydrologic and geomorphic modeling using triangulated irregular networks. Computers and Geosciences, 27(8):959–973, 2001.

Cache-Oblivious Dynamic Dictionaries with Update/Query Tradeoffs

Gerth Stølting Brodal∗† Erik D. Demaine‡† Jeremy T. Fineman‡§ John Iacono¶ Stefan Langerman‖ J. Ian Munro∗∗

Abstract

Several existing cache-oblivious dynamic dictionaries achieve O(log_B N) (or slightly better O(log_B (N/M))) memory transfers per operation, where N is the number of items stored, M is the memory size, and B is the block size, which matches the classic B-tree data structure. One recent structure achieves the same query bound and a sometimes-better amortized update bound of O( log_B N / B^{Θ(1/(log log B)^2)} + (log^2 N)/B ) memory transfers. This paper presents a new data structure, the xDict, implementing predecessor queries in O((1/ε) log_B (N/M)) worst-case memory transfers and insertions and deletions in O((1/(εB^{1−ε})) log_B (N/M)) amortized memory transfers, for any constant ε with 0 < ε < 1. For example, the xDict achieves subconstant amortized update cost when N = M B^{o(B^{1−ε})}, whereas the B-tree's Θ(log_B (N/M)) is subconstant only when N = o(MB), and the previously obtained Θ( log_B N / B^{Θ(1/(log log B)^2)} + (log^2 N)/B ) is subconstant only when N = o(2^{√B}). The xDict attains the optimal tradeoff between insertions and queries, even in the broader external-memory model, for the range where inserts cost between Ω((1/B) lg^{1+ε} N) and O(1/ lg^3 N) memory transfers.

1 Introduction

This paper presents a new data structure, the xDict, which is the asymptotically best data structure for the dynamic-dictionary problem in the cache-oblivious model.

Memory models. The external-memory (or I/O) model [1] is the original model of a two-level memory hierarchy. This model consists of an internal memory of size M and a disk storing all remaining data. The algorithm can transfer contiguous blocks of data of size B to or from disk at unit cost. The textbook data structure in this model is the B-tree [2], a dynamic dictionary that supports inserts, deletes, and predecessor queries in O(log_B N) memory transfers per operation.

The cache-oblivious model [9, 10] arose in particular from the need to model multi-level memory hierarchies. The premise is simple: analyze a data structure or algorithm just as in the external-memory model, but the data structure or algorithm is not explicitly parametrized by M or B. Thus the analysis holds for an arbitrary M and B, in particular all the M's and B's encountered at each level of the memory hierarchy. The algorithm could not and fortunately does not have to worry about the block replacement strategy because the optimal strategy can be simulated with constant overhead. This lack of parameterization has let algorithm designers develop elegant solutions to problems by finding the best ways to enforce data locality.

Comparison of cache-oblivious dictionaries. Refer to Table 1. In the cache-oblivious model, Prokop's static search structure [10] was the first to support predecessor searches in O(log_B N) memory transfers, but it does not support insertion or deletion. Cache-oblivious B-trees [3, 4, 7] achieve O(log_B N) memory transfers for insertion, deletion, and search. The shuttle tree [5] supports insertions and deletions in amortized O( log_B N / B^{Θ(1/(log log B)^2)} + (log^2 N)/B ) memory transfers, which is an improvement over Θ(log_B N) for N = 2^{o(B/ log B)}, while preserving the O(log_B N) query bound.¹ Our xDict reduces the insertion and deletion bounds further to O((1/(εB^{1−ε})) log_B (N/M)), for any constant 0 < ε ≤ 1, under the tall-cache assumption (common to many cache-oblivious algorithms) that M = Ω(B²). For all of these data structures, the query bounds are worst case and the update bounds are amortized.

∗ Department of Computer Science, Aarhus University, IT-parken, Åbogade 34, DK-8200 Århus N, Denmark, [email protected]
† Supported in part by MADALGO — Center for Massive Data Algorithmics — a Center of the Danish National Research Foundation.
‡ MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St., Cambridge, MA 02139, USA, [email protected], [email protected]
§ Supported in part by NSF Grants CCF-0541209 and CCF-0621511, and Computing Innovation Fellows.
¶ Department of Computer Science and Engineering, Polytechnic Institute of New York University, 5 MetroTech Center, Brooklyn, NY 11201, USA, http://john.poly.edu
‖ Maître de recherches du F.R.S.-FNRS, Computer Science Department, Université Libre de Bruxelles, CP212, Boulevard du Triomphe, 1050 Bruxelles, Belgium, [email protected]
∗∗ Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada, [email protected]
¹ For these previous data structures, the log_B N terms may well reduce to log_B (N/M) terms, but only log_B N was explicitly proven.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

Data Structure        Search                            Insert/Delete
static search [10]    O(log_B N)                        not supported
B-trees [3, 4, 7]     O(log_B N)                        O(log_B N)
shuttle tree [5]      O(log_B N)                        O( log_B N / B^{Θ(1/(log log B)^2)} + (log^2 N)/B )
lower bound [6]       O((1/ε) log_B (N/M))          ⇒   Ω((1/(εB^{1−ε})) log_B (N/M))
xDict [this paper]    O((1/ε) log_B (N/M))              O((1/(εB^{1−ε})) log_B (N/M))

Table 1: Summary of known cache-oblivious dictionaries. Our xDict uses the tall-cache assumption that M =Ω(B2).
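To make the bounds in Table 1 concrete, the snippet below evaluates the B-tree and xDict update costs with constant factors dropped. The parameter values are our own illustrative choices, not from the paper; the point is that the xDict's amortized update cost is a factor of εB^{1−ε} below the B-tree's.

```python
import math

def btree_update(N, M, B):
    # B-tree: Theta(log_B(N/M)) memory transfers per update (constants dropped).
    return math.log(N / M, B)

def xdict_update(N, M, B, eps):
    # xDict: Theta((1/(eps * B^(1-eps))) * log_B(N/M)) amortized transfers.
    return math.log(N / M, B) / (eps * B ** (1 - eps))

# Illustrative parameters (our choice): 2^40 items, memory 2^20, block 2^10.
N, M, B, eps = 2 ** 40, 2 ** 20, 2 ** 10, 0.5
```

With these values the B-tree pays about log_{1024}(2^20) = 2 transfers per update, while the xDict pays 2 / (0.5 · 1024^{0.5}) = 0.125, i.e. a subconstant number of transfers, illustrating the εB^{1−ε} = 16-fold gap.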

Lower bounds. The tradeoff between insertion and search costs was partly characterized in the external-memory model [6]. Because any algorithm in the cache-oblivious model is also an algorithm in the external-memory model using the same number of memory transfers, lower bounds for the external-memory model carry over to the cache-oblivious model. Brodal and Fagerberg [6] proved that any structure supporting insertions in I = O((1/(εB^{1−ε})) log_B (N/M)) amortized memory transfers, when I is between Ω((1/B) lg^{1+ε} N) and O(1/ lg^3 N) and when N ≥ M², must use Ω((1/ε) log_B (N/M)) worst-case memory transfers for search. They also produced a data structure achieving this tradeoff, but their structure is not cache-oblivious, using knowledge of the parameters B and M. The xDict structure achieves the same tradeoff in the cache-oblivious model, and is therefore optimal for this range of insertion cost I. Slightly outside this range, the optimal bounds are not known, even in the external-memory model.

2 Introducing the x-box

Our xDict dynamic-dictionary data structure is built in terms of another structure called the x-box. For any positive integer x, an x-box supports a batched version of the dynamic-dictionary problem (defined precisely later) in which elements are inserted in batches of Θ(x). Each x-box will be defined recursively in terms of y-boxes for y < x. The structure depends on a parameter α > 0, affecting the insertion cost, with lower values of α yielding cheaper insertions. This parameter is chosen globally and remains fixed throughout.

As shown in Figure 1, an x-box is composed of three buffers (arrays) and many √x-boxes, called subboxes. The three buffers of an x-box are the input buffer of size x, the middle buffer of size x^{1+α/2}, and the output buffer of size x^{1+α}. The √x-subboxes of an x-box are divided into two levels: the upper level consists of at most (1/4)x^{1/2} subboxes, and the lower level consists of at most (1/4)x^{1/2+α/2} subboxes. Thus, in total, there are fewer than (1/2)x^{1/2+α/2} subboxes. See Table 2 for a table of buffer counts and sizes. As a base case, an O(1)-box consists of a single array that acts as both the input and output buffers, with no recursive subboxes or middle buffer.

Logically, the upper-level subboxes are children of the input buffer and parents of the middle buffer. Similarly, the lower-level subboxes are children of the middle buffer and parents of the output buffer. However, the buffers and subboxes do not necessarily form a tree structure. Specifically, for an x-box D, there are pointers from D's input buffer to the input buffers of its upper-level subboxes. Moreover, there may be many pointers from D's input buffer to a single subbox. There are also pointers from the output buffers of the upper-level subboxes to D's middle buffer. Again, there may be many pointers originating from a single subbox. Similarly, there are pointers from D's middle buffer to its lower-level subboxes' input buffers, and from the lower-level subboxes' output buffers to D's output buffer.

The number of subboxes in each level has been chosen to match the buffer sizes. Specifically, the total size of the input buffers of all subboxes in the upper level is at most (1/4)x^{1/2} · x^{1/2} = (1/4)x, which is a constant factor of the size of the x-box's input buffer. Similarly, the total size of the upper-level subboxes' output buffers is at most (1/4)x^{1/2} · (x^{1/2})^{1+α} = (1/4)x^{1+α/2}, which matches the size of the x-box's middle buffer. The total sizes of the lower-level subboxes' input and output buffers are at most (1/4)x^{1/2+α/2} · x^{1/2} = (1/4)x^{1+α/2} and (1/4)x^{1/2+α/2} · (x^{1/2})^{1+α} = (1/4)x^{1+α}, respectively.

An x-box D organizes elements as follows. Suppose that the keys of elements contained in D range from [κ_min, κ_max). The elements located in the input buffer occur in sorted order, as do the elements located in the middle buffer and the elements located in the output buffer. All three of these buffers may contain elements having any keys between κ_min and κ_max. The upper-level subboxes, however, partition the key space. More precisely, suppose that there are r upper-level subboxes. Then there exist keys κ_min = κ_0 < κ_1 < ··· < κ_r = κ_max such that each subbox contains elements in a distinct range [κ_i, κ_{i+1}). Similarly, the lower-level subboxes partition the key space. There is, however, no relationship between the partition imposed by the upper-level subboxes and that of the lower-level subboxes; the subranges are entirely unrelated. What this setup
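The size accounting above can be checked mechanically. The snippet below plugs in x = 256 and α = 1 (our own illustrative values, taking the maximal subbox counts) and verifies the four totals.

```python
# Subbox bookkeeping for a concrete x-box (illustrative values: x = 256, α = 1).
x, alpha = 256, 1.0

sub = x ** 0.5                             # subbox parameter y = x^(1/2)
upper_count = x ** 0.5 / 4                 # at most (1/4)x^(1/2) upper subboxes
lower_count = x ** (0.5 + alpha / 2) / 4   # at most (1/4)x^(1/2+α/2) lower subboxes

sub_in = sub                               # subbox input buffer: size y
sub_out = sub ** (1 + alpha)               # subbox output buffer: size y^(1+α)

# Upper-level input buffers total a quarter of the x-box's input buffer.
assert upper_count * sub_in == x / 4
# Upper-level output buffers total (1/4)x^(1+α/2), matching the middle buffer.
assert upper_count * sub_out == x ** (1 + alpha / 2) / 4
# Lower-level input buffers also total (1/4)x^(1+α/2).
assert lower_count * sub_in == x ** (1 + alpha / 2) / 4
# Lower-level output buffers total a quarter of the output buffer x^(1+α).
assert lower_count * sub_out == x ** (1 + alpha) / 4
```

For x = 256 and α = 1 this gives 4 upper and 64 lower subboxes, with buffer totals 64, 1024, 1024, and 16384, each a quarter of the corresponding x-box buffer.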

Figure 1: The recursive structure of an x-box. The arrows represent lookahead pointers. These lookahead pointers are evenly distributed in the target buffer, but not necessarily in the source buffer. Additional pointers (not shown) allow us to find the nearest lookahead pointer(s) in O(1) memory transfers. The white space at the right of the buffers indicates empty space, allowing for future insertions.
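The lookahead-pointer mechanism in the figure can be sketched in a few lines. This is a toy model with our own function names: real buffers are arrays with gaps, and the nearest sample is found in O(1) memory transfers via embedded pointers rather than the linear scan used here for brevity.

```python
SAMPLE_RATE = 16  # every sixteenth element is sampled, as in the paper's scheme

def sample_up(child_buffer):
    # Pair every SAMPLE_RATE-th key of a sorted child buffer with its index
    # there; these pairs model the lookahead pointers stored in the parent.
    return [(child_buffer[i], i)
            for i in range(0, len(child_buffer), SAMPLE_RATE)]

def nearest_lookahead(samples, kappa):
    # Child index of the largest sampled key not exceeding kappa.
    best = 0
    for key, idx in samples:
        if key <= kappa:
            best = idx
        else:
            break
    return best

def search_from_hint(child_buffer, hint, kappa):
    # Scan at most SAMPLE_RATE slots from the hint, as the search does
    # between two consecutive lookahead pointers; return the slot or None.
    for i in range(hint, min(hint + SAMPLE_RATE, len(child_buffer))):
        if child_buffer[i] == kappa:
            return i
        if child_buffer[i] > kappa:
            break
    return None
```

For a child buffer of 100 sorted keys, any search scans at most 16 slots from its sampled hint, which is the constant-size scan the cost analysis relies on.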

Buffer                         Size per buffer      Number of buffers     Total size
Top buffer                     x                    1                     x
Upper-level √x-subboxes:
  Top buffer                   x^{1/2}              (1/4)x^{1/2}          (1/4)x
  Middle buffer                (x^{1/2})^{1+α/2}    (1/4)x^{1/2}          (1/4)x^{1+α/4}
  Bottom buffer                (x^{1/2})^{1+α}      (1/4)x^{1/2}          (1/4)x^{1+α/2}
Middle buffer                  x^{1+α/2}            1                     x^{1+α/2}
Lower-level √x-subboxes:
  Top buffer                   x^{1/2}              (1/4)x^{1/2+α/2}      (1/4)x^{1+α/2}
  Middle buffer                (x^{1/2})^{1+α/2}    (1/4)x^{1/2+α/2}      (1/4)x^{1+3α/4}
  Bottom buffer                (x^{1/2})^{1+α}      (1/4)x^{1/2+α/2}      (1/4)x^{1+α}
Bottom buffer                  x^{1+α}              1                     x^{1+α}

Table 2: Sizes of buffers in an x-box. This table lists the sizes of the three buffers in an x-box and the sizes and number of buffers in its recursive x^{1/2}-boxes, expanding just one level of recursion.

means is that an element with a particular key may be located in the input buffer or the middle buffer or the output buffer or a particular upper-level subbox or a particular lower-level subbox. Our search procedure will thus look in all five of these locations to locate the element in question.

Before delving into more detail about the x-boxes, let us first give a rough sketch of insertions into the data structure. When an element is inserted into an x-box D, it is first inserted into D's input buffer. As the input buffer stores elements in sorted order, elements in the input buffer must move to the right to accommodate newly inserted elements. When D's input buffer becomes full (or nearly full), the elements are inserted recursively into D's upper-level subboxes. When an upper-level subbox becomes full enough, it is "split" into two subboxes, with one subbox taking the half of the elements with smaller keys and the other taking the elements with larger keys. When the maximum number of upper-level subboxes is reached, all elements are moved from the upper-level subboxes to D's middle buffer (which stores all elements in sorted order). When the middle buffer becomes full enough, elements are moved to the lower-level subboxes in a similar fashion. When the maximum number of lower-level subboxes is reached, all elements are moved from the lower-level subboxes to D's output buffer.

To aid in pushing elements down to the appropriate subboxes, we embed in the input (and middle) buffers a pointer to each of the upper-level (and lower-level) subboxes. These subbox pointers are embedded into the buffers by associating with them the minimum key stored in the corresponding subbox, and then storing them along with the buffer's elements in sorted order by key.

To facilitate searches, we employ the technique of fractional cascading [8], giving an x-box's input buffer (and middle buffer) a sample of the elements of the upper-level subboxes' (and lower-level subboxes') input buffers. Specifically, let U be an upper-level subbox of the x-box D. Then a constant fraction of the keys stored in U's input buffer are also stored in D's input buffer. This sampling is performed deterministically by placing every sixteenth element in U's input buffer into D's input buffer. The sampled element (in D's input buffer) has a pointer to the corresponding element in U's input buffer. This type of pointer also occurs in [5], where they are called "lookahead pointers." We adopt the same term here. This sampling also occurs on the output buffers. Specifically, the output buffers of the upper-level (and lower-level) subboxes contain a similar sample of the elements in D's middle buffer (and output buffer).²

² Any sufficiently large sampling constant suffices here. We chose 16 as an example of one such constant for concreteness. The constant must be large enough that, when sampling D's output buffer, the resulting lookahead pointers occupy only a constant fraction of the lower-level subboxes' output buffers. To make the description more concise, the constant fraction we allow is 1/4, but in fact any constant would work. As the lower-level subboxes' output buffers account for only 1/4 of the space of D's output buffer, these two constants are multiplied to get that only 1/16 of the elements in D's output buffer may be sampled.

The advantage of lookahead pointers is roughly as follows. Suppose we are looking for a key κ in some buffer A. Let s be the multiple of 16 (i.e., a sampled element) such that A[s] ≤ κ < A[s + 16]. Then only the O(1) slots between these two consecutive samples need to be scanned, and the lookahead pointer at A[s] provides the starting point for the search in the next buffer. To support this, we store with each element of the input (and middle) buffer a pointer to the nearest lookahead or subbox pointers preceding and following it.

Techniques. While fractional cascading has been employed before [5], we are not aware of any previous cache-oblivious data structures that are both recursive and that use fractional cascading. Another subtler difference between the x-box and previous cache-oblivious structures involves the sizing of subboxes and our use of the middle buffer. If the x-box matched typical structures, then an x-box with an input buffer of size x and an output buffer of size x^{1+α} would have a middle buffer of size x^{√(1+α)}, not the x^{1+α/2} that we use. That is to say, it is natural to size the buffers such that a size-x input buffer is followed by a size-x^δ middle buffer for some δ, and a size-y = x^δ middle buffer is followed by a size-y^δ = x^{δ²} output buffer. Our choice of sizes causes the data structure to be more topheavy than usual, a feature which facilitates obtaining our query/update tradeoff.

3 Sizing an x-box

An x-box stores the following fields, in order, in a fixed contiguous region of memory.

1. A counter of the number of real elements (not counting lookahead pointers) stored in the x-box.
2. The top buffer.
3. Array of booleans indicating which upper-level subboxes are being used.
4. Array of upper-level subboxes, in an arbitrary order.
5. The middle buffer.
6. Array of booleans indicating which lower-level subboxes are being used.
7. Array of lower-level subboxes, in an arbitrary order.
8. The bottom buffer.

LEMMA 3.1. For a sufficiently large constant c, an x-box occupies at most cx^{1+α} contiguous space.

Proof. The proof is by induction. An x-box contains three buffers of total size c′(x + x^{1+α/2} + x^{1+α}) ≤ 3c′x^{1+α}, where c′ is the constant necessary for each array entry (including information about lookahead pointers, etc.). The boolean arrays use a total of at most (1/2)x^{1/2+α/2} ≤ c′x^{1+α} space, giving us a running total of 4c′x^{1+α} space. Finally, the subboxes by assumption use a total of at most c(x^{1/2})^{1+α} · (1/2)x^{1/2+α/2} = (c/2)x^{1+α} space. Setting c ≥ 8c′ yields a total space usage of at most cx^{1+α}.

4 Operating an x-box

An x-box D supports two operations:

1. BATCH-INSERT(D, e_1, e_2, ..., e_{Θ(x)}): Insert Θ(x) keyed elements e_1, e_2, ..., e_{Θ(x)}, given as a sorted array, into the x-box D. BATCH-INSERT maintains lookahead pointers as previously described.

2. SEARCH(D, s, κ): Return a pointer to an element with key κ in the x-box D if such an element exists. If no such element exists, return a pointer to κ's predecessor in D's output buffer, that is, the element located in D's output buffer with the largest key smaller than κ. We assume that we are given the nearest lookahead pointer s preceding κ pointing into D's input buffer: that is, s points to the sampled element (i.e., having an index that is a multiple of 16) in D's input buffer that has the largest key not exceeding κ.

We treat an x-box as full or at capacity when it contains (1/2)x^{1+α} real elements. A BATCH-INSERT is thus only allowed when the inserted elements would not cause the x-box to contain more than (1/2)x^{1+α} elements. Our algorithm does not continue to insert into any recursive x-box when this number of elements is exceeded. The constant can be tuned to waste less space, but we choose 1/2 here to simplify the presentation. Recall that the output buffer of an x-box has size x^{1+α}, which is twice the size necessary to accommodate all the real elements in the x-box. We allow the other

1451 Copyright © by SIAM. Unauthorized reproduction of this article is prohibited. 1 x1+α 2 space in the output buffer to store external lookahead creating subboxes as necessary, and placing the appro- pointers (i.e., lookahead pointers into a buffer in the contain- priate lookahead pointers. The SAMPLE-UP operation ing x2-box). is an auxiliary operations used by BATCH-INSERT.

4.1 SEARCH. Searches are easiest. The operation SEARCH(D, s, κ) starts by scanning D's input buffer at slot s and continues until reaching an element with key κ or until reaching a slot where the key is larger than κ. By assumption on lookahead pointers, this scan considers O(1) array slots. If the scan finds an element with key κ, then we are done. Otherwise, we consider the nearest lookahead or subbox pointer preceding slot s − 1. We follow this pointer and search recursively in the corresponding subbox. That recursive search returns a pointer in the subbox's output buffer. We again reference the nearest lookahead pointer and jump to a point in the middle buffer. This process continues, scanning a constant-size region in the middle buffer, searching recursively in the lower-level subbox, and scanning a constant-size region in the output buffer.

LEMMA 4.1. For x > B, a SEARCH in an x-box costs O((1 + α) log_B x) memory transfers.

Proof. The search considers a constant-size region in the three buffers, for a total of O(1) memory transfers, and performs two recursive searches. Thus, the cost of a search can be described by the recurrence S(x) = 2S(√x) + O(1). Once x^(1+α) = O(B), or equivalently x = O(B^(1/(1+α))), an entire x-box fits in a block, and no further recursions incur any other cost. Thus, we have a base case of S(O(B^(1/(1+α)))) = 0. The recursion therefore proceeds with a nonzero cost for lg lg x − lg lg O(B^(1/(1+α))) levels, for a total of 2^(lg lg x − lg lg O(B^(1/(1+α)))) = (1+α) lg x / lg O(B) = O((1 + α) log_B x) memory transfers. □

4.3 FLUSH. To FLUSH an x-box, first flush all the subboxes. Now consider the result. Elements can live in only five possible places: the input buffer, the middle buffer, the output buffer, the upper-level subboxes' output buffers, and the lower-level subboxes' output buffers. The elements are in sorted order in all of the buffers. Moreover, as the upper-level (and lower-level) subboxes partition the key space, the collection of upper-level subboxes' output buffers forms a fragmented sorted list of elements. Thus, after flushing all subboxes, moving all elements to the output buffer requires just a 5-way merge into the output buffer. A constant-way merge can be performed in a linear number of memory transfers in general, but here we have to deal with the fact that upper-level and lower-level subboxes represent fragmented lists, requiring random accesses to jump from the end of one output buffer to the beginning of another.

When merging all the elements into the output buffer, we merge only real elements, not lookahead pointers (except for the lookahead pointers that already occur in the output buffer). This step breaks the sampling structure of the data structure, which we will later resolve with the SAMPLE-UP procedure. When the flush completes, the input and middle buffers are entirely empty, and all subboxes are deleted.³

LEMMA 4.2. For x^(1+α) > B, a FLUSH in an x-box costs O(x^(1+α)/B) memory transfers.

Proof. We can describe the flush by the recurrence

    F(x) = O(x^(1+α)/B) + O(x^(1/2+α/2)) + (1/2)x^(1/2+α/2) · F(√x),
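As a quick numerical check of the SEARCH recurrence S(x) = 2S(√x) + O(1) from Lemma 4.1, the following sketch (ours, not from the paper; the unit cost and the base-case threshold `base`, standing in for B^(1/(1+α)), are illustrative assumptions) counts cost units:

```python
import math

def search_cost(x, base):
    """Unit-cost model of S(x) = 2*S(sqrt(x)) + 1, with S(y) = 0 once
    y <= base, where base stands in for B^(1/(1+alpha)), i.e. the box
    size that fits in a single memory block."""
    if x <= base:
        return 0
    return 2 * search_cost(math.sqrt(x), base) + 1
```

With x = 2^16 and base = 4 there are lg lg x − lg lg base = 3 levels with nonzero cost, so the total is 2³ − 1 = 7 ≈ lg x / lg base, matching the O((1+α) log_B x) solution.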

where the first term arises due to scanning all the buffers (i.e., the 5-way merge), the second term arises from the random accesses both to load the first block of any of the subboxes and to jump when scanning the concatenated list of output buffers, and the third term arises due to the recursive flushing. When x^(1+α) = O(M), the second term disappears as the entire x-box fits in memory, and we thus need only pay for loading each block once. When applying a tall-cache assumption that M = Ω(B²), it follows that the second term only occurs when x^(1/2+α/2) = Ω(B), and hence when x^(1/2+α/2) = x^(1+α)/x^(1/2+α/2) = x^(1+α)/Ω(B) = O(x^(1+α)/B). We thus have a total cost of

    F(x) ≤ c₁(x^(1+α)/B + (1/2)x^(1/2+α/2) F(√x)),

4.2 BATCH-INSERT overview. For clarity, we decompose BATCH-INSERT into several operations, including two new auxiliary operations:

1. FLUSH(D): After this operation, all k elements in the x-box D are located in the first Θ(k) slots of D's output buffer. These elements occur in sorted order. All other buffers and recursive subboxes are emptied, temporarily leaving D without any internal lookahead pointers (to be fixed later by a call to SAMPLE-UP). The Θ(·) arises because of the presence of lookahead pointers directed from the output buffer. The FLUSH operation is an auxiliary operation used by BATCH-INSERT.

2. SAMPLE-UP(D): This operation may only be invoked on an x-box that is entirely empty except for its output buffer (as with one that has just been FLUSHed). The sampling process is employed from the bottom up, …

³ In fact, the subboxes use a fixed memory footprint, so they are simply marked as deleted.
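The final step of FLUSH is a constant-way merge of five sorted sources; a minimal sketch (our illustrative rendering, with Python lists standing in for the paper's buffers):

```python
import heapq

def flush_merge(input_buf, middle_buf, output_buf, upper_frags, lower_frags):
    """Sketch of FLUSH's 5-way merge. The upper- and lower-level
    subboxes' output buffers are fragmented sorted lists; because the
    subboxes partition the key space, concatenating the fragments in
    subbox order yields a sorted list. heapq.merge then combines all
    five sorted sources in a single pass."""
    upper = [e for frag in upper_frags for e in frag]
    lower = [e for frag in lower_frags for e in frag]
    return list(heapq.merge(input_buf, middle_buf, output_buf, upper, lower))
```

The jumps between fragments are exactly the random accesses charged in the second term of the recurrence above.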

where c₁ is the constant hidden by the order notation. We next prove that F(x) ≤ cx^(1+α)/B by induction. As a base case, when y fits in memory, the cost is F(y) = cy^(1+α)/B as already noted, to load each block into memory once. Applying the inductive hypothesis that F(y) ≤ cy^(1+α)/B for some sufficiently large constant c and all y < x gives F(x) ≤ c₁x^(1+α)/B + (c/2)x^(1+α)/B, which is at most cx^(1+α)/B whenever c ≥ c₁ + c/2. Setting c = 2c₁ completes the proof. □

4.4 SAMPLE-UP. When invoking a SAMPLE-UP, we assume that the only elements in the x-box live in the output buffer. In this state, there are no lookahead pointers to facilitate searches. The SAMPLE-UP recomputes the lookahead pointers, allowing future searches to be performed efficiently. The SAMPLE-UP operates as follows. Suppose that the output buffer contains k elements. …

LEMMA …. For x^(1+α) > B, the movement from the upper-level subboxes to the middle buffer costs O(x^(1+α)/B).

Proof. The proof is virtually identical to the proof for FLUSH. The recurrence is the same (in fact, it is better here because we can guarantee that the subboxes are, in fact, contiguous). □

4.5 BATCH-INSERT. The BATCH-INSERT operation takes as input a sorted array of elements to insert. In particular, when inserting into an x-box D, a BATCH-INSERT inserts Θ(x) elements. For conciseness, let us say that the constant hidden by the theta notation is 1/2. First, merge the inserted elements into D's input buffer, and increment the counter of elements contained in D by (1/2)x. For simplicity, also remove the lookahead pointers during this step. We will readd them later.

Then, (implicitly) partition the input buffer according to the ranges owned by each of the upper-level subboxes. For any partition containing at least (1/2)√x elements, repeatedly remove (1/2)√x elements from D's input buffer and insert them recursively into the appropriate subbox until the partition contains less than (1/2)√x elements. If performing a recursive insert would cause the number of (real) elements in the subbox to exceed (1/2)(√x)^(1+α), first "split" the subbox. A "split" entails first FLUSHing the subbox, creating a new subbox, moving the larger half of the elements from the old subbox's output buffer to the new one, … recording the number of elements in each subbox to reflect the number of real elements in each.

Observe that after all the recursions occur, the number of real elements in D's input buffer is at most (1/2)√x · (1/4)√x = (1/8)x, as otherwise more elements would have moved down to a subbox.

After moving elements from the input buffer to the upper-level subboxes, resample from the upper-level subboxes' input buffers into D's input buffer. Note that the number of lookahead pointers introduced into the input buffer is … (If the last lower-level subbox is allocated, we move all elements from D's middle buffer and lower-level subboxes to D's output buffer and then call SAMPLE-UP on D.) When insertions into the lower-level subboxes complete, we allocate new upper-level subboxes (as in SAMPLE-UP) and sample from D's middle buffer into the upper-level subboxes' output buffers, call SAMPLE-UP recursively on these subboxes, and finally sample from the subboxes' input buffers into D's input buffer.

Observe that the upper-level subboxes' output buffers collectively contain at most (1/16)x^(1+α/2) lookahead pointers. Moreover, after elements are moved into the middle buffer, these are the only elements in the upper-level subboxes, and they are spread across at most half (x^(1/2)/8) of the upper-level subboxes, as specified for SAMPLE-UP. Hence, there must be at least x^(1/2)/8 subbox splits between moves into the middle buffer. Because subbox splits only occur when the two resulting subboxes contain at least (1/4)(√x)^(1+α) real elements, it follows that there must be at least (x^(1/2)/8) · (1/4)(√x)^(1+α) = Ω(x^(1+α/2)) insertions into D between moves into the middle buffer. A similar argument shows that there must be at least Ω(x^(1+α)) insertions into D between insertions into the output buffer.
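The resampling steps above copy a constant fraction of a sorted buffer upward as lookahead pointers; a toy sketch (ours; the rate of one pointer per two elements is an illustrative assumption that only loosely matches the 1/2 factors in the text):

```python
def sample_lookahead(buffer, rate=2):
    """Copy every `rate`-th element of a sorted buffer upward as a
    (position, key) lookahead pointer into that buffer."""
    return [(i, buffer[i]) for i in range(0, len(buffer), rate)]
```

The stored position is what lets a later SEARCH jump into the sampled buffer and scan only an O(1)-size region.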

THEOREM 4.1. A BATCH-INSERT into an x-box, with x > B, costs an amortized O((1 + α) log_B(x)/B^(1/(1+α))) memory transfers per element.

Proof. An insert has several costs per element. First, there is the cost of merging into the input array, which is simply O(1/B) per element. Next, each element is inserted recursively into a top-level subbox. These recursive insertions entail random accesses to load the first block of each of the subboxes. Then these subboxes must be sampled, which is dominated by the cost of the aforementioned random accesses and scans. An element may also contribute to a split of a subbox, but each split may be amortized against Ω(x^(1/2+α/2)) insertions. Then we must also consider the cost of moving elements from the upper-level subboxes to the middle buffer, but this movement may be amortized against the Ω(x^(1+α/2)) elements being moved. Finally, there are similar costs among the lower-level subboxes.

Let us consider the cost of the random accesses more closely. If all of the upper-level subboxes fit into memory, then the cost of random accesses is actually the minimum of performing the random accesses or loading the entire upper-level into memory. For the upper-level, we denote this value by UpperRA(x). We thus have UpperRA(x) = O((1/4)x^(1/2)) if x^(1+α/2) = Ω(M), and UpperRA(x) = O(min{x^(1/2), x^(1+α/2)/B}) if x^(1+α/2) = O(M). In fact, we are really concerned with UpperRA(x)/x, as the random accesses can be amortized against the x elements inserted. We analyze the two cases separately, and we assume the tall-cache assumption that M > B².

1. Suppose that x^(1+α/2) = Ω(M). Then we have x^(1+α/2) = Ω(B²) by the tall-cache assumption, and hence x^(1/2) = Ω(B^(2/(2+α))). It follows that UpperRA(x)/x = O(1/√x) = O(1/B^(2/(2+α))).

2. Suppose that x^(1+α/2) = O(M). We have two subcases here. If x > B^(2/(1+α)), then we have a cost of at most UpperRA(x)/x = O(x^(1/2)/x) = O(1/B^(1/(1+α))). If, on the other hand, x …

Summing these per-element costs yields the recurrence

    I(x) = O(1/B) + O(UpperRA(x)/x) + O(LowerRA(x)/x^(1+α/2)) + O(F(√x)/x^(1/2+α/2)) + 2I(√x)
         = O(1/B) + O(1/B^(1/(1+α))) + O(1/B^(1/(1+α))) + O((x^(1/2+α/2)/B)/x^(1/2+α/2)) + 2I(√x)
         = O(1/B^(1/(1+α))) + 2I(√x).

As we charge loading the first block of a subbox to an insert into the parent, we have a base case of I(O(B^(1/(1+α)))) = 0, i.e., when the x-box fits into a single block. Solving the recurrence, we get a per-element cost of O(1/B^(1/(1+α))) at lg lg x − lg lg O(B^(1/(1+α))) levels of recursion, and hence a total cost of O(2^(lg lg x − lg lg O(B^(1/(1+α))))/B^(1/(1+α))) = O(((1+α) lg x / lg O(B))/B^(1/(1+α))) = O((1 + α) log_B(x)/B^(1/(1+α))). □

5 Building a dictionary out of x-boxes

The xDict data structure consists of log_(1+α) log₂ N + 1 x-boxes of doubly increasing size, where α is the same parameter for the underlying x-boxes. Specifically, for 0 ≤ i ≤ log_(1+α) log₂ N, the i-th box has x = 2^((1+α)^i). The x-boxes are linked together by incorporating into the i-th box's output buffer the lookahead pointers corresponding to a sample from the (i+1)-st box's input buffer. We can define the operations on an xDict in terms of …
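The doubly increasing box sizes of Section 5 can be sketched directly (a hypothetical helper of ours; the top index floor(log_(1+α) log₂ N) follows the text's definition):

```python
import math

def xdict_box_sizes(n, alpha):
    """Sizes of the x-boxes in an xDict holding N = n elements:
    the i-th box has x = 2^((1+alpha)^i) for 0 <= i <= log_{1+alpha} log2 N."""
    top = int(math.log(math.log2(n), 1 + alpha))
    return [2 ** ((1 + alpha) ** i) for i in range(top + 1)]
```

With α = 1 the sizes square at each level (2, 4, 16, 256, ...), so O(log log N) boxes suffice to cover any N.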

…move through all boxes, our insertion analysis accounts for an insertion into each box for each element inserted. The cost of the FLUSH and SAMPLE-UP incurred by each element as it moves through each of the boxes is dominated by the cost of the insertion.

To search in the xDict, we simply search in each of the x-boxes, and return the closest match. Specifically, we search the boxes in order from smallest to largest. If the element is not found in the i-th box, we have a pointer into the i-th box's output buffer. We use this pointer to find an appropriate lookahead pointer into the (i+1)-st box's input buffer, and begin the search from that point.

The performance of the xDict is essentially a geometric series:

THEOREM 5.1. The xDict supports searches in O((1/α) log_B(N/M)) memory transfers and (single-element) inserts in O((1/α)(log_B(N/M))/B^(1/(1+α))) amortized memory transfers, for 0 < α ≤ 1.

Proof. A simple upper bound on the search cost is

    O( Σ_{i=0}^{log_(1+α) log₂ N} (1+α) log_B(2^((1+α)^i)) )
    = O( ((1+α)/lg B) Σ_{i=0}^{log_(1+α) log₂ N} (1+α)^i )
    = O( ((1+α) lg N / lg B) Σ_{i=0}^{∞} 1/(1+α)^i )
    = O( ((1+α)²/α) log_B N )
    = O( (1/α) log_B N ).

The last step of the derivation follows from the assumption that α ≤ 1 and hence that (1+α)² = O(1).

The above analysis, however, exploits only a constant number of cache blocks. If we assume that the memory already holds all x-boxes smaller than O(M^(1/(1+α))),⁴ the first O((1/α) log_B M^(1/(1+α))) = O((1/α) log_B M) transfers are free, resulting in a search cost of O((1/α) log_B(N/M)). The cache-oblivious model assumes an optimal paging strategy, and using half the memory to store the smallest x-boxes is no better than optimal.

The analysis for insertions is identical except that all costs are multiplied by O(1/B^(1/(1+α))). □

COROLLARY 5.1. For any ε with 0 < ε < 1, there exists a setting of α such that the xDict supports searches in O((1/ε) log_B(N/M)) memory transfers and supports insertions in O((1/ε) log_B(N/M)/B^(1−ε)) amortized memory transfers.

Proof. First off, we consider only ε < 1/2 (rolling up a particular constant into the big-O notation), as larger ε only hurt the performance of inserts. Choose α = ε/(1 − ε), which gives B^(1/(1+α)) = B^(1−ε). Because ε < 1/2, we have α < 1, and we can apply Theorem 5.1. The 1/α term solves to 1/α = (1 − ε)/ε = O(1/ε) to complete the proof. □

6 Final notes

We did not address deletion in detail, but claim that it can be handled using standard techniques. For example, to delete an element we can insert an anti-element with the same key value. In the course of an operation, should a key value and its antivalue be discovered, they annihilate each other while releasing potential which is used to remove them from the buffers they are in. Rebuilding the whole structure whenever the number of deletions since the last rebuild reaches half of the structure ensures that the total size does not get out of sync with the number of not-deleted items currently stored.

Another detail is that, to hold n items, the xDict may create an n-box, which occupies up to Θ(n^(1+α)) of address space. However, only Θ(n) space of the xDict will ever be occupied, and the layout ensures that the unused space is at the end. Therefore the xDict data structure can use optimal Θ(n) space.

Acknowledgments

We would like to thank the organizers of the MADALGO Summer School on Cache-Oblivious Algorithms, held in Aarhus, Denmark, on August 18–21, 2008, for providing the opportunity for the authors to come together and create and analyze the structure presented in this paper.

⁴ All x-boxes with size at most O(M^(1/(1+α))) fit in O(M) memory as the x-box sizes increase supergeometrically.

References

[1] Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, September 1988.
[2] Rudolf Bayer and Edward M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173–189, February 1972.
[3] Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-oblivious B-trees. SIAM Journal on Computing, 35(2):341–358, 2005.
[4] Michael A. Bender, Ziyang Duan, John Iacono, and Jing Wu. A locality-preserving cache-oblivious dynamic dictionary. Journal of Algorithms, 53(2):115–136, 2004.
[5] Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. Cache-oblivious streaming B-trees. In Proceedings of

the 19th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 81–92, San Diego, CA, June 2007.
[6] Gerth Stølting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 546–554, Baltimore, MD, January 2003.
[7] Gerth Stølting Brodal, Rolf Fagerberg, and Riko Jacob. Cache oblivious search trees via binary trees of small height. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 39–48, San Francisco, CA, January 2002.
[8] Bernard Chazelle and Leonidas J. Guibas. Fractional cascading: I. A data structuring technique. Algorithmica, 1(2):133–162, 1986.
[9] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, pages 285–297, New York, NY, 1999.
[10] Harald Prokop. Cache-oblivious algorithms. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, June 1999.

PODS 2010 Best Paper Award

An Optimal Algorithm for the Distinct Elements Problem

Daniel M. Kane∗ (Harvard University, One Oxford Street, Cambridge, MA 02138, [email protected])
Jelani Nelson† (MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, [email protected])
David P. Woodruff (IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, [email protected])

ABSTRACT

We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1,...,n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(ε^-2 + log(n)) bits of space with 2/3 success probability, where 0 < ε < 1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream in O(1) worst-case time, thus settling both the space and time complexities simultaneously.

1. INTRODUCTION

Estimating the number of distinct elements in a data stream is a fundamental problem in network traffic monitoring, query optimization, data mining, and several other database areas. For example, this statistic is useful for selecting a minimum-cost query plan [33], database design [18], OLAP [30, 34], data integration [10, 14], and data warehousing [1].

In network traffic monitoring, routers with limited memory track statistics such as distinct destination IPs, requested URLs, and source-destination pairs on a link. Distinct elements estimation is also useful in detecting Denial of Service attacks and port scans [2, 17]. In such applications the data is too large to fit at once in main memory or too massive to be stored, being a continuous flow of data packets. This makes small-space algorithms necessary. Furthermore, the algorithm should process each stream update (i.e., packet) quickly to keep up with network speeds.
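For reference, the quantity the paper approximates is F0, the number of distinct indices, which is trivial to compute exactly if linear space is available (the very cost the streaming algorithm avoids); a baseline sketch of ours:

```python
def exact_f0(stream):
    """Exact number of distinct indices in the stream. Takes space linear
    in the number of distinct items; the paper's contribution is to
    approximate this quantity in only O(eps^-2 + log n) bits."""
    return len(set(stream))
```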
We also give an algorithm to estimate the Hamming norm of a stream, a generalization of the number of distinct elements, which is useful in data cleaning, packet tracing, and database auditing. Our algorithm uses nearly optimal space, and has optimal O(1) update and reporting times.

Categories and Subject Descriptors: F.2.0 [Analysis of Algorithms and Problem Complexity]: General; H.2.m [Database Management]: Miscellaneous
General Terms: Algorithms, Theory
Keywords: distinct elements, streaming, query optimization, data mining

∗ Supported by a National Defense Science and Engineering Graduate (NDSEG) Fellowship.
† Supported by a National Defense Science and Engineering Graduate (NDSEG) Fellowship, and in part by the Center for Massive Data Algorithmics (MADALGO), a center of the Danish National Research Foundation. Part of this work was done while the author was at the IBM Almaden Research Center.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PODS'10, June 6–11, 2010, Indianapolis, Indiana, USA. Copyright 2010 ACM 978-1-4503-0033-9/10/06 ...$10.00.

For example, Estan et al [17] reported packet header information being produced at .5GB per hour while estimating the spread of the Code Red worm, for which they needed to estimate the number of distinct Code Red sources passing through a link.

Yet another application is to data mining: for example, estimating the number of distinct queries made to a search engine, or distinct users clicking on a link or visiting a website. Distinct item estimation was also used in estimating connectivity properties of the Internet graph [32].

We formally model the problem as follows. We see a stream i_1,...,i_m of indices i_j ∈ [n], and our goal is to compute F0 = |{i_1,...,i_m}|, the number of distinct indices that appeared, using as little space as possible. Since it is known that exact or deterministic computation of F0 requires linear space [3], we settle for computing a value F̃0 ∈ [(1 − ε)F0, (1 + ε)F0] for some given 0 < ε < 1 with probability 2/3, over the randomness used by the algorithm. This probability can be amplified by independent repetition.

The problem of space-efficient F0-estimation is well-studied, beginning with the work of Flajolet and Martin [20], and continuing with a long line of research [3, 4, 5, 6, 9, 12, 16, 17, 19, 23, 24, 26, 36]. In this work, we finally settle both the space- and time-complexities of F0-estimation by giving an algorithm using O(ε^-2 + log(n)) bits of space, with worst-case update and reporting times O(1). By update time, we mean the time to process a stream token, and by reporting time, we mean the time to output an estimate of F0 at any point in the stream. Our space upper bound matches the known lower bounds [3, 26, 36] up to a constant factor, and the O(1) update and reporting times are clearly optimal. A detailed comparison of our results to those in previous work is given in Figure 1. There is a wide spectrum of time/space tradeoffs but the key points are that none of the previous

Paper       Space                                                     Update time                           Notes
[20]        O(log n)                                                  -                                     Assumes random oracle, constant ε
[3]         O(log n)                                                  O(log n)                              Only works for constant ε
[24]        O(ε^-2 log n)                                             O(ε^-2)
[5]         O(ε^-3 log n)                                             O(ε^-3)
[4]         O(ε^-2 log n)                                             O(log(ε^-1))                          Algorithm I in the paper
[4]         O(ε^-2 log log n + poly(log(ε^-1), log log n) log n)      ε^-2 poly(log log n + log(ε^-1))      Algorithm II in the paper
[4]         O(ε^-2(log(ε^-1) + log log n) + log n)                    O(ε^-2(log(ε^-1) + log log n))        Algorithm III in the paper
[16]        O(ε^-2 log log n + log n)                                 -                                     Assumes random oracle, additive error
[17]        O(ε^-2 log n)                                             -                                     Assumes random oracle
[6]         O(ε^-2 log n)                                             O(log(ε^-1))
[19]        O(ε^-2 log log n + log n)                                 -                                     Assumes random oracle, additive error
This work   O(ε^-2 + log n)                                           O(1)                                  Optimal

Figure 1: Comparison of our algorithm to previous algorithms on estimating the number of distinct elements in a data stream.

algorithms achieved our optimal O(ε^-2 + log n) bits of space, and the only ones to achieve optimal O(1) update and/or reporting time had various restrictions, e.g., the assumption of access to a random oracle (that is, a truly random hash function) and/or a small constant additive error in the estimate. The best previous algorithms without any assumptions are due to Bar Yossef et al [4], who provide algorithms with various tradeoffs (Algorithms I, II, and III in Figure 1).

We also give a new algorithm for estimating L0, also known as the Hamming norm of a vector [13], with optimal running times and near-optimal space. This problem is a generalization of F0-estimation to the case when items can be removed from the stream. While F0-estimation is useful for a single stream or for taking unions of streams if there are no deletions, L0-estimation can be applied to a pair of streams to measure the number of unequal item counts. This makes it more flexible than F0, and can be used in applications such as maintaining ad-hoc communication networks amongst cheap sensors [25]. It also has applications to data cleaning to find columns that are mostly similar [14]. Even if the rows in the two columns are in different orders, streaming algorithms for L0 can quickly identify similar columns. As with F0, L0-estimation is also useful for packet tracing and database auditing [13].

Formally, in this problem there is a vector x = (x_1,...,x_n) which starts off as the 0 vector, and receives m updates of the form (i, v) ∈ [n] × {−M,...,M} in a stream (M is some positive integer). The update (i, v) causes the change x_i ← x_i + v. At the end of the stream, we should output (1 ± ε)L0 with probability at least 2/3, where L0 = |{i : x_i ≠ 0}|. Note that L0-estimation is a generalization of F0-estimation, since in the latter case an index i in the stream corresponds to the update (i, 1) in an L0-estimation problem.

We give an L0-estimation algorithm with O(1) update and reporting times, using O(ε^-2 log(n)(log(1/ε) + log log(mM))) bits of space, both of which improve upon the previously best known algorithm of Ganguly [22], which had O(log(1/ε)) update time and required O(ε^-2 log(n) log(mM)) space. Our update and reporting times are optimal, and the space is optimal up to the log(1/ε) + log log(mM) term due to known lower bounds [3, 27]. Furthermore, unlike with Ganguly's algorithm, our algorithm does not require that x_i ≥ 0 for each i to operate correctly.

1.1 Overview of our algorithms

Our algorithms build upon several techniques given in previous works, with added twists to achieve our stated performance. In [4], for example, it was observed that if one somehow knows ahead of time a value R = Θ(F0), refining to a (1 ± ε)-approximation becomes easier. For example, [4] suggested a "balls and bins" approach to estimating F0 given such an R. The key intuition is that when hashing A balls randomly into K bins, the number of bins hit by at least one ball is highly concentrated about its expectation, and treating this expectation as a function of A then inverting provides a good approximation to A with high probability for A = Θ(K). Then if one subsamples each index in [n] with probability 2^-log(R/K), in expectation the number of distinct items surviving is Θ(K), at which point the balls-and-bins approach can be simulated by hashing indices (the "balls") into entries of a bitvector (the "bins").

Following the above scheme, an estimate of F0 can be obtained by running a constant-factor approximation in parallel to obtain such an R at the end of the stream, and meanwhile performing the above scheme for geometrically increasing guesses of R, one of which must be correct to within a constant factor. Thus, the bits tracked can be viewed as a bitmatrix: rows corresponding to log(n) levels of subsampling, and columns corresponding to entries in the bitvector. At the end of the stream, upon knowing R, the estimate from the appropriate level of subsampling is used. Such a scheme with K = Θ(1/ε²) works, and gives O(ε^-2 log(n)) space, since there are log(n) levels of subsampling.

It was then first observed in [16] that, in fact, an estimator can be obtained without maintaining the full bitmatrix above. Specifically, for each column they gave an estimator that required only maintaining the deepest row with its bit set to 1. This allowed them to collapse the bitmatrix above to O(ε^-2 log log(n)) bits. Though, their estimator and analysis required access to a purely random hash function.

Our F0 algorithm is inspired by the above two algorithms of [4, 16]. We give a subroutine RoughEstimator using O(log(n)) space which, with high probability, simultaneously provides a constant-factor approximation to F0 at all times in the stream. Previous subroutines gave a constant-factor approximation to F0 at any particular point in the stream with probability 1 − δ using O(log(n) log(1/δ)) space; a good approximation at all times then required setting δ = 1/m to apply a union bound, thus requiring O(log(n) log(m)) space. The next observation is that if R = Θ(F0), the largest row index with a 1 bit for any given column is heavily concentrated around the value log(F0/K). Thus, if we bitpack the K counters and store their offsets from log(R/K), we expect to only use O(K) space for all counters combined. Whenever R changes, we update all our offsets.

There are of course obvious obstacles in obtaining O(1) running times, such as the occasional need to decrement all K counters (when R increases), or to locate the starting position of a counter in a bitpacked array when reading and writing entries. For the former, we use a "variable-bit-length array" data structure [7], and for the latter we use an approach inspired by the technique of deamortization of global rebuilding (see [29, Ch. 5]). Furthermore, we analyze our algorithm without assuming a truly random hash function, and show that a combination of fast k-wise independent hash functions [35] and uniform hashing [31] suffice to have sufficient concentration in all probabilistic events we consider.

Our L0-estimation algorithm also uses subsampling and a balls-and-bins approach, but needs a different subroutine for obtaining the value R, and for representing the bitmatrix.

Lastly, we analyze our algorithm without any idealized assumptions, such as access to a cryptographic hash function, or to a hash function which is truly random. Our analyses all take into account the space and time complexity required to store and compute the types of hash functions we use.

2. BALLS AND BINS WITH LIMITED INDEPENDENCE

In the analysis of the correctness of our algorithms, we require some understanding of the balls and bins random process with limited independence. We note that [4] also required a similar analysis, but was only concerned with approximately preserving the expectation under bounded independence whereas we are also concerned with approximately preserving the variance. Specifically, consider throwing a …
…Specifically, if one maintains each bit as a counter and tests for the counter being non-zero, frequencies of opposite sign may cancel to produce 0 and give false negatives. We instead store the dot product of frequencies in each counter with a random vector over a suitably large finite field. We remark that Ganguly's algorithm [22] is also based on a balls-and-bins approach, but on viewing the number of bins hit by exactly one ball (and not at least one ball), and the source of his algorithm's higher complexity stems from technical issues related to this difference.

1.2 Preliminaries

Throughout this paper, all space bounds are given in bits. We always use m to denote stream length and [n] to denote the universe (the notation [n] represents {1,...,n}). Without loss of generality, we assume n is a power of 2, and ε ≤ ε_0 for some fixed constant ε_0 > 0. In the case of L0, M denotes an upper bound on the magnitude of updates to the x_i. We use the standard word RAM model, and running times are measured as the number of standard machine word operations (integer arithmetic, bitwise operations, and bitshifts). We assume a word size of at least Ω(log(nmM)) bits to be able to manipulate counters and indices in constant time.

For reals A, B, ε ≥ 0, we use the notation A = (1 ± ε)B to denote that A ∈ [(1 − ε)B, (1 + ε)B]. We use lsb(x) to denote the (0-based index of) the least significant bit of a nonnegative integer x when written in binary. For example, lsb(6) = 1. We define lsb(0) = log(n). All our logarithms are base 2 unless stated otherwise. We also use H_k(U, V) to denote some k-wise independent hash family of functions mapping U into V. Using known constructions [11], a random h ∈ H_k(U, V) can be represented in O(k log(|U| + |V|)) bits when |U|, |V| are powers of 2, and computed in the same amount of space. Also, henceforth, whenever we discuss picking an h ∈ H_k(U, V), it should be understood that h is being chosen as a random element of H_k(U, V).

When discussing F0, for t ∈ [m] we use I(t) to denote {i_1,...,i_t}, and define F0(t) = |I(t)|. We sometimes use I to denote I(m) so that F0 = F0(m) = |I|. In the case of L0-estimation, we use I(t) to denote the i with x_i ≠ 0 at time t. For an algorithm which outputs an estimate F̃0 of F0, we let F̃0(t) be its estimate after only seeing the first t updates (and similarly for L0). More generally, for any variable y kept as part of the internal state of any of our algorithms, we use y(t) to denote the contents of that variable at time t.

…set of A balls into K bins at random and wishing to understand the random variable X being the number of bins receiving at least one ball. This balls-and-bins random process can be modeled by choosing a random hash function h ∈ H_A([A], [K]), i.e. h acts fully independently on the A balls, and letting X = |{i ∈ [K] : h^-1(i) ≠ ∅}|. When analyzing our F0 algorithm, we require an understanding of how X behaves when h ∈ H_k([A], [K]) for k ≪ A.

Henceforth, we let X_i denote the random variable indicating that at least one ball lands in bin i under a truly random hash function h, so that X = Σ_{i=1}^{K} X_i. The following fact is standard.

Fact 1. E[X] = K(1 − (1 − 1/K)^A).

The proof of the following lemma is deferred to the full version due to space constraints.

Lemma 1. If 100 ≤ A ≤ K/20, then Var[X] < 4A²/K.

We now state a lemma that k-wise independence for small k suffices to preserve E[X] to within 1 ± ε, and to preserve Var[X] to within an additive ε². We note that item (1) in the following lemma was already shown in [4, Lemma 1] but with a stated requirement of k = Ω(log(1/ε)), though their proof actually seems to only require k = Ω(log(1/ε)/log log(1/ε)). Our proof of item (1) also only requires this k, but we require dependence on K in our proof of item (2). The proof of the following lemma is in Section A.1, and is via approximate inclusion-exclusion.

Lemma 2. There exists some constant ε_0 such that the following holds for ε ≤ ε_0. Let A balls be mapped into K bins using a random h ∈ H_{2(k+1)}([A], [K]), where k = c log(K/ε)/log log(K/ε) for a sufficiently large constant c > 0. Suppose 1 ≤ A ≤ K. For i ∈ [K], let X′_i be an indicator variable which is 1 if and only if there exists at least one ball mapped to bin i by h. Let X′ = Σ_{i=1}^{K} X′_i. Then the following hold:

(1) |E[X′] − E[X]| ≤ ε E[X]
(2) |Var[X′] − Var[X]| ≤ ε²

We now give a consequence of the above lemma.

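Fact 1's closed form is easy to sanity-check against a direct simulation of the truly random balls-and-bins process. The sketch below is illustrative only (the function names are mine, not the paper's), and uses Python's built-in RNG as the "truly random" hash function:

```python
import random

def expected_nonempty(A, K):
    """Fact 1: E[X] = K * (1 - (1 - 1/K)^A) for A balls thrown into K bins."""
    return K * (1.0 - (1.0 - 1.0 / K) ** A)

def simulated_nonempty(A, K, trials=2000, seed=0):
    """Monte Carlo estimate of E[X] under truly random hashing of balls to bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(K) for _ in range(A)})  # non-empty bins this trial
    return total / trials
```

With A = 100 and K = 2000 (a setting satisfying Lemma 1's hypothesis 100 ≤ A ≤ K/20), the closed form gives about 97.57, and the simulated mean lands within Chebyshev distance of it, since Var[X] < 4A²/K here.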
Lemma 3. There exists a constant ε0 such that the following holds. Let X′ be as in Lemma 2, and also assume 100 ≤ A ≤ K/20 with K = 1/ε² and ε ≤ ε0. Then

Pr[|X′ − E[X]| ≤ 8ε·E[X]] ≥ 4/5.

Proof. Observe that

E[X] ≥ (1/ε²)·(1 − (1 − Aε² + (A choose 2)ε⁴)) = (1/ε²)·(Aε² − (A choose 2)ε⁴) ≥ (39/40)A,

since A ≤ 1/(20ε²). By Lemma 2 we have E[X′] ≥ (1 − ε)E[X] > (9/10)A, and additionally using Lemma 1 we have that Var[X′] ≤ Var[X] + ε² ≤ 5ε²A². Set ε′ = 7ε. Applying Chebyshev's inequality,

Pr[|X′ − E[X′]| ≥ (10/11)ε′E[X′]] ≤ Var[X′]/((10/11)²(ε′)²E²[X′]) ≤ 5A²ε²/((10/11)²(ε′)²(9/10)²A²) < (13/2)ε²/((10/11)ε′)² < 1/5.

Thus, with probability at least 4/5, by the triangle inequality and Lemma 2 we have |X′ − E[X]| ≤ |X′ − E[X′]| + |E[X′] − E[X]| ≤ 8ε·E[X].

3. F0 ESTIMATION ALGORITHM

In this section we describe our F0-estimation algorithm. Our algorithm requires, in part, a constant-factor approximation to F0 at every point in the stream at which F0 is sufficiently large. We describe a subroutine RoughEstimator in Section 3.1 which provides this using O(log(n)) space, then we give our full algorithm in Section 3.2.

We remark that the algorithm we give in Section 3.2 is space-optimal, but is not described in a way that achieves O(1) update and reporting times; the modifications needed to achieve optimal running times are described in Section 3.4. Previous subroutines could provide such a constant-factor approximation at any single fixed point in the stream with probability 2/3 using O(log(n)) space. To understand why our guarantees from RoughEstimator are different, one should pay particular attention to the quantifiers. In previous algorithms, it was guaranteed that there exists a constant c > 0 such that at any particular point t in the stream, with probability at least 1 − δ, the output F̃0(t) is in [F0(t), cF0(t)], with the space used being O(log(n)·log(1/δ)). To then guarantee F̃0(t) ∈ [F0(t), cF0(t)] for all t ∈ [m] with probability 2/3, one should set δ = 1/(3m) to then union bound over all t, giving an overall space bound of O(log(n)·log(m)). Meanwhile, our subroutine RoughEstimator ensures that, with probability 2/3, F̃0(t) ∈ [F0(t), cF0(t)] for all t ∈ [m] simultaneously, and the overall space used is O(log(n)).

3.1 RoughEstimator

We now show that RoughEstimator (Figure 2), with probability 1 − o(1) (as n → ∞), outputs a constant-factor approximation to F0(t) for every t at which F0(t) is sufficiently large. That is, if our estimate of F0(t) is F̃0(t), then

Pr[∀t ∈ [m] s.t. F0(t) ≥ K_RE : F̃0(t) = Θ(F0(t))] = 1 − o(1),

where K_RE is as in Figure 2.

Theorem 1. With probability 1 − o(1), the output F̃0 of RoughEstimator satisfies F0(t) ≤ F̃0(t) ≤ 8F0(t) for every t ∈ [m] with F0(t) ≥ K_RE, simultaneously. The space used is O(log(n)).

Proof. We first analyze space. The counters in total take O(K_RE·log log(n)) = O(log(n)) bits. The hash functions h1^j, h2^j each take O(log(n)) bits. The hash functions h3^j take O(K_RE·log(K_RE)) = O(log(n)) bits.

We now analyze correctness.

Lemma 4. For any fixed point t in the stream with F0(t) ≥ K_RE, and any fixed j ∈ [3], with probability 1 − O(1/K_RE) we have F0(t) ≤ F̃0^j(t) ≤ 4F0(t).

Proof. The algorithm RoughEstimator of Figure 2 can be seen as taking the median output of three instantiations of a subroutine, where each subroutine has K_RE counters C_1,...,C_{K_RE}, hash functions h1, h2, h3, and defined quantities T_r(t) = |{i : C_i(t) ≥ r}|, where C_i(t) is the state of counter C_i at time t. We show that this subroutine outputs a value F̃0(t) ∈ [F0(t), 4F0(t)] with probability 1 − O(1/K_RE).

Define I_r(t) ⊆ I(t) as the set of i ∈ I(t) with lsb(h1(i)) ≥ r. Note |I_r(t)| is a random variable, and

E[|I_r(t)|] = F0(t)/2^r,  Var[|I_r(t)|] = F0(t)/2^r − F0(t)/2^{2r} ≤ E[|I_r(t)|],

with the latter using 2-wise independence of h1. Then by Chebyshev's inequality,

Pr[ | |I_r(t)| − F0(t)/2^r | ≥ q·F0(t)/2^r ] ≤ 1/(q²·E[|I_r(t)|]).   (1)

Since F0(t) ≥ K_RE, there exists an r′ ∈ [0, log n] such that K_RE/2 ≤ E[|I_{r′}(t)|] < K_RE. Let E be the event that |I_{r′}(t)| ≥ K_RE/3, and that |I_r(t)| ≤ 7K_RE/24 for all r > r′ + 1. Applying Eq. (1), in particular with r = r′ + 2 and using that I_{r+1}(t) ⊆ I_r(t), we have Pr[E] ≥ 1 − O(1/K_RE).

We now define two more events. The first is the event E′ that T_{r′}(t) ≥ ρK_RE. The second is the event E″ that T_r(t) < ρK_RE for all r > r′ + 1. Note that if E′ ∧ E″ holds, then F0(t) ≤ F̃0(t) ≤ 4F0(t). We now show that these events hold with large probability.

Define the event A that the indices in I_{r′}(t) are perfectly hashed under h2, and the event A′ that the indices in I_{r′+2}(t) are perfectly hashed under h2. Then

Pr[A | E] ≥ 1 − O(1/K_RE),

and similarly for Pr[A′ | E].

Note that, conditioned on E ∧ A, T_{r′}(t) is distributed exactly as the number of non-empty bins when throwing |I_{r′}(t)| balls uniformly at random into K_RE bins. This is

1. Set K_RE = max{8, log(n)/log log(n)}.
2. Initialize 3K_RE counters C_1^j, ..., C_{K_RE}^j to −1, for j ∈ [3].
3. Pick random h1^j ∈ H_2([n], [0, n−1]), h2^j ∈ H_2([n], [K_RE³]), h3^j ∈ H_{2K_RE}([K_RE³], [K_RE]), for j ∈ [3].
4. Update(i): For each j ∈ [3], set C^j_{h3^j(h2^j(i))} ← max{C^j_{h3^j(h2^j(i))}, lsb(h1^j(i))}.
5. Estimator: For integer r ≥ 0, define T_r^j = |{i : C_i^j ≥ r}|. For the largest r = r* with T_{r*}^j ≥ ρK_RE, set F̃0^j = 2^{r*}·K_RE. If no such r exists, F̃0^j = −1. Output F̃0 = median{F̃0^1, F̃0^2, F̃0^3}.

Figure 2: RoughEstimator pseudocode. With probability 1 − o(1), F̃0(t) = Θ(F0(t)) at every point t in the stream for which F0(t) ≥ K_RE. The value ρ is 0.99·(1 − e^{−1/3}).

because, conditioned on E ∧ A, there are no collisions of members of I_{r′}(t) under h2, and the independence of h3 is larger than |I_{r′}(t)|. Thus,

E[T_{r′}(t) | E ∧ A] = (1 − (1 − 1/K_RE)^{|I_{r′}(t)|})·K_RE.

The same argument applies for the conditional expectations E[T_r(t) | E ∧ A′] for r > r′ + 1. Call these conditional expectations E_r. Then, since (1 − 1/n)/e ≤ (1 − 1/n)^n ≤ 1/e for all real n ≥ 1 (see Proposition B.3 of [28]), we have that E_r/K_RE lies in the interval

[1 − e^{−|I_r(t)|/K_RE},  1 − e^{−|I_r(t)|/K_RE}·(1 − 1/K_RE)^{|I_r(t)|/K_RE}].

Thus, for r > r′ + 1,

E_r ≤ (1 − e^{−7/24}·(1 − 1/K_RE)^{7/24})·K_RE,  and  E_{r′} ≥ (1 − e^{−1/3})·K_RE.

A calculation shows that E_r < 0.99·E_{r′} since K_RE ≥ 8. By negative dependence in the balls-and-bins random process (see [15]), the Chernoff bound applies to T_r(t), and thus

Pr[|T_r(t) − E_r| ≥ γE_r | E ∧ A] ≤ 2e^{−γ²E_r/3}

for any γ > 0, and thus by taking γ a small enough constant,

Pr[E′ | E ∧ A] ≥ 1 − e^{−Ω(K_RE)}.

We also have, for r > r′ + 1,

Pr[E″ | E ∧ A′] = Pr[T_r(t) < ρK_RE for all r > r′ + 1 | E ∧ A′] ≥ 1 − e^{−Ω(K_RE)}.

Thus, overall,

Pr[E′ ∧ E″] ≥ Pr[E′ ∧ E″ ∧ E ∧ A ∧ A′] ≥ Pr[E′ ∧ E ∧ A] + Pr[E″ ∧ E ∧ A′] − 1 = Pr[E′ | E ∧ A]·Pr[A | E]·Pr[E] + Pr[E″ | E ∧ A′]·Pr[A′ | E]·Pr[E] − 1 ≥ 1 − O(1/K_RE).

Now, note that for any t ∈ [m], if F̃0^j(t) is a 4-approximation to F0(t) for at least two values of j, then F̃0(t) ∈ [F0(t), 4F0(t)]. Thus by Lemma 4, F̃0(t) ∈ [F0(t), 4F0(t)] with probability 1 − O(1/K_RE²). Let t_r be the first time in the stream when F0(t_r) = 2^r (if no such time exists, let t_r = ∞). Then by a union bound, our estimate of F0(t_r) is in [F0(t_r), 4F0(t_r)] at all t_r for r ∈ [0, log n], with probability 1 − O(log(n)/K_RE²) = 1 − o(1). Now, observe that our estimates F̃0(t) can only ever increase with t. Thus, if our estimate is in [F0(t_r), 4F0(t_r)] at all points t_r, then it is in [F0(t), 8F0(t)] for all t ∈ [m]. This concludes our proof.

3.2 Full algorithm

In this section we analyze our main algorithm (Figure 3), which (1 ± O(ε))-approximates F0 with 11/20 probability. We again point out that the implementation described in Figure 3 is not our final algorithm which achieves O(1) update and reporting times; the final, optimal algorithm is a modification of Figure 3, described in Section 3.4. We assume throughout that F0 ≥ K/32 and deal with the case of small F0 in Section 3.3. The space used is O(ε⁻² + log(n)) bits. Note that the success probability can be boosted to 1 − δ for arbitrary δ > 0 by running O(log(1/δ)) instantiations of our algorithm in parallel and returning the median estimate of F0. Also, the O(ε) term in the error guarantee can be made ε by running the algorithm with ε′ = ε/C for a sufficiently large constant C. Throughout this section, we assume without loss of generality that n is larger than some constant n0, that 1/ε² ≥ C·log(n) for a constant C of our choice, and that K = 1/ε² is a power of 2. If one desires a (1 ± ε)-approximation for ε > 1/√(C·log(n)), we simply run our algorithm with ε′ = 1/√(C·log(n)), which worsens our promised space bound by at most a constant factor.

The algorithm of Figure 3 works as follows. We maintain K = 1/ε² counters C_1,...,C_K, as well as three values A, b, and est. Each index is hashed to some level between 0 and log(n), based on the least significant bit of its hashed value, and is also hashed to one of the counters. Each counter maintains the deepest level of an item that was hashed to it. Up until this point, the information being kept is identical to that in the LogLog [16] and HyperLogLog [19] algorithms (though our analysis will not require that the hash functions be truly random). The value A keeps track of the amount of storage required to store all the C_i, and our algorithm fails if this value ever becomes much larger than a constant times K (which we show does not happen with large probability).

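To make the counter/threshold mechanics of RoughEstimator (Figure 2) concrete, here is a simplified Python sketch of a single instantiation (the paper medians three). As an assumption for clarity, it substitutes truly random, memoized hash values for the limited-independence families h1, h2, h3, so it does not meet the paper's O(log n) space bound; the class and method names are mine:

```python
import math
import random

class RoughEstimatorSketch:
    """Single instantiation of the Figure 2 subroutine, with truly random
    (memoized) hashing standing in for the limited-independence families."""

    def __init__(self, n, seed=0):
        log_n = int(math.log2(n))
        self.log_n = log_n
        self.K = max(8, log_n // max(1, int(math.log2(max(2, log_n)))))
        self.rho = 0.99 * (1.0 - math.exp(-1.0 / 3.0))
        self.C = [-1] * self.K           # counters, initialized to -1
        self.rng = random.Random(seed)
        self.h1 = {}                     # stand-in for h1: item -> uniform value in [0, n)
        self.h3 = {}                     # stand-in for h3 composed with h2: item -> bin in [K]
        self.n = n

    @staticmethod
    def lsb(x, log_n):
        # 0-based least significant set bit; paper convention lsb(0) = log(n)
        return log_n if x == 0 else (x & -x).bit_length() - 1

    def update(self, i):
        v = self.h1.setdefault(i, self.rng.randrange(self.n))
        b = self.h3.setdefault(i, self.rng.randrange(self.K))
        self.C[b] = max(self.C[b], self.lsb(v, self.log_n))

    def estimate(self):
        # largest r with T_r = |{i : C_i >= r}| >= rho*K gives 2^r * K; else -1
        for r in range(self.log_n, -1, -1):
            if sum(1 for c in self.C if c >= r) >= self.rho * self.K:
                return (2 ** r) * self.K
        return -1
```

Each counter tracks the deepest level (largest lsb) of any item hashed to it, LogLog-style; the estimate is the largest level at which a ρ-fraction of counters are still "alive", scaled by K_RE. The output is always either −1 or of the form 2^r·K_RE.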
1. Set K = 1/ε².
2. Initialize K counters C_1, ..., C_K to −1.
3. Pick random h1 ∈ H_2([n], [0, n−1]), h2 ∈ H_2([n], [K³]), h3 ∈ H_k([K³], [K]) for k = Ω(log(1/ε)/log log(1/ε)).
4. Initialize A, b, est = 0.
5. Run an instantiation RE of RoughEstimator.
6. Update(i): Set x ← max{C_{h3(h2(i))}, lsb(h1(i)) − b}.
   Set A ← A − ⌈log(2 + C_{h3(h2(i))})⌉ + ⌈log(2 + x)⌉. If A > 3K, output FAIL.
   Set C_{h3(h2(i))} ← x. Also feed i to RE. Let R be the output of RE. If R > 2^est:
   (a) est ← log(R), b_new ← max{0, est − log(K/32)}.
   (b) For each j ∈ [K], set C_j ← max{−1, C_j + b − b_new}.
   (c) b ← b_new, A ← Σ_{j=1}^K ⌈log(C_j + 2)⌉.
7. Estimator: Define T = |{j : C_j ≥ 0}|. Output F̃0 = 2^b · ln(1 − T/K)/ln(1 − 1/K).

Figure 3: F0 algorithm pseudocode. With probability 11/20, F̃0 = (1 ± O(ε))F0.

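The estimator in Step 7 of Figure 3 inverts the occupancy expectation of Fact 1: if T of the K counters are in use at the current subsampling level b, then solving T = K·(1 − (1 − 1/K)^A) for A and rescaling by 2^b recovers the estimate. A hypothetical helper illustrating just this inversion (the function name is mine):

```python
import math

def occupancy_invert(T, K, b=0):
    """Step 7 of Figure 3: F~0 = 2^b * ln(1 - T/K) / ln(1 - 1/K).
    Inverts E[#nonempty bins] = K*(1 - (1 - 1/K)^A) to recover the ball count A,
    then rescales by the subsampling factor 2^b. Requires 0 <= T < K."""
    return (2 ** b) * math.log(1 - T / K) / math.log(1 - 1 / K)
```

Feeding in exactly the expected occupancy for A balls returns A back (up to floating-point error), which is why the formula is an unbiased-looking plug-in estimator around the mean.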
The value est is such that 2^est is a Θ(1)-approximation to F0, and is obtained via RoughEstimator, and b is such that we expect F0(t)/2^b to be Θ(K) at all points in the stream. Each C_i then actually holds the offset (from b) of the deepest level of an item that was hashed to it; if no item of level b or deeper hashed to C_i, then C_i stores −1. Furthermore, the counters are bitpacked so that C_i only requires O(1 + log(C_i)) bits of storage (Section 3.4 states a known data structure which allows the bitpacked C_i to be stored in a way that supports efficient reads and writes).

Theorem 2. The algorithm of Figure 3 uses O(ε⁻² + log(n)) space.

Proof. The hash functions h1, h2 each require O(log(n)) bits to store. The hash function h3 takes O(k·log(K)) = O(log²(1/ε)) bits. The value b takes O(log log(n)) bits. The value A never exceeds the total number of bits needed to store all counters, which is O(ε⁻²·log(n)), and thus A can be represented in O(log(1/ε) + log log(n)) bits. The counters C_j never in total consume more than O(1/ε²) bits by construction, since we output FAIL if they ever would.

Note that A is always Σ_{i=1}^K ⌈log(C_i + 2)⌉, and we must thus show that with large probability this quantity is at most 3K at all points in the stream. Let A(t) be the value of A at time t (before running steps (a)-(c)), and similarly define C_j(t). We condition on the randomness used by RE, which is independent from the remaining parts of the algorithm. Let t_1,...,t_{r−1} be the points in the stream where the output of RE changes, i.e. F̃0^{RE}(t_j − 1) ≠ F̃0^{RE}(t_j) for all j ∈ [r − 1], and define t_r = m. We note that A(t) takes on its maximum value for t = t_j for some j ∈ [r], and thus it suffices to show that A(t_j) ≤ 3K for all j ∈ [r]. We furthermore note that r ≤ log(n) + 3, since F̃0^{RE}(t) is weakly increasing, only increases in powers of 2, and is always between 1 and 8F0 ≤ 8n given that E occurs. Now,

A(t) ≤ K + Σ_{i=1}^K log(C_i(t) + 2) ≤ K·(1 + log((Σ_{i=1}^K C_i(t))/K + 2)),

with the last inequality using concavity of the logarithm and Jensen's inequality. It thus suffices to show that, with large probability, Σ_{i=1}^K C_i(t_j) ≤ 2K for all j ∈ [r].

Fix some t = t_j for j ∈ [r]. For i ∈ I(t), let X_i(t) be the random variable max{−1, lsb(h1(i)) − b}, and let X(t) = Σ_{i∈I(t)} X_i(t). Note Σ_{i=1}^K C_i(t) ≤ X(t), and thus it suffices to lower bound Pr[X(t) ≤ 2K]. We have that X_i(t) equals s with probability 1/2^{b+s+1} for 0 ≤ s < log(n) − b (and X_i(t) = −1 when lsb(h1(i)) < b), so that E[X(t)] ≤ F0(t)/2^b. Then, by Chebyshev's inequality,

Pr[X(t) ≥ 2K] ≤ F0(t)/(2^b·(2K − F0(t)/2^b)²).

Conditioned on E, K/256 ≤ F0(t)/2^b ≤ K/32, implying the above probability is at most 1/(32K). Then by a union bound over all t_j for j ∈ [r], we have that X(t_j) ≤ 2K for all j ∈ [r] with probability at least 1 − r/(32K) ≥ 1 − 1/32, by our assumed upper bound on ε, implying we output FAIL with probability at most 1/32.

We now show that the output from Step 7 in Figure 3 is

46 (1 ± O(ε))F0 with probability 11/16. Let A be the algo- For the case F0 ≥ 100 we can apply Lemma 3 as was rithm in Figure 3, and let A be the same algorithm, but done in the proof of Theorem 3. We maintain K =2K  without the third line in the update rule (i.e., A never bits B1,...,BK in parallel, initialized to 0. When seeing e  i outputs FAIL). We first show that the output F0 of A is an index in the stream, in addition to carrying out Step B h (1 ± O(ε))F0 with probability 5/8. Let Ib be the set of in- 6ofFigure3,wealsoset h3(h2(i)) to 1 ( 3 can be taken b K K dices i ∈ I such that lsb(i) ≥ b. Then E[|Ib|]=F0/2 , and to have range =2 , and its evaluation can be taken Var[|Ib|] ≤ E[|Ib|], with the last inequality in part using modulo K when used in Figure 3 to have a size-K range). t t ∈ m F t K/ t pairwise independence of h1. We note that conditioned on Let 0 be the smallest [ ] with 0( )= 64, and 1  E,wehave be the smallest t ∈ [m] with F0(t)=K /32 (if no such ti exist, set them to ∞). Define TB (t)=|{i : Bi(t)=1}|, K/256 ≤ E[|Ib|] ≤ K/32 eB   and define F0 (t) = ln(1 − TB (t)/K )/ ln(1 − 1/K ). Then  Let E be the event that K/300 ≤|Ib|≤K/20. Then by by similar calculations as in Theorem 3 and a union bound eB Chebyshev’s inequality, over t0,t1, Pr[F0 (ti)=(1± O(ε))F0(ti)fori ∈{0, 1}]isat  − · / − o / − o FeB t Pr[E |E] ≥ 1 − O(1/K)=1− o(1). least 1 2 (1 5) (1)=3 5 (1). Noting that 0 ( ) t  monotonically increases with , we can do the following: for Also, if we let E be the event that Ib is perfectly hashed eB  t with F0 (t) ≥ K /32 = K/16, we output the estimator under h2, then pairwise independence of h2 gives eB from Figure 3; else, we output F0 (t). We summarize this Pr[E  |E] ≥ 1 − O(1/K)=1− o(1). section with the following theorem.   Now, conditioned on E ∧E , we have that T is a random Theorem 4. Let δ>0 be any fixed constant, and ε>0 variable counting the number of bins hit by at least one ball be given. 
There is a subroutine requiring O(ε−2 + log(n)) under a k-wise independent hash function, where there are space which with probability 1 − δ satisfies the property that   B = |Ib| balls, K bins, and k = Ω(log(K/ε)/ log log(K/ε)). there is some t ∈ [m] satisfying: (1) for any fixed t1 elements, and let V =[v], with arbitrarily large constant probability. with 1 0 there is word RAM algorithm that,   e z O(1) v O z u on E∧E ∧E , F0 =(1± O(ε))F0 with probability at least using time log( ) log ( ) and (log( ) + log log( )) bits 4/5 − δ for any constant δ>0 of our choice, e.g. δ =1/5. of space, selects a family H of functions from U to V (inde- Since Pr[E∧E ∧E] ≥ 1 − o(1), we thus have pendent of S) such that: e − O /zc H z Pr[F0 =(1± O(ε))F0] ≥ 3/5 − o(1). 1. With probability 1 (1 ), is -wise independent when restricted to S. Note our algorithm in Figure 3 succeeds as long as (1) we h ∈H do not output FAIL, and (2) Fe =(1± O(ε))F , and thus 2. Any can be represented by a RAM data structure 0 0 O z v h overall we succeed with probability at least 1− 2 −o(1)− 1 > using ( log( )) bits of space, and canbeevaluated 5 32 O z 11 . in constant time after an initialization step taking ( ) 20 time. F 3.3 Handling small 0 The following is a corollary of Theorem 2.16 in [35]. 2 In Section 3.2, we assumed that F0 =Ω(K)forK =1/ε (specifically, F0 ≥ K/32). In this subsection, we show how Theorem 7 (Siegel [35]). Let U =[u] and V =[v] c to deal with the case that F0 is small, by running a similar with u = v for some constant c ≥ 1, where the machine (but simpler) algorithm to that of Figure 3 in parallel. word size is Ω(log(v)). Suppose one wants a k(v)-wise in- The case F0 < 100 can be dealt with simply by keeping dependent hash family H of functions mapping U to V for the first 100 distinct indices seen in the stream in memory, k(v)=vo(1). For any constant >0 there is a random- taking O(log(n)) space. ized procedure for constructing such an H which succeeds

47 with probability 1 − 1/v, taking v bits of space. A random We will use the following “variable-bit-length array” data h ∈Hcan be selected using v bits of random seed, and h structure to implement the array C of counters in Figure 3, can be evaluated in O(1) time. which has entries whose binary representations may have unequal lengths. Specifically, in Figure 3, the bit represen- RoughEstimator We now describe a fast version of . tation of Ci requires O(1 + log(Ci + 2)) bits. Lemma RoughEstimator 5. can be implemented with Definition 1 (Blandford, Blelloch [7]). A variable- O (1) worst-case update and reporting times, at the expense bit-length array (VLA) is a data structure implementing an of only giving a 16-approximation to F0(t) for every t ∈ [m] array C1,...,Cn supporting the following operations: (1) with F (t) ≥ K ,forK as in Figure 2. 0 RE RE update(i, x) sets the value of Ci to x, and (2) read(i) re- Proof. We first discuss update time. We replace each turns Ci. Unlike in standard arrays, the Ci are allowed hj H 3 with a random function from the hash family of Theo- to have bit-representations of varying lengths, and we use 3 rem 6 with z =2KRE, u = KRE , v = KRE. The constant c len(Ci) to represent the length of the bit-representation of j h Ci in Item 1 of Theorem 6 is chosen to be 1, so that each 3 is . uniform on any given subset of z items of [u] with probability Theorem 8 (Blandford and Blelloch [7]). There is 1−O(1/KRE). Note that the proof of correctness of Rough- P j O n C Estimator (Theorem 1) only relied on the h being uniform a VLA data structure using ( + i len( i)) space to store 3 n elements, supporting worst-case O(1) updates and reads, on some unknown set of 4KRE/3 < 2KRE indices with prob- under the assumptions that (1) len(Ci) ≤ w for all i,and ability 1 − O(1/K ) (namely, those indices in Ir (t)). The RE w ≥ M w M space required to store any h ∈His z log(v)=O(log(n)), (2) log( ). 
Here is the machine word size, and which does not increase our space bound for RoughEsti- is the amount of memory available to the VLA. mator. Updates then require computing a least significant hj ,hj ,hj We now give a time-optimal version of Figure 3. bit, and computing the 1 2 3, all taking constant time. For reporting time, in addition to the information main- Theorem 9. The algorithm of Figure 3 can be imple- tained in Figure 2, we also maintain three sets of counters mented with O(1) worst-case update and reporting times. Aj ,Aj ,Aj ,Aj ,Aj j ∈ j Aj 0 1 2 3 4 for [3]. For a fixed , the i store j Proof. For update time, we select h from the hash fam- T for an r we now specify. Roughly speaking, for the val- 3 r+i ily of Theorem 7, which requires O(1/ε) space for arbitrar- ues of t where F0(t) ≥ KRE, r will be such that, conditioned r ily small >0 of our choosing (say,  = 1), and thus this on RoughEstimator working correctly, 2 will always be space is dominated by other parts of the algorithm. We then in [F0(t)/2, 8F0(t)]. We then alter the estimator of Rough- r can evaluate h ,h ,h in constant time, as well as compute Estimator to being 2 +1. 1 2 3 the required least significant bit in constant time. Updat- Note that, due to Theorem 4, the output of RoughEs- ing A requires computing the ceiling of a base-2 logarithm, timator does not figure into our final F0-estimator until 2 but this is just a most significant bit computation which we F0(t) ≥ (1 − O(ε))/(32ε ), and thus the output of the al- can do in O(1) time. We can also read and write the Cj gorithm is irrelevant before this time. We start off with j j j j j in constant time whilst using the same asymptotic space by r = log(1/(32ε2)). Note that A ,A ,A ,A ,A can be 0 1 2 3 4 Theorem 8. maintained in constant time during updates. At some point What remains is to handle the if statement for when t , the estimator from Section 3.3 will declare that F (t )= 1 0 1 R>2est. 
Note that a na¨ıve implementation would require (1 ± O(ε))/(32ε2), at which point we are assured F (t ) ≥ 0 1 O(K) time. Though this if statement occurs infrequently 1/(64ε2) ≥ log(n) (assuming ε is smaller than some con- enough that one could show O(1) amortized update time, stant, and assuming that 1/ε2 ≥ 64 log(n)). Similarly, we we instead show the stronger statement that an implemen- also have F (t ) ≤ 1/(16ε2) ≤ 4 log(n). Thus, by our choice 0 1 tation is possible with O(1) worst case update time. The of r and conditioned on the event that RoughEstimator idea is similar to that in the proof of Lemma 5: when b of Figure 2 succeeds (i.e., outputs a value in [F (t), 8F (t)] new 0 0 changes, it cannot change more than a constant number of for all t with F0(t) ≥ KRE), we can determine the median ∗ j times again in the next O(K) updates, and so we can spread across the j of the largest r such that Tr ≥ ρKRE from the j ∗ r t the O(K) required work over the next O(K) stream updates, A and set r(t )=r so that 2 ( 1) is in [F (t ), 8F (t )]. i 1 0 1 0 1 doing a constant amount of work each update. Our argument henceforth is inductive: conditioned on the Specifically, note that bnew only ever changes for times t output of RoughEstimator from Figure 2 being correct est(t) r t when R(t) > 2 ≥ K/16, conditioned on the subroutine (always in [F (t), 8F (t)]), 2 ( ) will always be in [F (t)/2, 8F (t)] 0 0 0 0 of Theorem 4 succeeding, implying that F (t) ≥ K/256, for all t ≥ t , which we just saw is true for t = t . Note 0 1 1 and thus there must be at least K/256 updates for F (t) that conditioned on RoughEstimator being correct, its es- 0 to double. Since RoughEstimator always provides an 8- timate of F cannot jump by a factor more than 8 at any 0 approximation, est can only increase by at most 3 in the given point in the stream. Furthermore, if this happens, we j next K/256 stream updates. We will maintain a primary will detect it since we store up to A . 
Thus, whenever we 4 and secondary instantiation of our algorithm, and only the find that the estimate from RoughEstimator changed (say est primary receives updates. Then in cases where R>2 r r r r − r Aj from 2 to 2 ), we increment by and set each i and b changes from b, we copy a sufficiently large con- Aj i ≤ r − r r − r

48 process new stream updates in both the primary and sec- log(16R/K),the(i∗)th row of A can be recovered with proba- ondary structures, and we answer updates from the primary bility 2/3. Furthermore, the update time and time to recover structure. The analysis of correctness remains virtually un- any Ai,j are both O(1). changed, since the value 2b corresponding to the primary Proof. We represent each Ai,j as a counter Bi,j of O(log(K)+ structure still remains a constant-factor approximation to log log(mM)) bits. We interpret Ai,j as being the bit “1” if F during this copy phase. 0 Bi,j is non-zero; else we intrepret Ai,j as 0. The details are T |{i C ≥ For reporting time, note we can maintain = : i as follows. We choose a prime p randomly in [D, D3]forD = }| 0 during updates, and thus the reporting time is the time 100K log(mM). Notice that for mM larger than some con- O to compute a natural logarithm, which can be made (1) stant, by standard results on the density of primes there are via a small lookup table (see Section A.2). at least K2 log2(mM) primes in the interval [D, D3]. Since every frequency xi is at most mM in magnitude and thus 4. L0 ESTIMATION ALGORITHM has at most log(mM) prime factors, non-zero frequencies re- p −O /K2 Here we give an algorithm for estimating L0, the Ham- main non-zero modulo with probability 1 (1 ), which ming norm of a vector updated in a stream. we condition on occurring. We also randomly pick a vector ∈ FK h ∈H K3 , K Our L0 algorithm is based on the approach to F0 estima- u p and 4 2([ ] [ ]). Upon receiving an update i, v B v · tion in Figure 4. In this approach, we maintain a lg(n) × K ( ), we increment lsb(h1(i)),h3(h2(i)) by uh4(h2(i)), then bit-matrix A, and upon receiving an update i, we subsam- reduce modulo p. ∗ ple i to the row determined by the lsb of a hash evalua- Define Ii∗ = {i ∈ I : lsb(i)=i }. 
Note that condi- tion, then evaluate another hash function to tell us a col- tioned on R ∈ [L0,cL0], we have E[Ii∗ ] ≤ K/32, and thus umn and set the corresponding bit of A to 1. Note that Pr[|Ii∗ |≤K/20] = 1 − O(1/K)=1− o(1) by Chebyshev’s our algorithm from Section 3 is just a space-optimized im- inequality. We condition on this event occurring. Also, since 3 plementation of this approach. Specifically, in Figure 3 the range of h2 is of size K , the indices in Ii∗ are perfectly − O /K − o we obtained a c-approximation R to F0 via RoughEsti- hashed with probability 1 (1 )=1 (1), which we mator for c = 8. The value b we maintained was just also condition on occurring. max{0, lg(32R/K)}. Then rather than explicitly maintain- Let Q be the event that p does not divide any |xj | for ing A, we instead maintained counters Cj which allowed j ∈ Ii∗ . Then by a union bound, Pr[Q]=1− O(1/K).   us to deduce whether Ab,j = 1 (specifically, Ab,j =1iff Let Q be the event that h4(h2(j)) = h4(h2(j )) for dis-   Cj = 0). tinct j, j ∈ Ii∗ with h3(h2(j)) = h3(h2(j )).  The proof of correctness of the approach in Figure 4 is thus Henceforth, we also condition on both Q and Q occurring, essentially identical to that of Theorem 3 (in fact simpler, which we later show holds with good probability. Define J since we do not have to upper bound the case of outputting as the set of j ∈ [K] such that h3(h2(i)) = j for at least one FAIL), so we do not repeat it here. Thus, we need only show i ∈ Ii∗ , so that to properly represent the Ai∗,j we should that the approach in Figure 4 can be implemented for some have Bi∗,j non-zero iff j ∈ J. For each j ∈ J, Bi∗,j can be constant c ≥ 1 in the context of L0-estimation. 
Specifically, viewed as maintaining the dot product of a non-zero vector we must show that (a) the bit-matrix A can be maintained v, the frequency vector x restricted to coordinates in Ii∗ (with large probability), and (b) we can implement the ora- which hashed to j, with a random vector w, namely, the I ∗ j cle in Step 4 of Figure 4 to give a c-approximation to L0 for projection of u onto coordinates in i that hashed to . some constant c ≥ 1. The vector v is non-zero since we condition on Q, and w is  We first show (a), that we can maintain the bit-matrix random since we condition on Q . X h h j A with large probability. In fact, note our estimate of L0 Now, let i,j be a random variable indicatingP that 3( 2( )) = ∗   only depends on one particular row i = log(16R/K)ofA, h3(h2(j )) for distinct j, j ∈ Ii∗ . Let X = j h ∈H r , t  P 2. Let 0 be integers. Pick 2([ ] [ ]). h h j h h j Y Y P ` − ´ dicating 4( 2( )) = 4( 2( )), and let = j,j ∈Z j,j . S ⊂ r s |h 1(i)∩S| ≤|S|2/ t ( ) For any [ ] , E[ i=1 2 ] (2 ). Then by pairwise independence of h4, and the fact that we I ∗ h Proof. Write |S| = s.LetXi,j indicate h(i)=j.By conditioned on i being perfectly hashed under 2,wehave X linearity of expectation, the desired expectation is then  ! E[Y ]= Pr[h4(h2(j)) = h4(h2(j ))] = |Z|/K. X s 1 s2 (j,j)∈Z t E[Xi, ]E[Xi, ]=t ≤ . 1 1 2 t2 2t |Z| X X ≤ X ≤ K/ i 7/8. vector. Then, a random w ∈ Fq gives Prw[v · w =0]=1/q, Finally, by Fact 3 with q = p, and union bounding over where v · w is the inner product over Fq. all K counters Bi∗,j ,noBi∗,j for j ∈ J is 0 with probability Proof. The set of vectors orthogonal to v is a linear sub- 1 − K/p ≥ 99/100. Thus, our scheme overall succeeds with d d−1 space V ⊂ Fq of dimension d − 1 and thus has q points. probability (7/8) · (99/100) − o(1) > 2/3. Thus, Pr[w ∈ V ]=1/q. We next show (b) in Section A.3, i.e. give an algorithm Lemma 6. 
There is a scheme which represents each Ai,j providing an O(1)-approximation to L0 with O(1) update using O(log(1/ε) + log log(mM)) bits such that, for i∗ = and reporting times. The space used is O(log(n) log log(mM)).

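The scheme above (and Fact 3) explain why each L0 counter stores a dot product of the colliding frequencies with a random vector modulo a prime, rather than a plain frequency sum: a plain sum can cancel to zero on a nonzero frequency vector (a false negative), while the dot product vanishes only with probability 1/p. A small sketch of the contrast; the function name and the specific prime are illustrative, not the paper's:

```python
import random

P = 2_147_483_647  # an illustrative prime; the paper instead picks p randomly in [D, D^3]

def counter_states(updates, u, p=P):
    """Track one counter two ways: a plain frequency sum, and a dot product
    of the frequency vector with a random vector u over F_p."""
    plain, dot = 0, 0
    for i, delta in updates:
        plain += delta
        dot = (dot + delta * u[i]) % p
    return plain, dot

rng = random.Random(42)
u = [rng.randrange(1, P) for _ in range(4)]
# two colliding items end with net frequencies +1 and -1: a nonzero vector
updates = [(0, +1), (1, +1), (1, -2)]
plain, dot = counter_states(updates, u)
```

Here the plain counter reads 0 even though the frequency vector (1, −1) is nonzero, while the dot-product counter reduces algebraically to (u[0] − u[1]) mod p, which is zero only if the two random vector entries collide.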
1. Set K = 1/ε².
2. Instantiate a lg(n) × K bit-matrix A, initializing each A_{i,j} to 0.
3. Pick random h1 ∈ H_2([n], [0, n−1]), h2 ∈ H_2([n], [K³]), h3 ∈ H_k([K³], [K]) for k = Ω(lg(1/ε)/lg lg(1/ε)).
4. Obtain a value R ∈ [F0, cF0] from some oracle, for some constant c ≥ 1.
5. Update(i): Set A_{lsb(h1(i)), h3(h2(i))} ← 1.
6. Estimator: Define T = |{j ∈ [K] : A_{log(16R/K), j} = 1}|. Output F̃0 = (32R/K) · ln(1 − T/K)/ln(1 − 1/K).

Figure 4: An algorithm skeleton for F0 estimation.

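The Figure 4 skeleton can be sketched directly in a few lines. As before, truly random memoized hashing stands in for the limited-independence families h1, h2, h3 (an assumption made for brevity, which forfeits the space bound), and the oracle value R is simply passed in by the caller; all names are illustrative:

```python
import math
import random

class F0Skeleton:
    """Sketch of the Figure 4 skeleton: a lg(n) x K bit-matrix, with items
    subsampled to row lsb(h1(i)) and column h3(h2(i))."""

    def __init__(self, eps, n, seed=0):
        self.K = round(1 / eps ** 2)
        self.log_n = int(math.log2(n))
        self.A = [[0] * self.K for _ in range(self.log_n + 1)]
        self.rng = random.Random(seed)
        self.h1 = {}   # memoized stand-in for h1
        self.h3 = {}   # memoized stand-in for h3 composed with h2
        self.n = n

    def _lsb(self, x):
        return self.log_n if x == 0 else (x & -x).bit_length() - 1

    def update(self, i):
        v = self.h1.setdefault(i, self.rng.randrange(self.n))
        j = self.h3.setdefault(i, self.rng.randrange(self.K))
        self.A[self._lsb(v)][j] = 1   # Step 5: set one bit per update

    def estimate(self, R):
        # Step 6: read row log(16R/K) and invert the occupancy formula
        row = min(self.log_n, max(0, round(math.log2(max(1.0, 16 * R / self.K)))))
        T = sum(self.A[row])
        if T >= self.K:
            return float("inf")
        return 32 * (R / self.K) * math.log(1 - T / self.K) / math.log(1 - 1 / self.K)
```

The chosen row subsamples the stream so that roughly Θ(K) distinct items survive, which is exactly the regime in which the occupancy inversion of Fact 1 is well-conditioned.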
Note that, as with our F0 algorithm, we also need to have an algorithm which provides a (1 ± ε)-approximation when L0 ≪ 1/ε². Just as in Section 3.3, this is done by handling the case of small L0 in two cases separately: detecting and estimating when L0 ≤ 100, and (1 ± ε)-approximating L0 when L0 > 100. In the former case, we can compute L0 exactly with large probability by perfect hashing (see Lemma 8 in Section A.3). In the latter case, we use the same scheme as in Section 3.3, but using Lemma 6 to represent our bit array.

Putting everything together, we have the following.

Theorem 10. There is an algorithm for (1 ± ε)-approximating L0 using space O(ε⁻²·log(n)·(log(1/ε) + log log(mM))), with 2/3 success probability, and with O(1) update and reporting times.

Acknowledgments

We thank Nir Ailon, Erik Demaine, Piotr Indyk, T.S. Jayram, Swastik Kopparty, Rasmus Pagh, Mihai Pătrașcu, and the authors of [6] for valuable discussions and references.

5. REFERENCES

[1] S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua approximate query answering system. In SIGMOD, pages 574–576, 1999.
[2] A. Akella, A. Bharambe, M. Reiter, and S. Seshan. Detecting DDoS attacks on ISP networks. In Proc. MPDS, 2003.
[3] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.
[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In Proc. RANDOM, pages 1–10, 2002.
[5] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. SODA, pages 623–632, 2002.
[6] K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, pages 199–210, 2007.
[7] D. K. Blandford and G. E. Blelloch. Compact dictionaries for variable-length keys and data with applications. ACM Trans. Alg., 4(2), 2008.
[8] A. Brodnik. Computation of the least significant set bit. In Proc. ERK, 1993.
[9] J. Brody and A. Chakrabarti. A multi-round communication lower bound for gap hamming and some consequences. In Proc. CCC, pages 358–368, 2009.
[10] P. Brown, P. J. Haas, J. Myllymaki, H. Pirahesh, B. Reinwald, and Y. Sismanis. Toward automated large-scale information integration and discovery. In Data Management in a Connected World, pages 161–180, 2005.
[11] L. Carter and M. N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., 18(2):143–154, 1979.
[12] E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci., 55(3):441–453, 1997.
[13] G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3):529–540, 2003.
[14] T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or, how to build a data quality browser. In SIGMOD, pages 240–251, 2002.
[15] D. P. Dubhashi and D. Ranjan. Balls and bins: A study in negative dependence. Random Struct. Algorithms, 13(2):99–124, 1998.
[16] M. Durand and P. Flajolet. Loglog counting of large cardinalities (extended abstract). In Proc. ESA, pages 605–617, 2003.
[17] C. Estan, G. Varghese, and M. E. Fisk. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Trans. Netw., 14(5):925–937, 2006.
[18] S. J. Finkelstein, M. Schkolnick, and P. Tiberio. Physical database design for relational databases. ACM Trans. Database Syst., 13(1):91–128, 1988.
[19] P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm. Disc. Math. and Theor. Comp. Sci., AH:127–146, 2007.
[20] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182–209, 1985.
[21] M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424–436, 1993.
[22] S. Ganguly. Counting distinct items over update streams. Theor. Comput. Sci., 378(3):211–222, 2007.
[23] P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In VLDB, pages 541–550, 2001.
[24] P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In Proc. SPAA, pages 281–291, 2001.
[25] P. Indyk. Algorithms for dynamic geometric problems over data streams. In Proc. STOC, pages 373–380, 2004.
[26] P. Indyk and D. P. Woodruff. Tight lower bounds for the distinct elements problem. In Proc. FOCS, pages 283–, 2003.
[27] D. M. Kane, J. Nelson, and D. P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proc. SODA, pages 1161–1178, 2010.
[28] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[29] M. H. Overmars. The Design of Dynamic Data Structures. Springer, 1983.
[30] S. Padmanabhan, B. Bhattacharjee, T. Malkemus, L. Cranston, and M. Huras. Multi-dimensional clustering: A new data layout scheme in DB2. In SIGMOD, pages 637–641, 2003.
[31] A. Pagh and R. Pagh. Uniform hashing in constant time and optimal space. SIAM J. Comput., 38(1):85–96, 2008.
[32] C. R. Palmer, G. Siganos, M. Faloutsos, and C. Faloutsos. The connectivity and fault-tolerance of the internet topology. In NRDM Workshop, 2001.
[33] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD, pages 23–34, 1979.
[34] A. Shukla, P. Deshpande, J. F. Naughton, and K. Ramasamy. Storage estimation for multidimensional aggregates in the presence of hierarchies. In Proc. VLDB, pages 522–531, 1996.
[35] A. Siegel. On universal classes of extremely random
[35] A. Siegel. On universal classes of extremely random constant-time hash functions. SIAM J. Computing, 33(3):505–543, 2004.
[36] D. P. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. SODA, pages 167–175, 2004.

APPENDIX

A.1 Expectation and variance analyses for balls and bins with limited independence

The following is a proof of Lemma 2. Below, X corresponds to the random variable which is the number of bins receiving at least one ball when tossing A balls into K bins independently at random.

Lemma 2 (restatement). There exists some constant ε0 such that the following holds for ε ≤ ε0. Let A balls be mapped into K bins using a random h ∈ H_{2(k+1)}([A], [K]), where k = c log(K/ε)/log log(K/ε) for a sufficiently large constant c > 0. Suppose 1 ≤ A ≤ K. For i ∈ [K], let X'_i be an indicator variable which is 1 if and only if there exists at least one ball mapped to bin i by h. Let X' = Σ_{i=1}^{K} X'_i. Then the following hold:

(1) |E[X'] − E[X]| ≤ ε E[X];
(2) |Var[X'] − Var[X]| ≤ ε².

Proof. Let A_i be the random variable counting the number of balls in bin i when picking h ∈ H_{2(k+1)}([A], [K]). Define the function f_k(n) = Σ_{i=0}^{k} (−1)^i (n choose i), the inclusion–exclusion expansion truncated at level k, so that 1 − f_k(n) = 1{n > 0} ± (n choose k+1).

The same expression holds for the E[X_i], and thus both E[X'_i] and E[X_i] are sandwiched inside an interval of size bounded by twice the expected error. To bound the expected error we can use (k+1)-independence. We have that the expected value of (A_i choose k+1) is (A choose k+1), the number of ways of choosing k+1 of the balls, times the product of the probabilities that each ball is in bin i. This is

  (A choose k+1) K^{−(k+1)} ≤ (eA/(K(k+1)))^{k+1} ≤ (A/K) · (e/(k+1))^{k+1},

with the last inequality using that A ≤ K. Thus, |E[X_i] − E[X'_i]| ≤ ε²A/K for k = c log(1/ε)/log log(1/ε) for sufficiently large constant c. In this case |E[X] − E[X']| ≤ ε²A ≤ ε E[X] for ε smaller than some constant, since E[X] = Ω(A) for A ≤ K.

We now analyze Var[X']. We approximate X'_i X'_j as (1 − f_k(A_i))(1 − f_k(A_j)). This is determined by 2(k+1)-wise independence of h and is equal to

  (1 − f(A_i) ± O((A_i choose k+1))) (1 − f(A_j) ± O((A_j choose k+1)))
  = X_i X_j ± O((A_i choose k+1) + (A_j choose k+1) + (A_i choose k+1)(A_j choose k+1)).

We can now analyze the error using 2(k+1)-wise independence. The expectation of each term in the error is calculated as before, except for products of the form (A_i choose k+1)(A_j choose k+1). The expected value of this is

  (A choose k+1)(A−k−1 choose k+1) K^{−2(k+1)} ≤ (A choose k+1)² K^{−2(k+1)} ≤ (eA/(K(k+1)))^{2(k+1)}.

Thus, for k = c log(K/ε)/log log(K/ε) for sufficiently large c > 0, each summand in the error above is bounded by ε³/(6K²), in which case |E[X'_i X'_j] − E[X_i X_j]| ≤ ε³/K². We can also make c sufficiently large so that |E[X] − E[X']| ≤ ε³/K². Now, we have

  |Var[X'] − Var[X]| ≤ |E[X] − E[X']| + 2 Σ_{i<j} |E[X'_i X'_j] − E[X_i X_j]| ≤ ε².

A.2 A compact lookup table for the natural logarithm

Lemma 7. Let K > 4 be a positive integer, and write γ = 1/√K. It is possible to construct a lookup table requiring O(γ^{−1} log(1/γ)) bits such that ln(1 − c/K) can then be computed with relative accuracy γ in constant time for all integers c ∈ [4K/5].

Proof. We set γ' = γ/15 and discretize the interval [1, 4K/5] geometrically by powers of (1 + γ'). We precompute the natural logarithm evaluated at 1 − ρ/K for all discretization points ρ, with relative error γ'/3, creating a table A taking space O(γ^{−1} log(1/γ)). We answer a query ln(1 − c/K) by outputting the natural logarithm of the closest discretization point in A. First, we argue that the error from this output is small enough. Next, we argue that the closest discretization point can be found in constant time.
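The table construction just described can be sketched in code. The following toy Python rendering is an illustration only, with function names of our choosing: it stores entries exactly rather than with relative error γ'/3, and it locates the nearest geometric discretization point with floating-point logarithms instead of the constant-time integer index computation of the proof.

```python
import math

def build_log_table(K, gamma):
    """Precompute ln(1 - rho/K) at points rho = (1+g)^t covering
    [1, 4K/5], where g = gamma/15 as in the proof sketch."""
    g = gamma / 15.0
    table = []
    rho = 1.0
    while rho <= 4 * K / 5:
        table.append(math.log1p(-rho / K))  # ln(1 - rho/K), stored exactly here
        rho *= 1 + g
    return g, table

def query_log(K, g, table, c):
    """Approximate ln(1 - c/K) for an integer c in [1, 4K/5] by
    returning the entry of the geometrically closest point."""
    idx = min(len(table) - 1, round(math.log(c) / math.log1p(g)))
    return table[idx]

K = 10_000
gamma = 1 / math.sqrt(K)  # target relative accuracy, as in Lemma 7
g, table = build_log_table(K, gamma)
for c in (1, 7, 500, 8000):
    exact = math.log1p(-c / K)
    assert abs(query_log(K, g, table, c) - exact) <= gamma * abs(exact)
```

The nearest point ρ satisfies |ρ − c| ≤ γ'c, so the additive error in ln(1 − c/K) is O(γ'c/K), which is small relative to |ln(1 − c/K)| ≥ c/K; this is the same accounting the proof carries out.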

For the error: up to the (1 ± γ'/3) relative error with which the table entries are stored, the output is

  ln(1 − (1 ± γ')c/K) = ln(1 − c/K ± γ'c/K) = ln(1 − c/K) ± 5γ'c/K = ln(1 − c/K) ± γc/(3K).

Using the fact that |ln(1 − z)| ≥ z for 0 < z < 1, the overall relative error is at most γ. For the lookup time, given c we can compute the correct index to look at in our index table A up to ±1/3; since indices are integers, we are done.

A.3 A Rough Estimator for L0-estimation

We describe here a subroutine RoughL0Estimator giving a constant-factor approximation to L0 with constant probability (Theorem 11 below). It makes use of the estimator of Lemma 8, which computes L0 exactly when L0 is small, succeeding with probability at least 1 − η using O(c² log log(mM)) space, in addition to needing to store O(log(1/η)) independently chosen pairwise independent hash functions mapping [n] into [c²]. The update and reporting times are O(1).

We pick a hash function h : [n] → [n] at random from a pairwise independent family. For each 0 ≤ j ≤ log(n) we create a substream S^j consisting of those x ∈ [n] with lsb(h(x)) = j, and we let L0(S) denote the number of distinct elements of a substream S. Our estimate of L0 is L̃0 = 2^j for the largest j whose substream estimate exceeds 8. If no such j exists, we estimate L̃0 = 1.

Theorem 11. RoughL0Estimator with probability at least 9/16 outputs a value L̃0 satisfying L0 ≤ L̃0 ≤ 110·L0. The space used is O(log(n) log log(mM)), and the update and reporting times are O(1).

Proof (sketch). For levels j beyond an appropriate threshold j*, we have Pr[L0(S^j) > 8] < 1/(8 · 2^{j−j*−1}) by Markov's inequality. Thus, by a union bound, the probability that any j > j* has L0(S^j) > 8 is at most (1/8) · Σ_{j−j*=1}^{∞} 2^{−(j−j*−1)} = 1/4; in that case the estimates of L0 for the corresponding substreams are at most 8, and we will not output L̃0 = 2^j for any j > j*. On the other hand, for an appropriate level j** ≤ j* (if it exists), S^{j**} has sufficiently many distinct elements with probability at least 8/9, and its estimate exceeds 8 with probability at least 1 − (1/9 + 1/16) > 13/16 by Lemma 8. Thus, with probability at least 1 − (3/16 + 1/4) = 9/16, we output L̃0 = 2^j for some j** ≤ j ≤ j*.
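The level-based subsampling behind RoughL0Estimator can be illustrated with a small self-contained sketch. This is not the paper's subroutine: a truly random memoized hash stands in for the pairwise independent family, and exact small sets capped at nine items stand in for the space-bounded per-level estimators, so the constants and the success guarantee here are illustrative only.

```python
import random

def lsb(x, width=32):
    """Index of the least significant set bit of x; `width` if x == 0."""
    return (x & -x).bit_length() - 1 if x else width

def rough_l0_estimate(stream, seed=0):
    """Toy rough estimator: hash each item, route it to substream
    j = lsb(h(x)), and output 2**j for the deepest level whose
    substream still holds more than 8 distinct items."""
    rng = random.Random(seed)
    h = {}  # memoized truly-random "hash" (the paper uses pairwise independence)
    levels = [set() for _ in range(33)]  # one bucket per possible lsb value
    for x in stream:
        if x not in h:
            h[x] = rng.getrandbits(32)
        j = lsb(h[x])
        if len(levels[j]) <= 8:  # cap per-level storage at 9 distinct items
            levels[j].add(x)
    passing = [j for j, s in enumerate(levels) if len(s) > 8]
    return 2 ** max(passing) if passing else 1

est = rough_l0_estimate(range(1000), seed=42)
assert 8 <= est <= 1024  # a constant-factor ballpark of L0 = 1000
```

Since level j receives each distinct item with probability 2^{-(j+1)}, the deepest level holding more than 8 distinct items sits, with good probability, within a constant factor of log2(L0), which is what the Markov and union-bound steps of the proof formalize.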

Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms

Kasper Green Larsen∗ (Aarhus University), [email protected]
Jelani Nelson† (Harvard University), [email protected]
Huy L. Nguyễn (Simons Institute), [email protected]

ABSTRACT

We say a turnstile streaming algorithm is non-adaptive if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature, except a single recent example for a particular promise problem [7], are non-adaptive. In fact, even more specifically, they are all linear sketches.

We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial turnstile update time lower bounds. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds.

Categories and Subject Descriptors

F.2.0 [Analysis of Algorithms and Problem Complexity]: General

General Terms

Theory, Algorithms

Keywords

streaming; time lower bounds; cell probe model; heavy hitters; point query

∗ Supported by Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation, grant DNRF84.
† Supported by NSF grant IIS-1447471 and NSF CAREER award CCF-1350670, ONR grant N00014-14-1-0632, and a Google Faculty Research Award.

STOC'15, June 14–17, 2015, Portland, Oregon, USA. Copyright © 2015 ACM 978-1-4503-3536-2/15/06. http://dx.doi.org/10.1145/2746539.2746542

1. INTRODUCTION

In the turnstile streaming model of computation [36] there is some vector v ∈ R^n initialized to 0, and we must provide a data structure that processes coordinate-wise updates to v. An update of the form (i, Δ) causes the change v_i ← v_i + Δ, where Δ ∈ {−M, ..., M}. Occasionally our data structure must answer queries for some function of v. In many applications n is extremely large, and thus it is desirable to provide a data structure with space consumption much less than n, e.g. polylogarithmic in n. For example, n may be the number of valid IP addresses (n = 2^128 in IPv6), and v_i may be the number of packets sent from source IP address i on a particular link. A query may then ask for the support size of v (the "distinct elements" problem [14]), which was used for example to estimate the spread of the Code Red worm after filtering the packet stream based on the worm's signature [13, 35]. Another query may be the ℓ2 norm of v [2], which was used by AT&T as part of a packet monitoring system [31, 42]. In some examples there is more than one possible query to be asked; in "point query" problems a query is some i ∈ [n] and the data structure must output v_i up to some additive error, e.g. ε‖v‖_p [6, 12]. Such point query data structures are used as subroutines in the heavy hitters problem, where informally the goal is to output all i such that v_i is "large". If the data structure is linearly composable (meaning that data structures for v and v' can be combined to form a data structure for v − v'), heavy hitters data structures can be used for example to detect trending topics in search engine query streams [20, 21, 6]. In fact the point query structure [6] has been implemented in the log analysis language Sawzall at Google [41].

Coupled with the great success in providing small-space data structures for various turnstile streaming problems has been a great amount of progress in proving space lower bounds, i.e. theorems which state that any data structure for some particular turnstile streaming problem must use space (in bits) above some lower bound. For example, tight or nearly tight space lower bounds are known for the distinct elements problem [2, 43, 44, 25], ℓp norm estimation [2, 3, 5, 43, 29, 22, 24, 26, 44, 25], heavy hitters [27], entropy estimation [4, 29], and several other problems.

While there has been much previous work on understanding the space required for solving various streaming problems, much less progress has been made regarding time complexity: the time it takes to process an update in the stream and the time it takes to answer a query about the stream. This is despite strong motivation, since in several applications the data stream may be updated at an extremely fast

rate, so that in fact update time is often arguably more important than space consumption; for example, [42] reported in their application for ℓ2-norm estimation that their system was constrained to spend only 130 nanoseconds per packet to keep up with high network speeds. If for example n is the number of IP addresses 2^128, then certainly the (computer connected to the) router has much more than lg n bits of memory or even cache, which is a regime much previous streaming literature has focused on. Arguably this primary focus on space was not because of the greater practical interest in space complexity, but simply because up until this point there was a conspicuous absence of understanding concerning the tradeoff between update time and memory consumption (especially from the perspective of lower bounds). In short, many previous research targets were defined by what our tools allowed us to prove. In fact, in high-speed applications such as in networking, one could easily imagine preferring a solution using n memory (as long as it fit in cache) with a very fast update time rather than, say, polylogarithmic memory with worse update time.

Of course, without any space constraint achieving fast update and query times is trivial: store v and its norm in memory explicitly and spend constant time to add Δ to v_i and change the stored norm after each update. Thus an interesting data structural issue arises: how can we simultaneously achieve small space and low time for turnstile streaming data structures? As mentioned, surprisingly very little is understood about this question for any turnstile streaming problem. For some problems we have very fast algorithms (e.g. constant update time for distinct elements [30], and also for ℓ2 estimation [42]), whereas for others we do not (e.g. super-constant time for ℓp estimation for p ≠ 2 [28] and heavy hitters problems [6, 12]), and we do not have proofs precluding the possibility that a fast algorithm exists.

Indeed, the only previous time lower bounds for streaming problems are those of Clifford and Jalsenius [8] and Clifford, Jalsenius, and Sach [9, 10] for streaming multiplication, streaming edit distance and streaming Hamming distance computation. These problems are significantly harder than e.g. ℓp estimation (the lower bounds proved apply even for super-linear space usage) and there appears to be no way of extending their approach to obtain lower bounds for the problems above. Importantly, the problems considered in [8, 9, 10] are not in the turnstile model.

A natural model for upper bounds in the streaming literature (and also for data structures in general) is the word RAM model: basic arithmetic operations on machine words of w bits each cost one unit of time. In the data structure literature, one of the strongest lower bounds that can be proven is in the cell probe model [34, 15, 45], where one assumes that the data structure is broken up into S words each of size w bits and cost is only incurred when the user reads a memory cell from the data structure (S is the space). Lower bounds proven in this model hold against any algorithm operating in the word RAM model. In recent years there has been much success in proving data structure lower bounds in the cell probe model (see for example [39, 33]) using techniques from communication complexity and information theory. Can these techniques be imported to obtain time lower bounds for classically studied turnstile streaming problems?

Question 1. Can we use techniques from cell probe lower bound proofs to lower bound the time complexity for classical streaming problems?

Indeed the lower bounds for streaming multiplication and Hamming distance computation were proved using the information transfer technique of Pătrașcu and Demaine [40], but this approach does not seem effective against the turnstile streaming problems above.

There are a couple of obvious problems to attack for Question 1. For example, for many streaming problems one can move from, say, 2/3 success probability to 1 − δ success probability by running Θ(lg(1/δ)) instantiations of the algorithm in parallel then outputting the median answer. This is possible whenever the output of the algorithm is numerical, such as for example distinct elements, moment estimation, entropy estimation, point query, or several other problems. Note that doing so increases both the space and update time complexity of the resulting streaming algorithm by a Θ(lg(1/δ)) factor. Is this necessary? It was shown that the blowup in space is necessary for several problems by Jayram and Woodruff [26], but absolutely nothing is known about whether the blowup in time is required. Thus, any non-trivial lower bound in terms of δ would be novel.

Another problem to try for Question 1 is the heavy hitters problem. Consider the ℓ1 heavy hitters problem: a vector v receives turnstile updates, and during a query we must report all i such that |v_i| ≥ ε‖v‖_1. Our list is allowed to contain some false positives, but not "too false", in the sense that any i output should at least satisfy |v_i| ≥ (ε/2)‖v‖_1. Algorithms known for this problem using non-trivially small space, both randomized [12] and deterministic [19, 37], require Ω̃(lg n) update time. Can we prove a matching lower bound?

Our Results.
In this paper we present the first update time lower bounds for a range of classical turnstile streaming problems. The lower bounds apply to a restricted class of randomized streaming algorithms that we refer to as randomized non-adaptive algorithms. We say that a randomized streaming algorithm is non-adaptive if:

• Before processing any elements of the stream, the algorithm may toss some random coins.

• The memory words read/written to (henceforth jointly referred to as probed) on any update operation (i, Δ) are completely determined from the index i and the initially tossed random coins.

Thus a non-adaptive algorithm may not decide which memory words to probe based on the current contents of the memory words. Note that in this model of non-adaptivity, the random coins can encode e.g. a hash function chosen (independent of the stream) from a desirable family of hash functions and the update algorithm can choose what to probe based on the hash function. It is only the input-specific contents of the probed words that the algorithm may not use to decide which words to probe. To the best of our knowledge, all the known algorithms for classical turnstile streaming problems are indeed non-adaptive, in particular linear sketches (i.e. maintaining Πv in memory for some r × n matrix Π, r ≪ n). One exception we are aware of is an adaptive algorithm given for a promise problem in [7] (it is also worth mentioning that for the non-promise version of the problem, the algorithm given in the same work is again non-adaptive). We further remark that our lower bounds only require the update procedure to be non-adaptive and still apply if adaptivity is used to answer queries.

Remark 2. One common technique in turnstile streaming (e.g. see [28]) is to batch some number of updates then process them all more time-efficiently once the batch is large enough. In all examples we are aware of using this technique, a time savings is gained solely because batching allows for faster evaluation of the hash functions involved (e.g. using fast multipoint evaluation of polynomials). Thus, such techniques have only ever found application to decreasing word RAM time complexity, but have thus far never been effective in decreasing cell probe complexity, which is what our current work here lower bounds. Thus, in all these cases, there is a non-adaptive algorithm achieving the best known cell probe complexity.

For the remainder of the paper, for ease of presentation we assume that the update increments are (possibly negative) integers bounded by some M ≤ poly(n) in magnitude, and that the number of updates in the stream is also at most poly(n). We further assume the trans-dichotomous model [16, 17], i.e. that the machine word size w is Θ(lg n) bits. This is a natural assumption, since typically the streaming literature assumes that basic arithmetic operations on a value |v_i|, the index of the current position in the stream, or an index i into v can be performed in constant time.

We prove lower bounds for the following types of queries in turnstile streams. For each problem listed, the query function takes no input (other than point query, which takes an input i ∈ [n]). Each query below is accompanied by a description of what the data structure should output.

• ℓ1 heavy hitter: returns an index i such that |v_i| ≥ ‖v‖_∞ − ‖v‖_1/2.

• Point query: given i at query time, returns v_i ± ‖v‖_1/2.

• ℓp/ℓq norm estimation (1 ≤ q ≤ p ≤ ∞): returns ‖v‖_p ± ‖v‖_q/2.

• Entropy estimation: returns a 2-approximation of the entropy of the distribution which assigns probability |v_i|/‖v‖_1 to i for each i ∈ [n].

The lower bounds we prove for non-adaptive streaming algorithms are as follows (n^{−O(1)} ≤ δ ≤ 1/2 − Ω(1) is the failure probability of a query):

• Any randomized non-adaptive streaming algorithm for point query, ℓp/ℓq estimation with 1 ≤ q ≤ p ≤ ∞, and entropy estimation, must have worst case update time t_u = Ω(lg(1/δ) / (√(lg n) · lg(eS/t_u))). We also show that any deterministic non-adaptive streaming algorithm for the same problems must have worst case update time t_u = Ω(lg n / lg(eS/t_u)).

• Any randomized non-adaptive streaming algorithm for ℓ1 heavy hitters must have worst case update time t_u = Ω(min{ lg(1/δ) / lg(eS/t_u), √(lg(1/δ)) / √(lg t_u · lg(eS/t_u)) }). Any deterministic and non-adaptive streaming algorithm for ℓ1 heavy hitters must have worst case update time t_u = Ω(lg n / lg(eS/t_u)).

Remark 3. The deterministic lower bound above for point query matches two previous upper bounds for point query [37], which use error-correcting codes to yield deterministic point query data structures. Specifically, for space S = O(lg n), our lower bound implies t_u = Ω(lg n), matching an upper bound based on random codes. For space O((lg n / lg lg n)²), our lower bound is t_u = Ω(lg n / lg lg n), matching an upper bound based on Reed-Solomon codes. Similarly, the deterministic bound above for deterministic ℓ2/ℓ1 norm estimation matches the previous upper bound for this problem [37], showing that for the optimal space S = Θ(lg n), the fastest query time of non-adaptive algorithms is t_u = Θ(lg n).

These deterministic upper bounds are also in the cell probe model. In particular, the point query data structure based on random codes and the norm estimation data structure require access to combinatorial objects that are shown to exist via the probabilistic method, but for which we do not have explicit constructions. The point query structure based on Reed-Solomon codes can be implemented in the word RAM model with t_u = Õ(lg n) using fast multipoint evaluation of polynomials. This is because performing an update, in addition to accessing O(lg n / lg lg n) memory cells of the data structure, requires evaluating a degree-O(lg n / lg lg n) polynomial on O(lg n / lg lg n) points to determine which memory cells to access (see [37] for details).

Remark 4. The best known randomized upper bounds are S = t_u = O(lg(1/δ)) for point query [12] and ℓp/ℓp estimation for p ≤ 2 [42, 28]. For entropy estimation the best upper bound has S = t_u = Õ((lg n)² lg(1/δ)) [23]. For ℓ1 heavy hitters the best known upper bound (in terms of S and t_u) has S = t_u = O(lg(n/δ)).

In addition to being the first non-trivial turnstile update time lower bounds, we also managed to show that the update time has to increase polylogarithmically in 1/δ as the error probability δ decreases, which is achieved with the typical reduction using lg(1/δ) independent copies of a data structure with constant error probability.

Our lower bounds can also be viewed in another light. If one is to obtain constant update time algorithms for the above problems, then one has to design algorithms that are adaptive. Since all known upper bounds have non-adaptive updates, this would require a completely new strategy to designing turnstile streaming algorithms. Note that for randomized algorithms, our lower bounds do not rule out constant update time with constant error probability. If however the error probability is upper bounded by 2^{−ω(√(lg n))}, then adaptivity is necessary to achieve constant update time.

We also show a new cell probe upper bound for ℓ1 point query which outperforms the CountMin sketch [12] in terms of query time, while matching it in both space and update time (see Section 4.2). Our new algorithm is inspired by the solution to the hard instance for our above lower bounds and we believe this upper bound provides evidence that despite the importance of the problems, the effort on understanding the time complexity of them has been insufficient and

perhaps better algorithms are achievable. In fact, our upper bound even demonstrates the lg(1/δ)/√(lg n) behaviour of our lower bounds, further supporting the hypothesis that faster algorithms exist. Our upper bound is randomized, non-adaptive and in the cell probe model, meaning that we assume computation is free of charge and that we have access to input independent truly random hash functions that can be evaluated free of charge. Clearly our lower bounds apply to this setting. It would be interesting to find a fast implementation of our algorithm in the word-RAM model.

Technique.
As suggested by Question 1, we prove our lower bounds using recent ideas in cell probe lower bound proofs. More specifically, we use ideas from the technique now formally known as cell sampling [18, 38, 32]. This technique derives lower bounds based on one key observation: if a data structure/streaming algorithm probes t memory words on an update, then there is a set C of t memory words such that at least m/S^t updates probe only memory words in C, where m is the number of distinct updates in the problem (for data structure lower bound proofs, we typically consider queries rather than updates, and we obtain tighter lower bounds by forcing C to have near-linear size).

We use this observation in combination with the standard one-way communication games typically used to prove streaming space lower bounds. In these games, Alice receives updates to a streaming problem and Bob receives a query. Alice runs her updates through a streaming algorithm for the corresponding streaming problem and sends the resulting Sw bit memory footprint to Bob. Bob then answers his query using the memory footprint received from Alice. By proving communication lower bounds for any communication protocol solving the one-way communication game, one obtains space lower bounds for the corresponding streaming problem.

At a high level, we use the cell sampling idea in combination with the one-way communication game as follows: if Alice's non-adaptive streaming algorithm happens to "hash" her updates such that they all probe the same t memory cells, then she only needs to send Bob the contents of those t cells. If t

is a function q_i : Z^n → R. With one-way communication games in mind, we define the input to a streaming problem as consisting of two parts. More specifically, the pre-domain D_pre ⊆ U^a consists of sequences of a update operations. The post-domain D_post ⊆ {U ∪ Q}^b consists of sequences of b updates and/or queries. Finally, the input domain D ⊆ D_pre × D_post denotes the possible pairings of a initial updates followed by b intermixed queries and updates. The set D defines a streaming problem P_D.

We say that a randomized streaming algorithm for a problem P_D uses S words of space if the maximum number of memory words used when processing any d ∈ D is S. Here a memory word consists of w = Θ(lg n) bits. The worst case update time t_u is the maximum number of memory words read/written to upon processing an update operation for any d ∈ D. The error probability δ is defined as the maximum probability over all d ∈ D and queries q_i ∈ d ∩ Q, of returning an incorrect result on query q_i after the updates preceding it in the sequence d.

A streaming problem P_D of the above form naturally defines a one-way communication game: on an input (d1, d2) ∈ D, Alice receives d1 (the first a updates) and Bob receives d2 (the last b updates and/or queries). Alice may now send a message to Bob based on her input and Bob must answer all queries in his input as if streaming through the concatenated sequence of operations d1 ◦ d2. The error probability of a communication protocol is defined as the maximum over all d ∈ D and q_i ∈ {d ∩ Q}, of returning an incorrect result on q_i when receiving d as input.

Traditionally, the following reduction is used:

Theorem 5. If there is a randomized streaming algorithm for P_D with space usage S and error probability δ, then there is a private coin protocol for the corresponding one-way communication game in which Alice sends Sw bits to Bob and the error probability is δ.

Proof. Alice simply runs the streaming algorithm on her input and sends the memory image to Bob. Bob continues the streaming algorithm and outputs the answers.

Recall from Section 1 that a randomized streaming algorithm

since queries are defined from indices. For ℓp estimation and entropy estimation, we simply have π(qi(v)) = qi(v) and π(qi) = qi since neither queries nor answers involve indices. For heavy hitters we have π(qi) = qi (there is only one query), but we might have that π(qi(v)) ≠ qi(v) since the answer to the one query is an index. We now have the following:

Theorem 6. If there is a randomized non-adaptive streaming algorithm for a permutation invariant problem P_D with a ≤ √n indices, having space usage S, error probability δ, and worst case update time tu ≤ (1/2)·lg(n)/lg(eS/tu), then there is a private coin protocol for the corresponding one-way communication game in which Alice sends at most a·lg e + tu·a·lg(eS/tu) + lg a + lg lg(en/a) + tu·w + 1 bits to Bob and the error probability is 2e^a · (eS/tu)^{tu·a} · δ.

Before giving the proof, we present the two main ideas. First observe that once the random choices of a non-adaptive streaming algorithm have been made, there must be a large collection of indices I ⊆ [n] for which all updates (i, Δ), where i ∈ I, probe the same small set of memory words (there are at most C(S, tu) distinct sets of tu words to probe). If all of Alice's updates probed only the same set of tu words, then Alice could simply send those words to Bob and we would have reduced the communication to tu·w bits. To handle the case where Alice's updates probe different sets of words, we make use of the permutation invariance property. More specifically, we show that Alice and Bob can agree on a collection of k permutations of the input indices, such that one of these permutes all of Alice's updates to a new set of indices that probe the same tu memory cells. Alice can then send this permutation to Bob and they can both alter their input based on the permutation. Therefore Alice and Bob can solve the communication game with lg k + tu·w bits of communication. The permutation of indices unfortunately increases the error probability, as we shall see below. With these ideas in mind, we now give the proof of Theorem 6.

Proof (of Theorem 6). Alice and Bob will permute the indices in their communication problem and use the randomized non-adaptive streaming algorithm on this transformed instance to obtain an efficient protocol for the original problem. By Yao's principle, we have that the randomized complexity of a private coin one-way communication game with error probability δ equals the complexity of the best deterministic algorithm with error probability δ over the worst distribution. Hence we show that for any distribution μ on D ⊆ Dpre × Dpost, there exists a deterministic one-way communication protocol with tu·a·lg S + lg a + lg lg n + tu·w bits of communication and error probability 2e^a · (eS/tu)^{tu·a} · δ. We let μ1 denote the marginal distribution over Dpre and μ2 the marginal distribution over Dpost.

Let μ = (μ1, μ2) be a distribution on D. Define a new distribution γ = (γ1, γ2): pick a uniform random permutation π of [n]. Now draw an input d from μ and permute all indices of updates (and queries, if defined for such) using the permutation π. The resulting sequence π(d) is given to Alice and Bob as before, which defines the new distribution γ = (γ1, γ2). We use A ∼ μ1 to denote the r.v. providing Alice's input drawn from distribution μ1, and π(A) ∼ γ1 denotes the random variable providing Alice's transformed input. We define B ∼ μ2 and π(B) ∼ γ2 symmetrically.

Recall we want to solve the one-way communication game on A and B. To do this, first observe that by fixing the random coins, the randomized non-adaptive streaming algorithm gives a non-adaptive and deterministic streaming algorithm that has error probability δ and space usage S under distribution γ. Before starting the communication protocol on A and B, Alice and Bob both examine the algorithm (it is known to both of them). Since it is non-adaptive and deterministic, they can find a set of tu memory words C, such that at least n/C(S, tu) > n/(eS/tu)^{tu} ≥ √n ≥ a indices i ∈ [n] satisfy that any update (i, Δ) probes only memory words in C. We let I_C denote the set of all such indices (again, I_C is known to both Alice and Bob). Alice and Bob also agree on a set of permutations {ρ1, ..., ρk} (we determine a value for k later), such that for any set of at most a indices, I, that can occur in Alice's updates, there is at least one permutation ρi where:

• ρi(j) ∈ I_C for all j ∈ I.
• Let I(A) denote the (random) indices of the updates in A. Then the probability that the non-adaptive and deterministic protocol errs on input ρi(A), conditioned on I(A) = I, is at most 2e^a · (eS/tu)^{tu·a} · ε_{I(A)=I}, where ε_{I(A)=I} is the error probability of the deterministic and non-adaptive streaming algorithm on distribution γ, conditioned on I(A) = I.

Again, this set of permutations is known to both players. The protocol is now simple: upon receiving A, Alice finds the index i of a permutation ρi satisfying the above for the indices I(A). She then sends this index to Bob and runs the deterministic and non-adaptive algorithm on ρi(A). She forwards the addresses and contents of all memory words in C as well. This costs a total of lg k + |C|·w ≤ lg k + tu·w bits. Note that no words outside C are updated during Alice's updates. Bob now remaps his input B according to ρi and runs the deterministic and non-adaptive streaming algorithm on his updates and queries. Observe that for each query qj ∈ B, Bob will get the answer ρi(qj)(ρi(v)) if the algorithm does not err, where v is the "non-permuted" vector after processing all of Alice's updates A and all updates in B preceding the query qj. For each such answer, he computes ρi^{-1}(ρi(qj)(ρi(v))). Since P_D is permutation invariant, we have ρi^{-1}(ρi(qj)(ρi(v))) = ρi^{-1}(ρi(qj(v))) = qj(v). The final error probability (over μ) is hence at most 2e^a · (eS/tu)^{tu·a} · δ, since E_I[ε_{I(A)=I}] = δ.

We only need a bound on k. For this, fix one set I' of at most a indices in [n] and consider drawing k uniform random permutations. For each such random permutation Γ, note that Γ(I') is distributed as I(π(A)) conditioned on I(A) = I'. Hence, the expected error probability when using the map Γ (expectation over the choice of Γ) is precisely ε_{I(A)=I'}. We also have

  P(Γ(I') ⊆ I_C) = C(|I_C|, |I'|) / C(n, |I'|) ≥ (|I_C|/(en))^{|I'|} ≥ e^{-a} · (eS/tu)^{-tu·a}.

By Markov's inequality and a union bound, we have both Γ(I') ⊆ I_C and error probability at most 2e^a · (eS/tu)^{tu·a} · ε_{I(A)=I'} with probability at least p := e^{-a} · (eS/tu)^{-tu·a}/2 over the choice of Γ. Thus if we pick k = (1/p)·a·lg(en/a) > (1/p)·lg C(n, a) and use that 1 + x ≤ e^x for all real x, then the probability that all k permutations Γ chosen fail to have the desired failure probability and Γ(I') ⊆ I_C is at most (1 − p)^k < 1/C(n, a).
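The combinatorial core of the argument — grouping indices by the fixed set of cells their updates probe, and remapping Alice's index set into the largest group — can be illustrated with a small, hypothetical Python sketch. The names (`probe`, `largest_probe_class`, `remap_into`) are illustrative, and the ad-hoc permutation below stands in for the small pre-agreed family of the proof; this only demonstrates the counting idea, not the paper's construction:

```python
import random
from collections import defaultdict

def largest_probe_class(n, probe):
    """Group the indices of [n] by the fixed set of memory cells their
    update probes, and return (cells, I_C) for the largest class: every
    update to an index in I_C touches exactly the cells in `cells`."""
    classes = defaultdict(list)
    for i in range(n):
        classes[frozenset(probe(i))].append(i)
    cells, I_C = max(classes.items(), key=lambda kv: len(kv[1]))
    return set(cells), I_C

def remap_into(I, I_C, n):
    """A permutation of [n] sending the index set I into I_C (possible
    whenever |I| <= |I_C|).  In the proof, such a permutation is drawn
    from a small pre-agreed family rather than constructed ad hoc."""
    assert len(I) <= len(I_C)
    perm = dict(zip(sorted(I), sorted(I_C)))
    rest_src = [x for x in range(n) if x not in perm]
    taken = set(perm.values())
    rest_dst = [y for y in range(n) if y not in taken]
    perm.update(zip(rest_src, rest_dst))
    return perm

# Toy non-adaptive scheme: the update to index i always probes cell i % 4,
# so each probe class has n/4 indices and any small I can be remapped.
n = 32
cells, I_C = largest_probe_class(n, lambda i: {i % 4})
perm = remap_into({5, 9, 13}, I_C, n)
assert all(perm[i] in set(I_C) for i in {5, 9, 13})
assert sorted(perm.values()) == list(range(n))  # a genuine permutation
```

After the remapping, all of Alice's (remapped) updates probe the single cell set `cells`, which is the situation in which sending those few words suffices.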

Thus by a union bound over all size-a subsets of indices, we can conclude that the desired set of permutations exists. □

3. IMPLICATIONS

In this section, we present the (almost) immediate implications of Theorem 6. All these results follow by reusing the one-way communication game lower bounds originally proved to obtain space lower bounds for heavy hitters [27]. Consider the generic protocol in Figure 1. For different streaming problems, we will show the different ways to implement the function Check(t, j) using the updates and queries of the problems, where Check(t, j) is supposed to be able to tell if t is equal to ij with failure probability δ. We show first that if this is the case, we obtain a lower bound on the update time tu. The lower bounds then follow from the implementation of Check(t, j).

 1: Alice chooses a indices i1, ..., ia randomly without replacement in [n].
 2: Alice performs the updates v[ij] ← v[ij] + C^j for j = 1, ..., a, where C is a large constant.
 3: Alice sends her memory state to Bob.
    Bob decodes ia, ..., i1 from Alice's message:
 4: for j from a down to 1 do
 5:   for all t ∈ [n] do
 6:     if Check(t, j) then
 7:       Bob declares ij = t.
 8:       Bob performs the update v[ij] ← v[ij] − C^j.
 9:     end if
10:   end for
11: end for

Figure 1: The communication protocol for Alice to send a random numbers in [n] to Bob.

The proofs of the following two theorems follow easily from Theorem 6:

Theorem 7. Any randomized non-adaptive streaming algorithm that can implement all the Check(t, j) calls using no more than k ≤ n^{O(1)} queries with failure probability n^{−Ω(1)} ≤ δ ≤ 1/2 − Ω(1) each, must have worst case update time

  tu = Ω( min{ √(lg(1/δ)/lg(eS/tu)), lg(1/δ)/√(lg k · lg(eS/tu)) } ).

Proof. We first prove the lower bound for the case where √(lg(1/δ)/lg(eS/tu)) is the smaller of the two terms; this is the case at least when n^{−Ω(1)} ≤ δ ≤ k^{−3}. Assume for contradiction that such a randomized non-adaptive algorithm exists with tu = o(√(lg(1/δ)/lg(eS/tu))). Set a = c0·tu for a sufficiently large constant c0. We invoke Theorem 6 to conclude that Alice can send her memory state to Bob using a·lg e + tu·a·lg(eS/tu) + lg a + lg lg(en/a) + tu·w + 1 bits while increasing the failure probability of each of Bob's k queries to

  2e^a · (eS/tu)^{tu·a} · δ ≤ 2(eS/tu)^{c0·tu²} · δ^{1−o(1)} ≤ δ^{1−o(1)} ≤ k^{−2}.

By a union bound, the answers to all of Bob's queries are correct with probability at least 1 − 1/k ≥ 9/10, so Bob can recover i1, ..., ia with probability 9/10. By Fano's inequality, Alice must send Bob at least Ω(H(i1, ..., ia)) = Ω(a·lg(n/a)) ≥ c0·c1·tu·lg n bits for some constant c1 (where c1 does not depend on c0). But the size of her message was a·lg e + tu·a·lg(eS/tu) + lg a + lg lg(en/a) + tu·w + 1 ≤ 2c0·tu²·lg(eS/tu) + tu·w + o(tu·lg n). We assumed tu = o(√(lg(1/δ)/lg(eS/tu))) = o(√(lg n/lg(eS/tu))), and thus the above is bounded by tu·w + o(tu·lg n). Setting c0 high enough, we get that w ≥ (c0·c1·lg n)/2, and thus we have reached a contradiction.

For the case k^{−3} < δ ≤ 1/2 − Ω(1), observe that we can decrease the failure probability to k^{−3} by increasing the space, update time and query time by a factor α = Θ(lg_{1/δ} k): simply run α independent copies of the streaming algorithm and return the majority answer on a query. Hence the lower bound becomes

  Ω( (1/α) · √(lg k / lg(eSα/(tu·α))) ) = Ω( lg(1/δ) / √(lg k · lg(eS/tu)) ). □

Theorem 8. Any deterministic non-adaptive streaming algorithm that can implement Check(t, j) must have worst case update time tu = Ω(lg n/lg(eS/tu)).

Proof. The proof is similar to that of Theorem 7. Assume for contradiction there is a deterministic non-adaptive streaming algorithm with tu = o(lg n/lg(eS/tu)). Choose a = c0·tu for a sufficiently large constant c0. By Theorem 6, Alice can send her memory state to Bob using a·lg e + tu·a·lg(eS/tu) + lg a + lg lg(en/a) + tu·w + 1 ≤ 2c0·tu²·lg(eS/tu) + tu·w + o(tu·lg n) ≤ tu·w + o(tu·lg n) bits, and Bob can still answer every query correctly. Since Bob can recover i1, ..., ia, Alice must send Bob at least H(i1, ..., ia) = Ω(a·lg(n/a)) ≥ c0·c1·tu·lg n bits for some constant c1 (independent of c0). Setting c0 large enough gives w ≥ (c0·c1·lg n)/2, thus deriving the contradiction. □

3.1 Applications to specific problems

Point query. We can implement each Check(t, j) by simply querying for the value of v[t] and checking if the returned value is at least C^j/3. By the guarantee of the algorithm, if it does not fail, the returned value is within a factor 3 from the right value, and thus Check(t, j) can correctly tell if t = ij. Thus we run a total of na = n^{O(1)} queries to implement all the Check(t, j)'s.
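As a sanity check of how the Figure 1 protocol decodes, the following toy simulation replaces the streaming sketch by the exact vector v, so that the threshold test playing the role of Check(t, j) (as in the point query implementation above) never fails. All helper names are illustrative, not from the paper:

```python
import random

def alice_encode(n, a, C):
    """Alice draws a distinct indices and adds C^j to the j-th one."""
    idx = random.sample(range(n), a)
    v = [0] * n
    for j, i in enumerate(idx, start=1):
        v[i] += C ** j
    return idx, v

def bob_decode(v, a, C):
    """Bob peels off i_a, ..., i_1: at round j, the unique surviving index
    with v[t] >= C^j / 3 is i_j (all smaller powers sum to less than that)."""
    n = len(v)
    decoded = [None] * a
    for j in range(a, 0, -1):
        for t in range(n):
            if v[t] >= C ** j / 3:   # Check(t, j) with an exact oracle
                decoded[j - 1] = t
                v[t] -= C ** j       # remove i_j before the next round
                break
    return decoded

random.seed(0)
idx, v = alice_encode(n=100, a=5, C=10)
assert bob_decode(v, a=5, C=10) == idx
```

With an approximate sketch in place of the exact vector, each Check call may fail with probability δ, which is exactly the setting of Theorems 7 and 8.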

ℓp/q estimation. We can implement Check(t, j) as follows:

• v[t] ← v[t] − C^j
• Check if the estimate of ||v||_p is at most C^j/3
• v[t] ← v[t] + C^j

By the guarantee of the algorithm, if it does not fail, the estimated ℓp/q norm is at most 3C^{j−1} when t = ij, and at least 0.9·C^j when t ≠ ij; since 3C^{j−1} < C^j/3 < 0.9·C^j for a large enough constant C, Check(t, j) correctly tells if t = ij. Thus we use n^{O(1)} queries in total.

Corollary 9. Any randomized non-adaptive streaming algorithm for ℓ1 heavy hitters with failure probability n^{−O(1)} ≤ δ ≤ 1/2 − Ω(1) must have worst case update time tu = Ω( min{ √(lg(1/δ)/lg(eS/tu)), lg(1/δ)/√(lg tu · lg(eS/tu)) } ). Any deterministic non-adaptive streaming algorithm must have worst case update time tu = Ω(lg n/lg(eS/tu)).

Proof. We use a slightly different decoding procedure for Bob. Instead of running Check(t, j) for all indices t ∈ [n], in iteration j, Bob can simply query for the heavy hitter to find ij. Note that in iteration j, we have ||v||_1 = (C^{j+1} − C)/(C − 1) < 2C^j = 2v[ij], so if the algorithm is correct, Bob will find the index ij. We have thus implemented all the Check(t, j)'s using only a queries. Recall from the proof of Theorem 7 that a = O(tu). □

4. UPPER BOUNDS

In this section we first present an upper bound for the concrete hard instance used in Section 3 to derive our update time lower bounds. The upper bound nearly matches our lower bound and in particular exhibits the lg(1/δ)/√(lg n) behaviour with optimal space usage. In Section 4.2 we present an algorithm for ℓ1 point query which outperforms the CountMin sketch in terms of query time, while matching it in both space and update time. Our new algorithm is inspired by the solution to the hard instance from Section 3.

We store a table T with t rows T1, ..., Tt of r entries each, and have t truly random hash functions h1, ..., ht : [n] → [r] mapping indices to uniform random table cells. We also have t truly random hash functions σ1, ..., σt : [n] × V → {0,1}^w mapping indices and a value to uniform random length-w bit strings with precisely √w 1-bits. Upon an update vi ← vi + Δ or vi ← vi − Δ for some Δ ∈ V, we do a bitwise XOR of σj(i, Δ) onto the word stored in entry Tj[hj(i)] for j = 1, ..., t.

To answer whether an index i satisfies vi = Γ for some Γ ∈ V, we compute the standard inner product ⟨Tj[hj(i)], σj(i, Γ)⟩ for all j = 1, ..., t (interpreting the words as {0,1}-vectors in R^w). Note that such an inner product is simply a count of how many 1's Tj[hj(i)] and σj(i, Γ) have in common. Since σj(i, Γ) has only √w 1's, this number is always bounded by √w. If the majority of the Tj[hj(i)]'s have ⟨Tj[hj(i)], σj(i, Γ)⟩ ≥ √w/2, we return that vi equals Γ, and otherwise that vi ≠ Γ.
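The update/query scheme just described can be prototyped in a few lines. This is a toy Python illustration only: seeded pseudorandomness stands in for the truly random hash functions, and the constants are small fixed values rather than the w = Θ(lg n), t = Θ(lg(1/δ)/√(lg n)) setting of the analysis:

```python
import random

W = 256       # word length w
ONES = 16     # sqrt(w) one-bits per signature
T_ROWS = 9    # number of rows t
R_CELLS = 16  # cells per row r

def signature(j, i, value):
    """sigma_j(i, value): a w-bit word with exactly sqrt(w) one-bits,
    chosen pseudorandomly (truly random in the cell-probe construction)."""
    rng = random.Random(f"sig/{j}/{i}/{value}")
    return sum(1 << b for b in rng.sample(range(W), ONES))

def cell(j, i):
    """h_j(i): the cell of row j that index i hashes to."""
    return random.Random(f"cell/{j}/{i}").randrange(R_CELLS)

table = [[0] * R_CELLS for _ in range(T_ROWS)]

def update(i, delta):
    # v_i <- v_i + delta and v_i <- v_i - delta are the same XOR,
    # so an exact cancellation leaves the cell unchanged.
    for j in range(T_ROWS):
        table[j][cell(j, i)] ^= signature(j, i, delta)

def query_equals(i, gamma):
    """Majority vote over rows of <T_j[h_j(i)], sigma_j(i, gamma)> >= sqrt(w)/2."""
    votes = 0
    for j in range(T_ROWS):
        overlap = bin(table[j][cell(j, i)] & signature(j, i, gamma)).count("1")
        votes += overlap >= ONES // 2
    return votes > T_ROWS // 2

update(3, 7)                   # v_3 = 7
assert query_equals(3, 7)      # the correct value wins every row
assert not query_equals(3, 2)  # a wrong value loses the majority vote
```

Note that both updates and removals are a single XOR into one word per row, which is what gives the O(t) update cost in the cell probe model.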

Analysis.

In the cell probe model, we measure query time and update time only as the number of memory words read/updated. Thus the query time and update time is t, and the space usage is tr words. Given a desired error probability δ ≤ exp(−Θ(√(lg n))), we set t = Θ(lg(1/δ)/√(lg n)) and let r = ck/√(lg n) for some constant c > 0. This gives optimal space and an update and query time of O(lg(1/δ)/√(lg n)). What remains is to show that the error probability is bounded by δ. For this, consider a query asking whether vi = Γ for some Γ ∈ V. Since the hash functions used for the different rows are independent, we restrict our attention to one row Tj. Let Kj(i, Γ) be the set of pairs (i', Δi') such that Δi' ∈ V, vi' = Δi' and either i' ≠ i or Δi' ≠ Γ. Note that Δi' ∈ V implies Δi' ≠ 0, and thus |Kj(i, Γ)| ∈ {k − 1, k} depending on whether Γ = vi or not. From Kj(i, Γ) we also define Cj(i, Γ) ⊆ Kj(i, Γ) as the pairs (i', Δi') ∈ Kj(i, Γ) satisfying hj(i') = hj(i), i.e. Cj(i, Γ) is the conflict list of pairs hashing to the same word as index i in table Tj.

Let Σ denote the bitwise XOR of σj(i', Δi') for all (i', Δi') in Cj(i, Γ). Then if vi = Γ we have Tj[hj(i)] = (Σ XOR σj(i, Γ)), and if vi ≠ Γ we have Tj[hj(i)] = Σ. It follows that if ⟨Σ, σj(i, Γ)⟩ < √w/2, then ⟨Tj[hj(i)], σj(i, Γ)⟩ ≥ √w/2 iff vi = Γ. Hence we let Eerr denote the event that ⟨Σ, σj(i, Γ)⟩ ≥ √w/2. We wish to bound P(Eerr).

For this, let Efew denote the event |Cj(i, Γ)| ≤ √w/4. Now observe that conditioned on Efew, there are no more than w/4 bits in Σ that are 1. Since σj is truly random and (i, Γ) ∉ Cj(i, Γ), we have that σj(i, Γ) is independent of these set bits and

  E[⟨Σ, σj(i, Γ)⟩] ≤ √w/4.

It follows that

  P(Eerr | Efew) ≤ exp(−Ω(√w)) = exp(−Ω(√(lg n))).

Next observe that

  E[|Cj(i, Γ)|] ≤ k/r = √(lg n)/c.

For a big enough constant c, this is less than √w/8, and we get from a Chernoff bound that

  P(¬Efew) = P(|Cj(i, Γ)| > √w/4) < exp(−Ω(√w)) = exp(−Ω(√(lg n))).

We conclude

  P(Eerr) = P(Eerr | Efew)·P(Efew) + P(Eerr | ¬Efew)·P(¬Efew) = exp(−Ω(√(lg n))).

The probability that we fail in most of the t rows is thus bounded by exp(−Ω(t·√(lg n))) ≤ δ, which completes the analysis.

Discussion.

There are two places in the above algorithm where we make use of free computation in the cell probe model: computing ⟨Tj[hj(i)], σj(i, Γ)⟩, and in the efficient truly random hash functions. There seems to be hope of getting around the true randomness by resorting to Θ(lg n)-wise independent hash functions and batching several evaluations to exploit algorithms such as fast multipoint evaluation of polynomials to obtain close-to-constant amortized evaluation time, see e.g. [28]. Hashing only to bit strings with exactly √w 1's in near-constant time might need some additional ideas.

4.2 Faster ℓ1 Point Query in the Cell Probe Model

Below we present our improved algorithm for ℓ1 point query. In this problem we must support updates vi ← vi + Δ for Δ ∈ {−n^{O(1)}, ..., n^{O(1)}}. Given a query index i, we must return vi up to an additive error of ε·||v||_1 with probability at least 1 − δ. For the reader familiar with the classic CountMin sketch, our basic idea is to use the word-packing approach from Section 4.1 to pack roughly lg n counters in one word. The difficulty here is that Δ requires Θ(lg n) bits and not just one bit as in Section 4.1, thus no more than one counter fits in a word. We get around this by storing a summary word for groups of roughly lg n counters. The summary words store (1 + ε)-approximations of the values of the counters, and hence several approximations can be packed in one word. When answering queries, it suffices to use the approximate counters, and thus we have effectively reduced the number of words that must be read. Our final result, in the cell probe model, is O(ε^{-1}·lg(1/δ)) words of space, update time O(lg(1/δ)), and query time O(1 + ε·lg(1/δ) + (lg(1/δ)/√(lg n))·(lg lg n + lg(1/ε))). The space and update time match the CountMin sketch, whereas the query time strictly outperforms it for 1/n^{o(1)} ≤ ε ≤ o(1) and δ = o(1). The details are as follows.

Algorithm.

Let k = Θ(min{lg²(1/δ), ε^{-2}, lg n/lg lg_{1+ε} n}). We store a table T with t rows denoted T1, ..., Tt. Each row Tj is partitioned into r blocks of k entries each. We use Tj[h, ℓ] to denote the ℓ'th entry of the h'th block in Tj for any (h, ℓ) ∈ [r] × [k]. Each of the trk entries stores a Θ(lg n)-bit counter which is initialized to 0. In addition to T, we store a table A with t rows and r columns. The rows of A are denoted A1, ..., At, and entry Aj[h] stores a (1 + ε/4)-approximation of Tj[h, ℓ] for each ℓ ∈ [k]; thus an entry of A takes O(k·lg lg_{1+ε} n) ≤ O(lg n) bits.

We have t truly random hash functions h1, ..., ht : [n] → [r] mapping indices to blocks, and t truly random hash functions σ1, ..., σt mapping indices to a subset of √k counters inside a block. Upon an update vi ← vi + Δ, we add Δ to all counters Tj[hj(i), ℓ] for j = 1, ..., t and ℓ ∈ σj(i). We also update the corresponding (1 + ε/4)-approximations of the counters. These are stored in Aj[hj(i)] for j = 1, ..., t.

To answer a query for index i, we read Aj[hj(i)] for j = 1, ..., t. For each j, we extract from Aj[hj(i)] a (1 + ε/4)-approximation of Tj[hj(i), ℓ] for each ℓ ∈ σj(i). We finally return the median of these approximations as our estimate of vi.

Analysis.

We first analyse the resource requirements. The space usage is O(trk) words. In the cell probe model, the query time and update time are defined as the number of words read/updated, and thus the algorithm has update time O(t√k) and query time O(t).

We fix the parameters in the following. Given a desired error probability δ, we set t = 1 + O(lg(1/δ)/√k). To obtain asymptotically optimal space, we set r = c(ε^{-1}·lg(1/δ))/(tk) for a large constant c > 0. Note that by our choice of k, we have r ≥ c(ε^{-1}·lg(1/δ))/(k + O(lg(1/δ)·√k)) ≥ 1 if c is sufficiently large. What remains is to show that this gives the desired error probability on a query for index i. What makes the analysis more cumbersome than for the standard CountMin sketch is mainly the dependencies introduced by the blocking.

For the analysis, first observe that if we consider a counter Tj[hj(i), ℓ] for an ℓ ∈ σj(i), and let β denote the sum of absolute values of all other indices contributing to Tj[hj(i), ℓ], i.e. β = Σ_{i' ≠ i: hj(i') = hj(i) ∧ ℓ ∈ σj(i')} |vi'|, then if β ≤ ε·||v||_1/2 we have

  (vi − ε·||v||_1/2)/(1 + ε/4) ≤ Aj[hj(i)] ≤ (vi + ε·||v||_1/2)·(1 + ε/4).

Since vi ≤ ||v||_1, this implies

  Aj[hj(i)] ≥ (1 − ε/4)·(vi − ε·||v||_1/2)
           ≥ vi − ε·||v||_1/2 − ε·||v||_1/4 − ε·||v||_1/8
           ≥ vi − ε·||v||_1,

and

  Aj[hj(i)] ≤ (vi + ε·||v||_1/2)·(1 + ε/4)
           ≤ vi + ε·||v||_1/2 + ε·||v||_1/4 + ε·||v||_1/8
           ≤ vi + ε·||v||_1.

It follows that our median estimate is correct if more than half of the counters Tj[hj(i), ℓ] satisfy

  Σ_{i' ≠ i: hj(i') = hj(i) ∧ ℓ ∈ σj(i')} |vi'| ≤ ε·||v||_1/2.

To bound the probability that this happens, let Cj denote the subset of indices i' ≠ i such that hj(i') = hj(i). We say that the event Eerr happens if there are more than √k/4 indices ℓ ∈ σj(i) for which Σ_{i' ∈ Cj: ℓ ∈ σj(i')} |vi'| > ε·||v||_1/2. We wish to bound P(Eerr). For this, partition Cj into two sets Hj and Lj. The set Hj contains the heavy indices i' ∈ Cj such that |vi'| ≥ ε·||v||_1/2. Lj contains the remaining light indices. We define two auxiliary events. Let EerrH be the event |Hj| ≥ √k/16, and let EerrL be the event Σ_{i' ∈ Lj} |vi'| ≥ ε·||v||_1·√k/16. If we condition on ¬EerrH and ¬EerrL, then in expectation there are at most √k/16 + √k/8 < √k/4 indices ℓ ∈ σj(i) for which Σ_{i' ∈ Cj: ℓ ∈ σj(i')} |vi'| > ε·||v||_1/2. From this it follows from a Chernoff bound that P(Eerr | ¬EerrH ∧ ¬EerrL) = exp(−Ω(√k)).

Next observe that there can be no more than 2ε^{-1} indices i' for which |vi'| ≥ ε·||v||_1/2. For each such index i', we have P(i' ∈ Hj) = 1/r, and we get E[|Hj|] ≤ 2ε^{-1}/r = 2tk/(c·lg(1/δ)) = 2k/(c·lg(1/δ)) + O(√k/c) = O(√k/c). Setting c high enough, this is no more than √k/32. It follows from a Chernoff bound that P(EerrH) = exp(−Ω(√k)).

For analysing P(EerrL), observe that

  E[Σ_{i' ∈ Lj} |vi'|] ≤ ||v||_1/r = ε·||v||_1·tk/(c·lg(1/δ)) = O(ε·||v||_1·√k/c).

For large enough c, this is bounded by ε·||v||_1·√k/32, and thus EerrL is an event in which Σ_{i' ∈ Lj} |vi'| exceeds its expectation by at least a factor 2. Since all i' considered have |vi'| ≤ ε·||v||_1/2, it follows that the probability of EerrL is maximized when the mass (of ||v||_1) is distributed evenly on 2ε^{-1} coordinates. Thus EerrL becomes the event that at least √k/8 of these coordinates fall in Lj. By the arguments we used to bound P(EerrH), we conclude P(EerrL) = exp(−Ω(√k)). Combining the pieces, we conclude P(Eerr) = exp(−Ω(√k)).

Finally observe that for the median estimate to be incorrect, the event Eerr must happen for Ω(t) rows, and the hash functions for these rows are independent. It finally follows by a Chernoff bound that the error probability is bounded by exp(−Ω(t√k)) ≤ δ. By our choice of parameters, this gives optimal space of O(ε^{-1}·lg(1/δ)) words, update time O(t√k) = O(√k + lg(1/δ)) = O(lg(1/δ)), and query time O(t) = O(1 + lg(1/δ)/√k), which is bounded by

  O(1 + ε·lg(1/δ) + (lg(1/δ)/√(lg n))·(lg lg n + lg(1/ε))).

This strictly outperforms the CountMin sketch for any ε and δ satisfying 1/n^{o(1)} ≤ ε ≤ o(1) and δ = o(1).

Discussion.

In addition to the true randomness, the median computation also prevents the above algorithm from being efficiently implemented in the word RAM. If one finds an efficient way of extracting the relevant √k approximations from each Aj[hj(i)], one can most likely use the standard randomized median selection algorithm [11] to find the median efficiently using word-level parallelism (basically running an external memory model [1] median selection algorithm).

5. REFERENCES

[1] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31:1116–1127, 1988.
[2] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.
[3] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.
[4] A. Chakrabarti, G. Cormode, and A. McGregor. A near-optimal algorithm for estimating the entropy of a stream. ACM Transactions on Algorithms, 6(3), 2010.
[5] A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107–117, 2003.
[6] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theor. Comput. Sci., 312(1):3–15, 2004.
[7] R. H. Chitnis, G. Cormode, M. T. Hajiaghayi, and M. Monemizadeh. Parameterized streaming algorithms for vertex cover. CoRR, abs/1405.0093, 2014.
[8] R. Clifford and M. Jalsenius. Lower bounds for online integer multiplication and convolution in the cell-probe model. In Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP), pages 593–604, 2011.
[9] R. Clifford, M. Jalsenius, and B. Sach. Tight cell-probe bounds for online hamming distance computation. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 664–674, 2013.
[10] R. Clifford, M. Jalsenius, and B. Sach. Cell-probe bounds for online edit distance and other pattern matching problems. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015. To appear.
[11] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, Mass., 1990.
[12] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58–75, 2005.

[13] C. Estan, G. Varghese, and M. E. Fisk. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Trans. Netw., 14(5):925–937, 2006.
[14] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182–209, 1985.
[15] M. L. Fredman. Observations on the complexity of generating quasi-Gray codes. SIAM J. Comput., 7:134–146, 1978.
[16] M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424–436, 1993.
[17] M. L. Fredman and D. E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci., 48(3):533–551, 1994.
[18] A. Gál and P. B. Miltersen. The cell probe complexity of succinct data structures. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming, pages 332–344, 2003.
[19] S. Ganguly and A. Majumder. CR-precis: A deterministic summary structure for update data streams. In ESCAPE, pages 48–59, 2007.
[20] Google. Google Trends. http://www.google.com/trends.
[21] Google. Google Zeitgeist. http://www.google.com/press/zeitgeist.html.
[22] A. Gronemeier. Asymptotically optimal lower bounds on the NIH-multi-party information complexity of the AND-function and disjointness. In Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS), pages 505–516, 2009.
[23] N. J. A. Harvey, J. Nelson, and K. Onak. Sketching and streaming entropy via approximation theory. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 489–498, 2008.
[24] T. S. Jayram. Hellinger strikes back: A note on the multi-party information complexity of AND. In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM), pages 562–573, 2009.
[25] T. S. Jayram, R. Kumar, and D. Sivakumar. The one-way communication complexity of Hamming distance. Theory of Computing, 4(1):129–135, 2008.
[26] T. S. Jayram and D. P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Transactions on Algorithms, 9(3):26, 2013.
[27] H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 49–58, 2011.
[28] D. M. Kane, J. Nelson, E. Porat, and D. P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pages 745–754, 2011.
[29] D. M. Kane, J. Nelson, and D. P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1161–1178, 2010.
[30] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 41–52, 2010.
[31] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen. Sketch-based change detection: methods, evaluation, and applications. In Internet Measurement Conference, pages 234–247, 2003.
[32] K. G. Larsen. Higher cell probe lower bounds for evaluating polynomials. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 293–301, 2012.
[33] K. G. Larsen. Models and Techniques for Proving Data Structure Lower Bounds. PhD thesis, Aarhus University, 2013.
[34] M. Minsky and S. Papert. Perceptrons. MIT Press, 1969.
[35] D. Moore, 2001. http://www.caida.org/research/security/code-red/.
[36] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.
[37] J. Nelson, H. L. Nguyễn, and D. P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. Lin. Alg. Appl., 441:152–167, Jan. 2014. Preliminary version in RANDOM 2012.
[38] R. Panigrahy, K. Talwar, and U. Wieder. Lower bounds on near neighbor search via metric expansion. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 805–814, 2010.
[39] M. Pătraşcu. Lower bound techniques for data structures. PhD thesis, Massachusetts Institute of Technology, 2008.
[40] M. Pătraşcu and E. D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing, 35(4):932–963, 2006.
[41] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming, 13(4):277–298, 2005.
[42] M. Thorup and Y. Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput., 41(2):293–331, 2012.
[43] D. P. Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 167–175, 2004.
[44] D. P. Woodruff. Efficient and Private Distance Approximation in the Communication and Streaming Models. PhD thesis, Massachusetts Institute of Technology, 2007.
[45] A. C.-C. Yao. Should tables be sorted? J. ACM, 28:615–628, 1981.
DRAFT (submitted version)

Computing River Floods Using Massive Terrain Data

Cici Alexander
Dynamiques de l'Environnement Côtier, IFREMER, France
[email protected]

Lars Arge
MADALGO∗, Aarhus University, Denmark
[email protected]

Peder Klith Bøcher
Department of Bioscience, Aarhus University, Denmark
[email protected]

Morten Revsbæk
SCALGO, Aarhus, Denmark
[email protected]

Brody Sandel
Department of Bioscience and MADALGO∗, Aarhus University, Denmark
[email protected]

Jens-Christian Svenning
Department of Bioscience, Aarhus University, Denmark
[email protected]

Constantinos Tsirogiannis
Department of Bioscience and MADALGO∗, Aarhus University, Denmark
[email protected]

Jungwoo Yang
MADALGO∗, Aarhus University, Denmark
[email protected]

ABSTRACT

River floods have many times resulted in huge catastrophes. To reduce the negative outcome of such floods, it is important to predict their extent before they happen. For this reason, scientists nowadays use algorithms that model river floods on digital terrains. Yet, all existing algorithms of this kind have a major drawback; they cannot efficiently process massive terrain datasets, which have become widely available during the last years.

In this paper, we describe two algorithms that provide high-quality river flood modelling and, unlike any previous approach, efficiently handle massive terrain data. More specifically, given a raster terrain and a subset of its cells representing a river network, we describe two algorithms that for each cell in the raster estimate the height that the river should rise for the cell to get flooded. One of the proposed algorithms is a redesign of a European Union approved method that is used by authorities in Denmark for modelling river floods. We show how this algorithm can be adapted to efficiently handle massive terrain data. The other algorithm is a novel method that we introduce for modelling river floods. For an input raster that consists of N cells, and which is so large that it can only be stored on the hard disk of a computer, each of the proposed algorithms can produce its output with only O(sort(N)) transfers of data blocks between the disk and the main memory. Here sort(N) denotes the minimum number of data transfers that are needed for sorting a set of N elements stored on disk. We have implemented both algorithms, and compared their output with data that were acquired from a real flood event. We show that both algorithms produce an output that models the actual event quite accurately. In fact, the new algorithm that we introduce produces more accurate results than the existing popular method. We evaluated the efficiency of our algorithms in practice by conducting experiments on massive datasets. We show that the two algorithms perform efficiently even for datasets of approximately 268 GB in size.

1. INTRODUCTION

Throughout history, river floods have caused large disasters. Usually induced by heavy rainfall, such floods can lead to casualties and huge financial damage for the local communities. A recent example is the catastrophic flood of the Indus river in Pakistan that took place in 2010 [6]. This flood claimed approximately two thousand lives, and about one fifth of the total area of the country ended up covered by water. Society wants to predict such floods, so that measures can be taken in advance to reduce the harm done. Therefore, it is important for people to know which regions around a river have the highest risk of getting flooded when the level of the river rises.

Today, hydrologists use computers to model river floods; they use specialised software to simulate flood events based on digital representations of terrains and rivers. Such terrain representations are widely known as Digital Elevation Models (DEMs). The most popular type of DEMs is the so-called grid or raster DEM. In a raster DEM the domain of the terrain is divided into square cells of equal size, and each cell is associated with an elevation value.

One method for modelling river floods on DEMs is the method introduced by Berg Sonne [11]; let G be a raster terrain and let R(G) be the set of cells in G that represents the region covered by a river network in this terrain. Also, let x be a positive real. Given G, R(G) and x, the method estimates which cells in G will get flooded if the level of the river R(G) rises uniformly by x meters. Of course, a flood is a very complex phenomenon and is influenced by many factors, some of which are difficult to determine. Therefore, we cannot expect that a flood can be modelled precisely by the output of any method, no matter how involved this method is. However, the method proposed by Berg Sonne is today considered a quite accurate tool for modelling river floods. Hence, after approval by the European Union it is used by the state authorities of Denmark [11].

However, Berg Sonne's method has a major drawback; it cannot process massive DEMs. Recent advances in lidar technology have made it possible to produce detailed and huge DEM datasets. In many cases, such a dataset is so large that it cannot fit in the main memory of a standard computer. Hence, the dataset has to be stored mainly on disk. Since the computer can only handle data that appear in the main memory, blocks of data have to be transferred between the disk and the memory in order to process the dataset. We call a transfer of a single block of data between the disk and the memory an I/O-operation, or an I/O for short. The problem here is that a single I/O is an extremely slow operation; it can take about the same time as a million CPU operations. Therefore, when it comes to processing huge amounts of data, it is important to process the dataset in a way that minimizes the number of data transfers between the disk and the memory. Otherwise, the whole process becomes practically infeasible. To handle this issue, Aggarwal and Vitter introduced the so-called I/O-model, which takes into account the number of I/Os between the disk and the main memory [3]. The performance of an algorithm in the I/O-model is measured as the number of I/Os that take place during its execution. This measure of performance is called the I/O-efficiency of the algorithm. To describe the I/O-efficiency we need three parameters: the size N of the input data, the size of the internal memory M, and the size B of a single block of data that can be transferred from and to the disk. Two basic processes that take place during the execution of most algorithms are scanning and sorting. We can scan a set of N elements stored on disk with O(scan(N)) I/Os, where scan(N) = N/B. We can also sort a set of N records in an I/O-efficient manner with O(sort(N)) I/Os, where sort(N) = (N/B)·log_{M/B}(N/B) [3].

Standard algorithms are often designed based on the assumption that all input data fit in the main memory. Hence, usually they cannot handle massive datasets. This is also the case for algorithms that are used to model river floods; to the best of our knowledge, there does not exist any I/O-efficient algorithm for this problem. Therefore, the users of up-to-date hydrological software are forced to choose between two approaches. In the first approach, the resolution of the input DEM is reduced (so it fits entirely in the main memory).

Our Results.

Inspired by the above, we designed two I/O-efficient algorithms that can be used for modelling river floods. The first algorithm is an adaptation of Berg Sonne's method that can handle massive raster terrains. The second algorithm is a novel method that we introduce for modelling river floods. For each of these algorithms, the input is a raster G, and a subset R(G) of cells in G representing the area covered by a river network. Each of our algorithms returns for each cell c ∈ G a value f(c), indicating the minimum number of meters the river level should rise before c gets flooded. We call this value the resistance value of c. Given the resistance values f(c) for every c ∈ G, and a positive integer x, we can then easily extract the part of the terrain that is flooded if the river level rises uniformly by x meters. Each of the algorithms that we propose uses different criteria for computing resistance values, hence they produce different outputs. In the algorithm by Berg Sonne, the resistance value of each cell is computed in two stages; in the first stage, for each cell c the elevation difference is computed between c and its closest river cell (in the xy-plane). We call this elevation difference the obstruction value of c. In the second stage, the resistance value of c is computed based on the obstruction values of the cells that appear on any path connecting c with the river network. In our new algorithm, we compute the resistance values by taking into account how water flows on the terrain surface. In particular, we consider a model according to which water can flow from a cell to potentially more than one of its neighbours. Based on this model, we compute the resistance value of each cell c ∈ G as the elevation difference between c and the highest river cell that water from c can reach.

∗ Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.
To compute the resistance and the size B of a single block of data that can be trans- values on a raster that has N cells, each of our algorithms fered from and to the disk. Two basic processes that take require O(sort(N)) I/Os in the worst case. We have im- place during the execution of most algorithms is scanning plemented both algorithms and measured their efficiency in and sorting. We can scan a set of N elements stored in the practice. We show that the two algorithms perform effi- disk with O(scan(N)) I/Os, where scan(N)=N/B.Wecan ciently even for raster datasets of approximatelly 268 giga- also sort a set of N records in an I/O-efficient manner with bytes size; to process a dataset of this size on a standard O(sort(N)) I/Os, where sort(N)=N/B logM/B N/B [3]. workstation, our adaptation for Berg Sonne’s method re- Standard algorithms are often designed based on the as- quired roughly 24 hours, and our new algorithm roughly sumption that all input data fit in the main memory. Hence, 31 hours. We also conducted experiments to evaluate that usually they cannot handle massive datasets. This is also the our algorithms can model adequately real flood events. To case for algorithms that are used to model river floods; to do this, we worked as follows; we used as reference a vec- the best of our knowledge, there does not exist any I/O- tor dataset which outlines the river flood that took place in efficient algorithm for this problem. Therefore, the users Pakistan in 2010 [6]. We also used each of our algorithms to of up-to-date hydrological software are forced to choose be- compute the resistance values on a grid that models the ter- tween two approaches. In the first approach, the resolution rain in Pakistan. Then we selected a large number of pairs of the input DEM is reduced (so it fits entirely in the main of cells from this grid; each pair was selected such that one memory). 
Thus, a large amount of detail in terrain data cell in the pair is covered by the flooded region in the vector is thrown away. Important features on the landscape, such dataset, and the other cell falls outside this region. On the as ditches and levees, may not be depicted anymore on the output of each algorithm, we measured the percentage of resulting terrain. When it comes to modelling a river flood, pairs where the flooded cell of the pair scores a lower resis- this results in wrong estimations. In the other approach, tance value than the non-flooded cell. This percentage was users divide the massive DEM into smaller tiles and each 87% for Berg Sonne’s method and 92% for our algorithm. tile is processed independently; in this way, when process- We repeated the same measurements many times, each time ing a single tile, we do not take into account how the rest sampling cell pairs within a different small region overlap- of the landscape affects the flood in that region. Therefore, ping with the flooded area. We observed that the percentage there is a need for developing algorithms that, on one hand scored by each method depends on the size of the sampling model river floods accuratelly, and on the other hand handle region, and the topographic heterogeneity within this region. The lowest percentages were observed for the sampling re- The first algorithm that we describe is based on the flood gions of the smallest size that we considered; these regions modelling method introduced by Berg Sonne [11]. Origi- were squares of 20 km dimension. For such regions, the nally, this method was designed to solve a more simple prob- mean percentage measured for Berg Sonne’s method is 61%, lem than the one that we examine. In particular, the input and the mean percentage for our algorithm is 71%. For all of the original method is a raster G, the river network R(G), region sizes that we used, our algorithm provided on aver- and a rise value hrise. 
Instead of computing flood resistance age more accurate results than Berg Sonne’s method. To values, the method outputs the cells in G that are consid- understand the reasons behind this difference in the out- ered to get flooded when R(G)risesbyhrise meters. We call put quality, we used both algorithms to model river floods this version of the method ProximityF lood. Below, we first on a massive raster that represents the terrain in Denmark. explain how ProximityF lood calculates the flooded cells in Among other artefacts, the method by Berg Sonne produced G foragivenrisevaluehrise. Then, we show how we can flooded regions that had larger size than the ones calculated use this method to design an I/O-efficient algorithm that by our algorithm. As we explain, one reason for this is that computes a flood resistance value for each input cell1. Berg Sonne’s method produces very small resistance values ProximityF lood consists of two steps. In the first step, for areas along the entire coastline of the terrain. every cell c ∈G\R(G) gets associated with a single river cell in R(G); this is the river cell from which we consider that c 2. PROBLEM DEFINITION AND NOTATION can potentially get flooded. We call this cell the source cell of c, and we denote this by source(c). The source cell for every G N Let be a grid terrain that consists of cells. For ev- c ∈G\R(G) is defined as the river cell c ∈ R(G) that has the c ∈G h c ery cell we use ( ) to indicate the elevation of the smallest xy-distance from c. After calculating source(c)for terrain at this cell. We denote the cell that appears at the every non-river cell c, the height difference between source(c) i j G G i, j -th row and -th column of by ( ). We assume with- and c is computed and stored together with c. We call this G i, j out loss of generality that the center of grid cell ( )has value the obstruction value obst(c)ofc. xy j, i c ∈ G -coordinates ( ). 
For any cell , we denote this In the second step, we extract the cells in G that are con- p c xy center point by ( ). We call the -distance, or simply the sidered to get flooded when the river rises by h meters. G rise distance, between two cells in the 2D Euclidean distance More specifically, we extract any cell c that a) has an ob- between their cell centers on the xy-domain of G.LetC be struction value obst[c]  hrise and b) there exists a path of asetofcellsinG and let c be a cell that belongs to this set. c c cells between andarivercell R such that any non-river c C c We say that is the closest cell in to another cell if cell c in this path has an obstruction value obst(c )  h . c xy c rise has the smallest -distance to compared to any other Notice that, in this way, not all cells with obstruction  h C rise cell in . are flooded. R G G We use ( ) to denote a subset of the cells in that Method ProximityF lood canbeusedtomodelasingle belong to a river network of the terrain. We call these cells flood event at a time. On the other hand, if we want to G R G the river cells of . The cells in ( ) represent the river study which regions get flooded for different rise values then G network in when there is no flood. This implies that the we have to run this method many times, once for each dis- elevation value of each cell in R(G) approximates the average tinct rise value hrise. To avoid this, we instead choose to height of the river level at this location when no flood occurs. compute for each cell c the minimum rise value h for R G rise For the algorithms that we present, we assume that ( )is which c gets flooded according to method ProximityF lood. provided as part of the input. Below we describe our new I/O-efficient algorithm that does h Let rise be a positive real. We say that there is a river this, which we call ProximityResistance. 
h h rise of rise meters, or that the river rises by rise meters, As with ProximityF lood, the new algorithm consists of c ∈ R G whenforeachcell ( ) the river level rises to eleva- two main steps. In the first step, we compute for each h c h h tion ( )+ rise.Wecall rise the rise value.Thus,when cell c ∈G\R(G) the source cell source(c) and the ob- a river rise takes place we assume that the level of the river struction obst(c). In the second step, we calculate the flood increases by the same amount at all river cells. resistance values of all cells in G\R(G). We study the following problem. Given a terrain G and R G its river network ( ), we want to compute for every cell Computing the source cells and obstruction values. c ∈Ga value f(c) that estimates the minimum value h rise For the first step, the main task is to compute the source such that c gets flooded when the river rises by h meters. rise cell for each non-river cell c; given this cell, it is straight- We call this value the resistance value of c. Eachofthe forward to compute the obstruction obst(c). Calculating two algorithms that we present in this paper defines these thesourcecellsinG is similar to computing a Voronoi di- resistance values in a different way; hence, for the same in- put grid the output between the two algorithms may differ 1Some implementations of ProximityF lood include an ex- substantially. For each algorithm. we provide a detailed tra preprocessing step where the heights of the river cells definition for the resistance of a grid cell in the description are adjusted to make it consistent with the rest of the ter- of the algorithm. For both algorithms, it is assumed that rain data. This is useful when the river dataset is acquired all river cells are flooded by default. Therefore, for both from a different source than the terrain raster. 
In that case, approaches we imply that the resistance value of every river projecting the river data on the raster may create artefacts cell is set to zero. (such as rivers that flow upstream). In the description that we provide for ProximityF lood we do not include this pre- processing step; we consider that this step has to do more 3. DESCRIPTION OF THE ALGORITHMS with configuring the datasets rather than with the method itself. Yet, the preprocessing step can be also handled in an I/O-efficient manner, given a realistic assumption on the 3.1 Adaptation of Berg Sonne’s Method memory size. agram on the xy-domain of G; the sites of the Voronoi di- the terrain is obviously influenced by the terrain topogra- agram are the center-points of the river cells in G and for phy. Therefore, we introduce a novel method which instead any cell c ∈G\R(G) it holds that source(c)=c if chooses source(c) based on a model that represents how wa- the center of c falls in the Voronoi region of p(c). Com- ter flows on the terrain. We refer to this new method as puting the Voronoi diagram of the river cells can be done UpstreamResistance. Below, we first describe how source(c) in O(sort(N)) I/Os [10, 2]. Then, we sweep simultaneously is chosen in UpstreamResistance, and then we show how we the diagram and grid G. During the sweep, we maintain the can compute this I/O-efficiently. diagram edges that intersect the sweep line, sorted accord- For a raster G let F(G)=(V,E) be the graph such that for ing to the x-coordinate of their intersection point with this each cell c ∈Gthere exists exactly one vertex v(c)inV ,and line. For every row of G that we encounter, we scan the there exists a directed edge in E from v(c)tov(c)ifcells edges that intersect the sweep line to determine the Voronoi c, c ∈Gare adjacent and h(c) >h(c). We call this graph region (and therefore the corresponding source cell) where the flow graph of G. For now let us assume that no adjacent each cell in the row belongs to. 
Notice that the number cells in G have the same elevation value. Hence, there exists of edges in the sweep line cannot be more than two times exactly one directed edge in F(G) for each pair of adjacent the number of cells in a row. This is because there can- cells in G,andF(G)isaDAG. The concept of the flow graph not be more than two river cells on a single column whose was introduced in previous works to model how water flows Voronoi regions intersect the same horizontal line. There- between cells on a DEM. It is naturally assumed that water fore, scanning the raster and updating the sweep line can on a cell can flow only to neighbour cells with lower height; be done efficiently in O(sort(N)) I/Os in total. From this that is modelled with a directed edge in the flow graph. we conclude that computing the source cells on the raster For any cell c ∈ G water from c may flow following dif- can be performed in O(sort(N)) I/Os. But we can do this ferent routes on the raster until reaching one or more cells on more efficiently; in the appendix we present an algorithm the boundary of river R(G). In method UpstreamResistance that computes the source cells using O(scan(N)) I/Os. we choose source(c) to be one of these cells on the river boundary, that is, the river cells where the water from c Computing the flood resistance values. reaches. More formally, let c be a cell in G.Considera InthesecondstepofmethodProximityResistance we path in F(G) that starts from vertex v(c) and ends at a ver- compute for each cell c ∈Gits flood resistance f(c). Re- tex v(c ) where c is a river cell, such that the path does not call that for every cell c this resistance value is equal to the contain a vertex corresponding to any other river cell. 
We c DC c minimum rise value hrise such that obst(c)  hrise and c call such a path a downstream path of .Let ( )denote is connected to the river by a path of cells with obstruction the set of all river cells that belong to some downstream c pstreamResistance c  hrise. Based on this definition, we can reduce the compu- path of . In method U ,source()is tation of the flood resistance values to the problem of com- the cell in DC(c) with the highest elevation value. The puting the raise elevations on a terrain, that was described flood resistance of c is then defined as the height differ- by Arge et al. as part of their partial flooding algorithm [8]. ence h(c) − h(source(c)). This problem is defined as follows; let G be a raster and When it comes to implementing UpstreamResistance I/O- let ζ1, ..., ζk be a set of cells in G that we call sinks.For efficiently, the two key tasks for computing the flood re- any path of cells path in G theheightofpathisdefinedas sistances are constructing the flow graph F(G), and com- the height of the highest cell on this path. The raise eleva- puting the source cell for every cell in G. If no flat ar- tion of a cell c ∈Gis the minimum height among all paths eas exist on G, we can construct F(G) straightforwardly that connect c to ζi for any 1  i  k.Argeet al. provide in O(scan(N)) I/Os. As for computing the source cells, ob- an algorithm that computes the raise elevations for all the serve that for any cell c it holds that source(c)=source(c ) cells on the terrain in O(sort(N)) I/Os [8]. for some c such that there exists an edge in F(G) from v(c) We can reduce the problem of computing the flood re- to v(c ). Therefore, we can compute source(c) by first com- sistance values of the cells in G to an instance of the raise puting the source cells for those neighbours of c that appear elevation problem as follows; we create a raster G that has downstream in F(G), and then use these to infer source(c). the same number of rows and columns as G. 
For any river Arge et al. describe an I/O-efficient algorithm that com- cell G(i, j) ∈ R(G) we let the corresponding cell G(i, j)to putes the number of upstream cells for every cell on a raster be a sink. For any non-river cell G(i, j)weletcellG(i, j) in O(sort(N)) I/Os [4]. Their algorithm can be easily mod- have elevation equal to the obstruction value of G(i, j). It ified for computing the source cells in G. Therefore, we can is know easy to see that the raise value of any cell G(i, j) perform this computation in O(sort(N)) I/Os. is equal to the flood resistance value that we want to com- pute for G(i, j). By applying the I/O-efficient algorithm of Handling flat areas. Arge et al. on G we can compute the described flood resis- Terrain datasets often contain large connected regions of tance values in O(sort(N)) I/Os. cells that have exactly the same elevation. Given raster G that contains such flat areas, we have to perform two extra Theorem 3.1. Let G be a raster terrain that consists steps; first, we have to outline all distinct flat areas in G,and of N cells, and let R(G) be the set of all river cells in this then we have to model how water flows on each such area. raster. ProximityResistance computes the flood resistance For the second step, we will have to modify the definition values of all cells in G using O(sort(N)) I/Os. of the flow graph so as to represent flow between cells in a 3.2 Our New Method flat area. To outline the flat areas in G we have to com- In ProximityResistance,acellc can only get flooded pute the connected components of cells in the raster that from the closest river cell source(c)inthexy-plane. In- have the same elevation. This can be done I/O-efficiently O N A ⊆G G tuitively, this is very unnatural since the flow of water on in (sort( )) I/Os. 
Let be a flat area in ,and let c be a cell on the boundary of A.Wesaythatc is a 4.IMPLEMENTATIONS AND EXPERIMENTS spill point of A if c is adjacent to at least one cell that has We implemented both algorithms described in Section 3 elevation lower than h(c). When modelling water flow on A in order to evaluate how fast they perform in practice, as the goal is to route flow so that every cell in A drains to at well as how good they model real flood events. least one spill point of this area (if such a spill point exists). Let A be a flat area that has at least one spill point. Ev- 4.1 Description of Implementations c ∈ A A ery cell can be routed to all the spill points in . We implemented both algorithms in C++, using the open A Therefore, all cells in should have exactly the same source source library TPIE that provides I/O-efficient algorithms cell and also the same flood resistance. To represent this for sorting and scanning data [13]. We used the GNU g++ appropriatelly, we modify slightly the way we construct the compiler (version 4.8.2), and the experiments were ran on a A flow graph such that instead of representing each cell in Linux Ubuntu operating system (release 14.04). by exactly one vertex, we use a single vertex to represent When implementing ProximityResistance we made two v A theentireflatarea.Wedenotethisvertexby ( ). The modifications compared to the description in Section 3. First, v A v c in-edges of ( ) connect this vertex with all vertices ( ) when computing the source cells on G using a sweepline ap- c A c such that is a cell adjacent to ,and has a higher ele- proach, we used the O(scan(N)) approach (described in the A v A vation than . Similarly, the out-edges of ( ) connect to appendix) and we made the practically realistic assumption all lower elevation vertices in the graph representing cells that a constant number of rows in G canfitinmainmemory. A adjacent to . 
Thus, instead of performing an external scan of each row and After building the flow graph in this manner, we proceed maintaining an I/O-efficient stack during the sweep, we sim- with the computation of the resistance values. We process- ply store the two last rows that we swept in memory and per- A ing a vertex that represents a flat region , we distinguish form all computations internally. Second, when computing A two cases depending on whether contains river cells or the raise elevations we did not use the O(sort(N)) batched A not. In the case that does not contain any river cell, we union-find algorithm by Agarwal et al. [1] (that is quite in- c find the highest river cell that appears on a downstream volved), but instead a much simpler O(sort(N)log(N/M)) v A A path from ( ), and for every cell in we set the flood re- algorithm also proposed by Agarwal et al.. Agarwal et al. [1] A c sistance to the elevation difference between and .Inthe and Danner et al. [8] have showed that this simple union-find A case that contains river cells, we consider that all cells in algorithm performs very well in practice. A this area are flooded by default. Hence, for every cell in When implementing UpstreamResistance we accurately we set the flood resistance value to zero. In that case, ver- followed the description in Section 3. The only difference v A tex ( ) is treated in the flow graph in the same way as a was that we again used the practical union-find algorithm of c vertex that represents a single river cell; for each cell such Agarwal et al. (that requires O(sort(N)log(N/M)) I/Os), v A c that ( ) appears in a downstream path from we use the this time for computing the connected components of flat A c A elevation of to determine the flood resistance of ,asif areas in G, and for removing flat areas that correspond to was a single river cell. spurious pits. Note that not all flat areas have a spill point. 
In that case, a flat area is a region of locally minimum elevation 4.2 Measuring I/O-Efficiency in Practice in G.LetA be such an area in G.IfA does not contain any To measure the practical efficiency of each method, we ran river cell then we consider that A corresponds to a spurious our implementations on a massive raster dataset that repre- pit. We remove all such pits by raising the elevation of the sents the terrain surface of the entire country of Denmark. terrain within and around this region. The removal of the Publicly available through the website of the Danish Min- spurious pits can be done as a preprocessing step before istry of Environment [7], this raster consists of roughly 66.4 constructing the actual flow graph on G.Wecandothis billion cells, arranged in 287500 rows and 231250 columns. I/O-efficiently in O(sort(N)) I/Os, using the partial flooding Each cell represents a region of 1.6×1.6 meters on the terrain algorithm described by Danner et al. [8]. On the other hand, and is assigned an elevation value which is a 4-byte floating if A contains river cells, then we process this area in the same point number. The total size of the uncompressed dataset way as we did with flat areas that contain both river cells is 268 gigabytes. We refer to this dataset as denmark. and spill points; the flood resistance of all cells in A are Raster denmark does not include any river data, and there- set to zero, and the entire area is represented by a single fore we had to extract the river cells before conducting the vertex v(A) in the flow graph. experiments. To do so, we first preprocessed the raster by From the above description, we conclude that construct- removing all shallow pits. Then, we selected the river cells ing the flow graph for a raster G boils down to computing based on the size of their upstream area. 
For this reason, the connected components of flat regions in G,andthen we computed the flow graph of denmark as described in Sec- adding in the described way the graph edges between the tion3exceptthatforeachcellc we included at most one vertices of the flat regions and the adjacent cells. We can outgoing edge. This outgoing edge points to the vertex v(c ) compute the connected components of flat regions on G us- such that c is a neighbour of c and the vector from p(c) ing the batched union-find algorithm of Agarwal et al. [1] to p(c ) has the steepest downward slope. Then, we com- which uses O(sort(N)) I/Os. Constructing the modified flow puted for each cell c the size of its upstream area; this is graph, and processing the vertices that represent flat areas the area that is covered by all cells c such that there exists can be done straightforwardly in O(sort(N)) I/Os. a path from v(c )tov(c) in the flow graph. We extracted Theorem 3.2. Let G be a raster terrain that consists of the river cells by selecting all cells whose upstream area was N cells, and let R(G) be the set of river cells in this raster. larger than 12.5km2. We picked this threshold since the UpstreamResistance computes the flood resistance values resulting river network resembles better the actual shape of of all cells in G using O(sort(N)) I/Os. the rivers in Denmark, according to available orthophotos. Weranbothofouralgorithmsonthedenmark raster and the extracted river cells, and we evaluated the output of each the extracted cells, using a workstation that has a Xeon algorithm using a method that resembles the Area-Under- CPU (W3565), a four-core processor with 3.2GHz per core. the-Curve (also known as AUC) measure, which is one of The workstation had 48 Gigabytes of main memory, and a the most popular measures for model testing [5]. 
In par- raid (redundant array of independent disks) that consists ticular, we overlayed flood with indus and extracted the of nineteen disks, with 3 Terrabytes capacity in total. The cells in indus whose centers lie in the interior of a poly- maximum amount of main memory that was available at gon in flood. We refer to these cells as the flooded cells any point during the execution of our implementations was of indus. In total, we identified 4045544 flooded cells. Next 22 Gigabytes. The total time taken by the implementa- we selected at random a large set of pairs of cells. Each pair tion of ProximityResistance was roughly 24.2 hours; only was selected so that it consists of one flooded cell and one 2.4 hours was used for computing the source cell for each non-flooded cell. We denote this set of pairs by P.Foreach non-river cell, and the rest 21.8 hours were spent on com- of our methods, we determined for each pair pr ∈Pif puting the resistance values. For the implementation of the flooded cell in pr scores a higher resistance value than UpstreamResistance, the total execution time was approxi- the non-flooded cell, and calculated the percentage of the matelly 31.1 hours. The first stage of this method, where the pairs in P for which this condition holds. We call this per- flow graph of the input raster is computed, took 12.5 hours. centage the output quality of the method. The value of the The rest 18.6 hours were spent for delineating the flat areas output quality is an estimation of the AUC measure; the on the terrain, and computing the resistance values. output quality value is equal to the AUC if P consists of From the above, it is clear that the implementations of all possible pairs of flooded/non-flooded cells in the region both methods have a very good performance even for an of interest. For our study, we chose 105 pairs, consider- enormous dataset such as denmark. 
Each method took less than one and a half days to process this dataset, using an amount of main memory which corresponds to roughly 8% of the dataset's total size.

4.3 Evaluating the Quality of Flood Modelling

In the second set of experiments we used an actual flood event to evaluate the quality of the output produced by the two methods, namely the catastrophic flood of the Indus river that took place in Pakistan in 2010 [6]. For the experiments we used a raster terrain extracted from the SRTM grid, a DEM that represents the earth surface from 60° North to 56° South [12]. The extracted raster covers a square region of approximately 2160×2160 kilometers and includes the entire Indus river basin; see Fig. 1. The raster consists of 24,000×24,000 cells, and the dimension of each square cell is approximately 90 meters. We refer to this dataset as indus.

Since the indus DEM does not contain any river data, we extracted the river cells based on the upstream area of each cell, in the same way as we did for the denmark dataset in Section 4.2. In this case we used an upstream area of 300 km², since it produces a visual result that matches the shape of the local river network as it appears in orthophotos acquired when there was no flood in the region.

To evaluate the ability of our algorithms to accurately model floods, we used a vector dataset that shows the actual flooded regions around the river during the Indus river flood. This dataset was released by the Dartmouth Flood Observatory, and contains data acquired with MODIS (Moderate-resolution Imaging Spectroradiometer) technology [9]. We refer to this dataset as flood. The flood dataset was constructed based on several satellite photos of the Indus region, acquired during the period from the 1st to the 5th of August 2010. It represents with polygons all the regions that were flooded on at least one day during this period. The bounding box of flood covers a rectangular region that spans approximately 1118 and 911 kilometers on the latitudinal and longitudinal axes, respectively. It contains 4294 polygons, and the total area covered by these polygons is approximately 30,483 km². Refer to Fig. 1.

We ran the implementations of the ProximityResistance and UpstreamResistance algorithms on the indus DEM and […] that this is a sufficient number for estimating the value of the AUC. For method ProximityResistance the output quality is 87%, while for UpstreamResistance the output quality is 92%. This shows clearly that both methods produce flood resistances that are highly consistent with the actual event.

To measure how the two methods perform on a more local scale, we calculated their output quality within several smaller regions. More specifically, within the xy-region covered by flood we extracted three sets of square windows, each set consisting of windows of a certain size. In the first set each window is a square with dimension 20 km, in the second set each window has dimension 40 km, and the third set consists of windows of 80 km dimension. The windows of each set were picked in the following way. Within the region covered by flood we extracted at random 500 windows of the same size. Then we used a greedy algorithm to select a subset of these windows, so that there is no pair of windows in the subset that overlap with each other, and so that each window contains at least 500 flooded and at least 500 non-flooded cells. Thus, we ended up with a subset of 119 windows for the first set, and forty-five and twenty-two windows for the second and third set, respectively. From each window we selected 10⁵ cell pairs, again so that each pair contains one flooded and one non-flooded cell. We then calculated the output quality of our methods for each window. Figure 2 shows the results for the windows of 20 km dimension, where the mean output quality was 61% for ProximityResistance and 71% for UpstreamResistance. For windows of 40 km dimension, ProximityResistance attained mean output quality 69% and UpstreamResistance mean output quality 81%. For the third set of windows, the values were 76% and 85%, respectively.

Therefore, for each window size UpstreamResistance has higher mean output quality than ProximityResistance. For both methods the output quality increases as the window size becomes larger. Yet, we observed that for all examined window sizes there exist windows where at least one of the methods has an output quality value of less than 50%.

To examine the above further, we investigated if there is a correlation between the output quality values and the two following factors: the heterogeneity of the terrain (variability of elevation values) and the number of flooded cells inside

Figure 1: (a) An illustration of the indus DEM together with the flood vector dataset. The cells of the DEM appear in grayscale colours, shaded according to their elevation values; cells of higher elevation are indicated by lighter shades. The polygons of the flood dataset appear in red colour. (b) A closer view of the flooded regions.
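The greedy window-selection step described in Section 4.3 (draw random candidate windows, then keep a pairwise non-overlapping subset whose members contain enough flooded and non-flooded cells) can be sketched as follows. The window representation and the `count_cells` callback are our own illustrative assumptions, not the authors' implementation:

```python
import random

def overlaps(a, b):
    # Axis-aligned squares (x, y, size) overlap unless separated on some axis.
    return not (a[0] + a[2] <= b[0] or b[0] + b[2] <= a[0] or
                a[1] + a[2] <= b[1] or b[1] + b[2] <= a[1])

def select_windows(region_w, region_h, size, count_cells,
                   n_candidates=500, min_cells=500, rng=random):
    """Greedily keep pairwise non-overlapping candidate windows containing
    at least `min_cells` flooded and `min_cells` non-flooded cells.
    `count_cells(window) -> (flooded, non_flooded)` is an assumed helper."""
    candidates = [(rng.uniform(0, region_w - size),
                   rng.uniform(0, region_h - size), size)
                  for _ in range(n_candidates)]
    chosen = []
    for w in candidates:
        flooded, dry = count_cells(w)
        if (flooded >= min_cells and dry >= min_cells and
                all(not overlaps(w, c) for c in chosen)):
            chosen.append(w)
    return chosen
```

A first-fit greedy pass like this does not maximize the number of kept windows, but it guarantees the two properties the experiment needs: disjointness, and a minimum number of flooded and non-flooded cells per window.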

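The output-quality measure used in these experiments scores sampled pairs of one flooded and one non-flooded cell, in the spirit of the AUC [5]: a pair counts as correct when the flooded cell received the lower flood resistance. A minimal sketch follows; the half-credit for ties is the usual AUC convention and an assumption on our part:

```python
def output_quality(resistance, pairs):
    """AUC-style output quality over sampled (flooded, non_flooded) cell
    pairs. `resistance` maps each cell to its flood-resistance value; a
    pair is correct when the flooded cell has the lower resistance."""
    score = 0.0
    for flooded, dry in pairs:
        if resistance[flooded] < resistance[dry]:
            score += 1.0   # flooded cell correctly ranked as easier to flood
        elif resistance[flooded] == resistance[dry]:
            score += 0.5   # ties get half credit, as in the usual AUC estimate
    return score / len(pairs)
```

A large random sample of pairs keeps the estimate stable without evaluating every possible flooded/non-flooded pair.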
Figure 2: The locations for the selected windows of 20 km dimension. In each subfigure, a window is represented by a colored box. Each box is colored according to the output quality value achieved by one of the methods for the corresponding window. The relative size of the boxes in the figure is larger than the size of the original windows; we did this to make the color of each box more visible. The xy-regions of the original windows do not overlap with each other. Left: boxes colored based on the output quality values for ProximityResistance. Right: boxes colored according to the output quality values of UpstreamResistance.

each window. To measure the heterogeneity of the terrain within each window w, we computed the logarithm of the standard deviation of the elevations of the cells in w. We call this value the topographic heterogeneity of w. In order to examine visually the relation between the output quality and the topographic heterogeneity among the different windows, we created a scatter plot for each method. Each scatter plot contains a 2-dimensional point p(w) for every window w; the horizontal coordinate of p(w) is equal to the topographic heterogeneity of w, and the vertical coordinate of this point is equal to the output quality of the method for w. Figure 3 shows the scatter plots that we produced for windows of 20 km dimension. In a similar way, we created a scatter plot for each method where the horizontal coordinates of the presented points are equal to the number of flooded cells in the windows that we examine. These scatter plots appear also in Figure 3. It becomes evident that both of the methods score higher output quality values for windows of intermediate topographic heterogeneity. Most of the low output quality values appear on windows of small heterogeneity. Regions that consist mainly of flat areas belong to this category. Also, there does not appear to be any relation between the output quality of the methods and the number of flooded cells within each window. The visualisations that we produced for the windows of larger size showed similar patterns.

4.4 Comparing the Output of the Methods

To gain more insight about ProximityResistance and UpstreamResistance, we visually examined the output that the two methods produced for the denmark dataset. During this examination, for various rise values ρ we extracted the regions in the output of each method which consisted of all cells with flood resistance ≤ ρ. The first observation we made was that, for the same rise value, the size of the flooded area that appears in the output of ProximityResistance is larger than in the output produced by UpstreamResistance. Refer to Figure 4(a) and Figure 4(b). One of the reasons that contributes to this difference is how the two methods estimate river floods around coastlines; in the output of ProximityResistance, almost the entire coastline of the terrain appears flooded even for very small rise values. Refer to Figure 4(c). We can explain this as follows. Recall that with ProximityResistance a cell c gets flooded for a rise value ρ if a) this cell has a height difference ≤ ρ from the closest river cell on the xy-domain (the obstruction value), and b) there is a path from c to any river cell such that the obstruction values of all cells in the path are ≤ ρ. The terrain cells which appear close to the coastline have low height values, since they lie almost on the sea level. Therefore, for each such cell the height difference from the closest river cell is either very small or negative, which means that the coastline constitutes a path of cells that connects to the river where all cells in this path have very low obstruction values. As a consequence, even for small rise values all cells in this path are flooded when using ProximityResistance. On the other hand, in the output of the UpstreamResistance method, coastlines do not appear flooded even for large rise values. The reason is that for a coastline cell there is usually no flow path that connects this cell with a river cell. Hence, such a cell c cannot get flooded whichever the river-rise value. We also observed one more artefact in the output of method ProximityResistance; in some cases, this method produces flooded regions with long linear boundaries that do not correspond to actual obstacles on the elevation profile of the terrain. Refer to Figure 4(d). These artefacts are the result of assigning obstruction values to non-river cells based on the Voronoi diagram of the river cells on the xy-domain of the terrain. In an area that extends between two different river streams, this step may produce two regions of cells that have a large difference in their obstruction values. The boundary between these two regions follows the boundaries between Voronoi regions of river cells that belong to different streams. As a consequence, for certain rise values there appear flooded areas in the output whose boundary follows the boundary between the Voronoi regions of the river cells.

5. REFERENCES

[1] P. K. Agarwal, L. Arge, and K. Yi. I/O-Efficient Batched Union-Find and its Applications to Terrain Analysis. In Proc. 22nd Annual Symposium on Computational Geometry, pages 167–176, 2006.
[2] P. K. Agarwal, L. Arge, and K. Yi. I/O-Efficient Construction of Constrained Delaunay Triangulations. In Proc. 13th Annual European Symposium on Algorithms, pages 355–366, 2005.
[3] A. Aggarwal and J. S. Vitter. The Input/Output Complexity of Sorting and Related Problems. Communications of the ACM, 31(9):1116–1127, 1988.
[4] L. Arge, L. Toma, and J. S. Vitter. I/O-Efficient Algorithms for Problems on Grid-Based Terrains. ACM Journal on Experimental Algorithmics, 6(1), 2001.
[5] U. Brefeld and T. Scheffer. AUC Maximizing Support Vector Learning. In Proc. ICML Workshop on ROC Analysis in Machine Learning, 2005.
[6] Encyclopaedia Britannica Webpage, Pakistan Floods of 2010. http://www.britannica.com/EBchecked/topic/1731329/Pakistan-Floods-of-2010.
[7] Elevation Model of Denmark, Geodata Agency of the Danish Ministry of Environment. http://gst.dk/emner/frie-data/hvilke-data-er-omfattet/hvilke-data-er-frie/dhm-danmarks-hoejdemodel/.
[8] A. Danner, T. Mølhave, K. Yi, P. K. Agarwal, L. Arge, and H. Mitasova. TerraStream: From Elevation Data to Watershed Hierarchies. In Proc. 15th ACM International Symposium on Advances in Geographic Information Systems (ACM GIS), pages 212–219, 2007.
[9] The Dartmouth Flood Observatory Webpage. http://floodobservatory.colorado.edu/.
[10] M. T. Goodrich, J.-J. Tsay, D. E. Vengroff, and J. S. Vitter. External-Memory Computational Geometry. In Proc. 34th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 714–723, 1993.
[11] B. Kronvang, M. Søndergaard, C. C. Hoffmann, H. Thodsen, N. Bering Ovesen, M. Stjernholm, C. B. Nielsen, C. Kjærgaard, B. Schønfeldt, and B. Levesen. Etablering af P-Ådale. Technical report 840, National Environmental Research Institute of Denmark, 2011.
[12] E. Rodriguez, C. S. Morris, and J. E. Belz. A Global Assessment of the SRTM Performance. Photogrammetric Engineering & Remote Sensing, 72(3):249–260, 2006.
[13] TPIE, the Templated Portable I/O Environment. http://www.madalgo.au.dk/tpie/.
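The ProximityResistance flooding rule discussed in Section 4.4 (a cell floods at rise value ρ exactly when it can reach a river cell through a path whose obstruction values are all at most ρ) amounts to a reachability computation. A small in-memory sketch follows; the grid-as-dict representation is our own simplification of the I/O-efficient original:

```python
from collections import deque

def flooded_cells(obstruction, river_cells, rho):
    """Cells flooded at rise value rho under the ProximityResistance rule:
    a cell floods iff it can reach a river cell through grid cells whose
    obstruction value is <= rho. `obstruction` is an assumed {(i, j): value}
    grid; BFS starts from the river cells themselves."""
    flooded = set(c for c in river_cells if c in obstruction)
    queue = deque(flooded)
    while queue:
        i, j = queue.popleft()
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if n in obstruction and n not in flooded and obstruction[n] <= rho:
                flooded.add(n)
                queue.append(n)
    return flooded
```

This makes the coastline artefact easy to see: a chain of near-sea-level cells with tiny obstruction values forms a valid path, so the whole chain floods even for very small ρ.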


Figure 3: Scatter plots that show the relation between the output quality of each method and two features of the examined windows. Each point corresponds to a window of 20 km dimension. Top: The relation between output quality and topographic heterogeneity. Bottom: The relation between output quality and the number of flooded cells in each window.

Figure 4: An illustration of the outputs of ProximityResistance and UpstreamResistance for the denmark dataset. Flooded regions are indicated by dark blue color. (a) The output of ProximityResistance around Hadsund town (north-east Jutland) for a rise value of half a meter. (b) The output of UpstreamResistance for the same region and rise value. (c) The output of ProximityResistance close to Vejle city with a rise value of just one millimeter. The entire coast appears flooded, with wide flooded areas at certain places. (d) The output of ProximityResistance on a region with several river streams, for a rise value of 2.8 meters.

APPENDIX

In Section 3.1 we claimed that, given a grid G and the set of river cells R(G) on this grid, we can compute for every cell c ∈ G its source cell using only O(scan(N)) I/Os. Recall that the source cell of c is the cell in R(G) that has the closest xy-distance to c. Next we describe in detail how we can do this computation while achieving the claimed performance bound.

Instead of computing explicitly the Voronoi diagram of the centers of the cells in R(G), we use a simpler approach. For every cell c = G(i, j) we compute three cells: the closest river cell G(k, l) such that k < i, the closest river cell G(k, l) such that k > i, and the closest river cell G(k, l) such that k = i. We indicate these cells by source<(c), source>(c), and source=(c), respectively. Obviously, one of these three cells is the source cell of c; given those cells, we can determine source(c) by simply comparing their xy-distances from c. We can calculate these three cells for every cell in G in O(scan(N)) I/Os using a sweep-line technique. Next we show how we can do this for source<(c) and source=(c); the process for computing source>(c) is quite similar to the one for source<(c).

Let row(i) denote the i-th row in G, where row(1) is the bottom row in the raster. Let VD(i) indicate the Voronoi diagram on the xy-domain of G where the sites are the centers of all river cells G(k, l) such that k < […]. However, not all region boundaries in VD(i − 1) that intersect the sweepline ls(i − 1) also intersect ls(i); this is the case when there exist regions in VD(i − 1) whose y-span ends somewhere between ls(i) and ls(i − 1). This can be handled in the following way: we scan the list Li−1 and we maintain an I/O-efficient stack ST that stores the bisectors between Voronoi sites in VD(i − 1) whose regions potentially intersect ls(i). More specifically, let pcurr be the element that is currently scanned in Li−1. We maintain the invariant that ST stores the bisectors for only those sites in VD(i − 1) whose Voronoi regions a) intersect ls(i − 1) to the left of pcurr, and b) would intersect ls(i) […] regions intersect ls(i − 1) to the right of pcurr. We also maintain that, for any two lines l1 and l2 in the stack, line l1 is stored below l2 if and only if l1 crosses ls(i) to the left of l2. With each line l that we store in ST we also maintain the two sites rcleft(l) and rcright(l) for which l is the bisector. Initially ST is empty. Recall that pcurr is the intersection point between the sweepline ls(i − 1) and the bisector line of two neighbouring sites rcleft(p) and rcright(p). Let lbisect be this line for pcurr, the element currently processed in Li−1. Let top(ST) indicate the bisector currently stored at the top of the stack ST. We compute the intersection point between lbisect and top(ST); if this point has a y-coordinate that falls above ls(i), or below ls(i − 1), we push lbisect onto the stack and […] we can compute source>(c), source<(c), and source=(c), and therefore the source cell source(c), for every cell c ∈ G \ R(G) in O(scan(N)) I/Os.
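The same-row case source=(c) already shows the scanning flavour of this construction: two linear passes over a row find, for every cell, the closest river cell in that row. The sketch below covers only this easy case and is our own illustration; it deliberately omits the bisector-stack machinery needed for source<(c) and source>(c):

```python
def source_same_row(row_is_river):
    """For each position j in a row, return the index of the nearest river
    cell in the same row (None if the row has none), using one
    left-to-right and one right-to-left scan, i.e. O(scan) per row."""
    n = len(row_is_river)
    nearest = [None] * n
    # Left-to-right: nearest river cell at or to the left of j.
    last = None
    for j in range(n):
        if row_is_river[j]:
            last = j
        nearest[j] = last
    # Right-to-left: improve with the nearest river cell to the right.
    last = None
    for j in range(n - 1, -1, -1):
        if row_is_river[j]:
            last = j
        if last is not None and (nearest[j] is None or
                                 last - j < j - nearest[j]):
            nearest[j] = last
    return nearest
```

The final source(c) is then the closest of the three candidates source<(c), source=(c), and source>(c) by xy-distance, exactly as the appendix describes.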

The Influence of Late Quaternary Climate-Change Velocity on Species Endemism

B. Sandel,1,2* L. Arge,2 B. Dalsgaard,3 R. G. Davies,4 K. J. Gaston,5 W. J. Sutherland,3 J.-C. Svenning1

The effects of climate change on biodiversity should depend in part on climate displacement rate (climate-change velocity) and its interaction with species' capacity to migrate. We estimated Late Quaternary glacial-interglacial climate-change velocity by integrating macroclimatic shifts since the Last Glacial Maximum with topoclimatic gradients. Globally, areas with high velocities were associated with marked absences of small-ranged amphibians, mammals, and birds. The association between endemism and velocity was weakest in the highly vagile birds and strongest in the weakly dispersing amphibians, linking dispersal ability to extinction risk due to climate change. High velocity was also associated with low endemism at regional scales, especially in wet and aseasonal regions. Overall, we show that low-velocity areas are essential refuges for Earth's many small-ranged species.

1Ecoinformatics and Biodiversity Group, Department of Bioscience, Aarhus University, Aarhus 8000 C, Denmark. 2Center for Massive Data Algorithmics (MADALGO), Department of Computer Science, Aarhus University, Aarhus 8000 C, Denmark. 3Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK. 4School of Biological Sciences, University of East Anglia, Norwich NR4 7TJ, UK. 5Environment and Sustainability Institute, University of Exeter, Cornwall TR10 9EZ, UK.
*To whom correspondence should be addressed. E-mail: [email protected]

Anthropogenic climate change is a major threat to Earth's biodiversity and the ecosystem services it provides (1). As climate changes, the conditions suitable for local persistence of a particular species move across the surface of the Earth, driving species' responses both to recent warming (2–4) and to long-term natural climate cycles (5–8). Species with strong dispersal abilities inhabiting relatively stable climates may track climate fairly closely. Conversely, species with weak dispersal abilities relative to the rate of climate change may fail fully to occupy climatically suitable areas (9–14) and may even go extinct, despite appropriate habitat being present elsewhere (15–17).

Climate-change velocity is a measure of the local rate of displacement of climatic conditions over Earth's surface (18). It integrates macroclimatic shifts with local spatial topoclimate gradients and is calculated by dividing the rate of climate change through time by the local rate of climate change across space, yielding a local instantaneous velocity measure. By describing the local rate at which species must migrate to track changing climate, climate-change velocity is more biologically relevant than are traditional macroclimatic anomalies (13, 16, 19, 20). Furthermore, because climate-change velocity incorporates fine-scale topoclimate gradients, it captures the important buffering effect of topographic heterogeneity on climate change (21). For example, a given temperature change should have very different biological consequences depending on topographic context: In flat areas, considerable movement is required to track a 1°C increase in temperature, whereas a short shift uphill could be sufficient in mountains. Thus, species distributed in topographically homogenous landscapes will experience higher climate-change velocities and therefore require stronger dispersal abilities to track climate change than those of species in heterogeneous landscapes.

High climate-change velocities are likely to be associated with incomplete range filling and species extinctions (22). Not all species, however,

Fig. 1. Global maps of (A) climate-change velocity since the Last Glacial Maximum (21,000 years ago); proportional endemism of (B) amphibians, (C) mammals, and (D) birds; and (E) relationships of past and predicted future climate-change velocity. Velocity is highest in northeastern North America and north-central Eurasia; these same areas display low or no endemism (the black line delimits areas that were glaciated at LGM). Mountain ranges and other low-velocity regions exhibit higher endemic richness of amphibians, mammals, and birds. Rank-transformed velocity since the LGM and rank-transformed velocities until 2080 show similar spatial patterns, but there are areas of important mismatch. Orange areas, where velocities in the past have been low but predicted future velocities are expected to be high, are a particular conservation concern.

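The velocity definition used in the paper (the temporal rate of climate change divided by the local spatial climate gradient) can be sketched on a raster as follows; the grid shapes, units, and the epsilon guard for flat terrain are illustrative assumptions, not the authors' exact processing:

```python
import numpy as np

def climate_change_velocity(temp_then, temp_now, years, cell_km):
    """Climate-change velocity: temporal rate of temperature change divided
    by the local spatial temperature gradient, in km per year. Inputs are
    two temperature rasters (same shape), the elapsed time in years, and
    the cell size in km; all of these are illustrative assumptions."""
    temporal = (temp_now - temp_then) / years            # degrees C per year
    gy, gx = np.gradient(temp_now, cell_km)              # degrees C per km
    spatial = np.hypot(gx, gy)                           # gradient magnitude
    eps = 1e-9                                           # guard for flat terrain
    return np.abs(temporal) / np.maximum(spatial, eps)   # km per year
```

On flat terrain the spatial gradient is near zero, so the same warming yields a much larger velocity; this is precisely the buffering effect of topographic heterogeneity described in the text.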
660 4 NOVEMBER 2011 VOL 334 SCIENCE www.sciencemag.org REPORTS

are at equal risk from high velocity (20). Strong dispersers should be most able to maintain distributional equilibrium with climate conditions and are therefore likely to occupy more of their potential range and avoid extinction. Species with small ranges are at particular risk of extinction (20); they often have small population sizes and densities (23) and are less likely to occupy refuges that remain suitable during climate oscillations (15, 16, 24). Range size may also reflect other species characteristics that influence resilience to climate fluctuation, with widespread species tending to have broad climatic tolerances (25), generalist strategies (23), and strong dispersal capabilities (23). Hence, high-velocity regions should have fewer small-ranged species and fewer species with poor dispersal ability (16).

To test these hypotheses, we used reconstructions of mean annual temperature at the Last Glacial Maximum (LGM; 21,000 years ago) to calculate a global map of climate-change velocity from the LGM to the present (26) and tested the effects of velocity on patterns of range size and species richness of amphibians, mammals, and birds (27). The difference between the LGM and the present is one of the strongest climatic shifts in all of the Quaternary (28), and its spatial pattern is probably similar to the spatial patterns of earlier climate cycles (16). Our analysis revealed that the LGM-to-present climate-change velocity exhibits marked geographic variation, with peaks in northeastern North America and north-central Eurasia (Fig. 1A). Velocities tended to be lower in the Southern Hemisphere and in mountainous areas.

Globally, small-ranged (<250,000 km²; hereafter termed "endemic") amphibians, mammals, and birds are concentrated where climate-change velocity is low (Fig. 1, B to D) [results were not sensitive to the particular definition of small ranges (27)]. In high-velocity northeastern North America and north-central Eurasia, endemic species are nearly absent, whereas low-velocity areas often harbor highly endemic faunas. For all species groups, low-velocity sites are more likely to harbor endemic species than are high-velocity sites (Fig. 2, insets; logistic regression with spatial filters: n = 2000 grid cells, all P < 0.0001). In addition, among regions (10° × 10° equivalents) containing at least one endemic species, velocity is strongly and negatively related to the proportion of amphibian [bivariate regression; n = 188 regions, coefficient of determination (r²) = 0.283, P < 0.001], mammal (n = 231 regions, r² = 0.328, P < 0.001), and bird (n = 240 regions, r² = 0.385, P < 0.001) species that are endemic (Fig. 2). We excluded glaciated areas (Fig. 1A) from this analysis, but investigations showed that results were not sensitive to this decision.

It is widely accepted that modern climate influences species distributions and diversity, whereas the role of historical determinants is less clear (23). We therefore examined models that incorporated descriptions of modern climate, the spatial pattern of modern climate conditions, and climate-change velocity. The spatial pattern of modern climate conditions was summarized by calculating, for each grid cell, the total area of land within a 1000-km radius that had a mean annual temperature (MAT) within 1°C and mean annual precipitation (MAP) within 100 mm of that focal cell (24). The extent of analogous modern climate exerted strong influences on endemism, reflecting the fact that regionally rare climates are likely to contain small-ranged species (Table 1) (24). Endemism of all three groups was negatively related to productivity and to two measures of seasonality. In models that considered all variables together while controlling for spatial autocorrelation [simultaneous autoregressive models (SARs)], velocity was a highly significant negative predictor of endemism for all groups, and for the amphibians was second in importance only to the extent of analogous climate. We examined the effect of adding terms to this model to describe the interaction of velocity and modern climate descriptors. Including these terms generally had small effects on other coefficient estimates, and only one such interaction was significant (velocity × precipitation seasonality for amphibians), so these interactions were not considered further. We examined all subsets of the full SAR models and compared them using Akaike's Information Criterion (AIC). For all groups, models incorporating velocity were always strongly preferred over equivalent models without velocity (mean AIC improvement: amphibians, 29.6; mammals, 38.5; and birds, 39.0). Across the full model set with and without climate-change velocity, there was high support for velocity as part of the best model (summed Akaike weights for velocity > 0.989 for all groups). These results show that past climate-change velocity and modern climate act together to determine global patterns of endemism.

Because velocity incorporates fine-scale topographic effects on climate stability, it should also contribute to within-region variation in endemism. To test this, we divided the world into regions (10° × 10° equivalents) and asked whether velocity was correlated with endemism within these regions. For all three species groups, high-velocity areas within regions were associated with low endemism (global means of within-region correlation coefficients: amphibians, r = –0.160; mammals, r = –0.157; and birds, r = –0.091, all P < 0.01) (Fig. 3). This overall pattern is consistent with the loss of small-ranged species in high-velocity regions, but the local importance of velocity showed strong geographic variation. Velocity should have the strongest impact where the climate is sometimes highly suitable for a given group, potentially favoring the generation and maintenance of endemic diversity (29). In contrast, regions characterized by generally unfavorable conditions are expected to contain few endemics, whether or not velocity is low. In addition, the ability of species to cope with temperature fluctuations is thought to vary spatially, with species in highly seasonal

Fig. 2. The relationship between climate-change velocity and proportional endemism for (A) amphibians, (B) mammals, and (C) birds at a global scale within 10° × 10° regions, considering only unglaciated areas. Relationships are shown separately for the Northern (red) and Southern Hemispheres (blue) and for the global relationship (black line). Insets depict the relationship between velocity and the presence or absence of any small-ranged species. For each of 25 velocity quantiles, the blue bars indicate the proportion of locations with that velocity where small-ranged species occur. The black line displays a logistic regression fit to the relationship.

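The model comparison reported above ranks SAR models by AIC and sums Akaike weights over the candidate set. The weight computation itself is the standard formula; the sketch below is generic and not tied to the authors' SAR pipeline:

```python
import math

def akaike_weights(aics):
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    with delta_i = AIC_i - min(AIC). A model far above the best AIC
    retains essentially no weight."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]
```

For example, an AIC gap of 29.6 units (the mean improvement reported for amphibians) leaves the velocity-free model with a weight below one in a million.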

areas tolerating a wider temperature range (30). We tested these hypotheses by identifying the modern climatic variables that most strongly control the within-region correlation of velocity and endemism patterns (table S9). For all three species groups, the models with the lowest AIC scores contained a single significant predictor and were consistent with the above predictions; velocity was particularly important for amphibians and mammals where precipitation was high [amphibians, n = 125 regions, standardized regression coefficient (b) = –0.185, P = 0.0465; mammals, n = 149 regions, b = –0.259, P = 0.0018] and for birds where temperature seasonality was low (n = 135 regions, b = 0.334, P = 0.0102) (Fig. 3).

Although a comprehensive test has thus far been lacking, the importance of climate-change velocity for a group should depend on the dispersal abilities of its species. Hence, of the taxa analyzed, birds should be best at tracking high-velocity changes, whereas amphibians should be worst (9). Indeed, the importance of velocity in determining global patterns of endemism decreased from amphibians (standardized b = –0.288) to mammals (standardized b = –0.216) to birds (standardized b = –0.183) (Table 1). Furthermore, velocity was most tightly correlated with patterns of within-region endemism for amphibians and least correlated for birds. Lending additional support to the importance of dispersal ability, velocity was also more tightly associated with patterns of nonvolant mammal endemism (standardized global b = –0.166, P = 0.0058, mean local r = –0.165) than with bat endemism (standardized global b = –0.135, P = 0.0670, mean local r = –0.034).

Global patterns of endemism were well correlated among the three groups, suggesting that overall they respond to similar drivers. However, regions with low amphibian endemism relative to avian endemism (which are correlated, r = 0.681) tended to have high velocity, indicating that amphibians respond more strongly than do birds to climate change (multiple regression of amphibian endemism against velocity and avian endemism, n = 183 regions, standardized b_velocity = –0.281, P = 0.0010). Similar results were obtained for amphibian endemism relative to mammalian endemism (r = 0.693, n = 185 regions, b_velocity = –0.283, P = 0.0010) and mammalian endemism relative to avian endemism (r = 0.819, n = 212 regions, b_velocity = –0.175, P = 0.0028). Richness of the lowest three range-size quartiles in amphibians is low relative to the equivalent avian richness (n = 175 regions, b_velocity = –0.247, P = 0.0031) and mammal richness (n = 173 regions, b_velocity = –0.333, P < 0.0001) in high-velocity regions. As expected given their low dispersal ability, amphibian assemblages in high-velocity regions were thus particularly depauperate of endemic species relative to bird and mammal assemblages.

The focal relationship between velocity and endemism is corroborated by other patterns in the distributions of all three groups. At both global and regional spatial scales, high velocity was associated with larger median range sizes, lower variation in range size within assemblages, reduced richness of species with range size below the median, and weaker, inconsistent, and sometimes positive relationships for richness of species with larger range sizes (figs. S1 to S6 and tables S1 to S6 and S9).

High velocities have been proposed to be associated with incomplete range filling and species extinctions (10, 17, 22). Although we found no evidence for reduced range sizes with higher velocities, the decline in endemism and inferred importance of dispersal ability are consistent with the extinctions hypothesis. However, it may be that velocity per se is not driving endemism but simply correlates with other variables that do have a direct mechanistic link. Alternatively, a direct mechanistic link between velocity and endemism may not require species extinctions. We considered several such alternative hypotheses and show that none are consistent with the data.

High-velocity areas may coincide with those where analogous climate conditions have most expanded since the LGM; low endemism in these areas may occur because species in such areas have expanded their range size accordingly. Climate expansion since the LGM was highly correlated with the extent of modern analogous conditions (r = 0.794) but was a weaker predictor of endemism in all cases (compare Table 1 with table S12). Furthermore, the range expansion hypothesis predicts that the groups with strongest dispersal ability (birds) should have expanded their ranges most and therefore should show the strongest relationships to velocity, which is contrary to our results. It is also possible that velocity appears to be an important predictor, not because of a mechanistic link but only because it is derived from another variable that does have a direct link. However, climate-change velocity was a better predictor than either of its components

Table 1. Relationships of seven predictor variables to global patterns of proportional endemism for amphibians, mammals, and birds. GVI, generalized vegetation index; MAT, mean annual temperature; MAP, mean annual precipitation; TS, temperature seasonality; PS, precipitation seasonality; Extent, the regional extent of analogous climate; and Velocity, Late Quaternary climate-change velocity. Five comparative measures were used: the coefficient of determination from bivariate regressions (r²), standardized regression coefficients estimated from ordinary least-squares multiple regression (OLS), simultaneous autoregressive models using all predictors (SAR), the reduced SAR model with the lowest AIC score (SARreduced), and Akaike weights based on SAR models (Weight_AIC). Blank cells indicate variables that were not included in the reduced model. Figures in text are based on the full SAR models, which were most consistently successful at removing residual spatial autocorrelation. The effect of the change in MAT between LGM and present (Anomaly) and topographic heterogeneity (TH) were tested by replacing velocity in all models with each of these variables. *P < 0.05, **P < 0.01, ***P < 0.001.

             r²         OLS        SAR        SARreduced  Weight_AIC
Amphibians
  GVI        0.227***   –0.185**   –0.196**   –0.198**    0.989
  MAT        0.002       0.010      0.008     (blank)     0.294
  MAP        0.090***   –0.125     –0.115     –0.11       0.460
  TS         0.068***   –0.018      0.001     (blank)     0.312
  PS         0.073***   –0.233***  –0.244***  –0.241***   0.997
  Extent     0.415***   –0.439***  –0.414***  –0.411***   1.000
  Velocity   0.283***   –0.252***  –0.288***  –0.289***   1.000
  Anomaly    0.082***   –0.203**   –0.213**   –0.207***   0.979
  TH         0.134***    0.230**    0.260***   0.285***   0.998
Mammals
  GVI        0.253***   –0.169***  –0.160***  –0.160***   0.993
  MAT        0.158***    0.228**    0.216*     0.216*     0.882
  MAP        0.203***   –0.215**   –0.190*    –0.190*     0.736
  TS         0.313***   –0.180     –0.182     –0.182      0.597
  PS         0.039**    –0.194***  –0.173***  –0.173***   0.929
  Extent     0.538***   –0.490***  –0.467***  –0.467***   1.000
  Velocity   0.328***   –0.199***  –0.216***  –0.216***   0.998
  Anomaly    0.187***   –0.155**   –0.158**   –0.158**    0.975
  TH         0.023*      0.092      0.106*     0.106*     0.646
Birds
  GVI        0.297***   –0.237***  –0.212***  –0.213***   1.000
  MAT        0.102***    0.065      0.021     (blank)     0.284
  MAP        0.137***   –0.317***  –0.303***  –0.302***   0.999
  TS         0.348***   –0.500***  –0.549***  –0.565***   1.000
  PS         0.036**    –0.183***  –0.120*    –0.114*     0.838
  Extent     0.446***   –0.369***  –0.323***  –0.320***   1.000
  Velocity   0.385***   –0.194***  –0.183***  –0.180***   0.989
  Anomaly    0.222***   –0.148***  –0.121*    –0.119*     0.733
  TH         0.015       0.074      0.086*     0.081*     0.726

662 4 NOVEMBER 2011 VOL 334 SCIENCE www.sciencemag.org REPORTS

Fig. 3. The global pattern of within-region fit of climate-change velocity to patterns of endemism. (A) Within each 10° × 10° region, circle size indicates the number of species groups for which velocity was significantly (P < 0.05) associated with endemism. The color of the circle corresponds to the mean correlation coefficient across all groups present within that region. (B to D) Globally, strong within-region correlations between velocity and endemism were associated with high precipitation (amphibians and mammals) and low temperature seasonality (birds). Envelopes around each line show the 90% confidence interval.

(topographic heterogeneity and macroclimate anomaly) and was the only one with consistent predictive power across scales (Table 1), which suggests that topographic heterogeneity and anomaly both contain important and complementary information. High endemism and low velocity occur not only in mountains but also in macroclimatically stable, relatively flat regions (such as portions of the Amazon basin and central Africa). At the same time, high-velocity mountain areas harbor low endemism (such as the Altai mountains of northern Mongolia and some mountains in the western United States).

Does a direct role of velocity depend on extinctions, and are alternatives consistent with the data? Dynesius and Jansson (15) proposed three interrelated mechanisms linking climate stability and patterns of richness and range size: (i) instability may select for increased dispersal ability and generalism, leading to large ranges; (ii) gradual speciation rates may be reduced in unstable areas, producing a lack of young, small-ranged species; and (iii) small-ranged species may have gone extinct in unstable areas. Underlying all of these hypotheses is the process of lineage extinction across a range of evolutionary scales, from selection acting on within-species lineages to the extinction of newly diverging or full species (31). In principle, reduced speciation rates [explanation (ii)] might instead be due to high rates of gene flow among populations in unstable, shifting climates. However, this mixing should be most pronounced for strongly dispersing species. In contrast, lineage extinctions at all levels should have the strongest effects on weak dispersers, which is consistent with our results. Thus, elevated extinction rates, likely across a range of evolutionary scales (31), appear to best explain the association of low endemism with high velocity (17, 22).

Our results have important implications for conservation in a world that is increasingly experiencing elevated climate-change velocities (18). Areas that have experienced high velocities in the past are on average also expected to experience high velocities over the next century (Fig. 1E and fig. S7). As we have shown, these areas are already missing small-ranged species, suggesting that most of the remaining species may cope well with future changes. However, there are important mismatches between the spatial patterns of past and future climate change; areas with low velocities in the past, high concentrations of endemic species, and high velocities into the future are a particular conservation concern. These areas include western Amazonia, where concentrations of endemic species that have experienced relatively low-velocity changes in the past may be faced with rapid climate shifts in the near future.

Taken together, these results indicate that past climate changes have left important legacies in contemporary range size and species richness patterns, supplemented by the influences of modern climate and its spatial pattern. Small-ranged species constitute most of Earth's species diversity (23); our findings show that these species, especially those from less vagile groups, are sensitive to climate movements and are concentrated in areas where possibilities for tracking past climate changes have been greatest. This conclusion also suggests that small-ranged, weakly dispersing species in previously stable regions experiencing high future climate-change velocities will be at greatest extinction risk from anthropogenic climate change.

References and Notes
1. C. Parmesan, G. Yohe, Nature 421, 37 (2003).
2. R. K. Colwell, G. Brehm, C. L. Cardelús, A. C. Gilman, J. T. Longino, Science 322, 258 (2008).
3. J. Lenoir, J. C. Gégout, P. A. Marquet, P. de Ruffray, H. Brisse, Science 320, 1768 (2008).
4. C. Moritz et al., Science 322, 261 (2008).
5. B. Huntley, T. Webb III, J. Biogeogr. 16, 5 (1989).
6. P. K. Schoonmaker, D. R. Foster, Bot. Rev. 57, 204 (1991).
7. M. S. McGlone, Global Ecol. Biogeogr. Lett. 5, 309 (1996).
8. M. B. Davis, R. G. Shaw, Science 292, 673 (2001).
9. M. B. Araújo, R. G. Pearson, Ecography 28, 693 (2005).
10. J.-C. Svenning, F. Skov, Ecol. Lett. 7, 565 (2004).
11. J.-C. Svenning, F. Skov, Glob. Ecol. Biogeogr. 16, 234 (2007).


12. J.-C. Svenning, F. Skov, Ecol. Lett. 10, 453 (2007).
13. M. B. Araújo et al., Ecography 31, 8 (2008).
14. J.-C. Svenning, S. Normand, F. Skov, Ecography 31, 316 (2008).
15. M. Dynesius, R. Jansson, Proc. Natl. Acad. Sci. U.S.A. 97, 9115 (2000).
16. R. Jansson, Proc. Biol. Sci. 270, 583 (2003).
17. J.-C. Svenning, Ecol. Lett. 6, 646 (2003).
18. S. R. Loarie et al., Nature 462, 1052 (2009).
19. R. Jansson, T. J. Davies, Ecol. Lett. 11, 173 (2008).
20. T. J. Davies, A. Purvis, J. L. Gittleman, Am. Nat. 174, 297 (2009).
21. D. Scherrer, C. Körner, J. Biogeogr. 38, 406 (2010).
22. D. Nogués-Bravo, R. Ohlemüller, P. Batra, M. B. Araújo, Evolution 64, 2442 (2010).
23. K. J. Gaston, The Structure and Dynamics of Geographic Ranges (Oxford Univ. Press, New York, 2003).
24. R. Ohlemüller et al., Biol. Lett. 4, 568 (2008).
25. G. C. Stevens, Am. Nat. 133, 240 (1989).
26. This measure does not account for abrupt or transient changes within the time interval. Over periods of decades or centuries, relatively rapid changes may produce velocities considerably higher than those obtained using just the LGM and present.
27. Materials and methods are available as supporting material on Science Online.
28. W. F. Ruddiman, Earth's Climate: Past and Future (W. H. Freeman and Company, New York, 2001).
29. D. J. Currie et al., Ecol. Lett. 7, 1121 (2004).
30. C. K. Ghalambor, R. B. Huey, P. R. Martin, J. J. Tewksbury, G. Wang, Integr. Comp. Biol. 46, 5 (2006).
31. A. C. Carnaval, M. J. Hickerson, C. F. B. Haddad, M. T. Rodrigues, C. Moritz, Science 323, 785 (2009).

Acknowledgments: We thank the Aarhus University Research Foundation for financial support. This study was also supported in part by MADALGO–Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation. B.D. is supported by the Danish Council for Independent Research–Natural Sciences, and W.J.S. is funded by Arcadia. We thank international climate modeling groups for providing their data for analysis, the Laboratoire des Sciences du Climat et de l'Environnement for collecting and archiving the paleoclimate model data, the International Union for Conservation of Nature and Natural Resources for making the amphibian and mammal range data available, and the Natural Environment Research Council–funded Avian Diversity Hotspots Consortium (NER/O/S/2001/01230) for the use of the bird range data. We thank four anonymous reviewers whose constructive comments improved this manuscript. Data are archived at Dryad (http://dx.doi.org/10.5061/dryad.b13j1).

Supporting Online Material
www.sciencemag.org/cgi/content/full/science.1210173/DC1
Materials and Methods
Figs. S1 to S12
Tables S1 to S12
References (32–54)

22 June 2011; accepted 19 August 2011
Published online 6 October 2011
10.1126/science.1210173


The Influence of Late Quaternary Climate-Change Velocity on Species Endemism
B. Sandel, L. Arge, B. Dalsgaard, R. G. Davies, K. J. Gaston, W. J. Sutherland and J.-C. Svenning
Science 334 (6056), 660-664 (4 November 2011). [doi: 10.1126/science.1210173]
Originally published online October 6, 2011.




2008 49th Annual IEEE Symposium on Foundations of Computer Science

Succincter

Mihai Pătrașcu∗
MIT

Abstract

We can represent an array of n values from {0, 1, 2} using ⌈n · log2 3⌉ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using ⌈t · log2 3⌉ bits, and bound the retrieval time by t. This gives a linear trade-off between the redundancy of the representation and the query time.

In fact, this type of linear trade-off is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it.

We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size.

Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locally-decodable version of arithmetic coding.

1 Introduction

1.1 Motivation

Can we represent data close to the information-theoretic minimum space, and still answer interesting queries efficiently? Two basic examples can showcase the antagonistic nature of compression and fast queries:

1. Suppose we want to store a bit-vector A[1..n], and answer partial sums queries: RANK(k), which asks for Σ_{i=1..k} A[i]; and SELECT(k), which asks for the index of the k-th one in the array.

On the one hand, we must store some summaries (e.g. partial sums at various points) to support fast queries. On the other hand, a summary is, almost by definition, redundant. If we store a summary for every block of t bits, it would appear that the query needs to spend time proportional to t, because no guiding information is available inside the block.

2. Suppose we want to represent an array A[1..n] of "trits" (A[i] ∈ {0, 1, 2}), supporting fast access to single elements A[i]. We can encode the entire array as a number in {0, ..., 3^n − 1}, but the information about every trit is "smeared" around, and we cannot decode one trit without decoding the whole array.

Generalizing trits to symbols drawn independently from a distribution with entropy H, we can use arithmetic coding to achieve n · H + 1 bits on average, but information about each element is spread around in the encoding. At the other extreme, we can use Huffman coding to achieve n · H + O(1) bits¹ of storage, which represents every element in "its own" memory bits, using a prefix-free code.

The natural solutions to these problems gravitate towards a linear trade-off between redundancy and query time. We can store a summary for every block of t elements (in problem 1.), or we can "round up" the entropy of every block of t symbols to an integral number of bits (in problem 2.). In both cases, we are introducing redundancy of roughly n/t, and the query time will be proportional to t.

It is not hard to convince oneself that a linear trade-off is the best possible. If we store t elements with at most O(1) bits of redundancy, we need a super-efficient encoding that is essentially fixed due to the entropy constraint. Because the encoding is so constrained, it would appear that it cannot be useful beyond representing the data itself. Then, the only way to work with such a super-efficient encoding is to decode it entirely, forcing query time proportional to t.

In this paper, we show that this intuition is false: we can use recursion even inside a super-efficient encoding (if we are allowed two bits of redundancy). Instead of decoding t elements to get to one trit, local decoding can be supported in logarithmic time. This extends to storing an entire augmented tree succinctly, so we can solve RANK/SELECT in

∗ Supported by a Google Research Award and by MADALGO - Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.

¹ More precisely, Gallager [11] showed that the redundancy is at most p_max + 0.086 bits per element, where p_max is the maximum probability of an element.

0272-5428/08 $25.00 © 2008 IEEE    DOI 10.1109/FOCS.2008.83

logarithmic time. At every node of the tree, we can achieve something nontrivial (store the sum of its subtree), while introducing just 1/t bits of redundancy.

1.2 Succinct(er) Data Structures

In the field of succinct data structures, the goal is to construct data structures that use space equal to the information-theoretic minimum plus some redundancy R, while supporting various types of queries. The field has been expanding at a remarkable rate in the past decade, exploring a wide variety of problems and queries.

All of these structures, however, exhibit a linear trade-off between redundancy and query time. Typically the results are stated for constant query time, and achieve a fixed redundancy close to linear, most often around n/lg n. At a high enough level of abstraction, this comes from storing ε · lg n elements by an optimal encoding, and using a precomputed lookup table of size O(n^ε) to decode them in constant time. It can be seen that for any constant query time, the redundancy will not improve asymptotically (we can only look at a constant number of blocks).

For many natural problems in the field, our technique can achieve redundancy O(n/poly log n) with constant running times, for any desired poly log. This suggests we have to rethink our expectations and approaches when designing succinct data structures.

Surprisingly, our constructions are often easier than the fixed-redundancy data structures they are replacing. It is not uncommon for succinct data structures to group elements into chunks, group chunks in superchunks, and finally group superchunks in megachunks. Each level has different lookup tables, and different details in the implementation. By having to do recursion for an unbounded number of levels, we are forced to discover a clean and uniform way to do it.

The following are a few results that follow from our technique. We believe however that the main value of this paper is to demonstrate that redundancy can be improved through recursion. These particular results are illustrations.

Locally decodable arithmetic codes. Our toy problem of storing ternary values can be generalized to representing an array of n elements with zeroth-order entropy H. Matching zeroth-order entropy is needed at the bottom of almost any technique attempting to match higher order entropy (including, for example, LZ77, the Burrows-Wheeler transform, JPEG, and MP3). Arithmetic coding can have a notable advantage over Huffman coding at low rates. For example, Shannon estimated the entropy of English to be 1.3 bits/letter, a rate at which a constant redundancy per letter can increase the encoding size by a significant percentage.

Mitzenmacher [17] uses arithmetic coding to compress Bloom filters, an application that critically requires local decodability. He asks for "a compression scheme that also provided random access," noting that "achieving random access, efficiency, and good compression simultaneously is generally difficult."

Our results give a version of locally-decodable arithmetic codes, in which the trade-off between local access time and redundancy is exponential:

Theorem 1. Consider an array of n elements from an alphabet Σ, and let fσ > 0 be the number of occurrences of letter σ in the array. On a RAM with cells of Ω(lg n) bits, we can represent the array with

    O(|Σ| · lg n) + Σ_{σ∈Σ} fσ · log2(n/fσ) + n/((lg n)/t)^t + Õ(n^{3/4})

bits of memory, supporting single-element access in O(t) time.

Observe that Σ_σ fσ · log2(n/fσ) is the empirical entropy of the array. Thus, if the elements are generated by a memoryless process with entropy H, the expected space is n · H, plus redundancy decreasing exponentially in t, plus O(|Σ|) words needed to represent the distribution.

Bit-vectors with RANK/SELECT. The problem of supporting RANK and SELECT on bit vectors is the bread-and-butter of succinct data structures, finding use in most other data structures (for representing trees, graphs, suffix trees / suffix arrays etc). Thus, the redundancy needed for this problem has come under quite a bit of attention.

The seminal papers of Jacobson [15] from FOCS'89, and Clark and Munro [5] from SODA'96 gave the first data structures using space n + o(n) and constant query time. These results were later improved [19, 21, 26].

In several applications, the set of ones is not dense in the array. Thus, the problem was generalized to storing an array A[1..u], containing n ones and u − n zeros. The optimal space is B = ⌈lg (u choose n)⌉. Note that the problem can solve predecessor search, by running SELECT(RANK(i)). From the lower bounds of [25], it follows that constant running times are only possible when the universe is not too much larger than n, in particular, u = n · poly log n. Thus, succinct data structures have focused on this regime of parameters.

Pagh [23] achieved space B + O(n · (lg lg n)²/lg n) for this sparse problem. Recently, Golynski et al. [13] achieved B + O(n · (lg lg u)²/lg n). Finally, Golynski et al. [14] have achieved space B + O(n · lg lg n · lg(u/n)/lg² n). That paper conjectures that their redundancy is the best possible in constant time. Here, we disprove their conjecture, and show that any redundancy O(n/poly log n) is achievable.

Theorem 2. On a RAM with cells of Ω(lg u) bits, we can represent an array A[1..u] with n ones and u − n zeros using

    log2 (u choose n) + u/((lg u)/t)^t + Õ(u^{3/4})

bits of memory, supporting RANK and SELECT queries in O(t) time.

The RANK/SELECT problem has also seen a lot of work in lower bounds [10, 16, 14, 12], particularly bounds applying to "systematic encodings." In this model, the bit vector is stored separately in plain form, and the succinct data structure consists of a sublinear size index on the side. Under this requirement, the best achievable redundancy with query time t is n/t up to poly lg n factors, i.e. the linear trade-off between redundancy and query time is the best possible. Our results demonstrate the significant power of not storing data in plain form.

Dictionaries. The dictionary problem is to store a set S of n elements from a universe u, and answer membership queries (is x ∈ S?) efficiently. Improving space has long been a primary goal in the study of dictionaries. For example, classic works put a lot of effort into analyzing linear probing when the table is close to capacity (if a 1 − ε fraction of the cells are used, how does the "constant" running time degrade with ε?). Another trend is to study weaker versions of dictionaries, in the hope of saving space. Examples include the well known Bloom filters [1, 3, 22], and dictionaries that support retrieval only, sometimes called "Bloomier filters" [6, 4, 18].

The famous FKS dictionaries [8] were the first to solve the dictionary problem in linear space, in the sense of using O(n) cells of size lg u, while supporting queries in O(1) time in the worst-case. Many data structures with similar performance have been suggested; in particular, cuckoo hashing [24] uses (2 + ε)·n memory words.

Brodnik and Munro [2] were the first to approach the entropy bound, B = ⌈lg (u choose n)⌉. Their data structure used space B + O(B/lg lg lg u), and they suggested that getting significantly more sublinear bounds might not be possible without "a more powerful machine model."

Pagh [23] gave the best known bound, achieving B + O(n · (lg lg n)²/lg n). In fact, his algorithm is a reduction to the RANK problem in vectors of size u = n · poly log n. Thus, our results immediately imply dictionaries with redundancy B + O(n/lg^c n), for any constant c.

Balanced parentheses. Our techniques imply results identical to RANK/SELECT for the problem of storing a string of 2n balanced parentheses, and answering two queries:

MATCH(k): find the parenthesis that matches the one on position k.
ENCLOSING(k): find the open parenthesis that encloses most tightly the one on position k.

The optimal space is given by the Catalan number: lg( (1/(n+1)) · (2n choose n) ) = 2n − O(lg n). Due to the natural association between tree structures and balanced parentheses, this problem has been the crucial building block in most succinct tree representations.

A generic transformation. In fact, all of our results are shown via a generic transformation, converting a broad class of data structures based on augmented search trees into succinct data structures. It seems likely that such a broad result will have applications beyond the ones explored in this paper. The reader is referred to §4 for formal details about the class of augmented search trees handled by our transformation.

1.3 Technical Discussion

Our goal is to improve redundancy through recursion, as opposed to a linear scan. A naïve view for how recursion might work for the trits problem is the following. We take w trits, which have entropy w · log2 3, and "extract" some M bits of information (e.g. M = ⌊w · log2 3⌋), which we store. Then, the remaining δ = w · log2 3 − M bits of information are passed to the second level of the recursion. At the second level, we aggregate w blocks, for which we must store w·δ bits of information. We store some M′ bits (e.g. M′ = ⌊w·δ⌋), and pass δ′ = w·δ − M′ bits to the next level, etc.

Since we are not willing to waste a bit of redundancy per block, δ may not be an integer. Unfortunately, "passing some fractional bits of information" is an intuition that does not render itself to any obvious implementation.

Our solution for passing a fractional number of entropy bits is elegant and, in hindsight, quite natural. We will approximate δ by log2 K, where K is an integer. This means that passing δ bits of information is almost the same as passing a number in {0, ..., K − 1}. This approach introduces redundancy ("unused entropy") of log2(K) − δ bits, a quantity that depends on how close δ is to the logarithm of an integer. Note however, that if K is the best approximation, δ is sandwiched between log2(K − 1) and log2 K. Thus, if we choose δ large enough, there is always some K that gives a good approximation.

At the second level of recursion, the problem will be to represent an array of n/w values from {0, ..., K − 1}. This is just a generalization of the ternary problem to an arbitrary universe, so the same ideas apply recursively.

It should be noted that the basic technique of storing a certain number of bits and passing the "spill" to be stored separately, is not new. It was originally introduced by Munro et al. [20], and is the basis of the logarithmic improvement in redundancy of Golynski et al. [13] (see the

Unfortunately, the techniques in these papers do not allow recursive composition, which is the key to our results.

In §2, we formally define the concept of spill-over representations, and use it to prove Theorem 4 for representing an array of trits. For less trivial applications, we need to compose variable-length representations without losing entropy. For example, we may want to store the sum of numbers, and then the numbers themselves, by an efficient encoding that doesn't store the sum redundantly. With the right view, we can give a very usable lemma performing this composition; see §3. Finally, §4 uses this lemma to derive our main results in succinct data structures.

2 Spill-Over Representations

Assume we want to represent a value from a set X. Any representation as a sequence of memory bits will use at least ⌈log2 |X|⌉ bits in the worst case, so it will have a redundancy of almost one bit if |X| is not close to a power of two. To achieve a redundancy much smaller than one for any value of |X|, we must first define a model of "representation" where this is possible.

A spill-over representation consists of M memory bits (stored in our random-access memory), plus a number in {0, ..., K−1} that is stored by some outside entity. This number is called the spill, and K is called the spill universe. Observe that the spill-over representation is capable of representing K · 2^M distinct values. When using such a representation to store an element of X, we can view the element as a number, store its remainder modulo 2^M in the memory bits, and pass the quotient as the spill. This is decodable by O(1) arithmetic operations.

Our spill-over scheme is capable of representing K · 2^M different values, as opposed to the |X| values required. Thus, we have a redundancy of:

log2 (K · 2^M / |X|) ≤ log2 ( (|X|/2^M + 1) · 2^M / |X| ) = log2 (1 + 2^M/|X|) ≤ log2 (1 + 1/r) ≤ 2/r

2.1 Application: Storing Trits

We now show how to compose spill-over representations from Lemma 3 recursively, yielding the following bounds for our toy problem of storing n ternary values:

Theorem 4. On a RAM with w-bit cells, we can represent an array A[1..n] with A[i] ∈ {0, 1, 2} using n log2 3 + n/(w/t)^t + poly log n bits of memory, while supporting single-element accesses in O(t) time.

Proof. We first group elements into blocks of w. A block takes values in a set of size 3^w. We apply Lemma 3 with some parameter r to be determined, and store each block as M0 memory bits and a spill in a universe K0 ∈ [r, 2r]. Given the spill, we can read the O(w) memory bits and decode in constant time:

• first assemble the spill and the memory bits, multiplying the spill by 2^{M0}, and adding them together;
• now extract the i-th trit, dividing by 3^i and taking the value modulo 3.
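The spill-over encoding of one block can be sketched concretely as follows (names and the specific parameter choice M = ⌊log2(|X|/r)⌋ are ours, not the paper's; we assume |X| ≥ r): a value from a universe of size |X| is split into M memory bits plus a spill whose universe K = ⌈|X|/2^M⌉ lands in [r, 2r].

```python
import math

def spill_encode(value, universe, r):
    """Split value in [0, universe) into M memory bits and a spill in [0, K),
    with spill universe K in [r, 2r]; assumes universe >= r."""
    # Choose M = floor(log2(universe / r)), so that universe / 2^M lies in [r, 2r).
    M = max(0, int(math.floor(math.log2(universe / r))))
    K = -(-universe // (1 << M))           # ceil(universe / 2^M): spill universe
    memory_bits = value & ((1 << M) - 1)   # low M bits go to ordinary memory
    spill = value >> M                     # the quotient is passed along as the spill
    return memory_bits, spill, M, K

def spill_decode(memory_bits, spill, M):
    return (spill << M) | memory_bits      # O(1) arithmetic recovers the value

# A block of 10 trits has universe 3^10 = 59049; redundancy M + log2 K - log2 |X|
# stays below 2/r.
mem, spill, M, K = spill_encode(12345, 3**10, r=16)
assert 16 <= K <= 32
assert spill_decode(mem, spill, M) == 12345
assert M + math.log2(K) - math.log2(3**10) <= 2 / 16
```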

The total space is a telescoping sum of the sizes at intermediate levels of recursion, i.e. the final redundancy is simply the sum of the redundancies introduced at each step. This gives:

R = O( (n/w)/r + (n/(Bw))/r + ··· + (n/(B^t w))/r + n/(B^t w) ) = O( n/(wr) + n/(w·B^t) )

To balance the two terms (the redundancy of the spill-over representations, versus the final redundancy of one bit per spill), let r = B^t. By our choice of B, we have B = O(w/lg r) = O(w/(t lg B)) = O(w/(t lg(w/t))). We thus have r = B^t = (w/t)^{Ω(t)}. Adjusting constants, we have a redundancy of R ≤ n/(w/t)^t, and query time O(t).

As a header of the data structure, we need to store the values K_i and M_i for every level of the recursion, which are needed to navigate the data structure. This adds O(lg² n) bits to the redundancy.

3 Composing Variable-Length Encodings

As we have just seen, spill-over encodings of fixed length can be composed easily in a recursive fashion. However, in less trivial applications, we need to compose variable-length encodings without losing entropy. For an illustration of the concept, assume we want to store an array A[1..n] of bits, such that we can query both any element A[i], and the sum of the entire array: X = Σ_{j=1..n} A[j]. If we choose the trivial encoding as n bits, querying the sum will take linear time. Conceptually, we must first store the sum, followed by the array itself, represented by an efficient encoding that uses knowledge of the sum (i.e. does not contain any redundant information about the sum). This should not lose entropy, since H(X) + H(A|X) = n.

The trouble is that the bound H(X) can only be achieved in expectation, by some kind of arithmetic code. Furthermore, since H(A|X) is a function of X, the parameters of a spill-over representation must also vary with X. Thus, we need to piece together some kind of arithmetic code for X, with a variable-length spill-over representation of A whose size depends on X. In the end, however, we should obtain a fixed-length encoding, since the overall entropy is n bits.

The following lemma formalizes this intuition, in a statement that has been crafted carefully for ease of use in applications:

Lemma 5. Assume we have to represent a variable x ∈ X, and a pair (y_M, y_K) ∈ {0,1}^{M(x)} × {0, ..., K(x)−1}. Let p(x) be a probability density function on X, and K(·), M(·) be non-negative functions on X satisfying:

(∀) x ∈ X : log2 (1/p(x)) + M(x) + log2 K(x) ≤ H   (1)

We can design a spill-over representation for x, y_M and y_K with the following parameters:

• the spill universe is K' with K' ≤ 2r, and the memory usage is M' bits;
• the redundancy is at most 4/r bits, i.e. M' + log2 K' ≤ H + 4/r;
• if the word size is w = Ω(lg |X| + lg r + lg max K(x)), x and y_K can be decoded with O(1) word probes. The input bits y_M can be read directly from memory, but only after y_K is retrieved;
• given a precomputed table of O(|X|) words that only depends on the input functions K, M and p, decoding x and y_K takes constant time on the word RAM.

In §3.1, we describe the main details of the construction. To make the construction implementable, we need to tweak the distribution p(·) to avoid pathologically rare events; this is detailed in §3.2. Finally, in §3.3 we describe the constant-time decoding procedure, which relies on a cute algorithmic trick. Note that a table of size O(|X|) is optimal, since the lemma is instantiated for three arbitrary functions on X.

3.1 The Basic Construction

The design of our data structure is remarkably simple. First, let Mmin = min_{x∈X} M(x). We can decrease each M(x) to Mmin, by encoding M(x) − Mmin bits of memory into the spill. This has the technical effect that we may not access y_M prior to decoding y_K, as stated in the lemma. From now on, we assume y_M always has Mmin bits, and y_K comes from a universe of K'(x) = K(x) · 2^{M(x)−Mmin}. Now let Z be the set of possible pairs (x, y_K).

Claim 6. We have log2 |Z| + Mmin ≤ H.

Proof. We will show |Z| ≤ max_{x∈X} K'(x)/p(x). Since log2 (1/p(x)) + log2 K'(x) + Mmin ≤ H for all x, it follows that log2 |Z| + Mmin ≤ H. Suppose for contradiction that |Z| > K'(x)/p(x) for all x ∈ X. Then, K'(x) < |Z| · p(x), for all x. We have |Z| = Σ_{x∈X} K'(x) < Σ_{x∈X} |Z| · p(x) = |Z|, which gives a contradiction.

Though the proof of this claim looks like a technicality, the intuition behind it is quite strong. The space of encodings Z is partitioned into equivalence classes (x, ·), giving each element x a fraction equal to K'(x)/|Z|. But log2 (1/p(x)) + log2 K'(x) ≈ H − Mmin, so K'(x)/p(x) is roughly fixed. Thus, the fraction of space assigned to x is roughly p(x), which is exactly how arithmetic coding operates (partitioning the real interval [0, 1]).

To complete the representation, we apply Lemma 3 to the set Z. The resulting spill is passed along as the spill of our data structure.
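The entropy bookkeeping behind the bits-plus-sum example can be checked numerically (a standalone sketch of ours): for A uniform over {0,1}^n and X the sum of its bits, H(X) + H(A|X) = n exactly, so storing the sum first and then the array conditioned on the sum loses nothing.

```python
from math import comb, log2

n = 8
# A is uniform over {0,1}^n, so X = sum(A) is Binomial(n, 1/2).
p = [comb(n, x) / 2**n for x in range(n + 1)]
H_X = -sum(px * log2(px) for px in p)                        # entropy of the sum
H_A_given_X = sum(px * log2(comb(n, x)) for x, px in enumerate(p))
# Conditioned on X = x, A is uniform over the C(n, x) arrays with that sum.
assert abs((H_X + H_A_given_X) - n) < 1e-9                   # H(X) + H(A|X) = n
```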

The memory bits are stored at the beginning of the data structure, followed by the Mmin bits of y_M. The only redundancy introduced is by this application of Lemma 3, i.e. 2/r bits.

For this construction to be efficiently decodable, we must at the very least be able to manipulate a value of Z in constant time, i.e. have a word size of w = Ω(lg |Z|). To understand this requirement, we bound log2 |Z| ≤ H − min M(x). For every x, we are free to increase M(x), padding with zero bits, up to the maximum integer satisfying M(x) ≤ H − log2 K(x) − log2 (1/p(x)). Thus, M(x) > H − log2 (K(x)/p(x)) − 1. This implies the bound log2 |Z| ≤ log2 (max K(x)/min p(x)) + 1, i.e. |Z| = O(max K(x)/min p(x)). The statement of the lemma already assumes w = Ω(lg max K(x)), since K(x) is the universe of the input spill. Thus, it remains to ensure that w = Ω(lg (1/min p(x))).

3.2 Handling Rare Events

Unfortunately, some values x ∈ X can be arbitrarily rare, and in fact, min p(x) is prohibitively small even for natural applications. Remember our example of representing an array A[1..n] of bits, together with its sum X = Σ_{j=1..n} A[j]. We have min p(x) = p(n) = 2^{−n}, which means our algorithm wants to manipulate spills of Ω(n) bits. A word size of Θ(n) bits is an unrealistic assumption.

Our strategy will be to tweak the distribution p(·) slightly, increasing min p(x) towards 1/|X|, while not losing too much entropy if we code according to this false distribution. (In information-theoretic terms, the Kullback-Leibler divergence between the distributions must be small.) Formally, we tweak the distribution by adding 1/(|X|·r) to every probability, and then normalizing:

p'(x) = ( p(x) + 1/(|X|·r) ) / ( 1 + 1/r )

Observe that Σ_{x∈X} p'(x) = ( Σ_{x∈X} p(x) + |X| · 1/(|X|·r) ) / (1 + 1/r) = 1, so p'(·) indeed defines a distribution. We now have min p'(x) ≥ (1/(|X|·r)) / (1 + 1/r) ≥ 1/(2r·|X|). This requires the word size to be w = Ω(lg r + lg |X|), a reasonable assumption made by the lemma. (In our example with an n-bit array, |X| = n+1, so we require w = Ω(lg r + lg n), instead of Ω(n) as before.)

Finally, we must show that the entropy bound in (1) is not hurt too much if we code according to p'(·) instead of p(·). Note that:

log2 (1/p'(x)) ≤ log2 ( (1 + 1/r)/p(x) ) = log2 (1/p(x)) + log2 (1 + 1/r) ≤ log2 (1/p(x)) + 2/r

We can thus replace (1) with the following weaker guarantee:

M(x) + log2 K(x) + log2 (1/p'(x)) ≤ M(x) + log2 K(x) + log2 (1/p(x)) + 2/r ≤ H + 2/r

Combining with the construction from §3.1, which introduced another 2/r bits of redundancy, we have lost 4/r bits of redundancy overall.

3.3 Algorithmic Decoding

The heart of the decoding problem lies in recovering x and y_K based on an index in Z (output by Lemma 3). We could do this with a lookup in a precomputed table of size |Z|. Remember that we bounded |Z| = O(max K(x)/min p(x)) = O(|X| · r · max K(x)), which is not a strong enough bound for some applications. We now show how to use a table of size just O(|X|).

From the point of view of x, the space Z is partitioned into pieces of cardinality K'(x), and the query is to find the piece containing a given codeword. We are free to design the partition to make decoding efficient. First, we assign to each x a contiguous interval of Z. Let z_x be the left boundary of the interval assigned to x. Decoding x is equivalent to a predecessor search, locating the codeword among the values {z_1, ..., z_{|X|}}. Decoding y_K simply subtracts z_x from the codeword.

Unfortunately, the optimal bounds for predecessor search [25] are superconstant in the worst case. To achieve constant time, we must leverage our ability to choose the encoding: we must arrange the intervals in an order that makes predecessor search easy! While this sounds mysterious, it turns out that sorting the intervals by increasing length suffices.

Claim 7. If intervals are sorted by length, i.e. z_{i+1} − z_i ≥ z_i − z_{i−1}, predecessor search among the z_i's can be supported in constant time by a data structure of O(|X|) words.

Proof. Let f(τ) be the smallest interval of length at least τ, i.e. f(τ) = min{ i | z_{i+1} − z_i ≥ τ }. Consider the set of z_{f(τ)}, for every τ a power of two. This set has O(w) values, so we can store it in a fusion tree [9], and support predecessor search in constant time. Say we have located the query between z_{f(τ)} and z_{f(2τ)}. All intervals in this range have width between τ and 2τ, i.e. we have constant spread. In this case, predecessor search can be solved easily in constant time [7]: break the universe into buckets of size τ, which ensures at most one value per bucket, and at most three buckets to inspect until the predecessor is found.
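The distribution tweak of §3.2 is easy to verify numerically (an illustrative sketch of ours, using the bits-plus-sum example with n = 16): the mixed distribution still sums to one, no value is rarer than 1/(2r|X|), and coding by p' instead of p costs at most 2/r bits per value.

```python
from math import comb, log2

def tweak(p, r):
    """Mix p with the uniform distribution: p'(x) = (p(x) + 1/(|X| r)) / (1 + 1/r)."""
    X = len(p)
    return [(px + 1.0 / (X * r)) / (1.0 + 1.0 / r) for px in p]

n, r = 16, 32
p = [comb(n, x) / 2**n for x in range(n + 1)]     # min p(x) = 2^-16: far too rare
q = tweak(p, r)

assert abs(sum(q) - 1.0) < 1e-12                  # still a probability distribution
assert min(q) >= 1.0 / (2 * r * len(q))           # min p'(x) >= 1/(2 r |X|)
# Coding by p' instead of p costs at most 2/r bits per value:
assert all(log2(1 / qx) <= log2(1 / px) + 2.0 / r for px, qx in zip(p, q))
```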

4 Applications to Succinct Data Structures

4.1 Augmented Trees

As mentioned already, our results are based on a generic transformation of augmented B-trees to succinct data structures. For some B ≥ 2, we define a class of data structures, aB-trees, as follows:

• The data structure represents an array A[1..n] of elements from some alphabet Σ, where n is a power of B. The data structure is a B-ary tree with the elements of A in the leaves.
• Every node is augmented with a value from some alphabet Φ. The value of a leaf is a function of its array element, and the value of an internal node is a function of the values of its children, and the size of the subtree.
• The query algorithm examines the values of the root's children, decides which child to recurse to, examines all values of that node's children, recurses to one of them, etc. When a leaf is examined, the algorithm outputs the query answer. We assume the query algorithm spends constant time per node, if all values of the children are given packed in a word.

For instance, the RANK/SELECT problem has a standard solution through an aB-tree. The alphabet Σ is simply {0, 1}. Every internal node counts the sum of the leaves in its subtree (equivalently, the sum of its children), so Φ = {0, ..., n}. The queries can be solved in constant time per node if the values of all children are given packed in a word. This uses very standard ideas from word-level parallelism that we omit.

We aim to compress an aB-tree. A natural goal for the space an aB-tree should use is given by N(n, ϕ), defined as the number of instances of A[1..n] such that the root is labeled with ϕ ∈ Φ. Observe that we can write the following recursion for N(B·n, ϕ):

N(B·n, ϕ) = Σ_{ϕ1,...,ϕB : A(ϕ1,...,ϕB,n)=ϕ} N(n, ϕ1) ··· N(n, ϕB)

Indeed, any instance for the first half is valid, as long as its aggregate value combines properly with the aggregate of the second half. To develop intuition for N, observe that in the RANK/SELECT example, N(n, ϕ) = C(n, ϕ), the binomial coefficient, because ϕ was just the number of ones in the array. Our recursion becomes the following obvious identity:

N(B·n, ϕ) = Σ_{ϕ1+···+ϕB=ϕ} N(n, ϕ1) ··· N(n, ϕB)

We will show the following general result:

Theorem 8. Let B = O(w/lg(n + |Φ|)). We can store an aB-tree of size n with root value ϕ using log2 N(n, ϕ) + 2 bits. The query time is O(log_B n), assuming precomputed look-up tables of O(|Σ| + |Φ|^{B+1} + B·|Φ|^B) words, which only depend on n, B and the aB-tree algorithm.

Essentially, this result compresses the entire aB-tree with only 2 bits of redundancy. The additional space of the look-up tables will not matter too much, since we construct many data structures that share them.

Application to RANK/SELECT. Say we want to solve RANK and SELECT in time O(t), for an array of size U with N ones. As mentioned already, RANK and SELECT queries can be supported by an aB-tree, so Theorem 8 applies. If the aB-tree has size r, we have |Σ| = 2 and |Φ| = r + 1. Choose B ≥ 2 such that B lg B = (ε/t)·lg U, and let r = B^t = ((lg U)/t)^{Θ(t)}. We break the array into buckets of size r, rounding up the size of the last bucket. Each bucket is stored as a succinct aB-tree. Supporting RANK and SELECT inside such an aB-tree requires time O(log_B r) = O(t).

For each bucket, we store the index in memory of the bucket's memory bits. Let N1, N2, ... be the number of ones in each subarray. We store a partial sums vector for these values (to aid RANK), and a predecessor structure on the partial sums (to aid SELECT). We have at most U/r values from a universe of U, so the predecessor structure can support query time O(t) using space (U/r)·r^{Ω(1/t)} ≤ (U/r)·B = U/B^{t−1} words; see [25]. A query begins by examining these auxiliary structures, and then performing a query in the right aB-tree.

The components of the memory consumption are:

1. a pointer to each bucket, and the partial sums for the array N1, N2, .... These occupy O((U/r)·lg U) = O(U lg U / B^t) bits.
2. the predecessor structure, occupying O(U/B^{t−1}) words. This dominates item 1, and is U/B^{Θ(t)} bits.
3. the succinct aB-trees, which occupy:

Σ_i ( log2 C(r, N_i) + 2 ) ≤ log2 Π_i C(r, N_i) + O(U/r) ≤ log2 C(U+r−1, N) + O(U/r) ≤ log2 C(U, N) + O(r) + O(U/r)

bits. The redundancy can be rewritten as U/r + r = O(max{ U/r, √U }).
4. the look-up tables, of size O( (r+1)^{B+1} + (r+1)^B · B ) = 2^{O(tB lg B)} = 2^{O(ε lg U)} = U^{O(ε)}. Setting ε a small enough constant, this contributes negligibly to the redundancy. The only limitation is B ≥ 2, so we cannot reduce the lookup tables below O(r³) words. A redundancy of U/r + O(r³) can be written as max{ U/r, U^{3/4} }.
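The recursion for N in the RANK/SELECT case can be sanity-checked directly; with B = 2 it is just Vandermonde's identity, so N(n, ϕ) = C(n, ϕ) (a standalone sketch of ours):

```python
from functools import lru_cache
from itertools import product
from math import comb, prod

B = 2

@lru_cache(maxsize=None)
def N(n, phi):
    """Number of bit arrays A[1..n] whose root value (count of ones) is phi."""
    if n == 1:
        return 1 if phi in (0, 1) else 0
    m = n // B
    return sum(
        prod(N(m, p) for p in parts)
        for parts in product(range(m + 1), repeat=B)
        if sum(parts) == phi          # aggregate function: children's counts add up
    )

# The recursion reproduces the binomial coefficients:
assert all(N(8, phi) == comb(8, phi) for phi in range(9))
```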

To summarize, we obtain a redundancy of U/B^{Θ(t)} + O(U^{3/4}). Readjusting constants in t, the redundancy is U/(lg U / t)^t + O(U^{3/4}).

Application to arithmetic coding. In this application, Σ is the alphabet that we wish to encode. Intuitively, a letter σ ∈ Σ has log2 (n/f_σ) bits of entropy. We round these values up to multiples of 1/r, which only adds redundancy n/r over all symbols. We construct an aB-tree, in which internal nodes are augmented to store the entropy of the symbols in their subtree. If the aB-tree has size at most r, the total entropy is at most O(r lg n), so lg |Φ| = O(lg r + lg lg n). The query algorithm is trivial: it just traverses the tree down to the leaf that it wants to retrieve, ignoring all values along the way.

By this definition, N(n, ϕ) is the number of n-letter sequences with total entropy exactly ϕ. But there are at most 2^ϕ such sequences, by an immediate packing argument. Thus, an aB-tree of size r having value ϕ at the root can be compressed to space ϕ + 2. We now proceed as in the previous example, breaking the array into n/r buckets of size r, and performing the same calculations for the space occupied by auxiliary structures.

4.2 Proof of Theorem 8

The proof is by induction on powers of B, aggregating B spill-over representations for aB-trees of size n/B into one for an aB-tree of size n. Let K(n, ϕ) be the spill universe used for a data structure of size n and root label ϕ. Let M(n, ϕ) be the memory bits used by such a representation. Let r be a parameter to be determined. We guarantee inductively that:

K(n, ϕ) ≤ 2r;   (2)
M(n, ϕ) + log2 K(n, ϕ) ≤ log2 N(n, ϕ) + 4·(2n−1)/r.   (3)

Consider the base case, n = 1. The alphabet Σ is partitioned into sets Σ_ϕ of array elements for which the leaf is labeled with ϕ. We have N(1, ϕ) = |Σ_ϕ|. We use a spill-over encoding as in Lemma 3 to store an index into Σ_ϕ. The encoding will use a spill universe K(1, ϕ) ≤ 2r and M(1, ϕ) bits of memory, such that M(1, ϕ) + log2 K(1, ϕ) ≤ log2 |Σ_ϕ| + 2/r. We store a look-up table that determines the array value based on the value ϕ and the index into Σ_ϕ. These look-up tables (for all ϕ) require space O(|Φ| + |Σ|).

For the induction step, we break the array into B subarrays of size n/B. Let (ϕ1, ..., ϕB) denote the values at the root of each of the B subtrees. We recursively represent each subarray using M(n/B, ϕ_i) bits of memory and spill universe K(n/B, ϕ_i). Then, all memory bits from the children are concatenated into a bit vector of size M' = Σ_i M(n/B, ϕ_i), and the spills are combined into a superspill from the universe K' = Π_i K(n/B, ϕ_i). Since lg K' ≤ lg (2r)^B = O(B lg r), we require that B = O(w/lg r), so that this superspill fits in a constant number of words. For every possible (ϕ1, ..., ϕB), we precompute the partial sums of M(n/B, ϕ_i), so that we know where the i-th child begins in constant time. We also precompute the constants needed to extract the i-th spill from the superspill. These tables require O(B·|Φ|^B) words.

Summing the recursive guarantee (3) of every child, we have:

log2 K' + M' ≤ Σ_i log2 N(n/B, ϕ_i) + 4·(2n−B)/r

Let ϕ = A(ϕ1, ..., ϕB, n) be the value at the root. Let p(·) be the distribution of (ϕ1, ..., ϕB) given this value of ϕ, that is:

p(ϕ1, ..., ϕB) = Π_i N(n/B, ϕ_i) / N(n, ϕ)

But then:

log2 K' + M' ≤ log2 N(n, ϕ) − log2 (1/p(ϕ1, ..., ϕB)) + 4·(2n−B)/r

This satisfies the entropy condition (1) of Lemma 5. We apply the lemma to represent (ϕ1, ..., ϕB), the superspill, and the memory bits of the subarrays. We obtain a representation with spill universe K ≤ 2r and M memory bits, such that:

M + log2 K ≤ log2 N(n, ϕ) + 4·(2n−B)/r + 4/r ≤ log2 N(n, ϕ) + 4·(2n−1)/r

The precomputed table required by Lemma 5 is linear in the support of p(·), which is |Φ|^B. Such a table is stored for every distinct ϕ, giving space |Φ|^{B+1}. We must have w = Ω(B lg |Φ|). This completes the induction step.

To prove Theorem 8, we construct the above representation for the required size n, using the value r = 8n. Note that this requires w = Ω(lg max{|Φ|, r}). At the root, the final spill is stored explicitly at the beginning of the data structure. Thus, the space is:

⌈log2 K(n, ϕ)⌉ + M(n, ϕ) ≤ log2 N(n, ϕ) + 2

The queries are easy to support. First, we read the final spill at the root. Then, we decode (ϕ1, ..., ϕB) and the superspill from the representation of Lemma 5. The aB-tree query algorithm decides which child to follow recursively based on (ϕ1, ..., ϕB). We extract the spill of that child from the superspill, and recurse. The constants needed to extract the spill and the position in memory of the child were stored in look-up tables.

5 Open Problems

It is an important open problem to establish whether our exponential trade-off between redundancy and query time is optimal. We conjecture that it is. Unfortunately, proving this seems beyond the scope of current techniques. The only lower bound for succinct data structures (without the systematic assumption) is via a rather simple idea of Gál and Miltersen [10], which requires that the data structure have an intrinsic error-correcting property. Such a property is not characteristic of our problems.

Even if the exponential trade-off cannot be improved, it would be interesting to establish where this trade-off "bottoms." Due to our need for large precomputed tables, the smallest redundancy that we can achieve is some O(n^α) bits, where α is a constant close to one (for instance, α = 3/4 for RANK/SELECT). Can this redundancy be reduced to, say, O(√n), or hopefully even O(n^ε)?

Acknowledgments. This work was initiated while the author was visiting the Max Planck Institut für Informatik, Saarbrücken, in June 2005. I wish to thank Seth Pettie for telling me about the problem on that occasion, and for initial discussions. I also wish to thank Rasmus Pagh and Rajeev Raman for helping me understand previous work.

References

[1] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
[2] Andrej Brodnik and J. Ian Munro. Membership in constant time and almost-minimum space. SIAM Journal on Computing, 28(5):1627–1640, 1999. See also ESA'94.
[3] Larry Carter, Robert Floyd, John Gill, George Markowsky, and Mark Wegman. Exact and approximate membership testers. In Proc. 10th ACM Symposium on Theory of Computing (STOC), pages 59–65, 1978.
[4] Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier filter: an efficient data structure for static support lookup tables. In Proc. 15th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 30–39, 2004.
[5] David R. Clark and J. Ian Munro. Efficient suffix trees on secondary storage. In Proc. 7th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 383–391, 1996.
[6] Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Pătrașcu. De dictionariis dynamicis pauco spatio utentibus (lat. On dynamic dictionaries using little space). In Proc. Latin American Theoretical Informatics (LATIN), pages 349–361, 2006.
[7] Erik D. Demaine, Thouis Jones, and Mihai Pătrașcu. Interpolation search for non-independent data. In Proc. 15th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 522–523, 2004.
[8] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538–544, 1984. See also FOCS'82.
[9] Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences, 47(3):424–436, 1993. See also STOC'90.
[10] Anna Gál and Peter Bro Miltersen. The cell probe complexity of succinct data structures. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP), pages 332–344, 2003.
[11] Robert G. Gallager. Variations on a theme by Huffman. IEEE Transactions on Information Theory, 24(6):668–674, 1978.
[12] Alexander Golynski. Optimal lower bounds for rank and select indexes. Theoretical Computer Science, 387(3):348–359, 2007. See also ICALP'06.
[13] Alexander Golynski, Roberto Grossi, Ankur Gupta, Rajeev Raman, and S. Srinivasa Rao. On the size of succinct indices. In Proc. 15th European Symposium on Algorithms (ESA), pages 371–382, 2007.
[14] Alexander Golynski, Rajeev Raman, and S. Srinivasa Rao. On the redundancy of succinct data structures. In Proc. 11th Scandinavian Workshop on Algorithm Theory (SWAT), 2008.
[15] Guy Jacobson. Space-efficient static trees and graphs. In Proc. 30th IEEE Symposium on Foundations of Computer Science (FOCS), pages 549–554, 1989.
[16] Peter Bro Miltersen. Lower bounds on the size of selection and rank indexes. In Proc. 16th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 11–12, 2005.
[17] Michael Mitzenmacher. Compressed Bloom filters. IEEE/ACM Transactions on Networking, 10(5):604–612, 2002. See also PODC'01.
[18] Christian Worm Mortensen, Rasmus Pagh, and Mihai Pătrașcu. On dynamic range reporting in one dimension. In Proc. 37th ACM Symposium on Theory of Computing (STOC), pages 104–111, 2005.
[19] J. Ian Munro. Tables. In Proc. 16th Conference on the Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pages 37–40, 1996.
[20] J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct representations of permutations. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP), pages 345–356, 2003.
[21] J. Ian Munro, Venkatesh Raman, and S. Srinivasa Rao. Space efficient suffix trees. Journal of Algorithms, 39(2):205–222, 2001. See also FSTTCS'98.
[22] Anna Pagh, Rasmus Pagh, and S. Srinivasa Rao. An optimal Bloom filter replacement. In Proc. 16th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 823–829, 2005.
[23] Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM Journal on Computing, 31(2):353–363, 2001. See also ICALP'99.
[24] Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122–144, 2004. See also ESA'01.
[25] Mihai Pătrașcu and Mikkel Thorup. Time-space trade-offs for predecessor search. In Proc. 38th ACM Symposium on Theory of Computing (STOC), pages 232–240, 2006.
[26] Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proc. 13th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 233–242, 2002.

Strict Fibonacci Heaps

Gerth Stølting Brodal
MADALGO*, Dept. of Computer Science, Aarhus University, Åbogade 34, 8200 Aarhus N, Denmark
[email protected]

George Lagogiannis
Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece
[email protected]

Robert E. Tarjan†
Dept. of Computer Science, Princeton University and HP Labs, 35 Olden Street, Princeton, New Jersey 08540, USA
[email protected]

ABSTRACT

We present the first pointer-based heap implementation with time bounds matching those of Fibonacci heaps in the worst case. We support make-heap, insert, find-min, meld and decrease-key in worst-case O(1) time, and delete and delete-min in worst-case O(lg n) time, where n is the size of the heap. The data structure uses linear space.

A previous, very complicated, solution achieving the same time bounds in the RAM model made essential use of arrays and extensive use of redundant counter schemes to maintain balance. Our solution uses neither. Our key simplification is to discard the structure of the smaller heap when doing a meld. We use the pigeonhole principle in place of the redundant counter mechanism.

Categories and Subject Descriptors

E.1 [Data Structures]: Trees; F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity—Nonnumerical Algorithms and Problems

General Terms

Algorithms

Keywords

Data structures, heaps, meld, decrease-key, worst-case complexity

1. INTRODUCTION

Williams in 1964 introduced binary heaps [25]. Since then the design and analysis of heaps has been thoroughly investigated. The most common operations supported by the heaps in the literature are those listed below. We assume that each item stored contains an associated key. No item can be in more than one heap at a time.

make-heap() Create a new, empty heap and return a reference to it.
insert(H, i) Insert item i, not currently in a heap, into heap H, and return a reference to where i is stored in H.
meld(H1, H2) Return a reference to a new heap containing all items in the two heaps H1 and H2 (H1 and H2 cannot be accessed after meld).
find-min(H) Return a reference to where the item with minimum key is stored in the heap H.
delete-min(H) Delete the item with minimum key from the heap H.
delete(H, e) Delete an item from the heap H given a reference e to where it is stored.
decrease-key(H, e, k) Decrease the key of the item given by the reference e in heap H to the new key k.

There are many heap implementations in the literature, with a variety of characteristics. We can divide them into two main categories, depending on whether the time bounds are worst case or amortized. Most of the heaps in the literature are based on heap-ordered trees, i.e. tree structures where the item stored in a node has a key not smaller than the key of the item stored in its parent. Heap-ordered trees give heap implementations that achieve logarithmic time for all the operations. Early examples are the implicit binary heaps of Williams [25], the leftist heaps of Crane [5] as modified by Knuth [20], and the binomial heaps of Vuillemin [24]. The introduction of Fibonacci heaps [15] by Fredman and Tarjan was a breakthrough since they achieved O(1) amortized time for all the operations above except for delete and delete-min, which require O(lg n) amortized time, where n is the number of items in the heap and lg the base-two logarithm. The drawback of Fibonacci heaps is that they are complicated compared to existing solutions and not as efficient in practice as other, theoretically less efficient solutions. Thus, Fibonacci heaps opened the way for further progress.

* Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.
† Partially supported by NSF grant CCF-0830676, US-Israel Binational Science Foundation Grant 2006204, and the Distinguished Visitor Program of the Stanford University Computer Science Department. The information contained herein does not necessarily reflect the opinion or policy of the federal government and no official endorsement should be inferred.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
STOC'12, May 19–22, 2012, New York, New York, USA.
Copyright 2012 ACM 978-1-4503-1245-5/12/05 ...$10.00.
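For reference, the operation list above has the following shape in code. This sketch of ours backs the interface with a simple recursive skew-heap meld, which only gives O(lg n) amortized bounds; it is not the paper's data structure, whose point is worst-case O(1) meld and decrease-key (decrease-key and delete are omitted here for brevity).

```python
class Node:
    __slots__ = ("key", "item", "left", "right")
    def __init__(self, key, item=None):
        self.key, self.item = key, item
        self.left = self.right = None

def make_heap():
    return None                  # the empty heap

def meld(h1, h2):
    # Skew-heap meld: O(lg n) amortized, versus the O(1) worst case of this paper.
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    if h2.key < h1.key:          # keep the smaller root on top (heap order)
        h1, h2 = h2, h1
    h1.left, h1.right = meld(h1.right, h2), h1.left   # meld right spine, then swap
    return h1

def insert(h, key, item=None):
    return meld(h, Node(key, item))

def find_min(h):
    return h.key                 # heap order puts the minimum at the root

def delete_min(h):
    return meld(h.left, h.right)
```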

Since then, many solutions based on the amortized approach have been presented, trying to match the time complexities of Fibonacci heaps while being at the same time simpler and more efficient in practice.

Self-adjusting data structures provided a framework towards this direction. A self-adjusting data structure is a structure that does not maintain structural information (like size or height) within its nodes, but still can adjust itself to perform efficiently. Within this framework, Sleator and Tarjan introduced the skew heap [23], which was an amortized version of the leftist heap. They matched the complexity of Fibonacci heaps on all the operations except for decrease-key, which takes O(lg n) amortized time. The pairing heap, introduced by Fredman, Sedgewick, Sleator and Tarjan [14], was the amortized version of the binomial heap, and it achieved the same time complexity as Fibonacci heaps except again for the decrease-key, the time complexity of which remained unknown for many years. In 1999, Fredman [13] proved that the lower bound for the decrease-key operation on pairing heaps is Ω(lg lg n); thus the amortized performance of pairing heaps does not match the amortized performance of Fibonacci heaps. In 2005, Pettie [22] proved that the time complexity of the decrease-key operation is 2^{O(√(lg lg n))}. Later, Elmasry [9] gave a variant of pairing heaps that needs only O(lg lg n) amortized time for decrease-key.

Heaps having amortized performance matching the amortized time complexities of Fibonacci heaps have also been presented. In particular, Driscoll, Gabow, Shrairman and Tarjan [6] proposed rank-relaxed heaps, Kaplan and Tarjan [19] presented thin heaps, Chan [4] introduced quake heaps, Haeupler, Sen and Tarjan introduced rank-pairing heaps [16], and Elmasry introduced violation heaps [10]. Elmasry improved the number of comparisons of Fibonacci heaps by a constant factor [7] and also examined versions of pairing heaps, skew heaps, and skew-pairing heaps [8]. Some researchers, aiming to match the amortized bounds of Fibonacci heaps in a simpler way, followed different directions. Peterson [21] presented a structure based on AVL trees and Høyer [17] presented several structures, including ones based on red-black trees, AVL trees, and (a, b)-trees.

Let us now review the progress on this problem, based on the worst-case approach. The goal of worst-case efficient heaps is to eliminate the unpredictability of amortized ones, since this unpredictability is not desired in e.g. real time applications. The targets for the worst case approach were given by Fibonacci heaps, i.e. the time bounds of Fibonacci heaps should ideally be matched in the worst case. The next improvement after binomial heaps came with the implicit heaps of Carlsson, Munro and Poblete [3] supporting worst-case O(1) time insertions and O(lg n) time deletions on a single heap stored in an array. Run-relaxed heaps [6] achieve the amortized bounds given by Fibonacci heaps in the worst case, with the exception of the meld operation, which is supported in O(lg n) time in the worst case. The same result was later also achieved by Kaplan and Tarjan [18] with fat heaps. Fat heaps without meld can be implemented on a pointer machine, but to support meld in O(lg n) time arrays are required. The meld operation was the next target for achieving constant time in the worst case framework, in order to match the time complexities of Fibonacci heaps. Brodal [1] achieved O(1) worst case time for the meld operation on a pointer machine, but not for the decrease-key operation. It then became obvious that although the decrease-key and the meld operation can be achieved in O(1) worst case time separately, it is very difficult to achieve constant time for both operations in the same data structure. Brodal [2] managed to solve this problem, but his solution is very complicated and requires the use of (extendable) arrays. For the pointer machine model of computation, the problem of matching the time bounds of Fibonacci heaps remained open until now, and progress within the worst case framework has been accomplished only in other directions. In particular, Elmasry, Jensen and Katajainen presented two-tier relaxed heaps [12] in which the number of key comparisons is reduced to lg n + 3 lg lg n + O(1) per delete operation. They also presented [11] a new idea (which we adapt in this paper) for handling decrease-key operations by introducing structural violations instead of heap order violations.

1.1 Our contribution

In this paper we present the first heap implementation that matches the time bounds of Fibonacci heaps in the worst case on a pointer machine, i.e. we achieve a linear space data structure supporting make-heap, insert, find-min, meld and decrease-key in worst-case O(1) time, and delete and delete-min in worst-case O(lg n) time. This adds the final step after the previous step made by Brodal [2] and answers the long standing open problem of whether such a heap is possible.

Much of the previous work, including [1, 2, 3, 11, 12, 18], used redundant binary counting schemes to keep the structural violations logarithmically bounded during operations. For the heaps described in this paper we use the simpler approach of applying the pigeonhole principle. Our heaps are essentially heap ordered trees, where the structural violations are subtrees being cut off (and attached to the root), as in Fibonacci heaps. The crucial new idea is that when melding two heaps, the data structures maintained for the smaller tree are discarded by marking all these nodes as being passive. We mark all (active) nodes in the smaller tree passive in O(1) time using an indirectly accessed shared flag.

In Sections 2-4 we describe our data structure, ignoring the pointer-level representation, and analyze it in Section 5. The pointer-level representation is given in Section 6. In Section 7 we give some concluding remarks and discuss possible variations.

2. DATA STRUCTURE AND INVARIANTS

In this section we describe our data structure on an abstract level. The representation at the pointer level is given in Section 6.

A heap storing n items is represented by a single ordered tree with n nodes. Each node stores one item. The size of a tree is the number of nodes it contains. The degree of a node is the number of children of the node. We assume that all keys are distinct; if not, we break ties by item identifier. We let x.key denote the key of the item stored in node x. The items satisfy heap order, i.e. if x is a child of y then x.key > y.key. Heap order implies that the item with minimum key is stored in the root.

The basic idea of our construction is to ensure that all nodes have logarithmic degree, that a meld operation makes the root with the larger key a child of the

smaller key, and that a decrease-key operation on a node detaches the subtree rooted at the node and reattaches it as a subtree of the root. To guide the necessary restructuring, we need the following concepts and invariants.

Each node is marked either active or passive. An active node with a passive parent is called an active root. The rank of an active node is the number of active children. Each active node is assigned a non-negative integer loss. The total loss of a heap is the sum of the loss over all active nodes. A passive node is linkable if all its children are passive.

In order to keep the node degrees logarithmic during deletions, we maintain all nodes of a heap except for the root in a queue Q. A non-root node has position p if it is the p-th node on the queue Q.

I1 (Structure) For all nodes the active children are to the left of the passive children. The root is passive and the linkable passive children of the root are the rightmost children. For an active node, the i-th rightmost active child has rank+loss at least i − 1. An active root has loss zero.

I2 (Active roots) The total number of active roots is at most R + 1.

I3 (Loss) The total loss is at most R + 1.

I4 (Degrees) The maximum degree of the root is R + 3. Let x be a non-root node, and let p denote its position in Q. If x is a passive node or an active node with positive loss its degree is at most 2 lg(2n − p) + 9; otherwise, x is an active node with loss zero and is allowed to have degree one higher, i.e. degree at most 2 lg(2n − p) + 10.

Note that I4 implies that all nodes have degree at most 2 lg n + 12. The above invariants imply a bound on the maximum rank an active node can have. The following lemma captures how the maximum rank depends on the value of R, for arbitrary values of R.

Lemma 1. If I1 is satisfied and the total loss is L, then the maximum rank is at most lg n + √(2L) + 2.

… grandchildren. Finally, for all nodes v ≠ x we repeatedly prune grandchildren such that the i-th rightmost child of v has degree exactly i − 1. The remaining subtrees T_v are binomial trees of size 2^degree(v). The minimum size of such a T_x is achieved by starting with a binomial tree of size 2^r, and repeating the following step L times: prune a grandchild of the root with maximum degree. Since the maximum grandchild degree of a binomial tree of size 2^r is r − 2, and generally there are j grandchildren of degree r − j − 1, there are Σ_{j=1}^{k} j = k(k+1)/2 grandchildren of degree ≥ r − k − 1. Since k(k+1)/2 ≥ L, no grandchild of degree ≤ r − k − 2 is pruned, i.e. the (r − k)-th rightmost child w of x has degree r − k − 1 and loss zero. By the assumption on r, the degree of w is ≥ lg n and T_w has size ≥ n. It follows that T_x has size at least n + 1, which is a contradiction. This gives r ≤ lg n + √(2L) + 2.

The corollary implies a bound on the maximal rank before a heap operation. If we violate I2 or I3 temporarily during a heap operation, then the pigeonhole principle guarantees that we can apply the transformations described in Section 3. If the total loss is > R + 1, then there exists either a node with loss at least two, or there exist two nodes with equal rank each with loss exactly one. Similarly if there are > R + 1 active roots, then at least two active roots have the same rank. Finally, if I1-I3 are satisfied but the root violates I4, then the root has at least three passive linkable children, since the root has at most R + 1 children or grandchildren that can be active roots, i.e. at most R + 1 children of the root are active roots or passive non-linkable nodes.

3. TRANSFORMATIONS

The basic transformation is to link a node x and its subtree below another node y, by removing x from the child list of its current parent and making x a child of y. If x is active it is made the leftmost child of y; if x is passive it is made the rightmost child of y.

The following transformations use link to reestablish the invariants I2-I4 when they get violated. The transformations are illustrated in Figure 1 and the main properties of the transformations are captured by Table 1.

Active root reduction Let x and y be active roots of equal rank r. Compare x.key and y.key. Assume w.l.o.g. x.key < y.key …
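As a concrete illustration of the link primitive described in Section 3, here is a toy Python model with explicit child lists (my own illustrative class and function names, not the paper's pointer-level representation from Section 6):

```python
# Toy model of the link transformation: nodes keep an explicit,
# left-to-right ordered child list. Linking an active node makes it
# the leftmost child; linking a passive node makes it the rightmost
# child, matching invariant I1's left/right ordering.
class Node:
    def __init__(self, key, active=False):
        self.key = key
        self.active = active
        self.parent = None
        self.children = []

def link(x, y):
    """Remove x from its current parent's child list and make it a child of y."""
    if x.parent is not None:
        x.parent.children.remove(x)
    x.parent = y
    if x.active:
        y.children.insert(0, x)   # active: leftmost child
    else:
        y.children.append(x)      # passive: rightmost child

# Active root reduction, as far as it is described above: given two
# active roots x and y of equal rank with x.key < y.key, link y below x.
root = Node(0)
x, y = Node(3, active=True), Node(7, active=True)
link(x, root)
link(y, root)
link(y, x)                        # y becomes the leftmost child of x
assert y.parent is x and x.children[0] is y
```

The child-list ordering is what makes the invariants checkable locally: active children and passive children never interleave.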

Figure 1: Transformations to reduce the root degree, the number of active roots, and the total loss. Black/white nodes are passive/active nodes. For an active node, r/ℓ shows rank/loss. Panels: (a) root degree reduction; (b) active root reduction; (c) one-node loss reduction (ℓ ≥ 2); (d) two-node loss reduction.

Loss reduction To reduce the total loss we have two different transformations. The one-node loss reduction applies when there exists an active node x with loss ≥ 2. Let y be the parent of x. In this case x is linked to the root and made an active root with loss zero, and the rank …

… heap. To delete an arbitrary item, decrease its key to minus infinity and do a minimum deletion.

To decrease the key of the item in node x, in the tree with root z, begin by decreasing the key of the item. If x is the root we are done. Otherwise, if x.key …

Table 1: Effect of the different transformations

                              Root     Total   Active   Key
                              degree   loss    roots    comparisons
  Active root reduction (A)   ≤ +1     0       −1       +1
  Root degree reduction (R)   −2       0       +1       +3
  Loss reduction              ≤ +1     ≤ −1    ≤ +1     ≤ +1
    – one-node                +1       ≤ −1    +1       0
    – two-node                0        −1      0        +1
  (A) + (R)                   ≤ −1     0       0        +4
  2×(A) + (R)                 ≤ 0      0       −1       +5
  3×(A) + 2×(R)               ≤ −1     0       −1       +9
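The combined rows at the bottom of Table 1 are component-wise sums of the (A) and (R) rows; a quick arithmetic check (illustrative only, treating the upper-bound entries as numbers):

```python
# Each row of Table 1 is a vector of changes:
# (root degree, total loss, active roots, key comparisons).
A = (+1, 0, -1, +1)  # active root reduction (root degree entry is an upper bound)
R = (-2, 0, +1, +3)  # root degree reduction

def combine(*rows):
    """Component-wise sum of change vectors."""
    return tuple(map(sum, zip(*rows)))

assert combine(A, R) == (-1, 0, 0, +4)            # (A) + (R)
assert combine(A, A, R) == (0, 0, -1, +5)         # 2×(A) + (R)
assert combine(A, A, A, R, R) == (-1, 0, -1, +9)  # 3×(A) + 2×(R)
```

The combinations show that the reductions can be paired so that neither the root degree nor the number of active roots grows.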

Table 2: The changes caused by the different heap operations

                Root degree              Total loss         Active roots
  decrease-key  ≤ 1 + 1 + 6 − 8          ≤ 1 − 1 + 0 + 0    ≤ 1 + 1 − 6 + 4
  meld          ≤ 1 + 0 + 1 − 2          ≤ 0 + 0 + 0 + 0    ≤ 0 + 0 − 1 + 1
  delete-min    ≤ (2 lg n + 12 + 4) + 1  ≤ 0 − 1            ≤ R + 1

5. CORRECTNESS

In the following we verify that each operation preserves the invariants I1-I4.

For I1 the interesting property to verify is that the i-th active child of an active node has rank+loss at least i − 1. All other properties in I1 are straightforward to verify. We start by observing that if an active child x is detached from its parent y satisfying I1, then I1 is also satisfied for y after the detachment. This follows since the i-th rightmost active child z after the detachment was either the i-th or (i + 1)-st active child before the detachment, i.e. for z the new rank+loss is at least i − 1. An active node only gets a new active child as a result of an active root reduction, two-node loss reduction, or root degree reduction. In the first two cases a new (r + 1)-st rightmost active child is added with rank r (and loss zero), and in the latter case I1 holds by construction for the two new active nodes.

For invariants I2 and I3, Table 2 captures the change to the degree of the root, the total loss, and the number of active roots when performing the operations decrease-key, meld, and delete-min, respectively. Each entry is a sum of four terms stating the change caused by the initial transformations performed by the operations, and by the loss reduction transformations, active root reductions, and root degree reductions. Each entry is an upper bound on the change, except for the cases where no reduction is possible (e.g. the loss increases, but is still ≤ R + 1). For meld the root degree is the change to the old root that becomes the new root, whereas total loss and number of active roots is compared to the heap that is not made passive. For delete-min the bounds are stated before the repeated active root and root degree transformations are applied.

Observe that for both decrease-key and meld all sums are zero, and that R increases during a meld, i.e. invariants I2 and I3 remain valid for each of these operations. Delete-min reduces the size of the heap by one, reducing R = 2 lg n + 6 by at most one, if n ≥ 4 before the delete-min operation (if n ≤ 3 before the delete-min operation, I2 and I3 are trivially true after). The reduction in loss by one ensures that I3 is valid after delete-min. Since active root reductions are performed until they are not possible, I2 trivially holds — provided that the repeated application of active root reductions and root degree reductions terminate. Termination immediately follows from the facts that both reductions reduce the measure

2 · root degree + 3 · #active roots

by one, and that the initial value of this measure is O(log n). This immediately also implies the claimed time bounds for our heap.

To prove the validity of I4, we first observe that for the degree of the root we can use the same argument as above using Table 2. During the transformations a non-root node can only increase its degree in three cases. During an active root reduction there is no passive right-child to detach. In this case all children of the node are active, and the degree bound follows from Corollary 1. During a two-node loss reduction the degree of the node x increases by one, but this is okay by I4 since the loss of the node decreases from one to zero. During a root degree transformation two passive nodes become active, both getting loss zero and degree increased by one. Again this is okay by I4. During a meld one root becomes a non-root, but since R + 3 ≤ 2 lg(2n − p) + 9 for all possible p, again I4 holds. The interesting case is when we perform a delete-min operation. I4 holds for the root trivially, since we repeatedly perform root degree reductions until none are possible. For non-root nodes we observe that their invariant is strengthened since n decreases. The role of Q is to deal with this case. By removing the two first nodes in the queue, all nodes get their position in Q decreased by two or three (depending if they were in front of the deleted node in the queue). By observing that 2n − p does not decrease in this case, it follows that the invariant for non-root nodes is not strengthened and I4 remains valid. For the two nodes moved to the end of the queue the term 2 lg(2n − p) decreases by two, implying that their degree constraint is strengthened by two. Since we detach two passive nodes from these nodes I4 remains valid for these nodes also. During meld all elements in the smaller heap become passive, and their degree constraint goes from the "+10" to the "+9" case. But since they remain in their position in the queue, and the resulting queue is at least twice the size, we have that 2n − p increases by a factor two, and I4 remains valid for the elements in the smaller heap. For the elements in the larger of the two heaps, they keep their active status. Both n and p increase for these elements by the size of the smaller queue. Therefore 2n − p is non-decreasing and I4 remains valid.

A note on the loss: In the previous description the loss of a node is assumed to be a non-negative integer. Even though the loss plays an essential role in invariant I1, only the values 0, 1, and ≥ 2 are relevant for the algorithm. As soon as the loss is ≥ 2, then the loss can only decrease when it is set to zero by a one-node loss transformation. Since the algorithm only tests if the loss is zero, one or ≥ 2, it is sufficient for the algorithm to keep track of these three states. It follows that we can store the loss using only two bits.

6. REPRESENTATION DETAILS

To represent a heap we have the following types of records, also illustrated in Figures 2 and 3. Each node is represented by a node record. To mark if a node is active or passive we will not store the flag directly in the node but indirectly in an active record. In particular all active nodes of a tree point to the same active record. This allows all nodes of a tree to be made passive in O(1) time by only changing a single flag. The entry point to a heap is a pointer to a heap record, that has a pointer to the root of the tree and additional information required for accessing the relevant nodes for performing the operations described in Section 4.

We call a rank active-root transformable, if there are at least two active roots of that rank. We call a rank loss transformable, if the total loss of the nodes of that rank is at least two. For each rank we maintain a node, and all such nodes belong to the rank-list. The rightmost node corresponds to rank zero, and the node that corresponds to rank k is the left sibling of the one that corresponds to k − 1 (see Figure 3). We maintain all active nodes that potentially could participate in one of the transformations from Section 3 (i.e. active roots and active nodes with positive loss) in a list called the fix-list. Each node with rank k on the fix-list points to node k of the rank-list. The fix-list is divided left-to-right into four parts (1-4), where Parts 1-2 contain active roots and Parts 3-4 contain nodes with positive loss. Part 1 contains the active roots of active-root transformable ranks. All the active roots of the same rank are adjacent, and one of the nodes has a pointer from the corresponding node of the rank-list. Part 2 contains the remaining active roots. Each node has a pointer from the corresponding node of the rank-list. Part 3 contains active nodes with loss one and a rank that is not loss-transformable. Each node of this part has a pointer from the corresponding node of the rank-list. Finally, Part 4 contains all the active nodes of loss transformable rank. As in Part 1, all nodes of equal rank are adjacent and one of the nodes is pointed to by the corresponding node of the rank-list. Observe that for some ranks there may exist only one node in Part 4 (because if the loss of a node is at least two, its rank is loss-transformable).

From the above description it follows that we can always perform an active root reduction as long as Part 1 of the fix-list is nonempty, and we can always perform a loss reduction transformation as long as Part 4 is nonempty. We maintain a pointer (in the heap record) indicating the boundary between Parts 2 and 3. The above construction and pointers (see Figure 3 for more details) allow us to take the following actions: In order to perform a loss reduction transformation, we go to the right-end of the fix-list and perform a one-node loss reduction (if we access a node of multiple loss) or a two-node loss reduction (if the two rightmost nodes of the fix-list have loss one). Otherwise Part 4 is empty. After a loss reduction transformation on a rank k, we may have to transfer one node of rank k into Part 3, if its loss is one and it is the last node in Part 4 with this rank. Whenever the loss of a node that has a loss-transformable rank increases, we insert it into (the appropriate group of) Part 4. If its rank is not loss transformable, we insert it into Part 3, unless there is another node of the same rank there, in which case we move both nodes into Part 4 at the right end of the fix-list.

In order to perform an active root reduction, we go to the left end of the fix-list and link the two leftmost active roots, if they have the same rank (otherwise, Part 1 is empty). If after the reduction, the two leftmost nodes in the fix-list are not active roots of equal degree, we transfer the leftmost node into Part 2. When an active node of rank k becomes an active root and there is no other active root of the same rank, we insert the new active root into Part 2. Otherwise, we insert it adjacent to the active roots of that rank, unless only one active root has this rank (i.e. it is located in Part 2), in which case we transfer the existing active root of rank k with the new one into Part 1 (at the left end of the fix-list). When the rank of a node that belongs to the fix-list changes, we can easily perform the necessary updates to the fix-list so that all parts of the list are consistent with the above description. The details are straightforward and thus omitted.

Figure 2: The fields of a node record x and an active record. The fields of x not shown are the item and the loss.

The details of the fields of the individual records are as follows (see also Figures 2 and 3).

Node record

item A pointer to the item (including its associated key).

left, right, parent, left-child Pointers to the node records for the left and right sibling of the node (the left and right pointers form a cyclic linked list), the parent node, and the leftmost child. The latter two are NULL if the nodes do not exist.

active Pointer to an active record, indicating if the node is active or passive.

Q-prev, Q-next Pointers to the node records for the previous and the next node on the queue Q, which is maintained as a cyclic linked list.

Figure 3: The heap-record, rank-list, and fix-list. The ref-count field in the records of the rank-list is not shown. Only pointers for nodes with rank equal to one are shown. The numbers in the nodes in the rank-list and fix-list are the ranks of the active nodes pointing to these nodes; these numbers are not stored.

loss Non-negative integer equal to the loss of the node. Undefined if the node is passive.

rank If the node is passive, the value of the pointer is not defined (it points to some old node that has been released for garbage collection). If the node is an active root or an active node with positive loss (i.e. it is on the fix-list), rank points to the corresponding record in the fix-list. Otherwise rank points to the record in the rank-list corresponding to the rank of the node. (The cases can be distinguished using the active field of the node and the parent together with the loss field.)

Active record

flag A Boolean value. Nodes pointing to this record are active if and only if the flag is true.

ref-count The number of nodes pointing to the record. If flag is false and ref-count = 0, then the node is released for garbage collection.

Heap record

size An integer equal to the number of items in the heap.

root A pointer to the node record of the root (NULL if and only if the heap is empty).

active-record A pointer to the active record shared by all active nodes in the heap (one distinct active record for each heap).

non-linkable-child A pointer to the node record of the leftmost passive non-linkable child of the root. If all passive children of the root are linkable the pointer is to the rightmost active child of the root. Otherwise it is NULL.

Q-head A pointer to the node record for the node that is the head of the queue Q.

rank-list A pointer to the rightmost record in the rank-list.

fix-list A pointer to the rightmost node in the fix-list.

singles A pointer to the leftmost node in the fix-list with a positive loss, if such a node exists. Otherwise it is NULL.

Rank-list record (representing rank r)

inc, dec Pointers to the records on the rank-list for rank r + 1 and r − 1, if they exist. Otherwise they are NULL.

loss A pointer to a record in the fix-list for an active node with rank r and positive loss. NULL if no such node exists.

active-roots A pointer to a record in the fix-list for an active root with rank r. NULL if no such node exists.

ref-count The number of node records and fix-list records pointing to this record. If the leftmost record on the rank-list gets a ref-count = 0, then the record is deleted from the rank-list and is released for garbage collection.

Fix-list record

node A pointer to the node record for the node.

left, right Pointers to the left and right siblings on the fix-list, that is maintained as a cyclic linked list.

rank A pointer to the record in the rank-list corresponding to the rank of this node.

A detail concerning garbage collection: When performing a meld operation on two heaps, all nodes in one heap are first made passive by clearing the flag in the active record given by the heap record. The heap record, the rank-list, and the fix-list for this heap are released for incremental garbage collection.

We now bound the space required by our structure. For each item we have one node record and possibly one fix-list record. The number of active records is bounded by the number of nodes, since in the worst case each active record has a ref-count = 1. Finally for each heap we have one heap record and a number of rank-list records bounded by one plus the maximum rank of a node, i.e. logarithmic in the size of the heap. It follows that the total space for a heap is linear in the number of stored items.

7. CONCLUSION

We have described the first pointer-based heap implementation achieving the performance of Fibonacci heaps in the worst case. What we presented is just one possible implementation. Based on the active/passive node idea we have considered many alternative implementations. One option is to allow a third explicit marking "active root" (instead of an active root being a function of the active/passive marks), such that a child of an active node can also be an active root. This eliminates the distinction between passive linkable and non-linkable nodes. Another option is to allow the root to be active also. This simplifies the decrease-key operation, since then the node getting a smaller key can also be made the root, without ever swapping items in the nodes. A third option is to adopt redundant counters instead of the pigeonhole principle to bound the number of active roots and holes created by cutting off subtrees. Further alternatives are to consider ternary linking instead of binary linking, and using a different "inactivation" criterion during meld, e.g. size plus number of active nodes. All these variations can be combined, each solution implying different bounds on the maximum degrees, constants in the time bounds, and complexity in the reduction transformations. In the presented solution we aimed at reducing the complexity in the description, whereas the constants in the solution were of secondary interest.

8. REFERENCES

[1] Gerth Stølting Brodal. Fast meldable priority queues. In Proc. 4th International Workshop on Algorithms and Data Structures, volume 955 of Lecture Notes in Computer Science, pages 282–290. Springer, 1995.

[2] Gerth Stølting Brodal. Worst-case efficient priority queues. In Proc. 7th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 52–58. SIAM, 1996.

[3] Svante Carlsson, J. Ian Munro, and Patricio V. Poblete. An implicit binomial queue with constant insertion time. In Proc. 1st Scandinavian Workshop on Algorithm Theory, volume 318 of Lecture Notes in Computer Science, pages 1–13. Springer, 1988.

[4] Timothy M. Chan. Quake heaps: a simple alternative to Fibonacci heaps. Manuscript, 2009.

[5] Clark Allan Crane. Linear lists and priority queues as balanced binary trees. PhD thesis, Stanford University, Stanford, CA, USA, 1972.

[6] James R. Driscoll, Harold N. Gabow, Ruth Shrairman, and Robert E. Tarjan. Relaxed heaps: An alternative to Fibonacci heaps with applications to parallel computation. Communications of the ACM, 31(11):1343–1354, 1988.

[7] Amr Elmasry. Layered heaps. In Proc. 9th Scandinavian Workshop on Algorithm Theory, volume 3111 of Lecture Notes in Computer Science, pages 212–222. Springer, 2004.

[8] Amr Elmasry. Parameterized self-adjusting heaps. Journal of Algorithms, 52(2):103–119, 2004.

[9] Amr Elmasry. Pairing heaps with O(log log n) decrease cost. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 471–476. SIAM, 2009.

[10] Amr Elmasry. The violation heap: A relaxed Fibonacci-like heap. In Proc. 16th Annual International Conference on Computing and Combinatorics, volume 6196 of Lecture Notes in Computer Science, pages 479–488. Springer, 2010.

[11] Amr Elmasry, Claus Jensen, and Jyrki Katajainen. On the power of structural violations in priority queues. In Proc. 13th Computing: The Australasian Theory Symposium, volume 65 of CRPIT, pages 45–53. Australian Computer Society, 2007.

[12] Amr Elmasry, Claus Jensen, and Jyrki Katajainen. Two-tier relaxed heaps. Acta Informatica, 45(3):193–210, 2008.

[13] Michael L. Fredman. On the efficiency of pairing heaps and related data structures. Journal of the ACM, 46(4):473–501, 1999.

[14] Michael L. Fredman, Robert Sedgewick, Daniel Dominic Sleator, and Robert Endre Tarjan. The pairing heap: A new form of self-adjusting heap. Algorithmica, 1(1):111–129, 1986.

[15] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34(3):596–615, 1987.

[16] Bernhard Haeupler, Siddhartha Sen, and Robert Endre Tarjan. Rank-pairing heaps. In Proc. 17th Annual European Symposium on Algorithms, volume 5757 of Lecture Notes in Computer Science, pages 659–670. Springer, 2009. SIAM Journal on Computing, to appear.

[17] Peter Høyer. A general technique for implementation of efficient priority queues. In Proc. Third Israel Symposium on the Theory of Computing and Systems, pages 57–66, 1995.

[18] Haim Kaplan and Robert E. Tarjan. New heap data structures. Technical report, Department of Computer Science, Princeton University, 1999.

[19] Haim Kaplan and Robert Endre Tarjan. Thin heaps, thick heaps. ACM Transactions on Algorithms, 4(1), 2008.

[20] Donald E. Knuth. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley, 1973.

[21] Gary L. Peterson. A balanced tree scheme for meldable heaps with updates. Technical Report GIT-ICS-87-23, School of Information and Computer Science, Georgia Institute of Technology, 1987.

[22] Seth Pettie. Towards a final analysis of pairing heaps. In Proc. 46th Annual IEEE Symposium on Foundations of Computer Science, pages 174–183. IEEE Computer Society, 2005.

[23] Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting heaps. SIAM Journal on Computing, 15(1):52–69, 1986.

[24] Jean Vuillemin. A data structure for manipulating priority queues. Communications of the ACM, 21(4):309–315, 1978.

[25] John William Joseph Williams. Algorithm 232: Heapsort. Communications of the ACM, 7(6):347–348, 1964.

The Cell Probe Complexity of Dynamic Range Counting

Kasper Green Larsen∗
MADALGO,† Department of Computer Science
Aarhus University
Aarhus, Denmark
[email protected]

ABSTRACT

In this paper we develop a new technique for proving lower bounds on the update time and query time of dynamic data structures in the cell probe model. With this technique, we prove the highest lower bound to date for any explicit problem, namely a lower bound of tq = Ω((lg n / lg(w·tu))²). Here n is the number of update operations, w the cell size, tq the query time and tu the update time. In the most natural setting of cell size w = Θ(lg n), this gives a lower bound of tq = Ω((lg n / lg lg n)²) for any polylogarithmic update time. This bound is almost a quadratic improvement over the highest previous lower bound of Ω(lg n), due to Pătrașcu and Demaine [SICOMP'06].

We prove our lower bound for the fundamental problem of weighted orthogonal range counting. In this problem, we are to support insertions of two-dimensional points, each assigned a Θ(lg n)-bit integer weight. A query to this problem is specified by a point q = (x, y), and the goal is to report the sum of the weights assigned to the points dominated by q, where a point (x′, y′) is dominated by q if x′ ≤ x and y′ ≤ y. In addition to being the highest cell probe lower bound to date, our lower bound is also tight for data structures with update time tu = Ω(lg^(2+ε) n), where ε > 0 is an arbitrarily small constant.

Categories and Subject Descriptors

E.1 [Data]: Data Structures

Keywords

Data Structures, Cell Probe, Lower Bounds, Range Searching, Computational Geometry

1. INTRODUCTION

Proving lower bounds on the operational time of data structures has been an active line of research for decades. During these years, numerous models of computation have been proposed, including the cell probe model of Yao [12]. The cell probe model is one of the least restrictive lower bound models, thus lower bounds proved in the cell probe model apply to essentially every imaginable data structure, including those developed in the popular upper bound model, the word RAM. Unfortunately this generality comes at a cost: The highest lower bound that has been proved for any explicit data structure problem is Ω(lg n), both for static and even dynamic data structures¹.

In this paper, we break this barrier by introducing a new technique for proving dynamic cell probe lower bounds. Using our technique, we obtain a query time lower bound of Ω((lg n / lg lg n)²) for any polylogarithmic update time. We prove the bound for the fundamental problem of dynamic weighted orthogonal range counting in two-dimensional space. In dynamic weighted orthogonal range counting (in 2-d), the goal is to maintain a set of (2-d) points under insertions, where each point is assigned an integer weight. In addition to supporting insertions, a data structure must support answering queries. A query is specified by a query point q = (x, y), and the data structure must return the sum of the weights assigned to the points dominated by q. Here we say that a point (x′, y′) is dominated by q if x′ ≤ x and y′ ≤ y.

1.1 The Cell Probe Model

A dynamic data structure in the cell probe model consists of a set of memory cells, each storing w bits. Each cell of the data structure is identified by an integer address, which is assumed to fit in w bits, i.e. each address is amongst [2^w] = {0, ..., 2^w − 1}. We will make the additional standard assumption that a cell also has enough bits to address any update operation performed on it, i.e. we assume w = Ω(lg n) when analysing a data structure's performance on a sequence of n updates.

When presented with an update operation, a data structure reads and updates a number of the stored cells to reflect the changes. We refer to the reading or writing of a cell as probing the cell, hence the name cell probe model. The update time of a data structure is the number of cells probed when processing an update operation. To answer a query, a data structure similarly probes a

∗ Kasper Green Larsen is a recipient of the Google Europe Fellowship in Search and Information Retrieval, and this research is also supported in part by this Google Fellowship.
† Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.
¹ This is true under the most natural assumption of cell size Θ(lg n).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
STOC'12, May 19–22, 2012, New York, New York, USA.
Copyright 2012 ACM 978-1-4503-1245-5/12/05 ...$10.00.

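As a concrete reference point, the problem just defined can be captured by a deliberately naive implementation (a sketch of our own, not taken from the paper; it inspects every stored point on each query and therefore says nothing about cell probe complexity):

```python
# Naive reference implementation of dynamic weighted orthogonal range
# counting (illustration only; the class and method names are ours).
class NaiveRangeCounter:
    def __init__(self):
        self.points = []  # list of (x, y, weight) triples

    def insert(self, x, y, weight):
        # Insert a two-dimensional point with an integer weight.
        self.points.append((x, y, weight))

    def query(self, qx, qy):
        # Sum the weights of all points (x, y) dominated by (qx, qy),
        # i.e. with x <= qx and y <= qy.
        return sum(w for x, y, w in self.points if x <= qx and y <= qy)
```

Insertions here take constant time while queries take linear time; the theorem of this paper concerns how small the query time can be made once both operations must be fast.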
Previous Results.
In the following, we give a brief overview of the most important techniques that have been introduced for proving cell probe lower bounds for dynamic data structures. We also review the previous cell probe lower bounds obtained for orthogonal range counting and related problems. In Section 2 we then give a more thorough review of the previous techniques most relevant to our work, followed by a description of the key ideas in our new technique.

In their seminal paper [2], Fredman and Saks introduced the celebrated chronogram technique. They applied their technique to the partial sums problem and obtained a lower bound stating that t_q = Ω(lg n / lg(w·t_u)), where t_q is the query time and t_u the update time. In the partial sums problem, we are to maintain an array of n O(w)-bit integers under updates of the entries. A query to the problem consists of two indices i and j, and the goal is to compute the sum of the integers in the subarray from index i to j. The lower bound of Fredman and Saks holds even when the data structure is allowed amortization and randomization.

The bounds of Fredman and Saks remained the highest achieved until the breakthrough results of Pătraşcu and Demaine [9]. In their paper, they extended upon the ideas of Fredman and Saks to give a tight lower bound for the partial sums problem. Their results state that t_q lg(t_u/t_q) = Ω(lg n) and t_u lg(t_q/t_u) = Ω(lg n) when the integers have Ω(w) bits, which in particular implies max{t_q, t_u} = Ω(lg n). We note that they also obtain tight lower bounds in the regime of smaller integers. Again, the bounds hold even when allowed amortization and randomization. For the most natural cell size of w = Θ(lg n), this remains until today the highest achieved lower bound.

The two above techniques both lead to smooth tradeoff curves between update time and query time. While this behaviour is correct for the partial sums problem, there are many examples where this is certainly not the case. Pătraşcu and Thorup [10] recently presented a new extension of the chronogram technique, which can prove strong threshold lower bounds. In particular they showed that any data structure for maintaining the connectivity of a graph under edge insertions and deletions must either have amortized update time Ω(lg n), or the query time explodes to n^{1−o(1)}.

In the search for super-logarithmic lower bounds, Pătraşcu introduced a dynamic set-disjointness problem named the multiphase problem [8]. Based on a widely believed conjecture about the hardness of 3-SUM, Pătraşcu first reduced 3-SUM to the multiphase problem and then gave a series of reductions to different dynamic data structure problems, implying polynomial lower bounds under the 3-SUM conjecture.

Finally, we mention that Pătraşcu [6] presented a technique capable of proving a query time lower bound of t_q = Ω((lg n / lg(w·t_u))²) for dynamic weighted orthogonal range counting, but only when the weights are lg^{2+Ω(1)} n-bit integers. Thus the magnitude of the lower bound, compared to the number of bits δ needed to describe an update operation or a query, remains below Ω(δ). This bound holds when t_u is the worst case update time and t_q the expected average² query time of a data structure.

The particular problem of orthogonal range counting has received much attention from a lower bound perspective. In the static case, Pătraşcu [6] first proved a lower bound of t = Ω(lg n / lg(Sw/n)), where t is the expected average query time and S the space of the data structure in number of cells. This lower bound holds for regular counting (without weights), and even when just the parity of the number of points in the range is to be returned. In [7] he reproved this bound using an elegant reduction from the communication game known as lop-sided set disjointness. Subsequently, Jørgensen and Larsen [3] proved a matching bound for the strongly related problems of range selection and range median. Finally, as mentioned earlier, Pătraşcu [6] proved a t_q = Ω((lg n / lg(w·t_u))²) lower bound for dynamic weighted orthogonal range counting when the weights are lg^{2+Ω(1)} n-bit integers. In the concluding remarks of that paper, he posed it as an interesting open problem to prove the same lower bound for regular counting.

Our Results.
In this paper we introduce a new technique for proving dynamic cell probe lower bounds. Using this technique, we obtain a lower bound of t_q = Ω((lg n / lg(w·t_u))²), where t_u is the worst case update time and t_q is the expected average query time of the data structure. Our lower bound holds for any cell size w = Ω(lg n), and is the highest achieved to date in the most natural setting of cell size w = Θ(lg n). For polylogarithmic t_u and logarithmic cell size, this bound is t_q = Ω((lg n / lg lg n)²), i.e. almost a quadratic improvement over the highest previous lower bound of Pătraşcu and Demaine.

We prove our lower bound for dynamic weighted orthogonal range counting in two-dimensional space, where the weights are Θ(lg n)-bit integers. This gives a partial answer to the open problem posed by Pătraşcu, by reducing the requirement on the magnitude of the weights from lg^{2+Ω(1)} n bits to just logarithmic. Finally, our lower bound is also tight for update time t_u = lg^{2+Ω(1)} n, hence deepening our understanding of one of the most fundamental range searching problems.

Overview.
In Section 2 we discuss the two previous techniques most related to ours, i.e. that of Fredman and Saks [2] and of Pătraşcu [6]. Following this discussion we give a description of the key ideas behind our new technique. Having introduced our technique, we proceed to the lower bound proof in Section 3, and finally we conclude in Section 5 with a discussion of the limitations of our technique and the intriguing open problems these limitations pose.

2. TECHNIQUES

In this section, we first review the two previous techniques most important to our work, and then present our new technique.

²i.e. for any data structure with a possibly randomized query algorithm, there exists a sequence of updates U, such that the expected cost of answering a uniform random query after the updates U is t_q.

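For contrast with the Ω(lg n)-type lower bounds discussed above, the partial sums problem admits a classic matching upper bound: a Fenwick (binary indexed) tree supports both updates and range-sum queries in O(lg n) time. The following sketch is our own illustration of that standard textbook structure, not code from the paper:

```python
# Fenwick tree: O(lg n) point updates and prefix/range sums,
# matching the Omega(lg n) partial sums lower bound.
class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed internal array

    def add(self, i, delta):
        # Add delta to entry i (0-indexed).
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # jump to the next responsible node

    def prefix(self, i):
        # Sum of entries 0..i inclusive.
        s = 0
        i += 1
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)  # strip the lowest set bit
        return s

    def range_sum(self, i, j):
        # Sum of entries i..j inclusive, as in a partial sums query.
        return self.prefix(j) - (self.prefix(i - 1) if i > 0 else 0)
```

Both loops touch one node per set bit of the index, hence O(lg n) cells per operation, which is why the tight bounds of Pătraşcu and Demaine settle this problem completely.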
Fredman and Saks [2].
This technique is known as the chronogram technique. The updates are divided into epochs, where the i'th epoch consists of performing β^i carefully chosen updates and β > 1 is a parameter. For a fixed epoch i, the argument is set up as a simulation between two players: it is accomplished by first letting Bob send all cells associated to epochs j < i; Alice can perform the updates of the preceding epochs (epochs j > i) herself, and she therefore knows the cells associated to epochs j > i.

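The bookkeeping at the heart of the chronogram approach, tagging each memory cell with the epoch that last wrote it, can be made concrete with a toy simulation (entirely our own illustration; the write pattern below is an arbitrary stand-in, not the paper's hard distribution):

```python
from collections import defaultdict

def run_epochs(num_epochs, beta, writer):
    """Simulate epochs num_epochs, ..., 1 (biggest first), where epoch i
    performs beta**i updates and writer(i, j) yields the cell addresses
    written by update j of epoch i. Returns a map from cell address to
    the last epoch that wrote it."""
    owner = {}
    for i in range(num_epochs, 0, -1):  # epochs occur biggest to smallest
        for j in range(beta ** i):
            for addr in writer(i, j):
                owner[addr] = i
    return owner

# Toy writer: update j of any epoch writes cell j, so the later (smaller)
# epochs overwrite a prefix of the address space.
owner = run_epochs(3, 2, lambda i, j: [j])

sets = defaultdict(set)
for addr, i in owner.items():
    sets[i].add(addr)  # S_i: cells last updated during epoch i
```

With β = 2 and three epochs, cells 0–1 end up associated to epoch 1, cells 2–3 to epoch 2, and cells 4–7 to epoch 3: the sets S_i are disjoint by construction, which is exactly the property the lower bound argument exploits.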
...smaller than the entropy of the β^i updates of epoch i: letting S_i denote the cells associated to epoch i, the encoder finds a subset of cells S′_i ⊆ S_i such that a large number of queries are resolved by S′_i. He then sends a description of those cells and proceeds by finding a subset Q of the queries resolved by S′_i, such that knowing the answer to all queries in Q reduces the entropy of the updates of epoch i by more than the number of bits needed to describe S′_i, Q and the cells associated to epochs j < i. The main difficulty lies in showing that such a set Q exists. In the next section, we demonstrate the technique on a concrete problem by proving our lower bound for dynamic weighted orthogonal range counting.

3. DYNAMIC LOWER BOUND

In this section we prove our main result, which we have formulated in the following theorem:

Theorem 1. Any data structure for dynamic weighted orthogonal range counting in the cell probe model must satisfy t_q = Ω((lg n / lg(w·t_u))²). Here t_q is the expected average query time and t_u the worst case update time. This lower bound holds when the weights of the inserted points are Θ(lg n)-bit integers.

We prove Theorem 1 by devising a hard distribution over updates, followed by one uniform random query. We then lower bound the expected cost (over the distribution) of answering the query for any deterministic data structure with worst case update time t_u. By Yao's principle [11], this translates into a lower bound on the expected average query time of a possibly randomized data structure. In our proof we assume the weights are 4 lg n-bit integers, and note that our lower bound applies to any ε lg n-bit weights, where ε > 0 is an arbitrarily small constant, simply because a data structure for ε lg n-bit integer weights can be used to solve the problem for any O(lg n)-bit integer weights with a constant factor overhead: divide the bits of the weights into δ/(ε lg n) = O(1) chunks and maintain a data structure for each chunk. We begin our proof by presenting our hard distribution over updates and queries.

Hard Distribution.
Our updates arrive in batches, or epochs, of exponentially decreasing size. For i = 1, ..., lg_β n we define epoch i as a sequence of β^i updates, for a parameter β > 1 to be fixed later. The epochs occur in time from biggest to smallest epoch, and at the end of epoch 1 we execute a uniform random query in [n] × [n]. What remains is to specify which updates are performed in each epoch: the updates of epoch i insert the β^i points of a scaled Fibonacci lattice, i.e. the j'th update of epoch i inserts the point (j·n/β^i, (j·f_{k_i−1} mod β^i)·n/β^i) and assigns it a uniform random weight amongst the integers [Δ], where Δ denotes the largest prime number smaller than n⁴. These point sets are spread out very evenly:

Lemma 1. For any α > 0, any axis-aligned rectangle in [0, n − n/β^i] × [0, n − n/β^i] with area αn²/β^i contains between ⌊α/a₁⌋ and ⌈α/a₂⌉ of the points inserted in epoch i, where a₁ ≈ 1.9 and a₂ ≈ 0.45.

Note that we assume each β^i to be a Fibonacci number (denoted f_{k_i}), and that each β^i divides n. These assumptions can easily be removed by fiddling with the constants, but this would only clutter the exposition.

For the remainder of the paper, we let U_i denote the random variable giving the sequence of updates in epoch i, and we let U = U_{lg_β n} ··· U_1 denote the random variable giving all updates of all lg_β n epochs. Finally, we let q be the random variable giving the query.

A Chronogram Approach.
Having defined our hard distribution over updates and queries, we now give a high-level proof of our lower bound. Assume a deterministic data structure solution exists with worst case update time t_u. From this data structure and a sequence of updates U = U_{lg_β n}, ..., U_1, where each U_j is a possible outcome of U_j, we define S(U) to be the set of cells stored in the data structure after executing the updates U. Now associate each cell in S(U) to the last epoch in which its contents were updated, and let S_i(U) denote the subset of S(U) associated to epoch i, for i = 1, ..., lg_β n. Also let t_i(U, q) denote the number of cells in S_i(U) probed by the query algorithm of the data structure when answering the query q ∈ [n] × [n] after the sequence of updates U. Finally, let t_i(U) denote the average cost of answering a query q ∈ [n] × [n] after the sequence of updates U, i.e. let t_i(U) = Σ_{q∈[n]×[n]} t_i(U, q)/n².

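The Fibonacci-lattice point set of a single epoch can be generated directly from its definition above. The sketch below is our own, with toy parameters (m plays the role of β^i and f_prev the role of f_{k_i−1}):

```python
def epoch_points(n, m, f_prev):
    """The m fixed points inserted in one epoch, where m is a Fibonacci
    number f_k dividing n and f_prev = f_{k-1}: the j'th point is
    (j*n/m, ((j*f_prev) % m) * n/m)."""
    return [(j * n // m, ((j * f_prev) % m) * n // m) for j in range(m)]

# Toy epoch: m = f_7 = 13 points, f_prev = f_6 = 8, grid side n = 26.
pts = epoch_points(26, 13, 8)
```

Because consecutive Fibonacci numbers are coprime, j·f_{k−1} mod f_k is a permutation of [f_k], so the points have pairwise distinct x-coordinates and pairwise distinct y-coordinates: one point per row and per column of a β^i × β^i grid, which is what makes the even-spread bound of Lemma 1 plausible.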
Lemma 2. If β = (w·t_u)⁹, then E[t_i(U, q)] = Ω(lg_β n) for all i ≥ (15/16) lg_β n.

Before giving the proof of Lemma 2, we show that it implies Theorem 1: Let β be as in Lemma 2. Since the cell sets S_{lg_β n}(U), ..., S_1(U) are disjoint, we get that the number of cells probed when answering the query q is Σ_i t_i(U, q). It now follows immediately from linearity of expectation that the expected number of cells probed when answering q is Ω(lg_β n · lg_β n) = Ω((lg n / lg(w·t_u))²), which completes the proof of Theorem 1.

The hard part thus lies in proving Lemma 2, i.e. in showing that the random query must probe a lot of cells associated to each of the epochs i = (15/16) lg_β n, ..., lg_β n. This will be the focus of the next section.

3.1 Bounding the Probes to Epoch i

As also pointed out in Section 2, we prove Lemma 2 using an encoding argument. Assume for contradiction that there exists a data structure solution such that under our hard distribution, with β = (w·t_u)⁹, there exists an epoch i* ≥ (15/16) lg_β n such that the claimed data structure satisfies E[t_{i*}(U, q)] = o(lg_β n).

First observe that U_{i*} is independent of U_{lg_β n} ··· U_{i*+1}, i.e. H(U_{i*} | U_{lg_β n} ··· U_{i*+1}) = H(U_{i*}), where H(·) denotes binary entropy. Furthermore, we have H(U_{i*}) = β^{i*} lg Δ, since the updates of epoch i* consist of inserting β^{i*} fixed points, each with a uniform random weight amongst the integers [Δ]. Our goal is to show that, conditioned on U_{lg_β n} ··· U_{i*+1}, we can use the claimed data structure solution to encode U_{i*} in less than H(U_{i*}) bits in expectation, i.e. a contradiction.

We view this encoding step as a game between an encoder and a decoder. The encoder receives as input the sequence of updates U = U_{lg_β n}, ..., U_1. The encoder now examines these updates and from them sends a message to the decoder (an encoding). The decoder then sees this message, as well as U_{lg_β n}, ..., U_{i*+1} (we conditioned on these variables), and must from this uniquely recover U_{i*}. If we can design a procedure for constructing and decoding the encoder's message, such that the expected size of the message is less than H(U_{i*}) = β^{i*} lg Δ bits, then we have reached our contradiction.

Before presenting our encoding and decoding procedures, we show exactly what breaks down if the claimed data structure probes too few cells from epoch i*. For this, we first introduce some terminology. For a query point q = (x, y) ∈ [n] × [n], we define for every epoch i = 1, ..., lg_β n the incidence vector χ_i(q), as a vector in {0, 1}^{β^i}. The j'th coordinate of χ_i(q) is 1 if the j'th point inserted in epoch i is dominated by q, and 0 otherwise. More formally, for a query q = (x, y), the j'th coordinate χ_i(q)_j is given by:

χ_i(q)_j = 1 if j·n/β^i ≤ x ∧ (j·f_{k_i−1} mod β^i)·n/β^i ≤ y, and 0 otherwise.

Similarly, we define for a fixed sequence of updates U_i, where U_i is a possible outcome of U_i, the β^i-dimensional vector u_i, for which the j'th coordinate equals the weight assigned to the j'th inserted point in U_i. We note that U_i and u_i uniquely specify each other, since all possible outcomes of U_i insert the same fixed points; only the weights vary. Finally observe that the answer to a query q after a fixed sequence of updates U_{lg_β n}, ..., U_1 is Σ_{i=1}^{lg_β n} ⟨χ_i(q), u_i⟩, where ⟨·,·⟩ denotes the standard inner product.

With these definitions, we now present the main result forcing a data structure to probe many cells from each epoch:

Lemma 3. Let i ≥ (15/16) lg_β n be an epoch, and let U = U_{lg_β n}, ..., U_1 be a fixed sequence of updates, where each U_j is a possible outcome of U_j. If t_i(U) = o(lg_β n), then there exists a subset of cells C_i ⊆ S_i(U) and a set of query points Q ⊆ [n] × [n] such that:

1. |C_i| = O(β^{i−1} w).
2. |Q| = Ω(β^{i−3/4}).
3. The set of incidence vectors χ_i(Q) = {χ_i(q) | q ∈ Q} is a linearly independent set of vectors in [Δ]^{β^i}.
4. The query algorithm of the data structure solution probes no cells in S_i(U) \ C_i when answering a query q ∈ Q after the sequence of updates U.

The contradiction that this lemma intuitively gives us is that the queries in Q reveal more information about U_i than the bits in C_i can describe. We note that Lemma 3 essentially is a generalization of the results proved in the static range counting papers [6, 3], simply phrased in terms of cell subsets answering many queries instead of communication complexity. Since the proof contains only few new ideas, we have deferred it to Section 4 and instead move on to show how we use this lemma in our encoding and decoding procedures.

Encoding.
Let U = U_{lg_β n}, ..., U_1 be a fixed sequence of updates given as input to the encoder, where each U_j is a possible outcome of U_j. Also let i* ≥ (15/16) lg_β n be the epoch for which E[t_{i*}(U, q)] = o(lg_β n). We construct the message of the encoder by the following procedure:

1. First the encoder executes the sequence of updates U on the claimed data structure, and from this obtains the sets S_{lg_β n}(U), ..., S_1(U). He then simulates the query algorithm on the data structure for every query q ∈ [n] × [n]. From this, the encoder computes t_{i*}(U) (just the average number of cells in S_{i*}(U) that are probed).

2. If t_{i*}(U) > 2E[t_{i*}(U, q)], then the encoder writes a 1-bit, followed by ⌈β^{i*} lg Δ⌉ = H(U_{i*}) + O(1) bits, simply specifying each weight assigned to a point in U_{i*} (this can be done in the claimed amount of bits by interpreting the weights as one big integer in [Δ^{β^{i*}}]). This is the complete message sent to the decoder when t_{i*}(U) > 2E[t_{i*}(U, q)].

3. If t_{i*}(U) ≤ 2E[t_{i*}(U, q)], then the encoder first writes a 0-bit. Now since t_{i*}(U) ≤ 2E[t_{i*}(U, q)] = o(lg_β n), we get from Lemma 3 that there must exist a set of cells C_{i*} ⊆ S_{i*}(U) and a set of queries Q ⊆ [n] × [n] satisfying 1–4 in Lemma 3. The encoder finds such sets C_{i*} and Q simply by trying all possible sets in some arbitrary but fixed order (given two candidate sets C_{i*} and Q, it is straightforward to verify whether they satisfy properties 1–4 of Lemma 3). The encoder now writes down these two sets, including addresses and contents of the cells in C_{i*}.

This takes a total of at most O(w) + 2|C_{i*}|w + lg(n² choose |Q|) bits (the O(w) bits specify |C_{i*}| and |Q|).

4. The encoder now constructs a set X, such that X = χ_{i*}(Q) initially. Then he iterates through all vectors in {0, 1}^{β^{i*}}, in some arbitrary but fixed order, and for each such vector x, checks whether x is in span(X). If not, the encoder adds x to X. This process continues until dim(span(X)) = β^{i*}, at which point the encoder computes and writes down ⟨x, u_{i*}⟩ mod Δ for each x that was added to X. Since dim(span(χ_{i*}(Q))) = |Q| (by point 3 in Lemma 3), this adds a total of (β^{i*} − |Q|) lg Δ bits to the message (by interpreting the values as one big integer in [Δ^{β^{i*}−|Q|}]).

5. Finally, the encoder writes down all of the cell sets S_{i*−1}(U), ..., S_1(U), including addresses and contents, plus all of the vectors u_{i*−1}, ..., u_1. This takes at most Σ_{j=1}^{i*−1} (2|S_j(U)|w + β^j lg Δ + O(w)) bits. When this is done, the encoder sends the constructed message to the decoder.

Before analysing the expected size of the encoding, we present the decoding procedure.

Decoding.
In the following, we describe our decoding procedure. The decoder receives as input the updates U_{lg_β n}, ..., U_{i*+1} and the message from the encoder. The decoder now recovers U_{i*} by the following procedure:

1. The decoder examines the first bit of the message. If this bit is 1, then the decoder immediately recovers U_{i*} from the encoding (step 2 in the encoding procedure). If not, the decoder instead executes the updates U_{lg_β n} ··· U_{i*+1} on the claimed data structure solution and obtains the cell sets S^{i*+1}_{lg_β n}(U), ..., S^{i*+1}_{i*+1}(U), where S^{i*+1}_j(U) contains the cells that were last updated during epoch j when executing only the updates U_{lg_β n}, ..., U_{i*+1} (and not the entire sequence of updates U_{lg_β n}, ..., U_1).

2. The decoder now recovers Q, C_{i*}, S_{i*−1}(U), ..., S_1(U) and u_{i*−1}, ..., u_1 from the encoding. For each query q ∈ Q, the decoder then computes the answer to q as if all updates U_{lg_β n}, ..., U_1 had been performed. The decoder accomplishes this by simulating the query algorithm on q, and for each cell requested, the decoder recovers the contents of that cell as it would have been if all updates U_{lg_β n}, ..., U_1 had been performed. This is done in the following way: when the query algorithm requests a cell c, the decoder first determines whether c is in one of the sets S_{i*−1}(U), ..., S_1(U). If so, the correct contents of c (the contents after the updates U = U_{lg_β n}, ..., U_1) are directly recovered. If c is not amongst these cells, the decoder checks whether c is in C_{i*}; if so, we have again recovered the contents. Finally, if c is not in C_{i*}, then from point 4 of Lemma 3 we get that c is not in S_{i*}(U). Since c is not in any of S_{i*}(U), ..., S_1(U), this means that the contents of c have not changed during the updates U_{i*}, ..., U_1, and thus the decoder finally recovers the contents of c from S^{i*+1}_{lg_β n}(U), ..., S^{i*+1}_{i*+1}(U). The decoder can therefore recover the answer to each query q in Q as if it had been executed after the sequence of updates U_{lg_β n}, ..., U_1, i.e. for all q ∈ Q, he knows Σ_{i=1}^{lg_β n} ⟨χ_i(q), u_i⟩.

3. The next decoding step consists of computing, for each query q in Q, the value ⟨χ_{i*}(q), u_{i*}⟩. For each q ∈ Q, the decoder already knows the value Σ_{i=1}^{lg_β n} ⟨χ_i(q), u_i⟩ from the above. From the encoding of u_{i*−1}, ..., u_1, the decoder can compute the value Σ_{i=1}^{i*−1} ⟨χ_i(q), u_i⟩, and finally from U_{lg_β n}, ..., U_{i*+1} the decoder computes Σ_{i=i*+1}^{lg_β n} ⟨χ_i(q), u_i⟩. The decoder can now recover the value ⟨χ_{i*}(q), u_{i*}⟩ simply by observing that ⟨χ_{i*}(q), u_{i*}⟩ = Σ_{i=1}^{lg_β n} ⟨χ_i(q), u_i⟩ − Σ_{i≠i*} ⟨χ_i(q), u_i⟩.

4. Now from the query set Q, the decoder constructs the set of vectors X = χ_{i*}(Q), and then iterates through all vectors in {0, 1}^{β^{i*}}, in the same fixed order as the encoder. For each such vector x, the decoder again verifies whether x is in span(X), and if not, adds x to X and recovers ⟨x, u_{i*}⟩ mod Δ from the encoding. The decoder now constructs the β^{i*} × β^{i*} matrix A, having the vectors in X as rows. Similarly, he constructs the vector z having one coordinate for each row of A: the coordinate of z corresponding to a row vector x has the value ⟨x, u_{i*}⟩ mod Δ. Note that this value is already known to the decoder, regardless of whether x is in χ_{i*}(Q) (simply take modulo Δ on the value ⟨χ_{i*}(q), u_{i*}⟩ computed above for each vector in χ_{i*}(Q)) or was added later. Since A has full rank, and since the set [Δ] endowed with integer addition and multiplication modulo Δ is a finite field, it follows that the system of equations A ⊗ y = z has a unique solution y ∈ [Δ]^{β^{i*}} (here ⊗ denotes matrix-vector multiplication modulo Δ). But u_{i*} ∈ [Δ]^{β^{i*}} and A ⊗ u_{i*} = z, thus the decoder now solves the linear system of equations A ⊗ y = z, uniquely recovers u_{i*}, and therefore also U_{i*}. This completes the decoding procedure.
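The decoder's last step, solving A ⊗ y = z over the finite field defined by the prime Δ, can be sketched with standard Gaussian elimination modulo a prime (our own toy-sized code, not from the paper):

```python
def solve_mod_p(A, z, p):
    """Solve A y = z (mod p) for a square, full-rank matrix A over the
    field of integers modulo a prime p, by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | z].
    M = [row[:] + [zi] for row, zi in zip(A, z)]
    for col in range(n):
        # Find a pivot row (exists because A has full rank over Z_p).
        piv = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        # Normalize the pivot row; p prime, so Fermat gives the inverse.
        inv = pow(M[col][col], p - 2, p)
        M[col] = [v * inv % p for v in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]
```

Since A has full rank over the field, the solution is unique, so the vector recovered this way must equal u_{i*}, exactly as the decoding procedure requires.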

Analysis.
We now analyse the expected size of the encoding of U_{i*}. We first analyse the size of the encoding when t_{i*}(U) ≤ 2E[t_{i*}(U, q)]. In this case, the encoder sends a message of

2|C_{i*}|w + lg(n² choose |Q|) + (β^{i*} − |Q|) lg Δ + O(w lg_β n) + Σ_{j=1}^{i*−1} (2|S_j(U)|w + β^j lg Δ)

bits. Since β^{i*} lg Δ = H(U_{i*}) and |C_{i*}|w = o(|Q|), the above is upper bounded by

H(U_{i*}) − |Q| lg(Δ/n²) + o(|Q|) + Σ_{j=1}^{i*−1} (2|S_j(U)|w + β^j lg Δ).

Since β ≥ 2, we also have Σ_{j=1}^{i*−1} β^j lg Δ < 2β^{i*−1} lg Δ = o(|Q| lg Δ). Similarly, we have |S_j(U)| ≤ β^j t_u, which gives us Σ_{j=1}^{i*−1} 2|S_j(U)|w < 4β^{i*−1} w t_u = o(|Q|). From standard results on prime numbers, we have that the largest prime number smaller than n⁴ is at least n³, i.e. we can assume lg(Δ/n²) = Ω(lg Δ). Therefore, the above is again upper bounded by H(U_{i*}) − Ω(|Q| lg Δ) = H(U_{i*}) − Ω(β^{i*−3/4} lg Δ). This part thus contributes at most Pr[t_{i*}(U) ≤ 2E[t_{i*}(U, q)]] · (H(U_{i*}) − Ω(β^{i*−3/4} lg Δ)) bits to the expected size of the encoding. The case where t_{i*}(U) > 2E[t_{i*}(U, q)] similarly contributes Pr[t_{i*}(U) > 2E[t_{i*}(U, q)]] · (H(U_{i*}) + O(1)) bits to the expected size of the encoding. Now since q is uniform, we have E[t_{i*}(U)] = E[t_{i*}(U, q)]; we therefore get from Markov's inequality that Pr[t_{i*}(U) > 2E[t_{i*}(U, q)]] < 1/2. Therefore the expected size of our encoding is less than H(U_{i*}), which is the desired contradiction and completes the proof of Lemma 2.

4. PROOF OF LEMMA 3

The goal is to show that there exists a set of cells C_i ⊆ S_i(U) and a grid G, such that the set of queries Q_{C_i} that probe no cells in S_i(U) \ C_i hit a large number of grid cells in G. For this, first define the grids G_2, ..., G_{2i−2}, where the grid cells of G_j have width n/β^{i−j/2} and height n/β^{j/2}. The existence of C_i is guaranteed by the following lemma:

Lemma 5. Let i ≥ (15/16) lg_β n be an epoch and U = U_{lg_β n}, ..., U_1 a fixed sequence of updates, where each U_j is a possible outcome of U_j. Assume furthermore that the claimed data structure satisfies t_i(U) = o(lg_β n). Then there exists a set of cells C_i ⊆ S_i(U) and an index j ∈ {2, ..., 2i − 2}, such that |C_i| = O(β^{i−1} w) and Q_{C_i} has hitting number Ω(β^{i−3/4}) on the grid G_j.

To not remove focus from our proof of Lemma 3, we have moved the proof of this lemma to Section 4.2. We thus move on to show that Lemma 4 and Lemma 5 imply Lemma 3. By assumption we have t_i(U) = o(lg_β n). Combining this with Lemma 5, we get that there exists a set of cells C_i ⊆ S_i(U) and an index j ∈ {2, ..., 2i − 2}, such that |C_i| = O(β^{i−1} w) and the set of queries Q_{C_i} has hitting number Ω(β^{i−3/4}) on the grid G_j.

In the proof of Lemma 4, the surviving queries are pruned so that no two of them hit nearby grid cells: each surviving query other than q_j either hits a grid cell in a column with index at least three less than that hit by q_j (we crossed out the two columns preceding that hit by q_j), or it hits a grid cell in the same column as q_j but with a row index that is at least three lower than that hit by q_j.

(We crossed out the two rows preceding that hit by q_j.) In either case, such a query cannot dominate the point inside the cells associated to q_j.

What remains is to bound the size of Q. Initially, we have |Q| = h. The bottom two rows have a total area of 2n³/(β^i μ), thus by Lemma 1 they contain at most 6n/μ points. The leftmost two columns have area 2nμ and thus contain at most 6μβ^i/n points. After crossing out these rows and columns we are therefore left with |Q| ≥ h − 6n/μ − 6μβ^i/n. Finally, when crossing out even or odd rows we always choose the one eliminating fewest points, thus the remaining steps at most reduce the size of Q by a factor 16. This completes the proof of Lemma 4.

4.2 Proof of Lemma 5

We prove the lemma using another encoding argument. However, this time we do not encode the update sequence; instead we define a distribution over query sets, such that if Lemma 5 is not true, then we can encode such a query set in too few bits.

Let U = U_{lg_β n}, ..., U_1 be a fixed sequence of updates, where each U_j is a possible outcome of U_j. Furthermore, assume for contradiction that the claimed data structure satisfies both t_i(U) = o(lg_β n) and that for all cell sets C ⊆ S_i(U) of size |C| = O(β^{i−1} w) and every index j ∈ {2, ..., 2i − 2}, the hitting number of Q_C on grid G_j is o(β^{i−3/4}). Here Q_C denotes the set of all queries q in [n] × [n] such that the query algorithm of the claimed data structure probes no cells in S_i(U) \ C when answering q after the sequence of updates U. Under these assumptions we will construct an impossible encoder. As mentioned, we will encode a set of queries:

Hard Distribution.
Let Q denote a random set of queries, constructed by drawing one uniform random query (with integer coordinates) from each of the β^{i−1} vertical slabs of the form

[h·n/β^{i−1}, (h+1)·n/β^{i−1}) × [0, n),

where h ∈ [β^{i−1}]. Our goal is to encode Q in less than H(Q) = β^{i−1} lg(n²/β^{i−1}) bits in expectation. Before giving our encoding and decoding procedures, we prove some simple properties of Q.

Define a query q in a query set Q to be well-separated if for all other queries q′ ∈ Q, where q′ ≠ q, q and q′ do not lie within an axis-aligned rectangle of area n²/β^{i−1/2}. Finally, define a query set Q to be well-separated if at least (1/2)|Q| queries in Q are well-separated. We then have:

Lemma 6. The query set Q is well-separated with probability at least 3/4.

Proof. Let q_h denote the random query in Q lying in the h'th vertical slab. The probability that q_h lies within distance at most n/β^{i−3/4} from the x-border of the h'th slab is precisely (2n/β^{i−3/4})/(n/β^{i−1}) = 2/β^{1/4}. If this is not the case, then for another query q_k in Q, we know that the x-coordinates of q_h and q_k differ by at least (|k − h| − 1)·n/β^{i−1} + n/β^{i−3/4}. This implies that q_h and q_k can only be within an axis-aligned rectangle of area n²/β^{i−1/2} if their y-coordinates differ by at most n/((|k − h| − 1)β^{1/2} + β^{1/4}), which happens with probability at most 2/((|k − h| − 1)β^{1/2} + β^{1/4}). The probability that a query q_h in Q is not well-separated is therefore bounded by

2/β^{1/4} + (1 − 2/β^{1/4}) Σ_{k≠h} 2/((|k − h| − 1)β^{1/2} + β^{1/4}) ≤ 10/β^{1/4} + Σ_{k≠h} 2/(|k − h|β^{1/2}) = O(1/β^{1/4} + lg n/β^{1/2}).

Since β = (w·t_u)⁹ = ω(lg² n), this probability is o(1), and the result now follows from linearity of expectation and Markov's inequality.

Now let S_i(Q, U) ⊆ S_i(U) denote the subset of cells in S_i(U) probed by the query algorithm of the claimed data structure when answering all queries in a set of queries Q after the sequence of updates U (i.e. the union of the cells probed for each query in Q). Since a uniform random query from Q is uniform in [n] × [n], we get by linearity of expectation that E[|S_i(Q, U)|] = β^{i−1} t_i(U). From this, Lemma 6, Markov's inequality and a union bound, we conclude:

Lemma 7. The query set Q is both well-separated and satisfies |S_i(Q, U)| ≤ 4β^{i−1} t_i(U) with probability at least 1/2.

With this established, we are now ready to give our impossible encoding of Q.

Encoding.
In the following we describe the encoding procedure. The encoder receives as input a set of queries Q, where Q is a possible outcome of Q. He then executes the below procedure to encode Q:

1. The encoder first executes the fixed sequence of updates U on the claimed data structure, and from this obtains the sets S_{lg_β n}(U), ..., S_1(U). He then runs the query algorithm for every query q ∈ Q and collects the set S_i(Q, U).

2. If Q is not well-separated or if |S_i(Q, U)| > 4β^{i−1} t_i(U), then the encoder sends a 1-bit followed by a straightforward encoding of Q using H(Q) + O(1) bits in total. This is the complete encoding procedure when either Q is not well-separated or |S_i(Q, U)| > 4β^{i−1} t_i(U).

3. If Q is both well-separated and |S_i(Q, U)| ≤ 4β^{i−1} t_i(U), then the encoder first writes a 0-bit and then executes the remaining four steps.

4. The encoder examines Q and finds the at most (1/2)|Q| queries that are not well-separated. Denote this set Q′. The encoder now writes down Q′ by first specifying |Q′|, then which vertical slabs contain the queries in Q′, and finally what the coordinates of each query in Q′ are within its slab. This takes O(w) + lg(|Q| choose |Q′|) + |Q′| lg(n²/β^{i−1}) = O(w) + O(β^{i−1}) + |Q′| lg(n²/β^{i−1}) bits.

5. The encoder now writes down the cell set S_i(Q, U), including only the addresses and not the contents. This takes o(H(Q)) bits, since

lg(|S_i(U)| choose |S_i(Q, U)|) = O(β^{i−1} t_i(U) lg(β·t_u)) = o(β^{i−1} lg(n²/β^{i−1})).

92 i whereinthefirstlineweusedthat|Si(U)|≤β tu and 2. If the first bit is 0, the decoder proceeds with this step i−1 |Si(Q, U)|≤4β ti(U). The second line follows from and all of the below steps. The decoder executes the 2 i−1 the fact that ti(U)=o(lgβ n)=o(lg(n /β )/ lg(βtu)) updates U on the claimed data structure and obtains β ω t S U ,...,S U since = ( u). the sets lgβ n( ) 1( ). From step 4 of the en- coding procedure, the decoder also recovers Q. 6. Next we encode the x-coordinates of the well-separated queries in Q. Since we have already encoded which 3. From step 5 of the encoding procedure, the decoder vertical slabs contain well-separated queries (we re- now recovers the addresses of the cells in Si(Q, U). ally encoded the slabs containing queries that are not Since the decoder has the data structure, he already well-separated, but this is equivalent), we do this by knows the contents. Following this, the decoder now specifying only the offset within each slab. This takes simulates every query in [n] × [n], and from this and (|Q|−|Q|)lg(n/βi−1)+O(1) bits. Following that, the S Q, U Q i( ) recovers the set Si(Q,U). encoder considers the last grid G2i−2,andforeach well-separated query q,hewritesdownthey-offset 4. From step 6 of the encoding procedure, the decoder x of q within the grid cell of G2i−2 hit by q.Since now recovers the -coordinates of every well-separated i−1 Q the grid cells of G2i−2 have height n/β ,thistakes query in (the offsets are enough since the decoder  (|Q|−|Q|)lg(n/βi−1)+O(1) bits. Combined with the knows which vertical slabs contain queries in Q ,and encoding of the x-coordinates, this step adds a total of thus also those that contain well-separated queries). (|Q|−|Q|)lg(n2/β2i−2)+O(1) bits to the size of the Following that, the decoder also recovers the y-offset encoding. of each well-separated query q ∈ Q within the grid cell of G2i−2 hit by q (note that the decoder does not know 7. 
In the last step of the encoding procedure, the en- what grid cell it is, he only knows the offset). coder simulates the query algorithm for every query n × n Q Q in [ ] [ ]andfrom this obtains the set Si(Q,U), 5. From the set Si(Q,U) the decoder now recovers the set Q i.e. the set of all those queries that probe no cells Si(Q,U) Gj for each j =2,...,2i − 2. This information in Si(U) \ Si(Q, U). Observe that Q ⊆ QS Q,U . Q i( ) Q G Si(Q,U) The encoder now considers each of the grids Gj ,for is immediate from the set Si(Q,U).From j j ,..., i − and step 7 of the encoding procedure, the decoder now =2 2 2, and determines both the set of grid Q Q G j G Q G Si(Q,U) ⊆ G Q recovers j for each .Ingrid 2, we know that cells j j hitbyaqueryin Si(Q,U),and Q has only one query in every column, thus the decoder GQ ⊆ G Si(Q,U) ⊆ G GQ G the set of grid cells j j j hit by a well- can determine uniquely from 2 which grid cell of 2 separated query in Q. The last step of the encoding is hit by each well-separated query in Q.Nowob- Q 1/2 consists of specifying Gj . This is done by encoding serve that the axis-aligned rectangle enclosing all β QS (Q,U) G G i GQ grid cells in j+1 that intersects a fixed grid cell in which subset of j corresponds to j .This 2 i−1/2 Gj n /β QS (Q,U) has area . Since we are considering well- |G i | takes lg j bits for each j =2,...,2i − 2. separated queries, i.e. queries where no two lie within |GQ| 2 i−1/2 j an axis-aligned rectangle of area n /β ,thismeans i−1 i−1 GQ Since |Si(Q, U)| = o(β lgβ n)=o(β w)weget that j+1 contains at most one grid cell in such a group from our contradictory assumption that the hitting of β1/2 grid cells. Thus if q is a well-separated query in Q G o βi−3/4 number of Si(Q,U) on each grid j is ( ), thus Q,wecandetermine uniquely which grid cell of Gj+1 Q Q Si(Q,U) i−3/4 q G |Gj | = o(β ). Therefore the above amount that is hit by ,directlyfrom j+1 and the grid cell of bits is at most in Gj hit by q. 
But we already know this informa- tion for grid G2,thuswecanrecoverthisinformation (|Q|−|Q|)lg(βi−3/4e/(|Q|−|Q|))(2i − 3) ≤ for grid G3,G4,...,G2i−2.Thusweknowforeach  1/4 i−1 (|Q|−|Q |)lg(β )2i + O(β i) ≤ well-separated query in Q which grid cell of G2i−2 it  i− hits. From the encoding of the x-coordinates and the (|Q|−|Q |) 1 lg(β)2 lg n + O(β 1 lg n) ≤ 4 β β y-offsets, the decoder have thus recovered Q. |Q|−|Q| 1 n o H ( ) 2 lg + ( (Q)) This completes the encoding procedure, and the en- Analysis. coder finishes by sending the constructed message to Finally we analyse the size of the encoding. First consider the decoder. the case where Q is both well-separated and |Si(Q,U)|≤ i−1 4β ti(U). In this setting, the size of the message is bounded Before analysing the size of the encoding, we show that by the decoder can recover Q from the encoding. | | n2/βi−1 | |−| | n2/β2i−2 1 n o H Q lg( )+( Q Q )(lg( )+ 2 lg )+ ( (Q)) Decoding. bits. This equals In this paragraph we describe the decoding procedure. U U ,...,U 2+1/2 2i−2  i−1 1/2 The decoder only knows the fixed sequence = lgβ n 1 |Q| lg(n /β )+|Q | lg(β /n )+o(H(Q)) and the message received from the encoder. The goal is to i ≥ 15 n recover Q, which is done by the following steps: bits. Since we are considering an epoch 16 lgβ ,we have lg(n2+1/2/β2i−2) ≤ lg(n5/8β2), thus the above amount 1. The decoder examines the first bit of the message. If of bits is upper bounded by this is a 1-bit, the decoder immediately recovers Q from the remaining part of the encoding. |Q| lg(n5/8β2)+|Q| lg(n1/2)+o(H(Q)).
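As a quick numeric sanity check of the epoch arithmetic above — that lg(n^{5/2}/β^{2i-2}) ≤ lg(n^{5/8}β^2) once i ≥ (15/16) lg_β n — the following sketch evaluates both sides for a concrete choice of parameters (the values n = 2^20 and β = 16 are illustrative, not from the paper):

```python
import math

def per_query_bits(n, beta, i):
    # lg(n^{5/2} / beta^{2i-2}): the per-query cost before the epoch bound.
    return math.log2(n ** 2.5 / beta ** (2 * i - 2))

def late_epoch_bound(n, beta):
    # lg(n^{5/8} * beta^2): the claimed upper bound for late epochs.
    return math.log2(n ** (5 / 8) * beta ** 2)

# Illustrative values: n = 2^20, beta = 16, so lg_beta n = 5.
n, beta = 2 ** 20, 16
lg_beta_n = round(math.log(n, beta))
i_min = math.ceil(15 / 16 * lg_beta_n)  # epochs i >= (15/16) lg_beta n
for i in range(i_min, lg_beta_n + 1):
    assert per_query_bits(n, beta, i) <= late_epoch_bound(n, beta)
```

For these values the check covers only epoch i = 5, where the left-hand side is 18 bits against a bound of 20.5 bits.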

Since |Q'| ≤ (1/2)|Q|, this is again bounded by

    |Q| lg(n^{7/8}β^2) + o(H(Q))

bits. But H(Q) = |Q| lg(n^2/β^i) ≥ |Q| lg n, i.e. our encoding uses less than (15/16)H(Q) bits.

Finally, let E denote the event that Q is well-separated and at the same time |S_i(Q, U)| ≤ 4β^{i-1} t_i(U); then the expected number of bits used by the entire encoding is bounded by

    O(1) + Pr[E](1 - Ω(1))H(Q) + (1 - Pr[E])H(Q).

The contradiction is now reached by invoking Lemma 7 to conclude that Pr[E] ≥ 1/2.

5. CONCLUDING REMARKS
In this paper we presented a new technique for proving dynamic cell probe lower bounds. With this technique we proved the highest dynamic cell probe lower bound to date under the most natural setting of cell size w = Θ(lg n), namely a lower bound of t_q = Ω((lg n/lg(w t_u))^2).

While our results have taken the field of cell probe lower bounds one step further, there is still a long way to go. Amongst the results that seem within grasp, we find it a very intriguing open problem to prove an ω(lg n) lower bound for a problem where the queries have a one-bit output. Our technique crucially relies on the output having more bits than it takes to describe a query, since otherwise the encoder cannot afford to tell the decoder which queries to simulate. Since many interesting data structure problems have a one-bit output size, finding a technique for handling this case would allow us to attack many more fundamental data structure problems.

Applying our technique to other problems is also an important task; however, such problems must again have a logarithmic number of bits in the output of queries.

6. ACKNOWLEDGMENT
The author wishes to thank Peter Bro Miltersen for much useful discussion on both the results and the writing of this paper.

7. REFERENCES
[1] A. Fiat and A. Shamir. How to find a battleship. Networks, 19:361–371, 1989.
[2] M. Fredman and M. Saks. The cell probe complexity of dynamic data structures. In Proc. 21st ACM Symposium on Theory of Computation, pages 345–354, 1989.
[3] A. G. Jørgensen and K. G. Larsen. Range selection and median: Tight cell probe lower bounds and adaptive data structures. In Proc. 22nd ACM/SIAM Symposium on Discrete Algorithms, pages 805–813, 2011.
[4] J. Matoušek. Geometric Discrepancy. Springer, 1999.
[5] R. Panigrahy, K. Talwar, and U. Wieder. Lower bounds on near neighbor search via metric expansion. In Proc. 51st IEEE Symposium on Foundations of Computer Science, pages 805–814, 2010.
[6] M. Pătraşcu. Lower bounds for 2-dimensional range counting. In Proc. 39th ACM Symposium on Theory of Computation, pages 40–46, 2007.
[7] M. Pătraşcu. Unifying the landscape of cell-probe lower bounds. In Proc. 49th IEEE Symposium on Foundations of Computer Science, pages 434–443, 2008.
[8] M. Pătraşcu. Towards polynomial lower bounds for dynamic problems. In Proc. 42nd ACM Symposium on Theory of Computation, pages 603–610, 2010.
[9] M. Pătraşcu and E. D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing, 35:932–963, April 2006.
[10] M. Pătraşcu and M. Thorup. Don't rush into a union: Take time to find your roots. In Proc. 43rd ACM Symposium on Theory of Computation, 2011. To appear. See also arXiv:1102.1783.
[11] A. C.-C. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proc. 18th IEEE Symposium on Foundations of Computer Science, pages 222–227, 1977.
[12] A. C.-C. Yao. Should tables be sorted? Journal of the ACM, 28(3):615–628, 1981.
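The well-separatedness condition that drives the encoding argument above has a simple operational reading: a query is well-separated when the smallest axis-aligned rectangle enclosing it together with any other query exceeds a fixed area. A minimal sketch of that check, assuming queries are (x, y) points in [n] × [n]; the function names and the explicit area-threshold argument are illustrative choices (the paper fixes the threshold to n^2/β^{i-1/2}):

```python
def is_well_separated(q, queries, area_threshold):
    # q is well-separated if no other query lies with it inside an
    # axis-aligned rectangle of the given area, i.e. the smallest
    # axis-aligned rectangle enclosing the pair has larger area.
    qx, qy = q
    return all(
        abs(px - qx) * abs(py - qy) > area_threshold
        for (px, py) in queries
        if (px, py) != (qx, qy)
    )

def query_set_is_well_separated(queries, area_threshold):
    # A query set is well-separated when at least half of its
    # queries are individually well-separated.
    good = sum(1 for q in queries if is_well_separated(q, queries, area_threshold))
    return 2 * good >= len(queries)
```

For example, with queries [(0, 0), (2, 2), (900, 900)] and threshold 10, the pair (0, 0), (2, 2) spans a rectangle of area 4 ≤ 10, so only (900, 900) is well-separated and the set as a whole is not.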
