Dynamic Elias-Fano Representation

Giulio Ermanno Pibiri1 and Rossano Venturini1

1 Computer Science Department, University of Pisa, Italy [email protected], [email protected]

Abstract We show that it is possible to store a dynamic ordered set S(n, u) of n integers drawn from a bounded universe of size u in space close to the information-theoretic lower bound and yet preserve the asymptotic time optimality of the operations. Our results leverage on the Elias- u Fano representation of S(n, u) which takes EF(S(n, u)) = ndlog n e + 2n bits of space and can be shown to be less than half a bit per element away from the information-theoretic minimum. Considering a RAM model with memory words of Θ(log u) bits, we focus on the case in which the integers of S are drawn from a polynomial universe of size u = nγ , for any γ = Θ(1). We represent S(n, u) with EF(S(n, u)) + o(n) bits of space and: 1. support static predecessor/suc- u cessor queries in O(min{1 + log n , log log n}); 2. make S grow in an append-only fashion by spending O(1) per inserted element; 3. support random access in O(log n/ log log n) worst- case, insertions/deletions in O(log n/ log log n) amortized and predecessor/successor queries in u O(min{1 + log n , log log n}) worst-case time. These time bounds are optimal.

1998 ACM Subject Classification E.1 Data Structures, E.4 Coding and Information Theory, F.2.2 Nonnumerical Algorithms and Problems

Keywords and phrases Succinct Data Structures, Integer Sets, Predecessor Problem, Elias-Fano

Digital Object Identifier 10.4230/LIPIcs.CPM.2017.48

1 Introduction

The problem we consider is the one of representing in compressed space a dynamic ordered set S of n integer keys, which is a fundamental textbook problem (see the introduction to parts III and V of [8]). In general, any self-balancing search tree , e.g., AVL or Red-Black tree, solves the problem optimally in the comparison model, by implementing all operations in O(log n) worst-case time and using linear space [8]. However, by exploiting the fact that the stored keys are integers drawn from a bounded universe of size u, the problem is known to admit more efficient solutions in terms of asymptotic while still retaining linear space [8, 23, 26, 30, 13, 14]. Classical examples include the [26, 27, 28], x/y-fast trie [30] and the [14], that was the first data structure able to surpass the information-theoretic lower bound, by exhibiting an optimal [13] amount of time per operation within a number of memory words proportional to the size of the input. Some efforts have been spent in trying to reduce the space requirements of the representation [16, 18, 25] but known compressed solutions do not closely match the information-theoretic lower bound of the underlying integer set. In this paper we show that it is possible to preserve the optimal bounds for the operations under almost optimal space requirements. The key ingredient of our data structures is the Elias-Fano representation of monotone integer sequences [10, 11]. In particular, Elias-Fano u encodes a monotone integer sequence S(n, u) in EF(S(n, u)) = ndlog n e + 2n bits, which can be shown to be less than half a bit per element away from optimality [10], maintaining the capability of randomly access an integer in O(1) worst-case time. The query Predecessor,

© Giulio Ermanno Pibiri and Rossano Venturini; licensed under Creative Commons License CC-BY 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Editors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter; Article No. 48; pp. 48:1–48:14 Leibniz International Proceedings in Informatics Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany 48:2 Dynamic Elias-Fano Representation

which, given an integer x, returns max{y ∈ S : y < x}, is possible as well over the compressed u sequence in O(1 + log n ) worst-case time. These properties make Elias-Fano extremely efficient on crucial practical applications, e.g., inverted indexes compression, just to mention the most noticeable one. Since inverted indexes can indeed be regarded as being a collection of sorted integer sequences, recent works [29, 20] have shown that Elias-Fano exhibits the best time/space trade-off thanks to its efficient search capabilities and strong theoretical guarantees. For this specific application, the operation that has to be supported efficiently is Successor(x) = min{y ∈ S : y ≥ x}, which is commonly called NextGEQ (Next Greater or EQual) [29, 20]. Throughout the paper we adopt the classical nomenclature and discuss Predecessor(x) as it is well known that the twin query Successor(x) is solved in a similar way. The natural question is whether it is possible to extend the static Elias-Fano representation to dynamic scenarios, in which integers can also be inserted/deleted in/from S. To this end, we consider the case in which the n integers of S are drawn from a polynomial universe of size u = nγ , for any γ = Θ(1). This is the classical operational setting as considered by Fredman and Saks [13] (list representation problem) and let us concentrate on the typical case of practical interest. In order to characterize the asymptotic complexity of the data structures described in the paper and review the literature, we use a RAM model with word size w = Θ(log u) bits. We also adopt the usual trans-dichotomous assumption [14], making w grow with n as needed. We maintain S(n, u) using EF(S(n, u)) + o(n) bits of space, hence introducing a sublinear space overhead with respect to its static Elias-Fano representation, and show how: u 1. static predecessor/successor queries can be supported in O(min{1 + log n , log log n}) u worst-case time (note that the first term of the bound, i.e., O(1 + log n ), is optimal only for polynomial universes of size u = nγ with 1 ≤ γ ≤ 1 + log log n/ log n); 2. to extend S in an append-only fashion, i.e., by assuming that integers are inserted in the data structure in sorted order, using a constant amount of work per integer; 3. to maintain S in a fully dynamic way, supporting random access in O(log n/ log log n) worst-case, insertions/deletions in O(log n/ log log n) amortized and predecessor/successor u queries in O(min{1 + log n , log log n}) worst-case time.

2 Related Work

We organize the discussion of the related work in three parts. The first part concerns the review of the results about the static predecessor problem. The second one explains in details the (static) Elias-Fano representation of monotone integer sequences because it forms the backbone of our solutions. The last part finally describes the results closest to our work for the maintenance of a dynamic integer set.

2.1 Static Predecessor Problem

We could solve the static predecessor problem in O(1) worst-case by storing all results to every possible query using perfect hashing [12] in O(u) words of space. In order to not trivialize the problem, assume we have a polynomial space budget, e.g., we deal with a data structure occupying O(nO(1)) words. √ Ajtai [1] proved the first ω(1) lower bound, claiming that ∀w, ∃n that gives Ω( log w) query time. Only ten years later Beame and Fich [3, 4] proved two strong bounds for any G. E. Pibiri and R. Venturini 48:3 cell-probe data structure1. They proved that ∀w, ∃n that gives Ω(log w/ log log w) query time and that ∀n, ∃w that gives Ω(plog n/ log log n) query time. They also gave a static data structure achieving O(min{log w/ log log w, plog n/ log log n}) which is, therefore, optimal. Building on a long line of research, Pˇatraşcuand Thourp [21, 22] finally proved the following optimal space-time trade-off for a static data structure taking m = n2aw bits of space, with m a = log n − log w  n w − log n log w log w o Θ min log n, log , a , a (1) w a log( a log w ) w log n log n a log(log a / log a ) This lower bound holds for cell-probe, RAM, trans-dichotomous RAM, external memory and communication game models. The first branch of the trade-off indicates that, whenever we are in RAM or external memory with one integer fitting in one memory word, fusion trees are optimal, as these require O(logw n) = O(log n/ log w) query time. The second branch holds for polynomial universes, i.e., whenever u = nγ , for any γ = Θ(1). In such case we have that w = Θ(log u) = γ log n, therefore y-fast tries [30] and van Emde Boas trees [26, 27, 28] are optimal with query time O(log log u) = O(log log n). Finally, the last two branches of the trade-off treat the case for super-polynomial universes. In particular, the third branch matches the lower bound by Beame and Fich [3, 4] that requires nO(1) words of space; the 1− fourth branch improves this space occupancy, showing that n1+1/ exp(log log u) words are sufficient, for any  > 0.

2.2 Static Elias-Fano Representation The integer encoding we describe in this section was independently proposed by Peter Elias [10] and Robert Mario Fano [11], hence its name. Given a monotonically increasing sequence S(n, u) of n positive integers drawn from a universe of size u (i.e., S[i−1] ≤ S[i], for any 1 ≤ i < n, with S[n−1] ≤ u), we write each S[i] in binary using dlog ue bits. Each binary representation is then split into two parts: a high part consisting in the first dlog ne most u significant bits that we call high bits and a low part consisting in the remaining ` = blog n c bits that we similarly call low bits. Let us call hi and `i the values of high and low bits of S[i] respectively. The Elias-Fano representation of S is given by the encoding of the high and low parts. The array L = [`0, . . . , `n−1] is stored in fixed-width and represents the encoding of the low parts. Concerning the high bits, we represent them in negated unary2 using a u bit vector of n + d 2` e ≤ 2n bits as follows. We start from a 0-valued bit vector H and set the bit in position hi + i, for all i ∈ [0, n). The effect is that now the k-th unary integer m of H indicates that m integers of S have high bits equal to k. Finally the Elias-Fano representation of S is given by the concatenation of H and L and overall takes l u m EF(S(n, u)) = n log + 2n bits. (2) n While we can opt for an arbitrary split ranging from 0 to dlog ue into high and low parts, u it can be shown that the value ` = blog n c minimizes the overall space occupancy of the encoding [10]. As the information theoretic lower bound for a monotone sequence of n

1 In the cell-probe computational model, described by Yao [31], computation is for free given that we only take into account word reads. It is not a very realistic model of computation, but it is useful to prove lower bounds because it is a stronger model than RAM and trans-dichotomous RAM. 2 The negated unary representation of an integer x, is the bitwise NOT of its unary representation U(x). An example: U(5) = 00001 and NOT(U(5)) = 11110.

C P M 2 0 1 7 48:4 Dynamic Elias-Fano Representation

u+n u+n elements drawn from a universe of size u is dlog n e ≈ n log n + n log e bits, it can be shown that less than half a bit is wasted per element by the space bound in (2) [10]. Since we set a bit for every i ∈ [0, n) in H and each hi is extracted in O(1) time from S[i], it follows that S gets encoded with Elias-Fano in Θ(n) time. Despite the simplicity of the encoding, it is possible to randomly access an integer from a sequence encoded with Elias-Fano without decompressing it. We refer to this operation as Access(i) in the following, which returns the i-th (smallest) element of the sequence. The operation is supported using an auxiliary data structure that is built on bit vector H, able to efficiently answer Select1(i) queries, that return the position in H of the i-th 1 bit. This auxiliary data structure is succinct in the sense that it is negligibly small compared to EF(S(n, u)), requiring only o(n) additional bits [7, 29].

Using the Select1 primitive, it is possible to implement Access(i), which returns S[i] for any i ∈ [0, n), in O(1). We basically have to re-link together the high and low bits of an integer, previously split up during the encoding phase. The low bits `i are trivial to retrieve as we need to read the range of bits [i`, (i + 1)`) from L. Note that we also need to store the quantity `: a global redundancy of O(log u) bits is sufficient. The retrieval of the high bits deserve, instead, a bit more care. Since we write in negated unary how many integers share the same high part, we have a bit set for every integer of S and a zero for every distinct high part. Therefore, to retrieve the high bits of the i-th integer, we need to know how many zeros are present in H[0, Select1(i)). This quantity is evaluated on H in O(1) as Rank0(Select1(i)) = Select1(i) − i. Notice, therefore, that the succinct rank/select data structure does not have to support Rank. Finally, linking the high and low bits is as simple as: Access(i) = ((Select1(i) − i)  `) ∨ `i, where  is the left shift operator and ∨ is the bitwise OR. u 3 The query Successor(x) is supported in O(1 + log n ) time , as follows. Let hx be the high bits of x, i.e., its first dlog ne most significant bits. Then p1 = Select0(hx) − hx represents the number of integers in S whose high bits are less than hx. On the other hand, p2 = Select0(hx + 1) − hx − 1 gives us the position at which the elements having high bits greater than hx start. These two preliminary operations take O(1). We can now determine the successor of x by binary searching in this interval which may contain up to u/n integers. The algorithm for Predecessor(x) runs in a similar way. In particular, it could be that Predecessor(x) lies before the interval [p1, p2): in this case S[p1 − 1] is the element to return.

2.3 Dynamic Problems We now review the most important results concerning the maintenance of a dynamic set of integers/binary strings, following the chronological order of their proposal. The van Emde Boas tree is a recursive data structure that maintains S in O(u) words of space and supports the operations: Search which tests whether a given integer is present or not in S, Insert/Delete and Predecessor/Successor all in O(log w) worst-case time [26, 27, 28]. Willard [30] improved the space bound to O(n) words by introducing the y-fast trie that supports Search and Predecessor/Successor queries in O(log w) worst-case time, Insert/Delete in amortized O(log w) time. The work by Fredman and Saks [13] is useful to understand which lower bounds apply to the problem we consider in the paper. They described the list representation problem, i.e., how

3 u u We report the bound as O(1 + log n ), instead of O(log n ), to cope with the case n = u. G. E. Pibiri and R. Venturini 48:5 to maintain S under the triad of operations Access/Insert/Delete, and proved that it can be solved in Ω(log n/ log log n) amortized time per operation if w ≤ logγ n for some γ. No space bound is posed on such problem. Their lower bound does not apply to dynamic predecessor queries and holds for the cell-probe computational model [31]. Extending the result to the dynamic predecessor problem, they proved that any cell-probe data structure representing S using (log u)O(1) bits per memory cell and nO(1) worst-case time for insertions, requires Ω(plog n/ log log n) worst-case query time. They also proved that on a RAM, the dynamic predecessor problem can be solved in O(min{log log n · log w/ log log w, plog n/ log log n}), using O(n) words. This bound was matched by Andersson and Thorup [2] with the so-called exponential search tree. This data structure has an optimal bound of O(plog n/ log log n) worst-case time for searching and updating S, using polynomial space. Raman, Raman and Rao [24] also addressed the list representation problem4 for arrays of length n by providing two solutions. Their first data structure supports Access in O(1) and Insert/Delete in O(n) worst-case time for any fixed positive  < 1; the second data structure implements all the three operations in O(log n/ log log n) amortized time. Both data structures use o(n) bits of redundancy and the time bounds are optimal. Fredman and Willard [14] showed that dynamic predecessor queries can be answered in O(log n/ log log n) time by using the fusion tree. This data structure is a B-tree with branching factor B = Θ(log n) that stores in each internal node a fusion node, a small data structure able of answering predecessor queries in O(1) for sets up to w1/5 integers. Updating a fusion node takes, however, O(B4) time. The overall space of the data structure is O(n) words. The work by Pˇatraşcuand Thorup [23] has recently shown that it is possible to “dynamize” the fusion node, by supporting Insert and Delete in O(1). As a result, they have proposed a data structure representing S in O(n) words and optimal O(log n/ log w) running time for the operations Insert, Delete, Predecessor, Successor, Rank and Select. We also mention a few additional results, that will be useful in the following. Bille et al. [5] recently combined the static solution of Demaine and Pˇatraşcu[9] with the one by Pˇatraşcuand Thorup [23] to support dynamic prefix sums over an array of size n in optimal O(log n/ log(w/δ)) time per operation and linear space, where δ is the number of bits needed to encode the quantity that we sum to the elements of the array. Though not devised for integer sets, the extended CRAM (Compressed Random Access Memory) data structure described by Jansson, Sadakane and Sung [17] allows a string S of length n to be stored using its k-th order empirical entropy nHk(S) plus a redundancy of O(n log σ(k log σ + (k + 1) log log n)/ log n) bits for every 0 ≤ k < logσ n, where σ is the size of the alphabet, in such a way that Insert/Delete of characters and Access to any consecutive logσ n bits are all supported in optimal O(log n/ log log n) worst-case time. We will exploit the part of this work dedicated to the memory management. Grossi et al. [15] improved the previous space bound by using nHk(S) + O(n log log n/ logσ n) bits and maintaining the asymptotic optimality for all operations. The paper by Navarro and Nekrich [19] illustrates a data structure supporting Access, Rank/Select queries, as well as symbol insertions/deletions on 1+ S in optimal O(log n/ log log n) time and taking nH0(S) + O(n + σ(log σ + log n)) bits of space. Of particular interest for our purposes, is the data structure described in Appendix A.1 concerning the organization of data in small blocks. The high-level idea is to maintain a tree of constant height with node degree logδ n, for some 0 < δ < 1, and leaves containing o(log n) elements each. As each internal node can fit in one machine word, the tree supports

4 In their paper [24], the authors refer to the list representation problem, as introduced by Fredman and Saks [13], as the dynamic array problem. Also, the operation Access is named Index.

C P M 2 0 1 7 48:6 Dynamic Elias-Fano Representation

basic search operations in O(1) time by using a small pre-computed table. In Section 5 we will make use of a similar data structure, in order to handle mini blocks of sorted integers, which avoids the use of pre-computed tables.

3 Static Predecessor Queries in Optimal Time

In this section we are interested in determining the optimal running time of Predecessor for the Elias-Fano space bound in (2). As mentioned in Section 1, our focus is on polynomial universes, i.e., u = nγ for any γ = Θ(1), for which the second branch of the time/space trade-off in (1) becomes optimal. The following theorem shows that adding o(n) bits of redundancy to EF(S(n, u)) is enough to support Predecessor queries in optimal time.

I Theorem 1. There exists a data structure representing an ordered set S(n, u) of n integers drawn from a polynomial universe of size u = nγ , for any γ = Θ(1), that takes EF(S(n, u)) + o(n) bits of space and supports Access in O(1) worst-case and Predecessor/Successor queries u in optimal O(min{1 + log n , log log n}) worst-case time.

We resort on the time/space trade-off (1) by Pˇatraşcuand Thorup [21, 22]. In our case, u a = log(dlog n e + 2) and w = Θ(log u) = γ log n. In such setting, the second term of the w−log n u trade-off becomes log a = log((γ − 1) log n/ log(dlog n e + 2) = O(log log n). This proves that y-fast tries and van Emde Boas trees are optimal for static Predecessor queries within the Elias-Fano space bound. However, such bound only depends on n, whereas the plain u Elias-Fano bound for Predecessor of O(1 + log n ), introduced in Subsection 2.2, depends on both n and u. On the other hand, the relation u = nγ relates the two parameter by means of the constant γ = Θ(1). It is clear that varying γ one of the two bounds becomes u u optimal. Indeed, comparing 1 + log n with log log n, we have that 1 + log n ≤ log log n n γ−1 1 whenever u ≤ 2 log n, i.e., when n ≤ 2 log n. From this last condition we derive that log log n the plain Elias-Fano is faster than van Emde Boas whenever 1 ≤ γ ≤ 1 + log n . In this case the static Elias-Fano representation does not need to be augmented. When, instead, log log n γ > 1 + log n , the query time O(log log n) is optimal and exponentially better than plain u Elias-Fano. Therefore, O(min{1 + log n , log log n}) is an accurate characterization of the Predecessor time bound. We are left to describe a data structure matching the bound of O(log log n), within o(n) bits of additional space. We divide S into dn/ log2 ue blocks of log2 u integers each (the last block may contain less integers). We can solve Predecessor queries in a block in O(log log u) = O(log log n) time by applying binary search. Now, we need a data structure on top of S that allows us to identify the proper block in O(log log n) time. Call the first element of a block its lower bound. We attach to S an y-fast trie storing the lower bounds of the blocks. More precisely, each leaf in the y-fast trie holds the lower bound of a block and its position in S. The integers stored in the y-fast trie are dn/ log2 ue, therefore its space is n O( log2 u log u) = o(n) bits. To identify the block where the predecessor of x lies in, we answer a partial Predecessor(x) query among the integers stored in the y-fast trie in O(log log n) worst-case time. The position p in S of the block’s lower bound, associated to the identified partial answer, indicates that the search must continue in the block S[p, min{p + log2 u, n}).

Concluding this section, observe that the time bound for Predecessor queries is always at log log n most O(log log n) except when 1 ≤ γ ≤ 1 + log n : in this case, the plain Elias-Fano representation beats the time bound of O(log log n). Therefore, in what follows we report u log log n the bound as O(min{1 + log n , log log n}) but discuss the case for γ > 1 + log n . G. E. Pibiri and R. Venturini 48:7

4 Extensible Elias-Fano Representation

When the integers are inserted in sorted order, we obtain an efficient extensible representation as these can only be added at the end of the sequence by means of an Append operation. This is a scenario of practical interest as it is the operational setting of append-only inverted indexes, e.g., the one of Twitter [6].

I Theorem 2. There exists a data structure representing an ordered set S(n, u) of n integers, drawn from a polynomial universe of size u = nγ , for any γ = Θ(1), that takes EF(S(n, u)) + o(n) bits of space and supports: Append in O(1) amortized, Access in O(1) worst-case and u Predecessor/Successor queries in optimal O(min{1 + log n , log log n}) worst-case time. We maintain an array B of size m in which integers are appended uncompressed. This array acts as a buffer, which is periodically encoded with Elias-Fano in Θ(m) time and dumped, so that new integers can be successfully appended. Each compressed representation of the buffer is appended in an array of blocks encoded with Elias-Fano. More precisely, when B becomes full we encode with Elias-Fano its corresponding differential buffer, i.e., the buffer whose values are B[i] − B[0], 0 ≤ i < m. Each time the buffer is compressed, we append in another array C the pair hbase, low_bitsi = hB[0], dlog(B[m − 1]/m)ei, i.e., the buffer lower bound value and the number of bits needed to encode the average gap of the Elias-Fano representation of the buffer. Apart from the space taken by the compressed blocks, the space of the data structure is given by the following contributions: (m + 1) log u bits for the buffer B of uncompressed integers and its size; n O(d m e log n) bits for pointers to rank/select data structures, low and high bit arrays; n O(d m e log u) bits for the array C. n 2 Summing up, the redundancy is O((m+1+d m e) log u) bits. We use a buffer of size m = log u and, as done in Section 3, we index the buffer lower bounds in an y-fast trie. More precisely, each leaf of the fast trie stores a buffer lower bound and the index of the compressed block to which the lower bound belongs to. The values stored in the y-fast trie are dn/ log2 ue, thus n requiring o(n) bits of space. The redundancy O((m + 1 + d m e) log u) becomes o(n) bits for n = ω(log3 u), which is already satisfied by requiring that γ = Θ(1). To take into account the space taken by the representation of the blocks, we use the property that splitting a block encoded with Elias-Fano into two sub-blocks never increases the cost of representation of the block. This is possible because each sub-block can be encoded with a universe relative to the sub-block, which is smaller than the original block universe, by subtracting to each integer the lower bound of the sub-block. The following property can be easily extended to work with an arbitrary number of splits. I Property 1. Consider a monotone sequence S of n integers. Let S[i, j) indicate the range of S delimited by endpoints i and j. Then for any i, k and j such that 0 ≤ i < k < j < n, we have EF(S[i, k)) + EF(S[k, j)) ≤ EF(S[i, j)).

Proof. Let m and u be respectively size and universe of the sub-sequence S[i, j), and, similarly, let m1, m2, u1, u2 be the sizes and universes of the two sub-sequences S[i, k) and S[k, j) respectively. We have that m = m1 + m2 and u = u1 + u2. From Subsection 2.2, we know that EF(S[i, j)) takes mφ + m + d u e. Similarly EF(S[i, k)) = m φ + m + d u1 e 2φ 1 1 1 2φ1 and EF(S[k, j)) = m φ + m + d u2 e. EF(S[i, k)) and EF(S[k, j)) are minimized by setting 2 2 2 2φ2 u1 u2 φ1 = blog c and φ2 = blog c respectively [10], therefore, by replacing φ1 and φ2 with φ, m1 m2 u1 u2 u we have that EF(S[i, k))+EF(S[k, j)) ≤ m1φ+m2φ+m1 +m2 +d 2φ e+d 2φ e = mφ+m+d 2φ e = EF(S[i, j)). J

C P M 2 0 1 7 48:8 Dynamic Elias-Fano Representation

The operations are supported as follows. Since we compress the buffer each time it fills up (by taking Θ(m) time), Append is performed in O(1) amortized time. Appending new integers in the buffer accumulates a credit of O(log2 u) which largely pays the amortized cost O(log log u) of inserting a buffer lower bound into the y-fast trie. To Access the i-th integer, we retrieve the element x in position i − jm from the compressed block of index i j = b m c. This is done in O(1) worst-case time, since we know how many low bits are required to perform the access by reading C[j].low_bits. We finally return the integer x + C[j].base. Predecessor queries are supported similarly as in the description of Theorem 1. Given the integer x, we first resolve a partial Predecessor(x) query in the y-fast trie to identify the index j of the compressed block in which the predecessor is located. Then we return C[j].base + Predecessor(x − C[j].base) by binary searching the block of index j in O(log log u) = O(log log n) worst-case time.

From Theorem 2, the following corollary easily follows.

2 I Corollary 1. There exists a data structure representing an ordered set S(n, u) of n = ω(log u) integers drawn from a universe of size u that takes EF(S(n, u)) + o(n) bits of space and supports Append and Access operations in O(1) worst-case time.

Without using the y-fast trie we are able to achieve a worst-case running time for the Append operation in Corollary 1 by using a classical de-amortization argument (note, however, that Predecessor queries are not supported in optimal time anymore). We maintain two buffers, B1 and B2, instead of one. When one is full we use the other to store the elements that must be appended. Suppose B1 is full. For each of the successive m Append operations, we compress one element from B1 and append the new integer in B2. These two steps require O(1) worst-case time each.

5 Dynamic Elias-Fano Representation

In this section we describe how the static Elias-Fano representation can be turned into an efficient dynamic data structure, i.e., supporting Access, Insert, Delete, Minimum, Maximum, Predecessor and Successor in optimal time and taking EF(S(n, u)) + o(n) bits of space. log n As already discussed in Subsection 2.3, Fredman and Saks [13] proved that O( log log n ) amortized time is optimal for any data structure maintaining a set of integers subject to Access, Insert and Delete (list representation problem). Their result holds when w ≤ logγ n for some γ, which covers the case of polynomial universes u = nγ since γ ≤ logγ−1 n, for any γ ≥ 1 and n ≥ 2. We operate, therefore, in the same setting as Theorems 1 and 2, considering integers drawn from a polynomial universe of size u = nγ , for any γ = Θ(1). In this setting, Pˇatraşcuand Thorup [21] showed that O(log log n) query time of y-fast tries and van Emde Boas trees is optimal for the dynamic predecessor problem too.

I Theorem 3. There exists a data structure representing an ordered set S(n, u) of n integers drawn from a polynomial universe of size u = nγ , for any γ = Θ(1), that takes EF(S(n, u)) + o(n) bits of space and supports: Access in O(log n/ log log n) worst-case; Insert/Delete in O(log n/ log log n) amortized; Minimum/Maximum in O(1) and Predecessor/Successor queries u in O(min{1 + log n , log log n}) worst-case time. These time bounds are optimal.

In what follows, we first describe the layout of the data structure and then analyze its space and time complexities. G. E. Pibiri and R. Venturini 48:9

5.1 Data Structure Description We begin our description by showing how to handle a dynamic collection of mini blocks in succinct space, which is a key tool to obtain the full dynamic data structure. This result builds on an idea from [19], Appendix A.1.

5.1.1 Maintaining a Sorted Collection of Mini Blocks Let C be a collection of k = O(polylog n) blocks of sorted integers, with the following properties. The blocks of C form a total order, i.e., uj ≤ fj+1, for all j = 1, . . . , k − 1, where fj and uj indicate, respectively, the first and last element of the j-th block in the total order. Each block supports random access to its elements in constant time and is of size Θ(b) = ρb 1 with 2 ≤ ρ ≤ 2 and b = O(polylog n).

I Lemma 4. The total order of the blocks of C can be maintained by using a data structure that takes O(polylog n · log log n) bits of space and supports the following operations in O(log log n) worst-case time: Search(x) which returns a pointer to the block containing the integer x; Access(i) which returns the i-th integer of the total order; Insert/Delete of a block.

Pointers of O(log log n) bits to blocks are stored in the leaves of a τ-ary tree T , with τ = Θ(logσ n) for some 0 < σ < 1. Given that we have O(polylog n) leaves, the height of T is constant and equal to O(1/σ). T operates as a B-tree, in which internal nodes have Θ(τ) = ρτ children. Logically, we divide the information stored at each internal node into two levels of representation. For each of the two levels we store Θ(τ) pairs, where the i-th pair maintains information about the sub-tree rooted in the i-th child. The pairs are stored following the order of the upper bounds of the blocks indexed in the sub-trees rooted in the node’s children. In the lower level, each pair contains a pointer to the sub-tree rooted in the child and the size of such sub-tree. The Θ(τ) children sizes are kept in prefix sums to enable binary search. In the upper level, each pair contains a pointer to the right-most block indexed in the sub-tree rooted in its child and the size of such sub-tree. Each leaf holds, of course, only the lower level of information. Each node uses O(τ(log log n + log polylog n)) = O(τ log log n) = o(log n) bits, thus fitting in (less than) a machine word. The space taken by whole data structure is, therefore, O(τ O(1/σ) log log n) = O(polylog n · log log n) bits. We now detail how the operations are implemented. To support Search(x), i.e., determin- ing the block where the integer x is comprised, we percolate T , locating the correct child at each node in O(log τ) = O(log log n) by binary searching on blocks’ upper bounds. Specific- ally, if the upper bounds of the i-th block is needed for comparison for some 1 ≤ i ≤ Θ(τ), we access the block following the pointer (to the right-most block) of the i-th pair stored in the upper level of the node and we retrieve the upper bound in O(1), given that we also know the size of the block. When we have to insert/delete an integer, we identify the proper block of the total order in/from which the integer must be inserted/deleted in O(log log n) time (as described for the Search operation) and update the pairs along the path from the root in constant time, as these pairs fits in o(log n) bits overall. If a split or merge of a block happens, it is handled as usual and solved in a constant number of O(1)-time operations. During an Access(i) query, we follow the proper root-to-leaf path in T . The traversal of the data structure does not need to access the blocks directly, but instead uses their sizes to determine the correct child at each level. By binary searching the sizes, we traverse the data structure in O(log log n) time. During the traversal of the path we also compute the sum ∆ of the sizes of the preceding blocks by summing to the current value of ∆, at each level, the

C P M 2 0 1 7 48:10 Dynamic Elias-Fano Representation

value stored in the (j − 1)-th pair of the lower level if the j-th child is traversed. Finally we retrieve the (i − ∆)-th integer from the identified block in O(1), as the blocks of C support random access.

5.1.2 Full Data Structure Layout Let ` be log n/ log log n for the rest of the paper. We logically divide the sorted sequence S(n, u) into mini blocks of Θ(`) = ρ` integers each. We organize the dynamic layout into two levels.

Lower level. We group O(log2 n) consecutive mini blocks together and index such collection using the data structure T described in Lemma 4. We refer to this collection as a “block” and 2 k0 0 2 say that T stores a block of O(log n) mini blocks. The set {Tj}j=1, with k = n/O(` log n), of all such data structures forms the lower level of the dynamic layout. Each Tj also stores the lower bound fj of its block and the number of low bits required by its Elias-Fano representation in Θ(log u) bits, so that we can subtract fj to all the integers belonging to the mini blocks of Tj.

k0 Upper level. The set {fj}j=1 of all the lower bounds of the blocks are indexed using an y-fast trie. The sizes of the blocks are maintained, instead, using the dynamic prefix sums data structure described in [5], which is a B-tree in which each node stores a dynamic prefix sums data structure operating on a small set of integers in O(1) time. In particular we use the operation Update(i, ∆) of P as implemented in [5], which sums to the i-th integer of the data structure the quantity ∆ (that fits in δ bits) and runs in optimal O(log n/ log(w/δ)) worst-case time. In our setting this operation is supported in O(`) given that δ = ∆ = 1. These two data structures, respectively named Y and P in the following, form the upper level of the dynamic layout. The j-th leaf of Y and P stores a O(log n)-bit pointer to the data structure Tj in the lower level.

To handle the memory allocation for the mini blocks, we employ a different technique to manage the high and low part of their Elias-Fano representation. Recall from Subsection 2.2 that, given a sequence S(n, u), the high part of EF(S(n, u)) consists in a bitvector of at most u 2n bits, whereas the low part is given by a vector of n dlog n e-bit integers. In our case, the high part of each mini block requires at most 2` = O(w) bits and is stored using the data structure of Theorem 6 from [17] that allows to address and allocate the high part of a mini block in O(1) worst-case time. The low part of a mini block is instead stored using the data structure of Corollary 3 from [24] that supports Access in O(1) and Insert/Delete in O(`) worst-case time for any fixed positive  < 1.

5.2 Space Analysis The space required by the introduced layout will be clearly given by the contribution of: the data structures Y and P used in the upper level and the data structures T of Lemma 4 used in the lower level; the cost of representation of the mini blocks encoded with Elias-Fano; the overhead given by the mini blocks memory management. In the following we separately analyze each contribution.

n The space taken by the data structures Y and P in the upper level is O( ` log2 n log u) = o(n) n 2 bits. All the data structures T of Lemma 4 require O( ` log2 n log n log log n) = o(n) bits too. G. E. Pibiri and R. Venturini 48:11

We now analyze the space taken by the encoding of the mini blocks. Since the universe of representation of a mini block could be as large as the one of its comprising block, i.e., u, storing the lower bounds of the mini blocks in order to use reduced universes (as already n done for the blocks), would require O( ` log u) bits, which is too much. In what follows we show that it is not necessary to re-map the mini blocks using Property 1, hence these are kept encoded with the universe relative to their comprising block, if we carefully set the number of bits required to represent each low part in the Elias-Fano space bound (2). As pointed out previously, each low part in the Elias-Fano representation of a sequence S(n, u) u is encoded using dlog n e bits, which is the number of bits needed to encode the average gap u u/n of S. The number of bits for the average gap of a block is therefore dme = dlog ` log2 n e. The idea is to choose a number of bits dm0e for the encoding of the average gap of the mini blocks such that dm0e = dme for a sufficiently long sequence of p insertions/deletions. After p insertions/deletions have been performed, we rebuild the mini blocks using dme bits for the average gap. In other words, we want to guarantee that encoding the mini blocks with dm0e bits for the average gap, instead of dme, does not introduce any extra space. 0 u u 0 Since m lies in the interval [l, r] = [log ` log2 n+p , log ` log2 n−p ], m must be chosen in order to satisfy dme − 1 < m0 < dme, which indeed implies dm0e = dme. Precisely, we satisfy this condition by fixing m0 = m ± θ with dme − l < ±θ < dme − r + 1. To derive this condition, we distinguish three possible cases. 1. [l, r] ⊂ [dme − 1, dme). In this case the condition dme − 1 < m0 < dme is already satisfied. The other two cases are symmetric. 2. dle = dme − 1. In this case we set m0 = m + θ. To let dme − 1 < m0 < dme holds, θ must be at least dme − l and at most dme + 1 − r. 3. dre = dme + 1. In this case we set m0 = m − θ. To let dme − 1 < m0 < dme holds, θ must be at least r − dme − 1 and at most l − dme. Cases 2. and 3. together yield the condition dme − l < ±θ < dme − r + 1. Finally, we have to determine the proper number p of insertions/deletions before triggering the rebuilding of the mini blocks in order to attain to optimal insert/delete amortized time O(`). As blocks are of size Θ(` log2 n), p is chosen to be O(log2 n).

The techniques used to manage the memory allocation for the mini blocks introduce an overall redundancy of o(n) bits. Precisely, the data structure of Theorem 6 from [17] has an 4 n 2 overhead of O(w + log n log w) = o(n) bits, while the one of Corollary 3 from [24] uses o(n) bits by choosing a proper positive  < 1.

In conclusion, by the above discussion and the use of Property 1, the space taken by the mini blocks can be safely upper bounded by EF(S(n, u)) and the redundancy sums up to o(n) bits, so that the whole data structure requires EF(S(n, u)) + o(n) bits of space.

5.3 Operations

In this subsection we describe how the operations of Theorem 3 are implemented. As stated before, ` is a short-hand for log n/ log log n. To Access the i-th integer, we first resolve Search(i) on P in O(`): Search(i) = j indicates that the j-th block contains the i-th integer given that Sum(j − 1) < i ≤ Sum(j), where Sum(j) equals the sum of the sizes of the first j blocks. We then follow the pointer stored in the j-th leaf of P, which points to the data structure Tj. We finally Access the integer x of index i − Sum(j − 1) from Tj in O(log log n) and return x + fj. The overall complexity is, therefore, O(`). To Insert/Delete an integer x, we perform the following steps: 1. identify

C P M 2 0 1 7 48:12 Dynamic Elias-Fano Representation

the proper data structure Tj by resolving a partial Successor(x) query on Y in O(log log n) and following the pointer retrieved at the identified leaf of Y; 2. identify the correct mini block by Search(x − fj) in Tj in O(log log n); 3. Insert/Delete x − fj in Tj by rebuilding the proper mini block in Θ(`); 4. update P in O(`). During the third step, split or merge of a mini block can happen and it is handled in O(`) worst-case time by the data structure Tj; rebuilding of the mini blocks can happen as pointed out in the previous section and it is handled in O(`) amortized time. If split/merge of a block happens, the lower bound of the block is inserted/removed from Y in O(log log n) time. The overall complexity is, therefore, O(`) amortized. The query Predecessor(x) is supported as follows (Successor(x) runs in a similar way). We identify the proper data structure Tj in O(log log n) by answering a partial Predecessor(x) query on Y and following the pointer retrieved at the identified leaf of Y. Then we identify the proper mini block by Search(x − fj) in Tj in O(log log n) time. We finally return fj + Predecessor(x − fj) by binary searching on the identified mini block. The overall complexity is O(log log n) worst-case. The minimum and maximum elements of S are stored uncompressed using Θ(log u) bits and returned when requested in O(1). Upon insertion/deletions these are updated as needed.

6 Conclusions

In this paper we have shown how the Elias-Fano representation of a monotone integer sequence S can be adapted to obtain optimal data structures in terms of query time and almost optimal in terms of space. In particular, when integers are drawn from a polynomial universe of size u = nγ , for any γ = Θ(1), our data structures take the same asymptotic space of the plain, static, Elias-Fano representation, i.e., EF(S(n, u)) + o(n) bits and support: 1. static Predecessor/Successor queries in optimal worst-case time O(min{1 + u log n , log log n}) (Section 3); 2. a O(1) worst-case amount of work for Append when integers are inserted in sorted order (Section 4); 3. Access in optimal O(log n/ log log n) worst- case time, Insert/Delete in optimal O(log n/ log log n) amortized time, Predecessor/Successor u queries in optimal O(min{1 + log n , log log n}) worst-case time (Section 5). As a last note, we observe that the data structure described in Section 5 allows us to support all operations in time O(log log u) when non-polynomial universes are considered, i.e., when n and u are not necessarily related by means of the formula u = nγ for any γ = Θ(1). In this setting, the data structure of Lemma 4 will take O(polylog u · log log u) bits and operate in O(log log u) time. In order to guarantee an overall redundancy of o(n) bits, we let mini blocks be of size Θ((log log u)2) and group O(log2 u) consecutive mini blocks into a block. The high part of a mini block fits into one machine word, whereas we can insert/delete a low part in O((log log u)2) for Corollary 3 of [24], which is O(log log u) as 1 soon as  < 2 . Therefore, the following corollary matches the asymptotic time bounds of y-fast tries and van Emde Boas trees but in almost optimally compressed space.

I Corollary 2. There exists a data structure representing an ordered set S(n, u) of n integers drawn from a universe of size u that takes EF(S(n, u))+o(n) bits of space and supports: Access and Predecessor/Successor queries in O(log log u) worst-case; Insert/Delete in O(log log u) amortized and Minimum/Maximum in O(1).

Acknowledgements. This work was partially supported by the EU H2020 Program under the scheme INFRAIA-1-2014-2015: Research Infrastructures, grant agreement #654024 SoBigData: Social Mining & Big Data Ecosystem and by the Pegaso Project, POR FSE 2014-2020. G. E. Pibiri and R. Venturini 48:13

References 1 Miklós Ajtai. A lower bound for finding predecessors in Yao’s cell probe model. Combin- atorica, 8(3):235–247, 1988. 2 Arne Andersson and Mikkel Thorup. Dynamic ordered sets with exponential search trees. Journal of the ACM (JACM), 54(3):13, 2007. 3 Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem. In Proceedings of the Thirty-First Annual Symposium on Theory of Computing (STOC), pages 295–304, 1999. 4 Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences (JCSS), 65(1):38–72, 2002. 5 Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic relative compression. CoRR, abs/1504.07851, 2015. 6 Michael Busch, Krishna Gade, Brian Larson, Patrick Lok, Samuel Luckenbill, and Jimmy Lin. Earlybird: Real-time search at Twitter. In Proceedings of the 28-th International Conference on Data Engineering (ICDE), pages 1360–1369, 2012. 7 David Clark. Compact Pat Trees. PhD thesis, University of Waterloo, 1996. 8 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (3-rd Edition). MIT Press, 2009. 9 Erik D. Demaine and Mihai Pˇatraşcu. Tight bounds for the partial-sums problem. In Proceedings of the 15-th Annual Symposium on Discrete Algorithms (SODA), pages 20–29, 2004. 10 Peter Elias. Efficient storage and retrieval by content and address of static files. Journal of the ACM (JACM), 21(2):246–260, 1974. 11 Robert Mario Fano. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, MIT, Cambridge, MA, 1971. 12 Michael L Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst-case access time. Journal of the ACM (JACM), 31(3):538–544, 1984. 13 Michael L. Fredman and Michael E. Saks. The cell probe complexity of dynamic data structures. In Proceedings of the 21-st Annual Symposium on Theory of Computing (STOC), pages 345–354, 1989. 14 Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences (JCSS), 47(3):424–436, 1993. 15 Roberto Grossi, Rajeev Raman, S. Srinivasa Rao, and Rossano Venturini. Dynamic com- pressed strings with random access. In Proceedings of 40-th International Colloquium on Automata, Languages, and Programming (ICALP), pages 504–515, 2013. 16 Ankur Gupta, Wing-Kai Hon, Rahul Shah, and Jeffrey Scott Vitter. Compressed data structures: Dictionaries and data-aware measures. Theoretical Computer Science (TCS), 387(3):313–331, 2007. 17 Jesper Jansson, Kunihiko Sadakane, and Wing-Kin Sung. CRAM: Compressed random access memory. In Proceedings of 39-th International Colloquium on Automata, Languages, and Programming (ICALP), pages 510–521, 2012. 18 Veli Mäkinen and Gonzalo Navarro. Rank and select revisited and extended. Theoretical Computer Science (TCS), 387(3):332–347, 2007. 19 Gonzalo Navarro and Yakov Nekrich. Optimal dynamic sequence representations. In Pro- ceedings of the Twenty-Fourth Annual Symposium on Discrete Algorithms (SODA), pages 865–876, 2013. 20 Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proceed- ings of the 37-th International Conference on Research and Development in Information Retrieval (SIGIR), pages 273–282, 2014.

C P M 2 0 1 7 48:14 Dynamic Elias-Fano Representation

21 Mihai Pˇatraşcuand Mikkel Thorup. Time-space trade-offs for predecessor search. In Proceedings of the 38-th Annual Symposium on Theory of Computing (STOC), pages 232– 240, 2006. 22 Mihai Pˇatraşcuand Mikkel Thorup. Randomization does not help searching predecessors. In Proceedings of the 18-th Annual Symposium on Discrete Algorithms (SODA), pages 555–564, 2007. 23 Mihai Pˇatraşcuand Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In Proceedings of the 55-th Annual Symposium on Foundations of Computer Science (FOCS), pages 166–175, 2014. 24 Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct dynamic data struc- tures. In Proceedings of the 7-th International Workshop on Algorithms and Data Structures (WADS), pages 426–437, 2001. 25 Kunihiko Sadakane and Roberto Grossi. Squeezing succinct data structures into entropy bounds. In Proceedings of the 17-th Annual Symposium on Discrete Algorithms (SODA), pages 1230–1239, 2006. 26 Peter van Emde Boas. Preserving order in a forest in less than logarithmic time. In Proceedings of the 16-th Annual Symposium on Foundations of Computer Science (FOCS), pages 75–84, 1975. 27 Peter van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters (IPL), 6(3):80–82, 1977. 28 Peter van Emde Boas, Robert Kaas, and Erik Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory (MST), 10:99–127, 1977. 29 Sebastiano Vigna. Quasi-succinct indices. In Proceedings of the 6-th International Confer- ence on Web Search and Data Mining (WSDM), pages 83–92, 2013. 30 Dan E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). In- formation Processing Letters (IPL), 17(2):81–84, 1983. 31 Andrew Chi-Chih Yao. Should tables be sorted? Journal of the ACM (JACM), 28(3):615– 628, 1981.