
Advanced Data Structures
Anubhav Baweja
May 19, 2020

1 Introduction

Design of data structures is important for storing and retrieving information efficiently. All data structures have a set of operations that they support within some time bounds. For instance, balanced binary search trees such as AVL trees support find, insert, delete, and other operations in O(log n) time, where n is the number of elements in the tree.

In this report we will talk about 2 data structures that support the same operations but with different time complexities. This problem is called the fixed-universe predecessor problem, and the two data structures we will be looking at are Van Emde Boas trees and Fusion trees.

2 Setting the stage

For this problem, we will make the fixed-universe assumption. That is, the only elements that we care about are w-bit integers. Additionally, we will be working in the word RAM model: we can assume that w ≥ log n where n is the size of the problem, and we can do operations such as addition, subtraction, etc. on w-bit words in O(1) time. These are fair assumptions to make, since these are restrictions and advantages that real computers offer. So now, given the set T, we want to support the following operations:

1. insert(T, a): insert integer a into T. If it already exists, do not duplicate.
2. delete(T, a): delete integer a from T, if it exists.
3. predecessor(T, a): return the largest b in T such that b ≤ a, if one exists.
4. successor(T, a): return the smallest b in T such that b ≥ a, if one exists.

Note that balanced binary search trees can support all these operations in O(log n) time, so we will try to do better here.

3 Van Emde Boas Trees

If we are given that the size of the universe is u, then using vEB trees we can support all the given operations in O(log log u) time [1]. If we are considering all integers of word length w, then we have that u = 2^w, so our time bound becomes O(log w). If we additionally assume that w is polynomial in log n (for instance w = Θ(log n)), then O(log w) = O(log log n), significantly better than the complexity for binary search trees.

In order to motivate this complexity, we need to have a recurrence that solves to O(log log u). The classic example of such a recurrence is T(u) = T(√u) + O(1), so we will strive to get to that point. But first let’s incrementally build this data structure in steps.

Advanced Data Structures 15-751

3.1 The first solution

A naive thing we could do is just maintain a bit vector on all possible elements in our universe. This will give us O(1) inserts and deletes, but predecessor and successor can be as bad as O(u). However, we can make the following optimization: we store another bit vector with half the size where each element is an OR of two adjacent entries. Then we will combine adjacent elements of this bit vector to get another one and so on.

Figure 1: The data structure for the set S = {1, 2, 3, 7} where u = 8

It is clear that we can do insert and delete in O(log u) time with this modification (just update ancestor blocks, and also maintain a counter for the number of elements present in the range), but now we can also do the predecessor and successor operation in that time. We first find the leaf of the corresponding position, and do the rest in 2 phases:

1. Up phase: We keep going up until we enter a node from the left side such that the right child of the node is also 1.

2. Down phase: From there we go down the left child if there is a 1, otherwise we go down the right child.

For example, if we wanted the successor of 1 in Figure 1, then we go up until the 0−3 block, since its right child also has a 1, and from there we go down to the 2 block, so we get that the successor of 1 is 2. This is particularly nice because we see that the hard case for successor(T, a) is when a is already in T: otherwise we can make the search for a the Down phase itself, bypassing the Up phase completely.
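The layered bit vector and its two-phase search can be made concrete with a small Python sketch (a toy version using Python lists rather than word-RAM bit tricks; all names are ours, and successor follows the b ≥ a definition from Section 2, so it returns a itself when a is present):

```python
def build_layers(elements, u):
    """Layer 0 is the bit vector over the universe {0, ..., u-1} (u a power
    of two); each higher layer ORs adjacent pairs of the layer below."""
    layers = [[0] * u]
    for x in elements:
        layers[0][x] = 1
    size = u
    while size > 1:
        size //= 2
        prev = layers[-1]
        layers.append([prev[2 * i] | prev[2 * i + 1] for i in range(size)])
    return layers

def successor(layers, a):
    """Smallest b >= a in the set, or None, in O(log u) time."""
    if layers[0][a]:
        return a
    j, i, found = 0, a, False
    # Up phase: climb until we sit in a left child whose right sibling's
    # subtree contains a 1.
    while j < len(layers) - 1:
        if i % 2 == 0 and layers[j][i + 1]:
            i, found = i + 1, True
            break
        j, i = j + 1, i // 2
    if not found:
        return None
    # Down phase: descend to the leftmost 1 in that subtree.
    while j > 0:
        j -= 1
        i = 2 * i if layers[j][2 * i] else 2 * i + 1
    return i
```

On the set of Figure 1, `build_layers([1, 2, 3, 7], 8)` followed by `successor(layers, 4)` walks up past the empty 4−5 and 4−7 blocks and back down to 7.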

3.2 Motivation for vEB trees

Note that the reason why the above solution has O(log u) complexity is because we divide the entire set into 2 parts at every layer, since the recurrence we are implicitly solving is T(u) = T(u/2) + O(1). Since we want to move towards T(u) = T(√u) + O(1) instead, let’s divide the set into √u parts of size √u. Therefore every vEB stores √u many vEBs of size √u each, and the total height of the tree is O(log log u).

When we divided the set into 2 halves, we could just OR the result stored in the 2 halves to compute the result of the set. This is because at the end of the Up phase, there was always only one place to look: the right child of the node. However now there might be as many as √u − 1 many options

to pick from, and we cannot go through each one of them since that would destroy our complexity. So we need to figure out some other way to do this.

Here is the super clever part: deciding which child to go down on is like solving the successor problem again. Since the node has √u children, we can enumerate them from 0 to √u − 1, and while coming up from child/block i, we can just ask for the successor of i on a bit set that we maintain over these children. And this can be solved with a vEB tree of size √u. So not only does a vEB contain √u children vEBs with √u elements, but we have an extra vEB called the “summary”, which also contains √u elements (although these elements are artificial in the sense that they correspond to the child/block numbers that we have assigned to the children).

3.3 Cleanup

This is all really cool, but we need to tie up some loose ends. In particular, do we really query this summary vEB on every node in the Up phase? Note that if we do, then the recursive formula is no longer T(u) = T(√u) + O(1). So we can only query the summary at most a constant number of times in the Up phase. In fact, if we just store the maximum stored in each vEB, then we only need to query the summary once: at the end, when we flip to the Down phase. In the Up phase, we can check if the queried integer is equal to the max. If it is, we continue going up; otherwise we know that there exists a greater integer in the set that lies within this vEB, so we query the summary and start the Down phase with the appropriate block.

Note that in order to support the predecessor operation, we need to do a similar thing and store the minimum. With these additions to our data structure, we have finally achieved O(log log u) time predecessor and successor operations.

3.4 Insert and Delete

Now we just need to make sure insertions and deletions can still be supported in O(log log u) time:

1. insert(V, a): Starting at the root, we can figure out which child vEB to go down on in O(1) time. If the number is less than the minimum or greater than the maximum, we update the corresponding field in O(1). Now there are two cases:

   • The child vEB we want to insert a into is not empty. In this case we do not need to update the summary at all, and we can just proceed by inserting a into the child, which takes T(√u) time.

   • The child vEB we want to insert a into is empty. In this case we need to enter the child’s index into the summary, which takes T(√u) time. Now one might think that we also need to insert a into the child vEB recursively, taking another T(√u) time and breaking our recursive formula. However, since the child vEB is empty to begin with, we have already reached our ’base case’ in a way: inserting into an empty vEB only requires setting its minimum and maximum, which takes O(1) time. The remaining cost of insertions will be a total of O(log log u) because that is the height of the tree, and the total cost is T(u) + O(log log u) where T(u) = T(√u) + O(1), so we are good.

2. delete(V, a): Note that there is nothing to be done if the tree is empty. Just like insert, we now need to consider a few cases:


• There is a single element in V. In this case the Min and Max are equal (which can be checked in O(1) time), so we just check whether they are equal to a. If they are not, then nothing needs to be done. If they are, then we set both Min and Max to None, and reset both the summary and the appropriate child to an empty vEB.

• There is more than one element in V, and a is the Min or Max of V. WLOG it is the minimum. Let S be the summary of V and let m be the Min of S. Then either a is the only element in the mth child of V, or it is not.
  – If it is the only element, then we can delete it from the mth child in O(1) time because that child is going to become empty. Then, in order to update the minimum, we need the minimum of the next non-empty child of V. Therefore we call delete(S, m) (this is the only step that takes T(√u) time), after which S has a new minimum m′, and we just take the minimum of the m′th child of V.
  – If it is not the only element, then we recursively delete a from the mth child of V, and set the new minimum to be the minimum of the mth child of V.

• There is more than one element in V, and a is not the Min or Max of V. Similar to the previous case but without the Min and Max updates.

And we are finally done! With all this work we can finally support all 4 operations in O(log log u) time. However, this data structure has one key limitation: it uses O(u) space. Since u is the size of our universe, which can for example be the set of all 32-bit integers, this data structure clearly uses an exorbitant amount of memory.
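The pieces described so far (min/max fields, a lazily stored minimum, the summary, and the O(1) insert into an empty child) fit into a short Python sketch. This is a simplified, CLRS-style variant, not the exact structure above: the minimum is not recursed into, delete is omitted for brevity, successor is strict (membership can be checked separately via min/max), and the children already live in a dictionary, anticipating Section 3.5:

```python
class VEB:
    """Toy vEB over the universe {0, ..., 2^w - 1}; 'sqrt' splits the bits."""

    def __init__(self, w):
        self.w = w
        self.min = self.max = None
        if w > 1:
            self.hi_w = (w + 1) // 2      # bits for the cluster index
            self.lo_w = w // 2            # bits within a cluster
            self.clusters = {}            # lazily created children
            self.summary = None           # vEB over non-empty cluster indices

    def _split(self, x):
        return x >> self.lo_w, x & ((1 << self.lo_w) - 1)

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x       # inserting into an empty vEB: O(1)
            return
        if x == self.min or x == self.max:
            return                        # already present, do not duplicate
        if x < self.min:
            self.min, x = x, self.min     # the minimum is stored lazily
        if x > self.max:
            self.max = x
        if self.w > 1:
            hi, lo = self._split(x)
            if hi not in self.clusters:
                self.clusters[hi] = VEB(self.lo_w)
                if self.summary is None:
                    self.summary = VEB(self.hi_w)
                self.summary.insert(hi)   # cluster just became non-empty
            self.clusters[hi].insert(lo)  # O(1) if that cluster was empty

    def successor(self, x):
        """Smallest stored b > x, or None."""
        if self.min is not None and x < self.min:
            return self.min
        if self.w == 1:
            return 1 if x == 0 and self.max == 1 else None
        if self.summary is not None:
            hi, lo = self._split(x)
            c = self.clusters.get(hi)
            if c is not None and lo < c.max:
                return (hi << self.lo_w) | c.successor(lo)
            nxt = self.summary.successor(hi)  # the single summary query
            if nxt is not None:
                return (nxt << self.lo_w) | self.clusters[nxt].min
        return None
```

Note how each successor call recurses into either one cluster or the summary, never both, which is exactly the T(u) = T(√u) + O(1) recurrence.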

3.5 Low space modifications

There are 2 ways in which we can reduce our space:

1. Can get O(n log n) memory using hash tables. This gives us O(log log u) complexity with high probability.

2. Can get O(n) memory using Y-fast tries. This gives us O(log log u) complexity amortized.

We are going to discuss how to use hash tables here. Y-fast tries [2] are a story for another time.

Currently we are storing an array of pointers in every single vEB. This array uses O(√u) memory, no matter how many children actually have elements in them. A lot of children can be empty, and we are allocating unnecessary space for them. This can be avoided by using a hash table which maps i to the ith child vEB. For every entry in our set S, we need to have it as an entry in each of its ancestors, and in the summary of each of its ancestors. Since the recursion has depth log log u, and each level can at worst double the number of entries an element is responsible for (one in a child and one in a summary), each element accounts for at most 2^{log log u} = log u = w entries, so the total space used is O(nw). Assuming w = O(log n), we get O(n log n) space.

This completes our section on vEB trees, and now we will move on to another interesting data structure which solves the fixed-universe predecessor problem.


4 Fusion Trees

We will still work in the word-RAM model where every word has size w, so the size of the universe is u = 2^w. Fusion trees [3] let us solve the predecessor problem in O(log_w n) time, which is not necessarily better or worse than the bound we found for Van Emde Boas Trees. In particular, vEB trees work better if w is small: about poly(log n), and fusion trees work better when w is large: about n^ε for some ε > 0.

Note that we are only going to discuss a static version of fusion trees: we are not going to show how to support insert or delete, but they can also be supported with Exponential Search Trees [4] with only an extra O(log log n) overhead.

4.1 B-trees

B-trees are a generalization of binary search trees with an arbitrary “branching factor”. In binary search trees we say that the branching factor is 2, because every node can have at most 2 children. In a B-tree of branching factor k, a node can have at most k children, and can also store up to k − 1 keys. Note that the values in the k subtrees are dictated by the values of the k − 1 keys: the keys in the ith subtree are greater than the (i − 1)th key and less than the ith key of the node (the first subtree holds keys smaller than the first key, and the last subtree holds keys larger than the last key).

Figure 2: A B-tree of branching factor 3

B-trees are a very interesting data structure in their own right, and are used in databases and file systems. However, we will not go into the details for how insert and delete work because we are building a static data structure anyway.

However, the find operation is really important for us. The height of the tree is log_k n, and at every level we need to decide which path to go down. If k is a small constant this is not a problem, since we can just do linear search. Otherwise, we need to maintain a balanced search structure, such as a BST, at every single node to decide which path to go down. Therefore it takes O(log k) time per level, which brings the total complexity of find to O(log k · log_k n) = O(log k · (log n / log k)) = O(log n).
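A static B-tree node and its find can be sketched in a few lines of Python (our names; here the per-node O(log k) search is a binary search over the sorted key array via the `bisect` module, which plays the same role as the balanced BST mentioned above):

```python
import bisect

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted, at most k - 1 keys
        self.children = children or []    # k subtrees, or [] at a leaf

def find(node, q):
    """Return True iff q is stored in the (static) B-tree rooted at node."""
    while node is not None:
        i = bisect.bisect_left(node.keys, q)   # O(log k) per node
        if i < len(node.keys) and node.keys[i] == q:
            return True
        node = node.children[i] if node.children else None
    return False
```

For example, with a root holding keys [10, 20] over three leaf children [3, 7], [12, 15], [25], `find` descends one level and answers membership queries in two O(log k) steps.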

4.2 Motivation

The key idea is that fusion trees are B-trees with branching factor w^{1/5}, along with the use of a lot of bit tricks.

If we just take a B-tree with branching factor w^{1/5}, and even use the balanced BST trick, we get that the search time is O(log n). This is good, but we want O(log_w n). Since the height of the tree


is O(log_w n), we therefore want O(1) search time per node. This is where the bit tricks come in, and the entire write-up from here on out will be dedicated to obtaining this O(1) bound.

4.3 Sketching

The first observation we make is that if we have at most w^{1/5} keys in one single node, then we don’t really need to store all w bits of the keys to differentiate between them. In fact, we can get away with storing just w^{1/5} − 1 bits.

Figure 3: A trie representing 3 keys present in a node. Note that only bits 1 and 4 are necessary to distinguish between the 3 keys.

If we arrange the keys in a trie-like fashion as above, then we note that we only need to store bits 1 and 4 in order to distinguish between the 3 keys, because we only have 2 branching points: before bits 1 and 4. Therefore we will “sketch” the key 001001 as 00, 001011 as 01, 011011 as 11. Note the amazing properties this sketch has:

1. sketch(a) < sketch(b) iff a < b, so the sketch preserves order.

2. The sketch only keeps k − 1 bits, where k is the number of keys in the node, which can be at most w^{1/5}. We can see this from the diagram above: k keys will create at most k − 1 branching points, which will create at most k − 1 important bits for our sketch.

Now since there are at most w^{1/5} keys with at most w^{1/5} sketch bits each, there are a total of w^{2/5} = o(w) bits per node, which means we can read the entire contents of the node in one read, since we are working in the word-RAM model with word size w.
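On a toy example the branching positions and the resulting sketches can be computed naively in Python (our names; positions are numbered from the MSB, as in Figure 3; the O(1) word-RAM version of sketching is the subject of Section 4.6):

```python
def important_bits(keys, w):
    """MSB-numbered positions where the binary trie over the keys branches:
    for every pair of distinct keys, the position of their highest differing
    bit is a branching point, and every branching point arises this way."""
    bits = set()
    for a in keys:
        for b in keys:
            if a != b:
                bits.add(w - (a ^ b).bit_length())
    return sorted(bits)

def sketch(x, bits, w):
    """Concatenate just the bits of x at the important positions,
    preserving the relative order of the keys."""
    s = 0
    for pos in bits:
        s = (s << 1) | ((x >> (w - 1 - pos)) & 1)
    return s
```

With the keys of Figure 3 (001001, 001011 and 011011, i.e. 9, 11 and 27 for w = 6) the important positions come out as bits 1 and 4, and the sketches as 00, 01 and 11.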

4.4 Desketchifying

Now if we had not sketched, then we would have been able to solve the successor and predecessor problem by using the Up and Down phase technique as presented in Section 3.1. However, sketching makes life slightly harder for us. If the queried value a already exists in our tree, then we will be fine, because we will be able to compare the sketch of a with the sketches stored at nodes, and therefore decide which vertex to go down on. However, if a does not exist in the data structure, then we have a problem: we don’t have a sketch of this a at all, so how would we compare this element with the sketches?


Figure 4: Bold edges represent keys present in data structure. Dotted edges represent the query element

We make a convenient observation first: if say we want to find the successor of q in the node, then we could also just find the predecessor of q instead if we feel it’s easier. This is because once we have the predecessor of q, then the successor of q is just going to be the next sketch value in the node, which can be obtained in O(1) time. Similarly we can find the successor instead of the predecessor if that is easier.

Even though this may seem kind of pointless at first, we first find the keys x_i and x_{i+1} such that sketch(x_i) ≤ sketch(q) ≤ sketch(x_{i+1}), where q is the query element. In the above example this is x_i = 0001 and x_{i+1} = 0010, which are clearly not the predecessor and successor of q.

Now let’s compare the original bit strings of x_i and q. They have a longest common prefix in their bit representations, and a point D where they diverged. In the example, this common prefix is 0, and D is the large dashed box in the image. Now we will create a new query q2 such that q2 is the same as q in this common prefix, but then instead diverges in the direction of x_i instead of q. In the example, we diverged at bit 1 where q had a 1 and x_i had a 0. So q2 starts with 00, and then we append as many 1s as we need in order to make q2 have word length w. So q2 = 0011, or the leaf corresponding to the smaller dashed box in the figure.

Now if we get y_i, y_{i+1} such that sketch(y_i) ≤ sketch(q2) ≤ sketch(y_{i+1}), then y_i ≤ q because they have the same bits in the prefix, and then at the divergence point q takes a 1 and y_i takes a 0. However, more surprisingly, this is in fact the largest such key in the set and is therefore the predecessor of q. Since we created q2 by obtaining the largest value possible (by appending tons of 1s) such that it was consistent with x_i for just one more bit than q was, y_i must be the largest key present in the left subtree of D. So y_i would indeed be the predecessor of q if only there was no key smaller than q present in the right subtree of D, right? But in that case, D would also have been a divergence point in the original set! Therefore in the example we would have included bit 1 in the sketch, and therefore the sketch of q would have been greater than the sketches of x_i and x_{i+1}, which is a contradiction.

Note that all the above reasoning applies when q has a 1 at the divergence point D and x_i, x_{i+1} have a 0. We could also have the opposite case, and in that case we try to find the successor instead. We create q2 by taking the longest common prefix again, but now we append a 1 and a ton of 0s, to get the smallest value in the right subtree, and then find its successor in the sketch space.
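Putting the two rounds together, a single node’s predecessor query can be sketched in plain Python (a toy version: ordinary comparisons and `bisect` stand in for the O(1) word tricks, and, following the MIT 6.851 treatment, we compare q against both sketch-neighbours x_i and x_{i+1} to find the true divergence point; all names are ours):

```python
import bisect

def sketch(x, bits, w):
    """Concatenate the bits of x at the MSB-numbered important positions."""
    s = 0
    for pos in bits:
        s = (s << 1) | ((x >> (w - 1 - pos)) & 1)
    return s

def node_predecessor(keys, bits, w, q):
    """Largest key <= q in a single node (None if none exists)."""
    sk = [sketch(x, bits, w) for x in keys]
    i = bisect.bisect_right(sk, sketch(q, bits, w)) - 1
    if i >= 0 and keys[i] == q:
        return q
    # Round 2 setup: keep whichever sketch-neighbour shares the longer
    # prefix with q; their divergence bit is the point D of the text.
    cands = [keys[max(i, 0)], keys[min(i + 1, len(keys) - 1)]]
    x = min(cands, key=lambda y: (q ^ y).bit_length())
    d = (q ^ x).bit_length() - 1         # divergence bit, 0 = LSB
    if (q >> d) & 1:
        # q turns right where the keys turn left: the predecessor is the
        # largest key in the left subtree, via q2 = prefix, 0, then all 1s.
        q2 = (q | ((1 << d) - 1)) & ~(1 << d)
        j = bisect.bisect_right(sk, sketch(q2, bits, w)) - 1
        return keys[j] if j >= 0 else None
    # q turns left where the keys turn right: find the successor via
    # q2 = prefix, 1, then all 0s, and step back one slot in sorted order.
    q2 = (q & ~((1 << d) - 1)) | (1 << d)
    j = bisect.bisect_left(sk, sketch(q2, bits, w))
    return keys[j - 1] if j > 0 else None
```

On the keys 9, 11, 27 of Figure 3 (important bits 1 and 4, w = 6), the query 20 first lands between the sketches of 11 and 27 and is then redirected by q2 to the correct answer 11.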


4.5 Cleanup

And that is all there is to the data structure. If we can somehow show that all the operations above can be done in O(1) time then we would be done, but some things are still not clear:

1. How do we obtain these sketches in the first place? One might think that since this is a static data structure, we can just sketch the keys in preprocessing. However, it is not clear how to sketch the query q in O(1) time with the sketch we have defined. In order to do that, we will use the idea of an approximate sketch.

2. How do we obtain x_i and x_{i+1} such that sketch(x_i) ≤ sketch(q) ≤ sketch(x_{i+1})? We will need to do multiple comparisons quickly, so we will exploit the fact that all sketches fit inside a single word and use parallel comparisons.

3. How do we compute the longest common prefix of two bit vectors? Observe that this is the same as taking the XOR of the two vectors (can be done in O(1) in word-RAM), and then computing the most significant bit of the result.
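Item 3 can be spelled out directly (a Python sketch; `int.bit_length` stands in for the O(1) most-significant-bit primitive that we are deferring):

```python
def common_prefix_len(a, b, w):
    """Length of the longest common prefix of two w-bit values: w minus
    the bit-length of a XOR b (the position of their highest differing bit)."""
    return w - (a ^ b).bit_length()
```

For instance, the keys 001001 and 001011 of Figure 3 share a prefix of length 4.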

Out of these 3, approximate sketching is probably the most insightful tool, so we are not going to worry about the other 2 problems.

4.6 Approximate Sketching

So far we have been working with “perfect sketches”: we have a list of important indices b_0, b_1, …, b_{r−1}, and we have a bit vector x. We extract the r bits one at a time and make a string of length r, containing only the important bits, nothing else. This is hard to do in constant time, so we relax the condition a little bit: we allow some garbage bits to be in between the important bits. Firstly, we can create a bit vector with 1s in the positions b_i and AND it with x, zeroing out all the non-essential bits. We can represent this as x′ = x AND Σ_{i=0}^{r−1} 2^{b_i}. So we now have a lot of useless 0s between our important bits in x′. If the number of these 0s between important bits is predictable in some way, then we will be good, but how do we arrange that?
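The masking step is simple enough to write out (a toy Python sketch; positions are LSB-numbered here, unlike the MSB numbering used in Figure 3):

```python
def keep_important(x, bits):
    """x' = x AND sum_i 2^{b_i}: zero out every bit of x except the
    important ones, leaving useless 0s between them."""
    return x & sum(1 << b for b in bits)
```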

The perfect sketch achieved really good compression: we went from O(w) to O(w^{1/5}) bits needed per key. However, if we relax the space usage to O(w^{4/5}) per key instead, then we would still have O(w^{4/5}) · O(w^{1/5}) = O(w) bits for all the keys in the node, which means we can still store everything in a constant number of words. Therefore we now prove the following claim:

Claim: There exists an integer m = Σ_{i=0}^{r−1} 2^{m_i} such that the sketch of x can be defined as ((x′ · m) AND Σ_{i=0}^{r−1} 2^{b_i + m_i}) >> (b_0 + m_0 − 1). Essentially, the only bits we care about are at positions b_i + m_i, so we zero out everything else, and also delete the leading bits before position b_0 + m_0. Now m must satisfy the following:

1. All b_i + m_j are distinct. This is important because x′ · m = (Σ_{i=0}^{r−1} x_{b_i} 2^{b_i}) · (Σ_{j=0}^{r−1} 2^{m_j}) = Σ_{i=0}^{r−1} Σ_{j=0}^{r−1} x_{b_i} 2^{b_i + m_j}, where x_a represents the ath bit of x. In order to not corrupt the bits at positions b_i + m_i, there must be no other b_i + m_j that collides with them.

2. b_i + m_i < b_j + m_j iff i < j, to preserve order like we did in the perfect sketch.


3. (b_{r−1} + m_{r−1}) − (b_0 + m_0) ≤ O(r^4) ≤ O(w^{4/5}), since that is the space bound we established earlier.

Proof: This is a constructive proof by induction: instead of directly picking m_i though, we will first inductively construct m_i′ and then define m_i in terms of these m_i′. Instead of showing the b_i + m_i′ are unique, we will show they are unique modulo r^3. Let’s say we have already picked some m_0′, m_1′, …, m_{t−1}′ < r^3 for some t < r. To pick m_t′, we know that it must be different from m_i′ + b_j − b_k for all i < t and for all j, k < r. Therefore, there are at most t·r^2 values for m_t′ to avoid, which means there are strictly fewer than r^3 values to avoid, so we always have at least 1 value < r^3 to pick for m_t′.

Note that these m_i′ already satisfy condition 1, but in order to get conditions 2 and 3 as well, we need to space out the b_i + m_i a little. So we finally define

m_i = m_i′ + ((w − b_i + i·r^3) rounded down to a multiple of r^3)

This modification places every single b_i + m_i in a different block of size r^3: the ith block contains exactly b_i + m_i, and there are r such blocks. Note that we need the rounding to avoid overflow into the next block due to the addition of w (we need to add w, however, to make every index non-negative), and also to keep everything still the same modulo r^3.

This is beautiful, because it gives us both properties 2 and 3: 2 is satisfied because every b_i + m_i lands in the ith block, and since there are r blocks of length r^3, the total length of our sketch is just O(r^4). □

Note that this sketch can be computed in constant time because all we have done is an AND, a multiplication, another AND, and then a bitshift, all of which take O(1) time in the word-RAM model, so we are good.
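The greedy construction from the proof, and the resulting constant-operation sketch, can be written out in Python (a toy sketch with LSB-numbered positions; names are ours, and Python big integers stand in for w-bit words):

```python
def approx_sketch_params(bits, w):
    """Greedy choice of the m_i from the claim: first pick m'_i < r^3 making
    all b_i + m'_j distinct modulo r^3, then spread them into consecutive
    r^3-sized blocks. bits = sorted LSB-numbered important positions."""
    r = len(bits)
    r3 = r ** 3
    mp = []
    for _ in range(r):
        # m'_t must avoid m'_i + b_j - b_k (mod r^3): at most t*r^2 < r^3 values
        bad = {(mi + bj - bk) % r3 for mi in mp for bj in bits for bk in bits}
        mp.append(next(v for v in range(r3) if v not in bad))
    return [mp[i] + ((w - bits[i] + i * r3) // r3) * r3 for i in range(r)]

def approx_sketch(x, bits, ms):
    """(x AND B) * m AND M >> shift: one AND, one multiplication, another
    AND, and a bitshift, exactly the O(1) word operations noted above."""
    B = sum(1 << b for b in bits)                      # keep important bits
    m = sum(1 << mi for mi in ms)                      # the multiplier
    M = sum(1 << (b + mi) for b, mi in zip(bits, ms))  # positions b_i + m_i
    return ((x & B) * m & M) >> (bits[0] + ms[0])
```

On the running example (keys 9, 11 and 27 with important LSB positions 1 and 4, w = 6), the approximate sketches contain garbage zeros between the important bits but still preserve the order of the keys.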

The 2 techniques that we still need to cover every aspect of fusion trees are parallel comparisons, and most-significant-bit calculation in O(1) time. These are clever bit tricks that we won’t go over, but it is recommended that readers first try to discover them on their own: they are not particularly straightforward, but still good exercises.

5 Finale and Further Reading

So we have seen two data structures that solve the fixed-universe predecessor problem. Van Emde Boas trees can do this in O(log w) time, and fusion trees can do this in O(log_w n) time. Since both data structures work better under different values of w, one might just implement a wrapper data structure that uses one or the other depending on the value of w. The worst case of this wrapper is when both have the same complexity, that is, when O(log w) = O(log_w n) ⇔ O(log w) = O(log n / log w) ⇔ O(log² w) = O(log n) ⇔ O(log w) = O(√log n). This means this wrapper data structure has complexity O(√log n) irrespective of the value of w, which is pretty awesome.
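This crossover can be checked numerically (a quick Python sanity check with an illustrative n = 2^64; the wrapper's cost is the smaller of the two bounds, and its worst case over all word sizes sits where they cross):

```python
import math

def wrapper_cost(n, w):
    """min of the two exponents: log w (vEB) versus log_w n (fusion trees)."""
    return min(math.log2(w), math.log2(n) / math.log2(w))

# Maximizing over word sizes w = 2, 4, 8, ...: the worst case occurs at
# log2(w) = sqrt(log2(n)), i.e. cost sqrt(log n).
n = 2 ** 64
worst = max(wrapper_cost(n, 2 ** k) for k in range(1, 64))
```

Here log2(n) = 64, and the maximum of min(k, 64/k) is attained at k = 8 = √64.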

The key things we skipped over include Y-fast tries, parallel comparisons, and finding the most significant bit. It is recommended that the reader refer to the resources and references listed in the next sections to study how they work.


6 Resources

My main resource was lecture videos from MIT 6.851 Advanced Data Structures taught by Erik Demaine. The video lectures are available here: https://www.youtube.com/playlist?list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf

I watched the lectures on Van Emde Boas trees and Fusion trees a few times, and first drafted the key claims and proofs presented in the lectures. The lecture notes skip over a lot of details that are mentioned briefly during lecture and that I think are key to understanding the data structures, so I also tried to note down as many passing remarks as I could. Some details are also skipped in lecture, and I tried to flesh them out in such a way that these notes could be read and understood by an average undergraduate student.

For example, the insert and delete functions in Van Emde Boas trees have a few cases that are ignored in lecture for brevity, but I tried to cover them in this report. The proof of why sketching works for Fusion trees is also dismissed as obvious in the lectures, but I fleshed out the entire argument as a contradiction proof. Also, there is a key mistake in the vEB lecture: the hashing technique used to reduce space does not bring the space down to O(n); it should be O(n log n) instead. The argument seemed very hand-wavy to me, so I dug deeper and realized that this was a problem in Introduction to Algorithms [5], and the errata page (https://www.cs.dartmouth.edu/~thc/clrs-bugs/bugs-3e.php) now says that the hash table modification actually gives us Θ(n log n) space.

I was also not familiar with B-trees before this project, so I learnt about them from Carl Kingsford’s lecture slides: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/btrees.pdf

I also looked at Jelani Nelson’s lecture on Fusion trees for CS224 Advanced Algorithms at Harvard, and that was super helpful in answering some questions I had even after watching the MIT lectures. Some questions that students asked in that lecture were questions that I had myself, such as how we can even compute the approximate sketch in O(1) time as advertised: infocobuild.com/education/audio-video-courses/computer-science/CS224-Fall2014-Harvard/lecture-02.html

References

[1] Peter van Emde Boas. Preserving order in a forest in less than logarithmic time. In 16th Annual Symposium on Foundations of Computer Science (SFCS 1975), pages 75–84. IEEE, 1975.

[2] Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n). Information Processing Letters, 17(2):81–84, 1983.

[3] Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences, 47(3):424–436, 1993.

[4] Arne Andersson and Mikkel Thorup. Dynamic ordered sets with exponential search trees. Journal of the ACM (JACM), 54(3):13–es, 2007.

[5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
