Ribbon filter: practically smaller than Bloom and Xor¹

Peter C. Dillinger, Facebook, Inc., Seattle, Washington, USA
Stefan Walzer, University of Cologne, Cologne, Germany
ABSTRACT

Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0, 1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound.

We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for

themselves². Thus, the Bloom filter filters out almost all specific key queries³ to data files that would find no relevant data. False negative (FN) queries would be incorrect for this application and must never occur.

Blocked Bloom filters [44, 54] are a popular Bloom variant because they are extremely fast. We do not expect to improve upon this solution⁴ for short-lived applications such as database joins or the smallest levels of an LSM-tree. However, Bloom filters use at least 44% more space ("space overhead") than the information-theoretic lower bound of $\lambda = \log_2(1/f)$ bits per key for a hashed filter with FP rate $f$ [13, Section 2.2].
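The 44% figure can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the textbook idealized Bloom filter, whose space with an optimal (unrounded) number of hash functions is $\log_2(1/f) / \ln 2$ bits per key, and compares it with the lower bound $\log_2(1/f)$. The function names are hypothetical.

```python
import math


def lower_bound_bits_per_key(f: float) -> float:
    """Information-theoretic minimum for FP rate f: log2(1/f) bits per key."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must lie in (0, 1]")
    return math.log2(1.0 / f)


def bloom_bits_per_key(f: float) -> float:
    """Idealized standard Bloom filter (assumption: optimal, unrounded
    number of hash functions): log2(1/f) / ln(2) bits per key."""
    return lower_bound_bits_per_key(f) / math.log(2)


def space_overhead(bits_per_key: float, f: float) -> float:
    """Relative space overhead over the lower bound (0.20 means 20%)."""
    bound = lower_bound_bits_per_key(f)
    if bound == 0.0:
        raise ValueError("overhead is undefined for f = 1")
    return bits_per_key / bound - 1.0


if __name__ == "__main__":
    f = 2.0 ** -7
    bloom = bloom_bits_per_key(f)
    print(f"lower bound: {lower_bound_bits_per_key(f):.2f} bits/key")
    print(f"Bloom:       {bloom:.2f} bits/key")
    print(f"overhead:    {space_overhead(bloom, f):.1%}")
```

Under this idealized model the overhead is $1/\ln 2 - 1 \approx 0.44$ regardless of $f$. Practical Bloom filters round the number of hash functions to an integer, so their overhead can only be higher, which is consistent with the "at least 44%" wording.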