Ribbon Filter: Practically Smaller Than Bloom and Xor1
Total Page:16
File Type:pdf, Size:1020Kb
Ribbon filter: practically smaller than Bloom and Xor1 Peter C. Dillinger Stefan Walzer Facebook, Inc. University of Cologne Seattle, Washington, USA Cologne, Germany [email protected] [email protected] ABSTRACT themselves2. Thus, the Bloom filter filters out almost all specific 3 Filter data structures over-approximate a set of hashable keys, i.e. set key queries to data files that would find no relevant data. False membership queries may incorrectly come out positive. A filter with negative (FN) queries would be incorrect for this application and false positive rate 5 0, 1 is known to require log 1 5 bits must never occur. 2 ¹ ¼ ≥ 2 ¹ / º per key. At least for larger 5 2 4, existing practical filters require Blocked Bloom filters [44, 54] are a popular Bloom variant be- ≥ − a space overhead of at least 20% with respect to this information- cause they are extremely fast. We do not expect to improve upon 4 theoretic bound. this solution for short-lived applications such as database joins We introduce the Ribbon filter: a new filter for static sets with or the smallest levels of an LSM-tree. However, Bloom filters use a broad range of configurable space overheads and false positive at least 44% more space (“space overhead”) than the information- theoretic lower bound of _ = log 1 5 bits per key for a hashed rates with competitive speed over that range, especially for larger 2 ¹ / º 5 2 7. In many cases, Ribbon is faster than existing filters for filter with FP rate 5 [13, Section 2.2]. Blocked Bloom can exceed ≥ − the same space overhead, or can achieve space overhead below 10% 50% space overhead for small 5 . with some additional CPU time. An experimental Ribbon design In this work we focus on saving space in static filters, and opti- with load balancing can even achieve space overheads below 1%. mizing the CPU time required for saving space. Our presentation A Ribbon filter resembles an Xor filter modified to maximize and validation are kept general, but an intended application is opti- locality and is constructed by solving a band-like linear system mizing accesses to the largest levels of an LSM-tree, where it should over Boolean variables. In previous work, Dietzfelbinger and Walzer be worth CPU time to save space in long-lived memory-resident 5 describe this linear system and an efficient Gaussian solver. We structures . In [18] it is shown that a relatively high FP rate for present and analyze a faster, more adaptable solving process we call these levels is best for overall efficiency. However, the space sav- “Rapid Incremental Boolean Banding ON the fly,” which resembles ings offered by existing practical Bloom filter alternatives is limited (dotted line in Figure 1 (a)), especially for higher FP rates, _ 5. hash table construction. We also present and analyze an attractive ≤ Ribbon variant based on making the linear system homogeneous, Bloom filter and alternatives. We can categorize hashed filters and describe several more practical enhancements. by the logical structure of a query: OR probing: Cuckoo filters [11, 32, 34, 35], Quotient filters [5, 15, 30, 51], and most others return true for a query if any one of several probed locations is a hashed data match, much 1 INTRODUCTION like hash table look-ups. This design is great for supporting dynamic add and delete, but all known instances use 1 ¹ ¸ Background and motivation. The primary motivation for this Y _ ` bits per key where ` ¡ 1.44, or often ` = 3 for speed6. º ¸ work is optimizing data retrieval, especially in systems aggregating Even with Y 0.05, the space overhead is large for small _. ≈ immutable data resources. In the example of LSM-tree storage [50], AND probing: A Bloom filter query returns true iff all probed persisted key-value data is primarily split among immutable data locations (bits) are a match (set to 1). files. A crucial strategy for reducing I/O in key look-ups is filtering accesses using an in-memory data structure. In a common configu- 2Learned filters [55] or tries [59] can take advantage of regularities in the key set. A arXiv:2103.02515v2 [cs.DS] 8 Mar 2021 ration, each data file has an associated Bloom filter [9] representing space-efficient hash table [15] can take advantage of a densely covered key space. We the set of keys with associated data in that file. The Bloom filter has focus on the general case. 3Static filters can also support range queries, either through prefix Bloom [48] or more some false positive (FP) rate, which is the probability that querying sophisticated schemes [47]. a key not added returns true (positive). For example, configuring 4In Section 7 we mention and use another blocked Bloom implementation with new the Bloom filter to use 10 bits of space per added key yields anFP trade-offs [24]. 5In some large-scale applications using RocksDB [31] for LSM-tree storage, we observe rate of just under 1%, regardless of the size or structure of the keys roughly 10% of memory and roughly 1% of CPU used in blocked Bloom filters. The size-weighted average age of a live filter is about three days. An experimental Ribbon filter option was added to RocksDB in version 6.15.0 (November 2020). This work is licensed under the Creative Commons BY-NC-ND 4.0 International 6` ¡ 1.44 can be explained by these structures approximating a minimal perfect License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of hash [36], a near-strict ordering of keys based on the location of their matching entry this license. Copyright is held by the owner/author(s). in the structure, on top of _ payload bits per key. This is an observation about existing 1Title inspired by [34, 35, 39]. structures, not necessarily a fundamental limitation. 1 Peter C. Dillinger and Stefan Walzer XOR probing: Xor filters10 [ , 14, 20, 37, 39] return true for a 5 fastest lter query iff the bitwise exclusive-or (XOR) of all probed loca- 16 2− tions is a hashed data match. XOR probing is only known to Xor 14 Xor+ work with static filters. 2− Bloom Structures using XOR probing are the most promising for space 12 2− efficient static filters. They are constructed by solving a linearsys- Cuckoo tem with one constraint per key ensuring that querying the key 10 2− returns true. Standard Xor filters use a fast solving process called Balanced 7 8 peeling that limits their space efficiency to 1.22_ bits per key , 2− Ribbon ≥ though a variant with fast compression, the Xor+ filter [39], uses 6 roughly 1.08_ 0.5 bits per key, which is an improvement for larger 2− ¸ _. Using structured Gaussian elimination instead of peeling [21, 37] 4 2− Homogeneous offers better space efficiency, but construction times are considered 2 Ribbon impractical for many applications. 2 Blocked Bloom − % space overhead Core contribution. We introduce a faster, simplified, and more 100 50 30 16 8 4 2 1 adaptable Gaussian elimination algorithm (Ribbon) for the static 5 query times 5 construction times function data structure from [22]. Based on Ribbon, we develop a 16 16 2− 100 ns 2− 300 ns 14 14 family of practical and highly space-efficient XOR-probed filters. 2− 70 ns 2− 150 ns 12 12 2− 50 ns 2− 80 ns Results and comparison. Figure 1 (a) summarizes extensive bench- 10 40 ns 10 2− 2− 40 ns 8 30 ns 8 marking data by indicating which structure is fastest for satisfying 2− 2− 20 ns 6 20 ns 6 2− 2− various space and FP rate requirements for a static filter. For “fastest” 4 15 ns 4 10 ns 2− 2− 2 10 ns 2 5 ns we consider the sum of the construction time per key and three 2− % space 2− % space overhead overhead query times (measured for G (, G 8 ( and a mixed data set).8 100 5030 16 8 4 2 1 100 5030 16 8 4 2 1 2 Although we compare Ribbon with many approaches imple- mented in the fastfilter benchmark library38 [ ], only variants of Figure 1: (a) Fastest filter with the given combination of Bloom, Cuckoo, Xor, and Ribbon emerge as winners. Specifically, space overhead and false positive rate, considering a mix of the color at point G,~ indicates the fastest filter with space over- construction and query times. (b) Construction and query ¹ º head at most G and FP rate 5 ~ 2,~ 2 for = = 107 keys. Diagonal times for fastest approaches in (a). 2 » / · ¼ shading indicates different winners for = = 106 and = = 108. The timings for the winning approach are also shown in Figure 1 (b). We observe the following. small Ribbon structures (“smash”). These features go into the Ribbon wins to the right of the dotted line because none of • Standard Ribbon filter, which in practice has increasing space the competing approaches achieve space overhead this low. overhead or running time as the number of keys increases.9 Ribbon wins in some territory previously occupied by Xor • Section 4. We present the Homogeneous Ribbon filter, which and Xor+ filters, mostly for 5 ¡ 2 8 (_ < 8) from Xor and − shares many desirable properties with blocked Bloom filters: 5 ¡ 2 12 (_ < 12) for Xor+. In these cases, Ribbon-style − construction success is guaranteed, and scaling to any num- Gaussian elimination is faster than peeling. ber of keys is efficient. Homogeneous Ribbon does not build Blocked Bloom filters are still the fastest whenever appli- • on static functions in the standard way, which simplifies cable, though Cuckoo and Xor take some of the remaining implementation but complicates analysis.