MSc thesis Master’s Programme in Computer Science

Speeding up dynamic vectors

Saska Dönges

April 20, 2021

Faculty of Science, University of Helsinki

Supervisor(s): Assoc. Prof. Simon J. Puglisi

Examiner(s): Assoc. Prof. Simon J. Puglisi, Prof. Veli Mäkinen

Contact information

P. O. Box 68 (Pietari Kalmin katu 5) 00014 University of Helsinki, Finland

Email address: [email protected].fi
URL: http://www.cs.helsinki.fi/

Tiedekunta — Fakultet — Faculty: Faculty of Science

Koulutusohjelma — Utbildningsprogram — Study programme: Master's Programme in Computer Science

Tekijä — Författare — Author: Saska Dönges

Työn nimi — Arbetets titel — Title

Speeding up dynamic bit vectors

Ohjaajat — Handledare — Supervisors

Assoc. Prof. Simon J. Puglisi

Työn laji — Arbetets art — Level: MSc thesis

Aika — Datum — Month and year: April 20, 2021

Sivumäärä — Sidoantal — Number of pages: 39 pages, 12 appendix pages

Tiivistelmä — Referat — Abstract

Bit vectors have many applications within succinct data structures, compression and bioinformatics, among others. Any improvement in bit vector performance translates to improvements in the applications. In this thesis we focus on dynamic bit vector performance. Fully dynamic succinct bit vectors enable other dynamic succinct data structures, for example dynamic compressed strings.

We briefly discuss the theory of bit vectors and the current state of research related to static and dynamic bit vectors.

The main focus of the thesis is on our research into improving the dynamic bit vector implementation in the DYNAMIC C++ library (Prezza, 2017). Our main contribution is the inclusion of buffering to speed up insertions and deletions while not negatively impacting non-modifying operations. In addition, we optimized some of the code in the DYNAMIC library and experimented with vectorizing some of the access operations.

Our code optimizations yield a substantial improvement to insertion and deletion performance. Our buffering implementation speeds up insertions and deletions significantly, with negligible impact on other operations or space efficiency. Our implementation acts as a proof of concept for buffering and suggests that future research into more advanced buffering is likely to increase performance. Finally, our testing indicates that using vectorized instructions in the AVX2 and AVX512 instruction set extensions is beneficial in at least some cases and should be researched further.

Our implementation, available at https://github.com/saskeli/DYNAMIC, should only be considered a proof of concept, as there are known bugs in some of the operations that are not extensively tested.

ACM Computing Classification System (CCS):
Data → Data structures
Theory of computation → Design and analysis of algorithms → Data structures design and analysis

Avainsanat — Nyckelord — Keywords: bit vector, buffering, C++, compression, dynamic, SIMD instructions, vectorization

Säilytyspaikka — Förvaringsställe — Where deposited: Helsinki University Library

Muita tietoja — Övriga uppgifter — Additional information: Algorithms study track

Contents

1 Introduction
2 Definitions and notation
  2.1 Bit vectors
  2.2 O-notation
3 Static bit vectors
  3.1 Simple bit vector implementations
  3.2 Population count vectorization
  3.3 Constant time(ish) Rank and Select
    3.3.1 Rank
    3.3.2 Select
  3.4 Practical mostly static implementation
4 Dynamic bit vectors
  4.1 B-trees
  4.2 Practical fully dynamic implementation
    4.2.1 Access and modification
    4.2.2 Rank and Select
    4.2.3 Insertion and removal
    4.2.4 Space requirement
5 Our contributions
  5.1 Division and modulus bypass
  5.2 Background on write optimization
  5.3 Buffering
    5.3.1 Insertion
    5.3.2 Removal
    5.3.3 Access and modification
    5.3.4 Rank
    5.3.5 Select
  5.4 AVX experimentation
6 Experiments
  6.1 Scaling experiment
  6.2 Mixture test
  6.3 Application testing
  6.4 Vectorization testing
7 Conclusions and future work
  7.1 Optimizing memory allocation
  7.2 Query caching
  7.3 Branchless binary search
  7.4 Bit vector compression
Bibliography
A Experiment results expanded

1 Introduction

Bit vectors are an integral part of many widely used algorithms and data structures. In the simplest case, bit vector implementations, such as std::bitset in C++, can be used as efficient arrays of booleans for use with path finding algorithms or similar. Several compressed and succinct data structures can be built using bit vectors (Navarro, 2016). Any gains in efficiency for the underlying bit vector implementation translate directly to gains for the applications. Further, support for additional operations like Rank and Select can be leveraged in the application. For example, a string that supports random access can be represented in compressed space as a wavelet tree (a specifically built tree with bit vectors as nodes), given bit vectors that support Rank and Select. If modifications, inserts and removals are supported as well, these can also be used in an application to provide, for example, insert and remove operations for compressed strings.

A great deal of effort has been made to optimize static lookup structures for bit vectors (Gog, Beller, et al., 2014). However, dynamic bit vectors can still be optimized to improve the performance of fully dynamic implementations of data structures that rely on them. This thesis is a look at work that has been done so far on dynamic bit vectors, as well as a presentation of some of our own preliminary results, along with some promising avenues for further research.

Our research is based on modifications to the succinct bit vector implementation in the DYNAMIC library presented in Prezza, 2017, where a blocking approach along with a tree structure is used. Similar approaches are presented in at least Zhou et al., 2013, Kärkkäinen et al., 2014, Klitzke and Nicholson, 2016 and Cordova and Navarro, 2016, where the blocking and tree structures are used to enable dynamism, block compression or support structures for Rank and Select. Blocking approaches potentially suffer greatly from memory fragmentation due to the nature of memory allocation for such structures by operating systems.

Our contribution is a modification to the leaves of the succinct bit vector tree structure of the DYNAMIC library. With simple code optimization and the addition of buffering, our leaf implementation significantly speeds up insertion and removal operations for the data structure without significant penalty to non-modifying operations or data structure size.

We have not, as of yet, considered compression of the bit vectors, as the main focus of our research is to speed up the current dynamic implementations with minimal impact on space usage. Compression schemes may be beneficial both in terms of space and run time for some input data. We do not address memory fragmentation either, beyond briefly noting it in the results of some of our experiments. Possible future work related to compression and reducing memory fragmentation is discussed in Section 7.4.

We will start by presenting definitions and the notation used throughout the thesis. After this, some current approaches for static structures are discussed, followed by a presentation of one state of the art implementation for fully dynamic bit vectors. After presenting our research and results in Chapters 5 and 6, conclusions and discussion of possible future directions for research follow in Chapter 7.

2 Definitions and notation

2.1 Bit vectors

In principle, bit vectors are sequences of values (v_0, v_1, ..., v_n) where v_i ∈ {0, 1}. For a given bit vector V ∈ M, where M denotes the set of bit vectors, the operations below are typically defined. The precise implementation of these operations depends on the bit vector implementation.

Access: A function f : M × ℕ → {0, 1} maps f(V, i) = v_i, the value of the bit vector V at position i. This is often denoted V.at(i) or simply V[i] in (pseudo)code. Formally, f(V, i) can be defined to be 0 if v_i is undefined. In practice, querying beyond the length of the bit vector is considered undefined behavior.

Modification: A function f : M × ℕ × {0, 1} → M maps f(V, i, b) = V′ to another bit vector where ∀j ((j = i ⇒ V′[j] = b) ∧ (j ≠ i ⇒ V′[j] = V[j])). In this thesis, the notation V.set(i, v) or V[i] = v will be used in code.

Insertion: A function f : M × ℕ × {0, 1} → M maps f(V, i, b) = V′ to another bit vector where ∀j ((j < i ⇒ V′[j] = V[j]) ∧ (j = i ⇒ V′[j] = b) ∧ (j > i ⇒ V′[j] = V[j − 1])). Typically, the notation V.insert(i, v) is used to denote insertion in code.

Removal: A function f : M × ℕ → M maps f(V, i) = V′ to another bit vector where ∀j ((j < i ⇒ V′[j] = V[j]) ∧ (j ≥ i ⇒ V′[j] = V[j + 1])). Typically denoted V.remove(i) in code.

Rank: A function f : M × ℕ → ℕ where f(V, i) = Σ_{j=0}^{i} v_j, that is, the number of 1-bits in the first i + 1 elements of the bit vector. This is typically denoted V.rank(i).

Select: A function f : M × ℕ⁺ → ℕ where f(V, i) maps to the minimum value n such that Rank(V, n) = i, that is, the index of the i-th 1-bit in the bit vector.
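As a small worked example of the definitions above: for V = (1, 0, 0, 1, 1, 0), V.rank(3) = 2 since positions 0 through 3 contain two 1-bits, V.rank(5) = 3, V.select(2) = 3 since the second 1-bit is at position 3, and V.select(3) = 4.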

The operations above are defined as zero-indexed. This definition leaves V.select(0) undefined, while it could also refer to the location of the first 1-bit in an inverse zero-indexed table of 1-bit locations. Implementations may differ slightly in indexing, especially with regard to Rank and Select. The formal operation definitions above are pure functions where any modification implies creating a new modified bit vector, while practical implementations generally modify data structures in place to avoid copying data.

Additionally, Rank0 and Select0 may be defined, where 0-bits are counted instead of 1-bits. Generally, the same points are valid for the 0-bit counting operations as for the 1-bit operations. These 0-bit operations will not be discussed further in this thesis.

2.2 O-notation

With O-notation the asymptotic behavior of functions can be explored. Specifically, if f(n) ∈ O(g(n)), then f belongs to the set of functions that are not asymptotically more complex than g. More formally, for functions f : ℕ → ℝ and g : ℕ → ℝ,

f(n) ∈ O(g(n)) iff ∃n_0, c (∀n > n_0 (c · g(n) > f(n))), where c, n, n_0 ∈ ℕ. That is, for a large enough value of n, f(n) is no more than g(n) times some constant. Generally, in a horrible abuse of notation, f(n) = O(g(n)) is used interchangeably with f(n) ∈ O(g(n)), and this standardized abuse will be continued in this thesis. The definition above is targeted at computer science applications. In mathematics and physics, the applications for more general functions and series expansions use slightly different notation and definitions. O-notation conveniently removes many constants and lower order terms to simplify comparison of asymptotic time or space complexities, for example 5n log_6(n) + √n = O(n log(n)). O-notation indicates an asymptotic scaling for a function that can be compared with the asymptotic scaling of another function. The same O-complexity shared by two algorithms essentially guarantees that one of the algorithms will not behave radically differently compared to the other one as the size of the input grows. For implementations of algorithms, an O-complexity guarantee is important, but can often hide information about constant factors and lower order terms that have a significant impact on run time at practical input sizes. In addition to O-notation, o-notation will also be used in this thesis. f(n) ∈ o(g(n)) (or equivalently f(n) = o(g(n))) simply means that f(n) is asymptotically less than g(n), that is

lim_{n→∞} f(n)/g(n) = 0.

For example, if f(n) = o(n), f(n) is strictly sublinear and n + f(n) → n as n → ∞. Again, like O-notation, the o-notation may hide factors and terms that may be important for practical run time or data structure size, while still being very important for scaling guarantees as n is essentially unbounded. Other complexity notation is also used in computer science (Θ, Ω, ω). There is also much more to be said about the O- and o-notations that is not particularly relevant for this thesis. More information on these can be found in, for example, Cormen et al., 2009, Sedgewick and Flajolet, 1996 or Navarro, 2016.

3 Static bit vectors

3.1 Simple bit vector implementations

The simplest implementation of a bit vector is an array of booleans or a set of integers provided by any popular programming language. For example, bool* bit_vector = new bool[n]; in C++ would create a “bit vector” with n elements. This data structure supports access and modification in constant time using normal indexing. Insertion, removal, Rank and Select can be implemented in linear time. In common programming languages the boolean primitive takes 8 bits of space, meaning a bit vector with n elements takes 8n + o(n) bits of space. An implementation like this is very fast in practice for small values of n, as long as access and modification are the main required operations.

A common optimization is to pack the data into the bits of a contiguous array of 32- or 64-bit integers. In C++, uint64_t* bit_vector = new uint64_t[1 + n / 64]; provides a pointer to a contiguous area of memory with n to n + 63 usable bits, that takes up at most n + o(n) bits of space. Constant time access to bit i with (uint64_t)1 & (bit_vector[i >> 6] >> (i & 63)); is still possible, as well as constant time setting of bit i to value v with

bit_vector[i >> 6] &= ~((uint64_t)1 << (i & 63)); // clear bit i
bit_vector[i >> 6] |= ((uint64_t)v << (i & 63));  // set bit i to v

Insertion, removal, Rank and Select take linear time for this version as well. There are some trade-offs between these two implementations. The packed version can benefit from vectorized population counting for significant speedups to Rank, and possible improvements to Select (see Section 3.2). The packed version results in a slight overhead in constant time operations while the unpacked version can be slightly faster for small data structure sizes due to byte-aligned access and modify operations. As the data structure size increases, the improved cache efficiency of the packed version will start to pull ahead in constant time operations, at least for queries with independent locality. Clearly, there is also a fairly significant (even if not asymptotic) difference in space requirement, with approximately one bit used per stored bit in the packed version, versus the approximately 8 bits per stored bit in the byte-aligned implementation.
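To make the linear-time operations concrete, the following is a small sketch of ours (not taken from any particular library) of a Rank query over the packed representation; it scans whole words with the population count intrinsic discussed in the next section and finishes by masking the partial word.

#include <cstdint>

// Linear-time Rank over a packed bit vector: counts the 1-bits in
// bit_vector[0..i] (inclusive, matching the definition in Section 2.1).
uint64_t rank(const uint64_t* bit_vector, uint64_t i) {
    uint64_t count = 0;
    uint64_t word = i >> 6;
    for (uint64_t w = 0; w < word; ++w) {
        count += __builtin_popcountll(bit_vector[w]); // full words
    }
    // Partial word: keep bits 0..(i mod 64) and count them.
    uint64_t mask = (i & 63) == 63 ? ~0ULL
                                   : (((uint64_t)1 << ((i & 63) + 1)) - 1);
    count += __builtin_popcountll(bit_vector[word] & mask);
    return count;
}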

3.2 Population count vectorization

Counting 1-bits has a multitude of applications in cryptography, information retrieval and bioinformatics, among others. Unsurprisingly, most common PC processor architectures support efficient population counting for computer words of up to 64 bits in width. For the GNU family of compilers, the library function __builtin_popcountll will use architecture specific intrinsics for calculating the population count of 64-bit integers. These intrinsics were added to both AMD and Intel microarchitectures in 2007 (Fog, 2020). Modern implementations typically have a practical pipelined throughput of one operation per clock cycle or better.

Further improvements to population counting are possible with Single Instruction, Multiple Data (SIMD) instructions (Muła et al., 2017). Unfortunately, the Streaming SIMD Extensions (SSE) and the first and second versions of the Advanced Vector Extensions (AVX and AVX2) available on common microprocessors do not directly support population counting for 128-, 256- and 512-bit words. The AVX-512 extension does support population count instructions for all of these vector sizes and is the fastest, with some caveats. The AVX-512 extension is available only on top-of-the-line Intel microprocessors, where running AVX-512 instructions lowers the maximum clock speed. The native vectorized population count instructions also have a comparatively high latency, so the benefits of vectorized population count intrinsics compared to single word population count instructions do not scale linearly with data size.

Even without the native AVX population count intrinsics available, SIMD instructions can be utilized to calculate population counts with a higher throughput than with the 64-bit intrinsics made available by __builtin_popcountll. The comparatively high latency of the 256- or 128-bit instructions makes these approaches slightly slower for small blocks of data, as can be seen in Table 3.1.
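To illustrate how population counts can be vectorized without a native wide popcount instruction, the following is a minimal sketch of the nibble-lookup approach described by Muła et al., 2017, assuming AVX2 support. It is our illustration, not the libpopcnt implementation, and it omits the Harley-Seal accumulation that gives the best throughput on large arrays.

#include <immintrin.h>
#include <cstdint>
#include <cstddef>

// AVX2 population count using a 4-bit lookup table (Muła-style sketch).
uint64_t popcount_avx2(const uint64_t* data, size_t n_words) {
    const __m256i lookup = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low_mask = _mm256_set1_epi8(0x0f);
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 4 <= n_words; i += 4) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        __m256i lo = _mm256_and_si256(v, low_mask);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi32(v, 4), low_mask);
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lookup, lo),
                                      _mm256_shuffle_epi8(lookup, hi));
        // Sum the per-byte counts into 64-bit lanes to avoid overflow.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(cnt, _mm256_setzero_si256()));
    }
    uint64_t total = (uint64_t)_mm256_extract_epi64(acc, 0) +
                     (uint64_t)_mm256_extract_epi64(acc, 1) +
                     (uint64_t)_mm256_extract_epi64(acc, 2) +
                     (uint64_t)_mm256_extract_epi64(acc, 3);
    for (; i < n_words; ++i) {
        total += __builtin_popcountll(data[i]); // scalar tail
    }
    return total;
}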

3.3 Constant time(ish) Rank and Select

Constant time lookups for Rank and Select are possible with the help of precomputed support data structures taking up o(n) space (see e.g. Vigna, 2008). Here, the actual constant time and o(n) space solutions are touched upon, but we will focus on solutions requiring n/k space, where k ≥ 1, and either constant O(1) time or practically fast linear O(n/k) time complexity.

Array size    popcnt    AVX2 Muła    AVX2 HS
256 B         1.12      1.38         –
512 B         1.06      0.94         –
1 kB          1.03      0.81         0.69
2 kB          1.01      0.73         0.61
4 kB          1.01      0.70         0.54
8 kB          1.01      0.69         0.52
16 kB         1.01      0.69         0.52
32 kB         1.01      0.69         0.52
64 kB         1.01      0.69         0.52

Table 3.1: Number of cycles per 64-bit population count calculation for different data sizes. Tabulated for native 64-bit popcnt, the Muła function utilizing AVX2 instructions and the Harley-Seal algorithm using AVX2 instructions (Muła et al., 2017).

3.3.1 Rank

A fairly simple and practical way to implement constant time Rank support for a bit vector BV of n bits is as follows. An array rbv of ⌊n / (k · 2^12)⌋ 64-bit integers is allocated, where k ∈ ℕ is a tuning parameter. For each 64-bit element i of the rbv array, the value is set to BV.rank((i + 1) · 2^12 · k). This structure is ⌊n / (k · 2^6)⌋ ≤ n bits in size. These cumulative sums can be used to calculate the result of a Rank query in O(k) = O(1) (for a constant k) time using the algorithm outlined in Figure 3.1.

def rank(i):
    r = 0
    block = i // (2**12 * k)
    if block > 0:
        r = rbv[block - 1]
    for j in range(block * 2**12 * k, i):
        r = r + bv[j]
    return r

Figure 3.1: Pseudocode for a constant time rank query

With practical values for k and efficient use of population counting instead of a naive loop, the Rank queries implemented as described above are very fast in practice. If we let the value of k be on the order of log(n) and precalculate solutions in an array with n / (2^12 · log(n)) elements, the support structure takes o(n) space and supports O(log(n)) time rank queries. Using the “Four Russians” algorithm (Aho and Hopcroft, 1974), the asymptotic performance of the support structure for Rank queries can be further improved to o(n) space while supporting constant time execution of queries. Experiments comparing constant time and o(n) space implementations with constant time and linear space implementations have found that for modern data structure sizes, the linear space implementations are both faster and more space efficient (Gog, Beller, et al., 2014).

3.3.2 Select

Given the solution for constant time Rank queries, an O(log(n)) solution to Select queries can be reached simply by binary searching over the bit vector using Rank queries. This is often fairly fast in practice and takes no additional space beyond the structure required for the Rank queries (González et al., 2005). Constant time solutions for Select that utilize o(n) Select-specific additional data structures also exist (Clark, 1997; Vigna, 2008). However, guaranteed constant time Select structures typically require a significant amount of extra space (even if asymptotically o(n)), and the constant factors for the Select operation are high enough for most practical data structure sizes to often make binary searching with Rank or simply precomputing the solutions for sparse arrays more efficient.

Additionally, a “not quite constant time” approach is presented in Vigna, 2008. Essentially, results are precomputed for a subset of possible Select queries. These precomputed results are utilized along with linear population count scanning to calculate results. Technically, since there are no guarantees on the distribution of 1-bits, this solution cannot be considered constant time, unless the bit vector is sparse enough to allow precalculation of all Select queries. Still, implementations based on this approach seem to be the fastest Select implementations in practice (Gog, Beller, et al., 2014).
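For illustration, the binary-search approach can be sketched as follows (our sketch, not code from a specific library; the rank callback and its exclusive counting convention are assumptions made for clarity, while the thesis otherwise uses an inclusive Rank).

#include <cstdint>
#include <functional>

// O(log n)-time Select by binary search over Rank. rank(k) is assumed to
// return the number of 1-bits in positions [0, k). Returns the position of
// the i-th 1-bit (i >= 1); behavior is undefined if fewer than i 1-bits exist.
uint64_t select1(uint64_t i, uint64_t n,
                 const std::function<uint64_t(uint64_t)>& rank) {
    uint64_t lo = 0, hi = n;          // candidate positions in [lo, hi)
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (rank(mid + 1) < i) {
            lo = mid + 1;             // too few 1-bits up to and including mid
        } else {
            hi = mid;                 // the i-th 1-bit is at mid or earlier
        }
    }
    return lo;
}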

3.4 Practical mostly static implementation

The ideas in this chapter are implemented in the sdsl-lite C++ library (Gog, Beller, et al., 2014). Uncompressed bit vectors in the library support access and modification. For Rank and Select queries, additional separate support structures of o(n) size are required. Insertions and removals are not supported, and modification of the bit vector requires rebuilding the support structures. The library is practical, well tested (Navarro, 2016) and used in multiple research applications (for example Sirén et al., 2019; Alipanahi et al., 2020; Gog, Kärkkäinen, et al., 2019; Muggli et al., 2017).

4 Dynamic bit vectors

For static bit vector implementations, insertions and removals require rewriting the entire bit vector as well as any support structures (in the worst case). For applications where insertions and removals are common and interleaved with other queries, a simple contiguous block of bits is impractical. A solution is needed where contiguous blocks are short enough to not make rewrites prohibitively expensive. Such a solution is presented in this chapter.

4.1 B-trees

The original intention of the B-tree was to provide an efficient dynamic data structure for large amounts of data residing partially in secondary storage, typically disk or tape drives (Cormen et al., 2009; Comer, 1979; Bayer and McCreight, 1970). The structure is specifically designed to benefit from the way data is loaded as fairly large pages from disk (or tape). This kind of memory locality is desirable at other levels of the memory hierarchy as well.

The B-tree is a direct alternative to search tree structures like the AVL tree or red-black tree. Binary trees are reference based structures and generally perform poorly in terms of memory locality. B-trees avoid this problem by storing blocks of data with multiple child pointers (Figure 4.1). For frequent queries, this is a significant benefit with regards to memory performance. The root is practically guaranteed to be cached and other blocks that are frequently accessed are also likely to be. While the asymptotic performance of the B-tree is the same as that of AVL or red-black trees, practical performance is typically significantly better.

The search for high performance dynamic data structures has led to the development of several B-tree-like data structures due to the efficiency of B-trees in hierarchical memory systems (Ferragina and Venturini, 2016; Awad et al., 2019, among others). B-trees are also the basis of the dynamic bit vector by Prezza, 2017, described in the next section. Finally, the Bε-tree of Bender et al., 2015, which buffers updates at nodes in the tree instead of committing them immediately, has been used successfully in database and file systems, and is the inspiration for our buffering efforts.

Figure 4.1: Comparison of red-black tree and B-tree (with B = 3) structures with keys (24, 72, 33, 34, 40, 98, 75, 4, 25, 68, 70, 71, 94, 78, 49, 94, 57, 44, 99, 16) inserted in order.

4.2 Practical fully dynamic implementation

The B-tree-like implementation of succinct bit vectors in the DYNAMIC library (Prezza, 2017) has succinct bit vector blocks in the leaves and cumulative sums, subtree sizes and child pointers in the internal nodes. This differs from a “normal” B-tree in that none of the actual data is stored in the internal nodes. The internal nodes take the place of the additional support structures for Rank and Select. This library is used in multiple implementations related to recent studies in dynamic succinct data structures (Alipanahi et al., 2020; Sirén et al., 2019).

In addition to the branching factor B, which is a tuning parameter, another parameter for leaf sizes is used. In general, these tuning parameters have lower values than those used for secondary storage optimized B-trees due to the smaller page size of processor caches compared to main memory. The DYNAMIC library has a default branching factor of 16 and a leaf size of 8192 bits. This leads to node sizes on the order of one to a few kB, which is approximately the size of main memory pages on modern microarchitectures. The tuning parameters also have a significant impact on the performance of different operations. Operations that have to scan leaves benefit from small leaf sizes while operations that only perform constant time operations on leaves benefit from the shallower tree that follows from increased leaf size. Similarly, the branching factor allows a trade-off between tree depth and efficiency of operations on the internal nodes. Figure 4.2 is a simplified illustration of the data structure implemented in DYNAMIC.

Legend: element overhead (O(1) for each element); 1-bit; 0-bit; cumulative sums, child sizes and child pointers.

Figure 4.2: Visualization of data structure as built by the DYNAMIC library implementation with branching factor four and leaf size 2048. These values guarantee that internal nodes besides the root have between four and eight children, and leaves have between 2048 and 4096 elements. The element sizes are scaled to give an intuitive indication of the relative sizes of elements in the data structure with default values for branching factor and leaf size and an input 16 times larger. The input data is the first 32770 binary digits of pi. This image does not reflect in-memory layout and does not account for overhead due to memory fragmentation.

In transitioning from a contiguous memory block model (used in static bit vector implementations) to a tree structure, almost all of the operations become dependent on the tree depth, which is O(log_B(n/L)), where B is the branching factor, L the leaf size in bits, and n the data structure size. For practical values, the tree height will remain small (< 10). Still, traversing the internal nodes implies branching and following pointers, which is costly on modern architectures.

Access, modification, insertion and removal operations essentially only target the leaves, using the child sizes in the internal nodes to traverse to the correct leaf. When a leaf is modified and possibly split, the internal nodes are updated and possibly rebalanced accordingly.

Rank and Select queries additionally use the cumulative sums and do calculations when traversing the tree. Rank traverses to the correct leaf using the child sizes and calculates intermediate Rank results at the branching points, which are summed together with the Rank result from a leaf to generate the final result for the Rank query. Conversely, Select traverses to the correct leaf using the cumulative sums, while collecting child sizes at branching points, which are summed together with the Select result from the leaf. Figure 4.3 is a simple demonstration of how tree traversal is done for queries.


Figure 4.3: Example of tree traversal for queries. A Rank(700 000) query targets the second child of the root since 1 061 957 > 700 000 > 642 901. The query gets translated to a 314 755 + Rank(700 000 − 642 901) query for the child node based on the cumulative sum and child value of the first child of the root. At the internal node, the second branch is targeted since 59 077 > 700 000 − 642 901 > 26 931, and the query gets translated further to 314 755 + 26 247 + Rank(700 000 − 642 901 − 26 931) for the leaf. At the leaf Rank(30 168) is calculated by counting 1-bits up to the desired position.
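To make the traversal concrete, the following sketch of ours shows how a Rank query is translated at a single internal node before recursing into a child; the member names are hypothetical and are not those used in the DYNAMIC library.

#include <cstdint>
#include <cstddef>
#include <vector>

// One internal node: cumulative child sizes (in bits) and cumulative sums
// of 1-bits per child, as in Figure 4.3.
struct InternalNode {
    std::vector<uint64_t> child_sizes; // cumulative bits in children 0..c
    std::vector<uint64_t> child_sums;  // cumulative 1-bits in children 0..c
};

// Result of one traversal step for rank(i): which child to descend into,
// the translated position for that child, and the count accumulated so far.
struct RankStep { size_t child; uint64_t position; uint64_t seed; };

RankStep rank_step(const InternalNode& node, uint64_t i) {
    size_t c = 0;
    while (node.child_sizes[c] <= i) ++c;  // linear branch selection
    uint64_t prev_size = c ? node.child_sizes[c - 1] : 0;
    uint64_t prev_sum  = c ? node.child_sums[c - 1]  : 0;
    return {c, i - prev_size, prev_sum};   // recurse into child c
}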

The time complexity of operations on the tree is given as O(log(n)) in Prezza, 2017. This hides the dependence on the tuning parameters that potentially significantly impact the practical run time. Below is some discussion on the complexity of operations that is part of the basis for our research.

4.2.1 Access and modification

Both access and modification require only constant time in the leaves, thus the asymptotic time complexity is entirely based on traversal and updating of the internal nodes. Branching in the internal nodes is linear in B and updating the cumulative sums on modification is also a linear-time operation. Thus, the asymptotic time complexity is given by O(B · log_B(n/L)) = O(log(n)) for constant B and L parameters. We note that branch selection using cumulative sums in the internal nodes could be an O(log_2(B)) operation using binary search instead of linear scanning. However, binary search typically has higher constant factors, so a logarithmic implementation may not be faster for practical branching factors (from 16 to a few hundred at most). Exploring this potential trade-off is a possible topic for further work. Both access and modification naturally benefit from increasing leaf sizes as this flattens the tree and the operations at the leaf level remain constant time.

4.2.2 Rank and Select

Internal branch selection is essentially the same for Rank and Select operations as for access operations. Additionally, linear time in L is spent at the leaf to calculate population counts. Again, the asymptotic performance of O(B · log_B(n/L) + L) simplifies to O(log(n)), but preprocessing the leaves for faster Rank and/or Select operations or use of more advanced vectorization could improve performance.

4.2.3 Insertion and removal

Insertion and removal typically require rewriting a leaf, and while the complexity of the operations is defined by the same O(log_B(n/L) + L), the impact of the linear O(L) term is significant, since rewriting and potential reallocation is costly on modern architectures. This does not account for operations which cause rebalancing of the tree structure. Generally, rebalancing and leaf splitting are amortized over the simpler insertion and removal operations as long as leaf sizes and branching factors are big enough. Generally, insertions and removals benefit from smaller leaf sizes complemented by a higher branching factor to keep the tree shallow.

4.2.4 Space requirement

Both Prezza, 2017 and the DYNAMIC library documentation quote the practical space requirement of the data structure as approximately 1.2 bits per bit stored. We suspect that this space requirement is based on an earlier implementation where removal had not yet been considered. Our testing indicates that sequences including removals may push the space requirement significantly beyond 1.2 bits per stored bit. Observations of 1.5 bits per bit were not uncommon, while the expected space usage tended to be close to the quoted 1.2 bits per bit. This difference is likely due to allowing leaf sizes to fall below the L parameter to avoid churn in the tree structure for repeated insertion and removal operations.

5 Our contributions

Our research has been focused on optimizing the leaves of the B-tree, which are, essentially, simple contiguous bit vectors, where insertions and removals require rewriting the entire leaf following the insertion or removal point.

Our main contribution has been to examine the possible benefits of insertion and removal buffering at the leaf nodes to reduce the number of rewrites. Additionally, we have found a set of division and modulus calculations in the DYNAMIC library that cannot be optimized away by the compiler but can be manually circumvented. We have also done some preliminary testing on vectorizing Rank and Select operations at the leaf level. Our leaf implementation is available in our fork of DYNAMIC on GitHub¹.

5.1 Division and modulus bypass

In the templated DYNAMIC library, a generic leaf class is used for the bit vector implementation, which necessitates (for bit vectors) redundant integers-per-word and offset calculations on data structure access. Since the bits-per-integer (width_) value is stored in the leaf node as an object member, the compiler cannot optimize away the redundant calculations in the code (Figure 5.1).

int_per_word_ = 64 / width_;
...
uint64_t word_nr = i / int_per_word_;
uint8_t pos = i - int_per_word_ * word_nr;

Figure 5.1: Example of divisions that cannot be optimized by the compiler.

For an implementation where leaves are always succinct bit vectors, these division and modulus calculations are completely redundant. Our own leaf implementation omits these calculations. This optimization should yield a significant constant-factor improvement for insertion and removal operations, and we assume this simple change is responsible for the raw speed-up of our unbuffered implementation compared to the leaves of the DYNAMIC library.
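Concretely, when a leaf always stores one-bit symbols, the generic offset computation of Figure 5.1 collapses to a shift and a mask, which the compiler can emit as single instructions. The sketch below is ours and the names are illustrative; it is not the exact code of our leaf implementation.

#include <cstdint>

// Locate bit i in a packed array of 64-bit words without division or modulus.
inline void locate(uint64_t i, uint64_t& word_nr, uint8_t& pos) {
    word_nr = i >> 6;            // i / 64
    pos = (uint8_t)(i & 63);     // i % 64
}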

¹ https://github.com/saskeli/DYNAMIC

5.2 Background on write optimization

Insertion and removal operations on contiguous data blocks are linear in the block size, since at least the entirety of the block beyond the insertion or removal position needs to be rewritten to keep the block contiguous, even if full reallocation is not required. For B-trees with big node sizes this makes insertions and removals significantly slower than read operations. A significant improvement to the cost of modifications can be made by amortizing the operations. In practice this can be done by buffering modification operations in part of the node blocks (Bender et al., 2015). This does have a negative impact on read operations (such as access), since the number of branch elements in the internal nodes decreases as part of the space is dedicated to a buffer, making the tree deeper, and hence root-to-leaf traversals longer. Additionally, there is some overhead due to scanning the buffers when traversing the tree.

For big B-trees with node sizes on the order of megabytes and significant portions of the data structure in secondary memory, the buffering has a very big positive impact on overall performance. The same principles hold for smaller data structures that fit entirely in main memory and have smaller node sizes.

5.3 Buffering

The main goal of our buffered implementation was to speed up execution times for modification without incurring prohibitive penalties for non-modifying operations. Our buffering approach is simply to allocate a 32·b-bit array for each leaf (where b is the buffer size), where insertion and removal operations are buffered. Each 32-bit buffer element contains the insert/removal location as well as the needed data about the operation. The insert/removal location is encoded in 24 bits and limits the maximum leaf size to 2^24 − 1. Experiments show that a leaf size of 10^6 is sufficient even for operations that most benefit from large leaf sizes, so this is not likely to be a problem in practice. The buffer element size of 32 bits was selected to align with a common power-of-two element size. A 16-bit element size would have left at most 13 bits for storing the insert/removal location, which was deemed too limiting. A 24-bit element size would still align elements to byte boundaries, while providing a sufficient 20 or 21 bits for position storage, but departing from common element sizes may still reduce access speeds to buffer elements, as well as being slightly harder to implement. Optimizing buffer element storage is still work in progress. Figure 5.2 illustrates the space used by the buffers as a comparison to Figure 4.2.
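As an illustration of the element encoding, one possible layout of a 32-bit buffer element is sketched below. The 24-bit index field follows the description above; the exact placement of the type flag and value bit is our assumption for illustration, not necessarily the layout used in our implementation.

#include <cstdint>

// Hypothetical 32-bit buffer element: 24-bit position, operation type and value.
struct BufferElement {
    uint32_t raw;

    uint32_t index() const { return raw & 0x00FFFFFFu; }    // bits 0..23
    bool is_insertion() const { return (raw >> 31) & 1u; }  // bit 31
    bool value() const { return (raw >> 30) & 1u; }         // bit 30

    static BufferElement make(uint32_t index, bool insertion, bool value) {
        return { (index & 0x00FFFFFFu)
                 | ((uint32_t)value << 30)
                 | ((uint32_t)insertion << 31) };
    }
};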

Legend: element overhead (O(1) for each element); buffer; 1-bit; 0-bit; cumulative sums, child sizes and child pointers.

Figure 5.2: Visualization similar to Figure 4.2 but with the additional space allocated to buffering highlighted.

The buffer is kept sorted by always inserting new elements in the correct position. This makes non-modifying access operations faster and simpler to implement. The trade-off is that the total cost of buffer editing for b successive modifying access operations is O(b^2) instead of O(b log(b)). For practical values of b this has not been observed to have any negative impact. This is likely due to the very limited sizes of the buffers. The small constant factors associated with incremental insertions and using memmove to shift elements seem to offset the theoretical asymptotic advantage of only sorting when committing the buffer. Also, assuming a uniform random access pattern, the expected number of buffer elements that need to be modified on insert and remove operations is b/2 for a sorted buffer and b for an unsorted buffer. While uniformly random access patterns may not be common in practice, this illustrates the effect of only having to consider part of the buffer for most of the operations.
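For illustration, inserting a new element into the sorted buffer only needs a single memmove of the tail, as sketched below; buffer and buffer_count are assumed names and the element type matches the sketch above.

#include <cstdint>
#include <cstring>

// Insert element at position idx of a sorted buffer holding buffer_count
// 32-bit elements, shifting the tail one slot to the right.
void insert_buffer(uint32_t* buffer, uint32_t& buffer_count,
                   uint32_t idx, uint32_t element) {
    std::memmove(buffer + idx + 1, buffer + idx,
                 (buffer_count - idx) * sizeof(uint32_t));
    buffer[idx] = element;
    ++buffer_count;
}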

No changes have been made to the internal working of the tree structure. As such the operation descriptions and pseudocode presented here only consider the effect of buffering on operations at the leaf level.

5.3.1 Insertion

Given an insertion position k and a boolean value v, the buffer is updated as follows. For any existing buffer element with an index value k′ > k, the index of the element is set to k′ + 1. For existing insertions in the buffer where the index value k′ = k, the index is also updated to k′ + 1. For removals with index value k′ = k, nothing is done. A new element is created and added into position in the buffer such that the new element is after any elements with index values k′ < k and after any buffered removal where k′ = k, and before any elements where the new index value k′ > k. The procedure is documented in Figure 5.3.

def insert(i, v):
    idx = buffer_count  # Number of elements in buffer.
    for element in reversed(buffer):
        if element.index > i or (element.index == i and element.is_insertion):
            element.index += 1
            idx = idx - 1
        else:
            break
    # idx is now the desired position for the new element in the buffer.
    if idx == buffer_count:
        append_buffer(i, insertion, v)
    else:
        insert_buffer(idx, i, insertion, v)
    if buffer_count == buffer_size:
        commit_buffer()

Figure 5.3: Pseudocode for the buffered insertion procedure

It would be possible to change an existing removal for the same location to a modify operation and commit this modification immediately, enabling the remove operation to be cleared. This has not been shown to be practical in testing to date, as the overhead for these checks seems to outweigh the benefit of more efficient buffer usage.

5.3.2 Removal

Given a removal position k, for any existing buffer element with an index value k′ > k, the index is changed to k′ − 1. If an insertion to index k is found, the insertion is cleared, and the procedure is finished. If no such insertion exists, a new removal operation is added to the buffer. The procedure is detailed in Figure 5.4.

def remove(i):
    x = value_at(i)
    idx = buffer_count  # Number of elements in buffer.
    for element in reversed(buffer):
        if element.index == i:
            if element.is_insertion:
                # An existing insertion can be removed to handle the removal.
                delete_buffer_element(element)
                return
            else:
                break
        elif element.index > i:
            element.index -= 1
        else:
            break
        idx = idx - 1
    # idx is now the desired position for the new element in the buffer.
    if idx == buffer_count:
        append_buffer(i, removal, x)
    else:
        insert_buffer(idx, i, removal, x)
    if buffer_count == buffer_size:
        commit_buffer()

Figure 5.4: Pseudocode for the buffered removal procedure

For removals, the overhead of “cancelling out” a buffered insertion with a removal is minimal. No additional offset calculations are required, and insertion and removal to the same position in the buffer incur the same cost for shuffling the buffer with memmove, as the same number of elements need to be shifted one position to the left or right.

5.3.3 Access and modification

The unbuffered access pattern of words[index >> 6] >> (index & 63) does not directly work with buffered leaves, since any buffered insertion before index effectively decrements the location of the desired bit by one, while a buffered removal increments the desired index. This must be accounted for while scanning the buffer, so that the access to the underlying succinct bit vector targets the correct location.

Given an access or modification position k, the buffer is scanned from the start and for each element with index value k′ < k, an offset into the underlying data structure is incremented or decremented based on the buffered operation. If an insertion with index value k is encountered, the value of that buffered insertion can be returned or modified, completing the operation. If no such insertion was encountered in the buffer, the calculated offset is applied to the original location (k := k + offset), and the operation is carried out at the offset position exactly as it would be in an unbuffered implementation. The access operation is detailed in Figure 5.5, and the modification operation is very similar.

def value_at(i):
    offset_index = i
    for element in buffer:
        if element.index == i:
            if element.is_insertion:
                return element.value
            offset_index = offset_index + 1
        elif element.index < i:
            offset_index = offset_index + (-1 if element.is_insertion else 1)
        else:
            break
    return 1 & (words[offset_index >> 6] >> (offset_index & 63))

Figure 5.5: Code for the buffered access procedure

5.3.4 Rank

Given a position k, the buffer is scanned from the start while keeping track of the change in offset and the values of buffered operations. After all relevant elements of the buffer have been scanned, the population count of the underlying data structure is queried up to the offset location k + offset, and this population count is added to the seeding population count from scanning the buffer.

def rank(n):
    offset_index = n
    count = 0
    for element in buffer:
        if element.index >= n:
            break
        if element.is_insertion:
            offset_index = offset_index - 1
            count = count + element.value
        else:
            offset_index = offset_index + 1
            count = count - element.value
    target_word = offset_index >> 6
    for i in range(target_word):
        count = count + popcount(words[i])
    count = count + popcount(words[target_word] & ((1 << (offset_index & 63)) - 1))
    return count

Figure 5.6: Pseudocode for the buffered Rank procedure

The population count of the underlying bit vector is calculated using the 64-bit popcnt machine instruction by default. The code for the default buffered Rank operation is detailed in Figure 5.6. See Section 5.4 for efforts to optimize Rank and Select operations using AVX.

5.3.5 Select

Given a number m, the buffer and underlying data structure are scanned in parallel, one 64-bit word and its associated buffer elements at a time. Once the population count for the scanned sections is greater than or equal to m, the leaf is scanned backwards one element at a time with access operations, until the correct location has been found. See Figure 5.7 for the detailed implementation. Efforts to optimize the Select operation with binary search and AVX are detailed in the next section.

5.4 AVX experimentation

Muła et al., 2017, describe efficient methods for population counting on modern CPU architectures. A practical implementation of this, libpopcnt, has been made available online¹.

¹ https://github.com/kimwalisch/libpopcnt (accessed 3 March 2021)

def select(n):
    population = 0
    position = 0
    current_buffer = buffer[0]
    position_offset = 0
    for w in words:
        population = population + popcount(w)
        position = position + 64
        while current_buffer and current_buffer.index < position:
            if current_buffer.is_insertion:
                population = population + current_buffer.value
                position = position + 1
                position_offset = position_offset - 1
            else:
                population = population - (
                    1 & (words[(current_buffer.index + position_offset) >> 6]
                         >> ((current_buffer.index + position_offset) & 63)))
                position = position - 1
                position_offset = position_offset + 1
            current_buffer = current_buffer.next_element()
        if population >= n:
            break
    while population >= n:
        position = position - 1
        population = population - value_at(position)
    return position

Figure 5.7: Pseudocode for the buffered Select procedure

We utilized this provided implementation in an attempt to optimize Rank and Select operations at the leaf level. We hypothesize that optimizations may be beneficial with the 256-bit vectors provided by the AVX2 architecture extension, and that the benefits would likely be greater with AVX512 instructions available. Unfortunately, the benchmarking systems currently available to us only support AVX2. AVX2 instructions do not include native population counting, but efficient population counting can still be done faster than with __builtin_popcountll for sufficiently large bit vectors. Since the AVX512 architecture extension provides native population count intrinsics for 128-, 256- and 512-bit words, the expectation is that running on architectures supporting AVX512 instructions would provide additional speedups.

The AVX optimization of the Rank operation detailed in Figure 5.6 simply substitutes the loop of 64-bit __builtin_popcountll calls with a call to the popcnt function provided by libpopcnt.

This is a straightforward substitution, and results should be positive if the blocks to be population counted are sufficiently large.

Since the vectorized population count scales well with the increase in vector size, the vectorized Select implementation was changed to one where the correct target word is computed using binary search instead of probing iteration. This is achieved by calculating the population count for an initial search target using the provided popcnt function, followed by iteratively adding or removing the change in population count as the search target changes. This is essentially an optimized binary search with Rank operations. This implementation adds significant overhead in the binary search, but the hope is that this is offset by the speed of the population counts.
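The following sketch of ours illustrates the idea; popcnt_words stands in for the libpopcnt call and the helper names are assumptions. The running population count is updated by counting only the words between consecutive binary search midpoints instead of recounting a prefix at every probe.

#include <cstdint>
#include <cstddef>

// Stand-in for a vectorized population count over a range of 64-bit words.
static uint64_t popcnt_words(const uint64_t* p, size_t n_words) {
    uint64_t c = 0;
    for (size_t i = 0; i < n_words; ++i) c += __builtin_popcountll(p[i]);
    return c;
}

// Binary search with incremental Rank: index of the 64-bit word containing
// the n-th 1-bit (n >= 1), assuming at least n 1-bits exist in the leaf.
size_t select_target_word(const uint64_t* words, size_t n_words, uint64_t n) {
    size_t lo = 1, hi = n_words;                // candidate prefix lengths
    size_t mid = lo + (hi - lo) / 2;
    uint64_t count = popcnt_words(words, mid);  // 1-bits in words[0, mid)
    while (lo < hi) {
        if (count < n) lo = mid + 1; else hi = mid;
        size_t next = lo + (hi - lo) / 2;
        if (next > mid) count += popcnt_words(words + mid, next - mid);
        else            count -= popcnt_words(words + next, mid - next);
        mid = next;
    }
    return lo - 1;                              // word holding the n-th 1-bit
}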

6 Experiments

In this chapter we present experiments aimed at determining the effect of our changes to the leaf implementation. First, we ran simple scaling experiments to determine the practical performance of the data structures as a function of data structure size, and whether the effect of buffering on operations supports our hypothesis of speeding up modifications while slightly slowing down lookups. After this we ran mixture tests in an attempt to quantify the trade-off between speed for modification versus speed for lookup. Finally, we did test runs with some applications that utilize dynamic bit vector implementations to verify that our implementation works correctly and to get “real world” performance comparisons to the implementation presented in Prezza, 2017 and implemented in the DYNAMIC library.

Experiments were run on multiple systems in an attempt to verify that the results are not architecture dependent. The main results and experimental procedures are presented in this chapter, with additional results detailed in Appendix A for completeness. All of our test systems had modern Linux based operating systems, a minimum of 16 GB of RAM and negligible system load when running the tests. Code was compiled using the GNU Compiler Collection (GCC) version 9.3 or later with the -Ofast and -march=native compiler optimization options. The systems used consist of the following:

1. Desktop PC with an Intel® Core™ i7-4790 CPU with 256 kB 8-way set associative L1D cache, 1024 kB 8-way set associative L2 cache and 8192 kB 16-way set associative L3 cache running Ubuntu Linux.

2. Desktop PC with an AMD® Ryzen™ 5 2600X CPU with 576 kB 8-way set associative L1D cache, 3072 kB 8-way set associative L2 cache and 16384 kB 16-way set associative L3 cache running Ubuntu Linux.

3. Laptop with an Intel® Core™ i5-6200U CPU with 64 kB 8-way set associative L1D cache, 512 kB 4-way set associative L2 cache and 3072 kB 12-way set associative L3 cache, running Arch Linux.

4. HPC cluster Ukko2 with Intel® Xeon™ E5-2680 v4 CPUs with 448 kB 8-way set associative L1D cache, 3584 kB 8-way set associative L2 cache and 35840 kB 20-way set associative

L3 cache¹, running CentOS Linux.

Unless otherwise stated, the results presented in this chapter will be based on the Intel i7 system.

6.1 Scaling experiment

Our test add_bench t N k builds a data structure of type t ∈ {DYNAMIC, buffered(8), unbuffered} up to size N ≥ 10^6 in k steps and does operation timing for each step. Steps are distributed exponentially to provide evenly distributed data points on a logarithmic axis of data structure size. We ran tests with data structure sizes from 10^6 to N = 10^10 in k = 100 steps to ensure that data structures are large enough to require significant cache acquisitions for randomized queries at the larger data structure sizes. The test works as follows. Initially a data structure of size 10^6 is built using random insertions. After this, the following is done for each step:

1. Random insertions are done to extend the data structure to the desired size.
2. 10^5 random insertions are generated and stored (one vector for locations and one for values).
3. The random insertions are committed to the data structure and timed.
4. 10^5 random removals are generated and stored.
5. The random removals are committed to the data structure and timed.
6. 10^5 random access operations are generated and stored.
7. The random access operations are committed to the data structure and timed.
8. 10^5 random Rank queries are generated and stored.
9. The random Rank queries are committed to the data structure and timed.
10. 10^5 random Select queries are generated and stored.
11. The random Select queries are committed to the data structure and timed.

The C++ class std::uniform_int_distribution is used to generate random unsigned 64-bit values, and these are converted to valid locations by taking the modulus of the random value. This does not generate a truly uniform distribution, and queries targeting the 2^64 mod size first elements may be up to ≈ 5 · 10^−8 percent more likely. Figure 6.1 shows an overview of performance scaling.
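The conversion can be sketched as follows (our illustration; the helper name and exact modulus are not from the benchmark code). Since 2^64 is generally not a multiple of the number of valid positions, the reduction is what introduces the tiny bias mentioned above.

#include <cstdint>
#include <random>

// Draw a pseudo-random position in [0, n_positions) by reducing a uniform
// 64-bit value with a modulus, as described in the text.
uint64_t random_position(std::mt19937_64& gen, uint64_t n_positions) {
    std::uniform_int_distribution<uint64_t> dist; // full 64-bit range
    return dist(gen) % n_positions;
}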

¹ According to https://en.wikichip.org/wiki/intel/xeon_e5/e5-2680_v4, as user access to the cluster disallows reading exact hardware specifications.

Figure 6.1: Execution time scaling as a function of data structure size for DYNAMIC, our implementation with buffer size 8 and our implementation with buffer size 0. Tests were run at 100 points in the (10^6, 10^10) range.

The results for this experiment mostly confirmed our hypotheses. Our leaf implementation is a significant improvement for insert and remove operations compared to the DYNAMIC library implementation. This improvement is mainly due to code optimization. The improvement is smaller at large data structure sizes. This is likely due to the poor locality for the buffer and underlying values caused by the memory allocation in our implementation. For non-modifying operations our implementations are generally faster than or as fast as the DYNAMIC library implementation, as demonstrated in Figure 6.2.

Figure 6.2: Execution time scaling of non-modifying operations as a function of data structure size for DYNAMIC , our implementation with buffer size 8 and our implementation with buffer size 0.

The effect of buffering is not very clear in this experiment. Figure 6.3 illustrates the effect of buffering on insert and Rank operations on the Intel i7 system. Buffered insertions are generally slightly faster than unbuffered insertions, while Rank queries are slightly slower for the buffered implementation, as expected. 28

Figure 6.3: Insert and Rank operation execution time scaling comparisons between our buffered and unbuffered leaf implementation on the Intel i7 test system.

Figure 6.4 shows insertion scaling with approximate cache size annotations. The lines show the size after which the data structure is guaranteed not to fit entirely in the cache in question. This does not take into account the effects of memory fragmentation or clashes due to associativity, so the actual transition point for the data structure fitting in cache is lower by some factor. The figure shows that none of our results show any strange scaling behavior based on cache residency, and it seems the main contributor to scaling is the tree height.

In addition to the mean execution time of operations we also collected some data on data structure sizes and memory fragmentation.

Figure 6.4: Insert query execution time scaling on each test system for DYNAMIC and our buffered leaf implementation with cache size annotation.

The DYNAMIC library and our leaf implementation include a bits_size() function to gather data on memory allocation. In addition to the allocated memory we also collected resident set size data during our experiments. The allocated size of the data structure was typically ≈ 20% greater than the number of stored bits and an additional ≈ 30% of memory was used due to memory fragmentation (resident set size / bits_size()). This highlights the importance of efficient memory allocation if minimizing memory footprint is a high priority.

6.2 Mixture test

To further explore the effect of buffering, a mixture test was used, where mean operation times are generated as a function of insertion vs. Rank probability. A test run consists of first seeding the three different data structures at the same time (the DYNAMIC reference implementation, unbuffered, and with buffer size 8) with 10^6 elements to guarantee that the content of all the bit vectors is the same. After seeding, for each of the different insertion probabilities, 10^6 random queries are generated with the desired insertion probability. These queries are run on each of the data structures separately and the mean execution time for each data structure at each insertion probability is output. Timing of individual queries would have been preferred, to give a better understanding of the timing for different queries. It was found, however, that some single query times can be so short that the resolution of the timing device was insufficient.

Figure 6.5: Mean query times for 10^6 element bit vectors as a function of insertion vs. Rank operation probability. Shaded area is standard deviation for the 20 runs.

Figure 6.5 shows the mean of mean query execution times over 20 runs of the experiment. The data for single runs is very noisy and multiple runs were required to visualize the differences in expected execution times. The plot shows that the reference implementation (Prezza, 2017) is slower in every case, except for pure Rank operation sequences, where it performs similarly to our implementation. Further, the plot seems to confirm the expected behavior of buffered vs. unbuffered operations, where the buffered leaf implementation becomes faster once the fraction of insert operations exceeds a fairly small threshold.

On our other test systems, the comparison to the DYNAMIC library implementation was very similar, as can be seen in Appendix A. However, the buffered versus unbuffered comparisons yielded some confusing and unexpected results, depicted in Figure 6.6. Results for the i5 system indicate inferior performance of the buffered implementation compared to the unbuffered implementation at high insertion probabilities. The results for the AMD system are very noisy and seem to indicate that the buffered implementation is slightly faster overall. We suspect this may be related to memory fragmentation or bad cache alignment, but more research is required to ascertain the reason. Clearly, architecture has some impact on data structure behavior.

Figure 6.6: Unexpected results for Intel i5 (left) and AMD Ryzen (right) mixture tests.

We also ran a series of tests with 10^10 element data structures to explore the effect of data structure size on the mixture performance. This experiment only consisted of 10 runs on the Ukko2 HPC cluster, as the runs take long enough to be impractical on our other test systems. The test had very high run-to-run variance, likely due to how the data structure is built for each run. Each run produced similar patterns but with different mean execution times, as can be seen in Appendix A. At approximately 20% insertion probability the buffered implementation overtakes the unbuffered implementation. Figure 6.7 shows that for pure Rank queries, execution times on 10^10 element bit vectors are slightly (≈ 0.04 µs) faster when leaves are not buffered, while pure insertion execution time is ≈ 0.3 µs faster when buffered leaves are used. Scaling is approximately linear between the extremes. This indicates that buffering is beneficial for big data structures in all cases where a significant number of modifying operations are required.

Figure 6.7: Mixture test results for 10^10 element bit vectors on the Ukko2 HPC cluster. Mean and standard deviation of the difference in mean buffered versus unbuffered query times as a function of insertion probability.

6.3 Application testing

The expected result of substituting our leaf implementation into the reference library would be a significant speed increase with a slight increase in data structure size. The DYNAMIC library contains benchmark code for several different use cases of the dynamic bit vector. The available benchmarks are h0_lz77, rle_lz77_v2, rle-bwt, rle-lz77-v1 and cw-bwt. Of these, only the context-wise Burrows-Wheeler transform benchmark cw-bwt produces the correct output. The other benchmarks either terminate before completion or produce inconsistent output. This indicates severe problems with our leaf implementation that need to be addressed in the future. We suspect this may be related to bugs in some leaf operations: the standard operations presented in Section 2.1 are extensively tested in our implementation, but the DYNAMIC library also defines, for example, select0, rank0, serialization and deserialization operations that are implemented but not tested in our leaf implementation. We tested our implementation using the E.coli genome data set from the Canterbury Corpus1 as well as the SOURCES, PROTEINS, DNA and XML data sets from the text collection of the Pizza&Chili corpus2. Both run time and resident set size were collected for all benchmarks, both for the default DYNAMIC implementation and with our own leaf implementation with buffer size eight substituted in. The results are shown in Figure 6.8.

1 https://corpus.canterbury.ac.nz/descriptions/#large (accessed 9 February 2021)
2 http://pizzachili.dcc.uchile.cl/texts.html (accessed 9 February 2021)

Figure 6.8: Results for the cw-bwt benchmark. Comparison of run time and resident set size of DYNAMIC and our buffered leaf implementation.

The results we managed to obtain are promising. Running the experiment with our leaf implementation was on average 42.77% faster, with only a 1.70% penalty in resident set size. The problems with our leaf implementation need to be addressed before rerunning the other available benchmarks to properly evaluate performance. Raw data for the cw-bwt benchmark results can be found in Appendix A. The Xeon E5-2680 tests that were run on the Ukko2 HPC cluster are outliers compared to the other systems. This is likely due to some instability in the cluster resources during testing: there was huge run-to-run variance for this system, and some experiments failed to finish in the allocated 17 hours of CPU time. The data points presented were selected from multiple runs as seemingly good representatives of reasonable performance.

6.4 Vectorization testing

The AVX and non-AVX implementations are compared by generating random procedures that create data structures and run Rank queries. These procedures are then applied separately to empty data structures using the vectorized and unvectorized implementations. A procedure consists of first generating a random bit vector by inserting 10^6 alternating 0 and 1 bits at random locations in the bit vector, after which 10^6 random Rank (or Select) queries are generated. All of these 2·10^6 operation descriptions are stored as a series of integers in a file. The procedure can then be run by reading the file and executing the described operations. For Rank operations the experiments show a significant expected speedup of between approximately 5 and 30% for random queries, depending on architecture. Figure 6.9 shows non-AVX versus AVX execution times for 100 test runs, with a mean speedup of 9.95%.
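To illustrate the kind of vectorization being compared, the following is a minimal sketch of an AVX2-accelerated in-leaf Rank, using the nibble-lookup population count of Muła et al. (2017). The function names and the exact loop structure are illustrative assumptions and not the DYNAMIC code; the point is that four 64-bit words are counted per iteration instead of one (compile with AVX2 enabled, e.g. -mavx2, on GCC or Clang).

    #include <immintrin.h>
    #include <cstddef>
    #include <cstdint>

    // Population count of a 256-bit register using the nibble-lookup technique.
    static inline uint64_t popcount256(__m256i v) {
        const __m256i lookup = _mm256_setr_epi8(
            0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
            0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
        const __m256i low_mask = _mm256_set1_epi8(0x0f);
        __m256i lo = _mm256_and_si256(v, low_mask);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low_mask);
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lookup, lo),
                                      _mm256_shuffle_epi8(lookup, hi));
        // Horizontal sum of the 32 byte counts into four 64-bit partial sums.
        __m256i sums = _mm256_sad_epu8(cnt, _mm256_setzero_si256());
        return _mm256_extract_epi64(sums, 0) + _mm256_extract_epi64(sums, 1) +
               _mm256_extract_epi64(sums, 2) + _mm256_extract_epi64(sums, 3);
    }

    // Rank over the first `bits` bits of a word array: four 64-bit words per
    // AVX2 iteration, scalar popcount for the remaining words and the tail.
    uint64_t rank_avx2(const uint64_t* words, uint64_t bits) {
        uint64_t full_words = bits / 64;
        uint64_t count = 0;
        size_t i = 0;
        for (; i + 4 <= full_words; i += 4) {
            __m256i v = _mm256_loadu_si256(
                reinterpret_cast<const __m256i*>(words + i));
            count += popcount256(v);
        }
        for (; i < full_words; ++i) count += __builtin_popcountll(words[i]);
        uint64_t tail = bits % 64;
        if (tail) {
            count += __builtin_popcountll(words[full_words] &
                                          ((uint64_t(1) << tail) - 1));
        }
        return count;
    }

Handling the last few words with the scalar __builtin_popcountll keeps the sketch correct for leaves shorter than one AVX2 register.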

Figure 6.9: 100 runs of Rank experiments on the Intel i7 test system. Shows mean query times of the vectorized implementation versus the non-vectorized implementation with x = y line for reference. A single point in the left plot is the mean operation time with AVX as a function of mean operation time without AVX for one particular test case.

In the case of Select operations, it seems the added overhead of the binary search far outweighs the benefits of vectorized population counting, at least for AVX2 and the default leaf size. According to our testing, the AVX implementation is approximately 20 to 70% slower than the non-AVX implementation. Figure 6.10 shows our test results for the Intel i7 test system, with an estimated mean slowdown of 71.22% compared to the non-AVX version.
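For context, a leaf-level Select can be implemented as a simple word-by-word scan with scalar population counts, as sketched below; a vectorized variant instead counts larger blocks at once and then still needs a search inside the matching block, which is the binary-search overhead referred to above. The function is an illustrative assumption, not the DYNAMIC implementation.

    #include <cstdint>

    // Scalar in-leaf Select: returns the position of the k'th 1-bit (k >= 1)
    // in a leaf of n_words 64-bit words, assuming at least k set bits exist.
    uint64_t select1(const uint64_t* words, uint64_t n_words, uint64_t k) {
        uint64_t i = 0;
        // Scan word by word until the k'th 1-bit falls inside words[i].
        for (; i < n_words; ++i) {
            uint64_t c = __builtin_popcountll(words[i]);
            if (c >= k) break;
            k -= c;
        }
        // Clear the k-1 lowest set bits, then take the position of the
        // lowest remaining one.
        uint64_t w = words[i];
        for (uint64_t j = 1; j < k; ++j) w &= w - 1;
        return i * 64 + __builtin_ctzll(w);
    }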

Figure 6.10: 100 runs of Select experiments on the Intel i7 test system. Shows mean query times of the non-vectorized implementation versus the vectorized implementation with the x = y line for reference.

7 Conclusions and future work

Our work on implementing buffering, using vectorized operations where applicable, and optimizing existing code clearly shows that there is significant room for improvement in the performance of state-of-the-art dynamic bit vectors. Our fairly simple modifications to the DYNAMIC library make modifying operations significantly faster while not appreciably impacting access operations. All of our efforts are still works in progress, and a natural next step, after conceptually proving that our approaches are worthwhile, would be to create a new optimized implementation from scratch where the issues related to generic header types can be avoided. There may be more issues similar to the modulus/division optimization that are yet to be identified. In addition, a dedicated bit vector library would likely be easier to maintain than the highly templated and generic reference code base. Beyond the improvements we are currently working on, we feel that optimizing memory allocation, query caching, branchless binary search and bit vector compression should be explored.

7.1 Optimizing memory allocation

Memory fragmentation could be reduced by managing how memory is allocated. This could potentially improve data locality as well as data alignment, enabling faster loads and stores. Research has been done on efficient memory allocation for dynamic data structures to minimize memory fragmentation (Klitzke and Nicholson, 2016). In practice, manual memory management would lead to less fragmentation and thus less overhead due to it. However, if attempts are also made to reduce the performance overhead of reallocation, an optimized allocation scheme may lead to a higher overall memory overhead (if node and leaf allocations are made large enough to avoid repeated reallocation). The practical negative effect on space usage may push the space used per stored bit to ≈ 2 bits or more, compared to the claimed 1.2 bits per stored bit of the DYNAMIC library. In any case, optimizing memory allocation will likely lead to improvements in either data structure space requirements or operation timings.

7.2 Query caching

Some access patterns generate the same access queries multiple times. Caching one or more query results at the internal node or leaf level would speed up execution for these cases, at a penalty in space efficiency. Checking the cache would also cause overhead whenever a cache miss occurs. Whether these negatives outweigh the speedup for repeated queries should be evaluated.

7.3 Branchless binary search

An attempt at speeding up branch selection with a traditional binary search would likely reduce performance due to branch mispredictions. By eliminating code path branching from the binary search (Sanders and Winkel, 2004), binary searching for branch selection may become faster than the linear scanning approach of the DYNAMIC library. This could lead to a significant performance improvement, as the branching factor of the internal nodes could be increased without a significant penalty to branch selection, leading to a shallower tree structure with better memory locality. This would incur some added cost for modifying operations, as a larger number of internal cumulative sums would need to be updated.
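A minimal sketch of what such branchless child selection could look like is given below, assuming the internal node stores non-decreasing cumulative element counts and that the fanout is a power of two so the loop can be fully unrolled. This is an illustration of the technique, not code from the DYNAMIC library.

    #include <cstdint>

    // Branchless child selection over an internal node's cumulative counts,
    // in the spirit of Sanders and Winkel (2004). `sums` holds `fanout`
    // non-decreasing cumulative element counts, `fanout` is assumed to be a
    // power of two and pos < sums[fanout - 1]. Returns the index of the
    // first child whose cumulative count exceeds `pos`.
    uint32_t select_child(const uint64_t* sums, uint32_t fanout, uint64_t pos) {
        uint32_t base = 0;
        for (uint32_t half = fanout / 2; half > 0; half /= 2) {
            // Intended to compile to a conditional move rather than a
            // taken/not-taken branch, so there is nothing to mispredict.
            base += (sums[base + half - 1] <= pos) ? half : 0;
        }
        return base;
    }

Because the number of iterations depends only on the fanout and not on the data, the loop can be unrolled and the fanout increased without paying for mispredicted branches.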

7.4 Bit vector compression

For some inputs, compressing the leaf bit vectors may increase performance. If properly implemented, a scheme where elements are either compressed or succinctly stored on a per-leaf basis could, in some cases, significantly reduce the memory footprint of the data structure. Intuitively, the blocking approach presented in Kärkkäinen et al. (2014) should be straightforward to apply to the leaf blocks. However, the overhead of encoding conversions and checks may be prohibitively expensive for all but the most space-constrained applications.

Bibliography

Aho, A. V. and Hopcroft, J. E. (1974). The Design and Analysis of Computer Algorithms. 1st. USA: Addison-Wesley Longman Publishing Co., Inc. isbn: 0201000296.

Alipanahi, B., Kuhnle, A., Puglisi, S. J., Salmela, L., and Boucher, C. (May 2020). “Succinct Dynamic de Bruijn Graphs”. In: Bioinformatics. btaa546. issn: 1367-4803. doi: 10.1093/bioinformatics/btaa546. url: https://doi.org/10.1093/bioinformatics/btaa546.

Awad, M. A., Ashkiani, S., Johnson, R., Farach-Colton, M., and Owens, J. D. (2019). “Engineering a High-Performance GPU B-Tree”. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. PPoPP ’19. Washington, District of Columbia: Association for Computing Machinery, pp. 145–157. isbn: 9781450362252. doi: 10.1145/3293883.3295706. url: https://doi.org/10.1145/3293883.3295706.

Bayer, R. and McCreight, E. (1970). “Organization and Maintenance of Large Ordered Indices”. In: Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control. SIGFIDET ’70. Houston, Texas: Association for Computing Machinery, pp. 107–141. isbn: 9781450379410. doi: 10.1145/1734663.1734671. url: https://doi.org/10.1145/1734663.1734671.

Bender, M., Farach-Colton, M., Jannen, W., Johnson, R., Kuszmaul, B., Porter, D., Yuan, J., and Zhan, Y. (2015). “An Introduction to Bε-trees and Write-Optimization”. In: ;login: Usenix Mag. 40.

Clark, D. (1997). Compact PAT trees. url: http://hdl.handle.net/10012/64.

Comer, D. (June 1979). “Ubiquitous B-Tree”. In: ACM Comput. Surv. 11.2, pp. 121–137. issn: 0360-0300. doi: 10.1145/356770.356776. url: https://doi.org/10.1145/356770.356776.

Cordova, J. and Navarro, G. (2016). “Practical Dynamic Entropy-Compressed Bitvectors with Applications”. In: Experimental Algorithms. Ed. by A. V. Goldberg and A. S. Kulikov. Cham: Springer International Publishing, pp. 105–117. isbn: 978-3-319-38851-9.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009). Introduction to Algorithms, Third Edition. 3rd. The MIT Press. isbn: 0262033844.

Ferragina, P. and Venturini, R. (Aug. 2016). “Compressed Cache-Oblivious String B-Tree”. In: ACM Trans. Algorithms 12.4, 52:1–52:17. issn: 1549-6325. doi: 10.1145/2903141. url: http://doi.acm.org/10.1145/2903141.

Fog, A. (Oct. 2020). Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs. Tech. rep. Technical University of Denmark. url: https://www.agner.org/optimize/.

Gog, S., Beller, T., Moffat, A., and Petri, M. (2014). “From Theory to Practice: Plug and Play with Succinct Data Structures”. In: 13th International Symposium on Experimental Algorithms (SEA 2014), pp. 326–337.

Gog, S., Kärkkäinen, J., Kempa, D., Petri, M., and Puglisi, S. J. (Apr. 2019). “Fixed Block Compression Boosting in FM-Indexes: Theory and Practice”. In: Algorithmica 81.4, pp. 1370–1391. issn: 1432-0541. doi: 10.1007/s00453-018-0475-9. url: https://doi.org/10.1007/s00453-018-0475-9.

González, R., Grabowski, S., Mäkinen, V., and Navarro, G. (2005). “Practical Implementation of Rank and Select Queries”. In: Poster Proceedings of the 4th International Workshop on Efficient and Experimental Algorithms (WEA 2005).

Kärkkäinen, J., Kempa, D., and Puglisi, S. J. (2014). “Hybrid Compression of Bitvectors for the FM-Index”. In: 2014 Data Compression Conference, pp. 302–311. doi: 10.1109/DCC.2014.87.

Klitzke, P. and Nicholson, P. K. (2016). “A General Framework for Dynamic Succinct and Compressed Data Structures”. In: 2016 Proceedings of the Meeting on Algorithm Engineering and Experiments (ALENEX), pp. 160–173. doi: 10.1137/1.9781611974317.14. url: https://epubs.siam.org/doi/abs/10.1137/1.9781611974317.14.

Muggli, M. D., Bowe, A., Noyes, N. R., Morley, P. S., Belk, K. E., Raymond, R., Gagie, T., Puglisi, S. J., and Boucher, C. (Feb. 2017). “Succinct colored de Bruijn graphs”. In: Bioinformatics 33.20, pp. 3181–3187. issn: 1367-4803. doi: 10.1093/bioinformatics/btx067. url: https://doi.org/10.1093/bioinformatics/btx067.

Muła, W., Kurz, N., and Lemire, D. (May 2017). “Faster Population Counts Using AVX2 Instructions”. In: The Computer Journal 61.1, pp. 111–120. issn: 0010-4620. doi: 10.1093/comjnl/bxx046. url: https://doi.org/10.1093/comjnl/bxx046.

Navarro, G. (2016). Compact Data Structures: A Practical Approach. 1st. USA: Cambridge University Press. isbn: 1107152380.

Prezza, N. (2017). “A Framework of Dynamic Data Structures for String Processing”. In: International Symposium on Experimental Algorithms. Leibniz International Proceedings in Informatics (LIPIcs).

Sanders, P. and Winkel, S. (2004). “Super Scalar Sample Sort”. In: Algorithms – ESA 2004. Ed. by S. Albers and T. Radzik. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 784–796. isbn: 978-3-540-30140-0.

Sedgewick, R. and Flajolet, P. (1996). An Introduction to the Analysis of Algorithms. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc. isbn: 0-201-40009-X.

Sirén, J., Garrison, E., Novak, A. M., Paten, B., and Durbin, R. (July 2019). “Haplotype-aware graph indexes”. In: Bioinformatics 36.2, pp. 400–407. issn: 1367-4803. doi: 10.1093/bioinformatics/btz575. url: https://doi.org/10.1093/bioinformatics/btz575.

Vigna, S. (2008). “Broadword Implementation of Rank/Select Queries”. In: Experimental Algorithms. Ed. by C. C. McGeoch. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 154–168. isbn: 978-3-540-68552-4.

Zhou, D., Andersen, D. G., and Kaminsky, M. (2013). “Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences”. In: Experimental Algorithms. Ed. by V. Bonifaci, C. Demetrescu, and A. Marchetti-Spaccamela. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 151–163. isbn: 978-3-642-38527-8.

Appendix A Experiment results expanded

Simple scaling comparison for all test systems of the DYNAMIC library solution, our leaf implementation with buffer size zero and our leaf implementation with buffer size 8.

Simple scaling comparison for non-modifying operations.

Insert and Rank operation scaling comparison between buffered and unbuffered implementation on the Intel i7-4790 test system.

Insert and Rank operation scaling comparison between buffered and unbuffered implementation on the AMD Ryzen 5 2600X test system.

Insert and Rank operation scaling comparison between buffered and unbuffered implementation on the Intel i5-6200U test system.

Insert and Rank operation scaling comparison between buffered and unbuffered implementation on the Intel E5-2680 test system.

Mixture tests run on the Intel i7-4790, comparing mean operation time as a function of insertion (vs. Rank) probability between DYNAMIC and our buffered implementation (left), and our buffered implementation and unbuffered implementation (right) on a 10^6 element bit vector.

Mixture tests run on the Intel i5-6200U, comparing mean operation time as a function of insertion (vs. Rank) probability between DYNAMIC and our buffered implementation (left), and our buffered implementation and unbuffered implementation (right) on a 10^6 element bit vector.

Mixture tests run on the Intel E5-2680, comparing mean operation time as a function of insertion (vs. Rank) probability between DYNAMIC and our buffered implementation (left), and our buffered implementation and unbuffered implementation (right) on a 10^6 element bit vector.

Mixture tests run on the AMD Ryzen 5 2600X, comparing mean operation time as a function of insertion (vs. Rank) probability between DYNAMIC and our buffered implementation (left), and our buffered implementation and unbuffered implementation (right) on a 10^6 element bit vector.

Mixture tests run on the AMD Ryzen 5 2600X, comparing mean operation time as a function of insertion (vs. Rank) probability between DYNAMIC and our buffered implementation (left), and our buffered implementation and unbuffered implementation (right) on a 10^7 element bit vector.

Mixture tests run on the Intel E5-2680 with 10^10 element bit vectors, showing mean operation time as a function of insertion (vs. Rank) probability between our buffered and unbuffered implementations.

AVX experiment comparing mean Rank query times over 100 runs on the Intel i7-4790.

AVX experiment comparing mean Rank query times over 100 runs on Intel i5-6200U.

AVX experiment comparing mean Rank query times over 100 runs on Intel Xeon E5-2680.

AVX experiment comparing mean Rank query times over 100 runs on the AMD Ryzen 5 2600X.

AVX experiment comparing mean Select query times over 100 runs on the Intel i7-4790.

AVX experiment comparing mean Select query times over 100 runs on Intel i5-6200U.

AVX experiment comparing mean Select query times over 100 runs on Intel Xeon E5-2680.

AVX experiment comparing mean Select query times over 100 runs on the AMD Ryzen 5 2600X.

Results for cw-bwt benchmark run times in seconds.

System          leaf type   E.coli    SOURCES    PROTEINS    DNA       XML
i5-6200U        DYNAMIC     13.19     2137.65    11527.35    1732.14   2827.79
i5-6200U        buffered     3.72      798.11     5741.48     839.67   1186.25
i7-4790         DYNAMIC      9.88     1529.14     9116.89    1398.38   2014.80
i7-4790         buffered     2.81      555.17     3959.26     686.58    764.55
Xeon E5-2680    DYNAMIC      4.36      715.07     5287.80     845.05    997.70
Xeon E5-2680    buffered     4.08      944.33     3867.85    1125.02    905.74
Ryzen 5 2600X   DYNAMIC      9.04     1437.42     8536.90    1249.29   1940.72
Ryzen 5 2600X   buffered     2.87      583.74     4427.84     660.36    849.84

Results for cw-bwt benchmark maximum resident set sizes in kB.

System          leaf type   E.coli    SOURCES    PROTEINS    DNA       XML
i5-6200U        DYNAMIC     5788      133976     868408      185036    157960
i5-6200U        buffered    6196      138008     893140      189200    162352
i7-4790         DYNAMIC     5416      134324     860192      183360    158000
i7-4790         buffered    5348      138064     886028      187780    162820
Xeon E5-2680    DYNAMIC     3780      133796     867200      183252    157896
Xeon E5-2680    buffered    3796      133800     866964      183180    157892
Ryzen 5 2600X   DYNAMIC     6180      134504     868860      185268    158220
Ryzen 5 2600X   buffered    6372      135748     876352      186564    159676