DART: Distributed Adaptive Radix for Eicient Aix-based Keyword Search on HPC Systems

Wei Zhang Houjun Tang Texas Tech University Lawrence Berkeley National Laboratory Lubbock, Texas Berkeley, California [email protected] [email protected] Suren Byna Yong Chen Lawrence Berkeley National Laboratory Texas Tech University Berkeley, California Lubbock, Texas [email protected] [email protected] ABSTRACT 1 INTRODUCTION Ax-based search is a fundamental functionality for storage systems. Ax-based keyword search is a typical search problem, where It allows users to nd desired datasets, where attributes of a dataset resulting data records match a given prex, sux, or inx. For match an ax. While building inverted index to facilitate ecient ax- contemporary parallel and distributed storage systems in HPC en- based keyword search is a common practice for standalone databases vironments, searching string-based metadata is a frequent use case and for desktop le systems, building local indexes or adopting index- for users and applications to nd desired information. For instance, ing techniques used in a standalone data store is insucient for high- the ax-based keyword search on string-based identiers, such performance computing (HPC) systems due to the massive amount of as “name=chem*” and “date=*/2017”, is a common requirement in data and distributed nature of the storage devices within a system. In searching through the metadata among terabytes to petabytes of this paper, we propose Distributed Adaptive Radix Tree (DART), to data generated by various HPC applications [1, 7, 22, 23, 27, 32]. address the challenge of distributed ax-based keyword search on It is a common practice to build -based inverted index to ac- HPC systems. This trie-based approach is scalable in achieving e- celerate ax-based keyword search. There are several standalone cient ax-based search and alleviating imbalanced keyword distribu- trie-based indexing , such as prex B-tree [8], Patricia tion and excessive requests on keywords at scale. Our evaluation at dif- tree [19], or adaptive radix tree (ART) [21]. But none of these stan- ferent scales shows that, comparing with the “full string hashing” use dalone data structures can address the metadata search problem of case of the most popular distributed indexing technique - Distributed an HPC system. The massive amount of data on HPC system will (DHT), DART achieves up to 55× better throughput with cause leaf node expansion in these trie-based data structures and prex search and with sux search, while achieving comparable through- hence exhausts limited memory resources while diminishing the put with exact and inx searches. Also, comparing to the “initial hash- overall eciency. Thus, it is preferable to distribute such trie-based ing” use case of DHT, DART maintains a balanced keyword distri- inverted index onto multiple nodes to utilize the data parallelism bution on distributed nodes and alleviates excessive query workload and to reduce tree traversal overhead. against popular keywords. However, it is challenging to build distributed inverted indexes on HPC systems. One challenge comes from the distributed nature CCS CONCEPTS of HPC storage systems which involves communication overhead. • Computing methodologies → Parallel algorithms; Some recent research eorts (e.g. [40]) explored the practice of indexing metadata by storing metadata into multiple distributed KEYWORDS instances of database management systems, such as SQLite[42], distributed search, distributed ax search, distributed inverted MySQL [31], PostgreSQL [33] or MongoDB [30]. 
With such ap- index proach, queries have to be sent to all participating database in- stances and the results have to be collected from all these instances. ACM Reference Format: Such broadcasting and reduction operations cause heavy commu- Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen. 2018. DART: Dis- nication overhead and hence reduce search eciency. The same tributed Adaptive Radix Tree for Ecient Ax-based Keyword Search on problem can also be found in some cloud computing solutions, such HPC Systems. In International conference on Parallel Architectures and Com- pilation Techniques (PACT ’18), November 1–4, 2018, Limassol, Cyprus. ACM, as Apache SolrCloud [2] and ElasticSearch [11], in building and New York, NY, USA, 12 pages. https://doi.org/10.1145/3243176.3243207 using distributed inverted indexes. Moreover, these approaches do not scale well in HPC environments and it is dicult to ensemble ACM acknowledges that this contribution was authored or co-authored by an employee, them into a HPC storage system as a pluggable component. Also, contractor, or aliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this as a common practice for managing distributed index in many HPC article, or to allow others to do so, for government purposes only. storage systems (such as SoMeta [43], IndexFS [35], FusionFS [49], PACT ’18, November 1–4, 2018, Limassol, Cyprus and DeltaFS [50]), distributed hash table (DHT) [47] has been used © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5986-3/18/11...$15.00 to build inverted index for metadata [48]. However, when it comes https://doi.org/10.1145/3243176.3243207 1 PACT ’18, November 1–4, 2018, Limassol, Cyprus Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen to ax-based keyword search, the application of DHT on each SQLite [42], ElasticSearch [11], and Apache Lucene [13] can also entire keyword may still result in query broadcasting for prex or be used for such purpose. In addition, each server leverages locality sux searches. between the index and the actual documents or data items being Another challenge in building distributed inverted indexes comes indexed (TagIt [40] is a typical example of such approach). How- from the emerging trend of enabling user-dened tags in storage ever, a keyword query must be broadcasted to all servers in order systems (e.g., SoMeta [43] and TagIt [40]). As compared to tradi- to retrieve all the related documents or data items, and requires all tional le systems where only a xed number of attributes are servers to participate in the process of serving a query. Such query available in the metadata, such practice can introduce unlimited broadcasting can lead to intense network trac and exhausts lim- number of attributes and user-dened tags. As the number of key- ited computing resources when query frequency rises. In addition, words to be indexed increases, the keyword distribution lead (or load balance of inverted index on dierent servers highly relies on ended) by dierent prexes (or suxes) and the keyword popularity the partitioning strategy of documents or data items. In this case, it can be imbalanced. Such imbalance would cause query contention can be dicult to manage the distribution of indexed keywords on and poor resource utilization in a distributed system, which may each server, and hence may result in imbalanced workload. In this discount the eciency of ax-based keyword search. 
Thus, it is study, we do not consider document-partitioned approach necessary to consider load balancing issue when creating such nor compare our work with any document-partitioned ap- distributed trie-based inverted index. proach like TagIt[40] because the load imbalance of this approach Given the challenges of distributed ax-based keyword search, prevents itself from supporting ecient ax-based keyword search. in this paper, we propose a distributed trie-based inverted index The term-partitioned approach takes each indexed keyword as for ecient ax-based keyword search. Our technique can be eas- the cue for index partitioning. The typical case of such approach ily applied on HPC metadata management systems that facilitate designed for HPC systems is the use of DHT, which can be found parallel client-server communication model. Inspired by Adaptive in many recent distributed metadata and key value store solutions, Radix Tree (ART) [21], we name our proposed approach as DART - such as SoMeta [43], IndexFS [35], FusionFS [49], and DeltaFS [50]. Distributed Adaptive Radix Tree. We have designed DART to pro- However, the application of DHT still remains inappropriate for vide simplicity and eciency for distributed ax-based keyword ax-based keyword search in two-fold. search. First, when applying hashing algorithm against each keyword, The major contributions of this paper are: it may result in a well balanced keyword distribution with careful • We designed the DART partition tree and its initializing selection of hash function. However, this DHT approach only sup- process in which the height of the DART partition tree and ports ecient exact keyword search. When it comes to prex search the number of DART leaf nodes is determined by the number or sux search, the query still has to be sent to all the servers and of servers, which ensures the scalability of DART. hence the search performance is far from optimal. In this paper, we • We composed the index creation process with a series of call this approach as “full string hashing”, in the sense that the node selection procedures and replication mechanism on hashing algorithm is applied on the entire keyword. top of DART partition tree so that the query workload on Second, to support ecient prex search, a common approach dierent index node can be balanced. to use DHT is the n-gram technique [14, 15, 37, 46]. This approach • We devised the query routing procedure on DART to achieve iterates over all prexes of each keyword (including the keyword ecient query response. Our evaluation demonstrates that itself), and creates index for each individual prex of the keyword. DART outperforms DHT by up to 55× in executing prex However, since each keyword may have multiple prexes, and such and sux keyword searches and performs similar to DHT approach leads to signicant space consumption for storing the for exact and inx searches. index, which can be many times larger than only indexing the keyword itself. To avoid this drawback, it is preferable to adopt The rest of the paper is organized as follows: In Section 2, we 1-gram approach rather than the n-gram approach, by which only review existing solutions and challenges on distributed inverted the rst character of a keyword is considered in the partitioning, index. Then, we analyze the requirements of an ecient distributed and we call this approach “initial hashing”. However, due to the inverted index in Section 3. 
We propose our Distributed Adaptive natural imbalanced keyword distribution led by dierent leading Radix Tree (DART) method in Section 4 and evaluate DART in characters, such approach may fail to maintain balanced keyword Section 5. We discuss related work in Section 6 and conclude in distribution and hence unable to achieve decent load balance. Section 7. Building distributed index may not be rewarding for all ax- based keyword search. For inx search, since the presence and 2 BACKGROUND the position of an inx may vary from keyword to keyword, it is There are two paradigms of distributed index construction: document- common to go through all index records to achieve inx search, partitioned approach and term-partitioned approach [24, 25]. which can also lead to query broadcasting among all servers. The document-partitioned approach is considered to be “local Other than DHT, several research eorts have explored the dis- index organization”, where each server stores a subset of all docu- tributed keyword search problem [17], [39], [34], [36], [18], [6], [3], ments or data items while maintaining index for the documents or [4]. However, all of them attempt to address the keyword search data items on that server locally. In this case, many data structures problem in peer-to-peer systems, where performance requirements can be used for local indexing, such as B+ Tree [8], Patricia tree [19], ART [21], and many software libraries and tools including 2 DART: Distributed Adaptive Radix Tree for Eicient Aix-based Keyword Search on HPC Systems PACT ’18, November 1–4, 2018, Limassol, Cyprus are more relaxed than in HPC systems. Thus, it is necessary to de- 3.3 Load Balance velop a distributed indexing methodology for ecient ax-based For a distributed index, imbalanced workload distribution may lead keyword search on HPC system. to overloaded nodes, which in turn damage the overall eciency. Thus, it is important to ensure a balanced index partition on each 3 KEY REQUIREMENTS server. However, in practice, keywords that need to be indexed We summarize and formally dene four requirements of distributed often follow an uneven distribution, which may lead to imbalanced keyword search in HPC systems: functionality, eciency, load bal- workload. ance, and scalability, in building ecient distributed inverted index. 126k 60k

We discuss them in details below. 126k 50k 40k 125k 30k 125k 20k 3.1 Functionality 124k 10k 124k 0 Ax-based keyword search is fundamental in data discovery in 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F G H I J K L M N O P Q R S T U V W X Y Z many application scenarios. Typical examples can be seen in search (a) UUID (b) DICT engines, le systems, database systems, etc. In such systems, the total amount of data can be tremendous and hence the amount of 3500k 3000k information which needs to be indexed in metadata is also huge. 2500k 2000k Hence, it can be much more often to perform ax-based search in 1500k 1000k order to nd multiple data items instead of a particular dataset using 500k 0 an exact search. While there could be other forms of the axes, ! # $ % & ' ( ) * + , - 0 1 2 3 4 5 6 7 8 9 : ; < > ? @ [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } in this paper, we focus on four basic types of ax-based keyword (c) WIKI search: 1) exact search, 2) prex search, 3) sux search, and 4) inx search. We consider them to be essential for the metadata Figure 1: Uneven keyword distribution under dierent lead- search and distributed key value store searches on HPC systems. ing characters in dierent datasets. The leading letters are An exact search nds an indexed key that matches with the listed along the horizontal axis, and the number of key- given keyword in its entirety. A typical example of this can be words led by dierent leading letters are shown vertically. “name=experiment001.dat”. A prex search matches the rst few characters in the key of an index record, i.e., “name=experiment*”. Likewise, a sux search matches the last few characters in the key To demonstrate the presence of uneven keyword distribution, of an index record, such as “name=*.dat”, and an inx search nds we have done a preliminary study with three dierent datasets. We the index record where the indexed key contains the characters in provide the histogram of each dataset to demonstrate the number of the middle of the query string, like “name=*001*”. keywords led by dierent leading characters (see Figure 1). The rst It is noteworthy that an exact search can be transformed into a dataset is a of 2 million UUIDs that are randomly generated by prex search since the longest prex of a string is itself. Another Linux libuuid[5]. We consider the UUID dataset represents unique note is that a sux search can be accomplished by indexing the identiers of some scientic datasets, for example, the identier inverse of each keyword and search through such inverted index. of datasets in the HDF5 le of BOSS data[38]. The second dataset In addition, an inx search can be performed by traversing through is a list of 479k English words [28], labeled as DICT in Figure 1. each keyword in the index and collecting the result from the index We consider this dataset represents a more generalized keyword records where the indexed keyword contains the given inx. Thus, set. The last dataset contains all the keywords extracted from the it is possible to achieve four dierent types of ax-based keyword Wikipedia access log during September, 2007 [45], along with the search by designing one indexing methodology. number of requests on each keyword (labeled as WIKI). 
We con- sider such dataset to be a representative of real-life keyword search scenario, and the skewness in the number of requests on each key- 3.2 Eciency word also gives a close approximation on the request distribution The main goal of building distributed inverted index is to accel- pattern of dierent keywords in real practice. From Figure 1, we can erate ax-based search. Thus, the eciency of ax-based key- observe that the distribution of keywords led by dierent leading word search is crucial. For HPC environment which prefers on-core character is imbalanced, and similar observation is present in all computation but still holds distributed nature, the communication three datasets. Particularly, Figure 1(a) only shows the keyword overhead has a major impact on the overall eciency of ax-based distribution of UUIDs that are randomly generated in one batch. keyword search when the index is created across dierent cores. Although the keyword distribution of UUIDs may vary from batch Thus, we focus on minimizing the communication cost in ax- to batch, the randomness of the UUIDs can also suciently sup- based keyword search, especially prex search, sux search, exact port the validity of our claim that uneven keyword distribution is search. However, for inx search, since we cannot predict where ubiquitous. the given inx may appear in a keyword, our approach should not The imbalance of workload does not only come from the uneven degrade the inx search performance as compared to the existing index distribution, but also comes from the skewness of requests DHT approaches. Also, the procedure of index creation and index on dierent keywords. We plot the number of requests of dierent deletion should also be ecient. keywords in dataset WIKI, and as it can be observed from Figure 3 PACT ’18, November 1–4, 2018, Limassol, Cyprus Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen

4.2 Overview of DART 70k

60k DART is designed to meet the goals of functionality, eciency, load

50k balance, and scalability discussed above. DART is composed of two

40k major components: 30k • DART partition tree, which virtually exists on all clients 20k and divides the entire search space into several partitions 10k Total Number of Requests with all its leaf nodes. Note that when we project the entire 0k 100 101 102 103 104 105 106 107 search space onto a polar coordinate system, all the nodes Keyword ID of DART partition tree actually form a ring, which we call DART hash ring. Figure 2: Highly skewed requests for dierent keywords. • The server-side local inverted index on each physical node. We do not limit the actual data structure on the server side for local inverted index. In this paper, we select a trie- based data structure - adaptive radix tree [21] - as our proto- type of server-side local inverted index. 2, the number of requests on keywords during September, 2007 In the rest of this section, we discuss each component and ex- follows a skewed distribution. Such skewness also leads to uneven plain how they are constructed and utilized in detail, as well as the workloads of all servers. description of dierent operations on DART.

3.4 Scalability 4.3 DART Initialization During the initialization of DART, we follow the trie-based ap- For a distributed inverted index, especially when used on HPC proach to initialize the DART partition tree, which divides the systems, scalability is an important requirement. User applications entire search space into several partitions. The root node of the are often running on hundreds to thousands of compute nodes. The DART partition tree is empty and it does not count towards the strategies and optimizations introduced and evaluated at smaller height of the DART partition tree. For a partition tree of height d, at scale need to be valid at larger scales as well. Thus, it is crucial each level i ∈ {1, 2,...,d} of the partition tree, each node branches for an indexing and keyword-based querying to provide ecient out to its succeeding level (namely level i + 1) by iterating each performance at scale. The querying strategy should also remain character in the character set A in order. Thus, at level d, the parti- ecient regardless of the dataset size and data type. d tion tree will have Nleaf = (k) leaf nodes, where k = |A|, which we call the radix of DART. 4 DISTRIBUTED ADAPTIVE RADIX TREE AB, ABB* To address the challenges of distributed ax-based keyword search, = A, B, C { } we now present the design of DART - the Distributed Adaptive A { } 3 4 5 Radix Tree. 2 6 B 1 A C 7 C A B 4.1 Terminology 0 B B 8 C A C A A 9 We rst dene the terminology and notations used throughout this 26 A paper. C

10 B 25 A We denote the character set which DART has to work on as A, B

C DART C thus, the number of all characters in A can be denoted as k = |A|, 11

24 A Partition Tree

B

which we call the size of the character set. For the sequences of A

C

B

12 C

characters from A, we call them terms of A, or simply terms.A 23 B

CBC* B B

term T = (t1t2...tl ) of length l comprises l = |T | characters from { } 13 C C 22

A

A. We dene a prex of terms T to be the rst n characters of T , A A

14

C

21

B

denoted as p(n,T ), where n ≤ l. A prex p(n,T ) can also be a prex B C A B,BB,BBB* 15

of another prex p(m,T ), as long as n ≤ m and m ≤ l. If n < m ≤ l, 20 { }

16 19 17

we call p(n,T ) a shorter prex of T , as compared to the longer 18 prex p(m,T ). If n = m ≤ l, we say both prex p(n,T ) and p(m,T ) are identical. For the term for which we want to build index, we Figure 3: DART partition tree of height d = 3 on the basis of call it an indexing term. For the term of which an index is built on, character set A ={A, B, C}. The numbers on the outermost we call it an indexed term. For the term a query is looking for, we ring are DART virtual leaf node IDs. call it a search term. Also, in the discussion of distributed index, we use “term” and “keyword” interchangeably. Note that, every Figure 3 shows an example of a DART partition tree of height character in A can be encoded into a string of one or more binary d = 3 over an character set A = {A, B,C} such that k = 3, the total digits, hence all characters in A can be compared and sorted. 3 number of leaf nodes in this partition tree is Nleaf = 3 . 4 DART: Distributed Adaptive Radix Tree for Eicient Aix-based Keyword Search on HPC Systems PACT ’18, November 1–4, 2018, Limassol, Cyprus

Each leaf node is regarded as a virtual node in the entire search has fewer indexed keywords to be the eventual virtual node. Finally, space. And such virtual node can be mapped to an actual phys- we create index on the eventual virtual node. ical node with a modulus function Iphysical = Ileaf %M, where On the other hand, in Section 3.3, we also mentioned that the Iphysical denotes the ID of the physical node and Ileaf denotes the workload of a physical node also depends on the popularity of the sequence number of a virtual leaf node on DART partition tree. keywords on it. In order to cope with such issue, in DART, we also To ensure that the entire search space, namely M physical com- perform index replication to create a number of replicas for each pute nodes, can be fully covered by all the leaf nodes of the partition indexed keyword. tree, the number of virtual leaf nodes Nleaf should be larger than Note that we perform index creation on each keyword as well the number of physical nodes M. To achieve this, we use the rela- as its inverse. tionship between the height of the tree d, the number of all leaf 4.4.1 Base Virtual Node Selection. nodes in the partition tree N , and the radix of the tree k = |A| leaf We rst need to select a base virtual node. For a given term T = to calculate a minimum value of d such that N > M, namely leaf (t t ...t ) of A, let i denote the index of character t in character d 1 2 l tn n (k) > M. Thus, we have: set A. For example, in term “CAB” on character set A = {A, B,C},

t1=“C”, and hence it1 = iC = 2. Likewise, we have it2 = iA = 0 and d e d = logk M + 1 (1) it3 = iB = 1. We use the DART partition tree of height d to select the base By initializing the height of DART partition tree using the above virtual node. When the length of term T is greater than or equals equation, we can guarantee that the number of leaf nodes is larger to d, namely l ≥ d, we calculate the base virtual node location I than the number of all physical nodes. And such computation is v by the following equation: lightweight with a computational complexity of O(1). In addition, as the DART partition tree only virtually exists, this computation d Õ d−n does neither require much memory at the client side nor re- Iv = itn × k (2) quire any synchronization among all clients, which is ideal n=1 for a fast initialization in a distributed system. When l < d, we rst pad the term with its ending character until its length reaches to d, and then perform the above calculation to 4.3.1 Root Regions and Subregions. determine the virtual node to start from. Having DART partition tree initialized, we now dene some impor- Note that the base virtual node of termT = (t1t2...tl ) must reside tant terminologies about DART. Recall what we have introduced in in the root region labeled with character t1. For example, given the 4.2 that all leaf nodes of the DART partition tree forms the DART partition tree in Figure 3, the set of keywords {B, BB, BBB*} goes to hash ring. It is evenly divided by the radix of DART into k = |A| virtual node 13, which resides in root region B. Similarly, the set of root regions. Because the total number of leaf nodes in the DART keywords {AB, ABB*} goes to virtual node 4 in root region A, and partition tree is a multiple of the radix k, the number of virtual leaf the set of keywords {CBC*} goes to virtual node 23 in root region C. nodes in each root region must be equal to Droot = Nleaf /k, and Droot is the number of virtual nodes in each root region, which 4.4.2 Alternative Virtual Node Selection. we call root region distance. For example, in Figure 3, the DART Now we introduce how we select alternative virtual node. First, hash ring virtually consists of 27 leaf nodes in the sample DART according to the rst character of a keyword, we select an alter- partition tree, and these virtual leaf nodes are evenly divided into 3 native root region for the keyword by performing the following root regions, and each root region contains 9 virtual nodes, namely, calculation: the root region distance is 9. Ialter _region_start = [(it1 + dk/2e)%Nleaf ] × Droot (3) Also, in DART partition tree, each root region is further divided 2 This calculation returns the oset of the starting virtual node in into k subregions, and each subregion contains Dsub = Nleaf /k virtual nodes, which we call subregion distance. For example, the alternative root region. This is to make sure the alternative root root region A in Figure 3 is further divided into 3 subregions, and region of term T is dierent than its base root region. For example, the distance of each subregion is 3. as shown in Table 1, the alternative root region starts from virtual node 18 in region C for the set of keywords {AB, ABB*}. Similarly, 4.4 Index Creation 0 in region A for {B, BB, BBB*} and 9 in region B for {CBC*}. Afterwards, we further calculate the oset of alternative subre- In this section, we introduce the index creation process. In order gion, which we dene as the major oset, denoted as w1. 
For a to know where to create the index for a given keyword, we must DART partition tree of height d, given a keyword T = (t1t2...tl ), select a virtual node. In DART, load balance issue is one of our key we perform such calculation following the equation below: concerns when performing virtual node selection. w = (i + i + i )%k (4) As discussed in Section 3.3, the workload of a physical node 1 td−1 td td+1 depends on two major factors. On the one hand, the uneven key- where td is the character on the leaf level of the partition tree, word distribution over dierent physical nodes may lead to uneven which we call on-leaf character. For td−1, we call it the pre-leaf workload. Thus, in DART, one way of achieving load balance is character and td+1 the post-leaf character. However, when d = to make the number of indexed terms on each leaf node roughly 1, there is no pre-leaf character, and we let itd−1 = 0 in this case. equal. For such purpose, we rst select both base virtual node and When l = d, there is no post-leaf character, so we just let itd+1 = itl alternative virtual node. Afterwards, we take one of them which and let itd = itl in this case, where tl is the very last character 5 PACT ’18, November 1–4, 2018, Limassol, Cyprus Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen in term T = (t1...tl ). When l < d, we pad term T with its ending 4.4.4 Index Replication. As discussed in Section 3, another character until its length reaches to d, and we follow the same cause for an imbalanced workload is the skewness in the number procedure as when l = d. of requests on dierent terms. Since the popularity of a term is uncertain when the index is created, we can only alleviate the excessive requests by having more replicas of the index. In DART, Base Base Alternative Root Alternative Keyword Root Region Virtual Node Region Virtual Node we dene a replication factor r, by which the index of any term will AB A (begins at 0) 4 C (begins at 18) 21 be replicated for r times, and these r replicas are created by selecting ABB A (begins at 0) 4 C (begins at 18) 21 r virtual nodes along the DART hash ring using the following ABBC A (begins at 0) 4 C (begins at 18) 25 B B (begins at 9) 13 A (begins at 0) 3 calculation: BB B (begins at 9) 13 A (begins at 0) 3 BBBA B (begins at 9) 13 A (begins at 0) 3 Nleaf R = E + × i (7) CBCBA C (begins at 18) 23 B (begins at 9) 10 i v k Table 1: Virtual node selection on concrete examples of set where i denotes the ID of replica and i ∈ [1,r] and Ev is the eventual of keywords {AB, ABB*}, {B, BB, BBB*} and {CBC*} on char- virtual node selected from the based virtual node and the alternative A acter set = {A,B,C} virtual node. By doing so, each query request will be sent to one replica at a time in a round-robin fashion, thus the r replicas of each indexed keyword will share the workload caused by the query After getting the oset for selecting an alternative subregion, requests. Therefore, the excessive requests against some popular we nally calculate the oset for selecting a virtual node in the keywords can be alleviated. Since such replication is done in an subregion, which we call minor oset, denoted as w . We perform 2 asynchronous way, the index creation latency is not bound to the the following calculation: creation of replicas.

w2 = |(itd+1 − itd − itd−1 )|%k (5) 4.5 Query Response Now, with the aforementioned base virtual node ID I , the start- v As discussed in Section 3.1, DART supports four types of ax-based ing virtual node ID of the alternative root region I , alter _region_start keyword search, including exact keyword search, prex search, the root region distance D and subregion distance D , we root sub sux search and inx search, and both exact keyword search and can nally calculate alternative virtual node ID I : v 0 sux search can be transformed into the problem of solving prex search. Thus, in this section we only discuss how DART fullls Iv 0 = Ialter _region_start + (Iv + w1 × Dsub + w2)%Droot (6) prex search and inx search. In Table 1, we also show the alternative root region and alter- native virtual node of each concrete keyword example. It can be 4.5.1 Prefix Search using DART. seen from the table that, for any keyword, the alternative root re- For a given prex p of length n = |p|, there can be multiple terms gion is dierent from the base root region. And inside either of T that match with p. We rst discuss the case when the length of the root region, the oset of the virtual node is also dierent. This prex p is greater than the height of the DART partition tree d, actually ensures sucient randomness when it comes to eventual i.e., n > d. In this case, the client can follow the same procedure node selection. as described in Sections 4.4.1 and 4.4.2 to select the base virtual node and the alternative virtual node. Subsequently, the query will 4.4.3 Eventual Virtual Node Selection. After getting the be sent to the corresponding physical nodes. The client takes the 0 base virtual node Iv and the alternative virtual node Iv for term result from the node whichever returns a non-empty result. If both T , the client will retrieve the number of indexed keywords on physical nodes return empty result, we consider the query does not corresponding physical nodes of both virtual nodes. Since such op- hit any index record, meaning, there is no relevant data matching eration only take few microseconds to nish and can be performed the given query. The communication cost of such operation remains in parallel, it does not introduce much communication overhead. as constant of O(1) since only two physical nodes are contacted. Afterwards, the client will pick the virtual node with less indexed When the length of given prex is less than or equal to the height keywords as the eventual virtual node Ev . Then, following the of DART partition tree, i.e., when n ≤ d, we perform prex search eventual virtual node, the index for term T will be created on the only according to the leading character of the search term. For corresponding physical node. By following such node selection example, if the search term is “AB*” or “ABB*”, we actually perform procedure, DART is expected to maintain a balanced keyword dis- prex search “A*” instead. And since all the keywords sharing the tribution among all physical nodes. For a randomized load balancing same leading character must reside on the same base root region or scheme[29], the choice of two nodes are randomly made. But in the same alternative root region, we send this query to all virtual DART, the node selection is performed by mathematical operations nodes in both root regions to collect the result. For character set A over known parameters, so the ID of the physical node can be easily of size k, 2/k of the virtual nodes will be queried. 
In real practice, calculated, which is helpful for achieving ecient deterministic the size of the character set can be much larger. In the case of ASCII, query routing during the query responding process. But at the same there are 128 characters in the character set. Thus, only 2/128 of time, DART makes an eort in maintaining sucient randomness the virtual nodes will be selected for responding the prex query. in terms of the distance between base virtual node and alternative This is a huge deduction in the communication cost as compared node. to query broadcasting on all physical nodes. 6 DART: Distributed Adaptive Radix Tree for Eicient Aix-based Keyword Search on HPC Systems PACT ’18, November 1–4, 2018, Limassol, Cyprus

4.5.2 Infix Search with DART. Since the position of a given perform any synchronization between clients, which saves huge inx is uncertain in a keyword, to guarantee that no relevant key- amount communications in the network as well. word will be omitted in the query responding process, we still have to perform query broadcasting to all physical nodes. To the best of Computation Complexity of Communication our knowledge, there is no indexing technique that can avoid full Operations Locating Procedure Complexity scan on the indexed keywords when it comes to inx query. Also, it (Worst Case) (Worst Case) O( ) O( ) is dicult to predict where a given inx will appear in a keyword. Insertion 1 1 Deletion O(1) O(1) Thus, in DART, when an inx query is requested, the client sends Exact Search O(1) O(1) request to all physical nodes, and on each physical node, a full scan Prex Search O(1) O( M ) on the local index is performed to collect all matching results. k O( ) O( M ) Sux Search 1 k 4.5.3 Alleviating Excessive Requests. In DART, we main- Inx Search O(1) O(M) tain a request counter at each client, and the request counter will Table 2: Complexity of Dierent DART Operations be increased by one each time when a request is sent. When there are r replicas, each time the client will do the following calculation to determine the ID of the replica where the request should be sent to: We provide worst case complexity of dierent DART operations in Table 2. Rv = (Iv + Crequest ) % r (8) where Rv represents the ID of the replica, Iv represents the ID of 5 EVALUATION the initial virtual node, and Crequest is the value of request counter. Such mechanism enables the round-robin access pattern among all 5.1 Experimental Setup replicas, and helps to alleviate the excessive requests against some We have evaluated DART on Cori, a Cray XC40 supercomputing extremely popular keywords. system, located at the National Energy Research Scientic Com- puting Center (NERSC). It has 2,388 Haswell compute nodes, with 4.6 Index Update and Index Deletion each node featuring two 16-core Intel® Xeon™ processors E5-2698 In DART, index update is done by deleting the original index record v3 (“Haswell”) at 2.3 GHz and 128GB memory. The interconnect of and rebuild a new index record. To delete an index, we still perform Cori is Cray Aries with Dragony topology and 5.625 TB/s global the same calculation described in Sections 4.4.1 and 4.4.2 to select bandwidth. the base virtual node and alternative virtual node. And then we We implemented DART on the same software platform that send the index deletion request to both virtual nodes and hence has been used in SoMeta [43] and PDC [44], which is built upon their corresponding physical nodes. Whichever the physical node MPI and Mercury [41] RPC framework. The software platform with the specied index record will actually perform the index already contains DHT implementation which already facilitated deletion operation while the other ignores the operation once it the experiments in both aforementioned research works. To be conrms there is no specied index record. The communication coherent with the trie-based design that DART is following, we use cost of this operation is still O(1), which is constant. ART [21] as the server-side indexing data structure for DART. 
We use hash table as the server-side local data structure for two DHT- 4.7 Complexity Analysis on DART based approaches, “Full String Hashing” and “Initial Hashing”, since All operations in DART, including DART initialization, index cre- such combination is prevalently used in many existing studies, such ation, query response, index update and index deletion, all involve as FusionFS [49], DeltaFS [50], SoMeta [43], etc. For all experiments, one or more of the 8 equations from (1) to (8). It is noteworthy that we set up an equal number of client instances and server instances. all calculations of these equations are arithmetical calculations and Each client instance can access all server instances. can be performed very quickly at the client side with O(1) time. We choose three datasets, UUID, DICT [28], and WIKI [45] (dis- Also, for conducting these calculations, only a few of scalar cussed in Section 3.3), for our evaluation. We use dataset UUID for values are required to be maintained in the memory, including the our performance test since it can be generated by Linux libuuid [5] number of physical nodes, the size of the character set, the index of in memory, which avoids the overhead of disk read operations. For each character in the character set, and some intermediate results load balance test, we use all three datasets since we would like like Nleaf , Ialter _region_start , Iv , w1, w2, Droot and Dsub . And to observe whether DART is capable of balancing the keyword since DART partition tree only virtually exists, there is no need to distribution among various sets of keywords. Since every keyword maintain DART partition tree in memory. So the memory footprint in all three datasets consists of only basic ASCII characters, so we of DART is also very low. set up the radix of DART as 128. As for communication cost, the selection of eventual virtual node only involves access of two physical nodes, in addition to the r 5.2 Eciency requests for actually creating the index replicas, the communication We compare the eciency of DART against full string hashing and cost is O(r), which is constant. When multiple replicas are created initial hashing in terms of the throughput of various operations at in parallel, the index creation overhead can be considered to be even dierent cluster scales, ranging from 4 nodes to 256 nodes. Two lower. Also, since each client actually follow the same procedures million UUIDs were inserted each time to all servers to perform to conduct DART operations independently, there is no need to index creation. Afterwards, we took the 4-letter prex, 4-letter 7 PACT ’18, November 1–4, 2018, Limassol, Cyprus Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen sux and 4-letter inx of each UUID to conduct prex search, cases. Particularly, as compared to full string hashing, at dierent sux search and inx search respectively. Finally, we performed scales the throughput of prex search and sux search on DART deletion operation for each UUID. We measure the throughput in is 4×∼55× higher. thousands of transactions per second (TPS). 700k 700k Please note that DART works for any prex/sux/inx length Full String Hashing Full String Hashing 600k Initial Hashing 600k Initial Hashing (denoted as n). The reason we choose n = 4 is that the DART DART DART partition tree height ranges from 2 to 3 when the number of servers 500k 500k 400k 400k varies from 4 to 256. 
In our evaluation, we emulated a predictably 300k 300k more common use case where some of the characters in the query 200k 200k Throughput(TPS) Throughput(TPS) keyword will be addressed by the DART partition tree and the 100k 100k remaining characters would be addressed by a local index on a 0k 0k 4 8 16 32 64 128 256 4 8 16 32 64 128 256 given server. Number of Servers Number of Servers 5.2.1 Index Creation and Deletion. (a) Prex Search Throughput (b) Sux Search Throughput In the index creation process, DART needs to perform load de- tection operations for determining the workload between base Figure 5: Throughput of Prex Search and Sux Search virtual node and alternative virtual node. Then, DART needs to send index creation request to the selected server, which introduces It is noteworthy that DART even performs better than initial communication overhead. In contrast, both full string hashing and hashing. This is because DART achieves a more balanced keyword initial hashing are conventional hashing algorithms and do not distribution which help with avoiding overloaded servers. For ex- include any sophisticated operations. For delete operation, DART ample, in the case of 256 servers, the standard deviation of the only needs to send delete requests to both base virtual node and DART keyword distribution is 2877.208 while the same index in ini- alternative virtual node simultaneously, which does not introduce tial hashing is 15128.953, which means, with initial hashing, there much overhead. can be more overloaded servers and each of them will damage the overall throughput.

700k Full String Hashing 600k Full String Hashing 5.2.3 Exact Search and Infix Search. 600k Initial Hashing Initial Hashing DART 500k DART 500k

400k 400k 700k 30k Full String Hashing Full String Hashing 300k 300k 600k Initial Hashing Initial Hashing 25k 200k 200k DART DART 500k Throughput(TPS) Throughput(TPS) 100k 100k 400k 20k

0k 0k 300k 4 8 16 32 64 128 256 4 8 16 32 64 128 256 15k Number of Servers Number of Servers 200k Throughput(TPS) Throughput(TPS)

(a) Index Creation Throughput (b) Index Deletion Throughput 100k 10k

0k 4 8 16 32 64 128 256 4 8 16 32 64 128 256 Figure 4: Throughput of Index Creation and Index Deletion Number of Servers Number of Servers (a) Exact Search Throughput (b) Inx Search Throughput

As shown in Figure 4(a), for dierent cluster scales, the insert Figure 6: Throughput of Exact Search and Inx Search throughput of full string hashing and initial hashing are slightly higher than that of DART. Such a trend can be observed at dierent For exact keyword search, full string hashing is expected to de- cluster scales. Considering that DART accomplishes much more in liver the best performance, since each keyword can be eciently load balance, such performance gap between DART and the other found via the hashing function. The actual results, plotted in Fig- two DHT cases is reasonable and acceptable. As for the throughput ure 6(a), show that the exact search performance of DART and of index deletion, as it can be seen from Figure 4(b), DART maintains full string hashing are almost identical, while the throughput of comparable performance with these two DHT-base approaches. initial hashing slightly falls behind the others, when the system 5.2.2 Prefix Search and Suix Search. size scales to 64 servers and beyond. This is because the keyword distribution of initial hashing is imbalanced, and the server hosting DART is designed to support ecient ax-based search. In terms the most frequent keywords becomes the bottleneck. We show that of prex search and sux search, the advantage of DART is more the keyword distribution of initial hashing later in Section 5.4 and obvious. further justify our explanation on this. As shown in Figure 5(a) and Figure 5(b), both initial hashing In the case of inx search, each query has to go through all and DART achieve better throughput than full string hashing. This indexed keywords since the position of the inx in a keyword is is because, in full string hashing, the query broadcasting is re- uncertain. As shown in Figure 6(b), the throughput of inx search quired when performing prex search and sux search, while in with three dierent indexing methods are very close. These results initial hashing there is no need to perform query broadcasting show that DART achieves comparable eciency as DHT for inx and in DART the query broadcasting is eliminated as well in most searches. 8 DART: Distributed Adaptive Radix Tree for Eicient Aix-based Keyword Search on HPC Systems PACT ’18, November 1–4, 2018, Limassol, Cyprus

5.2.4 Latency of DART Operations. hashing, and DART. The hashing algorithm we used for both DHT We show the latency of dierent DART operations on dierent cases are djb2 hash, which is known to achieve load balance well. number of physical nodes in Figure 7 with box plot. As shown For each keyword, the full string hashing takes all the characters in the gure, most of the insertion and deletion operations can be in a keyword as a whole, computes the hash value accordingly and nished within 400 microseconds (0.4 milliseconds), and on average, use the modulus hashing function to uniformly select the server the insertion operation latency and the deletion operation latency where the keyword should be indexed. Thus, the full string hashing are around 300 microseconds and 200 microseconds respectively. is able to generate balanced keyword distribution. However, for each keyword, initial hashing will only take the rst character into Insert Delete account, which leads to aggregation of keywords sharing common leading character. As we analyzed in Section 2, the distribution of 400 300 s) s) keywords led by dierent characters is imbalanced, thus, the initial 300 200 hashing is expected to generate imbalanced keyword distribution. Time ( 200 Time ( DART is designed to alleviate imbalanced keyword distribution, and we should see such eect in the result of DART. 4 8 16 32 64 128 256 4 8 16 32 64 128 256 Number of Servers Number of Servers In order to compare the dispersion of keyword distribution on

Prefix Suffix dierent dataset in an intuitive way, we introduce coecient of 200 variation (abbv. CV) as a metric. The coecient of variation Cv 150 s) s) 150 is dened as the ratio of the standard deviation σ to the mean σ 100 µ, i.e. Cv = [12]. It is a standardized measure of dispersion of 100 µ Time ( Time ( probability distribution. A smaller CV means smaller deviation to 50 50 the average value and hence a more balanced distribution. 4 8 16 32 64 128 256 4 8 16 32 64 128 256 Number of Servers Number of Servers

Exact Infix Full String Hashing Initial Hashing DART 200 15000 0

s) s) 10 150 10000

100 5000 Time ( Time ( 10 1 50 0 4 8 16 32 64 128 256 4 8 16 32 64 128 256 Number of Servers Number of Servers CV of 10 2 Figure 7: Latency of DART operations Keyword Distribution 10 3 The average latency of prex search, sux search and exact UUID DICT WIKI Dataset search is around 100 microseconds, with no search latency exceed- ing 250 microseconds. As for inx search, as the number of server grows exponentially, the query latency increases in the same pace Figure 8: CV of keyword distribution by 3 dierent hashing since an increasing amount of communications was performed due policies on 3 dierent datasets over 16 servers. to query broadcasting. At maximum, the inx query latency on 256 physical nodes is around 17000 microseconds, namely, 17 millisec- As shown in Figure 8, on 16 servers, the CV value by full string onds. When consider the natural complexity of inx search, we hashing algorithm is always the smallest on our three dierent consider such latency to be acceptable. datasets - UUID, DICT and WIKI. Such result shows that the full Note that there is some uctuation in terms of latency over dif- string hashing algorithm always leads to a perfectly balanced key- ferent number of servers. We took multiple times of our evaluation, word distribution regardless the variance of datasets. Similarly, the and all of them show dierent uctuating pattern. We believe this initial hashing policy always leads to the highest CV value over 3 may relate to the uctuating network condition at the time when dierent datasets, which means the keyword distribution rooted we conducted our evaluation. We did not try to even out such dy- from this algorithm is the most imbalanced. However, our DART al- namics by taking results from multiple runs, since we believe such gorithm always leads to a smaller CV value than the initial hashing dynamics actually makes our result valid in reecting the actual algorithm on dierent datasets, which means it is able to gener- variations in query latency. So we only demonstrate the latency ate a more balance keyword distribution than what initial hashing result from one single run, and this shows the actual performance algorithm can do, even though it does not achieve the balanced limit that our DART can reach to in the real case, which turns out keyword distribution as perfectly as the full string hashing does. to be decent. Overall, such result shows the universality of DART in maintaining balanced keyword distribution over dierent datasets, which meets 5.3 Load Balance our expectation. In this series of tests, we inserted three dierent datasets - UUID, In DART, we also design the replication strategy to replicate the DICT, and WIKI - onto 16 servers via full string hashing, initial index record of a certain keyword and hence reduce the excessive 9 PACT ’18, November 1–4, 2018, Limassol, Cyprus Wei Zhang, Houjun Tang, Suren Byna, and Yong Chen

1.40 on dierent number of servers fairly, we still use coecient of vari- ation as our metric (as introduced in Section 5.3), since it removes 1.35 the impact of the changes in the number of servers and the size of the dataset. 1.30

4.0 1.25 Full String Hashing Full String Hashing 3.5 3.5 Initial Hashing Initial Hashing

CV of WIKI DART 3.0 DART 1.20 3.0 2.5 2.5

Request Distribution 2.0 1.15 2.0 1.5 r=1 r=3 r=4 r=5 1.5 1.0 Replication Factor 1.0 0.5 0.5

0.0 0.0 CV (DICT Keyword Distribution) CV (UUID Keyword Distribution) 4 8 16 32 64 128 256 4 8 16 32 64 128 256 Figure 9: Coecient of variation on the request distribution Number of Servers Number of Servers while replaying WIKI requests with DART under dierent (a) UUID Keyword Distribution (b) DICT Keyword Distribution replication factors. 6.0 7.0 Full String Hashing Full String Hashing Initial Hashing Initial Hashing 5.0 6.0 DART DART 5.0 workload of some extremely popular keywords. We used the WIKI 4.0 4.0 dataset to test the eectiveness. Recall that in the WIKI dataset, each 3.0 3.0 keyword is recorded along with the number of requests that has 2.0 2.0 been queried during September, 2007. We have shown the request 1.0 1.0 0.0 CV (WIKI Request Distribution) distribution on dierent WIKI keywords in Section 2, Figure 2. It CV (WIKI Keyword Distribution) 4 8 16 32 64 128 256 4 8 16 32 64 128 256 can be seen that the request distribution is highly skewed. We Number of Servers Number of Servers replayed these 48M requests recorded in the WIKI dataset on 16 (c) WIKI Keyword Distribution (d) WIKI Request Distribution servers, after the index was created for all keywords. As reported in Figure 9, as the replication factor increases, the CV Figure 10: Comparison of load balance at scale value of the request distribution decreases, which means the index replication mechanism in DART is able to mitigate skewness of the request distribution as well, which can be another performance As shown in Figure 10 (a), (b), and (c), full string hashing has perk when it comes to real use cases. smallest CV at dierent scales. However, the CV value of initial As discussed in Section 2, the UUID dataset can be considered as a hashing remains the largest throughout all dierent scales with representative of digital identiers of scientic dataset, while DICT an increasing trend. This behavior indicates that, as the number can be viewed as a representative of a more general keyword set in of server grows, the imbalance of the keyword distribution under natural languages. We also consider the WIKI dataset represents a initial hashing becomes more signicant. realistic workload. These load balance tests and results demonstrate On all three datasets, the CV value of DART always remains that the DART algorithm is not sensitive to the type of keyword set, the second when the number of server increases. As it can be seen which makes it a suitable choice for various scenarios and capable in the gure, in both UUID and DICT datasets, DART is able to of mitigating imbalanced keyword distribution on servers, not to maintain a small value of CV (under 0.6 at all scales). For WIKI mention HPC applications. We further report the load balancing dataset, although the CV value of DART grows as the number of capability at dierent scales in the next section. server grows, it is still smaller than that of initial hashing. We consider that DART is able to maintain better balanced keyword 5.4 Scalability distribution than initial hashing. Figure 10 (d) demonstrates the coecient of variation for the 5.4.1 Performance at scale. request of WIKI at dierent scales. As it can be seen from the gure, As discussed in Section 5.2, DART is able to maintain its eciency DART is able to maintain a more balanced request distribution at scale for dierent operations, which already demonstrates the even though the actual requests on dierent keywords is highly scalability of DART in terms of eciency. 
The achieved scalable skewed as we have shown in Figure 2 at Section 2, while full string performance conrms DART as an ecient indexing method for hashing and initial hashing perform poorly for such highly uneven ax-based searches on HPC systems. workload. It is noteworthy that, for achieving this, we only set the 5.4.2 Load balance at scale. replication factor of DART to be 3. If the replication factor increases, As a distributed inverted index, DART is designed to mitigate im- DART is expected to achieve even better load balance at scale. balanced keyword distribution on dierent nodes. We repeated the series of tests discussed in Section 5.3 at dierent cluster scales. 6 RELATED WORK When testing the key distribution, we set the replication factor Many solutions exist for ax-based keyword search. A straight- of DART to be 1. For testing the request distribution of the WIKI forward approach is to perform string comparison of each data dataset, we set the replication factor of DART to be 3. In order to record. Methods including Brute-Force string comparison, Rabin- compare the dispersion of keyword distribution of dierent datasets Karp[10], Knuth-Morris-Pratt[20], nite state automaton for regular 10 DART: Distributed Adaptive Radix Tree for Eicient Aix-based Keyword Search on HPC Systems PACT ’18, November 1–4, 2018, Limassol, Cyprus expressions[16] are well studied. However, such approach cannot ef- search on metadata. However, it requires specic runtime support ciently handle ax-based keyword search, as they do not support and also introduces extra cost and overhead to maintain. In addi- pattern matching directly. tion, it follows the document-partitioned approach which leaves To address ecient ax-based keyword search, various index- many challenges of ax-based keyword search in a distributed ing techniques are proposed and shown to be eective. Tree-based environment unsolved, such as high network trac due to query indexing techniques are integrated into RDBMS for keyword search. broadcasting and load imbalance among indexing nodes. In com- B-Trees, including the original B-Tree and its variants like B+ Tree, parison, DART is designed as a light-weight distributed indexing B* Tree, etc. [8] are widely used. They are ecient to search through method which can be easily incorporated into any platform that a large number of records on disk and are ideal solutions for many re- follows the client-server model, and does not require any other lational databases where disk seeking is frequent and time-consuming. runtime support. Additionally, DART keeps low network trac by For keyword search, a prex B+ tree stores the indexed keywords avoid query broadcasting, and the load balancing feature meets the on its leaf nodes and uses the prexes as the separator keys on the requirement of distributed ax-based keyword search. non-leaf nodes. However, to keep all branches balanced, compli- cated operations are required to adjust the tree structure with each 7 CONCLUSION tree modication and has a signicant overhead. With the goal of developing a distributed ax-based keyword Another approach to create index for keywords is the trie data search on HPC systems, we have developed a trie-based inverted structure, or prex tree, such as Patricia Tree [19], radix tree [9], indexing technique, called DART (Distributed Adaptive Radix Tree). ART (Adaptive Radix Tree) [21], etc. 
Another approach to indexing keywords is the trie data structure, or prefix tree, such as the Patricia tree [19], radix tree [9], and ART (Adaptive Radix Tree) [21]. These approaches do not need to balance the branches and hence do not require significant structural changes. An intuitive model of the prefix tree uses each node to represent a character, such that a query on a string or its prefix can be performed by traversing the path that starts at the leading character and ends at the last character of the string or prefix. The Patricia tree [19], a variant of the trie, compresses the paths by collapsing all interior vertices that have only one child into a single path, and thus reduces the tree height. A radix tree [9] makes branch comparisons based on its radix, which is 2^s (where the span s ≥ 1). Increasing the span s decreases the tree height, which helps reduce the number of node comparisons and the total search time. While a large span of the radix tree may cause degradation in space utilization, the Adaptive Radix Tree [21] provides better space utilization and an optimized tree height for search efficiency by using tree nodes of different sizes; the radix of different tree nodes may vary according to the content the node stores. However, these data structures are all designed to address the keyword search problem on a single machine. They cannot be directly applied to distributed systems, as node selection and load balance must be taken into consideration for optimized performance.
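As a concrete picture of the character-per-node model described above, the sketch below builds a 256-way trie and answers a prefix query by walking one node per character; a radix tree with span s would generalize this by giving each node 2^s children so that several bits are consumed per step. This is a minimal single-machine illustration written for this paragraph, not DART's partition tree or node-selection code.

```c
#include <stdlib.h>

/* One node per character: 256-way fanout, one child per possible byte. */
typedef struct trie_node {
    struct trie_node *child[256];
    int is_terminal;               /* a full keyword ends here */
} trie_node;

static trie_node *node_new(void)
{
    return calloc(1, sizeof(trie_node));
}

static void trie_insert(trie_node *root, const char *key)
{
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        if (!root->child[*p])
            root->child[*p] = node_new();
        root = root->child[*p];
    }
    root->is_terminal = 1;
}

/* Prefix query: follow the path spelled by the prefix, one node per
 * character. Returns the subtrie whose keys all start with the prefix,
 * or NULL if no indexed keyword has this prefix. */
static trie_node *trie_prefix(trie_node *root, const char *prefix)
{
    for (const unsigned char *p = (const unsigned char *)prefix; *p; p++) {
        root = root->child[*p];
        if (!root)
            return NULL;
    }
    return root;
}

int main(void)
{
    trie_node *root = node_new();
    trie_insert(root, "chemistry");
    trie_insert(root, "chemical");
    /* Both keywords live under the subtrie reached by "chem". */
    return trie_prefix(root, "chem") != NULL ? 0 : 1;
}
```

Collapsing single-child chains in this structure yields the Patricia tree, while widening the per-step fan-out yields the radix-tree variants discussed above.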
In cloud computing environments, popular services such as ElasticSearch [11] and SolrCloud [2] can address various text-related queries efficiently. However, these approaches require a number of dedicated servers running services constantly, making them undesirable for HPC applications, as maintaining such an indexing cluster can be very expensive. Moreover, such solutions follow the document-partitioned approach, in which query broadcasting is necessary and limits their performance.

A recent HPC-oriented work, TagIt [40], attempted to integrate an embedded database such as SQLite [42] to achieve efficient keyword search on metadata. However, it requires specific runtime support and also introduces extra cost and overhead to maintain. In addition, it follows the document-partitioned approach, which leaves many challenges of affix-based keyword search in a distributed environment unsolved, such as high network traffic due to query broadcasting and load imbalance among indexing nodes. In comparison, DART is designed as a lightweight distributed indexing method which can be easily incorporated into any platform that follows the client-server model, and does not require any other runtime support. Additionally, DART keeps network traffic low by avoiding query broadcasting, and its load balancing feature meets the requirements of distributed affix-based keyword search.

7 CONCLUSION

With the goal of enabling distributed affix-based keyword search on HPC systems, we have developed a trie-based inverted indexing technique called DART (Distributed Adaptive Radix Tree). DART can efficiently locate a keyword based on its partition tree and node selection procedures. Our evaluation results show that DART can achieve efficient affix-based keyword search while maintaining load balance at large scale. Particularly, when compared with full string hashing DHT, the throughput of prefix search and suffix search using DART was up to 55× higher than that of full string hashing across different numbers of servers. The throughput of exact search and infix search on DART remains comparable to that of full string hashing. Compared with initial hashing DHT, DART is able to maintain a balanced keyword distribution and request distribution, which initial hashing fails to achieve. These two advantages of DART are also independent of the dataset, making DART an ideal inverted indexing technique for distributed affix-based keyword search. While DHT is prevalently used for data placement in many applications, in scenarios where affix-based search is required or frequent on HPC systems, DART is a competitive replacement for DHT since it provides scalable performance and achieves load balance at scale. In our future work, we plan to apply DART to in-memory metadata object management systems for searching metadata objects using affixes of user-defined tags or other object attributes.

ACKNOWLEDGMENT

This research is supported in part by the National Science Foundation under grants CNS-1338078, IIP-1362134, CCF-1409946, and CCF-1718336. This work is supported in part by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (Project: Proactive Data Containers, Program manager: Dr. Lucy Nowell). This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility.


REFERENCES
[1] MG Aartsen, K Abraham, M Ackermann, J Adams, JA Aguilar, M Ahlers, M Ahrens, D Altmann, K Andeen, T Anderson, et al. 2016. Search for sources of High-Energy neutrons with four years of data from the Icetop Detector. The Astrophysical Journal 830, 2 (2016), 129.
[2] apache.org. 2014. SolrCloud. https://wiki.apache.org/solr/SolrCloud.
[3] Baruch Awerbuch and Christian Scheideler. 2003. Peer-to-peer systems for prefix search. In Proceedings of the twenty-second annual symposium on Principles of distributed computing. ACM, 123–132.
[4] Dirk Bradler, Jussi Kangasharju, and Max Mühlhäuser. 2008. Optimally Efficient Prefix Search and Multicast in Structured P2P Networks. CoRR abs/0808.1207 (2008). http://arxiv.org/abs/0808.1207
[5] Ralph Böhme. 2013. libuuid. https://sourceforge.net/projects/libuuid/
[6] Hailong Cai and Jun Wang. 2004. Foreseer: a novel, locality-aware peer-to-peer system architecture for keyword searches. In Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware. Springer-Verlag New York, Inc., 38–58.
[7] Chi Chen, Zhi Deng, Richard Tran, Hanmei Tang, Iek-Heng Chu, and Shyue Ping Ong. 2017. Accurate force field for molybdenum by machine learning large materials data. Physical Review Materials 1, 4 (2017), 043603.
[8] Douglas Comer. 1979. Ubiquitous B-tree. ACM Computing Surveys (CSUR) 11, 2 (1979), 121–137.
[9] J. Corbet. 2006. Trees I: Radix trees. http://lwn.net/Articles/175432/
[10] T Cormen, C Leiserson, R Rivest, and Clifford Stein. 2001. The Rabin–Karp algorithm. Introduction to Algorithms (2001), 911–916.
[11] elastic.co. 2017. Distributed Search Execution. https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-search.html.
[12] Brian Everitt and Anders Skrondal. 2002. The Cambridge dictionary of statistics. Vol. 106. Cambridge University Press Cambridge.
[13] Apache Software Foundation. 2017. Apache Lucene. https://lucene.apache.org.
[14] Gaston H Gonnet, Ricardo A Baeza-Yates, and Tim Snider. 1992. New Indices for Text: Pat Trees and Pat Arrays. Information Retrieval: Data Structures & Algorithms 66 (1992), 82.
[15] Matthew Harren, Joseph Hellerstein, Ryan Huebsch, Boon Loo, Scott Shenker, and Ion Stoica. 2002. Complex queries in DHT-based peer-to-peer networks. Peer-to-peer systems (2002), 242–250.
[16] John E Hopcroft, Rajeev Motwani, and Jeffrey D Ullman. 2006. Automata theory, languages, and computation. International Edition 24 (2006).
[17] Yuh-Jzer Joung and Li-Wei Yang. 2006. KISS: A simple prefix search scheme in P2P networks. In Proc. of the WebDB Workshop. 56–61.
[18] Yuh-Jzer Joung, Li-Wei Yang, and Chien-Tse Fang. 2007. Keyword search in DHT-based peer-to-peer networks. IEEE Journal on Selected Areas in Communications 25, 1 (2007).
[19] Donald Knuth. 1997. 6.3: Digital Searching. The Art of Computer Programming Volume 3: Sorting and Searching (1997), 492.
[20] Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. 1977. Fast Pattern Matching in Strings. SIAM J. Comput. 6, 2 (1977), 323–350. https://doi.org/10.1137/0206024
[21] Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-memory Databases. In Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013) (ICDE '13). IEEE Computer Society, Washington, DC, USA, 38–49. https://doi.org/10.1109/ICDE.2013.6544812
[22] J. Liu, D. Bard, Q. Koziol, S. Bailey, and Prabhat. 2017. Searching for millions of objects in the BOSS spectroscopic survey data with H5Boss. In 2017 New York Scientific Data Summit (NYSDS). 1–9. https://doi.org/10.1109/NYSDS.2017.8085044
[23] Yaning Liu, George Shu Heng Pau, and Stefan Finsterle. 2017. Implicit sampling combined with reduced order modeling for the inversion of vadose zone hydrological data. Computers & Geosciences (2017).
[24] Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval, Chapter 20.3 Distributing indexes, 415–416. Volume 1 of [26].
[25] Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval, Chapter 4.4 Distributed indexing, 68–71. Volume 1 of [26].
[26] Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval. Vol. 1. Cambridge university press Cambridge.
[27] Arun Mannodi-Kanakkithodi, Tran Doan Huan, and Rampi Ramprasad. 2017. Mining materials design rules from data: The example of polymer dielectrics. Chemistry of Materials 29, 21 (2017), 9001–9010.
[28] Marius. 2017. English Words. https://github.com/dwyl/english-words.
[29] Michael Mitzenmacher. 2001. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems 12, 10 (2001), 1094–1104.
[30] MongoDB, Inc. 2018. MongoDB. https://www.mongodb.com/
[31] Oracle. 2017. MySQL. https://www.mysql.com
[32] David Paez-Espino, I Chen, A Min, Krishna Palaniappan, Anna Ratner, Ken Chu, Ernest Szeto, Manoj Pillay, Jinghua Huang, Victor M Markowitz, et al. 2017. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic acids research 45, D1 (2017), D457–D465.
[33] PostgreSQL. 2018. PostgreSQL. https://www.postgresql.org/
[34] Sriram Ramabhadran, Sylvia Ratnasamy, Joseph M Hellerstein, and Scott Shenker. 2004. Prefix hash tree: An indexing data structure over distributed hash tables. In Proceedings of the 23rd ACM symposium on principles of distributed computing, Vol. 37.
[35] Kai Ren, Qing Zheng, Swapnil Patil, and Garth Gibson. 2014. IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion. In High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for. IEEE, 237–248.
[36] Patrick Reynolds and Amin Vahdat. 2003. Efficient peer-to-peer keyword searching. In Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware. Springer-Verlag New York, Inc., 21–40.
[37] Gerard Salton. 1989. Automatic text processing: The transformation, analysis, and retrieval of. Reading: Addison-Wesley (1989).
[38] David J Schlegel, M Blanton, D Eisenstein, B Gillespie, J Gunn, P Harding, P McDonald, R Nichol, N Padmanabhan, W Percival, et al. 2007. SDSS-III: The Baryon Oscillation Spectroscopic Survey (BOSS). In Bulletin of the American Astronomical Society, Vol. 39. 966.
[39] Shuming Shi, Guangwen Yang, Dingxing Wang, Jin Yu, Shaogang Qu, and Ming Chen. 2004. Making Peer-to-Peer Keyword Searching Feasible Using Multi-level Partitioning. In IPTPS. Springer, 151–161.
[40] Hyogi Sim, Youngjae Kim, Sudharshan S Vazhkudai, Geoffroy R Vallée, Seung-Hwan Lim, and Ali R Butt. 2017. TagIt: An Integrated Indexing and Search Service for File Systems. (2017).
[41] Jerome Soumagne, Dries Kimpe, Judicael Zounmevo, Mohamad Chaarawi, Quincey Koziol, Ahmad Afsahi, and Robert Ross. 2013. Mercury: Enabling remote procedure call for high-performance computing. In Cluster Computing (CLUSTER), 2013 IEEE International Conference on. IEEE, 1–8.
[42] sqlite.org. 2017. SQLite. https://sqlite.org.
[43] Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol. 2017. SoMeta: Scalable Object-Centric Metadata Management for High Performance Computing. In Cluster Computing (CLUSTER), 2017 IEEE International Conference on. IEEE, 359–369.
[44] Houjun Tang, Suren Byna, François Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu, et al. 2018. Toward Scalable and Asynchronous Object-centric Data Management for HPC. In Proceedings of The 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2018. 113–122.
[45] Guido Urdaneta, Guillaume Pierre, and Maarten van Steen. 2009. Wikipedia Workload Analysis for Decentralized Hosting. Elsevier Computer Networks 53, 11 (July 2009), 1830–1845. http://www.globule.org/publi/WWADH_comnet2009.html.
[46] Brent Welch and John Ousterhout. 1985. Prefix tables: A simple mechanism for locating files in a distributed system. Technical Report. California Univ Berkeley Dept of Electrical Engineering and Computer Sciences.
[47] Wikipedia. 2018. Distributed Hash Table. https://en.wikipedia.org/wiki/Distributed_hash_table. Accessed: 2018-04-15.
[48] Dongfang Zhao, Kan Qiao, Zhou Zhou, Tonglin Li, Zhihan Lu, and Xiaohua Xu. 2017. Toward Efficient and Flexible Metadata Indexing of Big Data Systems. IEEE Trans. Big Data 3, 1 (2017), 107–117.
[49] Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, and Ioan Raicu. 2014. Fusionfs: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems. In Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 61–70.
[50] Qing Zheng, Kai Ren, Garth Gibson, Bradley W. Settlemyer, and Gary Grider. 2015. DeltaFS: Exascale File Systems Scale Better Without Dedicated Servers. In Proceedings of the 10th Parallel Data Storage Workshop (PDSW '15). ACM, New York, NY, USA, 1–6. https://doi.org/10.1145/2834976.2834984
