Implementing Parallel and Concurrent Tree Structures

Yihan Sun, Carnegie Mellon University
Guy Blelloch, Carnegie Mellon University

Abstract

As one of the most important data structures used in algorithm design and programming, balanced search trees are widely used in real-world applications for organizing data. To meet the challenges posed by modern large-volume and ever-changing data, it is important to consider parallelism, concurrency, and persistence. This tutorial will introduce techniques for supporting functionalities on trees, including various parallel algorithms, concurrency, multi-versioning, etc. In particular, this tutorial will focus on an algorithmic framework for parallel balanced binary trees, which works for multiple balancing schemes, including AVL trees, red-black trees, weight-balanced trees, and treaps. This framework allows for theoretically-efficient algorithms. The corresponding implementation is available as a library, which demonstrates good performance both sequentially and in parallel in various use scenarios.

This tutorial will focus on the following topics: 1) the algorithms and techniques used in the PAM library; 2) the interface of the library and a hands-on introduction to the download/installation of the library; 3) examples of applying the library to various applications; and 4) an introduction to other useful techniques for parallel tree structures and performance comparisons with PAM.

CCS Concepts • Theory of computation → Sorting and searching; Shared memory algorithms; • Computing methodologies → Shared memory algorithms.

Keywords balanced tree, augmented map, parallel, concurrent, library, PAM, ordered set, ordered map

ACM Reference Format:
Yihan Sun and Guy Blelloch. 2019. Implementing Parallel and Concurrent Tree Structures. In 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’19), February 16–20, 2019, Washington, DC, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3293883.3302576

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
PPoPP ’19, February 16–20, 2019, Washington, DC, USA
© 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6225-2/19/02.

1 Introduction

Recently, the advent and development of shared-memory multi-core machines has improved the ability to process large-scale data in parallel and in memory. As such, it is of great interest to have simple and efficient parallel data structures to easily organize and process data. One of the most important data structures for organizing data is the balanced search tree (BST), which is useful in maintaining abstract data types such as ordered maps and sets. This tutorial will introduce techniques to support parallelism and concurrency in balanced search trees for ordered set/map operations, show examples of applying the tree structures in various applications, and discuss some state-of-the-art tree structures.

We focus on writing simple and efficient parallel algorithms for trees. This tutorial introduces an algorithmic framework for parallel balanced binary trees [6, 20], which bases all tree algorithms on a single primitive, Join. This framework is extendable to at least four balancing schemes: AVL trees, red-black trees, weight-balanced trees, and treaps, and all algorithms except Join are generic across balancing schemes. Based on this Join-based framework, this tutorial will address techniques including many algorithms, concurrency, augmentation, persistence (meaning that an update yields a new version) and multi-versioning. We show efficient parallel solutions to bulk operations on trees, such as Union, Filter, MapReduce, etc. From a theoretical standpoint, all algorithms on trees are work-efficient with poly-logarithmic parallel depth.

This parallel tree framework is integrated into a C++ library called PAM (Parallel Augmented Maps) [18]. This tutorial will also discuss how to use the framework and the PAM library to solve real-world problems. The tree structure is extendable to a variety of applications in different domains, which is achieved by using an abstract data type (ADT) called the augmented map [20]. Designed as a general-purpose library for parallel tree structures, this library can be directly applied to many applications including 2D range/segment/rectangle search, inverted index searching, HTAP database systems, multi-version concurrency control, graph processing systems, and so on. Making use of the library, each of the applications needs only about one hundred lines of high-level code to get a highly-optimized implementation.

The advantage of the Join-based framework and the library lies in its generality, and its simplicity and efficiency in both theory and practice.

1. Multiple real-world problems can be solved directly based on the augmented map abstraction and the PAM library. This reduces the incremental effort for users to adapt this tree structure simply as a black box to their own applications. Because of the functionalities supported (e.g., concurrency, persistence, multi-versioning, garbage collection), users can deal with the problems on a higher level of abstraction without worrying about the details in the implementation.
2. Multiple algorithms and balancing schemes can be dealt with using generic methodology. The Join function captures all that is required for rebalancing. As a result, all algorithms except Join are generic for multiple balancing schemes. This minimizes the coding effort to re-create the tree structure with special requirements when necessary (e.g., in another programming language). Furthermore, this allows for extendability to other balancing schemes and parallel algorithms.

This tutorial will also have a hands-on introduction to the download/installation of the library. We will show code examples on how to use the framework and the library. This tutorial will finally show comparisons among different systems and tree structures under different workloads and applications, and analyze which tree properties are favored under different scenarios.

The library is available at https://github.com/cmuparlay/PAM. More information can be found at https://cmuparlay.github.io/PAMWeb/.

Experimental Settings. All experiments shown in this tutorial are run on a 72-core Dell R930 with 4 x Intel(R) Xeon(R) E7-8867 v4 (18 cores, 2.4GHz and 45MB L3 cache) with 1TB memory. Each core is 2-way hyperthreaded, giving 144 hyperthreads. The code was compiled with -O2 using the g++ 5.4.1 compiler, which supports the Cilk Plus extensions.

Preliminaries. We call each element (a key-value pair) in the tree an entry, denoted as e = (k, v). We assume key type K and value type V. We define the entry type E = K × V. We use l(x) or r(x) to extract the left or right subtree of a tree node x. We use k(x), v(x) and e(x) to extract the key, value and entry of a tree node or the root of a (sub)tree.

2 Algorithms

[...] than all keys in T_R. In the sequential setting, this function was first defined by Tarjan [21], and later extended to other balancing schemes [2, 17]. Blelloch et al. describe the Join algorithms for AVL trees, red-black trees, weight-balanced trees and treaps, respectively, and use them as primitives for parallel tree algorithms. The Join function deals with all rotation and rebalancing issues for the other algorithms. As a result, all the other algorithms are identical across multiple balancing schemes, and most of them are also parallel. Figure 1 shows several examples. We show some performance numbers in Table 1.

    Split(T, k) =
      if T = ∅ then (∅, False, ∅)
      else if k = k(T) then (l(T), True, r(T))
      else if k < k(T) then let (L', b, R') = Split(l(T), k)
                            in (L', b, Join(R', e(T), r(T)))
      else let (L', b, R') = Split(r(T), k)
           in (Join(l(T), e(T), L'), b, R')

    Union(T1, T2) =
      if T1 = ∅ then T2
      else if T2 = ∅ then T1
      else let (L1, b, R1) = Split(T1, k(T2))
           and L = Union(L1, l(T2)) || R = Union(R1, r(T2))
           in Join(L, e(T2), R)

    Insert(T, k, v) =
      if T = ∅ then Singleton(k, v)
      else if k < k(T) then Join(Insert(l(T), k, v), e(T), r(T))
      else if k > k(T) then Join(l(T), e(T), Insert(r(T), k, v))
      else T

    MapReduce(T, g, f, I) =
      if T = ∅ then I
      else let L = MapReduce(l(T), g, f, I)
            || R = MapReduce(r(T), g, f, I)
           in f(L, f(g(e(T)), R))

    Build'(S, i, j) =
      if i = j then ∅
      else if i + 1 = j then Singleton(S[i])
      else let m = (i + j)/2
           and L = Build'(S, i, m) || R = Build'(S, m + 1, j)
           in Join(L, S[m], R)

    Build(S) =
      Build'(RemoveDuplicates(Sort(S)), 0, |S|)

Figure 1. Example of algorithms using Join.
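To make the pattern of Figure 1 concrete, the following is a minimal C++ sketch of Split, Insert, Union and MapReduce written on top of a single Join. It is an illustration under stated assumptions, not PAM's implementation or interface: it uses a treap as the balancing scheme, stores only integer keys (a set rather than a map), derives each priority by hashing the key, and runs the two recursive calls of Union and MapReduce sequentially where Figure 1 composes them in parallel. All names (Node, Tree, make, join, split, insert, tree_union, map_reduce, keys) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Illustrative persistent treap for an ordered set of ints.
// All names here are hypothetical and are not PAM's interface.
struct Node;
using Tree = std::shared_ptr<const Node>;  // nullptr is the empty tree
struct Node {
    int key;
    std::uint64_t prio;  // heap priority, derived from the key
    Tree left, right;
};

// Hash the key into a priority (splitmix64 finalizer), so a key always
// gets the same priority no matter which tree it lives in.
std::uint64_t priority(int key) {
    std::uint64_t x = static_cast<std::uint64_t>(key) + 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

Tree make(Tree l, int k, Tree r) {
    return std::make_shared<Node>(Node{k, priority(k), std::move(l), std::move(r)});
}

// Join: precondition is (all keys in l) < k < (all keys in r).
// This is the only function that knows about the balancing scheme.
Tree join(const Tree& l, int k, const Tree& r) {
    const std::uint64_t p = priority(k);
    const bool l_higher = l && l->prio > p;
    const bool r_higher = r && r->prio > p;
    if (!l_higher && !r_higher) return make(l, k, r);
    if (l_higher && (!r_higher || l->prio >= r->prio))
        return make(l->left, l->key, join(l->right, k, r));
    return make(join(l, k, r->left), r->key, r->right);
}

struct SplitResult { Tree left; bool found; Tree right; };

// Split t by k into keys < k, whether k was present, and keys > k.
SplitResult split(const Tree& t, int k) {
    if (!t) return {nullptr, false, nullptr};
    if (k == t->key) return {t->left, true, t->right};
    if (k < t->key) {
        SplitResult s = split(t->left, k);
        return {s.left, s.found, join(s.right, t->key, t->right)};
    }
    SplitResult s = split(t->right, k);
    return {join(t->left, t->key, s.left), s.found, s.right};
}

Tree insert(const Tree& t, int k) {
    if (!t) return make(nullptr, k, nullptr);
    if (k < t->key) return join(insert(t->left, k), t->key, t->right);
    if (k > t->key) return join(t->left, t->key, insert(t->right, k));
    return t;  // key already present
}

Tree tree_union(const Tree& a, const Tree& b) {
    if (!a) return b;
    if (!b) return a;
    SplitResult s = split(a, b->key);
    // The two calls below are independent; Figure 1 runs them in parallel.
    Tree l = tree_union(s.left, b->left);
    Tree r = tree_union(s.right, b->right);
    return join(l, b->key, r);
}

template <class G, class F, class V>
V map_reduce(const Tree& t, G g, F f, V identity) {
    if (!t) return identity;
    V l = map_reduce(t->left, g, f, identity);
    V r = map_reduce(t->right, g, f, identity);
    return f(l, f(g(t->key), r));
}

// In-order traversal, used only to inspect results.
void collect(const Tree& t, std::vector<int>& out) {
    if (!t) return;
    collect(t->left, out);
    out.push_back(t->key);
    collect(t->right, out);
}
std::vector<int> keys(const Tree& t) {
    std::vector<int> out;
    collect(t, out);
    return out;
}
```

Because nodes are immutable and shared through reference counting, every operation returns a new version and leaves its inputs intact, which is the sense of persistence used in the Introduction. In this sketch only join inspects priorities, so swapping in another balancing scheme would mean rewriting join alone, which mirrors the claim that all algorithms except Join are generic.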
