Parallel Balanced Binary Trees in Shared Memory
SPAA 2020 Tutorial

Yihan Sun, University of California, Riverside

What's in this tutorial

• Algorithms and implementation details for parallel balanced binary trees (P-trees)
• Simple algorithms supporting a wide range of functionality

• An open-source parallel library (PAM) and example code to use it

• Applications that can be solved using the algorithms and the library in this tutorial

Trees
• Trees are fundamental data structures for organizing data
• Taught in entry-level undergraduate courses:
  [Sedgewick and Wayne] Sec. 3.2 Binary Search Trees, Sec. 3.3 Balanced Search Trees
  [TAOCP] 2.3 Trees, 6.2.2 Binary Searching, 6.2.3 Balanced Trees, 6.2.4 Multiway Trees
  [CLRS] Ch. 12 Binary Search Trees, Ch. 13 Red-Black Trees, Sec. 14.3 Interval Trees
• In real-world applications, things are more complicated... especially in parallel

Applications Using Trees: Document Search Engine

Example query: find information about "balanced" and "binary" (Balanced AND Binary; 1,234,567 results), returning documents such as Doc 1 ("A binary tree is balanced if it keeps its height small...") and Doc 2 ("AVL tree is a balanced binary structure...").

A possible interface:

struct Doc { int l; pair<...>* w; };   // template arguments elided on the slide
class doc_tree {
  void build(Doc* d);
  Doc_set search(Word w);
  Doc_set and_search(Word* w);
  Doc_set or_search(Word* w);
  void add_doc(Doc d);
  ...
};

Searching and updating may need to be done concurrently.

Applications Using Trees: Databases

Example query: find all young CS students with good grades.

select name
from students
where age < 25
  and major = 'CS'
  and grade >= 'A'

A possible interface:

struct Student { id, name, grade, age, major, ... };
class database {
  void build(Student* s);
  Student* search(int id);
  Student* filter(function f);
  void add_student(Student s);
  ...
};

Applications Using Trees: Geometric Queries

Example (a 2D range query): find the average temperature in Riverside.

struct Point { X x; Y y; Weight w; };
class range_tree {
  void build(Point* p);
  double ave_weight_search(X x1, X x2, Y y1, Y y2);
  int count_search(X x1, X x2, Y y1, Y y2);
  int list_all_search(X x1, X x2, Y y1, Y y2);
  void insert(Point p);
  range_tree filter(func f);
  Point* output();
  void update(X x, Y y, Weight w);
};

Balanced Binary Trees

• Binary: each tree node has at most two children
• Balanced: the tree has bounded height, usually O(log n) for size n

(Illustrations: a wild balanced hyphaene compressa binary tree, a wild balanced binary tree, and an abstract balanced binary tree.)

Balanced Binary Trees

• Balancing schemes: invariants to keep the tree balanced and to bound the tree height
• We discuss four standard balancing schemes:

• Height balanced: AVL trees, red-black trees
• Size balanced: weight-balanced trees
• Randomized: treaps

Applications Using Trees
• Document search engines: find information about "balanced" and "binary"
• Databases: find all young CS students with good grades
• Geometric queries (2D range queries): find the average temperature in the Riverside area (62°F)

What we want:
• Elegance – Framework: generic for balancing schemes, generic for applications
• Massive data – Performance: parallelism and concurrency; efficiency both in theory and in practice
• Comprehensive queries – Functionality: range queries, bulk updates, augmentation, dynamicity, multi-versioning, ...

What does a P-tree look like?
• Elegance – Framework: generic for balancing schemes, generic for applications
• Massive data – Performance: parallelism and concurrency; efficiency both in theory and in practice
• Comprehensive queries – Functionality: range queries, bulk updates, augmentation, dynamicity, multi-versioning, ...

All of this is built on Join, a primitive for trees.

1 Generic for balancing schemes

All algorithms except join are identical across balancing schemes

One algorithm for multiple balancing schemes!

2 Generic for applications

Multiple applications based on the same tree structure

One tree for different problems!

A primitive for trees: Join

The Primitive Join

• T = Join(T_L, e, T_R): T_L and T_R are two trees of a certain balancing scheme, and e is an entry/node (the pivot)
• Requires T_L < e < T_R
• Returns a valid tree T whose contents are T_L ∪ {e} ∪ T_R

(Figure: T = Join(T_L, e, T_R); the result is rebalanced if necessary.)

The Primitive Join: an example

(Figure: two balanced trees T_L and T_R and a pivot entry e to be joined.)

• Connect T_L, e, and T_R at a balancing point: walk down the spine of the larger tree until reaching a subtree that is balanced with respect to the smaller tree, and attach e there with the smaller tree as its other child
• Rebalance (specific to the balancing scheme)
• Join algorithms for all four balancing schemes, together with their cost bounds, are given in [SPAA'16]

(Figures: the example tree after connecting at a balancing point, and the final, rebalanced result.)
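As one concrete instance (illustrative code only, not PAM's implementation), here is a join for treaps: whichever of the two roots and the pivot carries the highest priority becomes the new root, and the recursion descends the corresponding spine. The other three schemes differ only in how they pick the balancing point and rebalance.

#include <climits>
#include <cstdlib>

// A minimal treap join sketch. Node and prio are illustrative names.
struct Node {
  int key;
  int prio;                 // random priority; the tree is a max-heap on priorities
  Node *lc, *rc;
  explicit Node(int k) : key(k), prio(rand()), lc(nullptr), rc(nullptr) {}
};

int prio(Node* t) { return t ? t->prio : INT_MIN; }

// Precondition: every key in L < m->key < every key in R.
Node* join(Node* L, Node* m, Node* R) {
  if (prio(m) >= prio(L) && prio(m) >= prio(R)) {
    m->lc = L; m->rc = R;    // the pivot has the highest priority: it becomes the root
    return m;
  }
  if (prio(L) > prio(R)) {   // L's root stays on top; join along its right spine
    L->rc = join(L->rc, m, R);
    return L;
  }
  R->lc = join(L, m, R->lc); // R's root stays on top; join along its left spine
  return R;
}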

How does Join help?

1 Algorithms Using Join
• Generic across balancing schemes
• Highly parallel
• Theoretically efficient

2 Augmentation Using Join
• A unified framework for augmentation
• Models multiple applications

3 Persistence Using Join
• Multi-versioning on trees

PART 1: Algorithms Using Join
• Generic algorithms across balancing schemes
• Parallel algorithms using the divide-and-conquer paradigm
• Theoretically efficient

20 Join-based insertion

Join-based Algorithms: Insertion

insert(T, k)
  if T = ∅ then return Singleton(k)
  else let (L, k', R) = T
    if k < k' then return Join(insert(L, k), k', R)
    else if k > k' then return Join(L, k', insert(R, k))
    else return T

(Figures: inserting 4 into an example tree; Join reassembles the result, rebalancing if required — Join does the rebalancing for insert.)

O(log n) work per insertion.
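For concreteness, the pseudocode maps directly onto the treap join sketched earlier; the following is the same insert written against that Node/join (again an illustrative, in-place, single-version sketch — not PAM's generic, persistent implementation):

// Join-based insert using the treap Node/join sketch above.
Node* insert(Node* T, int k) {
  if (T == nullptr) return new Node(k);            // Singleton(k)
  if (k < T->key) return join(insert(T->lc, k), T, T->rc);
  if (k > T->key) return join(T->lc, T, insert(T->rc, k));
  return T;                                        // key already present
}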

Join-based split and join2

The Inverse of Join: Split

• (T_L, b, T_R) = Split(T, k)
• T_L: all keys in T that are < k
• T_R: all keys in T that are > k
• b: a flag indicating whether k ∈ T

(Figure: splitting an example tree by 42; b = true because 42 is in the tree.)

split(T, k)
  if T = ∅ then return (∅, false, ∅)
  else
    k' = key at the root of T
    if k = k' then                 // same key as the root
      T_L = T.left; T_R = T.right; flag = true
    if k < k' then                 // split the left subtree
      (L, flag, R) = split(T.left, k)
      T_L = L; T_R = Join(R, k', T.right)
    if k > k' then { /* symmetric */ }
    return (T_L, flag, T_R)

Another helper function: join2

• join2(T_L, T_R): similar to join, but without the middle key
• Can be done by first splitting out the last key k of T_L, then using k to join the rest of T_L with T_R

join2(T_L, T_R) {
  (T_L', k) = split_last(T_L);
  return join(T_L', k, T_R);
}

Other Join-based algorithms

BST Algorithms

• BST algorithms using a divide-and-conquer scheme
• Recursively deal with the two subtrees (possibly in parallel)
• Combine the results of the recursive calls and the root (e.g., using join or join2)
• Usually gives polylogarithmic bounds on span

func(T, ...) {
  if (T is empty) return base_case;
  M = do_something(T.root);
  in parallel:
    L = func(T.left, ...);
    R = func(T.right, ...);
  return combine_results(L, R, M, ...);
}

Get the maximum value

• In each node we store a key and a value. The nodes are sorted by the keys.

get_max(Tree T) {
  if (T is empty) return -∞;
  in parallel:
    L = get_max(T.left);
    R = get_max(T.right);
  return max(max(L, T.root.value), R);
}

O(n) work and O(log n) span. A similar algorithm works for any map-reduce function.

Map and reduce
• Map each entry in the tree to a value using the function map, then reduce all the mapped values using reduce (with identity identity)
• Assume map and reduce both have constant cost

map_reduce(Tree T, function map, function reduce, value_type identity) {
  if (T is empty) return identity;
  M = map(T.root);
  in parallel:
    L = map_reduce(T.left, map, reduce, identity);
    R = map_reduce(T.right, map, reduce, identity);
  return reduce(reduce(L, M), R);
}

O(n) work and O(log n) span.

Filter

• Select all entries in the tree that satisfy a predicate f
• Return a tree of all these entries

filter(Tree T, function f) {
  if (T is empty) return an empty tree;
  in parallel:
    L = filter(T.left, f);
    R = filter(T.right, f);
  if (f(T.root)) return join(L, T.root, R);
  else return join2(L, R);
}

O(n) work and O(log² n) span.

Construction

T = build(Array A, int size) {
  A' = parallel_sort(A, size);
  return build_sorted(A', 0, size);
}
O(n log n) work and O(log n) span, bounded by the sorting algorithm.

T = build_sorted(Array A, int start, int end) {
  if (start == end) return an empty tree;
  if (start == end-1) return singleton(A[start]);
  mid = (start+end)/2;
  in parallel:
    L = build_sorted(A, start, mid);
    R = build_sorted(A, mid+1, end);
  return join(L, A[mid], R);
}
O(n) work and O(log n) span.

Output to array

• Output the entries of a tree T to an array, in in-order
• Assume each tree node stores its subtree size (an empty tree has size 0)

to_array(Tree T, array A, int offset) {
  if (T is empty) return;
  A[offset + T.left.size] = get_entry(T.root);   // the root goes right after the left subtree
  in parallel:
    to_array(T.left, A, offset);
    to_array(T.right, A, offset + T.left.size + 1);
}
O(n) work and O(log n) span.

Range query (1D)

• Report all entries in the key range [k_L, k_R]
• Returning them as a tree: O(log n) work and span
• Flattening them into an array: O(k + log n) work and O(log n) span, for output size k

Equivalent to using two splits (which in turn call join); only O(log n) related nodes/subtrees are touched.

Range(T, k_L, k_R) {
  (t1, b, t2) = split(T, k_L);
  (t3, b, t4) = split(t2, k_R);
  return t3;
}

Join-based Algorithms: Union

• Input: T_1 and T_2 (of sizes n and m, with m ≤ n)
• Output: T containing all elements of T_1 and T_2

• Can be used to combine a batch of elements into a tree

• The lower bound (on comparisons) is O(m log(n/m + 1))
  • When m = n, it is O(n)
  • When n ≫ m, it is about O(m log n) (e.g., when m = 1, it is O(log n))

Join-based Algorithms: Union

union(T_1, T_2)
  if T_1 = ∅ then return T_2        // base case
  if T_2 = ∅ then return T_1
  (L_2, k_2, R_2) = extract(T_2)
  (L_1, b, R_1) = split(T_1, k_2)
  in parallel:
    T_L = union(L_1, L_2)
    T_R = union(R_1, R_2)
  return Join(T_L, k_2, T_R)

(Figures: running union on two example trees — T_1 is split by k_2, the root of T_2; the left and right parts are united recursively in parallel; and Join(T_L, k_2, T_R) assembles the result.)

Similarly, we can implement intersection and difference.

Theorem 1. For AVL trees, red-black trees, weight-balanced trees, and treaps, the above algorithm for merging two balanced BSTs of sizes m and n (m ≤ n) has O(m log(n/m + 1)) work and O(log m log n) span (in expectation for treaps).

• The bound also holds for intersection and difference

Join-based algorithms

• A wide variety of algorithms using Join are proved to be work-optimal and to have polylogarithmic span.

  Functions                              Work                Span
  insert, delete, range, split, join2    O(log n)            O(log n)
  union, intersection, difference        O(m log(n/m + 1))   O(log n log m)
  filter                                 O(n)                O(log² n)
  map_reduce                             O(n)                O(log n)
  build                                  O(n log n)          O(log n)

Join captures the intrinsic properties of the different balanced binary trees in these algorithms.

Achieves 50-90x speedup on 72 cores with hyperthreading.

Outline

1 Algorithms Using Join
• Generic across balancing schemes
• Parallel, using divide-and-conquer
• Simple
• Theoretically efficient
• Fast in practice

2 Augmentation Using Join

3 Persistence Using Join

PART 2: Augmentation Using Join
• Augmented trees for fast range sums
• The framework for applications

Augmentation for Range Queries

Geometric queries (2D range queries), e.g.: find the average temperature in the Riverside area (62°F), or find all nearby Pokémon / Poké Stops.

Augmented Trees for 1D Range Query

• Each tree node stores some extra information about the whole subtree rooted at it (e.g., a partial sum)

(Figure: a tree of key,value pairs in which each node is augmented with the sum of the values in its subtree; a range query over [k_L, k_R] touches only O(log n) related nodes/subtrees.)

Range sum query (the sum of the values in a key range): O(log n) time.

Augmented Trees for 1D Range Query

• Different functionalities are achieved by different augmentations

(Figure: the same tree augmented instead with the maximum of the values in its subtree — a different "partial sum".)

Augmentation – Formalization

(Figure: the key,value tree with partial sums from the previous slide.)

• There is no standard formal definition of augmented trees...
• We give a definition with respect to ordered key-value pairs and a map-reduce operation
  • Each tree node stores a key-value pair (in K × V)
  • Each tree node also maintains some information about its whole subtree (in A): the augmented value

(Figure: in the running example, each node's augmented value is the sum of the values in its subtree; at the root, 54 = 13 + g(4,11) + 30, where 13 = 4 + 7 + 2 and 30 = 17 + 10 + 3.)

The augmented value:
• A "map" function g : K × V → A maps an entry to an augmented value (g is (k, v) ↦ v in this example)
• A "reduce" function f : A × A → A combines augmented values; f is associative with identity I, so (A, f, I) is a monoid (f is + in this example)
• a(u) = f( f( a(lc(u)), g(entry(u)) ), a(rc(u)) ),
  where a(u) is the augmented value of node u, entry(u) is the entry stored in u, and lc(u) and rc(u) are the left/right children of u
• The augmented value needs to be updated only in Join
• This does not affect the asymptotic cost if g and f are simple
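To make the recurrence concrete, here is a small illustrative sketch (not PAM's code) of maintaining a(u) for the range-sum example and of the O(log n) range-sum query that uses it; as above, g(k, v) = v, f = +, and I = 0.

// Maintaining the augmented value a(u) and answering a range sum, for the
// example where g(k, v) = v, f = +, and I = 0. Illustrative only.
struct AugNode {
  int key, val;
  int aug;                        // a(u): sum of the values in this subtree
  AugNode *lc, *rc;
};

int aug_of(AugNode* t) { return t ? t->aug : 0; }     // empty tree: identity I = 0

// a(u) = f(f(a(lc(u)), g(entry(u))), a(rc(u))); called whenever a node is
// (re)built, e.g., inside Join.
void update_aug(AugNode* u) { u->aug = aug_of(u->lc) + u->val + aug_of(u->rc); }

// Sum of the values with keys >= kl in t.
int sum_right_of(AugNode* t, int kl) {
  if (!t) return 0;
  if (t->key < kl) return sum_right_of(t->rc, kl);          // the left part is all < kl
  return sum_right_of(t->lc, kl) + t->val + aug_of(t->rc);  // whole right subtree counts
}

// Sum of the values with keys <= kr in t.
int sum_left_of(AugNode* t, int kr) {
  if (!t) return 0;
  if (t->key > kr) return sum_left_of(t->lc, kr);
  return aug_of(t->lc) + t->val + sum_left_of(t->rc, kr);
}

// Sum of the values with keys in [kl, kr]. Only O(log n) nodes are visited,
// because fully contained subtrees contribute their precomputed augmented value.
int range_sum(AugNode* t, int kl, int kr) {
  if (!t) return 0;
  if (t->key < kl) return range_sum(t->rc, kl, kr);
  if (t->key > kr) return range_sum(t->lc, kl, kr);
  return sum_right_of(t->lc, kl) + t->val + sum_left_of(t->rc, kr);
}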

Augmented trees
• Define the keys and values
• Define what the augmented value is
• Define the map function
• Define the reduce function (and its identity)

The augmented map we defined [PPoPP'18] is an abstract data type (ADT) that extends augmented trees; it now has its own Wikipedia page.

Augmentation: Applications

• Can be applied to a variety of applications, each an instance of AM(K, <_K, V, A, g, f, I):
  ❑ Range sum:
     S = AM(Z, <_Z, Z, Z, (k, v) ↦ v, +_Z, 0)
  ❑ 1D stabbing query (yields an interval tree):
     T = AM(R, <_R, R, R, (k, v) ↦ v, max, -∞)
  ❑ 2D range query (yields a range tree):
     R_I = AM(X × Y, <_Y, W, W, (k, v) ↦ v, +_W, 0_W)
     R_O = AM(X × Y, <_X, W, R_I, R_I.singleton, R_I.union, R_I.empty)
  ❑ Document searching:
     M_I = AM(D, <_D, W, W, (k, v) ↦ v, max, 0)
     M_O = AM(T, <_T, M_I, -, -, -, -)

• Also applies to many other tree-based geometric problems, such as segment queries, rectangle queries, ...

Code for the Interval Tree (interval tree: 53 lines; range tree: 176 lines; document searching: 98 lines)

struct interval_map {
  using interval = pair<int, int>;        // template arguments elided on the slide
  struct entry {                          // defines g and f
    using key_t = int;
    using val_t = int;
    using aug_t = int;
    static bool comp(key_t a, key_t b) { return a < b; }
    static aug_t base(key_t k, val_t v) { return v; }
    static aug_t combine(aug_t a, aug_t b) { return (a > b) ? a : b; }
    static aug_t I() { return 0; }
  };
  using amap = aug_map<entry>;            // alias name and template argument inferred
  amap m;

  interval_map(interval* A, int n) { m = amap(A, A + n); }   // construction
  bool stabbing(int p) { return (m.aug_left(p) > p); }       // query
};
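The same entry pattern instantiates the range-sum map S from the application list above; a minimal sketch follows. The entry layout mirrors the interval-tree code, while the query call shown in the comments assumes an aug_range-style function whose exact name in PAM may differ.

// A sketch of the range-sum augmented map S = AM(Z, <, Z, Z, (k,v) -> v, +, 0),
// written in the same entry style as the interval tree above. Illustrative only.
struct sum_entry {
  using key_t = int;
  using val_t = int;
  using aug_t = int;
  static bool comp(key_t a, key_t b) { return a < b; }      // <_K
  static aug_t base(key_t k, val_t v) { return v; }         // g(k, v) = v
  static aug_t combine(aug_t a, aug_t b) { return a + b; }  // f = +
  static aug_t I() { return 0; }                            // identity
};
// Assumed usage (names may differ from PAM's actual interface):
//   using sum_map = aug_map<sum_entry>;
//   sum_map m(A, A + n);            // build from an array of (key, value) pairs
//   int s = m.aug_range(k1, k2);    // sum of values with keys in [k1, k2]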

Range Queries – Sequential and Parallel [PPoPP'18, ALENEX'19]
• 10^8 input points
• Run PAM directly on one core
• Compare to the range tree in CGAL and the R-tree in Boost (both are sequential libraries)

(Plots: running times for construction (s), small-window queries (us), and large-window queries (s); lower is better. PAM is fast even running sequentially — roughly 2-9x faster than CGAL and 1.4-26x faster than Boost across construction and the two query types. On 72 cores with 144 threads, PAM also gets ~60x self-speedup in construction and 60-80x in queries.)

Applications using augmentation

• 1D stabbing queries
• 2D range, segment, and rectangle queries
• 2D sweepline algorithms
• Inverted index searching

• Example code for all applications is available on GitHub: https://github.com/cmuparlay/PAM
• Algorithms and more experiments:
  • PAM: Parallel Augmented Maps, Yihan Sun, Daniel Ferizovic, and Guy Blelloch, ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2018
  • Parallel Range, Segment and Rectangle Queries with Augmented Maps, Yihan Sun and Guy E. Blelloch, Algorithm Engineering and Experiments (ALENEX), 2019

Outline

1 Algorithms Using Join

2 Augmentation Using Join
• Formalized augmented trees, implemented with just join
• Multiple applications: range sums, stabbing queries, range queries, inverted index searching
• Simple code
• Parallel solution
• Fast in practice

3 Persistence Using Join

PART 3: Persistence Using Join
• Persistent algorithms
• Multi-version concurrency control (MVCC)

What Are Persistence and MVCC?

• Persistence [DSST'86], for data structures (e.g., T2 = T.insert(k)):
  • Preserves the previous version of the structure
  • Always yields a new version when updated
• Multi-version concurrency control (MVCC), for databases:
  • Write transactions create new versions
  • Ongoing queries work on old versions

Why Persistence and MVCC?
• To guarantee that concurrent updates and queries work correctly and efficiently
  • Queries work on a consistent version
  • Writers and readers do not block each other

Why Persistence and MVCC? An example: a document search engine.

(Figure: a document database answering concurrent queries such as "whales OR dog", "Riverside", "calories", and "room AND magic" over documents 1-4 while new documents are added.)

For the end-user experience:
• Queries shouldn't be delayed by updates
• Queries must be done on a consistent version of the database

This is generally useful for any database system with concurrent updates and queries, e.g., a Hybrid Transactional and Analytical Processing (HTAP) database system.

Persistence Using Join

• Path copying: copy the affected path in the tree (e.g., T2 = T1.insert(4))
• Copying occurs only in Join!
• Always copy the "middle" node passed to Join
• All the other parts of the algorithms remain unchanged
• No extra asymptotic cost in time, and only a small overhead in space
• Safe for concurrency – enables multi-version concurrency control (MVCC)

(Figures: inserting 4 into T1 copies only the nodes on the affected path, 5 and 3, as 5' and 3'; the new version T2 shares all other nodes (1, 8, 9) with T1.)
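A minimal sketch of path copying for a plain insert (its own node type; balance information and Join are omitted for brevity). The join-based algorithms copy nodes in exactly the same places, inside Join:

// Persistent insert by path copying: nodes on the search path are copied,
// everything else is shared between the old and the new version.
struct VNode {
  int key;
  VNode *lc, *rc;
};

VNode* make(int key, VNode* lc, VNode* rc) {   // always allocate a fresh node
  return new VNode{key, lc, rc};
}

VNode* persistent_insert(VNode* t, int k) {
  if (t == nullptr) return make(k, nullptr, nullptr);
  if (k < t->key)                              // copy t, share the untouched right subtree
    return make(t->key, persistent_insert(t->lc, k), t->rc);
  if (k > t->key)                              // copy t, share the untouched left subtree
    return make(t->key, t->lc, persistent_insert(t->rc, k));
  return t;                                    // key already present: share the whole subtree
}

// Usage: VNode* T2 = persistent_insert(T1, 4);  // T1 remains a valid version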

Transactions Using Multi-version Concurrency Control (MVCC)

(Figure: a bank example. A transaction "Wendy +$2, Carol -$2" creates new versions v1 → v_temp → v2 by path copying, while concurrent readers — "find all accounts with balance > 10", "add up the total balance", an insurance agent checking Bob's balance — keep working on the versions that were current when they started.)

• Lock-free atomic updates ☺: a series of operations, or a bulk of operations (e.g., union)
• Easy roll-back ☺
• Does not affect other concurrent operations ☺
• Any operation works as if on a single-versioned tree, with no extra asymptotic cost ☺

Remaining issues:
• Concurrent writes? Concurrent transactions work on snapshots, so they do not take effect on the same tree
• Useless old nodes? Out-of-date nodes should be collected in time

Batching
• Collect all concurrent writes and commit them with a single writer once in a while

(Figure: concurrent clients issue Insert 15, Delete 16, Insert 8, Delete 6, Insert 5, Insert 9, Delete 3; the batching layer turns them into a Union with [5, 8, 9, 15] and a Difference with [3, 6, 16] applied to the database.)
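As a semantic sketch of one commit of the batching layer (using std::set as a stand-in for the persistent tree; the real system would apply a join-based union and difference, producing a new version by path copying):

#include <set>
#include <vector>

// One commit: a single writer applies all buffered inserts and deletes at once.
// std::set stands in for the P-tree; taking `db` by value models creating a new
// version while the caller's old version stays untouched.
std::set<int> commit_batch(std::set<int> db,
                           const std::vector<int>& inserts,   // e.g., {5, 8, 9, 15}
                           const std::vector<int>& deletes) { // e.g., {3, 6, 16}
  for (int k : inserts) db.insert(k);   // corresponds to Union with the insert batch
  for (int k : deletes) db.erase(k);    // corresponds to Difference with the delete batch
  return db;                            // the new version to install as "current"
}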

Compare to Concurrent Data Structures [SPAA'19]
• Compare with concurrent data structures: skiplist, OpenBW tree [Wang et al.], Masstree [Mao et al.], B+tree [Wang et al.], Chromatic tree [Brown et al.]
• Test on the Yahoo! Cloud Serving Benchmark (YCSB)
  • Streams of updates and queries (insertion/update/search)
  • Initial database size: 5 × 10^7; transactions: 10^7
  • Delay within 50 ms

(Plots: throughput in M txns/s, higher is better, on 72 cores with hyperthreading, with 5 × 10^7 initial key-value pairs and 10^7 single-operation transactions, for YCSB workloads A (50/50), B (95/5), and C (100/0) read/write, comparing PAM with OpenBW, Masstree, B+tree, and Chromatic tree. PAM is 1.2-1.4x faster than all the other concurrent data structures, with almost-linear speedup scaling up to 144 threads on workload A.)

Garbage Collection

• Reference-counting garbage collector
• Each tree node records the number of other tree nodes/pointers that refer to it
• Nodes 8 and 1 in the example have reference count 2

(Figure: versions T1 and T2 from the earlier insert; the shared nodes 1 and 8 have reference count 2, all other nodes have count 1.)

Garbage Collection [SPAA'19]

• Collect a node if and only if its reference count is 1

collect(node* t) {
  if (!t) return;
  if (t->ref_cnt == 1) {
    node* lc = t->lc, *rc = t->rc;
    free(t);
    in parallel:
      collect(lc);
      collect(rc);
  } else dec(t->ref_cnt);
}

(Figure: collecting one version decrements the reference counts of the shared nodes — 1 and 8 drop from 2 to 1 — and frees the nodes unique to that version.)

Document Searching
• Inverted index: a map from each word to the list of documents containing it, e.g., blue → {1, 4}, whale → {1, 4}, earth → {1}, largest → {1, 2}, calories → {3, 4}, ...
• Search queries: find the corresponding document lists
• Adding a new document (e.g., document 4, "Blue whales eat half a million calories in one mouthful"): add 4 to the lists of {blue, whales, eat, half, million, calories, mouthful}
• All updates need to be done atomically

Inverted Indexes
• Store each word in the outer tree, with its document list as an inner tree
• Queries: concurrent analytical queries are done on the current version
  • OR/AND query: union/intersection of the two document lists
• Nested trees represent the affiliation relation

(Figure: a nested tree for the inverted index — an outer tree of words (algorithm, join, optimal, parallel, tree) whose nodes point to inner trees of documents such as D1-D5; an OR query unions two inner document trees.)
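To make the OR/AND semantics concrete, here is a small stand-in sketch using std::map and std::set in place of the nested persistent trees (in the real system the inner sets are P-trees and the union/intersection are the parallel join-based versions from Part 1):

#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

using doc_set = std::set<int>;                          // stand-in for an inner P-tree of doc ids
using inverted_index = std::map<std::string, doc_set>;  // stand-in for the outer word tree

static doc_set docs_of(const inverted_index& idx, const std::string& w) {
  auto it = idx.find(w);
  return it == idx.end() ? doc_set{} : it->second;
}

// OR query: union of the two words' document lists.
doc_set or_search(const inverted_index& idx, const std::string& a, const std::string& b) {
  doc_set da = docs_of(idx, a), db = docs_of(idx, b), out;
  std::set_union(da.begin(), da.end(), db.begin(), db.end(),
                 std::inserter(out, out.end()));
  return out;
}

// AND query: intersection of the two words' document lists.
doc_set and_search(const inverted_index& idx, const std::string& a, const std::string& b) {
  doc_set da = docs_of(idx, a), db = docs_of(idx, b), out;
  std::set_intersection(da.begin(), da.end(), db.begin(), db.end(),
                        std::inserter(out, out.end()));
  return out;
}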

Inverted Indexes
• Updates: take a union with path copying (updating the inner tree for duplicate words)
  • Atomic: multiple word-document pairs become visible at once
  • Non-destructive: old versions remain available for ongoing queries

(Figure: adding document D7, containing the words "parallel" and "tree", copies the outer nodes parallel' and tree' and gives them inner trees that include D7; the old version stays available for ongoing queries until the new version is installed.)

new_version = current_version.add((parallel, D7), (tree, D7))
current_version = new_version

Experiments on TPC Benchmarks
• 100 GB of data, 72 cores with hyperthreading
• Compare to HyPer [Neumann et al.] and MemSQL [Shamgnov'14]

(Plots: query and update throughput, higher is better. The P-tree achieves good performance for both updates and queries in HTAP workloads: roughly 4-9x faster than both systems in queries, and comparable to the best of the previous systems in updates.)

Applications using join-based MVCC

• Inverted index searching
• Indexes for HTAP database systems
• Transactional systems with precise GC
• Graph processing

• Example code for all applications is available on GitHub
• Algorithms and more experiments:
  • (library) PAM: Parallel Augmented Maps, Yihan Sun, Daniel Ferizovic, and Guy Blelloch, PPoPP 2018
  • (version control, garbage collection) Multiversion Concurrency with Bounded Delay and Precise Garbage Collection, Naama Ben-David, Guy E. Blelloch, Yihan Sun, and Yuanhao Wei, SPAA 2019
  • (graph processing, compression) Low-Latency Graph Streaming Using Compressed Purely-Functional Trees, Laxman Dhulipala, Guy Blelloch, and Julian Shun, PLDI 2019
  • (database system) On Supporting Efficient Snapshot Isolation for Hybrid Workloads with Multi-Versioned Indexes, Yihan Sun, Guy E. Blelloch, Wan Shen Lim, and Andrew Pavlo, PVLDB 13(2)

Outline

1 Algorithms Using Join

2 Augmentation Using Join

3 Persistence Using Join
• Multi-version concurrency control (MVCC)
• Database systems with concurrent updates and queries
• Fast in practice

Summary

Join works across balancing schemes: AVL trees, red-black trees, weight-balanced trees, and treaps.

Join supports:
• Algorithms
• Augmentation
• Persistence

In theory: join-based algorithms
• Balancing-scheme independent
• Supporting a wide range of algorithms
• Work-optimal with polylogarithmic span
• Abstract augmentation
• Persistent and multi-versioning

In practice: the PAM library and P-trees
• Simple, concise implementation
• High performance both sequentially and in parallel
• Applications: 2D range / segment / rectangle queries, interval trees, inverted index searching, database management systems, ...