PARALLEL SEARCH TREES

Yihan Sun, CMU 15-853

Why trees?

 Very important in almost all areas of computer science
 Maintain ordering on the keys – search
 Build indexes for databases or search engines
 Support ordered sets and maps, priority queues, …; useful as a subroutine in many algorithms
 Useful in database systems and computational geometry algorithms
 ……

Balanced Search Trees (BSTs)

 Many algorithms on trees have cost proportional to the tree height
 Need balancing schemes to keep the tree nearly balanced
 Balancing scheme: a set of invariants on trees
   AVL trees, weight-balanced trees, splay trees, …, B-trees, ……
 In this lecture we use balanced binary search trees
   Not limited to any specific balancing scheme. The algorithms/bounds hold at least for:
   AVL trees, red-black trees, weight-balanced trees and treaps

What we want to do with trees

 Searching
   The first or last entry, the entry of a certain rank, …
 Insertion and deletion

Cost is on the order of the tree height, O(log n) for balanced trees

 Construction
 Filter, map-reduce, …
 Bulk insertion and deletion
 Merging two trees (or getting the common elements)

Can be highly parallelized

Goal: work-efficient (optimal) and polylogarithmic depth

In this lecture

 Parallel algorithms on trees
 Applications that can be solved using assorted tree structures

 Next: preliminaries

Preliminaries

 T = join(TL, e, TR): connects two trees TL and TR with entry e, keeping the result balanced

in-order(T) = [in-order(TL), e, in-order(TR)]

(Figure: e at the root, with subtrees TL and TR)

Preliminaries

 T = join(TL, e, TR): connects two trees TL and TR with entry e, keeping the result balanced
 T = join2(TL, TR): similar to join, but without the middle entry

Both finish in O(log n), where n is the size of the larger tree (for join, a tighter bound is O(|log |TL| − log |TR||))

 ⟨TL, b, TR⟩ = split(T, k)
   TL contains all keys in T smaller than k
   TR contains all keys in T greater than k
   A bit b indicating whether k ∈ T

split costs O(log n), where n is the size of T

 Next: simple parallel algorithms on trees

Parallel Search Trees

 It is easy to design parallel algorithms on trees using a divide-and-conquer scheme
   Recursively deal with the two subtrees in parallel
   Combine the results of the recursive calls and the root
   Usually gives polylogarithmic bounds on depth

func(T, …) {
  if (T is empty) return base_case;
  M = do_something(T.root);
  in parallel:
    L = func(T.left, …);
    R = func(T.right, …);
  return combine_results(L, R, M, …);
}

Map and reduce

 Map each entry in the tree to a value using function map, then reduce all the mapped values using function reduce (with identity element identity)
 Assume map and reduce both have constant cost

map_reduce(Tree T, function map, function reduce, value_type identity) {
  if (T is empty) return identity;
  M = map(T.root);
  in parallel:
    L = map_reduce(T.left, map, reduce, identity);
    R = map_reduce(T.right, map, reduce, identity);
  return reduce(reduce(L, M), R);
}
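A runnable sketch of this routine (sequential Python; the two recursive calls are the ones that would be forked in parallel; the Node class and the example tree are illustrative, not part of the slides' framework):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def map_reduce(t, map_f, reduce_f, identity):
    if t is None:
        return identity
    m = map_f(t.key)
    # In a real implementation these two calls run in parallel (fork/join).
    left = map_reduce(t.left, map_f, reduce_f, identity)
    right = map_reduce(t.right, map_f, reduce_f, identity)
    return reduce_f(reduce_f(left, m), right)

# Example: sum of squares of the keys 1..5 stored in a small BST.
tree = Node(3, Node(1, None, Node(2)), Node(4, None, Node(5)))
total = map_reduce(tree, lambda k: k * k, lambda a, b: a + b, 0)
```

Note that reduce is applied in in-order fashion, so an associative (not necessarily commutative) reduce also works.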

O(n) work and O(log n) depth

Filter

 Select all entries in the tree that satisfy a function f
 Return a tree of all these entries

filter(Tree T, function f) {
  if (T is empty) return an empty tree;
  in parallel:
    L = filter(T.left, f);
    R = filter(T.right, f);
  if (f(T.root)) return join(L, T.root, R);
  else return join2(L, R);
}
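A minimal Python sketch of this filter. Here join and join2 are unbalanced placeholders that only preserve the in-order sequence; a real implementation uses the rebalancing join/join2 from the preliminaries to get the stated bounds:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def join(l, k, r):
    """Assumes keys(l) < k < keys(r); no rebalancing in this sketch."""
    return Node(k, l, r)

def join2(l, r):
    """Concatenate l and r (all keys in l smaller than those in r)."""
    if l is None:
        return r
    return join(l.left, l.key, join2(l.right, r))

def tree_filter(t, f):
    if t is None:
        return None
    l = tree_filter(t.left, f)    # in parallel
    r = tree_filter(t.right, f)   # in parallel
    return join(l, t.key, r) if f(t.key) else join2(l, r)

def in_order(t):
    return [] if t is None else in_order(t.left) + [t.key] + in_order(t.right)

tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
evens = in_order(tree_filter(tree, lambda k: k % 2 == 0))
```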

O(n) work and O(log² n) depth

Construction

T = build(Array A, int size) {
  A' = parallel_sort(A, size);
  return build_sorted(A', 0, size);
}

T = build_sorted(Array A, int start, int end) {
  if (start == end) return an empty tree;
  if (start == end-1) return singleton(A[start]);
  mid = (start+end)/2;
  in parallel:
    L = build_sorted(A, start, mid);
    R = build_sorted(A, mid+1, end);
  return join(L, A[mid], R);
}

(build_sorted alone: O(n) work and O(log n) depth)
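A sketch of build_sorted in Python (sequential; the two recursive calls are the parallel part). Since the two halves of a sorted array are already nearly balanced, plain node creation stands in for join here:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build_sorted(a, start, end):
    """Build a balanced BST from the sorted slice a[start:end)."""
    if start == end:
        return None
    mid = (start + end) // 2
    left = build_sorted(a, start, mid)       # in parallel
    right = build_sorted(a, mid + 1, end)    # in parallel
    return Node(a[mid], left, right)

def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

t = build_sorted(list(range(15)), 0, 15)     # 15 keys -> perfectly balanced
```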

O(n log n) work and O(log n) depth, bounded by the sorting algorithm

Output to array

 Output the entries of a tree T to an array in in-order
 Assume each tree node stores its subtree size (an empty tree has size 0)

to_array(Tree T, array A, int offset) {
  if (T is empty) return;
  A[offset + T.left.size] = get_entry(T.root);   // T.left.size: the size of the left subtree
  in parallel:
    to_array(T.left, A, offset);
    to_array(T.right, A, offset + T.left.size + 1);
}

O(n) work and O(log n) depth

Brief Summary

 Parallel algorithms on trees
 Polylogarithmic depth
 Takeaway: design parallel divide-and-conquer algorithms on trees

 Next: tree-tree operations: union, intersection, difference
   Combine two indexes in a database
   Subroutine in some applications, e.g., the range tree
   ……

In this lecture: a lower bound, a divide-and-conquer algorithm, and the cost analysis

Merging Two Trees of Size n and m (n ≥ m)

 Solution 1: flatten both trees into arrays and merge with moving pointers
   Cost: O(m + n). What if n ≫ m? Still O(n)?
 Solution 2: insert the entries of the smaller tree into the larger tree
   Cost: O(m log n). What if n = m? O(n log n)?

 What is the minimum cost?

The Lower Bound of Merging Two Ordered Sets

 Choose the m slots for the elements of the first set among all m + n available slots in the final result
   Set 1 (size m): 0, 4, 7, 8; Set 2 (size n ≥ m): 1, 2, 3, 5, 6, 9; result: 0 1 2 3 4 5 6 7 8 9
   C(m+n, m) = C(m+n, n) cases
 Lower bound: log₂ C(m+n, m) = Θ(m log(n/m + 1)) comparisons
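A short derivation of this bound (my sketch, using the standard binomial estimates):

```latex
\left(\tfrac{a}{b}\right)^{b} \le \binom{a}{b} \le \left(\tfrac{ea}{b}\right)^{b}
\;\Longrightarrow\;
m \log_2\!\left(\tfrac{n}{m}+1\right)
\;\le\; \log_2 \binom{m+n}{m}
\;\le\; m \log_2\!\left(\tfrac{n}{m}+1\right) + m \log_2 e ,
```

so the number of distinct interleavings forces any comparison-based merge to make $\Theta\!\left(m \log\!\left(\tfrac{n}{m}+1\right)\right)$ comparisons in the worst case.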

 The lower bound: O(m log(n/m + 1))
   When m = n, it is O(n)
   When n ≫ m, it is about O(m log n) (e.g., when m = 1, it is O(log n))

 Can we give an algorithm achieving this bound?

The Union Function

Base case: one of the two trees is empty.

(Figures: a running example of union on two trees. The root of the first tree splits the second tree into L2 and R2; L1 and R1 are the first tree's own subtrees. The pairs (L1, L2) and (R1, R2) are unioned recursively in parallel, and the two results TL and TR are joined with the root.)
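The steps the figures illustrate can be sketched in Python as follows. The join here is an unbalanced placeholder (plain node creation); the rebalancing join of the preliminaries is what gives the bounds in the next slide:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def split(t, k):
    """Return (keys < k, k present?, keys > k)."""
    if t is None:
        return None, False, None
    if k < t.key:
        l, b, r = split(t.left, k)
        return l, b, Node(t.key, r, t.right)
    if k > t.key:
        l, b, r = split(t.right, k)
        return Node(t.key, t.left, l), b, r
    return t.left, True, t.right

def union(t1, t2):
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    l2, _, r2 = split(t2, t1.key)   # split t2 by the root of t1
    left = union(t1.left, l2)       # in parallel
    right = union(t1.right, r2)     # in parallel
    return Node(t1.key, left, right)

def in_order(t):
    return [] if t is None else in_order(t.left) + [t.key] + in_order(t.right)

t1 = Node(5, Node(2, Node(0), Node(4)), Node(7, None, Node(8)))  # {0,2,4,5,7,8}
t2 = Node(3, Node(1), Node(9, Node(6)))                          # {1,3,6,9}
merged = in_order(union(t1, t2))
```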

Similarly we can implement intersection and difference.

The Cost

Theorem 1. For AVL trees, red-black trees, weight-balanced trees and treaps, the above algorithm for merging two balanced BSTs of sizes m and n (m ≤ n) has O(m log(n/m + 1)) work and O(log m log n) depth (in expectation for treaps).

The Cost

Lemma 1. The join work can be asymptotically bounded by the corresponding split work.

Lemma 2. Splitting a tree T of size n costs O(log n) time.

The depth is O(log m log n): O(log m) recursion steps to reach the base case, with O(log n) cost for each split.

The Split Work

Concavity: for k terms, Σᵢ log aᵢ ≤ k log((Σᵢ aᵢ)/k).

(Figures: the recursion splits the larger tree T1, of size n, guided by the nodes of the smaller tree T2, whose height is h = O(log m); |T| denotes the size of tree T.)
 Top level: one split, costing log |T1| = log n
 Next level: two splits, costing log |T21| + log |T22| ≤ 2 log(n/2)
 Level i: at most 2^i splits, costing at most 2^i log(n/2^i)
 Total: log n + 2 log(n/2) + 4 log(n/4) + … + 2^h log(n/2^h) = O(m log(n/m + 1)) (if T2 is perfectly balanced, with h = O(log m))

Lemma 3. In a tree of size N, there are at most ⌈N/2^i⌉ nodes at level h − i, where h is the height. (Details omitted.)

For general balanced trees the sum has c log₂ m terms and is still
O(m log(n/m) + (m/2) log(n/(m/2)) + (m/4) log(n/(m/4)) + ⋯) = O(m log(n/m + 1))

Brief Summary

 Takeaways:
   The lower bound of merging two ordered structures
   A work-optimal, polylog-depth parallel algorithm to merge two trees
   A useful trick in analysis: layering bottom-up

 Next: augmentations on trees and range queries

Range query (1D)

 Report all entries in key range [kL, kR]
   Get a tree of them: O(log n) work and depth
   Flatten them into an array: O(k + log n) work, O(log n) depth

(Figure: a range query touches O(log n) related nodes / subtrees between kL and kR)

Augmentations

 In practice, most problems can be solved with a “textbook” data structure plus some augmentation
 Augmented trees: each tree node stores some additional information about the whole subtree rooted at it
   The size, the sum of values, the maximum value, ……
   Another tree, ……
 Efficiently maintain this augmented value in your algorithms

Augmentations for range search

 If we want to quickly report the “range sum” for any associative operation, augmentation helps
   The sum of all values in a key range, the maximum value in a key range, ……
 Augment each tree node with the partial sum of its subtree

(Figure: again, a range query touches O(log n) related nodes / subtrees between kL and kR)
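A small Python sketch of this idea, with each node augmented by its subtree sum. Once the query walks to the node where the range splits, each side only ever consumes whole-subtree sums via the augmented field, so O(log n) nodes are touched (names here are illustrative):

```python
class Node:
    def __init__(self, key, val, left=None, right=None):
        self.key, self.val, self.left, self.right = key, val, left, right
        # Augmented value: sum of all values in this subtree.
        self.sum = val + (left.sum if left else 0) + (right.sum if right else 0)

def prefix_sum(t, kr):
    """Sum of values with key <= kr."""
    if t is None:
        return 0
    if t.key <= kr:   # whole left subtree is inside: use the augmented sum
        return t.val + (t.left.sum if t.left else 0) + prefix_sum(t.right, kr)
    return prefix_sum(t.left, kr)

def suffix_sum(t, kl):
    """Sum of values with key >= kl."""
    if t is None:
        return 0
    if t.key >= kl:   # whole right subtree is inside
        return t.val + (t.right.sum if t.right else 0) + suffix_sum(t.left, kl)
    return suffix_sum(t.right, kl)

def range_sum(t, kl, kr):
    if t is None:
        return 0
    if kr < t.key:
        return range_sum(t.left, kl, kr)
    if kl > t.key:
        return range_sum(t.right, kl, kr)
    return t.val + suffix_sum(t.left, kl) + prefix_sum(t.right, kr)

# Keys 1..7, each with value equal to its key.
t = Node(4, 4,
         Node(2, 2, Node(1, 1), Node(3, 3)),
         Node(6, 6, Node(5, 5), Node(7, 7)))
```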

Applications

 1D stabbing query: interval trees
 2D range query: range trees
 2D 3-sided query: priority search trees

1D stabbing query

 Given a set of intervals [li, ri] on the number line and a point p, determine whether p is covered by any interval

(Figure: intervals (1,7), (2,6), (3,5), (4,5), (5,6), (6,7), (7,9) on the number line 1–9; stabbing query at p = 5)

 Is there an interval with li ≤ p ≤ ri?
 One possible application: for a website, given the login intervals of all users, query whether anyone is online at any given time

1D stabbing query

 Given a set of intervals [li, ri] on the number line and a point p, determine whether p is covered by any interval

 Is there an interval with li ≤ p ≤ ri?
 Possible solution:
   Find all intervals with li ≤ p
   Return the maximum right endpoint rm among them
   If p ≤ rm then return true, else return false
 If we sort all intervals by left endpoint and view “max” as an abstract “sum”, this reduces to answering a range-sum query quickly

Interval tree

 All intervals in the tree are sorted by the left endpoint. Each node is augmented with the maximum right endpoint in its subtree

(Figure: the intervals (1,7), (2,6), (3,5), (4,5), (5,6), (6,7), (7,9) stored in a BST keyed by left endpoint; each node shows (left, right) and the max right endpoint in its subtree, e.g., root (4,5) with max right 9, children (2,6) with 7 and (6,7) with 9)

Interval tree

 Parallel construction: use the divide-and-conquer algorithm
   Maintain the augmented values in the join function
 Query: return max_right(T.root, p) ≥ p

max_right(node* t, p) {
  if (!t) return -∞;
  if (p < left(t)) return max_right(t->left, p);
  // otherwise every interval in t->left, and t itself, has left endpoint ≤ p
  return max(right(t), aug(t->left), max_right(t->right, p));
}
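The interval tree can be checked end-to-end with a small Python sketch, built from the slide's example intervals. Class and function names are mine, and the tree is built directly from the left-endpoint-sorted list rather than by the parallel algorithm:

```python
class Node:
    def __init__(self, iv, left=None, right=None):
        self.iv, self.left, self.right = iv, left, right
        # Augmented value: max right endpoint in this subtree.
        self.max_right = max(iv[1],
                             left.max_right if left else float("-inf"),
                             right.max_right if right else float("-inf"))

def build(ivs, lo, hi):
    """Build a balanced tree from ivs[lo:hi), sorted by left endpoint."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    return Node(ivs[mid], build(ivs, lo, mid), build(ivs, mid + 1, hi))

def max_right(t, p):
    """Max r_i over intervals with l_i <= p."""
    if t is None:
        return float("-inf")
    if p < t.iv[0]:
        return max_right(t.left, p)
    # All of t.left and t itself have left endpoint <= p.
    rest = max(t.iv[1], t.left.max_right if t.left else float("-inf"))
    return max(rest, max_right(t.right, p))

def stabbed(t, p):
    return max_right(t, p) >= p

ivs = [(1, 7), (2, 6), (3, 5), (4, 5), (5, 6), (6, 7), (7, 9)]
t = build(ivs, 0, len(ivs))
```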

2D range query

 Given a set of points in the 2D plane, report information about all points in a query window [xL, xR] × [yL, yR]

Application: in a database, report all people (or the number of people) with age in [a1, a2] and salary in [s1, s2]

 Possible solution: find all points in [xL, xR], then among them find those in [yL, yR]
   Efficient if the points found by the first search are already sorted by their y-coordinate ……

2D Range Tree

 2D range tree: a two-level tree structure
   The augmented value of each outer tree node is itself a search tree
 Outer level: all points sorted by x-coordinate
 Inner level (in each tree node): all points in its subtree, sorted by y-coordinate

(Figure: points (1,7), (2,6), (3,3), (4,5), (5,1), (6,4), (7,2); the outer tree is keyed by x, and each node's augmented value is an inner tree of its subtree's points sorted by y)

2D Range Tree

(Figure: the outer search on [xL, xR] selects O(log n) nodes/subtrees; each contributes an inner-tree search on [yL, yR])

 A range query is done by two nested range queries, on the outer and inner levels respectively
 Report all points in the search range: O(log² n + k), where k is the output size
 Report the count in the search range: O(log² n)

 If the tree is static (i.e., no insertions or deletions), the inner trees can also be maintained as arrays
 Construction: the parallel divide-and-conquer algorithm (O(n log n) work and polylog depth)
   Maintain the augmented value (the inner tree) in the join function
   When join is called, the inner trees of the two children are already settled, so this step only needs to merge two ordered structures into one
   Use the union function above, or the array-merging algorithm (since the two ordered sets are of the same size)

3-sided query

 Given a set of points in the 2D plane, report all points in the query window [xL, xR] × [yL, +∞)

The y-coordinate is often called the priority of the point.

 Using a range tree, this can be done in O(log² n + k) time
   When k is small, log² n dominates
 Can we do better?

3-sided query

 Using a range tree, this can be done in O(log² n + k) time; can we do better?
 When we do the secondary search, what if we could access the points with higher priority first…
 Instead of a tree, can we use a heap?

Priority search tree

 A heap on y (priority)
 (Almost) a search tree on x

 The root is the point with the highest priority
 The other points are evenly split by the median of their x-coordinates, forming the left and right subtrees respectively
 Recursively build the two subtrees

Priority search tree

 Search [xL, xR] × [yL, ∞):
   Search [xL, xR] in the tree, finding at most O(log n) related nodes and subtrees - O(log n)
   For each node, check whether it is in the query range - O(log n)
   For each subtree, pop all points with priority at least yL - O(k)
 In total: O(k + log n)

Priority search tree

 Construct a priority search tree in parallel:
   Presort all points by x-coordinate and store the sorted list in an array A - O(n log n) work, O(log n) depth
 Recursively do the following:

   In parallel, find the point p with the highest priority - O(n) work, O(log n) depth
   Remove p from the input and find the median m of the remaining points
   Copy the left half of the points into AL (excluding p); recursively build the left subtree L
   Copy the right half of the points into AR (excluding p); recursively build the right subtree R

   Store p in the root; set its left child to L, right child to R, and the splitter key to m

O(n log n) work and O(log² n) depth
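The construction and query above can be sketched in Python (sequential; the two recursive build calls are the parallel part, the max-priority point is found with a scan, and all names are illustrative):

```python
class Node:
    def __init__(self, pt, split, left=None, right=None):
        self.pt, self.split, self.left, self.right = pt, split, left, right

def build(pts):
    """pts is sorted by x-coordinate. Root: highest-priority point;
    the rest are split by the median x into the two subtrees."""
    if not pts:
        return None
    i = max(range(len(pts)), key=lambda j: pts[j][1])  # highest y (priority)
    rest = pts[:i] + pts[i + 1:]                       # still sorted by x
    mid = len(rest) // 2
    split = rest[mid][0] if rest else pts[i][0]        # splitter key
    return Node(pts[i], split, build(rest[:mid]), build(rest[mid:]))

def query(t, xl, xr, yl):
    """Report all points in [xl, xr] x [yl, +inf)."""
    if t is None or t.pt[1] < yl:     # heap property: prune whole subtree
        return []
    out = [t.pt] if xl <= t.pt[0] <= xr else []
    if xl <= t.split:                 # left subtree holds x <= split
        out += query(t.left, xl, xr, yl)
    if xr >= t.split:                 # right subtree holds x >= split
        out += query(t.right, xl, xr, yl)
    return out

# The slide's example points, presorted by x.
pts = [(1, 7), (2, 6), (3, 3), (4, 5), (5, 1), (6, 4), (7, 2)]
t = build(pts)
hits = query(t, 2, 6, 4)
```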

Summary

 Parallel divide-and-conquer algorithms on trees
 Merging two trees in parallel
   Lower bound
   A divide-and-conquer algorithm
   Cost analysis

 Applications:
   1D range query and augmentations
   Solve 1D stabbing queries using interval trees
   Solve 2D range queries using range trees
   Solve 3-sided queries using priority search trees