CSE 203 Lecture Note (12/08)
1 Binary Search Tree (BST)

1.1 Basic Structure

A BST is a binary tree with a special property: for an element k, all elements smaller than k are in the left subtree, while all elements larger than k are in the right subtree.

1.2 Balancedness

If the tree is balanced, a search takes O(log n) time; if it is not balanced, a search takes O(n) time in the worst case. The usual remedy is a (often complicated) rebalancing scheme that maintains balancedness. Such a scheme typically does not keep the tree "perfectly balanced" but still guarantees O(log n) height up to constant factors. However, rebalancing may cause a lot of data mutations.

1.3 Application: TreeSort vs. QuickSort

TreeSort is implemented as follows: insert the elements into a binary search tree one by one, then traverse the tree in a certain order, such as an in-order DFS, to read them back sorted. An alternative way to sort is QuickSort, which picks an element (the pivot) at random and partitions the whole sequence into two sets: the elements smaller than the pivot and the elements larger than it. One can then recurse on the two subsets, which builds up a BST-like structure. To avoid the imbalance we encountered with BSTs, one can pick the pivots at random, which gives QuickSort an expected O(n log n) runtime.

1.4 Building the QuickSort Tree in Random Order

Motivation: if we could insert elements into a BST in a random order, the QuickSort analysis would show that the expected depth of the BST is O(log n). In practice, however, we must insert nodes dynamically, without knowing the whole input in advance.

1.4.1 Solution

Give each element a random priority, and structure the tree as if the elements had been inserted in priority order. Then, no matter which operation is applied to the current tree, the runtime is still expected O(log n).

2 Treap

2.1 Basic Structure

One always picks the element k with the lowest priority as the root.
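As a concrete illustration of TreeSort, here is a minimal Python sketch (the names `Node`, `insert`, and `tree_sort` are illustrative, not from the lecture): elements are inserted one by one into an unbalanced BST and read back with an in-order DFS.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Standard BST insertion: smaller keys go left, larger keys go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def tree_sort(items):
    """Insert all items into a BST, then read them back via in-order DFS."""
    root = None
    for x in items:
        root = insert(root, x)
    out = []
    def inorder(node):          # in-order DFS yields keys in sorted order
        if node is not None:
            inorder(node.left)
            out.append(node.key)
            inorder(node.right)
    inorder(root)
    return out

print(tree_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```

Inserting a sorted input into this sketch degenerates the tree into a path, which is exactly the O(n) worst case the random pivots/priorities are meant to avoid.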
After the root is selected, one puts all elements with keys smaller than k in the left subtree and all elements with keys larger than k in the right subtree, and recurses. In the end the following properties hold:

• A Treap is a BST w.r.t. the key values.
• A Treap is a min-heap w.r.t. the priorities.

2.2 Uniqueness

Given a list of elements, many BSTs and many min-heaps satisfy the respective constraints individually. Fortunately, by Lemma 2.1, the Treap is unique.

Lemma 2.1. For any n elements with distinct keys and distinct priorities, there is exactly one Treap on these elements.

Proof. Since all elements have distinct priorities, there is exactly one element k with the lowest priority, and it must be the root. After picking k, there is only one way to separate the remaining elements into two sets: the elements with keys smaller than k and the elements with keys larger than k. Applying the same argument recursively to the two sets shows that the Treap is unique.

2.3 A Treap Is Memoryless

A Treap depends only on the keys and priorities of the nodes currently in the structure; the order of insertions and deletions does not matter at all.

2.4 Tree Depth

Given the unique Treap on elements x_1 < x_2 < ··· < x_n, we want to know Depth(x_k), which is exactly the number of ancestors of x_k (counting x_k itself). For l < k, x_l is an ancestor of x_k if and only if

Priority(x_l) < Priority(x_{l+1}), ..., Priority(x_{k−1}), Priority(x_k),

i.e., x_l has the lowest priority among x_l, ..., x_k. If any of these inequalities fails, some element between them has a lower priority and becomes a common ancestor of x_l and x_k instead. The proofs of the "if" and "only if" directions are similar. Now we compute the probability of this event: exactly one of the k−l+1 elements x_l, ..., x_k has the lowest priority, so the probability is 1/(k−l+1). Thus the expectation of Depth(x_k) is the sum over x_k itself, its ancestors with smaller keys, and its ancestors with larger keys:

E[Depth(x_k)] = 1 + (1/2 + ··· + 1/k) + (1/2 + ··· + 1/(n−k+1)) = H_k + H_{n−k+1} − 1 = O(log n),

where H_i = 1 + 1/2 + ··· + 1/i is the i-th harmonic number.
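The recursive construction from the proof of Lemma 2.1 can be sketched directly (a toy Python sketch; `build_treap` and the tuple representation are illustrative, not from the lecture). It also demonstrates memorylessness: the result is independent of the order in which the elements are given.

```python
import random

def build_treap(items):
    """items: list of (key, priority) pairs, all keys and priorities distinct.
    Returns the unique treap as nested tuples (key, priority, left, right)."""
    if not items:
        return None
    root = min(items, key=lambda kp: kp[1])            # lowest priority is the root
    left = [kp for kp in items if kp[0] < root[0]]     # smaller keys go left
    right = [kp for kp in items if kp[0] > root[0]]    # larger keys go right
    return (root[0], root[1], build_treap(left), build_treap(right))

# Uniqueness and memorylessness in action: shuffling the input list
# does not change the resulting treap.
items = [(1, 30), (2, 10), (3, 40), (4, 20)]
t = build_treap(items)
random.shuffle(items)
assert build_treap(items) == t
```

This O(n^2)-style construction is only for exposition; the point is that the output is forced by the (key, priority) pairs alone.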
2.5 Delete Operation (merge branches)

If the to-be-deleted node is a leaf, it can be removed directly; more often than not, however, the node is an internal node, and removing it directly would leave its children orphaned. Therefore, to delete a node x, there is an algorithm that merges the two branches under x. An example is given in Figure 1.

2.6 Delete Operation (increase priority)

Motivation: one can increase the priority of x gradually, performing rotations in the tree as needed. Once x becomes a leaf, it can be removed easily. An example is given in Figure 2.

2.7 Insertion Operation (decrease priority)

This is just the deletion operation in reverse. A node x is first inserted as a leaf with infinite priority. Its priority is then decreased gradually, with rotations happening in the tree along the way. Once the node reaches its target priority, the insertion is finished.

Figure 1: An example of deleting node x (merge branches): (a) before merging the two branches; (b) after merging the two branches.

Figure 2: An example of deleting node x (increase priority): (a) before increasing the priority; (b) after increasing the priority.

2.8 Time Analysis for the Deletion and Insertion Operations

The time needed for deletion is the time to find node x plus the time to merge the two branches, which can be written as

O(log n) + O(length of the two branches) ≤ O(log n) + O(2 × tree depth) = O(log n).

The analysis for insertion is similar.

2.9 Data Mutations for the Deletion and Insertion Operations

In real applications, one may care more about the scale of the data mutations, which is O(length of the two branches); in fact, the expected number of data mutations for a Treap is much smaller. To start, consider the event that x_l belongs to the left branch and is the smallest element in it, which amounts to the following two conditions.
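The merge-based deletion of Section 2.5 can be sketched as follows (a Python sketch under the convention that a lower priority sits closer to the root; `TNode`, `merge`, and `delete` are illustrative names, not from the lecture). Note that `merge` only walks the right spine of the left branch and the left spine of the right branch, which is what the mutation analysis below counts.

```python
class TNode:
    """Treap node: a BST on key, a min-heap on prio."""
    def __init__(self, key, prio, left=None, right=None):
        self.key, self.prio = key, prio
        self.left, self.right = left, right

def merge(a, b):
    """Merge two treaps where every key in a is smaller than every key in b.
    Only the right spine of a and the left spine of b are touched."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio < b.prio:                 # a's root has the lower priority
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

def delete(root, key):
    """Delete key by replacing its node with the merge of its two branches."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:                               # found x: splice in merge(left, right)
        return merge(root.left, root.right)
    return root
```

Insertion by decreasing priority (Section 2.7) is the mirror image: the same two spines are split apart instead of merged.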
Priority(x_k) < Priority(x_{k−1}), ..., Priority(x_l);
Priority(x_l) < Priority(x_{l+1}), ..., Priority(x_{k−1}).

In other words, the probability can be written as

P(the two lowest priorities among x_l, ..., x_k are those of x_k and x_l, in that order) = 1/(k−l+1) × 1/(k−l).

Summing over l, the expected number of elements in the left branch is

1/(1×2) + 1/(2×3) + ··· + 1/((k−1)×k) = 1 − 1/k.

Thus, the expected total number of data mutations is

(1 − 1/k) + (1 − 1/(n−k+1)) = 2 − 1/k − 1/(n−k+1).

2.10 Disadvantage of a Treap

Take the splay tree as an example: the complexity of its operations is related to the entropy of the data distribution. That is to say, if the input data follow a certain (skewed) distribution, a splay tree may perform better. [Writer Note] In fact, a splay tree also supports many other operations, such as reversing a range of nodes between i and j in O(log n) time, which a Treap does not support.
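The telescoping sums in the mutation analysis of Section 2.9 can be sanity-checked with exact rational arithmetic (a quick Python check; the function names are illustrative, not from the lecture).

```python
from fractions import Fraction

def spine_sum(k):
    """Expected length of the merged left branch when deleting x_k:
    sum over j = 1..k-1 of 1/(j*(j+1)), which telescopes to 1 - 1/k."""
    return sum(Fraction(1, j * (j + 1)) for j in range(1, k))

def expected_mutations(n, k):
    """Expected total length of the two merged branches when deleting x_k
    from a treap on n keys: (1 - 1/k) + (1 - 1/(n-k+1))."""
    return spine_sum(k) + spine_sum(n - k + 1)

# The closed forms match the sums exactly, not just approximately.
assert spine_sum(10) == 1 - Fraction(1, 10)
assert expected_mutations(100, 40) == 2 - Fraction(1, 40) - Fraction(1, 61)
```

So deleting or inserting any single key mutates fewer than two pointers in expectation, even though the search path has expected length Θ(log n).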