CSE 203 Lecture Note (12/08)

1 Binary Search Tree (BST)

1.1 Basic Structure

A BST is a binary tree with one special property: for every node with key k, all keys smaller than k are in its left subtree, and all keys larger than k are in its right subtree.

1.2 Balancedness

If the tree is balanced, a search takes O(log n) time; if it is not balanced, a search takes O(n) time in the worst case. The usual remedy is a more complicated scheme that maintains balance. Such a scheme typically does not keep the tree "perfectly balanced", but it keeps the depth within a constant factor of log n, so a search still takes O(log n) time. The drawback is that rebalancing may cause a lot of data mutations.

1.3 Application: TreeSort vs. QuickSort

TreeSort is implemented as follows: insert the elements into a binary search tree one by one, then output all the elements with an in-order traversal (DFS).

An alternative is QuickSort, which picks an element (the pivot) and splits the sequence into two sets: the elements smaller than the pivot and the elements larger than it. It then recurses on the two sets, which implicitly builds a BST-like structure whose nodes are the pivots. To avoid the imbalance we encountered with BSTs, one picks the pivots at random, which gives an expected O(n log n) running time for QuickSort.

1.4 Build QuickSort Tree in random order

Motivation: if we could insert the elements into a BST in random order, the QuickSort analysis shows that the expected depth of a node is O(log n). In practice, however, nodes must be inserted dynamically, without knowing the whole input in advance.

1.4.1 Solution

Each element is given a random priority, and the tree is structured as if the elements had been inserted in priority order. As a result, no matter which operations have been applied to the tree so far, each operation still runs in expected O(log n) time.
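The TreeSort procedure of Section 1.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the names `Node`, `bst_insert`, `in_order`, and `tree_sort` are made up here, and shuffling the input before insertion is an assumption that mimics the "random insertion order" idea of Section 1.4 when the whole input is known in advance.

```python
import random


class Node:
    """A BST node: smaller keys go left, larger (or equal) keys go right."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert `key` iteratively and return the root of the tree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def in_order(root):
    """Yield the keys in sorted order (in-order DFS with an explicit stack)."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


def tree_sort(items, shuffle=True, seed=None):
    """TreeSort: insert every item into a BST, then read the keys back in order."""
    items = list(items)
    if shuffle:
        # Assumption: the whole input is available, so it can be shuffled first.
        random.Random(seed).shuffle(items)
    root = None
    for item in items:
        root = bst_insert(root, item)
    return list(in_order(root))
```

With `shuffle=False` and an already sorted input the tree degenerates into a path, which is the O(n)-depth worst case described in Section 1.2; the result is still correct, only slower.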
2 Treap

2.1 Basic Structure

The element k with the lowest priority is always chosen as the root. All elements with keys smaller than k are put in the left subtree and all elements with keys larger than k are put in the right subtree; both subtrees are then built recursively in the same way. In the end the following properties hold:

• A treap is a BST w.r.t. key values.
• A treap is a min-heap w.r.t. priorities.

2.2 Uniqueness

Given a list of elements, there are many possible BSTs on the keys and many possible min-heaps on the priorities. Fortunately, by Lemma 2.1, there is only one treap that satisfies both constraints.

Lemma 2.1. For any n elements with distinct keys and distinct priorities, there is exactly one treap on these elements.

Proof. Since all elements have different priorities, exactly one element k has the lowest priority, and it must be the root. Once k is fixed, there is only one way to split the remaining elements into two sets: the elements with keys smaller than k and the elements with keys larger than k. Applying the same argument recursively to the two sets shows that the treap is unique.

2.3 Treap is memoryless

The shape of a treap depends only on the keys and priorities of the nodes currently in the structure; the order of insertions and deletions does not matter at all.

2.4 Tree Depth

Consider the unique treap on elements $x_1 < x_2 < \cdots < x_n$; we want to know Depth($x_k$). Depth($x_k$) is exactly the number of ancestors of $x_k$, counting $x_k$ itself. For $l < k$, $x_l$ is an ancestor of $x_k$ if and only if

\[ \mathrm{Priority}(x_l) < \mathrm{Priority}(x_{l+1}),\ \ldots,\ \mathrm{Priority}(x_{k-1}),\ \mathrm{Priority}(x_k). \]

If any of these inequalities fails, the element with the lowest priority among $x_l, \ldots, x_k$ is not $x_l$: it is either $x_k$ itself or a common ancestor that has $x_l$ in its left subtree and $x_k$ in its right subtree, so $x_l$ cannot be an ancestor of $x_k$. The proofs for "if" and "only if" are similar.

Now we want the probability of this event. Since the priorities are random, each of the $k-l+1$ elements $x_l, \ldots, x_k$ is equally likely to have the lowest priority, and only one outcome (the lowest is $x_l$) is favorable, so the probability is $\frac{1}{k-l+1}$. By symmetry, for $l > k$ the probability is $\frac{1}{l-k+1}$.
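The ancestor probability above can be checked by brute force for small n. The following is a minimal sketch under stated assumptions: the helper names `ancestors` and `ancestor_probability` are hypothetical, elements are indexed from 0 (so the notes' $x_l$ corresponds to index l - 1), and "random priorities" is modeled by enumerating every ordering of n distinct priorities with equal weight.

```python
from fractions import Fraction
from itertools import permutations


def ancestors(priorities, k):
    """Return the indices of the proper ancestors of element k (0-based).

    priorities[i] is the priority of the element with the i-th smallest key.
    The treap is never built explicitly: within a key range, the element
    with the lowest priority is the root of that range (Section 2.1).
    """
    lo, hi = 0, len(priorities) - 1
    result = []
    while True:
        root = min(range(lo, hi + 1), key=lambda i: priorities[i])
        if root == k:
            return result
        result.append(root)
        if k < root:
            hi = root - 1  # k lies in the left subtree of root
        else:
            lo = root + 1  # k lies in the right subtree of root


def ancestor_probability(n, l, k):
    """Exact probability that element l is an ancestor of element k."""
    hits = total = 0
    for perm in permutations(range(n)):
        total += 1
        if l in ancestors(perm, k):
            hits += 1
    return Fraction(hits, total)
```

For example, `ancestor_probability(6, 1, 4)` looks at the four elements with indices 1 through 4 and should agree with 1/(k - l + 1) = 1/4. Enumeration costs n! steps, so this is only meant as a sanity check for very small n.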
Thus, the expected depth of $x_k$ is the sum of three contributions: $x_k$ itself, the elements to its left, and the elements to its right:

\[ E[\mathrm{Depth}(x_k)] = 1 + \left(\frac{1}{2} + \cdots + \frac{1}{k}\right) + \left(\frac{1}{2} + \cdots + \frac{1}{n-k+1}\right) = H_k + H_{n-k+1} - 1 = O(\log n), \]

where $H_i = 1 + \frac{1}{2} + \cdots + \frac{1}{i}$ is the i-th harmonic number.

2.5 Delete Operation (merge branches)

If the node to be deleted is a leaf, it can simply be removed. More often than not, however, the node is an inner node, and deleting it directly would leave orphaned nodes. Therefore, to delete a node x, one uses an algorithm that merges the two branches under x. An example is given in Figure 1.

2.6 Delete Operation (increase priority)

Motivation: one can increase the priority of x gradually, performing rotations in the tree as x moves down. Once x becomes a leaf, it can be removed from the tree easily. An example is given in Figure 2.

2.7 Insertion Operation (decrease priority)

Insertion is the reverse of deletion. Node x is first inserted as a leaf with infinite priority. Its priority is then decreased gradually, with rotations moving x up the tree. When x reaches its target priority, the insertion is finished.

[Figure 1: An example of node x deletion (merge branches). (a) Before merging two branches. (b) After merging two branches.]

[Figure 2: An example of node x deletion (increase priority). (a) Before increasing priority. (b) After increasing priority.]

2.8 Time analysis for Deletion and Insertion Operation

The time needed for deletion is the time to find node x plus the time to merge the two branches, which can be written as

\[ O(\log n) + O(\text{length of two branches}) \le O(\log n) + O(2 \times \text{tree depth}) = O(\log n) \]

in expectation. The analysis for insertion is similar.

2.9 Data mutation for Deletion and Insertion Operation

In real applications, one might care more about the number of data mutations, which is O(length of two branches). In fact, the expected number of data mutations in a treap is much smaller than the O(log n) depth bound.
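To start with, consider the event that $x_l$ (with $l < k$) belongs to the left branch under $x_k$. This requires that $x_k$ is an ancestor of $x_l$ and that $x_l$ has the lowest priority among $x_l, \ldots, x_{k-1}$, i.e. the following two conditions:

\[ \mathrm{Priority}(x_k) < \mathrm{Priority}(x_{k-1}),\ \ldots,\ \mathrm{Priority}(x_l) \]
\[ \mathrm{Priority}(x_l) < \mathrm{Priority}(x_{l+1}),\ \ldots,\ \mathrm{Priority}(x_{k-1}) \]

In other words, among $x_l, \ldots, x_k$ the two lowest priorities belong to $x_k$ and $x_l$, in that order, so the probability can be written as

\[ P(\text{the two lowest priorities among } x_l, \ldots, x_k \text{ are those of } x_k, x_l \text{ in this order}) = \frac{1}{k-l+1} \cdot \frac{1}{k-l}. \]

Summing over $l = 1, \ldots, k-1$, the expected number of elements in the left branch is

\[ \frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \cdots + \frac{1}{(k-1) \times k} = 1 - \frac{1}{k}. \]

By symmetry, the expected number of elements in the right branch is $1 - \frac{1}{n-k+1}$. Thus, the expected total number of data mutations is

\[ \left(1 - \frac{1}{k}\right) + \left(1 - \frac{1}{n-k+1}\right) = 2 - \frac{1}{k} - \frac{1}{n-k+1}. \]

2.10 Disadvantage of Treap

Take the splay tree as a comparison: the cost of its operations is related to the entropy of the data distribution. That is to say, if the input data follow a certain (skewed) distribution, a splay tree may perform better than a treap, whose random priorities do not adapt to the input.

[Writer Note] In fact, the splay tree also supports many other operations, such as reversing a range of nodes between i and j in O(log n) time, which is not supported by the treap as described here.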
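As a supplement to Sections 2.6 and 2.7, the rotation-based insertion and deletion can be sketched as follows. This is a hedged illustration rather than the lecture's own code: the names `TreapNode`, `rotate_left`, `rotate_right`, `insert`, `delete`, and `shape` are assumptions, priorities are drawn with `random.random()` unless given explicitly, and a node with a single child is spliced out directly, which gives the same result as rotating it down to a leaf.

```python
import random


class TreapNode:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift node.left above node and return the new subtree root."""
    pivot = node.left
    node.left, pivot.right = pivot.right, node
    return pivot


def rotate_left(node):
    """Lift node.right above node and return the new subtree root."""
    pivot = node.right
    node.right, pivot.left = pivot.left, node
    return pivot


def insert(node, key, priority=None):
    """Insert as a leaf, then rotate it up while the min-heap order is violated."""
    if node is None:
        return TreapNode(key, random.random() if priority is None else priority)
    if key < node.key:
        node.left = insert(node.left, key, priority)
        if node.left.priority < node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, priority)
        if node.right.priority < node.priority:
            node = rotate_left(node)
    return node  # an existing key is left unchanged


def delete(node, key):
    """Rotate the target down as if its priority were infinite, then cut it off."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:
        return node.right  # zero or one child: splice the node out
    elif node.right is None:
        return node.left
    elif node.left.priority < node.right.priority:
        node = rotate_right(node)  # the lower-priority child moves up
        node.right = delete(node.right, key)
    else:
        node = rotate_left(node)
        node.left = delete(node.left, key)
    return node


def shape(node):
    """Nested-tuple view (key, left, right), handy for comparing tree shapes."""
    if node is None:
        return None
    return (node.key, shape(node.left), shape(node.right))
```

Passing explicit priorities makes the memoryless property of Section 2.3 easy to observe: inserting the same (key, priority) pairs in any order gives the same `shape`, and deleting a key gives the same tree as never inserting it.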