Randomized Search Trees

Randomized Search Trees Raimund Seidel Fachb erich Informatik Computer Science Division Universitat des Saarlandes University of California Berkeley D Saarbruc ken GERMANY Berkeley CA y Cecilia R Aragon Computer Science Division University of California Berkeley Berkeley CA Abstract We present a randomized strategy for maintaining balance in dynamically changing search trees that has optimal expected b ehavior In particular in the exp ected case a search or an up date takes logarithmic time with the up date requiring fewer than two rotations Moreover the up date time remains logarithmic even if the cost of a rotation is taken to b e prop ortional to the size of the rotated subtree Finger searches and splits and joins can be p erformed in optimal exp ected time also We show that these results continue to hold even if very little true randomness is available ie if only a logarithmic numb er of truely random bits are available Our approach generalizes naturally to weighted trees where the exp ected time b ounds for accesses and up dates again match the worst case time b ounds of the b est deterministic metho ds We also discuss ways of implementing our randomized strategy so that no explicit balance information is maintained Our balancing strategy and our algorithms are exceedingly simple and should b e fast in practice This pap er is dedicated to the memory of Gene Law ler Intro duction Storing sets of items so as to allow for fast access to an item given its key is a ubiquitous problem in computer science When the keys are drawn from a large totally ordered set the metho d of choice for storing the items is usually some sort of search tree The simplest form of such a tree is a binary search tree Here a set X of n items is stored at the no des of a ro oted binary tree as follows some item y X is chosen to be stored at the ro ot of the tree and the left and right children of the ro ot are binary search trees for the sets X fx X j xk ey ykeyg and Supp orted by NSF Presidential Young Investigator award CCR Email seidelcsunisbde y Supp orted byanATT graduate fellowship X fx X j ykey xk ey g resp ectively The time necessary to access some item in such a tree is then essentially determined by the depth of the no de at which the item is stored Thus it is desirable that all no des in the tree have small depth This can easily be achieved if the set X is known in advance and the search tree can b e constructed oline One only needs to balance the tree by enforcing that X and X dierinsizeby at most one This ensures that no no de has depth exceeding log n When the set of items changes with time and items can b e inserted and deleted unpredictably ensuring small depth of all the no des in the changing search tree is less straightforward Nonethe less a fair numb er of strategies have b een develop ed for maintaining approximate balance in such changing search trees Examples are AVLtrees a btrees BBtrees redblack trees and many others All these classes of trees guarantee that accesses and up dates can b e p erformed in O log nworst case time Some sort of balance information stored with the no des is used for the restructuring during up dates All these trees can b e implemented so that the restruc turing can b e done via small lo cal changes known as rotations see Fig Moreover with the appropriate choice of parameters a btrees and BBtrees guarantee that the average number of rotations p er up date is constant where the average is taken over a sequence of m up dates It can even b e shown that most rotations o ccur close to the leaves roughly sp eaking for BBtrees this means that the number of times that some subtree of size s is rotated is O ms see This fact is imp ortant for the parallel use of these search trees and also for applications in compu tational geometry where the no des of a primary tree have secondary search structures asso ciated with them that have to be completely recomputed up on rotation in the primary tree eg range trees and segment trees see Rotate Left y x x y C A AB BC Rotate Right Figure Sometimes it is desirable that some items can b e accessed more easily than others For instance if the access frequencies for the dierent items are known in advance then these items should b e stored in a search tree so that items with high access frequency are close to the ro ot For the static case an optimal tree of this kind can be constructed oline by a dynamic programming technique For the dynamic case strategies are known such as biased trees and D trees that allow accessing an item of weight w in worst case time O log Ww which is basically optimal Here W is the sum of the weights of all the items in the tree Up dates can b e p erformed in time O log W minfw wwg where w and w are the weights of the items that precede and succeed the inserteddeleted item whose weightisw All the strategies discussed so far involve reasonably complex restructuring algorithms that re quire some balance information to b e stored with the tree no des However Brown has p ointed out that some of the unweighted schemes can be implemented without storing any balance infor mation explicitly This is b est illustrated with schemes suchasAVLtrees or redblack trees which require only one bit to b e stored with every no de this bit can b e implicitly enco ded by the order in whichthetwochildren p ointers are stored Since the identities of the children can b e recovered from their keys in constant time this leads to only constant overhead to the search and up date times whichthus remain logarithmic There are metho ds that require absolutely no balance information to b e maintained A partiu carly attractive one was prop osed by Sleator and Tarjan Their splay trees use an extremely simple restructuring strategy and still achieve all the access and up date time bounds mentioned ab ove b oth for the unweighted and for the weighted case where the weights do not even need to be known to the algorithm However the time b ounds are not to be taken as worst case b ounds for individual op erations but as amortized b ounds ie b ounds averaged over a suciently long sequence of op erations Since in many applications one p erforms long sequences of access and up date op erations such amortized b ounds are often satisfactory In spite of their elegant simplicity and their frugality in the use of storage space splay trees do have some drawbacks In particular they require a substantial amount of restructuring not only during up dates but also during accesses This makes them unusable for structures such as range trees and segment trees in which rotations are exp ensive Moreover this is undesirable in a caching or paging environment where the writes involved in the restructuring will dirty memory lo cations or pages that might otherwise stay clean Recently Galp erin and Rivest prop osed a new scheme called scap egoat trees which also needs basically no balance information at all and achieves logarithmic search time even in the worst case However logarithmic up date time is achieved only in the amortized sense Scap egoat trees also do not seem to lend themselves to applications such as range trees or segment trees In this pap er we present a strategy for balancing unweighted or weighted search trees that is based on randomization Weachieve expectedcase b ounds that are comparable to the deterministic worst case or amortized b ounds mentioned ab ove Here the exp ectation is taken over all p ossible sequences of coin ips in the up date algorithms Thus our b ounds do not rely on any assump tions ab out the input Our strategy and algorithms are exceedingly simple and should b e fast in practice For unweighted trees our strategy can b e implemented without storage space for balance information Randomized search trees are not the only randomized data structure for storing dynamic or dered sets Bill Pugh has prop osed and p opularized another randomized scheme called skip lists Although the twoschemes are quite dierenttheyhave almost identical exp ected p erformace characteristics We oer a brief comparison in the last section Section of the pap er describ es treaps the basic structure underlying randomized search trees In section unweighted and weighted randomized search trees are dened and all our main results ab out them are tabulated Section contains the analysis of various exp ected quantities in ran domized search trees such as exp ected depth of a no de or exp ected subtree size These results are then used in section where the various op erations on randomized search trees are describ ed and their running times are analyzed Section discusses how randomized search trees can be implemented using only very few truly random bits In section we showhow one can implement randomized search trees without maintaining explicit balance information In section we oer a short comparison of randomized search trees and skip lists V L Z X G S V A K P U W Y G Z X T M D J Q L A S W Y D K P U V J Z G M Q T S X A V W Y U D L G Z T K P S X A J M Q U W Y K D T L J V P G Z M Q S X A V U W Y K D G Z T J P S X A L Q W Y U K D M T J P Q M Figure DeletionInsertion of item L L Treaps Let X be a set of n items eachofwhich has asso ciated with it a key and a priority The keys are drawn from some totally ordered universe and so are the priorities The two ordered universes need not b e the same A treap for X is a ro oted binary tree with no de set X that is

Randomized Search Trees

Curriculum for Second Year of Computer Engineering (2019 Course) (With Effect from 2020-21)

KP-Trie Algorithm for Update and Search Operations

Balanced Trees Part One

Lecture 04 Linear Structures Sort

CSE 326: Data Structures Binary Search Trees

Assignment 3: Kdtree ______Due June 4, 11:59 PM

Binary Search Tree

Position Heaps for Cartesian-Tree Matching on Strings and Tries

Search Trees for Strings a Balanced Binary Search Tree Is a Powerful Data Structure That Stores a Set of Objects and Supports Many Operations Including

Assignment of Master's Thesis

Balanced Binary Search Trees 1 Introduction 2 BST Analysis

Lecture Notes of CSCI5610 Advanced Data Structures