Randomized Search Trees

Raimund Seidel*

Fachbereich Informatik, Universität des Saarlandes, Saarbrücken, Germany
Computer Science Division, University of California Berkeley, Berkeley, CA

Cecilia R. Aragon†

Computer Science Division, University of California Berkeley, Berkeley, CA

Abstract

We present a randomized strategy for maintaining balance in dynamically changing search trees that has optimal expected behavior. In particular, in the expected case a search or an update takes logarithmic time, with the update requiring fewer than two rotations. Moreover, the update time remains logarithmic, even if the cost of a rotation is taken to be proportional to the size of the rotated subtree. Finger searches and splits and joins can be performed in optimal expected time also. We show that these results continue to hold even if very little true randomness is available, i.e. if only a logarithmic number of truly random bits are available. Our approach generalizes naturally to weighted trees, where the expected time bounds for accesses and updates again match the worst-case time bounds of the best deterministic methods.

We also discuss ways of implementing our randomized strategy so that no explicit balance information is maintained. Our balancing strategy and our algorithms are exceedingly simple and should be fast in practice.

This paper is dedicated to the memory of Gene Lawler.

Introduction

Storing sets of items so as to allow for fast access to an item given its key is a ubiquitous problem in computer science. When the keys are drawn from a large totally ordered set, the method of choice for storing the items is usually some sort of search tree. The simplest form of such a tree is a binary search tree. Here a set $X$ of $n$ items is stored at the nodes of a rooted binary tree as follows: some item $y \in X$ is chosen to be stored at the root of the tree, and the left and right children of the root are binary search trees for the sets $X_< = \{x \in X \mid x.key < y.key\}$ and $X_> = \{x \in X \mid y.key < x.key\}$, respectively. The time necessary to access some item in such a tree is then essentially determined by the depth of the node at which the item is stored. Thus it is desirable that all nodes in the tree have small depth. This can easily be achieved if the set $X$ is known in advance and the search tree can be constructed off-line. One only needs to "balance" the tree by enforcing that $X_<$ and $X_>$ differ in size by at most one. This ensures that no node has depth exceeding $\log n$.

* Supported by NSF Presidential Young Investigator award CCR. Email: seidel@cs.uni-sb.de
† Supported by an AT&T graduate fellowship.
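The off-line balancing step described above can be sketched as follows. This is a minimal illustration, not code from the paper: the names `Node`, `build_balanced`, and `max_depth` are hypothetical, keys are assumed to arrive sorted, and depth is counted from 0 at the root, so the largest depth comes out as $\lfloor \log_2 n \rfloor$.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_balanced(keys: Sequence[int]) -> Optional[Node]:
    """Build a binary search tree from sorted keys, rooting each subtree at a median.

    Choosing a median keeps the sizes of the two subtrees within one of each other.
    """
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build_balanced(keys[:mid]), build_balanced(keys[mid + 1:]))


def max_depth(t: Optional[Node], depth: int = 0) -> int:
    """Largest depth of any node (root has depth 0); -1 for an empty tree."""
    if t is None:
        return depth - 1
    return max(max_depth(t.left, depth + 1), max_depth(t.right, depth + 1))
```

For example, `max_depth(build_balanced(list(range(1000))))` can be compared against `(1000).bit_length() - 1` to see the logarithmic depth.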

When the set of items changes with time and items can be inserted and deleted unpredictably, ensuring small depth of all the nodes in the changing search tree is less straightforward. Nonetheless, a fair number of strategies have been developed for maintaining approximate balance in such changing search trees. Examples are AVL-trees, (a,b)-trees, BB(α)-trees, red-black trees, and many others. All these classes of trees guarantee that accesses and updates can be performed in $O(\log n)$ worst-case time. Some sort of balance information stored with the nodes is used for the restructuring during updates. All these trees can be implemented so that the restructuring can be done via small local changes known as "rotations" (see Figure 1). Moreover, with the appropriate choice of parameters, (a,b)-trees and BB(α)-trees guarantee that the average number of rotations per update is constant, where the average is taken over a sequence of $m$ updates. It can even be shown that "most" rotations occur "close" to the leaves; roughly speaking, for BB(α)-trees this means that the number of times that some subtree of size $s$ is rotated is $O(m/s)$. This fact is important for the parallel use of these search trees, and also for applications in computational geometry where the nodes of a primary tree have secondary search structures associated with them that have to be completely recomputed upon rotation in the primary tree (e.g. range trees and segment trees).

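Figure 1: Rotations. [Diagram: a node y with left child x (subtrees A and B) and right subtree C; Rotate Right turns it into a node x with left subtree A and right child y (subtrees B and C); Rotate Left is the inverse.]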

Sometimes it is desirable that some items can be accessed more easily than others. For instance, if the access frequencies for the different items are known in advance, then these items should be stored in a search tree so that items with high access frequency are close to the root. For the static case an "optimal" tree of this kind can be constructed off-line by a dynamic programming technique. For the dynamic case strategies are known, such as biased trees and D-trees, that allow accessing an item of "weight" $w$ in worst-case time $O(\log(W/w))$, which is basically optimal. (Here $W$ is the sum of the weights of all the items in the tree.) Updates can be performed in time $O(\log(W/\min\{w^-, w, w^+\}))$, where $w^-$ and $w^+$ are the weights of the items that precede and succeed the inserted/deleted item (whose weight is $w$).

All the strategies discussed so far involve reasonably complex restructuring algorithms that require some balance information to be stored with the tree nodes. However, Brown has pointed out that some of the unweighted schemes can be implemented without storing any balance information explicitly. This is best illustrated with schemes such as AVL-trees or red-black trees, which require only one bit to be stored with every node: this bit can be implicitly encoded by the order in which the two children pointers are stored. Since the identities of the children can be recovered from their keys in constant time, this leads to only constant overhead to the search and update times, which thus remain logarithmic.

There are methods that require absolutely no balance information to be maintained. A particularly attractive one was proposed by Sleator and Tarjan. Their "splay trees" use an extremely simple restructuring strategy and still achieve all the access and update time bounds mentioned above, both for the unweighted and for the weighted case (where the weights do not even need to be known to the algorithm). However, the time bounds are not to be taken as worst-case bounds for individual operations, but as amortized bounds, i.e. bounds averaged over a (sufficiently long) sequence of operations. Since in many applications one performs long sequences of access and update operations, such amortized bounds are often satisfactory.

In spite of their elegant simplicity and their frugality in the use of storage space, splay trees do have some drawbacks. In particular, they require a substantial amount of restructuring not only during updates, but also during accesses. This makes them unusable for structures such as range trees and segment trees, in which rotations are expensive. Moreover, this is undesirable in a caching or paging environment, where the writes involved in the restructuring will dirty memory locations or pages that might otherwise stay clean.

Recently Galperin and Rivest proposed a new scheme called "scapegoat trees," which also needs basically no balance information at all and achieves logarithmic search time even in the worst case. However, logarithmic update time is achieved only in the amortized sense. Scapegoat trees also do not seem to lend themselves to applications such as range trees or segment trees.

In this paper we present a strategy for balancing unweighted or weighted search trees that is based on randomization. We achieve expected-case bounds that are comparable to the deterministic worst-case or amortized bounds mentioned above. Here the expectation is taken over all possible sequences of "coin flips" in the update algorithms. Thus our bounds do not rely on any assumptions about the input. Our strategy and algorithms are exceedingly simple and should be fast in practice. For unweighted trees our strategy can be implemented without storage space for balance information.

Randomized search trees are not the only randomized data structure for storing dynamic ordered sets. Bill Pugh has proposed and popularized another randomized scheme called skip lists. Although the two schemes are quite different, they have almost identical expected performance characteristics. We offer a brief comparison in the last section.

Section 2 of the paper describes treaps, the basic structure underlying randomized search trees. In section 3 unweighted and weighted randomized search trees are defined and all our main results about them are tabulated. Section 4 contains the analysis of various expected quantities in randomized search trees, such as expected depth of a node or expected subtree size. These results are then used in section 5, where the various operations on randomized search trees are described and their running times are analyzed. Section 6 discusses how randomized search trees can be implemented using only very few truly random bits. In section 7 we show how one can implement randomized search trees without maintaining explicit balance information. In section 8 we offer a short comparison of randomized search trees and skip lists.



Figure 2: Deletion/Insertion of item L. [Diagram: a sequence of treaps in which item L is rotated down into leaf position and clipped away; read in reverse, the sequence shows the insertion of L.]

Treaps

Let $X$ be a set of $n$ items each of which has associated with it a key and a priority. The keys are drawn from some totally ordered universe, and so are the priorities. The two ordered universes need not be the same. A treap¹ for $X$ is a rooted binary tree with node set $X$ that is arranged in in-order with respect to the keys and in heap-order with respect to the priorities. "In-order" means that for any node $x$ in the tree $y.key \le x.key$ for all $y$ in the left subtree of $x$ and $x.key \le y.key$ for $y$ in the right subtree of $x$. "Heap-order" means that for any node $x$ with parent $z$ the relation $x.priority \le z.priority$ holds. It is easy to see that for any set $X$ such a treap exists. With the assumption that all the priorities and all the keys of the items in $X$ are distinct (a reasonable assumption for the purposes of this paper) the treap for $X$ is unique: the item with largest priority becomes the root, and the allotment of the remaining items to the left and right subtree is then determined by their keys. Put differently, the treap for an item set $X$ is exactly the binary search tree that results from successively inserting the items of $X$ in order of decreasing priority into an initially empty tree using the usual leaf insertion algorithm for binary search trees.
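The two characterizations of the unique treap given above can be compared directly. The following is a small sketch under the stated assumption of distinct keys and distinct priorities; the function names are hypothetical and items are represented as `(key, priority)` pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    priority: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_leaf_insert(root: Optional[Node], key: int, priority: float) -> Node:
    """Usual leaf insertion into a binary search tree (no rotations)."""
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = bst_leaf_insert(root.left, key, priority)
    else:
        root.right = bst_leaf_insert(root.right, key, priority)
    return root


def treap_by_sorted_insertion(items: List[Tuple[int, float]]) -> Optional[Node]:
    """Insert the items in order of decreasing priority."""
    root = None
    for key, priority in sorted(items, key=lambda kp: kp[1], reverse=True):
        root = bst_leaf_insert(root, key, priority)
    return root


def treap_by_definition(items: List[Tuple[int, float]]) -> Optional[Node]:
    """Largest priority becomes the root; keys decide the left/right allotment."""
    if not items:
        return None
    k, p = max(items, key=lambda kp: kp[1])
    smaller = [kp for kp in items if kp[0] < k]
    larger = [kp for kp in items if kp[0] > k]
    return Node(k, p, treap_by_definition(smaller), treap_by_definition(larger))
```

Because `Node` is a dataclass, `==` compares two trees structurally, so the two constructions can be checked against each other on any item list with distinct keys and priorities.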

Let $T$ be the treap storing set $X$. Given the key of some item $x \in X$ the item can easily be located in $T$ via the usual search tree algorithm. The time necessary to perform this access will be proportional to the depth of $x$ in the tree $T$. How about updates? The insertion of a new item $z$ into $T$ can be achieved as follows: At first, using the key of $z$, attach $z$ to $T$ in the appropriate leaf position. At this point the keys of all the nodes in the modified tree are in in-order. However, the heap-order condition might not be satisfied, i.e. $z$'s parent might have a smaller priority than $z$. To reestablish heap-order simply rotate $z$ up as long as it has a parent with smaller priority (or until it becomes the root). Deletion of an item $x$ from $T$ can be achieved by "inverting" the insertion operation: First locate $x$, then rotate it down until it becomes a leaf (where the decision to rotate left or right is dictated by the relative order of the priorities of the children of $x$), and finally clip away the leaf (see Figure 2).
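The insertion and deletion procedures just described can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: it uses `None` for empty subtrees and returns new subtree roots rather than relying on call-by-reference, and the helper `rst_insert`, which draws a uniform random priority, is a hypothetical convenience.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    priority: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(t: Node) -> Node:
    """Make t's left child the root of this subtree."""
    x = t.left
    t.left, x.right = x.right, t
    return x


def rotate_left(t: Node) -> Node:
    """Make t's right child the root of this subtree."""
    x = t.right
    t.right, x.left = x.left, t
    return x


def insert(t: Optional[Node], key: int, priority: float) -> Node:
    """Attach as a leaf, then rotate up while the parent has smaller priority."""
    if t is None:
        return Node(key, priority)
    if key < t.key:
        t.left = insert(t.left, key, priority)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    elif key > t.key:
        t.right = insert(t.right, key, priority)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t  # key already present: tree unchanged


def root_delete(t: Node) -> Optional[Node]:
    """Rotate t down until it is a leaf, then clip it away."""
    if t.left is None and t.right is None:
        return None
    # The child with the larger priority is rotated up.
    if t.right is None or (t.left is not None and t.left.priority > t.right.priority):
        t = rotate_right(t)
        t.right = root_delete(t.right)
    else:
        t = rotate_left(t)
        t.left = root_delete(t.left)
    return t


def delete(t: Optional[Node], key: int) -> Optional[Node]:
    """Locate the key, then delete the node found (no-op if the key is absent)."""
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    else:
        t = root_delete(t)
    return t


def rst_insert(t: Optional[Node], key: int, rng: random.Random) -> Node:
    """Insert with a fresh uniform random priority (hypothetical helper)."""
    return insert(t, key, rng.random())


def keys(t: Optional[Node]) -> List[int]:
    """Keys in in-order."""
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


def heap_ordered(t: Optional[Node]) -> bool:
    """True if no child has a larger priority than its parent."""
    if t is None:
        return True
    for c in (t.left, t.right):
        if c is not None and c.priority > t.priority:
            return False
    return heap_ordered(t.left) and heap_ordered(t.right)
```

Returning the new subtree root from each call plays the role that call-by-reference plays in pseudocode formulations.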

At times it is desirable to be able to split a set $X$ of items into the set $X_1 = \{x \in X \mid x.key < a\}$ and the set $X_2 = \{x \in X \mid x.key > a\}$, where $a$ is some given element of the key universe. Conversely, one might want to join two sets $X_1$ and $X_2$ into one, where it is assumed that the keys of the items in $X_1$ are smaller than the keys from $X_2$. With treap representations of the sets these operations can be performed easily via the insertion and deletion operations.² In order to split a treap storing $X$ according to some $a$, simply insert an item with key $a$ and "infinite" priority. By the heap-order property the newly inserted item will be at the root of the new treap. By the in-order property the left subtree of the root will be a treap for $X_1$ and the right subtree will be a treap for $X_2$. In order to join the treaps of two sets $X_1$ and $X_2$ as above, simply create a dummy root whose left subtree is a treap for $X_1$ and whose right subtree is a treap for $X_2$, and perform a delete operation on the dummy root.
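A sketch of splitting and joining exactly as described above, via an insertion with infinite priority and a deletion of a dummy root. The code is self-contained and illustrative only; it assumes that the split key `a` does not occur in the treap, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: float
    priority: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(t: Node) -> Node:
    x = t.left
    t.left, x.right = x.right, t
    return x


def rotate_left(t: Node) -> Node:
    x = t.right
    t.right, x.left = x.left, t
    return x


def insert(t: Optional[Node], key: float, priority: float) -> Node:
    """Leaf insertion followed by rotations that restore heap-order."""
    if t is None:
        return Node(key, priority)
    if key < t.key:
        t.left = insert(t.left, key, priority)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    elif key > t.key:
        t.right = insert(t.right, key, priority)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t


def root_delete(t: Node) -> Optional[Node]:
    """Rotate t down to a leaf and clip it away; only child priorities are compared."""
    if t.left is None and t.right is None:
        return None
    if t.right is None or (t.left is not None and t.left.priority > t.right.priority):
        t = rotate_right(t)
        t.right = root_delete(t.right)
    else:
        t = rotate_left(t)
        t.left = root_delete(t.left)
    return t


def split(t: Optional[Node], a: float) -> Tuple[Optional[Node], Optional[Node]]:
    """Split by key a (assumed absent from t): insert a with infinite priority."""
    t = insert(t, a, math.inf)
    return t.left, t.right


def join(t1: Optional[Node], t2: Optional[Node]) -> Optional[Node]:
    """Join two treaps, all keys of t1 being smaller than all keys of t2."""
    dummy = Node(key=math.nan, priority=-math.inf, left=t1, right=t2)  # key is never compared
    return root_delete(dummy)


def keys(t: Optional[Node]) -> List[float]:
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


def heap_ordered(t: Optional[Node]) -> bool:
    if t is None:
        return True
    for c in (t.left, t.right):
        if c is not None and c.priority > t.priority:
            return False
    return heap_ordered(t.left) and heap_ordered(t.right)
```

Since the treap for a given item set is unique when keys and priorities are distinct, joining the two halves of a split should give back a tree equal to the original one.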

Recursive pseudocode implementations of these elementary treap update algorithms are shown in Figure 3.

Sometimes "handles" or "fingers" are available that point to certain nodes in a treap. Such handles permit accelerated operations on treaps. For instance, if a handle on a node $x$ is available, then deleting $x$ reduces just to rotating it down into leaf position and clipping it; no search is necessary. Similarly the insertion of an item $x$ into a treap can be accelerated if a handle to the successor (or predecessor) $s$ of $x$ in the resulting treap is known: start the search for the correct leaf position of $x$ at node $s$ instead of at the root of the treap. So-called "finger searches" are also possible, where one is to locate a node $y$ in a treap but the search starts at some (hopefully "nearby") node $x$ that has a handle pointing to it; essentially one only needs to traverse the unique path between $x$ and $y$. Also, splits and joins of treaps can be performed faster if handles to the minimum and maximum key items in the treaps are available. These operations are discussed in detail in later sections.

¹ Herbert Edelsbrunner pointed out to us that Jean Vuillemin had earlier introduced the same data structure under a different name [V]. The term "treap" was first used for a different data structure by Ed McCreight, who later abandoned it in favor of the more mundane "priority search tree" [Mc].

² In practice it will be preferable to approach these operations the other way round. Joins and splits of treaps can be implemented as iterative top-down procedures; insertions and deletions can then be implemented as accesses followed by splits or joins. These implementations are operationally equivalent to the ones given here.

    function EmptyTreap(): treap
        tnull→[priority, lchild, rchild] ← [−∞, tnull, tnull]
        return(tnull)

    procedure TreapInsert((k, p): item, T: treap)
        if T = tnull then T ← newnode()
                          T→[key, priority, lchild, rchild] ← [k, p, tnull, tnull]
        else if k < T→key then TreapInsert((k, p), T→lchild)
                               if T→lchild→priority > T→priority then RotateRight(T)
        else if k > T→key then TreapInsert((k, p), T→rchild)
                               if T→rchild→priority > T→priority then RotateLeft(T)
        else (* key k already in treap T *)

    procedure TreapDelete(k: key, T: treap)
        tnull→key ← k
        RecTreapDelete(k, T)

    procedure RecTreapDelete(k: key, T: treap)
        if k < T→key then RecTreapDelete(k, T→lchild)
        else if k > T→key then RecTreapDelete(k, T→rchild)
        else RootDelete(T)

    procedure RootDelete(T: treap)
        if IsLeaforNull(T) then T ← tnull
        else if T→lchild→priority > T→rchild→priority then RotateRight(T)
                                                           RootDelete(T→rchild)
        else RotateLeft(T)
             RootDelete(T→lchild)

    procedure TreapSplit(T: treap, k: key, T1, T2: treap)
        TreapInsert((k, ∞), T)
        [T1, T2] ← T→[lchild, rchild]

    procedure TreapJoin(T1, T2, T: treap)
        T ← newnode()
        T→[lchild, rchild] ← [T1, T2]
        RootDelete(T)

    procedure RotateLeft(T: treap)
        [T, T→rchild, T→rchild→lchild] ← [T→rchild, T→rchild→lchild, T]

    procedure RotateRight(T: treap)
        [T, T→lchild, T→lchild→rchild] ← [T→lchild, T→lchild→rchild, T]

    function IsLeaforNull(T: treap): Boolean
        return(T→lchild = T→rchild)

Figure 3: Simple routines for the elementary treap operations of creation, insertion, deletion, splitting, and joining. We assume call-by-reference semantics. A treap node has fields key, priority, lchild, rchild. The global variable tnull points to a sentinel node whose existence is assumed. [...] ← [...] denotes parallel assignment.

Some applications, such as so-called Jordan sorting, require the efficient excision of a subsequence, i.e. splitting a set $X$ of items into $Y = \{x \in X \mid a \le x.key \le b\}$ and $Z = \{x \in X \mid x.key < a \text{ or } x.key > b\}$. Such an excision can of course be achieved via splits and joins. However, treaps also permit a faster implementation of excisions, which is discussed in a later section.

Randomized Search Trees

Let $X$ be a set of items, each of which is uniquely identified by a key that is drawn from a totally ordered set. We define a randomized search tree for $X$ to be a treap for $X$ where the priorities of the items are independent, identically distributed continuous random variables.

Theorem. A randomized search tree storing $n$ items has the expected performance characteristics listed in the table below:

    Performance measure                                                       Bound on expectation
    access time                                                               $O(\log n)$
    insertion time                                                            $O(\log n)$
    * insertion time for element with handle on predecessor or successor     $O(1)$
    deletion time                                                             $O(\log n)$
    * deletion time for element with handle                                   $O(1)$
    number of rotations per update                                            $2$
    † time for finger search over distance $d$                                $O(\log d)$
    † time for fast finger search over distance $d$                           $O(\log \min\{d, n-d\})$
    time for joining two trees of size $m$ and $n$                            $O(\log \max\{m, n\})$
    time for splitting a tree into trees of size $m$ and $n$                  $O(\log \max\{m, n\})$
    ‡ time for fast joining or fast splitting                                 $O(\log \min\{m, n\})$
    † time for excising a tree of size $d$                                    $O(\log \min\{d, n-d\})$
    update time if cost of rotating a subtree of size $s$ is $O(s)$           $O(\log n)$
    update time if cost of rotating a subtree of size $s$ is $O(s \log^k s)$, $k \ge 0$    $O(\log^{k+1} n)$
    update time if cost of rotating a subtree of size $s$ is $O(s^a)$ with $a > 1$         $O(n^{a-1})$
    update time if rotation cost is $f(s)$, with $f(s)$ non-negative          $O\bigl(f(n)/n + \sum_{0<i<n} f(i)/i^2\bigr)$

    * requires parent pointers
    † requires parent pointers and some additional information, discussed in a later section
    ‡ requires some parent pointers, discussed in later sections

Now let $X$ be a set of items identifiable by keys drawn from a totally ordered universe. Moreover, assume that every item $x \in X$ has associated with it a positive integer weight $w(x)$. We define the weighted randomized search tree for $X$ as a treap for $X$ where the priorities are independent continuous random variables defined as follows: Let $F$ be a fixed continuous probability distribution. The priority of element $x$ is the maximum of $w(x)$ independent random variables, each with distribution $F$.

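Theorem. Let $T$ be a weighted randomized search tree for a set $X$ of weighted items. The following table lists the expected performance characteristics of $T$. Here $W$ denotes the sum of the weights of the items in $X$; for an item $x$, the predecessor and the successor (by key rank) in $X$ are denoted by $x^-$ and $x^+$, respectively; $T_{\min}$ and $T_{\max}$ denote the items of minimal and maximal key rank in $X$.

    Performance measure                                        Bound on expectation
    time to access item $x$                                    $O\bigl(1 + \log \frac{W}{w(x)}\bigr)$
    time to insert item $x$                                    $O\bigl(1 + \log \frac{W + w(x)}{\min\{w(x^-), w(x), w(x^+)\}}\bigr)$
    time to delete item $x$                                    $O\bigl(1 + \log \frac{W}{\min\{w(x^-), w(x), w(x^+)\}}\bigr)$
    * insertion time for item $x$ with handle on predecessor   $O\bigl(1 + \log\bigl(1 + \frac{w(x)}{w(x^-)} + \frac{w(x)}{w(x^+)} + \frac{w(x^-)}{w(x^+)}\bigr)\bigr)$
    * time to delete item $x$ with handle                      $O\bigl(1 + \log\bigl(1 + \frac{w(x)}{w(x^-)} + \frac{w(x)}{w(x^+)}\bigr)\bigr)$
    number of rotations per update on item $x$                 $O\bigl(1 + \log\bigl(1 + \frac{w(x)}{w(x^-)} + \frac{w(x)}{w(x^+)}\bigr)\bigr)$
    † time for finger search between $x$ and $y$, where $V$ is the total weight of items between and including $x$ and $y$    $O\bigl(\log \frac{V}{\min\{w(x), w(y)\}}\bigr)$
    time to join trees $T_1$ and $T_2$ of weight $W_1$ and $W_2$    $O\bigl(1 + \log \frac{W_1}{w(T_{1,\max})} + \log \frac{W_2}{w(T_{2,\min})}\bigr)$
    time to split $T$ into trees $T_1$, $T_2$ of weight $W_1$, $W_2$    $O\bigl(1 + \log \frac{W_1}{w(T_{1,\max})} + \log \frac{W_2}{w(T_{2,\min})}\bigr)$
    ‡ time for fast joining or fast splitting                  $O\bigl(1 + \log \min\bigl\{\frac{W_1}{w(T_{1,\max})}, \frac{W_2}{w(T_{2,\min})}\bigr\}\bigr)$
    time for increasing the weight of item $x$ by $\Delta$     $O\bigl(1 + \log \frac{w(x) + \Delta}{w(x)}\bigr)$
    time for decreasing the weight of item $x$ by $\Delta$     $O\bigl(1 + \log \frac{w(x)}{w(x) - \Delta}\bigr)$

    * requires parent pointers
    † requires parent pointers and some additional information, discussed in a later section
    ‡ requires some parent pointers, discussed in later sections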

Several remarks are in order. First of all, note that no assumptions are made about the key distribution in $X$. All expectations are with respect to randomness that is "controlled" by the update algorithms. For the time being we assume that the priorities are kept hidden from the "user." This is necessary to ensure that a randomized search tree is transformed into a randomized search tree by any update. If the user knew the actual priorities it would be a simple matter to create a very "non-random" and unbalanced tree by a polynomial number of updates.

The requirement that the random variables used as priorities be continuous is not really necessary. We make this requirement only to ensure that with probability 1 all priorities are distinct. Our results continue to hold for i.i.d. random variables for which the probability that too many of them are equal is sufficiently small. This means that in practice using integer random numbers drawn uniformly from a sufficiently large range will be adequate for most applications.

Comparing with other balanced tree schemes, some of which require only one bit of balance information per node, it might seem disappointing that randomized search trees require to store such "extensive" balance information at each node. However, it is possible to avoid this. In a later section we discuss various strategies of maintaining randomized search trees that obviate the explicit storage of the random priorities.

Next note that weighted randomized search trees can be made to adapt naturally to observed access frequencies. Consider the following strategy: whenever an item $x$ is accessed, a new random number $r$ is generated (according to distribution $F$); if $r$ is bigger than the current priority of $x$, then make $r$ the new priority of $x$, and, if necessary, rotate $x$ up in the tree to reestablish the heap property. After $x$ has been accessed $k$ times its priority will be the maximum of $k$ i.i.d. random variables. Thus the expected depth of $x$ in the tree will be $O(\log(1/p))$, where $p$ is the access frequency of $x$, i.e. $p = k/A$, with $A$ being the total number of accesses to the tree.

How would one insert an item $x$ into a weighted randomized search tree with fixed weight $k$? This can most easily be done if the distribution function $F$ is the identity, i.e. we start with random variables uniformly distributed in the interval $[0,1]$. The distribution function $G_k$ for the maximum of $k$ such random variables has the form $G_k(z) = z^k$. From this it follows that $x$ should be inserted into the tree with priority $r^{1/k}$, where $r$ is a random number chosen uniformly from the interval $[0,1]$. (Since the only operations involving priorities are comparisons, and the logarithm function is monotonic, one can also store $(\log r)/k$ instead.) Adapting the tree to changing weights is also possible; this is discussed in a later section.
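The priority assignment for a weight-$k$ item can be sketched as below. The function names are hypothetical; the sketch assumes $F$ is the uniform distribution on $[0,1]$, and in the logarithmic variant it draws from $(0,1]$ (an implementation choice made here) to avoid taking the logarithm of zero.

```python
import math
import random


def priority_from_uniform(r: float, weight: int) -> float:
    """r**(1/weight): distributed like the maximum of `weight` uniform draws."""
    return r ** (1.0 / weight)


def log_priority_from_uniform(r: float, weight: int) -> float:
    """log(r)/weight: a monotone transform of r**(1/weight), so comparisons agree."""
    return math.log(r) / weight


def weighted_priority(weight: int, rng: random.Random) -> float:
    """Draw a priority for an item of the given positive integer weight."""
    return priority_from_uniform(rng.random(), weight)


def weighted_log_priority(weight: int, rng: random.Random) -> float:
    """Same draw, stored on the logarithmic scale; 1 - random() lies in (0, 1]."""
    return log_priority_from_uniform(1.0 - rng.random(), weight)
```

The logarithmic form avoids computing fractional powers, which may matter for very large weights; either form can be used as long as all priorities in one tree use the same scale.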

Finally, there is the question of how much "randomness" is required for our method. How many random bits does one need to implement randomized search trees? There is a rather simple-minded strategy that shows that an expected constant number of random bits per update can suffice in the unweighted case. This can be achieved as follows: Let the priorities be real random numbers drawn uniformly from the interval $[0,1]$. Such numbers can be generated piecemeal by adding more and more random bits as digits to their binary representations. The idea is, of course, to generate only as much of the binary representation as needed. The only priority operations in the update algorithms are comparisons between priorities. In many cases the outcome of such a comparison will already be determined by the existing partial binary representations. When this is not the case, i.e. one representation happens to be a prefix of the other, one simply refines the representations by appending random bits in the obvious way. It is easy to see that the expected number of additional random bits needed to resolve the comparison is not greater than 4. Since in our update algorithms priority comparisons happen only in connection with rotations, and since the expected number of rotations per update is less than 2, it follows that the expected number of random bits needed is less than 12 for insertions and less than 8 for deletions.
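The piecemeal generation of priorities can be sketched as follows. This is an illustration only; `LazyPriority` and `less_than` are hypothetical names, and the sketch assumes the two priorities being compared are distinct objects (comparing a priority with itself would never terminate).

```python
import random
from typing import List


class LazyPriority:
    """A uniform random number in [0, 1) whose binary digits are generated on demand."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.bits: List[int] = []  # binary digits after the point, generated so far

    def bit(self, i: int) -> int:
        """Return the i-th binary digit, appending fresh random bits if needed."""
        while len(self.bits) <= i:
            self.bits.append(self.rng.getrandbits(1))
        return self.bits[i]


def less_than(a: LazyPriority, b: LazyPriority) -> bool:
    """Compare two lazy priorities, extending them only as far as necessary.

    Digits that already exist are reused; new random bits are drawn only once
    one representation has turned out to be a prefix of the other.
    """
    i = 0
    while True:
        x, y = a.bit(i), b.bit(i)
        if x != y:
            return x < y
        i += 1
```

Because generated digits are stored, repeated comparisons of the same pair give the same answer, and the total number of stored digits is a direct count of the random bits consumed.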

Surprisingly, one can do much better than that. Only $O(\log n)$ truly random bits suffice overall. This can be achieved by letting a pseudo random number generator supply the priorities, where this generator needs $O(\log n)$ truly random bits as seed. One possible such generator works as follows. Let $U$ be a sufficiently large prime number, polynomially bounded in $n$. Randomly choose $k$ integers in the range between $0$ and $U-1$, for a suitable constant $k$, and make them the coefficients of a polynomial $q$ of degree $k-1$ (this requires the $O(\log n)$ random bits). As $i$-th priority produce $q(i) \bmod U$. Other generators are possible. The essential property they need to have is that they produce random numbers that are $k$-wise independent. (Actually a somewhat weaker property suffices; this is discussed in a later section.)
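A polynomial generator of the kind outlined above might look as follows. This is a sketch under assumptions: the function name, the parameter names `prime_u` and `k`, and the use of Python's `random.Random` as the source of the truly random seed are all hypothetical, and the caller is assumed to supply a prime `prime_u`.

```python
import random
from typing import Callable, List


def make_priority_generator(prime_u: int, k: int, rng: random.Random) -> Callable[[int], int]:
    """Return a function i -> q(i) mod prime_u for a random polynomial q of degree k - 1.

    The k coefficients are the only random choices; they are drawn once, up front.
    prime_u is assumed to be prime (not checked here).
    """
    coeffs: List[int] = [rng.randrange(prime_u) for _ in range(k)]

    def priority(i: int) -> int:
        # Horner's rule, reducing modulo prime_u at every step.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * i + c) % prime_u
        return acc

    return priority
```

With a prime modulus, a random polynomial of degree $k-1$ evaluated at distinct points below the modulus yields $k$-wise independent values; the modulus should be large compared to the number of items so that equal priorities are unlikely.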

Theorem. All the results of the first theorem of this section continue to hold, up to constant factors, if the priorities are $k$-wise independent random variables, for a suitable constant $k$, from a sufficiently large range, as for instance produced by the pseudo random number generator outlined above.

In order to achieve logarithmic expected access, insertion, and deletion time, a smaller degree of independence suffices.

Analysis of Randomized Search Trees

In this section we analyze a number of interesting quantities in randomized search trees and derive their expected values. These quantities are:

$D(x)$, the depth of node $x$ in a tree, in other words the number of nodes on the path from $x$ to the root;

$S(x)$, the size of the subtree rooted at node $x$, i.e. the number of nodes contained in that subtree;

$P(x, y)$, the length of the unique path between nodes $x$ and $y$ in the tree, i.e. the number of nodes on that path;

$SL(x)$ and $SR(x)$, the length of the right Spine of the Left subtree of $x$ and the length of the left Spine of the Right subtree of $x$; by left spine of a tree we mean the root together with the left spine of the root's left subtree; the right spine is defined analogously.

Throughout this section we will deal with the treap $T$ for a set $X = \{x_i = (k_i, p_i) \mid 1 \le i \le n\}$ of $n$ items, numbered by increasing key rank, i.e. $k_i < k_{i+1}$ for $1 \le i < n$. The priorities $p_i$ will be random variables. Since for a fixed set of keys $k_i$ the shape of the treap $T$ for $X$ depends on the priorities, the quantities of our interest will be random variables also, for which it makes sense to analyze expectations, distributions, etc.

Our analysis is greatly simplified by the fact that each of our quantities can be represented readily by two types of indicator random variables:

$A_{i,j}$ is 1 if $x_i$ is an ancestor of $x_j$ in $T$, and 0 otherwise;

$C_{i;\ell,m}$ is 1 if $x_i$ is a common ancestor of $x_\ell$ and $x_m$ in $T$, and 0 otherwise.

We consider each node an ancestor of itself. In particular we have:

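Theorem. Let $1 \le \ell, m \le n$, and let $\ell < m$.

(i) $D(x_\ell) = \sum_{1 \le i \le n} A_{i,\ell}$

(ii) $S(x_\ell) = \sum_{1 \le j \le n} A_{\ell,j}$

(iii) $P(x_\ell, x_m) = 1 + \sum_{1 \le i < m} (A_{i,\ell} - C_{i;\ell,m}) + \sum_{\ell < i \le n} (A_{i,m} - C_{i;\ell,m})$

(iv) $SL(x_\ell) = \sum_{1 \le i < \ell} (A_{i,\ell-1} - C_{i;\ell-1,\ell})$ and $SR(x_\ell) = \sum_{\ell < i \le n} (A_{i,\ell+1} - C_{i;\ell,\ell+1})$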

Proof. (i) and (ii) are clear, since the depth of a node is nothing but the number of its ancestors, and the size of its subtree is nothing but the number of nodes for which it is ancestor.

For (iii) note that the path from $x_\ell$ to $x_m$ is divided by their lowest common ancestor $v$ into two parts. The part from $x_\ell$ to $v$, which consists exactly of the ancestors of $x_\ell$ minus the common ancestors of $x_\ell$ and $x_m$, is accounted for by the first sum. This would be clear if the index range of that sum were between 1 and $n$. The smaller index range is justified by the fact that for $m \le i \le n$ the node $x_i$ is an ancestor of $x_\ell$ iff it is a common ancestor of $x_\ell$ and $x_m$, i.e. $A_{i,\ell} = C_{i;\ell,m}$ in that range. Similarly, the second sum accounts for the path between $x_m$ and $v$. Since $v$ is not counted in either of those two parts, the correction term $+1$ is needed.

For (iv) it suffices to consider $SL(x_\ell)$, by symmetry. If $x_\ell$ has a left subtree, then the lowest node on its right spine must be $x_{\ell-1}$. It is clear that in that case the spine consists exactly of the ancestors of $x_{\ell-1}$ minus the ancestors of $x_\ell$; but the latter are the common ancestors of $x_{\ell-1}$ and $x_\ell$. Also, no $x_i$ with $i \ge \ell$ can lie on that spine.

If $x_\ell$ has no left subtree, then either $\ell = 1$, in which case the formula correctly evaluates to 0, or $x_{\ell-1}$ is an ancestor of $x_\ell$, in which case every ancestor of $x_{\ell-1}$ is a common ancestor of $x_{\ell-1}$ and $x_\ell$, and the formula also correctly evaluates to 0.

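If we let $a_{i,j} = \mathrm{Ex}[A_{i,j}]$ and $c_{i;\ell,m} = \mathrm{Ex}[C_{i;\ell,m}]$, then by the linearity of expectations we immediately get the following:

Corollary. Let $1 \le \ell, m \le n$, and let $\ell < m$.

(i) $\mathrm{Ex}[D(x_\ell)] = \sum_{1 \le i \le n} a_{i,\ell}$

(ii) $\mathrm{Ex}[S(x_\ell)] = \sum_{1 \le j \le n} a_{\ell,j}$

(iii) $\mathrm{Ex}[P(x_\ell, x_m)] = 1 + \sum_{1 \le i < m} (a_{i,\ell} - c_{i;\ell,m}) + \sum_{\ell < i \le n} (a_{i,m} - c_{i;\ell,m})$

(iv) $\mathrm{Ex}[SL(x_\ell)] = \sum_{1 \le i < \ell} (a_{i,\ell-1} - c_{i;\ell-1,\ell})$ and $\mathrm{Ex}[SR(x_\ell)] = \sum_{\ell < i \le n} (a_{i,\ell+1} - c_{i;\ell,\ell+1})$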

In essence our whole analysis has now been reduced to determining the expectations $a_{i,j}$ and $c_{i;\ell,m}$. Note that since we are dealing with indicator 0-1 random variables, we have

$a_{i,j} = \mathrm{Ex}[A_{i,j}] = \Pr[A_{i,j} = 1] = \Pr[x_i \text{ is ancestor of } x_j]$, and

$c_{i;\ell,m} = \mathrm{Ex}[C_{i;\ell,m}] = \Pr[C_{i;\ell,m} = 1] = \Pr[x_i \text{ is common ancestor of } x_\ell \text{ and } x_m]$.

Determining the probabilities, and hence the expectations, is made possible by the following ancestor lemma, which completely characterizes the ancestor relation in treaps.

Lemma. Let $T$ be the treap for $X$, and let $1 \le i, j \le n$. Then, assuming that all priorities are distinct, $x_i$ is an ancestor of $x_j$ in $T$ iff among all $x_h$, with $h$ between and including $i$ and $j$, the item $x_i$ has the largest priority.

i

Proof. Let $x_m$ be the item with the largest priority in $T$, and let $X' = \{x_\nu \mid 1 \le \nu < m\}$ and $X'' = \{x_\mu \mid m < \mu \le n\}$. By the definition of a treap, $x_m$ will be the root of $T$ and its left and right subtrees will be treaps for $X'$ and $X''$ respectively.

Clearly our ancestor characterization is correct for all pairs of nodes involving the root $x_m$. It is also correct for all pairs $x_\nu \in X'$ and $x_\mu \in X''$: they lie in different subtrees, and hence are not ancestor-related. But indeed the largest priority in the range between $\nu$ and $\mu$ is realized by $x_m$ and not by $x_\nu$ or $x_\mu$.

Finally, by induction (or recursion, if you will) the characterization is correct for all pairs of nodes in the treap for $X'$ and for all pairs of nodes in the treap for $X''$.
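The ancestor characterization, and the common ancestor characterization that follows from it, can be checked mechanically on small examples. The sketch below is illustrative only: items are identified with the 0-based indices of a priority list (keys in increasing order), priorities are assumed distinct, and all function names are hypothetical.

```python
from typing import List, Sequence


def build_parents(priorities: Sequence[float]) -> List[int]:
    """Parent index of every node in the treap on keys 0..n-1 (root gets -1)."""
    n = len(priorities)
    parent = [-1] * n

    def build(lo: int, hi: int, par: int) -> None:
        if lo > hi:
            return
        m = max(range(lo, hi + 1), key=priorities.__getitem__)  # largest priority is the root
        parent[m] = par
        build(lo, m - 1, m)
        build(m + 1, hi, m)

    build(0, n - 1, -1)
    return parent


def is_ancestor_in_tree(parent: Sequence[int], i: int, j: int) -> bool:
    """Walk from j up to the root; every node counts as its own ancestor."""
    while j != -1:
        if j == i:
            return True
        j = parent[j]
    return False


def is_ancestor_by_lemma(priorities: Sequence[float], i: int, j: int) -> bool:
    """x_i is an ancestor of x_j iff p_i is the largest priority between i and j."""
    lo, hi = min(i, j), max(i, j)
    return priorities[i] == max(priorities[lo:hi + 1])


def is_common_ancestor_by_lemma(priorities: Sequence[float], i: int, l: int, m: int) -> bool:
    """x_i is a common ancestor of x_l and x_m iff p_i is largest on the range spanning i, l, m."""
    lo, hi = min(i, l, m), max(i, l, m)
    return priorities[i] == max(priorities[lo:hi + 1])
```

Comparing `is_ancestor_in_tree` with `is_ancestor_by_lemma` over all index pairs of a small treap exercises both directions of the "iff".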

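As an immediate consequence of this ancestor lemma we obtain an analogous common ancestor lemma.

Lemma. Let $T$ be the treap for $X$, and let $1 \le \ell, m, i \le n$ with $\ell < m$. Let $p_\nu$ denote the priority of item $x_\nu$. Then, assuming that all priorities are distinct, $x_i$ is a common ancestor of $x_\ell$ and $x_m$ in $T$ iff

$p_i = \max\{p_\nu \mid i \le \nu \le m\}$ if $1 \le i \le \ell$,
$p_i = \max\{p_\nu \mid \ell \le \nu \le m\}$ if $\ell \le i \le m$,
$p_i = \max\{p_\nu \mid \ell \le \nu \le i\}$ if $m \le i \le n$,

which can be summarized as

$p_i = \max\{p_\nu \mid \min\{i, \ell, m\} \le \nu \le \max\{i, \ell, m\}\}$.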

i

Now we have all tools ready to analyze our quantities of interest. We will deal with the case of unweighted and weighted randomized search trees separately.

The unweighted case

The last two lemmas make it easy to derive the ancestor probabilities $a_{i,j}$ and $c_{i;\ell,m}$.

Corollary. In an (unweighted) randomized search tree $x_i$ is an ancestor of $x_j$ with probability $1/(|i-j|+1)$; in other words, we have

$a_{i,j} = 1/(|i-j|+1)$.

Proof. According to the ancestor lemma we need the priority of $x_i$ to be the largest among the priorities of the $|i-j|+1$ items between $x_i$ and $x_j$. Since in an unweighted randomized search tree these priorities are independent, identically distributed continuous random variables, this happens with probability $1/(|i-j|+1)$.

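Corollary. Let $1 \le \ell < m \le n$. In the case of unweighted randomized search trees the expectation for common ancestorship is given by

$c_{i;\ell,m} = 1/(m-i+1)$ if $1 \le i \le \ell$,
$c_{i;\ell,m} = 1/(m-\ell+1)$ if $\ell \le i \le m$,
$c_{i;\ell,m} = 1/(i-\ell+1)$ if $m \le i \le n$,

which can be summarized as

$c_{i;\ell,m} = 1/(\max\{i, \ell, m\} - \min\{i, \ell, m\} + 1)$.

Proof. Analogous to the previous proof.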

Now we can put everything together to obtain exact expressions and closed form upper bounds for the expectations of the quantities of interest. The expressions involve harmonic numbers, defined as $H_n = \sum_{1 \le i \le n} 1/i$. Note the standard bounds $\ln n < H_n < 1 + \ln n$ for $n > 1$.

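Theorem. Let $1 \le \ell, m \le n$, and let $\ell < m$. In an unweighted randomized search tree of $n$ nodes the following expectations hold:

(i) $\mathrm{Ex}[D(x_\ell)] = H_\ell + H_{n+1-\ell} - 1 < 1 + 2\ln n$

(ii) $\mathrm{Ex}[S(x_\ell)] = H_\ell + H_{n+1-\ell} - 1 < 1 + 2\ln n$

(iii) $\mathrm{Ex}[P(x_\ell, x_m)] = 4H_{m-\ell+1} - (H_m - H_\ell) - (H_{n+1-\ell} - H_{n+1-m}) - 3 < 1 + 4\ln(m-\ell+1)$

(iv) $\mathrm{Ex}[SL(x_\ell)] = 1 - 1/\ell$ and $\mathrm{Ex}[SR(x_\ell)] = 1 - 1/(n+1-\ell)$

Proof. Just use the sums from the corollary on expectations above and plug in the values of $a_{i,j}$ and $c_{i;\ell,m}$ from the two preceding corollaries.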
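For small $n$ the expectations in (i) and (ii) can be confirmed exactly by brute force. The sketch below relies on the observation that, for independent identically distributed continuous priorities, only their relative order matters and all $n!$ orders are equally likely, so averaging over all permutations gives the exact expectation. The function names are hypothetical, and nodes are indexed from 0 in the code.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence, Tuple


def harmonic(n: int) -> Fraction:
    """Exact harmonic number H_n."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def depths_and_sizes(priorities: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Depth D (number of nodes on the root path) and subtree size S of every node."""
    n = len(priorities)
    depth = [0] * n
    size = [0] * n

    def build(lo: int, hi: int, d: int) -> int:
        if lo > hi:
            return 0
        m = max(range(lo, hi + 1), key=priorities.__getitem__)  # root of this key range
        depth[m] = d
        size[m] = 1 + build(lo, m - 1, d + 1) + build(m + 1, hi, d + 1)
        return size[m]

    build(0, n - 1, 1)
    return depth, size


def exact_expectations(n: int) -> Tuple[List[Fraction], List[Fraction]]:
    """Average D and S of every node over all n! priority orders."""
    total_d = [0] * n
    total_s = [0] * n
    count = 0
    for perm in permutations(range(n)):
        depth, size = depths_and_sizes(perm)
        for l in range(n):
            total_d[l] += depth[l]
            total_s[l] += size[l]
        count += 1
    return ([Fraction(t, count) for t in total_d],
            [Fraction(t, count) for t in total_s])
```

Using `Fraction` keeps the comparison with $H_\ell + H_{n+1-\ell} - 1$ exact; the enumeration is only practical for small $n$.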

It is striking that in an unweighted randomized search tree the expected number of ancestors of a node $x$ exactly equals the expected number of its descendants. However, it is reminiscent (although apparently unrelated) to the following easily provable fact: in any rooted binary tree $T$ the average node depth equals the average size of a subtree.

We want to point out a major difference between the random variables $D(x)$ and $S(x)$. Although both have the same expectation, they have very different distributions: $D(x)$ is very tightly concentrated around its expectation, whereas $S(x)$ is not. For instance, one can easily see that in an $n$-node tree $\Pr[D(x_1) = n] = 1/n!$, whereas $\Pr[S(x_1) = n] = 1/n$. Using the ancestor lemma it is not too hard to derive the exact distribution of $S(x_\ell)$: one gets $\Pr[S(x_\ell) = n] = 1/n$ and $\Pr[S(x_\ell) = s] = O(1/s^2)$ for $1 \le s < n$. We leave the details as an exercise to the reader. Here we briefly prove a concentration result for $D(x)$.

Lemma. In an unweighted randomized search tree with $n > 1$ nodes we have for any index $1 \le \ell \le n$ and any $c > 1$

$$\Pr[D(x_\ell) \ge 1 + 2c\ln n] < 2\,(n/e)^{-c\ln(c/e)}.$$

Proof. Recall that $D(x_\ell) = \sum_{1 \le i \le n} A_{i,\ell}$, where $A_{i,\ell}$ indicates whether $x_i$ is an ancestor of $x_\ell$. Let $L = \sum_{1 \le i < \ell} A_{i,\ell}$ count the "left ancestors" of $x_\ell$ and let $R = \sum_{\ell < i \le n} A_{i,\ell}$ count the "right ancestors". Since $D(x_\ell) = 1 + L + R$, in order for $D(x_\ell)$ to exceed $1 + 2c\ln n$ at least one of $L$ and $R$ has to exceed $c\ln n$. It now suffices to prove that the probability of either of those events is bounded by $(n/e)^{-c\ln(c/e)}$. We will just consider the case of $R$. The other case is symmetric.

The crucial observation is that for $i > \ell$ the 0-1 random variables $A_{i,\ell}$ are independent, and one can therefore apply so-called Chernoff bounds, which in one form state that if the random variable $Z$ is the sum of independent 0-1 variables, then

$$\Pr[Z \ge c\,\mathrm{Ex}[Z]] < e^{-c\ln(c/e)\,\mathrm{Ex}[Z]}.$$

In order to make the bound independent of $\ell$ we consider $R' = \sum_{\ell < i < \ell+n} A_{i,\ell}$, where we use additional independent 0-1 variables $A_{i,\ell}$ for $i > n$ with $\mathrm{Ex}[A_{i,\ell}] = 1/(i-\ell+1)$. Obviously, for any $k$ we have $\Pr[R \ge k] \le \Pr[R' \ge k]$. Using $\mathrm{Ex}[R'] = H_n - 1$, using $\ln n - 1 < H_n - 1 < \ln n$, and applying the Chernoff bound we get the desired

$$\Pr[R \ge c\ln n] \le \Pr[R' \ge c\ln n] \le \Pr[R' \ge c\,(H_n - 1)] < e^{-c\ln(c/e)(H_n - 1)} < e^{-c\ln(c/e)(\ln n - 1)} = (n/e)^{-c\ln(c/e)}.$$

Note that this lemma implies that the probability of at least one of the $n$ nodes in an unweighted randomized search tree having depth greater than $1 + 2c\ln n$ is bounded by $2n\,(n/e)^{-c\ln(c/e)}$. In other words, the probability that the height of a randomized search tree is more than logarithmic is exceedingly small, and hence the tree's expected height is logarithmic also. In contrast to the random variables studied in this section, the random variable $h_n$, the height of an $n$-node unweighted randomized search tree, is quite difficult to analyze exactly. Devroye, though, has shown that $h_n/\ln n \to \gamma$ almost surely as $n \to \infty$, where $\gamma = 4.31107\ldots$ is the unique solution of $\gamma\ln(2e/\gamma) = 1$ with $\gamma \ge 2$.
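To get a feeling for the magnitudes involved, the tail bound of the lemma and the height constant $\gamma$ can be evaluated numerically. The sketch below is only an illustration (the function names are ours): it evaluates $2(n/e)^{-c\ln(c/e)}$ and solves $\gamma\ln(2e/\gamma) = 1$ for $\gamma \ge 2$ by bisection, using that the left-hand side is decreasing on that range.

```python
import math

def depth_tail_bound(n, c):
    """Upper bound 2 (n/e)^(-c ln(c/e)) on Pr[D(x) >= 1 + 2 c ln n]."""
    return 2 * (n / math.e) ** (-c * math.log(c / math.e))

def height_constant(tol=1e-12):
    """Solve gamma * ln(2e / gamma) = 1 for gamma >= 2 by bisection."""
    def g(x):
        return x * math.log(2 * math.e / x) - 1
    lo, hi = 2.0, 2 * math.e          # g(lo) = 1 > 0 and g(hi) = -1 < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Note that the tail bound is only informative for $c > e$; for smaller $c$ the exponent is nonnegative and the bound exceeds 1 for all but very small $n$.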

The weighted case

Recall that in the weighted case every item $x_i$ has associated with it a positive integer weight $w_i$, and the weighted randomized search tree for a set of items uses as priority for $x_i$ the maximum of $w_i$ independent continuous random variables, each with the same distribution.

For $i \le j$ let $w_{i:j}$ denote $\sum_{i \le h \le j} w_h$, and for $i > j$ define $w_{i:j} = w_{j:i}$. Let $W = w_{1:n}$ denote the total weight.

Corollary. In a weighted randomized search tree, $x_i$ is an ancestor of $x_j$ with probability $w_i/w_{i:j}$; in other words, we have

$$a_{i,j} = w_i / w_{i:j}.$$

Proof. According to the ancestor lemma, we need the priority of $x_i$ to be the largest among the priorities of the items between $x_i$ and $x_j$. This means one of the $w_i$ random variables drawn for $x_i$ has to be the largest of the $w_{i:j}$ random variables drawn for the items between $x_i$ and $x_j$. But since these random variables are independent and identically distributed, this happens with the indicated probability.

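Corollary. Let $1 \le \ell < m \le n$. In the case of weighted randomized search trees the expectation for common ancestorship is given by

$$c_{i;\ell,m} = \begin{cases} w_i/w_{i:m} & \text{if } 1 \le i \le \ell, \\ w_i/w_{\ell:m} & \text{if } \ell \le i \le m, \\ w_i/w_{\ell:i} & \text{if } m \le i \le n. \end{cases}$$

Proof. Analogous to the previous proof, but based on the common ancestor lemma.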

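Now we just need to plug our values into the corollary that expresses the expectations of interest in terms of $a_{i,j}$ and $c_{i;\ell,m}$ to get the following:

Theorem. Let $1 \le \ell, m \le n$, and let $\ell < m$. In a weighted randomized search tree with $n$ nodes of total weight $W$ the following expectations hold:

(i) $\mathrm{Ex}[D(x_\ell)] = \sum_{1 \le i \le n} w_i/w_{i:\ell} \le 1 + 2\ln(W/w_\ell)$

(ii) $\mathrm{Ex}[S(x_\ell)] = \sum_{1 \le i \le n} w_\ell/w_{i:\ell}$

(iii) $\mathrm{Ex}[P(x_\ell, x_m)] = 1 + \sum_{1 \le i < \ell} \left( \frac{w_i}{w_{i:\ell}} - \frac{w_i}{w_{i:m}} \right) + \sum_{\ell \le i \le m} \left( \frac{w_i}{w_{\ell:i}} + \frac{w_i}{w_{i:m}} - 2\,\frac{w_i}{w_{\ell:m}} \right) + \sum_{m < i \le n} \left( \frac{w_i}{w_{m:i}} - \frac{w_i}{w_{\ell:i}} \right) \le 1 + 2\ln(w_{\ell:m}/w_\ell) + 2\ln(w_{\ell:m}/w_m)$

(iv) $\mathrm{Ex}[SL(x_\ell)] = \sum_{1 \le i < \ell} \left( \frac{w_i}{w_{i:\ell-1}} - \frac{w_i}{w_{i:\ell}} \right) < 1 + \ln(1 + w_\ell/w_{\ell-1})$

and $\mathrm{Ex}[SR(x_\ell)] = \sum_{\ell < i \le n} \left( \frac{w_i}{w_{\ell+1:i}} - \frac{w_i}{w_{\ell:i}} \right) < 1 + \ln(1 + w_\ell/w_{\ell+1})$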

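Proof. The exact expressions follow from the corollaries above.

The quoted upper bounds are consequences of the following two inequalities, which can easily be verified by considering the area underneath the curve $f(x) = 1/x$:

(1) $\alpha/A \le \ln A - \ln(A - \alpha)$ for $0 \le \alpha < A$

(2) $\alpha/A - \alpha/B \le (\ln A - \ln(A - \alpha)) - (\ln B - \ln(B - \alpha))$ for $0 \le \alpha < A \le B$

For instance, to prove (i) we can apply inequality (1) and use the principle of telescoping sums to derive

$$\sum_{1 \le i < \ell} \frac{w_i}{w_{i:\ell}} \le \sum_{1 \le i < \ell} (\ln w_{i:\ell} - \ln w_{i+1:\ell}) = \ln w_{1:\ell} - \ln w_\ell = \ln(w_{1:\ell}/w_\ell) \le \ln(W/w_\ell),$$

and analogously $\sum_{\ell < i \le n} w_i/w_{\ell:i} \le \ln(w_{\ell:n}/w_\ell) \le \ln(W/w_\ell)$, which, adding in the 1 stemming from $i = \ell$, yields the stated bound.

Similarly we can use inequality (2) to derive (iv). We just show the case of $\mathrm{Ex}[SL(x_\ell)]$. The summand for $i = \ell - 1$ equals $1 - w_{\ell-1}/w_{\ell-1:\ell} < 1$, and therefore

$$\sum_{1 \le i < \ell} \left( \frac{w_i}{w_{i:\ell-1}} - \frac{w_i}{w_{i:\ell}} \right) < 1 + \sum_{1 \le i < \ell-1} \big( (\ln w_{i:\ell-1} - \ln w_{i+1:\ell-1}) - (\ln w_{i:\ell} - \ln w_{i+1:\ell}) \big)$$

$$= 1 + (\ln w_{1:\ell-1} - \ln w_{\ell-1}) - (\ln w_{1:\ell} - \ln w_{\ell-1:\ell})$$

$$\le 1 + \ln w_{\ell-1:\ell} - \ln w_{\ell-1}$$

$$= 1 + \ln(1 + w_\ell/w_{\ell-1}).$$

For proving (iii) one uses inequality (2) and telescoping to bound the first sum by $\ln(w_{\ell:m}/w_\ell)$ and the third sum by $\ln(w_{\ell:m}/w_m)$. The middle sum can be broken into three pieces. The third piece evaluates to $-2$, and using inequality (1) the first piece is bounded by $1 + \ln(w_{\ell:m}/w_\ell)$ and the second piece by $1 + \ln(w_{\ell:m}/w_m)$. Together this yields the indicated bound.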
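The exact sums of the weighted theorem and their logarithmic bounds are easy to compare numerically. The following sketch is an illustration only; the helper names and the sample weights are ours. It evaluates parts (i) and (iv) with exact rationals for an arbitrary weight vector and computes the corresponding bounds.

```python
import math
from fractions import Fraction

def wsum(w, i, j):
    """w_{i:j}: total weight of items i..j (1-based; the order of i, j is irrelevant)."""
    lo, hi = min(i, j), max(i, j)
    return sum(w[lo - 1:hi])

def expected_depth(w, l):
    """Exact Ex[D(x_l)] = sum_i w_i / w_{i:l}."""
    return sum(Fraction(w[i - 1], wsum(w, i, l)) for i in range(1, len(w) + 1))

def expected_left_spine(w, l):
    """Exact Ex[SL(x_l)] = sum_{i<l} (w_i / w_{i:l-1} - w_i / w_{i:l})."""
    return sum(Fraction(w[i - 1], wsum(w, i, l - 1)) - Fraction(w[i - 1], wsum(w, i, l))
               for i in range(1, l))

def depth_bound(w, l):
    """Bound 1 + 2 ln(W / w_l) from part (i)."""
    return 1 + 2 * math.log(sum(w) / w[l - 1])

def left_spine_bound(w, l):
    """Bound 1 + ln(1 + w_l / w_{l-1}) from part (iv); requires l >= 2."""
    return 1 + math.log(1 + w[l - 1] / w[l - 2])

if __name__ == "__main__":
    weights = [5, 1, 1, 8, 2, 1, 30]   # arbitrary sample weights
    for l in range(1, len(weights) + 1):
        print(l, float(expected_depth(weights, l)), depth_bound(weights, l))
```

With all weights equal to 1 the sums reduce to the harmonic-number expressions of the unweighted theorem, which gives a quick consistency check between the two cases.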

With sufficiently uneven weight assignments the bounds listed in this theorem can become arbitrarily bad; in particular they can exceed $n$, the number of nodes in the tree, which is certainly an upper bound for every one of the discussed quantities. One can obtain somewhat stronger bounds by optimizing over all possible weight assignments while leaving the total weight $W$ and the weight of $x_\ell$ (and possibly $x_m$) fixed. Although this is possible in principle, it seems technically quite challenging in general. We just illustrate the case $D(x_1)$.

Which choice of $w_i$ maximizes $\mathrm{Ex}[D(x_1)] = \sum_{1 \le i \le n} w_i/w_{1:i}$ while leaving $w_1$ and $W = w_{1:n}$ fixed? Rewrite the sum as

$$\sum_{1 \le i \le n} \frac{w_i}{w_{1:i}} = 1 + \sum_{1 < i \le n} \frac{w_{1:i} - w_{1:i-1}}{w_{1:i}} = n - \sum_{1 < i \le n} \frac{w_{1:i-1}}{w_{1:i}}.$$

A little bit of calculus shows that the last sum is minimized when all its summands are the same. Since the product of the $n-1$ summands is $w_1/W$, our boundary conditions imply that each of them is $(w_1/W)^{1/(n-1)}$. Thus it follows that $\mathrm{Ex}[D(x_1)]$ is bounded by

$$n - (n-1)\,(w_1/W)^{1/(n-1)} = 1 + (n-1)\left(1 - (w_1/W)^{1/(n-1)}\right),$$

which, however, is not a particularly illuminating expression, except that it is easily seen never to exceed $n$.

Analysis of Operations on Randomized Search Trees

In this section we discuss the various operations on unweighted and weighted randomized search trees and derive their expected efficiency, thus proving all bounds claimed in the two theorems that state our main results for the unweighted and the weighted case.

Searches

A successful search for item $x$ in a treap $T$ commences at the root of $T$ and, using the key of $x$, traces out the path from the root to $x$. Clearly the time taken by such a search is proportional to the length of the path or, in other words, the number of ancestors of $x$. Thus the expected time for a search for $x$ is proportional to the expected number of its ancestors, which is $O(\log n)$ in the unweighted and $O(1 + \log(W/w(x)))$ in the weighted case, by the two theorems of the previous section.

An unsuccessful search for a key that falls between the keys of successive items $x^-$ and $x^+$ takes expected time $O(\log n)$ in the unweighted and $O(1 + \log(W/\min\{w(x^-), w(x^+)\}))$ in the weighted case, since such a search must terminate either at $x^-$ or at $x^+$ (ignoring the two cases where $x^-$ or $x^+$ does not exist).

Insertions and Deletions

An item is inserted into a treap by first attaching it at the proper leaf position, which is located by an (unsuccessful) search using the item's key; then the item is rotated up the treap as long as it has a parent with smaller priority. Clearly the number of those rotations can be at most the length of the search path that was just traversed. Thus the insertion time is proportional to the time needed for an unsuccessful search, which in expectation is $O(\log n)$ in the unweighted and $O(1 + \log(W/\min\{w(x^-), w(x^+)\}))$ in the weighted case, where $x^-$ and $x^+$ are the predecessor and successor of $x$ in the resulting tree.

The deletion operation requires essentially the same time, since it basically reverses an insertion operation: first the desired item is located in the tree using its key; then it is rotated down until it is in leaf position, at which point it is clipped away. During those downward rotations the decision whether to rotate right or left is dictated by the priorities of the two current children, as the one with larger priority must be rotated up.

What is still interesting is the expected number of rotations that occur during an update. Since in terms of rotations a deletion is an exact reversal of an insertion, it suffices to analyze the number of rotations that occur during a deletion.

So let $x$ be an item in a treap $T$ to be deleted from $T$. Although we cannot tell just from the tree structure of $T$ which downward rotations will be performed, we can tell how many there will be: their number is the sum of the lengths of the right spine of the left subtree of $x$ and the left spine of the right subtree of $x$. The correctness of this fact can be seen by observing that a left rotation of $x$ has the effect of shortening the left spine of its right subtree by one; a right rotation of $x$ shortens the right spine of the left subtree.

Thus we know from the two theorems of the previous section that in expectation the number of rotations per update is less than 2 in the unweighted case and less than $2 + \ln(1 + w(x)/w(x^-)) + \ln(1 + w(x)/w(x^+)) = O(1 + \log(1 + w(x)/w(x^-) + w(x)/w(x^+)))$ in the weighted case.
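The insertion and deletion procedures just described can be sketched in a few lines of Python. This is a minimal recursive illustration under our own naming (`Node`, `insert`, `delete`, `spine_lengths` are not taken from the paper), with uniformly random real priorities standing in for the unweighted case. The deletion routine counts its rotations, so the spine-length characterization above can be checked directly.

```python
import random

class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

def rotate_right(y):
    """Rotate y's left child up; return the new subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    """Rotate x's right child up; return the new subtree root."""
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key, priority):
    """Attach a new leaf, then rotate it up while its parent has smaller priority."""
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def delete(root, key, counter):
    """Rotate the node with this key down to a leaf and clip it; counter[0] counts rotations."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key, counter)
    elif key > root.key:
        root.right = delete(root.right, key, counter)
    elif root.left is None and root.right is None:
        return None                      # leaf position reached: clip away
    elif root.right is None or (root.left is not None
                                and root.left.priority > root.right.priority):
        root = rotate_right(root)        # left child has the larger priority
        counter[0] += 1
        root.right = delete(root.right, key, counter)
    else:
        root = rotate_left(root)
        counter[0] += 1
        root.left = delete(root.left, key, counter)
    return root

def find(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root

def spine_lengths(node):
    """Right spine length of the left subtree plus left spine length of the right subtree."""
    total, v = 0, node.left
    while v is not None:
        total, v = total + 1, v.right
    v = node.right
    while v is not None:
        total, v = total + 1, v.left
    return total

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key, random.random())
    return root
```

A deletion of the node found by `find(root, key)` performs exactly `spine_lengths(node)` rotations, regardless of the priorities involved; only the expectation of that number depends on the random choices.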

Insertions and Deletions using Handles

Having a handle, i.e., a pointer to a node to be deleted from a treap, of course obviates the need to search for that node. Only the downward rotations to leaf position and the clipping away need to be performed, which results in an expected time bound of $O(1)$ in the unweighted and $O(1 + \log(1 + w(x)/w(x^-) + w(x)/w(x^+)))$ in the weighted case.

Similarly, if we know the leaf position at which a new node $x$ is to be inserted (which is always a child of either the resulting predecessor $x^-$ or successor $x^+$ of $x$), then after attaching the new leaf only the upward rotations need to be performed, which also results in the expected time bounds just stated. If just a pointer to $x^-$ is available and $x$ cannot be attached as a leaf to $x^-$, since it has a non-empty right subtree, then the additional cost of locating $x^+$ is incurred. Note that in this case $x^+$ is the bottom element of the left spine of that right subtree of $x^-$, and locating it amounts to traversing the spine. The expected cost for this is therefore $O(1)$ in the unweighted and $O(1 + \log(1 + w(x^-)/w(x^+)))$ in the weighted case, which results in a total expected cost of $O(1)$ and $O(1 + \log(1 + w(x)/w(x^-) + w(x)/w(x^+) + w(x^-)/w(x^+)))$ in the respective cases.

Note that, in contrast to ordinary insertions, where the sequence of ancestors can be computed during the search for the leaf position, insertions using handles require that ancestor information is stored explicitly in the tree, i.e., parent pointers must be available so that the upward rotations can be performed. Deletions based on handles also require parent pointers, since when the node $x$ to be deleted is rotated down one needs to know its parent $y$ so that the appropriate child pointer of $y$ can be reset.

Finger searches

In a finger search we assume that we have a "finger" or "handle" or "pointer" on item $x$ in a tree, and using this information we quickly want to locate the item $y$ with given key $k$. Without loss of generality we assume that $x = x_\ell$ and $y = x_m$ with $\ell < m$. The most natural way to perform such a search is to follow the unique path between $x_\ell$ and $x_m$ in the tree. The expected length of this path is given in the theorems of the previous section for unweighted and weighted randomized search trees, which would immediately yield the claimed finger search bounds. However, things are not quite that easy, since it is not clear that one can traverse the path between $x_\ell$ and $x_m$ in time proportional to its length. Such a traversal would, in general, entail going up the tree starting at $x_\ell$ until the lowest common ancestor $u$ of $x_\ell$ and $x_m$ is reached, and then going down to $x_m$. But how can one tell that $u$ has been reached? For all we know, $x_\ell$ may be $u$ already.

If during the ascent towards the root one reaches a node $z$ whose key is larger than the search key $k$, then one has gone too far (similarly if one reaches the root of the entire tree via its right child). The desired $u$ was the last node on the traversed path that was reached via its left child, or, if no such node exists, the start node $x$. This "excess path" between $u$ and $z$ may be much longer than the path between $x$ and $y$, and traversing it may dominate the running time of the finger search.

There are several ways of dealing with this excess path problem. One is to try to argue that in expectation such an excess path has only constant length. However, this turns out to be true for the unweighted case only. Other methods store extra information in the tree that either allows one to shortcut such excess paths or obviates the need to traverse them.

Extra pointers

Let us first discuss a method of adding extra pointers. Define the left parent of $v$ to be the first node on the path from $v$ to the root whose key is smaller than the key of $v$, and the right parent of $v$ to be the first node on that path with larger key. Put differently, a node is the right parent of all nodes on the right spine of its left subtree and the left parent of all nodes on the left spine of its right subtree. We store with every node $v$ in the tree, besides the two children pointers, also two pointers set to the left parent and right parent of $v$, respectively, or to nil if such a parent does not exist. Such parent pointers have been used before. Please note that during a rotation, or when adding or clipping a leaf, these parent pointers can be maintained at constant cost. Thus no significant increase in update times is incurred.

A finger search now proceeds as follows: starting at $x$, chase right parent pointers until either a nil pointer is reached or the next node has key larger than $k$ (using a sentinel with key $+\infty$ obviates this case distinction). At this point one has reached the lowest common ancestor $u$ of $x$ and $y$, and using the key $k$ one can find the path from $u$ to $y$ following children pointers in the usual fashion.

Note that the number of links traversed is at most one more than the length of the path between $x$ and $y$, and that it can be much smaller because of the shortcuts provided by the right parent pointers.
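The extra-pointer method can be illustrated as follows. This Python sketch is ours and makes simplifying assumptions: the treap is built statically from a key-sorted list of (key, priority) pairs rather than maintained under updates, and the names `build`, `finger_search` and `inorder` are illustrative. It sets the left and right parent pointers during construction and then performs the finger search exactly as described, for a search key that is at least the key of the start node.

```python
class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None
        self.left_parent = self.right_parent = None

def build(items, left_parent=None, right_parent=None):
    """Build the treap for key-sorted (key, priority) pairs, setting both parent pointers."""
    if not items:
        return None
    r = max(range(len(items)), key=lambda i: items[i][1])   # largest priority is the root
    node = Node(*items[r])
    node.left_parent, node.right_parent = left_parent, right_parent
    # Nodes in the left subtree have `node` as first ancestor with larger key, and
    # inherit node's left parent; symmetrically for the right subtree.
    node.left = build(items[:r], left_parent, node)
    node.right = build(items[r + 1:], node, right_parent)
    return node

def finger_search(x, k):
    """Search for key k >= x.key starting at node x.

    Returns (node with key k or None, number of links followed).
    """
    links, u = 0, x
    # Chase right parent pointers until nil or until the next node's key exceeds k.
    while u.right_parent is not None and u.right_parent.key <= k:
        u, links = u.right_parent, links + 1
    # u is now the lowest common ancestor; descend as in an ordinary search.
    v = u
    while v is not None and v.key != k:
        v = v.right if k > v.key else v.left
        links += 1
    return v, links

def inorder(v):
    return [] if v is None else inorder(v.left) + [v] + inorder(v.right)
```

The symmetric case, a search key smaller than the key of the start node, would chase left parent pointers instead.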

Extra keys

With every node $u$ in the tree one stores $\min(u)$ and $\max(u)$, the smallest and largest key of any item in the subtree rooted at $u$. With this information one can tell in constant time whether an item can possibly be in the subtree rooted at $u$: its key has to lie in the range between $\min(u)$ and $\max(u)$. The common ancestor of $x_\ell$ and $x_m$ is then the first node $z$ on the path from $x_\ell$ to the root with $\min(z) \le k \le \max(z)$.

Maintaining this min/max information clearly only takes constant time during rotations. However, when a leaf is added during an insertion, or is clipped away at the end of a deletion, the min/max information needs to be updated on an initial portion of the path toward the root. To be specific, consider the change when leaf $x$ is removed. The nodes for which the min information changes are exactly the nodes on the left spine of the right subtree of the predecessor of $x$. The max information needs to be adjusted for all nodes on the right spine of the left subtree of the successor of $x$. The expected lengths of those spines are given in the theorems of the previous section. They need to be added to the expected update times, which asymptotically is irrelevant in the case of unweighted randomized search trees, but can be the dominant factor in the case of weighted trees.

Finally, to make this argument applicable to the items with smallest and largest keys in the tree, one needs to maintain treaps with two sentinel nodes that are never removed, one with key $-\infty$ and one with key $+\infty$, both with random priorities.

Relying on short excess paths

Finally, let us consider the method suggested to us by Mehlhorn and Raman that traverses the excess path and relies on the fact that in expectation this path is short, at least for unweighted trees. (In the weighted case one can concoct examples where this expectation can become arbitrarily large.) The advantage of this method is that only the one usual parent pointer needs to be stored per node, and not two parent pointers. However, as we shall see soon, one also needs to store one additional bit per node.

As before, let $u$ be the lowest common ancestor of $x = x_\ell$ and $y = x_m$, and let the right parent of $u$ be $z$, the end node of the excess path. Call the nodes on the path between, but not including, $u$ and $z$ the excess nodes. Note that $u = x_\nu$ for some $\nu$ with $\ell \le \nu \le m$, namely the one with largest priority in that range, and $z = x_j$ for some $j$ with $m < j \le n$, and any excess node must be some $x_i$ with $1 \le i < \ell$. We estimate the expected number of excess nodes by introducing for each $i$ with $1 \le i < \ell$ a 0-1 random variable $X_i$ indicating whether or not $x_i$ is an excess node, and summing the expectations. Now $\mathrm{Ex}[X_i]$ is the probability that $x_i$ is an excess node, which we represent as the sum of the probabilities of the disjoint events $E_{i,j}$ that $x_i$ is an excess node and $z$ is $x_j$. Now for $E_{i,j}$ to happen, $x_j$ must be the right parent of $x_i$ and of $u$, or in other words, $x_i$ and $u$ must both be on the right spine of the left subtree of $x_j$. This happens if in the range between $i$ and $j$ item $x_j$ has the largest priority and $x_i$ has the second largest, and if $u$ has the largest priority in the range from $\ell$ to $j-1$. By the definition of $u$ this last condition can be reformulated as requiring that the largest priority in the range $\ell$ to $j-1$ occur in the range $\ell$ to $m$. By the assumption that all priorities are independently, identically distributed it follows that

$$\Pr[E_{i,j}] = \frac{m-\ell+1}{(j-i+1)(j-i)(j-\ell)},$$

and the expected number of excess nodes is

$$\sum_{1 \le i < \ell} \mathrm{Ex}[X_i] = \sum_{1 \le i < \ell}\ \sum_{m < j \le n} \Pr[E_{i,j}] = \sum_{1 \le i < \ell}\ \sum_{m < j \le n} \frac{m-\ell+1}{(j-i+1)(j-i)(j-\ell)}.$$

Summing first over $i$ and applying the bound $\sum_{a \le k \le b} 1/(k(k+1)) < 1/a$, one can see that the double sum is upper bounded by

$$\sum_{m < j \le n} \frac{m-\ell+1}{(j-\ell)(j-\ell+1)},$$

which in turn, by applying the same bound, can be seen to be upper bounded by 1. Thus we have proved that in the case of unweighted trees the expected number of excess nodes is always less than one.
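The excess-node calculation can be checked by brute force for small trees. In the sketch below (our own illustration; the function names are assumptions, and excess nodes are only counted when the right parent $z$ of $u$ exists, as in the analysis above) the formula for $\Pr[E_{i,j}]$ is summed exactly, and independently the average number of excess nodes is computed over all $n!$ priority orders using the ancestor lemma.

```python
from fractions import Fraction
from itertools import permutations

def pr_excess(i, j, l, m):
    """Pr[E_ij]: x_i is an excess node and z = x_j, for i < l <= m < j."""
    return Fraction(m - l + 1, (j - i + 1) * (j - i) * (j - l))

def expected_excess(n, l, m):
    """Expected number of excess nodes for a finger search from x_l to x_m."""
    return sum(pr_excess(i, j, l, m)
               for i in range(1, l) for j in range(m + 1, n + 1))

def brute_force_excess(n, l, m):
    """Average number of excess nodes over all n! priority orders (small n only)."""
    total = count = 0
    for prio in permutations(range(n)):            # prio[h - 1] is the priority of x_h
        count += 1
        # u: lowest common ancestor of x_l and x_m = largest priority in l..m
        u = max(range(l, m + 1), key=lambda h: prio[h - 1])
        # z: right parent of u = nearest item to the right with larger priority
        z = next((h for h in range(u + 1, n + 1) if prio[h - 1] > prio[u - 1]), None)
        if z is None:
            continue                               # u has no right parent
        for i in range(1, u):
            # x_i is an ancestor of u iff it has the largest priority in i..u;
            # it lies below z iff its priority is smaller than that of z.
            is_ancestor = all(prio[i - 1] > prio[h - 1] for h in range(i + 1, u + 1))
            if is_ancestor and prio[i - 1] < prio[z - 1]:
                total += 1
    return Fraction(total, count)
```

For example, with $n = 4$, $\ell = 2$, $m = 3$ the only possible excess node is $x_1$ with $z = x_4$, and both functions give $1/12$.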

The above analysis relies on the existence of $z$, i.e., the lowest common ancestor $u$ must have a right parent. This need not be the case: $u$ can lie on the right spine of the entire tree. In this case a naive finger search would continue going up past $u$ along the spine until the root is reached. This could be very wasteful. Consider for instance a finger search from $x_{n-1}$ to $x_n$, which with probability $1/2$ would this way traverse the entire spine, which has $O(\log n)$ expected length. To prevent this we require that every node has stored with it information indicating whether it is on the right or left spine of the entire tree. With this information a finger search can stop going up as soon as the right spine of the entire tree is reached, and thus in effect only the path between $x$ and $y$ is traversed.

Fast finger searches

Now assume we have a finger or pointer on item $x$ and also one on item $z$, and we want to quickly locate item $y$ with key $k$. A natural thing to do is to start finger searches both at $x$ and at $z$ that proceed in lockstep in parallel until the first search reaches $y$. If $X$ and $Z$ are the lengths of the paths from $x$ and $z$ to $y$, respectively, then by the previous subsection the time necessary for such a parallel finger search would be proportional to $\min\{X, Z\}$, and the expected time would be proportional to $\mathrm{Ex}[\min\{X, Z\}] \le \min\{\mathrm{Ex}[X], \mathrm{Ex}[Z]\}$.

This can be exploited as follows. Always maintain a pointer to $T_{\min}$ and $T_{\max}$, the smallest and largest key items in the tree. Note that this can be done with constant overhead per update. If one now wants to perform a finger search starting at some node $x = x_\ell$ for some node $y = x_m$, and, say, $y$'s key is larger than the one of $x$, then start in parallel a finger search from $T_{\max} = x_n$. The expected time for this search is now proportional to the minimum of the expected path lengths from $x$ and $T_{\max}$ to $y$, which by the unweighted theorem of the previous section is $O(\min\{\log(m-\ell+1), \log(n-m+1)\})$. If we let $d = m - \ell + 1$ be the key distance between $x$ and $y$, then this is at most $O(\min\{\log d, \log(n-d)\}) = O(\log\min\{d, n-d\})$, as claimed, since $n - m + 1 \le n - d + 1$.

Joins and splits

Two treaps $T_1$ and $T_2$, with all keys in $T_1$ smaller than all keys in $T_2$, can be joined by creating a new tree $T$ whose root $r$ has as left child $T_1$ and right child $T_2$, and then deleting $r$ from $T$. Creating $T$ takes constant time. Deleting $r$, as we observed before, takes time proportional to the length of the right spine of $T_1$ plus the length of the left spine of $T_2$, i.e., the depth of the largest key item of $T_1$ plus the depth of the smallest key item of $T_2$. By the unweighted theorem of the previous section this is in expectation $O(\log m + \log n) = O(\log\max\{m, n\})$ if $T_1$ and $T_2$ are unweighted randomized search trees with $m$ and $n$ items, respectively. In the weighted case the corresponding theorem implies an expected bound of $O(1 + \log(W_1/w(T_{1,\max})) + \log(W_2/w(T_{2,\min})))$, if we let $W_i$ denote the total weight of $T_i$.

Splitting a treap $T$ into a treap $T_1$ with all keys smaller than $k$ and a treap $T_2$ with all keys larger than $k$ is achieved by essentially reversing the join operation. An item $x$ with key $k$ and infinite priority is inserted into $T$; because of its large priority the new item will end up as the root; the left and right subtrees of the root will be $T_1$ and $T_2$, respectively. This insertion works by first doing a search starting at the root to locate the correct leaf position for key $k$, adding item $x$ as a leaf, and then rotating $x$ up until it becomes the root. Note that the length of the search path is the same as the number of rotations performed, which in turn is the sum of the lengths of the right spine of $T_1$ and the left spine of $T_2$. Thus the time necessary to perform this split is proportional to the sum of the lengths of those two spines, and therefore the same expected time bounds hold as for joining $T_1$ and $T_2$.
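The join and split operations as described reduce to one deletion and one insertion, respectively. The following Python sketch illustrates this under our own naming (`sink`, `join`, `split` are illustrative, and the split key is assumed not to occur in the tree): a join hangs both treaps under a dummy root and rotates it down to a leaf, and a split inserts the key with priority $+\infty$.

```python
import math

class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority, self.left, self.right = key, priority, left, right

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key, priority):
    """Attach a leaf and rotate it up while its parent has smaller priority."""
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def sink(node):
    """Rotate node down to leaf position and clip it; return what remains of its subtree."""
    if node.left is None and node.right is None:
        return None
    if node.right is None or (node.left is not None
                              and node.left.priority > node.right.priority):
        top = rotate_right(node)
        top.right = sink(node)
    else:
        top = rotate_left(node)
        top.left = sink(node)
    return top

def join(t1, t2):
    """Join treaps t1 and t2, all keys of t1 being smaller than all keys of t2."""
    return sink(Node(None, -math.inf, t1, t2))     # dummy root, deleted immediately

def split(root, k):
    """Split into (keys < k, keys > k) by inserting k with infinite priority."""
    r = insert(root, k, math.inf)
    return r.left, r.right

def shape(v):
    """Nested-tuple representation of a tree, for comparing shapes."""
    return None if v is None else (v.key, shape(v.left), shape(v.right))
```

Since the shape of a treap is determined by its keys and (distinct) priorities, joining the two halves of a split reproduces the original tree exactly.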

Fast splits

How can one speed up the split operation described and analyzed above? One needs to shorten the search time for the correct leaf position, and one needs to reduce the number of subsequent rotations. Let us assume the size (or total weight) of the eventual $T_1$ is small. Then it makes sense to determine the correct leaf position for $x$ in $T$ by a finger search starting at $T_{\min}$, the item in $T$ with minimum key. Let $z$ be the lowest common ancestor of $T_{\min}$ and the leaf added for $x$. Note that $z$ is part of the path between $T_{\min}$ and $x$, and that $z$ must be on the left spine of $T$. Now when rotating $x$ up the tree one can stop as soon as $x$ becomes part of the left spine of the current tree, or in other words, as soon as $z$ becomes the left child of $x$. The current left subtree of $x$ will then contain exactly all items of $T$ with key smaller than the splitting key $k$ and forms the desired $T_1$. The tree $T_2$ is obtained by simply replacing the subtree rooted at $x$ by the right subtree of $x$.

The time to do all this is proportional to the length $L$ of the path in the original $T$ between $T_{\min}$ and the leaf added for $x$: as argued before, the finger search takes this much time, and the number of subsequent rotations is exactly the number of edges on this path between $x$ and $z$. In the unweighted case the expected value of $L$ is, by the theorem of the previous section, $O(\log m)$, where $m$ is the number of nodes that end up in $T_1$. This is not particularly good if $T_1$ is almost all of $T$. In this case it would be much better to proceed symmetrically: start the finger search at $T_{\max}$, the item in $T$ with largest key, and identify the path from $T_{\max}$ to $x$. The expectation of the length $R$ of that path, and hence of the splitting time, would then be $O(\log n)$, where $n$ is the size of $T_2$. Of course, in general one does not know in advance whether $T_1$ or $T_2$ is smaller. But one can circumvent this problem by starting both searches in lockstep in parallel, terminating both as soon as one succeeds. This would take $O(\min\{L, R\})$ time. And since $\mathrm{Ex}[\min\{L, R\}] \le \min\{\mathrm{Ex}[L], \mathrm{Ex}[R]\}$, the total expected time for the fast split would be $O(\min\{\log m, \log n\}) = O(\log\min\{m, n\})$.

For the weighted case it now would seem to suffice to observe that by the weighted theorem the expectation of $L$ is $O(1 + \log(W_1/w(T_{\min})) + \log(W_1/w(T_{1,\max})))$, where $T_{1,\max}$ is the item with largest key in $T_1$, which is of course nothing but $x^-$, the predecessor of $x$ in $T$. However, this is not quite true, since the new leaf $x$ is not necessarily a child of $x^-$, but could be a child of $x^+$, the successor of $x$. In that case the path in $T$ from $x^-$ to $x^+$ (in other words, the right spine of the left subtree of $x^+$) that needs to be traversed additionally after $x^-$ has been reached could be quite long, namely $O(1 + \log(1 + w(x^+)/w(x^-)))$ in expectation. This additional traversal and cost can be avoided if one uses more space and stores with every item in the tree a predecessor and a successor pointer. Note that those additional pointers could be maintained with constant overhead during all update operations.

What kind of information needs to be maintained so that such a split procedure can be implemented? One needs $T_{\min}$ and $T_{\max}$. Since the finger searches start at $T_{\min}$ and $T_{\max}$ and the upward parts of the searches only proceed along the spines of $T$, one needs to maintain parent pointers only for spine nodes, and not for all nodes.

Fast joins

In a fast join of $T_1$ and $T_2$ one tries to reverse the operations of a fast split. Starting at $T_{1,\max}$ and $T_{2,\min}$ one traverses the right spine $RS$ of $T_1$ and the left spine $LS$ of $T_2$ in a merge-like fashion, until either a node $x$ of $RS$ is found whose priority is larger than the one of the root of $T_2$, or, symmetrically, a node $y$ of $LS$ is found whose priority exceeds the one of the root of $T_1$. In the first case the subtree $T_x$ of $T_1$ rooted at $x$ is replaced by the join of $T_x$ and $T_2$, computed using the normal join algorithm, and the root of $T_1$ becomes the root of the result. In the other case one proceeds symmetrically, and the root of $T_2$ becomes the root of the result.

This in effect reverses the operation of a fast split, except that no finger searches had to be made. Therefore the time required for fast joining $T_1$ and $T_2$ into $T$ is upper bounded by the time necessary to fast split $T$ into $T_1$ and $T_2$.

Excisions

Let $x$ and $y$ be two items in an $n$-node treap $T$, and assume we have a finger on at least one of them. We would like to extract from $T$ the treap $T'$ containing all $d$ items with keys between and including the keys of $x$ and $y$, leaving in $T$ all the remaining items. First we show that we can do this in time proportional to $P(x, y)$, the length of the path from $x$ to $y$, plus $SL(x)$ and $SR(y)$, the lengths of the right spine of the left subtree of $x$ and the left spine of the right subtree of $y$. By the unweighted theorem of the previous section the expectation of this would be $O(\log d)$.

With a finger search first identify the path connecting $x$ and $y$, and with this the lowest common ancestor $u$ of $x$ and $y$, all in time $O(P(x, y))$. Let $T_u$ be the subtree rooted at $u$. It suffices to show how to excise $T'$ from $T_u$. Let $LL$ be the sequence of nodes on the path from $x$ to $u$ with keys less than the key of $x$, and let $A$ be the sequence of the remaining nodes on that path (not including $u$). Symmetrically, let $RR$ be the sequence of nodes on the path from $y$ to $u$ with keys greater than the key of $y$, and let $B$ be the sequence of the remaining nodes (not including $u$). Now the desired treap $T'$ has as its root $u$; its left subtree is formed by stringing together the nodes in $A$ along with their right subtrees; its right subtree is formed by stringing together the nodes in $B$ together with their left subtrees. Clearly $T'$ can thus be formed in time $O(P(x, y))$.

The remaining pieces of $T_u$ can be put back into a treap as follows. Form a treap $L$ by stringing together the nodes in $LL$ together with their left subtrees, adding the left subtree of $x$ at the bottom of the right spine. Symmetrically, form a treap $R$ by stringing together the nodes in $RR$ together with their right subtrees, adding the right subtree of $y$ at the bottom of the left spine (see the two figures). The remainder of $T_u$ is then obtained by joining $L$ and $R$. Clearly $L$ and $R$ can be constructed in time $O(P(x, y))$. In the section on joins and splits we showed that the time necessary for the join operation is proportional to the sum of the lengths of the right spine of $L$ and the left spine of $R$. But this sum is at most $P(x, y) + SL(x) + SR(y)$.

The excision of $T'$ from $T$ can also be performed in expected $O(\log(n-d))$ time using the following method. Split $T$ into treaps $L$ and $T''$, where $L$ contains all items with key less than the key of $x$; split $T''$ into the desired $T'$ and $R$, where $R$ contains all items with key greater than the one of $y$; finally join $L$ and $R$ to form the remainder treap. $L$ and $R$ each contain at most $n-d$ nodes. Thus, using the fast split method described above and the normal join method, this can all be performed in expected $O(\log(n-d))$ time.

[Figure: Subtree $T_u$ before the excision.]

[Figure: The trees $L$, $T'$, and $R$.]

Of course usually $d$ is not known in advance. Thus we face the usual dilemma of which method to choose. This can be resolved as follows: in lockstep in parallel, perform a finger search from $x$ to $y$, and perform a finger search from $T_{\min}$ to $x$ followed by a finger search from $T_{\max}$ to $y$. If the search from $x$ to $y$ is completed first, use the first method for the excision; otherwise use the second method.

Expensive rotations

How expensive is it to maintain unweighted randomized search trees under insertions and deletions when the cost of a rotation is not constant but depends as $f(s)$ on the size $s$ of the subtree that is being rotated?

Since an insertion operation is just the reverse of a deletion operation, it suffices to analyze just deletions. Recall that in order to delete a node $x$ it first has to be located, and then it has to be rotated down into leaf position, where it is then clipped away. Since the search time and the clipping away are unaffected by the cost of rotations, we only need to analyze the expected cost $R_f(x)$ of rotating $x$ into leaf position.

As before, let $x_1, \ldots, x_n$ be the items in the treap numbered by increasing key rank. Assume we want to delete $x_k$.

For $i \le k \le j$ let $E_{k;i,j}$ denote the event that at some point during this deletion $x_k$ is the root of a subtree comprising the $j-i+1$ items $x_i$ through $x_j$. Then the total expected cost of the deletion is given by

$$R_f(x_k) = \sum_{1 \le i \le k}\ \sum_{k \le j \le n} \Pr[E_{k;i,j}]\, f(j-i+1).$$

We need to evaluate $\Pr[E_{k;i,j}]$. We can do this by characterizing those configurations of priorities that cause event $E_{k;i,j}$ to happen.

We claim that in the "generic" case $1 < i \le k \le j < n$ this event occurs exactly if among the $j-i+3$ items $x_{i-1}$ through $x_{j+1}$ the items $x_{i-1}$, $x_k$, $x_{j+1}$ have the largest three priorities (the order among those three is irrelevant).

As a consequence of the ancestor lemma, $x_k$ is root of a subtree comprising $x_i$ through $x_j$ exactly if $x_k$ has largest priority in that range and $x_{i-1}$ and $x_{j+1}$ each have larger priority than $x_k$, i.e., $x_{i-1}$ and $x_{j+1}$ have the largest two priorities among $x_{i-1}$ through $x_{j+1}$ and $x_k$ has third largest priority. Now the deletion of $x_k$ can be viewed as continuously decreasing its priority and rotating $x_k$ down whenever its priority becomes smaller than the priority of one of its children, thus always maintaining a legal treap. This means that if $x_{i-1}$, $x_k$, $x_{j+1}$ have the largest three priorities among $x_{i-1}$ through $x_{j+1}$, at some point during the decrease $x_k$ will have third largest priority and thus will be root of a tree comprising $x_i$ through $x_j$, as claimed. Similarly, $x_k$ can never become the root of such a tree if $x_{i-1}$, $x_k$, $x_{j+1}$ do not initially have the largest three priorities among $x_{i-1}$ through $x_{j+1}$.

Using the same type of argument it is easy to see that the "left-marginal" event $E_{k;1,j}$ happens iff $x_k$ and $x_{j+1}$ have the largest two priorities among the $j+1$ items $x_1$ through $x_{j+1}$. The "right-marginal" event $E_{k;i,n}$ happens iff $x_{i-1}$ and $x_k$ have the largest two priorities among $x_{i-1}$ through $x_n$. Finally, $E_{k;1,n}$ of course occurs exactly if $x_k$ has the largest of all $n$ priorities.

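Since in an unweighted randomized search tree the priorities are independent, identically distributed continuous random variables, we can conclude from these characterizations that

$$\Pr[E_{k;i,j}] = \begin{cases} \dfrac{6}{(j-i+1)(j-i+2)(j-i+3)} & \text{for } 1 < i \le k \le j < n \text{ (generic case)}, \\[1ex] \dfrac{2}{j(j+1)} & \text{for } i = 1 \text{ and } k \le j < n \text{ (left-marginal case)}, \\[1ex] \dfrac{2}{(n-i+1)(n-i+2)} & \text{for } 1 < i \le k \text{ and } j = n \text{ (right-marginal case)}, \\[1ex] \dfrac{1}{n} & \text{for } i = 1 \text{ and } j = n \text{ (full case)}. \end{cases}$$

Substituting these values now yields

$$R_f(x_k) = \frac{f(n)}{n} + \sum_{k \le j < n} \frac{2 f(j)}{j(j+1)} + \sum_{1 < i \le k} \frac{2 f(n-i+1)}{(n-i+1)(n-i+2)} + \sum_{1 < i \le k}\ \sum_{k \le j < n} \frac{6 f(j-i+1)}{(j-i+1)(j-i+2)(j-i+3)}.$$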

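In this form this expression is not particularly illuminating. Rewriting it as a linear combination of $f(1), \ldots, f(n)$ yields for $k \le (n+1)/2$

$$R_f(x_k) = \frac{f(n)}{n} + \sum_{1 \le s < k} \frac{6}{(s+1)(s+2)} f(s) + \sum_{k \le s \le n-k} \left( \frac{6(k-1)}{s(s+1)(s+2)} + \frac{2}{s(s+1)} \right) f(s) + \sum_{n-k < s < n} \left( \frac{6(n-s-1)}{s(s+1)(s+2)} + \frac{4}{s(s+1)} \right) f(s)$$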

and for $k > (n+1)/2$ we can exploit symmetry and get $R_f(x_k) = R_f(x_{n-k+1})$. This is the exact expectation and applies to any arbitrary real valued function $f$. For non-negative $f$ it is easy to see that for any $k$ this expression is upper bounded by

$$R_f(x_k) = O\left( \frac{f(n)}{n} + \sum_{0 < s < n} \frac{f(s)}{s^2} \right)$$

From this the bounds of Theorem about expensive rotations follow immediately.

There is a slightly less cumbersome way to arrive at this asymptotic bound. For any $k$ and any $s < n$, item $x_k$ can participate in at most $s$ generic events $E_{k;i,j}$ with $j-i+1 = s$, each having a probability of $O(1/s^3)$, which yields a contribution of $O(f(s)/s^2)$ to $R_f(x_k)$. Similarly, $x_k$ can participate in at most two marginal events $E_{k;i,j}$ with $j-i+1 = s$, each having a probability of $O(1/s^2)$, which also yields a contribution of $O(f(s)/s^2)$ to $R_f(x_k)$. Finally, $x_k$ participates in exactly one full event $E_{k;1,n}$, which has probability $1/n$ and gives the $f(n)/n$ contribution to $R_f(x_k)$.
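The regrouping of the four sums by subtree size can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the event probabilities $6/(s(s+1)(s+2))$, $2/(s(s+1))$ and $1/n$ for generic, marginal and full events with $s = j-i+1$, and the per-size coefficients in `rotation_cost_linear` are our own regrouping of those terms. The function names are hypothetical.

```python
from fractions import Fraction

def rotation_cost_sum(f, n, k):
    """Sum of Pr[E_{k;i,j}] * f(j-i+1) over all events, event by event."""
    total = Fraction(f(n), n)                      # full event: i = 1, j = n
    for j in range(k, n):                          # left-marginal events: i = 1
        total += Fraction(2, j * (j + 1)) * f(j)
    for i in range(2, k + 1):                      # right-marginal events: j = n
        s = n - i + 1
        total += Fraction(2, s * (s + 1)) * f(s)
    for i in range(2, k + 1):                      # generic events
        for j in range(k, n):
            s = j - i + 1
            total += Fraction(6, s * (s + 1) * (s + 2)) * f(s)
    return total

def rotation_cost_linear(f, n, k):
    """Same quantity grouped by subtree size s; assumes k <= (n+1)/2."""
    total = Fraction(f(n), n)
    for s in range(1, n):
        cube = s * (s + 1) * (s + 2)
        if s < k:
            coeff = Fraction(6, (s + 1) * (s + 2))
        elif s <= n - k:
            coeff = Fraction(6 * (k - 1), cube) + Fraction(2, s * (s + 1))
        else:
            coeff = Fraction(6 * (n - s - 1), cube) + Fraction(4, s * (s + 1))
        total += coeff * f(s)
    return total
```

Exact rational arithmetic is used so that the two forms can be compared with `==` rather than with a floating point tolerance; `f` is assumed to return integers.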

A different rotation cost model

The above analysis hinges on the fact that the probability that a particular three random variables are the smallest in a set of $s$ independent identically distributed continuous random variables is $O(1/s^3)$. In the next section, which deals with limited randomness, we will see that it is advantageous if one only needs to consider two-out-of-$s$ variables and not three.

In order to achieve this we will slightly change how the cost of a rotation is measured. If node $x$ is rotated, then the cost will be $f(\ell) + f(r)$, where $\ell$ and $r$ are the sizes of the left and right subtrees of $x$. This cost model is asymptotically indistinguishable from the previous one as long as there exist constants $c_1$ and $c_2$ so that $c_1 (f(\ell) + f(r)) \le f(\ell + r + 1) \le c_2 (f(\ell) + f(r))$ for all $\ell$, $r$. This is the case for all non-decreasing functions that satisfy $f(n+1) \le c \cdot f(n/2)$, which is true essentially for all increasing functions that grow at most polynomially fast.

We will distribute this cost of a rotation at $x$ to the individual nodes of the subtrees as follows: a node $y$ that differs in key rank from $x$ by $j$ is charged $\Delta f(j) = f(j) - f(j-1)$, with the convention that $f(0) = 0$. Since the right subtree of $x$ contains the first $r$ nodes that succeed $x$ in key rank, the charge distributed in the right subtree is thus $\sum_{1 \le j \le r} \Delta f(j)$, which evaluates to $f(r)$ as desired. Symmetrically, the total charge in the left subtree adds up to $f(\ell)$.

Footnote: We use the notations $x^{\overline{m}} = x(x+1)\cdots(x+m-1)$ and $x^{\underline{m}} = x(x-1)\cdots(x-m+1)$.

Now let $x = x_k$ be the node to be deleted and let $y = x_{k+j}$ be some other node. The node $y$ may participate in several rotations during the deletion of $x$. What are the roots $z = x_{k+i}$ of the maximal subtrees that are moved up during those rotations? It is not hard to see that before the deletion began both $y$ and $z$ must have been descendants of $x$, and after completion of the deletion $y$ must be a descendant of $z$. This characterizes the $z$'s exactly and corresponds to the following condition on the priorities: in the index range between the minimum and the maximum of $\{k, k+i, k+j\}$ the node $x = x_k$ must have the largest priority and $z = x_{k+i}$ must have the second largest. Note that with uniform and independent random priorities this condition holds with probability $1/s^{\underline{2}} = 1/(s(s-1))$ if the size of the range is $s$.

If $D_{i,j}$ denotes the event that $y = x_{k+j}$ participated in a down rotation of $x = x_k$ against $z = x_{k+i}$, then the expected cost of the rotations $R_f(x_k)$ incurred when $x_k$ is deleted from an $n$-node tree using cost function $f$ can be written as

$$R_f(x_k) = \sum_{\substack{-k < j \le n-k \\ j \ne 0}} \; \sum_{\substack{-k < i \le n-k \\ i \ne 0}} \Pr[D_{i,j}] \, \Delta f(|j|)$$

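Since $\Pr[D_{i,j}] = 1/(s(s-1))$ with $s = \max\{k+i, k+j, k\} - \min\{k+i, k+j, k\} + 1$, the inner sum evaluates for a fixed $j > 0$ to

$$\sum_{-k < i < 0} \frac{1}{(j-i+1)(j-i)} + \sum_{0 < i \le j} \frac{1}{(j+1)j} + \sum_{j < i \le n-k} \frac{1}{(i+1)i}$$

which, using $\sum_{a \le \nu < b} \frac{1}{\nu(\nu+1)} = \frac{1}{a} - \frac{1}{b}$, evaluates to

$$\left[ \frac{1}{j+1} - \frac{1}{j+k} \right] + \left[ \frac{1}{j+1} \right] + \left[ \frac{1}{j+1} - \frac{1}{n-k+1} \right] < \frac{3}{j+1} < \frac{3}{j}$$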

For $y = x_{k-j}$ a symmetric argument shows that the inner sum is also upper bounded by $3/j$. When $f$ is non-decreasing, i.e. $\Delta f$ is non-negative, we therefore get

$$R_f(x_k) \le \sum_{0 < j < k} \frac{3 \Delta f(j)}{j} + \sum_{0 < j \le n-k} \frac{3 \Delta f(j)}{j} \le 6 \sum_{0 < j \le n} \frac{\Delta f(j)}{j} = 6 \left( \frac{f(n)}{n} + \sum_{0 < j < n} \frac{f(j)}{j(j+1)} \right)$$

from which again the bounds on expensive rotations stated in Theorem follow. Note that this method actually also allows the exact computation of $R_f(x_k)$ for any arbitrary real valued function $f$.
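The evaluation of the inner sum can be checked with exact arithmetic. This is a small sketch under the assumption that $\Pr[D_{i,j}] = 1/(s(s-1))$, where $s$ is the size of the index range spanned by $k$, $k+i$ and $k+j$; the closed form in `closed_form` is our reading of the three bracketed terms, and all function names are hypothetical.

```python
from fractions import Fraction

def pr_down_rotation(k, i, j):
    """Probability that x_k has the largest and x_{k+i} the second largest
    priority in the index range spanned by k, k+i and k+j."""
    s = max(k, k + i, k + j) - min(k, k + i, k + j) + 1
    return Fraction(1, s * (s - 1))

def inner_sum(n, k, j):
    """Sum of Pr[D_{i,j}] over all i with -k < i <= n-k and i != 0."""
    return sum(pr_down_rotation(k, i, j)
               for i in range(-k + 1, n - k + 1) if i != 0)

def closed_form(n, k, j):
    """Telescoped value of the inner sum for a fixed j > 0."""
    return Fraction(3, j + 1) - Fraction(1, j + k) - Fraction(1, n - k + 1)
```

For every $0 < j \le n-k$ the two functions should agree, and the value stays below $3/(j+1)$.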

Changing weights

If the priority $p$ of a single item $x$ in a treap is changed to a new priority $p'$, then the heap property of the treap can be reestablished by simply rotating $x$ up or down the treap, as is done during the insertion and deletion operations. The cost of this will be proportional to the number of rotations performed, which is $|D(x) - D'(x)|$, where $D(x)$ and $D'(x)$ are the depth of $x$ in the old and new tree, respectively.
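The restructuring step can be illustrated with a minimal max-heap treap sketch. It is an illustration under our own assumptions (recursive code, distinct priorities, hypothetical names such as `change_priority` and `sink`), not the authors' implementation. The list `count` tallies rotations so that the relation between rotations and depth change can be observed.

```python
class Node:
    def __init__(self, key, pri):
        self.key, self.pri = key, pri
        self.left = self.right = None

def rotate_right(t):
    l = t.left
    t.left, l.right = l.right, t
    return l

def rotate_left(t):
    r = t.right
    t.right, r.left = r.left, t
    return r

def insert(t, key, pri):
    """Standard treap insertion: insert as a leaf, rotate up while needed."""
    if t is None:
        return Node(key, pri)
    if key < t.key:
        t.left = insert(t.left, key, pri)
        if t.left.pri > t.pri:
            t = rotate_right(t)
    else:
        t.right = insert(t.right, key, pri)
        if t.right.pri > t.pri:
            t = rotate_left(t)
    return t

def sink(t, count):
    """Rotate the root of t down until no child has a larger priority."""
    kids = [c for c in (t.left, t.right) if c is not None]
    if not kids:
        return t
    top = max(kids, key=lambda c: c.pri)
    if top.pri <= t.pri:
        return t
    count[0] += 1
    if top is t.left:
        t = rotate_right(t)
        t.right = sink(t.right, count)
    else:
        t = rotate_left(t)
        t.left = sink(t.left, count)
    return t

def change_priority(t, key, new_pri, count):
    """Set the priority of `key` (assumed present) and restore heap order."""
    if key == t.key:
        t.pri = new_pri
        return sink(t, count)            # no-op if the priority went up
    if key < t.key:
        t.left = change_priority(t.left, key, new_pri, count)
        if t.left.pri > t.pri:           # only the changed node can violate
            t = rotate_right(t)
            count[0] += 1
    else:
        t.right = change_priority(t.right, key, new_pri, count)
        if t.right.pri > t.pri:
            t = rotate_left(t)
            count[0] += 1
    return t

def depth(t, key):
    d = 1
    while t.key != key:
        t = t.left if key < t.key else t.right
        d += 1
    return d

def is_heap(t):
    if t is None:
        return True
    ok = all(c is None or c.pri < t.pri for c in (t.left, t.right))
    return ok and is_heap(t.left) and is_heap(t.right)
```

Each up rotation decreases the depth of the changed node by one and each down rotation increases it by one, so `count[0]` equals the absolute depth difference after the call.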

Now assume the weight $w$ of an item $x$ in a weighted randomized search tree of total weight $W$ is changed to $w'$, and after the required change in the random priorities the tree is restructured as outlined above. In the old tree $x$ had expected depth $O(\log(W/w))$; in the new tree it has expected depth $O(\log(W'/w'))$, where $W' = W - w + w'$ is the total weight of the new tree. One is now tempted to claim that the expected depth difference, and hence the expected number of rotations to achieve the change, is $O(|\log(W/w) - \log(W'/w')|)$, which is about $O(|\log(w'/w)|)$ if the weight change is small relative to $W$. There are two problems with such a quick claim: (a) in general it is not true that $\mathrm{Ex}[|X - Y|] = |\mathrm{Ex}[X] - \mathrm{Ex}[Y]|$; (b) one cannot upper bound a difference $A - B$ using only upper bounds for $A$ and $B$.

For the sake of definiteness let us deal with the case of a weight increase, i.e. $w' > w$. The case of a decrease can be dealt with analogously. Let us first address problem (a). In section we briefly outlined how to realize weighted randomized search trees: as priority $p$ for an item $x$ of weight $w$ use $u^{1/w}$ (or equivalently $(\log u)/w$), where $u$ is a random number uniformly distributed in $[0,1]$. This simulates generating the maximum of $w$ random numbers drawn independently and uniformly from $[0,1]$. If the new weight of $x$ is $w'$ and one chooses as new priority $p' = v^{1/w'}$, where $v$ is a new random number drawn from $[0,1]$, then $p'$ is not necessarily larger than $p$. This means that $D'(x)$, the new depth of $x$, could be larger than the old depth $D(x)$, in spite of the weight increase, which is expected to make the depth of $x$ smaller. Thus, since the relative order of $D(x)$ and $D'(x)$ is unknown, $\mathrm{Ex}[|D(x) - D'(x)|]$ becomes difficult to evaluate.

This difficulty does not arise if one chooses as new priority $p' = u^{1/w'}$ (or equivalently $(\log u)/w'$), where $u$ is the random number originally drawn from $[0,1]$. Note that although the random variables $p$ and $p'$ are highly dependent, each has the correct distribution, and this is all that is required. Since $w' > w$ we have $u^{1/w'} \ge u^{1/w}$, i.e. $p' \ge p$. Thus we have $D'(x) \le D(x)$ and therefore

$$\mathrm{Ex}[|D(x) - D'(x)|] = \mathrm{Ex}[D(x) - D'(x)] = \mathrm{Ex}[D(x)] - \mathrm{Ex}[D'(x)]$$
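The effect of reusing the original random number $u$ can be sketched in a few lines. This is an illustration under our own assumptions: the class `WeightedItem` and both helper functions are hypothetical names, and $u$ is drawn from $(0,1]$ rather than $[0,1]$ so that the logarithm is always defined.

```python
import math
import random

def power_priority(u, w):
    """Priority u**(1/w): distributed like the maximum of w uniform numbers."""
    return u ** (1.0 / w)

def log_priority(u, w):
    """Order-equivalent form (log u)/w; it is <= 0 and grows with w."""
    return math.log(u) / w

class WeightedItem:
    def __init__(self, key, weight, rng):
        self.key = key
        self.weight = weight
        self.u = 1.0 - rng.random()      # uniform in (0, 1], drawn only once

    def priority(self):
        return log_priority(self.u, self.weight)

    def set_weight(self, new_weight):
        # Reuse the stored u, so a weight increase never lowers the priority.
        self.weight = new_weight
```

Because $\log u \le 0$, dividing by a larger weight moves the priority toward zero, which is why a weight increase can only move the item up in the tree.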

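Addressing problem (b) pointed out above, we can bound the difference of those two expectations using the fact that we know exact expressions for each of them. Assume that $x$ has key rank $\ell$, i.e. $x = x_\ell$, and let $\Delta = w' - w$ be the weight increase. From Theorem we get (using the notation from there)

$$\mathrm{Ex}[D(x)] - \mathrm{Ex}[D'(x)] = \sum_{\substack{1 \le i \le n \\ i \ne \ell}} \frac{w_i}{w_{i:\ell}} - \sum_{\substack{1 \le i \le n \\ i \ne \ell}} \frac{w_i}{w_{i:\ell} + \Delta} = \sum_{\ell < i \le n} \left( \frac{w_i}{w_{i:\ell}} - \frac{w_i}{w_{i:\ell} + \Delta} \right) + \sum_{1 \le i < \ell} \left( \frac{w_i}{w_{i:\ell}} - \frac{w_i}{w_{i:\ell} + \Delta} \right)$$

Applying the methods used in the proof of part (iv) of Theorem it is easy to show that each of the last two sums is bounded from above by $\ln \frac{w + \Delta}{w} = \ln \frac{w'}{w}$. From this bound on the expected difference of old and new depth of $x$ it follows that the expected time to adjust the tree after the weight of $x$ has been increased from $w$ to $w'$ is $O(\log(w'/w))$. Using the same methods one can show that decreasing $w$ to $w'$ can be dealt with in time $O(\log(w/w'))$.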

When implementing the method outlined here, one needs to store for every item the priority implicitly in two pieces $(w, u)$, where integer $w$ is the weight and $u$ is a random number from $[0,1]$. When two priorities $(w, u)$ and $(\tilde{w}, \tilde{u})$ are to be compared, one has to compare $u^{1/w}$ with $\tilde{u}^{1/\tilde{w}}$. Alternatively, one could store the pieces $(w, \log u)$ and use $(\log u)/w$ for the explicit comparison.

This raises the issue of the cost of arithmetic. We can postulate a model of computation where the evaluation of an expression like $u^{1/w}$ or $\log u$ takes constant time, and thus dealing with priorities does not become a significant overhead to the tree operations. We would like to argue that such a model is not that unrealistic. This seems clear in practice, since there one would definitely use a floating point implementation. (This is not to say that weighted trees are necessarily practical.) From the theoretical point of view, the assumption of constant time evaluation of those functions is not that unrealistic, since Brent has shown that, when measured in the bit model, evaluating such functions up to a relative error of $2^{-m}$ is only slightly more costly than multiplying two $m$-bit numbers.

Thus we assume a word model where each of the four basic arithmetic operations and evaluating functions such as $\log u$, using operands specified by logarithmically many bits, costs constant time. It seems natural to assume here that logarithmically means $O(\log W)$. We now need to deal with one issue: it is not clear that a word size of $O(\log W)$ suffices to represent our random priorities so that their relative order can be determined.

Here is a simple argument why $O(\log W)$ bits per word should suffice for our purposes. Following the definition of a weighted randomized search tree, the priorities of an $n$-node tree of total weight $W$ can be viewed as follows: $W$ random numbers are drawn independently and uniformly from the interval $[0,1]$ and certain $n$ of those chosen numbers are selected to be the priorities. Now basic arguments show that with probability at most $1/W^{k-2}$ the difference of any two of the $W$ chosen numbers is smaller than $1/W^k$. This means that with probability at least $1 - 1/W^{k-2}$ all comparisons between priorities can be resolved if the numbers are represented using more than $k \log_2 W$ bits, i.e. a word size of $O(\log W)$ suffices with high probability.

Limited Randomness

The analyses of the previous sections crucially relied on the availability of perfect randomness and on the complete independence of the random priorities. In this section we briefly discuss how one can do with much less, thus proving Theorem. We will show that for unweighted trees all the asymptotic results about expectations proved in the previous sections continue to hold if the priorities in the tree are integer random variables that are of only limited independence and are uniformly distributed over a sufficiently large range. In particular we will show that 8-wise independence and range size $U \ge n^3$ suffice. The standard example of a family of random variables satisfying these properties is $X_i = q(i) \bmod U$ for $1 \le i \le n$, where $U > n^3$ is a prime number and $q$ is a degree 7 polynomial whose coefficients are drawn uniformly and independently from $\{0, \ldots, U-1\}$. Thus $q$ acts as a pseudo random number generator that needs $O(\log n)$ truly random bits as seeds to specify its eight coefficients.
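A generator of this kind can be sketched as follows. The sketch assumes a degree 7 polynomial (eight coefficients) and a prime modulus just above $n^3$; the class name `PolynomialPriorities`, the trial-division primality test and the use of Python's `random` module as the source of seed bits are our own illustrative choices.

```python
import random

def is_prime(m):
    """Trial division; adequate for the moderate moduli used in a sketch."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def next_prime_above(m):
    """Smallest prime strictly greater than m."""
    candidate = m + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate

class PolynomialPriorities:
    """Priorities X_i = q(i) mod U for a random polynomial q of given degree."""

    def __init__(self, n, degree=7, rng=None):
        rng = rng if rng is not None else random.Random()
        self.U = next_prime_above(n ** 3)          # prime U > n^3
        # degree + 1 coefficients, uniform and independent in {0, ..., U-1}
        self.coeffs = [rng.randrange(self.U) for _ in range(degree + 1)]

    def __call__(self, i):
        acc = 0
        for c in self.coeffs:                      # Horner evaluation mod U
            acc = (acc * i + c) % self.U
        return acc
```

Only the coefficients need to be stored; the priority of item $i$ is recomputed on demand by calling the object, for example `gen = PolynomialPriorities(1000); gen(17)`.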

It is quite easy to see why one would want $U \ge n^3$. Ideally all priorities occurring in a randomized search tree should be distinct. Our algorithms on treaps can easily be made to handle the case of equal priorities. However, for the analysis and for providing guarantees on the expectations it is preferable that all priorities be distinct. Because of the pairwise independence implied by 8-wise independence, for any two distinct random variables $X_i$, $X_j$ we have $\Pr[X_i = X_j] = 1/U$. Thus the probability that any two of the $n$ random variables happen to agree is upper bounded by $\binom{n}{2}/U$ (and, as the birthday paradox illustrates, not much smaller than that). With $U \ge n^3$ the probability of some two priorities agreeing thus becomes less than $1/n$. We can now safely ignore the case of agreeing priorities, since in that event we could even afford to rebuild the entire tree, which would incur only $O(\log n)$ expected cost.
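The estimate can be written out as a short calculation. It is a sketch that assumes $U \ge n^3$ and, as an assumption not spelled out in the text, that rebuilding an $n$-node tree costs $O(n \log n)$:

```latex
\Pr[\text{some two priorities agree}]
  \;\le\; \binom{n}{2}\cdot\frac{1}{U}
  \;\le\; \frac{n(n-1)}{2n^{3}}
  \;<\; \frac{1}{n},
\qquad
\text{expected rebuild cost}
  \;\le\; \frac{1}{n}\cdot O(n\log n) \;=\; O(\log n).
```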

Why 8-wise independence? Let $X$ be a finite set of random variables and let $d$ be some integer constant. We say that $X$ has the $d$-max property iff there is some constant $c$ so that for any enumeration $X_1, X_2, \ldots, X_m$ of the elements of any subset of $X$ we have

$$\Pr[X_1 > X_2 > \cdots > X_d > \{X_{d+1}, X_{d+2}, \ldots, X_m\}] \le c / m^{\underline{d}}$$

Note that identically distributed independent continuous random variables have the $d$-max property for any $d$ with a constant $c = 1$.

It turns out that all our results about expected properties of randomized search trees and of their update operations can be proved by relying only on the 2-max property of the random priorities. Moreover, the 2-max property is implied by 8-wise independence because of the following remarkable lemma that is essentially due to Mulmuley.

Lemma. Let $X$ be a set of $n$ random variables, each uniformly distributed over a common integer range of size at least $n$. $X$ has the $d$-max property if its random variables are $(3d+2)$-wise independent.

A proof of this lemma (or rather of a related version) can be found in Mulmuley's book, section

We now need to show that results of section about unweighted trees continue to hold up to a constant factor if the random priorities satisfy the 2-max property. This is clear for the central Corollaries and. As a consequence of the ancestor and common ancestor lemma they essentially just give the probability that a certain one in a set of priorities achieves the maximum. For those two corollaries the 1-max property would actually suffice. From the continued validity of Corollary the asymptotic versions of points (i) and (ii) of Theorem about expected depth of a node and size of its subtree follow immediately, also relying only on the 1-max property. Note that this means that if one is only interested in a basic version of randomized search trees, where the expected search and update times are logarithmic although more than an expected constant number of rotations may occur per update, then 5-wise independence of the random priorities provably suffices.

Points (iii) and (iv) of Theorem do not follow immediately. They consider expectations of random variables that were expressed in Theorem as the sum of differences of indicator variables $(A_{i,\ell} - C_{i;\ell,m})$, and upper bounds for the expectations of $A_{i,\ell}$ and $C_{i;\ell,m}$ yield no upper bound for the expectation of their difference. Now $A_{i,\ell} - C_{i;\ell,m}$ really indicates the event $E_{i;\ell,m}$ that $x_i$ is an ancestor of $x_\ell$ but not an ancestor of $x_m$. We need to show that if the priorities have the 2-max property, $\Pr[E_{i;\ell,m}]$ is essentially the same as if the priorities were completely independent.

Without loss of generality let us assume $\ell < m$. In the case $i \le \ell$ the event $E_{i;\ell,m}$ happens exactly when the following constellation occurs among the priorities: $x_i$ has the largest priority among the items with index between and including $i$ and $\ell$, but not the largest priority among the items with index between $i$ and $m$. For $\ell < i < m$ event $E_{i;\ell,m}$ occurs iff $x_i$ has the largest priority among the items with indices in the range between $\ell$ and $i$, but not the largest in the range $\ell$ to $m$. For the case $i \ge m$ the event is empty.

Thus in both cases we are dealing with an event $E_{X,Y,Z}$ of the following form: for a set $Z$ of random variables and $X \in Y \subset Z$, the random variable $X$ is largest among the ones in $Y$ but not the largest among the ones in $Z$. In the case of identically distributed independent random variables clearly we get $\Pr[E_{X,Y,Z}] = 1/|Y| - 1/|Z|$. The following claim shows that essentially the same is true if $Z$ has the 2-max property.

Claim. If $Z$ has the 2-max property, then $\Pr[E_{X,Y,Z}] = O(1/|Y| - 1/|Z|)$.

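Proof. Let $a = |Y|$ and $b = |Z|$, and let $X_1, X_2, \ldots, X_b$ be an enumeration of $Z$ so that $X_1 = X$ and $Y = \{X_1, \ldots, X_a\}$. For $a < i \le b$ let $F_i$ denote the event that $X_i$ is largest among $\{X_1, \ldots, X_i\}$ and $X_1$ is second largest. Because of the 2-max property of $Z$ we have $\Pr[F_i] = O(1/i^2)$. Since $E_{X,Y,Z} \subseteq \bigcup_{a < i \le b} F_i$ we therefore get

$$\Pr[E_{X,Y,Z}] \le \sum_{a < i \le b} \Pr[F_i] = \sum_{a < i \le b} O(1/i^2) = O(1/a - 1/b)$$

as desired.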
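The last step of the proof can be spelled out as a telescoping sum. This is a sketch, assuming the 2-max property is applied in the form $\Pr[F_i] \le c/(i(i-1))$ for some constant $c$:

```latex
\Pr[E_{X,Y,Z}]
  \;\le\; \sum_{a<i\le b}\Pr[F_i]
  \;\le\; \sum_{a<i\le b}\frac{c}{i(i-1)}
  \;=\; c\sum_{a<i\le b}\left(\frac{1}{i-1}-\frac{1}{i}\right)
  \;=\; c\left(\frac{1}{a}-\frac{1}{b}\right).
```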

Thus points (iii) and (iv) of Theorem hold up to constant factors if priorities have the 2-max property. This immediately means that all results listed in Theorem, except for the ones on expensive rotations, continue to hold up to constant factors if random priorities satisfying the 2-max property are used. The results on expensive rotations rely on the 3-max property. However, if one uses the alternate cost model as explained in section, then the 2-max property again suffices. This cost model is equivalent to the first one for all rotation cost functions of practical interest.

The only result for unweighted trees that seems to require something more than the 2-max property is the one on short excess paths for finger searches in section. This is not too serious, since other methods for implementing finger searches are available. We leave it as an open problem to determine weak conditions on priorities under which excess paths remain short in expectation.

Implicit Priorities

In this section we show that it is possible to implement unweighted randomized search trees so that no priorities are stored explicitly. We offer three different methods. One uses hash functions to generate or regenerate priorities on demand. Another stores the nodes of the tree in random order and uses node addresses as priorities. The last method recomputes priorities from subtree sizes.

Priorities from hash functions

This method is based on an initial idea of Danny Sleator. He suggested to choose and associate with a randomized search tree a hash function $h$. For every item in the tree the priority is then declared to be $h(k)$, where $k$ is the key of the item. This priority need not be stored, since it can be recomputed whenever it is needed. The hope is, of course, that a good hash function is random enough so that the generated priorities behave like random numbers.

Initially it was not clear what sort of hash function would actually exhibit enough random behaviour. However, the results of the previous section show that choosing for $h$ the randomly selected degree 7 polynomial $q$ mentioned at the beginning of the previous section does the trick. If one is only interested in normal search and update times, then a randomly selected degree 4 polynomial suffices, as discussed in the previous section. In order to make this scheme applicable for any sort of key type, we apply $q$ not to the key but to the address of the node where the respective item is stored.

One may argue that from a practical point of view it is too expensive to evaluate a degree 7 polynomial whenever a priority needs to be looked at. Note, however, that priorities are compared only during updates and that priority comparisons are coupled with rotations. This means that the expected number of priorities looked at during a deletion or an insertion is bounded by a small constant.

Bob Tarjan pointed out to us that this method also yields a good randomized method for the so-called unique representation problem, where one would like subsets of a finite universe to

have unique tree representations see eg and

Locations as priorities

Here we store the nodes of the tree in an array $L[\,]$ in random order. Now the addresses of the nodes can serve as priorities, i.e. the node $L[i]$ has priority $i$. We will assume here that the underlying treap has the min-heap property and not the max-heap property. Thus $L[1]$ will be the root of the tree.

How does one insert into or delete from an $n$-node tree stored in this fashion? Basically one needs to update a random permutation. In order to insert an item $x$, some $i$ with $1 \le i \le n+1$ is chosen uniformly at random. If $i = n+1$, then $x$ is stored in location $L[n+1]$, i.e. it is assigned priority $n+1$, and it is then inserted in the treap. If $i \le n$, the node stored at $L[i]$ is moved to $L[n+1]$, i.e. its priority is changed to $n+1$. This means that in the tree it has to be rotated down into leaf position. The new item $x$ is placed into location $L[i]$ and it is inserted in the treap with priority $i$.

When item $x = L[i]$ is to be removed from the tree, it is first deleted in the usual fashion. Since this vacates location $L[i]$, the node stored at $L[n]$ is moved there. This means its priority was changed from $n$ to $i$ and it has to be rotated up the tree accordingly.
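The bookkeeping on the array alone, without the accompanying treap rotations, can be sketched as follows. The class `RandomOrderArray` and its return values are hypothetical; priorities are 1-based as in the text, while the underlying Python list is 0-based, so priority $p$ lives in `slots[p - 1]`.

```python
import random

class RandomOrderArray:
    """Keeps items in uniformly random order; slot number = priority."""

    def __init__(self, rng=None):
        self.slots = []
        self.rng = rng if rng is not None else random.Random()

    def insert(self, item):
        """Place item at a random priority i in 1..n+1.

        Returns (i, moved): moved is the item whose priority changed from i
        to n+1 and must be rotated down to a leaf, or None if i == n+1."""
        n = len(self.slots)
        i = self.rng.randint(1, n + 1)             # uniform, both ends included
        if i == n + 1:
            self.slots.append(item)
            return i, None
        moved = self.slots[i - 1]
        self.slots.append(moved)                   # moved item now has priority n+1
        self.slots[i - 1] = item
        return i, moved

    def delete(self, i):
        """Remove the item with priority i.

        Returns (removed, moved): moved is the item whose priority changed
        from n to i and must be rotated up, or None if i == n."""
        n = len(self.slots)
        removed = self.slots[i - 1]
        last = self.slots.pop()
        if i == n:
            return removed, None
        self.slots[i - 1] = last
        return removed, last
```

A full implementation would follow each call by the treap operations described above: rotating the moved node down or up and inserting or deleting the item itself.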

This scheme is relatively simple, but it does have some drawbacks. Per update some extra node changes location and has to be rotated up or down the tree. This is not a problem timewise, since the expected number of those rotations is constant. However, changing the location of a node $y$ means that one needs to access its parent so that the relevant child pointer can be reset. For accessing the parent one either needs to maintain explicit parent pointers, which is costly in space, or one needs to find it via a search for $y$, which is costly in time. Explicit parent pointers definitely are necessary if a family of trees is to be maintained under joins and splits. One can keep the nodes of all the trees in one array. However, when an extra node is moved during an update, one does not know which tree it belongs to, and hence one cannot find its parent via search. Also note the bookkeeping problem when the root of a tree is moved.

Finally there is the question what size the array $L$ should have. This is no problem if the maximum size of the tree is known a priori. If this is not the case, one can adjust the size dynamically by, say, doubling it whenever the array becomes filled and halving it whenever it is only a small constant fraction full. With a strategy of this sort the copying cost incurred through the size changes is easily seen to be constant in the amortized sense.

Computing priorities from subtree sizes

This method was suggested by Bent and Driscoll. It assumes that for every node $x$ in the tree the size $S(x)$ of its subtree is known. In a number of applications of search trees this information is stored with every node in any case.

During the deletion of a node $y$, for every down rotation one needs to decide whether to rotate left or right. This decision is normally dictated by the priorities of the two children $x$ and $z$ of $y$: the one with larger priority is rotated up. The priority of $x$ is the largest of the $S(x)$ priorities stored in its subtree. The priority of $z$ is the largest of the $S(z)$ priorities in its subtree. Thus the probability that the priority of $x$ is larger than the priority of $z$ is $p = S(x)/(S(x) + S(z))$. This means that $p$ should also be the probability that $x$ is rotated up. Thus the decision which way to rotate can be probabilistically correctly simulated by flipping a coin with bias $S(x)/(S(x) + S(z))$. It is amusing that one can actually do this without storing the sizes. Before the rotation one could determine $S(x)$ and $S(z)$ in linear time by traversing the two subtrees. This would make the cost of the rotation linear in the size of the subtree rotated, and our results about costly rotations imply that the expected deletion time is still logarithmic.

Unfortunately this trick does not work for insertions. How does one perform them? Note that when a node $x$ is to be inserted into a tree rooted at $y$, it becomes the root of the new tree with probability $1/(S(y) + 1)$. This suggests the following strategy: flip a coin with bias $1/(S(y) + 1)$. In case of success, insert $x$ into the tree by finding the correct leaf position and rotating it all the way back up to the root. In case of failure, apply this strategy recursively to the appropriate child of $y$.

We leave the implementation of joins and splits via this method as an exercise.
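The insertion strategy and the coin-flip deletion can be sketched as follows. This is an illustration under our own assumptions, not the authors' or Bent and Driscoll's code: subtree sizes are stored in the nodes, keys are assumed distinct, and the deletion is written as a recursive `join` of the two children, which produces the same tree as rotating the deleted node down with the biased coin flips.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(t):
    return t.size if t is not None else 0

def update(t):
    t.size = 1 + size(t.left) + size(t.right)
    return t

def rotate_right(t):
    l = t.left
    t.left = l.right
    l.right = update(t)
    return update(l)

def rotate_left(t):
    r = t.right
    t.right = r.left
    r.left = update(t)
    return update(r)

def insert_at_root(t, key):
    """Insert key as a leaf and rotate it all the way up to the root of t."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert_at_root(t.left, key)
        return rotate_right(t)
    t.right = insert_at_root(t.right, key)
    return rotate_left(t)

def insert(t, key, rng):
    if t is None:
        return Node(key)
    if rng.randrange(size(t) + 1) == 0:            # success with prob. 1/(S(y)+1)
        return insert_at_root(t, key)
    if key < t.key:
        t.left = insert(t.left, key, rng)
    else:
        t.right = insert(t.right, key, rng)
    return update(t)

def join(x, z, rng):
    """Merge subtrees x < z; x moves up with probability S(x)/(S(x)+S(z))."""
    if x is None:
        return z
    if z is None:
        return x
    if rng.randrange(size(x) + size(z)) < size(x):
        x.right = join(x.right, z, rng)
        return update(x)
    z.left = join(x, z.left, rng)
    return update(z)

def delete(t, key, rng):
    if t is None:
        return None
    if key == t.key:
        return join(t.left, t.right, rng)
    if key < t.key:
        t.left = delete(t.left, key, rng)
    else:
        t.right = delete(t.right, key, rng)
    return update(t)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def sizes_ok(t):
    if t is None:
        return True
    return (t.size == 1 + size(t.left) + size(t.right)
            and sizes_ok(t.left) and sizes_ok(t.right))
```

No priorities appear anywhere in this sketch; the stored sizes alone drive every random decision.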

Randomized search trees and Skip lists

Skip lists are a probabilistic search structure that were proposed and popularized by Pugh. They can be viewed as a hierarchy of coarser and coarser lists constructed over an initial linked list, with a coarser list in the hierarchy guiding the search in the next finer list in the hierarchy. Which list items are used in the coarser lists is determined via random choice. Alternatively, skip lists can also be viewed as a randomized version of $(a,b)$-trees.

The performance characteristics of skip lists are virtually identical to the ones of unweighted randomized search trees. The bounds listed in Theorem all hold if the notion of rotation is changed to the notion of pointer change. The only possible exception are the results about costly rotations: here it seems only partial results are known for skip lists, see section

Just as with randomized search trees, it is possible to generalize skip lists to a weighted version. This is done by appropriately biasing the random choice that determines how far up the hierarchy a list item is to go. Apparently again the expected performance characteristics of weighted skip lists match the ones of weighted randomized search trees listed in Theorem

No analogue of Theorem about limited randomness for skip lists has appeared in the literature. However, Kurt Mehlhorn has adapted the approach taken in this paper to apply to skip lists also. Also, Martin Dietzfelbinger apparently has proved results in this direction.

Comparing skip lists and randomized search trees seems a fruitless exercise. They are both conceptually reasonably simple and both are reasonably simple to implement. Both have been implemented and are, for instance, available as part of LEDA. They seem to be almost identical in their important performance characteristics. Differences, such as that randomized search trees can be implemented using only exactly n pointers whereas this appears to be impossible for skip lists, are not of particularly great practical significance.

There seems to be ample room for both skip lists and randomized search trees. We would like to refer the reader to chapter of Mulmuley's book, where the two structures are used and discussed as two prototypical randomization strategies.

Acknowledgements

We would like to thank Kurt Mehlhorn for his constructive comments and criticism.

References

G.M. Adelson-Velskii and Y.M. Landis, An algorithm for the organization of information. Soviet Math. Dokl.

A. Andersson and T. Ottmann, Faster uniquely represented dictionaries. Proc. FOCS.

H. Baumgarten, H. Jung, and K. Mehlhorn, Dynamic point location in general subdivision. Proc. ACM-SIAM Symp. on Discrete Algorithms (SODA).

R. Bayer and E. McCreight, Organization and maintenance of large ordered indices. Acta Inf.

S.W. Bent, D.D. Sleator, and R.E. Tarjan, Biased search trees. SIAM J. Comput.

S.W. Bent and J.R. Driscoll, Randomly balanced search trees. Manuscript.

R.P. Brent, Fast Multiple Precision Evaluation of Elementary Functions. J. of the ACM.

M. Brown, Addendum to "A Storage Scheme for Height-Balanced Trees". Inf. Proc. Letters.

K.L. Clarkson, K. Mehlhorn, and R. Seidel, Four results on randomized incremental construction. Comp. Geometry: Theory and Applications.

L. Devroye, A note on the height of binary search trees. J. of the ACM.

M. Dietzfelbinger, private communication.

I. Galperin and R.L. Rivest, Scapegoat Trees. Proc. ACM-SIAM Symp. on Discrete Algorithms (SODA).

L.J. Guibas and R. Sedgewick, A dichromatic framework for balanced trees. Proc. FOCS.

T. Hagerup and C. Rüb, A guided tour of Chernoff bounds. Inf. Proc. Letters.

K. Hoffmann, K. Mehlhorn, P. Rosenstiehl, and R.E. Tarjan, Sorting Jordan sequences in linear time using level-linked search trees. Inform. and Control.

E. McCreight, Priority search trees. SIAM J. Comput.

K. Mehlhorn, Sorting and Searching. Springer.

K. Mehlhorn, Multi-dimensional Searching and Computational Geometry. Springer.

K. Mehlhorn, private communication.

K. Mehlhorn and S. Näher, Algorithm Design and Software Libraries: Recent Developments in the LEDA Project. Algorithms, Software, Architectures, Information Processing, Vol., Elsevier Science Publishers B.V.

K. Mehlhorn and S. Näher, LEDA, a Platform for Combinatorial and Geometric Computing. To appear in Commun. ACM, January.

K. Mehlhorn and R. Raman, private communication.

K. Mulmuley, Computational Geometry: An Introduction through Randomized Algorithms. Prentice Hall.

S. Näher, LEDA User Manual, Version. Tech. Report MPI-I, Max-Planck-Institut für Informatik, Saarbrücken.

J. Nievergelt and E.M. Reingold, Binary search trees of bounded balance. SIAM J. Comput.

W. Pugh, Skip Lists: A Probabilistic Alternative to Balanced Trees. Commun. ACM.

W. Pugh and T. Teitelbaum, Incremental Computation via Function Caching. Proc. ACM POPL.

D.D. Sleator, private communication.

R.E. Tarjan, private communication.

D.D. Sleator and R.E. Tarjan, Self-adjusting binary search trees. J. of the ACM.

J. Vuillemin, A Unifying Look at Data Structures. Commun. ACM.