Advanced Algorithms

Lecture (September). Lecturer: David Karger

Scribes: Jaime Teevan, Arvind Sankar, Jason Rennie

Splay Trees

Motivation and Background

With splay trees we are leaving behind heaps and moving on to trees. Trees allow one to perform actions not typically supported by heaps, such as find element, find predecessor, find successor, and print nodes in order. Binary search trees are a common and basic form of tree. As long as a binary search tree is balanced, it has logarithmic insert and delete time. The goal of a splay tree is to have the tree maintain a logarithmic depth in an amortized sense by adjusting its structure with each access. This creates a powerful data structure which can be proven to be, in the limit and up to a constant factor, as good as or better than any static tree optimized for a certain sequence of accesses.

Previous Work

After binary search trees were first proposed, a number of variants were developed to improve on their poor worst-case behavior. These include AVL trees, 2-3 trees, red-black trees, and many more. Each improved the performance of a simple binary search tree but left something to be desired. Most require augmentation of the simple tree data structure, and none can claim theoretical performance as good as that of splay trees.

The two basic elements of splay trees, self-adjustment and rotation, are hardly new. Many variants on binary trees use some form of rotation; self-adjusting linked lists and heaps had been introduced before splay trees. Splay trees were basically the right orchestration of ideas that were known for some time. While variants on binary search trees often aggressively pursue maintaining a balance, splay trees deal with the problem lazily, doing a little bit of work with each tree operation but making little obvious effort to maintain a nicely balanced tree.

Intuition

This laziness is instantiated in the splay tree's self-modification with every access. It would be nice if we could show that the work done for accessing an element is at most $O(\log n)$. This isn't possible, so we do the next best thing: we show that any additional work that we do beyond $O(\log n)$ can be accounted for by our past laziness. In other words, if we do more than $O(\log n)$ work, we want to show that there were a lot of operations before this one where we did less work than we were allowed, and thus we can amortize away the work beyond $O(\log n)$ by spreading it around to previously performed operations. As we all know by now, this technique is called amortization.


If the path to find a node is long, the tree is poorly balanced along that path. As you travel down the path from a node to the next node, most of the weight of the subtree below the current node lies below the next node. A double rotation distributes some of the weight below the queried node over its siblings. The total amount of imbalance, which is measured by the sum of ranks, is a potential function against which we charge the cost of long searches.

Tree Rotation

Tree rotation can be thought of as a way to move a node to a higher position in a binary search tree without affecting the ordering properties. The simplest rotations are called single rotations; they involve two nodes and their corresponding subtrees. Figure 1 displays such a simple rotation. When read from left to right, the rotation brings node x to the top of the tree.

        y                     x
       / \                   / \
      x   C      --->       A   y
     / \                       / \
    A   B                     B   C

Figure 1: A zig rotation
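A single rotation can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the `Node` class, its `left`/`right`/`parent` fields, and the names `rotate_up` and `inorder` are assumptions made for the example.

```python
class Node:
    """Binary search tree node with a parent pointer (hypothetical layout)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def rotate_up(x):
    """Zig: lift x above its parent p while preserving in-order key order."""
    p, g = x.parent, x.parent.parent
    if p.left is x:                  # x is a left child: rotate right at p
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:                            # x is a right child: rotate left at p
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                # reattach x where p used to hang
        if g.left is p:
            g.left = x
        else:
            g.right = x
    return x


def inorder(t):
    """Keys of the subtree rooted at t, in in-order (sorted) order."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Building the left-hand tree of the zig figure and calling `rotate_up` on x yields the right-hand tree; the in-order sequence A, x, B, y, C is the same before and after.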

Double rotations

A slightly more complicated rotation is a double rotation. Here three nodes and their subtrees are involved; a double rotation essentially performs two single rotations in sequence. Figure 2 is a depiction of what is known as a zig-zig rotation. When following the arrow from left to right, the node x is brought up two levels to the top of the subtree. Figure 3 is a depiction of what is known as a zig-zag rotation. Any weight below x is spread more evenly as x is brought to the root of the subtree.

          z                 x
         / \               / \
        y   D             A   y
       / \       --->        / \
      x   C                 B   z
     / \                       / \
    A   B                     C   D

Figure 2: A zig-zig rotation

        z                     x
       / \                  /   \
      y   D                y     z
     / \         --->     / \   / \
    A   x                A   B C   D
       / \
      B   C

Figure 3: A zig-zag rotation

The double rotation is one of the key elements that make splay trees the success that they are. While single rotations can move a queried element to the top of a tree without affecting key ordering, only double rotations yield the balancing properties that give splay trees static optimality and $O(\log n)$ amortized operations.
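The way single and double rotations combine into a splay can be sketched as follows. This is an illustrative bottom-up version under assumed names (`Node`, `rotate_up`, `splay`, `left_path`, `height`, `inorder`); the lecture does not prescribe an implementation. A zig-zig rotates the parent first and then the node, while a zig-zag rotates the node twice.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Move x to the root; return the number of single rotations used."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                             # zig
            rotate_up(x)
            rotations += 1
        elif (g.left is p) == (p.left is x):      # zig-zig: same direction twice
            rotate_up(p)
            rotate_up(x)
            rotations += 2
        else:                                     # zig-zag: opposite directions
            rotate_up(x)
            rotate_up(x)
            rotations += 2
    return rotations


def left_path(n):
    """Degenerate tree: keys n, n-1, ..., 1 down a chain of left children."""
    nodes = [Node(k) for k in range(n, 0, -1)]
    for parent, child in zip(nodes, nodes[1:]):
        parent.left, child.parent = child, parent
    return nodes[0], nodes[-1]                    # (root, deepest node)


def height(t):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

On a left path of 9 nodes, splaying the deepest node takes four zig-zig steps and leaves a tree of height 6, which illustrates how double rotations spread the weight below the accessed node instead of merely moving the long path elsewhere.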

Running time

Splay trees run all basic operations in $O(\log n)$ amortized time. We can prove this through manipulation of weights and a potential function. We define $w(x)$ to be the weight of node $x$. Then

\[ s(x) = \sum_{y \in \mathrm{descendants}(x)} w(y) \]

\[ r(x) = \log_2 s(x) \]

\[ \Phi(DS) = \sum_{x} r(x) \]

$s(x)$ is the sum of the weights of all descendants of $x$, including the weight of $x$ itself. $r(x)$ is called the rank of $x$. $\Phi$ is the potential function that we use for proving properties about splay trees; it is the sum of ranks described earlier as the measure of imbalance. $DS$ represents the splay tree data structure. We can use this notation as a basis for proving the theorem that underlies the $O(\log n)$ running time of splay trees.
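To make the definitions concrete, the following sketch computes sizes, ranks, and the potential for a small tree. The `Node` class and the function names are assumptions for illustration, and unit weights are used by default.

```python
import math


class Node:
    def __init__(self, key, left=None, right=None, weight=1.0):
        self.key, self.left, self.right, self.weight = key, left, right, weight


def size(x):
    """s(x): total weight of x and all of its descendants."""
    if x is None:
        return 0.0
    return x.weight + size(x.left) + size(x.right)


def rank(x):
    """r(x) = log2 s(x)."""
    return math.log2(size(x))


def potential(root):
    """Phi: the sum of r(x) over every node x in the tree."""
    if root is None:
        return 0.0
    return rank(root) + potential(root.left) + potential(root.right)


# A three-node path versus a balanced three-node tree, unit weights.
path = Node(3, Node(2, Node(1)))
balanced = Node(2, Node(1), Node(3))
```

With unit weights the path has potential $\log_2 3 + \log_2 2 + \log_2 1$ while the balanced tree has only $\log_2 3$, so the less balanced shape stores more potential.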

Theorem (Access Theorem): The amortized time to access $x$ from root $t$ is at most $3(r(t) - r(x)) + 1$.

Proof: Using the simple lemma that we prove below, we will show that the amortized cost of a double rotation is $\le 3(r(t) - r(x))$ and the amortized cost of a single rotation is $\le 3(r(t) - r(x)) + 1$, where for a single step $t$ stands for the root of the subtree being rotated. We will show that a sequence of rotations yields a telescoping sum that results in the bound described in the Access Theorem.

Lemma: Given $b$ as a root with two children $a$ and $c$, $r(a) + r(c) \le 2r(b) - 2$.

Proof: Consider the equivalent inequality among the sizes of the three nodes:

\[ 4\,s(a)\,s(c) \le s(b)^2 \]


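We know that

\[ s(b) = s(a) + s(c) + w(b) \]

by definition. Since the weights are nonnegative, we obtain

\[ s(b) \ge s(a) + s(c). \]

Hence

\[ s(b)^2 - 4\,s(a)\,s(c) \ge (s(a) + s(c))^2 - 4\,s(a)\,s(c), \]

or

\[ s(b)^2 - 4\,s(a)\,s(c) \ge (s(a) - s(c))^2. \]

Since squares are always nonnegative, we get the desired inequality

\[ s(b)^2 - 4\,s(a)\,s(c) \ge 0. \]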
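The equivalence between the rank inequality of the lemma and the size inequality used in its proof follows from taking base-2 logarithms. As a short worked step (assuming all sizes are positive, so that the logarithms are defined):

```latex
\begin{align*}
4\,s(a)\,s(c) \le s(b)^2
  &\iff \log_2 4 + \log_2 s(a) + \log_2 s(c) \le 2\log_2 s(b) \\
  &\iff 2 + r(a) + r(c) \le 2\,r(b) \\
  &\iff r(a) + r(c) \le 2\,r(b) - 2 .
\end{align*}
```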

Consider the amortized cost of performing one rotation. Let's say that $z$ is the root of a subtree and that node $x$ is either a child or grandchild of $z$. If we can show that the amortized cost of any double rotation is $\le 3(r(z) - r(x))$ and the amortized cost of any single rotation is $\le 3(r(z) - r(x)) + 1$, then the amortized cost of splaying an element $x_0$ to root $x_k$ is

\[ C \le 3(r(x_k) - r(x_{k-1})) + 3(r(x_{k-1}) - r(x_{k-2})) + \cdots + 3(r(x_1) - r(x_0)) + 1 = 3(r(x_k) - r(x_0)) + 1 \]

where the $x_i$'s ($i \ge 1$) are roots of subtrees that are rotated. The single $+1$ appears because a splay uses at most one single rotation, as its final step.

Now we will show that the amortized time for each double rotation involved in the splaying of node $x$ is at most $3(r'(x) - r(x))$, and the time for a single rotation is at most $3(r'(x) - r(x)) + 1$, where $r'$ denotes the rank of a node after the rotation. There are three cases to be considered.

ZIG: This is the rotation shown in Figure 1. The nodes that change ranks are $x$ and $y$. So the amortized cost is

\[ 1 + r'(x) + r'(y) - r(x) - r(y). \]

But $r'(x) = r(y)$ and $r'(y) \le r'(x)$. So the cost is at most

\[ 1 + r'(x) - r(x) \le 1 + 3(r'(x) - r(x)). \]

ZIGZAG: This is the double rotation shown in Figure 3. Now three nodes change rank, namely $x$, $y$ and $z$. The amortized cost is

\[ 2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z). \]

We note that $r'(x) = r(z)$ and $r(y) \ge r(x)$, so that the cost is at most

\[ 2 + r'(y) + r'(z) - 2r(x), \]

which we rewrite as

\[ 2(r'(x) - r(x)) + 2 + r'(y) + r'(z) - 2r'(x), \]

which by the lemma (after the rotation, $y$ and $z$ are the two children of $x$) is at most

\[ 2(r'(x) - r(x)) \le 3(r'(x) - r(x)). \]


ZIGZIG: The other kind of double rotation is shown in Figure 2. Again three nodes, $x$, $y$ and $z$, change ranks. The amortized cost is therefore

\[ 2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z). \]

We have $r'(x) = r(z)$, so the cost simplifies to

\[ 2 + r'(y) + r'(z) - r(x) - r(y). \]

Since $r(x) \le r(y)$ and $r'(y) \le r'(x)$, this expression is at most

\[ 2 + r'(x) + r'(z) - 2r(x). \]

We want this to be less than or equal to $3(r'(x) - r(x))$, so we need to show that

\[ r(x) + r'(z) \le 2r'(x) - 2. \]

This looks somewhat like the inequality we proved in the lemma, so we try to see if that can be applied. The proof depended only on the fact that the sum of the sizes of the two subtrees was at most the size of the entire tree. Clearly that remains true:

\[ s(x) + s'(z) \le s'(x). \]

So we can still apply the lemma and get the desired inequality.

Now we can sum over all the rotations that are needed to splay $x$ to the root. The sum telescopes and we get the desired bound.
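The bound can also be checked empirically. The sketch below is an illustration, not part of the proof: it uses unit weights, charges one unit per single rotation (so a double rotation costs 2), and compares actual cost plus change in potential against $3(r(t) - r(x)) + 1$ for random accesses. All names (`Node`, `rotate_up`, `splay`, `max_violation`) and the parameters `n`, `accesses`, `seed` are hypothetical.

```python
import math
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Splay x to the root; return the number of single rotations."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)
            rotations += 1
        else:
            # zig-zig rotates the parent first, zig-zag rotates x twice
            rotate_up(p if (g.left is p) == (p.left is x) else x)
            rotate_up(x)
            rotations += 2
    return rotations


def subtree_size(x):
    return 0 if x is None else 1 + subtree_size(x.left) + subtree_size(x.right)


def potential(root):
    """Sum of log2(s(x)) over all nodes, with unit weights."""
    if root is None:
        return 0.0
    return (math.log2(subtree_size(root))
            + potential(root.left) + potential(root.right))


def max_violation(n=32, accesses=200, seed=0):
    """Largest value of (amortized cost - bound) seen over random accesses."""
    rng = random.Random(seed)
    nodes = [Node(k) for k in range(n)]
    for parent, child in zip(nodes, nodes[1:]):   # start from a right path
        parent.right, child.parent = child, parent
    root, worst = nodes[0], float("-inf")
    for _ in range(accesses):
        x = rng.choice(nodes)
        bound = 3 * (math.log2(n) - math.log2(subtree_size(x))) + 1
        phi_before = potential(root)
        cost = splay(x)                           # actual cost in rotations
        root = x
        worst = max(worst, cost + potential(root) - phi_before - bound)
    return worst
```

If the theorem holds as stated, `max_violation()` should never be positive beyond floating-point rounding, whatever the starting shape of the tree.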

Note that $r(t) = \log s(t)$ and $r(x) = \log s(x)$, so $3(r(t) - r(x)) + 1 = O(r(t) - r(x)) = O(\log s(t) - \log s(x)) = O\!\left(\log \frac{s(t)}{s(x)}\right)$. In the future we will denote $s(t)$ as $W$, which is the sum of all of the weights in the tree rooted at $t$. We also use $w_x$ to denote the weight of a single node $x$. Note that $w_x \le s(x)$. As a result, the amortized time to splay a node $x$ to the root $t$ is $O\!\left(\log \frac{W}{w_x}\right)$.

Now we will show that the basic tree operations insert and delete are $O(\log n)$. We will do this by defining two operations, split and join, that can be used to easily implement insert and delete.

Join($T_1$, $x$, $T_2$) is a function of trees $T_1$ and $T_2$ and element $x$, where $t_1 < x < t_2$ for all $t_1 \in T_1$, $t_2 \in T_2$. Join($T_1$, $x$, $T_2$) returns a tree where $x$ is the root, with $T_1$ as its left subtree and $T_2$ as its right subtree. Join($T_1$, $x$, $T_2$) has amortized cost $O(\log n)$. Note that we can also join two trees without a separating element: $T_1$ and $T_2$ can be joined by splaying the rightmost element of $T_1$ and making $T_2$ its right subtree. This has amortized cost $O(\log n)$.

Split($T$, $x$) is a function of a tree $T$ and an element $x$. Split($T$, $x$) returns two trees $T_1 \le x < T_2$. Note that $x$ simply defines a boundary between the two trees; $x$ is not necessarily contained in $T_1$. Split($T$, $x$) is implemented by splaying the greatest element less than or equal to $x$ to the root. We remove the right subtree and call it $T_2$. The remaining tree is named $T_1$.

We are now prepared to implement insert and delete with $O(\log n)$ amortized cost.

Delete($T$, $x$) is a function of a tree $T$ and an element $x$. We perform split($T$, $x$) to yield two subtrees $T_1$ and $T_2$. Note that since $x$ was an element of $T$, it must now be the root of $T_1$. Since the root of $T_1$ does not have a right subtree, we may remove $x$ in constant time, leaving its left subtree as the new $T_1$. We then join the two trees $T_1$ and $T_2$. Since we have done a constant number of $O(\log n)$ operations, the amortized cost for delete is $O(\log n)$.

Insert($T$, $x$) is a function of a tree $T$ and an element $x$. We perform split($T$, $x$) to yield two subtrees $T_1$ and $T_2$. We then join($T_1$, $x$, $T_2$) to yield our resultant tree. The amortized cost for insert is $O(\log n)$.
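The four operations can be sketched on top of a bottom-up splay. This is one possible realization under assumed names (`Node`, `rotate_up`, `splay`, `split`, `join`, `insert`, `delete`, `inorder`), not the lecture's own code. Two choices here are assumptions: `split` also splays the last node visited by the search so that a long search path is paid for, and `insert` leaves the tree unchanged if the key is already present.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    """Bring x to the root with zig, zig-zig and zig-zag steps."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)
        else:
            rotate_up(p if (g.left is p) == (p.left is x) else x)
            rotate_up(x)
    return x

def split(root, key):
    """Return (T1, T2) with every key in T1 <= key < every key in T2."""
    pred = last = None
    cur = root
    while cur is not None:                 # ordinary search-tree descent
        last = cur
        if cur.key <= key:
            pred, cur = cur, cur.right
        else:
            cur = cur.left
    if last is not None:
        splay(last)                        # pay for the search path
    if pred is None:                       # no element is <= key
        return None, last
    splay(pred)                            # greatest element <= key at the root
    t2, pred.right = pred.right, None
    if t2 is not None:
        t2.parent = None
    return pred, t2

def join(t1, t2):
    """Join two trees, assuming every key in t1 is below every key in t2."""
    if t1 is None:
        return t2
    m = t1
    while m.right is not None:
        m = m.right
    splay(m)                               # rightmost of t1: no right child now
    m.right = t2
    if t2 is not None:
        t2.parent = m
    return m

def insert(root, key):
    t1, t2 = split(root, key)
    if t1 is not None and t1.key == key:   # already present: undo the split
        return join(t1, t2)
    x = Node(key)                          # join(T1, x, T2): x becomes the root
    x.left, x.right = t1, t2
    for child in (t1, t2):
        if child is not None:
            child.parent = x
    return x

def delete(root, key):
    t1, t2 = split(root, key)
    if t1 is None or t1.key != key:        # key absent: nothing to remove
        return join(t1, t2)
    rest = t1.left                         # the root of T1 has no right subtree
    if rest is not None:
        rest.parent = None
    return join(rest, t2)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Each function returns the new root, so a caller would write `root = insert(root, k)` or `root = delete(root, k)`.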

Applications

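Corollary: In an $n$-item tree, access time is $O(\log n)$ per operation. Let $w_x = 1$. Then

\[ W = \sum_{x \in \mathrm{nodes}} w_x = n. \]

This means that $O\!\left(\log \frac{W}{w_x}\right) = O\!\left(\log \frac{n}{1}\right) = O(\log n)$.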

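Corollary: Splaying is competitive against any fixed binary tree. Imagine you have the ideal fixed binary tree. Every item $x$ in that tree is assigned a depth $d_x$. Let $w_x = 3^{-d_x}$. Then

\[ W = \sum_{x \in \mathrm{nodes}} w_x = \sum_{x} 3^{-d_x} \le \sum_{d \ge 0} 2^d \, 3^{-d} = 3, \]

since a binary tree has at most $2^d$ nodes at depth $d$. The amortized cost to get a depth-$d$ item in a splay tree is

\[ O\!\left(\log \frac{W}{w_x}\right) = O\!\left(\log \frac{1}{3^{-d}}\right) = O(d). \]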

Corollary (Static Optimality Theorem): Splaying is competitive against the best possible static tree. Let $m$ be the total number of accesses to a tree, and let $p_x$ be the fraction of times that $x$ will be accessed, making $p_x \cdot m$ the access frequency for item $x$. Optimal access time is

\[ m \sum_{x \in \mathrm{nodes}} p_x \log \frac{1}{p_x}. \]

Let the weight for item $x$ be $w_x = p_x$, so that $W = 1$. Amortized cost for $x$ is $O\!\left(\log \frac{W}{w_x}\right) = O\!\left(\log \frac{1}{p_x}\right)$.

Any information theorist will recognize this sum as the entropy of a probability distribution. If we think of code lengths for items instead of node depths, then the above expression is the average code length required to send information about $m$ items given the underlying probability distribution. As one might guess, we can use information theory to determine the optimal static tree for a given access pattern.
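As a small worked example of the entropy expression, the sketch below evaluates $m \sum_x p_x \log_2 (1/p_x)$ from raw access counts. The function name `entropy_bound` and the dictionary-of-counts input are assumptions for illustration.

```python
import math


def entropy_bound(counts):
    """Return m * sum_x p_x * log2(1 / p_x) for a dict of access counts.

    With p_x = c_x / m this equals sum_x c_x * log2(m / c_x).
    """
    m = sum(counts.values())
    return sum(c * math.log2(m / c) for c in counts.values() if c > 0)
```

For counts `{"a": 2, "b": 1, "c": 1}` we have $m = 4$ and the value is $2 \cdot 1 + 1 \cdot 2 + 1 \cdot 2 = 6$. A skewed distribution gives a smaller value than a uniform one over the same number of accesses, which is where a statically optimal tree (and hence a splay tree) gains over a balanced one.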


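Corollary (Static Finger Theorem): Suppose when you search for an element you leave a finger $f$ and begin the next search from $f$. We show that a splay tree performs as well. Let

\[ w_x = \frac{1}{(1 + |x - f|)^2}. \]

Then

\[ W = \sum_{x \in \mathrm{nodes}} w_x = O\!\left(\sum_{x} \frac{1}{x^2}\right) = O(1). \]

Amortized cost for $x$ is

\[ O\!\left(\log \frac{W}{w_x}\right) = O\!\left(\log \frac{1}{1/(1 + |x - f|)^2}\right) = O(\log |x - f|). \]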
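The finger weights can be checked numerically. The sketch below assumes the items are the integers 1..n (so that $|x - f|$ is a distance in key order) and uses hypothetical helper names `finger_weights` and `access_bound`.

```python
import math


def finger_weights(n, f):
    """w_x = 1 / (1 + |x - f|)^2 for items x = 1..n and a finger at item f."""
    return {x: 1.0 / (1 + abs(x - f)) ** 2 for x in range(1, n + 1)}


def access_bound(n, f, x):
    """log2(W / w_x): the quantity inside the O(.) of the amortized cost."""
    w = finger_weights(n, f)
    return math.log2(sum(w.values()) / w[x])
```

Because $\sum_{k \ge 1} 1/k^2$ converges, the total weight stays below 3 for every n and f, so `access_bound(n, f, x)` is at most $2 \log_2(1 + |x - f|) + \log_2 3$ regardless of the size of the tree.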

Corollary (Working Set Theorem): Access item $x_j$ at time $j$. Let $t_j$ be the number of distinct items accessed since the previous access of $x_j$. Then the amortized cost of access $j$ is $O(\log t_j)$.
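The quantity $t_j$ can be computed directly from an access sequence, as in the sketch below. The function name `working_set_sizes` is hypothetical, and one convention is an assumption: for the first access to an item, all distinct items accessed so far are counted.

```python
def working_set_sizes(accesses):
    """For each access j, count distinct items seen since the previous
    access to the same item (or since the start, on a first access)."""
    last_seen = {}                    # item -> index of its most recent access
    sizes = []
    for j, x in enumerate(accesses):
        start = last_seen.get(x, -1) + 1
        sizes.append(len(set(accesses[start:j])))
        last_seen[x] = j
    return sizes
```

For the sequence a, b, c, a, a, b this gives 0, 1, 2, 2, 0, 2: the second access to a sees {b, c} in between, the third sees nothing, and the second access to b sees {c, a}. Sequences that keep returning to a small set of items therefore have small $t_j$ and cheap accesses.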

Practicalities

A major drawback of splay trees is that they require modification on every access. Modern computer systems normally use a cache to optimize for very fast read operations. Frequent changes to the data structure mean frequent cache invalidations and an ineffective cache. There are several tricks that can be used to overcome this obstacle. One is to splay for a while and then to stop. This only works well when the access frequency of each node remains relatively constant. Another trick is to only splay on operations that require more than $\log n$ units of work. This allows splay tree accesses to remain $O(\log n)$ while reducing the amount of tree modification that must be performed.