SKIP LIST & SKIP GRAPH

James Aspnes Gauri Shah

Many slides have been taken from the original presentation by the authors.

Definition of Skip List

A skip list for a set L of distinct (key, element) items is a series of linked lists L0, L1, …, Lh such that:

• Each list Li contains the special keys +∞ and -∞.
• List L0 contains the keys of L in non-decreasing order.
• Each list is a subsequence of the previous one, i.e., L0 ⊇ L1 ⊇ … ⊇ Lh.
• List Lh contains only the two special keys +∞ and -∞.

Skip List

(Idea due to Pugh ’90, CACM paper)

• Dictionary based on a probabilistic data structure.
• Allows efficient search, insert, and delete operations.
• Each element in the dictionary typically stores additional useful information besides its search key. Example:

[for University of Iowa] [for Daily Iowan]

Probabilistic alternative to a balanced tree.

Skip List

HEAD = -∞, TAIL = +∞

Level 2: -∞ J +∞
Level 1: -∞ A J M +∞
Level 0: -∞ A G J M R W +∞

Each node is linked at the next higher level with probability 1/2.

Another example

L2 -∞ 31 +∞

L1 -∞ 23 31 34 64 +∞

L0 -∞ 12 23 26 31 34 44 56 64 78 +∞

Each element of Li appears in Li+1 with probability p. Higher levels denote express lanes.

Searching in Skip List

Search for a key x in a skip list:

• Start at the first position of the top list.
• At the current position P, compare x with y = key(after(P)):
  – x = y -> return element(after(P))
  – x > y -> “scan forward”
  – x < y -> “drop down”
• If we move past the bottom list, then no such key exists.

Example of search for 78

L3 -∞ +∞

L2 -∞ 31 +∞

L1 -∞ 23 31 34 64 +∞

L0 -∞ 12 23 26 31 34 44 56 64 78 +∞
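The search rules can be traced on the four lists of this example. The following is a minimal Python sketch, not the authors' code: each level is stored as a plain sorted list (an assumption made for readability), and `search` and its `trace` output are illustrative names. A real skip list follows "down" pointers instead of calling `index`.

```python
INF = float("inf")

# Levels of the example, bottom (L0) to top (L3); every list carries both sentinels.
LEVELS = [
    [-INF, 12, 23, 26, 31, 34, 44, 56, 64, 78, INF],  # L0
    [-INF, 23, 31, 34, 64, INF],                       # L1
    [-INF, 31, INF],                                   # L2
    [-INF, INF],                                       # L3
]


def search(levels, x):
    """Return (found, trace); trace holds (level, key at P, key after P) per comparison."""
    level = len(levels) - 1
    pos = 0  # index of the current position P in levels[level]; starts at -inf
    trace = []
    while True:
        row = levels[level]
        y = row[pos + 1]               # y = key(after(P))
        trace.append((level, row[pos], y))
        if x == y:
            return True, trace         # key found
        if x > y:
            pos += 1                   # scan forward
        elif level == 0:
            return False, trace        # moved past the bottom list
        else:
            # drop down: locate the same key one level below (linear here for brevity)
            pos = levels[level - 1].index(row[pos])
            level -= 1


found, trace = search(LEVELS, 78)
for step in trace:
    print(step)
```

For key 78 the last comparison in the trace is at level 0 with P = 64 and after(P) = 78.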

At L1, P is 64 and the next element +∞ is bigger than 78, so we drop down. At L0, 78 = 78, so the search is over.

Insertion

• The insert algorithm uses randomization to decide in how many levels the new item should be added to the skip list.
• After inserting the new item at the bottom level, flip a coin.
• If it returns tails, insertion is complete. Otherwise, move to the next higher level, insert the item in this level at the appropriate position, and repeat the coin flip.

Insertion Example

1) Suppose we want to insert 15.
2) Do a search, and find the spot between 10 and 23.
3) Suppose the coin comes up “heads” twice, so 15 is also inserted in L1 and L2, and a new empty top list L3 is added.

Before insertion (p0, p1, p2 mark the positions after which 15 is inserted):

L2 -∞ +∞
L1 -∞ 23 +∞
L0 -∞ 10 23 36 +∞

After insertion:

L3 -∞ +∞
L2 -∞ 15 +∞
L1 -∞ 15 23 +∞
L0 -∞ 10 15 23 36 +∞

Deletion

• Search for the given key k. If a position with key k is not found, then no such key exists.
• Otherwise, if a position with key k is found (it will definitely be found on the bottom level), then we remove all occurrences of k from every level.
• If the uppermost level is empty, remove it.

Deletion Example

1) Suppose we want to delete 34.
2) Do a search, and find the spot between 23 and 45.
3) Remove all occurrences of the key.

Before deletion (p0, p1, p2 mark the positions of key 34 in L0, L1, L2):

L2 -∞ 34 +∞
L1 -∞ 23 34 +∞
L0 -∞ 12 23 34 45 +∞

After deletion:

L2 -∞ +∞
L1 -∞ 23 +∞
L0 -∞ 12 23 45 +∞
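The insertion and deletion procedures can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: keys are assumed distinct, the TAIL (+∞) sentinel is represented by `None`, `MAX_HEIGHT` is an arbitrary cap, and the injectable `coin` parameter exists only to make the example reproducible. Unlike the figures, the sketch does not keep an explicit empty top list.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height    # next[i] = successor in list L_i


class SkipList:
    MAX_HEIGHT = 32                    # assumed cap on the number of levels

    def __init__(self, coin=None):
        # coin() returns True for heads; defaults to a fair random coin
        self.coin = coin or (lambda: random.random() < 0.5)
        self.head = Node(float("-inf"), self.MAX_HEIGHT)
        self.height = 1                # number of levels in use (at least L0)

    def _predecessors(self, key):
        """Per level, the last node with key < `key` (p0, p1, ... in the figures)."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for i in range(self.height - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]    # scan forward
            preds[i] = node            # then drop down
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        height = 1                     # always inserted at the bottom level
        while height < self.MAX_HEIGHT and self.coin():
            height += 1                # one more level per heads
        node = Node(key, height)
        for i in range(height):
            node.next[i] = preds[i].next[i]
            preds[i].next[i] = node
        self.height = max(self.height, height)

    def delete(self, key):
        preds = self._predecessors(key)
        target = preds[0].next[0]
        if target is None or target.key != key:
            return False               # no such key
        for i in range(len(target.next)):
            preds[i].next[i] = target.next[i]   # unlink from every level
        while self.height > 1 and self.head.next[self.height - 1] is None:
            self.height -= 1           # remove empty uppermost levels
        return True

    def level(self, i):
        keys, node = [], self.head.next[i]
        while node is not None:
            keys.append(node.key)
            node = node.next[i]
        return keys


# Scripted flips: 10 -> tails; 23 -> heads, tails; 36 -> tails; 15 -> heads, heads, tails
flips = iter([False, True, False, False, True, True, False])
sl = SkipList(coin=lambda: next(flips))
for k in (10, 23, 36, 15):
    sl.insert(k)
print(sl.level(0), sl.level(1), sl.level(2))   # [10, 15, 23, 36] [15, 23] [15]
```

With the scripted coin, 15 ends up in L0, L1, and L2, matching the "after" picture of the insertion example.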

O(1) pointers per node

Average number of pointers per node = O(1)

Total number of pointers

$2n + 2 \cdot \frac{n}{2} + 2 \cdot \frac{n}{4} + 2 \cdot \frac{n}{8} + \dots = 2n \sum_{i \ge 0} 2^{-i} = 4n$

So, the average number of pointers per node = 4
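The figure of 4 pointers per node can be checked empirically. The sketch below assumes that the factor 2 in the sum counts one left and one right pointer per level (the slides do not say so explicitly); `average_pointers` is an illustrative name.

```python
import random


def average_pointers(n, seed=0):
    """Average pointers per node, with a left and a right pointer on every level."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        levels = 1                     # every node is in L0
        while rng.random() < 0.5:      # promoted with probability 1/2
            levels += 1
        total += 2 * levels            # two pointers per level
    return total / n


print(average_pointers(100_000))       # close to 4
```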

Number of levels

The number of levels = O(log n) w.h.p

$\Pr[\text{a given element } x \text{ is above level } c \log n] = 1/2^{c \log n} = 1/n^{c}$

$\Pr[\text{any element is above level } c \log n] \le n \cdot 1/n^{c} = 1/n^{c-1}$

So the number of levels = O(log n) with high probability.

Search time

Consider a skip list with two levels L0 and L1. To search a key, first search L1 and then search L0.

Cost (i.e., search time) = length(L1) + n / length(L1).
Minimum when length(L1) = n / length(L1).

Thus length(L1) = $n^{1/2}$, and cost = $2 \cdot n^{1/2}$.
(Three lists) minimum cost = $3 \cdot n^{1/3}$.
(Log n lists) minimum cost = $\log n \cdot n^{1/\log n} = 2 \log n$.
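The progression from two lists to log n lists can be evaluated numerically. This small Python sketch assumes base-2 logarithms and evenly thinned lists; `min_cost` is an illustrative helper, not part of the slides.

```python
import math


def min_cost(n, k):
    """Search cost with k evenly thinned lists: k segments of about n**(1/k) steps each."""
    return k * n ** (1.0 / k)


n = 2 ** 20
for k in (2, 3, int(math.log2(n))):
    print(k, round(min_cost(n, k), 2))
```

With k = log n lists, each segment has length $n^{1/\log n} = 2$, so the cost collapses to 2 log n.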

Skip lists for P2P?

Advantages
• O(log n) expected search time.
• Retains locality.
• Dynamic node additions/deletions.

Disadvantages
• Heavily loaded top-level nodes.
• Easily susceptible to failures.
• Lacks redundancy.

A Skip Graph

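Membership vectors are shown in parentheses; separate lists at the same level are divided by |.

Level 2: A (000), J (001) | M (011) | G (100), W (101) | R (110)
Level 1: A (001), J (001), M (011) | G (100), R (110), W (101)
Level 0: A (001), G (100), J (001), M (011), R (110), W (101)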

Link at level i to nodes with matching prefix of length i. Think of a tree of skip lists that share lower layers.
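The linking rule can be illustrated by grouping keys on membership-vector prefixes. The Python sketch below is a centralized illustration only; a real skip graph is distributed and each node stores just its own neighbors. The figure prints A's vector as 000 at level 2 and as 001 at levels 0 and 1; the sketch assumes 000, which gives the same lists at levels 0 to 2. `skip_graph_levels` is an illustrative name.

```python
from collections import defaultdict

# Membership vectors from the figure (A assumed to be 000).
VECTORS = {"A": "000", "G": "100", "J": "001", "M": "011", "R": "110", "W": "101"}


def skip_graph_levels(vectors, max_level):
    """Return {level: {prefix: sorted keys}} for levels 0..max_level."""
    levels = {}
    for i in range(max_level + 1):
        lists = defaultdict(list)
        for key in sorted(vectors):
            lists[vectors[key][:i]].append(key)   # same length-i prefix -> same list
        levels[i] = dict(lists)
    return levels


for i, lists in skip_graph_levels(VECTORS, 2).items():
    print(i, lists)
```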

Properties of skip graphs

1. Efficient searching.
2. Efficient node insertions & deletions.
3. Independence from system size.
4. Locality and range queries.

Searching: avg. O(log n)

Restricting to the lists containing the starting element of the search, we get a skip list.

Level 2: A J | M | G W | R
Level 1: A J M | G R W
Level 0: A G J M R W

Same performance as DHTs.

Node Insertion – 1

Labels in the figure: buddy, new node J (001).

Level 2: A (000) | M (011) | G (100), W (101) | R (110)
Level 1: A (001), M (011) | G (100), R (110), W (101)
Level 0: A (001), G (100), M (011), R (110), W (101)

Starting at the buddy node, find the nearest key at level 0. Takes O(log n) time on average.

Node Insertion – 2

At each level i, find the nearest node with a matching membership-vector prefix of length i.

Level 2: A (000), J (001) | M (011) | G (100), W (101) | R (110)
Level 1: A (001), J (001), M (011) | G (100), R (110), W (101)
Level 0: A (001), G (100), J (001), M (011), R (110), W (101)

Total time for insertion: O(log n)
DHTs take: $O(\log^2 n)$
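The level-by-level neighbor lookup during insertion can be sketched as follows. This is a centralized Python illustration with global knowledge of all vectors; the actual protocol walks links starting from the buddy node. `insertion_neighbors` is a hypothetical helper, and A's vector is assumed to be 000.

```python
def insertion_neighbors(vectors, new_key, new_vector):
    """Per level i, the nearest (left, right) keys sharing new_vector[:i]; None if absent."""
    result = []
    for i in range(len(new_vector) + 1):
        prefix = new_vector[:i]
        peers = [k for k, v in vectors.items() if v.startswith(prefix)]
        left = max((k for k in peers if k < new_key), default=None)
        right = min((k for k in peers if k > new_key), default=None)
        result.append((left, right))
        if left is None and right is None:
            break                      # the new node is alone in its list; stop climbing
    return result


existing = {"A": "000", "G": "100", "M": "011", "R": "110", "W": "101"}
for level, pair in enumerate(insertion_neighbors(existing, "J", "001")):
    print(level, pair)
```

For J (001) this yields G and M at level 0, A and M at level 1, and only A at level 2, which agrees with the lists in the figure.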

Independent of system size

No need to know size of keyspace or number of nodes.

[Figure: inserting J into a skip graph that contains E and Z. Levels 0 and 1 are shown before the insertion and levels 0 to 2 after it, with membership vectors extended by one bit.]

Old nodes extend their membership vectors as required with arrivals. DHTs require knowledge of keyspace size initially.
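One way to picture vectors that grow on demand is a lazily generated bit string. The Python sketch below is an assumption-laden illustration, not the authors' mechanism: `MembershipVector` and `bits_needed` are hypothetical names, and bits are drawn from a seeded generator for reproducibility.

```python
import random


class MembershipVector:
    """Random bit string generated lazily, one bit at a time."""

    def __init__(self, rng):
        self._rng = rng
        self._bits = []

    def prefix(self, length):
        while len(self._bits) < length:            # extend only when a new level needs it
            self._bits.append(self._rng.choice("01"))
        return "".join(self._bits[:length])


def bits_needed(vectors):
    """Smallest prefix length at which all nodes have distinct prefixes."""
    length = 0
    while len({v.prefix(length) for v in vectors}) < len(vectors):
        length += 1
    return length


rng = random.Random(1)
nodes = [MembershipVector(rng) for _ in range(8)]
print(bits_needed(nodes))
```

No node needs to know the number of nodes or the keyspace size in advance; bits are produced only when an arrival forces a list to split further.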

Locality and range queries

• Find key < F, > F.
• Find largest key < x.
• Find least key > x.
• Find all keys in interval [D..O].

Level 0: A D F I L O S

• Initial node insertion at level 0.

Applications of locality

Version Control
e.g., find the latest news from yesterday: find the largest key < news:02/13.

Level 0: news:02/09 news:02/10 news:02/11 news:02/12 news:02/13

Data Replication
e.g., find any copy of some Britney Spears song.

Level 0: britney01 britney02 britney03 britney04 britney05

DHTs cannot do this easily as hashing destroys locality.
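Because level 0 keeps keys in sorted order, these queries reduce to ordered lookups. The Python sketch below uses the standard `bisect` module on a sorted list as a stand-in for the O(log n) skip graph search; the function names are illustrative.

```python
from bisect import bisect_left, bisect_right


def largest_below(keys, x):
    """Largest key < x in the sorted level-0 list, or None."""
    i = bisect_left(keys, x)
    return keys[i - 1] if i > 0 else None


def least_above(keys, x):
    """Least key > x in the sorted level-0 list, or None."""
    i = bisect_right(keys, x)
    return keys[i] if i < len(keys) else None


def in_interval(keys, lo, hi):
    """All keys in the closed interval [lo..hi]."""
    return keys[bisect_left(keys, lo):bisect_right(keys, hi)]


def any_with_prefix(keys, prefix):
    """Some key starting with prefix, or None (e.g. any copy of a replicated item)."""
    i = bisect_left(keys, prefix)
    return keys[i] if i < len(keys) and keys[i].startswith(prefix) else None


news = ["news:02/09", "news:02/10", "news:02/11", "news:02/12", "news:02/13"]
print(largest_below(news, "news:02/13"))        # news:02/12
print(in_interval(list("ADFILOS"), "D", "O"))   # ['D', 'F', 'I', 'L', 'O']
```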

So far...
• Decentralization.
• Locality properties.
• O(log n) space per node.
• O(log n) search, insert, and delete time.
• Independent of system size.

Coming up...
• Load balancing.
• Tolerance to faults.
  – Random faults.
  – Adversarial faults.
• Self-stabilization.

Load balancing

We are interested in the average load on a node u, i.e., the number of searches from source s to destination t that use node u.

Theorem: Let dist(u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {nodes v: u <= v <= t} and |V| = d+1.

Observation

[Figure: search path from s across levels 2, 1, and 0, with candidate nodes u marked.]

Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level.

Tallest nodes

[Figure: two cases, one in which u is not on the path from s to t and one in which u is on the path.]

Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t].

$\Pr[u \in T] = \sum_{k=1}^{d+1} \Pr[|T| = k] \cdot \frac{k}{d+1} = \frac{E[|T|]}{d+1}$

Heights are independent of position, so distances are symmetric.

Load on node u

Start with n nodes. Each node goes to next set with prob. 1/2. We want expected size of T = last non-empty set.

We show that: E[|T|] < 2.
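The bound on E[|T|] can be checked with a short simulation. This Python sketch draws the number of survivors in each round as a Binomial(size, 1/2) sample by counting set bits among `size` random bits; the function names, trial count, and seed are assumptions for illustration.

```python
import random


def last_nonempty_size(n, rng):
    """Size of the last non-empty set when each node advances with probability 1/2."""
    size = n
    while True:
        # survivors ~ Binomial(size, 1/2): count the ones among `size` fair bits
        survivors = bin(rng.getrandbits(size)).count("1")
        if survivors == 0:
            return size                # nobody advanced, so this set is T
        size = survivors


def estimate_expected_T(n, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(last_nonempty_size(n, rng) for _ in range(trials)) / trials


print(estimate_expected_T(1000))       # below 2
```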

Asymptotically: $E[|T|] = 1/\ln 2 \pm 2 \times 10^{-5} \approx 1.4427\ldots$ [ analysis]

Average load on a node is inversely proportional to the distance from the destination.

Experimental result

[Figure: expected load vs. actual load on each node for destination = 76542. x-axis: node location (76400 to 76650); y-axis: load on node (0.0 to 1.1).]

Fault tolerance

How do node failures affect skip graph performance?

Random failures: Randomly chosen nodes fail. Experimental results.

Adversarial failures: Adversary carefully chooses nodes that fail. Bound on expansion ratio.

Random faults

[Figure: experimental results for random faults, 131072 nodes.]

Searches with random failures

[Figure: experimental results for searches with random failures, 131072 nodes, 10000 messages.]

Adversarial faults

δA = nodes adjacent to A but not in A. To disconnect A, all nodes in δA must be removed.

Expansion ratio = min |δA|/|A|, 1 <= |A| <= n/2.

Theorem: A skip graph with n nodes has expansion ratio = Ω(1/log n).

f failures can isolate only O(f · log n) nodes.
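For a small graph the expansion ratio can be computed by brute force directly from its definition. The Python sketch below is exponential in n and only meant for toy inputs; `adjacency_from_lists` and `expansion_ratio` are illustrative names, and the input lists are those of the six-node example skip graph (levels 0 to 2, single-node lists omitted).

```python
from itertools import combinations


def adjacency_from_lists(lists):
    """Neighbors in any list of any level are adjacent in the skip graph."""
    adjacency = {}
    for lst in lists:
        for u, v in zip(lst, lst[1:]):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def expansion_ratio(adjacency):
    """min |boundary(A)| / |A| over all node sets A with 1 <= |A| <= n/2."""
    nodes = list(adjacency)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            a = set(subset)
            boundary = set().union(*(adjacency[v] for v in a)) - a
            best = min(best, len(boundary) / len(a))
    return best


graph = adjacency_from_lists(["AGJMRW", "AJM", "GRW", "AJ", "GW"])
print(expansion_ratio(graph))
```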

Need for repair mechanism

Level 2: A J | M | G W | R
Level 1: A J M | G R W
Level 0: A G J M R W

Node failures can leave the skip graph in an inconsistent state.

Ideal skip graph

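Let $xR_i$ ($xL_i$) be the right (left) neighbor of x at level i.

If $xL_i$, $xR_i$ exist:

• $xL_i < x < xR_i$.
• $xL_iR_i = xR_iL_i = x$.

Successor constraints (for some k, where the superscript k means following the pointer k times):

• $xL_i = xL_{i-1}^{k}$.
• $xR_i = xR_{i-1}^{k}$.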
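These constraints can be turned into a consistency check. The Python sketch below is a centralized illustration under assumed data structures: `right[i][x]` and `left[i][x]` hold the neighbors of x at level i (or `None`), and `links` and `is_ideal` are hypothetical helper names.

```python
def links(lists):
    """Build (right, left) pointer maps for one level from its sorted lists."""
    right, left = {}, {}
    for lst in lists:
        for j, x in enumerate(lst):
            right[x] = lst[j + 1] if j + 1 < len(lst) else None
            left[x] = lst[j - 1] if j > 0 else None
    return right, left


def is_ideal(right, left):
    """True if ordering, back-pointer, and successor constraints hold at every level."""
    for i in range(len(right)):
        for x in right[i]:
            r, l = right[i].get(x), left[i].get(x)
            if r is not None and not (x < r and left[i].get(r) == x):
                return False
            if l is not None and not (l < x and right[i].get(l) == x):
                return False
            if i == 0:
                continue
            # successor constraints: each level-i neighbor is k hops away at level i-1
            for nbr, step in ((r, right[i - 1]), (l, left[i - 1])):
                if nbr is None:
                    continue
                y = x
                for _ in range(len(step)):      # bounded walk, safe on corrupt pointers
                    y = step.get(y)
                    if y is None or y == nbr:
                        break
                if y != nbr:
                    return False
    return True


levels = [links(["AGJMRW"]), links(["AJM", "GRW"]), links(["AJ", "M", "GW", "R"])]
right = [r for r, _ in levels]
left = [l for _, l in levels]
print(is_ideal(right, left))
```

A repair mechanism would run checks of this kind locally at each node rather than over the whole graph.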

[Figure: the invariant at levels i and i-1, showing x, $xR_i$, $xR_{i-1}^{1}$, and $xR_{i-1}^{2}$, with membership vectors ..00.. and ..01..]

Basic repair

If a node detects a missing neighbor, it tries to patch the link using other levels.

1 5

1 3 5 6

1 2 3 4 5 6

Also relink at other lower levels. Successor constraints may be violated by node arrivals or failures.

Constraint violation

Neighbor at level i not present at level (i-1).

[Figure: levels i and i-1 around node x, before and after repair. Nodes are labeled with membership vectors ..00.. and ..01.., and the repair step is labeled “zipper”.]

Conclusions

Similarities with DHTs

• Decentralization.

• O(log n) space at each node.

• O(log n) search time.

• Load balancing properties.

• Tolerant of random faults.

Differences

Property                          DHTs             Skip Graphs

Insert/Delete time                $O(\log^2 n)$    $O(\log n)$
Locality                          No               Yes
Repair mechanism                  ?                Partial
Tolerance of adversarial faults   ?                Yes
Keyspace size                     Reqd.            Not reqd.

Open Problems

• Design efficient repair mechanism.

• Incorporate geographical proximity.

• Study multi-dimensional skip graphs.

• Evaluate performance in practice.

• Study effect of byzantine failures.

?