Randomized Data Structures & Algorithms Treaps, Skip Lists

Today’s Outline

CSE 326: Data Structures • Review of probability Worst Case, Average Case, and In-Between • Motivation for randomization • Two randomized data structures – Treaps Hannah Tang and Brian Tjaden – Randomized Skip Lists • Two randomized algorithms Summer Quarter 2002 – Primality checking – Graph searching • Again?!

The Problem with Deterministic Data Structures The Motivation for Randomization We’ve seen many data structures with good average case Instead of randomizing the input (since we performance on random inputs, but bad behavior on cannot!), consider randomizing the data particular inputs structure We define the worst case runtime over all possible – No bad inputs, just unlucky random numbers inputs I of size n as: – Expected case good behavior on any input Worst-case T(n) = max T(I) I We define the average case runtime over all possible inputs I of size n as: Average-case T(n) = ( S T(I) ) / numPossInputs I

Expectant Cases What’s the Difference? Definition: – A worst-case expected time analysis is a weighted sum of all • Randomized with good expected time possible outcomes over some probability distribution – Once in a while you will have an expensive operation, but no inputs can make this happen all the time Thus, the expected runtime of a randomized data structure on some input I is: Expected T(I) = S( Pr(S) * T(I, S) ) • Deterministic with good average time S – If your application happens to always use the “bad” case, And the worst-case expected runtime of a randomized data you are in big trouble! structure* is: Expected T(n) = max ( S(Pr(S) * T(I, S)) ) I S • Expected time is kind of * Randomized data structure = = a data like an insurance policy structure whose behaviour is dependant for your algorithm! on a sequence of random numbers

1 Treap Data Structure for the Does Randomization Work? Dictionary ADT • Not-really randomized data structures Treaps: heap in yellow; search tree in blue – Splay Trees (compare with AVL trees) – Have the binary tree 2 – Skew Heaps (compare with Leftist heaps) structure property 9 – Others? – Have the BST order property 6 4 • Randomized data structures 7 18 – Universal Hashing hash table – Have the heap order property with • Also Perfect Hashing hash table 7 9 10 randomly assigned – Others? priorities 8 15 30

Legend: 15 priority 12 key

Treap Insert Tree + Heap… Why Bother? • Choose a random priority • Insert as in normal BST Insert data in sorted order into a treap; what • Rotate up until heap order is restored shape tree comes out? (maintaining BST property while rotating) insert(7) insert(8) insert(9) insert(12) insert(15) 6 6 2 2 2 2 2 7 7 9 9 9 9 9 7 6 6 15 6 14 6 14 6 9 8 7 7 12 7 12 7 12 7 15 Legend: 7 7 7 7 9 7 14 priority 8 8 15 8 12 key 8 8

Treap Delete • Find the key Treap Delete, cont. • Increase its priorityto ¥ 6 6 • Rotate it to the fringe 6 rotate right 7 rotate right 7 7 • Snip it off 7 7 7 8 8 delete(9) 8 6 6 ¥ 9 9 7 7 2Þ¥ 9 15 15 9 7 9 ¥ rotate left ¥ rotate left rotate right 15 8 15 9 6 9 9 15 12 7 15 ¥ 12 15 ¥ 7 9 9 7 15 9 12 9 8 15 8 12 15 15 15 snip! 12 12

2 Treap Summary Perfect Binary Skip List

Implements Dictionary ADT • Sorted linked list – Insert in expected O(log n) time – Delete in expected O(log n) time • # of links of a node is its height – Find in expected O(log n) time • The height i link of each node (that has one) – But worst case O(n) links to the next node of height i or greater Memory use – O(1) per node – About the cost of AVL trees 22

Very simple to implement, little overhead 11 19 29 8 10 13 20 23 – Less than AVL trees 2

Find() in a Perfect Binary Skip List Insert() in a Perfect Binary Skip List

• Start i at the maximum height • Until the node is found,or i =1 and the next node is too large: – If the next node along the i link is less than the target, traverse to the next node – Otherwise, decrease i by one

Runtime? Runtime?

Randomized Skip List Intuition Randomized Skip List • Sorted linked list • It’s fartoo hard to insert into a perfect skip list • # of links of a node is its height • The height i link of each node (that has one) links • But is perfection necessary? to the next node of height i or greater • There should be about 1/2 as many height i+1 nodes as height i nodes • What matters in a skip list? – We want way fewer tall nodes than short ones

– Make good progress through the list with each high 13 22 8

traverse 10 20 29 11 19 23 2

3 Find() in a RSL Insert() in a RSL

• Start i at the maximum height • Flip a coin until it comes up heads • Until the node is found or i is one and the next – This will take i flips. Make the new node’s height i. node is too large: • Do a find, remembering nodes where we moved – If the next node along the i link is less than the target, down one link traverse to the next node • Add the new node at the spot where the find ends – Otherwise, decrease i by one • Point all the nodes where we moved down (up to the new node’s height) at the new node Same as for a perfect skip list! • Point the new node’s links where those redirected pointers were pointing Runtime?

RSL Insert Example Iteration and1D Range Queries insert(22) with 3 flips

13 • Iteration: successively return (in order) each

8 element in the structure 10 20 29 11 19 23

2 – Start at beginning, walk list to end – Just like a linked list!

• Range query: search for everything that falls between two values

13 – Find() start point 22 8

10 20 29 – Walk through skip list using the lowest links 11 19 23 2 – Output each node until the end point

Runtimes? Runtime?

Randomized Skip List Summary Intermission: Efficiently Calculating Powers • Implements Dictionary ADT – Insert in expected O(log n) • How would you implement a function/method n – Find in expected O(log n) pow(x, n) which returns the number x ? – But worst case O(n) • How could you do that efficiently?

• Memory use – O(1) memory per node – About double a linked list

• Less overhead than search trees for iteration over range queries

4 Primality Checking Two Properties of Primes • Given a number P, can we determine whether or not P is a prime and is prime? P 0 < A < P 0 < X < P

Date: Wed, 7 Aug 2002 11:00:43 -0700 (PDT) Then: Newsgroups: uw-cs.ugrads.openforum AP-1 = 1 (mod P) Subject: Primes in P??

So, a paper published yeterday alleges they have found And, the only solutions to X2 = 1 (mod P) are: a deterministic polynomial algorithm to determine primality. X = 1 and X = P - 1

http://www.cse.iitk.ac.in/primality.pdf

Checking Primality - First Attempt Using More Information aToPMinus1 = int pow(int a, int n) { pow( someNumber, p-1 ); if (n == 0) if( aToPMinus1 % p == 1 ) “And, the only solutions to X2 = 1 (mod P) are: return true; return 1; if( n == 1 ) else X = 1 and X = P - 1” return a; return false;

int x = pow( a, n/2 );

if( isEven(n) ) return x * x; else return x * x * a; }

Checking Primality - Second Attempt Checking Primality int pow(int a, int n, int p) { Systematic algorithm: if (n == 0) For all A such that 0 < A < P return 1; Calculate AP-1 mod P using if (n == 1) pow() Check at each step of pow() and at end for primality conditions return a;

int x = pow( a, n/2, p ); Randomized algorithm: Randomly pick an A and calculate AP-1 mod P using pow() int squared = x * x % p; if( squared == 1 && x is neither p-1 or 1) // p isn’t prime!

if( isEven(n) ) return x * x; else If pow() tells us that P is prime, will either algorithm return x * x * a; tell us conclusively that P really is prime?

5 Randomized Primality Check Evaluating Randomized Primality Testing

If the randomized algorithm reports failure, then P really isn’t prime. Your probability of being struck by lightning this year: If the randomized algorithm reports success, then P might be prime. 0.00004% – P is prime with probability > ¾ – Each new A has independent probability of false positive Your probability that a number that tests prime 11 times in a row is actually not prime: 0.00003% • Solution: Your probability of winning a lottery of 1 million people – Run the randomized algorithm several times five times in a row: 1 in 2 100 Your probability that a number that tests prime 50 times in a row is actually not prime: 1 in 2100

Implicitly Generated Graphs Randomized Graph Searching • A huge graph may be implicitly specified by rules for generating it on-the-fly Consider some really huge graphs… • Blocks world: – All cities and towns in the World Atlas – vertex = relative positions of all blocks – All stars in the Galaxy – edge = robot arm stacks one block – All ways 10 blocks can be stacked Huh??? stack(blue,table) stack(green,blue)

stack(blue,red)

stack(green,red) stack(green,blue)

The Blocks World Problem: Review: Heuristic-Based Searching A Large Branching Factor • The Manhattan distance (D x+ D y) is an estimate of • Source = initial state of the blocks the distance to the goal • Goal = desired state of the blocks – A heuristic value • Path source to goal = sequence of actions (program) for robot arm! • Best-First Search • n blocks »nn vertices – Select nodes to minimize estimated distance to the goal; if a – 10 blocks » 10 billion vertices! promising set of nodes doesn’t pan out, backtrack • We cannot search such huge graphs exhaustively! – Breadth-first search: If out-degree of each node is 10, potentially visits 10d • Hill-climbing vertices – Select nodes to minimize estimated distance to the goal – Dijkstra’s algorithm is basically breadth-first search (modified to handle – Says nothing about backtracking. In fact, we don’t even keep edge weights) track of our previous path!

6 Solution: Hill Climbing with Random Restarts* “Hill Climbing”: What’s in a Name?

• Let’s assume we’re trying to maximize our • Once you can’t go any farther, randomly heuristic choose a node in the graph, and try again. • If we view our graph as a terrain, and the • Once you have several possible solutions, pick heuristic values as elevations, then graph the one with the highest heuristic value searching becomes a problem of finding the • A common variation is simulated annealing, tallest hill which may pick a random move instead of the • But … best move. As the algorithm progresses, we – What are some problems with hill climbing? choose the random move less and less often

Example: N-Queens Problem Hill Climbing with Random Restarts – Complexity?

• Place N queens on an N by N • Can often prove that if you run long enough will chessboard so that no two queens reach a goal state – but may take exponentialtime can attack each other • In some cases can prove that a hill-climbing or • Graph search formulation: random walk algorithm will find a goal in – Each way of placing from 0 to N queens on the chessboard is a vertex polynomial time with high probability – Edge between vertices that differ by – e.g., 2-SAT, Papadimitriou 1997 adding or removing one queen – Startvertex: empty board • Widely used for real-world problems where actual – Goalvertex: any one with N non- complexity is unknown – scheduling, optimization attacking queens (there are many – N-Queens – probably polynomial, but no one has tried to such goals) prove formal bound

Other Randomized Other Real-World Applications Algorithms/Data Structures • Routing finding – computer networks, airline route planning • Data Structures • VLSI layout – cell layout and channel routing – Other Dictionary ADT’s • Production planning –“just in time” optimization • Algorithms • Protein sequence alignment – Find a minimum spanning tree in O(E) time • Travelling Salesman – Max flow problems in O(V2 log V) time • Many other “NP -Hard” problems • In CSE 421, you’ll see a deterministic algorithm – A class of problems for which no exact polynomial 2 time algorithms exist – so heuristic search is the best works in O( V E ) time we can hope for – Many others!