Data structures and algorithms DAT038/TDA417, LP2 2019

Lecture 12, 2019-12-02 Binary heaps (S&W 2.4)

Some slides by Sedgewick & Wayne Collections

A collection is a data type that stores groups of items.

stack Push, Pop , resizing array

queue Enqueue, Dequeue linked list, resizing array

symbol table Put, Get, Delete BST, , , TST Add, Contains, Delete BST, hash table, trie, TST

A is another kind of collection.

“ Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious.” — Fred Brooks

2 Priority queues

Collections. Can add and remove items. Stack. Add item; remove the item most recently added. Queue. Add item; remove the item least recently added.

Min priority queue. Add item; remove the smallest item. Max priority queue. Add item; remove the largest item.

return contents contents operation argument value size (unordered) (ordered)

insert P 1 P P insert Q 2 P Q P Q insert E 3 P Q E E P Q remove max Q 2 P E E P insert X 3 P E X E P X insert A 4 P E X A A E P X insert M 5 P E X A M A E M P X remove max X 4 P E M A A E M P insert P 5 P E M A P A E M P P insert L 6 P E M A P L A E L M P P insert E 7 P E M A P L E A E E L M P P remove max P 6 E M A P L E A E E L M P

A sequence of operations on a priority queue 3 Priority queue API

Requirement. Generic items are Comparable. Basic API. Add item; find minimum; delete minimum.

MinPQ() create an empty priority queue MinPQ(Key[] a) create a priority queue with given keys void insert(Key v) insert a key into the priority queue Key delMin() return and remove the smallest key boolean isEmpty() is the priority queue empty? Key min() return the smallest key int size() number of entries in the priority queue

4 Priority queue API

Alternative. Add item; find maximum; delete maximum. (Exercise: use MinPQ to implement MaxPQ.)

MaxPQ() create an empty priority queue MaxPQ(Key[] a) create a priority queue with given keys void insert(Key v) insert a key into the priority queue Key delMax() return and remove the largest key boolean isEmpty() is the priority queue empty? Key max() return the largest key int size() number of entries in the priority queue

5 Sorting with a priority queue

To a list, add its elements to a (min-)priority queue, then repeatedly find and remove the minimum element:

public void sort(String[] a) { int N = a.length; MinPQ pq = new MinPQ(); for (int i = 0; i < N; i++) pq.insert(a[i]); for (int i = 0; i < N; i++) a[i] = pq.delMin(); }

Q. What is this algorithm’s performance? A. Depends on performance of priority queue:

● quadratic, if insert() or delMin() take linear time.

● N lg N, if insert() and delMin() take lg N time. 6 Priority queue client example

Challenge. Find the largest M items in a stream of N items.

e.g. Fraud detection: isolate $$ transactions. N huge, M large Constraint. Not enough memory to store N items.

% more tinyBatch.txt % java TopM 5 < tinyBatch.txt Turing 6/17/1990 644.08 Thompson 2/27/2000 4747.08 vonNeumann 3/26/2002 4121.85 vonNeumann 2/12/1994 4732.35 Dijkstra 8/22/2007 2678.40 vonNeumann 1/11/1999 4409.74 vonNeumann 1/11/1999 4409.74 Hoare 8/18/1992 4381.21 Dijkstra 11/18/1995 837.42 vonNeumann 3/26/2002 4121.85 Hoare 5/10/1993 3229.27 vonNeumann 2/12/1994 4732.35 Hoare 8/18/1992 4381.21 sort Turing 1/11/2002 66.10 key Thompson 2/27/2000 4747.08 Turing 2/11/1991 2156.86 ... 7 Priority queue client example

Challenge. Find the largest M items in a stream of N items.

e.g. Fraud detection: isolate $$ transactions. Constraint. Not enough memory to store N items.

MinPQ pq = new MinPQ();

while (StdIn.hasNextLine()) { use a min- Transaction data String line = StdIn.readLine(); oriented pq type is Transaction item = new Transaction(line); Comparable pq.insert(item); (ordered by $$) if (pq.size() > M) pq.delMin(); pq } contains largest M items 8 Priority queue client example

Challenge. Maintain a printer queue, but where jobs can have priorities

e.g. the boss’s documents should be printed first

Solution. Use a max priority queue.

Ordering. Implement Comparable as follows:

● First compare who submitted the document; documents submitted by more senior people are greater

● If seniority is equal, compare submission time; documents submitted earlier are greater

9 Priority queue applications

Event-driven simulation.[ customers in a line, colliding particles ]

Numerical computation. [ reducing roundoff error ]

Data compression. [ Huffman codes ]

Graph searching. [ Best-first search/Dijkstra's algorithm ]

Number theory. [ sum of powers ]

Artificial intelligence. [ A* search ]

Statistics. [ online median in data stream ]

Operating systems. [ load balancing, interrupt handling ]

Computer networks. [ web cache ]

Discrete optimization. [ bin packing, scheduling ]

10 Priority queue implementations

Cost of insert() Cost of max() Cost of delMax() Unsorted array 1 N N Sorted array N 1 1 Unbalanced BST N N N Red-black BST lg N lg N lg N Binary lg N 1 lg N

Balanced BSTs. Work fine, but we can do better. Binary heaps. Simpler and faster than a balanced BST. This lecture. How to implement min-heaps to get a min priority queue. The book. Presents max-heaps (completely symmetric to min heaps).

11 Binary (min-)heaps Binary (min-)heaps – representation

A binary heap implements a priority queue using a binary (not a BST).

28

29 20 37 32 89 66 18 8 74 39 Above is just a . We will add an invariant to make it easy to find the minimum element. 13 Invariant #1 – the heap property

A tree satisfies the heap property if the value of each node is less than (or equal to) the value of its children:

Root node is the 8 smallest – can find minimum 18 29 in O(1) time 20 28 39 66 37 32 74 89

Invariant #1: A binary heap must satisfy the heap property. 14 Binary heap

A tree is complete if it is perfectly balanced except the bottom level, which is filled from left to right:

Level 0 8

Level 1 18 29 Level 2 20 28 39 66 Level 3 37 26 76 32 74

Complete trees have logarithmic height! Invariant #2:

A binary heap must be complete. 15 Binary heap or not?

8 8

18 29 18 29 20 28 66 20 28 39

8 8

8 78 28 29 20 95 85 20 18 39 66

16 Binary heap or not?

8 8

No: 18 No: 29 18not complete29 not complete 20 28 66 20 28 39

8 8 No: 8 Yes 78 28 29 28 > 18 20 95 85 20 18 39 66

17 Binary heap invariant

The binary heap invariant for min-heaps:

● The tree must be complete ● It must have the heap property (each node is less than or equal to its children) Why do we choose these invariants?

● Completeness: ensures balance ● Heap property: allows us to find the minimum in constant time (it’s the root!) Our task now: implement insert() and delMin() while preserving the invariant

18 Adding an element to a binary heap

Step 1: insert the element at the next empty position in the tree

8

18 29 20 28 39 66 37 26 76 32 74 89 12 This might break the heap invariant! In this case, 12 is less than 66, its parent.

19 Adding an element to a binary heap

Step 2: if the new element is less than its parent, swap it with its parent

8

18 29 20 28 39 66 37 26 76 32 74 89 12

20 Adding an element to a binary heap

Step 2: if the new element is less than its parent, swap it with its parent

8

18 29 20 28 39 12 37 26 76 32 74 89 66 The invariant is still broken, since 12 is less than 29, its new parent

21 Adding an element to a binary heap

Repeat step 2 until the new element is greater than or equal to its parent.

8

18 12 20 28 39 29 37 26 76 32 74 89 66 Now 12 is in its right place, and the invariant is restored. (Think about why this algorithm restores the invariant.) 22 Why this works

At every step, the heap property almost holds except that the new element might be less than its parent After swapping the element and its parent, still only the new element can be in the wrong place (why?) 8

18 29 20 28 39 12 37 26 76 32 74 89 66 23 Why this works

Suppose that z is the new node and we swap it with its parent:

x z

y z y x

On the left, we must have x ≤ y but x > z. On the right, this means z < x ≤ y. So only z can be in the wrong place afterwards!

24 Removing the minimum element

To remove the minimum element, we are going to follow a similar scheme as for insertion:

● First remove the minimum (root) element from the tree somehow, breaking the invariant in the process ● Then repair the invariant Because of completeness, we can only really remove the last (bottom-right) element from the tree

● Solution: first swap the root element with the last element, then remove the last element

25 Removing the minimum element

The goal: remove the minimum element First: swap the root and the last element (66), and then remove the last element

8

18 12 20 28 39 29 37 26 76 32 74 89 66

26 Removing the minimum element

Step 1: swap the root element and the last element in the tree, and remove the last element

66

18 12 20 28 39 29 37 26 76 32 74 89 The invariant is broken, because 66 is greater than its children

27 Removing the minimum element

Step 2: if the moved element is greater than its children, swap it with its least child

66

18 12 20 28 39 29 37 26 76 32 74 89

28 Removing the minimum element

Step 2: if the moved element is greater than its children, swap it with its least child

12

18 66 20 28 39 29 37 26 76 32 74 89

29 Removing the minimum element

Step 3: repeat until the moved element is less than or equal to its children

12

18 29 20 28 39 66 37 26 76 32 74 89

30 Removing the minimum element

We swap with the least child because swapping with the greatest child breaks the invariant at two nodes:

66 18

18 12 66 12 20 28 39 29 20 28 39 29 37 26 76 32 74 89 37 26 76 32 74 89

31 Why this works

Suppose that x was bigger than its children and z was its smallest child:

xx z

y z y x

On the left, we must have x > z, z ≤ y. On the right, this means z < x and z ≤ y. So only x can be in the wrong place afterwards!

32 Binary (min-)heaps – summary so far

Implementation of (min-)priority queues

● Heap property – means smallest value is always at root ● Completeness – means tree is always balanced Complexity:

● find minimum – constant time ● insert, delete minimum – runtime proportional to height of tree, logarithmic because tree is balanced

33 Binary heaps are arrays!

A binary heap is really implemented using an array!

8 Possible because of completeness 18 29 property Convention: 1-based 20 28 39 66 array 37 26 76 32 74 89 indexing

1 2 3 4 5 6 7 8 9 10 11 12 13 8 18 29 20 28 39 66 37 26 76 32 74 89 34 Child positions

The left child of node i ...the right child is at index 2i 8 is at index 2i + 1 in the array... 18 29 20 28 39 66 37 26 76 32 74 89

1 2 3 4 5 6 7 8 9 10 11 12 13 8 18 29 20 28 39 66 37 26 76 32 74 89

R

L

P

.

a .

C

r

C

e

h

h

n

i

i

l

t

l

d

d 35 Child positions

The left child of node i ...the right child is at index 2i 8 is at index 2i + 1 in the array... 18 29 20 28 39 66 37 26 76 32 74 89

1 2 3 4 5 6 7 8 9 10 11 12 13 8 18 29 20 28 39 66 37 26 76 32 74 89

P

L

R

.

a

.

C

r

C

e

h

h

n

i

i

l

t

l

d

d 36 Child positions

The left child of node i ...the right child is at index 2i 8 is at index 2i + 1 in the array... 18 29 20 28 39 66 37 26 76 32 74 89

1 2 3 4 5 6 7 8 9 10 11 12 13 8 18 29 20 28 39 66 37 26 76 32 74 89

P

L

R

.

a

.

C

r

C

e

h

h

n

i

i

l

t

l

d

d 37 Parent position

The parent of node i 8 is at index i/2 (rounded down) 18 29 20 28 39 66 37 26 76 32 74 89

1 2 3 4 5 6 7 8 9 10 11 12 13 8 18 29 20 28 39 66 37 26 76 32 74 89

C

P

a

h

r

i

e

l

d

n

t 38 Reminder: inserting into a binary heap

To insert an element into a binary heap:

● Add the new element at the end of the heap ● Move the element up: while the element is less than its parent, swap it with its parent We can do exactly the same thing for a binary heap represented as an array!

39 Inserting into a binary heap

Step 1: add the new element to the end of the array, set child to its index 6

18 29 20 28 39 66

C

37 26 76 32 74 89 8 h

i

l

d

1 2 3 4 5 6 7 8 9 10 11 12 13 14 6 18 29 20 28 39 66 37 26 76 32 74 89 8 40 Inserting into a binary heap

Step 2: compute parent = child/2

6

18 29 20 28 39 66 37 26 76 32 74 89 8

P

a

C

r

h

e

i

n

l

d

t

1 2 3 4 5 6 7 8 9 10 11 12 13 14 6 18 29 20 28 39 66 37 26 76 32 74 89 8 41 Inserting into a binary heap

Step 3: if array[parent] > array[child], swap them 6

18 29 20 28 39 8 37 26 76 32 74 89 66

P

a

C

r

h

e

i

n

l

d

t

1 2 3 4 5 6 7 8 9 10 11 12 13 14 6 18 29 20 28 39 8 37 26 76 32 74 89 66 42 Inserting into a binary heap

Step 4: set child = parent, parent = child/2, and repeat 6

18 29 20 28 39 8

P 37 26 76 32 74 89 66

C

a

r

h

e

i

n

l

d

t

1 2 3 4 5 6 7 8 9 10 11 12 13 14 6 18 29 20 28 39 8 37 26 76 32 74 89 66 43 Binary heaps as arrays

Binary heaps are “morally” trees

● This is how we view them when we design the heap algorithms But we implement the tree as an array

● The actual implementation translates these tree concepts to use arrays When you see a binary heap shown as a tree, you should also keep the array view in your head (and vice versa!)

44 Binary heap: Java implementation

public class MinPQ> { public class MinPQ> { fixed capacity privateprivate Key[]Key[] pq;pq; (for simplicity) privateprivate intint N;N;

publicpublic MinPQ(intMinPQ(int capacity)capacity) {{ pqpq == (Key[])(Key[]) newnew Comparable[capacity+1];Comparable[capacity+1]; }}

publicpublic booleanboolean isEmpty()isEmpty() {{ returnreturn NN ==== 0;0; }} publicpublic voidvoid insert(Keyinsert(Key key)key) {{ ...... }} PQ ops publicpublic KeyKey delMin()delMin() {{ ...... }} publicpublic KeyKey min()min() {{ returnreturn pq[1];pq[1]; }} heap helper functions privateprivate voidvoid swim(intswim(int k)k) {{ ...... }} privateprivate voidvoid sink(intsink(int k)k) {{ ...... }} array helper functions privateprivate booleanboolean greater(intgreater(int i,inti,int j)j) {{ returnreturn pq[i].compareTo(pq[j])pq[i].compareTo(pq[j]) >> 0;0; }}

privateprivate voidvoid exch(intexch(int i,inti,int j)j) {{ KeyKey tt == pq[i];pq[i]; pq[i]pq[i] == pq[j];pq[j]; pq[j]pq[j] == t;t; }} }} 45 Insertion in a heap

Insert. Add node at end, then swim it up. Cost. At most 1 + lg N compares. Swimming. Fixes the situation when child's key is smaller than its parent's key. public void insert(Key x) { To fix the violation: N++;

Exchange key in child pq[N] = x; swim(N); with key in parent. } (If child is at index k, parent is at index k/2.) private void swim(int k) { while (k > 1 && greater(k/2, k)) { Repeat until heap exch(k, k/2); order restored. k = k/2; } } 46 Deleting the minimum in a heap

Delete min. Exchange root with node at end, then sink it down. Cost. At most 2 lg N compares. Sinking. Fixes the situation public Key delMin() { Key min = pq[1]; where the parent’s key is exch(1, N); N--; allow garbage greater than one (or both) sink(1); collection of deleted pq[N+1] = null; item of its children's. return min; } To eliminate the violation: children of node at k are 2k and private void sink(int k) { Exchange key in parent while (2*k <= N) { 2k+1 with key in smaller child. int j = 2*k; if (j < N && greater(j, j+1)) j++; Repeat until heap // j = smallest child if (!greater(k, j)) break; order restored. exch(k, j); k = j; } } 47 Binary heap considerations

Underflow and overflow.

Underflow: throw exception if deleting from empty PQ.

Overflow: add no-arg constructor and use .

leads to log N Maximum-oriented priority queue. amortized time per op Replace greater() with less().

Implement less().

Other operations (can be implemented with sink and swim).

Remove an arbitrary item.

Change the priority of an item.

Build a heap in-place from an array.

Sort an array in-place (by first converting to a heap). 48 Max heaps

Dual to min heaps.

Complete binary tree stored in an array.

Heap property: each node must be greater than or equal to its children. The book presents heaps as max heaps!

49 Summary

Priority queues.

Min priority queue: add, find minimum, delete minimum.

Max priority queue: add, find maximum, delete maximum.

Binary heaps.

Min heaps implement min priority queues.

Max heaps implement max priority queues.

Idea: take a complete binary tree satisfying the heap property; store values in an array for efficiency. Add/remove take logarithmic time.

Visualisation.

See “Resources” section of website! 50