Lecture 17 Priority Queues
15-122: Principles of Imperative Computation (Fall 2021)
Frank Pfenning, Rob Simmons
© Carnegie Mellon University 2021

In this lecture we will look at priority queues as an abstract type and discuss several possible implementations. We then pick the representation as heaps and start to work towards an implementation (which we will complete in the next lecture). Heaps have the structure of binary trees, a very common structure since a (balanced) binary tree with n elements has height O(log n). During the presentation of algorithms on heaps we will also come across the phenomenon that invariants must be temporarily violated and then restored. We will study this in more depth in the next lecture. From the programming point of view, we will see a cool way to implement binary trees in arrays which, alas, does not work very often.

Additional Resources
• Review slides (https://cs.cmu.edu/~15122/slides/review/17-pq.pdf)

This maps as follows to our learning goals:

Computational Thinking: We focus on the trade-offs of various representations for a given data structure, and dip our toes in the idea of temporarily violating invariants.

Algorithms and Data Structures: We introduce priority queues, a common data structure for prioritized task lists, and heaps, an efficient way to implement them.

Programming: For once, we won't be doing any programming in this lecture!

1 Priority Queues

A priority queue is like a queue, a stack, or an unbounded array: there's an add function (enq, push, uba_add) and a delete function (deq, pop, uba_rem). Stacks and queues can be generally referred to as worklists. A priority queue will be a new kind of worklist. We think of each element as a task: we put all the work we need to do on the worklist (with enq/push/add), and then consult the worklist to find out what work we should do next (with deq/pop/rem).
Queues and stacks have a fixed way of deciding which element gets removed: queues always remove the element that was added first (FIFO), and stacks always remove the element that was added most recently (LIFO). This makes the client interface for generic stacks and queues very easy: the queue or stack doesn't need to know anything about the generic type elem. The library doesn't even need to know if elem is a pointer. (Of course, as clients, we'll probably want elem to be defined as the generic type void*.) For example, here's again the interface for stacks:

/* Client interface for stacks */
// typedef _______ elem;

/* Library interface for stacks */
// typedef ______* stack_t;

bool stack_empty(stack_t S)
/*@requires S != NULL; @*/;

stack_t stack_new()
/*@ensures \result != NULL; @*/;

void push(stack_t S, elem x)
/*@requires S != NULL; @*/;

elem pop(stack_t S)
/*@requires S != NULL && !stack_empty(S); @*/;

Rather than fixing, once and for all, the definition of which elements will get returned first, priority queues allow the client to decide in what order elements will get removed. The client has to explain how to give every element a priority, and the priority queue ensures that whichever element has the highest priority will get returned first. For example, in an operating system the runnable processes might be stored in a priority queue, where certain system processes are given a higher priority than user processes. Similarly, in a network router packets may be routed according to some assigned priorities.

To implement the library of priority queues, we need the user to give us a function has_higher_priority(x,y) that returns true only when x has strictly higher priority than y.
/* Client-side interface for priority queues */
// typedef _______ elem;
typedef bool has_higher_priority_fn(elem e1, elem e2);

Once we've defined the client interface for priority queues, the interface is very similar to the stacks and queues we've seen before: pq_new creates a new priority queue, pq_empty checks whether any elements exist, and pq_add and pq_rem add and remove elements, respectively. We also add a function pq_peek which returns the element that should be removed next without actually removing it. For stacks, this operation can be implemented on the client side in constant time, but that is not the case for queues and may not be the case for priority queues.

/* Library-side interface */
// typedef ______* pq_t;

bool pq_empty(pq_t P)
/*@requires P != NULL; @*/;

pq_t pq_new(has_higher_priority_fn* prior)
/*@requires prior != NULL; @*/
/*@ensures \result != NULL; @*/
/*@ensures pq_empty(\result); @*/;

void pq_add(pq_t P, elem x)
/*@requires P != NULL; @*/;

elem pq_rem(pq_t P)
/*@requires P != NULL && !pq_empty(P); @*/;

elem pq_peek(pq_t P)
/*@requires P != NULL && !pq_empty(P); @*/;

In this lecture, we will actually use heaps to implement bounded priority queues. When we create a bounded worklist, we pass a strictly positive maximum capacity to the function that creates a new worklist (pq_new). We also add a new function that checks whether a worklist is at maximum capacity (pq_full). Finally, it is a precondition of pq_add that the priority queue must not be full. Bounding the size of a worklist may be desirable to prevent so-called denial-of-service attacks, where a system is essentially disabled by flooding its task store. This can happen accidentally or on purpose by a malicious attacker. To accommodate this, we make the following addition and changes to the above interface:

/* Library-side interface -- heap implementation */
// ...
bool pq_full(pq_t P)
/*@requires P != NULL; @*/;

pq_t pq_new(int capacity, has_higher_priority_fn* prior)
/*@requires capacity > 0 && prior != NULL; @*/
/*@ensures \result != NULL; @*/
/*@ensures pq_empty(\result); @*/;

void pq_add(pq_t P, elem x)
/*@requires P != NULL && !pq_full(P); @*/;

2 Some Implementations

Before we come to heaps, it is worth considering different implementation choices for bounded priority queues and the complexity of various operations.

The first idea is to use an unordered array, where the length of the array is the maximum capacity of the priority queue, along with a current size n. Inserting into such an array is a constant-time operation, since we only have to write the element at index n and increment n. However, finding the highest-priority element (pq_peek) takes O(n), since we have to scan the whole portion of the array that's in use. Consequently, removing the highest-priority element also takes O(n): first we find the highest-priority element, then we swap it with the last element in the array, then we decrement n.

A second idea is to keep the array sorted. In this case, inserting an element is O(n). We can quickly (in O(log n) steps) find the place i where it belongs using binary search, but then we need to shift elements to make room for the insertion, which takes O(n) copy operations. Finding the highest-priority element is O(1), and if we arrange the array so that the highest-priority elements are at the end, deletion is also O(1).

If we instead keep the elements sorted in an AVL tree, the AVL height invariant ensures that insertion becomes an O(log n) operation. We haven't considered deletion from AVL trees, though it can also be done in logarithmic time. The heap structure we present today likewise gives logarithmic time for adding an element and removing the element of highest priority.
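The unordered-array idea is simple enough to sketch in full. The following is plain C rather than the course's C1, and the struct layout, the concrete elem type, and the helper pq_best are assumptions made for illustration; it shows why pq_add is O(1) while pq_peek and pq_rem are O(n):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <assert.h>

typedef int elem;  // assumption: integer elements
typedef bool has_higher_priority_fn(elem e1, elem e2);

// Example client comparison (assumption): smaller ints are more urgent.
bool min_first(elem e1, elem e2) { return e1 < e2; }

typedef struct pq_header {
  elem *data;                    // data[0..size) holds the elements, unordered
  int size;
  int capacity;
  has_higher_priority_fn *prior;
} *pq_t;

pq_t pq_new(int capacity, has_higher_priority_fn *prior) {
  assert(capacity > 0 && prior != NULL);
  pq_t P = malloc(sizeof(struct pq_header));
  P->data = malloc(sizeof(elem) * (size_t)capacity);
  P->size = 0;
  P->capacity = capacity;
  P->prior = prior;
  return P;
}

bool pq_empty(pq_t P) { return P->size == 0; }
bool pq_full(pq_t P)  { return P->size == P->capacity; }

// O(1): just write at index size and increment.
void pq_add(pq_t P, elem x) {
  assert(!pq_full(P));
  P->data[P->size] = x;
  P->size++;
}

// O(n): scan the used portion for the highest-priority element's index.
static int pq_best(pq_t P) {
  int best = 0;
  for (int i = 1; i < P->size; i++)
    if (P->prior(P->data[i], P->data[best])) best = i;
  return best;
}

elem pq_peek(pq_t P) {
  assert(!pq_empty(P));
  return P->data[pq_best(P)];
}

// O(n): find the best element, overwrite it with the last one, shrink.
elem pq_rem(pq_t P) {
  assert(!pq_empty(P));
  int b = pq_best(P);
  elem x = P->data[b];
  P->data[b] = P->data[P->size - 1];
  P->size--;
  return x;
}
```

On this representation a sequence of n mixed pq_add and pq_rem operations costs O(n²) in the worst case, which is exactly what motivates heaps.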
Heaps are also more efficient, both in terms of time and space, than balanced binary search trees, though only by a constant factor. In summary:

                  pq_add     pq_rem     pq_peek
unordered array   O(1)       O(n)       O(n)
ordered array     O(n)       O(1)       O(1)
AVL tree          O(log n)   O(log n)   O(log n)
heap              O(log n)   O(log n)   O(1)

3 The Min-heap Ordering Invariant

A min-heap is a binary tree structure, but it is a very different binary tree than a binary search tree. The min-heaps we are considering now can be used as priority queues of integers, where smaller integers are treated as having higher priority. (The alternative is a max-heap, where larger integers are treated as having higher priority.) Therefore, while we focus our discussion on heaps, we will talk about "removing the minimal element" rather than "removing the element with the highest priority."

Typically, when using a priority queue, we expect the number of inserts and deletes to roughly balance. Then neither the unordered nor the ordered array provides a good data structure, since a sequence of n inserts and deletes has worst-case complexity O(n²). A heap uses binary trees to do something in between ordered arrays (where it is fast to remove) and unordered arrays (where it is fast to add).

A min-heap is a binary tree where the invariant guarantees that the minimal element is at the root of the tree. For this to be the case we just require that the key of a node be less than or equal to the keys of its children. Alternatively, we could say that each node except the root is greater than or equal to its parent.

Min-heap ordering invariant (alternative 1): The key of each node in the tree is less than or equal to all of its children's keys.

Min-heap ordering invariant (alternative 2): The key of each node in the tree except for the root is greater than or equal to its parent's key.

These two characterizations are equivalent.
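Looking ahead to the array encoding of binary trees mentioned in the introduction, alternative 2 translates directly into a checking function. This is a sketch assuming the common 0-based encoding in which the parent of index i (for i > 0) lives at (i-1)/2; the course's own heap code may index differently:

```c
#include <stdbool.h>

// Checks alternative 2 of the min-heap ordering invariant on an array
// encoding of a complete binary tree: every node except the root must
// be greater than or equal to its parent.
// Assumption: 0-based layout, parent of index i is (i-1)/2.
bool is_min_heap(int *H, int n) {
  for (int i = 1; i < n; i++) {
    if (H[i] < H[(i - 1) / 2]) return false;
  }
  return true;
}
```

Checking the parent edge of every non-root node is equivalent to checking alternative 1, but it needs only one comparison per node, since each node has exactly one parent but up to two children.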