Priority Queues
Chapter 8 Priority Queues

Imagine that you are writing a system to manage tickets (service requests) for an IT department. A natural choice of data structure to hold these tickets is a queue, which provides first-in-first-out behavior. Recall that queues provide two important operations:

• enqueue(T v): adds the value v onto the end of the queue.
• dequeue(): removes and returns the oldest value (the first value in line) from the queue.

This interface is suitable for dealing with the line-like behavior that our system needs to manage. However, a queue is insufficient if the tickets also carry priorities with them. Some tickets are higher priority than others (an electrical fire should take precedence over an accountant’s email being down), and our system needs to be able to handle this. In particular, we ought to require that dequeue return the highest-priority item in the queue, regardless of how long it has been in the structure. If there are multiple items with the same highest priority, then we can service any of them first.

8.1 The Priority Queue ADT

A priority queue is an abstract data type that represents a queue-like structure that respects priorities. A priority is defined to be an ordering on the elements contained in the queue. For example, the tickets may have a numeric value associated with them denoting their priority, and we could order the tickets based on this priority. A priority queue provides the following operations:

• void add(T v): adds the value v into the queue, respecting priority order (i.e., enqueue).
• T poll(): removes and returns the value with the highest priority from the queue (i.e., dequeue).
• T peek(): returns, but does not remove, the value with the highest priority in the queue.

If multiple values have the same priority, then the order in which they are dequeued is unspecified. For example, consider a priority queue for tickets as described above where a ticket is represented by an integer that is its priority.
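As a concrete illustration of these three operations, here is a small sketch using Java’s built-in java.util.PriorityQueue, which happens to provide exactly this add/poll/peek interface. Note one assumption we make explicit in the code: Java’s built-in queue is a *min*-priority queue by default, so we pass a reversed comparator to dequeue the highest priority first.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TicketDemo {
    public static void main(String[] args) {
        // java.util.PriorityQueue is a *min*-priority queue by default,
        // so we reverse the ordering to dequeue the highest priority first.
        PriorityQueue<Integer> tickets =
            new PriorityQueue<>(Comparator.reverseOrder());

        tickets.add(10);
        tickets.add(5);
        tickets.add(2);

        System.out.println(tickets.peek()); // highest priority so far: 10
        System.out.println(tickets.poll()); // removes and returns 10
        System.out.println(tickets.poll()); // then 5
        System.out.println(tickets.poll()); // then 2
    }
}
```

The order of insertion (10, then 5, then 2) does not matter; poll always returns the highest-priority ticket remaining.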
For simplicity’s sake, we’ll represent a ticket by an integer giving its priority, with higher-priority tickets needing to be serviced first. For example, if we start with an empty priority queue and add tickets 10, 5, and 2, the queue has the following shape:

(front) [10, 5, 2]

Peeking at the top element of the queue results in 10 because it is the highest-priority ticket. After adding tickets 15 and 7, the queue changes to:

[15, 10, 7, 5, 2]

Polling the queue at this point dequeues 15 from the front of the queue, leaving:

[10, 7, 5, 2]

Finally, if we add another ticket with priority 5, the queue becomes:

[10, 7, 5a, 5b, 2]

The newest ticket of priority 5, denoted 5b, appears after the older ticket, 5a, in the queue. Note that higher-priority tickets go toward the front of the queue. For simplicity’s sake, we show the queue as an ordered list, although the data structure we use to represent this priority queue may not maintain this ordering. The only thing it needs to do is ensure that the highest-priority element is easily removable when the queue is polled.

8.2 Heaps

Our example suggests a simple implementation: an ordered array list where elements are ordered by their priority.

• void add(T v): because the list is ordered, we can use binary search to find the insertion point for v in O(log n) time, though shifting elements to make room can still cost O(n) in the worst case.
• T peek(): because the list is ordered, the head of the list is the highest-priority element. We can access this element in O(1) time.
• T poll(): poll requires that we remove the head of the list, which takes O(n) time.

Can we do better than this? Recall that a binary search tree allows us to maintain an ordered collection with O(log n) time complexity for add and remove. This sounds good on paper, but the problem is that the O(log n) time complexity depends on the BST being balanced. If it is balanced, then we obtain the desired O(log n) complexity.
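Before turning to trees, the ordered-array-list implementation above can be sketched concretely. The class and method names are our own, and we specialize to int priorities for brevity; the highest-priority element is kept at index 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch of the ordered-array-list priority queue described above.
// The list is kept in descending order, so the head is the maximum.
class SortedListPriorityQueue {
    private final List<Integer> items = new ArrayList<>();

    // Binary search locates the insertion point in O(log n) comparisons,
    // but shifting elements to make room still costs O(n) in the worst case.
    public void add(int v) {
        int idx = Collections.binarySearch(items, v, Comparator.reverseOrder());
        if (idx < 0) {
            idx = -(idx + 1); // convert "not found" result to insertion point
        }
        items.add(idx, v);
    }

    // The head of the list is the highest-priority element: O(1).
    public int peek() {
        return items.get(0);
    }

    // Removing the head shifts every remaining element: O(n).
    public int poll() {
        return items.remove(0);
    }
}
```

Running the ticket example through this class reproduces the queue states shown above: adding 10, 5, 2 yields [10, 5, 2], adding 15 and 7 yields [15, 10, 7, 5, 2], and poll then returns 15.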
However, if the tree is degenerate, i.e., effectively a linked list, then we have O(n) time complexity instead. There are general schemes for maintaining a balanced tree that allow us to get the O(log n) complexity we want, but at the cost of higher constant factors in the runtime. Can we obtain similar performance in this restricted domain of supporting a priority queue, without the complexities of a general balancing strategy?

It turns out that we can, by relaxing the invariant on our binary search tree. Recall that the binary search tree invariant says that for any node in a binary search tree:

(a) The left branch contains elements that are less than the element at this node.
(b) The right branch contains elements that are greater than the element at this node.

Because of this invariant, we are forced to unbalance the tree in certain situations, e.g., when inserting elements in ascending order. If we relax the invariant, we can hit a sweet spot: enforcing enough ordering to support a priority queue while still allowing us to easily balance the tree.

8.3 The Heap Invariants

The data structure we’ll use to efficiently implement a priority queue is called a heap (which has no relation to the heap in memory). A (binary) heap is a tree which maintains two invariants:

• The semantic binary heap invariant says that for any node in the tree, the subtrees of that node only contain elements less than the element at this node.
• The structural binary heap invariant says that the heap is always complete. That is, all levels of the heap but the last are completely filled in, and the last level is filled from left to right.

The semantic invariant is represented graphically as follows:

          v
        /   \
    (< v)   (< v)

This may seem like a useless property, but by requiring it at every node, we implicitly require that all elements greater than v appear above it in the tree.
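One way to make the semantic invariant concrete is as a recursive check over a small linked representation. The Node class below is purely illustrative (the chapter will shortly switch to an array representation); we treat equal child values as acceptable, since the ADT permits duplicate priorities.

```java
// Hypothetical linked tree node, used only to illustrate the invariant.
class Node {
    int value;
    Node left, right;

    Node(int value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Returns true if no child exceeds its parent anywhere in the tree.
    // Applied recursively, this guarantees every element in a node's
    // subtrees is no greater than the node: the max-heap property.
    static boolean satisfiesHeapInvariant(Node n) {
        if (n == null) {
            return true; // an empty tree vacuously satisfies the invariant
        }
        if (n.left != null && n.left.value > n.value) return false;
        if (n.right != null && n.right.value > n.value) return false;
        return satisfiesHeapInvariant(n.left) && satisfiesHeapInvariant(n.right);
    }
}
```

For example, a root of 10 with children 7 and 5 satisfies the invariant, whereas a root of 5 with a child of 10 does not.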
By applying this reasoning recursively at each level of the tree, we know that the maximum-priority element must be placed at the root of the tree. The structural invariant is represented graphically as follows:

          ∘
        /   \
      ∘       ∘
     / \     /
    ∘   ∘   ∘

Note that with such a tree, the length of any path from the root to a leaf is upper-bounded by log n, which is critical in ensuring that the runtime of our operations will be O(log n).

A final note on the semantic invariant of our heaps: by choosing to push smaller elements further down the tree, the maximum element sits at the root. We call such a heap a max heap. In contrast, we could instead have our semantic invariant require that the children of a node only contain elements greater than the value at the node. By doing this, the minimum element sits at the root of the tree. Such a heap is a min heap. For the purposes of simplifying our discussion, we will consider a max heap below. However, keep in mind that a min heap is obtainable by simply flipping the ordering in our invariant.

8.4 Array-based Trees

Because our heaps are complete trees, we are able to use an array to represent the tree rather than a linked structure (compare array lists versus linked lists). The array will contain the contents of the nodes of our tree, and rather than storing references to its children in each node, we will use explicit formulae to find the indices of the children and parent of a node given the index of that node in the tree.

To arrive at these formulae, we note that a natural way to lay out the elements of a complete tree in an array is to proceed in a top-down, left-to-right fashion. For example, if we have the following tree:

        1
       / \
      2   3
     / \
    4   5

we could represent it with the following array: [1, 2, 3, 4, 5, …], keeping in mind that, like an array list, only part of the overall array is in use at any given time. Because the tree is complete, this layout strategy ensures we fill the array from left to right.
This fact is why we did not previously consider using an array to represent a tree: most trees will not be complete like a heap, and so there would be many indices of the array that are unused. If we look at each node in the array:

• 1 is at index 0 with children 2 (index 1) and 3 (index 2).
• 2 is at index 1 with children 4 (index 3) and 5 (index 4).
• 3 is at index 2 with no children (indices 5 and 6).

From this, we can derive the following formulae:

• The left child of the node at index i is at index i × 2 + 1.
• The right child of the node at index i is at index i × 2 + 2 = (i + 1) × 2.
• The parent of the node at index i is at index ⌊(i − 1)/2⌋ for nodes that are not the root of the overall tree.

8.5 Heap Operations

Now we discuss implementing each of our heap operations in terms of our array-based tree data structure:

Peek

Noting that the root of the tree is at the first index of the array, we simply return that element. This takes O(1) time.
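The index formulae and peek can be sketched directly. The class name and fixed initial capacity below are our own choices, and add and poll are deliberately omitted since they are developed in the sections that follow.

```java
// Skeleton of an array-based max heap: index formulae plus peek only.
class ArrayHeap {
    private int[] data = new int[16]; // backing array, only partially in use
    private int size = 0;             // number of elements currently stored

    static int leftChild(int i)  { return i * 2 + 1; }
    static int rightChild(int i) { return i * 2 + 2; }       // = (i + 1) * 2
    static int parent(int i)     { return (i - 1) / 2; }     // floor; valid for i > 0

    // The root (the maximum element) lives at index 0, so peek is O(1).
    public int peek() {
        if (size == 0) {
            throw new IllegalStateException("heap is empty");
        }
        return data[0];
    }
}
```

Checking these against the example array [1, 2, 3, 4, 5]: the children of index 0 are indices 1 and 2, the children of index 1 are indices 3 and 4, and the parent of indices 3 and 4 is index 1, as expected.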