Information Processing Letters 60 (1996) 151-157

An efficient algorithm for concurrent priority queue heaps

Galen C. Hunt, Maged M. Michael, Srinivasan Parthasarathy, Michael L. Scott
Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, USA
Received 30 March 1995; revised 2 September 1996
Communicated by G.R. Andrews

Abstract

We present a new algorithm for concurrent access to array-based priority queue heaps. Deletions proceed top-down as they do in a previous algorithm due to Rao and Kumar (1988), but insertions proceed bottom-up, and consecutive insertions use a bit-reversal technique to scatter accesses across the fringe of the tree, to reduce contention. Because insertions do not have to traverse the entire height of the tree (as they do in previous work), as many as O(M) operations can proceed in parallel, rather than O(log M), on a heap of size M. Experimental results on a Silicon Graphics Challenge multiprocessor demonstrate good overall performance for the new algorithm on small heaps, and significant performance improvements over known alternatives on large heaps with mixed insertion/deletion workloads.

Keywords: Concurrency; Data structures; Priority queue; Heap; Bit-reversal; Synchronization

1. Introduction

The heap is widely used as a priority queue [2]. The basic operations on a priority queue are insert and delete. Insert inserts a new item in the queue, and delete removes and returns the highest-priority item from the queue. A heap is a binary tree with the property that the key at any node has higher priority than the keys at its children (if they exist). An array representation of a heap is the most space efficient: the root of the heap occupies location 1, and the left and right children of the node at location i occupy locations 2i and 2i + 1, respectively. No items exist in level l of the tree unless level l - 1 is completely full.

Many applications (e.g. heuristic search algorithms, graph search, and discrete event simulation [5,6]) on shared-memory multiprocessors use shared priority queues to schedule sub-tasks. In these applications, items can be inserted into and deleted from the heap by any of the participating processes. The simplest way to ensure the consistency of the heap is to serialize the updates by putting them in critical sections protected by a mutual exclusion lock. This approach limits concurrent operations on the heap to one. Since updates to the heap typically modify only a small fraction of the nodes, more concurrency should be achievable by allowing processes to access the heap concurrently as long as they do not interact with each other.

Biswas and Browne [1] proposed a scheme that allows many insertions and deletions to proceed concurrently. Their scheme relies on the presence of maintenance processes that dequeue sub-operations from a FIFO work queue. Sub-operations are placed on the work queue by the processes performing insert and delete operations. The work queue is used to avoid deadlock due to insertions and deletions proceeding in opposite directions in the tree.

* Corresponding author. Email: [email protected]. Email: {gchunt,srini,scott}@cs.rochester.edu.
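The array layout and the two sequential operations described above can be sketched as follows. This is a minimal illustration (ours, not from the paper): the root lives at index 1, the children of node i at 2i and 2i + 1, insert sifts bottom-up, and delete replaces the root with the last item and heapifies top-down.

```python
class SequentialHeap:
    """Sequential array-based max-heap; index 0 is unused."""

    def __init__(self):
        self.items = [None]  # root will live at index 1

    def insert(self, priority):
        # Place the new item in the first free slot, then sift it up
        # while it has higher priority than its parent (at i // 2).
        self.items.append(priority)
        i = len(self.items) - 1
        while i > 1 and self.items[i] > self.items[i // 2]:
            self.items[i], self.items[i // 2] = self.items[i // 2], self.items[i]
            i //= 2

    def delete(self):
        # Remove and return the highest-priority item (the root),
        # replace it with the last item, and heapify top-down.
        top = self.items[1]
        last = self.items.pop()
        if len(self.items) > 1:
            self.items[1] = last
            i = 1
            while 2 * i < len(self.items):
                child = 2 * i
                if child + 1 < len(self.items) and self.items[child + 1] > self.items[child]:
                    child += 1
                if self.items[child] <= self.items[i]:
                    break
                self.items[i], self.items[child] = self.items[child], self.items[i]
                i = child
        return top

h = SequentialHeap()
for p in [3, 9, 1]:
    h.insert(p)
assert h.delete() == 9  # highest priority comes out first
```

Serializing calls to such a structure with one mutex is exactly the single-lock approach the paper uses as its baseline.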

0020-0190/96/$12.00 Copyright © 1996 Elsevier Science B.V. All rights reserved. PII S0020-0190(96)00148-2

The need for a work queue and maintenance processes causes this scheme to incur substantial overhead. Rao and Kumar [7] present another scheme that avoids deadlock by using top-down insertions, where an inserted item has to traverse a path through the whole height of the heap (insertions in a traditional sequential heap proceed bottom-up). Jones [3] presents a concurrent priority queue algorithm using skew heaps. (Array-based heaps can be considered as a binary tree that is filled at all levels except possibly the last level. In skew heaps this restriction is relaxed; the representative binary tree need not be filled at all the intermediate levels.) He notes that top-down insertions in array-based heaps are inefficient, while bottom-up insertions would cause deadlock if they collide with top-down deletions without using extra server processes.

This paper presents a new concurrent priority queue heap algorithm that addresses the problems encountered in previous research. On large heaps the algorithm achieves significant performance improvements over both the serialized single-lock algorithm and the algorithm of Rao and Kumar, for various insertion/deletion workloads. For small heaps it still performs well, but not as well as the single-lock algorithm. The new algorithm allows concurrent insertions and deletions in opposite directions, without risking deadlock and without the need for special server processes. It also uses a "bit-reversal" technique to scatter accesses across the fringe of the tree to reduce contention.

2. The new algorithm

The new algorithm augments the standard heap data structure [2] with a mutual-exclusion lock on the heap's size and locks on each node in the heap. Each node also has a tag that indicates whether it is empty, valid, or in a transient state due to an update to the heap by an inserting process. Nodes that contain no data are tagged EMPTY. Nodes that are available for deletion are tagged AVAILABLE. A node that has been inserted, and is being moved into place, is tagged with the process identifier (pid) of the inserting process.

A delete operation in the new algorithm, as in the sequential algorithm, starts by reading the data and priority of the root of the heap and then replacing them with those of the rightmost node in the lowest level of the heap. Then, the delete operation "heapifies" the heap. It compares the priority of the root with that of each of its children (if any). If necessary, it swaps the root item with one of its children in order to ensure that none of the children has priority higher than the root. If no swapping is necessary, the delete operation is complete; it returns the data that was originally in the root. Otherwise, the operation recursively "heapifies" the subheap rooted at the swapped child.

To handle concurrency, all these steps are performed under the protection of the locks on the individual nodes and a lock on the size of the heap. In each step of the heapify operation, the lock of the subtree root is already held. It is not released until the end of that step. Prior to comparing priorities, the locks of the children are acquired. If swapping is performed, the lock on the swapped child is retained through the next recursive heapify step, and the locks on the root and the unswapped child are released. Otherwise, all the locks are released, and the delete operation completes.

An insert operation starts by inserting the new data and priority in the lowest level of the heap. If the inserted node is the root of the heap, then the insert operation is complete. Otherwise, the operation compares the priority of the inserted node to that of its parent. If the child's priority is higher than that of its parent, then the two items are swapped; otherwise the insert operation is complete. If swapping was necessary, then the same steps are applied repeatedly bottom-up until reaching a step in which no swapping is necessary, or the inserted node has become the root of the heap.

To handle concurrency, all these steps are performed under the protection of the locks and tags on the individual nodes and the lock on the size of the heap. In every step of the bottom-up comparison, the lock of the parent is acquired first, followed by the lock on the inserted node. After comparison and swapping (if necessary), both locks are released. Locks are acquired in the same order as in the delete operation, parent before child, to avoid deadlock. This mechanism requires releasing and then acquiring the lock on the inserted item between successive steps, opening a window of vulnerability during which the inserted item might be swapped by other concurrent operations. Tags are used to resolve these situations.

An insert operation tags the inserted item with its pid. In every step, an insert operation can identify the item it is moving up the heap even if the item has been swapped upwards by a deletion. In particular, tags are used in the following manner:
- If the tag of the parent node is equal to AVAILABLE and the tag of the current node is equal to the insert operation's pid, then no interference has occurred and the insertion step can proceed normally.
- If the tag of the parent node is equal to EMPTY, then the inserted item must have been moved by a delete operation to the root of the heap. The insert operation is complete.
- If the tag of the current node is not equal to the operation's pid, then the inserted item must have been moved upwards by a delete operation. The insert operation moves upward in pursuit of the inserted item.

In some definitions of heaps [2], all nodes in the last level of the heap to the left of the last item have to be non-empty. Since this is not required by priority queue semantics, in the new algorithm we chose to relax this restriction to reduce lock contention, and thereby permit more concurrency. Under our relaxed model, consecutive insertions traverse different sub-trees by using a "bit-reversal" technique similar to that of an FFT computation [2]. For example, in the third level of a heap (nodes 8-15, where node 1 is the root), eight consecutive insertions would start from the nodes 8, 12, 10, 14, 9, 13, 11, and 15, respectively. Notice that for any two consecutive insertions, the two paths from each of the bottom-level nodes to the root of the heap have no common nodes other than the root. This lack of overlap serves to reduce contention for node locks. Consecutive deletions from the heap follow the same pattern, but in reverse order. The relation between the indices of parents and children remains as it is in heaps without bit reversal. The children of node i are nodes 2i and 2i + 1, and the parent of node i > 1 is node i/2. Moreover, if a node has only one child, it is still 2i, never 2i + 1.

Since insertions in the new algorithm do not have to traverse the whole height of the heap, they have a lower bound of O(1) time, while the algorithm due to Rao and Kumar requires O(log M) time for insertions (top-down) in a heap of size M, as insertions have to traverse the entire height of the heap. In addition to reducing traversal overhead, the bottom-up insertion approach of the new algorithm reduces contention on the topmost nodes.

We next consider the space requirements for the algorithms under consideration. Let M be the maximum number of nodes in the heap, and P the maximum number of processes operating on the heap. Assume that each lock requires one bit of memory. The new algorithm requires 1 bit for the lock on the heap size variable, 3 log M bit-reversal bits, and 1 + log P lock and tag bits per node, for a total of 1 + 3 log M + (1 + log P)M bits of memory. The single-lock algorithm requires 1 bit of memory for the single lock. Rao and Kumar's algorithm requires 3 bits per node for a total of 3M bits of memory. If bit reversal were added to Rao and Kumar's algorithm, it would require 3 log M extra bits, for a total of 3 log M + 3M bits of memory. The single-lock algorithm is significantly more space efficient than the multiple-lock algorithms. Rao and Kumar's algorithm requires less space than the new algorithm (Θ(M) for the former compared to Θ(M log P) for the latter). In practice, however, bit packing results in false sharing in cache-coherent systems, and should therefore be avoided. If overhead bits for different nodes occupy different memory words, and if the number of processes operating on the heap does not exceed 2^n - 2, where n is the number of bits per memory word, then the space overhead of the new algorithm is the same as that of Rao and Kumar's algorithm, except for three words for the bit-reversed counter.

Figs. 1 and 2 present pseudocode for the insert and delete operations of the new algorithm, respectively. Initially, all locks are free, all node tags are set to EMPTY, and the number of elements in the heap is zero. Bit reversals can easily be calculated in O(n) time, where n is the number of bits to be reversed. For long sequences of increments only or decrements only, we can improve this bound to an amortized time of O(1) by remembering the high-order bit (see Fig. 3). Alternating increments and decrements may still require O(n) time.

    record data_item
        lock := FREE; tag := EMPTY; priority := 0

    record heap
        lock := FREE; bit_reversed_counter size; data_item items[]

    define LOCK(x)     as lock(heap.items[x].lock)
    define UNLOCK(x)   as unlock(heap.items[x].lock)
    define TAG(x)      as heap.items[x].tag
    define PRIORITY(x) as heap.items[x].priority

    procedure concurrent_insert(priority, heap)
        // Insert new item at bottom of the heap.
        lock(heap.lock)
        i := bit_reversed_increment(heap.size)
        LOCK(i); unlock(heap.lock)
        PRIORITY(i) := priority
        TAG(i) := pid; UNLOCK(i)

        // Move item towards top of heap while it has higher priority
        // than parent.
        while i > 1 do
            parent := i/2
            LOCK(parent); LOCK(i)
            if TAG(parent) = AVAILABLE and TAG(i) = pid then
                if PRIORITY(i) > PRIORITY(parent) then
                    swap_items(i, parent); i := parent
                else
                    TAG(i) := AVAILABLE; i := 0
            else if TAG(parent) = EMPTY then
                i := 0
            else if TAG(i) != pid then
                i := parent
            UNLOCK(i); UNLOCK(parent)
        enddo
        if i = 1 then
            LOCK(i)
            if TAG(i) = pid then
                TAG(i) := AVAILABLE
            UNLOCK(i)

Fig. 1. Concurrent insert operation. For conciseness, we treat priority as if it were the only datum in each data_item.

    function concurrent_delete(heap)
        // Grab item from bottom of heap to replace to-be-deleted
        // top item.
        lock(heap.lock)
        bottom := bit_reversed_decrement(heap.size)
        LOCK(bottom); unlock(heap.lock)
        priority := PRIORITY(bottom)
        TAG(bottom) := EMPTY; UNLOCK(bottom)

        // Lock first item. Stop if it was the only item in the heap.
        LOCK(1)
        if TAG(1) = EMPTY then
            UNLOCK(1); return priority

        // Replace the top item with the item stored from
        // the bottom.
        swap(priority, PRIORITY(1)); TAG(1) := AVAILABLE

        // Adjust heap starting at top. Always hold lock on item
        // being adjusted.
        i := 1
        while i < MAX_SIZE / 2 do
            left := i * 2; right := i * 2 + 1
            LOCK(left); LOCK(right)
            if TAG(left) = EMPTY then
                UNLOCK(right); UNLOCK(left); break
            else if TAG(right) = EMPTY or PRIORITY(left) > PRIORITY(right) then
                UNLOCK(right); child := left
            else
                UNLOCK(left); child := right

            // If child has higher priority than parent then swap.
            // If not, stop.
            if PRIORITY(child) > PRIORITY(i) then
                swap_items(child, i); UNLOCK(i); i := child
            else
                UNLOCK(child); break
        enddo
        UNLOCK(i)
        return priority

Fig. 2. Concurrent delete operation.

    record bit_reversed_counter
        counter := 0; reversed := 0; high_bit := -1

    function bit_reversed_increment(c)
        c.counter := c.counter + 1
        for bit := c.high_bit - 1 to 0 step -1
            c.reversed := not(c.reversed, bit)
            if test(c.reversed, bit) = TRUE then break
        if bit < 0 then
            c.reversed := c.counter; c.high_bit := c.high_bit + 1
        return c.reversed

    function bit_reversed_decrement(c)
        c.counter := c.counter - 1
        for bit := c.high_bit - 1 to 0 step -1
            c.reversed := not(c.reversed, bit)
            if test(c.reversed, bit) = FALSE then break
        if bit < 0 then
            c.reversed := c.counter; c.high_bit := c.high_bit - 1
        return c.reversed

Fig. 3. A bit-reversed counter.
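As a concrete check on Fig. 3 and on the insertion order quoted in the text, the counter can be transcribed almost directly into Python. The class wrapper and method names below are ours; not(x, bit) and test(x, bit) from the figure become bit toggling (XOR) and bit testing (AND):

```python
class BitReversedCounter:
    """Bit-reversed counter after Fig. 3 of the paper (transcription ours)."""

    def __init__(self):
        self.counter = 0    # number of items in the heap
        self.reversed = 0   # bit-reversed index tracked alongside counter
        self.high_bit = -1  # position of the leading bit of counter

    def increment(self):
        self.counter += 1
        for bit in range(self.high_bit - 1, -1, -1):
            self.reversed ^= 1 << bit          # not(reversed, bit)
            if self.reversed & (1 << bit):     # test(reversed, bit) = TRUE
                return self.reversed
        # All scanned bits flipped to 0: counter reached a new power of two.
        self.reversed = self.counter
        self.high_bit += 1
        return self.reversed

    def decrement(self):
        self.counter -= 1
        for bit in range(self.high_bit - 1, -1, -1):
            self.reversed ^= 1 << bit
            if not (self.reversed & (1 << bit)):  # test(...) = FALSE
                return self.reversed
        # Counter dropped below a power of two.
        self.reversed = self.counter
        self.high_bit -= 1
        return self.reversed

c = BitReversedCounter()
order = [c.increment() for _ in range(15)]
# Items 8-15 land on the fringe in the scattered order given in the text.
assert order[7:] == [8, 12, 10, 14, 9, 13, 11, 15]
```

Each increment or decrement touches only the low-order bits below high_bit, which is what yields the amortized O(1) bound for long runs of increments only or decrements only.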
3. Experimental methodology

We use a 12-processor Silicon Graphics Challenge multiprocessor to compare the performance of the new algorithm, the single-lock algorithm, and Rao and Kumar's algorithm. We tried the latter both with and without adding our bit-reversal technique, in order to determine if it suffices to improve performance. For mutual exclusion we used test-and-test-and-set locks with backoff, using the MIPS R4000 load-linked and store-conditional instructions. On small-scale multiprocessors like the Challenge, these locks have low overhead compared to other more scalable locks [4].

[Figure 4: time vs. number of processors (1-11) for the four algorithms (new multi-lock, R&K multi-lock, R&K multi-lock w/ bit-rev, single lock); panels "17 level 100,000 ins" and "17 level 100,000 del".]
Fig. 4. Performance results for (a) 100,000 insertions and (b) 100,000 deletions.

To evaluate the performance of the algorithms under different levels of contention, we varied the number of processes in our experiments. Each process runs on a dedicated processor in a tight loop that repeatedly updates a shared heap. Thus, in our experiments the number of processors corresponds to the level of contention. We believe these results to be comparable to what would be achieved with a much larger number of processes, each of which was doing significant real work between queue operations. In all experiments, processors are equally loaded.

We studied the performance under workloads of insertions only, deletions only, and various mixed insert/delete distributions. We also varied the initial number of full levels in the heap before starting time measurements to identify performance differences with different heap sizes. For the experiments we used workloads of around 100,000 to 200,000 heap operations. Experiments with smaller workloads are too fast to time. Inserted item priorities were chosen from a uniform distribution on the domain of 32-bit integers.

The sources for all the algorithms were carefully hand-optimized. For example, in the multiple-lock algorithms we changed the data layout to reduce the effect of false sharing. This was not applied to the single-lock algorithm as it does not support concurrent access; aligning data to cache lines would only increase the total number of cache misses. We believe we have implemented each algorithm as well as is reasonably possible, resulting in fair comparisons. (The programs are accessible at ftp://ftp.cs.rochester.edu/pub/packages/concurrent-heap.)

Figs. 4(a) and 4(b) show the time taken to perform 100,000 insertions and deletions on a heap with 17 full levels. Fig. 5 shows the time taken to perform 10,000 sets of 10 insertions and 10 deletions on an empty heap. Figs. 6(a) and 6(b) show the time taken to perform 100,000 insert/delete pairs on a 7-level-full heap and a 17-level-full heap.

[Figure 5: time vs. number of processors (1-11) for the four algorithms; panel "empty 10,000 (10 ins 10 del)".]

Fig. 5. Performance results for 10,000 sets of 10 insertions and 10 deletions on an empty heap.

In the case of insertions only (Fig. 4(a)), the single-lock and the new algorithm have better performance because insertions do not have to traverse the whole height of the tree (as they do in Rao and Kumar's algorithm), and most inserted items settle in the two bottom-most levels of the heap. Insert operations for the single-lock algorithm in this case are fast enough that the greater potential for concurrency in the new multi-lock algorithm does not help much.
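The experimental setup, with each process repeatedly updating one shared heap in a tight loop, can be sketched for the single-lock competitor: a sequential heap guarded by one mutex. This toy version (ours; thread counts and workload sizes are illustrative, not the paper's, and Python's heapq stands in for the sequential heap) shows the structure of such a driver:

```python
import heapq
import threading

class SingleLockHeap:
    """Single-lock baseline: a sequential heap behind one mutex."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def insert(self, priority):
        with self._lock:
            heapq.heappush(self._heap, -priority)  # negate: heapq is a min-heap

    def delete(self):
        with self._lock:
            return -heapq.heappop(self._heap) if self._heap else None

def worker(heap, ops):
    # One insert/delete pair per iteration, as in the mixed workloads.
    for p in range(ops):
        heap.insert(p)
        heap.delete()

heap = SingleLockHeap()
threads = [threading.Thread(target=worker, args=(heap, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(heap._heap) == 0  # every insert was matched by a delete
```

Because every operation serializes on the one lock, adding processors to such a loop buys no concurrency; the multi-lock algorithms replace the single mutex with the per-node locks and tags described in Section 2.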

[Figure 6: time vs. number of processors (1-11) for the four algorithms (new multi-lock, R&K multi-lock, R&K multi-lock w/ bit-rev, single lock); panels "7 levels 100,000 (1 ins 1 del)" and "17 levels 100,000 (1 ins 1 del)".]

Fig. 6. Performance results for (a) 100,000 insert/delete pairs on a 7-level-full heap and (b) 100,000 insert/delete pairs on a 17-level-full heap.
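The contention argument behind these results can be made concrete with a small check (ours, not from the paper): a path to the root is obtained by repeated halving, consecutive bit-reversed fringe slots (the order 8, 12, 10, 14, 9, 13, 11, 15 quoted earlier) share only the root, while consecutive left-to-right slots share most of their path:

```python
def path_to_root(i):
    # Nodes visited walking from node i up to the root; since the
    # children of i are 2i and 2i + 1, the parent of i is i // 2.
    path = set()
    while i >= 1:
        path.add(i)
        i //= 2
    return path

# Third-level fringe slots in bit-reversed insertion order (from the
# text) versus plain left-to-right order.
bit_reversed = [8, 12, 10, 14, 9, 13, 11, 15]

# Consecutive bit-reversed insertions overlap only at the root...
for a, b in zip(bit_reversed, bit_reversed[1:]):
    assert path_to_root(a) & path_to_root(b) == {1}

# ...while consecutive left-to-right insertions can contend along most
# of the path: slots 8 and 9 share nodes 4, 2 and 1.
assert path_to_root(8) & path_to_root(9) == {1, 2, 4}
```

This is why scattering consecutive insertions across the fringe reduces lock contention even before the shorter bottom-up traversals are taken into account.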

In the case of deletions only (Fig. 4(b)), the multi-lock algorithms outperform the single-lock algorithm. This is because most deletions have to traverse the whole height of the tree and may not traverse the same path each time. As a result, the concurrency permitted in the multi-lock algorithms is higher and outweighs the overhead of locking, since there is little contention along the paths. Deletions in the new algorithm proceed top-down, similar to deletions in Rao and Kumar's algorithm; therefore the two algorithms display similar performance.

In the case of alternating insertions and deletions on an initially empty heap (Fig. 5), the height of the heap is very small. The single-lock algorithm outperforms the other algorithms because it has low overhead and there is little opportunity for the multi-lock algorithms to exploit concurrency. Comparing the new algorithm with that of Rao and Kumar, we find that the new algorithm yields better performance as it suffers less from contention on the topmost nodes of the heap. Note that after several insert/delete cycles, the items remaining in the heap tend to have low priorities, so new insertions have to traverse most of the path to the root in the new algorithm. This means that the performance advantage of the new algorithm over that of Rao and Kumar in this case is due more to reduced contention for the topmost nodes of the tree (due to opposite directions for insertion and deletion) than to shorter traversals.

In the case of alternating insertions and deletions on a 7-level-full heap (Fig. 6(a)), the heap height remains almost constant. The single-lock algorithm outperforms the others due to its low overhead, but the difference between it and the new algorithm narrows as the level of contention increases, since 7 levels provide the new algorithm with reasonable opportunities for concurrency. Rao and Kumar's algorithm suffers from high contention on the topmost nodes.

In the case of alternating insertions and deletions on a 17-level-full heap (Fig. 6(b)), the larger heap height makes concurrency, rather than locking overhead, the dominant factor in performance. The multi-lock algorithms therefore perform better than the single-lock algorithm. As in the case of the empty and 7-level-full heaps, new insertions tend to have higher priorities than the items already in the heap, and tend to settle near the top of the heap. In spite of this, the new algorithm outperforms that of Rao and Kumar because of reduced contention on the topmost nodes.

4. Conclusions

We have presented a new algorithm that uses multiple mutual exclusion locks to allow consistent concurrent access to array-based priority queue heaps. The new algorithm avoids deadlock among concurrent accesses without forcing insertions to proceed top-down [7], or introducing a work queue and extra processes [1]. Bottom-up insertions reduce contention for the topmost nodes of the heap, and avoid the need for a full-height traversal in many cases. The new algorithm also uses bit-reversal to increase concurrency among consecutive insertions, allowing them to follow mostly-disjoint paths.

Empirical results, comparing the new algorithm, the single-lock algorithm, and Rao and Kumar's top-down insertion algorithm [7] on an SGI Challenge, show that the new algorithm provides reasonable performance on small heaps, and significantly superior performance on large heaps under high levels of contention.

References

[1] J. Biswas and J.C. Browne, Simultaneous update of priority structures, in: Proc. 1987 Internat. Conf. on Parallel Processing, St. Charles, IL (1987) 124-131.
[2] T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to Algorithms (MIT Press, Cambridge, MA, 1990).
[3] D.W. Jones, Concurrent operations on priority queues, Comm. ACM 32 (1) (1989) 132-137.
[4] J.M. Mellor-Crummey and M.L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Trans. Comput. Systems 9 (1) (1991) 21-65.
[5] J. Mohan, Experience with two parallel programs solving the travelling salesman problem, in: Proc. 1983 Internat. Conf. on Parallel Processing (1983) 191-193.
[6] M.J. Quinn and N. Deo, Parallel graph algorithms, ACM Comput. Surveys 16 (3) (1984) 319-348.
[7] V.N. Rao and V. Kumar, Concurrent access of priority queues, IEEE Trans. Comput. 37 (12) (1988) 1657-1665.

Acknowledgments

We thank Greg Andrews and the anonymous referees for their useful comments that improved both the quality and conciseness of this paper. This work was supported in part by NSF grants nos. CDA-8822724 and CCR-9319445, and by ONR research grant no. N00014-92-J-1801 (in conjunction with the DARPA Research in Information Science and Technology - High Performance Computing, Software Science and Technology program, ARPA Order no. 8930).