Theoretical Computer Science 164 (1996) 1-12

Heaps with bits

Svante Carlsson a, Jingsen Chen a,*, Christer Mattsson b

a Department of Computer Science, Luleå University, S-971 87 Luleå, Sweden
b Quality Laboratories AB, IDEON Research Park, S-223 70 Lund, Sweden

Received March 1995; revised July 1995
Communicated by M. Nivat

Abstract

In this paper, we show how to improve the complexity of heap operations and heapsort using extra bits. We first study the parallel complexity of implementing the priority queue operations on a heap. The trade-off between the number of extra bits used, the number of processors available, and the parallel time complexity is derived. While inserting a new element into a heap in parallel can be done as fast as parallel searching in a sorted list, we show how to delete the smallest element from a heap in constant time with a sublinear number of processors, and in sublogarithmic time with a sublogarithmic number of processors. The models of parallel computation used are the CREW PRAM and the CRCW PRAM. Our results improve those of previously known algorithms. Moreover, we study a variant, the fine-heap, of the traditional heap structure. A fast algorithm for constructing this new data structure is designed, using an interesting technique that is also used to develop an improved heapsort algorithm. Our variation of heapsort is faster than Wegener's heapsort and requires less extra space.

1. Introduction

One of the fundamental data types in Computer Science is the priority queue. It has been useful in many applications [11]. A priority queue is a set of elements on which two basic operations are defined: inserting a new element into the set and deleting the minimum element from the set. Several data structures have been proposed for implementing priority queues. Probably the most elegant one is the heap [21]. A (min-)heap is a binary tree with the heap property: (i) it has the heap shape; i.e., all leaves lie on at most two adjacent levels, all leaves on the last level occupy the leftmost positions, and all other levels are complete; (ii) it is min-ordered: the key value associated with each node is not smaller than that of its parent. The minimum element is then at the root, which is at the first level. We refer to the number of elements in a heap as its size. A max-heap is defined similarly.

* Corresponding author. E-mail: svante,[email protected].


The problems of heap construction and heap operations have received considerable attention in the literature [2, 5, 6, 8, 9, 11, 13]. In the parallel models of computation, optimal heap construction algorithms have also been developed [4, 15]. However, parallel heap operations have not been so deeply studied. Recently, Pinotti and Pucci [14] presented an O(log log n)-time ¹ parallel algorithm for deleting the smallest element from a heap of size n using n/log n EREW-PRAM processors; and Zhang and Korf [22] reduced the number of processors used for the deletion to (n/log n)^(1−1/k) for some constant k > 1.

¹ All logarithms in this paper are to base 2.

2. Parallel heap operations

Notice that a heap on n elements can be stored level by level from left to right in an array H with the property that the element at position i has its parent at position ⌊i/2⌋ and its children at positions 2i and 2i + 1. Thus, the addresses of all the nodes on a path from the root to some leaf of a heap can easily be computed by shift operations. The level, level(H[i]), of an element H[i] in the heap H is defined as ⌊log i⌋ + 1. For the insertion of a new element x into a heap H of size n, an optimal sequential algorithm of Θ(log log n) comparisons works as follows: first, x is placed at the first available position H[n + 1]; then the min-ordering is restored on the path from H[1] down to H[n + 1]. This is equivalent to the problem of searching for x in the path from H[1] to H[⌊(n + 1)/2⌋] (which forms a sorted list). For the complexity of parallel searching, see [12, 18]. Therefore, the following observation is immediate.

Observation 2.1. The parallel complexity of the insert operation in a heap of size n is the same as that of searching in a sorted list of length ⌈log(n + 1)⌉ on the same model of parallel computation.
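For concreteness, the following sequential Python sketch (ours; the function name and the 1-indexed layout are illustrative assumptions) carries out the Θ(log log n) insertion by binary search over the ancestors of the new position:

    # Sketch of heap insertion with O(log log n) key comparisons: the
    # ancestors of the new position form a sorted list, so x's place can
    # be found by binary search. H[0] is unused; H[1..n] is the min-heap.
    def heap_insert(H, x):
        H.append(x)
        n = len(H) - 1
        path = []                      # ancestors of position n, root first
        i = n // 2
        while i >= 1:
            path.append(i)
            i //= 2
        path.reverse()
        lo, hi = 0, len(path)          # binary search: ~log log n comparisons
        while lo < hi:
            mid = (lo + hi) // 2
            if H[path[mid]] > x:
                hi = mid
            else:
                lo = mid + 1
        pos = n                        # shift larger ancestors one step down
        for j in range(len(path) - 1, lo - 1, -1):
            H[pos] = H[path[j]]
            pos = path[j]
        H[pos] = x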

The delete operation in a heap H[1..n] consists of first removing the smallest element from the heap, replacing it with H[n], and then restoring the min-ordering property. The sequential deletion can be done optimally in logarithmic time. However, it appears that the delete operation is inherently sequential, since the operation involves the search for H[n] in some path from either H[2] or H[3] down to the leaf level (called the path of minimum children). Hence, the deletion may not admit an efficient parallel solution. Observing that the searching path for H[n] is not known beforehand, we have

Observation 2.2. In a heap, the parallel delete operation is at least as hard as the parallel insert operation.

In the rest of this section, nevertheless, we will try to parallelize the delete operation. More precisely, we shall demonstrate that it is possible to perform the deletion in constant time with a sublinear number of processors, and in sublogarithmic time using a sublogarithmic number of processors. Assume without loss of generality that the size of the heap is of the form 2^h − 1 for some integer h > 0. Otherwise, one can first perform the parallel deletion on the first ⌊log n⌋ levels of the heap and then in O(1) steps find out which element at the leaf level of the heap is the last node of the path of minimum children. We first show that the root deletion in a heap of size n can be solved in constant time with O(n log n) processors on the CRCW PRAM model.

Algorithm 2.3. Suppose the number of processors available is p = ((n + 1)/2) · (⌊log n⌋ − 1).

1. Associate with the heap H an array of n bits, denoted by B[1..n]. Initially, let B[i] = 0 for i = 1, 2, ..., n;
2. For each i, 1

The correctness of Algorithm 2.3 follows immediately from the fact that our parallel deletion is obtained by implementing the sequential deletion algorithm on the PRAM model. Moreover, the contents of B can be updated easily and fast. Clearly, only Step 4 of Algorithm 2.3 requires O(n log n) processors. After Step 4 is completed, at most two leaf elements have their associated bits set to 1, and these two leaves are brothers. Now, by using O(log n) processors:

• The path of minimum children can be computed in O(1) time (i.e., Step 5);

• The insertion of H[n] on the path of minimum children takes O(1) time (i.e., Step 6), employing a constant-time parallel searching algorithm [1].

Since Steps 1-4 of Algorithm 2.3 can also be done in constant time, we have

Lemma 2.4. There is a CRCW-PRAM algorithm for deleting the smallest element in a heap of size n, running in O(1) time with Θ(n log n) processors and Θ(n) extra bits.
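The bit-array idea can be simulated sequentially. The sketch below is our reading of Algorithm 2.3, not the authors' exact pseudocode: each loop iteration stands for O(1) work by one processor, the writes of 0 stand for the concurrent writes of Step 4, and, unlike Step 4 (which leaves at most two sibling leaves marked), this simplified marking rule marks exactly the leaf that ends the path of minimum children.

    # Sequential simulation of the bit-marking idea (our interpretation).
    # H[1..n] is a min-heap with n = 2**h - 1; H[0] is unused.
    def path_of_minimum_children(H):
        n = len(H) - 1
        B = [0] * (n + 1)                    # one extra bit per node
        smaller = [0] * (n + 1)
        # One processor per internal node: record its smaller child.
        for i in range(1, (n + 1) // 2):
            smaller[i] = 2 * i if H[2 * i] <= H[2 * i + 1] else 2 * i + 1
        # One processor per (leaf, ancestor) pair: a leaf stays marked only
        # if every node on its root-to-leaf path is its parent's smaller
        # child; with n log n processors this takes O(1) parallel time.
        for leaf in range((n + 1) // 2, n + 1):
            B[leaf] = 1
            j = leaf
            while j > 1:
                if smaller[j // 2] != j:
                    B[leaf] = 0              # concurrent write of 0
                j //= 2
        # The unique marked leaf determines the path of minimum children.
        leaf = next(j for j in range((n + 1) // 2, n + 1) if B[j])
        path = []
        while leaf >= 1:
            path.append(leaf)
            leaf //= 2
        return list(reversed(path))          # from the root down to a leaf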

Remark that Algorithm 2.3 only demands the ability of multiple write for its fourth step, so it can be modified to run on the CREW-PRAM model. Assume that O(n log n) extra bits are available and that a processor can write a digit into the bit of a word. Now, we associate with each leaf node H[j] ((n + 1)/2 ≤ j ≤ n) a word of ⌊log n⌋ bits, one bit for each node on the path from the root to that leaf; the processor assigned to such a node writes a 1 into the corresponding bit if that node is the smaller child of its parent.

After that, the path of minimum children can easily be computed in constant time. We thus have

Lemma 2.5. There is a CREW-PRAM algorithm for deleting the smallest element in a heap of size n, running in O(1) time with O(n log n) processors and O(n log n) extra bits.

Notice that the sequential complexity of the deletion is Θ(log n). Hence, the above schemes fail to achieve the optimal speedup. However, the number of processors required can be reduced significantly. First, some definitions are needed. We denote by ‖S‖ the cardinality of a set S of elements. An integer m > 0 is called a perfect number if there exists some integer k > 0 such that m = 2^k − 1. For any integer m > 0, we define the left match m′ of m as the largest perfect number such that m′ ≤ m. Notice that if m itself is a perfect number, then m′ = m.
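In code, the left match is simply the largest number of the form 2^k − 1 not exceeding m; a small helper (ours, for later reference):

    # Left match: largest "perfect number" 2**k - 1 with 2**k - 1 <= m.
    def left_match(m):
        assert m > 0
        k = (m + 1).bit_length() - 1     # 2**k <= m + 1 < 2**(k+1)
        return (1 << k) - 1

    # e.g. left_match(7) == 7, left_match(12) == 7, left_match(15) == 15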

Algorithm 2.6. Suppose that the number of processors available is p = n and that H is a heap of size n. Let k be the left match of ⌊n/log n⌋.
1. Perform a parallel deletion on the first log k levels of H (with H[n] being the element to be inserted), called the heap H_T, using p processors. Call the heap H_T after the deletion H′_T;
2. Compute the path of minimum children in the heap H′_T and denote the last element on this path by z. Let z′ be the smaller child of z in the heap H;
3. Run recursively a parallel deletion algorithm on the subheap, H_D, rooted at z′ in the heap H, using p processors.

The algorithm above deletes the smallest element in a heap correctly, in a way similar to the sequential deletion algorithm. Moreover, the running time of Algorithm 2.6 is O(1). Notice first that

    ‖H_T‖ · log ‖H_T‖ ≤ (n/log n) · log n = n = p.

Therefore, Steps 1 and 2 of the algorithm run in constant time by Lemma 2.4. Notice next that the deletion on H_D can be done in O(1) time since ‖H_D‖ = O(log n). (Remark that if Steps 1 and 2 of Algorithm 2.6 are performed according to Lemma 2.5, the number of extra bits needed is then O(n).) Therefore,

Lemma 2.7. There is a parallel algorithm for deleting the smallest element in a heap of size n, running in O(1) time with O(n) CRCW- (or CREW-) PRAM processors and O(n/log n) (or O(n) for the CREW-PRAM model) extra bits.
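The sequential deletion that Algorithm 2.6 chops into constant-time rounds looks as follows (a Python rendering of the standard root deletion; names are ours). The while-loop walks the path of minimum children; the parallel algorithm covers this path in chunks of about log k levels, each chunk in O(1) time by the technique of Algorithm 2.3:

    # Root deletion in a min-heap H[1..n] (H[0] unused); returns the minimum.
    def delete_min(H):
        minimum = H[1]
        x = H.pop()                # former H[n], to be reinserted on the path
        n = len(H) - 1
        i = 1
        while 2 * i <= n:          # walk the path of minimum children
            c = 2 * i
            if c + 1 <= n and H[c + 1] < H[c]:
                c += 1
            if H[c] >= x:          # x settles at position i
                break
            H[i] = H[c]
            i = c
        if n >= 1:
            H[i] = x
        return minimum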

To further reduce the number of processors consumed, we shall employ p = ⌈n^ε⌉ processors, where 0 < ε < 1 is a constant, and let k be the left match of ⌈n^ε⌉. By executing the three steps of Algorithm 2.6 with the new values of p and k, Steps 1 and 2 are completed in O(1) time as well. For Step 3, notice that the algorithm will now be repeated O(1/ε) times until the bottom of the heap H is reached. Hence,

Theorem 2.8. There is a parallel algorithm for deleting the smallest element in a heap of size n, running in O(1/ε) time with n^ε processors and O(n^ε/ε log n) extra bits, for any constant 0 < ε < 1, on the CRCW-PRAM model.

With this approach, we can design a parallel deletion algorithm with an even better time-processor product. In fact, the problem of deleting the smallest element in a heap can be solved in sublogarithmic time by using a sublogarithmic number of processors. Precisely, if we employ ⌊log n/log log n⌋ processors and let k (in Algorithm 2.6) be the left match of ⌊log n/log log n⌋, then Algorithm 2.6 runs in Θ(log n/log log n) steps. That is,

Theorem 2.9. There is a CRCW-PRAM algorithm for deleting the smallest element in a heap of size n that runs in time O(log n/log log n) on log n/log log n processors and with log n/log log n extra bits.

Similarly, we can implement the algorithms developed above on the CREW-PRAM model and achieve the same time-processor products. Namely, we have

Theorem 2.10. There is a CREW-PRAM algorithm for deleting the smallest element in a heap of size n that runs in time either

• O(1/ε) with n^ε processors and n^ε extra bits, for any constant 0 < ε < 1; or

• O(log n/log log n) on log n/log log n processors using Θ(log n) extra bits.

Remark that finding the path of minimum children (which may be implicit in some heap-deletion algorithms) seems to be almost sequential in nature. More precisely, the computation of the minimum-children path needs to process the heap level by level, and the result of one operation influences later operations. Hence, the delete operation does not lead to a good parallel implementation.

3. Fine-heap with application

In this section, we shall introduce a new variant of the conventional heap that allows quick access to the path of minimum children and admits a modified heapsort, improving both the time for sorting and the space consumption over the traditional heapsort algorithm and its variants. We first investigate the construction problem for this new structure. Then we show how to implement it so that the desired sorting complexity is achieved.

Fig. 1. A fine-heap on 7 elements: the Hasse diagram and the picture of the structure.

3.1. Fine-heap and its construction

A fine-heap is a heap with an additional ordering relation defined on siblings. See Fig. 1 for an example of a fine-heap. It costs no comparisons to find the path of minimum children, starting at any node in the structure down to the leaf level in a fine-heap. Such an efficient treatment of the path finding is accomplished by using ⌊n/2⌋ bits of extra space. The fine-heap has also been introduced implicitly in [10, 13, 19]. The sequential construction complexity of fine-heaps can be estimated using an information-theoretical approach.
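As a sketch (ours) of why the path of minimum children is free: store, for each internal node i, one bit saying which of its children is smaller; following the path then needs only bit inspections and no key comparisons.

    # H[1..n] is a min-heap; bit[i] = 1 iff the right child of internal
    # node i is smaller than its left child (the floor(n/2) extra bits).
    def min_children_path_by_bits(H, bit):
        n = len(H) - 1
        path, i = [], 1
        while 2 * i <= n:
            path.append(i)
            i = 2 * i + bit[i] if 2 * i + 1 <= n else 2 * i
        path.append(i)
        return path                # root down to a leaf; 0 key comparisons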

Theorem 3.1. The average (and thus also the worst-case) number of comparisons necessary to build a fine-heap on n elements is at least 1.864436...n (ignoring lower-order terms).

Proof. Let P_n be a fine-heap on n elements and P′_{n−1} the ordered structure obtained from P_n by deleting the root of P_n. Denote by ℓ(P_n) and ℓ(P′_{n−1}) the number of permutations of the input elements consistent with P_n and P′_{n−1}, respectively. Clearly, ℓ(P_n) = ℓ(P′_{n−1}). In establishing a lower limit on the construction complexity for fine-heaps, let us check the case when n = 2^h − 1 for h > 0. Notice that a fine-heap on n elements can be viewed as the first two smallest elements connecting to the structure P′_{(n−3)/2} and a fine-heap P_{(n−1)/2}. Hence,

    ℓ(P_n) = 1 · 1 · C(n−2, (n−1)/2) · ℓ(P′_{(n−3)/2}) · ℓ(P_{(n−1)/2})
           = C(n−2, (n−1)/2) · ℓ(P_{(n−1)/2}) · ℓ(P_{(n−1)/2})
           = (1/2) · C(n−1, (n−1)/2) · (ℓ(P_{(n−1)/2}))²
           = (n! / (2n · ((n−1)/2)! · ((n−1)/2)!)) · (ℓ(P_{(n−1)/2}))².

That is,

    n!/ℓ(P_n) = 2n · (((n−1)/2)! / ℓ(P_{(n−1)/2}))².

By the information-theoretic lower bound, we know that the minimum number of comparisons, on the average, needed to build a fine-heap P_n on n elements is at least

    log(n!/ℓ(P_n)) = log(2n) + 2 · log(((n−1)/2)!/ℓ(P_{(n−1)/2})) = Σ_{i=2}^{h} 2^{h−i} · log(2 · (2^i − 1)),

which gives 1.864436...n − o(n). □
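The constant can be checked numerically; under our reading of the bound above, the per-element cost is Σ_{i≥2} 2^{−i} log(2(2^i − 1)):

    from math import log2

    # Numerical check of the constant in Theorem 3.1 (our reconstruction):
    # sum over subheap heights i >= 2 of log(2 * (2**i - 1)) / 2**i.
    c = sum(log2(2 * (2**i - 1)) / 2**i for i in range(2, 200))
    print(c)                       # approx. 1.86443..., matching Theorem 3.1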

A natural way to build fine-heaps is to construct the structure in a bottom-up fashion (namely, by recursively merging the small parts of the structure), similar to Floyd's heap construction algorithm and its variants [2, 8]. When the algorithm is about to merge two full fine-heaps of height h − 1 with one singleton element (the cost is denoted by M(h)), the information about the path of minimum children can be deduced from the ordering relations in the structure at no extra cost. For inserting the singleton element into the path and re-establishing the ordering of the structure, h + 2 comparisons are sufficient to complete the tasks in the worst case. Thus, M(i) = i + 2 comparisons. Therefore, the worst-case number, F(h), of comparisons for constructing a fine-heap on n = 2^{h+1} − 1 elements is at most

    F(h) ≤ 2^h · Σ_{i=1}^{h} M(i)/2^i = 2^h · Σ_{i=1}^{h} (i + 2)/2^i = 2n − h − 2.   (1)

Hence, constructing a fine-heap of arbitrary size n will cost at most 2n + O(log² n) comparisons (similar to that for the traditional heap [9]).

Lemma 3.2. A fine-heap on n elements can be constructed in 2n + O(log² n) comparisons in the worst case.
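A quick numerical check (ours) that the closed form in Eq. (1) matches the level-by-level sum of merge costs:

    # Eq. (1) sanity check: 2**(h-i) merges of cost M(i) = i + 2 at height i
    # must total 2n - h - 2 for a fine-heap on n = 2**(h+1) - 1 elements.
    for h in range(1, 12):
        n = 2 ** (h + 1) - 1
        total = sum(2 ** (h - i) * (i + 2) for i in range(1, h + 1))
        assert total == 2 * n - h - 2, (h, total)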

The complexity of the preceding algorithm exceeds the lower bound (Theorem 3.1) by a constant factor. In order to decrease this factor, we shall design an efficient method for constructing small fine-heaps and use them as basic building blocks for fine-heaps of arbitrarily large size. Before proceeding with a presentation of our algorithm, we shall demonstrate that it is possible to construct our building blocks faster than the preceding algorithm does. Observe first that constructing a fine-heap on 7 elements costs 10 comparisons according to Eq. (1), which is only one comparison more than the information-theoretic lower bound. However, by carefully examining the symmetric property of the structure, we can actually create three fine-heaps each of size 7 in only 28 comparisons. The idea behind our fast way of building smaller fine-heaps is gained from the study of the mass production of partial orders. A fine-heap on 7 elements {20, 26, 35, 32, 53, 46, 50} and its Hasse diagram are shown in Fig. 1.

Lemma 3.3. Three fine-heaps each of size 7 can be constructed in at most 28 comparisons in the worst case.

Fig. 2.

Proof. We briefly describe the construction method with the help of Fig. 2. The algorithm is carried out in four steps.
1. Construct four binomial trees each of size 4. (Cost: 12 comparisons.)
2. Compare element a with b, and element c with d. (Cost: 2 comparisons.) Now the algorithm generates at least the structures P_1, P_2, and P_3 (see Fig. 2) plus five singleton elements.
3. Build two 7-element fine-heaps starting from P_1 and P_2 plus two singleton elements for each of the structures. The cost of this step is 10 comparisons, since each singleton element can be inserted into the structure in 2 comparisons (which is carried out by performing a binary search) and 1 + 1 more comparisons are needed to achieve the ordering of the fine-heaps.
4. Transform P_3 plus one singleton element into a 7-element fine-heap. To accomplish this step, first a comparison between x and y is done. Then the singleton element is inserted into the structure (with 2 comparisons) and the ordering property of the fine-heap can then be created with one additional comparison. (Cost: 4 comparisons.)
The correctness of the algorithm follows directly from its description, and the overall cost is 12 + 2 + 10 + 4 = 28 comparisons. □

The complexity of the above algorithm is almost tight, since the information-theoretic lower bound for constructing three 7-element fine-heaps is equal to ⌈9 + 3 × log 63⌉ = 27 comparisons. With a similar technique, two fine-heaps of size 7 can be built in at most 19 comparisons, which is also close to the information-theoretic lower bound of ⌈6 + 2 × log 63⌉ = 18 comparisons. Although it is not clear how to save more comparisons by producing more fine-heaps each of size 7 simultaneously, our strategy for building 7-element fine-heaps can be used to construct fine-heaps of arbitrary sizes in fewer than 2n comparisons. In fact, to construct a fine-heap on n = 2^{h+1} − 1 elements, we can apply Eq. (1) and Lemma 3.3, which yields

    F(h) ≤ 2^{h−2} · (28/3) + 2^h · Σ_{i=3}^{h} (i + 2)/2^i = (23/12)(n + 1) − h − 4.

Therefore,

Theorem 3.4. A fine-heap on n elements can be constructed in at most (23/12)n + O(log² n) comparisons in the worst case.

This is only 2.8 percent off from the information-theoretic lower bound. The above modified algorithm leads to a slightly better worst-case upper bound on the sorting complexity, as will be shown in the next subsection.

The insert and delete operations on a fine-heap can be performed in a way similar to that for the traditional heap. The parallel complexity of fine-heap operations can easily be deduced from the corresponding time bounds for heaps. Moreover, when deleting the smallest element from a fine-heap, one does not need to compute the path of minimum children. Hence,

Observation 3.5. On the same model of parallel computation,
• a fine-heap can be built in parallel as fast as the parallel heap construction using the same number of processors;
• the parallel complexity of the insert and delete operations in a fine-heap of size n is the same as that of searching in a sorted list of length ⌈log(n + 1)⌉.

3.2. Sorting complexity of fine-heaps

Wegener [19] presented a variant of Floyd's heapsort algorithm, which sorts n elements in at most n log n + 1.1n comparisons in the worst case, using n extra bits and O(n log n) bit comparisons. Moreover, it only takes n log n + n comparisons when n = 2^h − 1. Wegener's heapsort algorithm can be viewed as follows:
• Create a fine-heap on n elements in 2n comparisons in the worst case, using n extra bits.
• Remove the smallest element from the fine-heap, and repeat this step until there is no element left in the fine-heap.
While a fine-heap of size n can be built in (23/12)n comparisons by Theorem 3.4 (saving (1/12)n comparisons compared to Wegener's heapsort algorithm), we can also save the amount of extra space consumed, using one of the following methods:
1. Implement a fine-heap of size n as a heap with each internal node having an extra bit. This extra bit is used for indicating the smaller child of the node.
2. Implement a fine-heap of size n as a heap, and associate a word of ⌊log n⌋ − ⌊log i⌋ bits with each internal node H[i]. This word will keep the address of the last node on the path of minimum children in the subheap rooted at H[i].
The first implementation needs ⌊n/2⌋ extra bits. However, during the repeated root removals, O(n log n) bit comparisons are required in order to follow the path of minimum children. On the other hand, the second implementation takes

    Σ_{i=1}^{(n−1)/2} (⌊log n⌋ − ⌊log i⌋) = n − log(n + 1)   (for n = 2^h − 1)

extra bits. With this implementation, no address computation is needed during the sorting phase of the heapsort algorithm. Therefore,

Theorem 3.6. With our implementations of a fine-heap, Wegener's heapsort algorithm sorts n elements in at most n log n + 1.00274n (and n log n + 0.91667n for n = 2^h − 1) comparisons in the worst case, using either
• n/2 extra bits and O(n log n) 2-bit variable comparisons; or
• n extra bits and no bit comparisons.
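The first implementation can be sketched as follows (our Python rendering; the construction phase below is deliberately naive, and only the sorting phase reflects the comparison counts discussed above). Following the smaller-child bits costs bit inspections only; the key comparisons per deletion are the binary search on the path plus one comparison per refreshed bit.

    # Sketch of implementation 1: a heap plus one smaller-child bit per
    # internal node. build_fine_heap is naive (Floyd's sift-downs, then
    # bits by direct comparison); Section 3.1 uses fewer comparisons.
    def build_fine_heap(a):
        H = [None] + list(a)
        n = len(a)
        for i in range(n // 2, 0, -1):          # standard sift-down
            j, x = i, H[i]
            while 2 * j <= n:
                c = 2 * j
                if c + 1 <= n and H[c + 1] < H[c]:
                    c += 1
                if H[c] >= x:
                    break
                H[j] = H[c]
                j = c
            H[j] = x
        bit = [0] * (n // 2 + 1)
        for i in range(1, n // 2 + 1):
            if 2 * i + 1 <= n and H[2 * i + 1] < H[2 * i]:
                bit[i] = 1
        return H, bit

    def fine_heapsort(a):
        H, bit = build_fine_heap(a)
        n, out = len(a), []
        for _ in range(len(a)):
            out.append(H[1])                    # remove the root
            x = H[n]
            n -= 1
            path, i = [1], 1                    # follow the bits: no key
            while 2 * i <= n:                   # comparisons on this walk
                i = 2 * i + (bit[i] if 2 * i + 1 <= n else 0)
                path.append(i)
            lo, hi = 1, len(path)               # binary search for x's spot
            while lo < hi:
                mid = (lo + hi) // 2
                if H[path[mid]] >= x:
                    hi = mid
                else:
                    lo = mid + 1
            for j in range(1, lo):              # shift the path up one level
                H[path[j - 1]] = H[path[j]]
            H[path[lo - 1]] = x
            for j in range(lo - 1):             # refresh bits along the path
                p = path[j]
                bit[p] = 1 if 2 * p + 1 <= n and H[2 * p + 1] < H[2 * p] else 0
        return out

    # fine_heapsort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]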

Recently, Dutton [7] presented an interesting algorithm, called weak-heapsort, that makes n log n + 0.086n comparisons in the worst case. Weak-heapsort uses n additional bits and requires special instructions (boolean functions). The weak-heap introduced in [7] is neither a heap nor a balanced tree, while the fine-heap is a heap. Unlike for the weak-heap, the sequential and parallel implementations of the priority queue operations on the fine-heap can easily be deduced from those for the traditional heap.

4. Conclusions

The goals of this paper are twofold: we provide parallel solutions to the problems of inserting an element into a heap and of deleting the smallest element from the heap, and we introduce a new heap-like data structure that can be used for developing fast sorting algorithms in the fashion of heapsort. The complexities of parallel heap operations and heapsort are improved using extra bits. Interestingly, it is the technique of building many isomorphic copies of fine-heaps simultaneously that leads to a better understanding of their construction complexities. Such a technique needs to be investigated further.

Acknowledgements

The authors would like to thank the referee for helpful comments.

References

[1] S.G. Akl, The Design and Analysis of Parallel Algorithms (Prentice-Hall, Englewood Cliffs, NJ, 1989).
[2] S. Carlsson, Average-case results on heapsort, BIT 27 (1987) 2-17.
[3] S. Carlsson, A variant of heapsort with almost optimal number of comparisons, Inform. Process. Lett. 24 (1987) 247-250.
[4] S. Carlsson and J. Chen, Parallel constructions of heaps and min-max heaps, Parallel Process. Lett. 2 (1992) 311-320.
[5] E.-E. Doberkat, Inserting a new element into a heap, BIT 21 (1981) 255-269.
[6] E.-E. Doberkat, Deleting the root of a heap, Acta Inform. 17 (1982) 245-265.
[7] R.D. Dutton, Weak-heap sort, BIT 33 (1993) 372-381.
[8] R.W. Floyd, Algorithm 245 - Treesort 3, Comm. ACM 7 (1964) 701.
[9] G.H. Gonnet and J.I. Munro, Heaps on heaps, SIAM J. Comput. 15 (1986) 964-971.
[10] S. Haldar, Heapsort with n log(n + 1) + n - 2 log(n + 1) - 2 key comparisons using ⌊n/2⌋ additional bits, Tech. Report RUU-CS-93-14, Department of Computer Science, Utrecht University, Utrecht, The Netherlands, 1993.
[11] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching (Addison-Wesley, Reading, MA, 1973).
[12] C.P. Kruskal, Searching, merging, and sorting in parallel computation, IEEE Trans. Comput. C-32 (1983) 942-946.
[13] C.J.H. McDiarmid and B.A. Reed, Building heaps fast, J. Algorithms 10 (1989) 352-365.

[14] M.C. Pinotti and G. Pucci, Parallel algorithms for priority queue operations, in: Proc. 3rd Scandinavian Workshop on Algorithm Theory, Lecture Notes in Computer Science, Vol. 621 (Springer, Berlin, 1992) 130-139.
[15] N.S.V. Rao and W. Zhang, Building heaps in parallel, Inform. Process. Lett. 37 (1991) 355-358.
[16] R. Schaffer and R. Sedgewick, The analysis of heapsort, J. Algorithms 15 (1993) 76-100.
[17] A. Schönhage, M. Paterson and N. Pippenger, Finding the median, J. Comput. System Sci. 13 (1976) 184-199.
[18] M. Snir, On parallel searching, SIAM J. Comput. 15 (1985) 688-708.
[19] I. Wegener, The worst case complexity of McDiarmid and Reed's variant of BOTTOM-UP HEAPSORT is less than n log n + 1.1n, Inform. and Comput. 97 (1992) 86-96.
[20] I. Wegener, BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small), Theoret. Comput. Sci. 118 (1993) 81-98.
[21] J.W.J. Williams, Algorithm 232: Heapsort, Comm. ACM 7 (1964) 347-348.
[22] W. Zhang and R.E. Korf, Parallel heap operations on an EREW PRAM, J. Parallel Distrib. Comput. 20 (1994) 248-255.