
Journal of Computer Science 4 (11): 897-902, 2008
ISSN 1549-3636
© 2008 Science Publications

A Heapify Based Parallel Sorting Algorithm

1Mwaffaq A. Abu Al hija, 2Arwa Zabian, 3Sami Qawasmeh and 4Omer H. Abu Al haija
1Department of Computer Science, Hail University, Hail, Saudi Arabia
2Department of Computer Science, Jadara University, Irbid, Jordan
3Department of Computer Science, Irbid National University, Irbid, Jordan
4Department of Computer Science, Irbid, Jordan

Abstract: Quick sort is a sorting algorithm whose worst-case running time is O(n^2) on an input array of n numbers. It is the best practical choice for sorting because it has the advantage of sorting in place. Problem statement: The behavior of Quick sort is complex; we propose an in-place, 2^m-thread parallel sorting algorithm that keeps the advantage of sorting in place and has better running time than the classical sequential Quick sort. Approach: The algorithm consists of several stages. In the first stage it splits the input data into two partitions; in the following stages it applies the same partitioning to the partitions produced by the prior stage, until 2^m partitions are reached, equal to the number of available processors; finally, it uses heap sort to sort the respectively ordered, but not yet internally sorted, partitions in parallel. Results: The results showed that the speed of the algorithm is about double the speed of classical Quick sort for a large input size. The number of comparisons needed was reduced significantly. Conclusion: In this study we proposed a sorting algorithm that uses fewer comparisons than the original Quick sort and in turn requires less running time to sort the same input data.

Key words: Parallel sorting, heap sort, Quick sort, in-place sorting

INTRODUCTION

Sorting is one of the most well studied problems in computer science because it is fundamental in many applications. Quick sort[1] is typically the fastest sorting algorithm, basically due to better cache behavior among other factors. However, the worst-case running time for Quick sort is O(n^2) for an input size n, which is unacceptable for large data sets. In this study, we propose PSA (Partition and Swap Algorithm), which reduces the time needed for sorting with respect to Quick sort in the worst case. A comparison between Quick sort and the PSA algorithm in terms of running time is performed; the simulation results show that PSA is faster than Quick sort in the worst case for sorting the same input data size. Our algorithm is based on partitioning the input data into different partitions and then using the Heapify procedure in parallel to sort each partition.

The heap is a very fundamental data structure used in many application programs. A heap is a data structure that stores an array in a binary tree maintaining two properties: the first is that the value stored in every node is smaller than or equal to the values of its children; the second is that the binary tree must be complete. This implies that a complete binary tree with height h has between 2^h and 2^(h+1)-1 nodes, so the height of a heap of n nodes is O(log2 n). The basic operations on the heap are Build-Heap and Heapify; the Heapify operation rearranges the heap to restore its heap property.

MATERIALS AND METHODS

Partition and Swap Algorithm (PSA): Given an array of size n, PSA divides the array into four sub-arrays in two steps, to facilitate the sorting and to reduce the number of comparisons needed. The algorithm works in three phases: the first partition phase, the second partition phase and finally the Heapify phase.

In the first partition phase the algorithm divides the array A[n] into two sub-arrays A1, A2 based on the middle value midval, in a manner that all the elements of A1 (denoted left) are smaller than midval and all the elements of A2 (denoted right) are greater than midval, if the data must be ordered in an ascending manner. (If the data must be ordered in a descending manner, all the elements of A1 must be greater than midval and all the elements of A2 must be smaller.) In the second partition phase, the partitioning of the first partition phase is repeated for each of A1 and A2, where A1 is divided into two subarrays

Corresponding Author: Mwaffaq A. Abu Al hija, Department of Computer Science, Hail University, Hail, Saudi Arabia

A1l, A1r and A2 is divided into two sub-arrays A2l, A2r; then the elements of these four sub-arrays are ordered as in the previous phase. In the third phase, the Heapify phase, the resultant subarrays A1l, A1r, A2l, A2r are organized into sorted subarrays using the Heapify procedure.

First partition phase:

• Given an array A of size n, find the middle location of the n elements; the pivot location can be found as ⌊n/2⌋ (step 1 in the flowchart)
• For each part (left and right of the pivot) find the max and min values (step 2 in the flowchart)
• Find the mid value midval, calculated as: midval = ⌊(Lmax + Lmin + Rmax + Rmin)/4⌋
• For each element in the two subarrays A1 (left) and A2 (right) do the following comparisons (if the array is to be ordered in an ascending manner), where i, j are two pointers and A[0], A[n] denote the first and last elements of the array

Swap procedure:

for i = 0 to n/2, j = n to n/2 do:
    if A[i] (= Lmax) > midval and A[j] (= Rmin) < Lmax:
        Swap(Rmin, Lmax) if i ≠ j
        i = i + 1, j = j - 1
        return if i = j
    else:
        i = i + 1
        return if i = j

The procedure stops when i = j, which means the two pointers have met and all the elements on the left and on the right have been visited and compared. This condition is important to ensure the correctness of the algorithm and that all the elements have been compared. In the case of n being an odd number, we take ⌊n/2⌋ to find the mid location (pivot location), in which case the right part will have more elements than the left and the pointer i must continue to visit the elements until it encounters j.

Fig. 1: The flow chart of the PSA algorithm

Fig. 2: PSA algorithm/first partition phase

Fig. 3: PSA algorithm/merge in place
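The swap-procedure pseudocode above is terse; under one reading, it exchanges left-half elements greater than midval with right-half elements not greater than it, so that afterwards everything left of the meeting point is at most midval. The following sketch implements that reading (the function name and return convention are ours, not the paper's):

```python
def first_partition(a):
    """First partition phase (a sketch of one reading of the swap procedure):
    compute midval from the min/max of each half, then use two pointers to
    move elements <= midval left and elements > midval right, in place."""
    n = len(a)
    mid = n // 2                                   # pivot location (step 1)
    left, right = a[:mid], a[mid:]
    # Step 2: min/max of each half, then the mid value per the paper's formula.
    midval = (max(left) + min(left) + max(right) + min(right)) // 4
    i, j = 0, n - 1
    while i < j:
        while i < j and a[i] <= midval:            # skip elements already small
            i += 1
        while i < j and a[j] > midval:             # skip elements already large
            j -= 1
        if i < j:                                  # out-of-place pair: swap it
            a[i], a[j] = a[j], a[i]
    return a, midval
```

For example, first_partition([9, 3, 7, 1, 8, 2, 6, 4]) computes midval = (9 + 1 + 8 + 2) // 4 = 5 and rearranges the array so the first four elements are all at most 5 and the last four are all greater.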

Second partition phase: The algorithm starts working in parallel, dividing the two subarrays A1, A2 into four consecutive partitions A1l, A1r, A2l, A2r; the swap procedure is repeated with the following counters: in A1, i = 0 to n/4 and j = n/2 to n/4; in A2, i = n/2 to 3n/4 and j = n to 3n/4. At the end of this phase we obtain four subarrays in such a manner that all the elements of A1l < all the elements of A1r and all the elements of A2l < all the elements of A2r.

Heapify phase: In this phase, we use the Heapify procedure on all four resultant sub-arrays to organize them into sorted subarrays.

Figure 1 shows the flowchart that describes how PSA works. In the first step of the flowchart the initial array is divided into two sub-arrays and the midval location is found. In the second step, the min and max values for each partition are found and a series of swap operations is performed. Step 3 indicates the last phase, the Heapify phase.

Example: Figure 2 shows a numerical example of how PSA works on an array of size n = 12. In Fig. 2, A is divided into two sub-arrays A1, A2, where the bold items represent the swapped elements. In part A of Fig. 3, the sub-array A1 is divided into left1 and right1; in part B, A2 is divided into two sub-arrays left2, right2. Figure 4 shows how the four sorted sub-arrays are inserted in their locations into the final array.

Fig. 4: PSA algorithm/Heapify phase

RESULTS

To study the effectiveness of our algorithm PSA with respect to Quick sort and other sorting algorithms, we used Turbo Delphi to implement both Quick sort and PSA, writing generic code that could run in both programs. Turbo Delphi is an integrated development environment that runs under Win32 and is built on the object-oriented Pascal language. Our simulator runs on a Pentium 4, 2.66 GHz, with 512 MB of memory. We implemented both Quick sort and PSA on the same machine and performed different simulations. Our simulation results show that the performance of PSA is better than that of Quick sort in terms of running time (Table 1).
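PSA's final phase heap-sorts the four range-ordered partitions independently; since every element of one partition is smaller than every element of the next, simple concatenation (rather than a merge pass) finishes the sort. A minimal sketch, with Python threads standing in for the 2^m processors the paper assumes (the function names are illustrative, not from the paper):

```python
import threading

def heap_sort(a):
    """In-place heapsort of one partition: Build-Heap, then repeatedly move
    the root (current maximum) to the end of the shrinking unsorted region."""
    def sift_down(i, size):
        # Heapify: restore the heap property for the subtree rooted at i.
        while (child := 2 * i + 1) < size:
            if child + 1 < size and a[child + 1] > a[child]:
                child += 1                      # pick the larger child
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):         # Build-Heap, bottom-up
        sift_down(i, n)
    for j in range(n - 1, 0, -1):               # selection loop
        a[0], a[j] = a[j], a[0]
        sift_down(0, j)

def psa_heapify_phase(partitions):
    """Sort each range-ordered partition in its own thread, then concatenate."""
    threads = [threading.Thread(target=heap_sort, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for p in partitions for x in p]
```

Because heap_sort mutates each partition in place, psa_heapify_phase([[3, 1, 2], [6, 4, 5]]) returns [1, 2, 3, 4, 5, 6]. (Note that CPython threads share one interpreter lock, so this sketch illustrates the structure of the phase rather than the speedup of true multiprocessor execution.)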

Table 1: Comparison between the running time of Quick sort and PSA for different input sizes

Input size    Quick sort running time (ms)    PSA running time (ms)
1000          0.234                           0.171
5000          1.640                           0.829
10000         3.313                           2.139
50000         20.485                          12.203
100000        40.460                          26.090

Fig. 5: The running time of both the Quick sort and PSA algorithms

Fig. 6: The number of comparisons needed to sort the same input size in both the Quick sort and PSA algorithms

For an input size of 1000-5000 the performance of PSA is similar to that of Quick sort. However, for a large data size (100,000) PSA outperforms Quick sort (Fig. 5). Figure 6 shows that there is a big difference in the number of comparisons needed to sort the same input data between Quick sort and PSA: PSA requires fewer comparisons than Quick sort for all the input data sizes tested.

DISCUSSION

Our goal in this study is to propose an algorithm that performs better than Quick sort in terms of running time; the performance of Quick sort is optimal only for a large input size. Analyzing the results presented in Table 1, we can see that the efficiency of our algorithm is about 73% for a small input size (1000 items); for a large input size the efficiency of our algorithm with respect to Quick sort is about 64.56%, and it does not change as the input size increases from n to 10n (10,000-100,000 items). This accords with our goal of proposing a simple algorithm that works in parallel and has good efficiency for all input sizes.

Sorting algorithms can be classified into three categories: the first is based on the number of comparisons in the worst, average and best cases; the second is based on memory usage; and the last is based on the difference in behavior between the worst case and the average case. In [2], a comparison between three sorting algorithms (heap sort[4], Quick sort[3] and merge sort[5]) is made and a new algorithm is proposed that is a variant of heap sort (modified heap sort). The new algorithm requires nlogn-0.788928 comparisons in the average case. The algorithm uses only one comparison at each node: with one comparison, the algorithm can decide which child of a node contains the larger element, and this child is promoted to its parent's position. The four algorithms (heap sort, merge sort, quicksort, modified heap sort) were implemented to study their performance in terms of execution time. For an input size of 1000 the performance of heap sort is better than that of the other algorithms; for input sizes from 5000-50000 quicksort outperforms the other algorithms; and for input size 100000 the modified heap sort requires the least running time.

In [6], a heap traversal algorithm is proposed that visits the nodes of a heap and stores their values in an auxiliary data structure of size ⌊n/2⌋, in a manner that the time for locating the next node for traversal is O(log n) in the worst case. The main advantage of the algorithm is that the data structure used is small, which consequently helps in the searching operation. The algorithm takes a min-heap as input and visits the heap in ascending order of the values stored in the nodes, without making any changes to the structure or contents of the heap. The nodes being traversed are copied into another list, producing an ordered list of the values stored in the original heap. The algorithm is similar to an in-order traversal of a binary search tree. A comparison is made between the running times of generic Quicksort, optimized quicksort (optquicksort), heap sort, heap traversal of a binary tree (heapbt) and the proposed algorithm, heap traversal using a binary heap (heapbh). The comparison results show that for an input size of 30000 heapbh performs slightly better than heap sort.

Another sorting algorithm that reduces the cache size is proposed in [7]. The proposed dualheap sort algorithm recursively partitions the subheaps in half until the subheaps become too small to partition any more, at which point the array is sorted. The algorithm constructs two opposing subheaps and then exchanges values between them until the larger values are in one subheap and the smaller values are in the other. In the best case, where the input is already sorted, dualheap sort performs no move operations and nlogn comparisons. In addition, the continued partitioning of the subheaps decreases the cache size needed in each step. The results presented in [7] show that dualheap sort performs about 50% more comparisons and move operations than heap sort, but the memory size needed is smaller.

In [8], a variant of heap sort is proposed that uses a new data structure in which pairs of nodes can be simultaneously stored and processed in a single register. The algorithm sorts pairs of elements in ascending order: with n/2 comparisons, the larger element of each pair 2i, 2i+1 can be brought to the even position. Constructing the new data structure of n/2 elements needs (1+ε/2)n comparisons, for ε > 0, in contrast to weak heaps[9], which take n-1 comparisons. In this data structure, a node containing a pair of elements can be swapped with a similar node at no extra cost. After constructing the heap, in the sorting phase the active elements (the elements in even positions are called active and the others dormant) are processed: the key in the root is placed in the last unsorted position and Heapify (heap adjustment) is performed. This finds the larger or smaller key between the 2nd key of the 1st pair and the 1st key of the last pair. In the new data structure each key requires three movements. The experimental results indicate that the new data structure for heaps results in better performance of the heap sort algorithm in terms of number of comparisons.

In [10], ultimate heapsort is proposed, a variant of heap sort that sorts n elements in O(nlog2(n+1)) time in the worst case by performing at most nlog2n+θ(n) key comparisons and nlog2n+θ(n) element moves. The algorithm transforms the heap on which it operates into a two-layer heap, which keeps small elements at the leaves. The two-layer heap works as follows: first it finds x, the 2^d-th largest key of the elements in the array, where 2^d is the largest power of 2 smaller than or equal to n. Then it partitions the array A[1..n] into three pieces, A[1..r], A[r+1..r+e] and A[r+e+1..n], in a manner that the key of every element in the three pieces is larger than, equal to, and smaller than x, respectively. Then the array A[1..r] is arranged into a heap using a standard heap construction algorithm. In ultimate heapsort, the input array of size n is sorted in d-1 rounds, where d = ⌊log2(n+1)⌋. In each round one half of the elements are sorted. In the first phase of the i-th round, i ∈ {1,...,d-1}, the remaining ni elements are built into a two-layer heap. In the second phase of the i-th round, the ri elements with a key larger than x are removed from the heap; the removed elements are exchanged with the last ri elements of the heap. In the third phase of the i-th round, the ni - 2di - ri elements with a key equal to that of x are gathered together and moved to the end of the sub-array containing the remaining elements. The computations carried out in each round are the following:

• Rearrange A[1..n] into a two-layer heap
• For j = n-1 down to n-l do: exchange A[1] and A[j]; remake A[1..j-1] into a heap

CONCLUSION

In this study we have proposed a sorting algorithm that uses a smaller number of comparisons with respect to the original Quick sort, which in turn requires less running time to sort the same input data. Our algorithm is based on dividing the initial array into four partitions in such a manner that, during the partitioning process, the sub-arrays are organized from the smaller to the bigger elements, reducing the number of comparisons for the final merging operation. In addition, it uses the Heapify procedure, which organizes the nodes of each partition into a data structure useful for the searching operation. Our results show that the PSA algorithm performs better than the original Quick sort for the input data sizes tested.

REFERENCES

1. Cormen, T.H., C.E. Leiserson, R.L. Rivest and C. Stein, 2001. Introduction to Algorithms. 2nd Edn., MIT Press and McGraw-Hill. ISBN-10: 0-262-03293-7.
2. Sharma, V., S. Singh and K.S. Kahlon, 2008. Performance study of improved heap sort algorithm and other sorting algorithms on different platforms. Int. J. Comput. Sci. Network Secur., 8: 101-105. http://paper.ijcsns.org/07_book/200804/20080415.pdf

3. Hoare, C.A.R., 1962. Quicksort. Comput. J., 5: 10-15. DOI: 10.1093/comjnl/5.1.10
4. Wegener, I., 1990. Bottom-up heapsort, a new variant of heapsort beating on average quicksort. Proceedings of Mathematical Foundations of Computer Science, Aug. 27-31, Springer-Verlag, Banská Bystrica, Czechoslovakia, pp: 516-522. http://portal.acm.org/citation.cfm?id=90256
5. Knuth, D.E., 1998. The Art of Computer Programming, Vol. 3: Sorting and Searching. 2nd Edn., Addison-Wesley Professional, pp: 780. ISBN: 0-201-89685-0.
6. Mao, L.J. and S.D. Lang, 2003. An empirical study of heap traversal and its applications. Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics, July 27-30, Orlando, Florida, USA, pp: 167-171. http://www.cs.ucf.edu/csdept/faculty/lang/pub.html
7. Sepesi, G., 2007. Dualheap sort algorithm: An inherently parallel generalization of heapsort. http://adsabs.harvard.edu/abs/2007arXiv0706.2893S
8. Shahjalal and M. Kaykobad, 2007. A new data structure for heap sort with improved number of comparisons. Proceedings of the 1st Workshop on Algorithms and Computation, Feb. 12, Dhaka, Bangladesh, pp: 88-96. http://teacher.buet.ac.bd/saidurrahman/walcom2007/proceedings/shahjalal.pdf
9. Dutton, R.D., 1993. The weak-heap sort. BIT Numer. Math., 33: 372-381. http://www.springerlink.com/content/x442k4973j5242p0/
10. Katajainen, J., 1998. The ultimate heapsort. Proceedings of Computing: The 4th Australasian Theory Symposium, Australian Computer Science Communications, Springer-Verlag, Singapore, pp: 87-95. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.4727
