Avinash Shukla, Anil Kishore Saxena / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 5, September-October 2012, pp. 555-560

Review of Radix Sort & Proposed Modified Radix Sort for Heterogeneous Data Set in Distributed Computing Environment

Avinash Shukla, Anil Kishore Saxena

ABSTRACT
We have proposed a Modified Pure Radix Sort for large heterogeneous data sets. In this research paper we discuss the problems of radix sort, briefly review previous work on radix sort, and present a new modified pure radix sort algorithm for large heterogeneous data sets. Through this algorithm we try to address all of the related problems of radix sort. The algorithm works on distributed computing technology and is implemented on the principle of the divide & conquer method.

1. INTRODUCTION
Sorting is a computational building block of fundamental importance and is the most widely studied algorithmic problem. The importance of sorting has led to the design of efficient sorting algorithms for a variety of architectures. Many applications rely on the availability of efficient sorting routines as a basis for their own efficiency, while some algorithms can be conveniently phrased in terms of sorting. Database systems make extensive use of sorting operations. The construction of spatial data structures that are essential in computer graphics and geographic information systems is fundamentally a sorting process. Efficient sort routines are also a useful building block in implementing algorithms like sparse matrix multiplication and parallel programming patterns like MapReduce. It is therefore important to provide efficient sorting routines on practically any programming platform, and with the evolution of new computer architectures there is a need to explore efficient sorting techniques on them. Acceleration of existing techniques as well as development of new sorting approaches is crucial for many real-time graphics scenarios, database systems, and numerical simulations. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge.

Radix sort is classified by Knuth as "sorting by distribution". It is the most efficient sorting method for alphanumeric keys on modern computers, provided that the keys are not too long. Floating-point sorting is also possible with some modifications. Radix sort is stable, very fast, and an excellent algorithm on computers with large memory. The idea of radix sort is similar to the idea of hashing algorithms: the final position of the record is computed for each key, and if there are already records with this key, the record is placed after them (overflow area). The key is not compared with other keys at all. The approach is generally known as "bucket sorting", "radix sorting", or "digital sorting", because it is based on the digits of the keys. There are two approaches to radix sorting.

1.1 MSD (most-significant-digit) Radix Sort
MSD radix sorts examine the digits of the keys in left-to-right order, working with the most significant digits first. They partition the file according to the leading digits of the keys, and then recursively apply the same method to the sub-files.

1.2 LSD (least-significant-digit) Radix Sort
The second class of radix-sorting methods examines the digits of the keys in right-to-left order, working with the least significant digits first. Radix sort works on the radix of the elements, so the number of passes depends on the maximum length of the elements; the following is observed (a minimal sketch of an LSD pass is given after this list):
- For a data set with uniform-length elements, radix sort works highly efficiently.
- For a data set with unequal-length elements, the number of passes increases because it depends on the maximum length of the elements in the list, thus increasing space and time requirements.
- In the case of strings, the strings are sorted but the data may be corrupted.
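To make the LSD procedure concrete, the following is a minimal illustrative sketch of an LSD radix sort for non-negative integer keys. This is not the proposed modified algorithm; the base-10 digit extraction and list-based buckets are assumptions chosen for readability. It shows how the number of passes is governed by the number of digits of the longest key, which is the behaviour noted in the observations above.

```python
# Illustrative LSD radix sort sketch (base-10 digits, non-negative integer keys).
# Assumptions: keys are non-negative ints; the base of 10 is chosen for clarity.
def lsd_radix_sort(keys, base=10):
    if not keys:
        return keys
    max_key = max(keys)
    exp = 1
    # One stable bucketing pass per digit position; the pass count grows
    # with the number of digits of the longest key.
    while max_key // exp > 0:
        buckets = [[] for _ in range(base)]
        for k in keys:
            buckets[(k // exp) % base].append(k)   # stable per-digit bucketing
        keys = [k for bucket in buckets for k in bucket]
        exp *= base
    return keys

if __name__ == "__main__":
    print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # -> [2, 24, 45, 66, 75, 90, 170, 802]
```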
2. REVIEW OF LITERATURE
Nilsson [2] re-evaluated the method for managing the buckets held at the leaves and showed that a better choice of data structures further improves efficiency, at a small additional cost in memory. For sets of around 30,000,000 strings, the improved burst sort is nearly twice as fast as the previous best sorting algorithm. Jon L. Bentley [3] suggested a detailed implementation combining the most effective improvements to quick sort, along with a discussion of how to implement it in assembly language; it has wide applicability as an internal sorting method which requires minimal memory. Arne Andersson [4] presented and evaluated several optimized and implemented techniques for string sorting. Forward radix sort has good worst-case behaviour. Experimental results indicate that radix sorting is considerably faster (often more than twice as fast) than comparison-based sorting, and that it is possible to implement a radix sort with good worst-case running time without sacrificing average-case performance. The implementations are competitive with the best previously published string sorting programs.
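For orientation, the sketch below illustrates the MSD (forward-radix) idea described in Section 1.1 and evaluated in [4]. The recursive, dictionary-based bucketing used here is an assumption made for brevity and does not reflect the tuned data structures of the cited implementations.

```python
# Illustrative MSD (forward) radix sort sketch for strings.
# Assumptions: plain Python lists/dicts and recursion; real implementations
# of forward radix sort add many optimizations not shown here.
def msd_string_sort(strings, depth=0):
    if len(strings) <= 1:
        return strings
    finished, buckets = [], {}
    for s in strings:
        if depth >= len(s):
            finished.append(s)            # string exhausted: already in final position
        else:
            buckets.setdefault(s[depth], []).append(s)
    result = finished                      # shorter (prefix) strings sort first
    for ch in sorted(buckets):             # recurse on each leading-character bucket
        result.extend(msd_string_sort(buckets[ch], depth + 1))
    return result

if __name__ == "__main__":
    print(msd_string_sort(["radix", "rad", "sort", "bucket", "burst", "radar"]))
    # -> ['bucket', 'burst', 'rad', 'radar', 'radix', 'sort']
```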

Naila Rahman [5] discussed the problem of data sets too massive to fit completely inside the computer's internal memory; the resulting input/output communication between fast internal memory and slower external memory was a major performance bottleneck. Distribution and merging techniques for using the disks independently were also discussed [5]. These are useful techniques for batched external-memory (EM) problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. Some of the EM problems were also re-examined in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Rajeev Raman [6] illustrated the importance of reducing cache misses in the standard implementation of least-significant-bit-first (LSB) radix sort; the proposed techniques simultaneously reduce cache and TLB misses for LSB radix sort, and they yield algorithms whose implementations improve on both LSB radix sort and comparison-based sorting algorithms. Daniel [7] explained the Communication- and Cache-Conscious Radix sort algorithm (C3-Radix sort). C3-Radix sort uses the distributed and shared memory parallel programming models, exploiting memory hierarchy locality and reducing the amount of communication for distributed-memory computers. C3-Radix sort was implemented and analysed on the SGI Origin 2000 NUMA multiprocessor, with results for up to 16 processors and 64M 32-bit keys. The results show that for data sets that are small compared to the number of processors the MPI implementation is faster, while for large data sets the shared memory implementation is faster. Shin-Jae Lee [8] solved the load-imbalance problem present in parallel radix sort: by redistributing the keys in each round of radix, each processor holds exactly the same number of keys, thereby reducing the overall sorting time. Load-balanced radix sort is currently the fastest internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time becomes the bottleneck of the overall sorting performance. The proposed algorithm preprocesses the keys by redistribution to eliminate this communication time. Once the keys are localized to each processor, the sorting is confined within the processor, eliminating the need for global redistribution of keys and enabling well-balanced communication and computation across processors. Experimental results with various key distributions indicate significant improvements over balanced radix sort.
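The key-redistribution idea behind load-balanced parallel radix sort [8], and the divide & conquer principle underlying the distributed approach of this paper, can be illustrated with the simplified, sequentially simulated sketch below. The processor count, the 8-bit leading digit, and the use of Python lists as stand-ins for processors are assumptions made for illustration only; they are not the cited implementation.

```python
# Simplified, sequentially simulated sketch of key redistribution for a
# load-balanced parallel radix sort. Assumptions: keys are non-negative
# 32-bit ints and P "processors" are plain Python lists.
def balanced_partition(keys, num_procs, bits=32, msd_bits=8):
    # Histogram of the most significant msd_bits of every key.
    shift = bits - msd_bits
    hist = [0] * (1 << msd_bits)
    for k in keys:
        hist[k >> shift] += 1
    # Assign contiguous digit ranges to processors so that each receives
    # roughly len(keys) / num_procs keys.
    target = len(keys) / num_procs
    owner, proc, acc = [0] * len(hist), 0, 0
    for d, cnt in enumerate(hist):
        if acc >= target * (proc + 1) and proc < num_procs - 1:
            proc += 1
        owner[d] = proc
        acc += cnt
    # "Redistribute": each processor receives the keys of its digit ranges.
    parts = [[] for _ in range(num_procs)]
    for k in keys:
        parts[owner[k >> shift]].append(k)
    return parts

if __name__ == "__main__":
    import random
    data = [random.randrange(1 << 32) for _ in range(100000)]
    parts = balanced_partition(data, num_procs=4)
    # Each "processor" sorts its partition locally (here a stand-in local sort);
    # because successive processors own increasing digit ranges, concatenating
    # the locally sorted parts yields the globally sorted sequence.
    merged = [k for part in parts for k in sorted(part)]
    assert merged == sorted(data)
```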
Jimenez-Gonzalez [9] introduced a new algorithm called Sequential Counting Split Radix sort (SCS-Radix sort). The three important features of SCS-Radix are the dynamic detection of data skew, the exploitation of the memory hierarchy, and execution-time stability when sorting data sets with different characteristics. They claim the algorithm to be 1.2 to 45 times faster than radix sort or quick sort. Navarro & Josep [10] focused on the improvement of data locality. CC-Radix improves data locality by dynamically partitioning the data set into subsets that fit in the L2 cache level; once in that cache level, each subset is sorted with radix sort. The proposed algorithm is about 2 and 1.4 times faster than quick sort and Explicit Block Transfer Radix sort, respectively. Ranjan Sinha [11] suggested that algorithms for sorting large data sets can be made more efficient with careful use of memory hierarchies and a reduction in the number of costly memory accesses. Burst sort dynamically builds a small tree that is used to rapidly allocate each string to a bucket. Sinha & Zobel introduced new variants of the algorithm: SR-burst sort, DR-burst sort, and DRL-burst sort, which construct the tree a priori from random samples. Experimental results with sets of over 30 million strings show that the new variants reduce cache misses by up to 37 percent compared with the original burst sort, while simultaneously reducing instruction counts by up to 24 percent. Jian-Jun Han [12] proposed two contention-aware scheduling algorithms, OIHSA (Optimal Insertion Hybrid Scheduling Algorithm) and BBSA (Bandwidth Based Scheduling Algorithm). Both algorithms start from the inherent characteristics of the edge scheduling problem and select route paths with relatively low network workload to transfer communication data using a modified routing algorithm. OIHSA optimizes the start time of communication data transferred on links, while BBSA exploits the bandwidth of network links optimally. Moreover, the proposed algorithms adapt not only to homogeneous systems but also to heterogeneous systems. Sinha, R. and Zobel [13 & 14] examined burst sort, a cache-oriented sorting technique that uses a dynamic tree to divide large sets of string keys into related subsets small enough to sort in cache. In the original burst sort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array. C-burst sort copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. Results indicate that C-burst sort is typically twice as fast as the original burst sort and four to five times faster than multi-key quick sort. CP-burst sort uses more memory, but provides stable sorting.
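The bucket-and-burst mechanism underlying burst sort [11], [13], [14] can be sketched as follows. The dictionary-based trie nodes and the tiny bursting threshold are assumptions made for illustration; they are not the cache-tuned structures used in the cited work.

```python
# Minimal sketch of the burst sort idea: strings are appended to small buckets
# indexed by their leading characters; a bucket that grows past a threshold is
# "burst" into a deeper trie node, so each bucket stays small enough to sort fast.
THRESHOLD = 4   # deliberately tiny, for illustration only

def _insert(node, s, depth):
    key = s[depth] if depth < len(s) else ""          # "" holds exhausted strings
    slot = node.setdefault(key, [])
    if isinstance(slot, dict):                         # bucket already burst: descend
        _insert(slot, s, depth + 1)
        return
    slot.append(s)
    if key != "" and len(slot) > THRESHOLD:            # burst the over-full bucket
        child = {}
        for t in slot:
            _insert(child, t, depth + 1)
        node[key] = child

def _traverse(node, out):
    for key in sorted(node):                           # in-order walk yields sorted output
        child = node[key]
        if isinstance(child, dict):
            _traverse(child, out)
        else:
            out.extend(sorted(child))                  # small bucket: sorted "in cache"
    return out

def burst_sort(strings):
    root = {}
    for s in strings:
        _insert(root, s, 0)
    return _traverse(root, [])

if __name__ == "__main__":
    print(burst_sort(["sort", "radix", "rad", "burst", "bucket", "radar", "sorting"]))
```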

Nadathur Satish [15] proposed high-performance parallel radix sort and merge sort routines for many-core GPUs, taking advantage of the full programmability offered by CUDA. The radix sort is the fastest GPU sort and the merge sort is the fastest comparison-based GPU sort reported in the literature. For optimal performance, the algorithms exploit the substantial fine-grained parallelism and decompose the computation into independent tasks, exploiting the high-speed on-chip shared memory provided by NVIDIA's GPU architecture and efficient data-parallel primitives, particularly parallel scan. N. Ramprasad and Pallav Kumar Baruah [16] suggested an optimization for the parallel radix sort algorithm, reducing the time complexity of the algorithm and ensuring a balanced load on all processors. They implemented it [16] on the Cell processor, the first implementation of the Cell Broadband Engine Architecture (CBEA), a heterogeneous multi-core processor system; 102,400,000 elements were sorted in 0.49 seconds, at a rate of 207 million elements per second. Shibdas Bandyopadhyay and Sartaj Sahni [17] developed a new radix sort algorithm for GPUs, GRS, that reads and writes records from/to global memory only once, whereas the existing SDK radix sort algorithm does this twice. Experiments indicate that GRS is 21% faster than SDK sort when sorting 100M numbers, and faster by between 34% and 55% when sorting 40M records with 1 to 9 32-bit fields. Daniel Jiménez-González, Juan J. Navarro and Josep-L. Larriba-Pey [18] addressed parallel in-memory 64-bit sorting, an important problem in database management systems and other applications such as Internet search engines and data mining tools. The algorithm is termed Parallel Counting Split Radix sort (PCS-Radix sort) [9]. The parallel stages of the algorithm increase data locality, balance the load across processors in the presence of data skew, and significantly reduce the amount of data communicated. The local stages of PCS-Radix sort are performed only on the bits of the key that have not been sorted during the parallel stages of the algorithm. PCS-Radix sort adapts to any parallel computer by changing three simple algorithmic parameters. The algorithm was implemented on a Cray T3E-900 [9], and the results show that it is more than 2 times faster than the previous fastest 64-bit parallel sorting algorithm; PCS-Radix sort achieves a speed-up of more than 23 on 32 processors relative to the fastest sequential algorithm at hand. Daniel Cederman and Philippas Tsigas [19] presented GPU-Quick sort, an efficient quick sort algorithm suitable for highly parallel multi-core graphics processors. Quick sort had previously been considered an inefficient sorting solution for graphics processors, but GPU-Quick sort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quick sort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.

3. COMPARISON OF VARIOUS SORTING ALGORITHMS [1]
The following table compares sorting algorithms according to their complexity and the method they use (exchange, insertion, selection, merge), and also discusses their advantages and disadvantages. n represents the number of elements to be sorted.

TABLE 1.1: COMPARISON OF COMPARISON-BASED SORTS

Name | Average Case Time Complexity | Worst Case Time Complexity | Method | Advantages / Disadvantages

Bubble Sort | O(n²) | O(n²) | Exchange | 1. Straightforward, simple, and stable. 2. Slow and impractical on large data sets.

Insertion Sort | O(n²) | O(n²) | Insertion | 1. Efficient for small lists and saves memory. 2. Slow for large data sets.

Selection Sort | O(n²) | O(n²) | Selection | 1. An improvement over bubble sort. 2. Unstable and very slow for large data sets.

Heap Sort | O(n log n) | O(n log n) | Selection | 1. A more efficient version of selection sort; it does not require recursion or an extra buffer. 2. Slower than quick sort and merge sort.

Merge Sort | O(n log n) | O(n log n) | Merge | 1. A fast recursive sort suitable for very large data sets. 2. Requires a large amount of extra memory.

In-place Merge Sort | O(n log n) | O(n log n) | Merge | 1. Very low memory requirement. 2. Unstable and slow.


Shell Sort | O(n log n) | O(n log² n) | Insertion | 1. Efficient for large data sets; relatively little memory required. 2. Not stable and has larger constant factors.

Quick Sort | O(n log n) | O(n²) | Partition | 1. Generally the fastest in practice, efficient, and requires little memory. 2. Poor pivot selection can lead to unbalanced partitions.

3.1 COMPARISON OF NON-COMPARISON-BASED SORTING ALGORITHMS [1]
The following table describes the non-comparison-based sorting algorithms, where n is the number of elements to be sorted, k is the size of each key, and s is the chunk size used by the implementation. Assume that the key size