An Efficient Methodology to Sort Large Volume of Data
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH, VOLUME 9, ISSUE 03, MARCH 2020, ISSN 2277-8616 (IJSTR©2020, www.ijstr.org)

S.Bharathiraja, G.Suganya, M.Premalatha, R.Kumar, Sakkaravarthi Ramanathan

S.Bharathiraja is currently working as an Assistant Professor at Vellore Institute of Technology Chennai and is pursuing a Ph.D., PH-04439931120.
Dr. G.Suganya is currently working as an Associate Professor at Vellore Institute of Technology Chennai and specializes in Software Engineering and Machine Learning, PH-04439931399.
M.Premalatha is currently working as an Assistant Professor at Vellore Institute of Technology Chennai and is pursuing a Ph.D., PH-04439931071.
Dr.R.Kumar is currently working as an Associate Professor at Vellore Institute of Technology Chennai and is pursuing a Ph.D., PH-0443993.
Sakkaravarthi is currently working as a professor in the Department of Computer Science, Cegep Gaspesie, Montreal, Canada, PH-+1(438) 395-9639.

Abstract: Sorting is a basic data processing technique that is used in day-to-day applications. To cope with technological advancement and the extensive increase in data acquisition and storage, sorting requires improvement to minimize processing time, response time and the space required for processing. Various sorting techniques have been proposed by researchers, but their applicability to large volumes of data is not assured. The main focus of this work is to propose a new sorting technique titled Neutral Sort, to reduce the time taken for sorting and decrease the response time for a large volume of data. Neutral Sort is designed as an enhancement to Merge Sort. The advantages and disadvantages of existing techniques in terms of their performance, efficiency and throughput are discussed, and the comparative study shows that Neutral Sort drastically reduces the time taken for sorting and hence reduces the response time.

Index Terms: Chunking, Banding, Sorting, Efficiency, Merge sort, Bigdata Sorting, Neutral Sort.

1 INTRODUCTION

In data science, ordering of elements is useful in various applications and in database querying. Different approaches have been proposed by researchers for performing the sorting operation effectively, based on the type of application. The approaches are normally validated using space complexity, which refers to the temporary space occupied by the algorithm during sorting, and time complexity, the time taken to sort and return the final sorted data. Algorithms are designed with the intent to minimize both space and time complexity. The literature, as discussed in the following section, asserts that the quick sort and merge sort algorithms are efficient and effective for real-time use. In addition, depending on the location where sorting is executed, the techniques are broadly categorized into internal and external sorting. The existing sorting techniques provide different throughput and efficiency, out of which the Quick, Merge and Binary sorting techniques respond well.

Chunking and banding are widely used in different applications to create modularity and to ensure specificity. Merge sort, a technique that divides the data into chunks until the size of each chunk becomes one and then combines the chunks in sorted order, has been proven efficient by many researchers. Considering that the divided chunks may already be sorted and hence need no further splitting, we have proposed a modified Merge sort algorithm titled "Neutral sort" to reduce time complexity and hence response time.

Chapter 2 elaborates on the various algorithms that exist in practice, with a detailed analysis of their merits and limitations. A detailed discussion of the proposed methodology, with the algorithmic explanation, is presented in Chapter 3. The prototype is tested and the results are discussed in Chapter 4.

2 EXISTING ALGORITHMS

2.1 Bubble Sort

Bubble sort is one of the simplest algorithms for sorting data. The algorithm compares each element with its next element and, if they are not in order, swaps them [1]. The comparison and swapping operations are repeated during each pass until all the elements in the list are completely sorted. It is inefficient because, in the worst case, the number of swaps grows quadratically with the number of elements. The worst-case time complexity of Bubble sort is O(n²).

2.1.1 Benefits and Drawbacks

Though Bubble sort is inefficient, its simplicity makes it an advantageous algorithm for sorting a small set of data. Bubble sort utilizes O(1) auxiliary space. Due to its time complexity, it is inefficient for sorting large sets of data [1].

2.2 Selection Sort

The number of comparisons, and hence of swaps, is slightly reduced in Selection sort when compared to Bubble sort. During every pass, only one swap occurs to place the appropriate data in its respective position, and hence selection sort is better than Bubble sort in terms of swapping. Even then, when applied to large datasets, the performance of this sort is O(n²) [2] in the worst case.

2.2.1 Performance Analysis

Selection sort needs O(n²) comparisons and (n-1) swaps to sort a list of n elements [2][3]. Two other popular variations of selection sort are Quadratic Sort [5] and Tree Sort [2][3].

2.2.2 Benefits and Drawbacks

Though selection sort is inefficient for large sets of data, it is better than bubble sort because it performs fewer swaps, which greatly reduces data movement. Selection sort utilizes O(1) auxiliary space.

2.3 Insertion Sort

Insertion sort is considerably more efficient than bubble and selection sort. It works in the same way as arranging a hand of playing cards. When a new element is picked up, the algorithm looks for the appropriate position and inserts the element there, shifting all the elements after the insertion position one position forward to maintain the sorted order of the list. This insertion process continues for all the elements in the list. Insertion sort is more suitable for almost sorted or partially sorted lists of elements. Its time complexity is proven to be O(n²).

2.3.1 Performance Analysis

Insertion sort performance degrades to quadratic computational complexity if the order of the elements is reversed. The advantage of insertion sort is its efficient performance when the elements in the list are partially sorted. Insertion sort is inefficient as the size grows, since its average-case running time is also O(n²). A popular variant of this sort is Shell sort [7], an unstable, in-place comparison sorting algorithm with a best-case performance of O(n log n). Insertion sort utilizes O(1) auxiliary space.

2.3.2 Benefits and Drawbacks

Like bubble and selection sort, insertion sort is inefficient for large sets of data. Also, to locate the insertion position it has to scan through a number of the elements being sorted. Its best-case run time is O(n) [13], which occurs only when the elements are already in sorted form.

2.4 Quick Sort

[...] selected as either the leftmost or the rightmost element. This drawback was later eliminated by choosing a more suitable method of selecting the pivot. One of the best methods for choosing an appropriate pivot, one that divides the list into two almost equal halves during all recursive partitions, is to take the median of the first, middle and last elements. Quick sort works efficiently even in a virtual memory environment, and it is an in-place algorithm.

2.4.1 Performance Analysis

Quick sort is regarded as the fastest sorting algorithm, with an average running time of O(n log n) [8]. Generally, selecting either the first or the last element as the pivot produces the worst-case run time of O(n²). However, these worst-case scenarios are not frequent. One variant of Quick sort is Qsort [10], which is more robust and faster than the traditional Quick sort algorithm. Quick sort is better suited to large data sets.

2.4.2 Benefits and Drawbacks

It is the fastest and most efficient algorithm for large sets of data [8]. However, it is inefficient if the elements in the list are already sorted, which results in the worst-case time complexity of O(n²). Quick sort uses O(log n) [13] secondary space for recursive function calls, so it might be expensive in terms of space occupancy for large sets of data. Moreover, quick sort traverses the elements of the array sequentially, which results in good locality of reference and cache behavior for arrays [9].

2.5 Merge Sort

Merge sort [11] also uses the divide and conquer approach. The procedure splits the set of elements to be sorted into two equal halves if the number of elements is even, and into two partitions, one containing one element more than the other, if the number of elements is odd. The procedure continues recursively until every partition contains one element.

The merge operation is applied after partitioning the elements. It merges the corresponding left and right partitions, growing them until they form a single partition that contains all the elements of the original set, now in sorted order. Merge sort is mostly implemented as a recursive procedure, though it can also be implemented non-recursively.
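
The split-and-merge procedure described for Merge sort can be illustrated with a minimal Python sketch. This is the textbook algorithm, not the authors' Neutral sort, and the function names (merge, merge_sort, merge_sort_iterative) are hypothetical names chosen for this example. Giving the extra element to the left partition when the count is odd is also an assumption, since the text allows either partition to hold it.

```python
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two already-sorted lists into one sorted list."""
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left list on ties, which keeps the merge stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements; append them as-is.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items: List[int]) -> List[int]:
    """Recursive (top-down) merge sort; returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    # Assumption: for an odd count, the left partition gets the extra element.
    mid = (len(items) + 1) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


def merge_sort_iterative(items: List[int]) -> List[int]:
    """Non-recursive (bottom-up) merge sort; returns a new sorted list."""
    # Start from partitions of one element each, then merge neighbours
    # pairwise until a single partition remains.
    runs = [[x] for x in items]
    while len(runs) > 1:
        next_runs = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_runs.append(merge(runs[k], runs[k + 1]))
            else:
                next_runs.append(runs[k])  # odd run out, carried forward
        runs = next_runs
    return runs[0] if runs else []
```

Both variants perform the same merges in a different order. The bottom-up form avoids recursion and makes the "partitions of one element" starting point explicit.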