Algorithms: Quicksort

QuickSort is an algorithm that sorts an input array A[lo..hi] of n = hi − lo + 1 elements recursively and in place. Recall that the way we wrote MergeSort segregated the code other than the recursive calls in a companion procedure Merge. The most easily understood presentation of QuickSort uses the same approach, with the companion procedure typically named Partition.

Tony Hoare (Sir Charles Antony Richard Hoare) invented QuickSort in 1959 while an exchange student at Moscow State University, but he was only able to get a good implementation after he could program it in ALGOL. QuickSort spread through inclusion in Unix systems. The C standard library has a sorting function qsort, an homage to QuickSort, even though the C standard never required qsort to be an implementation of QuickSort.

Pseudo-code for Hoare's algorithm is

    algorithm QuickSort(A, lo, hi)
        if lo < hi then
            p ← Partition(A, lo, hi)
            QuickSort(A, lo, p)
            QuickSort(A, p + 1, hi)
        fi
    end

    algorithm Partition(A, lo, hi)
        pivot ← A[lo]
        i ← lo
        j ← hi
        while (true) do
            while (A[i] < pivot) do i ← i + 1 end while
            while (A[j] > pivot) do j ← j − 1 end while
            if i ≥ j then return j fi
            swap(A[i], A[j])
            i ← i + 1
            j ← j − 1
        end while
    end

Hoare's partitioning method divides the array into two adjacent subarrays using A[lo] as a pivot. After partitioning it suffices to know that any element in the front subarray is less than or equal to any element in the back subarray to assure that the recursive calls will complete the sorting.

The correctness of Hoare's partitioning code is not easy to verify due to some subtle points. One natural concern is whether the main loop might be infinite, and hence one must argue that the condition of the if statement will be satisfied after a finite number of steps (which breaks out of that loop). Toward that end one argues that during any full pass i strictly increases and j strictly decreases. What about the inner loops?
Do we need to change the conditions to avoid letting i exceed hi and potentially create an infinite loop? Likewise for j dropping below lo and possibly becoming a runaway loop. The answer is no in both cases, but this requires an argument. On the initial pass we know A[lo] < pivot is false (they are equal). If we come back to this inner loop for a second pass, the value of pivot will have been swapped into some position j where j > i, and even after i is incremented it is still less than or equal to that position, which it cannot pass because the entry there is not less than the pivot. Can you do the reasoning that j cannot ever drop below lo? To complete the correctness argument one uses induction on a loop invariant for the outer loop such as "no entry in the subarray A[lo..i − 1] is larger than any entry in the subarray A[j + 1..hi] at the start of the loop".

We now give the implementation most often presented in introductory computer science texts, calling it QSort. QSort expects its partitioning method to return an index that has the pivot in a correct final position and uses recursive calls QSort(A, lo, p − 1) and QSort(A, p + 1, hi). The partitioning normally uses A[hi] as the pivot, rearranging entries so that everything smaller than the pivot comes before it and everything larger comes after it (where do ties go? Do we even care?). Again the pivot itself is in position p, the return value of the partitioning code, which is a correct final position. Code for this partitioning method is

    algorithm Partition(A, lo, hi)
        pivot ← A[hi]
        i ← lo − 1
        for j ← lo to hi − 1 do
            if A[j] < pivot then
                i ← i + 1
                swap(A[i], A[j])
            fi
        end for
        if A[hi] < A[i + 1] then
            swap(A[i + 1], A[hi])
        fi
        return i + 1
    end

Most students find it much easier to prove this version is correct since it clearly doesn't have any infinite loops. It is tempting to think this version might be faster, but in fact on average it performs about three times as many swaps as Hoare's method of partitioning, while the number of comparisons is about the same.
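Both partitioning schemes described above can be transcribed into Python almost line for line. The following is a minimal sketch, not a tuned implementation. The function names (hoare_partition, quicksort, lomuto_partition, qsort) are assumptions made for this illustration, and the name "Lomuto" is the attribution commonly given to the second scheme, which the text itself does not name. As in the pseudo-code, lo and hi are inclusive bounds.

```python
def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] (inclusive) around the pivot a[lo].

    Returns j such that no element of a[lo..j] is larger than
    any element of a[j+1..hi].
    """
    pivot = a[lo]
    i, j = lo, hi
    while True:
        while a[i] < pivot:      # scan right for an entry >= pivot
            i += 1
        while a[j] > pivot:      # scan left for an entry <= pivot
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


def quicksort(a, lo, hi):
    """Sort a[lo..hi] in place using Hoare partitioning."""
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort(a, lo, p)      # front part includes index p
        quicksort(a, p + 1, hi)


def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] (inclusive) around the pivot a[hi].

    Returns an index p holding the pivot value in a correct final position.
    """
    pivot = a[hi]
    i = lo - 1
    for j in range(lo, hi):      # j runs over lo .. hi - 1
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    if a[hi] < a[i + 1]:
        a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1


def qsort(a, lo, hi):
    """Sort a[lo..hi] in place using last-element (Lomuto) partitioning."""
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        qsort(a, lo, p - 1)      # the pivot at index p is already placed
        qsort(a, p + 1, hi)
```

To sort a whole list data, call quicksort(data, 0, len(data) - 1) or qsort(data, 0, len(data) - 1). Note the difference in the recursive calls: the Hoare version recurses on A[lo..p] because the returned index is only a split point, while the second version can skip index p because the pivot already sits in its final position.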
QuickSort has been a very popular sorting algorithm because it is fast for common data sets. Average case complexity is O(n log n), but worst case complexity is O(n^2). So MergeSort and HeapSort both have better worst-case asymptotics since they are O(n log n) in every case. The slow cases for QuickSort arise when the partitioning isn't into more or less equal length subarrays.

People often code hybrid algorithms that start as MergeSort or QuickSort, but switch to InsertionSort for any short subarrays where InsertionSort is faster. This can reduce the depth of the recursion stack, saving overhead. Recall we said the C standard library function qsort need not be an implementation of QuickSort. The GNU C library does not even promise that qsort will be an in-place algorithm. That flexibility in the standard allows for use of such hybrid algorithms with the potential for better practical performance. To learn more about hybrid sorting algorithms you might look up TimSort, a hybrid sorting algorithm (of MergeSort and InsertionSort) written by Tim Peters (2002) for Python. TimSort is also used in Java, GNU Octave and on the Android platform.

As with other sorting algorithms we suggest you trace both implementations on, say, A = [37, 23, 61, 74, 85, 13, 41, 53, 71, 19]. How would they work if the 74 were another 23; that is, do ties matter? What if the 74 were another 19 or another 37, the initial pivot values?
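The hybrid idea mentioned above, QuickSort that hands short subarrays to InsertionSort, can be sketched in Python as follows. This is an illustration under stated assumptions rather than how any particular library does it: the names hybrid_sort, insertion_sort and partition are hypothetical, and the cutoff of 16 is an arbitrary placeholder, since the best threshold depends on the machine and the data.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place by straight insertion."""
    for k in range(lo + 1, hi + 1):
        x = a[k]
        m = k - 1
        while m >= lo and a[m] > x:   # shift larger entries one step right
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = x


def partition(a, lo, hi):
    """Hoare-style partition around a[lo], as in the pseudo-code in the text."""
    pivot = a[lo]
    i, j = lo, hi
    while True:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


def hybrid_sort(a, lo=0, hi=None, cutoff=16):
    """QuickSort on a[lo..hi] that falls back to insertion sort.

    cutoff is an assumed, untuned threshold and must be at least 1:
    subarrays of at most cutoff elements are finished by insertion_sort
    instead of being partitioned further.
    """
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        insertion_sort(a, lo, hi)
        return
    p = partition(a, lo, hi)
    hybrid_sort(a, lo, p, cutoff)
    hybrid_sort(a, p + 1, hi, cutoff)
```

Calling hybrid_sort(data) sorts the list data in place. Because recursion stops once a subarray has at most cutoff elements, the recursion tree loses its lowest levels, which is the overhead saving described above. The worst case is still quadratic, since the pivot choice is unchanged.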