Introduction to Algorithms: Sorting in Linear Time
CSE 680
Prof. Roger Crawfis

Comparison Sorting Review

• Insertion sort:
  - Pros:
    - Easy to code
    - Fast on small inputs (less than ~50 elements)
    - Fast on nearly-sorted inputs
  - Cons:
    - O(n^2) worst case
    - O(n^2) average case
    - O(n^2) reverse-sorted case
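To make the insertion sort trade-offs above concrete, here is a minimal Python sketch. The function name `insertion_sort` and the shift counter it returns are illustrative assumptions, not part of the slides; the counter is only there to show why nearly-sorted input is cheap and reverse-sorted input is the worst case.

```python
def insertion_sort(items):
    """Return (sorted copy of items, number of element shifts performed)."""
    a = list(items)          # work on a copy so the caller's list is untouched
    shifts = 0               # hypothetical counter, used only for illustration
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right until key's position is found.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts


if __name__ == "__main__":
    print(insertion_sort([1, 2, 3, 4, 5]))   # already sorted: no shifts
    print(insertion_sort([5, 4, 3, 2, 1]))   # reverse sorted: n(n-1)/2 shifts
```

On an already sorted input the inner loop never runs, so the work is linear; on reverse-sorted input every element is shifted past all earlier ones.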

Comparison Sorting Review

• Merge sort:
  - Divide-and-conquer:
    - Split array in half
    - Recursively sort sub-arrays
    - Linear-time merge step
  - Pros:
    - O(n lg n) worst case, asymptotically optimal for comparison sorts
  - Cons:
    - Doesn't sort in place

Comparison Sorting Review

• Heap sort:
  - Uses the very useful heap data structure
    - Complete binary tree
    - Heap property: parent key ≥ children's keys
  - Pros:
    - O(n lg n) worst case, asymptotically optimal for comparison sorts
    - Sorts in place
  - Cons:
    - Fair amount of shuffling memory around

Comparison Sorting Review

• Quick sort:
  - Divide-and-conquer:
    - Partition array into two sub-arrays, recursively sort
    - All of first sub-array < all of second sub-array
  - Pros:
    - O(n lg n) average case
    - Sorts in place
    - Fast in practice (why?)
  - Cons:
    - O(n^2) worst case
      - Naïve implementation: worst case on sorted input
      - Good partitioning makes this very unlikely.

Non-Comparison Based Sorting

• Many times we have restrictions on our keys
  - Deck of cards: Ace->King and four suits
  - Social Security Numbers
  - Employee IDs
• We will examine three algorithms which under certain conditions can run in O(n) time.
  - Counting sort
  - Radix sort
  - Bucket sort

Counting Sort

• Depends on assumption about the numbers being sorted
  - Assume numbers are in the range 1..k
• The algorithm:
  - Input: A[1..n], where A[j] ∈ {1, 2, 3, …, k}
  - Output: B[1..n], sorted (not sorted in place)
  - Also: Array C[1..k] for auxiliary storage

    CountingSort(A, B, k)
        for i=1 to k
            C[i] = 0;
        for j=1 to n
            C[A[j]] += 1;          // This is called a histogram.
        for i=2 to k
            C[i] = C[i] + C[i-1];
        for j=n downto 1
            B[C[A[j]]] = A[j];
            C[A[j]] -= 1;

Counting Sort Example

Counting Sort

    CountingSort(A, B, k)
        for i=1 to k               // Takes time O(k)
            C[i] = 0;
        for j=1 to n               // Takes time O(n)
            C[A[j]] += 1;
        for i=2 to k               // Takes time O(k)
            C[i] = C[i] + C[i-1];
        for j=n downto 1           // Takes time O(n)
            B[C[A[j]]] = A[j];
            C[A[j]] -= 1;

What is the running time?
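The CountingSort pseudocode can be sketched in Python as follows. This is a minimal adaptation under stated assumptions: Python lists are 0-based, so slot 0 of the count array is left unused to mirror C[1..k], and the `key` parameter is a hypothetical addition (not in the slides) so that records can be sorted by an integer key.

```python
def counting_sort(items, k, key=lambda x: x):
    """Counting sort for items whose key(item) is an integer in 1..k.

    Returns a new list; the input is not modified (not an in-place sort).
    """
    count = [0] * (k + 1)                 # count[0] unused, mirrors C[1..k]
    for item in items:
        v = key(item)
        if not 1 <= v <= k:
            raise ValueError(f"key {v!r} is outside the range 1..{k}")
        count[v] += 1                     # histogram of key values
    for i in range(2, k + 1):
        count[i] += count[i - 1]          # count[i] = number of keys <= i
    out = [None] * len(items)
    for item in reversed(items):          # j = n downto 1
        v = key(item)
        out[count[v] - 1] = item          # -1 converts 1-based position to index
        count[v] -= 1
    return out


if __name__ == "__main__":
    print(counting_sort([4, 1, 3, 4, 3], k=4))
    records = [(2, "a"), (1, "b"), (2, "c")]
    print(counting_sort(records, k=2, key=lambda r: r[0]))
```

The second call sorts records by their first field only, which makes it easy to see where records with equal keys end up relative to each other.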

Counting Sort

• Total time: O(n + k)
  - Works well if k = O(n) or k = O(1)
• This algorithm / implementation is stable.
  - A sorting algorithm is stable when numbers with the same values appear in the output array in the same order as they do in the input array.

Counting Sort

• Why don't we always use counting sort?
  - Depends on range k of elements.
• Could we use counting sort to sort 32-bit integers? Why or why not?

Counting Sort Review

• Assumption: input taken from small set of numbers of size k
• Basic idea:
  - For each element, count the number of elements less than or equal to it.
  - This gives the position of that number, similar to selection sort.
• Pros:
  - Fast
  - Asymptotically fast: O(n+k)
  - Simple to code
• Cons:
  - Doesn't sort in place.
  - Elements must be integers (countable).
  - Requires O(n+k) extra storage.

Radix Sort

• How did IBM get rich originally?
• Answer: punched card readers for census tabulation in early 1900's.
  - In particular, a card sorter that could sort cards into different bins
    - Each column can be punched in 12 places
    - Decimal digits use 10 places
  - Problem: only one column can be sorted on at a time

Radix Sort

• Intuitively, you might sort on the most significant digit, then the second msd, etc.
• Problem: lots of intermediate piles of cards (read: scratch arrays) to keep track of
• Key idea: sort the least significant digit first

    RadixSort(A, d)
        for i=1 to d
            StableSort(A) on digit i

Radix Sort Example

Radix Sort Correctness

• Sketch of an inductive proof of correctness (induction on the number of passes):
  - Assume lower-order digits {j: j < i} are sorted

Radix Sort

• What sort is used to sort on digits?
• Counting sort is obvious choice:
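A minimal Python sketch of least-significant-digit radix sort with a stable counting pass per digit is given below. Several details are assumptions made for illustration and do not come from the slides: keys are non-negative integers, a "digit" is a group of `bits_per_digit` bits (a hypothetical parameter), and the counting pass scans forward using starting offsets, which is an equivalent stable variant of the backward scan in the CountingSort pseudocode.

```python
def radix_sort(nums, bits_per_digit=8):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    a = list(nums)
    if any(x < 0 for x in a):
        raise ValueError("this sketch only handles non-negative integers")
    if not a:
        return a
    base = 1 << bits_per_digit            # number of distinct digit values
    mask = base - 1
    max_val = max(a)
    shift = 0
    while (max_val >> shift) > 0:         # one pass per digit that is in use
        # Histogram of the current digit.
        count = [0] * base
        for x in a:
            count[(x >> shift) & mask] += 1
        # Turn counts into starting offsets (exclusive prefix sums).
        total = 0
        for d in range(base):
            count[d], total = total, total + count[d]
        # Stable placement: equal digits keep their current relative order.
        out = [0] * len(a)
        for x in a:
            d = (x >> shift) & mask
            out[count[d]] = x
            count[d] += 1
        a = out
        shift += bits_per_digit
    return a


if __name__ == "__main__":
    print(radix_sort([329, 457, 657, 839, 436, 720, 355]))
```

Stability of each pass is what makes the result correct: ties on the current digit are left in the order established by the lower-order passes.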

Radix Sort

• Problem: sort 1 million 64-bit numbers
  - Treat as four-digit radix 2^16 numbers
  - Can sort in just four passes with radix sort!
• Performs well compared to typical O(n lg n) comparison sort
  - Approx lg(1,000,000) ≅ 20 comparisons per number being sorted

Radix Sort Review

• Assumption: each input number has d digits, each ranging from 0 to k
• Basic idea:
  - Sort elements by digit starting with least significant
  - Use a stable sort (like counting sort) for each stage
• Pros:
  - Fast
  - Asymptotically fast (i.e., O(n) when d is constant and k=O(n))
  - Simple to code
  - A good choice
• Cons:
  - Doesn't sort in place
  - Not a good choice for floating point numbers or arbitrary strings.

Bucket Sort

Assumption: input elements distributed uniformly over some known range, e.g., [0,1), so all elements in A are greater than or equal to 0 but less than 1. (Appendix C.2 has definition of uniform distribution)

    Bucket-Sort(A)
        n = length[A]
        for i = 1 to n
            do insert A[i] into list B[⌊n·A[i]⌋]
        for i = 0 to n-1
            do sort list B[i] with Insertion-Sort
        Concatenate lists B[0], B[1], …, B[n-1]

Bucket Sort

    Bucket-Sort(A, x, y)
        1. divide interval [x, y) into n equal-sized subintervals (buckets)
        2. distribute the n input keys into the buckets
        3. sort the numbers in each bucket (e.g., with insertion sort)
        4. scan the (sorted) buckets in order and produce output array

Running time of bucket sort: O(n) expected time
    Step 1: O(1) for each interval = O(n) time total.
    Step 2: O(n) time.
    Step 3: The expected number of elements in each bucket is O(1) (see book for formal argument, section 8.4), so total is O(n)
    Step 4: O(n) time to scan the n buckets containing a total of n input elements
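The four steps of Bucket-Sort(A, x, y) can be sketched in Python as below. This is an illustrative sketch, not the slides' code: the defaults x = 0.0 and y = 1.0, the range check, and the clamp on the bucket index are assumptions added here, and the helper `_insertion_sort` is a hypothetical name.

```python
def _insertion_sort(bucket):
    """Sort a small list in place; buckets hold O(1) elements in expectation."""
    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key


def bucket_sort(a, x=0.0, y=1.0):
    """Bucket sort for numbers in the half-open interval [x, y)."""
    n = len(a)
    if n == 0:
        return []
    # Step 1: n equal-sized subintervals of [x, y).
    buckets = [[] for _ in range(n)]
    width = (y - x) / n
    # Step 2: distribute the keys into the buckets.
    for v in a:
        if not x <= v < y:
            raise ValueError(f"{v!r} is outside [{x}, {y})")
        i = min(int((v - x) / width), n - 1)   # clamp guards against rounding
        buckets[i].append(v)
    # Steps 3 and 4: sort each bucket, then concatenate in order.
    out = []
    for b in buckets:
        _insertion_sort(b)
        out.extend(b)
    return out


if __name__ == "__main__":
    print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```

The result is correct for any input in range; the uniform-distribution assumption only affects the running time, since a skewed input can pile many keys into one bucket and make the insertion sort step quadratic.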

Bucket Sort Example

Bucket Sort Review

• Assumption: input is uniformly distributed across a range
• Basic idea:
  - Partition the range into a fixed number of buckets.
  - Toss each element into its appropriate bucket.
  - Sort each bucket.
• Pros:
  - Fast
  - Asymptotically fast (i.e., O(n) when distribution is uniform)
  - Simple to code
  - Good for a rough sort.
• Cons:
  - Doesn't sort in place

Summary of Linear Sorting

Non-Comparison Based Sorts

                 Running Time
                 worst-case      average-case    best-case       in place
Counting Sort    O(n + k)        O(n + k)        O(n + k)        no
Radix Sort       O(d(n + k'))    O(d(n + k'))    O(d(n + k'))    no
Bucket Sort                      O(n)                            no

Counting sort assumes input elements are in range [0, 1, 2, .., k] and uses array indexing to count the number of occurrences of each value.

Radix sort assumes each integer consists of d digits, and each digit is in range [1, 2, .., k'].

Bucket sort requires advance knowledge of input distribution (sorts n numbers uniformly distributed in range in O(n) time).
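As a rough worked example of these bounds, take the figures used earlier in the slides: n = 1,000,000 numbers of 64 bits, treated as d = 4 digits of radix k' = 2^16. The arithmetic below ignores the constant factors hidden by the O-notation, so it is only an order-of-magnitude comparison, not a timing prediction. The last line shows the size of the count array that counting sort would need if applied directly to 32-bit keys.

```latex
\begin{align*}
\text{Radix sort: } d\,(n + k') &= 4\,(10^{6} + 2^{16}) = 4 \times 1{,}065{,}536 = 4{,}262{,}144 \\
\text{Comparison sort: } n \lg n &\approx 10^{6} \times 20 = 2 \times 10^{7} \\
\text{Counting sort on 32-bit keys: } k &= 2^{32} = 4{,}294{,}967{,}296 \text{ counters}
\end{align*}
```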