Sorting

October 18, 2017


Sorting is a process that organizes a collection of data into either ascending or descending order. An internal sort requires that the collection of data fit entirely in the computer’s main memory. We can use an external sort when the collection of data cannot fit in the computer’s main memory all at once but must reside in secondary storage such as on a disk (or tape). We will analyze only internal sorting algorithms.

Why Sorting?

Any significant amount of computer output is generally arranged in some sorted order so that it can be interpreted. Sorting also has indirect uses: an initial sort of the data can significantly enhance the performance of an algorithm. The majority of programming projects use a sort somewhere, and in many cases the sorting cost determines the running time. A comparison-based sorting algorithm makes ordering decisions only on the basis of comparisons.

Sorting Algorithms

There are many sorting algorithms, such as Quick Sort, Heap Sort, and Shell Sort. The first three are the foundations for faster and more efficient algorithms.

Insertion Sort

Insertion sort is a simple sorting algorithm that is appropriate for small inputs. It is the most common sorting technique used by card players. The list is divided into two parts: sorted and unsorted. In each pass, the first element of the unsorted part is picked up, transferred to the sorted sublist, and inserted at the appropriate place. A list of n elements will take at most n − 1 passes to sort the data.

Insertion Sort Example

Insertion Sort Algorithm

// Simple insertion sort.
template <typename Comparable>
void insertionSort( vector<Comparable> & a )
{
    for( int p = 1; p < a.size( ); ++p )
    {
        Comparable tmp = std::move( a[ p ] );

        int j;
        for( j = p; j > 0 && tmp < a[ j - 1 ]; --j )
            a[ j ] = std::move( a[ j - 1 ] );
        a[ j ] = std::move( tmp );
    }
}

Insertion Sort – Analysis

Running time depends not only on the size of the array but also on its contents.
Best-case: O(n)
The array is already sorted in ascending order, so the inner loop is not executed.
The number of moves: 2 × (n − 1) → O(n)
The number of key comparisons: (n − 1) → O(n)
Worst-case: O(n²)
The array is in reverse order, so the inner loop is executed i − 1 times, for i = 2, 3, ..., n.
The number of moves: 2 × (n − 1) + (1 + 2 + ··· + (n − 1)) = 2 × (n − 1) + n × (n − 1)/2 → O(n²)
The number of key comparisons: 1 + 2 + ··· + (n − 1) = n × (n − 1)/2 → O(n²)
Average-case: O(n²)
We have to look at all possible initial data organizations.
So, Insertion Sort is O(n²).

Analysis of insertion sort

Which running time will be used to characterize this algorithm: best, worst, or average?
Worst case: the longest running time (this is the upper limit for the algorithm). It is guaranteed that the algorithm will not be worse than this.
Sometimes we are interested in the average case, but there are some problems with it: it is difficult to figure out the average case, i.e., what is an average input? Are we going to assume all possible inputs are equally likely?
In fact, for most algorithms the average case is the same as the worst case.

A lower bound for simple sorting algorithms

An inversion: an ordered pair (A_i, A_j) such that i < j but A_i > A_j.
Example: 10, 6, 7, 15, 3, 1
Inversions (11 in total): (10,6), (10,7), (10,3), (10,1), (6,3), (6,1), (7,3), (7,1), (15,3), (15,1), (3,1)
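To make the definition concrete, here is a small brute-force counter (an illustrative sketch, not part of the original slides); called on the example above it returns 11, one per listed pair.

#include <vector>

// Count pairs (i, j) with i < j and a[i] > a[j], i.e., the inversions.
// O(N^2), which is fine for illustrating the definition.
int countInversions( const std::vector<int> & a )
{
    int count = 0;
    for( std::size_t i = 0; i < a.size( ); ++i )
        for( std::size_t j = i + 1; j < a.size( ); ++j )
            if( a[ i ] > a[ j ] )
                ++count;
    return count;
}

// countInversions( { 10, 6, 7, 15, 3, 1 } ) == 11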

Swapping

Swapping adjacent elements that are out of order removes one inversion. A sorted array has no inversions. Sorting an array that contains i inversions requires at least i swaps of adjacent elements.

Theorems

Theorem 1: The average number of inversions in an array of N distinct elements is N(N − 1)/4.
Theorem 2: Any algorithm that sorts by exchanging adjacent elements requires Ω(N²) time on average.
For a sorting algorithm to run in less than quadratic time, it must do something other than swap adjacent elements.

Mergesort

The mergesort algorithm is one of the two important divide-and-conquer sorting algorithms (the other one is quicksort). It is a recursive algorithm:
Divides the list into halves,
Sorts each half separately, and
Then merges the sorted halves into one sorted array.

Merge Sort Example

Mergesort

/**
 * Mergesort algorithm (driver).
 */
template <typename Comparable>
void mergeSort( vector<Comparable> & a )
{
    vector<Comparable> tmpArray( a.size( ) );

    mergeSort( a, tmpArray, 0, a.size( ) - 1 );
}

Mergesort (Cont.)

/**
 * Internal method that makes recursive calls.
 * a is an array of Comparable items.
 * tmpArray is an array to place the merged result.
 * left is the left-most index of the subarray.
 * right is the right-most index of the subarray.
 */
template <typename Comparable>
void mergeSort( vector<Comparable> & a, vector<Comparable> & tmpArray,
                int left, int right )
{
    if( left < right )
    {
        int center = ( left + right ) / 2;
        mergeSort( a, tmpArray, left, center );
        mergeSort( a, tmpArray, center + 1, right );
        merge( a, tmpArray, left, center + 1, right );
    }
}

Merge

/**
 * Internal method that merges two sorted halves of a subarray.
 * a is an array of Comparable items.
 * tmpArray is an array to place the merged result.
 * leftPos is the left-most index of the subarray.
 * rightPos is the index of the start of the second half.
 * rightEnd is the right-most index of the subarray.
 */
template <typename Comparable>
void merge( vector<Comparable> & a, vector<Comparable> & tmpArray,
            int leftPos, int rightPos, int rightEnd )
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;

    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd )
        if( a[ leftPos ] <= a[ rightPos ] )
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );

    while( leftPos <= leftEnd )    // Copy rest of first half
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );

    while( rightPos <= rightEnd )  // Copy rest of right half
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );

    // Copy tmpArray back
    for( int i = 0; i < numElements; ++i, --rightEnd )
        a[ rightEnd ] = std::move( tmpArray[ rightEnd ] );
}

Merge Sort Example


Mergesort – Analysis of Merge

A worst-case instance of the merge step in mergesort

Mergesort – Analysis of Merge (cont.)

Merging two sorted arrays of size k:
Best-case: all the elements in the first array are smaller (or larger) than all the elements in the second array.
The number of moves: 2k + 2k
The number of key comparisons: k
Worst-case:
The number of moves: 2k + 2k
The number of key comparisons: 2k − 1

Mergesort - Analysis

Levels of recursive calls to mergesort, given an array of eight items


Mergesort - Analysis

Worst-case – The number of key comparisons:
= 2^0 × (2 × 2^(m−1) − 1) + 2^1 × (2 × 2^(m−2) − 1) + ... + 2^(m−1) × (2 × 2^0 − 1)
= (2^m − 1) + (2^m − 2) + ... + (2^m − 2^(m−1))   (m terms)
= m × 2^m − Σ_{i=0}^{m−1} 2^i
= m × 2^m − 2^m + 1
Using m = log₂n:
= n × log₂n − n + 1 → O(n log₂n)

Mergesort - Analysis

Mergesort is an extremely efficient algorithm with respect to time.

Both the worst case and the average case are O(n log₂n).
But mergesort requires an extra array whose size equals the size of the original array.
If we use a linked list, we do not need an extra array,
but we need space for the links,
and it will be difficult to divide the list in half (O(n)).

Mergesort for Linked Lists

Merge sort is often preferred for sorting a linked list. The slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.
MergeSort:

1. If head is NULL or there is only one element in the linked list, then return.
2. Else divide the linked list into two halves.
3. Sort the two halves a and b: MergeSort(&first); MergeSort(&second);
4. Merge the two parts of the list into a sorted one: *head = Merge(first, second);

Mergesort for linked lists

#include <iostream>
using namespace std;

// Link list node
typedef struct Node* listpointer;
struct Node {
    int data;
    listpointer next;
};

// function prototypes

listpointer SortedMerge(listpointer a, listpointer b);

void FrontBackSplit(listpointer source, listpointer* frontRef, listpointer* backRef);

// sorts the linked list by changing next pointers (not data)

void MergeSort(listpointer* headRef)
{
    listpointer head = *headRef;
    listpointer a;
    listpointer b;

    // Base case -- length 0 or 1
    if ((head == NULL) || (head->next == NULL)) {
        return;
    }

    // Split head into 'a' and 'b' sublists
    FrontBackSplit(head, &a, &b);

    // Recursively sort the sublists
    MergeSort(&a);
    MergeSort(&b);

    // answer = merge the two sorted lists together
    *headRef = SortedMerge(a, b);
}

Mergesort for linked lists (cont.)

listpointer SortedMerge(listpointer a, listpointer b)
{
    listpointer result = NULL;

    // Base cases
    if (a == NULL)
        return (b);
    else if (b == NULL)
        return (a);

    // Pick either a or b, and make a recursive call
    if (a->data <= b->data) {
        result = a;
        result->next = SortedMerge(a->next, b);
    }
    else {
        result = b;
        result->next = SortedMerge(a, b->next);
    }
    return (result);
}

Mergesort for linked lists (cont.)

// Split the nodes of the given list into front and back halves,
// and return the two lists using the reference parameters.
// If the length is odd, the extra node should go in the front list.
// Uses the fast/slow pointer strategy.
void FrontBackSplit(listpointer source, listpointer* frontRef, listpointer* backRef)
{
    listpointer fast;
    listpointer slow;
    if (source == NULL || source->next == NULL) { // length < 2 cases
        *frontRef = source;
        *backRef = NULL;
    }
    else {
        slow = source;
        fast = source->next;

        // Advance 'fast' two nodes, and advance 'slow' one node
        while (fast != NULL) {
            fast = fast->next;
            if (fast != NULL) {
                slow = slow->next;
                fast = fast->next;
            }
        }

        // 'slow' is before the midpoint in the list, so split it in two
        // at that point.
        *frontRef = source;
        *backRef = slow->next;
        slow->next = NULL;
    }
}

Mergesort for linked lists (cont.)

// Function to print nodes in a given linked list
void printList(listpointer node)
{
    while (node != NULL) {
        cout << node->data << " ";
        node = node->next;
    }
}

// Function to insert a node at the beginning of the linked list

void push(listpointer* head_ref, int new_data)
{
    // allocate node
    listpointer new_node = new Node;

    // put in the data
    new_node->data = new_data;

    // link the old list off the new node
    new_node->next = (*head_ref);

    // move the head to point to the new node
    (*head_ref) = new_node;
}

Mergesort for linked lists (cont.)

// Driver program to test above functions
int main()
{
    // Start with the empty list
    listpointer a = NULL;
    int n, num;

    // Let us create an unsorted linked list to test the functions
    cout << "Enter the number of elements: ";
    cin >> n;

    // Create linked list.
    for (int i = 0; i < n; i++) {
        cin >> num;
        push(&a, num);
    }

    // Sort the above created Linked List
    MergeSort(&a);

    cout << endl << "Sorted Linked List is:" << endl;
    printList(a);

    return 0;
}

Quicksort

Like mergesort, quicksort is also based on the divide-and-conquer paradigm. But it uses this technique in a somewhat opposite manner, as all the hard work is done before the recursive calls. It works as follows:
1. First, it partitions an array into two parts,
2. Then, it sorts the parts independently,
3. Finally, it combines the sorted subsequences by a simple concatenation.

Quicksort

Algorithm 1 Quicksort
1: Let S be the input set.
2: if |S| = 0 or |S| = 1 then return
3: Pick an element v in S. Call v the pivot.
4: Partition S − {v} into two disjoint groups:
   S1 = {x ∈ S − {v} | x ≤ v}
   S2 = {x ∈ S − {v} | x ≥ v}
5: return { quicksort(S1), v, quicksort(S2) }

Quicksort Illustrated

Issues To Consider

How to pick the pivot? Many methods (discussed later).
How to partition? Several methods exist. The one we consider is known to give good results and to be easy and efficient.
We discuss the partition strategy first.

Partitioning Strategy

For now, assume that pivot = A[(left+right)/2]. We want to partition array A[left .. right]. First, get the pivot element out of the way by swapping it with the last element (swap pivot and A[right]). Let i start at the first element and j start at the next-to-last element (i = left, j = right – 1)

Partitioning Strategy (Cont.)

We want to have:
A[k] ≤ pivot, for k < i
A[k] ≥ pivot, for k > j

When i < j:
Move i right, skipping over elements smaller than the pivot.
Move j left, skipping over elements greater than the pivot.
When both i and j have stopped:
A[i] ≥ pivot and A[j] ≤ pivot ⇒ A[i] and A[j] should now be swapped.

Partitioning Strategy (Cont.)

When i and j have stopped and i is to the left of j (thus legal):
Swap A[i] and A[j]: the large element is pushed to the right and the small element is pushed to the left.
After swapping: A[i] ≤ pivot and A[j] ≥ pivot.
Repeat the process until i and j cross.

Partitioning Strategy (Cont.)

When i and j have crossed: swap A[i] and the pivot.
Result:
A[k] ≤ pivot, for k < i
A[k] ≥ pivot, for k > i

Pivot Strategies

First element:
Bad choice if input is sorted or in reverse sorted order.
Bad choice if input is nearly sorted.
Random:
Even a malicious agent cannot arrange a bad input.
Median-of-three:
Choose the median of the left, right, and center elements.

Median of Three


// Return median of left, center, and right.
// Order these and hide the pivot.
template <typename Comparable>
const Comparable & median3( vector<Comparable> & a, int left, int right )
{
    int center = ( left + right ) / 2;

    if( a[ center ] < a[ left ] )
        std::swap( a[ left ], a[ center ] );
    if( a[ right ] < a[ left ] )
        std::swap( a[ left ], a[ right ] );
    if( a[ right ] < a[ center ] )
        std::swap( a[ center ], a[ right ] );

    // Place pivot at position right - 1
    std::swap( a[ center ], a[ right - 1 ] );
    return a[ right - 1 ];
}

Discussion

Small arrays: quicksort is slower than insertion sort when N is small (say, N ≤ 20).
Optimization: make |S| ≤ 20 the base case and call insertion sort.

Quicksort algorithm (driver)

template <typename Comparable>
void quicksort( vector<Comparable> & a )
{
    quicksort( a, 0, a.size( ) - 1 );
}

Quicksort algorithm (recursive)

// Uses median-of-three partitioning and a cutoff of 20.
// a is an array of Comparable items.
// left is the left-most index of the subarray.
// right is the right-most index of the subarray.
template <typename Comparable>
void quicksort( vector<Comparable> & a, int left, int right )
{
    if( left + 20 <= right )
    {
        const Comparable & pivot = median3( a, left, right );

        // Begin partitioning
        int i = left, j = right - 1;
        for( ; ; )
        {
            while( a[ ++i ] < pivot ) { }
            while( pivot < a[ --j ] ) { }
            if( i < j )
                std::swap( a[ i ], a[ j ] );
            else
                break;
        }

        std::swap( a[ i ], a[ right - 1 ] );  // Restore pivot

        quicksort( a, left, i - 1 );          // Sort small elements
        quicksort( a, i + 1, right );         // Sort large elements
    }
    else  // Do an insertion sort on the subarray
        insertionSort( a, left, right );
}

Analysis of Quicksort

Worst case: the pivot is the smallest (or largest) element all the time.
T(N) = T(N − 1) + cN
T(N − 1) = T(N − 2) + c(N − 1)
T(N − 2) = T(N − 3) + c(N − 2)
...
T(2) = T(1) + c(2)
Adding these: T(N) = T(1) + c Σ_{i=2}^{N} i → O(N²)
Best case: the pivot is the median.
T(N) = 2T(N/2) + cN
T(N) = cNlogN + N → O(NlogN)

Quicksort: Average case

Assume each of the possible sizes for S1 is equally likely: 0 ≤ |S1| ≤ N − 1.

T(N) = (1/N) Σ_{i=0}^{N−1} [T(i) + T(N − i − 1)] + cN
T(N) = (2/N) Σ_{i=0}^{N−1} T(i) + cN
N T(N) = 2 Σ_{i=0}^{N−1} T(i) + cN²
(N − 1) T(N − 1) = 2 Σ_{i=0}^{N−2} T(i) + c(N − 1)²
Subtracting: N T(N) − (N − 1) T(N − 1) = 2T(N − 1) + 2cN − c
N T(N) = (N + 1) T(N − 1) + 2cN
Divide the equation by N(N + 1):
T(N)/(N + 1) = T(N − 1)/N + 2c/(N + 1)

Quicksort: Average case (Cont.)

T(N − 1)/N = T(N − 2)/(N − 1) + 2c/N
T(N − 2)/(N − 1) = T(N − 3)/(N − 2) + 2c/(N − 1)
...
T(2)/3 = T(1)/2 + 2c/3
Summing (the terms telescope):
T(N)/(N + 1) = T(1)/2 + 2c Σ_{i=3}^{N+1} 1/i
Σ_{i=3}^{N+1} 1/i = H_{N+1} − 3/2
T(N) = (N + 1)(T(1)/2 + 2c(H_{N+1} − 3/2))
H_N ≈ log_e(N) + γ + 1/(2N), where γ = 0.577215664901... (Euler–Mascheroni constant)
T(N) ≈ (N + 1)[T(1)/2 + 2c(log_e(N + 1) + γ + 1/(2(N + 1)) − 3/2)]
T(N) → O(NlogN)

Heapsort

A priority queue can be used to sort N items: insert every item into a binary heap and extract every item by calling deleteMin N times, which yields the items in sorted order. An algorithm based on this idea is heapsort. It is an O(NlogN) worst-case sorting algorithm.

Heapsort

The main problem with this algorithm is that it uses an extra array for the items exiting the heap. We can avoid this problem as follows: After each deleteMin, the heap shrinks by 1. Thus the cell that was last in the heap can be used to store the element that was just deleted. Using this strategy, after the last deleteMin, the array will contain all elements in decreasing order. If we want them in increasing order we must use a max heap.

Heapsort Example

Max heap after the buildHeap phase for the input sequence 59, 36, 58, 21, 41, 97, 31, 16, 26, 53

Heapsort Example (Cont.)

Heap after the first deleteMax operation

Heapsort Example (Cont.)

Heap after the second deleteMax operation

Implementation

In the implementation of heapsort, the ADT BinaryHeap is not used. Everything is done in an array, and the root is stored in position 0. Thus there are some minor changes in the code:
Since we use a max heap, the logic of comparisons is changed from > to <.
For a node in position i, the parent is in position (i − 1)/2, the left child is in 2i + 1, and the right child is next to the left child.
Percolating down needs the current heap size, which is lowered by 1 at every deletion.

The Heapsort Algorithm

// Standard heapsort.
template <typename Comparable>
void heapsort( vector<Comparable> & a )
{
    for( int i = a.size( ) / 2 - 1; i >= 0; --i )  // buildHeap
        percDown( a, i, a.size( ) );
    for( int j = a.size( ) - 1; j > 0; --j )
    {
        std::swap( a[ 0 ], a[ j ] );               // deleteMax
        percDown( a, 0, j );
    }
}

percDown Algorithm

// Internal method for heapsort.
// i is the index of an item in the heap.
// Returns the index of the left child.
inline int leftChild( int i )
{
    return 2 * i + 1;
}

// Internal method for heapsort that is used in
// deleteMax and buildHeap.
// i is the position from which to percolate down.
// n is the logical size of the binary heap.
template <typename Comparable>
void percDown( vector<Comparable> & a, int i, int n )
{
    int child;
    Comparable tmp;

    for( tmp = std::move( a[ i ] ); leftChild( i ) < n; i = child )
    {
        child = leftChild( i );
        if( child != n - 1 && a[ child ] < a[ child + 1 ] )
            ++child;
        if( tmp < a[ child ] )
            a[ i ] = std::move( a[ child ] );
        else
            break;
    }
    a[ i ] = std::move( tmp );
}

Analysis of Heapsort

It is an O(NlogN) algorithm.
First phase: build heap, O(N).
Second phase: N deleteMax operations, O(NlogN).
Detailed analysis shows that the average case for heapsort is poorer than that of quicksort; quicksort's worst case, however, is far worse.
An average-case analysis of heapsort is very complicated, but empirical studies show that there is little difference between the average and worst cases.
Heapsort usually takes about twice as long as quicksort. Heapsort therefore should be regarded as something of an insurance policy: on average it is more costly, but it avoids the possibility of O(N²).

How fast can we sort?

Heapsort, Mergesort, and Quicksort all run in O(NlogN) best case running time. Can we do any better?

The Answer is No! (if using comparisons only)

Our basic assumption: we can only compare two elements at a time. How does this limit the run time?
Suppose you are given N elements.
Assume no duplicates: any sorting algorithm must also work for this case.
How many possible orderings can you get?

How many possible orderings?

Example: a, b, c (N = 3)
Orderings:
1. a b c
2. b c a
3. c a b
4. a c b
5. b a c
6. c b a
6 orderings = 3 × 2 × 1 = 3!
For N elements: N! orderings
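The 3! = 6 orderings above can be enumerated directly with std::next_permutation; a small illustrative program (not from the slides):

#include <algorithm>
#include <iostream>
#include <string>

int main( )
{
    std::string s = "abc";      // start from the sorted ordering
    int count = 0;
    do
    {
        std::cout << s << '\n'; // prints each of the 6 orderings once
        ++count;
    } while( std::next_permutation( s.begin( ), s.end( ) ) );
    std::cout << count << " orderings = 3!" << '\n';
    return 0;
}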

A Decision Tree

Leaves contain possible orderings of a, b, c

Decision Trees and Sorting

A decision tree is a binary tree such that:
Each node = a set of orderings
Each edge = 1 comparison
Each leaf = 1 unique ordering
How many leaves for N distinct elements? Only 1 leaf has the sorted ordering.
Each sorting algorithm corresponds to a decision tree: it finds the correct leaf by following edges (= comparisons).
Run time ≥ maximum number of comparisons, which depends on the depth of the decision tree.
What is the depth of a decision tree for N distinct elements?

Lower Bound on Comparison-Based Sorting

Suppose you have a binary tree of depth d. How many leaves can the tree have? e.g. depth d = 1 → at most 2 leaves, d = 2 → at most 4 leaves, etc.

Lower Bound on Comparison-Based Sorting

A binary tree of depth d has at most 2^d leaves.
Number of leaves L ≤ 2^d → d ≥ log L
The decision tree has L = N! leaves → its depth d ≥ log(N!)

Lower Bound on Comparison-Based Sorting

Stirling’s approximation: N! ≈ √(2πN) (N/e)^N
log(N!) ≈ log(√(2πN) (N/e)^N)
= log(√(2πN)) + log((N/e)^N)
= (1/2) log(2πN) + N(log(N) − 1) → Ω(NlogN)
Conclusion: any sorting algorithm based on comparisons between elements requires Ω(NlogN) comparisons.

Comparison of Sorting Algorithms

Algorithm        Worst case   Average case
Selection sort   O(N²)        O(N²)
Bubble sort      O(N²)        O(N²)
Insertion sort   O(N²)        O(N²)
Mergesort        O(NlogN)     O(NlogN)
Quicksort        O(N²)        O(NlogN)
Radix sort       O(N)         O(N)
Treesort         O(N²)        O(NlogN)
Heapsort         O(NlogN)     O(NlogN)

Sorting in linear time

Comparison sorts have a lower bound of Ω(nlogn).
Non-comparison sorts: bucket sort, radix sort. They can sort in linear time (under certain assumptions).

Bucket Sort

Assumption: uniform distribution.
Input numbers are uniformly distributed in [0, 1).
Suppose the input size is n.
Idea:
Divide [0, 1) into n equal-sized subintervals (buckets).
Distribute the n numbers into the buckets.
Expect that each bucket contains few numbers.
Sort the numbers in each bucket (insertion sort by default).
Then go through the buckets in order, listing the elements.

Bucket Sort Algorithm

Algorithm 2 BucketSort(A)
1: n ← length[A]
2: for i ← 1 to n do insert A[i] into bucket B[⌊n·A[i]⌋]
3: for i ← 0 to n − 1 do sort bucket B[i] using insertion sort
4: Concatenate buckets B[0], B[1], ..., B[n−1]
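A minimal C++ sketch of this pseudocode, assuming the inputs really are in [0, 1); it is an illustration rather than the course's reference code, and it sorts each bucket with std::sort instead of an explicit insertion sort.

#include <algorithm>
#include <vector>
using std::vector;

// Bucket sort for values uniformly distributed in [0, 1).
void bucketSort( vector<double> & a )
{
    int n = a.size( );
    if( n == 0 )
        return;
    vector<vector<double>> buckets( n );

    // Distribute: a value x in [0, 1) goes to bucket floor(n * x).
    for( double x : a )
        buckets[ static_cast<int>( n * x ) ].push_back( x );

    // Sort each bucket and concatenate the buckets back into a.
    int pos = 0;
    for( auto & b : buckets )
    {
        std::sort( b.begin( ), b.end( ) );
        for( double x : b )
            a[ pos++ ] = x;
    }
}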


Analysis of Bucket Sort Algorithm

Algorithm 3 BucketSort(A)
1: n ← length[A]                                                 Ω(1)
2: for i ← 1 to n do                                             O(n)
       insert A[i] into bucket B[⌊n·A[i]⌋]                        Ω(1) each (i.e., O(n) total)
3: for i ← 0 to n − 1 do                                         O(n)
       sort bucket B[i] using insertion sort                      O(n_i²)
4: Concatenate buckets B[0], B[1], ..., B[n−1]                    O(n)

where n_i is the size of bucket B[i].
Thus T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i²) = Θ(n) + n·O(2 − 1/n) = Θ(n), since E[n_i²] = 2 − 1/n for uniformly distributed input.
Better than Ω(nlogn).
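The n·O(2 − 1/n) step uses the expected squared bucket size. For completeness, a short derivation under the uniform-distribution assumption (a standard argument, not spelled out on the slide):

Let $X_j$ indicate that element $j$ falls into bucket $i$, so $n_i = \sum_{j=1}^{n} X_j$ and $\Pr[X_j = 1] = 1/n$. Then
\[
E[n_i^2] = \sum_{j=1}^{n} E[X_j^2] + \sum_{j \neq k} E[X_j X_k]
         = n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2}
         = 2 - \frac{1}{n},
\]
so $\sum_{i=0}^{n-1} E[n_i^2] = n\,(2 - 1/n) = 2n - 1 = O(n)$.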

Radix Sort

Origin: Herman Hollerith’s card-sorting machine for the 1890 U.S. Census.
Digit-by-digit sort.
Hollerith’s original (bad) idea: sort on the most-significant digit first.
Good idea: sort on the least-significant digit first with an auxiliary stable sort.
Stable sort property: the relative order of any two items with the same key is preserved after the execution of the algorithm.

Radix Sort Algorithm

Algorithm 4 RadixSort(A, d)
1: for i ← 1 to d do
       use stable BucketSort to sort array A on digit i

Lemma: Given n d-digit numbers in which each digit can take on up to k possible values, RadixSort correctly sorts these numbers in Θ(d(n + k)) time. If d is constant and k = O(n), then time is Θ(n).
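A compact LSD radix sort sketch in C++ for non-negative integers, using a stable counting sort on base-10 digits (so k = 10 in the lemma). This is an illustration consistent with the algorithm above, not code from the slides.

#include <algorithm>
#include <vector>
using std::vector;

// Stable counting sort of a by the digit selected by exp (1, 10, 100, ...).
void countingSortByDigit( vector<int> & a, int exp )
{
    vector<int> output( a.size( ) );
    int count[ 10 ] = { 0 };

    for( int x : a )
        ++count[ ( x / exp ) % 10 ];
    for( int d = 1; d < 10; ++d )
        count[ d ] += count[ d - 1 ];                    // prefix sums give end positions
    for( int i = static_cast<int>( a.size( ) ) - 1; i >= 0; --i )
        output[ --count[ ( a[ i ] / exp ) % 10 ] ] = a[ i ];  // backward pass keeps the sort stable
    a = output;
}

// LSD radix sort: sort on the least-significant digit first.
void radixSort( vector<int> & a )
{
    if( a.empty( ) )
        return;
    int maxVal = *std::max_element( a.begin( ), a.end( ) );
    for( int exp = 1; maxVal / exp > 0; exp *= 10 )
        countingSortByDigit( a, exp );
}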

Radix Sort Example
