Practical Session #10 - Heaps, sort properties, QuickSort, Selection

Heap

A binary heap can be viewed as a complete binary tree: every level is full, except possibly the last, which is filled from the left up to a certain point.

Maximum-Heap: for each node x, x ≥ left(x) and x ≥ right(x). The root of a Maximum-Heap is the maximal element in the heap.

Minimum-Heap: for each node x, x ≤ left(x) and x ≤ right(x). The root of a Minimum-Heap is the minimal element in the heap.

Heap-Array: a heap can be represented using an array:

1. A[1] - the root of the heap (not A[0])
2. length[A] - the number of elements in the array
3. heap-size[A] - the number of elements in the heap represented by the array A
4. For each index i:

Parent(i) = ⌊i/2⌋
Left(i) = 2i
Right(i) = 2i+1

Actions on Heaps:
 Insert ~ O(log(n))
 Max ~ O(1)
 Extract-Max ~ O(log(n))
 Build-Heap ~ O(n)
 Down-Heapify(H, i) – same as MaxHeapify seen in class, but starting at the node at index i ~ O(log(n))
 Up-Heapify(H, i) – fixes the heap going from the node at index i up to the root, as in Insert() ~ O(log(n))
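As a quick reference, here is a minimal Python sketch of the array representation and the two heapify directions for a max-heap. The names down_heapify/up_heapify mirror the operations above; the heap is assumed to occupy H[1..heap_size], leaving H[0] unused.

# 1-indexed max-heap stored in a Python list; slot H[0] is unused.

def parent(i):
    return i // 2

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def down_heapify(H, i, heap_size):
    """Restore the max-heap property in the subtree rooted at index i,
    assuming both of its subtrees are already legal max-heaps."""
    while True:
        largest = i
        if left(i) <= heap_size and H[left(i)] > H[largest]:
            largest = left(i)
        if right(i) <= heap_size and H[right(i)] > H[largest]:
            largest = right(i)
        if largest == i:
            return
        H[i], H[largest] = H[largest], H[i]
        i = largest

def up_heapify(H, i):
    """Float H[i] up toward the root, as in Insert."""
    while i > 1 and H[parent(i)] < H[i]:
        H[i], H[parent(i)] = H[parent(i)], H[i]
        i = parent(i)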

Question 1

Given a maximum heap H, what is the time complexity of finding the 3 largest keys in H? What is the time complexity of finding the C largest keys in H? (C is a constant.) What if C is not constant?

Solution:

It takes O(1) to find the 3 maximal keys in H. The 3 maximal keys in H are:
 The root
 The root's maximal son y
 The largest among y's maximal son and y's sibling

Finding the C largest keys in H is done in a similar way: the i-th largest key is one of at most i candidates, namely the sons of the i-1 largest keys that have not been taken yet, and thus can be found in O(i) time. Finding the C largest keys therefore takes O(C²) = O(1). (Another solution would be ordering all the keys in the first C levels of the heap.)

If C is not constant we have to be more efficient. We will use a new auxiliary heap, in which we store nodes of the original heap; each node inserted into the new heap preserves access to its sons in the original heap. We first insert the root into the new heap. Then, C times, we do Extract-Max() on the new heap, and at each step we Insert the extracted node's original sons (its sons in the original heap) into the new heap. The logic is as before, only this time, since the new heap holds at most O(C) elements, we get O(C·log(C)). (Note that we do not change the original heap at any point.)
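A minimal Python sketch of this idea, assuming the original max-heap is given as a 1-indexed array H. The standard heapq module plays the auxiliary heap; heapq is a min-heap, so keys are negated to simulate a max-heap.

import heapq

def c_largest(H, heap_size, C):
    """Return the C largest keys of the 1-indexed max-heap H[1..heap_size]
    in O(C log C) time, without modifying H."""
    if heap_size < 1:
        return []
    aux = [(-H[1], 1)]                        # auxiliary heap, seeded with the root
    result = []
    while aux and len(result) < C:
        neg_key, i = heapq.heappop(aux)       # Extract-Max on the auxiliary heap
        result.append(-neg_key)
        for son in (2 * i, 2 * i + 1):        # the node's sons in the original heap
            if son <= heap_size:
                heapq.heappush(aux, (-H[son], son))
    return result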

Question 2

Analyze the complexity of finding the i smallest elements in a set of size n using each of the following:
 Sorting
 Priority Queue

Sorting Solution:

 Sort the numbers using an algorithm that takes O(n log n)
 Find the i smallest elements in O(i)
Total = O(i + n log n) = O(n log n)

Priority Queue Solution:

 Build a minimum heap in O(n)
 Execute Extract-Min i times in O(i log n)
Total = O(n + i log n). Using Question 1 this can be reduced to O(n + i log i).
Note: n + i log i = Θ(max(n, i log i)), and clearly n + i log n = O(n log n) (and faster if i ≠ Θ(n)).
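For illustration, a short Python version of the priority-queue solution, assuming the standard heapq module (heapify builds a min-heap in O(n)):

import heapq

def i_smallest(numbers, i):
    """Find the i smallest elements in O(n + i log n)."""
    heap = list(numbers)
    heapq.heapify(heap)                              # build a minimum heap: O(n)
    return [heapq.heappop(heap) for _ in range(i)]   # Extract-Min i times: O(i log n)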

Question 3

Given a heap H represented using an array, suggest an algorithm for extracting and returning the element at index i.

Solution:

Delete(H, i)
    if (heap_size(H) < i)
        error("heap underflow")
    key ← H[i]
    H[i] ← H[heap_size]    /* Delete the i'th element by replacing it with the last element */
    heap_size ← heap_size − 1    /* Effectively shrinking the heap */
    Down-Heapify(H, i)    /* Now the subtree rooted at H[i] is a legal heap */
    Up-Heapify(H, i)    /* And the area above H[i] is a legal heap */
    return key
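A runnable Python version of Delete, reusing the down_heapify/up_heapify helpers sketched in the Heap section above (returning the updated heap size is just a convenience of this sketch):

def delete(H, i, heap_size):
    """Extract and return the element at index i of the 1-indexed max-heap H[1..heap_size]."""
    if heap_size < i:
        raise IndexError("heap underflow")
    key = H[i]
    H[i] = H[heap_size]            # replace the i-th element with the last one
    heap_size -= 1                 # effectively shrink the heap
    down_heapify(H, i, heap_size)  # fixes the subtree rooted at H[i]
    up_heapify(H, i)               # fixes the area above H[i]
    return key, heap_size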

Question 4

You are given two min-heaps H1 and H2 with n1 and n2 keys respectively. Each element of H1 is smaller than each element of H2. How can you merge H1 and H2 into one heap H (with n1 + n2 keys) using only O(n2) time? (Assume both heaps are represented in arrays of size > n1 + n2.)

Solution

Case n1 ≥ n2: Add the keys of H2 to the array of H1, from index n1+1 to index n1+n2.

The resulting array represents a correct heap, by the following reasoning. Let l1 be the number of leaves in H1. If n1 is odd, every internal node has two sons, hence the number of internal nodes is l1 − 1 and there are 2l1 "openings" for new leaves:

l1 + (l1 − 1) = n1  ⟹  2l1 − 1 = n1  ⟹  2l1 > n1 ≥ n2

If n1 is even then there are l1 internal nodes (one of them with a single son) and 2l1 + 1 = n1 + 1 > n2 openings.

In both cases the number of openings is at least n2, so all keys from H2 are added as sons of former nodes of H1, i.e., as leaves. As all keys in H1 are smaller than any key in H2, the heap property still holds in the resulting array.

Case n1 < n2: Build a new heap with all the elements of H1 and H2 in O(n1 + n2) = O(2n2) = O(n2).
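A Python sketch of both cases, under the same assumptions as before (1-indexed arrays with slot 0 unused; min_down_heapify is a min-heap variant of the earlier down_heapify):

def min_down_heapify(H, i, size):
    """Min-heap variant of Down-Heapify on the 1-indexed array H[1..size]."""
    while True:
        smallest = i
        for son in (2 * i, 2 * i + 1):
            if son <= size and H[son] < H[smallest]:
                smallest = son
        if smallest == i:
            return
        H[i], H[smallest] = H[smallest], H[i]
        i = smallest

def merge_heaps(H1, n1, H2, n2):
    """Merge two min-heaps where every key of H1 is smaller than every key of H2."""
    if n1 >= n2:
        # concatenation: every appended key becomes a leaf whose parent is an
        # old node of H1, which holds a smaller key by assumption
        return H1[:n1 + 1] + H2[1:n2 + 1]
    # n1 < n2: rebuild from scratch with Build-Heap in O(n1 + n2) = O(n2)
    H = [None] + H1[1:n1 + 1] + H2[1:n2 + 1]
    for i in range((n1 + n2) // 2, 0, -1):
        min_down_heapify(H, i, n1 + n2)
    return H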

Question 5

Give an O(n log k) time algorithm to merge k sorted arrays A1..Ak into one sorted array. n is the total number of elements (you may assume k ≤ n).

Solution

We will use a min-heap of k triples of the form (d, i, Ad[i]), for 1 ≤ d ≤ k and 1 ≤ i ≤ n, ordered by the key Ad[i]. The heap will use an array B.

1. First, build the heap from the first element of each of the arrays A1..Ak, in O(k) time.
2. In each step, extract the minimal element of the heap and append it to the end of the output array M.
3. If the array that the element from step 2 came from is not exhausted, insert its next (minimal) element into the heap.

for d ← 1 to k
    B[d] ← (d, 1, Ad[1])
Build-Heap(B)    /* by the order of the keys Ad[1] */
for j ← 1 to n
    (d, i, x) ← Extract-Min(B)
    M[j] ← x
    if i < Ad.length
        Heap-Insert(B, (d, i+1, Ad[i+1]))

Worst case time analysis:
Build-Heap: O(k)
Extract-Min: O(log k)
Heap-Insert: O(log k)
Total: O(n log k)
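A Python sketch of the same algorithm, using heapq for the heap B and 0-indexed arrays; the triples are stored as (Ad[i], d, i) so that the heap orders them by key. (The standard library's heapq.merge does essentially the same job.)

import heapq

def merge_k_sorted(arrays):
    """Merge k sorted arrays into one sorted list in O(n log k) time."""
    B = [(A[0], d, 0) for d, A in enumerate(arrays) if A]   # first element of each array
    heapq.heapify(B)                                        # Build-Heap: O(k)
    M = []
    while B:
        x, d, i = heapq.heappop(B)                          # Extract-Min: O(log k)
        M.append(x)
        if i + 1 < len(arrays[d]):                          # refill from the same array
            heapq.heappush(B, (arrays[d][i + 1], d, i + 1)) # Heap-Insert: O(log k)
    return M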

Question 6 (For practice at home)

Suppose that you had to implement a priority queue that only needed to store items whose priorities were integer values between 0 and k − 1. Describe how a priority queue could be implemented so that insert() has complexity O(1) and RemoveMin() has complexity O(k). Be sure to give the pseudo-code for both operations. Hint: This is very similar to a hash table.

Solution

Use an array A of size k of linked lists. The linked list in the i-th slot stores entries with priority i. To insert an item into the priority queue with priority i, append it to the linked list in the i-th slot.

insert(v)
    A[v.priority].add-to-tail(v)

To remove an item from the queue, scan the array of lists, starting at slot 0, until a non-empty list is found. Remove that list's first element and return it; the for-loop induces the O(k) worst case complexity. E.g.,

removeMin()
    for i ← 0 to k−1
        List L ← A[i]
        if (L.empty() = false)
            v ← L.remove-from-head()    /* removes the list's head and returns it = dequeue */
            return v
    return NULL
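A Python sketch of this structure, with collections.deque standing in for the linked list in each slot (the class and method names are illustrative):

from collections import deque

class BoundedPriorityQueue:
    """Priority queue for integer priorities 0..k-1: insert is O(1), remove_min is O(k)."""

    def __init__(self, k):
        self.A = [deque() for _ in range(k)]   # one list per priority value

    def insert(self, item, priority):
        self.A[priority].append(item)          # add-to-tail: O(1)

    def remove_min(self):
        for bucket in self.A:                  # scan at most k slots
            if bucket:
                return bucket.popleft()        # remove-from-head = dequeue
        return None                            # the queue is empty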

Quicksort

quickSort(A, low, high)
    if (high > low)
        pivot ← partition(A, low, high)
        quickSort(A, low, pivot−1)
        quickSort(A, pivot+1, high)

partition(A, low, high)
    pivot_value ← A[low]
    left ← low
    right ← high
    while (left < right)
        // Move left while item ≤ pivot
        while (left < high && A[left] ≤ pivot_value)
            left++
        // Move right while item > pivot
        while (A[right] > pivot_value)
            right--
        if (left < right)    // make sure right has not passed left
            SWAP(A, left, right)
    // right is the final position for the pivot
    A[low] ← A[right]
    A[right] ← pivot_value
    return right

// Initial call: quickSort(A, 0, length(A)−1)

 Stable sorting algorithms maintain the relative order of records with equal keys.
 In-place algorithms need only O(log N) extra memory beyond the items being sorted; they don't need to create auxiliary locations for data to be temporarily stored.
 The QuickSort version above is not stable.

Question 1

Given a multi-set S of n integer elements and an index k (1 ≤ k ≤ n), we define the k-smallest element to be the k-th element when the elements are sorted from the smallest to the largest.

Suggest an algorithm for finding the k-smallest element in O(n) time on average.

Example: For the given multi-set of numbers {6, 3, 2, 4, 1, 1, 2, 6}, the 4-smallest element is 2, since 2 is the 4th element in the sorted sequence {1, 1, 2, 2, 3, 4, 6, 6}.

Solution:

The algorithm is based on the Quick-Sort algorithm.

// Reminder:
quicksort(A, p, r)
    if (p < r)
        q ← partition(A, p, r)
        quicksort(A, p, q−1)
        quicksort(A, q+1, r)

Select(k, S)    // returns the k-th smallest element of S
    pick x in S
    partition S into L, E, G    // a slightly different variant of partition():
                                // max(L) < x, E = {elements equal to x}, x < min(G)
    if k ≤ length(L)    // searching for an item smaller than x
        return Select(k, L)
    else if k ≤ length(L) + length(E)    // found
        return x
    else    // searching for an item greater than x
        return Select(k − length(L) − length(E), G)
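A runnable Python sketch of Select with a random pivot. Unlike an in-place partition, this version builds L, E and G as new lists, trading O(n) extra space for clarity; the expected running time is still O(n).

import random

def select(k, S):
    """Return the k-smallest element (1-indexed rank) of the multi-set S."""
    x = random.choice(S)                   # pick a pivot
    L = [e for e in S if e < x]            # keys smaller than x
    E = [e for e in S if e == x]           # keys equal to x
    G = [e for e in S if e > x]            # keys greater than x
    if k <= len(L):
        return select(k, L)                # searching for an item smaller than x
    if k <= len(L) + len(E):
        return x                           # found
    return select(k - len(L) - len(E), G)  # searching for an item greater than x

For the example above, select(4, [6, 3, 2, 4, 1, 1, 2, 6]) returns 2.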

In the worst case the chosen pivot x is the maximal element in the current array, and there is only one such element: G is empty, length(E) = 1 and length(L) = length(S) − 1.

T(l) = T(l−1) + O(l)  for l > k
T(l) = O(k)  for l = k

The solution of the recurrence: T(n) = O(n²)

In the average case: similar to quick-sort, half of the elements in S are good pivots, for which the sizes of L and G are each less than 3n/4. Therefore, T(n) ≤ T(3n/4) + O(n) = O(n) (Master theorem, case c).

Question 2

Given an array A of n numbers, suggest a Θ(n) expected time algorithm to determine whether there is a number in A that appears more than n/2 times.

Solution:

If x is a number that appears more than n/2 times in A, then x is the (⌊n/2⌋ + 1)-smallest element in the array A.

Frequent(A, n)
    x ← Select(⌊n/2⌋ + 1, A)    // find the middle element
    count ← 0
    for i ← 1 to n do    // count the appearances of the middle element
        if (A[i] = x)
            count++
    if count > n/2
        return TRUE
    else
        return FALSE

Time complexity: in the average case the Select algorithm runs in Θ(n). Computing count takes Θ(n) as well. Total expected running time: Θ(n).
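A sketch of Frequent in Python, reusing the select sketch from Question 1:

def frequent(A):
    """Return True iff some value appears more than n/2 times in A; expected Θ(n)."""
    n = len(A)
    x = select(n // 2 + 1, A)              # the only possible majority candidate
    count = sum(1 for a in A if a == x)    # count its appearances
    return count > n / 2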

Question 3

n records are stored in an array A of size n. Suggest an algorithm to sort the records in O(n) time and with no additional space in each of the following cases:

I. All the keys are 0 or 1
II. All the keys are in the range [1..k], k is a constant

Solution:

I. Use Quicksort's partition method, as we did in Question 1, with pivot 0. After the completion of the partition function the array is sorted (L = {}, E holds all the records with key 0, G holds all the records with key 1). Time complexity is Θ(n), the cost of one partition.

II. First run the partition method on A[1..n] with pivot 1; this way all the records with key 1 end up in the first x1 indices of the array. Then run the partition method on A[x1+1..n] with pivot 2, and so on. After k−1 such steps A is sorted. Time complexity is O(kn) = O(n), the cost of k partitions.
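A Python sketch of case II (case I is the same idea with keys {0, 1}). Instead of the three-way L/E/G partition it uses an equivalent swap-based sweep per key, which keeps the sort in place; the cost is still one linear pass per key, O(kn) = O(n) overall.

def sort_small_keys(A, k):
    """Sort A in place when every key is an integer in [1..k], k constant."""
    start = 0                                     # boundary of the sorted prefix
    for key in range(1, k):                       # the last key falls into place by itself
        for i in range(start, len(A)):
            if A[i] == key:                       # move this record to the front
                A[start], A[i] = A[i], A[start]   # of the unsorted region
                start += 1
    return A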

Question 4

Given the following algorithm to sort an array A of size n:
1. Sort recursively the first 2/3 of A (A[1..2n/3])
2. Sort recursively the last 2/3 of A (A[n/3+1..n])
3. Sort recursively the first 2/3 of A (A[1..2n/3])

* If (2/3*n) is not a natural number, round it up.

Prove that the above algorithm sorts A and find a recurrence T(n) expressing its running time.

Solution: The basic claim is that after the first 2 steps, the n/3 largest numbers are in their final places, sorted, in the last third of A. Indeed, after step 1 each of the n/3 largest elements of A lies in A[n/3+1..n]: any of them that started in the first two thirds is moved by the sort into the top third of that range, and step 2 then sorts A[n/3+1..n], placing all of them at the end. In the last step the algorithm sorts the remaining first two thirds of A.

The recurrence (the non-recursive work is O(1)):

T(n) = 3T(2n/3) = 3(3T(4n/9)) = 3(3(3T(8n/27))) = ... = 3^i · T((2/3)^i · n)

After i = log_{3/2} n steps:

T(n) = 3^(log_{3/2} n) · T(1) = 3^((log_3 n)/(log_3(3/2))) = (3^(log_3 n))^(1/(log_3(3/2))) = n^(1/(log_3(3/2))) = n^((log_3 3)/(log_3(3/2))) = n^(log_{3/2} 3)

T(n) = O(n^(log_{3/2} 3)) ≈ O(n^2.71), (also according to the Master Theorem)
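A runnable sketch of the algorithm, 0-indexed, rounding 2n/3 up with integer arithmetic as the note above specifies:

def two_thirds_sort(A, low=0, high=None):
    """Sort A[low..high] by sorting the first 2/3, the last 2/3, then the first 2/3 again."""
    if high is None:
        high = len(A) - 1
    n = high - low + 1
    if n == 2 and A[low] > A[high]:
        A[low], A[high] = A[high], A[low]         # base case: two elements
    elif n > 2:
        t = (2 * n + 2) // 3                      # ceil(2n/3)
        two_thirds_sort(A, low, low + t - 1)      # step 1: first two thirds
        two_thirds_sort(A, high - t + 1, high)    # step 2: last two thirds
        two_thirds_sort(A, low, low + t - 1)      # step 3: first two thirds again
    return A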

Question 5

Given an array A of M+N elements, where the first N elements in A are sorted and the last M elements in A are unsorted.

1. Evaluate the run-time complexity, in terms of M and N in the worst case, of fully sorting the array using Insertion-Sort on A.

2. For each of the following cases, which sorting method (or combination of methods) would you use, and what would be the run-time complexity in the worst case?

a) M = O(1)
b) M = O(log N)
c) M = O(N)

Solution:

1. O(M(M+N)): the last M elements will be inserted into their right places, and that requires N, N+1, N+2, ..., N+M shifts (in the worst case). Alternatively, O(M² + N) if we apply insertion sort only to the last M elements and then merge the two sorted parts.

2.
a) Insertion-Sort, in O(N).
b) Use any comparison-based sorting algorithm with a worst-case runtime of O(M log M) (such as Merge-Sort or Heap-Sort) on the M unsorted elements, and then merge the two sorted parts of the array in O(M + N). Total runtime: O(M log M + N) = O(N).
c) Use any efficient comparison-based sorting algorithm, for a runtime of O((M+N)log(M+N)) = O(N log N). Quick-Sort is bad for this case, as its worst case analysis is O(n²).
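A Python sketch of case b: sort the M-element tail on its own and merge it with the sorted prefix. Here sorted() stands in for any worst-case O(M log M) comparison sort (Python's built-in Timsort does meet that bound).

def sort_mostly_sorted(A, N):
    """A[:N] is sorted, the remaining M elements are not; total O(M log M + N)."""
    tail = sorted(A[N:])                    # sort the M unsorted elements
    out, i, j = [], 0, 0
    while i < N and j < len(tail):          # standard merge of two sorted runs
        if A[i] <= tail[j]:
            out.append(A[i])
            i += 1
        else:
            out.append(tail[j])
            j += 1
    return out + A[i:N] + tail[j:]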

Question 6

How can we use an unstable (comparison-based) sorting algorithm U (for example, quick-sort or heap-sort) to build a new stable sorting algorithm S with the same time complexity as the algorithm U?

Solution 1:

U is a comparison-based sorting algorithm, thus its runtime is TU(n) = Ω(n log n).

1. Add a new field, index, to each element. This new field holds the original index of that element in the unsorted array.
2. Change the comparison operator so that:
   [key1, index1] < [key2, index2] ⟺ key1 < key2 or (key1 = key2 and index1 < index2)
   [key1, index1] > [key2, index2] ⟺ key1 > key2 or (key1 = key2 and index1 > index2)
3. Execute the sorting algorithm U with the new comparison operation.

Time complexity: adding an index field takes O(n), and the sorting time is the same as that of the unstable algorithm, TU(n); in total T(n) = O(TU(n)) (as TU(n) = Ω(n log n)).
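A Python sketch of Solution 1. Python tuples compare lexicographically, which is exactly the modified comparison operator from step 2; a heapq-based heapsort plays the role of the unstable algorithm U in this sketch.

import heapq

def heapsort(items):
    """An unstable O(n log n) comparison sort, standing in for U."""
    heapq.heapify(items)
    items[:] = [heapq.heappop(items) for _ in range(len(items))]

def stable_sort(keys, unstable_sort=heapsort):
    """Make an unstable comparison sort stable by decorating with original indices."""
    decorated = [(key, i) for i, key in enumerate(keys)]   # step 1: O(n)
    unstable_sort(decorated)      # step 3: equal keys now compare by original index
    return [key for key, _ in decorated]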

Solution 2:

1. Add a new field, index, to each element in the input array A – O(n). This new field holds the original index of that element in the input.
2. Execute U on A to sort the elements by their key – TU(n).
3. Execute U on each set of equal-key elements to sort them by the index field – TU(n).

Time complexity of phase 3: assume we have m different keys in the input array (1 ≤ m ≤ n), and let ni be the number of elements with key ki, where 1 ≤ i ≤ m and n1 + n2 + ... + nm = n. That is, the time complexity of phase 3 is:

T(n) = Σ_{i=1..m} TU(ni) = Σ_{i=1..m} Ω(ni log ni)

In the worst case all the keys in the array are equal (i.e., m = 1) and phase 3 is in fact a sorting of the whole array by index values: T(n) = TU(n) = Ω(n log n).

Time complexity (for the entire algorithm): T(n) = 2TU(n) + O(n) = O(TU(n)).
