
Bucket Sort Bucket sort assumes that the input is drawn from a uniform ² CS 361, Lecture 14 distribution over the range [0; 1) Basic idea is to divide the interval [0; 1) into n equal size ² Jared Saia regions, or buckets University of New Mexico We expect that a small number of elements in A will fall into ² each bucket To get the output, we can sort the numbers in each bucket ² and just output the sorted buckets in order 2 Outline Bucket Sort //PRE: A is the array to be sorted, all elements in A[i] are between $0$ and $1$ inclusive. //POST: returns a list which is the elements of A in sorted order BucketSort(A){ B = new List[] Bucket Sort n = length(A) ² Midterm Review for (i=1;i<=n;i++){ ² insert A[i] at end of list B[floor(n*A[i])]; } for (i=0;i<=n-1;i++){ sort list B[i] with insertion sort; } return the concatenated list B[0],B[1],...,B[n-1]; } 1 3 Bucket Sort Analysis We claim that E(n2) = 2 1=n Claim: If the input numbers are distributed uniformly over ² i ¡ ² To prove this, we de¯ne indicator random variables: X = 1 the range [0; 1), then Bucket sort takes expected time O(n) ² ij if A[j] falls in bucket i and 0 otherwise (de¯ned for all i, Let T (n) be the run time of bucket sort on a list of size n ² 0 i n 1 and j, 1 j n) Let ni be the random variable givingthe number of elements · · ¡ · · ² Thus, n = n X in bucket [ ] i j=1 ij B i ² 2 n 1 2 We can nowP compute E(ni ) by expanding the square and Then T (n) = £(n) + ¡ O(n ) ² ² i=0 i regrouping terms P 4 6 Analysis Analysis n 1 2 We know T (n) = £(n) + ¡ O(n ) ² i=0 i Taking expectation of both sides, we have ² P n 1 n ¡ 2 E(n ) = E(( X )2) E(T (n)) = E(£(n) + O(ni )) i2 ij j=1 iX=0 X n 1 n n ¡ 2 = E( X X ) = £(n) + E(O(ni )) ij ik j=1 k=1 iX=0 X X n 1 n ¡ 2 = E( X2 + X X ) = £(n) + (O(E(ni ))) ij ij ik i=0 jX=1 1 Xj n 1 k Xn;k=j X n · · · · 6 2 The second step follows by linearity of expectation = E(Xij) + E(XijXik)) ² j=1 1 j n 1 k n;k=j The last step holds since for any constant a and random X ·X· · ·X 6 ² variable X, E(aX) = aE(X) (see Equation C.21 in the text) 5 7 Analysis Analysis We can evaluate the two summations separately. Xij is 1 ² n 1 2 with probability 1 and 0 otherwise Recall that E(T (n)) = £(n) + ¡ (O(E(n ))) =n ² i=0 i 2 2 Thus E(X ) = 1 (1=n) + 0 (1 1=n) = 1=n We can now plug in the equationP E(ni ) = 2 (1=n) to get ² ij ¤ ¤ ¡ ² ¡ Where k = j, the random variables Xij and Xik are indepen- n 1 ² 6 ¡ dent E(T (n)) = £(n) + 2 (1=n) ¡ For any two independent random variables X and Y , E(XY ) = iX=0 ² = £(n) + £(n) E(X)E(Y ) (see C.3 in the book for a proof of this) = £( ) Thus we have that n ² E(X X ) = E(X )E(X ) ij ik ij ik Thus the entire bucket sort algorithm runs in expected linear = (1=n)(1=n) ² time = (1=n2) 8 10 Analysis Midterm Substituting these two expected values back into our main ² equation, we get: Midterm: Tuesday, March 23rd in class (the Tuesday after ² n spring break) 2 2 E(ni ) = E(Xij) + E(XijXik)) You can bring 2 pages of \cheat sheets" to use during the jX=1 1 Xj n 1 k Xn;k=j ² n · · · · 6 exam. Otherwise the exam is closed book and closed note = (1=n) + (1=n2) Note that the web page contains links to prior classes and ² j=1 1 j n 1 k n;k=j X ·X· · ·X 6 their midterms. Many of the questions on my midterm will 2 = n(1=n) + (n)(n 1)(1=n ) be similar in avor to these past midterms (and to exercises ¡ = 1 + (n 1)=n in the book)! ¡ = 2 (1=n) ¡ 9 11 Review Session Question 1 Collection of true/false questions and short answer on: Asymptotic notation: e.g. I give you a bunch of functions ² and ask you to give me the simplest possible theta notation for each I will have a review session on Monday, March 22nd from ² Recurrences: e.g. I ask you to solve a recurrence 5:30-6:30 in FEC 141 (conference room on ¯rst oor of ² Heaps: e.g. I ask you questions about properties of heaps FEC) ² and priority queues Please try to make it to this review session if at all possible ² Sorting Algorithms: heapsort, quicksort, bucketsort, merge- ² sort, (know resource bounds for these algorithms) Probability: Random variables, expectation, linearity of ex- ² pectation, birthday paradox, analysis of expected runtime of quicksort and bucketsort 12 14 Midterm Question 2 Solving recurrence relations: 5 questions, about 20 points each ² Like problems on hw 4 and Problem 7-3 (Stooge Sort) Hard but fair ² ² You'll need to know annihilators, change of variables, han- There will be some time pressure, so make sure you can e.g. ² ² dling homogeneous and non-homogeneous parts of recur- solve recurrences both quickly and correctly. rences, recursion trees, and the Master Method I expect a class mean of between 60 :( and 70 :) points ² You'll need to know the formulas for sums of convergent and ² divergent geometric series 13 15 Question 3 Questions 5 Loop Invariant: Will give you an algorithm and ask you to give the loop Asymptotic notation: ² invariant you would use to show it is correct You may also need to give initialization, maintenance and Similar to book problems: 3.1-2, 3.1-5, 3.1-7 ² ² termination for your loop invariant Similar to the hw problems and in-class exercises on loop ² invariants 16 18 Question 4 Asymptotic Notation Let's now review asymptotic notation ² I'll review for O notation, make sure you understand the other Recurrence proof using induction (i.e. the substitution method): ² four types f(n) = O(g(n)) if there exists positive constants c and n0 You'll need to give base case, inductive hypothesis and then ² ² such that 0 f(n) cg(n) for all n n0 show the inductive step · · ¸ This means to show that f(n) = O(g(n)), you need to give Similar to Exercises 7.2-1, 4.2-1 and 4.2-3 ² ² positive constants c and n0 for which the above statement is true! 17 19 Example 1 Example 3 Prove that 22n = O(5n) Prove that 2n+1 = O(2n) ² ² Goal: Show there exist positive constants c and n0 such that Goal: Show there exist positive constants c and n0 such that ² 2n n ² 2 c 5 for all n n0 2n+1 c 2n for all n n · ¤ ¸ · ¤ ¸ 0 22n c 5n (7) 2n+1 c 2n (1) · ¤ · ¤ 4n c 5n (8) 2 2n c 2n (2) · ¤ ¤ · ¤ (4=5)n c (9) 2 c (3) · · (10) n+1 n Hence for c = 2 and n0 = 1, 2 c 2 for all n n0 ² · ¤ ¸ Hence for c = 1 and n = 1, 22n c 5n for all n n ² 0 · ¤ ¸ 0 20 22 Example 2 A Procedure Prove that n + pn = O(n) ² Goal: Show there exist positive constants c and n such that Goal: prove that f(n) = O(g(n)) ² 0 n + pn c n for all n n · ¤ ¸ 0 1. Write down what this means mathematically n + pn c n (4) · ¤ 2. Write down the inequality f(n) c g(n) 1 · ¤ 1 + c (5) 3. Simplify this inequality so that c is isolated on the right hand pn · (6) side 4. Now ¯nd a n0 and a c such that for all n n0, this simpli¯ed Hence if we choose n = 4, and c = 1:5, then it's true that ¸ ² 0 inequality is true n + pn c n for all n n · ¤ ¸ 0 21 23 In Class Exercise Example: Substitution Method Base Case: T (1) = 2 which is in fact 21. ² Show that n2n is O(4n) Inductive Hypothesis: For all j < n, T (j) = 2j ² Inductive Step: We must show that T (n) = 2n, assuming the ² Q1: What is the exact mathematical statement of what you inductive hypothesis. ² need to prove? 2 n T (n) = 2 ¡ T (n 1) T (n 1) Q2: What is the ¯rst inequality in the chain of inequalities? 2 n ¤ n 1¡ n¤ 1 ¡ ² T (n) = 2 ¡ 2 ¡ 2 ¡ Q3: What is the simpli¯ed inequality where c is isolated? ¤ ¤ ² T (n) = 2n Q4: What is a n and c such that the inequality of the last ² 0 question is always true? where the inductive hypothesis allows us to make the replace- ments in the second step. 24 26 Example: Substitution Method Todo Consider the following recurrence: ² T (n) = 22 n T (n 1) T (n 1) ¡ ¤ ¡ ¤ ¡ Enjoy Spring! where T (1) = 2. ² Study for Midterm Show that T (n) = 2n by induction. Include the following ² ² in your proof: 1)the base case(s) 2)the inductive hypothesis and 3)the inductive step.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages7 Page
-
File Size-