Randomized Algorithms Part One
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Randomized Quicksort Algorithm
Outline The Randomized Quicksort Algorithm K. Subramani1 1Lane Department of Computer Science and Electrical Engineering West Virginia University 7 February, 2012 Subramani Sample Analyses Outline Outline 1 The Randomized Quicksort Algorithm Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. It has no bearing on the running time. Subramani Sample Analyses The Randomized Quicksort Algorithm The Partition subroutine Function PARTITION(A,p,q) 1: {We partition the sub-array A[p,p + 1,...,q] about A[p].} 2: for (i = (p + 1) to q) do 3: if (A[i] < A[p]) then 4: Insert A[i] into bucket L. -
Randomized Algorithms, Quicksort and Randomized Selection Carola Wenk Slides Courtesy of Charles Leiserson with Additions by Carola Wenk
CMPS 2200 – Fall 2014 Randomized Algorithms, Quicksort and Randomized Selection Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk CMPS 2200 Intro. to Algorithms 1 Deterministic Algorithms Runtime for deterministic algorithms with input size n: • Best-case runtime Attained by one input of size n • Worst-case runtime Attained by one input of size n • Average runtime Averaged over all possible inputs of size n CMPS 2200 Intro. to Algorithms 2 Deterministic Algorithms: Insertion Sort for j=2 to n { key = A[j] // insert A[j] into sorted sequence A[1..j-1] i=j-1 while(i>0 && A[i]>key){ A[i+1]=A[i] i-- } A[i+1]=key } • Best case runtime? • Worst case runtime? CMPS 2200 Intro. to Algorithms 3 Deterministic Algorithms: Insertion Sort Best-case runtime: O(n), input [1,2,3,…,n] Attained by one input of size n • Worst-case runtime: O(n2), input [n, n-1, …,2,1] Attained by one input of size n • Average runtime : O(n2) Averaged over all possible inputs of size n •What kind of inputs are there? • How many inputs are there? CMPS 2200 Intro. to Algorithms 4 Average Runtime • What kind of inputs are there? • Do [1,2,…,n] and [5,6,…,n+5] cause different behavior of Insertion Sort? • No. Therefore it suffices to only consider all permutations of [1,2,…,n] . • How many inputs are there? • There are n! different permutations of [1,2,…,n] CMPS 2200 Intro. to Algorithms 5 Average Runtime Insertion Sort: n=4 • Inputs: 4!=24 [1,2,3,4]0346 [4,1,2,3] [4,1,3,2] [4,3,2,1] [2,1,3,4]1235 [1,4,2,3] [1,4,3,2] [3,4,2,1] [1,3,2,4]1124 [1,2,4,3] [1,3,4,2] [3,2,4,1] [3,1,2,4]2455 [4,2,1,3] [4,3,1,2] [4,2,3,1] [3,2,1,4]3244 [2,1,4,3] [3,4,1,2] [2,4,3,1] [2,3,1,4]2333 [2,4,1,3] [3,1,4,2] [2,3,4,1] • Runtime is proportional to: 3 + #times in while loop • Best: 3+0, Worst: 3+6=9, Average: 3+72/24 = 6 CMPS 2200 Intro. -
Overview Parallel Merge Sort
CME 323: Distributed Algorithms and Optimization, Spring 2015 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Matriod and Stanford. Lecture 4, 4/6/2016. Scribed by Henry Neeb, Christopher Kurrus, Andreas Santucci. Overview Today we will continue covering divide and conquer algorithms. We will generalize divide and conquer algorithms and write down a general recipe for it. What's nice about these algorithms is that they are timeless; regardless of whether Spark or any other distributed platform ends up winning out in the next decade, these algorithms always provide a theoretical foundation for which we can build on. It's well worth our understanding. • Parallel merge sort • General recipe for divide and conquer algorithms • Parallel selection • Parallel quick sort (introduction only) Parallel selection involves scanning an array for the kth largest element in linear time. We then take the core idea used in that algorithm and apply it to quick-sort. Parallel Merge Sort Recall the merge sort from the prior lecture. This algorithm sorts a list recursively by dividing the list into smaller pieces, sorting the smaller pieces during reassembly of the list. The algorithm is as follows: Algorithm 1: MergeSort(A) Input : Array A of length n Output: Sorted A 1 if n is 1 then 2 return A 3 end 4 else n 5 L mergeSort(A[0, ..., 2 )) n 6 R mergeSort(A[ 2 , ..., n]) 7 return Merge(L, R) 8 end 1 Last lecture, we described one way where we can take our traditional merge operation and translate it into a parallelMerge routine with work O(n log n) and depth O(log n). -
Lecture 16: Lower Bounds for Sorting
Lecture Notes CMSC 251 To bound this, recall the integration formula for bounding summations (which we paraphrase here). For any monotonically increasing function f(x) Z Xb−1 b f(i) ≤ f(x)dx: i=a a The function f(x)=xln x is monotonically increasing, and so we have Z n S(n) ≤ x ln xdx: 2 If you are a calculus macho man, then you can integrate this by parts, and if you are a calculus wimp (like me) then you can look it up in a book of integrals Z n 2 2 n 2 2 2 2 x x n n n n x ln xdx = ln x − = ln n − − (2 ln 2 − 1) ≤ ln n − : 2 2 4 x=2 2 4 2 4 This completes the summation bound, and hence the entire proof. Summary: So even though the worst-case running time of QuickSort is Θ(n2), the average-case running time is Θ(n log n). Although we did not show it, it turns out that this doesn’t just happen much of the time. For large values of n, the running time is Θ(n log n) with high probability. In order to get Θ(n2) time the algorithm must make poor choices for the pivot at virtually every step. Poor choices are rare, and so continuously making poor choices are very rare. You might ask, could we make QuickSort deterministic Θ(n log n) by calling the selection algorithm to use the median as the pivot. The answer is that this would work, but the resulting algorithm would be so slow practically that no one would ever use it. -
Arxiv:1606.00484V2 [Cs.DS] 4 Aug 2016
Fast Deterministic Selection Andrei Alexandrescu The D Language Foundation [email protected] Abstract lection algorithm most used in practice [8, 27, 31]. It can The Median of Medians (also known as BFPRT) algorithm, be thought of as a partial QUICKSORT [16]: Like QUICK- although a landmark theoretical achievement, is seldom used SORT,QUICKSELECT relies on a separate routine to divide in practice because it and its variants are slower than sim- the input array into elements less than or equal to and greater ple approaches based on sampling. The main contribution than or equal to a chosen element called the pivot. Unlike of this paper is a fast linear-time deterministic selection al- QUICKSORT which recurses on both subarrays left and right UICKSELECT gorithm QUICKSELECTADAPTIVE based on a refined def- of the pivot, Q only recurses on the side known k inition of MEDIANOFMEDIANS. The algorithm’s perfor- to contain the th smallest element. (The side to choose is mance brings deterministic selection—along with its desir- known because the partitioning computes the rank of the able properties of reproducible runs, predictable run times, pivot.) and immunity to pathological inputs—in the range of prac- The pivot choosing strategy strongly influences QUICK- ticality. We demonstrate results on independent and identi- SELECT’s performance. If each partitioning step reduces the cally distributed random inputs and on normally-distributed array portion containing the kth order statistic by at least a UICKSELECT O(n) inputs. Measurements show that QUICKSELECTADAPTIVE fixed fraction, Q runs in time; otherwise, is faster than state-of-the-art baselines. -
IBM Research Report Derandomizing Arthur-Merlin Games And
H-0292 (H1010-004) October 5, 2010 Computer Science IBM Research Report Derandomizing Arthur-Merlin Games and Approximate Counting Implies Exponential-Size Lower Bounds Dan Gutfreund, Akinori Kawachi IBM Research Division Haifa Research Laboratory Mt. Carmel 31905 Haifa, Israel Research Division Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. , payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home . Derandomization Implies Exponential-Size Lower Bounds 1 DERANDOMIZING ARTHUR-MERLIN GAMES AND APPROXIMATE COUNTING IMPLIES EXPONENTIAL-SIZE LOWER BOUNDS Dan Gutfreund and Akinori Kawachi Abstract. We show that if Arthur-Merlin protocols can be deran- domized, then there is a Boolean function computable in deterministic exponential-time with access to an NP oracle, that cannot be computed by Boolean circuits of exponential size. More formally, if prAM ⊆ PNP then there is a Boolean function in ENP that requires circuits of size 2Ω(n). -
January 30 4.1 a Deterministic Algorithm for Primality Testing
CS271 Randomness & Computation Spring 2020 Lecture 4: January 30 Instructor: Alistair Sinclair Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may be distributed outside this class only with the permission of the Instructor. 4.1 A deterministic algorithm for primality testing We conclude our discussion of primality testing by sketching the route to the first polynomial time determin- istic primality testing algorithm announced in 2002. This is based on another randomized algorithm due to Agrawal and Biswas [AB99], which was subsequently derandomized by Agrawal, Kayal and Saxena [AKS02]. We won’t discuss the derandomization in detail as that is not the main focus of the class. The Agrawal-Biswas algorithm exploits a different number theoretic fact, which is a generalization of Fermat’s Theorem, to find a witness for a composite number. Namely: Fact 4.1 For every a > 1 such that gcd(a, n) = 1, n is a prime iff (x − a)n = xn − a mod n. n Exercise: Prove this fact. [Hint: Use the fact that k = 0 mod n for all 0 < k < n iff n is prime.] The obvious approach for designing an algorithm around this fact is to use the Schwartz-Zippel test to see if (x − a)n − (xn − a) is the zero polynomial. However, this fails for two reasons. The first is that if n is not prime then Zn is not a field (which we assumed in our analysis of Schwartz-Zippel); the second is that the degree of the polynomial is n, which is the same as the cardinality of Zn and thus too large (recall that Schwartz-Zippel requires that values be chosen from a set of size strictly larger than the degree). -
Advanced Topics in Sorting
Advanced Topics in Sorting complexity system sorts duplicate keys comparators 1 complexity system sorts duplicate keys comparators 2 Complexity of sorting Computational complexity. Framework to study efficiency of algorithms for solving a particular problem X. Machine model. Focus on fundamental operations. Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. Optimal algorithm. Algorithm with best cost guarantee for X. lower bound ~ upper bound Example: sorting. • Machine model = # comparisons access information only through compares • Upper bound = N lg N from mergesort. • Lower bound ? 3 Decision Tree a < b yes no code between comparisons (e.g., sequence of exchanges) b < c a < c yes no yes no a b c b a c a < c b < c yes no yes no a c b c a b b c a c b a 4 Comparison-based lower bound for sorting Theorem. Any comparison based sorting algorithm must use more than N lg N - 1.44 N comparisons in the worst-case. Pf. Assume input consists of N distinct values a through a . • 1 N • Worst case dictated by tree height h. N ! different orderings. • • (At least) one leaf corresponds to each ordering. Binary tree with N ! leaves cannot have height less than lg (N!) • h lg N! lg (N / e) N Stirling's formula = N lg N - N lg e N lg N - 1.44 N 5 Complexity of sorting Upper bound. Cost guarantee provided by some algorithm for X. Lower bound. Proven limit on cost guarantee of any algorithm for X. -
CMSC 420: Lecture 7 Randomized Search Structures: Treaps and Skip Lists
CMSC 420 Dave Mount CMSC 420: Lecture 7 Randomized Search Structures: Treaps and Skip Lists Randomized Data Structures: A common design techlque in the field of algorithm design in- volves the notion of using randomization. A randomized algorithm employs a pseudo-random number generator to inform some of its decisions. Randomization has proved to be a re- markably useful technique, and randomized algorithms are often the fastest and simplest algorithms for a given application. This may seem perplexing at first. Shouldn't an intelligent, clever algorithm designer be able to make better decisions than a simple random number generator? The issue is that a deterministic decision-making process may be susceptible to systematic biases, which in turn can result in unbalanced data structures. Randomness creates a layer of \independence," which can alleviate these systematic biases. In this lecture, we will consider two famous randomized data structures, which were invented at nearly the same time. The first is a randomized version of a binary tree, called a treap. This data structure's name is a portmanteau (combination) of \tree" and \heap." It was developed by Raimund Seidel and Cecilia Aragon in 1989. (Remarkably, this 1-dimensional data structure is closely related to two 2-dimensional data structures, the Cartesian tree by Jean Vuillemin and the priority search tree of Edward McCreight, both discovered in 1980.) The other data structure is the skip list, which is a randomized version of a linked list where links can point to entries that are separated by a significant distance. This was invented by Bill Pugh (a professor at UMD!). -
Optimal Node Selection Algorithm for Parallel Access in Overlay Networks
1 Optimal Node Selection Algorithm for Parallel Access in Overlay Networks Seung Chul Han and Ye Xia Computer and Information Science and Engineering Department University of Florida 301 CSE Building, PO Box 116120 Gainesville, FL 32611-6120 Email: {schan, yx1}@cise.ufl.edu Abstract In this paper, we investigate the issue of node selection for parallel access in overlay networks, which is a fundamental problem in nearly all recent content distribution systems, grid computing or other peer-to-peer applications. To achieve high performance and resilience to failures, a client can make connections with multiple servers simultaneously and receive different portions of the data from the servers in parallel. However, selecting the best set of servers from the set of all candidate nodes is not a straightforward task, and the obtained performance can vary dramatically depending on the selection result. In this paper, we present a node selection scheme in a hypercube-like overlay network that generates the optimal server set with respect to the worst-case link stress (WLS) criterion. The algorithm allows scaling to very large system because it is very efficient and does not require network measurement or collection of topology or routing information. It has performance advantages in a number of areas, particularly against the random selection scheme. First, it minimizes the level of congestion at the bottleneck link. This is equivalent to maximizing the achievable throughput. Second, it consumes less network resources in terms of the total number of links used and the total bandwidth usage. Third, it leads to low average round-trip time to selected servers, hence, allowing nearby nodes to exchange more data, an objective sought by many content distribution systems. -
CS302 Final Exam, December 5, 2016 - James S
CS302 Final Exam, December 5, 2016 - James S. Plank Question 1 For each of the following algorithms/activities, tell me its running time with big-O notation. Use the answer sheet, and simply circle the correct running time. If n is unspecified, assume the following: If a vector or string is involved, assume that n is the number of elements. If a graph is involved, assume that n is the number of nodes. If the number of edges is not specified, then assume that the graph has O(n2) edges. A: Sorting a vector of uniformly distributed random numbers with bucket sort. B: In a graph with exactly one cycle, determining if a given node is on the cycle, or not on the cycle. C: Determining the connected components of an undirected graph. D: Sorting a vector of uniformly distributed random numbers with insertion sort. E: Finding a minimum spanning tree of a graph using Prim's algorithm. F: Sorting a vector of uniformly distributed random numbers with quicksort (average case). G: Calculating Fib(n) using dynamic programming. H: Performing a topological sort on a directed acyclic graph. I: Finding a minimum spanning tree of a graph using Kruskal's algorithm. J: Finding the minimum cut of a graph, after you have found the network flow. K: Finding the first augmenting path in the Edmonds Karp implementation of network flow. L: Processing the residual graph in the Ford Fulkerson algorithm, once you have found an augmenting path. Question 2 Please answer the following statements as true or false. A: Kruskal's algorithm requires a starting node. -
(Deterministic & Randomized): Finding the Median in Linear Time
Lecture 2 Selection (deterministic & randomized): finding the median in linear time 2.1 Overview Given an unsorted array, how quickly can one find the median element? Can one do it more quickly than by sorting? This was an open question for some time, solved affirmatively in 1972 by (Manuel) Blum, Floyd, Pratt, Rivest, and Tarjan. In this lecture we describe two linear-time algorithms for this problem: one randomized and one deterministic. More generally, we solve the problem of finding the kth smallest out of an unsorted array of n elements. 2.2 The problem and a randomized solution Consider the problem of finding the kth smallest element in an unsorted array of size n. (Let’s say all elements are distinct to avoid the question of what we mean by the kth smallest when we have equalities). One way to solve this problem is to sort and then output the kth element. We can do this in time O(n log n) if we sort using Mergesort, Quicksort, or Heapsort. Is there something faster – a linear-time algorithm? The answer is yes. We will explore both a simple randomized solution and a more complicated deterministic one. The idea for the randomized algorithm is to start with the Randomized-Quicksort algorithm (choose a random element as “pivot”, partition the array into two sets LESS and GREATER consisting of those elements less than and greater than the pivot respectively, and then recursively sort LESS and GREATER). Then notice that there is a simple speedup we can make if we just need to find the kth smallest element.