COEN279 - Design and Analysis of Algorithms
Probabilistic Analysis and Randomized Algorithm
Dr. Tunghwa Wang, Fall 2016, 09/22/2016

Announcement
- Always read the documents online unless you need to download them: files may be updated.
- For the first week, not everything is fully prepared yet:
  - Linux account: use PuTTY to work remotely.
  - E-mail: automatically forwarded to [email protected], in case you receive mail from this account instead of [email protected].
- Bonus assignments:
  - #2: the deadline is extended to 09/25.
  - Future ones: due on Thursday of the week.

Probabilistic Analysis
- Probabilistic analysis is the use of probability in the analysis of problems, and thus of the algorithms designed to solve them.
- We compute the average-case running time by taking the average over all possible inputs.
- The distribution of the inputs is the key to the analysis:
  - either we know the distribution of the inputs,
  - or we assume a model for the distribution of the inputs.
  - Otherwise, we cannot do probabilistic analysis.

Randomized Algorithms
- Tools and techniques widely used in many applications.
- Benefits: simplicity, speed, robustness.
- Inputs: generated using a random number generator:
  - pseudorandom numbers
  - distribution functions
  - queueing models
  - and more
- Initial values: randomizing the initial values of internal variables does not make an algorithm a randomized algorithm. By their nature, such variables could equally be initialized with any valid values, say all 0.

Probabilistic or Randomized Algorithm
- At least once during the algorithm, a random number is used to make a decision instead of spending time to work out which alternative is best.
- The worst-case running time of a randomized algorithm is almost always the same as the worst-case running time of the non-randomized algorithm.
- A good randomized algorithm has no bad input, but only bad random numbers.
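As an illustration of a random number making a decision, here is a minimal sketch of selection with a randomly chosen pivot. It is not taken from the slides: the function name `quickselect`, the 0-based rank `k`, and the optional `rng` argument are assumptions made for this example. No particular input ordering is bad for it; only an unlucky run of pivots makes it slow.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element (0-based) of a non-empty sequence.

    Hypothetical helper for illustration: the pivot is picked uniformly at
    random instead of working out which pivot would be best.
    """
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    pool = list(items)
    while True:
        pivot = rng.choice(pool)  # the random decision
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k < len(smaller):
            pool = smaller  # answer lies among the smaller elements
        elif k < len(smaller) + len(equal):
            return pivot  # the pivot itself has rank k
        else:
            k -= len(smaller) + len(equal)
            pool = [x for x in pool if x > pivot]
```

The answer is always correct; only the running time depends on the random choices.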
- The random numbers are important: we can get an expected running time in which we average over all possible random numbers instead of over all possible inputs. Equivalently, it is the mean time it would take to solve the same instance over and over again.

Random Number Generators
- True randomness is virtually impossible to achieve on a computer, so pseudorandom numbers are used.
- What is really needed is a sequence of numbers that appear to be independent.
- The linear congruential generator: $x_{i+1} = A x_i \bmod M$, where $x_0$ is the seed and $1 \le x_0 < M$.
  - If $M$ is prime, $x_i$ is never 0.
  - The sequence repeats after at most $M-1$ numbers. Good choices of $A$ achieve the full period $M-1$; some choices of $A$ give a shorter period.
  - If $M$ is chosen to be a large 31-bit prime, the period is long enough for most applications: $M = 2^{31} - 1 = 2{,}147{,}483{,}647$ and $A = 48{,}271$.
  - A fixed seed reproduces the same sequence every time, which makes debugging easy; supply a varying seed (e.g., from the system clock) for real runs.
  - Usually a random real number in the open interval (0,1) is wanted, which is obtained by dividing by $M$.
- Multiplication overflow prevention: let $Q = \lfloor M/A \rfloor = 44{,}488$ and $R = M \bmod A = 3{,}399$. Then
  $$x_{i+1} = A x_i \bmod M = A(x_i \bmod Q) - R\lfloor x_i/Q \rfloor + M\,\delta(x_i),$$
  where $\delta(x_i) = \lfloor x_i/Q \rfloor - \lfloor A x_i/M \rfloor$ is 1 iff the remaining terms evaluate to less than zero, and 0 otherwise.
- Derivation:
  $$
  \begin{aligned}
  x_{i+1} &= A x_i \bmod M \\
          &= A x_i - M\lfloor A x_i/M \rfloor \\
          &= A x_i - M\lfloor x_i/Q \rfloor + M\lfloor x_i/Q \rfloor - M\lfloor A x_i/M \rfloor \\
          &= A x_i - M\lfloor x_i/Q \rfloor + M\left(\lfloor x_i/Q \rfloor - \lfloor A x_i/M \rfloor\right) \\
          &= A\left(Q\lfloor x_i/Q \rfloor + x_i \bmod Q\right) - M\lfloor x_i/Q \rfloor + M\left(\lfloor x_i/Q \rfloor - \lfloor A x_i/M \rfloor\right) \\
          &= (AQ - M)\lfloor x_i/Q \rfloor + A(x_i \bmod Q) + M\left(\lfloor x_i/Q \rfloor - \lfloor A x_i/M \rfloor\right) \\
          &= -R\lfloor x_i/Q \rfloor + A(x_i \bmod Q) + M\left(\lfloor x_i/Q \rfloor - \lfloor A x_i/M \rfloor\right) \\
          &= A(x_i \bmod Q) - R\lfloor x_i/Q \rfloor + M\,\delta(x_i)
  \end{aligned}
  $$

Complexity
- Randomized Polynomial Time (RP) is the complexity class of problems for which a probabilistic Turing machine exists with these properties:
  - It always runs in time polynomial in the input size.
  - If the correct answer is NO, it always returns NO.
  - If the correct answer is YES, it returns YES with probability at least 1/2 (otherwise, it returns NO).
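Returning to the linear congruential generator with $M = 2^{31} - 1$ and $A = 48{,}271$ from the Random Number Generators slide, the overflow-prevention identity can be checked with a short sketch. The names `lcg_next` and `lcg_stream` are assumptions for this example, not from the slides. Python integers do not overflow, so the rewritten form matters mainly in fixed-width languages; here it serves to confirm that the identity agrees with the direct computation.

```python
M = 2**31 - 1  # 2,147,483,647, a 31-bit prime
A = 48_271
Q = M // A     # 44,488
R = M % A      # 3,399


def lcg_next(x):
    """One step x -> A*x mod M, computed without ever forming A*x.

    Both A*(x % Q) and R*(x // Q) stay below M, so every intermediate
    value fits in a signed 32-bit integer.
    """
    t = A * (x % Q) - R * (x // Q)
    # delta(x) is 1 exactly when the remaining terms are negative.
    return t + M if t < 0 else t


def lcg_stream(seed, count):
    """Return `count` pseudorandom reals in the open interval (0, 1)."""
    if not 1 <= seed < M:
        raise ValueError("seed must satisfy 1 <= seed < M")
    x = seed
    values = []
    for _ in range(count):
        x = lcg_next(x)
        values.append(x / M)  # x is in 1..M-1, so 0 < x/M < 1
    return values
```

A fixed seed, e.g. `lcg_stream(1, 5)`, reproduces the same values on every run, which is the debugging behaviour the slide describes.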
- Containment: P ⊆ RP ⊆ NP.
- Unsolved problem in computer science: RP = P?
- co-RP: when an RP algorithm returns YES, it is always right. The complexity class co-RP is defined similarly, except that NO is always right and YES might be wrong.
- Containment: P ⊆ co-RP ⊆ co-NP.

Complexity
- Bounded-error Probabilistic Polynomial Time (BPP) is the class of decision problems solvable by a probabilistic Turing machine in polynomial time with an error probability bounded by 1/3 for all instances:
  - It is allowed to flip coins and make random decisions.
  - It is guaranteed to run in polynomial time.
  - On any given run of the algorithm, it has a probability of at most 1/3 of giving the wrong answer, whether the answer is YES or NO.
  - BPP ⊇ P, BPP ⊇ RP, BPP ⊇ co-RP.
- Zero-error Probabilistic Polynomial Time (ZPP) is the complexity class of problems for which a probabilistic Turing machine exists with these properties:
  - The running time is polynomial in expectation for every input.
  - It always returns the correct YES or NO answer.
  - ZPP = RP ∩ co-RP.

Numerical Probabilistic Algorithms
- For certain real-life problems, computing an exact solution is not possible even in principle, e.g., because of uncertainties in the experimental data, or because digital computers handle only binary values.
- For other problems, a precise answer exists but it would take too long to work it out exactly.
- Numerical algorithms yield a confidence interval, and the expected precision improves as the time available to the algorithm increases.
- The error is usually inversely proportional to the square root of the amount of work performed.
- Examples:
  - Buffon's needle
  - Quasi Monte Carlo integration
  - Probabilistic counting

Monte Carlo Algorithm
- Always fast but probably correct: Monte Carlo algorithms give the exact answer with high probability whatever the instance considered, although sometimes they provide a wrong answer.
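One standard instance of "always fast but probably correct" is Freivalds' check of a claimed matrix product. The sketch below is illustrative rather than taken from the slides: the function name `freivalds`, the `rounds` parameter, and the list-of-lists matrix representation are assumptions. Each round costs three matrix-vector products instead of a full matrix multiplication. A correct product is never rejected, and a wrong one survives a single round with probability at most 1/2, so it is accepted with probability at most $2^{-\text{rounds}}$.

```python
import random


def freivalds(a, b, c, rounds=20, rng=random):
    """Monte Carlo test of whether a * b == c for n x n integer matrices.

    Returns False only when a * b != c for certain; returns True when no
    round found a discrepancy (wrong with probability at most 2**-rounds).
    """
    n = len(a)
    for _ in range(rounds):
        # Random 0/1 vector r; compare a*(b*r) with c*r in O(n^2) time.
        r = [rng.randint(0, 1) for _ in range(n)]
        br = [sum(b[i][j] * r[j] for j in range(n)) for i in range(n)]
        abr = [sum(a[i][j] * br[j] for j in range(n)) for i in range(n)]
        cr = [sum(c[i][j] * r[j] for j in range(n)) for i in range(n)]
        if abr != cr:
            return False  # a witness was found: the product is wrong
    return True
```

Raising `rounds` trades extra running time for a smaller error probability.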
- Generally you cannot tell whether the answer is correct, but you can reduce the error probability arbitrarily by allowing the algorithm more time (amplifying the stochastic advantage).
- A Monte Carlo algorithm is p-correct if it returns a correct answer with probability at least p (0 < p < 1), whatever the instance considered. p may depend on the instance size but not on the instance itself.
- Examples:
  - Verifying matrix multiplication
  - Primality testing
  - Skip list

Las Vegas Algorithm
- Always correct but probably fast: Las Vegas algorithms make probabilistic choices to help guide them more quickly to a correct solution; they never return a wrong answer.
- Two main categories of Las Vegas algorithms:
  - those that take longer to solve a problem when unfortunate choices are made (e.g., Quicksort), and
  - those that allow themselves to reach a dead end and admit that they cannot find a solution in this run of the algorithm.
- A Las Vegas algorithm has the Robin Hood effect: with high probability, instances that took a long time deterministically are now solved much faster, but instances on which the deterministic algorithm was particularly good are slowed down to average.
- Let $p(x)$ be the probability of success of one run of the algorithm on instance $x$; the expected number of runs until success is $1/p(x)$. However, a correct analysis must consider separately the expected time taken by LV(x) in case of success, $s(x)$, and in case of failure, $f(x)$. Since the expected number of failed runs before the first success is $(1-p(x))/p(x)$, the expected time is
  $$t(x) = s(x) + \frac{1 - p(x)}{p(x)}\, f(x).$$
- Examples:
  - The eight queens problem
  - Probabilistic quick-select and quick-sort
  - Universal hashing
  - Factoring large integers

Atlantic City Algorithm
- Probably fast and probably correct: BPP complexity.

Backup Slides

Probability

Reference
- Randomized Algorithms: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-856j-randomized-algorithms-fall-2002/lecture-notes/

Buffon's Needle
- Throw a needle at random on a floor made of planks of constant width. If the needle is exactly half as long as the planks are wide, and the width of the cracks between the planks is zero, the probability that the needle falls across a crack is $1/\pi$.
- General case: the probability that a randomly thrown needle falls across a crack is $2\lambda/(\pi\omega)$, where $\lambda$ is the needle length and $\omega$ is the plank width ($\lambda \le \omega$).
- Estimating $\pi$: $\pi \approx 2\lambda/(\omega P)$, where $P$ is the probability estimated from randomly throwing the needle, $P = h/n$ for $h$ successes over $n$ throws.
- The resulting estimate will be between $\pi - \varepsilon$ and $\pi + \varepsilon$ with probability at least the desired reliability.

Probabilistic Counting
- With an n-bit counter, we can only count up to $2^n - 1$.
- With probabilistic counting, we can count up to a larger value at the expense of some loss of accuracy.
- Counting twice as far, up to $2^{n+1} - 2$: initialize the register to 0. Each time tick is called, flip a fair coin; if it comes up heads, add 1 to the register, otherwise do nothing. When count is called, return twice the value stored in the register.
- Counting exponentially farther: from 0 to $2^{2^n - 1} - 1$.
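The two counting schemes above can be sketched as follows. The class names `CoinFlipCounter` and `LogCounter` and the injected `rng` are assumptions for this example. The first class follows the tick/count description on the slide. The second is an assumption about what "counting exponentially farther" refers to: a Morris-style counter that increments its register with probability $2^{-c}$ and reports $2^c - 1$, which matches the stated range when the register holds at most $2^n - 1$. Neither class models the n-bit register limit.

```python
import random


class CoinFlipCounter:
    """Counts about twice as far: add 1 on heads, report 2 * register."""

    def __init__(self, rng=random):
        self.rng = rng
        self.register = 0

    def tick(self):
        if self.rng.random() < 0.5:  # fair coin came up heads
            self.register += 1

    def count(self):
        return 2 * self.register


class LogCounter:
    """Morris-style counter (assumed scheme): the register tracks a logarithm."""

    def __init__(self, rng=random):
        self.rng = rng
        self.register = 0

    def tick(self):
        # Increment with probability 2**-register, so later increments get rarer.
        if self.rng.random() < 2.0 ** -self.register:
            self.register += 1

    def count(self):
        return 2 ** self.register - 1
```

Both estimates are unbiased, but the second is much noisier for a single counter; that is the loss of accuracy paid for the larger range.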