Fall 2009 Version of Course 15-359, Computer Science Department, Carnegie Mellon University
Acknowledgments: CMU's course 15-359, Probability and Computing, was originally conceived and designed by Mor Harchol-Balter and John Lafferty. The choice, order, and presentation of topics in the latter half of the course is strongly informed by the work of Mor Harchol-Balter. Indeed, you might like to buy her book! Performance Modeling and Design of Computer Systems: Queueing Theory in Action, http://www.cs.cmu.edu/~harchol/PerformanceModeling/book.html

The choice, order, and presentation of topics in the earlier half of the course is informed by the work of John Lafferty: http://galton.uchicago.edu/~lafferty/

Further, a great deal of the material in these lecture notes was strongly informed by the outstanding book Probability and Computing by Michael Mitzenmacher and Eli Upfal, http://www.cambridge.org/us/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/probability-and-computing-randomized-algorithms-and-probabilistic-analysis

Many thanks to Mor Harchol-Balter, John Lafferty, Michael Mitzenmacher, and Eli Upfal (and the many other web sources from which I borrowed)!

15-359: Probability and Computing, Fall 2008
Lecture 1: Probability in Computing; Verifying Matrix Multiplication

1 Administrivia

The course web site is http://15-359.blogspot.com. Please read carefully the Policies document there, which will also be handed out in class. The site is also a blog, so please add it to your RSS reader, or at least read it frequently. Important announcements (and more!) will appear there.

2 Probability and computing

Why is probability important for computing? Why is randomness important for computing? Here are some example applications:

Cryptography. Randomness is essential for all of crypto. Most cryptographic algorithms involve the parties picking secret keys, and this must be done randomly. If an algorithm deterministically said, "Let the secret key be 8785672057848516," well, of course that would be broken.

Simulation. When writing code to simulate, e.g., physical systems, one often models real-world events as happening randomly.

Statistics via sampling. Today we often work with huge data sets. If one wants to approximate basic statistics, e.g., the mean or mode, it is more efficient to sample a small portion of the data and compute the statistic on the sample than to read all the data. This idea connects to certain current research topics: "property testing algorithms" and "streaming algorithms".

Learning theory. Much of the most successful work in AI is done via learning theory and other statistical methods, wherein one assumes the data is generated according to certain kinds of probability distributions.

Systems & queueing theory. When studying the most effective policies for scheduling and processor sharing, one usually models job sizes and job interarrival times as coming from a stochastic process.

Data compression. Data compression algorithms often work by analyzing or modeling the underlying probability distribution of the data, or its information-theoretic content.

Error-correcting codes. A large amount of the work in coding theory is based on the problem of redundantly encoding data so that it can be recovered if there is random noise.

Data structures. When building, e.g., a static dictionary data structure, one can optimize response time if one knows the probability distribution on key lookups.
Even more so, the time for operations in hash tables can be greatly improved by careful probabilistic analysis.

Symmetry-breaking. In distributed algorithms, one often needs a way to let one of several identical processors "go first". In combinatorial optimization algorithms (e.g., for solving TSP or SAT instances), it is sometimes effective to use randomness to decide which city or variable to process next, especially on highly symmetric instances.

Theory of large networks. Much work on the study of large networks (e.g., social networks like Facebook, or physical networks like the Internet) models the graphs as arising from special kinds of random processes. Google's PageRank algorithm is famously derived from modeling the hyperlinks on the internet as a "Markov chain".

Quantum computing. The laws of physics are quantum mechanical, and there has been tremendous recent progress on designing "quantum algorithms" that take advantage of this (even if quantum computers have yet to be built). Quantum computing is inherently randomized; indeed, it's a bit like computing with probabilities that can be both positive and negative.

Statistics. Several areas of computing, e.g., Human-Computer Interaction, involve running experimental studies, often with human subjects. Interpreting the results of such studies, and deciding whether their findings are "statistically significant", requires a strong knowledge of probability and statistics.

Games and gambling. Where would internet poker be without randomness?

Making algorithms run faster. Perhaps surprisingly, there are several examples of algorithmic problems which seem to have nothing to do with randomness, yet which we know how to solve much more efficiently using randomness than without. This is my personal favorite example of probability in computing. We will see an example of using randomness to make algorithms more efficient today, in the problem of verifying matrix multiplication.

3 About this course

This course will explore several of the above uses of probability in computing. To understand them properly, though, you will need a thorough understanding of probability theory. Probability is traditionally a "math" topic, and indeed, this course will be very much like a math class. The emphasis will be on rigorous definitions, proofs, and theorems. Your homework solutions will be graded according to these "mathematical standards".

One consequence of this is that the first part of the course may be a little bit dry, because we will spend a fair bit of time going rigorously through the basics of probability theory. But of course it is essential that you go through these basics carefully so that you are prepared for the more advanced applications. I will be interspersing some "computer science" applications throughout this introductory material, to try to keep a balance between theory and applications. In particular, today we're going to start with an application, verifying matrix multiplication. Don't worry if you don't understand all the details today; it would be a bit unfair of me to insist you do, given that we haven't even done the basic theory yet! I wanted to give you a flavor of things to come, before we get down to the nitty-gritty of probability theory.

4 Verifying matrix multiplication

Let's see an example of an algorithmic task which a probabilistic algorithm can perform much more efficiently than any deterministic algorithm we know. The task is that of verifying matrix multiplication.
4.1 Multiplying matrices

There exist extremely sophisticated algorithms for multiplying two matrices together. Suppose we have two n × n matrices, A and B. Their product, C = AB, is also an n × n matrix, with entries given by the formula

    C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.    (1)

How long does it take to compute C? Since C has n^2 entries, even just writing it down in memory will take at least n^2 steps. On the other hand, the "obvious" method for computing C given by (1) takes about n^3 steps; to compute each of the n^2 entries it does roughly n arithmetic operations. Actually, there may be some extra time involved if the numbers involved get very large; e.g., if they're huge, it could take a lot of time just to, say, multiply two of them together. Let us eliminate this complication by just thinking about:

Matrix multiplication mod 2. Here we assume that the entries of A and B are just bits, 0 or 1, and we compute the product mod 2:

    C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}  (mod 2).    (2)

This way, every number involved is just a bit, so we needn't worry about the time of doing arithmetic on numbers. The "obvious" matrix multiplication algorithm therefore definitely takes at most O(n^3) steps.[1]

As you may know, there are very surprising, nontrivial algorithms that multiply matrices in time faster than O(n^3). The first and perhaps most famous is Strassen's algorithm:

Theorem 1. (Strassen, 1969.) It is possible to multiply two matrices in time roughly n^{log_2 7} ≈ n^{2.81}.

Incidentally, I've heard from Manuel Blum that Strassen told him that he thought up his algorithm by first studying the simpler problem of matrix multiplication mod 2. (Strassen's algorithm works for the general, non-mod-2 case too.) There were several improvements on Strassen's algorithm, and the current "world record" is due to Coppersmith and Winograd:

Theorem 2. (Coppersmith-Winograd, 1987.) It is possible to multiply two matrices in time roughly n^{2.376}.

Many people believe that matrix multiplication can be done in time n^{2+ε} for any ε > 0, but nobody knows how to do it.

[1] By the way, if you are not so familiar with big-Oh notation and analysis of running time, don't worry too much. This is not an algorithms course, and we won't be studying algorithmic running time in too much detail. Rather, we're just using it to motivate the importance of probability in algorithms and computing.

4.2 Verification

We are not actually going to discuss algorithms for matrix multiplication. Instead we will discuss verification of such algorithms.

Suppose your friend writes some code to implement the Coppersmith-Winograd algorithm (mod 2). It takes as input two n × n matrices of bits A and B, and outputs some n × n matrix of bits, D. The Coppersmith-Winograd algorithm is very complicated, and you might feel justifiably concerned about whether your friend implemented it correctly.
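To make equation (2) concrete, here is a minimal Python sketch of the "obvious" O(n^3) algorithm for matrix multiplication mod 2. The language choice, function name, and matrix representation are illustrative assumptions of mine, not part of the course materials.

```python
# A minimal sketch (not from the lecture) of the "obvious" O(n^3) algorithm
# from equation (2): multiplying two n x n bit matrices mod 2.

def matmul_mod2(A, B):
    """Return C = A*B (mod 2), where A and B are n x n lists of 0/1 entries."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):          # equation (2): sum over k of A_ik * B_kj, mod 2
                s ^= A[i][k] & B[k][j]  # in mod-2 arithmetic, XOR is addition and AND is multiplication
            C[i][j] = s
    return C

# Small sanity check on 2 x 2 matrices.
A = [[1, 0],
     [1, 1]]
B = [[1, 1],
     [0, 1]]
print(matmul_mod2(A, B))  # [[1, 1], [1, 0]]
```

The three nested loops make the cubic running time plain to see; the point of the verification problem in Section 4.2 is to check a claimed product D against A and B much faster than redoing this whole computation.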