
On the Work of Madhu Sudan

Avi Wigderson

Avi Wigderson is professor of mathematics at the Institute for Advanced Study, Princeton, and The Hebrew University, Jerusalem. His email address is [email protected].

Madhu Sudan is the recipient of the 2002 Nevanlinna Prize. Sudan has made fundamental contributions to two major areas of research, the connections between them, and their applications. The first area is coding theory. Established by Shannon and Hamming over fifty years ago, it is the mathematical study of the possibility of, and the limits on, reliable communication over noisy media. The second area is probabilistically checkable proofs (PCPs). By contrast, it is only ten years old. It studies the minimal resources required for probabilistic verification of standard mathematical proofs.

My plan is to briefly introduce these areas, their motivation, and foundational questions and then to explain Sudan's main contributions to each. Before we get to the specific works of Madhu Sudan, let us start with a couple of comments that will set up the context of his work.

• Madhu Sudan works in computational complexity theory. This research discipline attempts to rigorously define and study efficient versions of objects and notions arising in computational settings. This focus on efficiency is of course natural when studying computation itself, but it has also proved extremely fruitful in studying other fundamental notions such as proof, randomness, knowledge, and more. Here I will try to explain how the efficiency "eyeglasses" were key in this study of the notions of proof (again) and error correction.

• Theoretical computer science is an extremely interactive and collaborative community. Sudan's work was not done in a vacuum, and much of the background to it, conceptual and technical, was developed by other people. The space I have does not allow me to give proper credit to all these people. A much better job has been done by Sudan himself; his homepage (http://theory.lcs.mit.edu/~madhu/) contains several surveys of these areas which give proper historical accounts and references. In particular, see [13] for a survey on PCPs and [15] for a survey on the work on error correction.

Probabilistic Checking of Proofs

One informal variant of the celebrated P versus NP question asks, Can mathematicians, past and future, be replaced by an efficient computer program? We first define these notions and then explain the PCP theorem and its impact on this foundational question.

Efficient Computation

Throughout, by an efficient algorithm (or program, machine, or procedure) we mean an algorithm which runs in time¹ at most some fixed polynomial in the length of its input. The input is always a finite string of symbols from a fixed, finite alphabet. Note that an algorithm is an object of fixed size which is supposed to solve a problem on all inputs of all (finite) lengths. A problem is efficiently computable if it can be solved by an efficient algorithm.

¹Time refers to the number of elementary steps taken by the algorithm. The choice of "polynomial" to represent efficiency is both small enough to often imply practicality and large enough to make the definition independent of particular aspects of the model, e.g., the choice of allowed "elementary operations".

Definition 1. The class P is the class of all problems solvable by efficient algorithms.

For example, the problems Integer Multiplication, Determinant, Linear Programming, Univariate Polynomial Factorization, and (recently established) Testing Primality are in P.

Let us restrict attention (for a while) to algorithms whose output is always "accept" or "reject". Such an algorithm A solves a decision problem. The set L of inputs which are accepted by A is called the language recognized by A. Statements of the form "x ∈ L" are correctly classified as "true" or "false" by the efficient algorithm A, deterministically (and without any "outside help").

Efficient Verification

In contrast, allowing an efficient algorithm to use "outside help" (a guess or an alleged proof) naturally defines a proof system. We say that a language L is efficiently verifiable if there is an efficient algorithm V (for "Verifier") and a fixed polynomial p for which the following completeness and soundness conditions hold:

• For every x ∈ L there exists a string π of length |π| ≤ p(|x|) such that V accepts the joint input (x, π).
• For every x ∉ L and every string π of length |π| ≤ p(|x|), V rejects the joint input (x, π).

Naturally, we can view all strings x in L as theorems of the proof system V. Those strings π which cause V to accept x are legitimate proofs of the theorem x ∈ L in this system.

Definition 2. The class NP is the class of all languages that are efficiently verifiable.

It is clear that P ⊆ NP. Are they equal? This is the "P versus NP" question [5], one of the most important open scientific problems today. Not only mathematicians but scientists and engineers as well daily attempt to perform tasks (create theories and designs) whose success can hopefully be efficiently verified. Reflect on the practical and philosophical impact of a positive answer to the question: if P = NP, then much of their (creative!) work can be performed efficiently by one computer program.

Many important computational problems, like the Travelling Salesman, Integer Programming, Map Coloring, Systems of Quadratic Equations, and Integer Factorization, are (when properly coded as languages) in NP.
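To make this concrete, here is a minimal sketch, in Python, of an efficient verifier for one language from the list above, Map Coloring in its graph form (3-colorability of a graph). The edge-list encoding of the input and the toy instances are illustrative assumptions, not from the article; the alleged proof π is just a coloring of the vertices, so its length is linear in the input length, and V runs in linear time.

    # A verifier V for the NP language of 3-colorable graphs (Map Coloring in
    # graph form). Input x: the number of vertices n and a list of edges over
    # vertices 0..n-1 (an illustrative encoding). Alleged proof pi: one of three
    # colors per vertex. V runs in time linear in |x| + |pi|, hence is efficient.

    def verifier(n, edges, pi):
        if len(pi) != n or any(color not in (0, 1, 2) for color in pi):
            return False                    # malformed proof: reject
        # Accept iff every edge joins differently colored vertices.
        return all(pi[u] != pi[v] for (u, v) in edges)

    # Completeness: for x in L (here, a triangle) some proof makes V accept.
    print(verifier(3, [(0, 1), (1, 2), (0, 2)], [0, 1, 2]))   # True
    # Soundness: an improper coloring is rejected; if x is not in L, every pi fails.
    print(verifier(3, [(0, 1), (1, 2), (0, 2)], [0, 0, 2]))   # False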
We stress two aspects of efficient verification. The purported "proof" π for the statement "x ∈ L" must be short, and the verification procedure must be efficient. It is important to note that all standard (logical) proof systems used in mathematics conform to the second restriction: since only "local inferences" are made from "easily recognizable" axioms, verification is always efficient in the total length of statement and proof. The first restriction, on the length of the proof, is natural, since we want the verification to be efficient in terms of the length of the statement.

An excellent, albeit informal, example is the language MATH of all mathematical statements, whose proof verification is defined by the well-known efficient (anonymous) algorithm REFEREE.² As humans we are simply not interested in theorems whose proofs take, say, longer than our lifetime (or the three-month deadline given by EDITOR) to read and verify.

²This system can, of course, be formalized. However, it is better to have the social process of mathematics in mind before we plunge into the notions of the next subsection.

But is this notion of efficient verification—reading through the statement and proof, and checking that every new lemma indeed follows from previous ones (and known results)—the best we can hope for? Certainly as referees we would love some shortcuts, as long as they do not change our notion of mathematical truth too much. Are there such shortcuts?

Efficient Probabilistic Verification

A major paradigm in computational complexity is allowing algorithms to flip coins. We postulate access to a supply of independent unbiased random variables which the probabilistic (or randomized) algorithm can use in its computation on a given input. We comment that the very rich theories (which we have no room to discuss) of pseudorandomness and of weak random sources attempt to bridge the gap between this postulate and "real-life" generation of random bits in computer programs.

The notion of efficiency remains the same: probabilistic algorithms can make only a polynomial number of steps in the input length. However, the output becomes a random variable. We demand that the probability of error, on every input, never exceed a given small bound ε. (Note that ε can be taken to be, e.g., 1/3, since repeating the algorithm with fresh independent randomness and taking a majority vote of the answers decreases the error exponentially in the number of repetitions; the sketch below makes this concrete.)
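A minimal Python sketch of this repetition trick follows; the noisy decision procedure being amplified is a toy stand-in invented for illustration.

    import random

    def amplify(prob_algorithm, x, repetitions):
        # Run a randomized accept/reject algorithm independently `repetitions`
        # times and return the majority vote. If each run errs with probability
        # at most 1/3, the majority errs exponentially rarely in `repetitions`.
        accepts = sum(1 for _ in range(repetitions) if prob_algorithm(x))
        return accepts > repetitions // 2

    # A toy stand-in: decides whether x is even, but errs with probability 0.3.
    def noisy_evenness(x):
        answer = (x % 2 == 0)
        return answer if random.random() > 0.3 else not answer

    print(amplify(noisy_evenness, 42, repetitions=101))   # True, except with tiny probability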

Returning to proof systems, we now allow the verifier V to be a probabilistic algorithm. As above, we allow it to err (namely, accept false "proofs") with extremely small probability. The gain would be extreme efficiency: the verifier will access only a constant number of symbols in the alleged proof. Naturally, the positions of the viewed symbols can be randomly chosen. What kind of theorems can be proved in the resulting proof system? First, let us formalize it.

We say that a language L has a probabilistically checkable proof if there is an efficient probabilistic algorithm V, a fixed polynomial p, and a fixed constant c for which the following completeness and probabilistic soundness conditions hold.

• For every x ∈ L there exists a string π, of length |π| ≤ p(|x|), such that V accepts the joint input (x, π) with probability 1.
• For every x ∉ L and every string π of length |π| ≤ p(|x|), V rejects the joint input (x, π) with probability ≥ 1/2.
• On every input (x, π), V can access at most c bits of π.

Note again that executing the verifier independently a constant number of times will reduce the "soundness" error to an arbitrarily small constant without changing the definition. Also note that randomness in the verifier is essential if it probes only a constant (or even logarithmic) number of symbols in the proof. The reader may verify that such deterministic verification can exist only for easy languages in P.

Definition 3. The class PCP is the class of all languages that have a probabilistically checkable proof.

The main contribution of Sudan and his colleagues to this area is one of the deepest and most important achievements of theoretical computer science.

Theorem 1 (The PCP Theorem [2, 1]). PCP = NP.

In words, every theorem that can be efficiently verified can also be verified probabilistically by viewing only a fixed number of bits of the purported proof. If the proof is correct, it will always be accepted. If it is wrong (in particular, when the input statement is not a theorem, all "proofs" will be wrong), it will be rejected with high probability despite the fact that the verifier hardly glanced at it.

The proof of the PCP theorem, from a very high-level point of view, takes a standard proof system (a problem in NP) and constructs from it a very robust proof system. A correct proof in the former is transformed to a correct proof in the latter. A false "proof" in the former (even if it has only one bug in a remote lemma, to use the refereeing metaphor again) is transformed to a "proof" littered with bugs, so many that a random sample of a few bits would find one.

This conversion appears related to error-correcting coding (which is our second topic), and indeed it is. However, it is quite a bit more, as the encoded string has a "meaning": it is supposed to be a proof of a given statement, and the coding must keep it so.

The conversion above is efficient and deterministic. So in principle an efficient program can be written to convert standard mathematical proofs to robust ones which can be refereed in a jiffy. The PCP theorem challenges the classical belief that proofs have to be read and verified fully for one to be confident of the validity of the theorem. Of course one does not expect the PCP theorem to dramatically alter the process of writing and verifying proofs (any more than one would expect automated verifiers of proof systems to replace the REFEREE for journal papers). In this sense the PCP theorem is just a statement of philosophical importance. However, the PCP theorem does have significant implications of immediate relevance to the theory of computation, and we explain this next.

Hardness of Approximation

The first and foremost contribution thus far to understanding the mystery of the P versus NP question was the discovery of NP-completeness and its ubiquity by Cook, Levin, and Karp in the early 1970s.

Roughly speaking, a language is NP-complete if it is the hardest in the class NP. More precisely, a language L is NP-complete if any efficient algorithm for it can be used to efficiently solve every other language in NP. Note that by definition every NP-complete language is as hard to compute as any other. Moreover, P = NP if and only if any NP-complete language is easy, and P ≠ NP if and only if any NP-complete language is hard. As it turns out, almost every language known in NP is known to be either NP-complete or in P.

The great practical importance of the P versus NP question stems from the fact that numerous outstanding problems in science and engineering turn out to be NP-complete. For computer programmers or engineers required by their boss to find an efficient solution to a given problem, proving it NP-complete is the ultimate excuse for not doing it; after all, it is as hard as all these thousands of other problems which scientists in various disciplines have attempted unsuccessfully.

Knowing the practical world, we suspect that the boss would not be impressed. In real life we need to solve impossible problems too. To do that, we reduce our expectations! An almost universal situation of this type is some optimization problem for which finding the optimal solution is NP-complete or harder. In this situation the boss would ask for an efficient algorithm for some "reasonable" approximation of the optimal solution. Many success stories exist; an important example is the efficient algorithm for approximating (by any constant factor > 1) the volume of a convex body of high dimension. What about failure? Does the theory of NP-completeness provide any excuses to our poor programmers if they fail again?

For twenty years there was essentially no answer to this question. The complexity of approximation problems was far from understood. It was clear that this area is much richer/murkier than that of decision problems. For illustration, consider the following three optimization problems.

• Linear Equations: Given a system of n linear equations, say over the finite field GF(2), determine the maximal number that can be satisfied simultaneously.
• Set Cover: Given a collection of subsets of a given finite universe of size n, determine the size of the smallest subcollection that covers every element in the universe.
• Clique: Given a finite graph on n vertices, find the size of the largest subset of vertices which are pairwise connected by edges.

For each of these problems, finding the optimal solution is NP-complete. Some naive approximation algorithms have existed for a long time, and no one could improve them. They yield completely different approximation factors.

• Linear Equations: A random assignment will satisfy on average half the equations, so it is at most a factor 2 from optimal. Try beating it.
• Set Cover: A simple greedy algorithm, collecting subsets so that the next one covers as many yet-uncovered elements as possible, will be a factor ln n from optimal (a sketch follows this list). Try proving it.
• Clique: A trivial solution is a 1-vertex clique, which is within a factor n of optimal. Somewhat more elaborate algorithms give a factor n/(log n)². Think it pathetic? Try improving it to (the still pathetic?) n^0.999.
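Here is the greedy Set Cover heuristic from the list above as a short Python sketch; the instance at the end is made up for illustration.

    def greedy_set_cover(universe, subsets):
        # Repeatedly pick the subset covering the most still-uncovered elements.
        # The resulting cover is at most a factor ln n larger than the optimum,
        # where n is the size of the universe.
        uncovered = set(universe)
        chosen = []
        while uncovered:
            best = max(subsets, key=lambda s: len(uncovered & s))
            if not (uncovered & best):
                raise ValueError("the subsets do not cover the universe")
            chosen.append(best)
            uncovered -= best
        return chosen

    cover = greedy_set_cover(range(6), [{0, 1, 2}, {2, 3}, {3, 4, 5}, {1, 4}])
    print(cover)   # [{0, 1, 2}, {3, 4, 5}]: here the greedy choice happens to be optimal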

The PCP theorem, as well as many other technical developments of Sudan and other researchers, has paved the way to an almost complete understanding of how well we can approximate different natural optimization problems efficiently. These developments vary PCPs in many ways and study many other "resources" of the proof verification process, beyond the number of queries to the proof and the error probability. In particular, as the following three different theorems show, the trivial approximation algorithms above are essentially the best possible.

• Linear Equations: Approximation by a factor of 2 − ε is NP-hard³ for every ε > 0 [11].
• Set Cover: Approximation by a factor of (1 − ε) ln n is NP-hard for every ε > 0 [6].
• Clique: Approximation by a factor of n^(1−ε) is NP-hard for every ε > 0 [10].

³We use "hard" rather than "complete", as these problems are not languages in NP as defined above (but they can be so defined). The meaning remains: an efficient algorithm for the problem would yield P = NP.

The connection between PCPs and the hardness of approximation was established in [7]. The basic idea is the following: The PCP theorem provides a natural optimization problem which cannot be efficiently approximated by any factor better than 2. Namely, fix a verifier V (and thus a language L it accepts, as in the PCP theorem). Given x, find the maximum acceptance probability of V on x over all proofs π. Clearly, by the definition of probabilistic verification, beating a factor of 2 efficiently means distinguishing between those x ∈ L and those x ∉ L. By the theorem, L can be any problem in NP, so such an efficient approximator would yield P = NP.

This optimization problem serves the same purpose that the satisfiability of Boolean formulae served when discovered as the first NP-complete language. From then on, efficient reductions, namely, transformations of one problem to another, could be used to prove completeness. Here, too, reductions are used to get the above theorems on hardness of approximation. However, these reductions are far more intricate than for decision problems; the difference in approximability of these different problems is but the first indication of the richness and complexity of this area.

List Decodable and Implicit Error-Correcting Codes

Unique Decoding

You have some precious information, which you may want to store or communicate. It is represented, say, by K symbols from some fixed alphabet Σ. To protect any part of it from being lost, you are prepared to store/communicate it with redundancy, using N symbols of the same alphabet. Then it will be subject to (a process we view as) an adversary who may destroy or change, say, T of the symbols in the encoding. A decoding process must then be able to recover the original information.

This scenario is the core of a multitude of practical devices you use—CDs, satellite transmissions, Internet communications, and many others. The problem of determining the best achievable relationships between the parameters K, N, and T was raised by Hamming about fifty years ago. In a slightly different scenario, when changes are random, it was raised even earlier by Shannon. The variety of related models and problems constitutes the large and active field of coding theory and its close relative, information theory.

Once again, efficiency requirements enrich the problems tremendously. It was only a few years ago that optimal codes (having linear-time encoding and decoding algorithms) were developed. These nearly match Shannon's completely nonconstructive bounds and apply as well to the Hamming problem (on which we focus from now on for simplicity).

One central feature of this huge body of work was its focus on unique decoding: you want to recover the original information when the corrupted codeword defines it unambiguously. This requires that the encodings of any two information words differ in at least 2T + 1 positions. Is there a meaning to useful decoding when the Hamming distance (the number of differing symbols) is less than 2T + 1 and ambiguity is unavoidable? Can one achieve it efficiently? Why bother?
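Before answering, it may help to see why the bound 2T + 1 gives unique decoding. The Python sketch below performs nearest-codeword decoding over a toy two-codeword codebook (an illustrative assumption, not from the article): if every two codewords differ in at least 2T + 1 positions and at most T symbols were changed, the original codeword is strictly closer to the received word than any other, so the decoder cannot err.

    def hamming(u, v):
        # Hamming distance: the number of positions where u and v differ.
        return sum(a != b for a, b in zip(u, v))

    def unique_decode(received, codebook, T):
        # Nearest-codeword decoding. If pairwise distances in the codebook are
        # at least 2T+1 and at most T symbols were corrupted, the answer is
        # unique: the true codeword is within T, every other is farther away.
        best = min(codebook, key=lambda c: hamming(received, c))
        return best if hamming(received, best) <= T else None   # None: too corrupted

    # Toy code: two codewords at distance 5, so T = 2 errors are always correctable.
    codebook = ["00000", "11111"]
    print(unique_decode("01000", codebook, T=2))   # "00000": one change, decoded uniquely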

List Decoding

The questions above were pondered in the past decades. It was realized that ambiguous decoding would be useful if the decoding process generated a short list of candidates, one of which is the original information. Moreover, it was realized that in principle such a short list exists even if distances between pairs of encodings are close to T. This answers the "why bother?" question immediately: this would drastically improve the ratio between K and N (in this area constant-factor improvements are "drastic"). But no nontrivial code was known to support such decoding even remotely efficiently.

Sudan's extremely elegant algorithm to list-decode Reed-Solomon codes completely changed the situation. It started a snowball rolling which transformed large areas in this field in a matter of two years, again led by Sudan and his colleagues. Moreover, once discovered, these codes were applied to solve theoretical problems, mainly within complexity theory, providing completely different answers (that we mention later) to the "why bother?" question.

The Reed-Solomon codes, the old result about unique decoding, and Sudan's result on list decoding should appeal to any mathematician, whether interested or not in error correction. Here is the setup; the reader can easily relate the parameters below to those above.

Fix a finite field F and an integer d ≤ |F|. My information is a degree d polynomial p over F, and I encode it simply as a table of the values of p on all elements of F. Now suppose an adversary changes the table, the only restriction being to leave at least t of the |F| positions in agreement with p. Can p be recovered from the table efficiently? Note that p is determined uniquely as long as t > |F|/2 (otherwise we could fill half the table with the values of one polynomial, p1, and the other half with values of a different polynomial, p2).

Theorem 2 [12]. There is an efficient algorithm that recovers p from any table that agrees with p on t > |F|/2 elements of F.

Note that the decoding problem is a nonlinear one, and brute-force search takes time about |F|^d, which is prohibitive when d is large. The algorithm above, polynomial in |F| and d, uses (efficient) univariate polynomial factoring over F. Sudan's algorithm uses factorization of bivariate polynomials to list-decode even if the fraction of agreement goes to zero with |F|!

Theorem 3 [14, 9]. There is an efficient algorithm which, for every ε > √(d/|F|), recovers a list of O(1/ε²) polynomials containing p from any table that agrees with p on t > ε|F| elements of F.

Put differently, given any function g from F to itself, this algorithm efficiently recovers all degree d polynomials which agree with g on ε|F| arguments.

JANUARY 2003 NOTICES OF THE AMS 49 with Q only on some small fraction of the inputs. also his investment in collecting, clarifying, and Can we recover P in any reasonable sense? The ap- conveying this knowledge in teaching and writing. propriate sense suggests itself: we should construct a small program P that for any input i
