
Avi Wigderson

On the Work of Madhu Sudan

Madhu Sudan is the recipient of the 2002 Nevanlinna Prize. Sudan has made fundamental contributions to two major areas of research, the connections between them, and their applications. The first area is coding theory. Established by Shannon and Hamming over fifty years ago, it is the mathematical study of the possibility of, and the limits on, reliable communication over noisy media. The second area is probabilistically checkable proofs (PCPs); by contrast, it is only ten years old. It studies the minimal resources required for probabilistic verification of standard mathematical proofs.

My plan is to briefly introduce these areas, their motivation, and foundational questions and then to explain Sudan’s main contributions to each. Before we get to the specific works of Madhu Sudan, let us start with a couple of comments that will set up the context of his work.

Madhu Sudan works in computational complexity theory. This research discipline attempts to rigorously define and study efficient versions of objects and notions arising in computational settings. This focus on efficiency is of course natural when studying computation itself, but it also has proved extremely fruitful in studying other fundamental notions such as proof, randomness, knowledge, and more. Here I will try to explain how the efficiency “eyeglasses” were key in this study of the notions proof (again) and error correction.

Theoretical computer science is an extremely interactive and collaborative community. Sudan’s work was not done in a vacuum, and much of the background to it, conceptual and technical, was developed by other people. The space I have does not allow me to give proper credit to all these people. A much better job has been done by Sudan himself; his homepage (http://theory.lcs.mit.edu/~madhu/) contains several surveys of these areas which give proper historical accounts and references. In particular see [13] for a survey on PCPs and [15] for a survey on the work on error correction.

Probabilistic Checking of Proofs

One informal variant of the celebrated P versus NP question asks, “Can mathematicians, past and future, be replaced by an efficient computer program?” We first define these notions and then explain the PCP theorem and its impact on this foundational question.

Efficient Computation

Throughout, by an efficient algorithm (or program, machine, or procedure) we mean an algorithm which runs in at most some fixed polynomial time in the length of its input.¹ The input is always a finite string of symbols from a fixed, finite alphabet. Note that an algorithm is an object of fixed size which is supposed to solve a problem on all inputs of all (finite) lengths. A problem is efficiently computable if it can be solved by an efficient algorithm.

Definition 1. The class P is the class of all problems solvable by efficient algorithms.

For example, the problems Integer Multiplication, Determinant, Linear Programming, Univariate Polynomial Factorization, and (recently established²) Testing Primality are in P.

Let us restrict attention (for a while) to algorithms whose output is always “accept” or “reject”. Such an algorithm A solves a decision problem. The set L of inputs which are accepted by A is called the language recognized by A. Statements of the form “x ∈ L” are correctly classified as “true” or “false” by the efficient algorithm, deterministically (and without any “outside help”).

1 Time refers to the number of elementary steps taken by the algorithm. The choice of “polynomial” to represent efficiency is both small enough to often imply practicality and large enough to make the definition independent of particular aspects of the model, e.g., the choice of allowed “elementary operations”.
2 [See Mitteilungen 4–2002, pp. 14–21. Editors’ note.]
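To make the notion of a decision problem concrete, here is a small sketch (my illustration, not from the article): a polynomial-time algorithm whose accepted inputs form a language, in this case the language of (suitably encoded) connected graphs. The problem and the encoding are chosen only for the example.

```python
from collections import deque

def accepts_connected_graph(n, edges):
    """Decide the language {encodings of connected graphs}.

    Input: n vertices labelled 0..n-1 and a list of edges.
    Runs in time polynomial (in fact linear) in the input length,
    so this decision problem lies in P.
    """
    if n == 0:
        return True  # convention: accept the empty graph
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:           # breadth-first search from vertex 0
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n  # "accept" iff every vertex was reached

# A path on 3 vertices is connected; a graph with an isolated vertex is not.
print(accepts_connected_graph(3, [(0, 1), (1, 2)]))  # True  ("accept")
print(accepts_connected_graph(3, [(0, 1)]))          # False ("reject")
```

The function always answers “accept” or “reject”, and the set of accepted inputs is exactly the language it recognizes.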


Efficient Verification

In contrast, allowing an efficient algorithm to use “outside help” (a guess or an alleged proof) naturally defines a proof system. We say that a language L is efficiently verifiable if there is an efficient algorithm V (for “Verifier”) and a fixed polynomial p for which the following completeness and soundness conditions hold:
– For every x ∈ L there exists a string π of length |π| ≤ p(|x|) such that V accepts the joint input (x, π).
– For every x ∉ L, for every string π of length |π| ≤ p(|x|), V rejects the joint input (x, π).

Naturally, we can view all strings x in L as theorems of the proof system V. Those strings π which cause V to accept x are legitimate proofs of the theorem x ∈ L in this system.

Definition 2. The class NP is the class of all languages that are efficiently verifiable.

It is clear that P ⊆ NP. Are they equal? This is the “P versus NP” question [5], one of the most important open scientific problems today. Not only mathematicians but scientists and engineers as well daily attempt to perform tasks (create theories and designs) whose success can hopefully be efficiently verified. Reflect on the practical and philosophical impact of a positive answer to the question: If P = NP, then much of their (creative!) work can be performed efficiently by one computer program.

Many important computational problems, like the Travelling Salesman, Integer Programming, Map Coloring, Systems of Quadratic Equations, and Integer Factorization, are (when properly coded as languages) in NP.

We stress two aspects of efficient verification. The purported “proof” π for the statement “x ∈ L” must be short, and the verification procedure must be efficient. It is important to note that all standard (logical) proof systems used in mathematics conform to the second restriction: Since only “local inferences” are made from “easily recognizable” axioms, verification is always efficient in the total length of statement and proof. The first restriction, on the length of the proof, is natural, since we want the verification to be efficient in terms of the length of the statement.

An excellent, albeit informal, example is the language MATH of all mathematical statements, whose proof verification is defined by the well-known efficient (anonymous) algorithm referee.³ As humans we are simply not interested in theorems whose proofs take, say, longer than our lifetime (or the three-month deadline given by the editor) to read and verify.

But is this notion of efficient verification – reading through the statement and proof, and checking that every new lemma indeed follows from previous ones (and known results) – the best we can hope for? Certainly as referees we would love some shortcuts as long as they do not change our notion of mathematical truth too much. Are there such shortcuts?

Efficient Probabilistic Verification

A major paradigm in computational complexity is allowing algorithms to flip coins. We postulate access to a supply of independent unbiased random variables which the probabilistic (or randomized) algorithm can use in its computation on a given input. We comment that the very rich theories (which we have no room to discuss) of pseudorandomness and of weak random sources attempt to bridge the gap between this postulate and “real-life” generation of random bits in computer programs.

The notion of efficiency remains the same: Probabilistic algorithms can make only a polynomial number of steps in the input length. However, the output becomes a random variable. We demand that the probability of error, on every input, never exceed a given small bound ε. (Note that ε can be taken to be, e.g., 1/3, since repeating the algorithm with fresh independent randomness and taking a majority vote of the answers can decrease the error exponentially in the number of repetitions.)

Returning to proof systems, we now allow the verifier to be a probabilistic algorithm. As above, we allow it to err (namely, accept false “proofs”) with extremely small probability. The gain would be extreme efficiency: The verifier will access only a constant number of symbols in the alleged proof. Naturally, the positions of the viewed symbols can be randomly chosen. What kind of theorems can be proved in the resulting proof system? First, let us formalize it.

We say that a language L has a probabilistically checkable proof if there is an efficient probabilistic algorithm V, a fixed polynomial p, and a fixed constant c for which the following completeness and probabilistic soundness conditions hold:
– For every x ∈ L there exists a string π, of length |π| ≤ p(|x|), such that V accepts the joint input (x, π) with probability 1.
– For every x ∉ L, for every string π of length |π| ≤ p(|x|), V rejects the joint input (x, π) with probability ≥ 1/2.
– On every input (x, π), V can access at most c bits of π.

3 This system can, of course, be formalized. However, it is better to have the social process of mathematics in mind before we plunge into the notions of the next subsection.
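The parenthetical remark above about error reduction can be made tangible with a small simulation (my illustration, not part of the article): a stand-in for a probabilistic algorithm that answers correctly only with probability 2/3, amplified by independent repetition and a majority vote. The same repetition idea, with “accept only if every run accepts” in place of the majority vote, is what drives down the soundness error of the probabilistic verifiers just defined.

```python
import random

def noisy_test(truth, error=1/3):
    """A stand-in for a probabilistic algorithm: it returns the correct
    answer 'truth' except with probability 'error'."""
    return truth if random.random() > error else (not truth)

def amplified_test(truth, k=101, error=1/3):
    """Run the noisy test k times with fresh randomness and take a
    majority vote; the error probability drops exponentially in k."""
    correct_votes = sum(noisy_test(truth, error) == truth for _ in range(k))
    return truth if correct_votes > k / 2 else (not truth)

def empirical_error(test, truth=True, trials=10_000):
    """Estimate how often 'test' misclassifies a fixed correct answer."""
    wrong = sum(test(truth) != truth for _ in range(trials))
    return wrong / trials

random.seed(0)
print("single run error     ~", empirical_error(noisy_test))      # about 0.33
print("majority of 101 runs ~", empirical_error(amplified_test))  # well below 0.01
```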


Note again that executing the verifier independently a constant number of times will reduce the “soundness” error to an arbitrarily small constant without changing the definition. Also note that randomness in the verifier is essential if it probes only a constant (or even logarithmic) number of symbols in the proof. The reader may verify that such deterministic verification can exist only for easy languages in P.

Definition 3. The class PCP is the class of all languages that have a probabilistically checkable proof.

The main contribution of Sudan and his colleagues to this area is one of the deepest and most important achievements of theoretical computer science.

Theorem 1 (The PCP Theorem [2], [1]). PCP = NP.

In words, every theorem that can be efficiently verified can also be verified probabilistically by viewing only a fixed number of bits of the purported proof. If the proof is correct, it will always be accepted. If it is wrong (in particular, when the input statement is not a theorem, all “proofs” will be wrong), it will be rejected with high probability despite the fact that the verifier hardly glanced at it.

The proof of the PCP theorem, from a very high-level point of view, takes a standard proof system (a problem in NP) and constructs from it a very robust proof system. A correct proof in the former is transformed to a correct proof in the latter. A false “proof” in the former (even if it has only one bug in a remote lemma, to use the refereeing metaphor again) is transformed to a “proof” littered with bugs, so many that a random sample of a few bits would find one.

This conversion appears related to error-correcting coding (which is our second topic), and indeed it is. However, it is quite a bit more, as the encoded string has a “meaning”: it is supposed to be a proof of a given statement, and the coding must keep it so. The conversion above is efficient and deterministic, so in principle an efficient program can be written to convert standard mathematical proofs to robust ones which can be refereed in a jiffy.

The PCP theorem challenges the classical belief that proofs have to be read and verified fully for one to be confident of the validity of the theorem. Of course one does not expect the PCP theorem to dramatically alter the process of writing and verifying proofs (any more than one would expect automated verifiers of proof systems to replace the referee for journal papers). In this sense the PCP theorem is just a statement of philosophical importance. However, the PCP theorem does have significant implications of immediate relevance to the theory of computation, and we explain this next.

Hardness of Approximation

The first and foremost contribution thus far to understanding the mystery of the P versus NP question was the discovery of NP-completeness and its ubiquity by Cook, Levin, and Karp in the early 1970s. Roughly speaking, a language is NP-complete if it is the hardest in the class NP. More precisely, a language L is NP-complete if any efficient algorithm for it can be used to efficiently solve every other language in NP. Note that by definition every NP-complete language is as hard to compute as any other. Moreover, P = NP if and only if any NP-complete language is easy, and P ≠ NP if and only if any NP-complete language is hard.

As it turns out, almost every language known in NP is known to be either NP-complete or in P. The great practical importance of the P versus NP question stems from the fact that numerous outstanding problems in science and engineering turn out to be NP-complete. For computer programmers or engineers required by their boss to find an efficient solution to a given problem, proving it NP-complete is the ultimate excuse for not doing it; after all, it is as hard as the thousands of other problems which scientists in various disciplines have attempted unsuccessfully.

Knowing the practical world, we suspect that the boss would not be impressed. In real life we need to solve impossible problems too. To do that, we reduce our expectations! An almost universal situation of this type is some optimization problem for which finding the optimal solution is NP-complete or harder. In this situation the boss would ask for an efficient algorithm for some “reasonable” approximation of the optimal solution. Many success stories exist; an important example is the efficient algorithm for approximating (by any constant factor > 1) the volume of a convex body of high dimension. What about failure? Does the theory of NP-completeness provide any excuses to our poor programmers if they fail again?

For twenty years there was essentially no answer to this question. The complexity of approximation problems was far from understood. It was clear that this area is much richer/murkier than that of decision problems. For illustration, consider the following three optimization problems.


– Linear Equations: Given a system of linear equations, say over the finite field GF(2), determine the maximal number that can be satisfied simultaneously.
– Set Cover: Given a collection of subsets of a given finite universe of size n, determine the size of the smallest subcollection that covers every element in the universe.
– Clique: Given a finite graph on n vertices, find the size of the largest subset of vertices which are pairwise connected by edges.

For each of these problems, finding the optimal solution is NP-complete. Some naive approximation algorithms have existed for a long time, and no one could improve them. They yield completely different approximation factors.

– Linear Equations: A random assignment will satisfy on average half the equations, so it is at most a factor 2 from optimal. Try beating it (a small simulation of this baseline appears below).
– Set Cover: A simple greedy algorithm, collecting subsets so that the next one covers as many yet uncovered elements as possible, will be a factor ln n from optimal. Try proving it.
– Clique: A trivial solution is a 1-vertex clique, which is within a factor n of optimal. Somewhat more elaborate algorithms give a factor n/(log n)^2. Think it pathetic? Try improving it to (the still pathetic?) n^0.999.

The PCP theorem, as well as many other technical developments of Sudan and other researchers, has paved the way to an almost complete understanding of how well we can approximate different natural optimization problems efficiently. These developments vary PCPs in many ways and study many other “resources” of the proof verification process, beyond the number of queries to the proof and the error probability. In particular, as the following three theorems show, the trivial approximation algorithms above are essentially the best possible.

– Linear Equations: Approximation by a factor of 2 − ε is NP-hard⁴ for every ε > 0 [11].
– Set Cover: Approximation by a factor of (1 − ε) ln n is NP-hard for every ε > 0 [6].
– Clique: Approximation by a factor of n^(1−ε) is NP-hard for every ε > 0 [10].

The connection between PCPs and the hardness of approximation was established in [7]. The basic idea is the following: The PCP theorem provides a natural optimization problem which cannot be efficiently approximated by any factor better than 2. Namely, fix a verifier V (and thus a language L it accepts, as in the PCP theorem). Given x, find the maximum acceptance probability of V on x over all proofs π. Clearly, by the definition of probabilistic verification, beating a factor of 2 efficiently means distinguishing between those x ∈ L and those x ∉ L. By the theorem, L can be any problem in NP, so such an efficient approximator would yield P = NP.

This optimization problem serves the same purpose that the satisfiability of Boolean formulae served when discovered as the first NP-complete language. From then on efficient reductions, namely, transformations of one problem to another, could be used to prove completeness. Here, too, reductions are used to get the above theorems on hardness of approximation. However, these reductions are far more intricate than for decision problems; the difference in approximability of these three problems is but the first indication of the richness and complexity of this area.

List Decodable and Implicit Error-Correcting Codes

Unique Decoding

You have some precious information, which you may want to store or communicate. It is represented, say, by K symbols from some fixed alphabet Σ. To protect any part of it from being lost, you are prepared to store/communicate it with redundancy, using N symbols of the same alphabet. Then it will be subject to (a process we view as) an adversary who may destroy or change, say, T of the symbols in the encoding. A decoding process must then be able to recover the original information.

This scenario is at the core of a multitude of practical devices you use – CDs, satellite transmissions, Internet communications, and many others. The problem of determining the best achievable relationships between the parameters K, N, and T was raised by Hamming about fifty years ago. In a slightly different scenario, when changes are random, it was raised even earlier by Shannon. The variety of related models and problems constitutes the large and active field of coding theory and its close relative, information theory.

Once again, efficiency requirements enrich the problems tremendously. It was only a few years ago that optimal codes (having linear-time encoding and decoding algorithms) were developed. These nearly match Shannon’s completely nonconstructive bounds and apply as well to the Hamming problem (on which we focus from now on for simplicity).
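Returning to the first naive algorithm above: the claim that a uniformly random assignment satisfies, on average, half of the equations of a GF(2) system is easy to check empirically. The sketch below is my illustration (the random instance and all names are invented for the example), not taken from the article or from the cited hardness results.

```python
import random

def random_system(n_vars=30, n_eqs=200, seed=1):
    """A random system over GF(2): each equation asserts that the XOR of
    three chosen variables equals a given right-hand side bit."""
    rng = random.Random(seed)
    return n_vars, [(rng.sample(range(n_vars), 3), rng.randint(0, 1))
                    for _ in range(n_eqs)]

def satisfied_fraction(assignment, equations):
    """Fraction of equations satisfied by a 0/1 assignment."""
    good = sum(1 for support, rhs in equations
               if sum(assignment[i] for i in support) % 2 == rhs)
    return good / len(equations)

n_vars, equations = random_system()
rng = random.Random(7)
averages = [satisfied_fraction([rng.randint(0, 1) for _ in range(n_vars)],
                               equations)
            for _ in range(1000)]
print(sum(averages) / len(averages))  # close to 0.5, the factor-2 baseline
```

Each equation involving at least one variable is satisfied by a random assignment with probability exactly 1/2, so the expected fraction is 1/2 regardless of the instance; the simulation merely makes that baseline tangible.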

4 We use “hard” rather than “complete”, as these problems are not languages in NP as defined above (but they can be so defined). The meaning remains: an efficient algorithm for the problem would yield P = NP.


One central feature of this huge body of work was its focus on unique decoding: you want to recover the original information when the corrupted codeword defines it unambiguously. This requires that the encodings of any two information words differ in at least 2T + 1 positions.

Is there a meaning to useful decoding when the Hamming distance (the number of differing symbols) is less than 2T + 1 and ambiguity is unavoidable? Can one achieve it efficiently? Why bother?

List Decoding

The questions above were pondered in the past decades. It was realized that ambiguous decoding would be useful if the decoding process generated a short list of candidates, one of which is the original information. Moreover, it was realized that in principle such a short list exists even if distances between pairs of encodings are close to T. This answers the “why bother?” question immediately: This would drastically improve the ratio between K and N. (In this area constant factor improvements are “drastic”.) But no nontrivial code was known to support such decoding even remotely efficiently.

Sudan’s extremely elegant algorithm to list-decode Reed-Solomon codes completely changed the situation. It started a snowball rolling which transformed large areas in this field in a matter of two years, again led by Sudan and his colleagues. Moreover, once discovered, these codes were applied to solve theoretical problems, mainly within complexity theory, providing completely different answers (that we mention later) to the “why bother?” question.

The Reed-Solomon codes, the old result about unique decoding, and Sudan’s result on list decoding should appeal to any mathematician, whether interested in error correction or not. Here is the setup; the reader can easily relate the parameters below to those above.

Fix a finite field F and an integer d ≤ |F|. My information is a degree d polynomial p over F, and I encode it simply as a table of the values of p on all elements of F. Now suppose an adversary changes the table, the only restriction being to leave at least t of the |F| positions in agreement with p. Can p be recovered from the table efficiently?

For unique decoding, namely, for the table to determine p unambiguously, we clearly need t > |F|/2 (otherwise we could fill half the table with the values of one polynomial, p₁, and the other half with values of a different polynomial, p₂). When d < |F|/2, this bound turns out to be not only necessary but also sufficient, even when efficiency is required.

Theorem 2. There is an efficient algorithm which recovers p from any table that agrees with p on t > |F|/2 elements of F.

Note that the decoding problem is a nonlinear one and brute-force search takes time about |F|^d, which is prohibitive when d is large. The algorithm above, polynomial in |F| and d, uses (efficient) univariate polynomial factoring over F. Sudan’s algorithm uses factorization of bivariate polynomials to list-decode even if the fraction of agreement goes to zero with |F|!

Theorem 3 [14, 9]. There is an efficient algorithm which, for every ε > √(d/|F|), recovers a list of O(1/ε^2) polynomials containing p from any table that agrees with p on t > ε|F| elements of F.

Put differently, given any function g from F to itself, this algorithm efficiently recovers all degree d polynomials which agree with g on ε|F| arguments.
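To see this setup on a toy scale, here is a small sketch (mine, not Sudan’s algorithm): it encodes a degree-1 polynomial over GF(7) as its table of values, lets an adversary overwrite four of the seven entries, and then recovers by brute force every polynomial of degree at most 1 that agrees with the corrupted table in at least three positions. The true polynomial survives in only three of the seven positions, below the unique-decoding threshold of Theorem 2, yet it still appears on a short list; Sudan’s algorithm achieves this in time polynomial in |F| and d rather than by exhaustive search.

```python
from itertools import product

FIELD = 7       # work over GF(7); small enough for exhaustive search
DEGREE = 1      # degree bound of the information polynomial
AGREEMENT = 3   # keep candidates agreeing with the table in >= 3 places

def evaluate(coeffs, x):
    """Evaluate the polynomial with coefficient list 'coeffs' at x, mod FIELD."""
    return sum(c * pow(x, i, FIELD) for i, c in enumerate(coeffs)) % FIELD

def encode(coeffs):
    """Reed-Solomon style encoding: the table of values on all of GF(FIELD)."""
    return [evaluate(coeffs, x) for x in range(FIELD)]

def brute_force_list_decode(table, degree, agreement):
    """Return every polynomial of degree <= 'degree' whose value table agrees
    with 'table' in at least 'agreement' positions (exponential in degree)."""
    return [coeffs for coeffs in product(range(FIELD), repeat=degree + 1)
            if sum(evaluate(coeffs, x) == table[x]
                   for x in range(FIELD)) >= agreement]

p = (3, 2)                       # the information: p(x) = 3 + 2x over GF(7)
table = encode(p)
for x in (0, 1, 2, 5):           # the adversary rewrites 4 of the 7 entries
    table[x] = (table[x] + 1) % FIELD
print(brute_force_list_decode(table, DEGREE, AGREEMENT))  # a short list containing (3, 2)
```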


Implicit Codes

So we have codes (for unique and list decoding) that achieve near-optimal parameters in wide ranges and whose encoding and decoding take linear time – essentially the time to read the input and write down the output. What more can we want? The answer is, try to use sublinear time: some root of the input length; better, a logarithm of it; or, even better, constant time, independent of the input length!

If you consider the amounts of data biologists and astronomers have to understand and explain, or the amounts of data on the World Wide Web from which we want to gather some useful information, you will realize that sublinear algorithms which look at only a tiny fraction of the data are in vast demand. If you consider the way we search through a phone book or gather simple statistics via sampling, you will realize that at least for very simple tasks such algorithms exist. The importance of such algorithms has made this area grow in recent years, with many more examples of sophisticated algorithms for far less trivial tasks.

Computational complexity suggests a natural way to represent huge objects. A basic object is a function, and a basic representation for it is a program that computes it. Returning to the coding problem, think of K (and hence of N) as being so large that we have neither room nor time to write it down explicitly. The K information symbols we wish to encode are given by a program P which on input i gives the value of the i-th information symbol. Note that such a program can be extremely succinct: It can have size exponentially smaller than K.

The encoder will take P and produce another program Q of the same type; on input j […]

[…] we recover P in any reasonable sense? The appropriate sense suggests itself: we should construct a small program P that for any input i […]

[…] home page (http://theory.lcs.mit.edu/~madhu/) reveals not only the extent to which Sudan was part […]
