Computing over the Reals: Where Turing Meets Newton Lenore Blum

he classical (Turing) theory of computation been a powerful tool in classical complexity theory. has been extraordinarily successful in pro- But now, in addition, the transfer of complexity re- Tviding the foundations and framework for sults from one domain to another becomes a real theoretical computer science. Yet its dependence possibility. For example, we can ask: Suppose we on 0s and 1s is fundamentally inadequate for pro- can show P = NP over C (using all the mathemat- viding such a foundation for modern scientific ics that is natural here). Then, can we conclude that computation, in which most algorithms—with ori- P = NP over another field, such as the algebraic gins in Newton, Euler, Gauss, et al.—are real num- numbers, or even over Z2? (Answer: Yes and es- ber algorithms. sentially yes.) In 1989, Mike Shub, Steve Smale, and I introduced In this article, I discuss these results and indi- a theory of computation and complexity over an cate how basic notions from numerical analysis arbitrary ring or field R [BSS89]. If R is such as condition, round-off, and approximation are Z2 = ({0, 1}, +, ·), the classical computer science being introduced into complexity theory, bringing theory is recovered. If R is the field of real num- together ideas germinating from the real calculus bers R, Newton’s algorithm, the paradigm algo- of Newton and the discrete computation of com- rithm of numerical analysis, fits naturally into our puter science. The canonical reference for this ma- model of computation. terial is the book Complexity and Real Computation Complexity classes P, NP and the fundamental [BCSS98]. question “Does P = NP?” can be formulated natu- rally over an arbitrary ring R. The answer to the fun- Two Traditions of Computation damental question depends in general on the com- The two major traditions of the theory of compu- plexity of deciding feasibility of polynomial systems tation have, for the most part, run a parallel non- over R. When R is Z2, this becomes the classical sat- intersecting course. On the one hand, we have nu- isfiability problem of Cook–Levin [Cook71, Levin73]. merical analysis and scientific computation; on the When R is the field of complex numbers C, the an- other hand, we have the tradition of computation swer depends on the complexity of Hilbert’s Null- theory arising from logic and computer science. stellensatz. Fundamental to both traditions is the notion of The notion of reduction between problems (e.g., algorithm. Newton’s method is the paradigm ex- between traveling salesman and satisfiability) has ample of an algorithm cited most often in numerical analysis texts. The Turing machine is the Lenore Blum is Distinguished Career Professor of Computer underlying model of computation given in most Science at Carnegie Mellon University. Her email address computer science texts on algorithms. Yet Newton’s is [email protected]. This article is based on the AWM given Supported in part by NSF grant # CCR-0122581. The au- by Lenore Blum at the Joint Mathematics Meetings in San thor is grateful to the Toyota Technological Institute at Diego, January 2001. Chicago for the gift of space-time to write this article.

1024 NOTICES OF THE AMS VOLUME 51, NUMBER 9 method is not discussed in these computer science texts, nor are Turing machines men- tioned in texts on numerical analysis. More fundamental differences arise with the distinct underlying spaces, the mathe- matics employed, and the problems tackled by each tradition. In numerical analysis and scientific computation, algorithms are gen- erally defined over the reals or complex num- bers, and the relevant mathematics is that of the continuum. On the other hand, 0s and 1s are the basic bits of the theory of computa- tion of computer science, and the mathe- matics employed is generally discrete. The problems of numerical analysts tend to come ∗ to be a decidable subset of Σ (e.g., as would be the from the classical tradition of equation solving case for the set of all diophantine polynomials em- and the calculus. Those of the computer scientist bedded in {0, 1}∗ via some natural coding). N.B. Σ∗ tend to have more recent combinatorial origins. The is countable. highly developed theory of computation and com- To complete the definition, many seemingly dif- plexity theory of computer science in general is un- ferent machines were proposed. What has been natural for analyzing problems arising in numeri- striking is that all gave rise to the exact same class cal analysis, yet no comparable formal theory has emanated from the latter. of “computable” functions. This gives rise to the One aim of our work is to reconcile the disso- belief, known as Church’s Thesis, that the com- nance between these two traditions, perhaps to putable functions form a natural class and any in- unify, but most important, to see how perspec- formal notion of procedure or algorithm can be re- tives and tools of each can inform the other. alized within any of the formal settings. We begin with some background and motivation, In 1970, Yuri Matijasevich answered Hilbert’s then we present our unifying model and main com- Tenth Problem in the negative by showing the as- plexity results, and, finally, we see Turing meet sociated characteristic function χS is not Turing Newton and fundamental links introduced. computable and hence, by Church’s Thesis, no procedure exists for deciding the solvability in in- Background tegers of diophantine equations. (Following the program mapped out earlier by Martin Davis, Hi- The motivation for logicians to develop a theory of computation in the 1930s had little to do with computers (think of it, aside from historical arti- facts, there were no computers around then). 0 or 1 1, R Rather, Gödel, Turing, et al. were groping with the A B question: “What does it mean for a problem or set 0 or 1 S (⊂ universe X) to be decidable?” For example, how 0, R can one make precise Hilbert’s Tenth Problem? Example. Hilbert’s Tenth Problem. Let X ={f ∈ Z[x1,...,xn] n>0} and S ={f ∈ X|∃ζ n 1010100000000 ∈ Z ,f(ζ1,...,ζn)=0} . Is S decidable? That is, can one decide by finite means, given a diophantine polynomial, whether or not it has an integer solu- The Turing Machine (with simple operations, tion? (Actually, Hilbert’s challenge was: Produce finite number of states, finite program, and such a decision procedure.) infinite tape) provides a mathematical The logicians’ subsequent formalization of the foundation for the classical Theory of notion of decidability has had profound conse- Computation (originated by logicians in the quences. 1930s and 40s) and Complexity Theory Definition. A set S(⊂ X) is decidable if its char- (originated by computer scientists in the 1960s acteristic function χS (with values 1 on S, 0 on and 70s). The program of this 2-state TM X − S) is computable (in finite time) by a machine. instructs the machine “if in state A with a 0 or 1 Such a 0-1 valued machine is called a decision in the current scanned cell, then print a 1, move procedure for S. On input x ∈ X it answers the R(ight) and go into state B; if in state B, print 0, question “Is x ∈ S?” with output 1 if YES and 0 if move R and go into state A.” That the long-term ∗ NO. Here X is Σ , the set of finite but unbounded behavior of this machine is discernible is not sequences over a finite set Σ. We will also allow X the norm. (TM picture, courtesy of Bryan Clair.)

OCTOBER 2004 NOTICES OF THE AMS 1025 ∈ Zn lary Putnam, and , Matijasevich ζ 2 that provides, together with the computa- needed only to show the existence of a diophan- tion of f on argument ζ = (ζ1,...,ζn) producing tine relation of exponential growth [DavisMatija- value 0, a polynomial time verification. If f ∉ S, then sevicRobinson76].) no such witness will do. Since the initial startling results of the 1930s of The significance of the P = NP? question be- the undecidability of the true statements of arith- came clear when Dick Karp showed that the metic and the halting problem for Turing machines, tractability of each of twenty-one seemingly unre- logicians have focused attention on classifying lated problems was equivalent to the tractability problems as to their decidability or undecidability. of SAT [Karp72]. The number of such problems Considerable attention was placed on studying the known today is legion. hierarchy of the undecidable. Decidable problems, As did the earlier questions about decidability, particularly finite ones, held little interest. In the the P = NP? question has had profound conse- 1960s and 70s, computer scientists started to re- quence. The apparent dichotomy between the alize that not all such decidable problems were classes P and NP has been the underpinning of alike; indeed, some seemed to defy feasible solu- some of the most important applications of com- tion by machine. plexity theory, such as to cryptography and se- cure communication. Particularly appealing here is Example. The SATisfiability Problem (SAT). Here, the idea of using hard problems to our advantage. Thus the classical Turing tradition has yielded X ={f |f : Zn → Z } 2 2 a highly developed and rich (invariant) theory of is the set of Boolean functions and computation and complexity with essential appli- cations to computation—and deep interesting ques- S ={f ∈ X|∃ζ ∈ Zn,f(ζ ,...,ζ ) = 0} . 2 1 n tions. Why do we want a new model of computa- It is assumed X is embedded in {0, 1}∗ via some tion? natural encoding. Systematically testing all possi- ble 2n arguments for a given Boolean function f Motivation for Model clearly yields a decision procedure for S. This pro- Decidability over the Continuum: Is the cedure takes an exponential number of basic op- Decidable? erations in the worst case. We do not know if SAT is tractable, i.e., if there is a polynomial time deci- Now we witnessed a certain extraordi- sion procedure for S. narily complicated-looking set, namely Definition. The decision problem (X,S) is in the Mandelbrot set. Although the rules which provide its definition are sur- class P if the characteristic function χS is polyno- 3 mial time computable, i.e., computable by a poly- prisingly simple, the set itself exhibits nomial time Turing machine. A polynomial time an endless variety of highly elaborate Turing machine is one that halts in c(size x)k Tur- structure. Could this be an example of ing operations for some fixed c,k ≥ 0 and all in- a non-recursive [i.e. undecidable] set, 4 puts x ∈ X. Here size x is the length of the se- truly exhibited before our mortal eyes? quence x, i.e., the bit length if Σ ={0, 1}. —Roger Penrose, The Emperor’s New Mind As was the case for computable functions, the [Penrose89]. polynomial time functions, the class P, and sub- Classically, decidability is defined only for count- sequently defined complexity classes, form natural able sets. The Mandelbrot set is uncountable. So the classes independent of machine.1 Thus again, com- Mandelbrot set is not decidable classically. But puter scientists have confidence they are working clearly this is not a satisfactory argument. So how with a very natural class of functions and feel jus- do we reasonably address Penrose’s question? tified employing their favorite model. From time to time, logicians and computer sci- In the early 1970s Steve Cook and Leonid Levin entists do look at problems over the reals or com- [Cook71, Levin73] independently formulated and plex numbers. One approach has been through “re- answered the question about the tractability of cursive analysis,” which has its origins in Turing’s = 2 SAT with another question: Does P NP? seminal 1936 paper [Turing36–37].5 In the first para- A decision problem (X,S) ∈ classNP if for each graph, Turing defines “a number [to be] computable x ∈ S there is a polynomial time verification of this fact. Later we will formalize the definition of 3 ∈ C NP, but meantime we note that SAT ∈ NP: If a The Mandelbrot set M is the set of all parameters c , ∈ such that the orbit of 0 under the quadratic map Boolean function f S, then there is a witness 2 pc (z) = z + c remains bounded. 1A main proviso is that integers are not coded in unary. 4Emphasis mine. 2I am rephrasing their result. The usual statement is: SAT 5Here Turing first defines automatic machines and shows is NP-complete. the undecidability of the halting problem.

1026 NOTICES OF THE AMS VOLUME 51, NUMBER 9 if its decimal can be written down by a machine.” feeling that the correct viewpoint has not yet been Turing later defines computable functions of com- arrived at.” putable real numbers.6 One can then imagine an or- The Mandelbrot example is perhaps too exotic acle Turing machine that, when fed a real number to draw generalizations. We turn now to a decision by oracle, decimal by decimal, outputs a real num- problem ubiquitous in mathematics. ber decimal by decimal. A refinement of this no- The Hilbert Nullstellensatz/R tion forms the basis of recursive analysis. Given a system of polynomial equations over a ring R, f1(x1,...,xn) = 0,...,fm(x1,...,xn) = 0 . Is n there ζ ∈ R , such that f1(ζ) = ...= fm(ζ) = 0? Oracle Turing Machines We call the corresponding decision problem over R, HNR. If R is Z2, Z, or the rational numbers 9 8 6 9 6 0 4T.M. 3 1 4 1 5 9 2 Q, HNR fits naturally into the Turing formalism (via bit coding of integers and rationals). The corre- sponding decision problem over Z2 is essentially Another tack taken by computer scientists is SAT (decidable but not known to be in P), over Z what might be called the “rational number model” it is Hilbert’s Tenth Problem (undecidable) and approach. The approach is not formalized, but its over Q it is not known to be decidable or unde- reasoning goes as follows: cidable. If R is the real field R or complex numbers C A. Machines are finite. , then HNR does not fit naturally into the Turing B. Finite approximations of reals are rationals. formalism. C. ∴ We are really looking at problems over the An even simpler example is the high school al- rationals. gorithm for deciding whether or not a real poly- 2 If we are totally naïve here, we quickly run into nomial ax + bx + c has a real root. We just check 2 trouble. The rational skeleton of the curve if the discriminant b − 4ac ≥ 0. We do not stipu- x3 + y 3 = 1 on the positive quadrant is hardly in- late that a, b, and c be rational or be fed to us bit formative.7 by bit—or question if we can tell whether or not a We have even more serious concerns when this real number equals 0. We just work with the basic approach is used in complexity theory. Computer arithmetic operations and comparisons of an or- scientists measure complexity as a function of dered ring or field. input word size (in bits). But small perturbations More generally, we have perfectly good algo- R C (of input) can cause large differences in word size. rithms for deciding HNR over and . Recall: For example, a small perturbation of an input 1 to Hilbert’s Nullstellensatz, HN [Hilbert1893] n 1 + 1/2 causes the word size to grow from 1 to Given f1(x1,...,xn),...,fm(x1,...,xn) ∈ C[x1,..., n + 1. Thus an algorithm that is polynomial time xn]. Then, f1 = ...= fm = 0 is not solvable over C according to the discrete model definition would 1=Σgifi, be allowed to take considerably more time on a per- (*) for some polynomials turbed input than on a given input. If the problem g ∈ C[x ,...,x ]. instance were well conditioned, this clearly would i 1 n not be acceptable. An issue here is that the Eu- This theorem provides a semidecision procedure clidean metric is very different from the bit met- for the complement of HNC : Given ric. Another is condition. f1,...,fm ∈ C[x1,...,xn], systematically search for ∗ Not paying attention to these issues has caused gi’s to solve ( ). If found, then f1 = ...= fm = 0 is both incompleteness in the analysis and confu- not solvable over C and so output 0. (The search sion in the comparison of different algorithms can be done by considering, for each successive D, over the reals. The comparison of competing al- general polynomials gi of degree D with indeter- gorithms for the Linear Programming Problem pro- minate coefficients. Checking if there exist coeffi- ∗ vides a case in point. We shall return to this example cients satisfying ( ) reduces to solving ∼ Dn linear again. equations over C.) Penrose explores similar scenarios for posing his However, if f1 = ...= fm = 0 is solvable, then question but in the end “... is left with the strong no such gi’s will ever be found. Fortunately, we have stopping rules. In 1926, Grete Hermann, a student of Emmy Noether, gave an effective upper bound 6Although Turing states in this paper that “a develop- n D (= d ↑ (2 ) where d = max(3, deg fi )) on the de- ment of the theory of functions of a real variable ex- grees of the gi ’s that one need consider [Her- pressed in terms of computable numbers” would be forth- mann26].8 If no solution is found for generic g ’s coming, I am not aware of any further such development i by him in this direction. 8By Brownawell and Kollar, we only need check the case 7By Fermat, the only rational points on the curve are (0, 1) D = dn, which by Masser and Philippon is optimal. (See and (1, 0). [Yger01] for discussion and [KPS01] for refinements.)

OCTOBER 2004 NOTICES OF THE AMS 1027 Input: (f, z, ε) of degree D, then none exists, and so we can out- f (z) Compute: z ← z − put 1. f ′(z) This decision procedure, using arithmetic oper- ations and comparisons on complex numbers, in- Branch: Stopping Rule spires us to take a different tack. Rather than forc- |f (z)|2 < ε |f (z)|2 ≥ ε ing artificial coding of problems into bits, we propose a model of computation that computes Output: z over a ring (or field), using basic algebraic opera- tions and comparisons on elements of the ring (or A “machine” for executing Newton’s method. field). Thus we have our first motivation for propos- ing a new model. than originally posed. We are thus motivated to fol- Algorithms of Numerical Analysis low this path for the P = NP? problem. Our next motivation comes directly from the tra- dition of numerical analysis. We start with the The Model: Machines over a Ring or Field R paper, “Rounding-off errors in matrix processes” [BSS89] in the Quarterly Journal of Mechanics and Applied We suppose R is a commutative ring or field (pos- Mathematics, vol. I, 1948, pp. 287–308 [Turing48]. sibly ordered) with unit. A machine M over R has Written by Alan Turing while he was preoccupied the following properties: with solving large systems of equations, this paper Associated with M is an input space and an out- is quite well known to numerical analysts but al- put space, both R∞ (the disjoint union of Rn, n ≥ 0). most unknown by logicians and computer scien- At the top level, our machine M is similar to a Tur- tists. Its implicit model of computation is more ing machine. M has a 2-way infinite tape divided closely related to the former than the latter. into cells and a read-write head that can view a fixed

In the first section of his paper, Turing consid- number kM of contiguous cells at a time. ers the “measures of work in a process”: Internal to M is its program, a finite directed graph with 5 types of nodes, each with associated It is convenient to have a measure of the operations and next node maps: amount of work involved in a comput- ing process, even though it be a very • The operation gι associated with the input node = crude one.... We might, for instance, ι takes elements x (x1,...,xk) from the input ∞ = count the number of additions, subtrac- space R and puts each xi (i 1,...,k) in suc- tions, multiplications, divisions, record- cessive tape cells, starting with the leftmost one  ings of numbers...9 in M’s view. There is a unique next node ι = ι. • Each computation node η has a built-in polyno- From this point of view, it is again natural to start n m mial or rational map gη : R → R with with a model of computation in which real num- 10 n, m ≤ kM . Given elements x1,...,xn in the bers are viewed as entities, and algebraic operations first n cells of M’s view, the associated opera- and comparisons, as well as simple accessing, are j tion, also called gη, puts gη(x1,...,xn) in the jth each counted as a unit of work. We will return cell in M’s view (j = 1,...,m). There is a unique again to this paper to motivate refinements of next node η. these initial “measures of work.” • For each branch node η, the associated opera- We also want a model of computation that is tion is the identity. There are two possible next more natural for describing algorithms of numer-   nodes ηL and ηR depending on the leftmost el- ical analysis, such as Newton’s method for finding ement x1 in M’s view. If x1 = 0 (≥ 0, if R is or- zeros of polynomials. Here, given a polynomial   dered), then η = ηR . If x1 = 0 (x1 < 0), then R C   f (z) over or , the basic operation is the New- η = η . = −  L ton map, Nf (z) z f (z)/f (z), which is iterated • For each shift node σ, the associated map is the until the current value satisfies some proscribed identity and there is a unique next node σ  . stopping rule. Translating to bit operations would Right shift nodes σR shift M’s view one cell to wipe out the natural structure of this algorithm. the right, left shift nodes σL shift one cell to the left. Does P = NP? • The operation gN associated with the output In order to gain new perspective and access addi- node N outputs (by projection) the contents of ∞ tional tools, mathematicians often find it prof- the tape into R . N has no next node. itable to view problems in a broader framework 10Machines over R can thus have a finite number of built- 9Emphasis mine. in constants from R.

1028 NOTICES OF THE AMS VOLUME 51, NUMBER 9 ∞ ∆ n ∗ Input Space R = ∪ R = R {finite (unbounded)} ∞ n ≥ 0 sequences over R Input Space R ↓ Input ↓ I

State Space R∞ State Space R∞ x x x x x + . . .0 x x x x x + . . . 0 ...... 000 1 2 kM kM 1 . . . . . 0 1 2 kM kM 1 x ε R Input i Node ↓ The Computation program Node Shift (R,L) for M x←g (x) Nodes n ←σ is a finite x (x) directed Branch graph. Node x ≠ 0 x = 0 ≥ (< 0) 1 1 ( 0) Output Node N ↓Output ↓O ∞ ∆ ∗ ∞ Output Space R = ∪ Rn= R Output Space R n ≥ 0

A machine over a ring or field R , top-level and internal views.

The computable functions over R are the input- bounds, is easily converted to a decision machine output maps φM of machines M over R. Thus, for over C, and so HNC is decidable over C. Similarly, ∞ x ∈ R , φM (x) is defined if the output node N is HNR is decidable over R (by Seidenberg’s elimina- reachable by following M’s program on input x. If tion theory [Seid54]). ∞ so, φM (x) is the output y ∈ R . We can also now formally state, and answer, Although the machine’s view at any time is fi- Penrose’s question about the Mandelbrot set M. 2 nite, the shift nodes enable the machine to read and Here X is R and Xyes is M. (In order to allow al- operate on inputs from Rn for all n, and thus we gorithms that compare magnitudes, we are view- can model algorithms that are defined uniformly ing C as R2.) for inputs of any dimension. Noting this, we can Theorem [BlumSmale93]. The Mandelbrot set M also construct universal (programmable) machines is undecidable over R. over R. (We do not use Gödel coding. The machine The proof uses the fact that the boundary of a program itself is (essentially) its own code.) closed semidecidable set in R2 has Hausdorff di- If R is Z2, we recover the classical theory of mension at most 1, whereas the Hausdorff di- computation (and complexity, as we shall see). We mension of the boundary of the Mandelbrot set is also note that Newton’s method is naturally im- 2 [Shishikura91]. plemented by a machine over R. It turns out that the complement of the Man- Let’s return now to questions of decidability delbrot set M is semidecidable over R. To see this, over R and C, which were so problematic before. a little arithmetic shows that M ={c ∈ C such that the sequence c,c2 + c,(c2 + c)2 + c,... stays within Definition. A problem over R is a pair, (X,Xyes), ∞ the circle of radius 2}. Hence, if at some point the where Xyes ⊆ X ⊆ R . X consists of the problem in- orbit of 0 under the map z2 + c escapes the circle stances, Xyes, the yes-instances. of radius 2, we can be certain that c is in the com- For HNR, plement of M. One can use this fact to “draw” the X = {f =(f1,...,fm)|fi ∈ R[x1,...,xn]},m,n>0 and Mandelbrot set. { ∈ |∃ ∈ n } Xyes = f X ζ R ,fi (ζ1,...,ζn)=0,i =1,...,m . Complexity Classes and Theory over a Finite polynomial systems over R can be coded Ring R ∞ as elements of R (by systematically listing coef- Following the classical tradition, we measure time ∞ ficients); thus X can be viewed as a subspace of R . (or cost) as a function of input word size. Suppose ∞ Definition. A problem over R, (X,Xyes), is de- x ∈ R . We define size(x) to be the vector length n cidable if the characteristic function of Xyes in X is of x, thus size(x) = n if x ∈ R . For machine M computable over R. over R and input x, we define TM (x) to be the num- Thus, in this framework, we can state problems ber of nodes traversed from input to output when over R and C (or any ring or field) and ask questions M is input x. TM is our measure of time or cost. So, of decidability. The algorithm presented earlier, if R = Z2 size(x) is the bit size of x, and TM (x) is based on Hilbert’s Nullstellensatz with effective the bit cost of the computation.

OCTOBER 2004 NOTICES OF THE AMS 1029 (Y,Yyes) ∈ PR over R . Also, given f ∈ X, f = (f1,...,fm),fi ∈ R[x1,...,xn],f∈ Xyes n ∃ζ ∈ R such that fi (ζ1,...,ζn) = 0, i = 1,...,m, that is, (f,ζ) ∈ Yyes. We note that there is an exponential time deci-

sion machine for HN over Z2, but as mentioned ear- lier, we do not know if HN ∈ P. On the other hand, since HNZ is not decidable (over Z), it certainly is not in PZ, thus PZ = NPZ .

Main Complexity Results We recall that Theorem [Cook71, Levin73]. P = NP SAT ∈ P and ≥ N 0, a large integer Fix Theorem [Karp72]. Input: R ≥ 2, a real number P = NP TSP12∈ P⇔ X ∈ P, c ∈C (Input) = ← where X Hamilton Circuit or any of nineteen Initialize: z ← 0 other problems. ← n 0 We prove ← ← Theorem Compute: z pc(z) [BSS89]. n ← n+1 PR = NPR HNR ∈ PR for R = Z2, R, C,

Branch: ← |z| ≤ R or for any field R, unordered or ordered. |z| > R To prove this theorem, we show that given any BRANCH problem (X,X ) ∈ NP and instance x ∈ X, we n ≥ N n < N yes R ← can code x (in polynomial time) as a polynomial sys- WHITE Output: n 0 BLACK (n=Escape tem fx over R such that x ∈ Xyes ⇐⇒ fx has a zero Time) over R. In other words, we give a polynomial time Machine to “draw” the Mandelbrot set M reduction from any problem in class NPR into HNR . n This is done by writing down the equations for the (Fact: M = {c ∈ C| | pc (0)| ≥ 2, all n > 0} where 2 pc (z)=z + c) computing endomorphism of an NPR machine. So, HNR is a universal NP-complete problem. ∈ Definition. (X,Xyes) PR (class P over R) if there We know that HNR ∈ EXPR for R = R and C. That exists a decision machine M and a polynomial is, there are exponential time algorithms for de- ∈ Z ∈ p [x] such that for each x X , TM (x) ciding the solvability of polynomial systems over 0} as a search problem: Find the shortest (cheapest) path tra- and Yyes ={(f,ζ) ∈ Y |fi (ζ1,...,ζn) = 0, versing all nodes. To view TSP as a decision problem, we i =1,...,m}. introduce bounds: Given kis there a path of length (cost) Then, since polynomial evaluation is a polyno- at most k traversing all nodes? mial time computation over R , 11 we have 13Toward this goal, a sequence of five papers by Mike Shub and Steve Smale on the related Bezout’s Theorem pre- 11It should be clear what formal definitions we are sup- sents a comprehensive analysis of the complexity of solv- posing here. The proof requires a construction. ing polynomial systems approximately and probabilistically.

1030 NOTICES OF THE AMS VOLUME 51, NUMBER 9 It is interesting to speculate as to whether the questions of whether PR = NPR and whether PC = NPC are re- lated to each other and to the classical P versus NP question. ...I am inclined to think that the three questions are very different and need to be attacked independently....14 One reason for the skepticism is that over R or C one can quickly build numbers of large magnitude, for example by successive squaring. And over R or C, arithmetic operations on numbers of any mag- nitude can be done in one step. Hence, polynomial time machines over R or C might be able to decide inherently hard discrete problems by “cheating”, e.g., by quickly coding up an exponential amount of information within large numbers and subse- quently getting an exponential amount of essential bit operations accomplished quickly. From the Quarterly Journal of Mechanics and Applied In our book we present a number of transfer re- Mathematics, vol. I, 1948, p. 287. sults for complexity theory, one of which transfers the P = NP ? question across algebraically closed R R Z fields of characteristic 0. Let BPP be the class of problems over 2 that can be solved in bounded error (< 1/2) proba- Theorem [BCSS96/98]. PC = NPC PK = NPK , where K is any algebraically closed field of char- bilistic polynomial time. Class BPP is a practical modification of class P. Repeating a BPP algorithm acteristic 0, for example the field of algebraic num- k times produces a polynomial time algorithm with bers. probability of error < 1/2k. For example, BPP al- The transfer of the P = NP ? problem from R R gorithms for Primality (testing) were known and the algebraic numbers to the complexes had been used well before it was known that Primality was proved earlier by Christian Michaux, using model in class P. theoretic techniques [Michaux94]. The other direc- Theorem. PC = NPC ⇒ BPP ⊇ NP. tion uses number theory. Here we show how a ma- The idea of the proof goes as follows. First we chine over C with built in algebraically independent note that by adding polynomials of the form constants can be simulated by a machine that has x(x − 1) to an instance f ∈ HN we get an equivalent no such constants and that takes the same amount ∗ instance f ∈ HNC. Then, any polynomial decision of time (up to a polynomial factor) to compute. We machine M over C for HNC will decide the solv- call this result the elimination of constants. The con- ∗ ability of f and hence the solvability of f over Z2. stants in a machine come into play at branch nodes The trouble is, M might have a finite number of where a decision is to be made as to whether or not built-in complex constants. As before, they come the current x1, which is a polynomial in the machine into play at branch nodes where a decision is to be constants, is equal to 0. This polynomial is not made as to whether or not a polynomial in these presented to us in the standard form, but rather constants, presented by composition, is equal to by a composition of the polynomials along the 0. Rather than generate witnesses as before to computation path, so we cannot in general tell eliminate constants, the decision is now made by quickly enough if the coefficients are all zero. In- probabilistically replacing these constants by a stead, we use the Witness Theorem to quickly gen- small number (by Schwartz’s Lemma) of small num- erate algebraic witnesses with the property that, if bers (by the Prime Number Theorem). Thus, given the original constants are replaced by these wit- a polynomial time machine M over C for HNC , we nesses, and the resulting evaluation is 0, then the could construct a probabilistic polynomial time original polynomial is 0. (The theory of heights machine for HN. comes into play here.) Transfer results provide important connections Shortly after our book went to press, Steve Smale between the two approaches to computing.15 Un- realized (after talking to ) that stan- derlying connections derive from the uniform dis- dard computer science arguments could yield a tribution of rational data of bounded input length transfer result from C to the classical setting [Smale2000]. 15Transfer results are also known for questions regard- 14Emphasis mine. ing other complexity classes such as PSPACE [Koiran02].

OCTOBER 2004 NOTICES OF THE AMS 1031 with respect to the probability distribution of con- dition numbers of numerical analysis [CMPS02]. input x ϕ Introducing Condition, Accuracy, and output (x) Round-off into Complexity Theory: Where x+∆x Turing Meets Newton ϕ(x+∆x) We now return to the Turing paper on rounding- off errors referred to earlier [Turing48]. It is here that the notion of condition (of a linear system) was originally introduced. An With appropriate norm, the ratio illustrative example is presented on page 297: ϕ(x + ∆x) − ϕ(x)/∆x, or relative ratio (ϕ(x + ∆x) − ϕ(x)/ϕ(x))/(∆x/x) (8.1) 1.4x + 0.9y = 2.7 indicates the condition of an instance. If the 0.8x + 1.7y =−1.2 quotient is large, the instance is ill-conditioned and so requires more accuracy, and hence more resources, to compute with small error. (8.2) −0.786x + 1.709y =−1.173 −0.8x + 1.7y =−1.2 computational purposes, ill-conditioned problem instances will generally require more input preci- The set of equations (8.2) is fully equiv- sion than well-conditioned instances. alent to (8.1)16, but clearly if we attempt During the 1980s a number of people gave es- to solve (8.2) by numerical methods in- timates on the average loss of precision for (solv- volving rounding-off errors we are al- ing) linear systems over R. + most certain to get much less accuracy Theorem [Edelman88]. Average log κ(A) than if we worked with equations (8.1)... ∼ log n. The log of condition provides an intrinsic para- We should describe the equations (8.2) meter to replace arbitrarily chosen bit input word as an ill-conditioned set, or, at any rate, sizes for problem instances where the underlying as ill-conditioned compared with (8.1).17 mathematical spaces are R or C. And so a focus It is characteristic of ill-conditioned sets for complexity theory is to formulate and under- of equations that small percentage er- stand measures of condition. rors in the coefficients given may lead My favorite example for illustrating the issues to large percentage errors in the solu- raised and the resolutions proposed is the Linear tion. Programming Problem over R (LPPR) alluded to Turing’s notion of condition, clearly inspired earlier, where R is Q or R. An instance of the LPPR by Newton’s derivative, links both traditions, par- is to maximize a linear function c · x subject to the ticularly when considering questions of complex- constraints Ax ≤ b, x ≥ 0, or to conclude no such ity. maximum exists. The data here is (A, b, c), where Condition Numbers and Complexity A is an m × n matrix over R, b ∈ Rm and c ∈ Rn.

The condition of a problem (instance) measures Competing algorithms for solving the LPPR are how small perturbations of the input will alter the often posed and analyzed using distinct models of output. Introducing the notion of condition pro- computation. (Also, there are various equivalent vides an important link between the two traditions of computing. Definition (Turing). Suppose A is an n × n real matrix and b ∈ Rn. The condition number of the lin- ear system, Ax = b , is given by κ(A) = AA . Here A=max{|Ay|/|y||y = 0} is 1 c α the operator norm with respect to the Euclidean 0 norm ||. ν 0 We note that κ(A) is the worst-case relative con- dition for solving the system Ax = b for x. Thus + log κ(A) provides a worst-case lower bound for the loss of precision in solving the system.18 For

16The third equation is the second plus .01 times the first. The simplex method for the LPP optimizes by 17Emphasis mine. traversing vertices. The newer interior point + + methods follow a trajectory of centers. 18log x = log x for x ≥ 1, otherwise log x = 0.

1032 NOTICES OF THE AMS VOLUME 51, NUMBER 9 20 mathematical formulations of the LPPR, not neces- Theorem [Renegar Interior Point Algorithm]. sarily equivalent with respect to complexity theory.) If the linear program is feasible, the number of it- The simplex algorithm [Dantzig47/90], a long- erations to produce an ε-approximation to a fea- + time method of choice, is an algebraic algorithm sible point is polynomial in n, m, log C(A, b) and that in the worst case takes exponential (in n and | log ε|. m) number of steps [KleeMinty72] given exact and Javier Peña propose an algo- arithmetic. For rational inputs, simplex also is ex- rithm and add round-off error as a parameter for ponential in the input word size (given in bits) in the complexity analysis [CuckerPena02]. the bit model. Theorem [Cucker-Peña Algorithm with Round- On the other hand, for rational inputs, the el- Off]. If the linear program is feasible, the bit cost lipsoid algorithm [Khachiyan79] is polynomial time to produce an ε-approximation to a feasible in the input word size in the bit model. But, as an point is O((m + n)3.5(log(m + n) + log+ C(A, b)+ algorithm over R, i.e., allowing exact arithmetic, it | log ε|)3). The finest precision required is a round- is not finite in general. The same is true of the newer off unit of 1/c(m + n)12C(A, b)2 . interior point algorithms.19 While condition, approximation and round-off It would seem more natural, and appropriate, to help bridge the combinatorial and continuous ap- analyze the complexity of algorithms for the LPPR with respect to an intrinsic input word size. Hence proaches to the design and analyses of linear pro- we are motivated to define a measure of the con- gramming algorithms, basic connections and com- dition of a linear program [Blum90]. Jim Renegar plexity questions remain open. [Renegar95] was the first to propose such a con- In particular: Is LPPR ∈ PR? Even more, is LPP dition number in this context. His definition is in- strongly polynomial? That is, is there a polynomial spired by the 1936 theorem of Eckart and Young. time algorithm over R for LPPR that is also poly- Theorem [EckartYoung36]. For a real matrix A, nomial time with respect to bit cost on rational 21 κ(A) ∼ 1/dF (A, Σ) where Σ is space of ill-posed prob- input data? lem instances, i.e., Σ is the space of noninvertible In conclusion, I have endeavored to give an idea matrices. (Here the relative distance dF is with re- of how machines over the reals—tempered with 2 spect to the Frobenius norm AF = Σaij .) condition, approximation, round-off, and proba- bility22—enable us to combine tools and traditions of theoretical computer science with tools and tra- ditions of numerical analysis to help better un- derstand the nature and complexity of computa- tion. A References

Σ [BPR96] S. BASU, R. POLLACK, and M.-F. ROY, On the combi- natorial and algebraic complexity of quantifier- elimination, J. ACM 43 (1996), 1002–1045. [Blum90] L. BLUM, Lectures on a theory of computation and We now consider the linear programming prob- complexity over the reals (or an arbitrary ring), lem over R in the form: Given Ax = b, x ≥ 0, find Lectures in the Sciences of Complexity II, (E. Jen, ed.), a feasible solution or declare there is none. Addison-Wesley, 1–47, (1990). Definition [Renegar95]. The condition number [Blum91] ——— , A theory of computation and complex- of a linear program over R with data (A, b) is given ity over the real numbers, Proceedings of the In- by C(A, b) =(A, b)/d((A, b), Σm,n). (Here, Σm,n is ternational Congress of Mathematicians, 1990, the boundary of the feasible pairs (A, b), and both the operator norm and the distance d are with 20By giving the simplest results here, I am hardly doing respect to the respective Euclidean norms.) justice to the full extent of Renegar’s and others’ work in Renegar proposes an interior point algorithm this area but hope this discussion will spur the reader to and analyzes it with respect to parameters: n, m, investigate this area more fully. the loss of precision, and the desired accuracy of so- 21Recall that in each case, i.e., over R and in the bit model, lution. complexity is a function of input word size. However, over R, the input size for a rational linear program is the vec- tor length of the input (approximately m × n), whereas in the bit model it is the bit length (approximately m × n × k, where k is the maximum height of the coefficients). 19We emphasize this distinction. Simplex is a finite algo- 22For more on probabilistic algorithms and probabilistic rithm over R and over the rationals in both the exact arith- analysis see [BCSS98]. For more results on complexity es- metic model and the bit model. The newer interior point timates depending on condition and round-off error, see algorithms are finite only over the rationals in the bit [CS98].

OCTOBER 2004 NOTICES OF THE AMS 1033 Kyoto, (I. Satake, ed.), Springer-Verlag, 1492-1507, [KleeMinty72] V. KLEE. and G. J. MINTY, How good is the (1991). simplex algorithm?, Inequalities, III (O. Shisha, ed.) [BCSS96] L. BLUM, F. CUCKER, M. SHUB, and S. SMALE, Alge- 159–175, Academic Press (1972). braic settings for the problem P=NP?, The Mathe- [Koiran 02] P. KOIRAN, Transfer theorems via sign condi- matics of Numerical Analysis, Volume 32, Lectures tions, Information Processing Letters 81 (2002), in Applied Mathematics, (J. Renegar, M. Shub, and 65–69 . S. Smale, eds.), Amer. Math. Soc., (1996), 125-144. [KPS01] T. KRICK, L. M. PARDO, and M. SOMBRA, Sharp esti- [BCSS98] L. BLUM, F. CUCKER, M. SHUB, and S. SMALE, Com- mates for the arithmetic Nullstellensatz, Duke plexity and Real Computation, Springer-Verlag Mathematical Journal 109 (2001), 521–598 . (1998). [Levin73] L. A. LEVIN, Universal search problems, Problems Inform. Transmission 9 (1973), 265–266. [BSS89] L. BLUM, M. SHUB, and S. SMALE, On a theory of com- [Michaux94] C. MICHAUX, P = NP over the nonstandard putation and complexity over the real numbers: NP- reals implies P = NP over R, Theoretical Com- completeness, recursive functions and universal puter Science 133 (1994), 95–104. machines, Bulletin of the AMS 21 (July 1989), No. [Penrose89] R. PENROSE, The Emperor’s New Mind: Con- 1, 1–46. cerning Computers, Minds, and the Laws of Physics, [BlumSmale93] L. BLUM and S. SMALE, The Gödel incom- Oxford University Press (1989). pleteness theorem and decidability over a ring, [Renegar92] J. RENEGAR, On the computational complex- From Topology to Computation: Proceedings of the ity and the first order theory of the reals, Parts I- Smalefest, (M. Hirsch, J. Marsden, and M. Shub, III, Journal Symbolic Comp. 13 (1992), 255–352. eds.), 321–339 (1990). [Renegar95] J. RENEGAR, Incorporating condition numbers [CMPS02] D. CASTRO, J. L. MONTAÑA, L. M. PARDO, and J. SAN into the complexity theory of linear programming, MARTIN, The distribution of condition numbers of SIAM J. Optim. 5 (1995), 506–524. rational data of bounded bit length, Foundations of [Seid54] A. SEIDENBERG, A new decision method for ele- Computational Mathematics 2 (1) (2002), 1–52. mentary algebra, Ann. of Math 60 (1954), 365–374. [Cook71] S. COOK, The complexity of theorem-proving [Shishikura91/98] M. SHISHIKURA, The Hausdorff dimension procedures, Proceedings of the 3rd Annual ACM of the boundary of the Mandelbrot set and Julia Symposium on Theory of Computing, 151–158, sets, Ann. of Math. 147 (2) (1998), 225–267. (1971). (Preprint, SUNY at Stony Brook, IMS 1991/70.) [CuckerPena02] F. CUCKER, and J. PEÑA, A primal-dual al- [Smale2000] S. SMALE, Mathematical Problems for the Next gorithm for solving polyhedral conic systems with Century, Mathematics: Frontiers and Perspectives a finite-precision machine, SIAM Journal on Opti- 2000, (V. Arnold, M. Atiyah, P. Lax, and B. Mazur, mization 12 (2002), 522–554. eds.), Amer. Math. Soc., Providence, RI, 2000. [CS98] F. CUCKER and S. SMALE, Complexity estimates de- [Turing36-37] A. TURING, On computable numbers, with pending on condition and round-off error, Journal an application to the Entscheidungsproblem, Proc. of the ACM 46 (1999), 113–184. Lond. Math. Soc. (2) 42 (1936-7), 230–265; correc- tion ibid. 43 (1937), 544–546. [Dantzig47/90] G. B. DANTZIG, Origins of the simplex method, A History of Scientific Computing, (S. G. [Turing48] A. TURING, Rounding-off errors in matrix Nash, ed.), ACM Press, New York, 141–151, (1990). processes, Quart. J. Mech. Appl. Math. 1 (1948), 287–308. [DavisMatijasevicRobinson76] M. DAVIS, Y. MATIJASEVIC, [Yger01] A. YGER, The diversity of mathematics with re- and J. ROBINSON, Hilbert’s Tenth Problem: Dio- spect to a problem in logic, http://www.math. phantine Equations: Positive Aspects of a Nega- u-bordeaux.fr/~yger/exposes.html. tive Solution, Proceedings of Symposia in Pure Math- ematics, vol. 28, 323–378, (1976); reprinted in The Collected Works of Julia Robinson, (S. Feferman, ed.), Amer. Math. Soc., 269–378, (1996). [Edelman88] A. EDELMAN, Eigenvalues and condition num- bers of random matrices, SIAM J. Matrix Anal. Appl. 9 (1988), 543–560. [EckartYoung36] C. ECKART and G. YOUNG, The approxi- mation of one matrix by another of lower rank, Psy- chometrika 1 (1936), 211–218. [Hilbert1893] D. HILBERT, Über die vollen Invariantensys- teme, Math. Ann. 42 (1893), 313–373. [Hermann26] G. HERMANN, Die Frage der endlich vielen Schritte in der Theorie der Polynomideale, Math. Ann. 95 (1926), 736–788. [Kachiyan79] L. G. KHACHIYAN, A polynomial algorithm in linear programming, Soviet Math. Dokl. 20 (1979), 191–194. [Karp72] R. M. KARP, Reducibility among combinatorial problems, Complexity of Computer Computations,( R. E. Miller and J. W. Thatcher, eds.), 85–103, Plenum Press, (1972).

1034 NOTICES OF THE AMS VOLUME 51, NUMBER 9