<<

Self-Testing/Correcting for Polynomials and for Approximate Functions

Peter Gemmell ∗ † Ronitt Rubinfeld ‡ Madhu Sudan § ¶ April 25, 2002

Abstract

We initiate the study of self-testing (and self-correcting) programs which only approximately compute f. This setting captures in particular the digital computation of real valued functions. We present a rigorous framework and obtain the first results in this area: namely that the class of linear functions, the log function and floating point exponentiation can be self-tested. All of the above functions also have self-correctors.

∗ U.C. Berkeley. Supported by NSF Grant No. CCR 88-13632.
†.
‡ Princeton University. Supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), NSF-STC88-09648.
§ U.C. Berkeley. Part of this work was done while this author was visiting IBM Almaden.
¶ Hebrew University and Princeton University. Partially supported by the Wolfson Research Awards administered by the Israel Academy of Sciences and Humanities.
¹ [12] independently introduces a notion which is essentially equivalent to self-correcting.

1 Introduction

Suppose someone gives us an extremely fast program P that we can call as a black box to compute a function f. Rather than trust that P works correctly, a self-testing program for f ([8]) verifies that program P is correct on most inputs, and a self-correcting program ([8],[12]) for f takes a program P that is correct on most inputs and uses it to compute f correctly on every input (with high probability). Self-testing/correcting is an extension of program result checking as defined in [5],[6]; if f has a self-tester and a self-corrector, then f has a program result checker.

The study of self-testing/correcting programs was introduced in [8] in order to allow one to use a program P to compute a function f without trusting that P works correctly. A self-tester for f estimates the fraction of x for which P(x) = f(x); a self-corrector for f takes a program that is correct on most inputs and turns it into a program that is correct on every input with high probability.¹ Both access P only as a black box and, in some precise way, are not allowed to compute the function f themselves.

Self-correcting is usually easy when the function has the random self-reducibility property. One class of functions with this property is the class of multivariate polynomials over finite fields [4],[12]. We extend this result in two directions. First, we show that polynomials are random self-reducible over more general domains: specifically, over the rationals and over noncommutative rings. Second, we show that one can get self-correctors even when the program satisfies weaker conditions, i.e. when the program has more errors, or when the program behaves in a more adversarial manner by changing the function it computes between successive calls.

Self-testing is a much harder task. Previously it was known how to self-test only for a few special examples of functions, such as the class of linear functions. We show that one can self-test the whole class of polynomial functions over Z_p for prime p.

To make this somewhat more precise, consider a function P : X → Y that attempts to compute f. We consider two domains, the "test" domain D_t ⊂ X and the "safe" domain D_s ⊂ X (usually D_t = D_s). We say that program P ε-computes f on D_t if Pr_{x ∈_R D_t}[P(x) = f(x)] > 1 − ε. An (ε₁, ε₂)-self-tester (0 ≤ ε₁ < ε₂) for f on D_t must fail any program that does not ε₂-compute f on D_t, and must pass any program that ε₁-computes f on D_t (note that the behavior of the tester is not specified for all other programs). The tester should satisfy these conditions with error probability at most β, where β is a confidence parameter input by the user. An ε-self-corrector for f on (D_t, D_s) is a program C that

uses P as a black box, such that for every x ∈ D_s, Pr[C^P(x) = f(x)] ≥ 2/3,² for every P which ε-computes f on D_t. Furthermore, all of these programs require only a small multiplicative overhead over the running time of P, and they are different, simpler and faster than any correct program for f, in a precise sense defined in [6].

Section 2 is devoted to self-correcting polynomial functions, and Section 3 to self-testing polynomial functions. Section 4 introduces self-testing/correcting and checking for programs which only approximately compute f, including the digital computation of real valued functions.

2 Self-Correcting Polynomials

In addition to the aforementioned application, self-correcting is interesting because of its independent applications to complexity theory: (1) The existence of a self-corrector for a function implies that the function has similar average case and worst case complexities. (2) [4],[12] first observed that low degree polynomials over finite fields have self-correctors. This simple fact turned out to be a key ingredient in two recent results in complexity theory, namely IP = PSPACE [14],[18] and MIP = NEXPTIME [3].

Self-correcting is usually easy when the function has the random self-reducibility property [8],[12]. Informally, the property of k-random self-reducibility is that f(x) can be expressed as an easily computable function of f(x₁), …, f(x_k), where the xᵢ's are each uniformly distributed and easy to choose. Several functions have been shown to have this property: examples are integer and matrix multiplication, modular exponentiation, the mod function, and low degree polynomials over finite fields.

Our first aim is to study the scope under which this phenomenon occurs. We extend the above results in various directions, and to this end we define the following general setting.

Let R be any ring, and X = {x₁, x₂, …, x_n} a set of indeterminates (we assume nothing about R; in particular the indeterminates may not commute). Let D_c ⊂ R be the domain of coefficients (often D_c = R). Define an R-monomial over D_c to be an arbitrary word w ∈ (D_c ∪ X)*, and its degree, deg(w), to be the number of occurrences of indeterminates in w. An R-polynomial f over D_c is simply a sum f = Σᵢ wᵢ of R-monomials, and its degree is deg(f) = maxᵢ deg(wᵢ). Denote the set of all degree d polynomials (where |X| = n) by R_n^d(D_c). If D_c = R, we omit D_c from the above definitions; in particular, the set of all degree d polynomials is then R_n^d.

For example, if R is the ring of matrices over some field, D_s = D_t = D_c = R, and A, B, C, D, E, F ∈ R, then f(X₁, X₂) = A + BX₁C + X₂DX₁EX₂ + X₁²F is a polynomial of degree 3 in the two variables X₁, X₂.

If for every f ∈ R_n^d(D_c) there is an ε-self-corrector on (D_t, D_s), we say that R_n^d(D_c) is ε-resilient on (D_t, D_s). When R = D_c = D_s = D_t, we simply say that R_n^d is ε-resilient. The interesting issues are of course for which rings and domains this is possible, how small ε must be, and what the complexity of the corrector program C is, separating its actual computation from the number of calls it makes to the black box P.

The results of [4],[12] can be compressed to:

Theorem 1 ([4],[12]) If F is a finite field with |F| > d + 1, then for every n, F_n^d is 1/(3d+3)-resilient with O(d) calls to P.

The self-corrector used to prove this theorem is very simple to implement and is based on the existence of the following interpolation identity relating function values at nearby points: for all univariate polynomials f of degree at most d and all x, t ∈ F, Σ_{i=0}^{d+1} αᵢ f(x + aᵢ·t) = 0, where the aᵢ's are distinct elements of F, α₀ = −1, and the αᵢ depend only on F and d, not on x and t.³ The self-correcting algorithm for evaluating f(x̄) = f(x₁, …, x_n) is then to randomly choose t̄ = (t₁, …, t_n) ∈ Fⁿ with uniform distribution and output Σ_{i=1}^{d+1} αᵢ P(x̄ + i·t̄). With probability at least 2/3, all the calls to the program return correct values, and the output is correct.

² This can be amplified to 1 − β by O(log 1/β) independent repetitions and a majority vote.
³ In particular, if F = Z_p, the aᵢ can be chosen to be i, and then αᵢ = (−1)^{i+1} (d+1 choose i). Furthermore, Σ_{i=0}^{d+1} αᵢ f(x + i·t) can be computed in O(d²) time using only additions and comparisons by the method of successive differences [20].
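Footnote 3's instantiation over Z_p yields a particularly simple self-corrector. The following is a minimal sketch (function and parameter names are ours, not the paper's); it combines the interpolation identity with the majority vote of footnote 2:

```python
import random
from collections import Counter
from math import comb

def self_correct(P, d, p, x, trials=25):
    # Identity over Z_p (footnote 3): for every polynomial f of degree
    # at most d and every offset t,
    #   f(x) = sum_{i=1}^{d+1} (-1)^(i+1) * C(d+1, i) * f(x + i*t)  (mod p).
    # Query P instead of f at the d+1 correlated random points and take
    # a majority vote over several random offsets t (footnote 2).
    alpha = [(-1) ** (i + 1) * comb(d + 1, i) for i in range(1, d + 2)]
    votes = Counter()
    for _ in range(trials):
        t = random.randrange(1, p)
        guess = sum(a * P((x + i * t) % p)
                    for i, a in zip(range(1, d + 2), alpha)) % p
        votes[guess] += 1
    return votes.most_common(1)[0][0]
```

If P errs on an ε fraction of Z_p, each trial is correct with probability at least 1 − (d+1)ε, since each of the d+1 queries is uniformly distributed; the majority vote then equals f(x) with high probability.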

2.1.1 Non-Commutative Rings Our second theorem concerns computation over do- mains that are not as nice as finite fields. The only Our first generalization shows, perhaps surpris- similar result we know appears in [3]. Motivated by ingly, that the [4][12] trick works in many non- the fact that every boolean function has a multi- commutative rings as well. This may have com- linear representation with small integer coefficients, plexity theoretic consequences, as many complexity they considered R = , Dc = 2n , Dt = Ds = 10d classes are characterized by non-commutative group and showed Z Z Z theoretic problems. Define C(R), the center of the ring R, to be all elements in R which commute with Theorem 3 ([3]) every element. Define R to be all the elements in d 1 ∗ n(Dc) is 2d -resilient on (Dt,Ds) with O(d) calls R which have inverses. toZ P . Theorem 2 If a random element of R can be uni- We consider rational domains, specifically binary formly generated, and C(R) R d + 1, then for ∗ fixed point arithmetic. Let FIX = y : every n, Rd is 1/(2d)-resilient| ∩ with| ≥O(d) calls to P . s,r s n integers y r . We use the notation that{x ˜ = | | ≤ } We note that the condition that C(R) R∗ d+1 (x1, . . . , xn). is satisfied by rich classes of rings,| most∩ notably| ≥ fi- A nice property of finite fields that is used to get nite group algebras over finite fields of size d + 1. self-correctors for polynomials over finite fields is A special case is the algebra of square matrices≥ of that any fixed element of the field, when multiplied any order over a finite field (here the center is sim- by a random uniformly distributed element of the ply all diagonal matrices). Note that here uniform field, gives a result that is uniformly distributed over generation is trivial. In general, we know only a few the field. This is not a property of the fixed point more examples where uniform generation is possi- domain (fixed point domains do not even have the ble. 
For any finite group, given by generators, al- property of “wraparound”). Though the proof of most uniform generation is possible by the recent the self-corrector has the same spirit as the previ- Monte Carlo algorithm of [2], but whether this re- ous one, it is technically more involved. Here we use sult extends to the group algebra is open. the generalization of the definitions of self-correcting The proof is based on the fact that such rings can given in [8],[12] and assume that more than one also be shown to have interpolation identities: domain is tested. In addition, this is an example Lemma 1 For every ring R for which C(R) d+ of a class of functions for which the test domains | | ≥ are larger than the safe domain: in order to get re- 1, there are weights α1, . . . , αd+1 such that for any d d+1 silience on a particular precision, one should use a f R1, f(0) = i=1 αif(ci) where c1, . . . , cd+1 C(∈R) and are distinct.P ∈ program that is known to be usually correct both on a slightly finer precision and over a larger range. Proof: We will show that it is possible to select weights so that the identity is true for all monomials We have the polynomial interpolation formula d+1 mj(x) = xa1x . . . xaj of degree j for 0 j d. The f(y) = i=1 αif(y + ait) where a1, . . . , ad are dis- ≤ ≤ P lemma follows by linearity. The constraints on the tinct integers and α1, . . . , αd+1 depend only on d d+1 and not on y or t.. Let A = lcm(a1, . . . , ad), and weights are that for j = 0, . . . , d, i=1 αimj(x) = P A(i) = A . Let L = 5npdd+1, R = FIXn , δj where δ0 = 1 and δj = 0 for j > 0. Since ai s,L n ci C(R), we have that for all 1 k d + 1, Ds = FIXs,p. Then the d + 1 test distributions ∈ j ≤ ≤ ai n m (c ) = a a . . . a c . Thus this is a linear sys- are Dti = y˜ y˜ FIXs,L (1 i d + 1), where j k 1 2 d k { A | ∈ } ≤ ≤ tem of equations in the weights. 
Since the matrix of the precision on the inputs is finer than that of the the system is a Vandermonde matrix, the system of safe domain. equations has a solution. The self-corrector uses the interpolation formula If ci has an inverse, we have the property that for over the new domain to compute f(x) by pick- ˜ n fixed x and uniformly distributed t, then x + cit ing a random t FIXs,L. The distribution of is uniformly distributed. The rest of the proof of the d + 1 calls to∈ P is dependent on the input theorem 2 is as in [4][12]. and is The distributions on the d + 1 queries is ai n Dq = x˜ + y˜ y˜ FIX (1 i d + 1). achieved. This theorem was proved independently, i { A | ∈ s,L} ≤ ≤ We show that for all 1 i d + 1, Dti and Dqi are using similar techniques by Don Coppersmith [11]. very close, so the program≤ ≤ returns correct answers Theorem 5 If F is a finite field, and for some m, on all inputs with probability at least 3/4. When 3d + 1 < m, m divides F 1, then for every n, F d this happens, f(x) is computed correctly. n is 1/9-resilient with O(|m)| −calls to P . Theorem 4 Rd (D ) is 1 -resilient on n c 2d Proof: (sketch) Let ω be a primitive mth root (D ,...,D ,D ). t1 td+1 s of unity in F . The queries that C makes are Definition i m 1 2.1 P (x+ω r) i=0− , This sequence differs from f(x+ { i m 1 } { x if x = y ω r) i=0− in at most m/3 places with probability δ(x, y) = }  0 otherwise 2/3. As the second sequence is a codeword in the ≥generalized BCH code [17], it can be recovered from Definition 2.2 For random variables X,¯ Y¯ over the first efficiently, and then C(x) can be computed ¯ ¯ ¯ ¯ domain D, δ(X, Y ) = s D δ(Pr[X = s], Pr[Y = again by interpolation. Note here that there are m s]) (note that 0 δ(X,¯ YP¯ ) ∈ 1). (usually m = O(d)), queries to P , but as the er- ≤ ≤ ror correction procedure requires linear algebra, the Lemma 2 If δ(X,¯ Y¯ ) 1 1 and Pr[X¯ S] ≥ − ∈ ≥ running time of C is O(d3). 1 2 then Pr[Y¯ S] 1 1 2. 
− ∈ ≥ − − We can also generalize this to some non- Proof: Assume Pr[Y¯ S] 1 1 2 and ¯ ∈ ≤ − ¯− commutative rings: Pr[X S] 1 2. Then s S Pr[X = s] ¯∈ ≥ ¯− P ∈ ¯ − δ(Pr[X = s], Pr[Y = s]) 1. Since s/S Pr[X = Theorem 6 If R is a finite group algebra over a s] δ(Pr[X¯ = s], Pr[Y¯ = s≥]) 0, we haveP ∈δ(X,¯ Y¯ ) finite field F , satisfying the conditions of Theorem 5, − ≥ ≤ 1 1. and we can uniformly sample from R, then for every − d 1 n, Rn is 1/9-resilient. Lemma 3 For 1 i d, δ(Dti ,Dqi ) 1 4(d+1) ≤ ≤ ≥ − We do not know how to increase the resiliency over Proof: For all 1 j n, we have that: fixed point domains. i ≤ ≤ xj aiuj yj (i) i Pr[ s + As = A(i)s ] = Pr[xjA + uj = yj] 1 (i) (i) if yj [ L + xjA , +L + xjA ] 2.2.2 Memory Bounded Programs = 2L+1 ∈ −  0 otherwise Until now, we have assumed that the program P a ui i j yj i is a function, i.e. a non-adaptive memoryless ob- Pr[ As = A(i)s ] = Pr[uj = yj] ject. Here we address the case where the program is 1 if yj [ L, +L] not a function, but an adversary A. Under the as- =  2L+1 ∈ − 0 otherwise sumption that there are more than one independent copies of the program (non-communicating adver- 1 n Thus Prx˜ Dq [˜x =y ˜] = ( ) if yi saries), [7] show how to get checkers for any func- ∈ i 2L+1 (i) (i) ∈ [ L + xiA ,L + xiA ] for all i = 1, . . . , n and tion that has a checker in the original model. Here Pr− [˜x =y ˜] = ( 1 )n if y [ L, L] for all we address the case of a single adversary A whose x˜ Dti 2L+1 i i =∈ 1, . . . , n. The claim follows by∈ the− choice of L. only limitation is space. The adversary can apply different programs depending on his memory state, which can be affected by previous queries. Formally, a space s adversary A (over R) is an automaton with s n 2.2 Programs with Weaker Con- 2 states where state i is labeled by Pi : R R, straints and the transition alphabet is Rn. 
When at→ state n i, given input z R , A returns Pi(z) and moves 2.2.1 Resiliency to a new state j along∈ the unique transition labeled d z. We say that A -computes f Rn if for every d ∈ Note that the resiliency in Theorem 1 degrades with i, Pi -computes f. Rn is (s, )-resilient if there is the degree. We show that using the theory of er- a program C such that Pr[CA(x) = f(x)] 2/3 for ror correcting codes [17], constant resiliency can be all x Rn and for all space s adversaries A≥. ∈ There were no previous results of (s, )-resiliency, chosen proportional to its cardinality, and within it even for 1-bit memory s = 1. We give an al- h is uniformly distributed. It is easy to prove in- gorithm C, for the case of finite fields, that is ductively that for every word w, the state Sw(h) (θ(log F ), 1 )-resilient. is reached by h on w with probability proportional | | 16d2 8 d to its cardinality, and within it h is uniformly dis- Theorem 7 If F > d , then for every n, Fn is 1 |2 | 2 tributed. ( 4 log F , 1/(16d ))-resilient with O(d ) calls to P . | | Let P be any partition of H into m parts As mentioned above, for the algorithm C we give, S1,S2, Sm. Let B1,B2, Bm be arbitrary sub- this is optimal: it fails if s 4 log F . The algo- sets of ···Z with B / Z < ···, and B(h) = B for the ≥ | | n i i rithm queries A on points x + ir with r R F | | | | { } ∈ same i satisfying S(h) = Si. Then, using Lemma and i R F . We use the fact that the pair (x, r) acts ∈ 10 of [15] and simple algebra we get that for a uni- on i as a 2-universal hash function, and the small formly chosen random h, even knowing S(h) will information A has on (x, r) enables the use of results keep the probability that h(u) B(h) for random on the distribution of hash values by [16][15] to show u U essentially unchanged. ∈ that the queries are fairly uniformly distributed in ∈ F n. Pr[h(u) B(h) S(h)] ∈ | We observe that this implies that bigger fields are m = ( Si / U ) Pr[h(u) Bi S(h) = Si] better. 
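The 2-universality of the pair (x, r) acting on i can be checked exhaustively in a toy one-dimensional case (n = 1 over a small prime field; the code and its names are an illustrative sketch, not part of the paper):

```python
# For h_{x,r}(u) = x + u*r over Z_p and any two fixed distinct
# evaluation points u != u', every output pair (z, z') should arise
# from exactly one pair (x, r), i.e. with probability 1/p^2 over a
# random member of the family.
p = 7
u, uprime = 2, 5                 # any two distinct points of Z_p
counts = {}
for x in range(p):               # the family is indexed by pairs (x, r)
    for r in range(p):
        pair = ((x + u * r) % p, (x + uprime * r) % p)
        counts[pair] = counts.get(pair, 0) + 1
# 2-universality: each of the p^2 output pairs is hit exactly once.
assert all(counts.get((z, zp), 0) == 1 for z in range(p) for zp in range(p))
```

The check succeeds because, for u ≠ u', the map (x, r) ↦ (x + ur, x + u'r) is an invertible linear map on Z_p², which is exactly why the adversary's bounded memory cannot pin down the query distribution.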
When the choice of the field is ours, we have:

Corollary 4 In polynomial time, we can achieve resiliency to any adversary that is polynomial space bounded!

Proof: [of Theorem 7] Let H = {h : U → Z} be a family of 2-universal hash functions [9] from a domain U to a range Z. This means that for any u ≠ u′ ∈ U, any z, z′ ∈ Z and random h ∈_R H, Pr[h(u) = z, h(u′) = z′] = 1/|Z|².

Let s denote the space bound and m = 2^s. We shall describe a space s "tree automaton" which is stronger than the above adversary and with which it will be convenient to work. Schematically, it looks like a |U|-ary tree in which every path is a width m branching program. Every node (labeled by a word from U*) has m states, and has a separate transition function (with alphabet Z) to each child. Formally, for every word w ∈ U* and u ∈ U, let δ_{w,u} : [m] × Z → [m] be an arbitrary (transition) function. Let P^∅ be an arbitrary partition of H into m parts S₁^∅, S₂^∅, …, S_m^∅. This partition and the transition functions naturally define partitions P^w = {S₁^w, S₂^w, …, S_m^w} for every w ∈ U* inductively as follows: for any u ∈ U, h is in S_j^{wu} iff h ∈ S_i^w and δ_{w,u}(i, h(u)) = j. For convenience, we denote by P^w both the node of the tree and the partition of H at that node; the parts of this partition are the states of that node.

Let S^w(h) denote the part (state) containing h in the partition (node) P^w. If h is chosen uniformly at random from H, then clearly its state S^∅(h) is chosen with probability proportional to its cardinality, and within it h is uniformly distributed. It is easy to prove inductively that for every word w, the state S^w(h) is reached by h on w with probability proportional to its cardinality, and within it h is uniformly distributed.

Let P be any partition of H into m parts S₁, S₂, …, S_m. Let B₁, B₂, …, B_m be arbitrary subsets of Z with |Bᵢ|/|Z| < ε, and let B(h) = Bᵢ for the i satisfying S(h) = Sᵢ. Then, using Lemma 10 of [15] and simple algebra, we get that for a uniformly chosen random h, even knowing S(h) keeps the probability that h(u) ∈ B(h) for random u ∈ U essentially unchanged:

Pr[h(u) ∈ B(h) | S(h)]
  = Σ_{i=1}^m (|Sᵢ|/|H|) · Pr[h(u) ∈ Bᵢ | S(h) = Sᵢ]
  ≤ (1/m) + max_{i : |Sᵢ|/|H| ≥ 1/m²} Pr[h(u) ∈ Bᵢ | S(h) = Sᵢ]
  ≤ |U|^{−1/4} + (ε + |U|^{−1/4})
  = ε + 2|U|^{−1/4}.

Here we used m ≤ |U|^{1/4} and the [15] lemma for a subset of hash functions of density at least |U|^{−1/2}.

Now we can fit our algorithm and the space s bounded adversary into this framework. Our domain U is the field F, and a pair x, r ∈ Fⁿ maps field elements u ∈ F to Z = Fⁿ by (x, r)(u) = x + ur, which is universal hashing. We want to evaluate f(x); let us assume for the moment that x is random. We pick a random r ∈ Fⁿ and a random sequence of field elements u₁, u₂, ⋯. We request from the adversary the value of f(x + uᵢr), allowing it to know and remember for free the value of uᵢ. This creates the tree structure above and guarantees that the states represent only information about the pair (x, r). We continue until we have asked d + 1 distinct questions (the expected time until this happens is O(d), as the field is large). Since A ε-computes f, at every state it can err only on a subset of density ε of the queries, and the above inequality guarantees that this happens with probability at most 2ε. Hence ε < 1/(4d) guarantees that with high probability all replies are correct and we can interpolate safely.

In the general case x is fixed, not random. We handle this by repeating the above procedure for x + it, with t ∈_R Fⁿ and i = 1, 2, ⋯, d + 1, from which f values we can interpolate f(x). We use the same random r in all the procedures. Each (x + it, r) is a random pair, so each single procedure fails with probability at most 1/(4d) when ε < 1/(16d²). To see that the correlation between different pairs cannot help the adversary, note that it was allowed to start with an arbitrary partition P^∅. Hence, by telling the adversary when we are done with (x + it, r) and start with (x + (i+1)t, r), we let it convert in an arbitrary way its information (partition) about the old pair into information about the new pair. The details are left to the reader.

3 Self-Testing Polynomials

Self-testing is a much harder task than self-correcting. In [8] it is shown how to get self-testers for two special cases: any function that is downward self-reducible in addition to being random self-reducible, and any function that is linear. In this section we show that the much richer class of all polynomial functions can be self-tested.

As in [8], our testers are of a nontraditional form: the tester is given a short specification of the function, in the form of properties that the function must have, and verifies that these properties "usually" hold. We show that these properties are such that if the program "usually" satisfies them, then it is essentially computing the correct function.
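The "short specification" idea can be made concrete: a degree bound together with d + 1 function values pins the function down uniquely. A toy sketch (p = 11 is an arbitrary small prime; the code is ours, for illustration), using the specification points for the cube function discussed in this section:

```python
# Recover the unique degree-3 polynomial through the specification
# points (0,0), (1,1), (-1,-1), (2,8) over Z_p by Lagrange
# interpolation, and check that it agrees everywhere with x^3.
p = 11
pts = [(0, 0), (1, 1), (-1 % p, -1 % p), (2, 8)]

def interp(x):
    total = 0
    for i, (xi, yi) in enumerate(pts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

assert all(interp(x) == pow(x, 3, p) for x in range(p))
```

Since two degree-3 polynomials agreeing at 4 distinct points are identical, these 4 values alone certify the whole function, which is what lets a tester work from such a specification rather than from a trusted table of values.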
Complementing the results of [4],[12], we first show how to construct a self-tester for any univariate polynomial function, provided the value of the function is known at one more than the degree many points. These results extend to give self-testers for degree d polynomial functions in m variables, provided the function value is known at (d+1)^m well chosen points, by combining the univariate result with the results of [3],[13],[19]. In the final version we show that the results can also be generalized to give self-testers and self-correctors for functions in finite dimensional function spaces that are closed under shifting and scaling.

Theorem 8 If f is a degree d (univariate) polynomial over Z_p, then f has an (ε/(2(d+2)), 4ε)-self-tester on Z_p with O(d · min(d², 1/ε)) calls to P.

Corollary 5 If f is a degree d polynomial in m variables over Z_p, then f has a (0, ε)-self-tester on Z_p^m with O((d+1)^m/ε + poly(d, m, 1/ε)) calls to P.

The self-tester algorithm is almost as simple as the self-corrector algorithm, though the proof of correctness is more difficult; we present the proof of Theorem 8 at the end of this section. The self-testing is done in two phases: one verifies that the program is essentially computing some degree d polynomial function, and the other verifies that the program is computing the correct polynomial function.

When the number of variables is small, the provision that the value of the function be known at (d+1)^m points is not very restrictive, since the degree is assumed to be small with respect to the size of the field. Suppose one has a program for the RSA function x³ mod p. Traditional testing requires that the tester know the value of f(x) for random values of x. Here one only needs the following simple and easy to generate specification: f is a degree 3 polynomial in one variable, and f(0) = 0, f(1) = 1, f(−1) = −1, f(2) = 8. These function values are the same over any field of size at least 9.

In [3],[13],[19] there are tests which, in our setting, allow the self-tester to be convinced in polynomial time that the program is computing some multivariate polynomial function of low degree. Although the test runs in polynomial time, it is somewhat complicated to perform, because it involves the reconstruction of a univariate polynomial from its values at a number of points (which in turn requires multiplications and matrix inversions), and later the evaluation of the reconstructed polynomial at random points. If the number of variables in the original multivariate polynomial is relatively small, this is more difficult than computing the polynomial function from scratch, and is therefore not "different" from the program in the sense defined by [8].

We now present the proof of Theorem 8 for the special case of univariate polynomials of degree d over Z_p and when ε ≤ O(1/d²). For simplicity, in the description of our self-testing program we assume that whenever the self-tester makes a call to P, it verifies that the answer returned by P is in the proper range, and if the answer is not in the proper range, the program notes that there is an error. We use x ∈_R Z_p to denote that x is chosen uniformly at random from Z_p.

program Polynomial-Self-Test(P, ε, β, (x₀, f(x₀)), …, (x_d, f(x_d)))

  Membership Test
    Repeat O((1/ε)·log(1/β)) times
      Pick x, t ∈_R Z_p and test that Σ_{i=0}^{d+1} αᵢ·P(x + i·t) = 0
    Reject P if the test fails more than an ε fraction of the time.

  Consistency Test
    for j = 0 to d do
      Repeat O(log(d/β)) times
        Pick t ∈_R Z_p and test that f(x_j) = Σ_{i=1}^{d+1} αᵢ·P(x_j + i·t)
      Reject P if the test fails more than 1/4-th of the time.

Let δ ≡ Pr_{x,t}[Σ_{i=0}^{d+1} αᵢ·P(x + i·t) ≠ 0].

Definition 3.1 We say program P is ε-good if δ ≤ ε/2 and ∀j ∈ {0, ⋯, d}, Pr_t[f(x_j) = Σ_{i=1}^{d+1} αᵢ·P(x_j + i·t)] ≥ 3/4. We say P is ε-bad if either δ > 2ε or ∃j such that Pr_t[f(x_j) = Σ_{i=1}^{d+1} αᵢ·P(x_j + i·t)] < 1/2. (Note that there are programs which are neither ε-good nor ε-bad.)

The following lemma is easy to prove:

Lemma 6 With probability at least 1 − β, an ε-good program is passed by Polynomial-Self-Test; with probability at least 1 − β, an ε-bad program is rejected by Polynomial-Self-Test.

It is easy to see that if a program P (ε/(2(d+2)))-computes f, then it is ε-good. The hard part of the theorem is to show that if program P does not 4ε-compute f, then it is ε-bad. We show the contrapositive, i.e. that if P is not ε-bad, then it 4ε-computes f. If P is not ε-bad, then δ ≤ 2ε. Under this assumption, we show that there exists a function g with the following properties:

1. g(x) = P(x) for most x.
2. ∀x, t, Σ_{i=0}^{d+1} αᵢ·g(x + i·t) = 0, and thus g is a degree d polynomial.
3. g(x_j) = f(x_j) for j ∈ {0, 1, ⋯, d}.

Define g(x) to be maj_{t ∈ Z_p} Σ_{i=1}^{d+1} αᵢ·P(x + i·t), the most common value of the interpolation as t varies.

Lemma 7 g and P agree on more than a 1 − 2δ fraction of the inputs from Z_p.

Proof: Consider the set of elements x such that Pr_t[P(x) = Σ_{i=1}^{d+1} αᵢ·P(x + i·t)] < 1/2. If the fraction of such elements were more than 2δ, it would contradict the condition Pr_{x,t}[Σ_{i=0}^{d+1} αᵢ·P(x + i·t) ≠ 0] = δ. For all remaining elements, P(x) = g(x).

In the following lemmas we show that the function g satisfies the interpolation formula for all x, t and is therefore a degree d polynomial. We do this by first showing that for all x, g(x) equals the interpolation of P at x by most offsets t, and then using this to show that the interpolation formula is satisfied by g for all x, t.

Lemma 8 For all x ∈ Z_p, Pr_t[g(x) = Σ_{i=1}^{d+1} αᵢ·P(x + i·t)] ≥ 1 − 2(d+1)δ.

Proof: Observe that t₁, t₂ ∈_R Z_p implies x + i·t₁ ∈_R Z_p and x + j·t₂ ∈_R Z_p. Therefore, for each 1 ≤ i ≤ d + 1,

Pr_{t₁,t₂}[P(x + i·t₁) = Σ_{j=1}^{d+1} αⱼ·P(x + i·t₁ + j·t₂)] ≥ 1 − δ,

and for each 1 ≤ j ≤ d + 1,

Pr_{t₁,t₂}[P(x + j·t₂) = Σ_{i=1}^{d+1} αᵢ·P(x + i·t₁ + j·t₂)] ≥ 1 − δ.

Combining the two we get

Pr_{t₁,t₂}[ Σ_{i=1}^{d+1} αᵢ·P(x + i·t₁) = Σ_{i=1}^{d+1} Σ_{j=1}^{d+1} αᵢαⱼ·P(x + i·t₁ + j·t₂) = Σ_{j=1}^{d+1} αⱼ·P(x + j·t₂) ] ≥ 1 − 2(d+1)δ.

The lemma now follows from the observation that the probability that the same object is drawn twice from a set in two independent trials lower bounds the probability of drawing the most likely object in one trial. (Suppose the objects are ordered so that pᵢ is the probability of drawing object i, and p₁ ≥ p₂ ≥ ⋯. Then the probability of drawing the same object twice is Σᵢ pᵢ² ≤ Σᵢ p₁pᵢ = p₁.)
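The two tests of Polynomial-Self-Test above can be sketched in code. Repetition counts and rejection thresholds here are illustrative placeholders, not the exact parameters of Theorem 8, and the names are ours:

```python
import random
from math import comb

def poly_self_test(P, d, p, known, trials=300):
    # `known` lists the d+1 pairs (x_j, f(x_j)) that specify f.
    # Membership Test: the finite-difference identity
    #   sum_{i=0}^{d+1} (-1)^i * C(d+1, i) * P(x + i*t) = 0  (mod p)
    # must hold for most random choices of x and t.
    beta = [(-1) ** i * comb(d + 1, i) for i in range(d + 2)]
    fails = 0
    for _ in range(trials):
        x, t = random.randrange(p), random.randrange(1, p)
        if sum(b * P((x + i * t) % p) for i, b in enumerate(beta)) % p != 0:
            fails += 1
    if fails > trials // 10:          # illustrative threshold
        return False
    # Consistency Test: interpolating P around each known point x_j
    # must usually reproduce the specified value f(x_j).
    for xj, fxj in known:
        bad = 0
        for _ in range(60):
            t = random.randrange(1, p)
            val = (-sum(b * P((xj + i * t) % p)
                        for i, b in enumerate(beta) if i > 0)) % p
            if val != fxj % p:
                bad += 1
        if bad > 15:                  # illustrative 1/4 threshold
            return False
    return True
```

A correct program passes both tests with certainty (the identity holds for every x, t), while a program far from every low degree polynomial fails the Membership Test on almost every trial.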
∗ ∗ ∗  ≈ computes f on Dt if there is a g such that P - Putting these≥ two− together we get d+1 computes g on Dt and g δ-approximates f on Dt. Prt1,t2 i=0 αig(x + i t) = d+1  P d+1 ∗ An approximate self-tester verifies that that the αj αiP ((x + j t1) + i (t + j t2)) = Pj=1 Pi=0 ∗ ∗ ∗ program correctly approximates the function on 0 > 0. most inputs: An ( ,  , δ , δ )-approximate self- The lemma follows since the statement in the lemma 1 2 1 2 tester (0  <  , 0 δ < δ ) must fail any pro- is independent of t , t , and therefore if its proba- 1 2 1 2 1 2 gram that≤ does not (≤, δ )-approximately compute bility is positive, it must be 1. 2 2 f on Dt, and must pass any program that (1, δ1)- Lemma 10 g(xj) = f(xj) computes f on Dt. The error probability of the self-tester should be at most β. Proof: Follows from the definition of g and the fact that P is not -bad. An approximate self-corrector for f takes a program P that approximately computes f on most inputs Theorem 9 The program Polynomial-Self-Test and uses it to approximately compute f on all in-  is a ( 2(d+2) , 4)-self-testing program for any degree puts. More formally, an (, δ, δ0)-approximate self- d polynomial function over Zp specified by its values corrector is a program C that uses P as a black 1 box, such that for every x D , Pr[CP (x) at any d + 1 points, if  4(d+2)2 . s δ0 ≤ f(x)] 2/3, for every P which∈ (, δ)-approximately≈ ≥ Proof: Follows from Lemmas 6,9, and 10. computes f on Dt. An analogous definition of an approximate checker can be made. An approximate result checker checks 4 Approximate and that the program correctly approximates the func- Real-Valued Functions tion on a particular input.

Intuitively, we say that a function is (k, δ1, δ2)- In the notions of a result checker and self- approximate-random self-reducible if the function testing/correcting pair considered so far, a result value at a particular point can be approximated to of a program is considered incorrect if it is not ex- within δ2 given approximations to within δ1 of the actly equal to the function value. Some programs function at k points, where the k points are uni- are designed only to correctly approximate the value formly distributed and easy to choose. Regarding of a function. One important setting where this is when we can self-correct, we have the following the- the case is with functions dealing with real inputs orem: and outputs: due to the finiteness of the representa- tion, the program computes a function which is only Theorem 10 an approximation to the desired function. Another If f is (k, δ1, δ2)-approximate-random self-reducible, 1 example of this is the quotient function fˆ(x, R) = then f has an ( 4k , δ1, δ2)-approximate self-corrector. x divR, a commonly used system function, which The proof of this theorem is very similar to the proof can be thought of as an approximation to the inte- that any random self-reducible function has a self- ger division function f(x, R) = (x div R, x mod R). corrector, except that instead of taking the major- Below we give a formal framework to study checkers, ity answer, the approximate self-corrector must take self-testers and self-correctors in this setting. We the “median” answer (the median is not well-defined give a general technique for self-correcting in the over finite groups, but an appropriate candidate can case that the function is approximately linear, and be chosen). 4.1 Approximate Self-Testers for rection is harder to prove. From this point on we Linear Functions assume that the program P is (, ∆)-good. We define the discrepancy disc(x) = f(x) P (x). 
4.1 Approximate Self-Testers for Linear Functions

We make the following definition:

Definition 4.1 A function f from a group G to a group H is ∆-approximately linear if ∀a, b ∈ G, f(a + b) ≈_∆ f(a) + f(b) + E(a, b), where E(a, b) is some easily computable function from G × G to H. We call a function linear if it is 0-approximately linear. Notice that our definition of a linear function encompasses a much larger class of functions than the traditional definition.

In this section we assume that we have a linear function f mapping from Z_m to Z_n or Z, given by f(0) = 0, f(1) and f(a), where a is close to √m (the property required of a is that any x ∈ Z_m should be expressible as b·a + c where b + c ≤ 3√m). We show that the following self-tester tests f(x) ≈_∆ P(x).

Program Approximate-Self-Test(P, ∆, β, ε)

Approximate Linearity Test
  Repeat O((1/ε) log(1/β)) times
    Pick random x, y and test that P(x + y) ≈_∆ P(x) + P(y) + E(x, y)
  Reject P if the test fails more than an ε/2 fraction of the time.

Neighborhood Tests
  Repeat O(log(1/β)) times
    Pick random x and test that
      P(x + 1) ≈_∆ P(x) + f(1) + E(x, 1)
      P(x + a) ≈_∆ P(x) + f(a) + E(x, a)
  Reject P if either test fails more than 1/4th of the time.

Definition 4.2 We say that P is (ε, ∆)-good if Pr_{x,y}[P(x + y) ≈_∆ P(x) + P(y) + E(x, y)] > 1 − ε and, if R = Z_n, Pr_x[P(x + 1) ≈_∆ P(x) + f(1) + E(x, 1)] > 3/4 and Pr_x[P(x + a) ≈_∆ P(x) + f(a) + E(x, a)] > 3/4.

The following lemma establishes confidence in the self-tester and can be proved using Chernoff bounds.

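To make the tester concrete, here is a minimal Python sketch of the approximate linearity test, run on the quotient function f(x) = x div R (the paper's example of a 1-approximately linear function). The parameters, rejection threshold, and simulated faulty program are our own choices, and the neighborhood tests are omitted since the range here is Z:

```python
import random

R = 37
n = 8
M = (2 ** n) * R            # domain [0, M) with addition mod M

def f(x):                   # the quotient function, 1-approximately linear
    return x // R

def E(x, y):                # wrap-around correction: f((x+y) mod M) equals
    return ((x + y) // M) * (2 ** n)   # f(x) + f(y) - E(x, y) + {0 or 1}

def approx_self_test(P, delta, trials=2000):
    fails = 0
    for _ in range(trials):
        x, y = random.randrange(M), random.randrange(M)
        if abs(P((x + y) % M) - (P(x) + P(y) - E(x, y))) > delta:
            fails += 1
    return fails / trials <= 0.10      # reject above an eps/2 = 0.10 fraction

random.seed(2)
assert approx_self_test(f, delta=1)            # the true quotient passes
faulty = lambda x: f(x) if x % 2 else random.randrange(2 ** n)
assert not approx_self_test(faulty, delta=1)   # garbage on half the inputs
```

For the true quotient function the test never fails (the identity holds up to the carry ζ ∈ {0, 1}), while a program that returns garbage on half the domain fails on nearly every trial.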
Lemma 11 If P is not (ε, ∆)-good then the probability that P passes the basic linear test is less than β. If P is (ε/4, ∆)-good then the probability that P passes the self-test is at least 1 − β.

It is clear that if a program (ε/12, ∆/3)-approximately computes f then it is (ε/4, ∆)-good. The other direction is harder to prove. From this point on we assume that the program P is (ε, ∆)-good.

We define the discrepancy disc(x) = f(x) − P(x). It is cleaner to work with the discrepancy of x because it allows us to ignore the effect of E(x, y). We have:

Pr_{x,y}[disc(x + y) ≈_∆ disc(x) + disc(y)] > 1 − ε
Pr_x[disc(x + 1) ≈_∆ disc(x)] ≥ 3/4
Pr_x[disc(x + a) ≈_∆ disc(x)] ≥ 3/4

The rest of this section shows that disc(x) is always approximately 0. This is achieved by first showing that changing the value of disc(x) at only a few places gives a function h which always passes the approximate linearity test (Lemma 12). Lemma 13 shows that h is bounded well away from n. This is used in Lemma 14 to show that h is actually very close to 0 everywhere, implying in turn that disc is within 6∆ of 0 at most points. Finally Lemma 15 shows that for most points the discrepancy is actually much smaller than 6∆.

Lemma 12 There exists a function h with the following properties:
1. ∀x, y ∈ Z_m, h(x + y) ≈_{6∆} h(x) + h(y)
2. Pr_x[h(x) = disc(x)] > 1 − 2√ε

Proof: Let G = { y ∈ Z_m | Pr_x[disc(x + y) ≈_∆ disc(x) + disc(y)] > 1 − √ε } be the set of "good" values of y ∈ Z_m. For y ∈ G define h(y) = disc(y). For other values of y, pick x, z ∈ G such that x + z = y and define h(y) = h(x) + h(z). It can be shown, by techniques similar to those in [8], that h so defined has the property that ∀y, Pr_x[disc(y + x) ≈_{2∆} disc(x) + h(y)] > 1 − 2√ε. This in turn can be used to show that h has the approximate linearity property, i.e., h(x + y) ≈_{6∆} h(x) + h(y). The fact that the cardinality of G is large completes the proof of this lemma.

Lemma 13 ∀x, h(x) ≈_{27√m·∆} 0.

Proof: Since P is (ε, ∆)-good, h(1) ≈_{3∆} 0 and h(a) ≈_{3∆} 0. Lemma 12 now implies that for all x ∈ Z_m, h(x + 1) ≈_{9∆} h(x) and h(x + a) ≈_{9∆} h(x). The fact that any x ∈ Z_m can be expressed as b·a + c, where b + c ≤ 3√m, completes the proof.

Lemma 14 If 27√m·∆ < n/4 then ∀x, h(x) ≈_{6∆} 0, and Pr_x[disc(x) ≈_{6∆} 0] > 1 − √ε.

Proof: Assume that ∃x such that dist(h(x), 0) > 6∆. Pick x such that dist(h(x), 0) is maximum. Since h(x) ≈_{27√m·∆} 0 we have dist(h(2x), 0) ≥ 2·dist(h(x), 0) − 6∆ > dist(h(x), 0), which is a contradiction. The second part of the assertion follows from the fact that disc(x) equals h(x) with probability 1 − √ε. (Notice that to prove this lemma for the case where the range of f is Z, Lemma 13 is not required. In turn this implies that the case where the range of f is Z does not require any neighborhood tests.)

To get even tighter bounds on the discrepancy of most elements we use the following lemma, which illustrates an important tradeoff between ε and ∆.

Lemma 15 For all k > 0, Pr[disc(x) ≈_{(1 + 5/2^k)∆} 0] > 1 − (4k + 1)√ε.

Proof: Let i be the smallest value of k such that Pr[(1 + 5/2^k)∆ < disc(x) < 6∆] > 2k√ε. Let S = { x ∈ Z_m | (1 + 5/2^i)∆ < disc(x) < 6∆ }. Consider the set S′ = { (x, y) | x, y ∈ S, and disc(x) + disc(y) ≈_∆ disc(x + y) }. The size of this set is at least |S|² − εm². The size of S″ = { x + y | (x, y) ∈ S′ } is at least (|S|² − εm²)/|S|, which is at least (2i − 1)√ε·m. At most √ε·m of these elements have disc greater than 6∆ (by Lemma 14). Thus at least 2(i − 1)√ε·m elements of S″ satisfy (1 + 5/2^{i−1})∆ < disc(x) < 6∆. This violates the minimality of i. A similar argument bounds Pr[(1 + 5/2^k)∆ < n − disc(x) < 6∆], and an application of Lemma 14 completes the proof.

Thus we have proved:

Theorem 11 If f is a linear function from Z_m to Z_n and n > 54√m·∆, then f has an (ε/12, (4k + 1)√ε, ∆/3, (1 + 5/2^k)∆)-approximate self-tester on G.

It should be noted that, though our proof as presented here requires that n > 54√m·∆, this condition can be easily relaxed by throwing in some more neighborhood tests. For instance, if n = Θ(m^{1/d}), and f(a_i) is known for i = 1, …, d + 1 (where a_i is close to m^{i/(d+1)}), then for each i we test that P(x + a_i) ≈_∆ P(x) + f(a_i) + E(x, a_i). These tests suffice to carry out the proof here with little modification to Lemma 13.

4.2 Extending the approximate-linear self-tester to other functions

4.2.1 The Quotient Function

The mod function, f(x, R) = x mod R, and the division function, given by f(x, R) = (x div R, x mod R), are both linear, but the quotient function f′(x, R) = x div R is not. However, the quotient function can be thought of as a 1-approximately linear function mapping into Z, because (x1 + x2) div R = x1 div R + x2 div R − E(x1, x2) + ζ, where ζ ∈ {0, 1} and E(x1, x2) = ((x1 + x2) div 2^n·R)·2^n is an easy to compute function. Hence the approximate-linear self-tester applies to the quotient function and we get the following corollary. (Note again that since the range of the quotient function is Z, no neighborhood tests are needed.)

Corollary 16 The quotient function has an (ε/16, √ε, 0, 6)-self-tester over the domain [0 … 2^n·R].

4.2.2 Floating Point Exponentiation

The floating point exponentiation function f computes f(x) = 2^x for inputs from the domain D0 = { (l/2^k)·2^exp | 0 ≤ l < 2^k, C0 < exp < C1 }, where C0 < C1 are constants. (The domain here models a typical floating point word in a computer, with l representing the mantissa and exp representing the exponent. Typically C0 is negative and C1 is positive.)

We first restrict our attention to the exponentiation function working on the domain D = { l/2^k | 0 ≤ l < 2^k }. Observe that D forms a group under modular addition defined as x +′ y = (x + y) mod 1. Furthermore f(x +′ y) = f(x) ∗ f(y) ∗ E(x, y), making f a linear function, where E(x, y) = 1 if x + y < 1 and 1/2 otherwise. Thus f can be self-tested over D using our approximate linearity self-tester. (D0, on the other hand, does not form a group, hence our methods are not directly applicable to it.)

Next observe that computing f over D and computing f over D0 are equivalent in an approximate sense. A program computing f over D0 also computes f over D. For the other direction, observe that ∀x ∈ D0, x ≈_{2^{−(k+1)}} a + l/2^k, where a ∈ Z. Thus f(x) ≈_{2^{−k}} 2^a·f(l/2^k). (Here the metric used on the range is d(x, y) = |log x − log y|.) Thus given a program computing f on the domain D we can get a program computing f on the domain D0 approximately. The approximate equivalence of the two domains yields the following corollary to Theorem 11.

Corollary 17 The floating point exponentiation function for numbers with k bits of precision has an (ε/16, √ε, log(1 + 1/2^{k+1}), log(1 + 9/2^k))-approximate self-tester.
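The linear structure of f(x) = 2^x over D is easy to check numerically; the sketch below (our own, with k = 12) verifies the identity f(x +′ y) = f(x)·f(y)·E(x, y) up to floating point roundoff:

```python
import random

k = 12
def f(x):
    return 2.0 ** x

def add_mod1(x, y):          # the group operation x +' y on D
    return (x + y) % 1.0

def E(x, y):                 # E(x, y) = 1 if x + y < 1, else 1/2
    return 1.0 if x + y < 1.0 else 0.5

random.seed(3)
for _ in range(1000):
    x = random.randrange(2 ** k) / 2 ** k    # random elements of D
    y = random.randrange(2 ** k) / 2 ** k
    lhs = f(add_mod1(x, y))
    rhs = f(x) * f(y) * E(x, y)
    assert abs(lhs - rhs) <= 1e-12 * rhs     # equal up to a few ulps
```

Because the elements of D are exact dyadic rationals, the only deviation comes from the rounding of the exponentials themselves, which is far below the tester's tolerance.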

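The reduction from D0 to D can be sketched in the same spirit: split x into an integer part a and a nearest k-bit fraction, query a program on D, and scale by 2^a; in the metric d(x, y) = |log2 x − log2 y| the error is at most 2^{−(k+1)}. The helper names below are our own, and P_on_D stands in for an (already tested) program on D:

```python
import math

k = 12

def P_on_D(t):                     # stand-in for a tested program on D
    return 2.0 ** t

def exp_via_D(x):
    a = math.floor(x)
    r = round((x - a) * 2 ** k)    # nearest k-bit fraction
    a += r // 2 ** k               # r == 2^k carries into the integer part
    frac = (r % 2 ** k) / 2 ** k   # frac is now an element of D
    return (2.0 ** a) * P_on_D(frac)

# The log-metric error is at most 2^-(k+1), plus negligible roundoff.
for x in (-3.141, 0.0, 0.3333, 5.75):
    assert abs(math.log2(exp_via_D(x)) - x) <= 2 ** -(k + 1) + 1e-12
```

Since f(x) = 2^x, the log2 of the output is exactly the rounded input, so the bound 2^{−(k+1)} is just the rounding error of the fractional part.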
4.2.3 Floating Point Logarithm

The floating point log function f takes an input from the domain D0 = { (1 + l/2^k)·2^exp | 1 ≤ l < 2^k, C0 < exp < C1 }, where C0, C1 are constants, and computes f(x) = log x. In this section we outline a self-tester for the floating point log function.

As in the case of floating point exponentiation, the problem reduces to testing on the smaller domain D = { 1 + l/2^k | 1 ≤ l < 2^k } (computing log 2^exp is easy). But unlike the floating point exponentiation case there seems to be no obvious group structure on this domain. The natural binary operation over D, given by a ∗ b = a·b if a·b < 2 and a·b/2 otherwise, is not associative. But it can be shown that it satisfies the following approximate associativity property:

(a ∗ (b ∗ c)) ≈_{3/2^k} ((a ∗ b) ∗ c)
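In exact arithmetic ∗ is in fact associative; the deviation arises when each product is rounded back to k bits of precision. The sketch below (our own: it rounds down to the k-bit grid and measures distance between logarithms cyclically mod 1) checks approximate associativity empirically, with a deliberately loose bound of 8/2^k:

```python
import math
import random

k = 12

def star(a, b):
    p = a * b
    if p >= 2.0:
        p /= 2.0
    # round back down onto the k-bit grid {1 + l/2^k}
    return 1.0 + math.floor((p - 1.0) * 2 ** k) / 2 ** k

def log_dist(x, y):              # cyclic distance between log2 values
    d = abs(math.log2(x) - math.log2(y)) % 1.0
    return min(d, 1.0 - d)

random.seed(4)
for _ in range(1000):
    a, b, c = (1.0 + random.randrange(2 ** k) / 2 ** k for _ in range(3))
    lhs = star(a, star(b, c))
    rhs = star(star(a, b), c)
    assert log_dist(lhs, rhs) <= 8 / 2 ** k
```

Each application of ∗ is exact addition of logarithms mod 1 followed by one grid rounding, so the two parenthesizations drift apart by at most a small constant times 2^{−k}, matching the paper's ≈_{3/2^k} bound in spirit.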
The tester for the log function performs the approximate linearity test and the following neighborhood tests for j = 1, 2, 3:

  for random x, test that P(x ∗ (1 + j/2^k)) ≈ P(x) + log(1 + j/2^k)

The correctness of this tester can now be proved in a manner similar to the proof of Theorem 11, using the approximate associativity property instead of associativity. The details are omitted from this version.

Theorem 12 The floating point logarithm function for numbers with k bits of precision has an (ε/16, √ε, 1/2^{k+1}, 33/2^k)-approximate self-tester.

5 Acknowledgments

We wish to thank Gary Miller for pointing out the extension of our initial results to finite-dimensional function spaces. We would also like to thank Manuel Blum for suggesting the logarithm and floating point exponentiation functions as applications of approximate self-testing/correcting, and Mike Luby for several technical suggestions. We would like to thank Shafi Goldwasser, Sampath Kannan and Umesh Vazirani for several interesting and helpful discussions.

References

[1] Babai, L., "Trading Group Theory for Randomness", Proc. 17th ACM Symposium on Theory of Computing, 1985, pp. 421-429.

[2] Babai, L., "Local expansion of vertex-transitive graphs and random generation of finite groups", University of Chicago TR90-31, October 16, 1990.

[3] Babai, L., Fortnow, L., Lund, C., "Non-Deterministic Exponential Time has Two-Prover Interactive Protocols", Proceedings of the 31st Annual Symposium on Foundations of Computer Science, 1990.

[4] Beaver, D., Feigenbaum, J., "Hiding Instances in Multioracle Queries", STACS 1990.

[5] Blum, M., "Designing programs to check their work", Submitted to CACM.

[6] Blum, M., Kannan, S., "Program correctness checking ... and the design of programs that check their work", Proc. 21st ACM Symposium on Theory of Computing, 1989.

[7] Blum, M., Luby, M., Rubinfeld, R., "Program Result Checking Against Adaptive Programs and in Cryptographic Settings", DIMACS Workshop on Distributed Computing and Cryptography, 1989.

[8] Blum, M., Luby, M., Rubinfeld, R., "Self-Testing/Correcting with Applications to Numerical Problems", Proc. 22nd ACM Symposium on Theory of Computing, 1990.

[9] Carter, L., Wegman, M., "Universal Hash Functions", Journal of Comp. and Sys. Sci. 18 (1979), pp. 143-154.

[10] Cleve, R., Luby, M., "A Note on Self-Testing/Correcting Methods for Trigonometric Functions", International Computer Science Institute Technical Report TR-90-032, July 1990.

[11] Coppersmith, D., personal communication.

[12] Lipton, R., "New directions in testing", Proceedings of the DIMACS Workshop on Distributed Computing and Cryptography, 1989.

[13] Lund, C., "The Power of Interaction", University of Chicago Technical Report 91-01, January 14, 1991.

[14] Lund, C., Fortnow, L., Karloff, H., Nisan, N., "Algebraic Methods for Interactive Proof Systems", Proceedings of the 31st Annual Symposium on Foundations of Computer Science, 1990.

[15] Mansour, Y., Nisan, N., Tiwari, P., "The Computational Complexity of Universal Hashing", Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, 1990.

[16] Nisan, N., "Pseudorandom Generators for Space-Bounded Computation", Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, 1990.

[17] Peterson, W.W., Weldon, E.J., Error Correcting Codes, MIT Press, Cambridge, Mass.

[18] Shamir, A., "IP=PSPACE", Proceedings of the 31st Annual Symposium on Foundations of Computer Science, 1990.

[19] Szegedy, M., manuscript, January 1991.

[20] Van Der Waerden, B.L., Algebra, Vol. 1, Frederick Ungar Publishing Co., Inc., pp. 86-91, 1970.