
Randomness vs. Time: De-randomization under a uniform assumption

Russell Impagliazzo∗
Department of Computer Science
University of California San Diego
San Diego, CA 91097-0114
[email protected]

Avi Wigderson†
Institute of Computer Science
Hebrew University
Jerusalem, Israel 91904
[email protected]

∗ Research supported by NSF Award CCR-92-570979, Sloan Research Fellowship BR-3311, grant #93025 of the joint US-Czechoslovak Science and Technology Program, and USA-Israel BSF Grant 92-00043.
† This research was supported by grant number 69/96 of the Israel Science Foundation, founded by the Israel Academy of Sciences and Humanities.

Abstract

We prove that if BPP ≠ EXP, then every problem in BPP can be solved deterministically in subexponential time on almost every input (on every samplable ensemble for infinitely many input sizes). This is the first derandomization result for BPP based on uniform, non-cryptographic hardness assumptions. It implies the following gap in the average-instance complexities of problems in BPP: either these complexities are always sub-exponential or they contain arbitrarily large exponential functions.

We use a construction of a small "pseudo-random" set of strings from a "hard function" in EXP which is identical to that used in the analogous non-uniform results of [21, 3]. However, previous proofs of correctness assume the "hard function" is not in P/poly. They give a non-constructive argument that a circuit distinguishing the pseudo-random strings from truly random strings implies that a similarly-sized circuit exists computing the "hard function". Our main technical contribution is to show that, if the "hard function" has certain properties, then this argument can be made constructive. We then show that, assuming EXP ⊆ P/poly, there are EXP-complete functions with these properties.

1 Introduction, History, and Intuition

1.1 Motivation

The introduction of randomization into efficient computation has been one of the most fertile and useful ideas in computer science. In cryptography and asynchronous computing, randomization makes possible tasks that are impossible to perform deterministically. Even for function computation, many examples are known in which randomization allows considerable savings in resources like space and time over deterministic algorithms, or even "only" simplifies them.

But to what extent is this seeming power of randomness over determinism real? The most famous concrete version of this question regards the power of BPP, the class of problems solvable by probabilistic polynomial-time algorithms making small constant error. What is the relative power of such algorithms compared to deterministic ones? This is largely open. On the one hand, it is possible that P = BPP, i.e., randomness is useless for solving new problems in polynomial time. On the other, we might have BPP = EXP, which would say that randomness would be a nearly omnipotent tool for algorithm design. A priori, neither extreme seems likely: there are some problems where randomness seems exponentially helpful, but many hard problems are not susceptible to randomized solutions.

In this paper, we show that the intuition that randomness is a resource basically incomparable to time is wrong. Either there is a non-trivial deterministic simulation of BPP, or BPP = EXP! Either time can non-trivially substitute for randomness, or randomness can non-trivially substitute for time. In other words, either universal de-randomization is possible, or randomization is a panacea for intractability. (There are some technical provisos: the deterministic simulation only works for infinitely many input lengths, and may fail on a negligible fraction of inputs even of these lengths.) We consider the former much more plausible than the latter.

1.2 History: Hardness vs. Randomness

While counter to most people's first intuition, our result should be less surprising to those who are aware of the literature on de-randomization. The fundamental idea in de-randomization is to trade "hardness" for "randomness". This was first elucidated in the remarkable sequence of papers [22, 6, 24]. Roughly speaking, "computationally hard" functions can be used to construct "efficient pseudo-random generators". These in turn lower the randomness requirements of any efficient probabilistic algorithm, allowing for a "nontrivial" deterministic simulation.

In many such results, there is a quantitative trade-off between the hardness assumption and the time to perform the deterministic simulation. The stronger the assumption, the faster the simulation. Here, we are concentrating on the "low end" of the curve in this trade-off: what is the weakest assumption one can make and still have some version of universal derandomization? Our results also have some implications for the "higher end" of the curve, but these are much less clean, and we will not fully describe them in this abstract.

We will thus compare our results mainly to the "low end" version of the known results. In particular, we will use as our standard for "nontrivial" the class SUBEXP = ∩_{δ>0} DTIME(2^{n^δ}). The statement BPP ⊂ SUBEXP (read "randomness is weak"), while falling short of P = BPP, would be a great result to prove unconditionally, and it certainly implies BPP ≠ EXP. There have been a sequence of papers getting weaker and weaker hardness assumptions sufficient to prove such a result. These papers use one of three basic methods for converting hard functions into pseudo-random sequences: the "cryptographically secure" (BMY-type) pseudo-random generator based on one-way functions [6, 24, 16, 7, 8, 11]; the NW-generator based on a Boolean function with no circuit that approximates it [21, 3]; and the hitting set method [1, 2].

To state our results, we will need some notation for classes. Let Size(T(n)) be the class of functions computable by circuit families where the number of gates in the circuit with n inputs is at most T(n). For C a complexity class and t(n) a function, let C/t(n) be the class of functions computable in C with t(n) bits of "advice" depending only on the input size, i.e., f ∈ C/t(n) ⟺ there exist g ∈ C and a function h : Z → Z with |h(n)| ≤ t(n) and f(x) = g(x, h(|x|)). A result of [15] shows that P/poly = ∪_{c≥1} Size(n^c). For C a complexity class, let i.o.−C be the class of functions that agree with a function in C for all inputs of length n, for infinitely many n.

The first set of papers construct a pseudo-random generator from a one-way function. The pseudo-random generator quickly converts a small random string to a polynomially larger string that seems random in the following sense: any adversary that can distinguish an output of this generator from a truly random string of the same length can be used to invert the function. A BPP algorithm that had a markedly different behaviour on a pseudo-random input than on a random one would be such an adversary. So if no such invertor exists, the deterministic algorithm that enumerates the multi-set of outputs of the generator and simulates the BPP algorithm on each, taking the majority answer, would always be correct. Informally, this is stated as:

Theorem A 1 [6, 24, 8, 7, 11] If there are one-way functions that cannot be inverted with non-negligible probability in P/poly, then BPP ⊂ SUBEXP.

The NW-generator [20, 21, 3] considerably weakened the hardness assumption needed in the nonuniform setting. It achieves the same deterministic simulation of BPP from any function in EXP − P/poly.

Theorem A 2 [20, 21, 3] If EXP ⊄ P/poly, then BPP ⊂ i.o.−SUBEXP.

This was the best result known at the "low end" of the hardness vs. randomness curve.
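For ease of reference, the notation just introduced can be summarized in displayed form (this is only a restatement of the definitions above, not new material):

$$\mathrm{SUBEXP} \;=\; \bigcap_{\delta>0} \mathrm{DTIME}\!\left(2^{n^{\delta}}\right), \qquad P/\mathrm{poly} \;=\; \bigcup_{c\ge 1}\mathrm{Size}(n^{c}),$$
$$f \in C/t(n) \;\iff\; \exists\, g\in C,\ \exists\, h \text{ with } |h(n)|\le t(n):\ \forall x,\ f(x)=g(x,h(|x|)),$$
$$f \in \mathrm{i.o.}\text{-}C \;\iff\; \exists\, g\in C \text{ agreeing with } f \text{ on all inputs of length } n, \text{ for infinitely many } n.$$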

There has also been a sequence of papers [21, 1, 2, 14] at the "high end" of the curve, where the desired goal is to obtain P = BPP under the weakest possible assumption. The strongest result in this sequence is stated below:

Theorem A 3 [14] If EXP ⊄ i.o.−Size(2^{o(n)}), then BPP = P.

1.3 Average-Case Derandomization Under Uniform Assumptions

Both the high-end and low-end results above require non-uniform hardness assumptions, i.e., that the hard problems in question are hard for circuits. The reason for this, intuitively, is that if a function is hard uniformly but easy non-uniformly, there is some advice, or trap-door, that makes computing the function easy. Even if this trap-door is hard to find, there is no way to guarantee that a rare instance of the BPP problem does not code this trap-door information. Thus, it seems difficult to obtain a worst-case guarantee for the simulation.

However, we might still be able to get an "average-case" simulation under a uniform assumption. In fact, what such a result would say is that it is infeasible to find inputs for which such a simulation fails. We need to be somewhat careful in defining average-case complexity of a problem. We actually want to examine the difficulty of the problem for "most instances" rather than the average of the difficulties. We also should say what kinds of errors the algorithm is allowed to make on the exceptional instances. An algorithm allowed to output mistakes on a small number of inputs will be called a "heuristic" for the problem, whereas an algorithm that simply fails to give an answer in the allotted time will be called an algorithm with a certain average-case performance. Here, we'll use somewhat ad hoc definitions of these concepts that have certain technical advantages for our setting, especially when making statements about "infinitely many" sizes.

Definition 1 A probability ensemble µ = {µ_n | n ∈ Z⁺} is a sequence of probability distributions on the set of strings of length n. The ensemble µ is polynomially sampleable if there is a polynomial p and a polynomial-time computable function M so that if R ∈_U {0,1}^{p(n)}, then M(n, R) is distributed according to µ_n. As usual, we can extend this notion to allow M access to an oracle, in which case we say µ is polynomially sampleable given the oracle.

Let T and ε be functions of n. HeurTIME_{ε(n)}(T(n)) is the class of pairs (f, µ) of functions f : {0,1}* → {0,1} and probability ensembles µ so that there is an algorithm A(x) running in deterministic time T(|x|) such that for all n, for x ∈_{µ_n} {0,1}^n, Prob[A(x) ≠ f(x)] < ε(n).

AvgTIME_{ε(n)}(T(n)) is the class of pairs (f, µ) of functions f : {0,1}* → {0,1} and probability ensembles µ so that there is an algorithm A(x) running in deterministic time T(|x|) such that A(x) ∈ {f(x), ?} and, for all n, for x ∈_{µ_n} {0,1}^n, Prob[A(x) ≠ f(x)] < ε(n).

If membership in one of the above classes holds for f together with any polynomially sampleable ensemble, then we omit mention of the ensemble and simply say that f is in the class. The same abuse of notation will be used for complexity classes being subsets of the above classes.

Under cryptographic assumptions, the standard techniques give "average-case" derandomization. From uniformly one-way functions, we can generate pseudorandom sequences that are hard for probabilistic algorithms, rather than circuits, to distinguish. An informal statement of the resulting derandomization is:

Theorem A 4 [6, 24, 8, 7, 11] If there are uniformly one-way functions, then for every c > 0, BPP ⊂ Heur_{1/n^c}SUBEXP and ZPP ⊂ Avg_{1/n^c}SUBEXP.

1.4 Our Results

The main result of this paper is a version of this theorem based on the much weaker assumption BPP ≠ EXP.

Theorem 5 If BPP ≠ EXP then for every c > 0, BPP ⊂ i.o.−Heur_{1/n^c}SUBEXP and ZPP ⊂ i.o.−Avg_{1/n^c}SUBEXP.

We also give a sharp converse:

Theorem 6 There are functions in EXP ∩ P/poly that are not in i.o.−HeurTIME_{2/3}(2^{o(n)})/o(n).
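For orientation (our paraphrase, not text from the paper): combining the definition of SUBEXP with the heuristic classes above, the conclusion of Theorem 5 in particular implies the concrete form that also appears in Corollary 9 below,

$$\mathrm{BPP} \subset \mathrm{i.o.}\text{-}\mathrm{Heur}_{1/n^{c}}\mathrm{SUBEXP} \;\Longrightarrow\; \mathrm{BPP} \subseteq \mathrm{i.o.}\text{-}\mathrm{HeurTIME}_{1/n^{c}}\!\left(2^{n^{\delta}}\right)\ \text{for every } \delta>0.$$

That is: for every polynomial error bound 1/n^c and every δ > 0, there is a deterministic 2^{n^δ}-time heuristic which, for infinitely many input lengths and any polynomially sampleable ensemble, errs with probability less than 1/n^c over the ensemble.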

Corollary 7 If BPP ⊆ i.o.−HeurTIME_{2/3}(2^{o(n)})/o(n), then BPP ≠ EXP, and so BPP ⊂ i.o.−Heur_{1/n^c}SUBEXP.

Corollary 8 If BPP = EXP ∩ P/poly then BPP = EXP.

Summarizing, we get:

Corollary 9 Exactly one of the following holds:
(1) BPP = EXP
(2) BPP ⊆ i.o.−HeurTIME_{1/n^c}(2^{n^δ}) for every δ > 0 and c > 0.

This result is naturally interpreted as a gap theorem on derandomization: either no derandomization of BPP is possible at all, or otherwise a highly nontrivial derandomization is possible. A more precise statement of the gap is:

Corollary 10 If BPP ⊆ i.o.−HeurTIME_{1/3}(2^{o(n)})/o(n) then BPP ⊆ i.o.−HeurTIME_{1/n^c}(2^{n^δ}) for every δ > 0 and c > 0.

1.5 Why wasn't this paper written in 1988?

The rest of this section describes intuitively the obstacles to obtaining this result long ago, and the key ideas we use to overcome them. Attempts to find a uniform version of the above result (namely, replacing P/poly by BPP in the hardness assumption) followed immediately after its publication in 1988, both by the authors and many others. However, the following presented a psychological barrier:

The proofs of the before-mentioned theorems have the following structure. A (presumably hard) function f is used to construct a (hopefully pseudorandom) generator G. Equivalently, one starts with a hypothetical distinguisher for G, and constructs from it an efficient algorithm for f, obtaining a contradiction. In the nonuniform versions the distinguisher and the algorithm are circuits, and only an existence proof of the latter from the former is required. In the uniform case, both are probabilistic Turing machines.

The construction of an algorithm for f from a distinguisher of G follows a sequence of steps. Various steps in this construction seem to inherently need values of f at many (often random) points. While this is easy if f was the inverse of a one-way function, or nonuniformly (hard-wire these values), such values seem impossible to obtain uniformly for an arbitrary function in EXP.

Here we take a careful look at the steps mentioned above. It was already known that for some of them (Random Self Reducibility [19, 4, 5], the Hard Core Bit theorem [8]) the circuit construction is already uniform. More importantly, in the other steps (the XOR Lemma [24], the generator conversion [20, 21]), function values of f are the only nonuniform construct needed. Thus, the first key idea is allowing our PPT (Probabilistic Polynomial Time) algorithm to have an oracle for f.

At first sight it seems ridiculous to give an algorithm trying to compute f access to an oracle for f. However, we don't merely want to compute f on a single input, but to construct a circuit computing f on all inputs. Thus, one could state the issue of whether such a construction exists as whether f is learnable from examples in the sense of computational learning theory. We show that a distinguisher for the pseudo-random generator can be used to learn how to compute the hard function from examples.

So we get the following (informal) partial result: if an NW-generator based on any f ∈ EXP is not pseudo-random, then a circuit for f can be constructed in PPT^f.

We still need to convert this into a construction of such a circuit with no oracle calls. This is not trivial. However, a crucial observation is that in the construction above the use of the oracle is limited in the following way: it is never called on larger input lengths than those of the circuit it constructs. How can this help to eliminate the oracle? The next key idea is assuming (for no good reason so far) that f happens to be downward self-reducible (like SAT or PERMANENT). In such cases observe that the oracle for f is redundant: to construct a circuit for f_n, simply use the above PPT algorithm, and whenever it calls the oracle, use the downward self-reduction and the inductively constructed circuits of smaller sizes.

It remains to justify the assumption that f is downward self-reducible. This seems a strange assumption, since downward self-reducible problems are always in PSPACE and our f is supposed to be complete for EXP. However, we have an advantage: we know that the only way the NW generator fails is if EXP ⊆ P/poly, and then it follows from [15] and [23] that EXP = Σ_2 = P^{#P}.¹

So if the NW generator fails with a standard EXP-complete problem, we try again with a downward self-reducible #P-complete problem (which we then know is also EXP-complete). If the simulation still fails, we can use it to solve the #P-complete problem, and hence any problem in EXP. We will pick f to be our favorite random-self-reducible and downward self-reducible function complete for #P, a variation of the permanent function.

¹ The result from [15] does not relativize ([13]), which suggests that similar methods might lead to non-relativizing separations. However, we are unsure whether our main result relativizes.

2 Proof of Theorem 5

2.1 Overview

We want to show that derandomization is possible given BPP ≠ EXP. As mentioned above, since [21] show that BPP ⊆ i.o.−SUBEXP assuming EXP ⊄ P/poly, we can assume EXP ⊆ P/poly.

Furthermore, [15] observed that if this is true, then EXP = Σ_2. The proof of this can be sketched as follows. Let f be the function that, given a Turing Machine that runs in time 2^n, an input of length n and the name of a cell in the tableau for the machine, outputs the contents of that cell. f is EXP-complete. Given a circuit C, the question "Does C compute f on all inputs of length n?" is in Co-NP, since if not, one can exhibit an input on which the circuit fails, and a previous inconsistent finite block in the tableau. Thus, if small circuits for f exist, one can non-deterministically guess one, and then co-non-deterministically verify it. So if f ∈ P/poly, then f ∈ Σ_2.²

By Toda's Theorem [23], then EXP = P^{#P} and so the permanent function is complete for EXP ([23]).

Thus, we can assume that computing the permanent is not possible in BPP. The permanent has two nice properties that we will use. First, it is random self-reducible ([19]), so its average-case and worst-case difficulty for BPP are equivalent. Secondly, one can solve the permanent in polynomial time using an oracle for the permanent of smaller matrices. We call this property downward self-reducibility. We will assume we have a function f ∉ BPP with these properties.

² [3] get the stronger result: if EXP ⊆ P/poly then EXP = MA.

Let f_n be f restricted to inputs of size n. For each input size n, we will construct a pseudo-random generator G_n from n^c bits, for some fixed constant c, to n^d bits for an arbitrary d > c, that will be computable in polynomial time with an oracle for f_n. Given a circuit that distinguishes the output of this generator from truly random strings, we will be able to construct a circuit computing f on n-bit strings, in polynomial time with an oracle for f_n.

The simulation of a BPP algorithm is as follows. Let δ > 0 be given. On inputs of size k, assume the BPP algorithm uses k^{c_1} random bits and time. Set n = k^{δ/2c}. Using d = 2cc_1/δ, we construct the range of G_n, a set of n^d = k^{c_1}-bit strings, in time 2^{O(n^c)} = O(2^{k^δ}). We then simulate the BPP algorithm on each element of the range and take the majority vote.

Assume the above algorithm is incorrect with probability 1/k^d with respect to some sampleable distribution µ_k on k-bit strings, for all but finitely many k. Then given n, we can set k = n^{2c/δ} and, through random sampling from µ_k, find instances x_1, ..., x_{k^{O(1)}} in probabilistic polynomial time so that with high probability the algorithm fails on at least one x_i. Translating the behaviour of the BPP algorithm on x_i into a circuit, we get a polynomial-time probabilistic construction which produces a collection D_1, ..., D_r so that at least one D_i distinguishes outputs of G_n from truly random strings.

Working through the constructions from [21, 3], we show that from such a distinguisher we can construct a circuit for f_n in probabilistic polynomial time using an oracle for f_n. We then get out of this Catch-22 by using a bootstrapping argument as follows. Assume we have a circuit C_{n−1} for f_{n−1}. Since f is downward self-reducible, we can simulate an oracle for f_n using C_{n−1}. We construct a set D_1, ..., D_r as above that contains a distinguisher for G_n. We can find this distinguisher, since we can use our oracle for f_n to sample from the range of G_n and hence estimate the distinguishing probability. We then use it to construct a circuit C_n computing f_n.
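The basic deterministic simulation described at the start of this overview can be sketched as follows. This is a minimal illustration, not the paper's code: bpp_machine(x, r) stands for the BPP algorithm run with random tape r, and generator_range(n) for an enumeration of the range of G_n (computable with an f_n-oracle, as claimed above); both names are hypothetical.

    def derandomized_decision(x, bpp_machine, generator_range, c, delta):
        """Majority vote of the BPP machine over the pseudo-random strings (Section 2.1).

        With n = k^(delta/2c) and d = 2*c*c1/delta, each pseudo-random string has
        n^d = k^(c1) bits (enough for the machine's random tape), and enumerating
        the 2^(n^c) = 2^(k^(delta/2)) seeds costs time O(2^(k^delta)).
        """
        k = len(x)
        n = round(k ** (delta / (2.0 * c)))          # seed-length parameter
        answers = [bpp_machine(x, r) for r in generator_range(n)]
        return 1 if 2 * sum(answers) > len(answers) else 0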

The key observation is that the size of C_n does not depend on the size of C_{n−1}, since C_{n−1} was only used as an oracle. So unlike the situation if we used the downward reduction directly in such a construction, we do not get an exponential blow-up in size.

2.2 Reductions between construction problems

Most of the algorithms we'll use will be probabilistic polynomial time algorithms whose inputs and outputs will be encodings of Boolean circuits. We'll be interested in statements of the form, "If one can construct a circuit with property X, then from it one can construct a circuit with property Y". Because formalizing such a statement requires some quantification, it is helpful to have some general notation for such reductions. For our paper, one can think of each construction problem as specifying a type of circuit, and one can think of n as the number of inputs to these circuits.

Definition 2 A construction problem A = {A_n} is a family of non-empty subsets A_n ⊆ {0,1}*. (Note that no upper bound is put on the sizes of members of A_n in terms of n.)

In the below definitions, the probabilities are taken over the uniform distribution on n-bit strings.

Definition 3 (Important Construction Problems)

Circuits approximating f: Let f : {0,1}* → {0,1}* and ε : N → [0,1]. Define the construction problem C^{f,ε} by: C^{f,ε}_n contains all circuits C with n inputs satisfying Pr[C(x) = f(x)] ≥ ε(n).

Circuits computing f: C^f = C^{f,1}.

Distinguishers: Let m : N → N, G = {G_n : {0,1}^{m(n)} → {0,1}^n}, and ε as before. Define the construction problem D^{G,ε} by: D^{G,ε}_n contains all circuits D with n inputs satisfying Pr[D(G(y)) = 1] − Pr[D(x) = 1] ≥ ε(n).

Definition 4 Let A and B be construction problems.

A strong construction for A is a probabilistic function f(n, α) so that for all n ≥ 1 and α > 0, Prob(f(n, α) ∈ A_n) ≥ 1 − α, where the probability is over random choices made by f. A weak construction for A is a probabilistic function f(n) and a constant c > 0 with Prob(f(n) ∈ A_n) ≥ n^{−c} for all n ≥ 1. A is weak/strong probabilistic polynomial time constructible if there is a weak/strong construction f for A which runs in time polynomial in n, or in n and 1/α, respectively.

A construction of B from A is a probabilistic function f(x, α) so that for every n, α > 0 and a ∈ A_n, Prob[f(a, α) ∈ B_n] ≥ 1 − α. B is probabilistic polynomial time constructible from A, written A → B, if there is a construction f of B from A which runs in time polynomial in n and 1/α.

The following definition of random self-reducibility is slightly non-standard, but is clearly implied by all the usual definitions:

Definition 5 A function f : {0,1}* → {0,1}* is randomly self-reducible (RSR) if C^{f,1−n^{−c}} → C^f for some c ≥ 0.

As usual, we can extend this definition by allowing the function f access to an oracle O, which we will write A →^O B. If, in addition, the queries made by the construction are all of binary length n, we write A →^{O_n} B. We also adopt the same length restriction for the familiar efficient Turing reduction among functions.

Definition 6 Let f, g : {0,1}* → {0,1}*, and let ℓ : Z⁺ → Z⁺. We say that f is polynomial Turing reducible to g restricted to length ℓ(n), and write f_n ≤_pT g_{ℓ(n)}, if there is a deterministic³ polynomial time oracle machine M^g that on every input x outputs f(x) and queries the oracle g only on inputs of length at most ℓ(|x|). f is downward self-reducible if f_n ≤_pT f_{n−1}.

³ A probabilistic version can be given and used as well, but we shall not need it.

Let ModPerm be the following decision problem:

Instance: An integer k in unary, a prime p > 2^k in unary, a k × k matrix M of integers modulo p, and an integer t modulo p.

Problem: Is Perm(M) mod p = t?, where Perm is the usual permanent function.
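As a concrete illustration of Definition 6 for the problem just defined (the following display is a standard identity supplied by us, not text from the paper): the permanent admits the same cofactor expansion as the determinant, only without signs. Writing M^{(1,j)} for the (k−1) × (k−1) matrix obtained from M by deleting row 1 and column j,

$$\mathrm{Perm}(M) \;\equiv\; \sum_{j=1}^{k} M_{1,j}\cdot \mathrm{Perm}\!\left(M^{(1,j)}\right) \pmod{p}.$$

Since p and k are given in unary, the value Perm(M) mod p can therefore be recovered in time polynomial in the instance length from ModPerm queries about (k−1) × (k−1) matrices; this is the "permanent via minors" downward self-reduction referred to in the next paragraph.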

Note that it is easy to generate random valid instances of ModPerm of a given length, and to place them in one-to-one correspondence with integers up to the number of valid instances. So, without loss of generality, by the "uniform" distribution on length-n strings for ModPerm we mean the uniform distribution on the valid instances of length n. ModPerm is downward self-reducible by the usual method of computing the permanent via minors. By using the Chinese Remainder Theorem, it is easy to see that computing the permanent for an arbitrary matrix of integers reduces to ModPerm, so it is complete for #P. The self-reducibility for the permanent modulo a prime > 2^k due to Lipton [19] (using the method of [4]) shows that ModPerm is random self-reducible according to the above definition. In the sequel, the reader can think of f as ModPerm, although we prefer to state things in more general terms.

Lemma 11 There is a #P-complete decision problem that is downward self-reducible and random self-reducible.

Now we can sketch the outline of the proof in terms of the definitions above. Let f be a random self-reducible and downward self-reducible function, like ModPerm. We will define in the next subsection the NW-generator G^f. Assuming that G^f is not pseudo-random, we will conclude that f ∈ BPP. The sketch is as follows:

Lemma 12 G^f_n ≤_pT f_n.

This will be immediate from the construction of G, given in the next sub-section.

Lemma 13 If BPP ⊄ i.o.−HeurTIME_{1/n^c}(2^{n^δ}) for some c, δ > 0, then D^{G^f,1/4} is weakly probabilistic polynomial-time constructible.

The proof was sketched before. Assume the simulation based on computing the range of G^f and taking the majority answer of the BPP algorithm on this range fails with probability 1/n^c for all but finitely many n. Then we can sample a random instance and use the corresponding circuit as our distinguisher.

Lemma 14 If D^{G^f,1/4} is weakly probabilistic polynomial-time constructible, then D^{G^f,1/5} is strongly probabilistic polynomial-time constructible using oracle f_n.

This follows from the previous two lemmas. We can repeatedly sample candidate circuits using the weak construction. For each circuit we construct, we can estimate its distinguishing probability by sampling from the range of G^f using the oracle for f_n, since G^f ≤_pT f_n.

Lemma 15 If f is random-self-reducible, then D^{G^f,1/5} →^{f_n} C^f.

This is the main technical lemma, and will be proved in sub-section 2.4. However, it really just examines the proofs of the known results.

Lemma 16 If f is downward self-reducible, and C^f is strongly polynomial-time constructible using oracle f_n, then f ∈ BPP.

Proof. We recursively compute circuits C_1 ∈ C^f_1, ..., C_n ∈ C^f_n. We then output C_n evaluated at our input. Say that we have computed C_i. We run the construction for C_{i+1} with oracle f_{i+1} (with α = 1/n²), simulating queries to f_{i+1} by M^{C_i}, where M is the poly-time oracle Turing Machine from the definition of downward self-reducibility. Note that |C_{i+1}| ≤ (time taken by the construction not counting oracle queries) × (time taken to simulate queries not counting the time to evaluate oracle calls by M). This is a fixed polynomial in n, independent of the size of C_i. Since each |C_i| is bounded by this fixed polynomial in n, the time for each stage (including time to evaluate oracle calls by M) is a fixed polynomial in n. Also, the probability that C_n ∉ C^f_n is at most α·n = 1/n, so the error is bounded.

Combining the above, we get:

Lemma 17 If BPP ⊄ i.o.−HeurTIME_{1/n^c}(2^{n^δ}) then f ∈ BPP for every downward and random self-reducible function f. In particular, ModPerm ∈ BPP, so BPP = P^{#P}.

As described earlier, this suffices to prove Theorem 5 for the case of BPP. (For ZPP we just need to note that the simulation is error-free, but may not find a halting computation. The rest is identical.)
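To make the bootstrapping in the proof of Lemma 16 concrete, here is a minimal Python-style sketch under stated assumptions: construct_circuit(i, oracle, alpha) stands for the strong construction of a circuit for f_i given an f_i-oracle (Lemmas 14 and 15), M(x, oracle) stands for the downward self-reduction machine of Definition 6, and base_circuit is a correct circuit for f_1 (assumed given; f_1 has a constant-size domain). All names are ours, not the paper's. The point illustrated is that each level uses the previous circuit only as an oracle, so circuit size does not compound.

    def bootstrap(n, construct_circuit, M, evaluate, base_circuit):
        """Recursively build circuits C_1, ..., C_n for f_1, ..., f_n (Lemma 16 sketch)."""
        C = base_circuit
        for i in range(1, n):
            prev = C
            # Queries to f_{i+1} are answered by the downward self-reduction M,
            # which itself only queries f_i -- answered by the previously built circuit.
            def f_next_oracle(x, prev=prev):
                return M(x, lambda y: evaluate(prev, y))
            # Error parameter 1/n^2 per level keeps the total failure probability <= 1/n.
            C = construct_circuit(i + 1, f_next_oracle, alpha=1.0 / n**2)
        return C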

2.3 Construction of G^f

We view the construction of G^f as a sequence of three steps, in order to make the proof more modular.

The sequence we use below is not exactly the one used in [21, 3], but the original one would work as well.

Let d > 0 be an integer. Let c > 0 be the constant so that C^{f,1−n^{−c}} → C^f.

Direct product function: Let n_1 = n^{c+2}. View an n_1-bit string as n^{c+1} n-bit strings. Define g : {0,1}^{n_1} → {0,1}^{n^{c+1}} by g(x_1, ..., x_{n^{c+1}}) = f(x_1), ..., f(x_{n^{c+1}}).

Hard-core bit: Let n_2 = n_1 + n^{c+1}. View an n_2-bit string as an input x to g and a string r of length |g(x)|. Then h(x, r) = <g(x), r>, where <y, z> represents inner product modulo 2.

Almost disjoint sets generator: Let m = n_2². Let z ∈ {0,1}^m and let S = {s_1 < s_2 < ... < s_{n_2}} be a subset of bit positions between 1 and m. Define z|_S to be the n_2-bit string z_{s_1} z_{s_2} ... z_{s_{n_2}}. In [21], an explicit construction of ℓ such sets S_1, ..., S_ℓ is given so that |S_i ∩ S_j| ≤ log ℓ for every i ≠ j. We define G : {0,1}^m → {0,1}^ℓ by G(z) = h(z|_{S_1}), h(z|_{S_2}), ..., h(z|_{S_ℓ}).

From the construction, it is clear that G_m ≤_pT h_{n_2} ≤_pT g_{n_1} ≤_pT f_n, which proves Lemma 12.

2.4 Proof of Lemma 15

We work through the construction in reverse order, showing how to construct, from a distinguisher for G_m, a circuit for f_n. There will be four stages in this construction, the first three corresponding to the three levels of the definition of G and the last stage to the random self-reduction of f. All stages are identical to those from the non-uniform proofs, but we need to verify that the use of non-uniformity can be replaced by an oracle for f_n. We'll just review the constructions from other papers, to see that they are polytime computable from such an oracle, and refer to the relevant papers for proofs of correctness.

The four stages we need are given by the following lemmas:

Lemma 18 D^{G,.2} →^{h_{n_2}} C^{h,1/2+O(1/ℓ)}

Lemma 19 C^{h,1/2+O(ℓ^{−1})} → C^{g,O(ℓ^{−3})}

Lemma 20 C^{g,O(ℓ^{−3})} →^{f_n} C^{f,1−n^{−c}}

Finally, by the definition of random self-reducibility, C^{f,1−n^{−c}} → C^f.

Proof. Lemma 18: The construction is from [21]. Let D ∈ D^{G,.2}_m. We construct a circuit to predict h as follows: Pick i ∈_U {1, ..., ℓ}. For each bit position j ∉ S_i (1 ≤ j ≤ m), pick z_j ∈_U {0,1}. For each i' < i, query h at all 2^{|S_i ∩ S_{i'}|} ≤ ℓ strings that might be z|_{S_{i'}} for a z consistent with the z_j's. Store the answered queries in a table T. Pick b_{i'} ∈_U {0,1} for i ≤ i' ≤ ℓ.

Let C be the following circuit: on input x, set z|_{S_i} = x, while the other bits of z are fixed to the randomly chosen bits. Set b_{i'} = h(z|_{S_{i'}}) for i' < i, by looking up the appropriate entry in T. If D(b_1, ..., b_ℓ) = 1 output b_i; else output ¬b_i.

By random sampling using the oracle for h_{n_2}, estimate the probability that C(x) = h(x); if greater than 1/2 + .05/ℓ, output C, else repeat. [21] show that the expected probability of success for C is at least 1/2 + .1/ℓ, so the number of repetitions before outputting a C that has good advantage is at most O(nℓ) with very high probability.

Lemma 19 follows directly from [8].

Proof. Lemma 20 is the uniform version of a direct product lemma which has many proofs [16, 9, 14]. We present here the construction from [14] as being simple to describe.

Let C ∈ C^{g,δ}_{n_1}. Construct C' as follows: Let n_3 = n_1/n = n^{c+1}. Repeat for r = 1 to n_3/δ: Pick i ∈_U {1, ..., n_3}. For each j ≠ i, pick x_j ∈_U {0,1}^n, query f(x_j) and record the answer. Flip coins until a head arises or until n tails have been flipped; let t_1 be the number of flips.

Let C'_r be the following three-valued circuit: On input x, compute t, the number of positions j ≠ i where the j'th bit of C(x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_{n_3}) disagrees with f(x_j) (as recorded). If t < t_1, output the i'th bit of C(x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_{n_3}); otherwise output "reject".

Let C' be the circuit that outputs the majority answer from those C'_r that do not reject. [14] prove that, for non-negligible δ, C' ∈ C^{f,1−n^{−c}} with high probability. If δ is at least inverse polynomial, the construction takes polynomial time.
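To fix ideas, here is a minimal Python sketch of the three-step construction of Section 2.3, under stated assumptions: f maps an n-bit string (here a list of 0/1 values) to a single bit (think of ModPerm), and designs is the explicit family S_1, ..., S_ℓ from [21], passed in as a list of sorted index lists, each of size n_2, with pairwise intersections of size at most log ℓ. These names are ours, not the paper's. Note that computing G(z) only requires evaluating f on n-bit inputs, which is the content of Lemma 12 (G ≤_pT f_n).

    def nw_generator(z, f, n, c, designs):
        """Sketch of G^f(z): almost-disjoint-sets generator over the hard-core bit of
        the direct product of f (Section 2.3).  Lengths: n1 = n^(c+2),
        n2 = n1 + n^(c+1), m = n2^2, and z is an m-bit string (a list of 0/1)."""
        n1 = n ** (c + 2)
        n2 = n1 + n ** (c + 1)
        assert len(z) == n2 * n2

        def g(bits):                  # direct product: n^(c+1) parallel copies of f
            blocks = [bits[t * n:(t + 1) * n] for t in range(n ** (c + 1))]
            return [f(b) for b in blocks]

        def h(bits):                  # hard-core bit <g(x), r> mod 2
            x, r = bits[:n1], bits[n1:]
            return sum(a & b for a, b in zip(g(x), r)) % 2

        # Each S in designs is a size-n2 subset of {0, ..., m-1}; restrict z to S.
        return [h([z[s] for s in S]) for S in designs]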

2.5 Proof of Theorem 6

Proof. We construct a problem in E ∩ P/poly that cannot be approximated in TIME(2^{o(n)})/o(n) as follows. The idea is that some function from a universal family of hash functions will serve our purpose.

For any input of size n, we first simulate all machines with descriptions of size .1n on all advice strings of size .1n for 2^n steps on all inputs. If they don't halt, record the answer as (say) 0. This gives us at most 2^{.2n} strings (truth tables) of N = 2^n bits each. Let H be a family of 2^{O(n)} pairwise independent hash functions from n bits to 1 bit. (For example, H = {0,1}^n, h_r(x) = <r, x>.) Consider them too as N-bit strings. Using Chebyshev bounds, one can see that a random h_r ∈ H agrees with a given string in 2/3 of the N positions only with probability 1/O(N). Then, since there are less than O(N) strings in our collection, we can find a hash function that does not agree with any of them in 2/3 of the bits. We pick this h_r and output h_r(x).

3 Conclusions and Open Problems

Ideally, we would hope that our results are a step towards proving BPP ≠ EXP. However, our results provide reasons both to be optimistic and pessimistic about such a proof. On the one hand, our result makes such a result stronger, since it would show a positive simulation as well as a negative result. On the other, it clarifies that the best way of attacking this problem is to continue along the lines of the de-randomization papers. It also shows that non-relativizing techniques can be useful in this area, so we need not be depressed by oracles where BPP = EXP. It also indicates that we do not need to prove circuit lower bounds to get such a result, so we should also be undaunted by the negative results on Natural Proofs.

There are some more technical points our work raises. First, is an average-case derandomization all one can hope for under a uniform assumption? If BPP ≠ EXP but EXP ⊆ P/poly, we have a paradoxical situation: the simulation of randomness by determinism does not always work, but it is intractable to find instances where it fails. Can we somehow utilize this intractability in yet another layer of hardness vs. randomness trade-offs?

As can be seen, our main result is achieved essentially with no new technical work; all of it is taken from previous papers. On the other hand, these papers are viewed from a somewhat different perspective in trying to make them uniform, which is subtle in some ways and raises some new questions regarding these issues.

The first one is that the classical "learning from a membership oracle" problem of computational learning theory arises naturally here. Let LEARN be the class of functions for which this can be done efficiently, namely all f for which C^f_n is constructible in PPT^{f_n}. This class is quite interesting, and we trivially have:

Fact 21 BPP ⊆ LEARN ⊆ P/poly

We also showed that any downward self-reducible problem in LEARN is also in BPP. What more can be said about the class LEARN?

Finally, we should mention that gaps similar to the one obtained here are possible at higher levels of the polynomial hierarchy, such as for whether MA ≠ NEXP.

References

[1] A. Andreev, A. Clementi and J. Rolim, "Hitting Sets Derandomize BPP", in XXIII International Colloquium on Automata, Languages and Programming (ICALP'96), 1996.

[2] A. Andreev, A. Clementi and J. Rolim, "Hitting Properties of Hard Boolean Operators and its Consequences on BPP", manuscript, 1996.

[3] L. Babai, L. Fortnow, N. Nisan and A. Wigderson, "BPP has Subexponential Time Simulations unless EXPTIME has Publishable Proofs", Computational Complexity, Vol. 3, pp. 307–318, 1993.

[4] D. Beaver and J. Feigenbaum, "Hiding Instances in Multioracle Queries", Proc. 7th Symposium on Theoretical Aspects of Computer Science, LNCS 415, pp. 37–48, 1990.

[5] L. Babai, L. Fortnow and C. Lund, "Non-deterministic exponential time has two-prover interactive protocols", in 31st FOCS, pp. 16–25, 1990.

[6] M. Blum and S. Micali, "How to Generate Cryptographically Strong Sequences of Pseudo-Random Bits", SIAM J. Comput., Vol. 13, pp. 850–864, 1984.

[7] O. Goldreich, H. Krawczyk and M. Luby, "On the existence of pseudorandom generators", in 29th FOCS, pp. 12–24, 1988.

[8] O. Goldreich and L. A. Levin, "A Hard-Core Predicate for all One-Way Functions", in ACM Symp. on Theory of Computing, pp. 25–32, 1989.

[9] O. Goldreich, N. Nisan and A. Wigderson, "On Yao's XOR-Lemma", available via www at ECCC TR95-050, 1995.

[10] S. Goldwasser and S. Micali, "Probabilistic Encryption", JCSS, Vol. 28, pp. 270–299, 1984.

[11] J. Hastad, R. Impagliazzo, L. A. Levin and M. Luby, "Construction of a Pseudorandom Generator from any One-Way Function", to appear in SICOMP. (See preliminary versions by Impagliazzo et al. in 21st STOC and Hastad in 22nd STOC.)

[12] R. Impagliazzo, "Hard-core Distributions for Somewhat Hard Problems", in 36th FOCS, pp. 538–545, 1995.

[13] R. Impagliazzo, in preparation.

[14] R. Impagliazzo and A. Wigderson, "P = BPP unless E has sub-exponential circuits: Derandomizing the XOR Lemma", Proc. of the 29th STOC, pp. 220–229, 1997.

[15] R. M. Karp and R. J. Lipton, "Turing Machines that Take Advice", L'Enseignement Mathématique, 28, pp. 191–209, 1982.

[16] L. A. Levin, "One-Way Functions and Pseudorandom Generators", Combinatorica, Vol. 7, No. 4, pp. 357–363, 1987.

[17] M. Luby, Pseudorandomness and Cryptographic Applications, Princeton Computer Science Notes, Princeton University Press, 1996.

[18] L. A. Levin, "Average Case Complete Problems", SIAM J. Comput., 15:285–286, 1986; also STOC 1984.

[19] R. Lipton, "New directions in testing", in J. Feigenbaum and M. Merritt, editors, Distributed Computing and Cryptography, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Volume 2, pp. 191–202, American Mathematical Society, 1991.

[20] N. Nisan, "Pseudo-random bits for constant depth circuits", Combinatorica 11(1), pp. 63–70, 1991.

[21] N. Nisan and A. Wigderson, "Hardness vs Randomness", J. Comput. System Sci. 49, pp. 149–167, 1994.

[22] A. Shamir, "On the generation of cryptographically strong pseudo-random sequences", 8th ICALP, Lecture Notes in Computer Science 62, Springer-Verlag, pp. 544–550, 1981.

[23] S. Toda, "On the computational power of PP and ⊕P", in 30th FOCS, pp. 514–519, 1989.

[24] A. C. Yao, "Theory and Application of Trapdoor Functions", in 23rd FOCS, pp. 80–91, 1982.