
Electronic Colloquium on Computational Complexity, Report No. 88 (2013)

AC0 Pseudorandomness of Natural Operations

Zachary Remscrim∗ and Michael Sipser† MIT Cambridge, MA 02139

June 15, 2013

Abstract

A function f : Σ* → Σ* on strings is AC0-pseudorandom if the pair (x, f̂(x)) is AC0-indistinguishable from a uniformly random pair (y, z) when x is chosen uniformly at random. Here f̂(x) is the string that is obtained from f(x) by discarding some selected bits from f(x). It is shown that several naturally occurring functions are AC0-pseudorandom, including convolution, nearly all homomorphisms, Boolean matrix multiplication, integer multiplication, finite field multiplication and division, several problems involving computing rank and determinant, and a variant of the algebraic integer problem.

1 Introduction

1.1 The Problem

Random-like behavior occurs naturally in many places in mathematics. For example, the binary representations of the numbers π, e and √2 look random. Various conjectures about the distribution of prime numbers and the number of prime factors of an integer say that these behave randomly. However, very little progress has been made in proving that such behaviors are indeed pseudorandom in any formal sense. For example, it is not known that the binary representations of π, e or √2 contain all substrings with the expected frequencies, or even that the substring 11 appears infinitely often.

In this paper, we propose to study the pseudorandom characteristics of naturally occurring mathematical functions by using the tools of complexity theory. The theory of pseudorandom generators provides a good starting point, but there the motivation is somewhat different from ours. Pseudorandom generators are used to good effect in cryptographic protocols and in derandomizing probabilistic algorithms, and they are designed with those goals in mind. Our objective is to study the basic operations themselves, such as Boolean convolution and integer multiplication, for their pseudorandom properties. These functions occur naturally (they have not been specifically designed to have pseudorandom behavior), yet we can show that they do exhibit such behavior.

We use the integer multiplication function as a motivating example. Let X and Y be n-bit binary strings representing non-negative integers and let Z be the 2n-bit string representing Z = X × Y. Take X and Y to be selected uniformly at random from 0 to 2^n − 1, and consider the characteristics of Z. Does Z look random? The low-order bit of Z certainly does not; it is 0 with probability 3/4. The other very low order bits look non-random for a similar reason. The

∗[email protected]. Research supported in part by an Akamai Fellowship. †[email protected]

very high order bits of Z likewise appear non-random. However, if we discard these problematic very low and very high order bits from Z, the result could conceivably be pseudorandom in some appropriate sense. We show that, for uniformly randomly selected X, Y, the string consisting of X, Y and all 2n bits of X × Y, except the lowest and highest n^α bits, for any α > 0, is indistinguishable from truly random strings by AC0 circuits. In fact, we show something even stronger: for almost all Y, the string consisting of X and all 2n bits of X × Y, except the lowest and highest n^α bits, is indistinguishable from random by AC0 circuits that have Y built in (the circuit is allowed to depend on Y).

AC0 circuits are circuits consisting of AND, OR, and NOT gates of unbounded fan-in, such that the size of the circuit (the total number of gates) is polynomial in the size of the input and the depth of the circuit (the number of gates on the longest path from the input to the output) is a constant. Techniques for proving strong lower bounds on low-depth circuits [Ajt83], [FSS84], [Yao85], [Has86] enable us to prove the AC0-pseudorandomness of explicit functions without using any unproven complexity-theoretic assumptions. Moreover, AC0 is powerful enough to describe basic tests for pseudorandomness.

We now formally define what it means for a function to look random to AC0 circuits. For ease of exposition, we consider functions that operate on strings of a specific length, whereas we really have in mind a family of functions and their asymptotic properties. For a function f : {0,1}^{m_1} × · · · × {0,1}^{m_h} → {0,1}^k, define the function g : {0,1}^{m_1} × · · · × {0,1}^{m_h} → {0,1}^n, where n = m_1 + · · · + m_h + k, such that g(x_1, . . . , x_h) = x_1 ◦ · · · ◦ x_h ◦ f(x_1, . . . , x_h) is the concatenation of x_1, . . . , x_h and f(x_1, . . . , x_h). Let µ_n denote the distribution of g(x_1, . . . , x_h) when each x_i is drawn uniformly at random from {0,1}^{m_i}.
For any binary predicate P_n : {0,1}^n → {0,1}, let E_{µ_n}[P_n] denote the expected value of P_n when inputs are drawn according to the distribution µ_n, and let E[P_n] denote the expected value of P_n when inputs are drawn uniformly at random from {0,1}^n. We say that the distribution µ_n ε-fools the function P_n if |E_{µ_n}[P_n] − E[P_n]| < ε, and that the original function f is AC0-pseudorandom if the corresponding distribution µ_n ε-fools every P_n that is computable in AC0, where ε = O(2^{−n^κ}), for constant κ > 0.

This is, of course, quite similar to the standard pseudorandom generator model for AC0 circuits (see, for instance, [Nis91], [NW94]), with the exception that we impose the stronger requirement that the input and output of the function together are indistinguishable from random bits, instead of only requiring that the output is indistinguishable. Also, while the focus of this paper is the pseudorandomness of functions, not the difficulty of actually computing the functions, it is still worth noting that the functions considered can be computed in a low complexity class such as AC0[2] (AC0 circuits that are allowed unbounded fan-in parity gates) or TC0 (constant depth circuits with unbounded fan-in majority gates), but still produce strings that are indistinguishable from truly random strings by AC0 circuits.

A somewhat similar question was considered in the recent paper [Gre12], concerning the Möbius function µ : N → {−1, 0, 1}, which is defined such that µ(1) = 1, µ(x) = 0 when x has a nontrivial perfect square factor, and µ(x) = (−1)^k when x has no nontrivial perfect square factors, where k is the number of distinct primes in the prime factorization of x. It was shown that µ is asymptotically orthogonal to any AC0-computable function f : N → {−1, 1} (that is to say, (1/N) Σ_{x=1}^{N} f(x)µ(x) = o(1)). Tools from complexity theory were used to show that a naturally occurring function looks random to AC0 circuits.
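The definition of µ just given translates directly into code; a minimal sketch using trial division (adequate only for small x, and our own illustration rather than anything from [Gre12]):

```python
def mobius(x: int) -> int:
    """Compute the Mobius function mu(x) by trial division.

    mu(1) = 1; mu(x) = 0 if x has a squared prime factor;
    otherwise mu(x) = (-1)^k for k distinct prime factors.
    """
    if x < 1:
        raise ValueError("mu is defined on positive integers")
    k = 0  # number of distinct prime factors found so far
    d = 2
    while d * d <= x:
        if x % d == 0:
            x //= d
            if x % d == 0:      # d^2 divided the original x
                return 0
            k += 1
        else:
            d += 1
    if x > 1:                    # one leftover prime factor
        k += 1
    return (-1) ** k
```

For example, mobius(30) is −1 (three distinct primes), while mobius(12) is 0 (divisible by 4).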
It is worth noting that the functions considered in our paper have much longer output than the Möbius function; we consider functions which, on an n-bit input, produce an Ω(n)-bit output, while the Möbius function maps an n-bit input to only a constant-sized output.

Another example of a natural problem studied for its pseudorandom properties is the algebraic number problem, which, as noted in [KLL84], was initially proposed by Manuel Blum. An algebraic number is a root of a polynomial with integer coefficients. For example, √2, √3, and (1 + √5)/2 are all algebraic numbers. The algebraic number problem involves selecting, uniformly at random, an algebraic number ζ of bounded degree d and height H (where the degree of ζ is the degree of the (unique) primitive irreducible polynomial that has ζ as a root, and the height is the Euclidean length of the coefficient vector of that polynomial). The string to be considered is a portion of the binary expansion of the fractional part of ζ. In [KLL84], it was shown that, given the first O(d^2 + d log H) bits of an algebraic number ζ, it is possible, in deterministic polynomial time, to determine the minimal polynomial of ζ. Since the next bit of the binary expansion of ζ can easily be obtained given the minimal polynomial of ζ, this immediately implies that such strings do not pass all polynomial time tests. We consider a closely related problem, which is identical to the above problem, except that we select ζ only from the ring of integers of certain algebraic number fields. By the argument used in [KLL84], this variant also does not pass all polynomial time tests. However, we show that it does pass all AC0 tests. While this is certainly far away from showing anything about the pseudorandomness properties of a single value, such as √2, it might be a step in that direction.
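To make the degree and height definitions concrete: the number (1 + √5)/2 has primitive irreducible polynomial x^2 − x − 1, so its degree is 2 and its height is the length of the coefficient vector (1, −1, −1), i.e. √3. A quick numerical sanity check (a sketch; the variable names are ours):

```python
import math

# Coefficients of x^2 - x - 1, highest degree first.
coeffs = [1, -1, -1]
zeta = (1 + math.sqrt(5)) / 2          # the algebraic number (1 + sqrt(5))/2

# Evaluate the polynomial at zeta: it should (numerically) vanish.
value = sum(c * zeta ** (len(coeffs) - 1 - i) for i, c in enumerate(coeffs))

degree = len(coeffs) - 1                         # degree of zeta
height = math.sqrt(sum(c * c for c in coeffs))   # Euclidean length of (1,-1,-1)

assert abs(value) < 1e-9                         # zeta really is a root
```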

1.2 Main Results

This paper illustrates two techniques for demonstrating that functions are AC0-pseudorandom. The first technique makes use of the result in [Bra09] that resolved the long-standing Linial-Nisan conjecture [LN90]. We use this technique to show that almost all “reasonably sized” homomorphisms are AC0-pseudorandom, and, moreover, that convolution, integer multiplication and matrix multiplication are AC0-pseudorandom.

Our second technique involves reducing the (provably hard) problem of computing parity to the problem of distinguishing certain distributions from random. The second technique is related to the method in [Nis91], [NW94], in that we show that the structure of certain multiplication problems is a naturally occurring example of the combinatorial designs they employ. We use this technique to show that an alternate form of the multiplication problem, where one multiplicand is substantially longer than the other, is AC0-pseudorandom. One consequence of this result will be the existence of a simple, multiplication-based pseudorandom generator with the same stretch and hardness parameters as the Nisan-Wigderson generator. An additional consequence is the fact that no AC0 circuit can compute the product of an n-bit number and a superpolylog(n)-bit number (that is to say, a sequence of numbers whose length grows faster than log^c n, for all constants c > 0). This shows that the result from [CSV84], which states that an AC0 circuit can compute the product of an n-bit and an O(log^c n)-bit value, is optimal. Additionally, we show, via a reduction from the multiplication problem, that a certain variant of the algebraic integer problem looks random to AC0. These same techniques can be used to show that a variety of additional problems, such as finite field multiplication and division, matrix inversion, computing determinants, and an iterated version of convolution are also AC0-pseudorandom.
We prove the following theorems. Let Hom({0,1}^m, {0,1}^k) denote the set of homomorphisms from {0,1}^m to {0,1}^k (or, in other words, the linear maps from the vector space {0,1}^m to the vector space {0,1}^k).

Theorem 1. If k = m^u, for any fixed constant u > 0, then all but an exponentially small fraction of all f ∈ Hom({0,1}^m, {0,1}^k) are AC0-pseudorandom.

Let CONV_{r,s,k} : {0,1}^r × {0,1}^s → {0,1}^k denote the Boolean convolution function, which

takes X ∈ {0,1}^r and Y ∈ {0,1}^s to the middle k bits of the (r + s − 1)-bit-long convolution of X and Y.

Theorem 2. If s = r^u and k = r + s − (MIN(r, s))^α, for any fixed constants u > 0 and 0 < α < 1, then CONV_{r,s,k} is AC0-pseudorandom. In particular, if r = s and k = 2r − r^α, for any 0 < α < 1, then CONV_{r,s,k} is AC0-pseudorandom.

Let MULT_{r,s,k} : {0,1}^r × {0,1}^s → {0,1}^k denote the integer multiplication function, which takes X ∈ {0,1}^r and Y ∈ {0,1}^s to the middle k bits of the (r + s)-bit-long product of X and Y.

Theorem 3. If s = r^u and k = r + s − (MIN(r, s))^α, for any fixed constants u > 0 and 0 < α < 1, then MULT_{r,s,k} is AC0-pseudorandom. In particular, if r = s and k = 2r − r^α, for any 0 < α < 1, then MULT_{r,s,k} is AC0-pseudorandom.

Let MATRIX-MULT_{r,s} : {0,1}^{rs} × {0,1}^{rs} → {0,1}^{s^2} denote the matrix multiplication function, which, on input an s × r matrix A and an r × s matrix B (both of which are encoded as strings in {0,1}^{rs} in the obvious way), produces the s × s matrix AB.

Theorem 4. If s = r^u, for any fixed constant u > 0, then MATRIX-MULT_{r,s} is AC0-pseudorandom.

2 The Linial-Nisan-Braverman Technique

2.1 Braverman’s Theorem

Braverman [Bra09] resolved the long-standing Linial-Nisan conjecture [LN90]. We now state this theorem, which provides a simple sufficient condition for a distribution to appear random to AC0 circuits. For a distribution µ_n with support {0,1}^n, we say that µ_n is a (β, r)-approximation if every restriction of µ_n to r coordinates is β-close to the uniform distribution on {0,1}^r (two distributions are β-close if the statistical distance between them is at most β). The theorem states that if a distribution µ_n is a (β, r(s, d, ε))-approximation, for sufficiently large r and sufficiently small β, then it ε-fools all depth d AC0 circuits of size s.

Theorem ([Bra09]). Every (β, r(s, d, ε))-approximation ε-fools all depth d AC0 circuits of size s, where

    r(s, d, ε) = (log(s/ε))^{O(d^2)}  and  ε > 2 n^{r(s,d,ε)} β.

In particular, every (2^{−n^γ}, n^δ)-approximation, for constants κ < δ < γ < 1, will 2^{−n^κ}-fool polynomial sized circuits of any constant depth, for sufficiently small constant κ. In this paper, any function f for which the corresponding distribution µ_n, as defined above, meets this condition will be said to have the Linial-Nisan-Braverman property, or LNB property for short. In fact, many of the functions considered will have an even stronger property: their corresponding distributions will be (0, n^δ)-approximations (or, in other words, every restriction of µ_n to n^δ coordinates will simply be the uniform distribution, rather than being only close to the uniform distribution).
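For small supports, the (0, r)-approximation condition (every restriction to at most r coordinates exactly uniform) can be checked by brute force. A sketch, applied to the toy distribution x ◦ parity(x), which is uniform on every proper subset of coordinates but visibly non-random as a whole (the function and parameter names are ours, not the paper's):

```python
from itertools import combinations, product

def is_zero_r_approx(samples, r):
    """Check that every restriction of the uniform-over-samples
    distribution to at most r coordinates is exactly uniform."""
    n = len(samples[0])
    for size in range(1, r + 1):
        for coords in combinations(range(n), size):
            counts = {}
            for s in samples:
                key = tuple(s[i] for i in coords)
                counts[key] = counts.get(key, 0) + 1
            # exact uniformity: every pattern appears equally often
            if len(counts) != 2 ** size or len(set(counts.values())) != 1:
                return False
    return True

# Support of the distribution (x, parity(x)) for 3-bit x: 8 strings
# of length 4 whose last bit is the XOR of the first three.
samples = [x + (sum(x) % 2,) for x in product((0, 1), repeat=3)]
```

Here is_zero_r_approx(samples, 3) holds, but checking all 4 coordinates at once fails, since the parity bit is determined by the rest.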

2.2 Application to Homomorphisms

Let us now restrict our attention to homomorphisms from {0,1}^m to {0,1}^k, the set of which we denote by Hom({0,1}^m, {0,1}^k) (or, in other words, viewing {0,1}^m and {0,1}^k as vector spaces,

we consider the set of linear maps). It will be shown that it is particularly simple to determine whether a given homomorphism has the Linial-Nisan-Braverman property, and, moreover, that many homomorphisms have this property, and hence appear random to AC0 circuits.

Every f ∈ Hom({0,1}^m, {0,1}^k) corresponds to a k × m matrix F, with entries in {0,1}, such that f(X) = FX, for X ∈ {0,1}^m. For any R ⊆ {1, . . . , k} and C ⊆ {1, . . . , m}, let F_{R,C} be the submatrix of F consisting of rows R and columns C. The following lemma shows that having the Linial-Nisan-Braverman property is equivalent to certain submatrices of F being full rank. As before, n = m + k.

Lemma 1. f ∈ Hom({0,1}^m, {0,1}^k) has the Linial-Nisan-Braverman property if and only if ∃δ > 0 such that ∀R ⊆ {1, . . . , k}, C ⊆ {1, . . . , m} with |R| + |C| = n^δ, the submatrix F_{R,C̄} is full rank, where C̄ = {1, . . . , m} \ C.

Proof. First, consider a function f : {0,1}^m → {0,1}^k whose corresponding matrix F meets the above condition. We show that f has the Linial-Nisan-Braverman property. To do this, let X ∈ {0,1}^m be an arbitrary element, let Y ∈ {0,1}^n be the concatenation of X and f(X), and let µ_n be the distribution of Y given a uniformly randomly selected X. By definition, f has the Linial-Nisan-Braverman property if µ_n is n^δ-independent. To see that f has this property, imagine that an adversary selects some n^δ-sized subset of coordinates of Y. We must show that the distribution µ_n, when restricted to these coordinates, is the uniform distribution. Each coordinate is either a coordinate of the input X or a coordinate of the output f(X). Of course, since X is selected uniformly at random, any such restriction on just the bits of X yields the uniform distribution. All that needs to be shown is that the conditional distribution of the selected output coordinates is uniform, given any value of the selected input coordinates; in other words, if the adversary is allowed to look at only a small number of input bits (fewer than n^δ), then the distribution of any small number of output bits due to the remaining input bits is still uniform. To see this, let R ⊆ {1, . . . , k} and C ⊆ {1, . . . , m} denote the selected coordinates of f(X) and X, respectively, where |R| + |C| = n^δ. Letting f(X)_R denote the bits of the output corresponding to R (that is to say, the selected bits of the output), and defining X_C and X_C̄ analogously (which are then the selected and unselected bits of the input, respectively), we can write

    f(X)_R = F_{R,C} X_C + F_{R,C̄} X_C̄.

Since F meets the above condition, we know that F_{R,C̄} is full rank, and so, as all of the (unseen) bits of X_C̄ vary uniformly, F_{R,C̄} X_C̄ varies uniformly. One way to see this is to note that, since F_{R,C̄} is full rank, it contains an |R| × |R| invertible submatrix.
Therefore, as the bits of X_C̄ that correspond to this invertible submatrix vary over all possible values (with the other bits of X_C̄ fixed), F_{R,C̄} X_C̄ indeed varies uniformly. Therefore, for any fixed X_C, f(X)_R varies uniformly, and so f has the Linial-Nisan-Braverman property.

To prove the other direction, assume that F does not meet the above condition. This means that, ∀δ > 0, ∃R ⊆ {1, . . . , k}, C ⊆ {1, . . . , m} with |R| + |C| = n^δ such that the submatrix F_{R,C̄} is not full rank. Again, we write f(X)_R = F_{R,C} X_C + F_{R,C̄} X_C̄. Since F_{R,C̄} is not full rank, we have, by definition, that as X_C̄ varies, F_{R,C̄} X_C̄ does not even hit all possible values. In fact, it must miss at least half of all values, and so f(X)_R is far from uniformly distributed for any fixed X_C.
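The rank condition of Lemma 1 is a finite computation that is easy to script for small matrices; a sketch over GF(2) (rank_gf2 and the threshold parameter t, standing in for n^δ, are our own illustrative names):

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2); each row is an int bitmask."""
    basis = {}                        # leading-bit position -> reduced row
    for r in rows:
        while r:
            hb = r.bit_length() - 1
            if hb in basis:
                r ^= basis[hb]        # eliminate the leading bit
            else:
                basis[hb] = r
                break
    return len(basis)

def meets_rank_condition(F, t):
    """Lemma 1's condition at threshold t: for all R, C with
    |R| + |C| = t, the submatrix of F on rows R and the columns
    NOT in C has full rank |R|."""
    k, m = len(F), len(F[0])
    for rsize in range(1, t + 1):
        for R in combinations(range(k), rsize):
            for C in combinations(range(m), t - rsize):
                cbar = [j for j in range(m) if j not in C]
                rows = [sum(F[i][j] << p for p, j in enumerate(cbar))
                        for i in R]
                if rank_gf2(rows) < rsize:
                    return False
    return True
```

For instance, the identity matrix passes t = 1 but fails t = 2: an output bit that merely copies an input bit is exposed once the adversary sees both.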

Using the above result, we are now able to prove Theorem 1, which states that for any “reasonable” choice of m and k, almost every f ∈ Hom({0,1}^m, {0,1}^k) is AC0-pseudorandom. For convenience, we restate the theorem here.

Theorem 1. If k = m^u, for any fixed constant u > 0, then all but an exponentially small fraction of all f ∈ Hom({0,1}^m, {0,1}^k) are AC0-pseudorandom.

Proof. Let P_{h,w} denote the probability that an h × w matrix, where w ≥ h, with entries drawn uniformly at random from {0,1}, is full rank (that is to say, has rank h). We have the following useful bound, which follows from the fact that, in order for the matrix not to be full rank, either the first row must be identically zero, or the second row is a multiple of the first, or, in general, the i-th row lies in the span of the first i − 1 rows; combining these probabilities with a union bound gives:

    P_{h,w} ≥ 1 − Σ_{i=1}^{h} 2^{−w} 2^{i−1}.

For any particular m, k, the probability that a randomly selected f ∈ Hom({0,1}^m, {0,1}^k) is AC0-pseudorandom is, by the above theorem, given by the probability that all appropriately sized submatrices of a randomly selected k × m matrix are full rank. To be precise, we are interested in the probability that all submatrices F_{R,C̄}, where |R| + |C| = n^δ, are full rank, when m, k ≫ n^δ. For any h ≤ k and w ≤ m, the number of h × w submatrices of a k × m matrix is given by (k choose h)(m choose w), and so, by a simple union bound, we have the following:

    Pr(f does not have the LNB property) ≤ Σ_{j=1}^{n^δ−1} (k choose j) (m choose m−(n^δ−j)) (1 − P_{j, m−(n^δ−j)})

        ≤ (k choose n^δ) (m choose n^δ) 2^{−(m−n^δ)} Σ_{i=1}^{n^δ} 2^{i−1}

        ≤ (k^{n^δ}/(n^δ)!) (m^{n^δ}/(n^δ)!) 2^{−(m−n^δ)} (2^{n^δ+1} − 1)

        ≤ (km)^{n^δ} / 2^m = (m^{u+1})^{n^δ} / 2^m,

which, since k = m^u, is exponentially small in m for sufficiently small constant δ.
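The bound on P_{h,w} used above can be checked exactly for tiny matrices by enumerating all h × w matrices over GF(2) (a sketch; the enumeration is exponential in hw, so only toy sizes are feasible):

```python
from itertools import product

def rank_gf2(rows):
    """Rank over GF(2); each row is an int bitmask."""
    basis = {}
    for r in rows:
        while r:
            hb = r.bit_length() - 1
            if hb in basis:
                r ^= basis[hb]
            else:
                basis[hb] = r
                break
    return len(basis)

def p_full_rank(h, w):
    """Exact P_{h,w}: fraction of h x w GF(2) matrices with rank h."""
    full = sum(1 for rows in product(range(2 ** w), repeat=h)
               if rank_gf2(rows) == h)
    return full / 2 ** (h * w)

h, w = 2, 3
bound = 1 - sum(2 ** (i - 1) * 2 ** (-w) for i in range(1, h + 1))
assert p_full_rank(h, w) >= bound
```

For h = 2, w = 3 the exact value is (1 − 2^{−3})(1 − 2^{−2}) = 42/64, comfortably above the bound 1 − 3/8 = 0.625.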

2.3 Convolution

In the previous section, it was shown that many functions in Hom({0,1}^m, {0,1}^k) appear random to AC0 circuits, but no explicit example of such a function was given. This section shows that a particular function, namely the convolution function, satisfies this property. We begin by recalling the definition of convolution. Given X ∈ {0,1}^r and Y ∈ {0,1}^s, the convolution of X and Y, which will be denoted X ∗ Y, is the Z ∈ {0,1}^{r+s−1} where, if X_i, Y_i, and Z_i refer to the i-th bit (zero indexed, counting from the least significant bit up) of X, Y, Z, respectively, then

    Z_i = Σ_{j=0}^{i} X_j Y_{i−j},

where X_j Y_{i−j} denotes the AND of X_j and Y_{i−j}, any X_j or Y_j outside of the defined range is understood to be zero, and the sum is, of course, computed modulo 2.

The goal is to show that convolution is AC0-pseudorandom. There are several reasonable ways to define this. Perhaps the most natural, immediate thought is to consider the function

f : {0,1}^r × {0,1}^s → {0,1}^{r+s−1}, which takes the pair (X, Y) to X ∗ Y. Unpacking definitions, this means we consider the distribution (when X and Y are selected uniformly at random) of the string in {0,1}^{2r+2s−1} where the first r bits are X, the next s bits are Y, and the final r + s − 1 bits are X ∗ Y. Observe that this distribution clearly does not look random to AC0 circuits, because some of the bits of X ∗ Y can be determined exactly by an AC0 circuit. To be precise, letting n denote, as usual, the total size of the string (n = 2r + 2s − 1), we see that any of the first (or last) O(log^c n) bits of X ∗ Y is simply the parity of O(log^c n) bits, each of which is the AND of some bit of X with some bit of Y. Since a parity of O(log^c n) bits can (for any constant c) be computed easily in AC0, we immediately conclude that including any of these bits will cause the resulting distribution to not appear random to AC0 circuits. However, if we exclude these bits, we can show that the remainder does appear random to AC0 circuits. We consider the function CONV_{r,s,k} : {0,1}^r × {0,1}^s → {0,1}^k, where now k = k(r, s) < r + s − 1, and only the k “middle bits” of X ∗ Y are included (the k centermost bits). Such a function is not a homomorphism, and so the technique of the previous section does not directly apply. Instead, we will consider a variant of this problem, to which that technique does apply. Doing so yields a stronger result that also immediately implies that the CONV_{r,s,k} function does, in fact, appear random to AC0 circuits.

Essentially, the idea is to consider a “fixed” Y (here we mean that there is a single fixed Y of each length; as mentioned earlier, the discussion involves the asymptotic properties of f, defined by a sequence of Y values, one for each length), and define the function f_Y : {0,1}^r → {0,1}^k (where again, as above, k = k(r) < r + s − 1) such that f_Y takes the r-bit value X to the middle k bits of X ∗ Y.
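Both maps are short to write down: GF(2) convolution straight from the defining formula, and a fixed-Y map that keeps only middle bits. A sketch (bits are stored least-significant-first; the symmetric `cut` parameter, standing in for the number of discarded low/high bits, is our own simplification):

```python
def conv_gf2(x, y):
    """Boolean convolution: z_i = XOR over j of x_j AND y_{i-j}.
    Bit lists are least-significant-first."""
    z = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        if xi:
            for j, yj in enumerate(y):
                z[i + j] ^= yj
    return z

def f_Y(x, y, cut):
    """Middle bits of x * y: drop the lowest and highest `cut` bits."""
    z = conv_gf2(x, y)
    return z[cut:len(z) - cut]
```

Over GF(2), conv_gf2 is exactly carry-less polynomial multiplication: for example, (1 + x)(1 + x) = 1 + x^2, i.e. conv_gf2([1, 1], [1, 1]) is [1, 0, 1].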
The difference between these two variants can be understood as follows. In the first variant, described in the previous paragraph, the distinguisher would be an AC0 circuit family where the circuit whose input size is r + s + k would be able to distinguish the string consisting of a uniformly randomly selected X ∈ {0,1}^r, a uniformly randomly selected Y ∈ {0,1}^s and the middle k(r, s) bits of X ∗ Y from a truly random string. In the second variant, the distinguisher can have Y built in, and only needs to distinguish the string consisting of a uniformly randomly selected X ∈ {0,1}^r and the middle k(r) bits of X ∗ Y from a truly random string.

Since each f_Y is clearly a homomorphism, Lemma 1 applies. Moreover, if it can be shown that, for all sufficiently large r, all but an exponentially small fraction of choices for Y produce an f_Y that is AC0-pseudorandom, then it immediately follows that the variant of the problem described in the previous paragraph, in which both X and Y are selected uniformly at random, is also AC0-pseudorandom. Loosely speaking, claiming that this second variant is AC0-pseudorandom is a stronger claim, because being able to have a separate circuit for each Y could conceivably give a distinguisher more power. We now prove Theorem 2, which is restated below.

Theorem 2. If s = r^u and k = r + s − (MIN(r, s))^α, for any fixed constants u > 0 and 0 < α < 1, then CONV_{r,s,k} is AC0-pseudorandom. In particular, if r = s and k = 2r − r^α, for any 0 < α < 1, then CONV_{r,s,k} is AC0-pseudorandom.

By the above logic, it suffices to show the following lemma.

Lemma 2. For all but an exponentially small fraction of Y, the function f_Y : {0,1}^r → {0,1}^k, where k = 2(r − r^α + 1) for any small constant α > 0, has the Linial-Nisan-Braverman property.

Proof. Let f denote an arbitrary element of the set {f_Y | Y ∈ {0,1}^s}. Since f is a homomorphism, there is a corresponding k × r matrix F such that f(X) = FX, for any X ∈ {0,1}^r. To show that, for almost all choices of Y, the corresponding function f has the Linial-Nisan-Braverman property, it suffices, by Lemma 1, to show that the appropriate submatrices of F are full rank.

The matrix F has a particularly simple structure, namely it has constant skew-diagonals. That is to say, if F_{i,j} denotes the element of F in row i and column j, then F_{i,j} = F_{i−1,j+1}. The first row of F consists of, from left to right, r − r^α zeros followed by the lowest r^α bits of Y, starting with the least significant bit of Y. Each subsequent row of F is obtained by shifting Y one index further to the left, filling empty entries with zeros. Consider an arbitrary submatrix F_{R,C̄} where R ⊆ {1, . . . , k} and C ⊆ {1, . . . , r} such that |R| + |C| = n^δ for δ < α, where C̄ = {1, . . . , r} \ C and n = r + k. For randomly selected Y, this submatrix is full rank with overwhelming probability.

To see this, note that if F_{R,C̄} is not full rank, then there is some non-trivial linear combination of its rows that sums to 0. Let h and w be the height and width, respectively, of F_{R,C̄}. Then there are 2^h − 1 potential non-trivial linear combinations of the rows, because a linear combination is, by definition, a sum of the rows of F_{R,C̄} where each row has coefficient 0 or 1 (having all coefficients be 0 is the trivial linear combination). In other words, it is a sum of some subset of the rows of F_{R,C̄}. Consider any fixed non-trivial linear combination. Let i denote the lowest row of F_{R,C̄} that has coefficient 1. Note that the probability (over Y) that this particular linear combination of the rows of F_{R,C̄} is zero is very small. While this fact would be immediate if F_{R,C̄} were simply a random unstructured matrix, some care must be given due to the structure of F (constant skew-diagonals), which forces all elements of F in the same skew-diagonal to be identical.

To deal with this, consider the columns of F_{R,C̄} one at a time, from left to right. In order for the linear combination of the rows to be the zero vector, it must be the case, by definition, that the sum in each column is zero (where, of course, this sum is only over the subset of elements selected by the linear combination). Consider the element in position (i, j). This element is either some element of Y, if some part of Y was shifted over position (i, j), or is simply 0, if no part of Y was shifted to that position. In the first case, this value is completely independent of any previously considered entries that influence the linear combination.
This is because, even though the value of the entry in position (i, j) forces the values of all other entries in the same skew-diagonal (in F), all other such entries are either to the right of this entry, and so have not been considered yet, or to the left of and below this entry, in which case they have coefficient 0 in the linear combination (because row i is the lowest row with coefficient 1). Since row i has coefficient 1, flipping the value of the element in position (i, j) flips the value of the sum in column j, and so the sum in this column is 0 with probability 1/2. From this, we immediately conclude that the probability that the sum in all columns is 0 is at most 2^{−z}, where z is the number of entries in row i that come from Y (as opposed to being fixed 0s). Since each row of F has at least r^α such elements (because the output of f does not include the first or last r^α bits of X ∗ Y), we conclude that this particular linear combination is 0 with probability at most 2^{−r^α}. Applying a union bound over all 2^h − 1 non-trivial linear combinations, where h < n^δ ≪ r^α, and then another union bound over all choices of R and C (as in the calculation in the previous section), we conclude that, for all but an exponentially small fraction of Y, F has the desired property, which completes the proof that convolution appears random to AC0 circuits.
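The matrix of f_Y can also be written down directly from Z_i = ⊕_j X_j Y_{i−j}: the row for retained product bit i = cut + a has entry Y_{i−j} in column j. A verification sketch (our indexing orders columns by bit position, least significant first, so the constant diagonals satisfy F[a][j] = F[a+1][j+1]; the paper's left-to-right column convention presents the same structure as skew-diagonals):

```python
def build_F(y, r, cut):
    """Matrix over GF(2) of X -> middle bits of X * Y.

    Row a corresponds to product bit i = cut + a, column j to X_j;
    the entry is Y_{i-j}, or 0 when i - j falls outside Y."""
    s = len(y)
    k = r + s - 1 - 2 * cut
    return [[y[cut + a - j] if 0 <= cut + a - j < s else 0
             for j in range(r)]
            for a in range(k)]

def apply_gf2(F, x):
    """Matrix-vector product over GF(2)."""
    return [sum(row[j] & x[j] for j in range(len(x))) % 2 for row in F]
```

Multiplying by F reproduces the middle bits of the convolution, and the diagonal-constancy is immediate from the construction.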

2.4 Integer Multiplication

Let MULT_{r,s,k} : {0,1}^r × {0,1}^s → {0,1}^k denote the integer multiplication function, which takes X ∈ {0,1}^r and Y ∈ {0,1}^s to the middle k bits of the (r + s)-bit-long product of X and Y. In this section, we will prove the following theorem.

Theorem 3. If s = r^u and k = r + s − (MIN(r, s))^α, for any fixed constants u > 0 and 0 < α < 1, then MULT_{r,s,k} is AC0-pseudorandom. In particular, if r = s and k = 2r − r^α, for any 0 < α < 1, then MULT_{r,s,k} is AC0-pseudorandom.
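The non-randomness of the excluded low-order bits is concrete: the lowest bit of X × Y is 1 exactly when both X and Y are odd, so it is 0 with probability 3/4, matching the observation in the introduction. An exhaustive check for 4-bit operands (a sketch):

```python
from itertools import product

# The lowest bit of X * Y is 1 iff both X and Y are odd, so over
# uniformly random 4-bit X, Y it is 0 with probability exactly 3/4.
r = 4
zeros = sum(1 for x, y in product(range(2 ** r), repeat=2)
            if (x * y) % 2 == 0)
assert zeros == 3 * 4 ** r // 4       # 192 of the 256 pairs
```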

As was the case for convolution, there are two natural variants of the multiplication problem to consider. In the first variant, we select X ∈ {0,1}^r and Y ∈ {0,1}^s uniformly at random, then produce the product P = X × Y, and finally we produce the string consisting of X, Y, and part of P. The hope is that the distribution of that string appears random to AC0 circuits. It is necessary to include only part of P because, as was the case for convolution, the lowest and highest bits of P do not look random to AC0 circuits. For example, the low O(log^c r) bits of the product can be calculated exactly, using the technique in [CSV84]. In the second variant, we consider “fixed” Y, in the sense that we have a single Y of each length, and the multiplication problem is defined such that a uniformly randomly selected X ∈ {0,1}^r is multiplied by the fixed Y to produce the product P = X × Y; the string of interest then consists of X and the middle part of P. Again, loosely speaking, the second variant is stronger in the sense that a potential distinguisher is allowed to have Y built in.

In this section, we focus on the second variant and show that, for sufficiently large r, all but an exponentially small fraction of Y (of length s) lead to a multiplication problem that looks random to AC0 circuits. Therefore, by the same logic as in the convolution problem, it immediately follows that the first variant is also AC0-pseudorandom. We consider the function f_Y : {0,1}^r → {0,1}^k, which takes the r-bit value X to the middle k bits of the product X × Y. We will prove the following lemma, from which the above theorem immediately follows.

Lemma 3. For all but an exponentially small fraction of Y ∈ {0,1}^s, where s = r^u, the function f_Y : {0,1}^r → {0,1}^k, where k = r + s − 2r^α for any small constant α > 0, has the Linial-Nisan-Braverman property.

Proof.
It suffices to establish the claim for almost all odd Y (because adding w trailing zeros to Y simply shifts the product X × Y by w bits to the left; all but an exponentially small fraction of Y have fewer than r^α trailing zeros), and so we restrict our attention to the case in which Y is odd.

We begin by establishing some notation. Let n = r + k. Let Z = Z_1 · · · Z_n be the distribution of the set of all strings of the form X ◦ f_Y(X) (strings that are the concatenation of X with f_Y(X)), where X is an r-bit string. Then, by definition, f_Y has the Linial-Nisan-Braverman property if Z is a (2^{−n^γ}, n^δ)-approximation for appropriate small constants 0 < δ < γ < 1, which is to say that, for every set of n^δ coordinates, the restriction of µ_n to those coordinates is 2^{−n^γ}-close to the uniform distribution over {0,1}^{n^δ}. To show this, we begin by recalling that the bias of a distribution Z on some set I ⊆ {1, . . . , n} is defined to be

    bias_I(Z) = E[(−1)^{Σ_{i∈I} Z_i}].

We make use of the following lemma, variants of which appeared in, for example, [Vaz86] and [AGM02].

Lemma 4 ([Vaz86], [AGM02]). Every distribution Z that has bias at most ε on every non-empty subset I of size at most h is a (ε 2^{h/2}, h)-approximation.

We will then show that Z has bias at most 2^{−n^ν}, for some constant ν > 0, on all non-empty sets of size at most n^δ. The above lemma implies that Z is a (2^{−n^γ}, n^δ)-approximation, as desired (for any δ < γ < ν). To see why, let X_i denote the i-th bit of X and let f_{Y,j} : {0,1}^r → {−1, 1} be defined such that f_{Y,j}(X) = 1 when the j-th bit of X × Y is 0 and f_{Y,j}(X) = −1 when the j-th bit of X × Y is 1 (note that f_{Y,j} corresponds to the j-th bit of X × Y, not the j-th bit of f_Y(X), where f_Y(X) consists of all bits of X × Y except the lowest and highest r^α; this is done because it will be much cleaner to refer to bits by their position in the entire product). Clearly,

f_{Y,j}(X) = (-1)^{⌊XY / 2^{j-1}⌋}.
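Concretely, f_{Y,j} reads off the j-th bit of the product X × Y as a ±1 value. A quick sanity check of the displayed equation (our code, not from the paper):

```python
import random

def f_Yj(X, Y, j):
    """f_{Y,j}(X) = (-1)^floor(X*Y / 2^(j-1)): +1 iff the j-th bit of X*Y is 0."""
    return (-1) ** (X * Y // 2 ** (j - 1))

random.seed(1)
for _ in range(1000):
    X, Y = random.randrange(2 ** 16), random.randrange(2 ** 16)
    j = random.randrange(1, 33)
    bit = (X * Y >> (j - 1)) & 1        # the j-th bit, 1-indexed from the low end
    assert f_Yj(X, Y, j) == 1 - 2 * bit
print("ok")
```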

For any S ⊆ {1,...,r}, let f̂_{Y,j}(S) denote the Fourier-Walsh coefficients of f_{Y,j}, which are given by

f̂_{Y,j}(S) = E[f_{Y,j}(X)(-1)^{Σ_{i∈S} X_i}].

These are the Fourier coefficients of a function on F_2^r (we use the term Fourier-Walsh to avoid confusion with the "ordinary" Fourier coefficients of a function defined on R, which will be used shortly). We partition the set I as I = S ∪ J, where S ⊆ {1,...,r} contains the indices of Z that correspond to bits of X and J ⊆ {r+1,...,n} contains the indices of Z that correspond to bits of f_Y(X). There are two cases. First, if J is empty, then the set I consists only of bits of X, and so, trivially, Z has bias exactly 0 on this set, because X is uniformly random. The interesting case is when J is non-empty. For notational convenience, define the set J' ⊆ {r^α + 1, ..., r + s − r^α} by J' = {j' | j' + r − r^α ∈ J} (simply the set J shifted appropriately to index bits of X × Y). Let f_{Y,J'}(X) = Π_{j∈J'} f_{Y,j}(X). Then the bias of Z on I is simply f̂_{Y,J'}(S). This follows from the fact that

bias_I(Z) = E[(-1)^{Σ_{i∈I} Z_i}]
          = Pr[⊕_{i∈I} Z_i = 0] − Pr[⊕_{i∈I} Z_i = 1]
          = Pr[⊕_{s∈S} Z_s = ⊕_{j∈J} Z_j] − Pr[⊕_{s∈S} Z_s ≠ ⊕_{j∈J} Z_j]
          = Pr[(-1)^{Σ_{s∈S} X_s} = f_{Y,J'}(X)] − Pr[(-1)^{Σ_{s∈S} X_s} ≠ f_{Y,J'}(X)]
          = Pr[(-1)^{Σ_{s∈S} X_s} f_{Y,J'}(X) = 1] − Pr[(-1)^{Σ_{s∈S} X_s} f_{Y,J'}(X) = -1]
          = E[f_{Y,J'}(X)(-1)^{Σ_{s∈S} X_s}]
          = f̂_{Y,J'}(S).

Rather than compute f̂_{Y,J'}(S) directly, we instead compute the Fourier coefficients of f_{Y,J'} when viewed as a function on {0,...,2^r − 1} (instead of on F_2^r), and exploit a connection between these two types of Fourier coefficients. For a function f : {0,...,2^r − 1} → {-1,1}, define

f̂(k) = E[f(t)e^{-2πikt/2^r}], where k ∈ Z.

We have the following lemma, from [Gre12] (see also [Kat86]), which has been modified to fit our notation. We say that an integer k is a (b, m)-sparse number if it can be written in the form k = k_1·2^{h_1} + ··· + k_b·2^{h_b}, where each k_i ∈ Z, |k_i| ≤ m, and h_i ∈ N.

Lemma 5 ([Gre12]). Let f : {0,...,2^r − 1} → {-1,1} be a function such that there exists S ⊆ {1,...,r} with Fourier-Walsh coefficient f̂(S) of magnitude at least ε, where 0 < ε < 1/2. Then there is a (|S|, 10|S|/ε)-sparse number k such that the Fourier coefficient f̂(k) has magnitude at least (ε/(10|S|))^{4|S|}.

Applying this lemma to the function f_{Y,J'}, with sets S of size at most n^δ, we immediately conclude that, in order to establish the necessary bounds on the Fourier-Walsh coefficients (which then imply that multiplication has the Linial-Nisan-Braverman property), it suffices to show that, for all (n^δ, 10n^δ·2^{n^ν})-sparse numbers k, |f̂_{Y,J'}(k)| < 2^{-n^ρ} for a fixed constant ρ such that ρ > δ + ν. We say that a particular Fourier component is negligible if its magnitude has such a bound.

We now show that, for almost all Y, the required bound on f̂_{Y,J'}(k) holds. The main idea is that, for each j, f_{Y,j} is simply a downsampled version of a square wave. This fact allows us to express the Fourier coefficients of f_{Y,j} in terms of the Fourier coefficients of a square wave. This

is useful because the Fourier coefficients of a square wave are particularly simple. In the following, we make use of several standard facts about the Discrete Fourier Transform, which can be found in essentially any text that deals with Fourier analysis, for example [OSB99]. We begin with a few definitions. Let D_Y = {0,...,Y·2^{r+s} − 1}. Let s_j : D_Y → {-1,1} be the perfect square wave of period 2^j,

s_j(t) = (-1)^{⌊t / 2^{j-1}⌋}.

Let p_Y : D_Y → {0,1} be a pulse train with interval Y,

p_Y(t) = { 1, t ≡ 0 mod Y;  0, t ≢ 0 mod Y }.

Let h_Y : D_Y → {0,1} be the step function

h_Y(t) = { 1, t < Y·2^r;  0, t ≥ Y·2^r }.
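For intuition, these signals can be experimented with directly. The sketch below (our code, tiny illustrative parameters) also checks the "downsampling" fact used in the derivation that follows, namely that f_{Y,j}(X) = s_j(Y·X):

```python
r, s, Y = 4, 3, 5                       # tiny illustrative parameters; Y odd
N = Y * 2 ** (r + s)                    # |D_Y|

def s_j(t, j):                          # perfect square wave of period 2^j
    return (-1) ** (t // 2 ** (j - 1))

def p_Y(t):                             # pulse train with interval Y
    return 1 if t % Y == 0 else 0

def h_Y(t):                             # step function: 1 on [0, Y*2^r), 0 afterwards
    return 1 if t < Y * 2 ** r else 0

def f_Yj(X, j):                         # sign of the j-th bit of X*Y
    return (-1) ** (X * Y // 2 ** (j - 1))

for X in range(2 ** r):
    for j in range(1, r + s + 1):
        assert f_Yj(X, j) == s_j(Y * X, j)   # f_{Y,j} is s_j sampled at multiples of Y
print("ok")
```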

Finally, let g_{Y,J'}(t) = Y·2^s · h_Y(t) · p_Y(t) · Π_{j∈J'} s_j(t). We then have

f̂_{Y,J'}(k) = (1/2^r) Σ_{t=0}^{2^r−1} f_{Y,J'}(t) e^{-2πikt/2^r}

= (1/2^r) Σ_{t=0}^{2^r−1} [Π_{j∈J'} (-1)^{⌊Yt/2^{j-1}⌋}] e^{-2πikt/2^r}

= (1/2^r) Σ_{t=0}^{Y·2^r−1} p_Y(t) [Π_{j∈J'} (-1)^{⌊t/2^{j-1}⌋}] e^{-2πikt/(Y·2^r)}

= (1/2^r) Σ_{t=0}^{Y·2^r−1} p_Y(t) [Π_{j∈J'} s_j(t)] e^{-2πikt/(Y·2^r)}

= (1/(Y·2^{r+s})) Σ_{t=0}^{Y·2^{r+s}−1} Y·2^s · h_Y(t) · p_Y(t) [Π_{j∈J'} s_j(t)] e^{-2πi·2^s·kt/(Y·2^{r+s})}

= (1/(Y·2^{r+s})) Σ_{t=0}^{Y·2^{r+s}−1} g_{Y,J'}(t) e^{-2πi·2^s·kt/(Y·2^{r+s})}

= ĝ_{Y,J'}(2^s·k).

Therefore, it suffices to show that ĝ_{Y,J'}(2^s·k) is sufficiently small for the k values of interest. The convolution theorem implies that

ĝ_{Y,J'}(k) = Y·2^s·ĥ_Y(k) ⊗ p̂_Y(k) ⊗ (⊗_{j∈J'} ŝ_j(k)),

where ⊗ denotes cyclic convolution.
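The chain of equalities above can be confirmed numerically for tiny parameters. The sketch below (our code; E[·]-style normalization with a 1/|domain| factor, as in the text) computes both transforms directly from their defining sums and checks f̂_{Y,J'}(k) = ĝ_{Y,J'}(2^s·k):

```python
import cmath

r, s, Y = 4, 2, 3                      # tiny illustrative parameters; Y odd
Jp = [2, 4]                            # a sample set J'
N = Y * 2 ** (r + s)                   # |D_Y|

def f(t):                              # f_{Y,J'}(t): product of the J' bit-signs of t*Y
    out = 1
    for j in Jp:
        out *= (-1) ** (t * Y // 2 ** (j - 1))
    return out

def g(t):                              # g_{Y,J'}(t) = Y*2^s * h_Y(t) * p_Y(t) * prod_j s_j(t)
    if t >= Y * 2 ** r or t % Y != 0:
        return 0
    out = Y * 2 ** s
    for j in Jp:
        out *= (-1) ** (t // 2 ** (j - 1))
    return out

def f_hat(k):                          # Fourier coefficient on {0, ..., 2^r - 1}
    return sum(f(t) * cmath.exp(-2 * cmath.pi * 1j * k * t / 2 ** r)
               for t in range(2 ** r)) / 2 ** r

def g_hat(k):                          # Fourier coefficient on D_Y
    return sum(g(t) * cmath.exp(-2 * cmath.pi * 1j * k * t / N)
               for t in range(N)) / N

for k in range(2 ** r):
    assert abs(f_hat(k) - g_hat(2 ** s * k)) < 1e-9   # f-hat(k) = g-hat(2^s * k)
print("ok")
```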

Notice that, for each j, ŝ_j(k) has a particularly simple structure.

ŝ_j(k) = { 1 / (2^{j-2}(1 − e^{-2πi(2v+1)/2^j})),  k = (2v+1)·Y·2^{r+s−j};  0, otherwise }.
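This closed form can be checked against the defining sum for tiny parameters. The sketch below (our code; the same E[·]-style normalization, i.e. a 1/|D_Y| factor, is assumed) confirms both the support and the values:

```python
import cmath

r, s, Y, j = 3, 2, 3, 3            # tiny illustrative parameters; Y odd
N = Y * 2 ** (r + s)               # |D_Y|; the period 2^j divides N
base = Y * 2 ** (r + s - j)        # fundamental frequency of the period-2^j wave

def s_hat(k):
    # Direct DFT of the square wave s_j over D_Y, normalized by 1/|D_Y|
    return sum((-1) ** (t // 2 ** (j - 1)) *
               cmath.exp(-2 * cmath.pi * 1j * k * t / N) for t in range(N)) / N

for k in range(N):
    if k % base == 0 and (k // base) % 2 == 1:       # odd multiple of Y*2^(r+s-j)
        m = k // base                                # m = 2v + 1
        closed = 1 / (2 ** (j - 2) * (1 - cmath.exp(-2 * cmath.pi * 1j * m / 2 ** j)))
        assert abs(s_hat(k) - closed) < 1e-9
    else:
        assert abs(s_hat(k)) < 1e-9                  # zero everywhere else
print("ok")
```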

Notice that ŝ_j(k) is nonzero at only a few locations, specifically the odd multiples of Y·2^{r+s−j}. Moreover, notice that the magnitude of the nonzero values falls off quickly. To be precise,

Σ_{v : |2v+1| > 2^{n^τ}} |ŝ_j((2v+1)·Y·2^{r+s−j})| = O(2^{-n^η})

for constants η and τ such that δ < η < τ ≪ 1. In other words, the only non-negligible part of ŝ_j(k) is at values k given by small odd multiples of a shift of Y (Y shifted to the left by r + s − j bits). We then consider ⊗_{j∈J'} ŝ_j(k). We split ŝ_j(k) into a large low-frequency component and a small high-frequency component. That is to say, we write ŝ_j(k) = û_j(k) + v̂_j(k), where

û_j(k) = { 1 / (2^{j-2}(1 − e^{-2πi(2v+1)/2^j})),  k = (2v+1)·Y·2^{r+s−j} and |2v+1| ≤ 2^{n^τ};  0, otherwise }

and

v̂_j(k) = { 1 / (2^{j-2}(1 − e^{-2πi(2v+1)/2^j})),  k = (2v+1)·Y·2^{r+s−j} and |2v+1| > 2^{n^τ};  0, otherwise }.

Therefore,

⊗_{j∈J'} ŝ_j(k) = ⊗_{j∈J'} (û_j(k) + v̂_j(k)) = Σ_{J_1, J_2 : J_1 ∪ J_2 = J'} (⊗_{j∈J_1} û_j(k)) ⊗ (⊗_{j∈J_2} v̂_j(k)).

Notice that there are at most 2^{n^δ} terms in the above expansion (because |J'| ≤ n^δ). The term ⊗_{j∈J'} û_j(k) is nonzero only at k values of the form (2v_1+1)·Y·2^{r+s−j_1} + ··· + (2v_{|J'|}+1)·Y·2^{r+s−j_{|J'|}}, where each v_i satisfies |2v_i+1| ≤ 2^{n^τ}. All other terms are extremely small everywhere. To be precise, when J_1 ≠ J', every such term involves at least one v̂_j(k) factor, and so we can write (⊗_{j∈J_1} û_j(k)) ⊗ (⊗_{j∈J_2} v̂_j(k)) = v̂_j(k) ⊗ q̂(k) for some function q : D_Y → {-1,0,1}. By combining the bound Σ_k |v̂_j(k)| = O(2^{-n^η}) with the trivial bound |q̂(k)| ≤ 1, we obtain |(⊗_{j∈J_1} û_j(k)) ⊗ (⊗_{j∈J_2} v̂_j(k))| = O(2^{-n^η}). Therefore, the total contribution of all terms except ⊗_{j∈J'} û_j(k) is negligible (O(2^{-(n^η − n^δ)})). From the above, it is immediate that the only non-negligible Fourier components are at values k of the form (2v_1+1)·Y·2^{r+s−j_1} + ··· + (2v_{|J'|}+1)·Y·2^{r+s−j_{|J'|}}, where each v_i satisfies |2v_i+1| ≤ 2^{n^τ}. Recall that each j satisfies r^α < j < r + s − r^α. Therefore, these values k are of the form Y·k', where k' is a (|J'|, 2^{n^τ})-sparse number with at least r^α trailing zeros and at most r + s − r^α trailing zeros.

Next, we consider ĝ_{Y,J'}(k). We have

ĝ_{Y,J'}(k) = Y·2^s·ĥ_Y(k) ⊗ p̂_Y(k) ⊗ (⊗_{j∈J'} ŝ_j(k))

= Y·2^s·ĥ_Y(k) ⊗ p̂_Y(k) ⊗ (⊗_{j∈J'} (û_j(k) + v̂_j(k)))

= Σ_{J_1, J_2 : J_1 ∪ J_2 = J'} Y·2^s·ĥ_Y(k) ⊗ p̂_Y(k) ⊗ (⊗_{j∈J_1} û_j(k)) ⊗ (⊗_{j∈J_2} v̂_j(k)).

By the same logic as above, the total contribution of every term in the sum except the J_1 = J' term is negligible everywhere (has total magnitude O(2^{-(n^η − n^δ)}) at all k), and so if we define the function ĝ'_{Y,J'}(k) = Y·2^s·ĥ_Y(k) ⊗ p̂_Y(k) ⊗ (⊗_{j∈J'} û_j(k)), it suffices to show that ĝ'_{Y,J'} is small at the k values of interest. We have

p̂_Y(k) = { 1, k = u·2^{r+s} for some u ∈ Z;  0, otherwise }

and

Y·2^s·ĥ_Y(k) = (1/2^r) · (1 − e^{-2πik/2^s}) / (1 − e^{-2πik/(Y·2^{r+s})}).

Therefore, the only non-negligible values of ĝ'_{Y,J'}(k) are those that are "close" to values of the form Y·k' mod 2^{r+s}. More precisely, the only non-negligible values of ĝ'_{Y,J'} are at k ≡ Y·k' + u mod 2^{r+s}, where |u| ≤ 2^{s+n^ν}, and so the only non-negligible values of f̂_{Y,J'}(k) = ĝ_{Y,J'}(2^s·k) are at values k such that 2^s·k ≡ Y·k' + u mod 2^{r+s}. Equivalently, these are the values k for which there exist k' and u', where k' is (as above) a (|J'|, 2^{n^τ})-sparse number with at least r^α trailing zeros and at most r + s − r^α trailing zeros and |u'| ≤ 2^{n^ν}, such that k + u' is equal to the high 2r bits of Y·k' mod 2^{r+s}.

Therefore, for a particular value Y, the required bound on |f̂_{Y,J'}(k)| holds if, for every (n^δ, 10n^δ·2^{n^ν})-sparse number k, we do not have k + u' equal to the high 2r bits of Y·k' mod 2^{r+s}, for any (|J'|, 2^{n^τ})-sparse number k' with at least r^α trailing zeros and at most r + s − r^α trailing zeros. To see that this holds for all but an exponentially small fraction of Y, first notice that if k is an (n^δ, 10n^δ·2^{n^ν})-sparse number, then k + u' is an (n^δ + 1, 10n^δ·2^{n^ν})-sparse number. Set the constants τ and ν small enough that n^{ν+τ} ≪ r^α (this can be done because n = 2r + s − 2r^α = 2r + r^u − 2r^α, and so n is polynomial in r).
Therefore, it suffices to show that, for almost all Y, if k' is a (|J'|, 2^{n^τ})-sparse number with at least r^α trailing zeros and at most r + s − r^α trailing zeros, then the high 2r bits of Y·k' mod 2^{r+s} do not form an (n^δ + 1, 10n^δ·2^{n^ν})-sparse number. To see this, notice that, for each pair of sparse numbers k', k'', there is at most a 1/2^{r^α} fraction of all Y such that the high 2r bits of Y·k' mod 2^{r+s} are equal to k'', and so a simple union bound completes the proof.

2.5 Matrix Multiplication

We now show that matrix multiplication is AC0-pseudorandom. Let MATRIX-MULT_{r,s} : {0,1}^{rs} × {0,1}^{rs} → {0,1}^{s^2} denote the matrix multiplication function, which, on input an s × r matrix A and an r × s matrix B (both encoded as strings in {0,1}^{rs} in the obvious way), produces the s × s matrix AB.

Theorem 4. If s = r^u, for any fixed constant u > 0, then MATRIX-MULT_{r,s} is AC0-pseudorandom.

As was the case for the convolution and multiplication problems, we consider a stronger variant where one of the matrices is held fixed. We then prove the following lemma, from which the above theorem immediately follows.

Lemma 6. For an s × r matrix A, let f_A : {0,1}^{rs} → {0,1}^{s^2} denote the function that, on input an r × s matrix B, produces the s × s matrix Z = AB. Then all but an exponentially small fraction of A yield an f_A that is AC0-pseudorandom.

Proof. To see that almost all such f_A are AC0-pseudorandom, let B_i and Z_i denote the i-th columns of B and Z, respectively. Then, of course, Z_i = AB_i, and so we can interpret this problem as the concatenation of s independent instances of the homomorphism problem. That is to say, if we let f'_A : {0,1}^r → {0,1}^s be the homomorphism corresponding to A, then Z_i = f'_A(B_i). The result then follows from Theorem 1.
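Assuming arithmetic over F_2 (the Boolean setting of the paper), the column decomposition is easy to see concretely. A minimal sketch with our own variable names:

```python
import random

# Column-wise view of Z = A*B over F_2: each column Z_i = A*B_i is an
# independent instance of the homomorphism problem.
random.seed(2)
r, s = 6, 4
A = [[random.randrange(2) for _ in range(r)] for _ in range(s)]   # s x r
B = [[random.randrange(2) for _ in range(s)] for _ in range(r)]   # r x s

def matvec(A, v):                       # A*v over F_2
    return [sum(a * x for a, x in zip(row, v)) % 2 for row in A]

Z = [[sum(A[i][l] * B[l][j] for l in range(r)) % 2 for j in range(s)]
     for i in range(s)]                 # full product A*B mod 2

for i in range(s):
    Bi = [B[l][i] for l in range(r)]    # i-th column of B
    assert [Z[row][i] for row in range(s)] == matvec(A, Bi)
print("ok")
```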

3 The Reduction Technique

3.1 Next-Bit Test and Parity

In this section, we present another technique for proving that a function appears random to AC0 circuits: reducing a known hard problem to the next-bit test. The next-bit test is defined as follows. Given a distribution µ_n with support {0,1}^n, we say that µ_n passes the next-bit test if, given the first i bits of a string selected according to µ_n, no AC0 circuit can predict the (i+1)-st bit with non-negligible advantage, for any i. Formally, for any Z ∈ {0,1}^n, let Z_j denote the j-th bit of Z (1-indexed, counting from left to right) and let Z_{[j,k]} denote the substring of Z from positions j to k, inclusive. Then we say that µ_n passes the next-bit test if, for all i ∈ {1,...,n} and all functions Q_i : {0,1}^{i-1} → {0,1} computable by AC0 circuits, |Pr[Q_i(Z_{[1,i-1]}) = Z_i] − 1/2| = O(2^{-n^κ}) for some constant κ > 0, where the probability is taken over values of Z ∈ {0,1}^n drawn according to the distribution µ_n. It is known [Yao82] that a distribution µ_n passes the next-bit test if and only if µ_n O(2^{-n^κ})-fools all AC0 circuits (strictly speaking, the result in [Yao82] was proven for probabilistic polynomial-time algorithms, but the same technique applies just as well to AC0 circuit families). Since, as stated in §1, we say that a function f is AC0-pseudorandom if the distribution µ_n corresponding to it O(2^{-n^κ})-fools all AC0 circuits, showing that µ_n passes the next-bit test suffices to prove that the corresponding f is AC0-pseudorandom.

The natural next question is how to prove that distributions arising from particular functions pass the next-bit test. One idea is to reduce a problem that is known to be hard for AC0, such as the parity problem, to the next-bit test. The parity problem is defined as follows: given some X ∈ {0,1}^*, compute Σ_i X_i mod 2. In other words, the parity of a string is 1 if the string contains an odd number of 1s and 0 if it contains an even number of 1s.
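As a toy illustration of the next-bit test (our example, not from the paper): consider the distribution Z = X ∘ (X_1 ⊕ X_2) with X uniform. The predictor that XORs the first two bits predicts the final bit perfectly, so this distribution fails the next-bit test with the maximum possible advantage 1/2:

```python
from itertools import product

n = 5                                   # |Z| = n; the first n-1 bits are uniform
correct = 0
total = 0
for X in product([0, 1], repeat=n - 1):
    Z = X + (X[0] ^ X[1],)              # Z = X concatenated with X_1 XOR X_2
    Q = Z[0] ^ Z[1]                     # next-bit predictor for position n
    correct += (Q == Z[n - 1])
    total += 1

advantage = correct / total - 0.5
assert advantage == 0.5                 # perfect prediction: maximal advantage
print(advantage)                        # prints 0.5
```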
It is known that no AC0 circuit family can compute parity [FSS84], [Ajt83]. In fact, parity cannot even be non-negligibly approximated in AC0 [Has86]. To be precise, if we define h(s,d,n) to be the function such that no depth-d circuit of size 2^s computes parity correctly on more than a 1/2 + h(s,d,n) fraction of the inputs, then we have the following (Theorem 8.1.iii in [Has86]).

Theorem ([Has86]). h(s,d,n) < 2^{-Ω((n/s)^{1/(d-1)})} for d > 2 and s < n^{1/d}.

The goal is then to reduce the parity problem to the problem of computing the next bit of a string drawn according to µ_n, or, in other words, to show that if some AC0 circuit could predict the next bit with non-negligible advantage, then it could be used to produce another AC0 circuit that approximates the parity problem with non-negligible advantage. Since the parity problem cannot be approximated by such a circuit, we could then conclude that the original distribution must pass the next-bit test.

3.2 Integer Multiplication

As was already shown in Theorem 3, the function MULT_{r,s,k} is AC0-pseudorandom when s = r^u and k = r + s − (min(r,s))^α, for constants u > 0 and 0 < α < 1. This was done by considering a variant of the multiplication function in which one of the multiplicands is held fixed. Specifically, for Y ∈ {0,1}^s, we defined the function f_Y : {0,1}^r → {0,1}^k which takes a value X ∈ {0,1}^r to the middle k bits of X × Y. As shown in Lemma 3, f_Y is AC0-pseudorandom for all but an exponentially small fraction of Y, when s = r^u and k = r + s − (min(r,s))^α.

In this section, we will be interested in results that hold when s is much greater than r. Specifically, we are interested in the case when s > r^u for all constants u > 0, but r > log^c s for all constants c > 0. Recall that we say a given function looks random to AC0 circuits if the distribution corresponding to it can only be distinguished (by AC0 circuits) from the uniform distribution with advantage O(2^{-n^κ}). In this section we relax this condition only slightly, and require only a bound on the advantage of the form o(2^{-log^c n}) for all constants c > 0 (in other words, we require that no AC0 circuit can distinguish with advantage one over any quasipolynomial in n). We show that, for certain Y, f_Y is AC0-pseudorandom with these parameters. This has several interesting consequences. Firstly, this yields a simple, multiplication-based pseudorandom generator with the same stretch and security parameters as the Nisan-Wigderson generator [Nis91]. Secondly, this shows that the result in [CSV84], which states that an AC0 circuit can multiply an n-bit value Y by an O(log^c n)-bit value X, is tight.

We restrict our attention to Y ∈ {0,1}^s that are "sparse", in the sense that only a small number of the bits of Y are 1s. Specifically, we generate Y as follows: each bit is set to 1 with probability r^{-ε}, for a constant 0 < ε < 1/2.
As before, let f_Y : {0,1}^r → {0,1}^k be defined such that f_Y takes the value X to the middle k bits of the product X × Y, where here k = r + s − 2r^2. We prove the following theorem.

Theorem 5. With high probability over the selection of Y according to the above distribution (where "high probability" means within an exponentially small distance of probability 1), f_Y is AC0-pseudorandom.

Proof. As usual, we consider strings of the form X ∘ f_Y(X). For convenience, we assume that both X and the substring of Z = X × Y produced by f_Y are written from least significant bit to most significant bit, when read from left to right. We let n denote the total length of the string, and so n = 2r + s − 2r^2. Consider the next-bit test applied to strings generated in this manner. Since the first r bits of the string are bits of the uniformly randomly generated number X, we conclude, for information-theoretic reasons, that there is no hope of any AC0 circuit predicting the i-th bit, given the first i − 1 bits, for i ∈ {1,...,r}. All that remains is to prove the same claim for

i ∈ {r+1,...,n}, which will be done by showing that any AC0 circuit that predicts such a bit with non-negligible advantage can be used to approximate the parity function with non-negligible advantage, which we know is impossible. We assume, for contradiction, that we have an AC0 circuit, call it C, that can predict some next bit of our pseudorandom string, call it bit i, given the first i − 1 bits. Using the circuit C, we will produce an AC0 circuit D that predicts (with non-negligible advantage) the solution to a parity problem T of size r^ν, for some ν > 0, which is impossible.

Begin by noting that, if Y_j denotes the j-th bit of Y (0-indexed, counting from the least significant bit up), then we have X × Y = X · Σ_{j=0}^{s-1} Y_j 2^j = Σ_{j=0}^{s-1} X·Y_j·2^j. Thus, we can understand the multiplication of X by Y as the sum of many shifts of X, where the amount that X is shifted is determined by the locations of the 1s in Y. To be precise, for each j such that Y_j = 1, we include a copy of X shifted left by j indices. To produce the product X × Y, we then sum all copies of X. This is illustrated in the figure below.

Each column contains certain bits of X. One way to characterize which bits appear in each particular column is to imagine sliding the strings X and Y^REV past one another, where Y^REV is the string Y flipped left-to-right. To be precise, start by aligning X and Y^REV such that the least significant bits of X and Y line up, and no other bits initially line up. To determine which bits of X lie in column j (where we number the columns from right to left, starting with 0), slide Y^REV over by j bits; exactly the bits of X that line up with a 1 in Y appear in column j. This is illustrated in the figure below.
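A small script (our notation) makes this column decomposition concrete and confirms that summing the columns, with carries handled by ordinary addition, recovers X × Y:

```python
import random

random.seed(3)
r, s = 12, 20
X, Y = random.randrange(2 ** r), random.randrange(2 ** s)
Xbits = [(X >> i) & 1 for i in range(r)]
Ybits = [(Y >> j) & 1 for j in range(s)]

def U(c):
    """Indices of X appearing in column c: slide Y^REV by c and keep the 1s."""
    return [i for i in range(r) if 0 <= c - i < s and Ybits[c - i] == 1]

# Summing every column, weighted by 2^c, reproduces the product X*Y.
total = sum(sum(Xbits[i] for i in U(c)) * 2 ** c for c in range(r + s))
assert total == X * Y
print("ok")
```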

Define sets U_j ⊆ {0,...,r−1} such that U_j consists of all indices of X that appear in column j. Let S_j ⊆ {0,...,s−1} be a collection of indices of Y. The exact manner in which the S_j are selected will be specified shortly. Let V_j ⊆ U_j be the indices of X that appear in column j because they lined up with a 1 in Y at one of the indices in S_j. As noted above, we must have i ∈ {r+1,...,n} (the portion of the string containing bits of the product Z = X × Y), and so we are predicting bit i − r + r^2 − 1 =: k of the product. Notice that, if it were not for the fact that there are carries when computing the sum of the various shifts of X, bit k of the product would simply be the parity of the bits of X selected by U_k. The key idea will be to construct the sets V_k so that they are individually large, |V_k| > log^c s for all constants c > 0, but have small intersection with any U_j: |V_k ∩ U_j| ≤ 2 for all j < k. We then fill the bits of X specified by V_k with the bits of an instance of the parity problem. This is very similar to the notion of a combinatorial design [Nis91], except that here we consider subsets V_j of U_j.

The circuit D predicts the solution of the parity problem T by producing a multiplication instance to feed to C, that is to say, the first i − 1 bits of a string produced by multiplication. This string consists of a value X and some of the bits of the product XY. We construct this multiplication instance as follows. Begin by setting the bits of X selected by V_k to the bits of the parity instance T. To set the other bits of X, notice that if C can truly predict the next-bit test

with non-negligible advantage, then this means, by definition, that the advantage of C, averaged over all choices of X, is non-negligible. In particular, this means that there must exist at least one setting of the other bits of X such that C has non-negligible advantage as just the bits selected by V_k vary (uniformly). We then set the other bits of X to such a fixed value. To be clear, the claim is not that an AC0 circuit can find a proper setting of the other bits of X, but rather that such a value can simply be built into D (because it is only a single fixed value, which depends only on the input size t of circuit D). In order to calculate the lowest k − 1 bits of XY that must be fed to C, we write X = X_input + X_fixed, where X_input consists of the t bits of the input to D, which are assigned to the positions specified by V_k, and X_fixed corresponds to the fixed setting of the other bits of X. Since both Y and X_fixed are fixed values, we can also build the value Y·X_fixed into D. Therefore, if it were possible to compute in AC0 the low k − 1 bits of Y·X_input, then it would be possible to compute the low k − 1 bits of XY, because XY = Y·X_input + Y·X_fixed, and we can, of course, perform addition in AC0.

The key observation is that, with high probability over the choice of Y, it will be easy to compute Y·X_input. To see this, notice that, with high probability over Y, there will be a choice of S_k such that |V_k ∩ U_j| ≤ 2 for j ∈ {0,...,k−1}. This is simply the statement that each column of the multiplication problem illustrated in the figure above contains at most two bits of X_input. Therefore, these bits can be packed into two numbers, whose sum (which is calculable in AC0) will be the low bits of Y·X_input.
To see that |V_k ∩ U_j| ≤ 2 with high probability, let Y' be identical to Y except that all bits outside of S_k are set to 0, and note that |V_k ∩ U_j| is simply the number of 1s that line up when Y and Y' are slid over one another, or, in other words, the number of h such that Y_h and Y'_{h−(k−j)} are both 1. To bound the probability that |V_k ∩ U_j| fails to be at most 2 for every j, we show that this failure probability (where, again, the probability is taken over the choice of Y) is extremely small for a single fixed j, and union bound over the j. Fix j and define Q_h = Y_h·Y'_{h−(k−j)}; then |V_k ∩ U_j| = Σ_h Q_h. Unfortunately, the Q_h are not independent. To deal with this, partition the indices h into two classes, where the first class contains all h such that h mod 2(k−j) falls in the range [0, k−j−1] and the second class contains all other h. Notice that h and h − (k−j) are always in separate classes, and so the set of all Q_h such that h is in the first class are independent and, similarly, the set of all Q_h such that h is in the second class are independent. We show that Σ_h Q_h ≤ 1, where the sum is restricted to a single class. Recall that the bits of Y are generated (independently) such that each bit is 1 with probability r^{-ε}, and that, if we select the special bits S_k at random (which is allowed because we need only show that a suitable S_k exists) such that each of the bits of Y that line up with a portion of X (when sliding Y over X, only part of Y lines up with actual indices of X at any given shift) is included in S_k with probability r^{-(1-ε)}, then a bit of Y' is 1 with probability r^{-(1-2ε)}. The result follows from a simple application of the Chernoff bound. Thus far, we have shown that D can produce a multiplication instance to feed to C.
To use the result produced by C (namely, the predicted next bit of the product) to determine the parity of T, notice that the correct value of the next bit of the product is simply the exclusive-or of the parity of T, the parity of those bits of X_fixed that appear in column k of the multiplication problem, and the carry bit that enters column k when the low k − 1 bits of Y·X_input and Y·X_fixed are added to produce the low k − 1 bits of the product XY. Since X_fixed is a single fixed value, the parity of those of its bits that appear in column k can be built into D. As noted earlier, it is possible, in AC0, to compute the sum of the low k − 1 bits of Y·X_input and Y·X_fixed, including the carry into column k. Thus, if the next bit can be predicted with some advantage, then the parity of T can be predicted with the exact same advantage. This contradiction completes the proof that the multiplication problem, as defined above, looks random to AC0.

It is worth noting that, while the above proof was only carried out in the case when r < s^α for all constants α > 0 but r > log^c s for all constants c, the same technique would also work for other parameters, such as s = r^u for some constant u (the parameters of Lemma 3). Moreover, a similar argument would show that, if r = O(log^c s), then f_Y passes all AC0 tests of depth at most d, where d depends on c.

4 The Algebraic Integer Problem

In this section, it is shown that the algebraic integer problem looks random to AC0 circuits. We begin with a few definitions. An algebraic integer is a root of some monic polynomial with integer coefficients. An algebraic number field is a finite field extension of Q. Given some algebraic number field K, the ring of integers of K, denoted O_K, is the ring that consists of all algebraic integers in K. For every K, O_K is a free Z-module, and so has an integral basis (that is to say, there exist b_1,...,b_h ∈ O_K such that every element of O_K can be uniquely expressed as Σ_i a_i b_i, for a_i ∈ Z). For a particular basis B, we define the function f_B : {0,1}^{m_1} × ··· × {0,1}^{m_h} → {0,1}^k such that f_B(a_1,...,a_h) is the first k bits of the binary expansion of the fractional real part of Σ_i a_i b_i, where, for i > 1, m_i = m_1^{u_i} for some constant u_i > 0, and k = m_1^u for any constant u. We show, via reduction from the multiplication problem, that certain f_B are AC0-pseudorandom.

As an example, consider the algebraic number field K = Q(√d), for d a squarefree positive integer. It can be shown that, when d ≡ 2, 3 mod 4, {1, √d} is an integral basis for O_K, and that, when d ≡ 1 mod 4, {1, (1+√d)/2} is an integral basis for O_K (of course, since d is squarefree, we cannot have d ≡ 0 mod 4). Let b_1 and b_2 denote the basis elements, in the order they appear above. Then f_B(a_1, a_2) is simply the first k bits of the fractional part of a_1 b_1 + a_2 b_2, which is identical to the first k bits of the fractional part of a_2 b_2 (because a_1, b_1 ∈ Z). It is straightforward to show that, for all sufficiently large n and all strings Y ∈ {0,1}^{⌊n/2⌋−1}, there is an n-bit value d for which the binary expansion of the fractional part of √d starts with the string Y.
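For concreteness, the fractional bits of √d can be extracted exactly with integer arithmetic (our sketch, using the identity ⌊2^t·√d⌋ = isqrt(d·4^t); not from the paper):

```python
from math import isqrt

def frac_bits_sqrt(d, k):
    """First k bits of the binary expansion of the fractional part of sqrt(d),
    computed exactly: the t-th fractional bit is floor(2^t * sqrt(d)) mod 2."""
    return [isqrt(d * 4 ** t) % 2 for t in range(1, k + 1)]

# sqrt(2) = 1.01101010000010... in binary, so the fractional part begins 01101010
assert frac_bits_sqrt(2, 8) == [0, 1, 1, 0, 1, 0, 1, 0]
print(frac_bits_sqrt(2, 8))
```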
In particular, if we consider a string Y such that the multiplication function f_Y is AC0-pseudorandom, then the corresponding f_B is also AC0-pseudorandom, because it is just the multiplication problem a_2 · √d bit-shifted, possibly with 1/2 added. In general, consider any basis B of some O_K such that there is some basis element b_j in B for which the binary expansion of the fractional real part of b_j starts with a value Y for which f_Y is AC0-pseudorandom. Rather than consider f_B directly, it will again be convenient to consider a variant of the function in which some of the inputs are held fixed. In particular, we wish to fix a_i for each i ≠ j. Define the function f_{B,j,a_1,...,a_{j-1},a_{j+1},...,a_h} : {0,1}^{m_j} → {0,1}^k such that it maps the value a_j to the first k bits of the fractional real part of Σ_i a_i b_i. By a straightforward reduction from the multiplication problem, it follows that f_{B,j,a_1,...,a_{j-1},a_{j+1},...,a_h} is AC0-pseudorandom, which then immediately implies that f_B is AC0-pseudorandom.

References

[Ajt83] M. Ajtai, Σ^1_1-Formulae on Finite Structures, Annals of Pure and Applied Logic (1983).
[AGM02] N. Alon, O. Goldreich, and Y. Mansour, Almost k-wise independence versus k-wise independence, Electronic Colloquium on Computational Complexity, Report TR02-048 (2002).
[Bra09] M. Braverman, Poly-logarithmic independence fools AC0 circuits, IEEE Conference on Computational Complexity (2009), 3-8.
[CSV84] A. K. Chandra, L. Stockmeyer, and U. Vishkin, Constant depth reducibility, SIAM Journal on Computing (1984), 13:423-439.


[FSS84] M. Furst, J. Saxe, and M. Sipser, Parity, circuits, and the polynomial time hierarchy, Mathematical Systems Theory (1984), 17:13-27.
[Gre12] B. Green, On (not) computing the Möbius function using bounded depth circuits, http://arxiv.org/pdf/1103.4991.pdf (2012).
[Has86] J. Håstad, Computational limitations for small depth circuits, Ph.D. thesis, MIT Press (1986).
[KLL84] R. Kannan, A. K. Lenstra, and L. Lovász, Polynomial factorization and nonrandomness of bits of algebraic and some transcendental numbers, STOC (1984), 191-200.
[Kat86] I. Kátai, Distribution of digits of primes in q-ary canonical form, Acta Math. (1986), 47(3-4):341-359.
[LN90] N. Linial and N. Nisan, Approximate inclusion-exclusion, Combinatorica (1990), 10(4):349-365.
[Nis91] N. Nisan, Pseudorandom bits for constant depth circuits, Combinatorica (1991), 11(1):63-70.
[NW94] N. Nisan and A. Wigderson, Hardness vs. randomness, Journal of Computer and System Sciences (1994), 49(2):149-167.
[OSB99] A. Oppenheim, R. Schafer, and J. Buck, Discrete-Time Signal Processing, Prentice Hall (1999).
[Vaz86] U. V. Vazirani, Randomness, Adversaries and Computation, Ph.D. thesis, EECS, UC Berkeley (1986).
[Yao82] A. C. Yao, Theory and application of trapdoor functions, IEEE Symposium on Foundations of Computer Science (1982), 80-91.
[Yao85] A. C. Yao, Separating the polynomial-time hierarchy by oracles, Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science (1985), 1-10.
