
Bounded Independence Plus Noise

by

Chin Ho Lee

A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of Doctor of Philosophy in Computer Science

Northeastern University
August 2019

© 2019 Chin Ho Lee
All rights reserved.

To my parents

Abstract

Bounded Independence Plus Noise

by

Chin Ho Lee
Doctor of Philosophy
Khoury College of Computer Science
Northeastern University

Derandomization is a fundamental research area in theoretical computer science. In the past decade, researchers have been able to derandomize a number of computational problems, leading to breakthrough discoveries such as the proof that SL = L and explicit constructions of Ramsey graphs and optimal-rate error-correcting codes. Bounded-independence and small-bias distributions are two pseudorandom primitives that are used extensively in derandomization. This thesis studies the power of these primitives under the perturbation of noise. We give positive and negative results on these perturbed distributions. In particular, we show that they are significantly more powerful than the unperturbed ones, and have the potential to resolve long-standing open problems such as proving RL = L and AC0[⊕] lower bounds. As applications, we give new lower bounds on the complexity of decoding error-correcting codes, nearly-optimal pseudorandom generators for old and new classes of tests, and limitations on the sum of small-bias distributions.

Acknowledgements

Foremost, I would like to thank Manu, Daniel, Jon and Omer for taking the time to serve on my thesis committee. This thesis would have been impossible without the wise guidance of Manu. I am truly grateful for his patience during my six years of PhD. His unique perspectives on many matters, whether related to research or not, have made a great impact on my life. His striving for simplicity and his clarity of writing will always be examples for me to pursue in the future. I thank Ravi Boppana, Elad Haramaty, Johan Håstad and Manu, with whom I have collaborated on several results related to this thesis, for sharing their ideas with me. I have learned a great deal about doing research through these collaborations. I am extremely grateful to Amnon Ta-Shma for hosting me at Tel Aviv University, and to Dean Doron and Gil Cohen for many stimulating discussions during my visit. I thank for his excellent course on pseudorandomness at Harvard. My understanding of pseudorandomness would not have been the same without it. I thank Andrej Bogdanov for teaching me Fourier analysis during my master's, a tool that is used extensively in this thesis. I also thank Andrej for being available for discussions whenever I went home for a visit, and for giving me the opportunities to give talks at the theory seminars at CUHK. My PhD life would have been miserable without my friends at Harvard and Khoury, in particular those at MD138 and WVH266. I thank them for keeping the office spaces full of positive energy, and for organizing all kinds of activities to put my mind at ease whenever research got me down. Finally, I thank my beloved Sabrina for her endless support, patience and love from the other side of the world.

Contents

Abstract
Acknowledgements
Table of Contents

1 Introduction
  1.1 Contribution of this thesis
  1.2 Organization of this thesis

2 Bounded Independence Plus Noise Fools Products
  2.1 Our results
    2.1.1 Application: The complexity of decoding
    2.1.2 Application: Pseudorandomness
    2.1.3 Techniques
  2.2 Bounded independence plus noise fools products
    2.2.1 Preliminaries
    2.2.2 Proof of Theorem 2.22
  2.3 Proofs for Section 2.1.1
  2.4 Pseudorandomness, I
  2.5 Pseudorandomness, II
    2.5.1 Proof of Theorem 2.38
    2.5.2 A recursive generator
  2.6 Pseudorandomness, III
  2.7 A lower bound on b and η

3 Pseudorandom Generators for Read-Once Polynomials
  3.1 Our results
    3.1.1 Techniques
  3.2 Bounded independence plus noise fools products
  3.3 Pseudorandom generators
  3.4 On almost k-wise independent variables with small total-variance
    3.4.1 Preliminaries
    3.4.2 Proof of Lemma 3.12
  3.5 Improved bound for bounded independence plus noise fools products
    3.5.1 Noise reduces variance of bounded complex-valued functions
    3.5.2 XOR Lemma for bounded independence
    3.5.3 Proof of Theorem 3.9
    3.5.4 Proof of Theorem 3.11
  3.6 Small-bias plus noise fools degree-2 polynomials
  3.7 Proof of Claim 3.8
  3.8 Moment bounds for sum of almost d-wise independent variables

4 Fourier Bounds and Pseudorandom Generators for Product Tests
  4.1 Our results
    4.1.1 Techniques
  4.2 Fourier spectrum of product tests
    4.2.1 Schur-concavity of g
    4.2.2 Lower bound
  4.3 Pseudorandom generators
    4.3.1 Generator for product tests
    4.3.2 Almost-optimal generator for XOR of Boolean functions
  4.4 Level-d inequalities

5 Some Limitations of the Sum of Small-Bias Distributions
  5.1 Our results
  5.2 Our techniques
  5.3 Our counterexamples
    5.3.1 General circuits
    5.3.2 NC2 circuits
    5.3.3 One-way log-space computation
    5.3.4 Depth 3 circuits, DNF formulas and AC0 circuits
    5.3.5 Mod 3 linear functions
    5.3.6 Sum of k copies of small-bias distributions
  5.4 Mod 3 rank of k-wise independence
    5.4.1 Lower bound for almost k-wise independence
    5.4.2 Pairwise independence
  5.5 Complexity of decoding

Bibliography

A Fooling read-once DNF formulas

Chapter 1

Introduction

The theory of pseudorandomness studies explicit constructions of objects that appear random to restricted classes of tests. It has numerous connections to computer science and mathematics, including algorithms, computational complexity, and combinatorics. In particular, the study of pseudorandomness is indispensable in understanding the power of randomness in computation, a fundamental research area in theoretical computer science. While it is known that certain tasks in areas such as cryptography and distributed computing are impossible without randomness, researchers have been able to derandomize a number of computational problems, showing that randomness often does not give us significant savings in computational resources over determinism.

One central open question in derandomization is the BPP vs. P question. It asks whether probabilistic polynomial-time algorithms can be made deterministic without a drastic slowdown in their running time. This question is largely open, as resolving it would imply circuit lower bounds that seem beyond reach given our current techniques [IKW02, KI04]. Because of this, research in derandomization can be divided into two directions.

• Conditional results: One line of research, pioneered by Blum and Micali [BM84], and Yao [Yao82], shows that derandomization can be realized under the assumption that hard functions exist. This approach has found a lot of success in cryptography, where cryptographic primitives are constructed based on the intractability of several specific computational problems. Indeed, this approach was first proposed for cryptographic applications, and the idea of trading hardness for randomness was first suggested by Shamir [Sha81], who constructed pseudorandom sequences assuming the hardness of the RSA encryption function. The seminal work of Nisan and Wigderson [NW94] shows that derandomizing BPP is possible under weaker complexity assumptions. This was followed by the work of Impagliazzo and Wigderson [IW99], which showed that if some problem in the class E requires circuits of super-polynomial size to approximate, then BPP = P. Since then, researchers have tried to optimize the hardness and randomness trade-off.

• Unconditional results: Another line of research turns to derandomizing restricted classes of computational models for which we can prove unconditional lower bounds.

Two major classes of tests that have received a lot of attention since the late 80s are constant-depth circuits [AW89] and space-bounded computation [AKS87]. A major open problem in derandomizing space-bounded computation is the RL vs. L question, a space analogue of BPP vs. P, which asks whether randomized logarithmic-space computation can be made deterministic without a drastic blow-up in space. In contrast to proving BPP = P, currently no major "obstacle" is known to resolving this problem.

The focus of this thesis will be on unconditional results. In this direction, several fascinating results were discovered in the past two decades.

• Primality testing: It was shown in the 70s that there exist randomized polynomial-time algorithms that determine if a given integer is prime [Rab80, SS77]. In 2002, Agrawal, Kayal and Saxena [AKS04] showed that primality testing can be solved deterministically in polynomial time.

• Undirected s-t connectivity: A randomized polynomial-time algorithm was given in the 70s to decide if two points are connected in an undirected graph [AKL+79]. In 2005, Reingold [Rei05] showed that this problem can be solved deterministically in polynomial time, proving SL = L (see also [RV05]).

• Ramsey graphs: In one of the first applications of the probabilistic method, Erdős [Erd47] showed that random graphs on n vertices are (2 log n)-Ramsey, meaning they have no clique or independent set of size 2 log n. The recent breakthrough results of Chattopadhyay and Zuckerman [CZ16], and Cohen [Coh16] used tools in pseudorandomness to construct explicit graphs that are $2^{(\log\log n)^{O(1)}}$-Ramsey.

A standard approach in derandomization is the construction of pseudorandom generators (PRGs), a fundamental object in pseudorandomness. Indeed, at least the latter two items above are proved using pseudorandom generators. A pseudorandom generator is a deterministic efficient algorithm that takes a few random bits, called the seed, as input, and stretches them to a longer output that fools a class of tests, meaning that no test can distinguish the output distribution of the generator (over a uniform seed) from truly random. One way to derandomize a randomized algorithm is to enumerate the seeds of the generator, and simulate the algorithm using the outputs of the generator as truly random bits.

Two primitives that are used extensively as building blocks in constructing pseudorandom generators are bounded-independence and small-bias distributions, introduced by Carter and Wegman [CW79], and Naor and Naor [NN93], respectively. These primitives alone are often not sufficient for many applications, and researchers have proposed different ways of combining them to fool different classes of tests. In this thesis we focus on two of them.

In the late 80s, Ajtai and Wigderson [AW89] constructed the first pseudorandom generator for polynomial-size constant-depth circuits. Their construction relies on the fact that circuits become simplified under random restrictions, a procedure that selects a random subset of the input bits and sets their values to random, and that the simplified circuits can be fooled by bounded independence. To obtain their generator, they apply this argument recursively. This approach was revived and refined recently in the work of [GMR+12], and since then has been used extensively to construct pseudorandom generators for various classes of tests [TX13, GKM18, GY14, RSV13, SVW17, HT18, CHRT18, FK18, MRT18, CSV15, ST18, DHH18].

One notable result is the construction of generators by Meka, Reingold and Tal [MRT18], which fool the class of width-3 read-once branching programs, improving on the $O(\log^2 n)$ seed length of the generators from the 90s [Nis92, INW94].

In the last decade, Bogdanov and Viola [BV10a] proposed taking the sum of independent copies of small-bias distributions to fool $\mathbb{F}_2$-polynomials. This class of distributions is known to be significantly more powerful than a single copy, and was shown to give optimal PRGs for $\mathbb{F}_2$-polynomials of low degree [Lov09, Vio09c]. It is an open question whether this approach fools polynomials of higher degree. If the construction worked for every degree $d = \log^{O(1)} n$, it would make progress on long-standing open problems in circuit complexity regarding constant-depth circuits with parity gates [Raz87]. This question is implicit in the works [BV10a, Lov09, Vio09c] and explicit in the survey [Vio09b, Chapter 1] (Open question 4).

Besides $\mathbb{F}_2$-polynomials, Reingold and Vadhan (personal communication) asked whether there exists a constant c such that the sum of two independent copies of any $n^{-c}$-biased distribution fools one-way logarithmic-space computation, also known as read-once polynomial-width branching programs, which would imply RL = L. It is known that a small-bias distribution fools width-2 read-once branching programs (Saks and Zuckerman; see also [BDVY13], where a generalization is obtained). However, no such result is known for width-3 programs.

1.1 Contribution of this thesis

In this thesis, we study the two aforementioned proposals and attack them through the following seemingly unrelated natural question.

What is the power of bounded independence if we perturb it with noise?

One starting point of this thesis is the following observation. It is known that both bounded-independence and small-bias distributions fail to fool certain very simple tests. For example, it is well known that bounded independence completely fails to fool the parity function. Consider the distribution D that is uniform over n-bit strings with parity 0. It is straightforward to see that D is (n−1)-wise independent. Yet the parity under D is always 0, whereas under the uniform distribution the parity equals 1 with probability 1/2. Likewise, one can show that small-bias distributions do not fool the mod 3 function, the language that contains all n-bit strings with Hamming weight divisible by 3, because the uniform distribution over these strings has small bias. However, both examples above break completely if we perturb just a few bits of the distribution randomly, i.e., set a few bits to uniform. For parity, it suffices to perturb just one bit and the expectation of parity will be the same as under uniform. Similarly, if we perturb a few bits of the input, then the expectation of mod 3 will be exponentially close to uniform in the number of perturbed bits. Thus, these primitives appear to become more powerful under the perturbation of noise, and the goal of this thesis is to understand the power of such distributions from various angles. We obtain both positive and negative results.
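Before moving on, here is a quick numerical check of the parity observation above (an illustrative sketch of ours, not from the thesis; the parameters n = 8 and η = 0.25 are arbitrary). Sampling from the even-parity distribution D gives expectation 1 for parity, while rerandomizing each bit independently with probability η brings it down to $(1-\eta)^n$, exponentially close to the uniform value 0.

```python
import random

def expect_parity(sampler, trials=100_000):
    """Estimate E[(-1)^{x_1 + ... + x_n}] over samples from sampler()."""
    return sum((-1) ** (sum(sampler()) % 2) for _ in range(trials)) / trials

n, eta = 8, 0.25  # illustrative parameters

def even_parity():
    """Uniform over n-bit strings of parity 0: an (n-1)-wise independent D."""
    x = [random.randrange(2) for _ in range(n - 1)]
    return x + [sum(x) % 2]

def with_noise(sampler):
    """Rerandomize each bit independently with probability eta."""
    return lambda: [random.randrange(2) if random.random() < eta else b
                    for b in sampler()]

print(expect_parity(even_parity))              # 1.0: D fails to fool parity
print(expect_parity(with_noise(even_parity)))  # ~ (1 - eta)^n
print((1 - eta) ** n)                          # 0.1001...
```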

Positive results. We study the power of bounded independence plus noise through a new class of tests called product tests. These tests are products of k functions $f_i : \mathbb{Z}_m^\ell \to \mathbb{C}_{\leq 1}$, where the inputs of the $f_i$ are disjoint subsets of the n variables and $\mathbb{C}_{\leq 1}$ is the complex unit disk. Our main positive result shows that if we add a little noise to any distribution with bounded independence, or with small bias, then the distribution fools product tests with good error. This has found applications in constructing pseudorandom generators and in proving lower bounds on the complexity of decoding error-correcting codes. Along the way, we obtain several structural results about product tests, such as bounds on their Fourier spectrum and on their distinguishing advantage in the coin problem, which play a critical role in obtaining a pseudorandom generator for these tests with a seed length that is close to optimal.

Negative results. Using a connection between bounded independence and linear error-correcting codes, we give a framework for exhibiting several classes of tests that can distinguish bounded independence plus noise from uniform. With this framework and another, completely different approach, we construct several counterexamples showing some limitations of the sum of small-bias distributions.

Several ideas in this thesis have played a role in the recent developments on derandomizing space-bounded computation. We gave the first generator with seed length better than $O(\log^2 n)$ for the class of read-once $\mathbb{F}_2$-polynomials. This class of tests was an obstacle to constructing better pseudorandom generators for read-once constant-width branching programs, as noted by several researchers including Trevisan [Tre10] and Vadhan (personal communication). Subsequently, several ideas in our analyses were adapted by Meka, Reingold and Tal to construct a generator for width-3 read-once branching programs with seed length better than the $O(\log^2 n)$ seed length from the 90s. At about the same time, Forbes and Kelly [FK18], by applying a small but elegant modification to one of our analyses, simplified and improved the analysis in [MRT18], and obtained a pseudorandom generator for unordered polynomial-width read-once branching programs with seed length $O(\log^3 n)$, improving the previous constructions of generators in the unordered setting. In addition, using a generator in this thesis, Viola [?] recently gave the first lower bound for randomized Turing machines with a read-only input tape. Kopparty, Shaltiel, and Silbak [KSS19] constructed list-decodable codes against space-bounded channels whose encoding and decoding algorithms run in quasilinear time.

1.2 Organization of this thesis

Chapter 2. We formally define bounded independence, noise, and product tests, and show that bounded independence plus noise fools product tests. We develop two applications of this type of result. First, we prove communication lower bounds for decoding noisy codewords of length n split among k parties. For Reed–Solomon codes of dimension n/k where k = O(1), we show that $\Omega(\eta n) - O(\log n)$ bits of communication are required to decode one message symbol from a codeword with $\eta n$ errors, while communication $O(\eta n \log n)$ suffices.

Second, we obtain pseudorandom generators. We can ε-fool product tests $f : \{0,1\}^n \to \mathbb{C}_{\leq 1}$ under any permutation of the bits with seed lengths $2\ell + \tilde{O}(k^2 \log(1/\varepsilon))$ and $O(\ell) + \tilde{O}(\sqrt{\ell k \log(1/\varepsilon)})$. Previous generators have seed lengths $\geq \ell k/2$ or $\geq \ell\sqrt{\ell k}$. For the special case where the k bounded functions have range $\{0,1\}$, the previous generators have seed length $\geq (\ell + \log k)\log(1/\varepsilon)$.

Chapter 3. We give a different analysis showing that bounded independence plus noise fools product tests. We then construct pseudorandom generators with improved seed lengths for several classes of tests. First we consider the class of read-once polynomials over GF(2) in n variables. For error ε we obtain seed length $\tilde{O}(\log(n/\varepsilon))\log(1/\varepsilon)$, where $\tilde{O}$ hides lower-order terms. This is optimal up to the factor $\tilde{O}(\log(1/\varepsilon))$. The previous best seed length was polylogarithmic in n and 1/ε.

Second we consider product tests $f : \{0,1\}^n \to \mathbb{C}_{\leq 1}$. In this chapter, we obtain seed length $\ell \cdot \mathrm{poly}\log(n/\varepsilon)$. This implies better generators for other classes of tests. Moreover, if the $f_i$ have output range independent of ℓ and k (e.g. $\{-1,1\}$) then we obtain seed length $\tilde{O}(\ell + \log(k/\varepsilon))\log(1/\varepsilon)$. This is again optimal up to the factor $\tilde{O}(\log(1/\varepsilon))$, while the seed length in the previous chapter is $\geq \sqrt{k}$.

Chapter 4. We study the Fourier spectrum of product tests $f : \{0,1\}^{\ell k} \to \{-1, 0, 1\}$. We prove that for every positive integer d,
$$\sum_{S \subseteq [\ell k] : |S| = d} |\hat{f}_S| = O\big(\min\{\ell, \sqrt{\ell \log(2k)}\}\big)^d.$$
Our upper bounds are tight up to a constant factor in the $O(\cdot)$. Our proof uses Schur-convexity, and builds on a new "level-d inequality" that bounds $\sum_{|S| = d} \hat{f}_S^2$ from above for any [0,1]-valued function f in terms of its expectation, which may be of independent interest. As a result, we construct pseudorandom generators for product tests with seed length $\tilde{O}(\ell + \log(k/\varepsilon))$, which is optimal up to polynomial factors in $\log \ell$, $\log\log k$ and $\log\log(1/\varepsilon)$. We also extend our results to product tests whose range is [−1, 1].

Chapter 5. We present two approaches to constructing ε-biased distributions D on n bits and functions $f : \{0,1\}^n \to \{0,1\}$ such that the XOR of two independent copies (D + D) does not fool f. Using them, we give constructions for any of the following choices:
1. $\varepsilon = 2^{-\Omega(n)}$ and f is in P/poly;
2. $\varepsilon = 2^{-\Omega(n/\log n)}$ and f is in NC2;
3. $\varepsilon = n^{-c}$ and f is a one-way space-O(c log n) algorithm, for any c;
4. $\varepsilon = n^{-\Omega(1)}$ and f is a mod 3 linear function.
All the results give one-sided distinguishers, and extend to the XOR of more copies for suitable ε. We also give conditional results for AC0 and DNF formulas.

Chapter 2

Bounded Independence Plus Noise Fools Products

At least since the seminal work [CW79], the study of bounded independence has received a lot of attention in theoretical computer science. In particular, researchers have analyzed various classes of tests that cannot distinguish distributions with bounded independence from uniform. Such tests include (combinatorial) rectangles [EGL+98] (cf. [CRS00]), bounded-depth circuits [Baz09, Raz09, Bra10, Tal17], and halfspaces [DGJ+10, GOWZ10, DKN10], to name a few. We say that such tests are fooled by distributions with bounded independence.

Definition 2.1 (Bounded independence). A distribution D over $\mathbb{Z}_m^n$ is b-wise independent, or b-uniform, if any b symbols of D are uniformly distributed over $\mathbb{Z}_m^b$.

In this thesis we consider fooling a new class of tests called product tests. These are functions which can be written as a product of arbitrary bounded functions defined on disjoint inputs.

Definition 2.2 (Product tests). A function $f : \mathbb{Z}_m^n \to \mathbb{C}_{\leq 1}$ is a product test with k functions of input length ℓ if there exist k disjoint subsets $I_1, I_2, \ldots, I_k \subseteq \{1, 2, \ldots, n\}$ of size ≤ ℓ such that $f(x) = \prod_{i \leq k} f_i(x_{I_i})$ for some functions $f_i$ with range $\mathbb{C}_{\leq 1}$. Here $\mathbb{C}_{\leq 1}$ is the complex unit disk $\{z \in \mathbb{C} : |z| \leq 1\}$ and $x_{I_i}$ are the $|I_i|$ bits of x indexed by $I_i$.
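As an illustration, the following sketch (ours, not from the thesis) builds a product test over m = 2 from disjoint index sets, with one {−1, 1}-valued factor and one factor taking values on the complex unit circle:

```python
import cmath

def product_test(subsets, fns):
    """f(x) = prod_i f_i(x restricted to I_i), for disjoint subsets I_i."""
    def f(x):
        val = 1 + 0j
        for I, fi in zip(subsets, fns):
            val *= fi(tuple(x[j] for j in I))
        return val
    return f

# k = 2 functions of input length l = 3 over disjoint subsets of 6 bits.
f = product_test(
    subsets=[(0, 2, 4), (1, 3, 5)],
    fns=[lambda b: (-1) ** (sum(b) % 2),                    # range {-1, 1}
         lambda b: cmath.exp(2j * cmath.pi * sum(b) / 8)])  # range in the unit disk
print(f((1, 0, 1, 1, 0, 0)))  # a product of values in the unit disk
```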

Throughout this thesis, we will often restrict the range of each $f_i$ to a subset R of $\mathbb{C}_{\leq 1}$. For example, R can be the real interval [−1, 1], or the sets {0, 1} and {−1, 1}. We will sometimes speak of R-product tests to specify the range of the product tests under discussion. We note that these tests make sense already for ℓ = 1 and large m (and in fact, as we will see, they have been considered for such parameters in the literature). But it is essential for our applications that the input set of the $f_i$ has a product structure, so we think of ℓ as being large. We can choose m = 2 for almost all of our results. In this case, each $f_i$ simply has domain $\{0,1\}^\ell$. The class of product tests was first introduced by Gopalan, Kane and Meka under the name of Fourier shapes [GKM18]. However, in their definition, the subsets $I_i$ are fixed. Motivated by the recent constructions of pseudorandom generators against unordered tests,

which are tests that read input bits in arbitrary order [BPW11, IMZ19, RSV13, SVW17], we consider the generalization in which the subsets $I_i$ can be arbitrary as long as they are of bounded size and pairwise disjoint. Constructing generators handling arbitrary order is significantly more challenging, because the classical space-bounded generators such as Nisan's [Nis92] only work in fixed order [Tzu09, BPW11].

Product tests include as special cases several classes of tests which have been studied in the literature. Specifically, when the range of the functions $f_i$ is {0, 1}, product tests correspond to the AND of disjoint Boolean functions, also known as the well-studied class of combinatorial rectangles [AKS87, Nis92, NZ96, INW94, EGL+98, ASWZ96, Lu02, Vio14, GMR+12, GY14].

Definition 2.3 (Combinatorial rectangles). A combinatorial rectangle is a product test where each $f_i$ has output in {0, 1}.

Product tests also generalize some other classes of tests. For example, {−1, 1}-product tests correspond to the XOR of disjoint Boolean functions, also known as the class of combinatorial checkerboards [Wat13]. The work [GKM18] highlights the unifying role of $\mathbb{C}_{\leq 1}$ product tests by showing that any distribution that fools product tests also fools a number of other tests considered in the literature, including generalized halfspaces [GOWZ10] and combinatorial shapes [GMRZ13, De15]. Product tests can also be generalized to capture the important class of read-once space computation. Specifically, Reingold, Steinke and Vadhan [RSV13] showed that the class of read-once width-w branching programs can be encoded as product tests with outputs in $\{0,1\}^{w \times w}$, the set of w × w Boolean matrices.

Bounded independence vs. products. A moment of thought reveals that bounded independence completely fails to fool product tests. Indeed, note that the parity function on n := ℓk bits is a product test: set m = 2 and let each of the k functions compute the parity of its ℓ-bit input, with output in {−1, 1}. However, consider the distribution D which is uniform on n−1 bits and has the last bit equal to the parity of the first n−1 bits. D has independence n−1, which is just one short of maximum. And yet the expectation of parity under D is 1, whereas the expectation of parity under uniform is 0.

The parity counterexample is the simplest example of a general obstacle which has more manifestations. For another example, define $g_i := (1 - f_i)/2$, where the $f_i$ are as in the previous example. Each $g_i$ has range in {0, 1}, and so $\prod_i g_i$ is a combinatorial rectangle. But the expectations of $\prod_i g_i$ under D and uniform differ by $2^{-k}$. This error is too large for the applications in communication complexity and streaming where we have to sum over $2^k$ rectangles. Indeed, jumping ahead, having a much lower error is critical for our applications. Finally, the obstacle arises even if we consider distributions with small bias [NN93] instead of bounded independence. Indeed, the uniform distribution D over n bits whose inner product modulo 2 is one has bias $2^{-\Omega(n)}$, but inner product is a nearly balanced function which can be written as a product, implying that its expectations under D and uniform differ by 1/2 − o(1).

The starting point of this thesis is the observation that all these examples break completely if we perturb just a few bits of D randomly.

Definition 2.4 (Noise). We denote by N(m, n, η) the noise distribution over $\mathbb{Z}_m^n$ where the symbols are independent and each of them is set to uniform with probability η and is 0 otherwise. We simply write N when the parameters are clear from the context.

For parity, it suffices to perturb one bit and the expectation under D will be 0. For inner product, the distance between the expectations shrinks exponentially with the number of perturbed bits.
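In code, a draw from N(m, n, η) and the perturbation D + N look as follows (a minimal sketch; the choice to represent symbols as Python ints is ours):

```python
import random

def noise_sample(m, n, eta):
    """One draw from N(m, n, eta): each symbol independently is uniform
    over Z_m with probability eta, and 0 otherwise."""
    return [random.randrange(m) if random.random() < eta else 0
            for _ in range(n)]

def perturb(x, m, eta):
    """D + N: add a fresh noise vector to x, coordinate-wise mod m."""
    return [(xi + ni) % m
            for xi, ni in zip(x, noise_sample(m, len(x), eta))]
```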

2.1 Our results

The main result in this thesis is that this is a general phenomenon: if we add a little noise to any distribution with bounded independence, or with small bias, then the resulting distribution fools product tests with good error bounds. We first state the results for bounded independence.

Theorem 2.5 (Bounded independence plus noise fools products). Let $f_1, \ldots, f_k : \mathbb{Z}_m^\ell \to \mathbb{C}_{\leq 1}$ be k functions with $\mu_i = \mathbb{E}[f_i]$. Set n := ℓk. Let b ≥ ℓ and let D be a b-uniform distribution over $\mathbb{Z}_m^n$. Let N be the noise distribution from Definition 2.4. Write $D = (D_1, D_2, \ldots, D_k)$ where each $D_i$ is in $\mathbb{Z}_m^\ell$, and similarly for N. Then
$$\Big|\mathbb{E}\Big[\prod_{i \leq k} f_i(D_i + N_i)\Big] - \prod_{i \leq k} \mu_i\Big| \leq k(1-\eta)^{\Omega(b^2/n)}.$$

In Section 2.2 we prove a more general result that applies to distributions which are almost b-wise independent [NN93], and to a wider range of parameters. In that section we also point out that Theorem 2.5 is essentially tight when b = Ω(n). It is an interesting question whether the bounds are tight even for b = o(n), and we will come back to this question in the next chapter. We stress that the k points $D_1, D_2, \ldots, D_k \in \mathbb{Z}_m^\ell$ in the theorem may not even be pairwise independent; only the n symbols of D are b-wise independent. Also note that the theorem is meaningful for a wide range of the noise parameter η: we can have η constant, which means that we are perturbing a constant fraction of the symbols, or we can have η = O(1/n), which means that we are only perturbing a constant number of symbols, just like in the observation mentioned above. To illustrate this setting, consider for example k = O(1) and b = ℓ. We can obtain an error bound of ε by setting η = c/n for a c that depends only on ε.

We note that a noise vector can be equivalently viewed as a random restriction. With this interpretation, our results show that on average a random restriction of a product test is a function f′ that is simpler in the sense that f′ is fooled by any (ℓ, δ)-biased distribution, for certain values of δ. (The latter property has equivalent formulations in terms of the Fourier coefficients of f′, see [Baz09].) Thus, our results fall in the general theme "restrictions simplify functions" that has been mainstream in complexity theory since at least the work of Subbotovskaya [Sub61]. For an early example falling in this theme, consider AC0 circuits. There are distributions with super-constant independence which do not fool AC0 circuits of

bounded depth and polynomial size. (Take the uniform distribution conditioned on the parity of the first log n bits being 1, and use the fact that such circuits can compute parity on log n bits.) On the other hand, the switching lemma [FSS84, Ajt83, Yao85, Hås, Hås14, IMP] shows that randomly restricting all but a 1/polylog fraction of the variables collapses the circuit to a function that depends only on c = O(1) variables, and such a function is fooled by any c-wise independent distribution. Thus, adding noise dramatically reduces the amount of independence that is required to fool AC0 circuits. For a more recent example, Lemma 7.2 in [GMR+12] shows that for a special case of AC0 circuits (read-once CNF formulas) one can restrict all but a constant fraction of the variables, after which the resulting function is fooled by any ε-biased distribution for a certain $\varepsilon = 1/\ell^{\omega(1)}$, which is larger than the bias that would be required had we not applied a restriction.

We are not aware of prior work which applies to arbitrary functions as in our theorems. Another difference between our results and all the previous works that we are aware of lies in the parameter η. In previous works η is large, in particular η = Ω(1), which corresponds to restricting many variables. We can instead set η arbitrarily, and this flexibility is used in both of our applications.

2.1.1 Application: The complexity of decoding

Error-correcting codes are a fundamental concept with myriad applications in computer science. It is relevant to several of these, and perhaps also natural, to ask what is the complexity of basic procedures related to error-correcting codes. In this chapter we focus on decoding. The question of the complexity of decoding has already been addressed in [BYRST02, Baz05, Gro06]. However, all previous lower bounds that we are aware of are perhaps not as strong as one may hope. First, they provide no better negative results for decoding than for encoding. But common experience shows that decoding is much harder! Second, they do not apply to decision problems, but only to multi-output problems such as computing the entire message. Third, they apply to small-space algorithms but not to stronger models such as communication protocols.

In this chapter we obtain new lower bounds for decoding which overcome these limitations. First, we obtain much stronger bounds for decoding than for encoding. For example, we prove below that decoding a message symbol from a Reed–Solomon codeword of length q with Ω(q) errors requires Ω(q) communication. On the other hand, encoding is a linear map, and so one can compute any symbol with just O(log q) communication (or space). This exponential gap may provide a theoretical justification for the common experience that decoding is harder than encoding. Second, our results apply to decision problems. Third, our results apply to models stronger than space-bounded algorithms. Specifically, our lower bounds are proved in the k-party "number-in-hand" communication complexity model, where each of k collaborating parties receives a disjoint portion of the input. The parties communicate by broadcast (a.k.a. writing on a blackboard). For completeness we give next a definition. Although we only define deterministic protocols, our lower bounds in fact bound the correlation between such protocols and the hard problem, and so also hold for distributions over protocols (a.k.a. allowing the parties to share a random string).

Definition 2.6 (Number-in-hand protocols). A k-party number-in-hand, best-partition, communication protocol for a function $f : \mathbb{Z}_m^n \to Y$, where k divides n, is given by a partition of {1, 2, ..., n} into k sets $S_1, S_2, \ldots, S_k$ of equal size n/k, and a binary tree. Each internal node v of the tree is labeled with a set $S_v \in \{S_1, S_2, \ldots, S_k\}$ and a function $f_v : \mathbb{Z}_m^{n/k} \to \{0,1\}$, and has two outgoing edges labeled 0 and 1. The leaves are labeled with elements of Y. On input $x \in \mathbb{Z}_m^n$ the protocol computes $y \in Y$ by following the root-to-leaf path where from node v we follow the edge labeled with the value of $f_v$ on the n/k symbols of x corresponding to $S_v$. The communication cost of the protocol is the depth of the tree.

Note that we insisted that k divides n, but all the results can be generalized to the case when this does not hold. However this small additional generality makes the statements slightly more cumbersome, so we prefer to avoid it. Jumping ahead, for Reed–Solomon codes this will mean that the claims do not apply as stated to prime fields (but again can be modified to apply to such fields). Again for completeness, we give next a definition of space-bounded algorithms. For simplicity we think of the input as being encoded in bits.

Definition 2.7 (One-way, bounded-space algorithm). A width-w (a.k.a. space-(log w)) one-way algorithm (or branching program, or streaming algorithm) on n bits consists of a layered directed graph with n+1 layers. Each layer has w nodes, except the first layer, which has one node. Each node in layer i ≤ n has two outgoing edges, labeled 0 and 1, connecting to nodes in layer i+1. Each node on layer n+1 is labeled with an output element. On an n-bit input, the algorithm follows the path corresponding to the input, reading the input in a one-way fashion (so layer i reads the i-th input bit), and then outputs the label of the last node. (For Boolean outputs it is sufficient for the last layer to have 2 nodes.)
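For concreteness, here is a minimal evaluator for such a program, together with a width-2 instance computing parity (an illustrative sketch of ours; for simplicity it gives every layer, including the first, w nodes, with node 0 as the start):

```python
def run_one_way(edges, labels, x):
    """Follow the root-to-output path of a one-way branching program.
    edges[i][v][b] is the node of layer i+1 reached from node v on bit b."""
    v = 0  # start node of the first layer
    for i, b in enumerate(x):
        v = edges[i][v][b]
    return labels[v]

# Width-2 program for the parity of 3 bits:
# node 0 = "even so far", node 1 = "odd so far"; bit b moves v to v XOR b.
layer = ((0, 1), (1, 0))
edges = [layer, layer, layer]
print(run_one_way(edges, labels=(0, 1), x=(1, 0, 1)))  # parity = 0
```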

We note that a space-s one-way algorithm can be simulated by a k-party protocol with communication sk. Thus our negative results apply to space-bounded algorithms as well. In fact, this simulation only uses one-way communication and a fixed partition (corresponding to the order in which the algorithm reads the input). But our communication lower bounds hold even for two-way communication and for any partition of the input into k parties, as in Definition 2.6. Our lower bound holds when the uniform distribution over the code is b-uniform.

Definition 2.8. A code $C \subseteq \mathbb{F}_q^n$ is b-uniform if the uniform distribution over C is b-uniform.

The following standard fact relates the above definition to the dual distance of the code.

Fact 2.9. Let X be the uniform distribution over a linear code $C \subseteq \mathbb{F}_q^n$. Then X is d-wise independent if and only if the dual of C has minimum distance ≥ d+1.

We state next a lower bound for distinguishing a noisy codeword from uniform. The “-1” in the assumption b ≥ n/k − 1 will be useful in Theorem 2.12.

Theorem 2.10 (Distinguishing noisy codewords from uniform is hard). Let $C \subseteq \mathbb{F}_q^n$ be a b-uniform code. Let N be the noise distribution from Definition 2.4. Let k be an integer dividing n such that b ≥ n/k − 1. Let $P : \mathbb{F}_q^n \to \{0,1\}$ be a k-party protocol using c bits of communication. Then

$$\Pr[P(C+N) = 1] - \Pr[P(U) = 1] \leq \varepsilon \quad \text{for } \varepsilon = 2^{c + \log n + O(1) - \Omega(\eta b/k)},$$

where C and U denote the uniform distributions over the code C and $\mathbb{F}_q^n$, respectively.

We now make some remarks on this theorem. First, we note that an (ℓk)-party protocol can be simulated by a k-party protocol, so in this sense the lower the number of parties, the stronger the lower bound. Also, the smallest number of parties to which the theorem can apply is k = n/b, because for k = n/b − 1 one can design b-uniform codes such that the distribution C + N can be distinguished well from uniform by just one party, cf. Section 2.7. And our lower bound applies for that number. The theorem is non-trivial whenever $b = \omega(\sqrt{n})$, but we illustrate it in the setting of b = Ω(n), which is typical in coding theory, as we are also going to discuss. In this setting we can also set k = n/b = O(1). Hence for ε ≥ 1/poly(n), the communication lower bound is

$$c \geq \Omega(\eta n)$$
when η ≥ C log n/n for a universal constant C. When η = Ω(1) this becomes Ω(n). Note that this bound is within an O(log q) factor of the bit-length of the input, which is O(n log q), and within a constant factor if q = O(1).

We prove an essentially matching upper bound in terms of η, stated next. The corresponding distinguisher is a simple variant of syndrome decoding which we call "truncated syndrome decoding." It can be implemented as a small-space algorithm with one-sided error, and works even against adversarial noise. So the theorems can be interpreted as saying that syndrome decoding uses an optimal amount of space. We denote by $V(t) = V_{m,n}(t)$ the volume of the m-ary Hamming ball in n dimensions of radius t, i.e., the number of $x \in \mathbb{Z}_m^n$ with at most t non-zero coordinates.
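This volume has the closed form $V_{m,n}(t) = \sum_{i \leq t} \binom{n}{i}(m-1)^i$: choose the at most t nonzero coordinates, then one of the m−1 nonzero values for each. A one-line check (ours):

```python
from math import comb

def hamming_ball_volume(m, n, t):
    """V_{m,n}(t): vectors in Z_m^n with at most t nonzero coordinates."""
    return sum(comb(n, i) * (m - 1) ** i for i in range(t + 1))

print(hamming_ball_volume(2, 10, 2))  # 1 + 10 + 45 = 56
```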

Theorem 2.11 (Truncated syndrome distinguishing). Let $C \subseteq \mathbb{F}_q^n$ be a linear code of dimension d. Given t and δ > 0, define $s := \lceil \log_q(V_{q,n}(t)/\delta) \rceil$. If d ≤ n − s, there is a one-way algorithm A that runs in space s log q such that:
1. for every x ∈ C and every e of Hamming weight ≤ t, A(x+e) = 1, and
2. Pr[A(U) = 1] ≤ δ, where U is uniform over $\mathbb{F}_q^n$.
Moreover, the space bound s log q is at most O(t log(nq/t)) + log(1/δ).
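A sketch of the distinguisher over $\mathbb{F}_2$ (our reconstruction from the statement, not the thesis's code): fix a parity-check matrix H of C, keep only its first s rows, and accept an input iff its truncated syndrome matches that of some error of weight ≤ t. Any x + e with x ∈ C has truncated syndrome equal to that of e, so it is accepted; a uniform input is accepted with probability at most $V(t)/2^s \leq \delta$. The streaming algorithm only needs the s bits of the running syndrome; the set of good syndromes is hard-wired.

```python
from itertools import combinations

def truncated_syndrome_distinguisher(H, t, s):
    """Return A(x) for a binary linear code with parity-check matrix H
    (a list of 0/1 rows). Only the first s rows of H are used."""
    Hs, n = H[:s], len(H[0])

    def syndrome(v):
        # Inner product of v with each of the s truncated check rows.
        return tuple(sum(r[j] & v[j] for j in range(n)) % 2 for r in Hs)

    # Truncated syndromes of all errors of weight <= t (precomputed,
    # i.e. hard-wired; the online state is just the s syndrome bits).
    good = set()
    for w in range(t + 1):
        for support in combinations(range(n), w):
            e = [0] * n
            for j in support:
                e[j] = 1
            good.add(syndrome(e))

    return lambda x: int(syndrome(x) in good)
```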

Note that when t = O(ηn) and δ is constant, the space bound is O(ηn log(q/η)), which matches our Ω(ηn) lower bound up to the O(log(q/η)) factor.

These results in particular apply to Reed–Solomon codes. Recall that a Reed–Solomon code of dimension b is the linear code where a message in $\mathbb{F}_q^b$ is interpreted as a polynomial p of degree b−1 and encoded as the q evaluations of p at every element of the field. (In

some presentations, the element 0 is excluded.) Such a code is b-uniform because for any b points $(x_i, y_i)$ where the $x_i$'s are distinct, there is exactly one polynomial p of degree b−1 such that $p(x_i) = y_i$ for every i.

For several binary codes $C \subseteq \mathbb{F}_2^n$ and constant η we can obtain a communication lower bound of Ω(n), which is tight up to constant factors. This is true, for example, for random linear codes (with bounded rate). The complexity of decoding such codes is intensely studied, also because the assumed intractability of their decoding is a basis for several cryptographic applications; see for example [BJMM12]. We also obtain a tight lower bound of Ω(n) for several explicitly-defined binary codes. For example, we can pick an explicit binary code $C \subseteq \mathbb{F}_2^n$ which is Ω(n)-uniform and that can be decoded in polynomial time for a certain constant noise parameter η (with high probability); see [Shp09] for a construction.

Lower bounds for decoding one symbol. The lower bound in Theorem 2.10 is for the problem of distinguishing noisy codewords from uniform. Intuitively, this is a strong lower bound saying that not even one bit of information can be obtained from a noisy codeword. We next use this result to obtain lower bounds for decoding one symbol of the message given a noisy codeword. Some care is needed because some message symbols may simply be copied into the codeword. This would allow one party to decode those symbols with no communication, even though the noisy codeword may be indistinguishable from uniform. The lower bound applies to codes that remain b-uniform even after fixing some input symbol. For such codes, a low-communication protocol cannot decode that symbol significantly better than by guessing at random.

Theorem 2.12. Let $C' \subseteq \mathbb{F}_q^n$ be a linear code with an n × r generator matrix G. Let $i \in \{1, 2, \ldots, r\}$ be an index, and let C be the code defined as $C := \{Gx \mid x_i = 0\}$. Let N be the noise distribution from Definition 2.4. Let k be an integer. Suppose that C is b-uniform for b ≥ n/k − 1. Let $P : \mathbb{F}_q^n \to \mathbb{F}_q$ be a k-party protocol using c bits of communication. Then

$$\Pr[P(GU + N) = U_i] \leq 1/q + \varepsilon,$$
where $U = (U_1, U_2, \ldots, U_r)$ is the uniform distribution and ε is as in Theorem 2.10.

We remark that whether C is b-uniform in general depends on both G and i. For example, let C′ be a Reed–Solomon code of dimension b = n/k. Recall that C′ is b-uniform. Note that if we choose i = 0 (corresponding to the evaluation of the polynomial at the point $0 \in \mathbb{F}_q$, which as we remarked earlier is a point we consider), then C has a fixed symbol and so is not even 1-uniform. On the other hand, if i = b then we obtain a Reed–Solomon code of dimension b−1, which is (b−1)-uniform, and the lower bound in Theorem 2.12 applies.

We again obtain an almost matching upper bound. In fact, the corresponding protocol recovers the entire message.

Theorem 2.13 (Recovering messages from noisy codewords). Let $C \subseteq \mathbb{F}_q^n$ be a code with distance d. Let t be an integer such that 2t < d, and let k be an integer dividing n. There is a k-party protocol $P : \mathbb{F}_q^n \to \mathbb{F}_q^n$ communicating $\max\{n - d + 2t + 1 - n/k,\, 0\} \cdot \lceil \log_2 q \rceil$ bits such that for every x ∈ C and every e of Hamming weight ≤ t, P(x+e) = x.

A Reed–Solomon code of dimension b has distance d = n − b + 1. Hence we obtain communication $\max\{b + 2t - n/k,\, 0\} \cdot \lceil \log_2 q \rceil$ for any t such that 2t < n − b + 1. This upper bound matches the lower bound in Theorem 2.12 up to a log q factor. For example, when k = O(1) and b = n/k our upper bound is O(ηn log q) for η = t/n, and our lower bound is Ω(ηn) − O(log n).

2.1.2 Application: Pseudorandomness

The construction of explicit pseudorandom generators against restricted classes of tests is a fundamental challenge that has received a lot of attention at least since the 80s, cf. [AW89, AKS87]. One class of tests extensively considered in the literature is concerned with algorithms that read the input bits in a one-way fashion in a fixed order. A leading goal is to prove RL = L by constructing generators with logarithmic seed length that fool one-way, space-bounded algorithms, but here the seminal papers [Nis92, INW94, NZ96] remain the state of the art and have larger seed lengths. However, somewhat better generators have been obtained for several special cases, including for example combinatorial rectangles [AKS87, Nis92, NZ96, INW94, EGL+98, ASWZ96, Lu02, Vio14, GMR+12, GY14], combinatorial shapes [GMRZ13, De15, GKM18], and product tests [GKM18].

In particular, for combinatorial rectangles $f : (\{0,1\}^\ell)^k \to \{0,1\}$ two incomparable results are known. For context, the minimal seed length up to constant factors is $O(\ell + \log(k/\varepsilon))$. One line of research culminating in [Lu02] gives generators with seed length $O(\ell + \log k + \log^{3/2}(1/\varepsilon))$. More recently, [GMR+12] (cf. [GY14]) improved the dependence on ε while making the dependence on the other parameters a bit worse: they achieve seed length $\tilde{O}(\ell + \log k + \log(1/\varepsilon))$. The latter result is extended to product tests in [GKM18] (with some other lower-order losses).

Recently there has been considerable interest in extending tests by allowing them to read the bits in any order [BPW11, BPW12, IMZ19, RSV13, SVW17]. This extension is significantly more challenging, and certain instantiations of generators against one-way tests are known to fail in this setting [BPW11]. We contribute new pseudorandom generators that fool product tests in any order.

Definition 2.14 (Fooling). A generator $G : \{0,1\}^s \to \{0,1\}^n$ ε-fools (or fools with error ε) a class T of tests on n bits if for every function f ∈ T we have $|\mathbb{E}[f(G(U_s))] - \mathbb{E}[f(U_n)]| \leq \varepsilon$, where $U_s$ and $U_n$ are the uniform distributions on s and n bits, respectively. We call s the seed length of G. We call G explicit if it is computable in time polynomial in n.

Definition 2.15 (Any order). We say that a generator $G : \{0,1\}^s \to \{0,1\}^n$ ε-fools a class T of tests in any order if for every permutation π on n bits the generator $\pi \circ G : \{0,1\}^s \to \{0,1\}^n$ ε-fools T.

The next theorem gives some of our generators. The notation $\tilde{O}(\cdot)$ hides logarithmic factors in k and ℓ. In this section we only consider alphabet size m = 2. We write the range $\{0,1\}^{\ell k}$ of the generators as $(\{0,1\}^\ell)^k$ to indicate the parameters of the product tests.

Theorem 2.16 (PRG for unordered products, I). There exist explicit pseudorandom generators $G : \{0,1\}^s \to (\{0,1\}^\ell)^k$ that ε-fool product tests in any order, with the following seed lengths:
1. $s = 2\ell + \tilde{O}(k^2 \log(1/\varepsilon))$, and
2. $s = O(\ell) + \tilde{O}((\ell k)^{2/3} \log^{1/3}(1/\varepsilon))$.

One advantage of these generators is their simplicity: the generator's output has the form D + N′, where D is a small-bias distribution and N′ is statistically close to a noise vector. Constructions in the literature tend to be somewhat more involved. In terms of parameters, we note that when k = O(1) we achieve in (1) seed length $s = 2\ell + O(\log(1/\varepsilon))\log\ell$, which is close to the value $\ell + O(\log(1/\varepsilon))$ that is optimal even for the case of fixed order and k = 2. Our result is significant already for k = 3, but not for k = 2. In the latter case the seed length of $(2-\Omega(1))\ell$ obtained in [BPW11] remains the best known. For $k \geq \sqrt{\ell}$ our generator in (2) has polynomial stretch, using a seed of length $\tilde{O}(n^{2/3})$ for output length n. For the sake of comparison we note again that [GKM18] has exponential stretch, achieving seed length $\tilde{O}(\ell + \log(k/\varepsilon))$. However, [GKM18] does not work in any order.

We note that for the special case of combinatorial rectangles $f : (\{0,1\}^\ell)^k \to \{0,1\}$ a pseudorandom generator with seed length $O((\ell + \log k)\log(1/\varepsilon))$ follows from previous work. The generator simply outputs n bits such that any $d \cdot \ell$ of them are $1/k^d$-close to uniform in statistical distance, where $d = c\log(1/\varepsilon)$ for an appropriate constant c. Theorem 3 in [AGHP92] shows how to generate these bits from a seed of length $O(\ell\log(1/\varepsilon) + \log\log n + \log(1/\varepsilon)\log k) = O((\ell + \log k)\log(1/\varepsilon))$. The analysis of this generator is as follows. The induced distribution on the outputs of the $f_i$ is a distribution on $\{0,1\}^k$ such that any d bits are $1/k^d$-close to the distribution of independent variables whose expectations equal the $\mathbb{E}[f_i]$. Now Lemma 5.2 in [CRS00] (cf. [EGL+98]) shows that the probability that the AND of the outputs is 1 equals the product of the expectations of the $f_i$ plus an error which is $\leq 2^{-\Omega(d)} + \binom{k}{d} d^d/k^d \leq \varepsilon$. However, this generator breaks down if the output of the functions is {−1, 1} instead of {0, 1}. Moreover, its parameters are incomparable with those in Theorem 2.16.(1). In particular, for k = O(1) its seed length is $\geq \ell\log(1/\varepsilon)$, while as remarked above we achieve $O(\ell + \log\ell\log(1/\varepsilon))$.

We are able to improve the seed length of (2) in Theorem 2.16 to $\tilde{O}(\sqrt{n})$, but then the resulting generator is more complicated; in particular it does not output a distribution of the form D + N′. For this improvement we "derandomize" Theorem 2.5 and then combine it with a recursive technique originating in [AW89] and used in several recent works including [GMR+12, RSV13, SVW17, CSV15]. Our context and language are somewhat different from previous work, and this fact may make this chapter useful to readers who wish to learn the technique.
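To illustrate the D + N′ template of Theorem 2.16, here is a toy instantiation (ours, not from the thesis): D is sampled with the powering construction of [AGHP92] over GF(2⁸) with the AES field modulus, and true noise stands in for the derandomized N′. The tiny field makes the bias far too large for the theorem's parameters; the sketch only shows the shape of the construction.

```python
import random

M, POLY = 8, 0x11B  # GF(2^8) with modulus x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^M) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= POLY
        b >>= 1
    return r

def small_bias(n, x, y):
    """AGHP powering construction: bit i is <x^i, y> over F_2."""
    out, xi = [], 1
    for _ in range(n):
        out.append(bin(xi & y).count("1") & 1)
        xi = gf_mul(xi, x)
    return out

def d_plus_noise(n, eta):
    """Output of the form D + N': a small-bias string with a few
    coordinates rerandomized."""
    d = small_bias(n, random.getrandbits(M), random.getrandbits(M))
    return [random.randrange(2) if random.random() < eta else di
            for di in d]

print(d_plus_noise(20, 0.1))
```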

Theorem 2.17 (PRG for unordered products, II). There exists an explicit pseudorandom generator $G : \{0,1\}^s \to (\{0,1\}^\ell)^k$ that ε-fools product tests in any order, with seed length $s = O(\ell + \sqrt{\ell k \log k \log(k/\varepsilon)})$.

Recall that for b = ℓ the error bound in our Theorem 2.5 is $k(1-\eta)^{\Omega(b/k)}$, and that it is an interesting question whether the exponent can be improved to Ω(b). We show that if

such an improvement is achieved for the derandomized version of the theorem (stated later in Theorem 2.38) then one would get a much better seed length: $s = O((\ell + \log k\log(n/\varepsilon))\log n)$. In Chapter 4 we will see a positive answer to this question.

Reingold, Steinke, and Vadhan [RSV13] give a generator that ε-fools width-w space algorithms on n bits in any order, with seed length $s = \tilde{O}(\sqrt{n}\log(w/\varepsilon))$. Every combinatorial rectangle $f : (\{0,1\}^\ell)^k \to \{0,1\}$ can be computed by a one-way algorithm of width $2^{\ell-1}+1$ on n = ℓk bits. Hence they also get seed length $\tilde{O}(\sqrt{\ell k}(\ell + \log(1/\varepsilon)))$ for combinatorial rectangles. Our Theorem 2.17 improves upon this by removing a factor of ℓ.

Going in the other direction, if D is a distribution on $(\{0,1\}^\ell)^k$ bits that ε-fools combinatorial rectangles, then D also fools width-w one-way algorithms on n = ℓk bits with error $w^k\varepsilon$. Using this we obtain from Theorem 2.5 a new class of distributions that fool space, namely any distribution that is the sum of a distribution with high-enough independence (or small-enough bias) and suitable noise. We state one representative result.

Corollary 2.18 (Bounded independence plus noise fools space). Let D be a b-uniform distribution on n bits. Let N be the noise distribution from Definition 2.4. If $b \geq n^{2/3}\log n$ and η is any constant, then D + N fools O(log n)-space algorithms in any order with error o(1).

As mentioned earlier, [GKM18] show that if a generator fools products then it also fools several other computational models, with some loss in parameters. As a result, we obtain generators for the following two models, extended to read bits in any order.

Definition 2.19 (Generalized halfspaces and combinatorial shapes). A generalized halfspace is a function $h : (\{0,1\}^\ell)^k \to \{0,1\}$ defined by h(x) := 1 if and only if $\sum_{i \leq k} g_i(x_i) \geq \theta$, where $g_1, \ldots, g_k : \{0,1\}^\ell \to \mathbb{R}$ are arbitrary functions and $\theta \in \mathbb{R}$.

A combinatorial shape is a function $f : (\{0,1\}^\ell)^k \to \{0,1\}$ defined by $f(x) := g\big(\sum_{i \leq k} g_i(x_i)\big)$, where $g_1, \ldots, g_k : \{0,1\}^\ell \to \{0,1\}$ and $g : \{0, \ldots, k\} \to \{0,1\}$ are arbitrary functions.

Theorem 2.20 (PRG for generalized halfspaces and combinatorial shapes, in any order). There exists an explicit pseudorandom generator $G : \{0,1\}^s \to (\{0,1\}^\ell)^k$ that ε-fools both generalized halfspaces and combinatorial shapes in any order, with seed length $s = \tilde{O}(\ell\sqrt{k} + \sqrt{\ell k \log(1/\varepsilon)})$.

Note that for $\varepsilon = 2^{-O(\ell)}$ the seed length simplifies to $\tilde{O}(\ell\sqrt{k})$.

2.1.3 Techniques

We now give an overview of the proof of Theorem 2.5. The natural high-level idea, which our proof adopts as well, is to apply Fourier analysis and use noise to bound the high-degree terms and independence to bound the low-degree terms. Part of the difficulty is finding the right way to decompose the product $\prod_{i \leq k} f_i$. We proceed as follows. For a function f let $f^H$ be its "high-degree" Fourier part and $f^L$ be its "low-degree" Fourier part, so that $f = f^H + f^L$. Our goal is to go from $\prod f_i$ to $\prod f_i^L$. The latter is a product of low-degree functions and hence has low degree. Therefore, its expectation will be close to $\prod_i \mu_i$ by the properties

of the distribution D; here we do not use the noise N. To point out a limitation of this argument, note that if D is ℓ-wise independent then we need $\prod f_i^L$ to have degree ≤ ℓ. Even if each $f_i^L$ has degree 1, we cannot afford k larger than ℓ.

To move from $\prod f_i$ to $\prod f_i^L$ we pick one $f_j$ and decompose it as $f_j^H + f_j^L$. Iterating this process we indeed arrive at $\prod f_i^L$, but we also obtain k extra terms of the form

$$f_1 f_2 \cdots f_{j-1} f_j^H f_{j+1}^L f_{j+2}^L \cdots f_k^L \quad \text{for } j = 1, \ldots, k.$$
We show that each of these terms is close to 0 thanks to the presence of the high-degree factor $f_j^H$. Here we use both D and N.

We conclude this section with a brief technical comparison with the recent papers [GMR+12, GY14, GKM18], which give generators for combinatorial rectangles (and product tests). We note that the generators in those papers only fool tests $f = f_1 \cdot f_2 \cdots f_k$ that read the input in a fixed order (whereas our results allow for any order). Also, they do not use noise, but rather hash the functions $f_i$ in a different way. Finally, a common technique in those papers is, roughly speaking, to use hashing to reduce the variance of the functions, and then show that bounded independence fools functions with small variance. We note that the noise parameters we consider in this chapter are too small to be used to reduce the variance. Specifically, for a product test f those papers define a new function $g = g_1 \cdot g_2 \cdots g_k$ which is the average of f over t independent inputs. While g has the same expectation as f, the variance of each $g_i$ is less than that of $f_i$ by a factor of t. Their goal is to make the variance of each $g_i$ less than 1/k so that the sum of the variances is less than 1. In order to achieve this reduction with noise we would have to set $\eta \geq 1 - 1/\sqrt{k}$. This is because if $f_i$ simply is $(-1)^x$ where x is one bit, then the variance of $f_i$ perturbed by noise is $\mathbb{E}_x\big[\mathbb{E}_N[(-1)^{x+N}]^2\big] - \mathbb{E}_{x,N}[(-1)^{x+N}]^2 = \mathbb{E}_{x,N,N'}[(-1)^{N+N'}] = (1-\eta)^2$.
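The last computation is easy to verify numerically (a quick sketch of ours, with an arbitrary η):

```python
import random

def noisy_sign(x, eta, trials=200_000):
    """Estimate E_N[(-1)^{x+N}] for one bit x under noise rate eta."""
    s = 0
    for _ in range(trials):
        n = random.randrange(2) if random.random() < eta else 0
        s += (-1) ** ((x + n) % 2)
    return s / trials

eta = 0.3
g = [noisy_sign(0, eta), noisy_sign(1, eta)]   # ~ [1-eta, -(1-eta)]
variance = sum(v * v for v in g) / 2 - (sum(g) / 2) ** 2
print(variance, (1 - eta) ** 2)                # both ~ 0.49
```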

Organization. In Section 2.2 we prove a more general version of Theorem 2.5. In Sec- tion 2.3 we give the proof details for the results in Section 2.1.1. The details for the results in Section 2.1.2 are spread over three sections. In Section 2.4 we prove Theorem 2.16. In Section 2.5 we prove Theorem 2.17, and discuss the potential improvement. In Section 2.6 we prove Theorem 2.20. In Section 2.7 we include for completeness a lower bound on the values of b and η for which Theorem 2.5 can apply.

2.2 Bounded independence plus noise fools products

In this section we prove Theorem 2.5. It follows easily from the next theorem which is the main result in this section. A distribution D over n bits has bias δ if any parity of the bits (with range {−1, 1}) has expectation at most δ in magnitude. The following definition extends this to larger alphabets.

Definition 2.21. A distribution $D = (D_1, D_2, \ldots, D_n)$ over $\mathbb{Z}_m^n$ is (b, δ)-biased if for every nonzero $\alpha \in \mathbb{Z}_m^n$ with at most b non-zero coordinates we have $\big|\mathbb{E}_D\big[\omega^{\sum_i \alpha_i D_i}\big]\big| \leq \delta$, where $\omega := e^{2\pi i/m}$. When b = n we simply call D δ-biased.

Let n = ℓk. Recall that we denote by $V(t) = V_{m,\ell}(t)$ the number of $x \in \mathbb{Z}_m^\ell$ with at most t non-zero coordinates.

Theorem 2.22. Let $t \in [0, \ell]$. Let $f_1, \ldots, f_k : \mathbb{Z}_m^\ell \to \mathbb{C}_{\leq 1}$ be k functions with $\mu_i = \mathbb{E}[f_i]$. Let D be a (b, δ)-biased distribution over $\mathbb{Z}_m^n$ with b ≥ 2(k−1)t and such that each $D_i$ is uniform (which holds if b ≥ ℓ). Let N = N(m, n, η) be the noise distribution from Definition 2.4. Write $D = (D_1, D_2, \ldots, D_k)$ where each $D_i$ is in $\mathbb{Z}_m^\ell$, and similarly for N. Then
$$\Big|\mathbb{E}\Big[\prod_{i \leq k} f_i(D_i + N_i)\Big] - \prod_{i \leq k} \mu_i\Big| \leq k(1-\eta)^t \sqrt{(1 + m^\ell \delta)(1 + V(t)^{k-1}\delta)} + V(t)^{k/2}\delta.$$

Note that [AGM03] show that a (b, δ)-biased distribution over $\{0,1\}^n$ is ε-close in statistical distance to a b-uniform distribution, for $\varepsilon = \delta\sum_{i=1}^{b}\binom{n}{i}$. (See [OZ18] for an optimal bound.) One can apply their results in conjunction with Theorem 2.5 to obtain a variant of Theorem 2.5 for small-bias distributions, but only if $\delta \leq 1/\sum_{i=1}^{b}\binom{n}{i}$. Via a direct proof we derive useful bounds already for $\delta = \Omega(2^{-b})$, and this is used in the applications in Section 2.1.2. We now state and prove a more general version of Theorem 2.5 from the introduction.

Corollary 2.23 (Generalization of Theorem 2.5). Let $f_1, \ldots, f_k : \mathbb{Z}_m^\ell \to \mathbb{C}_{\leq 1}$ be k functions with $\mu_i = \mathbb{E}[f_i]$. Set n := ℓk and let D be a b-uniform distribution over $\mathbb{Z}_m^n$. Let N be the noise distribution from Definition 2.4. Write $D = (D_1, D_2, \ldots, D_k)$ where each $D_i$ is in $\mathbb{Z}_m^\ell$, and similarly for N. Then
$$\Big|\mathbb{E}\Big[\prod_{i \leq k} f_i(D_i + N_i)\Big] - \prod_{i \leq k} \mu_i\Big| \leq \varepsilon$$

for the following choices:
(1) if b ≥ ℓ, then $\varepsilon = k(1-\eta)^{\Omega(b^2/n)}$;
(2) if b < ℓ and each $D_i$ is uniform over $\mathbb{Z}_m^\ell$, then $\varepsilon = k(1-\eta)^{\Omega(b/k)}$;
(3) if b < ℓ, then $\varepsilon = ke^{-\Omega(\eta b/k)} + 2k\binom{\ell}{\ell-b}e^{-\Omega(\eta b)}$.
Moreover, for ℓ = 1, m = 2, and any η, b, k there exist $f_i$ and D for which it is not true that $\varepsilon < (1-\eta)^{b+1}$. In particular, if b = Ω(n) then an upper bound on the error of the form $k(1-\eta)^{cn}$ is false for sufficiently large c, using that η ≥ (log k)/n.

We use (1) in most of our applications. Occasionally we use (3) with b = ℓ−1, in which case the error bound is $O(\ell k e^{-\Omega(\eta\ell/k)})$.

Proof of Corollary 2.23. Setting δ = 0 and t = b/2(k−1) in Theorem 2.22 gives the bound
$$k(1-\eta)^{b/2(k-1)} \qquad (\star)$$
which proves the theorem in the case ℓ ≤ b = O(ℓ).

To prove (1) we need to handle larger b. For this, let $c := \lfloor b/\ell \rfloor$, and group the k functions into $k' \leq k/c + 1$ functions on input length $\ell' := c\ell$. Note that b ≥ ℓ′, and so we can apply (⋆) to obtain the bound $k'(1-\eta)^{\Omega(b/k')} \leq k(1-\eta)^{\Omega(b^2/k\ell)}$.

To prove (2), observe that in the proof of (⋆) the condition b ≥ ℓ is only used to guarantee that each $D_i$ is uniform. The latter is now part of our assumption.

To prove (3), view the noise vector N as the sum of two noise vectors N′ and N″ with parameter α such that $1-\eta = (1-\alpha)^2$. Note this implies α = Ω(η). If N′ sets to uniform at least ℓ−b coordinates in each function then we can apply (⋆) to functions on ≤ b symbols, with η replaced by α and N replaced by N″. The probability that N′ does not set to uniform that many coordinates is at most
$$k\binom{\ell}{\ell-b}(1-\alpha)^b \leq k\binom{\ell}{\ell-b}e^{-\Omega(\eta b)},$$
and in that case the distance between the expectations is at most two.

To show the "moreover" part, define $f_i := (-1)^{x_i}$ for i ≤ b+1 and $f_i := 1$ for i > b+1, so that $\prod_i f_i$ computes the parity of the first b+1 bits, and let D be the b-wise independent distribution which is uniform on strings whose parity of the first b+1 bits is 0. The other bits are irrelevant. The expectation of parity under uniform is 0. The expectation of parity under D is 1 if no symbol is perturbed by noise, and is 0 otherwise. Hence the error is ≥ $(1-\eta)^{b+1}$.

We now state and prove a special case of Theorem 2.22 for small-bias distributions.

Corollary 2.24. Let $f_1, \dots, f_k \colon \mathbb{Z}_m^\ell \to \mathbb{C}_1$ be $k$ functions with $\mu_i = \mathbb{E}[f_i]$. Assume $\delta \le m^{-\ell}$. Let $D$ be an $(\ell,\delta)$-biased distribution over $\mathbb{Z}_m^n$. Let $N$ be the noise distribution from Definition 2.4. Write $D = (D_1, D_2, \dots, D_k)$ where each $D_i$ is in $\mathbb{Z}_m^\ell$, and similarly for $N$. Then
$$\Bigl|\mathbb{E}\Bigl[\prod_{i\le k}f_i(D_i+N_i)\Bigr]-\prod_{i\le k}\mu_i\Bigr| \le 2k(1-\eta)^{\Omega(\log(1/\delta)/(k\log mk))}+\sqrt{\delta}.$$

Proof of Corollary 2.24. Let $c := \lfloor\sqrt{\log(1/\delta)/(\ell\log m)}\rfloor$. Note that $c \ge 1$ because $\delta \le m^{-\ell}$. We group the $k$ functions into $k' = \lceil k/c\rceil$ functions on input length $\ell' := c\ell$. The goal is to make $m^{\ell'}$ close to $1/\delta$. By Claim 2.34, $V_{\ell'}(t) \le (e\ell' m/t)^t$. Hence $V_{\ell'}(t)^{k'/2} \le V_{\ell'}(t)^{k'-1} \le (e\ell' m/t)^{k't}$. Now let $t = \alpha\ell'\log m/(k'\log mk')$ for a small constant $\alpha > 0$, so that the latter bound is $\le m^{\ell'/2}$, which is roughly $1/\sqrt{\delta}$. The error bound in Theorem 2.22 now becomes at most
$$k(1-\eta)^t(1+m^{\ell'}\delta)+m^{\ell'/2}\delta,$$
and so the bound is at most
$$2k(1-\eta)^{\Omega(\log(1/\delta)/(k\log mk))}+\sqrt{\delta}.$$

We now turn to the proof of Theorem 2.22. We begin with some preliminaries.

2.2.1 Preliminaries

Denote by $U$ the uniform distribution. Let $m$ be any positive integer. We write $\mathbb{Z}_m$ for $\{0,1,2,\dots,m-1\}$. Let $\omega := e^{2\pi i/m}$ be a primitive $m$-th root of unity. For any $\alpha \in \mathbb{Z}_m^u$, we define $\chi_\alpha \colon \mathbb{Z}_m^u \to \mathbb{C}$ by
$$\chi_\alpha(x) := \omega^{\langle\alpha,x\rangle},$$
where $\alpha$ and $x$ are viewed as vectors in $\mathbb{Z}_m^u$ and $\langle\alpha,x\rangle := \sum_i \alpha_i x_i$.
For any function $f\colon\mathbb{Z}_m^u\to\mathbb{C}$, its Fourier expansion is
$$f(x) := \sum_{\alpha\in\mathbb{Z}_m^u}\hat f_\alpha\chi_\alpha(x),$$
where $\hat f_\alpha\in\mathbb{C}$ is given by
$$\hat f_\alpha := \mathbb{E}_{x\sim\mathbb{Z}_m^u}\bigl[f(x)\overline{\chi_\alpha(x)}\bigr].$$
Here and elsewhere, random variables are uniformly distributed unless specified otherwise.
The Fourier $L_1$-norm of $f$ is defined as $\sum_\alpha|\hat f_\alpha|$, and is denoted by $L_1[f]$. The degree of $f$ is defined as $\max\{|\alpha| : \hat f_\alpha\ne 0\}$, where $|\alpha|$ is the number of nonzero coordinates of $\alpha$, and is denoted by $\deg(f)$. Note that we have $L_1[\bar f] = L_1[f]$. The following fact bounds the $L_1$-norm and degree of product functions.

Fact 2.25. For any two functions $f,g\colon\mathbb{Z}_m^u\to\mathbb{C}$, we have
(1) $\deg(fg)\le\deg(f)+\deg(g)$, and
(2) $L_1[fg]\le L_1[f]\,L_1[g]$.

We shall use this fact both for $f$ and $g$ on disjoint inputs and for $f$ and $g$ on the same input.

Proof. We have
$$f(x)g(x) = \Bigl(\sum_{\alpha\in\mathbb{Z}_m^u}\hat f_\alpha\chi_\alpha(x)\Bigr)\Bigl(\sum_{\beta\in\mathbb{Z}_m^u}\hat g_\beta\chi_\beta(x)\Bigr) = \sum_{\alpha,\beta}\hat f_\alpha\hat g_\beta\chi_{\alpha+\beta}(x) = \sum_\alpha\Bigl(\sum_\beta \hat f_{\alpha-\beta}\hat g_\beta\Bigr)\chi_\alpha(x).$$
Hence the $\alpha$-th Fourier coefficient of $f\cdot g$ is $\sum_\beta\hat f_{\alpha-\beta}\hat g_\beta$.
To see (1), note that in the latter expression the sum over $\beta$ can be restricted to those $\beta$ with $|\beta|\le\deg(g)$. Now note that if $|\alpha|>\deg(f)+\deg(g)$ then $|\alpha-\beta|>\deg(f)$, and hence $\hat f_{\alpha-\beta}$ is zero for every such $\beta$.
To show (2), write $L_1[fg] = \sum_\alpha\bigl|\sum_\beta\hat f_{\alpha-\beta}\hat g_\beta\bigr| \le \sum_{\alpha,\beta}|\hat f_{\alpha-\beta}||\hat g_\beta| = \bigl(\sum_\alpha|\hat f_\alpha|\bigr)\bigl(\sum_\beta|\hat g_\beta|\bigr) = L_1[f]L_1[g]$.

Fact 2.26 (Parseval's identity). $\sum_{\alpha\in\mathbb{Z}_m^\ell}|\hat f_\alpha|^2 = \mathbb{E}_{x\sim\mathbb{Z}_m^\ell}[|f(x)|^2]$. In the case of $f\in\mathbb{C}_1$, this quantity is at most $1$.

Proof.
$$\mathbb{E}_{x\sim\mathbb{Z}_m^\ell}\bigl[f(x)\overline{f(x)}\bigr] = \mathbb{E}_{x\sim\mathbb{Z}_m^\ell}\Bigl[\sum_{\alpha\in\mathbb{Z}_m^\ell}\hat f_\alpha\chi_\alpha(x)\cdot\overline{\sum_{\alpha'\in\mathbb{Z}_m^\ell}\hat f_{\alpha'}\chi_{\alpha'}(x)}\Bigr] = \sum_{\alpha,\alpha'\in\mathbb{Z}_m^\ell}\hat f_\alpha\overline{\hat f_{\alpha'}}\,\mathbb{E}_{x\sim\mathbb{Z}_m^\ell}[\chi_{\alpha-\alpha'}(x)] = \sum_{\alpha\in\mathbb{Z}_m^\ell}|\hat f_\alpha|^2,$$
where the last equality holds because $\mathbb{E}_{x\sim\mathbb{Z}_m^\ell}[\chi_{\alpha-\alpha'}(x)]$ equals $0$ if $\alpha\ne\alpha'$ and equals $1$ otherwise.

Fact 2.27. Let $N = (N_1,\dots,N_\ell)$ be the distribution over $\mathbb{Z}_m^\ell$ where the symbols are independent and each of them is set to uniform with probability $\eta$ and to $0$ otherwise. Then for every $\alpha\in\mathbb{Z}_m^\ell$, $\mathbb{E}[\chi_\alpha(N)] = (1-\eta)^{|\alpha|}$.

Proof. The expectation conditioned on the event "$N$ sets none of the nonzero positions of $\alpha$ to uniform" is $1$. This event happens with probability $(1-\eta)^{|\alpha|}$. Conditioned on its complement, the expectation is $0$. To see this, assume that the noise vector sets to uniform position $i$ of $\alpha$, and that $\alpha_i\ne 0$. Let $\beta := \omega^{\alpha_i}$. Then the expectation can be written as a product in which one factor is
$$\mathbb{E}_{x\sim\{0,1,\dots,m-1\}}[\beta^x] = \frac{1}{m}\cdot\frac{\beta^m-1}{\beta-1} = 0,$$
using the fact that $\beta\ne 1$ because $\alpha_i\in\{1,2,\dots,m-1\}$, and that $\beta^m = (\omega^m)^{\alpha_i} = 1$. Therefore the total expectation is $(1-\eta)^{|\alpha|}$.
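The product structure used in this proof is easy to check numerically. Below is a minimal sketch in Python (the parameters $m$, $\eta$ and the vector $\alpha$ are illustrative choices, not values from the text): each coordinate contributes a factor $(1-\eta)\cdot 1 + \eta\cdot\mathrm{avg}$, where $\mathrm{avg}$ is the mean of $\omega^{\alpha_i x}$ over $x\in\mathbb{Z}_m$.

    import cmath

    m, eta = 5, 0.3
    alpha = [2, 0, 1]                     # |alpha| = 2 nonzero coordinates
    omega = cmath.exp(2j * cmath.pi / m)  # primitive m-th root of unity

    expectation = 1 + 0j
    for a in alpha:
        # a noised coordinate averages omega**(a*x) over Z_m (0 unless a = 0);
        # an unperturbed coordinate contributes omega**(a*0) = 1
        avg = sum(omega ** (a * x) for x in range(m)) / m
        expectation *= (1 - eta) + eta * avg

    weight = sum(a != 0 for a in alpha)
    assert abs(expectation - (1 - eta) ** weight) < 1e-9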

Note that this fact includes the uniform case $\eta = 1$, with the convention $0^0 = 1$. We will use the following facts multiple times.

Fact 2.28. Let $f\colon\mathbb{Z}_m^\ell\to\mathbb{C}$ be a function of degree $b$. We have:
(1) for any $(b,\delta)$-biased distribution $D$ over $\mathbb{Z}_m^\ell$, $|\mathbb{E}[f(D)]-\mathbb{E}[f(U)]| \le L_1[f]\delta$;
(2) for any $(2b,\delta)$-biased distribution $D$ over $\mathbb{Z}_m^\ell$, $\bigl|\mathbb{E}[|f(D)|^2]-\mathbb{E}[|f(U)|^2]\bigr| \le L_1[f]^2\delta$; and
(3) the bound in (2) holds even if $D$ is $(\ell,\delta)$-biased.

Proof. For (1), note that $|\mathbb{E}[f(D)]-\mathbb{E}[f(U)]| = \bigl|\sum_{0<|\alpha|\le b}\hat f_\alpha\,\mathbb{E}[\chi_\alpha(D)]\bigr| \le \sum_{0<|\alpha|\le b}|\hat f_\alpha|\,|\mathbb{E}[\chi_\alpha(D)]| \le L_1[f]\delta$.
For (2), recall that $|f(x)|^2 = f(x)\overline{f(x)}$. By Fact 2.25, the function $|f(x)|^2$ has degree $\le 2b$. Also, again by Fact 2.25, the $L_1$-norm of that function is at most $L_1[f]\cdot L_1[\bar f] = L_1[f]^2$. Now the result follows by (1).
Finally, (3) is proved like (2), noting that a function on $\mathbb{Z}_m^\ell$ always has degree $\le\ell$.

Actually the bounds hold with $\sum_{\alpha\ne 0}|\hat f_\alpha|$ instead of $L_1[f]$, but we will not use that.

2.2.2 Proof of Theorem 2.22

For a function $f\colon\mathbb{Z}_m^\ell\to\mathbb{C}_1$, consider its Fourier expansion $f(x) := \sum_\alpha\hat f_\alpha\chi_\alpha(x)$, and let $f^L(x) := \sum_{\alpha:|\alpha|\le t}\hat f_\alpha\chi_\alpha(x)$ and $f^H(x) := \sum_{\alpha:|\alpha|>t}\hat f_\alpha\chi_\alpha(x)$. Define $F_i\colon(\mathbb{Z}_m^\ell)^k\to\mathbb{C}$ by
$$F_i(x_1,\dots,x_k) := \Bigl(\prod_{j<i}f_j(x_j)\Bigr)\cdot f_i^H(x_i)\cdot\Bigl(\prod_{\ell>i}f_\ell^L(x_\ell)\Bigr).$$

Pick $f_k$ and write it as $f_k^L+f_k^H$. We can then rewrite
$$\prod_{1\le i\le k}f_i = F_k+\Bigl(\prod_{1\le i\le k-1}f_i\Bigr)\cdot f_k^L.$$
We can reapply the process to $\prod_{1\le i\le k-1}f_i = \prod_{1\le i\le k-1}(f_i^L+f_i^H)$. Continuing this way, we eventually obtain that the quantity we want to bound, i.e. $\bigl|\mathbb{E}[\prod_{i\le k}f_i(D_i+N_i)]-\prod_{i\le k}\mu_i\bigr|$, is at most
$$\sum_{i\le k}\bigl|\mathbb{E}[F_i(D+N)]\bigr|+\Bigl|\mathbb{E}\Bigl[\prod_{i\le k}f_i^L(D_i+N_i)\Bigr]-\prod_{i\le k}\mu_i\Bigr|.$$
The theorem follows readily from the next two lemmas, the second of which has a longer proof.

Lemma 2.29. $\bigl|\mathbb{E}[\prod_{i\le k}f_i^L(D_i+N_i)]-\prod_{i\le k}\mu_i\bigr|\le V(t)^{k/2}\delta$.

Proof. Fix $N$ arbitrarily. Each $f_i^L$ has degree at most $t$, and by the Cauchy–Schwarz inequality it has $L_1$-norm $\sum_{|\alpha|\le t}|\hat f_\alpha|\le V(t)^{1/2}\bigl(\sum_\alpha|\hat f_\alpha|^2\bigr)^{1/2}\le V(t)^{1/2}$. Here we use the fact that $f$ maps to $\mathbb{C}_1$ and Fact 2.26. Hence, by Fact 2.25, the function $x\mapsto\prod_{i\le k}f_i^L(x_i+N_i)$ has degree at most $kt\le b$ and $L_1$-norm at most $V(t)^{k/2}$; moreover, its expectation under uniform equals $\prod_{i\le k}\hat f_{i,0}=\prod_{i\le k}\mu_i$. Thus, by Fact 2.28,
$$\Bigl|\mathbb{E}_D\Bigl[\prod_{i\le k}f_i^L(D_i+N_i)\Bigr]-\prod_{i\le k}\mu_i\Bigr|\le V(t)^{k/2}\delta.$$
Averaging over $N$ proves the claim.

Lemma 2.30. For every $i\in\{1,2,\dots,k\}$, we have $|\mathbb{E}[F_i(D+N)]| \le (1-\eta)^t\sqrt{(1+m^\ell\delta)(1+V(t)^{k-1}\delta)}$.

Proof. We have
$$\begin{aligned}
\bigl|\mathbb{E}[F_i(D+N)]\bigr| &= \Bigl|\mathbb{E}\Bigl[\prod_{j<i}f_j(D_j+N_j)\cdot f_i^H(D_i+N_i)\cdot\prod_{\ell>i}f_\ell^L(D_\ell+N_\ell)\Bigr]\Bigr|\\
&\le \mathbb{E}_D\Bigl[\prod_{j<i}\bigl|\mathbb{E}_{N_j}[f_j(D_j+N_j)]\bigr|\cdot\Bigl|\mathbb{E}_{N_i}[f_i^H(D_i+N_i)]\cdot\prod_{\ell>i}\mathbb{E}_{N_\ell}[f_\ell^L(D_\ell+N_\ell)]\Bigr|\Bigr]\\
&\le \mathbb{E}_D\Bigl[\Bigl|\mathbb{E}_{N_i}[f_i^H(D_i+N_i)]\cdot\prod_{\ell>i}\mathbb{E}_{N_\ell}[f_\ell^L(D_\ell+N_\ell)]\Bigr|\Bigr],
\end{aligned}$$

where the last inequality holds because $|\mathbb{E}_{N_j}[f_j(D_j+N_j)]|\le\mathbb{E}_{N_j}[|f_j(D_j+N_j)|]\le 1$ for every $j<i$, by Jensen's inequality, convexity of norms, and the fact that the range of $f_j$ is $\mathbb{C}_1$. By the Cauchy–Schwarz inequality, we get
$$\bigl|\mathbb{E}[F_i(D+N)]\bigr| \le \mathbb{E}_D\Bigl[\bigl|\mathbb{E}_{N_i}[f_i^H(D_i+N_i)]\bigr|^2\Bigr]^{1/2}\cdot\mathbb{E}_D\Bigl[\prod_{\ell>i}\bigl|\mathbb{E}_{N_\ell}[f_\ell^L(D_\ell+N_\ell)]\bigr|^2\Bigr]^{1/2}.$$
In claims 2.32 and 2.33 below we bound from above the squares of the two terms on the right-hand side. In both cases, we view our task as bounding $|\mathbb{E}_D[g(D)]|$ for a certain function $g$, and we proceed by computing the $L_1$-norm, the expectation under uniform, and the degree of $g$, and then applying Fact 2.28. We start with a claim that is useful in both cases.

Claim 2.31. Let $f\colon\mathbb{Z}_m^\ell\to\mathbb{C}$ be a function. Then:
1. for every $x$, $\mathbb{E}_N[f(x+N)] = \sum_\alpha\hat f_\alpha\chi_\alpha(x)(1-\eta)^{|\alpha|}$, and
2. $\mathbb{E}_U\bigl[|\mathbb{E}_N[f(U+N)]|^2\bigr] = \sum_\alpha|\hat f_\alpha|^2(1-\eta)^{2|\alpha|}$.

Proof. For (1), write $\mathbb{E}_N[f(x+N)] = \mathbb{E}_N\bigl[\sum_\alpha\hat f_\alpha\chi_\alpha(x+N)\bigr] = \sum_\alpha\hat f_\alpha\chi_\alpha(x)\,\mathbb{E}_N[\chi_\alpha(N)]$. Then apply Fact 2.27.
For (2), write $|\mathbb{E}_N[f(x+N)]|^2$ as $\mathbb{E}_N[f(x+N)]\cdot\overline{\mathbb{E}_N[f(x+N)]}$. Then apply (1) twice to further write $\mathbb{E}_U\bigl[|\mathbb{E}_N[f(U+N)]|^2\bigr]$ as
$$\mathbb{E}_U\Bigl[\sum_{\alpha,\alpha'}\hat f_\alpha\overline{\hat f_{\alpha'}}\chi_{\alpha-\alpha'}(U)(1-\eta)^{|\alpha|+|\alpha'|}\Bigr] = \sum_{\alpha,\alpha'}\hat f_\alpha\overline{\hat f_{\alpha'}}\,\mathbb{E}_U[\chi_{\alpha-\alpha'}(U)](1-\eta)^{|\alpha|+|\alpha'|}.$$
The claim then follows because $U$ is uniform.

We can now bound our terms.

Claim 2.32. For every $i$, $\mathbb{E}_D\bigl[|\mathbb{E}_{N_i}[f_i^H(D_i+N_i)]|^2\bigr] \le (1-\eta)^{2t}(1+m^\ell\delta)$.

Proof. Let $g(x)$ be the function $g(x) = \mathbb{E}_{N_i}[f_i^H(x+N_i)]$. By (1) in Claim 2.31, the $L_1$-norm of $g$ is at most $\sum_{\alpha:|\alpha|>t}|\hat f_{i,\alpha}|(1-\eta)^{|\alpha|} \le (1-\eta)^t\sum_\alpha|\hat f_{i,\alpha}| \le (1-\eta)^t m^{\ell/2}$, where we used Cauchy–Schwarz and Fact 2.26.
By (2) in Claim 2.31 and Fact 2.26, $\mathbb{E}_U[|g(U)|^2]$ under uniform is at most $(1-\eta)^{2t}$.
Because $b\ge\ell$ we can apply (3) in Fact 2.28 to obtain that $\mathbb{E}_D[|g(D)|^2] \le (1-\eta)^{2t} + (1-\eta)^{2t}m^\ell\delta$, as claimed.

Claim 2.33. $\mathbb{E}_D\bigl[\prod_{\ell>i}\bigl|\mathbb{E}_{N_\ell}[f_\ell^L(D_\ell+N_\ell)]\bigr|^2\bigr] \le 1 + V(t)^{k-1}\delta$.

Proof. Pick any $\ell>i$ and let $g_\ell(x) := \mathbb{E}_{N_\ell}[f_\ell^L(x+N_\ell)]$.
The $L_1$-norm of $g_\ell$ is at most $V(t)^{1/2}$ by (1) in Claim 2.31 and Cauchy–Schwarz. Also by (2) in the same claim we have $\mathbb{E}_U[|g_\ell(U)|^2]\le 1$. Moreover, $g_\ell$ has degree at most $t$ by (1) in the same claim.
Now define $g\colon(\mathbb{Z}_m^\ell)^{k-i}\to\mathbb{C}$ as $g(x_{i+1},x_{i+2},\dots,x_k) := g_{i+1}(x_{i+1})\cdot g_{i+2}(x_{i+2})\cdots g_k(x_k)$. Note that $g$ has $L_1$-norm at most $V(t)^{(k-i)/2}\le V(t)^{(k-1)/2}$ and degree $(k-i)t\le(k-1)t$, by Fact 2.25 applied with $u=\ell(k-i)$. Moreover, $\mathbb{E}_{U_{i+1},U_{i+2},\dots,U_k}\bigl[|g(U_{i+1},U_{i+2},\dots,U_k)|^2\bigr] = \mathbb{E}_{U_{i+1}}[|g_{i+1}|^2]\cdot\mathbb{E}_{U_{i+2}}[|g_{i+2}|^2]\cdots\mathbb{E}_{U_k}[|g_k|^2]\le 1$.
Because $b\ge 2(k-1)t$, we can apply (2) in Fact 2.28 to obtain
$$\mathbb{E}_D[|g(D)|^2] \le 1 + V(t)^{k-1}\delta$$
as desired.

Lemma 2.30 follows by combining claims 2.32 and 2.33.

2.3 Proofs for Section 2.1.1

In this section we provide the proofs for the claims made in Section 2.1.1.

Theorem 2.10 (Distinguishing noisy codewords from uniform is hard). Let $C\subseteq\mathbb{F}_q^n$ be a $b$-uniform code. Let $N$ be the noise distribution from Definition 2.4. Let $k$ be an integer dividing $n$ such that $b\ge n/k-1$. Let $P\colon\mathbb{F}_q^n\to\{0,1\}$ be a $k$-party protocol using $c$ bits of communication. Then
$$\bigl|\Pr[P(C+N)=1]-\Pr[P(U)=1]\bigr| \le \varepsilon \quad\text{for } \varepsilon = 2^{c+\log n+O(1)-\Omega(\eta b/k)},$$

where $C$ and $U$ denote the uniform distributions over the code $C$ and over $\mathbb{F}_q^n$, respectively.

Proof. Let $L$ be the set of the $2^c$ leaves of the protocol tree. For $\ell\in L$, note that the set of inputs that lead to $\ell$ forms a rectangle, denoted $R_\ell$. Moreover, these rectangles are disjoint. Hence,
$$\bigl|\Pr[P(C+N)=1]-\Pr[P(U)=1]\bigr| = \Bigl|\sum_\ell\Pr[C+N\in R_\ell]-\sum_\ell\Pr[U\in R_\ell]\Bigr| \le \sum_\ell\bigl|\Pr[C+N\in R_\ell]-\Pr[U\in R_\ell]\bigr|.$$
If $b\ge n/k-1$, then applying Corollary 2.23.(3) with $\ell=n/k$ to each $R_\ell$ we have
$$\varepsilon = 2^c\Bigl(ke^{-\Omega(\eta b/k)}+2k\binom{\ell}{\ell-b}e^{-\Omega(\eta b)}\Bigr) \le 2^c\bigl(ke^{-\Omega(\eta b/k)}+2ne^{-\Omega(\eta b)}\bigr) \le 2^c\cdot 3n\,e^{-\Omega(\eta b/k)} = 2^{c+\log n+O(1)-\Omega(\eta b/k)},$$
using $\binom{\ell}{\ell-b}\le\ell\le n/k$ and $k\le n$.

Recall that we denote by $V(t)$ the number of $x\in\mathbb{F}_q^n$ with at most $t$ non-zero coordinates.

Claim 2.34. The following two inequalities hold: $V(t)\le\binom{n}{t}q^t\le(enq/t)^t$.

Proof. The second inequality is standard. To see the first, note that to specify a string with Hamming weight $\le t$ we can specify a super-set of size $t$ of the non-zero positions, and then values for those positions, including $0$.
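For small parameters the claim is easy to verify by brute force; a quick Python check (the parameter values are arbitrary illustrative choices):

    from itertools import product
    from math import comb, e

    n, q, t = 4, 3, 2
    V = sum(1 for x in product(range(q), repeat=n)
            if sum(v != 0 for v in x) <= t)    # vectors of weight <= t
    assert V <= comb(n, t) * q ** t <= (e * n * q / t) ** t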

Theorem 2.11 (Truncated syndrome distinguishing). Let $C\subseteq\mathbb{F}_q^n$ be a linear code with dimension $d$. Given $t$ and $\delta>0$, define $s:=\lceil\log_q(V_{q,n}(t)/\delta)\rceil$. If $d\le n-s$, there is a one-way algorithm $A$ that runs in space $s\log q$ such that
1. for every $x\in C$ and every $e$ of Hamming weight $\le t$, $A(x+e)=1$, and
2. $\Pr[A(U)=1]\le\delta$, where $U$ is uniform in $\mathbb{F}_q^n$.
Moreover, the space bound $s\log q$ is at most $O(t\log(nq/t))+\log 1/\delta$.

Proof. Let $H\in\mathbb{F}_q^{(n-d)\times n}$ be the parity-check matrix of $C$, and let $H'$ be the matrix consisting of the first $s$ rows of $H$. Note that we do have at least this many rows by our hypothesis on $d$. Also note that $H'$ has full rank.
On input $x\in\mathbb{F}_q^n$, the algorithm computes $H'x$ and accepts if and only if $H'x$ equals $H'e$ for some $e\in\mathbb{F}_q^n$ of Hamming weight at most $t$.
To analyze correctness, let $y$ be a codeword with at most $t$ errors. Then $H(y-e)=0$ for some $e\in\mathbb{F}_q^n$ with Hamming weight at most $t$, hence $H'y=H'e$, and so the algorithm always accepts. On the other hand, if $U$ is uniform, then as $H'$ has full rank, $H'U$ is uniform in $\mathbb{F}_q^s$. Since there are $V(t)$ vectors in $\mathbb{F}_q^n$ with Hamming weight at most $t$, the algorithm accepts with probability $\le V(t)/q^s\le\delta$.
Now we show how to compute $H'x$ using $s$ symbols of space (and so $s\log q$ bits). For $i\le s$, let $h_i$ be the $i$-th row of $H'$. Note that the $i$-th symbol of $H'x$ equals $\sum_{j\le n}h_{i,j}x_j$, which can be computed with one symbol of space by keeping the partial sum. The result follows.
The "moreover" part follows from Claim 2.34.
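To illustrate the streaming computation in this proof, here is a small Python sketch over a prime field (the rows h of $H'$ and all parameters are hypothetical inputs; the accepted set of truncated syndromes is precomputed by brute force purely for illustration):

    from itertools import combinations, product

    def truncated_syndrome(h, x, q):
        """One-way pass over x, keeping only the s partial sums (s symbols of space)."""
        s = [0] * len(h)
        for j, xj in enumerate(x):
            for i, row in enumerate(h):
                s[i] = (s[i] + row[j] * xj) % q
        return tuple(s)

    def accepted_syndromes(h, q, t):
        """Brute-force enumeration of {H'e : e has Hamming weight <= t}."""
        n = len(h[0])
        seen = set()
        for positions in combinations(range(n), t):
            for values in product(range(q), repeat=t):
                e = [0] * n
                for p, v in zip(positions, values):
                    e[p] = v              # values may be 0, so weight <= t
                seen.add(truncated_syndrome(h, e, q))
        return seen

    def A(h, q, t, x):
        return truncated_syndrome(h, x, q) in accepted_syndromes(h, q, t)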

Theorem 2.12. Let $C'\subseteq\mathbb{F}_q^n$ be a linear code with an $n\times r$ generator matrix $G$. Let $i\in\{1,2,\dots,r\}$ be an index, and let $C$ be the code defined as $C:=\{Gx\mid x_i=0\}$. Let $N$ be the noise distribution from Definition 2.4. Let $k$ be an integer. Suppose that $C$ is $b$-uniform for $b\ge n/k-1$. Let $P\colon\mathbb{F}_q^n\to\mathbb{F}_q$ be a $k$-party protocol using $c$ bits of communication. Then
$$\Pr[P(GU+N)=U_i]\le 1/q+\varepsilon,$$
where $U=(U_1,U_2,\dots,U_r)$ is the uniform distribution and $\varepsilon$ is as in Theorem 2.10.

Proof. Suppose
$$\Pr[P(GU+N)=U_i]\ge 1/q+\varepsilon.$$
Let $D_a$ be the uniform distribution over $\{Gx\mid x_i=a\}$. We can rewrite the inequality as
$$\mathbb{E}_{a\in\mathbb{F}_q}\bigl[\Pr[P(D_a+N)=a]-\Pr[P(U)=a]\bigr]\ge\varepsilon.$$
Therefore, there exists an $a$ such that
$$\Pr[P(D_a+N)=a]-\Pr[P(U)=a]\ge\varepsilon.$$

We now use $P$ to construct a protocol $P'$ that distinguishes $D_0+N$ from uniform. Given $y\in\mathbb{F}_q^n$, the parties add to $y$ the $i$-th column $G_i$ of $G$ multiplied by $a$. This can be done without communication. Then they run the protocol $P$ on $y+aG_i$ and accept if and only if the output is $a$. We have
$$\Pr[P'(D_0+N)=1]-\Pr[P'(U)=1] = \Pr[P(D_0+aG_i+N)=a]-\Pr[P(U)=a] = \Pr[P(D_a+N)=a]-\Pr[P(U)=a] \ge \varepsilon.$$
So the result follows from Theorem 2.10.

Theorem 2.13 (Recovering messages from noisy codewords). Let $C\subseteq\mathbb{F}_q^n$ be a code with distance $d$. Let $t$ be an integer such that $2t<d$, and let $k$ be an integer dividing $n$. There is a $k$-party protocol $P\colon\mathbb{F}_q^n\to\mathbb{F}_q^n$ communicating $\max\{n-d+2t+1-n/k,\,0\}\cdot\lceil\log_2 q\rceil$ bits such that for every $x\in C$ and every $e$ of Hamming weight $\le t$, $P(x+e)=x$.

Proof. Let $\ell:=n/k$ be the input length of a party. The parties communicate $n-d+2t+1-\ell$ symbols that the first party does not have, and no symbol if $n-d+2t+1-\ell\le 0$. The first party then outputs the unique message whose encoding is at distance $\le t$ from the $n-d+2t+1$ symbols $z$ it has, i.e. the symbols it received plus the $\ell$ it already had. The message corresponding to $x$ clearly is such a message. Also no other such message exists: if two encodings are each at distance $\le t$ from $z$, then they agree with each other in $\ge n-d+1$ of these symbols, and so they cannot differ in $d$ positions and must be the same.
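From the first party's point of view, the decoding step is a simple closest-codeword search; a Python sketch (the codebook dictionary mapping messages to codewords and the list z of (position, symbol) pairs are hypothetical inputs):

    def decode(codebook, z, t):
        """Return the unique message whose encoding disagrees with z in <= t places."""
        for msg, codeword in codebook.items():
            if sum(codeword[p] != v for p, v in z) <= t:
                return msg          # unique, by the distance argument above
        return None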

2.4 Pseudorandomness: I

In this section we prove our first theorem on pseudorandom generators, Theorem 2.16. First, we shall need the following lemma to sample our noise vectors, which is also used in the next section. We write sd for statistical distance.

Lemma 2.35. There is a $\mathrm{poly}(n)$-time computable function $f$ mapping $O(\eta\log(1/\eta)n)$ bits to $\{0,1\}^n$ such that $\mathrm{sd}(f(U),N)\le e^{-\Omega(\eta n)}$.

In turn, that will use the following lemma to sample arbitrary distributions through discretization. A version of the lemma appears in [Vio12, Lemma 5.2]. That version only bounds the number of bits of the sampler. Here we also need that the sampler is efficient.

Lemma 2.36. Let $D$ be a distribution on $S := \{1,2,\dots,n\}$. Suppose that given $i\in S$ we can compute in time polynomial in $|i| = O(\log n)$ the cumulative distribution $\Pr[D\le i]$. Then there is a $\mathrm{poly}\log(nt)$-time computable function $f$ that, given any $t\ge 1$, uses $\lceil\log_2 nt\rceil$ bits to sample a string in the support of $D$ such that $\mathrm{sd}(f(U),D)\le 1/t$.

Proof. Following [Vio12, Lemma 5.2], partition the interval $[0,1]$ into $n$ intervals $I_i$ of lengths $\Pr[D=i]$, $i=1,\dots,n$. Also partition $[0,1]$ into $\ell := 2^{\lceil\log_2 nt\rceil}\ge nt$ intervals of size $1/\ell$ each, which we call blocks. The function $f$ interprets an input as a choice of a block $b$, and outputs $i$ if $b\subseteq I_i$ and, say, outputs $1$ if $b$ is not contained in any interval.
For any $i$ we have $|\Pr[D=i]-\Pr[f(U)=i]|\le 2/\ell$. Hence the statistical distance is $\le(1/2)\sum_i|\Pr[D=i]-\Pr[f(U)=i]|\le(1/2)\,n\cdot 2/\ell\le 1/t$.
To show efficiency we have to explain how, given $b$, we determine the $i$ such that $b\subseteq I_i$. We perform binary search. This requires $O(\log n)$ steps, and in each step we compute the cumulative distribution function of $D$, which by assumption can be done in time polynomial in $\log n$.
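A minimal Python sketch of this sampler (cdf, n, t and the seed are hypothetical inputs; for simplicity the sketch returns the interval containing the left endpoint of the chosen block, rather than treating straddling blocks specially as the proof does):

    import math

    def sample(cdf, n, t, seed):
        """Map a seed in {0, ..., 2**ceil(log2(n*t)) - 1} to an element of {1, ..., n}."""
        blocks = 2 ** math.ceil(math.log2(n * t))
        point = seed / blocks             # left endpoint of the chosen block
        lo, hi = 1, n                     # binary search: O(log n) calls to cdf
        while lo < hi:
            mid = (lo + hi) // 2
            if point < cdf(mid):          # Pr[D <= mid] already covers point
                hi = mid
            else:
                lo = mid + 1
        return lo

For example, with cdf = lambda i: i / n this samples the uniform distribution on {1, ..., n}.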

Proof of Lemma 2.35. We can assume $\eta\ge 1/n$, for otherwise the conclusion is trivial. Our function $f$ first samples a weight distribution $W$ on $\{0,\dots,n\}$ such that $\mathrm{sd}(W,|N|)\le e^{-\Omega(\eta n)}$. By Lemma 2.36, this uses a seed of length $O(\eta n+\log(n+1))$ and runs in time polynomial in $n$. If $W\ge 2\eta n$, we output the all-zero string. Otherwise we sample a string in $\{0,1\}^n$ with Hamming weight $W$ almost uniformly. To do this, we first index the $\binom{n}{W}$ strings in lexicographic order. We then use Lemma 2.36 again to sample an index in $\{1,\dots,\binom{n}{W}\}$ from a distribution that is $e^{-\Omega(\eta n)}$-close to uniform. This takes another seed of length at most $O(\eta n+\log\binom{n}{2\eta n}) = O(\eta n+\eta\log(1/\eta)n)$ and can be computed in time polynomial in $n$. Given an index $i$, we output the corresponding string efficiently using the following recurrence. Let $s(n,k,i)$ denote the $i$-th $n$-bit string with Hamming weight $k$, in lexicographic order. We have

$$s(n,k,i) = \begin{cases} 0\circ s(n-1,\,k,\,i) & \text{if } i\le\binom{n-1}{k},\\[2pt] 1\circ s\bigl(n-1,\,k-1,\,i-\binom{n-1}{k}\bigr) & \text{otherwise.}\end{cases}$$
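This recurrence translates directly into code; a short Python sketch (the helper name s is kept from the text, and ranks are 1-indexed):

    from math import comb

    def s(n, k, i):
        """The i-th n-bit string of Hamming weight k in lexicographic order."""
        if n == 0:
            return ""
        if i <= comb(n - 1, k):                    # string starts with 0
            return "0" + s(n - 1, k, i)
        return "1" + s(n - 1, k - 1, i - comb(n - 1, k))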

Note that $s(n,k,i)$ outputs the string via $n$ comparisons of numbers with at most $n$ bits, and thus can be computed in time polynomial in $n$. Therefore $f$ has input length $O(\eta n+\eta\log(1/\eta)n+\log(n+1)) = O(\eta\log(1/\eta)n)$.
Let $D:=f(U)$. We now bound from above the statistical distance between $D$ and $N$. Denote by $D_w$ the distribution $D$ conditioned on $|D|=w$, and define $N_w$ analogously. We have
$$\sum_{x\in\{0,1\}^n}\bigl|\Pr[D=x]-\Pr[N=x]\bigr| = \sum_{w=0}^n\sum_{|x|=w}\bigl|\Pr[D=x]-\Pr[N=x]\bigr| = \sum_{w=0}^n\sum_{|x|=w}\bigl|\Pr[D_w=x]\Pr[|D|=w]-\Pr[N_w=x]\Pr[|N|=w]\bigr|.$$
Adding $-\Pr[D_w=x]\Pr[|N|=w]+\Pr[D_w=x]\Pr[|N|=w]=0$ in each summand, this is at most
$$\sum_{w=0}^n\sum_{|x|=w}\Pr[D_w=x]\cdot\bigl|\Pr[|D|=w]-\Pr[|N|=w]\bigr| + \sum_{w=0}^n\sum_{|x|=w}\bigl|\Pr[D_w=x]-\Pr[N_w=x]\bigr|\cdot\Pr[|N|=w].$$
The first double summation is at most $2\,\mathrm{sd}(|D|,|N|)=2\,\mathrm{sd}(W,|N|)$. We now bound the second double summation from above, separating the outer sum into $w>2\eta n$ and $w\le 2\eta n$. For the first case, we have
$$\sum_{w>2\eta n}\sum_{|x|=w}\bigl|\Pr[D_w=x]-\Pr[N_w=x]\bigr|\cdot\Pr[|N|=w] \le 2\Pr\bigl[|N|>2\eta n\bigr],$$
which by the Chernoff bound is at most $2e^{-\Omega(\eta n)}$. For the other case, we have
$$\sum_{w\le 2\eta n}\sum_{|x|=w}\bigl|\Pr[D_w=x]-\Pr[N_w=x]\bigr|\cdot\Pr[|N|=w] \le 2\max_{w\le 2\eta n}\mathrm{sd}(D_w,N_w).$$
Therefore,

$$\mathrm{sd}(D,N) \le \mathrm{sd}\bigl(W,|N|\bigr) + \max_{w\le 2\eta n}\mathrm{sd}(D_w,N_w) + e^{-\Omega(\eta n)} \le 3e^{-\Omega(\eta n)}.$$

We can now prove our first theorem on pseudorandom generators.

Theorem 2.16 (PRG for unordered products, I). There exist explicit pseudorandom generators $G\colon\{0,1\}^s\to(\{0,1\}^\ell)^k$ that $\varepsilon$-fool product tests in any order, with the following seed lengths:
1. $s = 2\ell + \tilde O(k^2\log(1/\varepsilon))$, and
2. $s = O(\ell) + \tilde O((\ell k)^{2/3}\log^{1/3}(1/\varepsilon))$.

Proof. (1) We apply Theorem 2.22. Known constructions [AGHP92, Theorem 2] (see also [NN93]) produce a $\delta$-biased distribution over $n$ bits using $2\log(1/\delta)+O(\log n)$ bits. We set $\delta = O(2^{-\ell}\varepsilon)$, resulting in a seed length of $2\ell+2\log(1/\varepsilon)+O(\log n)$ bits. For the noise we set $\eta = O(k\log k\log(k/\varepsilon)/\ell)$. Note that $\eta\le 1$ because we can assume $k^2\log k\log(k/\varepsilon)\log\ell\le\ell$, for else (2) gives a better bound. By Lemma 2.35, the seed length to generate the noise vector is $O(k^2\log k\log(k/\varepsilon)\log(\ell/k))$. So the overall seed length is $s = 2\ell+O(k^2\log k\log(k/\varepsilon)\log\ell) = 2\ell+\tilde O(k^2\log(1/\varepsilon))$.
Applying Theorem 2.22 with $t = c\ell/(k\log k)$ for a small enough constant $c$, we can bound $V(t)^{k/2}\le V(t)^{k-1}\le 2^\ell$. Thus the error bound from Theorem 2.22 is at most $k(1-\eta)^{c\ell/(k\log k)}(1+2^\ell\delta)+2^\ell\delta \le 2k(1-\eta)^{c\ell/(k\log k)}+\varepsilon/4\le\varepsilon/2$. The error from Lemma 2.35 is $e^{-\Omega(\eta n)}\le\varepsilon/2$. Thus overall the error is at most $\varepsilon$.
The fact that we can apply any permutation $\pi$ follows because applying such a permutation does not change the noise distribution, and preserves the property of being $b$-wise independent. We remark that one can also replace the $\delta$-biased distribution with a $\delta$-almost $2tk$-wise independent distribution, but since $\log(1/\delta)$ is the dominating term in the seed length there is no advantage in doing so.
(2) Let $c := \lfloor(k^2\log k\log(k/\varepsilon)\log\ell/\ell)^{1/3}\rfloor$. We can assume $c\ge 1$, for else (1) gives a better bound. Group the $k$ functions into $k' = \lceil k/c\rceil$ functions on input length $\ell' := c\ell$. We can now apply (1) with $\ell,k$ replaced by $\ell',k'$ to get the desired seed length of $s = O(\ell)+O(\ell^{2/3}(k^2\log k\log(k/\varepsilon)\log\ell)^{1/3}) = O(\ell)+\tilde O((\ell k)^{2/3}\log^{1/3}(1/\varepsilon))$.

2.5 Pseudorandomness, II

We now move to our second theorem on pseudorandom generators, Theorem 2.17. We begin by modifying Theorem 2.22 to allow us to sample the noise in a certain pseudorandom way. Specifically, we can write the noise vector $N$ of the previous sections as $N = T\wedge U$, where $U$ is uniform, $T$ is a vector of i.i.d. bits, each equal to $1$ with probability $\eta$, and $\wedge$ denotes bit-wise AND. In the derandomized version, we keep $U$ uniform but select $T$ using an almost $\ell$-wise independent distribution. The analogue of Theorem 2.22 with this derandomization is proved below as Theorem 2.38. Finally, we show how to recurse on $U$ in Section 2.5.2. At the end of the section we show that a certain improvement in the error bound of Theorem 2.38 would yield much better pseudorandom generators.
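Concretely, a sample from this distribution is just three bit-vectors combined with XOR and AND; a toy Python illustration (with truly random D and T standing in for the small-bias and almost $\ell$-wise independent samples, which is exactly what the derandomization below avoids):

    import random

    def sample_D_plus_T_and_U(n, eta):
        D = random.getrandbits(n)   # stand-in for a small-bias sample
        U = random.getrandbits(n)   # fresh uniform bits
        # T selects each position independently with probability eta
        T = sum((random.random() < eta) << i for i in range(n))
        return D ^ (T & U)          # positions selected by T are rerandomized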

Definition 2.37. A distribution $T$ on $n$ bits is $\gamma$-almost $d$-wise independent if for every $d$ indices $i_1,\dots,i_d$ and any $S\subseteq\{0,1\}^d$ we have
$$\Bigl|\sum_{x\in S}\Bigl(\Pr\Bigl[\bigwedge_{j\le d}T_{i_j}=x_j\Bigr]-\prod_{j\le d}\Pr[T_{i_j}=x_j]\Bigr)\Bigr| \le \gamma.$$

Theorem 2.38 (Bounded independence plus derandomized noise fools products). Let $t\in[0,\ell]$. Let $f_1,\dots,f_k\colon\{0,1\}^\ell\to\mathbb{C}_1$ be $k$ functions with $\mu_i=\mathbb{E}[f_i]$. Let $D$ be a $\delta$-biased distribution over $(\{0,1\}^\ell)^k$. Let $T$ be a $\gamma$-almost $\ell$-wise independent distribution over $(\{0,1\}^\ell)^k$ which sets each bit to $1$ with probability $\eta$ and to $0$ otherwise. Assume $\gamma\le\eta$. Let $U$ be the uniform distribution over $(\{0,1\}^\ell)^k$. Write $D=(D_1,D_2,\dots,D_k)$ where each $D_i$ is in $\{0,1\}^\ell$, and similarly for $T$ and $U$. Then
$$\Bigl|\mathbb{E}\Bigl[\prod_{i\le k}f_i(D_i+T_i\wedge U_i)\Bigr]-\prod_{i\le k}\mu_i\Bigr| \le k\bigl((1-\eta)^t+\gamma\bigr)^{1/2}\sqrt{(1+2^\ell\delta)(1+V(t)^{k-1}\delta)}+V(t)^{k/2}\delta.$$

2.5.1 Proof of Theorem 2.38

We begin exactly as in the proof of Theorem 2.22. For a function $f\colon\{0,1\}^\ell\to\mathbb{C}_1$, consider its Fourier expansion $f(x):=\sum_\alpha\hat f_\alpha\chi_\alpha(x)$, and let $f^L(x):=\sum_{\alpha:|\alpha|\le t}\hat f_\alpha\chi_\alpha(x)$ and $f^H(x):=\sum_{\alpha:|\alpha|>t}\hat f_\alpha\chi_\alpha(x)$. Define $F_i\colon(\{0,1\}^\ell)^k\to\mathbb{C}$ by
$$F_i(x_1,\dots,x_k):=\Bigl(\prod_{j<i}f_j(x_j)\Bigr)\cdot f_i^H(x_i)\cdot\Bigl(\prod_{\ell>i}f_\ell^L(x_\ell)\Bigr).$$
Pick $f_k$ and write it as $f_k^L+f_k^H$. We can then rewrite
$$\prod_{1\le i\le k}f_i = F_k+\Bigl(\prod_{1\le i\le k-1}f_i\Bigr)\cdot f_k^L.$$
We can reapply the process to $\prod_{1\le i\le k-1}f_i$. Continuing this way, we eventually obtain that the quantity we want to bound, i.e. $\bigl|\mathbb{E}[\prod_{i\le k}f_i(D_i+T_i\wedge U_i)]-\prod_{i\le k}\mu_i\bigr|$, is at most
$$\sum_{i\le k}\bigl|\mathbb{E}[F_i(D+T\wedge U)]\bigr|+\Bigl|\mathbb{E}\Bigl[\prod_{i\le k}f_i^L(D_i+T_i\wedge U_i)\Bigr]-\prod_{i\le k}\mu_i\Bigr|.$$
The theorem follows readily from the next two lemmas, the second of which has a longer proof. The first one has the same proof as Lemma 2.29.

Lemma 2.39. $\bigl|\mathbb{E}_{D,T,U}[\prod_{i\le k}f_i^L(D_i+T_i\wedge U_i)]-\prod_{i\le k}\mu_i\bigr|\le V(t)^{k/2}\delta$.

Lemma 2.40. For every $i\in\{1,2,\dots,k\}$, we have
$$\bigl|\mathbb{E}_{D,T,U}[F_i(D+T\wedge U)]\bigr| \le \bigl((1-\eta)^t+\gamma\bigr)^{1/2}\sqrt{(1+2^\ell\delta)(1+V(t)^{k-1}\delta)}.$$

Proof. We have

$$\begin{aligned}
\bigl|\mathbb{E}[F_i(D+T\wedge U)]\bigr| &= \Bigl|\mathbb{E}\Bigl[\prod_{j<i}f_j(D_j+T_j\wedge U_j)\cdot f_i^H(D_i+T_i\wedge U_i)\cdot\prod_{\ell>i}f_\ell^L(D_\ell+T_\ell\wedge U_\ell)\Bigr]\Bigr|\\
&\le \mathbb{E}_{D,T}\Bigl[\prod_{j<i}\bigl|\mathbb{E}_{U_j}[f_j(D_j+T_j\wedge U_j)]\bigr|\cdot\Bigl|\mathbb{E}_{U_i}[f_i^H(D_i+T_i\wedge U_i)]\cdot\prod_{\ell>i}\mathbb{E}_{U_\ell}[f_\ell^L(D_\ell+T_\ell\wedge U_\ell)]\Bigr|\Bigr]\\
&\le \mathbb{E}_{D,T}\Bigl[\Bigl|\mathbb{E}_{U_i}[f_i^H(D_i+T_i\wedge U_i)]\cdot\prod_{\ell>i}\mathbb{E}_{U_\ell}[f_\ell^L(D_\ell+T_\ell\wedge U_\ell)]\Bigr|\Bigr],
\end{aligned}$$
where the last inequality holds because $|\mathbb{E}_{U_j}[f_j(D_j+T_j\wedge U_j)]|\le\mathbb{E}_{U_j}[|f_j(D_j+T_j\wedge U_j)|]\le 1$ for every $j<i$, by Jensen's inequality, convexity of norms, and the fact that the range of $f_j$ is $\mathbb{C}_1$. By the Cauchy–Schwarz inequality, we get
$$\bigl|\mathbb{E}[F_i(D+T\wedge U)]\bigr| \le \mathbb{E}_{D,T}\Bigl[\bigl|\mathbb{E}_{U_i}[f_i^H(D_i+T_i\wedge U_i)]\bigr|^2\Bigr]^{1/2}\cdot\mathbb{E}_{D,T}\Bigl[\prod_{\ell>i}\bigl|\mathbb{E}_{U_\ell}[f_\ell^L(D_\ell+T_\ell\wedge U_\ell)]\bigr|^2\Bigr]^{1/2}.$$
In claims 2.42 and 2.43 below we bound from above the squares of the two terms on the right-hand side. In both cases, we view our task as bounding $|\mathbb{E}_D[g(D)]|$ for a certain function $g$, and we proceed by computing the $L_1$-norm, the expectation under uniform, and the degree of $g$, and then applying Fact 2.28. We start with a claim that is useful in both cases.

Claim 2.41 (Replacing Claim 2.31). Let $f\colon\{0,1\}^\ell\to\mathbb{C}$ be a function. Let $T$ be a $\gamma$-almost $\ell$-wise independent distribution which sets each bit to $1$ with probability $\eta$ and to $0$ otherwise. Let $U$ and $U'$ be two independent uniform distributions over $\ell$ bits. Then:
(1) for every $x$, $\mathbb{E}_{T,U}[f(x+T\wedge U)] = \sum_\alpha\hat f_\alpha\chi_\alpha(x)\bigl((1-\eta)^{|\alpha|}\pm\gamma\bigr)$, and
(2) $\mathbb{E}_{U,T}\bigl[|\mathbb{E}_{U'}[f(U+T\wedge U')]|^2\bigr] \le \sum_\alpha|\hat f_\alpha|^2\bigl((1-\eta)^{|\alpha|}+\gamma\bigr)$.

Proof. For (1), write $\mathbb{E}_{T,U}[f(x+T\wedge U)] = \mathbb{E}_{T,U}\bigl[\sum_\alpha\hat f_\alpha\chi_\alpha(x+T\wedge U)\bigr] = \sum_\alpha\hat f_\alpha\chi_\alpha(x)\,\mathbb{E}_{T,U}[\chi_\alpha(T\wedge U)]$. If $T$ does not intersect $\alpha$ then the expectation is one, and this happens with probability between $(1-\eta)^{|\alpha|}-\gamma$ and $(1-\eta)^{|\alpha|}+\gamma$. Otherwise, the expectation is $0$.
For (2), write $\mathbb{E}_{U,T}\bigl[|\mathbb{E}_{U'}[f(U+T\wedge U')]|^2\bigr]$ as
$$\mathbb{E}_{U,T}\Bigl[\sum_{\alpha,\alpha'}\hat f_\alpha\overline{\hat f_{\alpha'}}\chi_{\alpha-\alpha'}(U)\,\mathbb{E}_{U',U''}\bigl[\chi_\alpha(T\wedge U')\overline{\chi_{\alpha'}(T\wedge U'')}\bigr]\Bigr].$$
Since $U$ is uniform this becomes $\sum_\alpha|\hat f_\alpha|^2\,\mathbb{E}_{T,U',U''}[\chi_\alpha(T\wedge(U'-U''))] = \sum_\alpha|\hat f_\alpha|^2\,\mathbb{E}[\chi_\alpha(T\wedge U)]$. The claim then follows as in (1).

We can now bound our terms.

Claim 2.42 (Replacing Claim 2.32). For every $i$, $\mathbb{E}_{D,T}\bigl[|\mathbb{E}_U[f_i^H(D_i+T_i\wedge U_i)]|^2\bigr] \le \bigl((1-\eta)^t+\gamma\bigr)(1+2^\ell\delta)$.

Proof. Let $g(x)$ be the function $g(x) = \mathbb{E}_{T_i,U_i}[f_i^H(x+T_i\wedge U_i)]$. By (1) in Claim 2.41, the $L_1$-norm of $g$ is at most $\sum_{\alpha:|\alpha|>t}|\hat f_\alpha|\bigl((1-\eta)^{|\alpha|}+\gamma\bigr)\le\bigl((1-\eta)^t+\gamma\bigr)\sum_\alpha|\hat f_\alpha|\le\bigl((1-\eta)^t+\gamma\bigr)2^{\ell/2}$, where we used Cauchy–Schwarz and Fact 2.26.
Also, by (2) in Claim 2.41 and Fact 2.26, $\mathbb{E}_U[|g(U)|^2]$ under uniform is at most $(1-\eta)^t+\gamma$.
Because $\ell k\ge\ell$ we can apply (3) in Fact 2.28 to obtain that $\mathbb{E}_D[|g(D)|^2]\le\bigl((1-\eta)^t+\gamma\bigr)+\bigl((1-\eta)^t+\gamma\bigr)^2 2^\ell\delta\le\bigl((1-\eta)^t+\gamma\bigr)(1+2^\ell\delta)$, as claimed.

Claim 2.43. $\mathbb{E}_{D,T}\bigl[\prod_{\ell>i}\bigl|\mathbb{E}_{U_\ell}[f_\ell^L(D_\ell+T_\ell\wedge U_\ell)]\bigr|^2\bigr] \le 1+V(t)^{k-1}\delta$.


Proof. Pick any $\ell>i$ and let $g_\ell(x):=\mathbb{E}_{T,U_\ell}[f_\ell^L(x+T_\ell\wedge U_\ell)]$.
The $L_1$-norm of $g_\ell$ is at most $V(t)^{1/2}$ by (1) in Claim 2.41, Cauchy–Schwarz, and the assumption that $\gamma\le\eta$. Also by (2) in the same claim we have $\mathbb{E}_U[|g_\ell(U)|^2]\le 1$. Moreover, $g_\ell$ has degree at most $t$ by (1) in the same claim.
Now define $g\colon(\{0,1\}^\ell)^{k-i}\to\mathbb{C}$ as $g(x_{i+1},x_{i+2},\dots,x_k):=g_{i+1}(x_{i+1})\cdot g_{i+2}(x_{i+2})\cdots g_k(x_k)$. Note that $g$ has $L_1$-norm at most $V(t)^{(k-i)/2}\le V(t)^{(k-1)/2}$ and degree $(k-i)t\le(k-1)t$, by Fact 2.25 applied with $u=\ell(k-i)$. Moreover, $\mathbb{E}_{U_{i+1},U_{i+2},\dots,U_k}\bigl[|g(U_{i+1},U_{i+2},\dots,U_k)|^2\bigr]=\mathbb{E}_{U_{i+1}}[|g_{i+1}|^2]\cdot\mathbb{E}_{U_{i+2}}[|g_{i+2}|^2]\cdots\mathbb{E}_{U_k}[|g_k|^2]\le 1$.
Because $\ell k\ge 2(k-1)t$, we can apply (2) in Fact 2.28 to obtain
$$\mathbb{E}_D[|g(D)|^2]\le 1+V(t)^{k-1}\delta$$
as desired.

Lemma 2.40 follows by combining claims 2.42 and 2.43.

2.5.2 A recursive generator

In this section we construct our generators for product tests and prove Theorem 2.17. Recall that in Theorem 2.38 we proved that the distribution $D+T\wedge U$, where $D$ is small-biased and $T$ is almost $\ell$-wise independent, fools product tests. The idea is that once we have fixed $D$, we can view the product test restricted to the positions selected by $T$ as another product test with shorter input. Hence we can recursively replace the uniform bits in those positions by the distribution in Theorem 2.38, until we are left with a product test on a few bits.
We begin with a lemma that shows the correctness of each step of the recursion. Theorem 2.17 then follows by applying the lemma repeatedly.

Lemma 2.44. Suppose $\ell\ge Ck\log k\log(k/\varepsilon)$ for a universal constant $C$. Let $c$ be a multiple of $4$ and assume $k$ is a multiple of $c$. If there is an explicit generator $G_{c\ell/4,k/c}\colon\{0,1\}^s\to(\{0,1\}^{c\ell/4})^{k/c}$ that $\varepsilon$-fools product tests that read bits in any order and uses a seed of length $s$, then there is an explicit generator $G_{\ell,k}\colon\{0,1\}^{s'}\to(\{0,1\}^\ell)^k$ that fools product tests in any order with error $\varepsilon/k+\varepsilon$ and uses a seed of length $s'=O(\ell)+s$.

Proof. Let $n=\ell k$. For $S\subseteq\{1,2,\dots,n\}$, define the function $\mathrm{PAD}_S(x)\colon\{0,1\}^{|S|}\to\{0,1\}^n$ which outputs $n$ bits of which the positions in $S$ are the first $|S|$ bits of $x0^{|S|}$ and the rest are $0$.
Our generator $G_{\ell,k}\colon\{0,1\}^{s'}\to(\{0,1\}^\ell)^k$ samples a $2^{-2\ell}$-biased distribution $D$ on $n$ bits, and a $2^{-2\ell}$-almost $\ell$-wise independent distribution $T=(T_1,\dots,T_k)$ on $n$ bits which sets each bit to $1$ with probability $1/8$ and to $0$ otherwise. Then it outputs $D+T\wedge\mathrm{PAD}_T(G_{c\ell/4,k/c}(U_s))$.
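A short Python sketch of the PAD operation under the convention just stated (function and variable names are ours):

    def pad(S, x, n):
        out = [0] * n
        for pos, bit in zip(sorted(S), x):  # extra positions of S stay 0
            out[pos] = bit
        return out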

We first analyze the seed length of $G_{\ell,k}$. Standard constructions [NN93, AGHP92] use a seed of $O(\ell)$ bits to sample $D$. To sample $T$, we will use the following lemma from [RSV13].

Lemma 2.45 (Lemma B.2 in [RSV13]). There is an explicit sampler that samples a $\gamma$-almost $\ell$-wise independent distribution $T$ on $n$ bits which sets each bit to $1$ with probability $\eta$ and to $0$ otherwise, using a seed of length $O(\ell\log(1/\eta)+\log((\log n)/\gamma))$.

Applying the lemma with $\gamma=2^{-2\ell}$ and $\eta=1/8$, we can sample $T$ with $O(\ell)$ bits. So the total seed length of $G_{\ell,k}$ is $s'=O(\ell)+s$.
We now analyze the error of $G_{\ell,k}$. Let $f\colon(\{0,1\}^\ell)^k\to\mathbb{C}_1$ be a product test. We bound $|\mathbb{E}[f(U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]|$ from above by
$$\bigl|\mathbb{E}[f(U_n)]-\mathbb{E}[f(D+T\wedge U_n)]\bigr|+\bigl|\mathbb{E}[f(D+T\wedge U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]\bigr|.$$
The first term is at most $\varepsilon/2k$ by Theorem 2.38 with the following choice of parameters. We set $t=c\ell/(k\log k)$ for a small enough constant $c$. Then we can bound $V(t)^{k/2}\le V(t)^{k-1}\le 2^\ell$. We set $\eta=1/8$. Thus, by the condition $\ell\ge Ck\log k\log(k/\varepsilon)$, the error bound from Theorem 2.38 is at most
$$k\bigl((1-\eta)^{c\ell/(k\log k)}+\gamma\bigr)^{1/2}(1+2^\ell\delta)+2^\ell\delta \le O\bigl(k((\varepsilon/k)^{100}+2^{-2\ell})^{1/2}\bigr)+2^{-\ell} \le \varepsilon/2k.$$
For the second term we do not use any property of $D$. Let $T'$ be $T$ conditioned on $|T_i|\le\ell/4$ for every $1\le i\le k$. For every fixed $y\in D$ and $t\in T'$, consider the function $f_{y,t}\colon(\{0,1\}^{\ell/4})^k\to\mathbb{C}_1$ defined by $f_{y,t}(x):=f(y+t\wedge\mathrm{PAD}_t(x))$. Note that we can group every $c$ functions into one and think of $f_{y,t}$ as a product test with $k/c$ functions on $c\ell/4$ bits, which can be fooled by $G_{c\ell/4,k/c}$. Thus, $|\mathbb{E}[f(D+T'\wedge U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]|$ equals
$$\bigl|\mathbb{E}[f(D+T'\wedge\mathrm{PAD}_{T'}(U_{n/4}))]-\mathbb{E}[f(D+T'\wedge\mathrm{PAD}_{T'}(G_{c\ell/4,k/c}(U_s)))]\bigr| \le \mathbb{E}_{y\sim D,\,t\sim T'}\Bigl[\bigl|\mathbb{E}[f_{y,t}(U_{n/4})]-\mathbb{E}[f_{y,t}(G_{c\ell/4,k/c}(U_s))]\bigr|\Bigr] \le \varepsilon.$$

Now let $E$ denote the event that $|T_i|>\ell/4$ for some $1\le i\le k$. We bound $\Pr[E]$ from above, using the following tail bound for almost $d$-wise independence.

Lemma 2.46 (Lemma B.1 in [RSV13]). Let $T=(T_1,\dots,T_\ell)$ be a $\gamma$-almost $d$-wise independent distribution on $\ell$ bits where $\mathbb{E}[T_j]=\eta$ for every $1\le j\le\ell$. Then for any $\varepsilon\in(0,1)$,
$$\Pr\Bigl[\Bigl|\sum_{i\le\ell}T_i-\eta\ell\Bigr|\ge\varepsilon\ell\Bigr]\le\Bigl(\frac{ed}{2\varepsilon^2\ell}\Bigr)^{d/2}+\gamma/\varepsilon^d.$$

We apply Lemma 2.46 with $\eta=\varepsilon=1/8$, $\gamma=2^{-2\ell}$, and $d=\Omega(\ell)$. This guarantees that for each $1\le i\le k$ the probability of $|T_i|>\ell/4$ is at most $2^{-\Omega(\ell)}$. By a union bound over $T_1,\dots,T_k$, we have $\Pr[E]\le k2^{-\Omega(\ell)}\le\varepsilon/2k$. Putting everything together, we have error $\varepsilon/k+\varepsilon$.

Finally, we combine these results to prove Theorem 2.17.

Theorem 2.17 (PRG for unordered products, II). There exists an explicit pseudorandom generator $G\colon\{0,1\}^s\to(\{0,1\}^\ell)^k$ that $\varepsilon$-fools product tests in any order, with seed length $s=O(\ell+\sqrt{\ell k\log k\log(k/\varepsilon)})$.

Proof. Let $C$ be the universal constant in Lemma 2.44. Suppose $\ell\ge Ck\log k\log(k/\varepsilon)$. We first apply Lemma 2.44 with $c=2$ for $t:=O(\log k)$ times, until we are left with a product test of $O(1)$ functions on $O(\ell/k)$ bits, and then we output the uniform $O(\ell/k)$-bit string. Note that the condition $\ell\ge Ck\log k\log(k/\varepsilon)$ holds throughout because $\ell$ and $k$ are both divided by $2$ at each step.
Note that in each application of Lemma 2.44 we reduce $\ell$ by at least a half. Hence the total seed length is at most $\sum_{i=0}^t O(\ell/2^i)+O(\ell/k)=O(\ell)$. The error is at most $\sum_{i=0}^t 2^i\varepsilon/k\le\varepsilon$.
If $\ell\le Ck\log k\log(k/\varepsilon)$, pick an integer $c=O(\sqrt{k\log k\log(k/\varepsilon)/\ell})$ so that $c^2\ell\ge O(Ck\log k\log(k/\varepsilon))$. By grouping every $c$ functions into one, $f$ is also a product test of $k/c$ functions on $c\ell$ bits. Hence, by the previous result we have a generator with seed length $s=O(c\ell)=O(\sqrt{\ell k\log k\log(k/\varepsilon)})$.

A potential improvement. We now show that an improvement in the error bound of Theorem 2.38 would yield much better pseudorandom generators.

Claim 2.47. Let $D$ be an $\ell$-wise independent distribution on $n:=\ell k$ bits. Let $T$ be an $\ell$-wise independent distribution on $n$ bits which sets each bit to $1$ with probability $\eta$. Let $U$ be the uniform distribution on $n$ bits.
Suppose that for any product test $f\colon(\{0,1\}^\ell)^k\to\mathbb{C}_1$ on $n$ bits we have $|\mathbb{E}[f(U)]-\mathbb{E}[f(D+T\wedge U)]|\le k(1-\eta)^{\Omega(\ell)}$. Then there is an explicit generator $G\colon\{0,1\}^s\to(\{0,1\}^\ell)^k$ that $\varepsilon$-fools product tests in any order with seed length $s=O((\ell+\log k\log(n/\varepsilon))\log n)$.

To prove Claim 2.47, first we replace Lemma 2.44 with the following lemma.

Lemma 2.48. Suppose $\ell\ge C\log(n/\varepsilon)$ for a universal constant $C$. Let $c$ be an integer. If there is an explicit generator $G_{c\ell/4,k/c}$ that fools product tests that read bits in any order on $(c\ell/4)\cdot(k/c)$ bits with error $\varepsilon$ and uses a seed of length $s$, then there is an explicit generator $G_{\ell,k}\colon\{0,1\}^{s'}\to(\{0,1\}^\ell)^k$ that fools product tests that read bits in any order on $n:=\ell k$ bits with error $\varepsilon/n+\varepsilon$ and uses a seed of $s'=O(\ell\log n)+s$ bits.

Proof. The generator is very similar to the one in Lemma 2.44, except that $G$ now samples an $\ell$-wise independent distribution $D$ on $n$ bits and an $\ell$-wise independent distribution $T$ on $n$ bits that sets each bit to $1$ with probability $1/8$ and to $0$ otherwise. Sampling $D$ and $T$ takes a seed of length $O(\ell\log n)$ [CG89, ABI86].
Now we analyze the error of $G_{\ell,k}$. Let $f\colon(\{0,1\}^\ell)^k\to\mathbb{C}_1$ be a product test. As in the proof of Lemma 2.44 we bound $|\mathbb{E}[f(U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]|$ from above by
$$\bigl|\mathbb{E}[f(U_n)]-\mathbb{E}[f(D+T\wedge U_n)]\bigr|+\bigl|\mathbb{E}[f(D+T\wedge U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]\bigr|.$$
By our assumption, the first term is at most $k(1-\eta)^{\Omega(\ell)}\le\varepsilon/2n$. For the second term, let $T'$ be $T$ conditioned on $|T_i|\le\ell/4$ for every $1\le i\le k$. For every fixed $y\in D$ and $t\in T'$, consider the function $f_{y,t}\colon(\{0,1\}^{\ell/4})^k\to\mathbb{C}_1$ defined by $f_{y,t}(x):=f(y+t\wedge\mathrm{PAD}_t(x))$. Note that we can group every $c$ functions into one and think of $f_{y,t}$ as a product test of $k/c$ functions on $c\ell/4$ bits, which can be fooled by $G_{c\ell/4,k/c}$. Thus, $|\mathbb{E}[f(D+T'\wedge U_n)]-\mathbb{E}[f(G_{\ell,k}(U_{s'}))]|$ equals
$$\bigl|\mathbb{E}[f(D+T'\wedge\mathrm{PAD}_{T'}(U_{n/4}))]-\mathbb{E}[f(D+T'\wedge\mathrm{PAD}_{T'}(G_{c\ell/4,k/c}(U_s)))]\bigr| \le \mathbb{E}_{y\sim D,\,t\sim T'}\Bigl[\bigl|\mathbb{E}[f_{y,t}(U_{n/4})]-\mathbb{E}[f_{y,t}(G_{c\ell/4,k/c}(U_s))]\bigr|\Bigr] \le \varepsilon.$$
Now let $E$ denote the event that $|T_i|>\ell/4$ for some $1\le i\le k$. We bound $\Pr[E]$ from above. Since $T$ is $\ell$-wise independent, the bits within each block $T_i$ are mutually independent, and so by the Chernoff bound the probability of $|T_i|>\ell/4$ is at most $2^{-\Omega(\ell)}$. By a union bound over $T_1,\dots,T_k$, we have $\Pr[E]\le k2^{-\Omega(\ell)}\le\varepsilon/2n$. Putting everything together, we have error $\varepsilon/n+\varepsilon$.

Proof of Claim 2.47. Suppose $\ell\ge C\log(n/\varepsilon)$. We apply Lemma 2.48 recursively, in two different ways: one way reduces $\ell$ and the other reduces $k$. First, we apply the lemma

with $c=1$ for $t_1:=O(\log\ell)$ times to bring $\ell$ down to $\ell'=O(\log(n/\varepsilon))$. This takes a seed of $s_1:=\sum_{i=0}^{t_1}O(\ell\log n/4^i)=O(\ell\log n)$ bits. Now we have a product test of $k$ functions on $\ell'$ bits. We will instead think of it as a product test of $k/2$ functions on $2\ell'$ bits, and apply Lemma 2.48 with $c=2$, which will reduce it to a product test of $k/4$ functions on $\ell'$ bits. We repeat this for $t_2:=O(\log k)$ steps to reduce $k$ to $k'=O(1)$. This takes a seed of $s_2:=t_2\cdot O(\ell'\log n)=O(\log k\log(n/\varepsilon)\log n)$ bits. Now we are left with a product test of $k'$ functions on $\ell'$ bits, and we can output the uniform string. Therefore the total seed length is $s=s_1+s_2+O(\log(n/\varepsilon))=O((\ell+\log k\log(n/\varepsilon))\log n)$. Because in each application of Lemma 2.48 the input length of the product test decreases by at least half, the error bound is at most $\sum_{i=0}^{t_1+t_2}2^i\varepsilon/n\le 2^{O(\log n)}\varepsilon/n\le\varepsilon$.
If $\ell\le C\log(n/\varepsilon)$, we can group the functions to obtain a product test of $k'$ functions on $C\log(n/\varepsilon)$ bits where $k'\le k$, and reason as before.

2.6 Pseudorandomness, III

In this section we prove Theorem 2.20, giving generators for generalized halfspaces and combinatorial shapes. See Definition 2.19 for their definitions.

Lemma 2.49 ([GKM18]). Suppose $G\colon\{0,1\}^s\to(\{0,1\}^\ell)^k$ is an explicit generator that $\varepsilon$-fools any product test on $\ell k$ bits that reads bits in any order. Then:
1. $G$ fools any generalized halfspace $h\colon(\{0,1\}^\ell)^k\to\{0,1\}$ on $\ell k$ bits that reads bits in any order with error $O(k2^\ell(\ell+\log k)\varepsilon)$;
2. $G$ fools any combinatorial shape $g\colon(\{0,1\}^\ell)^k\to\{0,1\}$ on $\ell k$ bits that reads bits in any order with error $O(k^2 2^\ell(\ell+\log k)\varepsilon)$.

Proof of Lemma 2.49. (1) Let $U=(U_1,\dots,U_k)$ be the uniform distribution over $(\{0,1\}^\ell)^k$ and $X=(X_1,\dots,X_k)=\pi G(U_s)\in(\{0,1\}^\ell)^k$, where $U_s$ is uniform over $\{0,1\}^s$ and $\pi$ is some permutation on $\ell k$ bits. Let $Z_1:=\sum_{i\le k}g_i(U_i)$ and $Z_2:=\sum_{i\le k}g_i(X_i)$. Since $G$ fools product tests with error $\varepsilon$, we have for every $\alpha\in[0,1]$,
$$\bigl|\mathbb{E}[e^{2\pi i\alpha Z_1}]-\mathbb{E}[e^{2\pi i\alpha Z_2}]\bigr| = \Bigl|\mathbb{E}\Bigl[\prod_{i\le k}e^{2\pi i\alpha g_i(U_i)}\Bigr]-\mathbb{E}\Bigl[\prod_{i\le k}e^{2\pi i\alpha g_i(X_i)}\Bigr]\Bigr| \le \varepsilon.$$
By [GKM18, Lemma 9.3], we may assume each $g_i(j)$ and $\theta$ are integers of absolute value at most $B:=(2^\ell k)^{O(2^\ell k)}$, and so $-kB\le Z_1,Z_2\le kB$. It follows from [GKM18, Lemma 9.2] that
$$\bigl|\mathbb{E}[h(\pi G(U_s))]-\mathbb{E}[h(U)]\bigr| \le \max_{-kB\le t\le kB}\bigl|\Pr[Z_1\le t]-\Pr[Z_2\le t]\bigr| \le O\bigl(\log(kB)\bigr)\varepsilon.$$
(2) Since $\sum_{i\le k}g_i(x_i)\in\{0,\dots,k\}$, it suffices to fool the generalized halfspaces $h(x):=\sum_{i\le k}g_i(x_i)-\theta$ for $\theta\in\{0,\dots,k\}$; the rest follows from (1) and a union bound.

Theorem 2.20 (PRG for generalized halfspaces and combinatorial shapes, in any order). There exists an explicit pseudorandom generator $G\colon\{0,1\}^s\to(\{0,1\}^\ell)^k$ that $\varepsilon$-fools both generalized halfspaces and combinatorial shapes in any order with seed length $s=\tilde O(\ell\sqrt{k}+\sqrt{\ell k\log(1/\varepsilon)})$.

Proof. Combine Lemma 2.49 and Theorem 2.17.

2.7 A lower bound on b and η

In this section, we make some final remarks on the possible range of parameters in Theorem 2.5 in Section 2.1.1. First, we note that if the noise parameter $\eta$ is set to $0$ then we require that $b\ge\ell$. Next we prove a lower bound on $b$ that holds also for other settings of $\eta$. In particular, for any $b=(1-\varepsilon)\ell$ we show that $\eta$ must be at least a constant $\varepsilon'$ which depends only on $\varepsilon$.
Let $k=1$ and let $M$ be a uniformly chosen $\ell\times t$ matrix over $\mathbb{F}_q$. The probability that the corresponding code has minimum distance $\le b$ is at most $q^t V_q(b)/q^\ell$. Hence a code $C'$ exists with minimum distance $>b$ for $\ell-t=\lceil\log_q V_q(b)\rceil$. By Fact 2.9, the uniform distribution $D$ over the dual code $C$ of $C'$ is $b$-uniform. This distribution can be generated by an $\ell\times(\ell-t)$ matrix. Hence the support size of this distribution is $q^{\ell-t}\le O(V_q(b))$. Moreover, by Lemma 2.35 we can sample with $O(\eta\log(q/\eta)\ell)$ bits a distribution that is $2^{-\Omega(\eta\ell)}$-close to the noise vector $N$. Hence $D+N$ is $2^{-\Omega(\eta\ell)}$-close to a distribution supported on a set $S$ of size
$$O\bigl(V_q(b)\cdot 2^{O(\eta\log(q/\eta)\ell)}\bigr) \le 2^{b\log O(e\ell q/b)+O(\eta\log(q/\eta)\ell)}.$$
Define the function $f_1$ to be the characteristic function of $S$. By Lemma 2.35 the function outputs $1$ on $D+N$ with probability $1-2^{-\Omega(\eta\ell)}$. On the other hand, the function outputs $1$ on a uniform input with probability $|S|/q^\ell$. In particular, for any $b=(1-\varepsilon)\ell$ and sufficiently large $q$, this shows that $f_1$ has a constant distinguishing advantage for all $\eta$ less than $\varepsilon'$, where $\varepsilon'$ depends only on $\varepsilon$.

Chapter 3

Pseudorandom Generators for Read-Once Polynomials

Pseudorandom generators for polynomials have been studied since at least the 1993 work by Luby, Veličković, and Wigderson [LVW93], who gave a generator for $\mathbb{F}_2$-polynomials of size $s$ with error $\varepsilon$ and seed length $2^{O(\sqrt{\log(s/\varepsilon)})}$. See [Vio07] for an alternative proof. Servedio and Tan [ST18] recently improved the seed length to $2^{O(\sqrt{\log s})}\cdot\log(1/\varepsilon)$, and any significant improvement on the seed length would require breakthrough progress on circuit lower bounds. For low-degree polynomials, better generators are known [BV10a, Lov09, Vio09c].
In this chapter we consider read-once polynomials on $n$ variables, which are sums of monomials on disjoint variables. For this class, a generator with seed length polylogarithmic in $n$ and $1/\varepsilon$ is given in [GLS12], and it applies more generally to read-once $\mathrm{ACC}^0$.
A specific motivation for studying read-once polynomials comes from derandomizing space-bounded algorithms, a major line of research in pseudorandomness whose leading goal is proving $\mathrm{RL}=\mathrm{L}$. Despite a lot of effort, for general space-bounded algorithms there has been no improvement over the seed length $\ge\log^2 n$ since the classic 1992 paper by Nisan [Nis92]. In fact, no improvement is known even under the restriction that the algorithm uses constant space. Read-once polynomials can be implemented by constant-space algorithms, and were specifically pointed out by several researchers as a bottleneck for progress on space-bounded algorithms; see for example the survey talk by Trevisan [Tre10].

3.1 Our results

In this chapter, we construct a generator with a seed length which is optimal up to a factor of $\tilde O(\log 1/\varepsilon)$, where $\tilde O$ hides $\log\log(n/\varepsilon)$ factors. In particular, when $\varepsilon$ is not too small, our generator has seed length optimal up to $\mathrm{poly}\log\log n$.

Theorem 3.1. There exists an explicit generator $G\colon\{0,1\}^s\to\{0,1\}^n$ that fools any read-once $\mathbb{F}_2$-polynomial with error $\varepsilon$ and seed length $\tilde O(\log(n/\varepsilon))\log(1/\varepsilon)$.

Theorem 3.1 can be seen as progress towards derandomizing small-space algorithms. We

note that the work of Chattopadhyay, Hatami, Reingold and Tal [CHRT18] gives a generator for space-bounded algorithms which implies a generator for polynomials with seed length $\tilde O(\log^3 n)\log^2(n/\varepsilon)$. Theorem 3.1 also holds for polynomials modulo $M$ for any fixed $M$; in fact we obtain it as an easy corollary of a more general generator for product tests.

Fooling products. In Chapter 2, we gave the first generators for the class of product tests, but in them the dependency on $k$ is poor: the seed length is always $\ge\sqrt{k}$. In this chapter we improve the dependency on $k$ exponentially, though the results in Chapter 2 are still unsurpassed when $k$ is very small, e.g. $k=O(1)$. We actually obtain two incomparable generators.

Theorem 3.2. There exists an explicit generator $G\colon\{0,1\}^s\to\{0,1\}^n$ that fools any product test with $k$ functions of input length $\ell$ with error $\varepsilon$ and seed length $\tilde O(\ell+\log k)\log(1/\varepsilon)\log k$.

By the reductions in [GKM18], we also obtain generators that fool variants of product tests where the outputs of the $f_i$ are not simply multiplied but combined in other ways. These variants include generalized halfspaces [GOWZ10] and combinatorial shapes [GMRZ13, De15], extended to arbitrary order. For those we obtain seed length $\tilde O(\ell+\log k)^2\log(1/\varepsilon)\log k$, whereas in Chapter 2 we obtained seed length $\ge\ell\sqrt{k}$. As this application amounts to plugging the above theorem into previous reductions, we do not discuss it further in this chapter and instead refer the reader to Section 2.6.
We then give another generator whose seed length is optimal up to a factor $\tilde O(\log 1/\varepsilon)$, just like Theorem 3.1. However, for this we need each function $f_i$ in the definition of product tests to have expectation at most $1-\alpha 2^{-\ell}$ for some universal constant $\alpha>0$. This condition is satisfied by Boolean and most natural functions. For simplicity one can think of the functions $f_i$ as having outputs in $\{-1,1\}$.

Definition 3.3 (Nice product tests). A product test as in Definition 2.2 is nice if there exists a constant $\alpha>0$ such that each function $f_i$ has expectation at most $1-\alpha 2^{-\ell}$.

Formally, one should talk about a nice family of product tests; but for simplicity we will just say "nice product test."

Theorem 3.4. There exists an explicit generator $G\colon\{0,1\}^s\to\{0,1\}^n$ that fools any nice product test with $k$ functions of input length $\ell$ with error $\varepsilon$ and seed length $\tilde O(\ell+\log(k/\varepsilon))\log(1/\varepsilon)$.

This is the result from which the generator for polynomials in Theorem 3.1 follows easily.

Bounded independence plus noise. The framework in which we develop these generators was first laid out by Ajtai and Wigderson in their pioneering work [AW89] on constructing generators for $\mathrm{AC}^0$ with polynomial seed length. The framework seems to have been forgotten for a while, possibly due to the spectacular successes of Nisan, who gave better and arguably simpler generators [Nis91, Nis92]. It has been recently revived in a series of papers starting with the impressive work by Gopalan, Meka, Reingold, Trevisan, and Vadhan [GMR+12], who use it to obtain a generator for read-once CNFs on $n$ bits with error $\varepsilon$ and seed length $\tilde O(\log(n/\varepsilon))$. This significantly improves on the previously available seed length of $O(\log n)\log(1/\varepsilon)$ when $\varepsilon$ is small.
The Ajtai–Wigderson framework goes by showing that the test is fooled by a distribution with limited independence [NN93] if we perturb it with noise. (Previous papers use the equivalent language of restrictions.) Then the high-level idea is to recurse on the noise. This has to be coupled with a separate, sometimes technical argument showing that each recursion simplifies the test, which we address later.
Thus our goal is to understand whether bounded independence plus noise fools product tests. For our application, it is convenient to view the distribution as $D+T\wedge U$, the bit-wise XOR of $D$ and $T\wedge U$, where $\wedge$ is bit-wise AND and $T\wedge U$ is a noise vector: if a bit chosen by $T$ is $1$ then we set the corresponding bit to uniform. For the application it is important that $T$ is selected pseudorandomly, though the result is interesting even if $T$ is uniform in $\{0,1\}^n$. We now restate the result in Chapter 2, after defining almost bounded independence.

Definition 3.5 ($(\delta,d)$-closeness). The random variables $X_1,\dots,X_n$ are $(\delta,d)$-close to $Y_1,\dots,Y_n$ if for every $i_1,\dots,i_d\in\{1,2,\dots,n\}$ the $d$-tuples $(X_{i_1},\dots,X_{i_d})$ and $(Y_{i_1},\dots,Y_{i_d})$ have statistical distance $\le\delta$.

Note that when $\delta=0$ and the variables $Y_i$ are uniform, the variables $X_i$ are exactly $d$-wise independent. Recall that in Chapter 2 we proved the following theorem.

Theorem 3.6. Let $f\colon\{0,1\}^n\to\mathbb{C}_{\le 1}$ be a product test with $k$ functions of input length $\ell$. Let $D$ and $T$ be two independent distributions over $\{0,1\}^n$ that are $(0,d\ell)$-close to uniform. Then
$$\bigl|\mathbb{E}[f(D+T\wedge U)]-\mathbb{E}[f(U)]\bigr|\le k2^{-\Omega(d^2\ell/k)},$$
where $U$ is the uniform distribution.

Note that the dependence on the number $k$ of functions is poor: when $k=\Omega(d^2\ell)$, the error bound gives nothing non-trivial. A main technical contribution of this chapter is obtaining an exponentially better dependency on $k$, using techniques different from those of Chapter 2. Our theorem gives a non-trivial error bound even when $d=O(1)$ and $k$ is exponential in $\ell$.

Theorem 3.7. Let $f\colon\{0,1\}^n\to\mathbb{C}_{\le 1}$ be a product test with $k$ functions of input length $\ell$. Let $D$ and $T$ be two independent distributions over $\{0,1\}^n$ that are $(\delta,d\ell)$-close to uniform. Then
$$\bigl|\mathbb{E}_{D,T,U}[f(D+T\wedge U)]-\mathbb{E}_U[f(U)]\bigr| \le 2^{-\Omega(d)}+B\delta,$$
where $U$ is the uniform distribution, for the following choices of $B$:
i. $B=(k2^\ell)^{O(d)}$;
ii. if $f$ is nice, then $B=(d2^\ell)^{O(d)}$.

Claim 3.8. For every sufficiently large k, there exists a distribution D over {0, 1}k that −Ω(1) Ω(1) k is (k , k )-close to uniform, and a product test f : {0, 1} → C≤1 with k functions of input length 1 such that

E[f(D + T ∧ U)] − E[f(U)] ≥ 1/10, where T and U are the uniform distribution over {0, 1}k.

This claim also shows that for $\ell=1$ and $\varepsilon=\Omega(1)$ one needs $\delta\le k^{-\Omega(1)}$, and even for distributions which are $(\delta,k^{\Omega(1)})$-close to uniform, instead of just $(\delta,O(1))$-close.
For the class of combinatorial rectangles, which corresponds to product tests with each $f_i$ outputting values in $\{0,1\}$, the classic result [EGL+98] (extended in [CRS00]; for an exposition see Lecture 1 in [Vio17]) shows that $d\ell$-wise independence alone fools rectangles with error $2^{-\Omega(d)}$, and this error bound is tight. So Theorem 3.7 does not give better bounds for rectangles, even in the presence of noise. We develop additional machinery and obtain an improvement on Theorem 3.7. While the improvement is modest, the machinery we develop may be useful for further improvements. Since this improvement is not used in our construction of PRGs, we only state and prove it for exact bounded independence. For technical reasons we restrict the range of the $f_i$ slightly.

Theorem 3.9. Let $f$ be a product test with $k$ functions of input length $\ell$. Suppose the range of each function $f_i$ of $f$ is the set $\{0,1\}$, or the set of all $M$-th roots of unity for some fixed $M$. Let $D$ and $T$ be two independent distributions over $\{0,1\}^n$ that are $d\ell$-wise independent. Then
$$\bigl|\mathbb{E}[f(D+T\wedge U)]-\mathbb{E}[f(U)]\bigr| \le \ell^{-\Omega(d)}.$$
Finally, it is natural to ask if similar techniques fool non-read-once polynomials. In this regard, we are able to show that small-bias distributions [NN93] plus noise fool $\mathbb{F}_2$-polynomials of degree $2$.

Claim 3.10. Let $p\colon\{0,1\}^n\to\{0,1\}$ be any $\mathbb{F}_2$-polynomial of degree $2$. Let $D$ and $T$ be two distributions over $\{0,1\}^n$, where $D$ is $\delta$-biased, and $T$ sets each bit to $1$ independently with probability $2/3$. Then
$$\bigl|\mathbb{E}[p(D+T\wedge U)]-\mathbb{E}[p(U)]\bigr| \le \delta.$$

3.1.1 Techniques

We first explain how to prove Theorem 3.7. After that, we will explain the additional ideas that go into constructing our PRGs.

Following the literature [GMR+12, GMRZ13, GY14], at a high level we do a case analysis based on the total-variance of the product test $f$ we want to fool. This is defined as the sum of the variances $\mathrm{Var}[f_i]$ of the functions $f_i$ in the definition of product test. The variance of a function $g$ is $\mathbb{E}[|g(x)|^2]-|\mathbb{E}[g(x)]|^2$, where $x$ is uniform.

Low total-variance. Our starting point is a compelling inequality in [GKM18] (cf. [GMR+12, GY14]) showing that bounded independence alone, without noise, fools low total-variance product tests. However, their result is only proved for exact bounded independence, i.e. every $d$ bits are exactly uniform, whereas it is critical for our seed lengths to handle almost bounded independence, i.e. every $d$ bits are close to uniform. One technical contribution of this chapter is extending the inequality in [GKM18] to work for almost bounded independence. The proof of the inequality in [GKM18] is somewhat technical, and our extension introduces several complications. For example, the expectations of the $f_i$ under the almost bounded independent distribution $D$ and under the uniform distribution $U$ are not guaranteed to be equal, and this requires additional arguments. Nevertheless, our proof follows the argument in [GKM18], which we also present in a slightly different way that is possibly of interest to some readers. Finally, we mention that Claim 3.8 shows that our error term is close to tight in certain regimes, cf. Section 3.7.

High total-variance. Here we take a different approach from the ones in the literature: the papers [GLS12, GKM18] essentially reduce the high total-variance case to the low total-variance case, but their techniques either blow up the seed length polynomially [GLS12] or rely on space-bounded generators that only work in fixed order [GKM18]. We instead observe that bounded independence plus noise fools even high total-variance product tests.
We now give some details of our approach. A standard fact is that the expectation of a product test $f$ under the uniform distribution is bounded above in absolute value by
$$\prod_i|\mathbb{E}[f_i]| \le \prod_i(1-\mathrm{Var}[f_i])^{1/2} \le e^{-\sum_i\mathrm{Var}[f_i]/2},$$
where the first inequality holds because $|\mathbb{E}[f_i]|^2=\mathbb{E}[|f_i|^2]-\mathrm{Var}[f_i]\le 1-\mathrm{Var}[f_i]$. So if the total-variance $\sum_i\mathrm{Var}[f_i]$ is large, then the expectation of the product test under the uniform distribution is small. Thus, it suffices to show that the expectation is also small under bounded independence plus noise. To show this, we argue that typically the total-variance remains high even when considering the $f_i$ as functions of the noise only. Specifically, we first show that on average over a uniform $x$ and $t$, the variance of the functions $f_i'(y):=f_i(x+t\wedge y)$ is about as large as that of the $f_i$. This uses Fourier analysis. Then we use concentration inequalities for almost bounded independent distributions to derandomize this fact: we show that it also holds for typical $x$ and $t$ sampled from $D$ and $T$. This suffices to prove Theorem 3.7.i.
Proving Theorem 3.7.ii requires extra ideas. We first note that the high total-variance case actually does not appear in the read-once CNF generator in [GMR+12]. This is because one can always truncate the CNF to have at most $2^w\log(1/\varepsilon)$ clauses of width $w$, which suffices to determine the expected value of the CNF up to an additive error of $\varepsilon$, and such a CNF has low total-variance (for this

one argues that noise helps reduce the variance a little). To handle an arbitrary read-once CNF, [GMR+12] partition the clauses according to their width, and handle each partition separately. However, one cannot truncate polynomials without noise. To see why, consider, as a simple example, the linear polynomial $x_1+x_2+\dots+x_n$ (corresponding to a product test that computes the parity function). Here no strict subset of the monomials determines the expectation of the polynomial. Indeed, one can construct distributions which look random to $n-1$ monomials, but not to $n$.

Truncation using noise. Although we cannot truncate polynomials without noise, we show that something almost as good can still be done, and this idea is critical to obtaining our seed lengths. We show that the statistical closeness parameter in $D$ and $T$ can be selected as if the polynomial were truncated: it is independent of the number $k$ of functions. This is reflected in Theorem 3.7.ii, where $\delta$ is independent of $k$. The proof goes by showing that if the number $k$ of functions is much larger than $2^{3\ell}$ then noise alone is enough to fool the test, regardless of anything else. This proof critically uses noise: without noise a dependence on $k$ is necessary, as shown by the parity example in our discussion. Also, for the proof to work the functions must have expectation at most $1-\Omega(2^{-\ell})$. As mentioned earlier, we further prove that this last requirement is necessary (Claim 3.8): we construct functions whose expectation is about $1-1/k$ but whose product is not fooled by almost bounded independence plus noise, if the statistical closeness parameter is larger than $1/k^c$ for a suitable constant $c$.

Extra ideas for the improved bound. To obtain the improved error bound in Theorem 3.9, we show that whenever the total-variance of a product test lies below $d\ell^{0.1}$, we can use noise to bring it down below $d\ell^{-0.1}$. This produces a gap of $[d\ell^{-0.1},d\ell^{0.1}]$ between the high and low total-variance cases, which gives the better bound using the previous arguments. Reducing the total-variance requires a few additional ideas. First, we use Theorem 3.6 to handle the functions $f_i$ in the product test which have high variances. Then we use the hypercontractivity theorem to reduce the variances of the remaining $f_i$ individually. [GMR+12] also uses noise to reduce variance, but their functions $f_i$ are just ANDs and so they do not need hypercontractivity. To combine both ideas, we prove a new "XOR lemma" for bounded independence, a variant of an XOR lemma for small-bias which was proved in [GMR+12].

Constructing our PRGs. We now explain how to use Theorem 3.7 to construct our PRGs. The high-level idea of our PRG construction is to apply Theorem 3.7 recursively, following the Ajtai–Wigderson framework: given $D+T\wedge U$, we can think of $T$ as selecting each position in $\{1,\dots,n\}$ with probability $1/2$. For intuition, it is helpful to assume each position is selected independently. We will focus on how to construct a PRG using a seed of length $\tilde O(\log n)$ for read-once polynomials with constant error, as this simplifies the parameters and captures all the ideas.

Without loss of generality, we can assume the degree of the polynomial to be $\ell=O(\log n)$, because the contribution of higher-degree terms can be shown to be negligible under a small-bias distribution. (See the proof of Theorem 3.1.)
Let $p\colon\{0,1\}^n\to\{0,1\}$ be a degree-$\ell$ read-once polynomial with $k$ monomials. It is convenient to think of $p$ as outputting values in $\{-1,1\}$. Further, we can write $p$ as a product $\prod_{i=1}^k p_i$, where each $p_i$ is a monomial on at most $\ell$ bits (with outputs in $\{-1,1\}$). Now suppose we only assign the values in $D$ to the positions not chosen by $T$, that is, we set the input bits $x_i=D_i$ for $i\notin T$. This induces another polynomial $p_{D,T}$ defined on the positions in $T$. Clearly, $p_{D,T}$ also has degree at most $\ell$, and so we can reapply Theorem 3.7 to $p_{D,T}$.
Repeating the above argument $t$ times induces a polynomial defined on the positions $T_t:=\wedge_{i=1}^t T_i$. One can think of $T_t$ as a single distribution that selects each position with probability $2^{-t}$. Viewing $T_t$ this way, it is easy to see that we can terminate the recursion after $t:=O(\log n)$ steps, as the set $T_t$ becomes empty with high probability.
By standard constructions [NN93], it takes $s:=\tilde O(\ell)$ bits to sample $D$ and $T$ in Theorem 3.7.ii each time. Therefore, we get a PRG with seed length $t\cdot s=\tilde O(\ell)\log n$.
To obtain a better seed length, we will instead apply Theorem 3.7 in stages. Our goal in each stage is to reduce the degree of the polynomial by half. In other words, we want the restricted polynomial defined on the positions in $\wedge_{i=1}^t T_i$ to have degree $\ell/2$. It is not difficult to see that in order to reduce the degree of the $n$ monomials of $p$ to $\ell/2$ with high probability, it suffices to apply our above argument recursively for $t:=O(\log n)/\ell$ times. So in each stage, we use a seed of length

log n t · s = O˜(`) · = O˜(log n). `

After repeating the same argument for O(log `) = O˜(1) stages, with high probability the restricted polynomial would have degree 0 and we are done. Therefore, the total seed length of our PRG is O˜(log n). Here we remark that it is crucial in our argument that D and T are almost-bounded independent, as opposed to being small-biased. Otherwise, we cannot have seed length s = O˜(`) when ` = o(log n. For example, when ` = O(1), with small-bias we would need s = O(log n) bits, whereas we just use O(log log n) bits. Forbes and Kelley [FK18], by applying the analysis in Chapter2 to an elegant Fourier decomposition of product tests, show that 2t-wise independence plus noise fools width-w ROBPs on n bits with error 2−t/2nw. Their work implicitly shows that t-wise independence plus noise fools product tests with k functions of input length ` with error k2−Ω(t)+`−1, improving Theorem 2.5. However, their result is incomparable to Theorems 3.7 and 3.9, as there is no dependence on k in our error bounds for exact bounded independence, i.e. when D is (0, d`)-close to uniform. By combining their result with Claim 3.14, we show that the dependence on k in their error bound can be removed for nice product tests.

43 Conditions Uses Follows from Error P −Ω(d) ` O(d) (1) i≤k Var[fi] ≤ αd D Lemma 3.12 2 + (k2 ) δ P −Ω(d) O(d) (2) i≤k Var[fi] ≥ αd D + T ∧ U Derandomized Claim 3.13 2 + k δ (3) k ≥ 23`+1d, nice products T ∧ U Claim 3.14 2−Ω(d`) + 2O(d`)δ

Table 3.1: Error bounds for fooling a product tests of k functions of input length ` under different conditions. Here D and T are (δ, d`)-close to uniform, and α is a small constant.

n Theorem 3.11. Let f : {0, 1} → C≤1 be a nice product test with k functions of input length `. Let D and T be two t-wise independent distributions over {0, 1}n. Then

8`−Ω(t) ED,T,U [f(D + T ∧ U)] − EU [f(U)] ≤ 2 , where U is the uniform distribution.

We note that for product tests this error bound is optimal up to the constant in the exponent, because the same distribution fools parities with error 2−(t+1). On the other hand, [BHLV18, Theorem 8] shows that for ROBPs the dependence on n in the error is inherent.

Organization. We prove bounded independence plus noise fools product (Theorem 3.7) in Section 3.2, except the proof of the low total-variance case, which we defer to Section 3.4. Then we give constructions of our PRGs in Section 4.3. In Section 3.5, we show how to obtain the modest improvement of Theorem 3.7 and the optimal error bound for nice product tests (Theorem 3.11) using [FK18]. After that, we prove our result on fooling degree-2 polynomials in Section 3.6. Finally, we prove Claim 3.8 in Section 3.7.

3.2 Bounded independence plus noise fools products

In this section we prove Theorem 3.7. As we mentioned in the introduction, the proof consists of 3 parts: (1) Low total-variance, (2) high total-variance, and (3) truncation using noise for nice products. We summarize the conditions and the error bounds we obtain for these cases in Table 3.1. Let us now quickly explain how to put them together to prove Theorem 3.7. Clearly, combining (1) and (2) immediately gives us a bound of 2−Ω(d) +(k2`)O(d) for product tests, proving Theorem 3.7.i. For nice product tests, we can apply (3) if k ≥ 23`+1d, otherwise we can plug in k ≤ 23`+1d in the previous bound, proving Theorem 3.7.ii. We now discuss each of the 3 cases in order. Since the proof of the low total-variance case is quite involved, we only state the lemma in this section and defer its proof to Section 3.4.

Lemma 3.12. Let X1,X2,...,Xk be k independent random variables over C≤1 with minz∈Supp(Xi) −` Pr[Xi = z] ≥ 2 for each i ∈ {1, . . . , k}. Let Y1,Y2,...,Yk be k random variables over C≤1

44 that are (ε, 16d)-close to X1,...,Xk. Then

k k Pk !d/2 h Y i h Y i Var[Xi] Y − X ≤ 2O(d) i=1 + (k2`)O(d)ε. E i E i d i=1 i=1 We now prove a claim that handles the high total-variance case. This claim shows that for uniform x and t, the variance of the function g(y) := f(x + t ∧ y) is close to the variance of f in expectation. Its proof follows from a simple calculation in Fourier analysis. Later, we will derandomize this claim in the proof of Theorem 3.7.

` Claim 3.13. Let T be the distribution over {0, 1} where the Tj’s are independent and ` E[Tj] = η for each j. Let f : {0, 1} → C be any function. Then

h 0 i EU,T Var[f(U + T ∧ U )] ≥ η Var[f]. U 0 Proof of Claim 3.13. By the definition of variance and linearity of expectation, we have

h 0 i h  0 2 0 2i EU,T Var[f(U + T ∧ U )] = EU,T EU 0 |f(U + T ∧ U )| − |EU 0 [f(U + T ∧ U )]| U 0 h i h 2i  0 2 0 = EU,T EU 0 |f(U + T ∧ U )| − EU,T EU 0 [f(U + T ∧ U )] . The first term is equal to

2 X ˆ ˆ X ˆ 2 EU [|f(U)| ] = fαfα0 EU [χα−α0 (U)] = |fα| . α,α0 α The second term is equal to h h i h ii X ˆ 0 X ˆ 00 EU,T EU 0 fαχα(U + T ∧ U ) EU 00 fα0 χα0 (U + T ∧ U ) α α0 h i X ˆ ˆ 0 00 = EU,T fαfα0 EU 0 [χα(U + T ∧ U )] EU 00 [χα0 (U + T ∧ U )] α,α0 X ˆ ˆ h 0 00 i = fαfα0 EU [χα+α0 (U)] ET EU 0 [χα(T ∧ U )] EU 00 [χα0 (T ∧ U )] α,α0 X ˆ 2 0 00 = |fα| ET,U 0,U 00 [χα(T ∧ (U + U ))] α X ˆ 2 |α| = |fα| (1 − η) . α Therefore,

h 0 i X ˆ 2 |α| X ˆ 2 EU,T Var[f(U + T ∧ U )] = |fα| 1 − (1 − η) ≥ η |fα| = η Var[f], U 0 α α6=0 where the inequality is because 1 − (1 − η)|α| ≥ 1 − (1 − η) ≥ η for any α 6= 0.

45 With Lemma 3.12 and Claim 3.13, we now prove Theorem 3.7.

P 1/2 2 Proof of Theorem 3.7.i. Let σ denote ( i≤k Var[fi]) . We will consider two cases: σ ≤ αd and σ2 > αd, where α > 0 is a sufficiently small constant. 2 −` If σ ≤ αd, we use Lemma 3.12. Specifically, since Pr[fi(U) = z] ≥ 2 for every z ∈ Supp(fi), it follows from Lemma 3.12 that

h k i h k i Y Y −Ω(d) ` O(d) E fi(D) − E fi(U) ≤ 2 + (k2 ) δ, i=1 i=1 and the desired bound holds for every fixing of T and U. If σ2 ≥ αd, then the expectation of f under the uniform distribution is small. More precisely, we have

Y Y 1/2 − 1 σ2 −Ω(d) 2 EU [fi(U)] = (1 − Var[fi]) ≤ e ≤ 2 . (3.1) i≤k i≤k

Thus, it suffices to show that its expectation under D + T ∧ U is at most 2−Ω(d) + (k2`)O(d)δ. We now use Claim 3.13 to show that

h k i Y −Ω(d) n O(d) ED,T,U fi(D + T ∧ U) ≤ 2 + (k2 ) δ. i=1

m 2 0 For each t, x ∈ {0, 1} , and each i ∈ {1, 2, . . . , k}, let σt,x,i denote VarU 0 [fi(x + t ∧ U )]. We P 2 claim that i≤k σt,x,i is large for most x and t sampled from D and T respectively. From Claim 3.13 we know that this quantity is large in expectation for uniform x and t. By a tail bound for almost bounded independent distributions, we show that the same is true for most x ∈ D and t ∈ T . By a similar calculation to (3.1) we show that for these x and t we have that |E[f(x + t ∧ U)]| is small. To proceed, let T 0 be the uniform distribution over {0, 1}n. Applying Claim 3.13 with 2 η = 1/2, we have ET 0,U [σT 0,U,i] ≥ Var[fi]/2. So by linearity of expectation,

h X 2 i 2 ET 0,U σT 0,U,i ≥ σ /2 ≥ αd/2. i≤k

2 2 Since T and D are both (δ, d`)-close to uniform, the random variables σT,D,1, . . . , σT,D,k are 2 2 P 2 (2δ, d`)-close to σT 0,U,1, . . . , σT 0,U,k. Let µ = ET 0,U [ i≤k σT 0,U,i] ≥ αd/2. By Lemma 3.49,

hX 2 i −Ω(d) O(d) Pr σT 0,U,i ≤ µ/2 ≤ 2 + k δ. (3.2) T 0,U i≤k

Hence, except with probability 2−Ω(d) + kO(d)δ over t ∈ T and x ∈ D, we have

X 2 X 0 σt,x,i = Var[fi(x + t ∧ U )] ≥ αd/4. U 0 i≤k i≤k

46 For every such t and x, we have

Y Y EU [fi(x + t ∧ U)] ≤ EU [fi(x + t ∧ U)] i≤k i≤k Y 2 1/2 = (1 − σt,x,i) i≤k − 1 P σ2 −Ω(d) ≤ e 2 i≤k t,x,i ≤ 2 . (3.3)

In addition, we always have |f| ≤ 1. Hence, summing the right hand side of (3.2) and (3.3), we have " # h i Y Y −Ω(d) O(d) ED,T,U fi(D + T ∧ U) ≤ ED,T EU [fi(D + T ∧ U)] ≤ 2 + k δ. i≤k i≤k

To prove Theorem 3.7.ii, we use the following additional observation that noise alone fools nice products when k is suitably larger than 22`. The high-level idea is that in such a −` ` case there will be at least k2 ≥ 2 functions fi whose inputs are completely set to uniform −` by the noise. Since the expectation of each fi is bounded by 1 − O(2 ), the expectation of their product becomes small when k is suitably larger than 22`. On the other hand, E[f(U)] can only get smaller under the uniform distribution, and so the expectations under uniform and noise are both small.

n Claim 3.14 (Noise fools nice products with large k). Let f : {0, 1} → C≤1 be a nice product test with k ≥ 23`+1d of input length `. Let T be a distribution over {0, 1}n that is (δ, d`)-close to uniform. Then

−Ω(d`) O(d`) ET,U [f(T ∧ U)] − E[f(U)] ≤ 2 + 2 δ.

Proof. We will bound above both expectations in absolute value. Let k0 := 23`+1d ≤ k. Qk Ii Write f = i=1 fi, where fi : {0, 1} → C≤1. Since f is nice, there is a constant α ∈ (0, 1] −` such that |E[fi(U)]| ≤ 1 − α2 for every i ∈ {1, . . . , k}. Under the uniform distribution, we have

k Y −` k −Ω(k2−`) −Ω(d`) E[f(U)] = E[fi(U)] ≤ (1 − α2 ) ≤ e ≤ 2 . (3.4) i=1

It suffices to show that the expectation under T ∧ U is at most 2−Ω(d`) + 2O(d`)δ. Note that

k k0 h Y i h Y i E[f(T ∧ U)] ≤ ET EU [fi(T ∧ U)] ≤ ET EU [fi(T ∧ U)] . i=1 i=1

We now show that the right hand side is at most 2−Ω(d`) + 2O(d`)δ. We first show that the expected number of fi whose inputs are all selected by T when T is uniform is large, and

47 then apply a tail bound for almost bounded independent distributions to show that it holds for most t ∈ T . Let T 0 be the uniform distribution over {0, 1}n. Then

k0 k0 h X i X 1(T 0 = 1|Ii|) = Pr[T 0 = 1|Ii|] ≥ k02−` = 22`+1d. E Ii Ii i=1 i=1

Since T is (δ, d`)-close to uniform, the TIi are (δ, d)-close to uniform. By Lemma 3.49,

0 h k i X |Ii| 2n −Ω(dn) O(dn) Pr 1(TI = 1 ) ≤ 2 d ≤ 2 + 2 δ. (3.5) T i i=1

|Ii| −n Note that if TIi = 1 , then |EU [fi(T ∧ U)]| = |E[f]| ≤ 1 − α2 . Thus, conditioned on k0 P 1 |Ii| ` i=1 (TIi = 1 ) ≥ 2 d, we have

k0 Y −` 22`d −Ω(d`) E[fi(T ∧ U)] ≤ (1 − α2 ) ≤ 2 . (3.6) i=1 Since we always have |f| ≤ 1, the error bound follows from summing the right hand side of (3.4), (3.5) and (3.6). Theorem 3.7.ii now follows easily from Claim 3.14 and Theorem 3.7.i.

Proof of Theorem 3.7.ii. Since f is nice, there is a constant α ∈ (0, 1] such that |E[fi]| ≤ 1 − α2−`. If k ≥ 23`+1d, then the theorem follows from Claim 3.14. Otherwise, k ≤ 23`+1d and the theorem follows from Theorem 3.7.i.

3.3 Pseudorandom generators

In this section we construct our generators. As explained in the introduction, all construc- tions follow from applying the Theorem 3.7 recursively. We obtain our generator for arbitrary product tests (Theorem 3.2) by applying Theorem 3.7 for O(log `k) = O˜(log k) times recur- sively. Our progress measure for the recursion is the number of bits the restricted product test is defined on. We show that after O(log `k) steps of the recursion we are left with a product test that is defined on n0 := O(` log(1/ε)) bits, which can be fooled by a distribution that is (ε, n0)-close to uniform. As a first read, we suggest the readers to refer to the O˜ no- tations in the statements and proofs, i.e. ignore polylogarithmic factors in `, log k, log(1/ε) and log n, and think of k as n and ε as some arbitrary small constant. Proof of Theorem 3.2. Let C be a sufficiently large constant. Let t = C log(`k) = O˜(log k), ` −d d = C log(t/ε) and δ = (k2 ) . Let D1,...,Dt,T1,...,Tt be 2t independent distributions n (1) (i+1) (i) over {0, 1} that are (δ, d`)-close to uniform. Define D := D1 and D := Di+1 +Ti ∧D . (t) Vt 0 n Let D := D , T := i=1 Ti. Let G be another distribution over {0, 1} that is (δ, d`)- |S| n close to uniform. For a subset S ⊆ [n], define the function PADS(x): {0, 1} → {0, 1} to

48 output n bits of which the positions in S are the first |S| bits of x0|S| and the rest are 0. Our generator G outputs 0 D + T ∧ PADT (G ). We first look at the seed length of G. By [NN93, Lemma 4.2], sampling G0 and each of the distributions Di and Ti takes a seed of length Od` + log(1/δ) + log log n = O(` + log(k/ε)) log(t/ε) + log log n = O˜` + log(k/ε) log(1/ε). Hence the total seed length of G is (2t + 1) · O˜(` + log(k/ε)) log(1/ε) = O˜(` + log(k/ε)) log(1/ε) log k. We now look at the error of G. By our choice of δ and applying Theorem 3.7 recursively for t times, we have −Ω(d) ` O(d)  E[f(D + T ∧ U)] − E[f(U)] ≤ t · 2 + (k2 ) δ ≤ ε/2.

Next, we show that for every fixing of D and most choices of T , the function fD,T (y) := f(D + T ∧ y) is a product test defined on d` bits, which can be fooled by G0. Sk Let I = i=1 Ii. Note that |I| ≤ `k. Because the variables Ti are independent and each of them is (δ, d`)-close to uniform, we have |I| Pr|I ∩ T | ≥ d` ≤ (2−d` + δ)t ≤ 2d` log(`k) · 2−Ω(Cd` log(`k)) ≤ ε/4. d` It follows that for every fixing of D, with probability at least 1 − ε/4 over the choice of T , 0 the function fD,T is a product test defined on at most d` bits, which can be fooled by G with error ε/4. Hence G fools f with error ε. Our generator for nice product tests (Theorem 3.4) uses the maximum input length of the functions fi as the progress measure. We will use the following lemma, which captures the trade-off between the number of recursions and the simplification on a product test measured in terms of the maximum input length of the fi. Lemma 3.15. If there is an explicit generator G0 : {0, 1}s0 → {0, 1}n that fools nice product tests with k functions of input length r with error ε0 and seed length s0, then there is an explicit generator G: {0, 1}s → {0, 1}n that fools nice product tests with k functions of input 0 log(k/ε) `  ˜ log(k/ε)  length ` with error ε + tε, where t = O r+1 + log( r+1 ) = O r+1 + 1 , and seed length s = s0 + t · O((` + log log(1/ε)) log(1/ε) + log log n) = s0 + t · O˜(` log(1/ε)). We defer its proof to the end. Theorem 3.4 requires applying the lemma in stages, where in each stage we apply the lemma with a different value of `. XORing its output with a small-bias distribution gives our generator for polynomials (Theorem 3.1). We will apply Lemma 3.15 in O(log `) stages. In each stage our goal is to halve the input length of the product test.

49 Proof of Theorem 3.4. Let f be a nice product test with k functions of input length `. Note that by applying Lemma 3.15 with r = `/2 and error ε/(t log `), where t = O(log(k/ε)/`+1), we can halve its input length by incurring an error of ε/O(log `) and using a seed of length t · O(` + log log((t log `)/ε)) log((t log `)/ε) + log log n = t · O˜` log(1/ε) = O˜(log(k/ε) + `) log(1/ε). Now we repeat the argument for s = O(log `) = O˜(1) steps until the input length is zero, which is a constant function and can be fooled with zero error. So we have a generator that fools nice product tests with k functions of input length `, with error ε and seed length s · O˜(log(k/ε) + `) log(1/ε) = O˜(log(k/ε) + `) log(1/ε). Theorem 3.1 follows from XORing the output of the above generator with a small-bias distribution. Proof of Theorem 3.1. Let c be a sufficiently large constant. Let D be a (ε/n)c-biased distri- bution over {0, 1}n [NN93]. Let G be the output distribution of the generator in Theorem 3.4 that fools product tests with n functions and input length c log(n/ε) with error ε/2. The generator outputs D + G. By [NN93] and Theorem 3.4, it takes a seed of length O(log(n/ε)) + O˜log(n/ε) + c log(n/ε) log(1/ε) = O˜(log(n/ε)) log(1/ε). Let p: {0, 1}n → {−1, 1} be any read-once GF(2) polynomial. Consider the polynomial p0 obtained from p by removing all the monomials with degree greater than c log(n/ε) in p. We claim that the expectation of p and p0 under D differs by at most ε. Note that under any (ε/n)c-biased distribution X, the probability that any c log(n/ε) bits are 1 is at most ε/4n, and so by a union bound we have Pr[p(X) 6= p0(X)] ≤ ε/4. In particular, this holds for D and U. It follows that 0 0 E[p(D + G)] − E[p(U)] ≤ E[p (D + G)] − E[p (U)] + ε/2 ≤ ε, where the last inequality holds for any fixed D because of Theorem 3.4. We now prove Lemma 3.15. First we state a claim that will be used in the proof to reduce the input length of the product test. Claim 3.16. Let T (1),...,T (t) be t independent and identical distributions over {0, 1}` that t (i) `  −(r+1) t are δ-close to uniform. Then Pr[wt(∧i=1T ) > r] ≤ r+1 (2 + δ) . Proof. Since T (1),...,T (t) are independent and each T (i) is δ-close to uniform,

 t (i)  X h t (i) i Pr wt(∧i=1T ) > r ≤ Pr ∧i=1 ∧j∈S(Tj = 1) S:|S|=r+1 t X Y  (i)  = Pr ∧j∈S(Tj = 1) S:|S|=r+1 i=1 X  `  ≤ (2−(r+1) + δ)t = (2−(r+1) + δ)t. r + 1 S:|S|=r+1

50 |S| n Proof of Lemma 3.15. For S ⊆ {1, 2, . . . , n}, define the function PADS(x): {0, 1} → {0, 1} to output n bits of which the positions in S are the first |S| bits of x0|S| and the rest are 0. Let C be a sufficiently large constant. The generator G will output H(1), where we define (i) log(k/ε) `  (i) the distribution H recursively for t = O r+1 + log( r+1 ) steps: At the i-th step, H samples two independent distributions D(i),T (i) over {0, 1}m that are (δ, C` log(1/ε))-close to uniform, where δ = 2−C(`+log log(1/ε)) log(1/ε). Then output

(i) (i) (i) (i+1) H := D + T ∧ PADT (i) (H ).

(t+1) 0 We define H to be G (Us0 ). By [NN93, Lemma 4.2], sampling D(i) and T (i) takes a seed of length u := O(` log(1/ε)+log(1/δ)+log log n) = O((`+log log(1/ε)) log(1/ε)+log log n) = O˜(` log(1/ε)).

The total seed length of G is therefore s = s0 + tu = s0 + t · O˜(` log(1/ε)). (i) (1) We now analyze the error of G. For i ∈ {1, 2, . . . , t}, consider the variant HU of H , (1) (i+1) which is the same as H but at the i-th step replace PADT (i) (H ) with PADT (i) (Un). (0) Let HU = Un. For every i ∈ {1, . . . , t}, for every fixed D(1),...,D(i−1) and T (1),...,T (i−1), the function (j) f restricted to ∧j

(i−1) (i) (i) (i) E[f(HU )] − E[f(HU )] = E[g(U)] − E[g(D + T ∧ Un)] ≤ ε.

Hence, summing over i we have

t (t) X (i−1) (i) E[f(Un)] − E[f(HU ] ≤ E[f(HU )] − E[f(HU )] ≤ tε. i=1

(t) (1) 0 We now prove that |E[f(HU )] − E[f(H )]| ≤ ε + 2ε. We will show that except with (j) probability ε, the function f restricted to ∧j≤tT is a product test of input length r and so we can fool the restricted function using G0 given by our assumption. Q Ii Write f = i≤k fi, where each fi is defined on {0, 1} with |Ii| ≤ `. We claim that h i Pr wt(∧t T (i)) > r for some j ∈ {1, . . . , k} ≤ ε. i=1 Ij

It suffices to analyze Pr[wt(∧t T (i)) > r] for each j and take a union bound over j ≤ k. i=1 Ij Since |I | ≤ `, T (i) is 2−C`-close to uniform, by Claim 3.16 and a union bound over j ≤ k, j Ij the probability that some fi has input length > r is at most

   r+1 log(k/ε)  ` t `e Ω +log( ` ) k 2−(r+1) + 2−C` ≤ k · 2−r r+1 r+1 ≤ ε. r + 1 r + 1

51 Hence, for every D(1),...,D(t), with probability 1 − ε over the choice of T (1),...,T (t), the t (i) function f restricted to ∧i=1T becomes a product with k functions of input length r, and remains nice if f is nice. Conditioned on this, we have by the definition of G0 that (t) (1) 0 |E[f(HU )] − E[f(H )]| ≤ ε . Otherwise, as |f| is bounded by 1, the absolute difference is (t) (1) 0 always at most 2. Hence, |E[f(HU )] − E[f(H )]| ≤ ε + 2ε, and so the total error is at most ε0 + (t + 2)ε.

3.4 On almost k-wise independent variables with small total-variance

In this section we will prove Lemma 3.12. Our proof follows closely to the one in [GKM18], which proves the lemma for ε = 0, that is, when the Xi’s are d-wise independent. We first give an overview of their proof. For independent random variables Z1,...,Zk, we will use σ(Z) to denote the standard P Pk 1/2 deviation of i≤k Zi, that is, σ(Z) := ( i=1 Var[Zi]) . As a first step, let us assume each E[Xi] is nonzero and normalize the variables Xi by writing   Y Y Y Xi − [Xi] X = ( [X ] + (X − [X ])) = [X ] 1 + E . i E i i E i E i [X ] i i i E i

Let Zi denote (Xi − E[Xi])/ E[Xi]. If |Zi| is small, then intuitively a low-order Taylor’s Q expansion of (1 + Zi) should approximate the original function well. To write down its i P Q i log(1+Zi) Taylor’s expansion, a convenient way is to rewrite i(1 + Zi) as e . It suffices to bound above its error term in expectation. This is equivalent to bounding the d-th moment P of i log(1 + Zi). A standard calculation gives a bound in terms of the norm and variance of the functions log(1 + Zi). Since |Zi| is small, log(1 + Zi) behaves similarly as Zi. So we 2 P can relate the error term in terms of |Zi| and σ(Z) := i Var[Zi]. In particular if |Zi| ≤ B for all i then we would get an error bound of the form 2O(d)(pσ(Z)2/d + B)O(d). For now let’s think of E[Xi] being bounded away from 0 so that Var[Zi] = Θ(Var[Xi]). Now we handle the case where |Zi| is large. Note that this implies either (1) |Xi − E[Xi]| is large, or (2) E[Xi] is small. We will handle the two conditions separately by a reduction to the case where the |Zi|’s are small. The recurring idea throughout is that we can always tolerate O(d) bad variables that violates the conditions, provided with high probability there can be at most O(d) bad vari- ables. This is because by affording an extra O(d) amount of independence in the beginning, we can condition on the values of these variables and work with the remaining ones. As a simple illustration of this idea, throughout the proof we can assume each Var[Xi] P 2 is bounded by i Var[Xi]/d =: σ(X) /d, as there can be at most d bad variables Xi that violate this inequality, and so we can start with 2d-wise independence, then conditioned on values of the bad variables Xi, the rest of the Xi would satisfy the bound. We first assume the |E[Xi]|’s are large and handle (1), we will round the Xi to E[Xi] whenever |Xi − E[Xi]| ≥ B. Note that by Chebyshev’s inequality an Xi gets rounded with

52 2 probability Var[Xi]/B . It follows that the probability that there are more than d such Xi’s is bounded by (σ(X)/Bd)d. This suggests taking B to be (σ(X)/d)α for some constant α ∈ (0, 1) to balance the error terms. 2 Ω(1) It remains to handle condition (2), for Zi to be bounded by B = (σ(X) /d) , as ex- 2 O(1) plained above it suffices to show that all but O(d) of the Xi’s satisfy |E[Xi]| ≥ (σ(X) /d) . Ω(1) If |E[Xi]| ≥ (σ(X)/d) for Ω(d) of the Xi’s, then by a similar argument as above one can show that with high probability at least half of them is bounded by (σ(X)2/d)Ω(1). Hence, Q 2 Ω(d) E[ i Xi] is at most (σ(X) /d) when the Xi’s are d-wise independent. This finishes the proof. Note that in the case of ε > 0, each Xi is only ε-close to the corresponding Yi and they are not exactly identical. As a result, throughout the proof we will often have to introduce hybrid terms to move from functions of Xi to functions of Yi, and vice versa, and we will show that each of these steps introduces an error of at most kO(d)ε. Also, there is some loss in ε whenever we condition on the values of any subset of the Yi’s, see Claim 3.24 for a formal claim. This introduces the extra condition that each Xi must put a certain mass on each outcome.

3.4.1 Preliminaries In this section, we prove several claims that will be used in the proof of Lemma 3.12.

Lemma 3.17. For any z ∈ C with |z| ≤ 1/2, |log(1 + z)| ≤ 2|z|, where we take the principle branch of the logarithm. Proof. From the Taylor series expansion of the complex-valued log function we have

∞ ∞ ∞ X (−1)n−1 X X |log(1 + z)| = zn ≤ |z|n ≤ |z| (1/2)n = 2|z|. n! n=1 n=1 n=0

Lemma 3.18. Let Z ∈ C be a random variable with |Z| ≤ 1/2, E[Z] = 0 and W = log(1+Z) the principle branch of the logarithm function (phase between (−π, π)). We have Var[W ] ≤ 4 Var[Z].

Proof. By the definition of Variance, Lemma 3.17, and that E[Z] = 0,

2 2 Var[W ] = E[|W | ] − |E[W ]| 2 ≤ E[|W | ] 2 ≤ 4 E[|Z| ] = 4 Var[Z].

Lemma 3.19 (Taylor’s approximation). For w ∈ C and d > 0,

d−1 X |w|d ew − wj/j! ≤ O(1) · max{1, e<(w)}. d! j=0

53 Lemma 3.20. For any random variable W ∈ C, |eE[W ]| ≤ E[|eW |]. Proof. By Jensen’s inequality, we have

[W ] [<(W )] <(W ) W |eE | = |eE | ≤ |E[e ]| = E[|e |].

z1 z2 z2 Claim 3.21. |e − e | ≤ |e | · O(|z1 − z2|) if |z1 − z2| ≤ 1,

Proof. By Lemma 3.19 with d = 1,

z1−z2 <(z1−z2) |e − 1| ≤ O(1) · |z1 − z2| · max{1, e } = O(|z1 − z2|),

because <(z1 − z2) ≤ |z1 − z2| ≤ 1. Therefore,

|ez1 − ez2 | = |ez2 (ez1−z2 − 1)| = |ez2 ||ez1−z2 − 1|

z2 ≤ |e | · O(|z1 − z2|).

Claim 3.22. Let X,Y ∈ Ω be two discrete random variables such that sd(X,Y ) ≤ ε. Let f :Ω → C be any function. We have |E[f(X)] − E[f(Y )]| ≤ 2 maxz|f(z)| · sd(X,Y ). Proof. Let p and q be the probability function of X and Y . Using the fact that sd(X,Y ) = 1 P 2 z|p(z) − q(z)|, we have

X X E[f(X)] − E[f(Y )] = p(z)f(z) − q(z)f(z) z z X ≤ |f(z)||p(z) − q(z)| z X ≤ max|f(z)| · |p(z) − q(z)| z z = 2 max|f(z)| · sd(X,Y ). z

Claim 3.23 (Maclaurin’s inequality (cf. Chapter 12 in [Ste04])). Let z1, . . . , zk be k non- negative numbers. For any i ∈ {0, . . . , k}, we have

k X Y i X i Si(z1, . . . , zk) := zj ≤ (e/i) ( zj) . S:|S|=i j∈S j=1

3.4.2 Proof of Lemma 3.12

We now prove Lemma 3.12. For independent random variables Z1,...,Zk, we will use σ(Z) P Pk 1/2 to denote the standard deviation of i≤k Zi, that is, σ(Z) := ( i=1 Var[Zi]) . We will also denote σ(Z)2/d by v for notational simplicity.

54 Assuming the variances are not too small As hinted in the overview above, throughout the proof we will without loss of generality 2 assume Var[Xi] ≤ σ(X) /d for every i ∈ {1, . . . , k}. This assumption will be used in the proof of Lemma 3.30 to give a uniform bound on how close the rounded Xi’s and Xi’s are in expectation. We first prove a claim that shows the Yi’s remains close to the Xi even if we condition on the values of a few of the Yi’s. This claim will be used multiple times throughout the proof. Note that this claim is immediate for exact independence (ε = 0) but less for almost independence. We shall use the assumption that the Xi take any value with probability at least 2−`.

Claim 3.24. Let X1,X2,...,Xk be k independent random variables over C≤1 with minz∈Supp(Xi) Pr[Xi = −` z] ≥ 2 . Let Y1,Y2,...,Yk be k random variables over C≤1 that are (ε, d)-close to X1,X2,...,Xk. Let S ⊆ {1, . . . , k} be a subset of size t. Then conditioned on any values of the Yi for i ∈ S, 2t` the Yi for i 6∈ S are (3 · 2 ε, d − t)-close to the Xi for i 6∈ S.

Proof. Let T ⊆ [k] − S be a subset of size at most d − t. We have for any value zs for s ∈ S,

h i h i X ^ ^ ^ Pr Yj = zj | Ys = zs − Pr Xj = zj zj :j∈T j∈T s∈S j∈T h i h i Pr V Y = z Pr V X = z X j∈S∪T j j j∈S∪T j j = − , pY pX zj :j∈T

where pX := Pr[∧s∈SXs = zs] and pY := Pr[∧s∈SYs = zs]. Hence, we can rewrite above as

X  1 1  h ^ i 1  h ^ i h ^ i − Pr Yj = zj + Pr Yj = zj − Pr Xj = zj pY pX pX zj :j∈T j∈S∪T j∈S∪T j∈S∪T 1 1 X h ^ i ε ≤ − Pr Yj = zj + pY pX pX zj :j∈T j∈S∪T

≤ |1/pY − 1/pX | + ε/pX

≤ (1/pX pY + 1/pX )ε.

The first and last inequalities are because the Xi’s are (ε, d)-close to the Yi’s. As the Xi’s are Q −t` −t` independent, by our assumption we have pX = s∈S Pr[Xs = zs] ≥ 2 , and so pY ≥ 2 − −t` 2t` ε ≥ 2 /2. (Otherwise the conclusion is trivial.) Therefore, (1/pX pY + 1/pX )ε ≤ 3 · 2 ε, and the proof follows.

Claim 3.25. Let X1,X2,...,Xk be k independent random variables over C≤1 with minz∈Supp(Xi) Pr[Xi = −` z] ≥ 2 for each i ∈ {1, . . . , k}. Let Y1,Y2,...,Yk be k random variables over C≤1. If 2 Lemma 3.12 holds when the Yi’s are (ε, Cd)-close to the Xi’s assuming Var[Xi] ≤ σ(X) /d for every i ∈ [k], then Lemma 3.12 holds when the Yi’s are (ε, (C +1)d)-close the Xi’s without the assumption.

55 Proof. Note that there can be at most d different such indices. Let J be the set of these indices. We have Y Y Y Y Y Y Xi − Yi = Xj Xi − Yj Yi i i j∈J i6∈J j∈J i6∈J  Y Y  Y Y  Y Y  = Xj − Yj Xj + Yj Xj − Yj . j∈J j∈J i6∈J j∈J i6∈J i6∈J

We first bound the expectation of the first term. Since the Xi’s are independent,

h  i h i h i h i Y Y Y Y Y Y EX,Y Xj − Yj Xj = E Xj − E Yj · E Xj j∈J j∈J i6∈J j∈J j∈J i6∈J

h i h i Y Y ≤ E Xj − E Yj j∈J j∈J ≤ ε.

For the second term, note that conditioning on the values of the Yj for which j ∈ J, by O(d`) Claim 3.24, the remaining variables are (2 ε, Cd)-close to the corresponding Xj’s. So we can apply the above Lemma 3.12 with our assumption and the claim follows.

Assuming the variables are close to their expectations and the expectations are large

Lemma 3.26. Let X1,X2,...,Xk be k independent discrete random variables over C≤1. Let Y1,Y2,...,Yk be k discrete random variables over C≤1 that are (ε, d)-close to X1,...,Xk. 0 Assume for each Xi and Yi, there exist Zi and Zi such that

0 Xi = E[Xi](1 + Zi) and Yi = E[Xi](1 + Zi),

0 where |Zi| ≤ B ≤ 1/2 and |Zi| ≤ B ≤ 1/2. Then

k k √ !d h Y i h Y i σ(Z) d + Bd X − Y ≤ 2O(d) + (Bk)O(d)ε. E i E i d i=1 i=1

Remark 3.27. Note that we define Yi above in terms of E[Xi] but not E[Yi]. The random 0 variables Zi are independent, but the variables Zi may not be. Also, later we will take B to be v1/3. ˆ Proof. Define Wi, Wi such that ˆ Wi = log(1 + Zi) and Wi = Wi − E[Wi]. 0 ˆ 0 Likewise, define Wi , Wi such that 0 0 ˆ 0 0 0 Wi = log(1 + Zi) and Wi = Wi − E[Wi ].

56 ˆ ˆ 0 0 ˆ P ˆ ˆ 0 P ˆ 0 Wi+E[Wi] Wi +E[Wi ] Let W = i Wi and W = i Wi . Note that Xi = E[Xi]e and Yi = E[Yi]e . We have

k k k k   ˆ  0  ˆ 0 Y Y E[Wi] W Y Y E[W ] W Xi = E[Xi]e e and Yi = E[Xi]e i e . i=1 i=1 i=1 i=1

Hence the difference is

k k k k k   ˆ 0 ˆ 0  Y Y Y Y E[Wi] W Y E[W ] W Xi − Yi = E[Xi] e · e − e i · e i=1 i=1 i=1 i=1 i=1 k k k k !    0  ˆ 0  ˆ ˆ 0  Y Y E[Wi] Y E[W ] W Y E[W ] W W = E[Xi] e − e i e + e i · e − e . i=1 i=1 i=1 i=1

The lemma follows from the two claims below:

 k  k [W ] k [W 0] Wˆ Claim 3.28. For every outcome of Wˆ , Q [X ] Q eE i − Q eE i e ≤ i=1 E i i=1 i=1 O(kε).

√ d  k [W 0] Wˆ Wˆ 0  O(d)  σ(Z) d+Bd  O(d) Claim 3.29. Q [X ]eE i [e ] − [e ] ≤ 2 + (Bk) ε. i=1 E i E E d

Proof of Claim 3.28. We have

k k k k k k  Y  Y [W ] Y [W 0] Wˆ Y Y [W ] Y [W 0] Wˆ E i E i E i E i E[Xi] e − e e = E[Xi] · e − e · e . i=1 i=1 i=1 i=1 i=1 i=1

By Lemma 3.17, Claim 3.22 and our assumption that |Z| ≤ 1/2 and |Z0| ≤ 1/2, we have P P 0 P 0 | i E[Wi] − i E[Wi ]| ≤ i|E[Wi] − E[Wi ]| ≤ 2kε. Hence, by Claim 3.21,

k k Y [W ] Y [W 0] P [W ] P [W 0] E i E i i E i i E i e − e = e − e i=1 i=1 P [W ] i E i ≤ e · O(kε) k Y E[Wi] = e · O(kε). i=1

57 Therefore,

k k k k k Y Y [W ] Y [W 0] Wˆ Y Y [W ] Wˆ E i E i E i E[Xi] · e − e · e ≤ E[Xi] · e · O(kε) · e i=1 i=1 i=1 i=1 i=1 k  Y  ˆ E[Wi] W = E[Xi]e e · O(kε) i=1 k Y = Xi · O(kε) i=1 ≤ O(kε).

Proof of Claim 3.29. We first rewrite eWˆ − eWˆ 0 as a sum of 3 terms:

d−1 d−1 d−1 ˆ ˆ 0  ˆ X   X j   X j ˆ 0  eW − eW = eW − Wˆ j/j! + (Wˆ j − Wˆ 0 )/j! + Wˆ 0 /j! − eW . j=0 j=0 j=0

k 0 Q E[Wi ] It suffices to bound above the expectation of each term multiplied by γ := i=1 E[Xi]e . We bound the first and last terms using Taylor’s approximation (Lemma 3.19), and the second term using (ε, d)-closeness of the variables. We will show the following:

" d−1 # √ !d  ˆ 0 X j  σ(Z) d + Bd γ · eW − Wˆ 0 /j! ≤ 2O(d) + (kB)O(d)ε (3.7) E d j=0 " d−1 # √ !d  ˆ X  σ(Z) d + Bd γ · eW − Wˆ j/j! ≤ 2O(d) (3.8) E d j=0 d−1 h j i X ˆ j ˆ 0 d γ · E (W − W )/j! ≤ k ε. (3.9) j=0

For (3.7), by Lemma 3.19 we have

d−1 ˆ 0 d  ˆ 0 X j  |W | ˆ 0 γ · eW − Wˆ 0 /j! ≤ |γ| · O(1) · max{1, e<(W )}. d! j=0

58 We now bound above |γ · max{1, e<(Wˆ 0)}| by 1. We have

k Y [W 0] E i |γ| = E[Xi]e i=1 k Y [P W 0] E i i = E[Xi] · e i=1 k Y P W 0 i i ≤ E[Xi] · E[|e |] (Jensen’s inequality, see Lemma 3.20) i=1 " k # Y P W 0 i i = E E[Xi] · e i=1 " k #

Y = E Yi i=1 ≤ 1.

Moreover,

k k k <(Wˆ 0) Y [W 0] <(Wˆ 0) Y [W 0] Wˆ 0 Y E i E i |γ · e | = E[Xi]e · e = E[Xi]e e = Yi ≤ 1. i=1 i=1 i=1

ˆ 0 d ˆ 0 ˆ Hence, it suffices to bound above E[|W | ]. Note that the Wi ’s are (ε, d)-close to the Wi’s. ˆ ˆ So we bound above |Wi| and Var[Wi] and then apply Lemma 3.48. First, since |Zi| ≤ B, ˆ we have |Wi| ≤ 2B because of Lemma 3.17, and so |Wi| ≤ |Wi| + |E[Wi]| ≤ 4B. Next, ˆ ˆ we have Var[Wi] ≤ 4 Var[Zi] because of Lemma 3.18, and so σ(W ) ≤ 2σ(Z). Therefore, by Lemma 3.48,

" k d−1 # ˆ 0 d  Y [W 0] Wˆ 0 X j  E[|W | ] [X ]eE i e − Wˆ 0 /j! ≤ O(1) E E i d! i=1 j=0 √ !d σ(Wˆ ) d + 4Bd ≤ 2O(d) + (kB)O(d)ε d √ !d σ(Z) d + Bd ≤ 2O(d) + (kB)O(d)ε. d

59 We prove Inequality (3.8) similarly. Note that

P [W 0] |e i E i | P 0 P i E[Wi ]− i E[Wi] P [W ] = |e | |e i E i | |P [W 0]−P [W ]| ≤ e i E i i E i P | [W 0]− [W ]| ≤ e i E i E i ≤ ekε ≤ O(1), because ε < 1/k, otherwise the conclusion is trivial. Hence,

k d−1 k d−1  Y [W 0] Wˆ X j   Y [W ] Wˆ X j  E i ˆ E i ˆ E[Xi]e e − W /j! ≤ E[Xi]e e − W /j! · O(1). i=1 j=0 i=1 j=0

Therefore, it follows by Inequality (1) by considering ε = 0 that

" k d−1 # √ !d  Y [W 0] Wˆ X j  O(d) σ(Z) d + Bd [X ]eE i e − Wˆ /j! ≤ 2 . E E i d i=1 j=0

Finally we prove Inequality (3.9). By linearity of expectation,

d−1 d−1 h X j j i X j j E (Wˆ − Wˆ 0 )/j! = (E[Wˆ ] − E[Wˆ 0 ])/j! . j=0 j=0

ˆ j P ˆ j j Note that W = ( i Wi) can be written as a sum of k terms where each term is a product of at most j ≤ d different Wi’s. Moreover, we have |Wi| ≤ 2B ≤ 1 for each i because of j Lemma 3.17. So we have |E[Wˆ j] − E[Wˆ 0 ]| ≤ kjε. Hence,

d−1 d−1 h j i j X ˆ j ˆ 0 X ˆ j ˆ 0 E (W − W )/j! ≤ |E[W ] − E[W ]| j=0 j=0 d−1 X ≤ kjε j=0 ≤ kdε.

Recall that |γ| ≤ 1, this concludes (3.9).

Assuming the expectations are large

We now prove the main lemma assuming the expectation of the Xi are far from zero.

60 Lemma 3.30. Let X1,X2,...,Xk be k independent random variables over C≤1, with minz∈Supp(Xi) −` Pr[Xi = z] ≥ 2 . Let Y1,Y2,...,Yk be k random variables over C≤1 that are (ε, 9d)-close to 2 1/6 X1,...,Xk. Assume |E[Xi]| ≥ (σ(X) /d) for each i. We have

k k d h Y i h Y i σ(X)2  X − Y ≤ 2O(d) + (k2`)O(d)ε. E i E i d i=1 i=1

Proof of Lemma 3.30. We will assume σ(X)2/d is less than a sufficiently small constant and ε ≤ (k2`)−Cd for a sufficiently large C; otherwise the right hand side of the inequality is greater than 2 and there is nothing to prove.

For each i ∈ {1, 2, . . . , k}, we define a new function rdi : C≤1 → C≤1 that will be used to round the variables Xi and Yi. We define rdi as

( 2 1/3 z if |z − E[Xi]| ≤ (σ(X) /d) rdi(z) := E[Xi] otherwise.

˜ ˜ Q Q Let Xi = rdi(Xi) and Yi = rdi(Yi). We will write both i Xi and i Yi as

k k Y Y ˜ ˜ X Y ˜ Y ˜ Xi = (Xi − Xi + Xi) = (Xi − Xi) Xi, i=1 i=1 S⊆{1,2,...,k} i∈S i6∈S

and k k Y Y ˜ ˜ X Y ˜ Y ˜ Yi = (Yi − Yi + Yi) = (Yi − Yi) Yi. i=1 i=1 S⊆{1,2,...,k} i∈S i6∈S Let m = 3d. Define X Y Y Pm(z1, z2, . . . , zk) = (zi − rdi(zi)) rdi(zi). |S|

We will show that Pm is a good approximation of the product in expectation over both Xi’s and Yi’s and then show that the expectations of Pm under Xi’s and Yi’s are close. We will use the following inequalities repeatedly.

˜ −2/3 1/3 P ˜ 2/3 Claim 3.31. Pr[Xi 6= Xi] ≤ Var[Xi]v ≤ v . In particular, i Pr[Xi 6= Xi] ≤ (dσ) .

Proof. The first inequality follows from Chebyshev’s inequality and second follows from the assumption Var[Xi] ≤ v. The last sentence is implied by the first inequality.

h i Claim 3.32. Q Y − P (Y ,...,Y ) ≤ 2O(d)vd + kO(d)ε. E i i m 1 k

61 Q ˜ 0 Proof. Consider the product i∈S(Yi − Yi). Let N be the number of i ∈ {0, 1, 2, . . . , k} ˜ 0 such that Yi 6= Yi. If N < m then any set S of size at least m must contain an i such that ˜ Yi = Yi. In this case the product is 0 and thus Y X Y ˜ Y ˜ Yi − Pm(Y1,...,Yk) = (Yi − Yi) Yi = 0. i |S|≥m i∈S i6∈S So,

h i h  i Y 1 0 Y E Yi − Pm(Y1,...,Yk) = E (N ≥ m) · Yi − Pm(Y1,...,Yk) i i h  i 1 0 Y ≤ E (N ≥ m) · Yi + |Pm(Y1,...,Yk)| i h i h i 1 0 Y 1 0 = E (N ≥ m) · Yi + E (N ≥ m) · |Pm(Y1,...,Yk)| . i

0 Pm−1 N 0 Pm−1 N 0m mN 0 If N ≥ m then there can be at most s=0 s ≤ s=0 m s ≤ 2 m subsets in the m sum in Pm for which the product is nonzero, and each such product can be at most 2 because |S| < m. Thus,

m−1 X N 0 1(N 0 ≥ m) · P (Y ,...,Y ) ≤ 1(N 0 ≥ m) · 2m m 1 k s s=0 N 0 ≤ 1(N 0 ≥ m) · 2m · 2m m N 0 ≤ 22m . m Therefore, h  Y i h Y i hN 0i 1(N 0 ≥ m) · Y + |P (Y ,...,Y )| ≤ 1(N 0 ≥ m) · Y + 22m E i m 1 k E i E m i i hN 0i ≤ [1(N 0 ≥ m)] + 22m . E E m We will show the following

0 hN 0i d O(d) Claim 3.33. Pr[N ≥ m] ≤ E m ≤ v + k ε. Assuming the claim it follows that

h  Y i N 0 1(N 0 ≥ m) · Y − P (Y ,...,Y ) ≤ [1(N 0 ≥ m)] + 22m E i m 1 k E E m i ≤ (1 + 26d)((2v)d + kO(d)ε)(m = 6d) ≤ 2O(d)vd + kO(d)ε.

62 We now prove Claim 3.33.

Proof of Claim 3.33. The first inequality is clear.

N 0 X ≤ Pr[∧ Y 6= Y˜ ] E m i∈S i i |S|=m X  Y ˜  ≤ Pr[Xi 6= Xi] + ε (each Yi is ε-close to Xi) |S|=m i∈S X Y ˜ m ≤ Pr[Xi 6= Xi] + k ε |S|=m i∈S !m e Pk Pr[X 6= X˜ ] ≤ i=1 i i + kmε (Maclaurin’s inequality) m e(d · σ(X))2/3 3d ≤ + kmε (Claim 3.31) 3d ≤ vd + kO(d)ε.

Now, we show that Pm(Y1,...,Yk) is close to Pm(X1,...,Xk) in expectation.

O(d) d 3d Claim 3.34. |E[Pm(X1,...,Xk)] − E[Pm(Y1,...,Yk)]| ≤ 2 v + O(k) ε .

Proof. The difference between Pm(X1,...,Xk) and Pm(Y1,...,Yk) equals

X  Y ˜ Y ˜ Y ˜ Y ˜  Pm(X1,...,Xk) − Pm(Y1,...,Yk) = (Xi − Xi) Xi − (Yi − Yi) Yi . |S|

We can rewrite the right hand side as ! X  Y ˜ Y ˜  Y ˜ Y ˜  Y ˜ Y ˜  (Xi − Xi) − (Yi − Yi) Xi + (Yi − Yi) Xi − Yi . |S|

It suffices to show that     X Y ˜ Y ˜ Y ˜ O(d) E  (Xi − Xi) − (Yi − Yi) Xi ≤ k ε (3.10)

|S|

|S|

63 We first prove Inequality (3.10). Because the Xi’s are independent, the left hand side of the inequality equals

! X h Y ˜ i h Y ˜ i h Y ˜ i E (Xi − Xi) − E (Yi − Yi) E Xi

|S|

m−1 h i h i h i X X Y ˜ Y ˜ Y ˜ ≤ E (Xi − Xi) − E (Yi − Yi) · E Xi s=1 |S|=s i∈S i∈S i6∈S

m−1 h i h i X X Y ˜ Y ˜ ≤ E (Xi − Xi) − E (Yi − Yi) s=1 |S|=s i∈S i∈S m−1 X X ≤ 2 · 2sε s=1 |S|=s m−1 X ≤ ks · 2 · 2sε s=1 ≤ 2(2k)mε = kO(d)ε.

Q |S| To see the third inequality, note that |z − rdi(z)| ≤ 2, and so | i∈S(zi − rdi(zi))| ≤ 2 . So we can apply Claim 3.22 to bound above the absolute difference by 2 · 2|S|ε.

Now we prove Inequality (3.11). As |S| ≤ m = 3d and Yi’s are (ε, 9d)-close to Xi’s, ˜ ˜ conditioned on the values of Xi for which i ∈ S, by Claim 3.24, the remaining Yi’s for which O(m·`) ˜ i 6∈ S are still (2 ε, 6d)-close to the corresponding Xi’s. (Recall that we can assume ε = (k2`)−Cd for a sufficiently large C.) We will apply Lemma 3.26 to them.

0 ˜ ˜ ˜ ˜ 0 Define Zi,Zi such that Xi = E[Xi](1+Zi) and Yi = E[Xi](1+Zi). To apply Lemma 3.26, 0 2 we need the following two claims to bound above |Zi|, |Zi| and σ(Z) . We defer their proofs to the end.

1/6 0 Claim 3.35. Let B = 4v . Then |Zi| ≤ B and |Zi| ≤ B.

Claim 3.36. σ(Z)2 ≤ 4σ(X)2v−1/3.

Therefore, by Lemma 3.26 with ε0 = 2O(m·`)ε and B = 4(σ(X)2/d)1/6 ≤ 1/2 (Recall that we can assume σ(X)2/d less than a sufficiently small constant),

  " # X Y ˜  Y ˜ Y ˜  X Y ˜ E  (Yi − Yi) Xi − Yi  ≤ E (Yi − Yi) · M,

|S|

64 where √ !6d σ(Z) d + dB M ≤ 2O(d) + (Bk)O(d)ε0 d √ !6d σ(X)(σ(X)/ d)−1/3 ≤ 2O(d) √ + B + (Bk2`)O(d)ε d √ !6d σ(X)(σ(X)/ d)−1/3 ≤ 2O(d) √ + 4v1/6 + (k2`)O(d)ε d 6d = 2O(d) v1/3 + v1/6 + (k2`)O(d)ε = 2O(d)vd + (k2`)O(d)ε.

Q ˜ Q |S| We now bound above E[| i∈S(Yi − Yi)|]. Note that | i∈S(zi − rdi(zi))| ≤ 2 . Hence by Claim 3.22, " # " #

Y ˜ Y ˜ |S| E (Yi − Yi) ≤ E (Xi − Xi) + 2 ε. i∈S i∈S ˜ Let N be the number of i ∈ {0, 1, . . . , k} such that Xi 6= Xi. Note that

" # m−1 X Y X  N (X − X˜ ) ≤ 2s E i i E s |S|

X Y ˜ O(d) X |S| O(d) m O(d) E (Yi − Yi) ≤ 2 + 2 ε ≤ 2 + (2k) ε ≤ 2 , |S|

˜ 2 1/3 We now prove Claim 3.35 and 3.36. By Claim 3.31, |E[Xi] − E[Xi]| ≤ (σ(X) /d) . Also 2 1/6 ˜ 2 1/6 by assumption, |E[Xi]| ≥ (σ(X) /d) . So, we have |E[Xi]| ≥ |E[Xi]|/2 ≥ (σ(X) /d) /2.

65 ˜ 1/6 Proof of Claim 3.35. As |E[Xi]| ≥ v /2, we have

|X˜ − [X˜ ]| |Z˜ | = i E i i ˜ |E[Xi]| |X˜ − [X ]| + | [X˜ ] − [X ]| ≤ i E i E i E i ˜ |E[Xi]| ≤ 4v1/3/v1/6 ≤ 4v1/6,

˜0 ˜ 1/3 and the same argument holds for |Zi| because |Yi − E[Xi]| ≤ v .

Proof of Claim 3.36. Since z∗ = E[Z] is the minimizer of E[|Z − z|2], we have

˜ ˜ ˜ 2 Var[Xi] = E[|Xi − E[Xi]| ] ˜ 2 ≤ E[|Xi − E[Xi]| ] 2 ˜ ≤ E[|Xi − E[Xi]| ](Xi = rdi(Xi)) = Var[Xi].

˜ ˜ ˜ 2 −1/3 P ˜ 2 −1/3 Therefore, Var[Zi] = Var[Xi]/|E[Xi]| ≤ 4 Var[Xi]v and thus i Var[Zi] ≤ 4σ(X) v .

The general case Proof of Lemma 3.12. We will again assume σ(X)2/d is less than a sufficiently small constant ` −Cd 2 and ε ≤ (k2 ) for a sufficiently large constant C. We first assume Var[Xj] ≤ σ(X) /d for all j and prove the lemma when the Yi’s are (ε, 15d)-close to the Xi’s. Later we will handle the general case. 1/6 Let m be the number of i such that |E[Xi]| ≤ v . 1/6 If m ≤ 6d, let J be the set of indices for which |E[Xi]| ≤ v . We can write ! ! Y Y Y Y Y Y Y Y Xi − Yi = Xj − Yj Xj + Yj Xj − Yj . i i j∈J j∈J j6∈J j∈J j6∈J j6∈J

It suffices to show that " #   Y Y Y E Xj − Yj Xj ≤ ε (3.12) j∈J j∈J j6∈J " !#

Y Y Y O(d) d ` O(d) E Yj Xj − Yj ≤ 2 v + (k2 ) ε. (3.13) j∈J j6∈J j6∈J

66 We first show Inequality (3.12). Since the Xi’s are independent, the left hand side of (3.12) is

! h i h i h i h i h i Y Y Y Y Y E Xj − E Yj E Xj ≤ E Xj − E Yj j∈J j∈J j6∈J j∈J j∈J ≤ ε.

To prove Inequality (3.13), note that conditioned on the values of the Yi’s for which i ∈ J, O(d`) by Claim 3.24, the rest of the Yi’s are still (2 ε, 9d)-close to the corresponding Xi’s with 1/6 ` −Cd |E[Xi]| ≥ v . (Recall that we can assume ε = (k2 ) for a sufficiently large C.) So the bound follows from Lemma 3.30.

If m ≥ 6d, then note that

h k i k Y Y m/6 d E Xi = |E[Xi]| ≤ v ≤ v . i=1 i=1

So it suffices to show that

h k i Y O(d) d O(d) E Yi ≤ 2 v + k ε. i=1

1/6 Consider the event E that at least 3d of the Yi for i ∈ J have absolute value less than 2v . Then we know that k Y 3d d/2 Yi ≤ 2 · v . i=1

We will show that E happens except with probability at most v2d + k3dε. Let N ∈ 1/6 {0, 1, 2, . . . , m} be the number of i ∈ J such that |Yi| ≥ 2v . Note that

" # X ^ 1/6 Pr[N ≥ 3d] ≤ Pr |Yi| ≥ 2v S⊆J:|S|=3d i∈S X Y  1/6 3d ≤ Pr |Xi| ≥ 2v + k ε. S⊆J:|S|=3d i∈S

By Chebyshev’s inequality,

1/6 1/6 −1/3 Pr[|Xi| ≥ 2v ] ≤ Pr[|Xi − E[Xi]| ≥ v ] ≤ Var[Xi]v .

67 Hence, by Maclaurin’s inequality,

 Pm 1/6 3d X Y e Pr[|Xi| ≥ 2v ] Pr |X | ≥ 2v1/6 ≤ i=1 i 3d S⊆J:|S|=3d i∈S e Pm Var[X ]v−1/3 3d ≤ i=1 i 3d eσ(X)2v−1/3 3d ≤ 3d ≤ v2d.

So, Pr[N ≥ 3d] ≤ v2d + k3dε. Therefore, h i Y 3d d/2 2d 3d E Yi ≤ 2 v + v + k ε i ≤ 2O(d)vd/2 + kO(d)ε.

3.5 Improved bound for bounded independence plus noise fools products

In this section we prove Theorem 3.9, which improves the error bound in Theorem 3.7 from 2−Ω(d) to `−Ω(d), and Theorem 3.11, which gives the optimal error bound for nice product tests. The proof of Theorem 3.9 requires developing a few additional technical tools. We first outline the high-level idea on how to obtain the improvement. For simplicity, we will assume d = O(1) and show how to obtain an error bound of `−Ω(1). Recall in the proof of Theorem 3.7 (see also Table 3.1) that we used a win-win argument on the total-variance: we applied two different arguments depending on whether the total-variance of a product test f is above or below a certain threshold. Suppose now the total-variance of f is guaranteed to lie outside the interval [`−0.1, `0.1]. Then applying the same arguments as before would already give us an error of `−Ω(1). So it suffices to handle the additional case, where the total-variance is in the range of [`−0.1, `0.1]. Our goal is to use noise to reduce the total-variance down to n−0.1, which can then be handled by the low total-variance argument. To achieve this, as a first step we will handle the functions fi with variances above and below `−0.6 separately, and show that O(`)-wise independence plus noise fools the product of the fi in each case. For the former, note that since the total-variance is ≤ `0.1, there can be at most `0.7 functions with variances above `−0.6. In this case we can simply apply the result in Chapter2 (Theorem 3.6). To prove the latter case, we use noise to reduce the variance of each function. Specifically, we use the hypercontractivity theorem to show that applying the noise operator

68 to a function reduces its variance from σ2 to (σ2)(4/3). This is proved in Section 3.5.1 2 below. Hence, on average over the noise, the variance σi of each fi is reduced to at most −0.6 1/3 2 −0.6 1/3 0.1 −0.1 (` ) σi , and so the total-variance of the fi is at most (` ) · ` = ` and we can argue as before. To combine the two cases, we prove a new XOR Lemma for bounded independent distributions, inspired by a similar lemma for small-bias distributions which is proved in [GMR+12], and the theorem follows.

3.5.1 Noise reduces variance of bounded complex-valued functions In this section, we show that on average, noise reduces the variance of bounded complex- valued functions. We will use the hypercontractivity theorem for complex-valued functions (cf. [Hat14, Theorem 6.1.8]). n Let f : {0, 1} → C be any function. For every ρ ∈ [0, 1], define the noise operator Tρ to be Tρf(x) := EN [f(x+N)], where N sets each bit to uniform independently with probability 1 − ρ and 0 otherwise.

Theorem 3.37 (Hypercontractivity Theorem). Let q ∈ [2, ∞). Then for any ρ ∈ [0, p1/(q − 1)],

 q1/q 2 1/2 E |Tρf(x)| ≤ |E[f(x) ]| .

We will use the following well-known corollary.

Corollary 3.38. Let f : {0, 1}n → C. Then

2 2  2  1+ρ  1+ρ2 E |Tρf(x)| ≤ E |f(x)| .

Proof.

 2  0  E |Tρf(x)| = Ex EN,N 0 [f(x + N)f(x + N )]  0  = Ex EN,N 0 [f(x)f(x + N + N )]  0  = Ex f(x) EN,N 0 [f(x + N + N )]   = Ex f(x)TρTρf(x) 2 1 1+ 1 1  1+ρ  1+ρ2  2  1+1/ρ2 ≤ E |f(x)| E |TρTρf(x)| ρ 2 1 1+ρ 2 2 1/2 ≤ E[|f(x)| ] 1+ρ E[|Tρf(x)| ] .

1 1 The first inequality follows from H¨older’sinequality because 1+ρ2 + 1+1/ρ2 = 1, and the second inequality follows from Theorem 3.37 with q = 1 + 1/ρ2.

Let T be a distribution over {0, 1}n that sets each bit independently to 1 with probability 1 − ρ and 0 otherwise.

 0 2  √ 2 Claim 3.39. ET,U |EU 0 [f(U + T ∧ U )]| = E |T ρf(x)| .

69 Proof.

 0 2 hX ˆ ˆ 0 00 i ET,U |EU 0 [U + T ∧ U ]| = ET fαfα0 EU [χα+α0 (U)] EU 0 [χα(T ∧ U )] EU 00 [χα0 (T ∧ U )] α,α0 X ˆ 2  0 00  = |fα| ET,U 0,U 00 χα T ∧ (U + U ) α X ˆ 2 |α|  √ 2 = |fα| ρ = E |T ρf(x)| , α where the last inequality follows from Parseval’s identity because the Fourier expansion of P ˆ |α| Tρf(x) is α fαρ χα(x). We are now ready to prove that noise reduces the variance of a function. The main idea is to translate the function to a point close to its mean so that its variance is close to its second moment, and then apply Corollary 3.38 to it.

n 0 Lemma 3.40. Let f : {0, 1} → C≤1 be any function. Let δ := min{|f(x) − f(x )| : f(x) 6= f(x0)}. Then 2   1+ρ h  i 2 Var[f] ET Var EU [f(x + T ∧ U)] ≤ 4 . x δ2 Proof. We can assume Var[f] ≤ δ2/2; otherwise the conclusion is trivial. Let S be the 2 support of f. For every y ∈ S, let py := Pr[f(x) = y]. Let µ = E[f] and σ = Var[f]. Since σ2 = E[|f(x) − µ|2], there is a point z ∈ S such that |z − µ|2 ≤ σ2. We have

2 X 2 X 2 2 X  σ = py|y − µ| ≥ py|y − µ| ≥ min |y − µ| py . y∈S:y6=z y∈S y∈S:y6=z y∈S:y6=z

f(x)−z Define g(x) := 2 . We have for every t,      2 Var U [f(x + t ∧ U)] = 4 Var U [g(x + t ∧ U)] ≤ 4 x U [g(x + t ∧ U)] . x E x E E E By Corollary 3.38,

2 2 1+ρ ! 1+ρ2 2 2 2 y − z   1+ρ2  2  1+ρ  1+ρ2 X X E |Tρg| ≤ E |g| = py ≤ py 2 y∈S:y6=z y∈S:y6=z

0 0 because |y − y | ≤ 2 for every y, y ∈ C≤1. So by Claim 3.39, we have

2  2  2  X  1+ρ ET,x EU [g(x + T ∧ U)] = E |T g| ≤ py . y∈S:y6=z It follows from above that

2 2   1+ρ h  i  2  X  1+ρ Var[f] ET Var EU [f(x+T ∧U)] ≤ 4 ET,x EU [g(x+T ∧U)] ≤ 4 py ≤ 4 2 . x miny∈S:y6=z|y − µ| y∈S:y6=z

70 2 Now we bound below miny∈S:y6=z|y − µ| . For every y 6= z, δ2 ≤ |y − z|2 ≤ |y − µ|2 + |µ − z|2 ≤ |y − µ|2 + σ2.

Because σ2 ≤ δ2/2, we have

2 2   1+ρ   1+ρ h  i Var[f] 2 Var[f] ET Var EU [f(x + T ∧ U)] ≤ 4 ≤ 4 . x δ2 − σ2 δ2 Remark 3.41. The dependence on δ is necessary. Consider a function f with support {0, ε}. Then f = εg, where g has support {0, 1}. We have Var[f] = ε2 Var[g]. Applying noise to f is the same as applying noise to g, but g has no dependence on ε.

3.5.2 XOR Lemma for bounded independence We now prove a version of XOR lemma for bounded independence that is similar to the one in [GMR+12], which proves the lemma for small-bias distributions.

n k Lemma 3.42. Let f1, . . . , fk : {0, 1} → [0, 1] be k functions on disjoint inputs. Let H : [0, 1] → [0, 1] be a multilinear function in its input. If each fi is fooled by any di-wise indepen- dent distribution with error ε, then the function h: {0, 1}n → [0, 1] defined by h(x) := P H(f1(x), f2(x), . . . , fk(x)) is fooled by any ( i≤k di)-wise independent distribution with er- ror 16kε. We will use the following dual equivalence between bounded independence and sandwich- ing polynomials that was introduced by Bazzi [Baz09]. Fact 3.43 ([Baz09]). A function f : {0, 1}n → [0, 1] is fooled by every d-wise independent distribution if and only if there exist two multivariate polynomials p` and pu of degree d such that n 1. For every x ∈ {0, 1} , we have p`(x) ≤ f(x) ≤ pu(x), and 2. E[pu(U) − f(U)] ≤ ε and E[f(U) − p`(U)] ≤ ε.

Proof of Lemma 3.42. By Fact 3.43, for each i ∈ {1, . . . , k}, there exist two degree-di poly- u ` nomials fi and fi for fi which satisfy the conditions in Fact 3.43. Hence, we have

u ` fi (x) ≥ fi(x) ≥ 0 and 1 − fi (x) ≥ 1 − fi(x) ≥ 0. For every α ∈ {0, 1}k, define

u Y u Y `  Y Y  Mα (x) := fi (x) 1 − fj (x) and Mα(x) := fi(x) 1 − fj(x) . i:αi=1 j:αj =0 i:αi=1 j:αj =0

u u P Clearly, Mα (x) ≥ Mα(x), and Mα (x) has degree i≤k di. We claim that for every α ∈ {0, 1}k, u k E[Mα (x) − Mα(x)] ≤ 2 ε.

71 k u Fix a string α ∈ {0, 1} . Define the hybrids M0 = Mα (x),M1,...,Mk = Mα(x), where

(1) (2) Mi(x) := Mi (x) · Mi (x), where (1) Y Y  Mi (x) := fj(x) 1 − fj(x) , j≤i,αj =1 j≤i:αj =0 and (2) Y u Y `  Mi (x) := fj (x) 1 − fj (x) . j>i:αj =1 j>i:αj =0 Note that

(2) Y  u  Y  `  k−i E[Mi (x)] = E fj (x) E 1 − fj (x) ≤ (1 + ε) , j>i:αj =1 j>i:αj =0

(1) and Mi (x) ≤ 1. So, if αi = 1, then

  h u  (1) (2) i k−i E Mi−1(x) − Mi(x) = E fi (x) − fi(x) · Mi−1(x) · Mi (x) ≤ ε · (1 + ε) .

Likewise, if αi = 0, we have

h `  (1) (2) i k−i E[Mi−1(x) − Mi(x)] = E (1 − fi (x)) − (1 − fi(x)) · Mi−1(x) · Mi (x) ≤ ε · (1 + ε) .

Hence,

u X X i k E[Mα (x) − Mα(x)] ≤ E[Mi−1(x) − Mi(x)] ≤ ε (1 + ε) ≤ 2 ε. 1≤i≤k 0≤i≤k−1

` P u ` P Now we define Mα(x) := 1− β:β6=α Mβ (x). Note that Mα(x) also has degree i≤k di. Since X Y  Mα(x) = fi(x) + (1 − fi(x)) = 1, α∈{0,1}k i≤k we have ` X u X Mα(x) = 1 − Mβ (x) ≤ 1 − Mβ(x) = Mα(x). β:β6=α β:β6=α Hence,

` X u  X k k k k E[Mα(x) − Mα(x)] = Mβ (x) − Mβ(x) ≤ 2 ε ≤ 2 2 ε = 4 ε. β:β6=α β:β6=α As H is multilinear, we can write H as X Y Y H(y1, . . . , yk) = H(α) yi (1 − yi),

α∈{0,1}k i:αi=1 i:αi=0

72 where H(α) ∈ [0, 1] for every α. So X Y Y X h(x) = H(α) fi(x) (1 − fi(x)) = H(α)Mα(x).

α∈{0,1}k i:αi=1 i:αi=0 α∈{0,1}k

Now if we define

u X u ` X ` h (x) := H(α)Mα (x) and h (x) := H(α)Mα(x). α∈{0,1}k α∈{0,1}k

u ` P u ` Clearly h and h both have degree i≤k di. We also have h (x) ≥ h(x) ≥ h (x), and

u ` X u ` X k k k E[h (x) − h (x)] ≤ H(α) E[Mα (x) − Mα(x)] ≤ (2 + 4 )ε ≤ 16 ε. α∈{0,1}k α∈{0,1}k

Therefore, since hu and h` are two polynomials that satisfy the conditions in Fact 3.43, the lemma follows.

3.5.3 Proof of Theorem 3.9 Armed with Lemma 3.40 and Lemma 3.42, we are now ready to prove Theorem 3.9. We first need the following useful fact to handle the case when S is the M-th roots of unity.

Fact 3.44. Let X and Y be two random variables on {0, 1}n. Suppose for every product

test g : {0, 1}n → S, where S is the set of all M-th roots of unity, we have E[g(X)] −

E[g(Y )] ≤ ε. Then for every product test g : {0, 1}n → S and every z ∈ S, we have

Pr[g(X) = z] − Pr[g(Y ) = z] ≤ ε. Proof. Note that for every integer j, the function gj is also a product test with the same range. So for every j, k ∈ {0, . . . , n},

−k j −k j −kj j j E[(ω g(X)) ] − E[(ω g(Y )) ] ≤ ω · E[g(X) ] − E[g(Y ) ] ≤ ε.

Using the identity M−1 ( 1 X 1 if i = k ω(i−k)j = M i=0 0 otherwise, we have for every k ∈ {0, . . . , n − 1},

M k k 1 X  −k j  −k j Pr[g(X) = ω ] − Pr[g(Y ) = ω ] ≤ (ω g(X)) − (ω g(Y )) ≤ ε. M E E i=0

Qk Ii P 1/2 Proof of Theorem 3.9. Write f = i=1 fi, where fi : {0, 1} → C≤1. Let σ denote ( i≤k Var[fi]) . We will consider two cases: σ2 ≥ d`0.1 and σ2 ≤ d`0.1.

73 If σ2 ≥ d`0.1, then the expectation of f under the uniform distribution is small. Specifi- cally, we have

Y Y 1/2 − 1 σ2 −Ω(d`0.1) −Ω(d) 2 EU [fi(U)] = (1 − Var[fi]) ≤ e ≤ 2 ≤ ` . (3.14) i≤k i≤k

Thus, it suffices to show that its expectation under D + T ∧ U is at most `−Ω(d). We now use Claim 3.13 to show that

h k i Y −Ω(d) ED,T,U fi(D + T ∧ U) ≤ ` . i=1

n 2 0 For each t, x ∈ {0, 1} , and each i ∈ {1, 2, . . . , k}, let σt,x,i denote VarU 0 [fi(x + t ∧ U )]. Let T 0 be the uniform distribution over {0, 1}n. By Claim 3.13 with η = 1/2, we have 2 ET 0,U [σT 0,U,i] ≥ Var[fi]/2. So by linearity of expectation,

h X 2 i 2 0.1 ET 0,U σT 0,U,i ≥ σ /2 ≥ d` /2. i≤k

2 2 Since T and D are both d`-wise independent, the random variables σT,D,1, . . . , σT,D,k are 2 2 P 2  0.1 (0, d`)-close to σT 0,U,1, . . . , σT 0,U,k. Let µ = ET 0,U i≤k σT 0,U,i ≥ d` /2. By Lemma 3.49,

√ d hX 2 i d µd + d −Ω(d) Pr σT,D,i ≤ µ/2 ≤ 2 = ` . T,D µ/2 i≤k

Hence, except with probability `−Ω(d) over t ∈ T and x ∈ D, we have

X 2 X 0 0.1 σt,x,i = Var[fi(x + t ∧ U )] ≥ d` /4. U 0 i≤k i≤k

By a similar calculation to (3.14), for every such t and x,

Y Y EU [fi(x + t ∧ U)] ≤ EU [fi(x + t ∧ U)] i≤k i≤k Y 2 1/2 = (1 − σt,x,i) i≤k − 1 P σ2 −Ω(d`0.1) −Ω(d) ≤ e 2 i≤k t,x,i ≤ 2 ≤ ` .

In addition, we always have |f| ≤ 1. Hence, h i h i Y Y −Ω(d) ED,T,U fi(D + T ∧ U) ≤ ED,T EU [fi(D + T ∧ U)] ≤ ` . i≤k i≤k

74 2 0.1 2 2 2 Suppose σ ≤ d` . Let σ1 ≥ σ2 ≥ · · · ≥ σk be the variances of f1, f2, . . . , fk respectively. 0 0.7 2 0.1 0 −0.6 2 Pk0 2 0 2 0.1 Let k = d` . We have σk0 ≤ d` /k = ` ; for otherwise σ ≥ i=1 σi ≥ k σk0 > d` , a 0 n 2 contradiction. Let T be the uniform distribution over {0, 1} . Letσ ˜i denote

h 0 0 i Var EU 0 [f(U + T ∧ U )] . T 0,U

2 2 4/3 n We now show thatσ ˜i ≤ O(σi ) . For every i ∈ {1, . . . , k}, define gi : {0, 1} → C≤1 to be gi(x) = (fi(x)−E[fi])/2 so that E[gi] = 0 and Var[gi] = Var[fi]/4. We apply Lemma 3.40 with ρ = 1/2. Notice that since M is fixed, we have |g(x) − g(x0)| = Ω(1) whenever g(x) 6= g(x0). Hence,

2 h 0 0 i σ˜i = Var EU 0 [gi(U + T ∧ U )] T 0,U h  0 0 i = T 0 Var U 0 [gi(U + T ∧ U )] E U E 2 4/3 = O(σi ) .

It follows that

X 2 X 2 4/3 2 1/3 X 2 −0.2 0.1 −Ω(1) σ˜i = O (σi ) ≤ O (σk0 ) σi ≤ O(` ) · d` = d` . i>k0 i>k0 i>k0 Q Now, if we let F2 := i>k0 fi, then by Lemma 3.12,

−Ω(d) ED,T,U [F2(D + T ∧ U)] − EU [F2(U)] ≤ ` . (3.15)

Qk0 On the other hand, if we define F1 to be i=1 fi, then it follows from Theorem 3.6 that

2 0 0.3 0 −Ω(d `/k ) −Ω(d` ) ED,T,U [F1(D + T ∧ U)] − EU [F1(U)] ≤ k 2 = 2 . (3.16)

We now combine (3.15) and (3.16) using Lemma 3.42. To begin, define g1(x) := ET,U [F1(x+ T ∧ U)] and g2(x) := ET,U [F2(x + T ∧ U)]. If S = [0, 1], then the theorem follows immediately by applying Lemma 3.42 to g1 and g2. However, if S is the set of M-th roots of unity, then we cannot apply Lemma 3.42 directly because it only applies to functions with range [0, 1]. Nevertheless we can use Fact 3.44 to reduce from S to [0, 1]. We now reduce S to [0, 1] and apply Lemma 3.42. For every z ∈ S, we define the point function 1z : S → {0, 1} by 1z(x) = 1 if and only if x = z. Then for every random variable Z on S, X X 1 E[Z] = z Pr[Z = z] = z E[ z(Z)]. z∈S z∈S

75 Hence, X 1  E[g1(X)g2(X)] = z E z g1(X)g2(X) z∈S h i X X 1 1  = z E u g1(X) v g2(X) z∈S u,v∈S:uv=z h i X X 1 1  = z E u g1(X) v g2(X) . z∈S u,v∈S:uv=z

Hence, by Fact 3.44, for every u, v ∈ S, the functions 1u ◦ g1 and 1v ◦ g2 are fooled by d-wise −Ω(d) independence with error ` . So by Lemma 3.42,(1u ◦ f)(1v ◦ g) are fooled by 2d-wise independence with error `−Ω(d). Hence,

E[f(D + T ∧ U)] − E[f(U)]

= E[(g1g2)(D)] − E[(g1g2)(U)]

X X  1 1   1 1  ≤ |z| E ( u ◦ g1)( v ◦ g2) D − E ( u ◦ g1)( v ◦ g2) U z∈S u,v∈S:uv=z ≤ M 2 · `−Ω(d) = `−Ω(d) because M is fixed, proving the theorem.

3.5.4 Proof of Theorem 3.11 We now prove Theorem 3.11. We will need the following theorem that is implicit in [FK18]. The original theorem was stated for read-once branching programs. Below we sketch how to modify their proof to handle product tests. Combining the theorem with Claim 3.14 proves Theorem 3.11.

n Theorem 3.45 ([FK18] (Implicit)). Let f : {0, 1} → C≤1 be a product test with k functions of input length `. Let D and T be two 2t-wise independent distributions over {0, 1}n. Then

−(t−`+1)/2 ED,T,U [f(D + T ∧ U)] − EU [f(U)] ≤ k · 2 , where U is the uniform distribution. Proof. We can assume t ≥ ` for otherwise the conclusion is trivial. Let t0 := t−`+1 ≥ 1. We slightly modify the decomposition in [FK18, Proposition 6.1] as follows. Let f be a product Qk test and write f = i=1 fi. As the distribution D + T ∧ U is symmetric, we can assume ≤i Q the function fi is defined on the ith ` bits. For every i ∈ {1, . . . , k}, let f = j≤i fj and >i Q f = j>i fj. We decompose f into

k ˆ X >i f = f∅ + L + Hif , (3.17) i=1

76 where X ˆ L := fαχα α∈{0,1}`k 0<|α|i we claim that α appears in Hif . This is because the coefficient indexed by (α1, . . . , αi) >i appears in Hi, and the coefficient indexed by (αi+1, . . . , αk) appears in f . Note that all 0 0 the coefficients in each function Hi have weights between t = t − ` + 1 and t + ` − 1 = t, and because our distributions D and T are both 2t-wise independent, we get an error of 2−t0 = 2−(t−`+1) in Lemma 6.2 in [FK18]. The rest of the analysis follows from [FK18] or Chapter2.

Theorem 3.11 easily follows from Theorem 3.45 and Claim 3.14.

Proof of Theorem 3.11. We may assume t ≥ 8`, otherwise the conclusion is trivial. If k ≥ 23`+1dt/`e, then the theorem follows from Claim 3.14. Otherwise, k ≤ 23`+1dt/`e and the theorem follows from Theorem 3.45.

3.6 Small-bias plus noise fools degree-2 polynomials

In this section we show that small-bias distributions plus noise fool non-read-once F2- polynomials of degree 2. We first state a structural theorem about degree-2 polynomials over F2 which will be used in our proof.

n Theorem 3.46 (Theorem 6.30 in [LN97]). For every F2-polynomial p: {0, 1} → {0, 1} of n×n degree 2, there exists an invertible matrix A ∈ F2 , an integer k ≤ bn/2c, and a subset Pk P L ⊆ [n] such that p(Ax) := i=1 x2i−1x2i + i∈L xi. Proof of Claim 3.10. Let p be a degree-2 polynomial. It suffices to fool q(x) := (−1)p(x). By Theorem 3.46, there exists an invertible matrix such that q(Ax) = r(x) · χL(x), where Pk P x2i−1x2i xi r(x) := (−1) i=1 , and χL(x) = (−1) i∈L . By writing r(x) in its Fourier expansion, q(x) has the Fourier expansion  X  q(x) = rˆSχS(x) χL(x), S⊆[k]

77 −k/2 where |rˆS| = 2 . Note that L is a subset of [n]. Viewing the sets S and L as vectors in {0, 1}n, we have

X −k/2 −1 −1 E[q(D + T ∧ U)] − E[q(U)] ≤ 2 E[χS+L(A (D))] · E[χS+L(A (T ∧ U))] ∅6=S⊆[k] −k/2 X −1 ≤ 2 δ E[χS+L(A (T ∧ U))] ∅6=S⊆[k] −k/2 X = 2 δ E[χA(S+L)(T ∧ U)] ∅6=S⊆[k] X = 2−k/2δ (1/3)|A(S+L)|, ∅6=S⊆[k]

where the second inequality follows because small-bias distributions are closed under linear transformations. We now bound above the summation. We claim that X X (1/3)|A(S+L)| ≤ (1/3)|S| = (4/3)k. S⊆[k] S⊆[k]

The equality is clear. To see the inequality, notice that since S ⊆ [k], when viewed as a vector in {0, 1}n its last n − k positions must be 0. So we will instead think of S as a vector in {0, 1}k, and rewrite A(S + L) as A0S + AL, where A0 is the first k columns of the full rank matrix A. In particular, A0 is a full rank n × k matrix. As we are only concerned with the 0 0 00 T Hamming weight of A S + AL, we can permute its coordinates and rewrite A as [Ik|A ] for some k × (n − k) matrix A00. (Readers who are familiar with linear codes should think of the standard form of a generator matrix.) Moreover, for a lower bound on the Hamming weight, we can restrict our attention to the first k bits of A0S + AL. Hence, we can think of first k bits of A0S + AL as S shifted by the first k bits of the fixed vector AL. Since we are summing over all S in {0, 1}k, the shift does not affect the sum, and the inequality follows. Therefore, we have

−k/2 k k/2 E[q(D + T ∧ U)] − E[q(U)] = 2 δ · (4/3) ≤ (8/9) δ, and proving the claim.

3.7 Proof of Claim 3.8

In this section, we more generally exhibit a distribution D that is (d2/10k, d)-close to uniform. One can obtain Claim 3.8 by setting d = k1/3. To simplify notation we will switch from {0, 1} to {−1, 1}, and replace k with 2k. We define D to be the uniform distribution over strings in {−1, 1}2k with equal number of −1’s and 1’s. Claim 3.47. D is (10d2/k, d)-close to uniform for every integer d.

78 Proof. We can assume d2 ≤ k/10, for otherwise the conclusion is trivial. Let I ⊆ [k] be a subset of size d. For every x ∈ {−1, 1}d, we have 2k−d  Pr[D = x] = k−wt(x) , I 2k k where wt(x) is the number of −1’s in x. We bound below the right hand side by 2k−d k(k − 1) ··· (k − d + 1) k−d = 2k 2k(2k − 1) ··· (2k − d + 1) k k − d + 1d ≥ 2k  d − 1d = 2−d 1 − k  d(d − 1) ≥ 2−d 1 − k ≥ 2−d · (1 − d2/k), and bound it above by 2k−d  (k(k − 1) ··· (k − d/2 + 1))2 k−d/2 = 2k 2k(2k − 1) ··· (2k − d + 1) k  k d ≤ 2k − d + 1  d − 1 d = 2−d 1 + 2k − d + 1 d i! X  d(d − 1)  ≤ 2−d 1 + 2k − d + 1 i=1  d(d − 1)  ≤ 2−d 1 + 2 · 2k − d + 1 ≤ 2−d · (1 + 2d2/k). The third inequality is because the geometric sum has ratio ≤ 1/2 as d2 ≤ k/10, and so −d −d 2 is bounded by twice the first term. Hence, we have |Pr[DI = x] − 2 | ≤ 2 · 2d /k for every x ∈ {−1, 1}d. The claim then follows from summing the inequality over every x ∈ {−1, 1}d. We now define our product test f. For each j ∈ {1,..., 2k}, define f : {−1, 1}2k → √ j C≤1 xj −i/ 2k Q to be fj(x) = ω , where ω := e . Let f = j≤2k fj. We now show that for every large enough k we have

E[f(D + T ∧ U)] − E[f(U)] ≥ 1/10.

79 We now bound above and below the expectation of f under both distributions. We will use the fact that 1 − θ2/2 ≤ cos θ ≤ 1 − 2θ2/5 for θ ∈ [−1, 1]. First, we have

√ 2k Y x Y −1   2k E[f(U)] = Ex∼{−1,1}[ω ] = (ω + ω )/2 = cos(1/ 2k) ≤ (1 − 1/5k) . j≤2k j≤2k

Next for every j ∈ {1, 2,..., 2k}, we have

3 1 [f (x + T ∧ U)] = ωxj + ω−xj . ET,U j 4 4

3 x 1 −x Define β : {−1, 1} → C≤1 to be β(x) := 4 ω + 4 ω . Since D has the same number of −1’s and 1’s,

h Y i k k ED βj(D) = β(1) β(−1) j≤2k = (10/16 + 3/16 · (ω2 + ω−2))k √ = (5/8 + 3/8 · cos(2/ 2k))k ≥ (5/8 + 3/8 · (1 − 1/k))k = (1 − 3/8k)k,

Therefore |E[f(D+T ∧U]−E[f(U)]| ≥ (1−3/8k)k −(1−1/5k)2k ≥ 1/10, for every sufficiently large k, concluding the proof. The fi in this proof have variance Θ(1/k). So this counterexample gives a product test with total-variance O(1), and is relevant also to Lemma 3.12. Specifically it shows that for ` = 1 and say d = O(1), the error term (k2`)O(d)ε in Lemma 3.12 cannot be replaced with c Ω(1) k ε for a certain constant c. Moreover, it cannot be replaced even if any k of the Yi are close to the Xi (as opposed to just O(1)).

3.8 Moment bounds for sum of almost d-wise indepen- dent variables

In this section we prove some moment bounds and tail bounds for sum of almost d-wise independent complex variables.

Lemma 3.48. Let Z1,Z2,...,Zk ∈ C be independent random variables with E[Zi] = 0, |Zi| < B. Let d be an even positive integer. Let W1,W2,...,Wk ∈ C be random variables that are (ε, d)-close to Z1,...,Zk. Then,

" k # d  1/2 d X d X d E Wi ≤ 2 Var[Zi] · d + dB + (2kB) ε. i=1 i

80 Proof of Lemma 3.48. Note that for any random variable W ∈ C we have

d/2 h di h 2 2 i E |W | = E |<(W )| + |=(W )| d/2 h 2 2  i ≤ E 2 max{|<(W )| , |=(W )| }

d/2 h d di ≤ 2 · E |<(W )| + |=(W )| ,

and Var[W ] = Var[<(W )]+Var[=(W )]. We will first prove the lemma when W is real-valued.

Since W1,...,Wk are (ε, d)-close to Z1,...,Zk, and d is even, we have

" k # " # d  d X X E Wi = E Wi i=1 i " d # X Y d d ≤ E Zij + k B ε, i1,...,id j=1

because there are kd products in the sum, each product is bounded by Bd and Claim 3.22. h i We now estimate the quantity P Qd Z . We have i1,...,id E j=1 ij

" d # d " d # X Y X X X Y E Zij = E Zij . i1,...,id j=1 m=1 |S|=m i1,...,id∈S: j=1 {ij }j =S

The expectation is zero whenever Zij appears only once for some ij ∈ S. So each Zij must appear at least twice. So the expectation is 0 whenever m > d/2. As each Zi is bounded by d−2m Q 2 d−2m Q B, each product is bounded by B j∈S E[Zj ] = B j∈S Var[Zj]. For each S ⊆ [k]

81 d Pk 1/2 of size m, there are at most m such terms. Let σ denote ( i=1 Var[Zi]) . Then,

" d # d/2 X Y X d−2m d X Y E Zij ≤ B m Var[Zj] i1,...,id j=1 m=1 |S|=m j∈S d/2 X ≤ Bd−2mmd−memσ2m (Maclaurin’s inequality, see Claim 3.23) m=1 d/2 X ≤ ed/2 Bd−2m(d/2)d−mσ2m m=1 d/2 X  σ2 m ≤ ed/2(d/2)dBd (d/2)B2 m=0 d−1   σd  X ≤ ed/2(d/2)dBd · d 1 + ( αm ≤ d(α0 + αd−1), ∀α > 0) (d/2)d/2Bd m=0   ≤ ded/2 (d/2)dBd + (d/2)d/2σd √ ≤ 2d/2(dB + σ d)d.

Putting everything together, we have

" k # d  √  X d/2 d/2 d d E Wi ≤ 2 2 (σ d + dB) + (kB) ε i=1 √ ≤ 2d(σ d + dB)d + (2kB)dε.

Lemma 3.49. Let X1,X2,...,Xk ∈ [0, 1] be independent random variables. Let d be an even positive integer. Let Y1,Y2,...,Yk ∈ [0, 1] be random variables that are (ε, d)-close to P P X1,...,Xk. Let Y = i≤k Yi and µ = E[ i Xi]. Then,

√  µd + dd 2k d Pr[|Y − µ| ≥ δµ] ≤ 2d + ε. δµ δµ

In particular, if µ ≥ 25d and δ = 1/2, we have Pr[|Y − µ| ≥ µ/2] ≤ 2−Ω(d) + kdε.

0 0 0 P 0 0 Proof. Let Xi = Xi − E[Xi], Yi = Yi − E[Xi] and Y = i Yi . Note that Xi ∈ [−1, 1] and 0 E[Xi] = 0. Since Xi ∈ [0, 1], we have

2 0 E[Xi] ≥ E[Xi ] ≥ Var[Xi] = Var[Xi − E[Xi]] = Var[Xi].

82 By Lemma 3.48 and Markov’s inequality,

Pr[|Y − µ| ≥ δµ] = Pr[|Y 0|d ≥ (δµ)d] d (P Var[X0] · d)1/2 + d 2k d ≤ 2d i i + ε δµ δµ √  µd + dd 2k d ≤ 2d + ε, δµ δµ

P 0 where in the last inequality we used µ ≥ i Var[Xi].

83 84 Chapter 4

Fourier Bounds and Pseudorandom Generators for Product Tests

In this chapter we are interested in understanding the Fourier spectrum of product tests. We first define the Fourier weight of a function. For a function f : {0, 1}n → R, consider its P ˆ Fourier expansion f = S⊆[n] fSχS. n Definition 4.1 (dth level Fourier weight in Lq-norm). Let f : {0, 1} → C≤1 be any function. The dth level Fourier weight of f in Lq-norm is

X ˆ q Wq,d[f] := |fS| . |S|=d

Pd We denote by Wq,≤d[f] the sum i=0 Wq,i[f]. Several papers have studied the Fourier spectrum of different classes of tests. This includes constant-depth circuits [Man95, Tal17], read-once branching programs [RSV13, SVW17, CHRT18], and low-sensitivity functions [GSTW16]. More specifically, these pa- pers showed that they have bounded L1 Fourier tail, that is, there exists a positive number b such that for every test f in the class and every positive integer d, we have

d W1,d[f] ≤ b .

4.1 Our results

One technical contribution of this chapter is giving tight upper and lower bounds on the L1 Fourier tail of product tests.

n Theorem 4.2. Let f : {0, 1} → [−1, 1] be a product test of k functions f1, . . . , fk with input −c` length `. Suppose there is a constant c > 0 such that |E[fi]| ≤ 1 − 2 for every fi. For every positive integer d, we have √ d W1,d[f] ≤ 72( c · `) .

85 Theorem 4.2 applies to Boolean functions fi with outputs {0, 1} or {−1, 1}, for which we know a bound on c. Moreover, the parity function on `k bits can be written as a product ˆ test with outputs {−1, 1}, which has f[`k] = 1. So product tests do not have non-trivial L2 Fourier tail. (See [Tal17] for a definition.) We also obtain a different upper bound when the fi are arbitrary [−1, 1]-valued functions.

n Theorem 4.3. Let f : {0, 1} → [−1, 1] be a product test of k functions f1, . . . , fk with input length `. Let d be a positive integer. We have

p d W1,d[f] ≤ 85 ` ln(4ek) .

We note that Theorems 4.2 and 4.3 are incomparable, as one can take ` = 1 and k = n, or ` = n and k = 1.

Claim 4.4. For all positive integers ` and d, there exists a product test f : {0, 1}`k → {0, 1} with k = d · 2` functions of input length ` such that

3/2 d W1,d[f] ≥ (`/e ) .

d This matches the upper bound W1,d[f] = O(`) in Theorem 4.2 up to the constant in the O(·). Moreover, applying Theorem 4.3 to the product test f in Claim 4.4 gives p d √ d O(`) W1,d[f] = O( ` log(2k)) = O(`+ ` log d) . Therefore, for all integers ` and d ≤ 2 , there p d exists an integer k and a product test f such that the upper bound W1,d[f] = O( ` log(2k)) is tight up to the constant in the O(·). We now discuss some applications of Theorems 4.2 and 4.3 in pseudorandomness.

Pseudorandom generators. In recent years, researchers have developed new frameworks to construct pseudorandom generators against different classes of tests. Gopalan, Meka, Reingold, Trevisan and Vadhan [GMR+12] refined a framework introduced by Ajtai and Wigderson [AW89] to construct better generators for the classes of combinatorial rectangles and read-once DNFs. Since then, this framework has been used extensively to construct new PRGs against different classes of tests [TX13, GKM18, GY14, RSV13, SVW17, CSV15, HT18, ST18, CHRT18, FK18, MRT18, DHH18]. Recently, a beautiful work by Chattopad- hyay, Hatami, Hosseini and Lovett [CHHL18] developed a new framework of constructing PRGs against any classes of functions that are closed under restriction and have bounded L1 Fourier tail. Thus, applying their result to Theorems 4.2 and 4.3, we can immediately obtain a non-trivial PRG for product tests. However, using the recent result of Forbes and Kelley [FK18] and exploiting the structure of product tests, we use the Ajtai–Wigderson framework to construct PRGs with much better seed length than using [CHHL18] as a blackbox.

Theorem 4.5. There exists an explicit generator G: {0, 1}s → {0, 1}n that fools the XOR of any k Boolean functions on disjoint inputs of length ≤ ` with error ε and seed length O(` + log(n/ε))(log ` + log log(n/ε))2 = O˜(` + log(n/ε)).

86 Here O˜(1) hides polynomial factors in log `, log log k, log log n and log log(1/ε). When `k = n or ε = n−Ω(1), the generator in Theorem 4.5 has seed length O˜(` + log(k/ε)), which is optimal up to O˜(1) factors. We now compare Theorem 4.5 with previous results. Using a completely different anal- ysis, in Chapter3 we obtained a generator with seed length O˜((` + log k)) log(1/ε). When ` = O(log n) and k = 1/ε = nΩ(1), this is O˜(log2 n), whereas the generator in Theorem 4.5 ˜ has seed length O(log n). When each function fi is computable by a read-once width-w branching program on ` bits, Meka, Reingold and Tal [MRT18] obtained a PRG with seed length O(log(n/ε))(log ` + log log(n/ε))2w+2. When ` = O(log(n/ε)), Theorem 4.5 improves on their generator on the lower order terms. As a result, we obtain a PRG for read-once F2-polynomials, which are a sum of monomials on disjoint variables over F2, with seed length O(log(n/ε))(log log(n/ε))2. This also improves on the seed length of their PRG for read-once polynomials in the lower order terms by a factor of (log log(n/ε))4. Our generator in Theorem 4.5 also works for the AND of the functions fi, corresponding to the class of unordered combinatorial rectangles. Previous generators [CRS00, DETT10] use almost-bounded independence or small-bias distributions, and have seed length O(log(n/ε))(1/ε). While several papers [Lu02, Vio14, GMR+12, GY14, GKM18] have improved the seed length for this model in the fixed order setting, our generator is the first improvement for the un- ordered setting and has nearly-optimal seed length. In fact, we have the following more general corollary.

Corollary 4.6. There exists an explicit pseudorandom generator G: {0, 1}s → {0, 1}n with ˜ Ii seed length O(` + log(n/ε)) such that the following holds. Let f1, . . . , fk : {0, 1} → {0, 1} be k Boolean functions where the subsets Ii ⊆ [n] are pairwise disjoint and have size at most `. k P Let g : {0, 1} → C≤1 be any function and write g in its Fourier expansion g = S⊆[k] gˆSχS. P Then G fools g(f1, . . . , fk) with error L1[g] · ε, where L1[g] := S6=∅|gˆS|.

Proof. Let G be the generator in Theorem 4.5. Note that χS(f1(xI1 ), . . . , fk(xIk )) is a product test with outputs {−1, 1}. So by Theorem 4.5 we have

E[g(f1(UI1 ), . . . , fk(UIk )) − E[g(f1(GI1 ), . . . , fk(GIk )] X ≤ |gˆS| E[χS(f1(UI1 ), . . . , fk(UIk ))] − E[χS(f1(GI1 ), . . . , fk(GIk )] S

≤ L1[g] · ε.

Note that the AND function has L1[AND] ≤ 1, and so the generator in Corollary 4.6 fools unordered combinatorial rectangles. When the functions fi in the product tests have outputs [−1, 1], we also obtain the following generator.

Theorem 4.7. There exists an explicit generator G: {0, 1}s → {0, 1}n that fools any prod- uct test with k functions of input length ` with error ε and seed length O(log `k)((` + log(k/ε))(log ` + log log(k/ε)) + log log n) = O˜(` + log(k/ε)) log k.

87 √ When ` = o(log n) and k = 1/ε = 2o( log n), Theorem 4.7 gives a better seed length than Theorem 4.5. Thus the generator in Theorem 4.7 remains interesting for fi ∈ {−1, 1} when a product test f depends on very few variables and the error ε is not so small. The generator in Chapter3 has an extra O˜(log(1/ε)) in the seed length. However, the generator in that Chapter works even when the fi have range C≤1, which implies generators for several variants of product tests such as generalized halfspaces and combinatorial shapes. (See [GKM18] for the reductions.) Finally, when the subsets Ii of a product test are fixed and known in advanced, Gopalan, Kane and Meka [GKM18] constructed a PRG of the same seed length as Theorem 4.5, but again their PRG works more generally for the range of C≤1 instead of {−1, 1}.

F2-polynomials. Chattopadhyay, Hatami, Lovett and Tal [CHLT19] recently constructed a pseudorandom generator for any class of functions that are closed under restriction, pro- vided there is an upper bound on the second level Fourier weight of the functions in L1- norm. They conjectured that every n-variate F2-polynomial f of degree d satisfies the bound 2 1/2−o(1) W1,2[f] = O(d ). In particular, a bound of n would already imply a generator for poly- nomials of degree d = Ω(log n), a major breakthrough in complexity theory. Theorem 4.3 shows that their conjecture is true for the special case of read-once polynomials. In fact, t it shows that W1,t[f] = O(d ) for every positive integer t. Previous bound for read-once 4 t polynomials gives W1,t[f] = O(log n) [CHRT18].

The coin problem. Let Xn,ε = (X1,...,Xn) be the distribution over n bits, where the variables Xi are independent and each Xi equals 1 with probability (1−ε)/2 and 0 otherwise. The ε-coin problem asks whether a given function f can distinguish between the distributions Xn,ε and Xn,0 with advantage 1/3. This central problem has wide range of applications in computational complexity and has been studied extensively for different restricted classes of tests, including bounded-depth circuits [Ajt83, Val84, ABO84, Ama09, Vio09a, SV10, Aar10, Vio14, CGR14], space-bounded algorithms [BV10b, Ste13, CGR14], bounded-depth circuits with parity gates [SV10, KS18, + + RS17, LSS 18], F2-polynomials [LSS 18, CHLT19] and product tests [LV18]. It is known that if a function f has bounded L1 Fourier tail, then it implies a lower bound on the smallest ε∗ of ε that f can solve the ε-coin problem.

n Fact 4.8. Let f : {0, 1} → C≤1 be any function. If for every integer d ∈ {0, . . . , n} we have d W1,d[f] ≤ b , then f solves the ε-coin problem with advantage at most 2bε. Proof. We may assume bε ≤ 1/2, otherwise the result is trivial. Observe that we have |S| E[χS(Xn,ε)] = ε for every subset S ⊆ [n]. Thus,

X ˆ E[f(Xn,ε)] − E[f(Xn,0)] = fS E[Xn,ε] S6=∅ n n n X X ˆ d X d X −(d−1) ≤ |fS| · ε = (bε) ≤ bε · 2 ≤ 2bε. d=1 |S|=d d=1 d=1

88 Lee and Viola [LV18√] showed that product tests with range [−1, 1] can solve the ε-coin problem with ε∗ = Θ(1/ m log k). Hence, Fact 4.8 implies that Theorem 4.3 recovers their lower bound. Moreover, their upper bound implies that the dependence on ` and k in Theorem 4.3 is tight up to constant factors when d is constant. Claim 4.4 complements this by showing that the dependence on d in Theorem 4.3 is also tight for some choice of k. The work [LV18] also√ shows that when the range of the functions fi is C≤1, the right ∗ answer for ε is Θ(1√ / `k). Therefore, one cannot hope for a better tail bound than the d trivial bound of ( `k) when the range is C≤1.

4.1.1 Techniques We now explain how to obtain Theorems 4.2 and 4.3 and our pseudorandom generators for product tests (Theorems 4.5 and 4.7).

Fourier spectrum of product tests The high-level idea of proving Theorems 4.2 and 4.3 is inspired from [LV18]. For intuition, let us first assume that the functions fi have outputs {0, 1} and are all equal to f1 (but defined on disjoint inputs). It will also be useful to think of the number of functions k being much larger than input length ` of each function. We first explain how to bound above P ˆ q W1,1[f]. (Recall in Definition 4.1 we defined Wq,d[f] of a function f to be |S|=d|fS| .)

Bounding W1,1[f]. Since the functions fi of a product test f are defined on disjoint inputs, each Fourier coefficient of f is a product of the coefficients of the fi, and so each weight-1 coefficent of f is a product of k − 1 weight-0 and 1 weight-1 coefficients of the fi. From this, we can see that W1,1[f] is equal to k · W [f ] · W [f ]k−1 = k · W [f ] · [f ]k−1. (4.1) 1 1,1 1 1,0 1 1,1 1 E 1 k−1 Because of the term E[f1] , to maximize W1,1[f] it is natural to consider taking f1 to be a function with expectation E[f1] as close to 1 as possible, i.e. the OR function. In such case, one would hope for a better bound on W1,1[f1]. Indeed, Chang’s inequality [Cha02] (see also [IMR14] for a simple proof) says that for a [0, 1]-valued function g with expectation α ≤ 1/2, we have 2 W2,1[g] ≤ 2α ln(1/α). (The condition α ≤ 1/2 is without loss of generality as one can instead consider 1 − g.) It follows by a simple application of the Cauchy–Schwarz inequality that W1,1[g] ≤ √ p O( n) · α ln(1/α) (see Fact 4.11 below for a proof). Moreover,√ when the functions fi are −` −` p Boolean, we have 2 ≤ E[fi] ≤ 1 − 2 , and so ln(1/α) ≤ `. Plugging these bounds into k−1 Equation (4.1), we obtain a bound of O(`) · k(1 − E[f1]) E[f1] . So indeed E[f1] should be roughly 1 − 1/k in order to maximize W1,1[f], giving an upper bound of O(`). For the case where the fi can be different, a simple convexity argument shows that W1,1[f] is maximized when the functions fi have the same expectation.

89 Bounding W1,d[f] for d > 1. To extend this argument to d > 1, one has to generalize Chang’s inequality to bound above W2,d[g] for d > 1. The case d = 2 was already proved by Talagrand [Tal96]. Following Talagrand’s argument in [Tal96] and inspired by the work of Keller and Kindler [KK13], which proved a similar bound in terms of a different measure than E[g], we prove the following bound on W2,d[g] in terms of its expectation. Lemma 4.9. Let g : {0, 1}n → [0, 1] be any function. For every positive integer d, we have

2 1/d d W2,d[g] ≤ 4 E[g] 2e ln(e/ E[g] ) .

We note that the exponent 1/d of E[g] either did not appear in previous upper bounds (mentioned without proof in [IMR14]), or only holds for restricted values of d [O’D14]. This exponent is not important for proving Theorem 4.2 , but will be crucial in the proof of Theorem 4.3, which we will explain later on. For d > 1, the expression for W1,d[f] becomes much more complicated than W1,1[f], as it involves W1,z[f1] for different values of z ∈ [`]. So one has to formulate the expression of W1,d[f] carefully (see Lemma 4.12). Once we have obtained the right expression for W1,d[f], the proof of Theorem 4.2 follows the outline above by replacing Chang’s inequality with Lemma 4.9. One can then handle functions fi with outputs {−1, 1} by considering the translation fi 7→ (1 − fi)/2, which only changes each W1,d[fi] (for d > 0) by a factor of 2. We remark that Theorem 4.2 is sufficient for constructing the generator in Theorem 4.5.

Handling [−1, 1]-valued fi. Extending this argument to proving Theorem 4.3 poses sev- eral challenges. Following the outline above, after plugging in Lemma 4.9, we would like to show that E[f1] should be roughly 1 − 1/k to maximize W1,d[f]. However, it is no longer clear why this is the case even assuming the maximum is attained√ by functions fi with the same expectation, as we now do not have the bound pln(1/α) ≤ `, and so it cannot be used to simplify the expression of W1,d[f] as before. In fact, the above assumption is simply false if we plug in the upper bound in Lemma 4.9 with the exponent 1/d omitted to the

W1,zi [fi]. Using Lemma 4.9 and the symmetry of the expression for W1,d[f], we reduce the problem of bounding above W1,d[f] with different fi to bounding the same quantity but with the additional assumption that the fi have the same expectation E[f1]. This uses Schur-convexity (see Section 4.2 for its definition). Then by another convexity argument we show that the maximum is attained when E[f1] is roughly equal to 1 − d/k. Both of these arguments critically rely on the aforementioned exponent of 1/d in Lemma 4.9.

Pseudorandom generators We now discuss how to use Theorems 4.2 and 4.3 to construct our pseudorandom generators for product tests. Our construction follows the Ajtai–Wigderson framework [AW89] that was recently revived and refined by Gopalan, Meka, Reingold, Trevisan and Vadhan [GMR+12]. The high-level idea of this framework involves two steps. For the first step, we show that derandomized bounded independence plus noise fools f. More precisely, we will show

90 that if we start with a small-bias or almost-bounded independent distribution D (“bounded independence”), and select roughly half of D’s positions T pseudorandomly and set them to uniform U (“plus noise”), then this distribution, denoted by D + T ∧ U, fools product tests. Forbes and Kelley [FK18] recently improved the analysis in Chapter2 and implicitly showed that δ-almost d-wise independent plus noise fools product tests, where d = O(` + log(k/ε)) and δ = n−Ω(d). Using Theorem 4.3, we improve the dependence on δ to (` ln k)−Ω(d) and obtain the following theorem. Theorem 4.10. Let f : {0, 1}n → [−1, 1] be a product test with k functions of input length `. Let d be a positive integer. Let D and T be two independent δ-almost d-wise independent distributions over {0, 1}n, and U be the uniform distribution over {0, 1}n. Then √ p d −(d−`)/2 E[f(D + T ∧ U)] − E[f(U)] ≤ k · δ · (170 · ` ln(ek)) + 2 , where “+” and “∧” are bit-wise XOR and AND respectively. The second step of the Ajtai–Wigderson framework builds a pseudorandom generator by applying the first step (Theorem 4.10) recursively. Let f : {0, 1}n → {0, 1} be a product test with k functions of input length `. As product tests are closed under restrictions (and shifts), after applying Theorem 4.10 to f and fixing D and T in the theorem, the function T fD,T : {0, 1} → {0, 1} defined by fD,T (y) := f(D+T ∧y) is also a product test. Thus one can apply Theorem 4.10 to fD,T again and repeat the argument recursively. We will use different progress measures to bound above the number of recursion steps in our constructions. We first describe the recursion in Theorem 4.7 as it is simpler.

Fooling [−1, 1]-valued product tests. Here our progress measure is the number of bits that are defined by the product test f. We show that after O(log(`k)) steps of the recursion, the restricted product test is defined on at most O(` + log(k/ε)) bits with high probability, which can then be fooled by an almost-bounded independent distribution. This simple recursion gives our second PRG (Theorem 4.7).

Fooling Boolean-valued product tests. Our construction of the first generator (Theo- rem 4.5) is more complicated and uses two progress measures. The first one is the maximum input length ` of the functions fi, and the second is the number k of the functions fi. We re- duce the number of recursion steps from O(log(k/ε)) log ` to O(log `). This requires a more delicate construction and analysis that are similar to the recent work of Meka, Reingold and Tal [MRT18], which constructed a pseudorandom generator against XOR of disjoint constant-width read-once branching programs. There are two main ideas in their construc- tion. First, they ensure k ≤ 16` in each step of the recursion, by constructing another PRG to fool the test f for the case k ≥ 16`. We will also use this PRG in our construction. Next, throughout the recursion they allow one “bad” function fi of the product test f to have a longer input length than m, but not longer than O(log(n/ε)). Using these two ideas, they show that whenever ` ≥ log log n during the recursion, then after O(1) steps of the recursion all but the “bad” fi have their input length restricted by a half, while the “bad” fi always

91 has length O(log(n/ε)). This allows us to repeat O(log `) steps until we are left with a 0 product test of k ≤ polylog(n) functions, where all but one of the fi have input length at most `0 = O(log log n). Now we switch our progress measure to the number of functions. This part is different from [MRT18], in which their construction relies on the fact that the fi are computable by read-once branching programs. Here because our functions fi are arbitrary, by grouping c functions as one, we can instead think of the parameters k0 and `0 in the product test as k00 = k0/c and `00 = c`0, respectively. Choosing c to be O(log n/ log log n), we have `00 = O(log n) and so we can repeat the previous argument again. Because each time k0 is reduced by a factor of c, after repeating this for O(1) steps, we are left with a product test defined on O(log n) bits, which can be fooled using a small-bias distribution. This gives our first generator (Theorem 4.5).

Organization In Section 4.2 we prove Theorems 4.2 and 4.3. In Section 4.3 we construct our pseudorandom generators for product tests, proving Theorems 4.5 and 4.7. In Section 4.4 we prove Lemma 4.9, which is used in the proof of Theorem 4.3.

4.2 Fourier spectrum of product tests

In this section we prove Theorems 4.2 and 4.3. We first restate the theorems.

n Theorem 4.2. Let f : {0, 1} → [−1, 1] be a product test of k functions f1, . . . , fk with input −c` length `. Suppose there is a constant c > 0 such that |E[fi]| ≤ 1 − 2 for every fi. For every positive integer d, we have

√ d W1,d[f] ≤ 72( c · `) .

n Theorem 4.3. Let f : {0, 1} → [−1, 1] be a product test of k functions f1, . . . , fk with input length `. Let d be a positive integer. We have

p d W1,d[f] ≤ 85 ` ln(4ek) .

Both theorems rely on the following lemma which gives an upper bound on W2,d[g] in terms of the expectation of a [0, 1]-valued function g. The case d = 1 is known as Chang’s inequality [Cha02]. (See also [IMR14] for a simple proof.) This was then generalized by Talagrand to d = 2 [Tal96]. Using a similar argument to [Tal96], we extend this to d > 2. Lemma 4.9. Let g : {0, 1}n → [0, 1] be any function. For every positive integer d, we have

2 1/d d W2,d[g] ≤ 4 E[g] 2e ln(e/ E[g] ) . We defer its proof to Section 4.4. We remark that a similar upper bound was proved by Keller and Kindler [KK13]. However, the upper bound in [KK13] was proved in terms Pn 2 of i=1 Ii[g] , where Ii[g] is the influence of the ith coordinate on g, instead of E[g]. A

92 similar upper bound in terms of E[g] can be found in [O’D14] under the extra condition d ≤ 2 ln(1/ E[g]). We will also use the following well-known fact that bounds above W1,d[f] in terms of W2,d[f]. n d/2p Fact 4.11. Let f : {0, 1} → R be any function. We have W1,d[f] ≤ n W2,d[f]. Proof. By the Cauchy–Schwarz inequality, v X un X q W [f] = |fˆ | ≤ u fˆ2 ≤ nd/2 W [f]. 1,d S t d S 2,d |S|=d |S|=d

n Lemma 4.12. Let f : {0, 1} → [−1, 1] be a product test of k functions f1, . . . , fk with input length `, and αi := (1 − E[fi])/2 for every i ∈ [k]. Let d be a positive integer. We have √ 3 d W1,d[f] ≤ 32e ` g(α1, . . . , αk), where the function g : (0, 1]k → R is defined by d Pk  1/z zi/2 −2 i=1 αi X X X Y i  g(α1, . . . , αk) := e αi ln e/αi . m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d Qk Proof. For notational simplicity, we will use Wd[f] to denote W1,d[f]. Write f = i=1 fi. Without loss of generality we will assume each function fi is non-constant. Since fi and −fi have the same weight Wd[fi], we will further assume E[fi] ∈ [0, 1). Note that for a subset S = S × · · · × S ⊆ ({0, 1}`)k, we have fˆ = Qk fˆ . So, 1 k S i=1 iSi k d X Y X X X Y Y  Wd[f] = Wzi [fi] = Wzi [fi] · W0[fi] . z∈{0,...,`}k i=1 m=1 S⊆[k] z∈[`]S i∈S i6∈S P |S|=m P i zi=d i zi=d Since x = 1 − (1 − x) ≤ e−(1−x) for every x ∈ R, for every subset S ⊆ [k] of size at most d, we have P P P Pk Y − (1−W0[fi]) − (1−W0[fi]) W0[fi] d − (1−W0[fi]) W0[fi] ≤ e i6∈S ≤ e i6∈S · e i∈S ≤ e · e i=1 . i6∈S Hence,

d X X X Y Y  Wd[f] = Wzi [fi] · W0[fi] m=1 S⊆[k] z∈[`]S i∈S i6∈S |S|=m P i zi=d d Pk d − i=1(1−W0[fi]) X X X Y ≤ e · e Wzi [fi]. (4.2) m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d

93 0 0 Define fi := (1 − fi)/2 ∈ [0, 1]. Let αi := E[fi ] = (1 − E[fi])/2 ∈ (0, 1/2]. Applying 0 Lemma 4.9 and Fact 4.11 to the functions fi , we have for every subset S ⊆ [k] of size at most d,   X Y 0 X Y zi/2 1/zi zi/2 Wzi [fi ] ≤ 2` αi 2e ln e/αi z∈[`]S i∈S z∈[`]S i∈S P P i zi=d i zi=d √ d X Y 1/zi zi/2 ≤ ( 8e`) αi ln e/αi . z∈[`]S i∈S P i zi=d

0 Note that for every integer d ≥ 1, we have Wd[fi] = 2Wd[fi ]. Plugging the bound above into Equation (4.2), we have

d Pk √ d d −2 i=1 αi X X X Y 0 3  Wd[f] ≤ (2e) · e Wzi [fi ] ≤ 32e ` g(α1, . . . , αk), m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d where the function g : (0, 1]k → R is defined by

d Pk  1/z zi/2 −2 i=1 αi X X X Y i  g(α1, . . . , αk) := e αi ln e/αi . m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d

k Pk We now prove Theorems 4.2 and 4.3. For every (α1, . . . , αk) ∈ (0, 1] , let α := i=1 αi/k ∈ (0, 1]. We note that the upper bound in Theorem 4.2 is sufficient to prove Theorem 4.5.

Proof of Theorem 4.2. We will bound above g(α1, . . . , αk) in Lemma 4.12. Recall that αi = −c` −(c`+1) (1 − E[fi])/2. Since |E[fi]| ≤ 1 − 2 , we have αi ≥ 2 , and so ln(1/αi) ≤ c` + 1. For S P d−1  d every subset S ⊆ [k], the set {z ∈ [`] : i zi = d} has size at most |S|−1 ≤ 2 . Hence,

X Y zi/2 d d/2 ln(1/αi) ≤ 2 (c` + 1) . z∈[`]S i∈S P i zi=d By Maclaurin’s inequality (cf. [Ste04, Chapter 12]), we have

k m X Y mX  m m αi ≤ (e/m) αi = (e/m) (kα) . S⊆[k] i∈S i=1 |S|=m Because the function x 7→ e−2xxm is maximized when x = m/2, it follows that

d d d d X −2kα X Y X −2kα m m X −m m m X −m e αi ≤ e (e/m) (kα) ≤ e (e/m) (m/2) = 2 ≤ 1. m=1 S⊆[k] i∈S m=1 m=1 m=1 |S|=m

94 Therefore,

d Pk  1/z zi/2 −2 i=1 αi X X X Y i  g(α1, . . . , αk) = e αi ln(1/αi ) m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d d d d/2 X −2kα X Y ≤ 2 (c` + 1) e αi m=1 S⊆[k] i∈S |S|=m ≤ 2d(c` + 1)d/2.

Plugging this bound into Lemma 4.12, we have √ √ 3 d p d d W1,d[f] ≤ 32e ` · 4(c` + 1) ≤ 72( c · `) .

Pk We now prove Theorem 4.3. Recall that we let α := i=1 αi/k ∈ (0, 1] for every k (α1, . . . , αk) ∈ (0, 1] . We will show that the maximum of the function g defined in Lemma 4.12 is attained at the diagonal (α, . . . , α). We state the claim now and defer the proof to the next section.

k Claim 4.13. Let g be the function defined in Lemma 4.12. For every (α1, . . . , αk) ∈ (0, 1] , we have g(α1, . . . , αk) ≤ g(α, . . . , α).

Proof of Theorem 4.3. We first apply Claim 4.13 and obtain

d −2kα X X m X Y 1/zi zi/2 g(α1, . . . , αk) ≤ g(α, . . . , α) = e α ln e/α . m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d

We next give an upper bound on g(α, . . . , α) that has no dependence on the numbers zi. By the weighted AM-GM inequality, for every subset S ⊆ [k] of size m and numbers zi such P that i∈S zi = d,

1/zi  d/2 Y zi/2 X zi ln e/α  lne/α1/zi  ≤ d i∈S i∈S 1 X  1 d/2 = zi 1 + ln(1/α) d zi i∈S  m d/2 = 1 + ln(1/α) d = lne/αm/dd/2.

95 S P d−1  d For every subset S ⊆ [k], the set {z ∈ [`] : i zi = d} has size at most |S|−1 ≤ 2 . Thus,

d X X X d/2 g(α, . . . , α) ≤ e−2kα αm lne/αm/d m=1 S⊆[k] z∈[`]S |S|=m P i zi=d d X X d/2 ≤ 2d e−2kα αmlne/αm/d m=1 S⊆[k] |S|=m d X ekαm d/2 ≤ 2d e−2kα lne/αm/d . (4.3) m m=1

For every m ∈ [k], define gm : (0, 1] → R to be ekxm g (x) := e−2kx lne/xm/dd/2. m m

We now bound above the maximum of gm over x ∈ (0, 1]. One can verify easily that the derivative of g is g (x) g0 (x) = m ln(1/x2m/d)(m − 2kx) + (m − 4kx). m 2x lne/xm/d

0 gm(x) 2m/d  Observe that when x ≤ m/4k, then gm(x) ≥ 4x ln(e/xm/d) m ln(1/x ) ≥ 0. Likewise, 0 gm(x) when x ≥ m/2k, then gm(x) ≤ 2x ln(e/xm/d) (−m) ≤ 0. Also, we have gm(0) = 0. Hence, gm(x) ≤ gm(βmm/4k) for some βm ∈ [1, 2], which is at most  d/2 e−m/2 · (e/2)m · lne(4k/m)m/d .

−2k m (In the case when m/4k ≥ 1, we have gm(x) ≤ gm(1) ≤ e (ek/m) .) Therefore, plugging this back into Equation (4.3),

d d d d/2 d X d X d X −m/2 m  m/d g(α, . . . , α) ≤ 2 gm(α) ≤ 2 gm(βmm/4k) ≤ 2 e · (e/2) · ln e(4k/m) m=1 m=1 m=1 d d/2 X ≤ 2de ln(4ek) 2−m m=1 ≤ p4e ln(4ek)d.

Putting this back into the bound in Lemma 4.12, we conclude that p d W1,d[f] ≤ 84 ` ln(4ek) , proving the theorem.

96 4.2.1 Schur-concavity of g

We prove Claim 4.13 in this section. First recall that the function g : (0, 1]k → R is defined as d X X X Y g(α1, . . . , αk) := φzi (αi), m=1 S⊆[k] z∈[`]S i∈S |S|=m P i zi=d where for every positive integer z, the function φz : (0, 1] → R is defined by 1/z z/2 φz(x) = x ln(e/x ) . The proof of Claim 4.13 follows from showing that g is Schur-concave. Before defining it, we first recall the concept of majorization. Let x, y ∈ Rk be two vectors. We say that y majorizes x, denoted by x ≺ y, if for every j ∈ [k] we have

j j X X x(i) ≤ y(i), i=1 i=1 Pk and i=1(xi − yi) = 0, where x(i) and y(i) are the ith largest coordinates in x and y respec- tively. A function f : D → R where D ⊆ Rk is Schur-concave if whenever x ≺ y we have f(x) ≥ f(y). We will show that g is Schur-concave using the Schur–Ostrowski criterion.

Theorem 4.14 (Schur–Ostrowski criterion (Theorem 12.25 in [PPT92])). Let f : D → R be a function where D ⊆ Rk is permutation-invariant, and assume that the first partial derivatives of f exist in D. Then f is Schur-concave in D if and only if  ∂f ∂f  (xj − xi) − ≥ 0 ∂xi ∂xj for every x ∈ D, and every 1 ≤ i 6= j ≤ k. P P Claim 4.13 then follows from the observation that ( i xi/k, . . . , i xi/k) ≺ x for every x ∈ [0, 1]k. Claim 4.15. For every x ∈ (0, 1] we have

1. φz(x) ≥ 0; 0 1 e  e z/2−1 2. φz(x) = 2 ln x2/z ln x1/z > 0, and 00 1 e z/2−2 e  z e  3. φz (x) = − 2xz ln x1/z 2 ln x1/z + ( 2 − 1) ln x2/z ≤ 0. 0 Proof. The derivatives of φz and the non-negativity of φz and φz can be verified easily. It is 00 00 also clear that φz is non-positive when z ≥ 2. Thus it remains to verify φ1(x) ≤ 0 for every x. We have 1  e −3/2  e  1  e  φ00(x) = − ln 2 ln − ln . 1 2x x x 2 x2 1 2 2 2 00 It follows from 2 ln(e/x ) ≤ ln(e /x ) = 2 ln(e/x) that φ1(x) ≤ 0.

97 Lemma 4.16. g is Schur-concave.

Proof. Fix 1 ≤ u 6= v ≤ k and write g = g1 + g2, where

d X X X Y g1(α1, . . . , αk) := φzi (αi) m=1 S⊆[k],|S|=m z∈[`]S i∈S (S3u∧S63v)∨(S63u∧S3v) P i zi=d

and d X X X Y g2(α1, . . . , αk) := φzi (αi). m=1 S⊆[k],|S|=m z∈[`]S i∈S (S3u∧S3v)∨(S63u∧S63v) P i zi=d   We will show that for every α ∈ (0, 1]k, whenever α ≤ α we have (1) ∂g1 − ∂g1 (α) ≤ 0 v u ∂αu ∂αv   and (2) ∂g2 − ∂g2 (α) ≤ 0, from which the lemma follows from Theorem 4.14. ∂αu ∂αv 00 0 0 For g1, since φz ≤ 0 and αv ≤ αu, we have φzu (αv) ≥ φzu (αu). Moreover, as φz ≥ 0 and 0 φz > 0, we have

d 0 ∂g1 X X X Y φ (αv) (α) ≤ φ (α ) · φ0 (α ) · zu zi i zu u 0 ∂αu φz (αu) m=1 S⊆[k],|S|=m z∈[`]S i∈S u (S3u∧S63v) P i6=u i zi=d d X X X Y 0 = φzi (αi) · φzu (αv) m=1 S⊆[k],|S|=m z∈[`]S i∈S (S3u∧S63v) P i6=u i zi=d d X X X Y 0 ∂g1 = φzi (αi) · φzv (αv) = (α), ∂αv m=1 S⊆[k],|S|=m z∈[`]S i∈S (S3v∧S63u) P i6=v i zi=d

where in the second equality we simply renamed zu to zv.   We now show that ∂g2 − ∂g2 (α) ≤ 0 whenever α ≤ α . For all positive integers z ∂αu ∂αv v u 2 and w, define ψz,w : (0, 1] → R by

0 0 0 0 ψz,w(x, y) := φz(x)φw(y) + φw(x)φz(y) − φz(x)φw(y) − φw(x)φz(y).

Note that when x = y we have ψz,w(x, x) = 0. Moreover, when z = w we have ψz,z(x, y) = 0 0 2(φz(x)φz(y) − φz(x)φz(y)). For every x, y ∈ (0, 1], by Claim 4.15 we have

∂ ψ (x, y) = φ0 (x)φ0 (y) + φ0 (x)φ0 (y) − φ (x)φ00 (y) − φ (x)φ00(y) ≥ 0. ∂y z,w z w w z z w w z

98 Since ψzu,zv (αu, αu) = 0, we have ψzu,zv (αu, αv) ≤ 0 whenever αv ≤ αu, and so

 ∂g ∂g  2 − 2 (α) = ∂αu ∂αv d X X  X Y X Y  φzi (αi) · ψzu,zv (αu, αv)/2 + φzi (αi) · ψzu,zv (αu, αv) ≤ 0 m=2 S⊆[k] z∈[`]S i∈S z∈[`]S i∈S |S|=m P z =d i6=u P z =d i6=u i i i6=v i i i6=v S3u∧S3v zu=zv zu

because the values φzi are non-negative.

4.2.2 Lower bound In this section we prove Claim 4.4. We first restate our claim. Claim 4.4. For all positive integers ` and d, there exists a product test f : {0, 1}`k → {0, 1} with k = d · 2` functions of input length ` such that

3/2 d W1,d[f] ≥ (`/e ) .

` `k Proof. Let k = d · 2 and f1, . . . , fk : {0, 1} → {0, 1} be the OR function on k disjoint sets ˆ −` ˆ −` of ` bits. It is easy to verify that fi(∅) = 1 − 2 and |fi(S)| = 2 for every S 6= ∅. Consider Qk −x(1+x) the product test f := i=1 fi. Using the fact that 1 − x ≥ e for x ∈ [0, 1/2], we have

(1 − 2−`)k ≥ e−2`(1+2−`)k ≥ e−d(1+2−`) ≥ e−3d/2.

Hence,

k X Y W1,d[f] = Wzi [fi] z∈{0,...,`}k i=1 P i zi=d X Y Y  ≥ W1,1[fi] W1,0[fi] |S|=d i∈S i6∈S k = · (`2−`)d · (1 − 2−`)k−d d d · 2` d ≥ · (`2−`)d · e−3d/2 d = (`/e3/2)d.

4.3 Pseudorandom generators

In this section, we use Theorem 4.3 to construct two pseudorandom generators for product tests. The first one (Theorem 4.7) has seed length O˜(` + log(k/ε)) log k. The second one

99 (Theorem 4.5) has a seed length of O˜(` + log(n/ε)) but only works for product tests with outputs {−1, 1} and their variants (see Corollary 4.6). We note that Theorem 4.5 can also be obtained using Theorem 4.2 in place of Theorem 4.3. Both constructions use the Ajtai–Wigderson framework [AW89, GMR+12], and follow from recursively applying the following theorem, which roughly says that 2−Ω(˜ `+log(k/ε))- almost O(` + log(k/ε))-wise independence plus constant fraction of noise fools product tests. Theorem 4.10. Let f : {0, 1}n → [−1, 1] be a product test with k functions of input length `. Let d be a positive integer. Let D and T be two independent δ-almost d-wise independent distributions over {0, 1}n, and U be the uniform distribution over {0, 1}n. Then √ p d −(d−`)/2 E[f(D + T ∧ U)] − E[f(U)] ≤ k · δ · (170 · ` ln(ek)) + 2 , where “+” and “∧” are bit-wise XOR and AND respectively. Theorem 4.10 follows immediately by combining Theorem 4.3 and Lemma 4.17 below. Lemma 4.17. Let f : {0, 1}n → [−1, 1] be a product test with k functions of input length `. Let d be a positive integer. Let D,T,U be a δ-almost (d + `)-wise independent, a γ-almost (d + `)-wise independent, and the uniform distributions over {0, 1}n, respectively. Then √ −d/2 √  E[f(D + T ∧ U)] − E[f(U)] ≤ k · δ · W1,≤d+`[f] + 2 + γ , where “+” and “∧” are bit-wise XOR and AND respectively. Proof. We slightly modify the decomposition in [FK18, Proposition 6.1] as follows. Let f be Qk a product test and write f = i=1 fi. As the distribution D + T ∧ U is symmetric, we can ≤i Q assume the function fi is defined on the ith ` bits. For every i ∈ {1, . . . , k}, let f = j≤i fj >i Q and f = j>i fj. We decompose f into

k ˆ X >i f = f∅ + L + Hif , (4.4) i=1 where X ˆ L := fαχα α∈{0,1}`k 0<|α|

100 |α| ≥ d. Then the dth 1 in α must appear in one of α1, . . . , αk. Say it appears in αi. Then >i we claim that α appears in Hif . This is because the coefficient indexed by (α1, . . . , αi) >i appears in Hi, and the coefficient indexed by (αi+1, . . . , αk) appears in f . Note that all the coefficients in each function Hi have weights between d and d + `, and because our distributions D and T are both almost (d + `)-wise independent, we get an error of 2−d + γ in Lemma 7.1 in [FK18]. The rest of the analysis follows from [FK18] or Chapter2.

4.3.1 Generator for product tests

We now prove Theorem 4.7.

Theorem 4.7. There exists an explicit generator G: {0, 1}s → {0, 1}n that fools any prod- uct test with k functions of input length ` with error ε and seed length O(log `k)((` + log(k/ε))(log ` + log log(k/ε)) + log log n) = O˜(` + log(k/ε)) log k.

The high-level idea is very simple. Let f be a product test. For every choice of D and T in Theorem 4.10, the function f 0 : {0, 1}T → [−1, 1] defined by f 0(y) := f(D + T ∧ y) is also a product test. So we can apply Theorem 4.10 again and recurse. We show that if we repeat this argument for t = O(log(`k)) times with t independent copies of D and T , then for every fixing of D1,...,Dt and with high probability over the choice of T1,...,Tt, the restricted Vt T product test defined on {0, 1} i=1 i is a product test defined on at most O(`+log(k/ε)) bits, which can then be fooled by an almost O(` + log(k/ε))-wise independent distribution.

Proof of Theorem 4.7. Let C be a sufficiently large constant. Let d = C(` + log(k/ε)), δ = −2d ˜ d , and t = C log(`k) = O(log k). Let D1,...,Dt,T1,...,Tt be 2t independent δ-almost d- n (1) (i+1) (i) wise independent distributions over {0, 1} . Define D := D1 and D := Di+1 +Ti ∧D . (t) Vt 0 Let D := D , T := i=1 Ti. Let G be a δ-almost d-wise independent distribution over n |S| n {0, 1} . For a subset S ⊆ [n], define the function PADS(x): {0, 1} → {0, 1} to output n bits of which the positions in S are the first |S| bits of x0|S| and the rest are 0. Our generator G outputs 0 D + T ∧ PADT (G ).

We first look at the seed length of G. By [NN93, Lemma 4.2], sampling the distributions Di and Ti takes a seed of length

s := t · O(d log d + log log n) = t · O(` + log(k/ε))(log ` + log log(k/ε)) + log log n = t · O˜` + log(k/ε).

Sampling G0 takes a seed of length O((` + log(k/ε))(log ` + log log(k/ε)) + log log n). Hence the total seed length of G is O˜(` + log(k/ε)) log k.

101 We now look at the error of G. By our choice of δ and applying Theorem 4.10 recursively for t times, we have √  p d −(d−`)/2 E[f(D + T ∧ U)] − E[f(U)] ≤ t · k · δ · 170 · ` ln(ek) + 2 170p` ln(ek)d  ≤ t · k · + 2−Ω(d) d ≤ t · 2−Ω(d) ≤ ε/2.

Next, we show that for every fixing of D and most choices of T , the function fD,T (y) := f(D + T ∧ y) is a product test defined on d bits, which can be fooled by G0. Sk Let I = i=1 Ii. Note that |I| ≤ `k. Because the variables Ti are independent and each of them is δ-almost d-wise independent, we have |I| Pr|I ∩ T | ≥ d ≤ (2−d + δ)t ≤ 2d log(`k) · 2−Ω(d log(`k)) ≤ ε/4. d It follows that for every fixing of D, with probability at least 1 − ε/4 over the choice of T , 0 the function fD,T is a product test defined on at most d bits, which can be fooled by G with error ε/4. Hence G fools f with error ε.

4.3.2 Almost-optimal generator for XOR of Boolean functions In this section, we construct our generator for product tests with outputs {−1, 1}, which correspond to the XOR of Boolean functions fi defined on disjoint inputs. Throughout this section we will call these tests {−1, 1}-products. We first restate our theorem. Theorem 4.5. There exists an explicit generator G: {0, 1}s → {0, 1}n that fools the XOR of any k Boolean functions on disjoint inputs of length ≤ ` with error ε and seed length O(` + log(n/ε))(log ` + log log(n/ε))2 = O˜(` + log(n/ε)). Theorem 4.5 relies on applying the following lemma recursively in different ways. From now on, we will relax our tests to allow one of the k functions to have input length greater than `, but bounded by O(` + log(n/ε)). Lemma 4.18. There exists a constant C such that the following holds. Let ` and m be two integers such that ` ≥ C log log(n/ε) and m = 5(` + log(n/ε)). If there is an explicit generator G0 : {0, 1}s0 → {0, 1}n that fools {−1, 1}-products with k0 ≤ 16`+1 functions, k0 − 1 of which have input lengths ≤ `/2 and one has length ≤ s, with error ε0 and seed length s0, then there is an explicit generator G: {0, 1}s → {0, 1}n that fools {−1, 1}-products with k ≤ 162`+1 functions, k − 1 of which have input lengths ≤ ` and one has length ≤ m, with error ε0 +ε and seed length s = s0 +O(`+log(n/ε))(log `+log log(n/ε)) = s0 +O˜(`+log(n/ε)). The proof of Lemma 4.18 closely follows a construction by Meka, Reingold and Tal [MRT18]. First of all, we will use the following generator in [MRT18]. It fools any {−1, 1}-products when the number of functions k is significantly greater than the input length ` of the func- tions fi.

102 Lemma 4.19 (Lemma 6.2 in [MRT18]). There exists a constant C such that the following holds. Let n, k, `, m be integers such that C log log(n/ε) ≤ ` ≤ log n and 16` ≤ k ≤ 2 · 2` s n 16 . There exists an explicit pseudorandom generator G⊕Many : {0, 1} → {0, 1} that fools {−1, 1}-products with k non-constant functions, k − 1 of which have input lengths ≤ ` and one has length ≤ m, with error ε and seed length O(m + log(n/ε)). Here is the high-level idea of proving Lemma 4.18. We consider two cases depending on ` whether k is large with respect to `. If k ≥ 16 , then by Lemma 4.19, the generator G⊕Many fools f. Otherwise, we show that for every fixing of D and most choices of T , the restriction of f under (D,T ) is a {−1, 1}-product with k functions, k − 1 of which have input length ≤ `/2 and one has length ≤ m. More specifically, we will show that for most choices of T , the following would happen: for the function with input length ≤ m, at most m/2 of its inputs remain in T ; for the rest of the functions with input length ≤ `, after being restricted by (D,T ), at most dm/2`e of them have input length > `/2, and so they are defined on a total of m/2 positions in T . Now we can think of these “bad” functions as one function with input length ≤ m, and the rest of the at most k “good” functions have input length `/2. So we can apply the generator G0 in our assumption. Proof of Lemma 4.18. Let C be the constant in Lemma 4.19 and C0 be a sufficiently large constant. 0 −2d Let d = C m and δ = d . Let D1,...,D50,T1,...,T50 be 100 independent δ-almost d- n (1) (i+1) (i) wise independent distributions over {0, 1} . Define D := D1 and D := Di+1 +Ti ∧D . (50) V50 Let D := D , T := i=1 Ti and G⊕Many be the generator in Lemma 4.19 with respect to the values of n, k, `, m given in this lemma. For a subset S ⊆ [n], define the function |S| n PADS(x): {0, 1} → {0, 1} to output n bits of which the positions in S are the first |S| bits of x0|S| and the rest are 0. Our generator G outputs

0 (D + T ∧ PADT (G )) + G⊕Many.

We first look at the seed length of G. By Lemma 4.19, G⊕Many uses a seed of length O(m + log(n/ε)) = O(` + log(n/ε)). By [NN93, Lemma 4.2], sampling the distributions Di and Ti takes a seed of length

O(m log m) = O` + log(n/ε)(log ` + log log(n/ε)) = O˜(` + log(n/ε)).

Hence the total seed length of G is s0 + O(` + log(n/ε))(log ` + log log(n/ε)) = s0 + O˜(` + log(n/ε)).

Qk Ii We now show that G fools f. Write f = i=1 fi, where fi : {0, 1} → {−1, 1}. Without loss of generality we can assume each function fi is non-constant. We consider two cases.

k is large: If k ≥ 16`, then for every fixing of D, T and G0, the function f 0(y) := f(D + 0 T ∧ PADT (G ) + y) is also a {−1, 1}-product with the same parameters as f. Note that we always have k ≤ n and so ` ≤ log n. Hence it follows from Lemma 4.19 that the generator 0 0 G⊕Many fools f with error ε. Averaging over D, T and G shows that G fools f with error ε.

103 ` 0 k is small: Now suppose k ≤ 16 . For every fixing of G⊕Many, consider f (y) := f(y + 0 G⊕Many). Again, f is a {−1, 1}-product with the same parameters as f. In particular, it is a {−1, 1}-product with k functions with input length m. So, by our choice of δ and applying Theorem 4.10 recursively for 50 times, we have √ 0 0  p d −(d−m)/2 E[f (D + T ∧ U)] − E[f (U)] ≤ 50 · k · δ · 170 · m ln(ek) + 2   ≤ 50 · 2m · (170m/d)d + 2−Ω(m) ≤ 2−Ω(m) ≤ ε/2.

0 Next, we show that for every fixing of D and most choices of T , the function fD,T (y) := f 0(D + T ∧ y) is a {−1, 1}-product with k functions, k − 1 of which have input lengths ≤ `/2 and one has length ≤ m, which can be fooled by G0. Because the variables Ti are independent and each of them is δ-almost d-wise independent, for every subset I ⊆ [n] of size at most d, we have

50 Y −|I| 50 −50|I| Pr[T ∩ I = I] = Pr[Ti ∩ I = I] ≤ (2 + δ) ≤ (3/4) . i=1

Without loss of generality, we assume I1,...,Ik−1 are the subsets of size at most ` and Ik is the subset of size at most m. We now look at which subsets T ∩ Ii have length at most `/2 and which subsets do not. For the latter, we collect the indices in these subsets. Let G := {i ∈ [k − 1] : |T ∩ Ii| ≤ `/2}, B := {i ∈ [k − 1] : |T ∩ Ii| > `/2} and S BV := {j ∈ [n]: j ∈ i∈B(T ∩ Ii)}. We claim that with probability 1 − ε/2 over the choice of T , we have |BV | ≤ m. Note that the indices in BV either come from Ik, or Ii for i ∈ [k − 1]. For the first case, the probability that at least m/2 of the indices in Ik appear in BV is at most  |I |  k (3/4)−25m ≤ 2m · (3/4)−25m ≤ ε/4. m/2 S For the second case, note that if at least m/2 of the variables in i∈[k−1] Ii appear in BV , then they must appear in at least dm/2`e of the subsets T ∩I1,...,T ∩Ik−1. The probability of the former is at most the probability of the latter, which is at most

 k − 1 ` · dm/2`e (3/4)−25m ≤ 16`·(m/2`+1) · 2`·(m/2`+1) · (3/4)−25m ≤ ε/4, dm/2`e m/2 because k ≤ 16` and ` ≤ m. Hence with probability 1−ε/2 over the choice of T , the function 0 fD,T is a product g · h, where g is a product of |G| ≤ k − 1 functions of input length `/2, and h is a product of |B| + 1 functions defined on a total of |BV | ≤ m bits. Recall that k ≤ 16`, 0 0 0 0 so by our assumption G fools fD,T with error ε . Therefore G fools f with error ε + ε .

We obtain Theorem 4.5 by applying Lemma 4.18 repeatedly in different ways.

104 Proof of Theorem 4.5. Given a {−1, 1}-product f : {0, 1}n → {−1, 1} with k functions of input length `, we will apply Lemma 4.18 in stages. In each stage, we start with a {−1, 1}- product f with k1 functions, k1 − 1 of which have input lengths ≤ `1 = max{`, 2 log(n/ε)} 2`1+1 and one has length ≤ m := 5(` + log(n/ε)). Note that k1 ≤ 16 . Let C be the constant in Lemma 4.18. We apply Lemma 4.18 for t = O(log `1) times until f is restricted to a 0 {−1, 1}-product f with k2 functions, k2 − 1 of which have input lengths ≤ `2 and one has 2`2+1 r length ≤ m, where `2 = C log log(n/ε), k2 ≤ 16 ≤ (log(n/ε)) , and r := 8C + 4 is a constant. This uses a seed of length

t · O(` + log(n/ε))(log ` + log log(n/ε)) ≤ O(` + log(n/ε))(log ` + log log(n/ε))2 = O˜(` + log(n/ε)).

At the end of each stage, we repeat the above argument by grouping every dlog(n/ε)/`2e 0 functions of f that have input lengths ≤ `2 as one function of input length ≤ 2 log(n/ε), so 0 r−1 we can think of f as a {−1, 1}-product with k3 := k2/d`2/(log n)e ≤ (log(n/ε)) log log n functions, k3 − 1 of which have input lengths ≤ log(n/ε) and one has length ≤ m. Repeating above for r + 1 = O(1) stages, we are left with a {−1, 1}-product of two functions, one has input length ≤ C log log(n/ε), and one has length ≤ m, which can then be fooled by a 2−Ω(m)-biased distribution that can be sampled using O(`+log(n/ε)) bits [NN93]. So the total seed length is O(` + log(n/ε))(log ` + log log(n/ε))2 = O˜(` + log(n/ε)), and the error is (r + 1) · t · ε. Replacing ε with ε/(r + 1)t proves the theorem.

4.4 Level-d inequalities

In this section, we prove Lemma 4.9 that gives an upper bound on the dth level Fourier weight of a [0, 1]-valued function in L2-norm. We first restate the lemma. Lemma 4.9. Let g : {0, 1}n → [0, 1] be any function. For every positive integer d, we have

2 1/d d W2,d[g] ≤ 4 E[g] 2e ln(e/ E[g] ) . Our proof closely follows the argument in [Tal96].

n Claim 4.20. Let f : {0, 1} → R have Fourier degree at most d and kfk2 = 1. Let n d/2 g : {0, 1} → [0, 1] be any function. If t0 ≥ 2e , then

1−2/d d 2/d   − 2e t0 E g(x)|f(x)| ≤ E[g]t0 + 2et0 e . To prove this claim, we will use the following concentration inequality for functions with Fourier degree d from [DFKO07].

Theorem 4.21 (Lemma 2.2 in [DFKO07]). Let f : {0, 1}n → R have Fourier degree at most P ˆ2 d/2 d and assume that kfk2 := S fS = 1. Then for any t ≥ (2e) ,

  − d t2/d Pr |f| ≥ t ≤ e 2e .

105 − d t2/d We also need to bound above the integral of e 2e .

d/2 Claim 4.22. Let d be any positive integer. If t0 ≥ (2e) , then we have

Z ∞ d 2/d 1−2/d d 2/d − 2e t − 2e t0 e dt ≤ 2et0 e . t0

d 2/d Proof. First we apply the following change of variable to the integral. We set s = 2e t and obtain Z ∞ d/2−1 Z ∞ − d t2/d 2e d/2−1 −s e 2e dt = e s e ds, t0 d s0

d 2/d where s0 = 2e t0 . Define Z ∞ d−1 −s Γs0 (d) = s e ds. s0

(Note that when s0 = 0 then Γ0(d) is the Gamma function.) Using integration by parts, we have

d−1 −s0 Γs0 (d) = s0 e + (d − 1)Γs0 (d − 1). (4.5)

d−1 R ∞ −s d−1 −s0 Moreover, when d ≤ 1, we have Γs (d) ≤ s e ds = s e . 0 0 s0 0 d/2 Note that if t0 ≥ (2e) , then s0 ≥ d − 2. Hence, if we open the recursive definition of

Γs0 (d/2) in Equation (4.5), we have

d d 2 e−1 i −s0 X d/2−1−i Y Γs0 (d/2) ≤ e s0 (d/2 − j) i=0 j=1 d d e−1 2 i d/2−1 X d/2 − 1 ≤ e−s0 s 0 s i=0 0

−s0 d/2−1 ≤ 2e s0 ,

because the summation is a geometric sum with ratio at most 1/2. Substituting s0 with t0, we obtain

2ed/2−1 Z ∞ 2ed/2−1 d/2−1 −s −s0 d/2−1 e s e ds ≤ 2e e s0 d s0 d 1−2/d d 2/d − 2e t0 = 2et0 e .

106 R |f(x)| 1 R ∞ 1 Proof of Claim 4.20. We rewrite |f(x)| as 0 dt = 0 (|f(x)| ≥ t)dt and obtain hZ ∞ i 1 Ex∼{0,1}n [g(x)|f(x)|] = Ex∼{0,1}n g(x) (|f(x)| ≥ t)dt 0 hZ ∞ i  1 ≤ Ex∼{0,1}n min g(x), (|f(x)| ≥ t) dt 0 Z ∞ n o = min E[g], Pr[|f(x)| ≥ t] dt 0 x Z t0 Z ∞   ≤ E[g]dt + Pr |f(x)| ≥ t dt 0 t0 Z ∞ − d t2/d ≤ E[g]t0 + e 2e dt. t0

1−2/d d 2/d d/2 − t0 Since t0 ≥ (2e) , by Claim 4.22 this is at most E[g]t0 + 2et0 e 2e .

P ˆ ˆ P 2 −1/2 Proof of Lemma 4.9. Define f to be f(x) := |S|=d fSχS(x), where fS =g ˆS |T |=d gˆT . Note that kfk2 = 1, and we have

P 1/2 S gˆS E[g(x)χS(x)]  X 2  E[g(x)f(x)] = = gˆS . P 2 1/2 |T |=d gˆT |S|=d

1/d d/2 d/2 Let t0 = (2e ln(e/ E[g] )) ≥ (2e) . By Claim 4.20,

1/2  X  1−2/d d 2/d 2 − 2e t0 gˆS = E[g(x)f(x)] ≤ E[g(x)|f(x)|] ≤ E[g]t0 + 2et0 e . |S|=d

By our choice of t0, the second term is at most

 d/2 d/2 1−2/d d 2/d  e  [g]  e  − 2e t0 E d/2 2et0 e ≤ 2e ln ≤ (2/e) E[g] ln , E[g]1/d ed E[g]1/d which is no greater than the first term. So

1/2  X 2  1/d d/2 gˆS ≤ 2 E[g] 2e ln(e/ E[g] ) , |S|=d and the lemma follows.

107 108 Chapter 5

Some Limitations of the Sum of Small-Bias Distributions

Small-bias distributions, introduced by Naor and Naor [NN93], cf. [ABN+92, AGHP92, BATS13, TS17], are distributions that look balanced to parity functions over {0, 1}n.

Definition 5.1. A distribution D over {0, 1}n is ε-biased if for every nonempty subset I ⊆ [n], we have P x  i∈I i  Ex∼D (−1) ≤ ε . An ε-biased distribution can be generated using a seed of O(log(n/ε)) bits. Since their introduction, small-bias distributions have become a fundamental object in theoretical com- puter science and have found their uses in many areas including derandomization and algo- rithm design. In the last decade or so researchers have considered the sum (i.e. , bitwise XOR) of several independent copies of small-bias distributions. The first paper to explicitly consider it is [BV10a]. This distribution appears to be significantly more powerful than a single small- bias copy, while retaining a modest seed length. In particular, two main questions have been asked:

Question 5.2 (RL). Reingold and Vadhan (personal communication) asked whether there exists a constant c such that the sum of two independent copies of any n−c-biased distribution fools one-way logarithmic space, a. k. a. one-way polynomial-size branching programs, which would imply RL = L. It is known that a small-bias distribution fools one-way width-2 branching programs (Saks and Zuckerman, see also [BDVY13] where a generalization is obtained). No such result is known for width-3 programs.

Question 5.3 (Polynomials). The papers [BV10a, Lov09, Vio09c] show that the sum of d small-bias generators fools F2-polynomials of degree d. However, the proofs only apply when d ≤ (1 − Ω(1)) log n. It is an open question whether the construction works for larger d. If the construction worked for any d = logO(1) n, it would make progress on long-standing open problems in circuit complexity regarding AC0 with parity gates [Raz87]. This question

109 is implicit in the works [BV10a, Lov09, Vio09c] and explicit in [Vio09b, Chapter 1] (Open question 4), cf. the survey [Vio09b, Chapter 1].

In terms of negative results, Meka and Zuckerman [MZ09] show that the sum of 2 dis-

tributions with constant bias does not fool mod 3 linear√ functions. Bogdanov, Dvir, Verbin, and Yehudayoff [BDVY13] show that for ε = 2−O( n/k), the sum of k copies of ε-biased distributions does not fool circuits of size poly(n) and depth O(log2 n)(NC2). This chapter gives two different approaches to improving on both these results and obtain other limitations of the sum of small-bias distributions. One is based on the complexity of decoding, and the other one on bounding the mod 3 rank (see Definition 5.7). Either approach is a candidate to answer negatively the “RL question” (Question 5.2).

5.1 Our results

The following theorem states our main counterexamples. We denote by D + D the bitwise XOR of two independent copies of a distribution D.

Theorem 5.4. For any c, there exists an explicit ε-biased distribution D over {0, 1}n and an explicit function f, such that f(D + D) = 0 and Prx∼{0,1}n [f(x) = 0] ≤ p, where ε, f, p are of any one of the following choices: i. ε = 2−Ω(n), f is a uniform poly(n)-size circuit, and p = 2−Ω(n); ii. ε = 2−Ω(n/ log n), f is a uniform fan-in 2, poly(n)-size circuit of depth O(log2 n), and p = 2−n/4; iii. ε = 1/nc, f is a one-way O(c log n)-space algorithm, and p = O(1/nc); iv. ε = n−Ω(1), f is a mod 3 linear function, and p = 1/2. Moreover, all our results extend to more copies of D as follows. The input D + D to f can be replaced by the bitwise XOR of k independent copies of D if ε is replaced by ε2/k, where k is at most the following quantities corresponding to the above items: i. n/60; ii. n/6 log n; iii. 2c; iv. O(log n/ log log n).

Theorem 5.4.i is tight up to the constant in the exponent because every ε2−n-biased distribution is ε-close to uniform. Theorem 5.4.ii would also be true with ε = 2−Ω(n), if a decoder for certain algebraic- 2 geometric codes√ ran in NC , which we conjecture it does. [BDVY13] prove Theorem 5.4.ii with ε = 2−O( n/k). Theorem 5.4.iii can also be obtained in the following way, pointed out to us by Chen and Zuckerman (personal communication). Since one can distinguish a set of size s from uniform with a width s + 1 branching program, and there exist ε-bias distributions with support size O(n/ε2), the sum of two such distributions can be distinguished from uniform in space O(c log n) when ε = n−c. Actually, both their proof and ours (presented later) apply to c > 0.01; but for smaller c Theorem 5.4.iv kicks in.

110 Theorems 5.4.iii and 5.4.iv come close to addressing the “RL question,” without an- swering it: Theorem 5.4.iv shows that polynomial bias is necessary even for width-3 regular branching programs, while Theorem 5.4.iii shows that the bias is at least polynomial in the width. [MZ09] prove Theorem 5.4.iv with ε = Ω(1). We have not been able to say anything on the “Polynomials question” (Question 5.3). There exist other models of interest. For read-once DNFs no counterexample with large error is possible because Chari, Rohatgi, and Srinivasan [CRS00], building on [EGL+98], show that (just one) n−O(log(1/δ))-bias distribution fools any read-once DNF on n variables with error δ, cf. SectionA. The [CRS00] result is rediscovered by De, Etesami, Trevisan, and Tulsiani [DETT10], who also show that it is essentially tight by constructing a distribution which is n−Ω(log(1/δ)/ log log(1/δ))-biased yet does not δ-fool a read-once DNF. In particular, fooling with polynomial error requires super-polynomial bias. It would be interesting to know whether the XOR of two copies overcomes this limitation, i.e. , if it δ-fools any read-once DNF on n variables provided each copy has bias poly(δ/n). If true, this would give a generator with seed length O(log(n/δ)), which is open. We are unable to resolve this for read-once DNFs. However, we show that the cor- responding result for general DNFs would resolve long-standing problems on circuit lower bounds [Val77]. This can be interpreted as saying that such a result for DNFs is either false or extremely hard to prove. We also get conditional counterexamples for depth-3 and AC0 circuits. Theorem 5.5. Suppose polynomial time (P) has fan-in 2 circuits of linear size and loga- rithmic depth. Then Theorem 5.4 also applies to the following choices of parameters: i. ε = n−ω(1), f is a depth-3 circuit of size no(1) and unbounded fan-in, and p = n−ω(1). ii. ε = n−ω(1), f is a DNF formula of size poly(n), and p = 1 − 1/no(1). Moreover, all our results extend to more copies of D as follows. The input D + D to f can be replaced by the bitwise XOR of k ≤ log n independent copies of D if ε is replaced by ε2/k. Recall that it is still open whether NP has linear-size circuits of logarithmic depth. Theorem 5.6. Suppose for every δ > 0 there exists a constant d such that NC2 has AC0 circuits of size 2nδ and depth d. Then Theorem 5.4 also applies to the following choice of parameters: ε = n− logc n, f is an AC0 circuit of size nO(c) and depth O(c) , and p = n− logΩ(1) n/4. Moreover, our result extends to more copies of D as follows. The input D + D to f can be replaced by the bitwise XOR of k ≤ logc+1 n/6(c + 1) log log n independent copies of D if ε is replaced by ε2/k. Recall that the assumption in Theorem 5.6 holds for NC1 instead of NC2, in fact it holds even for log-space. Moreover, the parameters in the conclusion of Theorem 5.6 are tight in the sense that n−(log n)O(d) bias fools AC0 circuits of size nd and depth d, as shown in the sequence of works [Baz09, Raz09, Bra10, Tal17]. All the above results except Theorem 5.4.iv are based on a new, simple connection be- tween small-bias generators and error-correcting codes, discussed in Section 5.2.

111 Definition 5.7. Let S ⊆ Fp be a set of vectors. Define the mod p rank of S, denoted by rankp(S) to be the rank of S over Fp. We define the mod p rank of a distribution D to be the mod p rank of its support.

Definition 5.8. The correlation of two functions f, g : {0, 1}n → {0, 1} is

f(x)+g(x) Cor(f, g) := Ex∼{0,1}n [(−1) ] .

Theorem 5.4.iv instead follows [MZ09] and bounds the mod 3 rank of small-bias dis- tributions. It turns out an upper bound on the mod 3 rank of some k-wise independent distributions over bits would allow us to reduce the bias in Theorem 5.4.iv, assuming long- standing conjectures on correlation bounds for low-degree polynomials (which may be taken as standard).

Claim 5.9. Suppose −Ω(k) 1. the parity of k copies of mod 3 on√ disjoint inputs of length m has correlation 2 with any F2-polynomial of degree ε m for some constant ε > 0, and 2. for every c, there exists an n−c-almost c log n-wise independent distribution whose sup- n n n 0.49 port on {0, 1} ⊆ F3 = {0, 1, 2} has mod 3 rank at most n . Then the “RL question” has a negative answer, i.e. , for every c, there exists an n−c-biased distribution D such that D + D does not fool a one-way O(log n)-space algorithm. More specifically, D + D does not fool a mod 3 linear function.

Contrapositively, an affirmative answer to the “RL question,” even for permutation, width-3 branching programs, implies lower bounds on the mod 3 rank of k-wise independent distributions, or that the aforementioned correlation bounds are false. What we know about the second assumption in Claim 5.9 is in Section 5.4, where we initiate a systematic study of the mod 3 rank of (almost) k-wise independent distributions, and obtain the following lower and upper bounds. First, we give an Ω(k log n) lower bound on the mod 3 rank for almost k-wise independent distributions, specifically, distributions such that any k coordinates are 1/10 close to being uniform over {0, 1}k (Claim 5.25). This also gives an exponential separation between mod 3 rank and seed length for such distributions. We then prove the following upper bounds, see Claim 5.33.

Theorem 5.10. For infinitely many n, there exist k-wise independent distributions over {0, 1}n with mod 3 rank d for k = 2 and d ≤ n0.73.

We note that an upper bound of n − 1 on the mod 3 rank of a k-wise independent distribution implies that the distribution is constant on a mod 3 linear test. We ask what is the largest k∗ = k∗(n) such that there exists a k-wise independent distribution with mod 3 rank ≤ n − 1. We conjectured the bound k∗(n) = Ω(n). Partial progress towards this conjecture appeared in a preliminary version of this chapter [LV]. This conjecture was later verified [BHLV18].

112 5.2 Our techniques

All our counterexamples in Theorem 5.4 and 5.5, except Theorem 5.4.iv, come from a new connection between small-bias distributions and linear codes, which we now explain. Let C ⊆ Fn be a linear error correcting code over a finite field of characteristic 2. (Using characteristic 2 allows us to work with small-bias over bits, as opposed to large alphabets, which makes things slightly simpler.) We also use C to denote the uniform distribution over the code C. It is well-known that if C⊥ has minimum distance d⊥, then C is (d⊥ − 1)-wise independent. n Define Ne to be the “noise” distribution over F obtained by repeating the following process e times: Pick a uniformly random position from [n], and set it to a uniform symbol in F. Now, define De to be the distribution on n log|F| bits obtained from adding Ne to C, and we have the following fact.

⊥ e Fact 5.11. De is (1 − d /n) -biased.

⊥ ⊥ Proof. If a test is on less than d field elements, De has zero bias because it is (d − 1)-wise independent. Otherwise, the bias is nonzero only if none of the symbols touched by the test are hit by random noise, which happens with probability (1 − d⊥/n)e. Our main observation is that the XOR of two noisy codewords is also a noisy codeword, with the number of errors injected to the codeword doubled. That is,

De + De = C + Ne + C + Ne = C + N2e = D2e .

Definition 5.12. An algorithm is a threshold-e discriminator for the code C if it decides whether a string is within Hamming distance e of the code.

Now suppose an algorithm is a threshold-2e discriminator for C. Then it can be used to distinguish De + De from uniform. More generally, if an algorithm is a threshold-ke discriminator for C, then it can distinguish the XOR of k independent copies of De from uniform. Contrapositively, if De + De fools f, then f is not a threshold-2e discriminator for C. Thus, to obtain counterexamples we only have to exhibit an appropriate threshold discriminator. We achieve this by drawing from results in coding theory. This is explained below after two remarks.

Remark 5.13. Our threshold discriminator is only required to tell apart noisy codewords and uniform random strings. This is a weaker condition than decoding. In fact, similar threshold discriminators have been considered in the context of tolerant property testing [GR05, KS09, RU10], where tolerant testers are designed to decide if the input is close to being a codeword or far from every codeword, by looking at as few positions of the input as possible.

Remark 5.14. We note that our connection between ε-bias distributions and linear codes is different from the well-known connection in [NN93], which shows that for a binary linear

113 code with relative minimum and maximum distance ≥ 1/2 − ε and ≤ 1/2 + ε, respectively, the columns of its k × n generator matrix form the support of an ε-biased distribution over {0, 1}k. However, the connection to codes is lost once we consider the sum of the same distributions. In contrast, the sum of our distributions bears the code structure of a single copy.

As hinted before Fact 5.11, the small-bias property is established through a case analysis based on the weight of the test. This paradigm goes back at least to the original work by Naor and Naor [NN93]. It was used again more recently in [MST06, ABR16]. Our reasoning is especially close to [MST06, ABR16] because in both papers small tests are handled by local independence but large tests by sum of independent biased bits. For general circuits (Theorem 5.4.i), we consider the asymptotically good binary linear code with constant dual relative distance, based on algebraic geometry and exhibited by Gu- ruswami in [Shp09]. We conjecture that the corresponding threshold discriminator can be implemented in NC2. wever, we are unable to verify this. Instead, for NC2 circuits (The- orem 5.4.ii), we use Reed–Solomon codes and the Peterson–Gorenstein–Zierler syndrome- decoding algorithm [Pet60, GZ61] which we note is in NC2. Under the assumption that NC2 is contained in AC0 circuits of size 2nδ , by scaling the NC2 result down to polylog n bits followed by a depth reduction, we obtain our results for AC0 circuits (Theorem 5.6). This result could also be obtained by scaling down a result in [BDVY13]. Our counterexample for one-way log-space computation (Theorem 5.4.iii) also uses Reed– Solomon codes. The threshold discriminator is simply syndrome decoding: To decode from e errors it can be realized by computing the syndrome in a one-way fashion using space O(e log q), where q is the size of the underlying field of the code. For a given constant c, setting q = n, message length k = d⊥ − 1 = n − O(c), and e = O(c) we obtain a one-way space O(c log n) distinguisher for the sum of two distributions with bias n−c. Naturally, one might try to eliminate the dependence on c in the O(c log n) space bound with a different choice of e and q, which would answer the “RL question” in the negative. In Claim 5.16 however we show that to obtain n−c bias, the space O(e log q) for syndrome decoding must be of Ω(c log n), regardless of the code and the alphabet. Thus our result is the best possible that can be obtained using syndrome decoding. We raise the question of whether syndrome decoding is optimal for one-way decoding in this setting of parameters, and specifically if it is possible to devise a one-way decoding algorithm using space o(e log q). There do exist alternative one-way decoding algorithms, cf. [RU10], but apparently not for our setting of parameters of e = O(1) and k = n − O(1). Our conditional result for depth-3 circuits and DNF formulas (Theorem 5.5) follows from scaling down to barely superlogarithmic input length, and a depth reduction [Val77] (cf. [Vio09b, Chapter 3]) of the counterexample for general circuits (Theorem 5.4.i). We note that the 2−Ω(n)-bias in Theorem 5.4.i is essential for this result, in the sense that 2−n/ log n- bias would be insufficient to obtain Theorem 5.5. We also remark that since O(log2 n)- wise independence suffices to fool DNF formulas [Baz09], one must consider linear codes 2 with dual distance less than log n in our construction, and so De has bias at least (1 − log2 n/n)e = 2−O(log2 n). On the other hand, [DETT10] shows that 2−O(log2 n log log n)-bias fools

114 DNF formulas. The connection between codes and small-bias distributions motivate us to study further the complexity of decoding. [Vio06, Chapter 6] and [SV10], cf. [Vio06, Chapter 6], show that list-decoding requires computing the majority function. In Claim 5.35 we extend their ideas and prove that the same requirement holds even for decoding up to half of the minimum distance. This gives some new results for AC0 and for branching programs. Finally, since logO(1) n-wise independence fools AC0 [Bra10, Tal17], we obtain that AC0 cannot distinguish a codeword from a code with logΩ(1) n dual distance from uniform random strings. This also gives some explanation of why scaling is necessary to obtain Theorem 5.6 from Theorem 5.4.ii.

A different approach. We now explain the high-level ideas in proving Theorem 5.4.iv. Meka and Zuckerman [MZ09] construct the following constant-bias distribution D over n := d √ 5 bits with mod 3 rank less than n. Each output bit is the square of the mod 3 sum of 5 out of the d uniform random bits, which can be written as a degree-5 polynomial over d F2. Since any parity of the output bits is also a degree-5 polynomial over {0, 1} , D has constant bias. To show that a mod 3 linear function is always 0 on the support√ of D + D, they observe that for sufficiently large n, D has mod 3 rank at most d2 < n, and D + D has mod 3 rank at most (d2)2 + d2 = d4 + d2 < n. (See Fact 5.18.) We extend their construction using ideas from the Nisan generator [Nis91]: We pick a pseudo-design consisting of n sets where each set has size nβ (we will choose β to be a small constant), and the intersection of any two sets has size O(log n). Such pseudo-design exists provided the universe has size n2β. The output distribution is again the square of the mod 3 sum on each set. For any test of size at least C log n bits, let J be any C log n bits of the test. We fix the intersections of their corresponding sets in the universe to make them independent. After we do this, every bit in J is still a mod 3 function on nβ − |J| log n ≥ 0.9nβ bits. We further fix every bit outside the |J| sets in the universe. This will not affect the bits in J. Now consider any bit b in the test that is not in J, it corresponds to a set which has intersection at most log n with each of the sets that correspond to the bits in J. Thus, b is now a mod 3 function on at most |J| log n = log2 n input bits and thus can be written 2 as a degree-log n polynomial over F3. Hence, the parity of the bits outside J is also an F2-polynomial of the same degree, and we call this polynomial p. Now observe that the bias of the test equals to the correlation between the parity of the bits in J and p. Since each bit in J is a mod 3 function on nβ bits, by Smolensky’s theorem [Smo87], it has constant correlation with p. In Lemma 5.23 we prove a variant of Impagliazzo’s XOR lemma [Imp95] to show that the XOR of log n independent such bits makes the correlation drop from constant to ε = n−β/4. This variant of XOR lemma may be folklore, but we are not aware of any reference. This handles tests of size at least C log n. For smaller tests, the above distribution could have constant bias, and hence we XOR it with an 1/nΩ(1)-almost C log n-wise independent distribution, which gives us ε bias for tests of size less than C log n and has sufficiently√ small rank. We then show that the XOR of the two distributions has rank less than n and

115 conclude as in the previous paragraph. We refer the reader to [VW08] for background on XOR lemmas.

Organization. In Section 5.3 we describe our counterexamples and prove Theorem 5.4 and 5.5, and Claim 5.9. In Section 5.4 we prove our lower bounds and upper bounds on the mod 3 rank of k-wise independence. As a bonus, in Section 5.5 we include some results on the complexity of decoding. For example, we show that for codes with large minimum distance, AC0 circuits and read-once branching programs cannot decode when the number of errors is close to half of the minimum distance of a code (Claim 5.35). We obtained these results while attempting to devise low-complexity algorithms that can decode (which, by our connection, would have consequences for the sum of small-bias generators).

5.3 Our counterexamples

We are now ready to prove Theorem 5.4 and 5.5, and Claim 5.9. We consider linear codes with different parameters, the bias of D follows from Fact 5.11. Then we present our distin- guishers. In the end, we explain how our results hold for k copies instead of 2.

5.3.1 General circuits Venkatesan Guruswami [Shp09] exhibits the following family of constant-rate binary linear codes whose primal and dual relative minimum distance are both constant.

Theorem 5.15 (Theorem 4 in [Shp09]). For infinitely many n, there exists a binary linear code C with block length n and dimension n/2, which can be constructed, encoded, and decoded from n/60 errors in time poly(n). Moreover, the dual of C has minimum distance at least n/30.

Proof of Theorem 5.4.i. Applying Fact 5.11 with e = n/120 to the code in Theorem 5.15, we obtain a distribution D that is 2−n/3600-biased. Our threshold-2e discriminator f for the code C decodes and encodes the input, and accepts if and only if the input and the re-encoded string differ by at most 2e positions. Since both the encoding and decoding algorithms run in polynomial time, so does f. Note that f accepts at most

2e X n 2n/2 · ≤ 2n/2 · 2nH(1/60) ≤ 20.75n i i=0

possible strings, where H(·) is the binary entropy function (cf. [CT06, Example 11.1.3] for the first inequality). Hence, f distinguishes D + D from the uniform distribution with probability at least 1 − 2−0.25n.

116 5.3.2 NC2 circuits

Proof of Theorem 5.4.ii. Let q be a power of 2. Consider the Reed–Solomon code C over Fq with block length q − 1, dimension q/2 and minimum distance q/2. C has dual minimum distance q/2 + 1 and can decode from q/4 errors. Applying Fact 5.11 to C with e = q/12, we obtain a distribution D over n := (q − 1) log q bits that is 2−Ω(n/ log n)-biased. Let α be a primitive element of Fq. Let H be a parity-check matrix for C. We first recall the Peterson–Gorenstein–Zierler syndrome-decoding algorithm [Pet60, GZ61]. T Given a corrupted codeword y, let (s1, . . . , sq/2) := Hy be the syndrome of y. Suppose y has v < q/2 errors. Let E denote the set of its corrupted positions. Let

v Y i X i Λv(x) := (1 − α x) = 1 + λix i∈E i=1 be the error locator polynomial. The syndromes and the coefficients of Λv are linearly related by λvsj−v + λv−1sj−v+1 + ··· + λ1sj−1 + sj = 0 ,

for j > v. This forms a linear system with unknowns λi. The algorithm decodes by at- tempting to solve the corresponding linear systems with v errors, where v ranges from 2e to 1. Note that the system has a unique solution if and only if y and some codeword differ by exactly v positions, for some v between 1 and 2e. Thus, f computes the determinants of the 2e < q/4 systems and accepts if and only if one of them is nonzero. Since computing determinant is in NC2 [Ber84], f can be computed by an NC2 circuit. The system always has a solution when inputs are under D + D and so f always accepts. On the other hand, f accepts at most

2e X q − 1 qq/2 · (q − 1)i ≤ qq/2 · 2qhq(1/6) ≤ 22n/3+o(n) i i=0 possible strings, where

hq(x) := x logq(q − 1) − x logq x − (1 − x) logq(1 − x) is the q-ary entropy function. Therefore, f distinguishes D+D from the uniform distribution with probability at least 1 − 2−n/4.

5.3.3 One-way log-space computation

Proof of Theorem 5.4.iii. Let q be a power of 2. Consider the [q − 1, q − 6c, 6c]2log q Reed– Solomon code C over F2log q , which has dual minimum distance q − 6c + 1 and can decode from 3c errors. Applying Fact 5.11 to C with e = c, we obtain a distribution D over n := (q − 1) log q bits that is O(c log n/n)c-biased.

117 q Let H be a parity-check matrix of C. On input y ∈ F2log q , our distinguisher f computes s2e+1, . . . , s4e from the syndrome s := Hy. Clearly this can be implemented in one-pass and space (2e + O(1)) log q. Finally, using the Peterson–Gorenstein–Zierler syndrome-decoding algorithm, f accepts if and only if y differs from a codeword of C by at most 2e positions. Since f accepts at most

2e X q − 1 qq−6c · (q − 1)i ≤ qq−6c · 2q4c ≤ O(qq−2c) i i=0

strings, f distinguishes D + D from uniform with probability 1 − O(log n/n)2c. Computing the input for syndrome decoding requires space (2e + O(1)) log q. We now show that in order to obtain n−c bias via our construction, we always have 2e log q = Ω(c log n). Thus, one cannot answer the “RL question” in the negative via syndrome de- coding.

Claim 5.16. For every q ≥ n + 1, let C be an [n, k, d] code over Fq which decodes from e errors, and d⊥ be its dual minimum distance. If C satisfies (1−d⊥/n)e < q−c for sufficiently large c, then we have e log q = Ω(c log n).

Proof. If d⊥ > (1−1/q)n, then by the Plotkin bound on the dual code, n−k = O(1). By the Singleton bound, e ≤ d ≤ n−k and so we have e = O(1). Hence, (1−d⊥/n)e = (1/q)e ≥ q−c for sufficiently large c, and therefore the condition is not satisfied. On the other hand, suppose d⊥ ≤ (1 − 1/q)n. Then (1 − d⊥/n)e ≥ (1/q)e. The condition (1 − d⊥/n)e < q−c implies e log q > c log q > c log n.

5.3.4 Depth 3 circuits, DNF formulas and AC0 circuits Proof of Theorem 5.5. We will use Valiant’s depth reduction [Val77, Val83] (cf. [Vio09b, Theorem 25]). Theorem 5.17 ([Val77, Val83]). Let C : {0, 1}n → {0, 1} be a circuit of size cn, depth c log n and fan-in 2. The function computed by C can also be computed by an unbounded c0n/ log log n fan-in circuit of size 2 and depth 3 with inputs x1, x2, . . . , xn, x1, x2,..., xn, where c0 depends only on c. By the assumption that P has fan-in 2 circuits of linear size and logarithmic depth and the fact that f in Theorem 5.4.i is in P, we can apply Theorem 5.17 to f and obtain an unbounded fan-in depth-3 circuit f 0 of size 2O(n/ log log n) that computes the same function. Then we scale down n to n0 = log n log log log log n bits (we set the rest of the n − n0 bits −ω(1) 0 o(1) uniformly at random) to get an n -biased distribution Dn0 and a circuit fn0 of size n −ω(1) and depth 3 that distinguishes Dn0 + Dn0 from uniform with probability at least 1 − n . This proves Theorem 5.5.i. 0 To prove Theorem 5.5.ii, note that fn0 accepts with probability 1 under Dn0 + Dn0 and 0 without loss of generality we can assume fn0 is an AND-OR-AND circuit. Hence, it contains

118 00 00 00 a DNF f such that (1) f accepts under Dn0 + Dn0 with probability 1, and (2) f rejects with probability at least 1/2no(1) under the uniform distribution. Proof of Theorem 5.6. Let D and f be the distribution and distinguisher in Theorem 5.4.ii, respectively. Let Dn0 and fn0 be the scaled distribution and distinguisher of D and f on n0 = logc+1 n bits, respectively. (We set the rest of the n − n0 bits uniformly at random.) −Ω(n0/log n0) −Ω(logc n) 0 Dn0 has bias 2 = n . By our assumption, fn0 is in AC and distinguishes − logc n/4 Dn0 + Dn0 from uniform with probability 1 − n .

5.3.5 Mod 3 linear functions Recall the definition of mod p rank in Definition 5.7.

n 2 Fact 5.18 (Lemma 7.1 and 7.2 in [MZ09]). Let S,T be two sets of vectors in F3 . Define S to be the set {x ×3 x : x ∈ S}, where x ×3 y denote the pointwise product of two vectors x and y (over F3). Then 2 2 (1) rank3(S ) ≤ rank3(S) ; n n (2) when S and T are subsets in {0, 1} ⊆ F3 ,

rank3(S +2 T ) ≤ rank3(S) + rank3(T ) + rank3(S) rank3(T ) .

2 If S = T , then rank3(S +2 S) ≤ rank3(S) + rank3(S) .

Proof. Let dS := rank3(S) and dT := rank3(T ). Let {β1, . . . , βdS } be a basis of S and

{γ1, . . . , γdT } be a basis of T . Let

d d XS XT x = ciβi ∈ S and y = rjγj ∈ T i=1 j=1

be any two vectors. We have X x ×3 x = cicj(βi ×3 βj) .

i,j∈[dS ]

2 Thus {βi ×3 βj}i,j∈[dS ] spans S , proving (1). For (2), observe that for any a, b ∈ {0, 1} ⊆ F3, we have a +2 b = a +3 b +3 a ×3 b. Hence we have

d d XS XT X x +3 y +3 x ×3 y = ciβi + rjγj + cirj(βi ×3 γj) ,

i=1 j=1 i∈[dS ],j∈[dT ]

and thus

{βi}i∈[dS ] ∪ {γj}j∈[dT ] ∪ {βi ×3 γj}i∈[dS ],j∈[dT ]

spans S +2 T .

119 The following lemma is well-known (cf. [Nis91]). We include a proof here for completeness.

Lemma 5.19. There exists a pseudo-design (S1,...,Sn) over the universe [d] such that

1. |Si| = t for every i ∈ [d], and

2. |Si ∩ Sj| ≤ tˆ for every i 6= j ∈ [d], where d = n2β, t = nβ, and tˆ= log n for any β < 0.5. We will use the following Chernoff bound in the proof.

Claim 5.20 (Chernoff bound). Let X1,...,Xn ∈ {0, 1} be n independent and identically distributed variables with E[Xi] = µ for each i. We have " n # 2 X −t /4n Pr Xi − µn ≥ t ≤ e . i=1

Proof of Lemma 5.19. It suffices to show that given S1,...,Si−1, there exists a set S such that |S| ≥ t and |S ∩ Sj| ≤ tˆ for j < i. Consider picking each element in [d] to be in S with probability p = 0.1 log n/nβ. We have E[|S|] = pd ≥ 2nβ. By the Chernoff bound, Pr[|S| < t = nβ] ≤ 2−nβ /4 < 1/2 .

We also have E[|S ∩ Sj|] = pt = 0.1 log n. Again by the Chernoff bound, −4 log n Pr[|S ∩ Sj| > tˆ= log n] ≤ 2 < 1/2n . It follows by a union bound that with nonzero probability there is an S which satisfies the two conditions above. Proof of Theorem 5.4.iv. Let α < 1/36 and β = 4α. Also let d, t, tˆ be the parameters and d S1,...,Sn be the pseudo-design specified in Lemma 5.19. Define the function L: {0, 1} → n {0, 1} whose i-th output bit yi equals 2 2 X  mod3(xSi ) := xj mod 3 . j∈Si

Let T1 be the image set of L. Without the square, this set has mod 3 rank d and so 2 16α by Fact 5.18, rank3(T1) = O(d ) = O(n ). Let T2 be an ε-almost k-wise independent set, where ε = 1/nα and k = 2 log n. Known constructions [AGHP92, Theorem 2] (see 2 also [NN93]) produce such a set of size O((k log n)/ε) and therefore rank3(T2) is at most O(n2α log4 n). 18α 4 Consider the set T := T1 +2 T2. By Fact 5.18, T has rank at most O(n log n). By 36α 8 the same fact, T +2 T has rank at most O(n log n) < n because α < 1/36. Therefore, there is a non-zero mod 3 linear function ` such that `(y) ≡ 0 (mod 3) for any y ∈ T , while Pr[`(y) = 0] ≤ 1/2 for a uniform y in {0, 1}n. It remains to show that T is O(1/n0.99α)- biased. For any test on I ⊆ [n], we consider the cases (1) when |I| ≤ k, and (2) when |I| > k separately. Write y = y1 + y2, where y1 ∈ T1 and y2 ∈ T2. Case (1) follows from the fact that T2 is 1/nα-almost k-wise independent. Case (2) follows from the following claim.

120 0.99α Claim 5.21. For any |I| > k, we have |Ey1∈T1 [χI (y1)]| ≤ O(1/n ), where χI (z) := P z (−1) i∈I i . Proof. Pick a subset J ⊆ I of size k. Define f, p: {0, 1}n → {0, 1} to be

X 2 X 2 f(x) := mod3(xSi ) and p(x) := mod3(xSi ) , i∈J i∈I\J respectively. Observe that (f(x)+p(x)) Exi:i∈[d][χI (y1)] = Exi:i∈[d][(−1) ] , which is the correlation between f and p. Consider the sets Sj ⊆ [d] with j ∈ J. Let B1 be the set of indices appearing in their pairwise intersections. That is, B1 := {` ∈ [d]: ` ∈ Si ∩Sj for some distinct i, j ∈ J}. Fixing 2 β ˆ β the value of every x` ∈ B1, each mod 3(xSj ) in f becomes a function on m := n −t·k ≥ 0.9n bits. Let B2 be the set of indices in [d] outside the Sj for j ∈ J. The bits in B2 do not 2 affect the outputs in J. Fixing their values, each mod3(xSj ) in p is a function of at most ˆ 2 2 t·k = O(log n) bits and so can be written as a polynomial of degree O(log n) over F2. Since 2 2 p is a parity of values mod3(xSj ), it can also be written as a polynomial of degree O(log n) over F2. We will use the following theorem by Smolensky [Smo87] (cf. [Vio09b, Chapter 1]). The proof in [Vio09b] has the condition that n is divisible by 3. This condition can be removed. For example, when n = 3` + 1, we can set a random bit of the uniform distribution to zero. This distribution is close to uniform, but now we can apply [Vio09b] as stated. Theorem 5.22 ([Smo87]). There exists an absolute constant ε > 0 such that for every n n √ that is divisible by 3 and for every polynomial p: F2 → F2 of degree at most ε n, we have  mod2(x)+p(x) Ex∼{0,1}n (−1) 3 ≤ 0.9 .

To build intuition, note that after fixing the input bits in B1 and B2, for each of the 2 mod3(xSj ) in f, by Theorem 5.22 we have

h (mod2(x )+p(x))i (−1) 3 Sj ≤ 1 − Ω(1) . Exi:i∈[d] In the following lemma we prove a variant of Impagliazzo’s XOR Lemma [Imp95] to show that (−1)(f(x)+p(x)) ≤ O(1/m0.249) = O(1/n0.99α) . Exi:i∈[d]

Averaging over the values of the xk in B1 and B2 finishes the proof. Lemma 5.23. Let k = 2 log m, define f : {0, 1}m×k → {0, 1} by

(1) (k) 2 (1) 2 (k) f(x , . . . , x ) := mod3(x ) +2 ··· +2 mod3(x ) . Let p: {0, 1}m×k → {0, 1} be any polynomial of degree O(log2 m). We have  f(x)+p(x) 0.249 Cor(f, p) := Ex∼{0,1}m×k (−1) ≤ O(1/m ) .

121 Proof. We will use the fact that Theorem 5.22 holds for degree nΩ(1) polynomials to get correlation 1/nΩ(1) for polynomials of degree polylog(n). As in the proof in [Imp95] we first show the existence of a measure M : {0, 1}m → [0, 1] P m of size at most |M| := x M(x) = 2 /4 such that with respect to its induced distribution 2 0.249 D(x) := M(x)/|M|, the function mod3 is 1/2m -hard for any polynomial p of degree O(log2 m), i.e. , 2 0.249 Pr [mod3(x) = p(x)] ≤ 1/2 + 1/4m . x∼D Suppose not. Lemma 1 in [Imp95] implies that one can obtain a function q by taking the majority of O(m0.498) polynomials of degree O(log2 m) such that

2 Pr [mod3(x) = q(x)] > 0.99 . x∼{0,1}m

Note that q can be represented as a degree O(m0.498 log2 m) polynomial. By Theorem 5.22,

2 Pr [mod3(x) = p(x)] ≤ 0.95 x∼{0,1}m for any degree εm1/2 polynomial p, a contradiction. m m 2 0.249 Now we show that there is a set S ⊆ {0, 1} of size 2 /8 such that mod3 is 1/m - hard-core on S for any polynomial p of degree O(log2 m), i.e. ,

3 0.249 Pr [mod2(x) = p(x)] ≤ 1/2 + 1/2m . x∼S Let p be any degree-O(log2 m) polynomial. For any measure M : {0, 1}m → [0, 1], define Advp(M) by X mod2(x)+p(x) Advp(M) := M(x)(−1) 3 . x We construct S probabilistically by picking each x to be in S with probability M(x). Let 0.249 MS be the indicator function of S. Then ES[Advp(MS)] = Advp(M) ≤ |M|/(2m ). Note m that Advp(MS) is the sum of 2 independent random variables, where each variable is over [−1, 0] or [0, 1]. By Hoeffding’s inequality,

0.249 −2|M|2/(2m·4m0.498) −2m/32m0.498 Pr[Advp(MS) > |M|/m ] ≤ 2 = 2 . S

2 mO(log m) 2 m Note that there are 2 polynomials of degree log m. Moreover, since ES[|S|] = 2 /4, m again by Hoeffding’s inequality, PrS[|S| < 2 /8] ≤ 1/2. Hence, by a union bound, the required S exists. It follows that there exists a set of inputs S ⊆ {0, 1}m of size at most m 2 0.249 2 2 /8 such that mod3 is 1/m -hard-core on S for any polynomial of degree O(log m). Now we apply the following lemma, which is stated in [Imp95] for circuits, but the same proof applies to polynomials. Lemma 5.24 (Lemma 4 in [Imp95]). If g is ε-hard-core for some set of δ2n inputs for (1) (k) (1) (k) polynomials of degree d, then the function f(x , . . . , x ) := g(x ) +2 ··· +2 g(x ) is ε + (1 − δ)k-hard-core for polynomials of the same degree.

122 Applying this lemma with our choice of k, we have for any polynomial p of degree O(log2 m),

Pr[f(x) = p(x)] ≤ (1 + 1/m0.249 + (7/8)k)/2 = 1/2 + O(1/m0.249) . x

Hence f is O(1/m0.249)-hard for any polynomial of degree O(log2 m), and the lemma follows.

Proof of Claim 5.9. We replace the pseudo-design in the proof of Theorem 5.4.iv with one that has set size t = O(log4 n) and intersection size tˆ = O(log n). Using the same idea as in the proof of Lemma 5.19 one can show that such pseudo-design exists provided the universe is of size d = O(log8 n). Now, using the same argument, for tests of size larger than c log n, we apply (1) to f and p, which are the parity of c log n copies of mod 3 function on m = O(log4 n) bits and a polynomial of degree O(log2 n), respectively. This gives bias c 2 16 O(1/n ). Note that the image set T1 now has mod 3 rank d = O(log n). For tests of size at most c log n, we replace the almost k-wise independent set with the n−c-almost k-wise independent distribution given by (2), which has bias n−c, and we denote the support of the distribution by T2. 0.49 16 0.5 By Fact 5.18, T := T1 +2 T2 has mod 3 rank O(n log n) = o(n ). Hence, T +2 T has rank less than n and the claim follows.

5.3.6 Sum of k copies of small-bias distributions We now show that the results hold for k copies when ε is replaced by ε2/k, proving the “Moreover” part in Theorem 5.4, 5.5 and 5.6.

Proof of “Moreover” part in Theorem 5.4, 5.5 and 5.6. To prove Theorem 5.4.i, 5.4.ii and 5.4.iii, we can replace e by 2e/k in their proofs to obtain distributions D0 that are ε2/k- biased. Since we have to throw in at least one error, 2e/k ≥ 1. The rest follows by noting the sum of k copies of D0 is identical to D + D. By scaling down the above small-bias distributions D0 for Theorem 5.4.i and 5.4.ii to n0 bits as in the proofs of Theorem 5.5 and 5.6, respectively, we obtain ε2/k-biased distributions 0 0 Dn0 so that the sum of k copies of Dn0 is identical to Dn0 + Dn0 in Theorem 5.6 and 5.5. Moreover, k scales from k(n) to k(n0). For Theorem 5.4.iv, let α := log(1/ε)/ log n and so ε2/k = n−2α/k. We set β = 8α/k −2α/k instead of 4α in the construction of T1 and replace T2 by an n -almost 2 log n-wise inde- 0 0 0 pendent set in the proof, and call them T1 and T2, respectively. We now have rank3(T1) = 32α/k 0 4α/k 4 0 0 0 O(n ) and rank3(T2) = O(n log n). Thus, the set T := T1 +2 T2 has rank at 36α/k 4 0 k most O(n log n) and therefore the sum of k copies has rank at most rank3(T ) = O(n36α log4k n) < n, for k < O(log n/ log log n). The bias of T 0 follows from the facts that 0 −2α/k 0 −2α/k T2 has bias n against tests of size at most 2 log n, and T1 has bias O(n ) for tests of size greater than 2 log n.

123 5.4 Mod 3 rank of k-wise independence

In this section, we begin a systematic investigation on the mod 3 rank of k-wise independent distributions. Recall Definition 5.7 of mod p rank. We also define the mod p rank of a matrix over the integers to be its rank over Fp. We also write rankp for mod p rank. We will sometimes work with vectors over {−1, 1} instead of {0, 1}. Note that the map (1 − x)/2 convert the values 1 and −1 to 0 and 1, respectively, and so the mod 3 rank of a set will differ by at most 1 when we switch vector values from {−1, 1} to {0, 1}, and vice versa. While we state our results for mod 3, all the results in this section can be extended to mod p for any odd prime p naturally.

5.4.1 Lower bound for almost k-wise independence

In the following claim we give a rank lower bound on almost k-wise independent distributions. Here “almost” is measured with respect to statistical distance. (Another possible definition is the max bias of any parity.)

n Claim 5.25. Let D be any subset {0, 1} . If rank3(D) = t, then D is not 1/10-almost ct/ log(n/t)-wise independent, for a universal constant c.

This gives an exponential separation between seed length and rank for almost k-wise independence. Indeed, for k = O(1), the seed length is Θ(log log n), whereas the rank must be Ω(log n).

⊥ ⊥ Proof. Let C be the span of D over F3 and C be its orthogonal complement. C has dimen- ⊥ ⊥ sion n − t. We view C as a linear code over F3 and let d be its minimum distance. Since C⊥ is linear, d⊥ equals the minimum Hamming weight of its non-zero elements. Moreover, by the Singleton bound, d⊥ − 1 ≤ t. By the Hamming bound, that is,

⊥ b d −1 c X2 n 3n−t 2i ≤ 3n , i i=0

124 we have

 ⊥  b d −1 c X2 n t log 3 ≥ log  2i 2 2  i  i=0    n d⊥−1 b 2 c ≥ log2 d⊥−1 · 2 b 2 c d⊥−1 !b 2 c n d⊥ − 1 ≥ log + 2 d⊥−1 2 b 2 c d⊥ − 1 n ≥ log2 t 2 b 2 c  n = Ω d⊥ log , 2 t where we use the fact that d⊥ − 1 ≤ t in the last inequality. Hence, d⊥ ≤ O(t/ log(n/t)). ⊥ ⊥ Now, let y be a codeword in C with Hamming weight d . Let I := {i | yi 6= 0}. Note that for every x ∈ D, we have hy, xi3 = 0 on I. On the other hand, for a uniformly I distributed x in {0, 1} we have hy, xi3 = 0 with probability at most 1/2. Therefore, D is constant bounded away from uniform on the d⊥ bits indexed by I, and thus cannot be close to d⊥-wise independent.

5.4.2 Pairwise independence We now show that the mod 3 rank of a pairwise independent set can be as small as n0.73. Then we give evidence that our approach cannot do any better. Definition 5.26. We say H is an Hadamard matrix of order n if its entries are ±1 and it T satisfies HH = nIn, where In is the n × n identity matrix. It is well-known that by removing the all-ones row of an Hadamard matrix H, which can always be created by multiplying each column by its first element, the uniform distribution over the columns of the truncated matrix is pairwise independent. Henceforth we will work with vectors whose entries are from {−1, 1} = {2, 1} ⊆ F3. The following two claims show that certain Hadamard matrices cannot have dimension smaller than n/2. They are taken from [Wil12], and here we give a self-contained proof for completeness. First, we would give a lower bound to the mod p rank from the determinant of any square matrix. Claim 5.27 (Theorem 1 in [Wil12]). Let A be an n × n matrix over the integers. Assume e+1 p - det(A). Then rankp(A) ≥ n − e.

Proof. Suppose nullityp(A) = n − r. Let (β1, . . . , βn−r) be a basis of the null space of A over n Fp. Extend the basis to (β1, . . . , βn) so that it forms a basis of Fp . Let B be the matrix whose columns are the βi. Note that det(B) 6≡ 0 (mod p) and det(AB) = det(A) det(B). Thus,

125 s s p | det(A) if and only if p | det(AB). By construction, β1, . . . , βn−r are in the null space of n−r A over Fp and so the first n − r columns of AB are zero mod p. Hence p | det(A).

Claim 5.28 (Theorem 2 in [Wil12]). Let H be an n × n Hadamard matrix. Let p be an odd 2 prime such that p - n. Then rankp(H) ≥ n/2.

Proof. Since H is an Hadamard matrix, we have HHT = nI and so det(H) det(HT ) = det(H)2 = nn. Hence |det(H)| = nn/2. By the condition on p we have that pn/2+1 - nn/2. Hence, it follows from Claim 5.27 that rankp(H) ≥ n/2.

The following claim characterizes Hadamard matrices with mod p rank at most n/2.

Claim 5.29. Let H be an n×n Hadamard matrix. If p | n, then rankp(H) ≤ n/2. Otherwise, rankp(H) = n.

Proof. If p | n, then by Sylvester’s rank inequality, we have

T T rankp(H) + rankp(H ) − n ≤ rankp(HH ) = rankp(nIn) = 0 .

Hence, rankp(H) ≤ n/2, proving the first part. For the second part, if p - n then

det(H)2 = det(H) det(HT ) 6≡ 0 (mod p)

and so det(H) 6≡ 0 (mod p). Hence rankp(H) = n.

Now we give a construction of Hadamard matrices whose orders do not satisfy the con- dition in Claim 5.28. These matrices have much smaller mod p ranks than the lower bound stated in Claim 5.28. Note that the affine bijection L: {−1, 1}n → {0, 1}n defined by L(v) = (1 − v)/2, where 1 is the all-ones vector, maps vectors from {−1, 1}n to {0, 1}n. We have the following facts.

n Fact 5.30. Let S ⊆ {−1, 1} be a set containing the all-ones vector. Then rank3(L(S)) ≤ rank3(S).

Fact 5.31. If A and B are two Hadamard matrices then A⊗B is also an Hadamard matrix, where ⊗ indicates the Kronecker product.

The following is well-known.

Fact 5.32. Let A, B be two matrices over any field. Then we have rank(A ⊗ B) = rank(A) · rank(B).

Claim 5.33. For infinitely many values of n, there exists a pairwise independent distribution over {0, 1}n with mod 3 rank at most n0.73.

126 Proof. Paley [Pal33] constructed a (q − 1) × (q − 1) Hadamard matrix for every prime power q ≡ 1 (mod 4). Starting with an Hadamard matrix H12 over {−1, 1} = {2, 1} ⊆ F3 using Paley’s construction, for every n that is a power of 12, we construct the Hadamard matrix ⊗r Hn := H12 , where r = log12 n. It follows from Claim 5.28 and 5.29 that rank3(H12) = 6. log n 0.73 Hence, by Fact 5.32, Hn has rank 6 12 ≤ n . As discussed above, we can assume H12 contains an all-ones row. Thus, Hn also contains an all-ones row, and the claim follows from Fact 5.30.

We start from an m × m Hadamard matrix with mod 3 rank m/2, for some m. The smaller m we start from, the better exponent we get. Since Hadamard matrices must be of order 1, 2, or multiple of 4, Claim 5.28 implies that 12 is indeed the smallest possible m.

5.5 Complexity of decoding

In this section we prove some negative results on the complexity of decoding. In [Vio06, Chapter 6] and [SV10] it is shown that list-decoding binary codes from error rate 1/2−ε requires computing the majority function on 1/ε bits, which implies lower bounds for list-decoding over several computational models. Using a similar approach, we give lower bounds on the decoding complexity for AC0 circuits and read-once branching programs. We give a reduction from ε-approximating the majority function to decoding (1/2 − ε)d errors of a code, where d is the minimum distance. Define ε-MAJ to be the promise problem on {0, 1}n, where the YES and NO instances are strings of Hamming weight at least (1/2 + ε)n and at most (1/2 − ε)n, respectively. We say that a probabilistic circuit solves ε-MAJ if it accepts every YES instance with probability at least 2/3 and accepts every NO instance with probability at most 1/3. Let C ⊆ {0, 1}n be a code with minimum distance d and let the codewords x and y differ by exactly d positions, respectively. Define ε-DECODE to be the promise problem on {0, 1}n, where the YES and NO instances are strings that differ from x and y by at most (1/2 − ε)d, respectively. The results in this section have been obtained while attempting to bound the complexity of the threshold discriminator explained in Section 5.2: distinguish a random noisy codeword from uniform. The ε-DECODE problem is different, as it asks to distinguish between two noisy codewords.

Lemma 5.34. If a function D : {0, 1}n → {0, 1} solves ε-DECODE, then a restriction of D solves ε-MAJ on d bits.

Proof. Let x, y ∈ C be the codewords at Hamming distance d. Without loss of generality, we assume x and y differ in the first d positions. We further assume xi = 0 and yi = 1 for i ∈ [d]. Given an ε-MAJ instance w of length d, let z be the n-bit string where zi = wi for i ∈ [d] and zi = xi (= yi) otherwise. If w has weight at most (1/2 − ε)d, then w and x disagree in at most (1/2 − ε)d positions and therefore D accepts. Similarly, if w has weight at least (1/2 + ε)d then D rejects.

127 Shaltiel and Viola [SV10] show that depth-c AC0[⊕] circuits can solve ε-MAJ only if ε is at least 1/O(log n)(c+2). Brody and Verbin [BV10b] show that ε-MAJ can be solved by a read-once width-w branching program whenever ε is at least 1/(log n)Θ(w). Combining these results with Lemma 5.34, we have the following claim.

Claim 5.35. Let D : {0, 1}n → {0, 1} be a function. 1. If D is computable by an AC0[⊕] circuit of depth c, then it can only solve ε-DECODE with ε ≥ 1/O(log d)c+2. 2. If D is computable by a read-once width-w branching program, then it can only solve ε-DECODE with ε ≥ 1/(log d)Θ(w).

We also note the following negative result for decoding by low-degree polynomials.

Claim 5.36. Let C ⊆ {0, 1}n be an [n, k, d] code with dual minimum distance d⊥. If

t−1  d⊥ e/2 2−t > 16 1 − n

d−1 for some constant t and e ≤ b 2 c, then no degree-t polynomial over F2 can be a threshold-te discriminator for C.

Proof. Suppose on the contrary a polynomial P is a threshold-te discriminator for C. By Fact 5.11 and the Schwartz–Zippel Lemma, there exists an ε := (1 − d⊥/n)e-biased distribu- tion D such that P distinguishes the sum of t independent copies of D from uniform with probability at least 2−t. But by [Vio09c], the sum of t copies of D fools P with probability 16ε1/2t−1 , a contradiction.

128 Bibliography

[Aar10] Scott Aaronson. BQP and the polynomial hierarchy. In STOC’10—Proceedings of the 2010 ACM International Symposium on Theory of Computing, pages 141–150. ACM, New York, 2010. 4.1

[ABI86] , L´aszl´oBabai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567– 583, 1986. 2.5.2

[ABN+92] Noga Alon, Jehoshua Bruck, Joseph Naor, , and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509– 516, March 1992.5

[ABO84] Mikl´osAjtai and Michael Ben-Or. A theorem on probabilistic constant depth computations. In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, STOC ’84, pages 471–474, New York, NY, USA, 1984. ACM. 4.1

[ABR16] Benny Applebaum, Andrej Bogdanov, and Alon Rosen. A dichotomy for local small-bias generators. J. Cryptology, 29(3):577–596, 2016. 5.2

[AGHP92] Noga Alon, Oded Goldreich, Johan H˚astad,and Ren´ePeralta. Simple con- structions of almost k-wise independent random variables. Random Structures Algorithms, 3(3):289–304, 1992. 2.1.2, 2.4, 2.5.2,5, 5.3.5

[AGM03] Noga Alon, Oded Goldreich, and Yishay Mansour. Almost k-wise independence versus k-wise independence. Inform. Process. Lett., 88(3):107–110, 2003. 2.2

1 [Ajt83] Mikl´osAjtai. Σ1-formulae on finite structures. Ann. Pure Appl. Logic, 24(1):1– 48, 1983. 2.1, 4.1

[AKL+79] Romas Aleliunas, Richard M. Karp, Richard J. Lipton, L´aszl´oLov´asz,and Charles Rackoff. Random walks, universal traversal sequences, and the com- plexity of maze problems. In 20th Annual Symposium on Foundations of Com- puter Science (San Juan, Puerto Rico, 1979), pages 218–223. IEEE, New York, 1979.1

129 [AKS87] Mikl´osAjtai, J´anosKoml´os,and Endre Szemer´edi.Deterministic simulation in logspace. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 132–140. ACM, 1987.1,2, 2.1.2

[AKS04] , , and . PRIMES is in P. Ann. of Math. (2), 160(2):781–793, 2004.1

[Ama09] Kazuyuki Amano. Bounds on the size of small depth circuits for approximating majority. In Automata, languages and programming. Part I, volume 5555 of Lecture Notes in Comput. Sci., pages 59–70. Springer, Berlin, 2009. 4.1

[ASWZ96] Roy Armoni, Michael Saks, , and Shiyu Zhou. Discrepancy sets and pseudorandom generators for combinatorial rectangles. In 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pages 412–421. IEEE Comput. Soc. Press, Los Alamitos, CA, 1996.2, 2.1.2

[AW89] Mikl´osAjtai and Avi Wigderson. Deterministic simulation of probabilistic con- stant depth circuits. Advances in Computing Research, 5(199-222):1, 1989.1, 2.1.2, 2.1.2, 3.1, 4.1, 4.1.1, 4.3

[BATS13] Avraham Ben-Aroya and Amnon Ta-Shma. Constructing small-bias sets from algebraic-geometric codes. Theory Comput., 9:253–272, 2013.5

[Baz05] Louay M. J. Bazzi. Encoding complexity versus minimum distance. IEEE Transactions on Information Theory, 51(6):2103–2112, June 2005. 2.1.1

[Baz09] Louay M. J. Bazzi. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38(6):2220–2272, 2009.2, 2.1, 3.5.2, 3.43, 5.1, 5.2

[BDVY13] Andrej Bogdanov, Zeev Dvir, Elad Verbin, and Amir Yehudayoff. Pseudoran- domness for width-2 branching programs. Theory Comput., 9:283–292, 2013.1, 5.2,5, 5.1, 5.2

[Ber84] Stuart J. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Inform. Process. Lett., 18(3):147–150, 1984. 5.3.2

[BHLV18] Ravi Boppana, Johan H˚astad,Chin Ho Lee, and Emanuele Viola. Bounded independence versus symmetric tests. 2018. ECCC TR16-102. 3.1.1, 5.1

[BJMM12] Anja Becker, Antoine Joux, Alexander May, and Alexander Meurer. Decoding random binary linear codes in 2n/20: how 1 + 1 = 0 improves information set decoding. In Advances in cryptology—EUROCRYPT 2012, volume 7237 of Lecture Notes in Comput. Sci., pages 520–536. Springer, Heidelberg, 2012. 2.1.1

[BM84] Manuel Blum and . How to generate cryptographically strong se- quences of pseudorandom bits. SIAM J. Comput., 13(4):850–864, 1984.1

130 [BPW11] Andrej Bogdanov, Periklis A. Papakonstantinou, and Andrew Wan. Pseudo- randomness for read-once formulas. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science—FOCS 2011, pages 240–246. IEEE Com- puter Soc., Los Alamitos, CA, 2011.2, 2.1.2, 2.1.2

[BPW12] Andrej Bogdanov, Periklis A. Papakonstantinou, and Andrew Wan. Pseudoran- domness for linear length branching programs and stack machines. In Approxi- mation, randomization, and combinatorial optimization, volume 7408 of Lecture Notes in Comput. Sci., pages 447–458. Springer, Heidelberg, 2012. 2.1.2

[Bra10] Mark Braverman. Polylogarithmic independence fools AC0 circuits. J. ACM, 57(5):Art. 28, 10, 2010.2, 5.1, 5.2

[BV10a] Andrej Bogdanov and Emanuele Viola. Pseudorandom bits for polynomials. SIAM J. Comput., 39(6):2464–2486, 2010.1,3,5, 5.3

[BV10b] Joshua Brody and Elad Verbin. The coin problem, and pseudorandomness for branching programs. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science—FOCS 2010, pages 30–39. IEEE Computer Soc., Los Alamitos, CA, 2010. 4.1, 5.5

[BYRST02] Ziv Bar-Yossef, , Ronen Shaltiel, and Luca Trevisan. Stream- ing computation of combinatorial objects. In Proceedings 17th IEEE Annual Conference on Computational Complexity, pages 165–174, May 2002. 2.1.1

[CG89] Benny Chor and Oded Goldreich. On the power of two-point based sampling. J. Complexity, 5(1):96–106, 1989. 2.5.2

[CGR14] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In Approximation, randomization, and combinatorial optimization, volume 28 of LIPIcs. Leibniz Int. Proc. Inform., pages 618–629. Schloss Dagstuhl. Leibniz- Zent. Inform., Wadern, 2014. 4.1

[Cha02] Mei-Chu Chang. A polynomial bound in Freiman’s theorem. Duke Math. J., 113(3):399–419, 2002. 4.1.1, 4.2

[CHHL18] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, and Shachar Lovett. Pseudorandom generators from polarizing random walks. In 33rd Computational Complexity Conference, volume 102 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 1, 21. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2018. 4.1

[CHLT19] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, and Avishay Tal. Pseu- dorandom generators from the second Fourier level and applications to AC0 with parity gates. In 10th Innovations in Theoretical Computer Science, vol- ume 124 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 22, 15. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2019. 4.1, 4.1

131 [CHRT18] Eshan Chattopadhyay, Pooya Hatami, Omer Reingold, and Avishay Tal. Im- proved pseudorandomness for unordered branching programs through local monotonicity. In STOC’18—Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 363–375. ACM, New York, 2018.1, 3.1,4, 4.1, 4.1

[Coh16] Gil Cohen. Two-source dispersers for polylogarithmic entropy and improved Ramsey graphs. In STOC’16—Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 278–284. ACM, New York, 2016.1

[CRS00] Suresh Chari, Pankaj Rohatgi, and Aravind Srinivasan. Improved algorithms via approximations of probability distributions. J. Comput. System Sci., 61(1):81– 107, 2000.2, 2.1.2, 3.1, 4.1, 5.1,A,A

[CSV15] Sitan Chen, Thomas Steinke, and Salil Vadhan. Pseudorandomness for read- once, constant-depth circuits. arXiv preprint arXiv:1504.04675, 2015.1, 2.1.2, 4.1

[CT06] Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley- Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2006. 5.3.1

[CW79] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. J. Comput. System Sci., 18(2):143–154, 1979.1,2

[CZ16] Eshan Chattopadhyay and David Zuckerman. Explicit two-source extractors and resilient functions. In STOC’16—Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 670–683. ACM, New York, 2016.1

[De15] Anindya De. Beyond the central limit theorem: asymptotic expansions and pseudorandomness for combinatorial sums. In 2015 IEEE 56th Annual Sympo- sium on Foundations of Computer Science—FOCS 2015, pages 883–902. IEEE Computer Soc., Los Alamitos, CA, 2015.2, 2.1.2, 3.1

[DETT10] Anindya De, Omid Etesami, Luca Trevisan, and Madhur Tulsiani. Improved pseudorandom generators for depth 2 circuits. In Approximation, randomization, and combinatorial optimization, volume 6302 of Lecture Notes in Comput. Sci., pages 504–517. Springer, Berlin, 2010. 4.1, 5.1, 5.2

[DFKO07] Irit Dinur, Ehud Friedgut, Guy Kindler, and Ryan O’Donnell. On the Fourier tails of bounded functions over the discrete cube. Israel J. Math., 160:389–412, 2007. 4.4, 4.21

[DGJ+10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A. Servedio, and Emanuele Viola. Bounded independence fools halfspaces. SIAM J. Comput., 39(8):3441–3462, 2010. 2

[DHH18] Dean Doron, Pooya Hatami, and William Hoza. Near-optimal pseudorandom generators for constant-depth read-once formulas. 2018. ECCC TR18-183. 1, 4.1

[DKN10] Ilias Diakonikolas, Daniel M. Kane, and Jelani Nelson. Bounded independence fools degree-2 threshold functions. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science—FOCS 2010, pages 11–20. IEEE Computer Soc., Los Alamitos, CA, 2010. 2

[EGL+98] Guy Even, Oded Goldreich, Michael Luby, Noam Nisan, and Boban Veličković. Efficient approximation of product distributions. Random Structures Algorithms, 13(1):1–16, 1998. 2, 2, 2.1.2, 2.1.2, 3.1, 5.1

[Erd47] Paul Erdős. Some remarks on the theory of graphs. Bull. Amer. Math. Soc., 53:292–294, 1947. 1

[FK18] Michael A. Forbes and Zander Kelley. Pseudorandom generators for read-once branching programs, in any order. In 59th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2018, pages 946–955. IEEE Computer Soc., Los Alamitos, CA, 2018. 1, 1.1, 3.1.1, 3.1.1, 3.5.4, 3.45, 3.5.4, 3.5.4, 4.1, 4.1.1, 4.3, 4.3

[FSS84] Merrick Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Math. Systems Theory, 17(1):13–27, 1984. 2.1

[GKM18] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka. Pseudorandomness via the discrete Fourier transform. SIAM J. Comput., 47(6):2451–2487, 2018. 1, 2, 2, 2.1.2, 2.1.2, 2.1.2, 2.1.3, 2.49, 2.6, 3.1, 3.1.1, 3.1.1, 3.4, 4.1, 4.1, 4.1

[GLS12] Dmitry Gavinsky, Shachar Lovett, and Srikanth Srinivasan. Pseudorandom generators for read-once ACC0. In 2012 IEEE 27th Conference on Computational Complexity—CCC 2012, pages 287–297. IEEE Computer Soc., Los Alamitos, CA, 2012. 3, 3.1.1

[GMR+12] Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, and Salil Vadhan. Better pseudorandom generators from milder pseudorandom restrictions. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science—FOCS 2012, pages 120–129. IEEE Computer Soc., Los Alamitos, CA, 2012. 1, 2, 2.1, 2.1.2, 2.1.2, 2.1.3, 3.1, 3.1.1, 3.1.1, 3.1.1, 3.1.1, 3.5, 3.5.2, 4.1, 4.1, 4.1.1, 4.3

[GMRZ13] Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman. Pseudorandom generators for combinatorial shapes. SIAM J. Comput., 42(3):1051–1076, 2013. 2, 2.1.2, 3.1, 3.1.1

[GOWZ10] Parikshit Gopalan, Ryan O’Donnell, Yi Wu, and David Zuckerman. Fooling functions of halfspaces under product distributions. In 25th Annual IEEE Conference on Computational Complexity—CCC 2010, pages 223–234. IEEE Computer Soc., Los Alamitos, CA, 2010. 2, 2, 3.1

[GR05] Venkatesan Guruswami and Atri Rudra. Tolerant locally testable codes. In Approximation, randomization and combinatorial optimization, volume 3624 of Lecture Notes in Comput. Sci., pages 306–317. Springer, Berlin, 2005. 5.13

[Gro06] André Gronemeier. A note on the decoding complexity of error-correcting codes. Inform. Process. Lett., 100(3):116–119, 2006. 2.1.1

[GSTW16] Parikshit Gopalan, Rocco A. Servedio, Avishay Tal, and Avi Wigderson. Degree and sensitivity: tails of two distributions. In 31st Conference on Computational Complexity, volume 50 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 13, 23. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2016. 4

[GY14] Parikshit Gopalan and Amir Yehudayoff. Inequalities and tail bounds for elementary symmetric polynomial with applications. arXiv preprint arXiv:1402.3543, 2014. 1, 2, 2.1.2, 2.1.3, 3.1.1, 3.1.1, 4.1, 4.1

[GZ61] Daniel Gorenstein and Neal Zierler. A class of error-correcting codes in p^m symbols. J. Soc. Indust. Appl. Math., 9:207–214, 1961. 5.2, 5.3.2

[Hås] Johan Håstad. Computational Limitations of Small Depth Circuits. Ph.D. Thesis – Massachusetts Institute of Technology. 2.1

[Hås14] Johan Håstad. On the correlation of parity and small-depth circuits. SIAM J. Comput., 43(5):1699–1708, 2014. 2.1

[Hat14] Hamed Hatami. Lecture notes on harmonic analysis of Boolean functions, 2014. Available at http://cs.mcgill.ca/~hatami/comp760-2014/lectures.pdf. 3.5.1

[HT18] Pooya Hatami and Avishay Tal. Pseudorandom generators for low-sensitivity functions. In 9th Innovations in Theoretical Computer Science, volume 94 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 29, 13. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2018. 1, 4.1

[IKW02] Russell Impagliazzo, Valentine Kabanets, and Avi Wigderson. In search of an easy witness: exponential time vs. probabilistic polynomial time. J. Comput. System Sci., 65(4):672–694, 2002. Special issue on complexity, 2001 (Chicago, IL). 1

[IMP] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC0. 2.1

[Imp95] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In 36th Annual Symposium on Foundations of Computer Science (Milwaukee, WI, 1995), pages 538–545. IEEE Comput. Soc. Press, Los Alamitos, CA, 1995. 5.2, 5.3.5, 5.3.5, 5.24

[IMR14] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of Chang’s inequality. SIAM J. Discrete Math., 28(1):173–176, 2014. 4.1.1, 4.1.1, 4.2

[IMZ19] Russell Impagliazzo, Raghu Meka, and David Zuckerman. Pseudorandomness from shrinkage. J. ACM, 66(2):Art. 11, 16, 2019. 2, 2.1.2

[INW94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing, STOC ’94, pages 356–364, New York, NY, USA, 1994. ACM. 1, 2, 2.1.2

[IW99] Russell Impagliazzo and Avi Wigderson. P = BPP if E requires exponential circuits: derandomizing the XOR lemma. In STOC ’97 (El Paso, TX), pages 220–229. ACM, New York, 1999. 1

[KI04] Valentine Kabanets and Russell Impagliazzo. Derandomizing polynomial identity tests means proving circuit lower bounds. Comput. Complexity, 13(1-2):1–46, 2004. 1

[KK13] Nathan Keller and Guy Kindler. Quantitative relation between noise sensitivity and influences. Combinatorica, 33(1):45–71, 2013. 4.1.1, 4.2

[KS09] Swastik Kopparty and Shubhangi Saraf. Tolerant linearity testing and locally testable codes. In Approximation, randomization, and combinatorial optimization, volume 5687 of Lecture Notes in Comput. Sci., pages 601–614. Springer, Berlin, 2009. 5.13

[KS18] Swastik Kopparty and Srikanth Srinivasan. Certifying polynomials for AC0[⊕] circuits, with applications to lower bounds and circuit compression. Theory Comput., 14:Article 12, 24, 2018. 4.1

[KSS19] Swastik Kopparty, Ronen Shaltiel, and Jad Silbak. Quasilinear time list-decodable codes for space bounded channels. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science—FOCS 2019. 2019. 1.1

[LN97] Rudolf Lidl and Harald Niederreiter. Finite fields, volume 20 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, second edition, 1997. With a foreword by P. M. Cohn. 3.46

[Lov09] Shachar Lovett. Unconditional pseudorandom generators for low-degree polynomials. Theory Comput., 5:69–82, 2009. 1, 3, 5.3

[LSS+18] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S. Venkitesh. The coin problem in constant depth: Sample complexity and parity gates. In Electronic Colloquium on Computational Complexity (ECCC), number TR18-157, 2018. 4.1

[Lu02] Chi-Jen Lu. Improved pseudorandom generators for combinatorial rectangles. Combinatorica, 22(3):417–433, 2002. 2, 2.1.2, 4.1

[LV] Chin Ho Lee and Emanuele Viola. Some limitations of the sum of small-bias distributions. ECCC TR15-005, 2016. 5.1

[LV18] Chin Ho Lee and Emanuele Viola. The coin problem for product tests. ACM Trans. Comput. Theory, 10(3):Art. 14, 10, 2018. 4.1, 4.1, 4.1.1

[LVW93] Michael Luby, Boban Veličković, and Avi Wigderson. Deterministic approximate counting of depth-2 circuits. In [1993] The 2nd Israel Symposium on Theory and Computing Systems, pages 18–24, June 1993. 3

[Man95] Yishay Mansour. An O(n^{log log n}) learning algorithm for DNF under the uniform distribution. J. Comput. System Sci., 50(3, part 3):543–550, 1995. Fifth Annual Workshop on Computational Learning Theory (COLT) (Pittsburgh, PA, 1992). 4

[MRT18] Raghu Meka, Omer Reingold, and Avishay Tal. Pseudorandom generators for width-3 branching programs. arXiv preprint arXiv:1806.04256, 2018. 1, 1.1, 4.1, 4.1, 4.1.1, 4.3.2, 4.19

[MST06] Elchanan Mossel, Amir Shpilka, and Luca Trevisan. On ε-biased generators in NC0. Random Structures Algorithms, 29(1):56–81, 2006. 5.2

[MZ09] Raghu Meka and David Zuckerman. Small-bias spaces for group products. In Approximation, randomization, and combinatorial optimization, volume 5687 of Lecture Notes in Comput. Sci., pages 658–672. Springer, Berlin, 2009. 5, 5.1, 5.1, 5.2, 5.18

[Nis91] Noam Nisan. Pseudorandom bits for constant depth circuits. Combinatorica, 11(1):63–70, 1991. 3.1, 5.2, 5.3.5

[Nis92] Noam Nisan. Pseudorandom generators for space-bounded computation. Combinatorica, 12(4):449–461, 1992. 1, 2, 2.1.2, 3, 3.1

[NN93] Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. Comput., 22(4):838–856, 1993. 1, 2, 2.1, 2.4, 2.5.2, 3.1, 3.1, 3.1.1, 3.3, 3.3, 3.3, 4.3.1, 4.3.2, 4.3.2, 5, 5.14, 5.2, 5.3.5

[NW94] Noam Nisan and Avi Wigderson. Hardness vs. randomness. J. Comput. System Sci., 49(2):149–167, 1994. 1

[NZ96] Noam Nisan and David Zuckerman. Randomness is linear in space. J. Comput. System Sci., 52(1):43–52, 1996. 2, 2.1.2

[O’D14] Ryan O’Donnell. Analysis of Boolean functions. Cambridge University Press, New York, 2014. 4.1.1, 4.2

[OZ18] Ryan O’Donnell and Yu Zhao. On closeness of k-wise uniformity. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques, volume 116 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 54, 19. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2018. 2.2

[Pal33] Raymond E. A. C. Paley. On orthogonal matrices. J. Math. Phys., 12(1-4):311–320, 1933. 5.4.2

[Pet60] William Wesley Peterson. Encoding and error-correction procedures for the Bose-Chaudhuri codes. Trans. IRE, IT-6:459–470, 1960. 5.2, 5.3.2

[PPT92] Josip E. Pečarić, Frank Proschan, and Yung Liang Tong. Convex functions, partial orderings, and statistical applications, volume 187 of Mathematics in Science and Engineering. Academic Press, Inc., Boston, MA, 1992. 4.14

[Rab80] Michael O. Rabin. Probabilistic algorithm for testing primality. J. Number Theory, 12(1):128–138, 1980. 1

[Raz87] Alexander A. Razborov. Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function. Mat. Zametki, 41(4):598–607, 623, 1987. 1, 5.3

[Raz09] Alexander A. Razborov. A simple proof of Bazzi’s theorem. ACM Trans. Comput. Theory, 1(1):3:1–3:5, February 2009. 2, 5.1

[Rei05] Omer Reingold. Undirected ST-connectivity in log-space. In STOC’05: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pages 376–385. ACM, New York, 2005. 1

[RS17] Benjamin Rossman and Srikanth Srinivasan. Separation of AC0[⊕] formulas and circuits. In 44th International Colloquium on Automata, Languages, and Programming, volume 80 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 50, 13. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2017. 4.1

[RSV13] Omer Reingold, Thomas Steinke, and Salil Vadhan. Pseudorandomness for regular branching programs via Fourier analysis. In Approximation, randomization, and combinatorial optimization, volume 8096 of Lecture Notes in Comput. Sci., pages 655–670. Springer, Heidelberg, 2013. 1, 2, 2, 2.1.2, 2.1.2, 2.1.2, 2.5.2, 2.45, 2.46, 4, 4.1

[RU10] Atri Rudra and Steve Uurtamo. Data stream algorithms for codeword testing. In Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming, ICALP’10, pages 629–640, Berlin, Heidelberg, 2010. Springer-Verlag. 5.13, 5.2

[RV05] Eyal Rozenman and Salil Vadhan. Derandomized squaring of graphs. In Approximation, randomization and combinatorial optimization, volume 3624 of Lecture Notes in Comput. Sci., pages 436–447. Springer, Berlin, 2005. 1

[Sha81] Adi Shamir. On the generation of cryptographically strong pseudorandom sequences. In Automata, languages and programming (Akko, 1981), volume 115 of Lecture Notes in Comput. Sci., pages 544–550. Springer, Berlin-New York, 1981. 1

[Shp09] Amir Shpilka. Constructions of low-degree and error-correcting ε-biased generators. Comput. Complexity, 18(4):495–525, 2009. 2.1.1, 5.2, 5.3.1, 5.15

[Smo87] Roman Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC ’87, pages 77–82, New York, NY, USA, 1987. ACM. 5.2, 5.3.5, 5.22

[SS77] Robert M. Solovay and Volker Strassen. A fast Monte-Carlo test for primality. SIAM J. Comput., 6(1):84–85, 1977. 1

[ST18] Rocco A. Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. arXiv preprint arXiv:1801.03590, 2018. 1, 3, 4.1

[Ste04] J. Michael Steele. The Cauchy-Schwarz master class. MAA Problem Books Series. Mathematical Association of America, Washington, DC; Cambridge University Press, Cambridge, 2004. 3.23, 4.2

[Ste13] John Steinberger. The distinguishability of product distributions by read-once branching programs. In 2013 IEEE Conference on Computational Complexity—CCC 2013, pages 248–254. IEEE Computer Soc., Los Alamitos, CA, 2013. 4.1

[Sub61] Bella A. Subbotovskaya. Realization of linear functions by formulas using ∨, &, −. Soviet Math. Dokl., 2:110–112, 1961. 2.1

[SV10] Ronen Shaltiel and Emanuele Viola. Hardness amplification proofs require ma- jority. SIAM J. Comput., 39(7):3122–3154, 2010. 4.1, 5.2, 5.5, 5.5

[SVW17] Thomas Steinke, Salil Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory Comput., 13:Paper No. 12, 50, 2017. 1, 2, 2.1.2, 2.1.2, 4, 4.1

[Tal96] Michel Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243–258, 1996. 4.1.1, 4.2, 4.4

[Tal17] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In 32nd Computational Complexity Conference, volume 79 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 15, 31. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2017. 2, 4, 4.1, 5.1, 5.2

[Tre10] Luca Trevisan. Open problems in unconditional derandomization. Presentation at China Theory Week, 2010. Slides available at https://www.cc.gatech.edu/~mihail/trevisan2.pdf. 1.1, 3

[TS17] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In STOC’17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 238–251. ACM, New York, 2017. 5

[TX13] Luca Trevisan and TongKe Xue. A derandomized switching lemma and an improved derandomization of AC0. In 2013 IEEE Conference on Computational Complexity—CCC 2013, pages 242–247. IEEE Computer Soc., Los Alamitos, CA, 2013. 1, 4.1

[Tzu09] Yoav Tzur. Notions of weak pseudorandomness and GF(2^n)-polynomials. M.Sc. thesis, Weizmann Institute of Science, 2009. 2

[Val77] Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. pages 162–176. Lecture Notes in Comput. Sci., Vol. 53, 1977. 5.1, 5.2, 5.3.4, 5.17

[Val83] Leslie G. Valiant. Exponential lower bounds for restricted monotone circuits. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, STOC ’83, pages 110–117, New York, NY, USA, 1983. ACM. 5.3.4, 5.17

[Val84] Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, November 1984. 4.1

[Vio07] Emanuele Viola. Pseudorandom bits for constant-depth circuits with few arbitrary symmetric gates. SIAM J. Comput., 36(5):1387–1403, 2006/07. 3

[Vio06] Emanuele Viola. The Complexity of Hardness Amplification and Derandomization. PhD thesis, Cambridge, MA, USA, 2006. AAI3217914. 5.2, 5.5

[Vio09a] Emanuele Viola. On approximate majority and probabilistic time. Comput. Complexity, 18(3):337–375, 2009. 4.1

[Vio09b] Emanuele Viola. On the power of small-depth computation. Found. Trends Theor. Comput. Sci., 5(1):1–72, 2009. 1, 5.3, 5.2, 5.3.4, 5.3.5

[Vio09c] Emanuele Viola. The sum of d small-bias generators fools polynomials of degree d. Comput. Complexity, 18(2):209–217, 2009. 1, 3, 5.3, 5.5

[Vio12] Emanuele Viola. The complexity of distributions. SIAM J. Comput., 41(1):191–218, 2012. 2.4, 2.4

[Vio14] Emanuele Viola. Randomness buys depth for approximate counting. Comput. Complexity, 23(3):479–508, 2014. 2, 2.1.2, 4.1, 4.1

[Vio17] Emanuele Viola. Special topics in complexity theory. Lecture notes of the class taught at Northeastern University, 2017. 3.1

[VW08] Emanuele Viola and Avi Wigderson. Norms, XOR lemmas, and lower bounds for polynomials and protocols. Theory Comput., 4:137–168, 2008. 5.2

[Wat13] Thomas Watson. Pseudorandom generators for combinatorial checkerboards. Comput. Complexity, 22(4):727–769, 2013. 2

[Wil12] Richard M. Wilson. Combinatorial analysis lecture notes. 2012. 5.4.2, 5.27, 5.28

[Yao82] Andrew C. Yao. Theory and applications of trapdoor functions. In 23rd annual symposium on foundations of computer science (Chicago, Ill., 1982), pages 80–91. IEEE, New York, 1982. 1

[Yao85] Andrew C. Yao. Separating the polynomial-time hierarchy by oracles. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), pages 1–10, Oct 1985. 2.1

Appendix A

Fooling read-once DNF formulas

In this chapter we state and prove that every $\varepsilon$-biased distribution with $\varepsilon = m^{-O(\log(1/\delta))}$ $\delta$-fools any read-once DNF formula with $m$ terms. This follows directly from Lemma 5.2 in [CRS00].

Claim A.1. Let $\varphi$ be a read-once DNF formula with $m$ terms. For every $1 \le k \le m$, every $\varepsilon$-biased distribution $D$ fools $\varphi$ with error $O(2^{-\Omega(k)} + \varepsilon m^k)$.

Proof. Write $\varphi(x) := \bigvee_{i=1}^{m} C_i$. By Lemma 5.2 in [CRS00], $\bigl|\Pr_{x \sim D}[\varphi(x)] - \Pr_{x \sim \{0,1\}^n}[\varphi(x)]\bigr|$ is upper bounded by

$$2^{-k} + e \cdot e^{-k/(2e)} + \sum_{\ell=1}^{k} \; \sum_{S \subseteq [m] : |S| = \ell} \left| \Pr_{x \sim D}\Bigl[\bigwedge_{i \in S} C_i\Bigr] - \Pr_{x \sim \{0,1\}^n}\Bigl[\bigwedge_{i \in S} C_i\Bigr] \right| .$$

The rest follows from the fact that $D$ fools each $\bigwedge_{i \in S} C_i$ with error $\varepsilon$, because it is an AND of AND terms and hence itself an AND of literals.
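To make explicit how Claim A.1 gives the $m^{-O(\log(1/\delta))}$ bound stated at the beginning of this appendix, here is one way to set the parameters (a sketch, for the regime $C\log(1/\delta) \le m$, where $C$ is a large enough constant absorbing the constants hidden in the $O(\cdot)$ and $\Omega(\cdot)$ of the claim): take
$$k := C\log(1/\delta), \quad \text{so that} \quad O(2^{-\Omega(k)}) \le \delta/2, \qquad \text{and} \qquad \varepsilon := \frac{\delta}{2} \cdot m^{-C\log(1/\delta)}, \quad \text{so that} \quad O(\varepsilon m^k) \le \delta/2.$$
Since $\delta/2 = 2^{-\log(1/\delta)-1} \ge m^{-\log(1/\delta)-1}$ for $m \ge 2$, this $\varepsilon$ is indeed of the form $m^{-O(\log(1/\delta))}$, and the total error is at most $\delta$.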

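As a complement, the following self-contained Python sketch numerically illustrates the phenomenon behind Claim A.1 on toy parameters. It is not part of the thesis: the field size, the choice of DNF, and all identifiers (gf_mul, small_bias_bits, phi) are illustrative assumptions. The generator used is the standard powering construction of small-bias distributions due to Alon, Goldreich, Håstad, and Peralta over GF(2^8), whose n output bits have bias at most n/2^8; the script compares the exact acceptance probability of a read-once DNF under this distribution with the exact probability under uniform bits.

    # Hypothetical sanity check (illustrative, not from the thesis).
    IRRED = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
    M = 8          # field GF(2^M); all 2^(2M) = 65536 seeds are enumerated

    def gf_mul(a, b):
        """Multiply two elements of GF(2^M), represented as M-bit integers."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a >> M:       # reduce once the degree reaches M
                a ^= IRRED
        return r

    def small_bias_bits(x, y, n):
        """Output bit i is the GF(2) inner product <x^(i+1), y>."""
        bits, p = [], x
        for _ in range(n):
            bits.append(bin(p & y).count("1") & 1)
            p = gf_mul(p, x)
        return bits

    # Toy read-once DNF on n = 12 variables: three terms on disjoint blocks.
    TERMS = [range(0, 3), range(3, 7), range(7, 12)]
    N = 12

    def phi(bits):
        return any(all(bits[i] for i in term) for term in TERMS)

    # Exact acceptance probability under the small-bias distribution
    # (full enumeration, so no sampling error; takes a few seconds).
    hits = sum(phi(small_bias_bits(x, y, N))
               for x in range(2**M) for y in range(2**M))
    p_biased = hits / 2**(2 * M)

    # Exact acceptance probability under uniform bits: terms are disjoint,
    # so Pr[phi = 1] = 1 - prod_i (1 - 2^(-|C_i|)).
    p_uniform = 1.0
    for term in TERMS:
        p_uniform *= 1 - 2.0**-len(term)
    p_uniform = 1 - p_uniform

    print(f"small-bias: {p_biased:.4f}  uniform: {p_uniform:.4f}  "
          f"gap: {abs(p_biased - p_uniform):.4f}")

Enumerating all seeds keeps the check exact and deterministic rather than a sampling estimate; with these parameters the bias of the generator is at most 12/256, and the observed gap is well within it.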