Bounded Independence Plus Noise
by
Chin Ho Lee
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science

Northeastern University
August 2019

© 2019 Chin Ho Lee. All rights reserved.

To my parents

Abstract

Bounded Independence Plus Noise
by
Chin Ho Lee
Doctor of Philosophy, Khoury College of Computer Science, Northeastern University
Derandomization is a fundamental research area in theoretical computer science. In the past decade, researchers have been able to derandomize a number of computational problems, leading to breakthrough discoveries such as the proof that SL = L and explicit constructions of Ramsey graphs and optimal-rate error-correcting codes. Bounded-independence and small-bias distributions are two pseudorandom primitives that are used extensively in derandomization. This thesis studies the power of these primitives under the perturbation of noise. We give positive and negative results on these perturbed distributions. In particular, we show that they are significantly more powerful than the unperturbed ones, and have the potential to resolve long-standing open problems such as proving RL = L and AC0[⊕] lower bounds. As applications, we give new lower bounds on the complexity of decoding error-correcting codes, nearly-optimal pseudorandom generators for old and new classes of tests, and limitations on the sum of small-bias distributions.
Acknowledgements
Foremost, I would like to thank Manu, Daniel, Jon and Omer for taking the time to serve on my thesis committee. This thesis would have been impossible without the wise guidance of Manu. I am truly grateful for his patience during my six years of PhD. His unique perspectives on many matters, whether they are related to research or not, have made a great impact on my life. His striving for simplicity and his clarity of writing will always be examples for me to pursue in the future. I thank Ravi Boppana, Elad Haramaty, Johan Håstad, and Manu, with whom I have collaborated on several results related to this thesis, for sharing their ideas with me. I have learned a great deal about doing research through these collaborations. I am extremely grateful to Amnon Ta-Shma for hosting me at Tel-Aviv University, and to Dean Doron and Gil Cohen for many stimulating discussions during my visit. I thank Salil Vadhan for his excellent course on pseudorandomness at Harvard. My understanding of pseudorandomness would not have been the same without it. I thank Andrej Bogdanov for teaching me Fourier analysis during my master's, a tool that is used extensively in this thesis. I also thank Andrej for being available for discussions whenever I went home for a visit, and for giving me the opportunities to give talks at the theory seminars at CUHK. My PhD life would have been miserable without my friends at Harvard and Khoury, in particular those at MD138 and WVH266. I thank them for keeping the office spaces full of positive energy, and for organizing all kinds of activities to put my mind at ease whenever research got me down. Finally, I thank my beloved Sabrina for her endless support, patience and love on the other side of the world.
Contents

Abstract
Acknowledgements
Table of Contents

1 Introduction
1.1 Contribution of this thesis
1.2 Organization of this thesis

2 Bounded Independence Plus Noise Fools Products
2.1 Our results
2.1.1 Application: The complexity of decoding
2.1.2 Application: Pseudorandomness
2.1.3 Techniques
2.2 Bounded independence plus noise fools products
2.2.1 Preliminaries
2.2.2 Proof of Theorem 2.22
2.3 Proofs for Section 2.1.1
2.4 Pseudorandomness I
2.5 Pseudorandomness II
2.5.1 Proof of Theorem 2.38
2.5.2 A recursive generator
2.6 Pseudorandomness III
2.7 A lower bound on b and η

3 Pseudorandom Generators for Read-Once Polynomials
3.1 Our results
3.1.1 Techniques
3.2 Bounded independence plus noise fools products
3.3 Pseudorandom generators
3.4 On almost k-wise independent variables with small total-variance
3.4.1 Preliminaries
3.4.2 Proof of Lemma 3.12
3.5 Improved bound for bounded independence plus noise fools products
3.5.1 Noise reduces variance of bounded complex-valued functions
3.5.2 XOR lemma for bounded independence
3.5.3 Proof of Theorem 3.9
3.5.4 Proof of Theorem 3.11
3.6 Small-bias plus noise fools degree-2 polynomials
3.7 Proof of Claim 3.8
3.8 Moment bounds for sum of almost d-wise independent variables

4 Fourier Bounds and Pseudorandom Generators for Product Tests
4.1 Our results
4.1.1 Techniques
4.2 Fourier spectrum of product tests
4.2.1 Schur-concavity of g
4.2.2 Lower bound
4.3 Pseudorandom generators
4.3.1 Generator for product tests
4.3.2 Almost-optimal generator for XOR of Boolean functions
4.4 Level-d inequalities

5 Some Limitations of the Sum of Small-Bias Distributions
5.1 Our results
5.2 Our techniques
5.3 Our counterexamples
5.3.1 General circuits
5.3.2 NC2 circuits
5.3.3 One-way log-space computation
5.3.4 Depth-3 circuits, DNF formulas and AC0 circuits
5.3.5 Mod 3 linear functions
5.3.6 Sum of k copies of small-bias distributions
5.4 Mod 3 rank of k-wise independence
5.4.1 Lower bound for almost k-wise independence
5.4.2 Pairwise independence
5.5 Complexity of decoding

Bibliography

A Fooling read-once DNF formulas

Chapter 1
Introduction
The theory of pseudorandomness studies explicit constructions of objects that appear random to restricted classes of tests. It has numerous connections to computer science and mathematics, including algorithms, computational complexity, cryptography and combinatorics. In particular, the study of pseudorandomness is indispensable in understanding the power of randomness in computation, a fundamental research area in theoretical computer science. While it is known that certain tasks in areas such as cryptography and distributed computing are impossible without randomness, researchers have been able to derandomize a number of computational problems, showing that randomness often does not give us significant savings in computational resources over determinism.

One central open question in derandomization is the BPP vs. P question: can probabilistic polynomial-time algorithms be made deterministic without a drastic blow-up in running time? This question is largely open, as resolving it would imply circuit lower bounds that seem beyond reach given our current techniques [IKW02, KI04]. Because of this, research in derandomization can be divided into two directions.

• Conditional results: One line of research, pioneered by Blum and Micali [BM84], and Yao [Yao82], shows that derandomization can be realized under the assumption that hard functions exist. This approach has found a lot of success in cryptography, where cryptographic primitives are constructed based on the intractability of several specific computational problems. Indeed, this approach was first proposed for cryptographic applications, and the idea of trading hardness for randomness was first suggested by Shamir [Sha81], who constructed pseudorandom sequences assuming the hardness of the RSA encryption function. The seminal work of Nisan and Wigderson [NW94] shows that derandomizing BPP is possible under weaker complexity assumptions.
The subsequent work of Impagliazzo and Wigderson [IW99] showed that if some problem in the class E requires circuits of exponential size, then BPP = P. Since then, researchers have tried to optimize the hardness–randomness trade-off.

• Unconditional results: Another line of research turns to derandomizing restricted classes of computational models for which we can prove unconditional lower bounds.
Two major classes of tests that have received a lot of attention since the late 80s are constant-depth circuits [AW89] and space-bounded computation [AKS87]. A major open problem in derandomizing space-bounded computation is the RL vs. L question, a space analogue of BPP vs. P, which asks whether randomized logarithmic-space computation can be made deterministic without a drastic blow-up in space. In contrast to proving BPP = P, currently no major "obstacle" is known to resolving this problem.

The focus of this thesis will be on unconditional results. In this direction, several fascinating results were discovered in the past two decades.

• Primality testing: It was shown in the 70s that there exist randomized polynomial-time algorithms that determine if a given integer is prime [Rab80, SS77]. In 2002, Agrawal, Kayal and Saxena [AKS04] showed that primality testing can be solved deterministically in polynomial time.

• Undirected s-t connectivity: A randomized polynomial-time algorithm was given in the 70s to decide if two points are connected in an undirected graph [AKL+79]. In 2005, Reingold [Rei05] showed that this problem can be solved in deterministic logarithmic space, proving SL = L (see also [RV05]).

• Ramsey graphs: In one of the first applications of the probabilistic method, Erdős [Erd47] showed that random graphs on n vertices are (2 log n)-Ramsey, meaning they have no clique or independent set of size 2 log n. The recent breakthrough results of Chattopadhyay and Zuckerman [CZ16], and Cohen [Coh16], used tools from pseudorandomness to construct explicit graphs that are 2^{(log log n)^{O(1)}}-Ramsey.

A standard approach in derandomization is the construction of pseudorandom generators (PRGs), a fundamental object in pseudorandomness. Indeed, at least the latter two items above are proved using pseudorandom generators.
A pseudorandom generator is a deterministic efficient algorithm that takes a few random bits, called the seed, as input, and stretches them to a longer output that fools a class of tests, meaning that no test in the class can distinguish the output distribution of the generator (over a uniform seed) from truly random. One way to derandomize a randomized algorithm is to enumerate the seeds of the generator, and simulate the algorithm using the outputs of the generator in place of truly random bits.

Two primitives that are used extensively as building blocks in constructing pseudorandom generators are bounded-independence and small-bias distributions, introduced by Carter and Wegman [CW79], and Naor and Naor [NN93], respectively. These primitives alone are often not sufficient for many applications, and researchers have proposed different ways of combining them to fool different classes of tests. In this thesis we focus on two of them.

In the late 80s, Ajtai and Wigderson [AW89] constructed the first pseudorandom generator for polynomial-size constant-depth circuits. Their construction relies on the fact that circuits simplify under random restrictions, a procedure that selects a random subset of the input bits and sets their values to random, and that the simplified circuits can be fooled by bounded independence. To obtain their generator, they apply this argument recursively. This approach was revived and refined recently by the work of [GMR+12], and since then has been used extensively to construct pseudorandom generators for various classes of tests [TX13, GKM18, GY14, RSV13, SVW17, HT18, CHRT18, FK18, MRT18, CSV15, ST18, DHH18].
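The seed-enumeration approach described above can be sketched concretely. The following is an illustrative example of ours, not taken from the thesis: it uses the classic pairwise-independent generator G(a, b) = (a·i + b mod p)_{i ∈ Z_p}, verifies pairwise independence exhaustively for a tiny prime, and then derandomizes by averaging over all p^2 seeds instead of sampling.

```python
# Illustrative sketch (not from the thesis): derandomization by seed
# enumeration, using the pairwise-independent generator
# G(a, b) = (a*i + b mod p) for i in Z_p.
from itertools import product

p = 5  # small prime, chosen so exhaustive checking is feasible

def G(a, b):
    """Stretch the 2-symbol seed (a, b) to p pairwise-independent symbols."""
    return tuple((a * i + b) % p for i in range(p))

# Pairwise independence, checked exhaustively: for distinct positions (i, j)
# and any values (u, v), exactly one of the p^2 seeds maps i to u and j to v,
# i.e. the pair is hit with probability exactly 1/p^2.
for i in range(p):
    for j in range(p):
        if i == j:
            continue
        for u in range(p):
            for v in range(p):
                hits = sum(1 for a, b in product(range(p), repeat=2)
                           if G(a, b)[i] == u and G(a, b)[j] == v)
                assert hits == 1

# Derandomization: a randomized procedure whose behavior is preserved under
# pairwise independence can be simulated deterministically by enumerating
# all p^2 seeds in place of truly random symbols.
def deterministic_average(test):
    return sum(test(G(a, b)) for a, b in product(range(p), repeat=2)) / p**2

# Example: each individual output symbol averages to (p-1)/2, as under uniform.
avg = deterministic_average(lambda x: x[3])
assert avg == (p - 1) / 2
```

The point of the sketch is the cost accounting: the deterministic simulation pays a factor of p^2 (the number of seeds), which is polynomial, instead of the p^p cost of enumerating all random strings.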
One notable result is the construction of generators by Meka, Reingold and Tal [MRT18], which fool the class of width-3 read-once branching programs, improving on the O(log^2 n) seed length of the generators from the 90s [Nis92, INW94].

In the last decade, Bogdanov and Viola [BV10a] proposed taking the sum of independent copies of small-bias distributions to fool F_2-polynomials. This class of distributions is known to be significantly more powerful than a single copy, and was shown to give optimal PRGs for F_2-polynomials of low degree [Lov09, Vio09c]. It is an open question whether this approach fools polynomials of higher degree. If the construction worked for every degree d = log^{O(1)} n, it would make progress on long-standing open problems in circuit complexity regarding constant-depth circuits with parity gates [Raz87]. This question is implicit in the works [BV10a, Lov09, Vio09c] and explicit in the survey [Vio09b, Chapter 1] (Open question 4).

Besides F_2-polynomials, Reingold and Vadhan (personal communication) asked whether there exists a constant c such that the sum of two independent copies of any n^{-c}-biased distribution fools one-way logarithmic space, also known as read-once polynomial-width branching programs, which would imply RL = L. It is known that a small-bias distribution fools width-2 read-once branching programs (Saks and Zuckerman; see also [BDVY13], where a generalization is obtained). However, no such result is known for width-3 programs.
1.1 Contribution of this thesis
In this thesis, we study the two aforementioned proposals and attack them through the following seemingly unrelated natural question.
What is the power of bounded independence if we perturb it with noise?
One starting point of this thesis is the following observation. It is known that both bounded-independence and small-bias distributions fail to fool certain very simple tests. For example, it is well known that bounded independence completely fails to fool the parity function. Consider the distribution D that is uniform over n-bit strings with parity 0. It is straightforward to see that D is (n − 1)-wise independent. Yet the parity under D is always 0, whereas under the uniform distribution the parity equals 1 with probability 1/2. Likewise, one can show that small bias does not fool the mod 3 function, the language that contains all n-bit strings with Hamming weight divisible by 3, because the uniform distribution over these strings is small-biased.

However, both examples above break completely if we perturb just a few bits of the distribution randomly, i.e., set a few bits to uniform. For parity, it suffices to perturb just one bit and the expectation of parity will be the same as under uniform. Similarly, if we perturb a few bits of the input, then the expectation of mod 3 becomes exponentially close to uniform in the number of perturbed bits. Thus, these primitives appear to become more powerful under the perturbation of noise, and the goal of this thesis is to understand the power of such distributions from various angles. We obtain both positive and negative results.
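The parity observation is easy to verify by exhaustive computation. The following sanity check is ours, not the thesis's: it confirms that the uniform distribution over parity-0 strings is (n − 1)-wise independent, that parity distinguishes it maximally from uniform, and that setting a single bit to uniform makes the distinguishing advantage vanish.

```python
# Sanity check (ours) of the parity counterexample: D = uniform over n-bit
# strings of parity 0 is (n-1)-wise independent, yet parity distinguishes it
# from uniform -- until a single bit is set to uniform ("noise").
from itertools import product

n = 8
D = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]

# (n-1)-wise independence: projected onto any n-1 coordinates, D is uniform.
for drop in range(n):
    counts = {}
    for x in D:
        y = x[:drop] + x[drop + 1:]
        counts[y] = counts.get(y, 0) + 1
    assert all(c == 1 for c in counts.values())  # every pattern exactly once

def sign_parity(x):
    return (-1) ** (sum(x) % 2)

# Under D the parity bias is maximal (expectation 1); under uniform it is 0.
assert sum(sign_parity(x) for x in D) / len(D) == 1.0

# Perturb one bit: replace coordinate 0 by a uniform bit. The bias vanishes.
noisy = sum(sign_parity((b,) + x[1:]) for x in D for b in (0, 1)) / (2 * len(D))
assert noisy == 0.0
```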
Positive results. We study the power of bounded independence plus noise through a new class of tests called product tests. These tests are products of k functions f_i : Z_m^ℓ → C_{≤1}, where the inputs of the f_i are disjoint subsets of the n variables and C_{≤1} is the complex unit disk. Our main positive result shows that if we add a little noise to any distribution with bounded independence, or with small bias, then the distribution fools product tests with good error. This has found applications in constructing pseudorandom generators and proving lower bounds on the complexity of decoding error-correcting codes. Along the way, we obtain several structural results about product tests, such as bounds on their Fourier spectrum and their distinguishing advantage in the coin problem, which play a critical role in obtaining a pseudorandom generator for these tests with seed length close to optimal.
Negative results. Using a connection between bounded independence and linear error-correcting codes, we give a framework for exhibiting several classes of tests that can distinguish bounded independence plus noise from uniform. With this framework and another completely different approach, we construct several counterexamples showing some limitations of the sum of small-bias distributions.

Several ideas in this thesis have played a role in the recent developments on derandomizing space-bounded computation. We gave the first generator with seed length better than O(log^2 n) for the class of read-once F_2-polynomials. This class of tests was an obstacle to constructing better pseudorandom generators for read-once constant-width branching programs, as noted by several researchers including Trevisan [Tre10] and Vadhan (personal communication). Subsequently, several ideas in our analyses were adapted by Meka, Reingold and Tal to construct a generator for width-3 read-once branching programs with seed length better than the O(log^2 n) seed length from the 90s. At about the same time, Forbes and Kelly [FK18], by applying a small but elegant modification to one of our analyses, simplified and improved the analysis in [MRT18], and obtained a pseudorandom generator for unordered polynomial-width read-once branching programs with seed length O(log^3 n), improving the previous constructions of generators in the unordered setting. In addition, using a generator in this thesis, Viola [?] recently gave the first lower bound for randomized Turing machines with a read-only input tape. Kopparty, Shaltiel, and Silbak [KSS19] constructed list-decodable codes against space-bounded channels whose encoding and decoding algorithms run in quasilinear time.
1.2 Organization of this thesis
Chapter 2. We formally define bounded independence, noise, and product tests, and show that bounded independence plus noise fools product tests. We develop two applications of this type of result. First, we prove communication lower bounds for decoding noisy codewords of length n split among k parties. For Reed–Solomon codes of dimension n/k where k = O(1), we show that Ω(ηn) − O(log n) bits of communication are required to decode one message symbol from a codeword with ηn errors, and that communication O(ηn log n) suffices.
Second, we obtain pseudorandom generators. We can ε-fool product tests f : {0,1}^n → C_{≤1} under any permutation of the bits with seed lengths 2ℓ + Õ(k log(1/ε)) and O(ℓ) + Õ(√(ℓk) log(1/ε)). Previous generators have seed lengths ≥ ℓk/2 or ≥ ℓ√(ℓk). For the special case where the k bounded functions have range {0, 1}, the previous generators have seed length ≥ (ℓ + log k) log(1/ε).
Chapter 3. We give a different proof that bounded independence plus noise fools product tests, and then construct pseudorandom generators with improved seed lengths for several classes of tests. First we consider the class of read-once polynomials over GF(2) in n variables. For error ε we obtain seed length Õ(log(n/ε)) · log(1/ε), where Õ hides lower-order terms. This is optimal up to the factor Õ(log(1/ε)). The previous best seed length was polylogarithmic in n and 1/ε. Second we consider product tests f : {0,1}^n → C_{≤1}. In this chapter, we obtain seed length ℓ · polylog(n/ε). This implies better generators for other classes of tests. Moreover, if the f_i have output range independent of ℓ and k (e.g. {−1, 1}), then we obtain seed length Õ(ℓ + log(k/ε)) · log(1/ε). This is again optimal up to the factor Õ(log(1/ε)), while the seed length in the previous chapter is ≥ √k.
Chapter 4. We study the Fourier spectrum of product tests f : {0,1}^{ℓk} → {−1, 0, 1}. We prove that for every positive integer d,

\[ \sum_{S \subseteq [\ell k] : |S| = d} |\hat f_S| = O\big(\min\{\ell, \sqrt{\ell \log(2k)}\}\big)^d. \]
Our upper bounds are tight up to a constant factor in the O(·). Our proof uses Schur-convexity, and builds on a new "level-d inequality" that bounds $\sum_{|S|=d} \hat f_S^2$ from above for any [0, 1]-valued function f in terms of its expectation, which may be of independent interest. As a result, we construct pseudorandom generators for product tests with seed length Õ(ℓ + log(k/ε)), which is optimal up to polynomial factors in log ℓ, log log k and log log(1/ε). We also extend our results to product tests whose range is [−1, 1].
Chapter 5. We present two approaches to constructing ε-biased distributions D on n bits and functions f : {0,1}^n → {0, 1} such that the XOR of two independent copies (D + D) does not fool f. Using them, we give constructions for any of the following choices:

1. ε = 2^{−Ω(n)} and f is in P/poly;
2. ε = 2^{−Ω(n/ log n)} and f is in NC2;
3. ε = n^{−c} and f is a one-way space-O(c log n) algorithm, for any c;
4. ε = n^{−Ω(1)} and f is a mod 3 linear function.

All the results give one-sided distinguishers, and extend to the XOR of more copies for suitable ε. We also give conditional results for AC0 and DNF formulas.
Chapter 2
Bounded Independence Plus Noise Fools Products
At least since the seminal work [CW79], the study of bounded independence has received a lot of attention in theoretical computer science. In particular, researchers have analyzed various classes of tests that cannot distinguish distributions with bounded independence from uniform. Such tests include (combinatorial) rectangles [EGL+98] (cf. [CRS00]), bounded-depth circuits [Baz09, Raz09, Bra10, Tal17], and halfspaces [DGJ+10, GOWZ10, DKN10], to name a few. We say that such tests are fooled by distributions with bounded independence.
Definition 2.1 (Bounded independence). A distribution D over Z_m^n is b-wise independent, or b-uniform, if any b symbols of D are uniformly distributed over Z_m^b.

In this thesis we consider fooling a new class of tests called product tests. These are functions which can be written as a product of arbitrary bounded functions defined on disjoint inputs.
Definition 2.2 (Product tests). A function f : Z_m^n → C_{≤1} is a product test with k functions of input length ℓ if there exist k disjoint subsets I_1, I_2, ..., I_k ⊆ {1, 2, ..., n} of size ≤ ℓ such that f(x) = ∏_{i≤k} f_i(x_{I_i}) for some functions f_i with range C_{≤1}. Here C_{≤1} is the complex unit disk {z ∈ C : |z| ≤ 1}, and x_{I_i} denotes the |I_i| symbols of x indexed by I_i.
Throughout this thesis, we will often restrict the range of each f_i to a subset R of C_{≤1}. For example, R can be the real interval [−1, 1], or the sets {0, 1} and {−1, 1}. We will sometimes write R-product tests to specify the range of the product tests under discussion. We note that these tests make sense already for ℓ = 1 and large m (and in fact, as we will see, have been considered for such parameters in the literature). But it is essential for our applications that the input set of the f_i has a product structure, so we think of ℓ as being large. We can choose m = 2 for almost all of our results. In this case, each f_i simply has domain {0,1}^ℓ.

The class of product tests was first introduced by Gopalan, Kane and Meka under the name of Fourier shapes [GKM18]. However, in their definition the subsets I_i are fixed. Motivated by the recent constructions of pseudorandom generators against unordered tests,
which are tests that read the input bits in arbitrary order [BPW11, IMZ19, RSV13, SVW17], we consider the generalization in which the subsets I_i can be arbitrary as long as they have bounded size and are pairwise disjoint. Constructing generators handling arbitrary order is significantly more challenging, because classical space-bounded generators such as Nisan's [Nis92] only work in fixed order [Tzu09, BPW11].

Product tests include as a special case several classes of tests which have been studied in the literature. Specifically, when the range of the functions f_i is {0, 1}, product tests correspond to the AND of disjoint Boolean functions, also known as the well-studied class of combinatorial rectangles [AKS87, Nis92, NZ96, INW94, EGL+98, ASWZ96, Lu02, Vio14, GMR+12, GY14].

Definition 2.3 (Combinatorial rectangles). A combinatorial rectangle is a product test where each f_i has output in {0, 1}.

Product tests also generalize some other classes of tests. For example, {−1, 1}-product tests correspond to the XOR of disjoint Boolean functions, also known as the class of combinatorial checkerboards [Wat13]. The work [GKM18] highlights the unifying role of C_{≤1} product tests by showing that any distribution that fools product tests also fools a number of other tests considered in the literature, including generalized halfspaces [GOWZ10] and combinatorial shapes [GMRZ13, De15]. Product tests can also be generalized to capture the important class of read-once space computation. Specifically, Reingold, Steinke and Vadhan [RSV13] showed that the class of read-once width-w branching programs can be encoded as product tests with outputs in {0, 1}^{w×w}, the set of w × w Boolean matrices.
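A product test is simple to write down explicitly. The following minimal sketch is ours (the helper names are not from the thesis): it builds a product test over m = 2 from k functions on disjoint blocks, and instantiates it with the parity example used repeatedly below.

```python
# A minimal sketch (ours) of Definition 2.2 over m = 2: k functions on
# disjoint blocks of at most l coordinates each, multiplied together.
from functools import reduce

def make_product_test(blocks, fs):
    """blocks: disjoint tuples of coordinate indices; fs: functions taking a
    tuple of bits to a complex number of modulus <= 1."""
    def f(x):
        return reduce(lambda a, b: a * b,
                      (fi(tuple(x[j] for j in I)) for I, fi in zip(blocks, fs)),
                      1)
    return f

# Example: parity on n = lk bits as a {-1, 1}-product test, where each of
# the k functions computes the parity of its own l-bit block.
l, k = 3, 4
blocks = [tuple(range(i * l, (i + 1) * l)) for i in range(k)]
sign_parity = lambda bits: (-1) ** (sum(bits) % 2)
f = make_product_test(blocks, [sign_parity] * k)

x = (1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1)  # Hamming weight 6, so parity 0
assert f(x) == 1
# Restricting each f_i's range to {0, 1} instead yields a combinatorial
# rectangle (Definition 2.3).
```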
Bounded independence vs. products. A moment of thought reveals that bounded independence completely fails to fool product tests. Indeed, note that the parity function on n := ℓk bits is a product test: set m = 2 and let each of the k functions compute the parity of its ℓ-bit input, with output in {−1, 1}. Now consider the distribution D which is uniform on n − 1 bits and has the last bit equal to the parity of the first n − 1 bits. D has independence n − 1, which is just one short of maximum. And yet the expectation of parity under D is 1, whereas the expectation of parity under uniform is 0.

The parity counterexample is the simplest instance of a general obstacle which has more manifestations. For another example, define g_i := (1 − f_i)/2, where the f_i are as in the previous example. Each g_i has range in {0, 1}, and so ∏_i g_i is a combinatorial rectangle. But the expectations of ∏_i g_i under D and uniform differ by 2^{−k}. This error is too large for the applications in communication complexity and streaming, where we have to sum over 2^k rectangles. Indeed, jumping ahead, having a much lower error is critical for our applications. Finally, the obstacle arises even if we consider distributions with small bias [NN93] instead of bounded independence. Indeed, the uniform distribution D over n bits whose inner product modulo 2 equals one has bias 2^{−Ω(n)}, but inner product is a nearly balanced function which can be written as a product, implying that its expectations under D and uniform differ by 1/2 − o(1).

The starting point of this thesis is the observation that all these examples break completely if we perturb just a few bits of D randomly.
Definition 2.4 (Noise). We denote by N(m, n, η) the noise distribution over Z_m^n where the symbols are independent and each of them is set to uniform with probability η and to 0 otherwise. We simply write N when the parameters are clear from the context.
For parity, it suffices to perturb one bit and the expectation under D will be 0. For inner product, the distance between the expectations shrinks exponentially with the number of perturbed bits.
2.1 Our results
The main result in this thesis is that this is a general phenomenon: if we add a little noise to any distribution with bounded independence, or with small bias, then the resulting distribution fools product tests with good error bounds. We first state the results for bounded independence.
Theorem 2.5 (Bounded independence plus noise fools products). Let f_1, ..., f_k : Z_m^ℓ → C_{≤1} be k functions with μ_i = E[f_i]. Set n := ℓk. Let b ≥ ℓ and let D be a b-uniform distribution over Z_m^n. Let N be the noise distribution from Definition 2.4. Write D = (D_1, D_2, ..., D_k) where each D_i is in Z_m^ℓ, and similarly for N. Then

\[ \Big| \mathbb{E}\Big[ \prod_{i \le k} f_i(D_i + N_i) \Big] - \prod_{i \le k} \mu_i \Big| \le k(1-\eta)^{\Omega(b^2/n)}. \]
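The shape of the bound in Theorem 2.5 can be checked by hand on the parity example. The computation below is our own sanity check, not part of the thesis: with D uniform over parity-0 strings (which is (n − 1)-uniform) and the product test computing parity in k blocks, each μ_i = 0, and exact enumeration gives |E[∏ f_i(D_i + N_i)]| = (1 − η)^n, matching the k(1 − η)^{Ω(b²/n)} form for b = Ω(n).

```python
# Sanity check (ours): take D uniform over parity-0 strings of length n
# (b-uniform with b = n-1) and the {-1, 1}-valued parity product test.
# Exact enumeration over D and over the noise N(2, n, eta) should give
# expectation exactly (1 - eta)^n, while all mu_i = 0.
from itertools import product, combinations

n, eta = 6, 0.25
D = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]

def sign_parity(x):
    return (-1) ** (sum(x) % 2)

# Enumerate the subset S of coordinates that the noise rerandomizes (each
# coordinate independently with probability eta), then the uniform values
# on S, and add the noise to D coordinate-wise mod 2.
total = 0.0
for x in D:
    for s in range(n + 1):
        for S in combinations(range(n), s):
            weight = eta ** s * (1 - eta) ** (n - s) / 2 ** s
            for vals in product((0, 1), repeat=s):
                y = list(x)
                for coord, v in zip(S, vals):
                    y[coord] = (y[coord] + v) % 2
                total += weight * sign_parity(y)
exact = total / len(D)
assert abs(exact - (1 - eta) ** n) < 1e-12
```

Since parity(D) is identically 0, the expectation equals E[(−1)^{parity(N)}] = (1 − η)^n, which is why the theorem's error bound cannot be improved much when b = Ω(n).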
In Section 2.2 we prove a more general result that applies to distributions which are almost b-wise independent [NN93], and to a wider range of parameters. In that section we also point out that Theorem 2.5 is essentially tight when b = Ω(n). It is an interesting question whether the bounds are tight even for b = o(n), and we will come back to this question in the next chapter. We stress that the k points D_1, D_2, ..., D_k ∈ Z_m^ℓ in the theorem may not even be pairwise independent; only the n symbols of D are b-wise independent. Also note that the theorem is meaningful for a wide range of the noise parameter η: we can have η constant, which means that we are perturbing a constant fraction of the symbols, or we can have η = O(1/n), which means that we are only perturbing a constant number of symbols, just like in the observation mentioned above. To illustrate this setting, consider for example k = O(1) and b = ℓ. We can achieve an error bound of ε by setting η = c/n for a c that depends only on ε.

We note that a noise vector can be equivalently viewed as a random restriction. With this interpretation, our results show that on average a random restriction of a product test is a function f′ that is simpler in the sense that f′ is fooled by any (ℓ, δ)-biased distribution, for certain values of δ. (The latter property has equivalent formulations in terms of the Fourier coefficients of f′; see [Baz09].) Thus, our results fall in the general theme "restrictions simplify functions" that has been mainstream in complexity theory since at least the work of Subbotovskaya [Sub61]. For an early example falling in this theme, consider AC0 circuits. There are distributions with super-constant independence which do not fool AC0 circuits of bounded depth and polynomial size. (Take the uniform distribution conditioned on the parity of the first log n bits being 1, and use the fact that such circuits can compute parity on log n bits.) On the other hand, the switching lemma [FSS84, Ajt83, Yao85, Hås, Hås14, IMP] shows that randomly restricting all but a 1/polylog fraction of the variables collapses the circuit to a function that depends only on c = O(1) variables, and such a function is fooled by any c-wise independent distribution. Thus, adding noise dramatically reduces the amount of independence that is required to fool AC0 circuits. For a more recent example, Lemma 7.2 in [GMR+12] shows that for a special case of AC0 circuits, read-once CNFs, one can restrict all but a constant fraction of the variables and then the resulting function is fooled by any ε-biased distribution for a certain ε = 1/ℓ^{ω(1)}, which is larger than the bias that would be required had we not applied a restriction.

We are not aware of prior work which applies to arbitrary functions as in our theorems. Another difference between our results and all the previous works that we are aware of lies in the parameter η. In previous works η is large, in particular η = Ω(1), which corresponds to restricting many variables. We can instead set η arbitrarily, and this flexibility is used in both of our applications.
2.1.1 Application: The complexity of decoding

Error-correcting codes are a fundamental concept with myriad applications in computer science. It is relevant to several of these, and perhaps also natural, to ask what is the complexity of basic procedures related to error-correcting codes. In this chapter we focus on decoding. The question of the complexity of decoding has already been addressed in [BYRST02, Baz05, Gro06]. However, all previous lower bounds that we are aware of are perhaps not as strong as one may hope. First, they provide no better negative results for decoding than for encoding. But common experience shows that decoding is much harder! Second, they do not apply to decision problems, but only to multi-output problems such as computing the entire message. Third, they apply to small-space algorithms but not to stronger models such as communication protocols.

In this chapter we obtain new lower bounds for decoding which overcome these limitations. First, we obtain much stronger bounds for decoding than for encoding. For example, we prove below that decoding a message symbol from a Reed–Solomon codeword of length q with Ω(q) errors requires Ω(q) communication. On the other hand, encoding is a linear map, and so one can compute any symbol with just O(log q) communication (or space). This exponential gap may provide a theoretical justification for the common experience that decoding is harder than encoding. Second, our results apply to decision problems. Third, our results apply to stronger models than space-bounded algorithms. Specifically, our lower bounds are proved in the k-party "number-in-hand" communication complexity model, where each of k collaborating parties receives a disjoint portion of the input. The parties communicate by broadcast (a.k.a. writing on a blackboard). For completeness we give next a definition.
Although we only define deterministic protocols, our lower bounds in fact bound the correlation between such protocols and the hard problem, and so also hold for distributions of protocols (a.k.a. allowing the parties to share a random string).
Definition 2.6 (Number-in-hand protocols). A k-party number-in-hand, best-partition, communication protocol for a function f : Z_m^n → Y, where k divides n, is given by a partition of [n] into k sets S_1, S_2, ..., S_k of equal size n/k and a binary tree. Each internal node v of the tree is labeled with a set S_v ∈ {S_1, S_2, ..., S_k} and a function f_v : Z_m^{n/k} → {0, 1}, and has two outgoing edges labeled 0 and 1. The leaves are labeled with elements from Y. On input x ∈ Z_m^n the protocol computes y ∈ Y following the root-to-leaf path where from node v we follow the edge labeled with the value of f_v on the n/k symbols of x corresponding to S_v. The communication cost of the protocol is the depth of the tree.
Note that we insisted that k divides n, but all the results can be generalized to the case when this does not hold. However this small additional generality makes the statements slightly more cumbersome, so we prefer to avoid it. Jumping ahead, for Reed–Solomon codes this will mean that the claims do not apply as stated to prime fields (but again can be modified to apply to such fields). Again for completeness, we give next a definition of space-bounded algorithms. For simplicity we think of the input as being encoded in bits.
Definition 2.7 (One-way, bounded-space algorithm). A width-w (a.k.a. space-log w) one-way algorithm (or branching program, or streaming algorithm) on n bits consists of a layered directed graph with n + 1 layers. Each layer has w nodes, except the first layer, which has 1. Each node in layer i ≤ n has two edges, labeled 0 and 1, connecting to nodes in layer i + 1. Each node in layer n + 1 is labeled with an output element. On an n-bit input, the algorithm follows the path corresponding to the input, reading the input in a one-way fashion (so layer i reads the i-th input bit), and then outputs the label of the last node. (For Boolean outputs it suffices for the last layer to have 2 nodes.)
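As a concrete illustration, here is a minimal Python sketch of Definition 2.7. The encoding of the layers and labels is our own choice (not from the text); the example instantiates a width-2 program computing parity.

```python
# Sketch of a width-w one-way branching program (Definition 2.7).
# Encoding (our choice): layers[i][v] = (successor on bit 0, successor on bit 1);
# labels gives the output element at each node of the last layer.

def run_branching_program(layers, labels, x):
    """Follow the root-to-end path for input bits x, one bit per layer."""
    v = 0  # the single start node in the first layer
    for i, bit in enumerate(x):
        v = layers[i][v][bit]
    return labels[v]

# Example: a width-2 program computing parity of 3 bits.
# State 0 = "parity so far is 0", state 1 = "parity so far is 1".
parity_layers = [{0: (0, 1), 1: (1, 0)} for _ in range(3)]
parity_labels = {0: 0, 1: 1}

assert run_branching_program(parity_layers, parity_labels, [1, 0, 1]) == 0
assert run_branching_program(parity_layers, parity_labels, [1, 1, 1]) == 1
```

This also makes the simulation mentioned below concrete: the only information that needs to cross a layer boundary is the current node, i.e., log w bits.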
We note that a space-s one-way algorithm can be simulated by a k-party protocol with communication sk. Thus our negative results apply to space-bounded algorithms as well. In fact, this simulation only uses one-way communication and a fixed partition (corresponding to the order in which the algorithm reads the input). But our communication lower bounds hold even for two-way communication and for any partition of the input into k parties, as in Definition 2.6. Our lower bound holds when the uniform distribution over the code is b-uniform.
Definition 2.8. A code C ⊆ F_q^n is b-uniform if the uniform distribution over C is b-uniform.
The following standard fact relates the above definition to the dual distance of the code.
Fact 2.9. Let X be the uniform distribution over a linear code C ⊆ F_q^n. Then X is d-wise independent if and only if the dual of C has minimum distance ≥ d + 1.
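Fact 2.9 can be sanity-checked by brute force on a small example. The sketch below (parameters of our own choosing) uses the [7,4] binary Hamming code, whose dual, the simplex code, has minimum distance 4; so the uniform distribution over the code should be exactly 3-wise, but not 4-wise, independent.

```python
from itertools import product, combinations

# Generator matrix of the [7,4] binary Hamming code.
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
n = 7

def span(rows):
    """All F_2 linear combinations of the given rows."""
    return {tuple(sum(c*r[j] for c, r in zip(coeffs, rows)) % 2
                  for j in range(n))
            for coeffs in product([0, 1], repeat=len(rows))}

C = span(G)
dual = {v for v in product([0, 1], repeat=n)
        if all(sum(a*b for a, b in zip(v, c)) % 2 == 0 for c in C)}
dual_dist = min(sum(v) for v in dual if any(v))

def is_d_wise_independent(C, d):
    # Uniform over C must induce the uniform distribution on any d coordinates.
    for S in combinations(range(n), d):
        counts = {}
        for c in C:
            key = tuple(c[i] for i in S)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 2**d or len(set(counts.values())) != 1:
            return False
    return True

assert dual_dist == 4
assert is_d_wise_independent(C, 3)       # d = dual distance - 1: holds
assert not is_d_wise_independent(C, 4)   # fails at d = dual distance
```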
We state next a lower bound for distinguishing a noisy codeword from uniform. The “-1” in the assumption b ≥ n/k − 1 will be useful in Theorem 2.12.
Theorem 2.10 (Distinguishing noisy codewords from uniform is hard). Let C ⊆ F_q^n be a b-uniform code. Let N be the noise distribution from Definition 2.4. Let k be an integer dividing n such that b ≥ n/k − 1. Let P : F_q^n → {0, 1} be a k-party protocol using c bits of communication. Then
Pr[P(C + N) = 1] − Pr[P(U) = 1] ≤ ε for ε = 2^{c + log n + O(1) − Ω(ηb/k)},
where C and U denote the uniform distributions over the code C and over F_q^n, respectively. We now make some remarks on this theorem. First, we note that an (ℓk)-party protocol can be simulated by a k-party protocol, so in this sense the lower the number of parties the stronger the lower bound. Also, the smallest number of parties to which the theorem can apply is k = n/b, because for k = n/b − 1 one can design b-uniform codes such that the distribution C + N can be distinguished well from uniform by just one party, cf. Section 2.7. And our lower bound applies for that number. The theorem is non-trivial whenever b = ω(√n), but we illustrate it in the setting of b = Ω(n), which is typical in coding theory, as we are also going to discuss. In this setting we can also set k = n/b = O(1). Hence for ε ≥ 1/poly(n), the communication lower bound is
c ≥ Ω(ηn) when η ≥ C log n/n for a universal constant C. When η = Ω(1) this becomes Ω(n). Note that this bound is within an O(log q) factor of the bit-length of the input, which is O(n log q), and within a constant factor if q = O(1). We prove an essentially matching upper bound in terms of η, stated next. The corresponding distinguisher is a simple variant of syndrome decoding which we call “truncated syndrome decoding.” It can be implemented as a small-space algorithm with one-sided error, and works even against adversarial noise. So the theorems can be interpreted as saying that syndrome decoding uses an optimal amount of space. We denote by V(t) = V_{m,n}(t) the volume of the m-ary Hamming ball in n dimensions of radius t, i.e., the number of x ∈ Z_m^n with at most t non-zero coordinates.
Theorem 2.11 (Truncated syndrome distinguishing). Let C ⊆ F_q^n be a linear code with dimension d. Given t and δ > 0, define s := ⌈log_q(V_{q,n}(t)/δ)⌉. If d ≤ n − s there is a one-way algorithm A that runs in space s log q such that
1. for every x ∈ C and for every e of Hamming weight ≤ t, A(x + e) = 1, and
2. Pr[A(U) = 1] ≤ δ, where U is uniform in F_q^n.
Moreover, the space bound s log q is at most O(t log(nq/t)) + log 1/δ.
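The idea behind the distinguisher can be illustrated on a toy instance. The sketch below is not the algorithm of Theorem 2.11 itself but a simplified analogue with parameters of our own choosing: the length-5 binary repetition code, t = 1, keeping s = 3 of the 4 parity checks as a "truncated" syndrome.

```python
from itertools import product

# Toy truncated syndrome distinguisher: C = {00000, 11111} in F_2^5,
# t = 1, keeping s = 3 of the 4 parity checks (our own toy choices).
n, t = 5, 1
H_trunc = [[1,1,0,0,0],   # x1 + x2
           [1,0,1,0,0],   # x1 + x3
           [1,0,0,1,0]]   # x1 + x4

def trunc_syndrome(x):
    # Maintainable symbol by symbol in one left-to-right pass with 3 bits
    # of state, matching the space bound s*log q of the theorem.
    return tuple(sum(h*xi for h, xi in zip(row, x)) % 2 for row in H_trunc)

good = {trunc_syndrome(e) for e in product([0, 1], repeat=n) if sum(e) <= t}

def A(x):
    return 1 if trunc_syndrome(x) in good else 0

# (1) every codeword plus a weight-<=t error is accepted:
for c in ([0]*n, [1]*n):
    for e in product([0, 1], repeat=n):
        if sum(e) <= t:
            assert A([(a + b) % 2 for a, b in zip(c, e)]) == 1
# (2) a uniform input is accepted with probability |good|/2^s = 5/8:
assert sum(A(list(x)) for x in product([0, 1], repeat=n)) == 20
```

Property (1) holds because the syndrome is linear and vanishes on codewords, so a noisy codeword has the syndrome of its (low-weight) error.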
Note that when t = O(ηn) and δ is constant the space bound is O(ηn log(q/η)), which matches our Ω(ηn) lower bound up to the O(log(q/η)) factor. These results in particular apply to Reed–Solomon codes. Recall that a Reed–Solomon code of dimension b is the linear code where a message in F_q^b is interpreted as a polynomial p of degree b − 1 and encoded as the q evaluations of p at every element of the field. (In
some presentations, the element 0 is excluded.) Such a code is b-uniform because for any b points (x_i, y_i) where the x_i's are distinct, there is exactly one polynomial p of degree b − 1 such that p(x_i) = y_i for every i.
For several binary codes C ⊆ F_2^n and constant η we can obtain a communication lower bound of Ω(n) which is tight up to constant factors. This is true for example for random linear codes (with bounded rate). The complexity of decoding such codes is intensely studied, also because the assumed intractability of their decoding is a basis for several cryptographic applications; see for example [BJMM12]. We also obtain a tight lower bound of Ω(n) for several explicitly-defined binary codes. For example, we can pick an explicit binary code C ⊆ F_2^n which is Ω(n)-uniform and that can be decoded in polynomial time for a certain constant noise parameter η (with high probability); see [Shp09] for a construction.
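The b-uniformity of Reed–Solomon codes noted above can be verified exhaustively for small parameters. The following sketch (q = 5 and b = 2 are our own toy choices) checks it via exactly the interpolation argument: any b coordinate values are attained by exactly one codeword.

```python
from itertools import product, combinations

# Brute-force check that the Reed-Solomon code of dimension b = 2 over
# F_5 (evaluations of degree-<=1 polynomials at all 5 field elements) is
# 2-uniform: any 2 coordinates of a random codeword are uniform in F_5^2.
q, b = 5, 2
points = list(range(q))
codewords = [tuple((m0 + m1*x) % q for x in points)
             for m0, m1 in product(range(q), repeat=b)]

for S in combinations(range(q), b):
    counts = {}
    for c in codewords:
        key = tuple(c[i] for i in S)
        counts[key] = counts.get(key, 0) + 1
    # every pair of values appears for exactly one codeword (interpolation)
    assert len(counts) == q**b and set(counts.values()) == {1}
```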
Lower bounds for decoding one symbol. The lower bound in Theorem 2.10 is for the problem of distinguishing noisy codewords from uniform. Intuitively, this is a strong lower bound saying that no bit of information can be obtained from a noisy codeword. We next use this result to obtain lower bounds for decoding one symbol of the message given a noisy codeword. Some care is needed because some message symbols may just be copied in the codeword. This would allow one party to decode those symbols with no communication, even though the noisy codeword may be indistinguishable from uniform. The lower bound applies to codes that remain b-uniform even after fixing some input symbol. For such codes, a low- communication protocol cannot decode that symbol significantly better than by guessing at random.
Theorem 2.12. Let C′ ⊆ F_q^n be a linear code with an n × r generator matrix G. Let i ∈ {1, 2, ..., r} be an index, and let C be the code defined as C := {Gx | x_i = 0}. Let N be the noise distribution from Definition 2.4. Let k be an integer. Suppose that C is b-uniform for b ≥ n/k − 1. Let P : F_q^n → F_q be a k-party protocol using c bits of communication. Then
Pr[P(GU + N) = U_i] ≤ 1/q + ε, where U = (U_1, U_2, ..., U_r) is the uniform distribution and ε is as in Theorem 2.10.
We remark that whether C is b-uniform in general depends on both G and i. For example, let C′ be a Reed–Solomon code of dimension b = n/k. Recall that C′ is b-uniform. Note that if we choose i corresponding to the constant coefficient (equivalently, the evaluation of the polynomial at the point 0 ∈ F_q, which as we remarked earlier is a point we consider) then C has a fixed symbol and so is not even 1-uniform. On the other hand, if i = b then we obtain a Reed–Solomon code with dimension b − 1, which is (b − 1)-uniform, and the lower bound in Theorem 2.12 applies. We again obtain an almost matching upper bound. In fact, the corresponding protocol recovers the entire message.
Theorem 2.13 (Recovering messages from noisy codewords). Let C ⊆ F_q^n be a code with distance d. Let t be an integer such that 2t < d, and let k be an integer dividing n. There is a k-party protocol P : F_q^n → F_q^n communicating max{n − d + 2t + 1 − n/k, 0}·⌈log_2 q⌉ bits such that for every x ∈ C and every e of Hamming weight ≤ t, P(x + e) = x.
A Reed–Solomon code with dimension b has distance d = n − b + 1. Hence we obtain communication max{b + 2t − n/k, 0}·⌈log_2 q⌉, for any t such that 2t < n − b + 1. This upper bound matches the lower bound in Theorem 2.12 up to a log q factor. For example, when k = O(1) and b = n/k our upper bound is O(ηn log q) for η = t/n and our lower bound is Ω(ηn) − O(log n).
2.1.2 Application: Pseudorandomness The construction of explicit pseudorandom generators against restricted classes of tests is a fundamental challenge that has received a lot of attention at least since the 80's, cf. [AW89, AKS87]. One class of tests extensively considered in the literature is concerned with algorithms that read the input bits in a one-way fashion in a fixed order. A leading goal is to prove RL = L by constructing generators with logarithmic seed length that fool one-way, space-bounded algorithms, but here the seminal papers [Nis92, INW94, NZ96] remain the state of the art and have larger seed lengths. However, somewhat better generators have been obtained for several special cases, including for example combinatorial rectangles [AKS87, Nis92, NZ96, INW94, EGL+98, ASWZ96, Lu02, Vio14, GMR+12, GY14], combinatorial shapes [GMRZ13, De15, GKM18], and product tests [GKM18]. In particular, for combinatorial rectangles f : ({0, 1}^ℓ)^k → {0, 1} two incomparable results are known. For context, the minimal seed length up to constant factors is O(ℓ + log(k/ε)). One line of research culminating in [Lu02] gives generators with seed length O(ℓ + log k + log^{3/2}(1/ε)). More recently, [GMR+12] (cf. [GY14]) improve the dependence on ε while making the dependence on the other parameters a bit worse: they achieve seed length Õ(ℓ + log k + log(1/ε)). The latter result is extended to products in [GKM18] (with some other lower-order losses). Recently there has been considerable interest in extending tests by allowing them to read the bits in any order: [BPW11, BPW12, IMZ19, RSV13, SVW17]. This extension is significantly more challenging, and certain instantiations of generators against one-way tests are known to fail [BPW11]. We contribute new pseudorandom generators that fool product tests in any order.
Definition 2.14 (Fooling). A generator G : {0, 1}^s → {0, 1}^n ε-fools (or fools with error ε) a class T of tests on n bits if for every function f ∈ T we have |E[f(G(U_s))] − E[f(U_n)]| ≤ ε, where U_s and U_n are the uniform distributions on s and n bits respectively. We call s the seed length of G. We call G explicit if it is computable in time polynomial in n.
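Definition 2.14 can be made concrete with a brute-force computation on a toy generator of our own devising (not from the text): appending a parity bit to a 2-bit seed fools every dictator test perfectly, but fails badly against the parity test.

```python
from itertools import product

# Brute-force computation of the fooling error of Definition 2.14 for a
# toy generator: G stretches 2 seed bits to 3 output bits by appending
# the parity bit (our own example, chosen for illustration only).
def G(seed):
    s0, s1 = seed
    return (s0, s1, (s0 + s1) % 2)

n, s = 3, 2
dictators = [lambda x, i=i: x[i] for i in range(n)]

def fooling_error(f):
    e_gen = sum(f(G(seed)) for seed in product([0, 1], repeat=s)) / 2**s
    e_uni = sum(f(x) for x in product([0, 1], repeat=n)) / 2**n
    return abs(e_gen - e_uni)

# Each output bit of G is individually uniform, so dictators are 0-fooled,
assert all(fooling_error(f) == 0 for f in dictators)
# but the parity of all three bits distinguishes G from uniform:
assert fooling_error(lambda x: (x[0] + x[1] + x[2]) % 2) == 0.5
```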
Definition 2.15 (Any order). We say that a generator G : {0, 1}^s → {0, 1}^n ε-fools a class T of tests in any order if for every permutation π on n bits the generator π ∘ G : {0, 1}^s → {0, 1}^n ε-fools T.
The next theorem gives some of our generators. The notation Õ() hides logarithmic factors in k and ℓ. In this section we only consider alphabet size m = 2. We write the range {0, 1}^{ℓk} of the generators as ({0, 1}^ℓ)^k to indicate the parameters of the product tests.
Theorem 2.16 (PRG for unordered products, I). There exist explicit pseudorandom generators G : {0, 1}^s → ({0, 1}^ℓ)^k that ε-fool product tests in any order, with the following seed lengths:
1. s = 2ℓ + Õ(k² log(1/ε)), and
2. s = O(ℓ) + Õ((ℓk)^{2/3} log^{1/3}(1/ε)).
One advantage of these generators is their simplicity: the generator's output has the form D + N′, where D is a small-bias distribution and N′ is statistically close to a noise vector. Constructions in the literature tend to be somewhat more involved. In terms of parameters, we note that when k = O(1) we achieve in (1) seed length s = 2ℓ + O(log 1/ε) log ℓ, which is close to the value ℓ + O(log 1/ε), which is optimal even for the case of fixed order and k = 2. Our result is significant already for k = 3, but not for k = 2. In the latter case the seed length of (2 − Ω(1))ℓ obtained in [BPW11] remains the best known. For k ≥ √ℓ our generator in (2) has polynomial stretch, using a seed length Õ(n^{2/3}) for output length n. For the sake of comparison we note again that [GKM18] has exponential stretch, achieving seed length Õ(ℓ + log(k/ε)). However, [GKM18] does not work in any order. We note that for the special case of combinatorial rectangles f : ({0, 1}^ℓ)^k → {0, 1} a pseudorandom generator with seed length O((ℓ + log k) log(1/ε)) follows from previous work. The generator simply outputs n bits such that any d·ℓ of them are 1/k^d-close to uniform in statistical distance, where d = c log(1/ε) for an appropriate constant c. Theorem 3 in [AGHP92] shows how to generate these bits from a seed of length O(ℓ log(1/ε) + log log n + log(1/ε) log k) = O((ℓ + log k) log(1/ε)). The analysis of this generator is as follows. The induced distribution on the outputs of the f_i is a distribution on {0, 1}^k such that any d bits are 1/k^d-close to the distribution of independent variables whose expectations are equal to the E[f_i]. Now Lemma 5.2 in [CRS00] (cf. [EGL+98]) shows that the probability that the And of the outputs is 1 equals the product of the expectations of the f_i plus an error which is ≤ 2^{−Ω(d)} + d^d/k^d ≤ ε. However this generator breaks down if the output of the functions is {−1, 1} instead of {0, 1}.
Moreover, its parameters are incomparable with those in Theorem 2.16.(1). In particular, for k = O(1) its seed length is ≥ ℓ log(1/ε), while as remarked above we achieve O(ℓ + log ℓ log(1/ε)).
We are able to improve the seed length of (2) in Theorem 2.16 to Õ(√n), but then the resulting generator is more complicated and in particular it does not output a distribution of the form D + N′. For this improvement we “derandomize” Theorem 2.5 and then combine it with a recursive technique originating in [AW89] and used in several recent works including [GMR+12, RSV13, SVW17, CSV15]. Our context and language are somewhat different from previous work, and this fact may make this chapter useful to readers who wish to learn the technique.
Theorem 2.17 (PRG for unordered products, II). There exists an explicit pseudorandom generator G : {0, 1}^s → ({0, 1}^ℓ)^k that ε-fools product tests in any order and has seed length s = O(ℓ + √(ℓk log k log(k/ε))).
Recall that for b = ℓ the error bound in our Theorem 2.5 is k(1 − η)^{Ω(b/k)}, and that it is an interesting question to ask whether the exponent can be improved to Ω(b). We show that if
such an improvement is achieved for the derandomized version of the theorem (stated later in Theorem 2.38) then one would get a much better seed length: s = O((ℓ + log k log(n/ε)) log n). In Chapter 4 we will give a positive answer to this question.
Reingold, Steinke, and Vadhan [RSV13] give a generator that ε-fools width-w space algorithms on n bits in any order, with seed length s = Õ(√n log(w/ε)). Every combinatorial rectangle f : ({0, 1}^ℓ)^k → {0, 1} can be computed by a one-way algorithm with width 2^{ℓ−1} + 1 on n = ℓk bits. Hence they also get seed length Õ(√(ℓk)(ℓ + log 1/ε)) for combinatorial rectangles. Our Theorem 2.17 improves upon this by removing a factor of ℓ.
Going in the other direction, if D is a distribution on ({0, 1}^ℓ)^k bits that ε-fools combinatorial rectangles, then D also fools width-w one-way algorithms on n = ℓk bits with error wkε. Using this we obtain from Theorem 2.5 a new class of distributions that fools space, namely any distribution that is the sum of a distribution with high-enough independence (or small enough bias) and suitable noise. We state one representative result.
Corollary 2.18 (Bounded independence plus noise fools space). Let D be a b-uniform distribution on n bits. Let N be the noise distribution from Definition 2.4. If b ≥ n^{2/3} log n and η is any constant then D + N fools O(log n)-space algorithms in any order with error o(1).
As mentioned earlier, [GKM18] show that if a generator fools products then it also fools several other computational models, with some loss in parameters. As a result, we obtain generators for the following two models, extended to read bits in any order.
Definition 2.19 (Generalized halfspaces and combinatorial shapes). A generalized halfspace is a function h : ({0, 1}^ℓ)^k → {0, 1} defined by h(x) := 1 if and only if Σ_{i≤k} g_i(x_i) ≥ θ, where g_1, ..., g_k : {0, 1}^ℓ → R are arbitrary functions and θ ∈ R.
A combinatorial shape is a function f : ({0, 1}^ℓ)^k → {0, 1} defined by f(x) := g(Σ_{i≤k} g_i(x_i)), where g_1, ..., g_k : {0, 1}^ℓ → {0, 1} and g : {0, ..., k} → {0, 1} are arbitrary functions.
Theorem 2.20 (PRG for generalized halfspaces and combinatorial shapes, in any order). There exists an explicit pseudorandom generator G : {0, 1}^s → ({0, 1}^ℓ)^k that ε-fools both generalized halfspaces and combinatorial shapes in any order with seed length s = Õ(ℓ√k + √(ℓk log(1/ε))).
Note that for ε = 2^{−O(ℓ)} the seed length simplifies to Õ(ℓ√k).
2.1.3 Techniques We now give an overview of the proof of Theorem 2.5. The natural high-level idea, which our proof adopts as well, is to apply Fourier analysis and use noise to bound high-degree terms and independence to bound low-degree terms. Part of the difficulty is finding the right way to decompose the product ∏_{i≤k} f_i. We proceed as follows. For a function f let f^H be its “high-degree” Fourier part and f^L be its “low-degree” Fourier part, so that f = f^H + f^L. Our goal is to go from ∏ f_i to ∏ f_i^L. The latter is a product of low-degree functions and hence has low degree. Therefore, its expectation will be close to ∏_i μ_i by the properties
of the distribution D; here we do not use the noise N. To point out a limitation of this argument, note that if D is ℓ-wise independent we need ∏ f_i^L to have degree ≤ ℓ. Even if each f_i^L has degree 1 we cannot afford k larger than ℓ.
To move from ∏ f_i to ∏ f_i^L we pick one f_j and we decompose it as f_j^H + f_j^L. Iterating this process we indeed arrive at ∏ f_i^L, but we also obtain k extra terms of the form
f_1 f_2 ⋯ f_{j−1} · f_j^H · f_{j+1}^L f_{j+2}^L ⋯ f_k^L for j = 1, ..., k.
We show that each of these terms is close to 0 thanks to the presence of the high-degree factor f_j^H. Here we use both D and N.
We conclude this section with a brief technical comparison with the recent papers [GMR+12, GY14, GKM18] which give generators for combinatorial rectangles (and product tests). We note that the generators in those papers only fool tests f = f_1 · f_2 ⋯ f_k that read the input in a fixed order (whereas our results allow for any order). Also, they do not use noise, but rather hash the functions f_i in a different way. Finally, a common technique in those papers is, roughly speaking, to use hashing to reduce the variance of the functions, and then show that bounded independence fools functions with small variance. We note that the noise parameters we consider in this chapter are too small to be used to reduce the variance. Specifically, for a product test f those papers define a new function g = g_1 · g_2 ⋯ g_k which is the average of f over t independent inputs. While g has the same expectation as f, the variance of each g_i is less than that of f_i by a factor of t. Their goal is to make the variance of each g_i less than 1/k so that the sum of the variances is less than 1. In order to achieve this reduction with noise we would have to set η ≥ 1 − 1/√k. This is because if f_i simply is (−1)^x where x is one bit, then the variance of f_i perturbed by noise is E_x[E_N[(−1)^{x+N}]²] − E_{x,N}[(−1)^{x+N}]² = E_{x,N,N′}[(−1)^{N+N′}] = (1 − η)².
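The closing calculation can be checked exactly; the snippet below (with an arbitrary η = 0.3 of our choosing) computes the variance of a single ±1 bit smoothed by noise.

```python
# Exact check that a single +-1 bit perturbed by noise with parameter
# eta has variance (1 - eta)^2, as in the calculation above.
eta = 0.3  # arbitrary toy value

def smoothed(x):
    # E_N[(-1)^(x+N)]: with probability eta the bit is rerandomized
    # (contributing mean 0), otherwise it is kept, contributing (-1)^x.
    return (1 - eta) * (-1)**x

mean = sum(smoothed(x) for x in [0, 1]) / 2
var = sum(smoothed(x)**2 for x in [0, 1]) / 2 - mean**2
assert abs(mean) < 1e-12
assert abs(var - (1 - eta)**2) < 1e-12
```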
Organization. In Section 2.2 we prove a more general version of Theorem 2.5. In Sec- tion 2.3 we give the proof details for the results in Section 2.1.1. The details for the results in Section 2.1.2 are spread over three sections. In Section 2.4 we prove Theorem 2.16. In Section 2.5 we prove Theorem 2.17, and discuss the potential improvement. In Section 2.6 we prove Theorem 2.20. In Section 2.7 we include for completeness a lower bound on the values of b and η for which Theorem 2.5 can apply.
2.2 Bounded independence plus noise fools products
In this section we prove Theorem 2.5. It follows easily from the next theorem, which is the main result in this section. A distribution D over n bits has bias δ if every nonempty parity of the bits (with range {−1, 1}) has expectation at most δ in magnitude. The following definition extends this to larger alphabets.
Definition 2.21. A distribution D = (D_1, D_2, ..., D_n) over Z_m^n is (b, δ)-biased if for every nonzero α ∈ Z_m^n with at most b non-zero coordinates we have |E_D[ω^{Σ_i α_i D_i}]| ≤ δ, where ω := e^{2πi/m}. When b = n we simply call D δ-biased.
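The quantity in Definition 2.21 can be computed by brute force for tiny parameters. A sketch, with m and n of our own choosing:

```python
from itertools import product
from cmath import exp, pi

# Brute-force (b, delta)-bias of a distribution over Z_m^n
# (Definition 2.21): the maximum, over nonzero alpha of weight <= b,
# of |E_D[omega^(sum_i alpha_i D_i)]|.
def bias(D, m, n, b):
    omega = exp(2j * pi / m)
    worst = 0.0
    for alpha in product(range(m), repeat=n):
        w = sum(1 for a in alpha if a != 0)
        if 0 < w <= b:
            val = sum(p * omega**(sum(a*x for a, x in zip(alpha, xs)))
                      for xs, p in D.items())
            worst = max(worst, abs(val))
    return worst

# The uniform distribution over Z_3^2 is (2, 0)-biased ...
m, n = 3, 2
uniform = {xs: 1/m**n for xs in product(range(m), repeat=n)}
assert bias(uniform, m, n, 2) < 1e-9
# ... while a point mass has all nontrivial characters of magnitude 1.
point = {(1, 2): 1.0}
assert abs(bias(point, m, n, 2) - 1.0) < 1e-9
```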
Let n = ℓk. Recall that we denote by V(t) = V_{m,ℓ}(t) the number of x ∈ Z_m^ℓ with at most t non-zero coordinates.
Theorem 2.22. Let t ∈ [0, ℓ]. Let f_1, ..., f_k : Z_m^ℓ → C_1 be k functions with μ_i = E[f_i]. Let D be a (b, δ)-biased distribution over Z_m^n with b ≥ 2(k − 1)t and such that each D_i is uniform, which holds if b ≥ ℓ. Let N = N(m, n, η) be the noise distribution from Definition 2.4. Write D = (D_1, D_2, ..., D_k) where each D_i is in Z_m^ℓ, and similarly for N. Then
|E[∏_{i≤k} f_i(D_i + N_i)] − ∏_{i≤k} μ_i| ≤ k(1 − η)^t √((1 + m^ℓ δ)(1 + V(t)^{k−1} δ)) + V(t)^{k/2} δ.
Note that [AGM03] show that a (b, δ)-biased distribution over {0, 1}^n is ε-close in statistical distance to a b-uniform distribution, for ε = δ Σ_{i=1}^{b} (n choose i). (See [OZ18] for an optimal bound.) One can apply their results in conjunction with Theorem 2.5 to obtain a variant of Theorem 2.5 for small-bias distributions, but only if δ ≤ 1/Σ_{i=1}^{b} (n choose i). Via a direct proof we derive useful bounds already for δ = Ω(2^{−b}), and this is used in the applications in Section 2.1.2.
We now state and prove a more general version of Theorem 2.5 in the introduction.
Corollary 2.23 (Generalization of Theorem 2.5). Let f_1, ..., f_k : Z_m^ℓ → C_1 be k functions with μ_i = E[f_i]. Set n := ℓk and let D be a b-uniform distribution over Z_m^n. Let N be the noise distribution from Definition 2.4. Write D = (D_1, D_2, ..., D_k) where each D_i is in Z_m^ℓ, and similarly for N. Then
|E[∏_{i≤k} f_i(D_i + N_i)] − ∏_{i≤k} μ_i| ≤ ε
for the following choices:
(1) if b ≥ ℓ then ε = k(1 − η)^{Ω(b²/n)};
(2) if b < ℓ and each D_i is uniform over Z_m^ℓ then ε = k(1 − η)^{Ω(b/k)};
(3) if b < ℓ then ε = k e^{−Ω(ηb/k)} + 2k (ℓ choose ℓ−b) e^{−Ω(ηb)}.
Moreover, for ℓ = 1, m = 2, and any η, b, k there exist f_i and D for which it is not true that ε < (1 − η)^{b+1}. In particular, if b = Ω(n) then an upper bound on the error of the form k(1 − η)^{cn} is false for sufficiently large c, using that η ≥ (log k)/n.
We use (1) in most of our applications. Occasionally we use (3) with b = ℓ − 1, in which case the error bound is O(ℓk e^{−Ω(ηℓ/k)}).
Proof of Corollary 2.23. Setting δ = 0 and t = b/(2(k − 1)) in Theorem 2.22 gives the bound
k(1 − η)^{b/(2(k−1))}, (⋆)
which proves the theorem in the case ℓ ≤ b = O(ℓ).
To prove (1) we need to handle larger b. For this, let c := ⌊b/ℓ⌋, and group the k functions into k′ ≤ k/c + 1 functions on input length ℓ′ := cℓ. Note that b ≥ ℓ′, and so we can apply (⋆) to obtain the bound k′(1 − η)^{Ω(b/k′)} ≤ k(1 − η)^{Ω(b²/kℓ)}.
To prove (2) one can observe that in the proof of (⋆) the condition b ≥ ℓ is only used to guarantee that each D_i is uniform. The latter is now part of our assumption.
To prove (3) view the noise vector N as the sum of two noise vectors N′ and N″ with parameter α such that 1 − η = (1 − α)². Note this implies α = Ω(η). If N′ sets to uniform at least ℓ − b coordinates in each function then we can apply (⋆) to functions on ≤ b symbols with η replaced by α and N replaced by N″. The probability that N′ does not set to uniform that many coordinates is at most
k (ℓ choose ℓ−b) (1 − α)^b ≤ k (ℓ choose ℓ−b) e^{−Ω(ηb)},
and in that case the distance between the expectations is at most two.
To show the “moreover” part, define f_i := (−1)^{x_i} for i ≤ b + 1 and f_i := 1 for i > b + 1 to compute parity on the first b + 1 bits, and let D be the b-wise independent distribution which is uniform on strings whose parity of the first b + 1 bits is 0. The other bits are irrelevant. The expectation of parity under uniform is 0. The expectation of parity under D is 1 if no symbol is perturbed with noise, and is 0 otherwise. Hence the error is ≥ (1 − η)^{b+1}.
We now state and prove a special case of Theorem 2.22 for small-bias distributions.
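The “moreover” part of Corollary 2.23 can be verified exactly for small parameters; the snippet below (b = 3 and η = 0.25 are our own toy choices) computes the expectation of the noisy parity under the even-parity distribution D.

```python
from itertools import product

# Exact check of the "moreover" part: D is uniform over (b+1)-bit strings
# of even parity, f = parity of the first b+1 bits; the error is then
# exactly (1 - eta)^(b+1), since parity has mean 0 under uniform.
b, eta = 3, 0.25
even = [x for x in product([0, 1], repeat=b+1) if sum(x) % 2 == 0]

def noisy_parity_expectation():
    # E over D and the noise of (-1)^(sum_i (D_i + N_i)). Each coordinate
    # is independently rerandomized with probability eta; a rerandomized
    # +-1 bit has mean 0, a kept bit contributes (-1)^(x_i).
    total = 0.0
    for x in even:
        for pattern in product([0, 1], repeat=b+1):  # 1 = rerandomized
            p, val = 1.0, 1.0
            for xi, r in zip(x, pattern):
                p *= eta if r else (1 - eta)
                val *= 0 if r else (-1)**xi
            total += p * val
    return total / len(even)

assert abs(noisy_parity_expectation() - (1 - eta)**(b+1)) < 1e-12
```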
Corollary 2.24. Let f_1, ..., f_k : Z_m^ℓ → C_1 be k functions with μ_i = E[f_i]. Assume δ ≤ m^{−ℓ}. Let D be an (ℓ, δ)-biased distribution over Z_m^n. Let N be the noise distribution from Definition 2.4. Write D = (D_1, D_2, ..., D_k) where each D_i is in Z_m^ℓ, and similarly for N. Then
|E[∏_{i≤k} f_i(D_i + N_i)] − ∏_{i≤k} μ_i| ≤ 2k(1 − η)^{Ω(log(1/δ)/(k log mk))} + √δ.
Proof of Corollary 2.24. Let c := ⌊√(log(1/δ)/(ℓ log m))⌋. Note that c ≥ 1 because δ ≤ m^{−ℓ}. We group the k functions into k′ = ⌈k/c⌉ functions on input length ℓ′ := cℓ. The goal is to make m^{ℓ′} close to 1/δ. By Claim 2.34, V_{ℓ′}(t) ≤ (eℓ′m/t)^t. Hence V_{ℓ′}(t)^{k′/2} ≤ V_{ℓ′}(t)^{k′−1} ≤ (eℓ′m/t)^{k′t}. Now let t = αℓ′ log m/(k′ log mk′) for a small constant α > 0, so that the latter bound is ≤ m^{ℓ′/2}, which is roughly 1/√δ. The error bound in Theorem 2.22 now becomes at most
k(1 − η)^t (1 + m^{ℓ′} δ) + m^{ℓ′/2} δ.
And so the bound is at most
2k(1 − η)^{Ω(log(1/δ)/(k log mk))} + √δ.
We now turn to the proof of Theorem 2.22. We begin with some preliminaries.
2.2.1 Preliminaries
Denote by U the uniform distribution. Let m be any positive integer. We write Z_m for {0, 1, 2, ..., m − 1}. Let ω := e^{2πi/m} be a primitive m-th root of unity. For any α ∈ Z_m^u, we define χ_α : Z_m^u → C to be
χ_α(x) := ω^{⟨α,x⟩},
where α and x are viewed as vectors in Z_m^u and ⟨α, x⟩ := Σ_i α_i x_i.
For any function f : Z_m^u → C, its Fourier expansion is
f(x) := Σ_{α∈Z_m^u} f̂_α χ_α(x),
where f̂_α ∈ C is given by
f̂_α := E_{x∼Z_m^u}[f(x) χ̄_α(x)].
Here and elsewhere, random variables are uniformly distributed unless specified otherwise.
The Fourier L_1-norm of f is defined as Σ_α |f̂_α|, and is denoted by L_1[f]. The degree of f is defined as max{|α| : f̂_α ≠ 0}, where |α| is the number of nonzero coordinates of α, and is denoted by deg(f). Note that we have L_1[f̄] = L_1[f]. The following fact bounds the L_1-norm and degree of product functions.
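These definitions can be sanity-checked numerically. The sketch below (m = 3, u = 2, and a random function of our own choosing with values in C_1) verifies the Fourier expansion and Parseval's identity (Fact 2.26).

```python
from itertools import product
from cmath import exp, pi
import random

# Numerical check of the Fourier expansion over Z_m^u and of Parseval's
# identity for a random function f: Z_3^2 -> C_1.
m, u = 3, 2
omega = exp(2j * pi / m)
domain = list(product(range(m), repeat=u))

random.seed(0)
f = {x: random.uniform(-1, 1) * exp(2j * pi * random.random()) / 2
     for x in domain}   # values well inside the unit disc C_1

def chi(alpha, x):
    return omega ** sum(a*xi for a, xi in zip(alpha, x))

fhat = {a: sum(f[x] * chi(a, x).conjugate() for x in domain) / len(domain)
        for a in domain}

# The expansion recovers f pointwise ...
for x in domain:
    recon = sum(fhat[a] * chi(a, x) for a in domain)
    assert abs(recon - f[x]) < 1e-9
# ... and Parseval holds: sum |fhat|^2 = E|f|^2 <= 1.
lhs = sum(abs(c)**2 for c in fhat.values())
rhs = sum(abs(f[x])**2 for x in domain) / len(domain)
assert abs(lhs - rhs) < 1e-9 and lhs <= 1
```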
Fact 2.25. For any two functions f, g : Z_m^u → C, we have
(1) deg(fg) ≤ deg(f) + deg(g), and
(2) L_1[fg] ≤ L_1[f] L_1[g].
We shall use this fact both for f and g on disjoint inputs and for f and g on the same input.
Proof. We have
f(x)g(x) = (Σ_{α∈Z_m^u} f̂_α χ_α(x)) (Σ_{β∈Z_m^u} ĝ_β χ_β(x)) = Σ_{α,β} f̂_α ĝ_β χ_{α+β}(x) = Σ_α (Σ_β f̂_{α−β} ĝ_β) χ_α(x).
Hence the α-th Fourier coefficient of f · g is Σ_β f̂_{α−β} ĝ_β.
To see (1), note that in the latter expression the sum over β can be restricted to those β with |β| ≤ deg(g). Now note that if |α| > deg(f) + deg(g) then |α − β| > deg(f) and hence f̂_{α−β} will be zero for every such β.
To show (2) write L_1[fg] = Σ_α |Σ_β f̂_{α−β} ĝ_β| ≤ Σ_{α,β} |f̂_{α−β}| |ĝ_β| = (Σ_α |f̂_α|)(Σ_β |ĝ_β|) = L_1[f] L_1[g].
Fact 2.26 (Parseval's identity). Σ_{α∈Z_m^ℓ} |f̂_α|² = E_{x∼Z_m^ℓ}[|f(x)|²]. In the case of f mapping to C_1, this quantity is at most 1.
Proof.
E_{x∼Z_m^ℓ}[f(x) f̄(x)] = E_{x∼Z_m^ℓ}[Σ_{α∈Z_m^ℓ} f̂_α χ_α(x) · Σ_{α′∈Z_m^ℓ} \overline{f̂_{α′}} χ̄_{α′}(x)] = Σ_{α,α′∈Z_m^ℓ} f̂_α \overline{f̂_{α′}} E_{x∼Z_m^ℓ}[χ_{α−α′}(x)] = Σ_{α∈Z_m^ℓ} |f̂_α|²,
where the last equality holds because E_{x∼Z_m^ℓ}[χ_{α−α′}(x)] equals 0 if α ≠ α′ and equals 1 otherwise.
Fact 2.27. Let N = (N_1, ..., N_ℓ) be the distribution over Z_m^ℓ where the symbols are independent and each of them is set to uniform with probability η and is 0 otherwise. Then for every α ∈ Z_m^ℓ, E[χ_α(N)] = (1 − η)^{|α|}.
Proof. The expectation conditioned on the event “N sets none of the nonzero positions of α to uniform” is 1. This event happens with probability (1 − η)^{|α|}. Conditioned on its complement, the expectation is 0. To see this, assume that the noise vector sets to uniform position i of α, and that α_i ≠ 0. Let β := ω^{α_i}. Then the expectation can be written as a product where a factor is
E_{x∼{0,1,...,m−1}}[β^x] = (1/m) · (β^m − 1)/(β − 1) = 0,
using the fact that β ≠ 1 because α_i ∈ {1, 2, ..., m − 1}, and that β^m = (ω^{α_i})^m = 1. Therefore the total expectation is (1 − η)^{|α|}.
Note that this lemma includes the uniform case η = 1, with the convention 0^0 = 1. We will use the following facts multiple times.
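Fact 2.27 can be checked exactly for small parameters; in the snippet below, m = 3, ℓ = 2 and η = 0.4 are our own toy choices.

```python
from itertools import product
from cmath import exp, pi

# Exact check of Fact 2.27: for the noise distribution N over Z_m^l
# (each symbol uniform with probability eta, else 0), E[chi_alpha(N)]
# equals (1 - eta)^|alpha|.
m, ell, eta = 3, 2, 0.4
omega = exp(2j * pi / m)

def expected_character(alpha):
    total = 0.0 + 0.0j
    for pattern in product([0, 1], repeat=ell):   # 1 = symbol rerandomized
        p = 1.0
        factor = 1.0 + 0.0j
        for a, r in zip(alpha, pattern):
            p *= eta if r else (1 - eta)
            if r:  # uniform symbol: E[omega^(a*x)] is 0 unless a = 0
                factor *= sum(omega**(a*x) for x in range(m)) / m
            # a non-rerandomized symbol is 0, contributing omega^0 = 1
        total += p * factor
    return total

for alpha in product(range(m), repeat=ell):
    weight = sum(1 for a in alpha if a != 0)
    assert abs(expected_character(alpha) - (1 - eta)**weight) < 1e-9
```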
Fact 2.28. Let f : Z_m^ℓ → C be a function with degree b. We have:
(1) for any (b, δ)-biased distribution D over Z_m^ℓ, |E[f(D)] − E[f(U)]| ≤ L_1[f] δ;
(2) for any (2b, δ)-biased distribution D over Z_m^ℓ, |E[|f(D)|²] − E[|f(U)|²]| ≤ L_1[f]² δ; and
(3) the bound in (2) holds even if D is (ℓ, δ)-biased.
Proof. For (1), note that |E[f(D)] − E[f(U)]| = |Σ_{0<|α|≤b} f̂_α E[χ_α(D)]| ≤ Σ_{0<|α|≤b} |f̂_α| |E[χ_α(D)]| ≤ L_1[f] δ.
For (2), recall that |f(x)|² = f(x) f̄(x). By Fact 2.25, the function |f(x)|² has degree ≤ 2b. Also, again by Fact 2.25, the L_1-norm of that function is at most L_1[f] · L_1[f̄] = L_1[f]². Now the result follows by (1).
Finally, (3) is proved like (2), noting that a function on Z_m^ℓ always has degree ≤ ℓ.
Actually the bounds hold with Σ_{α≠0} |f̂_α| in place of L_1[f], but we will not use that.
2.2.2 Proof of Theorem 2.22
For a function f : Z_m^ℓ → C_1, consider its Fourier expansion f(x) := Σ_α f̂_α χ_α(x), and let f^L(x) := Σ_{α:|α|≤t} f̂_α χ_α(x) and f^H(x) := Σ_{α:|α|>t} f̂_α χ_α(x). Define F_i : (Z_m^ℓ)^k → C to be
F_i(x_1, ..., x_k) := ∏_{j<i} f_j(x_j) · f_i^H(x_i) · ∏_{ℓ>i} f_ℓ^L(x_ℓ).
Pick f_k and write it as f_k^L + f_k^H. We can then rewrite
∏_{1≤i≤k} f_i = F_k + (∏_{1≤i≤k−1} f_i) · f_k^L.
We can reapply the process to ∏_{1≤i≤k−1} f_i = ∏_{1≤i≤k−1} (f_i^H + f_i^L). Continuing this way, we eventually obtain that the quantity we want to bound, i.e. |E[∏_{i≤k} f_i(D_i + N_i)] − ∏_{i≤k} μ_i|, is at most
h i X Y L Y E[Fi(D + N)] + E fi (Di + Ni) − µi . i≤k i≤k i≤k The theorem follows readily from the next two lemmas, the second of which has a longer proof. Q L Q k/2 Lemma 2.29. |E[ i≤k fi (Di + Ni)] − i≤k µi| ≤ V (t) δ. L Proof. Fix N arbitrarily. Each fi has degree at most t, and by the Cauchy–Schwarz in- P ˆ 1/2 P ˆ 2 1/2 1/2 equality, it has L1-norm |α|≤t|fα| ≤ V (t) ( α|fα| ) ≤ V (t) . Here we use the fact Q L that f maps to C1 and Fact 2.26. Hence, by Fact 2.25, 0
|E_D[∏_{i≤k} f_i^L(D_i + N_i)] − ∏_{i≤k} μ_i| ≤ V(t)^{k/2} δ,
since under the uniform distribution the blocks are independent and E[f_i^L(U_i + N_i)] = f̂_{i,0} = μ_i. Averaging over N proves the claim.
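The telescoping decomposition ∏_i f_i = Σ_i F_i + ∏_i f_i^L used above can be verified numerically. The sketch below picks random real-valued functions on Z_2^2 (m, ℓ, k, t are our own toy parameters) and checks the identity pointwise.

```python
from itertools import product
from cmath import exp, pi
import random

# Numerical check of the telescoping decomposition: for degree threshold
# t, prod_i f_i = sum_i F_i + prod_i f_i^L, where
# F_i = f_1...f_{i-1} * f_i^H * f_{i+1}^L...f_k^L.
m, ell, k, t = 2, 2, 3, 1
omega = exp(2j * pi / m)
dom = list(product(range(m), repeat=ell))

def fourier(f):
    return {a: sum(f[x] * omega**(-sum(ai*xi for ai, xi in zip(a, x)))
                   for x in dom) / len(dom) for a in dom}

def part(f, keep):
    """Keep only Fourier terms whose weight |alpha| satisfies keep()."""
    fh = fourier(f)
    return {x: sum(c * omega**sum(ai*xi for ai, xi in zip(a, x))
                   for a, c in fh.items()
                   if keep(sum(1 for ai in a if ai)))
            for x in dom}

random.seed(1)
fs = [{x: random.uniform(-1, 1) for x in dom} for _ in range(k)]
lows = [part(f, lambda w: w <= t) for f in fs]
highs = [part(f, lambda w: w > t) for f in fs]

for xs in product(dom, repeat=k):          # xs = (x_1, ..., x_k)
    full = 1.0
    for f, x in zip(fs, xs):
        full *= f[x]
    Fi_sum = 0.0
    for i in range(k):
        term = highs[i][xs[i]]
        for j in range(i):
            term *= fs[j][xs[j]]
        for j in range(i + 1, k):
            term *= lows[j][xs[j]]
        Fi_sum += term
    low_prod = 1.0
    for g, x in zip(lows, xs):
        low_prod *= g[x]
    assert abs(full - (Fi_sum + low_prod)) < 1e-9
```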
Lemma 2.30. For every i ∈ {1, 2, ..., k}, we have |E[F_i(D + N)]| ≤ (1 − η)^t √((1 + m^ℓ δ)(1 + V(t)^{k−1} δ)).
Proof. We have
|E[F_i(D + N)]|
= |E[∏_{j<i} f_j(D_j + N_j) · f_i^H(D_i + N_i) · ∏_{ℓ>i} f_ℓ^L(D_ℓ + N_ℓ)]|
= |E_D[∏_{j<i} E_{N_j}[f_j(D_j + N_j)] · E_{N_i}[f_i^H(D_i + N_i)] · ∏_{ℓ>i} E_{N_ℓ}[f_ℓ^L(D_ℓ + N_ℓ)]]|
≤ E_D[∏_{j<i} |E_{N_j}[f_j(D_j + N_j)]| · |E_{N_i}[f_i^H(D_i + N_i)]| · ∏_{ℓ>i} |E_{N_ℓ}[f_ℓ^L(D_ℓ + N_ℓ)]|]
≤ E_D[|E_{N_i}[f_i^H(D_i + N_i)]| · ∏_{ℓ>i} |E_{N_ℓ}[f_ℓ^L(D_ℓ + N_ℓ)]|],
where the last inequality holds because |E_{N_j}[f_j(D_j + N_j)]| ≤ E_{N_j}[|f_j(D_j + N_j)|] ≤ 1 for every j < i, by Jensen's inequality, convexity of norms, and the fact that the range of f_j is C_1. By the Cauchy–Schwarz inequality, we get
|E[F_i(D + N)]| ≤ E_D[|E_{N_i}[f_i^H(D_i + N_i)]|²]^{1/2} · E_D[|∏_{ℓ>i} E_{N_ℓ}[f_ℓ^L(D_ℓ + N_ℓ)]|²]^{1/2}.
In Claims 2.32 and 2.33 below we bound from above the squares of the two terms on the right-hand side. In both cases, we view our task as bounding E_D[|g(D)|²] for a certain function g, and we proceed by computing the L_1-norm, the average over uniform, and the degree of g, and then applying Fact 2.28. We start with a claim that is useful in both cases.
Claim 2.31. Let $f \colon \mathbb{Z}_m^\ell \to \mathbb{C}$ be a function. Then:
1. for every $x$, $\mathbb{E}_N[f(x+N)] = \sum_\alpha \hat f_\alpha \chi_\alpha(x) (1-\eta)^{|\alpha|}$, and
2. $\mathbb{E}_U\bigl[|\mathbb{E}_N[f(U+N)]|^2\bigr] = \sum_\alpha |\hat f_\alpha|^2 (1-\eta)^{2|\alpha|}$.

Proof. For (1), write $\mathbb{E}_N[f(x+N)] = \mathbb{E}_N\bigl[\sum_\alpha \hat f_\alpha \chi_\alpha(x+N)\bigr] = \sum_\alpha \hat f_\alpha \chi_\alpha(x)\, \mathbb{E}_N[\chi_\alpha(N)]$. Then apply Fact 2.27.
For (2), write $|\mathbb{E}_N[f(x+N)]|^2$ as $\mathbb{E}_N[f(x+N)]\, \overline{\mathbb{E}_N[f(x+N)]}$. Then apply (1) twice to further write it as
\[
\mathbb{E}_U\Bigl[\sum_{\alpha,\alpha'} \hat f_\alpha \overline{\hat f_{\alpha'}}\, \chi_{\alpha-\alpha'}(U)\, (1-\eta)^{|\alpha|+|\alpha'|}\Bigr] = \sum_{\alpha,\alpha'} \hat f_\alpha \overline{\hat f_{\alpha'}}\, \mathbb{E}_U[\chi_{\alpha-\alpha'}(U)]\, (1-\eta)^{|\alpha|+|\alpha'|}.
\]
The claim then follows because $U$ is uniform, so $\mathbb{E}_U[\chi_{\alpha-\alpha'}(U)]$ equals $1$ when $\alpha = \alpha'$ and $0$ otherwise.
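Claim 2.31 can also be verified numerically for a concrete noise distribution. The sketch below assumes the standard noise model in which each coordinate of $N$ is independently replaced by a uniform element of $\mathbb{Z}_m$ with probability $\eta$ and set to $0$ otherwise, so that $\mathbb{E}_N[\chi_\alpha(N)] = (1-\eta)^{|\alpha|}$, matching Fact 2.27 as it is used here; whether this is exactly the distribution of Definition 2.4 should be checked against that definition. An illustration only:

```python
import cmath
import itertools
import random

def smoothing_gap(m=3, ell=2, eta=0.4, seed=1):
    """Max over x of |E_N[f(x+N)] - sum_a fhat_a chi_a(x) (1-eta)^{|a|}|,
    where each coordinate of N is independently uniform over Z_m with
    probability eta and 0 otherwise."""
    rng = random.Random(seed)
    omega = cmath.exp(2j * cmath.pi / m)
    pts = list(itertools.product(range(m), repeat=ell))
    f = {x: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for x in pts}

    def chi(alpha, x):
        return omega ** sum(a * b for a, b in zip(alpha, x))

    fhat = {a: sum(f[x] * chi(a, x).conjugate() for x in pts) / m**ell
            for a in pts}

    def pr_noise(n):
        # Pr[N = n] = prod_j ((1-eta)*[n_j == 0] + eta/m)
        p = 1.0
        for nj in n:
            p *= (1 - eta) * (nj == 0) + eta / m
        return p

    gap = 0.0
    for x in pts:
        exact = sum(pr_noise(n) * f[tuple((a + b) % m for a, b in zip(x, n))]
                    for n in pts)
        damped = sum(fhat[a] * chi(a, x) * (1 - eta) ** sum(c != 0 for c in a)
                     for a in pts)
        gap = max(gap, abs(exact - damped))
    return gap
```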
We can now bound our terms.
Claim 2.32. For every $i$, $\mathbb{E}_D\bigl[\bigl|\mathbb{E}_{N_i}[f_i^H(D_i+N_i)]\bigr|^2\bigr] \le (1-\eta)^{2t}(1 + m^\ell \delta)$.
Proof. Let $g$ be the function $g(x) = \mathbb{E}_{N_i}[f_i^H(x+N_i)]$. By (1) in Claim 2.31, the $L_1$-norm of $g$ is at most $\sum_{\alpha : |\alpha| > t} |\hat f_{i,\alpha}| (1-\eta)^{|\alpha|} \le (1-\eta)^t \sum_\alpha |\hat f_{i,\alpha}| \le (1-\eta)^t m^{\ell/2}$, where we used the Cauchy–Schwarz inequality and Fact 2.26.
By (2) in Claim 2.31 and Fact 2.26, $\mathbb{E}_U[|g(U)|^2]$ is at most $(1-\eta)^{2t}$.
Because $b \ge \ell$, we can apply (3) in Fact 2.28 to obtain $\mathbb{E}_D[|g(D)|^2] \le (1-\eta)^{2t} + (1-\eta)^{2t} m^\ell \delta$, as claimed.
Claim 2.33. $\mathbb{E}_D\bigl[\prod_{\ell>i} \bigl|\mathbb{E}_{N_\ell}[f_\ell^L(D_\ell+N_\ell)]\bigr|^2\bigr] \le 1 + V(t)^{k-1}\delta$.
Proof. Pick any $\ell > i$ and let $g_\ell(x) := \mathbb{E}_{N_\ell}[f_\ell^L(x+N_\ell)]$.
The $L_1$-norm of $g_\ell$ is at most $V(t)^{1/2}$ by (1) in Claim 2.31 and the Cauchy–Schwarz inequality. Also, by (2) in the same claim we have $\mathbb{E}_U[|g_\ell(U)|^2] \le 1$. Moreover, $g_\ell$ has degree at most $t$ by (1) in the same claim.
Now define $g \colon (\mathbb{Z}_m^\ell)^{k-i} \to \mathbb{C}$ as $g(x_{i+1}, x_{i+2}, \dots, x_k) := g_{i+1}(x_{i+1}) \cdot g_{i+2}(x_{i+2}) \cdots g_k(x_k)$. Note that $g$ has $L_1$-norm at most $V(t)^{(k-i)/2} \le V(t)^{(k-1)/2}$ and degree $(k-i)t \le (k-1)t$, by Fact 2.25 applied with $u = \ell(k-i)$. Moreover, $\mathbb{E}_{U_{i+1}, U_{i+2}, \dots, U_k}\bigl[|g(U_{i+1}, U_{i+2}, \dots, U_k)|^2\bigr] = \mathbb{E}_{U_{i+1}}[|g_{i+1}|^2] \cdot \mathbb{E}_{U_{i+2}}[|g_{i+2}|^2] \cdots \mathbb{E}_{U_k}[|g_k|^2] \le 1$. Because $b \ge 2(k-1)t$, we can apply (2) in Fact 2.28 to obtain
\[
\mathbb{E}_D\bigl[|g(D)|^2\bigr] \le 1 + V(t)^{k-1}\delta,
\]
as desired.
Lemma 2.30 follows by combining Claims 2.32 and 2.33.
2.3 Proofs for Section 2.1.1
In this section we provide the proofs for the claims made in Section 2.1.1.
Theorem 2.10 (Distinguishing noisy codewords from uniform is hard). Let $C \subseteq \mathbb{F}_q^n$ be a $b$-uniform code. Let $N$ be the noise distribution from Definition 2.4. Let $k$ be an integer dividing $n$ such that $b \ge n/k - 1$. Let $P \colon \mathbb{F}_q^n \to \{0,1\}$ be a $k$-party protocol using $c$ bits of communication. Then
\[
\bigl|\Pr[P(C+N) = 1] - \Pr[P(U) = 1]\bigr| \le \varepsilon \quad\text{for } \varepsilon = 2^{c + \log n + O(1) - \Omega(\eta b/k)},
\]
where $C$ and $U$ denote the uniform distributions over the code $C$ and over $\mathbb{F}_q^n$, respectively.

Proof. Let $L$ be the set of the $2^c$ leaves of the protocol tree. For $\ell \in L$, note that the set of inputs that lead to $\ell$ forms a rectangle, denoted $R_\ell$. Moreover, these rectangles are disjoint. Hence,
\[
\bigl|\Pr[P(C+N) = 1] - \Pr[P(U) = 1]\bigr| = \Bigl|\sum_{\ell} \Pr[C+N \in R_\ell] - \sum_{\ell} \Pr[U \in R_\ell]\Bigr| \le \sum_{\ell} \bigl|\Pr[C+N \in R_\ell] - \Pr[U \in R_\ell]\bigr|.
\]
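This step uses the standard fact that in a deterministic protocol the inputs consistent with a fixed transcript form a combinatorial rectangle, and distinct transcripts give disjoint rectangles. A toy two-party illustration of that fact (a hypothetical protocol, far simpler than the $k$-party setting of the theorem):

```python
from itertools import product

# Toy deterministic 2-party protocol on 2-bit inputs: Alice announces the
# high bit of x, then Bob announces the high bit of y. The transcript
# determines the leaf of the protocol tree that is reached.
def transcript(x, y):
    return (x >> 1, y >> 1)

inputs = list(product(range(4), range(4)))
leaves = {}
for x, y in inputs:
    leaves.setdefault(transcript(x, y), set()).add((x, y))

def leaf_rectangles():
    """Check that each leaf's set of inputs is a rectangle A x B and that
    the leaves partition the input space; return the number of leaves."""
    total = 0
    for S in leaves.values():
        A = {x for x, _ in S}
        B = {y for _, y in S}
        assert S == set(product(A, B))  # the leaf's preimage is a rectangle
        total += len(S)
    assert total == len(inputs)         # the rectangles are disjoint and cover
    return len(leaves)
```

Here there are $2^c = 4$ leaves for the $c = 2$ communicated bits, matching the count used in the proof.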
If $b \ge n/k - 1$, then applying Corollary 2.23.(3) with $\ell = n/k$ to each $R_\ell$ we have
\[
\begin{aligned}
\varepsilon &= 2^c \bigl(k\, e^{-\Omega(\eta b/k)} + 2k\ell\, e^{-\Omega(\eta b)}\bigr)
= 2^c \bigl(k\, e^{-\Omega(\eta b/k)} + 2n\, e^{-\Omega(\eta b)}\bigr)\\
&\le 2^c \cdot 3n\, e^{-\Omega(\eta b/k)} = 2^{c + \log n + O(1) - \Omega(\eta b/k)},
\end{aligned}
\]
using $k\ell = n$ for the second equality, and $k \le n$ together with $e^{-\Omega(\eta b)} \le e^{-\Omega(\eta b/k)}$ for the inequality.
Recall that we denote by $V(t)$ the number of $x \in \mathbb{F}_q^n$ with at most $t$ non-zero coordinates.
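Concretely, $V(t) = \sum_{j=0}^{t} \binom{n}{j}(q-1)^j$: choose the support of size $j$, then a non-zero value for each chosen coordinate. A quick cross-check of this closed form against brute-force counting for small parameters (an illustrative sketch, not from the text):

```python
from itertools import product
from math import comb

def V(t, n, q):
    """Closed form: number of vectors in F_q^n with at most t non-zero
    coordinates, counting by support size."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(t + 1))

def V_brute(t, n, q):
    """Brute-force count over all q**n vectors (small parameters only)."""
    return sum(1 for x in product(range(q), repeat=n)
               if sum(c != 0 for c in x) <= t)
```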