<<

2009 50th Annual IEEE Symposium on Foundations of Computer Science

Multiparty Communication Complexity and Threshold Circuit Size of AC0

Paul Beame∗ Dang-Trinh Huynh-Ngoc∗† Computer Science and Engineering Computer Science and Engineering University of Washington University of Washington Seattle, WA 98195-2350 Seattle, WA 98195-2350 [email protected] [email protected]

Abstract— We prove an nΩ(1)/4k lower bound on the random- that the Generalized Inner Product function in ACC0 requires ized k-party communication complexity of depth 4 AC0 functions k-party NOF communication complexity Ω(n/4k) which is in the number-on-forehead (NOF) model for to Θ(log n) polynomial in n for k up to Θ(log n). players. These are the first non-trivial lower bounds for general 0 NOF multiparty communication complexity for any AC0 function However, for AC functions much less has been known. for ω(log log n) players. For non-constant k the bounds are larger For the communication complexity of the set disjointness than previous lower bounds for any AC0 function even for function with k players (which is in AC0) there are lower simultaneous communication complexity. bounds of the form Ω(n1/(k−1)/(k−1)) in the simultaneous Our lower bounds imply the first superpolynomial lower bounds NOF [24], [5] and nΩ(1/k)/kO(k) in the one-way NOF for the simulation of AC0 by MAJ ◦ SYMM ◦ AND circuits, showing that the well-known quasipolynomial simulations of AC0 model [26]. These are sub-polynomial lower bounds for all by such circuits are qualitatively optimal, even for formulas of non-constant values of k and, at best, polylogarithmic when small constant depth. k is Ω(log n/ log log n). NPcc − BPPcc k We also exhibit a depth 5 formula√ in √ k k for Until recently, there were no lower bounds for general up to Θ(log n) and derive an Ω(2 log n/ k) lower bound on multiparty NOF communication complexity of any AC0 the randomized k-party NOF communication complexity of set 1/3 function. That changed with recent lower bounds for set disjointness for up to Θ(log n) players which is significantly disjointness by Lee and Shraibman [14] and Chattopadhyay larger than the O(log log n) players allowed in the best previous lower bounds for multiparty set disjointness. We prove other strong and Ada [8] but no lower bounds apply for ω(log log n) 0 results for depth 3 and 4 AC0 functions. players. As for circuit simulations of AC , Sherstov [20] - cently showed that AC0 cannot be simulated by polynomial- Keywords-communication complexity, constant-depth circuits, lower bounds size MAJ ◦ MAJ circuits. However, there have been no non-trivial size lower bounds for the simulation of AC0 by 1. INTRODUCTION MAJ ◦ MAJ ◦ AND or even SYMM ◦ AND circuits with ω(log log n) bottom fan-in. As shown by Viola [25], suffi- The multiparty communication complexity of AC0 in ciently strong lower bounds for AC0 in the multiparty NOF the number-on-forehead (NOF) model has been an open communication model, even for sub-logarithmic numbers of question since Hastad˚ and Goldmann [11] showed that players, can yield quasipolynomial circuit size lower bounds. any AC0 or ACC0 function has polylogarithmic random- We indeed produce such strong lower bounds. We show ized multiparty NOF communication complexity when its that there is an explicit linear-size fixed-depth AC0 function input bits are divided arbitrarily among a polylogarithmic that requires randomized k-party NOF communication com- number of players. This result is based on the simulations, plexity of nΩ(1)/4k even for error exponentially close to due to Allender and Yao, of AC0 circuits [1] and ACC0 1/2. For ω(1) players this bound is larger than all previous circuits [27] by quasipolynomial-size depth-3 circuits that multiparty NOF communication complexity lower bounds consist of two layers of MAJORITY gates whose inputs are for AC0 functions, even those in the weaker simultaneous polylogarithmic-size AND gates of literals. These protocols model. The bound is non-trivial for up to Θ(log n) players may even be simultaneous NOF protocols in which the and is sufficient to apply Viola’s arguments to produce fixed- players in parallel send their information to a referee who depth AC0 functions that require MAJ◦SYMM◦AND circuits computes the answer [2]. of nΩ(log log n) size, showing that quasipolynomial size is It is natural to ask whether these upper bounds can be im- necessary for the simulation of AC0. proved. In the case of ACC0, Razborov and Wigderson [18] The function for which we derive our strongest commu- showed that quasipolynomial size is required to simulate nication complexity lower bound is computable in depth 6 ACC0 based on the result of Babai, Nisan, and Szegedy [4] AC0. In the case of protocols with error 1/3, we exhibit ∗Research supported by NSF grants CCF-0514870 and CCF-0830626 a hard function computable by simple depth 4 formulas. †Research supported by a Vietnam Education Foundation Fellowship We further show that the same lower bound applies to a

0272-5428/09 $26.00 © 2009 IEEE 53 DOI 10.1109/FOCS.2009.12 function having depth 5 formulas that also has O(log2 n) a distribution µ on inputs such that, with respect to µ, nondeterministic communication complexity which shows g is both highly correlated with f and orthogonal to all 0 cc m that AC contains functions in NPk − BPPk for k up -degree polynomials. It follows that f ◦ ψ is highly Θ(log n) g ◦ ψm to . As a consequence of the√ lower√ bound for correlated with and, by the discrepancy method for this depth 5 function, we obtain Ω(2 log n/ k−k) lower communication complexity, it suffices to prove a discrepancy bounds on the k-party NOF communication complexity of lower bound for g ◦ ψm. Thanks to the orthogonality of g to set disjointness which is non-trivial for up to Θ(log1/3 n) all low degree polynomials this is possible using the bound players. The best previous lower bounds for set disjointness in [4], [9], [17] derived from the iterated application of the only apply for k ≤ log log n − o(log log n) players (though Cauchy-Schwartz inequality. For example, the bound for set n k these bounds are stronger than ours for o(log log n) players). disjointness DISJk,n(x) = ∨i=1 ∧j=1 xji corresponds to a In the full paper, we also show somewhat weaker lower particular√ selector ψ and f = OR which has approximate bounds of nΩ(1)/kO(k), which is polynomial in n for up degree Ω( n). to k = Θ(log / log log n) players, for another function In the two party case, Sherstov [23] and Razborov and in depth 4 AC0 that has O(log3 n) nondeterministic com- Sherstov [19] extended the pattern matrix method to yield munication complexity and yet another in depth 3 AC0 sign-rank lower bounds for some simple functions. A key Ω(1/k) O(k) idea for their arguments is the existence of orthogonalizing that has n /2 √randomized k-party communication complexity for k = Ω( log n) players. distributions µ for their functions that are “min-smooth” in that they assign at least some fixed positive probability to Methods and Related Work: Recently, Sherstov intro- any x such that f(x) = 1. duced the pattern matrix method, a general method to use By contrast we show that any function f for which analytic properties of Boolean functions to derive communi- approximating f within  on only a subset S of inputs cation lower bounds for related Boolean functions [20], [22]. requires large degree, there is an orthogonalizing distribution In [20], this analytic property was large threshold degree, µ for f that is “max-smooth” – the probability of subsets and the resulting communication lower bounds yielded lower defined by partial assignments is never much larger than AC0 MAJ ◦ MAJ bounds for simulations of by circuits. under the uniform distribution. The smoothness quality and Sherstov [22] extended this to large approximate degree, the properties of the constrained subset S are determined yielding a strong new method for lower bounds for two- by a function α so we call the degree bound the (, α)- party randomized and quantum communication complexity. approximate degree. We show that for any function this Chattopadhyay [7] generalized [20] to pattern tensors for degree bound is large if there is a diverse collection of partial k ≥ 2 players to yield the first lower bounds for the assignments ρ such that each subfunction f|ρ of f requires general NOF multiparty communication complexity of any large approximate degree. This property is somewhat deli- 0 AC function for k ≥ 3, implying exponential lower bounds cate but we are able to exhibit simple AC0 functions with 0 for computation of AC functions by MAJ ◦ SYMM ◦ ANY large (, α)-approximate degree. circuits with o(log log n) input fan-in – our results extend this to fan-in Ω(log n). Lee and Schraibman [14] and 2. PRELIMINARIESANDTHEGENERALIZED Chattopadhyay and Ada [8] applied the full method in [22] DISCREPANCY/CORRELATION METHOD to pattern tensors to yield the first lower bounds for the : Let AND denote the class of all general NOF multiparty communication complexity of set unbounded fan-in ∧ functions (of literals), SYMM denote disjointness for k > 2 players, improving on a long line of the class of all symmetric functions and MAJ ⊂ SYMM research on the problem [3], [24], [5], [26], [12], [6] and AC0 1 O(k) denote the class of all majority functions. is the class of obtaining a lower bound of Ω(n k+1 )/22 . This yields functions f : {0, 1}∗ → {0, 1} computed by polynomial size a separation between randomized and nondeterministic k- circuits (or formulas) of constant depth having ¬ gates and party models for k = o(log log n), which David, Pitassi, and unbounded fan-in ∧ and ∨ gates. Given classes of functions Viola [10] improved to Ω(log n) players for other functions C1, C2,... Cd, we let C1 ◦ C2 ◦ · · · ◦ Cd be the class of all based on pseudorandom generators. They asked whether circuits of depth d whose inputs are given by variables and there was a separation for Ω(log n) players for AC0 functions their negations and whose gates at the i-th level from the 0 since their functions are only in AC for k = O(log log n), top are chosen from Ci. a problem which our results resolve. We will assume that Boolean functions on m bits are maps The high-level idea of the k-party version of the pattern f : {0, 1}m → {−1, 1}. matrix method as described in [8], [21] is as follows. To Correlation: Let µ be a distribution on {0, 1}m. The prove k-party lower bounds for a function F , we first correlation between two real-valued functions f and g under m show that F has f ◦ ψ as a subfunction where ψ is µ is defined as Corµ(f, g) := Ex∼µ[f(x)g(x)]. If G is a a bit-selection function and f has large approximate de- class of functions, the correlation between f and G under µ gree. For such an f there exists another function g and is defined as Corµ(f, G) := maxg∈G Corµ(f, g).

54 k k Communication complexity: Let D (f),  (f), and The second major component of the pattern matrix/tensor N k(f) denote the k-party deterministic, randomized with method is the use of particular selector functions to provide two-sided error , and nondeterministic, respectively, com- inputs to functions f with large -approximate degree. munication complexity of f. Let Πc be the class of output k ks functions of all deterministic k-party communication proto- Definition Any function ψ : {0, 1} → {0, 1} with the cols of cost at most c. following property is a selector function: s • There exist sets Dψ,1,...,Dψ,(k−1) ⊆ {0, 1} such Fact 2.1 (cf. [13]). If there exists a distribution µ such that Y = (Y ,...,Y ) ∈ D := c k that for any 1 k−1 ψ Corµ(f, Π ) ≤  then R (f) ≥ c. k 1/2−/2 Dψ,1 × · · · × Dψ,(k−1), PrX∈{0,1}s [ψ(X,Y ) = 0] = Because of the following property of multiparty com- PrX∈{0,1}s [ψ(X,Y ) = 1] = 1/2. munication complexity, henceforth we find it convenient to Let D(m) := Dm × · · · × Dm . For any function designate the input to player 0 as x and the inputs to players ψ ψ,1 ψ,(k−1) f : {0, 1}m → {1, −1} and any selector function ψ we 1 through k − 1 as y , . . . , y . 1 k−1 define a new function f ◦ ψm on {0, 1}kms bits by, on any Lemma 2.2 ([4], [9], [17]). Let f : {0, 1}m×k → and U ms (m) R x ∈ {0, 1} and y = (y1, . . . , yk−1) ∈ Dψ , be the uniform distribution over X × Y where Y = Y1 × f ◦ ψm(x, y) = f ◦ ψm(x, y , . . . , y ) · · · × Yk−1. Then, 1 k−1 c 2k−1 = f(ψ(x1, y∗1), . . . , ψ(xm, y∗m)), CorU (f, Πk) c·2k−1 h  Y u  i where y∗i = (y1i, . . . , y(k−1)i) for i ∈ [m]. We will write ≤ 2 · Ey0,y1∈Y Ex∈X f(x, y ) zi = ψ(xi, y∗i) and z = (z1, . . . , zm) for the input to f. u∈{0,1}k−1 In the k-party NOF communication problem for f ◦ ψm on u u1 uk−1 k−1 ms where y = (y1 , . . . , yk−1 ) for u ∈ {0, 1} . input x, y1, . . . , yk−1 ∈ {0, 1} , player 0 holds x and can see all the y and each other player i holds y (but can Approximate and threshold degree: Given 0 ≤  < 1, i i only see x and all y for j 6= i) and they need to compute the -approximate degree of f, deg (f), is the smallest d for j  f ◦ ψm(x, y , . . . , y ). which ||f − p|| = max |f(x) − p(x)| ≤  for some real- 1 k−1 ∞ x One example of a selector function ψ is the pattern tensor valued polynomial p of degree d. Following [16] we have function ψ used in [8], [14] which generalizes the pattern the following property of the approximate degree of OR. k,` matrix function. In this example, s = `k−1 and the s bits are m k−1 Proposition 2.3. Let ORm : {0, 1} → {1, −1}. For 0 ≤ arranged in a (k − 1)-dimensional array indexed by [`] . p s  < 1, deg(ORm) ≥ (1 − )m/2. Dψk,`,j consists of the ` vectors Yj ∈ {0, 1} that are 1 in all entries in one of the ` slices along the j-th dimension of this The threshold degree of f, thr(f), is the smallest d for array and are 0 in every other entry. For X ∈ {0, 1}s and which there exists a multivariate real-valued polynomial p of such a Y = (Y ,...,Y ) ∈ {0, 1}(k−1)s the array ∧k−1Y degree d such that f(x) = sign(p(x)). Because the domain 1 k−1 i=1 i contains precisely one 1 which selects the bit of X to pass of f is finite, we can assume without loss of generality that to f. This function is expressible by a small 2-level ∨ of ∧s. p(x) 6= 0 for all x since we can shift p by adding the constant As described in [10] the generalized discrepancy/correlation 1 · max |f(x)| to p. Thus the condition on p can 2 x:f(x)<0 arguments work for any selector function that uses the inputs be replaced by f(x)p(x) > 0 on every input x. Hence it for players 1 to k − 1 to select which bits from player follows that thr(f) = min deg (f). For this reason, we <1  0’s input to pass on to f, but we need our more general write thr(f) = deg (f). <1 formulation for some examples we consider in the full paper. Define an inner product h, i on the set of functions f : We give a brief overview of the remainder of the argument {0, 1}m → by hf, gi = E[f · g]. For S ⊆ [m], let χ : R S in [8], [10], which extends ideas of [20], [22] from 2-party {0, 1}m → {−1, 1} be the function χ = Q (−1)xi . The S i∈S to k-party communication complexity. χS for S ⊆ [m] form an orthonormal basis of this space. • f m The following Orthogonality-Approximation Lemma is Start with a on bits having large (1 − δ) d the key to lower bounds using the pattern matrix (and pattern -approximate degree . • f tensor) method. It is easily proved by duality of ` and ` Apply the Orthogonality/Approximation Lemma to 1 ∞ g (1 − δ) f norms or by LP duality. to obtain a that is -correlated with and a distribution µ under which g is not correlated with any m Lemma 2.4 ([22]). If f : {0, 1} → {−1, 1} has deg(f) ≥ low degree polynomial. m d then there exists a function g : {0, 1} → {−1, 1} and a • Observe that from µ one can define a natural λ under distribution µ on {0, 1}m such that: which g◦ψm and f ◦ψm have the same high correlation m 1) Corµ(g, f) > ; and as g and f so to prove that f ◦ ψ is uncorrelated with 2) for every S ⊆ [m] with |S| < d and every function low communication protocols, by the triangle inequality |S| m h : {0, 1} → R, Ex∼µ[g(x) · h(x|S)] = 0. it suffices to prove this for g ◦ ψ .

55 • The BNS-Chung bound/Gowers’ norm used in This definition is an obvious weakening of the usual Lemma 2.2 is based on the expectation of a function’s `∞ approximation of f since the non-light points can be correlation with itself on randomly chosen hypercubes ignored in the approximation. We will use this definition to of points. Use the orthogonality of g under µ to all prove our main technical theorem on the application of the polynomials of degree < d to show that all low degree general correlation method. To prove the theorem, we need self-correlations of g ◦ ψm under λ disappear. The the following lemma which generalizes Lemma 2.4 and is remaining high-degree self-correlations are bounded by the first key to our substantially improved lower bounds. Its analyzing overlaps in the choices of bits in different proof, which is based on LP duality, is given in Section 7. inputs among the hypercube of inputs. The argument Lemma 3.1 (Max-Smooth Orthogonality-Approximation repeatedly bounds the probability mass that µ assigns Lemma). Let 0 <  ≤ 1 and α : {0, . . . , m} → R. If to small sub-cubes of the input by 1. m f : {0, 1} → {−1, 1} has deg<,α(f) ≥ d, then there • The final lower bound is limited both by the upper exists a function g : {0, 1}m → {−1, 1} and a distribution bound on correlation in the high degree case and by the µ on {0, 1}m such that: number of input bits required for each selector function. 1) Corµ(g, f) ≥ ; Our argument follows this basic outline but improves it 2) for every S ⊆ [m] with |S| < d and every function in two different ways. First, by considering a new measure |S| h : {0, 1} → R, Ex∼µ[g(x) · h(x|S)] = 0; and that strengthens (1 − δ)-approximate degree we are able α(|ρ|)−|ρ| 3) for any restriction ρ, µ(Cρ) ≤ 2 /. to obtain a much sharper upper bound on the high-degree self-correlations and second, we use a selector function that Although the upper bound on µ(Cρ) can be much larger −|ρ| requires many fewer bits. We also show that some simple than the 2 probability under the uniform distribution, functions require large values for our strengthened measure we can use it to obtain an exponential improvement in the (which turns out to be fairly non-trivial to prove). dependence of communication complexity lower bounds on α0 k if α(r) is bounded below r for r ≥ d and α0 < 1. As we 3. BEYONDAPPROXIMATEDEGREE: A NEW SUFFICIENT note in Section 7, for any function f computed by an AC0 CRITERIONFORSTRONGCOMMUNICATIONCOMPLEXITY circuit this assumption and the upper bound are essentially BOUNDS the best possible for d polynomial in m. We introduce our notion of (, α)-approximate degree and Definition Let ψ be a selector function with Dψ = Dψ,1 × show how it implies our main technical theorem on the · · · × D . For fixed y0, y1 ∈ D(m), i ∈ [m] and general correlation method. ψ,(k−1) ψ uniformly random x , we call i good for (y0, y1) if the A restriction is a ρ ∈ {0, 1, ∗}m, and we let |ρ| = |{i : i set of 2k−1 random variables zu = ψ(x , yu ) for u ∈ ρ 6= ∗}|. Two restrictions π and ρ are compatible, π k ρ, i i ∗i i {0, 1}k−1 are mutually independent, where yu is defined iff they agree on all non-star positions. Let Cρ = {x ∈ 0 1 m as in Lemma 2.2; otherwise we call i bad for (y , y ). Let {0, 1} : x k ρ}. 0 1 0 1 Rψ(y , y ) be the set of i ∈ [m] that are bad for (y , y ) 0 1 0 1 Definition Let α : {0, . . . , m} → R. Given a probability and let rψ(y , y ) = |Rψ(y , y )|. m distribution λ on the set of restrictions {0, 1, ∗} , we say We can now state the main technical consequence of the x ∈ {0, 1}m α-light for λ P 2|ρ|−α(|ρ|)λ(ρ) ≤ that is iff ρkx Max-Smooth Orthogonality-Approximation Lemma. A sim- 1. Note that when α(r) = r, every point is α-light for every ilar version with α(r) = r follows from earlier work but the distribution λ. ability to have α(r) < rα0 for large r yields exponentially Definition Let α : {0, . . . , m} → R. The (, α)- better lower bounds than in previous work. 1 approximate degree of f, denoted as deg (f), is defined m ,α Theorem 3.2. Let α : {0, . . . , m} → R. If f : {0, 1} → to be the minimum integer d ≥ 0 such that there is {1, −1} has deg<1−,α(f) ≥ d and ψ is a selector function some polynomial q of degree ≤ d and some probability ks on {0, 1} with Dψ = Dψ,1 × · · · × Dψ,(k−1) then distribution λ on restrictions such that for every x ∈ {0, 1}m k m if x is α-light for λ then |f(x) − q(x)| ≤ . Note that R1/2−(f ◦ ψ ) ≥ log2((1 − )) deg (f) α(r) ≥ r r m this reduces to  if for all . Also define 1  X k−1  − log 2(2 −1)α(r) ·Pr [r (y0, y1) = r] . deg<,α(f) = inf0< deg0,α(f). As we write thr(f) = k−1 2 ψ 2 y0,y1∈D(m) r=d ψ deg<1(f), we will usually say “α-threshold degree” for (< 1, α)-approximate degree. Proof: The pattern of the argument follows the outline from Section 2. We first apply Lemma 3.1 to f to produce 1 We use the same notation for a somewhat different and more general function g and distribution µ. By construction Corµ(f, g) ≥ definition than that in earlier versions of this paper. First, α previously was 1−. Then we define a distribution λ on {0, 1}mks based on a constant analogous to logr α(r) though this was not defined for all r. Second, the old definition was closer to that of a related quantity that we µ(z1, . . . , zm) ∗ µ and ψ by λ(x, y) = n−m m where zi = ψ(xi, y∗i) now call deg,α and define later. 2 |Dψ|

56 (m) 0 1 for y ∈ Dψ and 0 otherwise. To prove a lower bound c those i ∈ [m] that are bad for (y , y ). Therefore on Rk (f ◦ ψm) we show that Cor (f ◦ ψm, Πc ) ≤ 2. 1/2− λ k Y u u E{zu} ∼Z6=0...0|z0...0 µ(z )g(z ) Since ψ is a selector function, each zi = ψ(xi, y∗i) is a u6=0...0 u6=0...0 uniformly random bit for each fixed y∗i ∈ Dψ and random Y u u m m u 6=0...0 0...0 xi. We therefore have Corλ(f ◦ψ , g◦ψ ) = Corµ(f, g) ≥ = E{z }u6=0...0∼Z |(z ) 0 1 µ(z )g(z ). i i∈Rψ (y ,y ) m c m c u6=0...0 1 − , hence Corλ(f ◦ ψ , Πk) ≤  + Corλ(g ◦ ψ , Πk) by the triangle inequality. It therefore suffices to show that 0...0 m c This quantity is some function Q of z that depends Corλ(g ◦ ψ , Π ) ≤ . 0 1 0...0 k 0 1 on only the r = rψ(y , y ) variables (zi )i∈Rψ (y ,y ). By Lemma 2.2, if we let U be the uniform distribution Therefore ms (m) on the set of (x, y) ∈ {0, 1} × Dψ and zi = ψ(xi, y∗i) 0 1  0...0 0...0 0...0  we have H(y , y ) = Ez0...0 µ(z )g(z )Q(z ) = 0

m c 2k−1 by the orthogonality property of µ and g since r < d. Corλ(g ◦ ψ , Πk) 0 1 The following bound for r = rψ(y , y ) ≥ d is the key to m2k−1 c 2k−1 = 2 CorU (µ(z1, . . . , zm)g(z1, . . . , zm), Πk) the sharper bound that yields our exponentially better results. (c+m)·2k−1 0 1 A weaker version in [8] applies only when α(r) = r. ≤ 2 · E 0 1 (m) H(y , y ), y ,y ∈Dψ k−1 2(2 −1)α(r) Lemma 3.4. H(y0, y1) ≤ . where H(y0, y1) is the self-correlation in the hypercube 22k−1m2k−1−1 0 1 0 1 defined by y and y : Proof: Note that by definition of Rψ(y , y ), condi- tioned on each fixed value of x 0 1 = (x ) 0 1 Rψ (y ,y ) i i∈Rψ (y ,y ) 0 1  Y u u u u  the random variable zu = zu(x, y0, y1) is statistically H(y , y ) := Ex µ(z1 , . . . , zm)g(z1 , . . . , zm) , v u∈{0,1}k−1 independent of all z for v 6= u. For convenience of notation 0 1 we assume without loss of generality that Rψ(y , y ) = u u {1, . . . , r}. where zi = ψ(xi, y∗i). We now compute bounds on the self-correlation H(y0, y1) that depend on the value of r = Since g is ±1-valued, 0 1 rψ(y , y ). The first bound is from [8] and is the key to the Y H(y0, y1) = E  µ(zu)g(zu) original method. x u∈{0,1}k−1 0 1 0 1 Proposition 3.3. If r = rψ(y , y ) < d, then H(y , y ) = 0. Y u u ≤ Ex µ(z )g(z ) Proof: Let Z = Z0...0Z0...1 ···Z1...1 be the joint dis- u∈{0,1}k−1 u  Y u  tribution induced on {z }u∈{0,1}k−1 by taking x uniformly = Ex µ(z ) u at random. By construction, z is uniformly distributed in u∈{0,1}k−1 m k−1 u {0, 1} for any u ∈ {0, 1} so each Z is a uniform ≤ E [µ(z0...0)] distribution. For each choice of z0...0 we will also consider x Y u 6=0...0 0...0 u   the conditional distribution Z |z on {z } × max Exr+1,...,xm µ(z ) u6=0...0 x1,...,xr which is derived from Z by conditioning on Z0...0 = z0...0. u6=0...0 0...0 Then, = Ex[µ(z )] (1) Y  u  × max Exr+1...xm µ(z ) (2) 0 1 x1,...,xr H(y ,y ) u6=0...0

 Y u u  u u = E{zu} ∼Z µ(z )g(z ) where z = ψ(xi, y ) for all i ∈ [m]. u∈{0,1}k−1 i ∗i u∈{0,1}k−1 We first consider line (1). For x chosen uniformly from ms k−1  0...0 0...0 {0, 1} , by assumption on ψ, for any u ∈ {0, 1} the = E 0...0 µ(z )g(z ) z random variable zu is uniform in {0, 1}m. In particular, 0...0 Y u u  m · E{zu} ∼Z6=0...0|z0...0 µ(z )g(z ) . Ex[µ(z )] = Ez∈{0,1} [µ(z)]. Further, since µ is a u6=0...0 −m u6=0...0 distribution, Ez∈{0,1}m [µ(z)] = 2 . We now bound the remaining terms. First we have

We now consider the conditional distribution in the inner Y  u  0 1 max Exr+1...xm µ(z ) expectation above. For any i that is good for (y , y ) the set x1,...,xr k−1 u u6=0...0 of 2 random variables {z }u∈{0,1}k−1 are independent. i Y  u  6=0...0 0...0 ≤ max Ex ...x µ(z ) . Therefore conditioning of Z on z is equivalent to x ,...,x r+1 m 0...0 0...0 1 r 0 1 u6=0...0 conditioning on (zi )i∈Rψ (y ,y ), the portions of z on

57 u u m−|ρ| Fixing x1, . . . , xr fixes the values of z1 , . . . , zr and by our we define f|ρ on {0, 1} in the natural way. We also r m assumption on ψ, for random xr+1, . . . , xm the values of define Rm := {ρ ∈ {0, 1, ∗} : |ρ| = m − r}. zu , . . . , zu are uniformly random. Therefore the value in r+1 m Definition Given α : {0, . . . , m} → , we say that a line (2) is upper bounded by R probability distribution ν on {0, 1, ∗}m is α-spread iff for Y  u  m α(|ρ|)−|ρ| max Ezu ...zu µ(z ) every restriction ρ ∈ {0, 1, ∗} , Prπ∼ν [π k ρ] ≤ 2 . zu,...,zu r+1 m ∗ u6=0...0 1 r Let deg,α(f) be the minimum d such that for any α- m  2k−1−1 spread distribution ν on {0, 1, ∗} , there is some π with = max Ezr+1...zm µ(z) . z1,...,zr ν(π) > 0 and deg(f|π) ≤ d. Note that for α(r) = r, ∗ deg(f) = deg,α(f) since every distribution on restrictions By the property of µ implied by Lemma 3.1, ∗ ∗ is α-spread. We define deg<,α(f) = min0< deg0,α(f). X α(r)−r max µ(z) ≤ 2 / Given the following lemma, to show that deg,α(f) is z1,...,zr zr+1,...,zm ∗ large, it suffices to show that deg,α(f) is large. and therefore line (2) is upper bounded by m k−1 k−1 Lemma 4.1. Let f : {0, 1} → {−1, 1} and α : (2α(r)−r/(2m−r))2 −1 = (2α(r)−m/)2 −1. (This is ∗ {0, . . . , m} → R. For 0 <  ≤ 1, deg,α(f) ≥ deg,α(f). the one place where we use the max-smoothness property of the distribution µ.) The lemma follows immediately by Proof: Suppose, by contradiction, that for some d, (i) ∗ combining the bounds for lines (1) and (2). deg,α(f) > d, and (ii) deg,α(f) = d. Then by definition, m Plugging in the bounds of Proposition 3.3 and Lemma 3.4 (i’) there exists an α-spread distribution ν on {0, 1, ∗} such we obtain that that deg(f|π) > d for every π with ν(π) > 0, and (ii’)

k−1 there exists a polynomial q of degree ≤ d and a distribution Cor (g ◦ ψm, Πc )2 m P |ρ|−α(|ρ|) λ k λ on {0, 1, ∗} such that R(x) = ρkx 2 λρ > 1 m (2k−1−1)α(r) whenever x ∈ B0, where B0 = {x : |f(x) − q(x)| > }. k−1 X 2 ≤ 2(c+m)·2 · 22k−1m(1 − )2k−1−1 Choosing π ∼ ν, we define the random variable r=d 0 1 X |ρ|−α(|ρ|) × Pr [rψ(y , y ) = r] Iπ := 2 λρ. 0 1 (m) y ,y ∈Dψ ρkπ m c k−1 2 2 X (2k−1−1)α(r) < · 2 X |ρ|−α(|ρ|) 1 −  Then, Eπ∼ν (Iπ) = Pr [ρ k π] · 2 λρ r=d 0 1 π∼ν × Pr [rψ(y , y ) = r] ρ y0,y1∈D(m) ψ X α(|ρ|)−|ρ| |ρ|−α(|ρ|) ≤ 2 · 2 λρ ≤ 1. Taking 2k−1-st roots and using Fact 2.1 we obtain that ρ Rk (f ◦ ψm) ≥ c if 1/2− Therefore there exists a restriction π for which Iπ ≤ 1. If 0 c m there exists x ∈ B such that x ∈ Cπ, then since 2  X (2k−1−1)α(r)  ≥ · 2 X |ρ|−α(|ρ|) 1 −  1/2k−1 R(x) = 2 λρ > 1, r=d 0 1  × Pr [rψ(y , y ) = r] . ρkx y0,y1∈D(m) ψ 0 we would have Iπ > 1. Thus Cπ ∩ B = ∅. So for any Rewriting and taking logarithms yields the claimed bound x ∈ Cπ, we have |f(x) − q(x)| ≤ . But since the degree of of Theorem 3.2. qd is ≤ d this contradicts the fact that deg(f|π) > d. The lemma follows. 4. AC0 FUNCTIONSWITHLARGE (, α)-APPROXIMATE For the rest of this section we always take α(z) ≤ zα0 for DEGREE some α0 < 1 for large enough z and α(z) = z otherwise. By ∗ Given  < 1 and α, it is not obvious that any function, definition, to show that deg,α(f) is large, we need to exhibit let alone a function in AC0, has large (, α)-approximate an α-spread distribution ν such that for any restriction ρ with 0 degree. This section shows that AC does contain functions ν(ρ) > 0, deg(f|ρ) is large. An obvious choice for such ν r α0 with large (5/6, α)-approximate degree and functions with is the uniform distribution on Rm where r ≈ m . Indeed, α0 large α-threshold degree where α(z) ≤ z for α0 < 1 and it is not hard to show with this distribution that the parity all large z. function has large (, α)-approximate degree. However this We first reduce this new notion of approximate degree to simple ν cannot be used for AC0 circuits since these circuits a more tractable notion, which is only large if many widely shrink rapidly under such restrictions. Thus in Lemma 4.2 distributed restrictions of f also require large approximate we define a more involved α-spread family of restrictions. degree. Given a function f on {0, 1}m and a restriction ρ, With this family, we give a generic construction that takes

58 a circuit G on q bits and produces another circuit H on Lemma 4.5. For any p sufficiently large multiple of 15 and m = pq bits such that for any restriction π in the family, q = 24p, if α : {0, . . . , m} → R is defined as α(z) = z0.9 10 H|π contains the projection of G on some set S of r bits – a for r ≥ (3p ln 2) and α(z) = z otherwise, then there is an new function obtained from G by keeping only those nodes explicit depth 4 AC0 function on pq bits that has α-threshold on paths from the inputs in S to the output gate – as a degree at least q1/15. subfunction. If each such projection of G has -approximate 5. MULTIPARTY COMMUNICATION COMPLEXITY LOWER degree rΩ(1) and if p is O(log q) and r is polynomial in q BOUNDSFOR AC0 and hence in m = pq, then we derive that H has (, α)- approximate degree mΩ(1). Together with the functions from the previous section, Theorem 3.2 is sufficient to improve the lower bounds Lemma 4.2. Let q, r, p, and w be integers with q > r > for AC0 functions based on pattern tensor selectors from β √ p ≥ 2 and let 1 > α0 > β > 0 be such that q ≥ , O(log log n) players to Ω( log n) players. These results, p−1 1−β α0 6 p α0−β 2 − 1 ≥ q , q ≥ ln 2 2 r, and w ≥ 3p/ ln 2. which show the power of our introduction of (, α)- Fix any partition of a set of m = pq bits into q blocks of p approximate degree on its own, are described in the full bits each. Define distribution ν on Rpq as follows: choose paper. We need one more ingredient to obtain our strongest a subset of q − r blocks uniformly at random; then assign lower bounds, namely, a different selector function ψ, which values to the variables in each of these blocks uniformly at we denote by INDEX⊕a where a > 0 is an integer. This p p p m k−1 random from {0, 1} −{0 , 1 }. Then for any ρ ∈ {0, 1, ∗} a s α function has s = 2 and D = {0, 1} for all j. |ρ| 0 −|ρ| INDEX⊕a ,j with |ρ| ≥ w, we have Pr [ρ k π] ≤ 2 . k−1 π∼ν For X ∈ {0, 1}s and Y ∈ {0, 1}(k−1)s define The proof of Lemma 4.2 is surprisingly involved and INDEX⊕a (X,Y ) = X requires quite precise tail bounds. It is in the full paper. k−1 (Y1⊕...⊕Yk−1)[a] For  = 5/6, a simple candidate for G is G = OR . With q where the bits in X are indexed by a-bit vectors and Y[a] this G and the family of restrictions given by Lemma 4.2, denotes the vector of the first a bits of Y . This function the next lemma constructs H = TRIBESp,q that has large clearly satisfies the selector function requirement that the (5/6, α)-approximate degree. Recall that TRIBESp,q(x) = Y q p output be unbiased for each fixed value of . ∨i=1 ∧j=1 xi,j. Although the definition of INDEX a uses parity, the ⊕k−1 Lemma 4.3. Given any constants 0 < , α0, β < 1 with number of players k will be O(log n) and hence it is 0 computable in AC . We can either write INDEX a as an β > 1− and α0 −β ≥ 0.1. Let q > p ≥ 2 be integers such ⊕k−1 1−β p 1 α0+−1 α0 ∨ ◦ ∧ ◦ ∨ ◦ ∧ formula where the fan-ins are 2a, a + 1, 2k−2, that 2dq e < 2 ≤ 6 q ln 2. Define α(z) = z for zα0−β ≥ 3p/ ln 2 and α(z) = z otherwise. Then for large and k − 1, respectively, or as an ∨ ◦ ∧ ◦ ∨ formula where p 1− a k−2 enough q, we have deg5/6,α(TRIBESp,q) ≥ q /12. the fan-ins are 2 , a2 + 1, and k − 1, respectively. u With ψ = INDEX a , the variables z = ⊕k−1 i Proof: Define the distribution ν as in the statement of u k−1 INDEX a (x , y ) for u ∈ {0, 1} are independent iff ⊕k−1 i ∗i Lemma 4.2, where a p-block corresponds to a p-term in u v 1− for every u 6= v, y∗i and y∗i select different bits of xi. TRIBESp,q, by applying this lemma with r := dq e and 1/(α −β) w = (3p/ ln 2) 0 . For q large enough, Lemma 5.1. If ψ = INDEX a then ⊕k−1 qβ/r ≥ qβ+−1 > log q > p, and wα0−β ≥ 3p/ ln 2. 0 1 Pr [rψ(y ,y ) = r] y0,y1∈D(m) ψ 2k−a−3 It is clear that for any π with ν(π) > 0,ORr is a m em2 r ≤ 2(2k−a−3)r ≤  . subfunction of TRIBESp,q|π so deg5/6(TRIBESp,q|π) ≥ p r r deg (OR ) ≥ r/12. Thus, deg (TRIBES ) ≥ 5/6 r 5/6,α p,q (m) ∗ p Proof: D {0, 1}(k−1)ms deg (TRIBESp,q) ≥ r/12. In this case ψ is simply . For 5/6,α k−1 In particular, with  = 0.4, β = 0.8, α0 = 0.9, we get: each fixed i ∈ [m] and each fixed pair of u 6= v ∈ {0, 1} , u v 4p the probability that y∗i and y∗i select the same bit of xi is the Corollary 4.4. For sufficiently large p and q = 2 , u1 uk−1 v1 vk−1 probability that (y∗i ⊕· · ·⊕y∗i )[a] = (y∗i ⊕· · · y∗i )[a]. if α : {0, . . . , m} → R is defined as α(z) = z0.9 10 Since u 6= v, this is a homogeneous full rank system of a for r ≥ (3p ln 2) and α√(z) = z otherwise,√ then 3/10 6p/5 equations over F2 which is satisfied with probability pre- deg5/6,α(TRIBESp,q) ≥ q / 12 = 2 / 12. −a 2k−1 2k−3 cisely 2 . By a union bound over all of the 2 < 2 Corollary 4.4 suffices for most of our communication pairs u, v ∈ {0, 1}k−1, it follows that the probability that i is complexity lower bounds. However our results for threshold bad for (y0, y1) is at most 22k−32−a = 22k−a−3. The bound circuit size require a function in AC0 having large α- follows by the independence of the choices of (y0, y1) for threshold degree, which is more difficult to produce. The different values of i ∈ [m]. proof of the next lemma, which involves more complex G We are ready to prove the main theorem for functions and H, and our generic construction are in the full paper. composed using this new selector function.

59 √ √ k log2 n/ k Theorem 5.2. Let α : {0, . . . , m} → R and 0 < Theorem 5.6. R1/3(DISJn,k) is Ω(2 ) for k ≤ α0 < 1. For any Boolean function f on m bits such 1 1/3 log2 n. α0 5 that deg1−,α(f) ≥ d and α(r) ≤ r for all r ≥ d, m Although our bound for DISJn,k applies to exponentially the function f ◦ INDEX⊕a defined on nk bits, where k−1 more players than do the bounds in [14], [8], the previous n = ms and s = 2a ≥ e22k−1m/d, requires that k m k bounds are stronger for k ≤ log log n−o(log log n) players. R1/2−(f ◦ INDEX⊕a ) ≥ d/2 + log2((1 − )) for k−1 0 cc k ≤ (1 − α0) log2 d. Corollary 5.7. There is a depth-2 AC formula in NPk − cc 1/3 BPPk for k up to Θ(log n). Proof: For ψ = INDEX a , by Lemma 5.1, ⊕k−1 m Although we have shown non-trivial lower bounds for X k−1 1/3 2(2 −1)α(r) · Pr [r (y0, y1) = r] DISJk,n for k up to Θ(log n), it is open whether one can ψ (3) 1/3 y0,y1∈D(m) r=d ψ prove stronger lower bounds for k = ω(log n) players for 0 m 2k−a−3 DISJk,n or any other depth-2 AC function. The difficulty of X k−1 em2 r ≤ 2(2 −1)α(r) ·  (4) extending our lower bound methods is our inability to apply r r=d Lemma 3.1 to OR since the constant function 1 approximates k−1 OR on all but one point. Since k ≤ (1 − α0) log d, we have (2 − 1)α(r) < 2 MAJ ◦ SYMM ◦ AND d1−α0 α(r) ≤ r for r ≥ d so (4) is To prove lower bounds for circuits we need lower bounds on protocols that succeed with prob- m 2k−a−2 X em2 r ability barely better than that of random guessing. Using the ≤  r function with large α-threshold degree given by Lemma 4.5 r=d m in place of TRIBESp,q we obtain the following theorem. X −r −(d−1) a 2k−1 ≤ 2 < 2 for 2 ≥ e2 m/d. Theorem 5.8. There exist explicit constants c, c0 > 0 and r=d a depth 6 AC0 function H : {0, 1}∗ → {0, 1} such that Plugging this into Theorem 3.2 we obtain that k c for 1/2 >  > 0, R1/2−(Hn) is Ω(n + log ) for any 0 k m 1 −(d−1) k ≤ c log2 n. R1/2−(f ◦ ψ ) ≥ log2((1 − )) − k−1 log2 2 2 6. THRESHOLDCIRCUITLOWERBOUNDSFOR AC0 > d/2k + log ((1 − )) 2 Following the approach of Viola [25], which extends the as required. ideas of Razborov and Wigderson [18], we show quasipoly- 0 0 Let TRIBESp,q be the dual of the TRIBESp,q function on nomial lower bounds on the simulation of AC functions by 0 m = pq bits. Obviously the (, α)-degree of TRIBESp,q is unrestricted MAJ ◦ SYMM ◦ AND circuits. the same as that of TRIBESp,q for any  and α. By applying ∗ 0 Theorem 6.1. There is a function G : {0, 1} → {0, 1} in the above theorem for f = TRIBES and f = TRIBES , p,q p,q AC0 such that G requires MAJ◦SYMM◦AND circuit size we obtain the following result. The proofs of the remaining N N Ω(log log N). results in this section are in the full paper. Proof Sketch: The proof is almost identical to an Theorem 5.3. Let p be a sufficiently large integer and 0 4p p+2k argument in [25] with our hard AC functions replacing the q = 2 , k ≤ p/10, and s = 2 . Let F = TRIBESp,q ◦ m 0 0 m generalized inner product. It relies on the following con- INDEX (p+2k) and F = TRIBESp,q ◦ INDEX (p+2k) . Let ⊕k−1 ⊕k−1 nection between multiparty communication complexity and n = pqs = p25p+2k be the number of input bits given threshold circuit complexity given by Hastad˚ and Goldmann. 0 k to each player in computing F or F . Then R1/3(F ) and k 0 0.3 k Ω(1) k Proposition 6.2. [11] If f is computed by a MAJ ◦ R1/3(F ) are both Ω(q /2 ) which is n /4 . Further- k SYMM ◦ ANDk−1 circuit of size S, then R (f) more, F has polynomial-size depth 5 AC0 formulas and F 0 1/2−1/(2S) is O(k log S). has polynomial-size depth 4 AC0 formulas. k m We use the function Hn from Theorem 5.8 and replace Lemma 5.4. N (TRIBESp,q ◦INDEX⊕a ) is O(log q+pa). 2 k−1 each input by an ⊕ of Θ(log n) new input bits to obtain a 2 Corollary 5.5. There is a function G in depth 5 AC0 such function G of N = Θ(n log n) inputs. This adds 2 to the that G is in NPcc −BPPcc for k ≤ a0 log n for some constant depth and keeps the polynomial size. If G is computed by k k o(log log N) a0 > 0. such a circuit C of size N then using random re- strictions that leave bits unset with probability Θ(1/ log N) By applying the distributive law to the depth 5 function we can ensure both that all bottom-level AND gates of C are m f = TRIBESp,q ◦ INDEX (p+2k) we derive the following reduced to fan-in at most δ log N and that every ⊕ block of ⊕k−1 2 exponential improvement in the number of players for which inputs in G contains at least one unset input bit. Applying non-trivial lower bounds can be shown for DISJk,n. Proposition 6.2 yields a contradiction to Theorem 5.8.

60 7. PROOFOF LEMMA 3.1 and rewrite (6) as: Proof: As in the proof for Lemma 2.4, we write the m h(x): βf(x) + pd(x) + wx − vx = 0 : x ∈ {0, 1} (7) requirements as a linear program and study its dual. The lemma is implied by proving that the following linear Equations (5) and (7) for x ∈ {0, 1}m together are equiva- program P has optimal value ≤ 1: lent to: X |ρ|−α(|ρ|) Minimize η subject to 2wx + βf(x) + pd(x) + γ − 2 λρ = 0 and X ρkx yS : h(x)χS(x) = 0 : |S| < d X |ρ|−α(|ρ|) x∈{0,1}m 2vx − βf(x) − pd(x) + γ − 2 λρ = 0. X β : h(x)f(x) ≥  ρkx x∈{0,1}m Since these are the only constraints on vx and wx respec- m vx : µ(x) − h(x) ≥ 0 : x ∈ {0, 1} tively other than non-negativity these can be satisfied by any m solution to wx : µ(x) + h(x) ≥ 0 : x ∈ {0, 1} X |ρ|−α(|ρ|) |ρ|−α(|ρ|) X m βf(x) + pd(x) + γ ≤ 2 λρ and λρ : η − 2 µ(x) ≥ 0 : ρ ∈ {0, 1, ∗} ρkx x∈Cρ X X |ρ|−α(|ρ|) γ : µ(x) = 1 −βf(x) − pd(x) + γ ≤ 2 λρ, ρkx x∈{0,1}m Suppose that we have optimum η ≤ 1. In this LP for- which together are equivalent to mulation, inequality γ ensures that the function µ is a X |ρ|−α(|ρ|) |βf(x) + pd(x)| + γ ≤ 2 λρ. probability distribution, and inequalities v and w ensure x x ρkx that µ(x) ≥ |h(x)| so ||h||1 ≤ 1. If ||h||1 = 1, then we must have µ(x) = |h(x)| and we can write h(x) = µ(x)g(x) as Since pd(x) is an arbitrary polynomial function of degree d p = −βq q in the proof of Lemma 2.4 and then the inequalities yS will less than , we can write d d where d is another d ensure that Corµ(g, χS) = 0 for |S| < d and inequality β arbitrary polynomial function of degree less than and we |βf(x) + p (x)| β|f(x) − q (x)| will ensure that Corµ(f, g) ≥  as required. Finally, each can replace the terms d by d . −|ρ|+α(ρ|) Therefore the dual program D is equivalent to maximizing inequality λρ ensures that µ(Cρ) ≤ 2 which is actually a little stronger than our claim. β ·  + γ subject to The only issue is that an optimal solution might have X |ρ|−α(|ρ|) β|f(x) − qd(x)| + γ ≤ 2 λρ ||h|| < 1. However in this case inequality β ensures that 1 ρkx ||h||1 ≥ . Therefore, for any solution of the above LP m with function h, we can define another function h0(x) = for all x ∈ {0, 1} , where λ is a probability distribution 0 h(x)/||h||1 with ||h ||1 = 1 and a new probability distri- on the set of restrictions and qd is a real-valued function of 0 0 0 bution µ by µ (x) = |h (x)| ≤ µ(x)/||h||1 ≤ µ(x)/. degree < d. m This new h0 and µ0 still satisfy all the inequalities as before Now, let B be the set of points x ∈ {0, 1} at which except possibly inequality λρ but in this case if we increase |f(x)−qd(x)| ≥ . For any x ∈ B, the value of the objective η by a 1/||h||1 factor it will also be satisfied. Therefore, function of D, which is β ·  + γ, is not more than 0 −|ρ|+α(|ρ|) µ (Cρ) ≤ 2 /. X |ρ|−α(|ρ|) β|f(x) − qd(x)| + γ ≤ 2 λρ. (8) Here is the dual LP: ρkx Maximize β ·  + γ subject to Let R(x) denote the right-hand side of inequality (8). It X η : λρ = 1 suffices to prove that R(x) ≤ 1 for some x ∈ B. This is, in ρ∈{0,1,∗}m turn, equivalent to proving that X |ρ|−α(|ρ|) m µ(x): vx + wx + γ − 2 λρ = 0 : x ∈ {0, 1} min R(x) ≤ 1, ρkx x∈B (5) for any distribution λ. Since deg<,α(f) is larger than the m X m degree of qd, there must exist x ∈ {0, 1} that is both α- h(x): βf(x) + ySχS(x) + wx − vx = 0 : x ∈ {0, 1} light for λ and |f(x) − q (x)| ≥ . Since |f(x) − q (x)| ≥  |S| 0 when applied to AC functions: By results 0 by pd(x) where pd is an arbitrary polynomial of degree < d of Linial, Mansour, and Nisan [15], for any AC function f

61 and constant 0 < η < 1, there is a function pd of degree [12] R. Jain, H. Klauck, and A. Nayak, “Direct product theorems η 2 m−mδ for classical communication complexity via subdistribution d < m , such that ||f − pd||2 ≤ 2 for some constant δ > 0. Let B be the set of x such that |f(x) − p (x)| ≥ . bounds,” in Proceedings of the Fortieth Annual ACM Sympo- m d sium on Theory of Computing, Victoria, BC, May 2008, . Then |B |2 ≤ P |f(x) − p (x)|2 ≤ ||f − p (x)||2 ≤ m x∈Bm d d 2 599–608. m−mδ m−mδ 2 2 so |Bm| ≤ 2 / . If we tried to replace the [13] E. Kushilevitz and N. Nisan, Communication Complexity. upper bound on µ(Cρ) by some c(|ρ|) where 1/c(m) is Cambridge, England ; New York: Cambridge University ω(|Bm|) then in the dual program D, we could choose λx = Press, 1997. 1/|Bm| for x ∈ Bm and λρ = 0 for all other ρ and for these [14] T. Lee and A. Shraibman, “Disjointness is hard in the values β would be unbounded. multi-party number-on-the-forehead model,” in Proceedings Twenty-Third Annual IEEE Conference on Computational Complexity, College Park, Maryland, Jun. 2008, pp. 81–91. ACKNOWLEDGEMENTS [15] M. Linial, Y. Mansour, and N. Nisan, “Constant depth circuits, Fourier transform, and learnability,” in 30th Annual Sympo- We thank Emanuele Viola for suggesting the circuit com- sium on Foundations of Computer Science, Research Triangle plexity application and Alexander Sherstov, Arkadev Chat- Park, NC, Oct. 1989, pp. 574–579. topadhyay, and the anonymous referees for many helpful [16] N. Nisan and M. Szegedy, “On the degree of boolean func- comments. tions as real polynomials,” Computational Complexity, vol. 4, pp. 301–314, 1994. [17] R. Raz, “The BNS-Chung criterion for multi-party commu- REFERENCES nication complexity,” Computational Complexity, vol. 9, pp. 113–122, 2000. [1] E. W. Allender, “A note on the power of threshold circuits,” in 30th Annual Symposium on Foundations of Computer Science. [18] A. Razborov and A. Wigderson, “Lower bounds on the size Research Triangle Park, NC: IEEE, Oct. 1989, pp. 580–584. of depth 3 threshold circuits with AND gates at the bottom,” Information Processing Letters, vol. 45, pp. 303–307, 1993. [2] . Babai, A. Gal,´ P. G. Kimmel, and S. V. Lokam, “Communi- 0 cation complexity of simultaneous messages,” SIAM Journal [19] A. A. Razborov and A. A. Sherstov, “The sign-rank of AC ,” on Computing, vol. 33, no. 1, pp. 137–166, 2003. in Proceedings 49th Annual Symposium on Foundations of Computer Science. Philadelphia,PA: IEEE, Oct. 2008, pp. [3] L. Babai, T. P. Hayes, and P. G. Kimmel, “The cost of the 57–66. missing bit: Communication complexity with help,” Combi- 0 natorica, vol. 21, no. 4, pp. 455–488, 2001. [20] A. A. Sherstov, “Separating AC from depth-2 majority circuits,” in Proceedings of the Thirty-Ninth Annual ACM [4] L. Babai, N. Nisan, and M. Szegedy, “Multiparty protocols, Symposium on Theory of Computing, San Diego, CA, Jun. pseudorandom generators for logspace, and time-space trade- 2007, pp. 294–301. offs,” Journal of Computer and System Sciences, vol. 45, no. 2, pp. 204–232, Oct. 1992. [21] ——, “Communication lower bounds using dual polynomi- als,” Bulletin of the European Association for Theoretical [5] P. Beame, T. Pitassi, N. Segerlind, and A. Wigderson, “A Computer Science, vol. 95, pp. 59–93, 2008. strong direct product theorem for corruption and the multi- party communication complexity of set disjointness,” Com- [22] ——, “The pattern matrix method for lower bounds on quan- putational Complexity, vol. 15, no. 4, pp. 391–432, 2006. tum communication,” in Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, Victoria, BC, May [6] A. Ben-Aroya, O. Regev, and R. de Wolf, “A hypercontractive 2008, pp. 85–94. inequality for matrix-valued functions with applications to ,” in Proceedings 49th Annual Symposium [23] ——, “Unbounded-error communication complexity of sym- on Foundations of Computer Science. Philadelphia,PA: metric functions,” in Proceedings 49th Annual Symposium on IEEE, Oct. 2008, pp. 477–486. Foundations of Computer Science. Philadelphia,PA: IEEE, Oct. 2008, pp. 384–393. [7] A. Chattopadhyay, “Discrepancy and the power of bottom fan-in in depth-three circuits,” in Proceedings 48th Annual [24] P. Tesson, “Communication complexity questions related to Symposium on Foundations of Computer Science. Berkeley, finite monoids and semigroups,” Ph.D. dissertation, McGill CA: IEEE, Oct. 2007, pp. 449–458. University, 2002. [8] A. Chattopadhyay and A. Ada, “Multiparty communication [25] E. Viola, “Pseudorandom bits for constant-depth circuits with complexity of disjointness,” Electronic Colloquium in Com- few arbitrary symmetric gates,” SIAM Journal on Computing, putation Complexity, Tech. Rep. TR08-002, 2008. vol. 36, no. 5, pp. 1387–1403, 2007. [9] F. R. K. Chung, “Quasi-random classes of hypergraphs,” [26] E. Viola and A. Wigderson, “One-way multi-party commu- Random Structures and Algorithms, vol. 1, no. 4, pp. 363– nication lower bound for pointer jumping with applications,” 382, 1990. in Proceedings 48th Annual Symposium on Foundations of Computer Science. Berkeley, CA: IEEE, Oct. 2007, pp. 427– [10] M. David, T. Pitassi, and E. Viola, “Improved separations 437. between nondeterministic and randomized multiparty commu- nication,” in RANDOM 2008, 12th International Workshop on [27] A. C. Yao, “On ACC and threshold circuits,” in Proceedings Randomization and Approximization Techniques in Computer 31st Annual Symposium on Foundations of Computer Science. Science, 2008, pp. 371–384. St. Louis, MO: IEEE, Oct. 1990, pp. 619–627. [11] J. Hastad˚ and M. Goldmann, “On the power of small-depth threshold circuits,” Computational Complexity, vol. 1, pp. 113–129, 1991.

62