Separation of AC $^ 0 [\Oplus] $ Formulas and Circuits

Home , AC0, CC (complexity), ELEMENTARY, R (complexity)

arXiv:1702.03625v1 [cs.CC] 13 Feb 2017 hr a enpors naaouso h NC the of analogues on progress been has there well. h oooefruasz fst-Connectivity). bou lower of onl classic size gates formula the OR by monotone and separated the AND was (with circuits basis vs. monotone formulas the in instance, For ola icis.Dsiedcdso ffrs hsquestion this efforts, of decades o Despite class (the P/poly circuits). g of Boolean the subclass proper of a NC is one whether formulas) is is Boolean area this circuits in versus question central formulas The of power relative The Introduction 1 ∗ 1 nti ae efcso o-nfr opeiycass Th classes. complexity non-uniform on focus we paper this In NSERC by Supported a prxmt ereΘ( degree approximate has et nteAC the in depth all zdcntuto fdepth- of construction ized to n 1 a O n3 on depth- Majorities tuto fformulas of struction exp( O (log depth- ( / d hspprgvstefis eaainbtentepwrof power the between separation first the gives paper This AC oooeAC Monotone d 8 / O ( ) et.o ahadCmue Science Computer and Math of Depts. errplnma approximation polynomial -error n .Ti euti bandb obnto fnwlwraduprboun upper and lower new of combination a by obtained is result This ). rcino inputs. of fraction 4 s ( d ) n ) 0 d ≤ [ AC 1 d − ⊕ ⊕ ⊕ / h ls fBoenfunctions Boolean of class the , 2( 1 omlso size of formulas O oml oe bound. lower formula ] 0 d ereapoiainfrcircuits for approximation degree eaaino AC of Separation ( [ − ⊕ o log log 1) nvriyo Toronto of University log oml ieo l prxmt aoiyfntosfrall for functions Majority Approximate all of size formula ] ejmnRossman Benjamin )ta opt nApoiaeMjrt ucin hsstrength This function. Majority Approximate an compute that )) 0 [ ⊕ n n ai ubuddfni N,O,NTadMOD and NOT OR, AND, fan-in (unbounded basis ] 0 ,ta hr exist there that ), fsz exp( size of ici pe bound. upper circuit n √ d o n ( oooeAC monotone d ,ti eutipisa exp(Ω( an implies result this ), ) O mroe,ti sotmli that in optimal is this (moreover, ( dn eray1,2017 14, February over eso hteeydepth- every that show We 1 ∗ / oyoilsz depth- polynomial-size 0 2( [ { d ⊕ Abstract 0 F − , 2 1) 0 1 1 omlsadCircuits and Formulas ] u oRzoo 1] ic h aoiyfunction Majority the Since [12]. Razborov to due fdegree of 1 )det mn [1]. Amano to due )) } tecaso agae eial ypolynomial-size by decidable languages of class (the icis(ihu O rMOD or NOT (without circuits 1 n s /oyqeto ncranrsrce settings. restricted certain in question P/poly vs. o all For { → 0 , O 1 d usino uniform-NC of question e } ( ( eateto Mathematics of Department n htarewt h aoiyfunction Majority the with agree that agae eial ypolynomial-size by decidable languages f d 1 do acmradWgesn[](on [8] Wigderson and Karchmer of nd ean ieopen. wide remains ) log ≤ etmseisi opeiytheory. complexity in mysteries reat d rknhSrinivasan Srikanth dn s circuits O ) formulas d d 1 ( − / ) h oe fpolynomial-size of power the y), I Bombay IIT AC o log log 2( n 1 log hssrntesaclassic a strengthens This . o d ( − d 0 n 1) ) [ ⊕ htaentequivalent not are that n antb mrvdto improved be cannot )lwrbudo the on bound lower )) 2 oml fsize of formula ] ,w iearandom- a give we ), and ae) eso,for show, We gates). d ( sfor ds n ) circuits 2 1 ≤ 1 s swd pnas open wide is P vs. ae)o size of gates) Approximate ntemeantime, the In O (log n con- a ens fequal of n ). s has The bounded-depth setting is another natural venue for investigating the question of formula vs. circuits. Consider the elementary fact that every depth-d circuit of size s is equivalent to a depth-d d 1 formula of size at most s − , where we measure size by the number of gates. This observation is valid with respect to any basis (i.e. set of gate types). In particular, we may consider the AC0 basis (unbounded fan-in AND, OR, NOT gates) and the AC0[ ] basis (unbounded fan-in MOD gates ⊕ 2 in addition to AND, OR, NOT gates). With respect to either basis, there is a natural depth-d analogue of the NC1 vs. P/poly question (where d = d(n) is a parameter that may depend on n), namely whether every language decidable by polynomial-size depth-d circuits is decidable by depth-d formulas of size no(d) (i.e. better than the trivial nO(d) upper bound). It is reasonable to expect that this question could be resolved in the sub-logarithmic depth regime (d(n) log n), given the powerful lower bound techniques against AC0 circuits (H˚astad’s ≪ Switching Lemma [5]) and AC0[ ] circuits (the Polynomial Method of Razborov [12] and Smolensky ⊕ [15]). However, because the standard way of applying these techniques does not distinguish between circuits and formulas, it is not clear how to prove quantitatively stronger lower bounds on formula size vis-a-vis circuit size of a given function. Recent work of Rossman [13] developed a new way 0 1/(d 1) of applying H˚astad’s Switching Lemma to AC formulas, in order to prove an exp(Ω(dn − )) lower bound on the formula size of the Parity function for all d O(log n). Combined with the 1/(d 1) ≤ well-known exp(O(n − )) upper bound on the circuit size of Parity, this yields an asymptotically optimal separation in the power of depth-d AC0 formulas vs. circuits for all d(n) O( log n ), as ≤ log log n well as a super-polynomial separation for all ω(1) d(n) o(log n). ≤ ≤ In the present paper, we carry out a similar development for formulas vs. circuits in the AC0[ ] ⊕ basis, obtaining both an asymptotically optimal separation for all d(n) O( log n ) and a super- ≤ log log n polynomial separation for all ω(1) d(n) o(log n). Our target functions lie in the class of ≤ ≤ Approximate Majorities, here defined as Boolean functions 0, 1 n 0, 1 that approximate the { } → { } Majority function on 3/4 fraction of inputs. First, we show how to apply the Polynomial Method to obtain better parameters in the approximation of AC0[ ] formulas by low-degree polynomials ⊕ over F . This leads to an exp(Ω(dn1/2(d 1))) lower bound on the AC0[ ] formula size of all Ap- 2 − ⊕ proximate Majority functions. The other half of our formulas vs. circuits separation comes from an exp(O(n1/2(d 1))) upper bound on the AC0[ ] circuit size of some Approximate Majority func- − ⊕ tion. In fact, this upper bound is realized by a randomized construction of monotone AC0 circuits (without NOT or MOD2 gates). Together these upper and lower bound give our main result: Theorem 1.

(i) For all 2 d(n) O( log n ), there exist AC0[ ] circuits (in fact, monotone AC0 circuits) ≤ ≤ log log n ⊕ of depth d and size poly(n) that are not equivalent to any AC0[ ] formulas of depth d and ⊕ size no(d).

(ii) For all ω(1) d(n) o(log n), the class of languages decidable by polynomial-size depth-d ≤ ≤ AC0[ ] formulas is a proper subclass of the class of languages decidable by polynomial-size ⊕ depth-d AC0[ ] circuits. ⊕ Separation (i) is asymptotically optimal, in view of the aforementioned simulation of poly(n)- size depth-d circuits by depth-d formulas of size nO(d). Separation (ii) resembles an analogue of NC1 = P/poly (or rather NC1 = AC1) within the class AC0[ ]. In fact, extending separation (ii) 6 6 ⊕ from depth o(log n) to depth log n is equivalent the separation of NC1 and AC1.

2 1.1 Proof outline Improved polynomial approximation. The lower bound for AC0[ ] formulas follows the gen- ⊕ eral template due to Razborov [12] on proving lower bounds for AC0[ ] circuits using low-degree ⊕ n polynomials over F2. Razborov showed that for any Boolean function f : 0, 1 0, 1 that has 0 { } → { } d 1 an AC [ ] circuit of size s and depth d, there is a randomized polynomial P of degree O(log s) − ⊕ 7 that computes f correctly on each input with probability 8 (we call such polynomials 1/8-error probabilistic polynomials). By showing that some explicit Boolean function f (e.g. the Majority function or the MODq function for q odd) on n variables does not have such an approximation of degree less than Ω(√n) [12, 15, 16], we get that any AC0[ ] circuit of depth d computing f must 1/2(d 1) ⊕ have size exp(Ω(n − )). In this paper, we improve the parameters of Razborov’s polynomial approximation from above for AC0[ ] formulas. More precisely, for AC0[ ] formulas of size s and depth d, we are able to ⊕ ⊕ 1 d 1 construct 1/8-error probabilistic polynomials of degree O( d log s) − . (Since every depth-d circuit of d 1 size s is equivalent to a depth-d formula of size at most s − , this result implies Razborov’s original theorem that AC0[ ] circuits of size s and depth d have 1/8-error probabilistic polynomials of d 1 ⊕ degree O(log s) − .) We illustrate the idea behind this improved polynomial approximation with the special case of a balanced formula (i.e. all gates have the same fan-in) of fan-in t and depth d. Note that the d 1 size of the formula (number of gates) is Θ(t − ) and hence it suﬃces in this case to show that d 1 it has a 1/8-error probabilistic polynomial of degree O(log t) − . We construct the probabilistic polynomial inductively. Given a balanced formula F of depth d and fan-in t, let F1,...,Ft be its subformulas of depth d 1. Inductively, each Fi has a 1/8-error probabilistic polynomial Pi d 2 − of degree O(log t) − and by a standard error-reduction [10], it has a (1/16t)-error probabilistic polynomial of degree O(log t)d 1 (in particular, at any given input x 0, 1 n, the probability that − ∈ { } there exists an i [t] such that P (x) = F (x) is at most 1/16). Using Razborov’s construction ∈ i 6 i of a 1/16-error probabilistic polynomial of degree O(1) for the output gate of F and composing this with the probabilistic polynomials Pi, we get the result for balanced formulas. This idea can be extended to general (i.e. not necessarily balanced) formulas with a careful choice of the error parameter for each subformula Fi to obtain the stronger polynomial approximation result.

Improved formula lower bounds. Combining the above approximation result with known lower bounds for polynomial approximation [12, 15, 16], we can already obtain stronger lower bounds for AC0[ ] formulas than are known for AC0[ ] circuits. For instance, it follows that ⊕ ⊕ any AC0[ ] formula of depth d computing the Majority function on n variables must have size ⊕ exp(Ω(dn1/2(d 1))) for all d O(log n), which is stronger than the corresponding circuit lower − ≤ bound. Similarly stronger formula lower bounds also follow for the MODq function (q odd).

Separation between formulas and circuits. However, the above improved lower bounds do not directly yield the claimed separation between AC0[ ] formulas and circuits. This is because ⊕ we do not have circuits computing (say) the Majority function of the required size. To be able to prove our result, we would need to show that the Majority function has AC0[ ] circuits of depth d ⊕ and size exp(O(n1/2(d 1))) (where the constant in the O( ) is independent of d). However, as far as − · we know, the strongest result in this direction [9] only yields AC0[ ] circuits of size greater than ⊕

3 1/(d 1) 2 exp(Ω(n − )), which is superpolynomially larger than the upper bound. To circumvent this issue, we change the hard functions to the class of Approximate Majorities, which is the class of Boolean functions that agree with Majority function on most inputs. While this has the downside that we no longer are dealing with an explicitly deﬁned function, the advantage is that the polynomial approximation method of Razborov yields tight lower bounds for some functions from this class. Indeed, since the method of Razborov is based on polynomial approximations, it immediately follows that the same proof technique also yields the same lower bound for computing Approximate Majorities. Formally, any AC0[ ] circuit of depth d computing any Approximate Majority must 1/2(d 1) ⊕ have size exp(Ω(n − )). On the upper bound side, it is known from the work of O’Donnell and Wimmer [11] and Amano [1] that there exist Approximate Majorities that can be computed by 0 1/2(d 1) monotone AC formulas of depth d and size exp(O(dn − )). (Note that the double exponent 1 2(d 1) is now the same in the upper and lower bounds.) − We use the above ideas for our separation between AC0[ ] formulas and circuits. Plugging in ⊕ our stronger polynomial approximation for AC0[ ] formulas, we obtain that any AC0[ ] formula ⊕ 1/2(d 1) ⊕ of depth d computing any Approximate Majority must have size exp(Ω(dn − )). In particular, this implies that Amano’s construction is tight (up to the universal constant in the exponent) even for AC0[ ] formulas. ⊕ Further, we also modify Amano’s construction [1] to obtain better constant-depth circuits for Approximate Majorities: we show that there exist Approximate Majorities that are computed by monotone AC0 circuits of depth d of size exp(O(n1/2(d 1))) (the constant in the O( ) is a constant − · independent of d).

Smaller circuits for Approximate Majority. Our construction closely follows Amano’s, which in turn is related to Valiant’s probabilistic construction [18] of monotone formulas for the Majority function. However, we need to modify the construction in a suitable way that exploits the fact that we are constructing circuits. This modification is in a similar spirit to a construction of Hoory, Magen and Pitassi [6] who modify Valiant’s construction to obtain smaller monotone circuits (of depth Θ(log n)) for computing the Majority function exactly. At a high level, the difference between Amano’s construction and ours is as follows. Amano constructs random formulas F of each depth i d as follows. The formula F is the AND i ≤ 1 of a1 independent and randomly chosen variables. For even (respectively odd) i > 1, Fi is the OR (respectively AND) of ai independent and random copies of Fi 1. For suitable values of − a , . . . , a N, the random formula F computes an Approximate Majority with high probability. 1 d ∈ d In our construction, we build a depth i circuit C for each i d in a similar way, except that each i ≤ Ci now has M different outputs. Given such a Ci 1, we construct Ci by taking M independent − randomly chosen subsets T1,...,TM of ai many outputs of Ci 1 and adding gates that compute − either the OR or AND (depending on whether i is even or odd) of the gates in Ti. Any of the M final gates of Cd now serves as the output gate. By an analysis similar to Amano’s (see also [6]) we can show that this computes an Approximate Majority with high probability, which finishes the proof.3

2Indeed, this is inevitable with all constructions that we are aware of, since they are actually AC0 circuits and it is known by a result of H˚astad [5] that any AC0 circuit of depth d for the Majority function must have size exp(Ω(n1/(d−1))). 3This is a slightly imprecise description of the construction as the ﬁnal two levels of the circuit are actually deﬁned

4 2 Preliminaries

Throughout, n will be a growing parameter. We will consider Boolean functions on n variables, i.e. functions of the form f : 0, 1 n 0, 1 . We will sometimes identify 0, 1 with the field F { } → { } { } 2 in the natural way and consider functions f : Fn F instead. 2 → 2 Given a Boolean vector y 0, 1 n, we use y and y to denote the number of 0s and number ∈ { } | |0 | |1 of 1s respectively in y. The Majority function on n variables, denoted MAJn is the Boolean function that maps inputs x 0, 1 n to 1 if and only if x > n/2. ∈ { } | |1 Definition 2. An (ε, n)-Approximate Majority is a function f : 0, 1 n 0, 1 such that { } → { } Prx 0,1 n [f(x) = Majn(x)] ε. ∈{ } 6 ≤ As far as we know, the study of this class of functions was initiated by O’Donnell and Wim- mer [11]. See also [1, 4]. We refer the reader to [2, 7] for standard definitions of Boolean circuits and formulas. We use AC0 circuits (respectively formulas) to denote circuits (respectively formulas) of constant depth made up of AND, OR and NOT gates. Similarly, AC0[ ] circuits (respectively formulas) will be ⊕ circuits (respectively formulas) of constant depth made up of AND, OR, MOD2 and NOT gates. The size of a circuit will denote the number of gates in the circuit and the size of a formula will denote the number of its leaves which is within a constant multiplicative factor of the number of gates in the formula.4

3 Lower Bound

In this section, we show that any AC0[ ] formulas of depth d computing a (1/4,n)-Approximate ⊕ Majority must have size at least exp(Ω(dn1/2(d 1))) for all d O(log n). − ≤ We work over the ﬁeld F and identify it with 0, 1 in the natural way. The following concepts 2 { } are standard in circuit complexity (see, e.g., Beigel’s survey [3]).

Deﬁnition 3. Fix any ε [0, 1]. A polynomial P F [X ,...,X ] is said to be an ε-approximating ∈ ∈ 2 1 n polynomial for a Boolean function f : 0, 1 n 0, 1 if { } → { } Pr [f(x)= P (x)] 1 ε. x 0,1 n ≥ − ∈{ } We will use the following result of Smolensky [16] (see also Szegedy’s PhD thesis [17]).

Lemma 4 (Smolensky [16]). Let ε (0, 1 ) be any ﬁxed constant. Any ( 1 ε)-approximating ∈ 2 2 − polynomial for the Majority function on n variables must have degree Ω(√n).

Corollary 5. Let f be any (1/4,n)-Approximate Majority and ε (0, 1/4) an arbitrary constant. ∈ Then any ( 1 ε)-approximating polynomial for f must have degree Ω(√n). 4 − Proof. The proof is immediate from Lemma 4 and the triangle inequality. somewhat diﬀerently. 4We assume here without loss of generality that the formula does not contain a gate of fan-in 1 feeding into another.

5 Deﬁnition 6. An ε-error probabilistic polynomial of degree D for a Boolean function f : 0, 1 n { } → 0, 1 is a random variable P taking values from polynomials in F [X ,...,X ] of degree at most { } 2 1 n D such that for all x 0, 1 n, we have Pr[ f(x)= P(x) ] 1 ε. ∈ { } ≥ − Deﬁnition 7. Let Dε(f) be the minimum degree of an ε-error probabilistic polynomial for f. We will make use of the following two lemmas concerning D ( ). ε · Lemma 8 (Razborov [12]). Let ORn and ANDn be the OR and AND functions on n variables respectively. Then D (OR ), D (AND ) log(1/ε) . ε n ε n ≤ ⌈ ⌉ Lemma 9 (Kopparty and Srinivasan [10]). There is an absolute constant c1 such that for any ε (0, 1), D (f) c log(1/ε) D (f) for all Boolean functions f. ∈ ε ≤ 1 · ⌈ ⌉ · 1/8 We now state our main result, which shows that every AC0[ ] formula of size s and depth d + 1 1 ⊕ d admits a 1/8-error approximating polynomial of degree O( d log s) . Theorem 10. There is an absolute constant c such that, if f is computed by an AC0[ ] formula 2 ⊕ F of size s and depth d + 1, then D (f) 3(c ( 1 log(s) + 1))d. 1/8 ≤ 2 d Proof. The proof is an induction on the depth d of the formula. The base case d = 0 corresponds to the case when the formula is a single AND, OR or MOD2 gate and we need to show that D (f) 3. In the case that the formula is an AND or OR gate, 1/8 ≤ this follows from Lemma 8. If the formula is a MOD2 gate, this follows from the fact that the MOD2 function is exactly a polynomial of degree 1. Let d 1. We assume that the formula F is the AND/OR/MOD of sub-formulas F ,...,F ≥ 2 1 m computing f ,...,f where F has size s and depth d + 1. So F has size s = s + + s 1 m i i 1 · · · m and depth d + 2. Assume that D (f ) 3(c ( 1 log(s ) + 1))d for all i. We must show that 1/8 i ≤ 2 d i D (f) 3(c ( 1 log(s) + 1))d+1. 1/8 ≤ 2 d+1 By Lemma 9, each f has an s /(16s)-error probabilistic polynomial P of degree c i i i 1 · log(16s/s ) D (f ), which is at most ⌈ i ⌉ · 1/8 i 3c 5(log(s/s ) + 1) (c ( 1 log(s ) + 1))d. 1 · i · 2 d i m Then (P1,..., Pm) jointly computes (f1,...,fm) with error 1/16 (= i=1(si/(16s))). By a reasoning identical to the base case, it follows that there existsP a 1/16-error probabilistic polynomial Q of degree 4 for the output gate of the formula. Then Q(P1,..., Pm) is a 1/8-error probabilistic polynomial for f of degree

1 d 60c1 max(log(s/si) + 1) (c2( log(si) + 1)) . · i · d So long as c 20c , it suﬃces to show that for all i, 2 ≥ 1 (log(s/s ) + 1) ( 1 log(s ) + 1)d ( 1 log(s) + 1)d+1. i · d i ≤ d+1 Consider any i and let a, b 0 such that s = 2a and s = 2a+b. We must show ≥ i a d a + b d+1 (b + 1) + 1 + 1 . d ≤ d + 1

6 For fixed a 0, as a polynomial in b, the function ≥ a + b d+1 a d p (b) := + 1 (b + 1) + 1 a,d d + 1 − d is nonnegative over b 0 with a unique root at b = a/d. This follows from ≥ ∂ a + b d a d p (b)= + 1 + 1 , ∂b a,d d + 1 − d which is zero iff b = a/d; this value is a minimum of pa,d with pa,d(a/d) = 0. Corollary 11. Fix any constant d and let n N be a growing parameter. Let f be any ∈ (1/4,n)-Approximate Majority. Then any AC0[ ] formula of depth d computing f must have ⊕ size exp(Ω(dn1/2(d 1))) for all d O(log n), where asymptotic notation O( ) and Ω( ) hide absolute − ≤ · · constants (independent of d and n). Proof. Say that F is an AC0[ ] formula of depth d and size s computing f. Then, by Lemma 9, ⊕ we see that F has a 1/8-error probabistic polynomial P of degree D O(O( 1 log s + 1)d 1). In ≤ d − particular, by an averaging argument, there is some fixed polynomial P F [X ,...,X ] of degree ∈ 2 1 n at most D such that P is a 1/8-error approximating polynomial for f. Corollary 5 implies that the degree of P must be Ω(√n). Hence, we obtain O( 1 log s + 1)d 1 d − ≥ Ω(√n). It follows that 1/2(d 1) s exp(Ω(dn − ) O(d)). ≥ − Observe that Ω(dn1/2(d 1)) dominates O(d) so long as d ε log n for some absolute constant − ≤ ε > 0 (depending on the constants in Ω( ) and O( )). Hence, we get the claimed lower bound · · s exp(Ω(dn1/2(d 1))) for all d ε log n. ≥ − ≤ 4 Upper Bound

In this section, we show that for any constant ε, there are (ε, n)-Approximate Majorities that can 0 1/2(d 1) be computed by depth d AC circuits of size exp(O(n − )). Let ε (0, 1) be a small enough constant so that the following inequalities hold for any β ε 0 ∈ ≤ 0 exp( β) 1 β exp( β), • − ≤ − − 1 β exp( β β2) exp( 2β). • − ≥ − − ≥ − (It suﬃces to take ε0 = 1/2.) We need the following technical lemma. Lemma 12. Let A, s be positive reals, M,n N, and γ ( 1 , 1 ) be such that eA n3, n 1 , ∈ ∈ n 10 ≥ ≥ ε0 and s n. Deﬁne I (γ) := y 0, 1 M y Me A(1 γ) and I (γ) := y 0, 1 M y ≤ 0 { ∈ { } | | |1 ≤ − − } 1 { ∈ { } | | |1 ≥ Me A(1 + γ) . If we choose S [M] of size t := eA s by picking t random elements from M − } ⊆ ⌈ · ⌉ with replacement, then

x I0(γ) Pr[ xj = 0] exp( s) exp(sγ/2), ∈ ⇒ S ≥ − · j_S ∈ x I1(γ) Pr[ xj = 0] exp( s) exp( sγ). ∈ ⇒ S ≤ − · − j_S ∈

7 Further, if sγ ε , then the above probabilities can be lower bounded and upper bounded by exp( s) ≤ 0 − · (1 + sγ exp( sγ)) and exp( s) (1 sγ exp( sγ)) respectively. − − · − − M A A similar statement can be obtained above for the sets J1(γ) := y 0, 1 y 0 Me− (1 M A { ∈ { } | | | ≤ − γ) and J0(γ) := y 0, 1 y 0 Me− (1 + γ) , with the event “ j S xj = 0” being replaced } { ∈ { } | | | ≥ } ∈ by the event “ j S xj = 1”. W V ∈ Proof. We give the proof only for I0(γ) and I1(γ). The proof for J0(γ) and J1(γ) is similar. Consider ﬁrst the case that x I (γ). In this case, we have the following computation. ∈ 1 eA s 1+ γ · Pr[ xj = 0] 1 S ≤ − eA j_S ∈ exp( (1 + γ) s) exp( s) exp( sγ). (1) ≤ − · ≤ − · − The above implies the ﬁrst upper bound on PrS[ j S xj = 0] from the lemma statement. When ∈ sγ ε0, we further have exp( sγ) 1 sγ exp( Wsγ), which implies the second upper bound. This ≤ − ≤ − − proves the lemma when x I . ∈ 1 Now consider the case that x I (γ). We have ∈ 0 eA s+1 1 γ · Pr[ xj = 0] 1 − S ≥ − eA j_S ∈ 1 γ 1 exp ( − ) (eA s + 1) ≥ − eA − e2A · · s 1 γ 1 = exp s + sγ − − − eA − eA − e2A 2s exp s + sγ ≥ − − eA 2e A = exp( s) exp(sγ(1 − ))) (2) − · − γ A 1 1 γ where for the second inequality we have used the fact that since e− n3 ε0, we have 1 e−A 1 γ 1 A 1 1 ≤ ≤ − ≥ exp( − ). Since e γ/4, we can lower bound the right hand side of (2) by − eA − e2A − ≤ n3 ≤ 4n ≤ exp( s) exp(sγ/2). Also, note that − · 2e A 2/n3 2 1 − 1 = 1 − γ ≥ − 1/n − n2 exp( 1/n) exp( γ) exp( sγ). ≥ − ≥ − ≥ − This implies that the RHS of (2) can also be lower bounded by exp( s) exp(sγ exp( sγ)) − − ≥ exp( s) (1 + sγ exp( sγ)), which implies the claim about PrS[ j S xj = 0] assuming that − · − ∈ x I0(γ). W ∈ We now prove the main result of this section. N log n Theorem 13. For any growing parameter n and 2 d O( log log n ) and ε > 0, ∈ ≤ ≤ 0 there is an (ε, n)-Approximate Majority fn computable by a monotone AC circuit with at most exp(O(n1/2(d 1) log(1/ε)/ε)) many gates, where both O( )’s hide absolute constants (independent of − · d, ε).

8 Proof. We assume throughout that ε is a small enough constant and that n is large enough for various inequalities to hold. We will actually construct a monotone circuit of depth d and size 1/2(d 1) exp(O(n − log(1/ε)/ε)) computing a (4ε, n)-Approximate Majority, which also implies the theorem. Fix parameters A = n1/2(d 1) and M = e10A . We assume that A 10 log n (which holds ⌊ − ⌋ ⌈ ⌉ ≥ as long as d c log n for an absolute constant c> 0) and that ε ε . ≤ log log n ≤ 0 Deﬁne a sequence of real numbers γ0, γ1, . . . , γd 2 as follows: −

ε γ = 0 √n

γi = Aγi 1 exp( 2Aγi 1), for each i [d 2]. − − − ∈ − It is clear that γ Aiγ for each i [d 2]. As a result we also obtain i ≤ 0 ∈ − i γi = A γ0 exp( 2A(γ0 + γ1 + + γi 1)) − · · · − i 2 i 1 i i A γ exp( 2γ A(1 + A + A + + A − )) A γ exp( 3A γ ). (3) ≥ 0 − 0 · · · ≥ 0 − 0 Let 1 ε Y = x 0, 1 n x + n , ε ∈ { } | |1 ≥ 2 √n

1 ε N = x 0, 1 n x n . ε ∈ { } | |1 ≤ 2 − √n

The idea is to deﬁne a sequence of circuits C 1,C2,...,Cd 2 with n inputs and M outputs such − that Ci has depth i and iM many (non-input) gates. Further, for odd i

x N C (x) I (γ ) ∈ ε ⇒ i ∈ 0 i x Y C (x) I (γ ) (4) ∈ ε ⇒ i ∈ 1 i and similarly for even i

x N C (x) J (γ ) ∈ ε ⇒ i ∈ 0 i x Y C (x) J (γ ). (5) ∈ ε ⇒ i ∈ 1 i

After this is done, we will add on top a depth-2 circuit that will reject most inputs from I0(γd 2) − or J0(γd 2) — depending on whether d 2 is odd or even respectively — and accept most inputs − − from I1(γd 2) or J1(γd 2). − − We begin with the construction of C1,...,Cd 2 which is done by induction. −

Construction of C1. The base case of the induction is the construction of C1, which is done as follows. We choose M i.i.d. random subsets T ,...,T [n] in the following way: for each i [M], 1 M ⊆ ∈ we sample A random elements of [n] with replacement. Let bx = x . i j Ti j x ∈ If x Nε, then the probability that b = 1 is given by V ∈ i 1 A 1 1 Pr[bx = 1] γ (1 2γ )A (1 γ A) i ≤ 2 − 0 ≤ 2A − 0 ≤ eA − 0

9 2 2 where the last inequality follows from the fact that (1 z)A (1 zA + A z ). − ≤ − 2 Let δ = 1/n3. Note in particular that 2δ/γ A ε for large enough n. 0 ≤ 0 By a Chernoﬀ bound, the probability that 1 bx 1 (1 γ A)(1 + δ) is bounded by M i i ≥ eA − 0 exp( Ω(δ2M/eA)) exp( Ω(e9A/n6)) exp( n),P since eA n10. Thus, with probability at − ≤ − ≤ − ≥ least 1 exp( n), we have − − bx 1 i i (1 γ A)(1 + δ) PM ≤ eA − 0 1 1 δ A (1 γ0A + δ)= A (1 γ0A(1 )) ≤ e − e − − γ0A 1 2δ 1 A (1 γ0A exp( )) A (1 γ0A exp( γ0A)) ≤ e − −γ0A ≤ e − − 1 (1 γ ). (6) ≤ eA − 1

δ 2δ Above, we have used the fact that (1 ) exp( − ) since δ/γ0A ε0 for large enough n, as − γ0A ≥ γ0A ≤ noted above. If x Y , then the probability that bx = 1 is given by ∈ ε i 1 A Pr[bx = 1] + γ i ≥ 2 0 1 1 (1 + 2γ )A (1 + γ A) ≥ 2A 0 ≥ eA 0 1 (1 + γ A). ≥ eA 0 As above, we can argue that the probability that 1 bx 1 (1 + γ A)(1 δ) is at most M i i ≤ eA 0 − exp( n). Thus, with probability 1 exp( n) P − − −

bx 1 i i (1 + γ A)(1 δ) PM ≥ eA 0 − 1 1 2δ A (1 + γ0A 2δ)= A (1 + γ0A(1 )) ≥ e − e − γ0A 1 4δ 1 A (1 + γ0A exp( )) A (1 + γ0A exp( γ0A)) ≥ e −γ0A ≥ e − 1 (1 + γ ). (7) ≥ eA 1 Thus, by a union bound over x, we can ﬁx a choice of T ,...,T so that (6) holds for all x N 1 M ∈ ε and (7) holds for all x Y . Hence, (4) holds for i = 1 as required. This concludes the construction ∈ ε of C , which just outputs the values of x for each i. 1 j Ti j V ∈ Construction of Ci+1. For the inductive case, we proceed as follows. We assume that i is odd (the case that i is even is similar). So by the inductive hypothesis, we know that (4) holds and hence that C (x) I (γ ) or I (γ ) depending on whether x N or Y . Let γ := γ . Let the output i ∈ 0 i 1 i ∈ ε ε i gates of Ci be g1,...,gM .

10 We choose T ,...,T [M] randomly as in the statement of Lemma 12 with s = A. Note 1 M ⊆ that the chosen parameters satisfy all the hypotheses of Lemma 12. Further we also have sγ ≤ A Aiγ Ad 1 ε ε . · 0 ≤ − · √n ≤ 0 The random circuit C′ is deﬁned to be the circuit obtained by adding M OR gates to Ci such that the jth OR gate computes g . Let bx be the output of the jth OR gate on C (x). k Tj k j i By Lemma 12, we have W ∈

x x Nε Pr[bj = 0] exp( A) (1 + Aγ exp( Aγ)) ∈ ⇒ S ≥ − · − x x Yε Pr[bj = 0] exp( A) (1 Aγ exp( Aγ)) (8) ∈ ⇒ S ≤ − · − −

1 1 1 2δ Let δ = n3 . Note that Aγ [ √n , n1/2(d−1) ] and hence for large enough n, Aγ exp( Aγ) ε0. ∈ − ≤ x Assume x Nε. In this case, the Chernoﬀ bound implies that the probability that j [M] bj ∈ ∈ ≤ M exp( A) (1 + Aγ exp( Aγ))(1 δ) is at most exp( Ω(δ2M/eA)) exp( n). WhenP this event − · − − − ≤ − does not occur, we have

bx 1 i i (1 + Aγ exp( Aγ))(1 δ) PM ≥ eA − − 1 1 2δ (1 + Aγ exp( Aγ) 2δ)= (1 + Aγ exp( Aγ)(1 )) ≥ eA − − eA − − Aγ exp( Aγ) − 1 4δ (1 + Aγ exp( Aγ) exp( )) ≥ eA − · −Aγ exp( Aγ) − 1 (1 + Aγ exp( Aγ) exp( Aγ)) ≥ eA − · − 1 1 (1 + Aγ exp( 2Aγ)) (1 + γ ). (9) ≥ eA − ≥ eA i+1 We have used above that for large enough n, 2δ ε and hence 1 2δ Aγ exp( Aγ) ≤ 0 − Aγ exp( Aγ) ≥ 4δ − − exp( Aγ exp(− Aγ) ). Similarly− when x Y , the Chernoff bound tells us that the probability that bx ∈ ε j [M] j ≥ M exp( A) (1 Aγ exp( Aγ))(1 + δ) is at most exp( n). In this case, we get P ∈ − · − − − bx 1 i i (1 Aγ exp( Aγ))(1 + δ) PM ≤ eA − − 1 1 δ (1 Aγ exp( Aγ)+ δ)= (1 Aγ exp( Aγ)(1 )) ≤ eA − − eA − − − Aγ exp( Aγ) − 1 2δ (1 Aγ exp( Aγ) exp( )) ≤ eA − − · −Aγ exp( Aγ) − 1 (1 Aγ exp( Aγ) exp( Aγ)) ≤ eA − − · − 1 1 = (1 Aγ exp( 2Aγ)) (1 γ ). (10) eA − − ≥ eA − i+1 By a union bound, we can fix T ,...,T so that (9) and (10) are true for all x N and x Y 1 M ∈ ε ∈ ε respectively. This gives us the circuit Ci+1 which satisfies all the required properties.

11 The top two levels of the circuit. At the end of the above procedure we have a circuit Cd 2 of depth d 2 and at most (d 2)M gates that satisﬁes one of (4) or (5) depending on whether− − − d 2 is odd or even respectively. We assume that d 2 is even (the other case is similar). − d 2 − d 2 d 2 Deﬁne γ := γd 2. Recall from (3) that γ A − γ0 exp( 3A − γ0) A − γ0/2. − 10A log(1/ε) ≥ − ≥ Let M = exp( + 10A) . We choose M many subsets T ,...,T ′ [M] i.i.d. so ′ ⌈ ε ⌉ ′ 1 M ⊆ that each Tj is picked as in Lemma 12 with s = 10A log(1/ε)/ε. Note that

Ad 2γ 10A log(1/ε) Ad 2 ε sγ s − 0 = − 5 log(1/ε). ≥ 2 ε · 2 · √n ≥

Say g1,...,gM are the output gates of Cd 2. We deﬁne the random circuit C′ (with n inputs − and M ′ outputs) to be the circuit obtained by adding M ′ AND gates such that the jth AND gate x computes k T gk. Let bj be the output of the jth AND gate on Cd 2(x). j − By LemmaV ∈ 12, we have

x 2 x Nε Pr[bj = 1] exp( s) exp( sγ) ε exp( s) ∈ ⇒ S ≤ − · − ≤ · − x exp( s) x Yε Pr[bj = 1] exp( s) exp(sγ/2) − . (11) ∈ ⇒ S ≥ − · ≥ ε2 x 2 Say x Nε. By a Chernoﬀ bound, the probability that j bj 2ε M ′ exp( s) is at most 2∈ 2 10A ≥ − exp( Ω(ε M ′ exp( s))) exp( Ω(ε e )) exp( n). Similarly,P when x Yε, the probability ′ − x M exp(− s) ≤ − ≤ − ∈ that b − is also bounded by exp( n). By a union bound, we can ﬁx a T ,...,T ′ j j ≤ 2ε2 − 1 M to getP a circuit Cd 1 such that − 2 x Nε Cd 1(x) 1 2ε exp( s)M ′ ∈ ⇒ | − | ≤ − 1 x Yε Cd 1(x) 1 exp( s)M ′. (12) ∈ ⇒ | − | ≥ 2ε2 −

This gives us the depth d 1 circuit Cd 1. Note that Cd 1 has M ′ + O(dM)= O(M ′) gates. − − − To get the depth d circuit, we choose a random subset T [M ] by sampling exactly exp(s) ⊆ ′ ⌈ ⌉ many elements of [M ′] with replacement. We construct a random depth-d circuit Cd′ by taking the OR of the the output gates of Cd 1 indexed by the subset T . From (12) it follows that −

2 2 x Nε Pr[Cd′ (x) = 1] T 2ε exp( s) 4ε < ε ∈ ⇒ T ≤ | | · − ≤ exp(s) exp( s) 2 x Yε Pr[Cd′ (x) = 0] 1 − exp( 1/2ε ) < ε. ∈ ⇒ T ≤ − 2ε2 ≤ − The ﬁnal inequalities in each case above hold as long as ε is a small enough constant.

It follows from the above that there is a choice for T such that Cd′ makes an error — i.e. C (x)=1 for x N or C (x)=0 for x Y — on at most a 2ε fraction of inputs from N Y . d′ ∈ ε d′ ∈ ε ε ∪ ε We ﬁx such a choice for T and the corresponding circuit C. We have

Pr [C(x) = Majn(x)] Pr [C(x) = Majn(x)]+ Pr [x Yε Nε] x 0,1 n 6 ≤ x Yε Nε 6 x 0,1 n 6∈ ∪ ∈{ } ∈ ∪ ∈{ } 2ε + Pr [x Yε Nε]. ≤ x 0,1 n 6∈ ∪ ∈{ }

12 Finally by Stirling’s approximation we get

1 n 1 n Pr [x Yε Nε]= 2ε. x 0,1 n 6∈ ∪ 2n m ≤ 2n n/2 ≤ ∈{ } m [ n εX√n, n +ε√n] m [ n εX√n, n +ε√n] ∈ 2 − 2 ∈ 2 − 2 Hence we see that the circuit C computes a (4ε, n)-Approximate Majority, which proves Theo- rem 13. 1/2(d 1) The circuit has depth d and size O(M ′) = exp(O(n − log(1/ε)/ε)).

5 Conclusion

0 Our main results extend straightforwardly to AC [MODp] for any ﬁxed prime p. The proofs are 1 d 1 exactly the same except for the fact that the approximating polynomials of degree O( d log s) − from Section 3 are constructed over Fp. Using the fact [15] that any (1/4)-approximating polynomial over Fp (p odd) for the Parity 0 function on n variables must have degree Ω(√n), we see that any polynomial-sized AC [MODp] formula computing the Parity function on n variables must have depth Ω(log n). This strengthens a result of Rossman [13] which gives this statement for AC0 formulas.

Acknowledgements. We thank Rahul Santhanam for valuable discussions. We also thank the organizers of the 2016 Complexity Semester at St. Petersburg, where this collaboration began.

References

[1] K. Amano. Bounds on the size of small depth circuits for approximating majority. In Automata, Languages and Programming, 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I, pages 59–70, 2009.

[2] S. Arora and B. Barak. Computational Complexity - A Modern Approach. Cambridge Univer- sity Press, 2009.

[3] R. Beigel. The polynomial method in circuit complexity. In Proceedings of the Eigth Annual Structure in Complexity Theory Conference, San Diego, CA, USA, May 18-21, 1993, pages 82–95, 1993.

[4] E. Blais and L. Tan. Approximating boolean functions with depth-2 circuits. SIAM J. Comput., 44(6):1583–1600, 2015.

[5] J. H˚astad. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28-30, 1986, Berkeley, California, USA, pages 6–20, 1986.

[6] S. Hoory, A. Magen, and T. Pitassi. Monotone circuits for the majority function. In Approx- imation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 410–425. Springer, 2006.

13 [7] S. Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012.

[8] M. Karchmer and A. Wigderson. Monotone circuits for connectivity require super-logarithmic depth. SIAM Journal on Discrete Mathematics, 3(2):255–265, 1990.

[9] M. M. Klawe, W. J. Paul, N. Pippenger, and M. Yannakakis. On monotone formulae with restricted depth (preliminary version). In Proceedings of the 16th Annual ACM Symposium on Theory of Computing, April 30 - May 2, 1984, Washington, DC, USA, pages 480–487, 1984.

[10] S. Kopparty and S. Srinivasan. Certifying polynomials for AC0[ ] circuits, with applications. In ⊕ LIPIcs-Leibniz International Proceedings in Informatics, volume 18. Schloss Dagstuhl-Leibniz- Zentrum fuer Informatik, 2012.

[11] R. O’Donnell and K. Wimmer. Approximation by DNF: examples and counterexamples. In Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wro- claw, Poland, July 9-13, 2007, Proceedings, pages 195–206, 2007.

[12] A. A. Razborov. Lower bounds on the size of constant-depth networks over a complete basis with logical addition. Mathematicheskie Zametki, 41(4):598–607, 1987.

[13] B. Rossman. The average sensitivity of bounded-depth formulas. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 424–430. IEEE, 2015.

[14] P. Savicky and A. R. Woods. The number of Boolean functions computed by formulas of a given size. 2000.

[15] R. Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 77–82, 1987.

[16] R. Smolensky. On representations by low-degree polynomials. In FOCS, pages 130–138, 1993.

[17] M. Szegedy. Algebraic Methods in Lower Bounds for Computational Models with Limited Communication. PhD thesis, University of Chicago, 1989.

[18] L. G. Valiant. Short monotone formulae for the majority function. J. Algorithms, 5(3):363–366, 1984.