Analysis of Boolean Functions (CMU 18-859S, Spring 2007)
Lecture 10: Learning DNF, AC0, Juntas
Feb 15, 2007
Lecturer: Ryan O’Donnell
Scribe: Elaine Shi

1 Learning DNF in Almost Polynomial Time

From previous lectures, we have learned that if a function f is ε-concentrated on some collection S, then we can learn the function using membership queries in time poly(|S|, 1/ε) · poly(n) · log(1/δ). In the last lecture, we showed that a DNF of width w is ε-concentrated on a set of size n^{O(w/ε)}, and concluded that width-w DNFs are learnable in time n^{O(w/ε)}.

Today, we shall improve this bound by showing that a DNF of width w is ε-concentrated on a collection of size w^{O(w log(1/ε))}. We shall hence conclude that poly(n)-size DNFs are learnable in almost polynomial time.

Recall that in the last lecture we introduced Håstad’s Switching Lemma, and we showed that DNFs of width w are ε-concentrated on degrees up to O(w log(1/ε)).
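As an aside (not part of the original notes), the learning algorithm behind this framework, often called the "low-degree algorithm", is simple to state in code. The notes use membership queries to identify the collection; once the collection S is known explicitly, random labeled examples suffice to estimate the coefficients. The following is a minimal sketch, with illustrative function names:

    def chi(S, x):
        # The character chi_S(x) = product over i in S of x_i, for x in {-1,1}^n.
        value = 1
        for i in S:
            value *= x[i]
        return value

    def low_degree_learn(examples, collection):
        # Empirically estimate hat{f}(S) for each S in `collection` from labeled
        # examples (x, f(x)), then output the hypothesis
        # h(x) = sign( sum over S of estimate_S * chi_S(x) ).
        estimates = {
            S: sum(y * chi(S, x) for x, y in examples) / len(examples)
            for S in collection
        }
        def h(x):
            total = sum(c * chi(S, x) for S, c in estimates.items())
            return 1 if total >= 0 else -1
        return h

    # For concentration on low degrees one would take, e.g.,
    #   collection = [S for k in range(d + 1)
    #                   for S in itertools.combinations(range(n), k)]

If f is ε-concentrated on the collection and each estimate is accurate enough, a standard argument shows the hypothesis h disagrees with f with probability only O(ε).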

Theorem 1.1 (Håstad’s Switching Lemma) Let f be computable by a width-w DNF. If (I, X) is a random restriction with *-probability ρ, then for all d ∈ ℕ,

    Pr_{I,X}[DT-depth(f_{X→I}) > d] ≤ (5ρw)^d

Theorem 1.2 If f is a width-w DNF, then

    ∑_{|U| ≥ O(w log(1/ε))} f̂(U)² ≤ ε

To show that a DNF of width w is ε-concentrated on a collection of size w^{O(w log(1/ε))}, we also need the following theorem:

Theorem 1.3 If f is a width-w DNF, then

    ∑_{U ⊆ [n]} |f̂(U)| · (1/(20w))^{|U|} ≤ 2

Proof: Let (I, X) be a random restriction with *-probability 1/(20w). After this restriction, the DNF becomes an O(1)-depth decision tree with high probability. Due to Håstad’s Switching Lemma, we have the following:

    E_{I,X}[‖f_{X→I}‖₁] = ∑_{d=0}^{n} Pr_{I,X}[DT-depth(f_{X→I}) = d] · E_{I,X}[‖f_{X→I}‖₁ | DT-depth(f_{X→I}) = d]
                       ≤ ∑_{d=0}^{n} (5 · (1/(20w)) · w)^d · 2^d    (Håstad’s Switching Lemma; a DT of size s has Fourier 1-norm ≤ s, and a depth-d DT has size ≤ 2^d)
                       = ∑_{d=0}^{n} (1/2)^d ≤ 2.

On the other hand, we have

    E_{I,X}[‖f_{X→I}‖₁] = E_I E_X [ ∑_{S⊆I} |f̂_{X→I}(S)| ]
                       = E_I [ ∑_{S⊆I} E_X | E_{y∈{-1,1}^I}[f_{X→I}(y) · y_S] | ]
                       ≥ E_I [ ∑_{S⊆I} | E_X E_y[f(y, X) · y_S] | ] = E_I [ ∑_{S⊆I} |f̂(S)| ]
                       = ∑_{U⊆[n]} |f̂(U)| · Pr_I[U ⊆ I] = ∑_{U⊆[n]} |f̂(U)| · (1/(20w))^{|U|}.

Combining the two bounds proves the theorem.  □

Corollary 1.4 If f is a width-w DNF, then

    ∑_{|U| ≤ O(w log(1/ε))} |f̂(U)| ≤ w^{O(w log(1/ε))}

Proof:

    2 ≥ ∑_{U⊆[n]} |f̂(U)| · (1/(20w))^{|U|} ≥ ∑_{|U| ≤ O(w log(1/ε))} |f̂(U)| · (1/(20w))^{|U|}
      ≥ (1/(20w))^{O(w log(1/ε))} · ∑_{|U| ≤ O(w log(1/ε))} |f̂(U)|,

so ∑_{|U| ≤ O(w log(1/ε))} |f̂(U)| ≤ 2 · (20w)^{O(w log(1/ε))} = w^{O(w log(1/ε))}.  □

Corollary 1.5 If f is a width-w DNF, it is ε-concentrated on a collection of size w^{O(w log(1/ε))}.

Proof: Define S = { U : |U| ≤ O(w log(1/ε)) and |f̂(U)| ≥ ε / w^{O(w log(1/ε))} }. By Parseval, we get that |S| ≤ w^{O(w log(1/ε))}. We now show that f is ε-concentrated on S. By Theorem 1.2, we know that

    ∑_{U∉S, |U| ≥ O(w log(1/ε))} f̂(U)² ≤ ε.

By Corollary 1.4, we have

    ∑_{U∉S, |U| ≤ O(w log(1/ε))} f̂(U)² ≤ ( ∑_{|U| ≤ O(w log(1/ε))} |f̂(U)| ) · max_{U∉S} |f̂(U)| ≤ w^{O(w log(1/ε))} · (ε / w^{O(w log(1/ε))}) = ε.

Therefore, f is 2ε-concentrated on S.  □

Corollary 1.6 poly(n)-size DNFs are ε-concentrated on collections of size

    (log(n/ε))^{O(log(n/ε) · log(1/ε))} = (n/ε)^{O(log log(n/ε) · log(1/ε))}

And if ε = Θ(1), then the above is n^{O(log log n)}. Note that this uses the fact that size-n DNF formulas are ε-close to a width-log(n/ε) DNF.
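To see the equality between the two expressions (a routine check, spelled out here for clarity), write m = n/ε and take logarithms:

    log[ (log m)^{O(log m · log(1/ε))} ] = O(log m · log(1/ε)) · log log m
                                        = O(log log m · log(1/ε)) · log m
                                        = log[ m^{O(log log m · log(1/ε))} ].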

An open research problem is the following: are poly(n)-size DNFs ε-concentrated on a collection of size poly(n), assuming that ε = Θ(1)?

2 Learning AC0

We will now study how to learn polynomial-size, constant-depth circuits, AC0. Consider circuits with unbounded fan-in AND, OR, and NOT gates. The size of the circuit is defined as the number of AND/OR gates. Observe the following fact:

Fact 2.1 Let d denote the depth of the circuit. At the expense of a factor of d in size, these circuits can be taken to be “layered”. Here “layered” means that each layer consists of the same type of gates, either AND or OR, and adjacent layers contain gates of the opposite type.

In a layered circuit, the number of layers is the depth of the circuit, and we define the bottom fan-in of a layered circuit to be the maximum fan-in at the lowest layer (i.e., the layer closest to the inputs).

Theorem 2.2 (LMN) Let f be computable by a circuit of size ≤ s, depth ≤ D, and bottom fan-in ≤ w. Then

    ∑_{|U| ≥ (10w)^D} f̂(U)² ≤ O(s · 2^{-w})

Before we show a proof of the LMN theorem, let us first look at some corollaries implied by this theorem.

Corollary 2.3 If f has a size-s, depth-D circuit, then

    ∑_{|U| ≥ [O(log(s/ε))]^D} f̂(U)² ≤ ε

Proof: Notice that such an f is ε-close to a similar circuit with bottom fan-in ≤ log(s/ε): replace each bottom gate of larger fan-in by a constant; each such replacement errs with probability at most 2^{-log(s/ε)} = ε/s, so a union bound over at most s gates gives ε. Then apply Theorem 2.2 with w = log(s/ε).  □

Corollary 2.4 AC0, i.e., the class of poly-size, constant-depth circuits, is learnable from random examples in time n^{poly(log(n/ε))}, where n denotes the size of the circuit.

Proof: According to Corollary 2.3, AC0 circuits are ε-concentrated on a collection of size n^{poly(log(n/ε))}.  □

Corollary 2.5 If f has a size-s, depth-D circuit, then I(f) ≤ [O(log s)]^D.

Remark 2.6 Due to Håstad, the above bound can be improved to [O(log s)]^{D-1}.

Proof: (sketch) Define F(τ) = ∑_{|U| > τ} f̂(U)², and write τ₀ = O(log s)^D. Then

    I(f) = ∑_U |U| · f̂(U)²
         = ∑_{|U| ≤ τ₀} |U| · f̂(U)² + τ₀ · F(τ₀) + ∑_{r=τ₀}^{n} F(r)
         ≤ O(log s)^D + ∑_{r=τ₀}^{n} F(r),

using ∑_U f̂(U)² = 1 to bound the first two terms by τ₀ each. It remains to show that ∑_{r=τ₀}^{n} F(r) ≤ O(log s)^D. By Corollary 2.3,

    F(τ) = ∑_{|U| > τ} f̂(U)² ≤ s · 2^{-Ω(τ^{1/D})}.

Using this fact plus some manipulation, it is not hard to show that ∑_{r=τ₀}^{n} F(r) ≤ O(log s)^D. (One way: group the r’s by the value of t = ⌈r^{1/D}⌉; each group has at most D(t+1)^{D-1} terms, each at most s · 2^{-Ω(t)}, and for τ₀ = (C log s)^D with C a large enough constant the resulting series over t ≥ C log s decays geometrically and sums to O(log s)^D.)  □

Using this fact, we can derive the following two corollaries:

Corollary 2.7 Parity ∉ AC0, Majority ∉ AC0.

Proof: I(Parity) = n and I(Majority) = Θ(√n), both of which exceed the polylog(n) influence bound of Corollary 2.5 for poly(n)-size, constant-depth circuits.  □
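To recall where these influence values come from (standard computations, spelled out here for completeness): flipping any coordinate always flips Parity, so Inf_i(Parity) = 1 for every i and I(Parity) = n. For Majority (on odd n), flipping coordinate i changes the output iff the other n − 1 votes are split evenly, which happens with probability (n−1 choose (n−1)/2) / 2^{n-1} = Θ(1/√n); hence I(Majority) = n · Θ(1/√n) = Θ(√n).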

Definition 2.8 (PRFG) A function f : {-1,1}^m × {-1,1}^n → {-1,1} is a Pseudo-Random Function Generator (PRFG) if for every Probabilistic Polynomial Time (P.P.T.) algorithm A with oracle access to a function,

    | Pr_{S∈{-1,1}^m, A's random bits}[A(f(S, ·)) = YES] − Pr_{g, A's random bits}[A(g) = YES] | ≤ 1/n^{ω(1)},

where g is a function picked at random from all functions {h | h : {-1,1}^n → {-1,1}}.

Corollary 2.9 Pseudo-random function generators ∉ AC0.

Proof: Suppose that f ∈ AC0, i.e., each function f(S, ·) is computable by a poly-size, constant-depth circuit. Then we can construct a P.P.T. adversary A such that A can tell f apart from a truly random function with non-negligible probability. Consider an adversary A that picks X ∈ {-1,1}^n and i ∈ [n] at random. Given an oracle for a function g, A queries g at X and at X^{(i)} (X with the i-th coordinate flipped), and outputs YES iff g(X) ≠ g(X^{(i)}).

If g is truly random, then Pr[A outputs YES] = 1/2; if g ∈ AC0, then by Corollary 2.5, Pr[A outputs YES] ≤ I(g)/n = polylog(n)/n.  □
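The distinguisher used in this proof is short enough to write out. Below is a minimal sketch (illustrative, not from the notes); it assumes the oracle g is given as a Python function on ±1 tuples of length n:

    import random

    def adversary(g, n):
        # One-shot distinguisher: query g at a random X and at X with a random
        # coordinate flipped; answer YES (True) iff the two values differ.
        # For a truly random g, Pr[YES] = 1/2; for g computable in AC0,
        # Pr[YES] <= I(g)/n = polylog(n)/n by Corollary 2.5.
        X = [random.choice([-1, 1]) for _ in range(n)]
        i = random.randrange(n)
        X_flipped = list(X)
        X_flipped[i] = -X_flipped[i]
        return g(tuple(X)) != g(tuple(X_flipped))

Note that a single run already achieves distinguishing advantage 1/2 − polylog(n)/n, which is non-negligible.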

After seeing all these applications of the LMN theorem, we now show how to prove it. To prove the LMN theorem, we need the following tools:

Observation 2.10 A depth-w Decision Tree (DT) is expressible as a width-w DNF (one term per accepting path) or as a width-w CNF (one clause per rejecting path).

Lemma 2.11 Let f : {-1,1}^n → {-1,1}, and let (I, X) be a random restriction with *-probability ρ. Then for all d ≥ 5,

    ∑_{|U| ≥ 2d/ρ} f̂(U)² ≤ 2 · Pr_{(I,X)}[DT-depth(f_{X→I}) > d].

Now using Lemma 2.11, we can prove the LMN theorem.

Proof: (LMN)

Claim 2.12 Let (I, X) be a random restriction with *-probability (1/(10w))^{D-1}. Then

    Pr_{I,X}[DT-depth(f_{X→I}) > w] ≤ s · 2^{-w}.

The above claim in combination with Lemma 2.11 would complete the proof: applying Lemma 2.11 with ρ = (1/(10w))^{D-1} and d = w gives ∑_{|U| ≥ 2w·(10w)^{D-1}} f̂(U)² ≤ 2s · 2^{-w}, and 2w · (10w)^{D-1} ≤ (10w)^D. We now show why the claim is true.

Observe that we can view choosing a random restriction with *-probability (1/(10w))^{D-1} as the following:

• First choose a random restriction with *-probability 1/(10w).
• Further choose a random restriction on the surviving variables with *-probability 1/(10w).
• ...
• Repeat the above D − 1 times.

After the first restriction, due to Håstad’s Switching Lemma, for any level-2 circuit (a width-w DNF or CNF at the bottom, by the bottom fan-in bound), the probability that it doesn’t turn into a depth-w DT can be bounded as below:

    Pr[doesn’t turn into a depth-w DT] ≤ (5w/(10w))^w = 2^{-w}

Due to Observation 2.10, we can express a depth-w DT as a width-w DNF or CNF. Using this, we can transform the bottom two layers of the circuit into the opposite of what they were before, i.e., CNF to DNF, and DNF to CNF. This will succeed except with probability 2^{-w} × (number of level-2 gates). Now the 2nd lowest layer and the 3rd lowest layer have the same type of gates, so we can collapse them into a single layer. Observe that this operation preserves the bottom fan-in, because the resulting CNF or DNF has width ≤ w. So we repeat this operation D − 1 times, and the probability that the resulting circuit is not a depth-w DT can be bounded as below:

    Pr[DT-depth of resulting circuit > w] ≤ 2^{-w} × (number of level-2 gates + number of level-3 gates + ···) ≤ s · 2^{-w}  □

3 Learning Juntas

We now study how to learn juntas. Let C_r = { f : {-1,1}^n → {-1,1} | f is an r-junta } denote the family of r-juntas. Note that C_{log n} ⊆ {poly-size DTs} ⊆ {poly-size DNFs} (an r-junta has a decision tree of depth r, hence of size ≤ 2^r, and a size-s DT yields a size-s DNF).

Remark 3.1 To learn C_r, it suffices for the algorithm to identify the r relevant variables, since then we can just draw O(r · 2^r) random examples and, with high probability, learn the entire truth table (of size 2^r). (This is a coupon-collector argument: O(r · 2^r) uniform examples hit every one of the 2^r settings of the relevant variables with high probability.)
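A sketch of this second phase in code (a hypothetical helper, not from the notes, assuming the set of relevant variables has already been identified):

    def learn_truth_table(examples, relevant):
        # Build the truth table of the junta over the relevant variables.
        # With O(r * 2^r) random examples, every setting of the relevant
        # variables is observed with high probability (coupon collector).
        relevant = sorted(relevant)
        table = {}
        for x, y in examples:
            table[tuple(x[i] for i in relevant)] = y
        def h(x):
            # Patterns never observed (unlikely with enough examples) default to 1.
            return table.get(tuple(x[i] for i in relevant), 1)
        return h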

Observation 3.2 If f is an r-junta, then every Fourier coefficient of f is either 0 or at least 2^{-r} in absolute value. This follows directly from the definition of Fourier coefficients and the fact that f depends on only r variables.
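To spell the observation out (a one-line computation, added for clarity): say f depends only on the coordinates in R ⊆ [n] with |R| = r, and write f(x) = g(x_R). Then for S ⊆ R,

    f̂(S) = E_{x∈{-1,1}^n}[f(x) · x_S] = (1/2^r) · ∑_{y∈{-1,1}^R} g(y) · y_S,

which is an integer multiple of 2^{-r} since each term g(y) · y_S is ±1; and f̂(S) = 0 whenever S ⊄ R, as Fact 3.3 below shows.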

Fact 3.3 If f̂(S) ≠ 0, then all variables in S are relevant.

Proof: Suppose f̂(S) ≠ 0, but there exists an irrelevant i ∈ S. By definition, f̂(S) = E_x[f(x) · x_S]. But for any x, consider x^{(i)} (x with the i-th coordinate flipped); since f(x) = f(x^{(i)}) while x_S = −(x^{(i)})_S, we have

    f(x) · x_S = −f(x^{(i)}) · (x^{(i)})_S.

Therefore, everything cancels out in pairs in f̂(S), and f̂(S) = 0, a contradiction.  □

Using the above facts, we give one idea for learning juntas, and show that r-juntas are learnable in time poly(n, 2^r) · n^r (a code sketch of the procedure appears at the end of the section):

• Estimate f̂(∅).
• Estimate f̂(S) for all |S| = 1 up to accuracy 2^{-r}/4. If we find an S such that f̂(S) ≠ 0, then we know that all variables in S are relevant. Note that this takes time poly(n, 2^r) · (n choose 1).
• Estimate f̂(S) for all |S| = 2 up to accuracy 2^{-r}/4. If we find an S such that f̂(S) ≠ 0, then we know that all variables in S are relevant. Note that this takes time poly(n, 2^r) · (n choose 2).
• Do the above for S of size 3, 4, ..., r.

Observation 3.4 The above gives us a poly(n, 2^r) · n^r-time algorithm for learning r-juntas.

In the next lecture, we shall improve the above result. In particular, we will ask the question: what kind of functions f can have f̂(S) = 0 for all 1 ≤ |S| ≤ d? If f is not such a function, then by step d of the above algorithm we will have found a relevant variable. If f is such a function, i.e., f̂(S) = 0 for all 1 ≤ |S| ≤ d, we will show an algorithm for learning such functions.
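Here is the promised code sketch of the coefficient-scanning procedure (illustrative only, not from the notes; it assumes labeled random examples (x, f(x)) are available, and the function names are hypothetical):

    import itertools

    def estimate_coefficient(S, examples):
        # Empirical estimate of hat{f}(S) from labeled examples (x, f(x)).
        total = 0.0
        for x, y in examples:
            chi = 1
            for i in S:
                chi *= x[i]
            total += y * chi
        return total / len(examples)

    def find_relevant_variables(examples, n, r):
        # Scan S of size 1, 2, ..., r. With enough examples each estimate is
        # accurate to 2^{-r}/4, so any S whose estimate exceeds 2^{-r}/2 in
        # absolute value has hat{f}(S) != 0, and all its variables are relevant
        # (Fact 3.3); conversely every S with hat{f}(S) != 0 is caught, since
        # nonzero coefficients are at least 2^{-r} in magnitude (Observation 3.2).
        relevant = set()
        threshold = 2.0 ** (-r) / 2
        for size in range(1, r + 1):
            for S in itertools.combinations(range(n), size):
                if abs(estimate_coefficient(S, examples)) > threshold:
                    relevant.update(S)
        return relevant

By Remark 3.1, once the (at most r) relevant variables are found, the truth table can be learned from O(r · 2^r) further random examples.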
