
Analysis of Boolean Functions (CMU 18-859S, Spring 2007)
Lecture 14: The KKL and Friedgut's Theorem
March 6, 2007
Lecturer: Ryan O'Donnell    Scribe: Ryan Williams

1 Introduction

The KKL Theorem, named after Kahn, Kalai, and Linial, says that for any Boolean function f on n variables, there is some variable that has non-trivial influence on the value of f, where the guaranteed amount of influence depends on the variance of the function. Before we state the theorem, recall that Var(f), a.k.a. the variance of f, is

    Var(f) = E[f(x)^2] − (E[f(x)])^2 = Σ_{S≠∅} f̂(S)^2.

Variance can be expressed in terms of the empty coefficient:

Proposition 1.1 For f : {−1,1}^n → {−1,1}, Var(f) = (1 + f̂(∅))(1 − f̂(∅)).

Proof:

    Var(f) = Σ_S f̂(S)^2 − f̂(∅)^2 = 1 − f̂(∅)^2 = (1 + f̂(∅))(1 − f̂(∅)),

where Σ_S f̂(S)^2 = 1 follows from Parseval. □

Theorem 1.2 (Kahn-Kalai-Linial) Let f : {−1,1}^n → {−1,1}. Then there is i ∈ [n] satisfying

    Inf_i(f) ≥ Ω(Var(f) · (log n)/n).

As a special case, if a Boolean function is balanced, taking the value 1 on exactly half of its inputs, then f̂(∅) = E_x[f(x)] = 0, so Var(f) = 1 by Proposition 1.1. Hence Theorem 1.2 tells us that every balanced function has a variable with influence at least Ω(log n/n). In other words, not all variables can have small influence; if a function has, say, constant total influence, then there must be some variables of that function that are more influential than others by an Ω(log n) factor. We will prove the KKL theorem in this lecture. The importance of the work of Kahn, Kalai, and Linial is hard to overstate: they essentially invented the use of Fourier analysis in theoretical computer science, and they employed interesting proof techniques that are still widely used today. A related result is that of Friedgut, who showed that Boolean functions with small total influence are "close" to another Boolean function that only depends on a few variables.
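Since all of the quantities above are expectations over the finite cube {−1,1}^n, they can be checked by brute force for small n. The following sketch (my own illustration, not from the lecture; the helper names and the Maj3 example are hypothetical) computes Fourier coefficients, verifies the variance identity, and computes influences for 3-bit majority:

```python
from itertools import product
from math import prod

def fourier_coefficients(f, n):
    """All 2^n coefficients fhat(S) = E[f(x) * chi_S(x)], by brute force."""
    pts = list(product([-1, 1], repeat=n))
    coeffs = {}
    for mask in range(1 << n):  # bitmask encoding of S subset of [n]
        S = [i for i in range(n) if mask >> i & 1]
        coeffs[frozenset(S)] = sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)
    return coeffs

def variance(coeffs):
    # Var(f) = sum over nonempty S of fhat(S)^2
    return sum(c * c for S, c in coeffs.items() if S)

def influence(f, n, i):
    # Inf_i(f) = Pr_x[ f(x) != f(x with coordinate i flipped) ]
    pts = list(product([-1, 1], repeat=n))
    flips = sum(1 for x in pts
                if f(x) != f(tuple(-v if j == i else v for j, v in enumerate(x))))
    return flips / len(pts)

maj3 = lambda x: 1 if sum(x) > 0 else -1   # 3-bit majority, a balanced function
c = fourier_coefficients(maj3, 3)
print(variance(c))                                # 1.0, as predicted for balanced f
print([influence(maj3, 3, i) for i in range(3)])  # [0.5, 0.5, 0.5]
```

Here every influence (0.5) is far above the Ω(log n/n) floor, as KKL demands.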

Theorem 1.3 (Friedgut) Let f : {−1,1}^n → {−1,1}. Then for every ε ∈ (0,1), f is ε-close to a 2^{O(I(f)/ε)}-junta.

We will also prove Friedgut's theorem along the way. Theorem 1.2 and Theorem 1.3 are related in the following sense: by taking a small set of the most influential variables, as guaranteed by KKL, we get a coalition of variables that controls most values of the function.

2 Noise Operators

In order to establish the above results, we use the notion of a noise operator on functions. Informally speaking, a noise operator smooths out the Fourier expansion of a function by placing a dampening factor on its high-degree parts.

For a given string x ∈ {−1,1}^n, we define the distribution y ∼_ρ x that chooses a string y as follows: each coordinate y_i is set to x_i with probability 1/2 + ρ/2, and to −x_i with probability 1/2 − ρ/2. In other words, randomly choosing a y ∼_ρ x flips some bits of x, with probability dependent on ρ.

Definition 2.1 For ρ ∈ [−1,1], the noise operator T_ρ on functions f : {−1,1}^n → ℝ is given by

    (T_ρ f)(x) = E_{y∼_ρ x}[f(y)].

First, let’s look at some extremal cases for this operator.

Observation 2.2 T_1 f = f, (T_{−1} f)(x) = f(−x), and T_0 f ≡ E_x[f(x)] = f̂(∅).

For ρ varying between 0 and 1, we can think of the T_ρ f function as an average of f's values on neighborhoods of x, where the radius of the neighborhood is parameterized by ρ. In the case ρ = 1, the neighbors are completely ignored, and T_1 f = f. In the case ρ = 0, we ignore the input x and output the average value of f on all inputs, so T_0 f is a constant function. Another property of T_ρ is that it is a linear operator. The proof is just three words: "linearity of expectation."
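Because the expectation over y ∼_ρ x is a finite sum, these extremal cases can be checked exactly for tiny n. A small sketch (illustrative only; the helper names are mine):

```python
from itertools import product

def noise_operator(f, n, rho):
    """T_rho f computed exactly from Definition 2.1:
    (T_rho f)(x) = E_{y ~_rho x}[f(y)], each y_i = x_i w.p. 1/2 + rho/2."""
    pts = list(product([-1, 1], repeat=n))
    def Tf(x):
        total = 0.0
        for y in pts:
            p = 1.0
            for xi, yi in zip(x, y):
                p *= 0.5 + rho / 2 if xi == yi else 0.5 - rho / 2
            total += p * f(y)
        return total
    return Tf

f = lambda x: x[0] * x[1]             # the parity chi_{{1,2}} on n = 3 bits
x = (1, -1, 1)
print(noise_operator(f, 3, 1.0)(x))   # -1.0 : T_1 f = f
print(noise_operator(f, 3, 0.0)(x))   #  0.0 : T_0 f = E[f] = fhat(empty)
```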

Observation 2.3 For f, g : {−1,1}^n → {−1,1} and constant c ∈ ℝ, T_ρ(cf + g) = c·T_ρ f + T_ρ g.

To further build up our intuition about the noise operator, let's see what it does on the Fourier basis.

Proposition 2.4 T_ρ x_S = ρ^{|S|} x_S.

Proof:

    T_ρ x_S = E_{y∼_ρ x}[χ_S(y)]
            = ∏_{i∈S} E_{y_i}[y_i]
            = ∏_{i∈S} ((1/2 + ρ/2)·x_i + (1/2 − ρ/2)·(−x_i))
            = ∏_{i∈S} (ρ x_i) = ρ^{|S|} x_S. □

So the Fourier basis functions are "dampened" by a ρ^{|S|} factor: as the cardinality of the relevant set S gets larger, the contribution of the corresponding parity function to the overall value of T_ρ f decreases rapidly. We can now immediately write the Fourier expansion of T_ρ f in terms of the Fourier coefficients of f.

Corollary 2.5 (T_ρ f)(x) = Σ_S ρ^{|S|} f̂(S) x_S.
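Corollary 2.5 gives a second way to compute T_ρ f, and for small n the two can be compared exactly. A brute-force sketch (my own illustration; the helper names are hypothetical):

```python
from itertools import product
from math import prod

def chi(S, x):
    # parity (Fourier character) chi_S(x) = prod_{i in S} x_i
    return prod(x[i] for i in S)

def tf_by_definition(f, n, rho, x):
    # (T_rho f)(x) as the exact expectation over y ~_rho x (Definition 2.1)
    total = 0.0
    for y in product([-1, 1], repeat=n):
        p = 1.0
        for xi, yi in zip(x, y):
            p *= 0.5 + rho / 2 if xi == yi else 0.5 - rho / 2
        total += p * f(y)
    return total

def tf_by_fourier(f, n, rho, x):
    # (T_rho f)(x) = sum_S rho^{|S|} fhat(S) chi_S(x)   (Corollary 2.5)
    pts = list(product([-1, 1], repeat=n))
    total = 0.0
    for mask in range(1 << n):
        S = [i for i in range(n) if mask >> i & 1]
        fhat = sum(f(y) * chi(S, y) for y in pts) / len(pts)
        total += rho ** len(S) * fhat * chi(S, x)
    return total

maj3 = lambda x: 1 if sum(x) > 0 else -1
for x in product([-1, 1], repeat=3):
    assert abs(tf_by_definition(maj3, 3, 0.6, x) - tf_by_fourier(maj3, 3, 0.6, x)) < 1e-9
print("Definition 2.1 and Corollary 2.5 agree on Maj3")
```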

Finally, we will need a technical lemma that expresses the ρ-attenuated influence of a variable in terms of the noise operator T_ρ. Fortunately, most of Homework 2 proved it for us. In Problem 3 of Homework 2, you proved that the noise stability of f at ρ is

    S_ρ(f) = E[f(x)·(T_ρ f)(x)] = Σ_S ρ^{|S|} f̂(S)^2,   (1)

and you proved

    S_ρ(D_i f) = Inf_i^{(ρ)}(f),   (2)

where (D_i f)(x) = Σ_{S:i∈S} f̂(S) χ_{S−{i}}(x).

Lemma 2.6 Inf_i^{(ρ)}(f) = ||T_{ρ^{1/2}} D_i f||_2^2.

Proof: Appealing to equation (1), we derive

    S_ρ(f) = Σ_S ρ^{|S|} f̂(S)^2 = Σ_S ((ρ^{1/2})^{|S|} f̂(S))^2 = ||T_{ρ^{1/2}} f||_2^2.

Therefore ||T_{ρ^{1/2}} D_i f||_2^2 = S_ρ(D_i f) = Inf_i^{(ρ)}(f), by equation (2). □

Lemma 2.6 uses the 2-norm of a certain function to express the attenuated influence of a variable. This suggests the application of inequalities between norms to prove bounds on the influence of a variable, which is the approach that we will take to proving KKL and Friedgut's theorem.
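Lemma 2.6 is easy to sanity-check numerically: compute Inf_i^{(ρ)}(f) from the Fourier-side formula Σ_{S∋i} ρ^{|S|−1} f̂(S)^2 implicit in equation (2), and compare with ||T_{√ρ} D_i f||_2^2 computed via Parseval. An illustrative sketch (helper names are mine):

```python
from itertools import product
from math import prod, sqrt

def fhat(f, n, S):
    # Fourier coefficient fhat(S) by brute force
    pts = list(product([-1, 1], repeat=n))
    return sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)

def D(f, i):
    # discrete derivative: (D_i f)(x) = (f(x with x_i=1) - f(x with x_i=-1)) / 2
    def g(x):
        hi, lo = list(x), list(x)
        hi[i], lo[i] = 1, -1
        return (f(tuple(hi)) - f(tuple(lo))) / 2
    return g

def trho_norm2_sq(g, n, rho):
    # ||T_rho g||_2^2 = sum_S rho^{2|S|} ghat(S)^2, by Parseval
    return sum(rho ** (2 * len(S)) * fhat(g, n, S) ** 2
               for S in ([i for i in range(n) if m >> i & 1] for m in range(1 << n)))

def attenuated_influence(f, n, i, rho):
    # Inf_i^{(rho)}(f) = sum_{S : i in S} rho^{|S|-1} fhat(S)^2
    total = 0.0
    for m in range(1 << n):
        S = [j for j in range(n) if m >> j & 1]
        if i in S:
            total += rho ** (len(S) - 1) * fhat(f, n, S) ** 2
    return total

maj3 = lambda x: 1 if sum(x) > 0 else -1
rho = 0.25
lhs = attenuated_influence(maj3, 3, 0, rho)
rhs = trho_norm2_sq(D(maj3, 0), 3, sqrt(rho))
print(lhs, rhs)  # both 0.265625
```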

3 Lower Bounding a Norm of f With a Norm of T_ρ f

Recall our norm definition: if Y is a random variable and p ≥ 1, then ||Y||_p = E[|Y|^p]^{1/p}. This section is devoted to proving the inequality:

Theorem 3.1 ||T_{1/2} f||_2^2 ≤ 4·||f||_{4/3}^2.

That is, we can upper-bound the 2-norm of T_{1/2} f in terms of the 4/3-norm of f. We will use this inequality to show that the attenuated influences of variables on f are significantly smaller than the plain (ρ = 1) influences, by turning the norm inequality between T_ρ f and f into an inequality between attenuated and plain influences, via Lemma 2.6. Before proving Theorem 3.1, we first recall the hypercontractive inequality from the previous lecture.

Theorem 3.2 (From Previous Lecture) If f is a degree-d polynomial, then ||f||_4 ≤ √3^d ||f||_2.

By a simple change of norms, we can replace the 4 and 2 in the above with 2 and 4/3. Recall that Hölder's inequality tells us that E[X·Y] ≤ ||X||_p ||Y||_q, provided that 1/p + 1/q = 1.

Corollary 3.3 If f is a degree-d polynomial, then ||f||_2 ≤ √3^d ||f||_{4/3}.

Proof: ||f||_2^2 = E[f·f] ≤ ||f||_4 ||f||_{4/3} ≤ √3^d ||f||_2 ||f||_{4/3}, where the penultimate inequality is Hölder's (with p = 4, q = 4/3) and the last inequality follows from Theorem 3.2. Dividing both sides by ||f||_2 gives the claim. □

Proof of Theorem 3.1. For a parameter d ∈ {0,1,...,n}, define f^{=d} := Σ_{S:|S|=d} f̂(S) x_S, the degree-d part of f. Observe that E[f·f^{=d}] = Σ_{S:|S|=d} f̂(S)^2 = ||f^{=d}||_2^2 by Parseval. Thus, by Hölder's inequality and Theorem 3.2 applied to the degree-d polynomial f^{=d},

    ||f^{=d}||_2^2 = E[f·f^{=d}] ≤ ||f||_{4/3} ||f^{=d}||_4 ≤ √3^d ||f||_{4/3} ||f^{=d}||_2,

and dividing through by ||f^{=d}||_2 and squaring,

    Σ_{S:|S|=d} f̂(S)^2 = ||f^{=d}||_2^2 ≤ 3^d ||f||_{4/3}^2.

Multiplying both sides of the inequality by (1/4)^d,

    (1/4)^d Σ_{S:|S|=d} f̂(S)^2 ≤ (3/4)^d ||f||_{4/3}^2.

Since the inequality holds for all d, we can upper-bound the sum over all sets S:

    Σ_S (1/4)^{|S|} f̂(S)^2 ≤ Σ_{d=0}^{∞} (3/4)^d ||f||_{4/3}^2.

But

    ||T_{1/2} f||_2^2 = Σ_S ((1/2)^{|S|} f̂(S))^2 = Σ_S (1/4)^{|S|} f̂(S)^2,

and

    Σ_{d=0}^{∞} (3/4)^d = 1/(1 − 3/4) = 4,

so it follows that ||T_{1/2} f||_2^2 ≤ 4·||f||_{4/3}^2. □
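Nothing in this proof used Booleanness, so Theorem 3.1 holds for arbitrary real-valued f on the cube; in particular it can be spot-checked on random {−1,0,1}-valued functions, the shape D_i f will have in the next section. A brute-force sketch (the trial count, seed, and names are my own choices):

```python
from itertools import product
from math import prod
import random

def verify_theorem_3_1(n, trials=200, seed=0):
    """Spot-check ||T_{1/2} f||_2^2 <= 4 ||f||_{4/3}^2 on random
    {-1,0,1}-valued functions."""
    rng = random.Random(seed)
    pts = list(product([-1, 1], repeat=n))
    for _ in range(trials):
        table = {x: rng.choice([-1, 0, 1]) for x in pts}
        f = table.__getitem__
        # left side via Parseval: sum_S (1/4)^{|S|} fhat(S)^2
        lhs = 0.0
        for mask in range(1 << n):
            S = [i for i in range(n) if mask >> i & 1]
            fh = sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)
            lhs += 0.25 ** len(S) * fh * fh
        # right side: 4 * ||f||_{4/3}^2 = 4 * (E|f|^{4/3})^{3/2}
        rhs = 4 * (sum(abs(f(x)) ** (4 / 3) for x in pts) / len(pts)) ** (3 / 2)
        assert lhs <= rhs + 1e-9
    return True

print(verify_theorem_3_1(3))  # True
```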

4 Lower Bounding Influence With Attenuated Influence

The above inequality implies that for a Boolean function, the 1/4-attenuated influences of f are significantly smaller than the original influences. We’ll use this fact directly in the proofs of KKL and Friedgut’s theorem.

Corollary 4.1 For all i ∈ [n], Inf_i^{(1/4)}(f) ≤ 4·Inf_i(f)^{3/2}.

Since variable influences are at most 1, the corollary indicates that 1/4-attenuated influences are even smaller.

Remark 4.2 Notice that an inequality of the form Inf_i^{(ρ)}(f) ≤ Inf_i(f)^c can only hold in general if c ≤ 2, for any ρ. This is because if f were monotone, then

    Inf_i^{(ρ)}(f) = Σ_{S:i∈S} ρ^{|S|−1} f̂(S)^2 ≥ f̂({i})^2 = Inf_i(f)^2,

using that Inf_i(f) = f̂({i}) for monotone f. Thus we should not expect to improve the exponent 3/2 in the corollary to greater than 2, without introducing a multiplicative constant.

Proof of Corollary 4.1.

    Inf_i^{(1/4)}(f) = ||T_{1/2} D_i f||_2^2             (by Lemma 2.6)
                     ≤ 4·||D_i f||_{4/3}^2               (by Theorem 3.1)
                     = 4·E_x[|(D_i f)(x)|^{4/3}]^{(3/4)·2}  (by definition).

But for Boolean f, the function (D_i f)(x) ∈ {−1, 0, 1} for all x. So |(D_i f)(x)|^{4/3} ∈ {0, 1}, and we can therefore rewrite the expectation as a probability:

    Inf_i^{(1/4)}(f) ≤ 4·E_x[|(D_i f)(x)|^{4/3}]^{3/2} = 4·Pr_x[(D_i f)(x) ≠ 0]^{3/2} = 4·Inf_i(f)^{3/2}. □
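Corollary 4.1 can likewise be verified exhaustively for small n, computing both the plain and the 1/4-attenuated influence of each variable from the Fourier-side formulas. An illustrative sketch (helper names are mine):

```python
from itertools import product
from math import prod
import random

def influences(f, n, rho):
    """Pairs (Inf_i(f), Inf_i^{(rho)}(f)) via the Fourier formulas
    Inf_i = sum_{S : i in S} fhat(S)^2 and
    Inf_i^{(rho)} = sum_{S : i in S} rho^{|S|-1} fhat(S)^2."""
    pts = list(product([-1, 1], repeat=n))
    out = []
    for i in range(n):
        plain = atten = 0.0
        for mask in range(1 << n):
            S = [j for j in range(n) if mask >> j & 1]
            if i not in S:
                continue
            fh = sum(f(x) * prod(x[j] for j in S) for x in pts) / len(pts)
            plain += fh * fh
            atten += rho ** (len(S) - 1) * fh * fh
        out.append((plain, atten))
    return out

rng = random.Random(1)
pts = list(product([-1, 1], repeat=4))
for _ in range(50):                    # 50 random Boolean functions on 4 bits
    table = {x: rng.choice([-1, 1]) for x in pts}
    for plain, atten in influences(table.__getitem__, 4, 0.25):
        assert atten <= 4 * plain ** 1.5 + 1e-9   # Corollary 4.1
print("Corollary 4.1 held on every sampled function")
```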

5 The Main Lemma

Using the results developed above, we now prove a critical lemma that says the Fourier spectrum of a Boolean function with low total influence is mostly concentrated on small cardinality sets of influential coordinates.

Lemma 5.1 (Main Lemma) Let f : {−1,1}^n → {−1,1}, and ε ∈ (0,1). Let d = 2·I(f)/ε and J = {j ∈ [n] : Inf_j(f) ≥ 100^{−d}}. Then f is ε-concentrated on the collection

    𝒥 = {S ⊆ J : |S| ≤ d}.

Note that |J| ≤ (Σ_i Inf_i(f))/100^{−d} = I(f)·100^d ≤ 2^{O(I(f)/ε)}; this set J will turn out to be the set of influential coordinates in Friedgut's theorem.

Proof: Recall that to show ε-concentration of f on 𝒥, we need to show that Σ_{S∉𝒥} f̂(S)^2 ≤ ε (cf. Lecture 8). We can break up the summation into two parts:

    Σ_{S∉𝒥} f̂(S)^2 = Σ_{S:|S|>d} f̂(S)^2 + Σ_{S:|S|≤d, S⊄J} f̂(S)^2.   (3)

For the first part,

    Σ_{S:|S|>d} f̂(S)^2 = Σ_{S:|S|>I(f)/(ε/2)} f̂(S)^2 ≤ ε/2,

where the inequality follows from Proposition 5.6 of Lecture 8. It suffices for us to bound the second part of the summation (3) by ε/2 as well. First, we compute

    Σ_{S:|S|≤d, S⊄J} f̂(S)^2 ≤ Σ_{S:|S|≤d} |S ∩ J̄|·f̂(S)^2
                             ≤ 4^{d−1} Σ_S |S ∩ J̄|·(1/4)^{|S|−1} f̂(S)^2
                             = 4^{d−1} Σ_{i∉J} Σ_{S:i∈S} (1/4)^{|S|−1} f̂(S)^2 = 4^{d−1} Σ_{i∉J} Inf_i^{(1/4)}(f),

where J̄ denotes [n]∖J, and the second inequality uses (1/4)^{|S|−1} ≥ (1/4)^{d−1} when |S| ≤ d. Applying the influence inequality (Corollary 4.1) from the previous section,

    4^{d−1} Σ_{i∉J} Inf_i^{(1/4)}(f) ≤ 4^d Σ_{i∉J} Inf_i(f)^{3/2}
                                     ≤ 4^d · (max_{i∉J} Inf_i(f)^{1/2}) · Σ_{i∉J} Inf_i(f)
                                     ≤ 4^d · 100^{−d/2} · I(f) = (4/10)^{2·I(f)/ε} · I(f).

Note that

    (4/10)^{2·I(f)/ε} · I(f) = I(f)/(100/16)^{I(f)/ε} ≤ I(f)/e^{I(f)/ε},

since e < 100/16. Finally,

    I(f)/e^{I(f)/ε} ≤ ε/2  ⟺  2·(I(f)/ε) ≤ e^{I(f)/ε},

but since I(f)/ε ≥ 0, the latter follows immediately from the fact that x ≤ e^x/2 for all x ≥ 0. □

6 Friedgut’s Theorem and the KKL Theorem

We are finally ready to prove the main theorems of this lecture, although the Main Lemma of the previous section is so powerful that the two theorems are really corollaries now.

Corollary 6.1 (Friedgut's Theorem) Let f : {−1,1}^n → {−1,1}. Then for every ε ∈ (0,1), f is ε-close to a 2^{O(I(f)/ε)}-junta.

Proof: Define g := sgn(Σ_{S:|S|≤d, S⊆J} f̂(S) x_S), where J and d are the same as in Lemma 5.1. This function clearly depends only on the coordinates in J, but as we saw before, |J| ≤ 2^{O(I(f)/ε)}. Since f is ε-concentrated on the collection 𝒥 of Lemma 5.1, it follows that f is ε-close to g, by Propositions 5.2 and 5.3 of Lecture 8. □
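The construction in this proof, taking the sign of the Fourier expansion truncated to small subsets of the influential coordinates J, can be carried out explicitly for a small example. In the sketch below (my own illustration; the threshold tau and cutoff d are hand-picked rather than set by the formulas of Lemma 5.1), f is secretly a 3-junta on 5 bits, so g recovers it exactly:

```python
from itertools import product
from math import prod

n = 5
pts = list(product([-1, 1], repeat=n))
f = lambda x: 1 if x[0] + x[1] + x[2] > 0 else -1   # secretly a 3-junta

def fhat(S):
    return sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)

# keep the coordinates whose influence clears a (hand-picked) threshold tau
tau, d = 0.01, 3
infl = [sum(fhat([j for j in range(n) if m >> j & 1]) ** 2
            for m in range(1 << n) if m >> i & 1) for i in range(n)]
J = [i for i in range(n) if infl[i] >= tau]

def g(x):
    # g = sgn( sum over S subset of J, |S| <= d of fhat(S) chi_S(x) )
    total = 0.0
    for m in range(1 << n):
        S = [j for j in range(n) if m >> j & 1]
        if len(S) <= d and all(j in J for j in S):
            total += fhat(S) * prod(x[j] for j in S)
    return 1 if total >= 0 else -1

dist = sum(f(x) != g(x) for x in pts) / len(pts)
print(J, dist)   # [0, 1, 2] 0.0 -- g depends only on J and agrees with f
```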

Corollary 6.2 (KKL) Let f : {−1,1}^n → {−1,1}. Then there is i ∈ [n] satisfying

    Inf_i(f) ≥ Ω(Var(f) · (log n)/n).

Proof: Take ε = Var(f)/10 ≤ 1/10 in the Main Lemma. In the following, let the logarithms be in base 10.

• If I(f) ≥ Var(f)·(log n)/1000, then since I(f) is a sum of n influences, there is a coordinate i such that Inf_i(f) ≥ Var(f)·(log n)/(1000n).

• If I(f) < Var(f)·(log n)/1000, then

    d = 2·I(f)/ε = 20·I(f)/Var(f) < (log n)/50.

Therefore J has very few variables in it, namely

    |J| ≤ I(f)·100^d ≤ (log n)·100^{(log n)/50} = (log n)·n^{2/50} ≤ n^{1/2}.

Now we can lower-bound the sum of influences in J by the variance, as follows:

    Σ_{i∈J} Inf_i(f) = Σ_{i∈J} Σ_{S:i∈S} f̂(S)^2
                     ≥ Σ_{S⊆J} |S|·f̂(S)^2
                     ≥ Σ_{∅≠S⊆J} f̂(S)^2
                     = Σ_{S≠∅} f̂(S)^2 − Σ_{S⊄J} f̂(S)^2
                     ≥ Var(f) − ε          (by the Main Lemma)
                     = 0.9·Var(f)          (by our choice of ε).

Hence there must be a particular i ∈ J satisfying

    Inf_i(f) ≥ 0.9·Var(f)/|J| ≥ 0.9·Var(f)/n^{1/2} ≥ 0.9·Var(f)·(log n)/n,

so the theorem holds in this case as well. □
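As a sanity check on the KKL bound, here is a brute-force computation of max_i Inf_i(f) for majority on small odd n, compared against log(n)/n (natural log and constant 1, purely illustrative; KKL only promises the bound up to a constant):

```python
from itertools import product
from math import log

def max_influence(f, n):
    # max over i of Pr_x[f changes when coordinate i is flipped]
    pts = list(product([-1, 1], repeat=n))
    best = 0.0
    for i in range(n):
        flips = sum(1 for x in pts
                    if f(x) != f(tuple(-v if j == i else v for j, v in enumerate(x))))
        best = max(best, flips / len(pts))
    return best

# majority is balanced (Var = 1), so KKL promises a coordinate of
# influence Omega(log(n)/n); here the bound holds with constant 1
for n in (3, 5, 7):
    maj = lambda x: 1 if sum(x) > 0 else -1
    print(n, max_influence(maj, n), log(n) / n)
    assert max_influence(maj, n) >= log(n) / n
```

In fact majority's influences are Θ(1/√n), well above the log(n)/n floor; the tribes function is the example showing KKL is tight.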
