Threshold Phenomena and Influence With Some Perspectives from Mathematics, Computer Science and Economics∗

Gil Kalai Hebrew University of Jerusalem and Yale University and Shmuel Safra Tel-Aviv University, Princeton University and Institute for Advanced Study, Princeton

1 Introduction

“Threshold phenomena” refer to settings in which the probability for an event to occur changes rapidly as some underlying parameter varies. Threshold phenomena play an important role in probability theory and statistics, physics, and computer science, and are related to issues studied in economics and political science. Quite a few questions that come up naturally in those fields translate to proving that some event indeed exhibits a threshold phenomenon, and then to finding the location of the transition and how rapid the change is. The notions of sharp threshold and phase transition originated mainly in physics, and many of the mathematical ideas for their study came from mathematical physics. In this paper, however, we will mainly discuss connections to other fields.

∗Research supported in part by an NSF grant and by an ISF Bikura grant. We thank Allon Percus, without whose encouragement and help this paper would not have been written, and many friends and colleagues, including Itai Benjamini, Ehud Friedgut, Jeff Kahn, Guy Kindler, Nati Linial, Elchanan Mossel, Ryan O'Donnell, Yuval Peres and Oded Schramm, for inspiring discussions and helpful remarks.

A simple yet illuminating example that demonstrates the sharp threshold phenomenon is Condorcet's Jury Theorem (CJT), which can be described as follows: Say one is running an election process, where the results are determined by simple majority, between two candidates, Alice and Bob. If every

voter votes for Alice with probability p > 1/2 and for Bob with probability 1 − p, and if the probabilities for each voter to vote either way are independent of the other votes, then as the number of voters tends to infinity the probability that Alice is elected tends to 1. The probability that Alice is elected is a monotone function of p, and when there are many voters it

rapidly changes from being very close to 0 when p < 1/2 to being very close to 1 when p > 1/2. The reason usually given for the interest of CJT to economics and political science is that it can be interpreted as saying that even if agents receive very poor (yet independent) signals indicating which of two choices is correct, majority voting nevertheless results in the correct decision being taken with high probability, as long as there are enough agents and the agents vote according to their signals. This is referred to in economics as “asymptotically complete aggregation of information”.
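To make the sharp threshold in CJT concrete, here is a small computational sketch (Python; the function name and the sample values of n and p are ours, chosen only for illustration). It evaluates µ_p of the simple majority of n independent p-biased votes exactly via the binomial distribution, and shows how the transition around p = 1/2 sharpens (roughly like 1/√n) as n grows.

# Sketch (Python): mu_p of simple majority over n independent p-biased votes,
# computed exactly from the binomial distribution.
from math import comb

def mu_majority(n, p):
    """Probability that more than n/2 of n independent p-biased bits equal 1."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in (11, 101, 1001):
    print(n, [round(mu_majority(n, p), 3) for p in (0.45, 0.48, 0.5, 0.52, 0.55)])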

Condorcet's Jury Theorem is a simple consequence of the weak law of large numbers. The central limit theorem implies that the “threshold interval” is of length proportional to 1/√n. Some extensions, however, are much more difficult. When we consider general economic or political situations, aggregation of agents' votes may be much more complicated than simple majority, the individual signal (or signals) may be more complicated than a single bit of information, and the distribution of signals among agents can be more general; in particular, agents' signals may depend on each other. On top of that, voters may vote strategically, taking into account possible actions of others in addition to their own signal, and distinct voters may have different goals and interests, not only different information. In addition, the number of candidates may be larger than two, which results in a whole set of new phenomena.

Let us now mention briefly two other areas in which understanding threshold behavior emerges. The study of random graphs as a separate area of research was initiated in the seminal paper of Erdős and Rényi [?] from 1960. Consider a random graph G(n, p) on n vertices where every edge among the \binom{n}{2} possible edges appears with probability p. Erdős and Rényi proved a sharp threshold property for various graph properties: for example, for every ε > 0, if p = (1 + ε) log n/n the graph is connected with probability tending to 1 (as n tends to infinity), while for p = (1 − ε) log n/n the probability that the graph is connected tends to zero. Since 1960 an extensive study of specific random graph properties has been carried out, and in recent years results concerning the threshold behavior of general graph properties were found.

For a general understanding of threshold properties of graphs, it turns out that symmetry plays a crucial role: when we talk about properties of graphs we implicitly assume that those properties depend only on the isomorphism type of the graphs (and not on the labeling of vertices). This fact introduces substantial symmetry into this model and we will discuss how to exploit this symmetry.

Next, we would like to mention complexity theory. Threshold phenomena play a role, both conceptual and technical, in various aspects of computational complexity theory. One of the major developments in complexity theory in the last two decades is the emerging understanding of the complexity of approximating optimization problems. Here is an important example: for a graph G let m(G) be the maximum number of edges between two disjoint sets of vertices of G. MAX-CUT, the problem of dividing the vertices of a given input graph into two parts so as to maximize the number of edges between the parts, is known to be NP-hard. However, to find a partition such that the number of edges between the two parts is at least m(G)/2 is easy. The emerging (yet unproven) picture for this specific problem, as well as many others, is that if we wish to find a partition of the vertices with at least c·m(G) edges between the parts then there is a critical value c_0 = 0.878567... such that the problem is easy (i.e., there is a randomized polynomial time algorithm to solve it) for c < c_0 and hard (NP-hard) for c > c_0. (For this problem the 0.878567...-approximation is achieved by the famous Goemans-Williamson algorithm based on semidefinite programming.) The study of threshold phenomena and other related properties of Boolean functions is also an important technical tool in understanding the hardness

of approximation.

Another connection with complexity theory occurs in the area of circuit complexity. It turns out that very “low” complexity classes necessarily exhibit coarse threshold behavior. This insight is related to another major success of complexity theory: lower bounds for the size of bounded-depth circuits. The majority function, which exhibits a very sharp threshold behavior, cannot be represented by a bounded-depth Boolean circuit of small size.

Let us now introduce the basic mathematical object which is the subject of our considerations, namely that of a Boolean function: a function f(x_1, x_2, . . . , x_n) where each variable x_i is a Boolean variable, namely it can take the value '0' or '1', and the value of f is also '0' or '1'. A Boolean function is monotone if f(y_1, y_2, . . . , y_n) ≥ f(x_1, x_2, . . . , x_n) whenever y_i ≥ x_i for every i. Some basic examples of Boolean functions are named after the voting method they describe. For an odd integer n, the majority function

M(x_1, x_2, . . . , x_n) equals '1' if and only if x_1 + x_2 + · · · + x_n > n/2. The function f(x_1, x_2, . . . , x_n) = x_i is called a “dictatorship”. “Juntas” refer to the class of Boolean functions that depend on a bounded number of variables (that is, functions that disregard the values of almost all variables except for a few, whose number is independent of the number of variables).

Now consider the probability µ_p(f) that f(x_1, x_2, . . . , x_n) = 1, when the probability that x_i = 1 is p, independently for i = 1, 2, . . . , n, just as we had before in the election between Alice and Bob. When f is a monotone Boolean function, the function µ_p(f) is a monotone real function of p. Given a real number 1/2 > ε > 0, the threshold interval depending on ε is the interval

[p_1, p_2] where µ_{p_1}(f) = ε and µ_{p_2}(f) = 1 − ε. Understanding the length of this threshold interval is one of our central objectives.

Before we describe this paper's sections it is worth noting that the notion of a sharp threshold is an asymptotic property and therefore it applies to a sequence of Boolean functions when the number of variables becomes large. Giving explicit, realistic and useful estimates is an important goal. In the example above concerning elections between Alice and Bob, the central limit theorem provides explicit, realistic and useful estimates; in more involved settings, however, this task can be quite difficult. The main messages of this paper can be summarized as follows:

• The threshold behavior of a system is intimately related to the combinatorial notions of “influence” and “pivotality”. (Section 2)

• Sharp thresholds are common. We can expect a sharp threshold unless there are good reasons not to. (Sections 3, 5.3)

• Higher symmetry leads (in a subtle way) to sharper threshold behavior (Section 5.2).

• A sharp threshold occurs unless the property can be described “locally”. (Section 5.3)

• Systems whose description belongs to a very low complexity class have rather coarse (not sharp) threshold behavior. (Section 6.1).

• In various optimization problems, when we seek approximate solutions there is a sharp transition between goals which are algorithmically easy and those which are computationally intractable. (Section 6.3)

• A basic mathematical tool in understanding threshold behavior is Fourier analysis of Boolean functions. (Section 4)

In Section 2 we will introduce the notions of pivotality and influence and discuss “Russo's Lemma”, which relates these notions to threshold behavior. In Section 3 we will describe basic results concerning influences and the threshold behavior of Boolean functions. In Section 4 we will discuss a major mathematical tool required for the study of threshold phenomena and influences: Fourier analysis of Boolean functions. In Section 5 we will discuss the connection to random graphs and hypergraphs and to the k-SAT problem. In Section 6 we will discuss the connections to computational complexity. Section 7 is devoted to the related phenomenon of noise sensitivity. Section 8 discusses connections with the model of percolation. Section 9 discusses an example from social science: a result by Feddersen and Pesendorfer which exhibits a situation of self-organizing criticality. Section 10 summarizes some open problems and Section 11 concludes.

2 Pivotality, influence, power and the threshold interval

In this section we will describe the n-dimensional hypercube and define the notions of “pivotal” variables and influence for Boolean functions. We state Russo's fundamental lemma that connects influences and thresholds.

2.1 The discrete cube

Let Ω_n = {0, 1}^n denote the discrete n-dimensional cube, namely the set of 0-1 vectors with n entries. A Boolean function is a map from Ω_n to {0, 1}. Boolean functions on Ω_n are of course in 1-1 correspondence with subsets of Ω_n. Elements of Ω_n are themselves in 1-1 correspondence with subsets of [n] (= {1, 2, . . . , n}). Boolean functions appear, under different names, in many areas of science. We will equip Ω_n with a metric (namely a distance function) and a probability measure. For x, y ∈ Ω_n the Hamming distance d(x, y) is defined by

d(x, y) = |{i : x_i ≠ y_i}|.   (1)

Denote by Ω_n(p) the discrete cube endowed with the product probability measure µ_p, where µ_p{x : x_j = 1} = p. In other words,

µ_p(x_1, x_2, . . . , x_n) = p^k (1 − p)^{n−k},   (2)

where k = x_1 + x_2 + · · · + x_n.

2.2 Pivotality and influence of variables

Consider a Boolean function f(x_1, x_2, . . . , x_n) and the associated event A ⊂ Ω_n(p), such that f = χ_A, namely f is the characteristic function of A. For x = (x_1, x_2, . . . , x_n) ∈ Ω_n we say that the k-th variable is pivotal if flipping the value of x_k changes the value of f; formally, let

σ_k(x_1, . . . , x_{k−1}, x_k, x_{k+1}, . . . , x_n) = (x_1, . . . , x_{k−1}, 1 − x_k, x_{k+1}, . . . , x_n)   (3)

and define the k-th variable to be pivotal at x if

f(σ_k(x)) ≠ f(x).   (4)

The influence of the k-th variable on a Boolean function f, denoted by I_k^p(f), is the probability that the k-th variable is pivotal, i.e.

I_k^p(f) = µ_p({x : f(σ_k(x)) ≠ f(x)}).   (5)

The total influence I^p(f) is the sum of the individual influences:

I^p(f) = Σ_{k=1}^{n} I_k^p(f).   (6)

(We omit the superscript p for p = 1/2.) For a monotone Boolean function thought of as an election method, I_k(f) (= I_k^{1/2}(f)) is referred to as the Banzhaf power index of voter k. The quantity

φ_k(f) = ∫_0^1 I_k^p(f) dp   (7)

is called the Shapley-Shubik power index of voter k. The mathematical study (under different names) of pivotal agents and influences is quite basic in percolation theory and statistical physics, and also in probability theory and statistics, reliability theory, distributed computing, complexity theory, game theory, mechanism design and auction theory, other areas of theoretical economics, and political science.
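As a small illustration of these definitions, the following sketch (Python; the function names and the example weighted-majority rule are ours, chosen for illustration only) computes the influences I_k^p(f) directly from definition (5) by enumerating the cube, and from them the Banzhaf index and the Shapley-Shubik index of relation (7), the latter by numerical integration over p. This brute-force approach is only practical for a small number of variables.

# Sketch (Python): Banzhaf and Shapley-Shubik power indices of a small voting
# rule, computed directly from the definitions (exhaustive enumeration).
from itertools import product

def influence(f, n, k, p):
    """I_k^p(f): probability that voter k is pivotal under the product measure mu_p."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        y = x[:k] + (1 - x[k],) + x[k + 1:]          # flip the k-th coordinate
        if f(x) != f(y):
            w = 1.0
            for xi in x:
                w *= p if xi == 1 else (1 - p)
            total += w
    return total

def banzhaf(f, n, k):
    return influence(f, n, k, 0.5)                   # I_k(f) at p = 1/2

def shapley_shubik(f, n, k, grid=200):
    """phi_k(f) = integral of I_k^p(f) dp, approximated by a midpoint rule."""
    return sum(influence(f, n, k, (j + 0.5) / grid) for j in range(grid)) / grid

# Example: weighted majority of 4 voters with weights 3,1,1,1 and quota 4.
f = lambda x: int(3 * x[0] + x[1] + x[2] + x[3] >= 4)
print([round(banzhaf(f, 4, k), 3) for k in range(4)])        # ~ [0.875, 0.125, 0.125, 0.125]
print([round(shapley_shubik(f, 4, k), 3) for k in range(4)])  # ~ [0.75, 0.083, 0.083, 0.083]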

2.3 Russo’s lemma and threshold intervals

A Boolean function f is monotone if its value does not decrease when we flip the value of any variable from 0 to 1. For a monotone Boolean function f on Ω_n, let µ_p(f) be the probability that f(x_1, . . . , x_n) = 1 with respect to the product measure µ_p. Note that µ_p(f) is a monotone function of p.

Russo's fundamental lemma (see [?]) asserts that

dµ_p(f)/dp = I^p(f).   (8)

Given a small real number ε > 0, let p_1 be the unique real number in [0, 1] such that µ_{p_1}(f) = ε and let p_2 be the unique real number such that µ_{p_2}(f) = 1 − ε. The interval [p_1, p_2] is called a threshold interval and its length p_2 − p_1 is denoted by t_ε(f). Denote by p_c the value such that µ_{p_c}(f) = 1/2, and call it the critical probability for the event A. By Russo's lemma, a large total influence around the critical probability implies a short threshold interval.

Remark: Let us now exhibit the notions introduced here using a simple example that will serve to demonstrate several issues throughout the paper. Let M_3 denote the majority function on three variables. Thus, M_3(x_1, x_2, x_3) = 1 if x_1 + x_2 + x_3 ≥ 2 and M_3(x_1, x_2, x_3) = 0 otherwise. Clearly, µ(M_3) = 1/2.

This follows from the fact that M_3 is an odd Boolean function, namely it satisfies the relation

f(1 − x_1, 1 − x_2, . . . , 1 − x_n) = 1 − f(x_1, x_2, . . . , x_n).   (9)

A simple calculation gives

µ_p(M_3) = p^3 + 3p^2(1 − p).   (10)

As for the influences of the variables we obtain: I_k(M_3) = 1/2 and I_k^p(M_3) = 2p(1 − p)^2 + 2p^2(1 − p) for k = 1, 2, 3. Therefore I(M_3) = 3/2 and I^p(M_3) = 6p(1 − p). This last expression is indeed equal to dµ_p(M_3)/dp.
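This example lends itself to a quick numerical sanity check of Russo's lemma. The short sketch below (Python; the names are ours) compares a centered finite-difference derivative of relation (10) with I^p(M_3) = 6p(1 − p); it is merely a consistency check, not a proof.

# Sketch (Python): numerical check of Russo's lemma for the 3-variable majority.
def mu_p_M3(p):
    return p**3 + 3 * p**2 * (1 - p)        # relation (10)

h = 1e-5
for p in (0.2, 0.5, 0.8):
    derivative = (mu_p_M3(p + h) - mu_p_M3(p - h)) / (2 * h)
    print(p, round(derivative, 6), round(6 * p * (1 - p), 6))   # the two agree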

3 Basic results concerning influences and threshold behavior of Boolean functions

Dictatorships and juntas have a coarse threshold, and when the critical probability is 1/2, a coarse threshold implies that the function “looks” like a junta.

Some basic facts on influences and the corresponding results concerning threshold intervals are given by:

1. The sum of influences cannot be overly small.

Theorem 3.1 For a Boolean function f,

Σ_{k=1}^{n} I_k(f) ≥ 2µ(f) log_2(1/µ(f)).   (11)

In particular, if µ_{1/2}(f) = 1/2 then I(f) ≥ 1, and equality holds if and only if f is a “dictatorship”, namely f(x_1, . . . , x_n) = x_i for some i, or an “antidictatorship”, f(x_1, . . . , x_n) = 1 − x_i for some i. Inequality (11) has its origins in the works of Whitney and Loomis, Harper, Bernstein, Hart and others. It has great importance in many mathematical contexts. It can be regarded as an isoperimetric relation for subsets of the discrete cube, which is analogous to the famous Euclidean isoperimetric relations, and this analogy goes a long way. We will come back to it in Section 5.4. An upper bound for the length of the threshold interval can be derived from the bounds on the sum of influences combined with Russo's lemma.

Theorem 3.2 (Bollobás and Thomason) For every monotone Boolean function,

t_ε(f) = O(min(p_c, 1 − p_c)).   (12)

Two quick remarks are in order. We first note that for a function f(x_1, x_2, . . . , x_n) we can consider the “dual” function defined by

g(x_1, x_2, . . . , x_n) = f(1 − x_1, 1 − x_2, . . . , 1 − x_n).   (13)

Then it is easily seen that

µ_p(g) = 1 − µ_{1−p}(f).   (14)

Because of this duality we do not lose generality if we always consider the case p_c(f) ≤ 1/2, and this can simplify several of the statements below. Second, we note that another way to state the Bollobás-Thomason result is that for every ε there is c(ε) so that t_ε(f)/p_c(f) ≤ c(ε) for every Boolean function f. The theorem of Bollobás and Thomason is the basis for the following definition: We say that a sequence (f_n) of Boolean functions has a sharp threshold if, for every ε > 0,

t_ε(f_n) = o(min(p_c, 1 − p_c)).   (15)

Otherwise, we say that the sequence demonstrates a coarse threshold behavior.

When the critical probabilities for the functions f_n are bounded away from 0 and from 1, having a sharp threshold simply means that for every ε > 0, t_ε(f_n) = o(1).
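The following sketch (Python; names ours) illustrates the definition numerically: it computes the threshold interval [p_1, p_2] with µ_{p_1} = ε and µ_{p_2} = 1 − ε for simple majority, where t_ε shrinks as n grows, and for a dictatorship, whose threshold is coarse and whose interval stays of constant length 1 − 2ε.

# Sketch (Python): threshold interval lengths for majority versus dictatorship.
from math import comb

def mu_majority(n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

def solve_p(mu, target, lo=0.0, hi=1.0, iters=60):
    """Find p with mu(p) = target by bisection (mu is monotone in p)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mu(mid) < target else (lo, mid)
    return (lo + hi) / 2

eps = 0.05
for n in (11, 101, 1001):
    f = lambda p: mu_majority(n, p)
    t = solve_p(f, 1 - eps) - solve_p(f, eps)
    print(f"majority, n={n:5d}: t_eps = {t:.4f}")          # shrinks with n
dictator = lambda p: p                                      # mu_p(x -> x_1) = p
print("dictatorship: t_eps =", solve_p(dictator, 1 - eps) - solve_p(dictator, eps))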

2. Simple majority maximizes the total influence of monotone Boolean functions.

Let n be an odd integer. Denote by M_n the simple majority function on n variables.

Proposition 3.3 Let f be a monotone Boolean function with n variables, n odd, with p_c(f) = 1/2. Then for every p, 0 < p < 1,

I^p(f) ≤ I^p(M_n).   (16)

By Russo’s lemma it follows that:

Proposition 3.4 Let f be a monotone Boolean function with n variables (n an odd integer) with p_c(f) = 1/2. Then, for every p > 1/2, µ_p(M_n) ≥ µ_p(f).

3. Not all individual influences can be small.

Theorem 3.5 (Kahn, Kalai & Linial, [?]) For some absolute constant K, for every Boolean function f,

max_k I_k(f) ≥ K min(µ(f), 1 − µ(f)) · log n / n.   (17)

Note that this theorem implies that when all individual influences are the same (e.g., when A is invariant under the induced action of a transitive permutation group on [n]), then the total influence is larger than K min(µ(f), 1 − µ(f)) · log n. An extension to arbitrary product probability spaces was found by Bourgain, Kahn, Kalai, Katznelson and Linial [?]. Talagrand [?] extended the result of Kahn, Kalai and Linial in various directions and applied these results to the study of threshold behavior. Talagrand also presented a very useful extension for arbitrary real functions on the discrete cube. Talagrand's extension for the product measure µ_p is stated as follows:

Theorem 3.6 (Talagrand, 1994)

Σ_{k=1}^{n} I_k^p(f)/log(1/I_k^p(f)) ≥ K · µ_p(f)(1 − µ_p(f)) / log(2/(p(1 − p))).   (18)

Our next result describes Boolean functions with a small total influence.

Theorem 3.7 (Friedgut) For real numbers z > 0, a > 1 and γ > 0, there is C = C(γ, a, z) such that if z ≤ p ≤ 1 − z the following assertion holds: For a monotone Boolean function f, if I^p(f) ≤ a then there exists a collection S of at most C variables and a monotone Boolean function g that depends only on the variables in S such that

µ_p{x ∈ Ω_n : f(x) ≠ g(x)} < γ.   (19)

Theorem 3.8 (Russo-Talagrand-Friedgut-Kalai) For every ε > 0, 1/2 ≥ t > 0 and γ > 0 there are δ_i = δ_i(γ, ε, t) > 0, i = 1, 2, 3, such that the following assertion holds:

Let f be a Boolean function and suppose that t ≤ p_c(f) ≤ 1 − t. Then any of the following conditions implies that

t_ε(f) < γ.

(1) For every value of p, 0 < p < 1, and every k, 1 ≤ k ≤ n, I_k^p(f) ≤ δ_1.

(2) For some value of p for which ε < µ_p(f) < 1 − ε (e.g., for p = p_c(f)), I_k^p(f) < δ_2 for every k.

(3) The Shapley-Shubik power index of each variable of f is at most δ_3.

The first part of the theorem was proved by Russo [?]. A sharp version was proved by Talagrand [?] and by Friedgut and Kalai [?], based on the KKL theorem and its extensions. Theorem 3.7 asserts that if the critical probability is bounded away from 0 and 1 and the threshold is coarse, then for most values of p in the threshold interval f can be approximated by a junta with respect to the probability measure µ_p. Parts (2) and (3) are derived (based on Friedgut's result and some additional observations) in [?], but the values of δ_2, δ_3 are rather weak (doubly logarithmic in γ) and it will be interesting to find better bounds. The third property in the theorem above is, in fact, a characterization:

Theorem 3.9 For a sequence (f_n) of monotone Boolean functions,

lim_{n→∞} t_ε(f_n) = 0 for every ε > 0

if and only if the maximal Shapley-Shubik power index of f_n tends to zero.

4 Fourier Analysis of Boolean functions

We describe some basic properties of Fourier analysis of Boolean functions.

In this section we describe an important mathematical tool in the study of threshold phenomena and in various related areas. For reading most of the remaining sections the material described in this section is not needed, so the reader can safely skip this section. But since it is central to most of the mathematical results presented in this paper, we feel it is important to get some familiarity with this topic at this early stage.

4.1 All the way to Parseval

It is surprising how much you can get from the simple base change of the Fourier-Walsh transform together with the very elementary Parseval relation.

Let Ω_n denote the set of 0-1 vectors (x_1, . . . , x_n) of length n. Let L_2(Ω_n) denote the space of real functions on Ω_n, endowed with the inner product

⟨f, g⟩ = 2^{−n} Σ_{(x_1,...,x_n)∈Ω_n} f(x_1, . . . , x_n)g(x_1, . . . , x_n).   (20)

The inner product space L_2(Ω_n) is 2^n-dimensional. The L_2-norm of f is defined by

‖f‖_2^2 = ⟨f, f⟩ = 2^{−n} Σ_{(x_1,...,x_n)∈Ω_n} f^2(x_1, x_2, . . . , x_n).   (21)

Note that if f is a Boolean function, then f^2(x) is either 0 or 1, and therefore ‖f‖_2^2 = Σ_{(x_1,...,x_n)∈Ω_n} 2^{−n} f^2(x) is simply the probability that f = 1 (with respect to the uniform probability distribution on Ω_n). If the Boolean function f is odd (i.e., it satisfies relation (9)) then ‖f‖_2^2 = 1/2. For a subset S of [n] consider the function

u_S(x_1, x_2, . . . , x_n) = (−1)^{Σ{x_i : i∈S}}.   (22)

It is not difficult to verify that the 2^n functions u_S, for all subsets S, form an orthonormal basis for the space of real functions on Ω_n.

For a function f ∈ L_2(Ω_n), let

f̂(S) = ⟨f, u_S⟩.   (23)

(f̂(S) is called a Fourier-Walsh coefficient of f.) Since the functions u_S form an orthonormal basis it follows that

⟨f, g⟩ = Σ_{S⊂[n]} f̂(S)ĝ(S).   (24)

In particular,

‖f‖_2^2 = Σ_{S⊂[n]} f̂^2(S).   (25)

This last relation is called Parseval's formula.

Remark: To demonstrate the notions introduced here we return to our example. Let M_3 denote the majority function on three variables. The Fourier coefficients of M_3 are easy to compute: M̂_3(∅) = (1/8) Σ_x M_3(x) = 1/2. In general, if f is a Boolean function then f̂(∅) is the probability that f(x) = 1, and when f is an odd Boolean function, f̂(∅) = 1/2. Next, M̂_3({1}) = (1/8)(M_3(0, 1, 1) − M_3(1, 0, 1) − M_3(1, 1, 0) − M_3(1, 1, 1)) = (1 − 3)/8, and thus M̂_3({j}) = −1/4 for j = 1, 2, 3. Next, M̂_3(S) = 0 when |S| = 2, and finally M̂_3({1, 2, 3}) = (1/8)(M_3(1, 1, 0) + M_3(1, 0, 1) + M_3(0, 1, 1) − M_3(1, 1, 1)) = 1/4.
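A brute-force computation such as the following sketch (Python; names ours) reproduces these coefficients from definition (23) and checks Parseval's formula (25) for M_3.

# Sketch (Python): Fourier-Walsh coefficients (23) of the 3-variable majority
# and a check of Parseval's formula (25).
from itertools import product, combinations

n = 3
maj3 = lambda x: int(sum(x) >= 2)
cube = list(product((0, 1), repeat=n))

def u(S, x):                       # u_S(x) = (-1)^{sum of x_i, i in S}
    return (-1) ** sum(x[i] for i in S)

def fourier_coeff(f, S):           # \hat f(S) = <f, u_S> under the uniform measure
    return sum(f(x) * u(S, x) for x in cube) / 2**n

coeffs = {S: fourier_coeff(maj3, S)
          for r in range(n + 1) for S in combinations(range(n), r)}
print(coeffs)                                         # 1/2, -1/4 on singletons, 0, 1/4
print(sum(c**2 for c in coeffs.values()),             # Parseval: sum of squares ...
      sum(maj3(x) for x in cube) / 2**n)              # ... equals mu(M_3) = 1/2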

4.2 The relation with influences

The relation between influences and Fourier coefficients is given by the following relations, whose proof is elementary:

I_k(f) = 4 · Σ{f̂^2(S) : k ∈ S},   (26)

I(f) = 4 · Σ_{S⊂[n]} f̂^2(S)|S|.   (27)

If f is monotone we also have I_k(f) = −2f̂({k}). To practice these notions, consider a Boolean function f with µ(f) = ‖f‖_2^2 = 1/2. Then f̂(∅) = 1/2 and Parseval's relation asserts that Σ_{S⊂[n]} f̂^2(S) = 1/2, so Σ_{S⊂[n], S≠∅} f̂^2(S) = 1/4 and I(f) = 4 · Σ_{S⊂[n]} f̂^2(S)|S| ≥ 1. This is an important special case of the edge-isoperimetric inequality, Theorem 3.1.

Remark: Indeed, for our example M_3 we have

3/2 = I(M_3) = 4 · Σ_S M̂_3^2(S)|S| = 4(3 · (1/16) · 1 + (1/16) · 3).

4.3 Bernoulli measures

When we consider the probability distribution µ_p, we have to define the inner product by

⟨f, g⟩ = Σ_{(x_1,...,x_n)∈Ω_n} f(x_1, . . . , x_n)g(x_1, . . . , x_n)µ_p(x_1, . . . , x_n).   (28)

We need an appropriate generalization of the Walsh-Fourier orthonormal basis for general Bernoulli probability measures µ_p. It is given by

u_S^p(x_1, x_2, . . . , x_n) = (−√((1 − p)/p))^{Σ{x_i : i∈S}} · (√(p/(1 − p)))^{|S| − Σ{x_i : i∈S}}.   (29)

Let p be a fixed real number, 0 < p < 1. Every real function f on Ω_n can be expanded as

f = Σ_{S⊂[n]} f̂(S; p) u_S^p,    where    f̂(S; p) = Σ_{x∈Ω_n} f(x) u_S^p(x) µ_p(x).

The relations with influences also extend as follows:

p(1 − p) · I_k^p(f) = Σ{f̂^2(S; p) : k ∈ S},   (30)

I^p(f) = (1/(p(1 − p))) · Σ_{S⊂[n]} f̂^2(S; p)|S|.   (31)

Let us also introduce the notation

W_k^p(f) = Σ{f̂^2(S; p) : |S| = k}.   (32)

Exercise: Compute the coefficients M̂_3(S; p) and verify relation (31) for the case of M_3.
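A possible computational route to this exercise is sketched below (Python; the names and the sample value p = 0.3 are ours): compute the biased coefficients M̂_3(S; p) numerically from the basis (29) and compare relation (31) with I^p(M_3) = 6p(1 − p).

# Sketch (Python): the exercise above, done numerically.
from itertools import product, combinations
from math import sqrt

n, p = 3, 0.3
maj3 = lambda x: int(sum(x) >= 2)
cube = list(product((0, 1), repeat=n))
mu = lambda x: p**sum(x) * (1 - p)**(n - sum(x))

def u_p(S, x):                                  # biased character u_S^p, relation (29)
    ones = sum(x[i] for i in S)
    return (-sqrt((1 - p) / p))**ones * sqrt(p / (1 - p))**(len(S) - ones)

def coeff(S):                                   # \hat f(S; p) = sum_x f(x) u_S^p(x) mu_p(x)
    return sum(maj3(x) * u_p(S, x) * mu(x) for x in cube)

total = sum(coeff(S)**2 * r
            for r in range(n + 1) for S in combinations(range(n), r))
print(total / (p * (1 - p)))                    # relation (31): should equal I^p(M_3)
print(6 * p * (1 - p))                          # = 1.26 for p = 0.3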

4.4 The Bonami-Beckner inequality

We present a fundamental non-elementary inequality. There are many ways to look at this inequality, but its remarkable effectiveness is mysterious.

For a real function f : Ω_n → R, f = Σ_S f̂(S)u_S, define the p-norm of a function f to be

‖f‖_p = (2^{−n} Σ_{x∈Ω_n} |f(x)|^p)^{1/p}.   (33)

Next define the following operator:

T_ρ(f) = Σ_S f̂(S) ρ^{|S|} u_S.   (34)

The Bonami-Beckner inequality asserts that for every real function f on Ω_n,

‖T_ρ(f)‖_2 ≤ ‖f‖_{1+ρ^2}.   (35)

Here is a quick and sketchy argument which gives a taste of the use of the Bonami-Beckner inequality. Note that for a Boolean function f and every t ≥ 1,

‖f‖_t^t = µ(f).   (36)

Let 0 < ρ < 1 and t = 1 + ρ^2. If a large portion of the L_2-norm of f is concentrated on “low frequencies” then ‖T_ρ(f)‖_2 will actually be close to ‖f‖_2. The Bonami-Beckner inequality implies that in this case ‖f‖_2 is close to ‖f‖_t. (Since t < 2, ‖f‖_t ≤ ‖f‖_2.) This fact cannot coexist with relation (36) if µ(f) is small. More formally, it can be derived from the Bonami-Beckner inequality, say for ρ = 1/2, that for some constant K,

Σ_{0 < |S| < log(1/µ(f))/K} f̂^2(S) ≤ K √(µ(f)) · µ(f)(1 − µ(f)).

This implies that most of the L_2-norm of f is concentrated on Fourier coefficients f̂(S) with |S| ≥ K′ log(1/µ(f)). Together with relation (27) we can conclude that I(f) ≥ K″ µ(f)(1 − µ(f)) log(1/µ(f)). This gives (apart from a multiplicative constant) the fundamental edge-isoperimetric relation (relation (11)), and while not sharp, the information on the Fourier coefficients is stronger. An extension of the Bonami-Beckner inequality for general p can be found in [?].

Remarks: 1. The Fourier transform of Boolean functions is tailor-made to deal with the total influence, which by Russo's lemma gives the “local” threshold behavior. However, to understand the behavior in the entire threshold interval, a further understanding of the relation between the behavior at different points is required. For a global understanding of influences over the entire threshold interval the quantities ∫_0^1 f̂(S; p) dp may play a role and it will be interesting to study them.

2. This section is only a taste of a rather young field, Fourier analysis of Boolean functions, which has many connections, extensions, applications and problems. We hope to be able to give a fuller treatment elsewhere.
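The following small sketch (Python; names and the choice of a random test function are ours) illustrates the operator T_ρ of relation (34) and checks inequality (35) numerically at p = 1/2 on a randomly chosen Boolean function on a small cube; it is, of course, only a consistency check and not evidence of the inequality.

# Sketch (Python): the noise operator T_rho of (34) and a numerical check of
# the Bonami-Beckner inequality (35), ||T_rho f||_2 <= ||f||_{1+rho^2}.
import random
from itertools import product, combinations

n, rho = 4, 0.5
cube = list(product((0, 1), repeat=n))
random.seed(0)
f = {x: random.randint(0, 1) for x in cube}          # a random Boolean function

subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
u = lambda S, x: (-1) ** sum(x[i] for i in S)
fhat = {S: sum(f[x] * u(S, x) for x in cube) / 2**n for S in subsets}

def norm(g, t):                                      # normalized t-norm on the cube
    return (sum(abs(g[x]) ** t for x in cube) / 2**n) ** (1 / t)

Tf = {x: sum(fhat[S] * rho ** len(S) * u(S, x) for S in subsets) for x in cube}
print(norm(Tf, 2), "<=", norm({x: float(f[x]) for x in cube}, 1 + rho**2))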

5 From Erdős and Rényi to Friedgut: random graphs and the k-SAT problem

5.1 Graph properties and Boolean functions

We first explain how to represent a graph property by a Boolean function.

Another origin of the study of threshold phenomena in mathematics is random graph theory, and especially the seminal works of Erdős and Rényi [?]. Consider a graph G = ⟨V, E⟩, where V is the set of vertices and E is the set of edges. Let {x_e : e ∈ E} be Boolean variables corresponding to the edges of G. An assignment of the values 0 and 1 to the variables x_e corresponds to a subgraph H of G: namely, H = ⟨V, E′⟩ and e ∈ E′ if and only if x_e = 1. We will mostly consider the case where G is the complete graph, namely E = \binom{V}{2}. This basic Boolean representation of subgraphs (or substructures for other structures) is very important.

A graph property P is a property of graphs which does not depend on the labeling of the vertices. In other words, P depends only on the isomorphism type of the graph. The property is monotone if, whenever a graph H satisfies it, every graph on the same vertex set obtained from H by adding edges also satisfies the property. Examples include: “the graph is connected”, “the graph is planar (can be drawn in the plane without crossings)”, “the graph contains a triangle”, “the graph contains a Hamiltonian

cycle”. Understanding the threshold behavior of monotone graph properties for random graphs was the main motivation behind the theorem of Bollobás and Thomason (Theorem 3.2). Their result applies to arbitrary monotone Boolean functions, so it does not use the symmetry that we have for graph properties.

Theorem 5.1 (Friedgut and Kalai) For every monotone property P of graphs,

t_ε(P) ≤ C log(1/ε)/log n.   (37)

Theorem 5.1 is a simple consequence of the KKL theorem and its extensions, combined with Russo's lemma. The crucial observation is that all influences of variables are equal for Boolean functions defined by graph properties. As a matter of fact, this continues to be true for Boolean functions f that describe random subgraphs of an arbitrary edge-transitive graph (a graph is edge-transitive if for every two edges e and e′ there is an automorphism of the graph that maps e to e′). All influences being equal implies that the total influence I^p is at least as large as K min(µ_p(f), 1 − µ_p(f)) log n, and by Russo's lemma this gives the required result. Theorem 5.1 answers a question raised by Nati Linial, who also asked about possible applications to the problem of random k-CNF Boolean formulas that we will discuss later. In [?] Friedgut and Kalai raised several questions which were addressed in later works:

• What is the relation between the group of symmetries of a Boolean function and its threshold behavior?

• What would guarantee a sharp threshold when p tends to zero with n?

• What is the relation between influences, the threshold behavior and other isoperimetric properties of f?

The third question was addressed in several papers of Talagrand [?, ?] and also in [?], and we will not elaborate on it here. We will describe in some detail the work of Bourgain and Kalai on the first question and the works of Friedgut and Bourgain on the second.

There is one comment we would like to make at this point. When we consider the Fourier coefficients f̂(S) of a Boolean function that represents a graph property, the set S, which can be regarded as a subset of the variables, also represents a graph. Being a graph property implies large symmetry for the original Boolean function: it is invariant under permutations of the variables that correspond to permutations of the vertices of the graph. The same is true for the Fourier coefficients: the Fourier coefficient f̂(S) depends only on the isomorphism type of the graph described by the set S. This is a crucial observation for the results that follow.

5.2 Threshold under symmetry

Here we describe a measure of symmetry which is related to the threshold behavior. The more symmetry we have, the sharper the threshold behavior we must observe. The measure of symmetry is based on the size of orbits.

Bourgain and Kalai studied the effect of symmetry on the threshold interval and their work has led to the following result:

Theorem 5.2 (Bourgain and Kalai) For every monotone property P of graphs with m vertices, and every τ > 0,

t_ε(P) ≤ C(τ) · log(1/ε)/(log m)^{2−τ}.   (38)

It is conjectured that the theorem continues to hold for τ = 0.

Let Γ be a group of permutations of [n]; thus Γ is a subgroup of the group of all n! permutations of [n]. The group Γ acts on Ω_n as follows: π(x_1, x_2, . . . , x_n) = (x_{π(1)}, x_{π(2)}, . . . , x_{π(n)}), for π ∈ Γ. A Boolean function is Γ-invariant if f(π(x)) = f(x) for every x ∈ Ω_n and every π ∈ Γ. We would like to understand influences and threshold behavior of Boolean functions which are Γ-invariant.

Returning to graph properties, recall that a graph property for graphs with m vertices is described by a Boolean function on n = \binom{m}{2} variables. Such Boolean functions are invariant under the induced action of the symmetric group S_m on the vertices (the group of all permutations of the vertices) acting on all the edges. (Recall that the variables of f correspond to the edges of the complete graph on m vertices.) In the previous section we used this symmetry to argue that all individual influences are the same. Here we would like to exploit further the specific symmetry in the situation at hand.

Let Γ be a group of permutations of [n]. We will now describe certain parameters of Γ which depend on the size of orbits in the action of Γ on subsets of [n]. The discrete hypercube Ω_n is divided into layers: write Ω_n^r for the vectors in Ω_n with r 1's altogether. For a group Γ of permutations of [n], let T(r) denote the number of orbits in the induced action of Γ on Ω_n^r and let B(r) be the smallest size of an orbit of Γ acting on Ω_n^r. For graph properties, T(r) is the number of isomorphism types of graphs with m vertices and r edges, and B(r) is the minimum number of (labeled) graphs with m vertices and r edges that are isomorphic to a specific graph H. Recall that the number of graphs isomorphic to H is m!/|Aut(H)|, where Aut(H) denotes the automorphism group of H. The parameter κ(Γ) that “measures” how “big” the group of symmetries is, is defined as follows:

κ(Γ) = min{r : B(r) < 2^r}.   (39)

When we consider graph properties for graphs with m vertices, B(r) behaves like m^{√r}. To see this, note that when r = \binom{s}{2} the graphs H with the fewest isomorphic copies (hence the largest automorphism groups) are complete graphs. Define also:

κ_τ(Γ) = min{r : B(r) < 2^{r^τ}}.   (40)

Bourgain and Kalai showed that for every τ > 0 the total influence I^p(f) of a Γ-invariant Boolean function is always at least

K(τ) κ_τ(Γ) min(µ_p(f), 1 − µ_p(f)),   (41)

where K(τ) is a positive function of τ. This theorem reduces to Theorem 5.2 when we specialize to graph properties. Bourgain and Kalai gave examples of Γ-invariant functions f_n such that µ(f_n) is bounded away from 0 and 1 and I(f_n) = Θ(κ(Γ)). Based on this result and on results concerning primitive

permutation groups (which require the classification of finite simple groups), it is possible to classify the coarsest threshold behavior of Γ-invariant Boolean functions when Γ is a primitive permutation group. It would be interesting to give sharper lower bounds for the influences and, for example, to prove a lower bound of K log^2 n · µ(f)(1 − µ(f)) for the influence of Boolean functions which describe graph properties.

5.3 Threshold behavior for small critical probabilities

Here we describe theorems by Friedgut and by Bourgain which show that when the critical probabilities are small, a coarse threshold implies that the function has “local” behavior.

In this section we state theorems by Friedgut and by Bourgain that study sharp thresholds when the critical probability p_c is small.

For a family G of graphs, let g_G be the Boolean function which describes the graph property “the graph contains a subgraph H, H ∈ G”. For a graph H, e(H) denotes the number of edges of H.

Theorem 5.3 (Friedgut) For every ε > 0, t > 0 and a > 0, there is C = C(t, a, ε) such that if f represents a graph property, t ≤ µ_p(f) ≤ 1 − t and I^p(f) < a, then the following conclusion holds: there is a family G of graphs with e(H) ≤ C for every H ∈ G such that

µ_p({x : f(x) ≠ g_G(x)}) ≤ ε.   (42)

Friedgut's proof relies on symmetry, and the statement extends to hypergraphs and similar structures. The crucial property appears to be that the number of orbits of sets of size t (namely, T(t) in the notation of the previous section) has a uniform upper bound. (For graphs this reads: for a fixed nonnegative integer t the number of isomorphism types of graphs with n vertices and t edges is uniformly bounded.) Friedgut conjectured that his theorem can be extended to arbitrary Boolean functions.

For a collection G of subsets of [n] (which we can assume is an antichain of sets, namely it does not contain two sets R and T with R ⊂ T), let g_G(x_1, x_2, . . . , x_n) be defined as follows: g_G(x_1, x_2, . . . , x_n) = 1 if and only if for some S ∈ G, x_i = 1 for every i ∈ S. (The sets in G are called min-terms of the function g_G. Of course, every monotone Boolean function can be represented in such a way.)

Conjecture 1 (Friedgut) For every ε > 0, t > 0 and a > 0, there is C = C(t, a, ε) such that for every Boolean function f with t ≤ µ_p(f) ≤ 1 − t and I^p(f) < a, there is a family G of subsets of [n] such that: (1) |S| ≤ C for every S ∈ G, and (2)

µ_p({x : f(x) ≠ g_G(x)}) ≤ ε.

In words, Friedgut’s conjecture asserts that a Boolean function with low influence can be approximated by a Boolean function with small min-terms.

A theorem of Bourgain towards this conjecture, which is very useful for applications, is:

Theorem 5.4 (Bourgain) For every t > 0 and a > 0, there is δ = δ(t, a) > 0 such that for every Boolean function f with t ≤ µ_p(f) ≤ 1 − t and I^p(f) < a · min(µ_p(f), 1 − µ_p(f)), there is a set S of variables, |S| < 10a, such that

µ_p(f | x_i = 1 for all i ∈ S) ≥ (1 + δ) · µ_p(f).

Both Friedgut's and Bourgain's theorems are very useful for proving sharp threshold behavior in many cases. We will mention one example that was studied in Friedgut's original paper. We refer the reader to Friedgut's recent survey article [?] for many other examples.

The 3-SAT problem. Consider n Boolean variables x_1, . . . , x_n. A “literal” z_i is either x_i or ¬x_i. A clause c is an expression of the form (z_i ∨ z_j ∨ z_k), where 1 ≤ i < j < k ≤ n, and a 3-CNF formula with m clauses is a formula of the form c_1 ∧ c_2 ∧ · · · ∧ c_m. A random formula of length m is obtained by choosing each c_i uniformly at random among the 8\binom{n}{3} possible clauses. A closely related model is obtained by choosing each one of the 8\binom{n}{3} possible clauses at random with probability p. A formula is satisfiable if we can assign truth values to the variables so that the Boolean value of the entire formula is “TRUE”. The larger m is, the harder it is to satisfy the formula. Friedgut's theorem can be used to show that there is a threshold s_3(n) such that for every ε > 0 a random formula with (s_3(n) + ε)n clauses is satisfiable with probability tending to 0, while a random formula with (s_3(n) − ε)n clauses is satisfiable with probability tending to 1. It is still an outstanding problem to show that s_3(n) can be replaced by a constant s_3, namely that the location of the critical probability does not oscillate. For recent advances concerning the location of the critical value for the k-SAT problem, heuristic algorithms for satisfiability, and a detailed study of 2-SAT, see [?, ?, ?, ?, ?].
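The following simulation sketch (Python with numpy; the names, the chosen sizes and the densities are ours) estimates the probability that a random 3-CNF formula is satisfiable as a function of the clause density m/n, by brute force over all assignments. For such small n the transition near density ≈ 4.27 is visible but far from the asymptotic sharpness discussed above.

# Sketch (Python/numpy): empirical satisfiability probability of a random
# 3-CNF formula as a function of the clause density m/n (brute force, small n).
import numpy as np

rng = np.random.default_rng(0)
n = 14
assignments = np.array(np.meshgrid(*[[0, 1]] * n)).reshape(n, -1).T.astype(bool)

def random_clause():
    vars_ = rng.choice(n, size=3, replace=False)
    signs = rng.integers(0, 2, size=3).astype(bool)   # True means the literal is negated
    return vars_, signs

def satisfiable(m):
    ok = np.ones(assignments.shape[0], dtype=bool)
    for _ in range(m):
        v, s = random_clause()
        clause = (assignments[:, v] ^ s).any(axis=1)  # clause value on every assignment
        ok &= clause
        if not ok.any():
            return False
    return True

for density in (3.0, 3.5, 4.0, 4.27, 4.5, 5.0):
    trials = 20
    frac = sum(satisfiable(int(density * n)) for _ in range(trials)) / trials
    print(f"m/n = {density:.2f}: P[satisfiable] ~ {frac:.2f}")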

5.4 Margulis’ theorem

Here we describe a theorem of Margulis which gives another general method for proving sharp threshold behavior.

Margulis [?] found in 1974 a remarkable condition that guarantees a sharp threshold for Boolean functions and applied it to random subgraphs of highly connected graphs. (This paper also contains an earlier proof of Russo's lemma.) An improvement of his theorem was found by Talagrand [?]. Let f be a Boolean function. For x ∈ Ω_n let

h(x) = |{y ∈ Ω_n : d(x, y) = 1, f(y) ≠ f(x)}|.   (43)

Thus, h(x) counts the number of neighbors of x at which the value of f changes. Note that

I^p(f) = Σ_{x∈Ω_n} h(x) µ_p(x).

Define also h_+(x) = h(x) if f(x) = 1 and h_+(x) = 0 if f(x) = 0, and note that for monotone f,

p · I^p(f) = Σ_{x∈Ω_n} h_+(x) µ_p(x).

Theorem 5.5 (Talagrand)

Σ_{x∈Ω_n} √(h_+(x)) µ_p(x) ≥ µ_p(f)(1 − µ_p(f)) · √2 · min(p, 1 − p)/√(p(1 − p)).   (44)

Let f = χ_A be a Boolean function and suppose (for simplicity) that p_c(f) is bounded away from 0 and 1. Suppose also that when f(x) = 1, if h(x) > 0 then h(x) ≥ k. (Alternatively, we can make the same assumption for the case f(x) = 0.) Define the vertex boundary measure of f (or A) by

∂_v^p(f) = Σ{µ_p(x) : x ∈ Ω_n, f(x) = 1, h(x) > 0}.   (45)

It follows from this property of f that I^p(f) ≥ k · ∂_v^p(f), and from relation (44) we conclude that I^p(f) ≥ C√k. By Russo's lemma the length of the threshold interval is O(1/√k).

Here is Margulis' original application. Let G be a k-connected graph and consider a random spanning subgraph H where an edge of G is taken to be an edge of H with probability p. We assume that G has n edges, and let f be the Boolean function that represents the property “H is connected.” Margulis proved that the threshold interval for f is of length O(1/√k). The reason is that if H is a spanning subgraph of G, H is not connected, and it is possible to make H connected by adding a single edge of G, then H must have precisely two connected components, and since G is k-connected there are at least k edges in G∖H such that adding any one of them to H yields a connected graph.

5.5 Further connections and problems

1. The giant component. Both Talagrand's strengthening of Margulis' theorem and Friedgut's theorem give the sharp threshold of graph connectivity as a special case. This is nice, but a serious critique would be that the really interesting phase transition concerning the property of connectivity occurs earlier, when p is around 1/n. The value 1/n is the critical probability for the emergence of the “giant component”, see [?]. It would be desirable to understand even the basic facts concerning the giant component in the context of general threshold phenomena, discrete isoperimetry and Fourier analysis.

2. We have been talking about a monotone graph property, or more generally a monotone Boolean function, and varied the parameter p. A different scenario would be to consider a parameter of graphs, or a function defined on the discrete cube, and study its distribution for a fixed p. We can consider, for example, the chromatic number, the clique number, the size of the maximal component, etc. The probabilistic properties of monotone functions on the discrete cube, and especially those which come from interesting graph parameters, are of great interest. Discrete isoperimetric relations play a central role in this study, but direct relations with threshold results and with Fourier analysis are sparse.

3. We should also consider non-monotone properties. A property of graphs (on n vertices) described by a Boolean function f is hereditary if there is a collection H of graphs such that f = 1 if the graph contains a graph H from H as an induced subgraph. Alon and Kalai asked for which hereditary properties it is the case that the measure of the set of p's for which ε < µ_p(f) < 1 − ε tends to 0 as n tends to infinity. Perhaps this is true for all hereditary properties? Of course, monotone properties are hereditary.

4. Another critique would be that we concentrate on the threshold behavior, which is a secondary problem, while neglecting the primary problem: finding the location of the critical probability. Indeed, finding the critical probability for particular random graph properties is a large and beautiful field. We comment that there are (still few) cases where knowing that the threshold is sharp helps in estimating its location, since it is enough to show that the property is satisfied with a certain small, but bounded away from zero, probability. The analogy with physical models suggests that the threshold behavior (like certain critical exponents for models of statistical physics) may exhibit more “universal” behavior than the location of the critical probability. Finally, a recent work of Kahn and Kalai [?] suggests that for a large class of problems, good estimates on the location of critical probabilities can follow from understanding the behavior of the function t_ε(f) when ε itself is a function that tends to zero with n. And such an understanding can be derived from some conjectures about influences of Boolean functions when µ_p(f) tends to zero with n.

6 Threshold behavior and complexity

In this section we will discuss two areas where threshold phenomena and complexity theory are related. First we will describe results concerning bounded-depth circuits, a very basic notion in computational complexity. Second we will describe the connection to the area of “hardness of approximation”.

6.1 Bounded-depth Boolean circuits

Boolean functions which belong to AC^0 - a very low complexity class (and a very exciting one nevertheless) - must have a rather coarse threshold behavior.

An important complexity class of Boolean functions, AC^0, consists of those functions which can be expressed by Boolean circuits of polynomial size (in the number of variables) and bounded depth. A Boolean circuit is an acyclic directed graph with 2n sources, each corresponding to a variable x_i or its negation ¬x_i, and one sink, which is the output of the computation. The intermediate vertices are called gates and can represent the Boolean operations AND and OR. The depth of a Boolean circuit is the maximum length of a directed path. Boppana [?] proved that if a Boolean function f is expressed by a depth-c circuit of size N then

I(f) ≤ C_1 log^{c−1} N.   (46)

Earlier, Linial, Mansour and Nisan [?] proved that for Boolean functions which can be expressed by Boolean circuits of polynomial (or quasi-polynomial) size and bounded depth, the quantities W_k(f) decay exponentially with k once k is larger than poly-logarithmic in the number of variables. A more precise result was recently given by Håstad [?]. Both these results rely on the fundamental Håstad Switching Lemma, see [?, ?]. It appears that all these results and their proofs apply to the product measures µ_p when p is bounded away from 0 and 1.

Remark: A monotone circuit is one where all the gates are monotone increasing in the inputs; i.e., there are no “not” gates. The Håstad lemma

for monotone Boolean circuits is easier and was proved already by Boppana [?].

It can be conjectured that the only reason for small influences (hence for coarse threshold behavior) comes from small bounded-depth circuits. (Here, “small” means a slowly growing function of n.) The following conjecture (formulated in the boldest possible form) is in this direction:

Conjecture 2 (Reverse Håstad) For every ε > 0 there is a K = K(ε) > 0 satisfying the following. For every monotone Boolean function f there is another function g such that µ{x : f(x) ≠ g(x)} < ε and g can be expressed by a Boolean circuit such that

(log N)^{c−1} < K · I(f),

where c and N are the depth and size of the circuit, respectively.

Remarks: 1. A large number of papers have appeared in recent years that suggest a bold and far-reaching statistical physics approach to fundamental questions in complexity. These papers suggest a new approach in which one regards classical optimization problems as zero-temperature cases of statistical physics systems. This approach further proposes that the complexity of problems is related to the type of phase transition of the physical system. In addition, statistical physics suggests both a way of thinking and heuristic mathematical machinery to deal with these problems. This approach was received with a great deal of skepticism by the complexity theory community, and indeed hard evidence for the usefulness of this approach is still missing. The results by Håstad, Linial-Mansour-Nisan and Boppana can be interpreted as going in the direction suggested by physicists, but when we deal with complexity classes beyond AC^0 caution is still advised.

2. Connections between influences and the model of decision trees can be found in [?, ?].

6.2 Hardness of approximation and PCP I

Can we approximate? Sometimes approximation is intractable and sometimes it is easy. The recent PCP theory is a powerful tool for studying approximation. Sharp threshold phenomena are technically important in proving that some approximation problems are difficult.

The PCP theorem concerns constraint satisfaction problems (sometimes referred to as Label-Cover) of various types, and is the main tool in proving NP-hardness of approximation problems. As examples, consider the following two computational problems:

Vertex-Cover: given a graph G, find the smallest set of vertices whose complement is an independent set.

Max-Cut: given a graph G, find a partition of the vertices into two sets that maximizes the number of edges between them.

Coming up with the optimal solution for these problems is known to be NP-hard [?]. The next best option is then to approximate the optimal solution. That is, in the case of Vertex-Cover, come up with an appropriate set which, while perhaps not the smallest, is guaranteed to be larger only by at most the approximation factor. Approximating the Max-Cut problem would require coming up with a partition that may not maximize the size of the cut but is guaranteed to be smaller only by some factor C, which is referred to as the approximation factor of the algorithm.

Proving that such problems are NP-hard requires extending the Cook-Levin [?, ?] characterization of NP, which in simple terms states that SAT is NP-complete. One has to show that even approximating SAT is NP-hard, in the following sense. A constraint satisfaction problem (CSP) assumes a set of variables and constraints over the assignments to those variables, as follows. Let X and Y be two sets of (not necessarily Boolean) variables, whose ranges are R_X and R_Y respectively. The constraints imposed on the variables are local, in the sense that they only involve one variable in X and one in Y. For some pairs of variables (x, y), where x ∈ X and y ∈ Y, there is a constraint φ_{x,y} specifying which values of x and y satisfy that constraint. A quite general version of the PCP theorem is as follows:

Theorem 6.1 (PCP [?, ?, ?]) Given a CSP Φ, it is NP-hard to decide whether

• There is an assignment to all X,Y that satisfies all the constraints φ ∈ Φ.

• There is no assignment that satisfies more than an ε = |R_X|^{−Ω(1)} fraction of the constraints φ ∈ Φ.

One may further assume in the above theorem that all constraints have the projection property; that is, for each constraint φ_{x,y} and every a ∈ R_X there is only one b ∈ R_Y such that a and b together satisfy φ_{x,y}.

A general scheme for proving hardness of approximation was developed in [?, ?, ?, ?, ?, ?]. Let us demonstrate this scheme on the Vertex-Cover problem from above.

First, consider the graph G_I[n], whose vertices are all of {0, 1}^n, that is, all inputs to a function over n Boolean variables, and where two vertices v and u are adjacent if there is no i ∈ [n] such that v_i = u_i = 1. This is referred to as the non-intersection graph, and it is the complement of the intersection graph, which has been investigated extensively. The largest independent set in G_I[n] is the set of vectors that all have a one at a specific index i; in other words, it is the pre-image of a dictatorship. The pre-image of the majority function is also an independent set in G_I[n], as any two vectors with more than half of their indices being 1 must have an index at which both are 1. Nevertheless, one may impose a different distribution on the vertices of G_I[n] which will overrule such examples. One can weight the vertices of G_I[n] according to µ_p for some p smaller than 1/2, and weigh an independent set as the sum of its vertices' weights. In that case the weight of a dictatorship is p, while majority's weight tends to 0 as n tends to infinity.

What about independent sets that are not as large, but still not too small? It turns out that for p < 1/2 any independent set of non-negligible weight must correspond in some sense to a junta:

Theorem 6.2 ([?]) Fix 0 < p < 1/2, γ > 0, ε > 0 and let I be an independent set in G_I[n] of maximal weight according to µ_p. For some q ∈ [p, p + γ], there exists C ⊂ [n], where |C| ≤ 2^{O(1/γ)}, such that C is an (ε, q)-core of f_I.

Now, going back to our reduction from the CSP instance Φ above to the Vertex-Cover problem, one constructs a graph G_Φ as follows. G_Φ consists of one copy of G_I[R_X] for every variable x ∈ X, and one copy of G_I[R_Y] for every variable y ∈ Y. These copies are further connected so as to verify that a large independent set corresponds to a consistent assignment to the variables in X and Y. If Φ has an assignment satisfying all constraints, then the set of vertices that correspond to the appropriate dictatorship in each copy forms an independent set in G_Φ. On the other hand, any independent set in G_Φ corresponds to juntas in many of the copies of G_I in G_Φ, which are furthermore consistent. This, assuming the independent set is not too small, allows one to design an assignment that satisfies more than an ε fraction of the constraints of Φ, which implies that Φ is completely satisfiable.

6.3 The sharp threshold between easy and hard problems

Can we approximate? Sometimes approximation is intractable, sometimes it is easy. There is often a sharp transition between the two behaviors.

In the previous section we briefly discussed PCP and indicated how technical results concerning threshold phenomena are used. There is another threshold aspect to the story. It turns out that when we try to approximate the solution of an optimization problem, there are various problems for which there is a sharp threshold between cases that are very easy to solve and cases in which the problem is NP-hard. This insight and the methodology for observing such phenomena are fairly recent, and a deeper understanding of the issues involved may lead both to improved approximation algorithms and to tighter hardness results. Harmonic analysis of Boolean functions has already proved to be a powerful tool for such considerations. Here are some results concerning the sharp transition between easy and hard computational problems:

• 1. MAX-3-LIN(2): given a set of linear equations over Z_2, satisfy as many of them as possible by assigning the variables. Satisfying half of the equations is easy, by just taking a random assignment (this “algorithm” can easily be derandomized). However, even distinguishing between 1/2 + ε and 1 − ε satisfiable instances is NP-hard.

• 2. MAX-3-SAT: a similar problem, only instead of equations you have ORs over three literals each. A 7/8 fraction of the constraints is satisfied in expectation by a random assignment (see the simulation sketch after this list), yet distinguishing between 7/8 + ε and 1 is NP-hard.

• 3. SET-COVER: given a collection of subsets of [n], find the least number of sets from the collection whose union is [n]. A ln n approximation (namely, using at most ln n times more sets than actually needed) is simple to obtain, and nothing better can be achieved unless NP is in sub-exponential time.
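As a small illustration of the “easy side” of items 1 and 2, the following sketch (Python; the names and instance sizes are ours) estimates the fraction of constraints satisfied by a uniformly random assignment on a random instance: about 1/2 of the XOR equations and about 7/8 of the 3-SAT clauses.

# Sketch (Python): a uniformly random assignment satisfies ~1/2 of random XOR
# (MAX-3-LIN(2)) constraints and ~7/8 of random 3-SAT clauses, in expectation.
import random

random.seed(1)
n, m, trials = 50, 400, 200

def random_instance():
    cons = []
    for _ in range(m):
        vars_ = random.sample(range(n), 3)
        cons.append((vars_, random.randint(0, 1), [random.randint(0, 1) for _ in range(3)]))
    return cons                                   # (variables, xor right-hand side, 3-SAT negations)

def satisfied_fractions(cons):
    x = [random.randint(0, 1) for _ in range(n)]  # a uniformly random assignment
    lin = sum((x[v[0]] ^ x[v[1]] ^ x[v[2]]) == b for v, b, _ in cons)
    sat = sum(any(x[v[i]] ^ s[i] for i in range(3)) for v, _, s in cons)
    return lin / m, sat / m

cons = random_instance()
results = [satisfied_fractions(cons) for _ in range(trials)]
print("XOR constraints satisfied:", round(sum(r[0] for r in results) / trials, 3))   # ~0.5
print("3-SAT clauses satisfied:  ", round(sum(r[1] for r in results) / trials, 3))   # ~0.875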

Khot [?] made a remarkable conjecture concerning a very strong form of PCP. Khot's conjecture would imply sharp threshold behavior for the following problems.

• MIN-2-SAT-DELETION: the instance is a set of ORs over 2 literals each. The goal, however, is to delete as few of the ORs as possible so that the remaining instance is completely satisfiable. No constant-factor approximation is possible if Khot's conjecture is true.

• VERTEX-COVER: given an undirected graph, find the minimal number of nodes that touch all edges. Covering the edges by at most twice the number of nodes needed (namely, a 2-approximation) is quite easy - for example, by taking BOTH ends of each yet-uncovered edge. Khot's conjecture implies that 2 is tight.

• MAX-CUT: find a 2-partition of the nodes of a given graph such that as many edges as possible go between the two parts. We will return to this problem in the next section.

Remark: Other interesting cases of threshold behavior in complexity theory concern fault-tolerant computation, both for classical notions of computation and for the so-called “quantum computers”. See also Aharonov [?] for a proposed sharp transition between the laws of quantum and classical physics in the context of computing.

7 Noise sensitivity

Is there a good voting method as far as immunity to random noise in counting the votes is concerned?

Motivated by mathematical physics, Benjamini, Kalai and Schramm (1999) studied (in a different language) the sensitivity of an election's outcome to low levels of noise in the signals (or, if you wish, to small errors in the counting of votes). Their assumption is that there is a probability ε > 0 of a mistake in counting each vote, and that these mistakes are independent. Simple majority tends to be quite stable in the presence of noise. Two-level majority, like the US electoral system, is less stable, and multi-tier council democracy is quite sensitive to noise. For a Boolean function f and t > 0, consider the following scenario:

First choose the voter signals x1, x2, . . . , xn randomly such that xi = 1 with probability p independently for i = 1, 2, . . . , n. Let S = f(x1, x2, . . . , xn).

Next let y_i = x_i with probability 1 − t and y_i = 1 − x_i with probability t, independently for i = 1, 2, . . . , n. Let T = f(y_1, y_2, . . . , y_n). Define C_t(f) to be the correlation between S and T. We will denote by N_t(f) the probability that T ≠ S (as some results are easier to describe using this notation). Let p, 0 < p < 1, be fixed. (The reader can safely assume that p = 1/2.)

A sequence (fk)k=1,2,... of Boolean functions such that µp(fk) is bounded away from 0 and 1 is called asymptotically noise sensitive if, for every t > 0,

\lim_{k \to \infty} C_t(f_k) = 0.    (47)

We will now define the complementary notion of noise stability. A class F of Boolean functions is uniformly noise stable if for every s > 0 there is u(s) > 0 such that Cu(s)(f) ≥ 1 − s for every f ∈ F. A basic result concerning noise sensitivity asserts that the class of simple and weighted majority functions f such that µp(f) is bounded away from 0 and 1 is uniformly noise stable. A sharp version was recently demonstrated by Peres [?]. (When the individual influences tend to 0 this is a consequence of the central limit theorem.) The main result of [?] is a sort of converse and it asserts the following:

Theorem 7.1 A sequence of Boolean functions (fn) such that µp(fn) is bounded away from 0 and 1 which is not noise sensitive must have bounded-away-from-zero correlation with some weighted majority function.

The basic relation between noise sensitivity and influences is that for a sequence (fn) of noise-sensitive monotone Boolean functions, lim I^p(fn) = ∞. Therefore, if f is noise sensitive in its threshold interval then it must have a sharp threshold behavior. On the other hand, in this case the threshold interval cannot be of length O(1/√n). For many of the results we mentioned that demonstrate sharp threshold behavior, where the conclusion was a large total influence, the proofs actually imply the stronger property of noise sensitivity. The following four remarks will perhaps further demonstrate the relevance of noise sensitivity: 1. The connection with Fourier coefficients. A simple but important result from [?] asserts

Theorem 7.2 A sequence (fn) of Boolean functions such that µ(fn) is bounded away from 0 and 1 is noise sensitive if and only if, for every k > 0,

\lim_{n \to \infty} \sum_{i=1}^{k} W_i(f_n) = 0.    (48)

Thus, f is noise sensitive if and only if most of the L2-norm of f is concentrated on "high" frequencies, and by the same token noise stability is equivalent to the assertion that most of the L2-norm of f is concentrated on "low" frequencies. 2. The noise stablest conjecture. What are the Boolean functions most stable under noise? It was conjectured by several authors that under conditions which exclude individual variables having a large influence, majority is (asymptotically) most stable to noise. This conjecture was recently proved by Mossel, O'Donnell and Oleszkiewicz (2005).

We define a sequence (fn) of Boolean functions to have a diminishing individual influence if

\lim_{n \to \infty} \max\{I_k(f_n) : 1 \le k \le n\} = 0.    (49)

Theorem 7.3 (Mossel, O'Donnell and Oleszkiewicz (2005)) For a sequence (fn) of Boolean functions with diminishing individual influence,

N_s(f_n) \ge (1 - o(1)) \cdot \frac{\arccos(1 - s)}{\pi}.    (50)

The fact that for the majority function the left-hand side asymptotically equals the right-hand side is a 19th-century result of Sheppard.

3. MAX-CUT. Khot, Kindler, Mossel and O'Donnell (2004) showed that the majority is stablest theorem (then a conjecture, which they formulated) and Khot's unique games conjecture together imply a sharp threshold for approximating MAX-CUT. The famous Goemans-Williamson algorithm based on semidefinite programming achieves the ratio α ≈ 0.878567, and Khot, Kindler, Mossel and O'Donnell showed that, assuming the majority is stablest theorem and Khot's unique games conjecture, anything better is hard. 4. Monotone threshold circuits: Threshold circuits form an important class of circuits which are more general than Boolean circuits since they allow weighted majority gates. It is not true that functions expressible by constant-depth threshold circuits have coarse threshold behavior (e.g. majority itself), but there is a far-reaching conjecture regarding their stability to noise which is analogous to the theorems of Boppana, Linial-Mansour-Nisan and Hastad mentioned in the previous section:

Conjecture 3 If a monotone Boolean function f is expressed by a monotone depth-c threshold circuit of size N then

\lim \sum \{W_k(f) : k > K \log^{c-1} N\} = 0.    (51)

(K is an absolute constant.)
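For small n the level weights W_k(f) appearing in Theorem 7.2 and in Conjecture 3 can be computed by brute force from the Fourier-Walsh expansion. The following Python sketch (our own illustration, for the uniform measure p = 1/2 and the ±1 encoding F = (−1)^f) contrasts majority, whose weight is concentrated on low levels, with parity, whose weight sits entirely at level n:

    from itertools import combinations, product

    def level_weights(f, n):
        """Return [W_0, ..., W_n] for the +-1 valued function F = (-1)**f."""
        points = list(product([0, 1], repeat=n))
        F = {x: 1 - 2 * f(x) for x in points}          # encode output 0/1 as +1/-1
        W = [0.0] * (n + 1)
        for k in range(n + 1):
            for S in combinations(range(n), k):
                # Fourier-Walsh coefficient of the character indexed by the set S
                coeff = sum(F[x] * (-1) ** sum(x[i] for i in S) for x in points) / 2 ** n
                W[k] += coeff ** 2
        return W

    majority = lambda x: 1 if 2 * sum(x) > len(x) else 0
    parity = lambda x: sum(x) % 2

    print(level_weights(majority, 5))   # most of the weight lies at levels 1 and 3
    print(level_weights(parity, 5))     # all of the weight sits at level n = 5

By Parseval's identity the weights of each function sum to 1.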

8 Percolation

We already mentioned in the introduction that the primary area where threshold behavior was studied is physics.

8.1 The crossing event for percolation

Consider the graph G of an n by n + 1 planar rectangular grid. Thus, the vertices of G are the points (i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ n + 1, and two vertices are adjacent in G if they agree in one coordinate and differ by one in the other coordinate. Questions concerning percolation in the plane (usually on the infinite grid) are very important. Russo's lemma was proved in the context of percolation, and Kesten proved a sharp threshold result en route to proving his famous result concerning critical probabilities for planar percolation. If every edge is chosen to be open with probability p, what is the probability of an open path from the left side of the rectangle to the right side? Is there a sharp threshold? We can ask, and immediately answer, the analogous question "on the torus", where we identify the left and right sides of the rectangle as well as the top and bottom sides (or even just on a cylinder, where we identify only the left and right sides). When we look for a path homotopic to the horizontal path from (0, 0) to (0, n + 1), a sharp threshold follows from Theorem ?? above. The total influence of the Boolean function f described by "left-right" percolation on the m + 1 by m grid is a basic notion in percolation theory.

It is conjectured that I(f) ≈ m^γ, where γ = 3/4 (i.e., I(f) ≈ n^{3/8}, where n is the number of variables). This conjecture was recently verified for one of the variants of planar percolation (site percolation on the triangular grid), based on the works of Smirnov, Lawler, Schramm and Werner.
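The left-right crossing event is easy to simulate. The following Python sketch (our own illustration; the grid conventions and sample sizes are arbitrary choices) estimates the crossing probability for bond percolation on a small rectangle and exhibits the rapid change near p = 1/2:

    import random

    def crosses(m, p, rng):
        """One sample of bond percolation on an m-by-(m+1) grid of vertices:
        is the left column joined to the right column by open edges?"""
        cols = m + 1
        parent = list(range(m * cols + 2))      # all vertices plus two virtual sides
        LEFT, RIGHT = m * cols, m * cols + 1

        def find(a):                            # union-find with path halving
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        def union(a, b):
            parent[find(a)] = find(b)

        def idx(i, j):
            return i * cols + j

        for i in range(m):
            union(idx(i, 0), LEFT)
            union(idx(i, cols - 1), RIGHT)
            for j in range(cols):
                if j + 1 < cols and rng.random() < p:   # open horizontal edge
                    union(idx(i, j), idx(i, j + 1))
                if i + 1 < m and rng.random() < p:      # open vertical edge
                    union(idx(i, j), idx(i + 1, j))
        return find(LEFT) == find(RIGHT)

    rng = random.Random(0)
    m, trials = 20, 400
    for p in (0.3, 0.45, 0.5, 0.55, 0.7):
        print(p, sum(crosses(m, p, rng) for _ in range(trials)) / trials)

Already for a 20-by-21 grid the estimated crossing probability climbs from near 0 to near 1 over a short interval around p = 1/2.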

Problem 4 (Basic Problem) For a Boolean function f with µ(f) bounded away from 0 and 1, find sufficient conditions to guarantee that

(1) for some α > 0, I(f) > n^α;

(2) for some β > 0, I(f) < n^{1/2 − β}.

Properties (1) and (2) for the crossing event for planar percolation were proved by Kesten. What is the reason that the total influence for percolation behaves like a power of n? We can expect that, conceptually, the reason lies in some symmetry like the one considered above in the theorem of Bourgain and Kalai. However, there are two facts we should note. The first is that the present formulation of Theorem ?? is not sufficiently strong to yield lower bounds of the form I(f_n) > n^α.

The second is that the Boolean function we described does not admit many symmetries. What it does seem to have is "approximate" symmetries. We expect that as the grid becomes finer there is some "limit object" (the scaling limit), and this reflects some approximate symmetry of our functions under continuous maps of the square to itself. Some approximate symmetry w.r.t. continuous maps is expected in any dimension. In dimension two it is expected that the limit object is symmetric w.r.t. conformal maps, and this was proved by Smirnov (for a different variant of planar percolation, site percolation on the triangular grid). Noise sensitivity for the crossing event was proved in [?], and recently Schramm and Steif [?] proved a very strong form of noise sensitivity. We will now briefly discuss several related issues: 1. First passage percolation: Let f be a Boolean function. Consider a real function g defined on the discrete cube. Let y1, y2, . . . , yn be independent, identically distributed random variables. Define

g(y_1, y_2, \dots, y_n) = \min\{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n : f(x_1, x_2, \dots, x_n) = 1\}.    (52)

Understanding the behavior of the function g is of interest in percolation theory. In this context f is the Boolean function which describes the existence of a path of open edges between two points on the grid (a small illustration appears after these remarks). Quite curiously, the same model is related to questions raised in mechanism design in economic theory. Influences, and the methods used to study them, apply very nicely to the study of first passage percolation [?]. 2. Models with dependence: One of the major research challenges is to extend the results described in this paper to models where the probability distribution is not a product distribution. Important cases are the Ising and the more general Potts and random cluster models, and also models based on random walks of various types. The random cluster model is a model of random subgraphs of a graph G with n edges that depends on a real parameter q > 0. The probability of a spanning subgraph H with k edges is proportional to p^k (1 − p)^{n−k} q^c, where c is the number of connected components of H. This model thus defines a two-parameter probability distribution on random subgraphs, and the challenge is to find a useful discrete isoperimetric theory and useful harmonic analysis for these probability distributions which will allow one to extend some of the general theorems described in this paper. 3. The Fourier coefficients. We make comments similar to those made earlier concerning Fourier analysis for graph properties.

The Fourier coefficients of the crossing (and other) events for percolation are indexed by subgraphs of the grid. The Fourier transform gives a distribution on such subgraphs which is very interesting.
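The first passage quantity g of equation (52) is easy to compute for a given weight configuration: since the weights are nonnegative, the minimum over x with f(x) = 1 is attained on a path. A minimal Python sketch (our own illustration; the grid, the corner-to-corner choice of the two points, and the uniform [0, 1] weights are arbitrary assumptions) computes it with Dijkstra's algorithm:

    import heapq, random

    def first_passage_time(m, rng):
        """Minimum total weight of a path between opposite corners of an
        m-by-m grid whose edges carry i.i.d. uniform [0, 1] weights."""
        weights = {}
        def w(a, b):                          # lazily sample the weight of edge {a, b}
            e = (min(a, b), max(a, b))
            if e not in weights:
                weights[e] = rng.random()
            return weights[e]

        start, target = (0, 0), (m - 1, m - 1)
        dist = {start: 0.0}
        heap = [(0.0, start)]
        while heap:
            d, (i, j) = heapq.heappop(heap)
            if (i, j) == target:
                return d
            if d > dist[(i, j)]:
                continue
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ni < m and 0 <= nj < m:
                    nd = d + w((i, j), (ni, nj))
                    if nd < dist.get((ni, nj), float("inf")):
                        dist[(ni, nj)] = nd
                        heapq.heappush(heap, (nd, (ni, nj)))
        return float("inf")

    rng = random.Random(2)
    print([round(first_passage_time(30, rng), 2) for _ in range(5)])

The concentration of such passage times around their mean, as the grid grows, is exactly the kind of question that the influence techniques of [?] address.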

9 Economics and voting: an example of self-organizing criticality

Why should we care about critical probabilities anyway?

Let us return now to the Condorcet Jury Theorem from the Introduction. A key assumption in the Condorcet Jury Theorem is that each agent votes according to his signal. There is recent interesting literature on the case where voters vote strategically based on their signals. Suppose that every voter wishes to minimize the probability of a mistake. (We can give different weights to mistakes in the two directions.) Feddersen and Pesendorfer considered juries and naturally gave much larger weight to an innocent person being convicted than to a guilty one being acquitted. If, in order to convict, you need 2/3 of the votes, and jurors vote according to their signals, then, when p = 0.51 and the number of jurors is large, they will hardly ever convict.
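A quick calculation (our own illustration; the jury sizes are arbitrary) confirms this: with sincere voting, a conviction requires a binomial variable with success probability p = 0.51 to reach the 2/3 quota, and the probability of that event vanishes as the jury grows.

    from math import comb, ceil

    def convict_probability(n, p, quota=2/3):
        """P[Bin(n, p) >= ceil(quota * n)]: sincere voters reach a 2/3 supermajority."""
        need = ceil(quota * n)
        return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(need, n + 1))

    for n in (12, 51, 201, 501):
        print(n, convict_probability(n, 0.51))   # decreases rapidly towards 0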

However, Feddersen and Pesendorfer showed that when the agents vote strategically (and use mixed (randomized) strategies) the probability of a mistake tends to zero as the number of jurors grows, even if the signal is weak. They also showed that this is not the case when all jurors are required in order to convict. Feddersen and Pesendorfer's result and analysis are based on the notion of Nash equilibrium. Nash equilibria in this case give us a nice example of "self-organizing criticality". The behavior at the critical point is significant even when the voting method we start with is biased. When we consider general voting methods it can be shown that "asymptotically complete aggregation of information" is intimately related to a sharp threshold. In particular, if there is a sharp threshold then there always is a Nash equilibrium point for which the probability of a mistake tends to zero as the number of jurors grows. Indeed, it is easy to see that the existence of a sharp threshold is equivalent to the existence of a symmetric strategy (every voter has the same strategy) under which the probability of a mistake tends to zero. Since this is a game where players have a common goal, the vector of individual strategies which minimizes the probability of a mistake is a Nash equilibrium. Feddersen and Pesendorfer's result is related to the question of why we care about critical behavior to start with. Why is it the case that so often, shortly before an election, we can give substantial probability to each one of two candidates being elected? How come the probabilities we can assign to the choices of each voter do not "sum up" to a decisive collective outcome? This seems especially surprising in view of the sharp threshold phenomena. Feddersen and Pesendorfer's result suggests that strategic behavior of voters can push matters towards criticality. Another explanation would challenge the assumption of independence between voters' behavior. There are other relations between threshold phenomena and economics and social choice theory.

We already mentioned that having a sharp threshold for a sequence of odd Boolean functions is equivalent to having a diminishing Shapley-Shubik power index. A famous result in social choice theory is Arrow's impossibility theorem concerning election methods when there are three or more candidates. Condorcet's famous "paradox" demonstrates that given three candidates A, B and C, the majority rule may result in the society preferring A to B, B to C and C to A. Arrow's impossibility theorem is an extension of Condorcet's paradox which asserts that under certain general conditions, such non-transitive social preferences cannot be avoided under any non-dictatorial voting method. Relations between threshold phenomena and Arrow-type theorems are described in [?]; see also [?]. Also in the context of economics, a major problem is to understand matters under other, perhaps more realistic, probabilistic assumptions, moving away from product distributions. This poses interesting conceptual and technical problems. Another challenge in the economic arena is to study threshold phenomena (or, in its economics name, aggregation of information) and related notions such as noise sensitivity for other, more complex, models of economics.
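Condorcet's paradox is easy to verify concretely. With the three classical cyclic rankings (a hypothetical toy electorate, not from the text), pairwise majority voting produces a cycle:

    from itertools import combinations

    # three voters, each ranking the candidates from most to least preferred
    rankings = [("A", "B", "C"),
                ("B", "C", "A"),
                ("C", "A", "B")]

    def majority_prefers(x, y):
        """True if a strict majority of the voters ranks x above y."""
        return sum(r.index(x) < r.index(y) for r in rankings) > len(rankings) / 2

    for x, y in combinations("ABC", 2):
        print(x, "beats", y, ":", majority_prefers(x, y), "|",
              y, "beats", x, ":", majority_prefers(y, x))
    # A beats B, B beats C, and yet C beats A: the majority relation is not transitive.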

10 A few open problems

We already mentioned some important problems: prove Friedgut's Conjecture ??; find sharper versions of Theorem ?? of Bourgain and Kalai. A less explicit but nevertheless important problem is to explain the emergence of "power laws": the threshold interval behaves like n^{−β}, where β > 0 is a real number.

It is an important challenge to relate the threshold behavior to the threshold's location, and to find methods to exclude the possibility of oscillating critical probabilities. We mentioned this fundamental question in the context of the k-SAT problem; it is important for many other problems. Another important problem is to find methods to deal with the influence of events of small probability, to analyze the large deviations of the threshold behavior, and to understand in detail how the function µp(f) behaves. In this paper we mainly dealt with tε(f) when ε was fixed. It is of great interest to understand the dependence on ε.

The precise behavior of the function µp(f) in the threshold interval, and the situation when ε itself is very small and expressed as a function of n, are both very interesting topics. Kahn and Kalai [?] described far-reaching conjectures concerning the influence I^p(f) of Boolean functions f when µp(f) is a function of n and tends to 0 as n tends to infinity.

They also studied possible applications towards finding the location of the critical probability. It would be interesting to study threshold behavior and influences when we replace the Boolean cube {0, 1}^n by Σ^n, where Σ is a finite alphabet with more than two letters. Then we expect that for symmetric monotone functions the transition will happen in small "membranes" [?]. An interesting work concerning powers of arbitrary graphs, by Alon, Dinur, Friedgut and Sudakov, is [?]. We repeat a problem already mentioned in various contexts: study threshold behavior and the related notions of noise sensitivity and Fourier analysis for various models with non-product probability distributions. Finally, we challenge our readers to explore other applications.

11 Concluding remark

Threshold phenomena and related concepts such as pivotality, influence and noise sensitivity are important in many areas of mathematics, science and engineering. We described some mathematical advances in the understanding of threshold behavior and related phenomena, various applications and connections, and some open problems. While the underlying mathematical concepts are similar across disciplines, bridging the different points of view, methodologies and interpretations is another major challenge for workers in these fields.

References
