Concentration on the Boolean hypercube via pathwise stochastic analysis

Ronen Eldan∗ and Renan Gross†

Abstract We develop a new technique for proving concentration inequalities which relate between the variance and influences of Boolean functions. Using this technique, we 1. Settle a conjecture of Talagrand [Tal97], proving that

Z q   1 1/2 h (x)dµ ≥ C · Var (f) · log , f P 2 {−1,1}n Infi (f)

where hf (x) is the number of edges at x along which f changes its value, and Infi (f) is the influence of the i-th coordinate. 2. Strengthen several classical inequalities concerning the influences of a Boolean function, showing that near-maximizers must have large vertex boundaries. An inequality due to Talagrand states that for a Boolean function f, Var (f) ≤ C Pn Infi(f) . We give i=1 1+log(1/Infi(f)) a lower bound for the size of the vertex boundary of functions saturating this inequality. As a corollary, we show that for sets that satisfy the edge-isoperimetric inequality or the Kahn-Kalai-Linial inequality up to a constant, a constant proportion of the mass is in the inner vertex boundary. 3. Improve a quantitative relation between influences and noise stability given by Keller and Kindler. Our proofs rely on techniques based on stochastic calculus, and bypass the use of hypercontrac- tivity common to previous proofs. arXiv:1909.12067v3 [math.PR] 12 Mar 2020

∗Weizmann Institute of Science. Incumbent of the Elaine Blond Career Development Chair. Supported by a European Research Council Starting Grant (ERC StG) and by an Israel Science Foundation grant no. 718/19. Email: [email protected]. †Weizmann Institute of Science. Supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities, the European Research Council and by the Israeli Science Foundation. Email: re- [email protected].

1 Contents

1 Introduction 2 1.1 Background ...... 2 1.2 Our results ...... 4 1.3 Proof outline ...... 8

2 Preliminaries 10 2.1 Boolean functions ...... 10 2.2 Stochastic processes and quadratic variation ...... 12

3 The main tool: A jump process 12

4 Proof of the improved Talagrand’s conjecture 16

5 Talagrand’s influence inequality and its stability 27 5.1 Proof of Theorem2...... 32 5.2 Proof of Theorem5...... 33

6 Proof of Theorem8 36

A Appendix 1: p-biased analysis 42

B Appendix 2: Postponed proofs 43

1 Introduction

1.1 Background The influence of a Boolean function f : {−1, 1}n → {−1, 1} in direction i = 1, . . . , n is defined as

 n ⊕i  Infi (f) = µ y ∈ {−1, 1} | f (y) 6= f y , where y⊕i is the same as y but with the i-th bit flipped, and µ is the uniform measure on the discrete hypercube {−1, 1}n. The expectation and variance of a function are given by Z 2 Ef = fdµ and Var (f) = E (f − Ef) . {−1,1}n

The Poincaré inequality gives an immediate relation between the aforementioned quantities, namely,

n X Var (f) ≤ Infi (f) . (1) i=1 P The total influence i Infi (f) on the right hand side is equal to the number of edges of the hypercube which separate f (x) = 1 and f (x) = −1. It can therefore be seen as a type of surface-area of f. The inequality (1) in fact holds for any function (for a suitably defined influence), and it is natural to ask whether it can be improved when Boolean functions are considered. A corollary of

2 the breakthrough paper by Kahn, Kalai and Linial (KKL) [KKL88] shows that this inequality can be strengthened logarithmically: There exists a universal constant C > 0 such that

P Inf (f) Var (f) ≤ C i i . (2) log (1/ maxi (Infi (f))) (The above formulation does not appear explicitly in [KKL88], but follows easily from their meth- ods). √ The KKL inequality is tight for the Tribes function, but is off by a factor of n/ log n for √ the majority function, whose influences are all of order 1/ n, suggesting that the total influence P Infi (f) may not be the right notion of surface-area for all Boolean functions. In [Tal93, Theorem 1.1], Talagrand showed that 1 p Var (f) ≤ √ E hf , (3) 2  ⊕i p where hf (y) = # i ∈ [n] | f (y) 6= f y is the sensitivity of f at point y. The value E hf can be seen as another type of surface-area of the function f. A sharp tightening of this inequality was given by Bobkov [Bob97]; his inequality gives an elementary proof of the isoperimetric inequality on Gaussian space. Inequality (3) is tight for linear-threshold functions such as majority, but not for Tribes. Thus, neither inequality implies the other. This raises the following question:

Question 1. What is the right notion of boundary for Boolean functions? Is there an inequality from which both (2) and (3) can be derived?

As a step in this direction, Talagrand conjectured in [Tal97] that (3) can be strengthened, and that there exists a constant β > 0 such that

  e 1/2 ph ≥ β · Var (f) · log . (4) E f P 2 Infi (f) Talagrand showed that there exists an α ≤ 1/2 and a constant β > 0 such that

Z q  e 1/2−α   e α h (x)dµ ≥ β · Var (f) log · log , f P 2 {−1,1}n Var (f) Infi (f) but his proof did not yield the conjectured α = 1/2, and it falls short of recovering the logarithmic improvement in the KKL inequality. Another notion of surface-area is the vertex boundary ∂f of f, defined as n o n ⊕i ∂f = y ∈ {−1, 1} ∃i s.t f (y) 6= f y . It is the disjoint union of the inner vertex boundary, n o + n ⊕i ∂ f = y ∈ {−1, 1} ∃i s.t f (y) = 1, f y = −1 , and the outer vertex boundary, n o − n ⊕i ∂ f = y ∈ {−1, 1} ∃i s.t f (y) = −1, f y = 1 .

3 p p pP The Cauchy-Schwartz inequality implies that E hf ≤ E [hf ] µ (∂f) = i Infi (f) µ (∂f), so the above conjecture strengthens the KKL result in the regime Var (f) = Ω (1) (see below Theorem 4 for a calculation). The inequality (2) was further generalized in another direction by Talagrand [Tal94], who proved the following:

n Theorem 2. There exists an absolute constant CT > 0 such that for every f : {−1, 1} → {−1, 1},

n X Infi (f) Var (f) ≤ C . (5) T 1 + log (1/Inf (f)) i=1 i It is known that this inequality is sharp in the sense that for any sequence of influences, there exist examples which saturate it [KLS+15]. Inequalities such as (2), (4) and (5), in conjunction with concentration of influence [Fri98] and sharp threshold properties [Fri99], have been widely utilized across many subfields of and computer science, including learning theory [OS07], metric embeddings [KR09], first passage percolation [BKS03], classical and quantum communication complexity [Raz95, GKK+09], and hard- ness of approximation [DS05]; and also in social network dynamics [MNT14] and statistical physics [BDC12]. For a general survey, see [KS06]. Talagrand’s original proof of Theorem2, as well as later proofs (see e.g [CEL12]), all rely on the hypercontractive principle.

1.2 Our results In this paper, we develop a new approach towards the proofs of the aforementioned inequalities. Our proofs are based on pathwise analysis, which bypasses the use of hypercontractivity, and in fact uses classical Boolean Fourier-analysis only sparingly. Using these techniques, we first to show that Talagrand’s conjecture holds true:

Theorem 3. There exists an absolute constant C > 0 such that for all f : {−1, 1}n → {−1, 1}, v u ! u e ph ≥ C · Var (f) · log 2 + . E f t P 2 i Infi (f)

In fact, we prove a stronger theorem, of which Theorem3 is an immediate corollary:

Theorem 4. There exists an absolute constant C > 0 such that the following holds. For all f : n n 2 {−1, 1} → {−1, 1}, there exists a function gf : {−1, 1} → [0, 1] with Eg ≤ 2Var (f) so that for all 1/2 ≤ p < 1, !!p h i e hp g ≥ CVar (f) · log 2 + . (6) E f P 2 i Infi (f) The above inequality with p = 1/2 implies both the KKL inequality in full generality and a new lower bound on total influences in the spirit of the isoperimetric inequality. By the Cauchy-Schwartz inequality, hp i p p 2 p p E hf g ≤ Ehf Eg ≤ Inf (f) 2Var (f),

4 and plugging this into (6), we get P Inf (f) Var (f) ≤ C · i i . (7)  e  log P 2 i Infi(f) From this equation we can proceed in two directions:

Theorem4 =⇒ KKL Denoting δ = maxi Infi (f), the above display yields P Inf (f) Var (f) ≤ C · i i .  1  log P δ i Infi(f) P 1/2 Consider now two cases: If δ i Infi (f) ≤ δ , we immediately get from the above display that P i Infi(f) P 1/2 Var (f) ≤ 2C log(1/δ) . And if δ i Infi (f) ≥ δ , we have X 1 1 1 1 Inf (f) ≥ ≥ Var (f) ≥ Var (f) log , i δ1/2 δ1/2 2 δ i P i Infi(f) (for δ < 1, otherwise there is nothing to prove), again yielding Var (f) ≤ 2 log(1/δ) . Hierarchically, the relation between the Poincaré inequality, KKL, Talagrand’s theorem and Theorem4 may be summarized as in Figure1.

Poincaré (1)

=6 ⇒ KKL (2) Talagrand’s inequality (3) 6⇐=

Theorem4

Figure 1: Inequality implications

Theorem4 =⇒ Stability of the Isoperimetric inequality Assume that Ef ≤ 0 and let A = {x ∈ {−1, 1}n | f (x) = 1} be the support of f, so that µ (A) ≤ 1/2. The edge-isoperimetric inequality [Har76, section 3] states that n X 1 Inf (f) ≥ 2µ (A) log , (8) i 2 µ (A) i=1

5 with equality if and only if A is a subcube. Suppose that f saturates the isoperimetric inequality up to a constant, i.e n X 1 Inf (f) ≤ Cµ (A) log i 2 µ (A) i=1 for some constant C. Since Var (f) = 4µ (A) (1 − µ (A)) ≥ 2µ (A), this gives

n X 2 Inf (f) ≤ CVar (f) log . (9) i 2 Var (f) i=1 P 2 Suppose also that f is monotone; then Var (f) ≥ i Infi (f) , and from equations (7) and (9) we get that P Inf (f) P Inf (f) Var (f) ≤ C i i ≤ C i i ≤ C0Var (f) .  2   2  log P 2 log Var(f) i Infi(f) In particular, the two denominators are within a constant factor of each other: ! 2   2  log = Θ log , P 2 Var (f) i Infi (f) implying that the Fourier mass on the first level is proportional to a power of the variance. Next, we reprove Theorem2 using stochastic techniques, and provide a strengthening which can be thought of as a stability version of this bound in terms of the vertex boundary of f: If near-equality is attained in equation (5), then both the inner and outer vertex boundaries of f are large. The theorem reads, Theorem 5. Let T (f) = Pn Infi(f) , and denote r = Var(f) . There exists an absolute i=1 1+log(1/Infi(f)) Tal T (f) constant CB > 0 such that r µ ∂±f ≥ Tal Var (f) . CB CB log rTal Theorem5 can be readily applied to two related functional inequalities - the isoperimetric inequality and the KKL inequality - showing that when either of the inequalities are tight up to a constant, the function must have a large vertex boundary.

The Isoperimetric inequality and vertex boundary It is natural to ask about the ro- bustness of the isoperimetric inequality: Is it true that if near-equality is attained in (8), then A is close to a subcube in some sense? This question was answered in [Ell11] for sets A which are (1 + ε)-close to satisfying the inequality. Conjectures concerning sets for which the inequality is tight only up to a constant multiplicative factor can be found in [KK07]. We make a step in this direction by giving the first bound which is meaningful when the function is O (1)-close to satisfying the inequality (8), showing that in that case, a constant proportion of the set A is in its inner vertex boundary (whereas for the extremizers, the vertex boundary is the entire set A).

2µ(A) log 1 2 µ(A) rIso Corollary 6. Let rIso = Pn . Then there exists a constant cIso ≥   depending Infi(f) 2CB i=1 2CB log rIso only on rIso such that µ (∂A) ≥ cIsoµ (A) .

6 Var(f) Proof. As in Theorem5, denote rTal = T (f) . Observe that for every index i, Infi (f) ≤ 2µ (A). Since µ (A) ≤ 1/2, we have

Var (f) = 4µ (A) (1 − µ (A)) ≥ 2µ (A) .

This gives a bound on rTal: Var (f) Var (f) rTal = ≥ Pn Infi(f) Pn Infi(f) i 1+log(1/Infi(f)) i 1+log(1/2µ(A))   1  rIsoVar (f) 1 + log 2µ(A) r log 2 · Var (f) r = ≥ Iso ≥ Iso . 2 1 2µ (A) 2 log 2 µ (A) log µ(A)

Thus, by Theorem5, there exists a constant c ≥ rIso such that Iso  2CB  2CB log rIso c µ ∂±f ≥ Iso Var (f) ≥ c µ (A) . 2 Iso

The KKL inequality and vertex boundary In its original formulation, the KKL theorem [KKL88, Theorem 3.1], which follows immediately from (2), states that a Boolean function must have a variable with a relatively large influence: There exists an absolute constant C > 0 such that for every f : {−1, 1}n → {−1, 1}, there exists an index i ∈ [n] with log n Inf (f) ≥ C · Var (f) . i n log n Our second corollary states that if all influences are of the order Var (f) n , then the function must have a large (inner and outer) vertex boundary. √ log n Corollary 7. Suppose that for some C ≤ n, we have Infi (f) ≤ C · Var (f) n for all i. Then there exists a constant cKKL depending only on C such that

±  µ ∂ f ≥ cKKLVar (f) . Proof. In this case, we have Var (f) Var (f) rTal = ≥ Pn Infi(f) Pn Infi(f) log n i 1+log(1/Infi(f)) i 1+log(C·Var(f) n )   n  Var (f) 1 + log CVar(f) log n log n − log (CVar (f) log n) 1 ≥ ≥ > . C · Var (f) log n C log n 4C

Thus, by Theorem5, there exists a constant cKKL which depends only on C such that

±  µ ∂ f ≥ cKKLVar (f) .

7 Finally, we improve an inequality by Keller and Kindler [KK13]. Let Sε (f) be the noise stability of f, i.e

Sε (f) = Covx∼µ,y∼Nε(x) [f (x) , f (y)] , where Nε (x) is a random vector whose i-th coordinate is equal to xi with probability 1 − ε and to a uniformly random bit with probability ε.

Theorem 8. There exists universal constants C, c > 0 such that

n !cε X 2 Sε (f) ≤ C · Var (f) Infi (f) . (10) i=1 The bound proved in [KK13] is the same but with the term Var (f) is replaced by a constant; thus our result becomes stronger when Var (f) = o (1) . This theorem is used in the proof of Theorem 2. The relation between influences and noise sensitivity was first established in [BKS99], where a qualitative bound of the same nature is proven.

1.3 Proof outline  (1) (n) n The core of our proofs is the construction of a martingale Bt = Bt ,...,Bt ∈ R which satisfies

(i) n Bt = t and B1 ∼ Unif ({−1, 1} ) (Proposition 11). Since B1 is uniform on the hypercube, the expected value and variance of f can be obtained by Ef = Ef (B1) and Varf = Varf (B1), where the expectations in the right hand sides are over the randomness of the process Bt. Similarly, the influence of the i-th bit is given by Infi (f) = 2 p 2p E∂if (B1) , where ∂if is the partial derivative of f in direction i, and Ehf is given by E k∇f (B1)k2 . The strength of the approach stems from the fact that the behavior of f (B1) can be understood by an analysis of the processes Bt, f (Bt) and ∇f (Bt) for times smaller than 1. Indeed, there is a natural way to extend the domain of a Boolean function to the continuous n hypercube [−1, 1] so that the processes f (Bt) and ∂if (Bt) become martingales. The variance of f can then be expressed as n Z 1 X 2 Var (f) = 2E t (∂if (Bt)) dt i=1 0 (Lemma 13 and Corollary 14). Bounding the variance is then a matter of bounding the integral P R 1 2 E i 0 t (∂if (Bt)) dt, and for this we can utilize tools from real analysis and stochastic pro- cesses. Specifically, two well-known inequalities - called the Level-1 and Level-2 inequalities - give 2 us bounds on the speed with which both the individual processes ∂if (Bt) and their collective sum P 2 (∂if (Bt))i are moving in terms of their current value. In the Gaussian setting, somewhat similar ideas of using level inequalities appear in [Eld15]. This points to a significant conceptual difference between existing techniques that use the hy- percontractivity of the heat operator and our technique: Whereas the former proofs start from the function f and analyze the way that it changes by applying the heat semigroup, which corresponds to going backwards in the time t, our analysis goes forward in time. We may think of the process n Bt as a way to sample from {−1, 1} via a continuous filtration, where we add “infinitesimal bits of randomness” as time progresses. The analysis starts from f (B0) = Ef and considers the way that the martingales evolve as we refine our filtration, or in other words, add more randomness. On a

8 first glance this difference may seem to be only pedagogical, but the strength of our approach is that the pathwise analysis equips us with new tools, such as using stopping times and the optional stopping theorem, and conditioning on the past. Towards proving Talagrand’s conjecture, we use the Level-2 inequality (Lemma 10) to bound P 2 P 2 i (∂if (Bt)) by a time-dependent power of the sum of squares of influences i Infi (f) . Ideolog- 2 ically, when this sum is small, this roughly implies that the processk∇f (Bt)k2 makes most of its movement very close to time t = 1; this is in fact the essence of Theorem8. This can then be used to show that most of the quadratic variation of f (Bt) comes from paths in which there is a time t 1/2   1  such that k∇f (Bt)k2 is larger than α log P 2 (Proposition 19). However, the quadratic Infi (f) variation is itself large with probability that is directly proportional to the variance (Proposition 20). This is one of the steps where the pathwise analysis is crucially used; without it, we would have only known that the quadratic variation is large in expectation, which would not eliminate the possibility that the entire contribution to the variance is made on an event of negligible probability. 2p 2p Since k∇f (B1)k2 is a submartingale, if there was ever a time when k∇f (B1)k2 is large, then in p 2p   1  expectation it continues to be large. Thus E k∇f (B1)k2 is larger than αVar (f) log P 2 , Infi (f) giving the original Talagrand’s conjecture (Theorem3) when p = 1/2. With additional care, it can be shown that f (Bt) itself is large at some time before the gradient was large, which gives the strengthened result (Theorem4). This is a good place to point out an analogy between our technique and the one demonstrated by Barthe and Maurey [BM00], who give a stochastic proof of Bobkov’s extension of inequality (3). They use a stochastic argument in order to derive a one-dimensional inequality which implies Bobkov’s inequality via tensorization, and one of the central components in their proof is to establish that a certain process which is analogous to k∇f (B1)k2 is a submartingale (this is based on ideas introduced in a paper by Capitaine, Hsu and Ledoux [CHL97]). Similarly, for proving Theorem2, we use the Level-1 inequality (Lemma9) to bound each 2 individual (∂if (Bt)) by a time-dependent power of the influence Infi (f) (Lemma 22). When the influences are small, this roughly implies that the martingale f (Bt) makes most of its movement very close to time t = 1. Theorem2 then follows by plugging this bound into the integral (Proposition 23). The proof of Theorem5 is more involved, and utilizes the fact that f (Bt) is both a jump process and a martingale: For such processes, the variance of f (B1) is then given by the sum of squares of jumps of f (Bt) up to time 1: n Z 1 X 2 X 2 Var (f) = E (∆f (Bs)) = 2E t (∂if (Bt)) dt. 0 s∈Jump(Bt) i=1 The technical core of the proof (Proposition 24 and Lemma 25) shows that if T (f) and Var (f) differ only by a multiplicative constant, then with non-negligible probability the process f (Bt) must make a relatively large jump somewhere along the way. Roughly speaking, this is because if the process (i) Bt jumps at time t, then the function f (Bt) also jumps, changing by a value of 2t∂if (Bt). 
If all the jumps are small, then the expression P (∆f (B ))2 in the left hand side of the above s∈Jump(Bt) s display (which cares only about jumps) must be substantially smaller than the integral in the right hand side (which cares only about the size of the derivatives). Now, when the process f (Bt) makes a large jump, it necessarily means that the magnitude of one of the partial derivatives ∂if is large. Since the process ∂if (Bt) is also a martingale, if it is large

9 at some point in time, then it continues to be large with relatively high probability. But at time t = 1, since B1 is uniform on the hypercube, the only possibilities for the values of ∂if (B1) are −1, 0 and 1. Thus, it is likely that |∂if (B1)| = 1. This exactly corresponds to the point B1 being in the vertex boundary, showing that the vertex boundary is large. The distinction between the inner and outer vertex follows by similar arguments, using a symmetrification of Bt (Proposition 26).

Acknowledgements R.E. would like to thank Noam Lifshitz for useful discussions and in particular for pointing out the possible application to stability of the isoperimetric inequality. We are also thankful to Ramon Van Handel, Itai Benjamini and Gil Kalai for an enlightening discussion, and grateful for the comments from the anonymous STOC referees.

2 Preliminaries

Throughout the text, the letter C stands for a positive universal constant, whose value may change from line to line.

2.1 Boolean functions For a general introduction to Boolean functions, see [O’D14]; in what follows, we provide a brief overview of the required background and notation. n Every Boolean function f : {−1, 1} → R may be uniquely written as a sum of monomials: X Y f (y) = fˆ(S) yi, (11) S⊆[n] i∈S where [n] = {1, . . . , n}, and the harmonic coefficients (also known as Fourier coefficients) fˆ(S) are given by " # ˆ Y f (S) = E f (y) yi . (12) i∈S Equation (11) may be used to extend a function’s domain from the discrete hypercube {−1, 1}n to n real space R . We call this the harmonic extension, and denote it also by f. Under this notation, n f (0) = Ef. In general, for x ∈ [−1, 1] , the harmonic extension f (x) is a convex combination of f’s values on all the points y ∈ {−1, 1}n: X f (x) = wx (y) f (y) , (13) y∈{−1,1}n Qn where wx (y) = i=1 (1 + xiyi) /2. The derivative of a function f in direction i is defined as

f yi→1 − f yi→−1 ∂ f (y) = , i 2 where yi→a has a at coordinate i, and is identical to y at all other coordinates. The gradient is then defined as ∇f = (∂1f, . . . , ∂nf). A function is called monotone if f (x) ≤ f (y) whenever xi ≤ yi for

10 all i ∈ [n]. Similar to the function f, by abuse of notation ∂if will denote the harmonic extension n of ∂if, and we will treat it as a function on [−1, 1] . A short calculation reveals the following properties of the derivative:

1. The harmonic extension of the derivative ∂if is equal to the real-differentiable partial deriva- tive ∂ of the harmonic extension of f. ∂xi

2. For functions whose range is {−1, 1}, the derivative ∂if takes values in {−1, 0, 1}, and the influence of the i-th coordinate of f is given by 2 Infi (f) = E (∂if (y)) = E |∂if (y)| . (14)

3. For monotone functions, the derivative ∂if only takes values in {0, 1}, and the influence of the i-th coordinate is then given by ˆ Infi (f) = E∂if (y) = f ({i}) . (15)

In the definition of the Fourier coefficient in (12), the expectation is over the uniform measure 1 µ (y) = 2n . It is also possible to decompose a function into Fourier coefficients over a biased measure. This type of analysis will be used only in the proof of Theorem8. A brief overview can be found in the appendix. Finally, we’ll require two lemmas which effectively relate the weights of the Fourier coefficients at higher levels with those of lower ones; this translates to inequalities between the harmonic extension of a function and its derivatives. The first lemma is a direct application of the fact that wx (·) is subgaussian; it essentially bounds the Fourier weights in the first level by a function of the weights at level zero: Lemma 9 (Level-1 inequality). There exists a constant L so that the following holds. Let g : [−1, 1]n → [0, 1] be the harmonic extension of a Boolean function, and let x ∈ (−1, 1)n be such that |xi| = t for all i. Then L e k∇g (x)k2 ≤ g (x)2 log . (16) 2 (1 − t)4 g (x) The second lemma, whose original, uniform case is due to Talagrand [Tal96], essentially bounds the Fourier weights in the second level by those of the first. It is similar to [KK13, Lemma 6], but for real-valued functions. Lemma 10 (Level-2 inequality). There exists a continuous function C : [0, 1) → [0, ∞) so that the following holds. Let g :[−1, 1]n → [−1, 1] be the harmonic extension of a monotone function, and n let x ∈ (−1, 1) be such that |xi| = t for all i. Then ! 2 2 2 C (t) ∇ g (x) HS ≤ C (t) k∇g (x)k2 · log 2 , (17) k∇g (x)k2

2 n qP 2 where ∇ g is the Hessian (∂i∂jg)i,j=1 of g, and kXkHS = i,j Xij is the Hilbert-Schmidt norm of a matrix.

Remark. In both lemmas, the requirement that |xi| = t for all i is not crucial, and can be replaced by |xi| ≤ t for all i. The proofs of both lemmas are found in the appendix.

11 2.2 Stochastic processes and quadratic variation For a general introduction to stochastic processes and Poisson processes, see [Dur19] and [Kin93]. A Poisson point process Nt with rate λ (t) is an integer-valued process such that N0 = 0, and for every 0 ≤ a < b, the difference Nb − Na distributes as a Poisson random variable with rate R b R b a λ (t) dt. If a λ (t) dt < ∞ for all 0 ≤ a < b, then the sample-paths of a Poisson point process are right-continuous almost surely. The (random) set of times at which the sample-path is discontinuous is denoted by Jump (Nt). R b Let λ (t) be such that a λ (t) dt < ∞ for all 0 ≤ a < b and let Nt be a Poisson point process with rate λ (t). The set Jump (Nt) = {t1, t2,...} is then almost surely discrete. A process Xt is said to be a piecewise-smooth jump process with rate λ (t) if Xt is right-continuous and is smooth in the interval [ti, ti+1) for every i = 1, 2,.... This definition can be extended to the case where R b R b 0 λ (t) dt = ∞ but a λ (t) dt < ∞ for all 0 < a < b (this happens, for example, when λ = 1/t): In this case Jump (Nt) has only a single accumulation point at 0, and intervals between successive jump times are still well defined. An important notion in the analysis of stochastic processes is quadratic variation. Intuitively, the quadratic variation of a process Xt, denoted [X]t, describes how wildly the process Xt fluctuates; formally, it is defined as n X 2 [X]t = lim Xtk − Xtk−1 , kP k→0 k=1 if the limit exists; here P is an n-part partition of [0, t], and the notation limkP k→0 indicates that the size of the largest part goes to 0. Not all processes have a (finite) quadratic variation, but piecewise- smooth jump processes do; in fact, it can be seen from definition that if Xt is a piecewise-smooth jump process then X 2 [X]t = (∆Xs) , (18) s∈Jump(Xt)∩[0,t] where ∆Xs = limε→0+ (Xs+ε − Xs−ε) is the size of the jump at time s. The quadratic variation is especially useful for martingales due to its relation with the variance: If Xt is a martingale, then Var (Xt) = E ([X]t) . (19)

3 The main tool: A jump process

The proof of Theorems2 and5 relies on the construction of a piecewise-smooth jump process martingale Bt, described below. One of its key properties is that it will allow us to express quantities such as the variance of f in terms of derivatives of the harmonic extension, e.g:

n Z 1 X 2 Var (f) = 2E t (∂if (Bt)) dt. i=1 0

The process (Bt)t≥0 is characterized by the following properties:

n (i) 1. Bt ∈ R , with Bt independent and identically distributed for all i ∈ [n]. (i) 2. Bt is a martingale for all i.

12

(i) 3. Bt = t almost surely for all i ∈ [n] and t ≥ 0. Proposition 11. There exists a right continuous martingale with the above properties. Furthermore, for all t, h > 0, h i h signB(i) 6= signB(i) | B = . (20) P t+h t t 2 (t + h)

Proof. Let Ws be a standard Brownian motion. Consider the family of stopping times n o

τ (t) = inf s > 0 |Ws| > t and define Xt = Wτ(t). Then by definition, |Xt| = t, and Xt is a martingale due to the optional stopping theorem. Observe that Xt can fail to be right-continuous only if signWτ(t) is different from signWτ(s) for all s 6= t in some open interval around t. This event happens with probability 0, and so there exists a modification of Xt where paths are right-continuous almost surely. The process Bt  (1) (n) (i) is defined as Bt = Xt ,...,Xt , where Xt are independent copies of Xt. To prove equation (20), set p = P (signXt+h 6= signXt | Xt) and use the martingale property:

tsignXt = Xt h i

= E Xt+h Xt

= (−t − h) signXt · p + (1 − p)(t + h) signXt.

h Rearranging gives p = 2(t+h) as needed.

(i) It can be readily seen that Bt is a piecewise-smooth jump process with rate λ (t) = 1/2t.  (i) Denote its set of discontinuities by Ji = Jump Bt . n As described in (11), the harmonic extension of a function f : {−1, 1} → R is a multilinear polynomial. Since the product of two independent martingales is also a martingale with respect to its natural filtration, by independence of the coordinates of Bt, we conclude that n Fact 12. For a function f : {−1, 1} → R, the process f (Bt) is a martingale.

We denote this process by ft = f (Bt), and by slight abuse of notation, write ∂ift = ∂if (Bt) and ∇ft = ∇f (Bt). Since Bt is right-continuous, these processes are right-continuous also; when refer- ring to the left limit at jump discontinuities, we write ft− , ∂ift− and ∇ft− , with ft− = limε&0 ft−ε. Some example sample paths of ft for the 15-bit majority function are given in Figure2. Since ft is a piecewise-smooth jump process, by (18) its quadratic variation is equal to the sum of squares of its jumps. Now, almost surely, Bt can make a jump only in one coordinate at a time, and when the i-th coordinate jumps, the value of ft changes by 2t∂ift, since f is multi-linear. The quadratic variation of ft is therefore

n X X 2 [f]t = (2s · ∂ifs) . (21) i=1 s∈Ji∩[0,t]

A crucial property of Bt is that the expected value of these jumps behaves smoothly, as the next lemma shows:

13 Figure 2: Sample paths of ft for the 15-bit majority function

Lemma 13. Let 0 ≤ t1 < t2 ≤ 1, and let gt be a bounded process which satisfies one of the following:

1. gt is left-continuous and measurable with respect to the filtration generated by {Bs}0≤s 0, n Z t0 X 2 Var (ft0 ) = 2E t (∂ift) dt. (23) i=1 0

Proof. Since ft0 is a martingale, by Equation (21), its variance is the expected value of the quadratic variation: n X X 2 Var (ft0 ) = E (2t∂ift) . i=1 t∈Ji∩[0,t0] 2 Setting gt = (∂ift) in (22) completes the proof. n Corollary 15. Let f : {−1, 1} → R. Then n d X f 2 = 2t (∂ f )2 = 2t k∇f k2 . dtE t E i t E t 2 i=1

14 Proof. By the martingale property of ft, d d d   d f 2 = f 2 − f 2 = (f − f )2 = Var (f ) . dtE t dt E t E 0 dt E t 0 dt t Taking the derivative of equation (23) and using the fundamental theorem of calculus on the right hand side gives the desired result.

It is a basic fact (see e.g [GS15, section 4.3]) that if f has Fourier expansion f (x) = P ˆ S f (S) χS (x), its noise stability is given by

X 2 |S| Sε (f) = fˆ(S) (1 − ε) . S6=∅

On the other hand, recalling ft = f (Bt), a short calculation reveals that

X 2 2|S| Var (ft) = fˆ(S) t . S6=∅   √ Thus Sε (f) = Var f 1−ε , and the inequality (10) in the statement of Theorem8 becomes

!cε   n √ X 2 Var f 1−ε ≤ CVar (f) Infi (f) . i=1 Together with equation (23), this turns into

√ cε n Z 1−ε n ! X 2 X 2 E t (∂ift) dt ≤ CVar (f) Infi (f) . (24) i=1 0 i=1 We will use this formulation rather than the original statement of (10). (i) (i) (i) For every index i, let f be the harmonic extension of |∂if|, and let ft = f (Bt). If f is (i) monotone then f = ∂if, since the derivatives are positive, but in general,

(i) n f (x) ≥ |∂if (x)| ∀x ∈ [−1, 1] (25) by convexity. In particular, plugging (25) into Corollary 14, we have

n Z t0 X  (i)2 Var (ft0 ) ≤ 2 tE ft dt. (26) i=1 0

(i) We call the process ft the “influence process”, because of how the expectation of its square relates to the influence of f: Observe that by (14),

(i) (i) f0 = Ef = E |∂if| = Infi (f) . (27)

2 2  (i)  (i) 2 (i) 2 (i) Thus, at time 0, we have E f0 = f0 = Infi (f) , while at time 1, since f (y) = f (y) for 2 2 n  (i) (i) (i)  (i) y ∈ {−1, 1} , we have E f1 = Ef1 = Ef = Infi (f). The expected value E ft increases

15 2 2  (i) from Infi (f) to Infi (f) as t goes from 0 to 1. We denote this expected value by ψi (t) := E ft . Equation (26) then becomes

n X Z t0 Var (f (Bt0 )) ≤ 2 tψi (t) dt. i=1 0 R 1 The integral E 0 tψi (t) dt may be more easily handled using a time-change which makes ψi (t) log-convex; we can then bound it by a power of the influence. For this purpose, for s ∈ (0, ∞), 2 −s  (i)  denote ϕi (s) := ψi (e ) = E fe−s . 2 Lemma 16. Let g be the harmonic extension of a Boolean function, and let h (s) = g (Be−s ) . Then h (s) is a log-convex function of s. Proof. Expanding g as a Fourier polynomial, we have  2 2 X Y  (i)  h (s) = Eg (Be−s ) = E  gb(S) Be−s   S⊆[n] i∈S 2 X 2 Y  (i)  = gb(S) Be−s S⊆[n] i∈S X 2 −2s|S| = gb(S) e . (28) S⊆[n]

This is a positive linear combination of log convex-functions e−2s|S|, and is therefore also log-convex [BV04, section 3.5.2].

Finally, we’ll need the following easy technical lemma, whose short proof is postponed to the appendix. Lemma 17. Let g : [0, ∞) → [0, ∞) be a differentiable function satisfying K g0 (t) ≤ C · g (t) log , (29) g (t) where C,K are some positive constants. Suppose that g (0) ≤ K/2. Then there exists a time t0, which depends only on C and K, such that for all t ∈ [0, t0] ,

e−Ct−1  1  −Ct g (t) ≤ g (0)e . K

4 Proof of the improved Talagrand’s conjecture

We prove Theorem4 assuming that improved Keller-Kindler inequality (24) holds; the proof of (24) is found in Section6. Without loss of generality, we assume that Ef ≤ 0. n The function gf : {−1, 1} → [0, 1] used in Theorem4 will be defined by   1 + fs gf (y) = E sup B1 = y . 0≤s≤1 2

16 1 Recall Doob’s martingale inequality (see e.g [Dur19, Theorem 4.4.4]), which states that if (Xt)t=0 is a non-negative martingale, then " 2#  2 E sup Xs ≤ 4E X1 . 0≤s≤1

1+ft Using this inequality for the non-negative martingale Xt = 2 , we thus have "  2# " "  2 ## 2 1 + fs 1 + fs Eg = E E sup B1 = y ≤ E E sup B1 = y 0≤s≤1 2 0≤s≤1 2 "  2# 2 1 + fs 1 − (Ef) = E sup ≤ 2 (1 + Ef) = 2 ≤ 2Var (f) . 0≤s≤1 2 1 − Ef as required by the theorem. We now turn to prove the inequality (6) using gf . To this end, we can relate the product p hf (y) gf (y) to the stochastic constructions in the previous section. By definition of the discrete n ⊕i ⊕i derivative, for any y ∈ {−1, 1} , ∂if (y) = 0 if f (y) = f y , and ∂if (y) = ±1 if f (y) 6= f x , so n X 2 2 hf (y) = ∂if (y) = k∇f (y)k2 . i=1 p 2p We thus have hf (y) = k∇f (y)k2 , and using this relation we define the stochastic process

2p 1 + fs Ψt = k∇ftk2 sup , 0≤s≤t 2 noting that h p i EΨ1 = E hf gf .

Our goal is therefore to bound EΨ1 from below. 2p A simple calculation, using the fact that k∇ftk2 is a submartingale, shows that EΨt is an increasing function of t. For the rest of this section, we therefore assume that !!p e Ψ ≤ Var (f) · log 2 + ; 0 ≤ t ≤ 1, (30) E t P 2 i Infi (f) otherwise there is nothing to prove.

Our proof relies on the existence of a stopping time τα such that with high probability Ψτα is large, as is shown by the following proposition. Fix α > 0 whose value is to be chosen later, and define ( !!p) 1 e τ = inf 0 ≤ t ≤ 1 Ψ > α log 2 + ∧ 1. α t 8 P 2 i Infi (f) Proposition 18. Assume that (30) holds. Then there exists an α > 0 such that

P [τα < 1] ≥ CVar (f) , where C > 0 is a constant which depends only on α.

17 Assuming the above proposition holds, the improved Talagrand’s conjecture swiftly follows:

Proof of Theorem4. By conditioning on the event τα < 1, we have h i Ψ ≥ Ψ ≥ Ψ τ < 1 [τ < 1] E 1 E τα E τα α P α !!p 1 e ≥ α log 2 + · CVar (f) . 8 P 2 i Infi (f)

The rest of this section is devoted to proving Proposition 18. The main idea is to see how different sample paths contribute to the quadratic variation of ft. On the one hand, the lion’s share of the quadratic variation is gained from paths where the gradient’s norm k∇ftk2 is large. On the other hand, the quadratic variation has a relatively high probability to be large, and so k∇ftk2 must be large with relatively high probability as well. This argument takes care of the gradient’s contribution to Ψt; to deal with the supremum’s contribution, we show that with high enough probability, either ft makes a large jump (which causes both sup ft and the gradient’s norm to be large at the same time) or there is a time where ft’s position is bounded away from the endpoints {−1, 1}, allowing its gradient to be large later on.

Proof of Proposition 18. Let θ = inf {t ≥ 0 | ft > 0} ∧ 1. Since ft is a martingale,

f0 = Ef1 = 2P [f1 = 1] − 1, and since {f1 = 1} ⊆ {θ < 1},

2 (f ≤0) 1 + f0 1 − f0 Var (f) 0 1 P [θ < 1] ≥ P [f1 = 1] = = = ≥ Var (f) . (31) 2 2 (1 − f0) 2 (1 − f0) 4 By conditioning on θ < 1 we have h i

P [τα < 1] ≥ P τα < 1 θ < 1 P [θ < 1] 1 h i ≥ Var (f) · τ < 1 θ < 1 . 4 P α

It remains only to show that for small enough (but fixed) α, P [τα < 1 | θ < 1] is larger than some constant. Since we assume Ef ≤ 0, the process ft starts at f0 ≤ 0. There are two different ways for ft to cross above the value 0: It could either move across it continuously, or it could jump from some value smaller than 0 to some value larger than 0. We now divide the analysis into two cases, depending on the probability that ft makes a very large jump over 0 at time θ.

Case 1: Small jump

Suppose that P [fθ ∈ [0, 1/2] | θ < 1] ≥ 1/2, i.e with constant probability, no large jump was made at time θ. The next two propositions show that with high probability the quadratic variation gained from time θ onward must be large, and that most of the quadratic variation is gained at times when,

18 just before the process jumps, the gradient is large. To make these notions precise, we will need the following definitions. Let  v  u !  u e  F = k∇f − k > α log 2 + α,t t 2 t P 2  i Infi (f)  be the event that the norm of the gradient is large just before time t; let   Et = sup fs ≥ 0 , 0≤s

n X X 2 Vα = (2t∂ift) 1 C 1E , Fα,t t i=1 t∈Ji∩[0,1] be the quadratic variation accumulated at times when the supremum is large but the gradient is small; and let n t1→t2 X X 2 V = (2t∂ift)

i=1 t∈Ji∩(t1,t2] be the gain in quadratic variation from time t1 up to and including time t2. Proposition 19. Let 0 < α < 1/e, and assume that (30) holds. There exists a function ρ : [0, 1] → R with limx→0 ρ (x) = 0 such that E [Vα] ≤ ρ (α) Var (f) . (32) hPn P 2 i Proof. We first express [V ] = (2t∂ f ) 1 C 1 as an integral, rather than a E α E i=1 t∈Ji∩[0,1] i t Fα Et sum over jumps. Since ∂ift is independent of coordinate i, we have that for t ∈ Ji, ∂ift = ∂ift− . Thus  n  X X 2 [V ] = (2t∂ f − ) 1 C 1 . E α E  i t Fα Et  i=1 t∈Ji∩[0,1] 2 The process g = (∂ f − ) 1 C 1 is measurable with respect to the filtration generated by {B } t i t Fα Et s 0≤s

n X X 2 E [Vα] = E 4t gt i=1 t∈Ji∩[0,1] Z 1 h 2 i (Lemma 13) = 2 t k∇f − k 1 C 1 dt E t 2 Fα Et 0 Z 1 h 2 i = 2 t k∇f k 1 C 1 dt, (33) E t 2 Fα Et 0

2 2 where the last equality is because k∇f − k 1 C 1 can differ from k∇f k 1 C 1 only at disconti- t 2 Fα Et t 2 Fα Et nuities.

19 Let δ0 > 0 be defined as

 c−1 log(1/α) P 2 1    Infi (f) ≤  e i 2 0 log 2+ P 2 δ = i Infi(f) 1 otherwise, where c is the universal constant from Theorem8, and set

δ = min δ0, 1 .

Consider the integral

Z 1 Z 1−δ Z 1 h 2 i h 2 i h 2 i tE k∇ftk2 1F C 1Et dt = tE k∇ftk2 1F C 1Et dt + tE k∇ftk2 1F C 1Et dt. 0 α,t 0 α,t 1−δ α,t The first integral on the right hand side is equal to 0 if δ = 1. Otherwise, we necessarily have P 2 that i√Infi (f) ≤ 1/2, in which case the integral can be bounded using equation (24): Since 1 − δ ≤ 1 − δ for all δ ∈ [0, 1], we have

Z 1−δ Z 1−δ h 2 i 2 tE k∇ftk2 1F C 1Et dt ≤ E t k∇ftk2 dt 0 α,t 0 √ Z 1−δ 2 ≤ E t k∇ftk2 dt 0 cδ (24) ! X 2 ≤ C1Var (f) Infi (f) i  P 2 −1 P 2 i Infi(f) log Infi(f) log ( i ) 2 P Inf (f)2+e ≤ C1Var (f) α i i

P 2 for some constant C1 > 0. Since i Infi (f) ≤ 1/2, the exponent    P 2 −1 P 2 i Infi(f) log i Infi (f) log P 2 is bounded below by 1/3. Since α < 1, we thus 2 i Infi(f) +e P 2 have that regardless of the value of i Infi (f) , Z 1−δ h 2 i 1/3 tE k∇ftk2 1F C 1Et dt ≤ C1α Var (f) . (34) 0 α,t

The second integral on the right hand side can be bounded using the definitions of Fα,t and Et: By definition of Fα,t we have

h 2−2p 2p i h 2−2p  2p i k∇f − k k∇ftk 1 C 1E = k∇f − k 1 C k∇ftk 1E E t 2 2 Fα,t t E t 2 Fα,t 2 t !!1−p e h i ≤ α2−2p log 2 + k∇f k2p 1 , (35) P 2 E t 2 Et i Infi (f)

20 whereas by definition of Et,   h 2p i 2p 1 + fs E k∇ftk2 1Et ≤ 2E k∇ftk2 sup 1Et s

c−1 log(1/α) Since in any case δ ≤ C2 ·   for some constant C2 > 0, we thus have e log 2+ P 2 i Infi(f) Z 1 h 2 i −1 2−2p tE k∇ftk2 1F C 1Et dt ≤ C2c α log (1/α) Var (f) . (36) 1−δ α,t

−1 Combining (34) and (36), there exists an absolute constant C := C1 + C2c > 0 such that Z 1  2   1/3 2−2p tE k∇ftk2 1F C 1Et dt ≤ C α + α log (1/α) Var (f) . (37) 0 α,t Plugging this into (33) finishes the proof, with ρ (x) = C x1/3 + x2−2p log (1/x).

Proposition 20. Let a ∈ [0, 1) and let 0 ≤ θ ≤ 1 be a Bt-measurable stopping time such that h i

P fθ ∈ [−a, a] θ < 1 ≥ q for some q ∈ [0, 1]. Then  1  1 V θ→1 ≥ q (1 − a)2 θ < 1 ≥ q (1 − a)2 . P 5 9 This proposition reflects the intuition that for a martingale to reach a point far from its initial position, it should have a large quadratic variation. The proof, however, requires using the particular details of the way the martingale jumps.

Proof. Let x > 0 be a number whose value will be chosen later, and let σ = inf t > θ | V θ→t ≥ x2 ∧ 1 be the first time that the quadratic variation grows beyond x2. Since the quadratic variation increases only when ft jumps, and since the probability of jumping at time t = 1 is 0, the event  θ→1 2 V ≥ x | θ < 1 is equal to the event {σ < 1 | θ < 1}, which in turn implies that |fσ| < 1: h i h i θ→1 2 P V ≥ x θ < 1 ≥ P |fσ| < 1 θ < 1 . (38)

21 Hence it suffices to bound the probability that |fσ| < 1. This can be done by looking at the second 2 moment E (fσ − fθ) : On one hand, since fθ ∈ [−a, a] with probability at least q when θ < 1, we have h i h i h i 2 2 E (fσ − fθ) θ < 1 ≥ E (fσ − fθ) | {θ < 1} ∩ {|fσ| = 1} P |fσ| = 1 θ < 1 h i 2 ≥ q (1 − a) P |fσ| = 1 θ < 1 , giving h i 2 h i E (fσ − fθ) θ < 1 P |fσ| < 1 θ < 1 ≥ 1 − . (39) q (1 − a)2

h 2 i On the other hand, E (fσ − fθ) | θ < 1 can be bounded by considering the size of the jumps of ft: Since the σ-algebra generated by the event θ < 1 is contained in that generated by Bθ, h i h h i i 2 2 E (fσ − fθ) θ < 1 = E E (fσ − fθ) | Bθ θ < 1 .

Conditioned on Bθ, the process ft is a martingale for t ∈ [θ, σ], and so by (19), h h i i h i 2 E E (fσ − fθ) | Bθ θ < 1 = E E [([f]σ − [f]θ) | Bθ] θ < 1 h i

= E [f]σ − [f]θ θ < 1 h i θ→σ = E V θ < 1 h − i θ→σ 2 = E V + (∆fσ) θ < 1 .

By definition of σ, V θ→σ− ≤ x2, and since all jumps are bounded by 2, h i h i 2 2 2 E (fσ − fθ) θ < 1 ≤ x + E (∆fσ) θ < 1 h i h i 2 2 ≤ x + x P ∆fσ < x θ < 1 + 4 · P ∆fσ ≥ x θ < 1  h i h i 2 2 = x + x 1 − P ∆fσ ≥ x θ < 1 + 4P ∆fσ ≥ x θ < 1 h i 2 2 = 2x + P ∆fσ ≥ x θ < 1 4 − x . (40)

Plugging this into (38) and (39), we get

2 h i 2 2x + ∆fσ ≥ x θ < 1 4 − x h θ→1 2 i P P V ≥ x θ < 1 ≥ 1 − . q (1 − a)2

 θ→1 2 Since {∆fσ ≥ x | θ > 1} ⊆ V ≥ x | θ > 1 , this gives h i 2x2 + V θ→1 ≥ x2 θ < 1 4 − x2 h θ→1 2 i P P V ≥ x θ < 1 ≥ 1 − . q (1 − a)2

22  θ→1 2  Solving for P V ≥ x | θ < 1 , we have 2 h θ→1 2 i 4 + x P V ≥ x θ < 1 ≥ 1 − . q (1 − a)2 + 4 − x2

2 In particular, for x2 = 4q(1−a) ≥ 1 q (1 − a)2, we get 16+p(1−a)2 5  1  1 V θ→1 ≥ q (1 − a)2 θ < 1 ≥ q (1 − a)2 . P 5 9

With the above two propositions, we can show that P [τα < 1 | θ < 1] ≥ CVar (f). On one hand, by Proposition 19, the expected increase in the quadratic variation from time θ to time 1 of paths T c with small gradient must be small: Denoting Aα,θ = t∈(θ,1] Fα,t to be the event that the gradient was small at all times from θ to 1, we have

  h θ→1 i X X 2 V 1A θ < 1 ≤ (2∂ift) 1 C θ < 1 (41) E α,θ E  Fα,t  i t∈Ji∩(θ,1]   X X 2 (since θ < 1) = (2∂ift) 1 C 1E θ < 1 E  Fα,t t  i t∈Ji∩(θ,1] hP P 2 i E i t∈J ∩(θ,1] (2∂ift) 1F C 1Et = i α,t P [θ < 1] [V ] (32),(31) ≤ E α ≤ 4ρ (α) . P [θ < 1] On the other hand, by invoking Proposition 20 with q = 1/2 and a = 1/2, the overall increase in quadratic variation is large with high probability:  1  1 V θ→1 > θ < 1 ≥ . P 40 72 We then have h i 1  1   V θ→11 θ < 1 ≥ V θ→1 > ∩ A θ < 1 E Aα,θ 40P 40 α,θ 1   1  h i ≥ V θ→1 > θ < 1 − AC θ < 1 . 40 P 40 P α,θ

h C i Solving for P Aα,θ | θ < 1 gives

h i  1  h i AC θ < 1 ≥ V θ→1 > θ < 1 − 40 V θ→11 θ < 1 P α,θ P 40 E Aα,θ 1 ≥ − Cρ (α) . 72

23 Since limα→0 ρ (α) = 0, this last expression is larger than some constant c for small enough C ∗ α. But the event Aα,θ means that at some time t > θ, the gradient k∇ft∗ k2 was larger r  e  1+fs than α log 2 + P 2 , while the event θ < 1 implies that sup0≤s≤θ 2 ≥ 1/2, yielding i Infi(f) p 1   e  Ψt∗ ≥ 2 α log 2 + P 2 . Thus i Infi(f) h i h i C P τα < 1 θ < 1 ≥ P Aα,θ θ < 1 > c as needed.

Case 2: Large jump   1   Suppose now that P fθ ∈ 0, 2 | θ < 1 < 1/2, meaning that with large probability ft makes a large jump at time θ to some value greater than 1/2:

 1  1 ∆f ≥ θ < 1 ≥ . (42) P θ 2 2

The next proposition, parallel to Proposition 19, shows that most of the quadratic variation is gained at times when, just after the process jumped, the gradient was large. For a fixed α > 0 whose value is to be chosen later, let  v  u !  u e  H = k∇f k > α log 2 + α,t t 2 t P 2  i Infi (f)  be the event that the norm of the gradient is large exactly at time t, and let

n X X 2 Uα = (2t∂ift) 1 C 1 Hα,t {ft≥0} i=1 t∈Ji∩[0,1] be the quadratic variation accumulated at times when f’s value is large but the gradient is small. Proposition 21. Let 0 < α < 1/e, and assume that (30) holds. There exists a function ρ : [0, 1] → R with limx→0 ρ (x) = 0 such that

E [Uα] ≤ ρ (α) Var (f) . (43)

Proof. The proof is somewhat similar to that of Proposition 19. Observe that the random variable 1 C 1 is a function only of Bt; there is therefore a continuous “interpolating” function h : Hα,t {ft≥0} n [−1, 1] → R such that 1 C 1 ≤ h (Bt) ≤ 1Hc 1 . Hα,t {ft≥0} 2α,t {ft≥−1/2}

24 2 Invoking Lemma 13 with gt = (∂ift) h (Bt), we have

 n   n  X X 2 X X 2 [Uα] = (2t∂ift) 1 C 1 ≤ 4t gt E E  Hα,t {ft≥0} E   i=1 t∈Ji∩[0,1] i=1 t∈Ji∩[0,1] Z 1 h 2 i (22) = 2 tE k∇ftk2 h (Bt) dt 0 Z 1 h 2 i ≤ 2 t k∇f k 1 c 1 dt. (44) E t 2 H2α,t {ft≥−1/2} 0 We now define δ as in Proposition 19, and split the integral in two: Z 1 Z 1−δ Z 1 (...) dt = (...) dt + (...) dt. 0 0 1−δ The first integral on the right hand side is dealt with exactly as in Proposition 19, yielding Z 1−δ h 2 i 1/3 tE k∇ftk2 1HC 1{ft≥−1/2} dt ≤ C1α Var (f) . (45) 0 α,t

For the second integral on the right hand side, we again use the fact that Ψt is bounded: By definition of H2α,t,

h 2 i h 2−2p  2p i k∇ftk 1Hc 1{f ≥−1/2} = k∇ftk 1 C k∇ftk 1{f ≥−1/2} E 2 2α,t t E 2 H2α,t 2 t  !!1−p  e ≤ (2α)2−2p log 2 + k∇f k2p 1 , E  P 2 t 2 {ft≥−1/2} i Infi (f) (46) whereas !!p h i  1 + f  e k∇f k2p 1 ≤ 4 k∇f k2p sup s ≤ 4Ψ ≤ 4Var (f) log 2 + . E t 2 {ft≥−1/2} E t 2 2 t P 2 s≤t i Infi (f) Plugging the above display into (46), we therefore have

Z 1 ! h 2 i 2−2p e t k∇ftk 1 C 1{f≥−1/2} dt ≤ 16δα Var (f) log 2 + . E 2 H2α,t P 2 1−δ i Infi (f)

c−1 log(1/α) Since in any case δ ≤ C2 ·   for some constant C2 > 0, we thus have e log 2+ P 2 i Infi(f) Z 1 h 2 i −1 2−2p tE k∇ftk2 1HC 1{f≥−1/2} dt ≤ 16C2c α log (1/α) Var (f) . (47) 1−δ 2α,t −1 Combining (45) and (47), there exists an absolute constant C := C1 + 16C2c > 0 such that Z 1 h 2 i  1/3 2−2p tE k∇ftk2 1HC 1{ft≥−1/2} dt ≤ C α + α log (1/α) Var (f) . 0 2α,t Plugging this into (44) finishes the proof with ρ (x) = C x1/3 + x2−2p log (1/x).

25 With this proposition in hand, we can show that P [τα < 1] ≥ CVar (f). Consider the event that r  e  at time θ, both k∇fθk2 < α log 2 + P 2 and ∆fθ ≥ 1/2. Since θ is the first time that i Infi(f) ft ≥ 0, and since with probability 1 there is no jump at time 1, this event contributes at least 1/4 to Uα; thus  v   u !   1  u e   1 U ≥ ≥ k∇f k ≤ αtlog 2 + ∩ ∆f ≥ P α 4 P  θ 2 P 2 θ 2   i Infi (f)  v  u !  u e 1  1 = k∇f k ≤ αtlog 2 + ∆f ≥ ∆f ≥ P  θ 2 P 2 θ 2 P θ 2 i Infi (f) v  u !  u e 1 1 (42) , (31) ≥ k∇f k ≤ αtlog 2 + ∆f ≥ Var (f) . P  θ 2 P 2 θ 2 8 i Infi (f)

On the other hand, by Markov’s inequality and Proposition 21, this probability is upper-bounded by  1 [U ] U ≥ ≤ E α ≤ 4ρ (α) Var (f) . P α 4 1/4 Combining the two displays, we get v  u !  u e 1 k∇f k ≤ αtlog 2 + ∆f ≥ ≤ 4ρ (α) . P  θ 2 P 2 θ 2 i Infi (f)

Taking α small enough so that the right hand side is smaller than 1/2, we have

 1 1 H ∆f ≥ > . P α,θ θ 2 2

For this α, under the event Hα,θ ∩ {θ < 1}, Ψθ is large: !!p 1 + f 1 + f 1 e Ψ = k∇f k2p sup s ≥ k∇f k2p θ ≥ α log 2 + , θ θ 2 2 θ 2 2 2 P 2 0≤s≤θ i Infi (f) and so h i  1  1  τ < 1 θ < 1 ≥ τ < 1 θ < 1 and ∆f ≥ ∆f ≥ θ < 1 P α P α θ 2 P θ 2  1 1 1 (42) ≥ H ∆f ≥ · ≥ P α,θ θ 2 2 4 as needed.

26 5 Talagrand’s influence inequality and its stability

The proofs of Theorems2 and5 are similar in spirit to that of Theorem3, and again require bounding the gain in quadratic variation. However, extra care is needed to bound the size the (i) individual influence processes ft . We first define several quantities which will be central to our proofs. For a fixed 0 < α ≤ 1 whose value will be chosen later, let

n (i) o Fα = ∃t ∈ [0, 1] , ∃i ∈ [n] | t ∈ Ji and ft ≥ α (48) be the event that a coordinate had large derivative at the time it jumped; let

Z 1 2 (i)  (i) Q = 2 t f 1 (i) dt, α t f <α 0 t and n X (i) Qα = Qα ; i=1 and let (i) X 2 Vα = (2t∂ift) 1 (i) ft <α t∈Ji∩[0,1] and n X (i) Vα = Vα . i=1

Vα can be thought of as the quadratic variation of the process ft, but where big jumps (i.e those larger than tα) are excluded. Finally, define

 1  ρ (x) = x log + 2 . x

Instead of using Theorem8 to bound influences, we use the following lemma:

Lemma 22. There exists a universal constant γ > 1 so that

1+s/(2γ) ϕi (s) ≤ γInfi (f) (49) for all 0 ≤ s ≤ γ.

This lemma can be derived from the hypercontractivity principle (see e.g [CEL12] and [O’D14, Cor. 9.25]). However, we give a different proof based on the analysis of the stochastic process ft; this analysis can be pushed further to obtain the stability results. On an intuitive level and in light of equation (22) the lemma shows that all of the “action” which contributes to the variance of the function happens very close to time 1.

Proof of Lemma 22. Let γ > 1 to be chosen later. We start by showing that there exists a constant 0 cγ > 0 such that 0 1+cγ ϕi (γ) ≤ γϕi (0) . (50)

27 Recall that ψi (t) = ϕi (log 1/t); by applying Corollary 15 to the function fi, we see that ψi satisfies

dψi (i) 2 = 2tE ∇ft . (51) dt 2 (i) The right hand side of equation (51) can be bounded using Lemma9: Taking g = f and x = Bt in equation (16) and substituting this in equation (51), we have   dψi L  (i)2 e ≤ 2t E  ft log  . dt (1 − t)4   (i)2  ft

For t ≤ 1/2,   dψi  (i)2 e ≤ 16LE  ft log  dt   (i)2  ft    (i)2 e (Jensen’s inequality) ≤ 16LE ft log    (i)2 E ft e = 16Lψi (t) log . (52) ψi (t)

By Lemma 17, there exists a time t0 ≤ 1/2 and a constant K such that for all t ∈ [0, t0],

e−16Lt 2e−16Lt ψi (t) ≤ Kψi (0) = KInfi (f) ,    (i)2 where in the last equality we used equation (27) and the fact that ψi (0) = E f0 . Since −s ϕi (s) = ψi (e ), if γ ≥ log (1/t0) then

−γ 2e−16Le ϕi (γ) ≤ KInfi (f)

 −16Le−γ  1+ 2ee −1 = KInfi (f) .

0 −16Le−γ Setting cγ = 2e − 1 and taking γ larger than K gives the desired result: Equation (50) 2  (i) (i)2 0 follows because ϕi (0) = E f1 = E f = Infi (f) by equation (14). Note that cγ > 0 only if γ > log 16L − log log 2. Using equation (50) together with the log-convexity from Lemma 16, for all 0 ≤ s ≤ γ we can bound ϕi (s) by  s  s  ϕ (s) = ϕ 1 − · 0 + · γ i i γ γ 1−s/γ s/γ ≤ ϕi (0) ϕi (γ) 0 1−s/γ (1+cγ )s/γ ≤ ϕi (0) ϕi (0)

1+cγ s = γInfi (f)

28 0  −16Le−γ  as needed, with cγ = cγ/γ = 2e − 1 /γ. The theorem then follows by taking γ large 0 enough so that γ ≥ log 16L − log log 2 and and cγ ≥ 1/2. The following two propositions are somewhat analogous to Propositions 19 and 20.

Proposition 23. Let 0 < α ≤ 1. Then

2 E [Qα] ≤ 4γ ρ (α) T (f) , where γ is the universal constant from Lemma 22.

1 Proposition 24. If 0 < α < 1/8 and P [Fα] < 200 Var (f), then  1  1 V ≥ > Var (f) . P α 64 16

The proofs of Propositions 23 and 24 are of a very similar nature to those of Theorem8 and Propositions 19 and 20.

Pn (i) Proof of Proposition 23. Since Qα = i=1 Qα , it is enough to show that

h (i)i 2 Infi (f) E Qα ≤ 4γ ρ (α) . 1 + log (1/Infi (f))

 (i) 2 Denoting ϕ˜ (s) = f 1 (i) , by change of variables we get i E e−s f <α e−s Z ∞ h (i)i −2s E Qα ≤ 2 e ϕ˜i (s) ds. (53) 0

log(1/α) 1 Let τ = 1 . Assume first that τ ≤ 2 γ. The integral in equation (53) then splits up 2+ 2γ log(1/Infi(f)) into three parts: Z τ Z γ Z ∞ h (i)i −2s −2s −2s E Qα ≤ 2 e ϕ˜i (s) ds + 2 e ϕ˜i (s) ds + 2 e ϕ˜i (s) ds. (54) 0 τ γ For the first integral on the right hand side, we write

ϕ˜ (s) ≤ α f (i) i E e−s (i) (i) = αEfe−s = αf0 = αInfi (f) . Thus Z τ −2s e ϕ˜i (s) ds ≤ ατInfi (f) 0 Inf (f) (by choice of τ) ≤ 2γ i α log (1/α) . (55) 1 + log (1/Infi (f))

29 For the second and third integrals, we use the fact that trivially, ϕ˜i (s) ≤ ϕi (s) for all s. By Lemma 1+s/2γ 22, for s ∈ [τ, γ] we then have ϕ˜i (s) ≤ γInfi (f) . The second integral is therefore bounded by Z γ Z γ −2s −2s 1+s/2γ e ϕ˜i (s) ds ≤ γ e Infi (f) ds τ τ Z ∞ −2s 1+s/2γ ≤ γ e Infi (f) ds τ Z ∞  1  s 2γ log Infi(f)−2 = γInfi (f) e ds τ Inf (f)  1  i τ 2γ log Infi(f)−2 ≤ γ 1 e 2 + 2γ log (1/Infi (f)) Inf (f) ≤ 2γ2 i α. (56) 1 + log (1/Infi (f))

For the third integral, we use the fact that ϕi (s) is a decreasing function in s (as can be seen from 1 R ∞ −2s R γ −2s equation (28)). Since γ > 1 and τ ≤ 2 γ, we immediately have γ e ϕ˜i (s) ds ≤ τ e ϕ˜i (s) ds. 1 Putting these bounds together, when τ < 2 γ we get that   h (i)i Infi (f) 2 Infi (f) E Qα ≤ 2 2γ α log (1/α) + (2 + 2) γ α 1 + log (1/Infi (f)) 1 + log (1/Infi (f)) Inf (f) = 4γ2ρ (α) i . 1 + log (1/Infi (f))

1 Now assume that τ ≥ 2 γ. The integral in equation (53) then splits up into two parts: Z τ Z ∞ h (i)i −2s −2s E Qα ≤ 2 e ϕ˜i (s) ds + 2 e ϕ˜i (s) ds. 0 τ

1 1 Again, since ϕi (s) is decreasing as a function of s and since τ ≥ 2 γ > 2 , the second integral is smaller than the first, and so by (55),

h (i)i 2 Infi (f) 2 Infi (f) E Qα ≤ 2 · 2γ α log (1/α) ≤ 4γ ρ (α) 1 + log (1/Infi (f)) 1 + log (1/Infi (f)) in this case as well.

Proof of Proposition 24. Assume without loss of generality that f0 = Ef ≤ 0 (if not, use −f instead of f; the variances and the probability P [Vα ≥ x] are the same for both functions). Let τ = inf {0 ≤ t ≤ 1 | ft ∈ (0, 2α)} ∧ 1. By conditioning on the event {τ < 1}, for any x > 0 we have h i

P [[f]1 ≥ x] ≥ P [f]1 ≥ x τ < 1 P [τ < 1] .

We start by bounding the probability P [τ < 1]. Let A = {∃t ∈ [0, 1] s.t ft > 0}, and observe that {τ < 1} ⊆ A. Under the event A\{τ < 1}, the process ft never visited the interval (0, 2α) and yet at some point reached a value larger than 0, and so necessarily had a jump discontinuity of (i) size at least 2α. But a jump occurring at time t due to a discontinuity in Bt is of size 2t |∂ift|,

30 (i) and so 2t |∂ift| ≥ 2α, implying that ft ≥ |∂ift| ≥ α. Thus, A ∩ {τ = 1} ⊆ A ∩ Fα, and so C A ∩ Fα ⊆ A ∩ {τ < 1} = {τ < 1}. Hence

P [τ < 1] ≥ P [A\Fα] ≥ P [A] − P [Fα] .

To bound P [A], note that {f1 = 1} ⊆ A. By the martingale property of ft,

f0 = Ef1 = 2P [f1 = 1] − 1, and so 2 1 + f0 1 − f0 Var (f) 1 P [A] ≥ P [f1 = 1] = = = ≥ Var (f) . 2 2 (1 − f0) 2 (1 − f0) 4 1 Putting this together with the assumption that P [Fα] < 200 Var (f) gives 1 [τ < 1] ≥ Var (f) . (57) P 8 h i

Next we bound the probability P [f]1 ≥ x τ < 1 , by relating the quadratic variation to the variance of ft. Let σ be the stopping time σ = inf {s ≥ τ | [f]s ≥ x} ∧ 1. Since the σ-algebra generated by the event {τ < 1} is contained in that generated by Bτ , h i h h i i 2 2 E (fσ − fτ ) τ < 1 = E E (fσ − fτ ) Bτ | τ < 1 .

Conditioned on Bτ , the process ft is a martingale for t ∈ [τ, σ], and so by (19), h h i i h h i i 2 E E (fσ − fτ ) Bτ τ < 1 = E E ([f]σ − [f]τ ) Bτ | τ < 1 h i

= E [f]σ − [f]τ τ < 1 h i h i = ([f] − [f] ) 1F τ < 1 + ([f] − [f] ) 1 C τ < 1 . (58) E σ τ α E σ τ Fα

For the first term on the right hand side, observe that [f]σ − [f]τ ≤ x + 4: Since [f]σ is the sum of squares of the jumps of ft up to time σ, the largest value it can attain is x plus the square of the jump which occurred at time σ, and the size of this jump is bounded by 2. Thus h i h i ([f] − [f] ) 1 τ < 1 ≤ (x + 4) 1 τ < 1 E σ τ Fα E Fα [F ∩ {τ < 1}] = (x + 4) P α P [τ < 1] [F ] 8 (x + 4) ≤ (x + 4) P α ≤ , (59) P [τ < 1] 200 where the last inequality is by the assumption on P [Fα] and equation (57). For the second term on the right hand side, since the event 1 C forces all jumps to be of size Fα smaller than 2α, we similarly have

2 ([f] − [f] ) 1 C ≤ x + 4α . (60) σ τ Fα

Plugging displays (59) and (60) into (58), we get

$$\mathbb{E}\left[(f_\sigma - f_\tau)^2 \mid \tau < 1\right] \le x + 4\alpha^2 + \frac{8(x+4)}{200}. \tag{61}$$

On the other hand,

$$\mathbb{E}\left[(f_\sigma - f_\tau)^2 \mid \tau < 1\right] \ge \mathbb{E}\left[(f_\sigma - f_\tau)^2 \mid \tau < 1 \text{ and } |f_\sigma| = 1\right]\mathbb{P}\left[|f_\sigma| = 1 \mid \tau < 1\right] \ge (1-2\alpha)^2\,\mathbb{P}\left[|f_\sigma| = 1 \mid \tau < 1\right],$$

and so together with (61), plugging in $x = 1/64$ and $\alpha < 1/8$,

$$\mathbb{P}\left[|f_\sigma| < 1 \mid \tau < 1\right] \ge 1 - \frac{x + 4\alpha^2 + \frac{8(x+4)}{200}}{(1-2\alpha)^2} \ge \frac{259}{450} > \frac{5}{9}.$$

Now, if $|f_\sigma| \neq 1$, then $f_\sigma$ stopped because $[f]_\sigma$ was larger than or equal to $x$. Since $[f]_s$ is increasing as a function of $s$, $[f]_1 \ge x$ as well, and so

$$\mathbb{P}\left[[f]_1 \ge \frac{1}{64} \mid \tau < 1\right] \ge \mathbb{P}\left[|f_\sigma| < 1 \mid \tau < 1\right] \ge \frac{5}{9}. \tag{62}$$

Combining (57) and (62) gives

$$\mathbb{P}\left[[f]_1 \ge \frac{1}{64}\right] \ge \frac{5}{72}\mathrm{Var}(f).$$

Under the event $F_\alpha^C$ we have that $V_\alpha = [f]_1$, and by a union bound we get

$$\mathbb{P}\left[V_\alpha \ge \frac{1}{64}\right] \ge \mathbb{P}\left[[f]_1 \ge \frac{1}{64}\right] - \mathbb{P}[F_\alpha] \ge \frac{5}{72}\mathrm{Var}(f) - \frac{1}{200}\mathrm{Var}(f) > \frac{1}{16}\mathrm{Var}(f).$$
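The identity driving this proof, $\mathbb{E}[f]_1 = \mathrm{Var}(f)$ from (19), is easy to probe numerically. The following Python sketch (illustrative only, not part of the proof) estimates $\mathbb{E}[f]_1$ for the parity function; it uses only two features of the process $B_t$ from Section 3, namely that each coordinate's jump times form a Poisson process with intensity $1/2t$ and that a jump at time $t$ flips the sign of $B_t^{(i)} = \pm t$. Truncating at a small $t_0$ is an artifact of the simulation (the intensity is not integrable at $0$) and costs only $O(t_0^2)$; the dimension, seed and trial count are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n, t0, trials = 3, 0.01, 5000
cube = list(itertools.product([-1, 1], repeat=n))
f = {y: float(np.prod(y)) for y in cube}        # parity; Var(f) = 1

def F(x):
    """Multilinear (harmonic) extension of f to [-1, 1]^n."""
    return sum(f[y] * np.prod([(1 + xi * yi) / 2 for xi, yi in zip(x, y)])
               for y in cube)

def quadratic_variation():
    """Sum of squared jumps of f_t = F(B_t) over jump times in (t0, 1]."""
    jumps = []                                   # (time, coordinate) pairs
    for i in range(n):
        m = rng.poisson(0.5 * np.log(1 / t0))    # Poisson count for rate 1/2t
        jumps += [(t0 ** rng.uniform(), i) for _ in range(m)]
    jumps.sort()
    sign = rng.choice([-1.0, 1.0], size=n)       # signs of B at time t0
    qv = 0.0
    for t, i in jumps:
        before = F(sign * t)
        sign[i] *= -1                            # the jump flips coordinate i
        qv += (F(sign * t) - before) ** 2
    return qv

print(np.mean([quadratic_variation() for _ in range(trials)]))  # close to Var(f) = 1
```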

5.1 Proof of Theorem 2

Proof of Theorem 2. Let $\gamma$ be the constant from the statement of Lemma 22. By Proposition 23, for every $0 < \alpha \le 1$, we have

$$\mathbb{E}[Q_\alpha] \le 4\gamma^2\rho(\alpha)\,T(f).$$

Choosing $\alpha = 1$ just gives $Q_\alpha = 2\sum_{i=1}^n \int_0^1 t\,(\partial_i f_t)^2\,dt$, since the derivatives are bounded by $1$; the expectation of this expression, as seen in (26), is larger than $\mathrm{Var}(f)$. We thus have

$$\mathrm{Var}(f) \le \mathbb{E}[Q_\alpha] \le 4\gamma^2\rho(1)\,T(f) = 8\gamma^2\cdot T(f).$$
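Although the constant $8\gamma^2$ is not explicit, the inequality can be sanity-checked by computing the ratio $\mathrm{Var}(f)/T(f)$ directly. Below is a minimal sketch, assuming (as in the statement quoted in the introduction) that $T(f)$ denotes the Talagrand functional $\sum_i \mathrm{Inf}_i(f)/(1+\log(1/\mathrm{Inf}_i(f)))$; the dimension, seed and number of samples are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
size = 2 ** n

def flip(idx, i):
    """Index of the point with coordinate i flipped (lexicographic ordering)."""
    return idx ^ (1 << (n - 1 - i))

def var_over_T(f):
    """Var(f) divided by T(f) = sum_i Inf_i / (1 + log(1/Inf_i))."""
    T = 0.0
    for i in range(n):
        inf_i = np.mean(f != f[[flip(idx, i) for idx in range(size)]])
        if inf_i > 0:
            T += inf_i / (1 + np.log(1 / inf_i))
    return f.var() / T

ratios = [var_over_T(rng.choice([-1.0, 1.0], size=size)) for _ in range(100)]
print("max Var(f)/T(f) over random functions:", max(ratios))  # stays O(1)
```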

5.2 Proof of Theorem 5

Using Propositions 23 and 24, we can obtain the following lemma.

Lemma 25. Let $\gamma$ be the constant from Lemma 22, and assume that $0 < \alpha < 1/16$ is small enough so that $4\gamma^2\rho(\alpha)T(f) \le \frac{1}{1024}\mathrm{Var}(f)$. Then

$$\mathbb{P}[F_\alpha] \ge \frac{1}{200}\mathrm{Var}(f).$$

Proof. Suppose by contradiction that $\mathbb{P}[F_\alpha] < \frac{1}{200}\mathrm{Var}(f)$. Since neither $\partial_i f_t$ nor $f_t^{(i)}$ depends on coordinate $i$, we may write

$$\mathbb{E}\left[V_\alpha^{(i)}\right] = \mathbb{E}\sum_{t\in J_i\cap[0,1]} (2t\,\partial_i f_t)^2\,\mathbf{1}_{f_t^{(i)}<\alpha} = \mathbb{E}\sum_{t\in J_i\cap[0,1]} (2t\,\partial_i f_{t-})^2\,\mathbf{1}_{f_{t-}^{(i)}<\alpha}.$$

Invoking Lemma 13 with $g_t = (\partial_i f_{t-})^2\,\mathbf{1}_{f_{t-}^{(i)}<\alpha}$, we get

$$\mathbb{E}\left[V_\alpha^{(i)}\right] = \mathbb{E}\sum_{t\in J_i\cap[0,1]} (2t\,\partial_i f_{t-})^2\,\mathbf{1}_{f_{t-}^{(i)}<\alpha} \le 2\,\mathbb{E}\int_0^1 t\,(\partial_i f_{t-})^2\,\mathbf{1}_{f_{t-}^{(i)}<\alpha}\,dt = 2\,\mathbb{E}\int_0^1 t\,(\partial_i f_t)^2\,\mathbf{1}_{f_t^{(i)}<\alpha}\,dt = \mathbb{E}\left[Q_\alpha^{(i)}\right].$$

Using Proposition 23, this means that

$$\mathbb{E}[V_\alpha] \le 4\gamma^2\rho(\alpha)\,T(f).$$

On the other hand, by Proposition 24 and Markov's inequality,

$$\mathbb{E}[V_\alpha] \ge \frac{1}{64}\,\mathbb{P}\left[V_\alpha \ge \frac{1}{64}\right] > \frac{1}{1024}\mathrm{Var}(f),$$

contradicting the assumption that $4\gamma^2\rho(\alpha)T(f) \le \frac{1}{1024}\mathrm{Var}(f)$.

The main assertion involved in proving Theorem 5 connects the vertex boundary with the probability that the function makes a large jump.

Proposition 26. For $0 < \alpha \le 1$, let $F_\alpha$ be the event defined in equation (48). Then

$$\mu\left(\partial^{\pm} f\right) \ge \frac{1}{2}\,\alpha\,\mathbb{P}[F_\alpha]. \tag{63}$$

To prove this proposition, we will construct a modification $\tilde B_t$ of $B_t$, which can be thought of as a "hesitant" version of $B_t$. For each coordinate $i$, let $\tilde J_i$ be the jump set of a Poisson point process on $(0,1]$ with intensity $1/2t$, independent from $B_t$ (and in particular, independent from the jump process $J_i = \mathrm{Jump}(B_t^{(i)})$). Define $\tilde B_t = (\tilde B_t^{(1)}, \ldots, \tilde B_t^{(n)})$ to be the process such that for every $i$,

$$\tilde B_t^{(i)} = \begin{cases} 0, & t \in J_i \cup \tilde J_i, \\ B_t^{(i)}, & \text{otherwise.} \end{cases}$$

Loosely speaking, there are several ways of thinking about $\tilde B_t^{(i)}$:

1. The process $\tilde B_t^{(i)}$ can be seen as a "hesitant" variation of $B_t^{(i)}$: it jumps with twice the rate (since its set of discontinuities is the union of two Poisson processes with rate $1/2t$), but half of those times it returns to the original sign rather than inverting it. We refer to this view as the "standard coupling" of $\tilde B_t^{(i)}$ with $B_t^{(i)}$: the process $\tilde B_t^{(i)}$ is a copy of $B_t^{(i)}$, but with additional independent hesitant jumps.

2. The process $\tilde B_t^{(i)}$ is equal to $0$ at a discrete set of times which follows the law of a Poisson point process with intensity $1/t$ (this is the union $J_i \cup \tilde J_i$); between two successive zeros it chooses randomly to be either $t$ or $-t$, each with probability $1/2$ (a simulation sketch of this description appears below).

Similarly to the notation using $B_t$, we write $\tilde f_t = f(\tilde B_t)$, and analogously $\partial_i\tilde f_t$, $\nabla\tilde f_t$ and $\tilde f_t^{(i)}$.
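Description 2 gives the most direct way to simulate one coordinate of $\tilde B_t$. The sketch below is illustrative only: since the intensity $1/t$ is not integrable at $0$, the path is sampled on a truncated window $[t_0, 1]$ (the truncation and all constants are artifacts of the sketch, not of the construction). Note that $\tilde B_1$ comes out uniform on $\{-1,1\}$, as used in the proof of Proposition 26 below.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_times(t0=1e-3, rate_scale=0.5):
    """Poisson point process on [t0, 1] with intensity rate_scale / t.

    The number of points is Poisson with mean rate_scale * log(1/t0); given
    the count, each point is t0**(1 - U) with U uniform (inverse-CDF sampling).
    """
    count = rng.poisson(rate_scale * np.log(1.0 / t0))
    return np.sort(t0 ** (1.0 - rng.uniform(size=count)))

def hesitant_coordinate(t0=1e-3):
    """One coordinate of the hesitant process, via description 2 above.

    Zeros occur at the union of J_i and J~_i (two independent rate-1/2t
    processes, so total rate 1/t); between consecutive zeros the path equals
    +t or -t, each with probability 1/2.
    """
    zeros = np.sort(np.concatenate([poisson_times(t0), poisson_times(t0)]))
    signs = rng.choice([-1, 1], size=len(zeros) + 1)  # one sign per interval
    def B(t):
        if t in zeros:
            return 0.0
        return signs[np.searchsorted(zeros, t)] * t
    return zeros, B

zeros, B = hesitant_coordinate()
print(len(zeros), "zeros; B(1) =", B(1.0))  # B(1) is +1 or -1, uniformly
```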

Lemma 27. The process $\tilde B_t$ is a martingale.

Proof. Let $0 \le s < t \le 1$. For $s = 0$, since $\tilde B_0 = 0$ always, we trivially have $\mathbb{E}[\tilde B_t^{(i)}] = 0$, so assume $s > 0$.

If $\tilde B_s^{(i)} = 0$, then $s \in J_i \cup \tilde J_i$. Being independent Poisson point processes, almost surely we have $J_i \cap \tilde J_i = \emptyset$, and $\mathbb{P}[s \in J_i \mid \tilde B_s^{(i)}] = \mathbb{P}[s \in \tilde J_i \mid \tilde B_s^{(i)}] = 1/2$. Since $\tilde B_t^{(i)} = B_t^{(i)}$ almost surely, we thus have

$$\mathbb{E}\left[\tilde B_t^{(i)} \mid \tilde B_s^{(i)}\right] = \frac{1}{2}\mathbb{E}\left[B_t^{(i)} \mid s \in J_i, \tilde B_s^{(i)}\right] + \frac{1}{2}\mathbb{E}\left[B_t^{(i)} \mid s \in \tilde J_i, \tilde B_s^{(i)}\right] = \frac{1}{2}\mathbb{E}\left[B_t^{(i)} \mid s \in J_i, B_s^{(i)}\right] + \frac{1}{2}\mathbb{E}\left[B_t^{(i)} \mid s \in \tilde J_i, B_s^{(i)}\right].$$

It is evident from the definition of the process $B_t$ that

$$\mathbb{E}\left[B_t^{(i)} \mid s \in J_i, B_s^{(i)}\right] = -\mathbb{E}\left[B_t^{(i)} \mid s \in \tilde J_i, B_s^{(i)}\right],$$

so that $\mathbb{E}[\tilde B_t^{(i)} \mid \tilde B_s^{(i)}] = 0 = \tilde B_s^{(i)}$.

Finally, if $\tilde B_s^{(i)} \neq 0$, then since $\tilde B_t^{(i)} \neq 0$ almost surely, we have by (20) that

$$\mathbb{P}\left[\operatorname{sign}\tilde B_t^{(i)} \neq \operatorname{sign}\tilde B_s^{(i)} \mid \tilde B_s^{(i)}\right] = \frac{t-s}{2t}.$$

Thus

$$\mathbb{E}\left[\tilde B_t^{(i)} \mid \tilde B_s^{(i)}\right] = \operatorname{sign}\tilde B_s^{(i)}\cdot t\cdot\left(1-\frac{t-s}{2t}\right) + \operatorname{sign}\tilde B_s^{(i)}\cdot(-t)\cdot\frac{t-s}{2t} = \operatorname{sign}\tilde B_s^{(i)}\cdot s = \tilde B_s^{(i)}.$$

Proof of Proposition 26. In order to distinguish between the vertex boundaries, we will use the hesitant jump process $\tilde f_t$ defined above. We prove (63) for the inner vertex boundary $\partial^+$; the proof for $\partial^-$ is identical. Let $\tau = \inf\{t > 0 \mid \exists i \in [n] \text{ s.t. } \tilde B_t^{(i)} = 0 \text{ and } f_t^{(i)} \ge \alpha\} \wedge 1$. Note that for any $t_0 > 0$, we almost surely have that $\tilde B_t^{(i)} = 0$ only finitely many times for $t \in [t_0, 1]$. Thus, if $0 < \tau < 1$, then the infimum in the definition of $\tau$ is attained as a minimum, and there exists an $i_0$ such that $\tilde B_\tau^{(i_0)} = 0$ and $f_\tau^{(i_0)} \ge \alpha$. In fact, this holds true if $\tau = 0$ as well: in this case, there is a sequence of times $t_k \to 0$ and indices $i_k$ such that $f_{t_k}^{(i_k)} \ge \alpha$ and $\tilde B_{t_k}^{(i_k)} = 0$. Since there are only finitely many indices, there is a subsequence $k_\ell$ so that the $i_{k_\ell}$ are all the same index $i_0$, and the claim follows by continuity of $f^{(i_0)}$ and the fact that $\tilde B_0 = 0$.

When $F_\alpha$ occurs, we necessarily have $\tau < 1$, since $\tilde B_t^{(i)} = 0$ whenever $B_t^{(i)}$ is discontinuous. Since $\tilde B_1$ is uniform on the hypercube,

$$\mu\left(\partial^+ f\right) = \mathbb{P}\left[\tilde B_1 \in \partial^+ f\right] \ge \mathbb{P}\left[\tilde B_1 \in \partial^+ f \mid \tau < 1\right]\mathbb{P}[\tau < 1] \ge \mathbb{P}\left[\tilde B_1 \in \partial^+ f \mid \tau < 1\right]\mathbb{P}[F_\alpha],$$

and so it suffices to show that

$$\mathbb{P}\left[\tilde B_1 \in \partial^+ f \mid \tau < 1\right] \ge \frac{1}{2}\alpha. \tag{64}$$

Supposing that $\tau < 1$, denote by $i_0$ a coordinate for which $\tilde B_\tau^{(i_0)} = 0$ and $f_\tau^{(i_0)} \ge \alpha$. If $\tau = 0$ then $\tilde B_\tau = B_\tau$; otherwise, almost surely $i_0$ is the only coordinate of $\tilde B_\tau$ which is $0$, and so $\tilde B_\tau^{(j)} = B_\tau^{(j)}$ for all $j \neq i_0$ almost surely. Since the function $f^{(i)}(\cdot)$ does not depend on the $i$-th coordinate, we deduce that $\tilde f_\tau^{(i_0)} = f_\tau^{(i_0)} \ge \alpha$ almost surely. Thus, under $\tau < 1$, by the martingale property of $\tilde f_t^{(i_0)}$, we have

$$\mathbb{P}\left[\tilde f_1^{(i_0)} = 1 \mid \tilde B_\tau\right] \ge \alpha. \tag{65}$$

Similarly, using the martingale property of $\tilde B_t^{(i_0)}$, we have $\mathbb{E}[\tilde B_1^{(i_0)} \mid \tilde B_\tau] = \tilde B_\tau^{(i_0)} = 0$, and so

$$\mathbb{P}\left[\tilde B_1^{(i_0)} = 1 \mid \tilde B_\tau\right] = \mathbb{P}\left[\tilde B_1^{(i_0)} = -1 \mid \tilde B_\tau\right] = \frac{1}{2}.$$

Since $\partial_{i_0}\tilde f_t$ is independent of $\tilde B_t^{(i_0)}$, we finally obtain

$$\mathbb{P}\left[\tilde B_1 \in \partial^+ f \mid \tilde B_\tau\right] \ge \mathbb{P}\left[\tilde B_1^{(i_0)} = 1 \wedge \partial_{i_0}\tilde f_1 = 1 \mid \tilde B_\tau\right] + \mathbb{P}\left[\tilde B_1^{(i_0)} = -1 \wedge \partial_{i_0}\tilde f_1 = -1 \mid \tilde B_\tau\right] \overset{\text{(independence)}}{=} \frac{1}{2}\mathbb{P}\left[\partial_{i_0}\tilde f_1 = 1 \mid \tilde B_\tau\right] + \frac{1}{2}\mathbb{P}\left[\partial_{i_0}\tilde f_1 = -1 \mid \tilde B_\tau\right] = \frac{1}{2}\mathbb{P}\left[\tilde f_1^{(i_0)} = 1 \mid \tilde B_\tau\right] \overset{(65)}{\ge} \frac{1}{2}\alpha.$$

Proof of Theorem 5. Let $\gamma$ be the constant from the statement of Lemma 22. Let $\alpha$ be such that $\alpha\log\frac{1}{\alpha} = \frac{1}{2^{14}\gamma^2}\,r_{\mathrm{Tal}}$. Then the condition $4\gamma^2\rho(\alpha)T(f) \le \frac{1}{1024}\mathrm{Var}(f)$ of Lemma 25 is satisfied, implying that $\mathbb{P}[F_\alpha] \ge \frac{1}{200}\mathrm{Var}(f)$. Together with Proposition 26, we have

$$\mu\left(\partial^{\pm} f\right) \ge \frac{1}{2}\alpha\,\mathbb{P}[F_\alpha] \ge \frac{1}{400}\alpha\,\mathrm{Var}(f). \tag{66}$$

All that remains is to obtain a lower bound on $\alpha$. To this end, observe that $\alpha\log\frac{1}{\alpha} \le \sqrt\alpha$ for all $\alpha \in [0,1]$, and so $\alpha \ge \frac{1}{2^{28}\gamma^4}\,r_{\mathrm{Tal}}^2$. Thus $\log\frac{1}{\alpha} \le \log\left(\frac{2^{28}\gamma^4}{r_{\mathrm{Tal}}^2}\right)$, so there exists a constant $C_B'$ such that

$$\alpha = \frac{r_{\mathrm{Tal}}}{2^{14}\gamma^2\log\frac{1}{\alpha}} \ge \frac{r_{\mathrm{Tal}}}{C_B'\log\frac{C_B'}{r_{\mathrm{Tal}}}}.$$

Plugging this into (66) gives the desired result.

6 Proof of Theorem 8

As explained above in equation (24), our goal is to show that

$$S_\varepsilon(f) = \sum_{i=1}^n \mathbb{E}\int_0^{\sqrt{1-\varepsilon}} t\,(\partial_i f_t)^2\,dt \le C\,\mathrm{Var}(f)\left(\sum_{i=1}^n \mathrm{Inf}_i(f)^2\right)^{c\varepsilon}. \tag{67}$$

We first show that we may assume that $f$ is monotone. For an index $i = 1, \ldots, n$, define an operator $\kappa_i$ by

$$(\kappa_i f)(y) = \begin{cases} \max\left(f(y), f\left(y^{\oplus i}\right)\right), & y_i = 1, \\ \min\left(f(y), f\left(y^{\oplus i}\right)\right), & y_i = -1. \end{cases}$$

The following lemma relates the influences and sensitivities of $\kappa_i f$ to those of $f$:

Lemma 28 ([BKS99, Lemma 2.7]). $\kappa_1\kappa_2\cdots\kappa_n f$ is monotone, and for every pair of indices $i, j$, $\mathrm{Inf}_i(\kappa_j f) \le \mathrm{Inf}_i(f)$ and $S_\varepsilon(\kappa_i f) \ge S_\varepsilon(f)$.

Thus, if equation (67) holds for $\tilde f = \kappa_1\cdots\kappa_n f$, then it holds for $f$ as well, since $S_\varepsilon(f) \le S_\varepsilon(\tilde f)$ and $\sum_{i=1}^n \mathrm{Inf}_i(f)^2 \ge \sum_{i=1}^n \mathrm{Inf}_i(\tilde f)^2$. It therefore suffices to verify (67) for monotone functions; a small computational sketch of the shifting operators $\kappa_i$ appears below.
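The shifting operators are easy to implement exhaustively in small dimension. The following sketch checks the influence half of Lemma 28, together with monotonicity of the full shift, on a random function; it is a toy verification, not a proof, and the dimension and seed are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 4
cube = np.array(list(itertools.product([-1, 1], repeat=n)))  # all 2^n points

def flip(idx, i):
    """Index of y with the i-th coordinate flipped (lexicographic ordering)."""
    return idx ^ (1 << (n - 1 - i))

def kappa(f, i):
    """The operator kappa_i: push the larger of f(y), f(y^{(+)i}) to y_i = +1."""
    g = f.copy()
    for idx in range(2 ** n):
        if cube[idx, i] == 1:
            j = flip(idx, i)
            g[idx], g[j] = max(f[idx], f[j]), min(f[idx], f[j])
    return g

def influence(f, i):
    return np.mean([f[idx] != f[flip(idx, i)] for idx in range(2 ** n)])

f = rng.choice([-1, 1], size=2 ** n)
g = f
for i in range(n):
    g = kappa(g, i)

# The composed shift is monotone, and no influence has increased (Lemma 28).
assert all(g[idx] <= g[flip(idx, i)]
           for idx in range(2 ** n) for i in range(n) if cube[idx, i] == -1)
assert all(influence(g, i) <= influence(f, i) for i in range(n))
print("Lemma 28 checks passed for a random f on the", n, "-dimensional cube")
```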

In order to prove (67), we may also assume that for any fixed universal constant $K < 1$,

$$\sum_{i=1}^n \mathrm{Inf}_i(f)^2 \le K. \tag{68}$$

For if $\sum_{i=1}^n \mathrm{Inf}_i(f)^2 \ge K$ for some $K$, then since $f$ is monotone,

$$\mathrm{Var}(f) = \sum_{S\subseteq[n],\,S\neq\emptyset} \hat f(S)^2 \ge \sum_{i=1}^n \hat f(\{i\})^2 \overset{(15)}{=} \sum_{i=1}^n \mathrm{Inf}_i(f)^2 \ge K,$$

and so $\mathrm{Var}(f)\cdot\sum_i \mathrm{Inf}_i(f)^2 \ge K^2$. Equation (67) then holds trivially with $C = 1/K^2$ and $c = 1$, since $S_\varepsilon \le 1$ for all $\varepsilon$. Similarly, we may assume that for any fixed, universal constant $K$,

$$\mathrm{Var}(f) \le K; \tag{69}$$

otherwise Theorem 8 would be equivalent (up to constants) to the original theorem proved in [KK13].

Remark 29. Our proof actually recovers the original theorem proved in [KK13], but we make this assumption since it simplifies some bounds.

Define $R(t) = \mathbb{E}\sum_i (\partial_i f_t)^2 = \mathbb{E}\|\nabla f_t\|_2^2$. At time $0$, we have

$$R(0) = \sum_{i=1}^n (\partial_i f_0)^2 = \sum_{i=1}^n \widehat{\partial_i f}(\emptyset)^2 = \sum_{i=1}^n \hat f(\{i\})^2 = \sum_{i=1}^n \mathrm{Inf}_i(f)^2.$$

The function $R(t)$ is monotone in $t$: since $\partial_i f_t$ is a martingale, $(\partial_i f_t)^2$ is a submartingale and so $\mathbb{E}(\partial_i f_t)^2$ is increasing. By invoking Corollary 15 on $\partial_i f$, for every index $i$ we have

$$\frac{d}{dt}\,\mathbb{E}(\partial_i f_t)^2 = 2t\sum_{j=1}^n \mathbb{E}(\partial_j\partial_i f_t)^2.$$

Thus

$$\frac{d}{dt}R(t) = 2t\sum_{i=1}^n\sum_{j=1}^n \mathbb{E}(\partial_i\partial_j f_t)^2 \le 2\,\mathbb{E}\left\|\nabla^2 f_t\right\|_{HS}^2, \tag{70}$$

where $\|X\|_{HS}$ is the Hilbert-Schmidt norm of a matrix. By Lemma 10, there exists a continuous positive function $C(t)$ such that

$$\frac{d}{dt}R(t) \le 2C(t)\,\mathbb{E}\left[\|\nabla f_t\|_2^2\log\frac{C(t)}{\|\nabla f_t\|_2^2}\right] \overset{\text{(Jensen's inequality)}}{\le} 2C(t)\,\mathbb{E}\|\nabla f_t\|_2^2\,\log\frac{C(t)}{\mathbb{E}\|\nabla f_t\|_2^2} = 2C(t)\,R(t)\log\frac{C(t)}{R(t)}.$$

Since $C(t)$ is continuous it is bounded in $[0, 1/2]$, so there exists a constant $c > 0$ such that for all $t \in [0, 1/2]$,

$$\frac{d}{dt}R(t) \le c\cdot R(t)\log\frac{c}{R(t)}. \tag{71}$$

By (68), we can assume that $R(0) \le c/2$. Using (71) together with Lemma 17, there exist constants $C, L > 0$ and a time $t_0$, all of which depend only on $c$, such that for all $t \in [0, t_0]$,

$$R(t) \le L\cdot R(0)^{e^{-Ct}}.$$

In particular, there exists a constant $K > 0$ such that

$$R\left(e^{-K}\right) \le L\cdot R(0)^{5/6}, \tag{72}$$

and since $R$ is increasing, we can always assume that $K > 1$. Denote $G(s) = R(e^{-s})$; by Lemma 16, $G(s)$ is log-convex in $s$.

Lemma 30. Let $K \ge 1$ and let $G(s)$ be a log-convex decreasing function. Denote $v = \int_0^K e^{-2s}G(s)\,ds$ and assume that $v > G(K)$. Then for all $r < K$,

$$\int_0^r e^{-2s}G(s)\,ds \ge v\cdot\left(1 - \left(\frac{G(K)}{v}\right)^{r/K}\right).$$

Proof. Consider the function

$$h_\ell(s) = \frac{v\ell}{1-e^{-\ell K}}\,e^{-(\ell-2)s},$$

where $\ell$ is the largest solution to the equation $h_\ell(K) = G(K)$. By the choice of $h_\ell$, we have

$$\int_0^K e^{-2s}h_\ell(s)\,ds = v = \int_0^K e^{-2s}G(s)\,ds.$$

Since $e^{-2s}h_\ell(s)$ is log-linear on $[0,K]$, $e^{-2s}G(s)$ is log-convex on $[0,K]$, they have the same integral on $[0,K]$ and $G(K) = h_\ell(K)$, we must have one of two cases:

1. $h_\ell(s) = G(s)$ for all $s \in [0,K]$;

2. the functions intersect at most once in the interval $[0,K)$, at some point $s_0$ such that $G(s) \ge h_\ell(s)$ for all $s < s_0$.

In either case, for all $r \in [0,K]$, we have

$$\int_0^r e^{-2s}G(s)\,ds \ge \int_0^r e^{-2s}h_\ell(s)\,ds = \frac{v}{1-e^{-K\ell}}\left(1-e^{-r\ell}\right) \ge v\left(1-e^{-r\ell}\right). \tag{73}$$

On the other hand, we chose $\ell$ to be such that $h_\ell(K) = G(K)$, and so

$$\frac{\ell}{1-e^{-\ell K}}\,e^{-(\ell-2)K} = \frac{h_\ell(K)}{v} = \frac{G(K)}{v} < 1,$$

where the last inequality is by the assumption on $v$. The function $\frac{x}{1-e^{-xK}}\,e^{-(x-2)K}$ is decreasing as a function of $x$ in the interval $[2,\infty)$, but is greater than $1$ at $x = 2$; hence, since $\ell$ is the largest number for which $h_\ell(K) = G(K)$, we must have $\ell > 2$. We then have

$$\ell\,e^{-(\ell-2)K} = \frac{G(K)}{v}\left(1-e^{-\ell K}\right) \le \frac{G(K)}{v},$$

and after rearranging, since $\ell > 2$,

$$e^{-\ell K} \le \frac{G(K)}{v}\cdot\frac{e^{-2K}}{\ell} \le \frac{G(K)}{v}.$$

Thus

$$\ell \ge \frac{1}{K}\log\frac{v}{G(K)}.$$

Putting this into the right hand side of (73) gives

$$\int_0^r e^{-2s}G(s)\,ds \ge v\left(1-e^{-r\ell}\right) \ge v\left(1-e^{-\frac{r}{K}\log\frac{v}{G(K)}}\right) = v\left(1-\left(\frac{G(K)}{v}\right)^{r/K}\right).$$
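Lemma 30 is a statement about log-convex functions only, so it can be tested in isolation. The sketch below verifies the bound for the log-linear (hence log-convex and decreasing) choice $G(s) = e^{-3s}$ with $K = 2$; these parameters are arbitrary, and a local trapezoid rule is used for the integrals.

```python
import numpy as np

K, a = 2.0, 3.0
G = lambda s: np.exp(-a * s)      # log-linear, hence log-convex and decreasing

s = np.linspace(0.0, K, 200001)
w = np.exp(-2 * s) * G(s)

def trapezoid(y, x):
    """Plain trapezoid rule, kept local for NumPy-version independence."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

v = trapezoid(w, s)
assert v > G(K)                   # the hypothesis of Lemma 30 holds here

for r in np.linspace(0.1, K - 0.1, 19):
    mask = s <= r
    lhs = trapezoid(w[mask], s[mask])
    rhs = v * (1.0 - (G(K) / v) ** (r / K))
    assert lhs >= rhs, (r, lhs, rhs)
print("Lemma 30 verified numerically for G(s) = exp(-3s), K = 2")
```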

Proof of Theorem 8. By Corollary 14,

$$\mathrm{Var}(f) = \mathrm{Var}(f_1) = 2\,\mathbb{E}\sum_{i=1}^n\int_0^1 t\,(\partial_i f_t)^2\,dt = 2\int_0^1 t\cdot R(t)\,dt,$$

and by a change of variables this becomes

$$\mathrm{Var}(f) = 2\int_0^\infty e^{-2s}G(s)\,ds.$$

Define $v = \int_0^K e^{-2s}G(s)\,ds$, where $K$ is the constant from equation (72). Note that since $K \ge 1$ and $G$ is decreasing,

$$\mathrm{Var}(f) - v = \int_K^\infty e^{-2s}G(s)\,ds \le G(K) \overset{(72)}{\le} L\cdot R(0)^{5/6}. \tag{74}$$

Rearranging, this gives

$$\frac{v}{G(K)} \ge \frac{\mathrm{Var}(f) - L\,R(0)^{5/6}}{L\,R(0)^{5/6}}. \tag{75}$$

Set $g(x) = \frac{1+f(x)}{2}$, and assume without loss of generality that $\mathbb{E}f = f(0) \le 0$, so that $\mathbb{E}g = g(0) \le \frac{1}{2}$ (if $f(0) > 0$, we can take $g(x) = (1-f(x))/2$). This implies that

$$\mathrm{Var}(g) = g(0)(1-g(0)) \ge \frac{1}{2}g(0).$$

Invoking Lemma 9 with $g$ and $t = 0$, there exists a constant $C$ such that

$$R(0) = \sum_{i=1}^n (\partial_i f(0))^2 = \|\nabla f(0)\|_2^2 = 4\|\nabla g(0)\|_2^2 \le C\,g(0)^2\log\frac{e}{g(0)} \le C'\,\mathrm{Var}(g)^2\log\frac{C'}{\mathrm{Var}(g)} \le C''\,\mathrm{Var}(f)^2\log\frac{4C''}{\mathrm{Var}(f)}. \tag{76}$$

By (69), we can assume that $\mathrm{Var}(f)$ is small enough so that (76) implies

$$R(0)^{2/3} \le \mathrm{Var}(f). \tag{77}$$

Plugging this into (75), we get

$$\frac{v}{G(K)} \ge \frac{R(0)^{2/3} - L\,R(0)^{5/6}}{L\,R(0)^{5/6}}.$$

For small enough $R(0)$, we have $L\,R(0)^{5/6} \le \frac{1}{2}R(0)^{2/3}$, and so

$$\frac{v}{G(K)} \ge \frac{1}{2L}\,R(0)^{-1/6}. \tag{78}$$

By Lemma 30 and equations (74) and (75), we have

$$\int_0^r e^{-2s}G(s)\,ds \ge v\left(1-\left(\frac{G(K)}{v}\right)^{r/K}\right) \ge \left(\mathrm{Var}(f) - L\,R(0)^{5/6}\right)\left(1-\left(2L\cdot R(0)^{1/6}\right)^{r/K}\right). \tag{79}$$

This allows us to prove (67):

$$S_\varepsilon(f) = \sum_{i=1}^n \mathbb{E}\int_0^{\sqrt{1-\varepsilon}} t\,(\partial_i f_t)^2\,dt = \int_0^{\sqrt{1-\varepsilon}} t\,R(t)\,dt \le \int_0^{e^{-\varepsilon/2}} t\,R(t)\,dt = \mathrm{Var}(f) - \int_0^{\varepsilon/2} e^{-2s}G(s)\,ds \overset{(79)}{\le} L\,R(0)^{5/6} + \mathrm{Var}(f)\left(2L\cdot R(0)^{1/6}\right)^{\varepsilon/2K} \overset{(77)}{\le} L\,\mathrm{Var}(f)\,R(0)^{1/6} + \mathrm{Var}(f)\left(2L\cdot R(0)^{1/6}\right)^{\varepsilon/2K} \le C\cdot\mathrm{Var}(f)\,R(0)^{\varepsilon/(12K)}$$

for some universal constant $C$.

References

[BDC12] Vincent Beffara and Hugo Duminil-Copin, The self-dual point of the two-dimensional random-cluster model is critical for q ≥ 1, Probab. Theory Related Fields 153 (2012), no. 3-4, 511–542. MR 2948685

[BKS99] Itai Benjamini, Gil Kalai, and Oded Schramm, Noise sensitivity of Boolean functions and applications to percolation, Inst. Hautes Études Sci. Publ. Math. (1999), no. 90, 5–43 (2001). MR 1813223

[BKS03] ——, First passage percolation has sublinear distance variance, Ann. Probab. 31 (2003), no. 4, 1970–1978. MR 2016607

[BM00] F. Barthe and B. Maurey, Some remarks on isoperimetry of Gaussian type, Ann. Inst. H. Poincaré Probab. Statist. 36 (2000), no. 4, 419–434. MR 1785389

[Bob97] S. G. Bobkov, An isoperimetric inequality on the discrete cube, and an elementary proof of the isoperimetric inequality in Gauss space, Ann. Probab. 25 (1997), no. 1, 206–214. MR 1428506

[BV04] Stephen Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, 2004. MR 2061575

[CEL12] Dario Cordero-Erausquin and Michel Ledoux, Hypercontractive measures, Talagrand's inequality, and influences, pp. 169–189, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.

[CHL97] Mireille Capitaine, Elton P. Hsu, and Michel Ledoux, Martingale representation and a simple proof of logarithmic Sobolev inequalities on path spaces, Electron. Comm. Probab. 2 (1997), 71–81. MR 1484557

[DS05] Irit Dinur and Samuel Safra, On the hardness of approximating minimum vertex cover, Ann. of Math. (2) 162 (2005), no. 1, 439–485. MR 2178966

[Dur19] Rick Durrett, Probability: theory and examples, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 49, Cambridge University Press, Cambridge, 2019, Fifth edition of [MR1068527]. MR 3930614

[Eld15] Ronen Eldan, A two-sided estimate for the Gaussian noise stability deficit, Inventiones mathematicae 201 (2015), no. 2, 561–624.

[Ell11] David Ellis, Almost isoperimetric subsets of the discrete cube, Combin. Probab. Comput. 20 (2011), no. 3, 363–380. MR 2784633

[Fri98] Ehud Friedgut, Boolean functions with low average sensitivity depend on few coordinates, Combinatorica 18 (1998), no. 1, 27–35. MR 1645642

[Fri99] ——, Sharp thresholds of graph properties, and the k-sat problem, J. Amer. Math. Soc. 12 (1999), no. 4, 1017–1054, With an appendix by Jean Bourgain. MR 1678031

[GKK+09] Dmitry Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, and Ronald de Wolf, Exponential separation for one-way quantum communication complexity, with applications to cryptography, SIAM J. Comput. 38 (2008/09), no. 5, 1695–1708. MR 2476272

[GS15] Christophe Garban and Jeffrey E. Steif, Noise sensitivity of Boolean functions and percolation, Institute of Mathematical Statistics Textbooks, vol. 5, Cambridge University Press, New York, 2015. MR 3468568

[Har76] Sergiu Hart, A note on the edges of the n-cube, Discrete Math. 14 (1976), no. 2, 157–163. MR 0396293

[Kin93] J. F. C. Kingman, Poisson processes, Oxford Studies in Probability, vol. 3, The Clarendon Press, Oxford University Press, New York, 1993, Oxford Science Publications. MR 1207584

[KK07] Jeff Kahn and Gil Kalai, Thresholds and expectation thresholds, Combin. Probab. Comput. 16 (2007), no. 3, 495–502. MR 2312440

[KK13] Nathan Keller and Guy Kindler, Quantitative relation between noise sensitivity and influences, Combinatorica 33 (2013), no. 1, 45–71. MR 3070086

[KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial, The influence of variables on Boolean functions (extended abstract), Proc. 29th Annual Symposium on Foundations of Computer Science (FOCS), 1988, pp. 68–80.

[KLS+15] Saleet Klein, Amit Levi, Muli Safra, Clara Shikhelman, and Yinon Spinka, On the converse of Talagrand's influence inequality, CoRR abs/1506.06325 (2015).

[KR09] Robert Krauthgamer and Yuval Rabani, Improved lower bounds for embeddings into L1, SIAM J. Comput. 38 (2009), no. 6, 2487–2498. MR 2506299

[KS06] Gil Kalai and Shmuel Safra, Threshold phenomena and influence: perspectives from mathematics, computer science, and economics, Computational complexity and statistical physics, Santa Fe Inst. Stud. Sci. Complex., Oxford Univ. Press, New York, 2006, pp. 25–60. MR 2208732

[MNT14] Elchanan Mossel, Joe Neeman, and Omer Tamuz, Majority dynamics and aggregation of information in social networks, Autonomous Agents and Multi-Agent Systems 28 (2014), no. 3, 408–429.

[O'D14] Ryan O'Donnell, Analysis of Boolean functions, Cambridge University Press, New York, 2014. MR 3443800

[OS07] Ryan O'Donnell and Rocco A. Servedio, Learning monotone decision trees in polynomial time, SIAM J. Comput. 37 (2007), no. 3, 827–844. MR 2341918

[Raz95] Ran Raz, Fourier analysis for probabilistic communication complexity, Comput. Complexity 5 (1995), no. 3-4, 205–221. MR 1394528

[Tal93] Michel Talagrand, Isoperimetry, logarithmic Sobolev inequalities on the discrete cube, and Margulis' graph connectivity theorem, Geom. Funct. Anal. 3 (1993), no. 3, 295–314. MR 1215783

[Tal94] ——, On Russo's approximate zero-one law, Ann. Probab. 22 (1994), no. 3, 1576–1587.

[Tal96] ——, How much are increasing sets positively correlated?, Combinatorica 16 (1996), no. 2, 243–258. MR 1401897

[Tal97] ——, On boundaries and influences, Combinatorica 17 (1997), no. 2, 275–285. MR 1479302

[Ver18] Roman Vershynin, High-dimensional probability, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47, Cambridge University Press, Cambridge, 2018, An introduction with applications in data science, With a foreword by Sara van de Geer. MR 3837109

A Appendix 1: p-biased analysis

For $p = (p_1, \ldots, p_n) \in [0,1]^n$, let $\mu_p$ be the measure

$$\mu_p(y) = \prod_{i=1}^n \frac{1+y_i(2p_i-1)}{2} = w_{(2p-1)}(y),$$

which sets the $i$-th bit to $1$ with probability $p_i$. Let

$$\omega_i(y) = \frac{1}{2}\left(\frac{1-2p_i}{\sqrt{p_i(1-p_i)}} + y_i\,\frac{1}{\sqrt{p_i(1-p_i)}}\right), \tag{80}$$

and for a set $S \subseteq [n]$, define $\omega_S(y) = \prod_{i\in S}\omega_i(y)$. Then every function $f$ can be written as

$$f(y) = \sum_{S\subseteq[n]} \hat f_p(S)\,\omega_S(y) := \sum_{S\subseteq[n]} \left(\mathbb{E}_{\mu_p}[f\cdot\omega_S]\right)\omega_S(y) \tag{81}$$

$$= \sum_{S\subseteq[n]}\left(\sum_{y\in\{-1,1\}^n} f(y)\,\omega_S(y)\,w_{2p-1}(y)\right)\omega_S(y). \tag{82}$$

The coefficients $\hat f_p(S) := \mathbb{E}_{\mu_p}[f\cdot\omega_S]$ are called the "p-biased" Fourier coefficients of $f$. The p-biased influence of the $i$-th bit is

$$\mathrm{Inf}_i^p(f) = 4p_i(1-p_i)\,\mathbb{P}_{y\sim\mu_p}\left[f(y) \neq f\left(y^{\oplus i}\right)\right].$$

If $f$ is monotone, then

$$\mathrm{Inf}_i^p(f) = 2\sqrt{p_i}\sqrt{1-p_i}\,\hat f_p(\{i\}). \tag{83}$$

The p-biased Fourier coefficients are related to the derivatives of $f$ by the following proposition, whose proof (using slightly different notation) can be found in [O'D14, Section 8].

Proposition 31. Let $S = \{i_1, \ldots, i_k\} \subseteq [n]$ be a set of indices, $x \in (-1,1)^n$, and $p = \frac{1+x}{2}$. Then

$$\partial_{i_1}\cdots\partial_{i_k} f(x) = \left(\prod_{i\in S}\frac{4}{\sqrt{1-x_i^2}}\right)\hat f_p(S).$$
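The identity (83) is straightforward to confirm by brute force on a small monotone function. The sketch below checks it for $\mathrm{Maj}_3$ at a single bias $p$ (the choice of function and bias is arbitrary); $\omega_i$ is implemented in the equivalent centered-and-normalized form $\omega_i(y) = (y_i - (2p-1))/(2\sqrt{p(1-p)})$ of equation (80).

```python
import itertools
import numpy as np

n, p = 3, 0.3
cube = list(itertools.product([-1, 1], repeat=n))
maj = lambda y: 1 if sum(y) > 0 else -1            # a monotone test function

def mu_p(y):
    """The p-biased measure: each bit equals +1 independently w.p. p."""
    return np.prod([p if yi == 1 else 1 - p for yi in y])

def omega(y, i):
    """The normalized p-biased character of equation (80)."""
    return (y[i] - (2 * p - 1)) / (2 * np.sqrt(p * (1 - p)))

def flip(y, i):
    z = list(y); z[i] = -z[i]; return tuple(z)

for i in range(n):
    fhat = sum(mu_p(y) * maj(y) * omega(y, i) for y in cube)     # f_p({i})
    inf_p = 4 * p * (1 - p) * sum(mu_p(y) for y in cube
                                  if maj(y) != maj(flip(y, i)))
    assert abs(inf_p - 2 * np.sqrt(p * (1 - p)) * fhat) < 1e-12  # equation (83)
print("equation (83) verified for Maj_3 at p =", p)
```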

B Appendix 2: Postponed proofs

Proof of Lemma 9. The lemma is similar to [Tal96, Proposition 2.2], but applied to a biased product distribution rather than to the uniform distribution on $\{-1,1\}^n$. For completeness, we present here a general proof, which does not explicitly use hypercontractivity. Using (13), the harmonic extension $g(x)$ may be written as

$$g(x) = \sum_y w_x(y)\,g(y),$$

where $w_x(y) = \prod_i (1+x_iy_i)/2$. Since differentiation and harmonic extensions commute, we have

$$\frac{\partial}{\partial x_i}g(x) = \sum_y \frac{\partial}{\partial x_i}w_x(y)\,g(y) = \sum_y w_x(y)\,\frac{y_i\,g(y)}{1+x_iy_i} = \frac{1}{1-t^2}\sum_y w_x(y)\,y_i\,g(y)\,(1-x_iy_i) = \frac{\nu(A)}{1-t^2}\left(\frac{\int_A y_i\,d\nu}{\nu(A)} - x_i\right),$$

where $\nu$ is the harmonic measure $w_x(y)$, and $A = \mathrm{supp}(g)$. Under this notation,

$$g(x) = \nu(A) \quad\text{and}\quad \partial_i g(x) = \int_A \frac{y_i}{1+x_iy_i}\,d\nu.$$

Let $\{\alpha_i\}_{i=1}^n$ be numbers such that $\sum_{i=1}^n \alpha_i^2 = 1$, and let $h : \{-1,1\}^n \to \mathbb{R}$ be defined as

$$h(y) = \sum_{i=1}^n \alpha_i\,\frac{y_i}{1+x_iy_i}.$$

Let $Y \in \{-1,1\}^n$ have distribution $\nu$. Recall that the sub-gaussian norm of a random variable $R \in \mathbb{R}$ is defined as $\|R\|_{\psi_2} = \inf\{s > 0 \mid \mathbb{E}\exp(R^2/s^2) \le 2\}$, while the sub-gaussian norm of a random vector $R \in \mathbb{R}^n$ is defined as $\|R\|_{\psi_2} = \sup_{r\in S^{n-1}}\|\langle R, r\rangle\|_{\psi_2}$ (see e.g. [Ver18, Sections 2.5 and 3.4] for more about sub-gaussian norms). The random variable $\frac{Y_i}{1+x_iY_i}$ is bounded in magnitude by $(1-t)^{-1}$, and so has sub-gaussian norm bounded by $C(1-t)^{-1}$ for some constant $C$. By [Ver18, Lemma 3.4.2], a random vector $Z$ with independent, mean-zero sub-gaussian entries is also sub-gaussian, with $\|Z\|_{\psi_2} \le C\max_i\|Z_i\|_{\psi_2}$. Thus the random vector $\left(\frac{Y_1}{1+x_1Y_1}, \ldots, \frac{Y_n}{1+x_nY_n}\right)$ has sub-gaussian norm bounded by $C(1-t)^{-1}$ as well, which means that for every $s > 0$,

$$\mathbb{P}[|h(Y)| \ge s] \le 2\exp\left(-Cs^2(1-t)^2\right).$$

Let $s_0 \ge \frac{1}{\sqrt C}$. Then

$$\int_A |h|\,d\nu = \int_0^\infty \nu(\{|h| \ge s\}\cap A)\,ds \le \int_0^\infty \min\left(\nu(A),\,2e^{-Cs^2(1-t)^2}\right)ds \le \nu(A)\,s_0 + \frac{2}{\sqrt C(1-t)}\int_{s_0}^\infty \sqrt C(1-t)\,s\,e^{-Cs^2(1-t)^2}\,ds \le \nu(A)\,s_0 + \frac{C}{C^{3/2}(1-t)^2}\,e^{-Cs_0^2(1-t)^2}.$$

Taking $s_0 = \sqrt{\frac{1}{C(1-t)^2}\log\frac{e}{\nu(A)}} \ge \frac{1}{\sqrt C}$ gives

$$\int_A |h|\,d\nu \le \nu(A)\,\frac{1}{\sqrt C(1-t)}\sqrt{\log\frac{e}{\nu(A)}} + \frac{1}{\sqrt C(1-t)^2}\,e^{-\log\frac{e}{\nu(A)}} \le \frac{L}{(1-t)^2}\,\nu(A)\sqrt{\log\frac{e}{\nu(A)}}$$

for some $L > 0$. In particular,

$$\int_A h\,d\nu \le \frac{L}{(1-t)^2}\,\nu(A)\sqrt{\log\frac{e}{\nu(A)}} \tag{84}$$

as well.

Now choose $\alpha_i = \partial_i g(x)\left(\sum_{i=1}^n \partial_i g(x)^2\right)^{-1/2}$, and observe that

$$\left(\int_A h\,d\nu\right)^2 = \left(\int_A \sum_{i=1}^n \frac{\partial_i g(x)}{\sqrt{\sum_{i=1}^n \partial_i g(x)^2}}\,\frac{y_i}{1+x_iy_i}\,d\nu\right)^2 = \frac{1}{\sum_{i=1}^n \partial_i g(x)^2}\left(\sum_{i=1}^n \partial_i g(x)\int_A \frac{y_i}{1+x_iy_i}\,d\nu\right)^2 = \frac{1}{\sum_{i=1}^n \partial_i g(x)^2}\left(\sum_{i=1}^n \partial_i g(x)^2\right)^2 = \sum_{i=1}^n \partial_i g(x)^2 = \|\nabla g\|_2^2.$$

Together with (84), this gives the desired result.
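At $x = 0$ (so $t = 0$) the conclusion of Lemma 9 reduces to a level-1 statement: for a $\{0,1\}$-valued $g$ on the cube, $\|\nabla g(0)\|_2^2 = \sum_i \hat g(\{i\})^2 \le C\,g(0)^2\log(e/g(0))$, which is exactly how the lemma is used in equation (76) above. The following toy sanity check of that special case uses arbitrary densities and seed; the number it reports is an empirical observation, not a bound.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 10
cube = np.array(list(itertools.product([-1, 1], repeat=n)))

def level1_ratio(g):
    """||grad g(0)||^2 divided by g(0)^2 log(e/g(0)) for a 0/1 function g."""
    mean = g.mean()
    level1 = sum(np.mean(g * cube[:, i]) ** 2 for i in range(n))
    return level1 / (mean ** 2 * np.log(np.e / mean))

gs = [(cube[:, 0] == 1).astype(float)]          # a dictator indicator
gs += [(rng.random(2 ** n) < d).astype(float)   # random indicators
       for d in (0.02, 0.1, 0.5) for _ in range(30)]
print("max ratio:", max(level1_ratio(g) for g in gs if 0 < g.mean() < 1))
```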

Proof of Lemma 10. Using Proposition 31 for $S = \{i,j\}$ and the fact that $|x_i| = t$,

$$\left\|\nabla^2 g(x)\right\|_{HS}^2 = \sum_{i,j=1}^n (\partial_i\partial_j g(x))^2 = \sum_{i,j=1}^n \left(\frac{16}{1-t^2}\right)^2 \hat g_p(\{i,j\})^2 \le 2C(t)\sum_{S\subseteq[n],\,|S|=2} \hat g_p(S)^2. \tag{85}$$

The following lemma, which bounds the sum of squares of p-biased Fourier coefficients, is immediately obtained from [KK13, Lemma 6].

Lemma 32. Let $0 \le t < 1$ and let $p \in (0,1)^n$ be such that $p_i \in \left[\frac{1-t}{2}, \frac{1+t}{2}\right]$ for all $i$. For a function $g : \{-1,1\}^n \to \{-1,1\}$, let

$$W(g) = \sum_{i=1}^n p_i(1-p_i)\,\mathrm{Inf}_i^p(g)^2.$$

There exists a function $C(t)$ such that for every $g$,

$$\sum_{S\subseteq[n],\,|S|=2} \hat g_p(S)^2 \le C(t)\,W(g)\cdot\log\frac{2}{W(g)}. \tag{86}$$

Combining (85) and (86), we get

$$\left\|\nabla^2 g(x)\right\|_{HS}^2 \le C(t)\,W(g)\cdot\log\frac{2}{W(g)}. \tag{87}$$

As stated in equation (83), for monotone functions the influence of the i-th bit is given by

$$\mathrm{Inf}_i^p(g) = 2\sqrt{p_i}\sqrt{1-p_i}\,\hat g_p(\{i\}),$$

and so

$$W(g) = \sum_{i=1}^n p(1-p)\,\mathrm{Inf}_i^p(g)^2 = 4p^2(1-p)^2\sum_{i=1}^n \hat g_p(\{i\})^2.$$

On the other hand, using Proposition 31 with $S = \{i\}$,

$$\|\nabla g(x)\|_2^2 = \sum_{i=1}^n (\partial_i g(x))^2 = \frac{16}{1-t^2}\sum_{i=1}^n \hat g_p(\{i\})^2 = \frac{4}{(1-t^2)\,p^2(1-p)^2}\,W(g) := C'(t)\,W(g).$$

Plugging this into (87), we see that for some $C(t)$ we have

$$\left\|\nabla^2 g(x)\right\|_{HS}^2 \le C(t)\,\|\nabla g(x)\|_2^2\,\log\left(\frac{C(t)}{\|\nabla g(x)\|_2^2}\right).$$

Proof of Lemma 13. Assume first that $g_t$ satisfies condition (1), meaning that it is left-continuous and measurable with respect to the filtration generated by $\{B_s\}_{0\le s<t}$. Suppose first that $t_1 > 0$, so that the number of jumps that $B_t$ makes in the time interval $[t_1, t_2]$ is almost surely finite. For any integer $N > 0$, partition the interval $[t_1, t_2]$ into $N$ equal parts, setting $t_k^N = t_1 + \frac{k}{N}(t_2-t_1)$ for $k = 0, \ldots, N$. Since almost surely none of the jumps of $B_t$ occur at any $t_k^N$, and since $g_t$ is left-continuous, we almost surely have

$$\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\sum_{k=0}^{N-1} 4\left(t_k^N\right)^2 g_{t_k^N}\,\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}.$$

Since $g_{t_k^N}$ is bounded, the expression

$$\sum_{k=0}^{N-1} 4\left(t_k^N\right)^2 g_{t_k^N}\,\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}$$

is bounded in absolute value by a constant times the number of jumps of $B_t$ in the interval $[t_1,t_2]$, which is integrable. By the dominated convergence theorem, we then have

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\mathbb{E}\sum_{k=0}^{N-1} 4\left(t_k^N\right)^2 g_{t_k^N}\,\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}.$$

Since $g_{t_k^N}$ is measurable with respect to $\{B_s\}_{0\le s<t_k^N}$, it is independent of whether or not a jump occurred in the interval $[t_k^N, t_{k+1}^N]$, so the expectation of each summand factors:

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\sum_{k=0}^{N-1}\mathbb{E}\left[4\left(t_k^N\right)^2 g_{t_k^N}\right]\mathbb{E}\left[\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}\right]. \tag{88}$$

The set $J_i = \mathrm{Jump}(B_t^{(i)})$ is a Poisson process with rate $1/2t$, and so the number of jumps in the interval $[t_k^N, t_{k+1}^N]$ distributes as $\mathrm{Pois}(\lambda)$, where

$$\lambda = \int_{t_k^N}^{t_{k+1}^N}\frac{1}{2t}\,dt = \frac{1}{2}\log\frac{t_{k+1}^N}{t_k^N}.$$

The probability of having at least one jump is then equal to

$$\mathbb{P}\left[J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset\right] = 1 - e^{-\lambda} = 1 - \sqrt{\frac{t_k^N}{t_{k+1}^N}} = 1 - \sqrt{1 - \frac{(t_2-t_1)/N}{t_{k+1}^N}} = \frac{(t_2-t_1)/N}{2t_{k+1}^N} + O\left(\frac{1}{N^2}\right).$$

Plugging this into display (88), we get

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\sum_{k=0}^{N-1}\mathbb{E}\left[4\left(t_k^N\right)^2 g_{t_k^N}\right]\left(\frac{(t_2-t_1)/N}{2t_{k+1}^N} + O\left(\frac{1}{N^2}\right)\right).$$

The factor $O\left(\frac{1}{N^2}\right)$ is negligible in the limit $N\to\infty$, since the sum contains only $N$ bounded terms. We are left with

$$\lim_{N\to\infty}\sum_{k=0}^{N-1}\mathbb{E}\left[4\left(t_k^N\right)^2 g_{t_k^N}\right]\frac{(t_2-t_1)/N}{2t_{k+1}^N} = \lim_{N\to\infty}\sum_{k=0}^{N-1}\mathbb{E}\left[2t_k^N g_{t_k^N}\right]\frac{t_2-t_1}{N} \overset{\text{(bounded convergence)}}{=} \mathbb{E}\lim_{N\to\infty}\sum_{k=0}^{N-1} 2t_k^N g_{t_k^N}\,\frac{t_2-t_1}{N}.$$

Since $g_t$ is continuous almost everywhere, by the definition of the Riemann integral, the limit is equal to $2\int_{t_1}^{t_2} t\cdot g_t\,dt$, and we get

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = 2\,\mathbb{E}\int_{t_1}^{t_2} t\cdot g_t\,dt$$

for all $t_1 > 0$. Taking the limit $t_1\to 0$ gives the desired result for $t_1 = 0$, by continuity of the right hand side in $t_1$.

The proof for condition (2), where $g_t = g(B_t)$, is similar: since $g_t$ is right-continuous, we now approximate the sum in the left hand side of (22) using the right endpoint of the interval:

$$\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\sum_{k=0}^{N-1} 4\left(t_{k+1}^N\right)^2 g\left(B_{t_{k+1}^N}\right)\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}.$$

Using the same reasoning as above, we may interchange the expectation and the limit, obtaining

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g_t = \lim_{N\to\infty}\mathbb{E}\sum_{k=0}^{N-1} 4\left(t_{k+1}^N\right)^2 g\left(B_{t_{k+1}^N}\right)\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}.$$

Since $B_{t_{k+1}^N}$ is independent of whether or not a jump occurred in the interval $[t_k^N, t_{k+1}^N]$, the expectation again breaks up into

$$\mathbb{E}\sum_{t\in J_i\cap[t_1,t_2]} 4t^2 g(B_t) = \lim_{N\to\infty}\sum_{k=0}^{N-1}\mathbb{E}\left[4\left(t_{k+1}^N\right)^2 g\left(B_{t_{k+1}^N}\right)\right]\mathbb{E}\left[\mathbf{1}_{J_i\cap[t_k^N, t_{k+1}^N]\neq\emptyset}\right].$$

From here onwards the proof is identical.
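The conclusion of Lemma 13 can be checked against Campbell's formula in a case where both sides are explicit. The sketch below takes the deterministic integrand $g_t = t$ (trivially left-continuous and measurable as required by condition (1)) on $[t_1, t_2] = [0.1, 1]$: the right hand side is $2\int_{0.1}^1 t^2\,dt = \frac{2}{3}(1 - 10^{-3}) \approx 0.666$, and the Monte Carlo estimate of the left hand side should agree up to sampling error. All parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
t1, t2, trials = 0.1, 1.0, 100000

def jump_times():
    """One sample of a Poisson process on [t1, t2] with intensity 1/2t."""
    count = rng.poisson(0.5 * np.log(t2 / t1))
    return t1 * (t2 / t1) ** rng.uniform(size=count)  # inverse CDF for 1/t

# g_t = t, so each jump at time t contributes 4 t^2 * g_t = 4 t^3.
lhs = np.mean([np.sum(4 * T ** 3) for T in (jump_times() for _ in range(trials))])
rhs = 2 * (t2 ** 3 - t1 ** 3) / 3
print(lhs, "~", rhs)   # the two sides agree up to Monte Carlo error
```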

Proof of Lemma 17. Let $t_1 = \inf\{t \mid g(t) \ge K\}$, and denote $L = \max\left\{x\log\frac{K}{x} \;\middle|\; x\in[0,K]\right\}$. Note that $L$ depends only on $K$. Then for all $t \le t_1$, we have $g'(t) \le C\cdot L$. Integrating, this means that for all $t \le t_1$,

$$g(t) \le g(0) + tCL \le \frac{K}{2} + tCL.$$

In particular, $t_1 \ge \frac{K}{2CL}$, since otherwise we would have $g(t_1) < K$, contradicting the definition of $t_1$ and the continuity of $g$. Denoting $t_0 = \frac{K}{4CL}$, we must have $g(t) < K$ for all $t\in[0,t_0]$. This ensures that $\log\frac{K}{g(t)}$ is positive in this interval, which means we can rearrange the differential inequality (29) to give

$$\frac{g'(t)}{g(t)\log\frac{K}{g(t)}} \le C$$

for all $t\in[0,t_0]$. A short calculation reveals that the left hand side is the derivative of $-\log\log(K/g)$. Integrating from $0$ to $t$, we get

$$\log\log\frac{K}{g(0)} - \log\log\frac{K}{g(t)} \le Ct.$$

Rearranging gives

$$\log\log\frac{K}{g(t)} \ge \log\log\frac{K}{g(0)} - Ct,$$

and exponentiating twice gives the desired result.
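The doubly-exponential profile in Lemma 17 is exactly the solution of the ODE that saturates the differential inequality: substituting $u = \log(K/g)$ turns $g' = C\,g\log(K/g)$ into $u' = -Cu$, giving $g(t) = K\,(g(0)/K)^{e^{-Ct}}$. The sketch below integrates the saturating ODE with forward Euler and compares it to this closed form; all constants are toy choices.

```python
import numpy as np

C, K, g0, dt, T = 2.0, 1.0, 0.01, 1e-5, 1.0     # toy constants, g0 <= K/2
ts = np.arange(0.0, T, dt)

# Forward Euler for the equality case g' = C * g * log(K / g) of (29).
g = np.empty_like(ts)
g[0] = g0
for k in range(len(ts) - 1):
    g[k + 1] = g[k] + dt * C * g[k] * np.log(K / g[k])

# Closed form from the substitution u = log(K/g):  g(t) = K * (g0/K)**exp(-C t)
exact = K * (g0 / K) ** np.exp(-C * ts)
print("max |Euler - closed form|:", np.max(np.abs(g - exact)))  # small
```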
