Chapter 6

Branching processes

Branching processes arise naturally in the study of stochastic processes on trees and locally tree-like graphs. After a review of the basic extinction theory of branching processes, we give a few classical examples of applications in discrete probability.

6.1 Background

We begin with a review of the extinction theory of Galton-Watson branching processes.

6.1.1 Basic definitions

Recall the definition of a Galton-Watson process.

Definition 6.1 (Galton-Watson process). A Galton-Watson branching process is a stochastic process $(Z_t)_{t \ge 0}$ of the following form:

• Let $Z_0 := 1$.

• Let $X(i,t)$, $i \ge 1$, $t \ge 1$, be an array of i.i.d. $\mathbb{Z}_+$-valued random variables with finite mean $m = \mathbb{E}[X(1,1)] < +\infty$, and define inductively,
$$Z_t := \sum_{1 \le i \le Z_{t-1}} X(i,t).$$

To avoid trivialities we assume $\mathbb{P}[X(1,1) = i] < 1$ for all $i \ge 0$.

Version: November 19, 2020. Modern Discrete Probability: An Essential Toolkit. Copyright © 2020 Sébastien Roch.

In words, $Z_t$ models the size of a population at time (or generation) $t$. The variable $X(i,t)$ corresponds to the number of offspring of the $i$-th individual (if there is one) in generation $t-1$. Generation $t$ is formed of all offspring of the individuals in generation $t-1$.

We denote by $\{p_k\}_{k \ge 0}$ the law of $X(1,1)$. We also let $f(s) := \mathbb{E}[s^{X(1,1)}]$ be the corresponding probability generating function.

By tracking genealogical relationships, that is, who is whose child, we obtain a tree $T$ rooted at the single individual in generation $0$, with a vertex for each individual in the progeny and an edge for each parent-child relationship. We refer to $T$ as a Galton-Watson tree.

A basic observation about Galton-Watson processes is that their growth is exponential in $t$.
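To make the definition concrete, here is a minimal simulation sketch (an illustration, not from the text; the offspring law $\mathrm{Bin}(2, 1/2)$ and all parameter values are arbitrary choices). It also checks the identity $\mathbb{E}[Z_t] = m^t$ of Lemma 6.2 empirically:

```python
import random

def galton_watson(offspring, generations, rng):
    """Generation sizes Z_0, ..., Z_T of a Galton-Watson process.
    `offspring` maps an rng to the number of children of one individual."""
    zs = [1]
    for _ in range(generations):
        zs.append(sum(offspring(rng) for _ in range(zs[-1])))
        if zs[-1] == 0:                          # extinction: stays at 0
            zs.extend([0] * (generations + 1 - len(zs)))
            break
    return zs

rng = random.Random(0)
# offspring ~ Bin(2, 1/2), so m = 1 (critical case)
draw = lambda r: r.getrandbits(1) + r.getrandbits(1)
runs = [galton_watson(draw, 20, rng) for _ in range(10_000)]
# Lemma 6.2 gives E[Z_t] = m^t = 1 for every t; check the mean at t = 5
mean_z5 = sum(run[5] for run in runs) / len(runs)
print(round(mean_z5, 2))
```
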

Lemma 6.2 (Exponential growth I). Let $M_t := m^{-t} Z_t$. Then $(M_t)$ is a nonnegative martingale with respect to the filtration $\mathcal{F}_t = \sigma(Z_0, \ldots, Z_t)$. In particular, $\mathbb{E}[Z_t] = m^t$.

Proof. Recall the following measure-theoretic lemma (see e.g. [Dur10, Exercise 5.1.1]).

Lemma 6.3. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. If $Y_1 = Y_2$ a.s. on $B \in \mathcal{F}$ then $\mathbb{E}[Y_1 \mid \mathcal{F}] = \mathbb{E}[Y_2 \mid \mathcal{F}]$ a.s. on $B$.

Returning to the proof, observe that on $\{Z_{t-1} = k\}$,

$$\mathbb{E}[Z_t \mid \mathcal{F}_{t-1}] = \mathbb{E}\Big[\sum_{1 \le j \le k} X(j,t) \;\Big|\; \mathcal{F}_{t-1}\Big] = mk = m Z_{t-1}.$$
This is true for all $k$. Rearranging shows that $(M_t)$ is a martingale. For the second claim, note that $\mathbb{E}[M_t] = \mathbb{E}[M_0] = 1$. ∎

In fact, the martingale convergence theorem gives the following.

Lemma 6.4 (Exponential growth II). We have $M_t \to M_\infty < +\infty$ a.s. for some nonnegative random variable $M_\infty \in \sigma(\cup_t \mathcal{F}_t)$ with $\mathbb{E}[M_\infty] \le 1$.

Proof. This follows immediately from the martingale convergence theorem for nonnegative martingales (Corollary 3.36) and Fatou's lemma. ∎

6.1.2 Extinction

Observe that $0$ is a fixed point of the process. The event

$$\{Z_t \to 0\} = \{\exists t : Z_t = 0\},$$
is called extinction. Establishing when extinction occurs is a central question in branching process theory. We let $\eta$ be the probability of extinction. Throughout, we assume that $p_0 > 0$ and $p_1 < 1$. Here is a first observation about extinction.

Lemma 6.5. A.s. either $Z_t \to 0$ or $Z_t \to +\infty$.

Proof. The process $(Z_t)$ is integer-valued and $0$ is the only fixed point of the process under the assumption that $p_1 < 1$. From any state $k > 0$, the probability of never returning to $k$ is at least $p_0^k > 0$, so every state $k > 0$ is transient. The claim follows. ∎

In the critical case, this immediately implies almost sure extinction.

Theorem 6.6 (Extinction: critical case). Assume $m = 1$. Then $Z_t \to 0$ a.s., i.e., $\eta = 1$.

Proof. When $m = 1$, $(Z_t)$ itself is a nonnegative martingale. By Lemma 6.4 it converges a.s. to a finite limit, so by Lemma 6.5 that limit must be $0$. ∎

We address the general case using generating functions. Let $f_t(s) := \mathbb{E}[s^{Z_t}]$. Note that, by monotonicity,

$$\eta = \mathbb{P}[\exists t \ge 0 : Z_t = 0] = \lim_{t \to +\infty} \mathbb{P}[Z_t = 0] = \lim_{t \to +\infty} f_t(0). \qquad (6.1)$$

Moreover, by the Markov property, $f_t$ has a natural recursive form:

$$f_t(s) = \mathbb{E}[s^{Z_t}] = \mathbb{E}\big[\mathbb{E}[s^{Z_t} \mid \mathcal{F}_{t-1}]\big] = \mathbb{E}\big[f(s)^{Z_{t-1}}\big] = f_{t-1}(f(s)) = \cdots = f^{(t)}(s), \qquad (6.2)$$
where $f^{(t)}$ is the $t$-th iterate of $f$.

Theorem 6.7 (Extinction: subcritical and supercritical cases). The probability of extinction $\eta$ is given by the smallest fixed point of $f$ in $[0,1]$. Moreover:

• (Subcritical regime) If $m < 1$ then $\eta = 1$.

• (Supercritical regime) If $m > 1$ then $\eta < 1$.

Proof. The case $p_0 + p_1 = 1$ is straightforward: the process dies after a geometrically distributed time. So we assume $p_0 + p_1 < 1$ for the rest of the proof. We first summarize some properties of $f$.

Lemma 6.8. On $[0,1]$, the function $f$ satisfies:

(a) $f(0) = p_0$, $f(1) = 1$;

(b) $f$ is indefinitely differentiable on $[0,1)$;

(c) $f$ is strictly convex and increasing;

(d) $\lim_{s \uparrow 1} f'(s) = m < +\infty$.

Proof. See e.g. [Rud76] for the relevant power series facts. Observe that (a) is clear by definition. The function $f$ is a power series with radius of convergence $R \ge 1$. This implies (b). In particular,
$$f'(s) = \sum_i i\, p_i s^{i-1} \ge 0, \quad \text{and} \quad f''(s) = \sum_i i(i-1)\, p_i s^{i-2} > 0,$$
because we must have $p_i > 0$ for some $i > 1$ by assumption. This proves (c). Since $m < +\infty$, $f'(1) = m$ is well defined and $f'$ is continuous on $[0,1]$, which implies (d). ∎

We first characterize the fixed points of $f$. See Figure 6.1 for an illustration.

Lemma 6.9. We have:

• If $m > 1$ then $f$ has a unique fixed point $\eta_0 \in [0,1)$.

• If $m < 1$ then $f(t) > t$ for $t \in [0,1)$. Let $\eta_0 := 1$ in that case.

Proof. Assume $m > 1$. Since $f'(1) = m > 1$, there is $\delta > 0$ s.t. $f(1-\delta) < 1-\delta$. On the other hand, $f(0) = p_0 > 0$, so by continuity of $f$ there must be a fixed point in $(0, 1-\delta)$. Moreover, by strict convexity and the fact that $f(1) = 1$, if $x \in (0,1)$ is a fixed point then $f(y) < y$ for $y \in (x,1)$, so the fixed point in $[0,1)$ is unique. Assume instead $m < 1$. If $f(t) \le t$ for some $t \in [0,1)$, then the chord from $(t, f(t))$ to $(1,1)$ has slope at least $1$, and strict convexity would force $f'(1) > 1$, a contradiction. ∎

It remains to prove convergence of the iterates to the appropriate fixed point. See Figure 6.2 for an illustration.

Lemma 6.10. We have:

Figure 6.1: Fixed points of $f$ in subcritical (left) and supercritical (right) cases.

Figure 6.2: Convergence of iterates to a fixed point.

• If $x \in [0, \eta_0)$, then $f^{(t)}(x) \uparrow \eta_0$.

• If $x \in (\eta_0, 1)$, then $f^{(t)}(x) \downarrow \eta_0$.

Proof. We only prove the first claim; the argument for the second one is similar. By monotonicity, for $x \in [0, \eta_0)$, we have $x < f(x) < f(\eta_0) = \eta_0$ (indeed $f > \mathrm{id}$ on $[0, \eta_0)$, since $f(0) = p_0 > 0$ and $\eta_0$ is the smallest fixed point). Iterating, $f^{(t)}(x)$ is increasing and bounded above by $\eta_0$; by continuity its limit is a fixed point of $f$ in $[0, \eta_0]$, hence equals $\eta_0$. ∎

The result then follows from the above lemmas together with Equations (6.1) and (6.2).
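Numerically, Theorem 6.7 together with Lemma 6.10 suggests a simple recipe for computing $\eta$: iterate $f$ starting from $0$. A minimal sketch (the particular offspring law below is a made-up example, not from the text):

```python
def pgf(probs):
    """Probability generating function f(s) = sum_k p_k s^k of a finite offspring law."""
    return lambda s: sum(p * s ** k for k, p in enumerate(probs))

def extinction_prob(probs, iters=200):
    """Smallest fixed point of f in [0, 1], obtained as lim f^{(t)}(0) (Theorem 6.7)."""
    f, s = pgf(probs), 0.0
    for _ in range(iters):
        s = f(s)
    return s

# offspring law p_0 = 1/4, p_2 = 3/4: mean m = 3/2 > 1 (supercritical)
eta = extinction_prob([0.25, 0.0, 0.75])
# f(s) = 1/4 + (3/4) s^2 has fixed points 1 and 1/3, so eta = 1/3
print(round(eta, 6))
```
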

Example 6.11 (Poisson branching process). Consider the offspring distribution $X(1,1) \sim \mathrm{Poi}(\lambda)$ with $\lambda > 0$. We refer to this case as the Poisson branching process. Then

$$f(s) = \mathbb{E}[s^{X(1,1)}] = \sum_{i \ge 0} e^{-\lambda} \frac{\lambda^i}{i!}\, s^i = e^{\lambda(s-1)}.$$
So the process goes extinct with probability $1$ when $\lambda \le 1$. For $\lambda > 1$, the probability of extinction $\eta_\lambda$ is the smallest solution in $[0,1]$ to the equation

$$e^{-\lambda(1-x)} = x.$$
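For a concrete check, one can estimate the extinction probability by direct simulation and compare it with the fixed point of $x \mapsto e^{-\lambda(1-x)}$. The following sketch (with an admittedly crude cutoff: populations reaching 500 individuals are declared to survive) is consistent with $\eta_2 \approx 0.203$:

```python
import math, random

def poi(lam, rng):
    """Knuth's Poisson sampler (adequate for small lam)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < thresh:
            return k
        k += 1

lam = 2.0
eta = 0.0
for _ in range(200):                    # iterate x -> exp(-lam (1 - x)) from 0
    eta = math.exp(-lam * (1.0 - eta))

def dies_out(rng, cap=500, max_gen=100):
    """One Poi(lam) branching process; reaching `cap` individuals is
    treated as survival (a heuristic cutoff)."""
    z = 1
    for _ in range(max_gen):
        if z == 0:
            return True
        if z >= cap:
            return False
        z = sum(poi(lam, rng) for _ in range(z))
    return z == 0

rng = random.Random(42)
n = 2000
est = sum(dies_out(rng) for _ in range(n)) / n
print(round(eta, 4), round(est, 4))
```
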

The survival probability $\zeta_\lambda := 1 - \eta_\lambda$ satisfies $1 - e^{-\lambda \zeta_\lambda} = \zeta_\lambda$. ◁

We can use the extinction results to obtain more information on the limit in Lemma 6.4. Of course, conditioned on extinction, $M_\infty = 0$ a.s. On the other hand:

Lemma 6.12 (Exponential growth III). Conditioned on nonextinction, either $M_\infty = 0$ a.s. or $M_\infty > 0$ a.s. In particular, $\mathbb{P}[M_\infty = 0] \in \{\eta, 1\}$.

Proof. A property of rooted trees is said to be inherited if all finite trees satisfy this property and whenever a tree satisfies the property then so do all the descendant trees of the children of the root. The property $\{M_\infty = 0\}$ is inherited. The result then follows from the following 0-1 law.

Lemma 6.13 (0-1 law for inherited properties). For a Galton-Watson tree $T$, an inherited property $A$ has, conditioned on nonextinction, probability $0$ or $1$.

Proof. Let $T^{(1)}, \ldots, T^{(Z_1)}$ be the descendant subtrees of the children of the root. We use the notation $T \in A$ to mean that the tree $T$ satisfies $A$. Then, by independence,

$$\mathbb{P}[A] = \mathbb{E}\big[\mathbb{P}[T \in A \mid Z_1]\big] \le \mathbb{E}\big[\mathbb{P}[T^{(i)} \in A,\ \forall i \le Z_1 \mid Z_1]\big] = \mathbb{E}\big[\mathbb{P}[A]^{Z_1}\big] = f(\mathbb{P}[A]),$$
so $\mathbb{P}[A] \in [0, \eta] \cup \{1\}$. Also $\mathbb{P}[A] \ge \eta$, because $A$ holds for finite trees. That concludes the proof. ∎

A further moment assumption provides a more detailed picture.

Lemma 6.14 (Exponential growth: finite second moment). Let $(Z_t)$ be a branching process with $m = \mathbb{E}[X(1,1)] > 1$ and $\sigma^2 = \mathrm{Var}[X(1,1)] < +\infty$. Then $(M_t)$ converges in $L^2$ and, in particular, $\mathbb{E}[M_\infty] = 1$. Further, $\mathbb{P}[M_\infty = 0] = \eta$.

Proof. We bound $\mathbb{E}[M_t^2]$ by computing it explicitly by induction. From the orthogonality of increments (Lemma 3.40), it holds that

$$\mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + \mathbb{E}[(M_t - M_{t-1})^2].$$

On $\{Z_{t-1} = k\}$,
$$\mathbb{E}[(M_t - M_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}[(Z_t - m Z_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}\Big[\Big(\sum_{i=1}^k X(i,t) - mk\Big)^2 \;\Big|\; \mathcal{F}_{t-1}\Big] = m^{-2t} k \sigma^2 = m^{-2t} Z_{t-1} \sigma^2.$$
Hence
$$\mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + \sigma^2 m^{-t-1}.$$
Since $\mathbb{E}[M_0^2] = 1$,
$$\mathbb{E}[M_t^2] = 1 + \sigma^2 \sum_{i=2}^{t+1} m^{-i},$$
which is uniformly bounded when $m > 1$. Therefore $(M_t)$ converges in $L^2$ (see e.g. [Dur10, Theorem 5.4.5]). Finally, by Fatou's lemma,

$$\mathbb{E}|M_\infty| \le \sup_t \|M_t\|_1 \le \sup_t \|M_t\|_2 < +\infty,$$
and
$$\big|\mathbb{E}[M_t] - \mathbb{E}[M_\infty]\big| \le \|M_t - M_\infty\|_1 \le \|M_t - M_\infty\|_2,$$
implies the convergence of expectations, so $\mathbb{E}[M_\infty] = \lim_t \mathbb{E}[M_t] = 1$. The last statement follows from Lemma 6.12. ∎

Remark 6.15. The Kesten-Stigum theorem gives a necessary and sufficient condition for $\mathbb{E}[M_\infty] = 1$ to hold [KS66b]. See e.g. [LP, Chapter 12].

6.1.3 Bond percolation on Galton-Watson trees

Let $T$ be a Galton-Watson tree for an offspring distribution with mean $m > 1$. Perform bond percolation on $T$ with density $p$.

Theorem 6.16 (Bond percolation on Galton-Watson trees). Conditioned on nonextinction,
$$p_c(T) = \frac{1}{m} \quad \text{a.s.}$$

Proof. Let $\mathcal{C}_0$ be the cluster of the root in $T$ under bond percolation with density $p$. We can think of $\mathcal{C}_0$ as being generated by a Galton-Watson branching process where the offspring distribution is the law of $\sum_{i=1}^{X(1,1)} I_i$, where the $I_i$'s are i.i.d. $\mathrm{Ber}(p)$ and $X(1,1)$ is distributed according to the offspring distribution of $T$. In particular, by conditioning on $X(1,1)$, the offspring mean of $\mathcal{C}_0$ is $mp$. If $mp \le 1$ then

$$1 = \mathbb{P}_p[|\mathcal{C}_0| < +\infty] = \mathbb{E}\big[\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]\big],$$
and we must have $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] = 1$ a.s. In other words, $p_c(T) \ge \frac{1}{m}$ a.s.

On the other hand, the property of trees $\{\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] = 1\}$ is inherited. So, by Lemma 6.13, conditioned on nonextinction, it has probability $0$ or $1$. That probability is of course $1$ on extinction. So by

$$\mathbb{P}_p[|\mathcal{C}_0| < +\infty] = \mathbb{E}\big[\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]\big],$$
if the probability is $1$ conditioned on nonextinction then it must be that $mp \le 1$. In other words, for any fixed $p$ such that $mp > 1$, conditioned on nonextinction, $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] < 1$ a.s., that is, $p_c(T) \le p$ a.s. By monotonicity of $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]$ in $p$, taking a limit $p_n \downarrow 1/m$ proves the result. ∎

6.1.4 Random walk on Galton-Watson trees

To be written. See [LP, Theorem 3.5 and Corollary 5.10].*

*Requires: Sections 2.3.3 and 3.1.1.

6.2 Random-walk representation

In this section, we develop a useful random-walk representation of the Galton-Watson process.

6.2.1 Exploration process

Consider the following exploration process of a Galton-Watson tree $T$. The exploration process, started at the root $0$, has 3 types of vertices:

- $\mathcal{A}_t$: active vertices,
- $\mathcal{E}_t$: explored vertices,
- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{0\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ contains all other vertices in $T$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick an element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := \big(\mathcal{A}_{t-1} \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\}\big) \setminus \{a_t\}$,
- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,
- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\}$.

To be concrete, we choose $a_t$ in breadth-first search (or first-come-first-serve) manner: we exhaust all vertices in generation $t$ before considering vertices in generation $t+1$. We imagine revealing the edges of $T$ as they are encountered in the exploration process and we let $(\mathcal{F}_t)$ be the corresponding filtration. In words, starting with $0$, the Galton-Watson tree $T$ is progressively grown by adding to it at each time a child of one of the previously explored vertices and uncovering its children in $T$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$ is the set of vertices who are known to belong to $T$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let

$$\tau_0 := \inf\{t \ge 0 : A_t = 0\}$$
(which by convention is $+\infty$ if there is no such $t$). The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \le \tau_0$, as exactly one vertex is explored at each

time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of the vertex set of $T$.

Lemma 6.17 (Total progeny). Let $W$ be the total progeny. Then

$$W = \tau_0.$$

The random-walk representation is the following. Observe that the process $(A_t)$ admits a simple recursive form. Recall that $A_0 := 1$. Conditioning on $\mathcal{F}_{t-1}$:

- If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$.

- Otherwise, (a) one active vertex becomes an explored vertex and (b) its neutral neighbors become active vertices. That is,

$$A_t = \begin{cases} A_{t-1} + \underbrace{(-1)}_{(a)} + \underbrace{X_t}_{(b)}, & t - 1 < \tau_0,\\[2pt] 0, & \text{o.w.,}\end{cases}$$
where $X_t$ is distributed according to the offspring distribution.

We let $Y_t := X_t - 1$ and
$$S_t := 1 + \sum_{i=1}^t Y_i,$$
with $S_0 := 1$. Then

$$\tau_0 = \inf\{t \ge 0 : S_t = 0\} = \inf\{t \ge 0 : 1 + [X_1 - 1] + \cdots + [X_t - 1] = 0\} = \inf\{t \ge 0 : X_1 + \cdots + X_t = t - 1\},$$
and $(A_t)$ is a random walk started at $1$ with steps $(Y_t)$, stopped when it hits $0$ for the first time:

$$A_t = S_{t \wedge \tau_0}.$$
We give two applications.
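Before turning to them, the representation can be checked mechanically: exploring the tree in breadth-first order and running the stopped walk on the same offspring draws must give the same total progeny. A small sketch (the subcritical offspring law is an arbitrary choice):

```python
import random

def walk_progeny(draws):
    """W = inf{t >= 1 : X_1 + ... + X_t = t - 1}: first time S_t hits 0."""
    partial = 0
    for t, x in enumerate(draws, start=1):
        partial += x
        if partial == t - 1:
            return t
    return None          # walk did not hit 0 within the supplied draws

def tree_progeny(draw, rng):
    """Count the vertices of the Galton-Watson tree breadth-first,
    recording the offspring numbers in the order they are consumed."""
    queue, draws = 1, []
    while queue > 0:
        x = draw(rng)
        draws.append(x)
        queue += x - 1
    return len(draws), draws

rng = random.Random(7)
# offspring: 2 children w.p. 1/3, none otherwise (subcritical, m = 2/3)
draw = lambda r: 2 if r.random() < 1 / 3 else 0
for _ in range(1000):
    w, draws = tree_progeny(draw, rng)
    assert walk_progeny(draws) == w   # same draws, same total progeny
print("ok")
```
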

6.2.2 Duality principle

The random-walk representation above is useful to prove the following duality principle.

Theorem 6.18 (Duality principle). Let $(Z_t)$ be a branching process with offspring distribution $\{p_k\}_{k \ge 0}$ and extinction probability $\eta < 1$. Let $(Z_t')$ be a branching process with offspring distribution $\{p_k'\}_{k \ge 0}$, where
$$p_k' = \eta^{k-1} p_k.$$

Then $(Z_t)$ conditioned on extinction has the same distribution as $(Z_t')$, which is referred to as the dual branching process.

Proof. We use the random-walk representation. Let $H = (X_1, \ldots, X_{\tau_0})$ and $H' = (X_1', \ldots, X_{\tau_0'}')$ be the histories of the processes $(Z_t)$ and $(Z_t')$ respectively. (Under breadth-first search, the process $(Z_t)$ can be reconstructed from $H$.) In the case of extinction, the history of $(Z_t)$ has finite length. We call $(x_1, \ldots, x_t)$ a valid history if $x_1 + \cdots + x_i - (i-1) > 0$ for all $i < t$ and $x_1 + \cdots + x_t - (t-1) = 0$. For a valid history,

$$\mathbb{P}[H = (x_1, \ldots, x_t) \mid \tau_0 < +\infty] = \frac{\mathbb{P}[H = (x_1, \ldots, x_t)]}{\mathbb{P}[\tau_0 < +\infty]} = \eta^{-1} \prod_{i=1}^t p_{x_i}.$$
Because $x_1 + \cdots + x_t = t - 1$,
$$\eta^{-1} \prod_{i=1}^t p_{x_i} = \eta^{-1} \prod_{i=1}^t \eta^{1 - x_i} p_{x_i}' = \prod_{i=1}^t p_{x_i}' = \mathbb{P}[H' = (x_1, \ldots, x_t)]. \qquad \blacksquare$$

Note that
$$\sum_{k \ge 0} p_k' = \sum_{k \ge 0} \eta^{k-1} p_k = \eta^{-1} f(\eta) = 1,$$
because $\eta$ is a fixed point of $f$. So $\{p_k'\}_{k \ge 0}$ is indeed a probability distribution. Note further that
$$\sum_{k \ge 0} k\, p_k' = \sum_{k \ge 0} k\, \eta^{k-1} p_k = f'(\eta) < 1,$$
since $f'$ is strictly increasing, $f(\eta) = \eta < 1$ and $f(1) = 1$. So the dual branching process is subcritical.

Example 6.19 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 1$. Then the dual probability distribution is given by

$$p_k' = \eta_\lambda^{k-1} p_k = \eta_\lambda^{k-1} e^{-\lambda} \frac{\lambda^k}{k!} = \eta_\lambda^{-1} e^{-\lambda} \frac{(\lambda \eta_\lambda)^k}{k!},$$

where recall that $e^{-\lambda(1 - \eta_\lambda)} = \eta_\lambda$, so

$$p_k' = e^{\lambda(1 - \eta_\lambda)} e^{-\lambda} \frac{(\lambda \eta_\lambda)^k}{k!} = e^{-\lambda \eta_\lambda} \frac{(\lambda \eta_\lambda)^k}{k!}.$$
That is, the dual branching process has offspring distribution $\mathrm{Poi}(\lambda \eta_\lambda)$. ◁
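Example 6.19 can be probed by simulation: run a supercritical Poisson branching process, keep only the extinct runs, and compare the conditional law of $Z_1$ with $\mathrm{Poi}(\lambda \eta_\lambda)$. A sketch (the population cap of 200 is a heuristic stand-in for survival; parameters are arbitrary):

```python
import math, random

def poi(lam, rng):
    """Knuth's Poisson sampler (adequate for small lam)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < thresh:
            return k
        k += 1

lam = 2.0
eta = 0.0
for _ in range(200):                     # extinction probability of Poi(lam)
    eta = math.exp(-lam * (1.0 - eta))

rng = random.Random(1)
first_gen = []                           # Z_1 on the extinct runs
for _ in range(10_000):
    z1 = poi(lam, rng)
    z, gens = z1, 1
    while 0 < z < 200 and gens < 100:    # reaching 200 is treated as survival
        z = sum(poi(lam, rng) for _ in range(z))
        gens += 1
    if z == 0:
        first_gen.append(z1)

# Conditioned on extinction, Z_1 should be ~ Poi(lam * eta); compare P[Z_1 = 0]
p0_emp = first_gen.count(0) / len(first_gen)
p0_dual = math.exp(-lam * eta)
print(round(p0_emp, 3), round(p0_dual, 3))
```
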

6.2.3 Hitting-time theorem

The random-walk representation also gives a formula for the distribution of the size of the progeny. We start with a combinatorial lemma of independent interest.

Let $u_1, \ldots, u_t \in \mathbb{R}$ and define $r_0 := 0$ and $r_i := u_1 + \cdots + u_i$ for $1 \le i \le t$. We say that $j$ is a ladder index if $r_j > r_0 \vee \cdots \vee r_{j-1}$. Consider the cyclic permutations of $u = (u_1, \ldots, u_t)$: $u^{(0)} = u$, $u^{(1)} = (u_2, \ldots, u_t, u_1)$, ..., $u^{(t-1)} = (u_t, u_1, \ldots, u_{t-1})$. Define the corresponding partial sums $r_j^{(\beta)} := u_1^{(\beta)} + \cdots + u_j^{(\beta)}$ for $j = 1, \ldots, t$ and $\beta = 0, \ldots, t-1$. Observe that
$$(r_1^{(\beta)}, \ldots, r_t^{(\beta)}) = \big(r_{\beta+1} - r_\beta,\, r_{\beta+2} - r_\beta,\, \ldots,\, r_t - r_\beta,\, [r_t - r_\beta] + r_1,\, \ldots,\, [r_t - r_\beta] + r_\beta\big)$$
$$= \big(r_{\beta+1} - r_\beta,\, \ldots,\, r_t - r_\beta,\, r_t - [r_\beta - r_1],\, \ldots,\, r_t - [r_\beta - r_{\beta-1}],\, r_t\big). \qquad (6.3)$$

Lemma 6.20 (Spitzer’s combinatorial lemma). Assume rt > 0. Let ` be the num- ber of cyclic permutations such that t is a ladder index. Then ` 1. Moreover, each such cyclic permutation has exactly ` ladder indices.

Proof. We first show that $\ell \ge 1$, i.e., there is at least one cyclic permutation where $t$ is a ladder index. Let $\beta$ be the smallest index achieving the maximum of $r_1, \ldots, r_t$, i.e.,
$$r_\beta > r_1 \vee \cdots \vee r_{\beta-1} \quad \text{and} \quad r_\beta \ge r_{\beta+1} \vee \cdots \vee r_t.$$
From (6.3), the first $t-1$ partial sums of $u^{(\beta)}$ are of the form $r_{\beta+i} - r_\beta \le 0$ or $r_t - [r_\beta - r_j] < r_t$, while the last one is $r_t > 0$. So, in $u^{(\beta)}$, $t$ is a ladder index.

Since $\ell \ge 1$, we can assume w.l.o.g. that $u$ is such that $t$ is a ladder index. We claim that then, for $1 \le \beta \le t-1$, $\beta$ is a ladder index in $u$ if and only if $t$ is a ladder index in $u^{(\beta)}$. Indeed, by (6.3), $t$ is a ladder index in $u^{(\beta)}$ if and only if
$$r_t > r_{\beta+i} - r_\beta, \quad i = 1, \ldots, t - \beta, \qquad \text{and} \qquad r_t > r_t - [r_\beta - r_j], \quad j = 1, \ldots, \beta - 1.$$
Taking $i = t - \beta$ in the first set of conditions gives $r_\beta > 0 = r_0$, and the second set of conditions is $r_\beta > r_j$ for all $j < \beta$; together they are exactly the statement that $\beta$ is a ladder index in $u$. Conversely, if $\beta$ is a ladder index in $u$, then $r_\beta > 0$ and $r_\beta > r_j$ for $j < \beta$; moreover, since $t$ is a ladder index in $u$, $r_{\beta+i} - r_\beta < r_{\beta+i} \vee r_t \le r_t$ for all $i$. Hence $t$ is a ladder index in $u^{(\beta)}$. Counting over $\beta$, the number of ladder indices of $u$ equals $\ell$; applying the same argument to any cyclic permutation in which $t$ is a ladder index yields the claim. ∎
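Lemma 6.20 lends itself to an exhaustive check on small cases; the sketch below verifies it for all $\{-1, 0, 1\}$-valued sequences of length $6$ with $r_t > 0$ (the alphabet and length are arbitrary choices):

```python
from itertools import product

def ladder_indices(u):
    """1-based indices j with r_j > max(r_0, ..., r_{j-1}), where r_0 = 0."""
    out, partial, best = [], 0, 0
    for j, x in enumerate(u, start=1):
        partial += x
        if partial > best:
            out.append(j)
            best = partial
    return out

def cyclic(u, beta):
    """u^(beta) = (u_{beta+1}, ..., u_t, u_1, ..., u_beta)."""
    return u[beta:] + u[:beta]

t = 6
for u in product((-1, 0, 1), repeat=t):
    if sum(u) <= 0:                     # the lemma assumes r_t > 0
        continue
    perms = [cyclic(list(u), b) for b in range(t)]
    with_t = [q for q in perms if t in ladder_indices(q)]
    ell = len(with_t)
    assert ell >= 1                     # at least one such cyclic permutation
    for q in with_t:                    # each has exactly ell ladder indices
        assert len(ladder_indices(q)) == ell
print("ok")
```
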

We are now ready to prove the hitting-time theorem.

Theorem 6.21 (Hitting-time theorem). Let $(Z_t)$ be a Galton-Watson branching process with total progeny $W$. In the random-walk representation of $(Z_t)$,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1],$$
for all $t \ge 1$.

Proof. Let $R_i := 1 - S_i$ and $U_i := 1 - X_i$ for all $i = 1, \ldots, t$, and let $R_0 := 0$, so that $R_i = U_1 + \cdots + U_i$. Then
$$\{X_1 + \cdots + X_t = t - 1\} = \{R_t = 1\},$$
and
$$\{W = t\} = \{t \text{ is the first ladder index in } R_1, \ldots, R_t\}.$$
By symmetry, for all $\beta$,

$$\mathbb{P}[t \text{ is the first ladder index in } R_1, \ldots, R_t] = \mathbb{P}\big[t \text{ is the first ladder index in } R_1^{(\beta)}, \ldots, R_t^{(\beta)}\big].$$

Let $\mathcal{E}_\beta$ be the event on the last line. Hence
$$\mathbb{P}[W = t] = \mathbb{E}[\mathbf{1}_{\mathcal{E}_0}] = \frac{1}{t}\, \mathbb{E}\Big[\sum_{\beta=0}^{t-1} \mathbf{1}_{\mathcal{E}_\beta}\Big].$$
By Spitzer's combinatorial lemma, there is at most one cyclic permutation where $t$ is the first ladder index. In particular, $\sum_{\beta} \mathbf{1}_{\mathcal{E}_\beta} \in \{0, 1\}$. So
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}\Big[\bigcup_{\beta} \mathcal{E}_\beta\Big].$$
Finally observe that, because $R_0 = 0$ and $U_i \le 1$ for all $i$, the partial sum at the $j$-th ladder index must take value $j$. So the event $\bigcup_{\beta} \mathcal{E}_\beta$ implies $\{R_t = 1\}$, because the last partial sum of all cyclic permutations is $R_t$. Similarly, because there is at least one cyclic permutation such that $t$ is a ladder index, the event $\{R_t = 1\}$ implies $\bigcup_{\beta} \mathcal{E}_\beta$. Therefore,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[R_t = 1],$$
which concludes the proof. ∎

Note that the formula in the hitting-time theorem is somewhat remarkable: the probability on the l.h.s. concerns the whole trajectory of the walk, namely $\mathbb{P}[S_i > 0,\ \forall i < t,\ S_t = 0]$, while the r.h.s. involves only the position of the walk at time $t$.

Example 6.22 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 0$. Let $W$ be its total progeny. By the hitting-time theorem, for $t \ge 1$,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1] = \frac{1}{t}\, e^{-\lambda t} \frac{(\lambda t)^{t-1}}{(t-1)!} = e^{-\lambda t} \frac{(\lambda t)^{t-1}}{t!},$$
where we used that a sum of independent Poisson variables is Poisson. ◁
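The formula of Example 6.22 (the Borel distribution) is easy to confront with a Monte Carlo estimate of the walk's hitting time; a sketch with an arbitrary subcritical $\lambda$:

```python
import math, random

lam = 0.8                               # subcritical, so W < +infinity a.s.

def borel(t):
    """P[W = t] = e^{-lam t} (lam t)^{t-1} / t!, computed in log space."""
    return math.exp(-lam * t + (t - 1) * math.log(lam * t) - math.lgamma(t + 1))

def poi(l, rng):
    thresh, k, p = math.exp(-l), 0, 1.0
    while True:
        p *= rng.random()
        if p < thresh:
            return k
        k += 1

rng = random.Random(3)
n = 100_000
counts = {}
for _ in range(n):
    s, t = 1, 0                          # S_0 = 1; stop when the walk hits 0
    while s > 0:
        s += poi(lam, rng) - 1
        t += 1
    counts[t] = counts.get(t, 0) + 1

for t in (1, 2, 3):
    print(t, round(counts[t] / n, 4), round(borel(t), 4))
```
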

6.3 Comparison to branching processes

We begin with an example whose connection to branching processes is clear: percolation on trees. Translating standard branching process results into their percolation counterparts immediately gives a more detailed picture of the behavior of the process than was derived in Section 2.3.3. We then tackle the phase transition of Erdős-Rényi graphs using a comparison to branching processes.

6.3.1 Percolation on trees: critical exponents

In this section, we use branching processes to study bond percolation on the infinite $b$-ary tree $\widehat{T}_b$. The same techniques can be adapted to $T_d$ with $d = b + 1$ in a straightforward manner. We denote the root by $0$.

We think of the open cluster of the root, $\mathcal{C}_0$, as the progeny of a branching process, as follows. Denote by $\partial_n$ the $n$-th level of $\widehat{T}_b$, that is, the vertices of $\widehat{T}_b$ at graph distance $n$ from the root. In the branching process interpretation, we think of the immediate descendants in $\mathcal{C}_0$ of a vertex $v$ as the "children" of $v$. By construction, $v$ has at most $b$ children, independently of all other vertices in the same generation. In this branching process, the offspring distribution is binomial with parameters $b$ and $p$; $Z_n := |\mathcal{C}_0 \cap \partial_n|$ represents the size of the progeny at generation $n$; and $W := |\mathcal{C}_0|$ is the total progeny of the process. In particular, $|\mathcal{C}_0| < +\infty$ if and only if the process goes extinct. Because the mean number of offspring is $bp$, by Theorem 6.7, this leads immediately to a (second) proof of:

Claim 6.23.
$$p_c\big(\widehat{T}_b\big) = \frac{1}{b}.$$
The generating function of the offspring distribution is $\phi(s) := ((1-p) + ps)^b$. So, by Theorems 6.6 and 6.7, the percolation function

$$\theta(p) = \mathbb{P}_p[|\mathcal{C}_0| = +\infty],$$
is $0$ on $[0, 1/b]$, while on $(1/b, 1]$ the quantity $\eta(p) := 1 - \theta(p)$ is the unique solution in $[0,1)$ of the fixed point equation

$$s = ((1-p) + ps)^b. \qquad (6.4)$$
For $b = 2$, for instance, we can compute the fixed point explicitly by noting that

$$0 = ((1-p) + ps)^2 - s = p^2 s^2 + [2p(1-p) - 1]\, s + (1-p)^2,$$
whose solutions for $p \in (1/2, 1]$ are
$$s^* = \frac{-[2p(1-p) - 1] \pm \sqrt{[2p(1-p) - 1]^2 - 4p^2(1-p)^2}}{2p^2} = \frac{-[2p(1-p) - 1] \pm \sqrt{1 - 4p(1-p)}}{2p^2} = \frac{2p^2 + [(1 - 2p) \pm (2p - 1)]}{2p^2}.$$
So, rejecting the fixed point $1$,
$$\theta(p) = 1 - \frac{2p^2 + 2(1 - 2p)}{2p^2} = \frac{2p - 1}{p^2}.$$
We have proved:

Claim 6.24. For $b = 2$,

$$\theta(p) = \begin{cases} 0, & 0 \le p \le \tfrac{1}{2},\\[2pt] \dfrac{2\big(p - \tfrac{1}{2}\big)}{p^2}, & \tfrac{1}{2} < p \le 1. \end{cases}$$
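Claim 6.24 is easy to sanity-check numerically against the fixed-point characterization (6.4); a small sketch:

```python
def eta_numeric(p, b=2, iters=500):
    """Iterate s -> ((1-p) + p s)^b from 0; by Lemma 6.10 this converges to
    the smallest fixed point in [0, 1], the extinction probability eta(p)."""
    s = 0.0
    for _ in range(iters):
        s = ((1 - p) + p * s) ** b
    return s

for p in (0.6, 0.75, 0.9):
    closed = ((1 - p) / p) ** 2         # eta(p) for b = 2
    theta = (2 * p - 1) / p ** 2        # theta(p) from Claim 6.24
    assert abs(eta_numeric(p) - closed) < 1e-9
    assert abs((1 - closed) - theta) < 1e-12
print("ok")
```
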

Conditioned on extinction, each of the root's subtrees in $\mathcal{C}_0$ must itself be finite; arguing as in the duality principle (Theorem 6.18), the offspring distribution conditioned on extinction is, for $k = 0, 1, \ldots, b$,
$$\hat{p}_k = [\eta(p)]^{k-1} \binom{b}{k} p^k (1-p)^{b-k} = \frac{[\eta(p)]^k}{((1-p) + p\eta(p))^b} \binom{b}{k} p^k (1-p)^{b-k}$$
$$= \binom{b}{k} \left(\frac{p\,\eta(p)}{(1-p) + p\,\eta(p)}\right)^k \left(\frac{1-p}{(1-p) + p\,\eta(p)}\right)^{b-k} = \binom{b}{k} \hat{p}^k (1 - \hat{p})^{b-k},$$
where we used (6.4) and implicitly defined the dual density

$$\hat{p} := \frac{p\,\eta(p)}{(1-p) + p\,\eta(p)}. \qquad (6.5)$$
In particular, $\{\hat{p}_k\}$ is indeed a probability distribution. In fact, it is binomial with parameters $b$ and $\hat{p}$. The corresponding generating function is

$$\hat{\phi}(s) := ((1 - \hat{p}) + \hat{p} s)^b = \eta(p)^{-1}\, \phi(s\, \eta(p)),$$
where the second expression can be seen directly from the definition of $\{\hat{p}_k\}$. Moreover,
$$\hat{\phi}'(s) = \eta(p)^{-1}\, \phi'(s\, \eta(p))\, \eta(p) = \phi'(s\, \eta(p)),$$
so $\hat{\phi}'(1) = \phi'(\eta(p)) < 1$ by the proof of Theorem 6.7, confirming that percolation with density $\hat{p}$ is subcritical. Summarizing:

Claim 6.25. Conditioned on $|\mathcal{C}_0| < +\infty$, (supercritical) percolation on $\widehat{T}_b$ with density $p \in (\frac{1}{b}, 1)$ has the same distribution as (subcritical) percolation on $\widehat{T}_b$ with density $\hat{p}$ defined by (6.5).

Therefore:

Claim 6.26.

$$\chi^f(p) := \mathbb{E}_p\big[|\mathcal{C}_0|\, \mathbf{1}_{\{|\mathcal{C}_0| < +\infty\}}\big] = \begin{cases} \dfrac{1}{1 - bp}, & p \in [0, \tfrac{1}{b}),\\[4pt] \dfrac{\eta(p)}{1 - b\hat{p}}, & p \in (\tfrac{1}{b}, 1). \end{cases}$$
For $b = 2$, $\eta(p) = 1 - \theta(p) = \big(\frac{1-p}{p}\big)^2$, so
$$\hat{p} = \frac{p \big(\frac{1-p}{p}\big)^2}{(1-p) + p \big(\frac{1-p}{p}\big)^2} = \frac{(1-p)^2}{p(1-p) + (1-p)^2} = 1 - p,$$
and therefore:

Claim 6.27. For $b = 2$,

$$\chi^f(p) = \begin{cases} \dfrac{1}{1 - 2p}, & p \in [0, \tfrac{1}{2}),\\[4pt] \dfrac{1}{2p - 1}\Big(\dfrac{1-p}{p}\Big)^2, & p \in (\tfrac{1}{2}, 1). \end{cases}$$
In fact, the hitting-time theorem, Theorem 6.21, gives an explicit formula for the distribution of $|\mathcal{C}_0|$. Namely, because $|\mathcal{C}_0| \stackrel{d}{=} \tau_0$ for $S_t = \sum_{\ell \le t} X_\ell - (t-1)$, where $S_0 = 1$ and the $X_\ell$'s are i.i.d. binomial with parameters $b$ and $p$, and further
$$\mathbb{P}[\tau_0 = t] = \frac{1}{t}\, \mathbb{P}[S_t = 0],$$
we have

$$\mathbb{P}_p[|\mathcal{C}_0| = \ell] = \frac{1}{\ell}\, \mathbb{P}\Big[\sum_{i \le \ell} X_i = \ell - 1\Big] = \frac{1}{\ell} \binom{b\ell}{\ell - 1} p^{\ell - 1} (1-p)^{b\ell - (\ell - 1)}, \qquad (6.6)$$
where we used that a sum of independent binomials with the same $p$ is still binomial. In particular, at criticality, using Stirling's formula it can be checked that
$$\mathbb{P}_{p_c}[|\mathcal{C}_0| = \ell] \sim \frac{1}{\ell} \cdot \frac{1}{\sqrt{2\pi p_c (1 - p_c) b \ell}} = \frac{1}{\sqrt{2\pi (1 - p_c)\, \ell^3}},$$
as $\ell \to +\infty$.

Close to criticality, physicists predict that many quantities behave according to power laws of the form $|p - p_c|^{\gamma}$, where the exponent $\gamma$ is referred to as a critical exponent. The critical exponents are believed to satisfy certain "universality" properties. But even proving the existence of such exponents in general remains a major open problem. On trees, though, we can simply read off the critical exponents from the above formulas. For $b = 2$, Claims 6.24 and 6.27 imply for instance that, as $p \to p_c$,
$$\theta(p) \sim 8(p - p_c)\, \mathbf{1}_{\{p > 1/2\}},$$
and
$$\chi^f(p) \sim \frac{1}{2} |p - p_c|^{-1}.$$
In fact, as can be seen from Claim 6.26, the critical exponent of $\chi^f(p)$ does not depend on $b$. The same holds for $\theta(p)$. See Exercise 6.5. Using (6.6), the higher moments of $|\mathcal{C}_0|$ can also be studied around criticality. See Exercise 6.6.

6.3.2 Random binary search trees: height

To be written. See [Dev98, Section 2.1].

6.3.3 Erdős-Rényi graph: the phase transition†

A compelling way to view Erdős-Rényi graphs as the density varies is the following coupling or "evolution." For each pair $\{i,j\}$, let $U_{\{i,j\}}$ be independent uniform random variables in $[0,1]$ and set $\mathcal{G}(p) := ([n], \mathcal{E}(p))$, where $\{i,j\} \in \mathcal{E}(p)$ if and only if $U_{\{i,j\}} \le p$. Then $\mathcal{G}(p)$ is distributed according to $\mathbb{G}_{n,p}$. As $p$ varies from $0$ to $1$, we start with an empty graph and progressively add edges until the complete graph is obtained.

We showed in Section 2.3.2 that $\frac{\log n}{n}$ is a threshold function for connectivity. Before connectivity occurs in the evolution of the random graph, a quantity of interest is the size of the largest connected component. As we show in this section, this quantity itself undergoes a remarkable phase transition: when $p = \frac{\lambda}{n}$ with $\lambda < 1$, the largest component has size $\Theta(\log n)$; as $\lambda$ crosses $1$, many components quickly merge to form a so-called "giant component" of size $\Theta(n)$.

This celebrated result of Erdős and Rényi, which is often referred to as "the" phase transition of the Erdős-Rényi graph, is related to the phase transition in percolation. That should be clear from the similarities between the proofs, specifically the branching process approach to percolation on trees (Section 6.3.1). Although the proof is quite long, it is well worth studying in detail. It employs most tools we have seen up to this point: first and second moment methods, Chernoff-Cramér bound, martingale techniques, coupling and stochastic domination, and branching processes. It is quintessential discrete probability.
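The evolution coupling described above can be sketched directly (a minimal illustration; names and parameters are arbitrary):

```python
import random

def er_evolution(n, seed=0):
    """Monotone coupling of the whole Erdos-Renyi evolution: one uniform
    U_{i,j} per pair; edge {i,j} is present at density p iff U_{i,j} <= p."""
    rng = random.Random(seed)
    return {(i, j): rng.random() for i in range(n) for j in range(i + 1, n)}

def edges_at(unif, p):
    """Edge set of G(p); G(p) ~ G_{n,p}, and G(p) grows as p increases."""
    return {e for e, u in unif.items() if u <= p}

unif = er_evolution(100, seed=1)
e1, e2 = edges_at(unif, 0.01), edges_at(unif, 0.05)
assert e1 <= e2                          # the evolution only adds edges
print(len(e1), len(e2))
```
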

Statements and proof sketch Before stating the main theorems, we recall a basic result from Chapter 2.

- (Poisson tail) Let $S_n$ be a sum of $n$ i.i.d. $\mathrm{Poi}(\lambda)$ variables. Recall from (2.33) and (2.34) that, for $a > \lambda$,

$$-\frac{1}{n} \log \mathbb{P}[S_n \ge an] \ge a \log\frac{a}{\lambda} - a + \lambda =: I_\lambda^{\mathrm{Poi}}(a), \qquad (6.7)$$
and similarly, for $a < \lambda$,

$$-\frac{1}{n} \log \mathbb{P}[S_n \le an] \ge I_\lambda^{\mathrm{Poi}}(a). \qquad (6.8)$$
To simplify the notation, we let

$$I_\lambda := I_\lambda^{\mathrm{Poi}}(1) = \lambda - 1 - \log \lambda \ge 0, \qquad (6.9)$$

where the inequality follows from the convexity of $I_\lambda$ (as a function of $\lambda$) and the fact that it attains its minimum at $\lambda = 1$, where it is $0$.

†Requires: Sections 2.3.1, 2.4.1, 4.1 and 4.3.1.

We let $p_n = \frac{\lambda}{n}$ and denote by $\mathcal{C}_{\max}$ a largest connected component. In the subcritical case, that is, when $\lambda < 1$, we show that the largest connected component has logarithmic size in $n$.

Theorem 6.28 (Subcritical case: upper bound on the largest cluster). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda \in (0,1)$. For all $\varepsilon > 0$,
$$\mathbb{P}_{n,p_n}\big[|\mathcal{C}_{\max}| > (1 + \varepsilon)\, I_\lambda^{-1} \log n\big] = o(1),$$
where $I_\lambda$ is defined in (6.9). (We also give a matching logarithmic lower bound on the size of $\mathcal{C}_{\max}$ in Theorem 6.36.)

In the supercritical case, that is, when $\lambda > 1$, we prove the existence of a unique connected component of size linear in $n$, which is referred to as the giant component.

Theorem 6.29 (Supercritical regime: giant component). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 1$. For any $\gamma \in (1/2, 1)$ and $\delta < 2\gamma - 1$,
$$\mathbb{P}_{n,p_n}\big[\big|\,|\mathcal{C}_{\max}| - \zeta_\lambda n\,\big| \ge n^{\gamma}\big] = O(n^{-\delta}).$$
In fact, with probability $1 - o(1)$, there is a unique largest component, and the second largest cluster has size $O(\log n)$.
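Theorem 6.29 can be illustrated numerically: for $\lambda = 2$, $\zeta_\lambda \approx 0.797$, and the largest component of a single sample of $\mathbb{G}(n, \lambda/n)$ is already close to $\zeta_\lambda n$ for moderate $n$. A sketch (union-find implementation; $n = 2000$ is an arbitrary choice):

```python
import math, random

def largest_component(n, p, rng):
    """Size of the largest component of one sample of G(n, p) (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

lam, n = 2.0, 2000
eta = 0.0
for _ in range(200):                     # extinction probability of Poi(lam)
    eta = math.exp(-lam * (1.0 - eta))
zeta = 1.0 - eta                         # survival probability, ~0.7968

rng = random.Random(9)
frac = largest_component(n, lam / n, rng) / n
print(round(frac, 3), round(zeta, 3))
```
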

See Figure 6.3 for an illustration. At a high level, the proof goes as follows:

• (Subcritical regime) In the subcritical case, we use an exploration process and a domination argument to approximate the size of the connected components with the progeny of a branching process. The result then follows from the hitting-time theorem and the Poisson tail above.

• (Supercritical regime) In the supercritical case, a similar argument gives a bound on the expected size of the giant component, which is related to the survival of the branching process. Chebyshev's inequality gives concentration. The hard part there is to bound the variance.

Exploration process

For a vertex $v \in [n]$, let $\mathcal{C}_v$ be the connected component containing $v$, also referred to as the cluster of $v$. To analyze the size of $\mathcal{C}_v$, we introduce a natural procedure to explore $\mathcal{C}_v$ and show that it is dominated above and below by branching processes. This procedure is similar to the exploration process defined in Section 6.2.1. The exploration process started at $v$ has 3 types of vertices: active, explored, and neutral vertices.

Figure 6.3: Illustration of the phase transition.

- $\mathcal{A}_t$: active vertices,
- $\mathcal{E}_t$: explored vertices,
- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{v\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ contains all other vertices in $G_n$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick a random element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := (\mathcal{A}_{t-1} \setminus \{a_t\}) \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$,
- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,
- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$.

We imagine revealing the edges of $G_n$ as they are encountered in the exploration process and we let $(\mathcal{F}_t)$ be the corresponding filtration. In words, starting with $v$, the cluster of $v$ is progressively grown by adding to it at each time a vertex adjacent to one of the previously explored vertices and uncovering its neighbors in $G_n$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$—the frontier of the process—is the set of vertices who are known to belong to $\mathcal{C}_v$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$. See Figure 6.4.

Figure 6.4: Exploration process for $\mathcal{C}_v$. The green edges are in $\mathcal{F}_t$. The red ones are not.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let

$$\tau_0 := \inf\{t \ge 0 : A_t = 0\}.$$
The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \le \tau_0$, as exactly one vertex is explored at each time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of $[n]$, so
$$A_t + t + N_t = n, \quad \forall t \le \tau_0. \qquad (6.10)$$
Hence, in tracking the size of the exploration process, we can work alternatively with $A_t$ or $N_t$. Specifically, the size of the cluster of $v$ can be characterized as follows.

Lemma 6.30. $\tau_0 = |\mathcal{C}_v|$.

Proof. Indeed a single vertex of $\mathcal{C}_v$ is explored at each time until all of $\mathcal{C}_v$ has been visited. At that point, $\mathcal{A}_t$ is empty. ∎

The processes $(A_t)$ and $(N_t)$ admit a simple recursive form. Conditioning on $\mathcal{F}_{t-1}$:

- (Active vertices) If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$. Otherwise, (a) one active vertex becomes an explored vertex and (b) its neutral neighbors become active vertices. That is,
$$A_t = A_{t-1} + \mathbf{1}_{\{A_{t-1} > 0\}} \big(\underbrace{(-1)}_{(a)} + \underbrace{Z_t}_{(b)}\big), \qquad (6.11)$$
where $Z_t$ is binomial with parameters $N_{t-1} = n - (t-1) - A_{t-1}$ and $p_n$. For the coupling arguments below, it will be useful to think of $Z_t$ as a sum of independent Bernoulli variables. That is, let $(I_{t,j} : t \ge 1, j \ge 1)$ be an array of independent, identically distributed $\{0,1\}$-variables with $\mathbb{P}[I_{1,1} = 1] = p_n$. We write
$$Z_t = \sum_{i=1}^{N_{t-1}} I_{t,i}. \qquad (6.12)$$

- (Neutral vertices) Similarly, if $A_{t-1} > 0$, i.e. $N_{t-1} < n - (t-1)$, then

$$N_t = N_{t-1} - \mathbf{1}_{\{N_{t-1} < n - (t-1)\}}\, Z_t, \qquad (6.13)$$
with $Z_t$ as in (6.12).
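The recursions (6.11)–(6.13) translate directly into code. The sketch below (parameters are arbitrary) explores $\mathcal{C}_v$ by revealing edges on demand and checks the partition identity (6.10) along the way:

```python
import random

def explore_cluster(n, p, v, rng):
    """Exploration process of C_v in G(n, p), revealing edges on demand.
    Returns tau_0 = |C_v| (Lemma 6.30); checks A_t + t + N_t = n (6.10)."""
    active = {v}
    neutral = set(range(n)) - {v}
    t = 0
    while active:
        a = active.pop()                              # a_t becomes explored
        newly = {x for x in neutral if rng.random() < p}
        neutral -= newly                              # neutral -> active
        active |= newly
        t += 1
        assert len(active) + t + len(neutral) == n    # partition identity
    return t

rng = random.Random(11)
n, lam = 500, 0.5              # subcritical; E|C_v| is close to 1/(1 - lam) = 2
sizes = [explore_cluster(n, lam / n, 0, rng) for _ in range(5000)]
print(round(sum(sizes) / len(sizes), 2))
```
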

Branching process arguments

With these observations, we now relate the cluster size of $v$ to the total progeny of a certain branching process. This is the key lemma.

Lemma 6.31 (Cluster size: branching process approximation). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$, and let $\mathcal{C}_v$ be the connected component of $v \in [n]$. Let $W_\lambda$ be the total progeny of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. Then, for $k_n = o(\sqrt{n})$,
$$\mathbb{P}[W_\lambda \ge k_n] - O\Big(\frac{k_n^2}{n}\Big) \le \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \ge k_n] \le \mathbb{P}[W_\lambda \ge k_n].$$
Before proving the lemma, recall the following simple domination results from Chapter 4:

- (Binomial domination) We have
$$n \ge m \implies \mathrm{Bin}(n, p) \succeq \mathrm{Bin}(m, p). \qquad (6.14)$$
The binomial distribution is also dominated by the Poisson distribution in the following way:
$$\lambda \in (0, 1) \implies \mathrm{Poi}(\lambda) \succeq \mathrm{Bin}\Big(n - 1, \frac{\lambda}{n}\Big). \qquad (6.15)$$
For the proofs, see Examples 4.24 and 4.28.

331 We use these domination results to relate the size of the connected components to the progeny of the branching process.

Proof of Lemma 6.31. We start with the upper bound.

Upper bound: Because $N_{t-1} = n - (t-1) - A_{t-1} \le n - 1$, conditioned on $\mathcal{F}_{t-1}$, the following stochastic domination relations hold:
$$\mathrm{Bin}\Big(N_{t-1}, \frac{\lambda}{n}\Big) \preceq \mathrm{Bin}\Big(n - 1, \frac{\lambda}{n}\Big) \preceq \mathrm{Poi}(\lambda),$$
by (6.14) and (6.15). Observe that the r.h.s. does not depend on $N_{t-1}$. Let $(\bar{Z}_t)$ be a sequence of independent $\mathrm{Poi}(\lambda)$ variables. Using the coupling in Example 4.28, we can couple the processes $(I_{t,j})$ and $(\bar{Z}_t)$ in such a way that $\bar{Z}_t \ge \sum_{j=1}^{n-1} I_{t,j}$ a.s. for all $t$. Then, by induction on $t$, $0 \le A_t \le \bar{A}_t$ a.s. for all $t$, where we define
$$\bar{A}_t := \bar{A}_{t-1} + \mathbf{1}_{\{\bar{A}_{t-1} > 0\}}\big(-1 + \bar{Z}_t\big), \qquad (6.16)$$
with $\bar{A}_0 := 1$. (In fact, this is a domination of Markov transition matrices, as defined in Definition 4.36.) In words, $(A_t)$ is stochastically dominated by the exploration process of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. As a result, letting
$$\bar{\tau}_0 := \inf\{t \ge 0 : \bar{A}_t = 0\}$$
be the total progeny of the branching process, we immediately get

$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \ge k_n] = \mathbb{P}_{n,p_n}[\tau_0 \ge k_n] \le \mathbb{P}[\bar{\tau}_0 \ge k_n] = \mathbb{P}[W_\lambda \ge k_n].$$

Lower bound: In the other direction, we proceed in two steps. We first show that, up to a certain time, the process is bounded from below by a branching process with binomial offspring distribution. In a second step, we show that this binomial branching process can be approximated by a Poisson branching process.

1. (Domination from below) Let At be defined as

A := A + 1 1+Z , (6.17) t t 1 At 1>0 t { } ⇥ ⇤ with A0 := 1, where n kn Zt := It,j. (6.18) Xi=1 Note that (At) is the size of the active set in the exploration process of a branching process with offspring distribution Bin(n k ,p ). Let n n ⌧ := inf t 0:A =0 , 0 { t } 332 be the total progeny of this branching process. We claim that At is bounded from below by At up to time

n kn := inf t 0:Nt n kn . {  }

Indeed, for all t n kn , Nt 1 >n kn. Hence, by (6.12) and (6.18),  Zt Zt for all t n kn and as a result, by induction on t, 

At At, t n kn . 8 

Because the inequality between At and At holds only up to time n kn , we cannot compare directly ⌧0 and ⌧0. However, observe that the size of the cluster of v is at least the total number of active and explored vertices at any

time t; in particular, when n kn < + , 1

v An kn + En kn = n Nn kn kn. |C |

On the other hand, when n kn =+ , Nt >n kn for all t—in particular 1 for all t ⌧ —and therefore

⌧0 kn = n kn < + = ⌧0 kn. ) 1 ) In particular,

P[⌧ kn] Pn,pn [⌧0 kn]. (6.19) 0  2. (Poisson approximation) By Theorem 6.21, t 1 P[⌧0 = t]= P Zi = t 1 , (6.20) t " # Xi=1 t where the Zs are independent Bin(n k ,p ). Note that Z i n n i=1 i ⇠ Bin(t(n kn),pn). Recall the definition of (Z) from (6.16). By Exam- t P ple 4.10, Theorem 4.16, and the triangle inequality for total variation dis- tance, t t

P Zi = t 1 P Zi = t 1 " # " # Xi=1 Xi=1 1 t(n k )( log(1 p ))2 +[t t(n k )( log(1 p ))]  2 n n n n 2 1 2 2 tn + O(n ) + t t(n k ) + O(n )  2 n n n ✓ ◆  ✓ ◆ tk = O n . n ✓ ◆ 333 So by (6.20)

P[⌧ kn]=1 P[⌧

Pn,pn [ v kn]=Pn,pn [⌧0 kn] |C | 2 kn P[⌧ kn] O 0 n ✓ ◆ 2 kn = P[W kn] O . n ✓ ◆

Remark 6.32. In fact one can get a slightly better lower bound. See Exercise 6.7.

When $k_n$ is large, the branching process approximation above is not as accurate because of a saturation effect: an Erdős–Rényi graph has a finite pool of vertices from which to draw edges; as the number of neutral vertices decreases, so does the expected number of uncovered edges at each step. Instead we use the following lemma.

Lemma 6.33 (Cluster size: saturation). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$ and let $\mathcal{C}_v$ be the connected component of $v \in [n]$. Let $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Then, for any $t$,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \le \mathbb{P}[Y_t = t-1].$$

Proof. We work with the neutral vertices. By Lemma 6.30 and Equation (6.10), for any $t$,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] = \mathbb{P}_{n,p_n}[\tau_0 = t] \le \mathbb{P}_{n,p_n}[N_t = n-t]. \tag{6.21}$$
Recall that $N_0 = n-1$ and
$$N_t = N_{t-1} - \mathbf{1}_{\{A_{t-1} > 0\}}\sum_{i=1}^{N_{t-1}} I_{t,i}. \tag{6.22}$$
It is easier to consider the process without the indicator, as it has a simple distribution. Define $N_0' := n-1$ and
$$N_t' := N_{t-1}' - \sum_{i=1}^{N_{t-1}'} I_{t,i}, \tag{6.23}$$
and observe that $N_t' \le N_t$ for all $t$, as the two processes agree up to time $\tau_0$, at which point $N_t$ stays fixed. The interpretation of $N_t'$ is straightforward: starting with $n-1$ vertices, at each time step each remaining vertex is discarded with probability $p_n$. Hence, the number of surviving vertices at time $t$ has distribution
$$N_t' \sim \mathrm{Bin}(n-1, (1-p_n)^t),$$
by the independence of the steps. Arguing as in (6.21),
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \le \mathbb{P}_{n,p_n}[N_t' = n-t] = \mathbb{P}_{n,p_n}[(n-1) - N_t' = t-1] = \mathbb{P}[Y_t = t-1],$$
which concludes the proof.
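The distributional identity $N_t' \sim \mathrm{Bin}(n-1,(1-p_n)^t)$ can be checked by simulating the recursion (6.23) directly and comparing the empirical mean to $(n-1)(1-p_n)^t$. This is an illustrative sketch with arbitrary parameters, not part of the text:

```python
import random

def neutral_process(n, p, T, rng):
    # recursion (6.23): start with n-1 neutral vertices; at every step
    # each remaining vertex is independently discarded with probability p
    N = n - 1
    for _ in range(T):
        N -= sum(rng.random() < p for _ in range(N))
    return N

rng = random.Random(1)
n, p, T, trials = 200, 0.02, 10, 400
avg = sum(neutral_process(n, p, T, rng) for _ in range(trials)) / trials
target = (n - 1) * (1 - p) ** T   # mean of Bin(n-1, (1-p)^T)
```

The empirical average should match `target` up to Monte Carlo fluctuations of order one.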

Combining the previous lemmas we get:

Lemma 6.34 (Bound on the cluster size). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$ and let $\mathcal{C}_v$ be the connected component of $v \in [n]$.

- (Subcritical case) Assume $\lambda \in (0,1)$. For all $\kappa > 0$,
$$\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_v| > (1+\kappa)I_\lambda^{-1}\log n\right] = O(n^{-(1+\kappa)}).$$

- (Supercritical case) Assume $\lambda > 1$. Let $\zeta_\lambda$ be the unique solution in $(0,1)$ to the fixed point equation
$$1 - e^{-\lambda\zeta} = \zeta.$$
Note that, by Example 6.11, $\zeta_\lambda$ is the survival probability of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. For any $\kappa > 0$,
$$\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_v| > (1+\kappa)I_\lambda^{-1}\log n\right] = \zeta_\lambda + O\!\left(\frac{\log^2 n}{n}\right).$$
Moreover, for any $\alpha < \zeta_\lambda$ and any $\kappa > 0$, there exists $c_{\kappa,\alpha} > 0$ large enough so that
$$\mathbb{P}_{n,p_n}\!\left[(1+c_{\kappa,\alpha})I_\lambda^{-1}\log n \le |\mathcal{C}_v| \le \alpha n\right] = O(n^{-(1+\kappa)}). \tag{6.24}$$

Proof. In both cases we use Lemma 6.31. To apply the lemma we need to bound the tail of the progeny $W_\lambda$ of a Poisson branching process. Using the notation of Lemma 6.31, by Theorem 6.21,

$$\mathbb{P}[W_\lambda > k_n] = \mathbb{P}[W_\lambda = +\infty] + \sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i = t-1\right], \tag{6.25}$$
where the $Z_i$'s are i.i.d. $\mathrm{Poi}(\lambda)$. Both terms on the r.h.s. depend on whether the mean $\lambda$ is smaller or larger than 1.

We start with the first term. When $\lambda < 1$, the Poisson branching process goes extinct with probability 1. Hence $\mathbb{P}[W_\lambda = +\infty] = 0$. When $\lambda > 1$ on the other hand, $\mathbb{P}[W_\lambda = +\infty] = \zeta_\lambda$, where $\zeta_\lambda > 0$ is the survival probability of the branching process.

As to the second term, the sum of the $Z_i$'s is $\mathrm{Poi}(\lambda t)$. When $\lambda < 1$, using (6.7),
$$\sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i = t-1\right] \le \sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i \ge t-1\right] \le \sum_{t > k_n}\frac{1}{t}\exp\!\left(-t\,I^{\mathrm{Poi}}_\lambda\!\left(\frac{t-1}{t}\right)\right)$$
$$\le \sum_{t > k_n}\frac{1}{t}\exp\!\left(-t\left(I_\lambda - O(t^{-1})\right)\right) \le C'\sum_{t > k_n}\exp(-tI_\lambda) \le C\exp(-I_\lambda k_n), \tag{6.26}$$
for some constants $C, C' > 0$, where we assumed that $k_n = \omega(1)$. When $\lambda > 1$,
$$\sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i = t-1\right] \le \sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i \le t\right] \le \sum_{t > k_n}\exp(-tI_\lambda) \le C\exp(-I_\lambda k_n), \tag{6.27}$$
for a possibly different $C > 0$.

Subcritical case: Assume $0 < \lambda < 1$ and let $c = (1+\kappa)I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 6.31,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c\log n] \le \mathbb{P}[W_\lambda > c\log n].$$
By (6.25) and (6.26),
$$\mathbb{P}[W_\lambda > c\log n] = O(\exp(-I_\lambda c\log n)), \tag{6.28}$$
which proves the claim.

Supercritical case: Now assume $\lambda > 1$ and again let $c = (1+\kappa)I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 6.31,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c\log n] = \mathbb{P}[W_\lambda > c\log n] + O\!\left(\frac{\log^2 n}{n}\right). \tag{6.29}$$
By (6.25) and (6.27),
$$\mathbb{P}[W_\lambda > c\log n] = \zeta_\lambda + O(\exp(-cI_\lambda\log n)) = \zeta_\lambda + O(n^{-(1+\kappa)}). \tag{6.30}$$
Combining (6.29) and (6.30), for any $\kappa > 0$,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c\log n] = \zeta_\lambda + O\!\left(\frac{\log^2 n}{n}\right). \tag{6.31}$$
Next, we show that in the supercritical case, when $|\mathcal{C}_v| > c\log n$, the cluster size is in fact linear in $n$ with high probability. By Lemma 6.33,

$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \le \mathbb{P}[Y_t = t-1] \le \mathbb{P}[Y_t \le t],$$
where $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Roughly, the r.h.s. is negligible until the mean
$$\mu_t := (n-1)\left(1 - \left(1 - \frac{\lambda}{n}\right)^t\right)$$
is of the order of $t$. Recall that $\zeta_\lambda$ is the unique solution in $(0,1)$ to the fixed point equation
$$f_\lambda(\zeta) := 1 - e^{-\lambda\zeta} = \zeta.$$
The solution is unique because $f_\lambda(0) = 0$, $f_\lambda(1) < 1$, and $f_\lambda$ is increasing, strictly concave, and has derivative $\lambda > 1$ at 0. Note in particular that, when $t = \zeta_\lambda n$, $\mu_t \approx t$. Let $\alpha < \zeta_\lambda$. For any $t \in [c\log n, \alpha n]$, by a Chernoff bound for Poisson trials (Theorem 2.37),
$$\mathbb{P}[Y_t \le t] \le \exp\!\left(-\frac{\mu_t}{2}\left(1 - \frac{t}{\mu_t}\right)^2\right). \tag{6.32}$$
For $t/n \le \alpha < \zeta_\lambda$, using $1-x \le e^{-x}$ for $x \in (0,1)$, there is $\beta_{\lambda,\alpha} > 1$ such that
$$\mu_t \ge (n-1)\left(1 - e^{-\lambda t/n}\right) = t\cdot\frac{n-1}{n}\cdot\frac{1 - e^{-\lambda (t/n)}}{t/n} = t\cdot\frac{n-1}{n}\cdot\frac{f_\lambda(t/n)}{t/n} \ge t\cdot\frac{n-1}{n}\cdot\frac{1 - e^{-\lambda\alpha}}{\alpha} \ge \beta_{\lambda,\alpha}\,t,$$
for $n$ large enough, by the properties of $f_\lambda$ mentioned above. Plugging this back into (6.32), we get
$$\mathbb{P}[Y_t \le t] \le \exp\!\left(-\frac{\beta_{\lambda,\alpha}\,t}{2}\left(1 - \frac{1}{\beta_{\lambda,\alpha}}\right)^2\right).$$
Therefore
$$\sum_{t = c\log n}^{\alpha n}\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \le \sum_{t = c\log n}^{\alpha n}\mathbb{P}[Y_t \le t] \le \sum_{t = c\log n}^{+\infty}\exp\!\left(-\frac{\beta_{\lambda,\alpha}\,t}{2}\left(1 - \frac{1}{\beta_{\lambda,\alpha}}\right)^2\right) = O\!\left(\exp\!\left(-\frac{\beta_{\lambda,\alpha}}{2}\left(1 - \frac{1}{\beta_{\lambda,\alpha}}\right)^2 c\log n\right)\right).$$
Taking the constant $c_{\kappa,\alpha} > 0$ large enough proves (6.24).
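The survival probability $\zeta_\lambda$ has no closed form, but the strict concavity of $f_\lambda$ noted above makes the fixed-point iteration $z \mapsto 1 - e^{-\lambda z}$, started from $z = 1$, converge monotonically to $\zeta_\lambda$. A minimal sketch (not from the text):

```python
from math import exp

def survival_prob(lam, tol=1e-12, max_iter=10**6):
    # unique fixed point of f(z) = 1 - exp(-lam*z) in (0, 1] when lam > 1;
    # iterating from z = 1 decreases monotonically to zeta_lam
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - exp(-lam * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

zeta2 = survival_prob(2.0)   # zeta_2, the survival probability at lam = 2
```

For $\lambda \le 1$ the same iteration drifts down to 0, consistent with almost-sure extinction.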

Let $\mathcal{C}_{\max}$ be the largest connected component of $G_n$ (choosing the component containing the lowest label if there is more than one such component). Our goal is to characterize the size of $\mathcal{C}_{\max}$. Let
$$X_k := \sum_{v \in [n]}\mathbf{1}_{\{|\mathcal{C}_v| > k\}}$$
be the number of vertices in clusters of size larger than $k$. There is a natural connection between $X_k$ and $\mathcal{C}_{\max}$, namely,
$$|\mathcal{C}_{\max}| > k \iff X_k > 0 \iff X_k > k.$$
A first moment argument on $X_k$ and the previous lemma immediately imply an upper bound on the size of $\mathcal{C}_{\max}$ in the subcritical case.

Proof of Theorem 6.28. Let $c = (1+\kappa)I_\lambda^{-1}$ for $\kappa > 0$. We use the first moment method on $X_k$. By symmetry and the first moment method (Corollary ??),
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c\log n] = \mathbb{P}_{n,p_n}[X_{c\log n} > 0] \le \mathbb{E}_{n,p_n}[X_{c\log n}] = n\,\mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c\log n]. \tag{6.33}$$
By Lemma 6.34,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c\log n] = O(n\cdot n^{-(1+\kappa)}) = O(n^{-\kappa}) \to 0,$$
as $n \to +\infty$.

In fact we prove below that the largest component is of size roughly $I_\lambda^{-1}\log n$. But first we turn to the supercritical regime.

Second moment arguments To characterize the size of the largest cluster in the supercritical case, we need a second moment argument. Assume $\lambda > 1$. For $\kappa > 0$ and $\alpha < \zeta_\lambda$, let $c_{\kappa,\alpha}$ be as defined in Lemma 6.34. Set
$$k_n := (1 + c_{\kappa,\alpha})I_\lambda^{-1}\log n \qquad\text{and}\qquad \bar{k}_n := \alpha n.$$
We call a vertex $v$ such that $|\mathcal{C}_v| \le k_n$ a small vertex. Let
$$Y_k := \sum_{v \in [n]}\mathbf{1}_{\{|\mathcal{C}_v| \le k\}}.$$
Then $Y_{k_n}$ is the number of small vertices. Observe that by definition $Y_k = n - X_k$. Hence by Lemma 6.34, the expectation of $Y_{k_n}$ is
$$\mathbb{E}_{n,p_n}[Y_{k_n}] = n\left(1 - \mathbb{P}_{n,p_n}[|\mathcal{C}_v| > k_n]\right) = (1 - \zeta_\lambda)n + O(\log^2 n). \tag{6.34}$$
Using Chebyshev's inequality (Theorem 2.2), we prove that $Y_{k_n}$ is close to its expectation:

Lemma 6.35 (Concentration of $Y_{k_n}$). For any $\gamma \in (1/2, 1)$ and $\delta < 2\gamma - 1$,
$$\mathbb{P}_{n,p_n}\!\left[|Y_{k_n} - (1-\zeta_\lambda)n| \ge n^\gamma\right] = O(n^{-\delta}).$$
Lemma 6.35, which is proved below, leads to our main result in the supercritical case: the existence of the giant component, a unique cluster of size linear in $n$.

Proof of Theorem 6.29. Take $\alpha \in (\zeta_\lambda/2, \zeta_\lambda)$ and let $k_n$ and $\bar{k}_n$ be as above. Let
$$\mathcal{B}_{1,n} := \left\{|X_{k_n} - \zeta_\lambda n| \ge n^\gamma\right\}.$$
Because $\gamma < 1$, for $n$ large enough, the event $\mathcal{B}_{1,n}^c$ implies that $X_{k_n} \ge 1$ and, in particular, that $|\mathcal{C}_{\max}| \le X_{k_n}$.

Let $\mathcal{B}_{2,n} := \{\exists v,\ |\mathcal{C}_v| \in [k_n, \bar{k}_n]\}$. If, in addition to $\mathcal{B}_{1,n}^c$, $\mathcal{B}_{2,n}^c$ also holds then
$$X_{k_n} = X_{\bar{k}_n}.$$
There is equality with $|\mathcal{C}_{\max}|$ in the last display if there is a unique cluster of size greater than $\bar{k}_n$. This is indeed the case under $\mathcal{B}_{1,n}^c \cap \mathcal{B}_{2,n}^c$: if there were two distinct clusters of size greater than $\bar{k}_n$, then since $2\alpha > \zeta_\lambda$ we would have for $n$ large enough
$$X_{k_n} = X_{\bar{k}_n} > 2\bar{k}_n = 2\alpha n > \zeta_\lambda n + n^\gamma,$$
a contradiction. Hence we have proved that, under $\mathcal{B}_{1,n}^c \cap \mathcal{B}_{2,n}^c$,
$$|\mathcal{C}_{\max}| = X_{k_n} = X_{\bar{k}_n}.$$
Applying Lemmas 6.34 and 6.35 concludes the proof.
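The conclusion of Theorem 6.29, that $|\mathcal{C}_{\max}|/n$ concentrates around $\zeta_\lambda$, is visible already at moderate $n$. This sketch (arbitrary parameters $n = 2000$, $\lambda = 2$; illustrative only, not from the text) draws one sample of $G_{n,\lambda/n}$ and compares the largest-component fraction to the survival probability $\zeta_2 \approx 0.7968$:

```python
import random
from collections import Counter

def largest_component_fraction(n, lam, rng):
    # sample G(n, lam/n) edge by edge and return |C_max| / n via union-find
    p = lam / n
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n

frac = largest_component_fraction(2000, 2.0, random.Random(42))
# zeta_2 ~ 0.7968 is the root of 1 - exp(-2*zeta) = zeta
```

One sample at this size typically lands within a few percent of $\zeta_2$, reflecting the $n^\gamma$ fluctuation scale in Lemma 6.35.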

It remains to prove Lemma 6.35.

Proof of Lemma 6.35. The main task is to bound the variance of $Y_{k_n}$. Note that
$$\mathbb{E}_{n,p_n}[Y_k^2] = \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, |\mathcal{C}_v| \le k]$$
$$= \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, |\mathcal{C}_v| \le k, u \leftrightarrow v] + \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, |\mathcal{C}_v| \le k, u \nleftrightarrow v], \tag{6.35}$$
where $u \leftrightarrow v$ indicates that $u$ and $v$ are in the same connected component.

To bound the first term in (6.35), we note that $u \leftrightarrow v$ implies that $\mathcal{C}_u = \mathcal{C}_v$. Hence,
$$\sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, |\mathcal{C}_v| \le k, u \leftrightarrow v] = \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, v \in \mathcal{C}_u]$$
$$= \sum_{u,v \in [n]}\mathbb{E}_{n,p_n}\!\left[\mathbf{1}_{\{|\mathcal{C}_u| \le k\}}\mathbf{1}_{\{v \in \mathcal{C}_u\}}\right] = \sum_{u \in [n]}\mathbb{E}_{n,p_n}\!\left[|\mathcal{C}_u|\,\mathbf{1}_{\{|\mathcal{C}_u| \le k\}}\right] = n\,\mathbb{E}_{n,p_n}\!\left[|\mathcal{C}_1|\,\mathbf{1}_{\{|\mathcal{C}_1| \le k\}}\right] \le nk. \tag{6.36}$$

To bound the second term in (6.35), we sum over the size of $\mathcal{C}_u$ and note that, conditioned on $\{|\mathcal{C}_u| = \ell, u \nleftrightarrow v\}$, the size of $\mathcal{C}_v$ has the same distribution as the unconditional size of $\mathcal{C}_1$ in a $G_{n-\ell,p_n}$ random graph, that is,
$$\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \le k \mid |\mathcal{C}_u| = \ell, u \nleftrightarrow v] = \mathbb{P}_{n-\ell,p_n}[|\mathcal{C}_1| \le k].$$
Observe that the probability on the l.h.s. is increasing in $\ell$. Hence
$$\sum_{u,v \in [n]}\sum_{\ell \le k}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \le k, u \nleftrightarrow v]$$
$$= \sum_{u,v \in [n]}\sum_{\ell \le k}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, u \nleftrightarrow v]\,\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \le k \mid |\mathcal{C}_u| = \ell, u \nleftrightarrow v]$$
$$\le \sum_{u,v \in [n]}\sum_{\ell \le k}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell]\,\mathbb{P}_{n-k,p_n}[|\mathcal{C}_v| \le k]$$
$$= \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k]\,\mathbb{P}_{n-k,p_n}[|\mathcal{C}_v| \le k].$$

To get a bound on the variance of $Y_k$, we need to relate this last expression to $(\mathbb{E}_{n,p_n}[Y_k])^2$. For this purpose we define
$$\Delta_k := \mathbb{P}_{n-k,p_n}[|\mathcal{C}_1| \le k] - \mathbb{P}_{n,p_n}[|\mathcal{C}_1| \le k].$$
Then, plugging this back above, we get
$$\sum_{u,v \in [n]}\sum_{\ell \le k}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \le k, u \nleftrightarrow v] \le \sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k]\left(\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \le k] + \Delta_k\right) \le (\mathbb{E}_{n,p_n}[Y_k])^2 + n^2|\Delta_k|,$$
and it remains to bound $\Delta_k$. We use a coupling argument. Let $H \sim G_{n-k,p_n}$ and construct $H' \sim G_{n,p_n}$ in the following manner: let $H'$ coincide with $H$ on the first $n-k$ vertices, then pick the rest of the edges independently. Then clearly $\Delta_k \ge 0$, since the cluster of 1 in $H'$ includes the cluster of 1 in $H$. In fact, $\Delta_k$ is the probability that under this coupling the cluster of 1 has at most $k$ vertices in $H$ but not in $H'$. That implies in particular that at least one of the vertices in the cluster of 1 in $H$ is connected to a vertex in $\{n-k+1,\dots,n\}$. Hence, by a union bound over those edges,
$$\Delta_k \le k^2 p_n,$$
and
$$\sum_{u,v \in [n]}\mathbb{P}_{n,p_n}[|\mathcal{C}_u| \le k, |\mathcal{C}_v| \le k, u \nleftrightarrow v] \le (\mathbb{E}_{n,p_n}[Y_k])^2 + \lambda k^2 n. \tag{6.37}$$
Combining (6.36) and (6.37), we get
$$\mathrm{Var}[Y_k] \le nk + \lambda k^2 n \le 2\lambda k^2 n,$$
where we used $\lambda > 1$. The result follows from Chebyshev's inequality (Theorem 2.2) and Equation (6.34).

A similar second moment argument also gives a lower bound on the size of the largest component in the subcritical case. We proved in Theorem 6.28 that, when $\lambda < 1$, the probability of observing a connected component of size significantly larger than $I_\lambda^{-1}\log n$ is vanishingly small. In the other direction, we get:

Theorem 6.36 (Subcritical regime: lower bound on the largest cluster). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda \in (0,1)$. For all $\kappa \in (0,1)$,
$$\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_{\max}| \le (1-\kappa)I_\lambda^{-1}\log n\right] = o(1).$$

Proof. Recall that

$$X_k := \sum_{v \in [n]}\mathbf{1}_{\{|\mathcal{C}_v| > k\}}.$$
It suffices to prove that with probability $1 - o(1)$ we have $X_k > 0$ when $k = (1-\kappa)I_\lambda^{-1}\log n$. To apply the second moment method, we need an upper bound on the second moment of $X_k$ and a lower bound on its first moment. The following lemma is closely related to Lemma 6.35. Exercise 6.8 asks for a proof.

Lemma 6.37 (Second moment of $X_k$). Assume $\lambda < 1$. There is $C > 0$ such that
$$\mathbb{E}_{n,p_n}[X_k^2] \le (\mathbb{E}_{n,p_n}[X_k])^2 + Cnke^{-kI_\lambda}, \qquad \forall k \ge 0.$$

Lemma 6.38 (First moment of $X_{k_n}$). Let $k_n = (1-\kappa)I_\lambda^{-1}\log n$. Then, for any $\delta \in (0,\kappa)$, we have that
$$\mathbb{E}_{n,p_n}[X_{k_n}] = \Omega(n^\delta),$$
for $n$ large enough.

Proof. By Lemma 6.31,
$$\mathbb{E}_{n,p_n}[X_{k_n}] = n\,\mathbb{P}_{n,p_n}[|\mathcal{C}_1| > k_n] = n\,\mathbb{P}[W_\lambda > k_n] - O(k_n^2). \tag{6.38}$$
Once again, we use the random-walk representation of the total progeny of a branching process (Theorem 6.21). Using the notation of Lemma 6.31,
$$\mathbb{P}[W_\lambda > k_n] = \sum_{t > k_n}\frac{1}{t}\,\mathbb{P}\!\left[\sum_{i=1}^{t} Z_i = t-1\right] = \sum_{t > k_n}\frac{1}{t}\,e^{-\lambda t}\frac{(\lambda t)^{t-1}}{(t-1)!}.$$
Using Stirling's formula, we note that
$$\frac{1}{t}\,e^{-\lambda t}\frac{(\lambda t)^{t-1}}{(t-1)!} = e^{-\lambda t}\frac{(\lambda t)^{t-1}}{t!} = \frac{e^{-\lambda t}(\lambda t)^{t-1}}{(t/e)^t\sqrt{2\pi t}\,(1+o(1))} = \frac{1+o(1)}{\lambda t\sqrt{2\pi t}}\exp(-\lambda t + t\log\lambda + t) = \frac{1+o(1)}{\lambda\sqrt{2\pi t^3}}\exp(-tI_\lambda).$$
For any $\varepsilon > 0$, for $n$ large enough,
$$\mathbb{P}[W_\lambda > k_n] \ge \sum_{t > k_n}\exp(-t(I_\lambda + \varepsilon)) = \Omega(\exp(-k_n(I_\lambda + \varepsilon))).$$
For any $\delta \in (0,\kappa)$, plugging the last line back into (6.38) and taking $\varepsilon$ small enough gives
$$\mathbb{E}_{n,p_n}[X_{k_n}] = \Omega(n\exp(-k_n(I_\lambda + \varepsilon))) = \Omega\!\left(\exp\!\left(\left\{1 - (1-\kappa)I_\lambda^{-1}(I_\lambda + \varepsilon)\right\}\log n\right)\right) = \Omega(n^\delta).$$
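The total progeny law used in this proof, $\mathbb{P}[W_\lambda = t] = e^{-\lambda t}(\lambda t)^{t-1}/t!$ (the Borel distribution), can be checked numerically: for $\lambda < 1$ it must sum to 1, and the Stirling computation above predicts $\mathbb{P}[W_\lambda = t] \approx e^{-tI_\lambda}/(\lambda\sqrt{2\pi t^3})$ with $I_\lambda = \lambda - 1 - \log\lambda$. A sketch (not from the text):

```python
from math import exp, log, lgamma, pi, sqrt

def borel_pmf(lam, t):
    # P[W_lam = t] = e^{-lam*t} (lam*t)^{t-1} / t!, computed in log space
    return exp(-lam * t + (t - 1) * log(lam * t) - lgamma(t + 1))

lam = 0.5
I = lam - 1 - log(lam)                     # rate I_lam = lam - 1 - log(lam)
total = sum(borel_pmf(lam, t) for t in range(1, 2000))
# Stirling prediction at t = 100 vs the exact pmf
approx = exp(-100 * I) / (lam * sqrt(2 * pi * 100**3))
ratio = borel_pmf(lam, 100) / approx
```

The truncated sum is 1 up to an exponentially small tail, and the Stirling approximation is already accurate to about $1/(12t)$ at $t = 100$.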

We return to the proof of Theorem 6.36. Let $k_n = (1-\kappa)I_\lambda^{-1}\log n$. By the second moment method (Theorem 2.20) and Lemmas 6.37 and 6.38,
$$\mathbb{P}_{n,p_n}[X_{k_n} > 0] \ge \frac{(\mathbb{E} X_{k_n})^2}{\mathbb{E}[X_{k_n}^2]} \ge \left(1 + \frac{O(nk_n e^{-k_n I_\lambda})}{\Omega(n^{2\delta})}\right)^{-1} = \left(1 + \frac{O(k_n e^{\kappa\log n})}{\Omega(n^{2\delta})}\right)^{-1} \to 1,$$
for $\delta$ close enough to $\kappa$. That proves the claim.

Critical regime via martingales To be written. See [NP10].

Exercises

Exercise 6.1 (Galton-Watson process: geometric offspring). Let $(Z_t)$ be a Galton-Watson branching process with geometric offspring distribution (started at 0), i.e., $p_k = p(1-p)^k$ for all $k \ge 0$, for some $p \in (0,1)$. Let $q := 1-p$, let $m$ be the mean of the offspring distribution, and let $M_t = m^{-t}Z_t$.

a) Compute the probability generating function $f$ of $\{p_k\}_{k \ge 0}$ and the extinction probability $\eta := \eta_p$ as a function of $p$.

b) If $G$ is a $2\times2$ matrix, define
$$G(s) := \frac{G_{11}s + G_{12}}{G_{21}s + G_{22}}.$$
Show that $G(H(s)) = (GH)(s)$.

c) Assume $m \ne 1$. Use b) to derive
$$f_t(s) = \frac{pm^t(1-s) + qs - p}{qm^t(1-s) + qs - p}.$$
Deduce that when $m > 1$,
$$\mathbb{E}[\exp(-\lambda M_\infty)] = \eta + (1-\eta)\frac{(1-\eta)}{\lambda + (1-\eta)}.$$

d) Assume $m = 1$. Show that
$$f_t(s) = \frac{t - (t-1)s}{t+1 - ts},$$
and deduce that
$$\mathbb{E}\!\left[e^{-\lambda Z_t/t} \,\middle|\, Z_t > 0\right] \to \frac{1}{1+\lambda}.$$
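Part c) of Exercise 6.1 can be verified numerically: iterating the pgf $f(s) = p/(1-qs)$ $t$ times should agree with the claimed closed form, and iterating at $s = 0$ recovers the extinction probability $\eta_p = \min(1, p/q)$. A rough sketch (not from the text):

```python
def f(s, p):
    # pgf of the geometric offspring law p_k = p(1-p)^k, k >= 0
    return p / (1 - (1 - p) * s)

def f_iter(s, p, t):
    # t-fold composition f_t(s) = f(f(...f(s)...))
    for _ in range(t):
        s = f(s, p)
    return s

def f_closed(s, p, t):
    # closed form for f_t(s) from Exercise 6.1 c), valid when m = q/p != 1
    q = 1 - p
    m = q / p
    num = p * m**t * (1 - s) + q * s - p
    den = q * m**t * (1 - s) + q * s - p
    return num / den
```

The agreement reflects b): iterating a Möbius transformation corresponds to taking powers of its $2\times2$ matrix.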

Exercise 6.2 (Supercritical branching process: infinite line of descent). Let $(Z_t)$ be a supercritical Galton-Watson branching process with offspring distribution $\{p_k\}_{k \ge 0}$. Let $\eta$ be the extinction probability and define $\zeta := 1 - \eta$. Let $Z_t^\infty$ be the number of individuals in the $t$-th generation with an infinite line of descent, i.e., whose descendant subtree is infinite. Denote by $\mathcal{S}$ the event of nonextinction of $(Z_t)$. Define $p_0^\infty := 0$ and
$$p_k^\infty := \zeta^{-1}\sum_{j \ge k}\binom{j}{k}\eta^{j-k}\zeta^k p_j.$$

a) Show that $\{p_k^\infty\}_{k \ge 0}$ is a probability distribution and compute its expectation.

b) Show that for any $k \ge 0$,
$$\mathbb{P}[Z_1^\infty = k \mid \mathcal{S}] = p_k^\infty.$$
[Hint: Condition on $Z_1$.]

c) Show by induction on $t$ that, conditioned on nonextinction, the process $(Z_t^\infty)$ has the same distribution as a Galton-Watson branching process with offspring distribution $\{p_k^\infty\}_{k \ge 0}$.

Exercise 6.3 (Hitting-time theorem: nearest-neighbor walk). Let $X_1, X_2, \dots$ be i.i.d. random variables taking value $+1$ with probability $p$ and $-1$ with probability $q := 1-p$. Let $S_t := \sum_{i=1}^{t} X_i$ with $S_0 := 0$, and $M_t := \max\{S_i : 0 \le i \le t\}$.

a) For $r \ge 1$, use the reflection principle to show that
$$\mathbb{P}[M_t \ge r, S_t = b] = \begin{cases} \mathbb{P}[S_t = b], & b \ge r,\\ (q/p)^{r-b}\,\mathbb{P}[S_t = 2r-b], & b < r.\end{cases}$$

b) Let $\tau_b$ be the first time the walk hits $b > 0$. Show that for all $t \ge 1$,
$$\mathbb{P}[\tau_b = t] = \frac{b}{t}\,\mathbb{P}[S_t = b].$$
[Hint: Consider the probability $\mathbb{P}[M_{t-1} = S_{t-1} = b-1, S_t = b]$.]

Exercise 6.4 (Percolation on bounded-degree graphs). Let $G = (V,E)$ be a countable graph such that all vertices have degree bounded by $b+1$ for $b \ge 2$. Let 0 be a distinguished vertex in $G$. For bond percolation on $G$, prove that
$$p_c(G) \ge p_c(\widehat{\mathbb{T}}_b),$$
by bounding the expected size of the cluster of 0. [Hint: Consider self-avoiding paths started at 0.]
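The hitting-time theorem of Exercise 6.3 b) can be checked by exact dynamic programming over walks killed at level $b$. A sketch (illustrative only, not from the text):

```python
from math import comb

def p_St(t, b, p):
    # P[S_t = b] for the +/-1 walk: requires (t+b)/2 up-steps
    if (t + b) % 2 or abs(b) > t:
        return 0.0
    u = (t + b) // 2
    return comb(t, u) * p**u * (1 - p)**(t - u)

def p_tau(t, b, p):
    # P[first hitting time of b equals t]: DP over positions that
    # have not yet touched b; last step must be b-1 -> b
    alive = {0: 1.0}
    for _ in range(t - 1):
        nxt = {}
        for s, pr in alive.items():
            for step, w in ((1, p), (-1, 1 - p)):
                if s + step != b:
                    nxt[s + step] = nxt.get(s + step, 0.0) + pr * w
        alive = nxt
    return alive.get(b - 1, 0.0) * p
```

The identity $\mathbb{P}[\tau_b = t] = \frac{b}{t}\mathbb{P}[S_t = b]$ then holds exactly, for biased as well as symmetric steps.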

Exercise 6.5 (Percolation on $\widehat{\mathbb{T}}_b$: critical exponent of $\theta(p)$). Consider bond percolation on the rooted infinite $b$-ary tree $\widehat{\mathbb{T}}_b$ with $b > 2$. For $\varepsilon \in [0, 1-\frac{1}{b}]$ and $u \in [0,1]$, define
$$h(\varepsilon, u) := u - 1 + \left(1 - \left(\frac{1}{b} + \varepsilon\right)u\right)^b.$$

a) Show that there is a constant $C > 0$ not depending on $\varepsilon, u$ such that
$$\left|h(\varepsilon, u) + b\varepsilon u - \frac{b-1}{2b}u^2\right| \le C(u^3 \vee \varepsilon u^2).$$

b) Use a) to prove that
$$\lim_{p \downarrow p_c(\widehat{\mathbb{T}}_b)}\frac{\theta(p)}{p - p_c(\widehat{\mathbb{T}}_b)} = \frac{2b^2}{b-1}.$$

Exercise 6.6 (Percolation on $\widehat{\mathbb{T}}_2$: higher moments of $|\mathcal{C}_0|$). Consider bond percolation on the rooted infinite binary tree $\widehat{\mathbb{T}}_2$. For density $p < \frac{1}{2}$, let $Z_p$ be an integer-valued random variable with distribution
$$\mathbb{P}_p[Z_p = \ell] = \frac{\ell\,\mathbb{P}_p[|\mathcal{C}_0| = \ell]}{\mathbb{E}_p|\mathcal{C}_0|}, \qquad \ell \ge 1.$$

a) Using the explicit formula for $\mathbb{P}_p[|\mathcal{C}_0| = \ell]$ derived in Section 6.3.1, show that $(p_c(\widehat{\mathbb{T}}_2) - p)^2 Z_p$ converges in distribution as $p \uparrow p_c(\widehat{\mathbb{T}}_2)$.

b) Show that for all $k \ge 2$ there is $C_k > 0$ such that
$$\lim_{p \uparrow p_c(\widehat{\mathbb{T}}_2)}\frac{\mathbb{E}_p|\mathcal{C}_0|^k}{(p_c(\widehat{\mathbb{T}}_2) - p)^{-1-2(k-1)}} = C_k.$$

c) What happens when $p \downarrow p_c(\widehat{\mathbb{T}}_2)$?

Exercise 6.7 (Branching process approximation: improved bound). Let $p_n = \frac{\lambda}{n}$ with $\lambda > 0$. Let $W_{n,p_n}$, respectively $W_\lambda$, be the total progeny of a branching process with offspring distribution $\mathrm{Bin}(n, p_n)$, respectively $\mathrm{Poi}(\lambda)$.

a) Show that, under any coupling of $W_{n,p_n}$ and $W_\lambda$,
$$\left|\mathbb{P}[W_{n,p_n} \ge k] - \mathbb{P}[W_\lambda \ge k]\right| \le \max\left\{\mathbb{P}[W_{n,p_n} \ge k, W_\lambda < k],\ \mathbb{P}[W_{n,p_n} < k, W_\lambda \ge k]\right\}.$$

b) Show that
$$\left|\mathbb{P}[W_{n,p_n} \ge k] - \mathbb{P}[W_\lambda \ge k]\right| \le \frac{\lambda^2}{n}\sum_{i=1}^{k-1}\mathbb{P}[W_\lambda \ge i].$$

Exercise 6.8 (Subcritical Erdős–Rényi: second moment). Prove Lemma 6.37.
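The limit in Exercise 6.5 b) can be probed numerically: on $\widehat{\mathbb{T}}_b$ the percolation function satisfies $\theta = 1 - (1 - p\theta)^b$, which is solvable by fixed-point iteration from $\theta = 1$. A sketch with arbitrary choices $b = 3$, $\varepsilon = 10^{-3}$ (illustrative only, not from the text):

```python
def theta(b, p, tol=1e-13, max_iter=10**6):
    # survival probability of the root cluster on the rooted b-ary tree:
    # the positive solution of z = 1 - (1 - p*z)**b when p > 1/b
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - (1.0 - p * z) ** b
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

b, eps = 3, 1e-3
pc = 1.0 / b
slope = theta(b, pc + eps) / eps   # should approach 2*b*b/(b-1) = 9 as eps -> 0
```

Shrinking `eps` further drives `slope` toward the predicted critical exponent $2b^2/(b-1)$, at the cost of slower fixed-point convergence near criticality.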

Bibliographic remarks

Section 6.1 See [Dur10, Section 5.3.4] for a quick introduction to branching processes. A more detailed overview of their use in discrete probability can be found in [vdH14, Chapter 3]. The classical reference on branching processes is [AN04]. The Kesten-Stigum theorem is due to Kesten and Stigum [KS66a, KS66b, KS67]. Our proof of a weaker version with an $L^2$ condition follows [Dur10, Example 5.4.3]. Spitzer's combinatorial lemma is due to Spitzer [Spi56]. The proof presented here follows [Fel71, Section XII.6]. The hitting-time theorem was first proved by Otter [Ott49]. Several proofs of a generalization can be found in [Wen75]. The critical percolation threshold for percolation on Galton-Watson trees is due to R. Lyons [Lyo90].

Section 6.3 The presentation in Section 6.3.1 follows [vdH10]. See also [Dur85]. For much more on the phase transition of Erdős–Rényi graphs, see e.g. [vdH14, Chapter 4], [JLR11, Chapter 5] and [Bol01, Chapter 6]. In particular a central limit theorem for the giant component, proved by several authors including Martin-Löf [ML98], Pittel [Pit90], and Barraez, Boucheron, and de la Vega [BBFdlV00], is established in [vdH14, Section 4.5].
