<<

Consensus of Interacting Particle Systems on Erdős–Rényi Graphs

Grant Schoenebeck∗ Fang-Yi Yu†

Abstract

Interacting particle systems—exemplified by the voter model, iterative majority, and iterative k-majority processes—have found use in many disciplines including distributed systems, statistical physics, social networks, and biology. In these processes, nodes update their "opinion" according to the frequency of opinions amongst their neighbors.

We propose a family of models parameterized by an update function that we call Node Dynamics: every node initially has a binary opinion. At each round a node is chosen uniformly at random and updates its opinion randomly, with the probability specified by the value of the update function applied to the frequencies of its neighbors' opinions.

In this work, we prove that the Node Dynamics converge to consensus in time Θ(n log n) in complete graphs and dense Erdős–Rényi random graphs when the update function is from a large family of "majority-like" functions. Our technical contribution is a general framework that upper bounds the consensus time. In contrast to previous work that relies on handcrafted potential functions, our framework systematically constructs a potential function based on the structure of the state space.

1 Introduction

We propose the following interacting particle system—which we call Node Dynamics—on a given network of n agents, parameterized by an update function f : [0, 1] → [0, 1]. In the beginning, each agent holds a binary "opinion", either red or blue. Then, in each round, an agent is chosen uniformly at random and updates its opinion to be red with probability f(p) and blue with probability 1 − f(p), where p is the fraction of its neighbors with the red opinion.

Node Dynamics generalize processes of interest in many different disciplines including distributed systems, statistical physics, social networks, and even biology.

Voter model: In the voter model, at each round, a random node chooses a random neighbor and updates to its opinion. This corresponds to the Node Dynamics with f(x) = x. This model has been extensively studied in mathematics [15, 22, 27, 28], physics [6, 9], and social networks [8, 34, 35, 36, 14]. A key question studied is how long it takes the dynamics to reach consensus on different network topologies.

Iterative majority: In the iterative majority dynamics, in each round, a randomly chosen node updates to the opinion of the majority of its neighbors. This corresponds to the Node Dynamics where

    f(x) = 1 if x > 1/2;  f(x) = 1/2 if x = 1/2;  f(x) = 0 if x < 1/2.

Works on majority dynamics typically study when the dynamics converge, how long they take to converge, and whether they converge to the original majority opinion—that is, whether majority dynamics successfully aggregates the original opinions [25, 7, 23, 31, 37].

Iterative k-majority: In this dynamics, in each round, a randomly chosen node collects the opinions of k randomly chosen (with replacement) neighbors and updates to the opinion of the majority of those k opinions. This corresponds to the Node Dynamics where

    f(x) = Σ_{ℓ=⌈k/2⌉}^{k} (k choose ℓ) x^ℓ (1 − x)^{k−ℓ}.

A synchronized variant of this dynamics has been proposed as a protocol for stabilizing consensus: a collection of n agents initially hold private opinions and interact with the goal of agreeing on one of the choices, in the presence of O(√n)-dynamic adversaries which can adaptively change the opinions of up to O(√n) nodes at every round. In this synchronized setting, Doerr et al. [17] prove that 3-majority reaches "stabilizing almost" consensus on the complete graph in the presence of O(√n)-dynamic adversaries. Many works extend this result beyond binary opinions [16, 13, 5, 1].

∗University of Michigan, [email protected]. He is gratefully supported by the National Science Foundation under Career Award 1452915 and AitF Award 1535912.
†University of Michigan, [email protected]. He is gratefully supported by the National Science Foundation under AitF Award 1535912.
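To make the definition concrete, here is a minimal simulation sketch (ours, not from the paper): the three update functions below are the voter, iterative-majority, and 3-majority rules described above, and `node_dynamics` runs the asynchronous process until consensus.

```python
import random

def voter(x):
    return x

def majority(x):
    return 1.0 if x > 0.5 else (0.5 if x == 0.5 else 0.0)

def three_majority(x):
    # k-majority with k = 3: probability that at least 2 of 3
    # sampled neighbors are red.
    return 3 * x**2 * (1 - x) + x**3

def node_dynamics(neighbors, f, colors, max_rounds, seed=0):
    """Asynchronous Node Dynamics; returns the round at which
    consensus is reached, or None if max_rounds elapse first."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    for t in range(max_rounds):
        v = rng.choice(nodes)
        p = sum(colors[u] for u in neighbors[v]) / len(neighbors[v])
        colors[v] = 1 if rng.random() < f(p) else 0
        if len(set(colors.values())) == 1:
            return t
    return None
```

For example, on the complete graph K_20 with an evenly split start, 3-majority reaches consensus quickly, consistent with the Θ(n log n) bound the paper proves for majority-like update functions.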

Copyright © 2018 by SIAM. Unauthorized reproduction of this article is prohibited.

Iterative ρ-noisy majority model [20, 21]: In this dynamics, in each round, a randomly chosen node updates to the majority opinion of its neighbors with probability 1 − ρ, and uniformly at random with probability ρ. This corresponds to the Node Dynamics where

    f(x) = 1 − ρ/2 if x > 1/2;  f(x) = 1/2 if x = 1/2;  f(x) = ρ/2 if x < 1/2.

Genetic evolution model: In biological systems, the chance of survival of an animal can depend on the frequencies of its kin and foes in the network [3, 29]. Moreover, this frequency-dependent dynamics is also known to model the dynamics for maintaining the genetic diversity of a population [24, 32].

Our contribution: We focus on a large set of update functions f that are symmetric, smooth, and satisfy a property we call "majority-like", intuitively meaning that agents update to the majority opinion strictly more often than the fraction of neighbors holding the majority opinion. We obtain tight bounds for the consensus time—the time that it takes the system to reach a state where each node has an identical opinion—on Erdős–Rényi random graphs.

Our main technical tool is a novel framework for upper bounding the hitting time of a general discrete-time homogeneous Markov chain (X, P), including non-reversible and even reducible Markov chains. This framework decomposes the problem so that we only need to upper bound two sets of parameters for all x ∈ X: the reciprocal of the probability of decreasing the distance to the target, 1/p+(x), and the ratio of the probability of increasing the distance to the target to the probability of decreasing it, p−(x)/p+(x). Our technique can give much stronger bounds than those obtained by simply lower bounding p+(x) and upper bounding p−(x). Once we apply this decomposition to our consensus time problem, the problem becomes very manageable.

We show the versatility of our approach by extending the results to a variant of the stabilizing consensus problem, where we show that all majority-like dynamics converge quickly to the "stabilizing almost" consensus on the complete graph in the presence of adversaries.

A large volume of literature is devoted to bounding the hitting time of different Markov processes and achieving fast convergence. The techniques typically employed are (1) showing the Markov chain has fast mixing time [30], (2) reducing the dimension of the process to a small set of parameters (e.g., the frequency of each opinion) and using a mean-field approximation and concentration property to control the behavior of the process [5], or (3) using handcrafted potential functions [31].

Our results fill a large gap that these results do not adequately cover. Mixing time is not well-defined in non-reversible or reducible Markov chains, and so does not apply to Markov chains with multiple absorbing states, as in the consensus time question we study. Reducing the dimension and using a mean-field approximation fails for two reasons. First, summarizing with a small set of parameters is not possible when the process of interest has small imperfections (as in a fixed Erdős–Rényi graph). Second, the mean field of our dynamics has unstable fixed points; in such cases the mean field does not serve as a useful proxy for the Markov process. Handcrafting potential functions also runs into several problems. The first is that, because we consider dynamics on random graphs, the dynamic is not a priori well specified, so there is no specific dynamic to handcraft a potential function for. Secondly, we wish to solve the problem for a large class of update functions f, and so cannot individually handcraft a potential function for each one; typically, the potential function is closely tailored to the details of the process.

Additional related work: Our model is similar to that of Schweitzer and Behera [33], who study a variety of update functions in the homogeneous setting (complete graph) using simulations and heuristic arguments. However, they leave a rigorous study to future work.

2 Preliminaries

2.1 Node Dynamics. Given an undirected graph G = (V, E), let Γ(v) be the neighbors of node v and deg(v) = |Γ(v)|.

We define a configuration x^(G) : V → {0, 1} to assign the "color" of each node v ∈ G to be x^(G)(v), so that x^(G) ∈ {0, 1}^n. We will usually suppress the superscript when it is clear. We will use uppercase (e.g., X^(G)) when the configuration is a random variable.

Moreover, we say v is red if x(v) = 1 and blue if x(v) = 0. We then write the set of red vertices as x^{−1}(1). We say that a configuration x is in consensus if x(·) is a constant function (so all nodes are red or all nodes are blue). Given a node v in configuration x, we define r_x(v) = |Γ(v) ∩ x^{−1}(1)| / deg(v) to be its fraction of red neighbors.

Definition 2.1. An update function is a mapping f : [0, 1] → [0, 1] with the following properties:

Monotone: ∀x, y ∈ [0, 1], if x < y, then f(x) ≤ f(y).

Symmetric: ∀t ∈ [0, 1/2], f(1/2 + t) = 1 − f(1/2 − t).
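As a quick numerical check (ours, not from the paper), the 3-majority polynomial f(x) = 3x²(1 − x) + x³ from the introduction satisfies the Monotone and Symmetric properties of Definition 2.1, as well as the absorbing endpoints f(0) = 0 and f(1) = 1:

```python
def f3(x):
    # Iterative 3-majority update function: 3x^2(1-x) + x^3.
    return 3 * x**2 * (1 - x) + x**3

grid = [i / 1000 for i in range(1001)]

# Monotone: x < y implies f(x) <= f(y).
assert all(f3(grid[i]) <= f3(grid[i + 1]) + 1e-12 for i in range(1000))

# Symmetric: f(1/2 + t) = 1 - f(1/2 - t) for t in [0, 1/2].
assert all(abs(f3(0.5 + t) - (1 - f3(0.5 - t))) < 1e-12
           for t in [i / 1000 for i in range(501)])

# Absorbing endpoints.
assert f3(0.0) == 0.0 and f3(1.0) == 1.0
```

It is also "majority-like" in the sense used later: for 1/2 < x < 1 we have f3(x) > x (e.g., f3(0.6) = 0.648).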

Absorbing: f(0) = 0 and f(1) = 1.

We define node dynamics as follows:

Definition 2.2. A node dynamics ND(G, f, X_0) with an undirected graph G = (V, E), update function f, and initial configuration X_0 is a stochastic process over configurations {X_t}_{t≥0}, where X_0 is the initial configuration. The dynamics proceeds in rounds. At round t, a node v is picked uniformly at random, and we update

    X_t(v) = 1 with probability f(r_{X_{t−1}}(v)); 0 otherwise.

This formulation is general enough to contain many well-known dynamics such as the aforementioned voter model, iterated majority model, and 3-majority dynamics.

Note that in some of the original definitions the nodes synchronously update; whereas, to make our presentation more cohesive, we only consider asynchronous updates.

In this paper, we will focus on the interaction between the update function f and the geometric structure of G. More specifically, we are interested in the consensus time, defined as follows.

Definition 2.3. The consensus time of node dynamics ND(G, f, X_0) is a random variable T(G, f, X_0) denoting the first time step at which ND is in a consensus configuration. The (maximum) expected consensus time ME(G, f) is the maximum expected consensus time over any initial configuration, ME(G, f) = max_{X_0} E[T(G, f, X_0)].

Now we define some properties of functions.

Definition 2.4. Given positive M_1, M_2, a function f : R → R is called M_1-Lipschitz in I ⊆ R if for all x, y ∈ I,
    |f(x) − f(y)| ≤ M_1 |x − y|.
Moreover, f is M_2-smooth in I ⊆ R if for all x, y ∈ I,
    |f′(x) − f′(y)| ≤ M_2 |x − y|.

2.2 Potential Theory for Markov Chains. Let M = (X_t, P) be a discrete time-homogeneous Markov chain with finite state space Ω and transition matrix P. For x, a ∈ Ω, we define τ_a(x) to be the hitting time for a with initial state x:

    τ_a(x) ≜ min{t ≥ 0 : X_t = a, X_0 = x},

and τ_A(x) to be the hitting time to a set of states A ⊆ Ω:

    τ_A(x) ≜ min{t ≥ 0 : X_t ∈ A, X_0 = x}.

By the Markov property, the expected hitting time can be written as a system of linear equations:

    E_M[τ_A(x)] = 1 + Σ_{y∈Ω} P_{x,y} E_M[τ_A(y)] if x ∉ A;  E_M[τ_A(x)] = 0 if x ∈ A.

Due to the memoryless property of a Markov chain, it is sometimes useful to analyze its first step. Consider a general measurable function w : Ω → R. If the Markov chain starts at state X = x and the next state is the random variable X′, then the average change of w(X) in one transition step is given by

    (Lw)(x) ≜ E_M[w(X′) − w(X) | X = x] = Σ_{y∈Ω} P_{x,y} w(y) − w(x).

To reduce notation, we will use E_M[w(X′) | X] to denote the expectation of the measurable function w(X′) given the previous state X.

Definition 2.5. Given a Markov chain M with state space Ω, D ⊊ Ω, and two real-valued functions ψ, φ with domain Ω, we define the Poisson equation as the problem of solving for the function w : Ω → R such that

    Lw(x) = −φ(x) for x ∈ D,
    w(x) = ψ(x) for x ∈ ∂D,

where ∂D ≜ (∪_{x∈D} supp P(x, ·)) \ D is the exterior boundary of D w.r.t. the Markov chain.

Note that solving the expected hitting time of a set A is a special case of the above problem, obtained by taking D = Ω \ A, φ(x) = 1, and ψ(x) = 0. The next fundamental theorem shows that super solutions to an associated boundary value problem provide upper bounds for the Poisson equation in Definition 2.5.

Theorem 2.1. (Maximum principle [19]) Given a Markov chain M with state space Ω, D ⊊ Ω, and two real-valued functions ψ, φ with domain Ω, suppose s : Ω → R is a non-negative function satisfying

    Ls(x) ≤ −φ(x) for x ∈ D,
    s(x) ≥ ψ(x) for x ∈ ∂D.

Then s(x) ≥ w(x) for all x ∈ D.

Corollary 2.1. (Super solution for hitting time) Given a Markov chain M with state space Ω and a set of states A ⊊ Ω, suppose s_A : Ω → R is a non-negative function satisfying

(2.1)    Ls_A(x) ≤ −1 for x ∉ A,
         s_A(x) ≥ 0 for x ∈ A.

Then s_A(x) ≥ E_M[τ_A(x)] for all x ∉ A. Moreover, we call s_A a potential function for short.

2.3 Concentration Inequalities.

Theorem 2.2. ([4]) Let X = (x_1, ..., x_N) be a finite set of N real numbers, let X_1, ..., X_n denote a random sample without replacement from X, and let Y_1, ..., Y_n denote a random sample with replacement from X. If f : R → R is continuous and convex, then

    E f(Σ_{i=1}^n X_i) ≤ E f(Σ_{i=1}^n Y_i).

Let A_G be the adjacency matrix of G, so (A_G)_{i,j} = 1 if v_i ∼ v_j and 0 otherwise, and let Ā = E_{G_{n,p}}[A_G], so Ā_{i,j} = p if i ≠ j and 0 otherwise. Let deg(v) be the degree of node v.

Definition 2.7. The weighted adjacency matrix of an undirected graph G is defined by

    M_G(i, j) = 1/√(deg(v_i) deg(v_j)) if (A_G)_{i,j} = 1;  0 otherwise.
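The hitting-time machinery of Section 2.2 can be seen in miniature on a symmetric ±1 random walk on {0, ..., n} absorbed at both ends (our illustrative example, not from the paper): the function s(k) = k(n − k) satisfies Ls(k) = −1 at every interior state, so by Corollary 2.1 it upper bounds the expected hitting time of the absorbing set — and for this chain it is in fact exactly E[τ], as solving the first-step linear system confirms.

```python
n = 10

def L(s, k):
    # Generator applied to s at interior state k of the symmetric
    # +/-1 walk: E[s(X') - s(X) | X = k].
    return 0.5 * s(k - 1) + 0.5 * s(k + 1) - s(k)

s = lambda k: k * (n - k)

# s is a super solution: L s(k) <= -1 at interior states (here exactly -1),
# and s vanishes on the absorbing set {0, n}.
assert all(abs(L(s, k) + 1) < 1e-9 for k in range(1, n))
assert s(0) == 0 and s(n) == 0

# Solve the hitting-time linear system by fixed-point (Jacobi) iteration:
# h[k] = 1 + (h[k-1] + h[k+1]) / 2 for 0 < k < n, with h[0] = h[n] = 0.
h = [0.0] * (n + 1)
for _ in range(20000):
    h = [0.0] + [1 + 0.5 * (h[k - 1] + h[k + 1]) for k in range(1, n)] + [0.0]

assert max(abs(h[k] - s(k)) for k in range(n + 1)) < 1e-6
```

The last assertion shows the super solution is tight here: the solved hitting times coincide with k(n − k).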

Theorem 2.3. (A Chernoff bound [18]) Let X ≜ Σ_{i=1}^n X_i, where the X_i for i ∈ [n] are independently distributed in [0, 1]. Then for 0 < ε < 1,

    P[X > (1 + ε)EX] ≤ exp(−ε² EX / 3),
    P[X < (1 − ε)EX] ≤ exp(−ε² EX / 2).

If a bounded function g on a probability space (X, P) is Lipschitz for most of the measure on X, then the following theorem proves a concentration property of g by using a union bound and Azuma's inequality.

Theorem 2.4. (Bad events [18]) Consider a random object X = (X_1, ..., X_n) with probability measure P. Let E be an event in the space of X. Fix a real-valued function g with domain X which is bounded, m ≤ g(X) ≤ M. Let c_i be the maximum effect of changing the i-th input coordinate of g, conditioned on both inputs being in E:

    sup_{x, x′ ∈ E, ∀j≠i: x_j = x′_j} |g(x) − g(x′)| ≤ c_i.

Then P[g(X) > E[g(X) | E] + t + (M − m)P[¬E]] is bounded above by exp(−2t² / Σ_i c_i²).

We say a sequence of events {A_n}_{n≥1} happens with high probability if lim_{n→∞} P[A_n] = 1, that is, P[A_n] = 1 − o(1).

2.4 Erdős–Rényi Random Graphs. Here we present the definition of Erdős–Rényi random graphs and show several well-known properties of them that we need.

Definition 2.6. (Erdős–Rényi random graph) G_{n,p} is a random undirected graph on node set V = [n]¹ where each pair of nodes is independently connected with a fixed probability p. We further use G_{n,p} to denote this random object.

Definition 2.8. (Expansiveness [12]) For λ ∈ [0, 1], we say that an undirected graph G is a λ-expander if |λ_k(M_G)| ≤ λ for all k > 1, where λ_k(M_G) is the k-th largest eigenvalue.

Theorem 2.5. (Spectral profile of G_{n,p} [11]) For G_{n,p}, we denote by I the identity matrix and by J the all-ones matrix. If G_{n,p} has p = ω(log n / n), then with probability at least 1 − 1/n, for all k,

    |λ_k(M_G) − λ_k(M̄)| = O(√(log n / (np))),

where (M̄)_{i,j} = 1/(n−1) if i ≠ j and (M̄)_{i,i} = 0.

Because the spectrum of M̄ is {1, −1/(n−1)}, where −1/(n−1) has multiplicity n − 1, we have the following corollary.

Corollary 2.2. If p = ω(log n / n), then G ∼ G_{n,p} is an O(√(log n / (np)))-expander with probability 1 − O(1/n).

Let e(S, T) denote the number of edges between S and T (double counting edges from S ∩ T to itself), and let vol(S) count the number of edges adjacent to S. The following lemma relates the number of edges between two sets of nodes in an expander to their expected number in a random graph.

Lemma 2.1. (Irregular mixing lemma [10]) If G is a λ-expander, then for any two subsets S, T ⊆ V:

    |e(S, T) − vol(S)vol(T)/vol(G)| ≤ λ √(vol(S)vol(T)).

Finally, let E(δ_d; v) denote the event that the degree of some fixed node v is between (1 − δ_d)np and (1 + δ_d)np, and let E(δ_d) = ∩_{v∈V} E(δ_d; v) be the nearly uniform degree event. Applying Theorem 2.3 yields the following lemma.

Lemma 2.2. (Uniform degree) For any v ∈ V, if G ∼ G_{n,p},

(2.2)    P[¬E(δ_d; v)] ≤ 2 exp(−δ_d² np / 3).

Furthermore, by a union bound,
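The Chernoff bound of Theorem 2.3 can be checked exactly for a binomial sum (a numerical illustration of ours, not from the paper): below, the upper tail of Binomial(n, p) is computed exactly and compared against exp(−ε²EX/3).

```python
from math import comb, exp

def upper_tail(n, p, thresh):
    # P[X > thresh] for X ~ Binomial(n, p), computed exactly.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > thresh)

n, p = 200, 0.3
mean = n * p
for eps in (0.1, 0.3, 0.5, 0.9):
    exact = upper_tail(n, p, (1 + eps) * mean)
    bound = exp(-eps**2 * mean / 3)
    assert exact <= bound  # Theorem 2.3, upper tail
```

This is also the shape of the degree argument behind Lemma 2.2: each deg(v) is a sum of n − 1 independent Bernoulli(p) variables, so its tail obeys the same bound.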

(2.3)    P[¬E(δ_d)] ≤ 2n exp(−δ_d² np / 3).

¹[n] = {1, 2, . . . , n}

3 Warm-up: Majority-like Update Functions on the Complete Graph

In this section we consider majority-like node dynamics on the complete graph K_n with n nodes, in which every pair of nodes has an edge (no self-loops). We use this as a toy example to give intuition for dense Erdős–Rényi graphs, even though we will obtain better bounds later.

Theorem 3.1. Let M = ND(K_n, f, X_0) be a node dynamics over the complete graph K_n with n nodes. If the update function f satisfies x ≤ f(x) for all x with 1/2 < x < 1, then the maximum expected consensus time of the node dynamics over K_n is

    ME(K_n, f) = O(n²).

A standard method of proving fast convergence is to guess a potential function of each state and prove that its expectation decreases by 1 after every step—this is just an application of Corollary 2.1. As a warm-up, we will prove Theorem 3.1 by guessing a potential function and applying Corollary 2.1.

Proof. [Theorem 3.1] Given a configuration x, define Pos(x) ≜ |x^{−1}(1)|. Then all red nodes v with x(v) = 1 have r_x(v) = (Pos(x) − 1)/(n − 1); otherwise r_x(v) = Pos(x)/(n − 1).

Because the node dynamics M is on the complete graph, M is lumpable with respect to the partition {Σ_l}_{0≤l≤n} where Σ_l = {x ∈ Ω : Pos(x) = l}, such that for any subsets Σ_i and Σ_j in the partition, and for any states x, y in subset Σ_i,

    Σ_{z∈Σ_j} P(x, z) = Σ_{z∈Σ_j} P(y, z).

Furthermore, inspired by an analysis of the voter model [2], we consider ψ : [n] → R defined as

    ψ(k) = (n − 1) [ k(H(n − 1) − H(k − 1)) + (n − k)(H(n − 1) − H(n − k − 1)) ],

where H(k) ≜ Σ_{ℓ=1}^{k} 1/ℓ, and define the potential function as

(3.4)    φ(x) = ψ(Pos(x)).

The proof of the following claim is deferred to the full version; here we just give some intuition as to why this potential function for the voter model works. The sequence (Pos(X_t))_{t≥0} can be seen as a random walk on {0, 1, . . . , n} with drift.² Moreover, the drift depends on f(pos(x)) − pos(x). For the voter model f(x) = x, there is no drift. For a majority-like function, there is a positive drift toward n when Pos(x) > n/2 and a negative drift toward 0 when Pos(x) < n/2. Informally, the drift is always helping, and thus the potential function for the voter model works.

Claim 3.1. Our definition of φ satisfies the inequalities (2.1): given the Markov chain M = ND(K_n, f, X_0) in Theorem 3.1, φ defined in (3.4) is non-negative and satisfies

    Lφ(x) ≤ −1 for x ≠ 0^n, 1^n,
    φ(x) ≥ 0 for x = 0^n, 1^n.

Combining Claim 3.1 and Corollary 2.1, we have E_M[T(K_n, f, x)] ≤ φ(x). By direct computation, if 0 ≤ k < n, ψ(k + 1) − ψ(k) = (n − 1)(H(n − k − 1) − H(k)). Therefore the maximum of ψ(k) occurs at k = ⌊n/2⌋, so

    ME(K_n, f) ≤ ψ(⌊n/2⌋) ≤ (ln 2) n²,

which completes our proof. □

4 Smooth Majority-like Update Functions on Dense G_{n,p}

In this section, we consider smooth majority-like update functions, defined as follows:

Definition 4.1. We call an update function f a smooth majority-like update function if it satisfies x < f(x) for all x with 1/2 < x < 1 and the following technical conditions hold:

Lipschitz: There exists M_1 such that f is M_1-Lipschitz in [0, 1].

Condition at 1/2: There exists an open interval I_{1/2} containing 1/2 and constants 1 < M̌_1, 0 < M̌_2 such that f is M̌_2-smooth in I_{1/2} and 1 < M̌_1 ≤ f′(1/2).

Condition at 0 and 1: There exist intervals I_0 ∋ 0, I_1 ∋ 1 and a constant M̂_1 < 1 such that ∀x ∈ I_0, f(x) ≤ M̂_1 x and ∀x ∈ I_1, 1 − f(x) ≤ M̂_1(1 − x).

Intuitively, a majority-like update function should be "smooth" and not tangent to y = x. Figure 1 shows an example of a smooth majority-like update function. Now we are ready to state our main theorem.

Theorem 4.1. Let M = ND(G, f, X_0) be a node dynamics over G ∼ G_{n,p} with p = Ω(1), and let f be a smooth majority-like function. Then the expected consensus time of the node dynamics over G is

    ME(G, f) = O(n log n)

with high probability.

²The formal definition of drift is in Equation (4.8).
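The warm-up potential ψ of Section 3 can be checked numerically (our sketch, not from the paper): its interior increments match the closed form (n − 1)(H(n − k − 1) − H(k)), its maximum sits at k = ⌊n/2⌋, and the maximum value is roughly (ln 2)n².

```python
from math import log

n = 100

def H(k):
    # Harmonic number H(k) = sum_{l=1}^{k} 1/l (H(k) = 0 for k <= 0).
    return sum(1.0 / l for l in range(1, k + 1))

def psi(k):
    return (n - 1) * (k * (H(n - 1) - H(k - 1))
                      + (n - k) * (H(n - 1) - H(n - k - 1)))

# Interior increment identity: psi(k+1) - psi(k) = (n-1)(H(n-k-1) - H(k)).
assert all(abs(psi(k + 1) - psi(k) - (n - 1) * (H(n - k - 1) - H(k))) < 1e-6
           for k in range(1, n - 1))

# The maximum is at k = n // 2, and it is roughly (ln 2) n^2.
values = [psi(k) for k in range(n + 1)]
assert max(range(n + 1), key=lambda k: values[k]) == n // 2
assert abs(values[n // 2] / n**2 - log(2)) < 0.01
```

The last check matches the proof's conclusion ME(K_n, f) ≤ ψ(⌊n/2⌋) ≈ (ln 2)n² up to lower-order terms.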

Figure 1: An example of a smooth majority-like update function.

This theorem shows the fast convergence rate of this process. Note that there is some chance of getting a disconnected graph G ∼ G_{n,p}, which results in a reducible Markov chain M that cannot converge from some initial configurations. Therefore, we can only ask for the fast convergence result with high probability. We note that the technical conditions exclude iterative majority updates, which we leave for future work.

4.1 Proof Overview. Here we first outline the structure of the proof. In Section 4.2 we propose a paradigm for proving an upper bound for the hitting time when the state space has special structure. In Section 4.3, we use the result in Section 4.2 to prove Theorem 4.1.

A large literature has been devoted to different processes; most achieve fast convergence results by using handcrafted potential functions or by showing the Markov chain has fast mixing time. However, it is not easy to find a clever potential function for an arbitrary process, and mixing time is not well defined for reducible Markov chains. Recall that the expected consensus time is τ(x) ≜ E_M[T(G, f, x)], which is exactly the hitting time of the states 0^n and 1^n. However, in contrast to Section 3, finding a clever potential function is much harder here. We prove Theorem 4.1 using the fact that the expected hitting time can be formulated as a system of linear equations (2.1), and by explicitly estimating an upper bound for this system. Moreover, following the intuition of Section 3, the Markov chain M can be nearly characterized by the single parameter Pos(x) when the node dynamics is on a graph that is close to the complete graph. We exploit this structure of our Markov chain and construct a potential function for Equations (2.1).

4.2 A Framework for Upper Bounding the Hitting Time. We want to upper bound the hitting time τ(x) from an arbitrary state x to {0^n, 1^n} for a given time-homogeneous Markov chain M = (Ω, P) with finite state space Ω = {0, 1}^n, where P(x, y) > 0 only if the states x, y differ in one digit, |x − y| ≤ 1. We let Pos(x) be the position of state x ∈ Ω:

(4.5)    Pos(x) ≜ |x^{−1}(1)|, and pos(x) ≜ Pos(x)/n,

and define the bias of x as

(4.6)    Bias(x) ≜ |n/2 − Pos(x)|, and bias(x) ≜ Bias(x)/n.

Note that Bias(x) = n/2 if and only if x = 0^n or 1^n.

Suppose that M can be "almost" characterized by the one parameter Bias(x). Informally, we want the transitions at states x and y to be similar if Bias(x) = Bias(y). Therefore, with the notion of first-step analysis, we define {(p^+(x), p^−(x))}_{x∈Ω} where

(4.7)    p^+(x) = P_M[Bias(X′) = Bias(X) + 1 | X = x],
         p^−(x) = P_M[Bias(X′) = Bias(X) − 1 | X = x].

Moreover, we call p^+(x) the exertion, and define the drift of state x as follows:

(4.8)    D(x) ≜ E_M[Bias(X′) − Bias(X) | X = x].

It is easy to see that D(x) = p^+(x) − p^−(x).

Since M can be almost characterized by the one parameter Bias(x), M is almost lumpable with respect to the partition induced by Bias(·). The following lemma gives us a scheme for constructing an upper bound for the hitting time:

Lemma 4.1. (Pseudo-lumpability lemma) Let M = (Ω, P) have finite state space Ω = {0, 1}^n with n even³ and P(x, y) > 0 only if the states x and y differ in at most one coordinate, and define

(4.9)    d_0 = max_{x : Bias(x)=0} 1/p^+(x),
         d_l = max_{x : Bias(x)=l} 1/p^+(x) + max_{x : Bias(x)=l} [p^−(x)/p^+(x)] · d_{l−1}

for 0 < l < n/2, where {(p^+(x), p^−(x))}_{x∈Ω} are as defined in (4.7). Then the maximum expected hitting

³To avoid cumbersome notation for parity, we only consider n to be even here.
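Lemma 4.1's recursion can be exercised on a one-dimensional toy chain (our sketch, not from the paper): a walk on bias levels {0, ..., m} in which the bias increases with probability p⁺ and decreases with probability p⁻ from every non-absorbed state (with p⁻ = 0 at level 0). The recursion d_0 = 1/p⁺, d_l = 1/p⁺ + (p⁻/p⁺)d_{l−1} gives an upper bound on the expected hitting time of level m, which we compare against the exact solution of the first-step linear system — for a true birth-death chain the recursion is tight.

```python
m = 20                      # bias levels 0..m; level m is absorbing
p_up, p_down = 0.4, 0.3     # per-step move probabilities; else stay put

# Lemma 4.1-style recursion: d[l] bounds the expected time to move
# the bias from level l to l+1 (p- = 0 at level 0).
d = [1.0 / p_up]
for l in range(1, m):
    d.append(1.0 / p_up + (p_down / p_up) * d[l - 1])
bound = sum(d)              # bound on the max expected hitting time

# Exact hitting times of level m via the first-step linear system,
# solved by fixed-point (Jacobi) iteration.
h = [0.0] * (m + 1)
for _ in range(30000):
    new = [1.0 + p_up * h[1] + (1.0 - p_up) * h[0]]
    for l in range(1, m):
        new.append(1.0 + p_up * h[l + 1] + p_down * h[l - 1]
                   + (1.0 - p_up - p_down) * h[l])
    new.append(0.0)
    h = new

assert max(h[:m]) <= bound + 1e-6   # the recursion is a valid upper bound
assert abs(h[0] - bound) < 1e-6     # and tight for this birth-death chain
```

This mirrors how the lemma is used in the paper: bound the level-by-level crossing times and sum them, rather than bounding p⁺ and p⁻ uniformly.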

time from a state x to {0^n, 1^n} can be bounded as follows:

    max_{x∈Ω} E_M[τ(x)] ≤ Σ_{0≤ℓ<n/2} d_ℓ.

To see that Lemma 4.1 can give much stronger bounds than bounding p^−(x) and p^+(x) separately, consider the chain in which p^+(x) = p^−(x) = 1/(2 + x_1) for all x ∈ {0, 1}^n \ {0^n, 1^n}, and 0^n and 1^n are absorbing states; the recursion tracks the state-dependent transition probabilities level by level. Applying this framework requires two sets of estimates for all x:

1. an upper bound for 1/p^+(x);

2. an upper bound for p^−(x)/p^+(x).

In Theorem 4.2 we give a framework that uses the upper bounds for 1/p^+(x) and p^−(x)/p^+(x) to obtain upper bounds for the expected hitting time. To give some intuition about the statement of the theorem, observe that if the drift D(x) is bounded below by some positive constant, then both 1/p^+(x) and p^−(x)/p^+(x) have nice upper bounds. However, this fails when the drift is near zero or even negative. Taking our node dynamics on dense G_{n,p} as an example, when states have either very small or very large bias(x), the drift D(x) can be very close to zero or even negative. The drift near bias 0 is close to 0 because the effects of red and blue largely cancel each other; the drift near the extreme points is small because there are very few nodes outside the majority. As a result, we partition the states into subsets and take additional care on the sets of states with small drift.

In Theorem 4.2 we partition the states into Σ^s, Σ^m, Σ^l according to the bias as follows:

(4.10)    Σ^s = {x ∈ Ω : bias(x) < ε̌},  Σ^m = {x ∈ Ω : ε̌ ≤ bias(x) < ε̂},  Σ^l = {x ∈ Ω : ε̂ ≤ bias(x)}.

Theorem 4.2 assumes there exist positive constants A_1, B_1 > 0 and 0 < r, A_2, A_3 < 1 such that, for all x ∈ {0, 1}^n \ {0^n, 1^n}, conditions (4.11)–(4.15) bound 1/p^+(x) and p^−(x)/p^+(x) on each of Σ^s, Σ^m, and Σ^l, and concludes an upper bound on the maximum expected hitting time. The proof of Theorem 4.2 is rather straightforward, using Lemma 4.1 and carefully constructing the potential function from the recursive Equation (4.9).

4.3 Proof of Theorem 4.1. In this section, we will use Theorem 4.2 to prove an O(n log n) time bound by exploiting properties of our process. Specifically, letting our node dynamics M = ND(G, f, X_0) be a node dynamics over G sampled from G_{n,p}, it is sufficient to prove an upper bound for 1/p_G^+(x) and p_G^−(x)/p_G^+(x). Note that we use subscripts to emphasize the dependency on the graph G.

To apply Theorem 4.2, we partition the states into three groups Σ^s, Σ^m, and Σ^l defined in (4.10). The constants ε̌ and ε̂ depend on the update function f and the edge probability p, and will be specified later. Figure 2 illustrates the partition of the states.

The following lemma upper bounds 1/p_G^+(x):

Lemma 4.2. (Lower bound for p_G^+(x)) Given the node process M on G, if G is a λ-expander with nearly uniform

degree event E(δ_d), with δ_d < 1 and λ < ((1 − δ_d)/(1 + δ_d)) · min{ε̂²/18, (1/2 − ε̂)²/2}, then there is a constant p_+ > 0 (depending only on ε̂ and f) such that

(4.16)    p_+ < p_G^+(x) ≤ 1 if x ∈ Σ^s ∪ Σ^m,
(4.17)    1/4 < p_G^+(x) / (1/2 − bias(x)) ≤ 1 if x ∈ Σ^l.

This lemma is proved by applying the mixing lemma (Lemma 2.1) to show that the probability of increasing the bias is (1) larger than some constant for x ∈ Σ^s ∪ Σ^m (Lemma B.1) and (2) proportional to the size of the minority in Σ^l (Lemma B.2). The proof details are in Appendix B.

The second part follows from the following lemma:

Lemma 4.3. (Upper bound for p_G^−(x)/p_G^+(x)) Given the node process M on G, if G ∼ G_{n,p}, then there exist positive constants A_1, B_1 and 0 < A_2, A_3 < 1 such that, with high probability,

(4.18)    p_G^−(x)/p_G^+(x) ≤ 1 + A_1 (B_1/√n − bias(x)) if x ∈ Σ^s,
(4.19)    p_G^−(x)/p_G^+(x) ≤ 1 − A_2 if x ∈ Σ^m,
(4.20)    p_G^−(x)/p_G^+(x) ≤ 1 − A_3 if x ∈ Σ^l.

Instead of bounding p_G^−(x)/p_G^+(x) directly, the drift D_G(x) ≜ p_G^+(x) − p_G^−(x) is more natural to work with. Taking the complete graph as an example, D_G(x) = f(pos(x)) − pos(x). Therefore, instead of proving an upper bound for p_G^−(x)/p_G^+(x) directly, we prove a lower bound for the drift in Appendix B (Lemmas B.3, B.4, B.5, and B.8). Combined with Lemma 4.2, these give us the desired upper bound for p_G^−(x)/p_G^+(x).

Proof. [Theorem 4.1] By Corollary 2.2, G ∼ G_{n,p} is an O(√(log n/(np)))-expander with high probability. Thus, we can apply Lemmas 4.2 and 4.3 to Theorem 4.2, which finishes the proof. □

Figure 2: An illustration of the partition in Section 4.3.

5 The Stabilizing Consensus Problem

The consensus problem in the presence of an adversary (known as Byzantine agreement) is a fundamental primitive in the design of distributed algorithms. For the stabilizing-consensus problem—a variant of the consensus problem—Doerr et al. [17] prove that synchronized 3-majority converges fast to an almost stable consensus on a complete graph in the presence of O(√n)-dynamic adversaries which, at every round, can adaptively change the opinions of up to O(√n) nodes.

Here we consider an asynchronous protocol for this problem:

Definition 5.1. Given a complete network of n anonymous nodes with update function f, and F ∈ N. In the beginning configuration, each node holds a binary opinion specified by x_0(·). In each round:

1. An adaptive dynamic adversary can arbitrarily corrupt up to F agents and change the reports of their opinions in this round (the true opinion of these nodes is restored and will be reported once the adversary stops corrupting them).

2. A randomly chosen node updates its opinion according to the node dynamics. (If the chosen node is corrupted by the adversary in that round, the adversary can arbitrarily update the opinion of the chosen node.)

Definition 5.2. (n^γ-almost consensus) We say a complete network of n anonymous nodes reaches an n^γ-almost consensus if all but O(n^γ) of the nodes support the same opinion.

Our analysis in Section 4 can be naturally extended to the stabilizing consensus problem and proves that all majority-like update functions (Definition 4.1) are stabilizing almost-consensus protocols with the same convergence rate.

Theorem 5.1. Given n nodes, fixed γ > 1/2, F = O(√n), and initial configuration X_0 ∈ {0, 1}^n, the node dynamics ND(K_n, f, X_0) on a complete graph with update function f reaches an n^γ-almost consensus in the presence of any F-corrupt adversary within O(n log n) rounds with high probability.

Remark 5.1. The goal of this section is not to promote majority-like node dynamics as a state-of-the-art protocol for the stabilizing consensus problem, but to show the versatile power of our framework for proving convergence times in Section 4.2. Additionally, we modify the

Copyright c 2018 by SIAM Unauthorized reproduction of this article is prohibited formulation of the problem here to make our presenta- 5.2 Monotone Coupling Between Y(F ) And tion more cohesive. X (AF ). To transfer the upper bound of Y(F ) to X (AF ), we need to build a “nice” coupling between Let the random process with the presence of some them which is characterized as follow: fixed F -dynamic adversary AF defined in theorem 5.1 Definition 5.3. (Monotone Coupling) Let X,Y be denoted X (AF ) = (Xt)t≥0. Observe that our framework in section 4.2 only works for Markov chain, be two random variables on some partially ordered set but with the presence of adaptive adversary the process (Σ, ≥). Then a monotone coupling between X and Y ˜ ˜ is no longer a Markov chain. As a result, we “couple” is a measure (X, Y ) on Σ × Σ such that this process with a nice Markov chain Y(F ) = (Yt)t≥0, • The marginal distributions X˜ and X have the same and use the Markov chain as a proxy to understand the distribution; original process. ˜ The proof has two parts: we first define the proxy • The marginal distributions Y and Y have the same Markov chain Y(F ) and prove an upper bound of distribution; almost consensus time by using the tools in section 4.2. ˜ ˜ • P(X,˜ Y˜ )[X ≥ Y ] = 1. Secondly, we construct a monotone coupling between Note that the function bias(·) induces a natural Y(F ) and X (AF ) to prove X (AF ) also converges to n almost consensus fast. total order ≤bias of our state space Ω = {0, 1} such that for x, y ∈ Ω, x ≤bias y if and only if 5.1 Upper Bounding the Expected Almost bias(x) ≤ bias(y). We can also define a partial order Consensus Time for Y(F ). With the notation de- over sequences of states: given two sequences (Xt)t≥0, fined in section 4, we define Y(F ). Informally, we con- (Yt)t≥0 we call (Xt)t≥0 ≤bias (Yt)t≥0 if ∀t ≥ 0 Xt ≤bias Y . 
We use calligraphic font to represent the whole struct Y(F ) as a pessimistic version of ND(Kn, f, X0) t with the presence of an adversary: at every round the random sequence, e.g. Z = (Zt)t≥0. adversary tries to push the state toward the unbiased Lemma 5.2. There exists a monotone coupling (X˜, Y˜) configuration, and it always corrupts F nodes with the between X (AF ) and Y(F ) under the partial order ≤bias minority opinion. Initially, Y0 = X0. At time t if we set y = Yt, Yt+1 The proof of this lemma is straightforward, and we defer is uniformly sampled from the proof to the full version.

0 0 {y ∈ Ω: ∃i ∈ [n], ∀j 6= i, y = yj} 5.3 Proof of Theorem 5.1 (5.21) j ∩ {y0 ∈ Ω: Bias(y0) = Bias(y) + 1} Proof. [theorem 5.1] We call an event A increasing if x ∈ A implies that any y ≥ x is also in A. Observe 1  1  −(1−γ) with probability max{f 2 + bias(y) 2 − bias(y) − that Aγ := {y ∈ Ω: bias(y) > 1/2 − n } is (M1+1)F n , 0}, or uniformly sampled from increasing with respect to ≤bias. Therefore given a random sequence Z = (Zt)t≥0 0 0 {y ∈ Ω: ∃i ∈ [n], ∀j 6= i, yj = yj}   (5.22) −(1−γ) 0 0 PZ [Tγ (z) > τ] = PZ max bias(Zt) ≤ 1/2 − n ∩ {y ∈ Ω: Bias(y ) = Bias(y) − 1} t≤τ

1  1  By lemma 5.2, for fixed τ > 0 and initial configura- with probability min{f 2 − bias(y) 2 + bias(y) + (M1+1)F tion z ∈ Ω: n , 1}; otherwise Yt+1 stays the same: Yt+1 = y. γ Recall that the time to reach an n -almost consen- PX (AF )[Tγ (z) > τ] sus is the hitting time to the set of states   −(1−γ) = PX max bias(Xt) ≤ 1/2 − n A {y ∈ Ω: bias(y) > 1/2 − n−(1−γ)}, t≤τ γ ,   ˜ −(1−γ) = P(X˜,Y˜) max bias(Xt) ≤ 1/2 − n and we use Tγ (z) to denote the hitting time to a set of t≤τ state Aγ .   ˜ −(1−γ) ˜ ˜ = P(X˜,Y˜) max bias(Xt) ≤ 1/2 − n , X ≥bias Y Lemma 5.1. The expected nγ -almost consensus time of t≤τ   the Markov chain Y is maxy EY(F )[Tγ (y)] = O (n log n). ˜ −(1−γ) ≤ P ˜ ˜ max bias(Yt) ≤ 1/2 − n (X ,Y) t≤τ This lemma is very similar to theorem 4.1 and we defer the proof to the full version. = PY(F )[Tγ (z) > τ].
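The adversarial dynamics analyzed above can be sanity-checked numerically. The sketch below is ours, not from the paper: it assumes a particular smooth majority-like update function (`smooth_majority`, one common choice; the paper only requires f to be majority-like) and models the F-corrupt adversary as shifting the opinion frequency observed by the updating node by F/n toward 1/2, which matches the (M_1+1)F/n correction in (5.21)-(5.22) only up to constants. All function names are hypothetical.

```python
import math
import random

def smooth_majority(x):
    # An assumed majority-like update function: increasing, f(1/2) = 1/2,
    # and it amplifies the current majority (f(x) > x for x > 1/2).
    return x * x / (x * x + (1 - x) * (1 - x))

def consensus_time(n, f, F, gamma, x0_frac=0.55, max_rounds=None, seed=0):
    """Simulate ND(K_n, f, X_0) with an F-corrupt adversary that biases the
    observed opinion frequency toward 1/2, and return the first round at which
    an n^gamma-almost consensus (minority fraction < n^{-(1-gamma)}) holds."""
    rng = random.Random(seed)
    ones = int(round(n * x0_frac))       # number of nodes holding opinion 1
    threshold = n ** (-(1 - gamma))      # minority fraction defining almost consensus
    max_rounds = max_rounds or 200 * n * int(math.log(n) + 1)
    for t in range(1, max_rounds + 1):
        pos = ones / n
        if min(pos, 1 - pos) < threshold:
            return t
        # Adversary corrupts F displayed opinions, shifting the frequency
        # the updating node observes toward the unbiased configuration.
        shift = F / n if pos < 0.5 else -F / n
        observed = min(1.0, max(0.0, pos + shift))
        v_is_one = rng.random() < pos    # opinion of the uniformly chosen node
        new_opinion = rng.random() < f(observed)
        ones += int(new_opinion) - int(v_is_one)
    return None
```

On moderate instances (n in the hundreds, F = O(√n)) the hitting time stays comfortably below a small multiple of n log n, in line with Theorem 5.1, although this simulation is only a heuristic illustration of the statement.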

On the other hand, applying Markov's inequality,

P_{Y(F)}[T_γ(z) > τ] ≤ E_{Y(F)}[T_γ(z)] / τ,

so by Lemma 5.1, P_{Y(F)}[T_γ(z) > τ] can be made arbitrarily small by taking τ = O(n log n), which finishes the proof.

6 Future Work

This work leaves several open questions. The most glaring is whether we can analyze the maximum consensus time of iterative majority dynamics on Erdős-Rényi random graphs. Another is whether we can prove upper bounds on the consensus time on sparse Erdős-Rényi random graphs, or even on general expander graphs. Conversely, can we prove lower bounds on the consensus time on these graphs?

To answer these questions, we may need a more refined understanding of "expansiveness", because the conventional notions of expansion/pseudo-randomness of a graph only tell us the average connection between the nodes in a set and its complement. For node dynamics, however, we need to know, given a set of nodes, how every node in the set connects to the set's complement.

References

[1] Mohammed Amin Abdullah and Moez Draief. Global majority consensus by local majority polling on graphs of a given degree sequence. Discrete Applied Mathematics, 180:1–10, 2015.
[2] David Aldous et al. Interacting particle systems as stochastic social dynamics. Bernoulli, 19(4):1122–1149, 2013.
[3] J. Antonovics and P. Kareiva. Frequency-dependent selection and competition: empirical approaches. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 319(1196):601–613, 1988.
[4] Rémi Bardenet, Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement. Bernoulli, 21(3):1361–1385, 2015.
[5] Luca Becchetti, Andrea Clementi, Emanuele Natale, Francesco Pasquale, and Luca Trevisan. Stabilizing consensus with many opinions. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 620–635. Society for Industrial and Applied Mathematics, 2016.
[6] Eli Ben-Naim, Laurent Frachebourg, and Paul L. Krapivsky. Coarsening and persistence in the voter model. Physical Review E, 53(4):3078, 1996.
[7] Itai Benjamini, Siu-On Chan, Ryan O'Donnell, Omer Tamuz, and Li-Yang Tan. Convergence, unanimity and disagreement in majority dynamics on unimodular graphs and random graphs. Stochastic Processes and their Applications, 126(9):2719–2733, 2016.
[8] Claudio Castellano, Daniele Vilone, and Alessandro Vespignani. Incomplete ordering of the voter model on small-world networks. EPL (Europhysics Letters), 63(1):153, 2003.
[9] Claudio Castellano, Vittorio Loreto, Alain Barrat, Federico Cecconi, and Domenico Parisi. Comparison of voter and Glauber ordering dynamics on networks. Physical Review E, 71(6):066107, 2005.
[10] Fan Chung and Ron Graham. Quasi-random graphs with given degree sequences. Random Structures & Algorithms, 32(1):1–19, 2008.
[11] Fan Chung and Mary Radcliffe. On the spectra of general random graphs. The Electronic Journal of Combinatorics, 18(1):P215, 2011.
[12] Fan R. K. Chung. Spectral Graph Theory. Number 92. American Mathematical Society, 1997.
[13] Colin Cooper, Robert Elsässer, and Tomasz Radzik. The power of two choices in distributed voting. In International Colloquium on Automata, Languages, and Programming, pages 435–446. Springer, 2014.
[14] Colin Cooper, Tomasz Radzik, Nicolás Rivera, and Takeharu Shiraga. Fast plurality consensus in regular expanders. arXiv preprint arXiv:1605.08403, 2016.
[15] J. Theodore Cox and David Griffeath. Diffusive clustering in the two dimensional voter model. The Annals of Probability, pages 347–370, 1986.
[16] James Cruise and Ayalvadi Ganesh. Probabilistic consensus via polling and majority rules. Queueing Systems, 78(2):99–120, 2014.
[17] Benjamin Doerr, Leslie Ann Goldberg, Lorenz Minder, Thomas Sauerwald, and Christian Scheideler. Stabilizing consensus with the power of two choices. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pages 149–158. ACM, 2011.
[18] Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
[19] Andreas Eberle. Markov processes. Lecture notes, University of Bonn, 2009.
[20] Glenn Ellison. Learning, local interaction, and coordination. Econometrica: Journal of the Econometric Society, pages 1047–1071, 1993.
[21] Reza Gheissari and Anna Ben Hamou. AIMPL: Markov chain mixing times. Available at http://aimpl.org/markovmixing.
[22] Richard A. Holley and Thomas M. Liggett. Ergodic theorems for weakly interacting infinite systems and the voter model. The Annals of Probability, pages 643–663, 1975.
[23] Yashodhan Kanoria, Andrea Montanari, et al. Majority dynamics on trees and the dynamic cavity method. The Annals of Applied Probability, 21(5):1694–1748, 2011.
[24] Motoo Kimura and George H. Weiss. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics, 49(4):561, 1964.
[25] Paul L. Krapivsky and Sidney Redner. Dynamics of majority rule in two-state interacting spin systems. Physical Review Letters, 90(23):238701, 2003.
[26] David Asher Levin, Yuval Peres, and Elizabeth Lee Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.
[27] Thomas M. Liggett. Coexistence in threshold voter models. The Annals of Probability, pages 764–802, 1994.
[28] Thomas M. Liggett et al. Stochastic models of interacting systems. The Annals of Probability, 25(1):1–29, 1997.
[29] Jane Molofsky, Richard Durrett, Jonathan Dushoff, David Griffeath, and Simon Levin. Local frequency dependence and global coexistence. Theoretical Population Biology, 55(3):270–282, 1999.
[30] Andrea Montanari and Amin Saberi. Convergence to equilibrium in local interaction games and Ising models. arXiv preprint arXiv:0812.0198, 2008.
[31] Elchanan Mossel, Joe Neeman, and Omer Tamuz. Majority dynamics and aggregation of information in social networks. Autonomous Agents and Multi-Agent Systems, 28(3):408–429, 2014.
[32] F. James Rohlf and Gary D. Schnell. An investigation of the isolation-by-distance model. The American Naturalist, 105(944):295–324, 1971.
[33] Frank Schweitzer and Laxmidhar Behera. Nonlinear voter models: the transition from invasion to coexistence. The European Physical Journal B: Condensed Matter and Complex Systems, 67(3):301–318, 2009.
[34] Vishal Sood and Sidney Redner. Voter model on heterogeneous graphs. Physical Review Letters, 94(17):178701, 2005.
[35] Krzysztof Suchecki, Víctor M. Eguíluz, and Maxi San Miguel. Conservation laws for the voter model in complex networks. EPL (Europhysics Letters), 69(2):228, 2005.
[36] Krzysztof Suchecki, Víctor M. Eguíluz, and Maxi San Miguel. Voter model dynamics in complex networks: role of dimensionality, disorder, and degree distribution. Physical Review E, 72(3):036132, 2005.
[37] Omer Tamuz and Ran J. Tessler. Majority dynamics and the retention of information. Israel Journal of Mathematics, 206(1):483–507, 2015.

A Proofs in Section 4.2

Proof. [Lemma 4.1] Define s : Ω → R as follows:

s(x) = Σ_{ℓ=Bias(x)}^{n/2−1} d_ℓ   if Bias(x) < n/2,
s(x) = 0                            if Bias(x) = n/2.

Note that the value of s only depends on the bias of each state: for x, y ∈ Ω with Bias(x) = Bias(y) we have s(x) = s(y), so we can abuse notation and consider the potential function with integral domain, s : [0, n/2 − 1] → R, such that s(ℓ) ≜ s(x) for some x with Bias(x) = ℓ.

To prove that s is a valid super solution of τ, by Corollary 2.1 it is sufficient to show that

(A.1) Ls(x) ≤ −1  where Bias(x) < n/2,
(A.2) s(x) ≥ 0    where Bias(x) = n/2.
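The super-solution construction in the proof of Lemma 4.1 can be checked mechanically on a one-dimensional birth-death abstraction of the bias walk. The sketch below is our reconstruction: it assumes the recursion d_ℓ = 1/p⁺(ℓ) + (p⁻(ℓ)/p⁺(ℓ))·d_{ℓ−1} matches equation (4.9) (not shown in this excerpt), builds s(ℓ) = Σ_{k≥ℓ} d_k, and verifies the generator identity Ls(ℓ) = −1 that underlies condition (A.1). All names are ours.

```python
def build_potential(p_plus, p_minus, L):
    """d_k = 1/p_plus[k] + (p_minus[k]/p_plus[k]) * d_{k-1} (d_{-1} = 0),
    and s(l) = sum_{k=l}^{L-1} d_k with the boundary value s(L) = 0."""
    d = []
    for k in range(L):
        prev = d[k - 1] if k > 0 else 0.0
        d.append(1.0 / p_plus[k] + (p_minus[k] / p_plus[k]) * prev)
    s = [sum(d[l:]) for l in range(L)] + [0.0]
    return d, s

def generator_check(p_plus, p_minus, s, L):
    """Return Ls(l) = p_plus[l]*(s(l+1)-s(l)) + p_minus[l]*(s(l-1)-s(l))
    for each level l < L; level 0 has no downward move."""
    out = []
    for l in range(L):
        up = p_plus[l] * (s[l + 1] - s[l])
        down = p_minus[l] * (s[l - 1] - s[l]) if l > 0 else 0.0
        out.append(up + down)
    return out
```

With d_ℓ defined by the recursion with equality, Ls(ℓ) = −p⁺(ℓ)d_ℓ + p⁻(ℓ)d_{ℓ−1} = −1 exactly; in the paper d_ℓ is a maximum over all states of the same bias, which is why (A.1) is an inequality rather than an equality.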

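A small algebraic fact is used repeatedly in the small-bias regime of Appendix A (cf. equation (A.7)): iterating a recursion of the shape d_ℓ = 1/p⁺ + (1 + q)·d_{ℓ−1} with equality yields exactly A(1+q)^ℓ − B with A = 1/p⁺ + 1/(p⁺q) and B = 1/(p⁺q), since qB = 1/p⁺ makes the extra terms cancel. The sketch below (our illustration; the symbols p⁺ and q = A₁B₁/√n stand in for the paper's constants) verifies this closed form numerically.

```python
def iterate(pplus, q, steps):
    """Iterate d_l = 1/pplus + (1 + q) * d_{l-1} starting from d_0 = 1/pplus."""
    d = [1.0 / pplus]
    for _ in range(steps):
        d.append(1.0 / pplus + (1.0 + q) * d[-1])
    return d

def closed_form(pplus, q, l):
    """Closed-form solution A*(1+q)**l - B of the recursion above."""
    A = 1.0 / pplus + 1.0 / (pplus * q)
    B = 1.0 / (pplus * q)
    return A * (1.0 + q) ** l - B
```

Because (1 + q)^ℓ ≤ e^{qℓ} is bounded by a constant when ℓ = O(√n) and q = O(1/√n), this closed form is O(√n) in that regime, which is exactly how (A.7) implies (A.3).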
For Equation (A.1), if Bias(x) = ℓ and 0 < ℓ < n/2,

Ls(x) = Σ_{y∈Ω} P_{x,y} s(y) − s(x) = Σ_{y∈Ω} P_{x,y} s(Bias(y)) − s(Bias(x)).

By the definition of M, P(x, y) > 0 only if the states x and y differ in at most one coordinate, so we only need to consider states y with |Bias(y) − ℓ| ≤ 1. By the definition of s,

Ls(x) = Σ_{y: Bias(y)=ℓ+1} P_{x,y} (s(ℓ+1) − s(ℓ)) + Σ_{y: Bias(y)=ℓ−1} P_{x,y} (s(ℓ−1) − s(ℓ))
      = −( Σ_{y: Bias(y)=ℓ+1} P_{x,y} ) d_ℓ + ( Σ_{y: Bias(y)=ℓ−1} P_{x,y} ) d_{ℓ−1}
      = −P_M[Bias(X^1) = ℓ+1 | X^0 = x] d_ℓ + P_M[Bias(X^1) = ℓ−1 | X^0 = x] d_{ℓ−1}.

By the definition of p^+(x) and p^-(x),

Ls(x) = −p^+(x) d_ℓ + p^-(x) d_{ℓ−1} ≤ −p^+(x) ( 1/p^+(x) + (p^-(x)/p^+(x)) d_{ℓ−1} ) + p^-(x) d_{ℓ−1} = −1,

where the last equality comes from the definition of d_ℓ. On the other hand, if Bias(x) = 0,

Ls(x) = Σ_{y: Bias(y)=1} P_{x,y} (s(1) − s(0)) = −P_M[Bias(X^1) = 1 | X^0 = x] d_0 = −p^+(x) d_0 ≤ −1.

Equation (A.2) holds automatically by the definition of s. Therefore, applying Corollary 2.1, we have max_{x∈Ω} τ(x) ≤ max_{x∈Ω} s(x) = Σ_{ℓ=0}^{n/2−1} d_ℓ. □

The proof of Theorem 4.2, which is rather straightforward but tedious, uses Lemma 4.1 and a careful estimation of the potential function from the recursive equation (4.9).

Proof. [Theorem 4.2] With the help of Lemma 4.1, we only need to upper bound the recursive equations (4.9). Under the conditions in the statement, suppose we can prove the following: there exist positive constants C_1, C_2, C_3, C_4, D_1 such that

(A.3) d_ℓ ≤ C_1 √n            where ℓ < D_1 √n,
(A.4) d_ℓ ≤ C_2 n/ℓ            where D_1 √n ≤ ℓ ≤ ε̌n,
(A.5) d_ℓ ≤ C_3                where ε̌n < ℓ ≤ (1/2 − ε̂)n,
(A.6) d_ℓ ≤ C_4 n/(n/2 − ℓ)    where (1/2 − ε̂)n < ℓ < n/2.

Supposing the above inequalities are true, by Lemma 4.1 we can complete the proof as follows:

max_{x∈Ω} E_M[T(G, f, x)] ≤ Σ_{ℓ=0}^{n/2−1} d_ℓ
  ≤ Σ_{ℓ=0}^{D_1⌈√n⌉−1} C_1 √n + Σ_{ℓ=D_1⌈√n⌉}^{ε̌n} C_2 n/ℓ + Σ_{ℓ=ε̌n+1}^{(1/2−ε̂)n} C_3 + Σ_{ℓ=(1/2−ε̂)n+1}^{n/2−1} C_4 n/(n/2 − ℓ)
  ≤ D_1 √n · C_1 √n + C_2 n Σ_{ℓ=D_1⌈√n⌉}^{ε̌n} 1/ℓ + C_3 (1/2 − ε̂ − ε̌) n + C_4 n Σ_{ℓ=1}^{ε̂n−1} 1/ℓ
  = O(n ln n).

Now we use induction to prove Equations (A.3), (A.4), (A.5), and (A.6).

Equation (A.3): We first use induction to prove the following inequality: with A(n) = 1/p^+ + √n/(p^+ A_1 B_1) and B(n) = √n/(p^+ A_1 B_1), for all ℓ, 0 ≤ ℓ ≤ D_1 ⌈√n⌉,

(A.7) d_ℓ ≤ A(n) (1 + A_1 B_1/√n)^ℓ − B(n).

Because for every constant D_1 there exists a constant C_1 > 0 such that A(n)(1 + A_1 B_1/√n)^ℓ − B(n) ≤ C_1 √n for all ℓ ≤ D_1 ⌈√n⌉, Equation (A.3) is proven once Equation (A.7) is true. Now, let's prove (A.7). For ℓ = 0, applying Equation (4.11) to Equation (4.9), we have

d_0 = max_{x∈Ω: Bias(x)=0} 1/p^+(x)
    ≤ max_{x ∈ Σ^s ∪ Σ^m} 1/p^+(x)   (because {x ∈ Ω : Bias(x) = 0} ⊂ Σ^s ∪ Σ^m)
    ≤ 1/p^+                          (by Equation (4.11))
    = A − B                          (by the definitions of A and B).

Suppose d_{ℓ−1} ≤ A(1 + A_1 B_1/√n)^{ℓ−1} − B for some 1 < ℓ < D_1 ⌈√n⌉. Since ℓ < D_1 ⌈√n⌉ − 1 < ε̌n, {x ∈ Ω : Bias(x) = ℓ} ⊂ Σ^s, and we can apply Equations (4.13) and (4.11) to Equation (4.9):

(A.8) d_ℓ ≤ 1/p^+ + (1 + A_1 (B_1/√n − ℓ/n)) d_{ℓ−1} ≤ 1/p^+ + (1 + A_1 B_1/√n) d_{ℓ−1}.

By the induction hypothesis and the definition of B, we have

d_ℓ ≤ 1/p^+ + (1 + A_1 B_1/√n) ( A (1 + A_1 B_1/√n)^{ℓ−1} − B )
    = A (1 + A_1 B_1/√n)^ℓ − B − ( (A_1 B_1/√n) B − 1/p^+ )
    ≤ A (1 + A_1 B_1/√n)^ℓ − B.

Equation (A.4): We use induction again to prove that Equation (A.4) holds for D_1 ⌈√n⌉ ≤ ℓ ≤ ε̌n. For ℓ = D_1 ⌈√n⌉, taking C_2 ≥ C_1 D_1, we already have

d_ℓ ≤ C_1 √n ≤ C_2 n/(D_1 ⌈√n⌉).

Suppose d_{ℓ−1} ≤ C_2 n/(ℓ−1) for some D_1 ⌈√n⌉ < ℓ < ε̌n. Because {x ∈ Ω : Bias(x) = ℓ} ⊂ Σ^s, by Equation (A.8) and the induction hypothesis,

d_ℓ ≤ 1/p^+ + ( 1 − (A_1 ℓ − A_1 B_1 √n)/n ) d_{ℓ−1}
    ≤ 1/p^+ + ( 1 − (A_1 ℓ − A_1 B_1 √n)/n ) C_2 n/(ℓ−1)
    = C_2 n/ℓ + [ 1/p^+ + ( 1 − (A_1 ℓ − A_1 B_1 √n)/n ) C_2 n/(ℓ−1) − C_2 n/ℓ ].

Therefore Equation (A.4) is proven if

1/(C_2 p^+) + ( 1 − (A_1 ℓ − A_1 B_1 √n)/n ) n/(ℓ−1) − n/ℓ ≤ 0.

Since n/ℓ − n/(ℓ−1) = −n/(ℓ(ℓ−1)),

n/ℓ − ( 1 − (A_1 ℓ − A_1 B_1 √n)/n ) n/(ℓ−1) = (A_1 ℓ − A_1 B_1 √n)/(ℓ−1) − n/(ℓ(ℓ−1)) ≥ A_1 − A_1 B_1 √n/ℓ − n/ℓ².

For ℓ ≥ D_1 √n, taking D_1 ≥ 4B_1 and D_1² ≥ 4/A_1, we have

A_1 − A_1 B_1 √n/ℓ − n/ℓ² ≥ A_1 − A_1 B_1/D_1 − 1/D_1² ≥ A_1 − A_1/4 − A_1/4 ≥ A_1/2.

Because C_2 ≥ 2/(p^+ A_1), we have 1/(C_2 p^+) ≤ A_1/2, which completes the proof of Equation (A.4). Finally, by Equation (A.4),

(A.9) d_{ε̌n} ≤ C_2 n/(ε̌n) = C_2/ε̌.

Equation (A.5): We use induction to prove that d_ℓ is bounded above by some constant C_3 for all ℓ with ε̌n < ℓ ≤ (1/2 − ε̂)n. For ℓ = ε̌n + 1, because {x ∈ Ω : Bias(x) = ε̌n + 1} ⊂ Σ^m, we can apply (4.19) and (4.16) to Equation (4.9) and get

(A.10) d_ℓ ≤ 1/p^+ + (1 − A_2) d_{ℓ−1}
       ≤ 1/p^+ + (1 − A_2) C_2/ε̌       (by Equation (A.9))
       ≤ A_2 · 1/(p^+ A_2) + (1 − A_2) C_2/ε̌.

Because 0 ≤ A_2 < 1, taking C_3 = max{1/(p^+ A_2), C_2/ε̌} makes the base case of (A.5) true. Suppose d_{ℓ−1} ≤ C_3 for some ε̌n < ℓ < (1/2 − ε̂)n. Because {x ∈ Ω : Bias(x) = ℓ} ⊂ Σ^m, we can use (A.10):

d_ℓ ≤ 1/p^+ + (1 − A_2) d_{ℓ−1} ≤ 1/p^+ + (1 − A_2) C_3 = C_3 − (A_2 C_3 − 1/p^+) ≤ C_3,

because C_3 ≥ 1/(p^+ A_2). This finishes the proof of Equation (A.5).

Equation (A.6): Because {x ∈ Ω : Bias(x) = ℓ} ⊂ Σ^l for all (1/2 − ε̂)n < ℓ < n/2, we can apply (4.20) and (4.17) to Equation (4.9) and get

d_ℓ ≤ 4/(1/2 − ℓ/n) + (1 − A_3) d_{ℓ−1} = 4n/(n/2 − ℓ) + (1 − A_3) d_{ℓ−1}.

Recursively applying this relation, d_ℓ is upper bounded by

Σ_{j=(1/2−ε̂)n+1}^{ℓ} 4n (1 − A_3)^{ℓ−j}/(n/2 − j) + (1 − A_3)^{ℓ−(1/2−ε̂)n} d_{(1/2−ε̂)n},

and because of Equation (A.5) this is at most

Σ_{j=(1/2−ε̂)n+1}^{ℓ} 4n (1 − A_3)^{ℓ−j}/(n/2 − j) + C_3
  (taking i = ℓ − j)
  ≤ 4n Σ_{i=0}^{ℓ−(1/2−ε̂)n−1} (1 − A_3)^i/(n/2 − ℓ + i) + C_3
  = (4n/(n/2 − ℓ)) Σ_{i=0}^{ℓ−(1/2−ε̂)n−1} (1 − A_3)^i (n/2 − ℓ)/(n/2 − ℓ + i) + C_3.

Because (1 − A_3)^i (n/2 − ℓ)/(n/2 − ℓ + i) ≤ (1 − A_3)^i, and taking C_4 ≥ 4/A_3 + C_3 ε̂, d_ℓ is bounded above by

(4n/(n/2 − ℓ)) Σ_{i=0}^{∞} (1 − A_3)^i + C_3 = (4/A_3) · n/(n/2 − ℓ) + C_3 ≤ C_4 n/(n/2 − ℓ),

where the last step uses C_3 ≤ C_3 ε̂ · n/(n/2 − ℓ), valid because n/2 − ℓ ≤ ε̂n. This finishes the proof. □

B Exertion and Drift: Proofs of Lemmas 4.2 and 4.3

In this section we want to control the exertion p^+_G(x) and the drift p^+_G(x) − p^-_G(x) of the process M on a graph G ∼ G_{n,p}. To achieve these bounds, we prove several properties of dense Erdős-Rényi graphs which might seem ad hoc, but there is a common thread under these lemmas: concentration phenomena in dense Erdős-Rényi graphs. Our main tools are spectral properties of random graphs and several variants of Chernoff bounds.

B.1 Exertion and Lemma 4.2 We partition Lemma 4.2 into Lemmas B.1 and B.2, and use the mixing lemma (Lemma 2.1) to show that every configuration has p^+_G(x) close to that of the complete graph if G is a good expander.

Lemma B.1. (Exertion in Σ^s, Σ^m) If G is a λ-expander with nearly uniform degree E(δ_d), δ_d < 1, and λ² < ((1−δ_d)/(1+δ_d)) · ε̂/18, then for all x with bias(x) < 1/2 − ε̂,

(ε̂/2) f(ε̂/2) < p^+_G(x) ≤ 1.

Proof. Consider a fixed configuration x with pos(x) < 1/2, where the blue nodes are in the minority. We partition V into three sets S_x, T_x, U_x ⊂ V such that

(B.11) S_x = {v ∈ V : x(v) = 0, r_x(v) < ε̂/2},
(B.12) T_x = {v ∈ V : x(v) = 0, r_x(v) ≥ ε̂/2},
(B.13) U_x = {v ∈ V : x(v) = 1}.

Observe that U_x is the set of red nodes in configuration x, and S_x ∪ T_x is the set of blue nodes, so |S_x ∪ T_x| = Pos(x) ≥ ε̂n. Moreover, by the definition of M with update function f, the definition in (B.12), and the monotonicity of f, the probability that a node v ∈ T_x becomes red in the next step, given that v is chosen and the current configuration is x, is at least f(ε̂/2). As a result, every node in T_x has a constant probability of changing if chosen, and

p^+_G(x) ≥ (|T_x|/n) · f(ε̂/2).

Therefore, if we prove the inequality

(B.14) |S_x| < (ε̂/2)|V|,

then the size of T_x is at least (ε̂/2)|V|, and we have p^+_G(x) ≥ (ε̂/2) f(ε̂/2), which finishes the proof.

Now it is sufficient to prove (B.14). By the definition in (B.11) we can upper bound the number of edges between S_x and U_x, e(S_x, U_x), and use the mixing lemma (Lemma 2.1) to upper bound the size of S_x. First, since the node degrees are nearly uniform, the volumes of S_x and U_x can be bounded:

(B.15) (1 − δ_d) np |S_x| ≤ vol(S_x) ≤ (1 + δ_d) np |S_x|,
(B.16) (1 − δ_d) np |U_x| ≤ vol(U_x) ≤ (1 + δ_d) np |U_x|,

and by the definition of S_x in (B.11), the number of edges between S_x and U_x can be bounded as follows:

(B.17) e(S_x, U_x) ≤ (ε̂/2) · vol(S_x) ≤ (ε̂/2)(1 + δ_d) np |S_x|.

Applying the mixing lemma to the sets S_x and U_x, we have

| e(S_x, U_x) − vol(S_x)vol(U_x)/vol(G) | ≤ λ √(vol(S_x)vol(U_x)),

so vol(S_x)vol(U_x)/vol(G) − e(S_x, U_x) ≤ λ √(vol(S_x)vol(U_x)), and by Equation (B.17),

(B.18) ( vol(U_x)/vol(G) − ε̂/2 ) √vol(S_x) ≤ λ √vol(U_x).

For the left-hand side, because the degrees of G are nearly uniform, we can approximate the ratio vol(U_x)/vol(G) by |U_x|/|V|:

( vol(U_x)/vol(G) − ε̂/2 ) √vol(S_x) ≥ ( (1 − δ_d)|U_x| / ((1 + δ_d)|V|) − ε̂/2 ) √vol(S_x).

Because pos(x) < 1/2, so |U_x| ≥ |V|/2, this is

≥ ( (1 − δ_d)/(2(1 + δ_d)) − ε̂/2 ) √vol(S_x) ≥ (1/2)( (1 − δ_d)/(1 + δ_d) − ε̂ ) √vol(S_x)
(B.19) ≥ (1/3) √vol(S_x).

For the right-hand side, we can upper bound the volume of U_x by

(B.20) vol(U_x) ≤ vol(V) ≤ (1 + δ_d) n² p.

Applying equations (B.19) and (B.20) to equation (B.18) yields

(1/3) √vol(S_x) ≤ λ √vol(U_x)
vol(S_x) ≤ 9λ² vol(U_x) ≤ 9λ² (1 + δ_d) n² p
(1 − δ_d) np |S_x| ≤ 9λ² (1 + δ_d) n² p
|S_x| ≤ 9λ² ((1 + δ_d)/(1 − δ_d)) n = o(n),

which is smaller than (ε̂/2)n because λ² < ((1−δ_d)/(1+δ_d)) · ε̂/18. □

Lemma B.2. (Exertion in Σ^l) If G is a λ-expander with nearly uniform degree E(δ_d), δ_d < 1, and λ² < ((1−δ_d)/(1+δ_d)) · (1/2 − ε̂)²/2, then for all x with bias(x) > 1/2 − ε̂,

(1/4)(1/2 − bias(x)) < p^+_G(x) ≤ 1/2 − bias(x).

Proof. Without loss of generality, we consider configurations x where pos(x) < ε̂, so the red nodes are in the minority. The proof of the upper bound is straightforward. Let H_v = {v changes from red to blue in the step, given that v is chosen and the configuration is x}. Then

p^+_G(x) = P_M[Bias(X^1) = Bias(x) + 1 | X^0 = x] = (1/n) Σ_{v∈V} P_M[H_v] ≤ (1/n) Σ_{v∈V} I[v is red] = pos(x) = 1/2 − bias(x).

For the lower bound, similarly to Lemma B.1, given a configuration we partition the set of red nodes into S'_x and T'_x and let U'_x be their union:

(B.21) S'_x = {v ∈ V : x(v) = 1, r_x(v) ≥ 1/2},
(B.22) T'_x = {v ∈ V : x(v) = 1, r_x(v) < 1/2},
(B.23) U'_x = {v ∈ V : x(v) = 1} = S'_x ∪ T'_x.

To show a lower bound on p^+_G(x), it is sufficient to show that the set of red nodes T'_x is large and that each of its members has constant probability of changing to blue if selected to update. Because the probability that a node v ∈ T'_x becomes blue in the next step, given that v is chosen with configuration x, is f(1 − r_x(v)), by the definition in (B.22) and the monotonicity of f,

f(1 − r_x(v)) ≥ f(1/2) ≥ 1/2.

Suppose

(B.24) |S'_x| < (1/2) Pos(x);

then the size of T'_x is at least (1/2) Pos(x), and we have the lower bound p^+_G(x) ≥ (1/n)|T'_x| · (1/2) ≥ (1/4) pos(x), which finishes the proof.

Now it is sufficient to prove equation (B.24). By the definition in (B.21) we can lower bound the number of edges between S'_x and U'_x, e(S'_x, U'_x), and use the mixing lemma to upper bound the size of S'_x. First, since the node degrees are nearly uniform,

(B.25) (1 − δ_d) np |S'_x| ≤ vol(S'_x) ≤ (1 + δ_d) np |S'_x|,
(B.26) (1 − δ_d) np |U'_x| ≤ vol(U'_x) ≤ (1 + δ_d) np |U'_x|,

and by the definition of S'_x in (B.21), the number of edges between S'_x and U'_x can be bounded as follows:

(B.27) e(S'_x, U'_x) ≥ (1/2) · vol(S'_x).

Applying the mixing lemma to the sets S'_x and U'_x, we have

e(S'_x, U'_x) ≤ vol(S'_x)vol(U'_x)/vol(G) + λ √(vol(S'_x)vol(U'_x)),

so by equation (B.27),

(1/2) vol(S'_x) ≤ vol(S'_x)vol(U'_x)/vol(G) + λ √(vol(S'_x)vol(U'_x)).

Reorganizing the last inequality, we have

vol(S'_x) ≤ ( λ / (1/2 − vol(U'_x)/vol(G)) )² vol(U'_x).

Because vol(U'_x)/vol(G) is approximately pos(x) < ε̂ up to degree fluctuations, 1/2 − vol(U'_x)/vol(G) > 1/2 − ε̂ for δ_d small enough, and

vol(S'_x) ≤ ( λ²/(1/2 − ε̂)² ) vol(U'_x).

Finally, by equations (B.25) and (B.26), taking δ_d small enough,

|S'_x| ≤ ( λ²/(1/2 − ε̂)² ) · ( (1 + δ_d)/(1 − δ_d) ) |U'_x| < (1/2)|U'_x| = (1/2) Pos(x).

The last inequality holds because λ² < ((1−δ_d)/(1+δ_d)) · (1/2 − ε̂)²/2. □

B.2 Drift and Lemma 4.3 In this section we prove Lemma 4.3. As discussed in Section 4.3, we prove lower bounds for the drift D_G(x) in Σ^s, Σ^m, and Σ^l separately, and use the lower bound for p^+_G(x) in Lemma 4.2 to prove Lemma 4.3.

B.2.1 Drift in Σ^s and Σ^m The high-level idea is to use a series of triangle inequalities. Given a configuration x ∈ Ω:

1. the drift D_G(x) is close to its expectation E_{G_{n,p}}[D_G(x)];
2. the expectation E_{G_{n,p}}[D_G(x)] is close to the drift on the complete graph, D_{K_n}(x); and
3. the drift on the complete graph D_{K_n}(x) is lower bounded in terms of bias(x).

The third part is easy, because when pos(x) > 1/2 the drift D_{K_n}(x) is

(B.28) D_{K_n}(x) = p^+_{K_n}(x) − p^-_{K_n}(x) = f(pos(x)) − pos(x),

and equations (4.18), (4.19), and (4.20) can be obtained from the definition of f. Therefore, our strategy for states in Σ^s, Σ^m is to argue that the values {D_G(x)}_{x∈Ω} are close to {D_{K_n}(x)}_{x∈Ω} with high probability. The comparison with the complete graph is proved in Lemmas B.3 and B.4, and the concentration is handled by Lemma B.5. Informally, Lemma B.3 shows E_{G_{n,p}}[D_G(x)] − D_{K_n}(x) = O(1/n), and Lemma B.4 shows E_{G_{n,p}}[D_G(x)] − D_{K_n}(x) = O(√((log n)/n)).

Before digging into the lemmas, let us rewrite D_G(x). Without loss of generality, if pos(x) > 1/2,

D_G(x) = E_M[Pos(X^1) | X^0 = x] − Pos(x)
       = (1/n) Σ_{v∈V} P_M[X^1(v) = 1 | v is chosen, X^0 = x] − pos(x)
       = (1/n) Σ_{v∈V} f(r_x(v)) − pos(x),

and by the symmetry of G_{n,p} we can fix an arbitrary node v ∈ V and have

(B.29) E_{G_{n,p}}[D_G(x)] = E_{G_{n,p}}[f(r_x(v))] − pos(x).

Lemma B.3. (Expected Drift in Σ^s) If x ∈ Σ^s, where bias(x) < ε̌, then there exists a constant K_1 > 0 such that for large enough n,

E_{G_{n,p}}[D_G(x)] ≥ f(1/2 + bias(x)) − (1/2 + bias(x)) − K_1/n.

Lemma B.4. (Expected Drift in Σ^m) If x ∈ Σ^m, where ε̌ ≤ bias(x) ≤ 1/2 − ε̂, then there exists a constant K_2 > 0 such that for large enough n,

E_{G_{n,p}}[D_G(x)] ≥ f(1/2 + bias(x)) − (1/2 + bias(x)) − K_2 √((log n)/n).

Lemma B.5. (Small noise in Σ^s and Σ^m) There exists a constant L > 0 such that for n large enough, for all x ∈ Σ^s ∪ Σ^m,

D_G(x) − E_{G_{n,p}}[D_G(x)] > −L/√n

happens with high probability over the randomness of G_{n,p}.

Properties of r_x(v). Due to equation (B.29), to prove Lemmas B.3 and B.4 it is sufficient to show that {E_{G_{n,p}}[f(r_x(v))]}_{x∈Ω} is close to {f(pos(x))}_{x∈Ω} for some fixed node v ∈ V. We use the principle of deferred decisions: we reveal the randomness of the graph G ∼ G_{n,p} after fixing the node v and the configuration x, and apply a union bound over all configurations x ∈ Ω. Fixing a configuration and a node v, consider a bin with Pos(x) red balls and n − Pos(x) blue balls. If we sample k balls without replacement, the expected number of red balls among those k balls is pos(x)·k, and this random number has the same distribution as the random variable r_x(v)·k if G ∼ G_{n,p} is conditioned on the degree of v being k.
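The balls-in-a-bin picture of r_x(v) above is hypergeometric, so its mean and variance have closed forms, and the variance is at most that of sampling with replacement, which is the fact used below via Theorem 2.2. The sketch here is ours, for illustration only.

```python
import random

def hypergeom_mean_var(N, K, k):
    """Mean and variance of the number of red balls when drawing k of N balls
    without replacement, K of them red. The (N-k)/(N-1) factor makes the
    variance at most the with-replacement (binomial) variance k*p*(1-p)."""
    p = K / N
    mean = k * p
    var = k * p * (1 - p) * (N - k) / (N - 1)
    return mean, var

def empirical_mean_fraction(N, K, k, trials, seed=0):
    """Monte Carlo check: average red fraction over repeated draws of k balls
    without replacement from a bin with K red balls among N."""
    rng = random.Random(seed)
    balls = [1] * K + [0] * (N - K)
    total = 0.0
    for _ in range(trials):
        total += sum(rng.sample(balls, k)) / k
    return total / trials
```

Dividing the count by k, the fraction r has Var[r] ≤ p(1−p)/k, which is the bound pos(x)(1−pos(x))/deg(v) used in the proof of Lemma B.3.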

Copyright c 2018 by SIAM Unauthorized reproduction of this article is prohibited We define Ex(δr; v) to be the event Proof. [lemma B.7] Let A the event that X is in the interval [EX − , EX + ] (B.30) Ex(δr; v) , {G : |rx(v) − pos(x)| ≤ δrpos(x)}. Var[X|(1 − δr)EX ≤ X ≤ (1 + δr)EX](B.32) Since r (v) · k can be seen as a sample without replace- x = Var[X|A] ment, a standard argument combining theorems 2.3 and h 2 i 2.2 upper bounds the probability of it deviating from = E X − E[X|A] |A expectation by the one of sampling with replacement. h 2 i (B.31) (B.33) ≤ E X − E[X] |A  2  δr k pos(x) PGn,p [Ex(δr; v)|deg(v) = k] ≤ 2 exp − The last inequality is true, because for all z, 3  2  2 E (Z − z) ≥ E (Z − EZ) . Proof of the Lemmas As discussed below equa- On the other hand, Var[X] is equal to tion (B.29), we want to prove the difference be-  2  2 E (X − E[X]) A] Pr[A]+E (X − E[X]) ¬A](1−Pr[A]) tween EGn,p [f(rx(v))] and f (pos(x)) is of order O(1/n). However, in contrast to the O(p(log n)/n) error in Because |X − E[X]| ≥  conditioned on ¬A and  ≥ lemma B.4 we need a smoothness property of f around |X − E[X]| if A happens, 1/2 to derived this stronger result. The following two  2 (B.34) E (X − E[X]) A] ≤ Var[X] lemmas prove some basic results about smooth func- tions and conditional variance. The proof is completed by combining (B.33), and (B.34). Lemma B.6. Given I ⊆ R, and X is a random variable with support in I and expectation X, if g : 7→ is Proof. [lemma B.3] Following equation (B.29), our goal E R R is to derive a better approximation of M2-smooth in I, then

|EGn,p [f(rx(v))] − f(pos(x))| M2 2 2 E[g(X)] − g(EX) ≤ EX − (EX) . 2 We take ˇsmall enough so that [1/2−2ˇ, 1/2+2ˇ] ⊆ I1/2, ˇ Lemma B.7. Given a real-valued random variable X and so by the definition the update function f is M2- and  > 0 such that P[EX −  ≤ X ≤ EX + )] > 0, we smooth function in [1/2 − 2ˇ, 1/2 + 2ˇ]. Moreover we have take constants δr, δd to that δr ≤ ˇ and δd < 1. Let E be the event Ex(δr; v) ∧ E(δd; v) defined in Var[X|EX −  ≤ X ≤ EX + ] ≤ Var[X] equation (B.30) and lemma 2.2 respectively. Informally, if E happens that means the value of rx(v) is close to Proof. [lemma B.6] Let h(t) , g(EX + t(X − EX)). Because g is smooth, we use the fundamental theorem its expectation pos(x) and the degree is nearly uniform. Therefore we can decompose [f(r (v))]−f(pos(x)) of Calculus, and have EGn,p x as follows Z 1  0 | [f(r (v))] − f(pos(x))| E[g(X)] − g(EX) = EX [g(X) − g(EX)] = E h (t)dt EGn,p x 0 ≤ | [f(r (v))|E] − f(pos(x))| + [¬E] Z 1  EGn,p x PGn,p = g0 X + t(X − X)(X − X)dt    E E E E ≤ EGn,p f(rx(v))|E − f EGn,p [rx(v)|E] 0  + f EGn,p [rx(v)|E] − f(pos(x)) + PGn,p [¬E] Because g is M2-smooth, we have for all a, a + b ∈ I g0(a) − M |b| ≤ g0(a + b) ≤ g0(a) + M |b| and by taking Now we want to give upper bounds for these three terms. 2 2   a = X and b = t(X − X) For the first term, |EGn,p f(rx(v))|E − E E  f EGn,p [rx(v)|E] | only depends on the random variable [g(X)] − g( X) EX E rx(v)|E. By the definition of E the random variable Z 1  0 rx(v)|E has support in [(1 − δr)pos(x), (1 + δr)pos(x)] ≤ EX (g (EX) + M2t(X − EX)) (X − EX)dt and pos(x) ∈ [1/2 − ,ˇ 1/2 + ˇ]. Therefore the support 0   of rx(v)|E is in I1/2 and 0 M2 2 = EX g (EX)(X − EX) + (X − EX)  2 EGn,p [f(rx(v))|E] − f EGn,p [rx(v)|E] M 2  2 Mˇ 2 = EX (X − EX) ≤ Var [r (v)|E] 2 2 Gn,p x M 2 2   Mˇ 2 The lower bound − 2 EX (X − EX) ≤ EX [g(X)] − ≤ VarGn,p [rx(v)|E(δd)]. g(EX) is can be derived similarity. 
2 Copyright c 2018 by SIAM Unauthorized reproduction of this article is prohibited The first inequality is true because f is smooth in I1/2 Proof. [lemma B.4] Let E be the event of Ex(δr; v) ∧ and lemma B.6. The second comes from lemma B.7. E(δd; v) defined in equation (B.30) and lemma 2.2 Now we want to upper bound Var[rx(v)|E(δd)]. Re- respectively. Informally, if E happens that means the call that we observed that the random variable k · value of rx(v) is close to expectation pos(x) and the rx(v)|deg(v) = k can be seen as sampling balls from degree is nearly uniform. Therefore we can decompose bin with a pos(x) fraction of red balls without replace- EGn,p [f(rx(v))] − f(pos(x)) as follows ment. Because the variance is a convex function by | [f(r (v))] − f(pos(x))| theorem 2.2, the value of Var[rx(v)|deg(v) = k] is upper EGn,p x bounded by the variance of sampling k balls from the ≤ |EGn,p [f(rx(v))|E] − f(pos(x))| + PGn,p [¬E](B.39) pos(x)(1−pos(x)) same bin with replacement, k . As a result, The first term |EGn,p [f(rx(v))|E] − f(pos(x))| Since the pos(x)(1 − pos(x)) update function f is Lipschitz with Lipschitz constant VarGn,p [rx(v)|E(δd)] ≤ . (1 − δd)np M1, if the event E happens |rx(v) − pos(x)| ≤ δrpos(x) s and, Because x ∈ Σ , 1/2 − ˇ < pos(x) < 1/2 + ˇ, and δd is some constant independent of n |f(rx(v)) − f(pos(x))| ≤ M1 · |rx(v) − pos(x)| (B.35) 2 ≤ M · δ pos(x) ≤ M δ    1/4 − ˇ 1 1 r 1 r |EGn,p f(rx(v))|E − f EGn,p [rx(v)|E] | = · . (1 − δd)p n p By taking δr = A log n/n for some constant A which For the second term, because the update func- will be specified later we have tion f is Lipschitz, it is sufficient to prove an up- r log n per bound for |EGn,p [rx(v)|E] − pos(x)|. Note that in (B.40) |EGn,p [f(rx(v))|E] − f(pos(x))| = M1A the properties of rx(v) we show that EGn,p [rx(v)] = n pos(x). 
By the law of total probability we have, For the second term by equation (B.37), [¬E] is | [r (v)] − [r (v)|E]| ≤ | [r (v)|¬E] − PGn,p EGn,p x EGn,p x EGn,p x smaller than EGn,p [rx(v)|E]|PGn,p [¬E] which is less than ≤ 2PGn,p [¬E] 2 2 because 0 ≤ rx(v) ≤ 1. Therefore we have  δ (1 − δ )np · pos(x) −δ np 2 exp − r d + 2 exp d 3 3 (B.36) |EGn,p [rx(v)|E] − pos(x)| ≤ 2PGn,p [¬E]. m For the last term, [¬E] we just use a union because pos(x) ≥ ˆ = Ω(1) when x ∈ Σ . If δd PGn,p p bound: is some small constant and δr = A log n/n, then

P_{G_{n,p}}[¬E] = P_{G_{n,p}}[¬E_x(δ_r; v) ∪ ¬E(δ_d; v)]
(B.37)          ≤ P_{G_{n,p}}[¬E_x(δ_r) | E(δ_d; v)] + P_{G_{n,p}}[¬E(δ_d; v)]
                ≤ 2 exp(−(1/3) δ_r² (1 − δ_d) np · pos(x)) + 2 exp(−δ_d² np / 3).

Equation (B.37) is derived from equation (2.2) and equation (B.31). Because p > 0 and pos(x) ≥ 1/2 − ε̌ are constants when x ∈ Σ^s, for large enough n we have

(B.38)    P_{G_{n,p}}[¬E] ≤ 1/n.

Recall that δ_d, δ_r are constants independent of n. Combining equations (B.35), (B.36), and (B.38), we finish the proof with K₁ = (1/4 − ε̌²)/((1 − δ_d)p) + 3.

For lemma B.4, we want to prove that the difference between E_{G_{n,p}}[f(r_x(v))] and f(pos(x)) is of order O(√(log n / n)), which is much weaker than lemma B.3; here we only need the Lipschitz property of the update function f and the concentration phenomenon for r_x(v) shown in equation (B.31). By taking A large enough, P_{G_{n,p}}[¬E] is smaller than

2 exp(−(A/3)(1 − δ_d) p · pos(x) · log n) + 2 exp(−δ_d² np / 3).

Therefore

(B.41)    P_{G_{n,p}}[¬E] ≤ 1/n.

Combining equations (B.40) and (B.41) into equation (B.39), we have

|E_{G_{n,p}}[f(r_x(v))] − f(pos(x))| ≤ (M₁A + 1) √(log n / n),

and the proof is completed by taking K₂ = M₁A + 1.

Proof. [lemma B.5] Given a fixed configuration x ∈ A, the random variable D_G(x) has expectation D(x), with randomness over G_{n,p}. Assume the following claim, which we will prove later:

Claim B.1. If δ_d is some fixed constant, there exists some constant K > 0 such that for all t > 1/√n,

P_{G_{n,p}}[D_G(x) − E_{G_{n,p}} D_G(x) < −Kt | E(δ_d)] ≤ exp(−n²t²).

By taking t = √(2 ln 2 / n),

(B.42)    P_{G_{n,p}}[D_G(x) − E_{G_{n,p}} D_G(x) ≥ −K√(2 ln 2)/√n | E(δ_d)] ≥ 1 − exp(−2n ln 2) = 1 − 4^{−n}.

Applying a union bound over all configurations x ∈ Ω = {0, 1}^n, we derive a high-probability result with L = K√(2 ln 2):

P_{G_{n,p}}[∀x, D_G(x) − E_{G_{n,p}} D_G(x) ≥ −K√(2 ln 2)/√n]
  ≥ P_{G_{n,p}}[∀x, D_G(x) − E_{G_{n,p}} D_G(x) ≥ −K√(2 ln 2)/√n | E(δ_d)] − P_{G_{n,p}}[¬E(δ_d)].

By a union bound over the 2^n configurations and equation (B.42), this is lower bounded by

1 − 2^n · 4^{−n} − P_{G_{n,p}}[¬E(δ_d)] = 1 − o(1).

Therefore, it is sufficient for us to prove claim B.1. Following the analysis in equation (B.29), if E(δ_d) holds,

D_G(x) = (1/n) Σ_{v∈V} f(r_x(v)) − pos(x).

Now we think of {f(r_{G,x}(v))}_{x∈Ω, v∈V} as a set of real-valued functions with input G, indexed by x and v. Similarly, we think of {D_G(x)}_{x∈Ω} as a set of real-valued functions with input G. We will apply theorem 2.4 with event E(δ_d) to prove claim B.1, which consists of two parts: showing that the maximum effect/Lipschitz constant c_i is small, and showing that the event E(δ_d) happens with high probability, so that conditioning on it does not change the expectation too much.

For the first part, recall that the update function f is M₁-Lipschitz. Given x and v, if the degree of node v is k, then adding/removing a single edge in G changes the value of r_{G,x}(v) by at most 1/k, so r_{G,x}(v) is 1/k-Lipschitz. Therefore the Lipschitz constants of {f(r_{G,x}(v))}_{x∈Ω, v∈V} are uniformly bounded by O(M₁/k) = O(1/k). Moreover, fixing x, if every node has degree at least k, adding/removing a single edge in G only affects its two endpoints, and changes the value of (1/n) Σ_{v∈V} f(r_x(v)) by at most O(1/(nk)).

As a result, if E(δ_d) happens, every node has nearly uniform degree with constant δ_d. For all G, G′ in E(δ_d) that differ in just the presence of a single edge e, we can take c_e = max_{G,G′} |D_G(x) − D_{G′}(x)|, and

(B.43)    c_e = O(1 / (n min_{v∈V} deg(v))) = O(1/n²).

Therefore, there exists some constant ξ > 0 such that Σ_e c_e² = ξ/n², and 0 ≤ D_G(x) ≤ 1, so we can apply theorem 2.4:

P_{G_{n,p}}[D_G(x) − E_{G_{n,p}} D_G(x) < −t′ | E(δ_d)] ≤ exp(−(2/ξ) n² t′²) + P_{G_{n,p}}[¬E(δ_d)].

Note that by equation (2.3), when δ_d is some fixed constant and n is large enough, P_{G_{n,p}}[¬E(δ_d)] ≤ 1/√n, and we finish the proof of claim B.1 by taking K ≥ √(ξ/2) + 1 and t ≥ 1/√n.

B.2.2 Drift in Σ^l Here we consider the phase of the process when the fraction of red nodes is almost 1. The laziness 1/p_G(x) should be roughly the inverse of the fraction of blue nodes, and it increases as the bias increases. As a result, to prove equation (4.20) we need to give a better lower bound for the drift D_G(x).

Lemma B.8. There exist small enough constants δ_d > 0, ε̂ > 0, and K₃ > 0 such that if G has nearly uniform degree, then D_G(x) ≥ K₃(1/2 − bias(x)) for all x ∈ Σ^l with pos(x) > 1/2.

The following proof is basically a counting argument: when x ∈ Σ^l, the number of red nodes is too small for any node to have a majority of red neighbors.

Proof. Without loss of generality, we only consider configurations x where pos(x) < ε̂ and pos(x) = 1/2 − bias(x). Given p and δ_d, we can take ε̂ small enough that ε̂/((1 − δ_d)p) ∈ I₀. Because there are at most ε̂n red nodes and deg(v) ≥ (1 − δ_d)np for all v ∈ V, we have r_x(v) ∈ I₀, and by the property of the update function,

(B.44)    f(r_x(v)) ≤ M̂₁ · r_x(v) < r_x(v).

If we define R_x = {u ∈ V : x(u) = 1} to be the set of red nodes, then similarly to equation (B.29) (in contrast to equation (B.29), where pos(x) > 1/2, here pos(x) < 1/2) we have

p⁺_G(x) − p⁻_G(x) = pos(x) − (1/n) Σ_{v∈V} f(r_x(v)).
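The drift quantity D_G(x) = (1/n) Σ_{v∈V} f(r_x(v)) − pos(x) analyzed above is concrete enough to check numerically. The following sketch is not from the paper: the helper names are ours, and the choice f(r) = 3r² − 2r³ (the smooth "3-majority" update, with f′(1/2) = 3/2 > 1 and f(r) < r near 0) is one illustrative member of the majority-like family. It samples a dense G(n,p) and evaluates the drift at a configuration with bias 0.1; for such an f the empirical drift comes out positive, consistent with the high-probability statements above.

```python
import random

def sample_gnp_adj(n, p, rng):
    """Sample an Erdos-Renyi graph G(n,p) as adjacency lists."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def drift(adj, x, f):
    """D_G(x) = (1/n) * sum_v f(r_x(v)) - pos(x), where r_x(v) is the
    fraction of v's neighbors holding opinion 1 (red)."""
    n = len(adj)
    pos = sum(x) / n
    total = 0.0
    for v in range(n):
        deg = len(adj[v])
        r = sum(x[u] for u in adj[v]) / deg if deg else 0.0
        total += f(r)
    return total / n - pos

# Smooth 3-majority update: probability that at least 2 of 3 uniformly
# sampled neighbors are red.  f'(1/2) = 3/2 > 1, so it is "majority-like".
f = lambda r: 3 * r**2 - 2 * r**3

rng = random.Random(0)
n, p = 400, 0.5
adj = sample_gnp_adj(n, p, rng)
x = [1] * (n // 2 + 40) + [0] * (n // 2 - 40)  # pos(x) = 0.6, bias(x) = 0.1
d = drift(adj, x, f)
# With pos(x) > 1/2 and this f, the drift is positive with high probability.
```

Since degrees concentrate around np = 200, each r_x(v) concentrates around pos(x) = 0.6, so the drift is close to f(0.6) − 0.6 = 0.048, matching the role of equation (B.31) in the proofs above.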

By equation (B.44), this is greater than

p⁺_G(x) − p⁻_G(x) ≥ pos(x) − (1/n) Σ_{v∈V} M̂₁ r_x(v)
                  = pos(x) − (M̂₁/n) Σ_{v∈V} e(R_x, v)/deg(v)
                  ≥ pos(x) − (M̂₁/n) · e(R_x, V)/min_{v∈V} deg(v).

The last inequality is true because deg(v) ≥ min_{v∈V} deg(v) and Σ_v e(R_x, v) = e(R_x, V). Because e(R_x, V) ≤ |R_x| max_{u∈R_x} deg(u) and |R_x|/n = pos(x),

p⁺_G(x) − p⁻_G(x) ≥ (1 − M̂₁ · max_{u∈R_x} deg(u)/min_{v∈V} deg(v)) pos(x)
                  ≥ (1 − M̂₁ (1 + δ_d)/(1 − δ_d)) pos(x)
                  ≥ K₃ pos(x) = K₃(1/2 − bias(x)).

The last inequality is true by taking δ_d small enough and 0 < K₃ ≤ 1 − M̂₁(1 + δ_d)/(1 − δ_d).

B.3 Proof of Lemma 4.3

Proof. We prove each equation in turn.

Equation (4.18). First, for the drift D_G(x) = p⁺_G(x) − p⁻_G(x) we apply the idea illustrated at the beginning of section B.2.1. For the first and second steps, we have

p⁺_G(x) − p⁻_G(x) − ( f(1/2 + bias(x)) − (1/2 + bias(x)) )
  = p⁺_G(x) − p⁻_G(x) − (p⁺(x) − p⁻(x))
    + (p⁺(x) − p⁻(x)) − ( f(1/2 + bias(x)) − (1/2 + bias(x)) ).

By lemmas B.3 and B.5, with high probability this is greater than

−K₁/√n − L/n ≥ −(K₁ + L)/√n.

For the last step, because the update function f satisfies f′(1/2) ≥ M̌₁ > 1, we can take ε̌ small enough such that for all h with 0 ≤ h < ε̌,

f(1/2 + h) − f(1/2) ≥ ((M̌₁ + 1)/2) h.

As a result, with high probability we have, for all x ∈ Σ^s where bias(x) < ε̌,

(B.45)    p⁻_G(x) − p⁺_G(x) ≤ −((M̌₁ − 1)/2) bias(x) + (K₁ + L)/√n.

On the other hand, by equation (4.16), we have

(B.46)    1 ≤ 1/p⁺_G(x) < 2/f̂.

Multiplying equation (B.45) by equation (B.46), we have

p⁻_G(x)/p⁺_G(x) ≤ 1 − ((M̌₁ − 1)/2) bias(x) + 2(K₁ + L)/(f̂ √n),

which finishes the proof of equation (4.18) by taking A₁ = (M̌₁ − 1)/2 and B₁ = 2(K₁ + L)/f̂, which are positive constants.

Equation (4.19). For the drift D_G(x) = p⁺_G(x) − p⁻_G(x), using an argument similar to the proof of (4.18), we have with high probability, by lemmas B.4 and B.5, for all x ∈ Σ^m,

(B.47)    p⁺_G(x) − p⁻_G(x) ≥ f(1/2 + bias(x)) − (1/2 + bias(x)) − K₂/√n − L/n.

Recall that the update function f is Lipschitz and f(1/2 + h) > 1/2 + h for all 0 < h < 1/2, so we can define its minimum over the compact set [1/2 + ε̌, 1 − ε̂]:

(B.48)    0 < δ_f := min_{ε̌ ≤ h ≤ 1/2 − ε̂} ( f(1/2 + h) − (1/2 + h) ).

Combining equations (B.47) and (B.48), for large enough n we have, with high probability, for all x ∈ Σ^m,

(B.49)    p⁻_G(x) − p⁺_G(x) ≤ −δ_f + K₂/√n + L/n ≤ −δ_f/2.

Multiplying equation (B.49) by equation (B.46), we have

p⁻_G(x)/p⁺_G(x) ≤ 1 − δ_f/2,

which finishes the proof of equation (4.19) by taking A₂ = δ_f/2, where 0 < A₂ < 1.

Equation (4.20). By lemmas B.8 and 4.2, we have p⁻_G(x) − p⁺_G(x) ≤ −K₃(1/2 − bias(x)) and p⁺_G(x) ≤ (1/4)(1/2 − bias(x)). Therefore

p⁻_G(x)/p⁺_G(x) ≤ 1 − 4K₃.

This finishes the proof by taking A₃ = 4K₃.

Copyright © 2018 by SIAM. Unauthorized reproduction of this article is prohibited.
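The three ratio bounds (4.18)–(4.20) are what drive the Θ(n log n) consensus time. As an end-to-end sanity check (not part of the paper; all names and parameters below, n = 60, p = 1/2, and the smooth 3-majority update f(r) = 3r² − 2r³, are illustrative choices of ours), the following toy simulation runs the asynchronous dynamics from bias 0.2 and, consistent with the analysis, ends at or very near consensus.

```python
import random

def gnp(n, p, rng):
    """Sample G(n,p) as adjacency lists."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def run_node_dynamics(adj, x, f, steps, rng):
    """Asynchronous node dynamics: each step picks a uniform node and
    redraws its opinion to 1 with probability f(fraction of red neighbors)."""
    n = len(adj)
    for _ in range(steps):
        v = rng.randrange(n)
        nbrs = adj[v]
        if nbrs:
            r = sum(x[u] for u in nbrs) / len(nbrs)
            x[v] = 1 if rng.random() < f(r) else 0
    return x

# Smooth 3-majority update, one majority-like choice: f'(1/2) = 3/2 > 1.
f = lambda r: 3 * r**2 - 2 * r**3

rng = random.Random(1)
n = 60
x = run_node_dynamics(gnp(n, 0.5, rng), [1] * 42 + [0] * 18, f, steps=20000, rng=rng)
pos_final = sum(x) / n  # expected to be near 1 (or 0): consensus w.h.p.
```

Because 0 and 1 are the only fixed points of f other than the unstable point 1/2, the all-red and all-blue configurations are the only absorbing states, and a run of Θ(n log n) steps typically suffices at this scale.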