arXiv:2010.06025v3 [math.PR] 12 Jan 2021

The replicator equation in stochastic spatial evolutionary games

Yu-Ting Chen∗†

January 13, 2021

Abstract

We study the multi-strategy stochastic evolutionary game with death-birth updating in expanding spatial populations of size N → ∞. The model is a voter model perturbation. For typical populations, we require perturbation strengths satisfying 1/N ≪ w ≪ 1. Under appropriate conditions on the space, we prove that the limiting density processes of strategy obey the replicator equation, and the normalized fluctuations converge to a Gaussian process with the Wright–Fisher covariance function in the limiting densities. As an application, we resolve in the positive a conjecture from the biological literature that the expected density processes approximate the replicator equation on many non-regular graphs.

Keywords: Evolutionary games; voter model; coalescence; almost exponentiality of hitting times; the replicator equation; the Wright–Fisher diffusion.

Mathematics Subject Classification (2000): 60K35, 82C22, 60F05, 60J99.

Contents

1 Introduction
2 Main results
3 Semimartingale dynamics
4 Decorrelation in the ancestral lineage distributions
  4.1 Mixing conditions for local meeting times
  4.2 Extensions
5 Convergence of the vector density processes
  5.1 Asymptotic closure of equations and path regularity
  5.2 The replicator equation and the Wright–Fisher fluctuations
6 Further properties of coalescing lineage distributions
  6.1 A comparison with mutations
  6.2 Full decorrelation on large random regular graphs
References

∗ Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada.
† Email: [email protected]

1 Introduction

The replicator equation is the most widely applied dynamics among evolutionary game models. It also gives the first dynamical model in evolutionary game theory [40], thereby establishing an important connection to theoretical explanations of animal behaviors [31]. For this application, the derivation of the equation considers a large well-mixed population. The individuals implement strategies from a finite set S with #S ≥ 2, such that the payoff to σ ∈ S from playing against σ′ ∈ S is Π(σ, σ′). In the continuum, the density Xσ of strategy σ evolves with a per capita rate given by the difference between its payoff

F_σ(X) = Σ_{σ′∈S} Π(σ, σ′) X_{σ′}    (1.1)

and the average payoff of the population, where X = (X_{σ′}; σ′ ∈ S). Hence, the vector density process of strategy obeys the following replicator equation:

Ẋ_σ = X_σ ( F_σ(X) − Σ_{σ″∈S} F_{σ″}(X) X_{σ″} ),  σ ∈ S.    (1.2)

This equation is a point of departure for studying connections between the payoff matrix (Π(σ, σ′))_{σ,σ′∈S} and the equilibrium states of the model by methods from dynamical systems. See [23, 27] for an introduction and more properties. The replicator equation also arises from the Lotka–Volterra equation of ecology and Fisher's fundamental theorem of natural selection [27, 38].

In this paper, we consider the stochastic evolutionary game dynamics in large finite structured populations. Our goal is to prove that the vector density processes of strategy converge to the replicator equation. In this direction, one of the major results in the biological literature is the convergence to the replicator equation on large random regular graphs [34]. The authors further conjecture that their approximations extend to more general graphs [34, Section 5]. To obtain the proofs, we view the model as a perturbation of the voter model, since this viewpoint has made possible several mathematical results for it (e.g. [20, 10, 18, 13, 4, 14]). Our starting point here is the method in [14], extended from [11, 12], for proving the diffusion approximations of the time-changed density processes of strategy under weak selection. In that context, the corresponding perturbations away from the voter model use strengths typically given by w = O(1/N), where N is the population size. The questions from [34] nevertheless concern very different properties. The crucial step of the method in [14] develops along the equivalence of probability laws in the limit between the evolutionary game model and the voter model. Now, this property breaks down for nontrivial parameters according to the limiting equation from [34]: the distributional limits of the density processes under the evolutionary game and the voter model degenerate to delta distributions at distinct deterministic functions, given as solutions of differential equations.
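As a quick numerical illustration of the dynamics (1.2) (this sketch is not part of the paper's analysis, and the 2×2 payoff entries below are made up), a forward-Euler integration shows the two characteristic features used later: the flow preserves the simplex, and for an anti-coordination payoff it approaches the interior equilibrium.

```python
# Illustrative forward-Euler integration of the replicator equation (1.2).
# The payoff matrix below is a made-up example, not from the paper.

def replicator_step(x, payoff, dt):
    n = len(x)
    # F_s(x) = sum_t payoff[s][t] x_t, as in (1.1)
    F = [sum(payoff[s][t] * x[t] for t in range(n)) for s in range(n)]
    avg = sum(F[s] * x[s] for s in range(n))  # population mean payoff
    return [x[s] + dt * x[s] * (F[s] - avg) for s in range(n)]

def integrate(x, payoff, dt=0.01, steps=5000):
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

# Anti-coordination payoff: F_0 = F_1 at x = (2/3, 1/3), the stable
# interior equilibrium that the flow approaches.
payoff = [[0.0, 2.0], [1.0, 0.0]]
x = integrate([0.5, 0.5], payoff)
```

Note that Σ_s x_s(F_s(x) − avg) = 0, so each Euler step conserves the total mass exactly; this mirrors the fact that (1.2) leaves the simplex invariant.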
As will be explained in this introduction, the convergence to the replicator equation also requires a different range of perturbation strengths w, satisfying 1/N ≪ w ≪ 1. This stronger perturbation implies weaker relations between the two models and thus calls for perturbation estimates of the evolutionary game model by the voter model generalizing those in [14]. With this change of perturbation strengths, the choice of time changes for the density processes and the characterization of the coefficients of the limiting equation are the further tasks to be settled.

Before further explanations of the main results of [34, 14], let us specify the evolutionary game model considered throughout this paper. First, to define the spatial structure, we impose directed weights q(x,y) on all pairs of sites x and y in a given population of size N. We assume that q is an irreducible probability transition kernel with a zero trace: Σ_x q(x,x) = 0. The perturbation strength w > 0 defines the selection intensity of the model in the following form: for an individual at site x using strategy ξ(x), its interactions with the neighbors determine the fitness as the sum 1 − w + w Σ_y q(x,y) Π(ξ(x), ξ(y)), where ξ(y) denotes the strategy held by the neighbor at y. Under the condition of positive fitness by tuning the selection intensity appropriately, the death-birth updating requires that in a transition of state, an individual is chosen to die at rate 1. Then the neighbors compete to fill in the site by reproduction with probability proportional to the fitness. Although the main results of this paper extend to include mutations of strategies, we relegate this additional mechanism until Section 2.

Besides the case of mean-field populations, the density processes of strategy play a significant role in the biological literature for studying equilibrium states of spatial evolutionary games.
Here and throughout this paper, the density of σ under the population configuration ξ refers to the weighted sum Σ_x π(x) 1_σ(ξ(x)), where π is the stationary distribution associated with the transition probability q. For such macroscopic descriptions of the model, the critical issue arises from the non-closure of the stochastic equations. The density processes are projections of the whole system, and in general, the density functions are not Markov functions in the sense of [37]. More specifically, for the evolutionary game with death-birth updating introduced above, the microscopic dynamics from pairwise interactions determine the densities' dynamics. Nor is it clear how to reduce the densities' dynamics analytically in the associated Kolmogorov equations. See [14, Section 1] for more details on these issues and the physics method discussed below.

One of the main results in [34] shows that for selection intensities w ≪ 1, the expected density processes on large random k-regular graphs for any integer k ≥ 3 approximately obey the following extended form of the replicator equation:

Ẋ_σ = w X_σ ( F_σ(X) + F̃_σ(X) − Σ_{σ″∈S} F_{σ″}(X) X_{σ″} ),  σ ∈ S.    (1.3)

Here, F_σ(X) and F̃_σ(X) are linear functions in X = (X_{σ′}; σ′ ∈ S) such that the constant coefficients are explicit in the payoff matrix and the graph degree k. See [34, Equations (22) and (36)]. Note that (1.3) underlines the nontrivial effect of spatial structure, since the coefficients are very different from those for the replicator equation (1.2) in mean-field populations.

For the derivation of (1.3), [34] applies the physics method of pair approximation. It enables the asymptotic closure of the equations of the density processes by certain moment closure approximations and circumvents the fundamental issues discussed above. Moreover, based on computer simulations from [35], the authors of [34] conjecture that the approximate replicator equation for the density processes applies to many non-regular graphs, provided that the constant graph degree k in the coefficients of the replicator equation is replaced by the corresponding average degree. In approaching this conjecture, it is still not clear how the average degrees of graphs enter. The method in this paper does not extend to this generality either. On the other hand, even within the scope of large random regular graphs, the constant graph degrees and the locally regular tree-like property seem essential in [34]. We notice that locally tree-like spatial structures are known to be useful for pair approximations in general [39]. In the case of two strategies, the supplementary information (SI) of [35] shows that the density processes of a fixed one approximate the Wright–Fisher diffusion with drift. The derivation also applies pair approximations on large random regular graphs, although it is noticeably different from the derivation in [34] for the replicator equation on graphs. (A slow-fast dynamical system for the density and a certain rapidly convergent local density is considered in [35, SI].)
Nor is it clear how to justify the derivation in [35, SI] mathematically. On the other hand, the diffusion approximations of the density processes can be proven on large finite spatial structures subject to appropriate, but general, conditions [14, Theorem 4.6] that include random regular graphs as a special case. See [8, 9] for mathematical investigations of moment closure in other spatial biological models and some general discussions, among other mathematical works in this direction.

The method in [14] begins with the aforementioned asymptotic equivalence of probability laws via perturbations for w = O(1/N) (not just equivalence of laws of the density processes). The 1/N-threshold is sharp in that the critical case yields nontrivial drifts of the limiting diffusions. This relation reduces the convergence of the game density processes to a convergence problem of the voter model. For the latter, fast mixing of spatial structure ensures approximations of the coalescence times in the ancestral lineages by analogous exponential random variables from large mean-field populations. This method goes back to [17]. Moreover, for the voter density processes, the relevant coalescence times can be reduced to the meeting times of two independent copies of the stationary Markov chains over the populations. The almost exponentiality of hitting times [1, 2, 3] applies to these times and leads to the classical diffusivity in the voter density processes in general spatial populations [11, 12]. The first moments of these meeting times are also used to time change the densities for the convergence.

Besides the methods, the convergence results in [14, Theorem 4.6] for the game density processes under the specific setting of large random regular graphs and the payoff matrices for prisoner's dilemma games are closely related to the replicator equation on graphs from [34]. See (2.15) for these payoff matrices and [15] for the exact asymptotics N(k−1)/[2(k−2)], N → ∞, of the expected meeting times on large random k-regular graphs. In this case, the diffusion approximations in [35, SI] hold to the degree of matching constants if the time changes are formally undone [15]. (See also [15, Remark 3.1] for a correction of an inaccuracy in [14] on passing limits along random regular graphs.) This standpoint extends to a recovery of the replicator equation on graphs from [34] by a similar formal argument.
It shows that these results in [34, 35], both due to pair approximations, are algebraically consistent with each other. See the end of Section 2 for details and the second main result discussed below for further comparison. In addition to its own interest, the replicator equation on graphs concerns a unified characterization of the evolutionary game within an enlarged range of selection intensities as mentioned above.

The main results of this paper obtain the convergence to the replicator equation under the above specific setting, in addition to extensions to general spatial populations and payoff matrices. Multiple strategies and mutations are allowed. See Theorem 2.2 and Corollary 2.3. For the extended context, the first main result [Theorem 2.2 (1◦)] proves the convergence of the vector density processes of strategy under the following assumptions. We require that the stationary distributions associated with the spatial structures are asymptotically comparable to the uniform distributions (see (2.12)), and that these spatial structures allow for suitable time changes of the density processes and suitable selection intensities (Definition 2.1). Here, for the typical eligible populations, the time changes can range over 1 ≪ θ ≪ N. The selection intensities are of the inverse order so that 1/N ≪ w ≪ 1. Then the precise limiting equation is given by (1.3), with the selection intensity w replaced by a constant w∞ as a limit of the parameters. The proof also determines F_σ(X) and F̃_σ(X) for (1.3):

F_σ(X) = κ_{0|2|3} Σ_{σ′∈S} Π(σ, σ′) X_{σ′},    (1.4)

F̃_σ(X) = (κ_{(2,3)|0} − κ_{0|2|3}) Π(σ, σ)
  + Σ_{σ′∈S} (κ_{(0,3)|2} − κ_{0|2|3}) [Π(σ, σ′) − Π(σ′, σ)] X_{σ′}    (1.5)
  − Σ_{σ′∈S} (κ_{(2,3)|0} − κ_{0|2|3}) Π(σ′, σ′) X_{σ′}.

Here, κ_{(2,3)|0}, κ_{(0,3)|2}, and κ_{0|2|3} are nonnegative constants defined by the asymptotics of some coalescent characteristics of the spatial structures. See Section 4 for the definitions of these constants.

The first main result [Theorem 2.2 (1◦)] has the meaning of spatial universality, as do the diffusion approximations of the voter model and the evolutionary game in [11, 12, 14], although it does not

recover the explicit equations obtained in [34] on large random regular graphs under general payoff matrices. The conditions do not require convergence of local geometry as in the large discrete tori and large random regular graphs. The spatial structures can remain sparse in the limit, which is in stark contrast to the usual assumptions for proving scaling limits of particle systems. The locally tree-like property usually assumed in pair approximations is not required either. Based on these properties, the first main result [Theorem 2.2 (1◦)] answers the conjecture in [34] in the positive, to the degree of using constants that may depend implicitly on the space: the approximations of the expected density processes by the replicator equation extend to many non-regular graphs, whenever the initial conditions converge deterministically.

To further the formal comparison mentioned above with the approximate Wright–Fisher diffusion from [35, SI], the second main result [Theorem 2.2 (2◦)] considers one additional aspect of the convergence of the density processes. In this part, the normalized fluctuations are proven to converge to a vector centered Gaussian martingale [Theorem 2.2 (2◦)]. The quadratic covariation is the Wright–Fisher diffusion matrix in the limiting densities X: ∫_0^t X_σ(s)[δ_{σ,σ′} − X_{σ′}(s)] ds, σ, σ′ ∈ S, where δ_{σ,σ′} are the Kronecker deltas. For the case of only two strategies on large random regular graphs, this covariation formally recovers the approximate Wright–Fisher diffusion term from [35, SI]. Note that this result and the convergence to the replicator equation do not imply the diffusion approximations of the density processes.

In the rest of this introduction, we explain the proof of the first main result. Its investigation raises all the central technical issues pointed out above. First, the lack of an asymptotic equivalence of probability laws is resolved via the populations' microscopic dynamics driving the density processes.
Duhamel's principle replaces the pathwise, global change of measure method in [14] and shows the irrelevance of selection intensities in the microscopic dynamics (Proposition 5.3). This approach then links to the decorrelation proven in [11, Section 4] for some "local" meeting time distributions, from the ancestral lineages, driving the dynamics of the voter density processes. Here, local meeting times refer to those where the initial conditions of the Markov chains are within fixed numbers of edges. The decorrelation property from [11] shows that the probability distributions of those particular local meeting times for general populations converge to nontrivial convex combinations of the delta distributions at zero and infinity. The time scales are slower than those for the diffusion approximations. In particular, not only is the exponential distribution absent: no distribution with a nonzero mass between zero and infinity arises in the limit. Informally speaking, the decorrelation occurs at time scales between the period when details of the spatial structures dominate and the period when the almost exponentiality [1, 2, 3] plays a role. To us, this presence of multiple time scales in the evolutionary dynamics is reminiscent of the slow-fast dynamical system in [35, SI].

For the convergence to the replicator equation, the choice of the time changes for the densities and the characterization of the limiting equation use the decorrelation from [11] and its extensions. First, the time changes can only grow slower than those for the diffusion approximations, since the limiting trajectories are less rough. This requirement relates the convergence to the decorrelation. We are now interested in proving the best possible range of growing time changes for the decorrelation, not just using the particular ones from [11].
After all, in [34], the replicator equation is expected to be present within the broad range w ≪ 1, and our argument requires the selection intensities to be of the inverse order of the time changes. Moreover, the application of Duhamel's principle mentioned above leads to the entrance of more local meeting times than those for the voter densities in [11]. Simultaneous decorrelation in these local meeting times is essential for getting a deterministic limiting differential equation: this property involves the asymptotic path regularity of the density processes. The constant coefficients in (1.4) and (1.5) also arise as the weights at infinity in the limiting local meeting time distributions for the typical eligible populations. See Sections 4 and 5 for the related proofs.

Organization. Section 2 introduces the evolutionary game model and the voter model analytically and discusses the main results (Theorem 2.2 and Corollary 2.3). In Section 3, we define the voter model and the evolutionary game model as semimartingales and briefly explain the role of the coalescing duality. In Section 4, we quantify the time changes in proving the main results and characterize the coefficients of the limiting equation. Section 5 is devoted to the main arguments of the proofs of Theorem 2.2 and Corollary 2.3. Finally, Section 6 presents some auxiliary results for coalescing Markov chains.

Acknowledgments. The author would like to thank Lea Popovic for comments on earlier drafts and Sabin Lessard for pointing out several references from the literature. Support from the Simons Foundation before the author's present position and from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

2 Main results

In this section, we introduce the stochastic spatial evolutionary game with death-birth updating in more detail. A discussion of the main results of this paper then follows. To be consistent with the viewpoint of voter model perturbations and the neutral role of the voter model, strategies will be called types in the rest of this paper. The settings here and in the next section are adapted from those in [11, 12, 14] to the context of evolutionary games with multiple types.

Recall that a discrete spatial structure considered in this paper is given by an irreducible, reversible probability kernel q on a finite nonempty set E such that tr(q) = Σ_{x∈E} q(x,x) = 0. Write N = #E and π for the unique stationary distribution of q. The interactions of individuals are defined by a payoff matrix Π = (Π(σ, σ′))_{σ,σ′∈S} of real entries. Fix w̄ ∈ (0, ∞) such that

w + w Σ_{z∈E} q(y,z) |Π(ξ(y), ξ(z))| < 1,  ∀ w ∈ [0, w̄], y ∈ E.    (2.1)

Then the following perturbed transition probability is used to update types of individuals due to interactions:

q^w(x,y,ξ) ≝ q(x,y) [ (1 − w) + w Σ_{z∈E} q(y,z) Π(ξ(y), ξ(z)) ] / Σ_{y′∈E} q(x,y′) [ (1 − w) + w Σ_{z∈E} q(y′,z) Π(ξ(y′), ξ(z)) ].    (2.2)

With these updates and the updates based on a mutation measure µ on S, two types of configurations ξ^{x,y}, ξ^{x|σ} ∈ S^E result. They are obtained from ξ ∈ S^E by changing only the type at x such that ξ^{x,y}(x) = ξ(y) and ξ^{x|σ}(x) = σ. Hence, the evolutionary game (ξ_t) is a Markov jump process with a generator given by

L^w H(ξ) = Σ_{x,y∈E} q^w(x,y,ξ) [H(ξ^{x,y}) − H(ξ)]
  + Σ_{x∈E} ∫_S [H(ξ^{x|σ}) − H(ξ)] dµ(σ),  H : S^E → R.    (2.3)

The first sum on the right-hand side of (2.3) governs changes of types due to selection, and the second sum is responsible for mutations. Given ξ ∈ S^E and a probability distribution ν on S^E as initial conditions, we write P^w_ξ and E^w_ξ, or P^w_ν and E^w_ν, for the laws associated with L^w. For w = 0, the generator L^w reduces to the generator L of the multi-type voter model with mutation, and the notation P and E is used.
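The selection part of the generator (2.3) can be simulated directly: each site dies at rate 1, and the vacated site adopts the type of a neighbor chosen with probability q^w(x, ·, ξ) from (2.2). The following is a hypothetical toy instance (a 10-cycle with simple random walk kernel, two types, no mutation, and a made-up payoff matrix), not the paper's construction.

```python
import random

# Toy death-birth simulation of the selection part of (2.3) on a cycle.
# The kernel q, payoff matrix, and parameters are illustrative only.

def qw_weights(x, xi, q, payoff, w):
    # unnormalized q^w(x, y, xi) over neighbors y, as in (2.2)
    ws = {}
    for y, qxy in q[x].items():
        fit = (1 - w) + w * sum(qyz * payoff[xi[y]][xi[z]]
                                for z, qyz in q[y].items())
        ws[y] = qxy * fit
    return ws

def step(xi, q, payoff, w, rng):
    # every site dies at rate 1, so the updated site is uniform over E
    x = rng.randrange(len(xi))
    ws = qw_weights(x, xi, q, payoff, w)
    total = sum(ws.values())
    r, acc = rng.random() * total, 0.0
    for y, wy in ws.items():
        acc += wy
        if r <= acc:
            xi[x] = xi[y]   # neighbor y fills the site by reproduction
            break
    return xi

N = 10
q = {x: {(x - 1) % N: 0.5, (x + 1) % N: 0.5} for x in range(N)}
payoff = [[1.0, 0.0], [3.0, 2.0]]   # made-up 2x2 payoff matrix
rng = random.Random(0)
xi = [0] * 5 + [1] * 5
for _ in range(200):
    step(xi, q, payoff, w=0.05, rng=rng)
```

Without mutation, a monomorphic configuration is absorbing: every neighbor carries the same type, so each update reproduces it.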

The object in this paper is the vector density process p(ξ_t) = (p_σ(ξ_t); σ ∈ S) for the evolutionary game with death-birth updating. Here, the density function of σ ∈ S is given by

p_σ(ξ) = Σ_{x∈E} 1_σ ∘ ξ(x) π(x),    (2.4)

where f ∘ ξ(x) = f(ξ(x)). Under P^w, p_σ(ξ_t) admits a semimartingale decomposition:

pσ(ξt)= pσ(ξ0)+ Aσ(t)+ Mσ(t), (2.5)

where A_σ(t) = ∫_0^t L^w p_σ(ξ_s) ds. In the sequel, we study the convergence of the vector density processes and the martingales M_σ separately, along an appropriate sequence of discrete spatial structures (E_n, q^{(n)}) with N_n = #E_n → ∞.

Convention for superscripts and subscripts. Objects associated with (E_n, q^{(n)}) will carry either superscripts "(n)" or subscripts "n", although additional properties may be assumed so that these objects are not just based on (E_n, q^{(n)}). Otherwise, we refer to a fixed spatial structure (E, q).

For the main theorem, we choose parameters as time changes for the density processes, mutation measures, and selection intensities. The choice is according to the underlying discrete spatial structures. We use ν_n(1) = Σ_{x∈E_n} π^{(n)}(x)^2 and the first moment γ_n of the first meeting time of two independent stationary rate-1 q^{(n)}-Markov chains. The other characteristics of the spatial structure are the mixing time t_mix^{(n)} of the q^{(n)}-Markov chains and the spectral gap g_n, defined as follows. Recall that the semigroup of the continuous-time rate-1 q^{(n)}-Markov chain is given by (e^{t(q^{(n)}−1)}; t ≥ 0). With

d_{E_n}(t) = max_{x∈E_n} ‖ e^{t(q^{(n)}−1)}(x, ·) − π^{(n)} ‖_TV    (2.6)

for ‖·‖_TV denoting the total variation distance, we choose

t_mix^{(n)} = inf{ t ≥ 0; d_{E_n}(t) ≤ (2e)^{−1} }.    (2.7)

(n) The spectral gap gn is the distance between the largest and second largest eigenvalues of q .

Definition 2.1. For all n ≥ 1, let θ_n ∈ (0, ∞) be a time change, µ_n a mutation measure on S, and w_n ∈ [0, w̄]. The sequence (θ_n, µ_n, w_n) is said to be admissible if all of the following conditions hold. First, (θ_n) satisfies

lim_{n→∞} θ_n = ∞,  lim sup_{n→∞} θ_n/γ_n < ∞,  lim_{n→∞} γ_n ν_n(1) e^{−tθ_n} = 0, ∀ t ∈ (0, ∞),    (2.8)

and at least one of the two mixing conditions holds:

lim_{n→∞} γ_n ν_n(1) e^{−g_n θ_n} = 0   or   lim_{n→∞} (t_mix^{(n)}/θ_n) [1 + log^+(γ_n/t_mix^{(n)})] = 0,    (2.9)

where log^+ α = log(max{α, 1}). Second, we require the following limits for (µ_n) and (w_n):

lim_{n→∞} µ_n(σ) θ_n = µ_∞(σ) < ∞, ∀ σ ∈ S;    (2.10)

lim_{n→∞} w_n = 0,  lim_{n→∞} w_n θ_n / [2 γ_n ν_n(1)] = w_∞ < ∞,  lim sup_{n→∞} w_n θ_n < ∞.    (2.11)

Another condition of the main theorem requires that sup_n N_n max_{x∈E_n} π^{(n)}(x) < ∞, which implies that γ_n grows at least at the order of N_n (see (3.22) or [11, (3.21)] for details). In this context, the admissible θ_n has the following effects. If lim_n θ_n/γ_n ∈ (0, ∞), the time-changed density processes p_1(ξ_{θ_n t}) of the voter model converge to the Wright–Fisher diffusion [11, 12]. Moreover, the density processes of the evolutionary game converge to the same diffusion but with a drift [14, Theorem 4.6]. These diffusion approximations hold under mixing conditions slightly different from those in (2.9). Therefore, assuming lim_n θ_n/γ_n = 0 in (4.2) has the heuristic that the time-changed density processes have paths less rough in the limit, and so do not converge to diffusion processes. Note that this variation of time scales can be contrasted with, e.g., the context considered in [24] where, among other results, the discrete processes converge to the equilibrium states of the limiting process due to faster time changes.

The other conditions for the admissible sequences mainly consider the typical case of "transient" spatial structures. These kernels are characterized by the condition sup_n γ_n ν_n(1) < ∞ [11, Remark 2.4]. In this case, (2.8) can be satisfied by any sequence (θ_n) such that 1 ≪ θ_n ≪ N_n, and (2.10) and (2.11) allow for nonzero µ_∞ and w_∞. The somewhat tedious condition in (2.11) simplifies drastically, and we get N_n^{−1} ≪ w_n ≪ 1 when lim_n w_n θ_n is nonzero. As for the mixing conditions in (2.9), they can pose severe limitations if the spatial structures are "recurrent" (sup_n γ_n ν_n(1) = ∞). In this case, we may not be able to find admissible sequences such that w_∞ > 0, so that the limiting equation to be presented below only allows constant solutions in the absence of mutation. For example, the two-dimensional discrete tori satisfy γ_n ν_n(1) ∼ C log N_n, t_mix^{(n)} ≤ O(N_n) and g_n = O(1/N_n). See [17] and [30, Theorem 10.13 on p.133, Theorem 5.5 on p.66 and Section 12.3.1 on p.157]. We can choose θ_n = N_n (log log N_n)^2 to satisfy (2.8) with lim_n θ_n/γ_n = 0, and the first mixing condition in (2.9). But now the admissible (w_n) only gives w_∞ = 0. We notice that a similar restriction is pointed out in [22] on the low density scaling limits of the biased voter model, where the limit is Feller's branching diffusion with drift.

From now on, we write π_min = min_{x∈E} π(x) and π_max = max_{x∈E} π(x) for the stationary distribution π of (E, q). The main theorem stated below shows a law of large numbers type convergence for the density processes and a central limit theorem type convergence for the fluctuations. These two results do not combine to give the diffusion approximation of the density processes proven in [14].

Theorem 2.2. Let (E_n, q^{(n)}) be a sequence of irreducible, reversible probability kernels defined on finite sets with N_n = #E_n → ∞. Assume the following conditions:

(a) Let ν_n be a probability measure on S^{E_n} such that ν_n(ξ; p(ξ) ∈ ·) converges in distribution to a probability measure ν_∞ on [0, 1]^S.

(b) It holds that

0 < lim inf_{n→∞} N_n π_min^{(n)} ≤ lim sup_{n→∞} N_n π_max^{(n)} < ∞.    (2.12)

(c) The limits in (4.5) and (4.6) defining the nonnegative constants κ_{(2,3)|0}, κ_{(0,3)|2} and κ_{0|2|3} exist. These constants depend only on space.

(d) We can choose an admissible sequence (θ_n, µ_n, w_n) as in Definition 2.1 such that lim_n θ_n/γ_n = 0.

Then the following convergence in distribution of processes holds:

(1◦) The sequence of the vector density processes (p(ξ_{θ_n t}), P^{w_n}_{ν_n}) converges to the solution X of the following differential equation with the random initial condition P(X_0 ∈ ·) = ν_∞:

Ẋ_σ = w_∞ X_σ ( F_σ(X) + F̃_σ(X) − Σ_{σ″∈S} F_{σ″}(X) X_{σ″} )
  + µ_∞(σ)(1 − X_σ) − µ_∞(S \ {σ}) X_σ,  σ ∈ S,    (2.13)

where F_σ(X) and F̃_σ(X) are linear functions in X defined by (1.4) and (1.5). Moreover, the sum of the κ-constants in F_σ(X) and F̃_σ(X) is nontrivial to the following degree:

(κ_{(2,3)|0} − κ_{0|2|3}) + κ_{0|2|3} + (κ_{(0,3)|2} − κ_{0|2|3}) ∈ (0, ∞).    (2.14)

(2◦) Recall the vector martingale defined by (2.5), and set M^{(n)}(t) = (M_σ(θ_n t); σ ∈ S) under P^{w_n}_{ν_n}. If, moreover, lim_n γ_n ν_n(1)/θ_n = 0 holds, then (γ_n/θ_n)^{1/2} M^{(n)} converges to a vector centered Gaussian martingale with quadratic covariation (∫_0^t X_σ(s)[δ_{σ,σ′} − X_{σ′}(s)] ds; σ, σ′ ∈ S).

We present the proof of Theorem 2.2 in Section 5. The existence of the limits in condition (c) is proven in Proposition 4.4. See Lemma 5.5 for the additional condition in Theorem 2.2 (2◦). To illustrate Theorem 2.2, we consider the generalized prisoner's dilemma matrix in the rest of this section. The matrix is for games among individuals of two types:

        1      0
   1 ( b − c   −c )
Π =                    (2.15)
   0 (   b      0 )

for real entries b, c, with rows and columns indexed by the types 1 and 0. (The usual prisoner's dilemma matrix requires b > c > 0.) The proof of the following corollary also appears in Section 5.

Corollary 2.3. Let conditions (a)–(d) of Theorem 2.2 be in force and Π be given by (2.15). If, moreover, the q^{(n)} are symmetric (q^{(n)}(x,y) ≡ q^{(n)}(y,x)) and

lim_{n→∞} γ_n ν_n(1) π^{(n)}{ x ∈ E_n; q^{(n),2}(x,x) ≠ q^{(∞),2} } = 0    (2.16)

for some constant q^{(∞),2}, then the differential equation for X_1 = 1 − X_0 takes a simpler form:

Ẋ_1 = w_∞ (b q^{(∞),2} − c) X_1(1 − X_1) + µ_∞(1)(1 − X_1) − µ_∞(0) X_1.    (2.17)
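When µ_∞ ≡ 0, (2.17) is a logistic equation Ẋ = sX(1 − X) with s = w_∞(b q^{(∞),2} − c), whose explicit solution is X(t) = X(0)e^{st}/(1 + X(0)(e^{st} − 1)). The following sketch cross-checks this closed form against a direct Euler integration; the parameter values are arbitrary illustrations (with q^{(∞),2} = 1/k as in the k-regular case discussed below).

```python
import math

# Sanity check of (2.17) with mu_inf = 0: logistic closed form vs. Euler.
# Parameter values are made up for illustration.

def logistic(x0, s, t):
    # exact solution of dX/dt = s X (1 - X)
    e = math.exp(s * t)
    return x0 * e / (1.0 + x0 * (e - 1.0))

def euler(x0, s, t, steps=100000):
    x, dt = x0, t / steps
    for _ in range(steps):
        x += dt * s * x * (1.0 - x)
    return x

w_inf, b, c, k = 1.0, 3.0, 0.2, 3      # illustrative parameters
s = w_inf * (b / k - c)                # q^{(inf),2} = 1/k for k-regular graphs
x_exact = logistic(0.1, s, 1.0)
x_euler = euler(0.1, s, 1.0)
```

With s > 0 (here b/k > c), the density of type 1 increases monotonically toward fixation, consistent with the cooperation condition b q^{(∞),2} > c read off from (2.17).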

Corollary 2.3 applies to large random k-regular graphs for a fixed integer k ≥ 3, with q^{(∞),2} = 1/k and γ_n/N_n → (k − 1)/[2(k − 2)] (see (6.19) and the discussion there). Additionally, (θ_n) can be chosen to be any sequence such that 1 ≪ θ_n ≪ N_n, and (w_n) can be any such that (w_n θ_n) converges in [0, ∞). See [15] and Section 6.2. (More precisely, the application needs to pass limits along subsequences, since these graphs are randomly chosen.) Assume the absence of mutation. Then in this case, one can formally recover the replicator equation (2.17) from the drift term of the approximate Wright–Fisher diffusion in [35, SI] as follows. For the density process p_1(ξ_t) under P^{w_n}, that drift term reads

w_n · (k − 2)(b − ck) / [k(k − 1)] · p_1(ξ_t)[1 − p_1(ξ_t)].    (2.18)

Note that γ_n ≈ N_n(k − 1)/[2(k − 2)] as mentioned above, and the choice in (2.11) of w_n gives w_n ≈ 2 w_∞ γ_n N_n^{−1}/θ_n. By using these approximations and multiplying the foregoing drift term by θ_n as a time change, we get the approximate drift w_∞(b/k − c) p_1(ξ_{θ_n t})[1 − p_1(ξ_{θ_n t})] of p_1(ξ_{θ_n t}). This approximation recovers (2.17). The same formal argument can be used to recover the noise coefficient in Theorem 2.2 (2◦). See also [14, Remark 4.10] for the case of diffusion approximations.
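The arithmetic of this formal recovery can be checked in exact rational arithmetic: substituting w_n θ_n ≈ 2 w_∞ γ_n ν_n(1) with γ_n ν_n(1) ≈ (k − 1)/[2(k − 2)] into θ_n times (2.18) should reproduce the coefficient w_∞(b/k − c) of (2.17). The sketch below is a hypothetical verification of exactly this substitution, nothing more.

```python
from fractions import Fraction as F

# Exact-arithmetic check of the formal recovery of (2.17) from (2.18)
# on a k-regular graph: nu_n(1) = 1/N_n, gamma_n ~ N_n(k-1)/(2(k-2)).

def recovered_drift_coeff(w_inf, b, c, k):
    # w_n theta_n ~ 2 w_inf gamma_n nu_n(1) = w_inf (k-1)/(k-2)
    wn_thetan = 2 * w_inf * F(k - 1, 2 * (k - 2))
    # theta_n x coefficient of (2.18): w_n theta_n (k-2)(b-ck)/[k(k-1)]
    return wn_thetan * (k - 2) * (b - c * k) / (k * (k - 1))

w_inf, b, c = F(1), F(5), F(1)
k = 3
lhs = recovered_drift_coeff(w_inf, b, c, k)
rhs = w_inf * (b - c * k) / k          # = w_inf (b/k - c), as in (2.17)
```

The two coefficients agree identically in (b, c, k), since (k − 1)/(k − 2) · (k − 2)/[k(k − 1)] = 1/k.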

3 Semimartingale dynamics

In this section, we define the voter model and the evolutionary game model as solutions to stochastic integral equations driven by point processes. Then we view these equations in terms of semimartingales and identify some leading order terms for the forthcoming perturbation argument. We recall the coalescing duality for the voter model briefly at the end of this section.

First, given a triplet (E, q, µ), an equivalent characterization of the corresponding voter model is given as follows. Introduce independent (F_t)-Poisson processes {Λ(x,y); x,y ∈ E} and {Λ^σ(x); σ ∈ S, x ∈ E} such that

Λ_t(x,y) with rate E[Λ_1(x,y)] = q(x,y) and
Λ^σ_t(x) with rate E[Λ^σ_1(x)] = µ(σ),  x, y ∈ E, σ ∈ S.    (3.1)

These jump processes are defined on a complete filtered probability space (Ω, F, (F_t), P). Then given an initial condition ξ_0 ∈ S^E, the (E, q, µ)-voter model can be defined as the pathwise unique S^E-valued solution of the following stochastic integral equations [19, 33]: for x ∈ E and σ ∈ S,

1_σ ∘ ξ_t(x) = 1_σ ∘ ξ_0(x) + Σ_{y∈E} ∫_0^t [1_σ ∘ ξ_{s−}(y) − 1_σ ∘ ξ_{s−}(x)] dΛ_s(x,y)
  + ∫_0^t 1_{S\{σ}} ∘ ξ_{s−}(x) dΛ^σ_s(x) − Σ_{σ′∈S\{σ}} ∫_0^t 1_σ ∘ ξ_{s−}(x) dΛ^{σ′}_s(x).    (3.2)

Hence, the type at x is replaced and changed to the type at y when Λ(x,y) jumps, and the type seen at x is σ right after Λ^σ(x) jumps.

Recall that the rates of the evolutionary game are defined by (2.2). With the choice of w̄ from (2.1), q^w(x,y,ξ) > 0 if and only if q(x,y) > 0. Hence, Girsanov's theorem for point processes [28, Section III.3] can be applied to change the intensities of the Poisson processes Λ(x,y) to q^w(x,y,ξ), such that under a probability measure P^w equivalent to P on F_t for all t ≥ 0,

Λ̂_t(x,y) ≝ Λ_t(x,y) − ∫_0^t q^w(x,y,ξ_s) ds  &  Λ̂^σ_t(x) ≝ Λ^σ_t(x) − µ(σ)t    (3.3)

are (F_t, P^w)-martingales. See [14, Section 2] for the explicit form of D when S = {0, 1}. Since all of Λ(x,y) and Λ^σ(x) do not jump simultaneously under P^w by the absolute continuity with respect to P, the product of any distinct two of Λ̂(x,y) and Λ̂^σ(x) has a zero predictable quadratic variation [28, Theorem 4.2, Proposition 4.50, and Theorem 4.52 in Chapter I].

The point processes defined above now allow for straightforward representations of the dynamics of the density processes. By (3.2),

p_σ(ξ_t) = p_σ(ξ_0) + Σ_{x,y∈E} π(x) ∫_0^t [1_σ∘ξ_{s−}(y) − 1_σ∘ξ_{s−}(x)] dΛ_s(x,y)
           + Σ_{x∈E} π(x) ∫_0^t 1_{S\{σ}}∘ξ_{s−}(x) dΛ^σ_s(x)   (3.4)
           − Σ_{σ′∈S\{σ}} Σ_{x∈E} π(x) ∫_0^t 1_σ∘ξ_{s−}(x) dΛ^{σ′}_s(x).

To obtain the limiting semimartingale for the density processes, we use the foregoing equation to derive the explicit semimartingale decompositions of the density processes.

To obtain these explicit decompositions, first, note that the dynamics of p_σ(ξ_t) under P^w relies on various kinds of frequencies and densities as follows. For all x ∈ E, ξ ∈ S^E and σ, σ_1, σ_2 ∈ S, we set

f_σ(x,ξ) = Σ_{y∈E} q(x,y) 1_σ∘ξ(y),
f_{σ1σ2}(x,ξ) = Σ_{y∈E} q(x,y) 1_{σ1}∘ξ(y) Σ_{z∈E} q(y,z) 1_{σ2}∘ξ(z),   (3.5)
f_{•σ}(x,ξ) = Σ_{y∈E} q(x,y) Σ_{z∈E} q(y,z) 1_σ∘ξ(z),
f(ξ) = Σ_{x∈E} π(x) f(x,ξ).

To minimize the use of the summation notation, we also express these functions in terms of stationary discrete-time q-Markov chains {U_ℓ; ℓ ∈ Z_+} and {U′_ℓ; ℓ ∈ Z_+} with U_0 = U′_0 such that conditioned on U_0, the two chains are independent. Additionally, let (U, U′) ∼ π ⊗ π and (V, V′) be distributed as

P(V = x, V′ = y) = ν(x,y)/ν(1),   x,y ∈ E,   (3.6)

for ν(x,y) = π(x)^2 q(x,y) and ν(1) = Σ_{x,y} ν(x,y) = Σ_x π(x)^2. (When q is symmetric, ν(1) reduces to N^{−1}.) For example, f_{σ1} f_{σ2σ3}(ξ) = E[1_{σ1}∘ξ(U′_1) 1_{σ2}∘ξ(U_1) 1_{σ3}∘ξ(U_2)]. We also set

p_{σσ′}(ξ) = E[1_σ∘ξ(V) 1_{σ′}∘ξ(V′)].   (3.7)

Second, we turn to algebraic identities that determine the leading order terms for the forthcoming perturbation arguments. For w ∈ [0, w̄], the kernel q^w defined by (2.2) can be expanded to the second order in w as follows:

q^w(x,y,ξ) = q(x,y) · (1 − wB(y,ξ))/(1 − wA(x,ξ))
           = q(x,y) + Σ_{i=1}^∞ w^i q(x,y) [A(x,ξ) − B(y,ξ)] A(x,ξ)^{i−1}
           = q(x,y) + w q(x,y)[A(x,ξ) − B(y,ξ)] + w^2 q(x,y) R^w(x,y,ξ),   (3.8)

where

A(x,ξ) = 1 − Σ_{z∈E} q(x,z) Σ_{z′∈E} q(z,z′) Π(ξ(z), ξ(z′)),
B(y,ξ) = 1 − Σ_{z∈E} q(y,z) Π(ξ(y), ξ(z)),

and R^w is uniformly bounded in w ∈ [0, w̄], x, y, ξ, and (E,q).

Lemma 3.1. For all ξ ∈ S^E and σ ∈ S,

D_σ(ξ) := Σ_{x,y∈E} π(x) [1_σ∘ξ(y) − 1_σ∘ξ(x)] q(x,y) [A(x,ξ) − B(y,ξ)]   (3.9)

        = Σ_{σ0,σ3∈S; σ0≠σ} Π(σ,σ3) f_{σ0} f_{σσ3}(ξ) − Σ_{σ2,σ3∈S; σ2≠σ} Π(σ2,σ3) f_σ f_{σ2σ3}(ξ).   (3.10)

In particular, if Π is given by (2.15), then

D_1(ξ) = b f_1 f_{•0}(ξ) − b f_{10}(ξ) − c f_1 f_0(ξ).   (3.11)

Proof. By using the reversibility of q and taking y in (3.9) as the state of U_0 in the sequence {U_ℓ} defined above, we can compute D_σ as

D_σ(ξ) = − Σ_{x,y∈E} π(x) 1_σ∘ξ(y) q(x,y) Σ_{z∈E} q(x,z) Σ_{z′∈E} q(z,z′) Π(ξ(z),ξ(z′))
         + Σ_{x,y∈E} π(x) 1_σ∘ξ(y) q(x,y) Σ_{z∈E} q(y,z) Π(ξ(y),ξ(z))
         + Σ_{x,y∈E} π(x) 1_σ∘ξ(x) q(x,y) Σ_{z∈E} q(y,z) Π(ξ(y),ξ(z))   (3.12)
         − Σ_{x,y∈E} π(x) 1_σ∘ξ(x) q(x,y) Σ_{z∈E} q(x,z) Σ_{z′∈E} q(z,z′) Π(ξ(z),ξ(z′))

       = −E[1_σ∘ξ(U_0) Π(ξ(U_2),ξ(U_3))] + E[1_σ∘ξ(U_2) Π(ξ(U_2),ξ(U_3))]   (3.13)

       = −E[1_σ∘ξ(U_0) 1_σ∘ξ(U_2) Π(ξ(U_2),ξ(U_3))]
         − E[1_σ∘ξ(U_0) 1_{S\{σ}}∘ξ(U_2) Π(ξ(U_2),ξ(U_3))]
         + E[1_σ∘ξ(U_2) Π(ξ(U_2),ξ(U_3))]

       = E[1_{S\{σ}}∘ξ(U_0) 1_σ∘ξ(U_2) Π(ξ(U_2),ξ(U_3))]
         − E[1_σ∘ξ(U_0) 1_{S\{σ}}∘ξ(U_2) Π(ξ(U_2),ξ(U_3))].

Here, we use the reversibility of q with respect to π to cancel the last two terms in (3.12) and write the first term in (3.12) as the first term in (3.13). See [13, Lemma 1 on p.8] for the case of two types.
The proof of (3.11) appears in [14, Lemma 7.1]. Now (3.10) allows for a quick proof: D_1(ξ) = (b − c) f_0 f_{11}(ξ) − c f_0 f_{10}(ξ) − b f_1 f_{01}(ξ). Then we use the identities f_0 f_{11} + f_0 f_{10} = f_0 f_1, f_0 f_{11} + f_0 f_{01} = f_0 f_{•1}, and f_0 f_{01} + f_1 f_{01} = f_{01}. This calculation will be used in the proof of Corollary 2.3. □
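The equality of the geometric form (3.9) and the frequency form (3.10) of D_σ can be checked numerically. The sketch below is our own illustration, not part of the paper: it evaluates both expressions for every σ on a symmetric nearest-neighbour kernel (so that π is uniform and q is reversible), with an arbitrary random payoff matrix Π and an arbitrary random configuration ξ.

```python
import random

random.seed(0)
n = 5
S = [0, 1, 2]
E = range(n)
q = [[0.0] * n for _ in range(n)]      # symmetric walk on a 5-cycle (reversible)
for x in E:
    q[x][(x + 1) % n] = 0.5
    q[x][(x - 1) % n] = 0.5
pi = [1.0 / n] * n                     # uniform stationary distribution
Pi = [[random.random() for _ in S] for _ in S]   # arbitrary payoff matrix
xi = [random.choice(S) for _ in E]               # arbitrary configuration

def f(a, x):       # f_a(x, xi) in (3.5)
    return sum(q[x][y] * (xi[y] == a) for y in E)

def f2(a, b, x):   # f_{ab}(x, xi) in (3.5)
    return sum(q[x][y] * (xi[y] == a) *
               sum(q[y][z] * (xi[z] == b) for z in E) for y in E)

def A(x):          # A(x, xi) defined below (3.8)
    return 1 - sum(q[x][z] * q[z][zp] * Pi[xi[z]][xi[zp]]
                   for z in E for zp in E)

def B(y):          # B(y, xi) defined below (3.8)
    return 1 - sum(q[y][z] * Pi[xi[y]][xi[z]] for z in E)

max_diff = 0.0
for s in S:
    lhs = sum(pi[x] * ((xi[y] == s) - (xi[x] == s)) * q[x][y] * (A(x) - B(y))
              for x in E for y in E)                              # (3.9)
    rhs = (sum(Pi[s][s3] * sum(pi[x] * f(s0, x) * f2(s, s3, x) for x in E)
               for s0 in S if s0 != s for s3 in S)
           - sum(Pi[s2][s3] * sum(pi[x] * f(s, x) * f2(s2, s3, x) for x in E)
                 for s2 in S if s2 != s for s3 in S))             # (3.10)
    max_diff = max(max_diff, abs(lhs - rhs))

assert max_diff < 1e-12
print("max |(3.9) - (3.10)| =", max_diff)
```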

We are ready to state the explicit semimartingale decompositions of the density processes and identify the leading order terms. From (3.4), (3.8) and the martingales in (3.3), we obtain the following decompositions extended from (2.5):

p_σ(ξ_t) = p_σ(ξ_0) + A_σ(t) + M_σ(t) = p_σ(ξ_0) + I_σ(t) + R_σ(t) + M_σ(t),   (3.14)

where

I_σ(t) = w ∫_0^t D_σ(ξ_s) ds + ∫_0^t ( µ(σ) Σ_{σ′∈S\{σ}} p_{σ′}(ξ_s) − µ(S\{σ}) p_σ(ξ_s) ) ds,   (3.15)

R_σ(t) = w^2 Σ_{x,y∈E} π(x) ∫_0^t [1_σ∘ξ_s(y) − 1_σ∘ξ_s(x)] q(x,y) R^w(x,y,ξ_s) ds,   (3.16)

M_σ(t) = Σ_{x,y∈E} π(x) ∫_0^t [1_σ∘ξ_{s−}(y) − 1_σ∘ξ_{s−}(x)] dΛ̂_s(x,y)
         + Σ_{x∈E} π(x) ∫_0^t 1_{S\{σ}}∘ξ_{s−}(x) dΛ̂^σ_s(x)   (3.17)
         − Σ_{σ′∈S\{σ}} Σ_{x∈E} π(x) ∫_0^t 1_σ∘ξ_{s−}(x) dΛ̂^{σ′}_s(x).

By (3.3), the predictable quadratic variations and covariations of M_σ and M_{σ′}, for σ ≠ σ′, are

⟨M_σ, M_σ⟩_t = ∫_0^t Σ_{x,y∈E} π(x)^2 { 1_σ∘ξ_s(y)[1 − 1_σ∘ξ_s(x)] + [1 − 1_σ∘ξ_s(y)] 1_σ∘ξ_s(x) } q^w(x,y,ξ_s) ds
               + ∫_0^t Σ_{x∈E} π(x)^2 { 1_{S\{σ}}∘ξ_s(x) µ(σ) + 1_σ∘ξ_s(x) µ(S\{σ}) } ds,   (3.18)

⟨M_σ, M_{σ′}⟩_t = − ∫_0^t Σ_{x,y∈E} π(x)^2 { 1_σ∘ξ_s(y) 1_{σ′}∘ξ_s(x) + 1_{σ′}∘ξ_s(y) 1_σ∘ξ_s(x) } q^w(x,y,ξ_s) ds
                  − ∫_0^t Σ_{x∈E} π(x)^2 { 1_{S\{σ}}∘ξ_s(x) 1_{σ′}∘ξ_s(x) µ(σ) + 1_{S\{σ′}}∘ξ_s(x) 1_σ∘ξ_s(x) µ(σ′) } ds.   (3.19)

In Section 5, the above equations play the central role in characterizing the limiting density processes.
For this study, we apply the coalescing duality between the (E,q,µ)-voter model and the coalescing rate-1 q-Markov chains {B^x; x ∈ E}, where B^x_0 = x. These chains move independently before meeting, and for any x,y ∈ E, B^x = B^y after their first meeting time M_{x,y} = inf{t ≥ 0; B^x_t = B^y_t}. In the absence of mutation, the duality is given by

E[ ∏_{i=1}^n 1_{σi}∘ξ_0(B^{x_i}_t) ] = E_{ξ_0}[ ∏_{i=1}^n 1_{σi}∘ξ_t(x_i) ]   (3.20)

for all ξ_0 ∈ S^E, σ_1, …, σ_n ∈ S, distinct x_1, …, x_n ∈ E and n ∈ N. See the proof of Proposition 6.1 for the foregoing identity and the extension to the case with mutations. Without mutation, the density process is a martingale under the voter model by (2.5), and it follows from (3.17) and (3.18) that, for any σ ≠ σ′,

E^0_ξ[p_σ(ξ_t) p_{σ′}(ξ_t)] = p_σ(ξ) p_{σ′}(ξ) − ν(1) ∫_0^t E^0_ξ[p_{σσ′}(ξ_s) + p_{σ′σ}(ξ_s)] ds.   (3.21)

For the present problem, the central application of this dual relation is the foregoing identity [11]. Let the random variables defined below (3.5) to represent frequencies and densities be independent of the coalescing Markov chains. Then the foregoing equality implies that

P(M_{U,U′} > t) = 1 − ν(1) − 2ν(1) ∫_0^t P(M_{V,V′} > s) ds,   ∀ t ≥ 0.   (3.22)

See [11, Corollary 4.2] and [3, Section 3.5.3]. This identity for meeting times has several important applications to the diffusion approximation of the voter model density processes. See [11, Sections 3 and 4] and [12].
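The identity (3.22) is exact for reversible q and can be verified numerically on a small example. The sketch below is our own illustration, not part of the paper: tails of the meeting time are computed from the sub-generator of the coalescing pair chain by eigendecomposition, the time integral is approximated by the trapezoid rule, and the 3-state lazy symmetric kernel is an arbitrary test choice.

```python
import numpy as np

n = 3
q = np.full((n, n), 0.25) + 0.25 * np.eye(n)   # symmetric lazy kernel; pi uniform
pi = np.full(n, 1.0 / n)
nu = pi[:, None] ** 2 * q                      # nu(x, y) = pi(x)^2 q(x, y)
nu1 = nu.sum()                                 # nu(1) = sum_x pi(x)^2

# sub-generator of the coalescing pair (B^x, B^y) on off-diagonal pairs; each
# coordinate jumps at rate 1 with kernel q, and jumps onto the diagonal kill
off = [(x, y) for x in range(n) for y in range(n) if x != y]
idx = {p: i for i, p in enumerate(off)}
G = np.zeros((len(off), len(off)))
for (x, y), i in idx.items():
    G[i, i] = -(2.0 - q[x, x] - q[y, y])
    for z in range(n):
        if z != x and z != y:
            G[i, idx[(z, y)]] += q[x, z]       # first coordinate jumps
            G[i, idx[(x, z)]] += q[y, z]       # second coordinate jumps

w, V = np.linalg.eigh(G)                       # G is symmetric here

def surv(t):
    """Vector of P_(x,y)(M_{x,y} > t) over the off-diagonal pairs."""
    return (V * np.exp(t * w)) @ V.T @ np.ones(len(off))

def tail_UU(t):                                # (U, U') ~ pi (x) pi
    s = surv(t)
    return sum(pi[x] * pi[y] * s[idx[(x, y)]] for x, y in off)

def tail_VV(t):                                # (V, V') ~ nu / nu(1)
    s = surv(t)
    return sum(nu[x, y] / nu1 * s[idx[(x, y)]] for x, y in off)

t_end, m = 1.5, 600
grid = np.linspace(0.0, t_end, m + 1)
vals = np.array([tail_VV(s) for s in grid])
integral = (t_end / m) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid
lhs = tail_UU(t_end)
rhs = 1.0 - nu1 - 2.0 * nu1 * integral         # right-hand side of (3.22)
assert abs(tail_UU(0.0) - (1.0 - nu1)) < 1e-12 # at t = 0: P(U != U') = 1 - nu(1)
assert abs(lhs - rhs) < 1e-4
print("identity (3.22):", lhs, rhs)
```

The t = 0 assertion is the boundary case of (3.22): P(M_{U,U′} > 0) = P(U ≠ U′) = 1 − Σ_x π(x)^2 = 1 − ν(1).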

4 Decorrelation in the ancestral lineage distributions

This section is devoted to a study of degenerate limits of meeting time distributions. Here, we consider meeting times defined on a sequence of spatial structures (E_n, q^{(n)}) as before. According

to the coalescing duality, these distributions are part of the ancestral lineage distributions of the voter model and, by approximation, of the evolutionary game. On the other hand, these meeting times encode the typical local geometry of the space, but in a rough manner. With the study of these distributions, the main results of this section (Propositions 4.2 and 4.4) extend to the choice of appropriate time scaling constants and the characterization of the limiting density processes. These properties are crucial to the forthcoming limit theorems.
Our direction in this section can be outlined in more detail as follows. Recall the auxiliary random variables defined below (3.5), which are introduced to represent frequencies and densities. Under mild mixing conditions similar to those in (2.9) with γ_n replaced by θ_n and the condition ν_n(1) → 0, the sequence P^{(n)}(M_{V,V′}/γ_n ∈ ·) is known to converge. The limiting distribution is a convex combination of the delta distribution at zero and an exponential distribution. Moreover, one can choose some s_n → ∞ such that s_n/γ_n → 0 and the following t-independent limit exists:

κ_0 := lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{V,V′} > s_n t),   ∀ t ∈ (0, ∞),   (4.1)

with κ_0 = 1. See [11, Corollary 4.2 and Proposition 4.3] for these results.
As an extension of this existence result, our first goal in this section is to introduce sufficient conditions for these sequences (s_n). Specifically, we require that the limit (4.1) exists with κ_0 ∈ (0, ∞). See Section 4.1. The following is enough for the existence and the applications in the next section.

Definition 4.1. We say that (sn) is a slow sequence if

lim_{n→∞} s_n = ∞,   lim_{n→∞} s_n/γ_n = 0,   lim_{n→∞} γ_n ν_n(1) e^{−t s_n} = 0,   ∀ t ∈ (0, ∞),   (4.2)

and at least one of the two mixing conditions holds:

lim_{n→∞} γ_n ν_n(1) e^{−g_n s_n} = 0   or   lim_{n→∞} (t^{(n)}_mix/s_n) [1 + log^+(γ_n/t^{(n)}_mix)] = 0.   (4.3)

Our second goal is to extend the existence of the limit (4.1) to the existence of analogous time-independent limits for other meeting time distributions: for integers ℓ ≥ 1 and ℓ_0, ℓ_1, ℓ_2 ≥ 0 with ℓ_0, ℓ_1, ℓ_2 all distinct, and all t ∈ (0, ∞),

κ_ℓ := lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_0,U_ℓ} > s_n t);   (4.4)

κ_{(ℓ0,ℓ1)|ℓ2} := lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ2}} > s_n t, M_{U_{ℓ1},U_{ℓ2}} > s_n t);   (4.5)

κ_{ℓ0|ℓ1|ℓ2} := lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ1}} > s_n t, M_{U_{ℓ1},U_{ℓ2}} > s_n t, M_{U_{ℓ0},U_{ℓ2}} > s_n t).   (4.6)

The extension to κ_1 is straightforward if we allow passing limits along subsequences. Indeed, it follows from the definition of {U_ℓ} and (V, V′) that

(π_min/π_max) P(M_{V,V′} ∈ Γ) ≤ P(M_{U_0,U_1} ∈ Γ) ≤ (π_max/π_min) P(M_{V,V′} ∈ Γ),   ∀ Γ ∈ B(R_+).   (4.7)

Hence, by taking a subsequence of (E_n, q^{(n)}) if necessary, (4.1) and condition (a) of Theorem 2.2 imply the existence of the limit κ_1. In Section 4.2, we prove the existence of the other limits κ_ℓ, ℓ ≥ 2. More precisely, we prove tightness results as in the case of κ_1 so that the limits may be passed along subsequences. We also prove that the limits κ_ℓ, ℓ ≥ 2, are in (0, ∞). Note that in proving these results, we do not impose convergence of local geometry as in the case of discrete tori or random regular graphs.

4.1 Mixing conditions for local meeting times

To apply mixing conditions to meeting times, first, we recall some basic properties of the spectral gap and the mixing time for the product of the continuous-time q-Markov chains. Note that by coupling the product chain with initial condition (x,y) after the two coordinates meet, we get the coalescing chain (B^x, B^y) defined before (3.21). Now, the discrete-time kernel of the product chain is a transition matrix under which each of the two coordinates is allowed to change with equal probability. Hence, the spectral gap of the product chain is given by g̃ = g/2 [30, Corollary 12.12 on p.161]. If (q̃_t) denotes the semigroup of the product chain, then

sup_{(x,y)∈E×E} ‖q̃_t((x,y), ·) − π ⊗ π‖_TV ≤ 2 d_E(t),   (4.8)

where d_E is the total variation distance defined by (2.6). Additionally, it follows from the definition of the mixing time in (2.7) that

d_E(k t_mix) ≤ e^{−k},   ∀ k ∈ N   (4.9)

[30, Section 4.5 on p.55]. By the last two displays, the analogous mixing time t̃_mix of the product chain satisfies

t̃_mix ≤ 3 t_mix.   (4.10)
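The relation g̃ = g/2 can be confirmed numerically: the discrete-time product kernel, which updates one of the two coordinates with probability 1/2 each, is P̃ = ½(P ⊗ I + I ⊗ P), whose eigenvalues are the pairwise averages (λ_i + λ_j)/2. A minimal check (our own illustration; the lazy 5-cycle kernel P is an arbitrary test choice):

```python
import numpy as np

n = 5
P = 0.5 * np.eye(n)                  # lazy walk on a 5-cycle, symmetric kernel
for x in range(n):
    P[x, (x + 1) % n] += 0.25
    P[x, (x - 1) % n] += 0.25

gap = 1.0 - np.sort(np.linalg.eigvalsh(P))[-2]           # g = 1 - lambda_2
I = np.eye(n)
P_prod = 0.5 * (np.kron(P, I) + np.kron(I, P))           # product chain kernel
gap_prod = 1.0 - np.sort(np.linalg.eigvalsh(P_prod))[-2]
assert gap > 0
assert abs(gap_prod - 0.5 * gap) < 1e-12                 # spectral gap halves
print(gap, gap_prod)
```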

We are ready to prove the first main result of Section 4. Note that under the condition sup_n N_n π^{(n)}_max < ∞ (see the discussion below (2.11)), the first condition in (4.3) implies the first one in (4.11).

Proposition 4.2. Suppose that (sn) satisfies (4.2) and at least one of the following mixing conditions:

lim_{n→∞} g_n s_n = ∞   or   lim_{n→∞} (t^{(n)}_mix/s_n) [1 + log^+(γ_n/t^{(n)}_mix)] = 0.   (4.11)

Then (4.1) holds with κ0 = 1.

Proof. Write f_n(t) = P^{(n)}(M_{U,U′} > t) and g_n(t) = P^{(n)}(M_{V,V′} > t). The required result is proved in two steps.

Step 1. We start with a preliminary result: for all t0 ∈ [0, ∞) and µ ∈ (0, ∞),

lim_{n→∞} 2γ_n ν_n(1) ∫_0^∞ e^{−µt} g_n(s_n(t + t_0)) dt = 1/µ.   (4.12)

To obtain (4.12), first, we derive a representation of the integrals in (4.12) by f_n(t). Note that (3.22) under the q^{(n)}-chain takes the following form:

f_n(s_n t) = 1 − ν_n(1) − 2ν_n(1) s_n ∫_0^t g_n(s_n s) ds,   t ≥ 0.

Hence, for any fixed 0 ≤ t_0 < ∞,

f_n(s_n(t + t_0)) − f_n(s_n t_0) = −2ν_n(1) s_n ∫_0^t g_n(s_n(s + t_0)) ds,   t ≥ 0.   (4.13)

15 Taking Laplace transforms of both sides of the last equality, we get, for µ> 0,

∫_0^∞ e^{−µt} [f_n(s_n(t + t_0)) − f_n(s_n t_0)] dt = −2ν_n(1) s_n ∫_0^∞ e^{−µt} ∫_0^t g_n(s_n(s + t_0)) ds dt
                                                   = −(2ν_n(1) s_n/µ) ∫_0^∞ e^{−µt} g_n(s_n(t + t_0)) dt,

where the last integral coincides with the integral in (4.12). Next, rewrite the last equality as

2γ_n ν_n(1) ∫_0^∞ e^{−µt} g_n(s_n(t + t_0)) dt
  = −(γ_n µ/s_n) ∫_0^∞ e^{−µt} [f_n(s_n(t + t_0)) − f_n(s_n t_0)] dt
  = −(γ_n µ/s_n) ∫_0^∞ e^{−µt} { f_n(s_n(t + t_0)) − f_n(s_n t_0) − [e^{−s_n(t+t_0)/γ_n} − e^{−s_n t_0/γ_n}] } dt   (4.14)
    + e^{−s_n t_0/γ_n}/(µ + s_n/γ_n).

The last term tends to 1/µ since s_n/γ_n → 0. To take the limit of the integral term in (4.14), we first use the first mixing condition in (4.11). In this case, a bound for exponential approximations of the distributions of M_{U,U′} [3, Proposition 3.23] gives

| (γ_n µ/s_n) ∫_0^∞ e^{−µt} { f_n(s_n(t + t_0)) − f_n(s_n t_0) − [e^{−s_n(t+t_0)/γ_n} − e^{−s_n t_0/γ_n}] } dt | ≤ 2/(g̃_n s_n) → 0   (4.15)

as n → ∞, now that g̃_n = g_n/2. Alternatively, by a different bound from [1, Theorem 1.4], the foregoing inequality holds with the bound replaced by

(C_{4.16} t̃^{(n)}_mix/s_n) [1 + log^+(γ_n/t̃^{(n)}_mix)] ≤ (3 C_{4.16} t^{(n)}_mix/s_n) [1 + log^+(γ_n/(3 t^{(n)}_mix))]   (4.16)

by (4.10) and the monotonicity of x ↦ x(1 + log(x ∨ 1)) on (0, ∞), where C_{4.16} is independent of the q^{(n)}-chains. The last term in (4.16) tends to zero by the second mixing condition in (4.11). Finally, we apply (4.15) and (4.16) to (4.14). Since the last term in (4.14) tends to 1/µ, we have proved (4.12).

Step 2. We are ready to prove the existence of the limit in (4.1) and its independence of t. First, note that since g_n is decreasing, we have

2γ_n ν_n(1) e^{−µt} g_n(s_n(t + t_0)) ≤ (1/t) · 2γ_n ν_n(1) ∫_0^t e^{−µs} g_n(s_n(s + t_0)) ds,   ∀ t, t_0 ∈ (0, ∞),

whereas the last integral is bounded by the same integral with the upper limit t of integration replaced by ∞. By (4.14) and the convergence proven for it in the preceding step, the last inequality implies that the functions t ↦ 2γ_n ν_n(1) e^{−µt} g_n(s_n t), n ≥ 1, are uniformly bounded on [a, ∞), for any a ∈ (0, ∞). Hence, by Helly's selection theorem, every subsequence of {t ↦ 2γ_n ν_n(1) g_n(s_n t)} has a further subsequence, say indexed by n_j, such that for some left-continuous function g_∞ on (0, ∞),

lim_{j→∞} 2γ_{n_j} ν_{n_j}(1) g_{n_j}(s_{n_j} t) = g_∞(t),   ∀ t ∈ (0, ∞).   (4.17)

Moreover, this convergence holds boundedly on compact subsets of (0, ∞) in t. To find g_∞, note that, as in (4.15) and (4.16), either of the mixing conditions in (4.11) implies that for fixed 0 < t_1 < t_2 < ∞,

(γ_n/s_n) [f_n(s_n t_2) − f_n(s_n t_1)] = (γ_n/s_n) [e^{−s_n t_2/γ_n} − e^{−s_n t_1/γ_n}] + o(1)
                                        = −(t_2 − t_1) + o(1),   (4.18)

where the o(1)'s refer to terms tending to 0 as n → ∞. By the foregoing equality and (4.13), we get

t_2 − t_1 = lim_{j→∞} ∫_{t_1}^{t_2} 2γ_{n_j} ν_{n_j}(1) g_{n_j}(s_{n_j} t) dt = ∫_{t_1}^{t_2} g_∞(t) dt,   ∀ 0 < t_1 < t_2 < ∞.

Since g_∞ is left-continuous, the foregoing equalities force g_∞ ≡ 1 on (0, ∞). Every subsequential limit in (4.17) is therefore the constant 1, and (4.1) follows with κ_0 = 1. □

Proposition 4.2 and the convergence in (6.19), extended to the convergence of the first moments, are enough to validate (4.1) and to reinforce it to an explicit form on large random regular graphs. In Section 6.2, we give an alternative proof of these properties of (4.1). In that case, the limit (4.1) holds only along subsequences; nevertheless, the use of subsequences is due solely to the randomness of the graphs.

4.2 Extensions We start with a basic recursion formula to relate tail distributions of the relevant meeting times

M_{U_0,U_ℓ}, ℓ ≥ 2, to the tail distribution of M_{U_0,U_1}.

Lemma 4.3. For any integer ℓ ≥ 1 and t ≥ 0, it holds that

P(M_{U_0,U_ℓ} > t) = e^{−2t} P(U_0 ≠ U_ℓ) + ∫_0^t 2e^{−2(t−s)} P(M_{U_0,U_{ℓ+1}} > s) ds
                     − ∫_0^t 2e^{−2(t−s)} Σ_{x,y∈E} π(x) q^ℓ(x,x) q(x,y) P(M_{x,y} > s) ds.   (4.19)

Proof. Since M_{x,x} ≡ 0 and (U_0, U_ℓ) is independent of the meeting times, conditioning on (U_0, U_ℓ) gives P(M_{U_0,U_ℓ} > t) = P(M_{U_0,U_ℓ} > t, U_0 ≠ U_ℓ). Conditioning on the first update time of (B^{U_0}, B^{U_ℓ}), which is an exponential variable with mean 1/2, yields

P(M_{U_0,U_ℓ} > t) = e^{−2t} P(U_0 ≠ U_ℓ) + ∫_0^t 2e^{−2(t−s)} P(U_0 ≠ U_ℓ, M_{U_0,U_{ℓ+1}} > s) ds.   (4.20)

Here, the initial condition (U_0, U_{ℓ+1}) in the last term follows from transferring the first transition of the state of (B^{U_0}, B^{U_ℓ}) to the initial condition. We also use the stationarity of {U_ℓ; ℓ ≥ 0} when that first transition is made by B^{U_0}. To rewrite the integral term in (4.20), note that

P(U_0 ≠ U_ℓ, U_0 = x, U_{ℓ+1} = y) = P(U_0 = x, U_{ℓ+1} = y) − P(U_0 = U_ℓ, U_0 = x, U_{ℓ+1} = y)
                                     = π(x) q^{ℓ+1}(x,y) − π(x) q^ℓ(x,x) q(x,y)

so that

P(U_0 ≠ U_ℓ, M_{U_0,U_{ℓ+1}} > s) = P(M_{U_0,U_{ℓ+1}} > s) − Σ_{x,y∈E} π(x) q^ℓ(x,x) q(x,y) P(M_{x,y} > s).   (4.21)

Applying (4.21) to (4.20) yields (4.19). □
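The recursion (4.19) can also be verified numerically. The sketch below is our own illustration, not part of the paper: we take ℓ = 1 on a 3-state lazy symmetric kernel, so that q(x,x) > 0 and the last term of (4.19) does not vanish, compute all meeting-time tails exactly from the coalescing pair sub-generator, and approximate the time integral by the trapezoid rule.

```python
import numpy as np

n = 3
q = np.full((n, n), 0.25) + 0.25 * np.eye(n)   # q(x,x) = 1/2 > 0; pi uniform
pi = np.full(n, 1.0 / n)

off = [(x, y) for x in range(n) for y in range(n) if x != y]
idx = {p: i for i, p in enumerate(off)}
G = np.zeros((len(off), len(off)))
for (x, y), i in idx.items():
    G[i, i] = -(2.0 - q[x, x] - q[y, y])       # total escape rate of the pair
    for z in range(n):
        if z != x and z != y:
            G[i, idx[(z, y)]] += q[x, z]       # first coordinate jumps
            G[i, idx[(x, z)]] += q[y, z]       # second coordinate jumps

w, V = np.linalg.eigh(G)

def tail(init, t):
    """P(M > t) when the pair starts from the (sub-)law `init` on off pairs."""
    s = (V * np.exp(t * w)) @ V.T @ np.ones(len(off))
    return sum(p * s[idx[xy]] for xy, p in init.items())

q2 = q @ q
init_1 = {(x, y): pi[x] * q[x, y] for x, y in off}            # law of (U_0, U_1)
init_2 = {(x, y): pi[x] * q2[x, y] for x, y in off}           # law of (U_0, U_2)
init_B = {(x, y): pi[x] * q[x, x] * q[x, y] for x, y in off}  # pi(x)q(x,x)q(x,y)

t_end, m = 1.2, 800
grid = np.linspace(0.0, t_end, m + 1)
vals = np.array([2.0 * np.exp(-2.0 * (t_end - s))
                 * (tail(init_2, s) - tail(init_B, s)) for s in grid])
integral = (t_end / m) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid
p_neq = sum(pi[x] * q[x, y] for x, y in off)                  # P(U_0 != U_1)
lhs = tail(init_1, t_end)
rhs = np.exp(-2.0 * t_end) * p_neq + integral                 # RHS of (4.19)
assert abs(lhs - rhs) < 1e-4
print("recursion (4.19):", lhs, rhs)
```

Pairs on the diagonal are omitted throughout, since M_{x,x} ≡ 0 contributes nothing to the tails in (4.19).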

We are ready to prove the existence of the limits in (4.4) and (4.5).

Proposition 4.4. For any sequence (s_n) satisfying (4.2), we have the following properties:

(1°) For any integer ℓ ≥ 2, every subsequence of (E_n, q^{(n)}) contains a further subsequence such that the limit in (4.4) exists in [κ_1, ℓκ_1] and is independent of t ∈ (0, ∞).

(2°) Without taking any subsequence, (4.4) holds for ℓ = 2 with κ_2 = κ_1.

(3°) Suppose that (2.16) holds for some constant q^{(∞),2}. Then without taking any subsequence, (4.4) holds with κ_3 = (1 + q^{(∞),2})κ_1.

(4°) For all distinct nonnegative integers ℓ_0, ℓ_1, ℓ_2, it holds that

κ_{(ℓ1,ℓ2)|ℓ0} + κ_{(ℓ0,ℓ1)|ℓ2} − κ_{ℓ0|ℓ1|ℓ2} = κ_{|ℓ2−ℓ0|},

provided that all of the limits defining these constants exist.

Proof. (1°) To lighten notation in the rest of this proof, and only in this proof, write A_ℓ = P(U_0 ≠ U_ℓ), J_ℓ for M_{U_0,U_ℓ},

B_ℓ = Σ_{x,y∈E} π(x) q^ℓ(x,x) q(x,y),

and K_ℓ for the first meeting time of the pair of coalescing Markov chains whose initial condition is independently distributed as B_ℓ^{−1} π(x) q^ℓ(x,x) q(x,y), provided that B_ℓ ≠ 0; otherwise we set K_ℓ to be an arbitrary random variable.
Fix an integer ℓ ≥ 1. If e is an independent exponential variable with mean 1, then (4.19) can be written as

P(J_ℓ > t) = A_ℓ P(e/2 > t) + P(J_{ℓ+1} + e/2 > t, e/2 ≤ t) − B_ℓ P(K_ℓ + e/2 > t, e/2 ≤ t).

After rearrangement, the foregoing equality yields

P(J_{ℓ+1} + e/2 > t) = P(J_ℓ > t) + B_ℓ P(K_ℓ + e/2 > t) + (1 − A_ℓ − B_ℓ) P(e/2 > t).

Hence, for all left-open intervals Γ ⊂ (0, ∞),

P(J_{ℓ+1} + e/2 ∈ Γ) + (A_ℓ + B_ℓ) P(e/2 ∈ Γ) = P(J_ℓ ∈ Γ) + B_ℓ P(K_ℓ + e/2 ∈ Γ) + P(e/2 ∈ Γ).   (4.22)

Since q^ℓ(x,x) ≤ 1, we have B_ℓ P(K_ℓ ∈ ·) ≤ P(J_1 ∈ ·), and so, the foregoing identity gives

P(J_{ℓ+1} + e/2 ∈ Γ) + (A_ℓ + B_ℓ) P(e/2 ∈ Γ) ≤ P(J_ℓ ∈ Γ) + P(J_1 + e/2 ∈ Γ) + P(e/2 ∈ Γ).   (4.23)

We are ready to prove the required result. For any 0 < T_0 < T_1 < ∞, repeated applications of the first and third limits in (4.2) for all of the next three inequalities give

lim sup_n 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} ∈ (s_n T_0, s_n T_1])
  ≤ lim sup_n 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} + e/2 ∈ (s_n T_0, s_n 2T_1])
  ≤ lim sup_n 2γ_n ν_n(1) P^{(n)}(J_ℓ ∈ (s_n T_0, s_n 2T_1])
    + lim sup_n 2γ_n ν_n(1) P^{(n)}(J_1 + e/2 ∈ (s_n T_0, s_n 2T_1])   (4.24)
  ≤ lim sup_n 2γ_n ν_n(1) P^{(n)}(J_ℓ ∈ (s_n T_0, s_n 2T_1])
    + lim sup_n 2γ_n ν_n(1) P^{(n)}(J_1 ∈ (s_n 2^{−1} T_0, s_n 2T_1])
  ≤ (ℓ + 1) lim sup_n 2γ_n ν_n(1) P^{(n)}(J_1 ∈ (s_n 2^{−ℓ} T_0, s_n 2^ℓ T_1]) = 0,   (4.25)

where (4.24) also uses (4.23), the last inequality follows from induction, and the equality in (4.25) follows from (4.4) with ℓ = 1. Moreover, by setting T_0 = t and T_1 = ∞, a similar argument as in the display for (4.25) shows that (4.4) with ℓ = 1 gives

lim sup_{n→∞} 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} > s_n t) ≤ (ℓ + 1)κ_1,   ∀ t ∈ (0, ∞).   (4.26)

On the other hand, since P(J_{ℓ+1} + e/2 ∈ ·) + |A_ℓ + B_ℓ − 1| P(e/2 ∈ ·) ≥ P(J_ℓ ∈ ·) by (4.22), it follows from (4.4) with ℓ = 1 and an argument similar to the one leading to (4.25) that

lim inf_{n→∞} 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} > s_n t) ≥ κ_1 > 0,   ∀ t ∈ (0, ∞).   (4.27)

Combining (4.26) and (4.27), we deduce that for fixed t_0 ∈ (0, ∞), any subsequence of the numbers 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} > s_n t_0) has a further subsequence that converges in [κ_1, (ℓ + 1)κ_1]. By (4.25), this limit extends to the existence of the limit of the corresponding subsequence of 2γ_n ν_n(1) P^{(n)}(J_{ℓ+1} > s_n t) for any t ∈ (0, ∞), and all of these limits for different t are equal. We have proved (4.4).

(2°) Note that B_1 = 0 since tr(q^{(n)}) = 0 by assumption. Then an inspection of (4.24) shows that the second limit superior on the right-hand side there can be dropped. The rest of the argument in (1°), especially (4.26) and (4.27), can be adapted accordingly to get the required identity.

(3°) The proof is done again by improving the argument for (4.26) and (4.27), but now using (4.22) with ℓ = 2. In doing so, we also use the following implication of (2.16):

lim sup_{n→∞} sup_{s≥0} γ_n ν_n(1) | B_2^{(n)} P^{(n)}(K_2 > s) − q^{(∞),2} P^{(n)}(J_1 > s) | = 0,

which follows since the distributions of K_2 and J_1 differ only by their initial conditions.

(4°) By the definitions in (4.4)–(4.6), we have

(κ_{(ℓ1,ℓ2)|ℓ0} − κ_{ℓ0|ℓ1|ℓ2}) + (κ_{(ℓ0,ℓ1)|ℓ2} − κ_{ℓ0|ℓ1|ℓ2}) + κ_{ℓ0|ℓ1|ℓ2}
  = lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ1}} > s_n t, M_{U_{ℓ1},U_{ℓ2}} ≤ s_n t, M_{U_{ℓ0},U_{ℓ2}} > s_n t)
  + lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ1}} ≤ s_n t, M_{U_{ℓ1},U_{ℓ2}} > s_n t, M_{U_{ℓ0},U_{ℓ2}} > s_n t)
  + lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ1}} > s_n t, M_{U_{ℓ1},U_{ℓ2}} > s_n t, M_{U_{ℓ0},U_{ℓ2}} > s_n t)
  = lim_{n→∞} 2γ_n ν_n(1) P^{(n)}(M_{U_{ℓ0},U_{ℓ2}} > s_n t) = κ_{|ℓ2−ℓ0|}.

Here, the next-to-last equality follows since on {M_{U_{ℓ0},U_{ℓ2}} > s_n t}, we cannot have both M_{U_{ℓ0},U_{ℓ1}} ≤ s_n t and M_{U_{ℓ1},U_{ℓ2}} ≤ s_n t by the coalescence of the Markov chains, and the last equality follows from the stationarity of the chain {U_ℓ}. The proof is complete. □

We close this subsection with another application of Lemma 4.3. It will be used in Section 5.

Proposition 4.5. Let s0 ∈ (2, ∞). For all integers ℓ ≥ 1 and all t ∈ (0, ∞), it holds that

∫_0^t P(M_{U_0,U_ℓ} > s_0 s) ds ≤ Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+1}t})^{−1} ∫_0^{2^j t} P(M_{U_0,U_1} > s_0 s) ds,   (4.28)

where ∏_{k=i}^j a_k ≡ 1 for j < i.

Proof. We prove (4.28) by an induction on ℓ ≥ 1. The inequality is obvious for all t ∈ (0, ∞) if ℓ = 1. Suppose that for some ℓ ≥ 1, (4.28) holds for all t ∈ (0, ∞). By (4.19) with t replaced by s_0 r,

2s_0 ∫_0^r e^{−2s_0(r−s)} P(M_{U_0,U_{ℓ+1}} > s_0 s) ds
  ≤ P(M_{U_0,U_ℓ} > s_0 r) + 2s_0 ∫_0^r e^{−2s_0(r−s)} P(M_{U_0,U_1} > s_0 s) ds.   (4.29)

The assumption s_0 ∈ (2, ∞) gives t ≤ 2t(1 − s_0^{−1}), and so, for h nonnegative and Borel measurable,

(1 − e^{−4t}) ∫_0^t h(s_0 s) ds ≤ ∫_0^{2t(1−s_0^{−1})} h(s_0 s)(1 − e^{−4t}) ds
  ≤ ∫_0^{2t} h(s_0 s)(1 − e^{−2s_0(2t−s)}) ds = ∫_0^{2t} ∫_0^r 2s_0 e^{−2s_0(r−s)} h(s_0 s) ds dr   (4.30)
  ≤ ∫_0^{2t} h(s_0 s) ds.

Integrating both sides of (4.29) over [0, 2t] and applying the first and last inequalities in (4.30) give

(1 − e^{−4t}) ∫_0^t P(M_{U_0,U_{ℓ+1}} > s_0 s) ds
  ≤ ∫_0^{2t} P(M_{U_0,U_ℓ} > s_0 s) ds + ∫_0^{2t} P(M_{U_0,U_1} > s_0 s) ds
  ≤ Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+2}t})^{−1} ∫_0^{2^{j+1}t} P(M_{U_0,U_1} > s_0 s) ds + ∫_0^{2t} P(M_{U_0,U_1} > s_0 s) ds
  ≤ Σ_{j=2}^{ℓ+1} ∏_{k=2}^{j−1} (1 − e^{−2^{k+1}t})^{−1} ∫_0^{2^j t} P(M_{U_0,U_1} > s_0 s) ds + ∫_0^{2t} P(M_{U_0,U_1} > s_0 s) ds,   (4.31)

where the second inequality follows from the induction hypothesis. Dividing both sides of (4.31) by (1 − e^{−4t}) proves (4.28) for ℓ replaced by ℓ + 1. Hence, (4.28) holds for all ℓ ≥ 1 by induction. □

5 Convergence of the vector density processes

We present the proofs of Theorem 2.2 and Corollary 2.3 in this section. The key result is Proposition 5.3, where we reduce the evolutionary game model to the voter model. Throughout this section, conditions (a)–(d) of Theorem 2.2 are in force.
The other settings for this section are as follows. First, we write I^{(n)}_σ(t) = I_σ(θ_n t) for the process I_σ(t) defined by (3.15), when the underlying particle system is based on (E_n, q^{(n)}). This notation extends to the other processes in the decompositions (3.14) by using the same time change. Next, recall that S denotes the type space. We will mostly consider (σ_0, σ_2, σ_3) ∈ S × S × S such that σ_0 ≠ σ_2. These triplets fit into the context of (3.10), from which we will prove the limiting replicator equation in Theorem 2.2. Additionally, given an admissible sequence (θ_n, µ_n, w_n) such that lim_n θ_n/γ_n = 0, we can choose a slow sequence (s_n) (recall Definition 4.1) such that

lim_{n→∞} s_n/θ_n = 0.   (5.1)

5.1 Asymptotic closure of equations and path regularity

We begin by showing that the leading order drift term I^{(n)}_σ in (3.14) can be asymptotically closed by the vector density process (p_σ(ξ_{θ_n t}); σ ∈ S). By (3.15), this term takes the following explicit form:

I^{(n)}_σ(t) = w_n θ_n ∫_0^t D_σ(ξ_{θ_n s}) ds
              + ∫_0^t ( θ_n µ_n(σ)[1 − p_σ(ξ_{θ_n s})] − θ_n µ_n(S\{σ}) p_σ(ξ_{θ_n s}) ) ds.   (5.2)

Specifically, in terms of the explicit form of Dσ in (3.10), our goal is to prove that

lim_{n→∞} sup_{ξ∈S^{E_n}} E^{w_n}_ξ[ | ∫_0^t ( w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{θ_n s}) − w_∞ Q_{σ0,σ2σ3}(p(ξ_{θ_n s})) ) ds | ] = 0,   (5.3)

where σ_0 ≠ σ_2, w_∞ is defined by (2.11), and Q_{σ0,σ2σ3}(X) is a polynomial in X = (X_σ)_{σ∈S} defined by

Q_{σ0,σ2σ3}(X) := 1_{σ2=σ3} (κ_{(2,3)|0} − κ_{0|2|3}) X_{σ0} X_{σ2}
                + 1_{σ0=σ3} (κ_{(0,3)|2} − κ_{0|2|3}) X_{σ0} X_{σ2}   (5.4)
                + κ_{0|2|3} X_{σ0} X_{σ2} X_{σ3}.

The choice of Q_{σ0,σ2σ3} is due to the proof of Lemma 5.4.
The proof of (5.3) begins with an inequality central to the proof of [11, Theorem 2.2], which goes back to [21] and is also central to the proof of [12, Lemma 4.2]. This inequality is presented in a general form for future reference. In what follows, we write a ∧ b for min{a, b}.

Proposition 5.1. Given a Polish space E_0 and T ∈ (0, ∞), let (X_t)_{0≤t≤T} be an E_0-valued Markov process with càdlàg paths. Let f and g be bounded Borel measurable functions defined on E_0. Suppose that x ↦ E_x[f(X_t)] is Borel measurable, and for some bounded decreasing function a(t),

sup_{x∈E_0} E_x[|f(X_t)|] ≤ a(t),   ∀ t ∈ [0, T].   (5.5)

Then for all 0 < 2δ < t ≤ T,

sup_{x∈E_0} E_x[ | ∫_0^t ( f(X_s) − g(X_s) ) ds | ]
  ≤ ∫_0^δ a(s) ds + ( 8t a(δ) ∫_0^δ a(s) ds )^{1/2} + 3δ‖g‖_∞ + t sup_{x∈E_0} | E_x[f(X_δ)] − g(x) |.   (5.6)

Proof. For s ≥ δ, define H(s) = f(X_s) − E_{X_{s−δ}}[f(X_δ)]. Then

E_x[ | ∫_0^t ( f(X_s) − g(X_s) ) ds | ]
  ≤ ∫_0^δ ( E_x[|f(X_s)|] + ‖g‖_∞ ) ds + E_x[ ( ∫_δ^t H(s) ds )^2 ]^{1/2}
    + ∫_δ^t E_x[ | E_{X_{s−δ}}[f(X_δ)] − g(X_{s−δ}) | ] ds + E_x[ | ∫_δ^t ( g(X_{s−δ}) − g(X_s) ) ds | ]
  ≤ ∫_0^δ a(s) ds + E_x[ ( ∫_δ^t H(s) ds )^2 ]^{1/2} + 3δ‖g‖_∞   (5.7)
    + t sup_{x∈E_0} | E_x[f(X_δ)] − g(x) |.

Note that in the last inequality, 2δ‖g‖_∞ is contributed by the integral ∫_δ^t ( g(X_{s−δ}) − g(X_s) ) ds. To bound the second term in (5.7), note that E_x[H(r)H(s)] = 0 for δ ≤ r < s ≤ t with s > r + δ, by the Markov property; hence

E_x[ ( ∫_δ^t H(s) ds )^2 ] = 2 ∫_δ^t ∫_r^{t∧(r+δ)} E_x[H(r)H(s)] ds dr,   (5.8)

whereas for r ≤ s ≤ r + δ,

E_x[H(r)H(s)] = E_x[f(X_r)f(X_s)] − E_x[ f(X_r) E_{X_{s−δ}}[f(X_δ)] ]
               − E_x[ E_{X_{r−δ}}[f(X_δ)] f(X_s) ] + E_x[ E_{X_{r−δ}}[f(X_δ)] E_{X_{s−δ}}[f(X_δ)] ]
              ≤ a(r)a(s − r) + a(r)a(δ) + a(δ)a(s) + a(δ)^2   (5.9)

by (5.5) and the Markov property. Since a is decreasing, integrating the terms in the last line yields

2 ∫_δ^t ∫_r^{t∧(r+δ)} ( a(r)a(s − r) + a(r)a(δ) + a(δ)a(s) + a(δ)^2 ) ds dr
  ≤ 2t a(δ) ∫_0^δ a(s) ds + 2a(δ)^2 tδ + 2t a(δ) ∫_0^δ a(s) ds + 2a(δ)^2 tδ
  ≤ 8t a(δ) ∫_0^δ a(s) ds.   (5.10)

Applying (5.8)–(5.10) to (5.7), we get (5.6). □

To prove (5.3), we apply Proposition 5.1 with the following choice:

X_t = ξ_{θ_n t} under P^{w_n}_ξ,
δ = δ_n = 2s_n/θ_n,   (5.11)
f = f_n = w_n θ_n f_{σ0} f_{σ2σ3},
g = g_n = w_∞ Q_{σ0,σ2σ3} ∘ p.

The next two results are used to identify the appropriate a(t) = an(t) that satisfies (5.5). We recall for the last time that conditions (a)–(d) of Theorem 2.2 are in force throughout Section 5.

Lemma 5.2. Let s_0 ∈ (2, ∞). Then for any t ∈ (0, ∞) and integer ℓ ≥ 1,

2γ_n ν_n(1) ∫_0^t P^{(n)}(M_{U_0,U_ℓ} > s_0 s) ds
  ≤ C_{5.12} Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+1}t})^{−1} (π^{(n)}_max/π^{(n)}_min)
    × [ 2^ℓ t + min{ 1/(g_n s_0), (t^{(n)}_mix/s_0)[1 + log^+(γ_n/t^{(n)}_mix)] } ],   (5.12)

where C_{5.12} is a universal constant.

Proof. By (4.7) and Proposition 4.5, we obtain the following inequality:

2γ_n ν_n(1) ∫_0^t P^{(n)}(M_{U_0,U_ℓ} > s_0 s) ds
  ≤ (γ_n/s_0) · Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+1}t})^{−1} (π^{(n)}_max/π^{(n)}_min) ∫_0^{2^ℓ t} 2s_0 ν_n(1) P^{(n)}(M_{V,V′} > s_0 s) ds
  = (γ_n/s_0) · Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+1}t})^{−1} (π^{(n)}_max/π^{(n)}_min)   (5.13)
    × [ P^{(n)}(M_{U,U′} > 0) − P^{(n)}(M_{U,U′} > 2^ℓ s_0 t) ]
  ≤ C_{5.14} (γ_n/s_0) · Σ_{j=1}^ℓ ∏_{k=1}^{j−1} (1 − e^{−2^{k+1}t})^{−1} (π^{(n)}_max/π^{(n)}_min)   (5.14)
    × [ 1 − e^{−2^ℓ s_0 t/γ_n} + min{ 2/(g̃_n γ_n), (t̃^{(n)}_mix/γ_n)[1 + log^+(γ_n/t̃^{(n)}_mix)] } ]

for a universal constant C_{5.14}. Here, (5.13) follows from (3.22), and (5.14) follows from the exponential approximation of M_{U,U′} as in the proof of Proposition 4.2. Recall also from that proof the reduction of the mixing properties of the product chain to those of the coordinate chains, and note that 1 − e^{−x} ≤ x for all x ≥ 0. Hence, we obtain (5.12) from (5.14). The proof is complete. □

Proposition 5.3. Fix (σ_0, σ_1, σ_2, σ_3) ∈ S × S × S × S such that σ_0 ≠ σ_2 and σ_0 ≠ σ_1.

(1°) For any w ∈ [0, w̄] and t ∈ (0, ∞), the following estimates of the evolutionary game by the voter model hold: for some constant C_{5.15} depending only on Π,

sup_{ξ∈S^E} | E^w_ξ[f_{σ0} f_{σ2σ3}(ξ_t)] − E^0_ξ[f_{σ0} f_{σ2σ3}(ξ_t)] |
  ≤ C_{5.15} w ∫_0^t P(M_{U_0,U_2} > s) ds + C_{5.15} w µ(1) ∫_0^t ∫_0^s P(M_{U_0,U_2} > r) dr ds;   (5.15)

sup_{ξ∈S^E} | E^w_ξ[f_{σ0σ1}(ξ_t)] − E^0_ξ[f_{σ0σ1}(ξ_t)] |
  ≤ C_{5.15} w ∫_0^t P(M_{U_0,U_1} > s) ds + C_{5.15} w µ(1) ∫_0^t ∫_0^s P(M_{U_0,U_1} > r) dr ds.   (5.16)

(2°) For any admissible sequence (θ_n, µ_n, w_n) and T ∈ (0, ∞), it holds that

lim_{n→∞} ∫_0^T sup_{ξ∈S^{E_n}} | E^{w_n}_ξ[w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{2s_n t})] − E^0_ξ[w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{2s_n t})] | dt = 0.   (5.17)

Proof. (1°) Recall that the generator L^w of the evolutionary game is given by (2.3), and L = L^0 denotes the generator of the voter model. By Duhamel's principle [25, (2.15) in Chapter 1],

e^{tL^w} H = e^{tL} H + ∫_0^t e^{(t−s)L^w} (L^w − L) e^{sL} H ds.   (5.18)

Here, it follows from (2.3) that

t Lw L Lw L et H = et H + e(t−s) (Lw − L)es Hds. (5.18) Z0 Here, it follows from (2.3) that

w w x,y (L − L)H1(ξ)= [q (x,y,ξ) − q(x,y)][H1(ξ ) − H1(ξ)]. (5.19) x,yX∈E sL To apply (5.18) and (5.19), we choose H = fσ0 fσ2σ3 and H1 = e fσ0 fσ2σ3 . The following bound will be proved in Proposition 6.1 (2◦):

sup_{ξ∈S^E} | e^{sL} f_{σ0} f_{σ2σ3}(ξ^x) − e^{sL} f_{σ0} f_{σ2σ3}(ξ) |
  ≤ 4 Σ_{ℓ∈{0,2,3}} P(M_{U_0,U_2} > s, B^{U_ℓ}_s = x)   (5.20)
    + 4µ(1) Σ_{ℓ∈{0,2,3}} ∫_0^s P(M_{U_0,U_2} > r, B^{U_ℓ}_s = x) dr.

To bound (L^w − L) e^{sL} H = (L^w − L) H_1 in the expansion (5.18), notice that

|q^w(x,y,ξ) − q(x,y)| ≤ C_{5.21} w q(x,y)   (5.21)

by (3.8), for some C_{5.21} depending only on Π. Putting (5.19), (5.20) and (5.21) together, we get

sup_{ξ∈S^E} | (L^w − L) e^{sL} f_{σ0} f_{σ2σ3}(ξ) |
  ≤ 4C_{5.21} w Σ_{ℓ∈{0,2,3}} Σ_{x,y∈E} q(x,y) P(M_{U_0,U_2} > s, B^{U_ℓ}_s = x)
    + 4C_{5.21} w µ(1) Σ_{ℓ∈{0,2,3}} Σ_{x,y∈E} q(x,y) ∫_0^s P(M_{U_0,U_2} > r, B^{U_ℓ}_s = x) dr
  ≤ 12C_{5.21} w P(M_{U_0,U_2} > s) + 12C_{5.21} w µ(1) ∫_0^s P(M_{U_0,U_2} > r) dr.

Since e^{(t−s)L^w} is given by a probability kernel, the required inequality in (5.15) follows upon applying the foregoing inequality to (5.18). We have proved (5.15). The proof of (5.16) is almost the same if we use Proposition 6.1 (3°) instead of Proposition 6.1 (2°). The details are omitted.

(2°) By the first limit in (2.11) and (5.15), it is enough to show that both of the following limits hold:

lim_{n→∞} w_n(2s_n) · 2γ_n ν_n(1) ∫_0^T P^{(n)}(M_{U_0,U_2} > 2s_n s) ds = 0;   (5.22)

lim_{n→∞} [w_n(2s_n) + 1] · µ_n(1)(2s_n) · 2γ_n ν_n(1) ∫_0^T P^{(n)}(M_{U_0,U_2} > 2s_n s) ds = 0.   (5.23)

(The limit (5.23) is stronger than needed but is convenient for the other proofs below.) To get (5.22), first, note that by (4.3), (5.1) and the limit superior in (2.11),

lim_{n→∞} w_n(2s_n) · [ T + min{ 1/(g_n(2s_n)), (t^{(n)}_mix/(2s_n))[1 + log^+(γ_n/t^{(n)}_mix)] } ] = 0.   (5.24)

We get (5.22) from applying (2.12) and (5.24) to (5.12) with s_0 = 2s_n. For (5.23), lim_n µ_n(1)(2s_n) = 0 by (2.10) and (5.1). The limit superior in (2.11) and (5.1) give lim sup_n w_n(2s_n) < ∞. These two properties are enough for (5.23). The proof is complete. □

To satisfy (5.5) under the setting of (5.11), we consider the sum of the right-hand side of (5.15), with t replaced by θ_n t, and sup_{ξ∈S^{E_n}} E^0_ξ[w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{θ_n t})]. Moreover, this supremum can be bounded by using (6.1) and P(M_{U_0,U_2} > θ_n t), thanks to duality and the choice σ_0 ≠ σ_2. Therefore, given T ∈ (0, ∞), we set a(t) = a_n(t) = Σ_{ℓ=1}^3 a_{n,ℓ}(t) for t ∈ [0, T], where

a_{n,ℓ}(t) := C_{5.25} · (w_n θ_n/(2γ_n ν_n(1))) · w_n θ_n · 2γ_n ν_n(1) ∫_0^T P^{(n)}(M_{U_0,U_2} > θ_n s) ds
             + C_{5.25} · (w_n θ_n/(2γ_n ν_n(1))) · 2γ_n ν_n(1) P^{(n)}(M_{U_0,U_ℓ} > θ_n t)   (5.25)
             + C_{5.25} · (w_n θ_n/(2γ_n ν_n(1))) · (w_n θ_n + 1) · µ_n(1) θ_n · 2γ_n ν_n(1) ∫_0^T P^{(n)}(M_{U_0,U_ℓ} > θ_n s) ds,

and C_{5.25} depends only on (Π, T). For any n ≥ 1, t ↦ a_n(t) is bounded and decreasing on [0, T], and

sup_{ξ∈S^{E_n}} E^{w_n}_ξ[w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{θ_n t})] ≤ a_n(t),   ∀ t ∈ [0, T].

Hence, the conditions on a_n(t) required in Proposition 5.1 hold.
For the proof of (5.3), the next step is to show that under the setting of (5.11) and the above choice of a(t) = a_n(t), the right-hand side of (5.6) vanishes as n → ∞. For the first term on the right-hand side of (5.6), proving ∫_0^{δ_n} a_n(t) dt → 0 amounts to proving ∫_0^{δ_n} a_{n,ℓ}(t) dt → 0 for all 1 ≤ ℓ ≤ 3. For the latter limits, note that δ_n → 0 by (5.1). Also, a slight modification of the proofs of (5.22)–(5.23) shows that for the right-hand side of (5.25), the first and last terms are bounded in n, and the second term satisfies

lim_{n→∞} 2γ_n ν_n(1) ∫_0^{δ_n} P^{(n)}(M_{U_0,U_ℓ} > θ_n s) ds
  = lim_{n→∞} (2s_n/θ_n) · 2γ_n ν_n(1) ∫_0^1 P^{(n)}(M_{U_0,U_ℓ} > 2s_n s) ds = 0.

For the second term in (5.6), it is enough to show that the a_n(δ_n)'s are bounded. From the above argument for the first term in (5.6), this property follows if we use the second limit in (2.11) and note that

2γ_n ν_n(1) P^{(n)}(M_{U_0,U_ℓ} > θ_n t)|_{t=δ_n} = 2γ_n ν_n(1) P^{(n)}(M_{U_0,U_ℓ} > 2s_n) → κ_ℓ as n → ∞,

where the limit follows from Proposition 4.2 and Proposition 4.4. To use these propositions precisely, passing the foregoing limit actually requires that, given any subsequence of (E_n, q^{(n)}), a suitable further subsequence is used. To lighten the exposition, we continue to suppress similar uses of subsequential limits.
For the third term in (5.6), note that δ_n → 0 by (5.1), and the g_n's in (5.11) are uniformly bounded in n. The last term in (5.6) is the major term. By (5.11) and Proposition 5.3 (2°), it remains to prove

lim_{n→∞} sup_{ξ∈S^{E_n}} | E^0_ξ[w_n θ_n f_{σ0} f_{σ2σ3}(ξ_{2s_n})] − w_∞ Q_{σ0,σ2σ3}(p(ξ)) | = 0.   (5.26)

For the next lemma, recall that the total variation distance d_E and the spectral gap g are defined at the beginning of Section 2. Also, here and in what follows, we use the shorthand notation E[Z; A] = E[Z 1_A].

Lemma 5.4. Fix $(\sigma_0,\sigma_2,\sigma_3)\in S\times S\times S$ such that $\sigma_0\ne\sigma_2$.
(1◦) Given any $0<s\le t<\infty$,
$$
\begin{aligned}
\sup_{\xi\in S^{E}}\Big|\,&\mathbb E^{0}_{\xi}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]
-\mathbf 1_{\{\sigma_2=\sigma_3\}}\mathbb P(M_{U_0,U_2}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\\
&+\mathbf 1_{\{\sigma_2=\sigma_3\}}\mathbb P(M_{U_0,U_2}>s,M_{U_2,U_3}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\\
&-\mathbf 1_{\{\sigma_0=\sigma_3\}}\mathbb P(M_{U_0,U_2}>s,M_{U_2,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\\
&+\mathbf 1_{\{\sigma_0=\sigma_3\}}\mathbb P(M_{U_0,U_2}>s,M_{U_2,U_3}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\\
&-\mathbb P(M_{U_0,U_2}>s,M_{U_2,U_3}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)p_{\sigma_3}(\xi)\,\Big|
\le C_{5.27}\sum_{\ell=1}^{3}\Gamma_\ell(s,t),
\end{aligned}
\tag{5.27}
$$
where $C_{5.27}$ is a universal constant and
$$
\begin{aligned}
\Gamma_\ell(s,t)\;\stackrel{\mathrm{def}}{=}\;&\mathbb P\big(M_{U_0,U_\ell}\in(s,t]\big)
+\min\left\{\sqrt{\frac{\pi_{\max}}{\nu(1)}}\,e^{-\mathbf g(t-s)},\ \mathbb P(M_{U_0,U_\ell}>s)\,d_E(t-s)\right\}\\
&+\big(1-e^{-2\mu(\mathbf 1)t}\big)\mathbb P(M_{U_0,U_\ell}>t)+\mu(\mathbf 1)\int_0^t\mathbb P(M_{U_0,U_\ell}>r)\,\mathrm dr.
\end{aligned}
\tag{5.28}
$$
(2◦) The limit in (5.26) holds.

Proof. (1◦) First, we consider the case that there is no mutation. Roughly speaking, the method of this proof is to express $\mathbb E^{0}_{\xi}[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)]$ in terms of coalescing Markov chains before any two coalesce. This way we can express the coalescing Markov chains as independent Markov chains and compute the asymptotics of $\mathbb E^{0}_{\xi}[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)]$ by the $\kappa$-constants defined in (4.4)–(4.6). This idea goes back to [11, Proposition 6.1].
Now, by duality and the assumption $\sigma_0\ne\sigma_2$, it holds that
$$
\begin{aligned}
\mathbb E^{0}_{\xi}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]
&=\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t)\,\mathbf 1_{\sigma_3}\circ\xi(B^{U_3}_t)\big]\\
&=\mathbf 1_{\{\sigma_2=\sigma_3\}}\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}\le t,M_{U_0,U_3}>t\big]\\
&\quad+\mathbf 1_{\{\sigma_0=\sigma_3\}}\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}>t,M_{U_0,U_3}\le t\big]\\
&\quad+\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t)\,\mathbf 1_{\sigma_3}\circ\xi(B^{U_3}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}>t,M_{U_0,U_3}>t\big]\\
&=\mathbf 1_{\{\sigma_2=\sigma_3\}}\mathrm I+\mathbf 1_{\{\sigma_0=\sigma_3\}}\mathrm{II}+\mathrm{III}.
\end{aligned}
\tag{5.29}
$$
We can further write I and II as

$$
\begin{aligned}
\mathrm I&=\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_0,U_3}>t\big]\\
&\quad-\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}>t,M_{U_0,U_3}>t\big]
=\mathrm I'-\mathrm I'',
\end{aligned}
\tag{5.30}
$$
$$
\begin{aligned}
\mathrm{II}&=\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}>t\big]\\
&\quad-\mathbb E\big[\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_t)\,\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_t);\,M_{U_0,U_2}>t,M_{U_2,U_3}>t,M_{U_0,U_3}>t\big]
=\mathrm{II}'-\mathrm I''.
\end{aligned}
\tag{5.31}
$$

We estimate I′, I′′, II′ and III below, using the property that the coalescing Markov chains move independently before meeting. First, for $0<s\le t$,
$$
\begin{aligned}
\big|\mathrm I'-\mathbb P(M_{U_0,U_2}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\big|
\le\;&\Big|\mathbb E\big[e^{(t-s)(q-1)}\mathbf 1_{\sigma_0}\circ\xi(B^{U_0}_s)\,e^{(t-s)(q-1)}\mathbf 1_{\sigma_2}\circ\xi(B^{U_2}_s)-p_{\sigma_0}(\xi)p_{\sigma_2}(\xi);\\
&\qquad\quad M_{U_0,U_2}>s,M_{U_0,U_3}>s\big]\Big|\\
&+\mathbb P\big(M_{U_0,U_2}\in(s,t]\big)+\mathbb P\big(M_{U_0,U_3}\in(s,t]\big).
\end{aligned}
\tag{5.32}
$$

On the event $\{M_{U_0,U_2}>s,M_{U_0,U_3}>s\}$, $(B^{U_0}_r)_{0\le r\le s}$ and $(B^{U_2}_r)_{0\le r\le s}$ are independent $q$-Markov chains, and each chain is stationary by the assumption on $\{U_\ell\}$. Since $p_\sigma(\xi)=\sum_x\mathbf 1_\sigma\circ\xi(x)\pi(x)$, the expectation in (5.32) can be estimated as in the proof of [11, Proposition 6.1]. We get
$$
\begin{aligned}
\big|\mathrm I'-\mathbb P(M_{U_0,U_2}>s,M_{U_0,U_3}>s)\,p_{\sigma_0}(\xi)p_{\sigma_2}(\xi)\big|
\le\;&\min\left\{2\sqrt{\frac{\pi_{\max}}{\nu(1)}}\,e^{-\mathbf g(t-s)},\ 4\,\mathbb P(M_{U_0,U_2}>s,M_{U_0,U_3}>s)\,d_E(t-s)\right\}\\
&+\mathbb P\big(M_{U_0,U_2}\in(s,t]\big)+\mathbb P\big(M_{U_0,U_3}\in(s,t]\big).
\end{aligned}
$$
Similar estimates apply to the other terms I′′, II′ and III in (5.29), (5.30) and (5.31). Applying all of these estimates to (5.29) proves (5.27) when there is no mutation. The additional terms in (5.27) arise when we include mutation and use (6.1) again.

(2◦) Recall the second limit in (2.11) and (5.23). Then by (5.27) and (5.28) with $t=2s_n$ and $s=s_n$, it suffices to show all of the following limits:

$$
\lim_{n\to\infty}2\gamma_n\nu_n(\mathbf 1)\,\mathbb P^{(n)}\big(M_{U_0,U_\ell}\in(s_n,2s_n]\big)=0,\qquad 1\le\ell\le 3;
\tag{5.33}
$$
$$
\lim_{n\to\infty}\big(1-e^{-2\mu_n(\mathbf 1)\cdot(2s_n)}\big)\cdot 2\gamma_n\nu_n(\mathbf 1)\,\mathbb P^{(n)}(M_{U_0,U_\ell}>2s_n)=0,\qquad 1\le\ell\le 3;
\tag{5.34}
$$
$$
\lim_{n\to\infty}\Gamma_{n,\ell}=0,\qquad 1\le\ell\le 3,
\tag{5.35}
$$
where $\Gamma_{n,\ell}$ is given by the minimum of the following two terms:
$$
\gamma_n\nu_n(\mathbf 1)\sqrt{\frac{\pi^{(n)}_{\max}}{\nu_n(1)}}\,e^{-\mathbf g_n s_n},
\qquad
2\gamma_n\nu_n(\mathbf 1)\,\mathbb P^{(n)}(M_{U_0,U_\ell}>s_n)\,d_{E_n}(s_n).
\tag{5.36}
$$

To see (5.33), we simply use Propositions 4.2 and 4.4. The limit in (5.34) follows from the same propositions, in addition to (2.10) and (5.1). For (5.35), we consider the following two cases. When $\Gamma_{n,\ell}$ is given by the first term in (5.36), the required limit holds by (2.12) and the first limit in (4.3). When $\Gamma_{n,\ell}$ is given by the other term in (5.36), we first use Propositions 4.2 and 4.4. Then note that the second limit in (4.3) implies $\lim_n t_{\mathrm{mix}}/s_n=0$, and so $\lim_n d_{E_n}(s_n)=0$ by (4.9). We have proved (5.35). The proof is complete. 

Up to this point, we have proved the asymptotic closure of the equation in the sense of (5.3). Note that under (5.11), the convergence of the last term in (5.6) also contributes to the asymptotic path regularity of the density processes.
The next lemma proves the asymptotic path regularity more explicitly as tightness in the convergence results of Theorem 2.2. The limit of the normalized martingale terms in Theorem 2.2 (2◦) is also proven. Here, recall that the density processes satisfy the decompositions in (2.5). From now on, $\xrightarrow[n\to\infty]{(d)}$ refers to convergence in distribution as $n\to\infty$.
Lemma 5.5. Fix $\sigma\in S$.
(1◦) The sequence of laws of $I^{(n)}_\sigma$ as continuous processes under $\mathbb P^{w_n}_{\nu_n}$ is tight.
(2◦) The sequence of laws of $\mathbf 1_{\{w_n>0\}}w_n^{-1}R^{(n)}_\sigma$ as continuous processes under $\mathbb P^{w_n}_{\nu_n}$ is tight.
(3◦) The sequence of laws of $M^{(n)}_\sigma$ as continuous processes under $\mathbb P^{w_n}_{\nu_n}$ converges to zero in distribution.

If, in addition, $\lim_n\gamma_n\nu_n(\mathbf 1)/\theta_n=0$, then the following holds.
(4◦) The sequence of laws of
$$
\left(\Big(\frac{\gamma_n}{\theta_n}\Big)^{1/2}M^{(n)}_\sigma(t),\ \frac{\gamma_n}{\theta_n}\langle M^{(n)}_\sigma,M^{(n)}_\sigma\rangle_t-\int_0^t p_\sigma(\xi_{\theta_n s})[1-p_\sigma(\xi_{\theta_n s})]\,\mathrm ds;\ t\ge 0\right)
\tag{5.37}
$$
as processes with càdlàg paths under $\mathbb P^{w_n}_{\nu_n}$ is C-tight, and the second coordinates converge to zero in distribution as processes. Moreover, for all $T\in(0,\infty)$,
$$
\sup_{n\ge 1}\sup_{t\in[0,T]}\sup_{\xi\in S^{E_n}}\frac{\gamma_n}{\theta_n}\,\mathbb E^{w_n}_{\xi}\big[M^{(n)}_\sigma(t)^2\big]<\infty.
\tag{5.38}
$$
(5◦) For any $\sigma'\in S$ with $\sigma'\ne\sigma$, the sequence
$$
\left(\frac{\gamma_n}{\theta_n}\langle M^{(n)}_\sigma,M^{(n)}_{\sigma'}\rangle_t+\int_0^t p_\sigma(\xi_{\theta_n s})\,p_{\sigma'}(\xi_{\theta_n s})\,\mathrm ds;\ t\ge 0\right)
\tag{5.39}
$$
under $\mathbb P^{w_n}_{\nu_n}$ converges to zero in distribution as processes.

Proof. (1◦) First, we show a bound for $\sup_{\xi\in S^{E_n}}\mathbb E^{w_n}_{\xi}[|I^{(n)}_\sigma(\theta)|]$ that is explicit in $\theta$. By (3.9),
$$
|D_\sigma(\xi)|\le C_{5.40}\sum_{\substack{\sigma'\in S\\ \sigma'\ne\sigma}}f_{\sigma\sigma'}(\xi)
\tag{5.40}
$$
for some constant $C_{5.40}$ depending only on $\Pi$ and $\#S$. Indeed, for $q(x,y)>0$, $\mathbf 1_\sigma\circ\xi(y)-\mathbf 1_\sigma\circ\xi(x)\ne 0$ implies that either $\xi(x)$ or $\xi(y)$ is $\sigma$ but not both. By (5.2) and (5.40),
$$
\sup_{\xi\in S^{E_n}}\mathbb E^{w_n}_{\xi}\big[|I^{(n)}_\sigma(\theta)|\big]
\le C_{5.40}\sum_{\substack{\sigma'\in S\\ \sigma'\ne\sigma}}\int_0^\theta\sup_{\xi\in S^{E_n}}\mathbb E^{w_n}_{\xi}\big[w_n\theta_n f_{\sigma\sigma'}(\xi_{\theta_n s})\big]\,\mathrm ds
+2\mu_n(\mathbf 1)\theta_n\cdot\theta.
\tag{5.41}
$$

Furthermore, we can bound the expectations on the right-hand side of (5.41) by using an analogue of the an(t) in (5.25), but now involving only the meeting time MV,V ′ . Specifically, given T ∈ (0, ∞), the following inequality holds for all t ∈ [0, T ]:

$$
\begin{aligned}
\mathbb E^{w_n}_{\xi}\big[w_n\theta_n f_{\sigma\sigma'}(\xi_{\theta_n t})\big]
\le\;& C_{5.42}\cdot\frac{w_n\theta_n}{2\gamma_n\nu_n(\mathbf 1)}\cdot w_n\theta_n\cdot 2\gamma_n\nu_n(\mathbf 1)\int_0^T\mathbb P^{(n)}(M_{V,V'}>\theta_n s)\,\mathrm ds\\
&+C_{5.42}\cdot\frac{w_n\theta_n}{2\gamma_n\nu_n(\mathbf 1)}\cdot 2\gamma_n\nu_n(\mathbf 1)\,\mathbb P^{(n)}(M_{V,V'}>\theta_n t)\\
&+C_{5.42}\cdot\frac{w_n\theta_n}{2\gamma_n\nu_n(\mathbf 1)}\cdot(w_n\theta_n+1)\cdot\mu_n(\mathbf 1)\theta_n\cdot 2\gamma_n\nu_n(\mathbf 1)\int_0^T\mathbb P^{(n)}(M_{V,V'}>\theta_n s)\,\mathrm ds,
\end{aligned}
\tag{5.42}
$$
and $C_{5.42}$ depends only on $(\Pi,T,\sup_n\pi^{(n)}_{\max}/\pi^{(n)}_{\min})$. To see (5.42), we combine (5.16) and [12, Proposition 3.2] and then use (2.12) and (4.7) to reduce probabilities for $M_{U_0,U_1}$ to probabilities for $M_{V,V'}$. Next, we show that

$$
\lim_{\theta\searrow 0}\limsup_{n\to\infty}\sup_{\xi\in S^{E_n}}\mathbb E^{w_n}_{\xi}\big[|I^{(n)}_\sigma(\theta)|\big]=0.
\tag{5.43}
$$

First, (2.10) readily gives the required limit of the last term of (5.41). We focus on the sum of integrals on the right-hand side of (5.41). For each of these integrals, note that the first and last terms in (5.42) are uniformly bounded in $n$, as in the case of (5.25). The integral over $t\in[0,\theta]$ of the second term in (5.42) satisfies

$$
\lim_{\theta\searrow 0}\limsup_{n\to\infty}\frac{w_n\theta_n}{2\gamma_n\nu_n(\mathbf 1)}\cdot\frac{2\gamma_n\nu_n(\mathbf 1)}{\theta_n}\int_0^{\theta\theta_n}\mathbb P^{(n)}(M_{V,V'}>s)\,\mathrm ds=0
\tag{5.44}
$$

by the second limit of (2.11), (3.22), and (4.18) with $s_n$ replaced by $\theta_n$ and with $t_2=\theta$ and $t_1=0$, since $(\theta_n)$ is also a slow sequence. (The use of $M_{V,V'}$ allows us to circumvent Lemma 5.2 due to the explosion of the bound in (5.12) as $t\to 0$.) We have proved (5.43).
Finally, the required tightness follows from (5.43), the strong Markov property of the particle system, and Aldous's criterion for tightness [28, Proposition VI.4.5 on p.356]. The details are similar to the proof of [11, Theorem 5.1 (1)].
(2◦) Recall the equation (3.16) for $R_\sigma$; the explicit form of $R^w$ can be read from (3.8). Then for the same reason as in (5.40), the coefficient of $R_\sigma$ satisfies
$$
\sum_{x,y\in E}\pi(x)\big|\mathbf 1_\sigma\circ\xi(y)-\mathbf 1_\sigma\circ\xi(x)\big|\,q(x,y)\,|R^{w}(x,y,\xi)|\le C_{5.45}\sum_{\sigma'\in S}f_{\sigma\sigma'}(\xi),
\tag{5.45}
$$
where $C_{5.45}$ depends only on $\Pi$ and $\#S$. Given (5.45), the argument in (1◦) applies again.
(3◦) The proof follows from a slight modification of the proof of (4◦) below, even without the additional assumption $\lim_n\gamma_n\nu_n(\mathbf 1)/\theta_n=0$.
(4◦) We start with the convergence of the second coordinate in (5.37). Define a density function $\tilde p_\sigma(\xi)$ on $S^{E_n}$ such that the stationary weights $\pi^{(n)}(x)$ in $p_\sigma(\xi)$ are replaced by $\pi^{(n)}(x)^2/\nu_n(1)$. From (3.18), the following equality holds under $\mathbb P^{w_n}_{\xi}$ for all $\xi\in S^{E_n}$:

$$
\begin{aligned}
\frac{\gamma_n}{\theta_n}\langle M^{(n)}_\sigma,M^{(n)}_\sigma\rangle_t
=\;&\gamma_n\nu_n(\mathbf 1)\int_0^t\sum_{\sigma'\in S\setminus\{\sigma\}}\big[\tilde p_{\sigma'\sigma}(\xi_{\theta_n s})+\tilde p_{\sigma\sigma'}(\xi_{\theta_n s})\big]\,\mathrm ds\\
&+\gamma_n\nu_n(\mathbf 1)\int_0^t\Big\{[1-\tilde p_\sigma(\xi_{\theta_n s})]\mu_n(\sigma)+\tilde p_\sigma(\xi_{\theta_n s})\mu_n(S\setminus\{\sigma\})\Big\}\,\mathrm ds\\
&+\frac{\gamma_n\nu_n(\mathbf 1)}{\theta_n}\cdot w_n\theta_n\int_0^t\widetilde R^{(n)}_{w_n}(\xi_{\theta_n s})\,\mathrm ds.
\end{aligned}
\tag{5.46}
$$
Here, $\widetilde R^{(n)}_{w_n}$ can be bounded in the same way as (5.40). Note that due to the use of $\widetilde R^{(n)}_{w_n}$, we only involve the first term $q(x,y)$ in the expansion (3.8) of $q_{w_n}(x,y,\xi)$.
Let us explain how the required convergence of the second coordinate in (5.37) follows from (5.46). First, since $\lim_n\gamma_n\nu_n(\mathbf 1)/\theta_n=0$ by assumption, the bound for $\widetilde R^{(n)}_{w_n}$ mentioned above and the proof of (1◦) show that the continuous process defined by the last integral of (5.46) converges to zero in distribution. Next, for all $0\le T_0<T_1<\infty$, it holds that
$$
\gamma_n\nu_n(\mathbf 1)\int_{T_0}^{T_1}\Big\{\sum_{\sigma'\in S\setminus\{\sigma\}}\tilde p_{\sigma'}(\xi_{\theta_n s})\mu_n(\sigma)+\tilde p_\sigma(\xi_{\theta_n s})\mu_n(S\setminus\{\sigma\})\Big\}\,\mathrm ds
\le\frac{\gamma_n\nu_n(\mathbf 1)}{\theta_n}\cdot\mu_n(\mathbf 1)\theta_n\cdot(T_1-T_0)\xrightarrow[n\to\infty]{}0
$$
by (2.10) and the assumption $\lim_n\gamma_n\nu_n(\mathbf 1)/\theta_n=0$. Hence, the second integral on the right-hand side of (5.46) converges to zero in distribution as a continuous process.
The sequence of laws of the first integrals on the right-hand side of (5.46) is tight for a reason similar to (5.44). Moreover, since $\kappa_0$ in (4.1) is $1$ by Proposition 4.2, a slight modification of the proof of (5.3) shows that
$$
\gamma_n\nu_n(\mathbf 1)\int_0^t\sum_{\sigma'\in S\setminus\{\sigma\}}\big[\tilde p_{\sigma'\sigma}(\xi_{\theta_n s})+\tilde p_{\sigma\sigma'}(\xi_{\theta_n s})\big]\,\mathrm ds-\int_0^t p_\sigma(\xi_{\theta_n s})[1-p_\sigma(\xi_{\theta_n s})]\,\mathrm ds\xrightarrow[n\to\infty]{(d)}0
$$
as processes. See (5.16) and [11, Proposition 6.1]. By this convergence and the explanation in the preceding paragraph, (5.46) thus shows that the sequence of laws of the second coordinates in (5.37) as processes converges to zero in distribution.
To get the C-tightness of the sequence of laws of the first coordinates in (5.37), first note that the tightness readily follows from the C-tightness of $(\gamma_n/\theta_n)\langle M^{(n)}_\sigma,M^{(n)}_\sigma\rangle$ proven above [28, Theorem VI.4.13 on p.358]. For the stronger C-tightness, note that the jump sizes of $p_\sigma(\xi_t)$ are given by the $\pi(x)$'s. Hence,
$$
(\gamma_n/\theta_n)^{1/2}\sup_{t\ge 0}|\Delta M^{(n)}_\sigma(t)|\le(\gamma_n/\theta_n)^{1/2}\pi^{(n)}_{\max},
\tag{5.47}
$$
where the right-hand side tends to zero by (2.12) and the assumption $\lim_n\gamma_n\nu_n(\mathbf 1)/\theta_n=0$. The required C-tightness now follows from [28, Proposition 3.26 in Chapter VI on p.351].
Finally, the above argument for the tightness of $(\gamma_n/\theta_n)^{1/2}M^{(n)}_\sigma$ also proves (5.38).

1/2 (n) 1/2 (n) (γn/θn) sup |∆Mσ (t)|≤ (γn/θn) πmax, (5.47) t≥0 where the right-hand side tends to zero by (2.12) and the assumption limn γnνn(1)/θn = 0. The required C-tightness now follows from [28, Proposition 3.26 in Chapter VI on p.351]. 1/2 (n) Finally, the above argument for the tightness of (γn/θn) Mσ also proves (5.38). (5◦) The proof is almost identical to the proof of (4◦) for the second coordinate of (5.37), if we start with (3.19). The details are omitted. 

5.2 The replicator equation and the Wright–Fisher fluctuations In this subsection, we complete the proof of Theorem 2.2 and give the proof of Corollary 2.3.

Completion of the proof of Theorem 2.2. By (3.10), (5.2), (5.3) and Lemma 5.5 (1◦)–(3◦), we have proved that the following vector process converges to zero in distribution:

$$
\begin{aligned}
&p_\sigma(\xi_{\theta_n t})-w_\infty\int_0^t\bigg(\sum_{\substack{\sigma_0,\sigma_3\in S\\ \sigma_0\ne\sigma}}\Pi(\sigma,\sigma_3)\,Q_{\sigma_0,\sigma\sigma_3}\circ p(\xi_{\theta_n s})-\sum_{\substack{\sigma_2,\sigma_3\in S\\ \sigma_2\ne\sigma}}\Pi(\sigma_2,\sigma_3)\,Q_{\sigma,\sigma_2\sigma_3}\circ p(\xi_{\theta_n s})\bigg)\mathrm ds\\
&\quad-\int_0^t\Big(\mu_\infty(\sigma)[1-p_\sigma(\xi_{\theta_n s})]-\mu_\infty(S\setminus\{\sigma\})\,p_\sigma(\xi_{\theta_n s})\Big)\mathrm ds,\qquad\sigma\in S,
\end{aligned}
$$
where the polynomials $Q_{\sigma_0,\sigma_2\sigma_3}$ are defined in (5.4). Hence, the sequence of laws of $p(\xi_{\theta_n t})$ is C-tight, and $p(\xi_{\theta_n t})$ converges in distribution to $X(t)$ as processes, where $X$ is the unique solution to the following system:

$$
\dot X_\sigma=w_\infty Q_\sigma(X)+\mu_\infty(\sigma)(1-X_\sigma)-\mu_\infty(S\setminus\{\sigma\})X_\sigma,\qquad\sigma\in S,
\tag{5.48}
$$
and the polynomial $Q_\sigma(X)$ in (5.48) is given by
$$
Q_\sigma(X)=\sum_{\substack{\sigma_0,\sigma_3\in S\\ \sigma_0\ne\sigma}}\Pi(\sigma,\sigma_3)\,Q_{\sigma_0,\sigma\sigma_3}(X)-\sum_{\substack{\sigma_2,\sigma_3\in S\\ \sigma_2\ne\sigma}}\Pi(\sigma_2,\sigma_3)\,Q_{\sigma,\sigma_2\sigma_3}(X).
\tag{5.49}
$$

To simplify (5.49) to the required form in (2.13), note that the constraints $\sigma_0\ne\sigma$ and $\sigma_2\ne\sigma$ in (5.49) can be removed from the definition of $Q_\sigma(X)$ by cancelling repeating terms. In doing so, we extend the definition of $Q_{\sigma_0,\sigma_2\sigma_3}(X)$ to $\sigma_0=\sigma_2$ by the same formula in (5.4), but only in this proof. We also lighten notation by writing $A=\kappa_{(2,3)|0}-\kappa_{0|2|3}$, $B=\kappa_{(0,3)|2}-\kappa_{0|2|3}$ and $C=\kappa_{0|2|3}$. Then by (5.49),
$$
\begin{aligned}
Q_\sigma(X)=\;&\sum_{\sigma_0,\sigma_3\in S}\Pi(\sigma,\sigma_3)\Big(\mathbf 1_{\{\sigma_2=\sigma_3\}}AX_{\sigma_0}X_{\sigma_2}+\mathbf 1_{\{\sigma_0=\sigma_3\}}BX_{\sigma_0}X_{\sigma_2}+CX_{\sigma_0}X_{\sigma_2}X_{\sigma_3}\Big)\Big|_{\sigma_2=\sigma}\\
&-\sum_{\sigma_2,\sigma_3\in S}\Pi(\sigma_2,\sigma_3)\Big(\mathbf 1_{\{\sigma_2=\sigma_3\}}AX_{\sigma_0}X_{\sigma_2}+\mathbf 1_{\{\sigma_0=\sigma_3\}}BX_{\sigma_0}X_{\sigma_2}+CX_{\sigma_0}X_{\sigma_2}X_{\sigma_3}\Big)\Big|_{\sigma_0=\sigma}\\
=\;&X_\sigma\sum_{\sigma_0\in S}A\Pi(\sigma,\sigma)X_{\sigma_0}+X_\sigma\sum_{\sigma_0\in S}B\Pi(\sigma,\sigma_0)X_{\sigma_0}+X_\sigma\sum_{\sigma_3\in S}C\Pi(\sigma,\sigma_3)X_{\sigma_3}\\
&-X_\sigma\sum_{\sigma_2\in S}A\Pi(\sigma_2,\sigma_2)X_{\sigma_2}-X_\sigma\sum_{\sigma_2\in S}B\Pi(\sigma_2,\sigma)X_{\sigma_2}
-X_\sigma\sum_{\sigma_2\in S}\bigg(\sum_{\sigma_3\in S}C\Pi(\sigma_2,\sigma_3)X_{\sigma_3}\bigg)X_{\sigma_2}\\
=\;&X_\sigma\bigg(A\Pi(\sigma,\sigma)+\sum_{\sigma'\in S}B\big[\Pi(\sigma,\sigma')-\Pi(\sigma',\sigma)\big]X_{\sigma'}+\sum_{\sigma'\in S}C\Pi(\sigma,\sigma')X_{\sigma'}\bigg)\\
&-X_\sigma\sum_{\sigma'\in S}\bigg(A\Pi(\sigma',\sigma')+\sum_{\sigma''\in S}C\Pi(\sigma',\sigma'')X_{\sigma''}\bigg)X_{\sigma'}.
\end{aligned}
$$

Note that we have used the property $\sum_\sigma X_\sigma=1$ in the last two equalities. The last equality yields the required form in (2.13) upon recalling (5.48) and invoking the polynomials $F_\sigma(X)$ and $\widetilde F_\sigma(X)$ in (1.4) and (1.5). Moreover, (2.14) holds by Proposition 4.2, (4.7), and Proposition 4.4 (1◦) and (4◦).
For the proof of (2◦), notice that by (1◦) and Lemma 5.5 (4◦)–(5◦), the following convergence of matrix processes holds:

$$
\bigg(\frac{\gamma_n}{\theta_n}\langle M^{(n)}_\sigma,M^{(n)}_{\sigma'}\rangle_t\bigg)_{\sigma,\sigma'\in S}\xrightarrow[n\to\infty]{(d)}\bigg(\int_0^t X_\sigma(s)\big[\delta_{\sigma,\sigma'}-X_{\sigma'}(s)\big]\,\mathrm ds\bigg)_{\sigma,\sigma'\in S}.
$$
By this convergence and (5.38), the standard martingale problem argument shows that every weakly convergent subsequence of $((\gamma_n/\theta_n)^{1/2}M^{(n)}_\sigma;\sigma\in S)$ converges to a continuous vector $L^2$-martingale $(M^{(\infty)}_\sigma;\sigma\in S)$ with quadratic variation matrix given by $(\int_0^t X_\sigma(s)[\delta_{\sigma,\sigma'}-X_{\sigma'}(s)]\,\mathrm ds;\sigma,\sigma'\in S)$. See [28, Proposition 1.12 in Chapter IX on p.525] and the proof of [36, Theorem 1.10 in Chapter XIII on pp.519–520]. Hence, the limiting vector martingale $(M^{(\infty)}_\sigma;\sigma\in S)$ is a Gaussian process with covariance matrix $(\int_0^t X_\sigma(s)[\delta_{\sigma,\sigma'}-X_{\sigma'}(s)]\,\mathrm ds;\sigma,\sigma'\in S)$ [36, Exercise (1.14) in Chapter V on p.186]. Moreover, by uniqueness in law of this Gaussian process, the convergence holds along the whole sequence of the vector martingales $(\gamma_n/\theta_n)^{1/2}M^{(n)}$. The proof is complete. 

Proof of Corollary 2.3. Write u for X1. In this case, X0 = 1 − u and the polynomial Q1(X) defined by (5.49) simplifies to

$$
Q_1=(b-c)Q_{0,11}-cQ_{0,10}-bQ_{1,01}=(Q_{0,11}-Q_{1,01})\,b-(Q_{0,11}+Q_{0,10})\,c.
$$

By (5.4), the coefficient of c is given by

$$
\begin{aligned}
-Q_{0,11}(X)-Q_{0,10}(X)=\;&-(\kappa_{(2,3)|0}-\kappa_{0|2|3})(1-u)u-\kappa_{0|2|3}(1-u)u^2\\
&-(\kappa_{(0,3)|2}-\kappa_{0|2|3})(1-u)u-\kappa_{0|2|3}(1-u)^2u,
\end{aligned}
\tag{5.50}
$$

$$
\begin{aligned}
Q_{0,11}(X)-Q_{1,01}(X)=\;&(\kappa_{(2,3)|0}-\kappa_{0|2|3})(1-u)u+\kappa_{0|2|3}(1-u)u^2\\
&-(\kappa_{(0,3)|2}-\kappa_{0|2|3})(1-u)u-\kappa_{0|2|3}\,u^2(1-u).
\end{aligned}
\tag{5.51}
$$

These two coefficients can be simplified by using the definition of $Q_{\sigma_0,\sigma_2\sigma_3}$ and Proposition 4.4 (4◦), if we follow the algebra in the proof of Lemma 3.1 that simplifies (3.10) to (3.11). For example, an argument similar to the proof of Lemma 5.4 shows that (5.26) holds with
$$
Q_{0,01}(X)=(\kappa_{(0,2)|3}-\kappa_{0|2|3})(1-u)u+\kappa_{0|2|3}(1-u)^2u,
$$
and so
$$
\begin{aligned}
Q_{0,11}(X)+Q_{0,01}(X)&=(\kappa_{(2,3)|0}-\kappa_{0|2|3})(1-u)u+\kappa_{0|2|3}(1-u)u^2\\
&\quad+(\kappa_{(0,2)|3}-\kappa_{0|2|3})(1-u)u+\kappa_{0|2|3}(1-u)^2u\\
&=(1-u)u\,\kappa_3,
\end{aligned}
$$
where the last equality follows from Proposition 4.4 (4◦). In this way, we can obtain from (5.50) and (5.51) that $Q_1(X)=[(\kappa_3-\kappa_1)b-\kappa_2c](1-u)u$. Moreover, by Proposition 4.4, we can pass to the limit along the whole sequence to get this limiting polynomial $Q_1(X)$. 
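In the two-strategy case of Corollary 2.3, the limit thus reduces to the one-dimensional logistic-type replicator dynamics $\dot u=w_\infty[(\kappa_3-\kappa_1)b-\kappa_2c](1-u)u$. The following sketch (illustrative only, not part of the proof) integrates this ODE by forward Euler to display the resulting dichotomy between fixation at density $1$ and extinction; the values used for $w_\infty$ and the $\kappa$-constants are placeholders, not quantities computed in this paper.

```python
def replicator_density(u0, coeff, dt=1e-3, steps=200_000):
    """Forward Euler for du/dt = coeff * (1 - u) * u, the limiting
    replicator ODE with coeff = w_inf * ((k3 - k1) * b - k2 * c)."""
    u = u0
    for _ in range(steps):
        u += dt * coeff * (1.0 - u) * u
    return u

# Placeholder constants: w_inf = 1, kappa_3 - kappa_1 = 1, kappa_2 = 1.
coeff_favorable = 1.0 * (1.0 * 3.0 - 1.0 * 1.0)    # b = 3, c = 1 -> coeff > 0
coeff_unfavorable = 1.0 * (1.0 * 1.0 - 1.0 * 3.0)  # b = 1, c = 3 -> coeff < 0

u_up = replicator_density(0.5, coeff_favorable)
u_down = replicator_density(0.5, coeff_unfavorable)
print(round(u_up, 3), round(u_down, 3))  # -> 1.0 0.0
```

The sign of $(\kappa_3-\kappa_1)b-\kappa_2c$ therefore acts as a benefit-to-cost threshold rule for the selected strategy.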

6 Further properties of coalescing lineage distributions
6.1 A comparison with mutations
In this section, we prove some auxiliary results for the proof of Theorem 2.2. The next proposition estimates the voter model $(\xi_t)$ under $\mathbb P^0$ by its selection mechanism, that is, by the updates from $\{\Lambda(x,y);x,y\in E\}$. The proof extends [12, Proposition 3.2]. Recall the notation in Section 3 for the coalescing Markov chains.

Proposition 6.1. (1◦) Let $f:S\times S\times S\to[-1,1]$ be a function such that $f(\sigma,\sigma,\cdot)=0$ for all $\sigma\in S$. Then for all $t\in(0,\infty)$ and $x,y,z\in E$,
$$
\begin{aligned}
\sup_{\xi\in S^{E}}\Big|\mathbb E^{0}_{\xi}\big[f\big(\xi_t(x),\xi_t(y),\xi_t(z)\big)\big]-\mathbb E\big[f\big(\xi(B^x_t),\xi(B^y_t),\xi(B^z_t)\big)\big]\Big|
\le\;&\big(1-e^{-2\mu(\mathbf 1)t}\big)\mathbb P(M_{x,y}>t)+\mathbf 1_{x\ne y}\big(1-e^{-\mu(\mathbf 1)t}\big)\mathbb P(M_{x,z}\wedge M_{y,z}>t)\\
&+2\mu(\mathbf 1)\int_0^t\mathbb P(M_{x,y}>s)\,\mathrm ds+\mathbf 1_{x\ne y}\,\mu(\mathbf 1)\int_0^t\mathbb P(M_{x,z}\wedge M_{y,z}>s)\,\mathrm ds.
\end{aligned}
\tag{6.1}
$$
(2◦) For all $(\sigma_0,\sigma_2,\sigma_3)\in S\times S\times S$ with $\sigma_0\ne\sigma_2$, $t\in(0,\infty)$ and $x\in E$,
$$
\begin{aligned}
\sup_{\xi\in S^{E}}\Big|\mathbb E^{0}_{\xi^x}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]-\mathbb E^{0}_{\xi}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]\Big|
\le\;&4\sum_{\ell\in\{0,2,3\}}\mathbb P\big(M_{U_0,U_2}>t,\,B^{U_\ell}_t=x\big)\\
&+4\mu(\mathbf 1)\sum_{\ell\in\{0,2,3\}}\int_0^t\mathbb P\big(M_{U_0,U_2}>s,\,B^{U_\ell}_t=x\big)\,\mathrm ds.
\end{aligned}
\tag{6.2}
$$
(3◦) For all $(\sigma_0,\sigma_1)\in S\times S$ with $\sigma_0\ne\sigma_1$, $t\in(0,\infty)$ and $x\in E$,
$$
\begin{aligned}
\sup_{\xi\in S^{E}}\Big|\mathbb E^{0}_{\xi^x}\big[f_{\sigma_0\sigma_1}(\xi_t)\big]-\mathbb E^{0}_{\xi}\big[f_{\sigma_0\sigma_1}(\xi_t)\big]\Big|
\le\;&4\sum_{\ell\in\{0,1\}}\mathbb P\big(M_{U_0,U_1}>t,\,B^{U_\ell}_t=x\big)\\
&+4\mu(\mathbf 1)\sum_{\ell\in\{0,1\}}\int_0^t\mathbb P\big(M_{U_0,U_1}>s,\,B^{U_\ell}_t=x\big)\,\mathrm ds.
\end{aligned}
\tag{6.3}
$$

The proof of this proposition extends the proof of [12, Proposition 3.2] and is based on the pathwise duality between the voter model and the coalescing Markov chains. The duality relation follows from time reversal of the stochastic integral equations of the voter model in Section 2. More specifically, for fixed $t\in(0,\infty)$, we define a system of coalescing $q$-Markov chains $\{B^{a,t};a\in E\}$ such that, in the absence of mutation, $B^{a,t}$ traces out the time-reversed ancestral line that determines the type at $(a,t)$ under the voter model. For example, if $s$ is the last jump time of $\{\Lambda_r(a,b);b\in E,r\in(0,t]\}$ and $\Lambda(a,c)$ causes this jump, the state of $B^{a,t}$ stays at $a$ before transitioning to $B^{a,t}_{t-s}=c$. Similarly, with the Poisson processes $\Lambda^\sigma$ driving the mutations, we can define $e(a,t)$ and $M(a,t)$ for the time and the type of the first mutation event on the trajectory of $B^{a,t}$, with $e(a,t)=\infty$ if there is no mutation. Since $e(a,t)>t$ if and only if $e(a,t)=\infty$, we have

$$
\xi_t(a)=M(a,t)\,\mathbf 1_{\{e(a,t)\le t\}}+\xi\big(B^{a,t}_t\big)\,\mathbf 1_{\{e(a,t)>t\}},\qquad\forall\,a\in E,\ \mathbb P^{0}_{\xi}\text{-a.s.}
\tag{6.4}
$$

More details can be seen by modifying the description in [12, Section 6.1]. In the absence of mutation, this relation between the duality and the stochastic integral equations is known from [33].
We also observe two identities for the probability distributions of the mutation times $e(a,t)$ when we condition on $\mathscr G\stackrel{\mathrm{def}}{=}\sigma(\Lambda(a,b);a,b\in E)$. Let $x,y\in E$. Write $0=J_0<J_1<\cdots<J_N<J_{N+1}=t$ such that $J_1,J_2,\cdots,J_N$ are the jump times of the bivariate chain $(B^{x,t},B^{y,t})$. Hence, $B^{x,t}_r=x_k$ and $B^{y,t}_r=y_k$ for all $r\in[J_k,J_{k+1})$ and $0\le k\le N$. First, fix $s\in[0,\infty)$, and note that, conditioned on $\mathscr G$, whether mutation occurs along the trajectory of $B^{x,t}$ over $[J_k,s\wedge J_{k+1})$ depends on whether $\sum_{\sigma\in S}[\Lambda^\sigma_{(t-J_k)-}(x_k)-\Lambda^\sigma_{(t-s\wedge J_{k+1})-}(x_k)]\ge 1$. Note that $s\mapsto\sum_{\sigma\in S}[\Lambda^\sigma_{t}(x_k)-\Lambda^\sigma_{(t-s)-}(x_k)]$ is a Poisson process with rate $\mu(\mathbf 1)$. Hence, summing over $\sum_{\sigma\in S}[\Lambda^\sigma_{(t-J_k)-}(x_k)-\Lambda^\sigma_{(t-s\wedge J_{k+1})-}(x_k)]$ in $k$, we deduce
$$
\mathbb P\big(e(x,t)\le s\,\big|\,\mathscr G\big)=1-e^{-\mu(\mathbf 1)s},\qquad s\ge 0.
\tag{6.5}
$$
Second, note that $x_k\ne y_k$ for all $k$ such that $J_{k+1}\le M_{x,y}\wedge s$. In this case, mutations in $[J_k,J_{k+1})$ along the trajectories of $B^{x,t}$ and $B^{y,t}$ are determined by two disjoint subsets of the Poisson processes $\{\Lambda^\sigma(a);\sigma\in S,a\in E\}$. Hence, (6.5) generalizes to the following identity:

$$
\mathbb P\big(e(x,t)\wedge e(y,t)\le s\wedge M_{x,y}\,\big|\,\mathscr G\big)=1-e^{-2\mu(\mathbf 1)(s\wedge M_{x,y})},\qquad s\ge 0.
\tag{6.6}
$$

◦  Proof of Proposition 6.1. (1 ) Set a partition {Aj }1≤j≤4 as follows:

A1 = {e(x,t) ∧ e(y,t) ≤ t < Mx,y},

A2 = {e(x,t) ∧ e(y,t) ≤ Mx,y ≤ t}, (6.7) A3 = {Mx,y < e(x,t) ∧ e(y,t) ≤ t},

A4 = {e(x,t) ∧ e(y,t) >t}.

Then consider the corresponding differences for the left-hand side of (6.1):
$$
\Delta_j=\mathbb E^{0}_{\xi}\big[f\big(\xi_t(x),\xi_t(y),\xi_t(z)\big);A_j\big]-\mathbb E\big[f\big(\xi(B^x_t),\xi(B^y_t),\xi(B^z_t)\big);A_j\big],\qquad 1\le j\le 4.
\tag{6.8}
$$
Let $e_1$ and $e_2$ be i.i.d. exponential random variables with mean $1/\mu(\mathbf 1)$. It follows from (6.6) and the independence between selection and mutation that
$$
|\Delta_1|\le\mathbb P(e_1\wedge e_2\le t)\,\mathbb P(M_{x,y}>t)=\big(1-e^{-2\mu(\mathbf 1)t}\big)\mathbb P(M_{x,y}>t),
\tag{6.9}
$$
$$
|\Delta_2|\le\int_0^t\mathbb P(t\ge M_{x,y}>s)\,\mathbb P(e_1\wedge e_2\in\mathrm ds)\le 2\mu(\mathbf 1)\int_0^t\mathbb P(M_{x,y}>s)\,\mathrm ds.
\tag{6.10}
$$
On $A_3$, $B^{x,t}_t=B^{y,t}_t$ by coalescence, and hence $\xi_t(x)=\xi_t(y)$ by (6.4). It follows from the assumption on $f$ that both of the expectations defining $\Delta_3$ are zero. To bound $\Delta_4$, fix $z\in E$ and partition $A_4$ into the following four sets:

A41 = {e(x,t) ∧ e(y,t) > t,e(z,t) ≤ t < Mx,z ∧ My,z},

A42 = {e(x,t) ∧ e(y,t) > t,e(z,t) ≤ Mx,z ∧ My,z ≤ t},

A43 = {e(x,t) ∧ e(y,t) > t,Mx,z ∧ My,z < e(z,t) ≤ t},

A44 = {e(x,t) ∧ e(y,t) > t,e(z,t) >t}.

Then define ∆4k for 1 ≤ k ≤ 4 as in (6.8) by replacing Aj with A4k. By (6.5) and similar arguments for (6.9) and (6.10), we get

$$
\begin{aligned}
|\Delta_{41}|&\le\mathbf 1_{x\ne y}\big(1-e^{-\mu(\mathbf 1)t}\big)\mathbb P(M_{x,z}\wedge M_{y,z}>t),\\
|\Delta_{42}|&\le\mathbf 1_{x\ne y}\,\mu(\mathbf 1)\int_0^t\mathbb P(M_{x,z}\wedge M_{y,z}>s)\,\mathrm ds,
\end{aligned}
\tag{6.11}
$$
where the use of the indicator function $\mathbf 1_{x\ne y}$ follows from the assumption on $f$. The term $\Delta_{43}$ is zero because $A_{43}=\emptyset$. Indeed, on $\{M_{x,z}\wedge M_{y,z}<e(z,t)\le t\}$, either $e(x,t)\le t$ or $e(y,t)\le t$, since either $e(x,t)=e(z,t)$ (if $M_{x,z}\wedge M_{y,z}=M_{x,z}$) or $e(y,t)=e(z,t)$ (if $M_{x,z}\wedge M_{y,z}=M_{y,z}$). Hence, $\{M_{x,z}\wedge M_{y,z}<e(z,t)\le t\}$ does not intersect $\{e(x,t)\wedge e(y,t)>t\}$. Finally, $\Delta_{44}=0$ by (6.4), since the random variables under the two expectations are actually equal.
In summary, we have proved that $\Delta_3=\Delta_{43}=\Delta_{44}=0$. In addition, $\Delta_1$, $\Delta_2$, $\Delta_{41}$ and $\Delta_{42}$ satisfy (6.9), (6.10) and (6.11). We have proved (6.1).
(2◦) For the left-hand side of (6.2), we use (6.4) to write

$$
\begin{aligned}
&\mathbb E^{0}_{\xi^x}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]-\mathbb E^{0}_{\xi}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]\\
&\quad=\mathbb E\Bigg[\prod_{j\in\{0,2,3\}}\Big(\mathbf 1_{\sigma_j}\big(M(U_j,t)\big)\mathbf 1_{\{e(U_j,t)\le t\}}+\mathbf 1_{\sigma_j}\circ\xi^x\big(B^{U_j,t}_t\big)\mathbf 1_{\{e(U_j,t)>t\}}\Big)\\
&\qquad\qquad-\prod_{j\in\{0,2,3\}}\Big(\mathbf 1_{\sigma_j}\big(M(U_j,t)\big)\mathbf 1_{\{e(U_j,t)\le t\}}+\mathbf 1_{\sigma_j}\circ\xi\big(B^{U_j,t}_t\big)\mathbf 1_{\{e(U_j,t)>t\}}\Big)\Bigg].
\end{aligned}
\tag{6.12}
$$
Mutation removes the dependence on the initial condition. Hence, to get a nonzero value for the difference inside the foregoing expectation, we cannot have $e(U_j,t)\le t$ for all $j\in\{0,2,3\}$. In this case, at least one of the sums $\mathbf 1_{\sigma_j}\circ\xi(B^{U_j,t}_t)+\mathbf 1_{\sigma_j}\circ\xi^x(B^{U_j,t}_t)$, $j\in\{0,2,3\}$, has to be nonzero. We must have $B^{U_j,t}_t=x$ for some $j\in\{0,2,3\}$. By bounding the indicator functions associated with $\sigma_3$ by $1$, we obtain from (6.12) that
$$
\Big|\mathbb E^{0}_{\xi^x}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]-\mathbb E^{0}_{\xi}\big[f_{\sigma_0}f_{\sigma_2\sigma_3}(\xi_t)\big]\Big|
\le\sum_{\ell\in\{0,2,3\}}\big(\mathbb E^{0}_{\xi^x}+\mathbb E^{0}_{\xi}\big)\Bigg[\prod_{j\in\{0,2\}}\mathbf 1_{\sigma_j}\circ\xi_t(U_j);\,B^{U_\ell,t}_t=x\Bigg],\qquad\forall\,x\in E.
\tag{6.13}
$$
The method in (1◦) now enters to remove mutations in each of the two expectations in the $\ell$-th summand of (6.13). For $\eta\in S^E$, we consider
$$
\mathbb E^{0}_{\eta}\Bigg[\prod_{j\in\{0,2\}}\mathbf 1_{\sigma_j}\circ\xi_t(U_j);\,B^{U_\ell,t}_t=x\Bigg]
-\mathbb E\Bigg[\prod_{j\in\{0,2\}}\mathbf 1_{\sigma_j}\circ\eta\big(B^{U_j,t}_t\big);\,B^{U_\ell,t}_t=x\Bigg]
\tag{6.14}
$$
and use only the partition in (6.7) with $x=U_0$ and $y=U_2$. In this case, on $A_4$, the two products of the indicator functions in (6.14) are equal. Since $\sigma_0\ne\sigma_2$ ensures that the second expectation in (6.14) can be bounded by $\mathbb P(M_{U_0,U_2}>t,B^{U_\ell}_t=x)$, (6.14) and a slight extension of (6.9) and (6.10) give
$$
\begin{aligned}
\mathbb E^{0}_{\eta}\Bigg[\prod_{j\in\{0,2\}}\mathbf 1_{\sigma_j}\circ\xi_t(U_j);\,B^{U_\ell,t}_t=x\Bigg]
\le\;&\mathbb P\big(M_{U_0,U_2}>t,\,B^{U_\ell}_t=x\big)+\big(1-e^{-2\mu(\mathbf 1)t}\big)\mathbb P\big(M_{U_0,U_2}>t,\,B^{U_\ell}_t=x\big)\\
&+2\mu(\mathbf 1)\int_0^t\mathbb P\big(M_{U_0,U_2}>s,\,B^{U_\ell}_t=x\big)\,\mathrm ds,\qquad\forall\,\eta\in S^{E},\ x\in E.
\end{aligned}
\tag{6.15}
$$
The required inequality (6.2) now follows from (6.13) and (6.15).
(3◦) The proof of (6.3) is almost the same as the proof of (6.2) and is omitted. 
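The pathwise duality (6.4) is easy to check numerically in a toy setting. The sketch below (illustrative only) uses a discrete-time synchronous voter model on a cycle, in which every site simultaneously adopts the opinion of a uniformly chosen neighbor; for this chain the one-site marginal is exactly a random-walk average of the initial configuration, a mutation-free analogue of (6.4). The cycle length, time horizon, and trial count are arbitrary choices.

```python
import random

def step(config, nbrs, rng):
    # Synchronous voter update: every site adopts a uniform neighbor's opinion.
    return [config[rng.choice(nbrs[x])] for x in range(len(config))]

def mc_marginal(config0, nbrs, x, t, trials, seed=0):
    # Monte Carlo estimate of P(xi_t(x) = 1) for a fixed initial configuration.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        c = config0[:]
        for _ in range(t):
            c = step(c, nbrs, rng)
        hits += c[x]
    return hits / trials

def dual_marginal(config0, nbrs, x, t):
    # One-walk duality: P(xi_t(x) = 1) = E[xi_0(B_t^x)] for a random walk B^x.
    n = len(config0)
    dist = [0.0] * n
    dist[x] = 1.0
    for _ in range(t):
        new = [0.0] * n
        for y, p in enumerate(dist):
            for z in nbrs[y]:
                new[z] += p / len(nbrs[y])
        dist = new
    return sum(p * config0[y] for y, p in enumerate(dist))

n = 10
nbrs = [[(x - 1) % n, (x + 1) % n] for x in range(n)]  # cycle Z/10Z
xi0 = [1 if x < n // 2 else 0 for x in range(n)]       # half ones, half zeros
exact = dual_marginal(xi0, nbrs, 0, 4)
approx = mc_marginal(xi0, nbrs, 0, 4, 5000)
print(round(exact, 3), round(approx, 3))
```

The Monte Carlo estimate agrees with the dual random-walk computation up to sampling error, which is what (6.4) predicts when mutation is switched off.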

6.2 Full decorrelation on large random regular graphs
In this subsection, we give a different proof of the explicit form of (4.1) by using the graphs' local convergence. Throughout the rest of this subsection, we use the graph-theoretic terminology from [5, 16].
We start with the definition of the random regular graphs. Fix an integer $k\ge 3$. Choose a sequence $\{N_n\}$ of positive integers such that $N_n\to\infty$ and $k$-regular graphs (without loops and multiple edges) on $N_n$ vertices exist. The existence of $\{N_n\}$ follows from the Erdős–Gallai necessary and sufficient condition. Then the random $k$-regular graph on $N_n$ vertices is the graph $G_n$ chosen uniformly from the set of $k$-regular graphs with $N_n$ vertices. We assume that the randomness defining the graphs is collectively subject to the probability $P$ and the expectation $E$.
For applications to the evolutionary dynamics, we need two properties of random walks on the random graphs. See [15, Section 3] and the references there for more details. First, the random walks are asymptotically irreducible in the following sense:

P(Gn has only one connected component) → 1 as n →∞. (6.16)

This property follows since the P-probability that Gn has a nonzero spectral gap tends to one [26, 7]. See [16, Lemma 1.7 (d) on pp.6–7] for connections between graph spectral gaps and numbers of

connected components. Second, $G_n$ for large $n$ is locally like the infinite $k$-regular tree $G_\infty$ in the following sense. Write $q^{(n),\ell}(x,y)$ for the $\ell$-step transition probability of random walk on $G_n$. For any $n,r\in\mathbb N$, write $\mathcal T_n(r)$ for the set of vertices $x$ in $G_n$ such that the subgraph induced by the vertices $y$ with $d(x,y)<r$ does not have a cycle, where $d$ denotes the graph distance on $G_n$. Then a standard result for the random graphs $\{G_n\}$ [6, Section 2.4] gives
$$
\frac{N_n-\#\mathcal T_n(\ell)}{N_n}\xrightarrow[n\to\infty]{P}0,\qquad\forall\,\ell\ge 1,
\tag{6.17}
$$
where $\xrightarrow[n\to\infty]{P}$ refers to convergence in $P$-probability. Note that $\pi^{(n)}$ is uniform on $G_n$. Consequently, if $q^{(\infty),\ell}$ stands for the $\ell$-step transition probability of random walk on $G_\infty$, then (6.17) implies that for all $L\in\mathbb N$,
$$
\pi^{(n)}\Big(x\in\mathcal T_n(2L);\ q^{(n),\ell}(x,y)=q^{(\infty),\ell}(x,y),\ \forall\,y,\ \forall\,\ell\in\{1,2,\cdots,L\}\Big)\xrightarrow[n\to\infty]{P}1.
\tag{6.18}
$$
Below we write $\mathbb P^{(n)}$ and $\mathbb E^{(n)}$ for the random walk probability and expectation under $q^{(n)}=q^{(n),1}$ for $n\in\mathbb N\cup\{\infty\}$. Notations for meeting times, random walks, and related objects associated with $q^{(n)}$ extend to $G_\infty$. Recall the random variables $U,U',V,V'$ defined at the beginning of Section 3.
Now we recall some main results for the limiting distributions of $M_{U,U'}$ and $M_{V,V'}$ on the random regular graphs $\{G_n\}$. First, every (nonrandom) subsequence $\{G_{n_i}\}$ contains a further subsequence $\{G_{n_{i_j}}\}$ such that the following properties hold $P$-a.s.: $G_{n_{i_j}}$ are connected graphs for all (randomly) large $j$ and

$$
\mathscr L\left(\frac{M_{U,U'}}{N_{n_{i_j}}}\right)\xrightarrow[j\to\infty]{(d)}\mathscr L\left(\frac12\cdot\frac{k-1}{k-2}\,\mathbf e\right)
\tag{6.19}
$$
[15, Remark 3.1]. Here and in what follows, a meeting time scaled by a constant indexed by $n$ is under $\mathbb P^{(n)}$, $\mathbf e$ is exponential with mean $1$, and $\mathscr L(X)$ denotes the distribution of $X$. Moreover, the convergence (6.19) extends to the convergence of all moments [15, Theorem 3.3]. By (6.19) and (3.22),
$$
\mathscr L\left(\frac{M_{V,V'}}{N_{n_{i_j}}}\right)\xrightarrow[j\to\infty]{(d)}\frac{1}{k-1}\,\delta_0+\frac{k-2}{k-1}\,\mathscr L\left(\frac12\cdot\frac{k-1}{k-2}\,\mathbf e\right)\quad P\text{-a.s.}
\tag{6.20}
$$
See [11, Section 4] for details. We work with $\{G_{n_{i_j}}\}$ and write this subsequence as $\{G_n\}$ in the rest of this section.
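The locally tree-like property behind (6.17) is easy to observe on small samples. The sketch below (illustrative only) draws a simple $k$-regular graph via the configuration model with rejection until the pairing is simple, which reproduces the uniform law on simple $k$-regular graphs, and then measures the fraction $\#\mathcal T_n(r)/N_n$ of vertices whose ball of radius $r-1$ induces no cycle. The graph size and radius are small placeholder values.

```python
import random
from collections import deque

def random_regular_graph(n, k, rng):
    """Configuration model, rejected until the random pairing is simple."""
    assert n * k % 2 == 0
    while True:
        stubs = [v for v in range(n) for _ in range(k)]
        rng.shuffle(stubs)
        edges, ok = set(), True
        for i in range(0, len(stubs), 2):
            u, v = stubs[i], stubs[i + 1]
            if u == v or (u, v) in edges or (v, u) in edges:
                ok = False
                break
            edges.add((u, v))
        if ok:
            adj = [[] for _ in range(n)]
            for u, v in edges:
                adj[u].append(v)
                adj[v].append(u)
            return adj

def tree_like_fraction(adj, r):
    """Fraction of vertices x whose ball {y : d(x,y) < r} induces no cycle."""
    n, count = len(adj), 0
    for x in range(n):
        dist, q = {x: 0}, deque([x])
        while q:  # BFS up to depth r - 1
            y = q.popleft()
            if dist[y] >= r - 1:
                continue
            for z in adj[y]:
                if z not in dist:
                    dist[z] = dist[y] + 1
                    q.append(z)
        ball = set(dist)
        m = sum(1 for y in ball for z in adj[y] if z in ball) // 2
        if m == len(ball) - 1:  # connected induced subgraph is a tree
            count += 1
    return count / n

rng = random.Random(1)
adj = random_regular_graph(60, 3, rng)
print(tree_like_fraction(adj, 2), tree_like_fraction(adj, 3))
```

As $r$ grows with $n$ fixed, the fraction decreases, in line with $r\mapsto\#\mathcal T_n(r)$ being decreasing; (6.17) asserts that for fixed $r$ the fraction tends to $1$ in $P$-probability as $n\to\infty$.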

Proposition 6.2. By taking a subsequence of {Gn} if necessary,

$$
\mathscr L\left(\frac{M_{V,V'}}{s_n}\right)\xrightarrow[n\to\infty]{(d)}\frac{1}{k-1}\,\delta_0+\frac{k-2}{k-1}\,\delta_\infty\quad P\text{-a.s.}
\tag{6.21}
$$
for any sequence $(s_n)$ such that
$$
\lim_{n\to\infty}s_n=\infty\qquad\text{and}\qquad\lim_{n\to\infty}\frac{s_n}{N_n}=0.
\tag{6.22}
$$

Proof. We fix two adjacent vertices a and b in G∞ and give the proof in a few steps.

Step 1. We claim that, by taking a subsequence of $\{G_n\}$ if necessary, it is possible to choose an auxiliary sequence $(s'_n)$ of constants such that
$$
\lim_{n\to\infty}s'_n=\infty,\qquad\lim_{n\to\infty}\frac{s'_n}{N_n}=0,\qquad\text{and}\qquad\lim_{n\to\infty}\frac{\#\mathcal T_n(s'_n)}{N_n}=1\quad P\text{-a.s.}
\tag{6.23}
$$
To find this sequence, first note that by (6.17), we can choose a sequence $(\ell_n)$ such that $\ell_n\to\infty$ and $\#\mathcal T_n(\ell_n)/N_n\to 1$ in $P$-probability. Fix a sequence $(s'_n)$ such that $s'_n\le\ell_n$, $s'_n/N_n\to 0$ and $s'_n\to\infty$. Since $r\mapsto\#\mathcal T_n(r)$ is decreasing, $\#\mathcal T_n(s'_n)/N_n\to 1$ in $P$-probability as well. We have proved the existence of $(s'_n)$ satisfying (6.23) such that the third limit holds in the sense of convergence in $P$-probability. Hence, by using a subsequence of $\{G_n\}$ if necessary, (6.23) holds.
Step 2. With respect to the sequence $(s'_n)$ chosen in Step 1, let $(s_n)$ be any slower sequence such that
$$
\lim_{n\to\infty}s_n=\infty\qquad\text{and}\qquad\lim_{n\to\infty}\frac{s'_n}{s_n}=\infty.
\tag{6.24}
$$

By the second limits in (6.23) and (6.24), sn/Nn → 0 so (6.22) holds. In the next paragraph of this step, we show that

$$
\mathbb P^{(n)}\big(M_{V,V'}/s_n\in\cdot\,\big)\xrightarrow[n\to\infty]{(d)}\mathbb P^{(\infty)}(M_{a,b}<\infty)\,\delta_0+\mathbb P^{(\infty)}(M_{a,b}=\infty)\,\delta_\infty\quad P\text{-a.s.}
\tag{6.25}
$$
Step 3 will show that the limits in (6.21) and (6.25) coincide. Additionally, we will include in Step 4 the other sequences $(s_n)$ that satisfy (6.22) but fail to satisfy the second limit in (6.24).
Write $J_m$ for the $m$-th jump time of $(B^V,B^{V'})$ on $G_n$. For all $t\in[0,J_m]$, both $d(V,B^V_t)$ and $d(V',B^{V'}_t)$ are bounded by $m$. Hence, on $\{(V,V')\in\mathcal T_n(s'_n)\times\mathcal T_n(s'_n)\}$, the law of $\{(B^V_t,B^{V'}_t);0\le t\le J_{\lfloor s'_n\rfloor/2}\}$ under $\mathbb P^{(n)}$ equals the law of $\{(B^a_t,B^b_t);0\le t\le J_{\lfloor s'_n\rfloor/2}\}$ under $\mathbb P^{(\infty)}$. It follows that

$$
\begin{aligned}
&\mathbb E^{(n)}\big[e^{-\lambda M_{V,V'}/s_n};\,M_{V,V'}\le J_{\lfloor s'_n\rfloor/2}\big]-\mathbb E^{(\infty)}\big[e^{-\lambda M_{a,b}/s_n};\,M_{a,b}\le J_{\lfloor s'_n\rfloor/2}\big]\\
&\quad=\sum_{(x,y)\in[\mathcal T_n(s'_n)\times\mathcal T_n(s'_n)]^{\complement}}\mathbb P\big((V,V')=(x,y)\big)
\Big(\mathbb E^{(n)}\big[e^{-\lambda M_{x,y}/s_n};\,M_{x,y}\le J_{\lfloor s'_n\rfloor/2}\big]-\mathbb E^{(\infty)}\big[e^{-\lambda M_{a,b}/s_n};\,M_{a,b}\le J_{\lfloor s'_n\rfloor/2}\big]\Big),
\end{aligned}
$$
and so
$$
\Big|\mathbb E^{(n)}\big[e^{-\lambda M_{V,V'}/s_n}\big]-\mathbb E^{(\infty)}\big[e^{-\lambda M_{a,b}/s_n}\big]\Big|
\le 2\,\frac{N_n-\#\mathcal T_n(s'_n)}{N_n}+2\,\mathbb E\big[e^{-\lambda J_{\lfloor s'_n\rfloor/2}/s_n}\big].
\tag{6.26}
$$

On the right-hand side of (6.26), the first term converges to zero $P$-a.s. by the third limit in (6.23), and the choice of $(s_n)$ from (6.24) gives $\mathbb E[e^{-\lambda J_{\lfloor s'_n\rfloor/2}/s_n}]=\big(2/(2+\lambda/s_n)\big)^{\lfloor s'_n\rfloor/2}\to 0$. The right-hand side of (6.26) thus tends to zero $P$-a.s. Additionally, $\mathbb E^{(\infty)}[e^{-\lambda M_{a,b}/s_n}]\to\mathbb P^{(\infty)}(M_{a,b}<\infty)$ by the first limit in (6.24). We have proved (6.25).

Step 3. In this step, we show that
$$
\mathbb P^{(\infty)}(M_{a,b}<\infty)=(k-1)^{-1}
\tag{6.27}
$$
and only consider random walks on $G_\infty$.
By symmetry, the hitting time $H_{a,b}$ of $b$ by $B^a$ has the same distribution as $2M_{a,b}$. Hence,
$$
\mathbb P^{(\infty)}(M_{a,b}<\infty)=\mathbb P^{(\infty)}(H_{a,b}<\infty)
=\mathbb E^{(\infty)}\Big[\int_0^\infty\mathbf 1_{\{b\}}(B^a_t)\,\mathrm dt\Big]\Big/\mathbb E^{(\infty)}\Big[\int_0^\infty\mathbf 1_{\{b\}}(B^b_t)\,\mathrm dt\Big],
$$
where the second equality follows from a standard Green function decomposition for hitting times of points. The Green functions in the last equality satisfy
$$
\frac{k-1}{k-2}=\mathbb E^{(\infty)}\Big[\int_0^\infty\mathbf 1_{\{b\}}(B^b_t)\,\mathrm dt\Big]=1+\mathbb E^{(\infty)}\Big[\int_0^\infty\mathbf 1_{\{b\}}(B^a_t)\,\mathrm dt\Big].
$$
Here, the first equality is implied by the Kesten–McKay law for the spectral measure of $G_\infty$ (see [15, (3.3)] and the references there); the second equality uses the strong Markov property at the first jump time of $B^b$ and the symmetry of $G_\infty$. The identity in (6.27) follows from the last two displays.
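Spelling out the arithmetic in Step 3: writing $G(x,y)=\mathbb E^{(\infty)}[\int_0^\infty\mathbf 1_{\{y\}}(B^x_t)\,\mathrm dt]$ for the Green function of the walk on $G_\infty$, the two displays combine to

```latex
G(a,b)=\frac{k-1}{k-2}-1=\frac{1}{k-2},
\qquad
\mathbb P^{(\infty)}(M_{a,b}<\infty)=\frac{G(a,b)}{G(b,b)}
=\frac{1/(k-2)}{(k-1)/(k-2)}=\frac{1}{k-1}.
```

For instance, $k=3$ gives meeting probability $1/2$, which matches the mass $1/(k-1)$ of the atom $\delta_0$ in (6.21).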

Step 4. To complete the proof, we extend (6.25) to all the faster sequences $(s_n)$ such that (6.22) holds but now $\liminf_{n\to\infty}s'_n/s_n<\infty$. Fix any sequence $(s''_n)$ satisfying $s''_n\to\infty$ and $s'_n/s''_n\to\infty$ as in Step 2. Then it is enough to show (6.25) for all sequences $(s_n)$ satisfying (6.22) and $s_n\ge cs''_n$ for some constant $c\in(0,\infty)$. As recalled above, the convergence in (6.19) extends to the convergence of all moments. Hence,

$$
\lim_{n\to\infty}\frac{2\,\mathbb E^{(n)}[M_{U,U'}]}{N_n}=\frac{k-1}{k-2}\quad P\text{-a.s.}
\tag{6.28}
$$
Additionally, by (6.25) and (6.27), it holds that

$$
\lim_{n\to\infty}\frac{2\,\mathbb E^{(n)}[M_{U,U'}]}{N_n}\,\mathbb P^{(n)}(M_{V,V'}>s''_nt)=1,\qquad\forall\,t\in(0,\infty);\quad P\text{-a.s.},
\tag{6.29}
$$

and so, by (6.19) and [11, Proposition 4.3 (2)], (6.29) with $s''_n$ replaced by $s_n$ holds. We obtain (6.21) from this limit and (6.28). The proof is complete. 
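The meeting probability $1/(k-1)$ in (6.27) can also be estimated by a quick Monte Carlo on the infinite tree itself. For two independent rate-one walks on the $k$-regular tree, each jump of either walk moves their graph distance to $d-1$ with probability $1/k$ (the jumping walk steps onto the geodesic toward the other walk) and to $d+1$ with probability $(k-1)/k$, so the meeting probability from adjacent starting points is a gambler's-ruin absorption probability. The sketch below (the distance cap and trial count are arbitrary illustrative choices) simulates this distance chain.

```python
import random

def meet_probability(k, trials, cap=60, seed=7):
    """Estimate P(two independent walks on the k-regular tree, started at
    adjacent vertices, ever meet) via the distance birth-death chain:
    d -> d - 1 w.p. 1/k, d -> d + 1 w.p. (k-1)/k, absorbed at 0 or cap."""
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        d = 1
        while 0 < d < cap:
            d += -1 if rng.random() < 1.0 / k else 1
        if d == 0:
            met += 1
    return met / trials

est = meet_probability(3, 20_000)
print(est)  # theory: 1/(k-1) = 0.5 for k = 3
```

Escape above the cap stands in for non-meeting: the chance of returning from the cap is $(k-1)^{-\mathrm{cap}}$, which is negligible for the values used here.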

Remark 6.3. McKay [32, Theorem 1.1] derives the limiting spectral measures of large random regular graphs. There the randomness of graphs only plays the role of inducing asymptotically deterministic properties. For the present case, we could have worked with given sequences of k-regular graphs and obtained the same limit if the graphs have spectral gaps bounded away from zero and are locally tree-like. (Dropping the locally tree-like assumption calls for a different evaluation of the limit.) We choose to work with the above context to explain how the randomness of graphs should be handled for the convergence of the evolutionary game model. 

7 References

[1] Aldous, D. J. (1982). Markov chains with almost exponential hitting times. Stochastic Processes and their Applications 13, 305–310. doi:10.1016/0304-4149(82)90016-3
[2] Aldous, D. J. and Brown, M. (1992). Inequalities for rare events in time-reversible Markov chains I. Lecture Notes–Monograph Series 22, 1–16. doi:10.1214/lnms/1215461937
[3] Aldous, D. J. and Fill, J. A. (2002). Reversible Markov Chains and Random Walks on Graphs. Unfinished monograph. Available at https://www.stat.berkeley.edu/users/aldous/RWG/book.pdf
[4] Allen, B., Lippner, G., Chen, Y.-T., Fotouhi, B., Momeni, N., Yau, S.-T. and Nowak, M. A. (2017). Evolutionary dynamics on any population structure. Nature 544, 227–230. doi:10.1038/nature21723
[5] Bollobás, B. (1979). Graph Theory: An Introductory Course. Graduate Texts in Mathematics 63. Springer-Verlag, New York. doi:10.1007/978-1-4612-9967-7
[6] Bollobás, B. (2001). Random Graphs, 2nd edition. Cambridge Studies in Advanced Mathematics 73. Cambridge University Press. doi:10.1017/CBO9780511814068

[7] Bordenave, C. (2019). A new proof of Friedman's second eigenvalue theorem and its extension to random lifts. To appear in Annales scientifiques de l'École Normale Supérieure. Available at arXiv:1502.04482
[8] Champagnat, N., Ferrière, R. and Méléard, S. (2006). Unifying evolutionary dynamics: From individual stochastic processes to macroscopic models. Theoretical Population Biology 69, 297–321. doi:10.1016/j.tpb.2005.10.004
[9] Champagnat, N., Ferrière, R. and Méléard, S. (2008). From individual stochastic processes to macroscopic models in adaptive evolution. Stochastic Models 24, 2–44. doi:10.1080/15326340802437710
[10] Chen, Y.-T. (2013). Sharp benefit-to-cost rules for the evolution of cooperation on regular graphs. Annals of Applied Probability 23, 637–664. doi:10.1214/12-AAP849
[11] Chen, Y.-T., Choi, J. and Cox, J. T. (2016). On the convergence of densities of finite voter models to the Wright–Fisher diffusion. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 52, 286–322. doi:10.1214/14-AIHP639
[12] Chen, Y.-T. and Cox, J. T. (2018). Weak atomic convergence of finite voter models toward Fleming–Viot processes. Stochastic Processes and their Applications 128, 2463–2488. doi:10.1016/j.spa.2017.09.015
[13] Chen, Y.-T., McAvoy, A. and Nowak, M. A. (2016). Fixation probabilities for any configuration of two types on regular graphs. Scientific Reports 6, 39181. doi:10.1038/srep39181
[14] Chen, Y.-T. (2018). Wright–Fisher diffusions in stochastic spatial evolutionary games with death-birth updating. Annals of Applied Probability 28, 3418–3490. doi:10.1214/18-AAP1390
[15] Chen, Y.-T. (2020). Meeting times for the voter model on large random regular graphs. arXiv:1711.00127
[16] Chung, F. R. K. (1997). Spectral Graph Theory. CBMS Regional Conference Series in Mathematics 92. American Mathematical Society. doi:10.1090/cbms/092
[17] Cox, J. T. (1989). Coalescing random walks and voter model consensus times on the torus in Z^d. Annals of Probability 17, 1333–1366. doi:10.1214/aop/1176991158
[18] Cox, J. T. and Durrett, R. (2016). Evolutionary games on the torus with weak selection. Stochastic Processes and their Applications 126, 2388–2409. doi:10.1016/j.spa.2016.02.004
[19] Cox, J. T., Durrett, R. and Perkins, E. A. (2000). Rescaled voter models converge to super-Brownian motion. Annals of Probability 28, 185–234. doi:10.1214/aop/1019160117
[20] Cox, J. T., Durrett, R. and Perkins, E. A. (2013). Voter model perturbations and reaction diffusion equations. Astérisque 349. Société Mathématique de France.
[21] Cox, J. T., Merle, M. and Perkins, E. A. (2010). Coexistence in a two-dimensional Lotka–Volterra model. Electronic Journal of Probability 15, 1190–1266. doi:10.1214/EJP.v15-795
[22] Cox, J. T. (2017). Densities of biased voter models on finite sets converge to Feller's branching diffusion. Markov Processes and Related Fields 23, 421–444.
[23] Cressman, R. (2003). Evolutionary Dynamics and Extensive Form Games. The MIT Press. doi:10.7551/mitpress/2884.001.0001
[24] Ethier, S. N. and Nagylaki, T. (1980). Diffusion approximations of Markov chains with two time scales and applications to population genetics. Advances in Applied Probability 12, 14–49. doi:10.2307/1426492

[25] Ethier, S. N. and Kurtz, T. G. (2005). Markov Processes: Characterization and Convergence, 2nd edition. Wiley Series in Probability and Statistics. Wiley-Interscience.
[26] Friedman, J. (2008). A proof of Alon's second eigenvalue conjecture and related problems. Memoirs of the American Mathematical Society 195. doi:10.1090/memo/0910
[27] Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press. doi:10.1017/CBO9781139173179
[28] Jacod, J. and Shiryaev, A. N. (2003). Limit Theorems for Stochastic Processes, 2nd edition. Grundlehren der mathematischen Wissenschaften 288. Springer-Verlag, Berlin. doi:10.1007/978-3-662-05265-5
[29] Liggett, T. M. (2005). Interacting Particle Systems. Reprint of the 1985 edition with a new postface. Classics in Mathematics 276. Springer, Berlin. doi:10.1007/b138374
[30] Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. American Mathematical Society, Providence.
[31] Maynard Smith, J. and Price, G. R. (1973). The logic of animal conflict. Nature 246, 15–18. doi:10.1038/246015a0
[32] McKay, B. D. (1981). The expected eigenvalue distribution of a large regular graph. Linear Algebra and its Applications 40, 203–216. doi:10.1016/0024-3795(81)90150-6
[33] Müller, C. and Tribe, R. (1995). Stochastic p.d.e.'s arising from the long range contact and long range voter processes. Probability Theory and Related Fields 102, 519–545. doi:10.1007/BF01198848
[34] Ohtsuki, H. and Nowak, M. A. (2006). The replicator equation on graphs. Journal of Theoretical Biology 243, 86–97. doi:10.1016/j.jtbi.2006.06.004
[35] Ohtsuki, H., Hauert, C., Lieberman, E. and Nowak, M. A. (2006). A simple rule for the evolution of cooperation on graphs and social networks. Nature 441, 502–505. doi:10.1038/nature04605
[36] Revuz, D. and Yor, M. (2005). Continuous Martingales and Brownian Motion, corrected 3rd edition. Grundlehren der mathematischen Wissenschaften 293. Springer-Verlag, Berlin. doi:10.1007/978-3-662-06400-9
[37] Rogers, L. C. G. and Pitman, J. W. (1981). Markov functions. Annals of Probability 9, 573–582. doi:10.1214/aop/1176994363
[38] Schuster, P. and Sigmund, K. (1983). Replicator dynamics. Journal of Theoretical Biology 100, 533–538. doi:10.1016/0022-5193(83)90445-9
[39] Szabó, G. and Fáth, G. (2007). Evolutionary games on graphs. Physics Reports 446, 97–216. doi:10.1016/j.physrep.2007.04.004
[40] Taylor, P. D. and Jonker, L. B. (1978). Evolutionarily stable strategies and game dynamics. Mathematical Biosciences 40, 145–156. doi:10.1016/0025-5564(78)90077-9