<<

Rescaling limits of the spatial Lambda-Fleming-Viot process with selection Alison Etheridge, Amandine Véber, Feng Yu

To cite this version:

Alison Etheridge, Amandine Véber, Feng Yu. Rescaling limits of the spatial Lambda-Fleming-Viot process with selection. Electronic Journal of Probability, Institute of Mathematical Statistics (IMS), 2020, 25, pp.1 - 89. ￿10.1214/20-EJP523￿. ￿hal-02953327￿

HAL Id: hal-02953327 https://hal.archives-ouvertes.fr/hal-02953327 Submitted on 30 Sep 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. n a l r u o o f J P c r i o o n b a E l e c t r b i l i t y

Electron. J. Probab. 25 (2020), article no. 120, 1–89. ISSN: 1083-6489 https://doi.org/10.1214/20-EJP523

Rescaling limits of the spatial Lambda-Fleming-Viot process with selection

Alison M. Etheridge* Amandine Véber† Feng Yu‡

Abstract

We consider the spatial Λ-Fleming-Viot process model for frequencies of genetic types in a population living in Rd, with two types of individuals (0 and 1) and natural selection favouring individuals of type 1. We first prove that the model is well-defined and provide a measure-valued dual process encoding the locations of the “potential ancestors” of a sample taken from such a population, in the same spirit as the dual process for the SLFV without natural selection [7]. We then consider two cases, one in which the dynamics of the process are driven by purely “local” events (that is, reproduction events of bounded radii) and one incorporating large-scale extinction- recolonisation events whose radii have a polynomial tail distribution. In both cases, we consider a sequence of spatial Λ-Fleming-Viot processes indexed by n, and we assume that the fraction of individuals replaced during a reproduction event and the relative frequency of events during which natural selection acts tend to 0 as n tends to infinity. We choose the decay of these parameters in such a way that when reproduction is only local, the measure-valued process describing the local frequencies of the less favoured type converges in distribution to a (measure-valued) solution to the Fisher-KPP equation in one dimension, and to a (measure- valued) solution to the deterministic Fisher-KPP equation in more than one dimension. When large-scale extinction-recolonisation events occur, the sequence of processes converges instead to the solution to the analogous equation in which the Laplacian is replaced by a fractional Laplacian (again, noise can be retained in the limit only in one spatial dimension). We also consider the process of “potential ancestors” of a sample of individuals taken from these populations, which we see as (the empirical distribution of) a system of branching and coalescing symmetric jump processes. We show their convergence in distribution towards a system of Brownian or stable motions which branch at some finite rate. In one dimension, in the limit, pairs of particles also coalesce at a rate proportional to their collision local time. In contrast to previous proofs of scaling limits for the spatial Λ-Fleming-Viot process, here the convergence of the more complex forwards in time processes is used to prove the convergence of the dual process of potential ancestries.

*Department of Statistics, , UK. E-mail: [email protected] †CMAP, CNRS, Institut Polytechnique de Paris, France. E-mail: [email protected] ‡School of Mathematics, University of Bristol, UK. E-mail: [email protected] The spatial Lambda-Fleming-Viot process with selection

Keywords: generalised Fleming-Viot process; natural selection; limit theorems; duality; symmet- ric stable processes; . MSC2020 subject classifications: Primary 60G57; 60J25; 92D10, Secondary 60J75; 60G52. Submitted to EJP on June 23, 2014, final version accepted on September 6, 2020. Supersedes arXiv:1406.5884v4.

Contents

1 Introduction 2 1.1 The spatial Λ-Fleming-Viot process with selection ...... 4 1.2 A measure-valued dual process of “potential ancestors” ...... 9 1.2.1 Definition of the dual process ...... 9 1.2.2 Duality relation between (Mt)t≥0 and (Ξt)t≥0 ...... 14 1.3 Convergence of the rescaled SLFVS to Fisher-KPP processes ...... 15 1.4 Structure of the paper ...... 21

2 Proof of Theorem 1.2 (existence of the SLFVS) 21

3 Proof of Proposition 1.7 (duality) 29

4 Heuristics for the large-scale behaviour of the SLFVS and its dual process 34

5 Convergence of the rescaled SLFVS and its dual – the fixed radius case 38 5.1 Proof of Theorem 1.11 ...... 38 5.2 Proof of Theorem 1.13...... 48

6 Convergence of the rescaled SLFVS and its dual – the stable radius case 55 6.1 Proof of Theorem 1.14 ...... 55 6.2 Proof of Theorem 1.16 ...... 64

References 70

A Continuity estimates in the fixed radius case 72

B Continuity estimates in the stable radius case 72 B.1 Drift terms ...... 77 B.2 Martingale terms ...... 78 B.3 Lemmas ...... 81

1 Introduction

The principal aim of mathematical population genetics is to understand the influence of the different forces of evolution that act on a population, and the interactions be- tween them, in shaping the patterns of genetic diversity that we see in the present-day population. One important aspect of this is the interplay between spatial structure of the population and the intrinsic randomness due to reproduction in a finite popula- tion (known as genetic drift). This is particularly mathematically challenging in one of the most biologically important situations, when the population is distributed across a two-dimensional spatial continuum. The obstructions to producing a mathematically consistent and analytically tractable model in this setting were highlighted in [23] and dubbed “the pain in the torus”. The spatial Λ-Fleming-Viot process (SLFV), introduced in [7, 15], provides one route to overcoming those obstructions, and its relatively sim- ple mathematical structure makes it a powerful tool for investigating genetic diversity in spatially structured populations. In fact, it is not so much a process as a general

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 2/89 The spatial Lambda-Fleming-Viot process with selection framework for modelling frequencies of different genetic types in populations which evolve in a spatial continuum. For example, it is readily adapted to include things like the large-scale extinction/recolonisation events which have dominated the demographic history of many species. In this paper, we shall be interested in an extension of this measure-valued process in which some individuals have higher reproductive success than others, modelling the evolution of a spatially structured population subject to natural selection. Variants of the SLFV that incorporate forms of natural selection already appear in a number of studies [6, 17, 18, 19, 25], but without a detailed discussion of the construction of the stochastic processes, or whether they are well-defined when the geographic space in which the population evolves is infinite. Our first contribution is to formulate and construct an SLFV with natural selection. The methods that we employ can be readily adapted to capture all of the forms of selection considered to date, and indeed the form of selection considered here contains many of them as special cases. We shall then turn to using our model to study the interaction between natural selec- tion, spatial structure, and genetic drift. In particular, we are interested in identifying the spatial and temporal scales over which one can expect to see a non-trivial signature of the interaction between these forces. More precisely, we investigate rescaling limits of the model which capture the resultant patterns of genetic diversity over large spatial and temporal scales. In particular, our second contribution is to find suitable scalings of time, space and of the strength of selection for which, in the limit as the scaling parameter n tends to infinity, we recover the Fisher-KPP equation [24, 32] and, in one spatial dimension, its stochastic counterpart. In the presence of large-scale demographic events, the appropriate rescalings are different and lead to analogous equations with the Laplacian replaced by the fractional Laplacian, but, intriguingly, no other trace of the large-scale events survives. The limits obtained here assume that the local population densities are high, thus complementing results of [18, 19] which address the interaction of natural selection and genetic drift when local population densities are small. The Fisher-KPP equation

σ2 ∂ p = ∆p + sp(1 − p) (1.1) t 2 was introduced independently by Fisher [24], specifically to model the spread of an ad- vantageous gene through a spatially distributed population, and Kolomogorov, Petrovsky & Piskunov [32], who also highlighted the applications to biology. Fisher considered a population living in a one-dimensional space, whereas Kolmogorov et al. worked in two dimensions (although they then assumed that the distribution of types was independent of the second coordinate, thus reducing it to the one-dimensional case). The equation has been extensively studied (and extended in many ways), and is now a standard model of invasion in biology. A major focus of work has been on the travelling wave solutions. When the motion of individuals or genes is not local but has a heavy-tailed distribution, one replaces the Laplacian in (1.1) by a fractional Laplacian −(−∆)α. This, notably, modifies the speed of the travelling wave solutions, which is constant in the diffusive case and increases exponentially in the fractional case; see [14] and references therein. To take into account the stochasticity inherent in reproduction in a finite population, in one dimension one can add a noise term of the form

εpp(1 − p)W˙ , to the right hand side of (1.1), where W˙ is a space-time white noise. This yields the natural continuous space analogue of the classical stepping-stone model of population genetics, introduced without selection in [31], and studied in more generality in, for

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 3/89 The spatial Lambda-Fleming-Viot process with selection example, [47]. The (continuous space) stochastic Fisher-KPP equation can be obtained from the discrete space counterpart through rescaling (cf. [5], where the case without selection is treated) and was also obtained as the limit (over appropriate large spatial and temporal scales) of a family of long-range contact processes in [39]. It has been the object of intensive study, with the perturbations of solutions due to the noise when ε is very small receiving particular attention, e.g. [13, 37, 38] and a huge body of closely related work inspired by work of Brunet, Derrida and coworkers, e.g. [4]. Our results here provide the parameter regimes under which the SLFV with selection can be thought of as a noisy perturbation of the Fisher-KPP equation. Crucially, they apply in two or more spatial dimensions, where the stochastic PDE has no solution. In particular, the n rescaled process M introduced in Section 1.3 of this work provides a tractable analogue in dimension d ≥ 2 to the one-dimensional stochastic Fisher-KPP equation with small noise when n is large.

1.1 The spatial Λ-Fleming-Viot process with selection The main innovation in the SLFV is that reproduction in the population is based on a Poisson point process of events, rather than on individuals. It is this which overcomes the pain in the torus. This is discussed in detail in [7] and so we do not repeat the motivation here. Each event determines the region of space in which reproduction (or extinction/recolonisation) will take place and an impact u. As a result of the event, a proportion u of the individuals living in the region is replaced by offspring of a parent chosen from the population immediately before the event (a precise definition of the process is given below). The Poisson structure renders the process particularly amenable to analytic study. In the neutral setting, which has been studied rather extensively (see [8] for a somewhat out of date review), the parent is chosen uniformly at random from the affected region, irrespective of type. There are many possible ways to incorporate natural selection. Here we shall focus on one of the simplest, but also most important, in which in the selection of the parent, individuals are weighted according to their genetic type. To motivate our definition of the process with (fecundity) selection, suppose that there are two possible types in the population, which we shall denote by 0 and 1. In order to give a slight selective advantage to type 1, we fix a selection coefficient s > 0 and suppose that, when an event falls, if the proportion of type 0 individuals in the affected region immediately before the event is w¯, then the probability of picking a type 0 parent is p(w, ¯ s) =w/ ¯ (1 + s(1 − w¯)). In other words, in the choice of the parent we give a weight 1 to type 0 individuals, and a weight 1 + s > 1 to type 1 individuals, so that the probability of picking a parent of type 0 is w/¯ (w ¯ + (1 + s)(1 − w¯)) = p(w, ¯ s). Typically one is interested in weak selection, so that s  1 and, in this case, we can estimate this probability by (1 − s)w ¯ + sw¯2. Here again we reap the benefit of the Poisson structure of events: we can think of events as being of one of two types. A proportion (1 − s) of events are “neutral”: the parent is selected exactly as in the neutral setting and has probability w¯ of being of type 0. On the other hand, a proportion s of events are “selective” and then the probability of a type 0 parent is w¯2. One way to achieve this is to dictate that at selective events we choose two potential parents, independently, and only if both are type 0 will the offspring be type 0. The Poisson structure allows us to view neutral and selective events as being driven by independent Poisson processes. This approach exactly parallels that usually adopted to incorporate genic selection into the classical Moran model of population genetics (see, e.g., Definition 5.6 in [16]). Of course there are many ways to modify the selection mechanism. For example, as in Definition 1.3 below, we can allow both the distribution of the size of the region affected and of the impact to differ between selective and neutral events, or we can consider

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 4/89 The spatial Lambda-Fleming-Viot process with selection density dependent selection, in which the fitness of an individual depends on the local distribution of genetic types, e.g. [17]. Let us turn to a precise definition. All the random objects in this section are defined on some probability space (Ω, F, P). First we describe the state space of the process, borrowing some results from [49] in the special case in which the compact space of possible genetic types is K = {0, 1}. We suppose that the population evolves in Rd (although the space of geographical locations could equally, for example, be taken to be some subset of Rd, or a d-dimensional torus). d At each time t, the population is represented by a measure Mt on R × K whose first marginal is Lebesgue measure on Rd. As in the neutral setting, this corresponds to assuming that individuals are uniformly distributed over Rd and for any measurable d −1 subset E of R and κ ∈ {0, 1}, Vol(E) Mt(E × {κ}) gives the proportion of individuals of type κ in E. The space Z Z n d d o Mλ := M measure on R × {0, 1}: ∀f ∈ Cc(R ), f(x)M(dx, dκ) = f(x)dx Rd×{0,1} Rd (1.2) of such measures is equipped with the topology of vague convergence, which makes it d a compact set (cf. Lemma 1.1 in [49]). Here Cc(R ) denotes the space of all compactly supported continuous functions on Rd. A standard decomposition theorem (see e.g. [29], d p. 561) gives us the existence of a measurable mapping wt : R → [0, 1] such that  Mt(dx, dκ) = wt(x)δ0(dκ) + (1 − wt(x))δ1(dκ) dx. (1.3)

d Morally, wt(x) represents the local fraction of individuals of type 0 at site x ∈ R at time t, and we abuse notation and call it the “density” of M. Note that wt is defined up to a Lebesgue null set, that is two mappings wt and w˜t will be equivalent if and only if

 d  Vol x ∈ R : wt(x) 6=w ˜t(x) = 0.

In what follows, wt will denote any representative of the equivalence class of densities for Mt. We shall thus equally speak of Mt or wt, depending on what makes the notation more fluid. However, it should be understood that the object of interest in all our results is the measure-valued evolution (Mt)t≥0. d 1 For every f ∈ Cc(R ) and every F ∈ C (R) (the space of all continuously differen- tiable functions on R), let us set Z hw, fi := w(x)f(x)dx (1.4) Rd and let us define the function ΨF,f on Mλ by  Z  ΨF,f (M) := F (hw, fi) = F f(x)1{0}(κ)M(dx, dκ) , (1.5) Rd×{0,1} where w is any representative of the density of M. These functions will prove particularly useful for the following reason. 1 d Lemma 1.1. The set of functions of the form ΨF,f , F ∈ C (R) and f ∈ Cc(R ), is dense in C(Mλ) for the supremum norm topology.

Proof of Lemma 1.1. Since we endow Mλ with the topology of vague convergence, the set of all functions of the form Z ! M 7→ G ϕ(x, κ)M(dx, dκ) , (1.6) Rd×{0,1}

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 5/89 The spatial Lambda-Fleming-Viot process with selection

1 d with G ∈ C (R) and ϕ ∈ Cc(R × {0, 1}) is dense in C(Mλ). But if w is a representative of the density of M, we can write Z Z Z ϕ(x, κ)M(dx, dκ) = ϕ(x, 0)w(x)dx + ϕ(x, 1)(1 − w(x))dx Rd×{0,1} Rd Rd Z Z = (ϕ(x, 0) − ϕ(x, 1))w(x)dx + ϕ(x, 1)dx, Rd Rd and so the mapping (1.6) can be rewritten in the form F (hw, fi), with  Z  F (y) = G y + ϕ(x, 1)dx and f(x) = ϕ(x, 0) − ϕ(x, 1). Rd

1 d By construction, in the above we have F ∈ C (R) and f ∈ Cc(R ). The set of functions of the form (1.6) is thus included in the set of functions of the form (1.5) and the conclusion follows.

In order to gain a feeling for the process, let us first give a non-rigorous description based on the two independent Poisson point processes of “neutral” and “selective” events mentioned above. This intuitive idea of how the SLFV with fecundity selection should evolve suggests a natural choice of operator L on functions of the form (1.5), see (1.9), and we shall show in Theorem 1.2 that for any probability measure P on Mλ describing the law of the initial condition, the martingale problem for (L,P ) has a unique solution on the space of all measurable Mλ-valued paths. Furthermore, this solution is a Markov process with a.s. càdlàg paths, and it has the Feller property. The SLFV with selection, with initial distribution P , can then be defined as the unique solution to this well-posed martingale problem (see Definition 1.3). 0 So first, the idea. Let µ, µ be two σ-finite measures on (0, ∞), and let ν = {νr, r > 0}, 0 0 ν = {νr, r > 0} be two collections of probability measures on [0, 1] such that Z ∞ Z 1 Z ∞ Z 1 d d 0 0 r uνr(du)µ(dr) < ∞, and r uνr(du)µ (dr) < ∞. (1.7) 0 0 0 0

Further, let ΠN and ΠS be two independent Poisson point processes on R × Rd × (0, ∞) × 0 0 [0, 1] with respective intensity measures dt ⊗ dx ⊗ µ(dr)νr(du) and dt ⊗ dx ⊗ µ (dr)νr(du). Let M0 ∈ Mλ be the (for now, deterministic) initial value of the process. The dynamics N of (Mt)t≥0 are as follows. If (t, x, r, u) ∈ Π , a neutral event occurs at time t, within the closed ball B(x, r):

1. Sample a type κ according to the type distribution within B(x, r) just before the −1 event. That is, κ = 0 with probability Vr Mt−(B(x, r) × {0}), where Vr is the volume of a d-dimensional ball of radius r; otherwise, κ = 1.

2. Update the value of Mt (only) within B(x, r) by setting

Mt := (1 − u)Mt− + u dx ⊗ δκ. B(x,r)×{0,1} B(x,r)×{0,1} B(x,r)

In words, at every site y ∈ B(x, r) we keep a fraction (1 − u) of the population as it was just before the event, and we replace the remaining fraction u by descendants of the individual with type κ chosen during the first step. These offspring all inherit the type κ of their parent. Thus, a representative of the density of Mt can be taken to be wt(y) = wt−(y) if y∈ / B(x, r), and

wt(y) = (1 − u)wt−(y) + u1{κ=0} if y ∈ B(x, r).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 6/89 The spatial Lambda-Fleming-Viot process with selection

Similarly, if (t, x, r, u) ∈ ΠS, a selective event occurs at time t, within the closed ball B(x, r): 1. Sample two types κ and κ0 independently, according to the type distribution within B(x, r) just before the event. We interpret them as the types of two “potential” parents. 2. Update the value of Mt (only) within B(x, r) by setting

Mt := (1 − u)Mt− + u dx ⊗ δmax{κ,κ0}. B(x,r)×{0,1} B(x,r)×{0,1} B(x,r) That is, the offspring are of type 0 if and only if both potential parents are of type 0. This time, a representative of the density of Mt can be taken to be wt(y) = wt−(y) if y∈ / B(x, r), and

wt(y) = (1 − u)wt−(y) + u1{κ=κ0=0} if y ∈ B(x, r).

Let us now introduce the operator that will encode this dynamics. For every potential density w : Rd → [0, 1], x ∈ Rd, r > 0 and u ∈ [0, 1], let us define + Θx,r,u(w) := 1B(x,r)c w + 1B(x,r)((1 − u)w + u), and − Θx,r,u(w) := 1B(x,r)c w + 1B(x,r)(1 − u)w. (1.8) These quantities will correspond to the value of the density immediately after an event (t, x, r, u) if the offspring are of type 0 or type 1 respectively. Assuming that the above description corresponds to a well-posed martingale problem, we would expect the corresponding operator L to act on functions of the form (1.5) as follows (recall that Vr stands for the volume of a ball of radius r): for every M ∈ Mλ, Z Z ∞ Z 1 Z 1 h + LΨF,f (M) = w(y)F (hΘx,r,u(w), fi) (1.9) Rd 0 0 B(x,r) Vr − i + (1 − w(y))F (hΘx,r,u(w), fi) − F (hw, fi) dy νr(du) µ(dr) dx Z Z ∞ Z 1 Z 1 h +  + 2 w(y)w(z)F hΘx,r,u(w), fi Rd 0 0 B(x,r)2 Vr − i 0 0 + (1 − w(y)w(z))F (hΘx,r,u(w), fi) − F (hw, fi) dy dz νr(du) µ (dr) dx.

Note that LΨF,f (M) can also be expressed without referring to the density of M:

LΨF,f (M) (1.10) Z Z ∞ Z 1 Z 1 h  = 1{0}(κ)F M, 1B(x,r)c×{0}f + (1 − u) M, 1B(x,r)×{0}f Rd 0 0 B(x,r)×{0,1} Vr Z 0 0   + u f(x )dx + 1{1}(κ)F M, 1B(x,r)c×{0}f + (1 − u) M, 1B(x,r)×{0}f B(x,r)  i − F M, 1Rd×{0}f M(dy, dκ) νr(du) µ(dr) dx Z Z ∞ Z 1 Z 1 h 0  c + 2 1{0}(κ)1{0}(κ )F M, 1B(x,r) ×{0}f Rd 0 0 (B(x,r)×{0,1})2 Vr Z 0 0 + (1 − u) M, 1B(x,r)×{0}f + u f(x )dx B(x,r) 0    + 1 − 1{0}(κ)1{0}(κ ) F M, 1B(x,r)c×{0}f + (1 − u) M, 1B(x,r)×{0}f

 i 0 0 0 0 − F M, 1Rd×{0}f M(dy, dκ)M(dy , dκ ) νr(du) µ (dr) dx, where we have used the bracket notation for some of the integrals to ease the notation.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 7/89 The spatial Lambda-Fleming-Viot process with selection

Let BMλ [0, ∞) (resp., DMλ [0, ∞)) denote the space of all paths (resp., càdlàg paths) with values in Mλ. When needed, DMλ [0, ∞) is endowed with the standard Skorokhod topology and the associated Borel σ-field. Our first main result is the following. Theorem 1.2. Suppose that Condition (1.7) holds. Then, for every probability measure P on Mλ we have:

(i) The BMλ [0, ∞)-martingale problem for (L,P ) is well-posed. That is, there exists a unique measurable process (Mt)t≥0 with values in Mλ such that M0 has law P and for every function ΨF,f of the form (1.5),  Z t  ΨF,f (Mt) − ΨF,f (M0) − LΨF,f (Ms)ds (1.11) 0 t≥0 is a martingale. (ii) The process (Mt)t≥0 in (i) is a Markov process and its semigroup is Feller. Moreover, it has càdlàg paths almost surely. We can finally define the SLFV with fecundity selection in a rigourous way, assuming that Condition (1.7) is satisfied. Definition 1.3 (SLFV with fecundity selection (SLFVS)). Let P be a probability measure on Mλ. We call spatial Λ-Fleming-Viot process with fecundity selection, with initial distribution P , the unique solution (Mt)t≥0 to the martingale problem for (L,P ) obtained in Theorem 1.2(i). In particular, by Theorem 1.2(ii), the SLFVS is a strong Markov process with càdlàg paths a.s. The proof of Theorem 1.2, given in Section 2 to ease the exposition, proceeds as follows. First, the result would be an obvious consequence of the Poisson point process formulation if we had chosen a compact set E in place of Rd for the geographical space in which the population evolves, and if the intensities of the Poisson point processes N S Π and Π were finite, as then the global rate at which events fall and Mt is updated would be finite. We thus start from this simple case and take a sequence of Poisson point processes whose intensities converge to the (possibly infinite) intensities of ΠN and ΠS on R × E × (0, ∞) × [0, 1]. We then take a sequence of hypercubes growing d to R , and construct the process (Mt)t≥0 of Theorem 1.2 as a potential limit for the corresponding processes. Uniqueness of such a limit is proved via a duality relation between any solution to the martingale problem (1.11) and a given family of solutions to the martingale problem satisfied by the particle system (Ξt)t≥0 introduced in Section 3. This duality argument is a natural analogue of the argument guaranteeing uniqueness of the neutral SLFV [7], for which the dual process is a system of coalescing random walks interpreted as tracing the locations of the ancestors of individuals in a sample from the population. In the case with selection, we shall see the dual process as a system of branching and coalescing random walks that describes the locations of all potential ancestors of individuals in a sample from the population modelled by (Mt)t≥0. The technical Condition (1.7) corresponds to Assumption 2.4 in [7] and expresses the fact that each “ancestral lineage” is affected by an event at a finite rate. Observe that the reproduction events encoded by the Poisson point process ΠS favour the subpopulation of individuals of type 1, since during an event determined by ΠS, offspring are of type 0 only if both the potential parents sampled are of type 0. Since we only consider this particular form of selection in this paper, there should be no ambiguity in simply calling this process the SLFV with selection, but we emphasise that, although this is certainly one of the most natural, there are many alternative models. For example, one could modify the construction so that one first selects a parental type and then an impact depending on that type, or one could “kill” with differential weights (cf. [3, 26, 36] in the non-spatial setting).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 8/89 The spatial Lambda-Fleming-Viot process with selection

We note that [20] describes two constructions of the SLFV. The first gives the building blocks for the existence of an SLFV with type-dependent killing, under somewhat weaker conditions than (1.7). The proof of existence is given (only) in the neutral case, but uniqueness remains open. The second construction, which requires Condition (1.7), allows for the sort of selection considered here, although, again, the actual proof of existence is only provided in the neutral case.

1.2 A measure-valued dual process of “potential ancestors”

In this section, we first introduce a process (Ξt)t≥0 with values in the set of all finite point measures on Rd, whose evolution is driven by an independent copy of the Poisson point processes ΠN and ΠS. In Section 1.2.2, we state a duality relation between any solution to the martingale problem (1.11) and the process Ξ starting from suitable initial distributions. This duality is the analogue of the relation between the neutral SLFV and its “genealogical process” (see Theorem 4.2 in [7] for a general version of this relation, and Equation (8) in [9] for the particular case of two types of individuals). This is the content of Proposition 1.7, whose proof is deferred to Section 3 to ease the exposition. Although the duality presented here is very reminiscent of the standard notion of duality between two martingale problems (see [21], pp. 188–189, with α = β = 0 for us), it differs in that the natural duality function

k Y f(M, Ξ) = w(xi) i=1

Pk (for every Ξ = i=1 δxi and M ∈ Mλ with “density” w) suggested by classical population genetics is not well defined (see the discussion at the beginning of Section 1.2.2). Indeed, 0 another representative w of the density of M may differ from w at some of the xi, yielding a different value for f(M, Ξ). Consequently, we must modify the Ethier & Kurtz approach to duality, but Relation (1.25) stated in Proposition 1.7 will still take the same form as Relation (4.35) in [21].

1.2.1 Definition of the dual process

In contrast with the strategy adopted in Section 1.1 to construct the SLFVS, here we do not base the definition of the dual process on a martingale problem but, instead, we provide an explicit construction of this finite rate jump process in Definition 1.4. In Proposition 1.5, we show that this definition gives rise to a well-defined Markov process which also solves a martingale problem. This will be sufficient to obtain the duality relation stated in Proposition 1.7 and which is required to prove uniqueness of the solution to the martingale problem for (L,P ) stated in (1.11). Let us start with some heuristics on the form and dynamics of the dual process before formulating Definition 1.4. Recall that during a neutral event (t, x, r, u) ∈ ΠN , a single parental type is chosen according to the type distribution Z Z 1 1  Mt−(dz, dκ) = wt−(z)δ0(dκ) + (1 − wt−(z))δ1(dκ) dz Vr B(x,r) Vr B(x,r) in B(x, r) at time t−. Although, strictly speaking, the density wt− is only defined up to a Lebesgue null set (and so for a given z the value of wt−(z) may differ between two representatives of the density of Mt−), this sampling can informally be seen as picking a spatial location z uniformly at random within B(x, r), and then choosing a parent from

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 9/89 The spatial Lambda-Fleming-Viot process with selection the population at z immediately before the event. Thus the parent is of type 0 with probability wt−(z), or 1 with probability 1 − wt−(z). Similarly, the independent sampling of two types within B(x, r) during a selective event can be interpreted as choosing two locations z and z0 independently and uniformly at random within B(x, r), and then potential parental types according to the type distributions at z and z0 just before the event. d Suppose now that we sample k ∈ N individuals at some locations x1, . . . , xk ∈ R at time 0, “the present”, assuming that the population has been evolving for some very large time (that we do not specify). We want to trace back the locations of the “ancestors” of the individuals in the sample: that is, we want to go back into the past and describe at every earlier time t the set of locations in Rd from which the collection of types seen in our sample may have originated. To motivate the introduction of the process (Ξt)t≥0 below, let us first analyse from a genealogical perspective what happens during each reproduction event. If the event is neutral (i.e., belongs to ΠN ), when an ancestor finds itself in the region affected by the event just after the latter has occurred, the probability that it belongs to the fraction u of the local population replaced during the event is precisely u. In this case, the “parent” of this ancestor was the “individual” whose type was chosen to be the one reproduced during the event, and as expounded above, the location of this “parent” is uniformly distributed over the affected area. Consequently, precisely at the time of this event in the past, the ancestral lineage corresponding to the ancestor found in this area jumps onto the location of the “parent”. On the other hand, if (with probability 1 − u) the ancestor does not belong to the fraction replaced, it is not an offspring of the “parent” and its ancestral lineage is not affected by the event (i.e., it remains at the same spatial location). Finally, if there is more than one ancestor in this area, each of them belongs to the fraction of the population just replaced with probability u independently of each other, and the ancestral lineages of all those (and only those) who lie in this “offspring” population merge into a single ancestral lineage located at the position of the “parent”. Note that this procedure is independent of the type of the “parent”. During a selective event (i.e., belonging to ΠS), this can no longer be the case; since we only follow the spatial locations from which the sampled types originate, and not their types, we are unable to decide which of the two “potential” parents is the true parent of the event. Instead we follow the locations of all “potential” ancestors. More precisely, as in a neutral event, every ancestor present in the area of the event just after it occurred belongs to the fraction of the local population just replaced with probability u, independently of each other. At the time of the event in the past, the ancestral lineages corresponding to the ancestors who belong to the “offspring” population merge, since they all have the same “parent”. However, we do not know a priori from which of the two potential “parents” they inherit their types and so the new ancestral lineage instantly splits into two potential lineages, starting from the positions of the two potential “parents”, independently and uniformly distributed over the area covered by the event. This parallels the construction of the ancestral selection graph and its duality relation with the Wright-Fisher diffusion with selection in the case of a panmictic population [33, 40]. We shall sometimes use this informal description to see our dual process as a system of branching and coalescing jump processes, although this interpretation will appear much clearer when we describe the limiting “ancestral” processes that arise in the regimes of parameters on which we focus in Theorems 1.13 and 1.16. We now give a formal definition of the process (Ξt)t≥0 which will keep track of the locations of the potential ancestors of a sample taken from the current state of the population. To this end, observe that the time-reversed point processes ←− Π i := (−t, x, r, u):(t, x, r, u) ∈ Πi , i ∈ {N,S}, (1.12)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 10/89 The spatial Lambda-Fleming-Viot process with selection also form two independent Poisson point processes on R × Rd × (0, ∞) × [0, 1] with the same intensity measures as the corresponding forwards in time processes. The way in which events happen in both directions of time is thus the same in distribution. Hence, let Πe N and Πe S be independent copies of ΠN and ΠS respectively, defined on another probability space (Ω, F 0, P) (and so is the process Ξ introduced below). d d Let Mp(R ) denote the set of all finite point measures on R , which we endow with d the topology of weak convergence. The process (Ξt)t≥0 will take its values in Mp(R ): each atom of Ξt will represent the location of a potential ancestor t units of time in the past. 0 d Definition 1.4. Let Ξ be an Mp(R )-valued random variable, and let us define the 0 0 process (Ξt)t≥0 with initial value Ξ as follows. We set Ξ0 = Ξ and, for convenience, at every time t ≥ 0 we write

N Xt Ξ = δ i t ξt i=1

Rd i where Nt = Ξt( ) and some of the ξt may be identical (by Lemma 2.3 in [28], the elements of this decomposition are measurable functions of Ξt). Note that the ordering by 1,...,Nt of the atoms is arbitrary and will play no role in the updating of Ξt. Then: For every (t, x, r, u) ∈ Πe N :

i 1. To each ξt− ∈ B(x, r), we independently give a mark with probability u, or not with probability 1 − u;

i 2. If at least one atom ξt− is marked, to form Ξt we remove all the marked atoms from Ξt− and we add a Dirac mass at a location which is drawn uniformly at random from within B(x, r).

For every (t, x, r, u) ∈ Πe S:

i 1. To each ξt− ∈ B(x, r), we independently give a mark with probability u, or not with probability 1 − u;

i 2. If at least one atom ξt− is marked, to form Ξt we remove all the marked atoms from Ξt− and we add two Dirac masses at locations which are drawn independently and uniformly from within B(x, r).

In both cases, if no particles in Ξt− are marked, then nothing happens.

Note that the point measure Ξt always has at least one atom (unless Ξ0 = 0), since any removal is accompanied by the insertion of at least one new atom. Before stating the result showing that this definition gives rise to a well-defined Markov process, let us introduce the operator G which will turn out to be the extended generator of (Ξt)t≥0 (i.e., the operator on which the martingale problem satisfied by Ξ is 1 R R 1 based). Let Cb ( ) denote the set of all functions on which are bounded, of class C d and whose first derivatives are bounded. Let also Bb(R ) denote the set of all bounded Rd 1 R Rd measurable functions on . For every F ∈ Cb ( ) and f ∈ Bb( ), we define the function ΦF,f by d ΦF,f (Ξ) := F (hΞ, fi), ∀Ξ ∈ Mp(R ), (1.13)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 11/89 The spatial Lambda-Fleming-Viot process with selection

R where hΞ, fi = f(x)Ξ(dx). Finally, we define the function GΦF,f as follows. For every Pl Rd Ξ = i=1 δxi ∈ Mp( ), " Z Z ∞ Z 1 Z 1 X |D| |Ix,r (Ξ)\D| GΦF,f (Ξ) := u (1 − u) (1.14) Rd 0 0 B(x,r) Vr D⊆Ix,r (Ξ) |D|≥1  #  X   × F hΞ, fi − f(xi) + f(z) − F hΞ, fi dzνr(du)µ(dr)dx i∈D " Z Z ∞ Z 1 Z 1 X |D| |Ix,r (Ξ)\D| + 2 u (1 − u) Rd 0 0 B(x,r)2 Vr D⊆Ix,r (Ξ) |D|≥1  #  X 0   0 0 0 × F hΞ, fi − f(xi) + f(z) + f(z ) − F hΞ, fi dzdz νr(du)µ (dr)dx, i∈D where

Ix,r(Ξ) = {i ∈ {1, . . . , l} : xi ∈ B(x, r)} (1.15) is the set of atoms of Ξ sitting in the closed ball B(x, r) and by convention, the sum over D ⊂ Ix,r(Ξ), |D| ≥ 1 is set to 0 if Ix,r(Ξ) is empty. Note again that by Lemma 2.3 in [28], the elements l, x1, . . . , xl of the decomposition of Ξ are measurable functions of Ξ, and d so the mapping GΦF,f is a well-defined measurable function on Mp(R ).

Proposition 1.5. The process (Ξt)t≥0 of Definition 1.4 is a well-defined Markov jump d 0 d process with values in Mp(R ). In addition, if there exists K > 0 such that P[Ξ (R ) ≤ 1 R Rd K] = 1, then for every F ∈ Cb ( ) and f ∈ Bb( ), the process  Z t  ΦF,f (Ξt) − ΦF,f (Ξ0) − GΦF,f (Ξs) ds (1.16) 0 t≥0 is a martingale.

Proof of Proposition 1.5. Let us first argue that the process (Ξt)t≥0 of Definition 1.4 is well defined for all time t ≥ 0. Let us focus on a given atom in Ξt (for some t ≥ 0), say at z ∈ Rd. Since it is affected by a reproduction event only if it lies in the area of the event and if it is marked (which happens with a prescribed probability u), by construction the rate at which this atom is impacted by an event is given by

Z Z ∞ Z 1 Z Z ∞ Z 1 0 0 1{|z−x|≤r}u νr(du)µ(dr)dx + 1{|z−x|≤r}u νr(du)µ (dr)dx Rd 0 0 Rd 0 0 Z ∞ Z 1 Z ∞ Z 1 0 0 = Vru νr(du)µ(dr) + Vru νr(du)µ (dr) := C0 < ∞ (1.17) 0 0 0 0

(where the finiteness of C0 comes from Condition (1.7)), and so the total rate at which any d of the atoms of Ξt is affected, and hence Ξ jumps, is bounded from above by C0Ξt(R ). S Furthermore, the number of atoms in Ξt can increase only during an event of Πe , and by at most one (if only one atom is erased and two atoms are created during a selective event). Consequently, the total number of atoms in Ξt is stochastically bounded by the d number of particles in a Yule process starting with Ξ0(R ) particles, each of which splits into two at constant rate Z ∞ Z 1 0 0 Vru νr(du)µ (dr), (1.18) 0 0

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 12/89 The spatial Lambda-Fleming-Viot process with selection

d independently of each other. Combining the above with the fact that Ξ0(R ) is finite a.s., we obtain that with probability one the total mass of Ξt is finite for every t ≥ 0 and there is no accumulation of jumps in finite time. That is, (Ξt)t≥0 is a finite rate jump process defined for all time t ≥ 0. The fact that Ξ is Markovian then comes from the Poisson point process structure of its evolution. Let us now give a bound on GΦF,f (Ξ), defined in (1.14), to prove first that the operator G is well-defined on the set of test functions considered, and second that (Ξt)t≥0 is indeed 1 R Rd solution to the martingale problem (1.16). To this end, let F ∈ Cb ( ), f ∈ Bb( ) and d Ξ ∈ Mp(R ). Denoting the sup norm by k · k and applying Taylor’s theorem to the function F , we can write

GΦF,f (Ξ) (1.19) Z Z ∞ Z 1 " # 0 d X |D| |Ix,r (Ξ)\D| ≤ kF kΞ(R )kfk u (1 − u) νr(du)µ(dr)dx Rd 0 0 D⊆Ix,r (Ξ) |D|≥1 Z Z ∞ Z 1 " # 0 Rd  X |D| |Ix,r (Ξ)\D| 0 0 + kF k Ξ( ) + 1 kfk u (1 − u) νr(du)µ (dr)dx. Rd 0 0 D⊆Ix,r (Ξ) |D|≥1

d Next, using the bounds |Ix,r(Ξ)| = Ξ(B(x, r)) ≤ Ξ(R ),

X |D| |Ix,r (Ξ)\D| |Ix,r (Ξ)| d u (1 − u) = 1 − (1 − u) ≤ u |Ix,r(Ξ)| ≤ u Ξ(R )1{Ξ(B(x,r))>0},

D⊆Ix,r (Ξ) |D|≥1 and Z Z ∞ Z 1 d u Ξ(R )1{Ξ(B(x,r))>0}νr(du)µ(dr)dx Rd 0 0 Z ∞ Z 1 d  ≤ Ξ(R ) u Vol Supp(Ξ) + B(0, r) νr(du)µ(dr) 0 0 Z ∞ Z 1 d 2 d ≤ Cd Ξ(R ) ur νr(du)µ(dr), 0 0 where Supp(Ξ) denotes the (discrete) support of Ξ and Cd is the volume of a d-dimensional ball of radius 1, we obtain that

0 d 2 d  GΦF,f (Ξ) ≤ CdkF kkfk Ξ(R ) Ξ(R ) + 1 Z ∞ Z 1 Z ∞ Z 1  d d 0 0 × ur νr(du)µ(dr) + ur νr(du)µ (dr) . (1.20) 0 0 0 0

From this we can first conclude that the operator G is indeed well-defined on the set of 1 R Rd functions of the form ΦF,f , with F ∈ Cb ( ) and f ∈ Bb( ). It is then straightforward to d see that for every such test function and every Ξ ∈ Mp(R ),

d   EΞ ΦF,f (Ξt) = GΦF,f (Ξ). (1.21) dt t=0

Since Ξ0(Rd) ≤ K a.s., the Yule process with branching rate given in (1.18) that domi- nates the number of particles in Ξ has finite moments at any time t ≥ 0 (see Equation (5) in [50] for the original derivation of the distribution of the number of individuals at any time t in a Yule process, which is negative binomial for any initial number of individuals),

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 13/89 The spatial Lambda-Fleming-Viot process with selection

and so the expression on the r.h.s. of (1.20) applied to Ξt is integrable for any t ≥ 0. Combined with the boundedness of F and Fubini’s theorem, this yields that

Z t ΦF,f (Ξt) − ΦF,f (Ξ0) − GΦF,f (Ξs) ds 0 is integrable for every t ≥ 0. Together with (1.21), this allows us to conclude that Ξ is indeed a solution to the martingale problem (1.16) with initial distribution the law of Ξ0.

1.2.2 Duality relation between (Mt)t≥0 and (Ξt)t≥0

A key feature of our model, that we shall use repeatedly, is the fact that the processes (Mt)t≥0 and (Ξt)t≥0 are dual to each other if we restrict our attention to initial distri- d butions on Mp(R ) of a particular form (in essence, the atoms of Ξ0 should be random and have a distribution absolutely continuous with respect to Lebesgue measure – see below). As in the neutral case [7, 9], this will allow us to transfer the information we obtain on (Mt)t≥0 onto (Ξt)t≥0, and vice versa. Because we want to use this property in the proof of existence of (Mt)t≥0 in Section 2 (more precisely, to show that there is at most one solution to the martingale problem for (L, δM 0 )), Proposition 1.7 is phrased in a more general way and relates (Ξt)t≥0 to any solution to the martingale problem for L. The difficulty that we face is that the density of any element of Mλ is only defined Lebesgue a.e. and so the usual test functions used to establish such dualities in population genetics when the underlying geographical space is discrete, which take the form

k Z  Y D(M, Ξ) := exp ln w(x)Ξ(dx) = w(xi) (1.22) Rd i=1

Pk for M ∈ Mλ with density w and Ξ = i=1 δxi , will not make sense. However, if, instead of taking deterministic points x1, . . . , xk, we take random points, with a distribution d k which has a density ψ with respect to Lebesgue measure on (R ) , then writing µψ for the law of the random measure constructed in this way, we have for any M ∈ Mλ

k Z Z  Y  D(M,X)µψ(dX) = ψ(x1, . . . , xk) w(xj) dx1 ··· dxk (1.23) Rd Rd k Mp( ) ( ) j=1 k Z  Y  = ψ(x1, . . . , xk) 1{0}(κj) M(dx1, dκ1) ··· M(dxk, dκk), Rd k ( ×{0,1}) j=1 which is well-defined (and independent of the representative w of the density of M). The following property will therefore be very useful for the main result of this section, Proposition 1.7.

Lemma 1.6. Suppose that the distribution of Ξ0 has the form µψ, for some k ≥ 1 and some density function ψ on (Rd)k. Then for every t ≥ 0 and every j ∈ {1, 2,...}, 1 j conditionally on Nt = j, the law of (ξt , . . . , ξt ) is absolutely continuous with respect to Lebesgue measure on (Rd)j.

Proof of Lemma 1.6. The desired property follows from the fact that during every event of Πe N or Πe S, the distribution of each “potential parent” is uniformly distributed over the area of the event, independently of the current locations of the atoms of Ξs. Hence, each time a point from Ξs is removed, the one or two atoms that are added have a location whose law is absolutely continuous with respect to Lebesgue measure on Rd, while the

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 14/89 The spatial Lambda-Fleming-Viot process with selection

thinning procedure used to remove the points already in Ξs preserves the property that the distribution of the locations of the remaining atoms has a density with respect to Lebesgue measure.

d k Setting for every vector of k locations (x1, . . . , xk) ∈ (R )

k X Rd Ξ[x1, . . . , xk] := δxi ∈ Mp( ), (1.24) i=1 writing Ξ0 ∼ µψ to denote the fact that the random variable Ξ0 has law µψ, and recalling that P (resp., P) is the probability measure on the space on which (Mt)t≥0 (resp., (Ξt)t≥0) is defined, we can now state the following result, whose proof is given in Section 3. 0 d k Proposition 1.7. Let M ∈ Mλ, k ∈ {1, 2,...} and let ψ be a density function on (R ) .

0 Then any solution (Mt)t≥0 to the BMλ [0, ∞)-martingale problem for (L, δM ) satisfies: for every t ≥ 0 Z  0  0  E D(Mt,X) | M0 = M µψ(dX) = E D(M , Ξt) | Ξ0 ∼ µψ . (1.25) d Mp(R )

Equivalently, by Fubini’s theorem and (1.23),

k  Z  Y   EM 0 ψ(x1, . . . , xk) wt(xj) dx1 ··· dxk Rd k ( ) j=1 Z  Nt  Y 0 j = ψ(x1, . . . , xk)EΞ[x1,...,xk] w ξt dx1 ··· dxk. (1.26) Rd k ( ) j=1

Remark 1.8. By linearity, (1.26) also holds for every ψ ∈ L1((Rd)k). In addition, to keep the notation simple we have restricted our attention to deterministic initial values M 0, but the proof of Proposition 1.7 shows that a similar duality formula holds when M 0 is any Mλ-valued random variable. See in particular (3.12), which only needs to be integrated with respect to the law of M 0 to yield the result. Remark 1.9. One may try to use Proposition 1.7 to prove uniqueness of the solution to the martingale problem for (G, µψ), which is satisfied by (Ξt)t≥0. To this end, in the statement of Proposition 1.7, we would like to replace the process (Ξt)t≥0 of Definition 1.4 by any process (Ξet)t≥0 solving the same martingale problem (1.16). However, in contrast with the explicit construction of Ξ which immediately yields Lemma 1.6, one cannot see from the martingale problem formulation that at any time t ≥ 0, conditionally on d d Ξet(R ), the law of the locations in R of the atoms of Ξet is absolutely continuous with Rd respect to Lebesgue measure on (Rd)Ξet( ). But this property is crucial to the proof of Proposition 1.7, and therefore we cannot prove that (1.26) holds more generally than for the process Ξ of Definition 1.4.

1.3 Convergence of the rescaled SLFVS to Fisher-KPP processes Now that we have introduced the spatial Λ-Fleming-Viot process with selection and its dual process of “potential ancestors”, we turn to the main questions of this work: can we recover the solution to the deterministic or the stochastic Fisher-KPP equation as a scaling limit of the SLFVS, and how does the introduction of (a particular form of) rare but geographically extended extinction-recolonisation events impact the law of the limiting process under analogous scaling assumptions? Recall that the Fisher-KPP equation is a classical model for the wave of advance of a slightly favourable allele in a very dense population, in which individuals reproduce locally, so that changes in local

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 15/89 The spatial Lambda-Fleming-Viot process with selection allele frequencies are continuous in time and space. In our framework, this corresponds to focusing on a regime of parameters in which selective events are rare compared to neutral events, the impact of every event (i.e., the fraction of the local population actually affected by the event) is very small, and the event radii have a bounded variance. Therefore, writing n for a parameter that we shall let tend to infinity, in what follows we shall assume that there exist δ, γ > 0 such that the relative frequency of selective events to neutral events scales like n−δ, and the impact of every event scales like n−γ . Furthermore, in the first case that we consider below, all events will have the same radius (but the assumption of bounded radii would lead to the same type of results), and this assumption will be relaxed in the second case we consider. Since we are interested in the patterns of variation that we see under this model if we look over large spatial and temporal scales, we shall need a third parameter β ≥ 0 to describe the relevant spatial scale to be considered: time will be scaled by a factor n when space will be scaled by a factor nβ. Let us be more precise about our assumptions. First, we concentrate on the particular case in which the intensity measures of the Poisson point processes of reproduction events (see Definition 1.3) satisfy 0 0 µ (dr)νr(du) = snµ(dr)νr(du) (1.27) −δ for a parameter sn of the form σn , with σ > 0 independent of n. That is, the distribution of radii and impacts are the same for neutral and selective events, but neutral events happen nδ/σ times faster than events during which type 1 individuals are favoured. We also choose very special forms for the measures µ(dr) and νr(du). Our results will certainly hold under much more general conditions, but the proofs become obscured by notation. More precisely, we assume that all events (neutral and selective) have impact −γ un = un , where u > 0 is independent of n. In formulae: 0 νr(du) = νr(du) = δun (du) for every r > 0, (1.28) 0 implying in particular that µ = snµ by (1.27). The assumptions that u σ u = , and s = (1.29) n nγ n nδ mirror the usual assumptions in the classical Moran and Wright-Fisher models, in the absence of spatial structure, in which one is interested in the scaling limits that are obtained as population size N tends to infinity while NsN remains O(1) (see, e.g., Chapter 5 in [16]). We shall consider the following two cases:

• Fixed radius: µ(dr) = δR(dr), for some fixed R > 0. In this case, we choose γ = 1/3, δ = 2/3, β = 1/3 and set (in any dimension) Z n 1 β  1 wt (x) := Mnt B(n x, R) × {0} = wnt(y) dy, (1.30) VR VR B(n1/3x,R)

where we recall that VR stands for the volume of a d-dimensional ball of radius R. n 1/3 −1/3 Writing wt (·) = wnt(n ·) and Bn(x) = B(x, n R), we see that d/3 Z n n n wt (x) = wt (x), VR Bn(x) and so this scaling corresponds to scaling down the spatial coordinate by nβ = n1/3 (so that distance one in the new units corresponds to distance n1/3 in the original units), and to considering the timescale (nt, t ≥ 0). The random variable n wt (x) gives the local proportion of individuals of the unfavoured type 0 in a small neighbourhood (of radius n−1/3R) of the point x and at time t in these new units.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 16/89 The spatial Lambda-Fleming-Viot process with selection

• Stable radii: For some α ∈ (1, 2), we set

1{r≥1} µ(dr) = dr, (1.31) rd+α+1

and Z n 1 β  1 wt (x) := Mnt B(n x, 1) × {0} = wnt(y) dy, (1.32) V1 V1 B(nβ x,1) with 1 α − 1 α β = , γ = and δ = . (1.33) 2α − 1 2α − 1 2α − 1

n In both cases, we write M t for the random measure (taking its values in Mλ) with n density wt . It is straightforward to check that the integrability conditions (1.7) are satisfied; in particular, the indicator function 1{r≥1} in the definition of µ in the stable case prevents microscopic events from accumulating at a rate which would violate these conditions. Consequently, the unscaled Mλ-valued process corresponding to each n is n well-defined, and so is its scaled and locally averaged version (M t )t≥0. Note however n n that the process M = (M t )t≥0 is not Markovian. Indeed, it is not simply obtained by a change in space and time coordinates of the measures (Mt)t≥0 (with parameters sn, un, n ...) but its density wt at any time is defined as an average over a ball of fixed radius R (or 1 in the stable case) of the density of Mnt. Therefore, the law of the “parental” type(s) picked during an event cannot be expressed in terms of a sampling from the current n n value of M and, additionally, the change in the value of each wt (y) due to an event centered in B(x, r) will depend on the geometry of the intersection B(nβy, R) ∩ B(x, r). n Rd Hence, the evolution of quantities of the form hwt , fi, with f ∈ Cc( ), cannot be fully n described in terms of M t . Remark 1.10. We recover the parameters for the fixed radius case from those for stable radii on setting α = 2, and so there is some sort of continuity between the two regimes. In the fixed radius case, we are able to provide an informal argument which explains why our choice for the parameters β, γ, δ is appropriate (cf. Section 4). These heuristics also partly explain the choice of the parameter values in the stable case. The missing condition on β, γ, δ in this case is less intuitive and arises from a generator calculation, see also Section 4.

Recall that the space Mλ is equipped with the topology of vague convergence. Let ∞ Rd Rd Cc ( ) denote the set of all smooth compactly supported functions on and recall the notation hw, fi from (1.4). Our main results are as follows, starting with the case of “local” reproduction. n Theorem 1.11 (Fixed radius). Suppose that (M 0 )n≥1 converges in distribution to some n M0 ∈ Mλ. Then, as n → ∞, the process (M t )t≥0 converges weakly in DMλ [0, ∞) towards ∞ ∞ a Markov process (Mt )t≥0 with continuous sample paths, starting at M0 = M0. The limiting process is characterised as follows. Let Z Z 1 2 ΓR = (z1) dzdx (1.34) VR B(0,R) B(x,R)

(where z1 denotes the first coordinate of z). ∞ (i) When d = 1, (Mt )t≥0 is the unique process for which, for every choice of the ∞ ∞ ∞ R representative ws of the density of Ms at every time s, and for every f, g ∈ Cc ( ),  Z t    f ∞ ∞ uΓR ∞ ∞ ∞ Z := hwt , fi − hw0 , fi − hws , ∆fi − 2Ruσ hws (1 − ws ), fi ds 0 2 t≥0

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 17/89 The spatial Lambda-Fleming-Viot process with selection is a continuous zero-mean martingale with quadratic variation at time t equal to

Z t 2 2 ∞ ∞ 2 4R u hws (1 − ws ), f i ds. 0

Furthermore, the bracket process between Zf and Zg is given by

Z t  f g 2 2 ∞ ∞ Z , Z t = 4R u hws (1 − ws ), fgi ds. 0

∞ (ii) When d ≥ 2, (Mt )t≥0 is the unique (deterministic) process for which, for every ∞ ∞ choice of the representative ws of the density of Ms at every time s, and for every ∞ Rd f ∈ Cc ( ) and t ≥ 0, Z t   ∞ ∞ uΓR ∞ ∞ ∞ hwt , fi = hw0 , fi + hws , ∆fi − uσVR hws (1 − ws ), fi ds. 0 2

Informally, in one space dimension, one can see the time-indexed family of densities ∞ of the limiting process (Mt )t≥0 as a weak solution to the stochastic partial differential equation ∂w uΓ = R ∆w − 2Ruσw(1 − w) + 2Rupw(1 − w) W˙ ∂t 2 (independently of the representative chosen at every time t), where W˙ a space-time white noise. In dimension d ≥ 2, on the other hand, the noise term disappears in the ∞ limit and the time-indexed family of densities of (Mt )t≥0 can be seen as a weak solution to the deterministic Fisher-KPP equation

∂w uΓ = R ∆w − uσV w(1 − w). ∂t 2 R

Remark 1.12. As we shall explain in Section 4, our choice of β = 1/3 = γ and δ = 2/3 is obtained by solving

1 − γ = 2β, 1 − δ − γ = 0, and β = γ.

This set of three equations guarantees that in one dimension, the limiting process M ∞ is solution to the stochastic Fisher-KPP equation. If we replace the last condition by the n inequality 0 < β < γ, then the sequence of processes (M )n≥1 still converges, to a limit which is solution to the deterministic Fisher-KPP equation in any dimension (including d = 1). Theorem 1.11 has a counterpart for the correspondingly rescaled dual process. For every n ∈ N, let (Ξt)t≥0 be the process of Definition 1.4 with parameters µ = δR, 0 0 −2/3 −1/3 µ = snδR, νR = νR = δun , where sn = σn and un = un . (Ξt)t≥0 is thus dual to the unscaled process (Mt)t≥0 with the same parameters, in the sense of Proposition 1.7 (to ease the notation, the dependence on n of these processes is not reported). Now, n define the rescaled process (Ξt )t≥0 so that for every t ≥ 0,

n Nt Nnt n X X Ξ = δ n,i := δn−1/3ξi . (1.35) t ξt nt i=1 i=1

d d Recall that the space Mp(R ) of finite point measures on R is endowed with the d topology of weak convergence, and recall also the definition of the law µψ on Mp(R ) given in the paragraph below (1.22).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 18/89 The spatial Lambda-Fleming-Viot process with selection

Theorem 1.13 (Fixed radius – dual). Let k ∈ {1, 2,...}, ψ be a probability density on Rd k n n ( ) and suppose that for any n ≥ 1, Ξ0 has law µψ. Then, as n → ∞, (Ξt )t≥0 converges ∞ d in distribution in DMp(R )[0, ∞) to a limiting Markov process (Ξt )t≥0 characterised as ∞ follows: Ξ0 has law µψ and ∞ (i) When d = 1, (Ξt )t≥0 is a system of branching and coalescing Brownian motions, in which particles follow independent Brownian motions with variance parameter uΓR, and branch at rate uσVR into two new particles, started at the location of the parent. In addition to branching and diffusing, each pair of particles, independently, also coalesces at rate 4R2u2 times their collision local time. ∞ (ii) When d ≥ 2, (Ξt )t≥0 is a branching Brownian motion (with no coalescence), in which particles follow independent Brownian motions with variance parameter uΓR, and branch at rate uσVR into two new particles, started at the location of the parent. To state the corresponding results for stable radii, we need some more notation. We write Vr(x, y) for the volume of B(x, r) ∩ B(y, r) and define

Z ∞ 1 Vr(y, z) Φ(|z − y|) := d+1+α dr. |z−y| r Vr 2

∞ Rd Now, for every f ∈ Cc ( ) we set Z Dαf(y) = u Φ(|z − y|)(f(z) − f(y))dz. (1.36) Rd

We shall check in Lemma 6.1 that this defines the infinitesimal generator of a symmetric stable process (that is, it is a constant multiple of the fractional Laplacian). Our results for stable radii are then as follows. n Theorem 1.14 (Stable radii). Suppose that M 0 converges in distribution to some M0 ∈ n Mλ. Then, as n → ∞, the process (M t )t≥0 converges weakly in DMλ [0, ∞) towards a ∞ α Markov process (Mt )t≥0 starting at M0. Furthermore, if D denotes the generator of the symmetric α-stable process defined in (1.36), then ∞ (i) When d = 1, (Mt )t≥0 is the unique process for which, for every choice of the ∞ ∞ ∞ R representative ws of the density of Ms at every time s, and for every f, g ∈ Cc ( ),

 Z t    f ∞ ∞ ∞ α 2uσ ∞ ∞ Z := hwt , fi − hw0 , fi − hws , D fi − hws (1 − ws ), fi ds 0 α t≥0 is a continuous zero-mean martingale with quadratic variation at time t equal to

2 Z t 4u ∞ ∞ 2 hws (1 − ws ), f i ds. α − 1 0

Furthermore, the bracket process between Zf and Zg is given by

2 Z t  f g 4u ∞ ∞ Z , Z t = hws (1 − ws ), fgi ds. α − 1 0

∞ (ii) When d ≥ 2, (Mt )t≥0 is the unique (deterministic) process for which, for every ∞ ∞ choice of the representative ws of the density of Ms at every time s, and for every ∞ Rd f ∈ Cc ( ) and t ≥ 0,

Z t   ∞ ∞ ∞ α uσV1 ∞ ∞ hwt , fi = hw0 , fi + hws , D fi − hws (1 − ws ), fi ds. 0 α

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 19/89 The spatial Lambda-Fleming-Viot process with selection

Remark 1.15. Again, our choice of values for β, γ and δ is obtained by solving

1 − γ = αβ, 1 − δ − γ = 0, and (α − 1)β = γ.

(see Section 4) in order to obtain a limiting process M ∞ which is stochastic in one dimension. If we replace the last condition by the inequality 0 < (α − 1)β < γ, then (in n any dimension) (M )n≥0 converges to a deterministic limit which is characterised as in the statement of Theorem 1.14(ii). d Likewise, letting (Ξt)t≥0 be the Mp(R )-valued process which is dual to the unscaled γ process (Mt)t≥0 corresponding to the case of stable radii with parameters un = u/n δ n and sn = σ/n , and defining the rescaled process (Ξt )t≥0 in such a way that for every t ≥ 0, n Nt Nnt n X X Ξ = δ n,i := δn−β ξi , (1.37) t ξt nt i=1 i=1 (with the values of β, γ, δ given in (1.33)), we have the following convergence result. Theorem 1.16 (Stable radii – dual). Let k ∈ {1, 2,...}, ψ be a probability density on Rd k n n ( ) and suppose that for any n ≥ 1, Ξ0 has law µψ. Then, as n → ∞, (Ξt )t≥0 converges ∞ d in distribution in DMp(R )[0, ∞) to a limiting Markov process (Ξt )t≥0 characterised as ∞ follows: Ξ0 has law µψ and ∞ (i) When d = 1, (Ξt )t≥0 is a branching and coalescing stable process, in which particles follow independent symmetric α-stable processes which branch at rate uσV1/α into two particles starting at the location of their parent. The motion of a single particle is fully described by the generator Dα defined in (1.36). In addition, each pair of particles, independently, coalesces at rate 4u2/(α − 1) times their collision local time. ∞ (ii) When d ≥ 2, (Ξt )t≥0 is a branching stable process (with no coalescence), in which particles follow independent symmetric α-stable processes with generator Dα, and branch at rate uσV1/α into two new particles, started at the location of the parent. ∞ In fact, we shall use knowledge of the limiting “population model” (Mt )t≥0 to recover the corresponding limiting results for our rescaled duals. The difficulty with proving Theorems 1.13 and 1.16 directly stems from problems with identifying the limiting coalescence mechanism in one dimension. This contrasts with the situation of uniformly bounded local population densities (i.e., the impact u not tending to zero) considered in [9] in the neutral case and in [18, 19] in the selective case, where it is the ability to identify the limiting behaviour of the (analytically tractable) coalescent dual that allows us to prove results about the large scale evolution of the spatial pattern of allele frequencies. We close this section with a few remarks. First, one may observe from the expression of Dα given in (1.36) that, as in the fixed radius case, the drift component of the limiting process is proportional to u and the quadratic variation is proportional to u2, so that u can be thought of as scaling time. Moreover, the limiting process that we obtain in the stable radius case can be seen as a weak solution to a (stochastic) PDE which only differs from that obtained in the fixed radius case in that the Laplacian has been replaced by the generator of a symmetric stable process. This is, perhaps, at first sight rather surprising. The only effect of the large scale events is on the spatial motion of individuals in the population, and we see no trace of the correlations in their movement, or of the selection or genetic drift acting over large scales, that we have in the prelimiting model. Notice also that the scaling of sn (relative to un) that leads to a nontrivial limit is independent of spatial dimension. In contrast, in [25], the authors consider a different scaling for the parameters and prove a similar convergence result and a central limit theorem, in which the order of magnitude and the limit of the fluctuations around the deterministic

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 20/89 The spatial Lambda-Fleming-Viot process with selection

limiting process are dimension-dependent (despite the fact that the impact un tends to zero, while the dependence on dimension mostly occurs when the impact remains fixed). As remarked above, we would obtain the same results under much more general conditions. For example, in selecting the regions to be affected by events, not only could one take more general measures µ (it is the tail behaviour of µ(dr) that we see in our limits), but also reproduction events do not need to be based on balls. We anticipate that this robustness will also be maintained if one considers more general selection mechanisms, in which the strength and direction of selection depends on the local frequencies of different types in the population, and it should be clear how to modify our proofs in such cases.

1.4 Structure of the paper The rest of the paper is laid out as follows. In Section 2, we prove Theorem 1.2. In Section 3, we prove the duality relation stated in Proposition 1.7. In Section 4, we provide heuristic arguments to explain our rescalings. In Section 5, we turn to proving Theorem 1.11, the scaling limit in the case of fixed radii, and Theorem 1.13 which provides the corresponding result for the rescaled duals. In Section 6, we prove Theorems 1.14 and 1.16, the analogous results for stable radii. In Appendices A and B, we obtain continuity estimates for the rescaled SLFVS of Sections 5 and 6. In particular, these rather technical estimates are key ingredients in (and nice complements to) the proofs of Theorems 1.11 and 1.14.

2 Proof of Theorem 1.2 (existence of the SLFVS)

The strategy of the proof is the following. We start with a version of the process in which Rd is replaced by a hypercube E of finite sidelength and the measures µ and µ0 are assumed to be finite. In this case, the total rate at which events happen is finite and the corresponding process is a well-defined measure-valued Markov jump process with a.s. càdlàg trajectories. We then proceed in two steps:

(i) We show existence when E has finite sidelength but µ and µ0 are only σ-finite, n 0 n n by taking sequences of finite measures (µ )n≥1 and (µ )n≥1 such that µ (dr) converges to µ(dr) (and the same with primes), and proving that the corresponding sequence of processes converges to a well-defined limit.

(ii) Given (i), we extend to Rd by considering a sequence of processes obtained by restricting to an increasing family of hypercubes (En)n≥1 which exhaust the space, and proving that this sequence converges to the process (Mt)t≥0 that we are seeking.

Both steps rely on Theorem 4.8.10 in [21], which states that provided we can show that

(a) The operator L on which the limiting martingale problem is based is included in the set Cb(Mλ) × Cb(Mλ), where Cb(Mλ) is the set of all bounded continuous functions on Mλ; E E (b) The limiting DMλ [0, ∞)-martingale problem for (L,P ) (where P is the distribu- (n) Rd tion of the limit of (M0 )n≥1, in particular P = P ) has at most one solution; (n) (c) For every n, M is a process with sample paths in DMλ [0, ∞) (here we follow n (n) Ethier and Kurtz in taking (Gt )t≥0 to be the natural filtration associated to M ) (n) and the sequence (M )n≥1 is relatively compact;

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 21/89 The spatial Lambda-Fleming-Viot process with selection

(d) There exists a countable set Γ ⊂ [0, ∞) such that for every (F,G) ∈ L,

" Z t+s   k # E (n)  (n) (n) Y (n) lim F Mt+s − F Mt − G Mu du hi Mt = 0 (2.1) n→∞ i t i=1

for all k ≥ 0, 0 ≤ t1 < t2 < ··· < tk ≤ t < t + s with ti, t, t + s∈ / Γ, and hi ∈ Cb(Mλ);

E then there exists a solution (Mt)t≥0 to the DMλ [0, ∞)-martingale problem for (L,P ) (n) and (M )t≥0 ⇒ (Mt)t≥0 as n → ∞. Item (b) will be a consequence of Proposition 1.7 and Remark 1.8, whose proofs are postponed until Section 3 for the sake of clarity. The other items will be checked one by one below. Once this is done, existence of a solution to the martingale problem in DMλ [0, ∞) will imply the existence of a solution in the larger space BMλ [0, ∞), and since uniqueness holds in BMλ [0, ∞) too (see the proof E of item (b) below), this will show that the BMλ [0, ∞)-martingale problem for (L,P ) is well-posed. That is, Theorem 1.2(i) and the property that the trajectories of (Mt)t≥0 are càdlàg a.s. will be proved. Furthermore, since (b) is satisfied for any distribution P E on Mλ, we shall be able to deduce from (a), (b) and Theorem 4.4.2(a) in [21] that the limiting process M is a Markov process with respect to its natural filtration. The last step will consist in showing that its semigroup is Feller. + − Recall the definitions of Θx,r,u(w) and Θx,r,u(w) given in (1.8), and the notation kfk 1 (resp., kfk1) for the supremum (resp., L ) norm of the function f. To simplify notation, we 0 shall restrict our attention to initial distributions P of the form δM 0 for some M ∈ Mλ. Indeed, the extension of (b) to a general P is covered by Remark 1.8, the bounds on the elements of the semi-martingale decomposition on which the proof of (c) rely are independent of the choice of the initial values for the processes of interest, and the proof of (d) can easily be generalised by using the linearity of the expectation and integrating 0 0 all key equations with respect to P (dM ). From now on, we thus fix M ∈ Mλ.

Proof of (i)

Let E be some hypercube with sidelength `, and let µ, µ0 be the σ-finite measures on n 0 n (0, ∞) of Theorem 1.2. Let (µ )n≥1 and (µ )n≥1 be two sequences of finite measures on (0, ∞) such that, as n → ∞,

Z ∞ Z ∞ Z ∞ Z ∞ ϕ(r)µn(dr) % ϕ(r)µ(dr) and ϕ(r)µ0 n(dr) % ϕ(r)µ0(dr), (2.2) 0 0 0 0 for all measurable ϕ ≥ 0. Let Mλ(E) be the analogue of Mλ (see (1.2)) when the d “geographical” space R is replaced by E. That is, Mλ(E) is the set of all measures on E × {0, 1} whose first marginal distribution is Lebesgue measure on E. It is also a compact space when endowed with the topology of vague convergence (note that since E × {0, 1} is compact, the topology of vague convergence is the same as the topology of weak convergence). Let

0 0 ME := M ∈ Mλ(E) E×{0,1} be the measure induced by M 0 (the initial value fixed above) on E × {0, 1}, and for every N,n S,n R n ≥ 1, let ΠE and ΠE be independent Poisson point processes on ×E ×(0, ∞)×[0, 1] n 0 n 0 with respective intensity measures dt ⊗ dx ⊗ µ (dr)νr(du) and dt ⊗ dx ⊗ µ (dr)νr(du). (n) N N,n S Finally, let (Mt )t≥0 be defined as in Definition 1.3, with Π replaced by ΠE , Π S,n (n) 0 replaced by ΠE , and with M0 = ME (what we call the ball B(x, r) in this case is N,n BE(x, r) := B(x, r) ∩ E). For a given n ≥ 1, the total rate at which an event of ΠE or

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 22/89 The spatial Lambda-Fleming-Viot process with selection

S,n ΠE happens is

Z Z ∞ Z 1 Z Z ∞ Z 1 n 0 0 n νr(du)µ (dr)dx + νr(du)µ (dr)dx E 0 0 E 0 0 = Vol(E)µn((0, ∞)) + µ0 n((0, ∞)) < ∞, (2.3) and so M (n) is a Markov jump process with jump rates uniformly bounded by the quantity (n) 0 in (2.3) and with càdlàg paths, solution to the martingale problem: M0 = ME and for every F ∈ C1(R) and f ∈ C(E),

 Z t  (n) (n) (n) (n) ΨF,f Mt − ΨF,f M0 − L ΨF,f Ms ds 0 t≥0

(n) is a martingale (for the natural filtration associated to M ), where ΨF,f is defined as (n) in (1.5) and the bounded continuous function L ΨF,f is defined by

Z Z ∞ Z 1 Z (n) 1 h + L ΨF,f (M) = w(y)F (hΘx,r,u(w), fi) (2.4) E 0 0 BE (x,r) Vol(BE(x, r)) − i n + (1 − w(y))F (hΘx,r,u(w), fi) − F (hw, fi) dy νr(du) µ (dr) dx Z Z ∞ Z 1 Z 1 h +  + 2 w(y)w(z)F hΘx,r,u(w), fi 2 E 0 0 BE (x,r) Vol(BE(x, r)) − i 0 0 n + (1 − w(y)w(z))F (hΘx,r,u(w), fi) − F (hw, fi) dy dz νr(du) µ (dr) dx, for every M ∈ Mλ(E). (As earlier, here w is any representative of the density of M and ± we have kept the notation Θx,r,u for the change in w during an event even though w is now defined on E only.) (n) Let us show that as n → ∞, M converges in distribution in DMλ(E)[0, ∞) to the (∞) (∞) unique solution M to the D [0, ∞)-martingale problem for (L , δ 0 ), where Mλ(E) ME L(∞) is defined as in (2.4) with µn and µ0 n respectively replaced by µ and µ0. We check items (a) − (d) one by one, with L = L(∞) whose domain D(L(∞)) is taken to be the set 1 of all functions of the form ΨF,f with f ∈ C(E) and F ∈ C (R). For item (a), observe that since every mapping w that we consider takes its values ± in [0, 1] (and so does its image by any Θx,r,u) and E is compact, for every f ∈ C(E) and every x ∈ E, r > 0 and u ∈ [0, 1], we have

± hΘx,r,u(w), fi − hw, fi ≤ ukfkVol(BE(x, r)). (2.5)

Consequently, for any F ∈ C1(R), by Taylor’s theorem we have ! ± 0 d F (hΘx,r,u(w), fi) − F (hw, fi) ≤ sup |F (z)| ukfkCdr , (2.6) |z|≤kfkVol(E) where Cd is a constant that depends only on the dimension d. Writing

+ − w(y)F (hΘx,r,u(w), fi) + (1 − w(y))F (hΘx,r,u(w), fi) − F (hw, fi) + − ≤ w(y) F (hΘx,r,u(w), fi) − F (hw, fi) + (1 − w(y)) F (hΘx,r,u(w), fi) − F (hw, fi) ! 0 d ≤ sup |F (z)| ukfkCdr , |z|≤kfkVol(E)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 23/89 The spatial Lambda-Fleming-Viot process with selection we obtain that !  Z ∞ Z 1 (∞) 0 d L ΨF,f (M) ≤ sup |F (z)| kfkCdVol(E) ur νr(du)µ(dr) |z|≤kfkVol(E) 0 0 Z ∞ Z 1  d 0 0 + ur νr(du)µ (dr) , (2.7) 0 0 and the quantity on the r.h.s. is a finite constant independent of M by Condition (1.7). (∞) (∞) This result proves that the operator L is indeed well defined on D(L ). Since Mλ(E) is a compact subset of the set of all measures on E × {0, 1} and since functions of the form ΨF,f are continuous on the latter, each ΨF,f belongs to Cb(Mλ(E)). Recalling (1.5), (∞) we can rewrite L ΨF,f (M) in terms of the measure M as follows:

(∞) L ΨF,f (M) (2.8) Z Z ∞ Z 1 Z   Z = F f(z)1{0}(k)M(dz, dk) E 0 0 BE (x,r)×{0,1} E×{0,1} Z Z  − u f(z)1{0}(k)M(dz, dk) + u1{0}(κ) f(z)dz BE (x,r)×{0,1} BE (x,r)  Z  M(dy, dκ) − F f(z)1{0}(k)M(dz, dk) νr(du)µ(dr)dx E×{0,1} Vol(BE(x, r)) Z Z ∞ Z 1 Z   Z + F f(z)1{0}(k)M(dz, dk) 2 E 0 0 (BE (x,r)×{0,1}) E×{0,1} Z Z  0 − u f(z)1{0}(k)M(dz, dk) + u1{0}(κ ∨ κ ) f(z)dz BE (x,r)×{0,1} BE (x,r)  Z  ⊗2 0 0 M (dy, dκ, dy , dκ ) 0 0 − F f(z)1{0}(k)M(dz, dk) 2 νr(du)µ (dr)dx. E×{0,1} Vol(BE(x, r)) Using the Dominated Convergence Theorem and (2.7), it is then straightforward to check (∞) that if (Ml)l≥0 is a sequence in Mλ(E) converging to M, then L ΨF,f (Ml) converges (∞) (∞) to L ΨF,f (M) as l → ∞ and the function L ΨF,f is (sequentially) continuous on (∞) Mλ(E). Together with (2.7), this implies that L ΨF,f ∈ Cb(Mλ(E)) and item (a) is proved. Item (b) is a consequence of the exact analogue of Proposition 1.7 in which the d “geographical” space R is replaced by E (and Mλ by Mλ(E)). Indeed, by Lemma 2.1(c) in [49], the linear span of the set of constant functions and of functions of the form

k Z  Y  M 7→ ψ(x1, . . . , xk) w(xj) dx1 ··· dxk, (2.9) Rd k ( ) j=1

(where M has density w) for k ≥ 1 and ψ ∈ L1((Rd)k) ∩ C((Rd)k), is dense in the set of d all continuous functions on the compact space Mλ(E) (and the same holds with E = R ). This set of functions is therefore separating on the space of all probability distributions on Mλ(E). We can then proceed exactly as in the proof of Proposition 4.4.7 in [21] to 0 conclude that for every M ∈ Mλ(E), uniqueness holds for the BMλ [0, ∞)-martingale (∞) problem for (L , δM 0 ) (or more generally for any distribution for the initial value M0). Indeed, in short (1.26) allows us to conclude that any two solutions to the mar- tingale problem have the same one-dimensional distributions, and then Theorem 4.4.2 in [21] gives us that these two solutions necessarily have the same finite dimensional distributions and thus uniqueness in BMλ [0, ∞) holds. Item (b) is proved. (n) We now turn to item (c), the relative compactness of (M )n≥1. Since Mλ(E) (equipped with the topology of vague convergence) is a compact space and since by

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 24/89 The spatial Lambda-Fleming-Viot process with selection

(∞) Lemma 1.1 the set D(L ) is dense in C(Mλ(E)), by Theorem 3.9.1 in [21] the relative (n) compactness of (M )n≥1 is equivalent to the relative compactness of the sequence of (n) (∞) 1 real-valued processes (ΨF,f (M ))n≥1 for all ΨF,f ∈ D(L ). Thus let F ∈ C (R) and n f ∈ C(E). Using the standard Aldous-Rebolledo criterion [2, 43] and writing (Φt )t≥0 (n) n for the predictable finite variation part of ΨF,f (M ) and (Qt )t≥0 for the predictable quadratic variation of its martingale part, we only have to show that

(n) (1) For every t ≥ 0, the sequence (ΨF,f (Mt ))n≥1 is tight. (2) For every T > 0, given a sequence of stopping times (τn)n≥1 bounded by T , for every ε > 0 there exists δ > 0 such that

lim sup sup P  Φn − Φn > ε ≤ ε, τn+θ τn (2.10) n→∞ θ∈[0,δ]

and lim sup sup P  Qn − Qn > ε ≤ ε. τn+θ τn (2.11) n→∞ θ∈[0,δ]

(1) is straightforward, since for any potential density w we have |hw, fi| ≤ kfkVol(E) and (n) F is continuous, which implies that ΨF,f (Mt ) is bounded uniformly in n and t. To deal (n) (n) with (2), for every time t we fix a representative wt of the density of Mt . Since each (n) (Mt )t≥0 is a Markov jump process with bounded jump rates, we have Z t n (n) (n) Φt = L ΨF,f (Ms )ds 0 and Z t Z Z ∞ Z 1 Z n 1 n (n)  + (n) (n) 2 Qt = ws (y) F (hΘx,r,u(ws ), fi) − F (hws , fi) 0 E 0 0 BE (x,r) V (BE(x, r)) (n)  − (n) (n) 2o n + (1 − ws (y)) F (hΘx,r,u(ws ), fi) − F (hws , fi) dy νr(du) µ (dr) dx ds Z t Z Z ∞ Z 1 Z 1 n (n) (n)  + (n) + 2 ws (y)ws (z) F (hΘx,r,u(ws ), fi) 2 0 E 0 0 BE (x,r) V (BE(x, r)) (n) 2 − F (hws , fi) (n) (n)  − (n) (n) 2o 0 0 n + (1 − ws (y)ws (z)) F (hΘx,r,u(ws ), fi) − F (hws , fi) dy dz νr(du) µ (dr) dx ds.

Using the expression for L(n) given in (2.4) and the bound (2.6), as in (2.7) we obtain that for every M ∈ Mλ(E),

!  Z Z ∞ Z 1 (n) 0 d n L ΨF,f (M) ≤ C sup |F (z)| kfk ur νr(du)µ (dr)dx |z|≤kfkVol(E) E 0 0 Z Z ∞ Z 1  d 0 0 n + ur νr(du)µ (dr)dx E 0 0  Z ∞ Z 1 Z ∞ Z 1  0 d d 0 0 ≤ C ur νr(du)µ(dr) + ur νr(du)µ (dr) , (2.12) 0 0 0 0 where we have used that E has finite volume and, by assumption, µn(dr) % µ(dr) (and the corresponding statement with primes). By Condition (1.7), the expression on the r.h.s. is finite (and independent of M), and so with probability 1 we have for every θ > 0

Φn − Φn ≤ C00θ, τn+θ τn

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 25/89 The spatial Lambda-Fleming-Viot process with selection where C00 is independent of n (and even of T ). Hence, it suffices to choose δ > 0 small enough for (2.10) to hold. n Similarly, the integrands in the expression of Qt are bounded by !2 C00 sup |F 0(z)| kfk2u2 min(r2d, `2d), (2.13) |z|≤kfkVol(E)

2 where ` is the sidelength of E. But u ≤ u and there exists a constant CE < ∞ such that 2d 2d d min(r , ` ) ≤ CEr , and so the same reasoning shows that (2.11) holds too for δ > 0 (n) small enough. The relative compactness of every sequence (ΨF,f (M )) is proved, and (n) the relative compactness of (M )n≥1 in DMλ(E)[0, ∞) thus follows. 1 Finally, we prove item (d). Let F ∈ C (R) and f ∈ C(E). Let also k ≥ 0, 0 ≤ t1 < t2 < (n) ··· < tk ≤ t < t + s and hi ∈ Cb(Mλ(E)) for all i ∈ {1, . . . , k}. Since M satisfies the (n) martingale problem for (L , δ 0 ), we have ME " Z t+s   k # E (n)  (n) (n) (n) Y (n) ΨF,f Mt+s − ΨF,f Mt − L ΨF,f Mu du hi Mti = 0. t i=1 (2.14) We can thus write " Z t+s   k # E (n)  (n) (∞) (n) Y (n) ΨF,f Mt+s − ΨF,f Mt − L ΨF,f Mu du hi Mti t i=1 " k # Z t+s      E (n) (n) (∞) (n) Y (n) = 0 + L ΨF,f Mu − L ΨF,f Mu du hi Mti . t i=1 (2.15)

Using (2.2) and applying the estimate (2.12) with the positive measures µ(dr)νr(du) − n 0 0 0 n 0 µ (dr)νr(du) and µ (dr)νr(du) − µ (dr)νr(du), we see that we can use the Dominated Convergence Theorem to argue that the second term on the r.h.s. of (2.15) converges to 0 as n → ∞. Hence, (2.1) holds true with Γ = ∅ and item (d) is proved. As detailed at the beginning of this section, we can therefore conclude that there (∞) (∞) exists a unique solution M to the B [0, ∞)-martingale problem for (L , δ 0 ) Mλ(E) ME and this process is Markov with respect to its natural filtration and has càdlàg paths a.s. Concerning the Feller property of the semigroup of M (∞), the fact that for every t ≥ 0 E (∞) and every ϕ ∈ C(Mλ(E)), M 7→ M [ϕ(Mt )] is a continuous function is a consequence of the continuity in M 0 of the quantity on the r.h.s. of (1.26) (which is more easily seen in (1.23) when we replace ψ by the density at time t – conditional on Nt – of 1 Nt the locations of the atoms ξt , . . . , ξt ordered in some arbitrary way) and the property already mentioned in item (b) that the linear span of the set of constant functions and of functions of the form (2.9) is dense in C(Mλ(E)). The strong continuity of the semigroup (∞) is a consequence of the fact, proved in item (d), that for every ΨF,f ∈ D(L ), we have for every t ≥ 0 and every M ∈ Mλ(E)  Z t  E  (∞) E (∞) (∞) M ΨF,f Mt − ΨF,f (M) = M L ΨF,f Ms ds . 0

Together with the uniform bound (2.7), it shows that there exists CF,f > 0 such that for every t ≥ 0 E  (∞) sup M ΨF,f Mt − ΨF,f (M) ≤ CF,f t, M∈Mλ(E) and the quantity on the l.h.s. indeed converges to 0 as t → 0. This property can then be extended to any ϕ ∈ C(Mλ(E)) by Lemma 1.1. The proof of (i) is thus complete.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 26/89 The spatial Lambda-Fleming-Viot process with selection

Proof of (ii)

The proof of (ii) follows exactly the same pattern, but now the task of bounding the integrals defining Φn and Qn becomes more delicate. The resolution is to exploit the fact that f has compact support Sf . 0 0 Let µ, µ , ν and ν satisfy (1.7) and let {En}n≥1 be a sequence of hypercubes increasing d d to R . We embed each Mλ(En) into Mλ = Mλ(R ) by setting w(x) ≡ 0 outside En. [n] For every n ≥ 1, let M denote the Mλ-valued Markov process obtained by imposing [n] that M should evolve like the SLFVS on En × {0, 1} obtained in (i), and En×{0,1} [n] [n] c N [n] M c = dx c ⊗ δ1(dκ) (i.e., w ≡ 0 on En). For each n ∈ , M is an En×{0,1} En a.s. càdlàg process and we assume that it starts from the measure M 0 obtained by En restricting M 0 to E (as in (i)) and by assuming that its “density” w0 is 0 outside n En E (obviously, M 0 converges vaguely to M 0 as n → ∞). According to the previous n En 1 d paragraph, it satisfies the property that for every F ∈ C (R) and every f ∈ Cc(R ),

 Z t  [n] [n] [n] [n] ΨF,f Mt − ΨF,f M0 − L ΨF,f Ms ds (2.16) 0 t≥0 is a martingale, where

[n] L ΨF,f (M) Z ∞ Z Z 1 Z 1 h = w(y)F (hΘ+ (w), fi) Vol(B (x, r)) n,x,r,u 0 (Sf +B(0,r))∩En 0 BEn (x,r) En − i + (1 − w(y))F (hΘn,x,r,u(w), fi) − F (hw, fi) dy νr(du) dx µ(dr) Z ∞ Z Z 1 Z 1 h +  + 2 w(y)w(z)F hΘn,x,r,u(w), fi 2 Vol(B (x, r)) 0 (Sf +B(0,r))∩En 0 BEn (x,r) En −  i 0 0 + (1 − w(y)w(z))F hΘn,x,r,u(w), fi − F (hw, fi) dy dz νr(du) dx µ (dr). (2.17)

Here we have written Sf + B(0, r) := {x + y : x ∈ Sf , y ∈ B(0, r)} (motivated by the fact that if the centre of an event of radius r does not belong to this set, then the event does not intersect the support of f and therefore it does not affect the value of ΨF,f (M)) ± and we have chosen to report the dependence of the operations Θn,x,r,u on n since they modify the value of w only within En. The key observation is that

d |h1B(x,r)w, fi| ≤ kfkVol(Sf ∩ B(x, r)) ≤ C1kfk(r ∧ 1), (2.18) and

d Vol(Sf + B(0, r)) ≤ C2(r ∨ 1), (2.19)

where C1 and C2 are independent of r and depend only on the support of f. Moreover, the estimate (2.18) is uniform in w and, in particular, the same bound holds if we replace w by 1 − w.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 27/89 The spatial Lambda-Fleming-Viot process with selection

To see how to apply this, consider the part of (2.17) corresponding to neutral events. We split the integral over (0, ∞) at some radius R0 > 1. We have that

Z ∞ Z Z 1 Z 1 h + w(y)F (hΘ (w), fi) Vol(B (x, r)) n,x,r,u R0 (Sf +B(0,r))∩En 0 BEn (x,r) En

− i + (1 − w(y))F (hΘ (w), fi) − F (hw, fi) dy νr(du) dx µ(dr) n,x,r,u Z ∞ Z Z 1 0 ≤ C3(F , f) uVol(B(x, r) ∩ Sf ) νr(du) dx µ(dr) R0 (Sf +B(0,r))∩En 0 Z ∞ Z 1 0 d ≤ C4(F , f)Vol(Sf ) ur νr(du)µ(dr), (2.20) R0 0

0 0 0 where C3(F , f) and C4(F , f) depend only on F and f and the last line uses (2.19) and the fact that Vol(B(x, r) ∩ Sf ) ≤ Vol(Sf ). To control the second part of the integral corresponding to the neutral part, notice that a simple estimate using the fact that the corresponding events have radius bounded above by R0, yields

Z R0 Z Z 1 Z 1 h + w(y)F (hΘ (w), fi) Vol(B (x, r)) n,x,r,u 0 (Sf +B(0,r))∩En 0 BEn (x,r) En

− i + (1 − w(y))F (hΘ (w), fi) − F (hw, fi) dy νr(du) dx µ(dr) n,x,r,u Z R0 Z Z 1 0 ≤ C3(F , f) uVol(B(x, r) ∩ Sf ) νr(du) dx µ(dr) 0 (Sf +B(0,r))∩En 0 Z R0 Z 1 0 d ≤ C5(F , f)Vol(Sf + B(0,R0)) ur νr(du)µ(dr), (2.21) 0 0

d where we have used (2.18) to bound Vol(B(x, r) ∩ Sf ) by Cr (independently of x), and then we have bounded the remaining integral of dx over (Sf + B(0, r)) ∩ En by Vol(Sf + B(0,R0)). Observe that both bounds (2.20) and (2.21) are finite by Condition (1.7), and they are independent of M (or w). Exactly the same arguments control the selection part of the generator L[n]. Furthermore, the same bounds apply if we replace the operator L[n] by L defined in (1.9). Consequently, we can proceed as in (i) to prove that L is included in Cb(Mλ) × Cb(Mλ), which was item (a) to check. Item (b) is a direct consequence of Proposition 1.7 and of the same arguments as in the analogous part of the proof of [n] (i). The fact that each M is a process with sample paths in DMλ [0, ∞) is part of the conclusion of (i). Next, the estimates (2.20) and (2.21) enable us to proceed as in the [n] proof of item (c) for (i) to show that the sequence (M )n≥1 is relatively compact, which proves item (c) for (ii). To check item (d), notice that by Condition (1.7), by taking R0 sufficiently large, the right hand side of (2.20) can be made arbitrarily small, uniformly in M. This is enough to ensure that the missing contribution of the events centered outside En is negligible, that is that Z ∞ Z Z 1 Z 1 h + w(y)F (hΘx,r,u(w), fi) c Vol(B(x, r)) 0 (Sf +B(0,r))∩En 0 B(x,r)

− i + (1 − w(y))F (hΘ (w), fi) − F (hw, fi) dy νr(du) dx µ(dr) x,r,u Z ∞ Z 1 d ≤ CVol(Sf ) ur νr(du)µ(dr) → 0 (2.22) c d(Sf ,En) 0

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 28/89 The spatial Lambda-Fleming-Viot process with selection

c uniformly in M (or w) as n → ∞, where d(Sf ,En) is the minimal distance between a c point of Sf and a point of En (which tends to infinity as n tends to infinity). The same estimates hold for the selection term, and can also be used to control the error due to ± ± the vanishing difference between hΘx,r,u(w), fi and hΘn,x,r,u(w), fi (the latter modifying w on B(x, r) ∩ En only). We can then argue as in (2.15) to conclude. Since items (a) − (d) are now checked, we can conclude that there exists a unique 0 measurable process (Mt)t≥0 such that M0 = M and satisfying the martingale prob- lem (1.11), and this process is Markov and has càdlàg paths a.s. The Feller property of its semigroup can then be derived using the same arguments as in the proof of (i), recalling the uniform bound on L obtained in (2.20) and (2.21) (see the paragraph following (2.21)).

3 Proof of Proposition 1.7 (duality)

To prove Proposition 1.7, we first show that we can extend the operator L to a larger class of functions on Mλ and that any solution to the BMλ [0, ∞)-martingale problem for (L, δM 0 ) stated in (1.11) satisfies the corresponding extended martingale problem. We then use this new set of test functions to complete the proof of Proposition 1.7. 1 d k For every k ∈ {1, 2,...} and ψ ∈ L ((R ) ), let us define the function Dψ by: for every M ∈ Mλ with density w,

k Z  Y  Dψ(M) := ψ(x1, . . . , xk) 1{0}(κj) M(dx1, dκ1) ··· M(dxk, dκk) Rd k ( ×{0,1}) j=1 k Z  Y  = ψ(x1, . . . , xk) w(xj) dx1 ··· dxk. (3.1) Rd k ( ) j=1

Let us also set

LDψ(M) Z Z ∞ Z 1 Z Z   1 Y Y  := ψ(x1, . . . , xk) w(xj) w(y) (1 − u)w(xj) + u Rd Vr Rd k 0 0 B(x,r) ( ) j∈Ic j∈I  Y  Y + (1 − w(y)) (1 − u)w(xj) − w(xj) dx1 ... dxkdyνr(du)µ(dr)dx j∈I j∈I Z Z ∞ Z 1 Z 1 Z  Y  + 2 ψ(x1, . . . , xk) w(xj) Rd 2 V Rd k 0 0 B(x,r) r ( ) j∈Ic   Y  Y  Y × w(y)w(z) (1 − u)w(xj) + u + (1 − w(y)w(z)) (1 − u)w(xj) − w(xj) j∈I j∈I j∈I 0 0 dx1 ... dxkdydzνr(du)µ (dr)dx, (3.2)

where we use the notation I := {i : xi ∈ B(x, r)} and the convention that a product over d the empty set is equal to 1. Note that for every ψ ∈ Cc(R ), the function Dψ coincides with the function ΨId,ψ defined in (1.5) and, likewise, the function LDψ defined in (3.2) coincides with the function LΨId,ψ defined in (1.9). To see this, let us observe that the first part of (3.2) can be rewritten (using the convention that the product over an empty

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 29/89 The spatial Lambda-Fleming-Viot process with selection set is equal to 1, and then Fubini’s theorem to pass from the first line to the next)

Z Z ∞ Z 1 Z Z 1   ψ(x1)1{x1∈B(x,r)} w(y) (1 − u)w(x1) + u Rd 0 0 B(x,r) Vr Rd  + (1 − w(y))(1 − u)w(x1) − w(x1) dx1dyνr(du)µ(dr)dx Z Z ∞Z 1Z 1 n  = w(y) 1B(x,r) (1 − u)w + u , ψ + (1 − w(y)) 1B(x,r)(1 − u)w, ψ Rd 0 0 B(x,r) Vr o − 1B(x,r)w, ψ dyνr(du)µ(dr)dx, which is equal to the first integral on the r.h.s. of (1.9) when F = Id and f = ψ. The same reasoning can be made on the second part of (3.2). The following result shows that the expression in (3.2) in fact extends the operator L defined in (1.9) to all functions of the form Dψ.

Lemma 3.1. Let (Mt)t≥0 be a solution to the martingale problem (1.11). Then for every k ≥ 1 and ψ ∈ L1((Rd)k),

 Z t  Dψ(Mt) − Dψ(M0) − LDψ(Ms)ds (3.3) 0 t≥0 is a martingale.

Proof of Lemma 3.1. We have already checked that the desired property held true for d k = 1 and ψ ∈ Cc(R ), since then Dψ = ΨId,ψ and LDψ = LΨId,ψ. We first extend the result to ψ ∈ L1(Rd) by a density argument, and then consider the case k ≥ 2. To complete both parts of the programme, we need a general bound on functions of 1 d m the form LDψ that we derive now. Let m ≥ 1 and ψe ∈ L ((R ) ). For every M ∈ Mλ (with density w), the expression for LD (M) given in (3.2) can be rewritten as ψe

Z Z ∞ Z 1 Z 1 Z  Y  ψe(x1, . . . , xm) w(xj) Rd Vr Rd m 0 0 B(x,r) ( ) j∈Ic    X |J| |I\J| Y |I|  Y × (1 − u) u w(xj) w(y) + (1 − u) − 1 w(xj) J⊂I,J6=I j∈J j∈I

dx1 ... dxmdyνr(du)µ(dr)dx Z Z ∞ Z 1 Z 1 Z  Y  + 2 ψe(x1, . . . , xm) w(xj) Rd 2 V Rd m 0 0 B(x,r) r ( ) j∈Ic     X |J| |I\J| Y |I|  Y × w(y)w(z) (1 − u) u w(xj) + (1 − u) − 1 w(xj) J⊂I,J6=I j∈J j∈I 0 0 dx1 ... dxkdydzνr(du)µ (dr)dx.

Bounding w by 1 and using the facts that

X |J| |I\J| |I| (1 − u) u = 1 − (1 − u) ≤ u|I|1{|I|≥1} ≤ mu1{B(x,r)∩{x1,...,xm}6=∅} J(I and d  d Vol {x ∈ R : B(x, r) ∩ {x1, . . . , xm}= 6 ∅} ≤ mCdr

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 30/89 The spatial Lambda-Fleming-Viot process with selection for a constant C depending only on d, we obtain that the first term in |LD (M)| is d ψe bounded by

Z Z ∞ Z 1 Z Z 1 ψe (x1, . . . , xm)1{B(x,r)∩{x1,...,xm}6=∅}2mu Rd 0 0 B(x,r) Vr (Rd)m

dx1 ··· dxmdyνr(du)µ(dr)dx Z ∞ Z 1 Z 2 d ≤ ψe (x1, . . . , xm)2m Cdr u dx1 ··· dxmνr(du)µ(dr) 0 0 (Rd)m Z ∞ Z 1 2 d = 2m Cdkψek1 r u νr(du)µ(dr). (3.4) 0 0

We can then bound the second part of |LD (M)| in a similar way and obtain that ψe

 Z ∞ Z 1 Z ∞ Z 1  |LD (M)| ≤ 2m2C kψk rdu ν (du)µ(dr) + rdu ν0 (du)µ0(dr) , (3.5) ψe d e 1 r r 0 0 0 0 and the expression on the r.h.s. is finite by Condition (1.7). d 1 d 1 Using the fact that Cc(R ) is dense in L (R ) (for the L norm), the bound (3.5) and dominated convergence, we can then use the same approach as in the proof of items (d) in Section 2 (see Equations (2.14) and (2.15)) to conclude that the process in (3.3) is indeed a martingale when ψ ∈ L1(Rd). Let us now consider k ≥ 2. Any integrable function ψ on (Rd)k can be approximated 1 (in L norm) by linear combinations of functions of the product form ψ1(x1) ··· ψk(xk) d with ψi ∈ Cc(R ) for every i. Furthermore, by polarisation, the test function

k k Y  Z  Y D⊗ψi (M) = ψi(xi)1{0}(κi)M(dxi, dκi) = hw, ψii (3.6) Rd i=1 ×{0,1} i=1 can in turn be written as a linear combination of functions of the form hw, fim, with d m ∈ N and f ∈ Cc(R ), for which we can use (1.9) to obtain

Z Z ∞ Z 1 Z 1 h m LΨ(·)m,f (M) = w(y) 1B(x,r)c w + 1B(x,r)((1 − u)w + u), f Rd 0 0 B(x,r) Vr m mi + (1 − w(y)) 1B(x,r)c w + 1B(x,r)(1 − u)w, f − hw, fi dyνr(du)µ(dr)dx Z Z ∞ Z 1 Z 1 h m c + 2 w(y)w(z) 1B(x,r) w + 1B(x,r)((1 − u)w + u), f Rd 0 0 B(x,r)2 Vr m mi + (1 − w(y)w(z)) 1B(x,r)c w + 1B(x,r)(1 − u)w, fi − hw, f 0 0 dy dzνr(du)µ (dr)dx. (3.7)

Qm On the other hand, taking ψ(x1, . . . , xm) = i=1 f(xi) in (3.1), we obtain that

m Dψ(M) = hw, fi .

Let us thus show that, in this case, the expression on the r.h.s. of (3.2) coincides with (3.7). We focus on the first term on the r.h.s. of (3.2), since the computations are the same for the other terms. For fixed x, r, u, y, and writing B for B(x, r) to simplify the

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 31/89 The spatial Lambda-Fleming-Viot process with selection notation, we have Z    Y Y  f(x1) ··· f(xm) w(xi) (1 − u)w(xj) + u dx1 ... dxm (Rd)m j:xj ∈/B j:xj ∈B X Z  Y  Y  = f(x1) ··· f(xm) 1{xj ∈/B}w(xj) 1{xj ∈B}((1 − u)w(xj) + u) Rd m J⊆{1,...,m} ( ) j∈J c j∈J

dx1 ... dxm X m−|J| |J| = 1Bc w, f 1B((1 − u)w + u), f J⊆{1,...,m} m   X m m−j j  m = 1 c w, f 1 ((1 − u)w + u), f = 1 c w + 1 (1 − u)w + u , f , j B B B B j=0 which coincides with the integrand in the first part of (3.7). Checking that the same holds for the three other parts of (3.7), we can conclude that the two expressions for the action of L on functions of the form hw, fim coincide. Consequently, the process d in (3.3) is a martingale for ψ form f ⊗ · · · ⊗ f with f ∈ Cc(R ), and by linearity for ψ of d the form ψ1 ⊗ · · · ⊗ ψk with ψi ∈ Cc(R ) for every i. Using the same density argument as in the case k = 1, together with the bound (3.5), we can finally conclude that the process in (3.3) is a martingale for every k ≥ 1 and every ψ ∈ L1((Rd)k), and Lemma 3.1 is proved.

Remark 3.2. Note that either of these two sets of test functions, (1.5) or (3.1), is sufficient to characterise the law of the SLFVS (see Lemma 1.1 for the first set, and Lemma 2.1(c) in [49] for the second), and so we can use them interchangeably. In particular, the family (1.5) will be more convenient in proving the convergence of our rescaled Mλ-valued processes, whereas the duality relation that will give us the uniqueness of the limit is based on the family (3.1). Armed with Lemma 3.1, we can now prove Proposition 1.7.

Proof of Proposition 1.7. Despite the fact that Theorem 4.4.11 in [21] does not directly apply, we follow its proof closely (with α ≡ 0 ≡ β). Let (Mt)t≥0 be solution to the martingale problem (1.11). For every s, t ≥ 0, let

h  i F (s, t) := EM 0 E D(Ms, Ξt) | Ξ0 ∼ µψ .

By Lemma 1.6, since Ξ0 has law µψ, at every time t ≥ 0 the locations of the atoms of Ξt have a joint distribution which is absolutely continuous with respect to Lebesgue (n) measure. Let us write ψt for the density of these points conditionally on the event {Nt = n}. We thus have, by (1.23),

  Z  Nt   (Nt)  Y E 0 F (s, t) = M E ψt x1, . . . , xNt ws(xj) dx1 ··· dxNt Ξ0 ∼ µψ (3.8) Rd N ( ) t j=1 h i E   = E M 0 D (Nt) (Ms) Ξ0 ∼ µψ , ψt where the last line uses Fubini’s theorem. Since for every n ∈ {1, 2,...} we have (n) L1 Rd n ψt ∈ (( ) ), we can use Lemma 3.1 and write that   Z s  

E 0 F (s, t) − F (0, t) = E M LDψ(Nt) (Mτ )dτ Ξ0 ∼ µψ . (3.9) 0 t

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 32/89 The spatial Lambda-Fleming-Viot process with selection

On the other hand, for any fixed s ≥ 0 and any t ≥ 0, we can rewrite the expression on the r.h.s. of (3.8) as

  Nt  Y j h h  ii EM 0 E ws ξ Ξ0 ∼ µψ = EM 0 E exp hΞt, ln wsi Ξ0 ∼ µψ t j=1 h h ii E 0 = M E Φexp,ln ws (Ξt) Ξ0 ∼ µψ , (3.10) where we have fixed a representative ws of the density of Ms (since by (1.23), the r.h.s. of (3.10) is independent of the choice of this representative) and Φexp,ln ws is defined as in (1.13) with F = exp and f = ln ws. Here we use the convention that Φexp,ln ws (Ξ) = 0 whenever at least one of the atoms x of Ξ is such that ws(x) = 0. With this convention, the definition of GΦexp,ln ws (Ξ) given in (1.14) still makes sense, the function Φexp,ln ws takes its values in [0, 1] and the same bound as in (1.19) controls the expectation of 0 Q i GΦexp,ln ws (Ξτ ) for every τ ∈ [0, t] (since in this case F (hΞτ , fi) = i ws(ξτ ) ∈ [0, 1]). We may therefore use Proposition 1.5 to write that   Z t 

E 0 F (s, t) − F (s, 0) = M E GΦexp,ln ws (Ξτ )dτ Ξ0 ∼ µψ . (3.11) 0 It remains to show that for every s, t ≥ 0, h   i h  i E 0 E 0 E M LD (Nt−s) (Ms) Ξ0 ∼ µψ = M E GΦexp,ln ws (Ξt−s) Ξ0 ∼ µψ , (3.12) ψt−s so that we may use Lemma 4.4.10 in [21] to conclude that F (t, 0) = F (0, t), which is equivalent to (1.25). Using (1.14) and shortening the notation Ix,r(Ξt−s) into I, we have,

GΦexp,ln ws (Ξt−s) Z Z ∞ Z 1 Z   1 Y j = ws(ξt−s) Rd Vr 0 0 B(x,r) j∈Ic "  # X |D| |I\D| Y i Y i × u (1 − u) ws(y) ws(ξt−s) − ws(ξt−s) dyνr(du)µ(dr)dx D⊆I;|D|≥1 i∈Dc i∈I Z Z ∞ Z 1 Z  " 1 Y j X |D| |I\D| + 2 ws(ξt−s) u (1 − u) Rd 2 V 0 0 B(x,r) r j∈Ic D⊆I;|D|≥1  # Y i Y i 0 0 × ws(y)ws(z) ws(ξt−s) − ws(ξt−s) dydzνr(du)µ (dr)dx i∈Dc i∈I Z Z ∞Z 1Z  " ! 1 Y j X |D| |I\D| Y i = ws(ξt−s) ws(y) u (1 − u) ws(ξt−s) Rd Vr 0 0 B(x,r) j∈Ic D⊆I;|D|≥1 i∈Dc # |I| Y i − 1 − (1 − u) ws(ξt−s) dyνr(du)µ(dr)dx i∈I Z Z ∞ Z 1 Z  " 1 Y j + 2 ws(ξt−s) ws(y)ws(z) Rd 2 V 0 0 B(x,r) r j∈Ic ! # X |D| |I\D| Y i |I| Y i × u (1 − u) ws(ξt−s) − 1 − (1 − u) ws(ξt−s) D⊆I;|D|≥1 i∈Dc i∈I 0 0 dydzνr(du)µ (dr)dx. (3.13)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 33/89 The spatial Lambda-Fleming-Viot process with selection

(Nt−s) On the other hand, using (3.2) and writing ψt−s for ψt−s to ease the notation, we obtain

LDψt−s (Ms) Z Z ∞ Z 1 Z 1 Z  Y  = ψt−s(x1, . . . , xNt−s ) ws(xj) Rd Vr Rd Nt−s 0 0 B(x,r) ( ) j∈Ic   Y  Y  Y × ws(y) (1 − u)ws(xj) + u + (1 − ws(y)) (1 − u)ws(xj) − ws(xj) j∈I j∈I j∈I

dx1 ... dxNt−s dyνr(du)µ(dr)dx Z Z ∞ Z 1 Z 1 Z  Y  + 2 ψt−s(x1, . . . , xNt−s ) ws(xj) Rd 2 V Rd Nt−s 0 0 B(x,r) r ( ) j∈Ic  Y  Y  × ws(y)ws(z) (1 − u)ws(xj) + u + (1 − ws(y)ws(z)) (1 − u)ws(xj) j∈I j∈I  Y 0 0 − ws(xj) dx1 ... dxNt−s dydzνr(du)µ (dr)dx. (3.14) j∈I

Now, the first integral in the above is equal to

Z Z ∞ Z 1 Z 1 Z Y ψt−s(x1, . . . , xNt−s ) ws(xj) Rd Vr Rd Nt−s 0 0 B(x,r) ( ) j∈Ic " # X |D| |I\D| Y |I| Y × ws(y) u (1 − u) ws(xj) − 1 − (1 − u) ws(xj) D⊆I;|D|≥1 j∈Dc j∈I

dx1 ... dxNt−s dyνr(du)µ(dr)dx. (3.15)

Taking the expectation of (3.15) with respect to E[ · |Ξ0 ∼ µψ], we obtain that it is equal to the expectation of the first term on the r.h.s. of (3.13). The same holds for the expectation of the second part of (3.14) and that of the second part of (3.13). Taking E then the expectation with respect to M0 (and using Fubini’s theorem), we arrive at the desired equality (3.12). Hence, we can use Lemma 4.4.10 in [21] to conclude that

F (t, 0) = F (0, t), and (1.25) is proved.

4 Heuristics for the large-scale behaviour of the SLFVS and its dual process

In this section, we provide an informal justification of our choices for the parameters β, γ and δ in our scalings. Recall from the introduction that we should like to establish scalings of the selection and impact parameters, sn and un, for which selection will leave a trace in the long-term evolution of the population, without leading to an instantaneous invasion by the favoured allele. In particular, we wish to complement the work of [18, 19], in which the impact (or fraction of the local population replaced during an event) is kept of order O(1) while the selection coefficient goes to 0 as the scaling parameter n tends to infinity, modelling a population in which the local densities of individuals are low and therefore any event leads to the replacement of a macroscopic fraction of the local population. In our work, we always assume that local population densities are very high and reproduction impacts only a very small fraction of the individuals present in

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 34/89 The spatial Lambda-Fleming-Viot process with selection the affected area (i.e., we assume that u  1). We consider long timescales (nt, t ≥ 0) and identify the orders of magnitude of un and sn, and the corresponding spatial scale (nβx, x ∈ Rd) over which the dynamics of the population will converge as n → ∞ to a Fisher-KPP type evolution. In passing, we shall also see why there is no way to obtain a stochastic limit in more than one dimension when we assume that both the impact un of each event and the relative frequency sn of selective to neutral events tend to 0, and we shall justify the claims made in Remarks 1.12 and 1.15 about the range of parameters leading to a deterministic Fisher-KPP process in one dimension. Let us start with the case of local reproduction. As usual in the spatial Lambda- Fleming-Viot framework, it is easier to first think about the corresponding scaled dual processes. Chapter 7 in [35] suggests that if there is a regime of parameters in which the SLFVS converges to the solution to the Fisher-KPP equation, and in one dimension to its stochastic counterpart, then the sequence of corresponding dual processes should converge to a branching Brownian motion in which, in dimension 1, pairs of particles coalesce at a rate proportional to their collision local time (we explain the correspondence between our frameworks in the paragraph on uniqueness of the limit in the proof of Theorem 1.11, see Section 5.1). Let us thus analyse the different types of event which can affect the dual process when time is sped up by n and space is scaled down by nβ, focusing first on what happens to a single particle. During a neutral event, if this −γ particle is marked (with probability un = O(n )), then it is removed and replaced by another particle whose distribution is uniformly distributed in the region of the event (of radius Rn−β in our new units). We see this as a jump of the particle. Because the component of the intensity of ΠN corresponding to the centres of events is Lebesgue measure on Rd, when the particle jumps, the location of the centre of the corresponding event is uniformly distributed in the ball of radius Rn−β around the current location x of the particle. Consequently, the position of the particle “after the jump” belongs to B(x, 2Rn−β) a.s. and has a radially symmetric distribution around x. Summing up the above, in our new units a single particle jumps at rate O(n1−γ ) and makes mean zero, finite variance, jumps of size bounded by a constant times 1/nβ. For this jump process to converge to some non-trivial process (Brownian motion, in fact) as n → ∞, we thus have to assume that 1 − γ = 2β. (4.1)

Now consider what happens at a selective event. Again our particle is marked with probability O(n−γ ), and in this case, the two particles which replace it are created at a separation of order O(n−β). Consequently, they may be overlapped by a new event very −1 2 −2γ quickly (after a time of order O(n )), and then with probability un = O(n ) they are both erased and replaced by a single “parental” particle (we see this type of event as a “coalescence”). In the limit as n → ∞, we will only “see” the branching event before it is erased by such a coalescence if the two particles have positive probability of moving apart to a distance of order one before (perhaps) coalescing. Let us find conditions under which we can expect this to hold. In our new timescale, each particle is overlapped by an event at rate O(n). The probability that only one of the two particles is marked during such an event, and therefore “jumps” to a location at distance O(n−β) while the other stands still, is un−γ (1− O(n−γ )). Furthermore, as soon as the two particles are at distance larger than 2Rn−β, they cannot be overlapped by the same event and so they jump independently of each other according to a continuous time random walk. Hence, what we actually have to understand is how many times the two particles come back to a separation less than 2Rn−β before they manage to move apart to a separation of O(1). Indeed, the same type of analysis as the one carried out in the proof of Lemma 6.6 in [7] (see Lemma 5.6 and below in Section 5 of the present work) shows that when they come together, the

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 35/89 The spatial Lambda-Fleming-Viot process with selection two particles remain at distance less than 2Rn−β during a number of events affecting them of order O(1) (which translates into a number of events simply overlapping them of order O(nγ )). Hence, the probability that they are both affected by an event and coalesce before separating again to a distance more than 2Rn−β is of the same order as the probability of coalescence during a single event conditionally on at least one of the particles being marked, which is O(n−2γ /n−γ ) = O(n−γ ). From this, we can conclude in particular that the two particles will need to come back “together” O(nγ ) times before they have a positive probability of both being affected by the same event and therefore coalescing into common ancestral particle(s). When they are more than 2Rn−β apart, the two particles jump independently accord- ing to a continuous time symmetric random walk with step sizes of order O(n−β), and so the separation between them is also a symmetric random walk with step size of this order. We are interested in the probability that during an excursion away from B(0, 2Rn−β), the difference walk reaches a distance of order 1, i.e. nβ times larger than its initial value. It is convenient to work in our original space units. For a symmetric continuous-time random walk with step size of order O(1), starting at distance slightly larger than 2R from 0, the probability of reaching distance nβ from 0 before reentering B(0, 2R) has the same order as the probability that the number of steps to come back within B(0, 2R) is larger than n2β. This in turn will have the same order as the corresponding quantity for simple symmetric random walk. Using Proposition 5.1.1 in [34] when d = 1, Theorem 1 in [45] when d = 2 and the transience of simple symmetric random walk when d ≥ 3, we obtain that in one dimension, this probability is of order O(n−β); in two dimensions, this probability is of order O(1/ ln n); in dimension d ≥ 3, this probability tends to p ∈ (0, 1) as n → ∞. Consequently, when d ≥ 2 the probability that the two particles come back together O(nγ ) times before they separate to a distance of O(1) tends to 0 and, in the limit, a given branching event is never followed by instantaneous coalescence. Since 1−δ−γ branching events happen at a rate O(nsnun) = O(n ), we need to impose that

1 − δ − γ = 0 (4.2) if they are to occur at rate O(1) in the limit. On the other hand, in one dimension, we see that if β > γ, with probability tending to one, the two particles will coalesce back together before they can separate to a distance of O(1) and in the limit, all branching events are cancelled (i.e., there is no branching in the limiting dual). If β < γ, the two particles become separated at distance O(1) before they have any chance to coalesce, and all branching events are conserved in the limit; in contrast coalescence will never be seen in the limit for particles starting at any separation. Finally, when

β = γ (4.3) the particles have positive probability of separating to a distance of O(1) before coalesc- ing, but coalescence of particles happens in finite time a.s. in the limit. In the last two cases, we also need to impose Condition (4.2) for branching events to occur at rate O(1) in the limit. Solving the system given by Conditions (4.1), (4.2) and (4.3), we obtain β = 1/3 = γ and δ = 2/3 as specified in Section 1.3. Observe that if we keep Conditions (4.1) and (4.2) and replace Condition (4.3) by β < γ, (4.4) even in one dimension two particles can branch but not coalesce in the limit and, as explained in the paragraph on uniqueness of the limit in the proof of Theorem 1.11 (see Section 5.1), the limit of the SLFVS is the (measure-valued) solution to the deterministic Fisher-KPP equation.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 36/89 The spatial Lambda-Fleming-Viot process with selection

Remark 4.1. In two dimensions, our heuristics suggest that if instead of scaling space β by n , we were to scale it by a more general factor kn, then the relation between the parameters allowing the number of excursions necessary for the particles to separate to distance O(kn) to be of the same order as the number of times the particles need to come “together” before they coalesce is

ln kn ≈ 1/un. (4.5)

(Here we use ≈ to mean that the two quantities are of the same order of magnitude.) 2 Together with the relations nun ≈ kn ensuring that the limiting motion is Brownian motion, and nunsn ≈ 1 guaranteeing that branching occurs at rate O(1) in the limit, this gives us that n r n ln kn ≈ 2 , and so kn ≈ . (4.6) kn ln n But with this choice of parameters, the bound (5.24) we shall establish for the predictable n quadratic variation of F (hw· , fi) reads

2 2 d −2d kn 1 unnkn × kn ≈ ≈ → 0 as n → ∞. (4.7) n ln kn Therefore, something more subtle happens here and even with a more general form of scaling of space, the limiting process of allele frequencies is still deterministic. We now turn to the stable case. As before, we first consider the “jumps” of a single particle. Again, such a jump occurs at rate O(n1−γ ), and its new position is chosen uniformly over a ball whose radius is given by the intensity measure µ with polynomial αβ decay described in (1.33). Consequently, if we choose nun ∝ n , i.e.

1 − γ = αβ, (4.8) then in the limit as n → ∞ the jump process will converge to a symmetric α-stable process with index α. Second, in order to see any branching of particles due to selective events at all, as before we need nsnun to be order one, that is Condition (4.2) to be fulfilled. Finally, let us consider the simultaneous removal of two particles, to be replaced by “parental particle(s)” (what we called a coalescence earlier). Since un → 0 as n → ∞, although it is now the case that two particles can always be affected by the same event (the radii of the events are not bounded), “most of the time” they will not and their jumps are almost independent. Consequently, the difference of their positions is also approximately described by an α-stable process. Now, because events of radius O(1) (in our original units) are much more frequent than events of large radii O(na) for any a > 0, and the probability that both particles belong to the fraction of the local population 2 2 −2γ replaced during an event is tiny (un = u n ), if coalescence is to happen in the limit, then we expect it to be driven by the smaller events. Note that because there is no bound on the event radii, we can no longer perform the same decomposition into excursions away from some ball as in the fixed radius case. The following argument only gives the intuition behind Lemma 6.4 in Section 6, which allows us to control the coalescence rate of the two particles. In more than one dimension, the rotation-invariant α-stable processes with α ∈ (1, 2) are transient (see Example 37.19(ii) in [46]), and so as in the fixed radius case, this tells us that the two particles do not spend enough time close together for coalescence to occur, whatever our choice of β, γ, α consistent with the previous conditions. In one dimension, we have not found a simple heuristic explanation for the last condition on the parameters (which one would expect to be analogous to the comparison between the number and lengths of visits in a neighbourhood of zero for the difference process, and

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 37/89 The spatial Lambda-Fleming-Viot process with selection the coalescence rate of the two particles, carried out in the fixed radius case). Instead, the condition γ = (α − 1)β (4.9) will emerge when we control the second term on the r.h.s. of (6.9), which corresponds to the variance term in the limiting process (and thus to the coalescence term in the dual process). See also Equations (6.15) and (6.19) and the surrounding paragraphs. In the end, we have three equations in three unknowns (in one dimension) and solving gives the values in Equation (1.33). As in the fixed radius case, we may replace Condition (4.9) by (α − 1)β < γ (4.10) and obtain a limiting dual in which, in any dimension, branching occurs at rate O(1) but coalescence never occurs. As a final comment, notice that in this work we have chosen a particular form for β the parameters un and sn, given in (1.29), and for the scaling of space (by n ). We have argued that, within this particular framework (which nonetheless covers a wide range of scenarios), in more than one dimension it was not possible to find values for β, γ, δ ∞ such that the martingale problem characterising the limiting process (Mt )t≥0 contains a Laplacian (or fractional Laplacian) term, a drift term due to the slight advantage of type 1 individuals and a martingale term corresponding to the noise in the Fisher-KPP- like equation satisfied by the process. Of course this does not prove that other forms of parameters and spatial scalings would not yield a stochastic limit even under the assumption that (un)n≥1 and (sn)n≥1 tend to 0 as n tends to infinity that we have imposed. One way to investigate this question is to use the correspondence between the fact that the limiting process is stochastic and the property that the events that we loosely call “coalescence of particles” have positive probability to happen in the limiting dual process (a general property of continuous-site stepping-stone models, see Section 5 in [22]). We have not been able to find scalings for which, in the limiting dual, particles may “move in space”, “branch” and “coalesce” (even in a non-local way), with positive probability when d ≥ 2. In fact, due to the facts that rotation-invariant α-stable processes are transient for d > α and that coalescence happens un times more slowly than movement of particles before taking the limit, we conjecture that such scalings do not exist, even in the case of α-stable radii. We leave this delicate question to the interested reader.

5 Convergence of the rescaled SLFVS and its dual – the fixed ra- dius case

In this section, we prove Theorem 1.11 and, from it, deduce Theorem 1.13.

5.1 Proof of Theorem 1.11 The proof proceeds in the usual way. First, we show that the sequence of non- n Markovian processes (M )n≥1 is tight in DMλ [0, ∞) and that any limit point has a.s. continuous trajectories. Next, we prove that any limit point M ∞ satisfies the martingale problem stated in Theorem 1.11. Finally, we show that there is at most one solution in DM [0, ∞) to this martingale problem (again thanks to a duality argument), which will λ n allow us to conclude that indeed (M )n≥1 converges to this solution. 1) Tightness and continuity of the limit. First, let n ≥ 1. Since the (unscaled) SLFVS (Mt)t≥0 with parameters given in (1.27), (1.28), (1.29) and µ = δR has sample paths in DM [0, ∞) by Theorem 1.2, so has the n λ locally averaged and scaled process M (recall that its density is defined by Rela- tion (1.30)).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 38/89 The spatial Lambda-Fleming-Viot process with selection

n Let us now show that the sequence (M )n≥1 is relatively compact. We proceed as in the proof of Theorem 1.2 and refer to the paragraph Proof of (i), item (c), in Section 2 for a more detailed justification of the steps taken below. Again, due to the compactness of Mλ endowed with the topology of vague convergence and the fact that the set of 3 R ∞ Rd functions of the form ΨF,f with F ∈ C ( ) and f ∈ Cc ( ) is dense (for the supremum 1 d norm) in the set of functions of the form ΨF,f with F ∈ C (R) and f ∈ Cc(R ), and is therefore dense in C(Mλ) by Lemma 1.1, Theorem 3.9.1 in [21] tells us that it suffices to n show the relative compactness of the sequence of real-valued processes (ΨF,f (M ))n≥1 3 R ∞ Rd for every F ∈ C ( ) and f ∈ Cc ( ). Second, for each such function ΨF,f , we use the Aldous-Rebolledo criterion [2, 43] to reduce the problem to tightness of the sequences of the predictable finite variation parts and of the predictable quadratic variation of the n martingale parts of (ΨF,f (M t ))t≥0. More precisely, since ΨF,f is a bounded function on n Mλ, we directly have that for every t ≥ 0, the sequence (ΨF,f (M t ))n≥1 is tight. Writing n n (At )t≥0 (resp., (Qt )t≥0) for the finite variation part (resp., the quadratic variation of n the martingale part) of (ΨF,f (M t ))t≥0, it remains to prove that for every T > 0, every sequence of stopping times (τn)n≥1 bounded by T , and every ε > 0, there exists η > 0 such that lim sup sup P|An − An | > ε ≤ ε, τn+θ τn (5.1) n→∞ θ∈[0,η] and lim sup sup P|Qn − Qn | > ε ≤ ε. τn+θ τn (5.2) n→∞ θ∈[0,η] In what follows, we first establish an expression for the processes An and Qn (see (5.12) and (5.13)), and then we prove (5.1) and (5.2) by decomposing these expressions into several parts that we control separately. n To find expressions for the predictable finite and quadratic variation parts of ΨF,f (M ) for any fixed n ≥ 1, let us begin by considering the unscaled SLFVS (Mt)t≥0 with reproduction events of fixed radius R, and parameters un, sn (notice that for simplicty we have suppressed the dependence of (Mt)t≥0 on n in the notation). Also, the function ∞ Rd F will be fixed but for a moment we replace the function f by any function ϕ ∈ Cc ( ). A judicious choice of ϕ will then yield the desired function of the scaled and locally n averaged process M . ∞ Rd Let ϕ ∈ Cc ( ). By Theorem 1.2, we know that before scaling space and time, the extended generator of the SLFVS, acting on the function ΨF,ϕ, is given by Z Z 1 n LΨ (M) = w(y)(1 + s w(z))F (hΘ+ (w), ϕi) − F (hw, ϕi) (5.3) F,ϕ 2 n x,R,un Rd B(x,R)2 VR o + (1 − w(y) + s (1 − w(y)w(z)))F (hΘ− (w), ϕi) − F (hw, ϕi) dydzdx, n x,R,un where w is a representative of the density of M. This gives us that the predictable finite variation part of (ΨF,ϕ(Mt))t≥0 is Z t At = LΨF,ϕ(Ms) ds, t ≥ 0. (5.4) 0 2 Furthermore, the martingale problem stated in Theorem 1.2 applies to ΨF 2,ϕ = F (h·, ϕi) , which allows us to obtain (using Itô’s formula) that the predictable quadratic variation of (ΨF,ϕ(Mt))t≥0 at any time t ≥ 0 is given by

t Z Z Z 1 n 2 Q = w (y)(1 + s w (z))F (hΘ+ (w ), ϕi) − F (hw , ϕi) (5.5) t 2 s n s x,R,un s s 0 Rd B(x,R)2 VR 2o + (1 − w (y) + s (1 − w (y)w (z)))F (hΘ− (w ), ϕi) − F (hw , ϕi) dydzdxds. s n s s x,R,un s s

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 39/89 The spatial Lambda-Fleming-Viot process with selection

n n Let us now consider the Markov process (Mt )t≥0 whose density at time t is wt (·) := 1/3 wnt(n ·). We set −1/3 Bn(x) = B(x, n R) (5.6) and write w(x) = nd/3V −1 R w(z)dz. In particular, in the notation of Section 1.3 we R Bn(x) have for every t ≥ 0

d/3 Z Z n n 1 n wt (z)dz = wnt(y)dy = wt (x). (5.7) 1/3 VR Bn(x) VR B(n x,R) From our expression for L, accelerating time by a factor n and performing several changes of the spatial variables, we obtain that the extended generator of M n is given by

n L ΨF,ϕ(M) Z Z 1 n −1/3 −1/3  + = n w(n y) 1 + snw(n z)) F (hΘ −1/3 −1/3 (w), ϕi) 2 n x,n R,un Rd B(x,R)2 VR − F (hw, ϕi) −1/3 −1/3 −1/3  − + (1 − w(n y) + sn(1 − w(n y)w(n z))) F (hΘ −1/3 −1/3 (w), ϕi) n x,n R,un o − F (hw, ϕi) dydzdx

d Z n 1+ 3  +  = n w(x)(1 + snw(x)) F (hΘx,n−1/3R,u (w), ϕi) − F (hw, ϕi) Rd n 2  − o + 1 − w(x) + sn(1 − w(x) ) F (hΘ −1/3 (w), ϕi) − F (hw, ϕi) dx. (5.8) x,n R,un

n R t n n The predictable finite variation part of (ΨF,ϕ(Mt ))t≥0 is thus ( 0 L ΨF,ϕ(Ms )ds)t≥0. Likewise, its predictable quadratic variation at time t is equal to

t d Z Z n 2 1+ 3 n n  + n n  n w (x)(1 + snw (x)) F (hΘ −1/3 (w ), ϕi) − F (hw , ϕi) s s x,n R,un s s 0 Rd n n 2  − n n 2o + 1 − w (x) + sn(1 − w (x) ) F (hΘ −1/3 (w ), ϕi) − F (hw , ϕi) dxds. (5.9) s s x,n R,un s s

Finally, it remains to evaluate the above expressions with ϕ of the form

nd/3 Z ϕf (x) = f(y)dy (5.10) VR Bn(x) ∞ Rd for our fixed f ∈ Cc ( ) and to use the fact that, by Fubini’s Theorem, Z Z d/3 n n n n hw , ϕf i = w (y) f(z)1{|z−y|≤n−1/3R}dydz = hw , fi, (5.11) Rd Rd VR n to obtain that the predictable finite variation part of (ΨF,f (M t ))t≥0 is given by Z t n n n At = L ΨF,ϕf Ms ds, (5.12) 0 n with L ΨF,ϕf as in (5.8), and its predictable quadratic variation is given by

t d Z Z n 2 n 1+ 3 n n  + n n  Q = n w (x)(1 + snw (x)) F (hΘ −1/3 (w ), ϕf i) − F (hw , ϕf i) t s s x,n R,un s s 0 Rd n n 2  − n n 2o + 1 − w (x) + sn(1 − w (x) ) F (hΘ −1/3 (w ), ϕf i) − F (hw , ϕf i) dxds. s s x,n R,un s s (5.13)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 40/89 The spatial Lambda-Fleming-Viot process with selection

Note that

+ hΘ −1/3 (w), ϕf i − hw, ϕf i = un h1B (x)(1 − w), ϕf i x,n R,un n − hΘ −1/3 (w), ϕf i − hw, ϕf i = −un h1B (x)w, ϕf i, (5.14) x,n R,un n

−d/3 so that both increments are of the order of unn . Moreover, f has compact support d Sf in R and thus so has ϕf . This will enable us to control the integrals over space of these increments. n Using this observation, we first show that |At | is bounded by a constant independent of n. To this end, we write it as the sum of a neutral term and a selective term and perform a Taylor expansion of F (truncating at second order in the neutral term and at first order in the selective term). This yields, for any t ≥ 0, Z t n  At = An(s) + Bn(s) + Cn(s) + Dn(s) + En(s) ds, (5.15) 0 where

d Z h 1+ 3 0 n n n An(s) = unn F (hws , fi) ws (x)h1Bn(x)(1 − ws ), ϕf i Rd n n i − (1 − ws (x))h1Bn(x)ws , ϕf i dx, 00 n d F (hw , fi) Z h 2 1+ 3 s n n 2 Bn(s) = unn ws (x)h1Bn(x)(1 − ws ), ϕf i 2 Rd n n 2i + (1 − ws (x))h1Bn(x)ws , ϕf i dx,

d Z 3 1+ 3  Cn(s) ≤ Cn unVol(Bn(x)) 1{Bn(x)∩Sf 6=∅}dx, Rd

d Z h 1+ 3 0 n n 2 n Dn(s) = unsnn F (hws , fi) ws (x) h1Bn(x)(1 − ws ), ϕf i Rd n 2 n i − (1 − ws (x) )h1Bn(x)ws , ϕf i dx,

d Z 0 1+ 3 2 2 En(s) ≤ C n snun Vol(Bn(x)) 1{Bn(x)∩Sf 6=∅}dx, (5.16) Rd for some constant C, C0 independent of n and s. To control these expressions, we take a Taylor expansion of ϕf . We illustrate with the term An(s). In fact, in identifying the limiting process we shall need a precise expression for the limit of An(s) and so we perform the expansion slightly more carefully than would be required to simply conclude boundedness. Let us write Dϕf for the vector of first derivatives of ϕf and Hϕf for the corre- sponding Hessian (Hϕf = DDϕf ). Recall that Sf denotes the compact support of f. Then

d Z h i 1+ 3 0 n n n An(s) = unn F (hws , fi) ws (x)h1Bn(x), ϕf i − h1Bn(x)ws , ϕf i dx (5.17) Rd d/3 d Z n ZZ 1+ 3 0 n n = unn F (hws , fi) 1{|y−x|≤n−1/3R}1{|z−x|≤n−1/3R}ws (y) Rd VR

× (ϕf (z) − ϕf (y))dzdydx d/3 d Z n Z Z 1+ 3 0 n n = unn F (hws , fi) 1{|y−x|≤n−1/3R}1{|z−x|≤n−1/3R}ws (y) Rd VR Rd Rd 1 × Dϕ (y)(z − y) + (z − y)Hϕ (y)(z − y) + O(|z − y|3)1 dzdydx. f 2 f {y∈Sf }

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 41/89 The spatial Lambda-Fleming-Viot process with selection

Consider the first term on the right. Integrating first with respect to x (using Fubini’s Theorem) this term is

1+ 2d Z Z unn 3 0 n n F (hws , fi) ws (y) Vol(Bn(y) ∩ Bn(z))Dϕf (y)(z − y)dzdy, (5.18) VR Rd Rd and since Vol(Bn(y) ∩ Bn(z)) is a function of |z − y| alone, the integrand is antisymmetric as a function of z − y and so the integral with respect to z vanishes. Similarly, the integrals corresponding to the off-diagonal terms in the Hessian will vanish, leaving

d/3 d Z n Z Z 1+ 3 0 n n unn F (hws , fi) 1{|y−x|≤n−1/3R}1{|z−x|≤n−1/3R}ws (y) Rd VR Rd Rd d 1 X ∂2 × (z − y )2 ϕ (y)dydzdx (5.19) 2 i i ∂y2 f i=1 i

∞ Rd plus a lower order term. Now observe that since f ∈ Cc ( ), another Taylor expansion argument enables us to write that

2 2 ∂ ∂ f −2/3 2 2 ϕf (y) = ϕ ∂ f (y) = 2 (y) + O(n )1{Bn(y)∩Sf 6=∅} (5.20) ∂y ∂y2 ∂y i i i

(where the term O(n−2/3) is independent of y). This yields

d/3 d Z n Z Z 1+ 3 0 n n An(s) = unn F (hws , fi) 1{|y−x|≤n−1/3R}1{|z−x|≤n−1/3R}ws (y) Rd VR Rd Rd d 1 X ∂2f × (z − y )2 (y)dydzdx 2 i i ∂y2 i=1 i Z Z Z −2/3 2 (1+d) 2 3 −1/3 −1/3 +O(n )n 1{|y−x|≤n R}1{|z−x|≤n R}|z − y| 1{Bn(y)∩Sf 6=∅} Rd Rd Rd 2 (1+d) Z Z d 2 un 3 0 n n X 2 ∂ f −2/3 = F (hws , fi) ws (y) (zi − yi) 2 (y)dydzdx + O(n ) 2V Rd 2 ∂y R Bn(x) i=1 i Z uΓR 0 n n −2/3 = F (hws , fi) ws (y)∆f(y)dy + O(n ) 2 Rd uΓR = F 0(hwn, fi) hwn, ∆fi + O(n−2/3), (5.21) 2 s s where

2 (1+d) Z Z Z Z n 3 2 1 2 ΓR = (z1 − y1) dzdx = (z1) dzdx (5.22) VR Bn(y) Bn(x) VR B(0,R) B(x,R) was defined in (1.34), and the last equality uses another Taylor expansion to show that for any s, n n −2/3 hws , ∆fi = hws , ∆fi + O(n ) (5.23) n with an error term uniformly bounded in s. In particular, since |hws , fi| ≤ kfkVol(Sf ), we can conclude that |An(s)| ≤ CA uniformly in s and n. Very similar arguments allow us to control the other terms:

2 1+ d Z unn 3 00 n 2 2 1−d |B (s)| ≤ |F (hw , fi)| 2Vol(B (x)) 1 kfk dx ≤ C n 3 , (5.24) n 2 s n {x∈Sf } B

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 42/89 The spatial Lambda-Fleming-Viot process with selection and, again by the same arguments,

− 2d − 1+d |Cn(s)| ≤ CC n 3 , |Dn(s)| ≤ CD and |En(s)| ≤ CE n 3 , (5.25) where the constants CB, CC , CD, CE are all independent of n and s. Coming back to (5.15) and combining all the estimates we just obtained, for every s < t we have

1−d 2d 1+d n n 3 − 3 − 3  |At − As | ≤ CA + CB n + CC n + CD + CE n (t − s). (5.26)

From there it is easy to deduce that for every T > 0, given a sequence of stopping times (τn)n≥1 bounded by T , for every ε > 0 there exists η > 0 such that lim sup sup P An − An > ε = 0, τn+θ τn (5.27) n→∞ θ∈[0,η] which corresponds to (5.1) and shows that the sequence of finite variation parts of n (ΨF,f (M t ))t≥0 is tight. Similarly, we obtain that

 ± n n 2 00 2 2 2 F (hΘ −1/3 (w ), ϕf i)−F (hw , ϕf i) ≤ C kfk u Vol(Bn(x)) 1{B (x)∩S 6=∅}. (5.28) x,n R,un s s F n n f n Notice that this bound is independent of the value of ws . Substituting into the definition n of Qt given in (5.13), we obtain that for every s < t,

1−d n n 3 |Qt − Qs | ≤ CF n (t − s), (5.29) for a constant CF independent of n (and s, t). Therefore, for every T > 0, every sequence of stopping times (τn)n≥1 bounded by T and every ε > 0, there exists η > 0 such that lim sup sup P Qn − Qn > ε = 0 τn+θ τn (5.30) n→∞ θ∈[0,η] and the sequence of predictable quadratic variations of the martingale part of the process n (ΨF,f (M t ))t≥0 is not only tight, but also when d ≥ 2 it tends to 0 uniformly over compact time intervals. By the Aldous-Rebolledo criterion (see again (i)-item (c) in Section 2), we n conclude that (M )n≥1 is tight in DMλ [0, ∞), as required. n n Finally, coming back to (5.14), we see that every increment of hM , fi = hM , ϕf i is −d/3 bounded by kfkVRunn . Consequently, for every T > 0 we have n n −(1+d)/3 sup sup M t , f − M t−, f ≤ VRun , (5.31) ∞ Rd t∈[0,T ] f∈Cc ( ):kfk≤1 n and thus any potential limit for (M )n≥1 has continuous paths in Mλ. 2) Identifying the limit. ∞ In what follows, we suppose that (Mt )t≥0 ∈ DMλ [0, ∞) is the weak limit of a nk ∞ subsequence (M )k≥1 and for any t ≥ 0, we write wt for (some representative of) the ∞ density of Mt . In order to show that M ∞ satisfies the martingale problem stated in Theorem 1.11, ∞ Rd we use the fact (established in the previous paragraph) that for every f ∈ Cc ( ) and every n ≥ 1,  Z t  n n n n ΨId,f M t − ΨId,f M 0 − L ΨId,ϕf (Ms )ds (5.32) 0 t≥0 is a martingale with predictable quadratic variation (5.13) (with F = Id), where Ln was defined in (5.8) and ϕf in (5.10). We first show that for every t ≥ 0,  Z t Z t    uΓR E nk nk ∞ ∞ ∞ lim L ΨId,ϕf (Ms )ds − hws , ∆fi − 2Ruσ hws (1 − ws ), fi ds = 0, k→∞ 0 0 2 (5.33)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 43/89 The spatial Lambda-Fleming-Viot process with selection so that we can then use the fact that the quantity in (5.32) is a martingale, the fact that ΨId,f is a bounded continuous function and the Dominated Convergence Theorem to 0 conclude that for every 0 ≤ t < t , m ∈ N, 0 ≤ t1 < ··· < tm ≤ t and h1, . . . , hm ∈ Cb(Mλ),

0  Z t uΓ   E ∞ ∞ R ∞ ∞ ∞ hwt0 , fi − hwt , fi − hws , ∆fi−2Ruσ hws (1 − ws ), fi ds t 2 m  Y  × h M ∞ = 0 i ti (5.34) i=1 and consequently that Zf is a martingale (with respect to the natural filtration of M ∞). In the case d ≥ 2 this property will be sufficient to conclude, since we showed in (5.29) that the quadratic variation of the martingale (5.32) tended to 0 as n → ∞, and therefore the limit Zf is the constant process equal to 0. In one dimension, we shall still have to prove that the quadratic variation of Zf is non-trivial and has the announced form. This is what we do in the last part of this point 2). Let us prove (5.33). Specialising the computation of An in (5.8) to the case F = Id, we have by (5.21)

uΓR −2/3 A (s) = hwnk , ∆fi + O(n ) nk 2 s k Z uΓR nk −2/3 = ∆f(x)1{0}(κ)M s (dx, dκ) + O(nk ) 2 Rd×{0,1}

uΓR → hw∞, ∆fi as k → ∞. (5.35) 2 s

−2/3 These quantities being bounded by (uΓR/2)k∆fkVol(Sf ) + O(nk ), independently of s, the convergence also happens in L1 norm. Next, Taylor-expanding f to write that for −1/3 every y ∈ B(x, Rnk ),

−1/3 ϕf (y) = f(x) + O(|y − x|) = f(x) + O nk , (5.36) we obtain that Z n d/3 nk 2 nk −1/3 Dn (s) = σun w (x) 1 (1 − w ), f(x) + O n k k s Bnk (x) s k Rd −1/3 o − (1 − wnk (x)2) 1 wnk , f(x) + On  dx s Bnk (x) s k Z  nk 2 nk nk 2 nk −1/3 = σuVR ws (x) (1 − ws (x)) − (1 − ws (x) )ws (x) f(x)dx + O nk Rd

nk nk −1/3 = −σuVR hws (1 − ws ), fi + O nk . (5.37)

nk L1 As above, the part of Dnk (s) which is linear in ws converges (weakly and in ) towards

∞ − σuVR hws , fi. (5.38)

We now would like to show that the “quadratic” part of Dnk (s) converges to

∞ 2 σuVR h(ws ) , fi. (5.39) n Note that this is not a simple consequence of the weak convergence of M k to M ∞, as nk nk nk 2 ⊗2 h(ws ) , fi cannot be written as an integral with respect to M s or (M s ) . Instead, nk ⊗2 we shall approximate this expression by an integral with respect to (M s ) and use the continuity estimates obtained in Proposition A.1 to bound the remaining terms. (The

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 44/89 The spatial Lambda-Fleming-Viot process with selection statement and proof of this proposition are postponed until Appendix A to ease the reading.) d Let ε ∈ (0, 1/2), and let pε be a continuous probability density function on R sup- ported in B(0, ε). For every k ≥ 1 and s ≥ 0, we have

nk 2 ∞ 2 h(ws ) , fi − h(ws ) , fi Z Z Z nk 2 nk nk ≤ f(x)ws (x) dx − f(x)ws (x)ws (y)pε(y − x)dydx Rd Rd Rd Z Z Z Z nk nk ∞ ∞ + f(x)ws (x)ws (y)pε(y − x)dydx − f(x)ws (x)ws (y)pε(y − x)dydx Rd Rd Rd Rd Z Z Z ∞ ∞ ∞ 2 + f(x)ws (x)ws (y)pε(y − x)dydx − f(x)ws (x) dx . (5.40) Rd Rd Rd

The second term on the r.h.s. can be rewritten as Z 0 nk 0 nk f(x)pε(y − x)1{0}(κ)1{0}(κ )M s (dy, dκ )M s (dx, dκ) (Rd×{0,1})2 Z 0 ∞ 0 ∞ → f(x)pε(y − x)1{0}(κ)1{0}(κ )Ms (dy, dκ )Ms (dx, dκ) (Rd×{0,1})2 Z Z ∞ ∞ = f(x)ws (x)ws (y)pε(y − x)dydx (5.41) Rd Rd

d 2 as k tends to infinity (since the mapping (x, y) 7→ f(x)pε(y − x) belongs to Cc((R ) ), and since these terms are bounded uniformly in k (and ε, s), this convergence also happens in L1 norm. That is, the expectation of the second term in (5.40) tends to 0 as k → ∞. nk Concerning the first term on the r.h.s. of (5.40), because ws takes its values in [0, 1], we have, by Fubini’s Theorem,  Z Z Z  E nk 2 nk nk f(x)ws (x) dx − f(x)ws (x)ws (y)pε(y − x)dydx Rd Rd Rd Z Z E nk nk  ≤ kfk ws (x) − ws (y) pε(y − x)dydx. (5.42) Sf B(x,ε)

−1/3 By Proposition A.1 applied with ε = Rnk , there exists a, v, λ, C > 0 independent of k such that for every x, y ∈ Rd satisfying |x − y| < 1 and every s ∈ [0, t], we have

n −a   −1/3 E nk nk  1/4 1/2 λ(|x|+Rnk ) ws (x) − ws (y) ≤ C nk + τnk (x, y) + |x − y| + τnk (x, y) e

(1−d)/6 (2−d)/4o + nk τnk (x, y) , (5.43) where −v 2/(d+1) τn(x, y) = n ∨ |x − y| . (5.44)

Thus, using the facts that the support Sf of f is compact, that pε is a probability density 2/(d+1) supported in B(0, ε), and that τn(x, y) ≤ ε for n large enough whenever |x − y| ≤ ε, we can write that the first term on the r.h.s. of (5.40) is bounded by

0 −a 2/(d+1) 1/4 1/(d+1) (1−d)/6 1/(d+1) C nk + ε + ε + ε + nk ε . (5.45)

Likewise, by taking n → ∞ in Proposition A.1 (along the converging subsequence), we obtain that the last term on the r.h.s. of (5.40) is bounded by

0 1/4 2/(d+1) 1/(d+1) 1/(d+1)  C ε + ε + ε + ε 1{d=1} . (5.46)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 45/89 The spatial Lambda-Fleming-Viot process with selection

Combining the above, we have that for every ε ∈ (0, 1/2),

E nk 2 ∞ 2  1/4 1/(d+1) lim sup h(ws ) , fi − h(ws ) , fi ≤ C ε + ε , (5.47) k→∞

and letting ε tend to 0 we can conclude that the part of the expression (5.37) for Dnk (s) nk L1 which is quadratic in ws indeed converges in towards

∞ 2 σuVR h(ws ) , fi. (5.48)

Combining (5.35), (5.38) and (5.48), and using the facts that Bn(s) = 0 since F = Id, n and that Cn(s) and En(s) tend to zero uniformly in all possible values of M , we conclude that (5.33) is satisfied. As we explained above, this is sufficient to conclude in the case d ≥ 2 since the quadratic variation of the martingale Zf is then 0. We now turn to the case d = 1. Defining Z t n n n n n Wt (f) := hwt , fi − hw0 , fi − L ΨId,ϕf Ms ds (5.49) 0 Z t   n n uΓR n n n −1/3 = hwt , fi − hw0 , fi − hws , ∆fi − σuVR hws (1 − ws ), fi ds + O(n ), 0 2 we know that W n(f) is a zero-mean martingale with predictable quadratic variation Z t Z 2 4/3 n n n n −1/3 2 unn ws (x)(1 + snws (x))h1Bn(x)(1 − ws ), f + O(n )i 0 Rd n n 2 n −1/3 2o + (1 − ws (x) + sn(1 − ws (x) ))h1Bn(x)ws , f + O(n )i dxds Z t 2 2 n n 2 −1/3 = u VR hws (1 − ws ), f ids + O(n ), (5.50) 0 where, more precisely, the remainder term is bounded by a constant times n−1/3t. As 0 a consequence, for every n ≥ 1, 0 ≤ t < t , m ∈ N, 0 ≤ t1 < ··· < tm ≤ t and h1, . . . , hm ∈ Cb(Mλ),

 Z t0  m  n 2 n 2 2 2 n n 2 −1/3 Y n  E W 0 (f) − W (f) −u V hw (1−w ), f ids+O(n ) h M = 0 t t R s s i ti t i=1 (5.51) Observe that for every n ≥ 1 and every t ≥ 0,    uΓR |W n(f)| ≤ Vol(S ) 2kfk + t k∆fk + σuV kfk + O(n−1/3) , (5.52) t f 2 R and so we can let n → ∞ in (5.51) (along the converging subsequence) and use the Dominated Convergence Theorem, together with (5.35), (5.38) and (5.48), to conclude that  Z t0  m  f 2 f 2 2 2 ∞ ∞ 2 Y ∞ E Z 0 − Z − u V hw (1 − w ), f ids h M = 0. t t R s s i ti (5.53) t i=1

This allows us to identify the quadratic variation of the martingale Zf as Z t f 2 2 ∞ ∞ 2 [Z ]t = u VR hws (1 − ws ), f ids, t ≥ 0. (5.54) 0

Since by (5.14) the jumps of W n(f) are all bounded by Cn−2/3, Zf is a continuous square- integrable martingale, starting at 0. By the Dubins-Schwarz Theorem (see Remark 5.1

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 46/89 The spatial Lambda-Fleming-Viot process with selection below), Zf is therefore a time-changed Brownian motion, solution to the stochastic differential equation p ∞ ∞ 2 f dWt = uVR hwt (1 − wt ), f i dBt , (5.55) where Bf denotes standard Brownian motion. The bracket process between Zf and Zg is then obtained by the same kind of calculations, writing first the bracket process for a fixed n and then identifying the limit by letting nk → ∞. f Remark 5.1. We cannot a priori prove that [Z ]∞ = +∞ a.s., as required by the classical Dubins-Schwarz Theorem. Note however that this condition can be removed, at the expense of extending the probability space on which we work. Indeed, if we introduce a f Brownian motion (βt )t≥0 independent of all other processes (possibly on some enlarged space) and set for every t ≥ 0

( Zf if t < [Zf ] , f τt ∞ Bt = f f f (5.56) Z + β f if t ≥ [Z ]∞, ∞ t−[Z ]∞ where  f τt := inf s ≥ 0 : [Z ]s > t , (5.57) f then by Theorem 1.7 in Chapter V of [44] we have that (Bt )t≥0 is a standard Brownian f f motion and for every t ≥ 0, Z = B f . t [Z ]t ∞ n To summarise, we have shown that any limit point (Mt )t≥0 of (M )n≥1 satisfies the ∞ Rd following system of stochastic differential equations: for every f ∈ Cc ( ), uΓ  dhw∞, fi = R hw∞, ∆fi − σuV hw∞(1 − w∞), fi dt t 2 t R t t p ∞ ∞ 2 f + 1{d=1}uVR hwt (1 − wt ), f idBt , (5.58) with initial value hw0, fi, and in one dimension, by polarisation the covariation between ∞ ∞ hw· , fi and hw· , gi is as in the statement of Theorem 1.11(i). 3) Uniqueness of the limit. Let us finally show that the system of equations (5.58) has at most one solution. We start with the case d ≥ 2. Any test function of the form

k Z  Y  M 7→ ψ(x1, . . . , xk) 1{0}(κj) M(dx1, dκ1) ··· M(dxk, dκk) Rd k ( ×{0,1}) j=1 k Z  Y  = ψ(x1, . . . , xk) w(xi) dx1 ··· dxk, (5.59) Rd k ( ) i=1 with ψ continuous and integrable on (Rd)k (where as before w is any representative of the density of M), can be uniformly approximated by linear combinations of functions Qk ∞ Rd of the form i=1h·, fii with fi ∈ Cc ( ) for every i. Thus, we can extend (5.58) to this more general class of functions. Then in Chapter 7 of [35], it is proved that, when σ = 0, any solution to (5.58) is dual, through the set of functional relations (1.26), to a system of independent Brownian motions with variance parameter uΓR, in which particles never coalesce. This is easily modified to σ > 0, in which case particles branch into two at rate uσVR, independently of each other. Since the set of all test functions of the form (5.59) is separating by Lemma 2.1(c) in [49], we can proceed as in the proof of Proposition 4.4.7 in [21] to conclude that the system of equations (5.58) has at most one solution. n Hence, this solution exists and the full sequence (M )n≥0 converges to it in distribution, as stated in Theorem 1.11(ii).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 47/89 The spatial Lambda-Fleming-Viot process with selection

When d = 1, we follow the same route and use Itô’s Formula to extend (5.58) to Qk functions of the product form i=1h·, fii and then to the full class of functions (5.59) by the same density argument as before. Again in Chapter 7 of [35], it is proved that in one dimension and when σ = 0, any solution to these equations is dual, through the set of relations (1.26), to a system of independent Brownian motions with variance parameter uΓR, in which, this time, particles coalesce pairwise at an instantaneous rate given by 2 2 u VR times the local time at 0 of their separation (independently of the other pairs). As earlier, this is easily modified to cover the case σ > 0, by imposing that particles should also branch into two at rate uσVR. By the same chain of arguments as in the case d ≥ 2, we can therefore conclude that the system of equations (5.58) has a unique solution, to n which the full sequence (M )n≥0 thus converges in distribution as n tends to infinity. Theorem 1.11(i) is proved. Remark 5.2. Liang’s notation is very different from ours. To see that his process (with 2 2 selection added and the coalescence rate multiplied by u VR) and our limiting process ˆ ∞ ∞ do coincide, notice that m(dx) = dx in our case and Xt(x) = wt (x)δ0 + (1 − wt (x))δ1. Hence, taking χ(κ) = 10(κ) = ρ(κ) and ψ(x) = f(x), φ(x) = g(x) in Proposition 7.2 in [35] indeed leads to Z Z f g 2 2 ∞ 2 2 ∞ 2 d[Z , Z ]t = u VR wt (x)f(x)g(x)dx − u VR wt (x) f(x)g(x)dx Rd Rd Z 2 2 ∞ ∞  = u VR wt (x) 1 − wt (x) f(x)g(x)dx. (5.60) Rd

5.2 Proof of Theorem 1.13. We divide the proof into two parts. The first, and simpler, shows that the only possible n ∞ limit for (Ξ )n≥1 is the system of branching and coalescing Brownian motions Ξ . The n second part, tightness of the sequence (Ξt )n≥1, is rather more involved and will be broken into a number of smaller steps. Recall that Ξn is defined on the probability space (Ω, F 0, P) and takes its values in d d the set Mp(R ) of all finite point measures on R , which we have endowed with the topology of weak convergence. The linear hull of the set of test functions (recall (1.13))

|Ξ|  Z  Y i Φexp,ln f :Ξ 7→ f(ξ ) = exp (ln f(x))Ξ(dx) , (5.61) Rd i=1

1 d d where f ∈ C (R ) takes values in [0, 1], is dense in C0(Mp(R )) (the space of continuous d functions on Mp(R ) tending to 0 at “infinity”) for the topology of the uniform conver- gence (cf. Lemma 0.2 in [27], where the formalism is different but the result is equivalent to our claim). Consequently, the linear span of the set of functions (5.61) is dense in d Cb(Mp(R )) for the topology of uniform convergence over compact sets, and functions d of the above form will thus be sufficient to characterise the law of an Mp(R )-valued random variable. In this section, the atoms of the point measures considered will be viewed as particles evolving in Rd. We start with the following result. Lemma 5.3. The finite dimensional distributions of the system of scaled processes Ξn converge as n → ∞ to those of the system of branching and coalescing Brownian motions Ξ∞, described in the statement of Theorem 1.13. In particular, the only possible limit n ∞ point for the sequence (Ξ )n≥1 is Ξ .

Proof of Lemma 5.3. Suppose first that the density ψ of the locations of the atoms of n Rd Ξ0 can be factorised as ψ(x1, . . . , xk) = ψ1(x1) ··· ψk(xk), with ψi ∈ Cc( ) being a probability density function on Rd for every i.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 48/89 The spatial Lambda-Fleming-Viot process with selection

(n) Let us write (Mt )t≥0 for the unscaled SLFVS with parameters sn, un (in the fixed (n) (n) radius case), and wt for a representative of the density of Mt , for every t ≥ 0. Recall n (n) 1/3 the notation (Mt )t≥0 for the scaled process whose density at time t is wnt (n ·). Let 0 1 Rd (n) n 0 w ∈ C ( ), and suppose that M0 is such that M0 has density w for every n ≥ 1. With this initial condition and Relation (5.23) (where ∆f can be replaced by any function 2 Rd n f ∈ Cc ( )), it is easy to check that M 0 , as defined in Theorem 1.11, converges to the 0 0 measure M ∈ Mλ with density w as n → ∞. Hence, by Theorem 1.11 the sequence of n ∞ 0 processes (M )n≥1 converges weakly to M starting at M . Using the approximation n (n) 1/3 −2/3 (5.23) to replace hwt , ψii by hwnt (n ·), ψii + O(n ) on the third line, together with Fubini’s Theorem and the duality formula (1.26), we have

 k  Z  E Y n M (n) ψi(xi)1{0}(κi)M t (dxi, dκi) 0 Rd i=1 ×{0,1}  Z  k   E Y n = M (n) ψ1(x1) ··· ψk(xk) wt (xi) dx1 ... dxk 0 Rd k ( ) i=1  Z  k   E Y (n) 1/3 −2/3 = M (n) ψ1(x1) ··· ψk(xk) wnt (n xi) dx1 ... dxk + O(n ) 0 Rd k ( ) i=1  Z  k   −dk/3E −1/3 −1/3 Y (n) = n M (n) ψ1(n x1) ··· ψk(n xk) wnt (xi) dx1 ... dxk 0 Rd k ( ) i=1 + O(n−2/3)

Z  Nnt  −dk/3 −1/3 −1/3 Y (n) j = n ψ1(n x1) ··· ψk(n xk)EΞ[x1,...,xk] w0 (ξnt) dx1 ... dxk Rd k ( ) j=1 + O(n−2/3)

Z  Nnt  Y (n) 1/3 −1/3 j 1/3 1/3 = ψ1(x1) ··· ψk(xk)EΞ[n x1,...,n xk] w0 (n (n ξnt)) dx1 ... dxk Rd k ( ) j=1 + O(n−2/3) N n Z  t  Y n n,j −2/3 = ψ1(x1) ··· ψk(xk)EΞ[x1,...,xk] w0 (ξt ) dx1 ... dxk + O(n ) Rd k ( ) j=1 N n  t  Y 0 n,j −2/3 = E n w (ξ ) + O(n ). (5.62) Ξ0 t j=1

Now the expression on the l.h.s. of (5.62) converges to the corresponding expression for ∞ n M as n → ∞. Therefore, if Ξ is the limit of a subsequence of (Ξ )n≥1, then for every t ≥ 0

N ∞  k  Z   t  Y ∞ Y 0 j E 0 M ψi(xi)1{0}(κi)Mt (dxi, dκi) = EΞ0 w (ξt ) . (5.63) Rd i=1 ×{0,1} j=1

On the other hand, as explained in Point 3) of the proof of Theorem 1.11, the same 0 equality (5.63) holds for any w if we replace Ξt in the r.h.s. by the empirical distribution ∞ at time t, Ξt , of the system of independent branching (and in dimension 1, coalescing) Brownian motions described in Theorem 1.13. As mentioned in the paragraph around (5.61), test functions of the form used in the r.h.s. of (5.63) are separating. We can n therefore conclude that the one-dimensional distributions of (Ξt )t≥0 converge to those

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 49/89 The spatial Lambda-Fleming-Viot process with selection

∞ of (Ξt )t≥0. The generalisation to the finite-dimensional distributions is straightforward since the duality formula (1.26) holds on any time interval [s, t] (if we replace w0 by ws j j and ξt by ξt−s). Finally, since linear combinations of functions of the product form

x 7→ ψ1(x1) ··· ψk(xk), (5.64)

d 1 with ψi ∈ Cc(R ) a probability density function for every i, are dense (for the L norm) in the set of probability densities ψ on (Rd)k, an analogue of Relation (5.63) can be established for this more general class of initial densities ψ. The same chain of arguments is then sufficient to conclude the proof of Lemma 5.3.

Tightness n We now show tightness of the sequence (Ξ )n≥1. To ease the notation, we write Pψ

d for the probability measure on DMp(R )[0, ∞) under which the locations of the atoms of n each Ξ0 have density ψ. We first show that the compact containment condition holds n d d if we see (Ξ )n≥1 as a sequence of Mp(Rc)-valued Markov processes, where Rc is the one-point compactification of Rd. We can then use Theorem 3.9.1 in [21], together with d the fact that the linear span of functions of the form (5.61) is dense in Cb(Mp(Rc)) for n the topology of uniform convergence on compact sets to reduce the tightness of (Ξ )n≥1 n ∞ d to that of (Φexp,ln f (Ξ ))n≥1 for every f ∈ C (Rc) with values in [0, 1]. More precisely, we show that for every such f, every T > 0, every sequence of stopping times (τn)n≥1 bounded by T and every ε > 0, there exists δ = δ(f, T, ψ, ε) such that

n n " Nτ +t Nτ # Yn Yn lim sup P sup f(ξn,i ) − f(ξn,i) > ε ≤ ε. ψ τn+t τn (5.65) n→∞ 0≤t≤δ i=1 i=1 (This is actually stronger than the classical Aldous criterion based on stopping times [2], which considers the supremum over t ∈ [0, δ] of the probability that the increment between times τn and τn +t is larger than ε.) Finally, using Lemma 5.3 and Corollary 3.9.3 n d in [21], we shall be able to conclude that (Ξ )n≥1 is tight in DMp(R )[0, ∞), as desired n ∞ d (and furthermore that Ξ converges weakly to Ξ in DMp(R )[0, ∞)). We shall proceed in a number of steps. First we control the maximum number of n particles in Ξt up to time T + 1. Not only does this give us the compact containment condition, but conditional on this result, it is then easy to control the probability that there is a branch in an interval of length δ (by branch, we mean that a particle is replaced by two “parental” particles during a selective event). If we can also show that with high probability there is no coalescence (i.e., no group of at least two particles is ever removed during the same event and replaced by one or two “parental” particles), so that the number of particles in the system does not change, then the problem is reduced to controlling the jumps in a random walk. The most involved step, which is the substance of Proposition 5.5, is showing that indeed there is no accumulation of coalescence events. Let us replace Rd by its one-point compactification Rcd, so that the set of finite point measures with a total mass less than K is compact for every K > 0. Recall the notation |Ξ| = hΞ, 1i for the total mass of the measure Ξ. The following lemma thus implies the compact containment condition. Lemma 5.4. Let T > 0. Given ε > 0, there exists K > 0 such that for every n ≥ 1,   n ε Pψ sup |Ξt | > K ≤ . (5.66) 0≤t≤T +1 4

Proof of Lemma 5.4. Recall that two particles are created when at least one of the extant particles is affected by a selective event. For a given particle of Ξn, this happens at

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 50/89 The spatial Lambda-Fleming-Viot process with selection

rate nsnVRun = uσVR. Furthermore, the presence of more than one particle in the area affected by the event does not speed up the branching. Consequently, the number of n particles in (Ξt )t≥0 is stochastically bounded by the number of particles in a Yule process in which particles split (independently of one another) into two offspring at rate uσVR. n Let T > 0. Since the initial value, Ξ0 , has k < ∞ particles, we conclude that there exists K ∈ N such that for every n ≥ 1,   n ε Pψ sup |Ξt | > K ≤ , (5.67) 0≤t≤T +1 4 as required.

From now on, all our calculations proceed conditional on the event   n An = sup |Ξt | ≤ K . (5.68) t∈[0,T +1]

From our reasoning above, we already see that for any t ∈ [0,T ], conditional on An, the probability that at least one particle is created during the time interval (t, t + δ] is bounded by

  −uσVRδ K Pψ a given particle branches in (t, t + δ] ≤ K 1 − e ≤ uσKVR δ. (5.69)

This bound is uniform in n and so we see that there exists δ1 ∈ (0, 1) such that for every n ≥ 1, ε P at least 1 particle created in (τ , τ + δ ]; A  ≤ . (5.70) ψ n n 1 n 4 We also want to control the probability of coalescence events. Because of the calculation above, it is enough to do so in the absence of branching. c Proposition 5.5. Let Bδ denote the event that there is no branching event in (τn, τn + δ]. There exists δ2 ∈ (0, δ1] such that

c ε Pψ[at least 1 coalescence in (τn, τn + δ2]; An,B ] ≤ . (5.71) δ2 4 Before proving Proposition 5.5, let us turn to the final ingredient in the proof and control the “jumps” of a single particle. From the description in Section 1.2, after rescaling of time and space, ξn,1 “jumps” (i.e., is removed and replaced by another particle seen as its parent) at rate nunVR(1 + 2/3 sn) = n uVR(1 + o(1)), to a new location whose distribution is symmetric about its current location. Furthermore, the locations of the particle both before and after the jump belong to the same ball of radius Rn−1/3, and so the length of the jump is bounded by 2Rn−1/3. Doob’s Maximal Inequality and standard estimates for the variance of a compound Poisson process then imply that there exists C1 > 0 such that for every n, any s, η > 0, and every stopping time Tn,   C1 P sup ξn,1 − ξn,1 > η ≤ s, (5.72) ψ Tn+t Tn 2 t∈[0,s] η

n,1 where we have used the strong Markov property of ξ at time Tn. From this, we can draw two conclusions. The first one, which is not necessary for the rest of the proof but gives some nice insight on our sequence of processes, is that taking s = T and Tn = 0, we can find a compact set E ⊂ Rd such that for every n ≥ 1,   n c Pψ sup Ξt (E ) > 0; An ≤ ε. (5.73) t∈[0,T ]

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 51/89 The spatial Lambda-Fleming-Viot process with selection

˜ n ˜c Indeed, since ψ is integrable, there exists a compact set E such that Pψ[Ξ0 (E ) > 0] < ε/2. Conditionally on all the initial particles belonging to E˜, by (5.72) we can then find a radius η > 0 such that the probability that any of the (at most) K particles leaves E = E˜ + B(0, η) is less than ε/2. Second, conditional on the number of individuals not changing during a time interval of length δ, we can index the particles of Ξn and Ξn by a common indexing set which τn τn+δ we denote I , in such a way that a particle in Ξn has the same label as a particle in n τn+δ Ξn τn if and only if the position of the former can be seen as the result of a (potentially empty) series of jumps carried out by the latter during (τn, τn +δ]. Under this assumption, for any f ∈ C∞(Rcd) with values in [0, 1], a Taylor expansion yields

Y n,i Y n,i X n,i n,i f(ξ ) − f(ξ ) ≤ C ∇f ξ − ξ , (5.74) τn+t τn τn+t τn i∈In i∈In i∈In for some C > 1, where the sup norm of ∇f is finite since Rcd is compact. Together with (5.72) and the choice s = δ, Tn = τn and η = ε/(KCk∇fk), this shows that there c exists δ3 ∈ (0, δ2] such that for n large enough, writing Cδ for the event that there is no coalescence in (τn, τn + δ],   Y Y KC1 ε P sup f(ξn,i ) − f(ξn,i) > ε ; A ,Bc ,Cc ≤ δ ≤ . (5.75) ψ τn+t τn n δ3 δ3 2 3 t∈[0,δ3] η 4 i∈In i∈In

Combining Lemma 5.4, (5.70), Proposition 5.5 and (5.75), we obtain (5.65) with δ = δ3. It remains to prove Proposition 5.5. Let us remark that it is not enough to consider particles at an initial separation of order O(1) (or O(n1/3) before rescaling). In particular, when two particles are created through a selective event, their (rescaled) initial distance is of order O(n−1/3) and so we also need to control the coalescence of particles starting from very small initial separations.

Proof of Proposition 5.5. It suffices to consider just two particles and find δ2 > 0 such that the probability that they coalesce in a time interval of length δ2 is bounded by ε/(2K(K − 1)), irrespective of their initial separation. Once this bound has been estab- lished, we can write

 c  K(K − 1) ε ε Pψ at least 1 coalescence in (τn, τn + δ2]; An,B ≤ = , (5.76) δ2 2 2K(K − 1) 4 since, on the event An, there are at most K(K − 1)/2 pairs of particles at any time. −1/3 Recall that before scaling, each particle jumps at rate proportional to un = un . This makes it convenient to work in the timescale (n1/3t, t ≥ 0) and without rescaling ˜n,i i space. We shall write ξt = ξn1/3t, i ∈ {1, 2}. When ξ˜n,1 and ξ˜n,2 are separated by more than 2R, they cannot be contained in the same reproduction event, and so they evolve independently of one another. The 1/3 ith particle jumps at rate n unVR(1 + sn) = uVR(1 + o(1)) to a new location, which is uniformly distributed over the ball B(Z,R), where Z itself is chosen uniformly at random from B(ξ˜n,i,R). In what follows, we only need that the jump made by each particle is an independent realisation of a random variable X taking values in B(0, 2R), whose distribution is symmetric about the origin. On the other hand, when |ξ˜n,1 − ξ˜n,2| < 2R, the two particles can both lie in a region affected by a given reproduction event and their jumps become correlated. In particular, if they are both affected by this event, they merge together. The infinitesimal generator of

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 52/89 The spatial Lambda-Fleming-Viot process with selection

˜n,1 ˜n,2 Rd 2 ((ξt , ξt ))t≥0, applied to any function φ ∈ C0((c) ) (the space of continuous functions on (Rcd)2 vanishing at infinity) takes the form Z Z 1 ˜2 ˜1 ˜2  u(1 + sn) φ(z, ξ ) − φ(ξ , ξ ) dzdx B(ξ˜1,R)\B(ξ˜2,R) B(x,R) VR Z Z 1 ˜1 ˜1 ˜2  + u(1 + sn) φ(ξ , z) − φ(ξ , ξ ) dzdx B(ξ˜2,R)\B(ξ˜1,R) B(x,R) VR Z Z −1/3 1 ˜2 ˜1 ˜1 ˜2  + u(1 − un )(1 + sn) φ(z, ξ ) + φ(ξ , z) − 2φ(ξ , ξ ) dzdx B(ξ˜1,R)∩B(ξ˜2,R) B(x,R) VR Z Z 2 −1/3 1 ˜1 ˜2  + u n (1 + sn) φ(z, z) − φ(ξ , ξ ) dzdx. (5.77) B(ξ˜1,R)∩B(ξ˜2,R) B(x,R) VR

ˆn,1 ˆn,2 We can think of this as composed of two parts: the process ((ξt , ξt ))t≥0 whose generator is determined by the first three lines above, on top of which a coalescence 2 −1/3 ˆn,1 ˆn,2 event occurs at instantaneous rate u n (1 + sn)VR(0, ξt − ξt ) (recall that VR(0, a) is the volume of the intersection B(0,R) ∩ B(a, R)). With this description, the probability that the two particles have not coalesced by time δn2/3 (which corresponds to a time span of δ on the timescale of ξn,i) is given by

  2 Z δn2/3  u (1 + sn) P T˜ > δn2/3 = E exp − V (0, ξˆn,1 − ξˆn,2) ds , (5.78) ψ ψ 1/3 R s s n 0 where we have written T˜ for the coalescence time of the two particles. ˆn,1 ˆn,2 Since VR(0, x) = 0 when x ≥ 2R, it just remains to establish how much time ξ − ξ spends in the ball B(0, 2R) by time δn2/3. To do this, we define two sequences of stopping n n times, (σk )k≥1 and (τk )k≥1 by n ˆn,1 ˆn,2 n n ˆn,1 ˆn,2 σ1 = inf{t ≥ 0 : |ξt − ξt | ≤ 2R}, τ1 = inf{t ≥ σ1 : |ξt − ξt | > 2R}, (5.79) and for every k ≥ 1,

n n ˆn,1 ˆn,2 n n ˆn,1 ˆn,2 σk = inf{t ≥ τk−1 : |ξt − ξt | ≤ 2R}, τk = inf{t ≥ σk : |ξt − ξt | > 2R}. (5.80)

Now, we have the following result. Lemma 5.6. There exists C > 0 such that for every n, k ≥ 1,

 n n Eψ τk − σk ≤ C. (5.81)

In words, although the two particles are correlated when they are close together, each “incursion” of ξˆn,1 − ξˆn,2 inside B(0, 2R) lasts only O(1) units of time, uniformly in n. The proof of Lemma 5.6 is similar to that of Lemma 6.6 in [7] (based on the facts that the difference walk jumps at a rate bounded from below by a positive constant, independent of its current value, and that the probability that this jump leads to a sufficient increase ˆn,1 ˆn,2 of their separation for ξt − ξt to leave B(0, 2R) is also bounded from below by a positive constant). Therefore, we omit it here. ˆn,1 ˆn,2 Outside B(0, 2R), the difference ξt − ξt has the same law as a symmetric random walk, with jumps of size at most 2R, jumping at rate 2uVR(1 + sn). Its behaviour will be determined by the spatial dimension. d ≥ 3: When d ≥ 3, transience of the random walk guarantees that the number of times ξˆn,1 − ξˆn,2 returns to B(0, 2R) is a.s. finite. Since the parameter n appears only in the jump rates and not in the embedded chain of locations (during an excursion outside B(0, 2R)), the probability that the difference walk enters B(0, 2R) at least k times decays

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 53/89 The spatial Lambda-Fleming-Viot process with selection

to 0, uniformly in n, as k → ∞. Together with Lemma 5.6 and the fact that VR(0, ·) is bounded, this shows that for every η > 0,

 Z δn2/3 1/3  ˆn,1 ˆn,2 n lim Pψ VR(0, ξs − ξs ) ds > η 2 = 0 (5.82) n→∞ 0 u (1 + sn) As a consequence, coming back to (5.78), observing that

 2 Z δn2/3  u (1 + sn) P T˜ ≤ δn2/3 = P Exp(1) ≤ V (0, ξˆn,1 − ξˆn,2) ds (5.83) ψ ψ 1/3 R s s n 0 (where Exp(1) denotes an exponential random variable with parameter 1) and choosing η small enough that P[Exp(1) ≤ η] ≤ ε/(2K(K − 1)), we can conclude that for any δ > 0,

 2/3 ε lim sup Pψ T˜ ≤ δn ≤ . (5.84) n→∞ 2K(K − 1) d = 2: When d = 2, we claim that there exists C0 > 0, independent of n, such that for every x1, x2 with |x1 − x2| > 2R, C0 P σn > δn2/3 ≥ , (5.85) {x1,x2} 1 log(δn2/3) where we have written P{x1,x2} for the probability measure under which the two particles start at locations x1, x2. The proof of this claim is very similar to the beginning of the proof of Lemma 4.2 in [9], and so we only sketch the main ideas. We can a.s. embed the ˆn,1 ˆn,2 trajectories of the difference process ξt − ξt into the trajectories of a two-dimensional Brownian motion, in the same spirit as Skorokhod’s embedding in one dimension (see e.g. [11]). Now, since the jumps of the difference process (when outside B(0, 2R)) are rotationally invariant, we have ˆn,1 ˆn,2  inf P{x1,x2} ξ − ξ leaves B(0, 4R) before entering B(0, 2R) > 0, (5.86) |x1−x2|>2R and the result then follows from that for Brownian motion, namely Theorem 2 in [45] n applied with a = 2R and r ≥ 4R. As a consequence, the number NE of excursions outside B(0, 2R) that the difference walk makes before starting an excursion of (time) length at least δn2/3 is stochastically bounded by a geometric random variable with success probability C/ log(δn2/3). Now, once the difference walk has started such a long excursion (say, the kth one), it is sure not to come back within B(0, 2R) before time δn2/3 and the number of incursions in B(0, 2R) in the time interval [0, δn2/3] is bounded by k. Thus, fixing η > 0 as before and observing that VR(x, y) is bounded by the volume VR of a ball of radius R, we obtain that

 Z δn2/3 1/3  ˆn,1 ˆn,2 n Pψ VR(0, ξs − ξs ) ds > η 2 0 u (1 + sn) dCn log(δn2/3)e  E 1/3   n n 2/3  X n n n ≤ Pψ NE > CE log(δn ) + Pψ (τk − σk ) > η 2 u (1 + sn)VR k=1 2 −Cn C0 u (1 + sn)VR n 2/3 ≤ e E + C log(δn )C, (5.87) ηn1/3 E n where the last inequality uses the stochastic bound of NE first, and then Markov’s n inequality. Choosing CE = log n, for instance, we deduce that for any δ > 0,  Z δn2/3 1/3  ˆn,1 ˆn,2 n lim Pψ VR(0, ξs − ξs ) ds > η 2 = 0, (5.88) n→∞ 0 u (1 + sn) and we conclude as in (5.84).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 54/89 The spatial Lambda-Fleming-Viot process with selection

d = 1: Finally, when d = 1 it is shown in [41] that there exists C0 > 0 such that for every x1, x2 such that |x1 − x2| > 2R,

0 n 2/3 C P{x ,x }[σ > δn ] ≥ √ . (5.89) 1 2 1 δ n1/3

Proceeding as before, and with the same notation, we therefore have

 Z δn2/3 1/3  ˆn,1 ˆn,2 n Pψ VR(0, ξs − ξs ) ds > η 2 0 u (1 + sn) √ dCn δn1/3e √  E 1/3   n n 1/3 X n n n ≤ Pψ NE > CE δn + Pψ (τk − σk ) > η 2 u (1 + sn)kVRk k=1 2 √ −Cn C0 u (1 + sn)kVRk n 1/3 ≤ e E + C δn C. (5.90) ηn1/3 E

n Choosing CE to be a constant large enough for the first term to be less than ε/(6K(K−1)), and then δ3 > 0 small enough for the second term to be less than ε/(6(K(K − 1)), and finally taking η small enough, we obtain that for any δ ≤ δ3,

 Z δn2/3 1/3   ˜ 2/3 ˆn,1 ˆn,2 n Pψ T ≤ δn ≤ Pψ VR(0, ξs − ξs ) ds > η 2 + P[Exp(1) ≤ η] 0 u (1 + sn) ε ε ε ε ≤ + + = . (5.91) 6K(K − 1) 6K(K − 1) 6K(K − 1) 2K(K − 1)

We have now proved the desired bound for the probability of a coalescence in any dimension and the proof of Proposition 5.5 is complete.

6 Convergence of the rescaled SLFVS and its dual – the stable radius case

We proceed exactly as for the case of fixed radius.

6.1 Proof of Theorem 1.14 n As in the proof of Theorem 1.11, we first show that the sequence (M )n≥1 is tight in

DMλ [0, ∞), then we show that any limit point of a subsequence satisfies the martingale problem stated in Theorem 1.14, and finally we prove that there exists at most one n solution in DMλ [0, ∞) to this martingale problem to conclude that (M )n≥1 indeed converges to it. 1) Tightness. We use the same method as in the proof of Theorem 1.11, but the computations n required are different. Note that for every n ≥ 1 the process M has sample paths in

DMλ [0, ∞), since the unscaled process from which it is constructed has a.s. càdlàg paths by Theorem 1.2. Using again Theorem 3.9.1 in [21] and the compactness of Mλ, we n n reduce the proof of tightness of (M )n≥1 to the proof of tightness of (ΨF,f (M ))n≥1 for 3 R ∞ Rd every F ∈ C ( ) and f ∈ Cc ( ). Hence, let us now fix F and f as above. Since ΨF,f is a bounded function on n Mλ and consequently (ΨF,f (M t ))n≥1 is a tight sequence for every t ≥ 0, by the Aldous- Rebolledo criterion we only have to prove the equivalent of (5.1) and (5.2) after finding an n n expression for the predictable finite variation A of (ΨF,f (M t ))t≥0 and for its predictable n ∞ Rd quadratic variation Q . As earlier, we first replace f by any function ϕ ∈ Cc ( ) and then specialise the formulae we derive to a suitably chosen function ϕf to conclude.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 55/89 The spatial Lambda-Fleming-Viot process with selection

For any given n ≥ 1, the extended generator of the unscaled process with parameters satisfying (1.27), (1.28), (1.29) and (1.31), acting on functions of the form ΨF,ϕ, is given by Z Z ∞ Z 1 n LΨ (M) = w(y)(1 + s w(z))F (hΘ+ (w), ϕi) − F (hw, ϕi) F,ϕ 2 n x,r,un Rd 1 B(x,r)2 Vr (6.1) o +(1 − w(y) + s (1 − w(y)w(z)))F (hΘ− (w), ϕi) − F (hw, ϕi) dydzµ(dr)dx, n x,r,un where, as usual now, w is a representative of the density of M. Arguing as in the part on tightness of the proof of Theorem 1.11 and using (6.1) with F and F 2, we obtain that the predictable finite variation part of (ΨF,ϕ(Mt))t≥0 is given at any time t ≥ 0 by Z t At = LΨF,ϕ(Ms) ds, (6.2) 0 and its predictable quadratic variation by

t ∞ Z Z Z Z 1 n 2 Q = w (y)(1 + s w (z))F (hΘ+ (w ), ϕi) − F (hw , ϕi) t 2 s n s x,r,un s s 0 Rd 1 B(x,r)2 Vr + (1 − w (y) + s (1 − w (y)w (z)))F (hΘ− (w ), ϕi) s n s s x,r,un s 2o − F (hws, ϕi) dydzµ(dr)dxds. (6.3)

To make the expressions easier to read, below we retain the notation β, γ and δ from n n (1.33). Let us now consider the process (Mt )t≥0 whose density at time t is wt (·) := β n wnt(n ·). Writing explicitly the martingale problem satisfied by M and performing a change in the time and space variables, we obtain that the extended generator of this Markov process is given by

n L ΨF,ϕ(M) (6.4) Z Z ∞ Z 1 n −β −β = n 2 w(n y) 1 + snw(n z)) Rd 1 B(x,r)2 Vr  +  × F (hΘ −β −β (w), ϕi) − F (hw, ϕi) n x,n r,un −β −β −β  − o + (1 − w(n y) + sn(1 − w(n y)w(n z))) F (hΘ −β −β (w), ϕi) − F (hw, ϕi) n x,n r,un dydzµ(dr)dx Z Z ∞ 1 Z 1 n = n1−βα w(y)(1 + s w(z))F (hΘ+ (w), ϕi) − F (hw, ϕi) d+1+α 2 n x,r,un Rd n−β r B(x,r)2 Vr o + 1 − w(y) + s (1 − w(y)w(z))F (hΘ− (w), ϕi) − F (hw, ϕi) dydzdrdx, n x,r,un which allows us to write as in the fixed radius case that the predictable finite variation n R t n n part of ΨF,ϕ(M ) is equal to ( 0 L ΨF,ϕ(Ms )ds)t≥0, while its predictable quadratic variation is given by the integral with respect to time of the function in (6.4) (applied to M n [F (hΘ± (wn), ϕi) − F (hwn, ϕi)] s ) in which the increments x,r,un s s are squared. It remains to apply these results to ϕf defined by ndβ Z ϕf (x) = f(y) dy, (6.5) V1 B(x,n−β ) and to use the fact that for every t ≥ 0,

n n n n ΨF,ϕf (Mt ) = F (hwt , ϕf i) = F (hwt , fi) = ΨF,f (M t ) (6.6)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 56/89 The spatial Lambda-Fleming-Viot process with selection

n n n (where wt is the density of M t defined in (1.32)), to identify A as Z t n n n At = L ΨF,ϕf (Ms ) ds, (6.7) 0 and Qn as Z t Z Z ∞ 1 Z 1 n Qn =n1−βα wn(y)(1 + s wn(z))F (hΘ+ (wn), ϕ i) t d+1+α 2 s n s x,r,un s f 0 Rd n−β r B(x,r)2 Vr n 2 − F (hws , fi) + 1 − wn(y) + s (1 − wn(y)wn(z))F (hΘ− (wn), ϕ i) s n s s x,r,un s f n 2o − F (hws , fi) dydzdrdxds. (6.8)

Now that we have an expression for An and Qn, let us bound their increments to n n complete the proof of tightness of (ΨF,f (M ))n≥1. We start with A . As before, it is n n convenient to split L ΨF,ϕf (Ms ) into its neutral and selective components. Using a Taylor expansion of the function F , we obtain that the neutral part is equal to Z Z ∞ 1 Z 1 h n1−βαF 0(hwn, fi) wn(y) Θ+ (wn) − wn, ϕ s d+1+α s x,r,un s s f Rd n−β r B(x,r) Vr i + (1 − wn(y)) Θ− (wn) − wn, ϕ dydrdx s x,r,un s s f 00 n ∞ F (hw , fi) Z Z 1 Z 1 h 2 + n1−βα s wn(y) Θ+ (wn) − wn, ϕ d+1+α s x,r,un s s f 2 Rd n−β r B(x,r) Vr 2i + (1 − wn(y)) Θ− (wn) − wn, ϕ dydrdx + ε s x,r,un s s f n Z Z ∞ Z 1−βα−γ 0 n 1 1 n = n uF (hws , fi) d+1+α ws (y)(ϕf (z) − ϕf (y)) dydzdrdx Rd n−β r B(x,r)2 Vr 2 Z Z ∞ Z 1−βα−2γ u 00 n 1 1  n n 2 + n F (hws , fi) d+1+α ws (y)h1B(x,r)(1 − ws ), ϕf i 2 Rd n−β r B(x,r) Vr n n 2 + (1 − ws (y))h1B(x,r)ws , ϕf i dydrdx + εn, (6.9) with 3 Z Z ∞ Z 1−αβ−3γ u CF 1 2 3 |εn| ≤ n d+1+α h1B(x,r), ϕf i dydrdx, (6.10) 3! Rd n−β r B(x,r) Vr

(3) where the constant CF is the supremum of F over the bounded set in which its ∞ Rd argument takes its values (recall that ϕf ∈ Cc ( )). Consider the first term on the right hand side of (6.9). Since 1 − αβ − γ = 0, n1−βα−γ = 1. We split the integral over the radii −β into the sum of the integrals over [n , 1] and [1, ∞). By using a Taylor expansion of ϕf and a symmetry argument to cancel the integral of (z − y)dz, we obtain that Z Z 1 Z 0 n 1 1 n uF (hws , fi) d+1+α ws (y)(ϕf (z) − ϕf (y)) dydzdrdx (6.11) Rd n−β r B(x,r)2 Vr Z Z 1 1 Z 1 ≤ C |z − y|21 dydzdrdx d+1+α {B(x,r)∩Sϕf 6=∅} Rd n−β r B(x,r)2 Vr Z 1 0 1 d+2 00 −β(2−α) ≤ C Vol(Sϕf + B(0, 1)) d+1+α r dr = C (1 − n ) n−β r 0 00 for some constants C,C ,C > 0 (where Sϕf denotes the compact support of ϕf ). To control the integral over radii in [1, ∞), the cruder bound |ϕf (y) − ϕf (z)| ≤ 2kfk suffices

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 57/89 The spatial Lambda-Fleming-Viot process with selection and, using the fact that d Vol{x : Sϕf ∩ B(x, r) 6= ∅} ≤ C2(r ∨ 1), (6.12) we have Z Z ∞ Z 0 n 1 1 n uF (hws , fi) d+1+α ws (y)(ϕf (z) − ϕf (y))dydzdrdx Rd 1 r B(x,r)2 Vr Z Z ∞ 1 Z 1 ≤ C 1 + 1 dydzdrdx d+1+α {y∈Sϕf } {z∈Sϕf } Rd 1 r B(x,r)2 Vr Z ∞ Z 0 1 ≤ C 1 Vol(Sϕ )dxdr d+1+α {B(x,r)∩Sϕf 6=∅} f 1 r Rd Z ∞ 00 1 d 000 ≤ C d+1+α r dr ≤ C , (6.13) 1 r again for some constants C,C0,C00 and C000 which depend only on d, F and f. To control the second term on the right hand side of (6.9), we use (6.12) together with the inequality n d |h1B(x,r)ws , ϕf i| ≤ kfkVol(Sϕf ∩ B(x, r)) ≤ C1kfk(r ∧ 1), (6.14) to see that it is bounded by u2C Z Z ∞ 1 n−γ F × 2C2kfk2 (rd ∧ 1)21 drdx 1 d+1+α {Sϕf ∩B(x,r)6=∅} 2 Rd n−β r Z ∞ −γ 1 d 2 d = C3n d+1+α (r ∧ 1) (r ∨ 1)dr n−β r Z 1 2d Z ∞ d −γ r −γ r −γ −β(d−α) = C3n d+1+α dr + C3n d+1+α dr = C4n 1 − n . (6.15) n−β r 1 r When d ≥ 2, d−α > 0 and so this bound tends to 0 as n → ∞. When d = 1, (α−1)β−γ = 0, and so this term is bounded by a constant as n → ∞. The same calculation shows that n εn → 0, uniformly in ws , as n → ∞. As a consequence, in any dimension the absolute n value of the neutral term of L ΨF,ϕf (M) is bounded by a constant independent of n and M. Proceeding in the same way as for the second term above, we obtain that the n n “selection” term (i.e., that involving sn) of L ΨF,ϕf (Ms ) is bounded by (recall that 1 − αβ − γ = 0) Z Z ∞ Z 1−βα−γ−δ 1 1 0 0 2uσn CF d+1+α 2 |ϕf (z )|dy dz dz drdx Rd n−β r B(x,r)3 Vr Z ∞ 1 Z ≤ Cn−δ 1 ∧ rd 1 dxdr d+1+α {B(x,r)∩Sϕf 6=∅} n−β r Rd Z ∞ 0 −δ 1 d d 00 −δ+αβ 00 ≤ C n d+1+α 1 ∧ r 1 ∨ r dr ≤ C n = C , (6.16) n−β r since αβ − δ = 0. Combining (6.9), (6.11), (6.13), (6.15) and (6.16), we obtain that there exists a constant C independent of n such that for every 0 ≤ t1 ≤ t2,

Z t2 n n L ΨF,ϕf (Ms ) ds ≤ C(t2 − t1), (6.17) t1 and therefore for every T > 0, every sequence of stopping times (τn)n≥1 bounded by T , and every ε > 0, we can choose η > 0 small enough so that lim sup sup P  An − An > ε = 0, τn+θ τn (6.18) n→∞ θ∈[0,η] which corresponds to the first part of the Aldous-Rebolledo criterion.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 58/89 The spatial Lambda-Fleming-Viot process with selection

For the quadratic variation of the martingale part, a similar analysis yields that the n integrand in Qt is bounded by Z Z ∞ Z 1−βα−2γ 1 1 0 0 Cn d+1+α ϕf (z)ϕf (z )dy dz dz drdx Rd n−β r B(x,r)3 Vr ∞ Z Z 1 2 ≤ C0n−γ 1 ∧ rd 1 drdx d+1+α {B(x,r)∩Sϕf 6=∅} Rd n−β r Z ∞ 00 −γ 1 d2 d 000 −γ −β(d−α) ≤ C n d+1+α 1 ∧ r 1 ∨ r dr ≤ C n 1 + n , (6.19) n−β r which is bounded by a constant independent of n. As before, we conclude that the equivalent of (6.18) with An replaced by Qn is satisfied for η > 0 small enough. The Aldous-Rebolledo criterion allows us to conclude that the sequence of real-valued pro- n 3 R ∞ Rd cesses (ΨF,f (M ))n≥1 is tight, and since this is true for every F ∈ C ( ) and f ∈ Cc ( ), n we obtain the tightness of (M )n≥1 in DMλ [0, ∞). 2) Identifying the limit. ∞ nk Suppose M ∈ DMλ [0, ∞) is the weak limit of a subsequence (M )k≥1, and for ∞ ∞ every t ≥ 0, write wt for a representative of the density of Mt . We know from the ∞ Rd previous paragraph that for every f ∈ Cc ( ) and every n ≥ 1,  Z t  n n n n ΨId,f M t − ΨId,f M 0 − L ΨId,ϕf (Ms )ds (6.20) 0 t≥0 is a martingale with predictable quadratic variation (6.8) (with F = Id), where Ln was defined in (6.4) and ϕf in (6.5). As in the fixed radius case, we first show that for every t ≥ 0,  Z t Z t  2uσ   E nk nk ∞ α ∞ ∞ lim L ΨId,ϕf (Ms )ds − hws , D fi − hws (1 − ws ), fi ds = 0, k→∞ 0 0 α (6.21) so that we can then use the fact that the quantity in (6.20) is a martingale, the fact that ΨId,f is a bounded continuous function and the Dominated Convergence Theorem to 0 conclude that for every 0 ≤ t < t , m ∈ N, 0 ≤ t1 < ··· < tm ≤ t and h1, . . . , hm ∈ Cb(Mλ),

0  Z t  2uσ   E ∞ ∞ ∞ α ∞ ∞ hwt0 , fi − hwt , fi − hws , D fi− hws (1 − ws ), fi ds t α m  Y  × h M ∞ = 0 i ti (6.22) i=1 and consequently that Zf is a martingale (with respect to the natural filtration of M ∞). In the case d ≥ 2 this property is again sufficient to conclude, since we showed in (6.19) that the quadratic variation of the martingale (6.20) tended to 0 as n → ∞, and therefore the limit Zf is the constant process equal to 0. We shall thus end this point 2) by showing that in one dimension, the quadratic variation of Zf and the bracket process between Zf and Zg have the required form. ∞ Rd n n Let us fix f ∈ Cc ( ) and show (6.21). Let us first analyse the part of L ΨId,ϕf (Ms ) corresponding to neutral events. By (6.9) with F = Id, since 1 − αβ − γ = 0 this neutral part takes to form Z Z ∞ Z 1−αβ−γ 1 1 n un d+1+α ws (y)(ϕf (z) − ϕf (y)) dydzdrdx Rd n−β r B(x,r)2 Vr Z Z  Z ∞  n 1 Vr(y, z) = u ws (y) d+1+α dr (ϕf (z) − ϕf (y)) dzdy, (6.23) Rd Rd −β |z−y| r Vr n ∨ 2

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 59/89 The spatial Lambda-Fleming-Viot process with selection

where Vr(y, z) is again the volume of the intersection B(y, r) ∩ B(z, r). Now, a simple Taylor expansion to the second order gives us that

−2β  ϕf (z) − ϕf (y) = f(z) − f(y) + O(n ) 1{Bn(z)∩Sf 6=∅} + 1{Bn(y)∩Sf 6=∅} , (6.24)

−β where Bn(·) = B(·, n ) and the error term is uniform in y and z. Since

Z Z  Z ∞ 1 V (y, z)  n−2β w(y) r dr 1 + 1 dzdy d+1+α {Bn(z)∩Sf 6=∅} {Bn(y)∩Sf 6=∅} Rd Rd −β |z−y| r Vr n ∨ 2 Z Z  |z − y|−d−α ≤ Cn−2β n−β ∨ dzdy ≤ C0n−β(2−α) → 0 (6.25) d Sf +Bn(0) R 2 as n → ∞, we can conclude that up to a vanishing error term, the neutral part of nk nk L ΨF,ϕf (Ms ) is given by Z Z  Z ∞  n 1 Vr(y, z) u ws (y) d+1+α dr (f(z) − f(y)) dzdy. (6.26) Rd Rd −β |z−y| r Vr n ∨ 2

Now, our computations (6.11) and (6.13) in the proof of tightness imply that the function

Z  Z ∞  1 Vr(y, r) an(y): y 7→ d+1+α dr (f(z) − f(y)) dz (6.27) Rd −β |z−y| r Vr n ∨ 2 is a continuous function, uniformly bounded in y and n. Hence, up to a vanishing error n n term we can first replace ws by ws in (6.26) and, second, use dominated convergence to pass to the limit as n → ∞ in (6.26), along the converging subsequence. Doing so, and using the fact that all the error terms go to 0 uniformly in s, we obtain that the limit in t L1 R nk nk norm of the neutral term in 0 L ΨId,ϕf (Ms )ds is equal to Z t Z Z ∞ u ws (y) Φ(|z − y|)(f(z) − f(y))dzdyds, (6.28) 0 Rd Rd where, as in (1.36), Z ∞ 1 Vr(y, z) Φ(|z − y|) := d+1+α dr. (6.29) |z−y| r Vr 2 In passing, let us show the following property of the operator we obtain in the limit. ∞ Rd Lemma 6.1. For f ∈ Cc ( ) write Z Dαf(y) = u Φ(|z − y|)(f(z) − f(y))dz. (6.30) Rd

α Then D is the infinitesimal generator of a symmetric α-stable process (ζt)t≥0.

Proof of Lemma 6.1. It is reassuring to first check that this is the generator of a well- defined Lévy process: Z Z ∞ 2 1 Vr(0, y) (1 ∧ |y| ) d+1+α dr dy Rd 0 r Vr Z 1 Z Z ∞ 1 2 0 1 ≤ C d+1+α |y| dydr + C d+1+α dr 0 r B(0,2r) 1 r Z 1 d+2 Z ∞ 00 r 0 1 ≤ C d+1+α dr + C d+1+α dr < ∞, (6.31) 0 r 1 r

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 60/89 The spatial Lambda-Fleming-Viot process with selection since α ∈ (1, 2). Therefore the measure Z ∞ 1 Vr(0, y) να(dy) = d+1+α dr dy (6.32) 0 r Vr is a Lévy measure and there exists a unique Lévy process with values in Rd whose Lévy α triplet is (0, 0, να). By Theorem 6.8 in [30], the operator D is its infinitesimal generator. To verify that the associated Lévy process is a symmetric stable process, we check the scaling property (the symmetry property is obvious from the form of να). Let b > 0. −1/α The generator of (b ζbt)t≥0 is given by Z α 1/α −1/α Db f(y) = bu Φ(|z − b y|)(f(b z) − f(y)) dz Rd Z = ub1+d/α Φ(|b1/αz − b1/αy|)(f(z) − f(y)) dz. (6.33) Rd

But a simple change of variables gives us that

Z ∞ 1 V (b1/αy, b1/αz) Φ(|b1/αz − b1/αy|) = r dr 1/α d+1+α b |z−y| r Vr 2 Z ∞ −1−d/α 1 Vr(y, z) = b d+1+α dr, (6.34) |z−y| r Vr 2

α α α and so Db = D for all b > 0. This shows the desired property of D .

n n Having identified the neutral part of the limit, we now turn to the part of L ΨId,ϕf (Ms ) corresponding to the selective events. It is given by Z Z ∞ Z 1−βα−γ−δ 1 1 n n n 0 0 0 uσn d+1+α 2 (ws (y)ws (z) − ws (z ))ϕf (z ) dydzdz drdx. Rd n−β r B(x,r)3 Vr (6.35) n Now, the term which is linear in ws is easy to deal with: by Fubini’s Theorem, it is equal to Z Z ∞ Z −δ 1 n 0 0 0 uσn d+1+α ws (z )ϕf (z ) dz drdx Rd n−β r B(x,r) Z  Z ∞ d  −δ n 0 0 V1r 0 uσV1 n = uσn ws (z )ϕf (z ) d+1+α dr dz = hws , fi, (6.36) Rd n−β r α where the last equality uses the fact that αβ − δ = 0. It is then straightforward to obtain that  Z t Z t  uσV1 E nk ∞ lim hws , fids − hws , fids = 0. (6.37) k→∞ α 0 0 Similar calculations show that the “quadratic” term in (6.35) is equal to

Z Z n−β log n  Z 2 Z −δ 1 1 n −α uσn d+1+α ws (y)dy ϕf (z)dzdrdx + O (log n) . Rd n−β r B(x,r) Vr B(x,r) (6.38) In contrast with the fixed radius case, here we first have to show that up to a vanishing error term, along the trajectories of the process M n we can replace the average of the density wn over a ball of radius at most n−β log n by wn, the average over a ball of radius n−β centered at the same point. In a second step, we use the same method as in the fixed nk 2 ∞ 2 radius case to prove that for every t ≥ 0, (wt ) converges to (wt ) in the appropriate sense.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 61/89 The spatial Lambda-Fleming-Viot process with selection

Concerning the first point, we have  Z 2 1 n ws (y)dy B(x,r) Vr  Z  Z  1 n n 1 n n n 2 = ws (y)dy + ws (x) ws (y)dy − ws (x) + ws (x) . (6.39) B(x,r) Vr B(x,r) Vr

Suppose we have the following lemma (whose proof is quite technical and is given in Appendix B). Lemma 6.2. Under the conditions of Theorem 1.14, for every r ∈ [n−β, n−β log n],

 Z wn(y)  E s n lim dy − ws (x) = 0 (6.40) n→∞ B(x,r) Vr uniformly in x ∈ Rd and uniformly in s over compact time intervals [0, t]. From this result, we can conclude from a dominated convergence argument and a Taylor expansion of ϕf that the “quadratic” part of (6.35) is equal to

Z Z n−β log n −δ Vr n 2 uσV1 n 2 uσn d+1+α ws (x) f(x) drdx + n = h(ws ) , fi + n, (6.41) Rd n−β r α where n tends to zero as n → ∞ uniformly in s over compact intervals of time. As concerns the second point, we proceed as in (5.40) and below. Using Proposi- tion B.1(ii) in Appendix B, the facts that the support of f is bounded, and that pε is α/(d+1) supported in B(0, ε) (so that τ2 in (B.4) is bounded by ε when |z1 − z2| ≤ ε and n is n 2 sufficiently large), we obtain that the first term in the decomposition (5.40) of h(ws ) , fi is bounded by a constant (independent of n, ε) times

n−a + εα/(d+1) + ε1/4 + εα/(2d+2) + n−β(d−1)ε(α−d)/(2d+2). (6.42)

Letting n tend to infinity in the above expression, we can write that the third term in the decomposition (5.40) is bounded by a constant times

α/(d+1) 1/4 α/(2d+2) (α−1)/4 ε + ε + ε + ε 1{d=1}. (6.43)

Finally, the second term in the decomposition (5.40) tends to 0 by the assumption that n M k converges to M ∞. As in the fixed radius case, we can therefore conclude from (6.41) that  Z t   Z t  uσV1 uσV1 E nk 2 ∞ 2 lim h(ws ) , fi + n ds − h(ws ) , fids = 0. (6.44) k→∞ 0 α α 0

(Note that this convergence is independent of the representatives of the different densities that we choose.) Combining (6.28), (6.37) and (6.44), we obtain (6.21) and we can therefore conclude that Zf is a martingale with respect to the natural filtration of M ∞. As we already mentioned, when d ≥ 2 this is sufficient to conclude that M ∞ satisfies the equations stated in Theorem 1.14(ii). To identify the quadratic variation of Zf and the bracket process between Zf and Zg when d = 1, we proceed exactly as in the fixed radius case and therefore we do not provide all the details. Setting

Z t n n n n n Wt (f) := hwt , fi − hw0 , fi − L ΨId,ϕf (Ms )ds, t ≥ 0, (6.45) 0

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 62/89 The spatial Lambda-Fleming-Viot process with selection we know from the paragraph 1) on tightness that for every n ≥ 1, W n(f) is a zero- mean martingale with predictable quadratic variation Qn given in (6.8) (with F = Id). 0 As a consequence, for every n ≥ 1, 0 ≤ t < t , m ∈ N, 0 ≤ t1 < ··· < tm ≤ t and h1, . . . , hm ∈ Cb(Mλ),   m   n 2 n 2 n n Y n  E W 0 (f) − W (f) − Q 0 + Q h M = 0. t t t t i ti (6.46) i=1

n Recall from our calculations in 1) that all summands in the expression for Wt (f) are bounded uniformly in n ≥ 1 and t in a compact time interval. Furthermore, the same cal- n n culations as those we performed to obtain the limit of the selection part of L ΨId,ϕf (Ms ) (see in particular (6.37) and (6.44)) show that

 4u2 Z t  E nk ∞ ∞ 2 lim Qt − hws (1 − ws ), f ids = 0, (6.47) k→∞ α − 1 0 uniformly over compact intervals of time. Letting n → ∞ in (6.46) along the converging subsequence, we arrive at

 2 Z t0  m  f 2 f 2 4u ∞ ∞ 2 Y ∞ E Z 0 − Z − hw (1 − w ), f ids hi M = 0. (6.48) t t α − 1 s s ti t i=1

This allows us to identify the predictable quadratic variation of the martingale Zf as

2 Z t f 4u ∞ ∞ 2 [Z ]t = hws (1 − ws ), f ids, t ≥ 0. (6.49) α − 1 0

By the analogue of (5.14) (with n−1/3 replaced by n−β), every jump of W n(f) is bounded by unVol(Sϕf ) independently of the size of the radius of the event, where we recall that −γ f f un = un . Consequently, Z has a.s. continuous trajectories. Since Z0 = 0, we can use the Dubins-Schwarz Theorem (or rather its extension since we do not know whether f f [Z ]∞ = +∞, see Remark 5.1) to conclude that Z is a time-changed Brownian motion, solution to the stochastic differential equation

2u p ∞ ∞ 2 f dWt = √ hw (1 − w ), f i dB , (6.50) α − 1 t t t where Bf denotes standard Brownian motion. The bracket process between Zf and Zg is then obtained by the same kind of calculations, writing first the bracket process for a fixed n and then identifying the limit by letting nk → ∞. We thus obtain that any limit of n a subsequence of (M )n≥1 satisfies the set of equations stated in Theorem 1.14(i). 3) Uniqueness of the limit. The argument is exactly the same as in the corresponding part of the proof of Theorem 1.11. Indeed, by another modification of the results of Chapter 7 in [35] α (replacing Brownian motion by the symmetric α-stable process (ζt)t≥0 generated by D – see Lemma 6.1), we obtain that any solution to the limiting system of equations stated in Theorem 1.14 is dual through the set of relations (1.26) to a system of particles following independent symmetric α-stable processes (with the same law as ζ), and branching independently at rate uσV1/α into two particles starting at the location of their parent. In one dimension, each pair of particles also coalesces at a rate 4u2/(α − 1) times the local time at zero of their separation, independently of the other pairs. Since the set of all test functions of the form (5.59) is separating, we can again conclude that there is at most one solution to the system of equations of Theorem 1.14. Hence, this solution n exists and the full sequence (M )n≥0 converges to it in DMλ [0, ∞).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 63/89 The spatial Lambda-Fleming-Viot process with selection

6.2 Proof of Theorem 1.16 Most of the proof is identical to that of Theorem 1.13. That the only possible limit for n (Ξt )t≥0 is the system of branching (and in one dimension coalescing) symmetric α-stable processes described in the theorem, again follows from an adaptation of Chapter 7 of [35], in which the only change is that Brownian motion is replaced by the stable process generated by Dα (see (6.30)) and we have added natural selection/branching of particles. This gives us the analogue of Lemma 5.3 in the case of stable radii, whose proof is exactly the same as that of Lemma 5.3. Lemma 6.3. The finite dimensional distributions of the system of scaled processes Ξn converge as n → ∞ to those of the system of branching and coalescing α-stable motions Ξ∞, described in the statement of Theorem 1.16. In particular, the only possible limit n ∞ point for the sequence (Ξ )n≥1 is Ξ . n Next, we have to show that the sequence (Ξ )n≥1 is tight. Let again Pψ denote

d the probability measure on DMp(R )[0, ∞) under which for each n ≥ 1, the locations n of the atoms of Ξ0 have density ψ. As in the proof of Theorem 1.13, after showing that the compact containment condition holds if we replace Rd by its one-point com- d n d pactification Rc and consider each Ξ as taking its values in Mp(Rc), we shall use n Theorem 3.9.1 in [21] to deduce the tightness of (Ξ )n≥1 in D d [0, ∞) from the Mp(Rc ) n ∞ d tightness of (Φexp,ln f (Ξ ))n≥1 in D[0,1][0, ∞) for every f ∈ C (Rc) with values in [0, 1]. More precisely, we show that for any such function f, every T > 0, every sequence of stopping times (τn)n≥1 bounded by T and every ε > 0, there exists δ = δ(f, T, ψ, ε) > 0 such that N n N n τn+t τn  Y Y  lim sup P sup fξn,i  − fξn,i > ε ≤ ε. ψ τn+t τn (6.51) n→∞ 0≤t≤δ i=1 i=1 Once these properties have been shown, we can use Lemma 6.3 and Corollary 3.9.3 in n ∞ d [21] to conclude that (Ξ )n≥1 is tight in DMp(R )[0, ∞) and converges to Ξ . Again, we proceed in four steps. First, by exactly the same arguments as in the proof of Theorem 1.13, for every T > 0 and every ε > 0, there exists K > 0 such that for every n ∈ N we have   n ε Pψ[An] := Pψ sup |Ξs | ≤ K ≥ 1 − , (6.52) 0≤s≤T +1 4 which, in particular, grants us the compact containment condition since the set of all point measures on the compact space Rcd with total mass less than K is compact. Furthermore, there exists δ1 ∈ (0, 1), independent of the subinterval of [0,T ] considered, such that ε P at least 1 particle created in (τ , τ + δ ]; A  ≤ . (6.53) ψ n n 1 n 4 As before, the difficulty will be to control the coalescence (i.e., the events in which two or more particles are removed and replaced by one or two “parental” particles), but suppose for a moment that there is no change in the number of particles in the interval (τ , τ + δ ] I Ξn n n 2 and write n for the indexing set of the particles in τn . Then, exactly as before, we can write

Y n,i  Y n,i X n,i n,i f ξ − f ξ ≤ Ck∇fk ξ − ξ , (6.54) τn+t τn τn+t τn i∈In i∈In i∈In and it suffices to consider the motion of a single particle to control the evolution of the whole set of particles. This is slightly more involved than in the fixed radius case.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 64/89 The spatial Lambda-Fleming-Viot process with selection

n n Let (Zt )t≥0 be a Lévy process, independent of (ξt )t≥0 and with infinitesimal generator Z n−β Z n 1 Vr(x, y)  D φ(x) := u(1 + sn) d+1+α φ(y) − φ(x) dy dr, (6.55) 0 r Rd Vr

Rd Rd n n for every φ ∈ C0(c) and x ∈ c . Then the process (Xt)t≥0 defined by Xt = ξt + Zt α α has generator (1 + sn)D , where D was shown in Lemma 6.1 to be the generator of a symmetric stable process (indeed, observe that the jump rates of ξn and Zn depend only on the jump size |y − x|, hence the fact that the intensity measure of the jumps of X is the sum of the intensity measures of ξn and Zn). Using the strong Markov property and standard results on the growth of Lévy processes, see e.g. [42], we have for any η, δ > 0, and any stopping time Tn   δ Pψ sup XTn+t − XTn > η < C α (6.56) t∈[0,δ] η for a constant C which is independent of η, δ and Tn. Since   P sup ξn − ξn > η ψ Tn+t Tn t∈[0,δ]     ≤ P sup X − X > η + P sup Zn − Zn > η , ψ Tn+t Tn ψ Tn+t Tn (6.57) t∈[0,δ] t∈[0,δ] it remains to show that   P sup Zn − Zn > η → 0, n → ∞. ψ Tn+t Tn as (6.58) t∈[0,δ]

n Now, by construction, the process (Zt )t≥0 has finite predictable quadratic variation, n whose time derivative when Zt = x is Z n−β Z 1 Vr(x, y) 2 (1 + sn)u d+1+α f(y) − f(x) dydr 0 r Rd Vr Z n−β Z 1 Vr(x, y) 2 2 = (1 + sn)u d+1+α (y − x).∇f(x) + O(|y − x| ) dydr 0 r Rd Vr Z n−β Z 1 2 0 −β(2−α) ≤ C d+1+α |y − x| dydr = C n , (6.59) 0 r B(x,2r) where the Taylor expansion is justified since Vr(x, y) = 0 if |x − y| > 2r and we are concentrating on radii r ≤ n−β, and the first integral on the right hand side vanishes by rotational symmetry. Hence, we can conclude that for any η, δ,   lim P sup Zn − Zn > η = 0. ψ Tn+t Tn (6.60) n→∞ t∈[0,δ]

Coming back to (6.57), and taking Tn = τn and η fixed, we can conclude that there exists δ3 ∈ (0, δ2] such that for n large enough,   ε P sup ξn − ξn > η ≤ . ψ τn+t τn (6.61) t∈[0,δ3] 4K

Choosing η = ε/(KCk∇fk) and recalling (6.54), we obtain that for all sufficiently large n,  Y Y  ε P sup fξn,i  − fξn,i > ε ; A ,Bc ,Cc ≤ , ψ τn+t τn n δ3 δ3 (6.62) t∈[0,δ3] 4 i∈In i∈In

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 65/89 The spatial Lambda-Fleming-Viot process with selection

c where as in the fixed radius case, Bδ is the event that there is no branching event in n n c n n (τ , τ + δ] and Cδ is the event that there is no coalescence in (τ , τ + δ]. Finally, tightness will be proven if we can show that coalescence events cannot accumulate. In particular, since we have controlled the total number of particles and the probability of branching, we just need to control the probability that two particles coalesce. The result will be based on the following lemma, in which we use again the interpretation of the replacement of a particle by its “parent” as a jump by this particle (or ancestral lineage – when there are two parents, we choose one of them uniformly at random). ˆ1 ˆ2 Lemma 6.4. Let (ξnγ t)t≥0 and (ξnγ t)t≥0 be two independent copies of the jump process obtained by following the (unscaled) position of one particle on the timescale (nγ t, t ≥ 0), n ˆ2 ˆ1 and let ζt = ξnγ t − ξnγ t denote their difference. Then, for every t ≥ 0 we have: (i) When d = 1, there exists C(t) > 0 such that

1−γ  1 Z n t 1  lim sup Eψ γ α n α ds ≤ C(t). (6.63) n→∞ n 0 2 ∨ |ζs |

Furthermore, the function t 7→ C(t) can be chosen such that C(t) ↓ 0 as t → 0. (ii) When d ≥ 2, 1−γ  1 Z n t 1  lim Eψ γ α n α ds = 0. (6.64) n→∞ n 0 2 ∨ |ζs | We defer the proof of Lemma 6.4 until after the end of the proof of Theorem 1.16. Suppose that we start with a sample of two (non independent) particles at some d γ (unscaled) separation z0 ∈ R . As before, we work on the timescale n so that a single particle jumps at rate O(1) and we suppose the two particles ξ1 and ξ2 are currently at locations 0 and z (in fact, only their separation matters). Then, the infinitesimal 2 1 generator Γ of the difference walk (ξnγ t − ξnγ t)t≥0 (until it reaches a cemetery state ∆, say the point “infinity” in (Rcd)2, corresponding to the two walks having coalesced) is d 2 equal, for every given φ ∈ C0((Rc) ), to

Γφ(z)  ∞ Z Z 1 Z 1{y∈B(x,r)} = 2u(1 + sn) d+1+α 1{0∈/B(x,r)}1{z∈B(x,r)} dxdr Rd 1 r Rd Vr ∞  Z 1 Z 1{y∈B(x,r)} + (1 − un) d+1+α 1{0∈B(x,r)}1{z∈B(x,r)} dxdr (φ(y) − φ(z))dy 1 r Rd Vr Z  Z ∞ Z  2 −γ 1 1{y∈B(x,r)} + u n (1 + sn) d+1+α 1{0∈B(x,r)}1{z∈B(x,r)} dxdr Rd 1 r Rd Vr × (φ(∆) − φ(z)) dy, Z  Z ∞  1 Vr(y, z) = 2u(1 + sn) d+1+α dr (φ(y) − φ(z)) dy Rd 1 r Vr Z  Z ∞  2 −γ 1 Vr(0, y, z) − 2u n (1 + sn) d+1+α dr (φ(y) − φ(z)) dy Rd 1 r Vr Z  Z ∞  2 −γ 1 Vr(0, y, z) + u n (1 + sn) d+1+α dr (φ(∆) − φ(z)) dy, (6.65) Rd 1 r Vr where Vr(0, y, z) denotes the volume of the intersection B(0, r) ∩ B(y, r) ∩ B(z, r). From the first two terms on the r.h.s. of (6.65), we see that until coalescence we can γ n couple the difference walk (on the timescale n ) with the difference (ζt )t≥0 between two independent random walks, each jumping according to the law of a single walk but with

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 66/89 The spatial Lambda-Fleming-Viot process with selection each jump z 7→ y “cancelled” with probability

2 −γ R ∞ 1 Vr (0,y,z) 2u n (1 + sn) d+1+α dr 1 r Vr ∆n(z, y) = . (6.66) R ∞ 1 Vr (y,z) 2u(1 + sn) d+1+α dr 1 r Vr

(One can check that these two descriptions give rise to the same jump times and embedded chain.) Each time we cancel a jump, with probability one half it was a coalescence in the original system (compare the second and third terms on the r.h.s. of (6.65)), but the key point is that if there are no cancelled jumps, then there was no coalescence. It therefore suffices to show that we can find δ2 ∈ (0, δ1] such that, for sufficiently 1−γ large n, the probability that an event is cancelled in the interval [0, δ2n ] is smaller than ε/(4K(K − 1)). Now, according to the expression on the right hand side of (6.65), when the two particles lie at separation z ∈ Rd, a cancelled event occurs at instantaneous rate Z  Z ∞  2 −γ 1 Vr(0, y, z) 2u n (1 + sn) d+1+α dr dy Rd 1 r Vr Z Z 2 −γ 1 −γ −α ≤ 2u n (1 + sn) d+1+α drdy = C1n 2 ∨ |z| . (6.67) Rd |z| |y| r 1∨ 2 ∨ 2

n Hence, (using the coupling with (ζt )t≥0), the probability of having no event cancelled up to time n1−γ t (corresponding to time nt in original units) is equal to

  Z n1−γ t Z  Z ∞ n   2 −γ 1 Vr(0, y, ζs ) Eψ exp − 2u n (1 + sn) d+1+α dr dyds (6.68) 0 Rd 1 r Vr   Z n1−γ t   Z n1−γ t  −γ n −α −γ ds ≥ Eψ exp − C1n 2 ∨ |ζ | ds ≥ 1 − C1 Eψ n . s n α 0 0 2 ∨ |ζs |

But Lemma 6.4 shows that we can indeed find δ2 > 0 such that

1−γ  Z n δ2  −γ ds ε lim sup Eψ n ≤ . (6.69) n α n→∞ 0 2 ∨ |ζs | 2C1K(K − 1)

Consequently,

c K(K − 1) ε ε Pψ[at least 1 coalescence in (τn, τn + δ2]; An,B ] ≤ = , (6.70) δ2 2 2K(K − 1) 4 which was the last result we needed to complete the proof of tightness and therefore of Theorem 1.16.

n Proof of Lemma 6.4. As before, we shall exploit the fact that (ζt )t≥0 is “nearly” a sym- n metric α-stable process. Indeed, the intensity at which (ζt )t≥0 jumps by some vector y is independent of its current location and equal to  Z ∞  1 Vr(0, y) 2(1 + sn) d+1+α dr dy. (6.71) 1 r Vr

n n Writing (Zt )t≥0 for a jump process, independent of (ζt )t≥0, starting at 0 and with jump intensity  Z 1  1 Vr(0, y) 2(1 + sn) d+1+α dr dy, (6.72) 0 r Vr

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 67/89 The spatial Lambda-Fleming-Viot process with selection

n n then the generator of the process (Xt)t≥0, where Xt = ζt + Zt , is precisely 2(1 + sn) times the operator Dα defined in (6.30), which we already checked corresponds to a n symmetric α-stable process. Once again, the idea is that the jumps of (Zt )t≥0 (which are bounded by 2) do not contribute much to the evolution of (Xt)t≥0. More precisely, let us show that there exists C > 0 such that for every n large enough and every s ≥ 1,  n  |Z | 2 P √s > (log n)2 ≤ Ce−(log n) /d. (6.73) ψ s n To this end, observe first that since the law of Zs is invariant under rotation, we can write that  n   n(1) 2   n(1) 2  |Z | |Zs | (log n) Zs (log n) P √s > (log n)2 ≤ d P √ > = 2d P √ > , (6.74) ψ s ψ s d ψ s d

n(1) n n(1) where Zs denotes the first coordinate of Zs . Now, (Zs )s≥0 is again a symmetric Lévy process with jumps bounded by 2, and so Theorem 25.3 in [46] shows that for every n(1) s, q ≥ 0, E[exp(qZs )] < ∞. In this case, it is known that the characteristic exponent n n(1) Ψ of (Zs )s≥0, given here by a formula of the form Z n iqx  n Ψ (q) = 1 − e + iqx1{|x|<1} m (dx), (6.75) [−2,2] has an analytic extension to the half-plane with negative imaginary part, and we have

h qZn(1) i sψn(q) n n Eψ e s = e , with ψ (q) = −Ψ (−iq). (6.76)

As a consequence, the Markov inequality gives us that

 n(1) 2  √ Zs (log n) 2 n P √ > ≤ e−(log n) /d+sψ (1/ s). (6.77) ψ s d Since the measure mn has support in [−2, 2], we can write that when q is small Z n 2 2 3 3  n ψ (q) = − 1 − [1 + qx + q x /2 + O(q x )] + qx1{|x|<1} m (dx) [−2,2] Z 2 Z n q 2 n 3 = q x1{|x|≥1}m (dx) + x m (dx) + O(q ), (6.78) [−2,2] 2 [−2,2] n where the first term on the right is zero, by symmetry. Furthermore, sn → 0 and so m m C > 0 converges to some√ finite . Consequently, there exists a constant such that for every s ≥ 1, ψn(1/ s) ≤ C/s. Together with (6.74) and (6.77), this gives us (6.73). It will be convenient to suppose that ζ0 = 0, but notice that there will be no loss of 2 generality in so-doing, since for n sufficiently large, ζ0 will be bounded by (log n) and so, for s > 1, can be absorbed into our bound for Zs. Similarly, we can, and do, replace α n α n α 2 ∧ |ζs | by 1 ∧ |ζs | in the denominator of our integrand. Based on these considerations, let us return to the integral of interest when d ≥ 2. R R Fixing a ∈ (0, γ) and splitting the integral with respect to time into [0,na] + [na,n1−γ t], we obtain  Z n1−γ t  Z n1−γ t   1 1 a−γ 1 1 Eψ γ n α ds = O(n ) + γ Eψ n α ds n 0 1 ∨ |ζs | n na 1 ∨ |Xs − Zs | n1−γ t  n  n1−γ t  √  Z |Z | Z 1{|Zn|≤(log n)2 s} a−γ −γ √s 2 −γ s ≤ Cn + n Pψ > (log n) ds + n Eψ n α ds na s na 1 ∨ |Xs − Zs | n1−γ t √ 2 Z 1 n 2  a−γ 1−2γ −(log n) /d −γ {|Zs |≤(log n) s} ≤ Cn + Cn te + n Eψ n α ds. (6.79) na 1 ∨ |Xs − Zs |

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 68/89 The spatial Lambda-Fleming-Viot process with selection

Since the first two terms on the right tend to 0 as n → ∞, it now suffices to show that the last term remains bounded when n is large. α By Lemma 5.3 in [10], if (ps )s≥0 denotes the transition density of (Xt)s≥0, we have, for every s > 0 and x ∈ Rd,

α α −d/α α −1/α ps (0, x) =: ps (x) = s p1 (xs ) (6.80) and there exists Cd,α > 0 (independent of x) such that

α d+α−1 0 ≤ p1 (x) ≤ Cd,α 1 + |x| . (6.81) √ Hence, for any s ≥ na and any z ∈ Rd such that |z| ≤ (log n)2 s, we can write

 1  Z 1 E ≤ s−d/α dx (6.82) ψ α α −1/α d+α 1 ∨ |Xs − z| Rd (1 ∨ |x − z| )(1 + |xs | ) Z 1 Z 1 ≤ s−d/α dx + s−d/α dx −1/α d+α α −1/α d+α B(z,1) 1 + |xs | B(z,1)c |x − z| (1 + |xs | ) Z dx Z dx ≤ Cs−d/α + C0s−d/α + C00s−d/α . α α −1/α d+α B(0,s1/α)\B(z,1) |x − z| B(0,s1/α)c |x − z| |xs | √ But since s ≥ na and |z| ≤ (log n)2 s, we have

−1/α 2 1 − 1 2 −a(2−α)/(2α) |z|s ≤ (log n) s 2 α ≤ (log n) n → 0, (6.83) and so the second term on the right is bounded (after a change to polar coordinates) by

Z s1/α C0s−d/α ρd−1−αdρ = C0s−1, (6.84) 1 while the third term is bounded by Z ∞ C00s−d/αs1+d/α ρd−1−2α−ddρ = C00s−1. (6.85) s1/α Since all the constants depend on neither z (in the range considered) nor s, we deduce that the right hand side of (6.79) is bounded by

2 C0na−γ +Cn1−2γ te−(log n) /d+C00n−γ n−a(d−α)/d+log n+log t → 0 as n → ∞, (6.86) which proves (ii). The only point that differs when d = 1 is that 1 − d/α > 0 and so

Z n1−γ t −γ −1/α −γ (1− 1 )(1−γ) 1− 1 n s ds ≤ Cn n α t α . (6.87) na

1 An easy check confirms that (1 − α )(1 − γ) − γ = 0, and so C(t) exists and is proportional 1− 1 to t α . Since α > 1, we also have that C(t) ↓ 0 as t → 0.

Acknowledgments. We are extremely grateful to the two reviewers (one in particular, who invested a lot of time in carefully reading and commenting on different versions of the paper) and the Associate Editor for their valuable suggestions to significantly improve the presentation of the results and some of the arguments. AME was supported in part by EPSRC Grant EP/I01361X/1, AV was supported in part by the chaire Modélisation Mathématique et Biodiversité of Veolia Environnement-École Polytechnique-Museum National d’Histoire Naturelle-Fondation X and FY was supported in part by EPSRC Grant EP/I028498/1.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 69/89 The spatial Lambda-Fleming-Viot process with selection

References

[1] Abramowitz, M. and Stegun, I. A., eds.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications, 1972. MR-1225604 [2] Aldous, D.: Stopping times and tightness. Ann. Probab. 6, (1978), 335–340. MR-0474446 [3] Bah, B. and Pardoux, E.: Lambda-lookdown model with selection. Stoch. Process. Appl. 125, (2015), 1089–1126. MR-3303969 [4] Brunet, E. and Derrida, B.: Effect of microscopic noise on front propagation. J. Stat. Phys. 103, (2001), 269–282. MR-1828730 [5] Barton, N. H., Depaulis, F. and Etheridge, A. M.: Neutral evolution in spatially continuous populations. Theor. Pop. Biol. 61, (2002), 31–48. [6] Biswas, N., Etheridge, A. M. and Klimek, A.: The spatial Lambda-Fleming-Viot process with fluctuating selection, arXiv:1802.08188. [7] Barton, N. H., Etheridge, A. M. and Véber, A.: A new model for evolution in a spatial continuum. Electron. J. Probab. 15, (2010), 162–216. MR-2594876 [8] Barton, N. H., Etheridge, A. M. and Véber, A.: Modelling evolution in a spatial continuum. JSTAT, (2013), P01002. MR-3036210 [9] Berestycki, N., Etheridge, A. M. and Véber, A.: Large-scale behaviour of the spatial Λ-Fleming- Viot process. Ann. Inst. H. Poincaré Probab. Statist. 49, (2013), 374–401. MR-3088374 [10] Biler, P.and Woyczynski, W. A.: Global and exploding solutions for nonlocal quadratic evolution problems. SIAM J. Appl. Math. 59, (1998), 845–869. MR-1661243 [11] Billingsley, P.: Probability and Measure. Wiley, New York, 1995. MR-1324786 [12] Burkholder, D. L.: Distribution function inequalities for martingales. Ann. Probab. 1, (1973), 19–42. MR-0365692 [13] Conlon, J. G. and Doering, C. R.: On travelling waves for the stochastic Fisher-Kolmogorov- Petrovsky-Piscunov equation. J. Stat. Phys. 120, (2005), 421–477. MR-2182316 [14] Cabré, X. and Roquejoffre, J.-M.: The influence of fractional diffusion in Fisher-KPP equations. Communications in Mathematical Physics 320, (2013), 679–722. MR-3057187 [15] Etheridge, A. M.: Drift, draft and structure: some mathematical models of evolution. Banach Center Publ. 80, (2008), 121–144. MR-2433141 [16] Etheridge, A. M.: Some Mathematical Models from Population Genetics: École d’été de Probabilités de Saint-Flour XXXIX-2009. Springer, 2011. MR-2759587 [17] Etheridge, A. M., Freeman, N. and Penington, S.: Branching Brownian motion, mean curva- ture flow and the motion of hybrid zones. Electron. J. Probab. 22, (2017), 103. MR-3733661 [18] Etheridge, A. M., Freeman, N., Penington, S. and Straulino, D.: Branching Brownian motion and selection in the spatial Lambda-Fleming-Viot process. Ann. Applied Probab. 27, (2017), 2605–2645. MR-3719942 [19] Etheridge, A. M., Freeman, N. and Straulino, D.: The Brownian net and selection in the spatial Lambda-Fleming-Viot process. Electron. J. Probab. 22, (2017), 39. MR-3646065 [20] Etheridge, A. M. and Kurtz, T. G.: Genealogical constructions of population models. Ann. Probab. 47, (2019), 1827–1910. MR-3980910 [21] Ethier, S. N. and Kurtz, T. G.: Markov processes: characterization and convergence. Wiley, 1986. MR-0838085 [22] Evans, S. N.: Coalescing Markov labelled partitions and a continuous sites genetics model with infinitely many types. Ann. Instit. H. Poincaré Probab. Statist. 33, (1997), 339–358. MR-1457055 [23] Felsenstein, J.: A pain in the torus: some difficulties with the model of isolation by distance. Amer. Nat. 109, (1975), 359–368. [24] Fisher, R.: The wave of advance of advantageous genes. Annals of Eugenics, 7, (1937), 355–369. [25] Forien, R. and Penington, S.: A central limit theorem for the spatial Lambda-Fleming-Viot process with selection. Electron. J. Probab. 22, (2017), 5. MR-3613698

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 70/89 The spatial Lambda-Fleming-Viot process with selection

[26] Foucart, C.: The impact of selection in the Λ-Wright-Fisher model. Electron. Commun. Probab. 18, (2013), 1–10. Erratum: Electron. Commun. Probab. 15, (2014), 1–3. MR-3101637 [27] Ikeda, N., Nagasawa, M. and Watanabe, S.: Branching Markov processes I. J. Math. Kyoto Univ. 8, (1968), 233–278. MR-0232439 [28] Kallenberg, O.: Random measures. Academic Press, London-New York-San Francisco, 1976. MR-0431373 [29] Kallenberg, O.: Foundations of Modern Probability, 2nd edition. Springer, New York, 2002. MR-1876169 [30] Khoshnevisan, D. and Schilling R. L.: From Lévy-Type Processes to Parabolic SPDEs. Ad- vanced Courses in Mathematics CRM Barcelona, Springer, 2016. MR-3587832 [31] Kimura M.: “Stepping stone” model of population. Ann. Rept. Nat. Inst. Genetics Japan 3, (1953), 62–63. [32] Kolmogorov, A. N., Petrovsky, I. and Piscounov, N.: Étude de l’équation de la diffusion avec croissance de la quantité de matière et son application à un problème biologique. Moscow Univ. Math. Bull. 1, (1937), 1–25. [33] Krone, S. M. and Neuhauser, C.: Ancestral processes with selection. Theor. Pop. Biol. 51, (1997), 210–237. [34] Lawler, G. F. and Limic, V.: Random Walk: a modern introduction. Cambridge University Press, 2010. MR-2677157 [35] Liang, R. H.: Two continuum-sites stepping stone models in population genetics with delayed coalescence. PhD Thesis, University of California, Berkeley, 2009. MR-2713924 [36] Miller, L. R.: Evolution of highly fecund organisms. PhD Thesis, University of Oxford, 2015. [37] Mueller, C., Mytnik, L. and Quastel, J.: Effect of noise on front propagation in reaction- diffusion equations of KPP type. Invent. Math. 184, (2011), 405–453. MR-2793860 [38] Mueller, C. and Sowers, R.: Random travelling waves for the KPP equation with noise. J. Functional Analysis 128, (1995), 439–498. MR-1319963 [39] Mueller, C. and Tribe, R.: Stochastic p.d.e.’s arising from the long range contact and long range voter processes. Probab. Theory Relat. Fields 102, (1995), 519–545. MR-1346264 [40] Neuhauser, C. and Krone, S. M.: Genealogies of samples in models with selection. Genetics 145, (1997), 519–534. [41] Port, S. C. and Stone, C. J.: Infinitely divisible processes and their potential theory. II. Ann. Instit. Fourier 21, (1971), 179–265. MR-0346919 [42] Pruitt, W. E.: The growth of random walks and Lévy processes. Ann. Probab. 9, (1981), 948–956. MR-0632968 [43] Rebolledo, R.: Sur l’existence de solutions à certains problèmes de semimartingales. C.R. Acad. Sci. Paris, (1980), 290. MR-0579985 [44] Revuz, D. and Yor, M.: Continuous martingales and Brownian motion, 3rd edition. Springer, 1999. MR-1725357 [45] Ridler-Rowe, C. J.: On first hitting times of some recurrent two-dimensional random walks. Z. Wahr. verw. Geb. 5, (1966), 187–201. MR-0199901 [46] Sato, K.: Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999. MR-1739520 [47] Shiga, T. and Shimizu, A.: Infinite dimensional stochastic differential equations and their applications. J. Math. Kyoto Univ. 20, (1980), 395–416. MR-0591802 [48] Stein, E. and Weiss, G.: Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, 1971. MR-0304972 [49] Véber, A. and Wakolbinger, A.: The spatial Λ-Fleming-Viot process: An event-based construc- tion and a lookdown representation. Ann. Instit. H. Poincaré Probab. Statist. 51, (2015), 570–598. MR-3335017 [50] Yule, G. U.: A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis. Philos. Trans. Roy. Soc. London Ser. B 213, (1924), 21–87.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 71/89 The spatial Lambda-Fleming-Viot process with selection

A Continuity estimates in the fixed radius case

n In this section, we state the continuity estimates for the scaled measures MT required in the proof of Theorem 1.11. Because their proof is an adaptation of the (long and slightly more involved) proof of Proposition B.1(ii), we do not give it here and instead refer to Appendix B. These estimates have the same flavour as the one dimensional estimates derived in [39] for the convergence of the local densities of 1’s in the long range voter or contact process. Proposition A.1. Under the conditions of Theorem 1.11, for every T > 0 there exist d a, λ, v, C > 0 such that for every n ≥ 1, z1, z2 ∈ R such that |z1 − z2| < 1 and  ∈ (0, 1),

 1 Z  E n wT (x)(1{|x−z1|<} − 1{|x−z2|<})dx V Rd

−a 1/4 1/2 λ(|z1|+) −(d−1)/6 (2−d)/4 ≤ Cn + Cτ + C |z1 − z2| + τ e + Cn τ , (A.1) where −v 2/(d+1) τ = τ(n, z1, z2) = n ∨ |z1 − z2| , and  can depend on n (as long as n ≤ 1).

B Continuity estimates in the stable radius case

n Our aim in this section is to obtain some continuity estimates for the measure MT (this time in the stable radius case), which are valid for fixed (large) n. Since in the stable radius case, we also need to compare the local densities of type-1 individuals over balls of radius n−β to the densities over balls of radius O(log n)n−β, Proposition B.1 below is more complete than Proposition A.1. Lemma 6.2 will then follow as a corollary of item (i). Proposition B.1. Suppose the conditions of Theorem 1.14 are satisfied. Fix T > 0. Then, (i) There exist a, C > 0 (dependent on T ) such that for every z ∈ Rd, t ∈ [0,T ], n ≥ 1 −β 0 and n ≤ n < n ≤ 1,  1 Z 1 Z  E n n wt (y)dy − wt (y)dy V V 0 0 n B(z,n) n B(z,n) 1− d+1 β(α−d)−γ h 1− 2(d+1) −a 0 d α d/2 2 0 2 α ≤ Cn + Cτ1 + Cn(log n) τ1 + (log n) n n τ1 (B.1)

β(2−α)d 1− d+1 i1/2 0 − 2(d+1) α + nn τ1 , where −β(2−α)/(2(d+1)) τ1 = τ1(n) = n . (B.2)

(ii) There exist a, λ, C > 0 (dependent on T ) such that for every |z1 − z2| < 1, t ∈ [0,T ], n ≥ 1 and  ∈ (0, 1),

 1 Z  E n wt (x)(1{|x−z1|<} − 1{|x−z2|<})dx V Rd 1/2 −a 1/4 1/2 λ(|z1|+) −β(d−1) 1−d/α ≤ Cn + Cτ2 + C |z1 − z2| + (τ2) e + C n τ2 , (B.3) where −β(2−α)d/(4(d+1)) α/(d+1) τ2 = τ2(n, z1, z2) = n ∨ |z1 − z2| , (B.4) and  can depend on n (as long as n ≤ 1).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 72/89 The spatial Lambda-Fleming-Viot process with selection

In particular, (ii) implies uniform continuity of the limiting process of allele frequen- cies. That is: Corollary B.2. Suppose the conditions of Theorem 1.14 are satisified and fix T > 0. Then for every |z1 − z2| < 1, t ∈ [0,T ] and  ∈ (0, 1),

 1 Z  E n lim sup wt (x)(1{|x−z1|<} − 1{|x−z2|<})dx n→∞ V Rd (α−1)/4 α/(d+1) 1/4 ≤ C|z1 − z2| 1{d=1} + C|z1 − z2| 1{d≥2} + C |z1 − z2|

α/(2(d+1)) λ(|z1|+) + |z1 − z2| e , where C depends on T . Before proving Proposition B.1, let us show how it implies Lemma 6.2.

−β 0 −β −β Proof of Lemma 6.2. Set n = n and n ∈ [n , n log n] in (i). Then

1− 2(d+1) β(2−α) β(2−α) 0 2 α 2 −2β 2(d+1) − α n τ1 = (log n) n n , and it is straightforward to check that the exponent of n on the right hand side is negative for any α ∈ (1, 2). Moreover,

1− d+1 2−α 1 1 0 d α a −β(1− 2 ( α − d+1 )) n(log n) τ1 ≤ (log n) n for some a > 0, and again one can check that the exponent of n is negative in all dimensions. Thus the right hand side of (B.1) tends to zero and the lemma follows.

The rest of this section is devoted to the proof of Proposition B.1. Note that the different lemmas that appear in this proof will be shown later in Appendix B.3.

Proof of Proposition B.1. We define for x ∈ Rd,

1 ur(x) = 1{|x|≤r}, Vr

∗k n 1 R n u to be the k-fold convolution of ur and w˜ (x; r) = w (y)dy. Recall the r Vr B(x,r) expression (6.4) for the extended generator of M n. For ϕ ∈ L1(Rd), we follow our usual n strategy of writing the value of hwT , ϕi as a sum of drift and martingale terms (see the d 1 d beginning of the proof of Theorem 1.14, where we can replace ϕ ∈ Cc(R ) by ϕ ∈ L (R ) n n by a density argument): for any representative wt of the density of each Mt , we have

Z T Z Z ∞ Z n n n,ϕ 1−βα 1 1 hwT , ϕi =hw0 , ϕi + MT + unn d+1+α 2 (B.5) 0 Rd n−β r B(x,r)2 Vr n n n n × wt (y)(1 + snwt (z))h1B(x,r)(1 − wt ), ϕi

n n n  n o − 1 − wt (y) + sn(1 − wt (y)wt (z)) h1B(x,r)wt , ϕi dydzdrdxdt Z T Z Z ∞ Z n n,ϕ 1 1  n =hw0 , ϕi + MT + u d+1+α 2 wt (y)h1B(x,r), ϕi 0 Rd n−β r B(x,r)2 Vr n n n n − h1B(x,r)wt , ϕi + sn(wt (y)wt (z)h1B(x,r), ϕi − h1B(x,r)wt , ϕi) dydzdrdxdt

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 73/89 The spatial Lambda-Fleming-Viot process with selection

1−βα n,ϕ (since unn = u), where (MT )T ≥0 is a mean zero martingale. The first term in the integrand in (B.5) is equal to:

Z Z ∞ Z 1 1 n n d+1+α {wt (y)h1B(x,r), ϕi − h1B(x,r)wt , ϕi}dydrdx Rd n−β r B(x,r) Vr Z Z ∞ Z Z 1 1 n n = d+1+α 1{|x−y|≤r}1{|x−z|≤r}{wt (y)ϕ(z) − wt (z)ϕ(z)}dzdydrdx Rd n−β r Vr Rd Rd Z ∞ Z Vr ∗2 n n = d+1+α {(ur ∗ wt )(z)ϕ(z) − wt (z)ϕ(z)}dzdr n−β Rd r Z Z ∞ n Vr ∗2 = wt (z) d+1+α {(ur ∗ ϕ)(z) − ϕ(z)}drdz. (B.6) Rd n−β r

The second term in the integrand in (B.5) is equal to

Z Z ∞ 1 n 2 n sn d+1+α (w ˜t (x; r) h1B(x,r), ϕi − h1B(x,r)wt , ϕi)drdx Rd n−β r Z Z ∞ 1 n 2 n =sn d+1+α 1{|x−y|

2 1−βα 2 −γ 2 −(α−1)/(2α−1) Since unn = u n = u n , the martingale term in (B.5) has predictable quadratic variation

Z T Z Z ∞ n,ϕ 2 −γ 1 n n n n 2 [M ]T = u n d+1+α w˜t (x; r)(1 + snw˜t (x; r))h1B(x,r)(1 − wt ), ϕi 0 Rd n−β r n n 2  n 2o + 1 − w˜t (x; r) + sn(1 − w˜t (x; r) ) h1B(x,r)wt , ϕi drdxdt.

It is convenient to replace this martingale problem by a mild version, obtained by n replacing ϕ by the time dependent function ζt (x, z, ) chosen to solve Z ∞ n uVr h ∗2 n n i ∂tζt (x; z, ) = d+1+α (ur ∗ ζt (·; z, ))(x) − ζt (x; z, ) dr n−β r

n n with initial condition ζ0 (·; z, ). That is ζt (·; z, ) is the density at time t of the d- n n dimensional Lévy process, (Xt )t≥0, with initial distribution ζ0 (·; z, ), zero drift, no Brownian component, and Lévy measure

Z ∞ n uVr ∗2 ν (dx) = d+1+α ur (x)drdx n−β r

Rd n L1 Rd N for x ∈ (in particular, ζt (x, z, ) ∈ ( )). Here we assume that for any n ∈ , Rd n n n z ∈ and  > 0, ζ0 (·; z, ) = ζ0 (· − z; 0, ) and that the support of ζ0 (·; 0, ) is included in B(0, ). Of course, the particular example we have in mind is ζn(·; z, ) = 1 1 . 0 V {|·−z|<} The parameter  can be taken to depend on n. We observe that νn is radially symmetric. Let Z n −d n 2 n a (x; r) = r 1{|x−y|

Notice that an, bn and cn are all uniformly (in n, x and r) bounded between constants. Sup- n pose that we know the exponential decay of ζT −t(·; z, ) (which we prove in Lemma B.3),

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 74/89 The spatial Lambda-Fleming-Viot process with selection then substituting in the martingale problem in the usual way, we obtain

n n n n n n,ζ0 (·;z,) hwT , ζ0 (·; z, )i = hw0 , ζT (·; z, )i + MT Z T Z ∞ Z 1 n n + usn 1+α at (x; r)ζT −t(x; z, )dxdrdt, (B.7) 0 n−β r Rd T ∞ h n i Z Z 1 Z n n,ζ0 (·;z,) 2 −γ n n n 2 M = u n d+1+α bt (x; r)h1B(x,r)(1 − wt ), ζT −t(·; z, )i T 0 n−β r Rd n n n 2o + ct (x; r)h1B(x,r)wt , ζT −t(·; z, )i dxdrdt 2 Z T Z ∞ Z  Z ! 2 −γ 1 n n n =u n d+1+α bt (x; r) (1 − wt (y))ζT −t(y; z, )dy 0 n−β r Rd B(x,r) Z !2  n n n + ct (x; r) wt (y)ζT −t(y; z, )dy dxdrdt. (B.8) B(x,r)

In order to control the different terms appearing in (B.7) and (B.8), we are going to n n need to establish continuity estimates for ζ . In preparation for this, note that (Xt )t≥0 is a continuous time random walk with jump rate Z ∞ uVr αβ A = d+1+α dr = V1n . n−β r

d To describe the corresponding jump chain, let Rk be i.i.d. R -valued random variables V1 −(1+α) distributed according to A r 1{r>n−β }dr, Z1,k and Z2,k be independent uniformly distributed random variables in B(0, 1), and Yk = Rk(Z1,k + Z2,k). Then we can write

Kt n n X Xt = X0 + Yk, (B.9) k=1 where Kt is a Poisson random variable with parameter At. We define fY as the density ∗k of Y1, fY to be the k-fold convolution of fY ,

(At)k qn,{k}(x) = f ∗k(x)P[K = k] = e−At f ∗k(x) t Y t k! Y ∞ n X n,{k} qt (x) = qt (x). (B.10) k=1

Then, n n −At n n ζt (x; z, ) = ζ0 (x; z, )e + (ζ0 (·; z, ) ∗ qt (·))(x). Our estimates will involve splitting into two cases, according to whether the walk has taken greater or fewer than L steps in the interval [0, t] and so it will be convenient n,I P n,{k} n,{k} n n,{k} to define qt = k∈I q for I ⊂ [1, ∞), ζt (·; z, ) = ζ0 (·; z, ) ∗ qt (·), and n,I P n,{k} ζt = k∈I ζt for I ⊂ [0, ∞). Since the number of jumps made by the walk in [0, t] has mean proportional to nαβ, with probability tending to one as n → ∞ it will take at least ncαβ steps for any c ∈ (0, 1). We define c1 := (α − 1)/(2α) ∈ (0, 1) and set

L = nc1αβ/2.

In Section B.3, we shall prove a sequence of lemmas that control the behaviour of the random walk. In particular, we establish the following. For every t ≥ 0, let qt be the

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 75/89 The spatial Lambda-Fleming-Viot process with selection density function of value at time t of the symmetric α-stable process starting at 0 and with Laplace exponent Z ψ(θ) := eiθ·x − 1ν(dx), Rd where Z ∞ uVr ∗2 ν(dx) := d+1+α ur (x)drdx. 0 r (Note that this process is the one appearing in Lemma 6.1.)

λ|x| c1αβ/2 Lemma B.3. Let kfkλ = supx |f(x)|e . Let c2 ∈ (0, α) be a constant. Recall L = n , α−1 Rd with c1 = 2α . For x, y, z ∈ and n,

(i) If M ≥ 2 and t ∈ [n−c2β(2−α)/(2(d+1)),T ], then

n,[M,∞) −β(2−α)d/(2(d+1)) βd M−1 P |qt (x) − qt(x)| ≤ Cd,T n + Cdn (a + [Kt < M])

for some a ∈ (0, 1) independent of M and T . Furthermore,

n,[L,∞) −β(2−α)d/(2(d+1)) |qt (x) − qt(x)| ≤ Cd,T n .

−(d+1)/α (ii) If t > 0, then |qt(x) − qt(y)| ≤ Ct |x − y|. (iii) If t ∈ [n−c2β(2−α)/(2(d+1)),T ], then

n,[L,∞) n,[L,∞) −(d+1)/α −β(2−α)d/(2(d+1)) |qt (x) − qt (y)| ≤ Ct |x − y| + Cd,T n .

n,[1,∞) −λ(|x|−1) (iv) If λ > 0, t ≤ T and |x| ≥ 1, then qt (x) ≤ Cλ,T e . (v) If λ > 0, t ∈ [n−c2β(2−α)/(2(d+1)),T ] and |y − z| ≤ 1, then

n,[L,∞) n,[L,∞) λ −(d+1)/(2α) 1/2 kζt (·; y, ) − ζt (·; z, )kλ ≤ Cλ,d,T e (t |y − z| + n−β(2−α)d/(4(d+1)))eλ|z|,

where  can depend on n.

Recall the definitions

−β(2−α)/(2(d+1)) τ1 = n , −β(2−α)d/(4(d+1)) α/(d+1) τ2 = n ∨ |z1 − z2| .

The quantity τ1 (resp., τ2) will be used in the bounds needed to prove Proposition B.1(i) (resp., (ii)). Observe that for t ≥ τ2 and |z1 − z2| < 1, the estimate on the right hand side of Lemma B.3(v) is 1/2 λ λ|z1| ≤ Cλ,d,T (|z1 − z2| + τ2)e e . Since the organisations of the proofs are similar, we shall show Proposition B.1(i) and (ii) in parallel. In both cases, we set

n 1 ζ0 (·; z, ) := 1B(z,) V

n (although most of the proof does not require a specific form for ζ0 ), and we estimate

n n n 0 (i) hwT , ζ0 (·; z, n) − ζ0 (·; z, n)i, n n n (ii) hwT , ζ0 (·; z1, ) − ζ0 (·; z2, )i for the range of parameters stated in Proposition B.1, using (B.7) and (B.8).

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 76/89 The spatial Lambda-Fleming-Viot process with selection

B.1 Drift terms n Let us split the different terms into the cases in which Kt, the number of jumps of X by time t, is less than or larger than L. This first gives (using the fact that the function n at is bounded uniformly in n, t, x, r):

Z T Z ∞ Z 1 n n,[0,L) n,[0,L) 0  usn 1+α at (x; r) ζT −t (x; z, n) − ζT −t (x; z, n) dxdrdt 0 n−β r Rd Z T Z ∞ Z 1 n,[0,L) n,[0,L) 0  ≤ Cusn 1+α ζT −t (x; z, n) + ζT −t (x; z, n) dxdrdt 0 n−β r Rd Z T αβ −(1−c1)αβ ≤ Cusnn P[Kt < L] dt ≤ Cn (B.11) 0

αβ by Lemma B.6 (which controls P[Kt < L]) and the fact that, by definition, snn ≡ σ. The same estimate holds for (ii) and the corresponding integral. Next, let us split the remaining integral into an integral over large and small times. We can write Z n,[L,∞) n,[L,∞) 0 n 0 n 0 0  n,[L,∞) 0 0 ζT −t (x; z, n) − ζT −t (x; z, n) = ζ0 (x ; z, n) − ζ0 (x ; z, n) qT −t (x − x )dx Rd Z n 0 n,[L,∞) 0 n,[L,∞)  0 = ζ0 (x ; z, n) qT −t (x − x ) − qT −t (x − z) dx Rd Z n 0 0 n,[L,∞) 0 n,[L,∞)  0 − ζ0 (x ; z, n) qT −t (x − x ) − qT −t (x − z) dx . Rd

R n 0 0 0 (The extra terms cancel since Rd ζ0 (x , z , n)dx = 1 for all choices of n.) Since the second term above will be bounded in the same way as the first term, let us just consider the first one. We have by Lemma B.3(iii) and (iv) (recalling also that the support of n ζ0 (·; z, ) is contained in B(z, )):

Z T −τ1 Z ∞ Z Z 1 n 0 n,[L,∞) 0 n,[L,∞) 0 Cusn 1+α ζ0 (x ; z, n) qT −t (x − x ) − qT −t (x − z) dx dxdrdt 0 n−β r Rd Rd Z T Z Z αβ n 0 −(d+1)/α 0 ≤ Csnn ζ0 (x ; z, n) t |z − x | d τ1 B(z,log n) R −β(2−α)d/(2(d+1)) 0 + Cd,T n dx dxdt Z T Z Z 0 αβ n 0 −|x−z| 0 + C snn ζ0 (x ; z, n)e dx dxdt c d τ1 B(z,log n) R Z T d−1 d −(d+1)/α d −β(2−α)d/(2(d+1)) 0 (log n) ≤ Cn(log n) t dt + Cd,T T (log n) n + C T τ1 n d−1  1− d+1 (log n)  ≤ C  (log n)dτ α + (log n)dn−β(2−α)d/(2(d+1)) + . (B.12) n 1 n

For (ii), the corresponding calculation is different and uses Lemma B.3(v) with an arbitrary λ > 0:

Z T −τ2 Z ∞ Z 1 n  n,[L,∞) n,[L,∞)  usn 1+α at (x; r) ζT −t (x; z1, ) − ζT −t (x; z2, ) dxdrdt 0 n−β r Rd Z T −τ2 Z αβ n,[L,∞) n,[L,∞) −λ|x| ≤ Cusnn sup ζT −t (x; z1, ) − ζT −t (x; z2, ) λ e dxdt d t∈[0,T −τ2] 0 R

1/2 λ λ|z1| ≤ Cλ,d,T (|z1 − z2| + τ2)e e . (B.13)

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 77/89 The spatial Lambda-Fleming-Viot process with selection

Finally, it remains to bound the integral corresponding to small (T − t)’s. For (i), we obtain Z T Z ∞ Z 1 n  n,[L,∞) n,[L,∞) 0  usn 1+α at (x; r) ζT −t (x; z, n) − ζT −t (x; z, n) dxdrdt −β d T −τ1 n r R Z T Z ∞ Z 1 n,[L,∞) n,[L,∞) 0  ≤ Csn 1+α ζT −t (x; z, n) + ζT −t (x; z, n) dxdrdt −β d T −τ1 n r R αβ ≤ Csnn τ1 = Cτ1. (B.14)

The same result obviously holds for (ii), with τ1 replaced by τ2. n For the terms involving the initial condition w0 , similar arguments using Lemma B.3(i) and (v), and Lemma B.6 lead to

n n n 0 −nc1αβ/2 −β(2−α)d/(2(d+1)) hw0 , ζT (·; z, n) − ζT (·; z, n)i ≤ Ce + Cn , and

c1αβ/2 n n n −n λ(|z1|+) 1/2 hw0 , ζT (·; z1, ) − ζT (·; z2, )i ≤ Ce + Ce τ2 + |z1 − z2| .

B.2 Martingale terms

Now we turn to the martingale terms. As before, we first consider the case Kt < L. We shall estimate the term involving bn, but the same approach can also be applied to the terms involving cn. We have

Z T Z ∞ Z 2 −γ 1 n n n,[0,L) 2 u n d+1+α bt (x; r)h1B(x,r)(1 − wt ), ζT −t (·; z, n)i dzdrdt 0 n−β r Rd Z T Z ∞ Z Z −γ 1 n,[0,L) ≤ Cn d+1+α ζt (y; z, n) 1{|y−x|

Z T −τ1 Z ∞ 1 Z u2n−γ bn(x; r)h1 (1 − wn),ζn,[L,∞)(·; z,  ) d+1+α t B(x,r) t T −t n 0 n−β r Rd

n,[L,∞) 0 2 − ζT −t (·; z, n)i dxdrdt .

Once again we write

n,[L,∞) n,[L,∞) 0 ζt (y; z, n) − ζt (y; z, n) Z n 0 n,[L,∞) 0 n,[L,∞)  0 = ζ0 (x ; z, n) qt (y − x ) − qt (y − z) dx Rd Z n 0 0 n,[L,∞) 0 n,[L,∞)  0 − ζ0 (x ; z, n) qt (y − x ) − qt (y − z) dx . Rd

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 78/89 The spatial Lambda-Fleming-Viot process with selection

This gives us Z n n n,[L,∞) n,[L,∞) 0 2 bT −t(x; r)h1B(x,r)(1 − wT −t), ζt (·; z, n) − ζt (·; z, n)i dx Rd Z  Z n 0 n,[L,∞) 0 ≤ C 1{|x−y|≤r}1{|x−y0|≤r} ζ0 (x ; z, n) qt (y − x ) (Rd)3 Rd  n,[L,∞) 0 − qt (y − z) dx  Z   n 0 n,[L,∞) 0 0 n,[L,∞) 0 0 n 0 × ζ0 (x ; z, n) qt (y − x ) − qt (y − z) dx + St dy dydx, Rd

n where St is the sum of the remaining three terms comprising the squared integral on the first line. Since all these terms behave in the same way, we shall only bound the first one. 0 d 0 Writing as before Vr(y, y )(≤ Cdr ) for the volume of B(y, r) ∩ B(y , r), and using Fubini’s 0 theorem, we can replace the integral over x by Vr(y, y ). Next, as in our estimates of the drift, we split the integrals over y, y0 according to whether or not y, y0 ∈ B(z, log n). This gives us the following first bound, using Lemma B.3(iii): Z  Z   d+1 − β(2−α)d  0 n − α 2(d+1) Vr(y, y ) ζ0 (x; z, n) t |z − x| + Cd,T n dx B(z,log n)2 Rd  Z   d+1 − β(2−α)d  n 0 − α 0 2(d+1) 0 0 × ζ0 (x ; z, n) t |z − x | + Cd,T n dx dy dy Rd

Z  d+1 β(2−α)d 2 d − − 2(d+1) 0 ≤ r 1{|y−y0|≤2r} nt α + Cd,T n dy dy B(z,log n)2

β(2−α)d 2 d d d − d+1 −  ≤ r (r ∧ log n) (log n) nt α + Cd,T n 2(d+1) .

Integrating over t and r, we obtain

Z T Z ∞ 2 1  d+1 − β(2−α)d  −γ d d d − α 2(d+1) n d+1+α r (r ∧ log n) (log n) nt + Cd,T n drdt −β τ1 n r  Z log n Z ∞  ≤ Cn−γ (log n)d rd−1−αdr + (log n)d r−1−αdr n−β log n  Z T Z T  2(d+1) − β(2−α)d d+1 − β(2−α)d 2 − α 2(d+1) − α (d+1) × n t dt + 2nn t dt + T n τ1 τ1 h 1− 2(d+1) β(2−α)d 1− d+1 β(2−α)d i −γ d d−α β(α−d) 2 α − 2(d+1) α − (d+1) ≤ Cn (log n) (log n) + n nτ1 + 2nn τ1 + n . (B.16)

Secondly, considering the case where y ∈ B(z, log n) and y0 ∈ B(z, log n)c and using Points (iii) and (iv) in Lemma B.3, the corresponding integral is bounded by Z Z  Z   d+1 − β(2−α)d  0 n 0 − α 2(d+1) Vr(y, y ) ζ0 (x ; z, n) t |z − x| + Cd,T n dx B(z,log n) B(z,log n)c Rd  Z  n 0 −|z−y0| 0 0 × ζ0 (x ; z, n)e dx dy dy Rd

β(2−α)d Z Z  − d+1 −  d −|z−y0| 0 ≤ C t α n + Cd,T n 2(d+1) r e dydy B(z,log n)c B(z,log n)∩B(y0,2r)

β(2−α)d d−1  − d+1 −  d d (log n) ≤ C t α  + C n 2(d+1) r (r ∧ log n) . n d,T n

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 79/89 The spatial Lambda-Fleming-Viot process with selection

Integrating over t and r as well, we obtain Z T Z ∞ d−1 1  d+1 − β(2−α)d  (log n) −γ − α 2(d+1) d d n d+1+α t n + Cd,T n r (r ∧ log n) drdt −β τ1 n r n d−1 (log n) h 1− d+1 − β(2−α)d ih i ≤ n−γ  τ α + C n 2(d+1) (log n)d−α + nβ(α−d) . (B.17) n n 1 d,T The case where y ∈ B(z, log n)c and y0 ∈ B(z, log n) is treated in the same way. Finally, if y, y0 ∈ B(z, log n)c, Lemma B.3(iv) gives us the bound Z  Z  Z  0 n −|z−y| n 0 −|z−y0| 0 0 Vr(y, y ) ζ0 (x; z, n)e dx ζ0 (x ; z, n)e dx dy dy (B(z,log n)c)2 Rd Rd

Z Z 0 ≤ Crd e−|z−y|e−|z−y |dy0dy B(z,log n)c B(z,log n)c∩B(y,2r) (log n)d−1 ≤ Crd1 ∧ rd . n Integrating over t and r gives the bound Z T Z ∞ d−1 d−1 −γ 1 d d(log n) −γ (log n) β(α−d) n d+1+α r 1 ∧ r drdt ≤ CT n (n + 1). (B.18) −β τ1 n r n n For the corresponding bound for (ii), the argument is again much shorter thanks to Point (v) in Lemma B.3: Z Z Z n n n 1{|y−x|

n,[L,∞) n,[L,∞) n,[L,∞) n,[L,∞) (ζT −t (y; z1, ) − ζT −t (y; z2, ))(ζT −t (z; z1, ) − ζT −t (z; z2, ))dzdydx Z Z Z n,[L,∞) n,[L,∞) ≤ (ζT −t (y; z1, ) − ζT −t (y; z2, )) 1{|y−x|

1/2 λ(|z1|+) 2d d ≤ Cλ,d,T (|z1 − z2| + τ2)e (r ∧ r ), which yields Z T −τ2 Z ∞ 1 Z u2n−γ bn(x; r)h1 (1 − wn), ζn,[L,∞)(·; z , ) d+1+α t B(x,r) t T −t 1 0 n−β r Rd

n,[L,∞) 2 − ζT −t (·; z2, )i dxdrdt Z T −τ2 Z ∞ 1 −γ 2d d 1/2 λ(|z1|+) ≤ Cλ,d,T n d+1+α (r ∧ r )(|z1 − z2| + τ2)e drdt 0 n−β r Z 1 Z ∞  −γ 1/2 λ(|z1|+) d−1−α −1−α ≤ Cλ,d,T n (|z1 − z2| + τ2)e r dr + r dr n−β 1

−γ 1/2 λ(|z1|+) (α−d)β ≤ Cλ,d,T n (|z1 − z2| + τ2)e (n + C)

−(d−1)β 1/2 λ(|z1|+) ≤ Cλ,d,T n (|z1 − z2| + τ2)e (B.19)

since n(α−1)βn−γ = 1.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 80/89 The spatial Lambda-Fleming-Viot process with selection

For t ∈ (T − τ1,T ), we apply Lemma B.7 to obtain Z n n n,[L,∞) 2 bt (x; r)h1B(x,r)(1 − wt ), ζT −t (·; z, n)i dx Rd Z ZZ n,[L,∞) n,[L,∞) 0 0 ≤ C ζT −t (y; z, n) 1{|y−x|

d −d/α −nc5 2d ≤ Cd(r ∧ (((T − t) + e )r )), which implies that

Z T Z ∞ 1 Z 2 −γ n n n,[L,∞) 2 u n d+1+α bt (x; r)(h1B(x,r)(1 − wt ), ζT −t (·; z, n)i dxdrdt −β d T −τ1 n r R τ1 ∞ Z Z h c i ≤ Cn−γ r−1−α ∧ t−d/α + e−n 5 rd−1−α drdt 0 n−β α ∞ r ∧τ1 ∞ τ1 Z Z Z Z c = Cn−γ r−1−αdtdr + Cn−γ t−d/α + e−n 5 rd−1−αdtdr −β −β α n 0 n r ∧τ1 1/α 1/α Z τ1 Z ∞ Z τ1 Z τ1 −γ −1 −γ −1−α −γ −d/α d−1−α = Cn r dr + Cτ1n r dr + Cn t r dtdr −β 1/α −β α n τ1 n r −γ −γ+β(α−d) 1−d/α ≤ Cn log n + Cn τ1 . (B.20)

The same bound holds for (ii), with τ1 replaced by τ2. 0 Combining (B.15), (B.16), (B.17), (B.18) and (B.20) yields (recall that n ≤ n)

h n n 0 i 1−d/α n,ζ0 (·;z,n)−ζ0 (·;z,n) −(α−1)β/2 −γ+β(α−d) M ≤ Cn + n τ1 T h 2(d+1) −γ d d−α β(α−d) 0 2 1− α + n (log n) (log n) + n n τ1

β(2−α)d 1− d+1 β(2−α)d i 0 − 2(d+1) α − (d+1) + 2nn τ1 + n , while combining (B.15), (B.19) and (B.20) gives us

h n n i n,ζ (·;z1,)−ζ (·;z2,) −(α−1)β/2 1/2 λ(|z1|+) M 0 0 ≤ Cn + Cλ,d,T (|z1 − z2| + τ2)e T −γ+β(α−d) (α−d)/α + Cdn τ2 .

Now, by the Burkholder-Davis-Gundy inequality ([12]),

  1/2 n,ζn(·;z, )−ζn(·;z,0 ) h n n 0 i E 0 n 0 n n,ζ0 (·;z,n)−ζ0 (·;z,n) sup Mt ≤ M . t≤T T

Combining this and the estimate for the drift term yields the desired result.

B.3 Lemmas We define for θ ∈ Rd,

n,{k} n n E iθ·(Xt −X0 )  q˜t (θ) = e 1{Kt=k} ,

n,I n n,[0,∞) and correspondingly q˜t (θ) for I ⊂ [0, ∞), as well as q˜t (θ) =q ˜t (θ). Recall the representation of Xn using random walks in (B.9). As Xn has independent and stationary

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 81/89 The spatial Lambda-Fleming-Viot process with selection increments, the Lévy-Khintchine Formula (see e.g. Theorems 2.7.10 and 2.8.1 of [46]) implies that n,[0,∞) n n n E iθ·(Xt −X0 ) tψ (θ) q˜t (θ) = e = e , where Z ψn(θ) = (eiθ·x − 1)νn(dx). (B.21) Rd Similarly, we define the limiting Lévy measure Z ∞ uVr ∗2 ν(dx) = d+1+α ur (x)drdx, 0 r as well as the corresponding function ψ, Z ψ(θ) = (eiθ·x − 1)ν(dx). (B.22) Rd

n We observe that for all t > 0, |etψ (θ)| ≤ 1 and hence |etψ(θ)| ≤ 1. Lemma B.4. For all n, we have:

Rd n 4d −β(2−α) 2 (i) For all θ ∈ , |ψ (θ) − ψ(θ)| ≤ 3 n |θ| . β n α (ii) For |θ| ≤ n , −ψ (θ) ≥ c|θ| for some positive constant c = cd independent of n. Hence −ψ(θ) ≥ c|θ|α for all θ.

Proof. Since ν is radially symmetric, (B.21) implies

1 Z 1 Z ψn(θ) = (eiθ·x − 2 + e−iθ·x)νn(dx) = (eiθ·x/2 − e−iθ·x/2)2νn(dx) 2 Rd 2 Rd Z Z Z ∞ 2 n 2 Vr ∗2 = −2 sin (θ · x/2)ν (dx) = −2 sin (θ · x/2) d+1+α ur (x)drdx. Rd Rd n−β r

The calculations above can easily be repeated for X and ψ, then

−β 1 Z n V Z Z |ψn(θ) − ψ(θ)| = r sin2(θ · x/2) u (y) u (x − y)dydxdr d+1+α r r 2 0 r Rd Z n−β Z 2 Z Vr sin (θ · x/2) = d+1+α 2 1{|y|

Since | sin(x)| ≤ |x| for all x, we have

−β d !2 1 1 Z n 1 Z X |ψn(θ) − ψ(θ)| ≤ θ x dxdr 2 4 rd+1+α i i 0 |x|<2r i=1 −β d !2 1 Z n 1 Z 2r Z 2r X ≤ ... θ x dx ... dx dr 4 rd+1+α i i 1 d 0 −2r −2r i=1 d −β 1 X Z n 1 Z 2r Z 2r ≤ θ2 ... x2dx ... dx dr. 4 i rd+1+α i 1 d i=1 0 −2r −2r

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 82/89 The spatial Lambda-Fleming-Viot process with selection

The (d + 1)-dimensional integral above is the same for all i (by symmetry), and is equal to

Z n−β d−1 Z 2r Z n−β d−1 d+1 Z n−β d+1 (4r) 2 4 2 3 4 1−α 4 −β(2−α) d+1+α x1dx1dr = 2+α (2r) dr = r dr = n . 0 r −2r 0 r 3 3 0 3 Hence 4d |ψn(θ) − ψ(θ)| ≤ n−β(2−α)|θ|2, 3 as required by (i). For (ii), we have Z Z ∞ 1 n 2 Vr ∗2 − ψ (θ) = sin (θ · x/2) d+1+α ur (x)drdx 2 Rd n−β r Z Z ∞ Z 2 1 = sin (θ · x/2) d+1+α 1{|y| 0, we have Z ∞ Z r 1 n 2 1 − ψ (θ1) ≥ 2c0 sin (θ1x/2) 2+α dxdr 2 n−β 0 r Z ∞ Z ∞ 2 1 = 2c0 sin (θ1x/2) 2+α drdx 0 n−β ∨x r Z ∞ 2c0 2 −β −(1+α) = sin (θ1x/2)(n ∨ x) dx 1 + α 0 Z ∞ 2c0 2 −(1+α) ≥ sin (θ1x/2)x dx 1 + α n−β Z ∞ 2c0 α 2 −(1+α) = θ1 sin (y/2)y dy. −β 1 + α θ1n β Since θ1 ≤ n , the integral in the above is bounded below by a constant. By symmetry, with thus obtain that for any θ such that |θ| ≤ nβ,

− ψn(θ) ≥ c|θ|α (B.23) for some c > 0. We can carry out a similar calculation for d ≥ 2. Since ψn is radially symmetric, it suffices to consider θ = (θ1, 0,..., 0) with θ1 > 0: Z ∞ Z r Z 1 n 2 1 − ψ (θ) ≥ c0 sin (θ1x1/2) d+1+α dxdρdr 2 n−β 0 |x|=ρ r Z ∞ Z ∞ Z 2 1 = c0 sin (θ1x1/2) d+1+α dxdrdρ 0 n−β ∨ρ |x|=ρ r Z ∞ Z c0 2 −β −(d+α) = sin (θ1x1/2)(n ∨ ρ) dxdρ d + α 0 |x|=ρ Z ∞ Z c0 2 −(d+α) ≥ sin (θ1x1/2)ρ dxdρ d + α n−β |x|=ρ Z ∞ Z c0 2 −(1+α) = sin (ρθ1y1/2)ρ dydρ d + α n−β |y|=1 Z ∞ Z c0 α 2 −(1+α) = θ1 sin (ry1/2)r dydr. −β d + α θ1n |y|=1

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 83/89 The spatial Lambda-Fleming-Viot process with selection

β Since θ1 = |θ| ≤ n , the double integral in the above is bounded below by a constant. Therefore we arrive at the same estimate as in (B.23) and we have proved (ii).

−c2β(2−α)/(2(d+1)) Lemma B.5. (i) Let c2 ∈ (0, α) be a constant. If n ≤ t ≤ T , then Z t(ψn(θ)−ψ(θ)) tψ(θ) −β(2−α)d/(2(d+1)) |(e − 1)e |dθ ≤ Cd,T n . |θ|≤nβ

d (ii) Let Zr be a uniform random variable on B(0, r) ⊂ R , then

2d/2Γ(d/2 + 1) Eeiθ·Zr  = J (|rθ|), |rθ|d/2 d/2

where Jd/2 is the Bessel function of the first kind of order d/2. (iii) If M ≥ 2, then under the assumptions of (i) there exist positive a (with a < 1) and Cd, independent of M, such that for all t > 0, Z n,[M,∞) βd M−1 |q˜t (θ)|dθ ≤ Cdn a . |θ|≥nβ √ Proof. Let  = n−β(2−α)d/(d+1). For |θ| ≤ nβ(2−α) = nβ(2−α)/(2(d+1)), Lemma B.4(i) implies for t ≤ T and sufficiently large n,

t(ψn(θ)−ψ(θ)) n |e − 1| ≤ Ct ψ (θ) − ψ(θ) ≤ Cd,T .

Hence

Z n |(et(ψ (θ)−ψ(θ) − 1)etψ(θ)|dθ |θ|≤nβ Z Z t(ψn(θ)−ψ(θ) tψ(θ) tψn(θ) tψ(θ) ≤ √ |(e − 1)e |dθ + √ (e + e )dθ |θ|≤ nβ(2−α) nβ(2−α)<|θ|≤nβ Z nβ β(2−α) d/2 d−1 −ctrα ≤ Cd,T (n )  + Cd √ r e dr nβ(2−α)

− β(2−α)d by Lemma B.4(ii). The first term is equal to Cd,T n 2(d+1) . Since

trα ≥ n(α−c2)β(2−α)/(2(d+1))

βd −cnb in the integral, the second term is bounded by Cn e (with b = (α−c2)β(2−α)/(2(d+ 1)) > 0). Both estimates combined give us (i). For (ii), we use Theorem 4.15 of [48], which states that the Fourier transform of the indicator function on the unit ball in d dimensions is Z −d/2 iθ·x θ 1[0,1](|x|)e dx = Jd/2(|θ|). Rd 2π

Hence, dividing by the volume of the unit ball in d dimensions, which is πd/2/Γ(d/2 + 1), yields 2d/2Γ(d/2 + 1) Eeiθ·Z1  = J (|θ|). (B.24) |θ|d/2 d/2

Scaling Z1 by a factor of r gives us the desired result. For (iii), we recall from (B.9) the representation of Xn using random walks with step R V1 −(1+α) size Yk. Let R be an -valued r.v. distributed according to A r 1{r>n−β }dr, Z be

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 84/89 The spatial Lambda-Fleming-Viot process with selection a uniformly distributed random variable in B(0, 1) and ρ˜(θ) = E[eiθ·Z ]. Then ρ˜ is given by (B.24) is real and

n,[M,∞) E  E 2 Kt  q˜t (θ) = Kt ( R[˜ρ(Rθ) ]) 1{Kt≥M} ,

E E where Kt and R are expectations taken with respect to Kt and R, respectively. Observe that Z ∞ 2 −αβ −(1+α) 2 ER[˜ρ(Rθ) ] = n r ρ˜(rθ) dr. n−β

First, we show |ρ˜(v)| = |E[eiv·Z ]| is bounded above by a constant a ∈ (0, 1) for |v| ≥ 1 uniformly. Since Z is radially symmetric about 0, we have ρ˜(v) = E[cos(v · Z)] = (1) (1) E[cos(v1Z )], where v1 and Z denote the first coordinate of v and Z, respectively. It (1) suffices to consider v1 ≥ 1. Let δ1 be a small positive constant. If |v1Z − nπ| ≥ δ1 for (1) all n ∈ Z, then | cos(v1Z )| ≤ cos δ1 < 1. Let In = ((nπ − δ1)/v1, (nπ + δ1)/v1), then

∞  (1)  X  (1)  P |v1Z − nπ| < δ1 for some n ∈ Z = P Z ∈ In . n=−∞

(1) Since −1 ≤ Z ≤ 1, the intervals In for which the probabilities on the right hand side above are non-empty and have total length ≤ 2δ1. These intervals do not overlap. The way to arrange non-overlapping intervals Jn of total length 2δ1 so that the probability P P (1) n [Z ∈ Jn] is maximised is to take J1 = [−1, −1 + δ1],J2 = [1 − δ1, 1] and Jn = ∅ otherwise. Therefore

 (1)   (1)  P |v1Z − nπ| < δ1 for some n ∈ Z ≤ 2P Z ≥ [1 − δ1, 1] ≤ 2δ2 for some δ2 ∈ (0, 1/4) if we pick a sufficiently small δ1. This implies

 (1)   (1)   (1)  E E (1) E (1) cos(v1Z ) = cos(v1Z )1|v1Z −nπ|≥δ1 + cos(v1Z )1|v1Z −nπ|<δ1  (1)   (1)  ≤ (cos δ1)P |v1Z − nπ| ≥ δ1 + P |v1Z − nπ| < δ1 ≤ a for some a ∈ (0, 1). This estimate implies

2 ER[˜ρ(Rθ) ] ≤ a for |θ| ≥ nβ. Second, plugging in θ = nβξ yields

Z ∞ Z ∞ β 2 −αβ −(1+α) β 2 −(1+α) 2 ER[˜ρ(Rn ξ) ] = n r ρ˜(rn ξ) dr = x ρ˜(xξ) dx n−β 1 Z ∞ 2 Z ∞ d 2 −(1+α) Jd/2(|xξ|) −(1+α) −(d+1) = 2 Γ(d/2 + 1) x d dx ≤ Cd x |xξ| dx 1 |xξ| 1 −(d+1) ≤ Cd|ξ| ,

−1/2 where we use the fact |Jν (z)| < Cz for ν > 0 ([1], p. 362, 9.1.61). The two estimates above imply that there exist a ∈ (0, 1) and Cd > 0 (both independent of M) such that for |ξ| ≥ n−β, β 2 −(d+1) ER[˜ρ(Rn ξ) ] ≤ a ∧ Cd|ξ| .

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 85/89 The spatial Lambda-Fleming-Viot process with selection

We use this to estimate Z n,[M,∞) |q˜t (θ)|dθ |θ|≥nβ Z βd n,[M,∞) β = n |q˜t (n ξ)|dξ |ξ|≥1 Z βd −(d+1) M ≤ n (a ∧ Cd|ξ| ) dξ |ξ|≥1 Z Z ! βd M −(d+1) M ≤ n a dξ + (Cd|ξ| ) dξ 1/(d+1) 1/(d+1) 1≤|ξ|≤(Cd/a) |ξ|>(Cd/a) Z ∞ ! βd M d/(d+1) −(d+1) M d−1 ≤ n Cda (Cd/a) + (Cdr ) r dr . 1/(d+1) (Cd/a)

1/(d+1) −(d+1) −(d+1) We take ρ = r/Cd (hence Cdr = ρ ) to obtain

Z Z ∞ ! n,[M,∞) 0 βd M−d/(d+1) −(d+1) M 1/(d+1) d−1 |q˜t (θ)|dθ ≤ Cdn a + (ρ ) (ρCd ) dρ |θ|≥nβ 1/a1/(d+1) Z ∞ ! 0 βd M−1 −(d+1)(M−1)−2 ≤ Cdn a + ρ dρ 1/a1/(d+1) 0 βd M−1 1/(d+1) −(d+1)(M−1)−1 ≤ Cdn a + (1/a ) 0 βd M−1 ≤ Cdn a if M ≥ 2. Hence we have established (iii).

c3αβ/2 −(1−c3)αβ Lemma B.6. Let c3 ∈ (0, 1) be a constant. If M = n and n ≤ t ≤ T , then c3αβ/2 T P −n R P −(1−c3)αβ [Kt < M] ≤ Ce . Hence, 0 [Kt < M]dt ≤ CT n .

αβ Proof. By a standard tail estimate for the P oisson(V1n t) random variable Kt, since αβ M ≤ V1n t we can write

 αβ M αβ eV1n t P[K < M] ≤ e−V1n t t M αβ αβ = exp(−V1n t + M(1 + log V1 + log(n t) − log M)).

αβ c3αβ The dominant term in the exponent above is V1n t, which is ≥ V1n , hence

−nc3αβ/2 P[Kt < M] ≤ Ce .

This establishes the estimate on P[Kt < M]. The estimate on its integral follows easily by splitting the integral over [0, n−(1−c3)αβ) and [n−(1−c3)αβ,T ].

Finally we turn to the proof of our key lemma.

Proof of Lemma B.3. Recall from (B.9) the representation of Xn using random walks −αβ −(1+α) with step size Yk: conditioned on Rk, which has density n r 1{r>n−β }dr, Yk|Rk = ∗2 n r has density ur (x). Recall also the definition of qt given in (B.10) and let q be the density of the limiting α-stable process with Laplace exponent ψ defined in (B.22). We

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 86/89 The spatial Lambda-Fleming-Viot process with selection write Z n,[M,∞) n n n tψ (θ) E iθ·(Xt −X0 ) tψ(θ) −iθ·x 2π|qt (x) − qt(x)| = (e − [e 1{Kt

Z n Z t(ψ (θ)−ψ(θ)) tψ(θ) −iθ·x P ≤ (e − 1)e e dθ + [Kt < M]dθ |θ|

Z Z tψ(θ) n,[M,∞) + e dθ + q˜t (θ)dθ . |θ|≥nβ |θ|≥nβ Lemma B.5 implies that the first and fourth terms are bounded above by

−β(2−α)d/(2(d+1)) βd M−1 Cd,T n ,Cdn a , respectively, where we also use t ≥ n−c2β(2−α)/(2(d+1)). The second term is bounded above by βd CdP[Kt < M]n . Lemma B.4(ii) implies that the third term is bounded by

∞ Z Z α Z tψ(θ) −ct|θ| d−1 −c2β(2−α)/(2(d+1))  e dθ ≤ e dθ ≤ Cd r exp − cn r dr |θ|≥nβ |θ|≥nβ nβ Z ∞ −c2β(2−α)/(2(d+1))  ≤ Cd exp − cn r + (d − 1) log r dr nβ Z ∞  c  −c2β(2−α)/(2(d+1))  β−c2β(2−α)/(2(d+1)) ≤ Cd exp − cn r/2 dr ≤ Cd exp − n . nβ 2 Combining the estimates for these four terms yields the desired result in (i) for the case M ≥ 2. Using Lemma B.6, the estimate for L = nc1αβ/2 follows easily (noting that −(1−c1)αβ −c2β(2−α)/[2(d+1)] n is always smaller than n whenever c2 < 1). For (ii), we observe that it was shown in Lemma 6.1 that the process ηt with gen- d 1/α erator (6.30) is a symmetric α-stable process, hence ηt = t η1. Let fηt be the density R tψ(θ) m function of ηt. By Proposition 5.28.1 of [46], since Rd |e ||θ| dθ < ∞ for all m > 0, fηt m is C for all m > 0. In particular, this means that the first derivative of fηt is uniformly bounded, therefore fη1 is uniformly continuous. This means that

−d/α −1/α −1/α −(d+1)/α |fηt (x) − fηt (y)| = t |fη1 (t x) − fη1 (t y)| ≤ Ct |x − y|.

Hence −(d+1)/α |qt(x) − qt(y)| ≤ Ct |x − y|, as desired in (ii). Part (iii) follows easily from (i) and (ii). Pk Let fXk denote the density of Xk = i=1 Yi. Since the density of Y1 is radially symmetric and decreasing in |x|, the same properties hold for fXk . Let Xk,1 denote the first coordinate of Xk, then for x1 ∈ [1, ∞) and λ > 0, P fXk (x1, 0,..., 0)) ≤ [Xk,1 ≥ x1 − 1] k ≤ e−λ(x1−1)Ee(λ,0,...,0)·Xk  = e−λ(x1−1)Ee(λ,0,...,0)·Y1  .

We would like to estimate E[e(λ,0,...,0)·Y1 ], for which we calculate, using Lemma B.5(ii),

Z ∞ 1 2dΓ(d/2 + 1)2  E (λ,0,...,0)·Y1  −αβ 2 e − 1 = n 1+α d Jd/2(rλ) − 1 dr n−β r (rλ) Z ∞  d 2  −αβ α 1 2 Γ(d/2 + 1) 2 = n λ 1+α d Jd/2(ρ) − 1 dρ. λn−β ρ ρ

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 87/89 The spatial Lambda-Fleming-Viot process with selection

From [1], p. 362, 9.1.69, Bessel functions are related to generalised hypergeometric functions in the following way

∞ d  d  X 1 (−x2/4)n Γ + 1 J (x)(x/2)−d/2 = F + 1; −x2/4 := 1 + . 2 d/2 0 1 2 d d n! n=1 ( 2 + 1) ... ( 2 + n)

Hence  d  2 Γ + 1 J (ρ)(ρ/2)−d/2 − 1 ≤ C ρ2 2 d/2 d for ρ ∈ [0, 1]. This implies

Z 1 Z ∞   (λ,0,...,0)·Y1  −αβ α 1−α −(1+α) E e − 1 ≤ n λ Cdρ dρ + 2 ρ dρ 0 1 −αβ ≤ Cλn ,

2d/2Γ(d/2+1) E iρ·Z1 where we also use ρd/2 Jd/2(ρ) = | [e ]| ≤ 1 in the first inequality. Hence

−αβ  (λ,0,...,0)·Y1  −αβ Cλn E e ≤ 1 + Cλn ≤ e , which means −αβ −λ(x1−1) Cλn k fXk ((x1, 0,..., 0)) ≤ e e . Plugging the above into the random walk representation yields

n,[1,∞) −αβ −αβ E  −λ(x1−1) Cλn Kt  −λ(x1−1) αβ Cλn  qt (x) ≤ Kt e e 1{Kt≥1} ≤ e exp V1n t(e − 1)

αβ αβ C n−αβ since Kt ∼ P oisson(V1n t). Since n (e λ − 1)) → Cλ as n → ∞, we have for t ≤ T and |x| ≥ 1, n,[1,∞) −λ(|x|−1) qt (x) ≤ Cλ,T e , as desired in part (iv). For part (v), we obtain,

n,[L,∞) n,[L,∞) ζt (x; y, ) − ζt (x; z, ) Z n 0 n 0 n,[L,∞) 0 0 = (ζ0 (x ; y, ) − ζ0 (x ; z, ))qt (x − x )dx Z n 0 n 0 n,[L,∞) 0 0 = (ζ0 (x − y; 0, ) − ζ0 (x − z; 0, ))qt (x − x )dx Z n 0 n,[L,∞) 0 n,[L,∞) 0 0 = ζ0 (x ; 0, )(qt (x − y − x ) − qt (x − z − x ))dx . (B.25)

For t ∈ [n−c2β(2−α)/(2(d+1)),T ] and |y − z| ≤ 1, we have

n,[L,∞) n,[L,∞) λ|x| sup |qt (y − x) − qt (z − x)|e x n,[L,∞) n,[L,∞) λ|x| ≤ sup qt (y − x) − qt (z − x) e x:|x−z|<2 n,[L,∞) n,[L,∞) λ|x| + sup qt (y − x) − qt (z − x) e x:|x−z|≥2  −(d+1)/α −β(2−α)d/(2(d+1)) λ|z| ≤ Cλ,d,T (t |y − z| + n )e + sup min(t−(d+1)/α|y − z| + n−β(2−α)d/(2(d+1)), e−2λ|x−y| + e−2λ|x−z|)eλ|x|, x:|x−z|≥2

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 88/89 The spatial Lambda-Fleming-Viot process with selection where we use (iii) for the first term, and (iii) and (iv) (applied with 2λ) for the second. Hence,

n,[L,∞) n,[L,∞) λ|x| sup qt (y − x) − qt (z − x) e x  −(d+1)/α −β(2−α)d/(2(d+1)) λ|z| ≤ Cλ,d,T (t |y − z| + n )e + sup(t−(d+1)/α|y − z| + n−β(2−α)d/(2(d+1)))1/2(e−2λ|x−y| + e−2λ|x−z|)1/2eλ|x| x  −(d+1)/α −β(2−α)d/(2(d+1)) −(d+1)/α ≤ Cλ,d,T (t |y − z| + n + (t |y − z| + n−β(2−α)d/(2(d+1)))1/2eλ|z| −(d+1)/(2α) 1/2 −β(2−α)d/(4(d+1)) λ|z| ≤ Cλ,d,T (t |y − z| + n )e .

Plugging this estimate into (B.25) yields

n,[L,∞) n,[L,∞) λ|x| sup ζt (x; y, ) − ζt (x; z, ) e x Z n 0 n,[L,∞) 0 n,[L,∞) 0 λ|x−x0| λ(|x|−|x−x0|) 0 ≤ sup ζ0 (x ; 0, ) qt (x − x − y) − qt (x − x − z) e e dx x Z −(d+1)/(2α) 1/2 −β(2−α)d/(4(d+1)) λ|z| n 0 λ|x0| 0 ≤ Cλ,d,T t |y − z| + n e ζ0 (x ; 0, )e dx

λ −(d+1)/(2α) 1/2 −β(2−α)d/(4(d+1)) λ|z| ≤ Cλ,d,T e t |y − z| + n e , n as desired. Note that we used the assumption that the support of ζ0 (·; 0, ) is contained in λ|x0| λ B(0, ) to bound e by e . Note also that this calculation holds even if  = n depends on n.

Lemma B.7. There exists c5 > 0 such that for all t > 0, n,[L,∞) −d/α −nc5 sup ζt (x; z, ) ≤ Cd(t + e ), x where  can depend on n.

˜n R iθ·x ˜n ˜n,[L,∞) Proof. Let ζ0 (θ) = Rd e ζ0(x; z, )dx, then |ζ0 (θ)| ≤ 1 regardless of . Let ζt (θ) = n,[L,∞) n,[L,∞) n n ˜n E iθ·(Xt −X0 ) q˜t (θ)ζ0 (θ), where we recall that q˜t (θ) = [e 1{Kt≥L}]. Then n,[L,∞) ζt (x; z, ) Z 1 ˜n,[L,∞) −iθ·x = ζt (θ)e dθ 2π Rd

1 Z 1 Z n,[L,∞) ˜n −iθ·x n,[L,∞) ˜n −iθ·x ≤ q˜t (θ)ζ0 (θ)e dθ + q˜t (θ)ζ0 (θ)e dθ 2π |θ|

n,[0,L) n n E iθ·(Xt −X0 ) P Since |q˜t (θ)| = | [e 1{Kt

α R −c4t|θ| −d/α for some c4 > 0 and a ∈ (0, 1). Let f(t) = Rd e dθ, then f(t) = t f(1). Hence,  Z  α c5 n,[L,∞) −d/α −c4|θ| −n ζt (x; z, ) ≤ Cd t e dθ + e , Rd

for some c5 > 0. This implies the desired result.

EJP 25 (2020), paper 120. https://www.imstat.org/ejp Page 89/89 Electronic Journal of Probability Electronic Communications in Probability

Advantages of publishing in EJP-ECP

• Very high standards • Free for authors, free for readers • Quick publication (no backlog) • Secure publication (LOCKSS1) • Easy interface (EJMS2)

Economical model of EJP-ECP

• Non profit, sponsored by IMS3, BS4 , ProjectEuclid5 • Purely electronic

Help keep the journal free and vigorous

• Donate to the IMS open access fund6 (click here to donate!) • Submit your best articles to EJP-ECP • Choose EJP-ECP over for-profit journals

1LOCKSS: Lots of Copies Keep Stuff Safe http://www.lockss.org/ 2EJMS: Electronic Journal Management System http://www.vtex.lt/en/ejms.html 3IMS: Institute of Mathematical Statistics http://www.imstat.org/ 4BS: Bernoulli Society http://www.bernoulli-society.org/ 5Project Euclid: https://projecteuclid.org/ 6IMS Open Access Fund: http://www.imstat.org/publications/open.htm