Recent Progress in Coalescent Theory

SOCIEDADE BRASILEIRA DE MATEMÁTICA ENSAIOS MATEMATICOS¶ 2009, Volume 16, 1{193 Recent progress in coalescent theory NathanaÄel Berestycki Abstract. Coalescent theory is the study of random processes where particles may join each other to form clusters as time evolves. These notes provide an introduction to some aspects of the mathematics of coalescent processes and their applications to theoretical population genetics and in other ¯elds such as spin glass models. The emphasis is on recent work concerning in particular the connection of these processes to continuum random trees and spatial models such as coalescing random walks. 2000 Mathematics Subject Classi¯cation: 60J25, 60K35, 60J80. Contents Introduction 7 1 Random exchangeable partitions 9 1.1 De¯nitions and basic results . 9 1.2 Size-biased picking . 14 1.2.1 Single pick . 14 1.2.2 Multiple picks, size-biased ordering . 16 1.3 The Poisson-Dirichlet random partition . 17 1.3.1 Case ® = 0 . 17 1.3.2 Case θ = 0 . 20 1.3.3 A Poisson construction . 20 1.4 Some examples . 21 1.4.1 Random permutations . 22 1.4.2 Prime number factorisation . 24 1.4.3 Brownian excursions . 24 1.5 Tauberian theory of random partitions . 26 1.5.1 Some general theory . 26 1.5.2 Example . 30 2 Kingman's coalescent 31 2.1 De¯nition and construction . 31 2.1.1 De¯nition . 31 2.1.2 Coming down from in¯nity . 33 2.1.3 Aldous' construction . 35 2.2 The genealogy of populations . 39 2.2.1 A word of vocabulary . 40 2.2.2 The Moran and the Wright-Fisher models . 42 2.2.3 MÄohle's lemma . 43 2.2.4 Di®usion approximation and duality . 45 2.3 Ewens' sampling formula . 50 2.3.1 In¯nite alleles model . 50 2.3.2 Ewens sampling formula . 51 CONTENTS 2.3.3 Some applications: the mutation rate . 53 2.3.4 The site frequency spectrum . 55 3 ¤-coalescents 59 3.1 De¯nition and construction . 59 3.1.1 Motivation . 59 3.1.2 De¯nition . 60 3.1.3 Pitman's structure theorems . 61 3.1.4 Examples . 66 3.1.5 Coming down from in¯nity . 67 3.2 A Hitchhiker's guide to the genealogy . 73 3.2.1 A Galton-Watson model . 73 3.2.2 Selective sweeps . 77 3.3 Some results by Bertoin and Le Gall . 81 3.3.1 Fleming-Viot processes . 82 3.3.2 A stochastic ﬂow of bridges . 86 3.3.3 Stochastic di®erential equations . 88 3.3.4 Coalescing Brownian motions . 90 4 Analysis of ¤-coalescents 93 4.1 Sampling formulae for ¤-coalescents . 93 4.2 Continuous-state branching processes . 96 4.2.1 De¯nition of CSBPs . 96 4.2.2 The Donnelly-Kurtz lookdown process . 103 4.3 Coalescents and branching processes . 106 4.3.1 Small-time behaviour . 107 4.3.2 The martingale approach . 109 4.4 Applications: sampling formulae . 110 4.5 A paradox . 112 4.6 Further study of coalescents with regular variation . 112 4.6.1 Beta-coalescents and stable CRT . 113 4.6.2 Backward path to in¯nity . 114 4.6.3 Fractal aspects . 114 4.6.4 Fluctuation theory . 116 5 Spatial interactions 119 5.1 Coalescing random walks . 119 5.1.1 The asymptotic density . 120 5.1.2 Arratia's rescaling . 122 5.1.3 Voter model and super-Brownian limit . 124 5.1.4 Stepping stone and interacting di®usions . 127 5.2 Spatial ¤-coalescents . 131 5.2.1 De¯nition . 131 5.2.2 Asymptotic study on the torus . 132 CONTENTS 5.2.3 Global divergence . 133 5.2.4 Long-time behaviour . 135 5.3 Some continuous models . 137 5.3.1 A model of coalescing Brownian motions . 138 5.3.2 A coalescent process in continuous space . 140 6 Spin glass models and coalescents 142 6.1 The Bolthausen-Sznitman coalescent . 142 6.1.1 Random recursive trees . 143 6.1.2 Properties . 146 6.1.3 Further properties . 149 6.2 Spin glass models . 150 6.2.1 Derrida's GREM . 150 6.2.2 Some extreme value theory . 153 6.2.3 Sketch of proof . 154 6.3 Complements . 156 6.3.1 Neveu's branching process . 156 6.3.2 Sherrington-Kirkpatrick model . 157 6.3.3 Natural selection and travelling waves . 158 A Appendix: Excursions and random trees 163 A.1 Excursion theory for Brownian motion . 163 A.1.1 Local times . 164 A.1.2 Excursion theory . 166 A.2 Continuum random trees . 168 A.2.1 Galton-Watson trees and random walks . 168 A.2.2 Convergence to reﬂecting Brownian motion . 171 A.2.3 The continuum random tree . 172 A.3 Continuous-state branching processes . 174 A.3.1 Feller di®usion and Ray-Knight theorem . 174 A.3.2 Height process and the CRT . 177 Bibliography 181 Index 191 Introduction The probabilistic theory of coalescence, which is the primary subject of these notes, has expanded at a quick pace over the last decade or so. I can think of three factors which have essentially contributed to this growth. On the one hand, there has been a rising demand from population ge- neticists to develop and analyse models which incorporate more realistic features than what Kingman's coalescent allows for. Simultaneously, the ¯eld has matured enough that a wide range of techniques from modern probability theory may be successfully applied to these questions. These tools include for instance martingale methods, renormalization and random walk arguments, combinatorial embeddings, sample path analysis of Brownian motion and L¶evy processes, and, last but not least, continuum random trees and measure-valued processes. Finally, coalescent processes arise in a natural way from spin glass models of statistical physics. The identi¯cation of the Bolthausen-Sznitman coalescent as a universal scaling limit in those models, and the connection made by Brunet and Derrida to models of population genetics, is a very exciting recent development. The purpose of these notes is to give a quick introduction to the mathematical aspects of these various ideas, and to the biological motivations underlying them. We have tried to make these notes as self-contained as possible, but within the limits imposed by the desire to make them short and keep them accessible. Of course, the price to pay for this is a lack of mathematical rigour. Often we skip the technical parts of arguments, and instead focus on some of the key ideas that go into the proof. The level of mathematical preparation required to read these notes is roughly that of two courses in probability theory. Thus we will assume that the reader is familiar with such notions as Poisson point processes and Brownian motion. Sadly, several important and beautiful topics are not discussed. The most obvious such topics are the Marcus-Lushnikov processes and their relation to the Smoluchowski equations, as well as works on simultaneous 7 8 Introduction multiple collisions. Also not appearing in these notes is the large body of work on random fragmentation. For all these and further omissions, I apologise in advance. A ¯rst draft of these notes was prepared for a set of lectures at IMPA in January 2009. Many thanks to Vladas Sidoravicius and Maria Eulalia Vares for their invitation, and to Vladas in particular for arranging many details of the trip. I lectured again on this material at Eurandom on the occasion of the conference Young European Probabilists in March 2009. Thanks to Julien Berestycki and Peter MÄorters for organizing this meeting and for their invitation. I also want to thank Charline Smadi-Lasserre for a careful reading of an early draft of these notes. Many thanks to the people with whom I learnt about coalescent processes: ¯rst and foremost, my brother Julien, and to my other collabora- tors on this topic: Alison Etheridge, Vlada Limic, and Jason Schweinsberg. Thanks are due to Rick Durrett and Jean-Fran»cois Le Gall for triggering my interest in this area while I was their PhD students. N.B. Cambridge, September 2009 Chapter 1 Random exchangeable partitions This chapter introduces the reader to the theory of exchangeable random partitions, which is a basic building block of coalescent theory. This theory is essentially due to Kingman; the basic result (essentially a variation on De Finetti's theorem) allows one to think of a random partition al- ternatively as a discrete object, taking values in the set of partitions P of N = 1; 2; : : : ; , or a continuous object, taking values in the set 0 of tilings offthe unit ginterval (0,1). These two points of view are strictly equiv-S alent, which contributes to make the theory quite elegant: sometimes, a property is better expressed on a random partition viewed as a partition of N, and sometimes it is better viewed as a property of partitions of the unit interval. We then take a look at a classical example of random partitions known as the Poisson-Dirichlet family, which, as we partly show, arises in a huge variety of contexts. We then present some recent results that can be labelled as \Tauberian theory", which takes a particularly elegant form here. 1.1 De¯nitions and basic results We ¯rst ¯x some vocabulary and notation. A partition ¼ of N is an equivalence relation on N. The blocks of the partition are the equivalence classes of this relation. We will sometime write i j or i ¼ j to denote that i and j are in the same block of ¼.

Recent Progress in Coalescent Theory

Applied Probability

Superprocesses and Mckean-Vlasov Equations with Creation of Mass

Collision Times of Random Walks and Applications to the Brownian

Science Fiction and Astronomy

Semimartingale Properties of a Generalized Fractional Brownian Motion and Its Mixtures with Applications in ﬁnance

Lecture 19 Semimartingales

Coalescent Likelihood Methods

Natural Selection and Coalescent Theory

Population Genetics: Wright Fisher Model and Coalescent Process

Fclts for the Quadratic Variation of a CTRW and for Certain Stochastic Integrals

Science Fiction Stories with Good Astronomy & Physics

The Brownian Web