
Yun S. Song
University of California, Berkeley

Lecture Notes on Computational and Mathematical Population Genetics

May 5, 2021

© Yun S. Song. DRAFT – May 5, 2021

Contents

References

Part I Genealogical Trees

1 Basic properties of the genealogy of a sample
  1.1 A high-level description of the coalescent model
  1.2 Discrete-time ancestral process
  1.3 A large-N limit
  1.4 Waiting time while there are k ancestors
  1.5 Tree height
  1.6 Tree length
  1.7 The ancestral lineage of a particular leaf
  References
2 Kingman's coalescent
  2.1 The n-coalescent
  2.2 Subtree leaf-set sizes
  2.3 Some properties of a subsample
  2.4 Forward-in-time jump chain
  2.5 Tree topologies
  2.6 The Yule-Harding process
  2.7 Urn models with stochastic replacement
  2.8 Sufficient conditions for weak convergence to the n-coalescent
  2.9 Moran models
  2.10 Necessary and sufficient conditions for weak convergence
  2.11 Coming down from infinity
  References

Part II Neutral Mutations on Trees at Equilibrium

3 Number of mutations
  3.1 Mutations in a single lineage
  3.2 Number of mutations in a coalescent tree with n leaves
  3.3 Waiting times conditioned on the number of mutations
  References
4 Infinite-alleles model and random combinatorial structures
  4.1 θ-biased random permutations
  4.2 The infinite-alleles model and the Ewens sampling formula
  4.3 The coalescent with killing
  4.4 Ancestral process under the coalescent with killing
  4.5 Hoppe's urn model
  4.6 Chinese Restaurant Process
  4.7 The number of distinct allele types
  4.8 A sufficient statistic for θ
  4.9 Population-wide distribution of allele frequencies
    4.9.1 Size-biased representation, stick-breaking process, and the GEM distribution
    4.9.2 Poisson-Dirichlet point process
    4.9.3 Probability generating functional
    4.9.4 Limit of a symmetric mutation model with K alleles
  References
5 Infinite-sites model of mutation
  5.1 Model description
  5.2 Connections with the infinite-alleles model
  5.3 Site frequency spectrum (SFS) under the infinite-sites model
  5.4 A warning on conditioning on the number of segregating sites
  5.5 The age of a mutation
  5.6 Unbiased moment estimators of θ
  5.7 Tests of selective neutrality
  5.8 A direct method of computing the full likelihood
  5.9 Perfect phylogeny
  5.10 Probability recursion for gene trees
  5.11 Root unknown case
  References
6 Finite-alleles model of mutation
  6.1 Sampling probability
  6.2 Parent-independent mutation
  6.3 A simple Monte Carlo method for approximating the likelihood
  6.4 Sequential importance sampling (SIS)
    6.4.1 The coalescent prior distribution of histories
    6.4.2 Reverse transition probability
  6.5 Approximate conditional sampling distribution (CSD)
    6.5.1 A single site
    6.5.2 Generalization to multiple sites
  6.6 The infinite-sites model revisited
  6.7 Posterior probability of the first event back in time
  6.8 Closed-form asymptotic sampling formulae for small θ
  6.9 How many triallelic sites do we expect to see in a sample of n genomes?
  References

Part III Demography

7 Variable population size
  7.1 Discrete-time model
  7.2 Inter-coalescence times and the ancestral process
  7.3 The expected SFS under variable population size
    7.3.1 Inter-coalescence times in terms of first-coalescence times
    7.3.2 Monotonicity and convexity
  7.4 SFS-based likelihoods
    7.4.1 Completely linked case
    7.4.2 Completely unlinked case: Poisson Random Field
  7.5 A recursion for efficiently computing P(A_m(t) = k)
  7.6 Identifiability of population size histories from the SFS
    7.6.1 An analogy: Can you hear the shape of a drum?
    7.6.2 Non-identifiability and an explicit counterexample
    7.6.3 Rule of signs
    7.6.4 Identifiability
  7.7 Minimax error for population size estimation based on the SFS
  7.8 Geometry of the SFS
  References
8 Multiple populations
  8.1 The structured coalescent
  8.2 Coalescence time for a pair of lineages ...