Lecture Notes on Computational and Mathematical Population Genetics

Yun S. Song
University of California, Berkeley

© Yun S. Song. DRAFT – May 5, 2021

Contents

References

Part I Genealogical Trees

1 Basic properties of the genealogy of a sample
    1.1 A high-level description of the coalescent model
    1.2 Discrete-time ancestral process
    1.3 A large-N limit
    1.4 Waiting time while there are k ancestors
    1.5 Tree height
    1.6 Tree length
    1.7 The ancestral lineage of a particular leaf
    References

2 Kingman's coalescent
    2.1 The n-coalescent
    2.2 Subtree leaf-set sizes
    2.3 Some properties of a subsample
    2.4 Forward-in-time jump chain
    2.5 Tree topologies
    2.6 The Yule-Harding process
    2.7 Urn models with stochastic replacement
    2.8 Sufficient conditions for weak convergence to the n-coalescent
    2.9 Moran models
    2.10 Necessary and sufficient conditions for weak convergence
    2.11 Coming down from infinity
    References

Part II Neutral Mutations on Trees at Equilibrium

3 Number of mutations
    3.1 Mutations in a single lineage
    3.2 Number of mutations in a coalescent tree with n leaves
    3.3 Waiting times conditioned on the number of mutations
    References

4 Infinite-alleles model and random combinatorial structures
    4.1 θ-biased random permutations
    4.2 The infinite-alleles model and the Ewens sampling formula
    4.3 The coalescent with killing
    4.4 Ancestral process under the coalescent with killing
    4.5 Hoppe's urn model
    4.6 Chinese Restaurant Process
    4.7 The number of distinct allele types
    4.8 A sufficient statistic for θ
    4.9 Population-wide distribution of allele frequencies
        4.9.1 Size-biased representation, stick breaking process, and the GEM distribution
        4.9.2 Poisson-Dirichlet point process
        4.9.3 Probability generating functional
        4.9.4 Limit of a symmetric mutation model with K alleles
    References

5 Infinite-sites model of mutation
    5.1 Model description
    5.2 Connections with the infinite-alleles model
    5.3 Site frequency spectrum (SFS) under the infinite-sites model
    5.4 A warning on conditioning on the number of segregating sites
    5.5 The age of a mutation
    5.6 Unbiased moment estimators of θ
    5.7 Tests of selective neutrality
    5.8 A direct method of computing the full likelihood
    5.9 Perfect phylogeny
    5.10 Probability recursion for gene trees
    5.11 Root unknown case
    References

6 Finite-alleles model of mutation
    6.1 Sampling probability
    6.2 Parent-independent mutation
    6.3 A simple Monte Carlo method for approximating the likelihood
    6.4 Sequential importance sampling (SIS)
        6.4.1 The coalescent prior distribution of histories
        6.4.2 Reverse transition probability
    6.5 Approximate conditional sampling distribution (CSD)
        6.5.1 A single site
        6.5.2 Generalization to multiple sites
    6.6 The infinite-sites model revisited
    6.7 Posterior probability of the first event back in time
    6.8 Closed-form asymptotic sampling formulae for small θ
    6.9 How many triallic sites do we expect to see in a sample of n genomes?
    References

Part III Demography

7 Variable population size
    7.1 Discrete-time model
    7.2 Inter-coalescence times and the ancestral process
    7.3 The expected SFS under variable population size
        7.3.1 Inter-coalescence times in terms of first-coalescence times
        7.3.2 Monotonicity and convexity
    7.4 SFS-based likelihoods
        7.4.1 Completely linked case
        7.4.2 Completely unlinked case: Poisson Random Field
    7.5 A recursion for efficiently computing P(A_m(t) = k)
    7.6 Identifiability of population size histories from the SFS
        7.6.1 An analogy: Can you hear the shape of a drum?
        7.6.2 Non-identifiability and an explicit counterexample
        7.6.3 Rule of signs
        7.6.4 Identifiability
    7.7 Minimax error for population size estimation based on the SFS
    7.8 Geometry of the SFS
    References

8 Multiple populations
    8.1 The structured coalescent
    8.2 Coalescence time for a pair of lineages
    ...

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    198 pages
