
Viral epidemiology and the coalescent Marc A. Suchard [email protected] UCLA Key to population genetics SISMID University of Washington Coalescent theory Key to population genetics Population size Structure Migration Selection SISMID University of Washington Coalescent theory Tree-based population genetics The coalescent . Models the ancestral relationships of a random sample of individuals taken from a large background population. Describes a probability distribution on ancestral trees given a population history Covers ancestral trees, not sequences, and its simplest from assumes neutral evolution. SISMID University of Washington Coalescent theory Population history inference A population history often called a demographic ! Inference: learn about changes in population size through time Applications include: I Reconstructing infection disease epidemics I Investigating viral dynamics within hosts I Using viral sequences as genetic markers for their hosts and host demographics I Identifying population bottlenecks caused by: F Changes in climate/environment? aridifcation, ice ages F Competition with other species? humans F Transmission between hosts in viruses SISMID University of Washington Coalescent theory Information pipe-line SISMID University of Washington Coalescent theory Coalescent inference Coalescent theory SISMID University of Washington Coalescent theory Simple model of reproduction For a randomly chosen pair of individuals, they share a common ancestor (coalesce) in the previous generation with probability 1=N. SISMID University of Washington Coalescent theory Wright-Fisher reproduction model A constant population size of N individuals (usually 2N) Each new (non-overlapping) generation “chooses" its parents from the previous generation at random with replacement No geographic/social structure, no recombination, no selection SISMID University of Washington Coalescent theory Sample tree from a Wright-Fisher population A sample tree of 3 sequences from a population of N = 10 SISMID University of Washington Coalescent theory Kingman discrete-time coalescent 2 individuals coalesce in 1 1 generation w.p. N 2 individuals coalesce in j generations w.p. 1 1 j−1 1 N − N k individuals coalesce in j generations w.p. k 1 k 1 j−1 1 2 N − 2 N SISMID University of Washington Coalescent theory Kingman continuous-time coalescent Let t j define a rescaled time in the∼ past, and Assume a sample of n individuals with n N Then, the waiting time for k individuals to have k 1 ancestors − −(k) t P(uk t) = 1 e 2 N ≤ − Exponential (memoryless) defines a continuous-time ! Kingman (1982) J Appl Prob 19, 27-42 Markov chain Kingman (1982) Stoch Proc Appl 13, 235-48 2N (u ) = E k k(k 1) − SISMID University of Washington Coalescent theory Kingman coalescent: CTMC The number of ancestral lineages decreases by one at each coalescence The process continues until the most recent common ancestor (MRCA) is reached What is the expected time to MRCA? n ! n X X E uk = E(uk) k=2 k=2 n X 2N = k(k 1) k=2 − 1 = 2N 1 − n Note: tMRCA=2 < E(u2) SISMID University of Washington Coalescent theory Kingman coalescent: probability distribution Given a known tree from a sample of individuals from a populationT The coalescent allows us to calculation the probability P( N) T j Or, the inverse problem : learn about N from T SISMID University of Washington Coalescent theory N governs the rate of coalescence Large and small population sizes SISMID University of Washington Coalescent theory Quiz Which population is large? SISMID University of Washington Coalescent theory Coalescent assumptions The major weakness of the coalescent lies in its simplifying assumptions Neutral evolution? Reproductive variance? Panmitic population? But, does this matter? SISMID University of Washington Coalescent theory Solution: Effective population size Consider an abstract parameter, the effective population size Ne The Ne of a real biological population is the size of an idealized Wright-Fisher population that loses or gains genetic diversity at the same rate Ne is generally smaller than the census population The coalescent Ne provides the time-to-ancestry distribution for a sample tree from a real population SISMID University of Washington Coalescent theory Variable population size coalescent Growing population Changes in Ne reflect changes in the census population size SISMID University of Washington Coalescent theory Variable population size coalescent SISMID University of Washington Coalescent theory Parametric models of N(t) through time The standard coalescent can be extended to accommodate various scenarios of demographic change through time However, few parametric forms (constant, exponential, logistic) are available. Can use piece-wise combinations SISMID University of Washington Coalescent theory Review: continuous-time coalescent Time Time measured in generation units t1 u2 hk i N = const uk Exp 2 N t2 ! ∼ u3 N = N(t) ! k R t+tk+1 N t3 − du (2) tk+1 N(u) u4 Pr(uk > t tk+1) = e j t4 uk are not independent any more N(t) = N Constant population size N(t) = Ne−100t Exponential growth SISMID University of Washington Coalescent theory Piecewise constant demographic model Number of Lineages: 2 3 4 5 Isochronous Data Ne(t) = θk for tk < t tk−1. ≤ u2; : : : ; un are independent k(k−1)uk k(k−1) − 2θ Pr (uk θk) = e k j 2θk u2 u3 u4 u5 Qn t1 t2 t3 t4 t5= 0 Pr (F θ) Pr (uk θk) j / k=2 j Equivalent to estimating exponential mean from one observation. Need further restrictions to estimate all effective pop sizes θ! SISMID University of Washington Coalescent theory Piecewise constant demographic model Number of Lineages: 213 4 3 4 3 Heterochronous Data w20; : : : ; wnjn are independent Pr(wk0 j θk)= nk0(nk0−1)wk0 n (n −1) − k0 k0 e 2θk 2θk nkj (nkj −1)wkj − 2θ u2 u3 u4 u5 Pr(wkj j θk)=e k ; j>0 w20 w30 w40 w41 w50 w51 w52 Qn Qjk Pr (F j θ) / k=2 j=0 Pr wkj j θk Equivalent to estimating exponential mean from one observation. Need further restrictions to estimate all effective pop sizes θ! SISMID University of Washington Coalescent theory Previous priors to restrict θ Strimmer and Pybus (2001) Make Ne(t) constant across some inter-coalescent times Group inter-coalescent intervals with AIC Drummond et al. (2005) Multiple change-point model with fixed number of change-points Change-points allowed only at coalescent events Joint estimation of phylogenies and population dynamics Opgen-Rhein et al. (2005) Multiple change-point model with random number of change-points Change-points allowed anywhere in interval (0; t1] Posterior is approximated with rjMCMC SISMID University of Washington Coalescent theory Smoothing priors - Gaussian Markov random fields Go to the log scale xk = log θk " n−2 # (n−2)=2 ! X 1 2 Pr(x !) ! exp (xk+1 xk) j / − 2 dk − k=1 d1 d2 d3 dn−2 x1 x2 x3 x4 xn⌧2 xn⌧1 Weighting Schemes 1. Skyride : weights dk determined by tree (in relative time) 2. Skygrid : dk on a regular grid in absolute time + multi-locus Pr(x;!) = Pr(x !)Pr(!) j Pr(!) !α−1e−β!, diffuse prior with α = 0:01, β = 0:01 / SISMID University of Washington Coalescent theory MCMC algorithm Pr (G; Q; x D) Pr (D G; Q) Pr (Q) Pr (G x) Pr (x) j / j j Updating Population Size Trajectory Use fast GMRF sampling (Rue et al., 2001, 2004) Draw !∗ from an arbitrary univariate proposal distribution Use Gaussian approximation of Pr(x !∗; G) to propose x∗ j Jointly accept/reject (!∗; x∗) in Metropolis-Hastings step Object-Oriented Reality? BEAST = Bayesian Evolutionary Analysis Sampling Trees Pr(G x; D; Q) - sampled by BEAST j Pr(Q G; D) - sampled by BEAST j SISMID University of Washington Coalescent theory Simulation: constant population size Classical Skyline Plot ORMCP Model Beast MCP 0.1 1.0 5.0 0.1 1.0 5.0 0.1 1.0 5.0 Effective Population Size 1.5 1.0 0.5 0.0 1.5 1.0 0.5 0.0 1.5 1.0 0.5 0.0 Uniform GMRF Time−Aware GMRF Beast GMRF 0.1 1.0 5.0 0.1 1.0 5.0 0.1 1.0 5.0 Effective Population Size 1.5 1.0 0.5 0.0 1.5 1.0 0.5 0.0 1.5 1.0 0.5 0.0 Time (Past to Present) Time (Past to Present) Time (Past to Present) SISMID University of Washington Coalescent theory Simulation: exponential growth Classical Skyline Plot ORMCP Model Beast MCP 0.001 0.1 1.0 0.001 0.1 1.0 0.001 0.1 1.0 Effective Population Size 0.014 0.010 0.006 0.002 0.014 0.010 0.006 0.002 0.014 0.010 0.006 0.002 Uniform GMRF Time−Aware GMRF Beast GMRF 0.001 0.1 1.0 0.001 0.1 1.0 0.001 0.1 1.0 Effective Population Size 0.014 0.010 0.006 0.002 0.014 0.010 0.006 0.002 0.014 0.010 0.006 0.002 Time (Past to Present) Time (Past to Present) Time (Past to Present) SISMID University of Washington Coalescent theory Simulation: exponential growth with bottleneck Classical Skyline Plot ORMCP Model Beast MCP Effective Population Size 0.001 0.01 1.0 0.001 0.01 1.0 0.001 0.01 1.0 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 Uniform GMRF Time−Aware GMRF Beast GMRF Effective Population Size 0.001 0.01 1.0 0.001 0.01 1.0 0.001 0.01 1.0 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 0.15 0.10 0.05 0.00 Time (Past to Present) Time (Past to Present) Time (Past to Present) SISMID University of Washington Coalescent theory Accuracy in simulations Z TMRCA Nbe(t) Ne(t) Percent error = j − jdt 100; 0 Ne(t) × Table: Percent error in simulations.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages37 Page
-
File Size-