An Introduction to Coalescent Theory
Nicolas Lartillot (CNRS - Univ. Lyon 1)
May 26, 2014

Explaining genetic variation

Fraction of heterozygous sites in nuclear genomes:
humans: 0.1% (1 every 1000 nucleotide positions)
drosophila: 3.3% (1 every 30 nucleotide positions)

What determines levels of within-species genetic variation?

Source: Gayral et al., 2013, PLoS Genetics 9:e1003457 ("De Novo Population Genomics in Animals")

Table 3. Coding sequence polymorphism and divergence patterns in five non-model animals.

species   #contigs  #SNPs    πS (%)  πN (%)  πN/πS   dN/dS   α       α0.2    αEWK    ωA
turtle    1 041     2 532    0.43    0.05    0.12    0.17    0.01    0.43    0.92    0.17
                             ±0.03   ±0.007  ±0.02   ±0.03   ±0.18   ±0.15
hare      524       2 054    0.38    0.05    0.12    0.15    -0.11   0.30    <0      <0
                             ±0.04   ±0.008  ±0.02   ±0.03   ±0.22   ±0.23
ciona     2 004     11 727   1.58    0.15    0.10    0.10    -0.28   0.10    0.34    0.04
                             ±0.06   ±0.011  ±0.01   ±0.01   ±0.10   ±0.11
termite   4 761     5 478    0.12    0.02    0.18    0.26    0.08    0.28    0.74    0.20
                             ±0.01   ±0.002  ±0.02   ±0.02   ±0.10   ±0.11
oyster    994       3 015    0.59    0.09    0.15    0.21    0.13    0.22    0.79    0.21
                             ±0.05   ±0.011  ±0.02   ±0.02   ±0.12   ±0.13

doi:10.1371/journal.pgen.1003457.t003

Figure 4. Published estimates of genome-wide πS, πN and πN/πS in animals. a. πN as function of πS; b. πN/πS as function of πS. Blue: vertebrates; Red: invertebrates; Full circles: species analysed in this study, designated by their upper-case initial (H: hare; Tu: turtle; O: oyster; Te: termite; C: ciona); Dashed blue circles: non-primate mammals (from left to right: mouse, tupaia, rabbit). Estimates were taken from Bustamante et al. 2005 (human), Hvilsom et al. 2012 (chimpanzee), Carneiro et al. 2012 (rabbit), Perry et al. 2012 (other mammals), Begun et al. 2007 (D. simulans) and Tsagkogeorga et al. 2012 (C. intestinalis B = right-most circle).
doi:10.1371/journal.pgen.1003457.g004

Viral phylogenies

within and between hosts: what global tree can tell us about the global pandemics
within host: what tree shape can say about infection dynamics

Within and between hosts
Source: Rambaut et al., 2004, Nature Rev Genet 5:52

Figure 4 | Contrasting patterns of intra- and inter-host evolution of HIV. The tree was constructed using the neighbour-joining method on envelope gene-sequence data that was taken from nine HIV-infected patients (ref. 48) (a total of 1,195 sequences, 822 base pairs in length), with those viruses sampled from each patient depicted by a different colour. In each case, intra-host HIV evolution is characterized by continual immune-driven selection, such that there is a successive selective replacement of strains through time, with relatively little genetic diversity at any time point. By contrast, there is little evidence for positive selection at the population level (bold lines connecting patients), so that multiple lineages are able to coexist at any time point. A major bottleneck is also likely to occur when the virus is transmitted to new hosts.

Text visible on the same page:

... (neutral) spatial and temporal diffusion of the virus, with viral lineages co-existing for extended time periods. Indeed, there is little evidence that fitness differences determine subtype structure and distribution. For example, experimental studies have revealed that subtype C viruses consistently have lower in vitro fitness than those assigned to subtype B (ref. 56). Although caution should be shown when extrapolating from the laboratory to nature, this indicates that the high prevalence of subtype C in sub-Saharan Africa is the result of its chance entry into populations with high rates of partner exchange. However, it is unclear whether the success of HIV-1 group M, relative to groups N and O, is the result of some intrinsic property of the virus that enhances ...

As a result, strains with advantageous mutations could, by chance, find themselves in individuals with low rates of partner exchange and so will not be transmitted far in the population. Of more debate is whether a bottleneck has a selective component, so that strains that are better adapted to new hosts (such as R5 strains) competitively establish themselves in primary infection (ref. 60), or whether it is entirely neutral (ref. 61) and thereby only magnifies the effects of genetic drift. Finally, some advantageous mutations, such as those conferring CTL escape, might not appear until relatively late in infection (ref. 62). If these late-escape mutants do not arise until after most individuals have transmitted the virus, natural selection will be less effective at the population level. As a consequence, HIV strains might not readily adapt to the HLA haplotype distributions of their local populations (ref. 63), because some CTL-escape mutants have little opportunity for further transmission. The data presented to support the adaptation of HIV to HLA haplotypes at the population level only considered within-host evolution, albeit in a large number of patients, and did not measure the effect of transmission. Indeed, the fact that repeated individual adaptation was observed in these patients indicates that the HIV population as a whole was not adapted to the host HLA distribution. Moreover, although certain CTL-escape mutants can be transmitted through the population (ref. 64), it is possible that CTL-escape mutations that are passed to individuals with the 'wrong' HLA background will sometimes be deleterious and removed by purifying selection. In summary, inter-host HIV evolution is not merely intra-host evolution played out over a longer timescale, and the evolutionary process that occurs within hosts will not select for viruses with enhanced transmissibility.

Recombination and HIV diversity. Genetic recombination is an integral part of the HIV lifecycle, occurring when reverse transcriptase switches between alternative genomic templates during replication. As already mentioned, the recombination rate of HIV is one of the highest of all organisms, with an estimated three recombination events occurring per genome per replication cycle (ref. 65), thereby exceeding the mutation rate per replication.

Within host
Source: Lemey et al., 2007, PLoS Comput Biol 3:e29 ("Rates of HIV Evolution and AIDS Progression")

Figure 1. Internal, Backbone, and External Branches in a Within-Host HIV Genealogy, and Mean Nucleotide Substitution Rates for These Branches in Nine Longitudinally Sampled HIV-1 Patients. To avoid the influence of deleterious mutations segregating to external branches in intrapatient HIV genealogies, we estimate mean substitution rates for the set of internal and backbone branches. These branch sets are depicted in color in the maximum a posteriori tree for "patient 1" obtained by Bayesian relaxed-clock inference [26] (backbone, red; internal, blue; external, black). The backbone represents the central trunk of trees shaped by rapid lineage turnover and can be defined phylogenetically (see Methods). Note that the set of internal branches also includes the backbone branches. Samples for each time point are indicated by the dotted line. Mean nucleotide substitution rates and their standard deviations on internal, external, and backbone branches are shown for all longitudinally sampled HIV-1 patients. The consistently higher substitution rate on external branches might be indicative of higher mutation load on these branches.
doi:10.1371/journal.pcbi.0030029.g001

Text visible on the same page:

... to progression time, and three continuous parameters that relate to disease progression: progression time, the rate of CD4+ T cell count change over time, and the rate of log viral load (log VL) change over time (Figure 3). The log of the backbone rate of synonymous divergence shows a strong negative correlation with both progression time (Pearson correlation coefficient r = -0.79, p = 0.011) and the change in CD4+ T cell count (r = -0.72, p = 0.028), and a moderate positive correlation with the change in log VL (r = 0.65, p = 0.059). No significant correlations were observed for nonsynonymous divergence rates (r = -0.32, p = 0.40 for progression time; r = -0.54, p = 0.135 for CD4+ T cell count change; r = -0.14, p = 0.58 for log VL change). Similar results were obtained when divergence rates were estimated from internal branches (Figure S2). In contrast to backbone and internal rates, no significant correlations were observed for both dS and dN rates on external branches. Similar results were also obtained when datasets were restricted to samples up to about 70 months after seroconversion (Figure S3), ...

... be attributed to differences in time length of sampling. In general, backbone dS rates before progression time seem to show little temporal fluctuation in the trees since dS divergence accumulated in a linear fashion, with R² values close to 0.96, except for patient 9 (R² = 0.83). We also investigated the heterogeneity of synonymous and nonsynonymous substitution rates within the env C2V3 gene, because strong site-to-site variation in synonymous rates has the potential to bias dS estimates [30]. Although this analysis revealed very strong site-to-site variation in dS rates, the inferred rate distributions were nearly identical among all patients (Table S3). Finally, recombination rate estimates (Table S2) did not provide any evidence that recombination could be the cause of the differences between dS estimates. The variability in dS rates could reflect differences in either viral mutation rate or viral generation time, but only the latter provides a likely explanation for the correlation between dS rates and disease progression. However, viral generation time is also expected to affect dN rates to some ...