Estimating the Parameters of Background Selection and Selective Sweeps in Drosophila in the Presence of Gene Conversion
José Luis Campos^a, Lei Zhao (赵磊)^b, and Brian Charlesworth^a,1

^a Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom; and ^b Centre for Computational Systems Biology, Fudan University, Shanghai 200433, People's Republic of China

Edited by Daniel L. Hartl, Harvard University, Cambridge, MA, and approved May 9, 2017 (received for review November 24, 2016)

We used whole-genome resequencing data from a population of Drosophila melanogaster to investigate the causes of the negative correlation between the within-population synonymous nucleotide site diversity (πS) of a gene and its degree of divergence from related species at nonsynonymous nucleotide sites (KA). By using the estimated distributions of mutational effects on fitness at nonsynonymous and UTR sites, we predicted the effects of background selection at sites within a gene on πS and found that these could account for only part of the observed correlation between πS and KA. We developed a model of the effects of selective sweeps that included gene conversion as well as crossing over. We used this model to estimate the average strength of selection on positively selected mutations in coding sequences and in UTRs, as well as the proportions of new mutations that are selectively advantageous. Genes with high levels of selective constraint on nonsynonymous sites were found to have lower strengths of positive selection and lower proportions of advantageous mutations than genes with low levels of constraint. Overall, background selection and selective sweeps within a typical gene reduce its synonymous diversity to ∼75% of its value in the absence of selection, with larger reductions for genes with high KA. Gene conversion has a major effect on the estimates of the parameters of positive selection, such that the estimated strength of selection on favorable mutations is greatly reduced if it is ignored.

background selection | selective sweeps | sequence diversity | gene conversion | Drosophila melanogaster

Significance

The level of DNA sequence variation at a site in the genome is affected by selection acting on genetically linked sites. We have developed models of selection at linked sites to explain the observed negative relation between the level of nearly neutral variability in Drosophila genes and their protein sequence divergence from a related species. We use fits of these models to polymorphism and divergence data to show that selective sweeps are the main determinants of this pattern. We obtain estimates of the strengths of selection on advantageous mutations and the proportions of new mutations that are selectively advantageous. Gene conversion, a major source of genetic recombination within genes, has a large effect on these parameter estimates.

Author contributions: J.L.C. and B.C. designed research; J.L.C., L.Z., and B.C. performed research; J.L.C. analyzed data; and B.C. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The computer code and output files reported in this paper have been deposited in Dryad (https://doi.org/10.5061/dryad.vs264).
1 To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1619434114/-/DCSupplemental.
E4762–E4771 | PNAS | Published online May 30, 2017 | www.pnas.org/cgi/doi/10.1073/pnas.1619434114

Advances in population genomics are shedding light on the question of the extent to which patterns of DNA sequence variation and evolution are affected by selection at sites that are genetically linked to those under investigation (1–3). Two main processes have been invoked as causes of such "hitchhiking" effects. The first involves selective sweeps (SSWs), in which a selectively favorable mutation spreads all or part of the way through the population, causing a reduction in the level of variability at nearby sites (4). The second is background selection (BGS) (5), whereby the elimination of deleterious mutations results in the removal of linked variants. Both processes can be viewed heuristically as causing a local reduction in effective population size (Ne), resulting in reductions in within-population variability and the effectiveness of selection. Variability increases with the product of Ne and the mutation rate, and the fixation probability of a mutation is determined by the magnitude of the product of Ne and the strength of selection (6). These effects have important implications for the evolutionary significance of recombination and sexual reproduction (1, 7, 8).

Because genetic recombination reduces associations between linked variants, the effects of hitchhiking on a genome region are expected to be negatively correlated with the local rate of recombination (9). There is now a large body of evidence for positive correlations between the recombination rate of a gene and its level of variability and molecular adaptation (1–3, 9). The contributions of SSWs and BGS to reductions in variability have long been a matter for debate, because they can have similar effects on both levels of variability and the shapes of gene genealogies. This question has been especially intensively studied using population genomic data on Drosophila. Three broad categories of approach can be distinguished. The first attempts to interpret patterns of variability by using estimates of the extent of adaptive evolution in coding or functional noncoding sites, ignoring BGS (10, 11). The second uses estimates of the distribution of fitness effects (DFE) of new deleterious mutations to ask whether BGS alone can account for most of the observed patterns (12, 13). The third uses whole-genome data on levels of polymorphism and divergence in different classes of nucleotide sites to fit the parameters of both BGS and SSWs, without using prior knowledge of the DFE or the extent of adaptive evolution (3, 14). Although all three approaches agree in suggesting a substantial effect of hitchhiking on the typical level of variability in a gene, they disagree about the quantification of the contributions of the two causes of hitchhiking.

Here, we describe an approach that involves fitting models of both BGS and SSWs to an important aspect of the population genomic data: the negative relation between the level of synonymous nucleotide site diversity (πS) in a Drosophila melanogaster gene and KA, its nonsynonymous site (NS) divergence from a related species, first noted by Andolfatto (15) and confirmed in later studies (16, 17). We used whole-genome polymorphism data on a Rwandan population of D. melanogaster, previously analyzed for different purposes (18–20). By binning genes into sets with similar KA values with respect to divergence from Drosophila yakuba, or along the D. melanogaster lineage since its divergence from its closest relative Drosophila simulans, we estimated the parameters of the DFE and the extent of positive selection on NS sites for bins with different KA values. We also estimated these parameters for untranslated regions (UTRs) of coding sequences, which show levels of selective constraint that are intermediate between those for synonymous and nonsynonymous sites (21).

Table 1. Effects of gene conversion (GC), BGS, and SSWs on parameter estimates from the mel-yak data

UTR  BGS  GC   Mean γa  Mean pa (×10^4)  γu   pu (×10^4)  πr max  πr mean  r
+    +    +    249      2.21             213  9.04        0.882   0.756    0.924
−    +    +    197      2.76             n/a  n/a         0.975   0.836    0.918
+    −    +    319      1.82             197  9.74        0.925   0.793    0.922
−    −    +    297      1.95             n/a  n/a         0.992   0.850    0.918
+    +    −    122      4.41             135  10.4        0.867   0.742    0.927
−    +    −    95.6     5.59             n/a  n/a         0.971   0.830    0.920
+    −    −    188      3.07             119  16.1        0.923   0.792    0.922
−    −    −    176      3.29             n/a  n/a         0.992   0.850    0.919

The data used in Fig. 3 for the standard gene model and standard rates of mutation and crossing over were applied to estimates of the parameters of positive selection and synonymous site diversity from models with (+) or without (−) selective effects on UTRs, BGS, and GC. When GC was present, the low rate of Fig. 3 was assumed. πr max is the maximum value of the mean synonymous site diversity of a gene, relative to its value in the absence of selection; πr mean is the corresponding mean value over bins. Other variables are defined in the text.

Using recent estimates of rates of crossing over and gene conversion for D. melanogaster (22, 23), we found that the effects of BGS can account for only part of the observed relation between πS and KA, so that SSW effects need to be invoked. We estimated the average strength of selection on positively selected mutations, and the proportion of new mutations that are advantageous, for both NS and UTR sites. A unique aspect of our approach is that it includes gene conversion in the SSW as well as the BGS models, which has a major effect on the parameter estimates. Because we found that the results using the D. melanogaster–D. yakuba comparison had better statistical properties than those for the D. melanogaster lineage, probably because they provided more accurate estimates of the rates of adaptive evolution, we focus attention on the former.
From now on, we will refer to the two datasets as mel-yak and mel, respectively.

Empirical Results

implies that the level of selective constraint on a coding sequence is negatively correlated with its KA value, consistent with the result from the DFE-α analyses described next.
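
The abstract's statement that ignoring gene conversion greatly reduces the estimated strength of selection on favorable mutations can be checked directly against the mean γa column of Table 1. The short Python sketch below is illustrative only and is not part of the authors' deposited code: the dictionary layout and the function name `gc_ratio` are hypothetical, and the sketch assumes that the mean γa column is the positive-selection strength estimate for NS sites. The numbers are copied from Table 1.

```python
# Mean γa estimates copied from Table 1 (mel-yak data).
# Keys are (UTR, BGS, GC) flags: True means the effect was included (+),
# False means it was left out (−). The layout is a hypothetical convenience.
GAMMA_A = {
    (True, True, True): 249.0,
    (False, True, True): 197.0,
    (True, False, True): 319.0,
    (False, False, True): 297.0,
    (True, True, False): 122.0,
    (False, True, False): 95.6,
    (True, False, False): 188.0,
    (False, False, False): 176.0,
}


def gc_ratio(utr: bool, bgs: bool) -> float:
    """Return the γa estimate without GC divided by the estimate with GC,
    holding the UTR and BGS settings fixed."""
    return GAMMA_A[(utr, bgs, False)] / GAMMA_A[(utr, bgs, True)]


if __name__ == "__main__":
    for utr in (True, False):
        for bgs in (True, False):
            ratio = gc_ratio(utr, bgs)
            print(f"UTR={utr!s:5} BGS={bgs!s:5} ratio={ratio:.2f}")
```

For every combination of UTR and BGS settings the ratio is below 1, ranging from about 0.49 to 0.59, so leaving gene conversion out of the model roughly halves the estimate of mean γa in this table.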