Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila Melanogaster
Total Page:16
File Type:pdf, Size:1020Kb
| INVESTIGATION Pervasive Strong Selection at the Level of Codon Usage Bias in Drosophila melanogaster Heather E. Machado,*,1 David S. Lawrie,† and Dmitri A. Petrov‡ *Cancer, Ageing, and Somatic Mutation, Wellcome Sanger Institute, Hinxton CB10 1SA, UK, †Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-3958, and ‡Department of Biology, Stanford University, California 94305-5020 ORCID IDs: 0000-0002-1523-3937 (H.E.M.); 0000-0002-6782-9986 (D.S.L.); 0000-0002-3664-9130 (D.A.P.) ABSTRACT Codon usage bias (CUB), where certain codons are used more frequently than expected by chance, is a ubiquitous phenomenon and occurs across the tree of life. The dominant paradigm is that the proportion of preferred codons is set by weak selection. While experimental changes in codon usage have at times shown large phenotypic effects in contrast to this paradigm, genome-wide population genetic estimates have supported the weak selection model. Here we use deep genomic population sequencing of two Drosophila melanogaster populations to measure selection on synonymous sites in a way that allowed us to estimate the prevalence of both weak and strong purifying selection. We find that selection in favor of preferred codons ranges from weak (|Nes| 1) to strong (|Nes| . 10), with strong selection acting on 10–20% of synonymous sites in preferred codons. While previous studies indicated that selection at synonymous sites could be strong, this is the first study to detect and quantify strong selection specifically at the level of CUB. Further, we find that CUB-associated polymorphism accounts for the majority of strong selection on synonymous sites, with secondary contributions of splicing (selection on alternatively spliced genes, splice junctions, and spliceosome-bound sites) and transcription factor binding. Our findings support a new model of CUB and indicate that the functional importance of CUB, as well as synonymous sites in general, have been underestimated. KEYWORDS codon usage bias; CUB; selection; splicing; synonymous sites; Drosophila HE degeneracy of the genetic code leads to protein-coding selection, the nature and strength of natural selection acting Tmutations that do not affect amino acid composition. to maintain CUB is disputed. Despite this, such synonymous mutations often have conse- The most common explanations for CUB postulate selec- quences for phenotype and fitness. The first evidence of the tion on either the rate or the accuracy with which ribosomes functionality of synonymous sites came from the discovery of translate mRNA to protein (Hershberg and Petrov 2008). The codon usage bias (CUB), where, for a given amino acid, certain existence of selection at synonymous sites at the level of codons are used more frequently in a genome than expected translation is supported by several key observations. First, by chance (Grantham et al. 1981; Ikemura 1981). While the the preference toward particular “preferred” codons is con- consensus in the field is that CUB is often driven by natural sistent across genes within a particular genome suggesting a global, genome-wide process, and not preference for the use of particular codons within specific genes (Grantham et al. 1980; Chen et al. 2004). Second, optimal codons tend to Copyright © 2020 Machado et al. doi: https://doi.org/10.1534/genetics.119.302542 correspond to more abundant tRNAs, suggesting a functional Manuscript received October 9, 2019; accepted for publication December 12, 2019; relationship between translation and CUB (Post et al. 1979; published Early Online December 23, 2019. Available freely online through the author-supported open access option. Ikemura 1981, 1982; Qian et al. 2012). Third, preferred co- This is an open-access article distributed under the terms of the Creative Commons dons are more abundant in highly expressed genes than in Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), the rest of the genome (Gouy and Gautier 1982; Bulmer which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1991; Novoa and Ribas de Pouplana 2012), consistent with Supplemental material available at figshare: https://doi.org/10.25386/genetics. selection being proportional to mRNA transcript abundance. 11316794. 1Corresponding author: Cancer, Ageing, and Somatic Mutation, Wellcome Sanger Finally, constrained amino acid positions tend to contain pre- Institute, Hinxton CB10 1SA, UK. E-mail: [email protected] ferred codons more frequently, suggesting a link between Genetics, Vol. 214, 511–528 February 2020 511 CUB and translational accuracy (Escherichia coli: Stoletzki and the test independent of the demographic history of a pop- Eyre-Walker 2007; Drosophila melanogaster: Akashi 1994; ulation. These and similarly constructed tests have been used mammals: Drummond and Wilke 2008). In addition to speed to estimate the strength of selection on CUB in D. mela- and accuracy, there is evidence that other processes are af- nogaster, and have typically failed to find any evidence of fected by codon composition, such as cotranslational folding strong selection on CUB (Singh et al. 2007; Zeng and (Pechmann and Frydman 2013), RNA stability (Presnyak et al. Charlesworth 2009, 2010; Clemente and Vogl 2012a; 2015; Carneiro et al. 2019), and transcription (Carlini and Campos et al. 2013). However, strong purifying selection Stephan 2003; Newman et al. 2016; Zhou et al. 2016). results in an enrichment of very low allele frequency vari- Beyond the level at which selection operates to generate ants, requiring very deep population sequencing to allow for CUB, it is important to consider how strong selection at the detection of its effects in SFS data. In the absence of very synonymous sites is likely to be. This question has been deep and accurate population sequencing, an alternative addressed with population genetics approaches introduced method is to utilize information about the proportion of sites by the seminal papers of Li and Bulmer (Li 1987; Bulmer that are polymorphic (polymorphism level). Since both 1991). The Li–Bulmer model proposes that the observed, strong purifying selection and a decreased mutation rate relative proportion of codons can be explained by the balance can lower the polymorphism level, the selected class of sites of mutation, selection (in favor of preferred codons), and would have to be compared with a neutral reference that is random genetic drift. This model assumes a constant selec- matched for mutation rate and levels of linked selection. tion coefficient per codon or codon preference group, and Thus, the limit of detection for strong selection in the afore- predicts that the strength of selection in favor of preferred mentioned studies, which either ignored polymorphism codons should be on the order of the reciprocal of the effec- level or did not have a control for it, was set by the lowest tive population size (Nes 1). The predicted weak selection allele frequency class in the dataset (set by the number of should be detectable as a slight deviation in the site frequency individuals sampled). spectrum (SFS). Mutations from preferred to unpreferred Intriguingly, a study by Lawrie et al. (2013) that did in- codons should reach comparatively lower frequencies in the corporate polymorphism level and SFS with the use of population than those in the opposite direction. Such devia- matched neutral controls did find evidence of strong purify- tions have, in fact, been observed in many organisms that ing selection in synonymous sites, but was unable to correlate show clear CUB (D. melanogaster: Zeng and Charlesworth this substantial selection with CUB. As the authors did not 2009; Caenorhabditis remanei: Cutter and Charlesworth have enough variants in their study to test the SFS of pre- 2006; E. coli: Sharp et al. 2010), supporting the conclusion ferred sites separately, they could not test for weak selection that selection at synonymous sites is weak but detectable. on CUB or distinguish strong selection from lethal CUB- While weak selection driving CUB may match the intuition associated mutations. that a synonymous change should not have a large phenotypic Here, we test the prevalence of strong purifying selection effect, there is abundant experimental evidence that this is not on CUB in two distinct D. melanogaster populations (the always the case. For example, optimizing the codon compo- DGRP Freeze 2 dataset and a high diversity African popula- sition of the viral protein BPV1 increases the heterologous tion). We accomplish this by comparing the polymorphism translation of the protein in humans by .1000 fold (Zhou level and SFS of fourfold degenerate synonymous sites in et al. 1999). In humans, a change in codon composition in the preferred and unpreferred codons to that of a short intron gene KRAS, from rare to common codons, increases KRas neutral reference. The neutral reference is produced by protein expression and is associated with tumorigenicity matching each fourfold site to a short intron site that is lo- (Lampson et al. 2013). In D. melanogaster, changing a small cated within 1 kb and has the same nucleotide at the position number of preferred codons to unpreferred codons in the of interest and at the 59 and 39 neighboring sites. This creates alcohol dehydrogenase (Adh) gene resulted in substantial a neutral reference that is subject to the same mutation rate changes in gene expression and in ethanol tolerance