The Selection-Mutation-Drift Theory of Synonymous Codon Usage
Total Page:16
File Type:pdf, Size:1020Kb
Copyright 0 1991 by the Genetics Society of America The Selection-Mutation-DriftTheory of Synonymous Codon Usage Michael Bulmer' Department of Statistics, Oxford University, Oxford OX1 3TG, England Manuscript received March 13, 199 1 Accepted for publication July 6, 199 1 ABSTRACT It is argued that the bias in synonymous codon usage observed in unicellular organismsis due to a balance between the forces of selection and mutation in a finite population, with greaterbias in highly expressed genes reflecting stronger selection for efficiency of translation. A populationgenetic model is developed taking into account population size and selective differences between synonymous codons. A biochemical model is then developed to predict the magnitude of selective differences between synonymous codons in unicellular organisms in which growth rate (or possibly growth yield) can be equated with fitness. Selection can arise from differences in either the speed or the accuracy of translation. A model for the effect of speed of translation on fitness is considered in detail, a similar model for accuracy more briefly. The model is successful in predicting a difference in the degree of bias at thebeginning than inthe rest of the gene under some circumstances, as observed in Escherichia coli, but grossly overestimates the amount of bias expected. Possible reasons for this discrepancy are discussed. YNONYMOUScodons (coding forthe same balance ina finite population between selection favor- S amino acid) are often used with very different ing an optimal codon for each amino acid and muta- frequencies (GRANTHAMet al. 1980, 1981; IKEMURA tion together with drift allowing the persistence of 1985; BULMER1988a; SHARP1989; ANDERSONand nonoptimal codons. Selection is likely to be stronger KURLAND1990). The main features of codon usage on codons in highly expressed genes since they are bias described in these references are as follows. (1) used more often, thus generating greater biasin Among a group of synonymouscodons recognized by highly than in weakly expressed genes (SHARPand LI several tRNAs, codons recognized by the most abun- 1986a,b; BULMER1987, 1988a). dant tRNA are used more often than those recognized Several lines ofevidence support the seiection-mu- by rare tRNAs. (2) Among codons recognized by the tation-drift theory rather than the expression-regula- same tRNA, those making a natural Watson-Crick tion theory of codon usage in weakly expressed genes. pair with the anticodon in the wobble position are (1) The wayin which the pattern ofcodon usage usually performed, though there are exceptions to changes with expression level suggests a trend from this rule (GROSJEANand FIERS1982). (3) The bias is predominant use of optimalcodons in highly ex- more extremein strongly expressed genes which pro- pressed genes toward more uniform use of all synon- duce large amounts of protein than in weakly ex- ymous codons in weakly expressed genes (SHARPand pressed genes. Codon usage bias of this kind hasbeen LI 1986a,b; BULMER1988a). There is no evidence found in Drosophila melanogaster (SHIELDSet al. 1988) that weakly expressed genes have a preference for as well as in many prokaryotes and unicellular eukar- rare codons. (2) The synonymous substitution rate is yotes but not in mammals. higher in Enterobacteria inweakly than inhighly There is general agreement that the strong bias expressed genes, inaccordance with expectation when found in highlyexpressed genes is due toselection for selection pressure is relaxed (SHARPand LI 1987). (3) speed or translational efficiency, but two theories have Investigation of the effect of neighboring bases (the been put forward to account for codon usagein codon context effect) on the synonymous usage of A weakly expressed genes. The first, the expression-reg- vs. G or of U vs. C shows a similar pattern in comple- dation theory, is that rarecodons are used in the latter mentary sequences in weakly expressed genes inboth as a mechanism for keeping their expression low Escherichia coli and yeast (BULMER1990). This is con- (GROSJEANand FIERS1982; KONICSBERGand GODSON sistent with an effect of context on mutation rate, but 1983; WALKER,SARASTE and GAY 1984; HINDand is not consistent with selection acting at the mRNA BLAKE1985). The second, the selection-mutation-drijit rather than theDNA level.SHIELDS and SHARP(1 987) theory, is that codon usage patterns result from the have madea similar observation in Bacillus subtilis. (4) ' Current address:Department of Biological Sciences, Rutgers University, It would seem more efficient to modulate the level of Piscataway, New Jersey 08855-1059. gene expression by changing the strength of the pro- Genetics 149: 897-907 (November, 1991) 898 M. Bulmer moter or the ribosome binding site. KONIGSBERGand due to the combined effects of mutation and genetic GODSON(1 983) discuss a weakly expressed E. coli gene drift (LI 1987; BULMER1987; SHIELDS1990). Con- sandwiched in the same transcriptional unit between sider first the jointeffect of selection and mutation in two highly expressed genes,and propose that its weak an effectively infinite population on a two codon fam- expression is achieved through theuse of non-optimal ily, such as the codons UUU and UUC for Phe. Fix codons.However, DE BOER and KASTELEIN(1986) attention on a particular UUY codon at a particular point out that it has an inefficient ribosome binding position in a particular gene, and assume that only site which is more likely to be the cause of its weak synonymous changesoccur, since nonsynonymous expression. (5) In any case, it seems unlikely that substitutions are comparatively rare. Thus there are changes in the elongation rate brought about through two alleles at this codon, represented by the presence codon usage will lead to any appreciable change in of U or C at the third position, which will be labeled the rateof protein production (ANDERSONand KUR- B1 and B2. For simplicity consider a haploid model, LAND 1990). The rate of initiation is probably the and suppose that individuals carrying B2 have relative rate-limiting factor in protein synthesis, so that the fitness 1 - s compared with those carrying B1,so that elongation rate has little or no direct effect on the s is the selective disadvantage of B2 compared with B1, rate of protein synthesis. This will be discussed in assumed small. Write u for the mutation rate from B1 more detail later. to B:! and v forthe mutation rate in the reverse I shall therefore adopt the selection-mutation-drift direction. Write p, and q1 (=I - p,) for the relative theory as a working hypothesis. The purpose of this frequencies of B1 and B2 in generation t. The change paper is to evaluate the selective forces likely to act in the gene frequency in one generation under weak on codon usage in unicellular organisms inwhich selection and mutation is growth rate (or possibly growth yield) can be equated with fitness and to determine whether they are of the Ap = sptqr + Vqt - up,. (1) right order of magnitude to generate, in conjunction At equilibrium Ap = 0, so that the equilibrium gene with estimated mutation ratesand population size, the frequency P satisfies the quadratic equation observed pattern of codon usage. sP(1 - P) v(l - P) - UP = 0 (2) The next section describes a simple population ge- + neticmodel, the following section quantifiesfrom whose unique positive root is biochemical considerations the selective forces acting P = 1/2(1 - (u v)/s on codon usage and the final section confronts the + (3) predictions of the model, incorporating theseselection + sign(s).\/l - 2(u - v)/s + (u + V)~/S~~. coefficients, with the facts. The observed bias in syn- In a finite population, randomsampling introduces onymous codon usage is much less than predicted; stochastic fluctuations in the gene frequency, and we possible reasons for this discrepancy are considered. must consider not the equilibrium gene frequency P but the equilibrium distribution of gene frequencies A POPULATION GENETIC MODEL OF f(p)with Expected value P. A classical result in pop- SYNONYMOUSCODON USAGE ulation genetics (WRIGHT193 1; CROWand KIMURA 1970) is that KIMURA(1 98 1, 1983) considered a model of stabi- lizing selection on codon usage in which there is an f@) a esPpv”(l - p)”” (4) optimal state of highest fitness when the relative fre- where quencies of synonymous codons exactly match those of the cognate tRNAs. The rationale of this model is S = 2Nes unclear.It is truethat codon usage will alterthe V = 2Nev relative abundanceof cognate charged tRNAs, butit (5) isunlikely thatunder physiological conditions the U = 2Neu tRNA with greatesttotal abundance (charged, un- Ne = effective population size. charged and bound to mRNA) will also have greatest abundance in free, charged form and will therefore If U + V is large, the distribution will be clustered translate fastest regardless of codon usage. This model about the deterministic equilibrium in Equation 3. If also fails to take into accountdifferences between U + V is small, the population is likely to be at or near codons recognized by the same tRNA, and there is one of the boundaries so thatthe expectedgene some weakness in the link between its verbal formu- frequency is the probability of being near 1 rather lation and its mathematical development (LI 1987). than 0, given by An alternative model will be adopted herein which P = esV/(esV + u). (6) there is a unique optimal codon for each amino acid, the occurrence of other synonymous codons being To prove this result we use the fact (CROWand KI- Synonymous Codon Usage 899 MURA 1970, p. 426) that theprobability of fixation of quencies Pi satisfy the set of simultaneous quadratic a newly arisen mutant with small selective advantage equations analogous to Equation 3: s is approximately Pi(C SF^ - si) + 1 UjSj - Pix ~9 = 0, (9) j j 4(S) = 2S/N(1 - e-') (7) j i=l, where Nis the actual populationsize.