<<

Copyright 0 1991 by the Genetics Society of America

The Selection--DriftTheory of Synonymous Codon Usage

Michael Bulmer' Department of Statistics, Oxford University, Oxford OX1 3TG, Manuscript received March 13, 199 1 Accepted for publication July 6, 199 1

ABSTRACT It is argued that the bias in synonymous codon usage observed in unicellular organismsis due to a balance between the forces of selection and mutation in a finite population, with greaterbias in highly expressed reflecting stronger selection for efficiency of . A populationgenetic model is developed taking into account population size and selective differences between synonymous codons. A biochemical model is then developed to predict the magnitude of selective differences between synonymous codons in unicellular organisms in which growth rate (or possibly growth yield) can be equated with fitness. Selection can arise from differences in either the speed or the accuracy of translation. A model for the effect of speed of translation on fitness is considered in detail, a similar model for accuracy more briefly. The model is successful in predicting a difference in the degree of bias at thebeginning than inthe rest of the under some circumstances, as observed in Escherichia coli, but grossly overestimates the amount of bias expected. Possible reasons for this discrepancy are discussed.

YNONYMOUScodons (coding forthe same balance ina finite population between selection favor- S ) are often used with very different ing an optimal codon for each amino acid and muta- frequencies (GRANTHAMet al. 1980, 1981; IKEMURA tion together with drift allowing the persistence of 1985; BULMER1988a; SHARP1989; ANDERSONand nonoptimal codons. Selection is likely to be stronger KURLAND1990). The main features of codon usage on codons in highly expressed genes since they are bias described in these references are as follows. (1) used more often, thus generating greater biasin Among a group of synonymouscodons recognized by highly than in weakly expressed genes (SHARPand LI several tRNAs, codons recognized by the most abun- 1986a,b; BULMER1987, 1988a). dant tRNA are used more often than those recognized Several lines ofevidence support the seiection-mu- by rare tRNAs. (2) Among codons recognized by the tation-drift theory rather than the expression-regula- same tRNA, those making a natural Watson-Crick tion theory of codon usage in weakly expressed genes. pair with the anticodon in the wobble position are (1) The wayin which the pattern ofcodon usage usually performed, though there are exceptions to changes with expression level suggests a trend from this rule (GROSJEANand FIERS1982). (3) The bias is predominant use of optimalcodons in highly ex- more extremein strongly expressed genes which pro- pressed genes toward more uniform use of all synon- duce large amounts of than in weakly ex- ymous codons in weakly expressed genes (SHARPand pressed genes. of this kind hasbeen LI 1986a,b; BULMER1988a). There is no evidence found in Drosophila melanogaster (SHIELDSet al. 1988) that weakly expressed genes have a preference for as well as in many prokaryotes and unicellular eukar- rare codons. (2) The synonymous substitution rate is yotes but not in mammals. higher in Enterobacteria inweakly than inhighly There is general agreement that the strong bias expressed genes, inaccordance with expectation when found in highlyexpressed genes is due toselection for selection pressure is relaxed (SHARPand LI 1987). (3) speed or translational efficiency, but two theories have Investigation of the effect of neighboring bases (the been put forward to account for codon usagein codon context effect) on the synonymous usage of A weakly expressed genes. The first, the expression-reg- vs. G or of U vs. C shows a similar pattern in comple- dation theory, is that rarecodons are used in the latter mentary sequences in weakly expressed genes inboth as a mechanism for keeping their expression low Escherichia coli and yeast (BULMER1990). This is con- (GROSJEANand FIERS1982; KONICSBERGand GODSON sistent with an effect of context on , but 1983; WALKER,SARASTE and GAY 1984; HINDand is not consistent with selection acting at the mRNA BLAKE1985). The second, the selection-mutation-drijit rather than theDNA level.SHIELDS and SHARP(1 987) theory, is that codon usage patterns result from the have madea similar observation in Bacillus subtilis. (4) ' Current address:Department of Biological Sciences, , It would seem more efficient to modulate the level of Piscataway, New Jersey 08855-1059. by changing the strength of the pro-

Genetics 149: 897-907 (November, 1991) 898 M. Bulmer moter or the binding site. KONIGSBERGand due to the combined effects of mutation and genetic GODSON(1 983) discuss a weakly expressed E. coli gene drift (LI 1987; BULMER1987; SHIELDS1990). Con- sandwiched in the same transcriptional unit between sider first the jointeffect of selection and mutation in two highly expressed genes,and propose that its weak an effectively infinite population on a two codon fam- expression is achieved through theuse of non-optimal ily, such as the codons UUU and UUC for Phe. Fix codons.However, DE BOER and KASTELEIN(1986) attention on a particular UUY codon at a particular point out that it has an inefficient ribosome binding position in a particular gene, and assume that only site which is more likely to be the cause of its weak synonymous changesoccur, since nonsynonymous expression. (5) In any case, it seems unlikely that substitutions are comparatively rare. Thus there are changes in the elongation rate brought about through two alleles at this codon, represented by the presence codon usage will lead to any appreciable change in of U or C at the third position, which will be labeled the rateof protein production (ANDERSONand KUR- B1 and B2. For simplicity consider a haploid model, LAND 1990). The rate of initiation is probably the and suppose that individuals carrying B2 have relative rate-limiting factor in protein synthesis, so that the fitness 1 - s compared with those carrying B1,so that elongation rate has little or no direct effect on the s is the selective disadvantage of B2 compared with B1, rate of protein synthesis. This will be discussed in assumed small. Write u for the mutation rate from B1 more detail later. to B:! and v forthe mutation rate in the reverse I shall therefore adopt the selection-mutation-drift direction. Write p, and q1 (=I - p,) for the relative theory as a working hypothesis. The purpose of this frequencies of B1 and B2 in generation t. The change paper is to evaluate the selective forces likely to act in the gene frequency in one generation under weak on codon usage in unicellular organisms inwhich selection and mutation is growth rate (or possibly growth yield) can be equated with fitness and to determine whether they are of the Ap = sptqr + Vqt - up,. (1) right order of magnitude to generate, in conjunction At equilibrium Ap = 0, so that the equilibrium gene with estimated mutation ratesand population size, the frequency P satisfies the quadratic equation observed pattern of codon usage. sP(1 - P) v(l - P) - UP = 0 (2) The next section describes a simple population ge- + neticmodel, the following section quantifiesfrom whose unique positive root is biochemical considerations the selective forces acting P = 1/2(1 - (u v)/s on codon usage and the final section confronts the + (3) predictions of the model, incorporating theseselection + sign(s).\/l - 2(u - v)/s + (u + V)~/S~~. coefficients, with the facts. The observed bias in syn- In a finite population, randomsampling introduces onymous codon usage is much less than predicted; stochastic fluctuations in the gene frequency, and we possible reasons for this discrepancy are considered. must consider not the equilibrium gene frequency P but the equilibrium distribution of gene frequencies A POPULATION GENETIC MODEL OF f(p)with Expected value P. A classical result in pop- SYNONYMOUSCODON USAGE ulation genetics (WRIGHT193 1; CROWand KIMURA 1970) is that KIMURA(1 98 1, 1983) considered a model of stabi- lizing selection on codon usage in which there is an f@) a esPpv”(l - p)”” (4) optimal state of highest fitness when the relative fre- where quencies of synonymous codons exactly match those of the cognate tRNAs. The rationale of this model is S = 2Nes unclear.It is truethat codon usage will alterthe V = 2Nev relative abundanceof cognate charged tRNAs, butit (5) isunlikely thatunder physiological conditions the U = 2Neu tRNA with greatesttotal abundance (charged, un- Ne = effective population size. charged and bound to mRNA) will also have greatest abundance in free, charged form and will therefore If U + V is large, the distribution will be clustered translate fastest regardless of codon usage. This model about the deterministic equilibrium in Equation 3. If also fails to take into accountdifferences between U + V is small, the population is likely to be at or near codons recognized by the same tRNA, and there is one of the boundaries so thatthe expectedgene some weakness in the link between its verbal formu- frequency is the probability of being near 1 rather lation and its mathematical development (LI 1987). than 0, given by An alternative model will be adopted herein which P = esV/(esV + u). (6) there is a unique optimal codon for each amino acid, the occurrence of other synonymous codons being To prove this result we use the fact (CROWand KI- Synonymous Codon Usage 899

MURA 1970, p. 426) that theprobability of fixation of quencies Pi satisfy the set of simultaneous quadratic a newly arisen mutant with small selective advantage equations analogous to Equation 3: s is approximately Pi(C SF^ - si) + 1 UjSj - Pix ~9 = 0, (9) j j 4(S) = 2S/N(1 - e-') (7) j i=l, where Nis the actual populationsize. If the probability ...K. of being fixed at 1 is P, the numberof new BZ mutants In afinite population inwhich the Uys are small arising per generation in B1 populations which are enough that there is little polymorphism, the fixation ultimately fixed is NuP$(-S); likewise the number of probabilities Pi satisfy the set of simultaneous linear new B1 mutants arising per generation in BZ popula- equations analogous to Equation 6: tions which are ultimately fixed is Nv(1 - P)4(S). At (si - sj)(U$iexp Si - UjiPjexp Sj)/ equilibriumthese flux rates areequal, leading to j#i Equation 6. (10) (exp Si- exp Sj) = 0, i = 1, . . . K. In conclusion, in a large population (U + V >> l), we expect to see polymorphism at every codon posi- tion, with a fraction of P B1 codons and (1 - P) B:! SELECTIVE FORCES ACTINGON SYNONYMOUS codons, with P given by Equation 3; in a small popu- CODON USAGE lation (U + V << 1) we expect to see monomorphism with afraction P of the relevant positions mono- Selection might operateon synonymous codon morphic for B1 and (1 - P) for Bz, with P given by usage through its effect on mRNA or DNA secondary Equation 6. structure.For example, the secondary structure of We have so far only considered selection acting at mRNA can affect both its rate of degradation (KEN- a single site, whereas we need to consider selection NELL 1986) and the rate of initiation of translation operating simultaneously on a large number of syn- (TOMICHet al. 1989; GOLD1988). However, it is not onymous sites which may be tightly linked(either obvious that this type of selection would consistently because they are in the same gene, or because the favor one synonymous codonover another; a sug- whole genome is tightly linked in a clonal organism gestion that MS2 phage has biased its codon usage like E. coli). If selection acts in a multiplicative way in such a way as to achieve optimal secondary struc- (that is to say, if total fitness is the productof fitnesses ture (HASEGAWA,YASUNAGA and MIYATA 1979) has at individual sites), no linkage disequilibrium is gen- proved on reanalysis of the data to be unfounded erated in a deterministic model so that the sites do (BULMER1989). Selection on secondary structure will not interfere with one another and it is legitimate to therefore be ignored as a major force acting on syn- treatthem separately. However, simulations by LI onymous codon usage, and attention will be concen- (1 98'7) showthat this may not hold for a largenumber trated on factors acting directly on the efficiency of of very tightly linked sites in a finite population. The translation through affecting either the speed or the problem seems to arise from theHill-Robertson effect accuracy of translation. Attentionwill be concentrated (HILLand ROBERTSON1966; FELSENSTEIN1974) on E. coli because it has been so intensively studied, whereby the existence of background variability be- but the arguments are sufficiently general that they tween lines in an asexual population increases the can be extended to all unicellular organisms (prokar- variance of the distribution of offspring number and yote or eukaryote). We shall first consider how differ- thus decreases the effective population size and the ences in speed of translation between synonymous co- effectiveness of selection. However, this factor will be dons might affectfitness before discussing the possible implicitly allowed for in the present context by esti- effects of differences in accuracy of translation. matingthe effective population size empirically, so Speed of translation that it seems legitimate to ignore it and to treat the Codonusage affects of translation: It is problem with a single site model. speed plausible that codon usage affects speed of translation The model can be extended to families of more if a codon recognized by a rare tRNA takes longer to than two synonymous codons, though the results are translate than one recognized by a common tRNA. more complicated. Suppose that there are K alleles Several lines of evidence suggest that this is the case. labeled B1, Bz, . ., BK, that the fitness of Bi relative . (1) attenuation in, for example, theleu to BI is 1 and that the mutation rate from Bi to - si operon depends on the fact that the ofrate translation Bj is uy. In a finite population write of a group of leucine codons in a critical position in Si = 2N2, the leader decreases as aminoacylated Leu-tRNALe" (8) becomes rare (LANDICKand YANOFSKY1987; YAN- U..'I = 2N,ug. OFSKY 1988). Both the likely mechanism of attenua- In an infinite population the equilibrium gene fre- tion and its experimental modulationby changing rare 900 M.Bulmer to synonymous common codons (CARTER,BARTKUS proportional to the absolute abundance of the cognate and CALVO1986; HARMSand UMBARGER1987; tRNA. This modelpresupposes that tRNA abun- BONEKAMPet al. 1985) provide strong evidence of a dances are sufficiently low that are not significanteffect of charged tRNA abundance on saturated by the cognate tRNA. In vitro studies of the speed of translation. (2) VARENNEet al. (1984) found kinetics of translation support this view (ANDERSON from studies of incomplete polypeptide chains that et al. 1986; ANDERSON and KURLAND1990). This their length distribution is not uniform, and they model can also explain differences in translation rate inferred from the pattern of this distribution that the between codons recognized by the same tRNA, since ribosome pausesat sites corresponding to rarecodons. a codon with a weaker affinity for the anticodon will (3) SORENSEN,KURLAND and PEDERSEN(1989) have have a higher K, and a lower translation rate. measured elongation rate directly in vivo, and have How does speed of translation affect fitness?It is found, by inserting a short string of either common tempting to suppose that a change in therate of or predominantly rare codons into a gene, that rare translation (the elongation rate) in a particular mRNA codons are translated at an average rate of only 2.1 will cause a corresponding change in the rate of pro- per second compared with 12 per second forthe tein production by that mRNA; but this is only the common codons. In a similar experiment, SORENSEN case if elongation rather than initiation is the rate- and PEDERSEN(1989) have found thatstrings of GAA, limiting step in protein synthesis. To understand this a major codon for Glu, are translated three times point, observe that the rate of protein production by faster than strings of GAG, whichis recognized by the messengers of a certain type is equal to the rate of same tRNA but which is uncommon, particularly in termination on those messengers, which must at equi- this context (SHPAER1986). Thus it may be that speed librium equal the rate of initiation. A change in the of translation determines codon usage between co- elongation rate will only lead to a change in the rate dons recognized by the same tRNA as well as between of protein synthesis if it also leads to a change in the those recognized by different tRNAs. rate of initiation. There are two views about how tRNA abundance If ribosomes are so numerous that a free ribosome affects elongation rate. They both assume that acti- is able to bind to theinitiation site assoon as it is freed vated tRNAs diffuse passively within the cell until a by the movement of the preceeding ribosome, the cognate tRNA is sufficiently close to an open A site polysome will be a continuous queue of ribosomes to be recognized. Under the first model each tRNA traveling down the messenger in a solid block. In this within a critical distance of the A site is tested, non- case elongation is the rate-limiting factor in protein cognate ones being rejected, until the first cognate synthesis. An increase in the elongation rate, particu- tRNA is encountered and accepted. It is envisaged larly of the slowest codons, will increase the rate at that the time between the rejection of a noncognate which the queue of ribosomes travels down the mes- tRNA and the arrival of the next tRNA to be tested senger, thus increasing the rateof initiation and hence is negligible, but that an appreciable time is taken to the rate of protein production. On the other hand, if test each tRNA, during which access to the A site by ribosomes are not saturating so that a free binding other tRNAs is blocked (GOUYand GRANTHAM 1980; site waits an appreciable time before a ribosome binds GOUYand GAUTIER1982). Under this model it is not to it, the polysome will be a free-flowing stream of the absolute abundance of the cognate tRNA that ribosomeswhich do not interfere witheach other determines how long a codon takes to be recognized because there are gaps between them. Inthis case but its relative abundance compared with that of all initiation is rate-limiting. Increasing the elongation theother noncognate tRNAs. BILGIN, EHRENBERG rate will not directly increase the initiation rate when and KURLAND(1 988) have tested this model by com- there is no queue of ribosomes back to the initiation paring the elongation rate of polyU into polyphe by site; the ribosomes will travel faster down the messen- the cognate tRNA in an in vitro system in the absence ger, but there will be fewer of them on the messenger and in the presence of a noncognate tRNA. The at a time with larger gaps between them. However, addition of anexcess of noncognate tRNA had a this decrease in the polysome size will indirectly in- negligible effecton the elongation rate. Itis concluded crease therate of initiation on all messengers by that noncognate tRNAs do not inhibit translation by increasing the poolof free ribosomes, and so will the cognate tRNA, contrary to the premise of the increase slightly the rate of protein production of all above model. . Under the second model the time taken to reject Severallines of evidencesuggest that initiation noncognate tRNAs is negligible, as suggested by the rather than elongation is normally rate-limiting. (1) above experiment, but thetime taken until the arrival Since ribosomes form the largest part of the protein of the first cognate tRNA in the vicinity of the A site translational machinery, it would be inefficient to is the rate-limiting factor. This time will be inversely saturate the system with them. (2) In E. coli polysomes Synonymous Codon Usage 90 1 there areabout 225 bases per ribosome in a polysome where t* is the step time at the jth codon. The total (INCRAHAM, MAALOE and NEIDHARDT1983, p. 286), translation time in the absence of queueing is while ribosome-protection studies show that the ribo- Si some covers about 30 bases (KOZAK 1983), so that tTi = tij (13) there is an average length of about 195 unoccupied j- 1 bases between ribosomes. (3) Simulations of models if there arealtogether Si codons in the ith gene. of protein synthesis with realistic parameter values In a steady state indicate that initiation is rate-limiting (VON HEIJNE, BLOMBERCand LILJENSTROM1987), though HARLEY Rbi = tTi/tIi (14) et al. (1981) show that elongation can become rate- and the rateof synthesis of the ith protein is limiting if it becomes very slow due to amino acid Pi = mi/tIi. (15) starvation. (4) Experimental manipulation of codon usagehas little effect on protein synthesis except Suppose that the step time of the jth codon in the kth under extreme conditions. For example, ROBINSONet mRNA changes because of a synonymous codon al. (1984)inserted four rare arginine codons (AGG) change. The change in the relative rate of synthesis or 4 common arginine codons (CGT) in a cluster into of the ith protein (which is a more convenient quantity the CAT gene. The made no difference to than the absolute rate of synthesis) is production of CAT protein at low expression levels, 1 butat veryhigh expression levels in a multicopy - dPi/atkj = a In P,/dtkj = -d In t,i/atkj. (16) plasmid the construct with four AGG codons pro- Pi duced substantially less CAT protein than either the From Equation 12 we cansee that two terms are CGT construct or the wild type gene. VARENNE and involved. The first term is the direct effect ofa change LAZDUNSKI(1986) have shown by a theoretical calcu- in the step time in the initiation region of the ith lation thatthe latter effect is dueto the tandem mRNA on the production of its own protein because arrangement of the rarecodons, which at high expres- of its effect on the frequency of initiations: sion levels sequester the cognate tRNA in the P-site and so make the translation of these codons rate- d In Pi/dtkj (direct) = -t;’ if k = i, j IL. (1 7) limiting. Theytherefore predicted that the effect The negative sign shows that an increase in the step would disappear if the AGG codons were scattered time leads directly to a decrease in protein synthesis, throughout the gene rather than clustered; this pre- as expected. The second term is the indirect effect of diction has been confirmed experimentally (VARENNE a change in the step time on the production of all et al. 1989). proteins through changing the number of free ribo- A model for the effect of speed of translation on somes (see Equations 1 1 and 14), which can be found fitness: LILJENSTROMand VON HEIJNE(1987) have by implicit differentiation: proposed a deterministic model ofthe effect of codon usage on translation time when initiation is rate-lim- iting, which will be developed further here. Suppose that there are mi mRNA molecules transcribed from with the ith gene and that Rbi is the average number of ribosomes bound to an mRNA of this type (the poly- somesize). If R,, and Rf are the total number of ribosomes and the numberof free ribosomes, then The negative sign outside the initiation region (j> L) means that an increase in the step time leads indirectly R,,, = 2 m,Rbi + Rp (1 1) i toa decrease insynthesis of all proteins through decreasing the abundance of free ribosomes. On the The time interval, tIi, between two initiations on a otherhand, the positivesign within the initiation given mRNA of this type is the sum of the time it region (j d L) if the polysomesize exceeds unity takes for the newly bound ribosome to move away means that an increase in the step time leads indirectly from the initiation region and of the time it takes for to an increasein synthesis of all proteins through a free ribosome to bind to the messenger once it is increasing the abundance of free ribosomes; this hap- unblocked.If we assume that the ribosomeblocks pens because there are fewer initiations on the kth further initiation until it has translated L codons and type of messenger so that its polysome sizeis reduced. that kIi is the rate constant for the second process, To interpret these results in terms of relative fitness L (w), suppose that ai has the samevalue a for all tr; = tij + (k,iRf)” (12) mRNAs so that the indirect effect is the same on the j- 1 relative production of all proteins. In unicellular or- 902 M.Bulmer ganisms such as E. coli and yeast under optimal growth where p = Ph/Ptot is the relative abundance of the conditions, fitness, growth rate and rate of protein protein (as a proportion of total protein production). synthesis are nearly synonymous. Thus the change in Finally, we calculate the selective disadvantage of a relative protein synthesis and hence the change in codonoutside the initiation regiontranslated by a relative fitness resulting from a change in step time rare tRNA which slows down the rate of translation outside the initiation region is sixfold (SORENSEN,KURLAND and PEDERSEN1989) so that the step time is increased by 0.28 sec from 0.05 dw/atkj = -&k. (20) to 0.33 sec; this would produce a selective disadvan- We shall nowtry to obtain an approximateestimate tage s of about of (Y for E. coli. Substituting average values for the gene-specific constants in Equation 19, s = 0.033 X 0.28~= 0.01~. (30) It has been assumed in this model that the total = 1/[Rj%h Rbt~pt~t], (21) + number of ribosomes and the tRNA concentrations where Ptotis the total rate of protein production. We remain fixed. It would be advantageous for a cell to shall use data for cells in balanced growth at 37" in respond to the introductionof a rare codon by increas- glucose minimal medium with a mass doubling time ing the concentration of the cognate tRNA and the of 40 min given by INGRAHAM,MAAL0E and NEID- total number of ribosomes to offset the reduction in HARDT (1983, pp. 3, 286 and 289) to estimate the the rate of protein synthesis. Calculations by 0. G. parameters in this equation. The number of protein BERG(personal communication) show that, if this re- molecules per cell is 2.36 X lo6, whence the rate of sponse of the protein-synthetic machinery to maintain protein synthesis given a doubling timeof 40 min is optimalperformance can bemade inphysiological time by demand-regulation, it will substantially reduce P,,, = 2.36 X lo6 X In 2/(40 X 60) (22) the selection pressure against rare codons. This com- = 680 molecules per sec. plication will not, however, be pursued further here. The number of ribosomes per cell is 18,700 of which We turn now to the change in fitness for a codon about 85% are bound, so that within the initiation region. This is the sum of two components: R, = 0.15 X 18,700 = 2,800.(23) dw/dtkj = (Rbk - 1)dk - t,'dw/d In Pk. (31) The number of messengers per cell is 1380, so that from Equation 15 The term dw/d In Pk is the change in fitness induced by a change in the production of a single protein. It tI = 1380/680 = 2.0 sec. (24) is likely to be small because changing the activity of There are about400 codons per gene andcodons are one enzyme in a complex metabolic pathway usually translated at a rate of about 18 per second, so that has only a small effect on the flux through that path- way (KACSERand BURNS1973). For example, DEAN, t~ = 400/18 =(25) 22, DYKHUIZENand HARTL(1986) showed that a small and from Equation 14, increase in &galactosidase activity in E. coli in che- mostat cultures with limiting lactose caused an in- Rb = 22/2 = 11. (26) crease in fitness one hundred fold smaller; and they The time taken for the ribosome to travel through remark that selection under natural conditions must the initiation region (the first term on the right hand be much less intense than this because E. coli inhabits side of Equation 12) may be estimated as about 1 sec an environmentwith many alternative sources of car- by putting L = 10 (see earlier) and making some bon and energy. allowance for the fact that translation is slower in this If initiation is rate-limiting one therefore expects to region than in the rest of the gene; hence see different codon usage behavior in the initiation region and in the remainder of the gene in prokar- kIRf = 1 sec. (27) yotes in whom the ribosome binds to an initiation Substitutingthese values into Equation 21 we find region which covers the beginning of the coding re- that gion, as assumed in the abovetheoretical analysis. However, in eukaryotes the ribosome binds to thecap (Y = O.033/Pt,. (28) site at the 5' end of the mRNA and then migrates From Equation 20 the change in relative fitness re- down to the AUG start codon (KOZAK1983); In' con- sulting from a change in step time outside the initia- sequencenone of the codingregion is within the tion region is initiation region as envisaged above so that the effect of translation time on fitness is given by Equation 20 dw/atk, = -0.033~(29) for all codons. Synonymous Codon Synonymous Usage 903

Accuracy of translation this cannot be achievedand thereis a tradeoff between maintaining a high probabilityof correcting a mistake, A differential error rate between synonymous co- say 4, > 0.99 for j > k, and a low probability of dons is the other major selective force likely to affect incorrectly rejecting the cognate tRNA, say QI<0.05 codon usage. The accuracy of ribosomal translation (EHRENBERG,KURLAND and BLOMBERG1986). In con- hasbeen reviewed by BUCKINGHAMand GROSJEAN sequence the error rate afterproofreading is reduced (1986), Kurland and BLOMBERG(1 986) in KIRKWOOD, to about at the costof rejecting, say, 5% of ROSENBERGERand GALAS(1986). Selection on the cognate tRNAs, whichdecreases therate and in- ribosome occurs in two stages-an initial discrimina- creases the energetic cost of translation by the same tion of the ternary complex by the codon at the A amount. site, inwhich cognate tRNA is more likely to be The average number of trials before a tRNA is accepted than noncognate tRNA, followed by proof- finally accepted is reading in which noncognate tRNA is more likely to be rejected and released from the A site than cognate N = 1/c Pj(1 - Qj). tRNA. Write Pj for theprobability that initial discrim- i ination leads to acceptance of the jth tRNA by The probability that this tRNA belongs to the jth a particular codon, and Qjfor theprobability that this species is tRNA is rejected at the proofreading stage, with the 7rj = NPJ1 - Qj). (33) convention thatj = 1 denotes the cognate tRNA and j = 2 to k denote any noncognate but isoaccepting A perfectly accurate codon would have N = 1, cjcl7rj tRNAs, so that j > k denotes nonisoaccepting tRNAs = 1. Departures from either of these conditions im- whose acceptance would lead to an error in transla- pose a cost. The excess of N over 1 expresses the cost tion. of proofreading. The deficit of &k 7rj below 1 meas- The initial error rate, Pj (j > l), depends on the ures the frequency with which the wrong amino acid degree of mismatch between codon and anticodon. is incorporated. These costs will be discussed in turn. For example, the tRNALe"which recognizes the Leu The cost of proofreading: The term N - 1 meas- codons UUA and UUG is accepted initially by the ures the average number of rejections at proofreading Phe codon UUU about 3% of the timeinstead of before the final acceptance. Each rejection wastes both tRNAPhewhen the two tRNAs are equally abundant, time and energy. Since N - 1 is likely to be small (less whereas the tRNALe"which recognizes the Leu codon than 0. l), differences between synonymous codons in CUG is never accepted by UUU (RUUSALA,EHREN- time wasted by proofreading are likely to be small BERG and KURLAND1982). In the above example, the compared with differences in time taken for the initial error involved a mispairing in the wobble position, discrimination. Only the energy costs will be consid- but it can also involve mispairing thein first or second ered here. codon positions (BUCKINGHAMand GROSJEAN1986; Protein synthesis is very costly in energetic terms; MCPHERSON1988), though an error requiring a mis- this cost can be expressed as high energy phosphate pairing at two positions is probably too rare to be bonds(-P). Each bond requires 2 -P for observable. The error rate is also proportional to the activating aminoacyl-tRNA, 1 -P for the EF-Tu me- relative abundance of the noncognate compared with diated transfer of the aminoacyl-tRNA to the ribo- the cognate tRNA, as expected for a first order reac- somal A site and 1 -P for the EF-G mediated trans- tion (BILGIN,EHRENBERG and KURLAND1988). location of peptidyl-tRNA to the P site. A total of 4 KATO (1990) has suggested that the preference for -P are thus required for eachaminoacyl residue a codon-anticodon bond of intermediate strength added, of which 3 have already been used if the tRNA (GROSJEANand FIERS1982) is due to avoidance of is rejected at the proofreading stage. The energy codons with a high error frequency. For example, in wastedby proofreading by a codon in a gene with yeast the Gly codon GGU is strongly preferred in relative expression p, expressed asa proportion of the highly expressed genes over GGC. These codons are total energy used in protein synthesis, is recognized by the abundant tRNAG'y with anticodon GCC (with G in the third position) which is expected E = 0.75(N - l)p/300 (34) to recognize GGC better than GGU. KATO (1990) where 300 represents the average number of codons suggests that this factor, which would lead to quicker in a gene. (EHRENBERGet al. (1990) suggest that 2 -P translation of GGCby the cognate tRNA, is more are used in the EF-Tu mediated step, in which case than offset by the fact that GGC is much more likely the factor 0.75 would be increased to 0.8.) than GGU to be wrongly recognized by the abundant INGRAHAM, MAALBE and NEIDHARDT(1 983) calcu- tRNAAsP withanticodon GUC. late that protein synthesis accounts for about one half In an ideal world, proofreading would always cor- of the energy requirement for growth of E. coli in rect a mistake (QJ= 1 for j > k). In the real world, minimal medium, and for nearly all the energy re- 904 M. Bulmer quirementfor growth in richmedium, so that E TABLE 1 should be multiplied by a factor between 0.5 and 1 to The proportion of codons with Y in the third position (P)and express it as a proportion of the total requirement for the estimatedselective advantage (S = 2Ns = In P/(1 - P))in growth. two four-codon families in E. coli A small change in the rate of energy consumption might affect fitness in two ways. First, energy might Ribosomal protein genes aa-tRNAgenes be the limiting factor for growth rate. It has been suggested by ANDERSENand VONMEYENBURC (1 980) Codon family Frequency P S Frequency P S that energy is the limiting factor for growthrate in E. Thr(ACN) 300 0.91 2.3 194 0.79 1.3 coli in aerobic respiration because of the restriction of Gly (GGN) 467 0.98 3.8 287 0.91 2.3 respiration to the cytoplasmic membrane; this expla- nation would be limited to aerobic respiration in pro- with the fact that the ribosome binds to the cap site karyotes. Second, in conditions of clonal competition upstream of the start codon in eukaryotes so that no found in unicellular organisms, the major component difference between the beginning and the rest of the of fitness might be notthe growth rate but thegrowth gene is predicted. However, this phenomenon is not yieldin a particular medium, which is likely to be found in all prokaryotes, since SHARPet al. (1990) strongly correlated with energetic efficiency; if clone failed to find it in Bacillus subtilis. It is possible that A wastes 1% more energy than clone B and if they accuracy of translation is a more important selective are growing separately on identical media, then clone force than speed of translation in B. subtilis, affecting A will have 1% fewer individuals than clone B when fitness either directly or through energetic costs, in the carbon content of the medium is exhausted. In which case the initiation region is subject to the same this case the selective disadvantage of a rare compared selective force as the rest of the gene. with a common codon can be calculated from Equa- Let us provisionally assume that speed of translation tion 34 if we substitute the difference in the average is the dominant selective force in E. coli, so that the number of rejections at proofreading between them selective disadvantage of a rare codon is about 0.01 p for (N - 1). Taking this difference as 0.4,for example, (Equation 2 l), andinvestigate whether the frequency gives the selective disadvantage as s = 0.00 1p. It seems unlikely that the difference is larger than 0.4, so that of such codons is correctly predictedby the population 0.00 1p is an upperlimit on theselective disadvantage. genetic theory described above. Ribosomal proteins (apart from L7/L12)are each thought tobe produced The cost of translational errors: The term T, meas- ures the probability that the tRNA finally accepted in the same amount, slightly less than 1 per cent of belongs to the jth species, which will result in the totalprotein production (INGRAHAM, MAALOE and incorporation of an incorrect aminoacid ifj> k. The NEIDHARDT1983); the selective disadvantage of a effect of such an error on fitness will depend on the ribosomal protein codon outside the initiation region nature of the error (how similar is the amino acid translated by a raretRNA is thusabout Ami- incorporated to the correct one in chemical terms?), noacyl-tRNA synthetases are also each produced in on the natureof the site within the protein (some sites similar amounts, about one tenthof that of ribosomal are more highly conserved than others,which suggests proteins (PEDERSENet al. 1978), so that the compa- that they are more sensitive to change) and on the rabledisadvantage should be about lop5.Table 1 nature of the protein (some proteins are more highly shows the usage of Y-ending codonsfor the four- conserved than others, which suggests that mistakes codon families threonine (ACN) and glycine (GGN) anywhere within them are less easilytolerated). There for 45 ribosomal protein genes (excluding L7/L12) is no a priori reason to expect the effect of an error and for eight aminoacyl-tRNA synthetase genes, ex- on fitness to be proportional to the degreeof expres- cluding the first 15codons of each gene.In both sion of the protein, but one would expect it to be families the Y-ending codons (ACY and GGY) are inversely proportional to the rate of of the recognized by abundant tRNAs andthe R-ending protein. codons (ACR and GGR) by rare tRNAs. These two families were chosen for study because they are unique FACING THE FACTS in showing such aclear-cut pattern of recognition Highly expressed E. coli genes are less biased in (Table 3 in BULMER1988a). The quantity S = 2Nes their codon usage at the beginning than in the rest of defined in Equation 5 has been estimated from Equa- the gene (BULMER1988b). This is consistent with the tion 6 as model developed here for the effect of speed of trans- S = In P/(1 - P) (35) lation on fitness, which predicts a difference between the initiation region and the rest of the gene in pro- on the assumption that backward and forward muta- karyotes. This effect was not foundin yeast, consistent tion rates are equal when only (between Synonymous Codon Usage 905 R and Y)are counted as and transitions are there may becounter- pressures ignored. arisingfrom DNA or mRNA secondary structure The results in Table 1 are difficult to reconcile with which in particular positions may tip the balance of the theory developed in this paper. Silent site poly- selective advantage against the most efficiently trans- morphism is rare in E. coli (SAWYER, DYKHUIZENand latedcodon. It has been suggested that a second HARTL1987), justifying the use of Equation 6, and ribosome binding site exists at the beginning of the its magnitude together with an estimated mutation coding region in E. coli (PETERSEN,STOCKWELL and rate of about 10"' (DRAKE1974) suggests an effective HILL 1988; SPRENGART,FALSCHER and FUCHS1990), population size of about 1 O9 (see also SELANDER, CAW- so that this regionmight be subject to conflicting CANT and WHITTAM1987). Thus S is predicted to be selection pressures; this would be an alternative expla- about lo5for ribosomal protein genes and about lo4 nation of the lower codon usage bias in this region. for amino acid (aa)-tRNA synthetase genes, several Base changes anywherein the genemay affect mRNA orders of magnitude larger than the estimated values; structure in such a way as to affect transcription or in other words no codons at all recognized by rare mRNA stability, and may thus set up a conflicting tRNAs should be observed in a sample of the size of selection pressure. Under this explanation rare codons Table 1. To reconcile the estimated values of S with exist because they have some beneficial effect on DNA the predicted values would require an effective pop- or mRNA structure;an experimentalprediction is ulation size of lo5 rather than 10'. It should also be that changingthem to commoncodons should in- noted that the estimated values of S are only twice as crease the translation rate but lower fitness. large for ribosomal protein as for aa-tRNA synthetase The third explanation for the discrepancy is that genes, rather than ten times as large, as predicted if the populationgenetic model describedabove is they are proportional top. grossly inadequate because it fails to take into account Three possible reasons for this discrepancy will now the genetic structure of clonal organisms, in particular be discussed. First, the calculated selection coefficient the importance of periodical selection whereby a fa- may be wrong. An attempt to measure experimentally vorable new mutant periodically sweeps through the the protein burden, the cost in terms of fitness of population carrying with it associated alleles by hitch- manufacturing a usefuless protein, suggests that it is hiking (LEVIN1981 ; MILKMANand BRIDGES1990). substantially less than the fraction of waste protein Further work on this problem is required. synthesis (KOCH 1983), thoughthe reason for this A similar model has been developed for the evolu- discrepancy is notunderstood. The possibility that tion of gene-regulatorybinding sites (0. G. BERG, demand-regulation of the protein-synthetic machin- unpublished results). Binding sites for a gene-regula- ery to maintain optimal efficiency may bufferthe tory protein exhibitsome variation in DNA sequence impact of codon usage changes on fitness has already which affects their binding strength.The same biolog- been discussed. There is also doubt about therelative ical function can be achieved with a strong binding contributions to fitness of (1) the effect of speed of site and a small amount of protein as with a weak translation on growth rate, which might be the dom- binding site and a large amount of protein, but the inant factor in optimal growth conditions and would weak binding site will be at a selective disadvantage generate a selection coefficient against a rare codon because of the associated protein burden. The pre- dicted selection pressure and the expected distribu- of about 0.01~(where p is the relative abundance of the protein); (2) the effect of accuracy of translation tion of binding sites under the jointaction of selection, on energy costs which might be the dominant factor mutation and drift were evaluated. Theoretical pre- dictions are in good agreement with data on E. coli in organisms competing with each other in poor binding sites provided that the effective population growthconditions and would generatea selection size is taken to be about lo4 or 1 05, which seems coefficient rather less than 0.001p; and (3) the direct unreasonably low. This presents the same paradox as effect of accuracy of translation onerrors in the that encountered here in developing an evolutionary protein product, whose effect on fitness it is more explanation of synonymous codon usage. difficult to predict. Nevertheless, it is difficult to be- lieve that the selection coefficients calculated above I thank OTTO BERG,TOM KIRKWOOD, CHUCK KURLAND, PAUL are several ordersof magnitude too large. These SHARP andMONTY SLATKIN for valuable comments and suggestions. selection coefficients could be estimated experimen- tally in defined growth conditionsin chernostat exper- LITERATURE CITED iments by competing strains with a string of either ANDERSEN,K. B.,and K. VON MEYENBURG,1980 Are growth common or rare codons inserted into a gene, asin the rates of Escherichia coli in batch cultures limited by respiration? experiments of SORENSEN,KURLAND and PEDERSEN J. Bacteriol. 144: 114-123. ANDERSSON,D. I., H. W. VANVERSEVELD, A. H. STOUTHAMERand (1 989) on elongation rate discussed above. C. G. KURLAND, 1986 Suboptimal growth with hyperaccurate A second explanation for the discrepancy is that ribosomes. Arch. Microbiol. 144: 96-101. 906 M. Bulmer

ANDERSON,S. G.E., and C.G. KURLAND,1990 Codon prefer- prokaryotic genes: the optimal codon-anticodon interaction ences in freeliving microorganisms. Microbiol. Rev. 54 198- energy and the selective codon usage in efficiently expressed 210. genes. Gene 18: 199-209. BILGIN,N., M. EHRENBERGand C. G. KURLAND, 1988Is transla- HARLEY,C. B., J. W. POLLARD,C. P. STANNERSand S. GOLDSTEIN, tion inhibited by noncognate ternary complexes? FEBS Lett. 1981 Model for messenger RNA translation during amino 233: 95-99. acid starvation applied to the calculation of protein synthetic BONEKAMP,F., H. D. ANDERSEN,T. CHRISTENSENand K. F. JENSEN, error rates. J. Biol. Chem. 256 10786-10793. 1985 Codondefined ribosomal pausing in Escherichia coli de- HARMS,E., and H. E. UMBARGER,1987 Role of codon choice in tected by using the pyrE attenuator to probethe coupling the leader region of the ilvGMEDA operon of Serratia marces- between transcription and translation. Nucleic Acids Res. 13: cens. J. Bacteriol. 169 5668-5677. 4113-4123. HASEGAWA,M., T. YASUNAGAand T. MIYATA,1979 Secondary BUCKINGHAM,R. H., andH. GROSJEAN,1986 The accuracy of structure of MS2 phage RNA and bias in code word usage. mRNA-tRNA recognition, pp. 83-126 in Accuracy in Molecular Nucleic Acids Res. 7: 2073-2079. Processes, edited by T. B. L. KIRKWOOD,R. F. ROSENBERGER HILL, W. G., and A. ROBERTSON,1966 The effect of linkage on and D. J. GALAS.Chapman & Hall, London. limits to artificial selection. Genet. Res. 8: 269-294. BULMER,M., 1987 Coevolution of codon usage and transfer RNA HINDS,P. W., and R. D. BLAKE,1985 Delineation of coding areas abundance. Nature 325: 728-730. in DNA sequences through assignment of codon probabilities. BULMER,M., 1988a Are codon usage patterns in unicellular or- J. Biomol. Struct. Dyn. 3: 543-549. ganisms determined by selection mutation balance? J. Evol. IKEMURA, T., 1985Codon usage and tRNA content in unicellular Biol. 1: 15-26. and multicellular organisms. Mol. Biol. Evol. 2: 13-34. BULMER,M., 1988b Codon usage and intragenic position. J. INGRAHAM,J, L., 0.MAAL~E and F. C. NEIDHARDT,1983 Growth Theor. Biol. 133: 67-71. of the Bacterial Cell. Sinauer, Sunderland, Mass. BULMER,M., 1989 Codon usage and secondary structure of MS2 KACSER, H., and J. A. BURNS,1973 The control of flux. Symp. phage RNA. Nucleic Acids Res. 17: 1839-1843. SOC. Exp. Biol.27: 65-104. BULMER,M., 1990 The effect of context on synonymous codon KATO, M., 1990 Codon discrimination due to presence of abun- usage in genes with low codon usage bias. Nucleic Acids Res. dant non-cognate competitive tRNA. J. Theor. Biol. 142: 35- 18: 2869-2873. 39. CARTER,P. W., J. M. BARTKUSand J. M. CALVO,1986 Transcrip- KENNELL,D. E., 1986 The instability of messenger RNA in bac- tion attenuation in Salmonella typhimurium: the significance of teria, pp. 10 1-142, in Maximizing Gene Expression, edited by rare leucine codons in the leu leader. Proc. Natl. Acad. Sci. W. REZNIKOFFand L. COLD. Butterworth, Boston. USA 83: 8127-8131. KIMURA,M., 1981 Possibility of extensive neutral evolution under CROW,J. F., and M. KIMURA,1970 An Introduction to Population with specialreference to nonrandom usage Genetics Theory. Harper & Row, New York. of synonymous codons. Proc. Natl. Acad. Sci. USA 78: 5773- DE BOER,H. A,, and R. A. KASTELEIN,1986 Biased codon usage: 5777. an exploration of its role in optimization of translation, pp. KIMURA,M., 1983 TheNeutral Theory of MolecularEvolution. 225-285 in Maximizing Gene Expression, edited by W. REZNI- Cambridge University Press, Cambridge. KOFF and L. GOLD.Butterworth, Boston. KIRKWOOD,T. B.L., R. F. ROSENBERGERand D. J. GALAS, DEAN,A. M.,D. E. DYKHUIZENand D. L. HARTL, 1986 Fitness 1986 Accuracy in Moleculas Processes. Chapman & Hall, Lon- as a function of @-galactosidaseactivity in Escherichia coli. Genet. don. Res. 48 1-8. KOCH, A. L., 1983 The protein burden of products. J. DRAKE,J. W., 1974 The role of mutation in microbial evolution. Mol. EvoI. 19 455-462. Symp. SOC.Gen. Microbiol. 24 41-58. KONIGSBERG,W. J. N., and G. N. GODSON,1983 Evidence for use EHRENBERG,M., C. G. KURLANDand C. BLOMBERG,1986 Kinetic of rarecodons in the dnaG gene and otherregulatory genes of costs of accuracy in translation, pp. 329-361 in Accuracy in Escherichia coli. Proc. Natl. Acad. Sci. USA 80 687-691. Molecular Processes, edited by T. B.L. KIRKWOOD,R. F. Ro- KOZAK,M., 1983 Comparison of initiation of protein synthesis in SENBERGER and D. J. GALAS.Chapman & Hall, London. procaryotes, eucaryotes, and organelles. Microbiol. Rev. 47: 1- EHRENBERG,M., A". ROJAS,J. WEISERand C.G. KURLAND, 43. 1990Two EF-Tu's participate in aminoacyl-tRNA binding KURLAND,C. G., and J. A. GALLANT, 1986The secret life of the and peptide bond formation in E. coli translation. J. Mol. Biol. ribosome, pp. 127-157 in Accuracy in Molecular Processes, ed- 211: 739-749. ited by T. B. L. KIRKWOOD,R. F. ROSENBERGERand D. J. FELSENSTEIN,J., 1974 The evolutionary advantage of recombi- GALAS.Chapman & Hall, London. nation. Genetics 78: 737-756. LANDICK,R., and C. YANOFSKY,1987 Transcription attenuation, GOLD, L., 1988 Posttranscriptional regulatory mechanisms in pp. 1276-1307 in Escherichia coli and Salmonella typhimurium: Escherichia coli. Annu. Rev. Biochem., 57: 199-234. Cellularand Molecular , edited by F. C. NEIDHARDT. GOUY,M., and C. GAUTIER,1982 Codon usage in bacteria: cor- ASM Press, Washington, D.C. relation with gene expressivity. Nucleic Acids Res. 10 7055- LEVIN,B. R., 1981 Periodic selection, infectious gene exchange 7074. and the genetic structure of E. coli populations. Genetics 99 GOUY,M., and R. GRANTHAM, 1980Polypeptide elongation and 1-23. tRNA cycling in Escherichia coli: a dynamic approach. FEBS LI, W-H., 1987 Models of nearly neutral mutations with particular Lett. 115: 151-155. implications for nonrandom usage of synonymous codons. J. GRANTHAM,R., C. GAUTIER,M. COUY,R. MERCIER and A. PAVE, Mol. EvoI. 24 337-345. 1980 Codon catalog usage and the genome hypothesis. Nu- LILJENSTROM,H., and G. VONHEIJNE, 1987Translation rate cleic Acids Res. 8: r49-r62. modification by preferential codon usage: intragenic position GRANTHAM,R., C. GAUTIER,M. GOUY,M. JACOBZONE and R. effects. J. Theor. Biol. 124: 43-55. MERCIER,198 1 Codon catalog usage is agenome strategy MC~HERSON,D. T., 1988 Codon preference reflects rnistransh- modulated for gene expressivity. Nucleic Acids Res. 9: r43- tional constraints: a proposal. Nucleic Acids Res. 16: 41 11- r79. 4 120. GROSJEAN,H., and W. FIERS,1982 Preferential codon usagein MILKMAN,R., and M. M. BRIDGES,1990 Of Synonymous Codon Usage 907

the Escherichia coli chromosome. 111 Clonal frames. Genetics SHIELDS,D. C., and P. M. SHARP,1987 Synonymous codon usage 126 505-517. in Bacillus subtilis reflects both translational selection and mu- PEDERSEN,S., P. L. BLOCH,S. REEH and F. C. NEIDHARDT, tation biases. Nucleic Acids Res. 15: 8023-8040. 1978 Patterns of protein synthesis in E. coli: a catalog of the SHIELDS,D. C., P. M. SHARP,D. C. HIGGINSand F. WRIGHT, amount of 140 individual proteins at different growth rates. 1988 Silent sites in Drosophila genes are not neutral: evidence Cell 14: 179-190. of selection among synonymous codons. Mol.Biol. Evol. 5: PETERSEN,G. B., P. A. STOCKWELLand D. F. HILL, 1988 Messenger 704-7 16. RNA recognition in Escherichia coli: a possible second site of SHPAER,E. G., 1986 Constraints on codon context in Escherichia interaction with 16s ribosomal RNA. EMBO J. 7: 3957-3962. coli genes: their possible role in modulating the efficiency of ROBINSON,M., R. LILLEY,S. LITTLE,J. S. EMTAGE,G. YARRANTON, translation. J. Mol. Biol. 188: 555-564. P. STEPHENS,A. MILLICAN,M. EATONand G. HUMPHREYS, S~RENSEN,M. A,, C. G. KURLAND,and S. PEDERSEN,1989 Codon 1984 Codon usage can affect efficiency oftranslation of genes usage determines translation rate in Escherichia coli. J. Mol. in Escherichia coli. Nucleic Acids Res. 12: 6663-667 1. Biol. 207: 365-377. RUUSALA,T., M. EHRENBERGand C. G. KURLAND,1982 Is there SORENSEN,M. A., and S. PEDERSEN,1989 Abstracts, Thirteenth proofreading during polypeptide synthesis? EMBO J. 1: 741- International Transfer RNA Meeting, Vancouver. (Quoted in 745. ANDERSON and KURLAND,1990). SAWYER,S. A., D. E. DYKHUIZEN,and D. E. HARTL,1987 Confi- SPRENGART,M. L., H. P.FALSCHER and E. FUCHS,1990 The dence interval for thenumber of selectively neutral amino acid initiation of translation in E. coli: apparent base pairing between the 16srRNA and downstream sequences of the mRNA. Nu- polymorphisms. Proc. Natl. Acad. Sci. USA 84: 6225-6228. SELANDER,R. K.,D. A. CAUGANT,and T. S. WHITTAM,1987 cleic Acids Res. 18: 1719-1723. TOMICH,C-S., E. R. OLSON,M. K. OLSEN,P. S. KAYTES,S. K. Genetic structure and variation in natural populations of Esch- ROCKENBACHand N. T. HATZENBUHLER,1989 Effect of nu- erichiacoli, pp. 1625-1648 in Escherichia coli and Salmonella cleotide sequences directly downstream from the AUG on the typhimurium:Cellular and Molecular Biology, edited byF. C. expression of bovine somatotropin in E.coli. NucleicAcids NEIDHARDT.ASM Press, Washington, D.C. Res. 17: 3179-3197. SHARP,P. M., 1989 Evolution at ‘silent’ sites in DNA, pp. 23-32 VARENNE,S., and C. LAZDUNSKI,1986 Effect of distribution of in Evolutionand animal breeding: Reviews on Molecular and unfavourable codons on the maximum rate of gene expression Quantitative Approaches in Honour Alan Robertson, edited by of by an heterologous organism. J. Theor. Biol. 120 99-1 10. W. G. HILL,and T. F. C. MACKAY.CAB International, Wall- VARENNE,S., J. Buc, R. LLOUBESand C. LAZDUNSKI,1984 ingford, U.K. Translation is a non-uniform process. Effect of tRNA availa- SHARP,P. M., and W-H. LI, 1986a An evolutionary perspective bility on the rate of elongation of nascent polypeptide chains. on synonymous codon usage in unicellular organisms. J. Mol. J. Mol. Biol. 180: 549-576. EvoI. 24: 28-38. VARENNE,S., D. BATY,VERHEIJ., D. SHIREand C. LAZDUNSKI, SHARP,P. M., and W-H. LI, 1986b Codon usage in regulatory 1989 The maximum rate of gene expression is dependent on genes in Escherichia coli does not reflect selection for “rare” the downstream context of unfavourable codons. Biochimie codons. Nucleic Acids Res. 14 7737-7749. 71: 1221-1229. SHARP,P. M., and W-H. LI, 1987 The rate of synonymous sub- VON HEIJNE, G., C. BLOMBERGand H. LILJENSTROM,1987 stitution in enterobacterial genes is inversely related to codon Theoretical modelling of protein synthesis. J. Theor. Biol. 125: usage bias. Mol. Biol. Evol. 4 22-230. 1-14. SHARP,P. M., D. G. HIGGINS,D. C. SHIELDS, K. M. DEVINEand J. YANOFSKY,C., 1988 Transcriptionattenuation. J. Biol. Chem. A. HOCH,1990 Bacillus subtilis gene sequences, pp. 89-98 in 263: 609-612. Genetics and Biotechnology of Bacilli, vol. 3, edited by M. M. WALKER,J. E., M. SARASTEand N. J. GAY,1984 The unc operon, ZUKOSKI,A. T. GANESANand J. A. HOCH.Academic Press, San nucleotide sequence, regulation andstructure of ATP-syn- Diego. thase. Biochim. Biophys. Acta 768: 164-200. SHIELDS,D. C., 1990 Switches in species-specific codon prefer- WRIGHT,S., 1931 Evolution in Mendelian populations. Genetics ences: the influence of mutation biases. J. Mol. Evol. 31: 71- 1697-159. 80. Communicating editor: W-H. LI