The Genetic Code Is One in a Million

J Mol Evol (1998) 47:238–248 © Springer-Verlag New York Inc. 1998 The Genetic Code Is One in a Million Stephen J. Freeland,1 Laurence D. Hurst2 1 Department of Genetics, Downing Street, Cambridge CB2 3EH, UK 2 Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA27AY, UK Received: 25 July 1997 / Accepted: 9 January 1998 Abstract. Statistical and biochemical studies of the Key words: Genetic code — Error minimization — genetic code have found evidence of nonrandom patterns Mistranslation — Transition/transversion bias — Evolu- in the distribution of codon assignments. It has, for ex- tion — Natural selection ample, been shown that the code minimizes the effects of point mutation or mistranslation: erroneous codons are either synonymous or code for an amino acid with chemical properties very similar to those of the one that Introduction would have been present had the error not occurred. This work has suggested that the second base of codons is less The genetic code is not random. Wong (1975), for ex- efficient in this respect, by about three orders of magni- ample, has argued that the assignment of codons to tude, than the first and third bases. These results are amino acids was guided by the biosynthetic relationships based on the assumption that all forms of error at all between amino acids (but see Amirnovin 1997). Signifi- bases are equally likely. We extend this work to inves- cantly, then, it has been shown that codons specifying tigate (1) the effect of weighting transition errors differ- amino acids that share the same biochemical synthetic ently from transversion errors and (2) the effect of pathway tend to have the same first base (Taylor and weighting each base differently, depending on reported Coates 1989). Codons of the amino acid belonging to the mistranslation biases. We find that if the bias affects all shikimate, pyruvate, aspartate, and glutamate families codon positions equally, as might be expected were the tend to have U, G, A, and C in the first position, respec- code adapted to a mutational environment with transi- tively. tion/transversion bias, then any reasonable transition/ That amino acids in the same biochemical pathway transversion bias increases the relative efficiency of the are coded by related codons does not necessarily explain second base by an order of magnitude. In addition, if we the observation that amino acids that have similar phys- employ weightings to allow for biases in translation, then icochemical properties also have similar codons (Di Gi- only 1 in every million random alternative codes gener- ulio 1997). It was noted early on that the natural genetic ated is more efficient than the natural code. We thus code appeared to be arranged such that amino acids with conclude not only that the natural genetic code is ex- similar chemical properties are coded by similar codons. tremely efficient at minimizing the effects of errors, but This may well be the result of selection favoring those also that its structure reflects biases in these errors, as codes that minimized the average phenotypic effects of might be expected were the code the product of selection. single-point mutations or mistranslations (see, e.g., Alf- Steinberger 1969; Epstein 1966; Goldberg and Wittes 1966; Sonneborn 1965; Woese 1965, 1973; Woese et al. 1966). Correspondence to: S.J. Freeland; e-mail: [email protected] An attempt to quantify this effect by Haig and Hurst 239 (1991) found that of 10,000 randomly generated codes, The MS measure for any particular genetic code is calculated as only 2 performed better at minimizing the effects of er- four separate values: MS1, MS2, and MS3 correspond to all possible ror, when polar requirement was taken as the amino acid single-base substitutions in the first, second, and third codon positions, respectively, of all codons in a given genetic code; MS0 corresponds to property (see also Di Giulio 1989; Goldman 1993; Szath- all possible single base changes in all codon positions. mary and Zintzaras 1992). Were this due to the fact that biosynthetically related amino acids share the same first codon position, changes at the second and third base The Weighted Mean Square (WMS) Measure should account for nearly all the efficiency of the code, as these conserve first-base identity. However, first and In an extension to the methods of Haig and Hurst, MS values for each third bases are very efficient, while the second base code were calculated at 20 different weightings (weights of 1, 2, . 20) proved unexceptional (Haig and Hurst 1991). Put another of transition/transversion bias. Because the nature of the MS measure is to incorporate every possible (i.e., transition and transversion) mu- way, it is the second base that determines the polar re- tation of a given base, it was not possible to weight the probability of quirement of the amino acid. It may hence be concluded transitions as opposed to transversions, so our tests weighted the that the code is very efficient at minimizing the effects of squared difference in polar requirement resulting from transitions dif- errors, and this is probably the result of selection be- ferently from that resulting from transversion bias, thus turning the MS tween alternative codes, with selection favoring those measures into WMS measures. At a weighting of 1, all possible mutations are weighted equally that minimize the effects of errors on fitness. when calculating the MS values for each position of each codon. At a The above analysis (Haig and Hurst 1991) made no weighting of 2, the differences in amino acid attribute resulting from allowance for any biases in errors. All bases were as- transition error (i.e., U to C, C to U, A to G, or G to A) were weighted sumed to be equally prone to error and all forms of error twice as heavily as those resulting from transversion errors. For ex- at each base were assumed to be equally likely. How- ample, the possible errors used to calculate the MS2 value of codon UUU (Phe) are UCU (Ser), UAU (Tyr) and UGU (Cys), of which only ever, both mutation and mistranslation are biased pro- UCU (Ser) represents a transition error, Tyr and Cys resulting from cesses. Transition mutations tend to occur more fre- transversion errors. quently than transversion mutations (e.g., see Collins All MS calculations presented here consider a single attribute of 1994; Kumar 1996; Moriyama and Powell 1997; Morton amino acids, namely, the ‘‘polar requirement’’ (Woese et al. 1966), 1995). Likewise, mistranslation appears to have transi- which may be considered a measure of hydrophobicity: amino acids with a high polar requirement are more strongly hydrophobic than tion/translation biases as well as biases by codon position those with a low polar requirement. This particular measure was chosen (Friedman and Weinstein 1964; Parker 1989; Woese because previous work (Haig and Hurst 1991) on optimization of the 1965). If the code has evolved to minimize the effects of genetic code found it to give the most significant evidence of load either, then we might expect that if one allows for biases minimization from an array of four amino acid properties (also tested in the direction of mutation and/or mistranslation, the were hydropathy, molecular volume, and isoelectric point). natural code should, in terms of relative efficiency, per- form even better than randomly generated codes. Here, Rules for Forming Variant Genetic Codes then, we extend the previous analysis so as to ask how these biases effect the relative efficiency of the natural In general, MS values for a given transition/transversion bias were code. calculated for a large number of randomly generated variant genetic codes. Our criteria for creating plausible alternative codes are the same as those used by Haig and Hurst (1991). Methods 1. The ‘‘codon space’’ (i.e., the 64 possible codons) is divided into the 21 nonoverlapping sets of codons observed in the natural code, each The Mean Square (MS) Measure set comprising all codons specifying a particular amino acid in the natural code (20 sets for the amino acids and 1 set for the 3 stop codons). Work presented here uses the ‘‘mean square’’ (MS) measure (Haig and 2. Each alternative code is formed by randomly assigning each of the Hurst 1991) to quantify the relative efficiency of any given code. This 20 amino acids to one of these sets. All three stop codons remain measure calculates the mean squared change in an amino acid property invariant in position for all alternative codes. resulting from all possible changes to each base of all codons within a given code. Any one change is calculated as the squared difference between the amino acid coded for by the original codon and the amino The MS measures of each sample of codes generated by this process acid coded for by the new (mutated) codon. Synonymous changes are form a probability distribution against which the real code MS values included in the calculation, but changes to and from stop codons are may be compared. As noted by Haig and Hurst, variant codes produced ignored. by this method retain the level of redundancy inherent to the natural The use of a mean square value avoids a problem with negatives, code, i.e., this method controls for the redundancy inherent in the code. but it also, unavoidably, introduces a form of weighting of the relative These methods were incorporated into an ANSI ‘‘C’’ program importance of differences in chemical property. Many alternative which calculated MS values for all randomly generated codes over a weightings are imaginable. Code fitness might perhaps be a linear range of weightings for transition/transversion bias.

The Genetic Code Is One in a Million

Genetic Code

Dna the Code of Life Worksheet

Genetic Code: a New Understanding of Codon – Amino Acid Assignment

Basic Genetic Terms for Teachers

DNA : TACGCGTATACCGACATT Transcription Will Make Mrna From

Transcription Study Guide This Study Guide Is a Written Version of the Material You Have Seen Presented in the Transcription Unit

The Genomics Era: the Future of Genetics in Medicine - Glossary

Glossary/Index

"The" Genetic Code?

How Genes Work

The Universal Genetic Code, Protein Coding Genes and Genomes of All Species Were Optimized for Frameshift Tolerance Abstract

The Evolution of the Genetic Code: Impasses and Challenges