Nucleotide Base Coding and Amino Acid Replacements in Proteins* by Emil L. Smith
VOL. 48, 1962 BIOCHEMISTRY: E. L. SMITH 677

NUCLEOTIDE BASE CODING AND AMINO ACID REPLACEMENTS IN PROTEINS*
BY EMIL L. SMITH†
LABORATORY FOR STUDY OF HEREDITARY AND METABOLIC DISORDERS AND THE DEPARTMENTS OF BIOLOGICAL CHEMISTRY AND MEDICINE, UNIVERSITY OF UTAH COLLEGE OF MEDICINE
Communicated by Severo Ochoa, February 14, 1962

The problem of which bases of messenger or template RNA¹ specify the coding of amino acids in proteins has been largely elucidated by the use of synthetic polyribonucleotides.²⁻⁷ Given these triplet nucleotide compositions (Table 1), it is of interest to examine some of the presently known cases of amino acid substitution in polypeptides or proteins of known structure. The code appears to be universal, that is, the same in all species.⁶
It is assumed that a mutation involving the substitution of one amino acid for another in a protein of the same species, e.g., human hemoglobin, represents an alteration in the triplet code in which only a single base of the three is replaced, without alteration of the sequence of the other two bases. A change of two or more bases is less likely.
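As an illustration of this assumption, the minimal Python sketch below checks two of its consequences. The helper names (`hamming`, `shared_bases`, `single_substitutions`) are hypothetical, and because only base compositions, not sequences, were known in 1962, the ordered triplet "UAG" is an arbitrary example sequence. The sketch shows that every single-base substitution leaves exactly two of the three bases of the composition unchanged, and that putting the same composition into a different order requires at least two changed positions.

```python
from collections import Counter
from itertools import permutations

BASES = "UCAG"

def hamming(a: str, b: str) -> int:
    """Count the positions at which two ordered triplets differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_bases(a: str, b: str) -> int:
    """Count the bases two compositions have in common (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())

def single_substitutions(triplet: str) -> list:
    """List every ordered triplet reached by replacing exactly one base."""
    variants = []
    for i, old in enumerate(triplet):
        for new in BASES:
            if new != old:
                variants.append(triplet[:i] + new + triplet[i + 1:])
    return variants

# Hypothetical ordered sequence: only compositions were known, not orders.
example = "UAG"

# 3 positions x 3 alternative bases = 9 single-base mutants.
mutants = single_substitutions(example)
print(len(mutants))  # 9

# Every single substitution keeps exactly two bases of the composition.
print(all(shared_bases(example, m) == 2 for m in mutants))  # True

# A different order of the same composition needs at least two changes.
rearranged = {"".join(p) for p in permutations(example)} - {example}
print(min(hamming(example, r) for r in rearranged))  # 2
```

The composition-level test in `shared_bases` (exactly two bases in common) is the form in which the assumption can be applied when only compositions are known.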
Moreover, in those cases in which the codes for two amino acids have the same triplet composition of bases, a substitution of one such amino acid for the other would be unlikely, since it would require a double base substitution to accomplish the necessary transposition in sequence. Various considerations suggest that the triplet code is of the nonoverlapping type; this has recently been discussed rather fully by Crick et al.,⁸ who have also reported evidence that a triplet code is involved. At the time the present evaluation was undertaken, the codes for only 14 amino acids were known.⁹ It was of interest to determine whether the codes for other amino acids could be predicted from known amino acid substitutions. This proved possible for certain amino acids for which sufficient information was available, as illustrated below for glutamic acid, aspartic acid, asparagine, and alanine. These data, as well as others, thus serve to verify the thesis that simple mutations producing an amino acid substitution involve a change in a single nucleotide base of a triplet, without alteration of the sequence in the triplet.

TABLE 1
TRIPLET CODE LETTERS FOR AMINO ACIDS*

Amino acid      Code          Amino acid      Code              Amino acid     Code
Alanine         UCG           Glycine         UG2               Proline        UC2
Arginine        UCG           Histidine       UAC               Serine         U2C
Asparagine      UA2 (UAC)†    Isoleucine      U2A               Threonine      UAC (UC2)
Aspartic acid   UAG           Leucine         U2C (U2A, U2G)    Tryptophan     UG2
Cysteine        U2G           Lysine          UA2               Tyrosine       U2A
Glutamic acid   UAG           Methionine      UAG               Valine         U2G
Glutamine       UCG           Phenylalanine   UUU

* This table has been adapted from Speyer, Lengyel, Basilio, and Ochoa.⁶ These codes are, in general, similar to the 15 recently reported by Martin, Matthaei, Jones, and Nirenberg.⁷ The code for glutamine is not directly available.
It was deduced from an amino acid replacement in an HNO₂ mutant of tobacco mosaic virus.⁴
† Code letters in parentheses represent additional code letters for the amino acid (degenerate code).⁶

On the basis of the single-base-change concept, it is useful to examine some homologous proteins of different species in which amino acid substitutions are known; this makes it possible to determine whether an alteration involved a change in a single base or whether two or more base substitutions occurred. Hence, it can be deduced, in some instances, whether single or multiple mutational steps occurred during the evolution of species-specific proteins. Thus, a definition is at hand both for directly permissible substitutions (a single base change in the triplet) and for nonpermissible changes, i.e., those that cannot be produced by a change in a single base. It is also of interest to examine briefly how some amino acid substitutions, considered in terms of the known codes, might affect the functions of proteins.

Deduced Codes for Amino Acids. From known amino acid substitutions, it was possible to deduce certain codes by making use of the code information then available.

Code for glutamic acid: Table 2 lists the presently reported amino acid substitutions in human hemoglobin (Hb).

TABLE 2
SOME AMINO ACID REPLACEMENTS IN HUMAN HEMOGLOBINS

HbA amino acid   Mutant amino acid   Type of Hb
Glutamic acid    Valine              S¹⁰,¹³
Glutamic acid    Lysine              C¹¹, E¹²
Glutamic acid    Glycine             G San Jose¹³
Glutamic acid    Glutamine           G Honolulu¹⁴
Valine           Glutamic acid       M Milwaukee
Asparagine       Lysine              G Philadelphia¹⁶
Glycine          Aspartic acid       Norfolk¹⁷
Lysine           Aspartic acid       F¹⁸
Histidine        Tyrosine            M Boston, M Emory¹⁵
Histidine        Arginine            Zurich¹⁵
Glutamic acid    Alanine             A₂²⁰
Serine           Threonine           A₂²⁰
Threonine        Asparagine          A₂²⁰

For glutamic acid, it is apparent that this amino acid has been replaced by lysine, valine, glycine, and glutamine.
By assuming that only a single base of the triplet can be replaced, the code for glutamic acid can be deduced, since each amino acid that replaces it must share two of its three bases with the glutamic acid code. Since lysine is UA2, the code for glutamic acid must contain at least one A. Since the valine code is U2G, the code for glutamic acid must also contain a U. The code for glycine is UG2; hence, the glutamic acid code must contain, in addition, a G. Therefore, the complete code for glutamic acid is UAG. This is consistent with the code for glutamine, UCG, which can be derived from the code for glutamic acid by a single base change (A to C). A mutant involving the change of glycine to glutamic acid has also been found in the A protein of the tryptophan synthetase of E. coli.²¹
Code for aspartic acid: Inspection of Table 2 shows that aspartic acid has replaced lysine (UA2) and glycine (UG2). For the code for aspartic acid to have two bases in common with the codes for both glycine and lysine, it must be UAG.
Code for asparagine: Asparagine has replaced threonine (UC2) in HbA₂, and lysine (UA2) has replaced asparagine in Hb G Philadelphia (Table 2). This suggests that the code for asparagine is UAC.
Code for alanine: The only known substitution for alanine in the hemoglobin series is in HbA₂, in which it has replaced glutamic acid (UAG), although this may not be a simple mutation. Although it is somewhat risky to assume that differences between species arose by single mutations, one may consider the known replacements for alanine in the homologous proteins of various species. The data of Paléus and Tuppy²² for the cytochromes c of various species show that, at one position, glutamine (UCG) and leucine (U2C) replace alanine and that, at another position, serine (U2C), glutamic acid (UAG), and leucine (U2C) occur in place of alanine.
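The elimination used above for glutamic acid, aspartic acid, and asparagine can be restated as a search over all 20 possible triplet compositions. The Python sketch below is illustrative only (all function names are hypothetical). It keeps the compositions that share exactly two bases with each replacement partner, i.e., that are one base change away from it, and reads the shorthand of Table 1, in which a trailing 2 doubles the preceding letter (UA2 = one U and two A).

```python
from collections import Counter
from itertools import combinations_with_replacement

BASES = "UACG"  # order chosen so output matches the notation UAG, UAC, UCG

def expand(code: str) -> Counter:
    """Expand shorthand such as 'UA2' (one U, two A) into base counts."""
    counts = Counter()
    i = 0
    while i < len(code):
        base = code[i]
        if i + 1 < len(code) and code[i + 1].isdigit():
            counts[base] += int(code[i + 1])
            i += 2
        else:
            counts[base] += 1
            i += 1
    return counts

def one_base_apart(a: Counter, b: Counter) -> bool:
    """A single substitution leaves exactly two of three bases in common."""
    return sum((a & b).values()) == 2

def candidates(partners: list) -> list:
    """Return compositions one base change away from every partner code."""
    hits = []
    for combo in combinations_with_replacement(BASES, 3):  # all 20 compositions
        comp = Counter(combo)
        if all(one_base_apart(comp, expand(p)) for p in partners):
            hits.append("".join(combo))
    return hits

# Replacement partners from Table 2, with their codes from Table 1.
print(candidates(["UA2", "U2G", "UG2"]))  # glutamic acid: ['UAG']
print(candidates(["UA2", "UG2"]))         # aspartic acid: ['UAG']
print(candidates(["UC2", "UA2"]))         # asparagine: ['UAC']

# Glutamine (UCG) is one base change from the deduced glutamic acid code.
print(one_base_apart(expand("UAG"), expand("UCG")))  # True
```

In this sketch each set of partners already leaves a single composition, so the requirement that every code contain U is not needed for these three deductions. Glutamine is left out of the glutamic acid partners because its code was not directly available, and is checked afterwards instead. The alanine argument, which allows that some of the replacements may involve more than one base change, is not a strict elimination of this kind and is not modelled here.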
Alanine and threonine (UC2) likewise occur at the same position in the insulins of various species.²³,²⁴ The codes for almost all of the aforementioned amino acids, namely threonine, serine, leucine, and glutamine, contain U and C. Although some of the transformations among the triplet codes of these amino acids, and between them and alanine, may involve more than one base change, it appears very likely that at least some are permissible changes, that is, that they involve only a single base change. The foregoing information suggests that the code for alanine would have to contain U and C in order to have two bases in common with the codes for most of these amino acids. Since the codes for glutamic acid and glutamine both contain G, it is likely that the code for alanine is UCG. This has subsequently been shown to be the code for alanine.⁶
The Role of Uracil. The most striking feature of the triplet base compositions for all the amino acids thus far is that each triplet contains at least one U. Inasmuch as U is the only base unique to RNA as compared with DNA, an important and specific function for U in template or messenger RNA is implied. For a triplet code with 4 bases, there are 4³ = 64 possible combinations. If, however, each code for an amino acid must contain at least one U, then the 3³ = 27 possible triplets lacking U are excluded. This leaves 37 potential triplets for the 20 amino acids commonly found in proteins. The presently available information on mutants suggests that there are probably preferred positions in the triplet for U, although the sequences within the triplets are unknown. It is likely, for example, that there are 16 codes for amino acids in which U occupies the same unknown position of the triplet.
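The arithmetic behind the 37 potential triplets can be checked by brute-force enumeration. A minimal Python sketch (variable names are illustrative) that lists every ordered triplet over U, C, A, and G and splits them by whether they contain U:

```python
from itertools import product

BASES = "UCAG"

# Every ordered triplet over the four RNA bases: 4**3 = 64.
all_triplets = ["".join(t) for t in product(BASES, repeat=3)]

# Triplets built only from C, A, and G: 3**3 = 27.
without_u = [t for t in all_triplets if "U" not in t]

# Triplets containing at least one U.
with_u = [t for t in all_triplets if "U" in t]

print(len(all_triplets), len(without_u), len(with_u))  # 64 27 37
```

Enumeration is used here rather than the closed form 4³ − 3³ only so that the excluded triplets can be inspected directly.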