
Evolutionary Anthropology 14:6–11 (2005)

CROTCHETS & QUIDDITIES

“The” Genetic Code?

KENNETH M. WEISS AND ANNE V. BUCHANAN

The DNA-based code for proteins through messenger and transfer RNA is widely regarded as the code of life. But genomes are littered with other kinds of coding elements as well, and all of them probably came after a supercode for the tRNA system itself.

Ken Weiss is Evan Pugh Professor and Anne Buchanan is Senior Research Scientist in the Department of Anthropology at Penn State University. E-mail: [email protected]

© 2005 Wiley-Liss, Inc. DOI 10.1002/evan.20033. Published online in Wiley InterScience (www.interscience.wiley.com).

Evolution and the diversification of organisms are made possible by codes, or arbitrary assignments of “meaning,” in multiple ways. Many are not widely appreciated. Codes allow the same system of components to be used for multiple purposes. These can be open-ended, the way the alphabet and vocabulary make this column possible, but the flexibility of a code can become constrained once a system with many components that must work in concert, if an organism is to develop and survive (or evolve), is in place.

Codes are highly efficient ways to carry uncorrupted information from one place to enable indirect action elsewhere. But this requires a decryption system at the receiving end.

Fidelity at both ends is vital. In World War II, the Germans had their Enigma encryption machine, whose use by U-boats caused chaos for British shipping, until the eccentric computer pioneer Alan Turing learned how to break the code. That led to doom for Germany. For organisms, the price for code breakdown in the Battle of Life is similarly unremorseful.

Everyone knows of “the” genetic code, by which nucleotide triplets in DNA in the nucleus of cells specify the amino acid (aa) sequence of proteins. This is the code described in textbooks as the heart of the genetic theory of life and its evolution. Discoveries in recent years have made things more complicated by showing that genomes are littered with all sorts of other kinds of coding elements. An example is DNA sequences located near to protein-coding segments (“genes” proper) that are chemically recognized by regulatory proteins, which bind there to cause the nearby gene to be expressed (or repressed) in specific cells or under specific conditions.1,2 This gene-expression mechanism is as fundamental as the classical genetic code, because it allows cells to differentiate into organs and tissues, enabling organisms from plants to people to exist. The cells in the eyes you’re reading with and in the fingers that hold this page all have the same genome, but eyes and fingers are different because they use different subsets of those genes. This is specified by tissue-specific developmental codes.

Other DNA sequence elements are used to package, protect, or copy DNA. Embryos develop and adult organisms respond to their environments by using extensive arbitrary codes in the form of combinations of chemical signaling molecules that are produced by specific genes and are released to be detected by other cells whose gene expression they alter. These are codes because it is the combination of factors, not the factors themselves, that carries the information.

Your life depends on the fidelity of these many codes. Aberrant codes related to behavior can lead to dysgenesis or various metabolic diseases. Anomalous cell-surface can cause autoimmune destruction, and viruses are the Alan Turings of life that evolve ways to break their receptor codes to gain illicit entry into cells (Fig. 1).

Figure 1. Alan Turing.

But there is an additional code, a code of codes, that makes all of this possible, including “the” genetic code itself, and may be the oldest and most fundamental one of all. Protein-coding (Figure 2) works via two intermediaries: messenger RNA (mRNA), a complementary copy of code transcribed from a of DNA, and transfer RNA (tRNA), which carries aa’s to ribosomes to be linked together to form a chain (polypeptide) that becomes a protein, thus translating “the” genetic code. If mRNA takes a message to ribosomes, tRNA is the Enigma decrypter that turns the code into action (a protein). As important as the genetic protein code itself is, this decryption system may be the most deeply fundamental aspect of the coding system, itself a kind of code, and probably the first code.

Figure 2. The main steps in DNA as a protein code. A. Transcription: mRNA copied from the template DNA. B. Translation: amino acids carried to the ribosome by tRNAs for incorporation into the growing polypeptide chain, with mRNA as template. The aaRS’s are explained below.

In 1953 Watson and Crick solved the basic structure of DNA as a set of parallel chains connected by specific pairing of nucleotides, A with T and C with G, the famous double helix. It was not until 1967 that the use of DNA to code for aa sequences, our friend the protein code, shown in Table 1, was deciphered (see 3). Remarkably, this protein-coding process was found to be based on the same Watson-Crick complementary base-pairing phenomenon that made DNA itself. Experiments that synthesized template RNA strings in bacterial extracts to see what aa strings were produced showed that specific nucleotide triplets in DNA, that we now call codons, through a mRNA copy, bind with nucleotide triplet anticodons in tRNA (i.e., the anticodon is a string of 3 nucleotides complementary to those at the corresponding codon position in the mRNA). This system entirely depends on the fact that, when activated, distinct tRNA molecules bind a specific aa which they carry to the mRNA during translation, and that the aa they carry corresponds to their anticodon sequence.

As the mRNA moves through a ribosome, each of its triplets is sequentially bound, again by base-pairing, to a tRNA with the complementary anticodon; this juxtaposes the tRNA’s aa to the end of the building string of aa’s (Figure 2B).

The 64 nucleotide triplets made possible by four options (A, C, T, G) in each of three successive positions code for only 20 aa’s because most amino acids are represented by more than one codon, and hence more than one type of tRNA specifies a given aa, as shown in Table 1. The table also shows that the genome has many copies of most types of tRNA (each coded for and transcribed from its own location in the genome). It is crucial that the code and its supporting tRNA system be degenerate. Were this not so, more than 40 nucleotide triplets would code for nothing, and it would be much more difficult to maintain or modify the code by mutation. Too many mutations would change a codon’s specification from an aa to “nothing,” causing a skip in the mRNA, and translating into nothing.

A TALE WITHIN, OR UPON A TALE

So far so good. But if genes code for proteins by specifying their aa sequence, then the translation mechanism has to be just as specific! Surprisingly, how this codon-tRNA-aa specificity, with its multiplicities of codons and tRNA genes with the same specificity, is managed and maintained has little to do with the protein code. Instead, there is an independent tRNA “supercode” (known in the literature as the second genetic code or the RNA code). Like the protein code, the supercode involves RNA-aa associations, but unlike the protein code, the supercode is far from universal and includes aspects of tRNA that have nothing to do with its nucleotide sequence or base-pairing. Instead, enzymes called aminoacyl-tRNA synthetases (aaRS’s) first “charge,” or aminoacylate, the tRNA by attaching its cognate aa, and then accompany it to the ribosomes where the aa is discharged to the growing aa string. There are specific aaRS’s for each aa.

tRNA’s are small molecules 73 to 95 nucleotides in length that use complementary base-pairing to fold into a four-armed cloverleaf structure (Figure 3A), the D stem-loop, the central arm containing the anticodon, the
TψC stem-loop, and the acceptor stem at the other end where its aa is attached. Due to base-pairing characteristics and chemical modification of the nucleotides, the folded tRNA forms a 3-dimensional L-shape that collapses the four cloverleaf arms into two domains: the acceptor and the anticodon (Figure 3B). It is important that forming its folded shape requires nucleotides in different places along the linear sequence code for the tRNA itself to be complementary, which we illustrate schematically in Figure 3C. The appropriate aa binds to its tRNA where bases 1–7 in the acceptor stem pair with bases 72–66 at the other end, always terminating in a single-stranded N73CCA76 string (N means that position 73 in the sequence can be any nucleotide, but all aa’s in part require a CCA at the very end of the tRNA in order to attach properly). The numbers are the nucleotides reading from left to right, 5′ to 3′, as RNA sequences are usually written out. tRNA combines with its aaRS as part of the protein assembly system that decrypts the original code (Figure 2).

Figure 3. RNA molecule structure. A. Two-dimensional tRNA cloverleaf. B. Three-dimensional L structure. C. Schematic illustration of how a hypothetical stem-loop folded structure requires that the coding for the tRNA must have complementary nucleotides in appropriate places (the highlighted U and A). A and B reprinted from 2.

These are universal aspects of tRNA’s, and if that were all, the tRNA-aa part of the story would have nothing to do with coding. The triplet codes for each aa are identical in all species: there is only one codebook, which (with minor exceptions) preserves vital aspects of evolution and the coherence of all life. But the way in which tRNA’s pair with aa’s is not universal at all.

TABLE 1. GENETIC CODE: CODON TRIPLETS, AA NAME FOR WHICH EACH TRIPLET CODES, AND IN PARENTHESES THE NUMBER OF GENES IN THE HUMAN GENOME THAT PRODUCE tRNA THAT USES THAT CODON

Gene duplication events are responsible for the multiple copies of genes specifying tRNA’s that use each aa (Table 1). Like any other gene, each tRNA gene experiences mutation and varies. Similarly, the aaRS genes are also subject to mutation. Though all aaRS’s have the same function, they are otherwise characterized by their diversity among species. Since there are only 20 aaRS genes, one for each aa, but the code is degenerate, an aaRS for a specific aa must recognize and charge the set of tRNA’s (with different anticodons and multiple independent, varying copies in the genome) specific to that aa. It does this by a combined chemical interaction among protein, RNA, and aa in which the aaRS recognizes a flexible variety of “identity elements” in the tRNA’s 3-dimensional structure, involving the acceptor arm, the anticodon loop itself and other sites (as well as negative identifiers that preclude binding non-cognate tRNA’s). There are also proofreading enzymes that check the proper charging of tRNA’s.4 The different tRNA’s to which the same aa binds share unique identity elements, and the bonds formed involve around 20% of the tRNA’s surface area (Figure 4), in an embrace of many touches.5

Figure 4. The mutual combinations that are the basis of the supercode. A. The binding of an aaRS (thinner ribbon to the right) to its corresponding tRNA (thicker ribbon, left) involves recognition of parts across the structures of both molecules (this shows Gln-tRNAGln, glutaminyl-tRNA synthetase, bound to tRNA-Gln). B. tRNA with nucleotide positions represented by circles; the size of a circle indicates the frequency with which that position is involved in recognition and hence binding with its aaRS (from 4).

AaRS’s charge tRNA’s with their aa’s in two different ways: directly and indirectly. The direct pathway relies on 20 aaRS’s that discriminate between all 20 aa’s, while the indirect route uses some non-discriminating aaRS’s.7 The indirect pathway requires an additional step. A non-discriminating aaRS charges its cognate tRNA with the appropriate aa, but also charges some non-cognate tRNA’s with the same aa. The non-cognate aa then undergoes chemical modification that converts it to the proper aa for that tRNA. For example, if we denote the tRNA type by tRNAaa, a non-discriminating GluRS charges both tRNAGlu and tRNAGln with Glu (glutamate) to produce Glu-tRNAGlu and Glu-tRNAGln, but the Glu attached to the latter complex is then modified to become Gln (glutamine) to make everybody happy. It is all of this that makes the supercode a “code.”

The aaRS’s comprise two largely unrelated classes each with ten genes (Table 2). The genes coding for aaRS’s in each class share short sequence motifs and the resultant folding shape for the aaRS protein. Each class attaches the aa to a different side of the tRNA molecule. Different major branches of life have different subsets of these systems: we and other mammals basically use a simple, direct system,6,8 and sequence phylogenies of the aaRS genes do not completely fit the standard phylogenetic tree of archaea, bacteria and eukaryotes (including us). This suggests that these genes, or even the translation mechanism, arose twice in evolution, either because of dual origins of translation itself or a deep evolutionary history of horizontal transfer of these vital genes, that is, between taxa.9

TABLE 2. TABLE OF AMINOACYL-tRNA SYNTHETASE CLASSES AND THE AMINO ACIDS THEY ATTACH (FOR TERMS AND DETAILS SEE 6 AND HTTP://WWW.CS.STEDWARDS.EDU/CHEM/CHEMISTRY/CHEM43/CHEM43/TRNA/TMP-1112108482.HTM)

                        Class I                               Class II
Members                 ArgRS, CysRS, GlnRS, GluRS, IleRS,    AlaRS, AsnRS, AspRS, GlyRS, HisRS,
                        LeuRS, MetRS, TrpRS, TyrRS, ValRS     LysRS II, PheRS, ProRS, ThrRS, SerRS
Sequence motifs         HIGH, KMSKS                           Motifs 1, 2 and 3
Active site typology    Rossmann fold, ~170 amino acids       Anti-parallel β fold, ~250 amino acids
tRNA binding site       Minor groove side of acceptor arm     Major groove side of acceptor arm
Aminoacylation site     2′-OH of tRNA                         3′-OH of tRNA (except PheRS)
Amino acids charged     Bulkier amino acids                   Smaller amino acids

tRNA and the Origins of Complex Life

Complex life, like ourselves, arose after the protein coding system made possible arbitrarily long strings of aa’s (proteins) whose sequence could be reliably specified, so they did not have to arise spontaneously by self-assembly based on the chemical properties of free-floating aa’s in a primordial soup. But there are inklings that the RNA supercode may have arisen originally from just such chemical properties, and not have been code-like at all. Experiments have shown that what
has been frozen in life is not entirely by accident relative to the chemical properties of the components. For example, AGA codes for arginine and arginine may be attracted to the chemical “smell” of AGA. The coding system may have arisen by such primeval attractions that may then have been the functional units of life. Only later did these RNA triplets become part of tRNA intermediaries between DNA and mRNA.10–16

However, experiments show that the same aa is attracted by RNA sequences other than the triplets of today’s system. So Crick may have been at least partly right to refer to the current codon system as a “frozen accident” in evolution: whatever coding pattern was established first, stuck. Regardless of its origins, any such connections are today unrelated to the use of the coding system to specify the structure of proteins: the use of arginine in millions of places in millions of proteins has nothing to do with any affinities between arginine and AGA. That’s what makes the system a code.1,2 The main plot of the story of evolution may have been the turning of an original operational supercode into a flexible informational protein code that became the basis for the diversity of life we know today.

EVOLUTION OF THE SUPERCODE: MULTIPLE PARTNERS WALTZING THROUGH TIME

The supercode is interesting in its own right and may bear lessons for our own evolution. It exemplifies how the many parts of a complex information-based system must be constrained and evolve simultaneously. This is an elegant example of epistasis (interaction between genes) or coevolution. A newly arising mutation that affects one component of such a system must be matched by a compatible change elsewhere in the system (the U with the A highlighted in Figure 3C) if it is not to destroy the in-place organization. There must be extensive epistasis in the supercode, the interaction among parts of the same gene, a tRNA sequence, or different genes such as different copies of the cognate tRNAs or those and the corresponding aaRS. The supercode shows how complex coevolution was already inherent in the earliest, simplest aspects of life.

A mutation in the coding region of most genes may be serious but at least only affects that one protein. But mutations in the supercode genes are different. A mutation that causes a tRNA to be charged with the wrong aa, or a mutation in an aaRS gene, would affect all proteins that use that particular aa or tRNA class. It may require a not-very-Darwinian kind of selection to police such a system. The effects of a mis-specifying aaRS would not have to await a race with a predator, or a battle to defend a territory, for the effect to be manifest, and the animal (or plant) carrying the mutation to be eliminated. Instead, there is likely to be developmental or prezygotic selection that purifies cells before they become individuals, because any cell in the developmental lineage leading to the egg or sperm cells would be devastated by supercode mutations. There would never even be a chance for competitive selection of the usual Darwinian kind between individuals. Similar statements probably apply to many gene regulation and cell housekeeping functions: human egg cells have to simmer for decades and sperm precursors have to divide repeatedly, and so on.

There is a general Darwinian prediction that evolution will drive selection to as early in life as possible, because it is cheaper in terms of biological investment to get rid of an offspring early rather than late. Prezygotic selection would be particularly cheap. Most species easily produce millions more gametes than they need; a woman begins life with 8 million eggs but even the most prolific uses only around 20 of them. Selection within organisms probably purges a lot of mutational variation that never shows up as Mendelian variation from parent to offspring, to compete via Malthusian population pressure for limited resources.

If this is right we should not expect many diseases caused by aaRS mutations, and that is the case. One possible exception is several mutations in the glycyl-RS in people affected by the neurological disease Charcot-Marie-Tooth syndrome.17 These effects occur many years after birth, which is difficult to reconcile with the aspects of the tRNA system we have been describing. If aaRS mutations cause this disease, the likely mechanism is reduced efficiency, rather than major loss of function, of the aaRS gene. Close inspection should show widespread effects in the body, because tens of thousands of proteins include glycine. Still, it’s curious.

CODES, CODES, AND MORE CODES: BUT SO WHAT?

It was a surprise in the last 50 years to learn that the intervening DNA between protein-coding genes, once called “junk,” has a diversity of sequence-based coding functions. “The” genetic code has turned into a multiplicity of codes. DNA sequence specifying the classic genetic code, gene expression, chromosome packaging, and replication might be called linear or “horizontal” codes: sequences that, reading along the line, correspond to a sequence of aa’s or are directly recognized by some kind of regulatory protein. Each such instance in the genome is independent. The supercode that charges tRNA with the right aa is a kind of “vertical” code, reading from tRNA down to mRNA in ribosomes, and it isn’t linear, because it’s based on the 3-dimensional structure of tRNA and proteins that bind it, all of which must coevolve. It isn’t independent either, because while the protein code provides open-ended aa strings for any gene, its supercode “staples” the individual elements together for every gene.

Prezygotic selection may explain how the supercode is maintained, but doesn’t explain why such a mechanism evolved in the first place. To understand that we probably have to go back to when all life was simple and code was function. Today, 3–4 billion years later, the supercode protects the integrity of the protein code that makes biological diversity possible and preserves the unity of all life.

The problem of co-evolution is actually a general one. Most proteins fold up upon themselves in ways analogous to what happens to tRNA, and they interact with other substrates including proteins. Yet functions evolve. Explanations for this are not yet very satisfactory, but the general idea is that small mutational changes that do not make much difference to the system can accumulate within the various interacting components until eventually major differences have been achieved or at least could evolve rapidly (see a good discussion of this mechanism by Fontana18). However, there are so countlessly many possible mutations of varying levels of difference that it may require the low-cost, high-power ability of prezygotic or other very early selection to monitor. This may relate to more aspects of life than we have thought, and aspects of human evolution that we try to explain in terms of conventional selection may evolve for cellular rather than organismal reasons.

The subtlety and complexity of even the basic supercode exemplify the many other layers of codes, genetic and otherwise, by which life is lived and through which evolution works. But the supercode is different: if it were to break the way Turing broke the Enigma code, then we, like the U-boats, would be sunk (Fig. 5).

Figure 5. The Enigma machine.

NOTES

We welcome comments on this column: [email protected]. There is a feedback CrotchetyComments page at http://www.anthro.psu.edu/weiss lab/index.html. We thank John Fleagle for critically reading the manuscript.

REFERENCES

1 Weiss KM. 2002. Is the message the medium? Biological traits and their regulation. Evolutionary Anthropology 11:88–93.
2 Weiss KM, Buchanan AV. 2004. Genetics and the logic of evolution. New York: Wiley-Liss.
3 Nirenberg M. 2004. Historical review: Deciphering the genetic code – a personal account. Trends Biochem Sci 29:46–54.
4 Ibba M, Soll D. 2000. Aminoacyl-tRNA synthesis. Annu Rev Biochem 69:617–650.
5 Arnez JG, Moras D. 1997. Structural and functional considerations of the aminoacylation reaction. Trends Biochem Sci 22:211–216.
6 Ibba M, Becker HD, Stathopoulos C, Tumbula DL, Soll D. 2000. The adaptor hypothesis revisited. Trends Biochem Sci 25:311–316.
7 Berg J, Tymoczko J, Stryer L. 2002. Biochemistry. New York: W.H. Freeman.
8 Sekine S, Nureki O, Shimada A, Vassylyev DG, Yokoyama S. 2001. Structural basis for anticodon recognition by discriminating glutamyl-tRNA synthetase. Nat Struct Biol 8:203–206.
9 Woese CR, Olsen GJ, Ibba M, Soll D. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev 64:202–236.
10 Yarus M. 1998. Amino acids as RNA ligands: a direct-RNA-template theory for the code’s origin. J Mol Evol 47:109–117.
11 Rodin SN, Ohno S. 1997. Four primordial modes of tRNA-synthetase recognition, determined by the (G,C) operational code. Proc Natl Acad Sci USA 94:5183–5188.
12 de Duve C. 1988. Transfer RNAs: the second genetic code. Nature 333:117–118.
13 Ribas de Pouplana L, Schimmel P. 2001. Operational RNA code for amino acids in relation to genetic code in evolution. J Biol Chem 276:6881–6884.
14 Noller H. 1993. On the origin of the ribosome: coevolution of subdomains of tRNA and rRNA. In: Gesteland R, Atkins J, editors. The RNA world. Cold Spring Harbor, NY: Cold Spring Harbor Press. p. 197–220.
15 Szathmary E. 1999. The origin of the genetic code: amino acids as cofactors in an RNA world. Trends Genet 15:223–229.
16 Rodin S, Rodin A, Ohno S. 1996. The presence of codon-anticodon pairs in the acceptor stem of tRNAs. Proc Natl Acad Sci USA 93:4537–4542.
17 Antonellis A, Ellsworth RE, Sambuughin N, Puls I, Abel A, Lee-Lin SQ, Jordanova A, Kremensky I, Christodoulou K, Middleton LT, Sivakumar K, Ionasescu V, Funalot B, Vance JM, Goldfarb LG, Fischbeck KH, Green ED. 2003. Glycyl tRNA synthetase mutations in Charcot-Marie-Tooth disease type 2D and distal spinal muscular atrophy type V. Am J Hum Genet 72:1293–1299.
18 Fontana W. 2002. Modelling “evo-devo” with RNA. Bioessays 24:1164–1177.
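For readers who like to see the arithmetic, the codon-level claims in the column (64 triplets, 20 aa's, AGA coding for arginine, anticodons complementary to codons) can be checked with a short sketch. It is only an illustration under stated assumptions: the lookup is the standard genetic code written in DNA letters as the column does, the function names `degeneracy`, `anticodon`, and `translate` are hypothetical and not from the column, and the sketch ignores wobble pairing, modified bases, and everything the supercode (aaRS recognition) contributes.

```python
from collections import Counter
from itertools import product

# Standard genetic code in DNA letters, bases ordered T, C, A, G.
# One-letter aa symbols; "*" marks a stop codon. (Assumed standard table.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# RNA base-pairing used to derive an anticodon from a codon.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def degeneracy():
    """Count how many codons specify each aa (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def anticodon(dna_codon):
    """Return the tRNA anticodon (written 5' to 3') for a DNA-letter codon."""
    rna_codon = dna_codon.upper().replace("T", "U")
    return rna_codon.translate(_RNA_COMPLEMENT)[::-1]


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    counts = degeneracy()
    print(len(CODON_TABLE), "triplets,", len(counts), "amino acids")
    print("codons for arginine (R):", counts["R"])
    print("anticodon for ATG:", anticodon("ATG"))
    print("ATGAGATAA ->", translate("ATGAGATAA"))
```

In this table 61 triplets specify an aa and 3 are stops, so a non-degenerate code with one codon per aa would leave 64 - 20 = 44 triplets meaning nothing, consistent with the column's "more than 40".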

Books Received

• Kimbel WH, Rak Y, Johanson DC (2004) The Skull of Australopithecus afarensis. 288 pp. Oxford: Oxford University Press. ISBN 0-195-15706-0 $164.50 (cloth).
• Schwartz JH (2005) The Red Ape: Orangutans and Human Origins. xvi + 286 pp. New York: Westview Press. ISBN 0-8133-4064-0 $27.50 (cloth).
• McGrew WC (2004) The Cultured Chimpanzee: Reflections on Cultural Primatology. xiii + 248 pp. Cambridge: Cambridge University Press. ISBN 0-521-53543-3 $29.99 (paper).
• Oller DK, Griebel U, eds. (2004) Evolution of Communication Systems: A Comparative Approach. x + 338 pp. Cambridge: The MIT Press. ISBN 0-262-15111-1 $45.00 (cloth).
• Beard C. (2004) The Hunt for the Dawn Monkey: Unearthing the Origins of Monkeys, Apes, and Humans. xv + 348 pp. Berkeley: University of California Press. ISBN 0-520-23369-7 $27.50 (cloth).
• Reardon J. (2005) Race to the Finish: Identity and Governance in an Age of Genomics. xii + 237 pp. Princeton: Princeton University Press. ISBN 0-691-11857-4 $17.95 (paper).
• Turner T. (2005) Biological Anthropology and Ethics: From Repatriation to Genetic Identity. x + 327 pp. Albany: State University of New York Press. ISBN 0-7914-6296-X $28.95 (paper).