
11 The Genetic Code & Translation

Goal

To understand how mRNA is translated into protein.

Objectives

After this chapter, you should be able to

• describe the principal features of the genetic code.
• explain how tRNAs mediate information transfer and do so accurately.
• describe the peptide bond cycle and how it achieves accuracy.
• describe the ribosome cycle and how open reading frames are set.

At the heart of the central dogma is the concept that information in the form of the four-letter alphabet (A, G, C and T) of the genetic material is translated into the 20-letter (amino acids) alphabet of proteins. As we have seen, the intermediary between the genetic material, DNA, and the translation machinery, the ribosome, is the messenger RNA or mRNA. The mRNA is copied from the DNA in a process called transcription (Chapter 10) and is then decoded on the ribosome in a process called translation, where it directs the ordered polymerization of amino acids into polypeptide chains. Here we focus on the nature and logic of the genetic code, the RNA adaptors that decipher the genetic code, the workings of the molecular machine that translates mRNA into protein and does so with high accuracy, and the chemistry of the formation of the peptide bond.

The four-letter alphabet of the genetic material is read in units of three

How many bases are required to specify an amino acid? Because nucleic acids have only four bases and proteins are composed of twenty amino acids, the coding unit or codon for each amino acid must consist of more than one base. Even two bases would not be enough. If codons consisted of two bases, then only 16 (or 4²) different codons would be possible, and there would be insufficient codons to specify 20 amino acids. However, if codons are composed of three bases, then 64 (or 4³) codons are possible, more than enough for the amino acid alphabet. Therefore, the minimum number of bases needed to specify 20 different amino acids is three. Indeed, the genetic code is a triplet code.
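The counting argument above can be checked with a few lines of Python. This is only a minimal sketch; the function name `min_codon_length` is hypothetical and chosen for illustration.

```python
def min_codon_length(n_bases: int = 4, n_amino_acids: int = 20) -> int:
    """Smallest codon length whose number of combinations covers every amino acid."""
    length = 1
    # n_bases ** length is the number of distinct codons of that length.
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


# Number of possible codons for lengths 1, 2, and 3 with a four-letter alphabet.
codon_counts = {length: 4 ** length for length in (1, 2, 3)}
print(codon_counts)
print(min_codon_length())
```

With four bases, one- and two-base codons give 4 and 16 combinations, both fewer than 20, so the loop stops at a length of three.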

Figure 1 Each codon corresponds to a particular amino acid
Each codon is written from 5’ to 3’. The start codon (AUG) is shown in green. Stop codons (UAA, UAG, and UGA) are shown in red.

First            Second position                                        Third
position (5’)    U            C            A            G              position (3’)
U                UUU Phe      UCU Ser      UAU Tyr      UGU Cys        U
U                UUC Phe      UCC Ser      UAC Tyr      UGC Cys        C
U                UUA Leu      UCA Ser      UAA Stop     UGA Stop       A
U                UUG Leu      UCG Ser      UAG Stop     UGG Trp        G
C                CUU Leu      CCU Pro      CAU His      CGU Arg        U
C                CUC Leu      CCC Pro      CAC His      CGC Arg        C
C                CUA Leu      CCA Pro      CAA Gln      CGA Arg        A
C                CUG Leu      CCG Pro      CAG Gln      CGG Arg        G
A                AUU Ile      ACU Thr      AAU Asn      AGU Ser        U
A                AUC Ile      ACC Thr      AAC Asn      AGC Ser        C
A                AUA Ile      ACA Thr      AAA Lys      AGA Arg        A
A                AUG Met (start)  ACG Thr  AAG Lys      AGG Arg        G
G                GUU Val      GCU Ala      GAU Asp      GGU Gly        U
G                GUC Val      GCC Ala      GAC Asp      GGC Gly        C
G                GUA Val      GCA Ala      GAA Glu      GGA Gly        A
G                GUG Val      GCG Ala      GAG Glu      GGG Gly        G

Of the 64 possible codons in the genetic code, 61 specify amino acids, indicating that many amino acids are encoded by more than one codon (Figure 1). Thus, the code is degenerate in the sense that some amino acids are specified by more than one synonymous codon. At one extreme, leucine, serine, and arginine are each specified by six synonymous codons. At the other extreme, methionine and tryptophan have unique codons (AUG and UGG, respectively). The methionine codon AUG has an additional function as a start codon: it signals the beginning of a coding sequence in the mRNA. The remaining three (of the 64) triplets, UAA, UGA, and UAG, do not specify any amino acids. Instead, these triplets are stop codons that signal the end of the coding sequence for a messenger RNA. Thus, coding sequences have two kinds of punctuation marks: an AUG at the beginning that marks the start and one (or more successive) stop codons at the end. We will return to start and stop codons near the end of the chapter, including the question of how AUG can serve both as a methionine codon internal to a coding sequence and as the start signal at the beginning.

The entire genetic code, which is sometimes called the “Rosetta Stone of Life” (because it deciphers codons), is shown in Figure 1. Notice that the left-hand vertical column indicates the first (5’) position in a codon, the horizontal bar across the top indicates the second position, and the right-hand vertical column indicates the third (3’) position. Start and stop codons are highlighted in green and red, respectively.

Finally, we return to the 5’-to-3’ directionality of polynucleotides. Codons have a 5’-to-3’ orientation with respect to the directionality of the RNA transcript in which they are embedded. Thus, for example, the CGA codon for arginine has the orientation 5’-CGA-3’. This is in keeping with the three foundational rules of directionality introduced in Chapter 8: polynucleotide synthesis proceeds in a 5’-to-3’ direction (rule 1), polypeptide synthesis proceeds in an NH2-terminal-to-COOH-terminal direction (rule 2), and information for the order of amino acids in the mRNA is specified sequentially in a 5’-to-3’ direction (rule 3). Thus, codons are lined up in the same 5’-to-3’ orientation as the direction of translation.
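The degeneracy of the code described above can be tallied directly from the codon table. The sketch below rebuilds the table of Figure 1 in Python, using one-letter amino acid symbols and `*` for a stop codon; the compact string encoding and the variable names are illustrative choices, not part of the chapter.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # same order as the rows and columns of Figure 1
# One-letter amino acid codes, with the first position varying slowest and the
# third position fastest; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many synonymous codons specify each amino acid (or stop).
codons_per_residue = Counter(CODON_TABLE.values())
sense_codons = sum(n for aa, n in codons_per_residue.items() if aa != "*")
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
six_fold = {aa for aa, n in codons_per_residue.items() if n == 6}
unique = {aa for aa, n in codons_per_residue.items() if n == 1}

print(sense_codons, stop_codons, six_fold, unique)
```

The tally reproduces the numbers in the text: 61 sense codons, three stop codons, six codons each for leucine (L), serine (S), and arginine (R), and a single codon each for methionine (M) and tryptophan (W).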

Transfer RNAs are adaptors between codons and amino acids

How are the 61 codons that specify amino acids deciphered such that each directs the incorporation of the appropriate, cognate amino acid? The answer is that codons are recognized by adaptor molecules known as transfer RNAs or tRNAs. In contrast to mRNAs, tRNAs are non-protein-coding RNAs that directly act as adaptors through their tertiary structure, as we will explain. (A second example of non-coding RNAs is the RNA components of the ribosome, as we will also discuss.) Each tRNA recognizes the codon for a particular amino acid. Recognition is mediated by base pairing between the codon and a corresponding anti-codon in the tRNA molecule, which align in an anti-parallel orientation. Covalently attached to each tRNA at its 3’ terminus is its cognate amino acid, that is, the amino acid that corresponds to that specified by the codon. Thus, the amino acid aspartic acid is covalently attached to a tRNA whose anti-codon pairs with the aspartic acid codon 5’-GAC-3’.

Some tRNAs recognize more than one synonymous codon. This is possible due to a phenomenon known as wobble, which takes advantage of the fact that synonymous codons often differ from each other at the third (3’) position. Part of the explanation for wobble is that the 5’ base in the anti-codon is not as spatially restricted as the other two, allowing it to “wobble” and form hydrogen bonds with bases other than its cognate base at the 3’ position of the codon. Also, some tRNAs have an unusual base, inosine (Figure 2A), at the 5’ (wobble) position of the anti-codon; inosine is able to pair with A, U or C at the 3’ position of the codon (Figure 2B). Thus, the base-pair rules that form the basis for double-helical structures are not strictly adhered to in the special case of codon/anti-codon interactions.

tRNAs are approximately 80 nucleotides in length. They contain regions of self-complementarity that enable them to fold back on themselves in a characteristic cloverleaf-like pattern of loops and short stretches of double helix (analogous to secondary structure in proteins). The cloverleaf, in turn, folds into a precise three-dimensional structure (analogous to the tertiary structure of proteins) that resembles the capital letter “L” (an upside-down L in Figure 3). A key feature of tRNAs is that the anti-codon and the site of attachment of the amino acid are at opposite ends of the L-shaped molecule. The anti-codon is displayed in a loop at one end, and the amino acid is attached to the 3’ terminus at the other end. Notice that the 5’ and 3’ termini are both near the same end of the molecule but that the 3’ end protrudes as a short stretch of single-stranded RNA beyond the 5’ terminus.

Figure 2 Inosine can form base pairs with A, U, or C
(A) Inosine. (B) Inosine (I) paired with C, U, and A.
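Codon/anti-codon pairing and the inosine wobble rule lend themselves to a short sketch. The code below is a simplification under stated assumptions: anti-codons are written 3’-to-5’ so that each letter sits opposite its codon partner, the first two codon positions pair strictly by Watson-Crick rules, and the only wobble behavior modeled is the inosine rule given in the text (pairing with A, U, or C). The function names are hypothetical.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
INOSINE_PARTNERS = ("A", "U", "C")  # codon bases that inosine can pair with


def anticodon_for(codon: str) -> str:
    """Strictly complementary anti-codon, written 3'->5' (aligned with the codon)."""
    return "".join(WATSON_CRICK[base] for base in codon)


def codons_read_by(anticodon_3_to_5: str) -> set[str]:
    """Codons (5'->3') recognized by an anti-codon written 3'->5'.

    The last letter of the anti-codon is its 5' (wobble) base, which faces
    the third (3') position of the codon.
    """
    first, second, wobble = anticodon_3_to_5
    if wobble == "I":
        third_options = INOSINE_PARTNERS
    else:
        third_options = (WATSON_CRICK[wobble],)
    return {WATSON_CRICK[first] + WATSON_CRICK[second] + third for third in third_options}


print(anticodon_for("GAC"))    # anti-codon for the aspartic acid codon 5'-GAC-3'
print(codons_read_by("UAI"))   # an anti-codon with inosine at the wobble position
```

In this sketch an anti-codon of 3’-UAI-5’ reads AUU, AUC, and AUA, which Figure 1 shows are the three synonymous codons for isoleucine.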

Figure 3 tRNAs exhibit secondary and tertiary structure
(A) Shown is a cartoon representing the two-dimensional, cloverleaf-like secondary structure of a generic tRNA. Each circle represents an individual nucleotide, with the nucleotides of the anti-codon shown in blue. Black lines indicate base pairing within the strand. (B) Shown is the X-ray crystal structure of a tRNA molecule. The tRNA folds into an L-shaped structure, with the anti-codon at one end and the 3’ hydroxyl, which is where the amino acid becomes attached, at the other end. All tRNAs have this overall folded structure, even though their specific nucleotide sequences vary. Shown is a tRNA for phenylalanine; as such, it is known as tRNAPhe.
[Labels in both panels: 5’ end, 3’ end, amino acid attachment point, anti-codon]

Aminoacyl-tRNA synthetases catalyze tRNA charging

What is the nature of the linkage between an amino acid and a tRNA, and how is it created? Amino acids are joined to tRNAs via an acyl linkage between the carboxyl group of a cognate amino acid and the 3’ hydroxyl at the 3’ end of the tRNA molecule, creating an aminoacyl-tRNA (Figure 4). tRNAs bearing an aminoacyl linkage are said to be charged. Charging is catalyzed by an enzyme called aminoacyl-tRNA synthetase (Figure 5). Each amino acid has its own aminoacyl-tRNA synthetase (meaning there are 20 synthetases), which recognizes a particular amino acid and all of the cognate tRNAs for that amino acid.

Figure 4 tRNA charging creates a high-energy acyl linkage between the tRNA and its cognate amino acid
The particular tRNA represented has a 3’-AAC-5’ anti-codon, which is complementary to the leucine codon 5’-UUG-3’. Consequently, this tRNA (“tRNALeu”) is being charged with leucine.
[Scheme: leucine + tRNA (3’ hydroxyl) + ATP → leucyl-tRNA (high-energy acyl linkage) + AMP + PPi]

Figure 5 Aminoacyl-tRNA synthetases catalyze tRNA charging
Shown is the X-ray crystal structure of the isoleucyl-tRNA synthetase bound to a tRNAIle. This particular enzyme contains both a catalytic site with a binding pocket for isoleucine and an editing site with a pocket that distinguishes isoleucine from the closely related amino acid valine.
[Labels: aminoacyl-tRNA synthetase, tRNA, editing pocket, catalytic binding pocket, anti-codon]

Charging is energetically unfavorable; it has a ΔG°rxn that is positive. Therefore, energy must be expended in order to create the aminoacyl linkages. Charging takes place in a two-step process in which the formation of the aminoacyl linkage is coupled to the hydrolysis of ATP (Figure 6). The first step is a transesterification reaction in which the adenosine monophosphate (AMP) moiety of ATP is transferred to the carboxyl group of the amino acid, producing aminoacyl-AMP. This transfer occurs by nucleophilic attack by the oxygen atom from the carboxyl group on the α-phosphate of the ATP, resulting in the liberation of pyrophosphate. As we saw in the context of DNA polymerization, the liberation of pyrophosphate helps drive the reaction forward in that it is coupled to the favorable hydrolysis of pyrophosphate by pyrophosphatase. In the second step, the aminoacyl group is transferred from the aminoacyl-AMP intermediate to the tRNA, generating aminoacyl-tRNA and releasing AMP. This second step is favorable because the reactant aminoacyl-AMP is of higher free energy than the product aminoacyl-tRNA.

As we will come to, the aminoacyl-tRNA is the direct substrate for peptide bond formation by the ribosome. Peptide bond formation between a free amino group of one amino acid and the free carboxyl group of another amino acid is energetically unfavorable and hence does not occur spontaneously. However, the aminoacyl linkage in the charged tRNA reactant is at a higher free energy state than the peptide-bond product. Thus, the energy released by the expenditure of an ATP molecule is transferred first to the formation of aminoacyl-AMP, then to the formation of aminoacyl-tRNA and ultimately to the formation of a peptide bond by the ribosome.

[Figure 6 reaction scheme. Step 1: amino acid + ATP → aminoacyl-AMP + PPi. Step 2: aminoacyl-AMP + tRNA (3’ hydroxyl; anti-codon 3’-AAC-5’) → aminoacyl-tRNA (high-energy acyl linkage) + AMP.]

Figure 6 tRNA charging proceeds via a two-step mechanism Shown is the arrow-pushing mechanism for both steps of the tRNA charging reaction catalyzed by the aminoacyl-tRNA synthetase. In step 1, the carboxylate oxygen of the amino acid acts as a nucleophile and attacks the phosphorus atom of the α-phosphate of ATP, resulting in the liberation of pyrophosphate (PPi). In step 2, the 3’ hydroxyl of the tRNA acts as a nucleophile and attacks the carbonyl carbon of the aminoacyl-AMP molecule generated in step 1. The linkage in aminoacyl-AMP is cleaved, releasing AMP. The other product is an aminoacyl-tRNA molecule in which the carboxyl group of the amino acid is connected to the 3’ end of the tRNA via a high-energy acyl linkage. Although this particular example uses leucine, the mechanism is the same for all of the synthetases.

Aminoacyl-tRNA synthetases employ chemical selectivity and proofreading to achieve high accuracy

As adaptors between codons and amino acids, tRNAs are directly responsible for deciphering the genetic code. It is therefore critical that each tRNA be charged with its correct amino acid. Thus, synthetases have a two-fold challenge: they must recognize cognate tRNAs and they must recognize cognate amino acids. How do aminoacyl-tRNA synthetases achieve high accuracy in the recognition of both substrates?

Because of their relatively large size, tRNAs are a less formidable challenge for proper recognition by synthetases than are amino acids. Synthetases interact with tRNAs along a large interface that affords multiple sites of protein-RNA contact for discriminating between cognate and non-cognate tRNAs (Figure 5).

Proper recognition of cognate amino acids poses a greater challenge. For many synthetases, selectivity is achieved by binding to a pocket in the catalytic site, the location where the charging reaction described in Figure 6 takes place. The correct amino acid has the highest binding affinity for the binding pocket of the catalytic site; therefore, its binding is favored over the other 19 amino acids. Amino acids that are larger than the correct one are excluded from the catalytic site based on size. Also, amino acids whose side chains display a negative charge could be excluded from the binding pocket of the catalytic site for an amino acid whose side chain carries a positive charge. Therefore, in many cases, differences in size and chemistry among amino acids suffice for effective discrimination.

In some cases, however, the synthetase must distinguish between amino acids that are so similar in size and chemical properties that effective discrimination based solely on selective binding is not possible. Consider, for example, the challenge of the synthetase for isoleucine in discriminating against valine. Isoleucine differs from valine only by the presence of an extra methyl group in its side chain (Figure 7). Hence, valine is likely to fit in the catalytic-site binding pocket for isoleucine. Even though the catalytic site favors isoleucine over valine, the difference in ΔG°rxn between isoleucine binding and valine binding is only 2.5 kcal/mol. This means (recall from Chapter 9 that 2.7 kcal/mol roughly corresponds to an equilibrium constant of 100) that the isoleucine aminoacyl-tRNA synthetase would bind valine and incorporate the wrong amino acid about once every 100 times, which is a high error rate.

Figure 7 The aminoacyl-tRNA synthetase for isoleucine must discriminate against valine, which differs from isoleucine by only a methyl group
[Structures shown: isoleucine and valine]

How then does the synthetase for isoleucine achieve high accuracy? It does so using a proofreading mechanism analogous to the editing pocket we encountered in Chapter 9 for DNA polymerase. Isoleucyl-tRNA synthetase has an editing pocket that enables it to discriminate against valine in either of the two steps of charging. Recall that isoleucine is successively transferred to AMP and then to the tRNA. If mis-acylation takes place with valine instead of isoleucine, then the valine moiety, whether attached to AMP or to tRNA, is able to slip into an editing pocket that is only large enough to allow valine with its smaller side chain to enter (for simplicity, Figure 8 only shows editing at the second, tRNA aminoacylation step). Once in the editing pocket, the aminoacyl linkage is hydrolyzed, allowing the synthetase to begin a fresh charging cycle. Editing improves the accuracy of charging to less than one mis-charging event in 10,000 cycles of aminoacylation.
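The link between a binding free-energy difference and an error rate can be made concrete with the relation K = exp(ΔΔG°/RT). The sketch below is illustrative only: it assumes a temperature of 298 K (not stated in the text), treats the two selection steps as independent and multiplicative, and uses hypothetical function names.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def discrimination_ratio(ddg_kcal_per_mol: float, temp_k: float = 298.0) -> float:
    """Equilibrium preference for the cognate amino acid, K = exp(ddG / RT).

    temp_k = 298 K is an assumed temperature for illustration.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_k))


def editing_fold_improvement(error_before: float, error_after: float) -> float:
    """Fold reduction in error that editing must supply, assuming the binding
    and editing steps act independently so that their error rates multiply."""
    return error_before / error_after


print(round(discrimination_ratio(2.7)))  # close to the rule of thumb of about 100
print(round(discrimination_ratio(2.5)))  # somewhat lower for 2.5 kcal/mol
print(editing_fold_improvement(1 / 100, 1 / 10_000))
```

Under these assumptions, 2.7 kcal/mol corresponds to a ratio of roughly 100, 2.5 kcal/mol to somewhat less, and the editing pocket must lower the error rate by at least about 100-fold to reach fewer than one mistake in 10,000.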

Figure 8 Isoleucyl-tRNA synthetase has an editing pocket that discriminates against valine
[Top panel, isoleucine added (correct): isoleucine is too large to enter the editing pocket, so isoleucine remains attached to the tRNA. Bottom panel, valine added (incorrect): valine is small enough to enter the editing pocket, and the editing pocket removes valine from the tRNA. Labels: aminoacyl-tRNA synthetase, editing pocket, catalytic binding pocket, tRNA.]

The ribosome is a molecular machine

Arguably, the ribosome is the most extraordinary of all the molecular machines that mediate the processes of living systems. This will become self-evident as we proceed. The ribosome is composed of two subunits known as the large subunit and small subunit. The total mass of the ribosome is about 3,000 kilodaltons (1 kilodalton = 1,000 atomic mass units), about six times the size (~500 kilodaltons) of RNA polymerase. Each subunit is a complex of proteins and RNA molecules. The RNA molecules, which are known as ribosomal RNAs or rRNAs, are non-protein-coding RNAs, which as we will see are intimately involved in decoding mRNA and in protein synthesis. In bacteria, the small subunit consists of a 1,540-nucleotide-long rRNA and 21 proteins, whereas the large subunit contains two rRNAs, one of 120 nucleotides and the other of 2,900 nucleotides, and 31 proteins. The ribosomes of eukaryotes are somewhat larger and more complex.

Ribosomes contain three binding sites for tRNAs that are important landmarks in the process of translation (Figure 9). These are the A-site, the P-site, and the E-site, each of which spans the large and small subunits. The A-site is the entry site for aminoacyl-tRNAs. The P-site is the site at which the peptidyl-tRNA, the growing polypeptide chain attached to tRNA, is located. Finally, the E-site is the exit site from which tRNA that has handed off its cargo leaves the ribosome. As we shall see, all three sites participate dynamically in translation as one charged tRNA after another delivers amino acids for incorporation into the growing polypeptide chain.

Figure 9 Ribosomes consist of a large subunit and a small subunit and bind tRNAs at three sites
Shown here is the X-ray crystal structure of a bacterial ribosome. The large and small subunits are shown in cyan and green, respectively. Three tRNAs shown in purple, orange, and grey are bound to the A-, P-, and E-sites, respectively. The mRNA in this structure is occluded by the small subunit, but its location is indicated.
[Labels: large subunit, small subunit, E-site, P-site, A-site, mRNA binding site]

Peptide bond formation takes place in three steps

In overview, translation commences at the beginning of a protein-coding sequence (which we will soon refer to as an open reading frame) where the start codon is located and terminates at the end of the coding sequence, which is marked by one or more stop codons. Near the end of this chapter we will return to the molecular events that trigger the start and stop of translation. For now, we focus on the ribosome in steady state as it migrates down the mRNA progressively adding amino acids at the carboxyl terminus of the growing polypeptide chain. This phase of protein synthesis is known as elongation. As illustrated in the hypothetical example of Figure 10, each cycle of peptide bond formation during elongation takes place in three steps:

In step 1, a peptidyl-tRNA, that is, a tRNA with a peptide chain attached to it via an acyl linkage, sits in the P-site and an aminoacyl-tRNA enters the A-site. In Figure 10 the growing chain in the P-site is a tripeptide, NH2-methionine-threonine-arginine-COOH, with the arginine residue attached via its carboxyl group to its tRNA (tRNAArg). The peptidyl-tRNA is paired with its codon 5’-CGA-3’ in the mRNA via its anti-codon 3’-GCU-5’. Entering the A-site is tRNAGly aminoacylated with glycine. Entry into the A-site is mediated by pairing between the 5’-GGA-3’ glycine codon in the mRNA and the 3’-CCU-5’ anti-codon in the charged tRNA.

In step 2, a peptide bond (the main event!) is formed between the glycine in the A-site and the carboxyl-terminal amino acid, arginine, of the peptidyl chain in the P-site. Specifically, the bond forms between the free amino group of the glycine and the carbonyl carbon that is connected to the tRNA by an acyl linkage. As a consequence of peptide bond formation, the acyl linkage between the peptide and the tRNAArg in the P-site is broken and the peptide chain that is now one residue longer is transferred to the tRNAGly in the A-site. Thus, peptide bond formation involves the transfer of the growing chain from the tRNA in the P-site to the tRNA in the A-site.

In step 3, the ribosome translocates one codon unit (three nucleotides) along the mRNA in the 5’-to-3’ direction, thereby shifting the peptidyl-tRNAGly into the P-site, shifting the tRNAArg now freed of its cargo into the E-site and leaving the A-site vacant. Once in the E-site, the deacylated tRNAArg dissociates from the ribosome, leaving the E-site vacant. The now-empty A-site is ready to accept another charged tRNA in the next cycle of peptide bond formation.

Notice in our example that the peptide chain is growing in an NH2-to-COOH-terminal direction and that the mRNA is being translated in a 5’-to-3’ direction, in keeping with the directional rules we introduced in Chapter 8 and the 5’-to-3’ orientation of codons.

Figure 10 Peptide bond formation during elongation is a cyclical process that sequentially adds individual amino acids to the COOH-terminus of the growing polypeptide
Shown here are two cycles of elongation on the mRNA 5’-AUG ACG CGA GGA UUG GGA-3’. In the first cycle (top row), a glycine residue is added to the COOH-terminus of the growing polypeptide chain. In the second cycle (bottom row), a leucine residue is added. Notice that in translocation (step 3) the large subunit moves ahead first and is then joined by the small subunit.
Step 1: The aminoacyl-tRNA that matches the codon binds to the empty A-site.
Step 2: A peptide bond forms between the amino acid in the A-site and the growing peptide chain in the P-site. The ribosome translocates by three nucleotides, and the growing peptide chain is transferred onto the tRNA that was previously in the A-site.
Step 3: The small subunit shifts positions and the tRNA in the E-site is ejected. This resets the ribosome for another translation cycle.
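The three-step elongation cycle can be traced on the mRNA of Figure 10 with a small simulation. The sketch below is a deliberately simplified model: it assumes the first codon is the start codon already occupying the P-site, represents each tRNA only by the codon it reads, and ignores EF-Tu, EF-G, and the mechanics of termination (it simply stops at a stop codon). The function name `elongate` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
# Standard genetic code as in Figure 1 (one-letter codes, "*" = stop).
CODON_TABLE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def elongate(mrna: str):
    """Return the peptide (N- to C-terminus) and a per-cycle record of
    (E-site codon, P-site codon, peptide so far) after each translocation."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]
    p_site = codons[0]                    # initiator tRNA sits in the P-site
    peptide = [CODON_TABLE[p_site]]
    history = []
    for a_site in codons[1:]:             # the mRNA is read 5'->3'
        residue = CODON_TABLE[a_site]
        if residue == "*":                # stop codon: no tRNA enters the A-site
            break
        # Step 1: the aminoacyl-tRNA matching the A-site codon enters.
        # Step 2: the chain is transferred onto the A-site tRNA, growing by one.
        peptide.append(residue)
        # Step 3: translocation moves the P-site tRNA to E and the A-site tRNA to P.
        e_site, p_site = p_site, a_site
        history.append((e_site, p_site, "".join(peptide)))
    return "".join(peptide), history


FIGURE_10_MRNA = "AUGACGCGAGGAUUGGGA"
peptide, history = elongate(FIGURE_10_MRNA)
print(peptide)
for record in history:
    print(record)
```

The third record corresponds to the first cycle drawn in Figure 10: after translocation the deacylated tRNAArg (codon CGA) is in the E-site and the peptidyl-tRNAGly (codon GGA) carrying Met-Thr-Arg-Gly is in the P-site.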

Figure 11 The GTP-bound conformation of EF-Tu binds tightly to aminoacyl-tRNAs
Shown is the X-ray crystal structure of the GTP-bound conformation of EF-Tu bound to an aminoacyl-tRNA. Hydrolysis of GTP to GDP alters the shape of EF-Tu, causing its release from the aminoacyl-tRNA.
[Labels: GTP, EF-Tu, aminoacyl-tRNA]

Entry of charged tRNA into the A-site is mediated by Elongation Factor Tu with the expenditure of a molecule of GTP

In step 1 of the peptide bond formation cycle, a charged tRNA enters the A-site of the ribosome. Entry into the A-site does not, however, occur with free molecules of aminoacyl-tRNA. Rather, charged tRNAs are escorted into the A-site in a complex with a protein called Elongation Factor Tu or EF-Tu bound to a molecule of the guanine nucleotide GTP (EF-Tu·GTP) (Figure 11). Accuracy in protein synthesis demands that the correct charged tRNA enter the A-site as dictated by the codon exposed on the mRNA in the A-site. To ensure that the correct charged tRNA has entered the A-site, EF-Tu·GTP releases its cargo of charged tRNA if, and only if, correct pairing takes place between the anti-codon and the codon. If not, the complex of charged tRNA and EF-Tu·GTP simply diffuses away (Figure 12). If so, then the GTP is hydrolyzed before the complex diffuses away. Whether GTP hydrolysis takes place before the complex diffuses away is governed by the length of time that the charged tRNA remains in the A-site, which is in turn determined by whether there is a correct codon/anti-codon match. Hydrolysis causes a change in the conformation of the EF-Tu that causes the complex to dissociate, allowing free, charged tRNA to remain in the A-site. Thus, energy in the form of GTP is expended to ensure that only the correct charged tRNA is deposited into the A-site. Indeed, without this energy-dependent certification it would not be possible for the ribosome to achieve a high level of accuracy in protein synthesis.

We usually think of ATP as the energy currency of the cell, but step 1 of protein synthesis is an example of an energy-dependent process in which the currency is GTP rather than ATP. We will shortly encounter a second example.
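The idea that residence time in the A-site decides whether GTP is hydrolyzed can be expressed as a race between two events. The sketch below is a toy model, not taken from the chapter: it assumes that dissociation and GTP hydrolysis are independent first-order processes, in which case the chance that hydrolysis happens first is k_hyd / (k_hyd + k_off). The rate constants are invented purely for illustration and are not measured values.

```python
def probability_gtp_hydrolysis(k_hydrolysis: float, k_dissociation: float) -> float:
    """Chance that GTP is hydrolyzed before the EF-Tu complex leaves the A-site.

    For two competing first-order processes, the probability that one occurs
    first is its rate divided by the sum of the rates.
    """
    return k_hydrolysis / (k_hydrolysis + k_dissociation)


# Hypothetical rates (per second), chosen only to illustrate the principle:
K_HYDROLYSIS = 10.0    # same for every complex in this toy model
K_OFF_CORRECT = 1.0    # matched codon/anti-codon: long residence time
K_OFF_INCORRECT = 100.0  # mismatched pair: the complex leaves quickly

p_correct = probability_gtp_hydrolysis(K_HYDROLYSIS, K_OFF_CORRECT)
p_incorrect = probability_gtp_hydrolysis(K_HYDROLYSIS, K_OFF_INCORRECT)
print(p_correct, p_incorrect)
```

With these made-up numbers the correctly matched complex almost always stays long enough for hydrolysis, whereas the mismatched complex usually diffuses away first, which is the qualitative behavior shown in Figure 12.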

Figure 12 EF-Tu expends a molecule of GTP to increase the accuracy of translation
The left-hand column shows an incorrect aminoacyl-tRNA bearing methionine binding to a codon that encodes glycine. Since the codon and anti-codon do not match, the complex does not remain bound long enough for EF-Tu to have time to hydrolyze its bound GTP. Because it remains in its GTP-bound conformation, EF-Tu does not release the tRNA charged with methionine, and peptide bond formation does not occur. Instead, the methionine-charged tRNA and EF-Tu dissociate from the ribosome, leaving a vacant A-site to which a different aminoacyl-tRNA can bind. The right-hand column shows a correct match in which a glycine-charged tRNA binds to the A-site. Since the codon and anti-codon match, the complex remains in residence in the A-site long enough for EF-Tu to hydrolyze its bound GTP. The hydrolysis of GTP triggers a conformational change in EF-Tu, causing it to release the glycine-charged tRNA. EF-Tu then dissociates from the ribosome, leaving behind the glycine-charged tRNA in the A-site to take part in peptide bond formation.
[Left column, incorrect match: no GTP hydrolysis; aminoacyl-tRNA and EF-Tu dissociate from the ribosome; another aminoacyl-tRNA enters the A-site. Right column, correct match: GTP hydrolysis; EF-Tu dissociates and the aminoacyl-tRNA remains in the A-site; peptide bond formation.]

Peptide bond formation is mediated by nucleophilic attack of the free amino group of the incoming amino acid on the carbonyl carbon of the growing polypeptide chain

As we have seen, peptide bond formation takes place in step 2 of the cycle. What is the chemical mechanism by which the bond is formed, and what enzyme catalyzes the reaction? In peptide bond formation, a lone pair of electrons on the nitrogen of the free amino group of the amino acid in the A-site attacks the carbonyl carbon joining the peptide chain to the tRNA in the P-site (i.e., the carbonyl carbon at the C-terminus of the growing peptide chain) (Figure 13). Thus, in peptide bond formation nucleophilic attack replaces the acyl linkage between the carbonyl carbon and the 3’ hydroxyl of the tRNA with a peptide bond between the carbonyl carbon and the amino nitrogen. Recall that the aminoacyl linkage to the tRNA is a high-energy bond that was created with the expenditure of a molecule of ATP during tRNA charging. Thus, energy spent in the form of ATP hydrolysis during tRNA charging is harnessed on the ribosome to drive peptide bond formation without the input of any additional source of energy.

The enzyme that catalyzes peptide bond formation is known as peptidyl transferase. The catalytic center for the peptidyl transferase sits in the large subunit of the ribosome. In an amazing discovery, structural and biochemical experiments have shown that the peptidyl transferase catalytic center is largely, if not entirely, composed of the 2,900-nucleotide rRNA of the large subunit. Thus, an RNA molecule catalyzes peptide bond formation. The peptidyl transferase is therefore an example of an RNA enzyme or ribozyme, a topic to which we return in Chapter 13. Thus, at the heart of the ribosome is an RNA molecule that generates peptide bonds, the most fundamental feature of proteins.

Figure 13 During peptide bond formation the growing peptide is transferred from the peptidyl-tRNA to the aminoacyl-tRNA
Shown is the arrow-pushing mechanism for the peptide bond-forming reaction that takes place during translation. The free amino group on the amino acid attached to the aminoacyl-tRNA acts as a nucleophile that attacks the carbonyl carbon at the COOH-terminus of the polypeptide attached to the peptidyl-tRNA. The acyl linkage that connects the growing polypeptide to the peptidyl-tRNA is broken. As a consequence, a new peptidyl-tRNA is generated from the previous aminoacyl-tRNA. This reaction is also known as peptidyl transfer, as the entire peptide is transferred from the peptidyl-tRNA to the aminoacyl-tRNA.
[Scheme: peptidyl-tRNA + aminoacyl-tRNA → (peptide bond formation) → empty tRNA + new peptidyl-tRNA]

Box 1 Many antibiotics target the ribosome

Antibiotics kill bacteria by binding to and blocking the functions of various molecular machines in the cell. One of the most common targets of antibiotics is the ribosome. Examples of antibiotics that target the ribosome include erythromycin, kanamycin, streptomycin, and tetracycline. Figure 14 shows the X-ray crystal structure of erythromycin (green) bound to a bacterial ribosome. This particular antibiotic binds close to the site where the peptidyl transferase reaction takes place, and in doing so, inhibits protein synthesis. Importantly, erythromycin is able to discriminate between bacterial and human ribosomes; otherwise, it would be toxic.

Figure 14 Erythromycin binds to the ribosome and disrupts translation
The large ribosomal subunit is shown in cyan; rRNA is omitted for clarity, and only proteins are shown. The small ribosomal subunit is shown in green. The aminoacyl-tRNA is shown in magenta and the peptidyl-tRNA is shown in orange. The location of the peptidyl transferase reaction is indicated, with the two amino acid substrates shown in the box.
[Labels: erythromycin, aminoacyl-tRNA, peptidyl-tRNA, peptidyl transfer location]

Movement of the ribosome along the mRNA following peptide bond formation is driven by Elongation Factor G with the hydrolysis of a molecule of GTP

Finally, after the peptide bond has formed, the ribosome must translocate along the mRNA by three nucleotides so that the next codon can enter the A-site and the next cycle of translation can commence. How does this movement take place, and what is the source of energy that drives translocation? Movement is driven by a second elongation factor, Elongation Factor G or EF-G, that is also bound to a molecule of GTP (EF-G·GTP). Hydrolysis of GTP once again triggers a conformational change, this time in EF-G, that powers the movement of the ribosome to the next codon unit.

What then is the overall cost to the cell for producing a peptide bond? Each cycle of peptide bond formation expends two molecules of GTP (one each in steps 1 and 3) and one molecule of ATP. The ATP was expended in the earlier step of tRNA charging but is not drawn upon until peptide bond formation in step 2. Hence, the overall cost accounting for an elongating ribosome in steady state is three nucleoside triphosphate molecules per cycle of peptide bond formation.
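The cost accounting above scales linearly with the number of peptide bonds formed. The following sketch simply tabulates it; the function name is hypothetical, and the tally covers only steady-state elongation as described in the text (it leaves out any costs of initiation and termination).

```python
def elongation_cost(n_peptide_bonds: int) -> dict[str, int]:
    """Nucleoside triphosphates spent during steady-state elongation."""
    gtp_ef_tu = n_peptide_bonds   # step 1: delivery of charged tRNA by EF-Tu
    gtp_ef_g = n_peptide_bonds    # step 3: translocation driven by EF-G
    atp_charging = n_peptide_bonds  # spent earlier, during tRNA charging
    return {
        "GTP": gtp_ef_tu + gtp_ef_g,
        "ATP": atp_charging,
        "total_NTP": gtp_ef_tu + gtp_ef_g + atp_charging,
    }


# The two elongation cycles drawn in Figure 10 (adding glycine, then leucine):
print(elongation_cost(2))
```

For the two cycles of Figure 10 this gives four GTP and two ATP, or six nucleoside triphosphates in total.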

The large and small subunits of the ribosome undergo a cycle of association and dissociation during each round of translation

As we have seen, during the elongation phase of protein synthesis, the ribosome, consisting of a large and a small subunit, moves along the coding sequence in the mRNA, translating one codon after another. However, the ribosome does not remain intact when it is not in the process of translating an mRNA. Rather, its two subunits dissociate from each other when the ribosome reaches the end of a protein-coding sequence and re-associate at the beginning of a protein-coding sequence when a new round of translation is initiated. This cycle of dissociation and re-association is known as the ribosome cycle.

How do the subunits know where to assemble and initiate protein synthesis? The answer is different for bacteria and eukaryotes. In bacteria, the beginning of a protein-coding sequence is marked by a 5’-AUG-3’ preceded at a short distance by a sequence known as the ribosome binding site that is complementary to a sequence in the rRNA component of the small subunit. It is the combination of the ribosome binding site and the nearby 5’-AUG-3’ that represents the punctuation mark for the beginning of translation. As we have seen, 5’-AUG-3’ is the codon for methionine when it is located within a protein-coding sequence. But when it is preceded by the ribosome binding site, it takes on the function of serving as the start codon. The process of initiation commences with the binding of the small subunit to the ribosome binding site via base pairing to the complementary sequence in the rRNA, as orchestrated by three proteins known as initiation factors.

As a start codon, 5’-AUG-3’ specifies a modified form of methionine (harboring a one-carbon unit known as a formyl group covalently attached to the amino group; Figure 15), and protein synthesis is indeed initiated with this modified methionine, which is delivered to the small subunit by a specialized initiator tRNA. (The modification is not retained. Rather, it is removed enzymatically after protein synthesis commences, and mature proteins do not have the formyl group or often even a methionine at their N-termini.) Finally, the large subunit binds to the mRNA-bound small subunit to form a complete initiation complex that is ready to commence translation.

Figure 15 Translation in bacteria initiates with an initiator tRNA charged with formylated methionine
The formyl group (-CHO; blue) is covalently attached to methionine’s amino group.

Initiation works differently in eukaryotes. In eukaryotes the small subunit, once again under the direction of initiation factors, recognizes and binds to the cap (Chapter 10) at the 5’ end of the mRNA. The small subunit then slides down the mRNA in a 5’-to-3’ direction while scanning for a 5’-AUG-3’. The first 5’-AUG-3’ it encounters serves as the start codon. Once a start codon is encountered, the large subunit is recruited to the mRNA-bound small subunit by initiation factors to create the initiation complex. In eukaryotes, the start codon simply specifies methionine rather than a modified form of methionine.

Finally, we come to the dissociation of the ribosome when it reaches the end of a protein-coding sequence. The end of the coding sequence is marked by a stop codon (5’-UAA-3’, 5’-UGA-3’, or 5’-UAG-3’) or sometimes more than one stop codon in succession. Stop codons are not recognized by tRNAs. Instead, stop codons are recognized by release factors, which catalyze the release of the completed polypeptide chain from the peptidyl-tRNA and the dissociation and release of the ribosome from the mRNA (Figure 16). Thus, the cycle of association and dissociation repeats with each round of translation, with the small and large subunits assembling into an initiation complex at the start of a protein-coding sequence and dissociating back into free subunits at the end of a coding sequence.

Figure 16 Release factors recognize stop codons and terminate translation
Panels: (1) the release factor binds; (2) the peptide chain is hydrolyzed from the peptidyl-tRNA; (3) the ribosome dissociates from the mRNA. The example mRNA is 5’-AUG ACG CGA GGA UGA GGA-3’, in which UGA is the stop codon.

Lastly, it is important to note that the start site of translation is not the same as the start site of transcription. Indeed, the start codon is preceded by untranslated sequences that extend upstream to the 5’ end of the mRNA, which corresponds to the transcription start site (position +1 in the transcription unit). As we have seen, this upstream region contains the ribosome binding site in bacteria and, in eukaryotes, the untranslated sequences downstream of the 5’ cap that the small subunit scans. Likewise, the stop codon does not correspond to the 3’ end of an mRNA. Rather, the 3’ end of the mRNA extends past the coding sequence and contains untranslated sequences.
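The two start-site rules and the role of the stop codon can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real tool: the ribosome binding site is represented by the consensus sequence AGGAGG, the allowed spacing between it and the start codon (4 to 12 nucleotides) is an illustrative choice, the 5’ cap and the initiation and release factors are not modeled, and all function names are hypothetical. The example mRNA places the coding sequence from Figure 16 between invented untranslated sequences.

```python
from typing import Optional, Tuple

# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

RIBOSOME_BINDING_SITE = "AGGAGG"  # assumption: consensus used for illustration


def find_start_bacterial(mrna: str, rbs: str = RIBOSOME_BINDING_SITE,
                         min_gap: int = 4, max_gap: int = 12) -> Optional[int]:
    """Bacterial rule: an AUG a short distance downstream of the binding site."""
    pos = mrna.find(rbs)
    while pos != -1:
        rbs_end = pos + len(rbs)
        for start in range(rbs_end + min_gap, rbs_end + max_gap + 1):
            if mrna[start:start + 3] == "AUG":
                return start
        pos = mrna.find(rbs, pos + 1)
    return None


def find_start_eukaryotic(mrna: str) -> Optional[int]:
    """Eukaryotic rule: scan 5'-to-3' and take the first AUG encountered."""
    pos = mrna.find("AUG")
    return pos if pos != -1 else None


def translate_from(mrna: str, start: int) -> Tuple[str, Optional[int]]:
    """Read triplets from `start`; return the peptide and the stop codon index."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: no tRNA, translation terminates
            return "".join(peptide), i
        peptide.append(amino_acid)
    return "".join(peptide), None  # ran off the end without an in-frame stop


# Invented 5' untranslated sequence + coding sequence of Figure 16 + invented 3' end.
mrna = "GCUAGGAGGUCUACC" + "AUGACGCGAGGAUGA" + "GGACUU"
start = find_start_bacterial(mrna)
peptide, stop = translate_from(mrna, start)
print(peptide)          # MTRG
print(mrna[:start])     # 5' untranslated sequence: GCUAGGAGGUCUACC
print(mrna[stop + 3:])  # 3' untranslated sequence: GGACUU
```

In this example both rules select the same start codon. If an additional 5’-AUG-3’ were placed upstream of the binding site, the scanning rule would select that first AUG, whereas the bacterial rule would still select the AUG that follows the binding site.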

The start codon sets the reading frame for protein-coding sequences

As we have seen, the formation of the initiation complex at the beginning of a protein-coding sequence is a complicated process involving multiple protein factors and recognition cues in the mRNA. Why is it so complicated?

Figure 17 mRNA sequences can be decoded in three reading frames
Shown is the repeating sequence 5’-…ACGACGACGACG…-3’ translated in three different reading frames, which are set by the placement of the start codon.

Reading frame 1   5’ ...AUGACGACGACGACGACGACGACG... 3’     Met Thr Thr Thr Thr Thr Thr Thr
Reading frame 2   5’ ...AUGGACGACGACGACGACGACGACG... 3’    Met Asp Asp Asp Asp Asp Asp Asp
Reading frame 3   5’ ...AUGCGACGACGACGACGACGACGACG... 3’   Met Arg Arg Arg Arg Arg Arg Arg

Why do bacteria and eukaryotes go to such lengths to ensure that protein synthesis is initiated with precision at the start codon? The answer is that the start codon not only marks the beginning of a protein-coding sequence but also sets the reading frame in which all the successive triplets will be translated. Without a start codon, the mRNA is simply a string of nucleotides that could in principle be translated in any of three possible reading frames. Just which frame is the correct one is set by the start codon. That is, each successive and immediately adjacent triplet after the start codon is in the same reading frame as the start codon and represents the correct coding sequence. If, for example, the position of the start codon in the RNA were shifted by one or two nucleotides, then the downstream coding sequence would be completely altered. Consider, for example, the three sequences shown in Figure 17. All three are the same except for the positions of the 5’-AUG-3’ start codons, which set three different reading frames. In this hypothetical example, three completely different amino acid sequences are translated from the same RNA sequence depending on the frame of the start codon. These considerations explain why initiation must take place with single-nucleotide precision; otherwise, the ribosome would generate the wrong sequence of amino acids.

Note also that the reading frame determines whether the ribosome will encounter a stop codon. Stop codons are only recognized as such if they are in the same reading frame as the amino-acid-specifying codons that precede them in a protein-coding sequence. For these reasons, protein-coding sequences are often referred to as open reading frames, that is, a stretch of base triplets that lacks, or extends up to, a stop codon in the same frame. Thus, exons in the pre-mRNAs of eukaryotes are open reading frames. Removal of an intron between two exons merges the exons into a single, longer open reading frame.
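The dependence of the translated sequence on the reading frame can be checked directly. The following is a minimal sketch in Python, with a hypothetical helper named `translate_frame`, that reads the repeating sequence of Figure 17 in each of its three frames and then shows that a stop codon is only acted upon when it lies in the frame being read. Amino acids are written in one-letter code (T = Thr, D = Asp, R = Arg, M = Met), and the second example sequence is invented for illustration.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(rna: str, offset: int = 0) -> str:
    """Translate successive triplets starting at `offset`, halting at a stop codon."""
    peptide = []
    for i in range(offset, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # only an in-frame stop codon ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# The repeating sequence of Figure 17, read in each of its three frames.
repeat = "ACG" * 8
for offset in range(3):
    print(offset, translate_frame(repeat, offset))
# 0 TTTTTTTT
# 1 RRRRRRR
# 2 DDDDDDD

# The three mRNAs of Figure 17, each read from its own start codon.
print(translate_frame("AUG" + "ACG" * 7))    # MTTTTTTT
print(translate_frame("AUGG" + "ACG" * 7))   # MDDDDDDD
print(translate_frame("AUGCG" + "ACG" * 7))  # MRRRRRRR

# An invented sequence with an out-of-frame UGA (positions 5-7) and an in-frame UAA.
rna = "AUGCUGACGUAA"
print(translate_frame(rna, 0))  # MLT  (UGA is skipped; UAA terminates)
print(translate_frame(rna, 1))  # C    (UGA is now in frame and terminates)
```

The same nucleotides therefore yield a run of threonines, arginines, or aspartates depending solely on where reading begins, and whether a given UGA acts as a stop codon depends on the frame in which it is encountered.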
Because splicing merges exons into one continuous open reading frame, the removal of introns from pre-mRNAs must, as stated in Chapter 10, also take place with single-nucleotide precision. If not, the ribosome would translate the mRNA downstream of the splice junction in an incorrect frame, resulting in a completely different sequence of amino acids. Finally, the concept of an open reading frame also explains why mutations that insert or delete a single base pair in a coding sequence (frame shift mutations) have profound effects on the function of a protein as compared to mutations that simply replace one base pair with another. Insertions and deletions change the reading frame of every downstream codon, whereas replacement of one base pair with another alters only a single codon.

Summary

Information in the form of the linear order of bases is translated into sequences of amino acids according to the genetic code. The code uses triplet units that specify amino acids. Of the 64 possible triplets, 61 specify amino acids, whereas three are stop codons. One triplet, 5’-AUG-3’, is both the codon for methionine within protein-coding sequences and a start codon that marks the beginning of a coding sequence. The code is degenerate in that some amino acids are specified by more than one synonymous codon. Codons are oriented 5’-to-3’ in the mRNA, in keeping with the direction of translation.

Codons are deciphered by tRNA molecules, which act as adaptors between codons and amino acids. tRNAs recognize codons via pairing with their anti-codons. A particular amino acid is covalently attached to the 3’ end of its cognate tRNA by an aminoacyl-tRNA synthetase in a process termed charging. A single tRNA synthetase is responsible for charging all cognate tRNAs for each amino acid. Charging takes place in a two-step reaction and involves the hydrolysis of a molecule of ATP, thereby creating a high-energy acyl bond between the carbonyl carbon of the amino acid and the protruding 3’ hydroxyl of the tRNA. Some synthetases have an editing pocket for removing incorrect (non-cognate) amino acids derived from errors in acylation.

The ribosome is a complex of one large and one small subunit, which consist of rRNAs and multiple proteins. The ribosome has A-, P- and E-sites for binding tRNAs. During the elongation phase of protein synthesis, charged tRNAs are escorted into the A-site in a complex with EF-Tu·GTP, which certifies correct codon/anti-codon pairing by hydrolyzing GTP and dissociating from the ribosome. Peptide bond formation takes place by transfer of the growing polypeptide chain in the P-site to the free amino group of the amino acid in the A-site, with a lone pair of electrons on the amino nitrogen attacking the carbonyl carbon. Thus, amino acids are added at the COOH-terminus, and the polypeptide chain grows in an NH2-to-COOH-terminal direction while translation proceeds in a 5’-to-3’ direction on the mRNA. Peptide bond formation is a favorable reaction because the aminoacyl linkage to the tRNA is higher in free energy than the peptide bond. After peptide bond formation, the ribosome moves one codon unit in the 3’ direction powered by EF-G and hydrolysis of GTP. Translocation shifts the tRNA that was in the P-site and that now lacks a peptide cargo into the E-site, where it dissociates from the ribosome. The overall cost of forming a peptide bond is two GTPs and one ATP, which was expended during charging to create the high-energy acyl linkage.

The ribosome undergoes cycles of association and dissociation during each round of protein synthesis. In bacteria, protein synthesis is initiated by the binding of the small subunit to the ribosome binding site upstream of the start codon via a complementary sequence in the rRNA. Initiation factors recruit the large subunit and create an initiation complex at the start site. In eukaryotes, the small subunit binds at the 5’ cap and scans downstream until it encounters the first start codon. Initiation factors then recruit the large subunit and assemble the initiation complex. Translation terminates at stop codons, which are recognized by release factors that release the completed polypeptide, completing the ribosome cycle. Thus, protein synthesis involves two cycles: cycles of peptide bond formation during elongation that are embedded within cycles of ribosome assembly and disassembly on the mRNA.

In addition to marking the beginning of a protein-coding sequence, the start codon sets the reading frame for downstream codons. The protein-coding sequence is therefore also known as an open reading frame.