
Gene Expression and Regulation 4


© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION

David G. Bear Department of Cell Biology and Physiology, University of New Mexico School of Medicine, Albuquerque, NM

TRANSLATION OF A MESSENGER RNA (mRNA) isolated from a cell in the salivary gland of the insect Chironomus tentans in this digitally colored transmission electron micrograph. Translation is the process by which ribosomes (blue particles) translocate along the mRNA (pink strand) and read its sequence in three-nucleotide units called codons. As the ribosome moves to each codon on the mRNA, it inserts the appropriate amino acid into the growing polypeptide chain (green filaments). (© Dr. Elena Kiseleva/SPL/Photo Researchers, Inc.)

CHAPTER OUTLINE

4.1 Introduction
4.2 Genes are transcription units
4.3 Transcription is a multistep process directed by DNA-dependent RNA polymerase
4.4 RNA polymerases are large multisubunit protein complexes
4.5 Promoters direct the initiation of transcription
4.6 Activators and repressors regulate transcription initiation
4.7 Transcriptional regulatory circuits control eukaryotic cell growth, proliferation, and differentiation
4.8 The 5′ and 3′ ends of mature mRNAs are generated by RNA processing
4.9 Terminators direct the end of transcription elongation
4.10 Introns in eukaryotic pre-mRNAs are removed by the spliceosome
4.11 Alternative splicing generates protein diversity
4.12 Translation is a three-stage process that decodes an mRNA to synthesize a protein
4.13 Translation is catalyzed by the ribosome
4.14 Translation is guided by a large number of protein factors that regulate the interaction of aminoacylated tRNAs with the ribosome
4.15 Translation is controlled by the interaction of the 5′ and 3′ ends of the mRNA and by translational repressor proteins
4.16 Some mRNAs are translated at specific locations within the cytoplasm
4.17 Sequence elements in the 5′ and 3′ untranslated regions determine the stability of an mRNA
4.18 Noncoding RNAs are important regulators of gene expression
4.19 What's next?
4.20 Summary
References


4.1 Introduction

Key concepts
• DNA functions as the template for the synthesis of RNA by RNA polymerase (transcription). Messenger RNA (mRNA) directs the synthesis of proteins by the ribosome (translation).
• The genetic code refers to the set of 64 base triplets (codons) that are read by the ribosome during translation.
• Each of 61 codons codes for 1 of the 20 common amino acids, whereas 3 codons cause the ribosome to stop translation.
• In prokaryotes, transcription and translation of mRNA occur concomitantly in the same region of the cell.
• In eukaryotes, transcription and mRNA processing occur concomitantly inside the nucleus, whereas translation occurs subsequent to the export of the mRNA to the cytoplasm.

Modern biology began in the early part of the 20th century with the direct examination of the structure and function of cells, first by improved microscopy techniques that permitted the detailed study of cell structure and then by high-speed centrifugation methods that facilitated biochemical isolation and characterization of cytoplasmic and nuclear subcellular fractions. Geneticists focused on the relationship between the dynamic structure of chromosomes within the nucleus and the mechanisms of genetic inheritance, which culminated with the elucidation of the structures of DNA and RNA and an understanding of the genetic code. The scientific revolution of molecular biology that took place in the mid-20th century provided a framework known as the Central Dogma of Molecular Biology: DNA functions as the template for the synthesis of RNA (transcription) and RNA directs the synthesis of protein by ribosomes (translation), as depicted in FIGURE 4.1.

FIGURE 4.1 The Central Dogma of Molecular Biology. The Central Dogma states that DNA serves as the template for the synthesis of RNA by the enzyme RNA polymerase (transcription) and the RNA transcript is decoded by the ribosome to synthesize a polypeptide (translation). The enzyme reverse transcriptase can synthesize DNA from RNA. Reverse transcription is a rare event in normal cells. Many RNA viruses contain a reverse-transcriptase gene in their genome.
[Diagram: DNA (replication) → RNA by transcription; RNA → DNA by reverse transcription; RNA (replication) → protein by translation.]

Initially, molecular biology focused on reconciling the classical concept of the gene (the unit of information controlling inherited traits) with the more modern tenets of the Central Dogma. These studies led to the One Gene-One Protein Hypothesis, which posited a gene to be the unit of genetic information that directs the synthesis of a specific mRNA, which in turn encodes an individual protein. Much of the early work on the Central Dogma was carried out using a nonpathogenic strain of the bacterium Escherichia coli because of the ease of genetic manipulation and the ability to easily fractionate the bacterial cellular components for biochemical analysis. Studies with E. coli provided our first understanding of the mechanisms of transcription and translation, with details of the structural and functional properties of RNA polymerases that transcribe the genome and ribosomes that translate mRNA to make protein. This work also revealed that transcription and translation are coupled in bacteria: the mRNA is translated concomitantly as the newly synthesized RNA emerges from the active site of RNA polymerase (FIGURE 4.2).

FIGURE 4.2 Gene expression in prokaryotes. Prokaryotes do not have a cell nucleus. Thus, transcription and translation take place in the same cellular compartment and occur concomitantly.
[Diagram: DNA is transcribed into RNA while a ribosome translates the emerging mRNA.]


In the latter half of the 20th century, investigations of gene expression in several eukaryotic model systems, including the yeast Saccharomyces cerevisiae, the insect Drosophila melanogaster, and the human HeLa cultured cell line, showed that many fundamental mechanisms of gene expression are conserved among all organisms; however, there are some basic differences between prokaryotes and eukaryotes, particularly with regard to the posttranscriptional pathways of mRNA biosynthesis (FIGURE 4.3).

FIGURE 4.3 Gene expression in eukaryotes. Eukaryotes contain a cell nucleus where transcription and RNA processing occur. The mRNA transcript is exported from the nucleus to the cytoplasm where translation takes place.
[Diagram: in the nucleus, DNA (exon 1, intron, exon 2) is transcribed into pre-mRNA, and splicing removes the intron to give mRNA; after transport to the cytoplasm, ribosomes and tRNAs translate the mRNA into protein.]

One of the most surprising discoveries was that most protein-coding genes in higher eukaryotes are interrupted with DNA segments, termed introns, that are transcribed into RNA but are removed before translation. Ribonucleoprotein (RNP) complexes, referred to as spliceosomes, were found to excise the introns from the original mRNA, or pre-mRNA, and to ligate the remaining expressed mRNA sequences, termed exons, together to form the mature mRNA. In addition to splicing, eukaryotic mRNAs undergo other posttranscriptional modifications, including the addition of a 7-methylguanosine (7-MeG) cap to the 5′ end and a poly(A) tail comprising 100 to 300 adenosine ribonucleotides on the 3′ end. Selected mRNAs can also have additional bases added posttranscriptionally in a process referred to as mRNA editing. Analogous to transcription-translation coupling in prokaryotes, most eukaryotic mRNA processing reactions are coupled to transcription.

Some of the greatest similarities, as well as some of the biggest differences, between prokaryotic and eukaryotic cells are at the final step of gene expression: translation. Prokaryotes lack a cell nucleus, and the translation complex (ribosomes and associated factors) is in direct contact with the transcription apparatus on the chromosomal DNA, whereas in eukaryotes the transcription and RNA processing reactions occur in the nucleus and translation occurs in the cytoplasm. This separation of RNA biosynthesis from translation necessitates the packaging of mRNA into an mRNA RNP (mRNP) complex that then must be exported through the pores of the nuclear envelope into the cytoplasm before translation. In spite of the extra steps between transcription and translation in eukaryotes, the prokaryotic and eukaryotic translational machineries are very similar. In fact, some of the highest evolutionary conservation of macromolecular structure and function among species is found in ribosomes and associated translational factors.

The information contained in a mature mRNA for making a protein is decoded by the ribosome. The ribosome reads the mRNA sequence in three-nucleotide units referred to as codons. Because there are four different nucleotide bases in DNA and RNA (A, G, C, and T, with U replacing T in RNA), there are 4 × 4 × 4 = 64 possible codons (FIGURE 4.4). Each codon specifies the insertion by the ribosome of a specific amino acid into the protein. Twenty amino acids are commonly found in proteins; thus, some amino acids are specified by more than one codon. In general, codons that encode the same amino acid differ by only one base. The tendency for amino acids with chemically similar functional groups to be represented by related codons minimizes the effects of mutations by increasing the likelihood that a single random base change will result in substitution by an amino acid with similar molecular characteristics. For example, a mutation of CUC to CUG has no effect, because both codons represent leucine.
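The codon arithmetic described above can be checked with a short sketch. It assumes the standard genetic code, written as a 64-letter string of one-letter amino acid codes in U, C, A, G order with "*" marking a stop codon; the names `CODON_TABLE`, `STOP_CODONS`, and `is_synonymous` are illustrative and do not come from the chapter.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # RNA bases in the conventional codon-table order

# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon. (Assumption: standard code, no organism-specific variants.)
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Enumerate all 4 x 4 x 4 triplets and pair each with its amino acid.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Number of codons per amino acid (the degeneracy of the code).
DEGENERACY = Counter(CODON_TABLE[c] for c in SENSE_CODONS)


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """Return True when both codons specify the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


if __name__ == "__main__":
    print(len(CODON_TABLE), "codons in total")
    print(len(SENSE_CODONS), "sense codons,", len(STOP_CODONS), "stop codons")
    print(len(DEGENERACY), "amino acids encoded")
    print("CUC -> CUG synonymous:", is_synonymous("CUC", "CUG"))
```

Because 61 sense codons are shared among 20 amino acids, most amino acids necessarily have more than one codon; in this table leucine (L) has six, which is why a CUC to CUG change leaves the protein unchanged.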


A mutation of CUU to AUU results in the replacement of leucine with isoleucine, which has a side chain with similar chemical properties and therefore has little if any effect on the protein's structure or function.

With only a few exceptions, the genetic code is the same for all organisms. In most cases, the first codon that specifies a protein sequence is AUG, which specifies the amino acid methionine and is referred to as the start codon. Three codons (UAA, UAG, and UGA) do not code for any amino acid but instead cause the ribosome to terminate translation; each of these is referred to as a stop codon. The ribosome translocates along the mRNA in a 5′ to 3′ direction reading the codons in frame. The region between the start codon, AUG, and one of the stop codons is referred to as the open reading frame (ORF) and is generally synonymous with a peptide-coding region. The translation process carried out by the ribosome entails facilitating the base pairing of the anticodon loop of a transfer RNA (tRNA) carrying an amino acid with the codon sequence in the mRNA, followed by the transfer of the growing peptide chain from the tRNA that previously docked with the mRNA onto the amino acid carried by the incoming tRNA, in what is known as the peptidyl transferase reaction.

First position                Second position                 Third position
(5′ end)       U          C          A          G            (3′ end)
U           UUU Phe    UCU Ser    UAU Tyr    UGU Cys          U
            UUC Phe    UCC Ser    UAC Tyr    UGC Cys          C
            UUA Leu    UCA Ser    UAA Stop   UGA Stop         A
            UUG Leu    UCG Ser    UAG Stop   UGG Trp          G
C           CUU Leu    CCU Pro    CAU His    CGU Arg          U
            CUC Leu    CCC Pro    CAC His    CGC Arg          C
            CUA Leu    CCA Pro    CAA Gln    CGA Arg          A
            CUG Leu    CCG Pro    CAG Gln    CGG Arg          G
A           AUU Ile    ACU Thr    AAU Asn    AGU Ser          U
            AUC Ile    ACC Thr    AAC Asn    AGC Ser          C
            AUA Ile    ACA Thr    AAA Lys    AGA Arg          A
            AUG Met    ACG Thr    AAG Lys    AGG Arg          G
            (Start)
G           GUU Val    GCU Ala    GAU Asp    GGU Gly          U
            GUC Val    GCC Ala    GAC Asp    GGC Gly          C
            GUA Val    GCA Ala    GAA Glu    GGA Gly          A
            GUG Val    GCG Ala    GAG Glu    GGG Gly          G
[Color key in the original figure: start codon; stop codon; nonpolar side chain; uncharged polar side chain; charged polar side chain.]
FIGURE 4.4 The genetic code. The genetic code is nearly universal among all species. The table shows the 64 codons, 61 of which specify 1 of the 20 naturally occurring amino acids commonly found in proteins, whereas 3 codons do not code for an amino acid but serve as signals for the ribosome to end translation (stop codons). The codon AUG codes for methionine and also serves as the initiation codon.

An important difference between prokaryotic and eukaryotic mRNAs is the organization of the protein-coding sequences. Many genes in bacteria that code for proteins in the same functional pathway are tandemly arrayed as a single transcription unit called an operon, which yields a single large mRNA transcript carrying multiple ORFs separated by short spacer regions, as shown in FIGURE 4.5. Prokaryotic mRNAs with multiple ORFs are said to be polycistronic. Also, in some prokaryotic genes, it is possible to find overlapping translational ORFs that yield two entirely different proteins of different amino acid sequence and length from the same region of the genome. Thus, due to multiple protein-coding sequences residing in operons and overlapping ORFs, the number of proteins encoded by relatively compact prokaryotic genomes can be quite large. By contrast, most (but not all) eukaryotic mRNAs contain one primary functional ORF subsequent to splicing and are said to be monocistronic (FIGURE 4.6). However, because most eukaryotic pre-mRNAs are alternatively spliced with alternative ORFs generated, multiple proteins may be encoded by the same gene sequence.

FIGURE 4.5 mRNA structure in prokaryotes. Most prokaryotic mRNAs contain several tandem open reading frames coding for proteins and are said to be "polycistronic." The distance between the stop codon of one coding region and the start codon of the next coding region can vary from a one-base overlap (−1) up to about 40 bases.
[Diagram: a polycistronic mRNA drawn 5′ to 3′ with two coding regions, each bounded by a start and a stop codon. The 5′ UTR precedes the initiation codon, the 3′ UTR follows the termination codon, and the intercistronic distance varies from −1 to +40 bases.]

In addition to the protein-coding region, most mRNAs in both prokaryotes and eukaryotes contain sequences located before (upstream) and after (downstream) the protein-coding regions, referred to as the
5′ and 3′ untranslated regions (UTRs), respectively. 5′ UTRs often encode signals that regulate translation initiation efficiency at the start codon, whereas 3′ UTRs often contain sequence elements that regulate translational initiation efficiency and mRNA stability. The ability of 3′ UTR sequences to enhance translation initiation suggests that proteins bound at the 5′ and 3′ ends of the mRNA mediate the formation of circularized mRNA (see 4.15 Translation is controlled by the interaction of the 5′ and 3′ ends of the mRNA and by translational repressor proteins). Translational repression and mRNA stability are regulated by specific protein-binding sites within the 5′ and 3′ UTRs and by RNA-binding sites for small noncoding RNAs called microRNAs (miRNAs), discussed also in 4.15.

FIGURE 4.6 mRNA structure in eukaryotes. Eukaryotic mRNAs must be processed from the precursor mRNA (pre-mRNA) to form the mature mRNA. Processing includes removal of the introns (splicing) as well as the addition of the 7-methylguanosine (7-MeG) cap and the poly(A) tail. In general, once a mature mRNA is formed, only a single protein is produced. The AUG start codon and the stop codon can be located in any of the exons.
[Diagram: the pre-mRNA (5′ cap, exon 1, intron 1, exon 2, intron 2, exon 3, 3′ poly(A) tail) is spliced into the mature mRNA (5′ cap, exon 1, exon 2, exon 3, poly(A) tail). The start codon (AUG) marks the translation start site and the stop codon (UAA, UAG, or UGA) marks the translation stop site; the 5′ UTR and 3′ UTR flank the coding region.]

The goal of this chapter is to present an overview of the basic molecular mechanisms of how genes are expressed and briefly describe some of the most important strategies the cell uses for regulation of gene expression. Gene expression behaves as an integrated system, which is self-organized into cellular membrane-bound compartments and nonmembranous subcompartments. This dynamic organization facilitates the regulation of gene expression during cell growth, proliferation, differentiation, and development and in response to nutritional and physical stress or changes in the extracellular environment.

Concept and Reasoning Check
1. List the steps in the expression of a protein-encoding gene in a eukaryotic cell and indicate the location within the cell where each step takes place. What could be the advantages of performing these steps in separate cellular locations?

4.2 Genes are transcription units

Key concepts
• The number of genes within a genome is generally a function of the complexity of both the structure and life cycle of an organism.
• Eukaryotic cells have large nuclear genomes, as well as smaller genomes contained within the cytoplasmic organelles (mitochondria and chloroplasts).
• Most RNA transcription within eukaryotic cells generates RNA species that do not encode proteins but have other important functions. Thus, the most general definition of a gene is a segment of the genome that encodes a functional transcript.

Over the past two decades, the sequencing and functional analysis of prokaryotic and eukaryotic genomes, known as genomics, has led to an evolving definition of the term "gene." In this section we focus on both the structural and functional definitions of a gene and how genes are organized within the genome.

A surprising finding from genomic studies is that the exact number of genes in various organisms is often difficult to determine from the DNA sequence alone in the absence of information about the RNA transcripts and protein products. In prokaryotic organisms such as mycoplasma and bacteria, the genomes are quite compact, ranging in size from 0.6 to 7 Mb and containing 500 to 8,000 genes. By comparison, the number of genes in eukaryotic genomes is estimated to be ∼6,000 in yeasts, ∼22,000 in humans, and ∼30,000 in rice (FIGURE 4.7). The exact number of proteins encoded by the genomes of higher eukaryotes is still unknown but is estimated to be 3 to 10 times more than the number of genes. In higher eukaryotes, most protein-coding genes encode mRNAs that are alternatively spliced to give rise to multiple protein species from a single gene. In some cases the proteins derived from alternative splicing are related in structure and function, whereas in other cases the proteins have significantly different sequences and functions. In addition, alternative transcriptional start sites, alternative polyadenylation sites, and alternative translational start and stop sites can lead to multiple gene products (RNAs and proteins) from a single gene. Thus, the size of a genome and the number of proteins encoded by the genome are not always correlated.
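The estimates quoted above can be turned into a small back-of-the-envelope calculation. The sketch below uses only numbers stated in this section (∼22,000 human genes, a 3- to 10-fold protein-to-gene ratio, and the prokaryotic ranges of 0.6 to 7 Mb and 500 to 8,000 genes) plus the human genome size of 3,300 Mb listed in Figure 4.7. Pairing the smallest genome with the smallest gene count, and the largest with the largest, is an assumption made purely for illustration; the function names are hypothetical.

```python
def protein_estimate(gene_count, low_factor=3, high_factor=10):
    """Return the (low, high) estimate of distinct proteins for a gene count,
    using the 3- to 10-fold range quoted in the text."""
    return gene_count * low_factor, gene_count * high_factor


def genes_per_mb(gene_count, genome_size_mb):
    """Return the average gene density in genes per megabase."""
    return gene_count / genome_size_mb


if __name__ == "__main__":
    low, high = protein_estimate(22_000)
    print(f"Estimated human proteins: {low:,} to {high:,}")

    # Assumed pairings of the range endpoints given in the text.
    print(f"Small prokaryote: {genes_per_mb(500, 0.6):.0f} genes/Mb")
    print(f"Large prokaryote: {genes_per_mb(8_000, 7):.0f} genes/Mb")

    # Human genome size taken from Figure 4.7 (3,300 Mb).
    print(f"Human: {genes_per_mb(22_000, 3_300):.1f} genes/Mb")
```

Under these assumptions the human genome carries roughly two orders of magnitude fewer genes per megabase than a compact prokaryotic genome, which is one way to see why genome size is a poor predictor of gene or protein number.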


Species                      Genome (Mb)   Genes      Lethal loci
Mycoplasma genitalium        0.58          470        ~300
Rickettsia prowazekii        1.11          834
Haemophilus influenzae       1.83          1,743
Methanococcus jannaschi      1.66          1,738
B. subtilis                  4.2           4,100
E. coli                      4.6           4,288      1,800
S. cerevisiae                13.5          6,034      1,090
S. pombe                     12.5          4,929
A. thaliana                  119           25,498
O. sativa (rice)             466           ~30,000
D. melanogaster              165           13,601     3,100
C. elegans                   97            18,424
H. sapiens                   3,300         ~25,000
FIGURE 4.7 Genomes and gene numbers. The genome sizes and gene numbers have been determined for many common organisms. A number of "lethal loci," which correspond to essential genes, have been determined for several organisms that can be studied by rigorous genetic methods.

In eukaryotic organisms, the majority of the genome is within the chromosomes located inside the nucleus. The cytoplasmic mitochondria and chloroplasts, however, contain circular double-stranded DNAs, which are referred to as the mitochondrial and chloroplast genomes. The organelle genomes encode an additional number of proteins and RNAs that are related to their compartmentalization within the cytoplasm.

Mitochondrial genomes vary in size from 80 to 100 kb in fungi and plants to less than 20 kb in mammals. The human mitochondrial genome is 16.6 kb and encodes 37 genes (FIGURE 4.8). Some mitochondrial genes in fungi and plants contain introns, whereas the much more compact mammalian mitochondrial genes do not have any introns. Mitochondrial genes can overlap, similar to the organization of genes in bacteria. Most proteins required for mitochondrial function are not encoded by the mitochondrial genome; they are encoded in the nuclear genome, synthesized in the cytoplasm, and then imported into the mitochondria. Chloroplast genomes are relatively large compared with mitochondrial genomes, with sizes of about 140 to 200 kb, and may be present in 20 to 40 copies with about 100 to 120 genes.

FIGURE 4.8 The mitochondrial genome. The human mitochondrial genome encodes 37 genes. Most mitochondrial proteins, however, are encoded by nuclear genes and imported into the mitochondria.
[Diagram: circular map of the 16.6-kb human mitochondrial genome showing the D-loop, 12S rRNA, 16S rRNA, ND1, ND2, CO1, CO2, ATPase 8, ATPase 6, CO3, ND3, ND4L, ND4, ND5, ND6, and Cyt b. Key: tRNA genes; coding regions; arrows indicate the direction of each gene, 5′ to 3′. CO: cytochrome oxidase; ND: NADH dehydrogenase.]

Our concept of the gene has changed significantly with the recent discovery that most cellular RNAs do not encode proteins; in fact, much of the transcription that takes place in the cell is geared toward the production of functional noncoding RNAs (ncRNAs). Surprisingly, more than 90% of the genome is transcribed, and much of this transcription results in ncRNAs that perform a wide range of functions (FIGURE 4.9): ribosomal RNAs (rRNAs), which become part of the ribosome; tRNAs, which are involved in translation; small nuclear RNAs (snRNAs) and small nucleolar RNAs, which are part of the rRNA processing reactions; and various other RNAs that are involved in cellular structure and function. More recently, a large class of ncRNAs, referred to as miRNAs, has been identified. miRNAs regulate gene expression by base pairing to specific mRNAs and inhibiting translation or stimulating degradation of mRNA. The genome also encodes a large number of large noncoding RNAs, whose functions include the regulation of gene transcription and genome maintenance. It is highly likely that discoveries of many new ncRNA functions will be made in the future.

ncRNA class: function(s)

ncRNAs involved in translation
• Ribosomal RNA (rRNA): Structural and functional components of the ribosome; some rRNAs have catalytic activities
• Transfer RNA (tRNA): Covalent attachment with amino acids for interaction with ribosome and mRNA during protein synthesis
• 7SL RNA: Co-translational translocation of secreted proteins into the endoplasmic reticulum

ncRNAs involved in nuclear RNA processing
• Small nuclear RNA (snRNA): RNA processing (mRNA splicing and rRNA processing)
• Small nucleolar RNA (snoRNA): RNA processing (processing of ribosomal RNA; guidance of covalent modification of rRNA, tRNA, mRNA, and snRNA)

ncRNAs involved in regulation of gene expression
• Large noncoding RNA: Suppression of transcription of specific gene loci on chromosomes
• Small interfering RNA (siRNA): Gene-specific mRNA degradation and transcriptional regulation
• MicroRNA (miRNA): Gene-specific mRNA degradation and translational repression/translational activation
• Short hairpin RNA (shRNA): mRNA degradation
• Piwi-interacting RNA (piRNA): Transposon silencing

Other ncRNAs
• Vault RNA: Structural component of the cytoplasmic vault particle, which has been implicated in intracellular detoxification processes and intracellular transport

FIGURE 4.9 Noncoding RNAs. Most cellular RNAs do not code for proteins and are referred to as noncoding RNAs (ncRNAs).

In summary, the concept of the "gene" has evolved over the past century from the classical definition as a determinant of a genetic phenotype, to a protein-encoding segment of DNA, and finally to the more modern definition as the unit of information contained in the genome required for the transcription of a functional mRNA or ncRNA. In the next several sections, we explore the process of transcription and its regulation in detail.

Concept and Reasoning Check
1. Why are the sizes of eukaryotic genomes significantly larger than the amount of DNA required to encode proteins and functional RNAs?

4.3 Transcription is a multistep process directed by DNA-dependent RNA polymerase

Key concepts
• Specific sequences referred to as "promoters" and "terminators" direct the location on the DNA template where RNA polymerase begins and ends transcription.
• The four phases of transcription are promoter recognition, initiation, elongation, and termination.
• During transcription, RNA polymerase opens duplex DNA and uses the template strand to direct the polymerization of ribonucleotides into a complementary RNA strand.

The first step in gene expression is the transcription of DNA to make RNA, which is carried out by a class of enzymes called DNA-dependent RNA polymerases. RNA polymerases are found in all cellular organisms and in mitochondria and chloroplasts. The transcription cycle carried out by all RNA polymerases can be separated into four different phases: promoter recognition, initiation, elongation, and termination, as illustrated in FIGURE 4.10.

1. Promoter recognition: RNA polymerase initiates transcription at locations on the genome referred to as promoters. The term "promoter" refers to all DNA sequence elements required for the maximal level of transcription initiation and often includes sequences quite distal to the transcription start site. The minimal set of sequence elements required to initiate transcription, however, is referred to as the core promoter. In prokaryotes, the core promoter is relatively small (∼40 base pairs [bp]), and RNA polymerase can locate the core promoter through direct recognition of the DNA sequence. In eukaryotes, the core promoters for all three classes of nuclear
RNA polymerase transcription units are much larger (often more than 100 bp). A preinitiation complex (PIC), consisting of a large number of initiation factors, assembles onto the promoter. The eukaryotic nuclear RNA polymerases recognize and bind to promoter regions through recognition of the PIC, rather than through direct recognition of a DNA sequence.

2. Initiation: To initiate RNA synthesis, RNA polymerase binds to the promoter and unwinds the DNA to produce a "bubble" of separated strands, exposing one of the strands to serve as the template for the RNA polymerization reaction. Unlike DNA polymerases, which initiate DNA synthesis using a short primer synthesized by a primase enzyme, RNA polymerases do not require a primer to initiate polymerization of RNA from ribonucleoside triphosphate precursors. Thus, because the first nucleotide in RNA synthesis is a ribonucleoside triphosphate, three phosphate groups are at the 5′ end of an RNA, whereas each phosphoester linkage in the internal sequence of the RNA contains a monophosphate and a ribose sugar. The template DNA strand directs the synthesis of a complementary strand of RNA in a 5′ to 3′ direction using ribonucleoside triphosphates as substrates. The 3′-OH group on the ribose at the growing 3′ end of the RNA is a nucleophile that attacks the α-phosphate group on the incoming ribonucleoside triphosphate, liberating the β-γ pyrophosphate group (FIGURE 4.11). The initiation phase of transcription can be dissected into the following steps: (a) binding of RNA polymerase to the core promoter region; (b) separation of the DNA strands by RNA polymerase to form the open complex, a process termed isomerization; and (c) RNA polymerase moving away from the promoter and down the DNA template, a process termed promoter escape. Initiation formally ends only when RNA polymerase succeeds in leaving the promoter region by extending the transcript to a length that more than fills its active site and moving the transcription bubble.

3. Elongation: The 10- to 14-bp transcription bubble of open DNA within the RNA polymerase is propagated inside RNA polymerase as it translocates along the DNA template (FIGURE 4.12). The length of the transcription bubble usually varies by only a few bases throughout the elongation reaction: RNA polymerase unwinds the DNA in front of the bubble and rewinds the DNA at the back of the bubble, which creates a major change in the DNA topology dispersed over about one turn of the DNA helix. Within the bubble, 8 to 10 bases of the template strand are transiently paired with the transcript. During elongation, the enzyme moves along the DNA and extends the growing RNA chain at an average rate that varies between 10 and 100 nucleotides per second depending on the specific RNA polymerase. However, transcription elongation does not occur at a constant rate; RNA polymerase can pause at specific sites for up to several minutes before resuming elongation. The nature of these pause sites is still under active investigation, but the purpose of the pausing is to facilitate the coordination of cotranscriptional processes such as translation (in prokaryotes) and RNA splicing (in eukaryotes).

4. Termination: Termination of transcription involves three steps: (a) the cessation of RNA polymerase movement, (b) the release of the transcript, and (c) the release of RNA polymerase. Termination can occur hundreds or thousands of nucleotides from the translational stop codon. The signals and proteins required for transcription termination in mammals are still being identified and characterized.

FIGURE 4.10 The transcription cycle. The transcription cycle has four phases.
[Diagram: Template recognition: RNA polymerase binds to duplex DNA, and the DNA is unwound at the promoter. Initiation: very short chains are synthesized and released. Elongation: polymerase synthesizes RNA. Termination: RNA polymerase and RNA are released.]

FIGURE 4.11 The mechanism of RNA synthesis. RNA polymerase catalyzes the template-directed polymerization of RNA in the 5′ to 3′ direction. The 3′ hydroxyl group on the 3′ end of the RNA carries out a nucleophilic attack of the α-phosphate on the monomer, liberating pyrophosphate (PPi). (Adapted from J. M. Berg, et al. Biochemistry, Fifth edition. W. H. Freeman and Company, 2002.)
[Diagram: chemical structures of the growing RNA 3′ end paired with the DNA template strand, before and after addition of a ribonucleoside triphosphate, with release of PPi.]

FIGURE 4.12 The active site of bacterial RNA polymerase. The catalytic pocket of RNA polymerase transiently unwinds a small region of the duplex DNA template creating a "bubble" that permits the template strand to direct transcription. As the polymerase moves along the DNA template, the region in front of the polymerase is unwound and the region behind the polymerase is rewound, thereby maintaining the bubble at a constant size of 10 to 14 bp.
[Diagram labels: enzyme movement; rewinding point; unwinding point; DNA coding strand; DNA template strand; catalytic site; RNA binding site.]

By convention, the DNA sequence of a transcription unit is written such that the coding strand or sense strand is in the 5′ to 3′ direction and is the top strand in the sequence (FIGURE 4.13). Thus, the coding strand shares the same nucleotide sequence as the RNA transcript; the only difference is that in RNA the base thymine (T) is replaced with the base uracil (U). The coding strand is also referred to as the nontemplate strand. The bottom strand is the template that directs the synthesis of the complementary RNA.
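The strand conventions above can be illustrated with a minimal sketch that uses the 12-base example sequence shown in Figure 4.13. The function names are hypothetical, and the choice of the fourth base (0-based index 3) as the +1 transcription start site is an assumption read off the RNA drawn in that figure (5′-pppAUCCUAGGU-3′).

```python
# Base-pairing rules: DNA to DNA, and DNA template to RNA.
DNA_TO_DNA = str.maketrans("ACGT", "TGCA")
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand written 3'->5', aligned under the coding strand."""
    return coding_5to3.translate(DNA_TO_DNA)


def transcribe_from_template(template_3to5: str, start_index: int) -> str:
    """Pair ribonucleotides with the template, beginning at the +1 position.
    Reading the template 3'->5' yields the RNA in the 5'->3' direction."""
    return template_3to5[start_index:].translate(DNA_TO_RNA)


def transcribe_from_coding(coding_5to3: str, start_index: int) -> str:
    """Shortcut: the RNA matches the coding strand with T replaced by U."""
    return coding_5to3[start_index:].replace("T", "U")


def position_label(index: int, start_index: int) -> int:
    """Convert a 0-based index into transcription coordinates.
    The start site is +1, the base just upstream is -1, and there is no 0."""
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    coding = "AGCATCCTAGGT"  # coding (sense, nontemplate) strand, 5'->3'
    start = 3                # assumed +1 position, inferred from Figure 4.13

    template = template_from_coding(coding)
    print("Template 3'->5':", template)
    print("RNA 5'->3':     ", transcribe_from_template(template, start))
    print("Same as coding with U for T:",
          transcribe_from_coding(coding, start)
          == transcribe_from_template(template, start))
    print("Labels:", [position_label(i, start) for i in range(6)])
```

Both routes give the same transcript, which is the point of the convention: the RNA is complementary to the template strand and therefore identical in sequence to the coding strand apart from uracil replacing thymine.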

4.3 Transcription is a multistep process directed by DNA-dependent RNA polymerase 111 2nd pages 2nd pages

9781284023558_CH04_0103.indd 111 6/11/13 4:21 PM © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION

Direction of transcription (RNA synthesis) finally ending the transcription reaction by releasing the transcript and dissociating from –2 –1 +1+2 Coding strand the template. In the next section, we examine 5 … AGCATCCTAGGT… 3 how the structure of RNA polymerase relates 3 … TCGTAGGATCCA … 5 to its function. Template strand

Upstream Downstream Concept and Reasoning Check 1. A number of prokaryotic and eukaryotic genomes contain genes that are overlapped and that are 5 ppp AUCCUAGGU 3 transcribed by RNA polymerases using the oppo- +1 +5 site DNA strands as transcription templates. The promoters, which are located in front of each gene Newly synthesized RNA strand at opposite ends of the DNA sequence, are said FIGURE 4.13 The DNA transcription template. The conventions for describing tran- to be “converging.” What might be the biologic scription are the same for both prokaryotes and eukaryotes. The top strand is re- consequences of convergent transcription? Draw ferred to as the nontemplate, coding, or sense strand, whereas the bottom strand a diagram with overlapping genes and convergent promoters, indicating which strands are the sense is referred to as the template, noncoding, or nonsense strand. Because the RNA strand and the nonsense strand for each gene. transcript is complementary to the template strand, it is the same base sequence as the nontemplate or coding strand, although in the RNA transcript all monomer units are ribonucleoside monophosphates and the base thymine found in the DNA nontemplate strand is replaced with uracil. The direction in which the polymerase 4.4 RNA polymerases are moves and transcribes the DNA template is referred to as downstream, whereas the large multisubunit opposite direction is referred to as upstream. The first base pair to be used as a template for the synthesis of the RNA is referred to as the transcription start site protein complexes and is labeled as position +1. The next nucleotide is labeled as position +2, etc. The Key concepts position immediately upstream of +1 is labeled as −1. • General aspects of RNA polymerase structure and function are conserved among prokaryotes and eukaryotes. 
• Cellular RNA polymerases are composed of the nucleotide specificity of polymerization ­multiple protein subunits that carry out structural, by RNA polymerase and is written in the 3′ ­enzymatic, or regulatory functions. to 5′ direction. The template strand is also referred to as the nonsense strand or non- The simplest RNA polymerases are those en- coding strand. In most cases, when a DNA coded by bacterial viruses, also called bacterio- sequence is published, only the sense strand phages. DNA-dependent RNA is written out—the template strand is easily polymerases usually consist of a single subunit inferred from the complementary base pair- that is able to carry out all enzymatic func- ing rules. The transcription start site—the tions of the much larger and more complicated locus on the template that directs the insertion cellular RNA polymerases discussed below, in- of the first base in the RNA—is designated by cluding being able to recognize the promoter convention as +1. The base on the 5′ side of and sequences. However, the extra the start site (+1) is designated as −1 and the subunits of the cellular RNA polymerases fa- base on the 3′ side of the start site is designated cilitate the ability of transcription to respond as +2 (notice there is no “0” in this system). to the environment and to be regulated during Sequences on the DNA template behind the cell proliferation, growth, development, and transcribing polymerase (in the 5′ direction differentiation. on the sense strand of the template) are said Prokaryotes usually contain a single RNA to be upstream, whereas sequences located polymerase that consists of two large subun- in the front of the transcribing polymerase (in its with molecular weights of 150 to 160 kDa, the 3′ direction on the sense strand) are in the named β′ and β, as well as two copies of the

downstream direction. smaller ∼30- to 40-kDa α subunit. The β′βα2 In summary, the phases of the transcrip- complex is referred to as the core enzyme. tion process correlate with RNA polymerase A fourth subunit type called ω is sometimes recognizing the promoter, opening the duplex associated with the core enzyme but is not DNA to access the template strand, leaving the present in stoichiometric ratios. ω is impor- promoter and using the template strand as a tant in the transcription of specific genes when guide to polymerize the RNA transcript, and the bacterial cell is under nutritional or other


FIGURE 4.14 The E. coli RNA polymerase subunits. The subunits of E. coli RNA polymerase each have a functional role in transcription.
Columns: subunit; number of subunits in holoenzyme; gene; map position; molecular mass (Da); function
Alpha (α); 2; rpoA; 74.10; 36,511; Required for enzyme assembly and interacts with some regulatory proteins.
Beta (β); 1; rpoB; 90.08; 150,616; Forms a pincer and is the site of rifampicin action.
Beta′ (β′); 1; rpoC; 90.16; 155,159; Forms a pincer and provides an absolutely conserved -NADFDGD- motif that is essential for catalysis.
Omega (ω); 1; rpoZ; 82.34; 10,105; Helps in enzyme assembly but is not required for enzyme activity.
Sigma (σ); 1; rpoD; 69.21; 70,263; Directs enzyme to promoters but is not required for phosphodiester bond formation.

forms of environmental stress. The molecular characteristics and function of each of these subunits are shown in FIGURE 4.14.

The bacterial core enzyme is capable of carrying out the RNA polymerization reaction but requires an exchangeable subunit called σ factor to recognize the promoter. The core enzyme plus the σ factor are collectively referred to as the holoenzyme. Prokaryotes generally have a number of different exchangeable σ factors that permit the holoenzyme to recognize different classes of promoters. Transcription initiation of most genes in the bacterium E. coli is carried out by RNA polymerase complexed to a σ factor with a molecular weight of 70 kDa, referred to as σ70.

The crystal structure of a bacterial RNA polymerase enzyme (FIGURE 4.15) shows that the channel for DNA lies at the interface of the β′ and β subunits (the ω subunit is not visible in this view). The DNA is unwound at the active site where the RNA transcript is being synthesized. The β′ and β subunits contact DNA at many points downstream of the active site and on the coding strand in the region of the transcription bubble, thereby stabilizing the separation of the strands. The α subunit is important for interacting with proteins that modulate RNA polymerase elongation. RNA is contacted largely in the region of the transcription bubble.

FIGURE 4.15 The structure of a bacterial RNA polymerase. The bacterial core RNA polymerase has a claw-shaped structure. In this x-ray crystal structure of the T. aquaticus RNA polymerase the individual subunits are shown in separate colors as delineated in the color code shown below the structure. One α subunit is on one side of the structure and interacts with the β′ subunit, whereas the other α subunit interacts with the β subunit on the other side. Mg²⁺ and Zn²⁺ ions are also tightly associated with the RNA polymerase. The various structural domains of the largest subunit β′ are delineated as C, E, F, and G. The ω subunit is hidden in this view of the structure. (Modified from Cell, vol. 98, G. Zhang, et al., Crystal Structure of Thermus Aquaticus . . . , pp. 811–824, Copyright (1999) with permission from Elsevier [http://www.sciencedirect.com/science/journal/00928674]. Photo courtesy of Seth Darst, Rockefeller University.)

In contrast to prokaryotes, eukaryotes contain three RNA polymerases in the nucleus, designated RNA polymerases I, II, and III (FIGURE 4.16). Some plant species have additional


FIGURE 4.16 The eukaryotic nuclear RNA polymerases. The three nuclear RNA polymerases each transcribe a different class of genes. RNA polymerase II is the target of α-amanitin, a toxin produced in the poisonous mushroom Amanita phalloides; the other forms of RNA polymerase are less sensitive to α-amanitin. Actinomycin D is an antibiotic that binds to DNA and blocks RNA polymerases. RNA polymerase I shows the greatest sensitivity to this drug.
Columns: enzyme; location; RNA products; sensitivity to α-amanitin; sensitivity to actinomycin D
RNA polymerase I; nucleolus; pre-rRNA (leading to 5.8S, 18S, and 28S rRNA); resistant; very sensitive
RNA polymerase II; nucleoplasm; pre-mRNA and some snRNAs; 50% inhibition at 0.02 µg/mL; slightly sensitive
RNA polymerase III; nucleoplasm; tRNA, 5S rRNA, U6 snRNA (spliceosome), and 7SL RNA (signal recognition particle); 50% inhibition at 20 µg/mL; slightly sensitive

nuclear RNA polymerases that transcribe specific genes encoding small ncRNAs. Nuclear RNA polymerase I is localized to the nucleolar compartment within the nucleus and transcribes the rRNA genes; nuclear RNA polymerase II transcribes genes encoding mRNAs and genes encoding many small RNAs involved in mRNA processing; and nuclear RNA polymerase III transcribes the 5S rRNA genes, tRNA genes, and genes encoding many small RNAs with different functions. These polymerases generally contain 10 to 12 subunits with a total molecular weight of >500 kDa. Each polymerase has subunits that are homologous in both structure and function to the core subunits in prokaryotes; in addition, the nuclear polymerases also have a large number of other small subunits with functions that are still under investigation (FIGURE 4.17). Some of the small core subunits are common to all three nuclear RNA polymerases, whereas other subunits are unique to only one polymerase.

FIGURE 4.17 Comparison of the subunits of RNA polymerases across the kingdoms. Nuclear eukaryal, archaeal, and bacterial RNA polymerases display some homologies as well as significant differences in subunit size and function. The subunits of the RNA polymerases are shown as they would appear on an SDS-polyacrylamide gel (the higher the band on the gel, the larger the molecular weight). Bands with the same color represent structural (and perhaps functional) homologs. (Adapted from S. D. Bell and S. P. Jackson, Nat. Struct. Mol. Biol. 7 (2000): 703–705.) [Diagram columns: eukaryal (RNAP I, RNAP II, RNAP III), archaeal, and bacterial; the individual band labels are not legible in this text version.]

The general structure of the eukaryotic RNA polymerase II enzyme, as typified in the yeast S. cerevisiae, is shown in FIGURE 4.18. The overall clawlike shape of the yeast RNA polymerase II is quite similar to that of bacterial RNA polymerase, but the dimensions of the yeast polymerase are larger due to the larger mass and number of subunits. The largest subunit in RNA polymerase II has a carboxy-terminal domain (CTD) that consists of multiple repeats of the consensus sequence YSPTSPS. The sequence is unique to RNA polymerase II. There are ∼25 repeats in lower eukaryotes and ∼50 repeats in mammals. The serine residues in the repeats are substrates for phosphorylation.
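The repeat structure of the CTD lends itself to a small worked example. The sketch below builds an idealized CTD from perfect copies of the consensus heptad and lists the serines that could serve as phosphorylation substrates. It rests on simplifying assumptions: real CTDs contain repeats that deviate from the consensus, the repeat counts are the approximate figures quoted in this chapter, and the function names are hypothetical.

```python
# Consensus heptad repeat of the RNA polymerase II CTD, as given in the text.
HEPTAD = "YSPTSPS"


def build_ctd(n_repeats: int) -> str:
    """Return an idealized CTD made of n perfect consensus repeats."""
    return HEPTAD * n_repeats


def serine_positions(ctd: str) -> list[tuple[int, int]]:
    """List (repeat number, position within repeat) for every serine, 1-based."""
    return [
        (i // len(HEPTAD) + 1, i % len(HEPTAD) + 1)
        for i, residue in enumerate(ctd)
        if residue == "S"
    ]


# Approximate repeat counts: ~25 in lower eukaryotes; 52 is the human
# figure quoted in the Figure 4.25 caption. Both are idealizations here.
lower_eukaryote_ctd = build_ctd(25)
human_ctd = build_ctd(52)

print(serine_positions(HEPTAD))          # serines sit at positions 2, 5, and 7
print(len(serine_positions(human_ctd)))  # three candidate serines per repeat
```

Each consensus repeat contributes three serines, so the number of candidate phosphorylation sites scales directly with the repeat count in this idealized model.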


FIGURE 4.18 The structure of RNA polymerase II. The structure of eukaryotic RNA polymerase II is similar in overall shape to bacterial RNA polymerase. (A) Diagram of the interactions between the 12 yeast RNA polymerase II subunits. Note that the colors and numbers correspond to each of the subunits shown in B. (B) X-ray crystal structure of the S. cerevisiae enzyme (lacking 2 of the 12 subunits) shows the jaw, clamp, and cleft regions of the polymerase, all of which have counterparts in the bacterial enzyme. The CTD of the largest subunit is found on the right-hand side of the polymerase. (Reproduced from P. Cramer, D. A. Bushnell, and R. D. Kornberg, Science 292 (2001): 1863–1876 [http://www.sciencemag.org]. Reprinted with permission from AAAS. Photo courtesy of Roger D. Kornberg, Stanford University School of Medicine.) [Diagram labels: bridge helix, wall, jaw, cleft, CTD, clamp, head, active site.]

Mitochondria and chloroplasts, both cytoplasmic organelles, each carry out transcription of their own genomes and use RNA polymerases that are distinct from the nuclear polymerases. The mitochondrial RNA polymerase core enzyme (also known as POLRMT) is a single-subunit enzyme that displays significant structural similarities to bacteriophage RNA polymerases. Unlike most bacteriophage-encoded RNA polymerases, however, POLRMT cannot recognize promoters by itself and requires a transcription initiation factor, similar to the requirement of bacterial RNA polymerase for a σ factor. In most metazoan organisms, POLRMT is encoded by the nuclear genome and must be imported from the cytoplasm into the mitochondria.

Chloroplasts contain two RNA polymerases: one called plastid-encoded RNA polymerase that is encoded by the chloroplast genome and another called nuclear-encoded RNA polymerase that is encoded by the nuclear genome. Plastid-encoded RNA polymerase shares structural homologies with bacterial RNA polymerases and has several subunits homologous to the bacterial subunits β′, β, and α. By contrast, nuclear-encoded RNA polymerase is monomeric and has structural similarities to mitochondrial RNA polymerase.

In summary, cellular RNA polymerases are large oligomeric proteins with two large subunits and a variable number of smaller subunits. There is only one form of the prokaryotic core RNA polymerase, but a number of different exchangeable σ factors can provide specificity for the initiation of the enzyme at different classes of gene promoters. By contrast, there are multiple forms of the eukaryotic nuclear core RNA polymerases. Each polymerase has a set of unique subunits, as well as subunits that are common to all the nuclear polymerases. In the next section, we explore the structure and function of promoters in prokaryotes and eukaryotes in greater detail.

Concept and Reasoning Check
1. The simplest RNA polymerases from bacteriophages, bacteria, and the eukaryotic organelles are capable of carrying out the same basic catalytic reaction as the nuclear eukaryotic RNA polymerases, although the prokaryotic and organellar RNA polymerases contain far fewer subunits. Describe the possible functions for the extra subunits in the eukaryotic nuclear core RNA polymerases.
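As a worked illustration of how two large subunits plus several smaller ones add up to a large complex, the sketch below sums the E. coli subunit masses and copy numbers listed in Figure 4.14. The dictionary layout and function name are hypothetical, and the result is a simple sum of polypeptide masses that ignores bound ions and any other contribution.

```python
# (copies per holoenzyme, molecular mass in Da), values from Figure 4.14.
SUBUNITS = {
    "alpha": (2, 36_511),
    "beta": (1, 150_616),
    "beta_prime": (1, 155_159),
    "omega": (1, 10_105),
    "sigma70": (1, 70_263),
}


def complex_mass(subunit_names):
    """Sum copies x mass (Da) over the named subunits."""
    return sum(SUBUNITS[name][0] * SUBUNITS[name][1] for name in subunit_names)


core = complex_mass(["alpha", "beta", "beta_prime"])             # alpha2-beta-beta'
core_with_omega = complex_mass(["alpha", "beta", "beta_prime", "omega"])
holoenzyme = complex_mass(SUBUNITS)                              # core + omega + sigma70

print(core)             # 378797
print(core_with_omega)  # 388902
print(holoenzyme)       # 459165
```

On these numbers, β and β′ together account for roughly two thirds of the holoenzyme mass, which is consistent with the description of two large subunits accompanied by smaller ones.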


4.5 Promoters direct the initiation of transcription

Key concepts
• Promoters are DNA sequences that direct the initiation of transcription by binding RNA polymerase and by promoting the melting of the duplex DNA strands to facilitate the exposure of the template strand to direct RNA polymerization.
• Core promoters have conserved sequences that usually differ from the consensus at one or more positions.
• Prokaryotic core promoters contain several short conserved sequence elements dispersed over 40–50 bp that are recognized directly by the RNA polymerase holoenzyme.
• Each of the three nuclear RNA polymerases has a distinct core promoter structure that consists of multiple sequence modules dispersed over a large region near the transcription start site.
• Unlike prokaryotic promoters, eukaryotic promoters are not recognized directly by RNA polymerase but instead assemble with transcription factors that form a PIC, which is in turn recognized by RNA polymerase holoenzyme.
• The transition from transcription initiation to elongation at RNA polymerase II promoters involves the phosphorylation of the CTD of the largest polymerase subunit.

Promoters are sequences that function to bind RNA polymerase and facilitate the initial RNA polymerase–dependent strand separation or isomerization process. The most common strategy to identify core promoter sequences has been to examine regions around the transcription start sites of genes, searching for common sequence elements present in most, if not all, promoters of the same classes of genes; such a sequence is said to be conserved. A conserved sequence element, however, is not necessarily conserved at every single position; some variation usually occurs. Putative promoters are defined in terms of an idealized sequence that represents the base most often present at each position. A consensus sequence is defined by aligning all known examples to maximize their homology. For a sequence to be accepted as a consensus, each particular base must be reasonably predominant at its position, and most of the actual examples must be related to the consensus by only one or two substitutions. Once conserved nucleotides have been identified by sequence comparison, their importance in transcription initiation can be evaluated by site-directed mutagenesis or deletion of the entire sequence element. This approach was originally applied to bacterial promoters.

There are several conserved features of prokaryotic promoters, typified by those found in the σ70 promoters of E. coli, as shown in FIGURE 4.19. These sequences are as follows:
• The start site is usually located at an adenosine or guanosine nucleotide (e.g., a common start site is CAT). However, there are a number of exceptions in many E. coli promoters.
• The −10 hexamer is centered ∼10 bp upstream of the start site. The consensus sequence is summarized in the form T₈₀ A₉₅ T₄₅ A₆₀ A₅₀ T₉₆, where the subscript denotes the percent occurrence of the most frequently found base.
• The −35 hexamer is centered ∼35 bp upstream of the start site. The consensus is summarized as T₈₂ T₈₄ G₇₈ A₆₅ C₅₄ A₄₅.
• The spacer region between the −35 and −10 hexamers is usually 17 ± 1 bp, although it can vary between 15 and 20 bp. The actual sequence in the intervening region appears to be unimportant, but the distance is important for optimal promoter function, and

FIGURE 4.19 The E. coli σ70 promoters contain conserved sequence elements. Sequence elements at −10 and −35 from the start sites of several σ70 RNA polymerase promoters are conserved. Much of the variation in promoter strength results from the differences between the consensus sequence and the actual sequence of the promoter element. In addition, the distance variations between the −10 and −35 regions also affect promoter strength. [Alignment of the promoters of the lac, lacI, trp, his, leu, bio, and recA genes, with the −35 consensus (TTGACA), the −10 consensus (TATAAT), and the transcription start (+1) marked; the individual promoter sequences are not legible in this text version.]
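The consensus hexamers and the spacing rule summarized in the text and in Figure 4.19 can be turned into a small comparison routine. The sketch below is only an illustration: the function names are hypothetical, the example hexamers are invented for the demonstration rather than read from the figure, and counting matches to the consensus is a crude stand-in for measured promoter strength.

```python
# Consensus hexamers for E. coli sigma-70 promoters, as summarized in the text.
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"


def hexamer_matches(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(1 for a, b in zip(hexamer.upper(), consensus) if a == b)


def spacer_in_optimal_range(spacer_length: int) -> bool:
    """True when the -35/-10 spacer falls within 17 +/- 1 bp."""
    return 16 <= spacer_length <= 18


def describe_promoter(minus_35: str, spacer_length: int, minus_10: str) -> dict:
    """Summarize how closely a promoter follows the consensus features."""
    return {
        "minus_35_matches": hexamer_matches(minus_35, MINUS_35_CONSENSUS),
        "minus_10_matches": hexamer_matches(minus_10, MINUS_10_CONSENSUS),
        "spacer_optimal": spacer_in_optimal_range(spacer_length),
    }


# Illustrative (made-up) promoter: one mismatch at -35, two at -10, 18-bp spacer.
print(describe_promoter("TTTACA", 18, "TATGTT"))
```

A promoter that scores six matches in both hexamers with a 17-bp spacer would correspond to the idealized consensus promoter, which the text notes does not occur naturally in E. coli.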


deviations from the 17 ± 1 bp spacing decrease the strength of the promoter.
• Upstream elements can influence promoter strength. For example, the UP element is an A-T–rich sequence found within 50 bp upstream of the −35 hexamer in some promoters that are transcribed at high levels, such as the promoters for rRNA genes.

Based on these observations, the “consensus” E. coli core promoter consists of the −35 hexamer and −10 hexamer separated by 17 ± 1 bp. It is important to emphasize that these consensus sequences represent the average of hundreds of different gene promoters, and no naturally occurring E. coli promoter actually contains these exact sequence elements. In general, however, those promoters with sequences closest to these consensus features are highly efficient in stimulating transcription initiation; conversely, promoters that deviate significantly from the consensus sequences tend to be less efficient. Promoter strength (defined as the affinity of the promoter for RNA polymerase and the rate at which the promoter undergoes isomerization to the open complex) for a specific gene is “optimized” rather than “maximized” to provide the correct basal level of transcription for cell growth, proliferation, and homeostasis.

In contrast to prokaryotic promoters, eukaryotic promoters are much more complex; conserved sequence elements can be very difficult to identify by comparison alone. Part of the reason for the lack of sequence conservation is that nuclear RNA polymerases do not recognize the DNA sequence of the core promoter region directly but instead recognize an oligomeric protein complex, called the preinitiation complex (PIC), that assembles at the promoter. For RNA polymerases I and III, the PIC is relatively simple, being composed of just a few proteins, whereas the PIC formed on RNA polymerase II promoters is extremely large, containing a number of transcription factors with multiple subunits.

In the case of the RNA polymerase I promoter, which controls the transcription of the rRNA precursor, the protein known as UBF binds to the promoter region as a dimer and recruits a second factor known as SL1/TIF1B to form the PIC (FIGURE 4.20). The PIC recruits RNA polymerase I bound to TIF1A. Once RNA polymerase I escapes the promoter and moves into the elongation phase, TIF1A is released.

FIGURE 4.20 The eukaryotic RNA polymerase I promoter. The RNA polymerase I promoter consists of a core element located just upstream of the transcriptional start site and the upstream promoter element (UPE) located approximately at −100 bp, which binds to UBF (a homodimer). The transcription factors TATA box binding protein and SL1 are also required for RNA polymerase I to bind the core promoter. [Diagram labels: startpoint; G-C-rich and A-T-rich regions; Inr; upstream promoter element and core promoter on a scale running from −170 to +20; UBF binds to the upstream promoter element; the RNA polymerase I holoenzyme includes the core binding factor (SL1), which binds to the core promoter; TBP; RNA polymerase I.]

The PIC formed at RNA polymerase III promoters, which control the synthesis of a number of small RNAs of diverse functions, is slightly more complex than the PIC for polymerase I (FIGURE 4.21). The biggest difference is that the RNA polymerase III promoter elements are usually located, with some exceptions, within the transcribed region, downstream of the transcription start site. Three types of RNA polymerase III promoters are classified on the basis of the elements found in the internal or


FIGURE 4.21 Three classes of RNA polymerase III promoters. The type 1 promoter directs transcription of 5S rRNA genes and contains internal consensus sequences. The type 2 promoter directs transcription of tRNA genes and also contains internal consensus sequences. The A box (red) is found in both type 1 and type 2 promoters, whereas the B box (yellow), C box (green), and IE (blue) are sequences specific to the promoter type. The type 3 promoter directs transcription of the genes encoding U6 snRNA, which is part of the spliceosome, and contains external consensus sequences located proximal and distal to the transcription start site. (Adapted from M. R. Paule and R. J. White, Nucleic Acids Res. 28 (2000): 1283–1298.)
Internal promoters
(a) Type 1 promoter (5S rRNA genes): A box, IE, C box, Tn; scale marks +1, +50, +64, +70, +80, +97, +120
(b) Type 2 promoter (transfer RNA genes): A box, B box, Tn; scale marks +1, +8, +19, +52, +62, +73
External promoters
(c) Type 3 promoter (mammalian U6 snRNA gene): DSE, PSE, TATA; scale marks −244, −214, −66, −47, −30, −25, +1

external control regions. Type 1 promoters are found in genes encoding 5S rRNA and consist of three internal elements located within the gene beginning about 50 bp downstream from the transcription start site. Type 2 promoters are found in genes encoding tRNAs and contain two internal elements beginning 8 bp downstream of the start site. By contrast, the type 3 promoters, which are found in genes encoding the U6 small RNA involved in mRNA splicing, contain upstream promoter elements located between 25 and 244 bp upstream of the transcription start site. Types 1 and 2 RNA polymerase III promoters use one or more of the transcription factors called TFIIIA, -B, and -C, whereas type 3 RNA polymerase III promoters may use other transcription factors specific to each promoter.

Promoters used by RNA polymerase II show considerably more variation in sequence and location than those of RNA polymerases I and III but have a similar modular organization. Short sequence elements that function as transcription factor binding sites are located upstream of the start site. Five short core promoter elements are located between −40 and +40 base pairs relative to the transcriptional start site (FIGURE 4.22). These core promoter elements are rarely all found in a single promoter and are more likely to be found in various combinations, depending on the level of the gene’s expression. The basal transcription factors, also referred to as the general transcription factors, bind to the core promoter elements and/or to RNA polymerase. They are labeled with the prefix “TFII” to designate that they are general transcription factors for RNA polymerase II. Collectively, the promoter, the general transcription factors, and RNA polymerase are referred to as the basal transcription apparatus. The core promoter elements are as follows:
• The TATA box, named for the consensus sequence (T)ATA, is usually found 25 to 30 bp upstream of the transcription start site. Although the TATA box is often cited as a main feature of RNA polymerase II promoters, most genes that are not highly expressed generally lack the TATA box. In fact, the TATA box is present in less than 35% of all RNA polymerase II promoters characterized. The transcription factor involved in almost all RNA polymerase II PICs is the multisubunit protein complex called TFIID. One of the components of TFIID is the TATA box binding protein. However, TFIID also binds to promoters that do not contain

FIGURE 4.22 The modular structure of a eukaryotic RNA polymerase II core promoter. The eukaryotic RNA polymerase II core promoter consists of multiple short sequence elements located both upstream and downstream proximal to the transcription start site. Any given RNA polymerase II promoter may have some, all, or none of these elements. (Adapted from S. T. Smale and J. T. Kadonaga, Annu. Rev. Biochem. 72 (2003): 449–479.)
BRE (TFIIB recognition element): −37 to −32
TATA box: −31 to −26
Inr (initiator): −2 to +4
DPE (downstream core promoter element): +28 to +32
[The consensus sequences shown beneath each element, including separate Drosophila and mammalian versions of the Inr, are not legible in this text version.]
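Because promoter coordinates have no position 0, turning the ranges in Figure 4.22 into string slices takes a little care. The sketch below is a minimal illustration of that bookkeeping, using the element ranges from Figure 4.22 and the short example sequence from Figure 4.13; the function names and the choice of a 0-based start-site index are assumptions made for this example.

```python
# Element ranges (start, end) in promoter coordinates, from Figure 4.22.
CORE_ELEMENTS = {
    "BRE": (-37, -32),
    "TATA box": (-31, -26),
    "Inr": (-2, 4),
    "DPE": (28, 32),
}


def position_to_index(position: int, tss_index: int) -> int:
    """Map a promoter coordinate (+1 = start site, no 0) to a 0-based index."""
    if position == 0:
        raise ValueError("there is no position 0 in promoter coordinates")
    offset = position - 1 if position > 0 else position
    return tss_index + offset


def element_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end, skipping the missing 0."""
    span = end - start + 1
    return span - 1 if start < 0 < end else span


def extract_element(sense_strand: str, tss_index: int, start: int, end: int) -> str:
    """Return the sense-strand nucleotides between two promoter coordinates."""
    first = position_to_index(start, tss_index)
    last = position_to_index(end, tss_index)
    if first < 0 or last >= len(sense_strand):
        raise ValueError("element lies outside the supplied sequence")
    return sense_strand[first:last + 1]


def transcribe_from_start(sense_strand: str, tss_index: int) -> str:
    """RNA has the coding-strand sequence from +1 onward, with U in place of T."""
    return sense_strand[tss_index:].replace("T", "U")


coding = "AGCATCCTAGGT"  # coding strand from Figure 4.13
tss = 3                  # 0-based index of the +1 nucleotide (the first A of ATCC)

print(extract_element(coding, tss, -2, 2))  # GCAT: positions -2, -1, +1, +2
print(transcribe_from_start(coding, tss))   # AUCCUAGGU, as drawn in Figure 4.13
print({name: element_length(*span) for name, span in CORE_ELEMENTS.items()})
```

The same `extract_element` call could be applied to a longer sense-strand sequence to pull out the nucleotides occupying, for example, the −31 to −26 window where a TATA box would be expected.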


a TATA box by interacting with one or more of the other core promoter elements. These interactions may involve the other subunits of TFIID called TATA box binding protein associated factors.
• The BRE (TFIIB recognition element) is a GC-rich heptanucleotide sequence that binds to the transcription initiation factor TFIIB, which is the other basal transcription factor involved in PIC formation. The BRE is often located immediately adjacent to the TATA box (when one is present) and helps to stabilize the binding of TFIID.
• Three other elements commonly found in core promoters, especially those lacking a TATA box, are the INR (initiator region), which overlaps the transcription start site, the MTE (motif ten element) located between +18 and +27 bp downstream from the transcription start site, and the DPE (downstream promoter element) located ∼30 bp downstream from the transcription start site. The combination of the INR, MTE, and DPE appears to compensate for the lack of a TATA box by interacting with the TATA box binding protein associated factors of the TFIID complex.

FIGURE 4.23 The assembly of the general transcription factors on an RNA polymerase II core promoter. The general transcription factors (TFIID, TFIIA, TFIIB, TFIIE, and TFIIH) assemble with TFIIF-RNA polymerase II to form the initiation complex at the core promoter. (Adapted from R. G. Roeder, Nat. Med. 9 (2003): 1239–1244.) [Diagram labels, in order of assembly: accessible core promoter with TATA; TFIID (core promoter recognition); TFIIA; TFIIB; TFIIF-Pol II; TFIIE; TFIIH. The assembled complex shows TAFs, TBP, IID, IIA, IIB, IIE, IIF, IIH, Pol II, and the C-terminal domain.]

FIGURE 4.24 The function of the RNA polymerase II general transcription factors in yeast. The general transcription factors in yeast (S. cerevisiae) have specific functions. (Adapted from S. Hahn, Nat. Struct. Mol. Biol. 11 (2004): 394–403.)
Columns: factor; number of subunits; functions
TFIIA; 2; Stabilizes TBP and TFIID binding. Blocks inhibition by an inhibitory TAF subunit.
TFIIB; 1; Binds TBP, RNA polymerase II, and promoter DNA. Its N-terminal region has a finger structure that inserts into the RNA polymerase II active site cleft. It helps fix the transcription initiation site.
TFIID (TBP and TAFs); 1 (TBP) and 14 (TAFs); Binds TATA element and deforms promoter DNA. Platform for the assembly of TFIIB, TFIIA, and TAFs. Binds Inr and DPE promoter elements.
TFIIE; 2; Binds promoter near transcription initiation site. May help to stabilize the transcription bubble in the open complex.
TFIIF; 3; Prevents nonspecific DNA binding. Binds RNA polymerase II and is involved in recruiting RNA polymerase II to the preinitiation complex. It has regions that are homologous to the bacterial σ factor.
TFIIH; 10; Functions in transcription and DNA repair. It has kinase and helicase activities and is essential for open complex formation.

Once TFIID and TFIIB bind to the core promoter, the remaining basal transcription factors, TFIIA, TFIIF, TFIIE, and TFIIH, as well as RNA polymerase II, assemble onto the promoter, interacting directly with the PIC (FIGURE 4.23). The proposed function of each basal transcription factor is shown in FIGURE 4.24. Transcription initiation commences upon complete assembly of this very


large complex of transcription factors and RNA polymerase II and the polymerization of the first phosphodiester linkages of the RNA transcript.

During the transition between the initiation and elongation phases of transcription, many basal transcription factors dissociate from the initiation complex, but TFIIH plays an essential role in the transition. TFIIH possesses both kinase and helicase activities that are required for the isomerization process of melting the DNA and the phosphorylation of the CTD of the large subunit of RNA polymerase II. Phosphorylation converts the CTD into a target for the binding of RNA-processing proteins that are carried along with RNA polymerase II (FIGURE 4.25). The mRNA splicing and 3′ end processing factors bind to the CTD at various locations along the DNA template and transfer from the polymerase to the mRNA as the factor binding sequences on the transcript emerge from the active site of the polymerase.

Once RNA polymerase II has left the promoter and entered the elongation phase, two important transcription factors increase the fidelity and elongation rate of the polymerase. TFIIS (also known as SII) is a fidelity factor that causes RNA polymerase II to backtrack and remove any noncomplementary nucleotide that has been erroneously inserted. P-TEFb is an elongation factor that is a cyclin-dependent protein kinase. P-TEFb phosphorylates the CTD on the large subunit of RNA polymerase II at sites distinct from those modified by TFIIH and also phosphorylates specific antielongation factors located on the CTD, thereby stimulating RNA polymerase elongation.

In summary, the interaction of RNA polymerase with the promoter region can occur directly (in prokaryotes) or indirectly (in eukaryotes) through the formation of PICs using the general transcription factors. Both prokaryotic and eukaryotic promoters are constructed out of conserved short modular sequences that contact the individual subunits of RNA polymerase or the basal transcription factors. The variation in promoter structure is correlated with variation in promoter strength and can be regarded as the first level of gene-specific transcriptional control. Once RNA polymerase leaves the promoter region, other transcription factors help the polymerase to avoid errors and inactivation. In the next section, we explore the structure and function of the gene-specific transcription factors that activate or repress the basal level of transcription in response to the cellular environment and during cell growth, proliferation, differentiation, and development.

Concept and Reasoning Check
1. The following 80-nucleotide DNA sequence (sense strand) contains a classical RNA polymerase II core promoter. Identify the conserved sequence elements and the nucleotide that most likely serves as the transcription start site (+1).

5′ TCATACCTTCCTATCTCCGATCCTAGGCTATTTATAACCATGGTATTTCATGCATTACCTTCATTCCTGTGGCCTACGCA 3′
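One way to approach a question like this is to scan the sequence computationally. The Python sketch below is an illustration only. It assumes a loose TATA box pattern (TATA followed by A or T), the commonly cited mammalian initiator consensus YYANWYY (Y = C or T, W = A or T, N = any base, with the A taken as +1), and a spacing window of 25 to 35 bp between the first base of the TATA box and the start site. The patterns, the window, and the function name find_core_promoter are assumptions made for this example, not definitions from this chapter.

```python
import re

# The 80-nucleotide sense-strand sequence from the question above.
SEQ = ("TCATACCTTCCTATCTCCGATCCTAGGCTATTTATAACCATGG"
       "TATTTCATGCATTACCTTCATTCCTGTGGCCTACGCA")

# Assumed consensus patterns (illustrative, deliberately loose).
TATA = re.compile(r"TATA[AT]")
# Lookahead so that overlapping initiator candidates are all reported.
INR = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT]))")


def find_core_promoter(seq, min_gap=25, max_gap=35):
    """Pair each TATA-like match with initiator-like matches downstream.

    Returns (tata_index, start_site_index, inr_sequence) tuples, using
    0-based indices. The start site is the A at the third position of
    the YYANWYY match.
    """
    hits = []
    for tata in TATA.finditer(seq):
        for inr in INR.finditer(seq):
            start_site = inr.start() + 2
            gap = start_site - tata.start()
            if min_gap <= gap <= max_gap:
                hits.append((tata.start(), start_site, inr.group(1)))
    return hits


for tata_index, start_index, inr_seq in find_core_promoter(SEQ):
    print(f"TATA-like element begins at nucleotide {tata_index + 1}; "
          f"candidate +1 at nucleotide {start_index + 1} within {inr_seq}")
```

Relaxing or tightening the assumed patterns or the spacing window changes which candidates survive, so the output is best treated as a starting point for the reasoning the question asks for.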

FIGURE 4.25 Phosphorylation of the RNA polymerase II CTD. The CTD of RNA polymerase II is phosphorylated by the kinase activity of TFIIH before the transition of the polymerase from the initiation phase to the elongation phase of transcription. The human CTD contains 52 repeats of the amino acid sequence YSPTSPS.

4.6 Activators and repressors regulate transcription initiation

Key concepts
• Transcription factors modulate RNA biosynthesis at the initiation phase by interacting with the basal transcription apparatus.
• Transcription factors bind to DNA sequences proximal or distal to the basal initiation complex at the core promoter region.
• Enhancers are transcription factor binding sites located distal to the promoter that function to modulate transcription initiation through DNA looping.
• The E. coli lactose (lac) operon serves as a paradigm for understanding the general principles of regulation of transcription initiation. The lac repressor is a negative regulatory factor that represses transcription initiation at the lac promoter by blocking RNA polymerase, whereas the cyclic adenosine monophosphate (cAMP) receptor protein (CRP) activates transcription by recruiting RNA polymerase to the lac promoter.
• Transcription factors are composed of structural motifs that function in DNA recognition, dimerization of subunits, and protein–protein interactions with the basal transcription apparatus.
• The most common eukaryotic structural motifs are the helix-turn-helix (HTH) and zinc finger (ZF) DNA binding domains and the helix-loop-helix (HLH) and basic leucine zipper (bZIP) dimerization domains.

Virtually every aspect of cell structure and function is directly or indirectly controlled by the coordinated regulation of gene expression. Gene regulation also directs the differentiation and development of all multicellular organisms through its interface with signal transduction. It is now clear that regulation can occur at every level of gene expression, including transcription, RNA processing, nucleocytoplasmic transport, translation, and RNA and protein degradation/recycling. However, we currently know the most about transcriptional regulation, in particular, control of transcription factors that function as activators and repressors of the initiation step.

In addition to the core promoter and the basal factors, almost all protein-coding genes contain binding sites for transcription factors that direct the increase (upregulation) or decrease (downregulation) of transcription initiation. Transcription factor binding sites may be located proximal (<100 bp) to the core promoter, facilitating short-range interactions between the transcription factor and the basal transcription apparatus, or the binding sites may be distally located (>100 bp) from the core promoter. In general, those transcription factor binding sites that function distal to the core promoter require that the intervening DNA sequence form a loop that brings the transcription factor in proximity to the basal transcription apparatus at the core promoter. An important type of transcription factor binding site that functions at long distances from the core promoter is called an enhancer. Enhancers can function hundreds or even thousands of base pairs from the core promoter and may be located either upstream or downstream of the transcription start site. Because of the extreme flexibility of the large intervening DNA loop, enhancers can also function in either a forward or backward orientation (FIGURE 4.26).

FIGURE 4.26 Enhancers function through DNA looping. Enhancer binding transcription factors contact the other transcription factors located proximal to the core promoter region via DNA looping.

For enhancer binding proteins to exert their effects on the basal transcription apparatus, other proteins, called coactivators, may facilitate the conformational rearrangement or the condensation of the DNA loop, thereby bringing the factor in close proximity to the core promoter. Some transcription factors, including enhancer binding proteins, connect to the basal transcription apparatus through a large oligomeric protein complex called Mediator, which has a molecular weight of 1.2 MDa and is essential for transcription activation of almost all protein-coding genes (FIGURE 4.27).

FIGURE 4.27 RNA polymerase II mediator. Mediator functions as a bridge between gene activator (ACT) transcription factors bound at distal sites and the basal transcription apparatus bound at the core promoter. Mediator has three oligomeric structural domains (head, middle, and tail). Each domain consists of several protein subunits. The entire mediator complex consists of more than two dozen subunits. The protein Srb8-11 (also known as Cdk8) can inhibit the interaction of mediator with RNA polymerase. (Adapted from S. Björkland and C. M. Gustafson, Cell 30 (2005): 240–244.)

Because
enhancers can function at such large distances and in both directions from the core promoter, gene sequences, called insulators, set boundaries that provide limitations in both distance and directionality to the effects that enhancers exert on their promoter targets.

The first biologic regulatory circuit to be characterized genetically and biochemically was the lactose (lac) operon in E. coli (FIGURE 4.28), which has served as a paradigm for all other transcriptional regulatory systems. E. coli is a bacterium that inhabits the intestine of most mammals and uses the host’s dietary intake of glucose as its primary source for energy metabolism. When glucose is scarce, the E. coli cell induces the synthesis of several proteins for the utilization of lactose as an energy source. Lactose is a disaccharide composed of glucose and galactose. The enzyme β-galactosidase cleaves the bond between glucose and galactose, whereas subsequent metabolic steps convert galactose to glucose, thereby yielding two molecules of glucose from a single lactose molecule. The lac operon contains three lactose utilization genes arranged in tandem: the gene encoding β-galactosidase, which breaks down lactose; a gene encoding a lactose permease, which allows lactose to enter the cell; and a nonessential gene encoding a galactoside acetyl transferase, which acetylates a number of proteins but whose biologic function is still unclear. All three genes are encoded in a single polycistronic mRNA that is transcribed from the lac operon promoter called Plac.

FIGURE 4.28 The E. coli lac operon. The E. coli lac operon (not drawn to scale) consists of three genes encoding proteins necessary for the utilization of lactose as an energy source: z = β-galactosidase; y = lactose permease; a = transacetylase. (A) The promoter region contains a negative regulatory site called the operator (o), which overlaps the lac promoter (p). (B) In the absence of lactose, the lac repressor, encoded by the repressor gene (i), binds to the operator and inhibits transcription. (C) In the presence of lactose, the lactose is converted to allolactose, which binds the repressor and stimulates its release from the operator, permitting RNA polymerase to transcribe the operon.

Plac is under both negative and positive regulation. Normally, when glucose levels are high and the lactose-utilizing enzymes are not needed, Plac is shut down by a protein called the lac repressor, which binds to the operator region adjacent to the promoter and blocks binding and/or promoter clearance by RNA polymerase. If glucose levels fall and lactose is detected in the surrounding medium, lactose is taken up by the E. coli cell. A small amount of β-galactosidase is constitutively present in the cell, and one of its minor catalytic activities is to convert some of the lactose to the compound allolactose. Allolactose acts as an inducer of the lac operon by binding to and eliciting a conformational change in the repressor, which causes the repressor to dissociate from the operator, thereby allowing RNA polymerase to transcribe the lac mRNA and produce the enzymes required to break down lactose.
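The regulatory logic of the lac operon can be summarized as a small truth table. The Python sketch below is a deliberately simplified Boolean model, not a quantitative description: it combines the negative control by the lac repressor described above with the positive control by cAMP–CRP that this section goes on to describe. The function name lac_operon_output and the three output labels ("off", "low", "high") are assumptions made for illustration. In particular, the "low" level assigned to cells that have lactose but also plentiful glucose is an assumed simplification of transcription without CRP recruitment.

```python
def lac_operon_output(glucose_present: bool, lactose_present: bool) -> str:
    """Qualitative transcription level of the lac operon (toy model)."""
    # Negative control: without lactose there is no allolactose inducer,
    # so the lac repressor stays bound to the operator.
    repressor_on_operator = not lactose_present

    # Positive control: when glucose is scarce the cell makes cAMP, and
    # the cAMP-CRP complex helps recruit RNA polymerase to Plac.
    camp_crp_bound = not glucose_present

    if repressor_on_operator:
        return "off"
    if camp_crp_bound:
        return "high"  # full induction: no glucose, lactose available
    return "low"       # repressor released, but no CRP recruitment


for glucose in (True, False):
    for lactose in (True, False):
        level = lac_operon_output(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {level}")
```

Only one of the four combinations gives the "high" state, which matches the requirement for both the absence of glucose and the presence of lactose for full induction.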


A positive regulatory pathway for induction of the lac operon also functions in E. coli. When cellular glucose levels are low, the bacterium synthesizes the messenger molecule cAMP. cAMP binds to a transcription activator protein called the CRP, whose DNA binding site is located adjacent to Plac. The cAMP–CRP complex binds to the promoter region and significantly increases the affinity of RNA polymerase for Plac, thereby enhancing transcription of the lac operon and resulting in the upregulation of the synthesis of the lac utilization enzymes (FIGURE 4.29). Thus, full induction of the lac operon genes requires both the absence of glucose and the presence of lactose.

FIGURE 4.29 Positive control of the lac operon. The lac operon is under positive regulatory control by the cAMP receptor protein, CRP. CRP binds to the site just upstream from the −35 hexamer in the lac promoter and contacts the carboxy-terminal domain of the α subunit of RNA polymerase (α-CTD); this interaction causes an allosteric conformational change that induces the interaction of the amino-terminal domain of the α subunit with the large subunits, β′ and β. These protein–protein interactions strengthen the affinity of RNA polymerase for the lac promoter. (Adapted from S. Busby and R. H. Ebright, J. Mol. Biol. 293 (1999): 199–213.)

The lac operon illustrates several important points applicable to virtually all other examples of transcriptional regulation in both prokaryotes and eukaryotes, as discussed below:
• There is a strong link between cellular signaling responses and transcription. The low level of constitutive synthesis of lactose permease and β-galactosidase allows the cell to import lactose and allows β-galactosidase to convert lactose to allolactose, which then induces the operon by causing lac repressor to dissociate from the operator. The cAMP formed in response to a drop in intracellular levels of glucose induces the operon by increasing the affinity of RNA polymerase for the promoter.
• Key protein–protein interactions between transcription factors and the basal transcription apparatus are mediated by DNA–protein interactions. Both lac repressor and CRP interact with RNA polymerase. These protein–protein interactions are illustrative of three important mechanisms for transcription factor modulation of transcription. The first mechanism is referred to as physical interference. In the case of the lac operon, the lac operator lies just downstream of Plac and binding of the lac repressor to the operator physically blocks polymerase from binding to the promoter, although some studies have suggested that polymerase can still bind but is unable to escape the promoter after initiation. The second mechanism, known as recruitment, is illustrated by the interaction of CRP with the CRP binding site, located just upstream of Plac. In this case the transcription factor increases the affinity of RNA polymerase for the promoter. Eukaryotic transcription factors can also increase the local concentration of RNA polymerase II and the basal transcription apparatus at the core promoter. A third key mechanism for transcription regulation is allosteric modification. Transcription factors can interact with polymerase or other components of the basal transcription apparatus and cause conformational changes in the core promoter complex, leading to a modulation of transcription initiation.
• Transcriptional regulation often involves both positive and negative regulatory mechanisms. The lac operon is an example of a gene system that is normally off due to constitutive repression by the lac repressor and that requires the repressor to be released for transcription to occur. Activation of RNA polymerase, however, also occurs by
cAMP-bound CRP. Transcriptional regulation of eukaryotic genes also involves both repression and activation. Most genes in eukaryotes are constitutively repressed by highly condensed chromatin near and at the core promoter region. Before basal transcription can occur, the chromatin in the promoter region must be rearranged and remodeled to allow access of the basal transcription apparatus to the DNA template. Thus, the transcription activation factors include a number of proteins that covalently modify histones to loosen their interaction with DNA within the chromatin. In addition, other activator proteins act to physically rearrange or move the histones away from the DNA.
• Transcription activation may occur at proximal or distal sites. Although the CRP binding site and primary lac operator are located immediately adjacent to the promoter, other lac repressor binding sites, or secondary operators, located about −100 bp and +400 bp on either side of the promoter enhance the interaction of lac repressor with the primary operator. Thus, action at a distance appears to be a universal strategy for transcription factor function in both prokaryotes and eukaryotes.

The regulation of mRNA transcription by RNA polymerase in eukaryotes is remarkably complex compared with the lac operon. Thousands of transcription factors are encoded in higher eukaryotic genomes that activate or repress transcription. A comprehensive discussion of eukaryotic transcription factors is beyond the scope of this chapter. However, a few general principles are important for understanding how transcription factors activate or repress transcription in response to the cellular environment and during cell growth, proliferation, development, and differentiation.

The range of different types of eukaryotic transcription factors is vast, but these proteins often share a number of structural features that have specific functions in activation or repression. Transcription factors commonly function as monomers or as dimers. For those transcription factors that interact directly with DNA, a distinct region of the three-dimensional structure of the transcription factor, referred to as a domain, interacts with a specific DNA sequence element, whereas another domain of the protein interacts directly with RNA polymerase or with some component of the basal transcription apparatus to increase the efficiency of initiation by RNA polymerase through recruitment or an allosteric effect. Thus, such transcription factors contain a DNA binding domain, an activation domain, and, in some cases, a dimerization domain. Other transcription factors function as internal subunits of a multisubunit complex and contain only domains that are involved in protein–protein interactions.

Transcription factors are often classified into families according to the specific combinations of secondary and tertiary structures of their functional domains that are the result of distinctive primary structures, that is, amino acid sequences, resulting in a structural fold or structural motif. Motifs are sometimes referred to as super-secondary structures to indicate that they contain a specific order of secondary structures. Although a large number of structural motifs have been identified in the crystal structures of transcription factors, four families of motifs are especially common: the HTH, the ZF, the HLH, and the leucine zipper.

The HTH motif is found in many prokaryotic transcription factors, including the lac repressor and the bacteriophage λ cro and cI repressors. In the HTH motif, one α helix interacts with the major groove of DNA and the other α helix, which is separated from the first helix by a β turn, lies at an angle across the DNA. The amino acids within the α helices contain functional groups that form hydrogen bonds and ionic interactions with functional groups on the nucleotides of the DNA. In eukaryotes, a motif very similar to the HTH, called the homeodomain, is found in a number of transcription factors (FIGURE 4.30). The term “homeodomain” comes from a highly conserved gene sequence known as the homeobox, which codes for a 60-amino-acid domain commonly found in transcription factors that regulate the development of the body plan of many animals. Examples of eukaryotic transcription factors with homeodomains include the POU family members Pit-1, Oct-1, and unc-86, as well as the HOX proteins involved in pattern formation during development.

The ZF motif is found in the DNA binding domains of various eukaryotic transcription factors, including several nuclear hormone receptors. In addition, other transcription factors, including Sp1 and the Wilms tumor
protein (WT-1), also contain ZF motifs. The ZF motif derives its name from the stable loop of amino acids that forms through the coordination of a zinc ion by cysteine and histidine residues (FIGURE 4.31). The finger itself comprises ∼20 to 25 amino acids, and the linker between adjacent fingers is usually 7 to 8 amino acids. The finger sequences are usually organized into tandem arrays ranging from 2 to 9 finger repeats. Sp1 is an important transcription factor that binds to GC-rich segments located adjacent to the core promoter sequences of many highly expressed housekeeping genes. Sp1 contains three ZF motifs.

FIGURE 4.30 The homeodomain motif. The homeodomain contains three α helices and is found in a number of eukaryotic transcription factors important for cellular development. The N-terminal arm lies in the minor groove. Helices 1 and 2 lie above the DNA, whereas helix 3 (the recognition helix) lies in the major groove and contains amino acids that make specific hydrogen bond contacts with bases in the DNA binding site.

FIGURE 4.31 The zinc-finger motif. Zinc fingers form through the chelation of Zn2+ ion by histidine and/or cysteine amino acids. Often multiple zinc fingers are found within adjacent α helices that insert into the major groove of the DNA.

Many transcription factors function as homodimers or heterodimers. The subunits of these dimers are often the products of related gene families. Different combinations of family members can form heterodimers that create activating or inhibitory complexes with varying capacities to modulate transcription. In some cases, a family includes inhibitory members, whose participation in dimer formation prevents the partner from activating transcription.

FIGURE 4.32 The helix-loop-helix motif. The helix-loop-helix motif is used by transcription factors to form homo- and heterodimers. In the structure shown, the transcription factor MyoD is a homodimer with two recognition helices containing highly basic regions that contact specific bases in the DNA. (Structure from Protein Data Bank 1MDY. P. C. Ma, et al., Cell 77 (1994): 451–459.)

The basic helix-loop-helix or bHLH motif is a common motif found in the dimerization domains of many transcription factors that regulate development in eukaryotes. Each amphipathic helix contains hydrophobic residues on one side and charged residues on the other side that enable proteins to dimerize, whereas a short segment of basic amino acids adjacent to the bHLH region contacts DNA by ionic interactions with the phosphodiester backbone (FIGURE 4.32). The two helices are separated by a relatively unstructured loop that allows the two helices to act independently of each other. Two classes of the many bHLH transcription factors are the MyoD family of proteins, which are
involved in muscle cell differentiation, and the E-box proteins, which play important roles in the regulation of genes involved in cell growth and proliferation.

The bZIP motif is another example of a commonly encountered motif found in the dimerization domains of transcription factors. Each side of the “zipper” consists of a stretch of amino acids with a leucine residue in every seventh position on one monomer subunit that interacts with the corresponding region on the other monomer subunit to form the dimer. Adjacent to each zipper is a stretch of positively charged residues that is involved in binding to DNA, similar to the bHLH motif. The two sides of the leucine zipper in effect form a Y-shaped structure, in which the zipper motifs comprise the stem and the two basic regions protrude out to form the segments that interact with the DNA (FIGURE 4.33). As with the bHLH motif, zippers can be involved in either homodimer or heterodimer formation. An important example of a homodimer transcription factor is C/EBP, which binds to a sequence known as the CAAT box, usually located 50 to 100 bp from the core promoters of highly expressed genes. An example of a heterodimer that forms due to the bZIP motif is the Jun-Fos transcription factor known as AP1, which activates the transcription of a number of genes important in cell growth and proliferation. Mutations in the Jun and Fos subunits of AP1, which are oncoproteins, are associated with certain forms of cancer.

FIGURE 4.33 The leucine zipper motif. The leucine zipper motif is used by many transcription factors to form homo- and heterodimers. The zippers are held together by hydrophobic interactions between the α-helical stretches of leucine residues on each monomer. The basic regions located near the leucine zipper motifs interact with specific bases within the transcription factor binding site.

Aside from the use of structural motifs in the taxonomic classification of transcription factors, knowledge of the location of motif structures can be very useful in the assessment of potential effects of specific gene mutations, particularly disease-causing mutations. Molecular biology techniques are becoming increasingly important in diagnosis and prognosis of diseases and are changing their entire classification systems. Diseases that were formerly classified on the basis of signs and symptoms are now being associated with specific gene mutations. Diagnosis and prognosis of various diseases based on specific mutations and levels of expression of specific genes (referred to as molecular biomarkers) are becoming increasingly commonplace in medical practice. In many cases, some of the most deleterious disease-causing mutations associated with genes encoding transcription factors are found in the key structural motifs of the transcription factor functional domains.

In summary, the genomes of both prokaryotes and eukaryotes encode a large number of transcription factors that act to increase or decrease the basal level of transcription of specific genes, often in response to signals from the cellular environment or other cells. The mechanisms of transcriptional activation include recruitment of the basal transcription machinery, alteration of chromatin structure to clear the core promoter region, and allosteric activation of RNA polymerase. In the next section, we discuss some specific examples of signal-mediated control of transcription initiation in mammalian cells.

Concept and Reasoning Check
1. Enhancers can function at very large distances to activate initiation at core promoters. Several different core promoters are often within the maximum distance for enhancer-mediated activation. Propose some possible mechanisms for selective activation of one promoter over another.

4.7 Transcriptional regulatory circuits control eukaryotic cell growth, proliferation, and differentiation

Key concepts
• A number of important transcription factors activate transcription by facilitating the displacement of nucleosomes from the core promoter region, thereby clearing the region for transcription initiation.
• cAMP-response element binding protein (CREB) and signal transducers and activators of transcription (STAT) are examples of transcription factors that are activated by signal-mediated phosphorylation. STAT is phosphorylated by the Janus kinase (JAK) and CREB is phosphorylated by protein kinase A (PKA).
• The members of the Myc family of proteins (Myc, Max, and Mad) form different combinations of heterodimers that act as activators or repressors depending on the dimer composition.
• The steroid hormone receptors are transcription factors that when bound to hormone activate transcription by binding to DNA sequences known as hormone-response elements within the promoters of hormone-responsive genes.

We previously discussed how transcription activation occurs in both prokaryotes and eukaryotes through interactions between the transcription factors and the basal transcription apparatus. Activation plays an especially important role in eukaryotes where the genome is constitutively repressed due to the assembly of the DNA with histone proteins to form chromatin. An important function of many transcription factors is to participate in the displacement of the nucleosomes from the core promoter region. The chromatin remodeling process occurs in two steps: (1) the loosening of nucleosomes through covalent modification of specific histones, including acetylation, phosphorylation, and methylation, and (2) the physical displacement of the loosened histones from the core promoter region by ATP-dependent structural modification proteins. Some of the gene-specific transcription factors we discuss in this section carry out these activities.

In any given cell type within a tissue, a select set of transcription factors becomes activated in response to extracellular signals. Various combinations of transcription factors induce different transcriptional levels of their target genes, thereby modulating the amounts of mRNA and protein that are produced. We discuss several examples to illustrate the linkage between cell signaling and transcription. BOX 4.1 provides a brief description of common methods for measuring and analyzing changes in gene expression.

A very important signal transduction-mediated, gene-specific transcription factor is CREB. We saw previously in our discussion of the E. coli lac operon that cAMP is a messenger molecule that increases in concentration when bacterial cell glucose levels decrease. Binding of cAMP to CRP promotes the binding of CRP to its recognition site on the lac promoter, which activates transcription through the recruitment of RNA polymerase. However, cAMP also plays an important role in eukaryotic cells by acting as a messenger molecule in response to different types of extracellular signals. When cAMP levels rise, cAMP binds and activates PKA in the cytoplasm. The cAMP–PKA complex enters the nucleus where it phosphorylates CREB. Phosphorylated CREB binds to the cAMP response element within the promoters of a number of cAMP-response genes and activates their transcription (FIGURE 4.34). Activation of transcription by CREB does not occur through a direct interaction with the basal transcription apparatus; instead, CREB recruits a complex containing the CREB binding protein complex, CBP/p300, and the protein PCAF. The CREB–CBP/p300–PCAF complex acetylates the histones in the region of the core promoter, causing their displacement and thereby facilitating the assembly of the basal transcription apparatus at the core promoter. Histone displacement occurs because acetylation of the amino groups on the side chains of lysine residues neutralizes their positive charges and reduces the electrostatic interactions between histones and the negatively charged backbone of DNA.

Another example of the signal-mediated control of transcription factor phosphorylation is the JAK–STAT pathway, which is stimulated by the binding of cytokines to their membrane-bound cytokine receptors. Cytokines are a class of small protein or peptide growth factors that bind to receptors in the plasma membrane of various cells in the immune system and stimulate cell proliferation and growth. Upon binding of the cytokine to the receptor, the receptor phosphorylates and activates a tyrosine kinase called JAK (a member of the Janus kinase family). JAK binds, phosphorylates, and activates a transcription factor known as STAT. Once phosphorylated, STAT forms a homodimer and enters the nucleus where it recognizes sequences known as cytokine response elements located within the promoter regions of a number of different genes involved in cell division (FIGURE 4.35). As in the case of CREB, STATs activate transcription indirectly by recruiting histone acetylases to clear nucleosomes from the promoter region.

The c-Myc family of proteins constitutes another important group of transcription factors and includes the proteins Myc, Max, and

Box 4.1. Methods and techniques: Gene expression profiling

For many years, the expression of genes was investigated one at a time. The study of gene expression during cell growth and proliferation and in development and different diseases, however, has required a more global approach in which the temporal expression of a large number of genes can be measured simultaneously. These needs led to the development of two very powerful approaches for gene expression profiling: microarrays and the quantitative reverse transcriptase polymerase chain reaction (qRT-PCR).

Microarrays are arrays of small pieces of DNA from known segments of individual genes that are printed as individual spots on a solid support such as a silicon chip or a glass slide. A single microarray may have hundreds to tens of thousands of different spots representing all or a subset of the known genes in a genome (FIGURE 4.B1). The entire population of mRNAs isolated from cells is labeled with a fluorescent dye and placed in contact with the array. Each of the labeled mRNAs base pairs (hybridizes) with those arrayed DNA sequences derived from the gene from which the mRNA was transcribed, thereby causing the hybridized DNAs to fluoresce. A fluorescence-imaging device scans the microarray and records the color, fluorescence intensity, and position of the fluorescent spots, which provides an mRNA expression profile for the cell or tissue.

To compare the levels of mRNAs from different cell types or disease states, two different approaches can be taken. In the dual-color microarray experiment, mRNAs are isolated from two different cell types, and each mRNA sample is labeled with a different fluorescent dye. The two samples are then mixed together and hybridized to the same microarray. An estimation of the relative differences in expression of specific genes between the two cell types is obtained by measuring the color ratio of the hybridized fluorescent RNAs. For example, mRNA isolated from cell type 1 is labeled with a dye that fluoresces red, whereas mRNA isolated from cell type 2 is labeled with a dye that fluoresces green. The two samples of mRNA are mixed together and allowed to hybridize to the microarray. If a spot on the microarray fluoresces yellow (green and red produce yellow), one can interpret the result to mean that the mRNAs are present in equal amounts in the two cell types. However, if the spot is red, the gene is expressed at a higher level in cell type 1 than in cell type 2. A green spot indicates the converse. A fluorescence micrograph of a typical microarray chip with mRNA hybridized using the dual-color approach is shown in Figure 4.B1.

The significant decrease in the cost and increase in the quality of microarrays have led to a second approach in which each individual mRNA sample is hybridized to a separate array. The fluorescence intensities from corresponding spots on the different arrays can then be quantitatively compared by computer-assisted image analysis. This approach eliminates the necessity to have dual-color labeling. A display of the data obtained from a comparative microarray analysis is shown in FIGURE 4.B2.

Protocols have also been developed to increase the sensitivity of microarrays for detection of mRNAs that are expressed at very low levels. The enzyme RT is used to convert the isolated mRNA to complementary DNA (cDNA) that can be amplified a million-fold by the PCR. The cDNA is used as a template to prepare RNA that can be detected by direct fluorescent labeling or by secondary fluorescent antibody or streptavidin labeling.

FIGURE 4.B1 A fluorescence microscopy image of a yeast (S. cerevisiae) genome microarray spotted with fragments of 6,000 different yeast genes. The microarray was hybridized to an mRNA sample isolated from yeast cells that were grown under two different metabolic conditions. The transcripts from each population were labeled with different fluorescent dyes (Cy3 and Cy5), mixed together, and hybridized to the same array. The colors of the spots indicate that the level of expression of a particular mRNA was increased (red) or decreased (green) under one condition versus the other. Yellow indicates no difference in expression between the two conditions. (Photo courtesy of the University of New Mexico Health Sciences Keck UNM Genomic Resource.)
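The color interpretation used in the dual-color experiment can be expressed as a short calculation. The Python sketch below is illustrative only and is not taken from the text: the function name `classify_spot`, the use of a log2 intensity ratio, and the two-fold cutoff are assumptions chosen for the example, and the intensity values are invented. Real analyses normally include background subtraction and normalization steps that are omitted here.

```python
import math


def classify_spot(red, green, fold_cutoff=2.0):
    """Classify one microarray spot from its two dye intensities.

    red   : intensity of the dye used to label mRNA from cell type 1
    green : intensity of the dye used to label mRNA from cell type 2
    fold_cutoff : assumed threshold (two-fold) for calling a difference

    Returns a (log2 ratio, color call) tuple.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(red / green)
    cutoff = math.log2(fold_cutoff)
    if log_ratio >= cutoff:
        return log_ratio, "red"      # higher expression in cell type 1
    if log_ratio <= -cutoff:
        return log_ratio, "green"    # higher expression in cell type 2
    return log_ratio, "yellow"       # roughly equal in both cell types


# Hypothetical spot intensities, for illustration only.
spots = {"gene_a": (800, 200), "gene_b": (100, 100), "gene_c": (50, 400)}
for name, (red, green) in spots.items():
    print(name, classify_spot(red, green))
```

Working on a log scale treats increases and decreases symmetrically: a four-fold increase and a four-fold decrease give ratios of +2 and -2 rather than 4 and 0.25.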


FIGURE 4.B2 An example of an analysis of a microarray experiment. In this example, human fibroblast cells were profiled over time after stimulation with serum. Those mRNAs whose expression increased relative to unstimulated cells are shown in red, whereas mRNAs whose expression was decreased relative to unstimulated cells are shown in green. Black indicates no significant change in expression. Groups A–E delineate sets of genes that are grouped together by related functions and expression patterns. (Reproduced from M. Eisen, et al., Proc. Natl. Acad. Sci. USA 95 (1998): 14863–14868. Copyright © 1998 National Academy of Sciences, U.S.A. Photo courtesy of Michael Eisen, University of California, Berkeley.)

The use of DNA microarrays has been perfected over the past decade and is providing a wealth of information on the differential regulation of gene expression during cellular differentiation, proliferation, and development. Microarray analysis is also yielding clues to the changes in gene expression that occur in various diseases including cancer, diabetes, and cardiovascular disease.

A companion technique to microarrays is real-time qRT-PCR. Standard PCR uses specific DNA oligonucleotide primers to initiate DNA replication across a specific section of the genome. By repeating this process for 20–40 cycles, one can amplify a very small amount of DNA from a specific segment of the genome. A variation of PCR is to use fluorescent dyes to monitor the PCR reaction and to quantify the amount of PCR product produced as a function of time (real-time), which is referred to as quantitative PCR. Under optimal conditions, the amount of PCR product is directly proportional to the starting concentration of DNA at the beginning of the PCR. The quantitative PCR technique can determine the level of a specific mRNA in a specific cell type by using the enzyme RT, which copies mRNA into DNA. In qRT-PCR, total RNA is isolated from a cell or tissue and is used to produce a cDNA copy of each mRNA transcript. The cDNA is amplified using primers to a specific mRNA of interest. Several housekeeping genes, whose mRNA expression levels are known to be relatively constant, are used as standards to obtain quantitative results.

There are two major approaches to quantifying the qRT-PCR reaction. In the first approach, a dye that fluoresces intensely upon binding to DNA is included in the reaction (the most commonly used fluorescent dye in qRT-PCR is SYBR Green). As the amount of amplified DNA increases during the qRT-PCR reaction, the amount of bound dye increases, thereby producing an increase in fluorescence. In the second approach, referred to as a TaqMan® assay (Applied Biosystems, Foster City, CA), a labeled oligonucleotide, which hybridizes to a specific sequence in the interior part of the mRNA, is included in the PCR reaction. The probe contains a “reporter” fluorescent dye at one end of the oligonucleotide and a nonfluorescent “quencher” molecule at the other end (FIGURE 4.B3). The quencher acts to strongly inhibit the fluorescence of the reporter in the intact oligonucleotide. During the course of the PCR, synthesis of the complementary strand from one primer displaces the labeled probe. The DNA polymerase used in the PCR assay has a nuclease activity that cleaves the probe, thereby separating the reporter from the quencher, resulting in significantly increased reporter fluorescence. The TaqMan® approach derives its name from the fact that the thermostable DNA polymerase used in the assay, Taq DNA polymerase, was originally isolated from the thermophilic bacterium Thermus aquaticus, which lives in hot springs and hydrothermal vents. Taq DNA polymerase can withstand the constant heating–cooling cycles of the PCR process.

FIGURE 4.B3 The TaqMan® PCR assay is used to quantify the amount of a specific mRNA in an RNA sample. A labeled oligonucleotide probe, which hybridizes to a specific sequence in the interior part of the mRNA, is included in the PCR reaction, which contains a forward and a reverse primer. The probe contains a “reporter” fluorescent dye at one end of the oligonucleotide probe and a nonfluorescent “quencher” molecule at the other end. The quencher acts to strongly inhibit the fluorescence of the reporter in the intact oligonucleotide. During the PCR reaction, synthesis of the complementary strand displaces the oligonucleotide from the strand that is undergoing amplification. The DNA polymerase used in the PCR assay has a nuclease activity that cleaves the reporter from the quencher, resulting in significantly increased reporter fluorescence. The TaqMan® approach derives its name from the fact that the thermostable DNA polymerase used in the assay, Taq DNA polymerase, was originally isolated from the thermophilic bacterium T. aquaticus. (Courtesy of Life Technologies, Carlsbad, CA.) Panel labels: polymerization (forward primer, reverse primer, probe; R = reporter, Q = quencher), strand displacement, cleavage, polymerization completed.

The qRT-PCR method has become a popular and widespread approach to examine changes in the expression of specific genes in relationship to cell proliferation, differentiation, and tissue development. qRT-PCR is also frequently used to verify the changes in gene expression detected in large microarray studies. Used together, microarrays and qRT-PCR provide a quantitative profile of changes in the expression of specific genes at the RNA level. These technologies are revolutionizing our approach to the study of human disease. Many diseases that were previously classified based on physical signs and symptoms are now classified according to molecular signatures based on differences in gene expression, which is becoming an invaluable tool for precise disease diagnosis and prognosis, and in the evaluation of therapeutic options.
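One common way to turn housekeeping-gene standards into a number is the comparative threshold-cycle calculation, often written as 2^(-ΔΔCt), where Ct is the cycle at which the fluorescence of a reaction crosses a fixed threshold. The box does not name this method, so the Python sketch below is an assumption offered for illustration. It also assumes that the amount of product doubles with every cycle (an efficiency of 2); the function name and the Ct values in the usage lines are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control,
                        efficiency=2.0):
    """Estimate the fold change of a target mRNA in a sample versus a control.

    Each argument is a threshold cycle (Ct). The reference (housekeeping)
    gene is assumed to be expressed at the same level in both conditions.
    efficiency=2.0 assumes perfect doubling of product per PCR cycle.
    """
    # Normalize the target to the housekeeping gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two conditions.
    delta_delta = delta_sample - delta_control
    # A lower Ct means more starting template, hence the negative exponent.
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target crosses the threshold two cycles
# earlier in the sample than in the control, with the same reference Ct.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(f"fold change relative to control: {fold:.1f}")
```

Because each cycle corresponds to one doubling under the stated assumption, a difference of two cycles translates into a four-fold difference in starting template.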

Mad. In highly differentiated nonproliferating cells, the expression of the Myc family genes is low. Upon stimulation of cells with growth factors, the Myc and Max genes are activated. All members of the Myc family contain a leucine zipper dimerization motif adjacent to a bHLH dimerization–DNA binding motif (FIGURE 4.36). Once synthesized, Myc and Max form a heterodimer transcription activation factor that binds to an E-box sequence in the promoter regions of specific genes involved in cellular growth and proliferation, including the

FIGURE 4.34 Regulation of PKA-activated genes by CREB. The cAMP-dependent protein kinase (PKA) phosphorylates and activates the cAMP response element binding (CREB) protein. Phosphorylated CREB recruits the CREB-binding protein (CBP), which as a complex activates a number of genes that are important in hormonal control of cellular metabolism and in brain function. (Adapted from B. M. Alberts, et al. Molecular Biology of the Cell, Fifth edition. Garland Science, 2008.) Panel labels: cytosol, nucleus, nuclear pore, activated PKA, inactive CREB, activated phosphorylated CREB, CREB-binding protein (CBP), cyclic AMP response element (CRE), activated target gene, transcription.


FIGURE 4.35 JAK–STAT signaling. Cytokine signaling works via JAK–STAT transcriptional activation. Binding of a cytokine cross-links adjacent cytokine receptors, and the associated Janus kinases (JAKs) cross-phosphorylate each other on tyrosines. The STAT subunits bind and are phosphorylated by the JAK dimer. The phosphorylated STAT dimer enters the nucleus and forms a complex with other transcription coactivator proteins to recognize the promoters of genes involved in the differentiation, growth, and proliferation of various blood cells. (Adapted from B. M. Alberts, et al. Molecular Biology of the Cell, Fifth edition. Garland Science, 2008.) Panel labels: binding of cytokine cross-links adjacent receptors; JAKs cross-phosphorylate each other on tyrosines; activated JAKs phosphorylate receptors on tyrosines; STATs dock with specific phosphotyrosines on the receptors; JAKs phosphorylate the STATs; STATs dissociate from receptor and dimerize via their SH2 domain; STATs translocate to nucleus, bind to DNA and other gene regulatory proteins, and activate gene transcription. Other labels: cytosol, nucleus, nuclear pore, cytokine, cytokine receptors, JAK, STAT1, STAT2, SH2 domain, cytokine response element, cytokine target gene.

FIGURE 4.36 Transcription activation by Myc family proteins. Myc is the founding member of a bHLH family of proteins that can form homo- and heterodimers. The Myc–Max dimer activates the transcription of genes involved in cell growth and proliferation. The Mad–Max dimer represses the transcription of these same genes. When cells differentiate, Mad levels increase, thereby increasing the formation of Mad–Max dimers and inhibiting the formation of Myc–Max dimers. The result of the switch from Myc–Max to Mad–Max dimers is a decrease in cell growth and proliferation. (Adapted from R. A. Weinberg. The Biology of Cancer. Garland Science, 2007.) Panel labels: Myc–Max bound to the E-box (CACGTG/GTGCAC), cell growth and proliferation; when cells differentiate, Mad levels increase; Mad–Max dimers replace Myc–Max dimers; Mad–Max bound to the E-box (CACGTG/GTGCAC), more differentiation, less proliferation, cell growth and proliferation decreases.


cyclin genes, thereby activating DNA synthesis and cell division. When growth factors are not present, Mad protein is synthesized and forms a heterodimer with Max. The Mad–Max dimer inhibits Myc–Max formation and also competes with Myc–Max for binding to the E-box sites, thereby acting as a transcriptional repressor to turn off cell proliferation and keep the cells in a quiescent, differentiated state. Thus, control of cell proliferation and differentiation is due to the maintenance of the correct ratio of Myc–Max to Max–Mad complexes. In certain forms of cancer Myc is overexpressed, leading to uncontrolled cell proliferation. In this regard, Myc is referred to as an oncoprotein.

Our final example of gene-specific transcription factors is the family of nuclear receptors that includes the steroid hormone receptors for estrogen, progesterone, and testosterone, as well as related nuclear receptors for vitamin D, retinoic acid, and cortisol. These hormones are small molecules that are only partially soluble in water and are usually transported in the bloodstream by carrier proteins; however, once delivered to the exterior of the cell, these small hydrophobic molecules diffuse across the plasma membrane (FIGURE 4.37). Each compound is secreted by a specific cell type and plays a role in tissue-specific transcription activation. In contrast to the membrane-associated receptor signaling discussed in the previous examples, steroid hormone receptors usually bind their ligands in either the cytoplasm or the nucleus, depending on the receptor. The resulting hormone–receptor complex is an active transcription factor that can bind to its specific activation sequence, known as a hormone-response element, located in the enhancer regions of target genes (FIGURE 4.38). Each steroid hormone receptor contains a ligand binding domain near its C-terminus, a sequence-specific DNA binding domain in the middle of the protein, and an activation domain at the N-terminus. The DNA binding domains of most steroid hormone receptors contain a ZF motif located adjacent to a basic sequence that recognizes the specific hormone-response element DNA sequence. The steroid hormone-response elements consist of two short half sites separated by 1 to 5 bp; thus, the hormone receptors function only as dimers. Some steroid receptors function as homodimers and others as heterodimers. In a manner very similar to CREB, the hormone-bound steroid receptor binds to its response element and recruits the coactivators PCAF and dimeric protein CBP/p300 to the promoter, causing acetylation of histones and displacement of nucleosomes within the core promoter regions of steroid hormone-responsive genes (FIGURE 4.39).

FIGURE 4.37 Steroid and related hormones that activate transcription. Steroid and related hormones bind to protein receptors that function as transcription factors to activate specific promoters. Panels: corticoids (adrenal steroids), comprising glucocorticoids such as cortisol, which increase blood sugar and also have anti-inflammatory action, and mineralocorticoids such as aldosterone, which maintain water and salt balance; steroid sex hormones, comprising estrogens such as β-estradiol, which are involved in female sex development, and androgens such as testosterone, which are required for male sex development; development and morphogenesis, comprising vitamin D (vitamin D3), which is required for bone development and calcium metabolism, and (trans) retinoic acid, which is a morphogen; thyroid hormones such as triiodothyronine (T3), which control basal metabolic rate.

FIGURE 4.38 Steroid-receptor transcription factors. The glucocorticoid receptor is an example of a steroid hormone-receptor transcription factor, which binds to cortisol and interacts with the glucocorticoid response element (GRE) located within the enhancers of promoters for specific genes. Panel labels: steroid, receptor, cytoplasm, nucleus, GRE/enhancer, promoter, activation, transcription.

FIGURE 4.39 Steroid receptors use coactivators and corepressors to control transcription activation. The retinoic acid receptor (RAR) binds the SMRT corepressor in the absence of retinoic acid, thereby repressing transcription. When retinoic acid binds to the RAR, SMRT is replaced by the coactivator complex containing the proteins PCAF, CBP, and p300, which leads to activation of the general transcription apparatus bound at the core promoter of retinoic acid-responsive genes. Panel labels: SMRT, steroid receptor, ligand, ligand triggers activation, coactivators (PCAF, CBP/p300), basal apparatus (TBP, RNA polymerase).

In summary, we have seen examples of three different mechanisms for signal transduction-mediated regulation of transcription factor activation: phosphorylation in the cases of CREB and STAT, heterodimerization in the case of Myc–Max, and allosteric-induced conformational changes by hormone binding in the case of the steroid receptors. Although there are cases in which transcription factors may interact directly to recruit or stabilize the assembly of the basal transcription apparatus, many activator proteins can function indirectly by promoting the displacement of nucleosomes and clearing the core promoter region for the assembly of the basal transcription apparatus and other transcription factors. Thus, transcription activation in metazoan cells is a complex process that is based on combinations and permutations of factors whose assembly and function are governed by signals from the cellular environment.

Concept and Reasoning Check
1. Some breast cancer cells express estrogen receptors that increase the growth rate of the tumor. A number of anticancer drugs are estrogen analogs that inhibit the binding of estrogen to the estrogen receptor. Over time, these drugs often lose their efficacy. Propose at least two different hypotheses for how the resistance to the drugs could develop.

4.8 The 5′ and 3′ ends of mature mRNAs are generated by RNA processing

Key concepts
• The 5′ end of a eukaryotic mRNA is “capped” with a 7-MeG ribonucleotide that protects the 5′ end of the transcript from enzymatic degradation and functions as a binding site for proteins involved in mRNA export from the nucleus to the cytoplasm and for translation initiation factors in the cytoplasm.
• The 3′ end of the mRNA is generated by an endonucleolytic cleavage reaction and the polymerization of a large number of adenosines to form the poly(A) tail. The poly(A) tail protects the 3′ end of the transcript from degradation and modulates the efficiency of translation initiation.

Most mRNA transcripts in eukaryotic cells are not functional when first synthesized and must go through one or more steps to convert them into active forms. The initial transcript (also called the primary transcript or pre-mRNA) is converted to the active or mature transcript through a series of covalent modifications. The first processing reaction is the addition of the 7-MeG cap; the second processing reaction is the addition of 50 to 300 adenine ribonucleotides onto the 3′ end to form the poly(A) tail. In addition, most eukaryotic mRNAs are also spliced. In this section, the capping and polyadenylation processing reactions are described; in the following section RNA splicing is presented.

Capping of the RNA transcript with 7-MeG is unique to RNA polymerase II transcripts. The cap is important for mRNA stability and


translation and also plays a role in mRNA nucleocytoplasmic transport. The cap is added to the 5′ end of the transcript as soon as the RNA emerges from the active site of RNA polymerase II. The structure of the 7-MeG cap is shown in FIGURE 4.40, and the steps in the capping process are shown in FIGURE 4.41. In the first step, an RNA triphosphatase removes the γ-phosphate of the triphosphate group on the first base at the 5′ end (which is usually a purine, A or G). Then a guanylyltransferase puts a guanosine monophosphate on the 5′ end. This reaction is not the conventional 5′ to 3′ linkage found in all other types of nucleotidyl transfer reactions; instead, the linkage is 5′ to 5′. In the final step, the enzyme N7G-methyltransferase transfers a methyl group from S-adenosylmethionine to the N7 position of the added guanine at the 5′ end.

FIGURE 4.40 Structure of the 7-MeG cap. The 7-MeG cap contains a methyl group on the 7-position of guanine and on the 2′ position of the ribose of the nucleotide at the original 5′ end of the mRNA. The 7-MeG cap is attached to the 5′ end via an unusual 5′→5′ triphosphate linkage. Panel labels: 7-methylguanine, triphosphate, B1, B2, messenger RNA 5′-cap structure.

In addition to the 7-MeG cap, almost all mRNAs undergo an additional posttranscriptional modification in which a poly(A) tail is added to the 3′ end of the transcript (histone mRNAs are an important exception). The poly(A) tail is not encoded by the gene sequence but is added by the enzyme poly(A) polymerase in concert with a number of polyadenylation factors. The polyadenylation reaction occurs in two steps:

1. In the first step, a large complex consisting of more than a dozen proteins recognizes the polyadenylation signal AAUAAA on the pre-mRNA and directs the cleavage of the transcript just after the first CA dinucleotide located within 10 to 30 nucleotides downstream of the beginning of the AAUAAA (FIGURE 4.42).

2. In the second step, poly(A) polymerase adds the poly(A) tail. The initial length of the poly(A) tail ranges from 50 to 80 adenines in lower eukaryotes such as yeasts to 200 to 300 adenines in mammals. Elements located upstream and downstream from the polyadenylation signal can enhance the efficiency of the polyadenylation reaction.

Several additional proteins are also required for cleavage and polyadenylation, as depicted in FIGURE 4.43. The key protein involved in mammalian polyadenylation is the cleavage/polyadenylation specificity factor (CPSF), which binds to the AAUAAA sequence and directs both cleavage and polyadenylation (FIGURE 4.44). Homologs of CPSF are found in all eukaryotes. Mammalian CPSF consists of four subunits, CPSF-160, CPSF-100, CPSF-73, and CPSF-30. CPSF-73 is thought to be the actual cleavage enzyme that cuts the mRNA just after the CA dinucleotide to form the 3′ end for the addition of the poly(A) tail. The cleavage stimulatory factor CstF binds to a GU-rich sequence downstream of the AAUAAA site and participates only in the cleavage reaction. Two other components, cleavage factors I and II, are required to stimulate the cleavage reaction. The second function of CPSF is to hold the poly(A) polymerase on the end of the poly(A) tail so the polymerization proceeds rapidly without the polymerase falling off the end. The first ∼10 adenines are added slowly, but once these are polymerized, the reaction proceeds rapidly to produce the full 200- to 300-residue length tail.

Another protein important for the regulation of poly(A) tail length is the nuclear poly(A) binding protein, PABPN1. PABPN1 binds to the poly(A) tail as a linear array of monomers as the tail is synthesized and helps CPSF


FIGURE 4.41 Pathway for the synthesis of the 7-MeG cap. The γ-phosphate on the 5′ end of the pre-mRNA is removed by an RNA triphosphatase. Guanylyltransferase catalyzes the addition of guanosine monophosphate to the 5′ end of the pre-mRNA. N7G-methyltransferase catalyzes the methylation of the terminal guanosine. Reaction scheme, starting from the initiating purine nucleotide (Pu) of the RNA chain: RNA 5′-triphosphatase (H2O in, phosphate out); guanylyltransferase (GTP in, PPi out); N7G-methyltransferase (S-adenosylmethionine in, S-adenosylhomocysteine out), giving the methylated product labeled cap 0.

FIGURE 4.42 Polyadenylation signals in mammals and yeast. (A) The poly(A) signal in mammals is composed of the highly conserved sequence AAUAAA located 10–30 nucleotides upstream of the cleavage site. In most cases, additional polyadenylation enhancer elements are located upstream and downstream of the cleavage site. (B) The poly(A) signal in yeast is composed of elements that are more variable than those in mammals. (Part A adapted from G. M. Gilmartin, Genes Dev. 19 (2005): 2517–2521. Part B adapted from J. Zhao, L. Hyman, and C. Moore, Mol. Biol. Rev. 63 (1999): 405–445.) Panel (a), poly(A) site recognition in mammals, 5′ to 3′: UGUA element (variable distance); AAUAAA polyadenylation signal (highly conserved); 10–30 nt; poly(A) site (cleavage site); ≤30 nt; U-rich downstream element. Panel (b), poly(A) site recognition in yeast, 5′ to 3′: efficiency element (EE; UAUAUA and related sequences); positioning element (PE; AAUAAA, AAAAAA, and related sequences); 10–30 nt; poly(A) site, Py(A)n (cleavage site).


Factor; processing step; function:

CPSF (cleavage/polyadenylation specificity factor); cleavage and poly(A) addition; contains four subunits. CPSF-73 probably cleaves pre-mRNA at the poly(A) site. CPSF-160 binds to AAUAAA, and the CPSF-30 and CPSF-100 subunits increase specificity.

CstF (cleavage stimulation factor); cleavage; contains three subunits (CstF-77, CstF-64, and CstF-50). CstF-64 binds to the U-rich sequence; CstF-77 bridges CstF-64 and CstF-50 and contacts CPSF-160.

CFI (cleavage factor I); cleavage; recognizes sequence elements in the poly(A) site.

CFII (cleavage factor II); cleavage; unknown.

PAP (poly(A) polymerase); cleavage and poly(A) addition; catalyzes poly(A) formation.

PAB II (poly(A) binding protein); poly(A) elongation; binds poly(A) and CPSF-30; responsible for processive poly(A) elongation and for the tail length.

CTD (carboxyl terminal domain of the large subunit in RNA polymerase II); cleavage; binds CPSF and CstF.

Ssu72 and PC4; cleavage; Ssu72 and PC4 proteins interact with the general transcription factor TFIIB and with components of the cleavage/polyadenylation machinery.

Symplekin; cleavage and poly(A) addition; helps to assemble or stabilize the CstF complex and thereby helps to hold the complete cleavage/polyadenylation machinery together.

FIGURE 4.43 Components of the mammalian cleavage/polyadenylation machinery.

to maintain the speed of the reaction. Once the tail has reached the 200 to 300 adenine length, however, the poly(A)-bound PABPN1 condenses into an oligomeric particle that functions to dissociate poly(A) polymerase from CPSF and the 3′ end of the poly(A) tail, thereby terminating the polyadenylation reaction. PABPN1 remains bound to the poly(A) tail for the duration of the time that the mRNA is in the nucleus but dissociates from the mRNA poly(A) tail during or shortly after the transcript is exported to the cytoplasm.

FIGURE 4.44 The mammalian polyadenylation machinery. The black diamonds indicate experimentally confirmed protein–protein interactions between specific components of the complex. CPSF-160 is bound to the AAUAAA signal and CstF-64 is bound to the U-rich downstream element. The complex is assembled with the aid of the CTD of the largest subunit of RNA polymerase II. (Adapted from O. Calvo and J. L. Manley, Genes Dev. 17 (2003): 1321–1327.) Panel labels: RNAP II and its CTD; CPSF (CPSF-160, CPSF-100, CPSF-73, CPSF-30); CstF (CstF-77, CstF-64, CstF-50); Ssu72; Symplekin; PC4; PAP; CFI; CFII; pre-mRNA drawn 5′ to 3′ with the AAUAAA signal and the U-rich element.
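The two-step polyadenylation reaction described in this section can be mimicked on a sequence string. The Python sketch below is a simplified illustration and not a model of the actual machinery: it assumes that the AAUAAA signal and the first CA dinucleotide lying 10 to 30 nucleotides downstream of the start of the signal are the only determinants, and it ignores the upstream and downstream enhancer elements and all protein factors. The function names and the example sequence are hypothetical.

```python
def find_cleavage_site(pre_mrna, signal="AAUAAA", window=(10, 30)):
    """Return the index just after the CA dinucleotide chosen for cleavage.

    The CA must begin between window[0] and window[1] nucleotides
    downstream of the first nucleotide of the signal. Returns None if
    no signal or no suitable CA is found.
    """
    rna = pre_mrna.upper().replace("T", "U")   # accept DNA-style input
    start = rna.find(signal)
    if start == -1:
        return None
    first = start + window[0]
    last = start + window[1]
    # Stop one short of the end so that a two-letter slice always fits.
    for i in range(first, min(last + 1, len(rna) - 1)):
        if rna[i:i + 2] == "CA":
            return i + 2                        # cleave just after the CA
    return None


def add_poly_a(cleaved_rna, length=200):
    """Append a poly(A) tail; 200 is within the 200 to 300 range given for mammals."""
    return cleaved_rna + "A" * length


# Invented sequence: signal, a 12-nt spacer, a CA, then a GU-rich region.
example = "GG" + "AAUAAA" + "G" * 12 + "CA" + "GUGUGU"
site = find_cleavage_site(example)
if site is not None:
    mature_3_end = add_poly_a(example[:site], length=20)
    print(site, mature_3_end)
```

Returning the cut position rather than the cleaved string keeps the two steps separate, which mirrors the text: cleavage first, then tail addition by poly(A) polymerase.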


In summary, eukaryotic mRNAs are posttranscriptionally modified to add a 7-MeG cap to the 5′ end and a poly(A) tail to the 3′ end. In the next section, we explore how RNA polymerase terminates the transcription reaction.

Concept and Reasoning Check
1. All RNAs synthesized by RNA polymerase II are capped, regardless of whether they function as mRNAs or are ncRNAs. What might be the function of the cap for RNAs that do not encode proteins?

4.9 Terminators direct the end of transcription elongation

Key concepts
• Transcription terminators are sequences that signal RNA polymerase to pause, release the RNA transcript, and dissociate from the DNA template.
• There are two classes of prokaryotic transcription terminators: intrinsic terminators that are RNA structures, which do not require protein factors for function, and factor-dependent terminators. Intrinsic terminators consist of a GC-rich stem-loop structure followed by four to eight uridine residues. The most common type of factor-dependent terminator requires the ATP-dependent Rho helicase for transcript release.
• Each of the three eukaryotic nuclear RNA polymerases uses a different transcription termination strategy. RNA polymerase I uses a combination of DNA sequences and DNA binding proteins. RNA polymerase III uses RNA sequences. RNA polymerase II couples transcription termination to cleavage and polyadenylation of the transcript.

Although transcription initiation is of central importance in gene regulation, transcription termination is an essential process for ensuring that adjacent genes are transcribed in an independent and orderly manner and that the transcription apparatus is recycled once synthesis of the transcript has been completed. The termination reaction is composed of three steps: (1) the cessation of RNA polymerase elongation, (2) the dissociation of the transcript from the DNA template, and (3) the dissociation of RNA polymerase from the template. The sequence that directs this process is called the transcription termination signal, and the transcription stop site is the last nucleotide in the template that directs the synthesis of the transcript. In prokaryotes, termination generates the 3′ termini of most transcripts, although some transcripts may be processed by endo- and exonucleases. In eukaryotes, however, the 3′ termini of nearly all species of RNAs are generated by posttranscriptional processing, which may include endonucleolytic or exonucleolytic cleavages and, in the case of mRNAs, addition of a poly(A) tail. In this section, we focus on the signals and proteins responsible for transcription termination.

Terminators are classified in prokaryotes according to whether or not RNA polymerase requires additional protein factors to recognize the termination signal. If RNA polymerase efficiently terminates transcription at specific locations on the DNA in the absence of any other factors, these sites are called intrinsic terminators. Intrinsic prokaryotic terminators have two structural features (FIGURE 4.45): a stable GC-rich stem-loop structure followed by a stretch of U residues. Both features are required for termination. The stem-loop structure signals RNA polymerase to pause, whereas the U-rich region increases the instability of the DNA–RNA hybrid inside the active site of the polymerase, promoting the dissociation of the transcript from the ternary complex. Often, the actual termination event takes place at any one of several positions toward or at the end of the U-rich region.

[Figure 4.45 labels: GC-rich region in stem; single-stranded U-run]

FIGURE 4.45 The structure of a bacterial intrinsic terminator. Intrinsic terminators in bacteria have an RNA structure that includes a thermodynamically stable GC-rich stem-loop structure followed by several uridine nucleotides.

Rho, the most common bacterial transcription termination factor, is an ATP-dependent helicase composed of six identical subunits. FIGURE 4.46 depicts the current model for how Rho functions. First, Rho binds to the rho-utilization


[Figure 4.46 labels: steps 1–4; amino-terminal domain (contains the 1° site); carboxy-terminal domain (contains the 2° site); rut sequence; mRNA (5′ and 3′ ends); RNA polymerase; DNA]

FIGURE 4.46 A model for Rho-dependent transcription termination. Rho-dependent transcription termination occurs in four steps. Step 1: The amino-terminal domain of Rho in the open-ring conformation binds the Rho-utilization (rut) sequence on the mRNA. Step 2: The ring closes and Rho binds a second site on the mRNA, thereby activating the Rho ATPase. Step 3: Rho reels the mRNA through its hexamer ring by coupling its helicase activity to the hydrolysis of ATP. Step 4: The mRNA–Rho complex is dissociated from RNA polymerase, which disengages RNA polymerase from the DNA. (Adapted from D. L. Kaplan and M. J. O’Donnell, Curr. Biol. 13 (2003): R714–R716.)

(rut) sequence within the RNA transcript, which stimulates the Rho ATP-dependent helicase activity. Rho remains tethered to the rut site while translocating along the RNA transcript in a process referred to as tethered tracking. RNA polymerase continues to transcribe the template until it stops at a strong pause site, allowing Rho to catch up and release the transcript and RNA polymerase from the DNA template.

Our understanding of transcription termination in eukaryotes is less detailed than in prokaryotes. RNA polymerase I transcription termination involves a sequence-specific DNA binding protein called transcription termination factor (TTF-1) that functions as a blockade to pause the polymerase. In concert with the transcription release factor PTRF, RNA polymerase I releases the transcript 10 to 12 bp upstream of the binding site (FIGURE 4.47).

RNA polymerase III terminates at a location containing a series of uridine residues, similar to prokaryotic intrinsic terminators. Other sequences adjacent to the U run are important for pausing the polymerase. The role of protein factors in RNA polymerase III transcription termination remains obscure.

Transcription termination of RNA polymerase II is much more complicated. Most

[Figure 4.47 labels: RNA polymerase I; nascent RNA and RNA 3′ end; upstream element (10–12 bp); Sal box (18 bp TTF-1 binding site); TTF-1]

FIGURE 4.47 Mammalian RNA polymerase I transcription termination. The mammalian RNA polymerase I terminator contains a run of A-T base pairs followed by a run of G-C base pairs, which causes RNA polymerase I to pause on the DNA template. These sequence elements are located upstream of the binding site for the transcription termination factor TTF-1, which stimulates release of RNA polymerase I from the DNA template. (Adapted from R. H. Reeder and W. H. Lang, Trends Biochem. Sci. 22 (1997): 473–477.)
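The two features of a bacterial intrinsic terminator described in this section, a GC-rich stem-loop followed by a run of uridines, can be searched for in an RNA sequence. The Python sketch below is an illustration only. The function names, the stem length of 5 to 10 base pairs, the loop length of 3 to 8 nucleotides, and the 70% GC cutoff are assumptions chosen for the example; only the minimum of four uridines follows the "four to eight uridine residues" given in the text. The sketch requires perfect Watson-Crick pairing in the stem and ignores G-U wobble pairs and folding energetics, so it is far cruder than a real terminator predictor.

```python
# Toy scan for intrinsic-terminator-like motifs: GC-rich hairpin + U run.
# All thresholds below are illustrative assumptions, not measured values.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string made of A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_intrinsic_terminators(rna, stem_range=(5, 10), loop_range=(3, 8),
                               min_gc=0.7, min_u_run=4):
    """Return (start, stem_length, loop_length, u_run_length) for each hit."""
    hits = []
    for start in range(len(rna)):
        for stem in range(stem_range[0], stem_range[1] + 1):
            left = rna[start:start + stem]
            if len(left) < stem:
                break  # ran off the end of the sequence
            gc_fraction = (left.count("G") + left.count("C")) / stem
            if gc_fraction < min_gc:
                continue  # stem is not GC-rich enough
            for loop in range(loop_range[0], loop_range[1] + 1):
                right_start = start + stem + loop
                right = rna[right_start:right_start + stem]
                if right != reverse_complement(left):
                    continue  # the two arms cannot pair perfectly
                # Count uridines immediately downstream of the hairpin.
                tail = rna[right_start + stem:right_start + stem + 8]
                u_run = len(tail) - len(tail.lstrip("U"))
                if u_run >= min_u_run:
                    hits.append((start, stem, loop, u_run))
    return hits
```

Because a shorter stem nested inside a longer one also satisfies the criteria, a single hairpin may be reported more than once; a fuller implementation would merge overlapping hits.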


importantly, RNA polymerase II transcripts are processed by an internal cleavage reaction that releases the transcript from the polymerase, as discussed previously. In the case of almost all mRNAs, polyadenylation occurs immediately after cleavage. Termination of RNA polymerase II transcription occurs downstream of the site where the RNA cleavage reaction took place, sometimes thousands of nucleotides downstream of the cleavage site. At first glance, this leaves the impression that termination may be somewhat superfluous during RNA polymerase II transcription. However, experimental evidence demonstrates that the cleavage and polyadenylation reactions are tightly coupled to transcription termination.

FIGURE 4.48 shows two primary models of how coupling of transcription termination to polyadenylation might occur. In the “antitermination” model, positive elongation/

[Figure 4.48 panels: (a) “Antitermination” model: capping enzyme (CE) and poly(A) factors on the phosphorylated CTD of RNAP II; 5′ cap (Gppp); positive elongation/antitermination factors; pause at the poly(A) site; negative elongation/termination factors; termination and RNAP II dissociation. (b) “Torpedo” model: cleavage and polyadenylation at the poly(A) site; RNAP II pause; Rat1/Xrn2; 5′→3′ RNA degradation; termination and RNAP II dissociation.]

FIGURE 4.48 Two models for termination of RNA polymerase II transcription. (A) The “antitermination model” proposes that elongation factors are loaded onto the phosphorylated CTD of RNA polymerase II, thereby preventing premature termination during transcription elongation. Once RNA polymerase II passes the poly(A) site, termination factors replace the elongation factors and stimulate termination. (B) The “torpedo” model proposes that when cleavage of the transcript occurs at the poly(A) site, the exonuclease RAT1/Xrn2 digests the nascent transcript from the uncapped 5′ end in a 5′→3′ direction, leading to the dissociation of RNA polymerase from the DNA template. (Adapted from S. Buratowski, Curr. Opin. Cell Biol. 17 (2005): 257–261.)
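The torpedo model describes a race between the exonuclease and the polymerase, which can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that both enzymes move at constant rates and that the polymerase is some number of nucleotides past the cleavage site when the exonuclease starts. If the polymerase has a head start of d nucleotides and moves at rate v_pol while the exonuclease moves at v_exo > v_pol, the gap closes after t = d / (v_exo − v_pol), at which point the polymerase is d + v_pol · t nucleotides past the cleavage site. The function name, parameter names, and any rates passed in are hypothetical, not measured values.

```python
def torpedo_catch_up(head_start_nt, pol_rate, exo_rate):
    """Estimate when and where the exonuclease reaches the polymerase.

    head_start_nt: distance (nt) of the polymerase past the cleavage site
                   when degradation begins (an assumed starting condition)
    pol_rate:      polymerase elongation rate (nt/s)
    exo_rate:      exonuclease degradation rate (nt/s)

    Returns (time_s, distance_nt), or None if the exonuclease is not
    faster than the polymerase and therefore never catches up.
    """
    if exo_rate <= pol_rate:
        return None
    time_s = head_start_nt / (exo_rate - pol_rate)
    distance_nt = head_start_nt + pol_rate * time_s
    return time_s, distance_nt
```

Under these assumptions the termination point moves farther downstream as the two rates approach each other, which is consistent with the idea that termination sites for RNA polymerase II are not fixed positions.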


antitermination factors assemble onto the CTD of the large subunit of RNA polymerase II as the polymerase leaves the promoter. Upon pausing at the polyadenylation signal during the cleavage reaction, the positive elongation/antitermination factors are replaced by the negative elongation/termination factors. In the “torpedo” model of RNA polymerase II termination, an exonuclease called RAT1/Xrn2 binds to the newly generated 5′ end of the remaining RNA transcript after cleavage at the poly(A) site. As the transcript continues to be transcribed after cleavage, RAT1/Xrn2 degrades the RNA faster than the transcript is synthesized. Once the exonuclease has reached RNA polymerase II, the nuclease interacts with proteins bound to the CTD on the large subunit of the polymerase, which triggers the release of the polymerase and the transcript from the DNA template. This model is similar to how the termination factor Rho functions and explains why the termination sites for RNA polymerase II are not specific sites: the exact location of termination is determined by the rate of elongation of RNA polymerase on the specific gene sequence and the speed at which the exonuclease degrades the specific transcript. In both models the cleavage event provides an indirect trigger for termination of transcription at a site distal to the polyadenylation signal. The intricacy of the coupling between polyadenylation and termination suggests that transcription termination is extremely important to the cell, most likely for the elimination of interference by nonterminated RNA polymerase II complexes in the process of transcribing adjacent genes and for the recycling of RNA polymerase II molecules for new rounds of transcription.

Concept and Reasoning Check
1. Why is it necessary to terminate transcription of RNA polymerase II if the transcript is already released by the cleavage step in polyadenylation?

4.10 Introns in eukaryotic pre-mRNAs are removed by the spliceosome

Key concepts
• Splicing of introns from pre-mRNAs is a highly conserved process found in all eukaryotic organisms. Almost all mRNAs in eukaryotic organisms contain a number of introns, ranging from one to two per gene in yeast to an average of eight per gene in mammals. Some genes, however, have hundreds of introns.
• Most exon–intron junctions have the sequence AG/GU at the 5′ boundary and AG/G at the 3′ boundary.
• The major steps of splicing are as follows: (1) cleavage of the 5′ exon–intron junction, (2) covalent attachment of the 5′ end of the intron to the branch point located just upstream of the 3′ exon–intron junction, (3) cleavage at the 3′ exon–intron junction, and (4) ligation of the upstream exon to the downstream exon.

The removal of introns from pre-mRNA transcripts is a critical intermediate step in the expression of protein-coding genes. All organisms or the viruses that infect them have interrupted genes in their genomes, although introns are very rare in prokaryotes and are found in only a small proportion of unicellular eukaryotic genes. In higher eukaryotes, however, introns in protein-coding genes are the rule rather than the exception, with the average human gene containing about eight introns. At least a dozen genes in the human genome have more than 100 introns. The sizes of mammalian introns range between ∼1,000 and 700,000 nucleotides, with an average of 1,000 to 2,000 nucleotides. Removal of the introns from pre-mRNA produces mature mRNAs that range in size from ∼1,000 to 100,000 nucleotides. Although introns were originally thought to correspond to specific structural or functional domains in proteins, introns can be found in the UTRs as well as in ncRNA genes. Thus, the origins and purposes of introns remain unclear. In this section, the biochemical mechanisms and macromolecular assemblies involved in pre-mRNA splicing are discussed.

The spliceosome is a very large RNP complex that functions to recognize the splicing signals in the pre-mRNA that delineate exon–intron boundaries and to catalyze the cleavage and ligation reactions in the splicing pathway. Three short conserved RNA sequences, known as splicing signals, are required for splicing. The splicing signals in yeast are shown in FIGURE 4.49, whereas the analogous signals in mammals are shown in FIGURE 4.50: the 5′ splice junction that determines the 5′ end of the intron; the 3′ splice junction that determines the 3′ end of the intron; and the branch point, which is a sequence where the 5′ end of the intron forms a covalent linkage at an internal segment of the intron to form a lariat-like RNA intermediate. The mechanism of splicing involves transesterification reactions catalyzed by RNA–protein components of the spliceosome, as shown in FIGURE 4.51. Although only mRNA


[Figure 4.49 diagram: Exon 1 | Intron | Exon 2
5′ AGGUAUGU … UACUAACC(Y)8–12 … AGG 3′
Labels, left to right: 5′-splice site; branching nucleotide; pyrimidine-rich tract; 3′-splice site]

FIGURE 4.49 Splicing signals in yeast. The 5′ splice site is defined by the sequence AG/GU (the / denotes the boundary between the exon and intron). The branching nucleotide is the A denoted in red inside the branch site. The 3′ splice site is defined by the sequence AG/G (the / denotes the boundary between the intron and exon). The sequence (Y)8–12 denotes a tract of pyrimidines (C or U) that enhances splicing.

[Figure 4.50 diagram:
(a) Human GU-AG introns: Exon 1 | Intron | Exon 2
5′ AGGURAGU … YNCURAYN(Y)10–15 … YAGG 3′
Labels, left to right: 5′-splice site; branching nucleotide; pyrimidine-rich tract; 3′-splice site
(b) Human AU-AC introns: Exon 1 | Intron | Exon 2
5′ NAUAUCCUU … CCUUAAC … YACN 3′
Labels, left to right: 5′-splice site; branching nucleotide; 3′-splice site]

FIGURE 4.50 Splicing signals in mammals. There are two classes of introns in mammals named on the basis of the 5′ intron boundary sequences: (1) the major class designated GU-AG introns and (2) a minor class designated AU-AC introns. The splicing signals in mammals are less conserved than in yeast. Y designates a pyrimidine base (U or C); R designates a purine base (G or A).

[Figure 4.51 diagram: pre-mRNA drawn as Exon 1 – GU … A … Y(n) AG – Exon 2, with the 5′-splice site, branching nucleotide (2′-OH), pyrimidine tract, and 3′-splice site labeled; after the first step, Exon 1 carries a free 3′-OH and the intron forms a lariat attached to Exon 2; after the second step, the products are the joined Exon 1–Exon 2 and the released lariat.]

FIGURE 4.51 The transesterification steps in mRNA splicing. The mRNA splicing reaction consists of two transesterification steps. In the first step, the 2′ hydroxyl group on the branching nucleotide (adenosine) carries out a nucleophilic attack on the first nucleotide on the intronic side of the 5′ splice junction (usually a guanosine). In the second step, the hydroxyl group on the cleaved 3′ end of the exon carries out a nucleophilic attack on the last nucleotide of the intronic side of the 3′ splice junction, thereby joining the two exons and releasing a covalently closed lariat intermediate.
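The two transesterification steps can be mimicked on a sequence string to show what each step produces. The Python sketch below is a bookkeeping illustration, not a model of the chemistry: the function name and its index-based arguments are hypothetical, the caller must already know where the splice sites and branch point are, and the only checks applied are that the intron begins with GU, ends with AG, and has an A at the stated branch position.

```python
def splice(pre_mrna, intron_start, branch_index, exon2_start):
    """Return (mature_mrna, lariat_loop, lariat_tail) for a single intron.

    intron_start: index of the first intron nucleotide (the G of GU)
    branch_index: index of the branching adenosine inside the intron
    exon2_start:  index of the first nucleotide of the downstream exon
    """
    intron = pre_mrna[intron_start:exon2_start]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must begin with GU and end with AG")
    if not intron_start < branch_index < exon2_start:
        raise ValueError("branch point must lie inside the intron")
    if pre_mrna[branch_index] != "A":
        raise ValueError("branch point must be an adenosine")

    exon1 = pre_mrna[:intron_start]
    exon2 = pre_mrna[exon2_start:]

    # Step 1: the 2'-OH of the branch A attacks the 5' splice site.
    # Exon 1 is freed; the intron's 5' end is joined to the branch A,
    # closing a loop that runs from the first intron nucleotide to the A.
    lariat_loop = pre_mrna[intron_start:branch_index + 1]
    lariat_tail = pre_mrna[branch_index + 1:exon2_start]

    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site.
    # The exons are joined and the lariat (loop plus tail) is released.
    mature_mrna = exon1 + exon2
    return mature_mrna, lariat_loop, lariat_tail
```

Returning the lariat as a loop and a tail makes explicit that the released intron is not a simple linear fragment: the segment up to the branch A is circularized by the 2′–5′ linkage, and the remainder trails off toward the 3′ splice site.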


splicing in higher eukaryotes is discussed in this chapter, some ncRNAs can autocatalyze the removal of their own introns by transesterification reaction mechanisms very similar to those in pre-mRNA splicing. Thus, it appears that RNA splicing is a highly conserved process that has evolved in complexity throughout biologic evolution.

Exon–intron boundaries are identified by the splicing apparatus through recognition of short 5′ and 3′ splice site and branch point consensus sequences. Figure 4.50 shows the two major classes of mammalian exon–intron boundary sequences. In the most commonly found splice sites, the 5′ exon–intron boundary is AG/GU, where the last two nucleotides at the 3′ end of the exon are AG and the first two nucleotides of the 5′ end of the intron are GU. Likewise, the 3′ exon–intron boundary is AG/G, where the last two nucleotides of the 3′ end of the intron are AG and the first nucleotide of the 5′ end of the next exon is G. Thus, the intron defined by these signals begins with the dinucleotide GU and ends with the dinucleotide AG. It is obvious that these sequences alone do not provide the entire signal for exon–intron recognition: the random chance of encountering the sequence AGGU is on average once per 256 nucleotides, and even more frequent for the sequence AGG. However, other regions of the transcript that are not easily identifiable by sequence alone play an important part in exon–intron recognition. The capacity of RNA to form stem-loop secondary structure and higher-order conformation guides the splicing reaction so that the correct splice sites are recognized, the appropriate introns are removed, and the exons are spliced together in the correct order.

[Figure 4.52 diagram: pre-mRNA (Exon 1 – GU … A … YYYYYYYYYYAG – Exon 2) shown at successive stages. E complex: U1 snRNP, SF1, U2AF65, U2AF35. A complex: U1 snRNP, U2 snRNP. B complex: U1 snRNP, U2 snRNP, U4/U6•U5 tri-snRNP. C complex: U2 snRNP, U6 snRNP.]

FIGURE 4.52 Spliceosome assembly. The spliceosome is composed of five snRNA ribonucleoprotein complexes (snRNPs), as well as a large number of auxiliary proteins, including SF1 and U2AF, which help guide the splicing reaction. The steps of splicing are designated according to the spliceosome intermediates, which can be isolated by biochemical techniques. In the formation of the E complex, the U1 snRNP binds to the 5′ splice site. SF1 binds to the branch point guided by U2AF, which binds to the polypyrimidine tract near the 3′ splice junction. The U2 snRNP displaces SF1 to form the A complex. The U4/U6 • U5 snRNP complex joins the spliceosome to form the B complex. Upon release of the U4 and U1 snRNPs, the C complex containing U2, U5 (not shown), and U6 completes the splicing reaction to yield the mature mRNA. (Adapted from K. J. Hertel and B. R. Graveley, Trends Biochem. Sci. 30 (2005): 115–118.)

The steps in the assembly of the spliceosome are shown in FIGURE 4.52. It is important to keep in mind that the fragments generated in the splicing reaction are held together by the spliceosome and are not released until after splicing has been completed. The splicing apparatus contains both proteins and small RNAs (in addition to the pre-mRNA). The five snRNAs include the U1, U2, U4, U5, and U6 snRNAs. There are 100,000 to 1,000,000 copies of each of these snRNAs per cell. Each of the snRNAs exists as a complex with a set of specific proteins; these small nuclear RNP complexes (snRNPs) are sometimes referred to as snurps. The snRNPs assemble with a large number of other proteins into a complex with the pre-mRNA to form the spliceosome, which has a size greater than 50S (see 4.13 Translation is catalyzed by the ribosome for an explanation of the S value). As a point of reference, ribosomes range in size from 50 to 80S. The snRNP particles are involved in the recognition of the splice


junctions and the branch point and interact to guide the assembly of the other protein components to form the complete spliceosome. Significant evidence suggests that the snRNAs play a more direct role in the catalysis of the splicing reaction, whereas the proteins help the snRNAs and pre-mRNA adopt active conformations and promote the formation of critical base-pairing interactions necessary for spliceosome assembly and function.

In the first step of the splicing reaction, cleavage occurs at the 5′ end of the exon–intron splice junction, resulting in the separation of the 5′ end of the intron from the 3′ end of the left exon. Next, the 5′ end of the intron forms a 5′–2′ linkage with a specific adenine within the interior of the intron in a region known as the branch site, forming the lariat intron intermediate (FIGURE 4.53). Then, a cleavage takes place at the 3′ end of the exon–intron splice junction, resulting in the separation of the lariat intron intermediate from the downstream exon. Finally, the left exon and the right exon are ligated together. The lariat intermediate is “debranched” and usually degraded by specific nucleases within the cell, although some miRNAs are contained within introns (see 4.18 Noncoding RNAs are important regulators of gene expression). Although these reactions are described for illustrative purposes as occurring sequentially, they actually occur very rapidly in a highly coordinated fashion. The branch site plays a critical role in determining the 3′ splice junction and has a highly conserved sequence in yeast, which is UACUAAC; however, in higher eukaryotes, the branch site sequence is less conserved, as shown in Figures 4.49 and 4.50. In general, the branch site in most introns is located 20 to 40 nucleotides upstream of the 3′ splice junction, and in higher eukaryotes this proximity is important in defining which sequence serves as the branch site.

[Figure 4.53 diagram: chemical structure of the branch nucleotide, showing its 2′→5′ phosphodiester linkage to the G at the 5′ end of the intron in addition to its normal 3′→5′ linkages.]

FIGURE 4.53 The structure of the lariat branch formed during mRNA splicing. The transesterification reaction yields a 2′→5′ branch that joins the 2′ position of the ribose of the branching nucleotide to the nucleotide on the intronic side of the 5′ splice junction.

It is evident from our discussion that the spliceosome is a very large and dynamic complex of proteins and RNAs, which assembles sequentially on the pre-mRNA. In fact, several assembly intermediates can be isolated by biochemical analysis. Splicing occurs only after all components have assembled. The known intermediates in this assembly process are shown in Figure 4.52.

One of the most intriguing questions is how 5′ and 3′ end processing (capping and polyadenylation) and splicing occur in an orderly and progressive manner, especially considering that some genes must splice together many dozens or even hundreds of exons. As alluded to previously, the phosphorylated form of the CTD of the large subunit of RNA polymerase II becomes a platform for the attachment of RNA processing factors, including splicing proteins, to RNA polymerase. Once bound to the CTD, some of the splicing factors can transfer to the nascent RNA transcript as the splicing regulatory sites become exposed after being synthesized in the active site of the enzyme (FIGURE 4.54).

In summary, splicing occurs through the recognition of the 5′ and 3′ splice sites and the branch site by the dynamic assembly of the components of the spliceosome. However, splicing is much more complicated than merely removing all the introns and splicing together all the exons. In genes with multiple exons and introns, splicing can be regulated to produce a diverse set of transcripts where different combinations of exons and introns are retained in the mature mRNAs. In the next


section, we explore this process of alternative splicing in greater detail.

[Figure 4.54 labels: phosphorylated CTD (P); CTD-associated splicing factors; fully assembled spliceosome]

FIGURE 4.54 The CTD helps to load splicing factors onto the pre-mRNA. The CTD of the largest subunit of RNA polymerase II serves as a staging platform for specific splicing factors that facilitate the assembly of the spliceosome on the pre-mRNA.

Concept and Reasoning Check
1. The following RNA transcript sequence contains a 5′ and 3′ splice junction and a branch point sequence. Identify these sequences and write the sequence of the exon–exon fusion that occurs after splicing.
5′ UCAGGUUUACAUAAUGACCCUGAACGCGCGCGUGCUAACCGCAUGCCCCCCAUGCCCUUCCAGGCAACAG 3′

4.11 Alternative splicing generates protein diversity

Key concepts
• More than one protein can be generated from a single transcription unit through alternative splicing, resulting in the inclusion or elimination of specific introns or exons and the modification of the amino acid sequence in the resulting protein product.
• Alternative splicing is controlled by splicing activator and repressor proteins that bind to enhancer and silencer sequences near exon–intron boundaries.
• Splicing activator and repressor proteins are expressed in a cell type–specific manner that leads to tissue-specific expression of mRNA species encoding alternative proteins.

Although most mammalian genomes contain ∼25,000 genes, hundreds of thousands of proteins can be derived from these genes because of alternative splicing. Tissue-specific alternative splicing results in the expression of different protein products in different cell types with the same gene sequences. The alternative inclusion of some introns as exons or the skipping of some exons may result in multiple mRNA species with different ORFs. In some cases, different amino acids encoded by the included intron may be inserted into the interior segment of the ORF, and translation of the new mRNA will result in a protein with a different amino acid sequence and a longer or shorter length. In other cases, the alternative inclusion of the intronic sequences may result in either the generation or the loss of a start or a stop codon, producing an mRNA with either a completely new reading frame or a longer reading frame. FIGURE 4.55 shows the diversity of mRNA products that can be generated through alternative splicing from a single gene. The myriad different RNA and protein products that may be produced through alternative splicing can blur the distinction between introns and exons. The term “exon” refers to those RNA segments found in the final mature mRNA transcript, and the exons will be different for each of the different mRNA species generated from a transcription unit.

More than 70% of all human genes encode alternatively spliced mRNAs, and the potential number of splice variants can range from a few to tens of thousands, underscoring the importance of splicing in the regulation of gene expression. Although the mechanisms that control alternative splicing remain under intensive study, it is clear that the decision to splice at any given potential intron–exon boundary in a pre-mRNA is not simple. Similar to the regulation of transcription, specific sequences and binding proteins activate or repress splicing of introns. As one might expect, these protein binding sites are strategically located to enhance or inhibit the use of specific splice sites. Subtle shifts in the binding of specific proteins to these sites can influence the likelihood that the splice event will occur. For example, if a splice site junction signal or a branch point deviates from the canonical sequence or is located in an RNA stem-loop structure, proteins that increase the affinity of the spliceosome components for the sequence or that denature the stem-loop structure can increase the likelihood that the intron will be spliced out. The sequences to which such proteins bind are referred to as splicing enhancers, which can be located within the exon (exonic) or within the intron (intronic). Similarly, proteins can shield the splice junction and branch point signals from the spliceosome, resulting


in repression or silencing of splicing of the intron and the inclusion of the intron within the mRNA; these sites are referred to as splicing silencers. Several splicing regulatory proteins play important roles in exon/intron definition. One example is the mRNA binding protein ASF/SF2, a member of the SR family of proteins with amino acid sequences that contain segments rich in the amino acids serine (S) and arginine (R), known as RS domains. More than 10 different SR proteins have been found in human cells. In addition to the serine/arginine-rich domain, the SR proteins also contain one or more segments referred to as an RNA recognition motif. Because ASF/SF2 was discovered by different groups of investigators who gave the protein different names before the proteins were shown to be the same, the hybrid designation of ASF/SF2 is commonly used. ASF/SF2 can form multimeric complexes on the exon sides of 5′ and 3′ splice junctions and interact with the splicing components to promote splicing at weak splice sites. In particular, ASF/SF2 can interact with the U1 snRNP. Another important splicing regulatory protein is U2AF, which binds to polypyrimidine tracts often located within introns between the branch site and the 3′ splice junction. U2AF is a heterodimer consisting of 65- and 35-kDa subunits and can interact with the U2 snRNP as well as with the SR proteins to help recruit spliceosome components to the pre-mRNA. FIGURE 4.56 shows a model of how ASF/SF2 works in conjunction with the SR and U2AF proteins to regulate splicing.

[Figure 4.55 panels:
(a) An optional exon (blue) can be skipped or included
(b) Only one exon in an array of optional exons (blue and green) may be included in mature mRNA
(c) An intron (yellow) may be retained or excluded
(d) An exon may have alternative splice sites at its 3′ end
(e) An exon may have alternative splice sites at its 5′ end]

FIGURE 4.55 Alternative splicing. Alternative splicing generates protein diversity by including or excluding selected introns as exons. The red boxes correspond to constitutive exons that are always included in the mature mRNA, whereas the blue and green boxes correspond to optional exons that may or may not be retained. The yellow boxes correspond to introns that may be either retained or excluded in the mature mRNA. The cyan boxes designate regions within the constitutive exons that have alternative 3′ or 5′ splice sites and that can lead to larger or smaller constitutive exons. The gray regions correspond to introns. (Adapted from B. R. Graveley, Trends Genet. 17 (2001): 100–107.)

In summary, the regulation of splicing to produce mRNA diversity, in turn leading to multiple proteins from the same transcription unit, is a key strategy in generating protein

[Figure 4.56 labels: SR proteins bound to exonic splicing enhancers (ESE); U1 snRNP at the 5′ splice site (GU); U2 snRNP at the branch point (A); U2AF65 and U2AF35 at the polypyrimidine tract and 3′ splice site (YYYYYYYYYYAG); cross-exon and cross-intron interactions]

FIGURE 4.56 Regulation of splice site recognition. The SR proteins bind to exonic splicing enhancers (ESEs) and intronic splicing enhancers (ISEs) to increase the affinity of the snRNPs for the splice junctions and branch site. The splicing factor U2AF, which binds to the polypyrimidine tract, acts in conjunction with the SR proteins to enhance lariat formation during the splicing reaction. (Adapted from T. Maniatis and B. Tasic, Nature 418 (2002): 236–243.)
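The simplest pattern of alternative splicing, in which an optional exon is either skipped or included, shows how quickly the number of possible mRNAs grows. If each of n optional exons were included or skipped independently of the others (a simplifying assumption; real genes also use mutually exclusive exons, retained introns, and alternative splice sites), a single transcription unit could yield up to 2^n different mRNAs. The Python sketch below enumerates them. The function name and the exon names used with it are hypothetical.

```python
from itertools import product


def enumerate_isoforms(exons):
    """List every exon combination for a gene.

    exons: ordered list of (name, is_optional) pairs. Constitutive exons
    are always kept; each optional exon is independently kept or skipped.
    Returns a list of tuples of exon names, one tuple per mature mRNA.
    """
    # Constitutive exons have one choice (keep); optional exons have two.
    choices = [(True, False) if optional else (True,) for _, optional in exons]
    isoforms = []
    for pattern in product(*choices):
        kept = tuple(name for (name, _), keep in zip(exons, pattern) if keep)
        isoforms.append(kept)
    return isoforms
```

For a gene with two constitutive exons flanking two optional ones, the function returns four isoforms; with ten optional exons the same assumption already allows 1,024.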


diversity. In the next several sections, we explore the translation of the mature mRNA transcript.

Concept and Reasoning Check
1. How might the proteins involved in the regulation of mRNA splicing play a role in ensuring that mRNAs are correctly spliced before they leave the nucleus?

4.12 Translation is a three-stage process that decodes an mRNA to synthesize a protein

Key concepts
• Protein synthesis (translation) takes place in three steps: initiation, elongation, and termination.
• The key biochemical reactions in translation are (1) the assembly of the ribosome and the initiator tRNA at the mRNA initiation codon, (2) aminoacyl tRNA–mRNA codon recognition, (3) the peptidyl transferase reaction, (4) ribosome translocation from one codon to another along the mRNA, (5) stop codon recognition, and (6) cleavage of the complete polypeptide from the peptidyl tRNA.

[Figure 4.57 panels: Initiation: 30S subunit on mRNA binding site is joined by 50S subunit and aminoacyl-tRNA binds (AUG start codon shown). Elongation: ribosome moves along mRNA, extending protein by transfer from peptidyl-tRNA to aminoacyl-tRNA. Termination: polypeptide chain is released from tRNA, and ribosome dissociates from mRNA.]
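The three stages of translation can be traced on a sequence with a few lines of code. The Python sketch below is a deliberately simplified illustration: it treats the first AUG in the string as the initiation codon, reads successive codons using the standard genetic code, and stops at the first in-frame stop codon. The function and variable names are hypothetical, and the sketch leaves out the ribosome, the tRNAs, and the protein factors that actually carry out and regulate these reactions.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G at each
# codon position; "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Return the one-letter peptide encoded by the first ORF in an mRNA.

    Assumes the input contains only A, C, G, and U.
    """
    # Initiation: locate the start codon (simplified to the first AUG).
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    # Elongation: read one codon at a time and extend the chain.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        # Termination: a stop codon ends synthesis and releases the chain.
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)
```

Building the codon table from a 64-character string keeps the example short; a dictionary written out codon by codon would behave identically.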

The final step in the expression of protein-coding genes is the translation of the mRNA to synthesize a polypeptide chain. Translation is catalyzed by ribosomes, which are large RNP complexes composed of several rRNA species and 50 to 80 proteins. The three phases of the translation process are initiation, elongation, and termination (FIGURE 4.57):

1. Initiation involves those reactions that lead up to the polymerization of the first two amino acids. The steps include the dissociation of the ribosome into its small and large subunits, the binding of the small ribosomal subunit to the initiation codon on the mRNA, the joining of the large subunit to the complex, and the binding of the initiator aminoacyl tRNA to the P-site of the ribosome. Initiation is a relatively slow step in protein synthesis and usually determines the rate at which an mRNA is translated.
2. Elongation includes those reactions from synthesis of the first peptide bond to addition of the last amino acid. The peptidyl transferase step transfers the peptide from the previous step in elongation to the incoming aminoacyl tRNA. During elongation, the mRNA moves through the ribosome as each codon is translated. Elongation is the most rapid step in translation.
3. Termination encompasses the recognition of one of three different stop codons by the ribosome, the cleavage of the complete polypeptide from the last peptidyl tRNA, and the dissociation of the ribosome from the mRNA.

In the next several sections, we explore the translation process in greater detail.

FIGURE 4.57 The three phases of translation. Translation can be divided into three phases: initiation, elongation, and termination. Initiation: the 30S subunit on the mRNA binding site is joined by the 50S subunit, and aminoacyl-tRNA binds. Elongation: the ribosome moves along the mRNA, extending the protein by transfer from peptidyl-tRNA to aminoacyl-tRNA. Termination: the polypeptide chain is released from the tRNA, and the ribosome dissociates from the mRNA.

4.13 Translation is catalyzed by the ribosome

Key concepts
• The ribosome is the site of protein synthesis (translation) in both prokaryotes and eukaryotes and consists of three to four RNAs and 50 to 80 proteins.


• The translation reactions are carried out in large part by the rRNAs, although the ribosomal proteins and translation factors are necessary cofactors in the process.
• Translation is an RNA-guided process, involving interactions among tRNAs, rRNAs, and the mRNA. The peptidyl transferase reaction results in the polymerization of the polypeptide chain and is catalyzed by rRNA.
• tRNAs are 70 to 95 nucleotides in length and contain a number of covalently modified nucleotides. All tRNAs fold into similar secondary and tertiary structures.
• tRNAs function as adapter molecules that covalently attach to amino acids and allow the ribosome to decode the mRNA through tRNA anticodon base pairing with the mRNA codon.

The ribosome is perhaps the most complex and interesting of all biologic supramolecular assemblies. All ribosomes are composed of two multiprotein subunits: a large subunit with proteins designated by the prefix “L” and a small subunit with proteins designated “S.” E. coli ribosomes have a molecular weight of ∼2.5 MDa. The small ribosomal subunit consists of a single RNA (16S rRNA) and 21 (S1–S21) proteins; the large subunit contains two RNAs (5S and 23S rRNAs) and 34 (L1–L34) proteins (FIGURE 4.58). Eukaryotic ribosomes are somewhat larger; mammalian ribosomes have a molecular weight of more than 3 MDa. In mammals, the small subunit consists of a single RNA (18S rRNA) and more than 30 proteins, whereas the large subunit has three RNAs (5S, 5.8S, and 28S rRNAs) and more than 45 proteins.

Historically, ribosomes and their components have been named according to their rate of sedimentation in an ultracentrifuge, using the unit of the sedimentation coefficient called the Svedberg unit, or “S value.” The S value is based on both the molecular mass of the particle and its shape. Bacterial ribosomal subunits sediment as 30S and 50S particles and combine to form 70S particles. Ribosomal subunits in higher eukaryotes sediment as 40S and 60S particles and combine to form 80S particles. Because the S value is determined by both mass and shape, the S value of the intact ribosome is not simply the sum of the S values for the individual subunits but reflects the change in overall shape that occurs when the small and large subunits interact. Although a few of the roles of the individual ribosomal proteins have been defined in specific steps of the translation reactions, as we saw in the case of the spliceosome, most of the proteins appear to be involved in promoting the conformational rearrangements and changes in base pairing between various RNA molecules.

Over the past two decades, it has become apparent that the steps of translation are guided by a combination of RNA–RNA and protein–RNA interactions. Most importantly, rRNA catalyzes the central reaction of translation, the peptidyl transferase step, which polymerizes the polypeptide chain. RNA is thought to have preceded proteins in biomolecular evolution. In hindsight, it is not surprising that RNA catalysis is necessary for protein synthesis; however, it took many decades to prove this experimentally. Through a series of elegant studies carried out in a number of laboratories, translation has been shown to consist of a set of dynamic RNA-mediated reactions among rRNAs, tRNAs, and mRNAs that are modulated by ribosomal proteins and other accessory proteins. During the steps of translation, the rRNAs, tRNAs, and mRNAs change conformations to facilitate the formation and disruption of specific base-pairing interactions between the RNAs.

Remarkably, in spite of some differences in the size and organization of the ribosomes and some of the regulatory protein factors, translation is the most highly conserved step in gene expression. Almost all RNAs and proteins involved in translation have both structural and functional homologs that can be readily identified among all bacteria, eukaryotes, and archaea.

FIGURE 4.58 Ribosome structure and composition. Both prokaryotic and eukaryotic ribosomes contain a large and a small subunit comprised of a number of proteins and 1–3 rRNAs. The “S value” refers to the sedimentation coefficient and reflects both molecular weight and shape.
Bacterial ribosome (70S; mass 2.5 MDa; 66% RNA)
  50S subunit: 23S rRNA (2904 bases) and 5S rRNA (120 bases); 31 r-proteins
  30S subunit: 16S rRNA (1542 bases); 21 r-proteins
Mammalian ribosome (80S; mass 4.2 MDa; 60% RNA)
  60S subunit: 28S rRNA (4718 bases), 5.8S rRNA (160 bases), and 5S rRNA (120 bases); 49 r-proteins
  40S subunit: 18S rRNA (1874 bases); 33 r-proteins

The tRNAs function as adapters to bring the correct amino acids to the reactive center of the ribosome as directed by the mRNA codon sequence. The tRNAs are 70 to 95 nucleotides in length and have three levels of structure.


The primary structure of the tRNA is the linear sequence of nucleotides, which in addition to the conventional bases A, G, C, and U contains a number of bases that are modified posttranscriptionally by a series of tRNA modification enzymes. The secondary structure of a tRNA is determined by the intramolecular base pairing, which results in the formation of adjacent stem-loop structures. All tRNAs can be folded into a conserved secondary structure referred to as the cloverleaf structure (FIGURE 4.59). The tertiary structure of the tRNA refers to the three-dimensional folding of the entire molecule, which is stabilized by specific base pairs between nonadjacent nucleotides and by base-stacking interactions. The tRNA cloverleaf secondary structure folds into a three-dimensional (tertiary) structure that has an overall general L shape, as shown in FIGURE 4.60. Each tRNA has a slightly different tertiary structure, but the anticodon loop is always located at the top of the L, whereas the aminoacyl acceptor stem is located at the foot of the L. The L shape maximizes stacking of the bases by the formation of a helix containing the anticodon arm and the D arm and by the formation of a perpendicular helix containing the TψC arm and the acceptor arm (see Figure 4.60 and FIGURE 4.61).

FIGURE 4.59 The tRNA cloverleaf secondary structure. All tRNAs can be folded into a conserved secondary structure that contains short base-paired helical regions (dots). Some positions contain invariant pyrimidine (Y) or purine (R) bases or contain modified bases such as pseudouridine (ψ) or methylated bases (*), whereas other positions contain nucleotides that vary between specific tRNAs. Labeled regions: acceptor stem (5′ phosphate at position 1; 3′ CCA–OH at positions 74–76), D arm, anticodon arm (anticodon at positions 34–36), variable arm, and TψC arm. (Adapted from D. Voet, J. G. Voet, and C. W. Pratt. Fundamentals of Biochemistry, First edition. John Wiley & Sons, Ltd., 1999.)

FIGURE 4.60 The tRNA L-shaped tertiary structure. All tRNAs can be folded into a conserved L-shaped tertiary structure that stacks the helices of the secondary structure as shown. The tertiary structure is stabilized by base pairing between distal nucleotides on regions that are brought into proximity by the folding of the tRNA. (Adapted from J. G. Arnez and D. Moras, Trends Biochem. Sci. 22 (1997): 211–216.)

FIGURE 4.61 The folding of tRNA secondary structure into L-shaped tertiary structure. The helical regions of secondary structure stack with one another to form extended helices, which are perpendicular to each other. Panels: cloverleaf secondary structure and L-form tertiary structure, each labeled with the acceptor arm (with the 3′ CCA sequence), TψC arm, D arm, variable arm, anticodon arm, and anticodon.

Thus, tRNAs have two functional ends:
1. The 3′ CCA end: The amino acids to be polymerized into the polypeptide chain come to the ribosome as covalent adducts located at the 3′ ends of their specific tRNAs. The OH group at the end of the CCA sequence forms an ester (acyl) linkage with the carboxyl group of the amino acid. The covalent conjugate between an amino acid and its cognate tRNA is called an aminoacyl tRNA. The aminoacylation of the tRNA is catalyzed by an aminoacyl-tRNA synthetase, which has to match the specific tRNA with the correct amino acid before the aminoacylation reaction can take place. Each amino acid has its own aminoacyl-tRNA synthetase, which charges the tRNAs that accept that amino acid.
2. The anticodon loop: The anticodon loop base pairs with the complementary mRNA codon, thereby “reading” the genetic code embedded in the mRNA sequence, as shown in FIGURE 4.62. As discussed previously, there are 64 different codons and 61 of the codons specify 1 of the 20 amino acids commonly found in proteins. Thus, in theory there should be 61 different anticodons and 61 different tRNAs. However, there are actually many more than 61 tRNAs, because several tRNAs can share the same anticodon while differing in sequence in the remainder of the molecule. The three termination codons do not have cognate tRNAs.

The initiation codon in both prokaryotes and eukaryotes is generally AUG, although a few genes use another triplet. AUG specifies the amino acid methionine. In prokaryotes, however, there are two tRNAs with an anticodon for AUG, which is CAU (remember that codon–anticodon base pairing conforms to the same rules as all other nucleic acid base pairing in that one sequence is 5′ to 3′ and the other is 3′ to 5′, but all sequences are always written 5′ to 3′). One of the prokaryotic tRNAmet species is called the initiator tRNA and carries a methionine that has been covalently modified with a formyl group to yield N-formylmethionine; it is designated tRNAf-met. For the insertion of all other methionines in the polypeptide chain, a noninitiator (elongator) tRNAmet is used. By contrast, in eukaryotes, the initiator tRNA (tRNAi-met) is also distinct from the elongator tRNAmet, but the methionine it carries is not formylated.

FIGURE 4.62 The interaction between the tRNA anticodon and the mRNA codon. The base pairing between the anticodon loop of the tRNA (magenta) and the codon of the mRNA (green) is antiparallel: the codon 5′-AUC-3′ pairs with the anticodon 3′-UAG-5′. Thus, the anticodon sequence is designated as 5′-GAU-3′ in keeping with the convention that all nucleic acids are written in the 5′→3′ direction. (Adapted from D. C. Nelson and M. M. Cox. Lehninger Principles of Biochemistry, Third edition. W. H. Freeman & Company, 2000.)

The details of the structure of prokaryotic and eukaryotic ribosomes have emerged from a number of high-resolution electron microscopy and x-ray diffraction studies. An example of the structure of a bacterial ribosome derived from high-resolution electron microscopy is shown in FIGURE 4.63. The three binding sites for the aminoacyl tRNAs on the prokaryotic ribosome are depicted in FIGURE 4.64: The A-site binds the


incoming aminoacyl tRNA that now carries the next amino acid to be added to the growing polypeptide chain; the P-site, or peptidyl tRNA site, contains the tRNA that was previously in the A-site and carries the polypeptide chain; and the E-site is the exit site where the tRNA that previously carried the polypeptide chain leaves the ribosome. The P- and E-sites do not appear to be distinct separate sites in eukaryotic ribosomes.

In summary, the key reaction in protein synthesis is the transfer of the peptidyl tRNA in the P-site to the new aminoacyl tRNA in the A-site, catalyzed by the peptidyl transferase activity of the rRNAs, followed by the movement of the peptidyl tRNA in the A-site to the P-site, and the dissociation of the deacylated tRNA from the ribosome (FIGURE 4.65). Different sets of accessory proteins assist the ribosome at each stage. Energy is provided at various stages by the hydrolysis of guanosine triphosphate (GTP). In the next section, we explore each step in translation in greater detail and describe the roles of a myriad of accessory protein factors that guide the process.

FIGURE 4.63 Cryoelectron microscopy reconstruction of the E. coli 70S ribosome and its constituent subunits. The image shows the reconstruction of the E. coli ribosome from electron micrographs at a resolution of 25 Å. Panels: (a) intact ribosome, with the intersubunit space indicated; (b) 30S subunit; (c) 50S subunit. (Reproduced from J. Frank, R. K. Agarwal, and A. Verschoor, Ribosome structure and shape, Encyclopedia of Life Sciences, 2001. Copyright © 2001 John Wiley & Sons, Ltd. Reproduced with permission of Blackwell Publishing Ltd. Photos courtesy of Joachim Frank, HHMI/HRI, Wadsworth Center.)

FIGURE 4.64 The tRNA binding sites of the bacterial ribosome. The structure of the ribosome from the thermophilic bacterium Thermus thermophilus was determined to 5.5 Å resolution by x-ray crystallography. The rendering shows the relative locations of the A-site (red), P-site and peptidyl transferase site (purple), and E-site (cyan) with respect to the 30S and 50S subunits. (Structure from Protein Data Bank 1JGO. G. Z. Yusupova, et al., Cell 106 (2001): 233–241. Protein Data Bank 1GIY. M. M. Yusupova, et al., Science 292 (2001): 883–896.)

FIGURE 4.65 The peptidyl transferase reaction. The peptidyl transferase reaction is the central biochemical reaction that takes place during protein synthesis. The aminoacyl-tRNA enters the A-site on the ribosome. The growing polypeptide chain on the tRNA in the P-site is transferred to the aminoacyl-tRNA in the A-site and the ribosome moves to the next codon on the mRNA. The peptidyl-tRNA moves to the P-site and the deacylated tRNA moves from the P-site to the E-site. Panels: aminoacyl-tRNA enters the A site; polypeptide is transferred to aminoacyl-tRNA; translocation moves peptidyl-tRNA into the P site.
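The site occupancy described in Figure 4.65 can be traced with a small Python sketch. It is a simplified model under stated assumptions: the names `anticodon_for`, `CHARGED_TRNAS`, and `trace_sites` are hypothetical, the tRNA pool holds only three species, strict Watson-Crick pairing is assumed (no wobble), and the E-site tRNA is taken to leave when the next aminoacyl tRNA arrives.

```python
def anticodon_for(codon):
    """Return the anticodon, written 5'->3', that pairs antiparallel with a codon."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(codon))


# Hypothetical pool of charged tRNAs, keyed by anticodon (5'->3').
CHARGED_TRNAS = {"CAU": "Met", "AAA": "Phe", "GCC": "Gly"}


def trace_sites(codons):
    """Record E/P/A-site occupancy while the codons are translated in order.

    Each occupied site holds (anticodon, chain); an empty chain means the
    tRNA is deacylated.
    """
    sites = {"E": None, "P": None, "A": None}
    trace = []

    def record(event):
        trace.append((event, dict(sites)))

    def charged(codon):
        anticodon = anticodon_for(codon)
        return (anticodon, (CHARGED_TRNAS[anticodon],))

    # Initiation: the initiator tRNA is the only one that enters at the P-site.
    sites["P"] = charged(codons[0])
    record("initiation")
    for codon in codons[1:]:
        # The incoming aminoacyl tRNA binds the A-site; the E-site is vacated.
        sites["E"] = None
        sites["A"] = charged(codon)
        record("A-site binding")
        # Peptidyl transfer: the chain moves from the P-site tRNA to the A-site tRNA.
        p_anticodon, chain = sites["P"]
        a_anticodon, (new_aa,) = sites["A"]
        sites["A"] = (a_anticodon, chain + (new_aa,))
        sites["P"] = (p_anticodon, ())
        record("peptidyl transfer")
        # Translocation: deacylated tRNA to the E-site, peptidyl tRNA to the P-site.
        sites["E"], sites["P"], sites["A"] = sites["P"], sites["A"], None
        record("translocation")
    return trace


for event, occupancy in trace_sites(["AUG", "UUU", "GGC"]):
    print(event, occupancy)
```

Note that `anticodon_for("AUC")` gives `"GAU"`, the same codon and anticodon pair shown in Figure 4.62.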


Concept and Reasoning Check
1. Trace the steps in the occupation of the aminoacylated tRNA binding sites required to synthesize the tripeptide met-val-lys.

4.14 Translation is guided by a large number of protein factors that regulate the interaction of aminoacylated tRNAs with the ribosome

Key concepts
• The energy source for translation is the hydrolysis of GTP by G-protein translation factors.
• The translation initiation factors control the assembly of the small ribosomal subunit with the mRNA and the location of the translation initiation site.
• The translation elongation factors control the assembly of the aminoacylated tRNAs with the ribosome.
• The translation termination factors control the cessation of the elongation reaction and the release of the completed polypeptide chain from the ribosome.

FIGURE 4.66 The prokaryotic ribosome binding site on mRNA. During the initiation phase of translation, the 30S ribosomal subunit binds to a sequence that includes the AUG start codon as well as an adjacent upstream region, called the Shine-Dalgarno sequence, named after its discoverers. The figure outlines the protection experiment: bind the ribosome to the initiation site on the mRNA, add nuclease to digest all unprotected mRNA, isolate the fragment of protected mRNA, and determine the sequence of the protected fragment (AACAGGAGGAUUACCCCAUGUCGAAGCAA..., comprising a leader followed by the coding region). All initiation regions have two consensus elements: a Shine-Dalgarno sequence fewer than 10 bases upstream of the AUG, and an AUG in the center of the protected fragment.

FIGURE 4.67 The prokaryotic translation initiation reaction. The initiation factors IF1 and IF3 stabilize the interaction of the 30S ribosomal subunit with the ribosome binding site on the mRNA, and IF2 stimulates the binding of the tRNA to the 30S–mRNA complex. Panels: (1) the 30S subunit, with IF-3 and IF-1, binds to the mRNA; (2) IF-2 brings the tRNA to the P site; (3) the IFs are released and the 50S subunit joins.

Here, we briefly discuss the roles of the translation factors in the various steps of protein synthesis and describe the important role of GTP in the translation process. During initiation, the small ribosomal subunit binds to mRNA (the 30S ribosomal subunit in prokaryotes and the 40S subunit in eukaryotes). In prokaryotes, recognition of the start codon occurs through the base pairing of the 16S rRNA within the 30S subunit to a short sequence on the mRNA immediately upstream of the start codon, commonly referred to as the ribosome binding site or the Shine-Dalgarno sequence, named after the investigators who discovered the site (FIGURE 4.66). The initiation factors IF1, IF2, and IF3 (FIGURE 4.67) are required for this recognition process to occur. IF3 keeps the 30S subunit from reassociating with the 50S subunit, an event that would inhibit initiation, and it facilitates the 16S rRNA–mRNA interaction. IF2 binds the initiator tRNA, which carries a formylmethionine, and controls the binding of the formylmethionine tRNA to the empty P-site on the 30S subunit. IF2 also promotes the association of the 50S subunit


with the 30S–mRNA–tRNAf-met complex and the subsequent release of all initiation factors. This process requires energy, and IF2 is a ribosome-dependent GTPase. This initiation step is the only time when an entering aminoacylated tRNA binds to the P-site; all other aminoacyl tRNAs that subsequently bind to the ribosome enter at the A-site. To make sure the A-site is not available for this initial tRNA interaction, the initiation factor IF1 blocks the A-site.

The translation initiation reaction in higher eukaryotes is very similar in most respects to the prokaryotic reaction, although there are more than 12 eukaryotic initiation factors compared with just three in bacteria. As mentioned earlier, another major difference is that in eukaryotes the first amino acid to be incorporated in response to the AUG start codon is methionine, as opposed to formylmethionine in prokaryotes.

FIGURE 4.68 shows that the eukaryotic initiation pathway can be broken down into five stages:
1. Stage 1 consists of the dissociation of the 80S ribosome by the initiation factors eIF1A and eIF3. eIF2 binds to the aminoacylated tRNAmet along with GTP, analogous to prokaryotic IF2, and interacts with eIF5 and eIF1 to form the 43S preinitiation complex (PIC).
2. Stage 2 is when the 43S PIC assembles with the mRNA. In contrast to the ability of the prokaryotic 30S ribosome to recognize internal ribosome binding sites, in most cases the 43S PIC must first bind to a complex of initiation factors located at the 5′ 7-MeG cap on the mRNA.
3. Stage 3 occurs independently of stage 1 and encompasses the formation of the cap binding complex, which consists of initiation factor eIF4E bound directly to the cap and also bound to eIF4G and eIF4A. The cap binding complex requires energy in the form of ATP hydrolysis to unwind any stable stem-loop structures in the mRNA that might interfere with initiation codon recognition.
4. Stage 4 is when the 43S PIC joins the cap binding complex and scans the mRNA in the 5′ to 3′ direction searching for the first AUG that is in the proper sequence context. In contrast to prokaryotes, there is no easily recognized ribosome binding sequence analogous to the Shine-Dalgarno sequence in eukaryotes. The nucleotides immediately adjacent on the 5′ and 3′ sides of the AUG appear to be critical for whether or not the initiation complex chooses to use the sequence to initiate translation. The consensus sequence for an AUG in the proper context to be used as an initiation codon is NNPuNNAUGG, where N is any one of the four nucleotides and Pu is a purine (A or G).
5. Stage 5 is when eIF5 and eIF5B each catalyze the hydrolysis of GTP to stimulate the assembly of the 60S ribosomal subunit onto the initiation complex and the dissociation of the initiation factors.

Eukaryotic initiation usually occurs by the scanning mechanism as previously outlined; however, there is an alternative means of initiation, used especially by certain viral RNAs, in which a 40S subunit associates directly with an internal site called an internal ribosome entry site (IRES), bypassing any upstream AUG codons. There are few sequence homologies between known IRES elements, but their use is especially important in specific types of viral infections where the virus inhibits host protein synthesis by destroying 7-MeG cap structures on the 5′ end of the mRNAs and inhibiting the initiation factors that bind them.

Once the complete ribosome has assembled at the initiation codon, the transition to the elongation phase occurs. As in the case of initiation, the prokaryotic and eukaryotic steps in elongation are very similar. Three elongation factors function to modulate the elongation reactions. The steps in the prokaryotic elongation reaction are illustrated in FIGURE 4.69. These steps are nearly identical in eukaryotes.

At the beginning of the elongation phase, the A-site of the ribosome is vacant due to the loss of the initiation factor that blocked it from occupancy by the initiator tRNA. The aminoacyl tRNA with the amino acid to be inserted at the next position of the polypeptide binds to the A-site of a ribosome, as the P-site is occupied by the initiator tRNA. In prokaryotes, entry of the aminoacyl tRNA into the A-site is mediated by elongation factor EF-Tu (also called EF1A). EF-Tu is a highly conserved protein throughout prokaryotes and eukaryotes and is an example of a monomeric GTP binding protein called a G-protein. We encounter various examples of monomeric G-proteins that are involved in nucleocytoplasmic trafficking and in receptor-mediated signal transduction. All G-proteins work by a common set of principles:
• When GTP is present, the protein is in its active state.


FIGURE 4.68 The initiation pathway. The eukaryotic translation initiation reaction can be subdivided into five stages. The stages are guided by the eukaryotic initiation factors (eIFs). In stage 1, the 80S ribosome is dissociated into the 40S and 60S subunits by eIF1A and eIF3. In stage 2, eIF1 and eIF5 stimulate the eIF2–GTP–tRNAmet complex to join the eIF1A–eIF3–40S complex to form the 43S PIC. In stage 3, the eIF4E–eIF4G–eIF4A complex interacts with the 5′ end of the mRNA and eIF4B and eIF4H catalyze the ATP hydrolysis-dependent unwinding of any mRNA secondary structure located between the mRNA 5′ end and the initiation site. In stage 4, the 43S PIC joins the eIF4E–eIF4G–eIF4A–mRNA complex. The entire assembly scans in the 5′→3′ direction until it locates the initiation site, at which time eIF2 hydrolyzes its bound GTP to GDP and is released, resulting in the formation of the 48S PIC. In stage 5, additional hydrolysis of GTP results in the dissociation of all remaining initiation factors from the 40S–mRNA complex and the assembly of the 60S subunit onto the 40S subunit to form the initiated 80S–mRNA complex. (Adapted from R. J. Jackson, Biochem. Soc. Trans. 33 (2005): 1231–1241.)


FIGURE 4.69 The translation elongation pathway. The translation elongation pathway in bacteria can be divided into nine steps. Step 1 (codon recognition): EF-Tu • GTP-tRNA complex binds to the A-site on the ribosome. Step 2 (activation of GTPase): The EF-Tu GTPase activity is activated. Step 3 (GTP hydrolysis): The ribosome catalyzes the hydrolysis of GTP and the release of EF-Tu • GDP. Another factor, EF-Ts (not shown), mediates the regeneration of EF-Tu • GDP back to EF-Tu • GTP. Step 4 (accommodation): The aminoacylated tRNA in the A-site moves next to the 3′ end of the polypeptidyl-tRNA sitting in the P-site. Step 5 (peptidyl transferase): The polypeptide is transferred to the aminoacyl tRNA in the A-site. Step 6 (EF-G • GTP binding): EF-G • GTP binds to the peptidyl tRNA in the A-site. Step 7 (GTP hydrolysis): GTP is hydrolyzed to form the EF-G • GDP complex. Step 8 (translocation): The polypeptidyl tRNA translocates from the A-site to the P-site and the deacylated tRNA to the E-site. Step 9 (EF-G release): EF-G dissociates from the A-site to yield an empty A-site for the next aminoacylated tRNA. The steps in eukaryotic translation elongation are very similar; however, the eukaryotic homologs of EF-Tu and EF-G are named EF1A and EF2, respectively. (Adapted from V. Ramakrishnan, Cell 108 (2002): 557–572.)

• When the GTP is hydrolyzed to guanosine diphosphate (GDP), the factor becomes inactive.
• Activity is restored when the GDP is replaced by GTP, which is an exchange reaction; this is not the result of phosphorylation of the GDP by the G-protein.

The binary complex of EF-Tu • GTP binds aminoacyl-tRNA to form a ternary complex of aminoacyl-tRNA-EF-Tu • GTP. The ternary complex binds only to the A-site of the ribosome when the P-site is already occupied by peptidyl-tRNA. This critical reaction ensures the aminoacyl-tRNA and peptidyl-tRNA are correctly positioned for peptide bond formation during the peptidyl transferase reaction. The aminoacyl-tRNA is loaded into the A-site in two steps. First, the anticodon end binds to


the A-site of the small subunit. Then, codon–anticodon recognition triggers a change in the conformation of the ribosome, which stabilizes the tRNA binding and causes EF-Tu to hydrolyze its bound GTP. The CCA end of the tRNA now moves into the A-site on the 50S subunit. The EF-Tu • GDP complex, which is now inactive due to its low affinity for aminoacyl-tRNA, is released. Another factor, EF-Ts (also called EF1B), mediates the regeneration of EF-Tu • GDP back to the active EF-Tu • GTP complex.

In the next step of the elongation reaction, which is the actual peptidyl transferase step, the polypeptide chain is transferred from the tRNA in the P-site to the aminoacyl-tRNA in the A-site. As previously mentioned, a large body of research indicates that the peptidyl transferase reaction is catalyzed by the 23S RNA in the prokaryotic 50S subunit but supported by ribosomal proteins that hold the 23S RNA in the correct conformation for its catalytic function. By analogy, it is hypothesized that the peptidyl transferase reaction in eukaryotes is catalyzed by the 28S RNA in the 60S ribosomal subunit, although studies to unequivocally demonstrate this are still in progress. The peptidyl transferase reaction is triggered when EF-Tu releases the aminoacyl end of its tRNA. The aminoacyl end of the tRNA in the A-site then pivots into proximity with the end of the peptidyl-tRNA in the P-site, whereupon there is a rapid transfer of the polypeptide chain from the tRNA in the P-site to the aminoacyl-tRNA in the A-site.

The cycle of addition of amino acids to the growing polypeptide chain is completed by translocation, when the ribosome advances three nucleotides along the mRNA with the concomitant expulsion of the uncharged tRNA from the P-site, so the newly formed peptidyl-tRNA can enter from the A-site. The empty A-site is then ready to accept a new aminoacyl-tRNA corresponding to the next codon. In prokaryotes, the discharged tRNA is transferred from the P-site to the E-site and then to the cytoplasm, whereas in eukaryotes the discharged tRNA is expelled directly into the cytoplasm. This translocation step requires another monomeric G-protein called EF-G (also called EF2). EF-G • GTP stimulates the movement of both the deacylated tRNA and the peptidyl-tRNA. The mechanism by which EF-G drives this transition is still under investigation; evidence supports both a direct role, in which the movement is driven by energy derived from the hydrolysis of the GTP, and a more indirect role in which EF-G • GTP stabilizes the pre-translocation state, while EF-G • GDP stabilizes the post-translocation state. EF-G is a major constituent of the cell, present at about one copy per ribosome.

The final step in translation is termination, which is activated when the ribosome encounters one of the three stop codons, UAG, UAA, or UGA. The strength of these codons in terminating translation is dependent on the base on either side.

Translation termination is a two-step process that requires release factors (FIGURE 4.70). Bacteria have two classes of release factors. Class 1 release factors include RF1 and RF2, whereas RF3 is a class 2 release factor. RF1 specifically recognizes termination codon UAG, whereas RF2 recognizes termination codon UGA. Both recognize termination codon UAA. RF1 and RF2 function in the first step of translation termination by interacting directly with the termination codons on the mRNA to block tRNAs from entering the A-site. Specific domains of the class 1 release factors have structural homologies to tRNAs, allowing these domains to fit into the A-site on the ribosome. Additionally, the RFs stimulate the intrinsic ribosomal peptidyl hydrolase activity that cleaves the polypeptide from peptidyl-tRNA and releases the polypeptide from the ribosome. The class 2 release factor RF3 stimulates the release of RF1 or RF2 from the ribosome. Although the general mechanism of termination is similar in prokaryotes and eukaryotes, there are some differences in the details. The eukaryotic class 1 release factor, eRF1, is a protein that recognizes all three termination codons. Its sequence is unrelated to the bacterial factors. The class 2 eukaryotic release factor is called eRF3, and it appears to have the same activity as prokaryotic RF3.

Once translation termination has occurred, the mRNA and deacylated tRNA remain associated with the ribosome. A ribosome-recycling factor acts in conjunction with EF2 bound to GTP to dissociate the tRNA, mRNA, and the small and large subunits, using energy from the hydrolysis of the GTP.

In summary, translation is a complex process guided by a large number of protein–protein, protein–RNA, and RNA–RNA interactions. The translation reaction mechanism and the structure and function of the translational components are conserved across species. The different roles of mRNA, tRNA, and rRNA in the mechanisms of the various steps of translation provide the best evidence that cells have evolved from a primordial RNA world. In the next section, we explore some examples of the regulation of gene expression at the level of translation.
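The stop-codon specificities of the bacterial class 1 release factors described in this section can be summarized as a small lookup table. The following Python sketch is only an illustration: the function names are hypothetical, and it ignores the sequence-context effects on termination strength mentioned above.

```python
# Stop-codon specificity of the bacterial class 1 release factors,
# as stated in the text: RF1 reads UAG and UAA, RF2 reads UGA and UAA.
RELEASE_FACTOR_CODONS = {
    "RF1": {"UAG", "UAA"},
    "RF2": {"UGA", "UAA"},
}


def release_factors_for(codon):
    """Return the sorted list of class 1 release factors that recognize `codon`."""
    codon = codon.upper()
    return sorted(rf for rf, codons in RELEASE_FACTOR_CODONS.items() if codon in codons)


def first_stop(mrna, start=0):
    """Read `mrna` codon by codon from `start` (hypothetical helper).

    Returns (index, codon, factors) for the first in-frame stop codon,
    or None if the reading frame contains no stop codon.
    """
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        factors = release_factors_for(codon)
        if factors:
            return i, codon, factors
    return None


if __name__ == "__main__":
    # AAG followed by UAA, the codons drawn in Figure 4.70.
    print(first_stop("AUGAAGUAA"))
```

A sense codon such as AAG returns an empty list, which mirrors the fact that no class 1 release factor recognizes it.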


[Figure 4.70 diagram labels: polypeptide chain; ribosome with E, P, and A sites; tRNA; mRNA (5′ ... AAG UAA ... 3′); RF2; RF3; GDP; GTP]

FIGURE 4.70 The translation termination pathway. The translation termination reaction in bacteria is regulated by one of several release factors (in this case, RF2). When the ribosome encounters a stop codon (UGA, UAA, UAG), the release factor binds to the stop codon in the A-site. The ribosome hydrolyzes the aminoacyl linkage between the tRNA in the P-site and the polypeptide, thereby releasing the polypeptide from the ribosome. The RF3–GDP complex binds to RF2, causing the dissociation of the GDP. RF2 binds to a GTP molecule and dissociates from the ribosome. RF3 binds a GTP, and the subsequent hydrolysis of GTP results in the dissociation of RF3. The deacylated tRNA remains bound to the P-site until the complex is dissociated by the binding of ribosome release factor (RRF) to the A-site. (Adapted from L. Kisselev, et al. EMBO J. 22 (2003): 175–182.)

Concept and Reasoning Check
1. Trace the steps in translation that are regulated by G-proteins and GTP hydrolysis and explain the function of each of the translation elongation factors.

4.15 Translation is controlled by the interaction of the 5′ and 3′ ends of the mRNA and by translational repressor proteins

Key concepts
• The cytoplasmic poly(A) binding protein, PABPC1, functions as a translation initiation factor by binding to the poly(A) tail at the 3′ end of the transcript and interacting with the cap binding proteins on the 7-MeG cap at the 5′ end of the transcript, via looping of the mRNA back on itself to form a circular structure.
• The mTOR pathway is an example of how the cell can modulate translation in response to the availability of amino acids for protein synthesis.
• Translational repressors, such as the iron-response element binding protein, can regulate translation initiation by binding to specific RNA sites within the 5′ UTR and blocking the 40S ribosome from locating the initiation codon.

Although most of our previous discussion of gene regulation has focused on control of mRNA biosynthesis and processing, gene expression can also be regulated at the level of translation. Although there are examples where translation elongation and termination are regulated in both prokaryotes and eukaryotes, we focus our discussion on the control of translation initiation.

In prokaryotes, the most important factors that control translational initiation are the intrinsic strength and physical accessibility of the Shine-Dalgarno sequence within the ribosome binding site, which are in turn influenced by both the sequence and secondary structure of the mRNA. Those initiation sites whose Shine-Dalgarno sequences deviate greatly from the consensus sequence, or whose Shine-Dalgarno sequences are located in regions of stable RNA secondary structure, require translational activators to increase the affinity of the 30S ribosomal subunit for the ribosome binding site. These activators interact directly with the 30S ribosomal subunit or function to denature the RNA secondary structure around the initiation site.

Regulation of eukaryotic translation initiation occurs primarily by controlling the


binding of the initiation complex to the 7-MeG cap at the 5′ end of the mRNA or modulating the translocation and initiation of the 43S complex at the initiation codon. Regulatory signals for these processes can be found at both the 5′ and 3′ ends of the eukaryotic mRNA. As discussed in the previous section, the eukaryotic initiation factors eIF4E, eIF4G, eIF4A, and eIF4B form the 7-MeG cap binding complex that prepares the mRNA for assembly with the 43S complex consisting of eIF2, eIF3, tRNAmet, and the 40S ribosomal subunit. The activity of the cap binding complex can be greatly enhanced by the cytoplasmic poly(A) tail binding protein, PABPC1. It is proposed that this stimulation occurs by the interaction of the PABPC1-bound poly(A) tail at the 3′ end of the mRNA with the initiation factor complex on the cap at the 5′ end, through looping of the mRNA back on itself to form a circular mRNA (FIGURE 4.71).

FIGURE 4.71 The circularization of mRNA during eukaryotic translation initiation. During translation initiation, mRNA circularizes through the interaction between the cytoplasmic poly(A) binding protein PABPC1 on the poly(A) tail and the eIF4E–eIF4G–eIF4A complex on the 5′ 7-MeG cap. (Adapted from T. Preiss and M. W. Hentze, Bioessays 25 (2003): 1201–1211.)

An important example of a global translational regulatory system that targets the formation of the eukaryotic translation initiation complex is the mTOR signal transduction pathway. mTOR stands for mammalian target of rapamycin. Rapamycin is an antibiotic that binds to mTOR and inhibits its function. mTOR is a protein that forms a complex with two other proteins, GβL and Raptor (FIGURE 4.72). Together this complex is an active protein kinase and the node for the intersection of a number of different receptor-mediated signaling pathways that are involved in the regulation of cell growth and stress responses to nutrient deprivation. When cells are deprived of nutrients, including amino acids, the cell shuts down translation by activating the eIF4E binding protein called 4E-BP1, which binds to translation initiation factor eIF4E and blocks its assembly with the 7-MeG cap. When nutrients and growth factors are at high levels, the mTOR complex is activated, leading to the phosphorylation of 4E-BP1, causing it to dissociate from eIF4E, thereby stimulating global translation initiation. Another target of mTOR kinase activity is the ribosomal protein S6 kinase, which phosphorylates ribosomal protein S6 of the 40S small ribosomal subunit. Phosphorylation of S6 activates the 43S ribosomal PIC.

Another important example of translational regulation in eukaryotes is the control of iron levels in mammals by the proteins ferritin and transferrin. Because iron is a toxic metal that can cause oxidative damage to cellular components, it is important to keep excess iron stored safely as a complex with ferritin in the bloodstream until needed for synthesis of hemoglobin and other iron-containing enzymes. Transferrin helps iron enter cells through transferrin receptor-mediated endocytosis. When iron levels are low, it is advantageous to the cell to reduce the level of ferritin so that any remaining free iron can be available for endocytic transport by transferrin via the transferrin receptor.

To reduce the levels of ferritin and increase the levels of transferrin receptor, a family of iron regulatory proteins, called IRPs, binds to a site located on both the ferritin and the transferrin receptor mRNAs, called the iron-response element or IRE. The IRE is composed of consensus sequence elements and a distinct stable RNA secondary structure and is located upstream of the translation initiation site on ferritin mRNA (FIGURE 4.73). When the IRE is bound by the IRP, the complex blocks the scanning 40S ribosome from reaching the start codon.

In the case of the transferrin receptor, multiple IREs are located in the 3′ UTR. Binding of an IRP to the IREs in the transferrin receptor 3′ UTR greatly stabilizes the receptor’s normally unstable mRNA. Thus, when iron levels are low in the blood, the cell acts to decrease the level of ferritin translation initiation and to increase the stability of the transferrin receptor mRNA. The IRP has an iron binding site; when the iron binding site on the IRP is filled with iron, the affinity of IRP for the IRE is significantly reduced. Consequently, when iron levels are high in the blood, ferritin synthesis is increased to enhance iron storage. High iron levels also release the IRPs from the transferrin receptor IREs and thereby stimulate the degradation of the transferrin receptor mRNA.
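The two opposite outcomes of IRP binding, repression at a 5′ UTR IRE and stabilization at 3′ UTR IREs, can be laid out as a small decision table. The sketch below is a deliberately simplified illustration, assuming that iron status can be reduced to a single true/false value; the function name and the returned labels are hypothetical and not taken from the text.

```python
def iron_response(iron_high):
    """Qualitative outcome of IRP regulation for a given iron state.

    Assumption: iron status is treated as a simple boolean, which ignores
    the graded binding affinities found in a real cell.
    """
    # Iron-loaded IRP has a much lower affinity for the IRE.
    irp_bound_to_ire = not iron_high
    return {
        "irp_bound_to_ire": irp_bound_to_ire,
        # Ferritin mRNA: the IRE sits in the 5' UTR, so bound IRP blocks
        # the scanning ribosome from reaching the start codon.
        "ferritin_translation": "repressed" if irp_bound_to_ire else "active",
        # Transferrin receptor mRNA: the IREs sit in the 3' UTR, so bound
        # IRP protects the normally unstable mRNA.
        "transferrin_receptor_mrna": "stabilized" if irp_bound_to_ire else "unstable",
    }


if __name__ == "__main__":
    for iron_high in (False, True):
        label = "high" if iron_high else "low"
        print(f"iron {label}: {iron_response(iron_high)}")
```

The same protein–RNA interaction therefore gives opposite effects on the two mRNAs purely because of where the IRE is located in each transcript.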


[Figure 4.72 diagram labels: nutrients and growth factors; rapamycin; mTOR, Raptor, and GβL; eIF4E-BP; eIF4E complex (4E, 4G, 4A) on the m7G cap; PABP on the poly(A) tail of the mRNA; S6 kinase; ribosomal protein S6; 40S ribosome; initiation of protein synthesis]

FIGURE 4.72 The mammalian target of rapamycin (mTOR) signal transduction pathway regulates translation in mammalian cells. Rapamycin is an antibiotic that binds to mTOR and inhibits its function. High levels of nutrients and growth factors induce the assembly of the proteins mTOR, GβL, and Raptor into a complex that is an active protein kinase. The mTOR complex activates initiation of protein synthesis through the phosphorylation of S6 kinase and eIF4E-BP. Phosphorylation activates S6 kinase, leading to phosphorylation of ribosomal protein S6, which enhances translation initiation. eIF4E-BP is an inhibitor of eIF4E and translation initiation under low growth conditions. eIF4E is a translation initiation factor that binds to the 7-MeG cap and serves as the first step in the binding of the 40S ribosome (see Figures 4.68 and 4.71). Phosphorylation by the mTOR complex causes eIF4E-BP to dissociate from eIF4E, thereby activating translation initiation.

In summary, translational control plays an important role in the regulation of specific genes, using a set of activator and repressor proteins and specific sequences to modulate ribosome assembly with mRNA and the recognition of the translation initiation site. Another very common strategy for regulating translation initiation is to modulate the interaction of the 43S complex or the poly(A) tail with the cap binding complex. The levels of translation are directly related to the amount of mRNA available. In the next section, we explore another type of cellular control of protein synthesis: the localization of mRNAs to specific regions of the cell for the purpose of site-specific translation.

[Figure 4.73 panels. Iron scarce: IRP is bound to the IRE in the 5′ region of the capped, polyadenylated ferritin mRNA, upstream of the protein-coding region; no translation. Iron plentiful: iron-bound IRP is released from the IRE; the mRNA is translated, producing ferritin, which facilitates iron storage.]

FIGURE 4.73 Regulation of ferritin mRNA translation initiation by the iron-response element binding protein. The binding of an IRP to the IRE in the 5′ UTR of the mRNA encoding ferritin blocks translation initiation. When the IRE is bound by the IRP, the complex blocks the scanning 43S initiation complex from reaching the start codon. When iron levels are low in the blood, IRP binds tightly to the IRE and inhibits translation of ferritin mRNA. When iron levels are high in the blood, iron binds to the IRP and causes it to dissociate from the IRE site on the mRNA, thereby increasing ferritin synthesis to enhance iron storage. (Adapted from J. M. Cooper and R. E. Hausman. The Cell: A Molecular Approach, Fourth edition. Sinauer Associates, Inc., 2006.)


Concept and Reasoning Check
1. How might translational control be achieved at the other steps in translation subsequent to initiation?

4.16 Some mRNAs are translated at specific locations within the cytoplasm

Key concepts
• Translation of selected mRNAs in specific regions of cells generates structural and functional cell polarity.
• Zip code binding proteins (ZBPs) bind to zip codes in the mRNA and help to deliver some mRNAs to specific sites in the cytoplasm for translation using the microfilament and microtubule cytoskeletal systems.

In the last section, we saw how mRNA binding proteins could regulate translation. However, not only can translation of mRNA be controlled temporally, it can also be regulated spatially; more than 50% of mRNAs coding for cytoplasmic proteins are localized to specific regions in the cytoplasm before translation occurs. mRNA localization leads to spatially restricted translation of an mRNA, allowing for multiple rounds of translation of the same mRNA and providing a large localized concentration of newly synthesized protein at a specific cellular location and time. The localization of protein synthesis to the site where the proteins will be used is much more efficient than transporting the protein from a large number of spatially unrestricted translation sites to a specific cellular location.

One of the best examples of mRNA localization is β-actin mRNA. Fibroblast cells can move along substrates by extending out processes called lamellipodia. Lamellipodia form at the leading edge of a cell, extend forward and adhere to the surface in front of the cell, and then pull the cell toward the adhesion point. This movement is governed by the local synthesis and polymerization of β-actin into filaments at the site of lamellipodium formation. β-Actin mRNA is localized to the leading edge of the cell and translated where the lamellipodium is to be formed. The following mechanism based on studies of β-actin mRNA localization (FIGURE 4.74) may also be applicable to many localized mRNAs:
• Located within the 3′ UTR of the localized mRNA is a sequence element known as a zip code, which binds to a protein known as a zip code binding protein.
• The mRNA together with the ZBP–zip code complex is exported to the cytoplasm where the mRNP is remodeled and several mRNPs assemble into an RNP granule.
• The ZBP-containing RNP granule binds the motor proteins dynein or kinesin, whose primary function is to move molecular cargo along the microtubule cytoskeleton. Alternatively, the RNP granule can bind to the motor protein myosin, which moves cargo along the actin microfilament cytoskeleton. Both types of motor proteins use ATP as an energy source for the mechanochemical transport of the RNP granule.

Other examples of mRNA localization include the localization of specific mRNAs to the axons and dendrites of neurons and the movement of specific mRNAs to distinct locations in Drosophila embryos during the early stages of development.

Concept and Reasoning Check
1. During the localization of mRNAs to specific sites in the cytoplasm, translation of the mRNA is inhibited. Suggest a mechanism for this inhibition and for activation of translation once the mRNA reaches its final destination.

4.17 Sequence elements in the 5′ and 3′ untranslated regions determine the stability of an mRNA

Key concepts
• Sequences located within the 5′ and 3′ UTRs as well as the coding regions of mRNAs interact with proteins that regulate RNA degradation.
• Degradation of mRNAs occurs by two classes of nucleolytic reactions: the exonuclease and endonuclease pathways. The exonucleolytic decay pathway begins with the degradation of the poly(A) tail, followed by either decapping and 5′ to 3′ exonucleolytic degradation of the mRNA or 3′ to 5′ degradation from the 3′ end by the exosome exonuclease complex. The endonucleolytic decay pathway uses any one of several site-specific endonucleases that produce mRNA cleavage products that can be further degraded by the exonucleolytic pathways.


[Figure 4.74 diagram steps. Nucleus: transcription; formation of RNP in nucleus; splicing; export. Cytoplasm: remodeling of RNP in cytoplasm and RNP oligomerization; assembly into RNA granules; transport by molecular motors along the cytoskeleton (microtubules: kinesin or dynein; actin: myosin).]

FIGURE 4.74 Cytoplasmic localization and site-specific translation of mRNA. mRNAs that are to be localized to specific regions of the cytoplasm contain a sequence in the 3′ UTR called a zip code, which binds to proteins called zip code binding proteins. After export to the cytoplasm, the mRNP complex assembles into granular aggregates, which are transported to specific locations in the cytoplasm either by dynein or kinesin along microtubules or by myosin along actin filaments. (Reprinted from Cell, vol. 136, K. C. Martin and A. Ephrussi, mRNA Localization . . . , pp. 719–730, Copyright (2009) with permission from Elsevier [http://www.sciencedirect.com/science/journal/00928674].)

An additional strategy for the regulation of gene expression is mRNA stability. The lifetimes of specific mRNAs are often carefully controlled by the cell to optimize the level of protein synthesis. Proteins that are required in only small amounts due to their potential cytotoxic effects at higher levels or proteins that are needed for only specific periods in the cell cycle are encoded by mRNAs that contain signals for rapid and effective degradation. Another function of mRNA decay is the quality control of mRNAs. Genes that have accumulated nonsense mutations or mutations in translation termination codons can produce proteins with abnormal lengths (shorter or longer than normal), which may be highly toxic to the cell. In addition, errors during mRNA synthesis (transcription and processing) can produce various types of mutations that could potentially lead to toxic proteins. Specific protein complexes divert the faulty mRNAs into various decay pathways.

The decay of mRNA can occur by exonucleases that degrade the mRNA from either the 5′ end or the 3′ end or by endonucleases that cleave mRNAs at specific internal sites, producing products for further degradation via exonuclease pathways. In exonucleolytic decay, the two most important structural features of mRNAs that protect the transcript from premature degradation are the 7-MeG cap and the poly(A) tail; consequently, the exonucleolytic mRNA decay pathways are regulated by proteins bound to sites located in the 5′ and 3′ UTRs of the transcript (FIGURE 4.75). Exonucleolytic decay pathways, which are found in most eukaryotic cells, including mammalian cells, begin with the 3′ to 5′ exonucleolytic degradation of the poly(A) tail by specific nucleases that are still under intensive investigation. Once the


[Figure 4.75 diagram. (a) Exonuclease decay pathways: deadenylation of m7GpppG–AAA(A)n mRNA (PARN, PAN2, PAN3, CNOT6, CNOT6L, CNOT7, CNOT8), followed either by decapping (DCP2, NUDT16) and 5′→3′ decay by XRN1, or by 3′→5′ decay by the exosome (including RRP44 or EXOSC10). (b) Endonuclease decay pathways: endonuclease cleavage of m7GpppG–AAA(A)n mRNA, followed by 3′→5′ decay by the exosome and 5′→3′ decay by XRN1.]

Activity; protein (also known as); substrate
Decapping; DCP2; deadenylated mRNA
Decapping; NUDT16; deadenylated mRNA
Deadenylating; PAN2; poly(A) mRNA
Deadenylating; CNOT6 (CCR4A); poly(A) mRNA
Deadenylating; CNOT6L (CCR4B); poly(A) mRNA
Deadenylating; CNOT7 (CAF1A); poly(A) mRNA
Deadenylating; CNOT8 (CAF1B or POP2); poly(A) mRNA
Deadenylating; PARN; poly(A) mRNA
5′-to-3′ exonucleolytic; XRN1; decapped mRNA, endonucleolytically cleaved mRNA
3′-to-5′ exonucleolytic; RRP44 (DIS3L); deadenylated mRNA, endonucleolytically cleaved mRNA
3′-to-5′ exonucleolytic; EXOSC10 (RRP6 or PM/Scl-100); deadenylated mRNA, endonucleolytically cleaved mRNA
Endonucleolytic; PMR1; translating poly(A) mRNA
Endonucleolytic; ZC3H12A (MCPIP1); poly(A) mRNA
Endonucleolytic; IRE1; endoplasmic-reticulum-associated poly(A) mRNA
Endonucleolytic; SMG6; NMD targets

FIGURE 4.75 Two major pathways for eukaryotic mRNA decay. The decay of mRNA can occur by exonucleases that degrade the mRNA from either the 5′ end or the 3′ end or by endonuclease pathways. The 7-MeG cap and the poly(A) tail protect the mRNA from degradation while the mRNA is translated. Decay of the mRNA in the exonucleolytic pathways begins with the 3′→5′ exonucleolytic degradation of the poly(A) tail by specific nucleases that are still under intensive investigation. Once the poly(A) tail has been degraded to fewer than 10 adenine residues, the remainder of the mRNA is degraded either by removal of the 7-MeG cap followed by 5′→3′ exonucleolytic digestion of the remaining body of the mRNA by the Xrn1 nuclease or by 3′→5′ exonucleolytic digestion of the rest of the poly(A) tail followed by degradation of the remaining body of the mRNA by the exosome. (Adapted from D. R. Schoenberg and L. E. Maquat, Nat. Rev. Genet. 13 (2012): 246–259.)

poly(A) tail has been degraded to fewer than 10 adenine residues, the remainder of the mRNA is degraded by one of the two following sets of events:
1. Removal of the 5′ 7-MeG cap followed by 5′ to 3′ exonucleolytic digestion of the remaining body of the mRNA.
2. 3′ to 5′ exonucleolytic digestion of the rest of the poly(A) tail followed by the remaining body of the mRNA.

Although a number of specific protein binding sites are involved in mRNA stability, the best characterized and most common protein binding site is the AU-rich element or ARE (FIGURE 4.76). AREs range between 50 and 150 nucleotides in length and represent a diverse set of nucleotide sequences. Many AREs contain numerous copies of the short sequence element AUUUA or the longer sequence UUAUUUAUU along with one or more stretches of uridine residues. More than 15 ARE binding proteins have been identified in mammalian cells. AREs are most often found in mRNAs encoding proteins whose levels must be carefully controlled and whose expression is only desirable for a specific short period during the cell cycle. Some ARE binding proteins are expressed in all tissues, whereas others are tissue specific. In addition, some ARE binding proteins are present at only specific stages of the cell cycle. ARE binding proteins can function to either stabilize or destabilize the mRNA transcript, depending on the ARE binding protein and the presence of


FIGURE 4.76 AU-rich elements control mRNA stability. AU-rich elements (AREs) are found in the 3′ UTRs of mRNAs that have short half-lives. The ARE binds to one of several different ARE binding proteins, which stimulates the degradation of the mRNA. (Courtesy of Rebecca Hartley, University of New Mexico School of Medicine. Reproduced from an illustration by Rebecca Hartley, University of New Mexico School of Medicine.)

[Figure 4.76 diagram, "Sequences involved in mRNA post-transcriptional regulation": CAP; 5′ UTR; ATG; ORF; Stop; 3′ UTR containing an ARE bound by ARE-BPs; NPS; AAA(A)n]

Figure 4.76 legend
NPS: nuclear polyadenylation signal, AAUAAA
ORF: open reading frame
ARE: AU-rich element
ARE-BP (destabilizing ARE-binding proteins): AUF1, TTP, BRF1, TIA-1, TIAR, KSRP
ARE-BP (stabilizing ARE-binding proteins): HuA, HuB, HuC, HuD, HuR

ARE type; sequence
I; (AUUUA)1–5 in a U-rich context
IIA; (AUUU)5A
IIB; (AUUU)4A
IIC; (A/U)(AUUU)3A(A/U)
III; U-rich region
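The ARE sequence classes in the Figure 4.76 legend are compact enough to be written as patterns and searched for in a 3′ UTR. The sketch below is an illustration only: the function names are hypothetical, the "U-rich context" requirement of class I and the purely U-rich class III are not modeled, and a match says nothing about which ARE binding protein, if any, is actually bound.

```python
import re

# Class II cores from the figure legend, written as regular expressions
# and tested from the longest repeat to the shortest.
CLASS_II_PATTERNS = [
    ("IIA", re.compile(r"(?:AUUU){5}A")),
    ("IIB", re.compile(r"(?:AUUU){4}A")),
    ("IIC", re.compile(r"[AU](?:AUUU){3}A[AU]")),
]

# Lookahead so that overlapping pentamers (as in AUUUAUUUA) are each counted.
PENTAMER = re.compile(r"(?=AUUUA)")


def to_rna(sequence):
    """Upper-case the sequence and read any DNA-style T as U."""
    return sequence.upper().replace("T", "U")


def count_pentamers(utr):
    """Count AUUUA pentamers, including overlapping copies."""
    return len(PENTAMER.findall(to_rna(utr)))


def classify_are(utr):
    """Return 'IIA', 'IIB', 'IIC', 'I', or None for a 3' UTR sequence.

    Simplification: any sequence with at least one AUUUA that does not
    match a class II core is reported as class I.
    """
    rna = to_rna(utr)
    for name, pattern in CLASS_II_PATTERNS:
        if pattern.search(rna):
            return name
    if count_pentamers(rna) >= 1:
        return "I"
    return None


if __name__ == "__main__":
    example_utr = "GGC" + "AUUU" * 4 + "AGGC"  # made-up sequence for illustration
    print(classify_are(example_utr), count_pentamers(example_utr))
```

Testing the longest repeat first matters because every class IIA core also contains a class IIB core.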

other mRNA stability proteins associated with the 3′ UTR. A complex called the exosome, which contains multiple exonucleases, has been found in most eukaryotic cells and has been shown to be involved in the 3′ to 5′ nucleolytic degradation of mRNAs. The exosome has a preference for mRNAs that contain AREs.

In addition to regulating the levels and temporal expression of specific proteins, mRNA decay is an important surveillance mechanism to identify and remove mRNAs containing mutations that could lead to the production of proteins that are deleterious to the cell. Such mutations include premature stop codons caused by frame shift mutations that occur due to mRNA splicing errors. The ORFs of protein-coding regions in the genome are usually protected by evolutionary constraints from acquiring inherited stop codons that would cause premature translation termination and truncated proteins. If an aberrant splicing event occurs that generates a frame shift mutation, however, there is no evolutionary protection, and the probability of encountering a stop codon now becomes 3 in 64 or about 1 in 20 for every codon that is read by the ribosome, leading to the strong likelihood that a truncated protein will be produced. Likewise, aberrant alternative splicing can also produce mRNA products that lack an in-frame stop codon. Thus, it is almost inevitable that a mutant protein will result from a splicing error. The mRNA quality control pathways primarily involve endonucleolytic decay using endonucleases that cleave mRNAs at specific sites in response to the presence of a premature stop codon or the lack of a proper stop codon. Once endonucleolytic cleavage occurs, the mRNA fragments are further degraded by the exonucleolytic pathways.

The nonsense-mediated decay or NMD pathway degrades mRNAs that contain premature stop codons. The NMD complex assembles with a ribosome to carry out a single round of nonproductive translation that scans the mRNA for premature stop codons. The factors that govern which mRNAs go through NMD surveillance are still under investigation. A working model for NMD is shown in FIGURE 4.77. The general features of the model are the detection of the exon junction sites that are “marked” during splicing by the binding of a group of proteins referred to as the exon junction complex. The ribosome uses the exon junction complexes as reference points to begin the search for premature termination codons relative to the normal stop codon at the end of the ORF. When a premature termination codon is detected, a group of proteins assemble with the ribosome to guide the exonucleolytic cleavage of the mRNA.

An mRNA decay pathway similar to NMD, called nonstop-mRNA decay, detects the lack of an in-frame stop codon. Nonstop-mRNA decay not only protects the cell against the production of potentially deleterious proteins that arise from a lack of a stop codon but also ensures that ribosomes and other translational


components are not tied up in unproductive protein synthesis and are properly recycled.

In summary, mRNA decay is an important mechanism for controlling the amount of mRNA available for translation. The signals that control mRNA exonucleolytic decay are located in the 5′ and 3′ UTRs of the transcript and interact with a variety of proteins that in turn interact with various mRNA binding proteins. In addition, endonucleolytic mRNA decay plays a critical role in mRNA quality control to prevent the production of deleterious proteins. In the next section, we explore another mechanism for control of mRNA utilization.

Concept and Reasoning Check
1. What are the advantages of regulating gene expression at the level of mRNA stability versus transcription?
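The estimate quoted in this section, that a ribosome reading out of frame meets a stop codon at 3 of every 64 codons (about 1 in 20), can be checked with a few lines of arithmetic. The sketch below assumes that all 64 codons are equally likely and that successive codons are independent, which is a simplification of real sequence composition; the variable and function names are illustrative only.

```python
from fractions import Fraction

STOP_CODONS = {"UAA", "UAG", "UGA"}

# 3 stop codons out of 4**3 = 64 possible codons.
p_stop = Fraction(len(STOP_CODONS), 4 ** 3)

# Mean number of codons read up to and including the first stop codon
# (mean of a geometric distribution with success probability p_stop).
expected_codons_to_stop = 1 / p_stop


def prob_no_stop(n_codons):
    """Probability of reading n_codons in a row without meeting a stop codon."""
    return float((1 - p_stop) ** n_codons)


if __name__ == "__main__":
    print(f"per-codon stop probability: {p_stop} = {float(p_stop):.4f}")
    print(f"expected codons to first stop: {float(expected_codons_to_stop):.1f}")
    for n in (10, 50, 100):
        print(f"P(no stop in {n} codons) = {prob_no_stop(n):.4f}")
```

Under these assumptions the chance of reading 100 out-of-frame codons without meeting a stop codon is below 1%, which is why a frame shift almost always yields a truncated protein.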

[Figure 4.77 diagram labels: m7G cap; (A)n tail; PTC; STOP; splicing; nucleus; cytoplasm; EJC; UPF3; UPF2; nascent peptide; UPF1; SMG1; eRF1; eRF3; SURF; phosphorylation (P); SMG5; SMG7; mRNA decay]

FIGURE 4.77 A model for nonsense-mediated mRNA decay. The nonsense-mediated decay (NMD) pathway is an mRNA quality control process to degrade mRNAs that contain a premature termination codon (PTC). PTCs usually result from splicing defects caused by mutations in splicing signals and can lead to the synthesis of deleterious proteins. After mRNA splicing, the NMD core protein UPF3 binds as part of the exon junction complex (EJC), marking each junction between adjacent exons. After cytoplasmic export, UPF2 joins UPF3. If a ribosome encounters a premature termination codon, four other proteins, UPF1, SMG1, eRF1, and eRF3, form a complex at the stalled ribosome called SURF. UPF1 binds to UPF2 at the nearest EJC and causes SMG1 to phosphorylate UPF1, leading to the assembly of SMG5 and SMG7 and to the subsequent degradation of the mRNA. (Reprinted from Trends Genet., vol. 24, L. Linde and B. Kerem, Introducing sense into nonsense . . . , pp. 552–563, Copyright (2008) with permission from Elsevier [http://www.sciencedirect.com/science/journal/01689525].)

4.18 Noncoding RNAs are important regulators of gene expression

Key concepts
• RNA interference (RNAi) is a recently discovered set of pathways that use small ncRNAs to inhibit expression of mRNAs.
• Small or short interfering RNAs (siRNAs) are 19- to 23-bp double-stranded RNAs (dsRNAs) found in plant cells and some animal cells; they contain sequences complementary to specific mRNAs. siRNAs, as part of the protein-containing RNA-induced silencing complex (RISC), base pair to the mRNA and induce mRNA decay. Synthetic siRNAs have become a powerful tool in biotechnology to knock down expression of specific genes.
• miRNAs are originally generated as larger precursors, from segments of introns or from their own transcription units, and are processed in the cytoplasm to mature 22-nucleotide miRNAs by Dicer. Like siRNAs, miRNAs assemble into a RISC and function to silence the expression of mRNAs. miRNAs can promote mRNA decay but also have the capacity to directly inhibit translation without degrading the mRNA.

Perhaps the single most important event in molecular cell biology during the past decade has been the discovery of small ncRNAs that regulate gene expression. The processes are collectively referred to as RNAi. Although several types of RNAi have been identified, the two largest and best studied classes of RNAi are siRNAs and miRNAs, which both carry out their inhibitory functions through the


sequence-specific recognition and binding by base pairing to target mRNAs. Both siRNAs and miRNAs are generated by a set of enzymologic pathways that begin in the nucleus and end in the cytoplasm and share some common enzymologic processing steps. siRNAs were originally identified in virus-infected plants and in nematodes and are derived from larger dsRNA precursors that are transcribed and processed in the nucleus into shorter dsRNAs; these short dsRNAs are in turn exported to the cytoplasm for further trimming by the enzyme called Dicer to yield a 21- to 23-bp dsRNA product (FIGURE 4.78). The duplex siRNA is denatured and one strand, known as the guide strand, assembles with one of several members of the Argonaute (Ago) protein family along with the GW182 protein and several other proteins to form the RISC. The RISC uses the guide strand to recognize and cleave specific mRNAs in the cytoplasm. Only a few naturally occurring mammalian siRNAs have been reported thus far, but synthetic siRNAs have emerged as a powerful tool for the study of

[Figure 4.78 diagram labels: hairpin RNA; long dsRNA; siRNA; cleavage by Dicer and assembly into RISC; 5′-end phosphorylation and assembly into RISC; AGO2; guide-strand loading and mature-RISC formation; target-mRNA recognition; mRNA cleavage.]

FIGURE 4.78 The siRNA pathway. siRNAs target and degrade specific mRNAs. Naturally occurring siRNAs are derived from larger dsRNA precursors, made in the nucleus and processed in the cytoplasm by the nuclease Dicer, to a final 21- to 23-bp siRNA product. siRNAs can also be synthesized in the laboratory as short double-stranded products and transfected into cells. The siRNA is unwound and one strand, known as the guide strand, along with one of the Argonaute (Ago) family proteins, assembles into what is known as the RISC. The RISC uses the guide strand to recognize and cleave specific mRNAs in the cytoplasm. (Reprinted by permission from Macmillan Publishers Ltd: Nat. Rev. Mol. Cell Biol., T. M. Rana, Illuminating the silence . . . , 2007, vol. 8 (1): 23–36.)
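The base-pairing logic behind a synthetic siRNA can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the guide strand is taken to be the exact reverse complement of a 21-nt window of the target mRNA, the function names are hypothetical, and the mRNA sequence is invented. Real siRNA design also weighs overhangs, thermodynamic asymmetry, and off-target effects, none of which is modeled here.

```python
# Minimal sketch: enumerate candidate siRNA guide strands for an mRNA.
# All names and the example sequence are hypothetical, for illustration only.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def candidate_guides(mrna: str, length: int = 21):
    """Yield (start, target_site, guide) for every window of `length` nt.

    The guide is perfectly complementary to its target site, which is the
    pairing a RISC-loaded guide strand would need in order to recognize it.
    """
    for start in range(len(mrna) - length + 1):
        site = mrna[start:start + length]
        yield start, site, reverse_complement(site)


mrna = "AUGGCUAGCUAGGAUCCGAUCGUAGCUAGCUAAGG"  # invented 35-nt sequence
first_start, first_site, first_guide = next(candidate_guides(mrna))
print(first_start, first_site, first_guide)
```

Taking the reverse complement of a guide returns its target site, which is a quick check that the two strands pair perfectly along their whole length.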


mammalian gene expression. Synthetic siRNAs can be designed to target specific mRNAs and can be transfected directly into mammalian cells or synthesized inside the cell in situ using transfected expression plasmid vectors. Once inside the cell, a synthetic siRNA can assemble into the RISC and degrade the targeted mRNA, often resulting in a near elimination or at least a significant decrease (knockdown) in the expression levels of a specific protein. siRNAs have been engineered for silencing specific mRNAs coding for disease-related proteins and are expected to become increasingly important as therapeutic agents.

In contrast to siRNAs, miRNAs are highly abundant in mammalian cells, and it is estimated that over 1,000 different miRNAs are encoded by the human genome; many of these have been functionally validated to participate in gene regulation. As in the case of siRNAs, miRNAs bind to specific mRNAs through complementary base pairing. Although the miRNA target sequences are most often found in the 3′ UTRs of mRNAs, targets may also be located in 5′ UTRs and the coding regions. The identification of target sequences for miRNAs is under very active investigation, and more than 500 different target sequences have been identified. In vertebrates, individual miRNAs have been found to play important roles in differentiation and development. miRNAs can either destabilize mRNAs through the RISC or directly repress translation.

The pathway for the biogenesis of miRNAs is shown in FIGURE 4.79. The original precursor of a miRNA can be either a primary transcript (pri-miRNA) synthesized from its own promoter or an excised mRNA intron. Pri-miRNAs and intronic miRNA precursors are processed into pre-miRNAs within the nucleus by an enzymatic complex including the nuclease Drosha and the DGCR8 protein (also called Pasha). The pre-miRNA is exported to the cytoplasm, where the same Dicer complex that was used for siRNA generation processes the pre-miRNA into the single-stranded, 22-nucleotide final miRNA product (Figure 4.79). The miRNA assembles

[Figure 4.79 diagram labels: nucleus; cytoplasm; miRNA gene; Pol II; Drosha; Pasha; pre-miRNA; RAN–GTP; Exportin 5; Dicer; miRNA:miRNA* duplex; unwind; mature miRNA; asymmetric miRISC assembly; imperfect complementarity (translational repression); perfect complementarity (mRNA cleavage); target mRNA with 5′ UTR, ORF, and 3′ UTR.]

FIGURE 4.79 The miRNA pathway. miRNAs can be derived from intronic regions of mRNA transcripts or can be transcribed from their own promoters. In this figure, the primary transcript, called a pri-miRNA, is transcribed from its own promoter and processed into a pre-miRNA within the nucleus by an enzymatic complex, including the nuclease Drosha and the DGCR8 protein (also known as Pasha). The pre-miRNA is exported to the cytoplasm, where the Dicer complex processes the pre-miRNA into a 22-nucleotide final miRNA product. The miRNA assembles with Argonaute (Ago) protein and GW182 into the RISC. Depending on the degree of complementarity between the miRNA and its target mRNA sequence, as well as other factors, the attachment of the miRNA–RISC leads to mRNA degradation or to translational interference. (Adapted from A. Esquela-Kerscher and F. J. Slack, Nat. Rev. Cancer 6 (2006): 259–269.)


[Figure 4.80 panels: (a) Perfect 5′ match: miRNA miR-375 paired with the myotrophin 3′ UTR. (b) Imperfect 5′ match: miRNA let-7 paired with the lin-41 3′ UTR. The base-pairing diagrams are not reproduced here.]

FIGURE 4.80 The miRNA “seed” sequence. Interaction between a miRNA and its target mRNA is mediated by base pairing between the target site on the mRNA and a seven- to eight-nucleotide segment on the miRNA guide strand referred to as the seed sequence. (a) Perfect complementarity between the miRNA miR-375 seed sequence and the myotrophin mRNA target but imperfect complementarity outside the seed sequence. (b) Imperfect complementarity between miRNA let-7 and lin-41 mRNA but more extensive complementarity outside of the seed sequence. (Adapted from N. Rajewsky, Nat. Genet. 38 (2006): S8–S13.)
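The simplest form of computational target-site identification is a scan for perfect seed matches, which the Python sketch below illustrates. It rests on stated assumptions rather than on any particular published algorithm: the seed is taken to be nucleotides 2 to 8 from the miRNA 5′ end (a common convention, not something specified in this chapter), only perfect Watson-Crick pairing is counted, the function names are hypothetical, and the UTR sequence is invented. Practical prediction tools add further criteria, and candidate sites still require experimental confirmation.

```python
# Minimal sketch: find perfect seed-match sites for a miRNA in a 3' UTR.
# Seed positions 2-8 are an assumed convention; sequences are illustrative.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA sequence (5'->3') that pairs perfectly with the seed.

    `start` and `end` are 1-based, inclusive positions from the miRNA 5' end.
    """
    seed = mirna[start - 1:end]
    return seed.translate(RNA_COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of every perfect seed match in `utr`."""
    site = seed_site(mirna)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # step by 1 to allow overlapping hits
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # example 22-nt sequence for illustration
utr = "AAACUACCUCGGGAUUCUACCUCAAA"    # invented UTR containing two sites
print(seed_site(mirna))               # CUACCUC
print(find_seed_matches(mirna, utr))  # [3, 16]
```

Because the match covers only seven nucleotides, a single miRNA can have candidate sites in many different mRNAs, and a single UTR can carry sites for several miRNAs.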

along with one of several members of the Ago protein family into the RISC, similar to the siRNA pathway. Depending on the miRNA and its target sequence, the attachment of the miRNA–RISC leads to mRNA degradation or to translational interference. Interaction between the miRNA and the target mRNA is mediated by base pairing between the target site on the mRNA and an 8-nt segment on the miRNA referred to as the seed sequence (FIGURE 4.80). Computer algorithms have been developed to identify potential miRNA target sites in mammalian genomes, and it is common to encounter at least a half dozen different miRNA targets within any given mRNA 3′ UTR or 5′ UTR; confirmation of the target site is carried out experimentally.

The study of the role of miRNAs in cell differentiation and embryonic development is one of the most active areas of investigation in cell biology. It is now clear that miRNAs are also involved in many disease processes, including cancer and cardiovascular disease.

Concept and Reasoning Check
1. A number of mRNAs contain several different miRNA target sites in their 3′ UTR. How might the presence of multiple sites be important in cellular development?

4.19 What’s next?

In this chapter we reviewed the individual events in gene expression and considered how each step is regulated. The linear presentation of these processes, however, is not an accurate portrayal of how they actually occur inside the cell; these events do not occur as individual biochemical assemblies and reactions but as an integrated, complex dynamic system, functionally linked together by regulatory factors that act in multiple steps to keep the reactions coupled to each other, facilitating feedback and autoregulatory behavior. It is also important to emphasize that expression of genes is not monotonic; in other words, genes are not simply on or off but have a


dynamic range of expression. By using multiple layers of regulation at different stages of expression, it is possible to control RNA and protein levels temporally with a much larger dynamic range. In addition to temporal regulation, gene expression is controlled spatially within the context of the structural organization of the cell. Trafficking of RNAs and proteins between the various membrane-bound organelles as well as the nonmembranous subcompartments is more fully explored elsewhere, but suffice it to say that the interplay between temporal and spatial regulation adds an increasing level of complexity to our understanding of the control of gene expression.

Although a plethora of applications and implications for studies of gene regulatory systems exist, some of the most compelling are in medicine. There are more than 210 distinct cell types in the adult human body. Each cell in the body contains the same linear sequence of DNA, that is, the same set of genes (the same genome). It is the global gene expression pattern of the genome, however, that determines the structure, differentiation state, and metabolic homeostasis of each of these cell types. The failure to control the growth, proliferation, and differentiation of just a single cell can have enormous ramifications for the health and survival of the entire organism.

The evolution of a complete understanding of the interactive pathways that lead to cellular growth, proliferation, and differentiation will provide us with increasingly greater capabilities for applications to improve health and treat disease. Ultimately, by understanding gene expression at the integrative systems level, it may be possible to modify and correct aberrant gene expression to treat a variety of human diseases. Stem cells are pluripotent and can differentiate into a wide variety of different cell types. Programming stem cells to differentiate into functional tissues and organs will pave the way for effective disease intervention strategies. It may even be possible to partially or completely reprogram differentiated cells to change cell types. However, even the ability to repair an inherited mutation in a single faulty master regulatory gene would have enormous benefits.

What is necessary to successfully intervene in the control of cellular gene expression? First, we need to know the expression states at the multicellular level, not just for a single cell. Cells do not exist as isolated entities. Even prokaryotes can function as communities of cells, communicating their functional states to one another through intercellular signaling pathways. For eukaryotic metazoans, this communication includes not only interacting as individual cells but also interacting physically to build the next levels of biologic structure within the organism: tissues and organs. Thus, the expression of genes in one cell must be coordinated with the expression of genes in other cells in its immediate neighborhood as well as in tissues and organs located at a significant distance. These intercellular gene control pathways are almost a complete mystery at the moment, but understanding them at this higher level is critical if we are to exploit our knowledge of gene regulation to the fullest extent in the promotion of health in organisms and populations.

To address the future challenges of understanding the coordinated function of cellular genomes at the organism and population levels, we will have to combine experimental biology with mathematical and computational approaches. We are just beginning to see the application of sophisticated computer algorithms and network theory to the analysis of global gene regulatory systems. The first attempts at using this interdisciplinary approach have been applied to the understanding of networks of transcription activators and repressors involved in cell differentiation, development, and cancer. Recently, a very large consortium of investigators from around the world has collaborated on a project to systematically map the regions of transcription, transcription factor association, chromatin structure, and histone modification for the entire human genome. This project, entitled “The Encyclopedia of DNA Elements,” or ENCODE, promises to provide new insights into the regulation of our genes and genome. As we have seen in this chapter, however, it is now clear that gene regulation also occurs at the level of alternative mRNA splicing, translation, and mRNA stability; thus, these processes will have to be integrated into a truly comprehensive modeling of gene expression systems.

The iterative approach of testing theoretical models experimentally and using the results to modify the models increases our level of understanding of how gene expression systems provide the direction for the complex dynamic structural and functional integrity of the cell, both in the context of its physical environment and in the context of a tissue and organ within the organism. Thus, the biggest challenge for the future of the study of gene expression and regulation at the systems level is effective collaboration between experimental and theoretical scientists and mathematicians, in which members of the research teams will have to learn new vocabularies, approaches, instrumentation, and tools.
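The point made earlier in this section, that layering regulation at several stages of expression widens the dynamic range, can be made concrete with a small calculation. The sketch below assumes a deliberately simple two-stage model that is not taken from this chapter: mRNA is made at rate k_tx and degraded at rate d_m, and protein is made from each mRNA at rate k_tl and degraded at rate d_p. The function name and all rate values are invented for illustration.

```python
# Minimal sketch of a two-stage expression model (an assumed toy model):
#   dm/dt = k_tx - d_m * m        (transcription and mRNA decay)
#   dp/dt = k_tl * m - d_p * p    (translation and protein decay)
# At steady state, m = k_tx / d_m and p = k_tl * m / d_p.


def steady_state_protein(k_tx: float, d_m: float, k_tl: float, d_p: float) -> float:
    """Return the steady-state protein level for the toy model above."""
    mrna = k_tx / d_m
    return k_tl * mrna / d_p


# Repressed state: every layer at its lowest setting (arbitrary units).
low = steady_state_protein(k_tx=1.0, d_m=1.0, k_tl=1.0, d_p=1.0)

# Induced state: transcription up 8-fold, mRNA 2-fold more stable,
# translation up 4-fold. Each change on its own is modest.
high = steady_state_protein(k_tx=8.0, d_m=0.5, k_tl=4.0, d_p=1.0)

fold_range = high / low
print(fold_range)  # 64.0
```

In this toy model the fold changes multiply, so three modest layers (8-, 2-, and 4-fold) give a 64-fold range overall, which no single layer could achieve alone at these settings.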


4.20 Summary

In this chapter we focused on the basic mechanisms of gene expression and regulation. The discovery of multiple families of ncRNAs has led to the modern definition of the gene as a chromosomal or organellar DNA sequence that encodes the information to transcribe a functional RNA; the RNA transcript may be an mRNA that encodes information to synthesize a protein or a noncoding RNA with structural, regulatory, or catalytic functions. In the first step of gene expression, the gene is transcribed to produce the primary RNA transcript; however, the vast majority of primary RNA transcripts, especially in eukaryotes, must be processed into mature transcripts. In the case of eukaryotic protein-coding genes, mRNA processing includes 5′ capping, 3′ polyadenylation, and the removal of intronic sequences. Transcription and RNA processing occur in the nucleus and are directly coupled to mRNA export to the cytoplasm where translation, localization, storage, and degradation occur. The export of mRNAs from the nucleus to the cytoplasm is regulated to ensure that only functional mRNAs leave the nucleus and are translated at the correct time and location within the cytoplasm. Translation, mRNA stability, and mRNA cytoplasmic localization are regulated by proteins and small ncRNAs. Many of the gene regulatory systems in eukaryotic cells are targets of signaling mechanisms that connect the extracellular environment, cell growth, proliferation, and differentiation to gene expression. It is clear that aberrant regulation of gene expression is a major mechanism of inherited and acquired diseases.

To explore these topics in more detail, visit this book’s Interactive Student Study Guide: http://biology.jbpub.com/lewin/cells/3e

References

Carthew, R. W., and Sontheimer, E. J. 2009. Origins and mechanisms of miRNAs and siRNAs. Cell. v. 136 p. 642–655.
Cheung, A. C. M., and Cramer, P. 2012. A movie of RNA polymerase transcription. Cell. v. 149 p. 1431–1437.
ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature. v. 489 p. 57–74.
Fabian, M. R., Sonenberg, N., and Filipowicz, W. 2010. Regulation of mRNA translation and stability by microRNAs. Annu. Rev. Biochem. v. 79 p. 351–379.
Krebs, J. E., Goldstein, E. S., and Kilpatrick, S. T. 2010. Lewin’s Genes 10. Sudbury, MA: Jones and Bartlett.
Lenhard, B., Sandelin, A., and Carninci, P. 2012. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat. Rev. Genet. v. 13 p. 233–245.
Moore, P. B., and Steitz, T. A. 2011. The roles of RNA in the synthesis of protein. Cold Spring Harbor Perspect. Biol. v. 3 a003780, p. 1–17.
Proudfoot, N. J. 2011. Ending the message: poly(A) signals then and now. Genes Dev. v. 25 p. 1770–1782.
Scherer, S. 2011. Guide to the Human Genome. Cold Spring Harbor, NY: Cold Spring Harbor Press.
Schoenberg, D. R., and Maquat, L. E. 2012. Regulation of cytoplasmic mRNA decay. Nat. Rev. Genet. v. 13 p. 246–259.
Shahbabian, K., and Chartrand, P. 2011. Control of cytoplasmic mRNA localization. Cell Mol. Life Sci. v. 69 p. 535–552.
Sonenberg, N., and Hinnebusch, A. G. 2009. Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell. v. 136 p. 731–745.
Tropp, B. 2012. Molecular Biology: Genes to Proteins. 4th Ed. Sudbury, MA: Jones and Bartlett.
Wahl, M. C., Will, C. L., and Lührmann, R. 2009. The spliceosome: design principles of a dynamic RNP machine. Cell. v. 136 p. 701–718.
Vannini, A., and Cramer, P. 2012. Conservation between RNA polymerase I, II, and III transcription initiation machineries. Mol. Cell. v. 45 p. 439–446.
