
Chapter 3. The Beginnings of Genomic Biology – Molecular Genetics

Contents

3. The Beginnings of Genomic Biology – Molecular Genetics
3.1. DNA is the Genetic Material
3.2. Watson & Crick – The Structure of DNA
3.3. Chromosome Structure
3.3.1. Prokaryotic chromosome structure
3.3.2. Eukaryotic chromosome structure
3.3.3. Heterochromatin & Euchromatin
3.4. DNA Replication
3.4.1. DNA replication is semiconservative
3.4.2. DNA polymerases
3.4.3. Initiation of replication
3.4.4. DNA replication is semidiscontinuous
3.4.5. DNA replication in Eukaryotes
3.4.6. Replicating ends of chromosomes
3.5. Transcription
3.5.1. Cellular RNAs are transcribed from DNA
3.5.2. RNA polymerases catalyze transcription
3.5.3. Transcription in
3.5.4. Transcription in Prokaryotes – Polycistronic mRNAs are produced from
3.5.5. Beyond Operons – Modification of expression in Prokaryotes
3.5.6. Transcription in Eukaryotes
3.5.7. Processing primary transcripts into mature mRNA
3.6. Translation
3.6.1. The Nature of
3.6.2. The Genetic Code
3.6.3. tRNA – The decoding molecule
3.6.4. Peptides are synthesized on Ribosomes
3.6.5. Initiation, elongation, and termination
3.6.6. Sorting in Eukaryotes
3.7. Regulation of Eukaryotic Expression
3.7.1. Transcriptional Control
3.7.2. Pre-mRNA Processing Control
3.7.3. mRNA Transport from the Nucleus
3.7.4. Translational Control
3.7.5. Protein Processing Control
3.7.6. Degradation of mRNA Control
3.7.7. Protein Degradation Control

CONCEPTS OF GENOMIC BIOLOGY Page 1

CHAPTER 3. THE BEGINNINGS OF GENOMIC BIOLOGY – MOLECULAR GENETICS

3.1. DNA IS THE GENETIC MATERIAL.

As classical genetics developed from Mendel’s work in 1866 through the early part of the 20th century, it became understood that Mendel’s factors, which produce traits, were carried on chromosomes, and that the genetic information from two parents could assort in an enormous number of ways in each generation. This assortment produced the genetic variety demanded by Darwin’s theory of the origin of species, on which natural selection acts. This understanding gave rise to the study of how genes for more complex traits behave in populations. At the same time, a search was under way for the material inside a cell, perhaps a subcomponent of a chromosome, that carried the genetic instructions that make organisms what they are.

In 1928, a British scientist, Frederick Griffith, published work showing that live, rough, avirulent bacteria could be transformed into smooth, virulent bacteria by a “principle” found in dead, smooth, virulent bacteria. This meant that the bacterial traits of rough versus smooth and avirulence versus virulence were controlled by a substance that could carry the phenotype from dead to live cells.

Photo: Frederick Griffith (1879-1941)

Griffith’s observations on Pneumococcus were controversial, to say the least, and inspired a spirited debate and much experimentation aimed at determining whether the “transforming principle” was protein or nucleic acid, the two main components of chromosomes, both identified early in the 20th century, well before Griffith’s experiments. This debate continued until Oswald Avery and his colleagues Colin MacLeod and Maclyn McCarty published their work in 1944, showing unequivocally that DNA was, in fact, Griffith’s transforming principle. This result completely revolutionized genetics and is considered the founding observation of molecular genetics.

Photos: Oswald T. Avery, Colin MacLeod, Maclyn McCarty

In 1952, more evidence that DNA is the genetic material came from the work of Alfred Hershey and Martha Chase on E. coli infected with bacteriophage T2. In their experiment, T2 proteins were labeled with the radioisotope 35S, and T2 DNA was labeled with the radioisotope 32P (sulfur is found in protein but not in DNA, whereas DNA is rich in phosphorus). The labeled viruses were then mixed separately with the E. coli host; after a short time, phage attachment was disrupted with a kitchen blender, and the location of each label was determined. The 35S-labeled protein was found outside the infected cells, while the 32P-labeled DNA was inside the E. coli, indicating that DNA carried the information needed for viral infection.

Figure 3.1. An electron micrograph of bacteriophage T2 (left), and a sketch showing the structures present in the virus (right). The head consists of a DNA molecule surrounded by proteins, while the core, sheath, and tail fibers are all made of protein. Only the DNA molecule enters the cell.

Once it was established that DNA was the genetic material carrying the cell’s instructions, so to speak, attention turned to the question “How could a molecule carry genetic information?” The answer became obvious with a detailed understanding of the structure of the DNA molecule, which was worked out by two scientists at Cambridge University, James Watson and Francis Crick.

3.2. WATSON & CRICK – THE STRUCTURE OF DNA.

The basic laboratory observations that led to the formulation of a structure for DNA were not made by biologists. Rather, Erwin Chargaff, an analytical organic chemist, and Rosalind Franklin and Maurice Wilkins, who studied DNA by x-ray diffraction, made the laboratory observations that led to the solution of the structure of DNA.

Rosalind Franklin, a young x-ray crystallographer working at King’s College London in the same laboratory as Maurice Wilkins, used a technique known as x-ray diffraction to generate images of DNA molecules. These images showed that DNA had a helical structure with repeating structural elements every 0.34 nm and every 3.4 nm along the axis of the molecule.
Chargaff determined that there were four different nitrogenous bases in DNA molecules: the purines, adenine (A) and guanine (G), and the pyrimidines, cytosine (C) and thymine (T). He purified DNA from a number of different sources so he could examine the quantitative relationships among A, T, G, and C. He concluded that in all DNA molecules, the mole-percentage of A was nearly equal to the mole-percentage of T, while the mole-percentage of G was nearly equal to the mole-percentage of C. It follows that the mole-percentage of pyrimidine bases equals the mole-percentage of purine bases. These observations became known as Chargaff’s rules.

Photos: Rosalind Franklin, Maurice Wilkins

Figure 3.2. X-ray diffraction image of a DNA molecule showing its helical structure with repeating structural elements.
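Chargaff’s rules follow directly from base pairing in double-stranded DNA. The short Python sketch below uses a made-up example sequence (not data from Chargaff’s work): it builds the partner strand, counts the bases on both strands, and checks that A = T, G = C, and purines = pyrimidines. The function names are illustrative only.

```python
from collections import Counter

# Watson-Crick partner of each DNA base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_composition(strand: str) -> dict[str, float]:
    """Mole-percentage of each base in the double-stranded DNA built from `strand`.

    The duplex contains every base of `strand` plus its pairing partner, so the
    composition does not depend on which direction the partner strand is written.
    """
    other = "".join(COMPLEMENT[b] for b in strand.upper())
    counts = Counter(strand.upper() + other)
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "AGCT"}


def obeys_chargaff(comp: dict[str, float], tol: float = 1e-9) -> bool:
    """True if A = T, G = C, and purines (A + G) = pyrimidines (C + T)."""
    purines = comp["A"] + comp["G"]
    pyrimidines = comp["C"] + comp["T"]
    return (abs(comp["A"] - comp["T"]) < tol
            and abs(comp["G"] - comp["C"]) < tol
            and abs(purines - pyrimidines) < tol)


example = "ATGGCGTACGTTAGC"  # hypothetical sequence for illustration
comp = duplex_composition(example)
print({b: round(p, 1) for b, p in comp.items()})
print(obeys_chargaff(comp))  # True for any double-stranded sequence
```

Counting only the single strand `example` would generally not satisfy these equalities, which is why Chargaff’s rules describe double-stranded DNA.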

These astute observations allowed Watson and Crick to assemble a three-dimensional structure of the DNA molecule that accounted for all of these essential features. The structure was published in 1953 and immediately generated much excitement, culminating in the 1962 Nobel Prize in Physiology or Medicine, awarded to Watson, Crick, and Wilkins. (Franklin had died in 1958, and the Nobel Prize is not awarded posthumously.)

The key elements of this structure are:

• Double helical structure – each helix is made from alternating deoxyribose sugar and phosphate groups derived from deoxynucleotides, the monomeric units used to build polymeric nucleic acid molecules. Each nucleotide in each chain consists of a nitrogenous base, either a purine (adenine or guanine) or a pyrimidine (cytosine or thymine), attached to the 1’-position of a 2’-deoxyribose sugar, and a phosphate group esterified by a phosphoester bond to the 5’-position of the sugar.

Figure 3.3. Watson & Crick’s DNA structure. Their model consisted of a double helical structure with the sugars and phosphates forming the two helices on the outside of the structure. The sugars were linked by 3’-5’ phosphodiester bonds. The bases pair on the inside of the molecule, with A always pairing with T, and G always pairing with C. This pairing accounts for Chargaff’s observations about the bases in DNA.

Figure 3.4. Structures of the purine and pyrimidine bases in DNA, and structure of the 2’-deoxyribose sugar.

• The nucleotides are held together in sequence order along the length of the polynucleotide chain by 3’-5’ phosphodiester bonds, and each strand has a polarity: the 5’ end of a polynucleotide strand is chemically distinct from the 3’-OH at the other end. Often, but not always, the 5’ end of a strand has a phosphate group attached.

Figure 3.5. The building blocks of nucleic acids are nucleotides and nucleosides. Any base together with a deoxyribose sugar forms a deoxyribonucleoside, while if the sugar is ribose, a ribonucleoside is formed (not shown). Addition of a phosphate at the 5’ position of the sugar forms a nucleotide from a nucleoside.

• The two polynucleotide chains of the double helix are held together by hydrogen bonds between adenine (A) in one strand and thymine (T) in the other strand, and between guanine (G) in one strand and cytosine (C) in the other strand.
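The pairing rules in these bullets, together with the hydrogen-bond counts given in the Figure 3.6 caption (two per A-T pair, three per G-C pair), can be written as a few lines of Python. This is an illustrative sketch: the example sequences are made up, the function names are hypothetical, and a hydrogen-bond count is only a rough proxy for how strongly two strands are held together.

```python
# Watson-Crick pairing partners and hydrogen bonds formed by each base in its pair
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Partner base for each position, written in the same left-to-right order.

    (The partner strand runs antiparallel, so read 5' -> 3' it is the reverse of this string.)
    """
    return "".join(PAIR[b] for b in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding `strand` to its complementary strand."""
    return sum(H_BONDS[b] for b in strand.upper())


at_rich = "ATTATAAT"  # hypothetical A-T-rich stretch
gc_rich = "GCCGGCGC"  # hypothetical G-C-rich stretch of the same length
print(complement(at_rich))      # TAATATTA
print(hydrogen_bonds(at_rich))  # 16
print(hydrogen_bonds(gc_rich))  # 24
```

Two stretches of equal length can therefore differ by half again in the number of hydrogen bonds between their strands, depending only on base composition.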

Figure 3.6. Base pairing between A and T involves two hydrogen bonds, and pairing between G and C involves three hydrogen bonds. This means that the forces holding the strands together in regions rich in G-C base pairs are stronger than in regions rich in A-T base pairs.

Figure 3.7. A strand of DNA showing the 3’,5’-phosphodiester bonds holding the nucleotides together.

To give the molecule a uniform diameter, each base pair must contain one purine and one pyrimidine; to align the paired nucleotides properly in the middle of the helix, the strands must be oriented in antiparallel fashion, i.e., with the polarity of each strand of the double helix running in the opposite direction (one strand runs 5’ -> 3’ while the other runs 3’ -> 5’). A truly elegant aspect of this solution is that it produces a spacing of 0.34 nm between adjacent base pairs, with 10 base pairs per complete turn of the helix. This corresponds precisely with Rosalind Franklin’s x-ray diffraction measurements of repeating units of 0.34 nm and 3.4 nm, and with her measurement of 2 nm for the diameter of the double helix.

It is also noteworthy that Watson and Crick pointed out that their structure suggested a clear way for the two strands of the DNA molecule to be duplicated while maintaining the fidelity of the sequence of bases along each chain as DNA is synthesized inside a cell, thus providing a mechanism for faithful transfer of information from one cell generation to the next.
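The antiparallel arrangement and the helix dimensions described above can be illustrated with a short Python sketch. `reverse_complement` gives the partner strand read in its own 5’ -> 3’ direction, and `helix_geometry` converts a length in base pairs into helix length and number of turns using the model values of 0.34 nm per base pair and 10 base pairs per turn. The function names and the example sequence are hypothetical, chosen for illustration.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

RISE_NM_PER_BP = 0.34  # spacing between adjacent base pairs in the Watson-Crick model
BP_PER_TURN = 10       # base pairs per complete helical turn in the model


def reverse_complement(strand_5to3: str) -> str:
    """Return the partner strand, also written 5' -> 3'.

    Because the strands are antiparallel, the partner of the 3'-most base of
    `strand_5to3` is the 5'-most base of the partner strand, so the
    base-by-base complement must be reversed.
    """
    return "".join(PAIR[b] for b in reversed(strand_5to3.upper()))


def helix_geometry(n_bp: int) -> tuple[float, float]:
    """Return (length in nm, number of helical turns) for n_bp base pairs."""
    return n_bp * RISE_NM_PER_BP, n_bp / BP_PER_TURN


top = "ATGCGT"                   # one strand, written 5' -> 3'
print(reverse_complement(top))   # ACGCAT
length_nm, turns = helix_geometry(10)
print(round(length_nm, 2), turns)  # 3.4 1.0
```

One complete turn of 10 base pairs spans 3.4 nm, the longer of the two repeat distances in Franklin’s diffraction data; the 0.34 nm rise is the shorter one.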

3.3. CHROMOSOME STRUCTURE.

The DNA inside a cell seldom exists as a simple, “naked” DNA molecule. Because DNA molecules are long polymers with an overall negative charge deriving from the phosphate groups in their backbones, positively charged ionic species within cells are attracted to them. These positively charged species can be small ions such as K+ and Mg2+, or larger positively charged proteins and other molecules. These ionic interactions play an important role in the folding and packaging required to keep such a long molecule inside a microscopic cell.

In the case of proteins, positively charged proteins can interact with DNA through general ionic interactions, but they can also interact in sequence-specific ways; i.e., specific proteins bind only to specific sequences of bases in the DNA strand. Thus, the types of molecular interactions that ionic substances, particularly proteins, have with DNA molecules play important roles in determining how the information carried in the DNA molecule is expressed. As we proceed through our study of genomic biology, it will become obvious that such DNA-protein interactions are as critical to describing “genetic information” as are the base sequences of the DNA molecules themselves. This was obvious well before we obtained the first genomic DNA sequences, but it has become even more apparent and significant now that we have the DNA sequences of many genomes. Thus, genomic biology is not merely the study of DNA nucleotide sequences; it also involves the study of the structure of the genetic material, such as chromosomes and chromatin.

3.3.1. Prokaryotic chromosome structure

Most Prokaryotes (e.g., bacteria) have a single, circular chromosome, although some have more than one chromosome, and some have linear rather than circular chromosomes. The best-studied bacterium, Escherichia coli, has a single circular chromosome that can exist in either a relaxed or a supercoiled state.

Supercoiling involves breaking one of the two strands of the circular helix, rotating the broken ends either in the direction of the helix (+ supercoil) or in the opposite direction (- supercoil), and then rejoining them. As supercoiling is added, the DNA molecule becomes more “tightly” coiled (see Figure 3.8) and can therefore be compacted more easily. This permits the packaging of the large DNA molecule into the relatively small cell in which it must exist and function.
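Supercoiling is commonly quantified with the linking number (Lk), the number of times one strand winds around the other in a closed circular molecule; this measure is not introduced in the text above, so treat the following as a supplementary sketch. A relaxed circle has Lk0 = N / (base pairs per turn), and the superhelical density σ = (Lk − Lk0) / Lk0 is positive for + supercoils (overwound) and negative for − supercoils (underwound). The sketch assumes the model value of 10 base pairs per turn; the function name and the 5,000-bp circle are hypothetical.

```python
def superhelical_density(n_bp: int, linking_number: float, bp_per_turn: float = 10.0) -> float:
    """Superhelical density sigma = (Lk - Lk0) / Lk0 for a closed circular DNA.

    Lk0 is the linking number of the relaxed circle (one link per helical turn).
    sigma > 0: positively supercoiled (overwound); sigma < 0: negatively supercoiled.
    The default of 10 bp per turn is an assumption taken from the helix model.
    """
    lk0 = n_bp / bp_per_turn
    return (linking_number - lk0) / lk0


# Hypothetical 5,000-bp circle, whose relaxed linking number is 500
print(superhelical_density(5000, 500))  # 0.0   (relaxed)
print(superhelical_density(5000, 475))  # -0.05 (negatively supercoiled)
print(superhelical_density(5000, 510))  # 0.02  (positively supercoiled)
```

Changing the linking number requires breaking and rejoining a strand, which is why supercoiling is described above as cutting, rotating, and resealing the DNA.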

Figure 3.9. Diagram of DNA organizational structure in prokaryotes. Supercoiled DNA is looped and attached to scaffold proteins.

Figure 3.8. An E. coli cell lysed open, showing the expanse of its DNA molecule (left). Note that this entire molecule must be folded and packaged inside the cell in the picture. On the right are two electron micrographs showing circular DNA molecules in either a relaxed (top) or supercoiled (bottom) state.

Additional packaging results from the supercoiled DNA being carefully looped onto a scaffold of proteins, leading to an organized intracellular structure that remains easily accessible but also keeps the DNA from twisting and being damaged during normal cellular processes.

3.3.2. Eukaryotic chromosome structure

In general, Eukaryotes have much larger genomes than do Archaea and other Prokaryotes. However, this relationship between genome size and organismal complexity does not hold as well among species within the Eukaryota. This lack of correlation between organismal complexity and genome size (the haploid DNA content, called the C-value) is referred to as the C-value paradox (Table 3.1). The C-value paradox results from great variation in the nature of DNA in different Eukaryotes. Some eukaryotes contain substantial amounts of DNA that appears to have limited or at […] have a gene density in their genomes resembling the Prokaryotes (e.g., the yeasts and malarial parasite in Table 3.1). The majority of Eukaryotes fall somewhere between these extremes but are highly variable in their DNA contents. For now, we need to appreciate that this variation in DNA content and type appears to be related to chromosome structure. The nature of this relationship will be considered further once we learn more about DNA sequencing and examine fully sequenced genomes.

Figure 3.10. Electron micrograph showing the structure of Eukaryotic DNA. The DNA molecule is barely visible, but it connects the beads of protein that the DNA wraps around, creating the appearance of beads on a string.

In eukaryotes, there are multiple levels of chromosomal organization that we will need to consider. Observations using powerful electron microscopes demonstrated that in Eukaryotes, the DNA molecules in chromosomes are organized like beads on a string. These structures have subsequently been named nucleosomes. Investigation of nucleosomes has shown that they are made from several types of basic (positively charged) proteins found in cells, called histone proteins.

The basic nucleosome core consists of eight histone proteins, two each of H2A, H2B, H3, and H4. DNA wraps around this core, producing the bead-like appearance observed in the electron microscope. Once nucleosomes are formed, they can condense or decondense depending on their interaction with another histone, histone H1.

During prophase of mitosis or meiosis, the nucleosome structure of chromatin condenses further into a so-called solenoid structure, approximately 30 nm in diameter. This solenoid form is not visible in a light microscope but can be viewed in an electron microscope. This appears to be the form DNA assumes when chromosomes condense during mitosis, but in this form the DNA is not as accessible for use by the cell as it is during interphase, when the chromatin is decondensed.

Figure 3.11. Nucleosomes are formed when DNA wraps around a histone complex. Nucleosomes can exist in either a more condensed or a decondensed state, depending on the state of the genetic material in a cell.

Figure 3.13. Loop-folding of the 30 nm solenoid structure yields packaged DNA that is visible in a light microscope in each Eukaryotic chromosome.

Figure 3.12. Condensation of chromatin leads to the careful packaging of DNA into so-called solenoid structures. These structures ultimately form chromosomes.

The solenoid structures are subsequently looped and fastened to chromosome scaffold proteins, generating a structure, visible in a light microscope, that we know as a chromosome. While this may seem like an elaborate structure involving several sets of structural proteins, such a structure appears to be required for the appropriate assembly and assortment of the genetic material during mitosis in the cell cycle. Without this structural organization, cellular DNA would likely become a hopeless tangle, and cellular reproduction would be severely hampered, likely requiring too much time and effort to be successful.

3.3.3. Heterochromatin & Euchromatin.

The cell cycle affects the packing of DNA into chromatin: chromatin condenses for mitosis and meiosis and then decondenses during interphase, being most dispersed during S-phase. However, cytogeneticists have observed two differently staining forms of chromatin, called euchromatin and heterochromatin. Euchromatin condenses and decondenses with the cell cycle. Euchromatin accounts for most of the active genome in dividing cells and bears most of the protein-coding DNA sequences. Heterochromatin remains condensed throughout the cell cycle and is believed to be relatively inactive. There are two types of heterochromatin based on activity: constitutive heterochromatin, which is tightly condensed in virtually all cell types, and facultative heterochromatin, which varies between cell types and/or developmental stages.

Other methods of characterizing DNA suggest that some DNA sequences occur in many copies in the genome. A given sequence may occur only once in the genome, or it may occur tens of thousands of times or more. Sequences can be categorized as:

• Unique-sequence DNA, present in one or a few copies per genome.
• Moderately repetitive DNA, present in a few to 10⁵ copies per genome.
• Highly repetitive DNA, present in about 10⁵–10⁷ copies per genome.

Observations about repetitive DNA sequences like those described above have been made for decades, and it was initially shown that Prokaryotic DNA was mostly unique-sequence DNA, with little or no repetitive sequence. Eukaryotes, however, have a mix of unique and repetitive DNA sequence types.

• Unique-sequence DNA includes most of the genes that encode proteins, and euchromatin is rich in unique-sequence DNA.
• Repetitive-sequence DNA includes the moderately and highly repeated sequences. These may be dispersed throughout the genome or clustered in tandem repeats. Heterochromatin is rich in moderately and highly repetitive DNA.
• Human DNA contains about 65% unique sequences, while unique-sequence DNA makes up a much lower percentage of the genome in organisms that have the unexpectedly large genomes (C-values) discussed earlier in this section.

3.4. DNA REPLICATION.

As Watson and Crick were solving the structure of DNA, they realized the general mechanism by which the molecule could be copied while maintaining fidelity. From that beginning, understanding how a cell duplicates its DNA molecules became a subject of intense investigation and led to a number of Nobel Prize awards. Moreover, understanding DNA replication was critical to the development of the technologies needed for molecular genetics and, ultimately, genomic biology research.

3.4.1. DNA Replication is semiconservative.

Among the earliest experiments on how DNA replicates were the studies of Matthew Meselson and Frank Stahl. Meselson, while a Ph.D. student, designed an experiment that used so-called “heavy” isotopes of nitrogen. Isotopes of an element consist of atoms having the same number of protons but different numbers of neutrons; heavy isotopes have more neutrons than the most common form. For example, nitrogen normally has 7 protons and 7 neutrons, giving it an atomic mass of 14 (written 14N), but it is possible to find atoms with 7 protons and 8 neutrons, having an atomic mass of 15 (written 15N). It turns out that if bacterial cells are grown on a 15N-enriched nitrogen source, the DNA molecules purified from those cells have a greater density (they are heavier). By transferring such cells to a medium containing only 14N, purifying DNA after each round of DNA replication, and determining the density of the newly made DNA molecules using density gradient centrifugation, Meselson and Stahl were able to show that the first round of DNA synthesis produced molecules with a density intermediate (“hybrid”) between light and heavy DNA, while a subsequent round of replication produced light and hybrid molecules. Such a pattern of 15N labeling was consistent only with the semiconservative replication of DNA.

Figure 3.14. Diagram showing the predicted outcomes of conservative, semiconservative, and dispersive DNA replication. Original strands are shown in red, while newly made DNA is shown in blue.

3.4.2. DNA Polymerases.

The enzyme that replicates the DNA double helix is called DNA polymerase. The enzyme is difficult to work with because only a few copies of it are needed per cell, and it is required only in S-phase of the cell cycle. In spite of these limitations, Arthur Kornberg won the Nobel Prize in 1959 for the first purification and characterization of an enzyme that makes DNA. Kornberg’s enzyme was purified from the bacterium E. coli, and besides the enzyme, four additional components were required to make DNA in a test tube. These included a template DNA (Kornberg used E. coli DNA) and the four deoxynucleoside triphosphates (dNTPs), i.e., dATP, dGTP, dCTP, and dTTP. Note that these are the deoxy NTPs, not the ribose-containing NTPs. The remaining requirements for DNA polymerase are magnesium ion (Mg2+) and a primer, a short single strand of DNA.

The primer pairs with the template to form a short double-stranded region of DNA. DNA polymerase then adds nucleotides to the free 3’-end of this primer; without the primer, DNA polymerase is unable to make a DNA strand. Nucleotides are added one at a time to the growing 3’-end, so the new strand is synthesized in the 5’ -> 3’ direction according to the sequence of the corresponding strand being copied. This copied strand is referred to as the template strand.

Figure 3.15. Note that the template strand is read from its 3’-end to its 5’-end, while the antiparallel new DNA strand is made from its 5’-end to its 3’-end.

All DNA polymerases studied to date make DNA using the general principles established for Kornberg’s enzyme, but there are significant differences between them in other respects. For example, in E. coli there are five different DNA polymerases. Kornberg’s enzyme is now known as DNA polymerase I, but there are also DNA polymerases II, III, IV, and V. DNA polymerases II, IV, and V are not involved in the DNA replication process; they have specialized functions in repairing damaged DNA under specific circumstances. DNA polymerases I and III are the DNA polymerases involved in the replication of cellular DNA. Both of these DNA polymerases contain a 3’ -> 5’ exonuclease activity that proofreads the newly made DNA strand and removes any mistakes. Only DNA polymerase I has a 5’ -> 3’ exonuclease activity; we will return to this function below when the role of DNA polymerase I in DNA replication is considered.

3.4.3. Initiation of replication.

Replication initiates at a specific sequence in the genome, often called an origin of replication. E. coli has one origin, called oriC, where replication starts when the strands of the helix are forced apart to expose the bases, creating a replication bubble with two replication forks. Replication is usually bidirectional from the origin, with the two forks enlarging the bubble in both directions. E. coli’s origin, oriC, has the following properties:

• A minimal sequence of about 245 bp required for initiation.
• Three copies of a 13-bp AT-rich sequence.
• Four copies of a 9-bp sequence.

Figure 3.16. Initiation of DNA replication in E. coli at oriC. Note the 9- and 13-bp repeats where DnaA binds and activates replication through the action of DNA primase.

From a series of in vitro studies in E. coli, it has been shown that the following steps are involved in initiating replication:

1) Initiator proteins attach to oriC (E. coli’s initiator protein is the DnaA protein, encoded by the dnaA gene).
2) DNA helicase (from the dnaB gene) binds the initiator proteins on the DNA and denatures the AT-rich 13-bp region, using ATP as an energy source.
3) DNA primase (from the dnaG gene) binds helicase to form a primosome, which synthesizes a short (5–10 nt) RNA primer.

3.4.4. DNA Replication is Semidiscontinuous

When DNA denatures (the strands separate) at the origin, replication forks are formed. DNA replication is usually bidirectional, but we will consider events at just one replication fork; don’t forget that a similar set of events is occurring at the other replication fork in the bubble. The events occurring at each fork are:

1) Single-strand DNA-binding proteins (SSBs) bind the single-stranded DNA formed by helicase, preventing reannealing.
2) Primase synthesizes a primer on each template strand.
3) DNA polymerase III adds nucleotides to the 3’-end of the primer, synthesizing a new strand complementary to the template and displacing the SSBs. DNA is made in opposite directions (at each fork) on the two template strands, since DNA polymerase only adds nucleotides to a free 3’-end.
4) On one template strand at each fork, the new strand is made 5’-to-3’ in the same direction as the movement of the replication fork, i.e., DNA polymerase III moves continuously toward the fork. This new strand is called the “leading strand.” On the other template strand, the new strand must be made in the opposite direction, because it must also be made 5’ -> 3’.
5) This means that on this “lagging strand,” primase must add RNA primers very close to the replication fork, and DNA polymerase III moves away from the fork rather than toward it, as it does on the leading strand.
6) The leading strand needs only one primer, and its new DNA strand is made continuously, while the lagging strand requires a series of RNA primers, and only a limited number of nucleotides are added by DNA polymerase III before the previously made fragment is encountered.
7) Thus, the leading strand is synthesized continuously, while the lagging strand is synthesized discontinuously in the form of shorter pieces of DNA with interspersed RNA primers, called Okazaki fragments. DNA replication is therefore semidiscontinuous.
8) As the bubble enlarges and DNA helicase denatures (untwists) the strands, tighter winding builds up in other parts of the circular chromosome. A protein called DNA gyrase relieves the tension created in the molecule.
9) As Okazaki fragments accumulate on the lagging strand, DNA polymerase I binds, and its 5’ -> 3’ exonuclease activity removes the RNA primers and replaces them with DNA nucleotides.

Figure 3.17. DNA replication at a replication fork, showing continuous DNA synthesis on the lower strand and discontinuous DNA synthesis on the upper strand, where Okazaki fragments are formed.

Figure 3.18. Removal of the RNA primers by the 5’ -> 3’ exonuclease of DNA polymerase I, and their replacement with DNA nucleotides on the lagging strand.


10) The DNA fragments, now lacking RNA primers, are joined together by an enzyme called DNA ligase, which closes the remaining gaps on the lagging strand.

Figure 3.19. DNA ligase joins an opening in a DNA strand, remaking a complete phosphodiester-linked polynucleotide chain.

3.4.5. DNA replication in Eukaryotes.

Eukaryotic DNA replication is not as well characterized as its prokaryotic counterpart. Fifteen DNA polymerases are known in mammalian cells, for example. Three DNA polymerases are used to replicate nuclear DNA. Pol α extends the 10-nt RNA primer by about 30 nt. Pol δ and Pol ε extend the RNA/DNA primers, one on the leading strand and the other on the lagging strand; which enzyme synthesizes which strand has been debated, although the primer-removal step described next places Pol δ on the lagging strand.

Primer removal differs from that in prokaryotes. Pol δ continues extending the newer Okazaki fragment, displacing the RNA primer ahead of it and producing a flap that is removed by nucleases, thus allowing the Okazaki fragments to be joined by DNA ligase. Other DNA polymerases replicate mitochondrial or chloroplast DNA, or are used in DNA repair. In other respects, these processes are similar to the prokaryotic system described in detail above.

3.4.6. Replicating ends of chromosomes.

Replicating the ends of chromosomes in organisms without circular chromosomes presents unique problems. Removal of the primer at the 5’-end of the newly made strand leaves a gap that cannot be filled by existing DNA polymerases, because they can only add nucleotides to a free 3’-end; if this gap were not addressed, chromosomes would become shorter each time DNA replicates. Thus, a separate mechanism is required to complete the ends of the chromosome. This is accomplished by the telomerase system.

Most eukaryotic chromosomes have short, species-specific sequences tandemly repeated at their telomeres. It has been shown that chromosome lengths are maintained by telomerase, which adds telomere repeats without using the cell’s regular replication machinery. In humans, the telomere repeat sequence is 5’-TTAGGG-3’.

Telomerase, an enzyme containing both protein and RNA, includes an 11-nucleotide RNA sequence used as the template to synthesize new telomere repeat DNA. Because it uses an RNA template to make DNA, the catalytic protein of telomerase functions as a reverse transcriptase, called TERT (telomerase reverse transcriptase). The 3’-end of the telomerase RNA template contains the sequence 3’-CAAUC-5’, which pairs with the 5’-GTTAG-3’ end of the overhang on the chromosome, positioning telomerase to copy the rest of its template and complete another telomere repeat. Additional rounds of telomerase activity lengthen the chromosome by adding further telomere repeats. The ends of telomere DNA usually loop back to form a D-loop.

After telomerase adds telomere sequences, chromosomal replication proceeds in the usual way. Any shortening of the chromosome ends is compensated for by the addition of telomere repeats. Telomere length may vary, but organisms and cell types have characteristic telomere lengths, resulting from many levels of regulation of telomerase. Mutants affecting telomere length have been identified, and the data suggest that shortening of telomeres eventually leads to cell death. Loss of telomerase activity results in a limited number of rounds of cell division before the cells die.

Figure 3.20. The dilemma of how the 3’ overhangs are replicated at each end of the chromosome to duplicate a chromosome and make sister chromatids.
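The align-then-extend cycle of telomerase can be sketched in Python. The sketch assumes the human telomerase RNA template region commonly given as 5’-CUAACCCUAAC-3’ (11 nucleotides; the full sequence is not stated in the text above), written here 3’ -> 5’ so that it lines up base-for-base with DNA written 5’ -> 3’. The 5-nucleotide alignment length, the function name, and the example overhang are illustrative assumptions, and the model ignores everything except base pairing.

```python
# Assumed human telomerase RNA template, written 3' -> 5' (i.e. 5'-CUAACCCUAAC-3' reversed)
TEMPLATE_3TO5 = "CAAUCCCAAUC"
ALIGN_LEN = 5  # 3'-most template bases (3'-CAAUC-5') that pair with the DNA end 5'-GTTAG-3'

# DNA nucleotide added opposite each RNA template base
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def telomerase_cycle(overhang_5to3: str) -> str:
    """One round of telomerase action on a G-rich 3' overhang written 5' -> 3'.

    1. Align: the 3' end of the overhang must pair with the alignment region.
    2. Extend: copy the remaining template bases onto the 3' end of the DNA.
    """
    expected_end = "".join(RNA_TO_DNA[b] for b in TEMPLATE_3TO5[:ALIGN_LEN])
    if not overhang_5to3.endswith(expected_end):
        raise ValueError("overhang end does not pair with the template alignment region")
    added = "".join(RNA_TO_DNA[b] for b in TEMPLATE_3TO5[ALIGN_LEN:])
    return overhang_5to3 + added


end = "TTAGGGTTAG"   # hypothetical overhang ending in 5'-GTTAG-3'
for _ in range(2):   # each cycle realigns and adds six nucleotides (one repeat's worth)
    end = telomerase_cycle(end)
print(end)                  # TTAGGGTTAGGGTTAGGGTTAG
print(end.count("TTAGGG"))  # 3
```

After each extension the DNA again ends in 5’-GTTAG-3’, so the enzyme can translocate and repeat the cycle, which is how repeated rounds lengthen the chromosome end.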

Figure 3.21. Replication of chromosome ends using telomerase.
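The logic of the Meselson and Stahl experiment in 3.4.1 can be reproduced by simulating the three replication models of Figure 3.14. This Python sketch is an idealized model under stated assumptions: every molecule replicates exactly once per generation in 14N medium, and the “dispersive” model spreads parental material evenly over all new strands. It reports the fraction of molecules at each density, with density expressed as the fraction of 15N in a duplex. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# A duplex is a pair of strands; each strand is described by its fraction of 15N
# (1 = fully heavy, 0 = fully light). Idealized model for illustration only.
Duplex = tuple[Fraction, Fraction]
LIGHT = Fraction(0)


def replicate(duplex: Duplex, model: str) -> list[Duplex]:
    """Return the two daughter duplexes made in 14N medium under a given model."""
    a, b = duplex
    if model == "semiconservative":
        # Each parental strand is kept and paired with one new, light strand.
        return [(a, LIGHT), (b, LIGHT)]
    if model == "conservative":
        # The parental duplex stays intact; the daughter duplex is entirely new.
        return [(a, b), (LIGHT, LIGHT)]
    if model == "dispersive":
        # Parental material is spread evenly over all four new strands.
        share = (a + b) / 4
        return [(share, share), (share, share)]
    raise ValueError(f"unknown model: {model}")


def bands(model: str, generations: int) -> dict[Fraction, Fraction]:
    """Fraction of molecules at each density (mean 15N fraction of the duplex)."""
    population: list[Duplex] = [(Fraction(1), Fraction(1))]  # fully 15N-labeled start
    for _ in range(generations):
        population = [d for parent in population for d in replicate(parent, model)]
    counts = Counter((a + b) / 2 for a, b in population)
    total = len(population)
    return {density: Fraction(n, total) for density, n in sorted(counts.items(), reverse=True)}


for model in ("conservative", "semiconservative", "dispersive"):
    for gen in (1, 2):
        summary = {str(d): str(f) for d, f in bands(model, gen).items()}
        print(model, gen, summary)
```

After one generation, semiconservative and dispersive replication both give a single hybrid band, while conservative replication gives separate heavy and light bands. Only the second generation, with equal hybrid and light bands, distinguishes semiconservative from dispersive replication, which is why Meselson and Stahl followed more than one round of replication.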

3.5. TRANSCRIPTION.

In cells, the genetic information carried in the DNA nucleotide sequence becomes functional information that gives cells their characteristics and ultimately specifies traits. This conversion of DNA sequence information into functional information begins with the synthesis of cellular RNAs, using one of the two DNA strands as a template. This process is called transcription. This section presents the mechanism by which cellular RNAs are transcribed from DNA. The regulation of these processes is covered later.

3.5.1. Cellular RNAs are transcribed from DNA

Ribosomal RNAs

The most abundant type of RNA in most cells is ribosomal RNA (rRNA). It is a structural component of the ribosome, the cellular particle that synthesizes proteins. Because ribosomes have two subunits, a large subunit and a small subunit, they also have two major types of ribosomal RNA. These are described in detail in Table 3.2. In addition to the largest ribosomal RNAs, there are smaller ribosomal RNAs as well. Note that the sizes and nature of these ribosomal RNAs differ between Prokaryotes and Eukaryotes.

In prokaryotes, the small 30S ribosomal subunit contains the 16S ribosomal RNA. The large 50S ribosomal subunit contains two rRNA species, the 5S and 23S ribosomal RNAs. Bacterial 16S, 23S, and 5S rRNA genes are typically organized as a co-transcribed unit (an rRNA operon). There may be one or more copies of the operon dispersed in the genome; for example, Escherichia coli has seven. Archaea contain either a single rDNA operon or multiple copies of the operon.

In Eukaryotes, the cytoplasmic small ribosomal subunit (40S) contains an 18S rRNA, while the large ribosomal subunit (60S) contains a 28S, a 5S, and a 5.8S rRNA. As in Prokaryotes, these rRNAs are structural components of ribosomes, where they perform essential functions. In mammals, the 28S, 5.8S, and 18S rRNAs are encoded by a single nuclear transcription unit (45S). Two internal transcribed spacers separate the three rRNA species in the 45S transcript. Generally, many copies of the 45S rDNA are organized in clusters throughout the nuclear genome; in humans, for example, each cluster has 300-400 repeats. 5S rDNA is not transcribed as part of the 45S unit. Instead, it occurs in tandem arrays (~200-300 5S genes) in the mammalian genome, independently of the 45S rDNA genes.

Mammalian mitochondria have only two mitochondrial rRNA molecules (12S and 16S) and do not contain 5S rRNA. These ribosomal RNAs are transcribed from the mitochondrial genome. The same is true of plant mitochondrial rRNAs, although plant mitochondria contain more prokaryote-like ribosomal RNAs, i.e. an 18S, a 26S, and a 5S rRNA. Plants also contain chloroplast ribosomal RNAs (16S, 23S, and 5S), which are transcribed from the chloroplast genome.

Messenger RNAs – mRNAs

All organisms (and mitochondria and chloroplasts) produce a type of RNA that codes for the amino acid sequence of proteins. This RNA is a copy of the DNA sequence of the gene and is transcribed from one of the two DNA strands of each gene. Because the mRNA reproduces the DNA sequence, the sequence information of the gene is faithfully maintained. Many mRNA "copies" of a gene can be made, and each can be used to produce many protein molecules.

Transfer RNAs – tRNAs

Transfer RNAs (tRNAs) are small (~90 nt) RNA molecules. They are transcribed from genes scattered throughout both Prokaryotic and Eukaryotic genomes, including mitochondrial and chloroplast genomes. tRNAs are the "decoding" molecules that place amino acids into proteins in the order specified by the nucleotide sequence of the mRNA. They are highly structured RNA molecules, and there is at least one tRNA, and often several, for each of the twenty amino acids found in proteins. Each tRNA is processed from a precursor-tRNA transcribed from a specific tRNA gene, and typically one tRNA is produced per tRNA gene. In Eukaryotes, tRNA genes are scattered across all chromosomes, and each organelle genome carries its own separate set of tRNA genes.

Other Non-protein-coding Transcribed RNAs

More recently, additional types of RNAs that perform vital functions in cells have been described. Most of these were characterized in Eukaryotes once eukaryotic genomes had been sequenced and characterized.

Small nuclear RNAs (snRNAs) are small RNAs (typically ~150 nt) transcribed from nuclear DNA in eukaryotic cells. snRNAs are structural parts of small nuclear ribonucleoprotein particles (snRNPs), which process mRNAs in the nucleus. Typically only a handful of different snRNAs are made in each species, and these are highly conserved among eukaryotes.

Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that guide the modification of other types of RNA, mostly rRNA, tRNA, and snRNA. One of the main functions of snoRNAs is to modify the 45S ribosomal precursor so that it can be further processed to generate the 18S, 5.8S, and 28S rRNAs.

Small regulatory RNAs (srRNAs) are found in prokaryotes, where they help regulate gene expression. However, they are best known for their roles in transcriptional, posttranscriptional, and translational control in Eukaryotes. These molecules are an array of 20-30 nt RNAs transcribed in various ways from genes in the genomes of organisms. There are two main types of srRNAs, microRNAs (miRNAs) and short interfering RNAs (siRNAs). Individual srRNAs are often specific to certain organisms, and there are likely thousands of genes transcribed to produce such srRNAs.

3.5.2. RNA polymerases catalyze transcription

RNA polymerase is the enzyme that copies a DNA sequence into an RNA sequence during transcription. RNA polymerase is a complex molecule composed of protein subunits, and it controls the process of transcription, in which the information stored in a molecule of DNA is copied into a molecule of cellular RNA. The detailed mechanism of how RNA polymerase works is shown in Figure 3.22.

Figure 3.22. The chemical reaction catalyzed by RNA polymerases, showing the reactants, the products, and the specificity of base-pair addition. Note the antiparallel orientation of the RNA strand relative to the DNA strand being transcribed. RNA polymerase makes a phosphodiester bond between the innermost (α) phosphate of the incoming ribonucleoside triphosphate, the one closest to the ribose sugar, and the 3’-OH on the 3’-end of the growing RNA strand.

Multisubunit RNA polymerases exist in all species, but the number and composition of these proteins vary across taxa. For instance, bacteria contain a single type of RNA polymerase that transcribes mRNA, tRNA, and all rRNAs. Eukaryotes, in contrast, contain three (animals and fungi) to five (plants) distinct types of RNA polymerase. Each of these RNA polymerases transcribes different species of RNA, as shown in Table 3.3.

In spite of these differences, the transcriptional mechanisms of all RNA polymerases are strikingly similar. For example, transcription is divided into three steps in both bacteria and eukaryotes: initiation, elongation, and termination. Elongation is highly conserved between bacteria and eukaryotes, but initiation and termination are somewhat different. All species require a mechanism for regulating transcription to achieve spatial and temporal changes in gene expression. These initial regulatory steps during transcription initiation are mediated by proteins that interact with the core RNA polymerase and recognize specific sequences in the DNA. However, the types and nature of these interacting proteins are quite distinct in Prokaryotes and Eukaryotes, so transcription initiation is discussed separately for each below.

3.5.3. Transcription in Prokaryotes

The bacterium Escherichia coli will be used as a model of Prokaryotic gene regulation. This model applies to nearly all Prokaryotes.

A prokaryotic gene is a DNA sequence in the chromosome. The gene has three regions, each with a function in transcription (see Figure 3.23). These are:

1) A promoter sequence that attracts RNA polymerase to begin transcription at a site specified by the promoter. Some genes use one strand of DNA as the template; other genes use the other strand.
2) The transcribed sequence, called the RNA-coding sequence. The sequence of this DNA corresponds to the RNA sequence of the transcript.
3) A terminator region that specifies where transcription will stop.

Figure 3.23. Prokaryotic genes all have regions upstream (toward the 5’-end of the mRNA) and downstream (toward the 3’-end of the mRNA) of the protein-coding sequence. These regions are located at the 3’-end (promoter) and the 5’-end (terminator) of the template strand of DNA. Typically, the nucleotide where RNA polymerase begins transcribing is designated the +1 nucleotide position, and sequences in the promoter are designated by negative (-) nucleotide positions.
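Because the transcript is complementary and antiparallel to the template strand (Figure 3.22), its sequence matches the non-template (coding) strand, with U in place of T. The short Python sketch below checks this equivalence. The function names are hypothetical, and the model ignores promoters, terminators, and RNA polymerase itself.

```python
# Sketch: reading the template strand and reading the coding strand give
# the same RNA sequence.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5_to_3: str) -> str:
    """Template strand (5'->3') is the reverse complement of the coding strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(coding_5_to_3))


def transcribe(template_5_to_3: str) -> str:
    """RNA (5'->3') made from a template strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so the template is walked in
    reverse; each paired nucleotide is added to the 3' end of the RNA.
    """
    return "".join(DNA_TO_RNA[b] for b in reversed(template_5_to_3))


def coding_to_rna(coding_5_to_3: str) -> str:
    """The same transcript read off the non-template strand (T -> U)."""
    return coding_5_to_3.replace("T", "U")


if __name__ == "__main__":
    gene = "ATGGCC"                        # hypothetical coding-strand sequence
    template = template_from_coding(gene)  # "GGCCAT"
    print(transcribe(template), coding_to_rna(gene))  # AUGGCC AUGGCC
```

Both routes give the same RNA, which is why gene sequences are usually written as the coding strand.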

Transcription initiation in E. coli is shown in Figure 3.24. The process involves two DNA sequences in the promoter region of the gene, centered at -35 bp and -10 bp upstream of the +1 transcription start site. In E. coli, the consensus sequences of these two elements are 5’-TTGACA-3’ at the -35 region and 5’-TATAAT-3’ at the -10 region (previously known as the Pribnow box). These sequences can vary with the organism and with the gene within the organism. Transcription initiation requires the RNA polymerase holoenzyme to bind to the promoter DNA sequence; bacteria have only one type of RNA polymerase. The holoenzyme consists of:

1) The core enzyme of RNA polymerase, containing five polypeptides (two alpha, one beta, one beta’ and one omega; written as α2ββ’ω).

2) One of several sigma factors (σ-factors), which binds the core enzyme and confers the ability to recognize specific gene promoters.

The RNA polymerase holoenzyme binds the promoter in two steps that involve the sigma factor (Figure 3.24). First, it loosely binds to the -35 sequence of double-stranded DNA, forming a closed promoter complex (Figure 3.24a). Second, it binds tightly to the -10 sequence (Figure 3.24b), untwisting about 17 bp of DNA at the site. At this point RNA polymerase is in position to begin transcription (open promoter complex).

Figure 3.24. Prokaryotic (E. coli) transcription initiation. a) RNA polymerase holoenzyme is “recruited” to the promoter by a specific σ-factor (sigma factor); b) the strands of the DNA are separated, exposing the template strand for copying; c) nucleotides are polymerized as RNA polymerase moves along the template, and the σ-factor leaves the complex; d) elongation continues, the newly made mRNA exits the enzyme, and the transcription “bubble” moves along the DNA.

Promoters often deviate from the consensus sequences at -35 and -10. The associated genes show different levels of transcription, depending on how well the σ-factor recognizes their sequences. E. coli has several sigma factors with important roles in gene regulation. Each sigma factor can bind a molecule of core RNA polymerase and guide its choice of genes to transcribe, but each has a different affinity for specific promoters. Most E. coli genes have a σ70 promoter, and σ70 is usually the most abundant σ-factor in the cell. σ70 recognizes the sequence TTGACA at -35 and TATAAT at -10. Other sigma factors may be produced in response to changing conditions. Each can bind the core RNA polymerase, enabling the holoenzyme to recognize different promoters. An example is σ32, which is produced in response to heat shock and other forms of stress and recognizes sequences at -39 bp and -15 bp. E. coli has additional sigma factors with various roles (Table 3.4), and other bacterial species also have multiple similar and additional sigma factors.

TABLE 3.4. E. coli σ-factors and their functions
σ70 (rpoD), also called σA: the “housekeeping” or primary sigma factor; transcribes most genes in growing cells. Every cell has a “housekeeping” sigma factor.
σ19 (fecI): the ferric citrate sigma factor; regulates the fec genes for iron transport.
σ24 (rpoE): the extracytoplasmic/extreme heat stress sigma factor.
σ28 (rpoF): the flagellar sigma factor.
σ32 (rpoH): the heat shock sigma factor; turned on when the bacteria are exposed to heat. Because of its higher expression, σ32 binds the polymerase core enzyme with high probability, so other heat shock proteins that enable the cell to survive higher temperatures are expressed. Some of the enzymes expressed upon activation of σ32 are chaperones, proteases, and DNA-repair enzymes.
σ38 (rpoS): the starvation/stationary phase sigma factor.
σ54 (rpoN): the nitrogen-limitation sigma factor.

Many bacterial genes are controlled by regulatory proteins that interact with regulatory sequences near the promoter. There are two classes of regulatory proteins. Activators stimulate transcription by facilitating RNA polymerase activity, and repressors inhibit transcription by decreasing RNA polymerase binding or elongation of RNA.

Once initiation is completed, RNA synthesis begins, and the sigma factor is released and reused for other initiations (Figure 3.24c). The core enzyme completes the transcript, untwisting the DNA helix locally so that a small region denatures. Newly synthesized RNA forms an RNA–DNA hybrid, but most of the transcript is displaced as the DNA helix reforms (Figure 3.24d).

Terminator sequences are used to end transcription. In E. coli there are two types of transcript termination:

1) Rho-independent (ρ-independent) or type I terminators (Figure 3.25, upper) have twofold symmetry (a palindrome) that allows a hairpin loop to form. The palindrome is followed by 4–8 U residues in the transcript. When these sequences are transcribed, they form a stem-loop structure and cause chain termination.
2) Rho-dependent (ρ-dependent) or type II terminators (Figure 3.25, lower) require the protein ρ (Rho) for termination. Rho binds to a C-rich sequence in the RNA upstream of the termination site and moves with the transcript until it encounters a stalled polymerase. It then acts as a helicase, using the energy of ATP hydrolysis to move along the transcript and destabilize the RNA–DNA hybrid at the termination region, terminating transcription.

Figure 3.25. Simplified schematics of the mechanisms of prokaryotic transcriptional termination. In Rho-independent termination, a terminating hairpin forms on the nascent mRNA and interacts with the NusA protein to stimulate release of the transcript from the RNA polymerase complex (top). In Rho-dependent termination, the Rho protein binds at the upstream rut site, translocates down the mRNA, and interacts with the RNA polymerase complex to stimulate release of the transcript (bottom).

3.5.4. Transcription in Prokaryotes – polycistronic mRNAs from operons

We have considered the structure of a prokaryotic gene as having a promoter, a coding region, and a termination region (see Figure 3.23). Often, however, multiple protein-coding regions are under the control of a single promoter. This genetic structure is referred to as an operon, and the mRNA transcribed from an operon can produce multiple peptides. This type of mRNA is referred to as a polycistronic mRNA; it is typical of prokaryotes and of eukaryotic mitochondria and chloroplasts. Thus, the proteins that bind to promoter and regulatory regions to regulate gene expression in prokaryotes control the production of multiple peptides simultaneously. Typically, these peptides are functionally related. Examples are the proteins required to catabolize lactose as a carbon source (see Figure 3.26) and the proteins required to make the amino acid tryptophan (see Figure 3.27).

The lac operon is an example of an inducible operon: its repressor protein cannot bind to the operator and stop transcription when the inducer (lactose) is present. The tryptophan operon, in contrast, is an example of a repressible operon: its repressor protein binds to the operator only in the presence of the effector molecule (tryptophan). Both operons are thus under negative control by a repressor; the lac operon is additionally under positive control by CAP-cAMP, described in the next section. With similar types of regulatory proteins and genes and a similar operon structure, almost any type of gene regulation can be obtained. In addition, the production of polycistronic mRNAs allows proteins for related critical cellular functions to be coordinately regulated.

Figure 3.26. The lac operon in E. coli. Three lactose genes (lacZ, lacY, and lacA) are organized together in a cluster called the lac operon. A shared promoter, operator, and terminator control the coordinated transcription and translation of the lac operon structural genes. A lac repressor gene (lacI) with its own separate promoter is found just outside the lac operon. The lacI gene produces a regulatory protein, the lac repressor, which binds the inducer, lactose (or its derivative allolactose), when it is present in a cell. The lac repressor can also bind to a region of the operon between the lac promoter and the structural genes, referred to as the lac operator (lacO). In the absence of lactose (allolactose), the lac repressor binds tightly to the operator and prevents RNA polymerase from transcribing the polycistronic mRNA. When the inducer binds to the lac repressor, the repressor cannot bind to lacO. RNA polymerase then proceeds to produce the polycistronic mRNA corresponding to the lacZ, lacY, and lacA genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. All rights reserved.

3.5.5. Beyond Operons - Modification of expression of prokaryotic genes

Additional regulation of operons is often used to fine-tune transcription further, and this regulation varies from operon to operon in Prokaryotic genomes. A common type of additional regulation has been shown for the lac operon and many other catabolic operons. Glucose is the preferred carbon source in E. coli; in the presence of glucose, lactose will not be utilized.
This means that if abundant supplies of both glucose and lactose are available, the lac operon will not be induced until the glucose is used up. This phenomenon is referred to as catabolite repression. The critical components of catabolite repression in the lac operon are shown in Figure 3.28.

When the glucose concentration is low (Figure 3.28, upper panel), levels of the signal molecule cAMP are high, and cAMP binds to the CAP protein. The CAP-cAMP complex enhances the association between RNA polymerase and promoter DNA. Enhanced RNA polymerase binding leads to a high rate of transcription of the lac operon polycistronic mRNA (provided that the operator is free) and, in turn, to a high rate of its translation. The resulting mRNA transcripts are translated into the enzymes beta-galactosidase, permease, and transacetylase. These enzymes take up lactose and break it down into glucose and galactose, and the galactose can subsequently be converted into glucose.

When the glucose concentration in the cell is high (Figure 3.28, lower panel), the low concentration of cAMP reduces binding of cAMP to CAP. The CAP-cAMP complex is therefore not bound to the bacterial DNA, and as a result RNA polymerase binds the promoter less efficiently. This lowers the rate of transcription and decreases production of the polycistronic mRNA for the lacZ, lacY, and lacA genes. With fewer of these proteins, less glucose is produced from lactose, and the available glucose is used before any lactose. The interaction of CAP with DNA and with cAMP thus directly regulates the production of mRNA. Similar interactions of proteins with regulatory regions in the DNA mediate catabolite repression in other operons associated with carbon source utilization in prokaryotes.

Figure 3.27. The tryptophan operon of E. coli consists of five structural genes (trpE, trpD, trpC, trpB, and trpA) with a common promoter, operator, and terminator. A separate promoter controls expression of the trpR regulatory protein (trp repressor). Transcription of the trp operon produces a polycistronic mRNA containing a leader peptide and the coding sequences of the five structural genes, whose products make up the enzymes required to make tryptophan. Because tryptophan is an amino acid required for cell growth, the trp operon is “repressed” when cells have access to an abundant supply of tryptophan (panel A). It becomes “derepressed” when cells are starving for tryptophan (panel B). A) Tryptophan present, repressor bound to operator, operon repressed. When complexed with tryptophan, the repressor protein binds tightly to the trp operator, preventing RNA polymerase from transcribing the operon structural genes. B) Tryptophan absent, repressor not bound to operator, operon derepressed. In the absence of tryptophan, the free trp repressor cannot bind to the operator site. RNA polymerase can therefore move past the operator and transcribe the trp operon structural genes, allowing the cell to synthesize tryptophan.

In anabolic operons (typical of amino acid synthesis), an additional form of regulation referred to as attenuation has been documented. The most commonly considered example involves the trp operon discussed above.
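The two layers of lac operon control described in this section can be summarized as a small decision function. The first layer is the lac repressor; the second is catabolite repression through CAP-cAMP. This is a qualitative sketch under simplifying assumptions: lactose and glucose are treated as simply present or absent, and the output labels "off", "low", and "high" are chosen here rather than being measured rates.

```python
def lac_transcription(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon transcription level (hypothetical labels)."""
    # Without inducer (allolactose), the lac repressor sits on lacO and
    # blocks RNA polymerase regardless of CAP-cAMP.
    if not lactose:
        return "off"
    # Low glucose -> high cAMP -> CAP-cAMP binds and enhances RNA
    # polymerase binding to the promoter.
    if not glucose:
        return "high"
    # Operator free, but high glucose keeps cAMP low: little CAP-cAMP.
    return "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_transcription(lactose, glucose)
            print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

The repressor check comes first because, as the text notes, CAP-cAMP raises the transcription rate only when the operator is free.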

Figure 3.28. Diagram showing the major effects of low glucose (upper panel) and high glucose (lower panel) on the expression of lac operon genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. (New York: W. H. Freeman and Company), 446. All rights reserved.

The leader sequence in the polycistronic mRNA of the trp operon contains several trp codons and can form three different stem-loop structures. Depending on the amount of available tryptophan, one of two alternative structures is produced (Figure 3.29). When trp is abundant, one structure forms and terminates transcription in the leader sequence. The other structure does not terminate transcription, and the polycistronic mRNA is produced. Several amino acid biosynthetic operons (e.g. phenylalanine, histidine, leucine, threonine, and isoleucine-valine) show this same type of attenuation. The mechanism is therefore relatively widespread as a means of modulating and fine-tuning pathways for amino acid biosynthesis.

Figure 3.29. Attenuation of the trp operon. The diagram at the center shows the general folding of the leader sequence of the trp polycistronic mRNA and the labeling of its strands. The mRNA is folded into four parallel strands. They are connected at the bottom by two small hairpin loops, between strands 1 and 2 and between strands 3 and 4, and at the top by one large hairpin loop between strands 2 and 3. In the structure on the left, strands 1 and 2 and strands 3 and 4 are stabilized by base pairing; this RNA structure terminates transcription of the trp operon in the presence of high tryptophan. In the structure on the right, strands 2 and 3 are stabilized by base pairing instead, which allows transcription of the trp operon to continue in the presence of low tryptophan. © 1981 Nature Publishing Group Yanofsky, C. Attenuation in the control of expression of bacterial operons. Nature 289, 753 (1981). All rights reserved.

3.5.6. Transcription in Eukaryotes

Transcription in Eukaryotes follows the general principles outlined above for Prokaryotes, but many specific details are different. Recall that there are as many as five Eukaryotic RNA polymerases. Each transcribes different types of RNA, but all are multisubunit RNA polymerases that function in related ways. The mechanism of the important RNA polymerase II, which produces mRNAs, is described here. All five have similar mechanisms for initiation, elongation, and termination of transcription.

Eukaryotic mRNAs are nearly always monocistronic, with the general structure shown in Figure 3.30. The key transcribed features are a 5’-UTR (untranslated region), a coding region, and a 3’-UTR. Eukaryotic mRNAs also typically carry two features that are not copied from the DNA template, a 5’-cap structure and a poly-A tail. These are described in more detail below.

Figure 3.30. Diagram showing the elements and structure of a typical eukaryotic mRNA-producing gene. The gene has enhancers and a promoter upstream, and three exons separated by two introns. Transcription of the gene by RNA polymerase II produces a primary transcript. In the nucleus, this transcript is first modified by the addition of a 7-methyl guanosine cap (5’-cap) and a poly-A tail, giving the pre-mRNA. Introns are then spliced from the transcript to make a finished mRNA (5’ UTR, protein-coding region, 3’ UTR) that is ready to exit the nucleus.

Promoters in many Eukaryotes have been analyzed either by directed mutagenesis within promoter sequences or by comparative analysis of multiple genes from different organisms. These studies have revealed two types of elements in Eukaryotic promoters: core promoter elements and promoter-proximal elements.

Core promoter elements are located near the transcription start site and specify where transcription begins. Examples include:
1) The initiator (Inr), a pyrimidine-rich sequence that spans the transcription start site;
2) The TATA box (also known as a TATA element or Goldberg–Hogness box) at about -30 nt (full consensus sequence TATAAAA). This element aids in local DNA denaturation and sets the start point for transcription.

Promoter-proximal elements are required for high levels of transcription. They lie further upstream from the start site, at positions between -50 and -200, and generally function in either orientation. Examples include:
1) The CAAT box, located at about -75.
2) The GC box, consensus sequence GGGCGG, located at about -90.

Various combinations of core and proximal elements are found near different genes. Promoter-proximal elements are key to understanding the rate at which transcription initiation occurs, and thus the level of gene expression.

Transcription initiation requires assembly of RNA polymerase II and general transcription factors (GTFs) on the core promoter at the TATA box, forming a preinitiation complex (PIC) (see Figure 3.31). The PIC is sometimes referred to simply as the transcription initiation complex. GTFs are needed for initiation by all RNA polymerases. They are numbered to match their corresponding RNA polymerase and lettered in the order of discovery; e.g., TFIID was the fourth GTF discovered that works with RNA polymerase II.
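The following Python sketch scans a promoter sequence for the consensus elements named above. It makes the positional conventions concrete: +1 is the transcription start site, upstream positions are negative, and there is no position 0. Several assumptions apply. The input is only the sequence upstream of +1, written 5’→3’ on the non-template strand, and matches must be exact, although real promoters deviate from consensus. The CAAT box is matched by its core CAAT only. Only one orientation is searched, although promoter-proximal elements can act in either orientation. `scan_upstream` and `demo_promoter` are hypothetical names, and the demo sequence is synthetic.

```python
import re

# Exact-match consensus sequences from the text. Matching "CAAT" alone for
# the CAAT box is a simplification.
MOTIFS = {"TATA box": "TATAAAA", "CAAT box": "CAAT", "GC box": "GGGCGG"}


def scan_upstream(upstream: str) -> dict[str, list[int]]:
    """Return, for each motif, the positions of its 5'-most base.

    `upstream` ends with the -1 nucleotide, so string index i corresponds to
    position i - len(upstream): the last base is -1 and there is no position 0.
    A lookahead regex is used so that overlapping matches are also reported.
    """
    n = len(upstream)
    return {
        name: [m.start() - n for m in re.finditer(f"(?={motif})", upstream)]
        for name, motif in MOTIFS.items()
    }


def demo_promoter(length: int = 100) -> str:
    """Synthetic upstream region: 'N' filler with motifs placed by position."""
    seq = ["N"] * length
    for motif, pos in (("GGGCGG", -90), ("CAAT", -75), ("TATAAAA", -30)):
        start = length + pos
        seq[start:start + len(motif)] = motif  # same-length slice replacement
    return "".join(seq)


if __name__ == "__main__":
    print(scan_upstream(demo_promoter()))
    # {'TATA box': [-30], 'CAAT box': [-75], 'GC box': [-90]}
```

A realistic scanner would score partial matches against a position weight matrix and search both strands. Exact matching keeps the sketch focused on the coordinate convention.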
The general transcription factors, along with other proteins that form specific PICs at a particular promoter, poise RNA polymerase to begin transcribing the gene downstream of the promoter. Once the PIC forms, RNA polymerase initiates transcription. However, the rate at which transcription initiation occurs at a particular gene depends on two factors.

The first factor is the number and types of enhancer/silencer sequence elements found in the promoter. These sequence elements can range from 50 nt to over 1,000 nt in length. Enhancer/silencer elements must be located in cis (meaning on the same DNA molecule) relative to the promoter/coding sequence to affect the expression of a gene. Some enhancer/silencer sequences have been found as much as 1 megabase (1,000,000 nt) away from the transcription start site (TATA box), but most lie within a few thousand bases of the TATA box.

Figure 3.31. Eukaryotic transcription begins with the formation of a transcription preinitiation complex (PIC) on the TATA box in the promoter of the gene. The PIC is a large complex of proteins that is necessary for the transcription of protein-coding genes in eukaryotes. It helps position RNA polymerase II over gene transcription start sites, denatures the DNA, and places the DNA in the RNA polymerase II active site for transcription. The minimal PIC includes RNA polymerase II and six general transcription factors: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. Additional regulatory complexes (coactivators and chromatin-remodeling complexes) can also be components of the PIC.

The second factor regulating the rate of transcription initiation is the set of proteins that can bind to specific enhancer or silencer sequence elements. Activators are proteins that bind to enhancer sequences. They also contain protein-protein interaction domains that allow them to bind to and affect the behavior of other proteins. These other proteins could be RNA polymerase itself, other general transcription factors in the PIC, or adapter proteins that interact with the PIC (see Figure 3.32).

Figure 3.32. An activator protein binding to a promoter-proximal enhancer sequence, interacting with an adapter protein and the PIC to enhance transcription initiation.

Repressor proteins can bind to either silencer or enhancer sequence elements in the promoter. In doing so, they reverse the effect of activator proteins. They may interfere with the critical protein-protein interactions of activators, or they may bind tightly to enhancer sequences and keep activators from binding. Activator and repressor proteins are therefore important in transcription regulation. They recognize promoter-proximal elements and other enhancer/silencer sequence elements found upstream of the promoter, and they are specific for groups of similarly regulated genes. These proteins mediate the rate of transcription initiation for genes that contain the recognized sequence elements. Specific activator and repressor proteins may be present or absent in a particular cell because of its cell type or because of environmental factors, and their presence or absence controls the initiation of transcription.

For example, housekeeping genes (used in all cell types for basic cellular functions) have common promoter-proximal elements that are recognized by activator proteins found in all cells. Examples of genes with housekeeping functions include actin, hexokinase, and glucose-6-phosphate dehydrogenase. Genes expressed only in some cell types or at particular times have promoter-proximal elements recognized by activator proteins found only in those cell types or at those times. Enhancers are another cis-acting element, and they are required for maximal transcription of a gene.
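The activator/repressor logic described above can be illustrated with a toy model in Python. Everything in it is hypothetical: the element and protein names are invented, and the "level" is simply the number of enhancer elements bound by an activator present in the cell. A silencer bound by a repressor is assumed to shut initiation off completely. Real repressors can also act on enhancers, and real effects are graded.

```python
# Hypothetical recognition tables: which protein binds which cis-element.
ACTIVATOR_FOR = {"HK_element": "ubiquitous_activator",
                 "LIVER_element": "liver_activator"}
REPRESSOR_FOR = {"SIL_element": "neural_repressor"}

# Hypothetical genes, described by the cis-elements in front of each one.
GENES = {
    "housekeeping_gene": {"HK_element"},
    "liver_gene": {"HK_element", "LIVER_element", "SIL_element"},
}

# Hypothetical cell types, described by the regulatory proteins they make.
CELLS = {
    "liver": {"ubiquitous_activator", "liver_activator"},
    "neuron": {"ubiquitous_activator", "neural_repressor"},
}


def initiation_level(elements: set[str], proteins: set[str]) -> int:
    """Toy score: count activator-bound enhancers; a bound silencer vetoes."""
    # .get() returns None for elements that are not silencers; None is never
    # in the protein set, so those elements cannot trigger the veto.
    if any(REPRESSOR_FOR.get(e) in proteins for e in elements):
        return 0
    return sum(1 for e in elements if ACTIVATOR_FOR.get(e) in proteins)


if __name__ == "__main__":
    for cell, proteins in CELLS.items():
        for gene, elements in GENES.items():
            print(cell, gene, initiation_level(elements, proteins))
```

The housekeeping gene scores 1 in both cell types because its activator is present everywhere. The same set of cis-elements gives the liver gene a score of 2 in liver cells and 0 in neurons, depending only on which proteins each cell makes.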
Enhancers/silencers are usually upstream of the transcription initiation site but may also be downstream, and they can modulate transcription from thousands of base pairs away from the initiation site. Because similar enhancer and silencer sequences occur in front of several genes that are coordinately regulated, and each gene promoter has its own unique spectrum of such sequences, eukaryotic cells can avoid the need to organize coordinately regulated genes contiguously into operons, as is common in prokaryotes.

Additionally, each tissue produces a set of tissue-specific and general activator and repressor proteins, and the spectrum of these proteins can be influenced by environmental factors such as cellular surroundings, temperature, chemical environment, etc. This allows each cell to "customize" the expression of its genes according to the protein functions it requires, based on its cell type and cellular environment. This phenomenon is referred to as combinatorial gene regulation and is illustrated in Figure 3.33.

Figure 3.33. Combinatorial gene regulation leads to the coordinate regulation of batteries of genes in Eukaryotes. The types of enhancer and silencer sequences in front of each gene determine the level of transcription of each gene based on the activator and repressor proteins present in each cell/tissue type and the environment surrounding each cell.

Once transcription initiation has occurred, the RNA polymerase moves away from the TATA box as the transcript is elongated. Elongation is fundamentally the work of RNA polymerase; the other proteins of the PIC leave the complex and are recycled to form new PICs while RNA polymerase extends the nucleotide chain to complete the primary transcript.

3.5.7. Processing the primary transcript into a mature mRNA (return)

As shown in Figure 3.30, the primary transcript must be processed in 3 significant ways to become a mature mRNA. This processing all takes place in the nucleus of the cell and prepares the mRNA for transport to the cytosol, where it will subsequently be translated to produce a protein.

First, the primary transcript must acquire a cap at its 5' end. The cap prepares the transcript for transport from the nucleus, protects it from attack by exonucleases in the cytoplasm, and aids in the initiation of translation. Structurally, a cap consists of a 7-methyl guanosine attached by 3 phosphate groups to the 5' end of the transcript. Note that the cap is reversed compared to the RNA strand, i.e. it is attached 5' to 5', not 5' to 3' as are the other nucleotides in the transcript. The cap can be attached during transcription, before the primary transcript is complete, but because it is critical to efficient transport of the mRNA from the nucleus, it must be attached in the nucleus.

Figure 3.34. Structure of the 5'-Cap added to Eukaryotic primary RNA transcripts. The cap consists of a 7-methyl guanosine residue attached 5' to 5' at the 5' end of the transcript by 3 phosphate groups (a 5'-5' triphosphate linkage).

The second processing step occurs at the 3' end of the transcript (Figure 3.35) and is involved in the termination of transcript elongation by RNA polymerase II. Note that other eukaryotic RNA polymerases may have other mechanisms of transcript termination, since they do not produce polyadenylated transcripts. The addition of the poly-A tail involves a complex of proteins that assembles at a poly-A addition consensus sequence (AAUAAA). The proteins involved in the cleavage step of the termination process include:
1) CPSF (cleavage and polyadenylation specificity factor).
2) CstF (cleavage stimulation factor).
3) Two cleavage factor proteins (CFI and CFII).
Following cleavage, the enzyme poly(A) polymerase (PAP) adds A nucleotides to the 3' end of the cleaved transcript, using ATP as a substrate. PAP is bound to CPSF during this process. Typically, about 200-250 A's are added. PABII (poly-A binding protein II) binds the poly-A tail as it is produced. Upon completion of the poly-A tail, further transcription is terminated with the release of the pre-mRNA transcript from the protein complex.

Figure 3.35. The addition of a poly-A tail to the transcript terminates transcription of the pre-mRNA. The process involves 2 steps: A) cleavage of the growing primary transcript by a complex of proteins that recognize a poly-A addition signal in the transcript; B) addition of 200-300 A's to the 3' end of the transcript by poly(A) polymerase (PAP).

The third step in producing a mature mRNA from a pre-mRNA is the removal of sequences that are present in the DNA coding sequence and the pre-mRNA but absent from the mature mRNA found in the cytoplasm of the cell. These removed sequences are called introns. The parts of the pre-mRNA that remain in the mature mRNA are called exons (see Figure 3.30). The removal of introns from the primary transcript is a process referred to as splicing, and it typically involves a ribonucleoprotein (RNP) particle referred to as a spliceosome. Spliceosomes are assembled from small nuclear ribonucleoprotein particles (snRNPs) associated with pre-mRNAs. The snRNAs discussed previously are structural parts of spliceosome RNPs. The principal snRNAs involved are U1, U2, U4, U5, and U6. Each of these snRNAs is associated with several proteins; e.g. U4 and U6 are part of the same snRNP, while the others are in their own snRNPs. Each snRNP type is abundant (~10^5 copies per nucleus), consistent with the critical role that these snRNPs play in nuclear processes.

The steps of RNA splicing are outlined in Figure 3.36:
1) U1 snRNP binds the 5' splice junction of the intron, as a result of base-pairing of the U1 snRNA to the intron RNA.
2) U2 snRNP binds by base pairing to the branch-point sequence upstream of the 3' splice junction.
3) U4/U6 and U5 snRNPs interact and then bind the U1 and U2 snRNPs, creating a loop in the intron.
4) U4 snRNP dissociates from the complex, forming the active spliceosome.
5) The spliceosome cleaves the intron at the 5' splice junction, freeing it from exon 1. The free 5' end of the intron bonds to a specific nucleotide (usually A) in the branch-point sequence to form an RNA lariat.
6) The spliceosome cleaves the intron at the 3' junction, liberating the intron lariat. Exons 1 and 2 are ligated, and the snRNPs are released.

Figure 3.36. The process of intron splicing conducted by U2-dependent spliceosomes. Note that there are other types of spliceosomes, and that a few introns are spliced independently of spliceosomes. The binding of at least 5 RNP complexes containing snRNAs and proteins ultimately produces a structure that holds the cleaved ends of the transcript together while the intron is spliced out, producing a "lariat" structure. The exon ends of the transcript are then ligated together, producing a mature mRNA with the intron removed from the sequence.

One of the most interesting aspects of intron splicing is that different transcripts can be created depending on how introns are spliced. This process, referred to as alternative splicing, can be used to produce different polypeptides from the same gene, as shown in Figures 3.37 and 3.38.
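The three processing steps can be sketched as string operations in Python. This is only a bookkeeping model under stated assumptions: the cap is a placeholder prefix `m7Gppp`, cleavage is placed a fixed, hypothetical number of nucleotides (`downstream`) past the first AAUAAA (real cleavage sites commonly lie roughly 10 to 30 nucleotides downstream and also depend on other sequence elements), intron positions are supplied by the caller, and the GU...AG check reflects the common splice-site rule rather than the spliceosome mechanism itself. All function names are hypothetical.

```python
CAP = "m7Gppp"  # placeholder for the 7-methylguanosine cap joined 5'-to-5'
POLY_A_SIGNAL = "AAUAAA"


def cleave_at_poly_a_signal(rna: str, downstream: int = 20) -> str:
    """Cut the transcript a fixed (assumed) distance past the first AAUAAA."""
    site = rna.find(POLY_A_SIGNAL)
    if site == -1:
        raise ValueError("no AAUAAA poly-A addition signal found")
    return rna[: site + len(POLY_A_SIGNAL) + downstream]


def splice(rna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) positions and join the exons."""
    exons, previous_end = [], 0
    for start, end in sorted(introns):
        intron = rna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU...AG rule")
        exons.append(rna[previous_end:start])
        previous_end = end
    exons.append(rna[previous_end:])
    return "".join(exons)


def process_pre_mrna(pre_mrna: str, introns: list[tuple[int, int]],
                     downstream: int = 20, tail_length: int = 200) -> str:
    """Cap, cleave and polyadenylate, and splice a pre-mRNA (toy model).
    Introns must lie upstream of the cleavage site so their positions stay valid."""
    body = cleave_at_poly_a_signal(pre_mrna, downstream)
    body = splice(body, introns)
    return CAP + body + "A" * tail_length


if __name__ == "__main__":
    exon1, intron, exon2 = "AUGGCU", "GUAAGUCUAACAG", "GGCUAGCCAAUAAAGCUCGCUUUU"
    pre = exon1 + intron + exon2
    mature = process_pre_mrna(pre, [(len(exon1), len(exon1) + len(intron))],
                              downstream=4, tail_length=5)
    print(mature)  # m7GpppAUGGCUGGCUAGCCAAUAAAGCUCAAAAA
```

The steps are applied cleavage-first only so that the caller's intron coordinates stay valid; in the cell, capping and splicing can begin while transcription is still under way.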


Figure 3.38. Alternative splicing of 1 primary transcript to produce 3 different proteins.

From the above discussion it is clear that processing a mature mRNA from the primary RNA transcript, and transporting the mature mRNA from the nucleus to the cytoplasm, are steps that can influence the amount of translatable mRNA for a particular protein that exists in a cell. The details of the steps we have discussed emerged from a series of original molecular genetic studies and have been greatly embellished more recently by functional genomic studies that we will investigate further in subsequent chapters.

Figure 3.37. A schematic representation of alternative splicing. The figure illustrates different types of alternative splicing: exon inclusion or skipping, alternative splice-site selection, mutually exclusive exons, and intron retention. For an individual pre-mRNA, different alternative exons often show different types of alternative-splicing patterns. © 2002 Nature Publishing Group. Cartegni, L., Chew, S. L., & Krainer, A. R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 3, 285–298 (2002). All rights reserved.
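Exon skipping, the first pattern listed in the Figure 3.37 caption, can be enumerated mechanically: each optional ("cassette") exon is either included or skipped, while constitutive exons are always kept in their original order. The Python sketch below uses hypothetical exon labels and models only exon skipping; alternative splice-site selection, mutually exclusive exons, and intron retention would need additional rules.

```python
from itertools import product


def exon_skipping_isoforms(exons: list[str], optional: set[str]) -> list[list[str]]:
    """List every exon combination produced by including or skipping each optional exon.
    Constitutive exons are always kept, and exon order never changes."""
    choices = [(True, False) if exon in optional else (True,) for exon in exons]
    isoforms = []
    for keep in product(*choices):
        isoforms.append([exon for exon, kept in zip(exons, keep) if kept])
    return isoforms


if __name__ == "__main__":
    # Hypothetical gene in which E2 and E3 are cassette exons
    for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}):
        print("-".join(isoform))
```

Run as written, it prints four isoforms (E1-E2-E3-E4, E1-E2-E4, E1-E3-E4, and E1-E4). With k independent cassette exons there are 2^k possible combinations, although a given cell need not produce all of them.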

3.6. TRANSLATION (RETURN)

Following transcription, the genetic information carried in the nucleotide sequence of mRNA is translated into the amino acid sequence of a polypeptide or protein. This section concerns the process by which this translation of genetic information takes place.

3.6.1. The Nature of Proteins (return)

Amino acids are the monomeric building blocks used to make proteins. The generic structure of an amino acid is shown in Figure 3.39. The feature that distinguishes the 20 protein amino acids from one another is the nature of their R-groups, while the amino and carboxyl groups are used to make the peptide bonds that form the "backbone" of proteins (Figure 3.40). The R-groups make each of the 20 amino acids unique.

Figure 3.40. The peptide bond is used to fasten amino acids together in proteins. Peptide bonds form between the amino group of one amino acid and the carboxyl group of another amino acid; joining two amino acids in this way produces a dipeptide.

Figure 3.39. Generic structure of an amino acid showing the features common to all amino acids. These features include an alpha carbon that has chemically bonded to it: an amino group, a carboxyl group, a hydrogen atom, and an R-group. The 20 protein amino acids differ by the structure of their R-group.

However, there are similarities in the chemical properties imparted to each amino acid by its R-group. As we learn more about proteins and their important features, we will need to understand this similarity in amino acid functional groups. Figure 3.41 shows the four classes of amino acids: acidic, basic, polar, and nonpolar. These classes are based on R-groups having similar properties in each group.

3.6.2. The Genetic Code (return)

How many nucleotides are needed to specify one amino acid? A one-letter code could specify only four amino acids; a two-letter code could specify 16 (4 × 4). To accommodate 20 amino acids, at least three letters are needed (4 × 4 × 4 = 64). Thus we can conclude that the genetic code is most likely a triplet code. This has been verified experimentally, and the code words for all 20 amino acids have been determined (see Figure 3.42).

Characteristics of the genetic code:
1) It is a triplet code, meaning each three-nucleotide codon in the mRNA specifies one amino acid in the polypeptide.
2) It is "punctuation" free, i.e. the mRNA is read continuously, three bases at a time, without skipping any bases.
3) It is nonoverlapping. Each nucleotide is part of only one codon and is read only once during translation.
4) It is almost universal. In nearly all organisms studied, most codons have the same amino acid meaning. Examples of minor code differences include the protozoan Tetrahymena and the mitochondria of some organisms.
5) It is degenerate. Of 20 amino acids, 18 are encoded by more than one codon (see Figure 3.42). M (AUG) and W (UGG) are the exceptions; all other amino acids correspond to a set of two or more codons, e.g. F, Y, C, H, Q, N, K, D, & E all have 2 code words, I has 3, V, P, T, A, & G all have 4 code words, and L, S, & R have 6 code words. Codon sets often show a pattern in their sequences; variation at the third position is most common (a result of wobble in this position, see below).
6) Wobble occurs in the anticodon. The third base of the codon, which pairs with the first (5') base of the anticodon, is able to base-pair less specifically because it is less constrained three-dimensionally. It wobbles, allowing a tRNA with a base modification in its anticodon (e.g., the purine inosine) to recognize up to three different codons (Figure 3.43 and Table 3.5).
7) The code has start and stop signals. AUG is the usual start signal for protein synthesis and defines the open reading frame. Stop signals are codons with no corresponding tRNA, the nonsense or chain-terminating codons. There are generally three stop codons: UAG (amber), UAA (ochre), and UGA (opal).

Figure 3.41. There are 20 amino acids used in biological proteins. They are divided into subgroups according to the properties of their R groups (2 acidic, 3 basic, 9 neutral and polar, or 6 neutral and nonpolar).

Figure 3.42. Codon table, showing each of the 64 possible triplet code words. Both single-letter and 3-letter abbreviations are shown for the amino acid corresponding to each code word. Note that there are 3 stop code words that do not code for any amino acid, and an initiation code word that codes for methionine (or N-formyl methionine in Prokaryotes).

Figure 3.43. Schematic showing normal and wobble base pairing at the 5' end of the anticodon in the tRNA with the 3' end of the codon in the mRNA.

3.6.3. tRNA – The decoding molecule (return)

As described earlier in Section 3.5.1, transfer RNAs (tRNAs) are a type of transcribed RNA produced from a set of tRNA genes scattered throughout the genomes of eukaryotes, mitochondria, and chloroplasts. A genome would need at least 1 transfer RNA for each of the 20 amino acids, but because amino acids have multiple code words, and because wobble does not allow a single tRNA to read all 4 code words of a four-codon set, there must be substantially more than 20 tRNAs in each genome. Typical Eukaryotic genomes contain around 100 tRNA genes, and multiple tRNAs for each amino acid.

Structurally, tRNAs have a highly organized, folded structure, and because they must all bind to the same sites on ribosomes, the overall structures of all tRNAs are quite similar. In 2 dimensions tRNAs are typically drawn as a cloverleaf structure (Figure 3.44, upper left). Because of base pairing, the 3' end and the 5' end form a stem, and there are 3 separate loops making up the leaves of the "clover". The loop opposite the stem contains the anticodon sequence, while the other two loops help the tRNA fold into a proper 3-dimensional structure (Figure 3.44, upper right). This 3-dimensional structure is an L-shaped molecule that fits into sites on the ribosome during the translation process. The amino acid is attached to the free 3'-OH group at the 3' end of the tRNA sequence via an ester bond (the acid function of the amino acid bonds to the hydroxyl function on the ribose sugar). This 3'-terminal OH is always on an adenosine nucleoside, which is preceded by two cytosines. These 3 nucleotide residues (CCA) are not transcribed, but are added posttranscriptionally during processing of the primary transcript. Note that all tRNAs share this 3' feature.

Figure 3.44. Characteristics and structure of tRNAs. Upper left shows a 2-D "cloverleaf" structure typical of most tRNAs. However, this structure has additional secondary structure producing a 3-D structure similar to that shown in the upper right. An atomic space-filling model of this 3-D structure is shown in the lower left, and the detailed 3' end common to all tRNAs, including the posttranscriptionally added CCA to which the amino acid is attached, is shown in the lower right.

The enzyme that attaches the amino acid to the tRNA is called an aminoacyl tRNA synthetase (for a particular amino acid, "aminoacyl" is replaced by the name of the amino acid; e.g. a glycyl tRNA synthetase attaches glycine to the 3'-OH on the adenosine at the 3' terminus). There must be at least 1 aminoacyl tRNA synthetase for the tRNAs coding for each amino acid, but there may be multiple tRNA synthetases if there are multiple tRNAs. tRNA synthetases recognize the proper amino acid to be fastened to each tRNA based on the shape of the amino acid, and they also recognize the tRNA to which the amino acid should be fastened based on the detailed shape of the tRNA. Note that tRNA synthetases rely on the overall shape characteristics of the tRNAs rather than on the anticodon sequence alone. Although all tRNAs have a relatively similar shape so that they fit into ribosomes, their shapes are distinctive enough that only the properly shaped tRNAs are recognized by the corresponding tRNA synthetases; thus shape recognition assures that the proper amino acid gets attached to a tRNA with the appropriate anticodon for that amino acid.

The attachment of the amino acid to the tRNA molecule carried out by an aminoacyl tRNA synthetase is described in Figure 3.45. This reaction requires that the amino acid be energized by ATP, forming an enzyme-bound aminoacyl-AMP complex and releasing pyrophosphate (PPi). The amino acid is then transferred to a tRNA, forming an ester bond, and the aminoacylated tRNA and AMP are released from the enzyme.

Figure 3.45. The catalytic cycle of an aminoacyl tRNA synthetase. The enzyme binds an amino acid and ATP (top); ATP is hydrolyzed, producing pyrophosphate and aminoacyl-AMP bound to the enzyme. This is followed by the appropriate tRNA binding to the aa-tRNA synthetase, and subsequently the formation of the aminoacylated tRNA (referred to as a charged tRNA), with the subsequent liberation of the charged tRNA and AMP.

Functionally, amino acids are inserted into the polypeptide in the proper sequence due to the decoding function of the tRNA. This involves specific binding of each amino acid to its cognate (appropriate) tRNA, and specific base pairing between the mRNA codon and tRNA anticodon. Thus, the fidelity of the transfer of genetic information from the DNA of the gene to the amino acid sequence of the protein relies as much on the fidelity of the "coding" done by the tRNA molecules as on the fidelity of the transcription process creating the mRNA.
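Several properties of the code listed in Section 3.6.2 (triplet, punctuation-free, nonoverlapping reading from an AUG start to a stop codon; degeneracy; wobble) can be checked with a short Python sketch. It builds the standard codon table from a common compact 64-letter encoding (bases ordered U, C, A, G, with the first base varying slowest) and uses `*` for stop codons. The function names are hypothetical, `translate` simply starts at the first AUG, and the wobble pairings in `WOBBLE` are the standard rules for the first (5') anticodon base, included here as an assumption since Table 3.5 is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard code; codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read codons three bases at a time, without gaps or overlaps,
    from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def degeneracy() -> dict[int, list[str]]:
    """Group amino acids by how many codons specify them."""
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    groups = defaultdict(list)
    for aa, n in counts.items():
        groups[n].append(aa)
    return {n: sorted(aas) for n, aas in sorted(groups.items())}


# First (5') anticodon base -> third codon bases it can pair with (wobble rules)
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read_by(anticodon: str) -> list[str]:
    """Codons recognized by an anticodon written 5'->3' (pairing is antiparallel)."""
    first, middle, last = anticodon
    return sorted(PAIR[last] + PAIR[middle] + third for third in WOBBLE[first])


if __name__ == "__main__":
    print(translate("GGAUGUUUGGCUAAGG"))  # MFG
    print(degeneracy())
    print(codons_read_by("IAU"))          # ['AUA', 'AUC', 'AUU'], all isoleucine
```

The degeneracy groups reproduce the counts in item 5 (two amino acids with one codon, nine with two, one with three, five with four, three with six), which together with the three stop codons account for all 64 triplets.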

3.6.4. Peptides Are Synthesized On Ribosomes In Both Prokaryotes and Eukaryotes (return)

Ribosomes are RNA-protein complexes that are so small that they cannot be seen with a light microscope. Although both Prokaryotes and Eukaryotes have ribosomes, there are key differences between the RNA species and proteins found in ribosomes in the two groups of organisms. It should also be noted that mitochondria and chloroplasts in Eukaryotic cells produce bacteria-like ribosomes that make the proteins in each of these organelles. The details of ribosomal structure in Prokaryotes and Eukaryotes are shown in Figure 3.46; you may need to refer back to section 3.5.1 and Table 3.2 for more details.

Figure 3.46. The structure and composition of (a) Prokaryotic and (b) Eukaryotic ribosomes. Both are composed of a large ribosomal subunit and a small ribosomal subunit, although these differ in size. The rRNAs and proteins that are a part of each of the subunits are also shown.

Bacterial 70S ribosomes are composed of a 50S large subunit and a 30S small subunit. Each of these contains the rRNAs and proteins shown in Figure 3.46. Note that Eukaryotic mitochondria and chloroplasts contain a genome that codes for bacteria-like rRNAs and ribosomal proteins, and thus prokaryotic versions of ribosomes are found in these intracellular compartments in Eukaryotes. Eukaryotic ribosomes are significantly larger than their prokaryotic counterparts. The 80S ribosome is composed of a 60S large ribosomal subunit and a 40S small ribosomal subunit. Both of these are made up of larger rRNAs and more proteins than their prokaryotic counterparts. The important structural features of ribosomes are created by the assembly of the large and small subunits into a functional ribosome (Figure 3.47). There are 3 tRNA binding sites created on the surface of the fully assembled ribosome.
One of these sites is for the entering aminoacyl-tRNA (A-site), a second is for the tRNA carrying the growing peptide chain (P-site), and the last is used by the tRNA to exit the ribosome after it has lost its amino acid.

Figure 3.47. An assembled functioning ribosome has pockets on the surface near the location of the mRNA that bind the decoding tRNAs. A charged tRNA (with amino acid) binds to the aminoacyl (A) site, identifying a codon sequence in the mRNA corresponding to its anticodon. Adjacent to the A-site is a peptidyl (P) site occupied by the tRNA that just received the growing peptide chain. As protein chain elongation occurs, the peptide is transferred to the amino acid bound to the tRNA in the A-site, forming a new peptide bond. The third site is the exit (E) site, where the tRNA that has just lost its peptide exits the ribosome. Once the peptide has been transferred and the E site is emptied, the ribosome translocates (moves) to put the peptidyl tRNA into the P-site, creating an opportunity for a new tRNA corresponding to the codon in the A-site to enter and begin the cycle again.

3.6.5. Translation involves initiation, elongation, and termination (return)

The translation process is very similar in Prokaryotes and Eukaryotes. Although different initiation, elongation, and termination factors are used and the ribosomes differ as discussed above, the genetic code is generally identical. In bacteria, transcription and translation take place simultaneously, and mRNAs are relatively short-lived. This facilitates the regulation of gene expression using processes such as attenuation. By comparison, in eukaryotes mRNAs have more variable half-lives, ranging from short- to long-lived, are subject to a series of modifications, and must exit the nucleus to be translated. These multiple steps offer additional opportunities to regulate levels of protein production, and thereby fine-tune gene expression.

Figure 3.48. Eukaryotes and bacteria produce mRNAs somewhat differently. Bacteria use the RNA transcript, often a polycistronic mRNA, without modification. Eukaryotes modify pre-mRNA into mRNA by processing. The processed mRNA then leaves the nucleus, and translation occurs in the cytoplasm.

Figure 3.49. The process of initiation in both Prokaryotes and Eukaryotes is fundamentally similar. The small ribosomal subunit interacts with the mRNA, and an initiator tRNA is located at the AUG start code word. The large subunit then completes initiation by aligning to the complex and forming an active ribosome.

Initiation of translation
The process of initiation of translation involves the formation of a functional ribosome at the AUG start code word on the mRNA. In both Prokaryotes and Eukaryotes, this process involves the small ribosomal subunit interacting with the mRNA at a translation start site (Figure 3.49). The initiator tRNA with methionine (N-formyl-methionine in Prokaryotes) then locates the AUG code word and provides the opportunity for assembly of a complete ribosome with the addition of the large ribosomal subunit. This simple outline of initiation is substantially more complex in Eukaryotes, and relatively simple in Prokaryotes. The details of Prokaryotic translation initiation are shown in Figure 3.50, and the details of Eukaryotic translation initiation are shown in Figure 3.51.
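The A-, P-, and E-site bookkeeping described for Figure 3.47 can be followed step by step in a simplified Python sketch. It is a toy model under stated assumptions: it starts with the initiator tRNA already in the P site on an AUG (written `M`, although bacteria use N-formyl-methionine), ignores elongation factors and GTP, and, anticipating the termination step, labels stop codons with the E. coli release factors that recognize them (RF1: UAA and UAG; RF2: UAA and UGA). The function name `run_ribosome` and the `tRNA(X)` labels are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# E. coli release factors and the stop codons each recognizes
RELEASE_FACTORS = {"RF1": {"UAA", "UAG"}, "RF2": {"UAA", "UGA"}}


def run_ribosome(mrna: str, start: int) -> tuple[str, list[str]]:
    """Toy elongation cycle: A-site entry, peptide transfer, translocation, release."""
    if mrna[start:start + 3] != "AUG":
        raise ValueError("the initiator tRNA must sit on an AUG in the P site")
    p_site, peptide, events = "tRNA(M)", "M", []
    pos = start + 3                      # codon now positioned in the A site
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":                    # no tRNA matches a stop codon
            rfs = sorted(rf for rf, stops in RELEASE_FACTORS.items() if codon in stops)
            events.append(f"{codon} in A site: recognized by {'/'.join(rfs)}; {peptide} released")
            return peptide, events
        a_site = f"tRNA({aa})"           # charged tRNA pairs with the A-site codon
        peptide += aa                    # peptide moves onto the A-site amino acid
        events.append(f"{codon}: {a_site} enters A site; peptide bond -> {peptide}")
        e_site, p_site = p_site, a_site  # translocation: P -> E, A -> P
        events.append(f"translocation: {e_site} to E site and out; {p_site} now in P site")
        pos += 3
    raise ValueError("reached the end of the mRNA without a stop codon")


if __name__ == "__main__":
    peptide, events = run_ribosome("AUGUUUGGCUGA", 0)
    print("\n".join(events))
    print(peptide)  # MFG
```

Tracking only which tRNA occupies each site keeps the cycle visible; a real ribosome also needs the elongation factors of Figure 3.52 and, at termination, RF3.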

Figure 3.50. Initiation of translation in prokaryotes involves the assembly of the components of the translation system, which are: the two ribosomal subunits (50S and 30S subunits); the mature mRNA to be translated; the tRNA charged with N-formylmethionine (the first amino acid in the nascent peptide); guanosine triphosphate (GTP) as a source of energy; the prokaryotic elongation factor EF-P; and the three prokaryotic initiation factors IF1, IF2, and IF3, which help the assembly of the initiation complex. Variations in the mechanism can be anticipated.

Figure 3.51. Eukaryotic translation initiation and ribosomal subunit recycling are depicted as a nine-stage process. In stage 1, ribosome recycling occurs to yield separated 40S and 60S ribosomal subunits. In stage 2, eukaryotic initiation factor 2 (eIF2), GTP, and an initiator methionine tRNA (Met-tRNAiMet) form a ternary complex called eIF2–GTP–Met-tRNAiMet. In stage 3, the 43S preinitiation complex forms. This complex includes a 40S subunit, eIF1, eIF1A, eIF3, eIF2–GTP–Met-tRNAiMet and probably eIF5. In stage 4, mRNA activation occurs, during which the mRNA cap-proximal region is unwound in an ATP-dependent manner by eIF4F with eIF4B. The mRNA loops into a circular configuration. In stage 5, the 43S preinitiation complex attaches to the unwound mRNA region. In stage 6, the 43S complex scans the 5′ UTR in a 5′ to 3′ direction. In stage 7, the initiation codon is recognized, and the 48S initiation complex forms, which switches the scanning complex to a 'closed' conformation. This leads to the displacement of eIF1 to allow eIF5-mediated hydrolysis of eIF2-bound GTP and inorganic phosphate (Pi) release. In stage 8, the 60S subunit joins the 48S complex, and there is concomitant displacement of GDP-bound eIF2 and other factors (eIF1, eIF3, eIF4B, eIF4F, and eIF5) mediated by eIF5B. In stage 9, hydrolysis of eIF5B-bound GTP occurs, and eIF1A and GDP-bound eIF5B are released from the assembled elongation-competent 80S ribosomes. Translation is a cyclical process. Following elongation, termination occurs, followed by recycling (stage 1), which generates separated ribosomal subunits, and the process begins again. Note the critical role that the 5'-CAP and the poly-A tail play in translation initiation in Eukaryotes, and note the number of GTPs and different proteins required in this process compared to the Prokaryotic process. © 2010 Nature Publishing Group. Jackson, R. J., Hellen, C. U., & Pestova, T. V. The mechanism of eukaryotic translation initiation and principles of its regulation. Nature Reviews Molecular Cell Biology 11, 113–127 (2010). All rights reserved.

The selection of an initiation site (usually an AUG codon) depends on the interaction between the 30S subunit and the mRNA template. The 30S subunit binds to the mRNA template at a purine-rich region (the Shine-Dalgarno sequence) upstream of the AUG initiation codon. The Shine-Dalgarno sequence is complementary to a pyrimidine-rich region on the 16S rRNA component of the 30S subunit. During the formation of the initiation complex, these complementary nucleotide sequences pair to form a double-stranded RNA structure that binds the mRNA to the ribosome in such a way that the initiation codon is placed at the P site.

Translation Elongation
Once the initiation process is completed, the peptide chain is elongated by the repeated cycling of the ribosome using a process like that shown in Figure 3.52. Note that this is the Prokaryotic version of the process, but the overall process is essentially the same in Eukaryotes; the eukaryotic version differs mostly in the names and number of elongation factors.

Termination
Termination is signaled by a stop codon (UAA, UAG, UGA) that has no corresponding tRNA (Figure 3.42). Release factors (RF) assist the ribosome in recognizing the stop codon and terminating translation. In E. coli there are 3 RFs:
1) RF1 recognizes UAA and UAG.
2) RF2 recognizes UAA and UGA.
3) RF3 stimulates termination via GTP hydrolysis.
RRF (ribosome recycling factor) binds the A site, EF-G translocates the ribosome, RRF then releases the last uncharged tRNA, and EF-G releases RRF, causing the ribosomal subunits to dissociate from the mRNA. In eukaryotes, eRF1 recognizes all three stop codons, while eRF3 stimulates termination. Ribosome recycling occurs without an equivalent of RRF.

Figure 3.52. The Prokaryotic translation elongation cycle. Though the names and numbers of elongation factors differ in Eukaryotes, the process is essentially the same as that shown here.

3.6.6. Protein Sorting in Eukaryotes (return)
Both bacteria and eukaryotes secrete proteins, although different mechanisms are involved. Eukaryotes use the endomembrane system to synthesize proteins and move them to the intracellular compartment where they are intended to function, or outside the cell. This is accomplished by the rough endoplasmic reticulum and the Golgi apparatus of the cell. As proteins destined for a specific cellular location are being made, a hydrophobic signal (leader) sequence (15–30 N-terminal amino acids) is produced.

Figure 3.53. Proteins made by ribosomes on the rough endoplasmic reticulum (ER) are placed inside the lumen of the ER at the time of their synthesis. As protein synthesis begins, the first ~20 amino acids of the protein are hydrophobic (signal peptide) and bind to a signal recognition particle (SRP). This attracts the ribosome, with its mRNA and peptide, to an SRP receptor on the surface of the rough endoplasmic reticulum (RER). As protein synthesis continues, the protein is pushed through a pore to the inside (lumen) of the RER. A signal peptidase cleaves the signal peptide from the newly made peptide chain, and when translation is terminated the newly made protein is released inside the ER lumen.

As the signal sequence is produced by translation, it is bound by a signal recognition particle (SRP) composed of RNA and protein (Figure 3.53). The SRP suspends translation until the complex (containing nascent protein, ribosome, mRNA, and SRP) binds a docking protein (SRP receptor) on the ER membrane. When the complex binds the docking protein, the signal sequence is inserted into the membrane through a specialized pore complex, SRP is released, and translation resumes. The growing polypeptide is inserted through the membrane into the ER, an example of cotranslational transport. In the ER cisternal space, the signal sequence is removed by signal peptidase. Proteins synthesized on the rough ER are usually glycosylated and then transported in vesicles to the Golgi apparatus. In the Golgi, proteins are sorted based on other sequence signals and then packaged in membrane vesicles for transport to their destinations.
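Whether the N-terminal region of a protein is hydrophobic enough to act as a signal sequence can be roughly estimated with a sliding-window hydropathy average. The Python sketch below uses the Kyte-Doolittle hydropathy scale; the 8-residue window, the 30-residue N-terminal region (matching the 15–30 amino acids mentioned above), the cutoff of 1.6, and the function names are hypothetical choices for illustration, and this crude average is not a substitute for dedicated signal-peptide prediction methods.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein: str, window: int = 8, n_terminal: int = 30) -> float:
    """Highest average hydropathy of any window within the N-terminal region."""
    region = protein[:n_terminal]
    if len(region) < window:
        raise ValueError("sequence shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in region]
    return max(sum(scores[i:i + window]) / window
               for i in range(len(scores) - window + 1))


def looks_like_signal_sequence(protein: str, cutoff: float = 1.6) -> bool:
    """Hypothetical rule: flag proteins with a strongly hydrophobic N-terminal stretch."""
    return max_window_hydropathy(protein) >= cutoff


if __name__ == "__main__":
    secreted_like = "MKLLLLLLLLLLAVASAQDEKRS"   # hypothetical hydrophobic N-terminus
    cytosolic_like = "MSDKERKQDEENKRSDEQKT"     # hypothetical charged N-terminus
    print(looks_like_signal_sequence(secreted_like))   # True
    print(looks_like_signal_sequence(cytosolic_like))  # False
```

Only the N-terminal region is scored because, as described above, the SRP recognizes the signal sequence as it first emerges from the ribosome.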

3.7. REGULATION OF EUKARYOTIC GENE EXPRESSION (RETURN)

Fundamentally, the central dogma of molecular biology, which dates back to Francis Crick, states that genetic information is transferred from DNA through an RNA intermediate to the amino acid sequence of proteins. Thus, proteins are critical to the function of genes that leads to the phenotype we observe for each gene. Clearly the amount of each type of protein made in a cell determines the phenotype of the cell. Understanding the expression of a gene therefore means understanding which proteins are expressed, in what quantities, and, in multicellular organisms, in which cells and cellular locations.

We have seen that gene expression in prokaryotes is regulated primarily at the transcriptional level, and involves factors that mediate changes in mRNA levels in cells in response to environmental and intracellular factors (refer back to section 3.5.5). However, the processes of transcription and translation in Eukaryotes are more complex, involving not just basic transcription and translation but also the compartmentation of transcription and translation into the nucleus and several cytoplasmic membrane-bound compartments. This establishes a basis for investigating the additional levels of complexity of these processes in Eukaryotes.

The points of control that regulate gene expression for protein-coding genes in eukaryotes are shown in Figure 3.54. Details of each step are given below.

3.7.1. Transcriptional Control (RETURN)
Transcriptional control relates to the production of the pre-mRNA from the coding region of a protein-coding gene. Some of the factors regulating transcriptional control have been discussed above in section 3.5.6. These involve the transcription factors regulating the formation of the preinitiation complex (PIC), and models of combinatorial gene regulation involving enhancer and silencer sequences in promoters and the trans-acting activator and repressor proteins that interact with these sequences.

However, there are additional aspects of transcriptional control that also impact gene expression. These include chromatin remodeling and DNA methylation.

Chromatin remodeling –
In eukaryotes, the binding of DNA with histones to form chromatin generally represses gene expression and is part of gene regulation. Chromatin remodeling is the dynamic modification of chromatin architecture that allows the transcriptional regulatory machinery (proteins) to access condensed genomic DNA, and thereby controls gene expression. Activation of eukaryotic genes requires alteration of the chromatin structure near the core promoter. Two classes of protein complexes cause chromatin remodeling (see Figure 3.55):

Figure 3.55. a) Chromatin remodeling using histone acetylases. b) Chromatin remodeling using a chromatin remodeling complex. This changes nucleosome positions and makes promoter sequences accessible to transcriptional machinery.

• Acetylating and deacetylating enzymes act on core histones. Histone acetyltransferases (HATs) are part of multiprotein complexes recruited to chromatin when activators bind DNA.
• HATs acetylate lysines in the amino-terminal tails of core histones.
• Acetylation neutralizes the positive charges of these lysines, reducing the histones' affinity for the negatively charged DNA.
• Acetylation of histones changes 30-nm chromatin to the 10-nm fiber, making the promoter more accessible for transcription.
• The effect is reversible. When histone deacetylases (HDACs) remove acetyl groups, 30-nm chromatin reforms.
• Nucleosome remodeling complexes are ATP-dependent multiprotein complexes that alter nucleosome positions on the chromatin in response to binding of activators to DNA, increasing transcription.
• Different types of nucleosome remodeling complexes are known, and some have more than one function:
– Some slide a nucleosome along the DNA, exposing DNA-binding sites for proteins.
– Some restructure the nucleosome in place.
– Some transfer the nucleosome from one DNA molecule to another.
The factors that control chromatin remodeling are not clearly understood at this point but are under investigation.

Figure 3.54. Showing the control points for the accumulation of proteins as a result of gene expression.

DNA methylation –
Methylation of particular DNA sequences can also silence transcription in many eukaryotes. DNA methylase converts cytosine to 5-methylcytosine (5mC) (Figure 3.56).
• Higher eukaryotes such as mammals have about 3% of their cytosines modified to 5mC, while lower eukaryotes have virtually 0%.
• 5mC is nonrandomly distributed, with most found in the symmetrical sequence CpG. This allows patterns of methylation to be studied by using restriction enzymes that contain the CG sequence in their recognition sites. For example:
– HpaII cuts at 5'-CCGG-3', but only if the cytosine of the central CpG is unmethylated.
– MspI also cuts at 5'-CCGG-3', regardless of methylation of the central CpG.
– Differences in the array of DNA fragments produced by these enzymes on Southern blotting allow methylation patterns to be inferred.
• CpG dinucleotides are found in clusters called CpG islands in specific regions of the genome.
• Human CpG islands often occur in the promoters of protein-coding genes.
• Generally the CpG islands are unmethylated, and transcription occurs.
• Methylation of the CpG sequence represses transcription: specific proteins bind to the methylated CpG and recruit histone deacetylases that cause chromatin remodeling, making promoters inaccessible.
An example of methylation affecting gene expression is fragile X syndrome (OMIM 309550). Expansion of a triplet repeat and abnormal methylation in the FMR-1 gene silences its expression.

Figure 3.56. Enzymes called DNA methylases convert cytosine into 5-methylcytosine. Methylation does not alter hydrogen bonding but does make regions of DNA recognizably different. Also, some restriction enzymes will not cut their restriction sites when the cytosines are methylated.

3.7.2. Pre-mRNA Processing Control (RETURN)
The primary mRNA transcript (pre-mRNA) is processed into an mRNA in the nucleus of a cell as described in section 3.5.7. This processing involves 3 steps: 1) the capping of the pre-mRNA at the 5' end; 2) the addition of the poly-A tail at the 3' end; and 3) the splicing of introns out of the pre-mRNA to complete the synthesis of the mature mRNA. The efficiency of each of these steps determines the rate of production of mature mRNAs and therefore the amount of mRNA that will be available for translation. Additionally, steps 1 and 2 affect the stability of the mature mRNA once it is transported to the cytoplasm, and this also affects the amount of mature mRNA.

Beyond its influence on the quantity of mRNA, intron splicing can regulate which specific exons are found in each mRNA. In the extreme, RNA splicing can produce completely different transcripts from the same pre-mRNA. An example of this is shown in Figure 3.57.

3.7.3. mRNA Transport from the Nucleus (RETURN)
The transport of RNA molecules from the nucleus to the cytoplasm is critical to Eukaryotic gene expression. The different RNA species that are produced in the nucleus are exported through the nuclear pore

CONCEPTS OF GENOMIC BIOLOGY Page 58 ribonucleoprotein (RNP) particles and recruit their exporters via class-specific adaptor proteins. Export of mRNAs is unique as it is extensively coupled to transcription and intron splicing. Understanding the mechanisms that connect RNP formation and RNA export from the nucleus is a major challenge in the gene expression field.

3.7.4. Translational Control (RETURN) Once the RNA has reached the cytoplasm, many factors control the rate at which it will be translated. The control of gene expression at the level of translation can occur in many ways. Ribosomal translational control, i.e. selecting specific Figure 3.57. Differential splicing of introns can produce two different mRNAs for translation, can also impact gene expression. proteins from the same pre-mRNA transcript. In Thyroid the calcitonin Unfertilized eggs are an example, in which mRNAs show gene is polyadenylated after exon 4, and after intron spicing an mRNA is significant increases in translation after fertilization made, translated, and cleaved to produce calcitonin. In neuronal cells, a pre-mRNA containing 5 exons is produced, and this is polyadenylated without new mRNA synthesis. Examples of this are after exon 5. Differential splicing in neuronal cells produces an mRNA shown in Table 3.6. lacking exon 4 but containing exon 5. This mRNA is translated and cleaved to produce CGRP a different peptide hormone. • Stored mRNAs are associated with proteins that both protect them and inhibit their translation. complexes via large protein complexes that form a • Poly(A) tails promote translation initiation, and nuclear pore called mobile export receptors. stored mRNAs generally have shorter tails. Small RNAs (such as tRNAs and microRNAs) follow • In some mRNAs of mouse and frog oocytes, a relatively simple export routes by binding directly to normal-length poly(A) tail is added and then trimmed export receptors while large RNAs (such as ribosomal enzymatically. RNAs and mRNAs) assemble into complicated CONCEPTS OF GENOMIC BIOLOGY Page 59

Table 3.6.

• Particular mRNAs are marked for deadenylation by a glycosylated, and this both controls their ultimate region in the 3’ untranslated region, called the cellular destination and stability. Proteins associated adenylate/uridylate (AU)-rich element (ARE), with with membranes can also have hydrophobic groups the consensus sequence UUUUUAU. such as myristic acid groups attached to assist them in • Activation of the stored mRNA occurs when a binding to membranes. cytoplasmic polyadenylation enzyme recognizes the Numerousl other types of posttranslational ARE and adds about 150 A residues, making a full- modifications, such as phosphorylation, can length poly(A) tail. dramatically affect enzymatic activity and thus regulate 3.7.5. Protein Processing Control (RETURN) protein function. These effects will be summarized further in the Proteomics section of the course. Once translated proteins are also processed. This processing depends on whether the proteins are made on the endoplasmic reticulum, or on cytosolic ribosomes. Refer to section 3.6.6 and Figure 3.53 for more information. Proteins made on the ER are CONCEPTS OF GENOMIC BIOLOGY Page 60
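The HpaII/MspI comparison described under transcriptional control (section 3.7.1) can be illustrated with a minimal Python simulation. It assumes a single linear DNA sequence in which each CCGG site is either methylated or unmethylated. Both enzymes are modeled as cutting between the two Cs (C^CGG). HpaII skips methylated sites, and MspI cuts every site, as stated above. The function names and the example sequence are hypothetical and are not part of any analysis package.

```python
# Minimal sketch (simplified model, not a lab protocol): HpaII is treated as
# cutting CCGG only when the site is unmethylated, MspI as cutting every CCGG.

SITE = "CCGG"  # recognition site shared by HpaII and MspI


def ccgg_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every CCGG site in seq."""
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]


def fragments(seq: str, sites: list[int]) -> list[int]:
    """Fragment lengths after cutting between the two Cs (C^CGG) at each site."""
    bounds = [0] + [s + 1 for s in sorted(sites)] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def hpaii_digest(seq: str, methylated: set[int]) -> list[int]:
    """Methylation-sensitive digest: skip sites listed as methylated."""
    return fragments(seq, [s for s in ccgg_sites(seq) if s not in methylated])


def mspi_digest(seq: str) -> list[int]:
    """Methylation-insensitive digest: cut every CCGG site."""
    return fragments(seq, ccgg_sites(seq))


# Hypothetical 30-bp sequence with CCGG sites at 0-based positions 4, 14 and 24.
seq = "AAAACCGGTTTTTTCCGGAAAAAACCGGTT"

print(mspi_digest(seq))          # [5, 10, 10, 5]
print(hpaii_digest(seq, set()))  # [5, 10, 10, 5]  (no site methylated)
print(hpaii_digest(seq, {14}))   # [5, 20, 5]      (middle site methylated)
```

When the middle site is methylated, the HpaII digest has one 20-bp fragment where the MspI digest has two 10-bp fragments. A band difference of this kind on a Southern blot is what allows the methylation pattern to be inferred.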

3.7.6. Degradation of mRNA Control

The control of mRNA degradation is a complex process. However, small regulatory RNAs (srRNAs) have emerged as important regulators of mRNA degradation. srRNAs can be either microRNAs (miRNAs), which are typically 20-21 nt in length, or short interfering RNAs (siRNAs), which are typically 23-24 nt in length. Recall from section 3.5 that miRNAs are transcribed from miRNA genes by RNA polymerase II (see Figure 3.58a). They are formed from hairpin-loop structures and participate in RNA-induced silencing complexes (RISC). These complexes target specific mRNAs for degradation, which prevents their further translation into protein. miRNAs can also target mRNAs for storage, inhibiting their translation until an appropriate signal brings these mRNAs out of storage by degrading the miRNA. siRNAs are produced from larger double-stranded RNA precursors (see Figure 3.58b). An enzyme called Dicer produces the siRNAs by cleaving these larger precursors. siRNAs can also load into RISC particles, where they target specific mRNAs for degradation, silencing their subsequent translation into proteins.

Figure 3.58. Small regulatory RNAs are produced by either the miRNA or the siRNA pathway. Both pathways produce RNAs that target specific mRNAs for degradation, thus controlling gene expression by mRNA degradation.
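RISC finds its targets by base pairing between the small RNA loaded into the complex (the guide strand) and the mRNA. The Python sketch below searches an mRNA for sites that pair with a guide strand. It assumes perfect complementarity, which is typical of siRNAs. Animal miRNAs often pair only partially with their targets, and this sketch does not model that case. The function names and both sequences are hypothetical.

```python
# Minimal sketch, assuming perfect Watson-Crick pairing between the guide
# strand loaded into RISC and its target mRNA (typical of siRNAs; animal
# miRNAs often pair only partially, which this sketch ignores).

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' sequence that base-pairs with rna (also given 5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna))


def target_sites(mrna: str, guide: str) -> list[int]:
    """0-based positions where the mRNA is fully complementary to the guide."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


# Toy 8-nt guide for readability; real guides are roughly 20-24 nt long.
guide = "ACGUUAGC"
mrna = "AAAGCUAACGUAAAA"

sites = target_sites(mrna, guide)
print(sites)  # [3]
print("targeted for degradation" if sites else "not targeted")
```

Searching for the reverse complement keeps both sequences in the usual 5'->3' orientation. This makes the check a plain substring match.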

3.7.7. Protein Degradation Control

Regulation of protein degradation occurs in many ways:
• A constitutively produced mRNA may be translated continuously, so that the protein degradation rate determines the level of the protein.
• A short-lived mRNA may make a very stable protein, so that the protein persists for long periods in the cell.
• Protein stability varies.
• Regulated proteolysis (protein degradation) in eukaryotes requires ubiquitin, a small protein that is attached to proteins targeted for destruction.
• Protein stability is directly related to the amino acid at the N terminus of the protein (the N-end rule). In yeast, the stability of the same protein was measured with different N-terminal amino acids.
• The N-terminal amino acid directs the rate at which ubiquitin is attached, which in turn determines the half-life of the protein.
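The first bullet and the N-end rule can be connected through a simple rate model. This is a minimal sketch that assumes constant synthesis at rate k_s and first-order degradation with constant k_d, so that dP/dt = k_s - k_d·P. Under that assumption the steady-state level is P_ss = k_s / k_d = k_s·t½ / ln 2. For an equal rate of synthesis, the steady-state level is therefore proportional to the half-life that the N-terminal residue sets. The half-lives and the synthesis rate below are hypothetical placeholders, not measured N-end rule values.

```python
import math

# Minimal sketch, assuming constant synthesis (k_s) and first-order
# degradation (k_d): dP/dt = k_s - k_d * P.


def degradation_rate(half_life: float) -> float:
    """First-order rate constant from a half-life: k_d = ln 2 / t_half."""
    return math.log(2) / half_life


def steady_state_level(synthesis_rate: float, half_life: float) -> float:
    """Level at which synthesis balances degradation: P_ss = k_s / k_d."""
    return synthesis_rate / degradation_rate(half_life)


def level_at(t: float, synthesis_rate: float, half_life: float) -> float:
    """Level at time t starting from zero: P(t) = P_ss * (1 - exp(-k_d * t))."""
    k_d = degradation_rate(half_life)
    return synthesis_rate / k_d * (1 - math.exp(-k_d * t))


# Hypothetical half-lives (minutes) for two N-terminal variants of one protein,
# both synthesized at the same hypothetical rate of 10 molecules per minute.
for label, t_half in [("stabilizing N-terminus", 120.0), ("destabilizing N-terminus", 2.0)]:
    print(label, round(steady_state_level(10.0, t_half)))
# prints 1731 for the stabilizing variant and 29 for the destabilizing one
```

In this model, a protein reaches half of its steady-state level after one half-life. A 60-fold difference in half-life gives a 60-fold difference in steady-state level.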