
Chapter 3. The Beginnings of Genomic Biology – Molecular Genetics

Contents

3. The beginnings of Genomic Biology – molecular genetics
  3.1. DNA is the Genetic Material
  3.2. Watson & Crick – The structure of DNA
  3.3. Chromosome structure
    3.3.1. Prokaryotic chromosome structure
    3.3.2. Eukaryotic chromosome structure
    3.3.3. Heterochromatin & Euchromatin
  3.4. DNA Replication
    3.4.1. DNA replication is semiconservative
    3.4.2. DNA Polymerases
    3.4.3. Initiation of replication
    3.4.4. DNA replication is semidiscontinuous
    3.4.5. DNA replication in Eukaryotes
    3.4.6. Replicating ends of chromosomes
  3.5. Transcription
    3.5.1. Cellular RNAs are transcribed from DNA
    3.5.2. RNA polymerases catalyze transcription
    3.5.3. Transcription in
    3.5.4. Transcription in Prokaryotes - Polycistronic mRNAs are produced from
    3.5.5. Beyond Operons – Modification of expression in Prokaryotes
    3.5.6. Transcription in Eukaryotes
    3.5.7. Processing primary transcripts into mature mRNA
  3.6. Translation
    3.6.1. The Nature of
    3.6.2. The
    3.6.3. tRNA – The decoding molecule
    3.6.4. Peptides are synthesized on
    3.6.5. initiation, elongation, and termination
    3.6.6. Sorting in

CONCEPTS OF GENOMIC BIOLOGY Page 1

CHAPTER 3. THE BEGINNINGS OF GENOMIC BIOLOGY – MOLECULAR GENETICS

3.1. DNA IS THE GENETIC MATERIAL.

As the development of genetics proceeded from Mendel in 1866 through the early part of the 20th century, it came to be understood that Mendel’s factors that produced traits were carried on chromosomes, and that there were practically unlimited ways that the genetic information from 2 parents could assort in each generation to produce the genetic variety demanded by Darwin’s theories on “origin of species” on which natural selection acted. This gave rise to the study of the behavior of more complex traits and an understanding of genes in populations.

At the same time a quest was ongoing for the material inside a cell, perhaps a subcomponent of a chromosome, that carried the genetic instructions to make organisms what they are.

In 1928, a British scientist, Frederick Griffith (1879-1941), published his work showing that live, rough, avirulent bacteria could be transformed by a “principle” found in dead, smooth, virulent bacteria into smooth, virulent bacteria. This meant that the bacterial traits of rough versus smooth and avirulence versus virulence were controlled by a substance that could carry the trait from dead to live cells.

Griffith’s observations on Pneumococcus were controversial to say the least, and inspired a spirited debate and much experimentation directed at proving whether the “transforming principle” was protein or nucleic acid, the two main components of chromosomes identified early in the 20th century, well before Griffith’s experiments. This debate continued until Oswald Avery and his colleagues, Colin MacLeod and Maclyn McCarty, published their work in 1944 unequivocally showing that DNA was, in fact, Griffith’s transforming principle. This completely revolutionized genetics and is considered the founding observation of molecular genetics.

In 1952, more evidence supporting DNA being the genetic material resulted from the work of Alfred Hershey and Martha Chase on E. coli infected with bacteriophage T2. In their experiment, T2 proteins were labeled with the 35S radioisotope, and T2 DNA was labeled with the 32P radioisotope. Then the labeled phages were mixed separately with the E. coli host, and after a short time, phage attachment was disrupted with a kitchen blender, and the location of the label determined. The 35S-labeled protein was found outside the infected cells, while the 32P-labeled DNA was inside the E. coli, indicating that DNA carried the information needed for viral infection.

Figure 3.1. An electron micrograph of bacteriophage T2 (left), and a sketch showing the structures present in the phage (right). The head consists of a DNA molecule surrounded by proteins, while the core, sheath, and tail fibers are all made of protein. Only the DNA molecule enters the cell.

Once it was established that DNA was the genetic material carrying the instructions for life, so to speak, attention turned to the question of “How could a molecule carry genetic information?” The key to that became obvious with a detailed understanding of the structure of the DNA molecule, which was developed by two scientists at Cambridge University, James Watson and Francis Crick.

3.2. WATSON & CRICK – THE STRUCTURE OF DNA.

The basic laboratory observations that led to the formulation of a structure for DNA did not involve biologists. Rather Erwin Chargaff, an analytical, organic chemist, and the physicists Rosalind Franklin and Maurice Wilkins made the laboratory observations that led to the solution of the structure of DNA.

Chargaff determined that there were 4 different nitrogen bases found in DNA molecules: the purines, adenine (A) and guanine (G), and the pyrimidines, cytosine (C) and thymine (T). He purified DNA from a number of different sources so he could examine the quantitative relationships of A, T, G, and C. He concluded that in all DNA molecules, the mole-percentage of A was nearly equal to the mole-percentage of T, while the mole-percentage of G was nearly equal to the mole-percentage of C. Alternatively, you could state this as the mole-percentage of pyrimidine bases equaled the mole-percentage of purine bases. These observations became known as Chargaff’s rules.

Rosalind Franklin, a young x-ray crystallographer working alongside Maurice Wilkins at King’s College London, used a technique known as x-ray diffraction to generate images of DNA molecules that showed that DNA had a helical structure with repeating structural elements every 0.34 nm and every 3.4 nm along the axis of the molecule.

Figure 3.2. X-ray diffraction image of DNA molecule showing helical structure with repeat structural elements.
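Chargaff’s rules can be checked numerically. The sketch below is only an illustration: the sequence is hypothetical, and the helper names are our own. It builds a double-stranded molecule from one strand plus its complement and reports the mole-percentage of each base. The rules describe double-stranded DNA; a single strand taken alone need not obey them.

```python
from collections import Counter

def mole_percentages(sequence: str) -> dict[str, float]:
    """Percentage of A, T, G and C among the bases in `sequence`."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ATGC")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}

def duplex_percentages(strand: str) -> dict[str, float]:
    """Base composition of a duplex made of `strand` and its complement."""
    complement = strand.upper().translate(str.maketrans("ATGC", "TACG"))
    return mole_percentages(strand.upper() + complement)

# Hypothetical sequence, chosen only for illustration.
example = "ATGCGCGTTAAGGC"
composition = duplex_percentages(example)
print(composition)
print("A == T:", composition["A"] == composition["T"])
print("G == C:", composition["G"] == composition["C"])
```

Because every A in one strand is matched by a T in the other (and every G by a C), the two equalities hold for any input sequence, which is why the purine total (A + G) always comes to half of the bases.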

These astute observations allowed Watson and Crick to assemble a 3-dimensional structure of a DNA molecule with all of these essential features. This structure was published in 1953, and immediately generated much excitement, culminating in a Nobel Prize in Physiology or Medicine in 1962, awarded to Watson, Crick, and Wilkins (Franklin had died in 1958).

The key elements of this structure are:

• Double helical structure – each helix is made from the alternating deoxyribose sugar and phosphate groups derived from deoxynucleotides, which are the monomeric units that are used to make up polymeric nucleic acid molecules. Each nucleotide in each chain consists of a nitrogen base of either the purine type (adenine or guanine) or the pyrimidine type (cytosine or thymine) attached to the 1’-position of 2’-deoxyribose sugar, and a phosphate group, esterified by a phospho-ester bond to the 5’-position of the sugar.

Figure 3.3. Watson & Crick’s DNA structure. Their model consisted of a double helical structure with the sugars and phosphates making the two helices on the outside of the structure. The sugars were held together by 3’-5’-phosphodiester bonds. The bases pair on the inside of the molecule with A always pairing with T, and G always pairing with C. This pairing leads to Chargaff’s observations about bases in DNA.

Figure 3.4. Structures of purine and pyrimidine bases in DNA, and structure of 2’-deoxyribose sugar.

Figure 3.5. The building blocks of nucleic acids are nucleotides and nucleosides. Any base together with a deoxyribose sugar forms a deoxyribonucleoside, while if the sugar is ribose a ribonucleoside is formed (not shown). Addition of a phosphate on the 5’ position of the sugar forms nucleotides from nucleosides.

• The nucleotides are held together in sequence order along the length of the polynucleotide chain by 3’-5’-phosphodiester bonds, and the strands demonstrate a polarity as the 5’-OH at one end of a polynucleotide strand is distinct from the 3’-OH at the other end of the strand. Often, but not always, the 5’-strand end will have a phosphate group attached.

• The 2 polynucleotide chains of the double helix are held together by hydrogen bonds between the adenines in one strand and the thymines in the other strand, and between the guanines in one strand and the cytosines in the other strand.
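The base-pairing rule in the last bullet lends itself to a small calculation. The sketch below assumes two hydrogen bonds per A-T pair and three per G-C pair (the values given in the caption of Figure 3.6) and uses hypothetical helper names and sequences. It counts the hydrogen bonds holding a duplex together, which is a rough way to compare how strongly different regions are held.

```python
# Hydrogen bonds contributed by the base pair formed by each base
# (A pairs with T: 2 bonds; G pairs with C: 3 bonds).
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bond_count(strand: str) -> int:
    """Total hydrogen bonds between `strand` and its complementary strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())

def gc_fraction(strand: str) -> float:
    """Fraction of positions in `strand` that are G or C."""
    s = strand.upper()
    return (s.count("G") + s.count("C")) / len(s)

# Two hypothetical 8-bp regions of equal length.
at_rich = "ATATTAAT"
gc_rich = "GCGGCCGC"
print(hydrogen_bond_count(at_rich), gc_fraction(at_rich))
print(hydrogen_bond_count(gc_rich), gc_fraction(gc_rich))
```

Bond counts are only a proxy for strand stability, which is all the text claims here; real melting behavior depends on more than the number of hydrogen bonds.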

Figure 3.6. Base pairing between A and T involves two hydrogen bonds, and pairing between G and C involves 3 hydrogen bonds. This means that the forces holding strands together in G=C-rich regions are stronger than in A=T base pair-rich regions.

Figure 3.7. Strand of DNA showing the 3’,5’-phosphodiester bonds holding nucleotides together.

In order to get a uniform diameter for the molecule and have proper alignment of the nucleotide pairs in the middle of the strands, the strands must be oriented in antiparallel fashion, i.e. with the strand polarity of each strand of the double helix going in the opposite direction (one strand is 3’ -> 5’ while the other is 5’ -> 3’).

The truly elegant aspect of this solution to DNA structure is that it produces a spacing of exactly 0.34 nm between nucleotide base pairs in the molecule, with 10 base-pairs per complete turn of the helix. This corresponds precisely with Rosalind Franklin’s x-ray diffraction measurements of repeating units of 0.34 nm and 3.4 nm, and with her measurements of 2 nm for the diameter of the double helix.

It is also noteworthy that Watson and Crick suggested that the structure they proposed produced a clear method for the two strands of the DNA molecule to duplicate and maintain the fidelity of the sequence of bases along each chain as DNA was synthesized inside a cell. The structure thus provided a mechanism for the fidelity of information transfer from cell generation to cell generation.
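Two consequences of the model are easy to work through in code: the antiparallel partner of a strand is its reverse complement, and the helix dimensions follow directly from the 0.34 nm rise per base pair and 10 base pairs per turn quoted above. This is a minimal sketch; the function names and the example sequence are our own, not part of the original text.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")
RISE_PER_BP_NM = 0.34   # spacing between adjacent base pairs (from the text)
BP_PER_TURN = 10        # base pairs per complete helical turn (from the text)

def reverse_complement(strand_5_to_3: str) -> str:
    """Partner strand, also written 5'->3'.

    Complementing gives the partner read 3'->5'; reversing it puts the
    result back into the conventional 5'->3' orientation.
    """
    return strand_5_to_3.upper().translate(COMPLEMENT)[::-1]

def helix_length_nm(n_base_pairs: int) -> float:
    """Length along the helix axis of a duplex with n_base_pairs."""
    return n_base_pairs * RISE_PER_BP_NM

def helical_turns(n_base_pairs: int) -> float:
    """Number of complete turns of the helix in n_base_pairs."""
    return n_base_pairs / BP_PER_TURN

top = "ATGCCGTA"  # hypothetical sequence
print("5'-" + top + "-3'")
print("3'-" + reverse_complement(top)[::-1] + "-5'")
print(helix_length_nm(10), "nm per turn")
```

One turn of 10 base pairs spans 10 x 0.34 nm = 3.4 nm, which is the second repeat seen in the x-ray diffraction pattern.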

3.3. CHROMOSOME STRUCTURE.

The DNA inside a cell seldom exists as a simple, “naked” DNA molecule. Because DNA molecules are long linear molecules with an overall negative charge deriving from the phosphate groups making up the helices, positively charged ionic species within cells are attracted to these molecules. These positively charged molecules can be small ions such as K+ and Mg++, or they can be larger positively charged proteins, and/or other larger molecular species. These ionic interactions play an important role in producing the folding and packaging that is required to keep the large linear molecule packaged inside the microscopic cell.

In the case of proteins it is clear that the positively charged proteins can interact by general ionic interactions, but they can also interact in sequence-specific ways; i.e. specific proteins only bind to specific sequences of bases in the DNA strand. Thus, the types of molecular interactions that ionic substances, particularly proteins, have with DNA molecules play important roles in determining the expression of information that is carried in the DNA molecule. It will be obvious as we proceed through our study of genomic biology that such DNA-protein interactions are as critical to describing “genetic information” as are the base sequences of the DNA molecules themselves. This was obvious well before we obtained the first genomic DNA sequences, but has become even more apparent and significant now that we have the DNA sequences of many organisms. Thus, genomic biology is not merely the study of DNA nucleotide sequences, but involves the study of the structure of the genetic material such as chromosomes and chromatin.

3.3.1. Prokaryotic chromosome structure

Most Prokaryotes (e.g. bacteria) have a single, circular chromosome, although some have more than one chromosome, and some have linear chromosomes rather than circular chromosomes. Certainly, the most well studied bacterium, E. coli, has a single circular chromosome that can exist in either a relaxed or supercoiled state.

Supercoiling involves breaking one of the 2 circular helical strands and then rotating the broken ends either in the direction of the helix (+ supercoil) or in the opposite direction of the helix (- supercoil). As supercoiling is added to the DNA molecule it becomes “tightly” coiled (see Figure 3.8.), and therefore can be compacted more easily. This permits the packaging of the large DNA molecule into the relatively small cells in which it must exist and function.

Figure 3.8. An E. coli cell lysed open showing the expanse of its DNA molecule (left). Note that this entire molecule must be folded and packaged inside the cell in the picture. On the right are two electron micrographs showing circular DNA molecules either in a relaxed (top) or supercoiled (bottom) state.

Additional packaging results from the supercoiled DNA being carefully looped onto a scaffold of proteins, leading to an organized intracellular structure that can be easily accessible but also keeps DNA from twisting and being damaged during normal cellular processes.

Figure 3.9. Diagram of DNA organizational structure in prokaryotes. Supercoiled DNA is looped and attached to scaffold proteins.

3.3.2. Eukaryotic chromosome structure

In general Eukaryotes have much larger genomes than do Archaea and other Prokaryotes. This relationship between genome size and the complexity of the organism does not appear to hold as well for species within the Eukaryota. This lack of correlation between organismal complexity and genome size (called the C-value) is referred to as the C-value paradox (Table 3.1).

The C-value paradox results from great variation in the nature of DNA in different Eukaryotes. Some eukaryotes contain substantial amounts of DNA that appears to have limited or no known function, while others have a gene density in their genomes resembling the Prokaryotes (e.g. the yeasts and malarial parasite in the table above). The majority of Eukaryotes fall somewhere in between these extremes, but are highly variable in their DNA contents.

For now we need to appreciate that this variation in DNA content and type appears to have a relationship to chromosome structure. But the nature of this relationship will be considered further once we learn more about DNA and examine fully sequenced genomes.

Figure 3.10. Electron micrograph showing the structure of Eukaryotic DNA. The DNA molecule is barely visible, but connects the beads of proteins that the DNA wraps around, creating the appearance of beads on a string.

In eukaryotes, there are multiple levels of chromosomal organization that we will need to consider. Observations using powerful electron microscopes demonstrated that in Eukaryotes, the DNA molecules in chromosomes are organized like beads on a string. These structures have subsequently been named nucleosomes. Investigation of the nature of nucleosomes has shown that they are made from several types of basic (positively charged) proteins found in cells called histone proteins.

The basic nucleosome consists of a combination of histones H2A, H2B, H3, and H4. DNA is wrapped around these structures, producing the bead-like appearance observed in the electron microscope. Once the nucleosomes are formed, they can condense or decondense based on interaction with another histone, H1.

Figure 3.11. Nucleosomes are formed when DNA wraps around a histone complex. Nucleosomes can exist in either a more condensed or a decondensed state depending on the state of the genetic material in a cell.

During prophase of mitosis or meiosis, the nucleosome structure of chromatin further condenses into a so-called solenoid structure, which is approximately 30 nm in diameter. This solenoid form is not visible in a light microscope but can be viewed in an electron microscope. This appears to be the form DNA assumes when chromosomes condense during mitosis, but the DNA is not as accessible for use in the cell as it is during interphase, when the chromatin is decondensed.

Figure 3.12. Condensation of chromatin leads to the careful packaging of DNA into so-called solenoid structures. These structures ultimately form chromosomes.

The solenoid structures are subsequently looped and fastened to chromosome scaffold proteins, generating a structure that is visible in a light microscope that we know as a chromosome.

Figure 3.13. Loop-folding of the 30 nm solenoid structure yields a packaged DNA that is visible in a light microscope in each Eukaryotic chromosome.

While this may seem like an elaborate structure involving several sets of structural proteins, such a structure appears to be required to allow for the appropriate assembly and assortment of the genetic material during the cell cycle in mitosis. Without this structural organization, it is likely that cellular DNA would become a hopeless tangle, and cellular reproduction would be severely hampered, and would likely require too much time and effort to ultimately be successful.

3.3.3. Heterochromatin & Euchromatin.

The cell cycle affects DNA packing into chromatin, with chromatin condensing for mitosis and meiosis and then decondensing during interphase, being most dispersed at S-phase. However, cytogeneticists have observed that there can be two differently staining forms of chromatin, called euchromatin and heterochromatin. Euchromatin condenses and decondenses with the cell cycle. Euchromatin accounts for most of the active genome in dividing cells and bears most of the protein-coding DNA sequences.

Heterochromatin remains condensed throughout the cell cycle and is believed to be relatively inactive. There are two types of heterochromatin based on activity, i.e. constitutive heterochromatin that is tightly condensed in virtually all cell types, and facultative heterochromatin, which varies between cell types and/or developmental stages.

Other methods of characterizing types of DNA suggest that there are sequences of DNA that can occur in many copies in the genome. These types of sequences can occur only once in the genome or they can occur tens of thousands of times or more. Sequences can be categorized into:

• Unique-sequence DNA, present in one or a few copies per genome.
• Moderately repetitive DNA, present in a few to 10^5 copies per genome.
• Highly repetitive DNA, present in about 10^5–10^7 copies per genome.

Observations about repetitive DNA sequences as described above have been known for decades, and initially it was shown that Prokaryotic DNA was mostly unique-sequence DNA, and Prokaryotes had little or no repetitive sequences. However, Eukaryotes have a mix of unique and repetitive sequence types of DNA.

• Unique-sequence DNA includes most of the genes that encode proteins, and euchromatin is rich in unique-sequence DNA.
• Repetitive-sequence DNA includes the moderately and highly repeated sequences. They may be dispersed throughout the genome or clustered in tandem repeats. Heterochromatin is rich in moderately and highly repetitive DNA.
• Human DNA contains about 65% unique sequences, while unique-sequence DNA makes up a much lower percentage of the genome of organisms that have unexpectedly large genomes (C-values) that were discussed earlier in this section.

3.4. DNA REPLICATION.

As Watson and Crick were solving the structure of DNA, they realized the general mechanism by which the molecule could be copied while maintaining fidelity. From that beginning, interest in understanding the duplication of the DNA molecules of a cell became a subject of investigation, and led to a number of Nobel Prize awards. Beyond that, understanding DNA replication was critical to the development of the technologies needed for molecular genetics and ultimately genomic biology research.

3.4.1. DNA Replication is semiconservative.

Among the earliest experiments concerning the nature of how DNA replicates were the studies of Matthew Meselson and Franklin Stahl. Meselson, while a Ph.D. student, designed an experiment that utilized so-called “heavy” isotopes of nitrogen. Isotopes of an element consist of atoms having the same number of protons but different numbers of neutrons; heavy isotopes have more neutrons than the most common form. For example, nitrogen normally has 7 protons and 7 neutrons, giving it an atomic mass of 14 (written 14N), but it is possible to find atoms with 7 protons and 8 neutrons, having an atomic mass of 15 (written as 15N). It turns out that if you grow bacterial cells on a nitrogen source enriched in 15N, the DNA molecules purified from such cells have a greater density (they are heavier). Cells grown this way were then shifted to a normal 14N nitrogen source. By synchronizing cells, purifying DNA after each round of DNA replication, and then determining the density of the newly made DNA molecules using density gradient centrifugation, Meselson and Stahl were able to show that the first round of DNA synthesis produced molecules having a hybrid density between light and heavy DNA, while a subsequent round of DNA replication produced light and hybrid molecules. Such a pattern of 15N labeling was consistent only with the semiconservative replication of DNA.

Figure 3.14. Diagram showing the predicted outcome of conservative, semiconservative, and dispersive DNA replication. Original strands are shown in red while newly made DNA is shown in blue.

3.4.2. DNA Polymerases.

The enzyme that replicates the DNA double helix is called DNA polymerase. The enzyme is difficult to work with because only a few copies of it are needed per cell, and then they are required only in S-phase of the cell cycle. In spite of these limitations, Arthur Kornberg won the Nobel Prize in 1959 for the first purification and characterization of an enzyme that makes DNA. Kornberg’s enzyme was purified from the bacterium E. coli, and besides the enzyme 4 additional components were required to make DNA in a test tube. These factors included a template DNA (Kornberg used E. coli DNA) and the four deoxynucleotide triphosphates (dNTP), i.e. dATP, dGTP, dCTP, and dTTP. Note that these are the deoxy NTPs, and not the ribose-containing NTPs. The remaining requirements for DNA polymerase are magnesium ion (Mg++) and a primer single strand of DNA. This primer requirement involves a single strand of DNA that will form a short double-stranded region of DNA. DNA polymerase then adds nucleotides to the free 3’-end of this primer, but without the primer DNA polymerase is unable to make a DNA strand. As the nucleotides are added, the new strand grows from its 5’-end toward its 3’-end according to the sequence of the corresponding strand being copied. This copied strand is referred to as the template strand.

Figure 3.15. Note that the template strand is read from its 3’-end to its 5’-end while the antiparallel, new DNA strand is made from the 5’-end to the 3’-end.

All DNA polymerases studied to date make DNA using the general principles established for Kornberg’s enzyme, but there are significant differences between them in other respects. For example, in E. coli there are five different DNA polymerases. Kornberg’s enzyme is now known as DNA polymerase I, but there are also DNA polymerases II, III, IV, and V. DNA polymerases II, IV, and V are not involved in the DNA replication process, and they have specialized functions in repairing damaged DNA under specific circumstances. DNA polymerases I and III are the DNA polymerases involved in the replication of cellular DNA. Both of these DNA polymerases contain a 3’ -> 5’ exonuclease activity that is involved in proof-reading the recently made DNA strand and removing any mistakes that are made. Only DNA polymerase I has a 5’ -> 3’ exonuclease activity, and we will visit this function again below when the role of DNA polymerase I in DNA replication is considered.

3.4.3. Initiation of replication.

Replication initiates at a specific sequence in the genome that is often called an origin of replication. E. coli has one origin, called oriC, where replication starts when the strands of the helix are forced apart to expose the bases, creating a replication bubble with two replication forks. Replication is usually bidirectional from the origin, using the two forks to enlarge the bubble in both directions. oriC has the following properties:

• A minimal sequence of about 245 bp required for initiation.
• Three copies of a 13-bp AT-rich sequence.
• Four copies of a 9-bp sequence.

Figure 3.16. Initiation of DNA replication in E. coli at oriC. Note the 9 and 13 bp repeats where DnaA binds and activates replication through the action of DNA helicase.

From a series of in vitro studies it has been shown in E. coli that the following steps are involved in initiating replication:

1) Initiator proteins attach to oriC (E. coli’s initiator protein is the DnaA protein derived from the dnaA gene).
2) DNA helicase (from the dnaB gene) binds initiator proteins on the DNA and denatures the AT-rich 13-bp region using ATP as an energy source.
3) DNA primase (from the dnaG gene) binds helicase to form a primosome, which synthesizes a short (5–10 nt) RNA primer.

3.4.4. DNA Replication is Semidiscontinuous

When DNA denatures (strands separate) at the ori, replication forks are formed. DNA replication is usually bidirectional; we will consider events at just one replication fork, but don’t forget that a similar set of events is occurring at the other replication fork in the bubble. The events occurring at each fork are:

1) Single-strand DNA-binding proteins (SSBs) bind the ssDNA formed by helicase, preventing reannealing.
2) Primase synthesizes a primer on each template strand.
3) DNA polymerase III adds nucleotides to the 3’-end of the primer, synthesizing a new strand complementary to the template and displacing the SSBs. DNA is made in opposite directions (at each fork) on the two template strands since DNA polymerase only adds nucleotides to the free 3’-end.
4) On one template strand the new strand is made 5’-to-3’ in the same direction as movement of the replication fork, i.e. DNA polymerase III is continuously moving toward the fork on one strand of the bubble at each fork. This defines the “leading strand”. On the other strand the new strand must be made in the opposite direction, as it too must be made 5’ -> 3’.
5) This means that on this “lagging strand” primase must add the RNA primer very close to the replication fork, and DNA polymerase III moves away from the fork rather than toward the fork as it does on the leading strand.
6) The leading strand needs only one primer and continuously makes the new DNA strand, while on the lagging strand a series of RNA primers is required and only a limited number of DNA nucleotides are added by DNA polymerase III before the previously made fragment is encountered.
7) Thus, the leading strand is synthesized continuously, while the lagging strand is synthesized discontinuously in the form of shorter pieces of DNA with interspersed RNA primers called Okazaki fragments. DNA replication is therefore semidiscontinuous.
8) As the bubble enlarges and DNA helicase denatures (untwists) the strands, this causes tighter winding in other parts of the circular chromosome. A protein called DNA gyrase relieves the tension created in the molecule.
9) As Okazaki fragments accumulate on the lagging strand, DNA polymerase I binds, and its 5’ -> 3’ exonuclease activity removes the RNA primers and replaces them with DNA nucleotides.
10) The DNA fragments lacking RNA primers are now fastened together using an enzyme called DNA ligase that closes the remaining gaps on the lagging strand.

Figure 3.17. DNA replication at a replication fork showing continuous DNA synthesis on the lower strand and discontinuous DNA synthesis on the upper strand where Okazaki fragments are made.

Figure 3.18. Removal of the RNA primers by the 5’ -> 3’ exonuclease of DNA polymerase I, and replacement with DNA nucleotides on the lagging strand.

Figure 3.19. DNA ligase joins an opening in a DNA strand, remaking a complete phosphodiester-linked polynucleotide chain.

3.4.5. DNA replication in Eukaryotes.

Enzymes of eukaryotic DNA replication are not as well characterized as their prokaryotic counterparts. Fifteen DNA polymerases are known in mammalian cells, for example. Three DNA polymerases are used to replicate nuclear DNA. Pol α extends the 10-nt RNA primer by about 30 nt. Pol δ and Pol ε extend the RNA/DNA primers, one on the leading strand and the other on the lagging strand, but it is not clear which synthesizes which.

Primer removal differs from that in prokaryotes. Pol δ continues extension of the newer Okazaki fragment, displacing the RNA and producing a flap that is removed by nucleases, thus allowing the Okazaki fragments to be joined by DNA ligase. Other DNA polymerases replicate mitochondrial or chloroplast DNA, or they are used in DNA repair. These are all similar to the prokaryotic system described in detail above.

3.4.6. Replicating ends of chromosomes.

Replicating the ends of chromosomes in organisms without circular chromosomes presents unique problems. Removal of primers at the 5’-end of the newly made strand will produce shorter strands that cannot be extended with existing DNA polymerases, and if the gap is not addressed chromosomes would become shorter each time DNA replicates. Thus a new mechanism for the completion of the ends of the chromosome is required. This is accomplished using the telomerase system.

Figure 3.20. The dilemma of how the 3’ overhangs are replicated at each end of the chromosome to duplicate a chromosome and make sister chromatids.

Most eukaryotic chromosomes have short, species-specific sequences tandemly repeated at their telomeres. It has been shown that chromosome lengths are maintained by telomerase, which adds repeats without using the cell’s regular replication machinery. In humans, the telomere repeat sequence is 5’-TTAGGG-3’.

Telomerase, an enzyme containing both protein and RNA, includes an 11-nt RNA sequence used to synthesize the new telomere repeat DNA. Using an RNA template to make DNA, telomerase functions as a reverse transcriptase called TERT (telomerase reverse transcriptase). The 3’-end of the telomerase RNA contains the sequence 3’-CAUC, which binds the 5’-GTTAG-3’ overhang on the chromosome, positioning telomerase to complete its synthesis of the GGGTTAG telomere repeat. Additional rounds of telomerase activity lengthen the chromosome by adding telomere repeats. Ends of telomere DNA usually loop back to form a D-loop.

After telomerase adds telomere sequences, chromosomal replication proceeds in the usual way. Any shortening of the chromosome ends is compensated for by the addition of the telomere repeats.

Telomere length may vary, but organisms and cell types have characteristic telomere lengths, resulting from many levels of regulation of telomerase. Mutants affecting telomere length have been identified, and data show that shortening of telomeres eventually leads to cell death. Loss of telomerase activity results in limited rounds of cell division before cell death.
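The end-replication problem and its repair by telomerase can be sketched as simple string and integer arithmetic. The human repeat TTAGGG comes from the text; everything else below (function names, the example sequence, the telomere length and the loss per round of replication) is a hypothetical illustration, not measured data.

```python
TELOMERE_REPEAT = "TTAGGG"  # human telomere repeat, as given in the text

def add_telomere_repeats(strand_5_to_3: str, n_repeats: int) -> str:
    """Append repeats to the 3' end of a strand, as telomerase does."""
    return strand_5_to_3 + TELOMERE_REPEAT * n_repeats

def count_terminal_repeats(strand_5_to_3: str) -> int:
    """Number of intact telomere repeats at the 3' end of the strand."""
    count = 0
    while strand_5_to_3.endswith(TELOMERE_REPEAT):
        strand_5_to_3 = strand_5_to_3[: -len(TELOMERE_REPEAT)]
        count += 1
    return count

def rounds_until_telomere_lost(telomere_nt: int, loss_per_round_nt: int) -> int:
    """Rounds of replication a telomere survives without telomerase.

    loss_per_round_nt is a hypothetical parameter: the nucleotides lost
    from the chromosome end each time the DNA is replicated.
    """
    if loss_per_round_nt <= 0:
        raise ValueError("loss_per_round_nt must be positive")
    return telomere_nt // loss_per_round_nt

chromosome_end = "ACGTAC"  # hypothetical non-telomeric sequence
extended = add_telomere_repeats(chromosome_end, 4)
print(extended, count_terminal_repeats(extended))
print(rounds_until_telomere_lost(telomere_nt=600, loss_per_round_nt=50))
```

The last function mirrors the statement that loss of telomerase activity allows only a limited number of rounds of division: with a fixed loss per round and no repeats added back, the telomere is eventually exhausted.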

Figure 3.21. Replication of chromosome ends using telomerase.
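Looking back at Section 3.4.1, the density pattern seen by Meselson and Stahl can be reproduced with a few lines of bookkeeping. The sketch below is a toy model under stated assumptions: each duplex is a pair of strand labels, "H" for a 15N strand and "L" for a 14N strand, cells start fully heavy, and all new strands are made from 14N. The function names are our own.

```python
from collections import Counter

def replicate(duplexes):
    """One round of semiconservative replication in 14N medium.

    Each parental strand is kept and paired with a newly made light strand.
    """
    next_generation = []
    for strand_a, strand_b in duplexes:
        next_generation.append((strand_a, "L"))
        next_generation.append((strand_b, "L"))
    return next_generation

def classify(duplex):
    """Density class of a duplex from its number of heavy strands."""
    return {2: "heavy", 1: "hybrid", 0: "light"}[duplex.count("H")]

def density_profile(rounds):
    """Counts of heavy, hybrid and light duplexes after `rounds` of replication."""
    population = [("H", "H")]  # start from fully 15N-labeled DNA
    for _ in range(rounds):
        population = replicate(population)
    return Counter(classify(duplex) for duplex in population)

for n in range(4):
    print(n, dict(density_profile(n)))
```

After one round every duplex is hybrid, and after two rounds half are hybrid and half are light, matching the pattern described in the text. A conservative model would instead keep a heavy band after the first round, which is why the observed result ruled it out.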

3.5. TRANSCRIPTION.

In cells the genetic information carried in the DNA nucleotide sequence becomes functional information that gives characteristics to cells, ultimately specifying traits. This conversion of DNA sequence information into functional information begins with the creation of cellular RNAs from one of the two strands of DNA sequence. This process is called transcription. The mechanism by which these cellular RNAs are transcribed from DNA will be presented in this section, while the regulation of these processes will be covered later.
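As a preview of the mechanism, copying one DNA strand into RNA can be written as a base-by-base substitution. This is a minimal sketch with hypothetical names and a hypothetical sequence. It assumes the usual convention that the template strand is read 3’ -> 5’ while the RNA is made 5’ -> 3’, with U in RNA pairing where T would appear in DNA.

```python
# DNA template base -> complementary RNA base.
DNA_TO_RNA = str.maketrans("ATGC", "UACG")

def transcribe(template_3_to_5: str) -> str:
    """RNA (written 5'->3') complementary to a template written 3'->5'."""
    return template_3_to_5.upper().translate(DNA_TO_RNA)

def rna_from_coding_strand(coding_5_to_3: str) -> str:
    """Same RNA obtained from the non-template strand: replace T with U."""
    return coding_5_to_3.upper().replace("T", "U")

template = "TACGGT"  # hypothetical template strand, written 3'->5'
coding = "ATGCCA"    # its partner strand, written 5'->3'
print(transcribe(template))
print(rna_from_coding_strand(coding))
```

Both calls give the same RNA, which shows why an RNA transcript has the same sequence as the non-template DNA strand apart from U replacing T.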

3.5.1. Cellular RNAs are transcribed from DNA

Ribosomal RNAs

The most abundant type of RNA in most cells is a structural component of the cellular particle that is involved in the synthesis of proteins, called a ribosome. Since ribosomes have 2 subunits, a large subunit and a small subunit, they also have two major types of ribosomal RNA. These are described in detail in Table 3.2. In addition to the largest ribosomal RNAs there are smaller ribosomal RNAs as well. Note that the size and nature of all of these ribosomal RNAs differ between Prokaryotes and Eukaryotes.

In prokaryotes, the small 30S ribosomal subunit contains the 16S ribosomal RNA. The large 50S ribosomal subunit contains two rRNA species (the 5S and 23S ribosomal RNAs). Bacterial 16S, 23S, and 5S rRNA genes are typically organized as a co-transcribed unit (operon). There may be one or more copies of the operon dispersed in the genome (for example, Escherichia coli has seven). Archaea contain either a single rDNA operon or multiple copies of the operon.

In Eukaryotes, the cytoplasmic small ribosomal subunit (40S) contains an 18S rRNA, while the large ribosomal subunit (60S) contains a 28S, a 5S, and a 5.8S rRNA. As in Prokaryotes, these rRNAs are structural components of ribosomes, where they perform essential functions. In mammals, the 28S, 5.8S, and 18S rRNAs are encoded by a single nuclear transcription unit (45S). Two internally transcribed spacers separate the 3 rRNA species in the 45S transcript. Generally, there are many copies of the 45S rDNAs organized in clusters throughout the nuclear genome. In humans, for example, each cluster has 300-400 repeats. 5S rDNA is not made as part of the 45S transcript, but occurs in tandem arrays (~200-300 5S genes) interspersed in the mammalian genome independently of the 45S rDNA genes.

Mammalian mitochondria have only two mitochondrial rRNA molecules (12S and 16S) and do not contain 5S rRNA. These ribosomal RNAs are transcribed from the mitochondrial genome. This is also the case for plant mitochondrial rRNAs, although plant mitochondria contain more prokaryotic-like ribosomal RNAs, i.e. a 16S, a 26S, and a 5S rRNA. Plants also contain chloroplast ribosomal RNAs (16S, 23S, and 5S) produced by transcription from the chloroplast genome.

Messenger RNAs – mRNAs

All organisms (and mitochondria and chloroplasts) produce a type of RNA that codes for the amino acid sequence of proteins. This RNA is a copy of the DNA sequence of the gene and is transcribed from one of the two DNA strands of each gene. By reproducing the DNA sequence as an mRNA copy, the sequence information for the gene is faithfully maintained, allowing the generation of many gene “copies” that can be used to produce even more protein copies from each gene.

Transfer RNAs – tRNA

Transfer RNAs (tRNAs) are smaller (~90 nt) RNA molecules that are transcribed from genes scattered throughout both Prokaryotic and Eukaryotic genomes, including mitochondrial and chloroplast genomes. These molecules are the “decoding” molecules that determine which amino acids are put in proteins in the order specified by the nucleotide sequence in the mRNA. They are highly structured RNA molecules, and there is at least one, often several, tRNA for each of the twenty amino acids found in proteins. Each tRNA is processed from a transcribed precursor-tRNA molecule coded for by specific tRNA genes, and typically there is one tRNA produced per tRNA gene.

In Eukaryotes, tRNA genes are scattered across all chromosomes, and there are separate sets of tRNA genes in each of the organelle genomes present in eukaryotes.

Other Non-protein-coding Transcribed RNAs

More recently, additional types of RNAs that perform vital functions in cells have been described. Most of these were described in Eukaryotes once eukaryotic genomes had been described and characterized.

Small nuclear RNAs (snRNA) are smaller RNAs (typically ~150 nt) transcribed from nuclear DNA in eukaryotic cells. snRNAs are structurally part of small nuclear ribonucleoprotein particles (snRNPs) that are involved in processing mRNAs in the nucleus of cells. Typically there are but a handful of different snRNAs made in each species, and these are highly conserved among eukaryotes.

Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that function to guide modification of other types of RNA, mostly rRNA, tRNA, and snRNA. One of the main functions of snoRNAs involves modification of the 45S ribosomal precursor so that it can be further processed to generate the 18S, 5.8S, and 28S rRNAs.

Small regulatory RNAs (srRNAs) are found in prokaryotes, where they are involved in the regulation of gene expression, but they are mostly known for the role they play in transcriptional, posttranscriptional, and translational control of gene expression in Eukaryotes. These molecules are an array of 20-30 nt RNAs transcribed in various ways from genes in the genomes of organisms. Note that although there are primarily 2 types of srRNAs, microRNAs (miRNA) and short interfering RNAs (siRNA), these types are specific to certain organisms, and there are likely thousands of genes transcribed for such srRNAs.

3.5.2. RNA polymerases catalyze transcription

RNA polymerase is the enzyme responsible for copying a DNA sequence into an RNA sequence during the process of transcription. As a complex molecule composed of protein subunits, RNA polymerase controls the process of transcription, during which the information stored in a molecule of DNA is copied into a molecule of cellular RNA. The detailed mechanism of how RNA polymerase works is shown in Figure 3.22.

Figure 3.22. The chemical reaction catalyzed by RNA polymerases showing both the reactants and products and the specificity of base pair addition. Note the antiparallel nature of the RNA strand to the DNA strand being transcribed. RNA polymerase makes a phosphodiester bond between the 5’-phosphate group closest to the ribose sugar and the 3’-OH on the 3’-end of the growing strand of RNA.

Multisubunit RNA polymerases exist in all species, but the number and composition of these proteins vary across taxa. For instance, bacteria contain a single type of RNA polymerase that transcribes mRNA, tRNA, and all rRNAs. Eukaryotes contain three (animals and fungi) to five (plants) distinct types of RNA polymerases. Each of these RNA polymerases transcribes different species of RNA, as shown in Table 3.3.

In spite of these differences, there are striking similarities among transcriptional mechanisms for all RNA polymerases. For example, transcription is divided into three steps for both bacteria and eukaryotes: initiation, elongation, and termination. The process of elongation is highly conserved between bacteria and eukaryotes, but initiation and termination are somewhat different.

All species require a mechanism by which transcription can be regulated in order to achieve spatial and temporal changes in gene expression. Proteins that interact with the core RNA polymerase, and that recognize specific sequences in the DNA, mediate these initial regulatory steps during transcription initiation. However, the types and nature of these interacting proteins are quite distinct in Prokaryotes compared to Eukaryotes. This leads to a discussion of how transcription initiation at each gene locus takes place in both Prokaryotes and Eukaryotes.

3.5.3. Transcription in Prokaryotes

As a model of Prokaryotic gene regulation, the bacterium Escherichia coli will be used. This model is similar to nearly all Prokaryotes.

A prokaryotic gene is a DNA sequence in the chromosome. The gene has three regions, each with a function in transcription (see Figure 3.23). These are:

1) A promoter sequence that attracts RNA polymerase to begin transcription at a site specified by the promoter. Some genes use one strand of DNA as the template; other genes use the other strand.
2) The transcribed sequence, called the RNA-coding sequence. The sequence of this DNA corresponds with the RNA sequence of the transcript.
3) A terminator region that specifies where transcription will stop.

Figure 3.23. Prokaryotic genes all have regions upstream (toward the 5’-end of the mRNA) of the protein coding gene and regions downstream (toward the 3’-end of the mRNA). These regions are located at the 3’-end (promoter) and the 5’-end (terminator) of the template strand of DNA. Typically the nucleotide where RNA polymerase begins transcribing is designated the +1 nucleotide position, and sequences in the promoter are designated as (-) nt positions.
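The correspondence between the RNA-coding sequence and its transcript, and the antiparallel pairing noted for Figure 3.22, can be illustrated with a short Python sketch. It is an illustration only and not part of the original text: the function names and the six-nucleotide example sequence are hypothetical. The template strand is written 3’ to 5’, so the RNA read off position by position comes out 5’ to 3’.

```python
# Minimal sketch of transcription base pairing (illustrative, not a bioinformatics tool).
# Maps each template-strand base to the RNA base that RNA polymerase adds opposite it.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())


def rna_from_coding_strand(coding_5_to_3: str) -> str:
    """The RNA matches the non-template (coding) strand, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


# Hypothetical example: the two strands of a short stretch of RNA-coding sequence.
coding = "ATGGCC"    # 5'-ATGGCC-3'
template = "TACCGG"  # 3'-TACCGG-5'
rna = transcribe(template)
print(rna)  # AUGGCC
```

Both routes give the same transcript, which is why a gene sequence is usually reported as the non-template strand: it can be read directly as the RNA by substituting U for T.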

The process of transcription initiation in E. coli is shown in Figure 3.24. The process involves two DNA sequences centered at -35 bp and -10 bp upstream from the +1 start site of transcription in the promoter region of the gene. These two consensus sequences (in E. coli) are 5’-TTGACA-3’ at the -35 nt region and 5’-TATAAT-3’ at the -10 region (previously known as a Pribnow box), but they can vary according to the organism and the gene within the organism. Transcription initiation requires the RNA polymerase holoenzyme (only one type is found in bacteria) to bind to the promoter DNA sequence. Holoenzyme consists of:

1) Core enzyme of RNA polymerase, containing five polypeptides (two alpha, one beta, one beta’ and an omega; written as α2ββ’ω).
2) One of several sigma factors (σ-factor) that binds the core enzyme and confers the ability to recognize specific gene promoters.

RNA polymerase holoenzyme binds the promoter in two steps (Figure 3.24) that involve the sigma factor. First, it loosely binds to the -35 sequence of dsDNA (closed promoter complex; Figure 3.24a). Second, it binds tightly to the -10 sequence (Figure 3.24b), untwisting about 17 bp of DNA at the site. At this point RNA polymerase is in position to begin transcription (open promoter complex).

Figure 3.24. Prokaryotic (E. coli) transcription initiation. a) RNA polymerase holoenzyme is “recruited” to the promoter by a specific σ-factor (sigma factor); b) strands of the DNA are separated, exposing the template strand for copying; c) nucleotides are polymerized as RNA polymerase moves down the strand, and σ-factor leaves the complex; d) elongation continues, the newly made mRNA exits the enzyme, and the transcription “bubble” moves along the DNA.

Promoters often deviate from the consensus sequences at -35 and -10, and the associated genes will show different levels of transcription, corresponding with the σ-factor’s ability to recognize their sequences. E. coli has several sigma factors with important roles in gene regulation. Each sigma can bind a molecule of core RNA polymerase and guide its choice of genes to transcribe, but has different affinity for specific promoters.

Most E. coli genes have a σ70 promoter, and σ70 is usually the most abundant σ-factor in the cell. σ70 recognizes the sequence TTGACA at -35 and TATAAT at -10. Other sigma factors may be produced in response to changing conditions, and each can bind the core RNA polymerase, enabling holoenzyme to recognize different promoters. An example is σ32, which arises in response to heat shock and other forms of stress and recognizes a sequence at -39 bp and -15 bp. E. coli has additional sigma factors with various roles (Table 3.4), and other bacterial species also have multiple similar and additional sigma factors.

TABLE 3.4. E. coli σ-factors and their function
σ-factor: Function
σ70 (rpoD) = σA: the "housekeeping" sigma factor, also called the primary sigma factor; transcribes most genes in growing cells. Every cell has a “housekeeping” sigma factor.
σ19 (fecI): the ferric citrate sigma factor; regulates the fec gene for iron transport.
σ24 (rpoE): the extracytoplasmic/extreme heat stress sigma factor.
σ28 (rpoF): the flagellar sigma factor.
σ32 (rpoH): the heat shock sigma factor; it is turned on when the bacteria are exposed to heat. Due to its higher expression, the factor will bind with high probability to the polymerase core enzyme. As a result, other heat shock proteins are expressed, which enable the cell to survive higher temperatures. Some of the enzymes that are expressed upon activation of σ32 are chaperones, proteases, and DNA-repair enzymes.
σ38 (rpoS): the starvation/stationary phase sigma factor.
σ54 (rpoN): the nitrogen-limitation sigma factor.

Many bacterial genes are controlled by regulatory proteins that interact with regulatory sequences near the promoter. There are two classes of regulatory proteins: activators, which stimulate transcription by facilitating RNA polymerase activity, and repressors, which inhibit transcription by decreasing RNA polymerase binding or elongation of RNA.

Once initiation is completed, RNA synthesis begins, and the sigma factor is released and reused for other initiations (Figure 3.24c). Core enzyme completes the transcript. Core enzyme untwists the DNA helix locally, allowing a small region to denature. Newly synthesized RNA forms an RNA–DNA hybrid, but most of the transcript is displaced as the DNA helix reforms (Figure 3.24d).

Terminator sequences are used to end transcription. In E. coli there are two types of transcript termination:

1) Rho-independent (ρ-independent) or type I terminators (Figure 3.25, upper) have twofold symmetry that would allow a hairpin loop to form. The palindrome is followed by 4–8 U residues in the transcript, and when these sequences are transcribed, they form a stem-loop structure and cause chain termination.
2) Rho-dependent (ρ-dependent) or type II terminators (Figure 3.25, lower) require the protein ρ for termination. Rho binds to the C-rich sequence in the RNA upstream of the termination site and moves with the transcript until encountering a stalled polymerase. It then acts as a helicase, using ATP hydrolysis for energy to move along the transcript and destabilize the RNA–DNA hybrid at the termination region, terminating transcription.

Figure 3.25. Simplified schematics of the mechanisms of prokaryotic transcriptional termination. In Rho-independent termination, a terminating hairpin forms on the nascent mRNA, interacting with the NusA protein to stimulate release of the transcript from the RNA polymerase complex (top). In Rho-dependent termination, the Rho protein binds at the upstream rut site, translocates down the mRNA, and interacts with the RNA polymerase complex to stimulate release of the transcript.

3.5.4. Transcription in Prokaryotes – polycistronic mRNAs from operons

While we have considered the structure of a prokaryotic gene as having a promoter, a coding region, and a termination region (see Figure 3.23), in most cases multiple protein-coding regions are under the control of a single promoter. This genetic structure is referred to as an operon, and the mRNA transcribed from each operon is in fact an RNA capable of producing multiple peptides. This type of mRNA, typical of prokaryotes and of Eukaryotic mitochondria and chloroplasts, is referred to as a polycistronic mRNA. Thus, the proteins that regulate gene expression in prokaryotes by binding to promoter and regulatory regions of genomes regulate the production of multiple peptides simultaneously. Typically, these peptides are functionally related, e.g. the proteins required to catabolize lactose as a carbon source [lac operon] (see Figure 3.26), or the proteins required to make the amino acid tryptophan [trp operon] (see Figure 3.27).

The lac operon is an example of an inducible operon: the repressor protein does not bind to the operator and stop transcription in the presence of the effector (lactose). The tryptophan operon is an example of a repressible operon: the repressor protein only binds to the operator in the presence of the effector molecule (tryptophan). Thus, using similar types of regulatory proteins and genes, and similar operon structure, almost any type of gene regulation can be obtained.

Additionally, it should be noted that the proteins for related critical cellular functions can be coordinately regulated as a consequence of the production of polycistronic mRNAs.

Figure 3.26. The lac operon in E. coli. Three lactose metabolism genes (lacZ, lacY, and lacA) are organized together in a cluster called the lac operon. The coordinated transcription and translation of the lac operon structural genes is controlled by a shared promoter, operator, and terminator. A lac regulator gene (lacI) with its separate promoter is found just outside the lac operon. The lacI gene produces a regulatory protein, the lacI protein, that binds to the “inducer”, which is lactose (or a derivative, allolactose), when it is present in a cell. The lacI protein also can bind to a region of the operon between the lac promoter and the structural genes referred to as the lac operator (lacO). In the absence of lactose (allolactose) the lacI protein tightly binds to the operator and prevents RNA polymerase from transcribing the polycistronic mRNA. When lactose binds to the lacI protein, the lacI protein cannot bind to lacO, and RNA polymerase proceeds to produce the polycistronic mRNA corresponding to the lacZ, lacY, and lacA genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. All rights reserved.

3.5.5. Beyond Operons - Modification of expression of prokaryotic genes

Additional regulation of operons is often used to produce further fine-tuning of transcription. This can vary with each operon in Prokaryotic genomes. A common type of additional regulation has been shown for the lac operon and many other catabolic operons.

Glucose is the preferred carbon source in E. coli. In the presence of glucose, lactose will not be utilized.
This means that if abundant supplies of glucose and lactose are both available, the lac operon will not be induced until the glucose is used up. This phenomenon is often referred to as catabolite repression, and the critical components of catabolite repression in the lac operon are shown in Figure 3.28.

Figure 3.27. The tryptophan operon of E. coli consists of five structural genes (trpE, trpD, trpC, trpB, and trpA) with a common promoter, operator, and terminator. A separate promoter regulates the trpR regulatory protein (trp repressor). Transcription of the trp operon produces a polycistronic mRNA that contains a leader peptide and coding sequences for the 5 structural genes that produce the 5 enzymes required to make tryptophan. Since tryptophan is an amino acid required for cell growth, the trp operon is “repressed” when cells have access to an abundant supply of tryptophan (panel A), and becomes “derepressed” when cells are starving for tryptophan (panel B). A) Tryptophan present, repressor bound to operator, operon repressed. When complexed with tryptophan, the repressor protein binds tightly to the trp operator, thereby preventing RNA polymerase from transcribing the operon structural genes. B) Tryptophan absent, repressor not bound to operator, operon derepressed. In the absence of tryptophan, the free trp repressor cannot bind to the operator site. RNA polymerase can therefore move past the operator and transcribe the trp operon structural genes, giving the cell the capability to synthesize tryptophan.

When the concentration of intracellular glucose is low (Figure 3.28, upper panel), the levels of the signal molecule cAMP are high, and cAMP binds to CAP protein. The association between RNA polymerase and promoter DNA is enhanced when the CAP-cAMP complex is present. Enhanced RNA polymerase binding leads to a high rate of transcription (provided that the operator is free) and translation of the lac operon polycistronic mRNA. The resulting mRNA transcripts are translated into the enzymes beta-galactosidase, permease, and transacetylase, and these enzymes are used to break down lactose into glucose and galactose. The latter can subsequently be converted into glucose.

When the glucose concentration in the cell is high (Figure 3.28, lower panel), low concentrations of cAMP result in decreased binding of cAMP to CAP. Therefore, the cAMP-CAP complex is not bound to the bacterial DNA, and as a result, neither is RNA polymerase. This lowers the rate of transcription, and polycistronic mRNA production is decreased for the lacZ, lacY, and lacA genes. The absence of these proteins reduces glucose production from lactose, leading to the use of the available glucose prior to the use of any lactose. The interaction of CAP with DNA and with cAMP directly regulates the production of mRNA. Some type of interaction of proteins with regulatory regions in the DNA mediates the phenomenon of catabolite repression in operons associated with carbon source utilization in prokaryotes.

In anabolic operons (typical of amino acid synthesis), a phenomenon of additional regulation referred to as attenuation has been documented. The example most commonly considered involves the trp operon discussed above.
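Returning to the lac operon, the combined effect of the lacI repressor and the CAP-cAMP complex can be summarized as a small truth table. The Python sketch below is a qualitative illustration under simplifying assumptions, not a quantitative model: the function name and the labels "off", "low", and "high" are hypothetical, and real cells show graded, leaky expression rather than three discrete states.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon transcription level for one combination of sugars."""
    # Without lactose (allolactose), the lacI protein stays bound to the operator.
    repressor_on_operator = not lactose_present
    # Low glucose -> high cAMP -> the CAP-cAMP complex binds and aids RNA polymerase.
    cap_camp_bound = not glucose_present

    if repressor_on_operator:
        return "off"  # RNA polymerase is blocked regardless of CAP-cAMP
    return "high" if cap_camp_bound else "low"


# Print all four combinations of the two sugars.
for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination of lactose present and glucose absent gives the "high" state, which matches the observation that lactose is not used until the glucose has been consumed.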

Figure 3.28. Diagram showing the major effects of low glucose (upper panel) and high glucose (lower panel) on the expression of lac operon genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. (New York: W. H. Freeman and Company), 446. All rights reserved.

The leader sequence in the polycistronic mRNA of the trp operon contains several trp codons, and can form 3 different stem-loop structures. Depending on the amount of available tryptophan, one of two structures can be produced (Figure 3.29). One structure leads to termination of transcription in the leader sequence when trp is abundant. The other structure does not terminate transcription, and the polycistronic mRNA is produced.

Figure 3.29. Attenuation of the trp operon. The diagram at the center shows the general folding of the leader sequence of the trp polycistronic mRNA and labeling of strands. The mRNA is folded in four parallel strands connected at the bottom by two small hairpin loops between strands 1 and 2 and strands 3 and 4 and by one large hairpin loop at the top between strands 2 and 3. In the structure on the left, strands 1 and 2 and strands 3 and 4 are stabilized by base pairing. This structure terminates transcription of the trp operon in the presence of high tryptophan. In contrast, strands 2 and 3 are stabilized by base pairing in the structure on the right, which allows transcription of the trp operon to continue in the presence of low tryptophan. © 1981 Nature Publishing Group Yanofsky, C. Attenuation in the control of expression of bacterial operons. Nature 289, 753 (1981). All rights reserved.

Several amino acid synthetic operons (e.g. phenylalanine, histidine, leucine, threonine, and isoleucine-valine) demonstrate this same type of attenuation. Consequently, this mechanism is relatively widespread as a means of modulating and fine-tuning pathways for amino acid biosynthesis.

3.5.6. Transcription in Eukaryotes

Although transcription in Eukaryotes follows the general principles outlined above for Prokaryotes, there are many specific details that are different. Recall that there are as many as five Eukaryotic RNA polymerases. While each of these transcribes different types of RNA, they are all multisubunit RNA polymerases that function in related ways. The mechanism of the important RNA polymerase II that produces mRNAs will be described here, but each of the 5 has similar mechanisms for initiation, elongation, and termination of transcription.

Eukaryotic mRNAs are nearly always monocistronic mRNAs with a general structure as shown in Figure 3.30. The key transcribed features are a 5’-UTR (untranslated region), a coding region, and a 3’-UTR. Other nontranscribed features that are typical of mRNAs in Eukaryotic cells include a 5’-Cap structure and a poly-A tail that will be described in more detail below.

Figure 3.30 panels, top to bottom: DNA (upstream enhancers, promoter with TATA box, exons 1-3, introns 1-2); gene transcription by RNA polymerase II; primary transcript; nuclear processing (5’ capping and poly-A tail addition); pre-mRNA (7-methyl G 5’ cap, 3’ poly-A tail); nuclear processing (intron removal and transport to the cytoplasm); final mRNA (5’ cap, 5’ UTR, protein coding region, 3’ UTR, 3’ poly-A tail).

Figure 3.30. Diagram showing the elements and structure of a typical eukaryotic mRNA-producing gene. Note that a primary transcript is produced which is subsequently modified by the addition of a 7-methyl guanosine (Cap) and the poly-A tail. Subsequently, introns are spliced from the transcript to make a finished mRNA ready to exit the nucleus.

Promoters in many Eukaryotes have been analyzed either by the use of mutations directed within promoter sequences or by comparative analysis of multiple genes from different organisms. These studies have revealed that there are two types of elements found in Eukaryotic promoters: core promoter elements and promoter-proximal elements.

Core promoter elements are located near the transcription start site and specify where transcription begins. Examples include:

1) The initiator element (Inr), a pyrimidine-rich sequence that spans the transcription start site;
2) The TATA box (also known as a TATA element or Goldberg–Hogness box) at -30 nt (full sequence is TATAAAA). This element aids in local DNA denaturation and sets the start point for transcription.

Promoter-proximal elements are required for high levels of transcription. They are further upstream from the start site, at positions between -50 and -200. These elements generally function in either orientation. Examples include:

1) The CAAT box, located at about -75.
2) The GC box, consensus sequence GGGCGG, located at about -90.

Various combinations of core and proximal elements are found near different genes. Promoter-proximal elements are key to understanding the rate at which transcription initiation occurs and thus the level of gene expression.

Eukaryotic transcription initiation requires assembly of RNA polymerase II and binding of general transcription factors (GTFs) on the core promoter at the TATA box (see Figure 3.31), forming a preinitiation complex (PIC). Note that the PIC is sometimes referred to simply as the transcription initiation complex. GTFs are needed for initiation by all RNA polymerases and are numbered to match their corresponding RNA polymerase and lettered in the order of discovery (e.g., TFIID was the fourth GTF discovered that works with RNA polymerase II).
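The promoter elements listed above can be located in a DNA sequence with a simple string search. The Python sketch below is a rough illustration, not a promoter-prediction method: the function names and the toy sequence are hypothetical, the motif "CAAT" is a simplification of the CAAT box (the text gives no consensus for it), and exact matching ignores the variation seen in real promoters. Positions are reported relative to the +1 start site, with no position 0, and the promoter-proximal elements are searched in both orientations.

```python
import re

# Motifs taken from the text; "CAAT" is a simplifying assumption for the CAAT box.
ELEMENTS = {"TATA box": "TATAAAA", "CAAT box": "CAAT", "GC box": "GGGCGG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_position(index: int, tss_index: int) -> int:
    """Convert a 0-based index to promoter numbering (-1 is just upstream of +1)."""
    offset = index - tss_index
    return offset if offset < 0 else offset + 1


def find_elements(seq: str, tss_index: int) -> list:
    """List (element, orientation, position of first base) for every exact match."""
    hits = []
    for name, motif in ELEMENTS.items():
        orientations = {"forward": motif}
        if name != "TATA box":  # proximal elements can work in either orientation
            orientations["reverse"] = reverse_complement(motif)
        for orientation, pattern in orientations.items():
            # A lookahead lets overlapping matches be reported as well.
            for match in re.finditer(f"(?={pattern})", seq):
                position = relative_position(match.start(), tss_index)
                hits.append((name, orientation, position))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence: 100 nt upstream of the start site, padded with C as neutral filler.
upstream = "C" * 10 + "GGGCGG" + "C" * 9 + "CAAT" + "C" * 41 + "TATAAAA" + "C" * 23
toy_gene = upstream + "A" + "C" * 10
hits = find_elements(toy_gene, tss_index=len(upstream))
for hit in hits:
    print(hit)
```

In the toy sequence the motifs were placed so that their first bases fall at -90, -75, and -30, mirroring the approximate positions quoted above.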
Figure 3.31. Eukaryotic transcription begins with the formation of a transcription preinitiation complex (PIC) on the TATA box in the promoter of the gene. The PIC is a large complex of proteins that is necessary for the transcription of protein-coding genes in eukaryotes. The preinitiation complex helps position RNA polymerase II over gene transcription start sites, denatures the DNA, and positions the DNA in the RNA polymerase II active site for transcription. The minimal PIC includes RNA polymerase II and six general transcription factors: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. Additional regulatory complexes (co-activators and chromatin-remodeling complexes) could also be components of the PIC.

The general transcription factors, along with other proteins forming specific PICs at a particular promoter, poise RNA polymerase to begin transcription of the gene behind the promoter. Once the PIC forms, RNA polymerase will initiate transcription. However, the rate at which transcription initiation occurs at a particular gene depends on 2 factors.

The first factor is the number and types of enhancer/silencer sequence elements found in the promoter. These sequence elements can be from 50 nt to over 1,000 nt in length. Enhancer/silencer elements must be located in cis (meaning on the same DNA molecule as) the promoter/coding sequence in order to affect the expression of a gene. Some enhancer/silencer sequences have been found that are as much as 1 megabase (1,000,000 nt) away from the transcription start site (TATA box), but most are within a few thousand bases or less of the TATA box.

The second factor regulating the rate of transcript initiation is proteins that can bind to specific enhancer or silencer sequence elements. Activators are proteins that bind to enhancer sequences.
Activator proteins also contain protein-protein interaction domains that allow them to bind to and affect the behavior of other proteins. These other proteins could be RNA polymerase itself; other general transcription factors in the PIC; or other adapter proteins that interact with the PIC (see Figure 3.32).

Figure 3.32. An activator protein binding to a promoter-proximal enhancer sequence, interacting with an adapter protein and the PIC to enhance transcription initiation.

Repressor proteins can bind to either silencer or enhancer sequence elements in the promoter. In so doing they reverse the effect of activator proteins, either by interfering with the critical protein-protein interactions of activators or by binding tightly to enhancer sequences, keeping activators from binding.

Thus, activator and repressor proteins are important in transcription regulation. They recognize promoter-proximal elements and other enhancer/silencer sequence elements found upstream of the promoter, and they are specific for groups of similarly regulated genes. These proteins mediate the rate of transcription initiation for genes that contain the recognized sequence elements. The presence or absence of specific activator and repressor proteins in a specific cell, either because of cell type or because of environmental factors, can mediate the initiation of transcription.

For example, housekeeping genes (used in all cell types for basic cellular functions) have common promoter-proximal elements and are recognized by activator proteins found in all cells. Examples of genes with housekeeping functions include actin, hexokinase, and glucose-6-phosphate dehydrogenase.

Genes expressed only in some cell types or at particular times have promoter-proximal elements recognized by activator proteins found only in specific cell types or times. Enhancers are another cis-acting element. They are required for maximal transcription of a gene.
Enhancers/silencers are usually upstream of the transcription initiation site but may also be downstream. They may modulate transcription from a distance of thousands of base pairs away from the initiation site. Because there are similar enhancer and silencer sequences in front of several genes that are coordinately regulated, and each gene promoter has its own unique spectrum of such sequences, Eukaryotic cells can avoid the necessity of contiguous organization of genes into operons as is common in prokaryotes.

Additionally, each tissue produces a set of tissue-specific and general activator and repressor proteins, and the spectrum of these proteins can be influenced by environmental factors such as cellular surroundings, temperature, chemical environment, etc. This gives each cell the ability to “customize” the expression of genes depending on the protein functions that are required in each cell based on cell type and cellular environment. This phenomenon is referred to as combinatorial gene regulation and is illustrated in Figure 3.33.

Figure 3.33. Combinatorial gene regulation leads to the coordinate regulation of batteries of genes in Eukaryotes. The types of enhancer and silencer sequences in front of each gene determine the level of transcription of each gene based on the activator and repressor proteins present in each cell/tissue type and the environment surrounding each cell.

Once transcription initiation has occurred, the RNA polymerase moves away from the TATA box as the transcript is elongated. This is fundamentally the work of RNA polymerase, and the other proteins of the PIC now leave the complex to be recycled to form new PICs while RNA polymerase elongates the nucleotide chain to complete the formation of the primary transcript.

3.5.7. Processing the primary transcript into a mature mRNA

As shown in Figure 3.30, the primary transcript must be processed in 3 significant ways to become a mature mRNA. This processing all takes place in the nucleus of the cell and prepares the mRNA for transport to the cytosol of the cell, where it will subsequently be translated to produce a protein.

First, the primary transcript must acquire a cap at its 5’-end. The cap prepares the transcript for transport from the nucleus, provides stability against attack by nucleases in the cytoplasm, and aids in the initiation of the translation process. Structurally, a cap consists of a 7-methyl guanosine attached by 3 phosphate groups to the 5’-end of the transcript. Note that the cap is reversed compared to the RNA strand, i.e. it is attached 5’ to 5’, not 5’ to 3’ as are the other nucleotides in the transcript. The cap can be attached to the transcript during transcription, before completion of the primary transcript, but it is critical to efficient transport of the mRNA from the nucleus, so it must be attached in the nucleus.

Figure 3.34. Structure of the 5’-Cap added to Eukaryotic primary RNA transcripts. The cap consists of a 7-methyl guanosine residue attached 5’ to 5’ at the 5’ end of the transcript by 3 phosphate groups (a 5’-5’ triphosphate bridge).

The second processing step occurs at the 3’-end of the transcript (Figure 3.35), and is involved in termination of transcript elongation by RNA polymerase II. Note that other eukaryotic RNA polymerases may have other mechanisms of transcript termination, since they do not produce polyadenylated transcripts. The process for addition of the poly-A tail involves a complex of proteins that assembles at a poly-A addition consensus sequence (AAUAAA). The proteins involved in the cleavage step of the termination process include:

1) CPSF (cleavage and polyadenylation specificity factor).
2) CstF (cleavage stimulation factor).
3) Two cleavage factor proteins (CFI and CFII).

Following cleavage, the enzyme poly(A) polymerase (PAP) adds A nucleotides to the 3’ end of the cleaved transcript RNA, using ATP as a substrate. PAP is bound to CPSF during this process. Typically, about 200-250 A’s are added. PABII (poly-A binding protein II) binds the poly-A tail as it is produced. Upon completion of the poly-A tail, further transcription is terminated with the release of the pre-mRNA transcript from the protein complex.

Figure 3.35. The addition of a poly-A tail to the transcript terminates transcription of the pre-mRNA. The process involves 2 steps: A) cleavage of the growing primary transcript by a complex of proteins that recognize a poly-A addition signal in the transcript; B) addition of 200-300 A’s to the 3’ end of the transcript by poly(A) polymerase (PAP).

The third step in the process of producing a mature mRNA from a pre-mRNA involves removal of sequences that are found in the DNA coding sequence and pre-mRNA but are absent from the mature mRNA found in the cytoplasm of the cell. These removed sequences are called introns. The parts of the pre-mRNA that remain in the mature mRNA are called exons (see Figure 3.30).

The removal of introns from the primary transcript is a process referred to as splicing, and it typically involves an RNA-protein (RNP) particle referred to as a spliceosome. Spliceosomes are small nuclear ribonucleoprotein particles (snRNPs) associated with pre-mRNAs. The snRNAs that were previously discussed are structural parts of spliceosome RNPs. The principal snRNAs involved are U1, U2, U4, U5, and U6. Each of these snRNAs is associated with several proteins; e.g. U4 and U6 are part of the same snRNP. Others are in their own snRNPs. Each snRNP type is abundant (~10^5 copies per nucleus), consistent with the critical role that these snRNPs play in nuclear processes.

The steps of RNA splicing are outlined in Figure 3.36:

1) U1 snRNP binds the 5’ splice junction of the intron, as a result of base-pairing of the U1 snRNA to the intron RNA.
2) U2 snRNP binds by base pairing to the branch-

point sequence upstream of the 3’ splice junction. 3) U4/U6 and U5 snRNPs interact and then bind the

U1 and U2 snRNPs, creating a loop in the intron. 4) U4 snRNP dissociates from the complex, forming

Figure 3.36. The process of intron spicing the active spliceosome. conducted by U2-dependent spiceosomes. Note 5) The spliceosome cleaves the intron at the 5’ splice that there are other types of spiceosomes, and that there are a few introns that are spliced junction, freeing it from exon 1. The free 5’ end of independent of spliceosomes. The binding of at the intron bonds to a specific nucleotide (usually least 5 RNP complexes containing snRNAs and proteins ultimately produce a structure that holds A) in the branch-point sequence to form an RNA the transcript cleaved ends together while the lariat. intron is spliced out producing a “lariat” structure. The exon ends of the transcript are then ligated 6) The spliceosome cleaves the intron at the 3’ together producing a mature mRNA with the junction, liberating the intron lariat. Exons 1 and 2 intron removed from the sequence. are ligated, and the snRNPs are released. One of the most interesting aspects of intron splicing is that there can be different transcripts created based on how introns are spliced. This is referred to as alternative splicing can be used to produce different polypeptides from the same gene as shown in Figures 3.37 and 3.38.
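The idea that one primary transcript can yield more than one mature mRNA can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not a description of how a spliceosome selects splice sites: the segment names (E1, I1, ...), the sequences, and the splice function are all hypothetical and exist only for illustration. It shows exon skipping, one of the patterns of alternative splicing, by joining exons in order while leaving out the introns and any exon chosen to be skipped.

```python
# Toy model of alternative splicing by exon skipping.
# Every sequence and segment name here is made up for illustration.

# A pre-mRNA as an ordered list of (kind, name, sequence) segments.
PRE_MRNA = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAGUCUAACAG"),
    ("exon", "E2", "GAUCCA"),
    ("intron", "I2", "GUGAGUCUGACAG"),
    ("exon", "E3", "UUCUAA"),
]


def splice(pre_mrna, skip_exons=()):
    """Join the exons in their original order, dropping every intron
    and any exon whose name appears in skip_exons."""
    return "".join(
        seq
        for kind, name, seq in pre_mrna
        if kind == "exon" and name not in skip_exons
    )


# Two different mature mRNAs from the same primary transcript.
full_length = splice(PRE_MRNA)                    # E1 + E2 + E3
exon2_skipped = splice(PRE_MRNA, skip_exons={"E2"})  # E1 + E3

print(full_length)
print(exon2_skipped)
```

Because the two mature mRNAs differ in which exons they contain, translating them would give different polypeptides from the same gene, which is the point made in the paragraph above.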


Figure 3.37. A schematic representation of alternative splicing. The figure illustrates different types of alternative splicing: exon inclusion or skipping, alternative splice-site selection, mutually exclusive exons, and intron retention. For an individual pre-mRNA, different alternative exons often show different types of alternative-splicing patterns. © 2002 Nature Publishing Group. Cartegni, L., Chew, S. L., & Krainer, A. R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 3, 285–298 (2002). All rights reserved.

Figure 3.38. Alternative splicing of 1 primary transcript to produce 3 different proteins.

From the above discussion it is clear that the processing of the primary RNA transcript into a mature mRNA and the transport of the mature mRNA from the nucleus to the cytoplasm are steps that can influence the amount of translatable mRNA for a particular protein that exists in a cell. The details of the steps we have discussed emerged from a series of original molecular genetic studies and have been greatly extended more recently by functional genomic studies that we will investigate further in subsequent chapters.
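As a recap of the two end-modification steps described in this section, the following Python sketch treats a primary transcript as a plain string, cleaves it downstream of the AAUAAA poly-A addition signal, appends a poly-A tail, and marks the 5’ end with a cap label. It is a minimal sketch under stated assumptions: the example sequence, the function names, the cleavage offset of 15 nucleotides, and the text label used for the cap are illustrative choices, not values taken from this chapter. Only the AAUAAA signal and a tail of roughly 200-250 A’s come from the text. Splicing is left out here.

```python
# Toy model of 5' capping and 3' cleavage/polyadenylation.
# The cleavage offset, the example sequence, and the cap label are
# illustrative assumptions, not values from the chapter.

POLY_A_SIGNAL = "AAUAAA"
CAP = "m7G(5')ppp(5')"  # text label standing in for the 7-methyl guanosine cap


def cleave_and_polyadenylate(transcript, offset=15, tail_length=200):
    """Cut the transcript `offset` nucleotides downstream of the first
    AAUAAA signal, then append `tail_length` A residues to the new 3' end."""
    start = transcript.find(POLY_A_SIGNAL)
    if start == -1:
        raise ValueError("no poly-A addition signal found")
    cut = min(start + len(POLY_A_SIGNAL) + offset, len(transcript))
    return transcript[:cut] + "A" * tail_length


def add_cap(transcript):
    """Prefix the cap label to mark the capped 5' end."""
    return CAP + transcript


# Hypothetical primary transcript: a short body, the signal, then
# downstream sequence that is discarded by the cleavage step.
primary = "GGCAUGGCUUAA" + "CCAAUAAAGC" + "U" * 40

tailed = cleave_and_polyadenylate(primary, offset=15, tail_length=200)
processed = add_cap(tailed)

print(len(primary), len(tailed))
print(processed[:30])
```

The sequence downstream of the cut is lost and replaced by the tail, which mirrors the two-step order in Figure 3.35: cleavage first, then addition of A’s by poly(A) polymerase.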