CHAPTER 1 Introduction to Molecular Genetics and Genomics CHAPTER OUTLINE PRINCIPLES 1.1 DNA: The Genetic Material • Inherited traits are affected by genes. Experimental Proof of the Genetic Function of DNA • Genes are composed of the chemical deoxyribonucleic acid Genetic Role of DNA in (DNA). Bacteriophage • DNA replicates to form copies of itself that are identical 1.2 DNA Structure: The Double Helix (except for rare mutations). 1.3 An Overview of DNA Replication • DNA contains a genetic code specifying what types of enzymes and other proteins are made in cells. 1.4 Genes and Proteins Inborn Errors of Metabolism as a • DNA occasionally mutates, and the mutant forms specify Cause of Hereditary Disease altered proteins that have reduced activity or stability. Mutant Genes and Defective Proteins • A mutant enzyme is an “inborn error of metabolism” that 1.5 Gene Expression: The Central Dogma blocks one step in a biochemical pathway for the metabolism Transcription of small molecules. Translation The Genetic Code • Traits are affected by environment as well as by genes. 1.6 Mutation • Organisms change genetically through generations in the Protein Folding and Stability process of biological evolution. 1.7 Genes and Environment 1.8 Evolution: From Genes to Genomes, from Proteins to Proteomes The Molecular Unity of Life CONNECTIONS Natural Selection and Diversity Shear Madness Alfred D. Hershey and Martha Chase 1952 Independent Functions of Viral Protein and Nucleic Acid in Growth of Bacteriophage The Black Urine Disease Archibald E. Garrod 1908 Inborn Errors of Metabolism
1 ach species of living organism has a terms of the abstract rules by which heredi- unique set of inherited characteristics tary elements (he called them “factors”) are that makes it different from other transmitted from parents to offspring. His Especies. Each species has its own develop- objects of study were garden peas, with mental plan—often described as a sort of variable traits like pea color and plant “blueprint” for building the organism— height. At one time genetics could be stud- which is encoded in the DNA molecules pre- ied only through the progeny produced sent in its cells. This developmental plan from matings. Genetic differences between determines the characteristics that are in- species were impossible to define, because herited. Because organisms in the same organisms of different species usually do not species share the same developmental plan, mate, or they produce hybrid progeny that organisms that are members of the same die or are sterile. This approach to the study species usually resemble one another, al- of genetics is often referred to as classical ge- though some notable exceptions usually are netics, or organismic or morphological ge- differences between males and females. For netics. Given the advances of molecular, or example, it is easy to distinguish a human modern, genetics, it is possible to study dif- being from a chimpanzee or a gorilla. A hu- ferences between species through the com- man being habitually stands upright and has parison and analysis of the DNA itself. There long legs, relatively little body hair, a large is no fundamental distinction between clas- brain, and a flat face with a prominent nose, sical and molecular genetics. They are dif- jutting chin, distinct lips, and small teeth. ferent and complementary ways of studying All of these traits are inherited—part of our the same thing: the function of the genetic developmental plan—and help set us apart material. In this book we include many ex- as Homo sapiens. amples showing how molecular and classi- But human beings are by no means cal genetics can be used in combination to identical. Many traits, or observable charac- enhance the power of genetic analysis. teristics, differ from one person to another. The foundation of genetics as a molecu- There is a great deal of variation in hair lar science dates back to 1869, just three color, eye color, skin color, height, weight, years after Mendel reported his exper- personality traits, and other characteristics. iments. It was in 1869 that Friedrich Some human traits are transmitted biologi- Miescher discovered a new type of weak cally, others culturally. The color of our acid, abundant in the nuclei of white blood eyes results from biological inheritance, but cells. Miescher’s weak acid turned out to be the native language we learned as a child the chemical substance we now call DNA results from cultural inheritance. Many (deoxyribonucleic acid). For many years traits are influenced jointly by biological in- the biological function of DNA was un- heritance and environmental factors. For known, and no role in heredity was as- example, weight is determined in part by cribed to it. This first section shows how inheritance but also in part by environ- DNA was eventually isolated and identified ment: how much food we eat, its nutri- as the material that genes are made of. tional content, our exercise regimen, and so forth. Genetics is the study of biologically inherited traits, including traits that are in- 1.1 fluenced in part by the environment. DNA: The Genetic Material The fundamental concept of genetics is That the cell nucleus plays a key role in in- that: heritance was recognized in the 1870s by the observation that the nuclei of male and Inherited traits are determined by the ele- female reproductive cells undergo fusion in ments of heredity that are transmitted from the process of fertilization. Soon thereafter, parents to offspring in reproduction; these elements of heredity are called genes. chromosomes were first observed inside the nucleus as thread-like objects that The existence of genes and the rules become visible in the light microscope governing their transmission from gen- when the cell is stained with certain dyes. eration to generation were first articulated Chromosomes were found to exhibit a by Gregor Mendel in 1866 (Chapter 3). characteristic “splitting” behavior in which Mendel’s formulation of inheritance was in each daughter cell formed by cell division
2 Chapter 1 Introduction to Molecular Genetics and Genomics receives an identical complement of chro- as the genetic material, and DNA was as- mosomes (Chapter 4). Further evidence for sumed to function merely as the structural the importance of chromosomes was pro- framework of the chromosomes. The ex- vided by the observation that, whereas the periments described below finally demon- number of chromosomes in each cell may strated that DNA is the genetic material. differ among biological species, the number of chromosomes is nearly always constant within the cells of any particular species. Experimental Proof of the Genetic These features of chromosomes were well Function of DNA understood by about 1900, and they made An important first step was taken by it seem likely that chromosomes were the Frederick Griffith in 1928 when he demon- carriers of the genes. strated that a physical trait can be passed By the 1920s, several lines of indirect from one cell to another. He was working evidence began to suggest a close relation- with two strains of the bacterium ship between chromosomes and DNA. Streptococcus pneumoniae identified as S and Microscopic studies with special stains R. When a bacterial cell is grown on solid showed that DNA is present in chromo- medium, it undergoes repeated cell divi- somes. Chromosomes also contain various sions to form a visible clump of cells called a types of proteins, but the amount and kinds colony. The S type of S. pneumoniae synthe- of chromosomal proteins differ greatly from sizes a gelatinous capsule composed of one cell type to another, whereas the complex carbohydrate (polysaccharide). amount of DNA per cell is constant. The enveloping capsule makes each colony Furthermore, nearly all of the DNA present large and gives it a glistening or smooth (S) in cells of higher organisms is present in the appearance. This capsule also enables the chromosomes. These arguments for DNA as bacterium to cause pneumonia by protect- the genetic material were unconvincing, ing it from the defense mechanisms of an however, because crude chemical analyses infected animal. The R strains of S. pneumo- had suggested (erroneously, as it turned niae are unable to synthesize the capsular out) that DNA lacks the chemical diversity polysaccharide; they form small colonies needed in a genetic substance. The favored that have a rough (R) surface (Figure 1.1). candidate for the genetic material was pro- This strain of the bacterium does not cause tein, because proteins were known to be an pneumonia, because without the capsule exceedingly diverse collection of molecules. the bacteria are inactivated by the immune Proteins therefore became widely accepted system of the host. Both types of bacteria
FPO
R strain S strain
Figure 1.1 Colonies of rough (R, the small colonies) and smooth (S, the large colonies) strains of Streptococcus pneumoniae. The S colonies are larger because of the gelatinous capsule on the S cells. [Photograph from O. T. Avery, C. M. MacLeod, and M. McCarty. Reproduced from the Journal of Experimental Medicine, 1944, vol. 79, p. 137 by copyright permission of The Rockefeller University Press.]
1.1 DNA: The Genetic Material 3 Living Living Heat-killed Living R cells plus S cells R cells S cells heat-killed S cells
Mouse contracts Mouse remains Mouse remains Mouse contracts pneumonia healthy healthy pneumonia
S colonies isolated R colonies isolated No colonies isolated R and S colonies isolated from tissue of dead mouse from tissue from tissue from tissue of dead mouse
Figure 1.2 The Griffith's experiment demonstrating bacterial transformation. A mouse remains healthy if injected with either the nonvirulent R strain of S. pneumoniae or heat-killed cell fragments of the usually virulent S strain. R cells in the presence of heat-killed S cells are transformed into the virulent S strain, causing pneumonia in the mouse.
“breed true” in the sense that the progeny until 1944 that the chemical substance re- formed by cell division have the capsular sponsible for changing the R cells into S type of the parent, either S or R. cells was identified. In a milestone experi- Mice injected with living S cells get ment, Oswald Avery, Colin MacLeod, and pneumonia. Mice injected either with living Maclyn McCarty showed that the sub- R cells or with heat-killed S cells remain stance causing the transformation of R cells healthy. Here is Griffith’s critical finding: into S cells was DNA. In doing these exper- mice injected with a mixture of living R cells iments, they first had to develop chemical and heat-killed S cells contract the disease— procedures for isolating almost pure DNA they often die of pneumonia (Figure 1.2). from cells, which had never been done be- Bacteria isolated from blood samples of fore. When they added DNA isolated from these dead mice produce S cultures with a S cells to growing cultures of R cells, they capsule typical of the injected S cells, even observed transformation: A few cells of though the injected S cells had been killed type S cells were produced. Although the by heat. Evidently, the injected material DNA preparations contained traces of pro- from the dead S cells includes a substance tein and RNA (ribonucleic acid, an abun- that can be transferred to living R cells and dant cellular macromolecule chemically confer the ability to resist the immunologi- related to DNA), the transforming activity cal system of the mouse and cause pneumo- was not altered by treatments that de- nia. In other words, the R bacteria can be stroyed either protein or RNA. However, changed—or undergo transformation— treatments that destroyed DNA eliminated into S bacteria. Furthermore, the new char- the transforming activity (Figure 1.3). These acteristics are inherited by descendants of experiments implied that the substance re- the transformed bacteria. sponsible for genetic transformation was Transformation in Streptococcus was orig- the DNA of the cell—hence that DNA is the inally discovered in 1928, but it was not genetic material.
4 Chapter 1 Introduction to Molecular Genetics and Genomics (A) The transforming activity in S cells is not destroyed by heat.
Cells killed by heat Plate on agar medium
S cell extract (contains mostly DNA with a little R colonies and Culture of S cells protein and RNA) Culture of R cells a few S colonies
(B) The transforming activity is not destroyed by either protease or RNase.
Protease or RNase Plate on agar medium Conclusion: Transforming activity not protein or RNA S cell extract
R colonies and Culture of R cells a few S colonies
(C) The transforming activity is destroyed by DNase.
DNase
Plate on agar medium Conclusion: Transforming activity most likely DNA
S cell extract
R colonies only Culture of R cells
Figure 1.3 A diagram of the Avery–MacLeod–McCarty experiment that demonstrated that DNA is the active material in bacterial transformation. (A) Purified DNA extracted from heat-killed S cells can convert some living R cells into S cells, but the material may still contain undetectable traces of protein and/or RNA. (B) The transforming activity is not destroyed by either protease or RNase. (C) The transforming activity is destroyed by DNase and so probably consists of DNA.
1.1 DNA: The Genetic Material 5 radioactive isotopes of the two elements. Hershey and Chase produced particles con- Protein Head taining radioactive DNA by infecting E. coli (protein DNA and DNA) cells that had been grown for several gen- erations in a medium that included 32P(a radioactive isotope of phosphorus) and then collecting the phage progeny. Other parti- cles containing labeled proteins were ob- Tail tained in the same way, by using medium (protein that included 35S (a radioactive isotope of only) sulfur). In the experiments summarized in Figure 1.5, nonradioactive E. coli cells were infected with phage labeled with either 32P (part A) or 35S (part B) in order to follow the DNA (A) (B) and proteins individually. Infected cells Figure 1.4 (A) Drawing of E. coli phage T2, showing various components. were separated from unattached phage par- The DNA is confined to the interior of the head. (B) An electron micro- ticles by centrifugation, resuspended in graph of phage T4, a closely related phage. [Electron micrograph courtesy fresh medium, and then swirled violently in of Robley Williams.] a kitchen blender to shear attached phage material from the cell surfaces. This treat- ment was found to have no effect on the subsequent course of the infection, which implies that the phage genetic material Genetic Role of DNA in must enter the infected cells very soon after Bacteriophage phage attachment. The kitchen blender A second pivotal finding was reported by turned out to be the critical piece of equip- Alfred Hershey and Martha Chase in 1952. ment. Other methods had been tried to tear They studied cells of the intestinal the phage heads from the bacterial cell sur- bacterium Escherichia coli after infection by face, but nothing had worked reliably. the virus T2. A virus that attacks bacterial Hershey later explained, “We tried various cells is called a bacteriophage, a term of- grinding arrangements, with results that ten shortened to phage. Bacteriophage weren’t very encouraging. When Margaret means “bacteria-eater.” The structure of a McDonald loaned us her kitchen blender, bacteriophage T2 particle is illustrated in the experiment promptly succeeded.” Figure 1.4. It is exceedingly small, yet it has a After the phage heads were removed by complex structure composed of head the blender treatment, the infected bacteria (which contains the phage DNA), collar, were examined. Most of the radioactivity tail, and tail fibers. (The head of a human from 32P-labeled phage was found to be as- sperm is about 30–50 times larger in both sociated with the bacteria, whereas only a length and width than the head of T2.) small fraction of the 35S radioactivity was Hershey and Chase were already aware present in the infected cells. The retention that T2 infection proceeds via the attach- of most of the labeled DNA, contrasted with ment of a phage particle by the tip of its tail the loss of most of the labeled protein, im- to the bacterial cell wall, entry of phage ma- plied that a T2 phage transfers most of its terial into the cell, multiplication of this DNA, but very little of its protein, to the cell material to form a hundred or more prog- it infects. The critical finding (Figure 1.5) eny phage, and release of the progeny phage by bursting (lysis) of the bacterial host cell. They also knew that T2 particles Figure 1.5 (on facing page) The Hershey–Chase were composed of DNA and protein in ap- (“blender”) experiment demonstrating that proximately equal amounts. DNA, not protein, is responsible for directing Because DNA contains phosphorus but the reproduction of phage T2 in infected E. coli cells. (A) Radioactive DNA is transmitted to no sulfur, whereas most proteins contain progeny phage in substantial amounts. sulfur but no phosphorus, it is possible to la- (B) Radioactive protein is transmitted to bel DNA and proteins differentially by using progeny phage in negligible amounts.
6 Chapter 1 Introduction to Molecular Genetics and Genomics (A) (B) Infection with Infection with nonradioactive nonradioactive T2 phage T2 phage
E. coli cells grown E. coli cells grown in 32P-containing in 35S-containing medium (labels DNA) medium (labels protein)
Phage reproduction; Phage reproduction; cell lysis releases cell lysis releases DNA-labeled progeny protein-labeled phage progeny phage
DNA-labeled phage Protein-labeled phage used to infect used to infect nonradioactive cells nonradioactive cells
After infection, part of phage After infection, part of phage remaining attached to cells is remaining attached to cells is removed by violent agitation removed by violent agitation in a kitchen blender in a kitchen blender
Infecting Infected cellInfecting Infected cell labeled DNA nonlabeled DNA
Phage reproduction; cell lysis Phage reproduction; cell releases progeny phage that lysis releases progeny contain some 32P-labeled DNA phage that contain almost from the parental phage DNA no 35S-labeled protein
Conclusion: DNA from an infecting parental phage is inherited in the progeny phage
1.1 DNA: The Genetic Material 7 Shear Madness
Margaret McDonald loaned us her Alfred D. Hershey and ferred from parental to progeny phage kitchen blender the experiment promptly Martha Chase 1952 at yields of about 30 phage per infected succeeded.” bacterium. . . . [Incomplete separation Cold Spring Harbor Laboratories, of phage heads] explains a mistaken Cold Spring Harbor, New York The work [of others] has shown that preliminary report of the transfer of 35S Independent Functions of Viral Protein bacteriophages T2,T3, from parental to progeny and Nucleic Acid in Growth of and T4 multiply in the Our experiments phage. . . . The following Bacteriophage bacterial cell in a non-in- show clearly that questions remain unan- fective [immature] form. a physical swered. (1) Does any sul- Published a full eight years after the paper Little else is known about fur-free phage material of Avery, MacLeod, and McCarty, the ex- the vegetative [growth] separation of the other than DNA enter the periments of Hershey and Chase get equal phase of these viruses. phage T2 into cell? (2) If so, is it trans- billing. Why? Some historians of science The experiments reported genetic and ferred to the phage prog- in this paper show that suggest that the Avery et al. experiments nongenetic parts eny? (3) Is the transfer of were “ahead of their time.” Others sug- one of the first steps in the phosphorus to progeny gest that Hershey had special standing be- growth of T2 is the release is possible. direct or indirect? ...Our cause he was a member of the “in group” from its protein coat of experiments show clearly of phage molecular geneticists. Max the nucleic acid of the virus particle, that a physical separation of the phage Delbrück was the acknowledged leader of after which the bulk of the sulfur-con- T2 into genetic and nongenetic parts is this group, with Salvador Luria close be- taining protein has no further func- possible. The chemical identification of hind. (Delbrück, Luria, and Hershey tion....Anderson has obtained the genetic part must wait until some of shared a 1969 Nobel Prize.) Another pos- electron micrographs indicating that the questions above have been an- sible reason is that whereas the experi- phage T2 attaches to bacteria by its swered. . . . The sulfur-containing ments of Avery et al. were feats of strength tail....Itought to be a simple matter to protein of resting phage particles is con- in biochemistry, those of Hershey and break the empty phage coats off the in- fined to a protective coat that is respon- Chase were quintessentially genetic. fected bacteria, leaving the phage DNA sible for the adsorption to bacteria, and Which macromolecule gets into the hered- inside the cells....When a suspension functions as an instrument for the injec- itary action, and which does not? Buried of cells with 35S- or 32P-labeled phage tion of the phage DNA into the cell. This in the middle of this paper, and retained was spun in a blender at 10,000 revolu- protein probably has no function in the in the excerpt, is a sentence admitting tions per minute, ...75to80percent of growth of the intracellular phage. The that an earlier publication by the re- the phage sulfur can be stripped from DNA has some function. Further chemi- searchers was a misinterpretation of their the infected cells....These facts show cal inferences should not be drawn from preliminary results. This shows that even that the bulk of the phage sulfur re- the experiments presented. first-rate scientists, then and now, are mains at the cell surface during infec- sometimes misled by their preliminary tion. . . . Little or no 35S is contained in Source: Journal of General Physiology 36: data. Hershey later explained, “We tried the mature phage progeny. . . . Identical 39–56 various grinding arrangements, with re- experiments starting with phage labeled sults that weren´t very encouraging. When with 32P show that phosphorus is trans-
was that about 50 percent of the transferred Chase are regarded as classics in the demon- 32P-labeled DNA, but less than 1 percent of stration that genes consist of DNA. At the the transferred 35S-labeled protein, was in- present time, the equivalent of the transfor- herited by the progeny phage particles. mation experiment is carried out daily in Hershey and Chase interpreted this result to many research laboratories throughout the mean that the genetic material in T2 phage world, usually with bacteria, yeast, or ani- is DNA. mal or plant cells grown in culture. These The experiments of Avery, MacLeod, experiments indicate that DNA is the ge- and McCarty and those of Hershey and netic material in these organisms as well as
8 Chapter 1 Introduction to Molecular Genetics and Genomics in phage T2. Although there are no known They are examined in Chapter 2. A key exceptions to the generalization that DNA is point for our present purposes is that the the genetic material in all cellular organisms bases in the double helix are paired as and many viruses, in a few types of viruses shown in Figure 1.6B. That is: the genetic material consists of RNA. At any position on the paired strands of a DNA molecule, if one strand has an A, then the partner strand has a T; and if one strand has a 1.2 DNA Structure: G, then the partner strand has a C. The Double Helix The pairing between A and T and be- tween G and C is said to be comple- The inference that DNA is the genetic mate- mentary; the complement of A is T, and rial still left many questions unanswered. the complement of G is C. The complemen- How is the DNA in a gene duplicated when tary pairing means that each base along one a cell divides? How does the DNA in a gene strand of the DNA is matched with a base in control a hereditary trait? What happens to the opposite position on the other strand. the DNA when a mutation (a change in the Furthermore: DNA) takes place in a gene? In the early 1950s, a number of researchers began to try Nothing restricts the sequence of bases in a to understand the detailed molecular struc- single strand, so any sequence could be ture of DNA in hopes that the structure present along one strand. alone would suggest answers to these ques- This principle explains how only four bases tions. In 1953 James Watson and Francis in DNA can code for the huge amount of in- Crick at Cambridge University proposed the formation needed to make an organism. It first essentially correct three-dimensional structure of the DNA molecule. The struc- ture was dazzling in its elegance and revo- lutionary in suggesting how DNA duplicates (A) (B) 3’ 5’ itself, controls hereditary traits, and under- TA goes mutation. Even while their tin-and- wire model of the DNA molecule was still GC incomplete, Crick would visit his favorite AT pub and exclaim “we have discovered the secret of life.” CG In the Watson–Crick structure, DNA CG consists of two long chains of subunits, each GC twisted around the other to form a double- T A Paired stranded helix. The double helix is right- nucleotides handed, which means that as one looks GC along the barrel, each chain follows a clock- T A wise path as it progresses. You can visualize GC the right-handed coiling in part A of Figure TA 1.6 if you imagine yourself looking up into the structure from the bottom. The dark TA spheres outline the “backbone” of each in- AT dividual strand, and they coil in a clockwise GC direction. The subunits of each strand are T A nucleotides, each of which contains any one of four chemical constituents called CG bases attached to a phosphorylated mole- T A cule of the 5-carbon sugar deoxyribose. GC The four bases in DNA are 5’ 3’ • Adenine (A) • Guanine (G) • Thymine (T) • Cytosine (C) Figure 1.6 Molecular structure of the DNA double helix in the standard “B form.” (A) A space-filling model, in which each atom is depicted as a The chemical structures of the nucleotides sphere. (B) A diagram highlighting the helical strands around the outside and bases need not concern us at this time. of the molecule and the A T and G C base pairs inside.
1.2 DNA Structure: The Double Helix 9 is the sequence of bases along the DNA that In the remainder of this chapter, we discuss encodes the genetic information, and the some of the implications of these clues. sequence is completely unrestricted. The complementary pairing is also called Watson–Crick pairing. In the three- 1.3 An Overview of DNA dimensional structure in Figure 1.6A, the Replication base pairs are represented by the lighter spheres filling the interior of the double Watson and Crick noted that the structure helix. The base pairs lie almost flat, stacked of DNA itself suggested a mechanism for its on top of one another perpendicular to the replication. “It has not escaped our notice,” long axis of the double helix, like pennies in they wrote, “that the specific base pairing a roll. When discussing a DNA molecule, we have postulated immediately suggests a biologists frequently refer to the individual copying mechanism.” The copying process strands as single-stranded DNA and to in which a single DNA molecule becomes the double helix as double-stranded two identical molecules is called repli- DNA or duplex DNA. cation. The replication mechanism that Each DNA strand has a polarity, or di- Watson and Crick had in mind is illustrated rectionality, like a chain of circus elephants in Figure 1.7. linked trunk to tail. In this analogy, each As shown in part A of Figure 1.7, the elephant corresponds to one nucleotide strands of the original (parent) duplex sep- along the DNA strand. The polarity is deter- arate, and each individual strand serves as a mined by the direction in which the nu- pattern, or template, for the synthesis of a cleotides are pointing. The “trunk” end of new strand (replica). The replica strands are the strand is called the 3' end of the strand, synthesized by the addition of successive and the “tail” end is called the 5' end. In nucleotides in such a way that each base in double-stranded DNA, the paired strands the replica is complementary (in the are oriented in opposite directions, the 5' Watson–Crick pairing sense) to the base end of one strand aligned with the 3' end of across the way in the template strand the other. The molecular basis of the polar- (Figure 1.7B). Although the mechanism in ity, and the reason for the opposite orienta- Figure 1.7 is simple in principle, it is a com- tion of the strands in duplex DNA, are plex process that is fraught with geometri- explained in Chapter 2. In illustrating DNA cal problems and requires a variety of molecules in this book, we use an arrow- enzymes and other proteins. The details are like ribbon to represent the backbone, and examined in Chapter 6. The end result of we use tabs jutting off the ribbon to repre- replication is that a single double-stranded sent the nucleotides. The polarity of a DNA molecule becomes replicated into two strand is indicated by the direction of the copies with identical sequences: arrow-like ribbon. The tail of the arrow rep- 5'-ACGCTTGC-3' resents the 5' end of the DNA strand, the 3'-TGCGAACG-5' head the 3' end. 5 -ACGCTTGC-3 5 -ACGCTTGC-3 Beyond the most optimistic hopes, ' ' ' ' 3 -TGCGAACG-5 3 -TGCGAACG-5 knowledge of the structure of DNA imme- ' ' ' ' diately gave clues to its function: Here the bases in the newly synthesized strands are shown in red. In the duplex on 1. The sequence of bases in DNA could be the left, the top strand is the template from copied by using each of the separate “partner” strands as a pattern for the the parental molecule and the bottom creation of a new partner strand with a strand is newly synthesized; in the duplex complementary sequence of bases. on the right, the bottom strand is the tem- 2. The DNA could contain genetic information plate from the parental molecule and the in coded form in the sequence of bases, top strand is newly synthesized. Note in analogous to letters printed on a strip of Figure 1.7B that in the synthesis of each paper. new strand, new nucleotides are added 3. Changes in genetic information (mutations) only to the 3' end of the growing chain: could result from errors in copying in which The obligatory elongation of a DNA strand the base sequence of the DNA became only at the 3' end is an essential feature of altered. DNA replication.
10 Chapter 1 Introduction to Molecular Genetics and Genomics (A) (B) Parent molecule of DNA 5’ 3’ TA ACGCTTGC CG TGCGAACG AT 3’ 5’
CG CG Parent duplex Template strand Complement of A GC 5’ 3’ “T” adds “A” T A ACGCTTGC TGCGAACG Complement of 3’ 5’ CG G “C” adds “G” Template strand T A CG TA
C Complement of TA “G” adds “C” 5’ 3’ 5’ A T ACGCTTGC A A T G TGCGAACG Complement of 5’ 3’ 5’ C G “G” adds “C” C CG CG GC GC T A And so forth T Daughter A CG duplex 5’ 3’ 5’ CG T A ACGCTTGC ACG T A ACG TGCGAACG C G CG TA 5’ 3’ 5’ TA
TA TA Template AT AT strands 5’ 3’ 5’ 3’ ACGCTTGC ACGCTTGC TGCGAACG TGCGAACG 3’ 5’ 3’ 5’ Replica strands Daughter molecules of DNA
Figure 1.7 Replication of DNA. (A) Replication of a DNA duplex as originally envisioned by Watson and Crick. As the parental strands separate, each parental strand serves as a template for the formation of a new daughter strand by means of A T and G C base pairing. (B) Greater detail showing how each of the parental strands serves as a template for the production of a comple- mentary daughter strand, which grows in length by the successive addition of single nucleotides to the 3' end.
1.4 Genes and Proteins the complex and diverse DNA codes is pro- tein, a class of macromolecules that carries Now that we have some basic understand- out most of the activities in the cell. Cells ing of the structural makeup of the genetic are largely made up of proteins: structural blueprint, how does this developmental proteins that give the cell rigidity and mo- plan become a complex living organism? If bility, proteins that form pores in the cell the code is thought of as a string of letters membrane to control the traffic of small on a sheet of paper, then the genes are molecules into and out of the cell, and re- made up of distinct words that form sen- ceptor proteins that regulate cellular activi- tences and paragraphs that give meaning to ties in response to molecular signals from the pattern of letters. What is created from the growth medium or from other cells.
1.4 Genes and Proteins 11 Proteins are also responsible for most of the metabolic activities of cells. They are essen- tial for the synthesis and breakdown of or- ganic molecules and for generating the chemical energy needed for cellular activi- ties. In 1878 the term enzyme was intro- duced to refer to the biological catalysts that accelerate biochemical reactions in cells. By 1900, thanks largely to the work of the German biochemist Emil Fischer, enzymes were shown to be proteins. As often hap- pens in science, nature’s “mistakes” provide clues as to how things work. Such was the case in establishing a relationship between Figure 1.8 Urine from a person with alkapto- genes and disease, because a “mistake” in a nuria turns black because of the oxidation of the gene (a mutation) can result in a “mistake” homogentisic acid that it contains. [Courtesy of (lack of function) in the corresponding pro- Daniel De Aguiar.] tein. This provided a fruitful avenue of re- search for the study of genetics. rare, with an incidence of about one in 200,000 people, it was well known even be- Inborn Errors of Metabolism as a fore Garrod studied it. The disease itself is Cause of Hereditary Disease relatively mild, but it has one striking symp- It was at the turn of the twentieth century tom: The urine of the patient turns black that the British physician Archibald Garrod because of the oxidation of homogentisic realized that certain heritable diseases fol- acid (Figure 1.8). This is why alkaptonuria is lowed the rules of transmission that also called black urine disease. An early case Mendel had described for his garden peas. was described in the year 1649: In 1908 Garrod gave a series of lectures in The patient was a boy who passed black which he proposed a fundamental hypoth- urine and who, at the age of fourteen years, esis about the relationship between hered- was submitted to a drastic course of treatment ity, enzymes, and disease: that had for its aim the subduing of the fiery heat of his viscera, which was supposed Any hereditary disease in which cellular to bring about the condition in question by metabolism is abnormal results from an charring and blackening his bile. Among inherited defect in an enzyme. the measures prescribed were bleedings, purgation, baths, a cold and watery diet, and Such diseases became known as inborn drugs galore. None of these had any obvious errors of metabolism, a term still in use effect, and eventually the patient, who tired today. of the futile and superfluous therapy, resolved Garrod studied a number of inborn er- to let things take their natural course. None of rors of metabolism in which the patients the predicted evils ensued. He married, begat excreted abnormal substances in the urine. a large family, and lived a long and healthy One of these was alkaptonuria. In this life, always passing urine black as ink. case, the abnormal substance excreted is (Recounted by Garrod, 1908.) homogentisic acid: Garrod was primarily interested in the OH biochemistry of alkaptonuria, but he took O note of family studies that indicated that the
CH2 C disease was inherited as though it were due CH to a defect in a single gene. As to the bio- HO chemistry, he deduced that the problem in alkaptonuria was the patients’ inability to An early name for homogentisic acid was break down the phenyl ring of six carbons alkapton, hence the name alkaptonuria for that is present in homogentisic acid. Where the disease. Even though alkaptonuria is does this ring come from? Most animals
12 Chapter 1 Introduction to Molecular Genetics and Genomics obtain it from foods in their diet. Garrod Benzene ring proposed that homogentisic acid originates as a breakdown product of two amino acids, C C NH2 O phenylalanine and tyrosine, which also C CCCH2 C contain a phenyl ring. An amino acid is C C OH one of the “building blocks” from which H Phenylalanine (a normal amino acid) proteins are made. Phenylalanine and Each arrow tyrosine are constituents of normal pro- represents one teins. The scheme that illustrates the rela- 1 step in the tionship between the molecules is shown in biochemical Figure 1.9. Any such sequence of biochemical pathway. reactions is called a biochemical pathway NH O or a metabolic pathway. Each arrow in C C 2 the pathway represents a single step depict- HO C C CH2 C C ing the transition from the “input” or C C H OH substrate molecule, shown at the head of Tyrosine (a normal amino acid) the arrow, to the “output” or product molecule, shown at the tip. Biochemical 2 pathways are usually oriented either verti- cally with the arrows pointing down, as in O Figure 1.9, or horizontally, with the arrows C C O pointing from left to right. Garrod did not HO C C CH2 C C know all of the details of the pathway in C C H OH Figure 1.9, but he did understand that the 4-Hydroxyphenylpyruvic acid key step in the breakdown of homogentisic acid is the breaking open of the phenyl ring 3 and that the phenyl ring in homogentisic In the next step acid comes from dietary phenylalanine and the benzene ring tyrosine. OH is opened at this position. What allows each step in a biochemical C C O pathway to occur? Garrod correctly sur- C C CH C mised that each step requires a specific en- 2 C C zyme to catalyze the reaction for the OH chemical transformation. Persons with an OH inborn error of metabolism, such as alkap- Homogentisic acid (formerly known as alkapton) tonuria, have a defect in a single step of a metabolic pathway because they lack a This is the step functional enzyme for that step. When an that is blocked 4 in alkaptonuria; enzyme in a pathway is defective, the path- X homogentisic way is said to have a block at that step. acid accumulates. One frequent result of a blocked pathway is that the substrate of the defective enzyme O accumulates. Observing the accumulation O O of homogentisic acid in patients with alkap- C CH CH C CH2 C CH2 C tonuria, Garrod proposed that there must HO O OH be an enzyme whose function is to open 4-Maleylacetoacetic acid the phenyl ring of homogentisic acid and that this enzyme is missing in these pa- tients. Isolation of the enzyme that opens the phenyl ring of homogentisic acid was Further breakdown not actually achieved until 50 years after Garrod’s lectures. In normal people it is Figure 1.9 Metabolic pathway for the breakdown of phenylalanine and tyrosine. Each step in the pathway, found in cells of the liver, and just as Garrod represented by an arrow, requires a specific enzyme to had predicted, the enzyme is defective in catalyze the reaction. The key step in the breakdown of patients with alkaptonuria. homogentisic acid is the breaking open of the phenyl ring.
1.4 Genes and Proteins 13 The Black Urine Disease
To students of heredity the inborn errors Archibald E. Garrod 1908 in the urine originally called alkapton] is of metabolism offer a promising field of homogentisic acid, the excretion of St. Bartholomew’s Hospital, investigation....Itwaspointed out [by which is the essential feature of the London, England others] that the mode of incidence of alkaptonuric....Homogentisic acid is a Inborn Errors of Metabolism alkaptonuria finds a ready explanation if product of normal metabolism....The the anomaly be regarded as a most likely sources of the Although he was a distinguished physi- rare recessive character in We may further benzene ring in homo- cian, Garrod’s lectures on the relationship the Mendelian sense....Of conceive that gentisic acid are phenyl- the cases of alkaptonuria a between heredity and congenital defects in the splitting of alanine and tyrosine, metabolism had no impact when they very large proportion have [because when these were delivered. The important concept been in the children of first the benzene ring in amino acids are adminis- that one gene corresponds to one enzyme cousin marriages.... It is normal tered to an alkaptonuric] (the “one gene–one enzyme hypothesis”) also noteworthy that, if one metabolism is the they cause a very con- takes families with five or was developed independently in the 1940s work of a special spicuous increase in the by George W. Beadle and Edward L. more children [with both par- output of homogentisic Tatum, who used the bread mold ents normal and at least one enzyme and that in acid.... Where the al- Neurospora crassa as their experimental child affected with alkap- congenital kaptonuric differs from organism. When Beadle finally became tonuria], the totals work out alkaptonuria this the normal individual is in strict conformity to aware of Inborn Errors of Metabolism, enzyme is wanting. in having no power of he was generous in praising it. This excerpt Mendel’s law, i.e. 57 [normal destroying homogentisic shows Garrod at his best, interweaving children] : 19 [affected chil- acid when formed—in history, clinical medicine, heredity, and dren] in the proportions 3 : 1....Ofin- other words of breaking up the benzene biochemistry in his account of alkap- born errors of metabolism, alkaptonuria ring of that compound.... We may fur- tonuria. The excerpt also illustrates how is that of which we know most. In itself it ther conceive that the splitting of the the severity of a genetic disease depends is a trifling matter, inconvenient rather benzene ring in normal metabolism is on its social context. Garrod writes as than harmful.... Indications of the the work of a special enzyme and that in though alkaptonuria were a harmless anomaly may be detected in early med- congenital alkaptonuria this enzyme is curiosity. This is indeed largely true when ical writings, such as that in 1584 of a wanting. the life expectancy is short. With today’s schoolboy who, although he enjoyed longer life span, alkaptonuria patients ac- good health, continuously excreted cumulate the dark pigment in their carti- black urine; and that in 1609 of a monk Source: Originally published in London, lage and joints and eventually develop who exhibited a similar peculiarity and England, by the Oxford University Press. severe arthritis. stated that he had done so all his life.... Excerpts from the reprinted edition in Harry Harris. 1963. Garrod’s Inborn Errors of There are no sufficient grounds [for Metabolism. London, England: Oxford doubting that the blackening substance University Press.
The pathway for the breakdown of clinical consequences of defects in the other phenylalanine and tyrosine, as it is under- enzymes. Unlike alkaptonuria, which is a stood today, is shown in Figure 1.10. In this relatively benign inherited disease, the oth- figure the emphasis is on the enzymes ers are very serious. The condition known rather than on the structures of the as phenylketonuria (PKU) results from metabolites, or small molecules, on which the absence of (or a defect in) the enzyme the enzymes act. Each step in the pathway phenylalanine hydroxylase (PAH). requires the presence of a particular en- When this step in the pathway is blocked, zyme that catalyzes that step. Although phenylalanine accumulates. The excess Garrod knew only about alkaptonuria, in phenylalanine is broken down into harmful which the defective enzyme is homogentisic metabolites that cause defects in myelin for- acid 1,2 dioxygenase, we now know the mation that damage a child’s developing
14 Chapter 1 Introduction to Molecular Genetics and Genomics nervous system and lead to severe mental Phenylalanine retardation. A defect in this Each step in a 1 enzyme leads to However, if PKU is diagnosed in children metabolic pathway Phenylalanine soon enough after birth, they can be placed hydroxylase accumulation of requires a different phenylalanine and on a specially formulated diet low in pheny- enzyme. to phenylketonuria. lalanine. The child is allowed only as much phenylalanine as can be used in the synthe- Tyrosine sis of proteins, so excess phenylalanine does A defect in this 2 enzyme leads to not accumulate. The special diet is very Tyrosine aminotransferase accumulation of strict. It excludes meat, poultry, fish, eggs, tyrosine and to milk and milk products, legumes, nuts, and tyrosinemia type II. bakery goods manufactured with regular flour. These foods are replaced by an expen- 4-Hydroxyphenyl- sive synthetic formula. With the special pyruvic acid A defect in this diet, however, the detrimental effects of ex- 3 Each enzyme is 4-Hydroxyphenyl- enzyme leads to cess phenylalanine on mental development accumulation of can largely be avoided, although in adult encoded in a pyruvic acid different gene. dioxygenase 4-hydroxyphenyl- women with PKU who are pregnant, the fe- pyruvic acid and to tus is at risk. In many countries, including tyrosinemia type III. the United States, all newborn babies have Homogentisic acid their blood tested for chemical signs of PKU. A defect in this Routine screening is cost-effective because 4 enzyme leads to Homogentisic acid PKU is relatively common. In the United 1,2-dioxygenase accumulation of States, the incidence is about 1 in 8000 homogentisic acid among Caucasian births. The disease is less and to alkaptonuria. common in other ethnic groups. 4-Maleylacetoacetic acid In the metabolic pathway in Figure 1.10, defects in the breakdown of tyrosine or of 4-hydroxyphenylpyruvic acid lead to types of tyrosinemia. These are also severe Further breakdown diseases. Type II is associated with skin le- Figure 1.10 Inborn errors of metabolism that sions and mental retardation, Type III with affect the breakdown of phenylalanine and severe liver dysfunction. tyrosine. An inherited disease results when any of the enzymes is missing or defective. Alkapto- nuria results from a mutant homogentisic acid Mutant Genes and Defective 1,2 dioxygenase phenylketonuria results from a Proteins mutant phenylalanine hydroxylase. It follows from Garrod’s work that a defec- tive enzyme results from a mutant gene, but how? Garrod did not speculate. For all he use the standard typographical convention knew, genes were enzymes. This would have that genes are written in italic type, whereas been a logical hypothesis at the time. We gene products are not printed in italics. This now know that the relationship between convention is convenient, because it means genes and enzymes is somewhat indirect. that the protein product of a gene can be With a few exceptions, each enzyme is en- represented with the same symbol as the coded in a particular sequence of nucleotides gene itself, but whereas the gene symbol is present in a region of DNA. The DNA region in italics, the protein symbol is not. that codes for the enzyme, as well as adja- cent regions that regulate when and in • The gene PAH on the long arm of chromo- which cells the enzyme is produced, make some 12 encodes phenylalanine hydroxylase (PAH). up the “gene” that encodes the enzyme. • The gene TAT on the long arm of chromo- The genes for the enzymes in the bio- some 16 encodes tyrosine aminotransferase chemical pathway in Figure 1.10 have all (TAT). been identified and the nucleotide se- • The gene HPD on the long arm of chromo- quence of the DNA determined. In the fol- some 12 encodes 4-hydroxyphenylpyruvic lowing list, and throughout this book, we acid dioxygenase (HPD).
1.4 Genes and Proteins 15 • The gene HGD on the long arm of chromo- chains; each polypeptide chain consists some 3 encodes homogentisic acid 1,2 of a linear sequence of amino acids con- dioxygenase (HGD). nected end to end. For example, the en- Next we turn to the issue of how genes code zyme PAH consists of four identical for enzymes and other proteins. polypeptide chains, each 452 amino acids in length. In the decoding of DNA, each successive “code word” in the DNA specifies the next amino acid to be added to the 1.5 Gene Expression: polypeptide chain as it is being made. The amount of DNA required to code for the The Central Dogma polypeptide chain of PAH is therefore Watson and Crick were correct in proposing 452 3 1356 nucleotide pairs. The en- that the genetic information in DNA is con- tire gene is very much longer—about tained in the sequence of bases in a manner 90,000 nucleotide pairs. Only 1.5 percent of analogous to letters printed on a strip of pa- the gene is devoted to coding for the amino per. In a region of DNA that directs the syn- acids. The noncoding part includes some se- thesis of a protein, the genetic code for the quences that control the activity of the protein is contained in only one strand, and gene, but it is not known how much of the it is decoded in a linear order. A typical pro- gene is involved in regulation. tein is made up of one or more polypeptide There are 20 different amino acids. Only four bases code for these 20 amino acids, with each “word” in the genetic code con- sisting of three adjacent bases. For example, the base sequence ATG specifies the amino acid methionine (Met), TCC specifies serine Nucleotide sequence ATGTCCACTGCGGTCCTGGAA (Ser), ACT specifies threonine (Thr), and in DNA molecule TACAGGTGACGCCAGGACCTT GCG specifies alanine (Ala). There are 64 possible three-base combinations but only 20 amino acids because some combinations code for the same amino acid. For example, TRANSCRIPTION TCT, TCC, TCA, TCG, AGT, and AGC all code for serine (Ser), and CTT, CTC, CTA, CTG, TTA, and TTG all code for leucine (Leu). An Two-step decoding An RNA intermediate example of the relationship between the process synthesizes plays the role of base sequence in a DNA duplex and the a polypeptide. ”messenger“ amino acid sequence of the corresponding protein is shown in Figure 1.11. This particular DNA duplex is the human sequence that codes for the first seven amino acids in the TRANSLATION polypeptide chain of PAH. The scheme outlined in Figure 1.11 in- dicates that DNA codes for protein not di- Amino acid sequence rectly but indirectly through the processes in polypeptide chain Met Ser Thr Ala Val Leu Glu ATGTCCACTGCGGTCCTGGAA of transcription and translation. The indirect route of information transfer, DNA triplets encoding each amino acid DNA RNA Protein Figure 1.11 DNA sequence coding for the first seven amino acids in a polypeptide chain. The is known as the central dogma of molecu- DNA sequence specifies the amino acid lar genetics. The term dogma means “set of sequence through a molecule of RNA that serves beliefs”; it dates from the time the idea was as an intermediary “messenger.” Although the put forward first as a theory. Since then the decoding process is indirect, the net result is that “dogma” has been confirmed experimen- each amino acid in the polypeptide chain is specified by a group of three adjacent bases in tally, but the term persists. The central the DNA. In this example, the polypeptide chain dogma is shown in Figure 1.12. The main is that of phenylalanine hydroxylase (PAH). concept in the central dogma is that DNA
16 Chapter 1 Introduction to Molecular Genetics and Genomics does not code for protein directly but rather acts through an intermediary molecule of DNA ribonucleic acid (RNA). The structure of RNA is similar to, but not identical with, that of DNA. There is a difference in the TRANSCRIPTION sugar (RNA contains the sugar ribose instead of deoxyribose), RNA is usually single-stranded (not a duplex), and RNA contains the base uracil (U) instead of rRNA mRNA tRNA (ribosomal) (messenger) (transfer) thymine (T), which is present in DNA. Actually, three types of RNA take part in the synthesis of proteins: • A molecule of messenger RNA (mRNA), Ribosome which carries the genetic information from DNA and is used as a template for polypep- tide synthesis. In most mRNA molecules, there is a high proportion of nucleotides that TRANSLATION actually code for amino acids. For example, the mRNA for PAH is 2400 nucleotides in length and codes for a polypeptide of 452 Protein amino acids; in this case, more than 50 percent of the length of the mRNA codes for amino acids. Figure 1.12 The “central dogma” of molecular • Several types of ribosomal RNA (rRNA), genetics: DNA codes for RNA, and RNA codes for which are major constituents of the cellular protein. The DNA RNA step is transcription, and the RNA protein step is translation. particles called ribosomes on which polypeptide synthesis takes place. • A set of transfer RNA (tRNA) molecules, each of which carries a particular amino acid mRNA for an unneeded protein. Another as well as a three-base recognition region possible reason may be historical. RNA that base-pairs with a group of three adjacent structure is unique in having both an infor- bases in the mRNA. As each tRNA partici- mational content present in its sequence of pates in translation, its amino acid becomes the terminal subunit added to the length of bases and a complex, folded three-dimen- the growing polypeptide chain. The tRNA sional structure that endows some RNA that carries methionine is denoted tRNAMet, molecules with catalytic activities. Many that which carries serine is denoted tRNASer, scientists believe that in the earliest forms and so forth. of life, RNA served both for genetic infor- mation and catalysis. As evolution pro- The central dogma is the fundamental ceeded, the informational role was principle of molecular genetics because it transferred to DNA and the catalytic role to summarizes how the genetic information in protein. However, RNA became locked into DNA becomes expressed in the amino acid its central location as a go-between in the sequence in a polypeptide chain: processes of information transfer and pro- The sequence of nucleotides in a gene specifies tein synthesis. This hypothesis implies that the sequence of nucleotides in a molecule of the participation of RNA in protein synthe- messenger RNA; in turn, the sequence of sis is a relic of the earliest stages of evolu- nucleotides in the messenger RNA specifies tion—a “molecular fossil.” The hypothesis the sequence of amino acids in the polypeptide is supported by a variety of observations. chain. For example, (1) DNA replication requires Given a process as conceptually simple an RNA molecule in order to get started as DNA coding for protein, what might ac- (Chapter 6), (2) an RNA molecule is essen- count for the additional complexity of RNA tial in the synthesis of the tips of the chro- intermediaries? One possible reason is that mosomes (Chapter 8), and (3) some RNA an RNA intermediate gives another level molecules act to catalyze key reactions in for control, for example, by degrading the protein synthesis (Chapter 11).
1.5 Gene Expression: The Central Dogma 17 the Watson–Crick pairing sense) to that in Transcription the DNA template, except that U (which The manner in which genetic information is pairs with A) is present in the RNA in place transferred from DNA to RNA is shown in of T. The rules of base pairing between DNA Figure 1.13. The DNA opens up, and one of the and RNA are summarized in Figure 1.14. Each strands is used as a template for the synthe- RNA strand has a polarity—a5' end and a 3' sis of a complementary strand of RNA. end—and, as in the synthesis of DNA, nu- (How the template strand is chosen is dis- cleotides are added only to the 3' end of a cussed in Chapter 11.) The process of mak- growing RNA strand. Hence the 5' end of ing an RNA strand from a DNA template is the RNA transcript is synthesized first, and transcription, and the RNA molecule that transcription proceeds along the template is made is the transcript. The base se- DNA strand in the 3'-to-5' direction. Each quence in the RNA is complementary (in gene includes nucleotide sequences that ini- tiate and terminate transcription. The RNA transcript made from any gene begins at the 3’ 5’ initiation site in the template strand, which is located “upstream” from the amino TA acid–coding region, and ends at the termi- CG nation site, which is located “downstream” AT from the amino acid–coding region. For any gene, the length of the RNA transcript is CG very much smaller than the length of the CG DNA in the chromosome. For example, the GC transcript of the PAH gene for phenyl- T A alanine hydroxylase is about 90,000 nu- cleotides in length, but the DNA in CG chromosome 12 is about 130,000,000 nu- T A cleotide pairs. In this case, the length of the CG PAH transcript is less than 0.1 percent of the TA length of the DNA in the chromosome. A different gene in chromosome 12 would be Direction of DNA strand TA transcribed from a different region of the growth of being transcribed A T RNA strand DNA molecule in chromosome 12, and per- A T haps from the opposite strand, but the tran- scribed region would again be small in C 3’ G comparison with the total length of the CG G C DNA in the chromosome. GC A T A Translation RNA transcript C CG T The synthesis of a polypeptide under the di- * U A rection of an mRNA molecule is known as CG C translation. Although the sequence of UA * T bases in the mRNA codes for the sequence G TA of amino acids in a polypeptide, the mole- A cules that actually do the “translating” are AU T 5’ * the tRNA molecules. The mRNA molecule 3’ is translated in nonoverlapping groups of 5’ three bases called codons. For each codon U in RNA pairs in the mRNA that specifies an amino acid, with A in DNA there is one tRNA molecule containing a complementary group of three adjacent Figure 1.13 Transcription is the production of an RNA strand that is bases that can pair with those in that codon. complementary in base sequence to a DNA strand. In this example, the DNA strand at the bottom is being transcribed into a strand of RNA. Note The correct amino acid is attached to the that in an RNA molecule, the base U (uracil) plays the role of T (thymine) other end of the tRNA, and when the tRNA in that it pairs with A (adenine). Each A U pair is marked. comes into line, the amino acid to which it
18 Chapter 1 Introduction to Molecular Genetics and Genomics Base in DNA template which combines with a single mRNA and Adenine Thymine Guanine Cytosine moves along it from one end to the other in steps, three nucleotides at a time (codon by A T G C codon). As each new codon comes into U A C G place, the next tRNA binds with the ribo- some. Then the growing end of the poly- Uracil Adenine Cytosine Guanine peptide chain becomes attached to the Base in RNA transcript amino acid on the tRNA. In this way, each Figure 1.14 Pairing between bases in DNA and in tRNA in turn serves temporarily to hold the RNA. The DNA bases A, T, G, and C pair with polypeptide chain as it is being synthesized. the RNA bases U, A, C, and G, respectively. As the polypeptide chain is transferred from each tRNA to the next in line, the tRNA that previously held the polypeptide is released from the ribosome. The polypeptide chain is attached becomes the most recent addi- elongates one amino acid at each step until tion to the growing end of the polypeptide any one of three particular codons specify- chain. ing “stop” is encountered. At this point, The role of tRNA in translation is illus- synthesis of the chain of amino acids is fin- trated in Figure 1.15 and can be described as ished, and the polypeptide chain is released follows: from the ribosome. (This brief description of The mRNA is read codon by codon. Each translation glosses over many of the details codon that specifies an amino acid matches that are presented in Chapter 11.) with a complementary group of three adjacent bases in a single tRNA molecule. One end of the tRNA is attached to the correct amino acid, The Genetic Code so the correct amino acid is brought into line. Figure 1.15 indicates that the mRNA The tRNA molecules used in translation codon AUG specifies methionine (Met) in do not line up along the mRNA simultane- the polypeptide chain, UCC specifies Ser ously as shown in Figure 1.15. The process (serine), ACU specifies Thr (threonine), of translation takes place on a ribosome, and so on. The complete decoding table is
The coding sequence of bases in mRNA specifies Messenger the amino acid sequence RNA (mRNA) of a polypeptide chain.
Bases A UGU CCACUGCGGUCCUGGAA Each group of three
in the U AC [ adjacent bases is a codon. A GG mRNA UGA CGC CAG The mRNA is translated Transfer GAC codon by codon by means CUU RNA of tRNA molecules. Met (tRNA) Ser Thr Each tRNA has a different Ala Val base sequence but about Leu the same overall shape. Glu Each tRNA carries an amino acid to be added to the polypeptide chain.
Figure 1.15 The role of messenger RNA in translation is to carry the information contained in a sequence of DNA bases to a ribosome, where it is translated into a polypeptide chain. Translation is mediated by transfer RNA (tRNA) molecules, each of which can base-pair with a group of three adjacent bases in the mRNA. Each tRNA also carries an amino acid. As each tRNA, in turn, is brought to the ribosome, the growing polypeptide chain is elongated.
1.5 Gene Expression: The Central Dogma 19 Table 1.1 The standard genetic code Second nucleotide in codon UC AG UUU Phe F Phenylalanine UCU Ser S Serine UAU Tyr Y Tyrosine UGU Cys C Cysteine U U UUC Phe F Phenylalanine UCC Ser S Serine UAC Tyr Y Tyrosine UGC Cys C Cysteine C UUA Leu L Leucine UCA Ser S Serine UAA Termination UGA Termination A
UUG Leu L Leucine UCG Ser S Serine UAG Termination UGG Trp W Tryptophan G Third nucleotide in codon (3’ end) CUU Leu L Leucine CCU Pro P Proline CAU His H Histidine CGU Arg R Arginine U C CUC Leu L Leucine CCC Pro P Proline CAC His H Histidine CGC Arg R Arginine C CUA Leu L Leucine CCA Pro P Proline CAA Gln Q Glutamine CGA Arg R Arginine A CUG Leu L Leucine CCG Pro P Proline CAG Gln Q Glutamine CGG Arg R Arginine G AUU Ile I Isoleucine ACU Thr T Threonine AAU Asn N Asparagine AGU Ser S Serine U A AUC Ile I Isoleucine ACC Thr T Threonine AAC Asn N Asparagine AGC Ser S Serine C AUA Ile I Isoleucine ACA Thr T Threonine AAA Lys K Lysine AGA Arg R Arginine A
First nucleotide in codon (5’ end) AUG Met M Methionine ACG Thr T Threonine AAG Lys K Lysine AGG Arg R Arginine G GUU Val V Valine GCU Ala A Alanine GAU Asp D Aspartic acid GGU Gly G Glycine U G GUC Val V Valine GCC Ala A Alanine GAC Asp D Aspartic acid GGC Gly G Glycine C GUA Val V Valine GCA Ala A Alanine GAA Glu E Glutamic acid GGA Gly G Glycine A GUG Val V Valine GCG Ala A Alanine GAG Glu E Glutamic acid GGG Gly G Glycine G
Codon Three-letter and single-letter abbreviations
called the genetic code, and it is shown in In addition to the 61 codons that code Table 1.1. For any codon, the column on the only for amino acids, there are four codons left corresponds to the first nucleotide in that have specialized functions: the codon (reading from the 5' end), the • The codon AUG, which specifies Met row across the top corresponds to the sec- (methionine), is also the “start” codon for ond nucleotide, and the column on the polypeptide synthesis. The positioning of a right corresponds to the third nucleotide. tRNAMet bound to AUG is one of the first The complete codon is given in the body of steps in the initiation of polypeptide synthe- the table, along with the amino acid (or sis, so all polypeptide chains begin with Met. translational “stop”) that the codon speci- (Many polypeptides have the initial Met fies. Each amino acid is designated by its cleaved off after translation is complete.) In full name and by a three-letter abbreviation most organisms, the tRNAMet used for initia- Met as well as a single-letter abbreviation. Both tion of translation is the same tRNA used types of abbreviations are used in molecular to specify methionine at internal positions in genetics. The code in Table 1.1 is the “stan- a polypeptide chain. dard” genetic code used in translation in • The codons UAA, UAG, and UGA are each the cells of nearly all organisms. In Chapter a “stop” that specifies the termination of 11 we examine general features of the stan- translation and results in release of the dard genetic code and the minor differences completed polypeptide chain from the found in the genetic codes of certain organ- ribosome. These codons do not have tRNA isms and cellular organelles. At this point, molecules that recognize them but are instead recognized by protein factors that we are interested mainly in understanding terminate translation. how the genetic code is used to translate the codons in mRNA into the amino acids How the genetic code table is used to in- in a polypeptide chain. fer the amino acid sequence of a polypep-
20 Chapter 1 Introduction to Molecular Genetics and Genomics tide chain can be illustrated by using PAH Codon number again, in particular the DNA sequence cod- in PAH gene ing for amino acids 1 through 7. The DNA sequence is 1234567 5'-ATGTCCACTGCGGTCCTGGAA-3' A TGTCCACTGCGGTCCTGGAA 3'-TACAGGTGACGCCAGGACCTT-5' DNA TACAGGTGACGCCAGGACCTT This region is transcribed into RNA in a left- to-right direction, and because RNA grows TRANSCRIPTION by the addition of successive nucleotides to the 3' end (Figure 1.13), it is the bottom mRNA A UGU CCACUGCGGUCCUGGAA U AC strand that is transcribed. The nucleotide A GG UGA sequence of the RNA is that of the top CGC strand of the DNA, except that U replaces T, TRANSLATION CAG GAC so the mRNA for amino acids 1 through 7 is CUU Met 5'-AUGUCCACUGCGGUCCUGGAA-3' Ser Thr The codons are read from left to right ac- Polypeptide Ala Val cording to the genetic code shown in Table Leu 1.1. Codon AUG codes for Met (methion- Amino acid number Glu ine), UCC codes for Ser (serine), and so on. in PAH polypeptide 1234567 Altogether, the amino acid sequence of this region of the polypeptide is Figure 1.16 The central dogma in action. The DNA that encodes PAH serves as a template for the production of a messenger RNA, and the 5'-AUG UCC ACU GCG GUC CUG GAA-3' mRNA serves to specify the sequence of amino acids in the PAH Met Ser Thr Ala Val Leu Glu polypeptide chain through interactions with the ribosome and tRNA molecules. or, in terms of the single-letter abbre- viations, 5'-AUG UCC ACU GCG GUC CUG GAA-3' MSTAVLE duplex molecule may mutate to T A, A T, The full decoding operation for this region or G C. The change in base sequence may of the PH gene is shown in Figure 1.16. In this also be more complex, such as the deletion figure, the initiation codon AUG is high- or addition of base pairs. These and other lighted because some patients with PKU types of mutations are considered in have a mutation in this particular codon. As Chapter 7. Geneticists also use the term might be expected from the fact that me- mutant, which refers to the result of a thionine is the initiation codon for polypep- mutation. A mutation yields a mutant tide synthesis, cells in patients with this gene, which in turn produces a mutant particular mutation fail to produce any of mRNA, a mutant protein, and finally a mu- the PAH polypeptide. Mutation and its con- tant organism that exhibits the effects of sequences are considered next. the mutation—for example, an inborn er- ror of metabolism. DNA from patients from all over the 1.6 Mutation world who have phenylketonuria has been studied to determine what types of muta- The term mutation refers to any heritable tions are responsible for the inborn error. change in a gene (or, more generally, in the There are a large variety of mutant types. genetic material) or to the process by which More than 400 different mutations have such a change takes place. One type of mu- been described in the gene for PAH. In tation results in a change in the sequence some cases part of the gene is missing, so of bases in DNA. The change may be sim- the genetic information to make a complete ple, such as the substitution of one pair of PAH enzyme is absent. In other cases the bases in a duplex molecule for a different genetic defect is more subtle, but the result pair of bases. For example, a C G pair in a is still either the failure to produce a PAH
1.6 Mutation 21 changes the normal codon AUG (Met) used A G Codon 1 in Mutation of T C PAH gene for the initiation of translation into the codon GUG, which normally specifies va- 1234567 line (Val) and cannot be used as a “start” codon. The result is that translation of the G TGTCCACTGCGGTCCTGGAA PAH mRNA cannot occur, and so no PAH DNA C ACAGGTGACGCCAGGACCTT polypeptide is made. This mutant is desig- nated M1V because the codon for M (me- thionine) at amino acid position 1 in the Mutant initiation codon TRANSCRIPTION in PAH mRNA PAH polypeptide has been changed to a codon for V (valine). Although the M1V mutant is quite rare worldwide, it is com- mRNA G UGUCCACUGCGGUCCUGGAA mon in some localities, such as Québec CAC Province in Canada. One PAH mutant that is quite common TRANSLATION tRNAVal is designated R408W, which means that codon 408 in the PAH polypeptide chain X No PAH polypeptide is produced has been changed from one coding for argi- Val Val because tRNA cannot be used nine (R) to one coding for tryptophan (W). to initiate polypeptide synthesis. This mutation is one of the four most com- mon among European Caucasians with Figure 1.17 The M1V mutant in the PAH gene. The methionine codon needed for initiation PKU. The molecular basis of the mutant is mutates into a codon for valine. Translation shown in Figure 1.18. In this case, the first cannot be initiated, and no PAH polypeptide is base pair in codon 408 is changed from a produced. C G base pair into a T A base pair. The re- sult is that the PAH mRNA has a mutant codon at position 408; specifically, it has protein or the production of a PAH protein UGG instead of CGG. Translation does oc- that is inactive. In the mutation shown in cur in this mutant because everything else Figure 1.17, substitution of a G C base pair about the mRNA is normal, but the result is for the normal A T base pair at the very that the mutant PAH carries a tryptophan first position in the coding sequence (Trp) instead of an arginine (Arg) at posi-
The women in the wedding photograph are sisters. Both are homozygous for the same mutant PAH gene. The bride is the younger of the two. She was diagnosed just three days after birth and put on the PKU diet soon after. Her older sister, the maid of honor, was diagnosed too late to begin the diet and is mentally retarded. The two-year old pictured in the photo at the right is the daughter of the married couple. They planned the pregnancy: dietary control was strict from conception to delivery to avoid the hazards of excess phenylalanine harming the fetus. Their daughter has passed all developmental milestones with distinction. [Courtesy of Charles R. Scriver.]
22 Chapter 1 Introduction to Molecular Genetics and Genomics Figure 1.18 The R408W mutant in Mutation of C T Codon 408 in G A the PAH gene. Codon 408 for arginine PAH gene (R) is mutated into a codon for trypto- phan (W). The result is that position 408 408 in the mutant PAH polypeptide is occupied by tryptophan rather than by G CCACAATAC CT T GGCCCTTCTCAGTTCGC arginine. The mutant protein has no DNA CGGTGTTATG GA ACCGGGAAGAGTCAAGCG PAH enzyme activity.
TRANSCRIPTION
mRNA G CCACAAUACCU UGGCCCUUC UCAGUUCGC C GG U GU UAU Mutant codon GGA in PAH mRNA TRANSLATION ACC GGG AAG Ala AGU Thr CAA Ile GCG Polypeptide Pro Trp Pro Mutant amino acid Phe Ser in PAH polypeptide Val Arg tion 408 in the polypeptide chain. The con- ing, or in the coming together of the pro- sequence of the seemingly minor change of tein subunits, or in the stability of the one amino acid is very drastic. Although folded protein. Protein folding is the com- the R408W polypeptide is complete, the plex process by which polypeptide chains enzyme has less than 3 percent of the activ- attain a stable three-dimensional structure ity of the normal enzyme. through short-range chemical interactions between nearby amino acids and long- range interactions between amino acids in Protein Folding and Stability different parts of the molecule. Folding nor- More than 400 different mutations in the mally occurs as the polypeptide is being PAH gene have been identified in patients synthesized on the ribosome, and the with PKU throughout the world. Many of process is facilitated by a class of proteins the mutations affect the level of expression known as chaperones. During the folding of the gene or processing of the RNA tran- process, the polypeptide chain twists and script, and some mutations are deletions in bends until it achieves a minimum energy which part of the gene is missing. But more state that maximizes the stability of the re- than 240 of the mutations are simple amino sulting structure, which is referred to as the acid replacements resulting from single nu- native conformation. For example, one cleotide substitutions in the DNA. Surpris- aspect of protein folding is that hydropho- ingly, only a minority of amino acid bic amino acids, which have low affinity for replacements result in a normal amount of water molecules, tend to move toward each PAH protein with a reduced enzyme activity. other and to form a relatively hydrophobic As a result of most mutations the amount of center, or core, in the native conformation. PAH is reduced, sometimes drastically, and For a polypeptide of realistic length, there in some other mutations the enzyme activ- are so many short-range and long-range in- ity of the PAH protein that remains is virtu- teractions, and so many possible folded ally normal. Yet in all these cases the level of conformations, that even the fastest com- expression of the gene, and the amount of puters cannot calculate and compare all mRNA, are within the normal range. their energy levels. Computer simulation of The reason why so many amino acid protein folding has yielded some insights, replacements reduce the amount of protein but the reliable prediction of protein folding is that they cause problems in protein fold- is still a major challenge.
1.6 Mutation 23 Figure 1.19 Some (A)–Normal folding pathway (B)–Aberrant folding pathway amino acid replace- (subunits prone to aggregation) ments perturb the ability of a protein to fold properly. (A) Normal folding in phenylalanine Unfolded hydroxylase forms Folding intermediate PAH polypeptide Abnormal intermediate the active tetramer. (B) Abnormal folding of a mutant poly- peptide chain results in the formation of polypeptide aggre- gates, which are progressively cleaved Folding intermediate Aggregate into the constituent amino acids through a ubiquitin-dependent proteosomal degrada- Active site tion pathway. Oligomerization for substrate domains through cleavage which monomers interact Folded monomer
Aggregate
Proteasome
Binding of oligomerization domains
Degradation via Tetramer (active form in cells) ubiquitin-dependent proteasomal pathway and other routes
Hypothetical pathways of protein fold- esses are reversible, so any amino acid re- ing in the case of PAH are illustrated in placement that decreases the stability of the Figure 1.19. The normal pathway is shown in tetramer or any of the intermediates will part A, including some of the (typically cause more of the PAH polypeptides to fold short-lived) intermediates in the folding according to pathway B. process. The native conformation of a single Pathway B in the figure is a misfolding PAH polypeptide constitutes the PAH pathway in which the folded monomers are monomer. Like many other polypeptides, prone to undergo irreversible aggregation the PAH monomer contains short regions, with each other. These aggregates are called oligomerization domains (oligo targeted for enzymatic breakdown into means “a small number”), through which their constituent amino acids, first by be- PAH polypeptides undergo stable binding to coming covalently bound with a 76–amino one another. In the case of PAH, the active acid polypeptide called ubiquitin, which is form of the PAH enzyme is a tetramer, attached through the activity of several consisting of four identical polypeptide proteins, including a ubiquitin-conjugating chains held together by interactions be- enzyme. The tagged protein is then degraded tween the tetramerization domains. Note by the proteasome, which is a large multi- that the folding and tetramerization proc- protein complex containing proteins with
24 Chapter 1 Introduction to Molecular Genetics and Genomics Au: Please con- firm that caption is OK as set.
Three-dimensional structure of two of the four subunits in the active tetramer of phenylalanine hydroxylase. The chains of amino acids are represented as a sequence of curls, loops, and flat arrows, which represent different types of local structure described in Chapter 11. The oligomeriza- tion (in this case, tetramerization) domain is shown in green, and the catalytic domain containing the active site is shown in gold. Compare this with the simplified diagram in Figure 1.19. [Courtesy of R. C. Stevens, T. Flatmark, and H. Erlandsen. From H. Erlandsen, F. Fusetti, A Martinez, E. Hough, T. Flatmark, and R. C. Stevens. 1997. Nature Structural Biology 4: 995.]
ubiquitin-binding, protease, and other ac- ample is true in general. Most traits are de- tivities. The ubiquitin/proteasome pathway termined by the interaction of genes and is required for the degradation of specific environment. proteins during the cell cycle and develop- It is also true that most traits are affected ment, and it is also used to degrade certain by multiple genes. No one knows how proteins that are intrinsically unstable or many genes are involved in the develop- that become unfolded in response to stress. ment and maturation of the brain and ner- In the case of PAH, many amino acid re- vous system, but the number must be in the placements decrease the stability of the pro- thousands. This number is in addition to the tein to such an extent that the protein is genes that are required in all cells to carry routed through the misfolding pathway in out metabolism and other basic life func- Figure 1.19B and targeted for degradation. tions. It is easy to lose sight of the multiplic- A destabilizing amino acid replacement can ity of genes when considering extreme be located at virtually any position along examples, such as PKU, in which a single the polypeptide chain. mutation can have such a drastic effect on mental development. The situation is the same as that with any complex machine. An 1.7 airplane can function if thousands of parts Genes and Environment are working together in harmony, but only Inborn errors of metabolism illustrate the one defective part, if that part affects a vital general principle that genes code for pro- system, can bring it down. Likewise, the de- teins and that mutant genes code for mu- velopment and functioning of every trait re- tant proteins. In cases such as PKU, mutant quire a large number of genes working in proteins cause such a drastic change in harmony, but in some cases a single mutant metabolism that a severe genetic defect gene can have catastrophic consequences. results. But biology is not necessarily des- In other words, the relationship be- tiny. Organisms are also affected by the en- tween a gene and a trait is not necessarily a vironment. PKU serves as an example of simple one. The biochemistry of organisms this principle, because patients who adhere is a complex branching network in which to a diet restricted in the amount of phenyl- different enzymes may share substrates, alanine develop mental capacities within yield the same products, or be responsive to the normal range. What is true in this ex- the same regulatory elements. The result is
1.7 Genes and Environment 25 that most visible traits of organisms are the 2. Any trait can be affected by more than net result of many genes acting together and one gene. We discussed this principle earlier in combination with environmental factors. in connection with the large number of PKU affords examples of each of three prin- genes that are required for the normal de- ciples governing these interactions: velopment and functioning of the brain and nervous system. Among these are genes 1. One gene can affect more than one trait. that affect the function of the blood–brain Children with extreme forms of PKU often barrier, which consists of specialized glial have blond hair and reduced body pigment. cells wrapped around tight capillary walls in This is because the absence of PAH is a the brain, forming an impediment to the metabolic block that prevents conversion of passage of most water-soluble molecules phenylalanine into tyrosine, which is the from the blood to the brain. The blood–brain precursor of the pigment melanin. The rela- barrier therefore affects the extent to which tionship between severe mental retardation excess free phenylalanine in the blood can and decreased pigmentation in PKU makes enter the brain itself. Because the effective- sense only if one knows the metabolic con- ness of the blood–brain barrier differs nections among phenylalanine, tyrosine, among individuals, PKU patients with very and melanin. If these connections were not similar levels of blood phenylalanine can known, the traits would seem completely have dramatically different levels of cogni- unrelated. PKU is not unusual in this re- tive development. This also explains in part gard. Many mutant genes affect multiple why adherence to a controlled-phenylala- traits through their secondary or indirect nine diet is critically important in children effects. The various, sometimes seemingly but less so in adults; the blood–brain barrier unrelated, effects of a mutant gene are is less well developed in children and is called pleiotropic effects, and the phe- therefore less effective in blocking the ex- nomenon itself is known as pleiotropy. cess phenylalanine. Figure 1.20 shows a cat with white fur and Multiple genes affect even simpler meta- blue eyes, a pattern of pigmentation that is bolic traits. Phenylalanine breakdown and often (about 40 percent of the time) associ- excretion serve as a convenient example. ated with deafness. Hence deafness can be The metabolic pathway is illustrated in regarded as a pleiotropic effect of white coat Figure 1.10. Four enzymes in the pathway and blue eye color. The developmental ba- are indicated, but even more enzymes are sis of this pleiotropy is unknown. involved at the stage labeled “further break- down.” Because differences in the activity of any of these enzymes can affect the rate at which phenylalanine can be broken down and excreted, all of the enzymes in the pathway are important in determining the amount of excess phenylalanine in the blood of patients with PKU.
3. Most traits are affected by environmen- tal factors as well as by genes. Here we come back to the low-phenylalanine diet. Children with PKU are not doomed to se- vere mental deficiency. Their capabilities can be brought into the normal range by di- etary treatment. PKU serves as an example of what motivates geneticists to try to dis- cover the molecular basis of inherited dis- ease. The hope is that knowing the metabolic basis of the disease will eventu- Figure 1.20 Among cats with white fur and blue eyes, ally make it possible to develop methods for about 40 percent are born deaf. We do not know why there is defective hearing nor why it is so often associated clinical intervention through diet, medica- with coat and eye color. This form of deafness can be tion, or other treatments that will amelio- regarded as a pleiotropic effect of white fur and blue eyes. rate the severity of the disease.
26 Chapter 1 Introduction to Molecular Genetics and Genomics chromosomes will turn out to share several 1.8 Evolution: From Genes thousand protein families. Furthermore, to Genomes, from Proteins about a thousand of these protein families to Proteomes are shared with organisms as distantly re- lated as bacteria. The pathway for the breakdown and excre- tion of phenylalanine is by no means The Molecular Unity of Life unique to human beings. One of the re- Why do organisms share a common set of markable generalizations to have emerged similar genes and proteins? Because all from molecular genetics is that organisms creatures share a common origin. The that are very distinct—for example, plants process of evolution takes place when a and animals—share many features in their population of organisms descended from a genetics and biochemistry. These similari- common ancestor gradually changes in ge- ties indicate a fundamental “unity of life”: netic composition through time. From an All creatures on Earth share many features of evolutionary perspective, the unity of fun- the genetic apparatus, including genetic damental molecular processes is derived by information encoded in the sequence of bases inheritance from a distant common ances- in DNA, transcription into RNA, and transla- tor in which the molecular mechanisms tion into protein on ribosomes with the use of were already in place. transfer RNAs. All creatures also share certain Not only the unity of life but also many characteristics in their biochemistry, including other features of living organisms become many enzymes and other proteins that are comprehensible from an evolutionary per- similar in amino acid sequence, three- dimensional structure, and function. spective. For example, the interposition of an RNA intermediate in the basic flow of The totality of DNA in a single cell is called the genome of the organism. In sexual or- ganisms, the genome is usually regarded as the DNA present in a reproductive cell. The human genome, which is contained in the chromosomes of a sperm or egg, includes approximately 3 billion nucleotide pairs of DNA. The complete set of proteins encoded in the genome is known as the proteome. The study of genomes constitutes genomics; 09131_01_1677P the study of proteomes constitutes pro- teomics. The fundamental unity of life can be seen in the similarity of proteins in the pro- teomes of diverse types of organisms. For example, the proteome of the fruit fly Drosophila melanogaster includes 13,601 pro- teins; these can be grouped into 8065 dif- ferent families of proteins that are similar in amino acid sequence. For comparison, the proteome of the nematode worm Caenorhabditis elegans contains 18,424 pro- teins that can be grouped into 9453 fami- Photocaptionphotoca lies. These two proteomes share about 5000 ptionphotocaption- 09131_01_1678P proteins that are sufficiently similar to be photocaptionphoto- captionphotocaptionp regarded as having a common function. hotocaptionphotocap- Among the protein families that are com- tionphotocaptionpho- mon to flies and worms, about 3000 are tocaptionphotocaption also found in the proteome of the yeast photocaptionphoto- Saccharomyces cerevisiae. Based on these data captionphotocaption- photocaptionphotocap from these three completely sequenced tionphotocaptionpho- complex genomes, it seems likely that all tocaptionphotocap- organisms whose cells have a nucleus and tionphotocaption
1.8 Evolution from Genes to Genomes, from Proteins to Proteomes 27 genetic information from DNA to RNA to resembles that of bacteria. About half of the protein makes sense if the earliest forms of genes found in the kingdom Archaea are life used RNA for both genetic information unique to this group. and enzyme catalysis. The importance of 3. The kingdom Eukarya This group includes the evolutionary perspective in undestand- all organisms whose cells contain an elaborate network of internal membranes, a ing aspects of biology that seem pointless or membrane-bounded nucleus, and needlessly complex is summed up in a mitochondria. Their DNA is organized into famous aphorism of the evolutionary biolo- true chromosomes, and cell division takes gist Theodosius Dobzhansky: “Nothing in place by means of mitosis (discussed in biology makes sense except in the light of Chapter 4). The eukaryotes include plants evolution.” and animals as well as fungi and many One indication of the common ancestry single-celled organisms, such as amoebae among Earth’s creatures is illustrated in and ciliated protozoa. Figure 1.21. The tree of relationships was in- The members of the kingdoms Bacteria ferred from similarities in nucleotide se- and Archaea are often grouped together into quence in an RNA molecule found in the a larger assemblage called prokaryotes, small subunit of the ribosome. Three major which literally means “before [the evolution kingdoms of organisms are distinguished: of] the nucleus.” This terminology is conve- 1. The kingdom Bacteria This group includes nient for designating prokaryotes as a group most bacteria and cyanobacteria (formerly in contrast with eukaryotes, which literally called blue-green algae). Cells of these means “good [well-formed] nucleus.” organisms lack a membrane-bounded nucleus and mitochondria, are surrounded by a cell wall, and divide by binary fission. Natural Selection and Diversity 2. The kingdom Archaea This group was Although Figure 1.21 illustrates the unity initially discovered among microorganisms of life, it also illustrates life’s diversity. Frogs that produce methane gas or that live in extreme environments, such as hot springs are different from fungi, and beetles are dif- or high salt concentrations. They are widely ferent from bacteria. As a human being, it is distributed in more normal environments as sobering to consider that complex, multi- well. Like those of bacteria, the cells of cellular organisms are relatively recent ar- archaeans lack internal membranes. DNA sequence analysis indicates that the machin- ery for DNA replication and transcription in archaeans resembles that of eukaryans, whereas their metabolism strongly EUKARYA Stramenopiles Alveolates Figure 1.21 Evolutionary relation- ships among the major life forms Plantae Red algae as inferred from similarities in Slime molds nucleotide sequence in an Animalia Entamoebae RNA molecule found in BACTERIA the small subunit of the Mycoplasma Fungi Heterolobosea ribosome. The three Physarum Plant Chloroplasts major kingdoms— Kinetoplastids Cyanobacteria Bacteria, Archaea, and Euglenoids Eukarya—are apparent. Plants, animals, and Microsporidians Agrobacterium fungi are more closely Trichomonads related to each other Diplomonads Plant Mitochondria than to members of either of the other kingdoms. Enterobacteria Sulfolobus Among eukaryotes, the time of branching of the diverse groups Thermoplasma Halobacteria of undifferentiated, relatively simple organisms (trichomonads, microsporidians, Methanobacteria euglenoids, and so forth) is probably not as ARCHAEA ancient as suggested by the diagram. [Courtesy of Mitchell L. Sogin.]
28 Chapter 1 Introduction to Molecular Genetics and Genomics rivals to the evolutionary scene of life on process that accounts for adaptation was Earth. Animals came later still and primates described by Charles Darwin in his 1859 very late indeed. What about human evo- book On the Origin of Species. Darwin pro- lution? In the time scale of Earth history, posed that adaptation is the result of nat- human evolution is a matter of a few mil- ural selection: Individual organisms lion years—barely a snap of the fingers. carrying particular mutations or combina- If common ancestry is the source of the tions of mutations that enable them to sur- unity of life, what is the source of its diver- vive or reproduce more effectively in the sity? Because differences among species are prevailing environment leave more off- inherited, the original source of the differ- spring than other organisms and so ences must be mutation. However, muta- contribute their favorable genes dispro- tions alone are not sufficient to explain portionately to future generations. If this why organisms are adapted to living in process is repeated throughout the course their environments—why ocean mammals of many generations, the entire species be- have special adaptations for swimming and comes genetically transformed—that is, it diving, why desert mammals have special evolves—because a gradually increasing adaptations that enable them to survive on proportion of the population inherits the minimal amounts of water. Mutations are favorable mutations. Mutation, natural se- chance events not directed toward any par- lection, and other features of the field of ticular adaptive goal, such as longer fur population genetics (also called evolutionary ge- among mammals living in the Arctic. The netics) are discussed in Chapter 17.
Chapter Summary Organisms of the same species have some traits (charac- cleotide subunits twisted around each other to form a teristics) in common but may differ from each other in in- right-handed helix. Each nucleotide subunit contains any numerable other traits. Many of the differences between one of four bases: A (adenine), T (thymine), G (guanine), organisms result from genetic differences, the effects of or C (cytosine). The bases are paired in the two strands of the environment, or both. Genetics is the study of inher- a DNA molecule; wherever one strand has an A, the part- ited traits, including those influenced in part by the envi- ner strand has a T, and wherever one strand has a G, the ronment. The elements of heredity consist of genes, which partner strand has a C. The base pairing means that the are transmitted from parents to offspring in reproduction. two paired strands in a DNA duplex molecule have com- Although the sorting of genes in successive generations plementary base sequences along their lengths. The was first expressed numerically by Mendel, the chemical structure of the DNA molecule suggested that genetic in- basis of genes was discovered by Miescher in the form of a formation could be coded in DNA in the sequence of weak acid, deoxyribonucleic acid (DNA). However, exper- bases. Mutations—changes in the genetic material—re- imental proof that DNA is the genetic material did not sult from changes in the sequence of bases, such as the come until about the middle of the twentieth century. substitution of one nucleotide for another or the insertion The first convincing evidence of the role of DNA in or deletion of one or more nucleotides. The structure of heredity came from the experiments of Avery, MacLeod, DNA also suggested a mode of replication: The two and McCarty, who showed that genetic characteristics in strands of the parental DNA molecule separate, and each bacteria could be altered from one type to another by individual strand serves as a template for the synthesis of treatment with purified DNA. In studies of Streptococcus a new, complementary strand. pneumoniae, they transformed mutant cells unable to Most genes code for proteins. More precisely stated, cause pneumonia into cells that could do so by treating most genes specify the sequence of amino acids in a them with pure DNA from disease-causing forms. A sec- polypeptide chain. The transfer of genetic information ond important line of evidence was the Hershey–Chase from DNA into protein is a multistep process that includes experiment. Hershey and Chase showed that the T2 bac- several types of RNA (ribonucleic acid). Structurally, an terial virus injects primarily DNA into the host bacterium RNA strand is similar to a DNA strand except that the (Escherichia coli) and that a much higher proportion of “backbone” contains a different sugar (ribose instead of parental DNA, compared with parental protein, is found deoxyribose) and RNA contains the base uracil (U) in- among the progeny phage. stead of thymine (T). Also, RNA is usually present in cells The three-dimensional structure of DNA, proposed in in the form of single, unpaired strands. The initial step in 1953 by Watson and Crick, gave many clues to the man- gene expression is transcription, in which a molecule of ner in which DNA functions as the genetic material. A RNA is synthesized that is complementary in base molecule of DNA consists of two long chains of nu- sequence to whichever DNA strand is being transcribed.
Chapter Summary 29 In polypeptide synthesis, which takes place on a ribo- Most visible traits of organisms result from many some, the base sequence in the RNA transcript is trans- genes acting together in combination with environmental lated in groups of three adjacent bases (codons). The factors. The relationship between genes and traits is often codons are recognized by different types of transfer RNA complex because (1) every gene potentially affects many (tRNA) through base pairing. Each type of tRNA is at- traits (pleiotropy), (2) every trait is potentially affected by tached to a particular amino acid, and when a tRNA base- many genes, and (3) many traits are significantly affected pairs with the proper codon on the ribosome, the growing by environmental factors as well as by genes. end of the polypeptide chain is transferred to the amino All living creatures are united by sharing many fea- acid on the tRNA. The table of all codons and the amino tures of the genetic apparatus (for example, transcription acids they specify is called the genetic code. Special codons and translation) and many aspects of metabolism. They specify the “start” (AUG, Met) and “stop” (UAA, UAG, share many similar genes in the genome and many simi- and UGA) of polypeptide synthesis. The reason why vari- lar proteins in the proteome. The unity of life results from ous types of RNA are an intimate part of transcription and all life being of common ancestry and provides evidence translation is probably that the earliest forms of life used for evolution. There is also great diversity among living RNA for both genetic information and enzyme catalysis. creatures. The three major kingdoms of organisms are the A mutation that alters one or more codons in a gene kingdoms Bacteria (whose members lack a membrane- can change the amino acid sequence of the resulting bounded nucleus), Archaea (whose members share fea- polypeptide chain synthesized in the cell. Often the al- tures with both bacteria and eukaryans but form a tered protein is functionally defective, so an inborn error distinct group), and Eukarya (which includes all “higher” of metabolism results. One of the first inborn errors of organisms whose cells have a membrane-bounded nu- metabolism studied was alkaptonuria; it results from the cleus that contains DNA organized into discrete chromo- absence of an enzyme for cleaving homogentisic acid, somes). Members of the kingdoms Bacteria and Archaea which accumulates and is excreted in the urine, turning are often collectively called prokaryotes. The ultimate black upon oxidation. Phenylketonuria (PKU) is an in- source of diversity among organisms is mutation, but nat- born error of metabolism that affects the same metabolic ural selection is the process by which mutations that en- pathway. The enzyme defect in PKU results in an inability hance survival and reproduction are retained and to convert phenylalanine to tyrosine. Phenylalanine ac- mutations that are harmful are eliminated. Natural selec- cumulation has catastrophic effects on the development tion, first proposed by Darwin, is therefore the primary of the brain. Children with the disease have severe men- mechanism by which organisms become progressively tal deficits unless they are treated with a special diet low better adapted to their environments. in phenylalanine.
Key Terms adenine (A) evolution polarity (of DNA or RNA) alkaptonuria gene polypeptide chain amino acid genetic code product molecule Archaea genetics prokaryote Bacteria genome protein folding bacteriophage guanine (G) proteome base (in DNA or RNA) homogentisic acid replication biochemical pathway inborn error of metabolism ribonucleic acid (RNA) block (in a biochemical pathway) messenger RNA (mRNA) ribose central dogma metabolic pathway ribosomal RNA (rRNA) chaperone metabolite ribosome chromosome monomer single-stranded DNA codon mutant substrate molecule colony mutation template complementary base sequence native conformation tetramer cytosine (C) natural selection thymine (T) deoxyribose nucleotide transcript deoxyribonucleic acid (DNA) oligomerization domain transcription double-stranded DNA phage transfer RNA (tRNA) duplex DNA phenylalanine hydroxylase (PAH) transformation enzyme phenylketonuria (PKU) translation Eukarya pleiotropic effect uracil (U) eukaryote pleiotropy Watson–Crick base pairing
30 Chapter 1 Introduction to Molecular Genetics and Genomics Review the Basics
• What were the key experiments showing that DNA is • How does the “central dogma” explain Garrod’s dis- the genetic material? covery that nonfunctional enzymes result from mu- tant genes? • How did understanding the molecular structure of DNA give clues to its ability to replicate, to code for • Explain why many mutant forms of phenylalanine proteins, and to undergo mutation? hydroxylase have a simple amino acid replacement, yet the mutant polypeptide chains are absent or pre- • Why is pairing of complementary bases a key feature sent in very small amounts.? of DNA replication? • What is a pleiotropic effect of a gene mutation? Give • What is the process of transcription and in what ways an example. does it differ from DNA replication? • What are some of the major differences in cellular or- • What three types of RNA participate in protein syn- ganization among Bacteria, Archaea, and Eukarya? thesis, and what is the role of each type of RNA? • What process was Charles Darwin describing when • What is the “genetic code,” and how is it relevant to he wrote the following statement? “As more individ- the translation of a polypeptide chain from a mole- uals are produced than can possibly survive, there cule of messenger RNA? must in every case be a struggle for existence, either • What is an inborn error of metabolism? How did this one individual with another of the same species, or concept serve as a bridge between genetics and bio- with the individuals of distinct species, or with the chemistry? physical conditions of life.”
Guide to Problem Solving Problem 1 In the human gene for the chain of hemoglo- that the transcribed strand is that given in Problem 1. The bin (the oxygen-carrying protein in the red blood cells), RNA transcript therefore has the base sequence the first 30 nucleotides in the amino-acid–coding region have the sequence 5'-AUGGUGCACCUGACUCCUGAGGAGAAGUCU-3'
3'-TACCACGTGGACTGAGGACTCCTCTTCAGA-5' Problem 3 Given the RNA sequence coding for part of hu- man hemoglobin deduced in Problem 2, what is the What is the sequence of the partner strand? amino acid sequence in this part of the polypeptide chain? Answer The base pairing between the strands is A with T and G with C, but it is equally important that the strands Answer The polypeptide chain is translated in successive in a DNA duplex have opposite polarity. The partner groups of three nucleotides (each group constituting a strand is therefore oriented with its 5 end at the left, and ' codon), starting at the 5' end of the coding sequence and the base sequence is moving in the 5'-to-3' direction. The amino acid corre- sponding to each codon can be found in the genetic code 5'-ATGGTGCACCTGACTCCTGAGGAGAAGTCT-3' table. The first ten amino acids in the polypeptide chain are therefore Problem 2 If the DNA duplex for the chain of hemoglo- bin in Problem 1 were transcribed from left to right, de- 5'-AUG GUG CAC CUG ACU CCU GAG GAG AAG UCU-3' duce the base sequence of the RNA in this coding region. Met Val His Leu Thr Pro Glu Glu Lys Ser
Answer To deduce the RNA sequence, we must apply Problem 4 A very important mutation in human hemoglo- three concepts. First, in the transcription of RNA, the base bin occurs in the DNA sequence shown in Problem 1. In pairing is such that an A, T, G, or C in the DNA template this mutation, the T at nucleotide position 20 is replaced strand is transcribed as U, A, C, or G, respectively, in the with an A, where the numbering is from the left in the RNA strand. Second, the RNA transcript and the DNA strand shown at the top in Problem 1. The mutant hemo- template strand have opposite polarity. Third (and criti- globin is called sickle-cell hemoglobin, and it is associated cally for this problem), the RNA transcript is always tran- with a severe anemia known as sickle-cell anemia. Severe as scribed in the 5'-to-3' direction, so the 5' end of the RNA the genetic disease is, the mutant gene is present at rela- is the end synthesized first. This being the case, and con- tively high frequency in some human populations be- sidering the opposite polarity, the 3' end of the template cause carriers of the gene, who have only a mild anemia, strand must be transcribed first. Because we are told that are more resistant to falciparum malaria than are noncar- transcription takes place from left to right, we can deduce riers. What is the nucleotide sequence of this region of the
Guide to Problem Solving 31 CHAPTER XX GeNETics on the Web will introduce you to some of the most see my father....Shehadasked many doctors for help, which important sites for finding genetic information on the none had been able to give. But this woman was unusually Internet. To explore these sites, visit the Jones and Bartlett persistent and would not accept the situation without expla- home page at nation. She had also noticed that a peculiar smell always http://www.jbpub.com/genetics clung to her children....This woman was advised to seek help from my father [who held a professorship of nutrient re- For the book Genetics: Analysis of Genes and Genomes, choose search at the University Hospital in Norway]. He of course the link that says Enter GeNETics on the Web. You will be pre- had no real hope of being able to help her. But he did not sented with a chapter-by -chapter list of highlighted keywords. want to reject her, and he agreed to examine the children. On Select any highlighted keyword and you will be linked to a Web clinical examination he found nothing of importance, except site containing genetic information related to the keyword. the [severe mental retardation], which was beyond doubt. • James D. Watson once said that he and Francis Crick had [Urine analysis] was part of his thorough routine examina- no doubt that their proposed DNA structure was essentially tion. On adding ferric chloride [to normal urine] the color nor- correct, because the structure was so beautiful it had to be mally stays brownish, [but in the case of these children] a true! At an internet site accessed by the keyword DNA, you deep green color developed. He had not seen this reaction can view a large collection of different types of models of DNA before...Heconcluded that two mentally retarded children structure. Some models highlight the sugar–phosphate back- excreted a substance not found in normal urine. But which bones, others the A T and G C base pairs, still others the substance?” Consult this keyword site for more information helical structure of double-stranded DNA. on how Asbjörn Fölling finally tracked down the cause of the disease. • How was PKU discovered? The story is told by the Norwegian physician Ivar Fölling: “The stage is set in 1934. A • With proper dietary control of blood phenylalanine, pa- mother with two severely mentally retarded children came to tients with PKU can develop normally and lead normal lives.
DNA duplex in sickle-cell hemoglobin (both strands) and and that of the mutant polypeptide chain as in Problem 3, that of the messenger RNA, and what is the amino acid re- except that at each step there is one nucleotide (or one placement that results in sickle-cell hemoglobin? amino acid) that differs from the nonmutant. The DNA, RNA, and polypeptide in this region of sickle-cell hemo- Answer The mutation is already given as a T A substitu- globin are as follows, where the differences from the non- tion at position 20. The sequence of the DNA duplex is ob- mutant gene are in red. The amino acid replacement is tained as in Problem 1, that of the RNA as in Problem 2, glutamic acid valine.
DNA (transcribed strand) 3'-TAC CAC GTG GAC TGA GGA CAC CTC TTC AGA-5' DNA (nontranscribed strand) 5'-ATG GTG CAC CTG ACT CCT GTG GAG AAG TCT-3' RNA coding region 5'-AUG GUG CAC CUG ACU CCU GUG GAG AAG UCU-3' Polypeptide chain Met Val His Leu Thr Pro Val Glu Lys Ser Problem 4 Answer—Nucleotide sequences
Analysis and Applications 1.1 Classify each of the following statements as true (d) There is one-to-one correspondence between the set or false. of codons in the genetic code and the set of amino acids encoded. (a) Each gene is responsible for only one visible trait. (b) Every trait is potentially affected by many genes. 1.2 From their examination of the structure of DNA, (c) The sequence of nucleotides in a gene specifies the what were Watson and Crick able to infer about the prob- sequence of amino acids in a protein encoded by the able mechanisms of DNA replication, coding capability, gene. and mutation?
32 Chapter 1 Introduction to Molecular Genetics and Genomics When dietary control is relaxed, however, blood phenylala- • The Mutable Site changes frequently. Each new update in- nine returns to high levels. This situation is extremely dan- cludes a different site that highlights genetics resources avail- gerous for a developing fetus, resulting in high risk of able on the World Wide Web. Select the Mutable Site for congenital heart disease, small head size, mental retarda- Chapter 1 and you will be linked automatically. tion, and slow growth. Affected children are said to have • The Pic Site showcases some of the most visually appeal- maternal PKU. They are affected, not because of their own in- ing genetics sites on the World Wide Web. To visit the genetics ability to metabolize phenylalanine, but because of high lev- Web site pictured below, select the PIC Site for Chapter 1. els of phenyalanine in their mothers’ blood. The risk can be reduced, but not entirely eliminated, if PKU mothers plan their pregnancies and adhere to a strict dietary regimen prior to and during pregnancy. To learn more about this unantici- pated consequence of dietary treatment of PKU, log onto the phenylalanine hydroxylase knowledge database at the key- word site. • Perhaps surprisingly, the history of the bacteriophage T2 that figures so prominently in the experiments of Hershey and Chase is shrouded in mystery. Indeed, the time, place, and source material for the original isolation of phages T2, T4, and T6 (known as the “T-even phages”) may never be known with certainty. Use the keyword T2 to learn what is known about the origin of the bacteriophage and what sleuthing was required to discover what is known.
1.3 What does it mean to say that each strand of a duplex 1.9 What feature of the physical organization of bacterio- DNA molecule has a polarity? What does it mean to say phage T2 made it suitable for use in the Hershey–Chase that the paired strands in a duplex molecule have oppo- experiments? site polarity? 1.10 Like DNA, molecules of RNA contain large amounts 1.4 What is the end result of replication of a duplex DNA of phosphorus. When Hershey and Chase grew their T2 molecule? phage in bacterial cells that had grown in the presence of radioactive phosphorus, the RNA must also have incor- 1.5 What is the role of the messenger RNA in transla- porated the labeled phosphorus, and yet the experimen- tion? What is the role of the ribosome? What is the role of tal result was not compromised. Why not? transfer RNA? Is there more than one type of ribosome? Is there more than one type of transfer RNA? 1.11 Although the Hershey–Chase experiments were widely accepted as proof that DNA is the genetic material, 1.6 What important observation about S and R strains of the results were not completely conclusive. Why not? Streptococcus pneumoniae prompted Avery, MacLeod, and McCarty to study this organism? 1.12 The DNA extracted from a bacteriophage contains 28 percent A, 28 percent T, 22 percent G, and 22 percent 1.7 In the transformation experiments of Avery, C. What can you conclude about the structure of this MacLeod, and McCarty, what was the strongest evidence DNA molecule? that the substance responsible for the transformation was DNA rather than protein? 1.13 The DNA extracted from a bacteriophage consists of 24 percent A, 30 percent T, 20 percent G, and 26 percent 1.8 A chemical called phenyl (carbolic acid) destroys pro- C. What is unusual about this DNA? What can you con- teins but not nucleic acids, and a strong alkali such as clude about its structure? sodium hydroxide destroys both proteins and nucleic acids. In the transformation experiments with Streptococcus 1.14 A double-stranded DNA molecule is separated into pneumoniae, what result would be expected if the S-strain its constituent strands, and the strands are separated in an extract had been treated with phenyl? What would be ex- ultracentrifuge. In one of the strands the base composi- pected if it had been treated with a strong alkali? tion is 24 percent A, 28 percent T, 22 percent G, and 26
Au: Phenol changed to phenyl Analysis and Applications 33 two times in prob 1.8. OK? percent C. What is the base composition of the other peating amino acid Phe Phe Phe Phe .... If you as- strand? sume that the genetic code is a triplet code, what does this result imply about the codon for phenylalanine (Phe)? 1.15 While studying sewage, you discover a new type of bacteriophage that infects E. coli. Chemical analysis re- 1.24 A synthetic mRNA molecule consisting of the re- veals protein and RNA but no DNA. Is this possible? peating base sequence
1.16 One strand of a DNA duplex has the base sequence 5'-UUUUUUUUUUUU ...-3' 5'-ATCGTATGCACTTTACCCGG-3'. What is the base se- quence of the complementary strand? is terminated by the addition, to the right-hand end, of a single nucleotide bearing A. When translated in vitro, the 1.17 A region along one strand of a double-stranded DNA resulting polypeptide consists of a repeating sequence of molecule consists of tandem repeats of the trinucleotide phenylalanines terminated by a single leucine. What does 5'-TCG-3', so the sequence in this strand is this result imply about the codon for leucine?
5'-TCGTCGTCGTCGTCG ...-3' 1.25 With in vitro translation of an RNA into a polypep- tide chain, the translation can begin anywhere along the What is the sequence in the other strand? RNA molecule. A synthetic RNA molecule has the sequence 1.18 A duplex DNA molecule contains a random se- quence of the four nucleotides with equal proportions of 5'-CGCUUACCACAUGUCGCGAACUCG-3' each. What is the average spacing between consecutive occurrences of the sequence 5'-GGCC-3'? Between con- How many reading frames are possible if this molecule is secutive occurrences of the sequence 5'-GAATTC-3'? translated in vitro? How many reading frames are possible if this molecule is translated in vivo, in which translation 1.19 A region along a DNA strand that is transcribed con- starts with the codon AUG? tains no A. What base will be missing in the correspond- ing region of the RNA? 1.26 You have sequenced both strands of a double- stranded DNA molecule. To inspect the potential amino 1.20 The duplex nucleic acid molecule shown here con- acid coding content of this molecule, you conceptually sists of a strand of DNA paired with a complementary transcribe it into RNA and then conceptually translate the strand of RNA. Is the RNA the top or the bottom strand? RNA into a polypeptide chain. How many reading frames One of the base pairs is mismatched. Which pair is it? will you have to examine?
5'-AUCGGUUACAUUCCGACUGA-3' 1.27 A synthetic mRNA molecule consists of the repeat- 3'-TAGCCAATGTAAGGGTGACT-5' ing base sequence 5'-UCUCUCUCUCUCUCUC ...-3'. When this molecule is translated in vitro, the result is a 1.21 The sequence of an RNA transcript that is initially polypeptide chain consisting of the alternating amino synthesized is 5'-UAGCUAC-3', and successive nu- acids Ser Leu Ser Leu Ser Leu ... . Why do the cleotides are added to the 3' end. This transcript is pro- amino acids alternate? What does this result imply about duced from a DNA strand with the sequence the codons for serine (Ser) and leucine (Leu)?
3'-AAGTCGCATATCGATGCTAGCGCAACCT-5' 1.28 A synthetic mRNA molecule consists of the repeat- ing base sequence 5'-AUCAUCAUCAUCAUCAUC ...-3'. What is the sequence of the RNA transcript when synthe- When this molecule is translated in vitro, the result is a sis is complete? mixture of three different polypeptide chains. One con- sists of repeating isoleucines (Ile Ile Ile Ile ...), an- 1.22 An RNA molecule folds back upon itself to form a other of repeating serines (Ser Ser Ser Ser ...), and “hairpin”structure held together by a region of base pair- the third of repeating histidines (His His His His ...). ing. One segment of the molecule in the paired region has What does this result imply about the manner in which the base sequence 5'-AUACGAUA-3'. What is the base an mRNA is translated? sequence with which this segment is paired? 1.29 How is it possible for a gene with a mutation in the 1.23 A synthetic mRNA molecule consists of the repeat- coding region to encode a polypeptide with the same ing base sequence amino acid sequence as the nonmutant gene?
5'-UUUUUUUUUUUU ...-3' 1.30 A polymer is made that has a random sequence con- sisting of 75 percent G’s and 25 percent U’s. Among the When this molecule is translated in vitro using ribosomes, amino acids in the polypeptide chains resulting from in transfer RNAs, and other necessary constituents from E. vitro translation, what is the expected frequency of Trp? coli, the result is a polypeptide chain consisting of the re- of Val? of Phe?
34 Chapter 1 Introduction to Molecular Genetics and Genomics Challenge Problems 1.31 The coding sequence in the messenger RNA for consider that the coding sequence in the messenger RNA amino acids 1 through 10 of human phenylalanine for the first 10 amino acids in human beta hemoglobin hydroxylase is (part of the oxygen carrying protein in the blood) is
5'-AUGUCCACUGCGGUCCUGGAAAACCCAGGC-3' 5'–AUGGUGCACCUGACUCCUGAGGAGAAGUCU ···–3'
(a) What are the first 10 amino acids? (a) What is the amino acid sequence in this part of the (b) What sequence would result from a mutant RNA in polypeptide chain? which the red A was changed to G? (b) What would be the consequence of a frameshift mu- (c) What sequence would result from a mutant RNA in tation resulting in an RNA missing the red U? which the red C was changed to G? (c) What would be the consequence of a frameshift mu- (d) What sequence would result from a mutant RNA in tation resulting in an RNA with a G inserted immedi- which the red U was changed to C? ately in front of the red U? (e) What sequence would result from a mutant RNA in which the red G was changed to U? 1.31 With regard to the wildtype and mutant RNA mole- cules described in the previous problem, deduce the base 1.32 A “frameshift” mutation is a mutation in which sequence in both strands of the corresponding double- some number of base pairs, not a multiple of three, is in- stranded DNA for: serted into or deleted from a coding region of DNA. The (a) The wildtype sequence result is that, at the point of the frameshift mutation, the (b) The single-base deletion reading frame of protein translation is shifted with re- (c) The single-base insertion spect to the nonmutant gene. To see the consequence,
Further Reading Abedon, S. T. 2000. The murky origin of Snow White Olson, G. J., and C. R. Woese. 1997. Archaeal genomics: and her T-even dwarfs. Genetics 155: 481. An overview. Cell 89: 991. Bearn, A. G. 1994. Archibald Edward Garrod, the Radman, M., and R. Wagner. 1988. The high fidelity of reluctant geneticist. Genetics 137: 1. DNA duplication. Scientific American, August. Calladine, C. R. 1997. Understanding DNA: The Molecule Rennie, J. 1993. DNA’s new twists. Scientific American, and How It Works. New York: Academic Press. March. Ciechanover, A., A. Orian, and A. L. Schwartz. 2000. Scazzocchio, C. 1997. Alkaptonuria: From humans to Ubiquitin-mediated proteolysis: Biological regulation moulds and back. Trends in Genetics 13: 125. via destruction. Bioessays 22: 442. Scriver, C. R., and P. J. Waters. 1999. Monogenic traits Doolittle, W. F. 2000. Uprooting the tree of life. Scientific are not simple: Lessons from phenylketonuria. Trends American, February. in Genetics 15: 267. Gehrig, A., S. R. Schmidt, C. R. Muller, S. Srsen, K. Stadler, D. 1997. Ultraviolet-induced mutation and the Srsnova, and W. Kress. 1997. Molecular defects in chemical nature of the gene. Genetics 145: 863. alkaptonuria. Cytogenetics & Cell Genetics 76: 14. Susman, M. 1995. The Cold Spring Harbor phage course Gould, S. J. 1994. The evolution of life on the earth. (1945–1970): A 50th anniversary remembrance. Scientific American, October. Genetics 139: 1101. Horowitz, N. H. 1996. The sixtieth anniversary of Watson, J. D. 1968. The Double Helix. New York: biochemical genetics. Genetics 143: 1. Atheneum. Judson, H. F. 1996. The Eighth Day of Creation: The Makers Zhang, J. Z. 2000. Protein-length distributions for the of the Revolution in Biology. Cold Spring Harbor, NY: Cold three domains of life. Trends in Genetics 16: 107. Spring Harbor Laboratory Press. Mirsky, A. 1968. The discovery of DNA. Scientific American, June.
Further Reading 35 CHAPTER 2 DNA Structure and DNA Manipulation
36 CHAPTER OUTLINE PRINCIPLES 2.1 Genomes and Genetic Differences Among • A DNA strand is a polymer of A, T, G and C Individuals deoxyribonucleotides joined 3'-to-5' by phosphodiester bonds. DNA Markers as Landmarks in Chromosomes • The two DNA strands in a duplex are held together by hydrogen bonding between the A T and G C base pairs and 2.2 The Molecular Structure of DNA Polynucleotide Chains by base stacking of the paired bases. Base Pairing and Base Stacking • Each type of restriction endonuclease enzyme cleaves double- Antiparallel Strands stranded DNA at a particular sequence of bases usually four or DNA Structure as Related to six nucleotides in length. Function • The DNA fragments produced by a restriction enzyme can be 2.3 The Separation and Identificatin of Genomic DNA Fragments separated by electrophoresis, isolated, sequenced, and Restriction Enzymes and Site-Specific manipulated in other ways. DNA Cleavage • Separated strands of DNA or RNA that are complementary in Gel Electrophoresis nucleotide sequence can come together (hybridize) Nucleic Acid Hybridization The Southern Blot spontaneously to form duplexes. 2.4 Selective Replication of Genomic DNA • DNA replication takes place only by elongation of the growing Fragments strand in the 5'-to-3' direction through the addition of Constraints on DNA Replication: successive nucleotides to the 3' end. Primers and 5'-to-3' Strand • In the polymerase chain reaction, short oligonucleotide Elongation The Polymerase Chain Reaction primers are used in successive cycles of DNA replication to amplify selectively a particular region of a DNA duplex. 2.5 The Terminology of Genetic Analysis • Genetic markers in DNA provide a large number of easily accessed sites in the genome that can be used to identify the 2.6 Types of DNA Markers Present in Genomic DNA chromosomal locations of disease genes, for DNA typing in Single Nucleotide Polymorphisms individual identification, for the genetic improvement of (SNPs) cultivated plants and domesticated animals, and for many Restriction Fragment Length other applications. Polymorphisms (RFLPs) Random Amplified Polymorphic DNA (RAPD) Amplified Fragment Length Polymorphisms (AFLPs) Simple Tandem Repeat CONNECTIONS Polymorphisms (STRPs) The Double Helix 2.7 Applications of DNA Markers James D. Watson and Francis H. C. Crick 1953 Genetic Markers, Genetic Mapping, A Structure for Deoxyribose Nucleic Acid and “Disease Genes” Other Uses for DNA Markers Origin of the Human Genetic Linkage Map David Botstein, Raymond L. White, Mark Skolnick, and Ronald W. Davis 1980 Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms
37 n Chapter 1, we reviewed the experimen- tal evidence demonstrating that the genetic material is DNA. We saw how, Ithrough the unique structure of the DNA molecule, genetic information can be tran- scribed and translated into proteins that affect the inherited characteristics of organ- isms. When a mutant gene encodes a non- functional protein that results in some physical or physiological abnormality—for example, an “inborn error of metabo- lism”—the expression of that abnormality can be used to trace the transmission of the mutant gene from one generation to the next in a pedigree (family history). As a consequence, until recently, the first step in genetic analysis was the identification of or- ganisms with such abnormal traits, such as peas with wrinkled seeds instead of round seeds and fruit flies with white eyes instead of red. These traits were studied by means of controlled crosses so that the parentage of each individual could be traced. Large- scale genetic studies were typically limited to one of a small number of model organ- isms especially favorable for isolating and identifying mutant genes, such as the bud- 09131_01_1651A ding yeast Saccharomyces cerevisiae, the nem- atode worm Caenorhabditis elegans, or the fruit fly Drosophila melanogaster. Since the mid-1970s, studies in genetics have undergone a revolution based on the use of increasingly sophisticated ways to isolate and identify specific fragments of DNA. The culmination of these techniques was large-scale genomic sequencing—the ability to determine the correct sequence of the base pairs that make up the DNA in an entire genome and to identify the se- quences associated with genes. Because many of the model organisms used in ge- netics have relatively small genomes, these sequences were completed first, in the late 1990s (Figure 2.1) The techniques used to se- quence these simpler genomes were then scaled up to sequence the human genome. The initial “rough draft” of the human genome was announced in June 2000; this represents an important milestone in the Human Genome Project, whose goals in-
Figure 2.1 Timeline of large-scale genomic DNA sequencing.
38 Chapter 2 DNA Structure and DNA Manipulation clude determining the sequence, and iden- tions that are responsible for genetic dis- Au: In 1st para in tifying the function, of all human genes. eases such as phenylketonuria and other 1st column, proof- inborn errors of metabolism, as well as the reader asks if mutations that increase the risk of more “hundreds or complex diseases such as heart disease, thousands of 2.1 Genomes and Genetic breast cancer, and diabetes. copies” is correct? Differences Among Fortunately, only a small proportion of Or should phrase Individuals all differences in DNA sequence are asso- be “hundreds of ciated with disease. Some of the others are thousands of The numbers associated with the genome of associated with inherited differences in copies”? even a simple organism can be intimidating. height, weight, hair color, eye color, facial The sequenced genomes of D. melanogaster features, and other traits. Most of the and C. elegans, both approximately 100 mil- genetic differences between people are lion base pairs in length, encode 13,601 completely harmless. Many have no de- proteins and 18,424 proteins, respectively. tectable effects on appearance or health. The human genome is considerably larger. Such differences can be studied only As found in a human reproductive cell, the through direct examination of the DNA it- human genome consists of 3 billion base self. These harmless differences are never- pairs organized into 23 distinct chromo- theless important, because they serve as somes (each chromosome contains a single genetic markers. molecule of duplex DNA). A typical chro- mosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in DNA Markers as Landmarks the chromosome. The sequences that make in Chromosomes up the protein-coding part of these genes In genetics, a genetic marker is any differ- actually account for only about 4 percent of ence in DNA, no matter how it is detected, the entire genome. The other 96 percent of whose pattern of transmission from genera- the sequences do not code for proteins. tion to generation can be tracked. Each in- Some noncoding sequences are genetic dividual who carries the marker also carries “chaff” that gets separated from the protein- a length of chromosome on either side of it, coding “wheat” when genes are transcribed so it marks a particular region of the and the RNA transcript is processed into genome. A mutant gene, or some portion of messenger RNA. Other noncoding se- a mutant gene, can serve as a genetic quences are relatively short sequences that marker. In the “classical” approach to ge- are found in hundreds or thousands of netics, it is the outward expression of a copies scattered throughout the genome. gene (or lack of expression) that forms the Still other noncoding sequences are rem- basis of genetic analysis. For example, a nants of genes called pseudogenes. As might mutation causing wrinkled peas is a genetic be expected, identifying the protein-coding marker, which can be identified through its genes from among the large background of effects on pea shape. In modern genetic noncoding DNA in the human genome is a analysis, any difference in DNA sequence challenge in itself. between two individuals can serve as a ge- Geneticists often speak of the nucleotide netic marker. And although these genetic sequence of “the” human genome because markers are often harmless in themselves, 99.9 percent of the DNA sequences in any they allow the positions of disease genes to two individuals are the same. This is our be located and their DNA isolated, identi- evolutionary legacy; it contains the genetic fied, and studied. information that makes us human beings. Genetic markers that are detected by di- In reality, however, there are many differ- rect analysis of the DNA are often called ent human genomes. Geneticists have great DNA markers. DNA markers are impor- interest in the 0.1 percent of the human tant in genetics because they serve as land- DNA sequence—3 million base pairs—that marks in long DNA molecules, such as those differs from one genome to the next, be- found in chromosomes, which allow ge- cause these differences include the muta- netic differences among individuals to be
2.1 Genomes and Genetic Differences Among Individuals 39 tracked. They are like signposts along a ward expression. Use of these methods highway. Using DNA markers as landmarks, broadens the scope of genetics, making it the geneticist can identify the positions of possible to carry out genetic analysis in any normal genes, mutant genes, breaks in organism. This means that detailed genetic chromosomes, and other features impor- analysis is no longer restricted to human tant in genetic analysis (Figure 2.2). beings, domesticated animals, cultivated The detection of DNA markers usually plants, and the relatively small number of requires that the genomic DNA (the total model organisms favorable for genetic DNA extracted from cells of an organism) studies. Direct study of DNA eliminates the be fragmented into pieces of manageable need for prior identification of genetic dif- size (usually a few thousand nucleotide ferences between individuals; it even elim- pairs) that can be manipulated in labora- inates the need for controlled crosses. The tory experiments. In the following sec- methods of molecular analysis discussed in tions, we examine some of the principal this chapter have transformed genetics: ways in which DNA is manipulated to re- veal genetic differences among individuals, The manipulation of DNA is the basic experi- whether or not these differences find out- mental operation in modern genetics.
Chromosomes are located Each chromosome contains DNA markers are used in the cell nucleus. one long molecule of duplex to identify particular (double-stranded) DNA. regions along the DNA in chromosomes.
A B Nucleus C DNA marker
Double-stranded DNA molecule Chromosomes Cleavage
HUMAN CELL Single DNA DNA fragment D fragments can be cleaved from the molecule. BACTERIAL CELL Replication in CLONED DNA bacterial cells The fragments can be Large quantities of the transferred into bacterial cloned human DNA D cells, where they can fragment can be isolated replicate. (This is the from the bacterial cells. procedure of cloning.)
A DNA marker, in this case D D, serves to identify bacterial cells containing a particular DNA fragment of interest. Au: Proofreader asks that you lcon- Figure 2.2 DNA markers serve as landmarks that identify physical positions along a DNA molecule, firm Chapter 13 such as DNA from a chromosome. As shown at the right, a DNA marker can also be used to identify reference in Fig 2.2 bacterial cells into which a particular fragment of DNA has been introduced. The procedure of DNA caption is correct. cloning is not quite as simple as indicated here; it is discussed further in Chapter 13.
40 Chapter 2 DNA Structure and DNA Manipulation These methods are the principal techniques used in virtually every modern genetics laboratory.
2.2 The Molecular Structure
of DNA 09131_01_1745P Modern experimental methods for the ma- nipulation and analysis of DNA grew out of a detailed understanding of its molecular structure and replication. Therefore, to un- derstand these methods, one needs to know something about the molecular structure of DNA. We saw in Chapter 1 that DNA is a helix of two paired, complemen- tary strands, each composed of an ordered string of nucleotides, each bearing one of the Photocaptionphotocaptionphotocaptionphotoc bases A (adenine), T (thymine), G (gua- aptionphotocaptionphotocaptionphotocaption- nine), or cytosine (C). Watson–Crick base photocaptionphotocaptionphotocaptionphoto- pairing between A and T and between G caption and C in the complementary strands holds the strands together. The complementary strands also hold the key to replication, be- cause each strand can serve as a template gar), phosphoric acid, and the four nitro- for the synthesis of a new complementary gen-containing bases denoted A, T, G, and strand. We will now take a closer look at C. The chemical structures of the bases are DNA structure and at the key features of its shown in Figure 2.3. Note that two of the replication. bases have a double-ring structure; these are called purines. The other two bases have a single-ring structure; these are Polynucleotide Chains called pyrimidines. In terms of biochemistry, a DNA strand is a • The purine bases are adenine (A) and polymer—a large molecule built from guanine (G). repeating units. The units in DNA are com- • The pyrimidine bases are thymine (T) and posed of 2'-deoxyribose (a five-carbon su- cytosine (C).
Purines Pyrimidines
Adenine Guanine Thymine Cytosine H H
N O CH3 H H C H C H C O H C N 6 N N N 5 C C N1 5 C 7 C C 6 4 C H A 8 C H G C H T C 2 4 9 1 3 C 3 C C C N 2 N N N N H N H N N N Deoxyribose C H Deoxyribose C H Deoxyribose Deoxyribose O O
Figure 2.3 Chemical structures of the four nitrogen-containing bases in DNA: adenine, thymine, guanine, and cytosine. The nitrogen atom linked to the deoxyribose sugar is indicated. The atoms shown in red participate in hydrogen bonding between the DNA base pairs.
2.2 The Molecular Structure of DNA 41 Nucleoside Nucleotide
OH Base Base
HOCH2 HO P O CH2 5 O A, G, T, or C 5 O A, G, T, or C
4 HH1 O 4 HH1 HH HH 3 2 3 2 OH H Phosphate OH H Sugar Sugar This group is OH This group is OH in RNA. in RNA.
Figure 2.4 A typical nucleotide, showing the three major components (phosphate, sugar, and base), the difference between DNA and RNA, and the distinction between a nucleoside (no phosphate group) and a nucleotide (with phosphate). Nucleotides are monophosphates (with one phosphate group). Nucleoside diphosphates contain two phosphate groups, and nucleoside triphosphates contain three.
In DNA, each base is chemically linked in Table 2.1. Most of these terms are not to one molecule of the sugar deoxyribose, needed in this book; they are included be- forming a compound called a nucleoside. cause they are likely to be encountered in When a phosphate group is also attached further reading. to the sugar, the nucleoside becomes a In nucleic acids, such as DNA and RNA, nucleotide (Figure 2.4). Thus a nucleotide the nucleotides are joined to form a is a nucleoside plus a phosphate. In the polynucleotide chain, in which the phos- conventional numbering of the carbon phate attached to the 5' carbon of one sugar atoms in the sugar in Figure 2.4, the car- is linked to the hydroxyl group attached to bon atom to which the base is attached is the 3' carbon of the next sugar in line Au: Query from the 1' carbon. (The atoms in the sugar are (Figure 2.5). The chemical bonds by which MH—Why no given primed numbers to distinguish them the sugar components of adjacent nu- primes shown from atoms in the bases.) The nomencla- cleotides are linked through the phosphate here (shown in ture of the nucleoside and nucleotide de- groups are called phosphodiester bonds. Fig 2.5)? rivatives of the DNA bases is summarized The 5'–3'–5'–3' orientation of these linkages
Table 2.1 DNA nomenclature Base Nucleoside Nucleotide
Adenine (A) Deoxyadenosine Deoxyadenosine-5' monophosphate (dAMP) diphosphate (dADP) triphosphate (dATP) Guanine (G) Deoxyguanosine Deoxyguanosine-5' monophosphate (dGMP) diphosphate (dGDP) triphosphate (dGTP) Thymine (T) Deoxythymidine Deoxythymidine-5' monophosphate (dTMP) diphosphate (dTDP) triphosphate (dTTP) Cytosine (C) Deoxycytidine Deoxycytidine-5' monophosphate (dCMP) diphosphate (dCDP) triphosphate (dCTP)
42 Chapter 2 DNA Structure and DNA Manipulation (A) (B)
5’ end 5’ end terminates P A with phosphate group
5’ end –O
– O PO NH2 P G
O N N H A 5’ CH 2 O N H N P C HH HH 3’ O H HO –O PO O 3’ end H O N N Phosphate linked H G 5’ CH2 to 5’ carbon and O N N NH to 3’ carbon 2 HH HH 3’ O H NH Phosphodiester –O PO 2 bonds O H N C 5’ CH 2 O H N O HH HH 3’ 3’ end OH H
3’ end terminates with hydroxyl (–OH)
Figure 2.5 Three nucleotides at the 5' end of a single polynucleotide strand. (A) The chemical structure of the sugar–phosphate linkages, showing the 5'-to-3' orientation of the strand (the red numbers are those assigned to the carbon atoms). (B) A common schematic way to depict a polynu- cleotide strand.
continues throughout the chain, which technique to measure the amount of each typically consists of millions of nucleotides. base present in DNA. As we describe this Note that the terminal groups of each technique, we will let the molar concentra- polynucleotide chain are a 5'-phosphate tion of any base be represented by the sym- (5'-P) group at one end and a 3'-hydroxyl bol for the base in square brackets; for (3'-OH) group at the other. The asymme- example, [A] denotes the molar concentra- try of the ends of a DNA strand implies that tion of adenine. Chargaff used his tech- each strand has a polarity determined by nique to measure the [A], [T], [G], and [C] which end bears the 5' phosphate and content of the DNA from a variety of which end bears the 3' hydroxyl. sources. He found that the base composi- A few years before Watson and Crick tion of the DNA, defined as the percent proposed their essentially correct three- G C, differs among species but is con- dimensional structure of DNA as a double stant in all cells of an organism and within a helix, Erwin Chargaff developed a chemical species. Data on the base composition of
2.2 The Molecular Structure of DNA 43 DNA from a variety of organisms are given by addition of the other two: [A] [G] in Table 2.2. [T] [C]. In the next section, we examine Chargaff also observed certain regular the molecular basis of base pairing in more relationships among the molar concentra- detail. tions of the different bases. These relation- ships are now called Chargaff’s rules: Base Pairing and Base Stacking • The amount of adenine equals that of In the three-dimensional structure of the thymine: [A] [T]. • The amount of guanine equals that of DNA molecule proposed in 1953 by Watson cytosine: [G] [C]. and Crick, the molecule consists of two • The amount of the purine bases equals that of polynucleotide chains twisted around one the pyrimidine bases: another to form a double-stranded helix in which adenine and thymine, and guanine [A] [G] [T] [C]. and cytosine, are paired in opposite strands Although the chemical basis of these ob- (Figure 2.6). In the standard structure, which servations was not known at the time, one of is called the B form of DNA, each chain the appealing features of the Watson–Crick makes one complete turn every 34 Å. The structure of paired complementary strands helix is right-handed, which means that as was that it explained Chargaff’s rules. Be- one looks down the barrel, each chain fol- cause A is always paired with T in double- lows a clockwise path as it progresses. The stranded DNA, it must follow that [A] bases are spaced at 3.4 Å, so there are ten [T]. Similarly, because G is paired with C, we bases per helical turn in each strand and ten know that [G] [C]. The third rule follows base pairs per turn of the double helix.
Table 2.2 Base composition of DNA from different organisms Base (and percentage of total bases) Base composition Organism Adenine Thymine Guanine Cytosine (percent G C)
Bacteriophage T7 26.0 26.0 24.0 24.0 48.0
Bacteria Clostridium perfringens 36.9 36.3 14.0 12.8 26.8 Streptococcus pneumoniae 30.2 29.5 21.6 18.7 40.3 Escherichia coli 24.7 23.6 26.0 25.7 51.7 Sarcina lutea 13.4 12.4 37.1 37.1 74.2
Fungi Saccharomyces cerevisiae 31.7 32.6 18.3 17.4 35.7 Neurospora crassa 23.0 22.3 27.1 27.6 54.7
Higher plants Wheat 27.3 27.2 22.7 22.8* 45.5 Maize 26.8 27.2 22.8 23.2* 46.0
Animals Drosophila melanagaster 30.8 29.4 19.6 20.2 39.8 Pig 29.4 29.6 20.5 20.5 41.0 Salmon 29.7 29.1 20.8 20.4 41.2 Human being 29.8 31.8 20.2 18.2 38.4
*Includes one-fourth 5-methylcytosine, a modified form of cytosine found in most plants more complex than algae and in many animals
44 Chapter 2 DNA Structure and DNA Manipulation (A) (B)
Minor groove
Adenine T A Thymine A T Guanine Major C G Cytosine groove G C
A T C G 34 Å per complete
G C
P P turn (10 base pairs C
G per turn)
H
N
N
N
C
H C
Guanine C
H
C N
N
C
H
O O
C N
Cytosine H
N
C N
N
N
H
C
H
C
C H C
C
C N H
H N
Adenine C
N O
H
H
H
C N
N
C O C
Thymine C
H
C
H 3 P
Phosphate P P Deoxyribose
sugar P P Base
Diameter Oxygen 20 Å Hydrogen
Phosphorus
C in sugar– phosphate chain
C and N in bases
Figure 2.6 Two representations of DNA, illustrating the three-dimensional structure of the double helix. (A) In a ribbon diagram, the sugar–phosphate backbones are depicted as bands, with horizon- tal lines used to represent the base pairs. (B) A computer model of the B form of a DNA molecule. The stick figures are the sugar–phosphate chains winding around outside the stacked base pairs, forming a major groove and a minor groove. The color coding for the base pairs is as follows: A, red or pink; T, dark green or light green; G, dark brown or beige; C, dark blue or light blue. The bases depicted in dark colors are those attached to the blue sugar–phosphate backbone; the bases depicted in light colors are attached to the beige backbone. [B, courtesy of Antony M. Dean.]
2.2 The Molecular Structure of DNA 45 The strands feature base pairing, in percent of G C. Because nothing re- which each base is paired to a complemen- stricts the sequence of bases in a single tary base in the other strand by hydrogen strand, any sequence could be present bonds. (A hydrogen bond is a weak bond along one strand. This explains Chargaff’s in which two participating atoms share a observation that DNA from different or- hydrogen atom between them.) The hydro- ganisms may differ in base composition. gen bonds provide one type of force holding However, because the strands in duplex the strands together. In Watson–Crick base DNA are complementary, Chargaff’s rules pairing, adenine (A) pairs with thymine of [A] [T] and [G] [C] are true what- (T), and guanine (G) pairs with cytosine ever the base composition. (C). The hydrogen bonds that form in the In the B form of DNA, the paired bases Au: Term in Key adenine–thymine base pair and in the are planar, parallel to one another, and per- Terms list is guanine–cytosine pair are illustrated in pendicular to the long axis of the double “hydrophobic Figure 2.7. Note that an A T pair (Figure helix. This feature of double-stranded DNA interaction” 2.7A and B) has two hydrogen bonds and is known as base stacking. The upper and rather than that a G C pair (Figure 2.7C and D) has lower faces of each nitrogenous base are “hydrophic.” three hydrogen bonds. This means that the relatively flat and nonpolar (uncharged). Would you like to hydrogen bonding between G and C is These surfaces are said to be hydrophobic change term in stronger in the sense that it requires more because they bind poorly to water mole- list? Or change energy to break; for example, the amount cules, which are very polar. (The polarity text here? of heat required to separate the paired refers to the asymmetrical distribution of strands in a DNA duplex increases with the charge across the V-shaped water molecule;
(A) (B) Two hydrogen bonds attract A and T.
H H N C N H O CH3 C N C C C C A Deoxyribose N H N T C H N C C N H O Deoxyribose
Adenine Thymine (C) (D) Three hydrogen bonds attract G and C.
H H C N O H N H C N C C C C G Deoxyribose N H N C C H N C C N N H O Deoxyribose H Guanine Cytosine
Figure 2.7 Normal base pairs in DNA. On the left, the hydrogen bonds (dotted lines) with the joined atoms are shown in red. (A and B) A T base pairing. (C and D) G C base pairing. In the space- filling models (B and D), the colors are as follows: C, gray; N, blue; O, red; and H (shown in the bases only), white. Each hydrogen bond is depicted as a white disk squeezed between the atoms sharing the hydrogen. The stick figures on the outside represent the backbones winding around the stacked base pairs. [Space-filling models courtesy of Antony M. Dean.]
46 Chapter 2 DNA Structure and DNA Manipulation the oxygen at the base of the V tends to be 3’ end quite negative, whereas the hydrogens at (terminates in the tips are quite positive). Owing to their 3’ hydroxyl) 5’ end OH repulsion of water molecules, the paired (terminates in nitrogenous bases tend to stack on top of 5’ phosphate) one another in such a way as to exclude the maximum amount of water from the in- T A P terior of the double helix. Hence a double- P stranded DNA molecule has a hydrophobic core composed of stacked bases, and it is the P energy of base stacking that provides P G C double-stranded DNA with much of its chemical stability. When discussing a DNA molecule, mol- ecular biologists frequently refer to the in- P C G P dividual strands as single strands or as single-stranded DNA; they refer to the dou- ble helix as double-stranded DNA or as a P duplex molecule. The two grooves spiraling P A T along outside of the double helix are not symmetrical; one groove, called the major groove, is larger than the other, which is P T A P called the minor groove. Proteins that in- teract with double-stranded DNA often have regions that make contact with the base pairs by fitting into the major groove, P G C P into the minor groove, or into both grooves (Figure 2.6B). 5’ end (terminates in HO 5’ phosphate) Antiparallel Strands 3’ end Each backbone in a double helix consists of (terminates in deoxyribose sugars alternating with phos- 3’ hydroxyl) phate groups that link the 3' carbon atom of Figure 2.8 A segment of a DNA molecule, one sugar to the 5' carbon of the next in showing the antiparallel orientation of the line (Figure 2.5). The two polynucleotide complementary strands. The overlying blue strands of the double helix have opposite arrows indicate the 5'-to-3' direction of each polarity in the sense that the 5' end of one strand. The phosphates (P) join the 3' carbon atom of one deoxyribose (horizontal line) to the strand is paired with the 3' end of the other 5' carbon atom of the adjacent deoxyribose. strand. Strands with such an arrangement are said to be antiparallel. One implica- tion of antiparallel strands in duplex DNA is that in each pair of bases, one base is at- ferent one. The right-handed double helix tached to a sugar that lies above the plane in Figure 2.6 is the standard B form, but de- of pairing, and the other base is attached to pending on conditions, DNA can actually a sugar that lies below the plane of pairing. form more than 20 slightly different vari- Another implication is that each terminus ants of a right-handed helix, and some re- of the double helix possesses one 5'-P group gions can even form helices in which the (on one strand) and one 3'-OH group (on strands twist to the left (called the Z form the other strand), as shown in Figure 2.8. of DNA). If there are complementary The diagram of the DNA duplex in Fig- stretches of nucleotides in the same strand, ure 2.6 is static and so somewhat mislead- then a single strand, separated from its ing. DNA is a dynamic molecule, constantly partner, can fold back upon itself like a in motion. In some regions, the strands can hairpin. Even triple helices consisting of separate briefly and then come together three strands can form in regions of DNA again in the same conformation or in a dif- that contain suitable base sequences.
2.2 The Molecular Structure of DNA 47 The Double Helix
James D. Watson and We wish to suggest a structure for the salt nine and cytosine. The sequence of Francis H. C. Crick 1953 of deoxyribose nucleic acid (DNA).... bases on a single chain does not appear The structure has two helical chains to be restricted in any way. However, if Cavendish Laboratory, each coiled round the same axis.... only specific pairs of bases can be Cambridge, England Both chains follow right-handed helices, formed, it follows that if the sequence of A Structure for Deoxyribose Nucleic Acid but the two chains run in opposite di- bases on one chain is given, then the se- rections.... The bases are quence on the other This is one of the watershed papers of twen- on the inside of the helix and If only specific pairs chain is automatically tieth-century biology. After its publication, the phosphates on the out- of bases can be determined.... It has nothing in genetics was the same. Every- side.... There is a residue not escaped our notice formed, it follows thing that was known, and everything still on each chain every 3.4 Å and that the specific pair- to be discovered, would now need to be in- the structure repeats after 10 that if the sequence ing we have postulated terpreted in terms of the structure and residues.... The novel fea- of bases on one immediately suggests function of DNA. The importance of the ture of the structure is the chain is given, then a plausible copying paper was recognized immediately, in no manner in which the two mechanism for the ge- the sequence on the small part because of its lucid and concise chains are held together by netic material.... We description of the structure. Watson and the purine and pyrimidine other chain is are much indebted to Crick benefited tremendously in knowing bases. The planes of the automatically Dr. Jerry Donohue for that their structure was consistent with the bases are perpendicular to determined. constant advice and unpublished structural studies of Maurice the fiber axis. They are joined criticism, especially on Wilkins and Rosalind Franklin. The same together in pairs, a single base from one interatomic distances. We have also issue of Nature that included the Watson chain being hydrogen-bonded to a sin- been stimulated by a knowledge of the and Crick paper also included, back to gle base from the other chain, so that general nature of the unpublished ex- back, a paper from the Wilkins group and the two lie side by side. One of the pair perimental results and ideas of Dr. one from the Franklin group detailing their must be a purine and the other a pyrimi- Maurice H. F. Wilkins, Dr. Rosalind data and the consistency of their data with dine for bonding to occur.... Only spe- Franklin and their co-workers at King’s the proposed structure. It has been said cific pairs of bases can bond together. College, London. that Franklin was poised a mere two half- These pairs are adenine (purine) with steps from making the discovery herself, thymine (pyrimidine), and guanine Source: Nature 171: 737–738 alone. In any event, Watson and Crick and (purine) with cytosine (pyrimidine). In Wilkins were awarded the 1962 Nobel other words, if an adenine forms one Prize for their discovery of DNA structure. member of a pair, on either chain, then Rosalind Franklin, tragically, died of cancer on these assumptions the other mem- in 1958 at the age of 38. ber must be thymine; similarly for gua-
pairing of A with T and of G with C in the DNA Structure as Related two polynucleotide chains. Unwinding and to Function separation of the chains, with each free chain In the structure of the DNA molecule, we being copied, results in the formation of two can see how three essential requirements of identical double helices (see Figure 1.6). a genetic material are met. 2. A genetic material must also have the capacity to carry all of the information 1. Any genetic material must be able to be needed to direct the organization and replicated accurately, so that the information metabolic activities of the cell. As we saw in it contains will be precisely replicated and Chapter 1, the product of most genes is a inherited by daughter cells. The basis for protein molecule—a polymer composed of exact duplication of a DNA molecule is the repeating units of amino acids. The sequence
48 Chapter 2 DNA Structure and DNA Manipulation of amino acids in the protein determines its who might carry the mutant gene. To take chemical and physical properties. A gene is another example, suppose there is reason expressed when its protein product is to believe that a mutation causing a genetic synthesized, and one requirement of the disease is present in a particular DNA frag- genetic material is that it direct the order in ment; then it is important to be able to pin- which amino acid units are added to the end point this fragment and isolate it from of a growing protein molecule. In DNA, this is done by means of a genetic code in which affected individuals to verify whether this groups of three bases specify amino acids. hypothesis is true and, if so, to identify the Because the four bases in a DNA molecule nature of the mutation. can be arranged in any sequence, and Most procedures for the separation and because the sequence can vary from one part identification of DNA fragments can be of the molecule to another and from grouped into two general categories: organism to organism, DNA can contain a great many unique regions, each of which 1. Those that identify a specific DNA fragment can be a distinct gene. A long DNA chain can present in genomic DNA by making use of direct the synthesis of a variety of different the fact that complementary single-stranded protein molecules. DNA sequences can, under the proper 3. A genetic material must also be capable of conditions, form a duplex molecule. These undergoing occasional mutations in which procedures rely on nucleic acid hybridization. the information it carries is altered. 2. Those that use prior knowledge of the Furthermore, so that mutations will be sequence at the ends of a DNA fragment to heritable, the mutant molecules must be specifically and repeatedly replicate this one capable of being replicated as faithfully as fragment from genomic DNA. These the parental molecule. This feature is procedures rely on selective DNA replication necessary to account for the evolution of (amplification) by means of the polymerase diverse organisms through the slow chain reaction. accumulation of favorable mutations. Watson and Crick suggested that heritable The major difference between these ap- mutations might be possible in DNA by rare proaches is that the first (relying on nucleic mispairing of the bases, with the result that acid hybridization) identifies fragments that an incorrect nucleotide becomes incorpo- are present in the genomic DNA itself, rated into a replicating DNA strand. whereas the second (relying on DNA ampli- fication) identifies experimentally manu- factured replicas of fragments whose original templates (but not the replicas) were pre- 2.3 The Separation and sent in the genomic DNA. This difference has practical implications: Identification of Genomic • Hybridization methods require a greater DNA Fragments amount of genomic DNA for the experimen- The following sections show how an un- tal procedures, but relatively large fragments derstanding of DNA structure and replica- can be identified, and no prior knowledge of tion has been put to practical use in the the DNA sequence is necessary. development of procedures for the separa- • Amplification methods require extremely small amounts of genomic DNA for the tion and identification of particular DNA experimental procedures, but the amplifica- fragments. These methods are used primar- tion is usually restricted to relatively small ily either to identify DNA markers or to aid fragments, and some prior knowledge of DNA in the isolation of particular DNA frag- sequence is necessary. ments that are of genetic interest. For ex- ample, consider a pedigree of familial The following sections discuss both types of breast cancer in which a particular DNA approaches and give examples of how they fragment serves as a marker for a bit of are used. In methods that use nucleic acid chromosome that also includes the mutant hybridization to identify particular frag- gene responsible for the increased risk; ments present in genomic DNA, the first then the ability to identify the fragment is step is usually cutting the genomic DNA critically important in assessing the relative into fragments of experimentally manage- risk for each of the women in the pedigree able size. This procedure is discussed next.
2.3 The Separation and Identification of Genomic DNA Fragments 49 Restriction Enzymes and Site-Specific DNA Cleavage Procedures for chemical isolation of DNA, such as those developed by Avery, MacLeod, and McCarty (Chapter 1), usually lead to random breakage of double- stranded molecules into an average length of about 50,000 base pairs. This length is de- noted 50 kb, where kb stands for kilobases (1 kb 1000 base pairs). A length of 50 kb is close to the length of double-stranded DNA present in the bacteriophage that in- fects E. coli. The 50-kb fragments can be made shorter by vigorous shearing forces, such as occur in a kitchen blender, but one Figure 2.9 Structure of the part of the restriction of the problems with breaking large DNA enzyme BamHI that comes into contact with its recognition site in the DNA (blue). The pink and molecules into smaller fragments by ran- green cylinders represent regions of the enzyme dom shearing is that the fragments con- in which the amino acid chain is twisted in the taining a particular gene, or part of a gene, form of a right-handed helix. [Courtesy of A. A. will be of different sizes. In other words, Aggarwal. Reprinted with permission from M. with random shearing, it is not possible to Newman, T. Strzelecka, L. F. Dorner, I. Schildkraut, and A. A. Aggarwal, 1995. Science isolate and identify a particular DNA frag- 269: 656. Copyright 2000 American Association ment on the basis of its size and sequence for the Advancement of Science.] content, because each randomly sheared molecule that contains the desired se- quence somewhere within it differs in size and cleaves each strand between the G- from all other molecules that contain the bearing nucleotides shown in red. Figure 2.9 sequence. In this section we describe an im- shows how the regions that make up the portant enzymatic technique that can be active site of BamHI contact the recognition used for cleaving DNA molecules at specific site (blue) just prior to cleavage, and the sites. This method ensures that all DNA cleavage reaction is indicated in Figure 2.10. fragments that contain a particular se- Table 2.3 lists nine of the several hundred quence have the same size; furthermore, restriction enzymes that are known. Most each fragment that contains the desired se- restriction enzymes are named after the quence has the sequence located at exactly species in which they were found. BamHI, the same position within the fragment. for example, was isolated from the bac- The cleavage method makes use of an terium Bacillus amyloliquefaciens strain H, important class of DNA-cleaving enzymes and it is the first (I) restriction enzyme iso- isolated primarily from bacteria. The en- lated from this organism. Because the first zymes are called restriction endo- three letters in the name of each restriction nucleases or restriction enzymes, and enzyme stand for the bacterial species of they are able to cleave DNA molecules at origin, these letters are printed in italics; the the positions at which particular, short se- rest of the symbols in the name are not ital- quences of bases are present. These natu- icized. Most restriction enzymes recognize rally occurring enzymes serve to protect the only one short base sequence, usually four bacterial cell by disabling the DNA of bacte- or six nucleotide pairs. The enzyme binds riophages that attack it. Their discovery with the DNA at these sites and makes a earned Werner Arber of Switzerland a break in each strand of the DNA molecule, Nobel Prize in 1978. Technically, the en- producing 3'-OH and 5'-P groups at each zymes are known as type II restriction endonu- position. The nucleotide sequence recog- cleases. The restriction enzyme BamHI is one nized for cleavage by a restriction enzyme is example; it recognizes the double-stranded called the restriction site of the enzyme. sequence The examples in Table 2.3 show that some 5'-GGATCC-3' restriction enzymes cleave their restriction 3'-CCTAGG-5' site asymmetrically (at different sites in the
50 Chapter 2 DNA Structure and DNA Manipulation BamH1 restriction site, GGATCC Figure 2.10 Mechanism of DNA cleavage by the restriction enzyme BamHI. Wherever the duplex 5’ end 3’ end contains a BamHI restriction site, the GGATCC enzyme makes a single cut in the CCTAGG backbone of each DNA strand. Each 3’ end 5’ end cut creates a new 3' end and a new 5' end, separating the duplex into two fragments. In the case of BamHI the Cleavage occurs Cleavage creates a short cuts are staggered cuts, so the result- in each strand at complementary single- ing ends terminate in single-stranded the site of the stranded overhang in each regions, each four base pairs in arrowhead. cleaved end (“sticky ends”). length.
5’ 3’ 5’ 3’ G New ends GATCC CCTAG created G 3’5’ 3’ 5’ Restriction fragment Restriction fragment
Table 2.3 Some restriction endonucleases, their sources, and their cleavage sites Enzyme Enzyme Enzyme (Microorganism) (Microorganism) (Microorganism)
EcoRI HindIII AruI (Escherichia coli)(Haemophilus influenzae)(Arthrobacter luteus)
Target sequence Target sequence GAATTC and cleavage site; AAGCTT and cleavage site; AGCT CTTAAG TCGA sticky ends TTCGAA blunt ends
BamHI PstI RsAI (Bacillus amyloliquefaciens H) (Providencia stuartii)(Rhodopseudomonas sphaeroides)
GGATCC CTGCAG GTAC CCTAGG GACGTC CATG
HaeII Taq I PvuII (Haemophilus aegyptus)(Thermus aquaticus)(Proteus vulgaris)
P u GCGCPy TCGA CAGCTG P y CGCGPu AGCT GTCGAC
Note: The vertical dashed line indicates the axis of symmetry in each sequence. Red arrows indicate the sites of cutting. The enzyme Taq I yields cohe- sive ends consisting of two nucleotides, whereas the cohesive ends produced by the other enzymes contain four nucleotides. Pu and Py refer to any purine and pyrimidine, respectively.
2.3 The Separation and Identification of Genomic DNA Fragments 51 BamHI two DNA strands), but other restriction The DNA fragment produced by a pair (Bacillus enzymes cleave symmetrically (at the same of adjacent cuts in a DNA molecule is called amyloliquefaciens H) site in both strands). The former leave a restriction fragment. A large DNA mol- sticky ends because each end of the ecule will typically be cut into many restric- cleaved site has a small, single-stranded tion fragments of different sizes. For GGATCC overhang that is complementary in base se- example, an E. coli DNA molecule, which CCTAGG quence to the other end (Figure 2.10). In contains 4.6 106 base pairs, is cut into contrast, enzymes that have symmetrical several hundred to several thousand frag- cleavage sites yield DNA fragments that ments, and mammalian genomic DNA is AruI have blunt ends. In virtually all cases, the cut into more than a million fragments. Al- (Arthrobacter luteus) restriction site of a restriction enzyme reads though these numbers are large, they are the same on both strands, provided that the actually quite small relative to the number opposite polarity of the strands is taken into of sugar–phosphate bonds in the DNA of an AGCT account; for example, each strand in the re- organism. TCGA striction site of BamHI reads 5'-GGATCC-3' (Figure 2.10). A DNA sequence with this type of symmetry is called a palindrome. Gel Electrophoresis (In ordinary English, a palindrome is a The DNA fragments produced by a restric- word or phrase that reads the same for- tion enzyme can be separated by size using wards and backwards, such as “madam.”) the fact that DNA is negatively charged and Restriction enzymes have the following moves in response to an electric field. If the important characteristics: terminals of an electrical power source are connected to the opposite ends of a hori- • Most restriction enzymes recognize a single zontal tube containing a DNA solution, restriction site. then the DNA molecules will move toward • The restriction site is recognized without the positive end of the tube at a rate that regard to the source of the DNA. depends on the electric field strength and • Because most restriction enzymes recognize a on the shape and size of the molecules. The unique restriction site sequence, the number of cuts in the DNA from a particular organism movement of charged molecules in an elec- is determined by the number of restriction tric field is called electrophoresis. sites present. The type of electrophoresis most commonly used in genetics is gel electro- phoresis. An experimental arrangement for gel electrophoresis of DNA is shown in Figure 2.11. A thin slab of a gel, usually agarose or acrylamide, is prepared contain- Slots for Bands (visible after ing small slots (called wells) into which samples suitable treatment) Gel samples are placed. An electric field is ap- plied, and the negatively charged DNA mol- Electrode ecules penetrate and move through the gel toward the anode (the positively charged electrode). A gel is a complex molecular Buffer solution network that contains narrow, tortuous Direction of passages, so smaller DNA molecules pass movement – through more easily; hence the rate of + movement increases as the size of the DNA fragment decreases. Figure 2.12 shows the re- Figure 2.11 Apparatus for gel electrophoresis. Liquid gel sult of electrophoresis of a set of double- is allowed to harden with an appropriately shaped mold in place to form “wells” for the samples (purple). After stranded DNA molecules in an agarose gel. electrophoresis, the DNA fragments, located at various Each discrete region containing DNA is positions in the gel, are made visible by immersing the called a band. The bands can be visualized gel in a solution containing a reagent that binds to or under ultraviolet light after soaking the gel reacts with DNA. The separated fragments in a sample in the dye ethidium bromide, the molecules appear as bands, which may be either visibly colored or fluorescent, depending on the particular reagent used. of which intercalate between the stacked The region of a gel in which the fragments in one sam- bases in duplex DNA and render it fluores- ple can move is called a lane; this gel has seven lanes. cent. In Figure 2.12, each band in the gel
52 Chapter 2 DNA Structure and DNA Manipulation Figure 2.12 Gel electrophoresis of DNA. Band from Fragments of different sizes were mixed and placed in a well. Electrophoresis was in the heaviest fragment downward direction. The DNA has been made (moves least) visible by the addition of a dye (ethidium Direction bromide) that binds only to DNA and that of movement fluoresces when the gel is illuminated with short-wavelength ultraviolet light.
Band from lightest fragment (moves most)
For any one agarose concentration, except for results from the fact that all DNA fragments the largest fragments, the distance migrated of a given size have migrated to the same decreases as a linear function of the logarithm of fragment size. position in the gel. To produce a visible band, a minimum of about 5 10 9 20 grams of DNA is required, which for a frag- ment of size 3 kb works out to about 109 18 molecules. The point is that a very large number of copies of any particular DNA 16 fragment must be present in order to yield a visible band in an electrophoresis gel. 14 of size in bp A linear double-stranded DNA fragment 10 has an electrophoretic mobility that de- log 12 creases in proportion to the logarithm of its length in base pairs—the longer the frag- Distance migrated ment, the slower it moves—but the propor- 10 tionality constant depends on the agarose concentration, the composition of the 8 buffering solution, and the electrophoretic conditions. This means that different con- 6 centrations of agarose allow efficient sepa-
ration of different size ranges of DNA linear double-stranded DNA fragments Size range (kb) for efficient separation of 4 fragments (see Figure 2.13). Less dense gels, such as 0.6 percent agarose, are used to sep- arate larger fragments; whereas more dense 2 gels, such as 2 percent agarose, are used to separate smaller fragments. The inset in 0 0.60.7 0.9 1.2 1.5 2.0 Figure 2.13 shows the dependence of elec- trophoretic mobility on the logarithm of Agarose concentration (%) fragment size. It also indicates that the lin- Figure 2.13 In agarose gels, the concentration of ear relationship breaks down for the largest agarose is an important factor in determining fragments that can be resolved under a the size range of DNA fragments that can be given set of conditions. separated.
2.3 The Separation and Identification of Genomic DNA Fragments 53 (A) EcoRI enzyme genetic engineering, discussed further in 22 5 5.5 7.5 6 4 Chapter 13. λ DNA DNA fragments that have been cloned 1 5 4 2 3 6 into organisms such as E. coli are widely used because the fragments can be isolated in large amounts and purified relatively easily. Among the uses of cloned DNA are: 1 2 3 4 5 6 • DNA sequencing. All current methods of DNA (B) EcoRI sequencing require cloned DNA fragments. These methods are discussed in Chapter 6. BamHI • Nucleic acid hybridization. As we shall see 1 2 3 4 5/6 below, an important application of cloned DNA fragments entails incorporating a radioactive or light-emitting “label” into (C) BamHI enzyme them, after which the labeled material is used to “tag” DNA fragments containing similar sequences. 6 1 5 4 3 2 λ DNA • Storage and distribution. Cloned DNA can be 5.5 17.5 5 6.5 7.5 8 stored for long periods without risk of change and can easily be distributed to other Figure 2.14 Restriction maps of DNA for the restriction enzymes researchers. (A) EcoRI and (C) BamHI. The vertical bars indicate the sites of cutting. The numbers within the arrows are the approximate lengths of the fragments in kilobase pairs (kb). (B) An electrophoresis gel of BamHI and EcoRI enzyme digests of DNA. Numbers indicate fragments in order Nucleic Acid Hybridization from largest (1) to smallest (6); the circled numbers on the maps correspond to the numbers beside the gel. The DNA has not undergone Most genomes are sufficiently large and electrophoresis long enough to separate bands 5 and 6 of the BamHI digest. complex that digestion with a restriction Note: In Problem 2 at the end of this chapter (Guide to Problem Solving), enzyme produces many bands that are the we show how to use the results of a double digest to determine the partic- same or similar in size. Identifying a partic- ular order of fragments for a pair of restriction enzymes. ular DNA fragment in a background of many other fragments of similar size pre- sents a needle-in-a-haystack problem. Sup- pose, for example, that we are interested in Because of the sequence specificity of a particular 3.0 BamHI fragment from the cleavage, a particular restriction enzyme pro- human genome that serves as a marker in- duces a unique set of fragments for a particular dicating the presence of a genetic risk factor DNA molecule. Another enzyme will pro- toward breast cancer among the individuals duce a different set of fragments from the in a particular pedigree. This fragment of same DNA molecule. In Figure 2.14, this prin- 3.0 kb is indistinguishable, on the basis of ciple is illustrated for the digestion of E. coli size alone, from fragments ranging from phage DNA by either EcoRI or BamHI (see about 2.9 to 3.1 kb. How many fragments part B). The locations of the cleavage sites in this size range are expected? When hu- for these enzymes in DNA are shown in man genomic DNA is cleaved with BamHI, Figures 2.14A and C. A diagram showing the average length of a restriction fragment sites of cleavage along a DNA molecule is is 46 4096 base pairs, and the expected called a restriction map. Particular DNA total number of BamHI fragments is about fragments can be isolated by cutting out the 730,000; in the size range 2.9–3.1 kb, the small region of the gel that contains the expected number of fragments is about fragment and removing the DNA from the 17,000. What this means is that even gel. One important use of isolated restric- though we know that the fragment we are tion fragments employs the enzyme DNA interested in is 3 kb in length, it is only one ligase to insert them into self-replicating of 17,000 fragments that are so similar in molecules such as bacteriophage, plasmids, size that ours cannot be distinguished from or even small artificial chromosomes (Fig- the others by length alone. ure 2.2). These procedures constitute DNA This identification task is actually harder cloning and are the basis of one form of than finding a needle in a haystack because
54 Chapter 2 DNA Structure and DNA Manipulation haystacks are usually dry. A more accurate Denatured analogy would be looking for a needle in a Strand separation single strands haystack that had been pitched into a swimming pool full of water. This analogy is 1.40 more relevant because gels, even though they contain a supporting matrix to make them semisolid, are primarily composed of 1.30 water, and each DNA molecule within a gel is surrounded entirely by water. Clearly, we Further denaturation Temperature at which need some method by which the molecules half the base pairs are in a gel can be immobilized and our specific 1.20 denatured and half fragment identified. remain intact The DNA fragments in a gel are usually Denaturation immobilized by transferring them onto a begins
Relative light absorbance at 260 nm 1.10 sheet of special filter paper consisting of Double- nitrocellulose, to which DNA can be perma- stranded DNA nently (covalently) bound. How this is done is described in the next section. In this 1.00 section we examine how the two strands in 30 50 70Tm 90 100 110 a double helix can be “unzipped” to form Temperature, °C single strands and how, under the proper conditions, two single strands that are Figure 2.15 Mechanism of denaturation of DNA by heat. The temperature at which 50 percent of the base pairs are denatured is the melting tempera- complementary or nearly complementary ture, symbolized T . in sequence can be “zipped” together to m form a different double helix. The “unzip- ping” is called denaturation, the “zipping” renaturation. The practical applications of denaturation and renaturation are many: pairs. When solutions containing DNA frag- ments are raised to temperatures in the • A small part of a DNA fragment can be ° “zipped” with a much larger DNA fragment. range 85–100 C, or to the high pH of strong This principle is used in identifying specific alkaline solutions, the paired strands begin DNA fragments in a complex mixture, such as to separate, or “unzip.” Unwinding of the the 3-kb BamHI marker for breast cancer that helix happens in less than a few minutes we have been considering. Applications of this (the time depends on the length of the mol- type include the tracking of genetic markers ecule). When the helical structure of DNA in pedigrees and the isolation of fragments is disrupted, and the strands have become containing a particular mutant gene. completely unzipped, the molecule is said • A DNA fragment from one gene can be to be denatured. A common way to detect “zipped” with similar fragments from other denaturation is by measuring the capacity genes in the same genome; this principle is of DNA in solution to absorb ultraviolet used to identify different members of families of genes that are similar, but not identical, in light of wavelength 260 nm, because the sequence and that have related functions. absorption at 260 nm (A260) of a solution of • A DNA fragment from one species can be single-stranded molecules is 37 percent “zipped” with similar sequences from other higher than the absorption of the double- species. This allows the isolation of genes stranded molecules at the same concentra- that have the same or related functions in tion. As shown in Figure 2.15, the progress of multiple species. It is used to study aspects of denaturation can be followed by slowly molecular evolution, such as how differences heating a solution of double-stranded DNA in sequence are correlated with differences and recording the value of A260 at various in function, and the patterns and rates of temperatures. The temperature required change in gene sequences as they evolve. for denaturation increases with G C As we saw in Section 2.2, the double- content, not only because G C base pairs stranded helical structure of DNA is main- have three hydrogen bonds and A T base tained by base stacking and by hydrogen pairs two, but because consecutive G C bonding between the complementary base base pairs have stronger base stacking.
2.3 The Separation and Identification of Genomic DNA Fragments 55 Photocaptionphotocaptionphoto captionphotocaptionphotocap- tionphotocaptionphotocaption- photocaptionphotocaptionphotoc aptionphotocaption
09131_01_1746P
Denatured DNA strands can, under cer- Shown in part A is a solution of denatured tain conditions, form double-stranded DNA DNA, called the probe, in which each mol- with other strands, provided that the ecule has been labeled with either radioac- strands are sufficiently complementary in tive atoms or light-emitting molecules. sequence. This process of renaturation is Probe DNA is typically obtained from a called nucleic acid hybridization be- clone, and the labeled probe usually con- cause the double-stranded molecules are tains denatured forms of both strands pre- “hybrid” in that each strand comes from a sent in the original duplex molecule. (This different source. For DNA strands to hy- has led to some confusing terminology. Ge- bridize, two requirements must be met: neticists say that probe DNA hybridizes 1. The salt concentration must be high with DNA fragments containing sequences ( 0.25M) to neutralize the negative charges that are similar to the probe, rather than of the phosphate groups, which would complementary. What actually occurs is that otherwise cause the complementary strands one strand of the probe undergoes hy- to repel one another. bridization with a complementary sequence 2. The temperature must be high enough to in the fragment. But because the probe usu- disrupt hydrogen bonds that form at random ally contains both strands, hybridization between short sequences of bases within the takes place with any fragment that contains same strand, but not so high that stable base a similar sequence, each strand in the probe pairs between the complementary strands undergoing hybridization with the comple- are disrupted. mentary sequence in the fragment.) The initial phase of renaturation is a slow Part B in Figure 2.16 is a diagram of ge- process because the rate is limited by the nomic DNA fragments that have been im- random chance that a region of two com- mobilized on a nitrocellulose filter. When plementary strands will come together to the probe is mixed with the genomic frag- form a short sequence of correct base pairs. ments (part C), random collisions bring This initial pairing step is followed by a short, complementary stretches together. If rapid pairing of the remaining complemen- the region of complementary sequence is tary bases and rewinding of the helix. short (part D), then random collision can- Rewinding is accomplished in a matter of not initiate renaturation because the flank- seconds, and its rate is independent of DNA ing sequences cannot pair; in this case the concentration because the complementary probe falls off almost immediately. If, how- strands have already found each other. ever, a collision brings short sequences to- The example of nucleic acid hybridiza- gether in the correct register (part E), then tion in Figure 2.16 will enable us to under- this initiates renaturation, because the pair- stand some of the molecular details and also ing proceeds zipperlike from the initial con- to see how hybridization is used to “tag” tact. The main point is that DNA fragments and identify a particular DNA fragment. are able to hybridize only if the length of
56 Chapter 2 DNA Structure and DNA Manipulation (A)–Fragments of denatured (B)–Fragments of denatured and labeled probe DNA genomic DNA Mix immobilized on filter The denatured probe usually contains both complementary strands.
GTATAATGCGAGCC Renaturation CATATTACGCTCGG Some fragments in the genomic DNA may contain a sequence similar to that in the probe DNA.
(C)– Random collisions bring small regions of complementary sequences together to start the renaturation. Heat-sealed bag
(D)–Initial pairing with incorrect fragment (E)–Initial pairing with correct fragment T G C A G C C G T T A C A T G G G A A C A C T C A A T A C A T T A C G G T A T A A T G C G A G C C T C T C C C T AT T A C C A A C G C T C G G C GG
Base pairing cannot go farther Base pairing proceeds in a zipper-like because flanking sequences are not fashion because flanking sequences complementary; probe falls away. are complementary; probe sticks.
Figure 2.16 Nucleic acid hybridization. (A) Duplex molecules of probe DNA (obtained from a clone) are denatured and (B) placed in contact with a filter to which is attached denatured strands of genomic DNA. (C) Under the proper conditions of salt concentration and temperature, short complementary stretches come together by random collision. (D) If the sequences flanking the paired region are not complementary, then the pairing is unstable and the strands come apart again. (E) If the sequences flanking the paired region are complementary, then further base pairing stabilizes the renatured duplex.
the region in which they can pair is suffi- tured DNA, if it is suitably labeled (for ex- ciently long. Some mismatches in the ample, with radioactive 32P), can be com- paired region can be tolerated. How many bined with a complex mixture of denatured mismatches are allowed is determined by DNA fragments, and upon renaturation the the conditions of the experiment: The small fragment will “tag” with radioactivity lower the temperature at which the hy- any molecules in the complex mixture with bridization is carried out, and the higher which it can hybridize. The radioactive tag the salt concentration, the greater the pro- allows these molecules to be identified. portion of mismatches that are tolerated. The methods of DNA cleavage, elec- trophoresis, transfer to nitrocellulose, and hybridization with a probe are all combined The Southern Blot in the Southern blot, named after its in- The ability to renature DNA in the manner ventor Edward Southern. In this procedure, outlined in Figure 12.16 means that solu- a gel in which DNA molecules have been tion containing a small fragment of dena- separated by electrophoresis is treated with
2.3 The Separation and Identification of Genomic DNA Fragments 57 (A)–DNA is cleaved; electrophoresis (B)–DNA fragments are blotted (C)–Filter is exposed (D)–Filter is exposed to is used to separate DNA onto nitrocellulose filter to radioactive probe photographic film; film is developed Gel DNA restriction fragments Heat-sealed bag (so many that Filter individual bands Probe in x-ray run together) film Nitrocellulose solution Size filter markers
Gel
Gel
Buffer Absorbent Nitrocellulose DNA fragments Probe hybridizes with Bands on x-ray solution paper filter transferred onto filter homologous DNA film created by (invisible at this stage) fragments (still not visible) labeled probe
Figure 2.17 Southern blot. (A) DNA re- alkali to denature the DNA and render it the ones that are of interest. Practical appli- striction fragments are single-stranded (Figure 2.17). Then the DNA cations of Southern blotting center on iden- separated by electro- phoresis, blotted from is transferred to a sheet of nitrocellulose fil- tifying DNA fragments that contain the gel onto a nitro- ter in such a way that the relative positions sequences similar to the probe DNA or cellulose or nylon of the DNA fragments are maintained. The RNA, where the proportion of mismatched filter, and chemically transfer is accomplished by overlaying the nucleotides allowed is determined by the attached by the use nitrocellulose onto the gel and stacking conditions of hybridization. The advantages of ultraviolet light. (B) The strands are many layers of absorbent paper on top; the of the Southern blot are convenience and denatured and mixed absorbent paper sucks water molecules sensitivity. The sensitivity comes from the with radioactive or from the gel and through the nitrocellulose, fact that both hybridization with a labeled light-sensitive probe to which the DNA fragments adhere. (This probe and the use of photographic film am- DNA, which binds step is the “blot” component of the South- plify the signal; under typical conditions, a with complementary sequences present ern blot, parts A and B.) Then the filter is band can be observed on the film with only 12 on the filter. The treated so that the single-stranded DNA be- 5 10 grams of DNA—a thousand bound probe remains, comes permanently bound. The treated fil- times less DNA than the amount required whereas unbound ter is mixed with a solution containing to produce a visible band in the gel itself. probe washes off. denatured probe (DNA or RNA) under con- (C) Bound probe is revealed by darkening ditions that allow complementary strands to of photographic film hybridize to form duplex molecules (part placed over the filter. C). Radioactive or other label present in the 2.4 Selective Replication of The positions of the probe becomes stably bound to the filter, bands indicate which and therefore resistant to removal by wash- Genomic DNA Fragments restriction fragments contain DNA se- ing, only at positions at which base se- Although nucleic acid hybridization allows quences homologous quences complementary to the probe are a particular DNA fragment to be identified to those in the probe. already present on the filter, so that the when present in a complex mixture of frag- probe can form duplex molecules. The label ments, it does not enable the fragment to be is located by placing the paper in contact separated from the others and purified. Ob- with x-ray film. After development of the taining the fragment in purified form Au: Proofreader film, blackened regions indicate positions of requires cloning, which is straightforward asks that you con- bands containing the radioactive or light- but time-consuming. (Cloning methods are firm reference to emitting label (part D). discussed in Chapter 13.) However, if the Chapter 13 in first The procedure in Figure 2.17 solves the fragment of interest is not too long, and if para of sec. 2.4. wet-haystack problem by transferring and the nucleotide sequence at each end is immobilizing the genomic DNA fragments known, then it becomes possible to obtain to a filter and identifying, by hybridization, large quantities of the fragment merely by
58 Chapter 2 DNA Structure and DNA Manipulation selective replication. This process is called dATP, dGTP, dTTP, and dCTP, which contain amplification. How would one know the the bases adenine, guanine, thymine, and sequence of the ends? Let us return to our cytosine, respectively. Details of the struc- example of the 3.0-kb BamHI fragment that tures of dCTP and dGTP are shown in Figure serves to mark a risk factor for breast cancer 2.18, in which the phosphate groups cleaved in certain pedigrees. Suppose that this frag- off during DNA synthesis are indicated. ment is cloned and sequenced from one af- DNA synthesis requires all four nucleoside fected individual, and it is found that, 5'-triphosphates and does not take place if relative to the normal genomic sequence in any of them is omitted. this region, the BamHI fragment is missing a A feature found in all DNA polymerases region of 500 base pairs. At this point the se- is that quences at the ends of the fragment are A DNA polymerase can only elongate a DNA known, and we can also infer that amplifi- strand. It is not possible for DNA polymerase to cation of genomic DNA from individuals initiate synthesis of a new strand, even when a with the risk factor will yield a band of 3.0 template molecule is present. kb, whereas amplification from genomic DNA of noncarriers will yield a band of 3.5 One important implication of this prin- kb. This difference allows every person in ciple is that DNA synthesis requires a pre- the pedigree to be diagnosed as a carrier or existing segment of nucleic acid that is noncarrier merely by means of DNA ampli- hydrogen-bonded to the template strand. fication. To understand how amplification This segment is called a primer. Because works, it is first necessary to examine a few the primer molecule can be very short, it is key features of DNA replication. an oligonucleotide, which literally means “few nucleotides.” As we shall see in Chap- ter 6, in living cells the primer is a short seg- Constraints on DNA Replication: ment of RNA, but in DNA amplification in Primers and 5'-to-3' Strand vitro, the primer employed is usually DNA. Elongation
As with most metabolic reactions in living H cells, nucleic acids are synthesized in chem- ical reactions controlled by enzymes. An H NH enzyme that forms the sugar–phosphate O– O– O– C C bond (the phosphodiester bond) between H C C N – O P O P O P O CH adjacent nucleotides in a nucleic acid chain 2 O N C is called a DNA polymerase. A variety of O O O HH O DNA polymerases have been purified, and HH for amplification of a DNA fragment, the OH H DNA synthesis is carried out in vitro by com- Deoxycytidine 5'-triphosphate (dCTP) bining purified cellular components in a test tube under precisely defined condi- The outer two phosphate tions. (In vitro means “without participation groups are cleaved off when of living cells.”) nucleotides are added to the In order for DNA polymerase to catalyze growing DNA strand. synthesis of a new DNA strand, preexisting single-stranded DNA must be present. Each single-stranded DNA molecule present in H O– O– O– the reaction mix can serve as a template N C O upon which a new partner strand is created – O P O P O P O CH 2 O C by the DNA polymerase. For DNA replica- N C C G tion to take place, the 5'-triphosphates of O O O HH NH N the four deoxynucleosides must also be pre- HHC OH H sent. This requirement is rather obvious, be- NH cause the nucleoside triphosphates are the precursors from which new DNA strands Deoxyguanosine 5'-triphosphate (dGTP) H are created. The triphosphates needed are Figure 2.18 Two deoxynucleoside triphosphates used in DNA synthesis. the compounds denoted in Table 2.1 as The outer two phosphate groups are removed during synthesis.
2.4 Selective Replication of Genomic DNA Fragments 59 44 A P–P (pyrophosphate) OH group is released.
C O P 5’ P P end P P 1 Base pairing specifies C 3 the next nucleotide to G 3’ An O–P bond is be added at the 3’ end. end formed to attach OH the new nucleotide. 3’
T P Template Newly 2 strand synthesized 3’ The 3’ hydroxyl group at the strand 3’ end of the growing strand attacks the innermost phosphate group of O P 3’ 5’ the incoming trinucleotide. end end
Figure 2.19 Addition of nucleotides to the 3'-OH terminus of a growing strand. The recognition step is shown as the formation of hydrogen bonds between the A and the T. The chemical reaction is that the 3'-OH group of the 3' end of the growing chain attacks the innermost phosphate group of the incoming trinucleotide.
It is the 3' end of the primer that is essen- The Polymerase Chain tial, because, as emphasized in Chapter 1, Reaction DNA synthesis proceeds only by addition of The requirement for an oligonucleotide successive nucleotides to the 3' end of the primer, and the constraint that chain elon- growing strand. In other words, chain elonga- gation must always occur in the 5' 3' tion always takes place in the 5'-to-3' direction direction, make it possible to obtain large (5' 3'). quantities of a particular DNA sequence by The reason for the 5' 3' direction of selective amplification in vitro. The method chain elongation is illustrated in Figure 2.19. for selective amplification is called the poly- It is a consequence of the fact that the reac- merase chain reaction (PCR). For its in- tion catalyzed by DNA polymerase is the vention, Californian Kary B. Mullis was formation of a phosphodiester bond be- awarded a Nobel Prize in 1993. PCR amplifi- tween the free 3'-OH group of the chain cation uses DNA polymerase and a pair of being extended and the innermost phos- short, synthetic oligonucleotide primers, phorus atom of the nucleoside triphosphate usually 18–22 nucleotides in length, that are being incorporated at the 3' end. Recogni- complementary in sequence to the ends of tion of the appropriate incoming nucleoside the DNA sequence to be amplified. Figure triphosphate in replication depends on base 2.20 gives an example in which the primer pairing with the opposite nucleotide in the oligonucleotides (green) are 9-mers. These template strand. DNA polymerase will usu- are too short for most practical purposes, but ally catalyze the polymerization reaction they will serve for illustration. The original that incorporates the new nucleotide at the duplex molecule (part A) is shown in blue. primer terminus only when the correct This duplex is mixed with a vast excess of base pair is present. The same DNA poly- primer molecules, DNA polymerase, and all merase is used to add each of the four four nucleoside triphosphates. When the deoxynucleoside phosphates to the 3'-OH temperature is raised, the strands of the terminus of the growing strand. duplex denature and become separated.
60 Chapter 2 DNA Structure and DNA Manipulation (A)–Original duplex DNA Figure 2.20 Role of DNA primer sequences in PCR amplification. (A) ACA T GACGT CTATGCATG Target DNA duplex TGT A CTGCA GATACGTAC (blue), showing sequences chosen as the primer-binding sites flanking the region to be amplified. (B)–First cycle—primers attached (B) Primer (green) bound to denatured strands of target DNA. ACATGACGT CTATGCATG GATACGTAC (C) First round of amplification. Newly Primer Region to be synthesized DNA is amplified Primer shown in pink. Note that each primer is ACA T GACGT TGT A CTGCA GATACGTAC extended beyond the other primer site. (D) Second round of amplification (only one strand shown); in (C)–Elongation (DNA synthesis) this round, the newly synthesized strand terminates at the ACA T GACGT CTATGCATG opposite primer site. TGT A CTGCA GATACGTAC (E) Third round of amplification (only one strand shown); in this round, both strands are truncated ACA T GACGT CTATGCATG at the primer sites. TGT A CTGCA GATACGTAC Primer sequences are normally about twice as long as shown here.
(D)–Second cycle—after elongation
ACA T GACGT CTATGCATG TGT A CTGCA GATACGTAC
(E)–Third cycle—after elongation
ACA T GACGT CTATGCATG TGT A CTGCA GATACGTAC
When the temperature is lowered again to cause each DNA strand elongates only at the allow renaturation, the primers, because 3' end. After the primers have annealed, they are in great excess, become annealed to each is elongated by DNA polymerase using the separated template strands (part B). the original strand as a template, and the Note that the primer sequences are different newly synthesized DNA strands (red) grow from each other but complementary to se- toward each other as synthesis proceeds quences present in opposite strands of the (part C). Note that: original DNA duplex and flanking the re- A region of duplex DNA present in the gion to be amplified. The primers are ori- original reaction mix can be PCR-amplified ented with their 3' ends pointing in the only if the region is flanked by the primer direction of the region to be amplified, be- oligonucleotides.
2.4 Selective Replication of Genomic DNA Fragments 61 To start a second cycle of PCR amplifica- ends of the sequence to be amplified and be- tion, the temperature is raised again to de- come the substrates for chain elongation by nature the duplex DNA. Upon lowering of DNA polymerase. In the first cycle of PCR the temperature, the original parental amplification, the DNA is denatured to sep- strands anneal with the primers and are arate the strands. The denaturation temper- replicated as shown in Figure 2.20B and C. ature is usually around 95°C. Then the The daughter strands produced in the first temperature is decreased to allow annealing round of amplification also anneal with in the presence of a vast excess of the primer primers and are replicated, as shown in part oligonucleotides. The annealing tempera- D. In this case, although the daughter du- ture is typically in the range of 50°C–60°C, plex molecules are identical in sequence to depending largely on the G C content of the original parental molecule, they consist the oligonucleotide primers. To complete entirely of primer oligonucleotides and the cycle, the temperature is raised slightly, nonparental DNA that was synthesized in to about 70°C, for the elongation of each either the first or the second cycle of PCR. primer. The steps of denaturation, renatura- As successive cycles of denaturation, primer tion, and replication are repeated from annealing, and elongation occur, the origi- 20–30 times, and in each cycle the number nal parental strands are diluted out by the of molecules of the amplified sequence is proliferation of new daughter strands until doubled. eventually, virtually every molecule pro- Implementation of PCR with conven- duced in the PCR has the structure shown tional DNA polymerases is not practical, in part D. because at the high temperature necessary The power of PCR amplification is that for denaturation, the polymerase is itself ir- the number of copies of the template strand reversibly unfolded (denatured) and be- increases in exponential progression: 1, 2, comes inactive. However, DNA polymerase 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so isolated from certain organisms is heat sta- forth, doubling with each cycle of replica- ble because the organisms normally live in tion. Starting with a mixture containing as hot springs at temperatures well above little as one molecule of the fragment of in- 90°C, such as are found in Yellowstone terest, repeated rounds of DNA replication National Park. Such organisms are said to increase the number of amplified molecules be thermophiles. The most widely used exponentially. For example, starting with a heat-stable DNA polymerase is called Taq single molecule, 25 rounds of DNA replica- polymerase, because it was originally iso- tion will result in 225 3.4 107 mole- lated from the thermophilic bacterium cules. This number of molecules of the Thermus aquaticus. amplified fragment is so much greater than PCR amplification is very useful for gen- that of the other unamplified molecules in erating large quantities of a specific DNA se- the original mixture that the amplified quence. The principal limitation of the DNA can often be used without further pu- technique is that the DNA sequences at the rification. For example, a single fragment of ends of the region to be amplified must be 3 kb in E. coli accounts for only 0.06 percent known so that primer oligonucleotides can of the DNA in this organism. However, if be synthesized. In addition, sequences this single fragment were replicated longer than about 5000 base pairs cannot be through 25 rounds of replication, then replicated efficiently by conventional PCR 99.995 percent of the resulting mixture procedures, although there are modifica- would consist of the amplified sequence. A tions of PCR that allow longer fragments to 3-kb fragment of human DNA constitutes be amplified. On the other hand, many ap- only 0.0001 percent of the total genome plications require amplification of relatively size. Amplification of a 3-kb fragment of small fragments. The major advantage of human DNA to 99.995 percent purity PCR amplification is that it requires only would require about 34 cycles of PCR. trace amounts of template DNA. Theoreti- An overview of the polymerase chain cally only one template molecule is re- reaction is shown in Figure 2.21. The DNA se- quired, but in practice the amplification of a quence to be amplified is again shown in single molecule may fail because the mole- blue and the oligonucleotide primers in cule may, by chance, be broken or damaged. green. The oligonucleotides anneal to the But amplification is usually reliable with as
62 Chapter 2 DNA Structure and DNA Manipulation (A) First cycle
DNA sequence to be amplified
Denaturation, annealing
Primer oligonucleotides
DNA replication
(B) Second cycle
(C) 20–30 cycles Amplified DNA sequences
Figure 2.21 Polymerase chain reaction (PCR) for amplification of particular DNA sequences. Only the region to be amplified is shown. Oligonucleotide primers (green) that are complementary to the ends of the target sequence (blue) are used in repeated rounds of denaturation, annealing, and DNA replication. Newly replicated DNA is shown in pink. The number of copies of the target sequence doubles in each round of replication, eventually overwhelming any other sequences that may be present.
2.4 Selective Replication of Genomic DNA Fragments 63 few as 10–100 template molecules, which alanine in the polypeptide chain that the makes PCR amplification 10,000–100,000 gene encodes, another form of the same times more sensitive than detection via nu- gene may have, at the same position, the cleic acid hybridization. codon GCG, which also specifies alanine. The exquisite sensitivity of PCR amplifi- Hence the two forms of the gene encode cation has led to its use in DNA typing for the same sequence of amino acids yet dif- criminal cases in which a minuscule fer in DNA sequence. The alternative forms amount of biological material has been left of a gene are called alleles of the gene. behind by the perpetrator (skin cells on a Different alleles may also code for different cigarette butt or hair-root cells on a single amino acid sequences, sometimes with hair can yield enough template DNA for drastic effects. Recall the example of the amplification). In research, PCR is widely PAH gene for phenyalanine hydroxylase in used in the study of independent muta- Chapter 1, in which a change in codon 408 tions in a gene whose sequence is known from CGG (arginine) to TGG (tryptophan) in order to identify the molecular basis of results in an inactive enzyme that becomes each mutation, to study DNA sequence expressed as the inborn error of metabo- variation among alternative forms of a lism phenylketonuria. gene that may be present in natural popu- Within a cell, genes are arranged in lin- lations, or to examine differences among ear order along microscopic thread-like genes with the same function in different bodies called chromosomes, which we species. The PCR procedure has also come will examine in detail in Chapters 4 and 8. into widespread use in clinical laboratories Each human reproductive cell contains one for diagnosis. To take just one very impor- complete set of 23 chromosomes contain- tant example, the presence of the human ing 3 109 base pairs of DNA. A typical immunodeficiency virus (HIV), which chromosome contains several hundred to causes acquired immune deficiency syn- several thousand genes. In humans the drome (AIDS), can be detected in trace average is approximately 3500 genes per quantities in blood banks via PCR by using chromosome. Each chromosome contains a primers complementary to sequences in single molecule of duplex DNA along its the viral genetic material. These and other length, complexed with proteins and very applications of PCR are facilitated by the tightly coiled. The DNA in the average hu- fact that the procedure lends itself to au- man chromosome, when fully extended, tomation by the use of mechanical robots has relative dimensions comparable to to set up and run the reactions. those of a wet spaghetti noodle 25 miles long; when the DNA is coiled in the form of a chromosome, its physical compaction is 2.5 The Terminology of comparable to that of the same noodle coiled and packed into an 18-foot canoe. Genetic Analysis The physical position of a gene along a In order to discuss the types of DNA mark- chromosome is called the locus of the ers that modern geneticists commonly use gene. In most higher organisms, including in genetic analysis, we must first introduce human beings, each cell other than a sperm some key terms that provide the essential or egg contains two copies of each type of vocabulary of genetics. These terms can be chromosome—one from the mother and understood with reference to Figure 2.22.In one inherited from the father. Each mem- Chapter 1 we defined a gene as an ele- ber of such a pair of chromosomes is said to ment of heredity, transmitted from parents be homologous to the other. (The chro- to offspring in reproduction , that influ- mosomes that determine sex are an impor- ences one or more hereditary traits. Chem- tant exception, discussed in Chapter 4, that ically, a gene is a sequence of nucleotides we will ignore for now.) At any locus, along a DNA molecule. In a population of therefore, each individual carries two al- organisms, not all copies of a gene may leles, because one allele is present at a cor- have exactly the same nucleotide se- responding position in each of the quence. For example, whereas one form of homologous maternal and paternal chro- a gene has the codon GCA, which specifies mosomes (Figure 2.22).
64 Chapter 2 DNA Structure and DNA Manipulation The genetic constitution of an individual Locus (physical position) One of each pair of is called its genotype. For a particular of gene A in each chromosomes is maternal in gene, if the two alleles at the locus in an in- homologous chromosome origin, the other paternal. dividual are indistinguishable from each other, then for this gene the genotype of the Gene A Gene B Gene C individual is said to be homozygous for BC the allele that is present. If the two alleles at Homologous chromosomes the locus are different from each other, b C then for this gene the genotype of the indi- vidual is said to be heterozygous for the Heterozygous Homozygous genotype Bb genotype CC alleles that are present. Typographically, genes are indicated in italics, and alleles are Many different A typically distinguished by uppercase or low- A1 A alleles can exist Genotypes are sometimes ercase letters (A versus a), subscripts (A 2 in an entire population 1 A3 written with a slash (for of organisms, but only versus A2), superscripts (a versus a ), or A4 example, B/b and C/C) to sometimes just and . Using these sym- A5 a single allele can be distinguish the alleles in bols, homozygous genes would be por- • present at the locus of homologous chromosomes. • the A gene in any one trayed by any of these formulas: AA, aa, • chromosome. A1A1,A2A2,a a ,a a , ,or .Asin the last two examples, the slash is some- Figure 2.22 Key concepts and terms used in times used to separate alleles present in modern genetics. Note that a single gene can homologous chromosomes to avoid ambi- have any number of alleles in the population as guity. Heterozygous genes would be por- a whole, but no more than two alleles can be present in any one individual. trayed by any of the formulas Aa, A1A2, a a , or . In Figure 2.22, the genotype Bb is heterozygous because the B and b al- leles are distinguishable (which is why they smoker is much more likely to develop the are assigned different symbols), whereas disease. Environmental effects also imply the genotype CC is homozygous. These that the same phenotype can result from genotypes could also be written as B b and more than one genotype; smoking again C C, respectively. provides an example, because most smok- Whereas the alleles that are present in ers who are not genetically at risk can also an individual constitute its genotype, the develop lung cancer. physical or biochemical expression of the genotype is called the phenotype. To put it as simply as possible, the distinction is that the genotype of an individual is what 2.6 Types of DNA Markers is on the inside (the alleles in the DNA), whereas the phenotype is what is on the Present in Genomic DNA outside (the observable traits, including Genetic variation, in the form of multiple biochemical traits, behavioral traits, and so alleles of many genes, exists in most natural forth). The distinction between genotype populations of organisms. We have called and phenotype is critically important be- such genetic differences between individu- cause there usually is not a one-to-one cor- als DNA markers; they are also called DNA respondence between genes and traits. polymorphisms. (The term polymorphism Most complex traits, such as hair color, skin literally means “multiple forms.”) The color, height, weight, behavior, life span, methods of DNA manipulation examined in and reproductive fitness, are influenced by Sections 2.3 and 2.4 can be used in a variety many genes. Most traits are also influenced of combinations to detect differences among more or less strongly by environment. This individuals. Anyone who reads the litera- means that the same genotype can result in ture in modern genetics will encounter a different phenotypes, depending on the en- bewildering variety of acronyms referring to vironment. Compare, for example, two different ways in which genetic polymor- people with a genetic risk for lung cancer; if phisms are detected. The different ap- one smokes and the other does not, the proaches are in use because no single
2.6 Types of DNA Markers Present in Genomic DNA 65 method is ideal for all applications, each to differ at one SNP site about every method has its own advantages and limita- 1000–3000 bp in protein-coding DNA and tions, and new methods are continually be- at about one SNP site every 500–1000 bp in ing developed. In this section we examine noncoding DNA. Note, in the definition of a some of the principal methods for detecting SNP, the stipulation that DNA molecules DNA polymorphisms among individuals. must differ at the nucleotide site “fre- quently.” This provision excludes rare ge- netic variation of the sort found in less than Single-Nucleotide 1 percent of the DNA molecules in a popu- Polymorphisms (SNPs) lation. The reason for the exclusion is that A single-nucleotide polymorphism, or genetic variants that are too rare are not SNP (pronounced “snip”), is present at a generally as useful in genetic analysis as the particular nucleotide site if the DNA mole- more common variants. A catalog of SNPs is cules in the population frequently differ in regarded as the ultimate compendium of the identity of the nucleotide pair that oc- DNA markers, because SNPs are the most cupies the site. For example, some DNA common form of genetic differences among molecules may have a T A base pair at a people and because they are distributed particular nucleotide site, whereas other approximately uniformly along the chro- DNA molecules in the same population mosomes. By the middle of 2001, some may have a C G base pair at the same site. 300,000 SNPs are expected to have been This difference constitutes a SNP. The SNP identified in human populations and their defines two “alleles” for which there could positions in the chromosomes located. be three genotypes among individuals in the population: homozygous with T A at the corresponding site in both homologous Restriction Fragment Length chromosomes, homozygous with C G at Polymorphisms (RFLPs) the corresponding site in both homologous Although most SNPs require DNA sequenc- chromosomes, or heterozygous with T A ing to be studied, those that happen to be in one chromosome and C G in the ho- located within a restriction site can be ana- mologous chromosome. The word allele is lyzed with a Southern blot. An example of in quotation marks above because the SNP this situation is shown in Figure 2.23, where need not be in a coding sequence, or even the SNP consists of a T A nucleotide pair in in a gene. In the human genome, any two some molecules and a C G pair in others. In randomly chosen DNA molecules are likely this example, the polymorphic nucleotide
(A)Polymorphic (B) Polymorphic G AATTC GAATTC site GAATTC GAATTC GAACTC site GAATTC CTTAAG CTTAAG CTTAAG CTTAAG CTTGAG CTTAAG
5' 3' 5' 3' 3' 5' 3' 5'
Treatment of DNA Treatment of DNA with EcoRI with EcoRI
No cleavage
Cleavage Result: Two fragments Result: One larger fragment
Figure 2.23 A minor difference in the DNA sequence of two molecules can be detected if the differ- ence eliminates a restriction site. (A) This molecule contains three restriction sites for EcoRI, includ- ing one at each end. It is cleaved into two fragments by the enzyme. (B) This molecule has an altered EcoRI site in the middle, in which 5'-GAATTC-3' becomes 5'-GAACTC-3'. The altered site cannot be cleaved by EcoRI, so treatment of this molecule with EcoRI results in one larger fragment.
66 Chapter 2 DNA Structure and DNA Manipulation site is included in a cleavage site for the re- phism, or RFLP (pronounced either as striction enzyme EcoRI (5'-GAATTC-3'). The “riflip” or by spelling it out). two nearest flanking EcoRI sites are also Because RFLPs change the number and shown. In this kind of situation, DNA mole- size of DNA fragments produced by cules with T A at the SNP will be cleaved at digestion with a restriction enzyme, they both flanking sites and also at the middle can be detected by the Southern blotting site, yielding two EcoRI restriction frag- procedure discussed in Section 2.3. An ex- ments. Alternatively, DNA molecules with ample appears in Figure 2.24. In this case the C G at the SNP will be cleaved at both labeled probe DNA hybridizes near the re- flanking sites but not at the middle site striction site at the far left and identifies the (because the presence of C G destroys the position of this restriction fragment in the EcoRI restriction site) and so will yield only electrophoresis gel. The duplex molecule one larger restriction fragment. A SNP that labeled “allele A” has a restriction site in the eliminates a restriction site is known as a middle and, when cleaved and subjected to restriction fragment length polymor- electrophoresis, yields a small band that
Positions of cleavage sites Direction of current
Site of hybridization Larger DNA Smaller DNA with probe DNA fragments fragments
5’ 3’ “Allele” A 3’ 5’ Duplex DNA DNA band from allele A
5’ 3’ “Allele” a 3’ 5’ Duplex DNA DNA band from allele a
Duplex DNA in homologous Figure 2.24 In a restric- chromosomes tion fragment length polymorphism (RFLP), 5’ 3’ alleles may differ in 3’ 5’ the presence or Homozygous AA absence of a cleavage 5’ 3’ site in the DNA. In this 3’ 5’ example, the a allele DNA band lacks a restriction site from genotype AA that is present in the DNA of the A allele. 5’ 3’ The difference in 3’ 5’ fragment length can Homozygous Aa be detected by 5’ 3’ Southern blotting. 3’ 5’ RFLP alleles are DNA bands codominant, which from genotype Aa means (as shown at the bottom) that DNA from the heterozygous 5’ 3’ Aa genotype yields 3’ 5’ each of the single Homozygous aa 5’ 3’ bands observed in 3’ 5’ DNA from homozy- DNA band gous AA and aa from genotype aa genotypes.
2.6 Types of DNA Markers Present in Genomic DNA 67 Origin of the Human Genetic Linkage Map
David Botstein,1 Raymond L. White,2 tors worldwide for genetic linkage studies. partially by a major locus segregating in Mark Skolnick,3 and Ronald W. Today the CEPH maintains a database on a pedigree to be mapped. Such a pro- Davis4 1980 the individuals in these pedigrees that com- cedure would not require any knowl- prises approximately 12,000 polymorphic edge of the biochemical nature of the 1Massachusetts Institute of Technology, DNA markers and 2.5 million genotypes. trait or of the nature of the alterations in Cambridge, Massachusetts the DNA responsible for the trait. . . . 2University of Massachusetts Medical No method of systematically mapping The most efficient procedure will be to Center, Worcester, Massachusetts human genes has been de- study a small set of 3University of Utah, Salt Lake City, Utah vised, largely because of the In principle, linked large pedigrees which 4Stanford University, Stanford, paucity of highly polymor- have been genotyped California marker loci can phic marker loci. The advent for all known polymor- Construction of a Genetic Linkage Map in allow one to of recombinant DNA tech- phic markers.... The Man Using Restriction Fragment Length nology has suggested a theo- establish, with high resolution of genetic Polymorphisms retically possible way to certainty, the and environmental define an arbitrarily large genotype of an components of dis- This historic paper stimulated a major in- number of arbitrarily poly- ease . . . must involve individual. ternational effort to establish a genetic morphic marker loci. . . . A unraveling the underly- linkage map of the human genome based subset of such polymorphisms can ing genetic predisposition, understand- on DNA polymorphisms. Pedigree studies readily be detected as differences in the ing the environmental contributions, using these genetic markers soon led to the length of DNA fragments after digestion and understanding the variability of ex- chromosomal localization and identifica- with DNA sequence-specific restriction pression of the phenotype. In principle, tion of mutant genes for hundreds of hu- endonucleases. These restriction frag- linked marker loci can allow one to man diseases. A more ambitious goal, still ment length polymorphisms (RFLPs) establish, with high certainty, the geno- only partly achieved, is to understand the can be easily assayed in individuals, fa- type of an individual and, consequently, genetic and environmental interactions in- cilitating large population studies. . . . assess much more precisely the con- volved in complex traits such as heart dis- [Genetic mapping] of many DNA tribution of modifying factors such as ease and cancer. The "small set of large marker loci should allow the es- secondary genes, likelihood of expres- pedigrees" called for in the excerpt was tablishment of a set of well-spaced, sion of the phenotype, and environment. soon established by the Centre d'Etude du highly polymorphic genetic markers Polymorphisme Humain (CEPH) in Paris, covering the entire human genome [and Source: American Journal of Human Genetics France, and made available to investiga- enabling] any trait caused wholly or 32: 314–331
contains sequences homologous to the codominant. In Figure 2.24, the bands probe DNA. The duplex molecule labeled from AA and aa have been shown as some- “allele a” lacks the middle restriction site what thicker than those from Aa, because and yields a larger band. In this situation each AA genotype has two copies of the A there can be three genotypes—AA, Aa, or allele and each aa genotype has two copies aa, depending on which alleles are present of the a allele, compared with only one in the homologous chromosomes—and all copy of each allele in the heterozygous three genotypes can be distinguished as genotype Aa. shown in the Figure 2.24. Homozygous AA yields only a small fragment, homozygous aa yields only a large fragment, and het- Random Amplified Polymorphic erozygous Aa yields both a small and a large DNA (RAPD) fragment. Because the presence of both the For studying DNA markers, one limitation A and a alleles can be detected in heterozy- of Southern blotting is that it requires mate- gous Aa genotypes, A and a are said to be rial (probes available in the form of cloned
68 Chapter 2 DNA Structure and DNA Manipulation DNA) and one limitation of PCR is that it re- often anneal to genomic DNA at multiple quires sequence information (so primer sites. Some primers anneal in the proper oligonucleotides can be synthesized). These orientation and at a suitable distance from are not severe handicaps for organisms that each other to support amplification of the are well studied (for example, human be- unknown sequence between them. Among ings, domesticated animals and cultivated the set of amplified fragments are ones that plants, and model genetic organisms such as can be amplified from some genomic DNA yeast, fruit fly, nematode, or mouse), be- samples but not from others, which means cause research materials and sequence in- that the presence or absence of the ampli- formation are readily available. But for the fied fragment is polymorphic in the popula- vast majority of organisms that biologists tion of organisms. study, there are neither research materials In most organisms it is usually straight- nor sequence information. Genetic analysis forward to identify a large number of can still be carried out in these organisms by RAPDs that can serve as genetic markers for using an approach called random ampli- many different kinds of genetic studies. An fied polymorphic DNA or RAPD (pro- example of RAPD gel analysis is illustrated nounced “rapid”), described in this section. in Figure 2.25, where three pairs of primers RAPD analysis makes use of a set of PCR (sets 1–3) are used to amplify genomic DNA primers of 8–10 nucleotides whose se- from four individuals in a population. The quence is essentially random. The random fragments that amplify are then separated primers are tried individually or in pairs in on an electrophoresis gel and visualized af- PCR reactions to amplify fragments of ter straining with ethidium bromide. Many genomic DNA from the organism of inter- amplified bands are typically observed for est. Because the primers are so short, they each primer set, but only some of these are
RAPD primer sets
Set 1 Set 2 Set 3 Individuals ABCDABCDABCD
RAPD bands on Figure 2.25 Random electrophoresis amplified polymorphic gel DNA (RAPD) is detected through the use of relatively short primer sequences that, by chance, match genomic DNA at Monomorphic multiple sites that are bands close enough together to support PCR amplification. Genomic DNA from a single individual typically yields many Polymorphic bands, only some of bands which are polymor- phic in the population. Different sets of Fragments that PCR-amplify with genomic DNA from some primers amplify differ- individuals but not others, using the same primer set ent fragments of genomic DNA.
2.6 Types of DNA Markers Present in Genomic DNA 69 polymorphic. These are indicated in Figure Figures 2.24 through 2.26 illustrate an 2.25 by the colored dots. The amplified important point: bands that are not polymorphic are said to In modern genetics, the phenotypes that are be monomorphic in the sample, which studied are very often bands in a gel rather means that they are the same from one in- than physical or physiological characteristics. dividual to the next. This example shows 17 RAPD polymorphisms. Figure 2.26 shows an Figure 2.25 offers a good example. Each actual RAPD gel amplified from genomic position at which a band is observed in one DNA obtained from small tissue samples or more samples is a phenotype, whether from a population of fish (Campostoma or not the band is polymorphic. For exam- anomalum) in the Great Miami River Basin, ple, primer set 1 yields a total of 19 bands, Ohio. The fish were collected as part of a of which 5 are polymorphic and 14 are water quality assessment program to deter- monomorphic in the sample. The pheno- mine whether fish populations in stressful types could be named in any convenient water environments progressively lose their way, such as by indicating the primer set genetic variation (that is, become increas- and the fragment length. For example, ingly monomorphic). Each pair of samples suppose that the smallest amplified frag- is flanked by a lane containing DNA size ment for primer set 1 is 125 bp, which is standards producing a “ladder” of fragments the polymorphic fragment at the bottom at 100-bp increments. left in Figure 2.25. We could name this fragment unambiguously as 1-125 because it is a fragment of 125 bp amplified by primer set 1. To understand why a DNA band is a phe- notype, rather than a gene or a genotype, it is useful to assign different names to the “alleles” that do or do not support amplifi- cation. (The word allele is in quotation marks again, because the 1-125 fragment that is amplified need not be part of a gene.) We are talking only about the 1-125 frag- 900 bp ment, so we could call the allele capable of supporting amplification the plus ( ) allele and the allele not capable of supporting am- plification the minus ( ) allele. Then there are three possible genotypes with regard to the amplified fragment: , , and 400 bp . Using genomic DNA from these geno- types, the homozygous and hetero- zygous will both support amplification of the 1-125 fragment, whereas the genotype will not support amplification. Hence the presence of the 1-125 fragment is the phenotype observed in both and Figure 2.26 RAPD polymorphisms in the genotypes. In other words, with stoneroller fish (Campostoma anomalum) trapped regard to amplification, the allele is in tributaries of the Great Miami River in Ohio. dominant to the allele, because the Each pair of samples is flanked by a lane phenotype (presence of the 1-125 band) is containing DNA size standards; in these lanes, the smallest DNA fragment is 100 base pairs present in both homozygous and het- Au: Are 400 and (bp), and each successively larger fragment erozygous genotypes. Therefore, on 900 base pairs cor- increases in size by 100 bp. Fragments whose the basis of the phenotype for the 1-125 sizes are multiples of 500 bp are present in rect? Or is it 100 band in Figure 2.25, we could say that indi- greater concentration and so yield darker bands. and 400 base viduals A and D could have either a or [Courtesy of Michael Simonich, Manju Garg, pairs? and Ana Braam (Pathology Associates genotype but that individuals B and C International, Cincinnati, Ohio).] must have genotype .
70 Chapter 2 DNA Structure and DNA Manipulation polymorphisms known as amplified frag- Amplified Fragment Length ment length polymorphisms, or AFLPs Polymorphisms (AFLPs) (usually pronounced by spelling it out). Because RAPD primers are small and may The first step (part A) is to digest genomic not match the template DNA perfectly, the DNA with a restriction enzyme; this exam- amplified DNA bands often differ a great ple uses the enzyme EcoRI, whose restric- deal in how dark or light they appear. This tion site is 5'-GAATTC-3'. Digestion yields a variation creates a potential problem, be- large number of restriction fragments cause some exceptionally dark bands may flanked by what remains of an EcoRI site on actually result from two amplified DNA each side. In the next step (part B), double- fragments of the same size, and some stranded oligonucleotides called primer exceptionally light bands may be difficult to adapters, with single-stranded overhangs visualize consistently. To obtain amplified complementary to those on the restriction fragments that yield more uniform band in- fragments, are ligated onto the restriction tensities, double-stranded oligonucleotide fragments using the enzyme DNA ligase. sequences that match the primer sequences The resulting fragments (C) are ready for perfectly can be attached to genomic amplification by means of PCR. Note that restriction fragments enzymatically prior to the same adapter is ligated onto each end, amplification. This method, which is out- so a single primer sequence will anneal to lined in Figure 2.27, yields a class of DNA both ends and support amplification.
(A) EcoRI site EcoRI site
5 ’–GAATTC GAATTC– 3 ’ 3 ’–CTTAAG CTTAAG– 5 ’
Cleavage
(B) 5’–NN…NN–3’ 5’–AATTC G–3’ 5’–AATTNN…NN–3’ 3’–NN…NNTTAA–5’ 3’–G CTTAA–5’ 3’–NN…NN–5’ Primer adapter Restriction fragment Primer adapter
Adapter ligation
(C) 5’–NN…NNAATTC GAATTNN…NN–3’ 3’–NN…NNTTAAG CTTAANN…NN–5’
Amplification Fraction of fragments (D) Primer sequence amplified 5’–NN…NNAATTC–3’ 3’–CTTAANN…NN–5’ All Nucleotide 5’–NN…NNAATTCA–3’ extensions reduce 3’–ACTTAANN…NN–5’ 1/16 5’–NN…NNAATTCAC–3’ the number of 3’–CACTTAANN…NN–5’ 1/256 amplifications. 5’–NN…NNAATTCACT–3’ 3’–TCACTTAANN…NN–5’ 1/4096
Figure 2.27 An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA is digested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide adaptors are ligated onto the fragments; note that the single-stranded overhang of the adaptors matches those of the genomic DNA fragments. (D) The resulting fragments are subjected to PCR using primers complementary to the adaptors. The number of amplified fragments can be adjusted by manipulat- ing the number of nucleotides in the adaptors that are also present in the primers.
2.6 Types of DNA Markers Present in Genomic DNA 71 09131-01-1747P
Photocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionpho- tocaptionphotocaptionphotocaptionphotocaption
There are nevertheless a number of choices concerning the primer sequence. A Simple Tandem Repeat primer that matches the adapters perfectly Polymorphisms (STRPs) will amplify all fragments, but this often re- One more type of DNA polymorphism war- sults in so many amplified fragments that rants consideration because it is useful in they are not well separated in the gel. A DNA typing for individual identification PCR primer must match perfectly at its 3' and for assessing the degree of genetic relat- end to be elongated. Thus, additional nu- edness between individuals. This type of cleotides added to the 3' end reduce the polymorphism is called a simple tandem number of amplified fragments, because repeat polymorphism (STRP) because these primers will amplify only those frag- the genetic differences among DNA mole- ments that, by chance, have a complemen- cules consist of the number of copies of a tary nucleotide immediately adjacent to the short DNA sequence that may be repeated EcoRI site. For example, the primer se- many times in tandem at a particular locus quence in the second row has a single- in the genome. STRPs that are present at nucleotide 3' extension; because only different loci may differ in the sequence 1 4 1 4 of the fragments would be ex- and length of the repeating unit, as well as pected to have a complementary T immedi- in the minimum and maximum number of ately adjacent to the EcoRI site on both tandem copies that occur in DNA molecules sides, this primer is expected to amplify in the population. A STRP with a repeating 1 16 of all the restriction fragments. Simi- unit of 2–9 bp is often called a microsatel- larly, the primer sequence in the third row lite or a simple sequence length poly- has a two-nucleotide 3' extension, so this morphism (SSLP), whereas a STRP with a primer is expected to amplify 1 16 repeating unit of 10–60 bp is often called a 1 16 1 256 of all the fragments. One ap- minisatellite or a variable number of plication of AFLP analysis is to organisms tandem repeats (VNTR). with large genomes, such as grasshoppers Figure 2.28 shows an example of a STRP and crickets, for which RAPD analysis with a copy number ranging from 1 would yield an excessive number of ampli- through 10. Because the number of copies fied bands. How large is a “large” genome? determines the size of any restriction frag- Compared to the human genome, that of ment that includes the STRP, each DNA the brown mountain grasshopper Podisma molecule yields a single-size restriction pedestris is 7 times larger, and those of North fragment depending on the number of American salamanders in the genus Amphi- copies it contains. The STRP in Figure 2.28 uma are 70 times larger! (Genome size is has 10 different “alleles” (again, we use discussed in Chapter 8.) quotation marks because the STRP may not
72 Chapter 2 DNA Structure and DNA Manipulation Au: Proofreader asks that youconfirm refer- ence to Chapter 8 in 1st para, 1st column. Direction of current Positions of cleavage sites Larger DNA Smaller DNA fragments fragments 1 Duplex DNA 5’ 3’ molecules 3’ 5’
12 Position 5’ 3’ of band in 3’ 5’ DNA gel
1 2 3 5’ 3’ 3’ 5’
1 2 3 4 5’ 3’ 3’ 5’
1 2 3 4 5 5’ 3’ 3’ 5’
1 2 3 4 5 6 5’ 3’ 3’ 5’
1 2 3 4 5 6 7 5’ 3’ 3’ 5’
1 2 3 4 5 6 7 8 5’ 3’ 3’ 5’
1 2 3 4 5 6 7 8 9 5’ 3’ 3’ 5’
1 2 3 4 5 6 7 8 910 5’ 3’ 3’ 5’
Tandem repeats of a DNA sequence
Figure 2.28 In a simple tandem repeat polymorphism (STRP), the alleles in a population differ in the number of copies of a short sequence (typically 2–60 bp) that is repeated in tandem along the DNA molecule. This example shows alleles in which the repeat number varies from 1 to 10. Cleavage at restriction sites flanking the STRP yields a unique fragment length for each allele. The alleles can also be distinguished by the size of the fragment amplified by PCR using primers that flank the STRP.
be in a gene), which could be distinguished individual genotype can carry at most two either by Southern blotting using a probe to different alleles. Nevertheless, a large a unique (nonrepeating) sequence within number of alleles means an even larger the restriction fragment or by PCR amplifi- number of genotypes, which is the feature cation using primers to a unique sequence that gives STRPs their utility in individual on either side of the tandem repeats. In this identification. For example, even with only situation the locus is said to have multiple 10 alleles in a population of organisms, alleles in the population. Even with multi- there could be 10 different homozygous ple alleles, however, any one chromosome genotypes and 45 different heterozygous can carry only one of the alleles, and any genotypes.
2.6 Types of DNA Markers Present in Genomic DNA 73 More generally, with n alleles there are other hand, among women who are not n homozygous genotypes and n(n 1) 2 carriers, the lifetime risk of breast cancer is heterozygous genotypes, or n(n 1) 2dif- about 12 percent, and hence many women ferent genotypes altogether. With STRPs, without the genetic risk factor do develop not only are there a relatively large number breast cancer. Indeed, BRCA1 mutations are of alleles, but no one allele is exceptionally found in only 16 percent of affected women common, so each of the many genotypes in who have a family history of breast cancer. the population has a relatively low fre- The importance of a genetic risk factor can quency. If the genotypes at 6–8 STRP loci be expressed quantitatively as the relative are considered simultaneously, then each risk, which equals the risk of the disease in possible multiple-locus genotype is exceed- persons who carry the risk factor as com- ingly rare. Because of their high degree of pared to the risk in persons who do not. variation among people, STRPs are widely The relative risk for BRCA1 equals 3.0 (cal- used in DNA typing (sometimes called culated as 36 percent 12 percent). DNA fingerprinting) to establish indi- The utility of DNA polymorphisms in lo- vidual identity for use in criminal investi- cating and identifying disease genes results gations, parentage determinations, and so from genetic linkage, the tendency for forth (Chapter 17). genes that are sufficiently close together in a chromosome to be inherited together. Ge- netic linkage will be discussed in detail in Chapter 5, but the key concepts are sum- 2.7 marized in Figure 2.29, which shows the lo- Applications of cation of many DNA polymorphisms along DNA Markers a chromosome that also carries a genetic Why are geneticists interested in DNA risk factor denoted D (for disease gene). markers (DNA polymorphisms)? Their in- Each DNA polymorphism serves as a ge- terest can be justified on any number of netic marker for its own location in the grounds. In this section we consider the chromosome. The importance of genetic reasons most often cited. linkage is that DNA markers that are suffi- ciently close to the disease gene will tend to be inherited together with the disease gene in pedigrees—and the closer the markers, Genetic Markers, Genetic Mapping, the stronger this association. Hence, the ini- and “Disease Genes” tial approach to the identificationofadis- Perhaps the key goal in studying DNA poly- ease gene is to find DNA markers that are morphisms in human genetics is to identify genetically linked with the disease gene in the chromosomal location of mutant genes order to identify its chromosomal location, associated with hereditary diseases. In the a procedure known as genetic mapping. context of disorders caused by the interac- Once the chromosomal position is known, tion of multiple genetic and environmental other methods can be used to pinpoint the factors, such as heart disease, cancer, dia- disease gene itself and to study its functions. betes, depression, and so forth, it is impor- If genetic linkage seems a roundabout tant to think of a harmful allele as a risk way to identify disease genes, consider the factor for the disease, which increases the alternative. The human genome contains probability of occurrence of the disease, approximately 80,000 genes. If genetic rather than as a sole causative agent. This linkage did not exist, then we would have needs to be emphasized, especially because to examine 80,000 DNA polymorphisms, genetic risk factors are often called disease one in each gene, in order to identify a dis- genes. For example, the major “disease ease gene. But the human genome has only gene” for breast cancer in women is the 23 pairs of chromosomes, and because of gene BRCA1. For women who carry a mu- genetic linkage and the power of genetic tant allele of BRCA1, the lifetime risk of mapping, it actually requires only a few breast cancer is about 36 percent, and hundred DNA polymorphisms to identify hence most women with this genetic risk the chromosome and approximate location factor do not develop breast cancer. On the of a genetic risk factor.
74 Chapter 2 DNA Structure and DNA Manipulation Locus of a “disease gene” (a genetic risk factor), D DNA markers that are too far from the DNA markers that are close enough disease gene in the chromosome (or are in a to a disease gene tend to be inherited different chromosome) are not linked to the together (genetically linked) with the disease gene. They do not tend to be disease gene. inherited with the disease gene in pedigrees.
D
DNA polymorphisms The closer a marker is to the disease gene, (genetic markers) along the closer the linkage and the more the chromosomes likely it is that they will be inherited together.
Figure 2.29 Concepts in genetic localization of genetic risk factors for disease. Polymorphic DNA markers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the chromosome tend to be inherited together with the disease itself. The genomic location of the risk factor is deter- mined by examining the known genomic locations of the DNA polymorphisms that are linked with it.
Improvement of domesticated plants Other Uses for DNA Markers and animals. Plant and animal breeders DNA polymorphisms are widely used in all have turned to DNA polymorphisms as aspects of modern genetics because they genetic markers in pedigree studies to iden- provide a large number of easily accessed tify, by genetic mapping, genes that are as- genetic markers for genetic mapping and sociated with favorable traits in order to other purposes. Among the other uses of incorporate these genes into currently used DNA polymorphisms are the following. varieties of plants and breeds of animals. Individual identification. We have al- History of domestication. Plant and ani- ready mentioned that DNA polymorphisms mal breeders also study genetic polymor- have application as a means of DNA typing phisms to identify the wild ancestors of (DNA fingerprinting) to identify different cultivated plants and domesticated animals, individuals in a population. DNA typing in as well as to infer the practices of artificial other organisms is used to determine indi- selection that led to genetic changes in vidual animals in endangered species and these species during domestication. to identify the degree of genetic relatedness DNA polymorphisms as ecological in- among individual organisms that live in dicators. DNA polymorphisms are being packs or herds. For example, DNA typing in evaluated as biological indicators of genetic wild horses has shown that the wild stallion diversity in key indicator species present in in charge of a harem of mares actually sires biological communities exposed to chemi- fewer than one-third of the foals. cal, biological, or physical stress. They are Epidemiology and food safety science. also used to monitor genetic diversity in DNA typing also has important applications endangered species and species bred in in tracking the spread of viral and bacterial captivity. epidemic diseases, as well as in identifying Evolutionary genetics. DNA polymor- the source of contamination in contami- phisms are studied in an effort to describe nated foods. the patterns in which different types of Human population history. DNA poly- genetic variation occur throughout the morphisms are widely used in anthropology genome, to infer the evolutionary mecha- to reconstruct the evolutionary origin, nisms by which genetic variation is main- global expansion, and diversification of the tained, and to illuminate the processes by human population. which genetic polymorphisms within
2.7 Applications of DNA Markers 75 species become transformed into genetic population history, patterns of migration, differences between species. and so forth. Population studies. Population ecolo- Evolutionary relationships among spe- gists employ DNA polymorphisms to assess cies. Differences in homologous DNA se- the level of genetic variation in diverse quences between species is the basis of populations of organisms that differ in ge- molecular systematics, in which the se- netic organization (prokaryotes, eukary- quences are analyzed to determine the otes, organelles), population size, breeding ancestral history (phylogeny) of the species structure, or life-history characters, and and to trace the origin of morphological, they use genetic polymorphisms within behavioral, and other types of adaptations subpopulations of a species as indicators of that have arisen in the course of evolution.
Chapter Summary The sequence of bases in the human genome is 99.9% stable duplexes with whatever fragments contain suffi- identical from one person to the next. The remaining ciently complementary base sequences, and the positions 0.1%—comprising 3 million base pairs—differs among in- of these duplexes can be determined by exposing the fil- dividuals. Included in these differences are many muta- ter to x-ray film on which radioactive emission (or, in tions that cause or increase the risk of disease, but the some procedures, light emission) produces an image of majority of the differences are harmless in themselves. the band. Particular DNA sequences can also be amplified Any of these differences between genomes can be used as without cloning by means of the polymerase chain reac- a genetic marker. Genetic markers are widely employed in tion (PCR), in which short, synthetic oligonucleotides are genetics to serve as positional landmarks along a chromo- used as primers to replicate repeatedly and amplify the some or to identify particular cloned DNA fragments. The sequence between them. The primers must flank, and manipulation of DNA molecules to identify genetic mark- have their 3' ends oriented toward, the region to be am- ers is the basic experimental operation in modern genetics. plified, because DNA polymerase can elongate the A DNA strand is a polymer of deoxyribonucleotides, primers only by the addition of successive nucleotides to each composed of a nitrogenous base, a deoxyribose the 3' end of the growing chain. Each round of PCR am- sugar, and a phosphate. Sugars and phosphates alternate plification results in a doubling of the number of ampli- in forming a polynucleotide chain with one terminal 3'- fied fragments. OH group and one terminal 5'-P group. In double- Most genes are present in pairs in the nonreproduc- stranded (duplex) DNA, the two strands are paired and tive cells of most animals and higher plants. One member antiparallel. Each end of the double helix carries a termi- of each gene pair is in the chromosome inherited from the nal 3'-OH group in one strand and a terminal 5'-P group maternal parent, and the other member of the gene pair is in the other strand. The four bases found in DNA are the at a corresponding location (locus) in the homologous purines, adenine (A) and guanine (G), and the pyrim- chromosome inherited from the paternal parent. A gene idines, cytosine (C) and thymine (T). Equal numbers of can have different forms that correspond to differences in purines and pyrimidines are found in double-stranded DNA sequence. The different forms of a gene are called al- DNA (Chargaff’s rules), because the bases are paired as leles. The particular combination of alleles present in an A T pairs and G C pairs. The hydrogen-bonded base organism constitutes its genotype. The observable charac- pairs, along with hydrophobic base stacking of the nu- teristics of the organism constitute its phenotype. In an cleotide pairs in the core of the double helix, hold the two organism, if the two alleles of a gene pair are the same polynucleotide strands together in a double helix. (for example, AA or aa), then the genotype is homozy- Duplex DNA can be cleaved into fragments of defined gous for the A or a allele; if the alleles are different (Aa), length by restriction enzymes, each of which cleaves DNA then the genotype is heterozygous. Even though each at a specific recognition sequence (restriction site) usually genotype can include at most two alleles, multiple alleles four or six nucleotide pairs in length. These fragments are often encountered among the individuals in natural can be separated by electrophoresis. The positions of par- populations. ticular restriction fragments in a gel can be visualized by DNA polymorphisms (DNA markers) are common in means of nucleic acid hybridization, in which strands of natural populations of most organisms. Among the most duplex DNA that have been separated (denatured) by widely used DNA polymorphisms are single-nucleotide heating are mixed and come together (renature) with polymorphisms (SNPs), restriction fragment length poly- strands having complementary nucleotide sequences. In morphisms (detected by Southern blots), and such PCR- a Southern blot, denatured and labeled probe DNA is based polymorphisms as random amplified polymorphic mixed with denatured DNA made up of restriction frag- DNA (RAPD), amplified fragment length polymorphisms ments that have been transferred to a filter membrane af- (AFLPs), and simple tandem repeat polymorphisms ter electrophoresis. The probe DNA anneals and forms (STRPs). DNA polymorphisms are used in genetic map-
76 Chapter 2 DNA Structure and DNA Manipulation ping studies to identify DNA markers that are genetically man population history, and improving cultivated plants linked to disease genes (genetic risk factors) in the chro- and domesticated animals, as well as for the genetic mosome in order to pinpoint their location. They are also monitoring of endangered species and for many other used in DNA typing for identifying individuals, tracking purposes. the course of virus and bacterial epidemics, studying hu-
Key Terms allele genomic DNA probe amplification genotype purine amplified fragment length heterozygous pyrimidine polymorphism (AFLP) homologous chromosomes random amplified polymorphic antiparallel homozygous DNA (RAPD) band hydrogen bond relative risk base composition hydrophobic interaction renaturation base pairing kilobase (kb) restriction endonuclease base stacking locus restriction enzyme B form of DNA major groove restriction fragment blunt end microsatellite restriction fragment length polymor- Chargaff’s rules minisatellite phism (RFLP) chromosome minor groove restriction map codominant monomorphic restriction site denaturation multiple alleles risk factor denatured DNA nucleic acid hybridization simple sequence length polymor- phism (SSLP) disease gene nucleoside simple tandem repeat polymorphism DNA cloning nucleotide (STRP) DNA fingerprinting oligonucleotide DNA marker single-nucleotide polymorphism palindrome (SNP) DNA polymerase percent G C Southern blot DNA polymorphism phenotype sticky end DNA typing phosphodiester bond thermophile dominant polarity 3'-OH (hydroxyl) group 5'-P (phosphate) group polymerase chain reaction variable number of tandem repeats gel electrophoresis (PCR) (VNTR). gene polynucleotide chain Z form of DNA genetic linkage primer genetic mapping primer adapter genetic marker
Review the Basics • What four bases are commonly found in the nu- • Describe how a Southern blot is carried out. Explain cleotides in DNA? Which form base pairs? what it used for. What is the role of the probe? • Which chemical groups are present at the extreme 3’ • How does the polymerase chain reaction work? What and 5’ ends of a single polynucleotide strand? is it used for? What information about the target se- quence must be known in advance? What is the role • What does it mean to say that a single strand of DNA of the oligonucleotide primers? strand has a polarity? What does it mean to say that the DNA strands in a duplex molecule are anti- • What is a DNA marker? Explain how harmless DNA parallel? markers can serve as aids in identifying disease genes through genetic mapping. • What are restriction enzymes and why are they im- portant in the study of particular DNA fragments? • Define and given an example of each of the following What does it mean to say that most restriction sites key genetic terms: locus, allele, genotype, hetero- are palindromes? zygous, homozygous, phenotype.
Review the Basics 77 GeNETics on the Web will introduce you to some of the most contributing to the stability of double-stranded DNA is the important sites for finding genetic information on the Inter- stacking of the base pairs on top of one another as a result of net. To explore these sites, visit the Jones and Bartlett home hydrophobic interactions. For further discussion of this fea- page at ture of DNA, and much else of interest regarding the discov- ery and analysis of this critical biological macromolecule, http://www.jbpub.com/genetics consult the keyword site. For the book Genetics: Analysis of Genes and Genomes, choose • The concept of the polymerase chain reaction (PCR)oc- the link that says Enter GeNETics on the Web. You will be pre- curred to Kary Mullis one night while cruising on Route 128 sented with a chapter-by -chapter list of highlighted keywords. from San Francisco to Mendocino. He immediately realized Select any highlighted keyword and you will be linked to a Web that this approach would be unique in its ability to amplify, at site containing genetic information related to the keyword. an exponential rate, a specific nucleotide sequence present • DNA is like Coca-Cola? According to this keyword site, it in a vanishingly small quantity amid a much larger back- is. It contains sugar, which is highly soluble in water; phos- ground of total nucleic acid. Once its feasibility was demon- phate groups, which are of moderate solubility; and bases, strated, PCR was quickly recognized as a major technical which have extremely low solubility. (The base in the soft drink advance in molecular biology. The new technique earned is caffeine, which is chemically similar to adenine and can Mullis the 1993 Nobel Prize in chemistry, and today it is the sometimes be incorporated into DNA, causing a mutation.) basis of a large number of experimental and diagnostic pro- As this keyword site emphasizes, the most important property cedures. At this keyword site you can learn more about the
Guide to Problem Solving Problem 1 Distinguish between base pairing and base How many possible restriction maps are compatible with stacking in double-stranded DNA. these data? For each possible restriction map, make a dia- gram of the circular molecule and indicate the relative po- Answer Base pairing is the hydrogen bonding between sitions of the EcoRI and HindIII restriction sites. corresponding bases in opposite strands of duplex DNA; A (adenine) is paired with T (thymine), and G (guanine) is Answer Because the single-enzyme digests give two paired with C (cytosine). Base stacking refers to the bands each, there must be two restriction sites for each hydrophobic (water-hating) interaction between consec- enzyme in the molecule. Furthermore, because digestion utive base pairs along a DNA duplex, which promotes the with HindIII makes both the 6-kb and the 14-kb restric- formation of a “stack” of base pairs with the sugar–phos- tion fragments disappear, each of these fragments must phate backbones of the strands running along outside. contain one HindIII site. Considering the sizes of the frag- ments in the double digest, the 6-kb EcoRI fragment must Problem 2 The restriction enzyme EcoRI cleaves double- be cleaved into 2-kb and 4-kb fragments, and the 14-kb stranded DNA at the sequence 5'-GAATTC-3', and the re- EcoRI fragment must be cleaved into 5-kb and 9-kb frag- striction enzyme HindIII cleaves at 5'-AAGCTT-3'. A ments. Two restriction maps are compatible with the 20-kilobase (kb) circular plasmid is digested with each data, depending on which end of the 6-kb EcoRI frag- enzyme individually and then in combination, and the ment the HindIII site is nearest. The position of the re- resulting fragment sizes are determined by means of elec- maining HindIII site is determined by the fact that the trophoresis. The results are as follows: 2-kb and 5-kb fragments in the double digest must be ad- EcoRI alone fragments of 6 kb and 14 kb jacent in the intact molecule in order for a 13-kb frag- HindIII alone fragments of 7 kb and 13 kb ment to be produced by HindIII digestion alone. The EcoRI and HindIII fragments of 2 kb, 4 kb, 5 kb and 9 kb accompanying figure shows the relative positions of the
EcoRI EcoRI EcoRI (A) (B) HindIII (C)
2 6 5 4 9 HindIII HindIII 4 14 2 9 5 EcoRI EcoRI EcoRI HindIII
78 Chapter 2 DNA Structure and DNA Manipulation development of PCR from Mullis’s original conception, in- • The Pic Site showcases some of the most visually appeal- cluding two major innovations that were necessary to perfect ing genetics sites on the World Wide Web. To visit the genetics the process. Web site pictured below, select the PIC Site for Chapter 2.
• Human beings rely on plants for food, shelter, and medi- cines. Although at least 5000 species are cultivated, modern agricultural research emphasizes a few widely cultivated crops while largely ignoring plants such as Bambara groundnut (Vi- gna subterranea), breadfruit (Artocarpus altilis), carob (Cerato- nia siliqua), coriander (Coriandrum sativum), emmer wheat (Triticum dicoccum), oca (Oxalis tuberosa), and ulluco (Ullucus tuberosus). To learn more about these minor crops and the use of molecular markers for characterizing and preserving their genetic diversity, consult this keyword site.
• The Mutable Site changes frequently. Each new update in- cludes a different site that highlights genetics resources avail- able on the World Wide Web. Select the Mutable Site for Chapter 2 and you will be linked automatically.
EcoRI sites (part A). Parts B and C are the two possible re- accompanying gel diagram, indicate the genotypes across striction maps, which differ according to whether the the top and the phenotype (band position or positions) EcoRI site at the top generates the 2-kb or the 4-kb frag- expected for each genotype. The scale on the right shows ment in the double digest. the expected positions of fragments ranging in size from 1 to 12 kb. Problem 3 The accompanying diagram shows the positions of restriction sites (tick marks) for a particular restriction Answer After cleavage with the restriction enzyme, the enzyme that can be present in the DNA at a locus in a hu- A1-type DNA yields a 3-kb fragment that binds with the man chromosome. The DNA present in any particular probe (yielding a 3-kb band) and a 9-kb fragment that chromosome may be that shown at the top or that shown does not bind with the probe (yielding no visible band), at the bottom. A probe DNA binds to the fragments at the whereas the A2-type DNA yields a 12-kb fragment that position shown by the rectangle. With respect to an RFLP binds with the probe (yielding a 12-kb band). A particu- based on these fragments, three genotypes are possible. lar chromosome may carry allele A1 or allele A2. Because What are they? Use the symbol A1 to refer to the allele individuals have two copies of each chromoosme (ex- that yields the upper DNA fragment, and use A2 to refer cept for the sex chromosomes), any individual may to the allele that yields the lower DNA fragment. In the carry A1A1, A1A2,orA2A2. DNA from homozygous A1A1 genotypes yields a 3-kb band, that from heterozygous A A genotypes yields both a 3-kb and a 12-kb band, 3 kb 9 kb 1 2 and that from homozygous A A genotypes yields a 12- A1 2 2 kb band. The expected phenotypes are illustrated here. 12 kb A2
A1A1 A1A2 A2A2 12 kb 12 kb 9kb 9kb 6kb 6kb
3kb 3kb
1kb 1kb
Guide to Problem Solving 79 5'-GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC-3' 3'-CATGCCCGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGCAG-5' Problem 4
Problem 4 A geneticist plans to use the polymerase chain one that is elongated in a left-to-right direction) should reaction (PCR) to amplify part of the DNA sequence have the sequence 5'-GCAATG-3' and the “reverse shown below, using oligonucleotide primers that are primer” (the one that is elongated in a right-to-left direc- hexamers matching the regions shown in red. (In prac- tion) should have the sequence 3'-TTCGGC-5'. The re- tice, hexamers are too short for most purposes.) State the sulting amplified sequence is shown below. sequence of the primer oligonucleotides that should be 5'-GCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCG-3' used, including the polarity, and give the sequence of the DNA molecule that results from amplification. 3'-CGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGC-5'
Answer The primers must be able to base-pair with the chosen primer sites and must be oriented with their 3' ends facing one another. Thus the “forward primer” (the AU/ED: Space in art for prob 2.6 reduced slightly to make page.OK?
Analysis and Applications 2.1 Many restriction enzymes produce restriction frag- 2.6 The linear DNA fragment shown here has cleavage ments that have “sticky ends.” What does this mean? sites for BamHI (B) and EcoRI (E). In the accompanying diagram of an electrophoresis gel, indicate the positions at 2.2 Which of the following sequences are palindromes which bands would be found after digestion with: and which are not? Explain your answer. Symbols such (a) BamHI alone as (A T) mean that the site may be occupied by (in this (b) EcoRI alone case) either A or T, and N stands for any nucleotide. (c) BamHI and EcoRI together (a) 5'-AATT-3' The dashed lines on the right indicate the positions to (b) 5'-AAAA-3' which bands of 1–12 kb would migrate. (c) 5'-AANTT-3' (d) 5'-AA(A T)AA-3' BEB E (e) 5'-AA(G C)TT-3'
02468 10 12 kb 2.3 The following list gives half of each of a set of palin- dromic restriction sites. What is the complete sequence of each restriction site? (N stands for any nucleotide.) BamHI (a) 5'-AA??-3' + (b) 5'-ATG???-3' BamHI EcoRI EcoRI (c) 5’'-GGN??-3' 12 kb (d) 5'-ATNN??-3' 9kb
2.4 Apart from the base sequence, what is different about 6kb the ends of restriction fragments produced by the follow- ing restriction enzymes? (The downward arrow repre- 3kb sents the site of cleavage in each strand.) (a) HaeIII (5'-GG j CC-3') (b) MaeI (5'-C j TAG-3') 2.7 The circular DNA molecule shown at the top of page (c) CfoI (5'-GCG j C-3') 81 has cleavage sites for BamHI and EcoRI. In the accom- (a) panying diagram of an electrophoresis gel, indicate the 2.5 A solution contains double- positions at which bands would be found after digestion stranded DNA fragments of size 3 kb, (b) with: 6 kb, 9 kb, and 12 kb. They are sepa- (a) BamHI alone rated in an electrophoresis gel. In (c) (b) EcoRI alone the diagram of the gel at the right, (c) BamHI and EcoRI together match the fragment sizes with the (d) The dashed lines on the right indicate the positions to correct bands. which bands of 1–12 kb would migrate.
80 Chapter 2 DNA Structure and DNA Manipulation 0 kb (a) A restriction enzyme with a 4-base cleavage site? 91AU/ED: Space in art for BamHI prob 2.7 reduced slightly (b) A restriction enzyme with a 6-base cleavage site? to make page.OK? (c) A restriction enzyme with an 8-base cleavage site? 8 EcoRI 2 2.11 In a random sequence consisting of equal proportions of all four nucleotides, what is the average distance be- BamHI 7 3 tween restriction sites for: (a) A restriction enzyme with a 4-base cleavage site? EcoRI 6 4 (b) A restriction enzyme with a 6-base cleavage site? 5 kb (c) A restriction enzyme with an 8-base cleavage site?
BamHI 2.12 If human DNA were essentially a random sequence + of 3 109 bp with equal proportions of all four nu- BamHI EcoRI EcoRI cleotides (this is an oversimplification), approximately 12 kb how many restriction fragments would be expected from 9 kb cleavage with 6kb (a) A “4-cutter” restriction enzyme? (b) A “6-cutter” restriction enzyme? 3kb (c) An “8-cutter” restriction enzyme?
1kb 2.13 Consider the restriction enzymes BamHI (cleavage site 5'-G j GATCC-3') and Sau3A (cleavage site 5'-j GATC-3'), where the downward arrow denotes the 2.8 Consider the accompanying diagram of a region of site of cleavage in each strand. Is every BamHI site a duplex DNA, in which the B’s represent bases in Watson– Sau3A site? Is every Sau3A site a BamHI site? Explain Crick pairs. Specify as precisely as possible the identity of: your answer. (a) B5, assuming that B1 A (b) B , assuming that B C 6 2 2.14 A DNA duplex with the sequence shown here is (c) B , assuming that B purine 7 3 cleaved with BamHI (cleavage site 5'-G j GATCC-3'), (d) B , assuming that B A or T 8 4 where the arrow denotes the site of cleavage in each strand. If the resulting fragments were brought together OH in the right order, and the breaks in the backbones re- paired, what possible DNA duplexes would be expected?
P 5'-ATTGGATCCAAACCCCAAAGGATCCTTA-3' 1 P B B 1 5 3'-TAACCTAGGTTTGGGGTTTCCTAGGAAT-5'
P 2.15 The restriction enzymes PstI, PvuII, and MluI have 2 P B2 B6 the following restriction sites, where the arrow indicates the site of cleavage in each strand. PstI5'-CTGCA j G-3' 3 P P B3 B7 PvuII5'-CAG j CTG-3' MluI5'-A j CGCGT-3' AU/ED: Art for prob 2.8 scaled P A DNA duplex with the sequence above is digested. What 4 P B to 85%B to make page. OK? 4 8 fragments would result from cleavage with: 2.9 Refer to the DNA molecule diagrammed in Problem (a) PstI? 2.8. In the precursor nucleotides of this molecule, which (b) PvuII? base was each of the phosphate groups 1–4 associated (c) MluI? with? 2.16 With regard to the restriction enzymes and the DNA 2.10 In a random sequence consisting of equal propor- duplex in Problem 2.15, what fragments would result tions of all four nucleotides, what is the probability that a from digestion with particular short sequence of nucleotides matches a re- (a) PstI and MluI? striction site for: (b) PvuII and MluI?
Problems 2.15, 2.16 5'-ATGCCCTGCAGTACCATGACGCGTTACGCAGCTGATCGAAACGCGTATATATGCC-3' 3'-TACGGGACGTCATGGTACTGCGCAATGCGTCGACTAGCTTTGCGCATATATACGG-5'
Analysis and Applications 81 2.17 Consider the sequence: ment.) In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band posi- 5'-CTGCAGGTG-3' tion or positions) expected for each genotype. (The scale 3'-GACGTCCAC-5' on the right shows the expected positions of fragments If this sequence were cleaved with PstI (5'-CTGCA j G-3'), could from 1 to 12 kb.) it still be cleaved with PvuII (5'-CAG j CTG-3')? If it were cleaved with PvuII, could it still be cleaved with PstI? Ex- 4 kb 2 kb A plain your answer. 1 6 kb 2.18 A circular DNA molecule is cleaved with BamHI, A2 EcoRI, or the two restriction enzymes together. The ac- companying diagram shows the resulting electrophoresis gel, with the band sizes indicated. Draw a diagram of the 12 kb 9kb circular DNA, showing the relative positions of the BamHI and EcoRI sites. 6kb
BamHI + 3kb BamHI EcoRI EcoRI 1kb 10 kb 7kb 2.21 The accompanying diagram shows the DNA frag- ments associated with an RFLP revealed by a probe that 3kb hybridizes where shown by the rectangle. The tick marks are cleavage sites for the restriction enzyme used in the RFLP analysis. How many alleles does this RFLP have? 2.19 In the diagrams of DNA fragments shown here, the How many genotypes are possible? In the accompanying tick marks indicate the positions of restriction sites for a gel diagram, indicate the phenotype (pattern of bands) particular restriction enzyme. A mixture of the two types expected of each genotype. (The scale on the right shows of molecules is digested and analyzed with a Southern the expected positions of fragments from 1 to 12 kb.) blot using either probe A or probe B, which hybridizes to the fragments where shown by the rectangles. In the ac- 4 kb 8 kb companying gel diagram, indicate the bands that would result from the use of each of these probes. (The scale on 12 kb the right shows the expected positions of fragments from 1 to 12 kb.)
5 kb 7 kb 12 kb 9kb 12 kb 6kb
Probe A Probe B 3kb Probe A Probe B 12 kb 1kb 9kb 6kb 2.22 The thick horizontal lines shown below represent al- ternative DNA molecules at a particular locus in a human 3kb chromosome. The tick marks indicate the positions of re- striction sites for a particular restriction enzyme. Ge- 1kb nomic DNA from a sample of people is digested and analyzed by a Southern blot using a probe DNA that hy- 2.20 In the accompanying diagram, the tick marks indi- bridizes at the position shown by the rectangle. How cate the positions of restriction sites in two alternative many possible RFLP alleles would be observed in the sam- DNA fragments that can be present at the A locus in a hu- ple? How many genotypes? man chromosome. An RFLP analysis is carried out, using 3 kb3 kb 2 kb 4 kb probe DNA that binds to the fragments at the position shown by the rectangle. With respect to this RFLP, how 3 kb5 kb 4 kb many genotypes are possible? (Use the symbol A1 to refer to the allele that yields the upper DNA fragment, and use 6 kb2 kb 4 kb A2 to refer to the allele that yields the lower DNA frag-
82 Chapter 2 DNA Structure and DNA Manipulation 2.23 The RFLPs described in Problem 2.22 are analyzed ABCD with the same restriction enzyme but a different probe, 12 which hybridizes at the site indicated here by the rectan- gle. How many RFLP alleles would be found? How many 10 genotypes? (Use the symbols A1, A2, . . . to indicate the al- leles.) In the accompanying gel diagram, indicate the Band 8 genotypes across the top and the phenotype (band posi- number tion or positions) expected for each genotype. (The scale 6 on the right shows the expected positions of fragments 4 from 1 to 12 kb.) 2 3 kb3 kb 2 kb 4 kb
3 kb5 kb 4 kb 2.28 A cigarette butt found at the scene of a robbery is 6 kb2 kb 4 kb found to have a sufficient number of epithelial cells stuck to the paper for the DNA to be extracted and typed. Shown below are the results of typing for three probes (locus 1, locus 2, and locus 3) of the evidence (X) and 7 12 kb suspects (A through G). Which of the suspects can be ex- 9kb cluded? Which cannot be excluded? Can you identify the 6kb robber? Explain your reasoning.
3kb ABCDEFGX
1kb
Locus 1 2.24 If hexamers were long enough oligonucleotides to serve as specific primers for PCR (for most purposes they are too short), what DNA fragment would be amplified using the “forward” primer pair 5'-AATGCC-3' and the “reverse” primer 3'-GCATGT-5' on the double-stranded DNA molecule shown below? ABCDEFGX 2.25 Would the primer pairs 3'-AATGCC-5' and 5'-GCATGT-3' amplify the same fragment described in the previous problem? Explain your answer. Locus 2 2.26 A human DNA fragment of 3 kb is to be amplified by PCR. The total genome size is 3 109 bp. (a) Prior to amplification, what fraction of the total DNA does the target sequence constitute? (b) What fraction does it constitute after 10 cycles of PCR? ABCDEFGX (c) After 20 cycles of PCR? (d) After 30 cycles of PCR?
2.27 RAPD analysis is carried out using genomic DNA Locus 3 from four individuals (A–D) sampled from a natural pop- ulation of Hawaiian crickets. The gel shown at the top of the page resulted from PCR with one of the primer pairs tested. Which bands are the RAPD polymorphisms?
5'-GATTACCGGTAAATGCCGGATTAACCCGGGTTATCAGGCCACGTACAACTGGAGTCC-3' 3'-CTAATGGCCATTTACGGCCTAATTGGGCCCAATAGTCCGGTGCATGTTGACCTCAGG-5' Problem 2.24
Analysis and Applications 83 2.29 A woman is uncertain which of two men is the fa- 2.30 Snake venom phosphodiesterase cleaves the chemi- ther of her child. DNA typing is carried out on blood from cal bonds shown in red in the accompanying diagram, the child (C), the mother (M), and each of the two males leaving mononucleotides that are phosphorylated in the (A and B), using probes for a highly polymorphic DNA 3' position. If the phosphates numbered 2 and 4 are ra- marker on two different chromosomes (“locus 1” and dioactive, which mononucleotides will be radioactive af- “locus 2”). The result is shown in the accompanying dia- ter cleavage with snake venom phosphodiesterase? gram. Can either male be excluded as the possible father? Explain your reasoning.
1 P B B P ABMC ABMC 1 5
2 P P B2 B6
3 P P B3 B7
4 P P B4 B8
Locus 1 Locus 2 HO
AU/ED: Art for prob 2.8 scaled to 85% for consistency. OK?
Challenge Problems
2.31 The genome of Drosophila melanogaster is 180 ABCDE ABCDE 106 bp, and a fragment of size 1.8 kb is to be amplified by (&(&(&(&(& XX(&(&(&(&(& PCR. How many cycles of PCR are necessary for the am- plified target sequence to constitute at least 99 percent of the total DNA?
2.32 A murder victim is found in an advanced state of decomposition and cannot be identified. Police suspect that the victim is one of five persons reported by their parents as missing. DNA typing is carried out on tissues from the victim (X) and on the five sets of parents (A through E), using probes for a highly polymorphic DNA marker on two different chromosomes (“locus 1” and “locus 2”). The result is shown in the diagram at the right. How do you interpret the fact that genomic DNA from each individual yields two bands? Can you identify the parents of the victim? Explain your reasoning.
2.33 The snake venom phosphodiesterase enzyme de- scribed in Problem 2.30 was originally used in a proce- dure called “nearest neighbor” analysis. In this Locus 1 Locus 2 procedure, a DNA strand is synthesized in the presence of Problem 2.32 all four trinucleotides, one of which carries a radioactive phosphate in the α (innermost) position. Then the DNA is digested to completion with snake venom phos- phodiesterase, and the resulting mononucleotides are (a) How does this procedure reveal the “nearest neigh- separated and assayed for radioactivity. Examine the dia- bors” of the radioactive nucleotide? gram in Problem 2.30, and then answer the following (b) Is the “nearest neighbor” on the 5' or the 3' side of questions. the labeled nucleotide?
84 Chapter 2 DNA Structure and DNA Manipulation Further Reading Botstein, D., R. L. White, M. Skolnick, and R. W. Davis. Loxdale, H. D., and G. Lushai. 1998. Molecular markers 1980. Construction of a genetic linkage map in man in entomology. Bulletin of Entomological Research 88: using restriction fragment length polymorphisms. 577–600. American Journal of Human Genetics 32: 314. Mitton, J. B. 1994. Molecular approaches to population Calladine, C. R., and H. Drew. 1997. Understanding DNA: biology. Annual Review of Ecology & Systematics The Molecule and How it Works. 2d ed. San Diego: 25: 45. Academic Press. Mullis, K. B. 1990. The unusual origin of the polymerase Cruzan, M. B. 1998. Genetic markers in plant evolution- chain reaction. Scientific American, April. ary ecology. Ecology 79: 400. Olby, R. C. 1994. The Path to the Double Helix: The Discovery Danna, K., and D. Nathans. 1971. Specific cleavage of of DNA. New York: Dover. Simian Virus 40 DNA by restriction endonuclease of Pena, S. D. J., V. F. Prado, and J. T. Epplen. 1995. DNA Hemophilus influenzae. Proceedings of the National Academy diagnosis of human genetic individuality. Journal of of Sciences, USA 68: 2913. Molecular Medicine 73: 555. DePamphilis, M. L., ed. 1996. DNA Replication in Sayre, A. 1975. Rosalind Franklin and DNA. New York: Eukaryotic Cells. Cold Spring Harbor, NY: Cold Spring Norton. Harbor Press. Schafer, A. J., and J. R. Hawkins. 1998. DNA variation Eeles, R. A. and A. C. Stamps. 1993. Polymerase Chain and the future of human genetics. Nature Biotechnology Reaction (PCR): The Technique and Its Application. Austin 16: 33. TX: R. G. Landes. Smithies, O. 1995. Early days of electrophoresis. Genetics Frank-Kamenetskii, M. D. 1997. Unraveling DNA: The 139: 1. Most Important Molecule of Life. Tr. by L. Liapin. Reading Thomson, G., and M. S. Esposito. 1999. The genetics of MA: Addison-Wesley. complex diseases. Trends in Genetics 15: M17. Hartl, D. L. 2000. A Primer of Populaton Genetics. 3d ed. Wang, D. G., J.-B. Fan, C.-J. Siao, A. Berno, P. Young, R. Sunderland, MA: Sinauer. Sapolsky et al. 1998. Large-scale identification, Jorde, L. B., M. Bamshad, and A. R. Rogers. 1998. Using mapping, and genotyping of single-nucleotide mitochondrial and nuclear DNA markers to polymorphisms in the human genome. Science 280: reconstruct human evolution. Bioessays 20: 126. 1077. Kumar, L. S. 1999. DNA markers in plant improvement: Watson, J. D. 1968. The Double Helix. New York: An overview. Biotechnology Advances 17: 143. Atheneum.
Further Reading 85 CHAPTER 3 Transmission Genetics: The Principle of Segregation
86 CHAPTER OUTLINE PRINCIPLES 3.1 Morphological and Molecular • Inherited traits are determined by the genes present in the Phenotypes reproductive cells united in fertilization. 3.2 Segregation of a Single Gene • Genes are usually inherited in pairs—one from the mother Phenotypic Ratios in the F2 and one from the father. Generation The Principle of Segregation • The genes in a pair may differ in DNA sequence and in their Verification of Segregation effect on the expression of a particular inherited trait. The Testcross and the Backcross • The maternally and paternally inherited genes are not changed 3.3 Segregation of Two or More by being together in the same organism. Genes The Principle of Independent • In the formation of reproductive cells, the paired genes Assortment separate again into different cells. The Testcross with Unlinked • Random combinations of reproductive cells containing Genes The Big Experiment different genes result in Mendel’s ratios of traits appearing among the progeny. 3.4 Human Pedigree Analysis Characteristics of Dominant and • The ratios actually observed for any trait are determined by the Recessive Inheritance types of dominance and gene interaction. Molecular Markers in Human • In genetic analysis, the complementation test is used to Pedigrees determine whether two recessive mutations that cause a 3.5 Pedigrees and Probability similar phenotype are alleles of the same gene. The mutant Mutually Exclusive Possibilities parents are crossed, and the phenotype of the progeny is Independent Possibilities examined. If the progeny phenotype is nonmutant 3.6 Incomplete Dominance and (complementation occurs), the mutations are in different Epistasis genes; if the progeny phenotype is mutant (complementation Multiple Alleles Human ABO Blood Groups does not occur), the mutations are in the same gene. Epistasis 3.7 Genetic Analysis: Mutant Screens and the Complementation Test The Complementation Test in Gene Identification CONNECTIONS Why Does the Complementation What Did Gregor Mendel Think He Discovered? Test Work? Gregor Mendel 1866 Experiments on Plant Hybrids Troubadour The Huntington’s Disease Collaborative Research Group 1993 A Novel Gene Containing a Trinucleotide Repeat That Is Expanded and Unstable on Huntington’s Disease Chromosomes
87 we emphasized that in mod- AU: Reference in n Chapter 2 ern genetics, a typical phenotype consists 3.1 Morphological and first paragraph of a band present at a particular position changed to Fig. Molecular Phenotypes Iin a DNA electrophoresis gel. In a method of 2.25. Correct? or genetic analysis such as RAPD (random am- Geneticists study traits in which one or- restore to Fig. plified polymorphic DNA), genomic DNA ganism differs from another in order to dis- 2.24? from a single individual can yield 30 or cover the genetic basis of the difference. AU: In first para- more bands (see Figure 2.25). Each of these Until the advent of molecular genetics, graph, phrase bands represents a phenotype and results geneticists dealt mainly with morphologi- “phenotype is from the PCR (polymerase chain reaction) cal traits, in which the differences between characterized by” amplification of a single region of DNA in organisms can be expressed in terms of changed to “phe- the genome of the individual. color, shape, or size. Mendel studied seven notype consists.” In this chapter we consider how genes morphological traits contrasting in seed OK? are transmitted from parents to offspring shape, seed color, flower color, pod shape, and how this determines the distribution of and so forth (Figure 3.1). Perhaps the most genotypes and phenotypes among related widely known example of a contrasting individuals. The study of the inheritance of Mendelian trait is round versus wrinkled traits constitutes transmission genetics. seeds. As pea seeds dry, they lose water This subject is also called Mendelian and shrink. Round seeds are round be- genetics because the underlying principles cause they shrink uniformly; wrinkled were first deduced from experiments in seeds are wrinkled because they shrink ir- garden peas (Pisum sativum) carried out in regularly. The wrinkled phenotype is due the years 1856–1863 by Gregor Mendel, a to the absence of a branched-chain form of monk at the monastery of St. Thomas in the starch known as amylopectin, which is not town of Brno (Brünn), in the Czech Repub- synthesized in wrinkled seeds owing to a lic. He reported his experiments to a local defect in the enzyme starch-branching en- natural history society, published the re- zyme I (SBEI). In this small garden sults and his interpretation in its scientific The nonmutant, or wildtype, allele of plot adjacent to journal in 1866, and began exchanging let- the gene for SBEI is designated W and the the monastery of ters with one of the leading botanists of the mutant allele w. Seeds that are heterozy- St. Thomas, Gregor time. His experiments were careful and ex- gous Ww have only half as much SBEI en- Mendel grew more zyme as wildtype homozygous WW seeds, than 33,500 pea ceptionally well documented, and his paper plants in the years contains the first clear exposition of the sta- but this half the normal amount of enzyme 1856–1863, including tistical rules governing the transmission of produces enough amylopectin for the het- more than 6400 plants genes from generation to generation. Nev- erozygous Ww seeds to shrink uniformly in one year alone. He ertheless, Mendel’s paper was ignored for and remain phenotypically round. Hence, received some help with respect to seed shape, the wildtype W from two fellow 34 years until its significance was finally ap- monks who assisted in preciated. allele is dominant over the mutant w allele. the experiments. Because the w allele is recessive, only the homozygous ww seeds become wrinkled. The molecular basis of the wrinkled mu- tation is that the SBEI gene has become in- terrupted by the insertion, into the gene, of a DNA sequence called a transposable element. Such DNA sequences are capable of moving (transposition) from one location to another within a chromosome or be- tween chromosomes. The molecular mech- anism of transposition is discussed in Chapter 7, but for our present purposes it is necessary to know only that transposable elements are present in most genomes, es- pecially the large genomes of eukaryotes, and that many spontaneous mutations re- sult from the insertion of transposable ele- ments into a gene.
88 Chapter 3 Transmission Genetics: The Principle of Segregation Phenotype of progeny of Parental strain 1: Dominant Parental strain 2: Recessive monohybrid cross
Seed shape
Round Wrinkled Round
Seed color
Yellow Green Yellow
Flower color
Purple White Purple
Pod shape Inflated Constricted Inflated
Pod color
Green Yellow Green
Flower and pod position
Axial (along stem) Terminal (at top of stem) Axial
Stem length
Standard Dwarf Standard
Figure 3.1 The seven different traits studied in peas by Mendel. The phenotype shown at the far right is the dominant trait, which appears in the hybrid produced by crossing.
3.1 Morphological and Molecular Phenotypes 89 (A)–SBEI gene (C) (wildtype round, W) Genotype WW Ww ww EcoRI EcoRI Seed DNA phenotype Duplex
Site of hybridization with probe DNA Molecular phenotype (B)–SBEI gene with insertion (mutant wrinkled, w) EcoRI EcoRI
Transposable element insertion into SBEI gene
Figure 3.2 (A) W (round) is an allele of a gene that specifies the amino acid sequence of starch branching enzyme I (SBEI). (B) w (wrinkled) is an allele that encodes an inactive form of the enzyme because its DNA sequence is interrupted by the insertion of a transposable element. (C) At the level of the morphological phenotype, W is dominant to w: Genotype WW and Ww have round seeds, whereas genotype ww has wrinkled seeds. The molecular difference between the alleles can be detected as a restriction fragment length polymorphism (RFLP) using the enzyme EcoRI and a probe that hybridizes at the site shown. At the molecular level, the alleles are codominant: DNA from each genotype yields a different molecular phenotype—a single band differing in size for homozygous WW and ww, and both bands for heterozygous Ww.
Figure 3.2 includes a diagram of the DNA mozygous genotype comes from the two structure of the wildtype W and mutant w copies of whichever allele is homozygous, alleles and shows the DNA insertion that in- and thus it contains more DNA than the cor- terrupts the w allele. Highlighted are two responding DNA in the heterozygous geno- EcoRI restriction sites, present in both al- type, in which only one copy of each allele is leles, that flank the site of the insertion. The present. diagram in part C indicates what pattern of Hence, as illustrated in Figure 3.2C, the bands would be expected if one were to RFLP analysis clearly distinguishes between carry out a Southern blot (Section 2.3) in the genotypes WW, Ww, and ww, because which genomic DNA was digested to com- the heterozygous Ww genotype exhibits pletion with EcoRI and then the resulting both of the bands observed in the homo- fragments were separated by electrophore- zygous genotypes. This situation is de- sis and hybridized with a labeled probe com- scribed by saying that the W and w alleles plementary to a region shared between the are codominant with respect to the mole- W and w alleles. The EcoRI fragment from cular phenotype. However, as indicated by the W allele would be smaller than that of the seed shapes in Figure 3.2C, W is domi- the w allele, because of the inserted DNA in nant over w with respect to morphological the w allele, and thus it would migrate faster phenotype. In the discussion that follows, than the corresponding fragment from the we use the RFLP analysis of the W and w al- w allele and move to a position closer to the leles to emphasize the importance of mole- bottom of the gel. Genomic DNA from ho- cular phenotypes in modern genetics and to mozygous WW would yield a single, faster- demonstrate experimentally the principles migrating band; that from homozygous ww of genetic transmission. a single, slower-migrating band; and that from heterozygous Ww two bands with the same electrophoretic mobility as those ob- 3.2 served in the homozygous genotypes. In Segregation of a Single Gene Figure 3.2C, the band in the homozygous Mendel selected peas for his experiments genotypes is shown as somewhat thicker for two primary reasons. First, he had access than those from the heterozygous geno- to varieties that differed in contrasting type, because the single band in each ho- traits, such as round versus wrinkled seeds
90 Chapter 3 Transmission Genetics: The Principle of Segregation and yellow versus green seeds. Second, his could be obtained, gave him the opportu- earlier studies had indicated that peas usu- nity, as he says in his paper, to “determine ally reproduce by self-pollination, in which the number of different forms in which pollen produced in a flower is used to fertil- hybrid progeny appear” and to “ascertain ize the eggs in the same flower. To produce their numerical interrelationships.” hybrids by cross-pollination (outcrossing), The fact that garden peas are normally all he needed to do was open the keel petal self-fertilizing means that in the absence of (enclosing the reproductive structures), re- deliberate outcrossing, plants with contrast- move the immature anthers (the pollen- ing traits are usually homozygous for alter- producing structures) before they shed native alleles of a gene affecting the trait; pollen, and dust the stigma (the female for example, plants with round seeds have structure) with pollen taken from a flower genotype WW and those with wrinkled on another plant (Figure 3.3). The relatively seeds genotype ww. The homozygous geno- small space needed to grow each plant, and types are indicated experimentally by the the relatively large number of progeny that observation that hereditary traits in each
Keel petal enclosing reproductive structures Stigma (female part) Anther (male part) Ovule (forms pea pod)
Flower of plant grown Flower of plant grown from a round seed from a wrinkled seed
Open flower Open flower and and discard collect pollen anthers
Brush pollen Figure 3.3 Crossing pea onto stigma plants requires some minor surgery in which the anthers of a flower are removed before they produce pollen. The stigma, or female part of the flower, is not removed. It is fertil- ized by brushing with All seeds formed mature pollen grains taken from another from flower are round plant.
3.2 Segregation of a Single Gene 91 variety are true-breeding, which means This principle is illustrated for round and that plants produce only progeny like wrinkled seeds in Figure 3.4. The gel icons themselves when allowed to self-pollinate show the RFLP bands that genomic DNA normally. Outcrossing between plants that from each type of seed in these crosses differ in one or more traits creates a would yield. The genotypes of the crosses hybrid. If the parents differ in one, two, or and their progeny are as follows: three traits, the hybrid is a monohybrid, dihy- / ? brid, or trihybrid. In keeping track of parents Cross A: WW ww Ww / ? and their hybrid progeny, we say that the Cross B: ww WW Ww parents constitute the P1 generation and In both reciprocal crosses, the progeny have their hybrid progeny the F1 generation. the morphological phenotype of round Because garden peas are sexual organ- seeds, but as shown by the RFLP diagrams isms, each cross can be performed in two on the right, all progeny genotypes are ac- ways, depending on which phenotype is tually heterozygous Ww and therefore dif- present in the female parent and which in ferent from either parent. The genetic the male parent. With round versus wrin- equivalence of reciprocal crosses illustrated kled, for example, the female parent can be in Figure 3.4 is a principle that is quite gen- round and the male wrinkled, or the re- eral in its applicability, but there is an im- verse. These are called reciprocal crosses. portant exception, having to do with sex Mendel was the first to demonstrate the fol- chromosomes, that will be discussed in lowing important principle: Chapter 4. The outcome of a genetic cross does not In the following section we examine a depend on which trait is present in the male few of Mendel’s original experiments in the and which is present in the female; reciprocal context of RFLP analysis in order to relate crosses yield the same result. the morphological phenotypes and their
Cross A
Ovule from round variety. Flower on plant grown Flower on plant grown Pollen from wrinkled variety. from wrinkled seed from round seed Result: All seeds round.
Cross B
Figure 3.4 Morpho- logical and molecular phenotypes showing the equivalence of reciprocal crosses. In this example, the hybrid seeds are round and yield an RFLP Ovule from wrinkled variety. pattern with two Flower on plant grown Flower on plant grown Pollen from round variety. bands, irrespective of from round seed from wrinkled seed the direction of the Result: All seeds round. cross.
92 Chapter 3 Transmission Genetics: The Principle of Segregation ratios to the molecular phenotypes that would be expected. Table 3.1 Results of Mendel’s monohybrid experiments Parental traits F1 trait Number of F2 progeny F2 ratio Phenotypic Ratios Round wrinkled (seeds) Round 5474 round, 1850 wrinkled 2.96 : 1 Yellow green (seeds) Yellow 6022 yellow, 2001 green 3.01 : 1 in the F2 Generation Although the progeny of the crosses in Fig- Purple white (flowers) Purple 705 purple, 224 white 3.15 : 1 ure 3.4 have the dominant phenotype of Inflated constricted (pods) Inflated 882 inflated, 299 constricted 2.95 : 1 round seeds, the RFLP analysis shows that Green yellow (unripe pods) Green 428 green, 152 yellow 2.82 : 1 they are actually heterozygous. The w allele is hidden with respect to the morphological Axial terminal (flower position) Axial 651 axial, 207 terminal 3.14 : 1 phenotype because w is recessive to W. Long short (stems) Long 787 long, 277 short 2.84 : 1 Nevertheless, the wrinkled phenotype reappears in the next generation when the hybrid progeny are allowed to undergo self- principal observations from the data in fertilization. For example, if the round F1 Table 3.1 are seeds from Cross A in Figure 3.4 were grown into sexually mature plants and al- • The F1 hybrids express only the dominant lowed to undergo self-fertilization, some of trait (because the F1 progeny are heterozy- the resulting seeds would be round and gous—for example, Ww) others wrinkled. The progeny seeds pro- duced by self-fertilization of the F genera- • In the F2 generation, plants with either the 1 dominant or the recessive trait are present tion constitute the F2 generation. When (which means that some F2 progeny are ho- Mendel carried out this experiment, in the mozygous—for example, ww). F2 generation he observed the results shown in the following diagram: • In the F2 generation, there are approximately three times as many plants with the domi- nant phenotype as plants with the recessive Parental lines: phenotype.
The 3 : 1 ratio observed in the F2 generation Round Wrinkled is the key to understanding the mechanism
F1 generation: of genetic transmission. In the next section we use RFLP analysis of the W and w alleles All round seeds to explain why this ratio is produced.
F2 generation:
3 round : 1 wrinkled (5474) (1850)
Note that the ratio 5474 : 1850 is approxi- mately 3 : 1. A 3 : 1 ratio of dominant : recessive forms in the F2 progeny is characteristic of simple Mendelian inheritance. Mendel’s data demonstrating this point are shown in 09131_01_1715P Table 3.1. Note that the first two traits in the table (round versus wrinkled seeds and yel- low versus green seeds) have many more observations than any of the others. The reason is that these traits can be classified captioncaptioncap- directly in the seeds, whereas the others tioncaptioncaption- captioncaptioncaption can be classified only in the mature plants, captioncaptioncap- and Mendel could analyze many more tioncaptioncaption- seeds than he could mature plants. The captioncaption
3.2 Segregation of a Single Gene 93 3. Each reproductive cell (gamete) produced The Principle of Segregation by an individual contains only one allele of The 3 : 1 ratio can be explained with refer- each gene (that is, either W or w). ence to Figure 3.5. This is the heart of 4. In the formation of gametes, any particular Mendelian genetics. You should master it gamete is equally likely to include either al- and be able to use it to deduce the progeny lele (hence, from a heterozygous Ww geno- types produced in crosses. The diagram il- type, half the gametes contain W and the lustrates these key features of single-gene other half contain w). inheritance: 5. The union of male and female reproductive cells is a random process that reunites the 1. Genes come in pairs, which means that a alleles in pairs. cell or individual has two copies (alleles) of each gene. The essential feature of transmission ge- 2. For each pair of genes, the alleles may be netics is the separation, technically called identical (homozygous WW or homozygous segregation, in unaltered form, of the two ww), or they may be different (heterozy- alleles in an individual during the forma- gous Ww). tion of its reproductive cells. Segregation
Parents:
WW ww
Meiosis Reproductive cells (gametes):
WW parent can W w ww parent can contribute only contribute only W gametes w gametes Fertilization F1 progeny:
Both W and w Ww present, but seeds are round F2 progeny: Ww parent produces equal numbers of Female gametes W and w gametes 1/2 W 1/2 w
Figure 3.5 A diagrammatic 1/2 W 1/ explanation of the 3 : 1 ratio 1/4 WW 1/4 Ww 3 of round of dominant : recessive Male seeds are WW; morphological phenotypes gametes 2/3 of round observed in the F2 genera- seeds are Ww 1/2 w tion of a monohybrid cross. 1/4 Ww 1/4 ww The 3 : 1 ratio is observed because of dominance. 1/4WW : 1/2Ww : 1/4 ww Note that the ratio of WW:Ww:wwgenotypes in the F2 generation is 1 : 2 : 1, as can be seen from the restriction fragment phenotypes.
94 Chapter 3 Transmission Genetics: The Principle of Segregation corresponds to points 3 and 4 in the fore- quencies are arrayed across the top margin going list. The principle of segregation is and the male gametes and their frequencies sometimes called Mendel’s first law. along the left-hand margin. This calculating device is widely used in genetics and is The Principle of Segregation: In the forma- called a Punnett square after its inventor tion of gametes, the paired hereditary determi- Reginald C. Punnett (1875–1967). The nants separate (segregate) in such a way that each gamete is equally likely to contain either Punnett square in Figure 3.5 shows that member of the pair. random combinations of the F1 gametes re- sult in an F2 generation with the genotypic Another key feature of transmission genet- composition 1 4 WW, 1 2 Ww, and 1 4 ww. ics is that the hereditary determinants are This can be confirmed by the RFLP banding present as pairs in both the parental organ- patterns in the gel icons because of the isms and the progeny organisms but as sin- codominance of W and w with respect to gle copies in the reproductive cells. This the molecular phenotype. But because W is feature corresponds to points 1 and 5 in the dominant over w with respect to the mor- foregoing list. phological phenotoype, the WW and Ww Figure 3.5 illustrates the biological genotypes have round seeds and the ww mechanism underlying the important genotypes have wrinkled seeds, yielding Mendelian ratios in the F2 generation of 3 : 1 the phenotypic ratio of round : wrinkled for phenotypes and 1:2:1forgenotypes. seeds of 3 : 1. Hence it is a combination of To understand these ratios, consider first the segregation, random union of gametes, and parental generation in which the original dominance that results in the 3 : 1 ratio. cross is WW ww. The sex of the parents is The ratio of F2 genotypes is as important not stated because reciprocal crosses yield as the ratio of F2 phenotypes. The Punnett identical results. (There is, however, a con- square in Figure 3.5 also shows that the ra- vention in genetics that unless otherwise tio of WW : Ww : ww genotypes is 1 :2:1, specified, crosses are given with the female which can be confirmed directly by the parent listed first.) In the original cross, the RFLP analysis. WW parent produces only W-containing gametes, whereas the ww parent produces only w-containing gametes. Segregation still Verification of Segregation takes place in the homozygous genotypes as The round seeds in Figure 3.5 conceal a well as in the heterozygous genotype, even genotypic ratio of 1 WW :2Ww.Tosaythe though all of the gametes carry the same same thing in another way, among the F2 type of allele (W from homozygous WW and seeds that are round (or, more generally, w from homozygous ww). When the W- among organisms that show the dominant bearing and w-bearing gametes come morphological phenotype), 1 3 are homo- together in fertilization, the hybrid geno- zygous (in this example, WW) and 2 3are type is heterozygous Ww, which is shown by heterozygous (in this example, Ww). This the bands in the gel icon next to the F1 prog- conclusion is obvious from the RFLP pat- eny. With regard to seed shape, the hybrid terns in Figure 3.5, but it is not at all obvi- Ww seeds are round because W is dominant ous from the morphological phenotypes. over w. Unless you knew something about genetics When the heterozygous F1 progeny already, it would be a very bold hypothe- form gametes, segregation implies that half sis, because it implies that two organisms the gametes will contain the W allele and with the same morphological phenotype the other half will contain the w allele. (in this case round seeds) might neverthe- These gametes come together at random less differ in molecular phenotype and in when an F1 individual is self-fertilized or genotype. when two F1 individuals are crossed. The Yet this is exactly what Mendel pro- result of random fertilization can be de- posed. But how could this hypothesis be duced from the sort of cross-multiplication tested experimentally? He realized that it square shown at the bottom of the figure, could be tested via self-fertilization of the F2 in which the female gametes and their fre- plants. With self-fertilization, plants grown
3.2 Segregation of a Single Gene 95 What Did Gregor Mendel Think He Discovered?
Gregor Mendel 1866 made no consistent distinction between experimentation also justifies the as- genotype and phenotype until 1909, when sumption that pea hybrids form germinal Monastery of St. Thomas, Brno the terms themselves were coined. and pollen cells that in their composition [then Brünn], Czech Republic correspond in equal numbers to all the Experiments on Plant Hybrids Artificial fertilization undertaken on or- constant forms resulting from the com- (original in German) namental plants to obtain bination of traits united new color variants initiated Whether the plan by through fertilization. Mendel’s paper is remarkable for its preci- the experiments reported which the individual The difference of forms sion and clarity. It is worth reading in its here. The striking regularity experiments were among the progeny of entirety for this reason alone. Although the with which the same hybrid set up and carried hybrids, as well as the most important discovery attributed to forms always reappeared ratios in which they are Mendel is segregation, he never uses this whenever fertilization be- out was adequate to observed, find an ade- term. His description of segregation is tween like species took place the assigned task quate explanation in found in the first passage in italics in the suggested further experi- should be decided the principle [of segre- excerpt. (All of the italics are reproduced ments whose task it was to by a benevolent gation] just deduced. from the original.) In his description of the follow that development of The simplest case is process, he takes us carefully through the hybrids in their progeny.... judgment. given by the series for separation of A and a in gametes and their This paper discusses the at- one pair of differing coming together again at random in fertil- tempt at such a detailed experiment. . . . traits. It is shown that this series is de- ization. One flaw in the description is Whether the plan by which the individual scribed by the expression: A 2Aa a, Mendel’s occasional confusion between experiments were set up and carried out in which A and a signify the forms with genotype and phenotype, which is illus- was adequate to the assigned task constant differing traits, and Aa the form trated by his writing A instead of AA and a should be decided by a benevolent judg- hybrid for both. The series contains four instead of aa in the display toward the end ment. ... [Here the experimental re- individuals in three different terms. In of the passage. Most early geneticists sults are described in detail.] Thus their production, pollen and germinal
from the homozygous WW genotypes the F3 generation. The ratio 193 : 372 equals should be true-breeding for round seeds, 1 : 1.93, which is very close to the ratio 1 : 2 whereas those from the heterozygous Ww of WW : Ww genotypes predicted from the genotypes should yield round and wrinkled genetic hypothesis. seeds in the ratio of 3 : 1. On the other An important feature of the homo- hand, the plants grown from wrinkled seeds zygous round and homozygous wrinkled should be true-breeding for wrinkled be- seeds produced in the F2 and F3 generations cause these plants are homozygous ww. The is that the phenotypes are exactly the same results Mendel obtained are summarized in as those observed in the original parents in Figure 3.6. As predicted from the genetic hy- the P1 generation. This makes sense in pothesis, plants grown from F2 wrinkled terms of DNA, because the DNA of each al- seeds were true-breeding for wrinkled lele remains unaltered unless a new muta- seeds, yielding only wrinkled seeds in the F3 tion happens to occur. Mendel described generation. But some of the plants grown this result in a letter by saying that in the from round seeds showed evidence of seg- progeny of crosses, “the two parental traits regation. Among 565 plants grown from appear, separated and unchanged, and round F2 seeds, 372 plants produced both there is nothing to indicate that one of them round and wrinkled seeds in a proportion has either inherited or taken over anything very close to 3 : 1, whereas the remaining from the other.” From this finding, he con- 193 plants produced only round seeds in cluded that the hereditary determinants for
96 Chapter 3 Transmission Genetics: The Principle of Segregation cells of form A and a participate, on the The result of fertilization can be vi- makes no difference to the consequence average, equally in fertilization; there- sualized by writing the designations for of fertilization which of the two traits be- fore each form manifests itself twice, associated germinal and pollen cells in longs to the pollen and which to the ger- since four individuals are produced. Par- the form of fractions, pollen cells above minal cell. Therefore ticipating in fertilization are thus: the line, germinal cells below. In the A A a a Pollen cells A A a a case under discussion one obtains A a A a A 2Aa a Germinal cells A A a a A A a a A a A a This represents the average course It is entirely a matter of chance of self-fertilization of hybrids when two which of the two kinds of pollen com- In the first and fourth terms germi- differing traits are associated in them. bines with each single germinal cell. nal and pollen cells are alike; therefore In individual flowers and individual However, according to the laws of prob- the products of their association must plants, however, the ratio in which the ability, in an average of many cases it be constant, namely A and a; in the sec- members of the series are formed may will always happen that every pollen ond and third, however, a union of the be subject to not insignificant devia- form A and a will unite equally often two differing parental traits takes place tions. . . . Thus it was proven experi- with every germinal-cell form A and a; again, therefore the forms arising from mentally that, in Pisum, hybrids form therefore, in fertilization, one of the two such fertilizations are absolutely identi- different kinds of germinal and pollen pollen cells A will meet a germinal cell cal with the hybrid from which they de- cells and that this is the reason for the A, the other a germinal cell a, and rive. Thus, repeated hybridization takes variability of their offspring. equally, one pollen cell a will become place. The striking phenomenon, that associated with a germinal cell A, and hybrids are able to produce, in addition Source: Verhandlungen des naturforschenden den Vereines in Brünn 4: 3–47. the other a. to the two parental types, progeny that Pollen cells AAaa resemble themselves is thus explained: jjj j Aa and aA both give the same associa- Germinal cells AAaa tion, Aa, since, as mentioned earlier, it
1/3 (193) gave plants Figure 3.6 Summary of = 1/4 of all with pods containing only F phenotypes and the F2 seeds 2 round F3 seeds progeny produced by self-fertilization. 3/4 (5474) were round; 565 were planted, and...
2/3 (372) gave plants with pods containing = 1/ of all both round and wrinkled 2 F2 seeds Of 7324 F2 seeds F3 seeds in a 3 : 1 ratio of round : wrinkled
all gave plants producing = 1/4 of all 1/4 (1850) were wrinkled only wrinkled F3 seeds F2 seeds the traits in the parental lines were trans- or “contaminate” each other. In modern mitted as two different elements that retain terminology, this means that, with rare but their purity in the hybrids. In other words, important exceptions, genes are transmitted the hereditary determinants do not “mix” unchanged from generation to generation.
3.2 Segregation of a Single Gene 97 gametes produced by the heterozygous parent, because the recessive parent contributes only recessive alleles. Heterozygous Ww parent Mendel carried out a series of testcrosses Segregation yields with various traits. The results are shown in W and w gametes Table 3.2. In all cases, the ratio of phenotypes in a ratio of 1 : 1. among the testcross progeny is very close to the 1 : 1 ratio expected from segregation of 1/2 W 1/2 w the alleles in the heterozygous parent. Homozygous Another valuable type of cross is a recessive all w backcross, in which hybrid organisms are parent 1/2 Ww 1/2 ww crossed with one of the parental genotypes. Backcrosses are commonly used by geneti- cists and by plant and animal breeders, as The progeny of a testcross includes dominant and recessive we will see in later chapters. Note that the phenotypes in a ratio of 1 : 1. testcrosses in Table 3.2 are also backcrosses, because in each case, the F1 heterozygous Figure 3.7 In a testcross of an Ww heterozygous parent with a ww parent came from a cross between the homozygous recessive, the progeny are Ww and ww in the ratio of homozygous dominant and the homozy- 1 : 1. A testcross shows the result of segregation. gous recessive.
The Testcross and the Backcross 3.3 Segregation of Two Another straightforward way of testing the or More Genes genetic hypothesis in Figure 3.5 is by means The results of many genetic crosses depend of a testcross, a cross between an organism on the segregation of the alleles of two of that is heterozygous for one or more genes more genes. The genes may be in different (for example, Ww) and an organism that is chromosomes or in the same chromosome. homozygous for the recessive alleles (for Although in this section we consider the example, ww). The result of such a testcross case of genes that are in two different chro- is shown in Figure 3.7. Because the het- mosomes, the same principles apply to erozygous parent is expected to produce W genes that are in the same chromosome but and w gametes in equal numbers, whereas are so far distant from each other that they the homozygous recessive produces only w segregate independently. The case of linkage gametes, the expected progeny are 1 2 with of genes in the same chromosome is exam- the genotype Ww and 1 2 with the geno- ined in Chapter 5. type ww. The former have the dominant To illustrate the principles, we consider phenotype because W is dominant over w, again a cross between homozygous geno- and the latter have the recessive pheno- types, but in this case homozygous for the type. A testcross is often extremely useful alleles of two genes. A specific example is a in genetic analysis: true-breeding variety of garden peas with In a testcross, the phenotypes of the progeny seeds that are wrinkled and green (geno- reveal the relative frequencies of the different type ww gg) versus a variety with seeds that are round and yellow (genotype WW GG). As suggested by the use of uppercase and lowercase symbols for the alleles, the domi- Table 3.2 Mendel’s testcross results nant alleles are W and G, the recessive al-
Testcross (F1 heterozygote Progeny leles w and g. Crossing these strains yields homozygous recessive) from testcross Ratio F1 seeds with the genotype Ww Gg, which are phenotypically round and yellow be- Round wrinkled seeds 193 round, 192 wrinkled 1.01 : 1 cause of the dominance relations. When Yellow green seeds 196 yellow, 189 green 1.04 : 1 the F1 seeds are grown into mature plants Purple white flowers 85 purple, 81 white 1.05 : 1 and self-fertilized, the F2 progeny show the result of simultaneous segregation of the W, Long short stems 87 long, 79 short 1.10 : 1 w allele pair and the G, g allele pair. When
98 Chapter 3 Transmission Genetics: The Principle of Segregation Mendel perfomed this cross, he obtained Seed color phenotypes the following numbers of F2 seeds: Round, yellow 315 3 1 Round, green 108 /4 /4 Wrinkled, yellow 101 Yellow Wrinkled, green 32 Green Total 5 5 6 In these data, the first thing to be noted is the expected 3 : 1 ratio for each trait consid- 3/4 ered separately. The ratio of round : wrin- kled (pooling across yellow and green) is Round 9/16 3/16 Seed shape Round, yellow Round, green (315 108) : (101 32) phenotypes 423 : 133 3.18 : 1 1/4 And the ratio of yellow : green (pooling across round and wrinkled) is Wrinkled 3/16 1/16 (315 101) : (108 32) Wrinkled, yellow Wrinkled, green 416 : 140 2.97 : 1 Both of these ratios are in satisfactory Ratio of phenotypes in the F2 progeny of a dihybrid agreement with 3 : 1. (Testing for goodness cross is 9 : 3 : 3 : 1. of fit to a predicted ratio is described in Chapter 4.) Furthermore, in the F2 progeny Figure 3.8 The 3 : 1 ratio of round : wrinkled, when combined of the dihybrid cross, the separate 3 : 1 ra- at random with the 3 : 1 ratio of yellow : green, yields the tios for the two traits were combined at 9 : 3 : 3 : 1 ratio observed in the F2 progeny of the dihybrid random. With random combinations, as cross. shown in Figure 3.8, among the 3 4 of the progeny that are round, 3 4 will be yellow and 1 4 green; similarly, among the 1 4 of the progeny that are wrinkled, 3 4 will be yellow and 1 4 green. The overall propor- tions of round yellow to round green to wrinkled yellow to wrinkled green are therefore expected to be 3 4 3 4:34 1 4:14 3 4:14 1 4 Segregation of Independent 9 16 : 3 16 : 3 16 : 1 16 W and w alleles segregation of G and g alleles The observed ratio of 315 : 108 : 101 : 32 equals 9.84 : 3.38 : 3.16 : 1, which is a sat- 1 isfactory fit to the 9 :3:3:1 ratio expected /4 WG from the Punnett square in Figure 3.8. 1/2 W (and G or g) 1/4 Wg The Principle of Ww Gg 1/ wG Independent Assortment 4 1/2 w (and G or g) The independent segregation of the W, w 1/4 wg and G, g allele pairs is illustrated in Figure 3.9. What independence means is that if a ga- mete contains W, it is equally likely to con- Result : An equal frequency of all tain G or g; and if a gamete contains w, it is four possible types of gametes equally likely to contain G or g. The impli- cation is that the four gametes are formed Figure 3.9 Independent segregation of the W, w and G, g allele in equal frequencies: pairs means that among each of the W and w gametic classes, the ratio of G : g is 1 : 1. Likewise, among each of the G and g 1 4 W G 1 4 W g 1 4 w G 1 4 w g gametic classes, the ratio of W : w is 1 : 1.
3.3 Segregation of Two or More Genes 99 The result of independent assortment Parents: Round, Wrinkled, yellow green Phenotypes when the four types of gametes combine at random to form the zygotes of the next generation is shown in Figure 3.10. Note that WW GG ww gg Genotypes the expected ratio of phenotypes among the F2 progeny is
Gametes: 9:3:3:1 Au: Is phrase However, as the Punnett square also shows, “independent segregation” WG wg the ratio of genotypes in the F generation correct in 2 is more complex; it is Fig. 3.10 caption, or should it be changed to 1:2:1:2:4:2:1:2:1 “independent F1 progeny: assortment”? The reason for this ratio is shown in Figure 3.11. Among seeds that have the WW Round, genotype, the ratio of yellow Double heterozygote GG : Gg : gg equals 1 : 2 : 1
Ww Gg Among seeds that have the Ww genotype, the ratio is 2:4:2 Female gametes (which is a 1 : 2 : 1 ratio multiplied by 2 because there are twice as many Ww geno- 1/4 WG 1/4 Wg1/4 wG1/4 wg types as either WW or ww). And among seeds that have the ww genotype, the ratio of 1/ WG 4 GG : Gg : gg equals 1:2:1 WW GG WW Gg Ww GG Ww Gg The phenotypes of the seeds are shown be- neath the genotypes, and the combined 1/ Wg 4 phenotypic ratio is Male WW Gg WW gg Ww Gg Ww gg gametes 9:3:3:1
1/4 wG Figure 3.11 also shows that among seeds Ww GG Ww Gg ww GG ww Gg that are GG, the ratio of WW : Ww : ww equals 1:2:1 1/4 wg among seeds that are Gg, it is Ww Gg Ww gg ww Gg ww gg 2:4:2
F progeny: and among seeds that are gg, it is 2 Genotypes Phenotypes 1:2:1 + 1/16 WW GG 2/16 WW Gg + + Therefore, the independent segregation 2/ Ww GG 4/ Ww Gg = 9/ round, yellow 16 16 16 means that among each of the possible + genotypes formed by one pair of alleles, the 1/16 ww GG 2/16 ww Gg = 3/16 wrinkled, yellow ratio of homozygous dominant to heterozy- gous to homozygous recessive is 1 : 2 : 1 for 1/ WW gg + 2/ Ww gg = 3/ round, green 16 16 16 the other independent pair of alleles. The principle of independent segrega- 1/16 ww gg = 1/16 wrinkled, green tion of two pairs of alleles in different chro- mosomes (or located sufficiently far apart Figure 3.10 Independent segregation is the biological basis for the 9 : 3 :3:1 in the same chromosome) has come to be ratio of F2 phenotypes resulting from a dihybrid cross. known as the principle of independent as-
100 Chapter 3 Transmission Genetics: The Principle of Segregation Segregation of Segregation of Segregation of Gg within WW Gg within Ww Gg within ww
WW GG WW Gg WW gg Ww GG Ww Gg Ww gg ww GG ww Gg ww gg 12::1 : 2 ::24 : 1 ::21
9 Round, yellow All genotypes 3 Round, green combined 3 Wrinkled, yellow 1 Wrinkled, green
Au: Proof asks if Figure 3.11 Genotypes and phenotypes of the F2 progeny of the dihybrid cross for seed shape and it is OK that term seed color. in Key Terms list is “independent assortment.” Or As with the one-gene testcross, in a two- sortment. It is also sometimes referred to as should term in gene testcross the ratio of progeny pheno- Mendel’s second law. list be changed to types is a direct demonstration of the ratio The Principle of Independent Assortment: “principle of in- of gametes produced by the doubly Segregation of the members of any pair of al- dependent assort- heterozygous parent. In the actual cross, leles is independent of the segregation of other ment”? Or should Mendel obtained 55 round yellow, 51 pairs in the formation of reproductive cells. term be made round green, 49 wrinkled yellow, and 53 bold elsewhere in Although the principle of independent as- wrinkled green, which is in good agree- text? sortment is fundamental in Mendelian ge- ment with the predicted 1 :1:1:1 ratio. netics, the phenomenon of linkage, caused by proximity of genes in the same chromo- some, is an important exception.
The Testcross with Unlinked Genes Parents: Ww Gg ww gg Genes that show independent assortment All gametes from are said to be unlinked. The hypothesis of Gametes homozygous recessive independent assortment can be tested di- parent are wg. wg rectly in a testcross with the double ho- Gametes mozygous recessive: 1/4 WG = 1/4 round, yellow Ww Gg ww gg Ww Gg The result of the testcross is shown in Figure
3.12. Because plants with doubly heterozy- Gametes from 1/4 Wg = 1/4 round, green gous genotypes produce four types of ga- heterozygous parent metes—W G, W g, w G, and w g—in equal show independent Ww gg frequencies, whereas the plants with ww gg assortment. genotypes produce only w g gametes, the 1/4 wG = 1/4 wrinkled, yellow possible progeny genotypes are Ww Gg, Ww ww Gg gg, ww Gg, and ww gg, and these are ex- pected in equal frequencies. Because of the 1/4 wg = 1/4 wrinkled, green dominance relations—W over w and G over g—the progeny phenotypes are expected to ww gg be round yellow, round green, wrinkled yellow, and wrinkled green in a ratio of Figure 3.12 Genotypes and phenotypes resulting from a testcross of the 1:1:1:1 Ww Gg double heterozygote.
3.3 Segregation of Two or More Genes 101 The Big Experiment Mendel tested this hypothesis, too, but By analogy with the independent segrega- by this time he was complaining of the tion of two genes illustrated in Figure 3.8, amount of work the experiment entailed, one might expect that independent segre- noting that “of all the experiments, it re- quired the most time and effort.” The result gation of three genes would produce F2 progeny in which combinations of the phe- of the experiment is shown in Figure 3.14. notypes are given by successive terms in the From top to bottom, the three Punnett multiplication of [(3 4) (1 4) ]3, which, squares show the segregation of W and w when multiplied out, yields 27 :9:9:9:3 from G and g in the genotypes PP, Pp, and :3:3:1.A specific example is shown in pp. In each box, the number in red is the expected number of plants of each geno- Figure 3.13. The allele pairs are W, w for round versus wrinkled seeds; G, g for yellow ver- type, assuming independent assortment, sus green seeds; and P, p for purple versus and the number in black is the observed white flowers. (The alleles indicated with number of each genotype of plant. The ex- the uppercase letters are dominant.) With cellent agreement confirmed what Mendel regarded as the main conclusion of all of his independent segregation, among the F2 progeny the most frequent phenotype experiments: (27 64) has the dominant form of all three “Pea hybrids form germinal and pollen cells traits, the next most frequent (9 64) has the that in their composition correspond in equal dominant form of two of the traits, the next numbers to all the constant forms resulting most frequent (3 64) has the dominant from the combination of traits united through form of only one trait, and the least fre- fertilization.” quent (1 64) is the triple recessive. Observe that if you consider any one of the traits In this admittedly somewhat turgid sen- and ignore the other two, then the ratio of tence, Mendel incorporated both segrega- phenotypes is 3 : 1; and if you consider any tion and independent assortment. In two of the traits, then the ratio of pheno- modern terms, what he means is that with types is 9:3:3:1.Thismeans that all of independent segregation, the gametes pro- the possible one- and two-gene indepen- duced by any hybrid plant consist of equal dent segregations are present in the overall numbers of all possible combinations of the three-gene independent segregation. alleles that are heterozygous. For example,
(3/4 W +X1/4ww) (3/4 G + 1/4 gg) X (3/4 P + 1/4 pp)
Observed Expected number number
27/64 W G P Round, yellow, purple 269 270
9/64 W G pp Round, yellow, white 98 90
9/64 W gg P Round, green, purple 86 90
9/64 ww G P Wrinkled, yellow, purple 88 90
3/64 W gg pp Round, green, white 27 30 Figure 3.13 With independent assortment, 3/64 ww G pp Wrinkled, yellow, white the expected ratio of phenotypes in a 34 30 trihybrid cross is obtained by multiplying 3/64 ww gg P Wrinkled, green, purple 30 30 the three independent 3 : 1 ratios of the dominant and recessive phenotypes. A dash 1/64 ww gg pp Wrinkled, green, white 7 10 used in a genotype symbol indicates that either the dominant or the recessive allele is present; for example, W refers collectively For any one gene, For any pair of genes,the to the genotypes WW and Ww. (The the ratio of phenotypes is ratio of phenotypes is expected numbers total 640 rather than 639 48 : 16 = 3 : 1 36 : 12 : 12 : 4 = 9 : 3 : 3 : 1 because of round-off error.)
102 Chapter 3 Transmission Genetics: The Principle of Segregation Expected number Observed number WW GG 10 8 Ww WW GG Ww Gg Gg ww 20 14 ww gg 20 15 PP gg 10 8 40 49 10 9 Pp 20 19 20 20 PP pp 10 10 PP
PP
WW GG 20 22 Ww Gg 40 38 ww 40 45 gg 20 25 80 78 20 17 40 36 40 40 Pp 20 20 Pp
Pp
WW GG 10 14 Ww Gg 20 18 ww 20 18 gg 10 10 40 48 10 11 20 24 20 16 pp 10 7 pp
pp
Figure 3.14 Progeny observed in the F2 genera- observed. Note that each gene, by itself, yields a tion of a trihybrid cross with the allelic pairs W, 1 : 2 : 1 ratio of genotypes and that each pair of w and G, g and P, p. In each box, the red entry is genes yields a 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1 ratio of the expected number and the black entry is the genotypes.