<<

Proc. Natl. Acad. Sci. USA Vol. 90, pp. 4338-4344, May 1993 Review The project Maynard V. Olson Department of Molecular , , Seattle, WA 98195

ABSTRACT The germ cells), which contain 3 x 109 base synthesize a mammalian provides Project in the is now well pairs (bp) of DNA. Given the four-letter a route to amounts of the pure underway. Its programmatic direction was alphabet of DNA-customarily symbol- protein. The ability to alter the largely set by a National Research Council ized with the letters G, A, T, and C-the of the protein through site-directed mu- report issued in 1988. The broad frame- sequence of 3 x 109 bp corresponds to tagenesis lends genuine novelty to the work supplied by this report has survived 750 megabytes of information. If the se- resultant biosynthetic opportunities. almost unchanged despite an upheaval in quence of the human genome could be However, the importance of recombi- the technology of genome analysis. This determined, it would be possible to store nant-DNA technology in making the Hu- upheaval has primarily affected physical and manipulate it on a desktop computer. man feasible stems from and genetic mapping, the two dominant However, even the dream of acquiring its analytical dimensions. pro- activities in the present phase ofthe project. DNA sequence on this scale is of recent vides a means to purify individual recom- Advances in mapping techniques have al- origin. Dramatic progress was made dur- binant-DNA from complex lowed good progress toward the specific ing the 1950s and 1960s in understanding mixtures and then to prepare biochemi- goals of the project and are also providing the mechanisms by which genetic infor- cally useful amounts of the molecules by strong corollary benefits throughout bio- mation specifies biological structure and culturing the microbial strains into which . Actual DNA . However, during this era, the they have been introduced. of the of the human and model information itself remained nearly inac- A less obvious consequence of the is still at an early stage. There cessible. discovery of restriction was the has been little progress in the intrinsic A landmark event in DNA analysis development ofthe first practical method efficiency of DNA-sequence determination. came in 1970 with the discovery of site- of genetic mapping in (ref. 4, pp. However, refinements in experimental pro- 519-522; ref. 6). Most human cells con- tocols, and man- specific restriction enzymes (refs. 2 and instrumentation, project 3; ref. 4, p. 64). These remarkable en- tain two copies of each DNA sequence, agement have made it practical to acquire one of maternal and the other of paternal sequence data on an enlarged scale. It is zymes have the ability to scan any source of DNA for every occurrence of a par- origin. When a new germ is pro- also increasingly apparent that DNA- duced, it contains only one copy of the sequence data provide a potent means of ticular string of bases-for example, the EcoRI recognizes the genome, a copy that is a unique of relating knowledge gained from the study string the two genomes from which it was de- of model organisms to human GAATTC. Restriction enzymes cleave . both strands of the at their rived. Genetic mapping involves measur- There is as yet little indication that the ing, through actual inheritance studies in infusion of technology from outside biology recognition sites. Since the events are directed by the DNA se- families, the probability that two closely into the has been spaced segments of the genome will stay effectively stimulated. Opportunities in this quence, they always occur at the same positions in different samples of genomic together during germ-cell formation. The area remain large, posing substantial tech- mapping requires an ability to distinguish nical and policy challenges. DNA extracted from any genetically ho- mogeneous source (e.g., different tissues between the two copies of the genome of the same individual human or different present in the somatic cells from which In the United States, the Human Genome individuals sampled from an inbred strain the germ cells are derived. Subtle differ- Project first took clear form in February of mice). ences in the base sequence of different 1988, with the release of the National Restriction enzymes provide a means instances of the human genome some- Research Council (NRC) report Mapping to develop precise physical maps ofDNA times alter restriction sites and, hence, and Sequencing the Human Genome (1). simply by determining the coordinates in restriction-fragment sizes. These alter- To a degree remarkable in Federal sci- base pairs of the sites at which particular ations are detectable even in complex ence policy, this report has had a clear enzymes cleave (ref. 4, pp. 66-67; ref. 5). genomes by a method known as gel- effect on subsequent programmatic ac- Like topographic maps, physical maps of transfer hybridization, which was devel- tivity. With a budget in the current fiscal DNA derive their utility through annota- oped in 1975 (ref. 4, pp. 127-130; ref. 7). year of $170 million, jointly administered tion: mapped landmarks provide refer- In 1987, the first global human genetic by the- National Institutes of Health ence points relative to which functional map, based on "restriction-fragment- (NIH) and the Department of DNA sequences such as can be length polymorphisms," was published (DOE), a program is underway that con- localized. Restriction enzymes also facil- (8). forms closely to the recommendations of itate a key step in the cut-and-splice As to the actual determination ofDNA the NRC committee. After a 5-year real- procedures by which recombinant-DNA sequence, reasonably efficient methods ity check, it is of both scientific and molecules (i.e., DNA clones) are con- first appeared in 1977 (ref. 4, pp. 67-69; policy interest to examine how the com- structed (ref. 4, pp. 73-74 and pp. 99- refs. 9 and 10). A technique known as mittee's view toward this project has 124). fared in the field. The importance of recombinant-DNA Abbreviations: DOE, Department of Energy; technology is often attributed primarily FISH, fluorescence hybridization; Background NIH, National Institutes of Health; NRC, to its synthetic dimension. For example, National Research Council; STS, sequence- The human genome is the genetic mate- the ability to design and construct a DNA tagged site; YAC, yeast artificial chromo- rial in human and cells (i.e., that programs a bacterium to some. 4338 Downloaded by guest on September 28, 2021 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) 4339

chain-termination sequencing came to 188-191) and as the starting points for methods used to construct them. This dominate standard practice: it is based on or functional studies. capability required a new choice of land- enzymatic DNA synthesis, carried out in Human is uniquely dependent marks called sequence-tagged sites vitro in the presence of artificial chain- on strong research infrastructure. While (STSs; ref. 19). An STS is simply a short, terminating variants of the normal DNA- model organisms have been extensively unique sequence of DNA that can be precursor molecules. By the early 1980s bred for the specific purpose of facilitat- amplified via the PCR. STSs are ideal individual sequences exceeding 105 bp ing genetic analysis (13), landmarks during map construction be- had been determined (11); however, a is limited to the examination of individ- cause of the ease with which they can be more common scale of analysis was 103- uals, families, and as they detected by PCR assays. Equally impor- 104 bp. Most physical mapping was car- are found in contemporary society. tant is their role in map representation ried out on a similar scale. Hence, the NRC committee set ambi- and map use. Given the gap between the ability to tious goals for the construction of de- Complex physical maps based on re- determine 105 bp of DNA sequence in a tailed physical and genetic maps of the striction sites are of little value as exper- state-of-the-art laboratory and the 109-bp human genome, as well as organized col- imental tools unless they are supported size of the human genome, the NRC lections of cloned human DNA. By de- by a collection of clones that can be used committee confronted an enormous sign, the goals were too ambitious for the to detect particular segments of the problem of scale. Partly because of the technology of 1988. In retrospect, they mapped DNA via DNADNA hybridiza- obvious need for improved technology were so ambitious that they probably tion assays. Comprehensive maps of the would have overwhelmed the basic meth- and also because of a desire to maximize human genome would have to be sup- odologies on which the NRC report was tens of synergy between genome analysis and ported by thousands of clones, based. Fortunately, technical advances each of which would have to be main- studies of biological function, the com- since 1988 have exceeded all reasonable tained as a separate microbial strain. In mittee recommended against early em- expectations. contrast, STSs can be described in an phasis on large-scale sequencing of hu- Much of this progress was made pos- electronic data base in a form that makes man DNA. Instead, it advocated com- sible by the development of the - them prehensive physical and genetic experimentally accessible in any mapping ase chain reaction (PCR). PCR, which is laboratory. The most critical aspect ofan ofthe human genome, extensive mapping essentially a method of cloning, STS description is the DNA sequence of and sequencing ofthe smaller genomes of allows the amplification of specific DNA the two primers. Laboratory implemen- several model organisms, and a system- molecules in vitro through cycles of en- tation of an STS simply requires that the atic effort to develop improved sequenc- zymatic DNA synthesis (ref. 4, pp. 79- two primers be synthesized and the ap- ing technology. 85). PCR amplification is dependent on a propriate temperature-cycling regime be pair of short, synthetic "primers" (i.e., carried out. Principal Aims of the Human single-stranded DNA molecules whose Most large-scale physical maps are Genome Project ends can be extended by DNA polymer- constructed through the process of "con- ase under the direction of template mol- tig building." A contig is an organized set More important than the specific map- ecules). The test sample provides the of DNA clones that collectively provide ping and sequencing objectives of the template molecules, and the primers di- redundant cloned coverage of a region Human Genome Project are three rect the amplification to a particular seg- that is too long to clone in one piece (ref. broader aims that are implicit in these ment of the template DNA, typically a 4, pp. 587-588; refs. 20-22). Typically, goals: region only a few hundred base pairs in the clones have random end points, and (i) To improve the research infrastruc- length. Starting with a minute sample of the contig is described by specifying the ture of human genetics. total human DNA, it is possible to am- amount of overlap between each clone (ii) To help establish DNA sequence as plify any such region 1 billionfold while and its nearest neighbors. A procedure the primary interface between knowledge leaving the rest of the genome at its referred to as STS-content mapping pro- of and knowledge of the original concentration. vides a convenient method of establish- biology of model organisms. Widespread application ofthe PCR de- ing these overlaps (ref. 4, pp. 610-612; (iii) To launch an open-ended effort to pends on an efficient, automated method ref. 23). In a step that precedes contig improve the analytical of for the chemical synthesis of the PCR building, the STSs are tested to confirm DNA. primers. An approach to DNA synthesis that they occur in a single copy in the For the purposes of this review, prog- based on phosphoramidite , genome; then, if two clones share even a ress in the Human Genome Project will which became routine in the early 1980s, single STS, they can be reliably assumed be examined relative to these three broad meets this need (ref. 4, pp. 69-70; refs. to overlap. aims. 14-16). The first paper on the PCR ap- Although the PCR has had a profound peared in 1985 (17) but received little effect on physical mapping, other new The Research Infrastructure of notice; for example, despite its present developments have also improved the Human Genetics prominence in genome analysis, it is not prospects for the construction of large- mentioned in the NRC report. The ex- scale physical maps. One such develop- In the context of the Human Genome plosive growth of PCR applications be- ment has been the introduction of the Project, research infrastructure refers to gan with the publication of an important yeast artificial- (YAC) clon- the biological, informational, and meth- refinement ofthe PCR protocol in 1989- ing system, fist described in 1987 (ref. 4, odological tools with which genetics re- the use of a thermostable DNA polymer- pp. 590-592; ref. 24). YACs allow large search is carried out. Intensive genetic ase (18). This refinement allowed the segments of DNA to be cloned as linear, analysis of any species is heavily depen- cycles ofDNA synthesis, which are anal- artificial into the yeast dent on infrastructure. Particularly im- ogous to cellular generations, to be Saccharomyces cerevisiae. Even portant are genetic-linkage maps, physi- driven by simple thermal cycling with no some of the earliest YAC clones were 10 cal maps of DNA, and characterized new addition of reagents at each cycle. times the size of the largest clones that DNA clones. The latter are useful as By the end of 1989, it was already had been constructed previously. Fur- reagents that can be used to for apparent that the PCR provided a prac- thermore, the YAC system appears ca- particular short segments of the genome tical means of abstracting large-scale pable of cloning a higher proportion of by DNA-DNA hybridization (ref. 12, pp. physical maps away from the particular the genomic DNA of many organisms Downloaded by guest on September 28, 2021 4340 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) than could be recovered using earlier have led to greatly improved continuity, A key test of the effectiveness of the systems. This point has been most clearly but the need remains for supplementary infrastructure-building features of the documented during the physical mapping methods to define the order and orienta- Human Genome Project is the extent to of the genome of the nematode worm tion of disconnected contigs along chro- which its components are being used (25). mosomes. even before genome-wide physical maps By 1989, YAC technology had evolved -hybrid mapping, which in- are available. The most critical test in- to the point where specific segments of volves fragmentation of chromosomes in volves projects directed at the "position- the human genome could be recovered cultured cells with high doses of x-rays al cloning" of genes associated with her- efficiently in YAC clones (26). Soon followed by incorporation of the frag- itable diseases. Positional cloning is a thereafter, multi-megabase-pair contigs ments into stable cell lines, provides still strategy that was developed during the began to appear (23, 27-29), and, in the another solution to this problem (ref. 4, 1980s to allow determination of the bio- fall of 1992, complete YAC-based phys- pp. 608-609; ref. 40). Current protocols chemical basis of the many heritable dis- ical maps of human (30) for radiation-hybrid mapping are notable eases whose analysis has resisted the and the human (31) were for their abandonment of the traditional more traditional approach of direct bio- published. In these projects, contig con- goal of isolating a single short segment of chemical analysis of diseased (46). struction was largely by STS-content the human genome in each rodent cell In general, the biochemical analysis of mapping. There is little doubt that the line. Nearly all of the radiation-hybrid diseased tissue is rarely effective unless same technology employed on chromo- lines produced by these protocols con- the genetic defect alters a protein whose somes Y and 21, as well as on a large tain many unrelated segments of the hu- metabolic role in normal tissue is already segment of the (29), has man genome. Proximity of two STSs, or understood. Few of the heritable dis- sufficient power to produce highly con- other markers, is inferred by statistical nected physical maps ofthe entire human analysis of the pattern in which they eases that cause mental retardation, psy- genome. occur in a large collection of cell lines. chosis, congenital malformation, malig- Another important advance in physical Closely spaced markers have a higher nant tumors, and other similarly complex mapping has been the development of probability of occurring together in the effects meet this criterion. fluorescence in situ hybridization (FISH) same cell line than do pairs of markers The first step in positional cloning is to into a routine procedure. This technique that are on different chromosomes or are localize the "disease" by carrying employs DNA probes that can detect far apart on the same chromosome. out genetic mapping studies on families segments of the human genome by While the PCR, together with such new with multiple affected members. Studies DNADNA hybridization on samples of techniques as YAC cloning, FISH, and of the coinheritance of the disease with lysed cells prepared under radiation-hybrid mapping, has led to a genetically mapped DNA markers allow conditions that preserve the morphology surge of success in physical mapping, determination of the position of the gene of the condensed human chromosomes. PCR-based methods have also trans- in the genome. Actual biochemical iden- Attachment of fluorescent molecules to formed genetic mapping. In particular, tification of the gene still remains a for- the probe DNA allows visualization in the PCR has allowed development of a midable task since the resolution of ge- the of the position on a new class of genetic markers that have a netic maps in the human is rarely better chromosome to which the probe binds. particularly high probability ofexisting in than 1 megabase pair (Mbp). Physical The technique is a refinement ofprevious alternate forms in different instances of mapping and functional studies on the in situ hybridization methods that de- the human genome. cloned DNA are required to find the gene pended on radiolabeling of the probes These markers are based on short, within the candidate region. and autoradiographic detection (32). The repetitive DNA sequences that are Better physical mapping methods, par- increases in convenience, reliability, and widely distributed in the human genome. ticularly the combination ofYAC cloning resolution that have accompanied non- A particularly common motif is ... and FISH analysis, have improved the isotopic detection have transformed the (CA), .... At sites where this motif prospects for positional cloning. An ex- role of in situ hybridization in physical occurs, n, the number of repetitions of emplary baseline case, publishedjust be- mapping. The first nonisotopic visualiza- the dinucleotide CA, is highly variable fore either of these techniques became tion of single-copy sequences in human from one instance of the human genome widely available, is . Final chromosomes by in situ hybridization to the next (41, 42). Different values of n success in the positional cloning of the was published in 1985 (33). Fluorescence lead to PCR-amplification products of cystic fibrosis gene required heroic phys- detection of single-copy sequences was different lengths when the entire ... ical mapping efforts that never achieved introduced in 1987 (34), after which ap- (CA)n ... tract is amplified by using any semblance ofcontinuous cloned cov- plications expanded rapidly (35-37). primers that flank the repeat; these dif- erage of the candidate region (47). Piece- FISH contributes to two aspects of ferences are readily detected by gel elec- meal cloning and mapping proved ade- long-range physical mapping. First, it al- trophoresis. An attractive feature of quate only because the gene was large lows individual clones to be mapped at a PCR-detectable genetic markers is that and in a gene-poor region of the genome. coarse level long before contig building is they are simply a special type of STS. As Subsequent successes with a series of complete, thereby providing reagents of such, they can be readily included as disease genes reveal the influence of im- immediate use in the analysis of targeted landmarks in physical maps, as well as proved techniques. Examples in which regions. Second, contig maps have dis- genetic maps, thereby providing a simple YACs, FISH, or both figured promi- continuities whenever a site in the ge- method ofinterrelating these two types of nently include the following: fragile-X nome is missing from the available clone maps. Many PCR-detectable genetic syndrome (48-50), the most common collections. FISH provides a way to or- markers have been integrated into preex- heritable form of mental retardation; fa- der and orient contigs along a chromo- isting maps ofthe human, greatly improv- milial adenomatous polyposis (51, 52), a some even when occasional discontinui- ing these maps (43). Still more recently, a heritable form of colorectal ; my- ties exist. Early efforts to construct phys- human genetic map that is completely otonic dystrophy (53), an adult-onset dis- ical maps ofhuman chromosomes, which based on PCR-detectable markers has ease that affects muscle function; Kall- depended on clones that are prop- been constructed (44). Markers of the mann syndrome (54, 55), a defect in neu- agated in Escherichia coli, yielded - same type have also transformed genetic ronal development; Lowe syndrome (56), tively small contigs separated by discon- mapping in the mouse, whose genome is a developmental defect affecting the , tinuities (38, 39). YAC-based methods the same as that of the human (45). brain, and kidney; and Downloaded by guest on September 28, 2021 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) 4341 (57-59), a neurological disease that is ruption is the initiating event in several the of a factor follows the nor- lethal in early childhood. forms of neoplasia (65, 66). pathway, its has a fea- The genes for many other heritable ture that is unusual in yeast but relatively diseases are now under analysis by sim- DNA Sequence as an Interface Between common in human cells: it is produced by ilar techniques. Particularly impressive is Knowledge of Human Biology and proteolytic processing of a precursor progress on the genetic mapping of dis- Knowledge of the Biology of at a Lys-Arg linkage. Genetic eases such as familial breast and ovarian Model Organisms studies revealed that the gene KEX2 en- cancer (60, 61) and early-onset familial codes the protease that carries out this Alzheimer disease (62). The genetic anal- Central to the NRC committee's recom- processing step. Comparison of the se- ysis of these diseases is complicated by a mendations, which emphasized the im- quence of KEX2 with all other known set of factors that will be encountered portance of sequencing the genomes of DNA sequences revealed strong similar- increasingly often as positional cloning is model organisms, was the belief that ity to a human gene of previously un- applied to complex, adult-onset genetic DNA sequence offers a potent means of known function, c-fur (72). Subsequent disorders: suitable families are rare, interrelating diverse aspects ofbiological analysis showed that c-fur is a member of small, and incomplete (i.e., few grand- knowledge. Events during the past 5 a family of human genes that parents, parents, or siblings of the af- years have strongly reinforced this con- proteases that process precursors to fected individuals are available); even cept. many important and family members that remain disease-free Particularly remarkable is the ability of including , nerve , throughout a normal span cannot be DNA-sequence data to call attention to bone morphogenetic protein, and a major reliably categorized as unaffected since similarities between biological phenom- component ofthe AIDS (73). There they may have died from other causes ena that are superficially unrelated. A is a long history of direct, biochemical before disease developed; the disease is typical example involves the successful efforts to identify these proteases be- common enough in the general popula- transfer of information from the study of cause of their potential interest as phar- tion that cases with genetic and nonge- yeast mating to diverse areas of human macological targets. These efforts led to netic causes occur frequently in the same biology. Yeast cells have two mating the description of a whole series of pro- family. Highly informative genetic mark- types, commonly referred to as a and a. teases that are capable of cleaving Lys- ers, such as the PCR-detectable CA- In the yeast life cycle, a and a cells are Arg linkages under particular in vitro repeat polymorphisms, have helped ad- the rough counterparts of mammalian conditions but that serve other functions dress these problems since they maxi- germ cells. The yeast counterpart of - in vivo. mize the likelihood that the segment of tilization involves the fusion of a and a These two examples illustrate the the chromosome that bears the disease- cells, a process that is partly mediated by strength of the concept, which is funda- causing can be tracked reliably two peptide , a factor and a mental to the Human Genome Project, from one generation to the next even factor. These hormones are named after that DNA sequence provides the key to when there are many family members the that secretes them. They efficient knowledge transfer between that are missing or must be excluded from trigger a series ofchanges in the opposite model organisms and human biology. At the study because of their uncertain dis- cell type that prepare the cell for fusion. present, this process requires consider- ease status. The mechanisms through which a fac- able serendipity, only because available In addition to improved genetic map- tor and a factor are synthesized and DNA-sequence data on the genomes of ping, successful completion of these secreted have been studied in detail by both the human and the major model projects may require further advances in genetic techniques that are particularly organisms are fragmentary. There has physical mapping and sequencing. Be- well developed in yeast (67). A peculiar been enough progress on the sequencing cause ofthe difficulty ofthe genetic anal- feature of a-factor secretion is its inde- of the genomes of E. coli (74), S. cerevi- ysis, it is unlikely that disease genes such pendence of the pathway through which siae (75), and the nematode worm C. as the recently described one for early- yeast proteins are normally secreted. A elegans (76) to indicate the value of sys- onset familial Alzheimer disease on chro- particular gene, STE6, encodes a protein tematic genomic sequencing. However, mosome 14 (62) will be localized even to that allows a factor to leave the cell while the real work of determining complete within 1 Mbp by genetic mapping. Thus, bypassing the normal secretory pathway. sequences for the genomes of these its isolation will place great demands on Sequence analysis of STE6 revealed un- model organisms still lies ahead. physical mapping resources and tech- mistakable similarity to the human gene In humans, the main new source of niques for locating genes within cloned mdrl (68, 69). This gene has attracted systematic data has come from the se- DNA. The case of Huntington disease, interest because of its involvement in quencing of cDNAs, cloned DNA copies for which ample family resources are multiple-drug resistance, a phenomenon of the messenger RNA (mRNA) mole- available, is instructive: the gene was in which malignant cells become simul- cules that actually direct protein synthe- genetically mapped to a position near the taneously resistant to several of the most sis (ref. 4, pp. 102-104; ref. 77). This end of the short arm of in commonly used chemotherapeutic method is a cost-effective way of discov- 1983 (63) but has not yet been identified agents (70). Once the relatedness ofSTE6 ering new human genes because only a in cloned DNA. Its position, even now, is and mdrl had been established by se- small fraction of genomic DNA directly known only to within 2.5 Mbp (64). quence comparison, it was quickly codes for proteins. However, cDNA se- Still another class of disease genes shown by gene-transfer experiments that quencing is unlikely to replace genomic whose analysis has benefited from new the mouse version of the mdrl gene will sequencing as the definitive method of infrastructure are genes whose disruption actually substitute in yeast for the func- characterizing the complete set ofhuman in somatic cells causes cancer. Particu- tion of STE6, correcting the inability of genes for several reasons: genes contain larly in leukemias and lymphomas, a yeast cells with in the STE6 critical DNA sequences that regulate common mechanism by which disease- gene to secrete a factor (71). The avail- their expression but are not included in causing mutations arise is translocation, ability of the yeast system opens up a the mRNA; there are common instances a process of chromosome breakage and powerful new front for the study of this both in which one gene produces multi- rejoining. The combination of YACs and poorly understood transport mechanism. ple, substantially different mRNA mole- FISH analysis has simplified the mapping Studies of a-factor biosynthesis have cules and in which multiple genes pro- of the chromosomal breakpoints and al- proven equally productive in providing duce nearly identical mRNA molecules, lowed the isolation of genes whose dis- insights into human . While situations that are difficult to sort out Downloaded by guest on September 28, 2021 4342 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) without detailed knowledge of the struc- "The technical problems associated The Human Genome Project has stimu- tures of the corresponding genes and with mapping and sequencing the hu- lated increased interactions between bi- gene families; the cost advantages of man and other genomes are sufficiently ologists and and technologists cDNA sequencing erode when the goal is great that a scientifically sound program who have the necessary expertise to accurate, full-length sequences rather requires a diversified, sustained effort solve these problems. However, the dif- than one-pass, partial sequences; no ad- to improve our ability to analyze com- ficulties of translating these beginnings equate solution has been found to the plex DNA molecules .... Prospects into major improvements in DNA analy- problem that the mRNA products of dif- are ... good that the required ad- sis continue to pose substantial policy ferent genes are present at widely differ- vanced DNA technologies would challenges. ent concentrations, which vary dramati- emerge from a focused effort that em- cally in different tissues, different devel- phasizes pilot projects and technologi- Conclusions opmental stages, and different metabolic cal development." states. For an effort that is only in its third year NIH and DOE have both made vigorous of substantial funding, the Human Ge- Open-Ended Improvements in the efforts to steer a significant portion ofthe nome Project in the United States is Analytical Biochemistry of DNA project in the recommended directions. making good progress toward its central However, there is little indication that goals. The policy on which it was based The promise of DNA-sequence compar- decisive research momentum has devel- has proven farsighted even in the face of ison as a fundamental tool in biological oped in the technology of DNA sequenc- rapid technological change. In the map- research emphasizes the need for pro- ing. It can be argued that the problem is ping goals of the project, which have gressively better methods of DNA anal- predominantly cultural rather than tech- nical, relating to the different value sys- dominated the first years, the experimen- ysis, particularly DNA sequencing. A tems and research emphases of molecu- tal methods that are leading to success critical feature of this challenge is its lar genetics, on the one hand, and ana- have diverged widely from those extant open-ended . DNA sequencing is a lytical chemistry, applied physics, and when the NRC report was issued. None- technology, like digital computing, for engineering, on the other (79). theless, the report's conceptual frame- which there is no obvious point at which An illustration of the magnitude of the work has survived with little alteration. further improvements would saturate po- technical challenge is provided by the gap Examples abound of biological ad- tential applications. A basic misimpres- between the theoretical and actual output vances that have benefited directly from sion about the Human Genome Project is of the current generation of DNA- the early activities ofthe Human Genome that once its narrow goals are met, de- sequencing instruments. Standard com- Project. Precise tracking of the cause- mands for large-scale DNA sequencing mercial instruments now have the capac- and-effect relationships between activi- will taper off. DNA sequence data are ity to produce -30 kilobase pairs (kbp) of ties funded through the Human Genome basically a source of hypotheses, the raw sequence data per day (78). Allowing Project in the United States and specific rigorous testing of which typically re- for the desirability of determining the biological advances is neither possible quires the acquisition of still more DNA sequence ofthe two redundant strands of nor desirable. Human genome analysis is sequence. The determination of a "ref- DNA independently and for some over- a loosely coordinated international en- erence" human sequence will provide a sampling of data for each strand, a ratio deavor to which funding agencies and strong incentive to trace genes through of raw sequence data to finished data of scientists in many countries have already with finer grain than the E. 5:1 should be achievable. Hence, a single made important contributions. Vigorous coli/yeast/worm/fly/mouse/human instrument should be capable of produc- research activity funded through other comparisons on which the NRC commit- ing 6 kbp offinished sequence per day, or Federal programs, private agencies, and tee recommended early emphasis. Fi- =2 Mbp per year. In reality, no genome industry has also had a major impact. nally, the study of individual variation, center has yet produced even 1 Mbp of Nonetheless, NIH and DOE program- which plays a central role both in biology contiguous, finished sequence per year matic efforts, particularly through their and in , poses unbounded de- even though such centers typically have productive investment in YACs, FISH, mands for DNA-sequence data. many sequencing instruments. PCR-detectable DNA polymorphisms, Juxtaposed to this open-ended need for This paradox reflects the present im- and radiation-hybrid mapping, have improvements in the efficiency of DNA possibility of integrating all the steps in clearly achieved good progress toward sequencing is the reality that there has DNA sequencing into a continuous pro- the mapping goals of the NRC report and been no obvious increase in the basic effi- cess that fully utilizes even the capabili- also contributed directly to the success of ciency ofDNA sequencing during the past ties of current sequencing instruments. many other research projects in the bio- decade. The protocols have become more Although this experience is universal medical . robust, and the skill level required for among DNA sequencing laboratories, Like other human endeavors, the Hu- success has been lowered. Fluorescence- there is little consensus about which man Genome Project has succeeded best based methods with real-time detection of steps in the process are rate limiting, when it has aligned itself with broader the products ofDNA-sequencing reactions much less what should be done to im- trends. Examples include its increasing during have eased labora- prove them. reliance on PCR, yeast genetics, and flu- tory management of large projects and What is clear is that there is a dramatic orescence microscopy. It has succeeded decreased the subjectivity ofdata interpre- gap between the advanced biological least when it has tried to establish new tation (ref. 4, pp. 595-598; ref. 78). Hence, technologies of and trends such as the importation of high the practicality of large projects is greater the primitive nonbiological technologies. technology from other areas into biology. now than it was a decade ago. However, it The latter include the physical manipu- This tension is healthy and will undoubt- is not apparent that there has been any lation of samples, methods of chemical edly remain as the project focuses in- change in either the efficiency or the accu- and physical analysis, process design, creased attention on its flagship goal of racy with which an expert DNA sequencer quality control, and information han- determining the sequence of the 990% of can gather data. dling. These areas are all critical to ef- the human genome about which we still The NRC committee recognized this forts to scale up bench-top molecular know almost nothing. problem but was overoptimistic about its genetics, and most are poorly resolution (ref. 1, p. 2): trained to make the needed innovations. Note Added in Proof. The gene that is mutated Downloaded by guest on September 28, 2021 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) 4343

in Huntington disease has now been identified 26. Brownstein, B. H., Silverman, G. A., M., Yu, S., Holman, K., Baker, E., (80). Little, R. D., Burke, D. T., Korsmeyer, Warren, S. T., Schlessinger, D., Suther- S. J., Schlessinger, D. & Olson, M. V. land, G. R. & Richards, R. I. (1991) Sci- 1. National Research Council (1988) Map- (1989) 244, 1348-1351. ence 252, 1711-1714. ping and Sequencing the Human Ge- 27. Anand, R., Ogilvie, D. J., Butler, R., 50. Oberle, I., Rousseau, F., Heitz, D., nome (Natl. Acad. Press, Washington, Riley, J. H., Finniear, R. S., Powell, Kretz, C., Devys, D., Hanauer, A., DC). S. J., Smith, J. C. & Markham, A. F. Boue, J., Bertheas, M. F. & Mandel, 2. Smith, H. 0. & Wilcox, K. W. (1970) J. (1991) 9, 124-130. J. L. (1991) Science 252, 1097-1102. Mol. Biol. 51, 379-391. 28. Silverman, G. A., Jockel, J. I., Domer, 51. Kinzler, K. W., Nilbert, M. C., Su, 3. Smith, H. 0. (1979) Science 205, 455- P. H., Mohr, R. M., Taillon-Miller, P. & L. K., Vogelstein, B., Bryan, T. M. et 462. Korsmeyer, S. J. (1991) Genomics 9, al. (1991) Science 253, 661-665. 4. Watson, J. D., Gilman, M., Witkowski, 219-228. 52. Groden, J., Thliveris, A., Samowitz, W., J. & Zoller, M. (1992) Recombinant DNA 29. Little, R. D., Pilia, G., Johnson, S., Carlson, M., Gelbert, L. et al. (1991) Cell (Freeman, New York), 2nd Ed. D'Urso, M. & Schlessinger, D. (1992) 66, 589-600. 5. Nathans, D. (1979) Science 206, 903-909. Proc. Natl. Acad. Sci. USA 89, 177-181. 53. Fu, Y. H., Pizzuti, A., Fenwick, R. G., 6. Botstein, D., White, R. L., Skolnick, M. 30. Chumakov, I., Rigault, P., Guillou, S., Jr., King, J., Rajnarayan, S., Dunne, & Davis, R. W. (1980) Am. J. Hum. Ougen, P., Billaut, A. et al. (1992) Nature P. W., Dubel, J., Nasser, G. A., Ash- Genet. 32, 314-331. (London) 359, 380-387. izawa, T., de Jong, P., Wieringa, B., 7. Southern, E. M. (1975) J. Mol. Biol. 98, 31. Foote, S., Vollrath, D., Hilton, A. & Korneluk, R., Perryman, M. B., Ep- 503-517. Page, D. C. (1992) Science 258, 60-66. stein, H. F. & Caskey, C. T. (1992) Sci- 8. Donis-Keller, H., Green, P., Helms, C., 32. Gall, J. G. & Pardue, M. L. (1969) Proc. ence 255, 1256-1258. Cartinhour, S., Weiffenbach, B. et al. Natl. Acad. Sci. USA 63, 378-383. 54. Legouis, R., Hardelin, J. P., Levilliers, (1987) Cell 51, 319-337. 33. Landegent, J. E., Jansen in de Wal, N., J., Claverie, J. M., Compain, S., 9. Sanger, F., Nicklen, S. & Coulson, A. R. van Ommen, G.-J. B., Baas, F., de Vi- Wunderle, V., Millasseau, P., Le Paslier, (1977) Proc. Natl. Acad. Sci. USA 74, jlder, J. J. M., van Duijn, P. & van der D., Cohen, D., Caterina, D., Bouguel- 5463-5467. Ploeg, M. (1985) Nature (London) 317, eret, L., Delemarre-Van de Waal, H., 10. Maxam, A. M. & Gilbert, W. (1977) 175-177. Lutfalla, G., Weissenbach, J. & Petit, C. Proc. Natl. Acad. Sci. USA 74, 560-564. 34. Landegent, J. E., Jansen in de Wal, N., (1991) Cell 67, 423-435. 11. Baer, R., Bankier, A. T., Biggin, M. D., Dirks, R. W., Baas, F. & van der Ploeg, 55. Franco, B., Guioli, S., Pragliola, A., Deininger, P. L., Farrell, P. J., Gibson, M. (1987) Hum. Genet. 77, 366-370. Incerti, B., Bardoni, B., Tonlorenzi, R., T. J., Hatfull, G., Hudson, G. S., Satch- 35. Lawrence, J. B., Villnave, C. A. & Carrozzo, R., Maestrini, E., Pieretti, M., well, S. C., Seguin, C., Tuffnell, P. S. & Singer, R. H. (1988) Cell 52, 51-61. Taillon-Miller, P., Brown, C. J., Willard, Barrell, B. G. (1984) Nature (London) 36. Trask, B., Pinkel, D. & van den Engh, G. H. F., Lawrence, C., Persico, M. G., 310, 207-211. (1989) Genomics 5, 710-717. Camerino, G. & Ballabio, A. (1991) Na- 12. Alberts, B., Bray, D., Lewis, J., Raff, 37. Lichter, P., Tang, C.-J. C., Call, K., ture (London) 353, 529-536. M., Roberts, K. & Watson, J. D. (1989) Hermanson, G., Evans, G. A., Hous- 56. Attree, O., Olvios, I. M., Okabe, I., Bai- of the Cell (Garland, man, D. & Ward, D. (1990) Science 247, ley, L., Nelson, D. L., Lewis, R. A., New York), 2nd Ed. 64-69. McInnes, R. R. & Nussbaum, R. L. 13. Fink, G. R. (1988) Genetics 118, 549- 38. Stallings, R. L., Torney, D. C., Hilde- (1992) Nature (London) 358, 239-242. 550. brand, C. E., Longmire, J. L., Deaven, 57. Vulpe, C., Levinson, B., Whitney, S., 14. Beaucage, S. L. & Caruthers, M. H. L. L., Jett, J. H., Doggett, N. A. & Packman, S. & Gitschier, J. (1993) Na- (1981) Tetrahedron Lett. 22, 1859-1862. Moyzis, R. K. (1990) Proc. Natl. Acad. ture Genet. 3, 7-13. 15. Caruthers, M. H. (1985) Science 230, Sci. USA 87, 6218-6222. 58. Chelly, J., Tumer, Z., Tonnesen, T., 281-285. 39. Tynan, K., Olsen, A., Trask, B., de Petterson, A., Ishikawa-Brush, Y., Tom- 16. Hunkapiller, M., Kent, S., Caruthers, Jong, P., Thompson, J., Zimmermann, merup, N., Horn, N. & Monaco, A. P. M., Dreyer, W., Firca, J., Giffin, C., W., Carrano, A. & Mohrenweiser, H. (1993) Nature Genet. 3, 14-19. Horvath, S., Hunkapiller, T., Tempst, P. (1992) Nucleic Acids Res. 20, 1629-1636. 59. Mercer, J. F. B., Livingston, J., Hall, & Hood, L. (1984) Nature (London) 310, 40. Cox, D. R., Burmeister, M., Price, B., Paynter, J. A., Begy, C., Chan- 105-111. E. R., Kim, S. & Myers, R. M. (1990) drasekharappa, S., Lockhart, P., 17. Saiki, R. K., Scharf, S., Faloona, F., Science 250, 245-250. Grimes, A., Bhave, M., Siemieniak, D. Mullis, K. B., Horn, G. T., Erlich, 41. Weber, J. L. & May, P. E. (1989) Am. J. & Glover, T. W. (1993) Nature Genet. 3, H. A. & Arnheim, N. (1985) Science 230, Hum. Genet. 44, 388-396. 20-25. 1350-1354. 42. Litt, M. & Luty, J. A. (1989) Am. J. 60. Hall, J. M., Lee, M. K., Newman, B., 18. Saiki, R. K., Gelfand, D. H., Stoffel, S., Hum. Genet. 44, 397-401. Morrow, J. E., Anderson, L. A., Huey, Scharf, S. J., Higuchi, R., Horn, G. T., 43. NIH/CEPH Collaborative Mapping B. & King, M.-C. (1990) Science 250, Mullis, K. B. & Erlich, H. A. (1988) Group (1992) Science 258, 67-86. 1684-1689. Science 239, 487-491. 44. Weissenbach, J., Gyapay, G., Dib, C., 61. Hall, J. M., Friedman, L., Guenther, C., 19. Olson, M., Hood, L., Cantor, C. & Bot- Vignal, A., Morissette, J., Millasseau, Lee, M. K., Weber, J. L., Black, D. M. stein, D. (1989) Science 245, 1434-1435. P., Vaysseix, G. & Lathrop, M. (1992) & King, M.-C. (1992) Am. J. Hum. 20. Coulson, A., Sulston, J., Brenner, S. & Nature (London) 359, 794-801. Genet. 50, 1235-1242. Karn, J. (1986) Proc. Natl. Acad. Sci. 45. Dietrich, W., Katz, H., Lincoln, S. E., 62. Schellenberg, G. D., Bird, T. D., Wijs- USA 83, 7821-7825. Shin, H. S., Friedman, J., Dracopoli, man, E. M., Orr, H. T., Anderson, L., 21. Olson, M. V., Dutchik, J. E., Graham, N. C. & Lander, E. S. (1992) Genetics Nemens, E., White, J. A., Bonnycastle, M. Y., Brodeur, G. M., Helms, C., 131, 423-447. L., Weber, J. L., Elisa Alonso, M., Pot- Frank, M., MacColin, M., Scheinman, 46. Collins, F. S. (1992) Nature Genet. 1, ter, H., Heston, L. L. & Martin, G. M. R. & Frank, T. (1986) Proc. Natl. Acad. 3-6. (1992) Science 258, 668-671. Sci. USA 83, 7826-7830. 47. Rommens, J. M., Iannuzzi, M. C., 63. Gusella, J. F., Wexler, N. S., Con- 22. Kohara, Y., Akiyama, K. & Isono, K. Kerem, B. S., Drumm, M. L., Melmer, neally, P. M., Naylor, S. L., Anderson, (1987) Cell 50, 495-508. G., Dean, M., Rozmahel, R., Cole, J. L., M. A., Tanzi, R. E., Watkins, P. C., Ot- 23. Green, E. D. & Olson, M. V. (1990) Sci- Kennedy, D., Hidaka, N., Zsiga, M., tina, K., Wallace, M. R., Sakaguchi, ence 250, 94-98. Buchwald, M., Riordan, J. R., Tsui, A. Y., Young, A. B., Shoulson, I., Be- 24. Burke, D. T., Carle, G. F. & Olson, L. C. & Collins, F. S. (1989) Science nilla, E. & Martin, J. B. (1983) Nature M. V. (1987) Science 236, 806-812. 245, 1059-1065. (London) 306, 234-238. 25. Coulson, A., Kozono, Y., Lutterbach, 48. Verkerk, A. J. M. H., Pieretti, M., Sut- 64. Davies, K. (1992) Nature (London) 357, B., Shownkeen, R., Sulston, J. & Wa- cliffe,J. S., Fu,Y. H., Kuhl, D. P. A. et page before 95. terston, R. (1991) BioEssays 13, 413- al. (1991) Cell 65, 905-914. 65. Diabali, M., Selleri, L., Parry, P., 417. 49. Kremer, E. J., Pritchard, M., Lynch, Bower, M., Young, B. D. & Evans, Downloaded by guest on September 28, 2021 4344 Review: Olson Proc. Natl. Acad. Sci. USA 90 (1993) G. A. (1992) Nature Genet. 2, 113- & Thomas, D. Y. (1992) Science 256, Hawkins, T., Ainscough, R. & Water- 118. 232-234. ston, R. (1992) Nature (London) 356, 66. Ziemin-van der Poel, S., McCabe, N. R., 72. Fuller, R. S., Brake, A. J. & Thorner, J. 37-41. Gill, H. J., Espinosa, R., III, Patel, Y., (1989) Science 246, 482-486. 77. Adams, M. D., Kelley, J. M., Gocayne, Harden, A., Rubinelli, P., Smith, S. D., 73. Barr, P. J. (1991) Cell 66, 1-3. J. D., Dubnick, M., LeBeau, M. M., Rowley, J. D. & Diaz, 74. Daniels, D. L., Plunkett, G., III, Bur- Polymeropoulos, M. 0. (1991) Proc. Nati. Acad. Sci. USA land, V. & Blattner, F. R. (1992) Science M. H., Xiao, H., Merril, C. R., Wu, A., 88, 10735-10739. 257, 771-778. Olde, B., Moreno, R. F., Kerlavage, 67. Botstein, D. & Fink, G. R. (1988) Sci- 75. Oliver, S. G., van der Aart, Q. J. M., A. R., McCombie, W. R. & Venter, ence 240, 1439-1443. Agostoni-Carbone, M. L., Aigle, M., Al- J. C. (1991) Science 252, 1651-1656. 68. Kuchler, K., Sterne, R. E. & Thorner, J. berghina, L. et al. (1992) Nature (Lon- 78. Hunkapiller, T., Kaiser, R. J., Koop, (1989) EMBO J. 8, 3973-3984. don) 357, 38-46. B. F. & Hood, L. (1991) Science 254, 69. McGrath, J. P. & Varshavsky, A. (1989) 76. Sulston, J., Du, Z., Thomas, K., Wilson, 59-67. Nature (London) 340, 400-404. R., Hillier, L., Staden, R., Halloran, N., 79. Olson, M. V. (1991) Anal. Chem. 63, 70. Gottesman, M. M. & Pastan, I. (1988) J. Green, P., Thierry-Mieg, J., Qiu, L., 416A-420A. Biol. Chem. 263, 12163-12166. Dear, S., Coulson, A., Craxton, M., 80. The Huntington's Disease Collaborative 71. Raymond, M., Gros, P., Whiteway, M. Durbin, R., Berks, M., Metzstein, M., Research Group (1993) Cell 72, 971-983. Downloaded by guest on September 28, 2021