The Human Genome Project: big science transforms biology and medicine

Leroy Hood* and Lee Rowen*

Institute for Systems Biology, 401 Terry Ave N., Seattle, WA 98109, USA
* Correspondence: [email protected]; [email protected]
Abstract

The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

Origins of the human genome project

The Human Genome Project (HGP) has profoundly changed biology and is rapidly catalyzing a transformation of medicine [1-3]. The idea of the HGP was first publicly advocated by Renato Dulbecco in an article published in 1984, in which he argued that knowing the human genome sequence would facilitate an understanding of cancer [4]. In May 1985 a meeting focused entirely on the HGP was held, with Robert Sinsheimer, the Chancellor of the University of California, Santa Cruz (UCSC), assembling 12 experts to debate the merits of this potential project [5]. The meeting concluded that the project was technically possible, although very challenging. However, there was controversy as to whether it was a good idea, with six of those assembled declaring themselves for the project and six against (and those against felt very strongly). The naysayers argued that big science is bad science because it diverts resources from the 'real' small science (such as single-investigator science); that the genome is mostly junk that would not be worth sequencing; that we were not ready to undertake such a complex project and should wait until the technology was adequate for the task; and that mapping and sequencing the genome was a routine and monotonous task that would not attract appropriate scientific talent. Throughout the early years of advocacy for the HGP (mid- to late 1980s) perhaps 80% of biologists were against it, as was the National Institutes of Health (NIH) [6]. The US Department of Energy (DOE) initially pushed for the HGP, partly using the argument that knowing the genome sequence would help us understand the radiation effects on the human genome resulting from exposure to atom bombs and other aspects of energy transmission [7]. This DOE advocacy was critical to stimulating the debate and ultimately the acceptance of the HGP. Curiously, there was more support from the US Congress than from most biologists. Those in Congress understood the appeal of international competitiveness in biology and medicine, the potential for industrial spin-offs and economic benefits, and the potential for more effective approaches to dealing with disease. A National Academy of Sciences committee report endorsed the project in 1988 [8] and the tide of opinion turned: in 1990, the program was initiated, with the finished sequence published in 2004 ahead of schedule and under budget [9].

What did the human genome project entail?

This 3-billion-dollar, 15-year program evolved considerably as genomics technologies improved. Initially, the HGP set out to determine a human genetic map, then a physical map of the human genome [10], and finally the sequence map. Throughout, the HGP was instrumental in pushing the development of high-throughput technologies for preparing, mapping and sequencing DNA [11]. At the inception of the HGP in the early 1990s, there was optimism that the then-prevailing sequencing technology would be replaced. This technology, now called 'first-generation sequencing', relied on gel electrophoresis to create sequencing ladders, and radioactive- or fluorescent-based labeling strategies to perform base calling [12]. It was considered to be too cumbersome and low throughput for efficient genomic sequencing. As it turned out, the initial human genome reference sequence was deciphered using a 96-capillary (highly parallelized) version of first-generation technology. Alternative approaches such as multiplexing [13] and sequencing by hybridization [14] were attempted but not effectively scaled up. Meanwhile, thanks to the efforts of biotech companies, successive incremental improvements in the cost, throughput, speed and accuracy of first-generation automated fluorescent-based sequencing strategies were made throughout the duration of the HGP.

Because biologists were clamoring for sequence data, the goal of obtaining a full-fledged physical map of the human genome was abandoned in the later stages of the HGP in favor of generating the sequence earlier than originally planned. This push was accelerated by Craig Venter's bold plan to create a company (Celera) for the purpose of using a whole-genome shotgun approach [15] to decipher the sequence instead of the piecemeal clone-by-clone approach using bacterial artificial chromosome (BAC) vectors that was being employed by the International Consortium. Venter's initiative prompted government funding agencies to endorse production of a clone-based draft sequence for each chromosome, with the finishing to come in a subsequent phase. These parallel efforts accelerated the timetable for producing a genome sequence of immense value to biologists [16,17].

As a key component of the HGP, it was wisely decided to sequence the smaller genomes of significant experimental model organisms such as yeast, a small flowering plant (Arabidopsis thaliana), worm and fruit fly before taking on the far more challenging human genome. The efforts of multiple centers were integrated to produce these reference genome sequences, fostering a culture of cooperation. There were originally 20 centers mapping and sequencing the human genome as part of an international consortium; in the end five large centers (the Wellcome Trust Sanger Institute, the Broad Institute of MIT and Harvard, The Genome Institute at Washington University in St Louis, the Joint Genome Institute, and the Whole Genome Laboratory at Baylor College of Medicine) emerged from this effort, with these five centers continuing to provide genome sequence and technology development. The HGP also fostered the development of mathematical, computational and statistical tools for handling all the data it generated.

The HGP produced a curated and accurate reference sequence for each human chromosome, with only a small number of gaps, and excluding large heterochromatic regions [9]. In addition to providing a foundation for subsequent studies in human genomic variation, the reference sequence has proven essential for the development and subsequent widespread use of second-generation sequencing technologies, which began in the mid-2000s. Second-generation cyclic array sequencing platforms produce, in a single run, up to hundreds of millions of short reads (originally approximately 30 to 70 bases, now up to several hundred bases), which are typically mapped to a reference genome at highly redundant coverage [19]. A variety of cyclic array sequencing strategies (such as RNA-Seq, ChIP-Seq and bisulfite sequencing) have significantly advanced biological studies of transcription and gene regulation as well as genomics, progress for which the HGP paved the way.
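To give a rough sense of what 'highly redundant coverage' means, the expected depth of coverage of a sequencing run can be estimated from the number of reads, the read length and the genome size (coverage = number of reads × read length / genome size). The short Python sketch below is an illustrative example added here, not part of the original article; the run sizes it uses are assumed, round numbers rather than figures from any specific platform.

    def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
        """Average fold-coverage of a genome: (reads x read length) / genome size."""
        return num_reads * read_length / genome_size

    if __name__ == "__main__":
        # Assumed, illustrative figures: one second-generation run yielding
        # 300 million 100-base reads, mapped against the ~3.1 Gb human reference.
        reads = 300_000_000
        read_len = 100
        human_genome_bp = 3_100_000_000
        depth = expected_coverage(reads, read_len, human_genome_bp)
        print(f"Average coverage: {depth:.1f}x")  # roughly 9.7x for these numbers

In practice, 'highly redundant' whole-genome sequencing commonly targets average depths of around 30-fold or more, which is why multiple runs or lanes are typically combined for a single genome.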
Impact of the human genome project on biology and technology

First, the human genome sequence initiated the comprehensive discovery and cataloguing of a 'parts list' of most human genes [16,17], and by inference most human proteins, along with other important elements such as non-coding regulatory RNAs. Understanding a complex biological system requires knowing the parts, how they are connected, their dynamics and how all of these relate to function [20]. The parts list has been essential for the emergence of 'systems biology', which has transformed our approaches to biology and medicine [21,22].

As an example, the ENCODE (Encyclopedia Of DNA Elements) Project, launched by the NIH in 2003, aims to discover and understand the functional parts of the genome [23]. Using multiple approaches, many based on second-generation sequencing, the ENCODE Project Consortium has produced voluminous and valuable data related to the regulatory networks that govern the expression of genes [24]. Large datasets such as those produced by ENCODE raise challenging questions regarding genome functionality. How can a true biological signal be distinguished from the inevitable biological noise produced by large datasets [25,26]? To what extent is the functionality of individual genomic elements