Genomics, Proteomics, and the New Paradigm in Biomedical Research
Total Page:16
File Type:pdf, Size:1020Kb
November/December 2002 ⅐ Vol. 4 ⅐ No. 6, Supplement Genomics, proteomics, and the new paradigm in biomedical research Joshua LaBaer, MD, PhD This article is based on the keynote address that introduced the third biennial Asan Medical Center-Harvard Medical International Symposium “Genetics and Proteomics: Impact on Medicine and Health” that took place in Seoul, Korea, July 3–4, 2001. In his address, the author summarized exciting achievements in the field of genomics and introduced the related and emerging field of proteomics. By using industrialized high-throughput approaches, genomics and proteomics are dramatically accelerating the pace of biological research. They have started a scientific revolution whose impact will range from elucidating the structure of our chromosomes to providing powerful new tools for the study of disease; and from understanding human evolutionary history to novel applications in the medicine of the future. The author’s overview highlighted the recent history of the two fields and laid the foundation for the rest of the symposium presentations. Genet Med 2002:4(6, Supplement):2S–9S. Key Words: genomics, proteomics, biomedical research The end of the beginning component of disease, help us to understand its pathophysiol- The announcement of the completion of the draft sequenc- ogy, and identify potential targets for treatment. ing of the human genome in the spring of 2001 signaled a A second key factor was the development of tools that sug- watershed event in biology.1,2 The information provided by the gested the feasibility of the project. By the mid-1980s, the Human Genome Project (HGP), along with the powerful tech- Sanger sequencing method,5 along with some technical im- nologies developed in the process of completing it, has already provements in the enzymology of DNA polymerases6 and the altered dramatically the manner in which biomedical research labeling of the nucleotides,7 had advanced to the point at which is performed. sequencing advocates could dream of completing the entire The HGP began in the mid-1980s at the US Department of human genome. Even with these tools, however, it was by no Energy as the result of a confluence of factors.3,4 First, from a means obvious that this was achievable. medical perspective, there was increasing evidence for the im- After all, if the genome is considered the Book of Life, it is a portance of genetics in virtually all diseases. It had long been big book. There are more than 3 billion letters in the human recognized that on one end of a spectrum, diseases like cystic genome. At the time the project was conceived, typical se- fibrosis are almost entirely caused by a genetic factor. Simply quencing read lengths were in the 200–300 base range. So with the presence of a particular mutation in both of the alleles of a a simple calculation of the number of needed reads and the single gene leads to the disease. On the other end of that spec- amount of computing power needed to handle the data (not to trum, infectious diseases, such as the acquired immunodefi- mention special technical problems such as repeated sequenc- ciency syndrome, are predominantly caused by environmental es), it is not surprising that many argued that it would be an factors. Yet even for these environmentally caused diseases, inappropriate use of research money to take on this project and genetically determined host factors affect the individual’s re- perhaps mere folly altogether. sponse to the illness. The majority of human diseases falls Nevertheless, the draft version of the human sequence has somewhere between these extremes. Diseases like diabetes and been completed far ahead of the many decades originally an- heart disease comprise a mix of both genetic and environmen- ticipated. More than 20 different major sequencing centers and tal factors, and the genetic component is often complex, in- hundreds of scientists participated in this project. In the final volving many genes. By the mid-1980s it was clear that DNA stages of the project, centers were sequencing 24 hours a day, 7 sequence could expedite our understanding of the genetic days a week, all across the globe. As Francis Collins put it, “The sun never set on the Human Genome Project.”8 The HGP was not always a high-throughput sequencing From the Institute of Proteomics, Harvard Medical School, Boston, Massachusetts. project. That aspect of its operations did not begin until late in Joshua LaBaer, MD, PhD, Director, Institute of Proteomics, Harvard Medical School, 250 the 1990s after an early phase during which genetic maps and Longwood Avenue, BCMP, Boston, MA 02115. technologies were developed that were essential to the high- Received: August 12, 2002. throughput sequencing that occurred at the end. This may be Accepted: September 24, 2002. an important lesson for us as we move forward into the “Pro- DOI: 10.1097/01.GIM.0000041504.37108.75 teomic Era.” At the beginning, much time will be spent devel- 2S Genetics IN Medicine Genomics, proteomics, and the new paradigm oping new methods, new computational tools, and mapping Genomic structure and function out the projects that need to be accomplished. Moreover, the Among the first lessons learned from the genome sequenc- HGP has shown us that the technologies themselves are at least ing was the level of heterogeneity in the genome. The image as powerful as the data collected. New tools, such as DNA from the publication of draft sequencing by the International microarrays and transcriptional profiling, serial analysis of Human Genome Sequencing Consortium (Fig. 1) is a complex gene expression, and haplotype mapping, will prove to revolu- image, but one that illustrates several of these heterogeneous tionize the way that important genes are identified and diseases features. This part of the short arm of chromosome 7 is char- are characterized. acterized in detail as indicated by the abundance in the line Although the data are increasing rapidly, the sequencing of marked “Coverage,” which denotes “finished” sequence.2 the human genome is not yet complete. As of this printing, Genetic variation also appears to be heterogeneous across more than 98.5% of the genome has been sequenced in draft the genome. Single-nucleotide polymorphisms, or SNPs, indi- form, and about 91% has reached “finished” quality (http:// cate positions in the genome where there are common single www.ncbi.nlm.nih.gov/genome/seq/). Several chromosomes base genetic differences in the population. Peaks in this plot are either completed or are near completed, including 20, 21, indicate places where SNPs are common in the genome and, as 22, 10, 13, 14, 19, 6, and 7. indicated, some areas contain frequent polymorphisms Lessons from the genome whereas other areas much less so. Gene density is also heterogeneous in the genome. As many Although the HGP will impact all aspects of biology, some genes have not yet been identified, gene density has been esti- areas in particular will be directly influenced as indicated in mated by four different techniques. One method for estimat- Table 1. Not surprisingly, advances in our understanding of ing the presence of genes is to look for regions of homology genome structure and function and human evolution are the with the genome of the pufferfish T. nigroviridis, indicated with two disciplines most immediately affected by the genome the line marked “Exofish.” This animal is evolutionarily distant project. The manuscripts describing the draft sequencing enough from humans that conserved regions are likely to in- dwelt in these areas at length. Moreover, there has been an dicate genes. A second method of gene prediction is to evaluate unprecedented acceleration in the number of papers published the frequency of expressed sequence tags (ESTs), which is a about human evolution and genome structure in the last sev- measure of RNA species with poly-A tails and thus should cor- eral years. The sequence is a rich information store that will be relate with genes in the genome. The line marked “EST” indi- mined for years, becoming even more fruitful as additional cates areas where ESTs with at least one intron mapped to vertebrate genomes are completed, such as the mouse, the rat, genomic DNA. Third, the starts of genes are also marked. The and the dog. line marked “Genes” indicates the beginnings of genes as pre- By providing a foundation upon which discovery can occur, dicted by at least one of two gene-predicting software applica- the genome sequence is also impacting the daily execution of tions called Genie and Ensembl, and the line marked “Known biomedical research. This is occurring in two key areas: first, genes” indicates the starts of genes in the RefSeq database. Fi- genetic approaches are used to create linkages that demon- nally, the dinucleotide CpG occurs much less frequently in the strate genes that play a role in the etiology of disease; and sec- human genome than would be predicted by the known frac- ond, by providing a template that can be used to produce and tion of Cs and Gs. However, areas relatively rich in this dinu- study the gene products themselves in order to understand the cleotide, called CpG islands, are found in the genome, often in biochemical mechanisms that play a direct role in the causa- tion and treatment of disease. Finally, on the coming horizon, the genome will have a large effect on the practice of medicine. These profound changes will also demand a careful consideration of the public policy and ethics regarding the use of genetic information. Table 1 Areas of biology influenced by the Human Genome Project Lessons from the genome Genome structure and function Human evolutionary past Advances in biomedical research Genetic approaches Biochemical and functional approaches The practice of medicine Fig.