Commentary

The Human Genome Revealed

James D. Watson
President, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA

Seeing the International Sequencing Consortium’s draft of the human genome is highly satisfying. The way in which its 3 billion bases have been determined closely followed the course outlined more than a decade ago by the National Academy of Sciences (NAS) Committee on “Mapping and Sequencing the Human Genome.” Bruce Alberts, now the President of the NAS, was its chairman and I one of its 14 other members. The predictions in our 1988 report, that the human genome could be sequenced over a 15-year period for a cost of three billion dollars, were more accurate than we dared guess. Two more years of work, to fill in gaps and correct mistakes, will result in an almost errorless genetic script for human existence.

That the human script would become available within our lifetimes never passed through my mind or that of Francis Crick when we found the double helix in 1953. At that time, just learning how cells read the genetic instructions within DNA seemed a tall order. Happily, progress was faster than expected, and by 1966 we knew how the genetic code utilizes groups of three DNA bases to specify the amino acid constituents of proteins, the main “actors” in the plays of life.

Things sped up even more after the recombinant DNA procedures of Stanley Cohen and Herb Boyer burst upon the scene in 1973. Gene cloning and manipulation metamorphosed from being dreams to becoming facts of life. Simultaneously, Fred Sanger and Walter Gilbert each developed a powerful way to determine the order of bases along DNA molecules. This meant that humans, like cells, could read the messages of genes. The way was open to ascertain the complete genetic instructions, i.e., to sequence the genome, of any organism (subject to the usual constraints of money, personnel, and technology).

The first tackled were those of viruses, with the first sequenced viral genomes containing only several thousand bases. By the early 1980s, viral genomes containing more than 100,000 bases had been completed, and bacterial genomes containing more than a million bases became realistic objectives. Completion of such genomes would at last tell us the number of different proteins necessary for bacterial existence. Back then I thought that the human genome, at several billion bases long, was much, much too large to take on. Soon, however, I became a strong proponent of an internationally-based Human Genome Project (HGP), believing that the large-scale mapping and sequencing resources that it would command would greatly hasten our discovery of the genetic underpinnings of many important human diseases.

Our NAS committee wasted little time on whether we needed an HGP; instead we focused on how it should be organized and financed. It seemed best to begin modestly and end with a sequencing crescendo, hopefully fueled by much lower sequencing costs. We agreed unanimously that the first big sequencing efforts should not focus on human DNA but on DNA from a model organism of genetics, such as baker’s yeast and the fruit fly, Drosophila. We knew that many human genes were likely to be homologous to those of model organisms, and these provided good systems for studying gene function.

That we proposed a 15-year effort reflected our belief that those starting the project should also be part of the finishing team. Richard Gibbs, Eric Lander, Maynard Olson, John Sulston, Bob Waterston, and Jean Weissenbach all have stayed the course, running increasingly larger megabase sequencing labs. Only one of our original NAS committee is no longer in science. Sadly, Dan Nathans died of leukemia three years ago, at the age of 70. During our committee deliberations, no one proposed a shorter time frame: technology had to improve too much. Later, I learned that Congress likes big projects to be finished within 10 years so that key initial backers are still in Washington when the achievement is celebrated. Luckily, Tom Harkin recently became that Congress rarity: a three-term Democratic senator from Iowa. So, like New Mexico’s Republican Pete Domenici, he will see the HGP from its beginnings to its finish as a senator.

The improvements in technology that the HGP would need for its success materialized almost on schedule. They largely involved modifications in pre-existing methods, as opposed to great leaps forward that generate Nobel Prize-like rewards. The current DNA sequencing machines, the workhorses of our big sequencing labs, are 1000-fold-improved descendants of the original sequencing machine put together by Mike Hunkapiller and Lloyd Smith in Lee Hood’s Caltech lab. The computers and software that now compare new raw DNA sequences to pre-existing ones also do their tasks 1000 times faster than was possible when the HGP began.

A major obstacle to the correct assembly of the human genome was the vast amount of repetitive DNA (∼50%). So the HGP labs decided early on to sequence DNA coming from known chromosomal locations. Their map-based strategy, however, was suddenly challenged in May 1998 by the new private company Celera Genomics, led by Craig Venter. Celera proposed an alternative strategy whereby the genome was randomly shredded into pieces that were sequenced and then reassembled in a single process without the construction of a map, a strategy known as “whole-genome shotgun sequencing.” The key to their approach was to be the 200 new, high-capacity capillary DNA sequencers that were about to be launched in the market, as well as new proprietary shotgun assembly software for use on high-powered computers. So armed, Celera promised a first draft of the human genome in only two years.

I first heard of Celera in a telephone call from my former associate, Richard Roberts, who organized the first (1988) Cold Spring Harbor meeting on Genome Mapping and Sequencing. Rich told me that Celera would blow the international consortium out of the water and asked me to consider joining him on its scientific advisory board. Expecting to learn more about Celera’s game plan at our soon-to-be-held spring 1998 Genome Meeting, I quickly phoned the National Institutes of Health (NIH) Genome Office and the Wellcome Trust to report that Celera had marked them out for obsolescence. Later that week, Craig Venter visited the NIH to tell Harold Varmus and […] that the HGP’s future effort might best be devoted to sequencing the mouse.

From the moment of Rich Roberts’s call, I found it unthinkable that a private company should effectively control much of the human genome through key patents. This was a gene power-play that, at all costs, must be contained. To my relief, the Wellcome Trust’s immediate response was to double the budget for human genome sequencing at the Sanger Centre. Although the merits of each approach were yet to be tested, Celera’s “super shotgun” method quickly caught the fancy of the serious press, who reported that the HGP was off course. In fact, two years earlier at its spring 1996 Bermuda meeting, HGP leaders had seriously discussed Jim Weber’s proposal for a low-resolution, whole-genome shotgun effort to complement the high-resolution map-based thrust. There, Phil Green’s off-the-cuff calculations, later redone and published (Green 1997), indicated that human DNA is too repetitive for a pure shotgun approach to assemble the genome correctly.

In September 1998, I returned to Washington to tell key congressional leaders that expanded federal support of the publicly-funded sequencing effort was necessary to prevent a monopoly on human genetic information. Much of “big pharma” rooted for the public HGP, believing that Celera’s future databases could only be validated through checking with publicly obtained sequences. To my relief, Congress increased public sequencing monies significantly. Thus encouraged, the HGP announced that it, like Celera, would complete a rough draft of the human genome in the spring of 2000. But unlike Celera, it would further pursue a highly accurate final product.

The February 2001 publications of the human genome by the HGP (International Human Genome Sequencing Consortium 2001) and Celera (Venter et al. 2001) represent a milestone in human history, revealing the basic features of the human genetic script. They will allow us to identify most of the genes that underlie human existence. Using the genetic code to translate their message into protein products, we now have the first comprehensive overview of the molecules that make up our bodies. And it is immediately obvious that these are very similar to the molecular building blocks of other forms of life. Darwinian evolution can be increasingly described through incremental changes in underlying DNA scripts.

It is, however, unclear whether either draft is accurate enough for confident protein structure predictions. In fact, proteome predictions from the two human drafts may be seriously misleading; only a virtually errorless “gold standard” human DNA script will move us confidently into proteome waters. That so much more sequencing needs to be done, however, should in no way lessen our admiration for what both groups have accomplished.

Until we saw the first DNA scripts underlying multicellular existence, it seemed natural that increasing organismal complexity would involve corresponding increases in gene numbers. So, I and virtually all of my scientific peers were surprised last year when the number of genes of the fruit fly, Drosophila melanogaster, was found to be much lower than that of a less complex animal, the roundworm Caenorhabditis elegans (13,500 vs. 18,500). More shocking still was the recent finding that the small mustard plant, Arabidopsis thaliana, contains many thousand more genes (∼28,000) than does C. elegans. Now we are jolted again by the conclusion that the number of human genes may not be much more than 30,000. Until a year ago, I anticipated that human existence would require 70,000–100,000 genes.

Why organismal complexity fails to correlate with gene numbers is not fully clear. It may be partly due to RNA splicing events, which generate multiple protein products from single genes: Vertebrate genes give rise to more splicing products than do invertebrate genes. But equally relevant may be the quality of respective nervous systems. The roundworm, being dumber than the fruit fly, may need more specific proteins (and therefore genes) to respond to enemies or changes in its environment; the fruit fly’s more advanced nervous system lets it respond to potential enemies and stresses by flying away. Plants, being totally dumb, must continually evolve new genes to respond to new enemies and climatic changes.

Many more vertebrate genomes need to be sequenced before we have a sense of how often the generation of new genes has underlain evolutionary change. We also need to know why vertebrate genomes contain so many more repetitive sequences than do invertebrate genomes. Most human repetitive sequences appear to have arisen as the result of the generation and movement of transposable genetic elements. Conceivably, many of the mutations that underlie vertebrate evolution arise from transposon movement into regulatory regions, thus changing gene expression patterns. The very high levels of repetitive DNA in amphibians and lungfish may reflect their past needs to evolve fast for survival in their ever-changing ecological niches.

It should be possible to test the idea that changes in regulatory segments, as opposed to changes in amino acid coding segments, have dominated vertebrate evolution. For example, sequencing information from morphologically different breeds of dog may be informative, and hopefully funds will be made available to produce draft genomes of several breeds. How soon we shall be able to meaningfully compare the chimpanzee genome with our own remains unclear. Obviously we would like to know the genetic changes that make possible the larger and more powerful human brain.

Of the many new facts emerging from the human genome draft, I am most excited by the finding that repetitive sequences are almost absent from the four clusters of homeobox genes. Unlike most functionally-related human genes, the chromosomal order of homeobox genes reflects their temporal expression patterns during embryonic development. In this respect, they resemble the genes of bacterial operons that are transcribed from single messenger RNA molecules: Genes located at the start of bacterial operons are transcribed first by RNA polymerase molecules moving along their respective region of DNA. Conceivably, much of early developmental timing in humans may be a reflection of the time needed for RNA polymerase molecules to transcribe the lengthy introns of homeobox genes. If so, insertions of sizable transposable sequences into them would lethally mis-set key timing events in embryonic development.

Many, many more unanticipated observations and hypotheses will emerge as the reading of the human script extends beyond those individuals who produced it, to the much larger world of interested biologists. Even the hardiest, however, will find themselves stretched if they take on too much. Most triumphs of the near future will likely come from focusing on human homologs of genes functionally understood in one or more model organisms. Eventually, even more important dividends will come from focusing on ourselves as human beings and making sense of the often seemingly intractable relations between nature and nurture. There is much more to human life than interactions between its DNA script and the RNA and protein “actors” that carry out its instructions. The culturally-derived facts and traditions that our brains pass onward from one generation to the next equally affect our lives.

Our genomes, thus, can never accurately predict our futures. But we would be more than silly if we did not use their information to the fullest. The human genetic script that we are now finalizing will be regarded as the most important book ever to be read.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.211601.

NOTE

Published in modified form from A Passion for DNA: Genes, Genomes, and Society by James D. Watson (2001). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

REFERENCES

Green, P. 1997. Genome Res. 7: 410–417.
International Human Genome Sequencing Consortium. 2001. Nature 409: 860–921.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. 2001. Science 291: 1304–1351.

