
www.nature.com/nature Vol 464 | Issue no. 7289 | 1 April 2010

The human genome at ten

Nearly a decade on from the completion of the draft sequence of the human genome, researchers should work with the same intensity and focus to apply the results to health.

EDITORIAL
649 The human genome at ten

NEWS FEATURES
664 Life is complicated
Erika Check Hayden
668 The human race
Alison Abbott
670 The sequence explosion

OPINION
674 Has the revolution arrived?
Francis Collins
676 Multiple personal genomes await
J Craig Venter
678 Point: Hypotheses first
Robert Weinberg
679 Counterpoint: Data first
Todd Golub

BOOKS & ARTS
680 A reality check for personalized medicine
Muin J Khoury, James Evans & Wylie Burke

WHAT DO YOU THINK?
What did the human genome sequence mean to you? Take part in Nature’s online survey
➧ www.nature.com/humangenomesurvey

For more online extras visit
➧ www.nature.com/humangenome

The race to complete the first human genome sequence had everything a story needs to keep its audience enthralled — right down to a neck-and-neck sprint for the finish by two fierce rivals. In the end, the result was basically a tie. The rivals — the international, publicly funded Human Genome Project and the private, for-profit company Celera Genomics, then based in Rockville, Maryland — jointly announced the completion of their draft sequences in June 2000 at a gala televised press conference attended by US President Bill Clinton and UK Prime Minister Tony Blair.

The White House press statement articulated the hope, felt by many, that this landmark achievement would “lead to a new era of molecular medicine, an era that will bring new ways to prevent, diagnose, treat and cure disease”.

This issue of Nature takes a look at the next chapter in the story — the first post-genome decade — then asks how the tale might unfold in the years to come.

For many scientists, the chronicle of that first decade is an intensely personal one. Not only were they inspired by this example of what researchers can do as a group, but they found that the availability of the sequence shaped their lives and their research in ways they could not have predicted. On page 668, we share the experiences of some of those in the thick of the race. But we also want to hear about yours: we would like you to take part in Nature’s brief survey about the genome’s impact at www.nature.com/humangenomesurvey. The results will be published later this year.

The first post-genome decade saw spectacular advances. The success of the original genome project inspired many other ‘big biology’ efforts — notably the International HapMap Project, which charted the points at which human genomes commonly differ, and the Encyclopedia of DNA Elements (ENCODE), which aims to identify every functional element in the human genome. Dramatic leaps in sequencing technology and a precipitous drop in costs have helped generate torrents of genetic data, including more than two dozen published human genomes and close to 200 unpublished ones (see page 670). Along the way, geneticists have discovered that such basic concepts as ‘gene’ and ‘gene regulation’ are far more complex than they ever imagined (see page 664).

But for all the intellectual ferment of the past decade, has human health truly benefited from the sequencing of the human genome? A startlingly honest response can be found on pages 674 and 676, where the leaders of the public and private efforts, Francis Collins and Craig Venter, both say ‘not much’. Granted, there has been some progress, in the form of drugs targeted against specific genetic defects identified in a few types of cancer, for example, and in some rare inherited disorders. But the complexity of post-genome biology has dashed early hopes that this trickle of therapies would rapidly become a flood. Witness the multitude of association studies that aimed to find connections between common genetic variants and common diseases, with only limited success, or the discovery that most cancers have their own unique genetic characteristics, making widely applicable therapies hard to find.

Companies, including Celera, deCODE Genetics in Reykjavik, Iceland, and Human Genome Sciences of Rockville, have had to rethink their optimistic assumption that selling human genetic information could turn a profit. Excitement over start-up companies offering personal genomics has withered just as fast, as it has become clear that their predictions have little actionable value (see page 680).

This gap between basic research and clinical application has not escaped the notice of funding agencies, many of which are now investing serious money in a bid to close it. The US National Institutes of Health, for example, has established a string of major clinical and translational science centres over the past few years, and in February it established a joint council with the Food and Drug Administration aimed at promoting translation. Similar efforts are under way elsewhere, such as those being rolled out by the UK Medical Research Council.

649 © 2010 Macmillan Publishers Limited. All rights reserved EDITORIALS NATURE|Vol 464|1 April 2010

It is not yet clear whether such efforts will be enough. Given the biological complexities involved, applying knowledge of the human genome to health may well require a community-wide effort as determined and as systematic as was the project to sequence it in the first place.

That effort would need to solve a long-standing mismatch: the rapidly increasing ease of gathering genomic data versus the continuing difficulty of establishing what the genetic elements actually do. One intriguing experiment in high-throughput functional analysis appears on page 721, where the authors detail how they systematically disabled each of the 21,000 protein-coding genes in human cells and then captured and processed microscope images of the resulting cellular behaviour.

The effort would also require even more imaginative ways to visualize and draw meaning from the flood of genomic and molecular data (see page 678). It would require interdisciplinary teams that can provide know-how not just in research, but in the issues such as intellectual property, informed consent, finance and regulatory bureaucracy that are needed to keep discoveries moving through the agonies of clinical development. [...] for genomics. Perhaps it would even require an explicit challenge to foster fervour and competition. For example, the X-Prize Foundation of Playa Vista, California, is offering a prize of US$10 million to the first group to sequence 100 human genomes in ten days. Why shouldn’t some organization offer a reward and kudos for, say, the next genetics-based cancer drug to go from basic genomic data to approved therapy within ten years? True, real-world therapies are unlikely to have metrics as clear as those in genome sequencing, in which progress can be measured in base pairs read per day and there is a well-defined end point. But the need is no less urgent, and the collective will to reach such goals should be no less intense.

More than anything, the race to sequence the human genome proved that researchers comfortable in their individual pursuits are capable of incredible cohesion, focus and breathtaking speed. They rewrote the research rule-book, broke with the conventions of individual academic goals and left the public with the sense that anything is scientifically possible. The ten years since have brought astounding technological and intellectual advances. But ten years from now, when the story of the genome’s first two decades is being told, it should include equally astounding applications to human health. ■

A new row to hoe

The time is right to revitalize US agricultural research.

In nations where food is plentiful, it is easy to take that abundance for granted. In the United States, for example — a country rich in corn fields and pasturelands, and where shops overflow with cheap produce — agricultural research has languished for years under comparatively low budgets and disorganized funding priorities. In the 2009 economic stimulus bill, for example, the National Science Foundation received a $3-billion boost and the National Institutes of Health got $10 billion — but the Department of Agriculture’s internal research programme was allocated just $176 million, all of which was restricted to improving facilities.

It is heartening, then, that the administration of President Barack Obama has begun a much-needed overhaul of the nation’s agricultural-research programme. The groundwork was laid in June 2008, when US Congress mandated the creation of a National Institute of Food and Agriculture (NIFA) within the agriculture department. Under the leadership of Roger Beachy, whom Obama appointed in September last year, the new agency has taken over, expanded and revitalized the department’s long-standing competitive grants programme, now called the Agriculture and Food Research Initiative.

Last week, NIFA released its eagerly anticipated first call for grant proposals, and the changes are indeed dramatic. In striking contrast to the smaller, two-year individual grants that were the mainstay of the programme, NIFA’s offerings now include a series of five-year ‘coordinated agricultural projects’ of up to $45 million for collaborative, interdisciplinary research. NIFA has defined its funding priorities much more clearly than its predecessor did, and it has aligned them with a series of ‘societal challenges’ that include bioenergy and childhood obesity. The agency has also declared that the success of the programmes will be measured not just by scientific publications, but by real progress towards solving these challenges, such as reducing the amount of energy, nitrogen fertilizer and water used in agriculture by 10% by 2030. It has also introduced new fellowship and outreach programmes in an effort to stimulate the dwindling pipeline of young talent entering the field.

The overall intent, says Beachy, is to raise the status of the nation’s agricultural research, hopefully drawing attention and support from the nation’s lawmakers in the process. With just $262 million available for the first round of grants, the Agriculture and Food Research Initiative is still financially constrained compared with other US science agencies. But the structure of its competitive grants programme is an important step towards maximizing the impact of the funds that it does have.

The transition has not pleased everyone. News of Beachy’s appointment caused a stir among opponents of genetically modified crops, who noted that he has voiced support for such crops in the past. Beachy has maintained that NIFA will also support small farms and traditional crop-breeding programmes, and the first call for proposals does seem to be doing this. But the strength of that support will become clear only after the grants have been made.

Meanwhile, many long-time agricultural researchers are feeling unsettled by the abruptness of the changes. Some are dismayed to find that their favourite funding programmes have been cut. Others find the new structure too prescriptive, and worry that it leaves little room for creative approaches. These are legitimate concerns, and NIFA should follow through with its promise to solicit input from the community before it crafts next year’s proposals.

Nevertheless, the community should seize this opportunity to tackle big problems. Growing pains are inevitable, but the shake-up has the potential to rejuvenate the field at a time when its talents are desperately needed. ■

NEWS FEATURE HUMAN GENOME AT TEN

Life is complicated

The more biologists look, the more complexity there seems to be. Erika Check Hayden asks if there’s a way to make life simpler.

Illustrations by Jonathan Burton

Not that long ago, biology was considered by many to be a simple science, a pursuit of expedition, observation and experimentation. At the dawn of the twentieth century, while Albert Einstein and Max Planck were writing mathematical equations that distilled the fundamental physics of the Universe, a biologist was winning the Nobel prize for describing how to make dogs drool on command.

The molecular revolution that dawned with the discovery of the structure of DNA in 1953 changed all that, making biology more quantitative and respectable, and promising to unravel the mysteries behind everything from evolution to disease origins. The human genome sequence, drafted ten years ago, promised to go even further, helping scientists trace ancestry, decipher the marks of evolution and find the molecular underpinnings of disease, guiding the way to more accurate diagnosis and targeted, personalized treatments. The genome promised to lay bare the blueprint of human biology.

That hasn’t happened, of course, at least not yet. In some respects, sequencing has provided clarification. Before the Human Genome Project began, biologists guessed that the genome could contain as many as 100,000 genes that code for proteins. The true number, it turns out, is closer to 21,000, and biologists now know what many of those genes are. But at the same time, the genome sequence did what biological discoveries have done for decades. It opened the door to a vast labyrinth of new questions.

Few predicted, for example, that sequencing the genome would undermine the primacy of genes by unveiling whole new classes of elements — sequences that make RNA or have a regulatory role without coding for proteins. Non-coding DNA is crucial to biology, yet knowing that it is there hasn’t made it any easier to understand what it does. “We fooled ourselves into thinking the genome was going to be a transparent blueprint, but it’s not,” says Mel Greaves, a cell biologist at the Institute of Cancer Research in Sutton, UK.

Instead, as sequencing and other new technologies spew forth data, the complexity of biology has seemed to grow by orders of magnitude. Delving into it has been like zooming into a Mandelbrot set — a space that is determined by a simple equation, but that reveals ever more intricate patterns as one peers closer at its boundary.

With the ability to access or assay almost any bit of information, biologists are now struggling with a very big question: can one ever truly know an organism — or even a cell, an organelle or a molecular pathway — down to the finest level of detail?

Imagine a perfect knowledge of inputs, outputs and the myriad interacting variables, enabling a predictive model. How tantalizing this notion is depends somewhat on the scientist; some say it is enough to understand the basic principles that govern life, whereas others are compelled to reach for an answer to the next question, unfazed by the ever increasing intricacies. “It seems like we’re climbing a mountain that keeps getting higher and higher,” says Jennifer Doudna, a biochemist at the University of California, Berkeley. “The more we know, the more we realize there is to know.”

Web-like networks

Biologists have seen promises of simplicity before. The regulation of gene expression, for example, seemed more or less solved 50 years ago. In 1961, French biologists François Jacob and Jacques Monod proposed the idea that ‘regulator’ proteins bind to DNA to control the expression of genes. Five years later, American biochemist Walter Gilbert confirmed this model by discovering the lac repressor protein, which binds to DNA to control lactose metabolism in Escherichia coli bacteria¹. For the rest of the twentieth century, scientists expanded on the details of the model, but they were confident that they understood the basics. “The crux of regulation,” says the 1997 genetics textbook Genes VI (Oxford Univ. Press), “is that


a regulator gene codes for a regulator protein that controls transcription by binding to particular site(s) on DNA.”

Just one decade of post-genome biology has exploded that view. Biology’s new glimpse at a universe of non-coding DNA — what used to be called ‘junk’ DNA — has been fascinating and befuddling. Researchers from an international collaborative project called the Encyclopedia of DNA Elements (ENCODE) showed that in a selected portion of the genome containing just a few per cent of protein-coding sequence, between 74% and 93% of DNA was transcribed into RNA². Much non-coding DNA has a regulatory role; small RNAs of different varieties seem to control gene expression at the level of both DNA and RNA transcripts in ways that are still only beginning to become clear. “Just the sheer existence of these exotic regulators suggests that our understanding about the most basic things — such as how a cell turns on and off — is incredibly naive,” says Joshua Plotkin, a mathematical biologist at the University of Pennsylvania in Philadelphia.

Even for a single molecule, vast swathes of messy complexity arise. The protein p53, for example, was first discovered in 1979, and despite initially being misjudged as a cancer promoter, it soon gained notoriety as a tumour suppressor — a ‘guardian of the genome’ that stifles cancer growth by condemning genetically damaged cells to death. Few proteins have been studied more than p53, and it even commands its own meetings. Yet the p53 story has turned out to be immensely more complex than it seemed at first.

In 1990, several labs found that p53 binds directly to DNA to control transcription, supporting the traditional Jacob–Monod model of gene regulation. But as researchers broadened their understanding of gene regulation, they found more facets to p53. Just last year, Japanese researchers reported³ that p53 helps to process several varieties of small RNA that keep cell growth in check, revealing a mechanism by which the protein exerts its tumour-suppressing power.

Even before that, it was clear that p53 sat at the centre of a dynamic network of protein, chemical and genetic interactions. Researchers now know that p53 binds to thousands of sites in DNA, and some of these sites are thousands of base pairs away from any genes. It influences cell growth, death and structure and DNA repair. It also binds to numerous other proteins, which can modify its activity, and these protein–protein interactions can be tuned by the addition of chemical modifiers, such as phosphates and methyl groups. Through a process known as alternative splicing, p53 can take nine different forms, each of which has its own activities and chemical modifiers. Biologists are now realizing that p53 is also involved in processes beyond cancer, such as fertility and very early embryonic development. In fact, it seems wilfully ignorant to try to understand p53 on its own. Instead, biologists have shifted to studying the p53 network, as depicted in cartoons containing boxes, circles and arrows meant to symbolize its maze of interactions.

Data deluge

The p53 story is just one example of how biologists’ understanding has been reshaped, thanks to genomic-era technologies. Knowing the sequence of p53 allows computational biologists to search the genome for sequences where the protein might bind, or to predict positions where other proteins or chemical modifications might attach to the protein. That has expanded the universe of known protein interactions — and has dismantled old ideas about signalling ‘pathways’, in which proteins such as p53 would trigger a defined set of downstream consequences.

“When we started out, the idea was that signalling pathways were fairly simple and linear,” says Tony Pawson, a cell biologist at the University of Toronto in Ontario. “Now, we appreciate that the signalling information in cells is organized through networks of information rather than simple discrete pathways. It’s infinitely more complex.”

The data deluge following the Human Genome Project is undoubtedly part of the problem. Knowing what any biological part is doing has become much more difficult, because modern, high-throughput technologies have granted tremendous power to collect data. Gone are the days when cloning and characterizing a gene would garner a paper in a high-impact journal. Now teams would have to sequence an entire human genome, or several, and compare them. Unfortunately, say some, such impressive feats don’t always bring meaningful biological insights.

“In many cases you’ve got high-throughput projects going on, but much of the biology is still occurring on a small scale,” says James Collins, a bioengineer based in Massachusetts. “We’ve made the mistake of equating the gathering of information


with a corresponding increase in insight and understanding.”

A new discipline — systems biology — was supposed to help scientists make sense of the complexity. The hope was that by cataloguing all the interactions in the p53 network, or in a cell, or between a group of cells, then plugging them into a computational model, biologists would glean insights about how biological systems behaved.

In the heady post-genome years, systems biologists started a long list of projects built on this strategy, attempting to model pieces of biology such as the yeast cell, E. coli, the liver and even the ‘virtual human’. So far, all these attempts have run up against the same roadblock: there is no way to gather all the relevant data about each interaction included in the model.

A bug in the system

In many cases, the models themselves quickly become so complex that they are unlikely to reveal insights about the system, degenerating instead into mazes of interactions that are simply exercises in cataloguing.

In retrospect, it was probably unrealistic to expect that charting out the biological interactions at a systems level would reveal systems-level properties, when many of the mechanisms and principles governing inter- and intracellular behaviour are still a mystery, says Leonid Kruglyak, a geneticist at Princeton University in New Jersey. He draws a comparison to physics: imagine building a particle accelerator such as the Large Hadron Collider without knowing anything about the underlying theories of quantum mechanics, quantum chromodynamics or relativity. “You would have all this stuff in your detector, and you would have no idea how to think about it, because it would involve processes that you didn’t understand at all,” says Kruglyak. “There is a certain amount of naivety to the idea that for any process — be it biology or weather prediction or anything else — you can simply take very large amounts of data and run a data-mining program and understand what is going on in a generic way.”

This doesn’t mean that biologists are stuck peering ever deeper into a Mandelbrot set without any way of making sense of it. Some biologists say that taking smarter systems approaches has empowered their fields, revealing overarching biological rules. “Biology is entering a period where the science can be underlaid by explanatory and predictive principles, rather than little bits of causality swimming in a sea of phenomenology,” says Eric Davidson, a developmental biologist at the California Institute of Technology in Pasadena.

Such progress has not come from top–down analyses — the sort that try to arrive at insights by dumping a list of parts into a model and hoping that clarity will emerge from chaos. Rather, insights have come when scientists systematically analyse the components of processes that are easily manipulated in the laboratory — largely in model organisms. They’re still using a systems approach, but focusing it through a more traditional, bottom–up lens.

Davidson points to the example of how gene regulation works during development to specify the construction of the body. His group has spent almost a decade dissecting sea-urchin development by systematically knocking out the expression of each of the transcription factors — regulatory proteins that control the expression of genes — in the cells that develop into skeleton. By observing how the loss of each gene affects development, and measuring how each ‘knockout’ affects the expression of every other transcription factor, Davidson’s group has constructed a map of how these transcription factors work together to build the animal’s skeleton⁴. The map builds on the Jacob–Monod principle that regulation depends on interactions between regulatory proteins and DNA. Yet it includes all of these regulatory interactions and then attempts to draw from them common guiding principles that can be applied to other developing organisms.

For example, transcription factors encoded in the urchin embryo’s genome are first activated by maternal proteins. These embryonic factors, which are active for only a short time, trigger downstream transcription factors that interact in a positive feedback circuit to switch each other on permanently. Like the sea urchin, other organisms from fruitflies to humans organize development into ‘modules’ of genes, the interactions of which are largely isolated from one another, allowing evolution to tweak each module without compromising the integrity of the whole process. Development, in other words, follows similar rules in different species.

“The fundamental idea that the genomic regulatory system underlies all the events of development of the body plan, and that changes in it probably underlie the evolution of body plans, is a basic principle of biology that we didn’t have before,” says Davidson. That’s a big step forwards from 1963, when Davidson started his first lab. Back then, he says, most theories of development were “manifestly useless”.

Davidson calls his work “a proof of principle that you can understand everything about the system that you want to understand if you get hold of its moving parts”. He credits the Human Genome Project with pushing individual biologists more in the direction of understanding systems, rather than staying stuck in the details, focused on a single gene, protein or other player in those systems. First, it enabled the sequencing of model-organism genomes, such as that of the sea urchin, and the identification of all the transcription factors active in development. And second, it brought new types of biologists, such as computational biologists, into science, he says.

The eye of the beholder

So how is it that Davidson sees simplicity and order emerging where many other biologists see increasing disarray? Often, complexity seems to lie in the eye of the beholder. Researchers who work on model systems, for instance, can manipulate those systems in ways that are off-limits to those who study human biology, arriving at more definitive answers. And there are basic philosophical differences in the way scientists think about biology. “It’s people who complicate things,” says Randy Schekman, a cell and molecular biologist at the University of California, Berkeley. “I’ve seen enough scientists to know that some people are simplifiers and others are dividers.” Although the former will glean big-picture principles from select examples, the latter will invariably get bogged down in the details of the examples themselves.

Mark Johnston, a yeast geneticist at the University of Colorado School of Medicine in Denver, admits to being a generalizer. He used to make the tongue-in-cheek prediction that the budding yeast Saccharomyces cerevisiae would be “solved” by 2007 when every gene and every interaction has been characterized. He has since written more seriously that this feat will be accomplished within the next few decades⁵. Like Davidson, he points out that many aspects of yeast life, such as the basics of DNA synthesis and repair, are essentially understood. Scientists already know what about two-thirds of the organism’s 5,800 genes do, and the remaining genes will be characterized soon enough, Johnston says. He works on the glucose-sensing pathway, and says he will be satisfied that he understands it when he can quantitatively describe


the interactions in the pathway — a difficult but not impossible task, he says.

Not everyone agrees. James Haber, a molecular biologist at Brandeis University in Waltham, Massachusetts, says it is hard to argue that the understanding of fundamental processes will be enriched within 20–30 years. “Whether this progress will result in these processes being ‘solved’ may be a matter of semantics,” he says, “but some questions — such as how chromosomes are arranged in the nucleus — are just beginning to be explored.” Johnston argues that it is neither possible nor necessary to arrive at the quantitative understanding that he hopes to achieve for the glucose-sensing pathway for every other system in yeast. “You have to decide what level of understanding you’re satisfied with, and some people respond that they’re not satisfied at any level — that we have to keep going,” he says. This gulf between simplifiers and dividers isn’t just a matter of curiosity for armchair philosophers. It plays out every day as study sections and peer reviewers decide which approach to science is worth funding and publishing. And it bears on the ultimate question in biology: will we ever understand it all?

The edge of the universe

Some, such as Hiroaki Kitano, a systems biologist at the Systems Biology Institute in Tokyo, point out that systems seem to grow more complex only because we continue to learn about them. “Biology is a defined system,” he says, “and in time, we will have a fairly good understanding of what the system is about.”

Others demur, arguing that biologists will never know everything. And it may not matter terribly that they don’t. Bert Vogelstein, a cancer-genomics researcher at Johns Hopkins University in Baltimore, Maryland, has watched first-hand as complexity dashed one of the biggest hopes of the genome era: that knowing the sequence of healthy and diseased genomes would allow researchers to find the genetic glitches that cause disease, paving the way for new treatments. Cancer, like other common diseases, is much more complicated than researchers hoped. By sequencing the genomes of cancer cells, for example, researchers now know that an individual patient’s cancer has about 50 genetic mutations, but that they differ between individuals. So the search for drug targets that might help many patients has shifted away from individual genes and towards drugs that might interfere in networks common to many cancers.

Even if we never understand biology completely, Vogelstein says, we can understand enough to interfere with the disease. “Humans are really good at being able to take a bit of knowledge and use it to great advantage,” Vogelstein adds. “It’s important not to wait until we understand everything, because that’s going to be a long time away.” Indeed, drugs that influence those bafflingly complex signal-transduction pathways are among the most promising classes of new medicines being used to treat cancer. And medicines targeting the still-mysterious small RNAs are already in clinical trials to treat viral infections, cancer and macular degeneration, the leading cause of untreatable blindness in wealthy nations.

The complexity explosion, therefore, does not spell an end to progress. And that is a relief to many researchers who celebrate complexity rather than wring their hands over it. Mina Bissell, a cancer researcher at the Lawrence Berkeley National Laboratory in California, says that during the Human Genome Project, she was driven to despair by predictions that all the mysteries would be solved. “Famous people would get up and say, ‘We will understand everything after this’,” she says. “Biology is complex, and that is part of its beauty.” She need not worry, however; the beautiful patterns of biology’s Mandelbrot-like intricacy show few signs of resolving. ■

Erika Check Hayden is a senior reporter for Nature based in San Francisco.

1. Gilbert, W. & Muller-Hill, B. Proc. Natl Acad. Sci. USA 56, 1891–1898 (1966).
2. The ENCODE Project Consortium Nature 447, 799–816 (2007).
3. Suzuki, H. I. et al. Nature 460, 529–533 (2009).
4. Oliveri, P., Tu, Q. & Davidson, E. H. Proc. Natl Acad. Sci. USA 105, 5955–5962 (2008).
5. Fields, S. & Johnston, M. Science 307, 5717 (2005).

See Editorial, page 649, and human genome special at www.nature.com/humangenome.

667 © 2010 Macmillan Publishers Limited. All rights reserved NEWS FEATURE HUMAN GENOME AT TEN NATURE|Vol 464|1 April 2010

THE HUMAN RACE What was it like to participate in the fastest, fiercest research race in biology? Alison Abbott talks to some of the genome competitors about the rivalries and obstacles they faced then — and now.

In many people’s minds, May 1998 marked the real start of the race to sequence the human genome. In that month, Craig Venter announced that his upstart company, Celera Genomics in Rockville, Maryland, would sequence the genome within two years. The publicly funded Human Genome Project, which had been plodding along until that point, had a competitor, and each side assembled and prepped its team.

The shotgunner

Venter was willing to flout convention, and he recruited Gene Myers to help him. As a mathematician at the University of Arizona in Tucson, Myers had developed a technique for blasting a genome to pieces and reassembling the sequenced debris. But he despaired of ever using this ‘whole-genome shotgun sequencing’ method on the human genome. The field was signed up en bloc to sequencing the genome piece by consecutive piece to avoid gaps, and Myers’s algorithms had been scorned for being error-prone and unworkable.

At Celera, Myers never felt he was on the ‘wrong side’. He arrived before the computers and furniture did, yet little more than a year later the group had lined up most of the 120-million-base-pair genome of the fruitfly Drosophila melanogaster (E. W. Myers et al. Science 287, 2196–2204; 2000), proving that the shotgun technique could work. The human genome came next.

Myers still feels sore about his early rejection (“it hurt deeply”) and expresses a gleeful triumph that the technique is now standard in genomics. The academic world was hypocritical, he says. It castigated him for pushing the technique and joining industry, then sneaked him job offers at the first inkling that he might have been right.

When Myers left Celera in 2002, he was looking for a new direction. He eventually found it in neuroinformatics, a field that provides its own computational challenges. Advances in microscopy combined with sophisticated genetic techniques now make it possible to observe how individual neurons behave when genes are turned on and off. Doing this across an entire mouse brain allows biologists to observe development in unprecedented molecular detail, if, that is, they can make sense of the vast numbers of high-resolution images. Myers is tackling this data challenge at the Janelia Farm Research Campus in Ashburn, Virginia. “Sequences: been there, done that,” Myers says. “Cell-resolution models of nervous systems or developing organisms: daunting but looking more and more doable.”

The mega-manager

The huge sequencing effort of the Human Genome Project was biology’s first foray into the world of ‘big science’. It required big money, and a level of teamwork that came as a major sociological shock to participating scientists. These were the problems with which Jane Rogers had to contend as manager of the Human Genome Project for the Wellcome Trust Sanger Institute near Cambridge, UK.

In 1998, Rogers was part of a small posse of senior scientists from Sanger who persuaded governors of the Wellcome Trust to inject more momentum into the project by doubling the Sanger centre’s budget so that it could sequence a full one-third of the genome. The trust’s senior administrator, Michael Morgan, revealed the decision to scientists at that year’s genome meeting at Cold Spring Harbor Laboratory in New York. The scientists were demoralized by Venter’s recent announcement that he was entering the race, and Morgan’s news brought the crowd to its feet. “It was an incredible moment, seeing everyone stand up,” Rogers says. “We felt we had saved the day.”

Photographs, left to right: Gene Myers, Jane Rogers, Robert Millman (top), John Sulston and Todd Taylor.

Back home, Rogers had to cajole and coerce scientists who were used to working in their own small groups into working together on a central project, using standardized methods and procedures. There were emotional moments, she concedes with some diplomacy.

Rogers, one of very few women involved at a high level in the Human Genome Project, developed a taste for big science. After finishing the major sequencing, the Sanger Institute reverted to principal-investigator-led research groups focused on the genomics of human health. But Rogers set about lobbying the UK Biotechnology and Biological Sciences Research Council for funds to establish a centre for sequencing plant, animal and microbial genomes. She now heads the council’s Genome Analysis Centre in Norwich, UK, which opened last year: a management challenge that, for her, matches the buzz of the Human Genome Project.

The patent pioneer

Robert Millman believed he’d landed in patent-attorney heaven when he joined Celera as head of intellectual property in 1999. It was Millman’s task to work out which of the company’s intended products (the human genome sequence, its constituent genes, and the software and algorithms to analyse it) could be patented.

In earlier days, Millman had been a street artist, performing outrageous feats of escapology in his free time. Life at Celera turned out to be similarly challenging. He enjoyed the buzz of testifying in front of Congress with Venter, helping to shape the US patent office’s policies in gene patenting. Academics scorned Venter for making a business out of the human genome, but Millman remembers that although Venter “revelled in his bad-boy image, he didn’t always act like he really believed in patents and he didn’t make my life easy”. Millman found himself caught between Venter’s academic principles and his business drive, and thought that the company could have pursued patents more aggressively.

In the end, Millman patented 150 genes and proteins that were considered likely drug targets, a handful of ‘SNP’ patterns linked to disease, and technologies linked to shotgun gene sequencing, none of which Celera fully exploited. Frustrated, Millman says that when he left the company in 2002, he didn’t want to hear the suffix ‘-omics’ ever again.

He obviously changed his mind. Millman has since been involved in start-up companies that are pursuing other hot new biotechnologies, including, in 2004, Alnylam Pharmaceuticals in Cambridge, Massachusetts, which has led the way in RNA-interference technologies for regulating genes. In his current position at the venture-capital company MPM Capital in Boston, Massachusetts, he has invested in firms exploring epigenetics and stem cells. Gene patenting, however, remains controversial, even though patents are no longer granted for sequences alone and now require information about a gene’s function and utility. Millman still sports his colourful clothes and his red ponytail. Occasionally he yearns to don his straitjacket and ride his unicycle across a tightrope, but, these days, he resists.

The freedom fighter

Whenever Celera put out a bullish press release to reassure shareholders that it was winning the race, John Sulston went on television to explain that, actually, it wasn’t. “I was a reluctant media star,” he recalls. Sulston never worked directly on the human genome, but his work sequencing that of the nematode worm at the Sanger Institute paved the way for the Human Genome Project, and he became one of its most righteous political and scientific champions.

Sulston fought to ensure that sequence data were released daily into the public domain, helping to establish principles at a 1996 strategy meeting on human-genome sequencing in Bermuda that are still largely followed by the genomics community. And he put the kibosh on a compromise with Celera, proposed in 1999, because the company was not prepared to release data early enough to satisfy the public effort’s principles. In retrospect, Sulston still thinks it was right to fight. “Otherwise the biological databases that we have today would have collapsed; everything could have ended up in the hands of an American corporation. The race made for a crazy and irrational time.”

Yet his battles over the ownership of biology haven’t stopped. Now emeritus at the Sanger Institute, he is a part-time faculty member at the University of Manchester’s Institute for Science, Ethics and Innovation, which is engaging patent attorneys in heated debate about ownership issues in biology, such as the extent to which donors of biological material deserve compensation. Sulston thinks that the biology should be able to be exploited by businesses but that better checks are needed to stop basic researchers from becoming secretive.

The diplomatic coder

When Todd Taylor moved to Japan from the United States in 1998, he was a molecular geneticist in need of employment. Taking a chance, he presented himself as a bioinformatics expert to the RIKEN Genomic Sciences Research Complex in Yokohama, newly created to allow Japan to contribute to the Human Genome Project. Then he started reading up like crazy.

The centre was collaborating on chromosome 21 with another Japanese group and two German teams. He soon found himself as the centre’s English-speaking representative at its meetings, and experienced the occasionally sharp edge of international tensions.

The Japanese side was not well organized at first, he says, and sequenced some parts of the genome assigned to its partners. He recalls a meeting at the Sanger Institute when one of the Germans, beside himself with anger, shouted that by doing so the Japanese had wasted German taxpayers’ money.

Once the Japanese groups hit their stride, they bid for the unassigned chromosomes 11 and 18. The researchers flew over to Washington University in St Louis to negotiate with the rival US contingent. “We stepped off the plane and went straight into a three-hour meeting where no one even offered us a glass of water,” Taylor remembers. After some fairly hostile bargaining, they came away with a compromise: the long arm of chromosome 11, the short arm of 18 and no dinner invitation. “It was crazy to split the chromosomes that way, but at least I got two Nature papers,” he jokes.

Taylor, now a recognized bioinformatician, works at the RIKEN Advanced Science Institute that replaced the former genome centre. His group has shrunk from 70 to 20 people. One of his main projects is with the International Human Microbiome Consortium, developing software for analysing the hundreds of microbial species in the intestines of healthy Japanese people. But these and other international efforts cannot rival the Human Genome Project, says Taylor, who calls it “a once-in-a-lifetime project, something the likes of which we probably won’t see again. Not that we all wouldn’t mind working like that together again. I’d jump at the opportunity.” ■

Alison Abbott is Nature’s senior European correspondent.

See Editorial, page 649, and human genome special at www.nature.com/humangenome.

THE SEQUENCE EXPLOSION

At the time of the announcement of the first drafts of the human genome in 2000, there were 8 billion base pairs of sequence in the three main databases for ‘finished’ sequence: GenBank, run by the US National Center for Biotechnology Information; the DNA Databank of Japan; and the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database. The databases share their data regularly as part of the International Nucleotide Sequence Database Collaboration (INSDC). In the subsequent first post-genome decade, they have added another 270 billion bases to the collection of finished sequence, doubling the size of the database roughly every 18 months. But this number is dwarfed by the amount of raw sequence that has been created and stored by researchers around the world in the Trace archive and Sequence Read Archive (SRA).

See Editorial, page 649, and human genome special at www.nature.com/humangenome

Graphic: gene sequence stored in international public databases (INSDC databases, Trace archive and SRA), 2000–2010. Left axis: billions of base pairs (0–300). Right axis: cost per million base pairs of sequence (log scale, $1–$10,000). Source: NCBI. Graphic by N. Spencer and W. Fernandes.

The Trace archive, started in 2000, houses raw sequence data, and currently holds 1.8 trillion base pairs.

The Sequence Read Archive (SRA) houses raw data from next-generation sequencing and has grown to 25 trillion base pairs. If this chart were to accommodate it, it would stretch to more than 12 metres, twice the height of an average giraffe.

AUTOMATED SANGER SEQUENCING: Based on a decades-old method, at the peak of the technique, a single machine could produce hundreds of thousands of base pairs in a single run.

454 PYROSEQUENCING: Released in 2005, 454 sequencing is considered the first ‘next-generation’ technique. A machine could sequence hundreds of millions of base pairs in a single run.

SEQUENCING BY SYNTHESIS: Other companies such as Solexa (now Illumina) modified the next-generation, sequencing-by-synthesis techniques and can produce billions of base pairs in a single run.

SEQUENCING BY LIGATION: This technique, employed in SOLiD and Polonator instruments, differs from previous technologies and samples every base twice, reducing the error rate.

THIRD-GENERATION SEQUENCING: Companies such as Helicos BioSciences already read sequence from short, single DNA molecules. Others, such as Pacific Biosciences, Oxford Nanopore and Ion Torrent, say they can read from longer molecules as they pass through a pore.

HOW MANY HUMAN GENOMES?

The graphic shows all published, fully sequenced human genomes since 2000, including nine from the first quarter of 2010. Some are resequencing efforts on the same person and the list does not include unpublished completed genomes.

● First drafts of two composite haploid human genomes (refs 1, 2)
● Human Genome Project completed (ref. 3)
● J. Craig Venter diploid genome (ref. 4)
● James Watson (ref. 5), a woman with acute myeloid leukemia (ref. 6), a Yoruba male from Nigeria (ref. 7) and the first Asian genome (ref. 8)
● Two Korean males including Seong-Jin Kim (refs 9, 10), Stephen Quake (ref. 11), another cancer genome (ref. 12), George Church, a Yoruban female, another male (ref. 13), and four others (refs 14–16)
● A glioma cell line (ref. 17), Inuk (ref. 18), !Gubi and Archbishop Desmond Tutu (ref. 19), James Lupski (ref. 20), and a family of four (ref. 21)

DNA SEQUENCES BY TAXONOMY

International Nucleotide Sequence Database Collaboration: The main repositories of ‘finished’ sequence span a wide range of organisms, representing the many priorities of scientists worldwide. Categories shown: green plants; non-human vertebrates; fungi; viruses, bacteria and archaea; invertebrates; humans; protozoa; metagenomes (multiple species); whole genome shotgun sequence.

Trace Archive: Developed to house the raw output of high-throughput sequencers built in the late 1990s, the trace archive spans a wide range of taxa.

Sequence Read Archive: Houses raw data from next-generation sequencers. Dominated by human sequence, including multiple coverage for more than 170 people.

1. Venter, J. C. et al. Science 291, 1304–1351 (2001).
2. International Human Genome Sequencing Consortium Nature 409, 860–921 (2001).
3. International Human Genome Sequencing Consortium Nature 431, 931–945 (2004).
4. Levy, S. et al. PLoS Biol. 5, e254 (2007).
5. Wheeler, D. A. et al. Nature 452, 872–876 (2008).
6. Ley, T. J. et al. Nature 456, 66–72 (2008).
7. Bentley, D. R. et al. Nature 456, 53–59 (2008).
8. Wang, J. et al. Nature 456, 60–65 (2008).
9. Ahn, S.-M. et al. Genome Res. 19, 1622–1629 (2009).
10. Kim, J.-I. et al. Nature 460, 1011–1015 (2009).
11. Pushkarev, D., Neff, N. F. & Quake, S. R. Nature Biotechnol. 27, 847–850 (2009).
12. Mardis, E. R. et al. N. Engl. J. Med. 10, 1058–1066 (2009).
13. Drmanac, R. et al. Science 327, 78–81 (2009).
14. McKernan, K. J. et al. Genome Res. 19, 1527–1541 (2009).
15. Pleasance, E. D. et al. Nature 463, 191–196 (2010).
16. Pleasance, E. D. et al. Nature 463, 184–190 (2010).
17. Clark, M. J. et al. PLoS Genet. 6, e1000832 (2010).
18. Rasmussen, M. et al. Nature 463, 757–762 (2010).
19. Schuster, S. C. et al. Nature 463, 943–947 (2010).
20. Lupski, J. R. et al. N. Engl. J. Med. doi:10.1056/NEJMoa0908094 (2010).
21. Roach, J. C. et al. Science doi:10.1126/science.1186802 (2010).

Point: Hypotheses first

There is little to show for all the time and money invested in genomic studies of cancer, says Robert Weinberg, and the approach is undermining tried-and-tested ways of doing, and of building, science.

During the twentieth century, biology, traditionally a descriptive science, became one of hypothesis-driven experimentation. Tightly coupled with this was the increasing dominance of reductionism, the idea that complex biological systems can be understood by dismantling them into their constituent pieces and studying each in isolation. Implicit here was the notion that observations should only be made to support or attack hypothesized mechanisms of action, and that simple observation (phenomenology for its own sake) is of relatively little use.

These approaches served us well over the past half-century: witness the revolutions in molecular and cellular biology, immunology, neurobiology and genetics. Our insights into pathogenetic mechanisms exceed the wildest speculations of 50 years ago.

Now the dominant position of hypothesis-driven research is under threat. Many feel that traditional conceptual tools cannot map the enormous complexity that allows single cells and complex organisms to thrive, and that recent technological innovations have created a viable alternative. My students can gather certain types of experimental data 1,000 and even 10,000 times faster than I could 40 years ago.

In cancer research, the new technologies promise to change the landscape of diagnosis, therapy and insights into disease pathogenesis. So have the old ways of doing business, of testing hypotheses, become anachronisms? I think not.

Better or just bigger?

The era of the new biology (genomics, proteomics, metabolomics) began with the sequencing of the human genome a bit more than a decade ago. Its successes are indisputable: tens of thousands of research programmes, many focused on identifying and characterizing specific genes, have benefited enormously from the creation and study of this database.

Large-scale efforts such as the Human Genome Project are portrayed as the future, and as central to the discipline of systems biology that has since emerged. Increasing proportions of national research budgets are being diverted to them. But is it worth extinguishing 20 or 30 small-scale, hypothesis-driven projects to make room for an attack at the systems-wide level?

From a cancer researcher’s perspective, the successes of hypothesis-driven science are clear and undeniable. They stretch back over half a century and continue week after week, month after month, to yield new conceptual insights. By contrast, the new ways of doing biology are so untested that their long-term benefits are still hard to project. Nonetheless, it’s useful to make comparisons, if only because economic necessities force them to be made.

Analysis of expression arrays, which show which genes are active in a tumour sample, has shown that cancers previously viewed as a single entity have different pathogenetic mechanisms and respond differently to therapy. The use of genome-wide libraries of small interfering RNAs to inhibit large cohorts of genes, and so identify those behind cancer, is a blend of the old and the new. It lacks a clear preconception of what will eventually be found, but contains clear hypotheses about the biological phenotypes that will result. This approach, still in its infancy, has been remarkably productive.

Sequencing of entire tumour genomes (or their coding exons) has a more mixed record. These projects consume an enormous amount of resources and researchers’ energy. The dividends to date have been modest: the discovery of several new oncogenes and tumour suppressor genes involved in tumour formation (for example, BRAF, IDH1/2 and translocations in prostate carcinomas), and a general measure of the degree of genetic instability of various tumour-cell genomes.

These massive data-generating projects have yet to yield a clear consensus about how many somatic mutations are required to create a human tumour, and have given us few major breakthroughs in our understanding of how individual tumours develop. The cost of each conceptual insight has been very high indeed, although this is likely to change as technology costs tumble. Meanwhile, countless smaller experimental research programmes, proven sources, year after year, of conceptual innovation, have struggled to survive.

High stakes

Arguably the most ambitious large-scale venture involves assembling the many interacting signalling components within individual cells into wiring diagrams. These elaborate maps, sometimes termed ‘hairballs’, and the computer algorithms that model signal processing, could shed light on why and how individual cells respond to external signals and predict their future behaviour. Although aesthetically pleasing, they have yielded few conceptual insights into how and why cells and tissues behave the way they do. Some feel that a thorough understanding of individual signal-processing components is an essential prerequisite to predicting the behaviour of entire signalling circuits, a notion often dismissed as old-style reductionism.

The stakes here are high. The repercussions of major agencies shifting their funding allocations will be felt for a generation. Running laboratories focused on small-scale, hypothesis-driven research has become unattractive for many young people because of the enormous difficulty of procuring enough money to launch and expand such a research programme. The long-term effects will be an increasing inability of many biological disciplines to attract the brightest young people, and they are, after all, the engines of scientific progress. Without them, we are lost. ■

Robert A. Weinberg is at the Whitehead Institute for Biomedical Research, and in the Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.

OPINION

Has the revolution arrived?

Looking back over the past decade of human genomics, Francis Collins finds five key lessons for the future of personalized medicine: for technology, policy, partnerships and pharmacogenomics.

On 26 June 2000, Craig Venter and I stood next to the President of the United States, in the same room of the White House where the explorers Meriwether Lewis and William Clark had unfurled their map of the Northwest Territories for Thomas Jefferson. “Today,” Bill Clinton said, “the world is joining us here in the East Room to behold a map of even greater significance. We are here to celebrate the completion of the first survey of the entire human genome … With this profound new knowledge, humankind is on the verge of gaining immense, new power to heal. Genome science will have a real impact on all our lives, and even more, on the lives of our children. It will revolutionize the diagnosis, prevention, and treatment of most, if not all, human diseases.”

I was honoured to be standing there, but also somewhat embarrassed: the milestone being reported was not yet attached to a publication. There was a lot of analysis still to do, and the paper would not appear in Nature until eight months later. Still, it was a heady moment.

Wisely, the president did not attach timetables to his bold predictions, even though in the early days of the millennium, everyone wanted to hear where this genome revolution was going. I even made my own predictions for 2010. Never having discarded a PowerPoint file, I can reproduce my list verbatim:

● Predictive genetic tests will be available for a dozen conditions
● Interventions to reduce risk will be available for several of these
● Many primary-care providers will begin to practise genetic medicine
● Preimplantation genetic diagnosis will be widely available, and its limits will be fiercely debated
● A ban on genetic discrimination will be in place in the United States
● Access to genetic medicine will remain inequitable, especially in the developing world

It is fair to say that all of these predictions have come true, with some caveats that offer important lessons about the best path forward for genomics and personalized medicine.

The promise of a revolution in human health remains quite real. Those who somehow expected dramatic results overnight may be disappointed, but should remember that genomics obeys the First Law of Technology: we invariably overestimate the short-term impacts of new technologies and underestimate their longer-term effects.

Breathtaking acceleration

The decade from 2000 to 2010 was characterized by breathtaking acceleration in genome science. Thanks to advances in DNA sequencing technology that dropped the cost approximately 14,000-fold between 1999 and 2009, finished sequences are now available for 14 mammals, and draft or complete sequences have been done for many other vertebrates, invertebrates, fungi, plants and microorganisms. Comparative genomics has emerged as a powerful approach for understanding evolution and genome function at a level of detail barely imagined a few years ago.

For humans, the HapMap project produced a remarkable catalogue of common variation in the genome in just three years, from 2002 to 2005. As full sequencing has become more practical, researchers have been releasing complete genomes of individuals: a total of 13 at the time of this writing, including my personal hero, Archbishop Desmond Tutu of South Africa. In 2011, an international team is set to complete the data-production phase of the 1000 Genomes Project, designed to produce highly accurate assembled sequences from more than 1,000 individuals whose ancestors came from Europe, Asia and Africa.

The same determination to study the entire genome, not just isolated segments, has now been applied to understanding its function, although this quest is, of course, much more complicated and open-ended. The Encyclopedia of DNA Elements (ENCODE) project (started in pilot form in 2003 and slated to run at least until 2011) and the US National Institutes of Health (NIH) Roadmap Epigenomics Program (started in 2008 and funded until 2013) continue to define the ‘parts list’ of the human genome. These projects identify the locations of genes (protein coding and non-coding) and the patterns that determine whether genes are switched on or off in a given tissue: patterns of chromatin modification, transcription factors and DNA methylation.

With regard to medical applications, genome-wide association studies (GWAS) have now revealed an astounding number of common DNA variations that play a part in the risk of developing common diseases such as heart disease, diabetes, cancer or autoimmunity. To identify less common variations, methods to target DNA sequencing to subsets of the human genome have been developed. These methods can now sequence 80–90% of the protein-coding regions (the exons or ‘exome’) of a human DNA sample for just a few thousand dollars.

Genome research has already had a profound impact on scientific progress. The combination of new technologies and freely accessible databases of high-quality genomic information has enabled the average investigator to make discoveries much more quickly than would otherwise have been possible. For example, the search for the cystic fibrosis gene finally succeeded in 1989 after years of effort by my lab and several others, at an estimated cost of US$50 million. Such a project could now be accomplished in a few days by a good graduate student with access to the Internet, appropriate DNA samples, some inexpensive reagents, a thermal cycler and a DNA sequencer (see graphic).

The consequences for clinical medicine, however, have thus far been modest. Some major advances have indeed been made: powerful new drugs have been developed for some cancers; genetic tests can predict whether people with breast cancer need chemotherapy; the major risk factors for macular degeneration have been identified; and drug response can be predicted accurately for more than a dozen drugs. But it is fair to say that the Human Genome Project has not yet directly affected the health care of most individuals.

GWAS have so far identified only a small fraction of the heritability of common diseases, so the ability to make meaningful predictions is still quite limited, even using chips that test for a million or more common variants. Nonetheless, direct-to-consumer marketing of genetic risk prediction, based on the rapidly growing database of GWAS results, is attracting early adopters. Having gone through that process myself, I can report that I found the opportunity to view my own personal genotype results rather riveting, despite the limited clinical validity and utility of many of these predictions.

This dynamic is likely to change in the next five years. Much of the missing heritability (the ‘dark matter’ of the genome) will probably turn up as the technology advances. Whole-genome sequencing, coming into its own as the cost per genome falls below $1,000 in the next three to five years, will identify rare variants of larger effect and the copy number variants that GWAS may have missed. With an increasing inventory of these discoveries, prediction of disease risk and drug response will continue to improve.

As the cost falls and evidence grows, there will be increasing merit in obtaining complete-genome sequences for each of us, and storing that information, with appropriate privacy protections, in our medical records, where it will be quickly available to guide prevention strategies or medication choice.

Perhaps the most profound consequence of [...] the identification of new cancer drug targets is accelerating rapidly as a consequence of the ability to do deep genome sequencing of many tumours to identify recurrent mutations. Projects such as the Cancer Genome Atlas, which is carrying out the equivalent of 20,000 genome projects on matched tumour and blood DNA samples from 20 common types of cancer, have begun to reveal numerous opportunities for therapeutic development. And GWAS have pointed to hundreds of previously unrecognized drug targets for dozens of other diseases.

This profusion of therapeutic opportunities is a challenge to prioritize. Efforts are now under way to forge innovative partnerships between the traditional strengths of the private sector and academic labs. The NIH has provided new resources to catalyse such partnerships, including access by academic investigators to high-throughput screening through the Molecular Libraries Roadmap project, and to preclinical testing of promising lead compounds through the Therapeutics for Rare and Neglected Disease initiative.

Graphic: UNDERSTANDING THE GENOME. Investment in major genomics programmes by the US National Human Genome Research Institute: over the past decade expenditure on large-scale sequencing fell, thanks to technology advances, and focus shifted to probing genome function. Axes: expenditure (US$ millions) by year, 2000–2009. Programmes shown: large-scale sequencing, databases, modENCODE, population genomics, HapMap, ELSI, $1,000 genome, ENCODE, $100,000 genome. ELSI: Ethical, Legal and Social Implications programme; ENCODE: Encyclopedia of DNA Elements; modENCODE: Model Organism Encyclopedia of DNA Elements. Source: M. Guyer, NHGRI.

[...] on the planet to begin work immediately in analysing the massive amounts of genomic data now being produced. It is a very good thing that the ‘race for the genome’ in 1998–2000 resulted in the human genome sequence being immediately and freely available to all, rather than becoming a commercial commodity.

Second, technology development for sequencing and functional genomics, key to the success achieved thus far, must continue to be a major focus of investment by both the public and private sectors. Although huge leaps have been made in increasing the speed and reducing the costs of DNA sequencing, expression analysis and methods to assess the epigenome, the limits are still nowhere near being reached.

Third, the success of personalized medicine will depend on continued accurate identification of genetic and environmental risk factors, and the ability to utilize this information in the real world to influence health behaviours and achieve better outcomes. This will require well designed, large-scale research projects, for discovering risk factors and for testing the implementation of prevention and pharmacogenomic programmes.

Fourth, achieving the enormous promise of the myriad new drug targets emerging from genomic analysis of common and rare diseases requires new paradigms of public–private partnership. Academic investigators will have a much more important role in the early stages, effectively ‘de-risking’ projects for downstream commercial investment. Closer relationships between the US Food and Drug Administration and the NIH, announced this February, will assist this process.

Finally, good policy decisions will be crucial to reaping the benefits that should flow from the coming revelations about the genome. These will include protection of individual privacy, effective education of health-care providers and the public about genomic medicine, and appropriate health-care system reimbursement for the cost of validated preventive measures.

In The Wisdom of the Sands, author Antoine de Saint-Exupéry wrote: “As for the future, your task is not to foresee, but to enable it.” Genomics has had an exceptionally powerful enabling role in biomedical advances over the past decade. Only time will tell how deep and how far that power will take us.
I am willing to bet that the the genome revolution in the long run will be Enabling the future best is yet to come. ■ the development of targeted therapeutics based I propose five major lessons that could be Francis Collins is director of the National on a detailed molecular understanding of patho- gleaned from this first decade of the genome Institutes of Health, Bethesda, Maryland 20892, genesis. However, this is also the goal most chal- era. First, free and to genome data USA. Between 1993 and 2008 he was director of lenged by long timelines, high failure rates and has had a profoundly positive effect on progress. the National Human Genome Research Institute. exorbitant costs. Despite those obstacles, inspir- The radical ethic of immediate data deposit, e-mail: [email protected] ing examples of success are in hand, many of adopted by the Human Genome Project in them (trastuzumab, imatinib, gefitinib and erlo- 1996 and now the norm for other community See Editorial, page 649, and human genome tinib) for the treatment of cancer. Furthermore, resource projects, empowers the best brains special at www.nature.com/humangenome.


Multiple personal genomes await

Genomic data will soon become a commodity; the next challenge, linking with and disease, will be as great as the one genomicists faced a decade ago, says J. Craig Venter.

Nearly ten years after Francis Collins and I stood at the White House with President Bill Clinton to announce the first two drafts of the human genome, the technology for DNA sequencing has progressed more dramatically than any of us could have predicted. The Human Genome Project took a worldwide effort and billions of dollars to reach what some had thought was an impossible goal. Today, thanks to innovation inspired in part by the race for the first draft between my company Celera Genomics, then in Rockville, Maryland, and the public effort led by Collins, it is possible to sequence a human genome in a day on a single machine for just a few thousand dollars.

Yet there is still some way to go before this capability can have a significant effect on medicine and health. As sequencing costs continue to plummet, data quality needs to improve. The generation of genomic data will have little value without corresponding phenotypic information about individuals’ observable characteristics, and computational tools for linking the two. The challenges facing researchers today are at least as daunting as those my colleagues and I faced a decade ago.

In hindsight

The Human Genome Project was controversial from the start for several reasons, in particular the likelihood that it would divert funds from other biological research projects. Some of the early decisions had long-term effects on research strategies. In 1989, with sequencing costs projected to fall to US$1 a base pair within a few years, a group of those involved decided to ask the US Congress for $3 billion to cover the costs of sequencing a haploid genome consisting of 3 billion base pairs, rather than $6 billion to cover a diploid genome of 6 billion base pairs representing both sets of chromosomes, which was considered too expensive.

It was also agreed that, because of the hugely ambitious nature of the project, armies of scientists would be needed to sequence fragmented pieces of the genome. Once these decisions were made, there was little room for substantive innovation. I believe that history has proved they were a mistake. Although for example argued that his first sequencing machines were equivalent to the Model A Ford and that a major effort was needed on technology development; the project moved forwards regardless.

In 1994, frustrated with the slow progress and inefficient use of labour, my team at the Institute for Genome Research in Rockville, Maryland, developed the ‘whole-genome shotgun sequencing’ approach, which we used to sequence an entire bacterial genome in three months (ref. 1). Five years later at Celera we applied this approach to the Drosophila and human genomes (refs 2, 3). Once highly controversial, whole-genome shotgun sequencing has been used for almost every genome sequenced since 2001.

Work at Celera was done in a single large facility with 300 automated DNA sequencers and a powerful computer. The public project used around 600 DNA sequencers distributed among several laboratories around the world. Together, the two projects gave a remarkable early insight into the human species, the most significant findings being the small number of human genes (26,000 compared with earlier estimates of up to 300,000) and the small amount of variation (0.1%) between individual humans (refs 3, 4).

With the publication of the human genome drafts in 2001, many analysts predicted the end of the market for DNA sequencing technology. As we now know, a very different story unfolded. Sequencing centres turned to zoology, and the number of sequenced genomes of non-human species grew to today’s tally of more than 3,800 (see ‘Completed genomes’). At the same time, data from labs around the world continued to add to the draft human genome, resulting in an improved version in 2004 (ref. 5). My team concentrated on completing the sequencing of my personal genome, resulting in the publication of the first diploid human genome from an individual in 2007 (known as the HuRef genome; ref. 6).

[Figure: COMPLETED GENOMES*. More than 3,800 organisms have now had their genomes sequenced. Vertical axis: number of completed genomes (0 to 4,000); horizontal axis: 2000–2009. Legend: eukaryotes (plant, mammal, other vertebrates, invertebrate, fungus); prokaryotes (bacteria, archaea); non-cellular (bacteriophage, virus, archaeal virus); HIV. *Deposited in the International Nucleotide Sequence Database Collaboration.]

Bigger differences

This first diploid human genome ushered in a new picture of human diversity. The sequence showed that my two parental genomes differed from each other by 0.5% when insertions and deletions of nucleotides in the DNA sequence were included along with single nucleotide polymorphisms (SNPs), another common form of genetic variation. This was a dramatic increase over the 0.1% estimated in 2001 from looking at SNPs alone. It was subsequently discovered that the genomes of different individuals differ by between 1% and 3%.

Why did the data from the first two draft human genomes fail to show that individual genomes differ so significantly? The public effort was by design a haploid genome project. Construction of the haploid sequence involved sequencing cloned segments; there was therefore no way to directly detect polymorphisms, insertions and deletions. In contrast, the problem with the Celera programme was that there was too much genetic variation. The DNA came from two males and three females of various ethnicities including African American, Chinese, Hispanic and Caucasian (ref. 3). Early versions of the Celera genome assembly software used a ‘majority rule’ approach to generate a single consensus sequence. This left out a substantial number of insertions and deletions from each genome. Had we used only one person’s DNA, we would have had a much more complete understanding back then of the extent to which individuals vary genetically.

Despite the limitations of both projects, the race to sequence the human genome inspired many in basic research labs and companies to develop the sequencing technologies that have emerged over the past few years. In 2003, to spur commercial and government investment, my institute set up a $500,000 prize to be awarded to the team that made the most substantial progress towards the sequencing of a human genome for $1,000 or less. This has since evolved into the $10-million Archon X-Prize, to go to the first team to build a device that can sequence 100 human genomes to a high degree of accuracy within 10 days for minimal costs (ref. 7).

These and other incentives rapidly accelerated the pace of innovation (see ‘Speed reading’). Early this year, two companies, Illumina in , California, and Life Technologies in Carlsbad, California, announced sales of new sequencing instruments that can generate 25 billion base pairs per day and 100 billion base pairs per day, respectively. Life Technologies also announced a future version that will produce 300 billion base pairs per day. Both companies claim that their current instruments can sequence a human genome in a day for less than $6,000. Another company, Complete Genomics in Mountain View, California, maintains that it can sequence a human genome for $5,000–$8,000, but it is not producing instruments for sale. This incredible progress matches or exceeds anything that has happened in high-performance computing over the same period. Consider that the first sequencer in my National Institutes of Health lab in 1987 processed 4,800 base pairs a day. Twenty-three years on, the latest Life Technologies instrument has improved on that by about eight orders of magnitude.

[Figure: SPEED READING. Genomes can now be sequenced around 50,000 times faster than in 2000. Vertical axis: kilobases per day per machine (logarithmic scale, 100 to 100,000,000); horizontal axis: 2000–2010. Source: M. Stratton/Wellcome Trust Sanger Institute.]

Yet these impressive increases have come with a big penalty. Most of the high-speed instruments sequence DNA in very short segments (or ‘reads’) of less than 100 base pairs at a time. This is significantly shorter than the reads produced by the first generation Sanger instruments, which manage 800–900 base pairs per read, or the second generation Roche 454 technology, whose reads approached 500 base pairs. Short reads greatly hamper one’s ability to assemble sequences into long stretches representing the chromosomes. Sequencing groups have tried to overcome these limitations by layering their results on one of the already published human genome sequences instead of trying to assemble the whole sequence from scratch. This gives a distorted view of any single genome and, despite all the advances in processing, the resulting data quality is still well below diagnostic standards.

Improving data quality is crucial, because if a human genome cannot be independently assembled then the sequence data cannot be sorted into the two sets of parental chromosomes, or haplotypes. This process, phasing, will become one of the most useful tools in genomic medicine. Establishing the complete set of genetic information that we received from each parent is crucial to understanding the links between heritability, gene function, regulatory sequences and our predisposition to disease.

Fortunately there are some exciting developments on the way that could help, such as new methods from Pacific Biosciences in Menlo Park, California, and Life Technologies that can produce sequence information from a single DNA strand. This approach promises sequence reads, in the range of thousands of base pairs, that will result in substantially higher-quality genome sequence data.

The next challenge

At the current rate of technological progress, DNA sequencing is soon likely to become a commodity, and the generation of cheap, high-quality sequence data will cease to be an issue. Phenotypes, the next hurdle, present a much greater challenge than genotypes because of the complexity of human biological and clinical information. The experiments that will change medicine, revealing the relationship between human genetic variation and biological outcomes such as physiology and disease, will require the complete genomes of tens of thousands of humans together with comprehensive digitized phenotype data. A simplistic query could be easily scored, for example, “do you have diabetes: yes or no”. A more comprehensive view would include such things as age of onset and a scoring of the range of clinical manifestations associated with the disease, including extent of nerve damage, vascular issues, doses of used and family history. The scoring system would subcategorize characteristics such as disease type, progression and severity.

Even if we had all this information today, we wouldn’t be able to make use of it because we don’t have the computational infrastructure to compare even thousands of genotypes and phenotypes with each other. The need for such an analysis could be the best justification for building a proposed ‘exa scale’ supercomputer, which would run 1,000 times faster than today’s fastest computers. Scientists need to work together on a global basis to set the criteria for phenotype data; the incentive could come from academia, governments or industry.

Where will genomics be ten years from now? As sequencing capacity increases globally and the data quality improves, we will move beyond the current goal of one genome per person to sequencing multiple genomes per person from sources including sperm and egg cells, blastocysts, stem cells, pre-tumour cells and cancer cells. This will enable us to select healthy cells for reproduction and tissue transplants, or to better understand and tumour development. Equally important for medical progress is the sequencing of the genomes of the millions of microbacteria that dwell within all of us (ref. 8). The genome revolution is only just beginning. ■

J. Craig Venter is at the J. Craig Venter Institute, , California 92121, USA.
e-mail: [email protected]

1. Fleischmann, R. D. et al. Science 269, 496–512 (1995).
2. Adams, M. D. et al. Science 287, 2185–2195 (2000).
3. Venter, J. C. et al. Science 291, 1304–1351 (2001).
4. International Human Genome Sequencing Consortium Nature 409, 860–921 (2001).
5. International Human Genome Sequencing Consortium Nature 431, 931 (2004).
6. Levy, S. et al. PLoS Biol. 5, e254 (2007).
7. http://genomics.xprize.org/archon-x-prize-for-genomics/prize-overview
8. NIH HMP Working Group et al. Genome Res. 19, 2317–2323 (2009).

See Editorial, page 649, and human genome special at www.nature.com/humangenome.
