NEWS FEATURE NATURE|Vol 464|15 April 2010

The CANCER GENOME challenge

Databases could soon be flooded with genome sequences from 25,000 tumours. Heidi Ledford looks at the obstacles researchers face as they search for meaning in the data.

When it was first discovered, in 2006, in a study of 35 colorectal cancers¹, the mutation in the gene IDH1 seemed to have little consequence. It appeared in only one of the tumours sampled, and later analyses of some 300 more have revealed no additional mutations in the gene. The mutation changed only one letter of IDH1, which encodes isocitrate dehydrogenase, a lowly housekeeping enzyme involved in metabolism. And there were plenty of other mutations to study in the 13,000 genes sequenced from each sample. “Nobody would have expected IDH1 to be important in cancer,” says Victor Velculescu, a researcher at the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University in Baltimore, Maryland, who had contributed to the study.

But as efforts to sequence tumour DNA expanded, the IDH1 mutation surfaced again: in 12% of samples of a type of brain cancer called glioblastoma multiforme², then in 8% of acute myeloid leukaemia samples³. Structural studies showed that the mutation changed the activity of isocitrate dehydrogenase, causing a cancer-promoting metabolite to accumulate in cells⁴. And at least one pharmaceutical company, Agios Pharmaceuticals in Cambridge, Massachusetts, is already hunting for a drug to stop the process.

Four years after the initial discovery, ask a researcher in the field why cancer genome projects are worthwhile, and many will probably bring up the IDH1 mutation, the inconspicuous needle pulled from a veritable haystack of cancer-associated mutations thanks to high-powered genome sequencing. In the past two years, labs around the world have teamed up to sequence the DNA from thousands of tumours along with healthy cells from the same individuals. Roughly 75 cancer genomes have been sequenced to some extent and published; researchers expect to have several hundred completed sequences by the end of the year.

The efforts are certainly creating bigger haystacks. Comparing the gene sequence of any tumour to that of a normal cell reveals dozens of single-letter changes, or point mutations, along with repeated, deleted, swapped or inverted sequences (see ‘Genomes at a glance’). “The difficulty,” says Bert Vogelstein, a cancer researcher at the Ludwig Center for Cancer Genetics and Therapeutics at Johns Hopkins, “is going to be figuring out how to use the information to help people rather than to just catalogue lots and lots of mutations”. No matter how similar they might look clinically, most tumours seem to differ genetically. This stymies efforts to distinguish the mutations that cause and accelerate cancers (the drivers) from the accidental by-products of a cancer’s growth and thwarted DNA-repair mechanisms (the passengers).

GENOMES AT A GLANCE
Circos plots can give a snapshot of the mutations within a genome. The outer ring represents the chromosomes and the inner rings each detail the location of different types of mutations.
Features labelled in the plot: interchromosomal rearrangement, intrachromosomal rearrangement, point mutation, copy-number change.
Adapted from M. R. Stratton, P. J. Campbell & P. A. Futreal Nature 458, 719–724 (2009).

Researchers can look for mutations that pop up again and again, or they can identify key pathways that are mutated at different points. But the projects are providing more questions than answers. “Once you take the few obvious mutations at the top of the list, how do you make
sense of the rest of them?” asks Will Parsons, a paediatric oncologist at Baylor College of Medicine in Houston, Texas. “How do you decide which are worthy of follow up and functional analysis? That’s going to be the hard part.”

Drivers wanted

Because cancer is a disease so intimately associated with genetic mutation, many thought it would be amenable to genomic exploration through initiatives based on the collaborative model of the Human Genome Project. The International Cancer Genome Consortium (ICGC), formed in 2008, is coordinating efforts to sequence 500 tumours from each of 50 cancers. Together, these projects will cost in the order of US$1 billion. Eleven countries have already signed on to cover more than 20 cancers (see map). The ICGC includes two older, large-scale projects: the Cancer Genome Project, at the Wellcome Trust Sanger Institute near Cambridge, UK, and the US National Institutes of Health’s Cancer Genome Atlas (TCGA). The Cancer Genome Project has churned out more than 100 partial genomes and roughly 15 whole genomes in various stages of completion, and intends to tackle 2,000–3,000 more over the next 5–7 years. TCGA, meanwhile, wrapped up a three-year, three-cancer pilot project last year, then launched a full-scale endeavour to sequence up to 500 tumours from each of more than 20 cancers over the next five years.

ALL TOGETHER NOW
Eleven countries have signed on to sequence DNA from 500 tumour samples for each of more than 20 cancer types for the International Cancer Genome Consortium. Each cancer type is estimated to cost nearly US$20 million to sequence.
Number of cancer types being sequenced, by country:
• Australia (2): pancreatic cancer (ductal adenocarcinoma); ovarian cancer
• Britain (3): breast cancer (ER–, PR–, HER–); breast cancer (lobular); breast cancer (ER+, HER–), European Union sponsored
• Canada (1): pancreatic cancer (ductal adenocarcinoma)
• China (1): gastric cancer
• France (3): breast cancer (HER2 overexpressing); liver cancer (alcohol-associated); renal-cell carcinoma, European Union sponsored
• Germany (1): paediatric brain tumours (medulloblastoma, pilocytic astrocytoma)
• India (1): oral cancer (squamous-cell carcinoma) (gingivobuccal)
• Italy (1): rare pancreatic cancers (enteropancreatic endocrine, pancreatic exocrine)
• Japan (1): liver cancer (virus-associated)
• Spain (1): chronic lymphocytic leukaemia
• United States (6+, through The Cancer Genome Atlas), including: brain cancer (glioblastoma multiforme); lung cancer (adenocarcinoma); acute myeloid leukaemia; colon cancer (adenocarcinoma); others

Although the groups collaborate, TCGA has not yet been able to fully join the ICGC owing to differences in privacy regulations governing access to genome data. For now, members of both consortia are sequencing a subset of tumour samples from each cancer type (around 100) and will follow this by sequencing promising areas in the remaining 400. That’s useful, says Joe Gray, a cancer researcher at Lawrence Berkeley National Laboratory in California, but it’s just a start. “In the early days, I thought that doing a few hundred tumours would probably be sufficient,” he says. “Even at the level of 1,000 samples, I think we’re probably not going to have the statistics we want.”

What bigger numbers could provide is more driver mutations like the one in IDH1. These could, researchers argue, provide the clearest route to developing new cancer therapies. Many scientists have looked for mutations that occur repeatedly in a given type of tumour. “If there are lots and lots of abnormalities of a particular gene, the most likely explanation is often that those mutations have been selected for by the cancers and therefore they are cancer-causing,” says Michael Stratton, who co-directs the Cancer Genome Project. This approach has worked well in some cancers. For example, with a frequency of 12%, it is clear that the IDH1 mutation is a driver in glioblastoma. Such searches should be fruitful for cancers that have fewer mutations overall. The full genome sequence of acute myeloid leukaemia cells yielded just ten mutations in protein-coding genes, eight of which had not previously been linked with cancer⁵.

Other cancers have proved more challenging. IDH1 was overlooked at first, on the basis of the colorectal cancer data alone. It was not until the search was expanded to other cancers that its importance was revealed. Moreover, some mutations shown to be drivers haven’t turned up as often as expected. “It’s very clear, now that all the genes have been sequenced in this many tumours, you have drivers that are mutated at very low frequency, in less than 1% of the cancers,” says Vogelstein. To find these low-frequency drivers, researchers are sampling heavily: sequencing 500 samples per cancer should reveal mutations that are present in as few as 3% of the tumours. Although they may not contribute to the majority of tumours, they may still have important biological lessons, says Stratton. “We need to know about these to understand the overall genomic landscape of cancer.”

Another popular approach has been to look for mutations that cluster in a pathway, a group of genes that work together to carry out a specific process, even if the mutations strike it at different points. In an analysis of 24 pancreatic cancers⁶, for instance, Vogelstein and his colleagues identified 12 signalling pathways that had been altered. Nevertheless, Vogelstein cautions that this approach is not easy to pursue. Many pathways overlap, and their boundaries are unclear. And because many have been defined using data from different animals or cell types, they do not always match what’s found in a specific human tissue. “When you layer on top of that the fact that the cancer cell is not wired the same as a normal cell, that raises even further difficulties,” says Vogelstein.

How much is enough?

Separating drivers from passengers will become even more difficult as researchers move towards sequencing entire tumour genomes. To date, only a fraction of the existing cancer genomes are complete sequences. To keep costs low, most have covered only the exome, the 1.5% of the genome that directly codes for protein and is therefore the easiest
to interpret. Assigning importance to a mutation found in the murky non-protein-coding depths of the genome will be more challenging, especially given that scientists don’t yet know what function, if any, most of these regions usually serve. The vast majority of mutations fall here. The full genome sequence of a lung cancer cell line, for example, yielded 22,910 point mutations, only 134 of which were in protein-coding regions (see ‘Cancer genomes coming fast’)⁷. Nevertheless, finding them is worth the cost and effort, argues Stratton. “It could be that none of those mutations pertain to the causation of cancer,” he says. “But it equally could be that some do. We’ll never find out unless we systematically investigate.”

CANCER GENOMES COMING FAST
A few examples of fully and partially sequenced cancer genomes and their defining characteristics.

LUNG CANCER
Cancer: small-cell lung carcinoma
• Sequenced: full genome
• Source: NCI-H209 cell line
• Point mutations: 22,910
• Point mutations in gene regions: 134
• Genomic rearrangements: 58
• Copy-number changes: 334
Highlights: Duplication of the CHD7 gene confirmed in two other small-cell lung carcinoma cell lines.
Source: E. D. Pleasance et al. Nature 463, 184–190 (2010).

SKIN CANCER
Cancer: metastatic melanoma
• Sequenced: full genome
• Source: COLO-829 cell line
• Point mutations: 33,345
• Point mutations in gene regions: 292
• Genomic rearrangements: 51
• Copy-number changes: 41
Highlights: Patterns of mutation reflect damage by ultraviolet light.
Source: E. D. Pleasance et al. Nature 463, 191–196 (2010).

BREAST CANCER
Cancer: basal-like breast cancer
• Sequenced: full genome
• Source: primary tumour, brain metastasis, and tumours transplanted into mice
• Point mutations: 27,173 in primary, 51,710 in metastasis and 109,078 in transplant
• Point mutations in gene regions: 200 in primary, 225 in metastasis, 328 in transplant
• Genomic rearrangements: 34
• Copy-number changes: 155 in primary, 101 in metastasis, 97 in transplant
Highlights: The CTNNA1 gene encodes a putative suppressor of metastasis that is deleted in all tumour samples.
Source: L. Ding et al. Nature 464, 999–1005 (2010).

BRAIN CANCER
Cancer: glioblastoma multiforme
• Sequenced: exome (no complete Circos plot)
• Source: 7 patient tumours, 15 tumours transplanted into mice (follow-up sequencing on 21 genes for 83 additional samples)
• Genes containing at least one protein-altering mutation: 685
• Genes containing at least one protein-altering point mutation: 644
• Copy-number changes: 281
Highlights: Mutations in the active site of IDH1 have been found in 12% of patients.
Source: E. R. Mardis et al. N. Engl. J. Med. 361, 1058–1066 (2009).

Not everyone agrees. Some researchers argue that the costs of cancer-genome projects currently outweigh the benefits. Prices are poised to drop dramatically in the next few years as a new generation of sequencing machines comes online, says Ari Melnick, a cancer researcher at Weill Cornell Medical College in New York. “Why not wait for that?” he asks. In the meantime there are lower-hanging fruit to pick, says Stephen Elledge, a geneticist at Harvard Medical School in Boston, Massachusetts. Mutations that affect how many copies of a gene are found in a genome, he argues, are cheaper to assess and provide a more intuitive insight into biological processes. “If you delete something, you can turn a pathway off very efficiently,” he says. “And if you amplify something, you can increase flow through the pathway. Making point mutations in genes to activate them is a little dicier.”

Changes in gene copy number can be detected using fast, relatively inexpensive array-based technologies, but sequencing can provide a higher-resolution snapshot of these regions, says Elaine Mardis, a sequencing specialist at Washington University in St Louis, Missouri. Sequencing can enable researchers to map the boundaries of insertions and duplications with more precision and to catch tiny duplications or deletions that might have gone undetected by an array. Mardis, along with her colleague Richard Wilson and others, used sequencing to detect overlapping deletions in a breast cancer that had spread to other parts of the body (see page 999)⁸. The deletions spanned the region containing CTNNA1, a gene thought to suppress the spread, or metastasis, of cancer.

Meanwhile, cancer genomics is spreading out from under the large, centralized projects. For example, a $65-million, three-year paediatric-cancer genome project headed by researchers at St Jude Children’s Research Hospital in Memphis, Tennessee, and Washington University aims to sequence 600 tumours. And more small projects seem poised to pop up. “Pretty much any cancer centre with any interest in the genomics of cancer is now buying these sequencers and using them,” says Sam Aparicio, a cancer researcher at the University of British Columbia in Vancouver, Canada.

Part of the reason that cancer-genome proponents don’t want to wait for sequencing costs to drop is that the real work starts after the sequencing is over. As Velculescu puts it, “Ultimately it’s going to take good old-fashioned biology and experimental analyses to really determine what these mutations are doing.” With this in mind, the US National Cancer Institute established two 2-year projects in September last year to develop high-throughput methods to test how the mutations identified by the TCGA pilot project affect cell function. The two centres, one at the Dana-Farber Cancer Center in Boston and another at Cold Spring Harbor Laboratory in New York, aim to systematize the way that researchers pull other needles like the IDH1 mutation from the cancer-genomes haystack and make sense of them. The Boston team will systematically amplify and reduce the expression of genes of interest in cell cultures, and the Cold Spring Harbor centre will study cancer-associated mutations using tumours transplanted into mice.

In addition, large-scale projects are being run in parallel with the cancer-sequencing consortia to assess the effects of deleting each gene in the mouse genome, enabling researchers to learn more about the normal function of genes that are mutated in cancer. Sequencing is all very well, researchers have realized, but it won’t be enough. “Some people say statistics should get us all the drivers that are worthwhile,” says Lynda Chin, an investigator with TCGA at Harvard Medical School. “I don’t agree with that. At the end of the day, we need these functional studies to prioritize the list of potential cancer-relevant candidates.”

Heidi Ledford is a reporter for Nature in Cambridge, Massachusetts.

1. Sjöblom, T. et al. Science 314, 268–274 (2006).
2. Parsons, D. W. et al. Science 321, 1807–1812 (2008).
3. Mardis, E. R. et al. N. Engl. J. Med. 361, 1058–1066 (2009).
4. Dang, L. et al. Nature 462, 739–744 (2009).
5. Ley, T. J. et al. Nature 456, 66–72 (2008).
6. Jones, S. et al. Science 321, 1801–1806 (2008).
7. Pleasance, E. D. et al. Nature 463, 184–190 (2010).
8. Ding, L. et al. Nature 464, 999–1005 (2010).

See also News and Views, page 989.

© 2010 Macmillan Publishers Limited. All rights reserved