Advances in Understanding Cancer Genomes Through Second-Generation Sequencing
Total Page:16
File Type:pdf, Size:1020Kb
Advances in understanding cancer genomes through second-generation sequencing Matthew Meyerson, Stacey Gabriel and Gad Getz Second-generation A major near-term medical impact of the genome comprehensive genome-based diagnosis of cancer is sequencing technology revolution will be the elucidation of mecha- becoming increasingly crucial for therapeutic decisions. Used in this Review to refer to nisms of cancer pathogenesis, leading to improvements During the past decades, there have been major sequencing methods that have in the diagnosis of cancer and the selection of cancer advances in experimental and informatic methods emerged since 2005 that second-generation sequencing parallelize the sequencing treatment. Thanks to for genome characterization based on DNA and RNA 1–5 process and produce millions technologies , recently it has become feasible to microarrays and on capillary-based DNA sequenc- of typically short sequence sequence the expressed genes (‘transcriptomes’)6,7, ing (‘first-generation sequencing’, also known as Sanger reads (50–400 bases) from known exons (‘exomes’)8,9, and complete genomes10–15 sequencing). These technologies provided the ability amplified DNA clones. of cancer samples. to analyse exonic mutations and copy number altera- It is also often known as next-generation sequencing. These technological advances are important for tions and have led to the discovery of many important advancing our understanding of malignant neoplasms alterations in the cancer genome20. because cancer is fundamentally a disease of the genome. However, there are particular challenges for the A wide range of genomic alterations — including point detection and diagnosis of cancer genome alterations. mutations, copy number changes and rearrangements For example, some genomic alterations in cancer are — can lead to the development of cancer. Most of these prevalent at a low frequency in clinical samples, often alterations are somatic, that is, they are present in cancer owing to substantial admixture with non-malignant cells but not in a patient’s germ line16. cells. Second-generation sequencing can solve such An impetus for studies of somatic genome altera- problems21. Furthermore, these new sequencing meth- Dana-Farber Cancer Institute, 44 Binney Street, Boston, tions, which are the focus of this Review, is the poten- ods make it feasible to discover novel chromosomal rear- 22 23–25 Massachusetts 02115, USA. tial for therapies targeted against the products of these rangements and microbial infections and to resolve Broad Institute, 7 Cambridge alterations. For example, treatment with the inhibitors copy number alterations at very high resolution22,26. Center, Cambridge, of the epidermal growth factor receptor kinase (EGFR), At the same time, the avalanche of data from second- Massachusetts 02142, USA. gefitinib and erlotinib, leads to a significant survival generation sequencing provides a statistical and com- Correspondence to M.M. email: matthew_meyerson@ benefit in patients with lung cancer whose tumours putational challenge: how to separate the ‘wheat’ of dfci.harvard.edu carry EGFR mutations, but no benefit in patients causative alterations from the ‘chaff’ of noise caused by doi:10.1038/nrg2841 whose tumours carry wild-type EGFR17–19. Therefore, alterations in the unstable and evolving cancer genome. NATURE REVIEWS | VOLUME 11 | OCTOBER 2010 | © 2010 Macmillan Publishers Limited. All rights reserved This challenge is likely to be solved in part by system- generally used for germline genome analysis — periph- atic analyses of large cancer genome data sets that eral blood mononuclear cells — are known to be hetero- will provide sufficient statistical power to overcome geneous only at the rearranged immunoglobulin and T experimental and biological noise27,28. cell receptor loci in a subset of cells. By contrast, a cancer In this Review, we discuss the key challenges in can- specimen contains a mixture of malignant and non- cer genome sequencing, the methods that are currently malignant cells and, therefore, a mixture of cancer and available and their relative values for detecting differ- normal genomes (and transcriptomes). Furthermore, ent types of genomic alteration. We then summarize the the cancers themselves may be highly heterogeneous main points to consider in the computational analysis and composed of different clones that have different of cancer genome sequencing data and comment on the genomes34. Cancer genome analytical models must take future potential for using genomics in cancer diagnosis. these two types of heterogeneity (cancer versus normal Cancer genome sequencing is a rapidly moving field, heterogeneity and within-cancer heterogeneity) into so in this Review we aim to set out the principles and account in their prediction of genome alterations. important methodological considerations, with a brief summary of important findings to date. Structural variability of cancer genomes. Cancer genomes are enormously diverse and complex. They vary substantially in their sequence and structure com- Cancer samples and cancer genomes have general char- pared to normal genomes and among themselves. To acteristics that are distinct from other tissue samples paraphrase Leo Tolstoy’s famous first line from Anna and from genomic sequences that are inherited through Karenina: normal human genomes are all alike, but every the germ line. These require particular consideration in cancer genome is abnormal in its own way. second-generation sequencing analyses. Specifically, cancer genomes vary considerably in their mutation frequency (degree of variation compared Characteristics of cancer samples for genomic analysis. to the reference sequence), in global copy number or Cancer samples differ in their quantity, quality and purity ploidy, and in genome structure. These variations have from the peripheral blood samples that are used for germ- several implications for cancer genome analysis: the line genome analysis. Surgical resection specimens tend presence of a somatic mutation is not enough to establish First-generation sequencing to be large and have been the mainstay of cancer genome statistical significance as it must be evaluated in terms (also known as Sanger analysis. However, diagnostic biopsies from patients with of the sample-specific background mutation rate, which sequencing or capillary disseminated disease tend to contain few cells — as surgi- can vary at different types of nucleotides (discussed sequencing). The standard cal cure is not possible in these cases, minimizing biopsy further below). The analysis of mutations must also be sequencing methodology used to sequence the reference size is a safety consideration. Therefore, the quantity of adjusted for the ploidy and the purity of each sample human (and other model nucleic acids available may be limiting; obtaining sequence and the copy number at each region. For example, if 50% organism) genomes. It uses information from such biopsies will require decreasing the of the tumour DNA is derived from cancer cells and a radioactively or fluorescently minimum inputs for second-generation sequencing. An mutation is present on 1 of 4 copies of chromosome 11, labelled dideoxynucleotide triphosphates (ddNTPs) as alternative approach to sequencing from small samples is the frequency of that mutation will be 12.5% in the DNA chain terminators. Various whole-genome amplification, but this method does not pre- sample. Similar considerations apply to the detection of detection methods allow serve genome structure and can give rise to artefactual somatic rearrangements. read-out of sequence according nucleotide sequence alterations29. To identify somatic alterations in cancer, comparison to the incorporation of each Nucleic acids from cancer are also often of lower with matched normal DNA from the same individual is specific terminator (ddATP, ddCTP, ddGTP or ddTTP). quality than those purified from peripheral blood. One essential. This is largely owing to our incomplete knowl- reason for this is technical: most cancer biopsy and edge of the variations in the normal human genome; to Whole-genome resection specimens are formalin-fixed and paraffin- date, each ‘matched normal’ cancer genome sequence amplification embedded (FFPE) to optimize the resolution of micro- has identified large numbers of mutations and rear- Various molecular techniques (including multiple scopic histology. Nucleic acids from FFPE specimens rangements in the germ line that had not been previously 11–15,35 displacement amplification, are likely to have undergone crosslinking and also may described . In the future, the complete characteriza- rolling circle amplification or be degraded30. Second-generation sequence analy- tion of many thousands of normal human genomes may degenerate oligonucleotide sis of FFPE-derived nucleic acids can require special obviate this need for a matched normal sample. primed PCR) in which very experimental 31 and computational methods to han- small amounts (nanograms) 32,33 of a genomic DNA sample dle an increased background mutation rate . A sec- can be multiplied in a largely ond reason for this difference in nucleic acid quality is Second-generation sequencing technologies are based unbiased fashion to produce biological: cancer specimens often include substantial on the simultaneous detection of nucleotides in arrayed suitable quantities for genomic fractions of necrotic or apoptotic cells that reduce the amplified DNA products originating from single