Making Sense of Cancer Genomic Data
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press REVIEW Making sense of cancer genomic data Lynda Chin,1,2,3 William C. Hahn,1,2 Gad Getz,2 and Matthew Meyerson1,2 1Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA; 2Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA High-throughput tools for nucleic acid characterization 2010). Moreover, the current pace of technological ad- now provide the means to conduct comprehensive anal- vances make it increasingly clear that the ability to yses of all somatic alterations in the cancer genomes. perform prospective and comprehensive molecular pro- Both large-scale and focused efforts have identified new filing of tumors will become commonplace and enable targets of translational potential. The deluge of informa- genome-informed personalized cancer medicine. tion that emerges from these genome-scale investigations However, the bottlenecks along this path are formi- has stimulated a parallel development of new analytical dable and numerous. For one, these large-scale genome frameworksandtools.Thecomplexityofsomaticgeno- characterization efforts involve the generation and in- mic alterations in cancer genomes also requires the de- terpretation of data at an unprecedented scale, which has velopment of robust methods for the interrogation of the brought into sharp focus the need for improved informa- function of genes identified by these genomics efforts. tion technology infrastructure and new computational Here we provide an overview of the current state of can- tools to render the data suitable for meaningful analyses. cer genomics, appraise the current portals and tools for Moreover, exploiting this information to develop new accessing and analyzing cancer genomic data, and dis- therapeutic strategies depends on further biological in- cuss emerging approaches to exploring the functions of sights derived from understanding the functional conse- somatically altered genes in cancer. quences of such genomic alterations. Thus, it is also clear that new approaches that permit the efficient validation The development of powerful and scalable methods to of genomic data are required as a first step in distinguish- analyze nucleic acids has transformed biological inquiry ing mutations responsible for disease pathogenesis from and has the potential to alter the practice of medicine other mutations that are the consequence of genomic (Lander 1996; Collins et al. 2003). The application of instability; defining genes involved in cancer initiation, such technologies, together with powerful computational progression, or maintenance; and identifying the optimal methods in human disease and animal model systems, ways to exploit this information therapeutically. has facilitated the study of both normal and disease- In this review, we provide an overview of the current state affected tissues in a manner previously not possible. In- of cancer genomics, describe the types of data being gen- deed, the connection between basic inquiry and potential erated and where they can be accessed, and discuss recent clinical translation has never been more intimate. progress in developing tools, models, and methods for This convergence is particularly evident in cancer, analysis of gene functions in cancer, which is the requisite a complex multigenic disease characterized by a diversity next step in the translation of cancer genome information. of genetic and epigenetic alterations (Vogelstein and Kinzler 1993; Weir et al. 2004; Jones and Baylin 2007; State of cancer genomics Stratton et al. 2009). Early cancer genome analysis has already led to new targets for cancer therapy and new Nearly all cancer genomes contain many nucleotide se- insights into the relationship of specific genetic muta- quence changes compared with the germline of the can- tions and clinical response, as well as new approaches cer patient (Vogelstein and Kinzler 1993; Stratton et al. useful for diagnosis and prognosis. These initial efforts 2009). These variations include the genomic alterations have motivated large-scale coordinated cancer genomic that cause or promote cancer, often referred to colloqui- efforts to obtain complete catalogs of the genomic alter- ally as ‘‘drivers,’’ as well as alterations present in the ations in specific cancer types (The Cancer Genome Atlas cancer genome but without obvious advantage to the can- [TCGA], http://cancergenome.nih.gov; Hudson et al. cerous cells when they occurred, referred to as ‘‘passen- gers’’ (Davies et al. 2005). The major known somatic alterations in the cancer genome include nucleotide sub- [Keywords: functional validation; genomic data portal; integrative analysis; stitution mutations and small insertion/deletions (indels), large-scale cancer genomics] 3Corresponding author. copy number gains and losses, chromosomal rearrange- E-MAIL [email protected]; FAX (617) 582-8169. ments, and nucleic acids of foreign origin (e.g., oncogenic Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.2017311. Freely available online through the Genes & Development Open Access viruses) (Weir et al. 2004; Chin and Gray 2008; Stratton option. et al. 2009). In addition, alterations in the epigenetic 534 GENES & DEVELOPMENT 25:534–555 Ó 2011 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/11; www.genesdev.org Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press Making sense of cancer genomic data mechanisms that regulate gene expression occur in most project applied targeted sequencing, copy number, and cancers; this subject has been covered elsewhere (Jones expression profiling, in addition to epigenetic assessment and Baylin 2002, 2007) and is not treated in detail here. to a large number of stringently qualified tumor samples to All of these acquired changes occur in the setting of define core pathways of deregulation in GBM (The Cancer germline variations of copy number and nucleotide se- Genome Atlas Network 2008) and discover genomic and quence, which may influence the rate of occurrence epigenomic definition of molecular subtypes (Noushmehr and/or the effects of somatic genetic alterations (Balmain et al. 2010; Verhaak et al. 2010). Indeed, global com- et al. 2003). Moreover, although somatic mutations occur prehensive analysis with complementary genome an- in tumor cells, it is increasingly clear that the tumor notation tools in statistically powered high-quality microenvironment mediates important heterotypic sig- sample cohorts is a key aspect of the current consensus nals between the tumor and stromal cells for growth and standard for large-scale cancer genomics efforts under survival (Hanahan and Weinberg 2000). In this respect, the International Cancer Genome Consortium (ICGC) global gene expression encompassing the full tran- (Hudson et al. 2010), now encompassing >20 projects scriptome—including coding messenger RNAs (mRNAs) from 14 countries (Table 1). (Schena et al. 1995) and noncoding microRNAs (miRNAs) (Lu et al. 2005)—of the complex tumor tissue reflects the Second-generation sequencing technologies panoply of somatic and epigenomic alterations, together with the state of cell differentiation of the tumor and the During the past several decades, continuous improve- admixture of noncancerous cells. Hence, transcriptional ments in genomic technology have led to a series of profiling can define a unique gene expression signature breakthroughs in our understanding of cancer genetics. for each tumor that may prove useful for classification The advent of second-generation sequencing technolo- and prognosis (Golub et al. 1999; Alizadeh et al. 2000; gies and their applications to cancer have already accel- van’t Veer et al. 2002). erated the pace of genome discovery, as summarized in recent reviews (Shendure and Ji 2008; Meyerson et al. 2010). During the last 5 years, a variety of array-based Evolution of cancer genomics methods have been developed, including picotiter plate Over the past decade, technologies for detection of each pyrosequencing (Margulies et al. 2005; Wheeler et al. of these types of alterations have been developed and 2008), single-nucleotide fluorescent base extension with applied to analyses of the cancer genomes. The initial reversible terminators (Bentley et al. 2008), and ligation- studies focused on a single technology platform and/or based sequencing (Shendure et al. 2005; Drmanac et al. type of genetic alteration. For example, high-resolution 2010). All of these second-generation methods involve copy number profiling has led to the discovery of novel the amplification of individual DNA molecules on arrays oncogenes in ovarian cancer (Nanjundan et al. 2007), or beads prior to massively parallel sequence generation. melanoma (Garraway et al. 2005; Kim et al. 2006; Scott The throughput limitation and cost of first-generation et al. 2009), lung carcinoma (Weir et al. 2007; Bass et al. Sanger-based capillary sequencing technology had, until 2009), and colon carcinoma (Firestein et al. 2008), and now, dictated two predominant study designs for cancer tumor suppressor genes in leukemias (Mullighan et al. genome discovery: one in which large numbers of sam- 2007, 2008). Similarly, the application of directed se- ples were analyzed but only a small number of genes were quencing of specific classes of genes has identified novel interrogated (Greenman