Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

REVIEW

Making sense of genomic data

Lynda Chin,1,2,3 William C. Hahn,1,2 Gad Getz,2 and Matthew Meyerson1,2 1Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA; 2Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA

High-throughput tools for nucleic acid characterization 2010). Moreover, the current pace of technological ad- now provide the means to conduct comprehensive anal- vances make it increasingly clear that the ability to yses of all alterations in the cancer genomes. perform prospective and comprehensive molecular pro- Both large-scale and focused efforts have identified new filing of tumors will become commonplace and enable targets of translational potential. The deluge of informa- genome-informed personalized cancer medicine. tion that emerges from these genome-scale investigations However, the bottlenecks along this path are formi- has stimulated a parallel development of new analytical dable and numerous. For one, these large-scale genome frameworksandtools.Thecomplexityofsomaticgeno- characterization efforts involve the generation and in- mic alterations in cancer genomes also requires the de- terpretation of data at an unprecedented scale, which has velopment of robust methods for the interrogation of the brought into sharp focus the need for improved informa- function of identified by these efforts. tion technology infrastructure and new computational Here we provide an overview of the current state of can- tools to render the data suitable for meaningful analyses. cer genomics, appraise the current portals and tools for Moreover, exploiting this information to develop new accessing and analyzing cancer genomic data, and dis- therapeutic strategies depends on further biological in- cuss emerging approaches to exploring the functions of sights derived from understanding the functional conse- somatically altered genes in cancer. quences of such genomic alterations. Thus, it is also clear that new approaches that permit the efficient validation The development of powerful and scalable methods to of genomic data are required as a first step in distinguish- analyze nucleic acids has transformed biological inquiry ing responsible for disease pathogenesis from and has the potential to alter the practice of medicine other mutations that are the consequence of genomic (Lander 1996; Collins et al. 2003). The application of instability; defining genes involved in cancer initiation, such technologies, together with powerful computational progression, or maintenance; and identifying the optimal methods in human disease and animal model systems, ways to exploit this information therapeutically. has facilitated the study of both normal and disease- In this review, we provide an overview of the current state affected tissues in a manner previously not possible. In- of cancer genomics, describe the types of data being gen- deed, the connection between basic inquiry and potential erated and where they can be accessed, and discuss recent clinical translation has never been more intimate. progress in developing tools, models, and methods for This convergence is particularly evident in cancer, analysis of functions in cancer, which is the requisite a complex multigenic disease characterized by a diversity next step in the translation of cancer genome information. of genetic and epigenetic alterations (Vogelstein and Kinzler 1993; Weir et al. 2004; Jones and Baylin 2007; State of cancer genomics Stratton et al. 2009). Early cancer genome analysis has already led to new targets for cancer therapy and new Nearly all cancer genomes contain many nucleotide se- insights into the relationship of specific genetic muta- quence changes compared with the germline of the can- tions and clinical response, as well as new approaches cer patient (Vogelstein and Kinzler 1993; Stratton et al. useful for diagnosis and prognosis. These initial efforts 2009). These variations include the genomic alterations have motivated large-scale coordinated cancer genomic that cause or promote cancer, often referred to colloqui- efforts to obtain complete catalogs of the genomic alter- ally as ‘‘drivers,’’ as well as alterations present in the ations in specific cancer types ( cancer genome but without obvious advantage to the can- [TCGA], http://cancergenome.nih.gov; Hudson et al. cerous cells when they occurred, referred to as ‘‘passen- gers’’ (Davies et al. 2005). The major known somatic alterations in the cancer genome include nucleotide sub- [Keywords: functional validation; genomic data portal; integrative analysis; stitution mutations and small insertion/deletions (indels), large-scale cancer genomics] 3Corresponding author. copy number gains and losses, chromosomal rearrange- E-MAIL [email protected]; FAX (617) 582-8169. ments, and nucleic acids of foreign origin (e.g., oncogenic Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.2017311. Freely available online through the Genes & Development Open Access viruses) (Weir et al. 2004; Chin and Gray 2008; Stratton option. et al. 2009). In addition, alterations in the epigenetic

534 GENES & DEVELOPMENT 25:534–555 Ó 2011 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/11; www.genesdev.org Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data mechanisms that regulate gene expression occur in most project applied targeted sequencing, copy number, and ; this subject has been covered elsewhere (Jones expression profiling, in addition to epigenetic assessment and Baylin 2002, 2007) and is not treated in detail here. to a large number of stringently qualified tumor samples to All of these acquired changes occur in the setting of define core pathways of deregulation in GBM (The Cancer germline variations of copy number and nucleotide se- Genome Atlas Network 2008) and discover genomic and quence, which may influence the rate of occurrence epigenomic definition of molecular subtypes (Noushmehr and/or the effects of somatic genetic alterations (Balmain et al. 2010; Verhaak et al. 2010). Indeed, global com- et al. 2003). Moreover, although somatic mutations occur prehensive analysis with complementary genome an- in tumor cells, it is increasingly clear that the tumor notation tools in statistically powered high-quality microenvironment mediates important heterotypic sig- sample cohorts is a key aspect of the current consensus nals between the tumor and stromal cells for growth and standard for large-scale cancer genomics efforts under survival (Hanahan and Weinberg 2000). In this respect, the International Cancer Genome Consortium (ICGC) global gene expression encompassing the full tran- (Hudson et al. 2010), now encompassing >20 projects scriptome—including coding messenger RNAs (mRNAs) from 14 countries (Table 1). (Schena et al. 1995) and noncoding microRNAs (miRNAs) (Lu et al. 2005)—of the complex tumor tissue reflects the Second-generation sequencing technologies panoply of somatic and epigenomic alterations, together with the state of cell differentiation of the tumor and the During the past several decades, continuous improve- admixture of noncancerous cells. Hence, transcriptional ments in genomic technology have led to a series of profiling can define a unique gene expression signature breakthroughs in our understanding of cancer genetics. for each tumor that may prove useful for classification The advent of second-generation sequencing technolo- and prognosis (Golub et al. 1999; Alizadeh et al. 2000; gies and their applications to cancer have already accel- van’t Veer et al. 2002). erated the pace of genome discovery, as summarized in recent reviews (Shendure and Ji 2008; Meyerson et al. 2010). During the last 5 years, a variety of array-based Evolution of cancer genomics methods have been developed, including picotiter plate Over the past decade, technologies for detection of each pyrosequencing (Margulies et al. 2005; Wheeler et al. of these types of alterations have been developed and 2008), single-nucleotide fluorescent base extension with applied to analyses of the cancer genomes. The initial reversible terminators (Bentley et al. 2008), and ligation- studies focused on a single technology platform and/or based sequencing (Shendure et al. 2005; Drmanac et al. type of genetic alteration. For example, high-resolution 2010). All of these second-generation methods involve copy number profiling has led to the discovery of novel the amplification of individual DNA molecules on arrays in ovarian cancer (Nanjundan et al. 2007), or beads prior to massively parallel sequence generation. melanoma (Garraway et al. 2005; Kim et al. 2006; Scott The throughput limitation and cost of first-generation et al. 2009), lung carcinoma (Weir et al. 2007; Bass et al. Sanger-based capillary sequencing technology had, until 2009), and colon carcinoma (Firestein et al. 2008), and now, dictated two predominant study designs for cancer tumor suppressor genes in leukemias (Mullighan et al. genome discovery: one in which large numbers of sam- 2007, 2008). Similarly, the application of directed se- ples were analyzed but only a small number of genes were quencing of specific classes of genes has identified novel interrogated (Greenman et al. 2007; The Cancer Genome genes involved in specific types of cancer (Davies et al. Atlas Network 2008; Dalgliesh et al. 2010; Kan et al. 2002; Lynch et al. 2004; Paez et al. 2004; Pao et al. 2004; 2010), versus a second in which all coding genes were Samuels et al. 2004; Stephens et al. 2004; Baxter et al. 2005; sequenced but in only a handful of discovery samples, James et al. 2005; Kralovics et al. 2005; Levine et al. 2005; followed by targeted sequencing of candidates in an ex- Zhao et al. 2005; Pollock et al. 2007; Chen et al. 2008; Dutt tension cohort comprised of an independent set of sam- et al. 2008; George et al. 2008; Janoueix-Lerosey et al. 2008; ples (Sjoblom et al. 2006; Wood et al. 2007; S Jones et al. Mosse et al. 2008). These observations underscore the 2008; Parsons et al. 2008). Second-generation sequencing contribution of different types of somatic genome alter- technology enables the complete sequencing of entire ations in different subsets of cancer, and that comprehen- genomes in a time- and cost-efficient manner. Today, a sive profiling of the cancer genome will require interroga- single sequencing run of an Illumina HiSeq 2000 se- tion of different types of genome alterations in diverse quencer can generate ;200 gigabases of sequence data cancer types and subtypes. in 8 d—an output that easily exceeds the annual sequenc- As technologies to perform comprehensive profiling of ing production of a genome sequencing center a few years the cancer genome progressed, different technology plat- ago (http://www.genome.gov/10001691). This astronom- forms from microarrays to capillary sequencing were ical increase in sequencing capacity, along with the rapid brought together on unified sample sets (The Cancer reduction in sequencing cost (which is faster than the Genome Atlas Network 2008; S Jones et al. 2008; Parsons doubling of semiconductor/computer capacity every 18 et al. 2008). For example, Velculescu and colleagues (S mo, known as Moore’s law) (Pettersson et al. 2009), has Jones et al. 2008) integrated sequencing with expression completely transformed cancer genome discovery science. and copy number profiling to identify IDH1 mutations in Second-generation sequencing offers several advan- (GBM). The Cancer Genome Atlas pilot tages over previous technologies. It has the power to

GENES & DEVELOPMENT 535 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al.

Table 1. ICGC cancer genome projects, committed or active, including 37 projects in 12 countries and two European consortia as of January 2011 Lead jurisdiction Organ sites Tumor subtypes Ovary Serous cystadenocarcinoma Australia Pancreas Pancreatic ductal adenocarcinoma Pancreas Pancreatic ductal adenocarcinoma Canada Prostate Prostate adenocarcinoma China Stomach Intestinal- and diffuse-type gastric cancer European Union/France Kidney Renal cell carcinoma European Union/United Kingdom Breast ER-positive, HER2-negative breast cancer Breast HER2-amplified breast cancer France Liver Hepatocellular carcinoma secondary to alcohol and adiposity Prostate Prostate adenocarcinoma Blood Germinal center B-cell-derived lymphoma Germany Brain Medulloblastoma and pediatric pilocytic astrocytoma Prostate Early onset prostate cancer India Oral cavity Gingivobuccal carcinoma Italy Pancreas Rare pancreatic subtypes, including enteropancreatic endocrine tumors and exocrine tumors Japan Liver Virus-associated hepatocellular carcinoma Mexico Multiple Common tumor types in Mexico Spain Hematopoietic Chronic lymphocytic leukemia with mutated and unmutated IgVH Bone Osteosarcoma/chondrosarcoma/rare bone cancers United Kingdom Breast Triple negative/lobular/other breast cancers Hematopoietic Chronic myeloid disorders, including myelodysplastic syndrome, myeloproliferative , and other chronic myeloid malignancies Brain GBM and low-grade gliomas Breast Ductal and lobular breast adenocarcinomas Stomach Intestinal-type gastric adenocarcinoma Liver Hepatocellular carcinoma Intestine Colon and rectal adenocarcinomas Gynecologic Serous ovarian adenocarcinoma; endometrial carcinoma; cervical adenocarcinoma; and squamous carcinomas United States (TCGA) Prostate Prostate adenocarcinoma Bladder Nonpapillary bladder cancer Head and neck Head and neck squamous cell and thyroid papillary carcinomas Hematopoietic Skin Metastatic cutaneous melanoma Lung Non-small-cell lung cancer, adenocarcinoma, and squamous subtypes Kidney Renal clear cell and renal papillary carcinomas Pancreas Pancreatic adenocarcinoma For updated information, see http://www.icgc.org and http://cancergenome.nih.gov. identify mutations in highly admixed samples by virtue et al. 2010a; Pleasance et al. 2010a,b). Similar approaches of deep coverage (Thomas et al. 2006), overcoming a major can be applied to cDNA, also known as RNA-seq, which limitation of Sanger-based capillary sequencing technol- permits accurate digital measurements of gene expression ogy. Whereas previous technologies could query one across the whole . Importantly, this latter modality of cancer genome alteration (, copy approach provides the means to measure known, and number, or expression) at a time, second-generation se- discover novel, splice variants as well as aberrant tran- quencing analyses permit the identification of all such scripts generated by somatic structural genome rear- alterations simultaneously. For example, one can obtain rangements (Maher et al. 2009a,b; Berger et al. 2010; high-resolution and accurate measurements of somatic Palanisamy et al. 2010). This type of data will undoubt- copy number alterations (SCNAs) from whole-genome edly reveal new insights into the regulation of gene sequencing (Campbell et al. 2008; Chiang et al. 2009), and transcription and RNA processing. the same data can identify nucleotide substitutions. Fur- In the near future, sequencing-based approaches will be thermore, second-generation sequencing offers structural applied to nearly all aspects of cancer genome character- information never before available from other genomic ization. For example, current TCGA projects involve the platforms, thus enabling for the first time global assess- comprehensive sequencing of all protein-coding genes ment of chromosomal rearrangements in cancer (Campbell and transcripts by hybrid capture/whole-exome sequenc- et al. 2008; Mardis et al. 2009; Stephens et al. 2009; Ding ing in hundreds of tumor- and germline-matched pairs,

536 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data complemented by deep sequencing of the whole genomes to define significant events or molecular subtypes. This (WGS) (at >30-fold coverage) in 10% of the samples. category of analyzed data is often presented as the Given the rate at which sequencing capacity is increasing findings of a genomic study in a publication. Major sites and cost is shrinking, it is highly likely that deep-coverage where these data sets can be accessed are listed in Table 2 WGS will soon be applied to the majority of the discovery and are described in some detail below. samples. In parallel, efforts to use low-coverage sequenc- Although cancer genomic data from various large-scale ing to conduct structural and copy number analyses are projects—including the (CGP) at likely to replace array-based technologies in the near Wellcome Trust Sanger Institute (http://www.sanger.ac. future. Similar study designs are being adopted by ICGC uk/genetics/CGP), TCGA (http://cancergenome.nih.gov/ projects. dataportal), and ICGC (http://dcc.icgc.org)—are publi- cally available, for protection of patient privacy, access is either open or controlled. Prior to second-generation Accessing cancer genomics data sequencing, most raw data and some type of normalized Although the data generated from these large-scale can- data (e.g., single nucleotide polymorphism [SNP] profiles) cer genome characterization efforts have been and will are subjected to controlled-access restriction, while inter- continue to be made publicly available, accessing and preted and summarized data are openly accessible. With using these cancer genome data remains a major chal- the transition to second-generation sequencing data, it is lenge. In this section, we attempt to provide a framework likely that raw and processed data, and possibly some for nongenomic noncomputational cancer biologists to interpreted data, will fall under the ‘‘controlled-access’’ become familiar with what data are available and where category, since the level of resolution may provide the to download or query each of the major genomic data means to identify specific patients. Controlled-access types. Specifically, we describe briefly the technology data are restricted to qualified researchers (with certifi- platform(s) used to generate each data type, followed cation by host institution) with a specific proposal of data by common public sites where cancer genome data can use that is deemed compliant with the project’s data be downloaded and summarized results can be queried. access policy, typically requiring preapproval by the in- We also point out basic open source analytical tools or stitutional review board of the requesting investigator. computational algorithms for manipulation and analysis For TCGA, access to controlled data is obtained through of cancer genome data. However, it should be noted that dbGAP (http://www.ncbi.nlm.nih.gov/gap); for ICGC the tools described here are illustrative examples, rather projects, access is obtained through its Data Access than a complete survey of sites, data sources, or analytical Compliance Office (http://www.icgc.org/daco). In the tools, as these are rapidly evolving. case of the Sanger Institute’s Cancer Genome Project, genotyping and first-generation sequencing traces can be requested at its data archive (http://www.sanger.ac.uk/ Data structure and data access policies genetics/CGP/Archive), and its second-generation sequenc- Generally speaking, cancer genomic data can be divided ing data must be obtained through the European Genome- into (1) raw, (2) processed or normalized, (3) interpreted, Phenome Archive (EGA, http://www.ebi.ac.uk/ega). At and (4) summarized categories based on the degree of present, downloading raw data from these sources is a computational modification and integration applied to technically and logistically challenging task that requires the data. These categories are sometimes referred to as significant network infrastructure to handle the size of the Level I–Level IV data. Raw, processed, and interpreted data files. (Level I–III) data apply to individual samples, while sum- marized (Level IV) data refer to analyses across sample Nucleotide sequence mutations sets. For example, normalized or processed data represent data that have been assigned to a genome reference, such Nucleotide substitutions and small insertions/deletions as alignment of sequences to or map- are common mechanisms for activating oncogenes and ping of probes to chromosomal positions. For microarray- inactivating tumor suppressor genes. The initial develop- based platforms, normalization refers to combining mul- ment of methods to determine the nucleotide sequence tiple probes measuring a single genomic locus to a single of DNA in 1975 (Sanger and Coulson 1975) led to the value and transforming the measured intensities such discovery of cancer-specific somatic mutations in the that the values can be compared between experiments; RAS gene family in the early 1980s (Parada et al. 1982; examples of normalization steps include correction for Shimizu et al. 1983; Santos et al. 1984; Bos et al. 1985) and, background noise and total brightness. Interpreted data later, mutations in human tumor suppressor genes (Friend represent meaningful biological results extracted from et al. 1986; Hahn et al. 1996). Subsequently, the invention each specimen, such as genome-wide copy number pro- of automated sequencing instruments (Hunkapiller et al. file, where copy number breakpoints have been statisti- 1991) led to the initial sequencing of the human genome cally defined, or gene expression profiles, where individ- (Lander et al. 2001; Venter et al. 2001), and then to sys- ual gene expression levels have been collated from tematic efforts to sequence gene families (Davies et al. multiple loci across the gene. Summarized (Level IV) data 2005; Stephens et al. 2005; Greenman et al. 2007). These represent analysis of interpreted data across a cohort of latter efforts identified several new mutations samples, where statistical methodologies can be applied that are targets for cancer therapy—most notably the

GENES & DEVELOPMENT 537 3 GENES 538 al. et Chin Downloaded from & DEVELOPMENT

Table 2. Databases for cancer genomics data Database Link Data type Type of information Access

ICGC http://dcc.icgc.org/ Levels I–IV Copy number, rearrangement, expression, and Open and controlled genesdev.cshlp.org mutation data TCGA http://cancergenome.nih.gov/dataportal Levels I–III Copy number, expression (mRNA and miRNA), Open and controlled , and mutation sequencing NCBI dbGAP http://www.ncbi.nlm.nih.gov/gap Levels I–II Raw sequencing traces; second-generation Controlled sequencing BAM files by TCGA COSMIC http://www.sanger.ac.uk/genetics/CGP/cosmic Levels III–IV Somatic mutations and copy number alterations by Open gene: amino acid position, tumor type, literature onSeptember29,2021-Publishedby references Cancer Gene Census http://www.sanger.ac.uk/genetics/CGP/Census Level IV Annotation of mutated or genomically altered genes Open WTSI CGP http://www.sanger.ac.uk/genetics/CGP/Archive Levels I–II First-generation trace archive; SNP genotype Controlled profiles EGA http://www.ebi.ac.uk/ega Levels I–II Second-generation sequencing BAM files generated Controlled by WTSI CGP Tumorscape http://www.broadinstitute.org/tumorscape Levels I–IV Browsable, searchable cancer copy number viewer Open using SNP array data Oncomine http://www.oncomine.org Level IV Gene expression and copy number data in readily Password-protected searchable and comparable fashion GEO http://ncbi.nlm.nih.gov/geo Level I Gene expression data Password-protected caArray http://caarray.nci.nih.gov Level I Gene expression data Password-protected UCSC Cancer https://genome-cancer.soe.ucsc.edu Levels III–IV Browsable viewer for cancer copy number and Open

Genome Browser expression data Cold SpringHarborLaboratoryPress The cBio Cancer http://cbioportal.org Levels III–IV Browsable and searchable viewer for cancer copy Open Genomics Portal number and expression data OMIM http://www.ncbi.nlm.nih.gov/omim Inherited syndromes and causative genes for cancer Open and other diseases, with extensive literature review Mitelman http://cgap.nci.nih.gov/Chromosomes/Mitelman Copy number alterations and translocations based on Open cytogenetic data (Level I) Raw; (Level II) normalized/processed; (Level III) interpreted; (Level IV) summarized.

Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

BRAF and EGFR protein genes and the PIK3CA in cancer genomes is likely to represent passenger events phosphatidylinositol kinase gene (Davies et al. 2002; and only a minority is likely drivers. For example, of the Lynch et al. 2004; Paez et al. 2004; Pao et al. 2004; 453 validated nonsilent mutations in GBM scattered Samuels et al. 2004)—leading to approved and in-devel- across 223 genes, only eight genes were considered having opment targeted therapeutics for cancers. higher than background mutation frequency, suggestive For sequencing data, the normalized data category of positive selection pressure (The Cancer Genome Atlas represents sequencing reads that have been aligned to Network 2008). However, it is worth noting that, as for a specific version of the human reference genome. As the any statistical test, the lack of statistical significance by reference genome is refined and filled in with each new MutSig or similar analyses does not preclude true cancer version, mapping data may change; thus, researchers relevance, as, relatively speaking, the number of samples should pay attention to and always note the specific having been adequately sequenced is still low. Moreover, reference genome build used in an analysis. Raw sequenc- computational algorithms for mutation calling and sta- ing reads from both Sanger-based capillary sequencing tistical frameworks for significance calculation are still and second-generation platforms are stored at NCBI being developed and refined. Sequence Read Archive and dbGAP (see Table 2). Access Beyond statistical analyses, there are various theoreti- to these sequencing data is restricted and requires data cal and computational models designed to predict the use approval by the appropriate data access committee. likely functional consequences of specific nucleotide The interpreted category for mutation data includes the mutations, particularly for mutations in coding genes. sequence variant calls, the types of sequence variants These models are often based on the impact of specific (such as synonymous versus nonsynonymous or mis- amino acid substitutions on protein structure or known sense versus indel), and the location of the nucleotide functional domains or evolutionarily conserved regions. change in relation to annotated gene structure; e.g., in- For example, the PolyPhen (for polymorphism phenotyp- tron versus exon and consequent amino acid changes. ing) tool predicts the possible impact of an amino acid One aspect of this analysis includes an annotation of substitution on the structure and function of a protein whether such sequence variants are reported in dbSNP, using a variety of structural and chemical parameters in a database representing likely common SNPs. If not found addition to evolutionary conservation (Sunyaev et al. in dbSNP and not observed in matched germline-derived 2001; Ramensky et al. 2002). Indeed, a recurring theme sequences, such variants are generally considered so- in analysis of genomic data is the leverage of evolutionary matic in nature. Putative variants should be verified information. MutationAssessor (Reva et al. 2007) is a re- (e.g., result reproduced in an independent assay using cently published algorithm for predicting potential func- the same technology platform) or validated (e.g., result tional impact of a sequence mutation based heavily on observed by an orthogonal method) by methods including the assumption that if a highly conserved residue is genotyping. For TCGA projects, all verified, validated, or changed to a different residue type, the change is pre- putative somatic mutations and associated descriptions sumed to have high functional impact on the function of discovered in a sample or a cohort of samples can be the affected protein. By analyzing aligned sequence families found in the .MAF file, which is available on the TCGA of paralogous and orthologous proteins within the human data portal (http://cancergenome.nih.gov/dataportal). genome and across many other species, this algorithm Similar data files can be found at the ICGC Data Co- calculates the functional impact (FI) score for a mutation. ordination Center (http://dcc.icgc.org) under the Down- Both of these tools are Web accessible (http://genetics.bwh. load Data page. New file formats will likely emerge in the harvard.edu/pph; http://mutationassessor.org) and offer an near future to support the increasing applications of next- intuitive and easy-to-use query interface as well as a batch generation sequencing platforms for data generation. processing feature. With MutationAssessor, in addition to A key downstream (Level IV) analysis of verified or the calculated FI score for each variance, users can inspect validated mutations is the determination of significance, the placement of mutations in a multiple sequence align- accounting for the background mutation rate and size as ment relative to amino acid residues that are conserved well as composition of a gene. Several methodologies globally or in a specific subfamily, as well as observe the have been developed for this purpose (Getz et al. 2007; consequences of the residue change in an interactive Greenman et al. 2007). For example, in the MutSig three-dimensional protein structure. Other examples of algorithm, a P-value is calculated for each gene, testing prediction algorithms include SIFT, CanPredict, and the hypothesis that all of the observed mutations in that CHASM (Ng and Henikoff 2001; Kaminker et al. 2007; gene are a consequence of random background mutation Carter et al. 2009; Adzhubei et al. 2010). In the end, processes, taking into account the list of bases that are beyond statistical analysis or the prediction of functional successfully interrogated by sequencing (i.e., ‘‘covered’’) impact, the relevance of any mutational event to human and the list of observed somatic mutations, as well as the cancer will require functional validation (see below). length and composition of the gene in addition to the The COSMIC (Catalog of Somatic Mutations in Can- background mutation rates in different sequence con- cer) site is the single most comprehensive source of texts. As in analyses of other genomic data, such calcu- curated analyzed somatic mutation data in cancers de- lations must then be corrected for multiple hypothesis veloped and maintained by the Cancer Genome Project at testing (see below). Using these types of significance the Wellcome Trust Sanger Institute (Futreal et al. 2004; analyses, the majority of the somatic mutations found Forbes et al. 2008). COSMIC is an open source, easily

GENES & DEVELOPMENT 539 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al. accessible, and searchable database containing >140,000 When a cohort of segmented copy number profiles is somatic mutations of 18,000+ genes in >550,000 tumor analyzed together, several methodological tools are avail- samples curated from 2.8 million experiments (COSMIC able to define the Level IV summarized data, where version 49, http://www.sanger.ac.uk/genetics/CGP/cosmic; significantly altered regions are defined (by specifying Forbes et al. 2011). While it clearly provides a valuable the boundaries of ‘‘peaks’’ and significance). GISTIC is source of collated mutation data in human cancers, users a popular algorithm based on statistical considerations should also be cognizant of the fact that there is likely (Beroukhim et al. 2007), allowing users to define the most curational bias, as in any database of this type. significant regions and peaks; GISTIC works most ro- For germline cancer-associated mutations, no similar bustly in large sample cohorts. RAE uses a similar meth- effort exists to collect such mutations, but the Online odology (Taylor et al. 2008). GTS (Wiedemeyer et al. 2008) Mendelian Inheritance in Man (OMIM, http://www.ncbi. provides independent measures of recurrence frequency nlm.nih.gov/omim; McCusick 1998) provides an excel- and focality; the latter can be a strong indicator for lent literature summary regarding each major familial relevance when sample size is small. For those with cancer gene and susceptibility loci. basic R programming skills, cghMCR and CNTools are two Bioconductor packages (http://www.r-project.org; Gentleman et al. 2004) available for copy number data SCNAs and structural rearrangements analyses. The former provides a fast and platform-in- Cancer genomes are highly disordered compared with dependent approach to identifying and visualizing altered normal genomes, with extensive changes in regions, while the latter enables the conversion of seg- structure and copy number. The SCNAs found in cancer mented copy number profiles into a matrix structure to include whole-chromosome or regional alterations span- allow further downstream analyses. It should be noted ning part to whole arms of a , as well as that most of these algorithms are built using the preva- focal events involving one or a few genes. The develop- lence and shape (e.g., focal and high amplitude vs. flat and ment of array-based comparative genomic hybridization broad) of a numerical copy number aberration to discrim- using bacterial artificial chromosomes (Hodgson et al. inate likely target(s) at the peaks from passengers. Al- 2001; Cai et al. 2002), cDNA (Pollack et al. 1999), oligo- though this framework has led to identification of new mers (O’Hagan et al. 2003; Brennan et al. 2004), and SNP cancer genes that have been experimentally validated, arrays (Lindblad-Toh et al. 2000; Mei et al. 2000; Bignell next-generation sequencing is beginning to provide in- et al. 2004; Zhao et al. 2004) has enabled systematic formation on sequence-level structures underlying these analyses of the cancer genome and defined many new numerical copy number aberrations. These emerging data recurrent SCNAs in cancer (Beroukhim et al. 2010). To will provide much finer details of structural rearrange- date, the major CNAs linked to cancer are somatic changes, ments in addition to simple numerical changes such as although many germline copy number variations are found duplication or amplification or deletion. Such new in- in human populations (Sebat et al. 2004; Redon et al. sights will likely lead to different or improved algorithms 2006). Further work is needed to determine the role of to identify candidate targets of these genomic alterations. germline copy number aberrations in cancer. The major repositories for somatic copy number data in Most of the available genome-wide high-resolution cancer include Gene Expression Omnibus (GEO) (Edgar copy number profiling data were generated on either et al. 2002; Barrett et al. 2009), Tumorscape (Beroukhim Agilent or Affymetrix microarray platforms. Performance et al. 2010), TCGA (http://cancergenome.nih.gov), COSMIC of these various platforms has been compared (Brennan (Forbes et al. 2008), and Oncomine (Rhodes et al. 2004, et al. 2004; Lai et al. 2005, 2008; Willenbrock and Fridlyand 2007) (Table 3). The GEO site contains predominantly raw 2005) and used comparatively on the same sample cohort data, while COSMIC and Oncomine emphasize normal- by TCGA during its pilot phase (The Cancer Genome ized and interpreted data. All three categories of data are Atlas Network 2008). Raw data for copy number are probe- available from the TCGA DCC as well as at Tumorscape. level signals, whereas processed data are the results For systematic queries of analyzed summary data on of normalization, calculation of tumor-to-normal copy CNAs in human cancers, Tumorscape is particularly use- number ratio, and mapping to chromosomal positions. ful, as it provides segmented data for >3000 high-resolu- Interpreted data generally contain segmented copy number tion copy number profiles from SNP arrays in a format profiles where breakpoints along the chromosomes have that can be visualized with the interactive Integrative been defined in each individual tumor using segmen- Genome Viewer (http://www.broadinstitute.org/igv). tation methods such as circular binary segmentation COSMIC also contains copy number data annotated on (CBS) (Venkatraman and Olshen 2007),GLAD (Hupe et al. the gene level in addition to raw data download at the 2004), and others (Picard et al. 2005; Wang et al. 2005; CGP archive, while Oncomine has recently begun to Day et al. 2007; Ben-Yaacov and Eldar 2008). Recent curate copy number profile data sets in addition to tran- methods for measuring and determining segments of scriptome data sets. absolute allele-specific copy number from SNP arrays Before the advent of second-generation sequencing provide a more accurate description of allelic gains and (Maher et al. 2009a,b; Berger et al. 2010; Palanisamy losses and loss of heterozygosity in cancer genomes et al. 2010), conventional cytogenetic methodologies (LaFramboise et al. 2005; Bengtsson et al. 2010; Greenman such as FISH were the primary means for identification et al. 2010; Van Loo et al. 2010). of chromosomal translocations in cancers, predominantly

540 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

Table 3. Sites with open source analytical tools for cancer genomics data Tools Link Bioconductor http://www.bioconductor.org GenePattern http://www.broadinstitute.org/genepattern Gene Ontology http://www.geneontology.org/GO.tools.microarray.shtml UCSC Cancer Genome Browser https://genome-cancer.soe.ucsc.edu Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv The Cancer Genomics Pathway Portal http://cbioportal.org in leukemia and lymphoma (Rabbitts 1994). One of the variations and confound such analyses. These consider- most useful compilations of cytogenetic alterations was ations require reproducing the classification results in established by Mitelman et al. (2007), which includes independent test cohorts of samples and using multiple both translocations and CNAs in >50,000 samples. The hypothesis testing correction methods. recent identification of translocations in epithelial can- Once subtypes are defined, there are several user- cers suggests that the application of high-throughput friendly analytical algorithms that can be employed to sequencing will expand the number of such transloca- interrogate these data sets. For example, one of the most tions (Tomlins et al. 2005), although the number and commonly asked questions with gene expression data is: significance of such recurrent translocations in solid What gene expression difference exists between two tumors remains to be delineated. biologically, clinically, or molecularly defined subgroups? Significance analysis of microarrays (SAM, http://www- stat.stanford.edu/;tibs/SAM; Tusher et al. 2001) is a com- Expression analysis of cancer monly used tool to discern genes that are significantly Global gene expression profiles offer a global view to the different between two groups of samples. In GenePattern, transcriptome of a tumor, the signature of which has the ComparativeMarkerSelection (Gould et al. 2006) is an- potential to provide diagnostic (Golub et al. 1999; Alizadeh other tool that can be used to define genes that are char- et al. 2000) or prognostic (van de Vijver et al. 2002; Paik acteristic of a subgroup. Finally, one of the most commonly et al. 2004) information. This approach has also been used used sites for querying analyzed cancer gene expression to molecularly subclassify tumors that are currently data is Oncomine, which allows comparison of any two assigned to one histopathological class. Gene expression data sets from different cancers or normal tissues to de- is also one measure of functional consequence of a geno- termine the genes that are specifically expressed in the data mic alteration, thus potentially enabling the interpreta- set of interest (Rhodes et al. 2004, 2007). tion of inactivating genetic changes on the DNA level Once differential expression gene lists or signatures for or epigenomic (methylation) on DNA promoters. Inter- each subtype are generated, these lists can be interrogated preted expression data typically represent gene-level data for pathway activation using knowledge-based pathway where multiple probes reporting on one annotated gene analysis tools such as Ontologizer (Bauer et al. 2008), are collapsed (by taking either the median or mean values which looks for statistical enrichment of Gene Ontology or best probe). terms, or Gorilla, a tool to visualize Gene Ontology terms A major summarized category for expression data type that are enriched in gene lists ranked by a user-defined is definition of molecular subtypes. The discovery of criterion (Eden et al. 2009). Gene set enrichment analysis novel molecular subclasses is based primarily on differ- (GSEA) is an algorithm that allows one to assess whether ences in gene expression between groups within a cohort the differentially expressed genes are enriched for partic- using various unsupervised classification methodologies, ular gene sets even though each member of the gene set such as hierarchical clustering (Eisen et al. 1998), self- individually is not necessarily strongly differentially organizing maps (Tamayo et al. 1999), and nonnegative expressed. Gene sets can be defined in various ways, such matrix factorization (Kim and Tidor 2003; Brunet et al. as manual curation of pathways, based on genomic 2004). Visual verification or identification of clusters position or shared motifs, expression correlation, or by requires transforming the high-dimensional expression experimentally identifying signatures that represent a data (each sample is represented by the expression values molecular event or (Subramanian et al. 2005), of all genes) to two or three dimensions. This is often such as KRAS activation (Sweet-Cordero et al. 2005). performed using principal component analysis (PCA) Gene signatures that can be used as input for GSEA can (Raychaudhuri et al. 2000) or multidimensional scaling be downloaded from the Molecular Signatures Data- (MDS) (Khan et al. 1998). However, it should be noted base (MSigDB, http://www.broadinstitute.org/gsea/msigdb/ that, given the large number of available gene expression index.jsp). Other gene set and pathway repositories include profiles and the number of independent measurements Pathway Commons (http://www.pathwaycommons.org), contained within each profile, one can produce class KEGG (Kanehisa et al. 2006), and others. Another com- discriminations that may or may not reflect underlying monly used tool for pathway and Gene Ontology enrich- biological differences. For example, batch effects, intro- ment is DAVID (http://david.abcc.ncifcrf.gov/home.jsp; duced by profiling samples on different days using differ- Huang et al. 2009). These types of analyses provide ent lots of reagents or at different sites, can introduce a framework for generating hypotheses for further testing.

GENES & DEVELOPMENT 541 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al.

However, users should be reminded that annotation of allows browsing clinical annotations alongside the geno- pathways or definition of gene sets is biased by what is mic data. In addition to local data, a user can load data known and published. Genes that are widely studied will from a server that contains many data sets, including be linked to many different processes, while ones that open access TCGA data. IGV can be downloaded and have not been explored in depth would have few connec- launched from a desktop computer. Similarly, for viewing tions to others. and manipulating cancer copy number data and gene expression data to generate heat map figures, the dChip Second-generation sequencing data software system (Li and Wong 2001) can be installed on a standard desktop or laptop computer. It is clear that, in the very near future, all genome char- To perform customized analyses beyond querying sum- acterization data will be generated by second-generation marized results, many open source analytical tools re- sequencing technologies. Output from these sequencing quiring only basic programming and ex- platforms is in the form of short-sequence reads, ranging pertise are now available. For example, Bioconductor from 35 base pairs (bp) to 100 bp or longer. These raw (Gentleman et al. 2004) is an open source site with many sequencing reads and their mapped reads after alignments useful analytical packages, written in the R language (to reference genome) are captured in a BAM file. There- (http://www.r-project.org), for the analysis and interpre- fore, one can consider the BAM file as containing both tation of cancer genomic data, including second-generation raw and normalized (Level I–II) data; hence, BAM files are sequencing data. Another site is GenePattern (Reich controlled access data. For TCGA, these data are stored at et al. 2006) (http://www.broadinstitute.org/cancer/software/ the Sequence Read Archive of NCBI (http://www.ncbi. genepattern), which is a user-friendly Web-based inter- nlm.nih.gov/Traces/sra) and accessed through dbGAP. face for >125 different genomics analysis tools and pipe- Similar second-generation sequencing data from the lines for various types of data, including gene expression, Sanger CGP are accessed through EGA (http://www.ebi. copy number data, proteomic data, and others. GenePattern ac.uk/ega). The size of these data files (typically in tens to can also keep track of parameters and versions of tools, and hundreds of gigabytes) and the complexity of manipulat- thus achieves an important goal of enabling reproducible ing such data make it challenging to access and analyze research. The Gene Ontology site (http://www.geneontology. these data by noncomputational groups. A detailed dis- org/GO.tools.microarray.shtml) also maintains a collec- cussion of the computational challenges, algorithms, and tion of open source analytical tools contributed by its software for analyzing second-generation sequencing for consortial members for analyses of microarray-based cancer is provided in recent reviews (Ding et al. 2010b; expression data. Meyerson et al. 2010).

Viewers and tools Computational considerations in cancer genome analysis Some of the open source software tools and viewers are summarized in Table 3. Several user-friendly Web-based In addition to specific challenges inherent in analysis of viewers provide intuitive environments with which to each type of cancer genome data, several general consid- visualize and explore cancer genome data, although most erations should be kept in mind when one analyzes, do not allow fully interactive queries at the present time. interprets, and uses cancer genomics data. These include The University of California at Santa Cruz (UCSC) (1) quality control (QC) of data, (2) the accurate estima- Cancer Genome Browser (Zhu et al. 2009) is perhaps tion of signal and noise in large data sets, (3) reproducible the most commonly accessed site, which allows users to approaches to complex genomic analyses, and (4) achiev- view and query a variety of cancer data types in the ing sufficient power in the face of multiple hypothesis context of the widely used UCSC Genome Browser testing. We briefly touch on each of these issues, but (https://genome-cancer.soe.ucsc.edu). The Cancer Geno- recommend several books for a broader introduction to mics Pathway Portal (http://www.cbioportal.org), hosted bioinformatics; these include ones that are focused on at Memorial Sloan-Kettering Cancer Center (MSKCC), is principles and applications (Xiong 2006; Pevsner 2009), a recently launched site aiming to provide direct visual- as well as on computational methodologies (Jones and ization and queries of summarized results by cancer Pevzner 2004). biologists with little to no bioinformatic expertise as well Before analyzing genomic data, either publicly avail- as download of large-scale cancer genomics data sets for able or locally produced, and generating hypotheses for bioinformatic power users. For example, its major feature, further experimental follow-up, one has to ensure the Oncoprints, provides an easy way to visualize distinct data are of sufficient quality. Two key aspects contribut- genomic alterations (e.g., somatic mutations, CNAs, and ing to raw data quality are biospecimen and technical mRNA expression changes) of a gene or genes of interest execution. Criteria used for biospecimen inclusion and across a set of tumor samples. Another tool is the Inte- exclusion can influence data quality; for example, a tumor grative Genome Viewer (IGV, http://www.broadinstitute. specimen with a high proportion of stromal contamina- org/igv), a scalable and readily browsable interface for tion will reduce one’s ability to detect somatic alterations accessing any type of cancer genome data, including mu- in the tumor cells. On the data generation front, stan- tations, expression, and copy number, particularly suit- dardization of technical methods (i.e., standard operat- able for second-generation sequencing data. IGV also ing procedures [SOPs]) and execution by highly trained

542 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data individuals can minimize experimental variation and ulations for tracking and debugging. Such information ensure reproducibility and high data quality. Once data will permit one to identify, for example, an input error are generated, the normalization step (Level II) attempts that results in mislabeling of subsequent samples. There- to remove any experimental artifacts that can negatively fore, some of the key elements in reproducible bioinfor- impact the data quality; however, some level of batch matics include associating each analysis with a freeze of effects often remains and, if not controlled, may lead to not just the data, but also the analytic software (codes) spurious findings. A simple approach for detecting batch and parameters (Mesirov 2010). For example, recording effects and estimating their extent is to use standard the specific reference genome build used for a particular supervised methods to search for differences between analysis is critical to one’s ability to reproduce a result batches; e.g., use SAM, ComparativeMarkerSelection, or using the same input data. Increasing use of tools that ANOVA to look for genes that are differentially expressed have automated version tracking capability will greatly between batches. enhance reproducibility. Another potential problem with large genomic data Unlike traditional molecular biology experiments, sets beyond data generation is sample mismatching; where only relatively few measurements are generally hence, it is recommended that one double-checks that made (<10) in any single experiment, the number of the data originated from the intended sample. This is individual measurements in a cancer genomics study is particularly important when analyzing multiple data in the thousands to millions. This scale requires not only types, since a mix-up could happen in some but not other a large number of samples to achieve statistical power, data types. A useful approach to screening for sample but also addressing the issue of multiple hypothesis mismatching is to leverage our understanding of the data testing. Statistical power is a concept discussed in general types. For example, one can (1) correlate copy number and publications on biostatistics, but it is also specifically expression data, expecting overall positive correlation; (2) illustrated in the case of whole-exome sequencing (Getz correlate expression and promoter methylation data with et al. 2007; Hudson et al. 2010), where it was calculated the expectation of finding a negative correlation; and (3) that 500 tumor samples are needed to detect, with ;80% detect a drop in expression levels of specific genes in power, genes that are mutated in 3% of patients (assum- samples harboring truncating mutations (nonsense and ing a typical background mutation rate). In other words, frameshift insertions or deletions). With second-genera- a small discovery cohort will have very little power to tion sequencing for somatic alterations, a common QC detect infrequently mutated genes. Therefore, a lack of check is to compare the SNP genotype profiles between somatic mutation in a gene of interest in a study of array-based and sequencing data to ensure that there a small number of samples should be considered non- is a match between tumor and germline , as informative, rather than interpreted as the gene is not a mismatch will result in erroneous calls of somatic mutated in cancers. mutations. The concept of correcting for multiple hypothesis Like any other type of experimental data acquisition testing is very important in rigorous data interpretation and analysis, maximizing signal and minimizing noise is of cancer genomics data, as it addresses the issue of false essential in cancer genomics. In particular, using noise discovery. When thousands to millions of measurements filters to remove poor-quality samples helps to achieve are queried in each sample, significant differences will be the most accurate analysis (Kauffmann and Huber 2010). observed, but the likelihood that such observed differ- Platform-specific noise measures can be used to optimize ences would occur by chance is extremely high. For ex- data quality for each type of genomic measurement. With ample, when searching for differentially expressed genes such measures, the analysis of independent genomic data by comparing two of 20,000 coding genes, sets has proven to be reproducible across multiple in- a typical P-value cutoff of 0.05 will mean that, regardless dependent laboratories, for example, for microarray-based of biological relevance, 5% of 20,000 genes (1000 genes) gene expression analysis of human cancers (Dobbin et al. will be identified as differentially expressed by chance 2005). alone. To compensate for this, a q-value or false discovery Another key element in any computational analysis is rate (FDR) (Benjamini and Hochberg 1995) is calculated to reproducibility, a concept that is familiar to experimen- bound the expected fraction of false discoveries in the talists, but the framework to ensure reproducibility in results. In the above example of comparing two tran- computational analysis is still in development. Naturally, scriptomes, if a set of genes in the differentially expressed the reproducibility standards expected from in silico list has a calculated FDR value of 0.2 (a commonly used experiments are beyond what is possible from bench cutoff) or less, it implies that the expected fraction of false experiments, since the initial raw data are available and discoveries among the list of differentially expressed computational tools (at least deterministic ones) should genes is, at most, 20%. The FDR approach has become always reproduce the same values for all measurements. widely used in cancer genome analyses, as controlling the For example, if 200 genes are reported to be differentially FDR is a more liberal approach than the standard Bonfer- expressed between two tumor subtypes, one would ex- roni correction, which bounds the chance of having even pect to obtain the exact same list of 200 genes when a single false discovery. reproducing the analysis using the same tool, parameters, In summary, when using publically available cancer and input data. To aid in reproducibility, one must have genomic data, one should pay attention to the design of the a clear record of all inputs, parameters, and data manip- study that generated the data; when expertise is available,

GENES & DEVELOPMENT 543 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al. it is recommended that one downloads the raw data to Cross-species comparative repeat normalization and other QC checks prior to analyses. Analysts performing analyses, whether using The structural complexity of human cancer genomes has off-the-shelf tools or developing new algorithms, must be motivated efforts to leverage different data sets involving mindful of reproducibility. With respect to interpretation complementary information to identify somatically al- of results, one must keep in mind the statistical power of tered genes that likely contribute to tumorigenesis. In a particular study and correct for multiple hypothesis addition to integrating information derived from the testing. Finally, the most stringent test to assess the same samples, the comparison of genetic alterations significance of these conclusions is the ability to re- found in murine cancer models with those found in produce the findings in independent data sets. human cancers provides another method to identify cancer drivers. The rationale for the use of model organ- isms rests on the view that truly important driver genes Integrative analyses of the cancer genomes and their linked mechanisms will be evolutionarily Cancer genomic data remains incomplete due to limita- conserved, while bystander events not linked to bi- tions of the current experimental methods as well as the ological processes are less likely to be shared in these inherent complexity of its biology. One method to iden- cross-species comparisons. tify genes of particular interest involves integrating the Several studies have successfully leveraged the mouse outputs from different types of experiments. For example, cancer genomics along with human cancer genome data finding that a gene is targeted for genomic deletion, to identify novel cancer genes (Kim et al. 2006; Zender inactivating mutations, promoter hypermethylation, al- et al. 2006). For example, Zender et al. (2006) compared terations of miRNA expression, and/or transcriptional regions that were amplified in both murine and human down-regulation in different tumor samples would col- liver cancer genomes and found that cIAP1 and Yap are lectively suggest that this gene is a candidate tumor coamplified. Similarly, Kim et al. (2006) investigated a suppressor gene, even if each type of genomic alteration focal amplification in a murine model of melanoma that may be infrequent. acquired new metastatic capability and found NEDD9 as By extension, one can interrogate several types of data a gene that is the target of the large 6p23 derived from the same set of tumors for evidence of regional gain observed in ;30% of human metastatic dysregulation in a functional complex or pathway based melanomas. In a more recent example using second- on known constituents. The recent work in clear cell generation sequencing technology, a mouse mammary renal cancer is an excellent example of integrative anal- tumor was found to carry an internal deletion of exons of ysis of a functional complex (Dalgliesh et al. 2010). Here, the Lrp1b gene, an event similar to internal deletions the finding of low-frequency (3%) mutations in several found in the corresponding human ortholog in ;4% of enzyme-coding genes (SETD2, UTX, and JARID1C), all human cancer cell lines (Varela et al. 2010). Unlike what implicated in modifying regulatory lysine residues on is usually observed in human cancer genomes, most histone H3, was interpreted collectively as evidence mouse cancer genomes in genetically engineered mouse pinpointing the deregulation of H3 histone modification (GEM) strains harbor relatively few structural and copy as a likely cancer-promoting process in a significant pro- number aberrations. Although this species difference portion of clear cell RCC. limits the number of events available for comparison An example of pathway integration is the analysis of across the species, it also means that the presence of a GBM by TCGA. In this study, hundreds of clinically focal amplification or deletion in a mouse cancer genome annotated GBM with matched blood normal (white blood reflects strong selective pressure, thus providing strong cells) as a reference were characterized for (1) somatic evolutionary evidence that implicates the syntenic event mutation in 600 known or candidate cancer genes, (2) in humans as a likely driver event. Indeed, when the global SCNA patterns, (3) mRNA and miRNA expression, mouse genome is engineered to experience telomere and (4) DNA promoter methylation status. By analyzing dysfunction and consequent genome instability like most mutations and SCNAs in the same samples, TCGA human cells, the resultant tumors acquired a human-like showed that nearly all GBM harbored alterations in some genome harboring complex rearrangement and alter- components of the receptor /PI3K/RAS, ations that are syntenic to loci altered in human cancers , and RB/cell cycle pathways (The Cancer Genome (Maser et al. 2007). These examples provide a rationale for Atlas Network 2008), providing definitive genomic evi- cross-species comparative oncogenomics as an efficient dence of the deregulation of these core pathways as strategy for the annotation of human cancer genes. obligate events for this cancer. In addition, by integrat- A complementary way to use mouse cancer models to ing sequencing and copy number data with expression identify cancer genes involves the use of forward genetic profile analyses, TCGA linked major molecular subtypes screens. For example, Berns and colleagues (Uren et al. of GBM defined by transcriptional profiles to specific 2008) have created cohorts of mice containing retroviral genotypes (Verhaak et al. 2010). In addition, analysis of insertions for identification of genes that cooperate in a DNA promoter methylation defined a further subclass specific genetic background to drive tumorigenesis. In the of GBM exhibiting a CpG island Methylator Phenotype setting of Ink4a/Arf or p53 deletion, they have identified linked to IDH1 mutation and better survival (Noushmehr many candidate oncogenes and tumor suppressor genes et al. 2010). that corresponded to known human cancer-associated

544 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data mutations. Similarly, Copeland and Jenkins (2010) have High-throughput evaluation of gene function in cancer deployed the Sleeping Beauty transposon system to In addition to laying the foundation for analysis of the generate both activation and inactivation of genes in cancer genomes, the information provided by the human the murine genome, and have used this system to genome project has facilitated the development of re- discover genes involved in colorectal and other cancers agents to perform comprehensive somatic cell genetics in (Dupuy et al. 2009; Starr et al. 2009). In another recent mammalian cells through the systematic manipulation study, genome-wide copy number profiles of hundreds of of gene expression. Specifically, libraries that permit the human cancer cell lines were compared with hundreds of expression or suppression of the majority of human or common insertion sites isolated from >1000 mouse murine genes now exist in several formats. Although tumors induced with the murine leukemia virus (MuLV), these tools can be used to study nearly any aspect of showing a significant enrichment of human ortholog biology, early studies focused on cancer have genes with known mutations in COSMIC and overrepre- not only provided a proof-of-principle demonstration of sentation of human orthologs that are annotated as the utility of such systematic functional approaches, but oncogenes in the Cancer Gene Census (http://www. have also begun to provide insights into novel aspects of sanger.ac.uk/genetics/CGP/Census; Futreal et al. 2004). cancer biology. Taken together, these efforts thus far have demonstrated the power of cross-species comparisons as a powerful Expression-based studies Although cDNA libraries approach to identify cancer genes, and argue for sys- have been used for many years to identify genes whose tematic and comprehensive genomic characterization overexpression confers specific phenotypes (Seed and of appropriately engineered mouse models of human Aruffo 1987; Lin et al. 1991; Wang et al. 1991), and played cancers. a key role in the discovery of some of the first oncogenes (Shih and Weinberg 1982), such screens have often been Converting genomic data into biological knowledge limited by the efficiency of gene transduction as well as the representation of genes present in such libraries. As The current large-scale cancer genomics efforts will such, most successful screens involved positive selection identify and enumerate the frequency of every genetic strategies. However, several vector systems now exist element of interest that is structurally altered in the that permit high-efficiency transduction of cDNAs into cancer genome, including those impacting annotated a wide range of mammalian cells (Koh et al. 2002), and genes, noncoding miRNA, or other conserved elements. increasingly complete collections of human cDNAs or While statistical significance based on frequency will be ORFs now permit more comprehensive gain-of-function an important filter to cull bystanders from likely driver screens. Probably the most dramatic recent example of events, it is likely that mutations that occur at a lower such a gain-of-function screen is the discovery of the frequency may also contribute to specific types of cancer. EML4-ALK translocation in NSCLC by a retroviral ex- For example, mutations or translocations involving ALK pression system (Soda et al. 2007). occur in 4%–5% of non-small-cell lung cancers (NSCLCs) Using a positive selection screening strategy, several (Soda et al. 2007), but confer sensitivity to small-mole- groups have identified genes that bypass specific anti- cule ALK inhibitors (McDermott et al. 2008). In such proliferative signals. For example, building on work that cases, additional experimental evidence is necessary to identified the retinoblastoma and p53 signaling networks identify such low-frequency events as contributors to as key regulators of cell proliferation and survival, TBX2 cancer initiation or progression (Chin and Gray 2008). was found as a gene amplified in breast cancer that Indeed, since it is likely that multiple genetic alterations permits cells to proliferate in Bmi1-deficient fibroblasts are required to program the behavior of any specific (Jacobs et al. 2000), DRIL1 was discovered as a gene that cancer, functional analyses will be required to comple- bypassed RAS-induced senescence (Peeper et al. 2002), and ment genome annotation. For example, the recent dem- BCL6 was found to permit cell proliferation in the pres- onstration that the mutation status of KRAS dictates ence of active p19ARF/p53 signaling (Shvarts et al. 2002). the response of tumors that harbor mutant EGFR to This approach has also been used to identify genes treatment with EGFR inhibitors and the complex in- involved in other cancer-related phenotypes. For exam- terplay between BRAF, CRAF, and RAS in response to ple, the prostate-derived ETS factor gene SPDEF was selective BRAF inhibitor (Heidorn et al. 2010; Joseph identified as a gene that permits immortalized mammary et al. 2010; Poulikakos et al. 2010; Whittaker et al. 2010) epithelial cells to invade and migrate (Gunawardane et al. confirm that further functional studies will be required 2005). SPDEF was subsequently found to be overex- to exploit knowledge of somatic mutations. In addition, pressed in both breast and prostate cancers, and to coop- genes that are not mutated in cancers may also con- erate with receptor tyrosine kinase genes such as ERBB2 tribute to the survival of cancers harboring other gen- and CSF1R to drive cell transformation. etic alterations, such as the PARP1 gene, whose in- In each of these examples, the cDNA libraries used by activation is synthetically lethal in breast and ovarian these investigators were derived from cell lines by reverse cancers that lack BRCA1 or BRCA2 (Fong et al. 2009, transcription of mRNA. Although this approach has been 2010). Thus, functional analyses of cancer genomes are used successfully, several limitations of this methodology needed to complement structural analyses of cancer are that each gene is not represented at equal frequency in genomes. the library, longer cDNAs are underrepresented, and the

GENES & DEVELOPMENT 545 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al. full repertoire of splice variants is not present. With the Moffat et al. 2006; Luo et al. 2009). Each of these systems development of large collections of ORFs (Table 4; Lamesch has unique features, including high-efficiency infections, et al. 2007; Rolfs et al. 2008; Varjosalo et al. 2008), it is now ease of recombination-based cloning, and inducible ex- possible to create expression libraries in which at least one pression. In addition, these vector-based systems are splice version of every gene is present at equal numbers. useful for both arrayed and pooled screening approaches. For example, analysis of a relatively small cDNA library Both siRNA and shRNA libraries have been used suc- targeting 353 identified IKBKE as a breast cancer cessfully in arrayed screens (Aza-Blanc et al. 2003; Kittler oncogene that substitutes for AKT to permit cell trans- and Buchholz 2005; MacKeigan et al. 2005; Whitehurst formation (Boehm et al. 2007). In addition to this example et al. 2007). For many other cancer-related phenotypic of intersecting hits from a genetic screen with genomic assays—such as anchorage-independent colony formation, profiles of human cancers, another approach is to create bypass of senescence, or tumor xenografts—long-term customized libraries of candidate genes—e.g., a library of gene suppression is essential, requiring stable integration amplified genes or mutated genes—to assess which ones and expression of the RNAi vector. Recent work from have oncogenic activities. This type of approach can now several laboratories has shown that these approaches are be expanded to hundreds, rather than tens, of genes that tractable in human cells. For example, PITX1 was found emerge from genomic efforts, so one can enlist many as a negative regulator of RAS signaling (Kolfschoten et al. candidates into a gain-of-function genetic screen for func- 2005), REST1 has been identified a negative regulator of tional activity. PI3K signaling (Westbrook et al. 2005), CDK8 has been identified as a regulator of b-catenin signaling in colon Loss-of-function approaches Similar to the cDNA or cancer (Firestein et al. 2008), SIK1 was found to be ORF libraries used for gain-of-function approaches, librar- a negative regulator of aniokis and metastasis (Cheng ies of RNAi reagents can be introduced into cells either et al. 2009), and CDK6 has been shown to be an oncogene stably or transiently. In mammalian cells, RNAi-medi- in GBM (Wiedemeyer et al. 2010). Although such arrayed ated gene suppression can be induced by the introduction format screens require assays that are amenable to well- of chemically synthesized siRNAs or plasmids expressing based miniaturization, this experimental design permits RNA hairpins, known as shRNAs, which are processed to the use of high-content imaging to identify subtle or siRNAs by Dicer (Bernstein et al. 2001). Chemically complex phenotypes such as changes in cell morphology synthesized siRNAs are available from many different (Moffat et al. 2006; TR Jones et al. 2008). commercial sources as individual reagents, as pools target- In addition, vector-based shRNA libraries can be used ing specific genes, or as genome-scale libraries. In general, to interrogate gene function in a massively parallel siRNAs are easily synthesized and highly effective in manner by creating pools of shRNAs. The advantages of inducing gene knockdown. However, such oligonucleo- this approach are that such pooled screens permit the tide reagents are relatively expensive and can be used only study of a larger number of genes with decreased cost and for transient loss-of-function experiments. provide the possibility of using loss-of-function genetics Vector-based systems to express RNAi provide several in assays that cannot be performed in vitro. Several large- advantages compared with siRNA. By creating viruses, scale screens using pooled libraries have been performed these vectors permit long-term, stable expression of the (Brummelkamp et al. 2006; Ngo et al. 2006; Luo et al. RNAi construct and expand the range and type of cells 2008; Schlabach et al. 2008), demonstrating that both into which such constructs can be introduced. Both positive and negative selection screens are possible using academic and commercial groups have produced large these formats. To facilitate the deconvolution of genes libraries of shRNAs in a variety of expression vectors targeted by shRNAs in these screens, each of these groups (Table 4; Brummelkamp and Bernards 2003; Paddison has developed strategies to quantify the abundance of et al. 2004; Silva et al. 2005, 2008; Buchholz et al. 2006; each shRNA at the beginning and end of each screen by

Table 4. Reagent collections for manipulating mammalian gene function ORF/cDNA collection Link DFCI Center for Cancer Systems Biology http://ccsb.dfci.harvard.edu/web/www/ccsb German Center (DKFZ) http://www.smp-cell.org/smp-cell/cell.org/groups.asp?siteID=7 Harvard Institute of (HIP) http://www.hip.harvard.edu Mammalian Gene Collection (MGC) http://mgc.nci.nih.gov NEDO (FLJ) http://www.kazusa.or.jp/NEDO shRNA collection Link http://hannonlab.cshl.edu/index.html Hannon-Elledge http://elledgelab.bwh.harvard.edu/index.html MISSION esiRNA http://www.sigmaaldrich.com/life-science/functional-genomics-and-rnai/ mission-esirna.html Netherlands Cancer Center http://screeninc.nki.nl/library/index.php The RNAi Consortium (TRC) http://www.broadinstitute.org/rnai/trc

546 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data using the sequence of the shRNA or another unique expression library of miRNAs, Voorhoeve et al. (2006) sequence in the shRNA vector. Indeed, the use of a pooled identified miR-372 and miR-373 in a Ras-induced senes- format screen, together with microarrays to identify the cence bypass screen. Since a relatively small number of abundance of shRNA sequences, has been used to iden- miRNAs exist, one advantage of screening with current tify the NF-kB pathway and CARD11 in particular as miRNA expression libraries is that it is possible to compre- essential in the activated B-cell-like subtype of diffuse hensively query each of the miRNAs. However, with the large B-cell lymphoma (Ngo et al. 2006), TTI1 (Tel two- development of large-scale libraries of cDNAs or ORFs for interacting protein 1) and TTI2 as members of a complex the majority of human genes, similar experiments at the with TEL2 and ATM that mediate resistance to ioniz- genome scale will be increasingly possible in the near future. ing radiation (Hurov et al. 2010), 53BP1 as an essential mediator of nutlin-3-induced cytotoxicity (Brummelkamp Model systems et al. 2006), and CRKL as a NSCLC oncogene (Luo et al. A wide spectrum of model systems exists that permits the 2008). As deep-sequencing technologies become widely investigation of the context necessary for cell transfor- available, this technology will increasingly be used for mation and progression. The commonly used model deconvolution of both focused and genome-scale screens. systems include large panels of genome-annotated hu- In addition to the identification of genes that are man cancer cells, early passage primary cancer cells from oncogenes or that act in specific pathways, the use of patients, genetically engineered immortal primary hu- loss-of-function screens has also facilitated the identifi- man cells, and GEM models and their derivative primary cation of genes whose expression is essential in a partic- or transformed cells. The optimal use of these models ular context. For example, several groups have identified requires an understanding of their ideal applications and genes that, when suppressed, lead to cell death only in the experimental limitations, and the most predictive results context of cells that are dependent on oncogenic KRAS. will come from complementary uses of multiple models. Specifically, TBK1, STK33, PLK1, WT1, and SNAIL2 have been identified as genes that are required for the survival Established cancer cell model systems Established hu- of cells dependent on KRAS (Barbie et al. 2009; Luo et al. man cancer cell lines and primary human cancer cell 2009; Scholl et al. 2009; Wang et al. 2010). Although not cultures have been used extensively in functional valida- yet reaching saturation, these studies already provide tion assays due to their ease of manipulation and versa- a path toward defining enhancers and suppressors that tility. Although such systems only partially model the may also serve as therapeutic targets. Indeed, a similar more complex biological features of cancers in vivo, they strategy was used to identify genes that enhance PARP have proven powerful in advancing the validation of inhibitor sensitivity (Turner et al. 2008). Taken together, novel cancer genes, defining signaling pathways, and the systematic manipulation of gene function promises to establishing pharmacogenomic relationships. A major provide information complementary to that derived from challenge to the optimal use of these established cancer characterizing mutations in cancer genomes. cell line panels has been the lack of a complete atlas of the genetic alterations in these cells, as it is appreciated that Manipulating miRNA expression miRNAs are endoge- the specific genotypes in these cell models will dictate or nous small noncoding RNAs that function by down- modify the response to any molecular (RNAi or over- regulating expression of their target genes, either primar- expression) or pharmacological perturbation. Using the ily through induction of transcript degradation or through same tools described above to characterize tumor ge- translational inhibition. Approximately 500 annotated nomes, several groups are engaged in characterizing large human miRNAs have been described to date, although panels of cell lines, such as the Cancer Cell Line Project the targets of most of these miRNAs remain undefined (http://www.sanger.ac.uk/genetics/CGP/CellLines), or the (Griffiths-Jones 2006). Several lines of evidence have Cancer Cell Line Encyclopedia Project (http://www. established that dysregulation of miRNAs contributes broadinstitute.org/ccle). Additionally, when multiple in- to malignant transformation. Indeed, mice lacking Dicer, dependent cell lines with molecular diversity are used, the endo-ribonuclease that is required for miRNA pro- the risk that the observed phenotypes are idiosyncratic to cessing, show an increased susceptibility to cancer a particular cell line can be minimized. Although these (Sekine et al. 2009). In addition, specific miRNAs have cell line models are powerful in throughput, they do not been implicated in particular tumors. For example, let-7, capture all molecular subtypes of a particular tumor type, a negative regulator of RAS, is up-regulated in a subset of nor do they retain any interaction with stromal microen- lung cancers (Johnson et al. 2005); the miR-17-92 cluster vironment, thus raising the possibility that certain gene is amplified and up-regulated in lymphomas and pro- functions will be audited or represent artifacts of in vitro motes lymphomagenesis (He et al. 2005), and miR-15 and biology. Therefore, re-enforcing data from other model miR-16, negative regulators of BCL2, are down-regulated systems will continue to be important to complement in chronic lymphocytic leukemia (Cimmino et al. 2005). information derived from established cell lines. Recent work has confirmed that, just as has been reported for protein-coding oncogenes, such miRNAs are also Nontransformed genetically engineered cells Primary essential for tumor maintenance (Medina et al. 2010). human cells engineered with initiating events that are Several groups have now created expression libraries insufficient to achieve full transformation represent composed of miRNAs. For example, using a retroviral a powerful system to validate novel genes. In particular,

GENES & DEVELOPMENT 547 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al. such experimental models permit the creation of isogenic However, as noted, every assay—whether it is low or cells with specific mutations found in specific tumor high throughput—will yield biological false positives and subtypes, enabling the investigation of the role of a can- false negatives due to the context-specific nature of gene didate cancer driver in a stringently defined genetic function. context. Such models can complement the use of estab- Context can relate to cellular, genetic, and microenvi- lished human cancer cell lines to interrogate functions of ronmental factors. One well-known example is the op- a candidate gene through both suppression by RNAi in posing roles of TGFb signaling in initiation versus pro- established cancer cell lines and overexpression by ex- gression (Massague 2008); hence, it is possible that the pression in engineered primary human cells. role of a candidate cancer gene may be oncogenic or tumor-suppressive, depending on the specific cellular or GEM models GEM models of cancer have proven in- developmental contexts. Alternatively, a candidate may valuable in cancer gene validation and in revealing mech- require cooperation of another genetic event to manifest anistic insights of a novel gene’s role in the cancer process. its oncogenicity. An example is NEDD9, a metastasis gene While tumors from these GEM models are tremendously that resides on chromosome 6 that is frequently gained in useful for comparative oncogenomics (see above), the melanoma (Kim et al. 2006). NEDD9 exhibited robust practical challenges of time and expense involved in the invasion activity in a Modified Boyden Chamber assay creation, characterization, and uses of such GEM tumor only in melanocytes harboring RAS mutations, consistent models limit their utility as a high-throughput system. with its focal amplification in RAS mutant melanoma The recent advances in nongermline GEM (nGEM) cells that have acquired metastatic capacity in vivo (Kim models offered different approaches that mitigate some et al. 2006). Thus, the specific genetic and cellular back- of these limitations (for review, see Heyer et al. 2010)). ground is necessary to understand the context in which Inspired by the use of nontransformed stem/progenitor a candidate oncogene or operates. cells from GEM models, termed stem transgenesis Biological false positives can also emerge as a direct (Bachoo et al. 2002), these nGEM systems make use consequence of the artificial nature of experimental models. of tissue-restricted stem and progenitor cells that are For example, overexpression may induce phenotypes due to engineered with signature mutations encountered in supraphysiologic levels of expression, or suppression of a specific human cancer subtypes for transplantation into gene may have a different phenotype in vivo. The combi- a primed syngeneic recipient, in which the engineered nation of both functional studies and information derived primary cells hone in to the appropriate tissue for tumor from the analysis of cancer genomes can help mitigate these development. When these primary cells are engineered in concerns. In addition, clinicopathological validation using such a way that they are poised for (but not capable of on tumor tissue microarrays can be highly informative, offer- their own) transformation, transduction with a library of ing added evidentiary support for cancer relevance by vectors encoding candidate oncogene ORFs or shRNAs demonstrating the prevalence of dysregulation on DNA targeting a candidate tumor suppressor gene will permit (by FISH) and protein (by immunohistochemistry or immu- selection for cooperating event(s) to achieve tumorigen- nofluorescence) levels in independent large cohorts of esis. For example, Rad17 was shown to be a haploinsuffi- specific tumor types and of broad tumor spectrums. cient tumor suppressor in lymphoma in a study in which Although we have at our disposal a series of assays that a library of shRNA targeting a curated list of 1000 cancer permit the assessment of a large number of candidates genes was introduced into hematopoietic progenitor cells with high throughput, as well as ones that enable deeper derived from Em-myc transgenic mice and screened for interrogation with lower throughput, the existing reper- lymphomagenesis following engraftment into syngeneic toire of cancer-related assays remains incomplete. Even recipients. Although this is often used in hematopoietic in assays that are well developed, cellular as well as systems as hematopoietic stem and progenitor cells are genetic contexts are important in interpretation of the readily isolated from bone marrow or fetal livers, concep- results. Hence, it is important that conclusions drawn tually similar approaches can be applied to other organ from these functional assays performed in experimental systems, such as liver or brain, as stem or progenitor cells model systems are validated in human cancer specimens. have been identified, isolated, and transplanted in these systems (Bachoo et al. 2002; Zender et al. 2006; Zindy Conclusions and challenges et al. 2007). Comprehensive characterization of human cancer ge- nomes will soon generate comprehensive lists of genomic Approaches to functional validation and epigenomic alterations present in diverse human To fully credential a candidate gene as a therapeutic cancers. The integration of these genome characteriza- target or a diagnostic biomarker requires extensive func- tion efforts—both structural and functional—promises to tional assays and downstream biological studies that are provide the foundation for a complete understanding of time- and labor-intensive (Chin and Gray 2008). One the somatic alterations that program cancer initiation, approach to validation is to begin with assays that offer maintenance, and progression. Although the majority of throughput rather than depth, and move only the vali- early efforts have focused on known cancer pathways as a dated ones onto lower-throughput labor-intensive as- means to validate these methodologies, the full application says that provide insight into specific biological aspects. of these approaches will identify new pathways and

548 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data networks necessary for establishment and maintenance Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald of the malignant state. Moreover, such efforts will pro- A, Boldrick JC, Sabet H, Tran T, Yu X, et al. 2000. Distinct vide a framework on which to develop new therapeutics types of diffuse large B-cell lymphoma identified by gene and rational combination regimens. expression profiling. Nature 403: 503–511. Aza-Blanc P, Cooper CL, Wagner K, Batalov S, Deveraux QL, Although the throughput and reproducibility of Cooke MP. 2003. Identification of modulators of TRAIL- methods for analyzing cancer genomes has improved at induced apoptosis via RNAi-based phenotypic screening. a rapid rate, many challenges remain. For example, Mol Cell 12: 627–637. collecting accurate clinical information on tumor sam- Bachoo RM, Maher EA, Ligon KL, Sharpless NE, Chan SS, You ples remains an important and difficult task, but one that MJ, Tang Y, DeFrances J, Stover E, Weissleder R, et al. 2002. is necessary to afford the interpretation of genomic Epidermal growth factor receptor and Ink4a/Arf: convergent findings in a larger context. However, most available mechanisms governing terminal differentiation and trans- samples are associated with incomplete annotation or formation along the neural stem cell to astrocyte axis. lack appropriate consent to permit the linkage of clinical Cancer Cell 1: 269–277. information. Further progress will require the develop- Balmain A, Gray J, Ponder B. 2003. The genetics and genomics of cancer. Nat Genet 33: 238–244. ment and expansion of an infrastructure to collect sam- Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, ples and associated data with appropriate consent. The Schinzel AC, Sandy P, Meylan E, Scholl C, et al. 2009. challenges—scientific, operational, and legal—of this pro- Systematic RNA interference reveals that oncogenic KRAS- cess should not be underestimated. driven cancers require TBK1. Nature 462: 108–112. In addition, it is clear that future efforts will need to Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista account for genetic heterogeneity within tumors. As C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, et al. technologies improve in the near future, one will be able 2009. NCBI GEO: archive for high-throughput functional ge- to explore patterns of molecular heterogeneity on the nomic data. Nucleic Acids Res 37: D885–D890. doi: 10.1093/ single-cell level to determine how such heterogeneity nar/gkn764. affects tumor biology and the response to treatment. Bass AJ, Watanabe H, Mermel CH, Yu S, Perner S, Verhaak RG, Kim SY, Wardwell L, Tamayo P, Gat-Viks I, et al. 2009. SOX2 Finally, as emphasized above, integrative analyses of is an amplified lineage-survival oncogene in lung and esoph- comprehensive cancer genomics data will generate hy- ageal squamous cell carcinomas. Nat Genet 41: 1238–1242. potheses (such as candidate targets) that require experi- Bauer S, Grossmann S, Vingron M, Robinson PN. 2008. Ontolog- mental testing and validation. Such experimental valida- izer 2.0—a multifunctional tool for GO term enrichment tion results will guide further refinements of these analysis and data exploration. Bioinformatics 24: 1650–1651. analytical tools and approaches. The release of data sets Baxter EJ, Scott LM, Campbell PJ, East C, Fourouclas N, prior to publication will aid in these efforts, but it is clear Swanton S, Vassiliou GS, Bench AJ, Boyd EM, Curtin N, that more work is necessary to make these data sets et al. 2005. Acquired mutation of the tyrosine kinase JAK2 in available in useful formats. human myeloproliferative disorders. Lancet 365: 1054–1061. In summary, cancer is the consequence of accumulated Bengtsson H, Neuvial P, Speed TP. 2010. TumorBoost: normal- ization of allele-specific tumor copy numbers from a single somatic genomic and epigenomic alterations within the pair of tumor-normal genotyping microarrays. BMC Bioin- tumor cells, influenced by their heterotypic interactions formatics 11: 245. doi: 10.1186/1471-2105-11-245. with a tumor microenvironment. The ability to catalog the Benjamini Y, Hochberg Y. 1995. Controlling the false discovery somatic genomic alterations in large numbers of cancers, rate: a practical and powerful approach to multiple testing. together with systematic functional analyses, will provide J R Stat Soc Ser B Methodol 57: 289–300. a foundation that will facilitate efforts to understand how Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, these alterations induce malignant transformation. More- Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell over, as these genome characterization efforts are applied HR, et al. 2008. Accurate whole human genome sequencing prospectively in patients, these efforts will provide a frame- using reversible terminator chemistry. Nature 456: 53–59. work for relating studies in experimental models to the Ben-Yaacov E, Eldar YC. 2008. A fast and flexible method for the segmentation of aCGH data. Bioinformatics 24: i139–i145. corresponding human tumors. doi: 10.1093/bioinformatics/btn272. Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Acknowledgments Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C, et al. 2010. Integrative analysis of the melanoma transcrip- We apologize to colleagues whose studies, algorithms, or portals tome. Genome Res. 20: 413–427. are not cited in this review due to limitation in space. L.C., G.G., Bernstein E, Caudy AA, Hammond SM, Hannon GJ. 2001. Role and M.M. are TCGA-funded investigators (NIH U24CA143845, for a bidentate ribonuclease in the initiation step of RNA U24CA144025, and U24CA143867). L.C. and W.C.H. are CTD2 interference. Nature 409: 363–366. (Cancer Target Discovery and Development) network investiga- Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, tors (NIH RC2CA148268). Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S, et al. 2007. Assessing the significance of chromosomal aberrations References in cancer: methodology and application to glioma. Proc Natl Acad Sci 104: 20007–20012. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and Donovan J, Barretina J, Boehm JS, Dobson J, Urashima M, server for predicting damaging missense mutations. Nat et al. 2010. The landscape of somatic copy-number alteration Methods 7: 248–249. across human cancers. Nature 463: 899–905.

GENES & DEVELOPMENT 549 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al.

Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Copeland NG, Jenkins NA. 2010. Harnessing transposons for Grigorova M, Jones KW, Wei W, Stratton MR, et al. 2004. cancer gene discovery. Nat Rev Cancer 10: 696–706. High-resolution analysis of DNA copy number using oligo- Dalgliesh GL, Furge K, Greenman C, Chen L, Bignell G, Butler nucleotide microarrays. Genome Res 14: 287–295. A, Davies H, Edkins S, Hardy C, Latimer C, et al. 2010. Boehm JS, Zhao JJ, Yao J, Kim SY, Firestein R, Dunn IF, Sjostrom Systematic sequencing of renal carcinoma reveals inactiva- SK, Garraway LA, Weremowicz S, Richardson A, et al. 2007. tion of histone modifying genes. Nature 463: 360–363. Integrative genomic approaches identify IKBKE as a breast Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, cancer oncogene. Cell 129: 1065–1079. Teague J, Woffendin H, Garnett MJ, Bottomley W, et al. 2002. Bos JL, Toksoz D, Marshall CJ, Verlaan-de Vries M, Veeneman Mutations of the BRAF gene in human cancer. Nature 417: GH, van der Eb AJ, van Boom JH, Janssen JW, Steenvoorden 949–954. AC. 1985. Amino-acid substitutions at codon 13 of the N-ras Davies H, Hunter C, Smith R, Stephens P, Greenman C, Bignell oncogene in human acute myeloid leukaemia. Nature 315: G, Teague J, Butler A, Edkins S, Stevens C, et al. 2005. 726–730. Somatic mutations of the protein kinase gene family in Brennan C, Zhang Y, Leo C, Feng B, Cauwels C, Aguirre AJ, Kim human lung cancer. Cancer Res 65: 7591–7595. M, Protopopov A. Chin L. 2004. High-resolution global Day N, Hemmaplardh A, Thurman RE, Stamatoyannopoulos profiling of genomic alterations with long JA, Noble WS. 2007. Unsupervised segmentation of contin- microarray. Cancer Res 64: 4744–4748. uous genomic data. Bioinformatics 23: 1424–1426. Brummelkamp TR, Bernards R. 2003. New tools for functional Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, mammalian cancer genetics. Nat Rev Cancer 3: 781–789. McLellan MD, Fulton RS, Fulton LL, et al. 2010a. Genome Brummelkamp TR, Fabius AW, Mullenders J, Madiredjo M, Velds remodelling in a basal-like breast cancer metastasis and A, Kerkhoven RM, Bernards R, Beijersbergen RL. 2006. An xenograft. Nature 464: 999–1005. shRNA barcode screen provides insight into cancer cell Ding L, Wendl MC, Koboldt DC, Mardis ER. 2010b. Analysis of vulnerability to MDM2 inhibitors. Nat Chem Biol 2: 202–206. next-generation genomic data in cancer: accomplishments Brunet JP, Tamayo P, Golub TR, Mesirov JP. 2004. Metagenes and challenges. Hum Mol Genet 19: R188–R196. doi: and molecular pattern discovery using matrix factorization. 10.1093/hmg/ddq391. Proc Natl Acad Sci 101: 4164–4169. Dobbin KK, Beer DG, Meyerson M, Yeatman TJ, Gerald WL, Buchholz F, Kittler R, Slabicki M, Theis M. 2006. Enzymatically Jacobson JW, Conley B, Buetow KH, Heiskanen M, Simon prepared RNAi libraries. Nat Methods 3: 696–700. RM, et al. 2005. Interlaboratory comparability study of Cai WW, Mao JH, Chow CW, Damani S, Balmain A. Bradley A. cancer gene expression analysis using oligonucleotide micro- 2002. Genome-wide detection of chromosomal imbalances in arrays. Clin Cancer Res 11: 565–572. tumors using BAC microarrays. Nat Biotechnol 20: 393–396. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Campbell PJ, Stephens PJ, Pleasance ED, O’Meara S, H, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G, Santarius T, Stebbings LA, Leroy C, Edkins S, Hardy C, et al. 2010. Human genome sequencing using unchained base et al. 2008. Identification of somatically acquired rearrange- reads on self-assembling DNA nanoarrays. Science 327: 78–81. ments in cancer using genome-wide massively parallel Dupuy AJ, Rogers LM, Kim J, Nannapaneni K, Starr TK, Liu P, paired-end sequencing. Nat Genet 40: 722–729. Largaespada DA, Scheetz TE, Jenkins NA, Copeland NG. The Cancer Genome Atlas Network. 2008. Comprehensive 2009. A modified sleeping beauty transposon system that genomic characterization defines human genes can be used to model a wide variety of human cancers in and core pathways. Nature 455: 1061–1068. mice. Cancer Res 69: 8150–8156. Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler Dutt A, Salvesen HB, Chen TH, Ramos AH, Onofrio RC, Hatton KW, Vogelstein B, Karchin R. 2009. Cancer-specific high- C, Nicoletti R, Winckler W, Grewal R, Hanna M, et al. 2008. throughput annotation of somatic mutations: computational Drug-sensitive FGFR2 mutations in endometrial carcinoma. prediction of driver missense mutations. Cancer Res 69: Proc Natl Acad Sci 105: 8713–8717. 6660–6667. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. 2009. Chen Y, Takita J, Choi YL, Kato M, Ohira M, Sanada M, Wang L, GOrilla: a tool for discovery and visualization of enriched Soda M, Kikuchi A, Igarashi T, et al. 2008. Oncogenic GO terms in ranked gene lists. BMC Bioinformatics 10: 48. mutations of ALK kinase in neuroblastoma. Nature 455: doi: 10.1186/1471-2105-10-48. 971–974. Edgar R, Domrachev M, Lash AE. 2002. Gene Expression Cheng H, Liu P, Wang ZC, Zou L, Santiago S, Garbitt V, Gjoerup Omnibus: NCBI gene expression and hybridization array OV, Iglehart JD, Miron A, Richardson AL, et al. 2009. SIK1 data repository. Nucleic Acids Res 30: 207–210. couples LKB1 to p53-dependent anoikis and suppresses Eisen MB, Spellman PT, Brown PO, Botstein D. 1998. Cluster metastasis. Sci Signal 2: ra35. doi: 10.1126/scisignal. analysis and display of genome-wide expression patterns. 2000369. Proc Natl Acad Sci 95: 14863–14868. Chiang DY, Getz G, Jaffe DB, O’Kelly MJ, Zhao X, Carter SL, Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed Russ C, Nusbaum C, Meyerson M, Lander ES. 2009. High- E, Ligon AH, Vena N, Ogino S, et al. 2008. CDK8 is resolution mapping of copy-number alterations with mas- a colorectal cancer oncogene that regulates b-catenin activ- sively parallel sequencing. Nat Methods 6: 99–103. ity. Nature 455: 547–551. Chin L, Gray JW. 2008. Translating insights from the cancer Fong PC, Boss DS, Yap TA, Tutt A, Wu P, Mergui-Roelvink M, genome into clinical practice. Nature 452: 553–563. Mortimer P, Swaisland H, Lau A, O’Connor MJ, et al. 2009. Cimmino A, Calin GA, Fabbri M, Iorio MV, Ferracin M, Shimizu Inhibition of poly(ADP-ribose) polymerase in tumors from M, Wojcik SE, Aqeilan RI, Zupo S, Dono M, et al. 2005. miR- BRCA mutation carriers. N Engl J Med 361: 123–134. 15 and miR-16 induce apoptosis by targeting BCL2. Proc Fong PC, Yap TA, Boss DS, Carden CP, Mergui-Roelvink M, Natl Acad Sci 102: 13944–13949. Gourley C, De Greve J, Lubinski J, Shanley S, Messiou C, Collins FS, Green ED, Guttmacher AE, Guyer MS. 2003. A et al. 2010. Poly(ADP)-ribose polymerase inhibition: frequent vision for the future of genomics research. Nature 422: 835– durable responses in BRCA carrier ovarian cancer correlating 847. with platinum-free interval. J Clin Oncol 28: 2512–2519.

550 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, Heidorn SJ, Milagre C, Whittaker S, Nourry A, Niculescu-Duvas Menzies A, Teague JW, Futreal PA, Stratton MR. 2008. The I, Dhomen N, Hussain J, Reis-Filho JS, Springer CJ, Pritchard catalogue of somatic mutations in cancer (COSMIC). Curr C, et al. 2010. Kinase-dead BRAF and oncogenic RAS coop- Protoc Hum Genet 57: 10.11.1–10.11.26. doi: 10.1002/ erate to drive tumor progression through CRAF. Cell 140: 0471142905.hg1011s57. 209–221. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Heyer J, Kwong LN, Lowe SW, Chin L. 2010. Non-germline Shepherd R, Leung K, Menzies A, et al. 2011. COSMIC: genetically engineered mouse models for translational can- mining complete cancer genomes in the catalogue of somatic cer research. Nat Rev Cancer 10: 470–480. mutations in cancer. Nucleic Acids Res 39: D945–D950. doi: Hodgson G, Hager JH, Volik S, Hariono S, Wernick M, Moore D, 10.1093/nar/gkq929. Nowak N, Albertson DG, Pinkel D, Collins C, et al. 2001. Friend SH, Bernards R, Rogelj S, Weinberg RA, Rapaport JM, Genome scanning with array CGH delineates regional alter- Albert DM, Dryja TP. 1986. A human DNA segment with ations in mouse islet carcinomas. Nat Genet 29: 459–464. properties of the gene that predisposes to retinoblastoma and Huang DW, Sherman BT, Lempicki RA. 2009. Systematic and osteosarcoma. Nature 323: 643–646. integrative analysis of large gene lists using DAVID bioin- Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster formatics resources. Nat Protoc 4: 44–57. R, Rahman N, Stratton MR. 2004. A census of human cancer Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabe genes. Nat Rev Cancer 4: 177–183. RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, et al. 2010. Garraway LA, Widlund HR, Rubin MA, Getz G, Berger AJ, International network of cancer genome projects. Nature Ramaswamy S, Beroukhim R, Milner DA, Granter SR, Du J, 464: 993–998. et al. 2005. Integrative genomic analyses identify MITF as Hunkapiller T, Kaiser RJ, Koop BF, Hood L. 1991. Large-scale and a lineage survival oncogene amplified in malignant mela- automated DNA sequence determination. Science 254: 59–67. noma. Nature 436: 117–122. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E. 2004. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Analysis of array CGH data: from signal ratio to gain and loss Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. 2004. of DNA regions. Bioinformatics 20: 3413–3422. Bioconductor: open software development for computational Hurov KE, Cotta-Ramusino C, Elledge SJ. 2010. A genetic screen biology and bioinformatics. Genome Biol 5: R80. doi: identifies the Triple T complex required for DNA damage 11.1186/gb-2004-5-10-r80. signaling and ATM and ATR stability. Genes Dev 24: 1939– George RE, Sanda T, Hanna M, Frohling S, Luther W II, Zhang J, 1950. Ahn Y, Zhou W, London WB, McGrady P, et al. 2008. Jacobs JJ, Keblusek P, Robanus-Maandag E, Kristel P, Lingbeek Activating mutations in ALK provide a therapeutic target M, Nederlof PM, van Welsem T, van de Vijver MJ, Koh EY, in neuroblastoma. Nature 455: 975–978. Daley GQ, et al. 2000. Senescence bypass screen identifies Getz G, Hofling H, Mesirov JP, Golub TR, Meyerson M, TBX2, which represses Cdkn2a (p19(ARF)) and is amplified Tibshirani R, Lander ES. 2007. Comment on ‘The consensus in a subset of human breast cancers. Nat Genet 26: 291–299. coding sequences of human breast and colorectal cancers.’ James C, Ugo V, Le Couedic JP, Staerk J, Delhommeau F, Lacout Science 317: 1500. C, Garcon L, Raslova H, Berger R, Bennaceur-Griscelli A, Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. 2005. A unique clonal JAK2 mutation leading to Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, constitutive signalling causes polycythaemia vera. Nature et al. 1999. Molecular classification of cancer: class discovery 434: 1144–1148. and class prediction by gene expression monitoring. Science Janoueix-Lerosey I, Lequin D, Brugieres L, Ribeiro A, de Pontual 286: 531–537. L, Combaret V, Raynal V, Puisieux A, Schleiermacher G, Gould J, Getz G, Monti S, Reich M, Mesirov JP. 2006. Compar- Pierron G, et al. 2008. Somatic and germline activating ative gene marker selection suite. Bioinformatics 22: 1924– mutations of the ALK kinase receptor in neuroblastoma. 1925. Nature 455: 967–970. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Johnson SM, Grosshans H, Shingara J, Byrom M, Jarvis R, Cheng Bignell G, Davies H, Teague J, Butler A, Stevens C, et al. A, Labourier E, Reinert KL, Brown D, Slack FJ. 2005. RAS is 2007. Patterns of somatic mutation in human cancer ge- regulated by the let-7 microRNA family. Cell 120: 635–647. nomes. Nature 446: 153–158. Jones PA, Baylin SB. 2002. The fundamental role of epigenetic Greenman CD, Bignell G, Butler A, Edkins S, Hinton J, Beare D, events in cancer. Nat Rev Genet 3: 415–428. Swamy S, Santarius T, Chen L, Widaa S, et al. 2010. PICNIC: Jones PA, Baylin SB. 2007. The epigenomics of cancer. Cell 128: an algorithm to predict absolute allelic copy number varia- 683–692. tion with microarray cancer data. Biostatistics 11: 164–175. Jones NC, Pevzner PA. 2004. An introduction to bioinformatics Griffiths-Jones S. 2006. miRBase: the microRNA sequence algorithms. MIT Press, Cambridge, MA. database. Methods Mol Biol 342: 129–138. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, Gunawardane RN, Sgroi DC, Wrobel CN, Koh E, Daley GQ, Mankoo P, Carter H, Kamiyama H, Jimeno A, et al. 2008. Brugge JS. 2005. Novel role for PDEF in epithelial cell Core signaling pathways in human pancreatic cancers re- migration and invasion. Cancer Res 65: 11572–11580. vealed by global genomic analyses. Science 321: 1801–1806. Hahn SA, Schutte M, Hoque AT, Moskaluk CA, da Costa LT, Jones TR, Kang IH, Wheeler DB, Lindquist RA, Papallo A, Rozenblum E, Weinstein CL, Fischer A, Yeo CJ, Hruban RH, Sabatini DM, Golland P, Carpenter AE. 2008. CellProfiler et al. 1996. DPC4, a candidate tumor suppressor gene at analyst: data exploration and analysis software for complex human chromosome 18q21.1. Science 271: 350–353. image-based screens. BMC Bioinformatics 9: 482. doi: Hanahan D, Weinberg RA. 2000. The hallmarks of cancer. Cell 10.1186/1471-2105-482. 100: 57–70. Joseph EW, Pratilas CA, Poulikakos PI, Tadi M, Wang W, Taylor He L, Thomson JM, Hemann MT, Hernando-Monge E, Mu D, BS, Halilovic E, Persaud Y, Xing F, Viale A, et al. 2010. The Goodson S, Powers S, Cordon-Cardo C, Lowe SW, Hannon RAF inhibitor PLX4032 inhibits ERK signaling and tumor GJ, et al. 2005. A microRNA polycistron as a potential cell proliferation in a V600E BRAF-selective manner. Proc human oncogene. Nature 435: 828–833. Natl Acad Sci 107: 14903–14908.

GENES & DEVELOPMENT 551 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al.

Kaminker JS, Zhang Y, Watanabe C, Zhang Z. 2007. CanPredict: Li C, Wong WH. 2001. Model-based analysis of oligonucleotide a computational tool for predicting cancer-associated mis- arrays: expression index computation and outlier detection. sense mutations. Nucleic Acids Res 35: W595-W598. doi: Proc Natl Acad Sci 98: 31–36. 10.1093/nar/gkm405. Lin HY, Harris TL, Flannery MS, Aruffo A, Kaji EH, Gorn A, Kan Z, Jaiswal BS, Stinson J, Janakiraman V, Bhatt D, Stern HM, Kolakowski LF Jr, Lodish HF, Goldring SR. 1991. Expression Yue P, Haverty PM, Bourgon R, Zheng J, et al. 2010. Diverse cloning of an adenylate cyclase-coupled calcitonin receptor. somatic mutation patterns and pathway alterations in hu- Science 254: 1022–1024. man cancers. Nature 466: 869–873. Lindblad-Toh K, Tanenbaum DM, Daly MJ, Winchester E, Lui Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, WO, Villapakkam A, Stanton SE, Larsson C, Hudson TJ, Kawashima S, Katayama T, Araki M, Hirakawa M. 2006. Johnson BE, et al. 2000. Loss-of-heterozygosity analysis of From genomics to chemical genomics: new developments in small-cell lung carcinomas using single-nucleotide polymor- KEGG. Nucleic Acids Res 34: D354–D357. doi: 10.1093/nar/ phism arrays. Nat Biotechnol 18: 1001–1005. gkj102. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Kauffmann A, Huber W. 2010. Microarray data quality control Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, et al. improves the detection of differentially expressed genes. 2005. MicroRNA expression profiles classify human cancers. Genomics 95: 138–142. Nature 435: 834–838. Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, Luo B, Cheung HW, Subramanian A, Sharifnia T, Okamoto M, Smith PD, Jiang Y, Gooden GC, Trent JM, et al. 1998. Gene Yang X, Hinkle G, Boehm JS, Beroukhim R, Weir BA, et al. expression profiling of alveolar rhabdomyosarcoma with 2008. Highly parallel identification of essential genes in cDNA microarrays. Cancer Res 58: 5009–5013. cancer cells. Proc Natl Acad Sci 105: 20380–20385. Kim PM, Tidor B. 2003. Subsystem identification through Luo J, Emanuele MJ, Li D, Creighton CJ, Schlabach MR, Westbrook dimensionality reduction of large-scale gene expression data. TF, Wong KK, Elledge SJ. 2009. A genome-wide RNAi screen Genome Res 13: 1706–1718. identifies multiple synthetic lethal interactions with the Ras Kim M, Gans JD, Nogueira C, Wang A, Paik JH, Feng B, Brennan oncogene. Cell 137: 835–848. C, Hahn WC, Cordon-Cardo C, Wagner SN, et al. 2006. Lynch TJ, Bell DW, Sordella R, Gurubhagavatula S, Okimoto Comparative oncogenomics identifies NEDD9 as a mela- RA, Brannigan BW, Harris PL, Haserlat SM, Supko JG, noma metastasis gene. Cell 125: 1269–1281. Haluska FG, et al. 2004. Activating mutations in the epi- Kittler R, Buchholz F. 2005. Functional genomic analysis of cell dermal growth factor receptor underlying responsiveness of division by endoribonuclease-prepared siRNAs. Cell Cycle 4: non-small-cell lung cancer to gefitinib. N Engl J Med 350: 564–567. 2129–2139. Koh EY, Chen T, Daley GQ. 2002. Novel retroviral vectors to MacKeigan JP, Murphy LO, Blenis J. 2005. Sensitized RNAi facilitate expression screens in mammalian cells. Nucleic screen of human kinases and identifies new Acids Res 30: e142. doi: 10.1093/nar/gnf142. regulators of apoptosis and chemoresistance. Nat Cell Biol 7: Kolfschoten IG, van Leeuwen B, Berns K, Mullenders J, 591–600. Beijersbergen RL, Bernards R, Voorhoeve PM, Agami R. Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, 2005. A genetic screen identifies PITX1 as a suppressor of Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM. RAS activity and tumorigenicity. Cell 121: 849–858. 2009a. Transcriptome sequencing to detect gene fusions in Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg cancer. Nature 458: 97–101. JR, Tichelli A, Cazzola M, Skoda RC. 2005. A gain-of- Maher CA, Palanisamy N, Brenner JC, Cao X, Kalyana-Sundaram function mutation of JAK2 in myeloproliferative disorders. S, Luo S, Khrebtukova I, Barrette TR, Grasso C, Yu J, et al. N Engl J Med 352: 1779–1790. 2009b. Chimeric transcript discovery by paired-end tran- LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington scriptome sequencing. Proc Natl Acad Sci 106: 12353– D, Sellers WR, Meyerson M. 2005. Allele-specific amplifica- 12358. tion in cancer revealed by SNP array analysis. PLoS Comput Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen Biol 1: e65. doi: 10.1371/journal.pcbi.0010065. K, Koboldt DC, Fulton RS, Delehaunty KD, McGrath SD, Lai WR, Johnson MD, Kucherlapati R, Park PJ. 2005. Compara- et al. 2009. Recurring mutations found by sequencing an tive analysis of algorithms for identifying amplifications and acute myeloid leukemia genome. N Engl J Med 361: 1058– deletions in array CGH data. Bioinformatics 21: 3763–3770. 1066. Lai W, Choudhary V, Park PJ. 2008. CGHweb: a tool for Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, comparing DNA copy number segmentations from multiple Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, algorithms. Bioinformatics 24: 1014–1015. et al. 2005. Genome sequencing in microfabricated high- Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, density picolitre reactors. Nature 437: 376–380. Venkatesan K, Bethel G, Martin P, et al. 2007. Human Maser RS, Choudhury B, Campbell PJ, Feng B, Wong KK, ORFeome 3.1: a resource of human open reading frames Protopopov A, O’Neil J, Gutierrez A, Ivanova E, Perna I, covering over 10,000 human genes. Genomics 89: 307–315. et al. 2007. Chromosomally unstable mouse tumours have Lander ES. 1996. The new genomics: global views of biology. genomic alterations similar to diverse human cancers. Science 274: 536–539. Nature 447: 966–971. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin Massague J. 2008. TGFb in cancer. Cell 134: 215–230. J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. 2001. McCusick VA. 1998. Mendelian inheritance in man: a catalog Initial sequencing and analysis of the human genome. of human genes and genetic disorders. Johns Hopkins Nature 409: 860–921. University Press, Baltimore, MD. Levine RL, Wadleigh M, Cools J, Ebert BL, Wernig G, Huntly BJ, McDermott U, Iafrate AJ, Gray NS, Shioda T, Classon M, Boggon TJ, Wlodarska I, Clark JJ, Moore S, et al. 2005. Maheswaran S, Zhou W, Choi HG, Smith SL, Dowell L, Activating mutation in the tyrosine kinase JAK2 in poly- et al. 2008. Genomic alterations of anaplastic lymphoma cythemia vera, essential thrombocythemia, and myeloid kinase may sensitize tumors to anaplastic lymphoma kinase metaplasia with myelofibrosis. Cancer Cell 7: 387–397. inhibitors. Cancer Res 68: 3389–3395.

552 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

Medina PP, Nolde M, Slack FJ. 2010. OncomiR addiction in an K, et al. 2010. Rearrangements of the RAF kinase pathway in in vivo model of microRNA-21-induced pre-B-cell lym- prostate cancer, gastric cancer and melanoma. Nat Med 16: phoma. Nature 467: 86–90. 793–798. Mei R, Galipeau PC, Prass C, Berno A, Ghandour G, Patil N, Pao W, Miller V, Zakowski M, Doherty J, Politi K, Sarkaria I, Wolff RK, Chee MS, Reid BJ, Lockhart DJ. 2000. Genome- Singh B, Heelan R, Rusch V, Fulton L, et al. 2004. EGF wide detection of allelic imbalance using human SNPs and receptor gene mutations are common in lung cancers from high-density DNA arrays. Genome Res 10: 1126–1137. ‘never smokers’ and are associated with sensitivity of tumors Mesirov JP. 2010. Computer science. Accessible reproducible to gefitinib and erlotinib. Proc Natl Acad Sci 101: 13306– research. Science 327: 415–416. 13311. Meyerson M, Gabriel S, Getz G. 2010. Advances in understand- Parada LF, Tabin CJ, Shih C, Weinberg RA. 1982. Human EJ ing cancer genomes through second-generation sequencing. bladder carcinoma oncogene is homologue of Harvey sar- Nat Rev Genet 11: 685–696. coma virus ras gene. Nature 297: 474–478. Mitelman F, Johansson B, Mertens F. 2007. The impact of Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, translocations and gene fusions on cancer causation. Nat Mankoo P, Carter H, Siu IM, Gallia GL, et al. 2008. An Rev Cancer 7: 233–245. integrated genomic analysis of human glioblastoma multi- Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle forme. Science 321: 1807–1812. G, Piqani B, Eisenhaure TM, Luo B, Grenier JK, et al. 2006. A Peeper DS, Shvarts A, Brummelkamp T, Douma S, Koh EY, lentiviral RNAi library for human and mouse genes applied Daley GQ, Bernards R. 2002. A functional screen identifies to an arrayed viral high-content screen. Cell 124: 1283–1298. hDRIL1 as an oncogene that rescues RAS-induced senes- Mosse YP, Laudenslager M, Longo L, Cole KA, Wood A, Attiyeh cence. Nat Cell Biol 4: 148–153. EF, Laquaglia MJ, Sennett R, Lynch JE, Perri P, et al. 2008. Pettersson E, Lundeberg J, Ahmadian A. 2009. Generations of Identification of ALK as a major familial neuroblastoma sequencing technologies. Genomics 93: 105–111. predisposition gene. Nature 455: 930–935. Pevsner J. 2009. Bioinformatics and functional genomics. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, Wiley-Blackwell, New York. Dalton JD, Girtman K, Mathew S, Ma J, Pounds SB, et al. Picard F, Robin S, Lavielle M, Vaisse C, Daudin JJ. 2005. A 2007. Genome-wide analysis of genetic alterations in acute statistical approach for array CGH data analysis. BMC lymphoblastic leukaemia. Nature 446: 758–764. Bioinformatics 6: 27. doi: 10.1186/1471-2105-6-27. Mullighan CG, Miller CB, Radtke I, Phillips LA, Dalton J, Ma J, Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Hum- White D, Hughes TP, Le Beau MM, Pui CH, et al. 2008. phray SJ, Greenman CD, Varela I, Lin ML, Ordonez GR, BCR-ABL1 lymphoblastic leukaemia is characterized by the Bignell GR, et al. 2010a. A comprehensive catalogue of deletion of Ikaros. Nature 453: 110–114. somatic mutations from a human cancer genome. Nature Nanjundan M, Nakayama Y, Cheng KW, Lahad J, Liu J, Lu K, 463: 191–196. Kuo WL, Smith-McCune K, Fishman D, Gray JW, et al. 2007. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Amplification of MDS1/EVI1 and EVI1, located in the 3q26.2 Jones D, Lin ML, Beare D, Lau KW, Greenman C, et al. , is associated with favorable patient prognosis in 2010b. A small-cell lung cancer genome with complex ovarian cancer. Cancer Res 67: 3074–3084. signatures of tobacco exposure. Nature 463: 184–190. Ng PC, Henikoff S. 2001. Predicting deleterious amino acid Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov substitutions. Genome Res 11: 863–874. A, Williams CF, Jeffrey SS, Botstein D, Brown PO. 1999. Ngo VN, Davis RE, Lamy L, Yu X, Zhao H, Lenz G, Lam LT, Genome-wide analysis of DNA copy-number changes using Dave S, Yang L, Powell J, et al. 2006. A loss-of-function RNA cDNA microarrays. Nat Genet 23: 41–46. interference screen for molecular targets in cancer. Nature Pollock PM, Gartside MG, Dejeza LC, Powell MA, Mallon MA, 441: 106–110. Davies H, Mohammadi M, Futreal PA, Stratton MR, Trent Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, JM, et al. 2007. Frequent activating FGFR2 mutations in Berman BP, Pan F, Pelloski CE, Sulman EP, Bhat KP, et al. endometrial carcinomas parallel germline mutations associ- 2010. Identification of a CpG island methylator phenotype ated with craniosynostosis and skeletal dysplasia syndromes. that defines a distinct subgroup of glioma. Cancer Cell 17: Oncogene 26: 7158–7162. 510–522. Poulikakos PI, Zhang C, Bollag G, Shokat KM, Rosen N. 2010. O’Hagan RC, Brennan CW, Strahs A, Zhang X, Kannan K, RAF inhibitors transactivate RAF dimers and ERK signalling Donovan M, Cauwels C, Sharpless NE, Wong WH, Chin L. in cells with wild-type BRAF. Nature 464: 427–430. 2003. Array comparative genome hybridization for tumor Rabbitts TH. 1994. Chromosomal translocations in human classification and gene discovery in mouse models of malig- cancer. Nature 372: 143–149. nant melanoma. Cancer Res 63: 5352–5356. Ramensky V, Bork P, Sunyaev S. 2002. Human non-synonymous Paddison PJ, Silva JM, Conklin DS, Schlabach M, Li M, Aruleba SNPs: server and survey. Nucleic Acids Res 30: 3894–3900. S, Balija V, O’Shaughnessy A, Gnoj L, Scobie K, et al. 2004. A Raychaudhuri S, Stuart JM, Altman RB. 2000. Principal compo- resource for large-scale RNA-interference-based screens in nents analysis to summarize microarray experiments: appli- mammals. Nature 428: 427–431. cation to sporulation time series. Pac Symp Biocomput 2000: Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, 455–466. Herman P, Kaye FJ, Lindeman N, Boggon TJ, et al. 2004. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, EGFR mutations in lung cancer: correlation with clinical Fiegler H, Shapero MH, Carson AR, Chen W, et al. 2006. response to gefitinib therapy. Science 304: 1497–1500. Global variation in copy number in the human genome. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Nature 444: 444–454. Walker MG, Watson D, Park T, et al. 2004. A multigene Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. assay to predict recurrence of tamoxifen-treated, node-nega- 2006. GenePattern 2.0. Nat Genet 38: 500–501. tive breast cancer. N Engl J Med 351: 2817–2826. Reva B, Antipin Y, Sander C. 2007. Determinants of protein Palanisamy N, Ateeq B, Kalyana-Sundaram S, Pflueger D, function revealed by combinatorial entropy optimization. Ramnarayanan K, Shankar S, Han B, Cao Q, Cao X, Suleman Genome Biol 8: R232. doi: 10.1186/gb-2007-8-11-r232.

GENES & DEVELOPMENT 553 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Chin et al.

Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, tifies BCL6 as an inhibitor of anti-proliferative p19(ARF)-p53 Ghosh D, Barrette T, Pandey A. Chinnaiyan AM. 2004. signaling. Genes Dev 16: 681–686. ONCOMINE: a cancer microarray database and integrated Silva JM, Li MZ, Chang K, Ge W, Golding MC, Rickles RJ, Siolas data-mining platform. Neoplasia 6: 1–6. D, Hu G, Paddison PJ, Schlabach MR, et al. 2005. Second- Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, generation shRNA libraries covering the mouse and human Yu J, Briggs BB, Barrette TR, Anstet MJ, Kincead-Beal C, genomes. Nat Genet 37: 1281–1288. Kulkarni P, et al. 2007. Oncomine 3.0: genes, pathways, and Silva JM, Marran K, Parker JS, Silva J, Golding M, Schlabach networks in a collection of 18,000 cancer gene expression MR, Elledge SJ, Hannon GJ, Chang K. 2008. Profiling profiles. Neoplasia 9: 166–180. essential genes in human mammary cells by multiplex RNAi Rolfs A, Montor WR, Yoon SS, Hu Y, Bhullar B, Kelley F, screening. Science 319: 617–620. McCarron S, Jepson DA, Shen B, Taycher E, et al. 2008. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Production and sequence validation of a complete full length Mandelker D, Leary RJ, Ptak J, Silliman N, et al. 2006. The ORF collection for the pathogenic bacterium Vibrio chol- consensus coding sequences of human breast and colorectal erae. Proc Natl Acad Sci 105: 4364–4369. cancers. Science 314: 268–274. Samuels Y, Wang Z, Bardelli A, Silliman N, Ptak J, Szabo S, Yan Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa H, Gazdar A, Powell SM, Riggins GJ, et al. 2004. High S, Fujiwara S, Watanabe H, Kurashina K, Hatanaka H, et al. frequency of mutations of the PIK3CA gene in human 2007. Identification of the transforming EML4-ALK fusion cancers. Science 304: 554. gene in non-small-cell lung cancer. Nature 448: 561–566. Sanger F, Coulson AR. 1975. A rapid method for determining Starr TK, Allaei R, Silverstein KA, Staggs RA, Sarver AL, sequences in DNA by primed synthesis with DNA poly- Bergemann TL, Gupta M, O’Sullivan MG, Matise I, Dupuy merase. J Mol Biol 94: 441–448. AJ, et al. 2009. A transposon-based genetic screen in mice Santos E, Martin-Zanca D, Reddy EP, Pierotti MA, Della Porta identifies genes altered in colorectal cancer. Science 323: G, Barbacid M. 1984. Malignant activation of a K-ras onco- 1747–1750. gene in lung carcinoma but not in normal tissue of the same Stephens P, Hunter C, Bignell G, Edkins S, Davies H, Teague J, patient. Science 223: 661–664. Stevens C, O’Meara S, Smith R, Parker A, et al. 2004. Lung Schena M, Shalon D, Davis RW, Brown PO. 1995. Quantitative cancer: intragenic ERBB2 kinase mutations in tumours. monitoring of gene expression patterns with a complemen- Nature 431: 525–526. tary DNA microarray. Science 270: 467–470. Stephens P, Edkins S, Davies H, Greenman C, Cox C, Hunter C, Schlabach MR, Luo J, Solimini NL, Hu G, Xu Q, Li MZ, Zhao Z, Bignell G, Teague J, Smith R, Stevens C, et al. 2005. A screen Smogorzewska A, Sowa ME, Ang XL, et al. 2008. Cancer of the complete protein kinase gene family identifies diverse proliferation gene discovery through functional genomics. patterns of somatic mutations in human breast cancer. Nat Science 319: 620–624. Genet 37: 590–592. Scholl C, Frohling S, Dunn IF, Schinzel AC, Barbie DA, Kim SY, Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Silver SJ, Tamayo P, Wadlow RC, Ramaswamy S, et al. 2009. Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ, Synthetic lethal interaction between oncogenic KRAS de- et al. 2009. Complex landscapes of somatic rearrangement in pendency and STK33 suppression in human cancer cells. human breast cancer genomes. Nature 462: 1005–1010. Cell 137: 821–834. Stratton MR, Campbell PJ, Futreal PA. 2009. The cancer Scott KL, Kabbarah O, Liang MC, Ivanova E, Anagnostou V, Wu genome. Nature 458: 719–724. J, Dhakal S, Wu M, Chen S, Feinberg T, et al. 2009. GOLPH3 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, modulates mTOR signalling and rapamycin sensitivity in Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander cancer. Nature 459: 1085–1090. ES, et al. 2005. Gene set enrichment analysis: a knowledge- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, based approach for interpreting genome-wide expression Maner S, Massa H, Walker M, Chi M, et al. 2004. Large-scale profiles. Proc Natl Acad Sci 102: 15545–15550. copy number polymorphism in the human genome. Science Sunyaev S, Ramensky V, Koch I, Lathe W III, Kondrashov AS, 305: 525–528. Bork P. 2001. Prediction of deleterious human alleles. Hum Seed B, Aruffo A. 1987. Molecular cloning of the CD2 antigen, Mol Genet 10: 591–597. the T-cell erythrocyte receptor, by a rapid immunoselection Sweet-Cordero A, Mukherjee S, Subramanian A, You H, Roix JJ, procedure. Proc Natl Acad Sci 84: 3365–3369. Ladd-Acosta C, Mesirov J, Golub TR, Jacks T. 2005. An Sekine S, Ogawa R, Ito R, Hiraoka N, McManus MT, Kanai Y, oncogenic KRAS2 expression signature identified by cross- Hebrok M. 2009. Disruption of Dicer1 induces dysregulated species gene-expression analysis. Nat Genet 37: 48–55. fetal gene expression and promotes hepatocarcinogenesis. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Gastroenterology 136:2304–2315e4. Dmitrovsky E, Lander ES, Golub TR. 1999. Interpreting Shendure J, Ji H. 2008. Next-generation DNA sequencing. Nat patterns of gene expression with self-organizing maps: methods Biotechnol 26: 1135–1145. and application to hematopoietic differentiation. Proc Natl Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Acad Sci 96: 2907–2912. Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, GM. 2005. Accurate multiplex polony sequencing of an Meyerson M, Singer S, Sander C. 2008. Functional copy- evolved bacterial genome. Science 309: 1728–1732. number alterations in cancer. PLoS ONE 3: e3179. doi: Shih C, Weinberg RA. 1982. Isolation of a transforming sequence 10.1371/journal.pone.0003179. from a human bladder carcinoma cell line. Cell 29: 161–169. Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T, Yuza Y, Shimizu K, Goldfarb M, Suard Y, Perucho M, Li Y, Kamata T, Garraway LA, LaFramboise T, Lee JC, Shah K, et al. 2006. Feramisco J, Stavnezer E, Fogh J, Wigler MH. 1983. Three Sensitive mutation detection in heterogeneous cancer spec- human transforming genes are related to the viral ras imens by massively parallel picoliter reactor sequencing. Nat oncogenes. Proc Natl Acad Sci 80: 2112–2116. Med 12: 852–855. Shvarts A, Brummelkamp TR, Scheeren F, Koh E, Daley GQ, Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Spits H, Bernards R. 2002. A senescence rescue screen iden- Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, et al.

554 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

2005. Recurrent fusion of TMPRSS2 and ETS transcription Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin factor genes in prostate cancer. Science 310: 644–648. WM, Province MA, Kraja A, Johnson LA, et al. 2007. Turner NC, Lord CJ, Iorns E, Brough R, Swift S, Elliott R, Rayter Characterizing the cancer genome in lung adenocarcinoma. S, Tutt AN, Ashworth A. 2008. A synthetic lethal siRNA Nature 450: 893–898. screen identifying genes mediating sensitivity to a PARP Westbrook TF, Martin ES, Schlabach MR, Leng Y, Liang AC, inhibitor. EMBO J 27: 1368–1377. Feng B, Zhao JJ, Roberts TM, Mandel G, Hannon GJ, et al. Tusher VG, Tibshirani R, Chu G. 2001. Significance analysis of 2005. A genetic screen for candidate tumor suppressors microarrays applied to the ionizing radiation response. Proc identifies REST. Cell 121: 837–848. Natl Acad Sci 98: 5116–5621. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, Uren AG, Kool J, Matentzoglu K, de Ridder J, Mattison J, van McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. Uitert M, Lagcher W, Sie D, Tanger E, Cox T, et al. 2008. 2008. The complete genome of an individual by massively Large-scale in p19(ARF)- and p53-deficient mice parallel DNA sequencing. Nature 452: 872–876. identifies cancer genes and their collaborative networks. Cell Whitehurst AW, Bodemann BO, Cardenas J, Ferguson D, Girard 133: 727–741. L, Peyton M, Minna JD, Michnoff C, Hao W, Roth MG, et al. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil 2007. Synthetic lethal screen identification of chemosensi- DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al. tizer loci in cancer cells. Nature 446: 815–819. 2002. A gene-expression signature as a predictor of survival Whittaker S, Kirk R, Hayward R, Zambon A, Viros A, Cantarino in breast cancer. N Engl J Med 347: 1999–2009. N, Affolter A, Nourry A, Niculescu-Duvaz D, Springer C, Van Loo P, Nordgard SH, Lingjaerde OC, Russnes HG, Rye IH, et al. 2010. Gatekeeper mutations mediate resistance to Sun W, Weigman VJ, Marynen P, Zetterberg A, Naume B, BRAF-targeted therapies. Sci Transl Med 2: 35ra41. doi: et al. 2010. Allele-specific copy number analysis of tumors. 10.1126/scitranslmed.3000758. Proc Natl Acad Sci 107: 16910–16915. Wiedemeyer R, Brennan C, Heffernan TP, Xiao Y, Mahoney J, van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Protopopov A, Zheng H, Bignell G, Furnari F, Cavenee WK, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al. et al. 2008. Feedback circuit among INK4 tumor suppressors 2002. Gene expression profiling predicts clinical outcome of constrains human glioblastoma development. Cancer Cell breast cancer. Nature 415: 530–536. 13: 355–364. Varela I, Klijn C, Stephens PJ, Mudie LJ, Stebbings L, Wiedemeyer WR, Dunn IF, Quayle SN, Zhang J, Chheda MG, Galappaththige D, van der Gulden H, Schut E, Klarenbeek Dunn GP, Zhuang L, Rosenbluh J, Chen S, Xiao Y, et al. 2010. S, Campbell PJ, et al. 2010. Somatic structural rearrange- Pattern of retinoblastoma pathway inactivation dictates ments in genetically engineered mouse mammary tumors. response to CDK4/6 inhibition in GBM. Proc Natl Acad Genome Biol 11: R100. doi: 10.1186/gb-2010-11-10-r100. Sci 107: 11501–11506. Varjosalo M, Bjorklund M, Cheng F, Syvanen H, Kivioja T, Willenbrock H, Fridlyand J. 2005. A comparison study: applying Kilpinen S, Sun Z, Kallioniemi O, Stunnenberg HG, He segmentation to array CGH data for downstream analyses. WW, et al. 2008. Application of active and kinase-deficient Bioinformatics 21: 4084–4091. kinome collection for identification of kinases regulating Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, Shen hedgehog signaling. Cell 133: 537–548. D, Boca SM, Barber T, Ptak J, et al. 2007. The genomic Venkatraman ES, Olshen AB. 2007. A faster circular binary landscapes of human breast and colorectal cancers. Science segmentation algorithm for the analysis of array CGH data. 318: 1108–1113. Bioinformatics 23: 657–663. Xiong J. 2006. Essential bioinformatics. Cambridge University Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Press, New York. Smith HO, Yandell M, Evans CA, Holt RA, et al. 2001. The Zender L, Spector MS, Xue W, Flemming P, Cordon-Cardo C, sequence of the human genome. Science 291: 1304–1351. Silke J, Fan ST, Luk JM, Wigler M, Hannon GJ, et al. 2006. Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson Identification and validation of oncogenes in liver cancer MD, Miller CR, Ding L, Golub T, Mesirov JP, et al. 2010. using an integrative oncogenomic approach. Cell 125: 1253– Integrated genomic analysis identifies clinically relevant 1267. subtypes of glioblastoma characterized by abnormalities in Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17: 98–110. Minna J, Christiani D, Leo C, et al. 2004. An integrated view Vogelstein B, Kinzler KW. 1993. The multistep nature of cancer. of copy number and allelic alterations in the cancer genome Trends Genet 9: 138–141. using single nucleotide polymorphism arrays. Cancer Res 64: Voorhoeve PM, le Sage C, Schrier M, Gillis AJ, Stoop H, Nagel 3060–3071. R, Liu YP, van Duijse J, Drost J, Griekspoor A, et al. 2006. A Zhao R, Xing S, Li Z, Fu X, Li Q, Krantz SB, Zhao ZJ. 2005. genetic screen implicates miRNA-372 and miRNA-373 as Identification of an acquired JAK2 mutation in polycythemia oncogenes in testicular germ cell tumors. Cell 124: 1169– vera. J Biol Chem 280: 22788–22792. 1181. Zhu J, Sanborn JZ, Benz S, Szeto C, Hsu F, Kuhn RM, Karolchik Wang XF, Lin HY, Ng-Eaton E, Downward J, Lodish HF, Weinberg D, Archie J, Lenburg ME, Esserman LJ, et al. 2009. The RA. 1991. Expression cloning and characterization of the UCSC Cancer Genomics Browser. Nat Methods 6: 239–240. TGF-b type III receptor. Cell 67: 797–805. Zindy F, Uziel T, Ayrault O, Calabrese C, Valentine M, Rehg JE, Wang P, Kim Y, Pollack J, Narasimhan B, Tibshirani R. 2005. A Gilbertson RJ, Sherr CJ, Roussel MF. 2007. Genetic alter- method for calling gains and losses in array CGH data. ations in mouse medulloblastomas and generation of tumors Biostatistics 6: 45–58. de novo from primary cerebellar granule neuron precursors. Wang Y, Ngo VN, Marani M, Yang Y, Wright G, Staudt LM, Cancer Res 67: 2676–2684. Downward J. 2010. Critical role for transcriptional repressor Snail2 in transformation by oncogenic RAS in colorectal carcinoma cells. Oncogene 29: 4658–4670. Weir B, Zhao X, Meyerson M. 2004. Somatic alterations in the human cancer genome. Cancer Cell 6: 433–438.

GENES & DEVELOPMENT 555 Erratum Genes & Development 25: 534–555 (2011)

Making sense of cancer genomic data Lynda Chin, William C. Hahn, Gad Getz, and Matthew Meyerson

In the above-mentioned review, the authors regret that a reference (Vicent et al. 2010) was inadvertently omitted. On page 547, in the second paragraph, the statement should read as follows:

‘‘For example, several groups have identified genes that, when suppressed, lead to cell death only in the context of cells that are dependent on oncogenic KRAS. Specifically, TBK1, STK33, PLK1, WT1, and SNAIL2 have been identified as genes that are required for the survival of cells dependent on KRAS (Barbie et al. 2009; Luo et al. 2009; Scholl et al. 2009; Vicent et al. 2010; Wang et al. 2010).’’

In addition, the Reference section should include the following:

Vicent S, Chen R, Sayles LC, Lin C, Walker RG, Gillespie AK, Subramanian A, Hinkle G, Yang X, Saif S, et al. 2010. Wilms tumor 1 (WT1) regulates KRAS-driven oncogenesis and senescence in mouse and human models. J Clin Invest 120: 3940–3952.

GENES & DEVELOPMENT 26:1003 Ó 2012 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/12; www.genesdev.org 1003 Downloaded from genesdev.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Making sense of cancer genomic data

Lynda Chin, William C. Hahn, Gad Getz, et al.

Genes Dev. 2011, 25: Access the most recent version at doi:10.1101/gad.2017311

Related Content Making sense of cancer genomic data Lynda Chin, William C. Hahn, Gad Getz, et al. Genes Dev. May , 2012 26: 1003

References This article cites 220 articles, 64 of which can be accessed free at: http://genesdev.cshlp.org/content/25/6/534.full.html#ref-list-1

Articles cited in: http://genesdev.cshlp.org/content/25/6/534.full.html#related-urls

License Freely available online through the Genes & Development Open Access option.

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the top Service right corner of the article or click here.

Copyright © 2011 by Cold Spring Harbor Laboratory Press