Integrative Analysis of the Cancer Transcriptome
Total Page:16
File Type:pdf, Size:1020Kb
PERSPECTIVE Integrative analysis of the cancer transcriptome Daniel R Rhodes & Arul M Chinnaiyan DNA microarrays have been widely applied to the study of making a list of differences in the parts and then focusing on a single human cancer, delineating myriad molecular subtypes of part. Opposite to this single-part approach, a new line of attack seeks cancer, many of which are associated with distinct biological to examine the cancer profile as a whole, often in the context of other underpinnings, disease progression and treatment response. cancer signatures or other types of genomic data. Such integrative These primary analyses have begun to decipher the molecular approaches are capable of simplifying complex cancer signatures heterogeneity of cancer, but integrative analyses that evaluate into coordinately regulated modules, transforming one-dimensional http://www.nature.com/naturegenetics cancer transcriptome data in the context of other data sources cancer signatures into multidimensional interaction networks and are often capable of extracting deeper biological insight from extracting regulatory mechanisms encoded in cancer gene expres- the data. Here we discuss several such integrative computational sion. Here, we review approaches that glean biological insight from and analytical approaches, including meta-analysis, functional cancer microarray data by applying integrative computational and enrichment analysis, interactome analysis, transcriptional analytical methodologies. network analysis and integrative model system analysis. Before exploring the integrative analyses carried out on cancer tran- scriptome data, it is useful to provide a brief overview of the cancer The widespread application of DNA microarrays to cancer research is profiling field. In the past few years, we have witnessed an explosion nothing less than astounding. In the short ten-year history of this ver- of cancer profiling studies. Once a technology available in only a few satile technology, hundreds of large-scale experiments have been done, laboratories, DNA microarrays now seem as pervasive as PCR. generating global quantitative profiles of gene expression in cancer. Querying the Affymetrix database of publications for all reports relat- Known types and subtypes of cancer have been readily distinguished by ing to human cancer returned 646 primary research articles, 453 of their gene-expression patterns, and more importantly, new molecular which were published in the last two years. GEO (Gene Expression 2005 Nature Publishing Group Group 2005 Nature Publishing © subtypes of cancer have been discovered that are associated with a host Omnibus), which has emerged as one of the principal repositories of tumor properties, including the propensity to metastasize and sensi- for microarray data4,5, returned 84 data sets when queried for ‘Homo tivity or resistance to particular therapies. The clinical utility of array- sapiens’ and ‘cancer.’ The Oncomine database (http://www.oncomine. based gene profiles is evidenced by recent studies showing that cancer org/)6, which includes data sets that have profiled ten or more human gene-expression signatures may affect clinical decision-making in breast tumor samples (excluding cell line studies), has catalogued 300 primary cancer and lymphoma management1,2. It may not be long before every research articles and amassed 114 data sets, totaling > 8,000 microarray human cancer sample is profiled with a gene chip to ascertain a molecu- experiments, each profiling a distinct human tissue sample. From this lar diagnosis and prognosis and to define an optimal treatment strategy. large body of cancer gene-expression research, several themes have Before this becomes a reality, however, careful validation to identify emerged. First, cancer types can be reliably distinguished from nor- optimal signatures is needed3. Apart from the impact of microarrays mal tissue of the same type based on global gene-expression patterns. on clinical decision-making, cancer microarray profiling is also poised Second, predominant clinical and pathological subtypes of cancer to advance our understanding of cancer biology and, ultimately, aid in often have distinct gene-expression profiles. Third, gene-expression the development of new and more effective therapies. signatures of primary tumors can often predict disease recurrence, The biologist’s initial instinct might be to use cancer microarray distant metastasis, survival and treatment response. Fourth, hetero- data as a prioritized list of candidate genes for experimental work-up, geneous cancers can be subclassified into molecular subtypes on the hoping to strike gold and identify a gene important in cancer among basis of gene-expression signatures7. These results were generated with a sizeable differential expression profile. Although this approach has primary analytical methods such as hierachical clustering8 and statis- identified several genes important in cancer, it is akin to asking what tically based differential expression analysis9,10, usually with careful makes an airplane different from an automobile, taking both apart, consideration for multiple-hypothesis testing11. Although these primary analyses have made great strides in deci- phering the complex molecular heterogeneity of cancer, integrative Daniel R. Rhodes is in the Departments of Pathology and Bioinformatics and the Comprehensive Cancer Center, and Arul M. Chinnaiyan is in the Departments of bioinformatics approaches that leverage multiple types of informa- Pathology, Bioinformatics and Urology and the Comprehensive Cancer Center at tion have begun to show promise in uncovering important biology the University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. not apparent from standard analysis methods (i.e., differential expres- e-mail: [email protected] sion analysis, hierarchical clustering, etc.). We highlight several such Published online 26 May 2005; doi:10.1038/ng1570 approaches, including meta-analysis for extracting robust profiles from NATURE GENETICS SUPPLEMENT| VOLUME 37 | JUNE 2005 S31 PERSPECTIVE a b Meta-analysis of cancer signatures With hundreds of cancer signatures published Cancer gene-expression profiling with DNA microarrays in the literature, several types of integrative Tissue mRNA Microarray Gene expression analysis can be done. We call analysis of mul- hybridization dataset tiple data sets ‘meta-analyses,’ similar to meta- Samples analyses done in clinical research, in which multiple studies interrogating a common Genes hypothesis are analyzed together. In the realm Lung adenocarcinoma Small cell lung cancer Hepatocellular carcinoma carcinoma Ovarian Colon carcinoma Prostate carcinoma Breast ductal carcinoma Salivary carcinoma Glioma Medulloblastoma carcinoma Pancreatic Bladder carcinoma Diffuse large B cell lymphoma Gene of microarray data, meta-analysis is compli- NME1 TOP2A ** cated by distinct experimental platforms and ARMET Data collection, E2F5 ** designs, and so gene-expression measurements Collection of cancer microarray data processing, ACLY normalization PAICS 40 independent data sets, 15 cancer types PRDX4 are not always directly comparable. Several 3,762 microarray experiments IARS p100 studies have applied meta-analysis methods 37,901,459 gene expression measurements HNRPA2B1 SOX4 ** COPB2 to cancer microarray data, both to identify SMARCA4 SSBP1 robust gene-expression signatures in a single TSTA3 SSR1 cancer type and to look for common expres- CKS2 PLK ** ILF2 sion patterns across different types of cancer. Differential expression analysis SNRPE KPNA2 We applied a summary statistic-based method Sample class Student's t-test for P values Q values NONO HSPD1 coupled with false discovery rate analysis to assignment each gene OK/SW-cl.56 CANX Mean 1 = 0.52 CCT5 compare four gene-expression studies that Mean 2 = 1.81 OGT T statistic = 2.85 HSPE1 P value = 0.02 analyzed clinically localized prostate cancer TRAF4 Genes Genes MTHFD2 relative to benign prostate tissue and identi- KDELR2 SDHC http://www.nature.com/naturegenetics E2-EPF fied a signature of genes commonly activated SNRPF G3BP in prostate cancer across all data sets, irrespec- AHCY 13 81 differential expression analyses from 34 data sets TARS tive of technological platform . After defin- PAFAH1B3 PTMA ing a robust signature for prostate cancer, we Metaprofiling RBM4 PSME2 ** TRA1 carried out pathway analysis with the KEGG Differential expression Significance threshold Identify genes most CRIP2 14 analysis selection to define signatures enriched in signatures PSMA1b** database and found that the polyamine bio- CDKN3 CDC2 ** Gene Count MRPL3 synthesis pathway was hyperactivated, consis- H2AFX 5 RFC4 PSMC4 ** Genes EZH2 5 tent with the known elevation of polyamines DVL3 BIRC5 5 .... C7orf14 in prostate cancer. Another report identified MCM3 ** FAP Compare gene enrichment Define 'metasignature' Assess robustness TGIF gene expression–based classes of breast cancer in signatures with a random if significant gene of metasignature KIAA0111 LDHA simulation enrichment by leave-one-out in one data set and then searched for the exis- IFNGR2 (mFDRMIN < 0.10) class prediction KIAA0101 tence of the classes in two independent data MMP9 ** HDAC1 sets15. Three main biological subtypes held 2005 Nature Publishing Group Group 2005 Nature Publishing CBX3 CCT4 © COL1A2 up in the two independent datasets: basal- Genes (#) Genes MRPS12 NCBP2 like, ERBB2-overexpressing and luminal- PPP2R5C 0 1 2 3 4 5 C20orf1 Signatures (#) like, reflecting