Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Biology of Human Tumors
Clinical Cancer Research

Pan-Cancer Molecular Classes Transcending Tumor Lineage Across 32 Cancer Types, Multiple Data Platforms, and over 10,000 Cases

Fengju Chen1, Yiqun Zhang1, Don L. Gibbons2,3, Benjamin Deneen4,5,6,7, David J. Kwiatkowski8,9, Michael Ittmann10, and Chad J. Creighton1,11,12,13

1Dan L. Duncan Comprehensive Cancer Center Division of Biostatistics, Baylor College of Medicine, Houston, Texas. 2Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas. 3Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas. 4Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, Texas. 5Department of Neuroscience, Baylor College of Medicine, Houston, Texas. 6Neurological Research Institute at Texas' Children's Hospital, Baylor College of Medicine, Houston, Texas. 7Program in Developmental Biology, Baylor College of Medicine, Houston, Texas. 8The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts. 9Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts. 10Department of Pathology & Immunology, Baylor College of Medicine, Houston, Texas. 11Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas. 12Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas. 13Department of Medicine, Baylor College of Medicine, Houston, Texas.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

F. Chen and Y. Zhang are co-first authors.

Corresponding Author: Chad J. Creighton, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030. Phone: 713 798 2264; Fax: 713 798 2716; E-mail: [email protected]

doi: 10.1158/1078-0432.CCR-17-3378

©2018 American Association for Cancer Research.

Abstract

Purpose: The Cancer Genome Atlas data resources represent an opportunity to explore commonalities across cancer types involving multiple molecular levels, but tumor lineage and histology can represent a barrier in moving beyond differences related to cancer type.

Experimental Design: On the basis of gene expression data, we classified 10,224 cancers, representing 32 major types, into 10 molecular-based "classes." Molecular patterns representing tissue or histologic dominant effects were first removed computationally, with the resulting classes representing emergent themes across tumor lineages.

Results: Key differences involving mRNAs, miRNAs, proteins, and DNA methylation underscored the pan-cancer classes. One class expressing neuroendocrine and cancer-testis antigen markers represented 4% of cancers surveyed. Basal-like breast cancers segregated into an exclusive class, distinct from all other cancers. Immune checkpoint pathway markers and molecular signatures of immune infiltrates were most strongly manifested within a class representing 13% of cancers. Pathway-level differences involving hypoxia, NRF2-ARE, Wnt, and Notch were manifested in two additional classes enriched for mesenchymal markers and miR200 silencing.

Conclusions: All pan-cancer molecular classes uncovered here, with the important exception of the basal-like breast cancer class, involve a wide range of cancer types and would facilitate understanding the molecular underpinnings of cancers beyond tissue-oriented domains. Numerous biological processes associated with cancer in the laboratory setting were found here to be coordinately manifested across large subsets of human cancers. The number of cancers manifesting features of neuroendocrine tumors may be much higher than previously thought, which disease is known to occur in many different tissues. Clin Cancer Res; 24(9); 2182–93. ©2018 AACR.

Translational Relevance

Unsupervised molecular classification of tumors can reveal major subtypes existing within a given cancer type as defined by tissue of origin. Such molecular-based subtypes can reflect different pathways at work within different cancer subsets, which could have important implications for applying existing therapies or for developing new therapeutic approaches. Here we applied an alternative classification approach, in order to consolidate the individual subtypes that might be discoverable in individual cancer types into super-types or pan-cancer "classes" that transcend tissue or histology distinctions. Coordinate pathways and processes were revealed across the ten pan-cancer classes in our cohort. As reflected in these classes, the tumor microenvironment may influence cancer in different ways between distinct subsets of human tumors. Our molecular class manifesting a differential expression profile of neuroendocrine tumors would have therapeutic implications for an appreciable subset of human cancers.

Introduction

Cancer is not a single disease, and at the molecular level there is widespread heterogeneity that may be observed from patient to patient. Nevertheless, unsupervised classification of tumors on the basis of molecular profiling data can reveal major subtypes existing within a given cancer type according to tissue of origin (1, 2). Such molecular-based subtypes can reflect altered pathways within different cancer subsets, which could have important implications for applying existing therapies or for developing new therapeutic approaches (3). The Cancer Genome Atlas (TCGA), a large-scale effort to comprehensively characterize over 10,000 human cancers at the molecular level, provides a common platform for the study of diverse cancer types, with multiple levels of data including mRNA, miRNA, protein, DNA methylation, copy number, and mutation (2). For most cancer types represented in TCGA, an individual study of the molecular landscape of that cancer type was carried out (2).

With data generation completed, there is opportunity for systematic analyses of the entire TCGA pan-cancer cohort (4), including defining molecular subtypes and associated pathways relevant to multiple cancer types. One challenge, in the identification of cancer subsets transcending the tissue of origin, is that widespread molecular patterns are associated with tumor lineage and histology (5–7). Although TCGA datasets have been harmonized to allow for cross-cancer type comparisons (which is useful for addressing a host of questions), an alternative approach to molecular classification on the basis of these data would be to first computationally subtract the molecular differences between cancer types (8, 9). This alternative approach would have the effect of consolidating the individual subtypes that might be discoverable in individual cancer types into super-types or pan-cancer "classes" that transcend tissue or histology distinctions.

The aim of this study was to define pan-cancer molecular-based subtypes or "classes," which would transcend tumor lineage across the over 10,000 human cancers and 32 cancer types profiled by TCGA, using data from multiple molecular profiling platforms to characterize these classes in terms of associated pathways.

...anoma. Cancer molecular profiling data were generated through informed consent as part of previously published studies and analyzed in accordance with each original study's data use guidelines and restrictions.

For the mRNA platform, log2-transformed expression values within each cancer type (as defined by TCGA project) were normalized to standard deviations from the median. By k-means clustering method (using the "kmeans" R function, with 10 clusters and 1000 maximum iterations and 25 random sets), cancer cases were subtyped, based on the top 2,000 features with the most variable expression on average by cancer type (using standard deviation of the log2-transformed expression values as a measure of variability, excluding genes on X or Y chromosomes).

We defined the top differential genes associated with each subtype, or pan-cancer "class." Taking the top 2,000 mRNA features, we first computed the two-sided t test for each gene and each class, comparing expression levels of each class with that of the rest of the tumors. We then selected the top 100 genes with the lowest P-value for each subtype; however, for the c2 class only three genes were associated, and for the c7 class only 51 genes were associated, resulting in 854 top class-specific genes in all. In a similar manner, top features associated with pan-cancer class for RPPA, miRNA, and DNA methylation platforms were identified.

The Fantom datasets of gene expression by cell type (10) were analyzed using a previously utilized approach (7). Briefly, the top 2,000 most variable mRNAs (used above for the clustering analyses) were examined in Fantom. Logged expression values for each gene in the Fantom dataset were centered on the median of sample profiles. For each Fantom differential expression profile, the inter-profile correlation (Spearman's) was taken with that of each TCGA pan-cancer differential expression profile (with genes normalized within each TCGA project to standard deviations from the median).

We examined an external gene expression