Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Biology of Human Tumors Clinical Cancer Research Pan-Cancer Molecular Classes Transcending Tumor Lineage Across 32 Cancer Types, Multiple Data Platforms, and over 10,000 Cases Fengju Chen1, Yiqun Zhang1, Don L. Gibbons2,3, Benjamin Deneen4,5,6,7, David J. Kwiatkowski8,9, Michael Ittmann10, and Chad J. Creighton1,11,12,13

Abstract

Purpose: The Cancer Genome Atlas data resources represent of immune infiltrates were most strongly manifested within a an opportunity to explore commonalities across cancer types class representing 13% of cancers. Pathway-level differences involving multiple molecular levels, but tumor lineage and involving hypoxia, NRF2-ARE, Wnt, and Notch were manifested histology can represent a barrier in moving beyond differences in two additional classes enriched for mesenchymal markers and relatedtocancertype. miR200 silencing. Experimental Design: On the basis of data, we Conclusions: All pan-cancer molecular classes uncovered classified 10,224 cancers, representing 32 major types, into 10 here, with the important exception of the basal-like breast molecular-based "classes." Molecular patterns representing tissue cancer class, involve a wide range of cancer types and would or histologic dominant effects were first removed computation- facilitate understanding the molecular underpinnings of ally, with the resulting classes representing emergent themes cancers beyond tissue-oriented domains. Numerous biolog- across tumor lineages. ical processes associated with cancer in the laboratory Results: Key differences involving mRNAs, miRNAs, , setting were found here to be coordinately manifested and DNA methylation underscored the pan-cancer classes. One across large subsets of human cancers. The number of class expressing neuroendocrine and cancer-testis antigen markers cancers manifesting features of neuroendocrine tumors may represented 4% of cancers surveyed. Basal-like breast cancers be much higher than previously thought, which disease segregated into an exclusive class, distinct from all other cancers. is known to occur in many different tissues. Clin Cancer Res; Immune checkpoint pathway markers and molecular signatures 24(9); 2182–93. 2018 AACR.

Introduction Cancer is not a single disease, and at the molecular level

1 there is widespread heterogeneity that may be observed from Dan L. Duncan Comprehensive Cancer Center Division of Biostatistics, Baylor fi 2 patient to patient. Nevertheless, unsupervised classi cation of College of Medicine, Houston, Texas. Department of Thoracic/Head and Neck fi Medical Oncology, The University of Texas MD Anderson Cancer Center, Hous- tumors on the basis of molecular pro lingdatacanreveal ton, Texas. 3Department of Molecular and Cellular Oncology, The University of major subtypes existing within a given cancer type according to Texas MD Anderson Cancer Center, Houston, Texas. 4Center for Cell and Gene tissue of origin (1, 2). Such molecular-based subtypes can Therapy, Baylor College of Medicine, Houston, Texas. 5Department of Neuro- reflect altered pathways within different cancer subsets, which 6 science, Baylor College of Medicine, Houston, Texas. Neurological Research could have important implications for applying existing ther- Institute at Texas' Children's Hospital, Baylor College of Medicine, Houston, apies or for developing new therapeutic approaches (3). The Texas. 7Program in Developmental Biology, Baylor College of Medicine, Hous- ton, Texas. 8The Eli and Edythe L. Broad Institute of Massachusetts Institute of Cancer Genome Atlas (TCGA), a large-scale effort to compre- Technology and Harvard University, Cambridge, Massachusetts. 9Brigham and hensively characterize over 10,000 human cancers at the molec- Women's Hospital and Harvard Medical School, Boston, Massachusetts. ular level, provides a common platform for the study of diverse 10Department of Pathology & Immunology, Baylor College of Medicine, Houston, cancer types, with multiple levels of data including mRNA, Texas. 11Department of Bioinformatics and Computational Biology, The Univer- 12 miRNA, , DNA methylation, copy number, and muta- sity of Texas MD Anderson Cancer Center, Houston, Texas. tion(2).FormostcancertypesrepresentedinTCGA,an Sequencing Center, Baylor College of Medicine, Houston, Texas. 13Department of Medicine, Baylor College of Medicine, Houston, Texas. individual study of the molecular landscape of that cancer type was carried out (2). Note: Supplementary data for this article are available at Clinical Cancer With data generation completed, there is opportunity for Research Online (http://clincancerres.aacrjournals.org/). systematic analyses of the entire TCGA pan-cancer cohort (4), fi F. Chen and Y. Zhang are co- rst authors. including defining molecular subtypes and associated pathways Corresponding Author: Chad J. Creighton, Baylor College of Medicine, One relevant to multiple cancer types. One challenge, in the iden- Baylor Plaza, Houston, TX 77030. Phone: 713 798 2264; Fax: 713 798 2716; E-mail: tification of cancer subsets transcending the tissue of origin, [email protected] is that widespread molecular patterns are associated with doi: 10.1158/1078-0432.CCR-17-3378 tumor lineage and histology (5–7). Although TCGA datasets 2018 American Association for Cancer Research. have been harmonized to allow for cross-cancer type

2182 Clin Cancer Res; 24(9) May 1, 2018

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

anoma. Cancer molecular profiling data were generated through Translational Relevance informed consent as part of previously published studies and Unsupervised molecular classification of tumors can reveal analyzed in accordance with each original study's data use guide- major subtypes existing within a given cancer type as defined lines and restrictions. by tissue of origin. Such molecular-based subtypes can reflect For the mRNA platform, log2-transformed expression values different pathways at work within different cancer subsets, within each cancer type (as defined by TCGA project) were which could have important implications for applying existing normalized to standard deviations from the median. By k- therapies or for developing new therapeutic approaches. Here means clustering method (using the "kmeans" R function, with we applied an alternative classification approach, in order to 10 clusters and 1000 maximum iterations and 25 random sets), consolidate the individual subtypes that might be discoverable cancer cases were subtyped, based on the top 2,000 features in individual cancer types into super-types or pan-cancer with the most variable expression on average by cancer type "classes" that transcend tissue or histology distinctions. Coor- (using standard deviation of the log2-transformed expression dinate pathways and processes were revealed across the ten values as a measure of variability, excluding genes on X or Y pan-cancer classes in our cohort. As reflected in these classes, ). the tumor microenvironment may influence cancer in differ- We defined the top differential genes associated with each ent ways between distinct subsets of human tumors. Our subtype, or pan-cancer "class." Taking the top 2,000 mRNA molecular class manifesting a differential expression profile features, we first computed the two-sided t test for each gene and of neuroendocrine tumors would have therapeutic implica- each class, comparing expression levels of each class with that of tions for an appreciable subset of human cancers. the rest of the tumors. We then selected the top 100 genes with the lowest P-value for each subtype; however, for the c2 class only three genes were associated, and for the c7 class only 51 genes were associated, resulting in 854 top class-specific genes in all. In a comparisons—which is useful for addressing a host of ques- similar manner, top features associated with pan-cancer class for fi tions—an alternative approach to molecular classification on RPPA, miRNA, and DNA methylation platforms were identi ed. the basis of these data would be to first computationally The Fantom datasets of gene expression by cell type (10) were fl subtract the molecular differences between cancer types (8, analyzed using a previously utilized approach (7). Brie y, the top 9). This alternative approach would have the effect of consol- 2,000 most variable mRNAs (used above for the clustering anal- idating the individual subtypes that might be discoverable in yses) were examined in Fantom. Logged expression values for individual cancer types into super-types or pan-cancer "classes" each gene in the fantom dataset were centered on the median of fi fi that transcend tissue or histology distinctions. sample pro les. For each fantom differential expression pro le, fi The aim of this study was to define pan-cancer molecular-based the inter-pro le correlation (Spearman's) was taken with that of fi subtypes or "classes," which would transcend tumor lineage each TCGA pan-cancer differential expression pro le (with genes across the over 10,000 human cancers and 32 cancer types profiled normalized within each TCGA project to standard deviations by TCGA, using data from multiple molecular profiling platforms from the median). fi to characterize these classes in terms of associated pathways. We examined an external gene expression pro ling dataset of multiple cancer types from the Expression Project for Oncology fi Materials and Methods (expO; GSE2109), classifying each external tumor pro le by pan- cancer class as defined by TCGA data. Within each cancer type, Results are based upon data generated by TCGA Research log2-transformed genes in the expO dataset were normalized to Network (http://cancergenome.nih.gov/). Molecular data were standard deviations from the median. As a classifier, the top set of aggregated from public repositories. Tumors spanned 32 different 854 mRNAs distinguishing between the TCGA pan-cancer classes TCGA projects, each project representing a specific cancer type, was used. For each pan-cancer class, the average value for each listed as follows: LAML, acute myeloid leukemia; ACC, adreno- gene was computed, based on the centered TCGA expression data cortical carcinoma; BLCA, bladder urothelial carcinoma; LGG, matrix. The Pearson's correlation between each expO profile and lower grade glioma; BRCA, breast invasive carcinoma; CESC, each TCGA pan-cancer class averaged profile was computed. Each cervical squamous cell carcinoma and endocervical adenocarci- expO case was assigned to TCGA pan-cancer class, based on which noma; CHOL, cholangiocarcinoma; CRC, colorectal adenocarci- class profile showed the highest correlation with the given expO noma (combining COAD and READ projects); ESCA, esophageal profile. carcinoma; GBM, glioblastoma multiforme; HNSC, head and All P values were two-sided unless otherwise specified. Scoring neck squamous cell carcinoma; KICH, chromophobe; of expression profiles for pathway-associated gene signatures was KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal carried out as previously described (9, 11). Additional methods papillary cell carcinoma; LIHC, hepatocellular carcinoma; details are provided in supplemental. Supplementary Fig. S1 LUAD, adenocarcinoma; LUSC, lung squamous cell carci- provides a schematic of the various analyses performed and their noma; DLBC, lymphoid neoplasm diffuse large B-cell lymphoma; relation to supplementary data files. MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, adenocarcinoma; SARC, Results sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach ade- Pan-cancer molecular subtypes primarily driven by tumor nocarcinoma; TGCT, testicular germ cell tumors; THYM, thy- lineage moma; THCA, thyroid carcinoma; UCS, uterine carcinosarcoma; Across cancer types, tumor lineage and squamous histology UCEC, uterine corpus endometrial carcinoma; UVM, uveal mel- were found to represent major determinates of unsupervised

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2183

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Chen et al.

molecular classification. In total, our study involved 11,232 Table 1. Tissue-independent pan-cancer molecular classes in TCGA cohort human cancer cases representing 32 different major types, for Class n (%) Description and notable features which TCGA generated data on one or more of the following c1 847 (8.3) High differential expression of oxidative molecular characterization platforms (Supplementary Data S1): phosphorylation genes, glycolysis genes, and n ¼ pentose phosphate pathway genes. whole exome sequencing (WES, 10,224 cases), somatic c2 2202 (21.5) Lack of strong associated expression patterns; can n ¼ n ¼ DNA copy by SNP array ( 10,845), RNA-seq ( 10,224), serve as a comparison group for the other classes. miRNA-seq (n ¼ 10,128), DNA methylation (n ¼ 10,959), and c3 1340 (13.1) Strong association with immune checkpoint pathway; reverse phase protein array (RPPA, n ¼ 7,663). Using established differential expression profile associated with analytical approaches (5–7), 10,662 TCGA cases (with data immune cell infiltration; mesenchymal signature; available for at least three platforms) were subtyped according NRF2/KEAP1 pathway signature; Wnt pathway signature. to each of the above data platforms excluding WES, with the c4 411 (4) Differential expression profile associated with various subtype calls for each sample then being consolidated to neuroendocrine tumors and with normal cells and define multiplatform-based molecular subtypes. tissues of the CNS; CT antigen expression. With hierarchical clustering of the platform-level subtype c5 187 (1.8) Represents basal-like ; TP53-related assignments (Supplementary Figs. S2A and S3), the cases segre- alterations, MYC amplification and expression; YAP1 gated largely according to cancer type, with higher level branches target expression; high expression of pentose phosphate and TCA cycle genes; immune checkpoint of the clustering tree representing major tissue-based categories pathway; CT antigen expression. including breast, liver, kidney, brain, gastrointestinal, squamous c6 1179 (11.5) Epithelial signature; normoxia signature; YAP1 target (head and neck, lung squamous, esophageal, bladder, cervical), expression. lung adenocarcinoma, immune-related, uterus, and skin. In this c7 1153 (11.3) Mesenchymal signature; hypoxia signature; Wnt pan-cancer molecular subtyping, basal-like BRCA (breast cancer) pathway signature; Notch pathway signature; NRF2/ was molecularly distinct from other BRCA. K-means clustering KEAP1 pathway signature; low differential expression of miR-200. was applied to the platform-level subtype assignment matrix to c8 948 (9.3) High differential expression of fatty acid metabolism fi formally de ne a discrete number of subtypes (Supplementary genes; mesenchymal signature; hypoxia signature; Figs. S2B, 4A, and 4B, and Supplementary Data S2). With a Wnt pathway signature; Notch pathway signature; 25-subtype solution, for example (Supplementary Fig. S2B), most NRF2/KEAP1 pathway signature; high differential subtypes were specific to either a single cancer type or multiple DNA methylation and low differential expression of fi types related by common tissue of origin. miR-200; differential expression pro le associated with normal cells and tissues of the CNS; immune checkpoint pathway (observed in TCGA cohort only). Pan-cancer molecular classes that transcend tumor lineage c9 732 (7.2) Wnt pathway signature; Notch pathway signature; Although the above pan-cancer molecular subtypes were found NRF2/KEAP1 pathway signature. to be highly concordant with their tissue-of-origin counterparts, c10 1217 (11.9) Immune checkpoint pathway; differential expression we went on to pursue an alternate analytical strategy to define profile associated with immune cell infiltration; YAP1 subtypes that would not be driven by tumor lineage-specific target expression. markers or patterns. For expression (mRNA, miRNA, protein) Abbreviation: CNS, central nervous system; CT, cancer-testis. and DNA methylation datasets, values were first normalized or centered within each cancer type, thereby computationally removing any tissue or histology dominant effects. For defining Figs. S5B–S5F and S6). For each class, the top genes most differ- pan-cancer subtypes using the above datasets, we had initially entially expressed in the given class versus the rest of the tumors considered multiplatform-based solutions (e.g., analyzing results were identified (Fig. 1A; Supplementary Data S2), where the top from the five data platforms together to define subtypes; ref. 9), differential patterns—involving 854 genes in all—could be but instead opted to use RNA-seq platform to define subtypes observed to span cancer type. Notably, very few genes were and then examine the other platforms for correlates of these RNA- strongly associated with the c2 class in particular, which repre- seq-based subtypes. RNA-seq data would represent the full range sented one distinguishing feature of this class as compared to the of features for cancer-relevant pathways, whereas miRNA and other classes, although this c2 class would represent a basis protein data platforms both represent expression-based data but of comparison. Normalized differential expression patterns for with a more limited number of features and represented pathways specific genes of interest—including MYC, MKI67, CTAG1B, as compared to RNA-seq data. In addition, DNA copy alterations HIF1A, CD274, VIM, and ZEB1—could further distinguish (while having a normal control) vary by cancer type (Supple- between the classes and suggested associated pathways as further mentary Fig. S3 and ref. 9), and so would serve to segregate explored below. samples by cancer type. Specific miRNAs and proteins (with values normalized within The RNA-seq platform (with values normalized within cancer cancer type) could distinguish between the pan-cancer molecular type) was used to define 10 different subtypes or "classes" of classes (Fig. 1B; Supplementary Fig. S5E and Supplementary Data cancer (Table 1) across the 10,224 cases in TCGA cohort with S2), with, for example, miR200 family members showing the available data (where with solutions of more than 10 subtypes no lowest differential expression in c8 and c7 classes, and with appreciable increase in clustering consensus was observed; Sup- immune cell protein markers LCK and SYK being highest in the plementary Fig. S5A), with the other profiling platforms then c3 class. Differential DNA methylation patterns could also dis- being used to characterize these classes in terms of associated tinguish between the subtypes (Fig. 1C; Supplementary Figs. S5B pathways as described below. These pan-cancer molecular classes, and S5F, and Supplementary Data S2), with distinctive patterns referred to here as "c1" to "c10," were each characterized by associated with c5 class in particular. For a subset of genes, widespread molecular patterns (Fig. 1A; Supplementary significant anticorrelations between methylation and expression

2184 Clin Cancer Res; 24(9) May 1, 2018 Clinical Cancer Research

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

Pan-cancer class 10,224 Human cancers

AEc1 c2 c3 c4c5 c6 c7 c8 c9 c10 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 n 136 BRCA:Basal 63 LUAD:Prox.-prolif. 854 genes 41 BLCA:c1 luminal 109 PRAD:iCluster2 85 PRAD:iCluster1 176 BRCA:LumB 113 THCA:1 104 KIRC:CC-e.1 Higher 53 KIRP:P-e.2 Lower 24 ACC:COC3 MYC 60 UCEC:CN high MKI67 65 LUSC:Classical CTAG1B 89 LUAD:TRU HIF1A 256 KIRC:CC-e.2

mRNA Differential expression Differential mRNA CD274

normalized within cancer type 134 KIRP:P-e.1a VIM 65 OV:Differentiated ZEB1 40 GBM:Classical 8 GBM:G-CIMP B Higher 76 THCA:4 Lower 85 CRC:CIN 90 UCEC:CN low miR-100 66 THCA:3 miR-199ab 139 PRAD:iCluster3 28 GBM:Neural 30 GBM:Proneural let-7c 102 SKCM:Keratin

miR Expression 70 TGCT:Non-Seminoma normalized within miR-200c 169 LGG:IDHmut-codel cancer type - 50 miRs miR-141 19 ACC:COC2 LCK 139 KIRC:CC-e.3 SYK 25 KIRP:P.CIMP-2 31 BLCA:c3 Basal ER-alpha 78 LUAD:Prox.-inflam GATA3 42 OV:Mesenchymal Claudin 7 96 CESC:C2 Fibronectin 52 GBM:Mesenchymal Collagen VI 95 LGG:IDHwt

type - 25 proteins 43 LUSC:Basal Protein expression norm. within cancer Caspace 7 86 THCA:5

476 loci 72 KIRP:P-e.1b 166 SKCM:Immune Higher 65 BRCA:Her2 Lower 65 UCEC:MSI DNA meth. DNA

Noisy 17 UCEC:POLE cancer type center within Copy alt. 33 ACC:COC1 50 THCA:2 Cancer Quiet 60 CRC:Invasive type Basal 87 HNSC:Basal Pam50 Luminal BA 249 LGG:IDHmut-non-codel % Purity 20 100 76 OV:Immunoreactive 30 STAD/ESCA:EBV 78 STAD/ESCA:MSI n 847 2,202 1,340 1,179 1,153 948 732 1217 64 TGCT:Seminoma 411 187 71 CRC:MSI/CIMP 25 BRCA:Normal 42 BLCA:c2 Lum. Immune c1: n = 845 c2: n = 2,181 c3: n = 1,337 15 BLCA:c4 Immune undiff. CDc4: n = 409 c5: n = 187 c6: n = 1,172 415 BRCA:LumA

ACC BLCA BRCA CESC CHOL CRC DLBC ESCA GBM HNSC KICH KIRC KIRP LAML LGG LIHC LUAD LUSC MESO OV PAAD PCPG PRAD SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM c7: n = 1,148 c8: n = 940 c9: n = 720 43 LUSC:Secretory c10: n = 1,213 75 HNSC:Mesenchymal c1 1 78 STAD/ESCA:GS c2 47 CESC:C1 0.9 Censored 27 LUSC:Primitive c3 0.8 c4 79 OV:Proliferative c5 0.7 49 HNSC:Classical c6 0.6 68 HNSC:Atypical c7 0.5 35 CESC:C3 c8 0.4 59 SKCM:MITF-low

Pan-cancer class 372 STAD/ESCA:CIN c9 0.3 c10 0.2 Survival probability P n 20

0 -value (-log10) P 10 79 36 48 66 87 57 80 0.1 Overall < 1E-10 Relative 408 304 623 184 161 520 533 290 173 516 371 515 501 262 178 179 497 259 469 415 150 503 120 545

1095 0 enrichment

% Representation of 0 50 100 150 200 250 0 FDR (-log10) 18 class by cancer type 8.4 <1 2 5 10 20 30 40 Overall survival (months) FDR>0.1

Figure 1. Molecular classes of TCGA cancers that transcend tumor lineage or tissue-of-origin. A, Using an alternative molecular classification approach, whereby differences between cancer types were first removed computationally prior to classification on the basis of mRNA expression data, ten major pan-cancer "classes" were identified. The first heat map shows differential mRNA expression patterns (values normalized within each main cancer type) for a set of 854 genes found to best distinguish between the ten subtypes (see Materials and Methods). The second shows differential expression patterns for a select set of genes representing pathways of particular interest. Numbers of cases (n ¼ 10,224) denote representation on RNA-seq data platform. B, Molecular features from other data platforms associating with pan-cancer molecular class. Top heat map shows differential expression patterns (values normalized within each cancer type), representing a top set of 50 miRNA features that distinguish between the ten molecular classes from part A. The second heat map shows differential protein expression patterns (by RPPA platform, values normalized within each cancer type), representing a top set of 25 features that distinguish between the 10 subtypes. The third heat map shows differential DNA methylation patterns (values centered within each cancer type) for a top set of features that distinguish a class associated with basal-like breast cancer. Additional sample-level data tracks denote levels of genome-wide copy number alteration, cancer type (according to TCGA project, color coding in part C), BRCA Pam50 subtype, and estimated tumor sample purity (ref. 47; white, 100% purity). C, The percent representations of each pan-cancer class by cancer type (according to TCGA project) are represented using a colorgram. D, Differences in patient overall survival among the pan-cancer molecular classes. P values by stratified log- test incorporating cancer type as a confounder. Overall P-value evaluates for significant differences among the groups as defined by pan-cancer class. E, Significance of overlap between the pan-cancer class assignments made in the present study (columns), with molecular-based subtype assignments (rows) made previously for a subset of cases. P values by one-sided Fisher exact test; only P values with FDR <0.1 (48) are represented. See Materials and Methods for TCGA project abbreviations. See also Supplementary Figs. S1 to S7 and Supplementary Data S1 and S2.

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2185

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Chen et al.

could be identified (Supplementary Fig. S5F and Supplementary levels of p53 transcriptional targets, and MTOR-related altera- Data S2). Within the top differentially expressed genes under- tions were associated with increased MTOR proteomic signal- scoring each pan-cancer molecular class, specific gene categories ing; Supplementary Fig. S8C). These results would demonstrate were overrepresented (Supplementary Fig. S7), including a widespread level of concordance between observed DNA- immune-related genes being most highly expressed in the c3 level alterations and global expression patterns. At the same class, neuron-related genes being highly expressed in the c4 class, time, within the somatically altered and unaltered groups for and cytoskeleton- and keratin-related genes being highly each pathway, a wide range of genes signature levels were expressed in the c9 class. The c4, c5, and c6 classes tended to evident (Supplementary Fig. S8C), suggesting that other factors show a higher degree of genome-wide copy alterations (Fig. 1B; besides somatic or copy alteration (e.g., tumor micro- Supplementary Fig. S5B). Each of the pan-cancer classes were environmental influences) may impact pathways. found to span cases from multiple cancer types (Figs. 1B and C), with the notable exception of c5, which consisted entirely of BRCA Pathway-related gene signatures and mesenchymal cells cases (n ¼ 187) and represents the basal-like breast cancer To gain insight into pathways that would distinguish between molecular subtype (ref. 12; Fig. 1B). As compared to the other the various pan-cancer molecular classes, we applied a number pan-cancer classes, the c5 class was associated with better overall of gene signatures to TCGA expression profiles with values nor- patient survival (Fig. 1D); however, compared with other breast malized within each cancer type. A number of pathways appeared cancers, c5 has a poorer survival (3). Overall, the pan-cancer more or less active for different pan-cancer classes (Fig. 2A), molecular classes showed significant concordances with other as further explored below. For example, gene signature scores molecular subtype designations, which had been previously for epithelial–mesenchymal transition (EMT), hypoxia, NRF2/ made for a subset of TCGA cases in individual cancer type studies KEAP1, Wnt, and Notch all were higher in c3, c7, and c8 classes, (refs. 6, 12–24; Fig. 1E). as compared to the rest of the tumors. Focusing here on EMT, representing mesenchymal features, we observed a strong nega- Associations involving somatic alterations tive association between the expression of miR-200 family mem- Across the entire TCGA pan-cancer cohort, assessment of genes bers and their promoter methylation levels (Fig. 2B and 2C; within pathways demonstrated a high number of somatic altera- Supplementary Fig. S12A), which demonstrates epigenetic regu- tions (mutation, copy alteration, or epigenetic silencing) involv- lation of these miRNAs across a large subset of human cancers. The ing p53 (62.7% of 10,224 cases with exome data available), miR200 family suppresses ZEB1, a key transcriptional regulator of PI3K/AKT/mTOR (44.1%), tyrosine kinase signaling EMT. Across the entire TCGA cohort, we examined differential (RTK, 43.9%), chromatin modification (40.8%), SWI/SNF com- expression values for a set of genes and proteins representing plex (32.0%), Wnt/b-catenin (22.3%), MYC (10.9%), NRF2-ARE canonical mesenchymal or epithelial markers (11), as well as for (9.3%), and Hippo signaling (4.5%; Supplementary Fig. S8A). miR200 family and ZEB1 genes (Fig. 2B). Manifestation of mes- The above pathways were found to be altered in different ways enchymal features was highest in the c7 and c8 tumors and lowest involving different genes in different cancer types (Supplementary in the c6 tumors (which appeared more epithelial), and miR200 Fig. S9A). A number of pathway-level or individual gene-level expression was lowest in c7 and c8 tumors, with associated DNA alterations surveyed were highly represented within specific can- methylation being highest in c8 tumors; c3 tumors also showed cer types or pan-cancer classes (Supplementary Figs. S8B and mesenchymal-associated patterns but less strongly than did c7 S9B). In particular, TP53 were highly represented and c8 tumors, and c9 tumors showed high expression of some within both c4 and c5 tumors, and MYC amplifications were mesenchymal and epithelial marker genes, as well as high highly represented in c5 tumors (Supplementary Figs. S8B and miR200. The c7 and c8 pan-cancer classes each involved a wide 9B). Furthermore, c10 tumors were enriched for somatic altera- range of cancer types (Fig. 2D). tions involving p53 pathway (including TP53 and ATM muta- For some tumor subsets, the observed associations with mes- tions), RTKs (MET mutations), and numerous genes involving enchymal features could conceivably be attributable to surround- chromatin modification and SWI/SNF complex (Supplementary ing stromal cells within the tumor sample, as well as to changes Figs. S8B and S9B). Some of the somatic mutation associations within the actual cancer cells. For example, invasive lobular observed might be attributable to cancer type- or mutation rate- carcinoma (ILC), the second most prevalent histologic subtype specific patterns (Supplementary Figs. S10 and S11); for example, of invasive breast cancer, is characterized by small discohesive TP53 and chromatin modifier mutations being enriched within c4 neoplastic cells invading the stroma in a single-file pattern (12). tumors would reflect in part the types of cancers more highly Immunohistochemical analysis in breast ILC has demonstrated represented within c4 (BLCA, CESC, HNSC, LUSC, etc.). that the neoplastic lobular cells tend to retain their epithelial We sought to examine the effects on pathway activation—as identity, with mesenchymal markers being expressed by the measured by mRNA or protein signature—of somatic altera- fibroblasts in the prominent stromal component of ILC (28). tions impacting specific pathways noted above. Previously- However, for other cancer types, changes within cancer cells from defined gene expression signatures for p53, k-ras, MTOR, epithelial to mesenchymal states may occur at the tumor invasive Wnt/b-catenin, MYC, NRF2, and YAP1 (6, 25–27) were applied front, although not within the main tumor bulk (29). Of our to the mRNA or protein expression profiles of the TCGA 10 pan-cancer molecular classes, four were characterized by samples, whereby each sample profile was scored for each of relatively lower sample purity: c3, c5 (representing basal-like the above pathways. For each pathway considered, relative breast cancer), c7, and c8 (Fig. 1B; Supplementary Figs. S12B and levels of the corresponding signature were significantly differ- S12C). Interestingly, of these four classes, only the c8 class was ent between somatically altered versus unaltered cases for that strongly associated with the breast ILC cases in TCGA (Supple- pathway, and the differences were in the anticipated direction mentary Fig. S12B). In contrast, renal cell carcinomas of the (e.g. p53-related alterations were associated with decreased previously described "CC-e.3" molecular subtype manifesting

2186 Clin Cancer Res; 24(9) May 1, 2018 Clinical Cancer Research

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

Gene expression 6 (norm. within cancer type) C A Lower Higher 4 c1 c2 c3 c4c5 c6 c7 c8 c9 c10 2

Fatty acid metabolism expression 0 Glycolysis/GNG Pentose phosphate -2 TCA CYCLE OX-PHOS -4 n = 8540 Ras Spearman’s r = -0.46 YAP1 MIR141/200C -6 P ~ 0

MYC normalized within cancer type EMT -8 HYPOXIA -0.8 -0.4 0 0.4 0.8 NRF2/KEAP1 cg24702147 - Differential DNA methylation WNT beta values centered within cancer type NOTCH Cancer type

10,224 Human cancers Pan-cancer c7 class - 1153 cases KIRC 78 UCEC 75 CRC 84 c1 c2 c3 c4c5 c6 c7 c8 c9 c10 D LUAD 74 7% 7% BRCA 140 B LUSC 74 6% 7% cg24702147 cg15822328 6% PCPG 20 12% THYM 13 5% MIR200A HNSC 57 MESO 12 MIR200B 4% LGG 11 BLCA 50 10% DLBC 9 MIR429 4% MIR141 CHOL 7 MIR200C THCA 50 4% KICH 7 4% SARC 24 PRAD 47 4% LAML 7 ZEB1 4% PAAD 27 TGCT 7 CDH2 CESC 44 LIHC 27 UVM 6 FN1 OV 41 KIRP 30 GBM 5 FOXC2 STAD 41 ESCA 38 UCS 5 GSC SKCM 40 ACC 3 ITGB6 MMP2 MMP3 Pan-cancer c8 class - 948 cases MMP9 STAD 85 CRC 98 SNAI1 SNAI2 HNSC 78 SOX10 9% 10% BRCA 136 BLCA 71 TWIST1 8% THYM 7 VIM 14% TGCT 5 CDH1 7% MESO 4 DSP SARC 3 OCLN 3% 6% OV 11 UVM 3 CHOL 2 Fibronectin protein LUAD 57 6% KIRP 13 LAML 17 KICH 2 E-Cadherin protein 4% ACC 1 LUSC 56 4% KIRC 17 Cancer type 4%4% PAAD 29 DLBC 1 CESC 31 LGG 1 10,224 Human cancers PRAD 40 SKCM 39 ESCA 32 UCS 1 Gene expression DNA methylation Mesenchymal marker UCEC 38 THCA 33 (norm. within cancer type) (center within cancer type) Epithelial marker LIHC 37 Lower Higher Lower Higher

Figure 2. Pathway-associated gene signatures across pan-cancer molecular classes. A, By pan-cancer molecular class, pathway-associated mRNA signatures (using values normalized within each cancer type). See Fig. 1C and D for cancer type color legend. Numbers of cases (n ¼ 10,224) denote representation on RNA-seq data platform. B, Corresponding to cases from part A, heat maps showing DNA methylation and expression levels for miR200 family members (using values normalized or centered within each cancer type). Representative DNA methylation probes (49) that map to the promoter of each miRNA cluster are shown (miR-141/200c ¼ cg24702147, miR-200a/200b/429 ¼ cg15822328). Normalized expression levels for a set of canonical epithelial or mesenchymal markers (11) are also shown. C, Scatter plot of differential methylation vs differential expression (using values normalized or centered within each cancer type), for cg24702147 versus miR-141/200c (normalized values for the two miRNAs being averaged). Numbers of cases denote representation on all three data platforms for mRNA-seq, miRNA-seq, and 450K DNA methylation. Data points are colored according to pan-cancer class, as represented in A and B. D, For c7 and c8 pan-cancer classes (associated with mesenchymal cells, along with hypoxia, NRF2/KEAP1, Wnt, and Notch signatures), distributions by cancer type. See also Supplementary Figs. S8 to S12 and Supplementary Data S3 and S4.

EMT (30) associated with the c7 class but not with the c8 class, and Immune checkpoints and neuroendocrine tumors the EMT-associated "SQ.1" molecular subtype of lung cancer (7) Analysis of gene expression patterns of normal tissues and was distributed among c3, c7, and c8 classes (Supplementary Fig. cells can provide meaningful context to the widespread differ- S12B). Three molecular subtypes previously associated with mes- ential patterns observed in cancer (7, 31). We examined the top enchymal features for specific cancer types, GBM:mesenchymal, 2,000 differential mRNAs from TCGA cohort (from Supple- OV:mesenchymal, and HNSC:mesenchymal, were each respec- mentary Fig. S5B) in a public expression dataset from the tively associated with c3, c7, and c8 pan-cancer classes (Fig. 1E). In Fantom consortium (10) of 850 profiles representing various comparison to c7, c8 was further distinguished by high expression human cell and tissue specimens. Inter-correlations between of fatty acid metabolism genes (Fig. 2A), and associated hypoxia- Fantom profiles and TCGA profiles revealed each pan-cancer related changes in c7 and c8, for example, suggest an altered molecular class to manifest distinctive patterns of global sim- microenvironment. All of this would indicate that the molecular ilarity or dissimilarity with specific categories of normal cells classes associated with lower purity would each represent distinc- and tissues (Fig. 3A; Supplementary Data S5). In particular, c3, tive biology, apart from the technical aspects involved with c5, and c10 classes were strongly associated with immune- sample collection. related cells and tissues; the c4 class was strongly associated

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2187

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Chen et al.

Gene expression (norm. within cancer type) A 850 Human cell and tissue profiles (fantom) B Lower Higher Cancer c1 c2 c3 c4c5 c6 c7 c8 c9 c10 cell line Immune CTAG1B - NY-ESO-1 CNS MAGEA4 Squamous SAGE1 fibroblast CD274 Adipose/ - PDL1 PDCD1 - PD1 CD247 - CD3 c1 PDCD1LG2 - PDL2 CTLA4 - CD152 TNFRSF9 - CD137 TNFRSF4 - CD134 TLR9

c2 LCK protein SYK protein Cancer type TEX101 CXorf61 HORMAD1 ACTL8 DMRT1 c3 CTCFL IGF2BP3 ATAD2 CASC5 c4 KIF20B KIF2C c5 CEP55

212 CT Antigen genes 212 CT NUF2 c6 TTK 10,224 Human cancers PBK OIP5

10,224 Human cancers C c1 c2 c3 c4c5 c6 c7 c8 c9 c10 c7 CHGA HSF2 SYP TPH1 NCAM1 SLC18A1 c8 ENO2 SSTR4 CDX2 SSTR5 PNMA2 SSTR1 c9 NAP1L1 RNF41 AKAP8L ZFHX3 c10 FLJ10357 SMARCD3 PHF21A 51 NET genes 51 NET GLT8D1 FZD7 Inter-profile Positive 10,224 Human cancers SPATA7 correlation (r) SSTR3 Gene expression PLD3 (norm. within cancer type) ZXDC OAZ2

Negative Lower Higher 0.00 0.03 0.07 0.10 -0.10 -0.07 -0.03

D Pan-cancer c3 class - 1,340 cases Pan-cancer c10 class - 1,217 cases Pan-cancer c4 class - 411 cases LGG 87 STAD 75 CESC 59 UCEC 79 THCA 89 CRC 71 HNSC 84 PRAD 77 BRCA 68 UCEC 88 LUSC 44 HNSC 69 6% SKCM 97 6% 7% CRC 65 6% 7% 6% 6% LUAD 66 7% LGG 117 14% 7% BRCA 106 6% BCLA 36 11% 5% ESCA 23 5% 17% 8% PAAD 20 10% MESO 20 LIHC 5 KIRC 63 5% 9% ACC 19 5% KIRP 18 MESO 5 4% LAML 18 LUSC 64 LIHC 12 GBM 4 4% 11% 7% 7% LUSC 60 4% 11% KICH 15 THYM 12 PRAD 4 UVM 15 KIRC 51 4% PCPG 11 ACC 2 4% OV 27 6% PAAD 6 LIHC 55 ESCA 12 4% UCS 10 KIRP 2 4% OV 50 KIRP 52 4% STAD 32 MESO 12 4% PAAD 25 LAML 8 STAD 26 6% 4% THCA 8 TGCT 2 4% GBM 33 THYM 12 4%4% GBM 33 UVM 8 5% 5% CRC 8 KICH 1 LUAD 49 THCA 50 BRCA 10 OV 35 TGCT 11 SKCM 34 ACC 4 LUAD 25 KIRC 1 PCPG 49 CESC 37 PRAD 46 SARC 35 DLBC 2 SARC 12 UCS 1 SARC 49 UCS 8 TGCT 46 SKCM 20 ESCA 15 HNSC 42 CHOL 6 CESC 39 CHOL 1 BLCA 42 DLBC 2 BLCA 45 KICH 1 UCEC 19

Figure 3. Normal tissue and cell-type associations with the pan-cancer molecular classes. A, Inter-profile correlations were computed between TCGA expression profiles (with values normalized within each cancer type) and profiles from the Fantom consortium expression dataset of various cell types or tissues from human specimens (n ¼ 850 profiles; ref. 10). Membership of the Fantom profiles in general categories of "cancer," "cell line," "immune" (immune cell types or blood or related tissues), "CNS" (related to central nervous system including brain), "squamous" (including bronchial, trachea, oral regions, throat and esophagus regions, nasal regions, urothelial, cervix, sebocyte, keratin/skin/epidermis), "fibroblast," or "adipocyte/heart" is indicated. Cancer-type color coding in D. B, Heat maps of differential expression (values normalized within each cancer type), for genes encoding immunotherapeutic targets (top), for LCK and SYK proteins (middle, representing markers for T and B cells, respectively), and for genes encoding cancer-testis (CT) antigens (from the CT Gene Database, http:// cancerimmunity.org/resources/ct-gene-database/). C, Heat maps of differential expression (values normalized within each cancer type), for genes encoding canonical markers of neuroendocrine tumors (top), and for a set of 51 genes in a panel of neuroendocrine tumor (NET) markers (34), as uncovered previously using gene expression profiling (bottom). D, For c3 (immune-associated), c10 (immune-associated), and c4 (CNS- and neuroendocrine-associated) pan-cancer classes, distributions by cancer type. See also Supplementary Fig. S13 and Supplementary Data S5.

withcellsandtissuesrelatedtothecentralnervoussystem blasts and other mesenchymal-related categories (reflecting the (CNS),andc8classwasassociatedwithCNStoasomewhat mesenchymal marker patterns observedabove;Fig.2B);and lesser degree; the c7 and c8 classes were associated with fibro- the c9 class was associated with adipose cells and heart tissues

2188 Clin Cancer Res; 24(9) May 1, 2018 Clinical Cancer Research

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

(which may be reflective of the association with fatty acid involved in fatty acid synthesis. However, c8 tumors appeared to metabolism observed above; Fig. 2A). Analysis of the Fantom downregulate TCA cycle, oxidative phosphorylation, and fatty mouse expression dataset yielded similar associations (Supple- acid synthesis pathways, and to upregulate genes involving in fatty mentary Fig. S13A). In previous analyses utilizing TCGA expres- acid metabolism. sion profiles as normalized across all cancers, brain cancers and Deregulated pathways were also reflective of tumor microen- blood cancers associated as a group with fantom CNS and vironmental effects at work in distinct subsets of cancers. Differ- immune-related profiles, respectively (7). ential expression patterns involving the c3, c7, or c8 pan-cancer Focusing here on the immune-related associations found for molecular classes were consistent with a consensus model (35) of specific pan-cancer molecular classes, we went on to survey tumor-associated macrophage (TAM) roles in the tumor micro- TCGA expression profilesforasetofgenesrepresentingpoten- environment (Fig. 4B), whereby monocytes are first recruited to tial targets for immunotherapy (6), including cancer-testis (CT) the tumor microenvironment (e.g., by CSF1 and CCL2). Tumor- antigen genes and genes involved in immune checkpoint path- secreted cytokines then have the potential to polarize recruited way (Fig. 3B), such as genes encoding PD1, PDL1, PDL2, and monocytes into TAMs, which play vital roles in a number of CTLA4. As a group, CT antigen genes were highest in both c4 processes including immune suppression and EMT. EMT may also and c5 classes, with a subset of CT antigen genes being partic- be induced by either hypoxia or Wnt signaling pathway (36–38), ularly higher in c5 class (see Fig. 3B). However, immune both of which appear increased in c3, c7, and c8 tumors (Figs. 2A checkpoint genes were most strongly differentially expressed and 4B); NRF2/KEAP1 and Notch pathways also appeared within the c3 class, whereas the c8 and c10 classes also showed increased in these tumors. Immune suppression may involve increased though relatively lower expression of these genes. multiple redundant immune checkpoints, which were most asso- Differential protein expression of T-cell marker LCK and B-cell ciated with c3, c5, c8, and c10 classes (Figs. 3B and 4C); these marker SYK indicated the presence of T cells in c3, c8, and c10 tumors showed higher expression of ligands such as PDL1 classes, and the presence of B cells in c3 and c10 classes and PDL2, along with higher expression of the corresponding (Fig. 3B). Analysis of gene expression signatures from Bindea receptors associated with T cells. and colleagues (32) suggested that levels of infiltrating immune cell types were highest within the c3 class, followed by the c8 TCGA patterns observable in external cohorts class, whereas the c10 class showed signatures for some but not The pan-cancer molecular class associations, as observed in all immune cell types (Supplementary Fig. S13B). TCGA datasets, were also examined in an external multicancer The above associations of c4 tumors with CNS tissues and cells expression dataset. From the expO dataset, mRNA expression suggested features of neuroendocrine tumors (NET), which profiles of 2041 cancer cases representing 26 different types arise from cells of the endocrine and nervous systems and which were obtained. Within each cancer type, expression values for occur in many different tissues throughout the human body (33). each gene in the expO dataset were first normalized, and then Genes encoding canonical markers of NETs—including CHGA each expO tumor profile was classified by pan-cancer molecular (chromogranin A), SYP (synaptophysin), NCAM1 (CD56), classasdefined by TCGA data (Fig. 5A; Supplementary Figs. ENO2 (neuron-specific enolase), and CDX2—were all differen- S14A and S14B), where the top set of 854 mRNAs distinguish- tially higher in c4 tumors as compared with the other pan-cancer ing between our 10 classes (Fig. 1B) was used as the classifier. In classes (Fig. 3C). In addition, a set of 51 genes in a panel of NET thesamemannerascarriedoutaboveforTCGAdatasets,expO markers previously uncovered using gene expression profiling was expression profiles were also scored for mRNA signatures examined (34), with the majority of these being differentially relatedtospecific pathways or normal cell types (Fig. 5B; higher within the c4 class (Fig. 3C). TCGA LUAD cases previously Supplementary Fig. S15), where similar overall trends of path- determined to represent neuroendocrine cancer were also way-level differences between classes as originally observed in significantly enriched within the c4 class (Supplementary TCGAcohortcouldalsobeobservedintheexpOcohort.In Fig. S12B, 6/14 cases, P < 1E5, one-sided Fisher exact test). The particular, c5 tumors were composed almost entirely of breast c3, c10, and c4 pan-cancer classes each involved a wide range of cancer cases; anticipated pathway-level differences involving cancer types (Fig. 3D), with c4 class in particular being composed metabolism, YAP1, MYC, EMT, hypoxia, NRF2/KEAP1, WNT, of sizable percentages of head and neck, cervical, lung, bladder, NOTCH, and immune checkpoint were observed in expO ovarian, and gastrointestinal cancers. cohort; CT antigen genes were higher as a group in c4, with the same subset also being high in c5 class as observed for Pathway-level differences across pan-cancer molecular classes TCGA; and neuroendocrine-associated global patterns and The above pathway-associated gene signatures (Fig. 2A) indi- markers were associated with c4 class. cated differential activation of specific pathways among the pan- In a similar manner to that of the expO dataset, cell line cancer molecular classes. A number of these signatures were profiles from the cancer cell line encyclopedia (CCLE) expres- related to metabolism, including fatty acid metabolism, glycol- sion dataset (39) were each assigned to a pan-cancer class ysis/gluconeogenesis, pentose phosphate pathway, TCA cycle, (Supplementary Fig. S16). Not all patterns of interest observ- and oxidative phosphorylation. The individual genes that com- able in human tumor data were as apparent in the CCLE results, prised these signatures can be represented in a pathway diagram which could be attributed to a number of factors including (Fig. 4A), whereby the c1, c6, and c8 classes were each observed to growth conditions of cell lines lacking tumor microenviron- show evidence for the altered utilization of specific metabolic mental effects and differences in the types of cancers repre- pathways, as compared to the rest of the cancer cases. Both c1 and sented in CCLE; however, a number of associations were c6 tumors showed increased expression of genes involved with identifiable in CCLE, including c7 and c8 classes with EMT oxidative phosphorylation, whereas c6 tumors also showed and hypoxia, and c4 class with neuroendocrine-related patterns higher expression of genes involved in the TCA cycle and of genes (e.g., involving small cell lung cancers).

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2189

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Chen et al.

PDK1 PDK2 Fatty acid PDP1 PDP2 CPT1A CPT1B metabolism A PDK3 PDK4 PDPR Acycarnitine Fatty Acyl CoA CPT1C PDC, Pyruvate CPT2 ACSL4 ACSL1 RNA Dehydrogenase Complex β-oxidation ACSL5 ACSL6 Pyruvate DLAT DLD Acetyl-CoA Fatty Acyl CoA Fatty acids protein PDHA1 PDHA2 PDHB ACADL FASN PC FASN OAA CS Citrate Fatty acid Malonyl-CoA synthesis ACC MDH2 ACO2 ACC Acetyl CoA Malate Isocitrate Citrate ACLY IDH3B

c1 vs. others c6 vs. others c8 vs. others FH IDH3A IDH2 Oxaloacetate TCA Cycle IDH3G MDH1 FADH2 Fumarate α-ketoglutarate Malate ME1 OGDH Figure 4. Pyruvate NADPH SDHA Succinate SUCLA2 Succinyl-CoA Differentially active pathways across pan-cancer molecular classes. Complex II SDHB SUCLG1 SUCLG2 P -value/FDR <0.01/0.01 < 0.0001/0.0001 NNT < 1E-10/1.6E-10 SDHC A, Pathway diagram representing Lower in c1/c6/c8 core metabolic pathways, with SDHD NADH Higher in c1/c6/c8 differential expression patterns Complex III Complex IV represented (using values normalized Complex I Complex V Nuclear- within cancer type), comparing Nuclear- Nuclear- encoded encoded Nuclear- tumors in pan-cancer classes c1, c6, or encoded COX encoded NADH-DH UQCR c8 with tumors in the other subunits subunits ATPA subunits subunits seven classes (red, significantly higher The electron transport chain (ETC) in c1/c6/c8). B, Diagram of tumor- associated macrophage roles in the tumor microenvironment (35), and of Notch, NRF2-ARE, and B RNA Wnt/beta-catenin pathways, with JAG1 JAG2 DLL1 DLL3 DLL4 differential expression patterns Monocyte recruitment factors protein Hypoxia NOTCH1 NOTCH2 Notch represented (using values normalized CCL2 CSF1 HIF1A Hypoxia NOTCH3 NOTCH4 within cancer type), comparing targets tumors in pan-cancer classes c3, c7, or Tumor-associated macrophage c8 with tumors in the other seven classes (red, significantly higher in c3/ CD68 EMT LYVE1 CCL18 HES1 HES2 HEY1 HEY2 DTX1 c7/c8). C, Diagram of immune ZEB1 MIR200 EMT c3 vs. others c7 vs. others c8 vs. others checkpoint pathway (featuring VIM interactions between T cells and E-cad FGF18 IL10 NRF2-ARE antigen-presenting cells, including SNAI1 tumor cells), with differential HIF1A Immune MMP9 suppression KEAP1 CUL3 expression patterns represented Tissue VEGFA PDCD1 AXIN1 (using values normalized within remodeling CTNNB1 NFE2L2 PDGFA LAG3 APC cancer type), comparing tumors in pan-cancer classes c3, c5, or c10 with Enhanced invasion

P -value/FDR tumors in the other seven classes (red, <0.01/0.01 < 0.0001/0.0001 and metastasis < 1E-10/1.3E-10 NRF2/KEAP1 Lower in c3/c7/c8 Wnt Wnt/ -catenin significantly higher in c3/c5/c10). targets β targets Higher in c3/c7/c8 P values in A–C by Mann–Whitney U test.

C RNA Tumor cell/professional APC/target cell SAGE1 CTAG1B MAGEA4 protein PDL1 PDL2 HVEM CD137L OX40L CD274 PDCD1LG2 CD80 CD86 TNFRSF14 MHC II TNFSF9 TNFSF14 CD48

TCR

c3 vs. others c5 vs. others PDCD1 CTLA4 BTLA CD4 LAG3 TNFRSF9 TNFRSF4 CD244 c10 vs. others P Lower in c3/c5/c10 Higher in c3/c5/c10 -value / FDR PD1 CD137 OX40 2B4 LCK CD247 <0.01/0.01 <0.0001/0.0001 CD3 T cell <1E-10/1.2E-10

2190 Clin Cancer Res; 24(9) May 1, 2018 Clinical Cancer Research

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

A TCGA External GSE2109 expression dataset (n = 2,041) pan-cancer c1 c2 c3 c4c5 c6 c7 c8 c9 c10 Lower (norm. within cancer type) c1 c3 Gene expression c4 c5 c6 c7 c8

Figure 5. Higher c9 Observation of patterns associated with Class-specific genes TCGA pan-cancer molecular classes in an c10 external multicancer expression profiling dataset. A, Gene expression Abdominal, Bladder, Bone & Cartilage, Brain, Breast, Cervix, Colorectal, Corpus Uteri, profiles of 2041 cancer cases of various Endometrium, Esophagus, Kidney, Liver, Lung, Omentum, Ovary, , pathologically defined cancer types, Pelvic Tissue, Prostate, Renal Pelvis, Soft Tissue, Stomach, Thyroid, Uterus, Vulva represented in the expO (GSE2109) dataset (profiles being normalized External GSE2109 expression dataset (n = 2,041) within their respective cancer type), B c1 c2 c3 c4c5 c6 c7 c8 c9 c10 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 were classified according to TCGA pan- cancer molecular class. Expression Fatty acid metabolism patterns for the top set of 854 mRNAs Glycolysis/GNG Pentose phosphate distinguishing between the 10 TCGA TCA CYCLE molecular classes (from Fig. 1A) are OX-PHOS Ras shown for both TCGA and GSE2109 YAP1 datasets. Genes in the GSE2109 sample MYC profiles sharing similarity with TCGA EMT fi HYPOXIA class-speci c signature pattern are Pathway signatures NRF2/KEAP1 highlighted. B, In the same manner as WNT carried out for TCGA datasets, expO NOTCH expression profiles were scored for CD274 - PDL1 pathway-associated gene signatures PDCD1 - PD1 CD247 (from Fig. 2A), surveyed for immune - CD3 PDCD1LG2 - PDL2 checkpoint markers and for CT antigen CTLA4 - CD152 genes (from Fig. 3B, using the TNFRSF9 - CD137 TNFRSF4 - CD134 same gene ordering), surveyed for TLR9

canonical NET markers (from Fig. 3C), Immune checkpoint TEX101 and scored for similarity to normal cell- CXorf61 HORMAD1 type categories represented in the ACTL8 fantom dataset (from Fig. 3A). Pan- DMRT1 CTCFL cancer class associations of particular IGF2BP3 interest (which tend to follow the ATAD2 CASC5 patterns first observed in TCGA cohort) KIF20B are highlighted. The purple-cyan heat KIF2C CEP55 maps off to the right denote t-statistics NUF2

for comparing the given class versus the Antigen genes 161 CT TTK PBK rest of the tumors; dark purple or cyan OIP5 corresponds approximately to P < 0.01. CHGA A and B have the same ordering of expO SYP

fi NET NCAM1 expression pro les. See also ENO2 Supplementary Figs. S14 to S16 and CDX2 Supplementary Data S6. Immune profiles CNS profiles Fibroblast profiles Adipocyte profiles Fantom Gene expression t-statistic, differential expression (norm. within cancer type) Average inter-profile correlation given class versus others Lower Higher Negative Positive Lower Higher 0.00 2.67 5.33 8.00 -8.00 -5.33 -2.67

Discussion remarkable that basal-like breast cancer forms its own molecular Although previous studies have greatly served to elucidate the class—made up only of breast cancers—entirely distinct from all molecular landscape of the individual cancer types represented in of the other cancer cases examined. Although basal-like breast TCGA, our pan-cancer molecular classes provide an excellent cancer is already understood to represent a fundamentally differ- framework for examining pathways or processes that would cut ent disease than other types of breast cancer, questions remain as across these individual types. All but one of these molecular to the origins of the observed molecular differences (40). The classes each involves a wide range of cancer types and would distinct patterns associated with basal-like breast cancer at the therefore be relevant to the study of multiple diseases. It is also epigenetic and transcriptional levels, in this study, could suggest

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2191

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Chen et al.

that this disease would have a different cell-of-origin from that of TCGA cases belonging to this "c4" class. Only through our other breast cancers. analytical approach of subtracting out tissue-level molecular The landscape of pan-cancer–associated biological processes differences were we able to uncover this neuroendocrine-associ- and pathways as uncovered here would include many that were ated pan-cancer class (where otherwise such CNS-related patterns well-established as having a functional role in the experimental would be more strongly associated with TCGA brain cancers). setting, but for which the full extent of their involvement in NETs are understood to occur throughout the body, including in human cancers may have been unclear. Processes and pathways breast, cervical, lung, pancreas, ovarian, and gastrointestinal tis- such as metabolism, immune checkpoint, hypoxia, NRF2-ARE, sues, although it is believed that the incidence of these tumors is HIPPO, Wnt, EMT, and Notch have been well characterized in relatively rare, estimated at about 1% of cancers diagnosed in the the experimental setting, using models such as cell lines and United States (44, 45). If on the order of 4% of human cancers mice (41). This study finds each of the above to involve large could be considered as NETs (or at least manifesting a molecular subsets of human cancers, as observed based on the coordinate profile associated with these tumors), this would be well outside expression patterns involving large numbers of genes. DNA the range of previous estimates, suggesting that a large proportion mutation events alone (and tumor evolution by proxy) would of NETs are not diagnosed as such, but rather are grouped in with not appear to represent the sole driver of these processes in other cancers sharing the same tissue of origin. The tumor samples cancer, where tumor microenvironment influenceslikelyplaya originally contributed to TCGA came from many different treat- major role. For example, mutations in VHL or in RTK genes may ment facilities, with no uniform practices for determining neu- accelerate processes of hypoxia or RTK signaling, respectively, roendocrine features being in place. Current strategies for iden- or microenvironmental conditions such as lack of oxygen or tifying NETs would include staging at surgery, pathological grad- availability of growth factors could conceivably achieve the ing, blood Chromogranin A (CgA) measurements, and detection same respective results. Individual molecular markers of our of circulating tumor cells, with there being a clear need for pan-cancer molecular classes that might seem most relevant additional and better biomarkers (34). The more accurate iden- from a therapeutic standpoint—including markers of processes tification of neuroendocrine-associated cancers in particular and pathways noted above—could potentially be evaluated in would have important implications regarding the treatment of the setting of patient care in future studies. this disease (46). Multiple pan-cancer classes manifested patterns that we could ascribe to the noncancer cellular component of the tumor, Disclosure of Potential Conflicts of Interest fi fi including immune in ltrates and cancer-associated broblasts. No potential conflicts of interest were disclosed. Our finding of different stroma-associated tumor subsets, each with distinctive features that would distinguish it from the other pan-cancer classes, suggests various biological roles for the stro- Authors' Contributions mal component in human cancer. Three of our pan-cancer classes Conception and design: C.J. Creighton Development of methodology: C.J. Creighton (c3, c5, and c10) showed particularly strong patterns of immune Analysis and interpretation of data (e.g., statistical analysis, biostatistics, cell infiltrates, whereas one of these (c3) showing stronger pat- computational analysis): F. Chen, Y. Zhang, B. Deneen, D.J. Kwiatkowski, terns of immune checkpoint pathway genes and of genes involved C.J. Creighton in specific metabolic pathways. Three of our pan-cancer classes Writing, review, and/or revision of the manuscript: D.L. Gibbons, (c3, c7, and c8) showed patterns of mesenchymal cells, with two D.J. Kwiatkowski, M. Ittmann, C.J. Creighton of these (c7 and c8) showing strong patterns of fibroblasts in Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.J. Creighton particular, and with one of these (c8) showing patterns associated Study supervision: C.J. Creighton with fatty acid metabolism. The tumor microenvironment— which consists of a mixture of fibroblasts, myofibroblasts, endo- Acknowledgments thelial cells, immune cells, other cells, and altered extracellular — This work was supported in part by Cancer Prevention and Research Institute matrix is understood to play an important role in the initiation of Texas (CPRIT) grants RP120713 C2 (to C.J. Creighton), RP150405 (to and progression of various cancers (42, 43). For example, EMT of D.L. Gibbons), and RP120713 P2 (to D.L. Gibbons), and by the NIH grant the cancer cells can occur in areas of fibrosis and may account for P30CA125123 (to C. Creighton). up to 40% of tumor-associated fibroblasts (43). As reflected in TCGA data, the various ways in which the tumor microenviron- fl The costs of publication of this article were defrayed in part by the ment can in uence cancer may be involved within distinct subsets payment of page charges. This article must therefore be hereby marked of human tumors. advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate Our discovery of a molecular class of cancers manifesting a this fact. differential expression profile associated with both NETs and with normal cells and tissues of the CNS would potentially have Received November 13, 2017; revised January 8, 2018; accepted February 2, implications for a large subset of human cancers, with 4% of 2018; published first February 9, 2018.

References 1. Perou C, Sørlie T, Eisen M, van de Rijn M, Jeffrey S, Rees C, et al. Molecular 3. Creighton C. The molecular profile of luminal B breast cancer. Biologics portraits of human breast tumours. Nature 2000;406:747–52. 2012;6:289–97. 2. Cancer_Genome_Atlas_Research_Network, Weinstein J, Collisson E, 4. Zhang Y, Kwok-Shing Ng P, Kucherlapati M, Chen F, Liu Y, Tsang Y, et al. Mills G, Shaw K, Ozenberger B, et al. The Cancer Genome Atlas pan- A pan-cancer proteogenomic atlas of PI3K/AKT/mTOR pathway altera- cancer analysis project. Nat Genet 2013;45:1113–20. tions. Cancer Cell 2017 [Epub ahead of print].

2192 Clin Cancer Res; 24(9) May 1, 2018 Clinical Cancer Research

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes

5. Hoadley K, Yau C, Wolf D, Cherniack A, Tamborero D, Ng S, et al. Multi- 27. Creighton C. Multiple oncogenic pathway signatures show coordinate platform analysis of 12 cancer types reveals molecular classification within expression patterns in human prostate tumors. PloS One 2008;3: and across tissues of origin. Cell 2014;158:929–44. e1816. 6. Chen F, Zhang Y, S¸ enbabaoglu Y, Ciriello G, Yang L, Reznik E, et al. 28. McCart-Reed A, Kutasovic J, Lakhani S, Simpson P. Invasive lobular Multilevel genomics-based taxonomy of renal cell carcinoma. Cell Rep carcinoma of the breast: morphology, biomarkers and 'omics'. Breast 2016;14:2476–89. Cancer Res 2015;17:12. 7. Chen F, Zhang Y, Parra E, Rodriguez J, Behrens C, Akbani R, et al. Multi- 29. Nieto M, Huang R, Jackson R, Thiery J. EMT: 2016. Cell 2016;166:21–45. platform-based molecular subtypes of non-small cell lung cancer. Onco- 30. Chang J, Wooten E, Tsimelzon A, Hilsenbeck S, Gutierrez M, Tham Y, et al. gene 2016 [Epub ahead of print]. Patterns of resistance and incomplete response to docetaxel by gene 8. Akbani R, Ng P, Werner H, Shahmoradgoli M, Zhang F, Ju Z, et al. A pan- expression profiling in breast cancer patients. J Clin Oncol 2005;23: cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun 1169–77. 2014 [Epub ahead of print]. 31. Davis C, Ricketts C, Wang M, Yang L, Cherniack A, Shen H, et al. The 9. Chen F, Zhang Y, Bosse D, Lalani A, Hakimi A, Hsieh J, et al. Pan-urologic somatic genomic landscape of chromophobe renal cell carcinoma. Cancer cancer genomic subtypes that transcend tissue of origin. Nat Commun Cell 2014;26:319–30. 2017;8:199. 32. Bindea G, Mlecnik B, Tosolini M, Kirilovsky A, Waldner M, Obenauf A, et al. 10. FANTOM_Consortium_and_the_RIKEN_PMI_and_CLST_(DGT), Forrest Spatiotemporal dynamics of intratumoral immune cells reveal A, Kawaji H, Rehli M, Baillie J, de Hoon M, et al. A promoter-level the immune landscape in human cancer. Immunity 2013;39:782–95. mammalian expression atlas. Nature 2014;507:462–70. 33. Ramage J, Ahmed A, Ardill J, Bax N, Breen D, Caplin M, et al. Guidelines for 11. Gibbons D, Creighton C. Pan-cancer survey of epithelial-mesenchymal the management of gastroenteropancreatic neuroendocrine (including transition markers across The Cancer Genome Atlas. Dev Dyn 2017 [Epub carcinoid) tumours (NETs). Gut 2012;61:6–32. ahead of print]. 34. Modlin I, Drozdov I, Kidd M. The identification of gut neuroendocrine 12. Ciriello G, Gatza M, Beck A, Wilkerson M, Rhie S, Pastore A, et al. tumor disease by multiple synchronous transcript analysis in blood. PloS Comprehensive molecular portraits of invasive lobular breast cancer. Cell One 2013;8:e63364. Cycle 2015;163:506–19. 35. Cook J, Hagemann T. Tumour-associated macrophages and cancer. Curr 13. Cancer_Genome_Atlas_Research_Network, Kandoth C, Schultz N, Cher- Opin Pharmacol 2013;13:595–601. niack A, Akbani R, Liu Y, et al. Integrated genomic characterization of 36. Tsai Y, Wu K. Hypoxia-regulated target genes implicated in tumor metas- endometrial carcinoma. Nature 2013;497:67–73. tasis. J Biomed Sci 2012;19:102. 14. Cancer_Genome_Atlas_Research_Network. The molecular taxonomy of 37. Kao S, Wu K, Lee W. Hypoxia, epithelial-mesenchymal transition, and TET- primary prostate cancer. Cell 2015;163:1011–25. mediated epigenetic changes. J Clin Med 2016;5:E24. 15. Cancer_Genome_Atlas_Research_Network. Comprehensive molecular 38. Micalizzi D, Farabaugh S, Ford H. Epithelial-mesenchymal transition in characterization of urothelial bladder carcinoma. Nature 2014;507: cancer: parallels between normal development and tumor progression. 315–22. J Mammary Gland Biol Neoplasia 2010;15:117–34. 16. Cancer_Genome_Atlas_Research_Network. Comprehensive molecular 39. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin A, Kim S, profiling of lung adenocarcinoma. Nature 2014;511:543–50. et al. The cancer cell line encyclopedia enables predictive modelling of 17. Cancer_Genome_Atlas_Research_Network. Comprehensive genomic anticancer drug sensitivity. Nature 2012;483:603–7. characterization of squamous cell lung cancers. Nature 2012;489:519–25. 40. Skibinski A, Kuperwasser C. The origin of breast tumor heterogeneity. 18. Cancer_Genome_Atlas_Research_Network. Integrated genomic analyses Oncogene 2015;34:5309–16. of ovarian carcinoma. Nature 2011;474:609–15. 41. Hanahan D, Weinberg R. Hallmarks of cancer: the next generation. Cell 19. Cancer_Genome_Atlas_Research_Network. Comprehensive genomic 2011;144:646–74. characterization defines human glioblastoma genes and core pathways. 42. Dakhova O, Ozen M, Creighton C, Li R, Ayala G, Rowley D, et al. Global Nature 2008;455:1061–8. gene expression analysis of reactive stroma in prostate cancer. Clin Cancer 20. Cancer_Genome_Atlas_Research_Network. Integrated genomic and Res 2009;15:3979–89. molecular characterization of cervical cancer. Nature 2017;543:378–84. 43. Franco O, Shaw A, Strand D, Hayward S. Cancer associated fibroblasts in 21. Cancer_Genome_Atlas_Research_Network. Integrated genomic character- cancer pathogenesis. Semin Cell Dev Biol 2010;21:33–9. ization of oesophageal carcinoma. Nature 2017;541:169–75. 44. Yao J, Hassan M, Phan A, Dagohoy C, Leary C, Mares J, et al. One hundred 22. Cancer_Genome_Atlas_Network. Genomic classification of cutaneous years after "carcinoid": epidemiology of and prognostic factors for neu- melanoma. Cell 2015;161:1681–96. roendocrine tumors in 35,825 cases in the United States. J Clin Oncol 23. Cancer_Genome_Atlas_Network. Comprehensive genomic characteriza- 2008;26:3063–72. tion of head and neck squamous cell carcinomas. Nature 2015;517: 45. Basuroy R, Srirajaskanthan R, Ramage J. Neuroendocrine tumors. Gastro- 576–82. enterol Clin North Am 2016;45:487–507. 24. Cancer_Genome_Atlas_Research_Network. Comprehensive, integrative 46. Strosberg J, El-Haddad G, Wolin E, Hendifar A, Yao J, Chasen B, et al. Phase genomic analysis of diffuse lower-grade gliomas. N Engl J Med 2015;372: 3 Trial of 177Lu-dotatate for midgut neuroendocrine tumors. N Engl J Med 2481–98. 2017;376:125–35. 25. Gingras M, Covington K, Chang D, Donehower L, Gill A, Ittmann M, et al. 47. Aran D, Sirota M, Butte A. Systematic pan-cancer analysis of tumour purity. Ampullary cancers harbor ELF3 tumor suppressor gene mutations and Nat Commun 2015;6:8971. exhibit frequent WNT dysregulation. Cell Rep 2016;14:907–19. 48. Storey JD, Tibshirani R. Statistical significance for genomewide studies. 26. Singh A, Greninger P, Rhodes D, Koopman L, Violette S, Bardeesy N, Proc Natl Acad Sci USA 2003;100:9440–5. et al. A gene expression signature associated with "K-Ras addiction" 49. Cherniack A, Shen H, Walter V, Stewart C, Murray B, Bowlby R, et al. reveals regulators of EMT and tumor cell survival. Cancer Cell 2009; Integrated molecular characterization of uterine carcinosarcoma. Cancer 15:489–500. Cell 2017;31:411–23.

www.aacrjournals.org Clin Cancer Res; 24(9) May 1, 2018 2193

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research. Published OnlineFirst February 9, 2018; DOI: 10.1158/1078-0432.CCR-17-3378

Pan-Cancer Molecular Classes Transcending Tumor Lineage Across 32 Cancer Types, Multiple Data Platforms, and over 10,000 Cases

Fengju Chen, Yiqun Zhang, Don L. Gibbons, et al.

Clin Cancer Res 2018;24:2182-2193. Published OnlineFirst February 9, 2018.

Updated version Access the most recent version of this article at: doi:10.1158/1078-0432.CCR-17-3378

Supplementary Access the most recent supplemental material at: Material http://clincancerres.aacrjournals.org/content/suppl/2018/02/08/1078-0432.CCR-17-3378.DC1

Cited articles This article cites 45 articles, 5 of which you can access for free at: http://clincancerres.aacrjournals.org/content/24/9/2182.full#ref-list-1

Citing articles This article has been cited by 5 HighWire-hosted articles. Access the articles at: http://clincancerres.aacrjournals.org/content/24/9/2182.full#related-urls

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Department at Subscriptions [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://clincancerres.aacrjournals.org/content/24/9/2182. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from clincancerres.aacrjournals.org on October 4, 2021. © 2018 American Association for Cancer Research.