www.nature.com/scientificreports

OPEN A novel approach to triple- negative breast molecular classifcation reveals a luminal Received: 11 June 2018 Accepted: 19 December 2018 immune-positive subgroup with Published: xx xx xxxx good prognoses Guillermo Prado-Vázquez1,2, Angelo Gámez-Pozo1,2, Lucía Trilla-Fuertes2, Jorge M. Arevalillo4, Andrea Zapater-Moros1,2, María Ferrer-Gómez1, Mariana Díaz-Almirón3, Rocío López-Vacas1, Hilario Navarro4, Paloma Maín 5, Jaime Feliú6,7, Pilar Zamora6, Enrique Espinosa6,7 & Juan Ángel Fresno Vara 1,7

Triple-negative breast cancer is a heterogeneous disease characterized by a lack of hormonal receptors and HER2 overexpression. It is the only breast cancer subgroup that does not beneft from targeted therapies, and its prognosis is poor. Several studies have developed specifc molecular classifcations for triple-negative breast cancer. However, these molecular subtypes have had little impact in the clinical setting. expression data and clinical information from 494 triple-negative breast tumors were obtained from public databases. First, a probabilistic graphical model approach to associate profles was performed. Then, sparse k-means was used to establish a new molecular classifcation. Results were then verifed in a second database including 153 triple-negative breast tumors treated with neoadjuvant chemotherapy. Clinical and gene expression data from 494 triple- negative breast tumors were analyzed. Tumors in the dataset were divided into four subgroups (luminal-androgen expressing, basal, claudin-low and claudin-high), using the cancer stem cell hypothesis as reference. These four subgroups were defned and characterized through hierarchical clustering and probabilistic graphical models and compared with previously defned classifcations. In addition, two subgroups related to immune activity were defned. This immune activity showed prognostic value in the whole cohort and in the luminal subgroup. The claudin-high subgroup showed poor response to neoadjuvant chemotherapy. Through a novel analytical approach we proved that there are at least two independent sources of biological information: cellular and immune. Thus, we developed two diferent and overlapping triple-negative breast cancer classifcations and showed that the luminal immune-positive subgroup had better prognoses than the luminal immune-negative. Finally, this work paves the way for using the defned classifcations as predictive features in the neoadjuvant scenario.

Breast cancer (BC) causes 450,000 deaths every year worldwide1. BC is clinically and genetically heterogeneous2, and this heterogeneity has led to subdivisions in an attempt to treat patients more efciently. Te classical cate- gorization considers the expression of hormonal receptors (estrogen receptors [ERs], and progesterone receptors

1Molecular Oncology & Pathology Lab, INGEMM, La Paz University Hospital Health Research Institute-IdiPAZ, Madrid, Spain. 2R&D department, Biomedica Molecular Medicine SL, Madrid, Spain. 3Biostatistics Unit, La Paz University Hospital Health Research Institute-IdiPAZ, Madrid, Spain. 4Department of Statistics, Operational Research and Numerical Analysis, National University of Distance Education (UNED), Madrid, Spain. 5Department of Statistics and Operations Research, Faculty of Mathematics, Complutense University of Madrid, Madrid, Spain. 6Medical Oncology Service, La Paz University Hospital Health Research Institute-IdiPAZ, Madrid, Spain. 7Biomedical Research Networking Center on Oncology-CIBERONC, ISCIII, Madrid, Spain. Correspondence and requests for materials should be addressed to J.Á.F. (email: [email protected])

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 1 www.nature.com/scientificreports/

[PRs]) and epidermal growth factor receptor 2 (HER2) expression, because this determines the possibility of treatment with hormones and anti-HER2 therapies, respectively. Triple-negative breast cancer (TNBC) is defned by a lack of ER and PR expression and a lack of HER2 overex- pression. TNBC comprises a heterogeneous group of tumors. In 2000, Perou et al. proposed a classifcation of BC based on gene expression patterns. Most triple-negative tumors are included in the so-called basal-like molecular subgroup3, although both categories have up to 30% discordance4. Several studies have developed specifc molecular classifcations for TNBC. For example, Rody et al. defned metagenes that distinguished molecular subsets within the group5. Lehmann et al. identifed seven molecular subgroups: unstable; basal-like 1; basal-like 2; immunomodulatory; mesenchymal (MES)-like; mesenchymal stem-like (MSL); and luminal (LAR)6. Te Immunomodulatory and MSL subtypes have recently been refned7. Burstein et al. applied non-negative matrix factorization and defned four subgroups: basal-like immune active; basal-like immune suppressed; mesenchymal; and luminal AR8. Other classifcations have also been proposed by Sabatier9, Prat10, Jézéquel11, and Milioli12. Despite these extensive studies, the desig- nation of TNBC molecular subtypes has had little impact in the clinical setting. Te so-called cancer stem cell hypothesis could provide a diferent way to categorize BC. It theorizes that can- cer derives from a stem cell compartment that undergoes an abnormal and poorly regulated process of organo- genesis analogous to many aspects of normal stem cells13–15. Depending on the activation point of these cancer stem cells, tumors will have varying characteristics. Poorly diferentiated breast tumors would arise from the most primitive stem cells14. Tis hypothesis contextualizes BC molecular groups1 in a development framework. Moreover, molecular characterization of the claudin (CLDN)-low subtype reveals that these tumors are signif- cantly enriched in epithelial-mesenchymal transition and stem cell-like features, while showing a low expression of luminal and proliferation-associated genes16. In the present study, we applied probabilistic graphical models to a previously published TNBC cohort5. Tis technique allows exploring the molecular information from a functional perspective. Our aim was to tackle the molecular analysis of TNBC from a broad perspective, such as the cancer stem cell hypothesis, to provide a clas- sifcation with clearer clinical implications. Methods TNBC gene expression and clinical data. Gene expression data from TNBC tumors and available clin- ical follow-up information were obtained from GSE31519. Gene expression values were magnitude normalized, 17 and then log2 was calculated. Te Limma R package was applied to avoid the batch efect. Finally, the complete dataset was mean centered. Te probe with the highest variance of each gene within all patients was selected. Te results obtained with the frst database were then applied to a second database of patients treated with neoadju- vant chemotherapy, GSE25066. GSE25066 data was magnitude normalized and log2 was calculated just as with GSE31519.

Probabilistic graphical model analysis. A probabilistic graphical model compatible with a high-dimensionality approach to associate gene expression profles, including the most variable 2000 , was performed as previously described18. Briefy, the resulting network, in which each node represents an individual gene, was split into several branches to identify functional structures within the network. Ten, we used analyses to investigate which function or functions were overrepresented in each branch, using the functional annotation chart tool provided by DAVID 6.8 beta19. We used “homo sapiens” as a background list and selected only GOTERM-DIRECT gene ontology categories and Biocarta and KEGG pathways. Functional nodes were composed of nodes presenting a gene ontology enriched category. To measure the functional activity of each functional node, the mean expression of all the genes included in one branch related to a concrete function was calculated. Diferences in functional node activity were assessed by class comparison analyses. Finally, metanodes were defned as groups of related functional nodes using nonsupervised hierarchical clustering analyses.

Sparse k-means classifcation. Sparse k-means was used to establish the optimal number of tumor groups. Tis method uses the genes included in each node and metanode, as previously described20. Briefy, classifcation consistency was tested using random forest. An analysis using the consensus clustering algorithm21 as applied to the data containing the variables that were selected by the sparse K-means method22 has provided an optimum classifcation into two subtypes in previous studies20. In order to transfer the newly defned classifcation from the main dataset to other datasets, we constructed centroids for each defned subgroup, using genes included in various metanodes.

Assignation to groups defned by other molecular classifcations. Tumors in the main dataset were assigned to a single group according to previously defned molecular classifcations: PAM50 + CLDN low was assigned using the single sample predictor10. Burstein’s four subtypes were assigned using an 80-gene signature8. Te TNBC4 type was performed in two steps: frst, Lehmann’s seven subtypes were assigned using centroids con- structed from 77 tumors included in the dataset that was previously assigned, and then Immunomodulatory and MSL groups were redefned as previously described7.

Statistical analyses and software suites. Survival curves were estimated using Kaplan–Meier analy- ses and compared with the log- test, using relapse free survival (RFS) as the end point. RFS was defned as the time between the day of surgery and the date of distant relapse or last date of follow-up. Correlations were assessed using Pearson’s r and linear regression. Diferences in functional node activity between groups were assessed by the Kruskal–Wallis test, and multiple comparisons were assessed using the Dunn’s multiple compar- isons test. Box-and-whisker plots are Tukey boxplots. All p-values were two-sided, and P < 0.05 was considered statistically signifcant. Expression data and network analyses were performed in MeV and Cytoscape sofware

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 2 www.nature.com/scientificreports/

Main Dataset Neoadjuvant dataset p-value Number of patients 494 153 Tumor Size T1 99 (20%) 9 (6%) <0.0001 >T1 276 (56%) 144 (94%) NA 119 (24%) Tumor Grade G1&2 103 (21%) 16 (10%) 0.0001 G3 280 (57%) 124 (81%) NA 111 (22%) Lymph node status N0 251 (51%) 37 (24%) <0.0001 N1 68 (14%) 116 (76%) NA 175 (35%) Adjuvant Chemotherapy No 257 (52%) Yes 71 (14%) NA 166 (34%) Pathological Response RD 95 (62%) pCR 53 (34%)

Table 1. Clinical features of the main and neoadjuvant datasets. Size data is divided into T1 (<2 cm) and >T1 (>2 cm) tumors; grade is classifed as G1&2 (well or moderately diferentiated tumors) or G3 (poorly diferentiated tumors); lymph node status represents lymph node invasion (N0: no invasion; N1: invasion or metastasis); and the adjuvant chemotherapy column comprises patients who had been treated with adjuvant chemotherapy or not. Te pathological response column stands for the response to neoadjuvant treatment (RD: residual disease; pCR: pathological complete response). Te chi-squared test confrmed that both cohorts are diferent regarding clinical parameters and treatment.

suites23. Te SPSS v16 sofware package, GraphPad Prism 5.0 and R v2.15.2 (with the Design sofware package 0.2.3) were used for all the statistical analyses. Results Gene expression and clinical data. Gene expression data and clinical information from 579 TNBC tum- ors were obtained from GSE31519. Some 85 samples were excluded because the patients had been treated with neoadjuvant chemotherapy or a diferent platform had been used. As a consequence, the data from 494 TNBC tumors from GSE31519 were used in subsequent analyses. Gene expression was normalized, the batch efect was corrected and the most variant probe was selected for each gene. Te resulting dataset, including expression val- ues from 13,146 genes will be referred to as the main dataset from now on. Gene expression data from 508 breast cancer samples treated with neoadjuvant taxane-anthracycline chemo- therapy were retrieved from GSE25066. A total 153 of these 508 samples were identifed as TNBC.

Clinical features. All available clinical features of the main dataset and the neoadjuvant dataset are presented in Table 1. Te main dataset’s population of tumors tended to be large (>T1 in 56% of the population), poorly dif- ferentiated (G3 in 57% of the samples), with no node invasion (N0 in 51% of the samples) and most of the patients were not treated with adjuvant chemotherapy (52%). Te neoadjuvant dataset’s population of tumors tended to be T2 (44%) and T3 (32%), poorly diferentiated (G3 in 81% of the samples), and N1 (46%) with 32% of the patients achieving a complete pathological response afer neoadjuvant treatment.

Molecular characterization of TNBC. A gene expression-based network, including the 2000 most variant genes in the development dataset, was constructed using a probabilistic graphical model (PGM) (Fig. 1). Te functional structure of the network was explored using gene ontology analyses, and 26 functional nodes were defned (Fig. 1 and Sup. File 1). Functional node activity was calculated and relationships between nodes were assessed using a hierarchical clustering (HCL) analysis (Sup. File 2). Functional node 1 is composed of 34 genes, including the CLDN3, CLDN4 and CLDN7 genes. On the other hand, functional nodes 15 (chemokine activity), 16 (major histocompatibility complex class II receptor activity), 17 (immune response) and 18 (antigen bind- ing) were related to various aspects of the immune response and clustered together as an “immune metanode” in the HCL analysis (Sup. File 2). Additionally, functional node 19 contained genes related to the peroxisome proliferator-activated receptor (PPAR) signaling pathway, and functional node 24 contained genes involved in the G1/S transition of mitotic cell cycle (Sup. File 1). We then used the method described by Rody et al. to assess 15 metagenes (series of genes known to be related to one specifc biological function or characteristic)5. Genes within a given metagene appeared close to each other in our network. Additionally, related metagenes, i.e., B-cell and IL-8 metagenes, also appeared close to each other (Fig. 2).

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 3 www.nature.com/scientificreports/

ST8SIA3 ADAM7

CXCL3 CPS1 MEST CEACAM8

CXCL2 CXCL5 HCG4 BST2 DHX58 RTP4 KIAA0087

CCL20 IFI27 CXCL6 HESX1 CXCL1 RETN MX1 IFIT1 PPBP IFI44L IL1RAP C14orf143 IFIH1

LOC283079 FOSL1 DENND4B CDH6 CA2 LAMP3 CA9 GAD2 ODZ1 FOXRED2 ENO2 IL8 CCL13 TNFRSF10D HYAL1 MTUS2 DPYSL4 USP49 ICOS AQP9

SLC11A1 KLRC3 VEGFA PPFIA4 MMP12 CCL18 ERAP2 KLRC1///KLRC2

ABCA8 ADH1B LOC57399 TRDV3 TRPC5 STMN3 GNLY TRD@ SYNGR3 GZMB FABP4 TNIP3 MEOX2 CAT IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV4-31///IGHV4-59///LOC100126583 CHRDL1 ADIPOQ PTH2R CD36 PRG4 SLC6A6 CD8B PPARG IGHA1///IGHG1///IGHG2///IGHG3///IGHM///LOC100126583///LOC100290036IGHA2///IGHD///IGHG1///LOC100126583///LOC100510361 CTSW XCL1///XCL2 PLIN1 RGS12 PRDM2 TRHDE Node 1 PDAP1 TDRD12 HERC2P3 THPO XYLB GYS2 FCGR1A LOC653513///PDE4DIP G6PC PDE3B XCL1 PIP PDK4 DKK4 CXCR6 IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV3-23///IGHV4-31///IGHV4-59///LOC100126583IGHM CFD AGT SOX30 HP RBP4 LOC100507804///TPSAB1 PCK1 SLC2A6 LPL LIPE MYEF2 HP///HPR C10orf116 ZNF214 NAV3 AKR1C3 TPSAB1///TPSB2 GPD1 MGC4859 TPSAB1 SORBS1 ITGA7 AKR1C1 ASPA CRIP1 GZMH Node 2 SGCG IL32 ELF4 ITGB7 AKR1C2 SOCS1 PPP1R1A AGBL3 EBI3 SLITRK5 TPSB2 PPP2R1B TIMP4 NTS MT1M IGKV3-20 ATP1A2 LRRN3 C8orf71 NUP54 LRP1B MS4A2 LEP IGHA1///IGHD///IGHG1///IGHG3///IGHM///IGHV1-69///IGHV3-23///IGHV4-31///LOC10012658KCNN3 3 PRSS2 CLDN-Enriched CPA3 CIDEA MARCO IGHG1///ZCWPW2 CA4 ADAMDEC1 IGHV3-48///IGKV3-20 SVEP1 TNMD INHBE NKG7NCF1///NCF1B///NCF1C KCND3 ACSM5 BCAN DGKB IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100291917 NAALAD2 FAM13C EDN3 SV2B EPHA5 IGHA1///IGHD///IGHG1///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100291917 Node 3 ADAM23 KHDC1L CYP27A1 OPCML GSC2 CNGB1 OGN GNAL NBLA00301 MCFD2 BCL2A1 CCL5 DCAKDTNFRSF4 STAR IGHA1///IGHA2///IGHG1///IGHG3///IGHM///IGHV3-23///IGHV4-31 PTPRN2 CCDC68 CHL1 DARC RHAG MCF2 BRDT LOC100287927 LAMC3 Extrecellular Exosome UTP14A SCN1A F2RL2 KCNJ15 CCL23 MMP3 DPT MYBPC2 DENND2A PPM1E GRIK2 JAK2 PDE3A IGHA1///IGHG1///IGHM///IGHV3-23///IGHV4-31 HTR2B SPIB IGLV4-60 NGFR KRTAP9-9 IGHM///LOC100510678 Node 4 GLRA2 CPZ///GPR78 NOL4 PAX6 MST1///MST1P2///MST1P9 MGAT3 GABBR1///UBD F13B IGHA1///IGHG1///IGHG3///IGHM///IGHV4-31///LOC100510678 MMP1 C7 HLA-DOB RGPD5///RGPD6 BIRC3 PDE5A FHL1 ESR1 TAP2 HSD17B6 ITM2A CFB IL33 IL1R2 KLHL23 Plasma Membrane TRPA1 LRRC2 ANTXR1C7orf10 HGF TNXA///TNXB MATN3 PSMB8 ABCB1///ABCB4 INHBB SEMA3D TRIM58 NTM SLC22A3 ABI3BP RGS4 IL27RA Node 5 CLEC3B DIO2 INHBA KIAA1199 IGHA1///IGHG1///IGHM NKX3-2 NR5A2 ABCB4 COMP VASH2 IGHA1///IGHA2///IGHG1///IGHG2///IGHG3///IGHM///IGHV4-31///LOC100126583///LOC100290036 MFAP4 CCL14-CCL15///CCL15 ARNTL2 CDH4 CACNG4 SALL1 IGHA1///IGHA2///IGHD///IGHG1///IGHG2///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100126583///LOC100291917 CST1 IGHM///LOC100133862 IGH@ ITGB6 ITGBL1 BHMT2 HBB GREM1 FN1 GALNT14 Binding CORIN COL11A1 IGHG1///IGHM///LOC100133862 HBA1///HBA2 NAG18 IGLV1-40 F2RL1 USP53 SLIT3 CD1E STXBP6 MMP11 CAMTA1 CDKAL1 CD1C CCL2 CORO2B SNED1 COL14A1 TEKT2 IGKV1D-8 IGLL5///IGLV2-11 IGLV2-23 HEY1 IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV4-31///LOC100133862 Node 6 GRP CD1B GLRB COL10A1 EMCN ZNF385D CACNA2D3 IGLV1-40///IGLV1-44 FBLN2 CTGF GREB1 NEDD4L C1orf21 AGXT IGLV6-57 IGLJ3 IGL@ CCL8 COL5A3 IGKC///IGKV1-5///IGKV1D-8///LOC652493///LOC652694 Integral component of Plasma Membrane CCDC102B IDO1 IGHG1///LOC100293559 EHD2 NEFM LAG3 CXCL10 MS4A3 TPPP3 IGF1 FRZB TIE1 SEC24D WNT5A PLCB4 ELN NPTX2 CD47 EFEMP1 IGLL3P IGK@///IGKC///LOC652493///LOC652694 C6orf54 NRXN2 TBX2 CCL7 NPAS3 IGHG1///IGHG2///IGHM///IGHV4-31 NLGN4X CCL14///CCL14-CCL15 ST3GAL1 IGHA1///IGHA2///LOC100126583 Node 7 F3 SEPT8 IL13RA2 LYVE1 CXCL14 GGT5 RENBP MYO1B SLC14A1 CDH12 PF4V1 LOC100127886 NPVF NR2F1 TTC22 MME PELI2 CAPN10 IGLV1-44///LOC100290481 IGHD ZBBX IGLV3-19 ERC2 KCNA2 BMP8A NEFL GIPC2 HIC1 SCN3A TNC PDPN ASPN SFRP4 ANKH DKK1 C20orf103 Extracellular Matrix PTN MYRIP EDNRB IGJ IGF2///INS-IGF2 MYB LEPREL1 ADAM12 LHX2 OLFM1 EMILIN1 HCFC1R1 IGLC7///IGLV1-44///LOC100290481 POPDC3 CYBRD1 CXCL11 CYTL1 PLN PTGFR RERGL D4S234E Node 8 EREG POSTN MAGEB1 C10orf72 IGLC7///IGLV1-44 AREG FBN1 CMAH ZEB1 DENND5B KCNN2 EVC FAM198B EPHA3 ANGPTL4 CACNA1C EMX2 PLA2G5 IGKV4-1 EDNRA AHNAK CYP2C18 OMD WISP2 IGKC NPY2R Desmosome ALOX12 PSD3 SYNE1 GALR2 CCK ODZ3 SLC16A4 IGK@///IGKC TLX3 ZNF80 PLA2G2A PCDHGB6 IGK@///IGKC///IGKV3D-15///LOC100510044 ROBO3 SGCD CACNB2 CUEDC1 CLN8 VCAN Node 9 IGFBP5 KERA RUNDC2C IGHA1///IGHD///IGHG1///IGHG2///IGHG3///IGHM///IGHV4-31///LOC100510678 GPR25 PRKG1 GYPE SERPINE1 NRP1 NDUFA4L2 VNN1 CXCL9 PCDHGA8 TNFRSF17 TREX1 C9 FMO3 MYCN GULP1 ATP2A3 MFAP5 BAI3 TMEM144 ERAP1 VNN2 IGFBP2 LOC100289109 DHRS7 Keratinization COL4A3 SP110 ISG20 SMARCA1 TNFSF11 TWSG1 MGC29506 CLIC2 FLT3LG CDH13 MYH3 IFNA8 GNRHR EIF3F MMP13 CD38 Node 10 SPON1 EPYC SCGB1A1 IRAK3 IGKV1-5 SORCS3 DOPEY2 CCDC69 PCSK5 TRPC1 TRMT1 SLAMF7 ANKRD36BP2 ARHGAP25 UTS2 DPP4 LRRC19 Binding STMN2 IL2 OBFC2A MMP9 NPPC RGS1 IFNA2 IGK@///IGKC///IGKV1-5 UTY WT1 CD163 CYorf15B FAIM3 NKTR PHACTR1 PIK3CG PDE4DIP SEMG1 DOPEY1///LOC100509911 CAPN3 Node 11 SMAD6 CLCA4 ATXN3 CRTAM HS3ST3A1 IGHG1 SI MYO5A CD3G CXCR5 TNFRSF25 POU2AF1 RAP2B NLRP1 RASGRP2 PLA2R1 LY9 CYP7A1 ACTG2 SOX3 BMP5 GPR171 P2RX5 COL17A1 ZAP70 WFDC1 IGHV5-78 Extracellular Space LAMA3 CD22 MAP4K1 PPP1R16B TPM2 DPEP2 MYLK SCN9A DPPA4 ATRNL1 GPR18 DST BLK EPHA7 DDX43 TARP///TRGC2 HEPH CR2 DLX2 SH2D1A Node 12 DCT UBASH3A DZIP1 ARL17A///ARL17B MMP10 BANK1 KCNV1 ACAP1 ACTA1 IKZF1 GAS2 SH3BGR MSTN RAB3GAP2 TTC18 GZMK PLP1 ZNF711 GSTA1 PLA2G2D CNN1 GC CD7 PLA1A TRAT1 N4BP2L1 REG1A MS4A1 TARP TUBB2B ELOVL2 AICDA LOC100132247///LOC348162///LOC613037///LOC728888///LOC729978///NPIPL2///NPIPLSIGLEC8 3 IL2RA CCND2 FAM169A GZMM Extracellular Exosome SOX11 ALDH1A1 FAM149A TMSB15A TRAF3IP3 CDKN1C PTPRCAP NUDT11 DMRT1 SELE PRKDC LGALS2 GDF1///LASS1 TCEAL2 MYH11 LMOD1 IL21R KLRC4 ZNF674 KLRB1 SAA1///SAA2 Node 13 FBXO17///SARS2 MAGED4///MAGED4B SMR3A LPHN3 C6orf103 SPANXB1///SPANXB2///SPANXF1 CCL22 STAP1 AASS CCL21 C3 CXCL13 CLSTN2 PNMAL1 KIF24 SMR3A///SMR3B MORC1 GAP43 LYPD1 SV2A SLC6A8 CCL19 CHIT1 MYOZ2 PCSK1N ANO3 PAMR1 MAP2K5 ABCG5 ORC5 Cell Adhesion SMR3B AMACR IMPG1 MGAT4C TAC1 MUC7 CDH2 GSTT2 LOC399491 HSPC159 ZNF157 GAD1 PTGDS CD52 RIBC2 PRR4 NPR2 ATP1B4 FLJ11292 CHAF1B Node 14 ZNF44 HELLS IFT74 TRO LAMB4 RAG2 AQP5 FABP7 FKBP10 PCDHA6 C10orf137 ZFX///ZFY GPSM2 RRAD TRBC1///TRBC2 C20orf195 CDC45 MYBPC1 ZNF165 TMEM100 RASAL2 RRP12 ATP11A FAH Cytosol CHRM3 PCTP EVPL EXOG IQCK PDK3 RNF32 ARSJ EDA RAD52 PRELP RFPL1 SNAP25 RAD54L C11orf67 RAC2 ODF2 NUP210 GPER PYCRL FLJ13197 ING1 MPP6 FOSB NR4A2 Node 15 TTLL4 SLC30A4 PPARGC1A BAIAP2 ADAM8 TWF1 KCNJ16 MYH1 SLC2A5 SPOCK3 JRK SP2 TPH1 ZNF771 NMU FLJ22184 DNAH3 GCAT OSGIN2 FOS BRCA2 RAD51 BST1 HIST1H4L DHFR KCNAB1 SHANK2 RASA4///RASA4P EPN3 C17orf60 CALCRL MBNL3 CDC25A DIRAS2 SLC16A3 PBRM1 CSF2RA RECQL4 SUN1 CGREF1 APOC2 KLRF1 ITGB3 Chemokine Activity ID4 HLF LOX MYLK3 PDE6C SPP1 EHBP1L1 SH3GL2 YBX1///YBX1P2 KIT POU4F1 GREM2 FAM135A PRKCB TH TUBB1 GPR19 CHAF1A CENPN FGF9 ARHGAP8///PRR5-ARHGAP8 FCGR2C DUSP1 NPR3 PRKAG2 RAC3 PODXL2 NYNRIN PLXNC1 PECAM1 ITGA4 SNAP91 ANGPT1 PRDM13 CLPB FCF1///LOC100507758 PTPN12BCAT1 HEY2 AIF1 OVOL1 GNRH1 Node 16 HSPB2 HRASLS MGC2889 FAM153A///FAM153B///LOC100507397 RNF24 GAS1 FBXO11 CDCA8 TCF21 SLCO2B1 DIAPH2 C5orf23 ARHGEF12 TREM2 FOXM1 EPCAM LOC100131298 LOC100509761 RIMBP2 CD300A ZNF549 SMPDL3B CR1 TMEM156 BUB1 ANKS1B ADAM28 RNFT2 CRYAB PTPN22 MHC Class II Receptor NDUFA2 ORC4 CLEC4A THBS1 COBL TGM2 BIN2 LRMP PTPRC MCM10 DDX11 TGFBR2 SGCB ACSM3 PAICS LARP4B CADM4 RGS13 LOXL2 BCL2L11 NPTXR ANP32E PART1 CD28 CRY2 LILRA2 EPB41L3 POU1F1 Node 17 ZC3H7B LIF BCKDHB TLR8 GSN SPTBN1USP9Y P2RY13 RAD54B ZNF208 IPW SUV39H2 HOMER3 PLK4 C14orf105 MET EZH1 ST6GALNAC2 KRT81 LILRB1 CES1 FUT9 ZNF257 LRP12 CDK5R2 FAM70A SERPINA3 SNRPN LOC100510692///NAIP KCNMA1 CP KIAA1024 PCDHGA10///PCDHGA11///PCDHGA12///PCDHGA3///PCDHGA5///PCDHGA6 CLCA3P ELK4 Inflamatory Response CDKN2A GPR161 NRAP TLK1 CDKN2B TPR PAR5 LOC644450 SIGLEC1 PTPN20A///PTPN20B NRP2 EPB41L4A KRT20 GPC1 COL6A1 TAF13 CDYL CFTR VIP XPNPEP1 CCND1 IL26 SORBS2 GRTP1 THAP9 BGN PHLDA1 FGF13 RIT2 RNFT1 Node 18 KRT23 FYB FZD8 MYH8 CPD TUBB2A///TUBB2B DHX34 GHR PLEC SENP7 BTF3P11 EPOR GALNT12 SLC13A1 MAPRE2 ITGB4 SSX4///SSX4B LDLR C1orf129 ZNF682 SSX3 TAF9B ID2B SLC27A6 ZC3H15 SLC6A2 MOSC1 RAB7A SCD LCMT2 ARTN C8orf84 PMAIP1 GDAP1 KDSR ZNF287 Ig Receptor Binding PAH CXCR2 SYK ABLIM3 CCDC144A SLC5A12 ZBTB33 GAL RYR2 SSX1 FBXO38 HHLA2 REPS1 DMP1 ZFP36L2 ASPH UBE4B PSG1 ITGA6 DDX17 PKD2L2 CX3CL1 SLC34A2 PROM1 GPR137 TFR2 UBE2D1 ABHD2 VTCN1 OPRM1 TBX1 GTPBP10 PICALM EXOC5 Node 19 FOLR1 TCF3 FOXO3///FOXO3B IBTK GPR6 IMPG2 GRK4 RARRES1 BBOX1 PTPRT UBR2 CAMK1D SLC30A10 MAZ TBL1X TM4SF1 SYCP2 PLXNB1 ZNF780A///ZNF780B TAPT1 DAZ1///DAZ2///DAZ3///DAZ4DEFB1 GPR110 INVS ARHGDIA SEPT9 LRRC37A4 PDZK1 PTPN2 C10orf92 PPAR pathway TEF SPATA1 LOC100287076 CRABP2 MGP KBTBD10 LCN2 HLA-DQA1///HLA-DQA2 TIMM17A WNT4 SLC24A3 KCNK3 EYA2 STIP1 Node 20 C21orf2 CLDN1 DR1 RRP15 BCL2L1 NLRP2 SCNN1B HOXD3 KCNQ1 SLPI AMY1A///AMY1B///AMY1C///AMY2A///AMY2B SATB2PCDH7 HLA-DRB4///LOC100509582HLA-DRA ARFGEF2 MCL1 RAD21L1 GP5 NUMA1 AP1S1 GSTM4 PSCA PLGLB1///PLGLB2 WWC1 MED18 Plasma Membrane MAGEB4 ADAM3A ST8SIA1 GPC4 CHI3L2 ITPR1 PPP1R3A HOXB6 HLA-DRB4 BRAP PDZK1IP1 TP53 TSFM B3GALT2 SLC4A7 SPTLC3 SUCLG2 STK24 PHF3 CLCN4 PCYT1A PEX6 LGSN ST6GAL1 FKBP5 WLS HSD17B2 WHSC1 MLC1 GDF15 HEXIM1B3GNTL1 ARPC4 SUSD4 TSHR ABCF2 FOXI1 PMS2L2 PML Node 21 HLA-DQB1 KCNJ13 ITGB1 ZIC3 SRD5A1 NKX2-2 MYO10 TBC1D22B C6orf155 CRISP1 NR0B1 CDH1 TYMS MTAP NFIB LRRTM4 RAB9BP1 GRM8 KCNV2 RARS2 PLD1 EPB41 SNAP23 RBPMS SLC25A37 SLC25A31 HTR7 CHD1L EFCAB2 MUC4 C1orf116 C7orf44 TRPC3 RNF6 SNAPC5 HSD11B2 IL17RC KITLG CDV3 EIF5A APOF Extracellular Matrix CD24 EHF ASB4 ST18 HLA-DQA1 RAB6A PSPH ZNF238 C18orf25 PDE9A MBD2 DUOX1 HIST1H2AG ZNF471 C14orf106 CA10 GPR56 IQCH PHLPP1 GPM6B SLC27A5 HLA-DRB1///HLA-DRB3///HLA-DRB4///HLA-DRB5///LOC100133661///LOC100294036///LOC100509582///LOC100510495///LOC100510519 TTTY2 NOVA1 SPICE1 MLLT4 STK3 SYCE1L ZNF81 EEA1 EFNA3 LPA SORT1 MLXIP BCL11A MED6 GPRC5A CBLC ACTN2 CXorf57 GM2A MPHOSPH9 PDGFRA AP3D1 HIST1H3H FRMD4A HNRNPH1 SHMT1 CRAT APBA2 TFAP2C MAU2 Node 22 S100P CCDC88A RAB25 LAMA2 ZNF224 BDNF HIST1H4J///HIST1H4HIST1H4K J HIST1H2AE HIST1H2BJ HIST1H3B PKNOX1 FXYD3 HIST1H2BH HIST1H2AM IRX4 CYP1B1 C10orf81 PACSIN3 PRSS8 MVD PPP2R3B EIF2S2 NEK1 BTC HIST1H2BC ETV4 BTBD18 SLC1A1 PLK1S1 S100A14 MED1 PITX2 HIST1H2BG TACSTD2 GPM6A LOC647070 Metabolic Pathways GABRP LOC389906PSMD11 MUC1 SNCAIP SLC4A4 HIST1H3G LOC100505960 HMGCS2 CYP2J2 PEG10 PKNOX2 C11orf80 IGFBP1 MYST3 COX11 HIST2H4A///HIST2H4B DDX3Y STC2 ADCY10 ZNF750 ARHGEF26 HIST1H2AD///HIST1H3D ALB DNAJC6 CDR1 SEMG2 Node 23 RND1 CUL3 MPZL2 HIST1H1C MMP20 TRDN CXADR CUX2 CRABP1 NEBL GRIA3 H2BFS MAK LRP2 ROPN1B SLC7A8 CHODL TNP2 NLGN1 USH2A MSMB MMP7 POU2F3 ADAM2 CLGN AQP3 STEAP4 SLURP1 KLF13 ATP6V1B1 BMP4 FUT6 AP1M2 ELF5 CHI3L1 ATP6V0A4 EPHX2 HIST1H2BO FAIM2 Nucleosome CASC5 PLA2G16 KCNJ12 SLC25A21 FAR2 SIX3 LOC202181 LIN7A LOC100510224 HIST1H4H DHRS2 KRT83 UGT8 VIPR2 GCNT2 ZNF814 CLU NFASC GPRC5C ABCC6 SULT1E1 MGAM SNPH TMOD2 OLAH SFTPD LALBA RARRES3 DIAPH3 KYNU Node 24 NUDT4///NUDT4P1 LTF HPGD TSPAN8 VGLL1 CKMT1A///CKMT1B HIST1H2BM SERHL///SERHL2 LRRC31 SLC9A2 ADH1C S100A1 NRG1 ACTL8 CYP4F8 GRB14 CSN3 CSN1S1 MYF5 MOG CALB2 DUSP7 ZCCHC11 MUC5B C4A///C4B///LOC100509001 LASS4 Cell Cycle CELSR3 ODAM APOD RFPL1S PCSK1 CD44 HOXA1 BPI LRRN2 PNPLA3 CGA MFGE8 HOXA10///HOXA9 MARK2 PCDH9 GPR64 RNF128 TRPM1 BAMBI KBTBD11 NEFH CLIC3 Node 25 HOXA10HOXA11 EDDM3A CALML5 PBX1 FAM110B CASC1 CA3 FMO5 PCP4 FAM107A NPAS2 FGFR4 MTRF1L CLDN4 ALOX15B MVK SERPINA5 HGD Cytosol BCHE PTPRZ1 NRXN1 CEACAM7 NQO1 IGF1R SRCAP SERHL2 NOL3 TF CLDN3 SLC26A3 EDN2 TRIM9 GPR143 PRMT7 DLG2 CDK10 STOM KCNK1 GATA3 ACSM1 Node 26 FLG IL20RA CLDN7 MAOA MYOM2 MRAS MYO6 PLEKHA6 GNMT LOC100289410 GPLD1 TJP3 ALDH4A1 FEZF2 S100B LOC1720 SLC9A3R2 KRT7 SCUBE2 TRPV6 MCF2L BMP7 ATP2C2 PIP ARNT2 FRK Nucleoplasm RPRM MMP15 MAP9 LMF1 LOC284649 HPD C9orf156 PCCA CPB1 NRTN ZNF135 EGR4 SSH3 TTC9 TP53TG1 PLEKHB1 BIK MFSD7 HSD3B1 ZNF667 KRT19 AKR1D1 CYP4B1 PCDH8 ANKRD7 C2orf72 GRIA1 SCG3 PTK6 DDC ARHGAP44 CA6 SULT1C2 NTRK3 PLCE1 PNPLA4 PCSK6 GABRB3 RAB3B PRKD1 NES SLC37A1 ACADL EEF1A2 ACAN NVL CTNNA2 KCNE1L SFRP1 STS FLJ11235 UGT2A3 GABBR2 ABCC2 MBNL1 SOX10 GLRA3 GAMT CTTN SLC6A20 TEX14 KCTD14 SLC4A8 KCNJ4 SYNM GRM5 CNTNAP2 KIAA1659 RGR PEG3 TLE4 EPHA4 MFI2 NPY1R AP4E1 SLC35F5 GTF2A1L///STON1-GTF2A1L SYNPO2L GRIA2 TTYH1 SPRY4 NCAN SCRG1 MAOB FGF2 FFAR2 GFAP CYP39A1 IFT122 EFHD1 UGT2B28 FGFR2 TSPAN12 FOXD1 MTMR7 HRASLS2 TFPI2 RPE65 CYP2B7P1 EXTL1 FES CYP2B6///CYP2B7P1 DLK1 PLAG1 ERO1LB PVALB IL17B MYOT SEMA3B CAPN6 CRHR1 ZNF780B SLC12A1 NTRK2 VAV3 DIO1 GP1BB///SEPT5 SLC16A1 HRK PTH FZD9 CEP135 KCNK2 SEMA6A SLCO3A1 ACE2 CTNND2 MFAP3L SCGB2A1 WIF1 RET SH3BP2 AZGP1 MCOLN3 SYT1 PNMT FERMT1 FOXC1 ELOVL4 TGFB2 PNLIPRP2DCX PTPLA COL9A3 MIA FGFR3 LARGE GRM1 MPPED2 GNAS SCGB1D2 MATN2 KRTAP1-3 WISP3 NKX2-5 IGF2BP2 KCNJ3 CEL///LOC100508206 COL2A1 COCH ADD2 CRLF1 DLGAP1 SIDT1 DSG1 LAMB3 PTGS2 CREM MAPT CAPN9 SUSD5 EPHX3 QPCT SELENBP1 KCNK12 RTEL1 SOS2 SCGB2A2 KIF1A SLCO1A2 PLA2G4A ACSBG1 PRRX2 CHST3 SMA4///SMA5 PAX3 UCHL1 TMOD1 SYCP1 DSC3 DSG3 PYGB EPO TMX4 FCN2 DLX5 PAX2 CORO1B FSD1 EIF1AY KLK7 VSNL1 SMPX FBN2 COL11A2 GJB5 LGR5 GSTT1 MNX1 CRISP2 FAM149B1 SLC27A2 SNTB1 ANXA9 FAM5C CYP26B1 PTX3 CSPG4 GJB3 ACOX2 PKP1 SERPINB5 TMEM158 ZNF407 DUSP6 RAB27B TSPAN1 PON3 CACNA1A CRISP3 HAPLN1 ADCY2 CYLC1 TGM1 FCGBP CDH19 TFPI SIX1 SCAND2 FADS1 IGF2BP3 TBX3 KLK6 PCOLCE2 TMPRSS11E PADI3 RALYL NR2E1 KIAA1644 MCF2L2 C8orf4 TOX3 COL9A1 LRFN4 CACNA1D DSC2 POF1B HR FOXG1 CDY1 SLCO4C1 BMP2 AGR2 MBL2 IRF4 RAD23B TNFRSF11B PRSS21 ZIC1 INSM1 TNNI2 SOX9 MAGEA9///MAGEA9B GYPA KLK5 BARX2 SLC38A4 CLCA2 CLDN8 ALDH1A3 TRIM36 AR KIAA1324 NMNAT2 MAGEA4 MSX2 SPDEF KLK11 PRAME PADI4 TTC12 GCM2 ALCAM PRKCA ZNF557 FOXA1 TMPRSS2 PLCH1 CSAG2///CSAG3 GPR85 KCNH2 C6orf15 ABCA12 GABRA2 MLPH TFF3 ABCC3 COL4A5 DUSP4 FABP6 LGALS7///LGALS7B MAGEA1 MEGF9 DNALI1 RAB38 ST3GAL6 N6AMT1 KLK10 CALB1 FOXE1 OLFM4 HLCS SLC44A4 ESRRG GLDC JMJD7 UGT2B17 CEACAM6 HPX S100A2 MAGEA2///MAGEA2B TFF1 ALDOB CA12 TFAP2B REEP1 CTAG1A///CTAG1B ATP8A2 EGF AKR1B10 SEMA3E EYA1 SEMA3C PCDHB13 SOSTDC1 ABCA4 C6orf26///MSH5 SYT17 PCDHB3 KRT15 TMC5 LPAR4 NLGN4Y MAGEA12 CTAG2 DACH1 UGT1A1///UGT1A10///UGT1A3///UGT1A4///UGT1A5///UGT1A6///UGT1A7///UGT1A8///UGT1A9 PCDHB8 PTHLH TRPM4 SCNN1A CEACAM1 CYP24A1 TYRP1 KRT14 TDRD1 TM7SF2 CYB5A ZFP2 TFF2 DSC1 LY6G6E GNGT1 LOC100289775///WNT7B GABRA5 UGT1A8///UGT1A9 SLC6A15 TCN1 UGT2B4 FAT2 GABRA5///LOC100509612 MAGEA3 OCA2 SLC6A14 IL4 P2RY2 GPR37 PP14571 RYR1 LOC51152 EGFR CLDN10 MAGEA6 KRT17 ALDH3B2 KCNK15 KRT6B TNFRSF11A GAGE12C///GAGE12D///GAGE12E///GAGE12F///GAGE12G///GAGE12H///GAGE12I///GAGE2A///GAGE2C///GAGE4///GAGE5///GAGE6///GAGE7 ARSD BAGE MAGEA5 AQP4 KRT75 KRT5 GAGE1///GAGE12F///GAGE12G///GAGE12I///GAGE12J///GAGE4///GAGE5///GAGE6///GAGE7 ELAVL2 GAGE3 CCNA1 EN1 ANKRD1 PLCB1 FGB ANXA8///ANXA8L1///ANXA8L2 WNT6 GAGE1///GAGE12C///GAGE12D///GAGE12E///GAGE12F///GAGE12G///GAGE12H///GAGE12I///GAGE12J///GAGE2A///GAGE2C///GAGE2D///GAGE2E///GAGGPC5 E4///GAGE5///GAGE6///GAGE7///GAGE8 CAMP HTR2C MSLN OBP2A///OBP2B GPR172B PDLIM4 SIM1 FGG PCYT1B FHOD3 CCDC70 RBP1 GAGE12F///GAGE12G///GAGE12I///GAGE5///GAGE7 TNIK WNT5B MAGEA11 GABRE COBLL1 IL12RB2 MYBL1 DTNA OBP2A RPL5///SNORA66 GAGE1///GAGE12F///GAGE12G///GAGE12I///GAGE12J///GAGE2A///GAGE2B///GAGE2C///GAGE2D///GAGE2E///GAGE3///GAGE4///GAGE5///GAGE6///GAGE7///GAGE8 SEZ6L2 ACSL6 ORM1 LBP LY6D GDPD5 APOBEC3B PRSS12 PITX1 QPRT ART3 CSTA CA8 BARX1 TRIM29 ORM1///ORM2 TMPRSS3 MUC16 S100A9 PIK3R4 S100A7 PGBD5 PAK7 GPR109B MAGIX NFE2 RHCG TNNT1 ADAM22 LOC100505650 KRT16 WFDC2 L1CAM PI3 KCNG1 CST6 S100A8 APLP1 HSD17B1 SLC15A1 NRG2 KRT4 EGOT ZNF334 KRT6A///KRT6B///KRT6C SCEL SOX15

CDH3 HES2 KRT6A

SFN SERPINB13 LYPD3 CELA2A///CELA2B SERPINB3 ADRA1A CHD7 KRT13 RGS20

STYK1 SERPINB7 TRGV5 SPRR1B SERPINB4 CDA GPR87 SERPINB2

SPINK5 CRCT1 SPRR1A CWH43

CYP3A5

SPRR2BSPRR3TMPRSS4

ARL14 NUP62CL

Figure 1. PGM resulting network; each functional node is encoded from 0 to 26. Each box (node) represents one gene, and lines (edges) connect genes with related expression. Functional nodes are represented by the same color, and metanodes are presented the same color palette, with basal nodes in red, luminal nodes in blue and immune nodes in green.

Functional nodes 5, 6, 7, 8 and 10 in our network had diferent gene ontologies related to an integral compo- nent of the plasma membrane, extracellular matrix, desmosomes, keratinization, and extracellular space, respec- tively. However, these fve nodes appeared to correlate in the HCL analysis (Sup. File 2) and included genes from Rody’s basal-like metagene (Fig. 2). Tus, from now on, these fve functional nodes were grouped as the basal metanode (Fig. 1). In the same way, functional nodes 0, 9, 11, 14, 22 and 23 were related to protein binding, extra- cellular exosomes, sequence-specifc DNA binding, metabolic pathways and nucleosomes, respectively, again grouped together in the HCL analysis and including genes from Rody’s apocrine/luminal metagene, so they were defned as the luminal metanode (Fig. 1).

Cellular classifcation. Te sparse k-means method was used to group samples into a limited number of clusters based on functional nodes and metanodes. Samples from the basal and luminal metanodes and the CLDN-enriched functional node were each divided into two groups. Mimicking the cancer stem cell hypothesis, we established the following workfow (Fig. 3): Samples with high luminal metanode activity were classifed as the luminal androgen receptor group (LAR). Tumors showing low luminal metanode activity and high basal metanode activity from the basal subgroup were classifed as basal. Finally, tumors with low activity in both the basal and luminal metanodes were screened for CLDN-enriched node expression. Samples showing low activity for the CLDN-enriched functional node were categorized as CLDN-low, whereas samples showing high activity for CLDN-enriched functional node were labeled as CLDN-high (Fig. 3). From the 494 samples in the main dataset, the cellular classification defined 91 (18%) LAR, 53 (11%) CLDN-low, 310 (63%) basal and 40 (8%) CLDN-high samples. Only 7 (1.5%) samples showed high activity in both the luminal and basal metanodes (Table 2). Clinical characteristics from the various entities of cellular classifcation are shown in Table 3. Basal subtype tumors were mostly small-sized, poorly diferentiated and without lymph node infltration. Te CLDN-high sub- type tumors were large, had poor diferentiation and no lymph node infltration. Te CLDN-low as well as the LAR tumors were large, more diferentiated and showed more infltration than the basal and CLDN-high tumors. Cellular classifcation does not show a signifcant relationship to RFS (Sup. File 3), nor did basal and luminal metanode activities show prognostic value. CLDN-high tumors showed a trend toward a poorer prognosis than CLDN-low, but again, the diferences were not signifcant.

Activity of functional nodes in cellular groups. Te activity of the main functional nodes was assessed in each cellular group. CLDN-low tumors had lower activity than every other tumor subgroup in the functional nodes related to alpha-amylase activity and regulation of actin , and higher activity than the other subgroups in the haptoglobin binding functional node. CLDN-high tumors had lower activity than basal tumors

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 4 www.nature.com/scientificreports/

CPS1 MEST CEACAM8 CXCL3 ST8SIA3 ADAM7 HCG4 CXCL2 BST2 DHX58 RTP4 KIAA0087 CXCL5

CCL20 IFI27 CXCL6 HESX1 CXCL1 RETN MX1 IFIT1 PPBP IFI44L IL1RAP C14orf143 IFIH1

LOC283079 FOSL1 DENND4B CDH6 LAMP3 GAD2 IL8 CCL13 TNFRSF10D HYAL1

USP49 ICOS AQP9

SLC11A1 MMP12 KLRC3 CCL18 ERAP2 KLRC1///KLRC2

LOC57399 TRDV3 TRPC5 STMN3 GNLY TRD@ SYNGR3 GZMB TNIP3 CAT

PTH2R SLC6A6 CD8B CTSW XCL1///XCL2

IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV4-31///IGHV4-59///LOC100126583 XCL1 15 DKK4 CXCR6 AGT IGHA1///IGHG1///IGHG2///IGHG3///IGHM///LOC100126583///LOC100290036IGHA2///IGHD///IGHG1///LOC100126583///LOC100510361 SLC2A6 PRDM2

FCGR1A Adipocyte G6PC

IL32 GZMH IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV3-23///IGHV4-31///IGHV4-59///LOC100126583IGHM ELF4 ITGB7 EBI3 NAV3 NUP54 ZNF214 NKG7 18 Adipocyte PRSS2 MARCO ADAMDEC1 BCAN NCF1///NCF1B///NCF1C DGKB IGKV3-20 EDN3 SV2B EPHA5 LRP1B KHDC1L CYP27A1 IGHA1///IGHD///IGHG1///IGHG3///IGHM///IGHV1-69///IGHV3-23///IGHV4-31///LOC100126583KCNN3 NBLA00301 BCL2A1 IGHG1///ZCWPW2 STAR CCL5 DCAKDTNFRSF4 IGHV3-48///IGKV3-20 RHAG IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100291917 Node 2 MCF2 FAM13C IGHA1///IGHD///IGHG1///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100291917 JAK2 MYBPC2 DENND2A OPCML GSC2 GRIK2 SPIB PDE3A

KRTAP9-9 IGHA1///IGHA2///IGHG1///IGHG3///IGHM///IGHV3-23///IGHV4-31 NOL4 GABBR1///UBD MST1///MST1P2///MST1P9 MGAT3 BRDT LOC100287927 LAMC3 Apocrine 19 HLA-DOB BIRC3 ESR1 IGHA1///IGHG1///IGHM///IGHV3-23///IGHV4-31 TAP2 IGLV4-60 PPP1R1A MT1M CFB INHBE KLHL23 IL1R2 IGHM///LOC100510678 TDRD12 HP///HPR THPO SORBS1 PSMB8 IGHA1///IGHG1///IGHG3///IGHM///IGHV4-31///LOC100510678 AGBL3 INHBB C8orf71 RGPD5///RGPD6 CIDEA CA4 HP SOX30 TNMD ASPA IL27RA LOC100507804///TPSAB1 SOCS1 Node 3 LOC653513///PDE4DIP TIMP4 LEP ATP1A2 HERC2P3 VASH2 MYEF2 PDK4 CCL14-CCL15///CCL15 ARNTL2 AKR1C3 MGC4859 TPSAB1///TPSB2 TRHDE TPSAB1 PDAP1 CA9 SEMA3D ITGA7 LPL PPARG TRIM58 AKR1C1 SGCG GALNT14 PDE3B PRG4 AKR1C2 NAG18 CD36 F2RL1 USP53 IGHA1///IGHG1///IGHM TPSB2 GPD1 CDKAL1 STXBP6 SLITRK5 RGS12 PCK1 CA2 CAMTA1 IGHA1///IGHA2///IGHG1///IGHG2///IGHG3///IGHM///IGHV4-31///LOC100126583///LOC100290036 B-Cell NTS ODZ1 CCL2 CDH4 PLIN1 ADIPOQ LRRN3 ENO2 IGHA1///IGHA2///IGHD///IGHG1///IGHG2///IGHG3///IGHM///IGHV3-48///IGHV4-31///LOC100126583///LOC100291917 MS4A2 FABP4 HEY1 GYS2 CHRDL1 IGHM///LOC100133862 IGH@ FOXRED2 MTUS2 GLRB CPA3 XYLB DPYSL4 IGHG1///IGHM///LOC100133862 SVEP1 CFD IGLV1-40 KCND3 LIPE ADH1B PPFIA4 NEDD4L C1orf21 RBP4 MEOX2 VEGFA CCL8 NAALAD2 PPP2R1B IDO1 ACSM5 IGKV1D-8 IGLL5///IGLV2-11 IGLV2-23 ADAM23 C10orf116 PPM1E ABCA8 LAG3 CXCL10 IGHA1///IGHA2///IGHD///IGHG1///IGHG3///IGHG4///IGHM///IGHV4-31///LOC100133862 CNGB1 OGN Node 4 GNAL MCFD2 CD47 NGFR CRIP1 PAX6 CCL7 IGLV1-40///IGLV1-44 PTPRN2 C7 CCDC68 CHL1 IL33 IGLV6-57 IGLJ3 IGL@ LRRC2 RENBP F2RL2 IGKC///IGKV1-5///IGKV1D-8///LOC652493///LOC652694 UTP14A SCN1A LOC100127886 IGHG1///LOC100293559 KCNJ15 CCL23 ITM2A ABCB4 MS4A3 MMP3 DPT ABI3BP HTR2B ABCB1///ABCB4 TNXA///TNXB IGLL3P IGK@///IGKC///LOC652493///LOC652694 Basal-Like HBA1///HBA2 CLEC3B IGHG1///IGHG2///IGHM///IGHV4-31 GLRA2 CPZ///GPR78 HBB IGHA1///IGHA2///LOC100126583 F13B DARC MMP1 CD1E CD1C IGHD IGLV3-19 PDE5A FHL1 CXCL11 IGLV1-44///LOC100290481 ZBBX HSD17B6 EMCN CD1B TRPA1 ANTXR1C7orf10 HGF CACNA2D3 IGJ MATN3 Node 5 CCDC102B 20 POPDC3 NTM SLC22A3 IGLC7///IGLV1-44///LOC100290481 RGS4 IGF1 21 TIE1 SEC24D DIO2 INHBA KIAA1199 NR5A2 IGLC7///IGLV1-44 COMP NKX3-2 MFAP4 CCL14///CCL14-CCL15 CXCL14 TBX2 NPAS3 CACNG4 SALL1 PCDHGB6 GGT5 DENND5B CST1 ITGBL1 BHMT2 SCN3A C20orf103 PF4V1 IGKV4-1 CLaudin-CD24 ITGB6 FN1 GYPE CYP2C18 GREM1 COL11A1 NPY2R CORIN VNN1 CXCL9 IGKC LHX2 OLFM1 ERC2 CCK HIC1 C9 FMO3 GALR2 SLIT3 MMP11 IGK@///IGKC RERGL BAI3 D4S234E ERAP1 VNN2 CORO2B SNED1 COL14A1 TEKT2 IGK@///IGKC///IGKV3D-15///LOC100510044 CACNB2 SP110 GRP CYTL1 LYVE1 IGHA1///IGHD///IGHG1///IGHG2///IGHG3///IGHM///IGHV4-31///LOC100510678 GPR25 COL10A1 CLIC2 FLT3LG TNFRSF17 CTGF ZNF385D MYH3 IFNA8 TREX1 AGXT FBLN2 GREB1 EDNRB GIPC2 17EIF3F ATP2A3 COL5A3 Node 6 EHD2 NEFM TPPP3 SCGB1A1 IRAK3 FRZB SORCS3 DOPEY2 CCDC69 ISG20 WNT5A PLCB4 ELN NPTX2 TRMT1 MGC29506 EFEMP1 DPP4 ANKRD36BP2 ARHGAP25 C6orf54 NRXN2 UTS2 LRRC19 NLGN4X ST3GAL1 OBFC2A CD38 F3 SEPT8 IL13RA2 RGS1 MYO1B SLC14A1 CDH12 IFNA2 NR2F1 CD163 UTY CYorf15B HOXA NPVF IGKV1-5 TTC22 MME PELI2 CAPN10 FAIM3 NKTR PHACTR1 PDE4DIP PIK3CG KCNA2 BMP8A NEFL SEMG1 DOPEY1///LOC100509911 SLAMF7 CAPN3 ATXN3 CRTAM TNC PDPN ASPN SFRP4 ANKH DKK1 CD3G PTN CACNA1C CXCR5 TNFRSF25 RAG2 MYRIP NLRP1 RASGRP2 IGK@///IGKC///IGKV1-5 MYB IGF2///INS-IGF2 CYP7A1 LEPREL1 EMILIN1 ADAM12 LY9 SGCD HCFC1R1 BMP5 GPR171 CYBRD1 SOX3 ZAP70 PTGFR FAH PLN IGHV5-78 EXOG EREG C10orf72 CD22 PPP1R16B AREG POSTN MAGEB1 MAP4K1 IGHG1 RNF32 Node 7 MFAP5 FBN1 DPEP2 CMAH SCN9A DPPA4 ATRNL1 GPR18 POU2AF1 RAP2B C11orf67 ZEB1 EPHA7 KCNN2 EVC EPHA3 ANGPTL4 BLK CDH13 FAM198B PLA2G5 DLX2 TARP///TRGC2 HEPH SH2D1A EMX2 CR2 P2RX5 FOSB NR4A2 EDNRA WISP2 AHNAK UBASH3A SLC30A4 OMD BANK1 ACAP1 TWF1 ALOX12 NDUFA4L2 IKZF1 PSD3 SYNE1 MSTN FOS MMP13 ODZ3 RAB3GAP2 TTC18 GZMK SLC16A4 PLA2G2D TLX3 ZNF80 CD7 PLA1A TRAT1 TNFSF11 PLA2G2A TARP Hemoglobin ROBO3 MS4A1 CUEDC1 AICDA LOC100132247///LOC348162///LOC613037///LOC728888///LOC729978///NPIPL2///NPIPL3 IL2RA ITGB3 CLN8 VCAN CCND2 LOX IGFBP5 KERA RUNDC2C PRKG1 GZMM EPYC SERPINE1 NRP1 PCDHGA8 TRAF3IPPTPRCAP3 DUSP1 PRKDC LGALS2 DMRT1 SELE KLRC4 MYCN WT1 IL21R GULP1 ZNF674 FCF1///LOC100507758 PTPN12BCAT1 TMEM144 KCNV1 KLRB1 SAA1///SAA2 IGFBP2 LOC100289109 DHRS7 MMP9 GAS1 GAP43 TAC1 COL4A3 SPANXB1///SPANXB2///SPANXF1 SIGLEC8 STAP1 SMARCA1 CCL21 C3 CXCL13 DIAPH2 TWSG1 HS3ST3A1 SPON1 KIF24 Node 8 CLSTN2 ZNF157 ZNF549 PAMR1 SLC6A8 CCL19 GNRHR ALDH1A1 MAP2K5 ORC5 CDKN1C PCSK5 ABCG5 MGAT4C STMN2 AMACR THBS1 MORC1 TGFBR2 SGCB LOXL2 GC 4 NPPC LOC399491 HSPC159 C6orf103 LMOD1 SI TRPC1 PTGDS RIBC2 CD52 FLJ11292 MYO5A ATP1B4 CHAF1B POU1F1 MMP10 GSN SPTBN1USP9Y Histone HELLS IFT74 LAMA3 LAMB4 MYH11 AASS PCDHA6 C10orf137 25 C14orf105 CNN1 CHIT1 ZFX///ZFY LRP12 EZH1 DST GPSM2 CLCA4 COL17A1 CCL22 TRBC1///TRBC2 C20orf195 CDC45 ZNF165 PCDHGA10///PCDHGA11///PCDHGA12///PCDHGA3///PCDHGA5///PCDHGA6 CLCA3P RRP12 ATP11A GPR161 ELK4 REG1A TLK1 PDK3 26 IL2 WFDC1 EDA RAD52 NRP2 RFPL1 RAD54L GPC1 COL6A1 TAF13 CDYL IL26 PHLDA1 SMAD6 MYLK RAC2 PYCRL ODF2 NUP210 BGN ING1 MPP6 RNFT1 Node 9 TTLL4 FZD8 MYH8 DHX34 ADAM8 JRK CPD TUBB2A///TUBB2B PLA2R1 MYH1 SLC2A5 SPOCK3 SENP7 BTF3P11 TPH1 ZNF771 NMU PLEC ACTA1 DNAH3 GCAT OSGIN2 SLC13A1 BST1 BRCA2 RAD51 MAPRE2 ITGB4 SSX4///SSX4B LDLR C1orf129 ZNF682 RASA4///RASA4P HIST1H4L DHFR SSX3 TAF9B ID2B C17orf60 CALCRL MBNL3 CDC25A LCMT2 SH3BGR DIRAS2 SLC16A3 APOC2 PBRM1 CSF2RA RECQL4 RAB7A SCD ACTG2 KLRF1 GDAP1 KDSR ZNF287 SPP1 SH3GL2 YBX1///YBX1P2 PAH EHBP1L1 GREM2 CCDC144A GSTA1 POU4F1 CHAF1A RYR2 SLC5A12 ZBTB33 FAM135A PRKCB TH TUBB1 GPR19 CENPN DMP1 SSX1 FBXO38 HHLA2 REPS1 IFN FCGR2C ZFP36L2 ASPH UBE4B ITGA6 DDX17 PECAM1 ITGA4 PKD2L2 TPM2 PLXNC1 SNAP91 PRDM13 CLPB GPR137 AIF1 OVOL1 OPRM1UBE2D1 TBX1 ABHD2 GTPBP10 PICALM EXOC5 FOXO3///FOXO3B GNRH1 TCF3 IMPG2 IBTK FAM153A///FAM153B///LOC100507397 RNF24 FBXO11 TCF21 SLCO2B1 CDCA8 UBR2 SLC30A10 MAZ TBL1X TREM2 FOXM1 ZNF780A///ZNF780B SP2 LOC100131298 PLXNB1 PRKAG2 CD300A TMEM156 BUB1 LRRC37A4 3 ADAM28 RNFT2 ARHGDIA SEPT9 PDZK1 PTPN22 C10orf92 ANGPT1 CLEC4A SUN1 BAIAP2 TGM2 BIN2 PTPRC MCM10 DDX11 PAICS LARP4B IL-8 ANO3 NPR2 GPER RASAL2 RGS13 MED18 EVPL LRMP GRM8 KCNJ16 TYMS C5orf23 FGF9 FLJ22184 SNAP25 ANP32E BCL2L11 NPR3 LOC100509761 P2RY13 WHSC1 GSTT2 VTCN1 IQCK CD28 EPB41L3 CR1 CRY2 AP1S1 STIP1 FKBP10 CGREF1 BCKDHB TLR8 LPHN3 PCTP KCNAB1 CRYAB ARHGEF12 EPN3 HSPB2 KRT23 HEY2 MYLK3 PDE6C LILRA2 HLA-DQA1///HLA-DQA2 RAD54B ZNF208 IPW KRT81 MGP RRAD SUV39H2 HOMER3 PLK4 B3GALT2 BCL2L1 GRTP1 ZNF257 NUMA1 SV2A TRO TMEM100 FLJ13197 HLF BBOX1 GPR110 ANKS1B NYNRIN LILRB1 CES1 FUT9 ARFGEF2 GSTM4 MCL1 IL-8 KIT RAC3 PODXL2 KCNMA1 SNRPN MET SIGLEC1 LOC100510692///NAIP CXorf57 CDH2 SLC6A2 SLC24A3 KIAA1024 ARPC4 PRELP SYCP2 CDKN2A NRAP BRAP SLC4A7 ADAM3A 5 MYBPC1 SHANK2ARHGAP8///PRR5-ARHGAP8 FAM70A EIF5A PEX6 PPP1R3A HRASLS MGC2889 HLA-DRB4///LOC100509582 CDKN2B TPR LOC644450 LYPD1 ID4 SPATA1 HLA-DRA PAR5 FABP7 PPARGC1A NPTXR ST6GALNAC2 PROM1 TM4SF1 KRT20 PTPN20A///PTPN20B SPTLC3 ARTN EPB41L4A IL17RC ZIC3 SUCLG2 SYK XPNPEP1 CCND1 PCYT1A PMS2L2 LGSN ST6GAL1 FKBP5 EPCAM SMPDL3B RIMBP2 CFTR TP53 RARS2 FOLR1 THAP9 GAD1 ARSJ ITPR1 RIT2 FRMD4A PML SERPINA3 SLC27A6 RARRES1 KCNK3 NLRP2 FYB HLA-DRB4 SLC34A2 DEFB1 CADM4 ORC4 RAB6A C7orf44 SRD5A1 CHRM3 LIF COBL TBC1D22B AQP5 SORBS2 EYA2 BTBD18 MTAP GALNT12 GHR CDH1 KCNV2 NDUFA2 ZC3H15 GPR6 EPOR SNAP23 PLD1 DAZ1///DAZ2///DAZ3///DAZ4 VIP EPB41 CHD1L RBPMS MHC-2 ACSM3 CRABP2 ZNF471 EFCAB2 SMR3A MUC7 PRR4 ZNF44 FGF13 PART1 HLA-DQB1 NOVA1 TRPC3 RNF6 MYO10 MOSC1 KITLG CDV3 CP CX3CL1 WNT4 C8orf84 PMAIP1 ACTN2 APOF TIMM17A NR0B1 ZNF238 C18orf25 ASB4 SMR3B MYOZ2 CXCR2 ABLIM3 PSPH SMR3A///SMR3B PNMAL1 GPR56 HTR7 IQCH PHLPP1 IMPG1 CDK5R2 GRK4 TEF DR1 GAL SPICE1 MLLT4 TAPT1 PSG1 MLXIP ZC3H7B 24 TFAP2C SORT1 GM2A HNRNPH1 SHMT1 HLA-DQA1 TFR2 IRX4 MVD MAU2 MAGED4///MAGED4B MHC-2 SOX11 CA10 PCSK1N HLA-DRB1///HLA-DRB3///HLA-DRB4///HLA-DRB5///LOC100133661///LOC100294036///LOC100509582///LOC100510495///LOC100510519 LAMA2 TUBB2B PTPRT CYP1B1 PKNOX1 NUDT11 FAM169A CAMK1D TMSB15A TCEAL2 AP3D1 FBXO17///SARS2 ETV4 GAS2 INVS PLP1 BDNF LOC647070 ZNF711 FAM149AGDF1///LASS1 PTPN2 ELOVL2 LOC100287076 DZIP1 DDX43 N4BP2L1 DCT ARL17A///ARL17B MED1 PITX2 KBTBD10 LCN2 Node 12 C21orf2 CLDN1 RRP15 MYST3 COX11 SCNN1B HOXD3 KCNQ1 SLPI AMY1A///AMY1B///AMY1C///AMY2A///AMY2B SATB2PCDH7 RAD21L1 GP5 PSCA PLGLB1///PLGLB2 WWC1 MAGEB4 ST8SIA1 GPC4 CHI3L2 HOXB6 PDZK1IP1 KLF13 Proliferation STK24 PHF3 TSFM CLCN4 WLS HSD17B2 GDF15 B3GNTL1 MLC1 ABCF2HEXIM1 SUSD4 TSHR FOXI1 LOC202181 KCNJ13 ITGB1 NKX2-2 16 C6orf155 CRISP1 NFIB LRRTM4 RAB9BP1 SLC25A37 SLC25A31 MUC4 C1orf116 SNAPC5 HSD11B2 EHF ST18 CD24 PDE9A MBD2 DUOX1 23HIST1H2AG Node 13 C14orf106 2 GPM6B SLC27A5 TTTY2 STK3 SYCE1L ZNF81 EEA1 EFNA3 LPA BCL11A MED6 CBLC MPHOSPH9 GPRC5A PDGFRA HIST1H3H CRAT APBA2 S100P CCDC88A RAB25 ZNF224 HIST1H4J///HIST1H4K HIST1H2AE HIST1H3B FXYD3 HIST1H4J HIST1H2BH HIST1H2AMHIST1H2BJ C10orf81 PACSIN3 PRSS8 PPP2R3B EIF2S2 NEK1 BTC HIST1H2BC Stroma SLC1A1 PLK1S1 S100A14 HIST1H2BG GABRP TACSTD2 GPM6A LOC389906PSMD11 MUC1 SNCAIP SLC4A4 HIST1H3G PEG10 LOC100505960 HMGCS2 CYP2J2 PKNOX2 IGFBP1 C11orf80 HIST2H4A///HIST2H4B DDX3Y STC2 ADCY10 ZNF750 ARHGEF26 HIST1H2AD///HIST1H3D ALB DNAJC6 CDR1 SEMG2 RND1 MMP20 CUL3 MPZL2 HIST1H1C CRABP1 TRDN CXADR GRIA3 CUX2 MAK LRP2 SLC7A8 CHODL NEBL H2BFS USH2A MSMB ROPN1B MMP7 TNP2POU2F3 NLGN1 ADAM2 CLGN AQP3 Node 14 HOXA1 STEAP4 SLURP1 HOXA10///HOXA9 PCDH9 ATP6V1B1 BMP4 FUT6 AP1M2 ELF5 CHI3L1 ATP6V0A4 EPHX2 HIST1H2BO FAIM2 CASC5 HOXA10 PLA2G16 KCNJ12 SLC25A21 FAR2 SIX3 CALML5 LIN7A HOXA11 1 LOC100510224 HIST1H4H DHRS2 KRT83 UGT8 VIPR2 GCNT2 ZNF814 MARK2 CLU 6 NFASC SNPH BCHE NRXN1 GPRC5C ABCC6 SULT1E1 MGAM TMOD2 OLAH SFTPD CLDN4 14 LALBA DIAPH3 T-cell SRCAP CLDN3 RARRES3 KYNU NUDT4///NUDT4P1 KCNK1 LTF HPGD TSPAN8 VGLL1 CKMT1A///CKMT1B FLG HIST1H2BM SERHL///SERHL2 LRRC31 SLC9A2 ADH1C S100A1 MYO6 NRG1 ACTL8 FRK CLDN7 CYP4F8 GRB14 CSN3 CSN1S1 MYF5 PBX1 MOG CALB2 ATP2C2 DUSP7 ZCCHC11 MUC5B PLEKHA6 BMP7 C4A///C4B///LOC100509001 LASS4 CELSR3 MMP15 ODAM TRPM1 TTC9 APOD TJP3 KRT7 RFPL1S EGR4 PCSK1 CD44 BPI LRRN2 PNPLA3 BIK SLC37A1 CGA MFGE8 GPR64 SSH3 RNF128 SLC9A3R2 KRT19 Node 15 BAMBI KBTBD11 NEFH NVL STS CLIC3 EDDM3A CASC1 CA3 FMO5 PCP4 FAM107A NPAS2 FGFR4 MTRF1L ALOX15B HGD MVK SERPINA5 CEACAM7 PTPRZ1 SERHL2 NQO1 VEGF SLC26A3 NOL3 TF FAM110B EDN2 IGF1R TRIM9 GPR143 GATA3 DLG2 PRMT7 STOM CDK10 ACSM1 0 IL20RA MAOA MRAS ANKRD7 MYOM2 GNMT LOC100289410 GPLD1 ALDH4A1 FEZF2 S100B LOC1720 LOC284649 SCUBE2 TRPV6 MCF2L PIP ARNT2 RPRM CPB1 MAP9 LMF1 HPD C9orf156 PCCA NRTN ZNF135 TP53TG1 PLEKHB1 MFSD7 ZNF667 AKR1D1 HSD3B1 CYP4B1 PCDH8 C2orf72 GRIA1 SCG3 PTK6 DDC ARHGAP44 CA6 SULT1C2 NTRK3 PLCE1 PNPLA4 PCSK6 GABRB3 RAB3B PRKD1 NES ACADL EEF1A2 ACAN CTNNA2 KCNE1L 22 SFRP1 FLJ11235 UGT2A3 GABBR2 ABCC2 MBNL1 SOX10 GLRA3 GAMT CTTN SLC6A20 TEX14 KCTD14 SLC4A8 SYNM GRM5 CNTNAP2 KIAA1659 RGR PEG3 TLE4

MFI2 NPY1R SLC35F5 GTF2A1L///STON1-GTF2A1L SYNPO2L GRIA2 TTYH1 NCAN SCRG1 MAOB FGF2 FFAR2 7 UGT2B28 GFAP CYP39A1 IFT122 FGFR2 TSPAN12 HRASLS2 EXTL1 TFPI2 CYP2B7P1 DLK1 PLAG1 CYP2B6///CYP2B7P1 IL17B MYOT SEMA3B CAPN6 CRHR1 ZNF780B FOXD1 NTRK2 VAV3 DIO1 GP1BB///SEPT5 HRK PTH FZD9 SLC16A1 ERO1LB PVALB KCNK2 CEP135 CTNND2 SEMA6A ACE2 WIF1 MFAP3L PNMT FERMT1 TGFB2 PNLIPRP2DCX PTPLA COL9A3 MIA FGFR3 LARGE GRM1 MPPED2 GNAS FOXC1 RET MATN2 KRTAP1-3 WISP3 NKX2-5 CEL///LOC100508206 COL2A1 CRLF1 DLGAP1 DSG1 LAMB3 PTGS2 AP4E1 CAPN9 SUSD5 EPHX3 QPCT EPHA4 EFHD1 PRRX2 SLCO1A2 PLA2G4A CHST3 SIDT1 SMA4///SMA5 KCNJ4 ACSBG1 TMOD1 SYCP1 DSC3 DSG3 PYGB MTMR7 FES SLC12A1 PAX3 11CREM FCN2 DLX5 PAX2 RTEL1 EIF1AY KLK7 VSNL1 SMPX RPE65 FBN2 COL11A2 GJB5 LGR5 SCGB2A1 AZGP1 SPRY4 KCNK12 SLCO3A1 FAM149B1 GSTT1 CORO1B SYT1 SNTB1 FAM5C CYP26B1 PTX3 CSPG4 ACOX2 SCGB1D2 MCOLN3 GJB3 SERPINB5 TMEM158 TMX4 SH3BP2 PKP1 DUSP6 ANXA9 EPO ADD2 KCNJ3 PON3 CACNA1A HAPLN1 ZNF407 SCGB2A2 IGF2BP2 COCH ADCY2 CYLC1 TGM1 CDH19 TFPI SLC27A2 SCAND2 FCGBP FADS1 10 RAB27B KIF1A PADI3 KLK6 PCOLCE2 TMPRSS11E RALYL NR2E1 KIAA1644 MCF2L2 SOS2 COL9A1 LRFN4 TSPAN1 SIX1 MAPT ELOVL4 UCHL1 DSC2 FSD1 HR FOXG1 CDY1 SLCO4C1 BMP2 MNX1 IRF4 RAD23B TNFRSF11B PRSS21 C8orf4 SELENBP1 CRISP3 CRISP2 ZIC1 AGR2 POF1B TNNI2 SOX9 MAGEA9///MAGEA9B GYPA CLCA2 KLK5 BARX2 SLC38A4 AR SPDEF TOX3 IGF2BP3 TRIM36 HPX TBX3 NMNAT2 MAGEA4 MSX2 KLK11 PRAME PADI4 TTC12 CACNA1D ALCAM PRKCA KIAA1324 ALDH1A3 ZNF557 FOXA1 MBL2 C6orf15 PLCH1 CSAG2///CSAG3 GPR85 TFF3 INSM1 ABCA12 ABCC3 GABRA2 COL4A5 DUSP4 CLDN8 FABP6 MLPH EGF LGALS7///LGALS7B MAGEA1 MEGF9 DNALI1 N6AMT1 GCM2 RAB38 ST3GAL6 TFAP2B TMPRSS2 KLK10 CALB1 FOXE1 KCNH2 JMJD7 SLC44A4 ESRRG ALDOB GLDC UGT2B17 CEACAM6 OLFM4 HLCS S100A2 MAGEA2///MAGEA2B TFF1 CA12 ZFP2 REEP1 CTAG1A///CTAG1B ATP8A2 PCDHB3 PCDHB13 AKR1B10 SEMA3E EYA1 SOSTDC1 SEMA3C SYT17 ABCA4 C6orf26///MSH5 TDRD1 CYB5A PCDHB8 KRT15 TMC5 LPAR4 NLGN4Y MAGEA12 CTAG2 DACH1 PTHLH TRPM4 SCNN1A CEACAM1 CYP24A1 TYRP1 KRT14 TM7SF2 TFF2 LY6G6E GNGT1 LOC100289775///WNT7B GABRA5 UGT1A1///UGT1A10///UGT1A3///UGT1A4///UGT1A5///UGT1A6///UGT1A7///UGT1A8///UGT1A9 SLC6A15 TCN1 UGT2B4 GABRA5///LOC100509612 DSC1 MAGEA3 P2RY2 OCA2 SLC6A14 IL4 UGT1A8///UGT1A9 GPR37 PP14571 RYR1 LOC51152 CLDN10 MAGEA6 EGFR FAT2 KRT17 ALDH3B2 KCNK15 KRT6B TNFRSF11A GAGE12C///GAGE12D///GAGE12E///GAGE12F///GAGE12G///GAGE12H///GAGE12I///GAGE2A///GAGE2C///GAGE4///GAGE5///GAGE6///GAGE7 ARSD BAGE MAGEA5 AQP4 KRT75 KRT5 GAGE1///GAGE12F///GAGE12G///GAGE12I///GAGE12J///GAGE4///GAGE5///GAGE6///GAGE7

GAGE3 EN1 ELAVL2 ANXA8///ANXA8L1///ANXA8L2 ANKRD1 PLCB1 FGB MSLN WNT6 GPC5 CAMP CCNA1 GAGE1///GAGE12C///GAGE12D///GAGE12E///GAGE12F///GAGE12G///GAGE12H///GAGE12I///GAGE12J///GAGE2A///GAGE2C///GAGE2D///GAGE2E///GAGE4///GAGE5///GAGE6///GAGE7///GAGE8 COBLL1 HTR2C OBP2A///OBP2B IL12RB2 GPR172B PDLIM4 OBP2A FGG PCYT1B TNIK SIM1 FHOD3 CCDC70 RBP1 GAGE12F///GAGE12G///GAGE12I///GAGE5///GAGE7 WNT5B MAGEA11 GABRE MYBL1

APOBEC3B SEZ6L2 RPL5///SNORA66 GAGE1///GAGE12F///GAGE12G///GAGE12I///GAGE12J///GAGE2A///GAGE2B///GAGE2C///GAGE2D///GAGE2E///GAGE3///GAGE4///GAGE5///GAGE6///GAGE7///GAGE8 ORM1 LBP CA8 LY6D ART3 ACSL6 PITX1 GDPD5 DTNA QPRT MAGIX 12 CSTA BARX1 TRIM29 ORM1///ORM2 TMPRSS3 S100A9 PIK3R4 S100A7 HSD17B1 ADAM22 MUC16 PRSS12 PAK7 WFDC2 RHCG TNNT1 9 GPR109B NFE2 KRT16 NRG2 PGBD5 PI3 KCNG1 CST6 S100A8 APLP1 EGOT LOC100505650

SLC15A1 ZNF334 L1CAM KRT4

KRT6A///KRT6B///KRT6C SCEL 8 SOX15 CDH3 HES2 KRT6A

SFN SERPINB13 LYPD3 SPRR1B CELA2A///CELA2B SERPINB3 ADRA1A CHD7 KRT13 RGS20

STYK1 SERPINB7 TRGV5 SERPINB4 CDA SPINK5 CRCT1 GPR87 SPRR1A SERPINB2

CYP3A5

SPRR2BSPRR3TMPRSS4 CWH43

ARL14 NUP62CL

Figure 2. PGM represents the resulting network in which each functional node is encoded from 0 to 26, each box (node) represents one gene and lines (edges) connect genes with related expression. Genes from Rody’s metagenes are represented by diferent colors.

Metanode classificaon Cellular classificaon

Luminal LAR posive

Basal BASAL posive Luminal CLDN- negave CLDN-High High Basal Negave CLDN- CLDN-Low Low

Figure 3. Workfow from the sparse k-means groups in each metanode to the fnal cellular classifcation.

in the actin binding functional node, higher activity than tumors belonging to any other subgroup in chemok- ine activity functional node and lower activity than CLDN-low and LAR subtypes in the haptoglobin binding functional node. Basal tumors had higher activity than any other tumor in the functional nodes related to cell adhesion and regulation of the actin cytoskeleton. Finally, LAR tumors had lower activity in the nodes related to cell adhesion, G1/S transition of mitotic cell cycle and chemokine activity (Sup. File 4).

Immune metanode activity: Immune characteristics. On the other hand, taking the immune metan- ode into account, tumors were split according to their immune (IM) activity. High/low immune activity was defned with the sparse K-means method using genes included in the IM metanode. Some 259 (52%) samples were included in the IM-positive (IM+) group and 235 (48%) were included in the IM-negative (IM−) group (Table 4).

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 5 www.nature.com/scientificreports/

Luminal N Basal N CLDN Tumors % of total Cellular N High 40 (43%) 8% CLDN-High 40 (8%) — 93 (23%) Low 53 (57%) 11% CLDN-Low 53 (11%) — 403 (82%) High 245 (79%) 50% + 310 (77%) Basal 310 (63%) Low 65 (21%) 13% High 79 (94%) 16% — 84 (92%) Low 5 (6%) 1% + 91 (18%) LAR 91 (18%) High 7 (100%) 1% + 7 (8%) Low 0 0%

Table 2. Number of tumors classifed in each metanode sparse k-means group and in the cellular classifcation.

Tumor size Grade Nodal Cellular Classifcation T1 >T1 p-value G1 or G2 G3 p-value N0 N1 p-value Basal 76 (32%) 163 (68%) 0.169 45 (18%) 199 (82%) 0.015 168 (83%) 35 (17%) 0.262 CLDN-High 2 (7%) 27 (93%) 0.023 5 (14%) 31 (86%) 0.110 19 (83%) 4 (17%) 0.795 CLDN-Low 10 (24%) 32 (76%) 0.853 20 (49%) 21 (51%) 0.005 28 (72%) 11 (28%) 0.313 LAR 11 (17%) 54 (83%) 0.121 33 (53%) 29 (47%) <0.001 36 (67%) 18 (33%) 0.056 Total 99 (26%) 276 (74%) — 103 (27%) 280 (73%) — 251 (79%) 68 (21%) —

Table 3. Number of tumors with clinical characteristics. T1: tumor smaller than 2 cm; >T1: tumor larger than 2 cm; G3: grade 3; G1 or G2: grade 1 or grade 2; Nodal (N0): no node infltration; N1: node infltration. % is calculated using the total amount of a row for each clinical characteristic. Fisher exact test were performed between each group of the cellular classifcation and the total population (signifcant p-value = 0.05).

IM negative IM positive Cellular Cellular Classifcation Tumors % Classifcation Tumors % Basal 159 68% Basal 151 58% CLDN-Low 23 10% CLDN-Low 30 12% LAR 42 18% LAR 49 19% CLDN-High 11 5% CLDN-High 29 11%

Table 4. Immune characteristic interaction with cellular classifcation. According to the chi-squared test, IM characteristics and cellular classifcation are dependent.

IM+ tumors had a better prognosis than IM- tumors (hazard ratio [HR], 0.7286; 95% confdence interval [CI] 0.5329–0.9961; P < 0.05) (Fig. 4A). In addition, the immune metanode activity had a prognostic impact on the groups defned by the cellular classifcation. Patients with IM+/LAR subtype tumors had a better progno- sis than those with IM−/LAR tumors (HR, 0.3474; 95% CI 0.1657–0.7284; P < 0.05). Also, patients with IM+/ CLDN-high tumors had a better prognosis than those with IM−/CLDN−, although these diferences did not reach statistical signifcance (HR, 0.3556; 95% CI 0.04115–0.9828; P = 0.057). IM activity had no impact on the prognosis of the basal and CLDN-low subtypes (Fig. 4B).

Comparison between Cellular classifcation and PAM50, TNBC4-type and Burstein’s classifca- tions. Cellular classifcation and previous classifcations were compared (Fig. 5). Te basal subtype is highly enriched in basal-like immune suppressed (BLIS) and basal-like immune associated (BLIA) (Burstein 2015), basal (PAM50 + CLDN-low) and M (Lehmann 2016) subtypes, and it is poorly represented in the LAR subtypes from the Burstein and Lehmann classifcations. Te CLDN-high subtype is highly enriched in BLIA (Burstein 2015) and BL2 (Lehmann 2016). Te CLDN-low subtype is highly enriched in MES (Burstein 2015), LumA (PAM50 + CLDN-low) and BL2 (Lehmann 2016). Te LAR subtype is highly enriched in LAR (Burstein 2015), LumA (PAM50 + CLDN-low) and LAR (Lehmann 2016). Te LAR subtype is not present in Basal (PAM50) and BL1 (Lehmann) assignations (Fig. 5 and Table 5).

Immune characteristics and previous classifcations. Te Mesenchymal subtype from the TNBC4 type7 was highly enriched in IM- samples (148 samples of 187, 80% of all M subtype samples). Also, BL2 was enriched in IM+ samples (135 samples of 185, 72% of all BL2 subtype samples). Te IM+ and IM- groups showed no prognostic value for the BL1, BL2 and M groups (Fig. 6). However, patients with IM+ tumors had better prog- nosis than those with IM− in the LAR group (HR, 0.2896; 95% CI 0.1125–0.7273; P < 0.05). Te IM+ and IM- subgroups were evenly distributed in the subtypes defned by PAM50 and CLDN-low, with the exception of the HER2 subtype, which was enriched in IM+ (Table 6).

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 6 www.nature.com/scientificreports/

Figure 4. Kaplan-Meier survival curves represent the survival rate of immune-positive and immune-negative tumors in the whole cohort (A) and in the four cellular subgroups (B).

Figure 5. Various molecular classifcations compared with the cellular classifcation. From top to bottom, cellular, PAM50 + CLDN-low, Lehmann 2016 TNBC4 type, immune and Burstein’s classifcations are presented.

Basal CLDN-High CLDN-Low LAR Burstein N % Burstein N % Burstein N % Burstein N % BLIA 104 34% BLIA 23 58% BLIA 11 21% BLIA 2 2% BLIS 149 48% BLIS 3 1% BLIS 3 6% BLIS 1 1% LAR 4 1% LAR 4 1% LAR 3 6% LAR 76 84% MES 53 17% MES 10 25% MES 36 68% MES 12 13% PAM50 + CLDN−Low N % PAM50 + CLDN−Low N % PAM50 + CLDN−Low N % PAM50 + CLDN−Low N % Basal 125 40% Basal 13 33% Basal 5 9% Basal 0 0% CLDN-Low 76 25% CLDN-Low 9 23% CLDN-Low 44 83% CLDN-Low 13 14% Her2 23 7% Her2 6 15% Her2 1 2% Her2 8 9% LumA 25 8% LumA 7 18% LumA 1 2% LumA 52 57% LumB 27 9% LumB 4 10% LumB 4 4% LumB 16 18% Normal 34 11% Normal 1 3% Normal 0 0% Normal 2 2% TNBC4 type N % TNBC4 type N % TNBC4 type N % TNBC4 type N % BL1 57 18% BL1 8 20% BL1 1 2% BL1 0 0% BL2 81 26% BL2 29 73% BL2 47 89% BL2 28 31% LAR 3 1% LAR 0 0% LAR 1 2% LAR 52 57% M 169 55% M 3 8% M 4 8% M 11 12%

Table 5. Shows comparisons between Cellular classifcation and PAM50, Lehmann’s and Burstein’s classifcations.

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 7 www.nature.com/scientificreports/

Figure 6. Kaplan-Meier survival curves represent the survival rate of immune-positive and immune-negative tumors in the TNBC4-type subgroups.

PAM50 + CLND-low IM− IM+ Basal 69 (48%) 74 (52%) CLDN-low 62 (44%) 80 (56%) Her2 10 (26%) 28 (74%) LumA 43 (51%) 42 (49%) LumB 27 (55%) 22 (57%) Normal 24 (65%) 13 (35%)

Table 6. Shows immune characteristics in the PAM50+CLDN-low subgroups.

LumA immune-positive tumors had a better prognosis than immune-negative tumors (HR, 2.638; 95% CI 1.098–6.341; P < 0.05). Basal Immune and normal-like immune-positive tumors also showed a trend toward a better prognosis than immunenegative, but the diferences were not statistically signifcant. Finally CLDN-low, LumB and HER2 tumors showed no diferences in prognosis related to their immune status (Fig. 7). Finally, the Burstein subtype BLIA was highly enriched in the IM+ (106 samples of 140, 75%) and the BLIS was highly enriched in the IM- tumors (119 samples of 156, 76%). Immune-positive and immune-negative tumors had diferent outcomes in each of the Burstein’s subgroups. BLIA, BLIS and LAR immune-positive tumors as well as MES immune-negative tumors had a better prognosis, although the diferences were not statistically signifcant (Fig. 8).

Implications of the cellular classifcation and the immune characteristic in response to neoad- juvant treatment. Cellular classifcation was transferred using genes from the basal and luminal metanodes and the CLDN-enriched functional node. Of 153 triple-negative breast cancer tumors, 79 were assigned to the basal subgroup (51%), 8 were assigned to the CLDN-high subgroup (5%), 19 were assigned to the CLDN-low subgroup (12%) and 47 were assigned to the LAR subgroup (31%). Te immune characteristic was transferred using genes from the immune metanode. Some 80 samples were immune-negative (52%) and 73 samples were assigned to the immune-positive subgroup (47%) (Table 7). Te CLDN-high subgroup presented the poorest prognosis among the cellular classifcation subgroups. Immune-positive tumors had a better prognosis (Fig. 9).

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 8 www.nature.com/scientificreports/

Figure 7. Kaplan-Meier survival curves represent the survival rate of immune-positive and immune-negative tumors in the PAM50 + CLDN-low subgroups.

Figure 8. Kaplan-Meier survival curves represent the survival rate of immune-positive and immune-negative tumors in the Burstein’s subgroups.

Discussion TNBC constitutes a heterogeneous disease with various molecular entities. Te study of this heterogeneity has thus far not conferred signifcant advances in the treatment of patients. Te application of probabilistic graphical models (PGMs) provides deep insight into high-throughput data18. In the present study, we used PGMs to unravel

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 9 www.nature.com/scientificreports/

Cellular Classifcation Number IM Characteristic Number %Intragroup IM− 41 52% Basal 79 (52%) IM+ 38 48% IM− 2 25% CLDN-High 8 (5%) IM+ 6 75% IM− 12 63% CLDN-Low 19 (12%) IM+ 7 37% IM− 25 53% LAR 47 (31%) IM+ 22 47%

Table 7. Shows the cellular classifcation and the immune characteristic in the neoadjuvant dataset.

Figure 9. Kaplan–Meier survival curves represent the distant relapse-free survival rate of the cellular and the TNBC4-type subgroups in the GSE25066 series.

specifc molecular information concerning various biological entities, such as the immune status or the develop- mental point when the breast stem cell turns carcinogenic. Previous studies used diferences in gene expression to defne TNBC subtypes3,6–8,10. Subtypes emerged from clustering methods such as HCL or non-negative matrix factorization, which group genes around specifc func- tions. On the contrary, we hereby applied an unsupervised analysis, without knowledge of the functions of the genes selected in each step of the process. We ultimately identifed the genes involved in 26 diferent molecular functions, which agreed with the metagenes described by Rody et al.5. Tis approach provides two diferent clas- sifcations (immune and cellular), each related to particular genes and functions. Once the PGM functional structure was established, we defned four subgroups: CLDN-low, CLDN-high, basal-like and LAR, agreeing with the cancer stem cell hypothesis2,13–15. Tese four groups identify the point of the diferentiation process where the stem cell becomes carcinogenic: the less diferentiated tumors will be CLDN-low, and the most diferentiated tumors will be LAR. Functional node activities confrm that there are diferences among cellular subgroups, and some of these diferences could have therapeutic utility. For example, the activity of node 19 (PPAR signaling pathway) showed meaningful diferences between the CLDN-low subgroup and the other three, suggesting that PPAR-directed therapies might have a diferent efect on the CLDN-low subgroup. Finally, we observed that cellular subgroups had diferent clinical features. On the other hand, the immune layer was described in this study as a compendium of functional nodes, each of which related to a specifc immune function. However, when taking all these nodes together as a metanode we were able to establish an immune classifcation with prognostic value among all the series. Te immune and cellular classifcations refected unrelated biological identities. As shown in Fig. 4, the LAR and CLDN-high subgroups presented diferent prognoses when split by the immune layer. LAR immune-negative tumors were associated with a 30% 5-year survival rate compared with 70% in the LAR immune-positive group. Te immune-based subtype might also infuence the response to . Ongoing trials are evaluating anti-PD1 in breast cancer, particularly in triple-negative disease24. It would be interesting to assess the efcacy of anti-PD1 therapy in subtypes defned by immune layer. We also compared the cellular classifcation with other classifcations previously described7,8,10. LAR is over- represented in every luminal subgroup regardless of the classifcation, which demonstrates that this is a homo- geneous and reproducible group. Similarly, the basal cellular subgroup is overrepresented in basal subgroups

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 10 www.nature.com/scientificreports/

across classifcations. Tere is also a high correlation (83%) in the CLDN-low cellular groups, which confrms the existence of a CLDN-low subgroup independent of the expression of ER, PR and HER2, as previously suggested16. Our results show that immune features appear across different subtypes. Interestingly, the luminal immune-positive group did much better than the luminal immune-negative group. Regardless of the classifca- tion7,8,10, the immune layer added prognostic information to the luminal subtypes. Te immune layer had been previously defned as a separate group in these classifcations, but it appears to intersect with other biological features, providing additional prognostic value. With regard to the cellular classifcation, our CLDN-low cellular subgroup had an 89% concordance with the basal-like 2 Lehmann’s subgroup, which puts BL2 in the stem cell hypothesis context, suggesting that basal-like 2 tumors might be caused by early diferentiated carcinogenic stem cells. Te CLDN-high subgroup does not appear in other classifcations, which suggests that this is an intermediate group between CLDN-low tumors (stem cell not yet expressing CLDN genes) and basal tumors. It might be difcult to draw the line between groups in this continuous, cellular diferentiation-based classifcation, although Burnstein’s basal-like immune-active corresponded to the CLDN-high immune-negative in our classifcation. Regardless of the classifcation, there was always a luminal subgroup, one or two basal subgroups and some mesenchymal or CLDN subgroup. Our classifcation could also provide some predictive information. CLDN-high tumors had a poor response to neoadjuvant chemotherapy. Much efort has been devoted to the prediction of response to chemotherapy in TNBC. Cell-free DNA25, tumor-infltrating lymphocytes26, microRNA signatures27 and proteomics28, among oth- ers, have recently been proposed as useful methods in this regard. Further research is needed before the cellular classifcation described in the present paper could be considered in the selection of therapy. Tis study has some limitations. Te 2010 American Society of Clinical Oncology guidelines established the 1% threshold for the expression of PR and ER29; however, our tumor series was assessed before that date, so we cannot ensure that all the TNBC tumors fulflled this criterion. Another limitation to our study is that the cellular classifcation is based on a continuum, which makes it difcult to set categories. Finally, these results should be validated in additional cohorts to evaluate the robustness of our cellular and immune classifcation. However, we believe that our fndings serve as an important hypothesis in generating fndings that can be explored in future studies. Conclusion In conclusion, the use of probabilistic graphical models in TNBC suggests that there are at least two independent biological layers, cellular and immune. We propose a new way to characterize TNBC taking these two dimensions into account, and leading to the result that the luminal immune-positive subgroup had a better prognosis than the luminal immune-negative. Availability of Data and Material The datasets analyzed during the current study, GSE31519 [https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc = GSE31519], and GSE25066 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc = GSE25066], are available in the GEO Datasets repository. References 1. Network, C. G. A. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012). 2. Stingl, J. & Caldas, C. Molecular heterogeneity of breast carcinomas and the cancer stem cell hypothesis. Nat Rev Cancer 7, 791–799 (2007). 3. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000). 4. Yersal, O. & Barutca, S. Biological subtypes of breast cancer: Prognostic and therapeutic implications. World J Clin Oncol 5, 412–424 (2014). 5. Rody, A. et al. A clinically relevant gene signature in triple negative and basal-like breast cancer. Breast Cancer Res 13, R97 (2011). 6. Lehmann, B. D. et al. Identifcation of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Invest 121, 2750–2767 (2011). 7. Lehmann, B. D. et al. Refnement of Triple-Negative Breast Cancer Molecular Subtypes: Implications for Neoadjuvant Chemotherapy Selection. PLoS One 11, e0157368 (2016). 8. Burstein, M. D. et al. Comprehensive genomic analysis identifes novel subtypes and targets of triple-negative breast cancer. Clin Cancer Res 21, 1688–1698 (2015). 9. Sabatier, R. et al. Kinome expression profling and prognosis of basal breast . Mol Cancer 10, 86 (2011). 10. Prat, A. & Perou, C. M. Deconstructing the molecular portraits of breast cancer. Mol Oncol 5, 5–23 (2011). 11. Jézéquel, P. et al. Gene-expression signature functional annotation of breast cancer tumours in function of age. BMC Med Genomics 8, 80 (2015). 12. Milioli, H. H., Tishchenko, I., Riveros, C., Berretta, R. & Moscato, P. Basal-like breast cancer: molecular profles, clinical features and survival outcomes. BMC Med Genomics 10, 19 (2017). 13. Shipitsin, M. & Polyak, K. Te cancer stem cell hypothesis: in search of defnitions, markers, and relevance. Lab Invest 88, 459–463 (2008). 14. Sims, A. H., Howell, A., Howell, S. J. & Clarke, R. B. Origins of breast cancer subtypes and therapeutic implications. Nat Clin Pract Oncol 4, 516–525 (2007). 15. Allegra, A. et al. Te cancer stem cell hypothesis: a guide to potential molecular targets. Cancer Invest 32, 470–495 (2014). 16. Prat, A. et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res 12, R68 (2010). 17. Ritchie, M. E. et al. limma powers diferential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015). 18. Gámez-Pozo, A. et al. Combined Label-Free Quantitative Proteomics and microRNA Expression Analysis of Breast Cancer Unravel Molecular Diferences with Clinical Implications. Cancer Res 75, 2243–2253 (2015). 19. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57 (2009). 20. de Velasco, G. et al. Urothelial cancer proteomics provides both prognostic and functional information. Sci Rep 7, 15819 (2017).

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 11 www.nature.com/scientificreports/

21. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine learning 52, 91–118 (2003). 22. Witten, D. M. & Tibshirani, R. A framework for feature selection in clustering. J Am Stat Assoc 105, 713–726 (2010). 23. Saeed, A. I. et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34, 374–378 (2003). 24. Katz, H. & Alsharedi, M. Immunotherapy in triple-negative breast cancer. Med Oncol 35, 13 (2017). 25. Sung, J. S. et al. Detection of somatic variants and. Oncotarget 8, 106901–106912 (2017). 26. Wein, L. et al. Clinical Validity and Utility of Tumor-Infltrating Lymphocytes in Routine Clinical Practice for Breast Cancer Patients: Current and Future Directions. Front Oncol 7, 156 (2017). 27. García-Vazquez, R. et al. A microRNA signature associated with pathological complete response to novel neoadjuvant therapy regimen in triple-negative breast cancer. Tumour Biol 39, 1010428317702899 (2017). 28. Gámez-Pozo, A. et al. Prediction of adjuvant chemotherapy response in triple negative breast cancer with discovery and targeted proteomics. PLoS One 12, e0178296 (2017). 29. Hammond, M. E., Hayes, D. F., Wolf, A. C., Mangu, P. B. & Temin, S. American society of clinical oncology/college of american pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. J Oncol Pract 6, 195–197 (2010). Acknowledgements This study was funded by Instituto de Salud Carlos III, Spanish Economy and Competitiveness Ministry, Spain and co-funded by the FEDER program, “Una forma de hacer Europa” (PI15/01310). LT-F is supported by the Spanish Economy and Competitiveness Ministry (DI-15-07614). GP-V is supported by Conserjería de Educación, Juventud y Deporte of Comunidad de Madrid (IND2017/BMD7783). Author Contributions All the authors have directly participated in the preparation of this manuscript and have read and approved the fnal version submitted. J.M.A., H.N. and P.M. contributed the probabilistic graphical models. M.D.-A. and G.P.-V. contributed the statistical analyses. G.P.-V., L.T.-F., A.Z.-M., M.F.-G. and R.L.-V. performed the probabilistic graphical model interpretation and the gene ontology analyses. G.P.-V. drafed the manuscript. G.P.-V., A.G.-P., J.A.F.V., J.F., P.Z. and E.E. conceived of the study and participated in its design and interpretation. A.G.-P., J.A.F.V., E.E. and L.T.-F. supported the manuscript drafing. A.G.-P. and J.A.F.V. coordinated the study. Additional Information Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-38364-y. Competing Interests: A.F.V., A.G.-P. and E.E. are shareholders of Biomedica Molecular Medicine S.L. L.T.-F. and G.P.-V. are employees of Biomedica Molecular Medicine S.L. Te other authors declare no competing interests. Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre- ative Commons license, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per- mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Te Author(s) 2019

Scientific Reports | (2019)9:1538 | https://doi.org/10.1038/s41598-018-38364-y 12