ARTICLE

DOI: 10.1038/s41467-018-04383-6 OPEN Distinct epigenetic landscapes underlie the pathobiology of pancreatic cancer subtypes

Gwen Lomberk 1, Yuna Blum 2, Rémy Nicolle 2, Asha Nair3, Krutika Satish Gaonkar3, Laetitia Marisa 2, Angela Mathison4, Zhifu Sun 3, Huihuang Yan 3, Nabila Elarouci2, Lucile Armenoult2, Mira Ayadi 2, Tamas Ordog 5,6, Jeong-Heon Lee5, Gavin Oliver3, Eric Klee3, Vincent Moutardier7,8, Odile Gayet7, Benjamin Bian 7, Pauline Duconseil7, Marine Gilabert7, Martin Bigonnet7, Stephane Garcia7,8, Olivier Turrini7,9, Jean-Robert Delpero9, Marc Giovannini9, Philippe Grandval10, Mohamed Gasmi8, Veronique Secq8, Aurélien De Reyniès2, Nelson Dusetti7, Juan Iovanna7 & Raul Urrutia 1,4,5 1234567890():,;

Recent studies have offered ample insight into genome-wide expression patterns to define pancreatic ductal adenocarcinoma (PDAC) subtypes, although there remains a lack of knowledge regarding the underlying epigenomics of PDAC. Here we perform multi- parametric integrative analyses of immunoprecipitation-sequencing (ChIP-seq) on multiple modifications, RNA-sequencing (RNA-seq), and DNA methylation to define epigenomic landscapes for PDAC subtypes, which can predict their relative aggressiveness and survival. Moreover, we describe the state of promoters, enhancers, super-enhancers, euchromatic, and heterochromatic regions for each subtype. Further analyses indicate that the distinct epigenomic landscapes are regulated by different membrane-to-nucleus path- ways. Inactivation of a basal-specific super-enhancer associated pathway reveals the exis- tence of plasticity between subtypes. Thus, our study provides new insight into the epigenetic landscapes associated with the heterogeneity of PDAC, thereby increasing our mechanistic understanding of this disease, as well as offering potential new markers and therapeutic targets.

1 Division of Research, Department of Surgery, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin 53226, USA. 2 Programme Cartes d’Identité des Tumeurs (CIT), Ligue Nationale Contre Le Cancer, 14 rue Corvisart, Paris 75013, France. 3 Division of Biomedical Statistics and Informatics, Department of Health Science Research, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, USA. 4 Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin 53226, USA. 5 Epigenomics Program, Center for Individualized Medicine, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, USA. 6 Department of Physiology and Biomedical Engineering, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA. 7 Centre de Recherche en Cancérologie de Marseille (CRCM), INSERM U1068, CNRS UMR 7258, Aix-Marseille Université and Institut Paoli-Calmettes, Parc Scientifique et Technologique de Luminy, 163 Avenue de Luminy, Marseille 13288, France. 8 Hôpital Nord, Chemin des Bourrely, Marseille 13015, France. 9 Institut Paoli-Calmettes, 232 Boulevard Sainte Marguerite, Marseille 13009, France. 10 Hôpital de la Timone, 264 rue Saint-Pierre, Marseille 13385, France. These authors contributed equally: Gwen Lomberk, Yuna Blum, Rémy Nicolle. Correspondence and requests for materials should be addressed to G.L. (email: [email protected]) or to J.I. (email: [email protected]) or to R.U. (email: [email protected])

NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6

ancreatic ductal adenocarcinoma (PDAC) is a painful and the current study is that implantation of patient-derived tumors fatal disease that undoubtedly remains a health priority, into mice reproduces two molecularly distinct subtypes of pan- P fi offers signi cant therapeutic challenges, and will soon rank creatic cancer, namely basal and classical, indicating that these as the second cause of death by cancer in the world1. Searching phenotypes are primarily maintained by the epithelial cell com- for somatic genetic causes, many laboratories have discovered ponent. We report the first known multi-factorial integrative and tumor suppressors for PDAC2. However, cumu- analysis of genome-wide ChIP-Seq for five distinct histone marks lative evidence reveals more complex mechanisms underlying the (H3K4me1, H3K27ac, H3K4me3, H3K27me3, and H3K9me3), development and progression of this disease, involving, among DNA methylation, and RNA-seq on 24 PDAC samples others, interactions between genomic and epigenomic altera- grown as PDTXs (Supplementary Table 1). Using a multivariate tions3. Although extensive studies have provided an under- Hidden Markov Model, as built by ChromHMM8, our analysis standing of aberrant networks4, insights into the assigned fifteen chromatin states along with the average genome epigenomics of PDAC remains remarkably limited. coverage by each state (Fig. 1a–c). In agreement with the Epigenomics, which is the basis for the regulation of gene ENCODE Roadmap project, we defined that H3K4me3 contain- activity, expression, as well as nuclear organization and function ing states are mostly found in promoter regions at the tran- includes the posttranslational modifications of histone and scription start site (TSS) or flanking the TSS, often combined with DNA methylation. contained within the basic repeating the presence of H3K27ac and H3K4me1 (Fig. 1a). H3K4me3 near unit of chromatin, a , can be dynamically modified at the TSS was associated with highly expressed , as deter- specific residues to signal for the activation or repression of tran- mined by RNA-seq (E1 to E4; Fig. 1d) and low DNA methylation scription. Several modifications have been associated with parti- levels (Fig. 1c), unless present in combination with H3K27me3 cular transcriptional regulatory outcomes, including H3K4me3 (E5 and E6), which in general is a mark associated to strong with active promoters, H3K27ac with active enhancers and pro- downregulation (E5, E6, E13, and E15; Fig. 1b). Overall, the moters, H3K4me1 with active and poised enhancers, H3K9me3 epigenetic states in PDAC correlated with clear effects on with heterochromatin, and H3K27me3 with Polycomb-repressed expression of nearby genes, as defined by RNA-seq (Fig. 1b). In regions5. In addition, the combined distribution of different his- addition, DNA methylation levels show specific patterns in tone modifications reveals different epigenetic signals that mediate association with that are chromatin state- transcriptional initiation, elongation, and splicing, as well as DNA dependent (Fig. 1c). Active promoters (E1 to E4) are, as expec- repair and replication, among others6. ted, significantly hypomethylated and strong repressive states Here we have performed a multi-factorial integrative analysis (E12, E13, and E15) are highly methylated. However, DNA of genome-wide chromatin immunoprecipitation-sequencing methylation has a complex state-dependent role in gene regula- (ChIP-seq) on multiple histone modifications, as well as RNA- tion as hypermethylation can be associated to over-expression sequencing (RNA-seq) and DNA methylation studies to generate, (e.g., E10 and E12) and hypomethylation to under-expression for the first time, new knowledge on epigenetic landscapes linked (e.g., E5 and E6). On average, 28% of each PDAC sample’s epi- to the heterogeneity of PDAC grown as patient-derived tumor genome was associated with silencing heterochromatin or xenografts (PDTXs). We report that PDTXs recapitulate two repressed polycomb-based modifications (E12, E13, and E15; phenotypes observed in vivo, namely the classical and basal Fig. 1a). Approximately 57% of each epigenome had none of the subtypes. Multi-parametric integrative analyses of these multi- tested histone marks, which, similar to reports in most cell types9, omics datasets identify key epigenomic landscapes that are con- results in a majority of genome categorized in the so-called gruent with disease aggressiveness and survival. Super-enhancer quiescent state (E14). The remaining 15% of the epigenome mapping combined with factor (TF) binding motif corresponded to promoters and enhancers, in either an active or and upstream regulatory analyses reveal that these tumors poised/bivalent state. populate two distinct epigenomic landscapes with classical In order to characterize the impact of each of these tumors associated with TFs involved in pancreatic development, specific chromatin states on pancreatic carcinogenesis, we as well as metabolic regulators and Ras signaling, whereas the identified cellular functions that were most frequently basal phenotype tumors utilize proliferative and epithelial-to- associated to each of these states (Fig. 1d, Supplementary Figure 1, mesenchymal transition (EMT) transcriptional nodes down- and Supplementary Data 1). We found that virtually all aspects of stream of the MET . The functional importance of these pancreatic cancer biology were regulated by specific combinations findings is underscored by the fact that genetic inactivation of of epigenetic marks and DNA methylation including the MET results in a transition from a basal to more classical tran- following: proliferation and apoptotic control (RB and TP53, scriptomic signature. Combined, this new knowledge on the other cell cycle checkpoints), major signaling pathways (ErbB, PDAC epigenome, along with gene expression networks that it IGF, RAS, mammalian target of rapamycin, and transforming regulates, provides valuable and biomedically relevant mechan- growth factor (TGF), among others), and cell adhesion istic insight into this disease, offers potential new markers for molecules (such as cadherins and integrins). More specifically, PDAC, and informs the potential development of future ther- we focused on the regulation of major pancreatic cancer genes apeutic regimens that may help manage patients affected by this and evaluated their tight transcriptional control by epigenetic dismal malignancy. Therefore, these findings bear significant modification (Fig. 1e). In particular, the major reprogramming mechanistic and medical importance. TFs, , and SOX9 (Fig. 1f and Supplementary Figure 2), as well as the ERBB2 oncogene, were associated to marks of highly active TSS and active enhancers, as well as notable hypomethyla- Results tion. On the other hand, the hedgehog pathway appeared strongly Distinct chromatin states underlie PDAC heterogeneity. inhibited by the presence of extensive repressed-polycomb marks PDTXs have become a promising tool to generate diagnostic, over the promoters and gene bodies of SMO and PTCH1 (Fig. 1g prognostic, and therapeutic approaches, particularly for indivi- and Supplementary Figure 2). Other tumor suppressor genes, dualized medicine. Importantly, these xenografts often recapitu- such as WT1, SMAD4, and BRCA2, which are also silenced in all late the biology, pathobiology, and therapeutic response of the samples, are frequently associated to polycomb-repressed (E15, primary counterpart, even though they do not actively reproduce H3K27me3) or heterochromatin-like (E12, H3K9me3) states the same microenvironment7. However, an important finding of (Fig. 1e). Key epigenetic regulators were found to be significantly

2 NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6 ARTICLE upregulated by highly activated epigenetic states, potentially Thus, this data indicates that the PDAC epigenome is explaining this extensive epigenetic control of cancer-related dynamically marked for regulation, which together likely genes. For instance, DNMT1 (DNA methyltransferase), the constitute areas that could react to extracellular signals, intrinsic repressing EZH2 (H3K27 methyltransferase), and HDAC1 cellular clues, and potentially inheritable modifications. (deacetylase), as well as the activating MLL2, SETD3 (both The patterns of epigenetic marks on genes from major H3K4 methyltransferases), and KAT2A (H3K acetyltransferase) carcinogenic pathways demonstrate the dominant role of the are significantly overexpressed in association with active TSS and epigenetic landscape on maintaining a neoplastic phenotype. As active enhancer chromatin in all samples (Fig. 1e). these marks are reversible, this data suggests epigenetic drugs

abc 1E 3E 5E 7E 9E0E1E2E3E4E15 E14 E13 E12 E11 E10 E9 E8 E7 E6 E5 E4 E3 E2 E1 1E 3E 5E 7E 9E0E1E2E3E4E15 E14 E13 E12 E11 E10 E9 E8 E7 E6 E5 E4 E3 E2 E1 Active TSS E1 0.4% Flanking active TSS E2 2.5% Flanking active TSS E3 0.6% Transcription at gene 5’ & 3’ E4 0.3% Poised promoter E5 0.4% Bivalent enhancer E6 0.7% Poised enhancer E7 4.8% Active enhancer E8 3.4% Weakly active enhancer E9 0.9% Active/bivalent enhancer E10 0.3% Poised/bivalent enhancer E11 0.3% Heterochromatin E12 11.5% Heterochromatin E13 1.8% Quiescent E14 57.2% Repressed polycomb E15 15.0%

–4 –2 02 4 0.0 0.2 0.40.6 0.8 1.0 TES TSS ExonGene Gene expression DNA methylation K4me3K27acK4me1 K9me3 K27me3 TSS 2kb GenomeCpG % island Lamin B1 lads de E1 State fgsea p-value frequency E2 5% 1 E3 1% 0.8 E4 1‰ 0.6 E5 0.4 Mean expression 0.2 (leading edges) E6 0 E7 1 0.5 E8 0 E9 –0.5 E10 –1 E11 E12 E13 E15 WT1 SMO KLF4 MLL2 EZH2 SOX9 KAT2A SETD3 PTCH1 ERBB2 BRCA2 HDAC1 DNMT1 SMAD4

2 GLYCOLYSISHIF PATHWAY (24) (15) RB PATHWAY (13) MET PATHWAY (77) RAS IGF1PATHWAY PATHWAY (21) (30) Gene expression 1 MTOR PATHWAY (68) 0 SIGNALING BY WNT (61) G2/M CHECKPOINTS (41) –1 SIGNALING BY NOTCH (91) NCADHERIN PATHWAY (33) TGFBR SIGNALING IN EMT (14) INTEGRIN A9B1P53 PATHWAY SIGNALING (21) PATHWAY (67) 0.8 REGULATION OFCELL APOPTOSIS CYCLE CHECKPOINTS (54) (111) INSULIN SIGNALING PATHWAY (131) DNA methylation 0.4 ERBB1 DOWNSTREAM PATHWAY (102) SEMA4D IN SEMAPHORINCELL ADHESION SIGNALING MOLECULES (31) CAMS (118) REGULATION OF MITOTIC CELL CYCLE (76) 0 DOWNREG. OF ERRB2 ERBB3 SIGNALING (11) f 70.12 mb g 128.84 mb 70.11 mb 128.83 mb 128.85 mb SOX9 SMO

Chromatin states Chromatin states SMO proportion proportion

0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 Methylation level Methylation level 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0

NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6 bear therapeutic potential, in particular for inhibitors of the subtypes of PDTXs correspond to the previously described clas- highly upregulated molecules, EZH2 (e.g., tazemetostat), sical and basal subtypes (Fig. 2c). These observations, plus DNMT1 (e.g., decitabine), and HDAC1 (e.g., vorinostat and additional support from a previous study on PDTXs14, sub- trichostatin A). stantiate that these avatars retain features of human primary tumors and validate our analytical methodologies. Notably, although we found that the most frequent genomic alteration, Epigenomic landscapes differentiate clinical outcomes. Subse- including point mutations and copy number aberrations, do not quently, we sought to identify the chromatin states that can serve distinguish subtypes (Supplementary Figure 6), the basal samples to epigenetically classify PDAC subtypes, by performing a MCA were more frequently associated to advanced, unresectable PDAC (multiple correspondence analysis) factorial analysis10. This with liver metastasis (Fig. 2d). Overall, the integration of distinct method, which represents the categorical data generated by chromatin states characterized the epigenetic landscape of PDAC ChromHMM as points in a low-dimensional Euclidean space, to distinguish the less aggressive classical subtype from the more behaves as the counterpart of the PCA (principal components aggressive basal subtype. Further support of this observation was analysis) that is commonly used for continuous data, allowing us illustrated by the strong association of the second MCA com- to separate the tumors by their proximity. We found that the first ponent with patient survival (Fig. 2e). In summary, the use of MCA component was associated to a global increase of histone multivariate histone-based information to develop a fifteen-state marks within regions gained in the genome, as determined by chromatin landscape model by ChromHMM followed by an SNP (single-nucleotide polymorphism) array analysis (Supple- MCA approach led to the classification of PDAC PDTXs into two mentary Figure 3). This observation is extremely important, as it major subgroups, which correlated with clinical parameters. suggests that ChIP-seq comparisons, which do not employ this Thus, this integrative method achieves better clinical value than method to identify this type of epigenomic information in tumors each of the assays used independently, but more importantly it with a myriad of duplications and deletions, may introduce provides mapping of genome-wide epigenetic modifications, technical bias yielding an increased probability of overestimating which can potentially serve as candidates for phenotypic, diag- or underestimating histone marks in genomic regions with vari- nostic, prognostic markers, and potential pharmacological targets able representation11. The second component was not influenced from the significant growing number of epigenomic inhibitor by any of these factors, thus offering the richest unbiased infor- drugs. mation for reliably classifying the tumors, as well as identifying gene networks of pathobiological importance. The third MCA component was mainly determined by one sample, which was Epigenomic landscapes implicate distinct pathways. To gain hereafter considered as an outlier (Supplementary Figure 3). The biological and pathobiological mechanistic information on these outlier sample was a very differentiated tumor from a 60-year-old PDAC subtypes, we performed unsupervised cluster analyses of woman, which, surprisingly, was wild-type for KRAS, SMAD4, the chromatin states defining the second MCA component CDKN2A, and alleles, and displayed expression profiles and assembled into three clusters of loci with particular epigenetic methylation patterns that were not characteristic of previously states, namely: cluster 1 mainly composed of enhancers active in described pancreatic cancer tumors, suggesting that it may have the most basal-like samples; cluster 2 consisting of enhancers been either clinically misclassified or represented a very rare type active in classical samples; and cluster 3 representing active of pancreatic cancer not previously described. This interpretation promoters in classical samples (Fig. 3a and Supplementary Fig- further validated our classification method for the purpose of ure 7). These three clusters of loci were associated with differ- individualized medicine, which most often deals with single rare ential methylation patterns, specifically with hypermethylation cases of diseases. Therefore, using the epigenomic regions asso- near enhancers in the basal samples and active promoters of ciated to the second MCA component (5412 regions), the classical samples (Fig. 3b). Each of these clusters of epigenetic remaining 23 PDTXs were hierarchically clustered into two landscapes was strongly associated with a corresponding change subtypes (Fig. 2a). Transcriptome- and methylation-based unsu- in transcription levels of nearby genes (Fig. 3c). The transcrip- pervised analyses also supported this classification, thereby tional activity of nearby genes was consistent with the pre- revealing the impact of both DNA and histone-based epigenetic dominant type of chromatin state in each cluster of loci, as components on gene expression (Fig. 2b and Supplementary denoted by an overall pattern of gene upregulation near regions Figure 4). Control analyses of the average levels of each histone of active enhancers (in basal samples for cluster 1 and classical modification in the enriched regions of the genome demonstrate samples for cluster 2) or active promoters (cluster 3 with classical that our classifications were not influenced by differences in samples only). Functional analysis of basal enhanced genes overall levels of histone marks (Supplementary Figure 5). Cross- (cluster 1) revealed that, consistent with the aggressiveness of referencing our epigenomic data with published PDAC classifi- these tumors, these genes were implicated in signal transduction cations based on genomic data12, 13 clearly showed that these pathways with strong oncogenic potential (e.g., ErbB/EGFR,

Fig. 1 Distinct chromatin states of human PDAC PDTXs. a Chromatin state definitions and histone mark probabilities as determined by ChromHMM8. Average genome coverage. Genomic annotation enrichments for each chromatin state as calculated by ChromHMM. b Boxplots illustrate sample centered averaged gene expression of genes within regions of particular chromatin states based on RNA-seq data. c Boxplots depict sample-averaged level of DNA methylation for the overlapping CpGs with each chromatin state. Dotted line represents mean methylation cut-off (0.5). d Gene-set enrichment analysis

(GSEA) pathways for each chromatin state. Circle size is proportional to − log10 p-value (showing only p-value < 5%) and colors correspond to the mean normalized expression of the genes driving the enrichment, or leading-edge genes. e Heatmap of the frequency of each state among all samples for tumor suppressors (green labels), pro-tumorigenic genes (red labels), and epigenetic regulators (blue labels). Boxplots of gene expression and methylation level are shown for each gene. f,g Visualization of chromatin state proportions on all samples and methylation levels across the SOX9 and the SMO locus. Stacked bars represent the proportion of each chromatin state at a given genomic position. Methylation level is represented for each sample, at each genomic position and colored by its value. For all boxplots (b,c,e), bottom and top of boxes are the first and third quartiles of the data, respectively, and whiskers represent the lowest (respectively highest) data point still within 1.5 interquartile range of the lower (respectively upper) quartile. Center line represents the median value

4 NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications siae yaCxmdlt eosrt iko et ssonaogteMAdmnin osaeclrdacrigt h hoai state-based chromatin the to according colored are Dots dimension. MCA the along shown is death of risk de demonstrate as to subtypes model sample Cox a by estimated vial signatures available apeaesono h o fec lse,bsdo h oo oepoie nFig. in provided code color st the chromatin on of based proportion indicating cluster, plots each Bar of landscapes. epigenomic top differential the with on loci of shown clusters are three sample into divided been has dimension MCA opnn eosrtstosbye.Sml oriae nti ieso n Student and dimension this in coordinates Sample subtypes. two demonstrates component AUECOMMUNICATIONS NATURE 3 Fig. 2 Fig. 10.1038/s41467-018-04383-6 DOI: | COMMUNICATIONS NATURE Fisher xrsinfrnab ee mil cieTSadmil cieehne ein oae t<2 bad<10k ptemfo S,respectively) TSS, from upstream kb 100 < and kb 20 < and at 0.2 located > (mad regions coordinates enhancer MCA active the mainly to and relation TSS in active shown (mainly genes nearby for expression p lseigaeindicated. are clustering tt-ae lseigfrom clustering state-based ehltrPeoyecluae stema filn-p ehlto level. methylation island-CpG of mean the as calculated Phenotype Methylator oae busra rmrgossoni eaint h C oriae md>02and 0.2 > (mad coordinates MCA the to relation in shown regions from upstream kb 1 < located vle fFisher of -values d c b ae ’ Diagnosis withlivermetastasis:0.038 pgnmclnsae ugs ahbooia mechanisms. pathobiological suggest landscapes Epigenomic xc et nec lse sdtrie yspeci by determined as cluster each in test) exact s Prediction bycollisson,etal:6.9e–05 Cluster 3 Cluster 2 Cluster 1 outcomes. patient predict landscapes Epigenomic ad Prediction byMoffitt,etal:0.00083 1236 Regions 0.0 0.2 0.4 0.6 0.8 1.0 2423 Regions0.0 0.2 0.4 0.6 0.8 1.0 1753 Regions 0.0 0.2 0.4 0.6 0.8 1.0

AM.IP Methylation subtypes:0.00073 Prediction byCollisson Prediction byMoffitt Surgery performedfirst:0.052

1.052 Prediction byICGC:0.00055

1.053 RNAseq subtypes:6.9e–05 Chromatin statessubtypes

2.042 Chromatin states Basal Classical QM Classical 3.076 MCA coordinates:0.0031

D.IPC ’ 1.119 clustering. state-based chromatin the with data clinical of association the show to indicated are test exact s 3.043 12

2.116 , 2.058 13 fi

1.064 . e in ned b

AH.IP d 2.045 lseigo N-e n N ehlto aait utpsi hw ydsic oosadpotdacrigt chromatin to according plotted and colors distinct by shown is subtypes into data methylation DNA and RNA-seq of Clustering | B.Tim and diagnosis at metastasis liver of presence and resection surgical of status on based samples PDAC of characteristics Clinical (2018) 9:1978

2.099 a 2.083 Prediction byICGC 1.048 . p a

1.037 Immunogenic Squamous Progenitor vle fFisher of -values 2.087 CIMP 1.03 1.033 AO.IP foie8 b 208 CpG (20% of 1073) 264 CpG (11% of 2108) 98 CpG (9% of 2285) 2.042 3.043 AM.IP 1.052 Diagnosis withlivermets Surgery first AM.IP 1.053

2.042 Pathways DNA methylation

Y N Y N D.IPC 3.076

D.IPC ’ 1.119 1.119 Island CpG CIMP, clustering. state-based chromatin with association the demonstrate test exact s 3.043 1.052 2.116 2.058 B.Tim 1.064 AH.IP 3.076 2.045 a

| B.Tim 2.116 O:10.1038/s41467-018-04383-6 DOI: fi

2.099 MCA second the with associated regions epigenetic using clustering state-based Chromatin p

pgnmclnsae.Ptwyde Pathway landscapes. epigenomic c 2.083 2.058 vle<00,AOATest). ANOVA 0.05, < -value 1.048 p- 1.037 *** 1.064 value threshold 2.087 **

1.03 * 2.083 1.033 AO.IP p p p 1.053 <0.01 <0.5 <0.1

a foie8

c 2.045 eta ersnigcrmtnsae o h ein soitdwt h second the with associated regions the for states chromatin representing Heatmap 131 Genes (18% of 699) 308 Genes (23% of 1314) 221 Genes (22% of 1001) AH.IP AM.IP 1.052 1.048 1.053 –0.5 2.042 Gene expression 2.099 MCA coordinates 0.5

3.076 –1

D.IPC 0 1 2.087 1.119 3.043 c 1.037 2.116 2.058 publically transcriptome-based using subtypes PDAC of Prediction 1.03 1.064 AH.IP 1.033 2.045 B.Tim AO.IP 1 2.099 a. | 2.083 foie8 www.nature.com/naturecommunications ** * *** *** *** *** *** *** ’ d 1.048 b s 1.037 MCA coordinates ahasfudt esigni be to found Pathways t

leylo eta fDAmtyainfrnab CpGs nearby for methylation DNA of heatmap Blue-yellow 2.087 1.03 -test

p 1.033

vle<00,AOAtest). ANOVA 0.05, < -value AO.IP foie8 fi p iin rgnt rmteG rKG database KEGG or GO the from originate nitions Hazard ratio (log2) vleo h soito ihcrmtnstate-based chromatin with association the of -value Hedgehog signalingpathway(hsa04340) Gastric acidsecretion(hsa04971) Steroid hormonebiosynthesis(hsa00140) Insulin secretion(hsa04911) Maturity onsetdiabetesoftheyoung(hsa04950) (GO:0002067) Glandular epithelialcelldifferentiation Lipid biosyntheticprocess(GO:0008610) (GO:0032024) Positive regulationofinsulinsecretion Fructose andmannosemetabolism(hsa00051) Ras signalingpathway(hsa04014) TGF- Steroid biosynthesis(hsa00100) Endocrine pancreasdevelopment(GO:0031018) Hexose metabolicprocess(GO:0019318) Maturity onsetdiabetesoftheyoung(hsa04950) ErbB signalingpathway(hsa04012) PI3K-Akt signalingpathway(hsa04151) Wnt signalingpathway(hsa04310) TGF- Hippo signalingpathway(hsa04390) (GO:2001233) Regulation ofapoptoticsignalingpathway Regulation ofcelldivision(GO:0051302) Epithelial celldifferentiation(GO:0030855) 0.5 2.0 8.0 β β –0.5 signalingpathway(hsa04350) signalingpathway(hsa04350) (0.037 whenadjusted Logrank test:0.0017 forbiopsy/surgery) 0.0 fi atyatrd( altered cantly c lerdhampo gene of heatmap Blue-red e aadRto(log2) Ratio Hazard . 1.0 0.5 ARTICLE DNA methylation Gene expression Chromatin states p 0.2 0.4 0.6 0.8 –1 –0.5 0 0.5 1 E15 E14 E13 E12 E11 E10 E9 E8 E7 E6 E5 E4 E3 E2 E1 vle<0.05, < -value tsper ates 5 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6

PI3K-AKT, Hippo, and Wnt), EMT, such as the TGFβ pathway, TFs mostly associated with development, including TFs known to as well as deregulation of cell differentiation, proliferation and influence pancreatic morphogenesis (e.g., HNFs, PDX1, MNX1) apoptosis (e.g., YAP1, HEY1, , and E2F7) (Fig. 3d and and lipid (PPARs). This information suggests the Supplementary Table 2). Genes activated by epigenetic landscapes existence of a mechanism whereby enhancer-associated TFs in classical samples (clusters 2 and 3) were mainly involved in amplify their function through regulating other fate-determining pancreatic development (e.g., PDX1, BMP2, GATA6, SHH), TFs and ultimately their target genes involved in distinct func- metabolic processes (e.g., HKDC1, FBP1), and Ras signaling (e.g., tions, in particular, metabolic networks for the classical subtype KITLG, RASA3) (Fig. 3d and Supplementary Table 2). Thus, these (Fig. 4a). Although no basal-specific TF was identified as an results provide a better understanding of epigenetic landscapes upstream regulator, MET, the hepatocyte growth factor that regulate biological pathways differentially enriched in each (HGF) , was associated with the regulation of basal- subtype of tumor, which likely explain the heterogeneity of specific super-enhancers (Fig. 4b). This finding is important since PDAC. anti-MET therapy is clinically used in other cancers16. Looking at the gene networks downstream of MET, we found that TFs pri- marily involved in proliferation, including MYC, MYBL1, and Super-enhancers reveal regulatory TF nodes for PDAC sub- , and in EMT, such as SNAI2, are the best candidates to be types. Additional mechanistic information was derived from key regulators of the basal phenotype. characterizing the heterogeneity in the representation of super- When considering established knowledge on PDAC genetics enhancers in the different PDAC subtypes. Super-enhancers are and now, through the current study, the epigenetic landscape of known to have a cell- and state-specific function, as well as these tumors, the data is in agreement with our previously mediate the aberrant upregulation of cell fate determination genes described model for PDAC3. This model, refined by the data in both developing and cancer cells15. We found the majority of derived here, predicts that key epigenetic pathways, most likely super-enhancers to be specific for classical samples (250), working as effectors of well-known genetic alterations, serve as including a substantial number of loci (28) associated with TFs, amplifiers and differentiating nodes to give rise to distinct PDAC whereas only a few super-enhancers (30) were specific to basal phenotypes (Fig. 5a). It is likely that, at some moment of samples. To further understand the regulatory programs driving tumorigenesis, patient, environmental, and tumor-intrinsic fac- the two phenotypes, we analyzed the upstream transcriptional tors (e.g., tumor microenvironment), or their combination push regulation of these super-enhancers (Fig. 4). We found that the cells through various epigenetic landscapes. The spontaneous classical phenotype is likely influenced by at least 9 TFs contained inter-conversion between subtype landscapes, once their pheno- within super-enhancers (GATA6, FOS, FOXP1, FOXP4, KLF4, type is established, is unlikely. However, we believe that this could ELF3, NFIX, CUX1, and SSBP3) (Fig. 4a and Supplementary potentially be achieved through the inhibition of key pathways or Figure 8). In addition, these super-enhancer-associated TFs processes, such as super-enhancers, which are predicted to be appear to exert their regulatory influence on other upregulated

a CLASSICAL b BASAL Subtypes Super-enhanced TF GATA6 MCA coordinates FOXP1 Other significant TF FOS KLF4 Target genes CUX1 NFIX TF super-enhancers SSBP3 ELF3 CUX1 Super-enhancer FOXP4 ELF3 FOS Present FOXP1 Absent CREB3L1 BATF FOXP4 GATA6 ATF3 KLF4 HNF4A MYCN NFIX HNF1B HNF1A NFIB (9) SSBP3 MNX1 HNF4G ZNF713 CDX2 Gene expression PDX1 SE TF expression 1 FOSL1 YAP1 TCF7L2 NR1I2 NR3C1 NR3C2 IRF8 JUND 0.5 MYBL1 ZNF238 THRA POU2F3 H2AFY SRP9 0 PIAS1 PPARD –0.5 ENO1 CEBPB MYC PPARG RCOR3 FOXH1 ZSCAN4 ZNF784 –1 ETS1 VDR SPDEF TF expression GADD45A CKMT1B FEZ1 FOXO4 JDP2 SOX15 MEIS2 MEIS1 CEBPD NFATC1 GSC ZNF124 RAD21 HIC2 UBE2V1 ZNF3 A1CF LARP1 TP63 HSF1 MAFG TCF7L1 TBX1 (42) FOXC1 SNAI2 TNFRSF11AIGSF9MGST3PKDCCGPRIN2SLC9A2PRKAB1ATP1B1RNASE4FCGBPGPR137BPSORS1C2FAIM3 BCL2L15MLECRDH12TMEM125 SERP2MUC12ETV7MATN2 OSBPL7GSTM3EDEM3 TP53I11RNF183IQCE ISG20CGN PRR5LBTG2 DEGS2DPCR1 PCSK7SLC29A1 E2F1 RXRA ADARB1 RNF186 PRR15 MUC13BTG1 MGLLTAGLN CTSEQSOX1 SLC6A20TRIM31 BACE2TMC5 MYO1DGALC TRNP1PLCB4 SEMA4GCABLES1 (33) HPGD MLXIP LENG9 FNIP2 FOSB ZNRF2 EPS8L3 MGAT3 RAB40B LINC00471 GBP2 CNNM1 SGPP2 ABLIM1 LY6G5C CYP3A5 FOXD1 TMEM229B LRP5 CASR RBM47 PLAC8 DDC ADHFE1 FBP1 FDFT1 CLRN3 BEST4 ZMYND12 MBTPS2 PTGER2 ANG OLFM4 HKDC1 SLC7A8 CHDH PML DDAH1 KANSL1 TMEM2 FAM134B PARD3B ANXA1 CALML4 DNM2 CA8 AXIN2 EPCAM RNASET2 BAIAP2L1 AMN VSIG2 GPX2 SEL1L3 PLEKHG1 APH1B BCAS1 VILL ALDH2 AKR1B15 ATP7B DHRS12 PSTPIP1 KCNC4 ALAD PHLDB2TCTEX1D2FTOHOMER1 AEN ARRB1 Target super-enhancer MET NARSHOMER3KCTD5MSH2DAAM1 SMURF2UTP15IL1BGNB1LPPP2CA LY6D CMTM7 ADRA2A SLC40A1 RCC1PHGDHFBXO27 SVILPTPN1TUBB6 SELPLG ONECUT2 PSMB7GPR157 MTL5NECAB3 PAPSS2 ST6GALNAC1 DCUN1D3DPYD TMPRSS11EPC DLL4 SLC26A5 PTPN13PVRL1 C19orf26PPL MARCKSL1 SPIRE2 MERTK SYNGR1 TSPAN15 EFNA1 TCFL5SPRY4 CTSDLYPD3 MYO15B ARHGAP26 LRRC8A RBM38 RGS20 PRSS27 ETNK1 BTNL3 STOM DUSP5 RCBTB2 SRD5A3 P2RY2 CAMK1D MFSD4 KLHL25 SH3TC2 IGFBP3 PTPRZ1 ELMO2 SPATA6 SEC16B SEMA3F RASAL2 DDAH2 BMF SE target expression FOXK2 KIRREL EIF4E3 MYH14 NXN WNT7B ANO5 PLXNA2 MET MPHOSPH9 CDK5RAP3 GOLGA7B TOP1 B9D1 C1orf210 CYP2C18 TBC1D2 NAP1L4 ENPP4 ABHD2 ALDH1A3 KIFC3 FKBP1B KCNE3 ETV5 KLHL5 CYSTM1 PFKFB2 ARHGEF19 MXRA7 TM4SF20 TNIK ITGA2 MCC C1orf116 GPD1 PLEKHG3 JAG2 FUT4 PRLR BCAS4 NRD1 RIMKLA TC2N A4GALT NAMPT BMP2 COL27A1 PRKG2 CAST PCNXL2 RNF43 Target gene expression DSE PRKAR1A PTPN22 CAPN5 CYR61 HMGA2 UBL3 SLCO2B1 BICD2 LRP12 MMP7 IYD USP31 BCAR3 RASA3 FAM174B AMOTL1 FCHSD2 VAMP4 CAPN9 TGFBI ADORA2B LPCAT4 SPPL2A TPM1 ZFAND6 NF2 PITPNM2 OMP CRTC1 SSH1 SLC3A2 ARHGAP24 GDA HOPX E2F7 PLEKHD1 COL16A1 ZBTB10 KRT13 P2RX4 RAB3B SLC38A2 SPSB1 NADSYN1 FOXA3 SRGAP3 LGALS1 TMEM92 ANKS4B CDH26 EMP1 KPNA7 KITLG SYNPO PDZK1IP1 PLXNA1 S100A9 CHMP4C SORBS1 BAG3 IGF2BP2 LYPD6B PLEKHA6 SNN LARP6 NOXA1 NQO1 CORO1C AQP3 ITM2C METTL7B CNN3 MAPKBP1 MLPH HSD17B2 HSD11B2 CHN2 DDX1 HEY1 CYP2S1 LIMA1 NOTCH1 CSPP1 STOML1 GRAMD2 WDFY2 LYPLA1 NUDT16 ADCY6 ZNF436 FJX1 HR TMEM163 GPA33 NEDD9 FSCN1 SLC20A1 ZNF703 KANK1 ADAM19 STK4 R3HDML GPRC5C PGF CAMSAP1 PLA2G4F PELI1 FST GPC1 HOXB5 CLDN12 IHH NR5A2 ANKRD46 RPRD1B DOK1 SLC7A7 MREG MYO1B MECOM TMEM37 IL4R COPS5 CYP51A1 TOX3 FAM3C PDXP DAPK2 SLC22A23 PLBD1 SERPINB6 (372) RUVBL1 MCM4 TMEM38A F5 POFUT1 NRSN2 CD164 TESC CELSR1 PIGV EGLN3 GPR35 KDM1A HSPB11 LYN CCR6 SLC10A5 TRIM2 ZNF697 RHCG ENTPD8 IMPA2 STK17A SSH3 ANXA10 SHH ETNK2 CCBL1 SYTL2 SULT1C2 GRHL2 ELFN2 ZNF823 MYOM3 AKR1B10 PCSK6 GRHL3 CKAP5 PLA2G4A PTPRJ MTSS1L PHLDB3 DUSP8 TMEM51 SERTAD2 RGS19 RASA4 MTMR11 TTC39A ACY3 GPR153 ARNTL2 MZB1 MTMR10 EGFR PUF60 KCNK5 MYOM1 GRAMD3 COQ6 KIF12 NOSTRIN IL17RB NAPEPLD CD44 C18orf54 B2M ID2 AEBP2 FGFBP1 PARM1 LYZ IL1A MICALL1 LXN MTHFS MN1 STIM1 TMEM98 REG4 HADHB NLN CLDN10 PTPRN2 ELOVL5 CHAF1B TIAF1 SLC19A2 (241) RASGEF1B ATP6V0A1 SH3PXD2A CD3EAP POU5F1 SPTBN1 SLCO2A1 PROM1 WDR43 TTPAL RAB20 TSPAN3 SLC35F2 PTGES GPR116 STXBP6 MOCS2 LTBP3 VIPR1 ASL NRP1 MLLT11 ACSS2 PIP5K1B GUCA2B LDHD FASN SPRY1 CA9 PRDM16 ELL2 TOMM34 MUC17 IQGAP2 MYO5A LAMA5 TLR3 SLC30A10 EXOC6B ARHGAP40 CA13 GATA4 MSLN MGST2 RDX UPK3BL KCNK10 SLC35D2 FZD6 BAIAP2 BEND7 TMEM63B SS18L1 ATAD2 B3GALT5 TTYH3 EDEM1 ZBTB32 LGALS9 SLC45A3 CLDN18 TMEM63A GSDMCWWTR1 PDP1MAPK11 ARHGEF38TJP3 ZFPM1CASZ1 SFNNUAK1 ODC1NBPF1 FGD5AKR7A3 MPPE1TLN2 NMD3 LRP11 TNFSF13BCMO1 SLC5A9TEP1 SLAIN2TGFA C8orf37RNF169 CCNI2STK31 CYP27A1RBP4 PI4K2ASIK1 PMAIP1FAM83D MCUPGAP3 SCAMP2CD2AP MTHFD1AP1B1SERPINF1 PCYT1ASLC2A11MBD2 MICU1TFCP2L1 HDHD3ASRGL1 ACOT7TMEM194BPRKCHNDST1SERPINE1ELK3TTI1PDPRSPTBN2ZW10WARS PDZD3ACHEFUT2 MYO7BACSL5TM4SF4 ATP10BEFNA2LRRC66 IL1R2PRKCDCOG2 GCNT3ASPHD2ATP2A3SNRNP35PDCD4BTNL8SPACA4AQP7XYLT1C4BPBARFIP1HABP2PAQR8ENTPD2DAPK1

Fig. 4 Transcriptional regulatory networks for PDAC phenotypes. TFs found to be significant regulators of the a classical- and b basal-associated genes were used to reconstruct regulatory networks. Networks include TFs upregulated by super-enhancers (upper-left light red nodes in a only), significant TFs (yellow), and their targets arranged in a circular layout. In each circle, cellular pathways and functions over-represented among the target genes of each network (classical a, basal b) are shown and grouped by gene-set similarity. In the center of a and b, the heatmaps of TF expression and their targets, as well as the heatmaps of super-enhancers are represented. Red arrow corresponds to TF–TF regulation, and gray arrows to TF–nonTF target regulations

6 NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6 ARTICLE

abEpigenetic model for establishment Validation of the model of the tumor cell phenotype MET Genetic alterations KRAS CLASSICAL BASAL TRANSFORMATION CDKN2A TP53 c Individuals factor map (PCA) 10 Legend MCA coordinates Scramble siMET Classical Basal 5 PHENOTYPE 01-048f 01-048f Epigenetic alterations ACQUISITION 01-030 01-030 0 GATA6 MET AH_IPC Dim 2 (8.11%) AH_IPC FOS –5 AO_IPC AO_IPC

Snail HNFs, PPARs –10 MYC, MYBL1, E2Fs Pancreas development –15 –10 –50 5 10 EMT and proliferation Dim 1 (79.94%)

deGATA6 targetsp-value=0.035 Cell cycle p-value=0.016 0.4 0.0 0.3 –0.1 0.2 –0.2 CLASSICAL BASAL 0.1 –0.3 0.0 –0.4 0 5000 10,000 15,000 0 5000 10,000 15,000

Fig. 5 Epigenetic model for the tumor phenotype and its validation. a Genes that are in a box with red ribbon correspond to genes having an associated Super-enhancer. Yellow flash corresponds to genetic alterations. b Diagram is shown of the design for validating the model through inhibition of MET by siRNA to test the potential conversion of a basal to classical cellular phenotype. c PCA based on the differentially expressed genes from RNA-seq on basal siMET and basal control (scramble siRNA) samples. Basal samples were used as active individuals in the PCA construction, whereas the pure classical sample was projected on the two first dimensions. d GSEA plot is shown for GATA6 targets with vertical black lines corresponding to GATA6 putative targets ordered by their statistical tests of the differential analysis between siMET vs. control in basal samples. The curve illustrates the running enrichment score for the gene set ranked by the difference between the cohorts, showing global upregulation of GATA6 targets in the siMET condition. e GSEA plot for cell cycle is shown with vertical black lines corresponding to cell cycle genes (mitotic cell cycle, Reactome) ordered by their statistical tests of the differential analysis comparing siMET vs. control in basal samples. Green curve represents the running enrichment score for this gene set across the ranked list, which demonstrates global downregulation of those genes in the siMET condition important for the acquisition of a distinct phenotype. Conse- modeled, our data leads to the inference that genetic, environ- quently, we tested the importance of MET in the maintenance of mental, and tumor-intrinsic factors, such as the tumor micro- the basal phenotype, using small interfering RNA (siRNA) environment, likely all converge to give rise to distinct epigenetic knockdown in PDTX-derived cell lines (Fig. 5b). Congruent with landscapes during carcinogenesis. The inter-conversion between the epigenetic data, MET-inhibited samples underwent an overall landscapes may not occur spontaneously, thereby fixing distinct shift towards the classical phenotype (Fig. 5c), which involved the subtypes. On the other hand, the targeting of upstream regulator increase of GATA6 transcriptional activity, as evidenced by pathways of key super-enhancers (e.g., MET), provides a proof- upregulation of its gene targets (Fig. 5d), and the inhibition of cell of-principle for the existence of phenotypic plasticity. Thus, we cycle related pathways (Fig. 5e). Over-expression of apoptotic- envision that through the guidance of this work, some agents may related genes and metabolic modifications was also caused by be useful for achieving this result. For instance, our results from manipulation of the MET pathway (Supplementary Figure 9 and targeting super-enhancer-mediated mechanisms of the more Supplementary Data 2). Thus, these results reveal that subtypes aggressive basal subtype, through MET inactivation, support the have a potential plasticity, which can be manipulated by targeting future testing of anti-MET therapies, such as those in clinical pathways described in this study. trials for many types of tumors16, in PDAC. Furthermore, the data provided on both DNA methylation and histone marks support the testing of novel therapies with many drugs against Discussion writers, readers, and erasers that are in several phases of clinical Epigenomic mechanisms are responsible for the regulation of trials or approved for other diseases17. Thus, we speculate that, in ontologically related gene expression networks at an appropriate the future, pharmacological manipulation targeting specific level, time, and place to give rise to both normal and diseased pathways described here may convert the most aggressive tumors phenotypes. The current study provides the most comprehensive into a more benign or manageable counterpart in the clinic to understanding, to date, of epigenomic landscapes underlying improve survival. PDAC heterogeneity, predicts survival, informs the molecular pathobiology of this disease, as well as identifies epigenetically modified regions of the genome, which can serve as potential new Methods markers and therapeutic targets. The integration of all datasets Patient-derived xenografts. Three expert clinical centers collaborated on this project after receiving ethics review board approval. Patients were included in this provides new information as to the state of promoter, enhancer, project under the Paoli-Calmettes Institute clinical trial number 2011-A01439-32. and super-enhancer associated transcriptional processes that are Consent forms of informed patients were collected and registered in a central operational in the basal and classical PDAC phenotypes. When database. The tumor tissues used for xenograft generation were deemed excess to

NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6 that required for the patient’s diagnosis. PDAC tissue from surgical samples was each sample as the proportion of methylated ( > 30% β-value) CpGs among the fragmented, mixed with 100 μL of Matrigel, and implanted with a trocar (10 gauge; selected normally unmethylated island CpG. Innovative Research of America, Sarasota, FL) in the subcutaneous right upper fl ank of an anesthetized and disinfected male NMRI (Naval Medical Research fi Institute)-nude mouse. Samples obtained from EUS-FNA were mixed with 100 μL mRNA pro ling (RNA-seq) and analysis. RNA libraries were prepared (Illumina TruSeq RNA v2) and run on the Illumina High Seq-2000 for 101 bp paired end of Matrigel (BD Biosciences, Franklin Lakes, NJ) and injected in the upper right fi flank of a male nude mice (Swiss Nude Mouse Crl: NU(lco)-Foxn1nu; Charles reads in the Mayo Clinic Medical Genomics Core. Gene expression pro les were fi obtained using the MAP-RSeq v.1.2.1 workflow25, the Mayo Bioinformatics Core River Laboratories, Wilmington, MA) for the rst implantation. When xenografts 26 reached 1 cm3, these were removed and passed to NMRI-nude mice in the same pipeline. MAP-RSeq consists of alignment with TopHat 2.0.6 against the human hg19 genome build and gene counts with the HTSeq software 0.5.3p9 (http://www. manner as the surgical samples. All animal experiments were conducted in fi accordance with institutional guidelines and were approved by the “Plateforme de huber.embl.de/users/anders/HTSeq/doc/overview.html) using gene annotation les ’ ” fi obtained from Illumina (http://cufflinks.cbcb.umd.edu/igenomes.html). RNA-seq Stabulation et d Expérimentation Animale (PSEA, Scienti c Park of Luminy, 27 Marseille). reads were also mapped using STAR with the proposed ENCODE parameters and XENOME28 on the human hg19 and mouse mm10 genomes and transcript annotation (Ensembl 75). Gene counts were normalized using reads per kilobase DNA and RNA extraction. Nucleic acids were extracted for 24 xenograft samples per million mapped reads (RPKMs). corresponding to 24 unique patients. DNA was extracted using Blood & Cell culture DNA mini kit (Qiagen) following the manufacturer’s instructions. RNA Public dataset comparison. ICGC Methylation chips, RNAse, and microarray was extracted using RNeasy mini kit with optional on-column DNA digestion gene expression datasets were downloaded from the ICGC data portal (dcc.icgc.org, (Qiagen). release 20). Other datasets were downloaded from the provided Gene Expression Omnibus entry (Moffitt et al. GSE7172913 and Collisson et al.12 GSE17891). All Histone modification profiling (ChIP-seq) and analysis. Tissue (50 mg) was non-cancer samples were removed from each dataset. Expression datasets were homogenized for 15–30 s in 500 μL of 1 × PBS using a tissue grinder. Homogenized then centered gene-wise. Centroid classifiers were built for each dataset describing 29, 30 tissues were cross-linked to final 1% formaldehyde for 10 min, followed by a classification using an approach described in previous works . Briefly, after quenching with 125 mM glycine for 5 min at room temperature, and by washing gene-wise centering, the 1000 most differentially expressed genes (limma) or dif- with tris-buffered saline (TBS). The pellets were resuspended in cell lysis buffer (10 ferentially methylated CpG (Student’s t-test), were used to build centroids of each mM Tris-HCl, pH 7.5, 10 mM NaCl, 0.5% NP-40) and incubated on ice for 10 min. subtype. Gene expression profiles of samples to test were correlated (Pearson’s The lysates were aliquoted into two tubes and washed with MNase digestion buffer correlation) to all the centroids of a classification system and the closest centroid fi fi (20 mM Tris-HCl, pH 7.5, 15 mM NaCl, 60 mM KCl, 1 mM CaCl2) once. After class (highest correlation coef cient) was assigned. When speci ed, indeterminate resuspending in 250 μL of the MNase digestion buffer with proteinase inhibitor samples correspond to samples that did not significantly correlated to any centroid. cocktails for each tube, the lysates were incubated in the presence of 1000 gel units of MNase (NEB, M0247S) per 4e6 cells at 37 °C for 20 min with continuous mixing SNP arrays analysis. Illumina Infinium HumanCode-24 BeadChip SNP arrays in a thermal mixer. After adding an equal volume of sonication buffer (100 mM were used to analyze the DNA samples. Integragen SA (Evry, France) carried out Tris-HCl, pH 8.1, 20 mM EDTA, 200 mM NaCl, 2% Triton X-100, 0.2% sodium hybridization, according to the manufacturer’s recommendations. The BeadStudio deoxycholate), the lysates were sonicated for 15 min (30 s on; 30 s off) in a Diag- software (Illumina) was used to normalize raw fluorescent signals and to obtain log enode bioruptor and centrifuged at 15,000 r.p.m. for 10 min. The cleared super- R ratio (LRR) and B allele frequency (BAF) values. Asymmetry in BAF signals due natant was incubated with 2 μg of histone modification-specific antibodies overnight to bias between the two dyes used in Illumina assays was corrected using the tQN at 4 °C. The following antibodies were used: anti-H3K4me1 (Abcam, ab8895, lot normalization procedure31. We used the circular binary segmentation algorithm32 114262), anti-H3K27ac, (Abcam, ab4729, lot GR150367), anti-H3K4me3 (Abcam, to segment genomic profiles and assign corresponding smoothed values of LRR ab8580, lot GR188707-1), anti-H3K27me3 (Cell Signaling Technology, 9733 s, lot and BAF. The Genome Alteration Print method was used to determine the ploidy 8), and anti-H3K9me3 (Diagenode, C15410056, lot A1675-001p). After adding 30 of each sample, the level of contamination with normal cells, and the allele-specific μL of G-agarose magnetic beads, the reactions were incubated for another 3 copy number of each segment33. h. Beads were washed extensively with ChIP buffer, high-salt buffer, LiCl2 buffer, and TE buffer. Bound chromatin was eluted and reverse-crosslinked at 65 °C overnight. DNA was purified using the Mini-Elute PCR purification kit (Qiagen) Unsupervised clustering. Features on sexual were removed for after treatment with RNase A and proteinase K. Enrichment was confirmed by subsequent analysis. One sample outlier identified by MCA analysis was removed targeted real-time PCR in positive and negative genomic loci. For next-generation for clustering. Unsupervised clustering analysis was carried out on: gene expression sequencing, ChIP-seq libraries were prepared from 10 ng of ChIP, and input DNA (RNA-seq, 23 samples), CpG methylation (MethEpic, 23 samples), and H3K4me3, with the Ovation Ultralow DR Multiplex system (NuGEN). The ChIP-seq libraries H3K4me1, H3K27ac, H3K9me3, and H3K27me3 (ChIP-seq, 23 samples). For were sequenced to 51 base pairs from both ends using the Illumina HiSeq 2000 in histone marks, ChIP-seq data were filtered as follows: − log10(FDR) > 10 and the Mayo Clinic Medical Genomics Core. Data were analyzed using the HiChIP number of samples that share the peak > 3. For sequence or ChIP-based data, an pipeline18. Briefly, paired-end reads were mapped by BWA19 and pairs with one or extension of the ConsensusClusterPlus algorithm was used34. In brief, using all both ends uniquely mapped were retained. H3K4me3, H3K4me1, and H3K27ac paired combination of Pearson’s distance and different linkage metrics (Ward, peaks were called using the MACS2 software package20 at false discovery rate complete and average), hierarchical clustering is bootstrapped in 1000 iterations of (FDR) ≤ 1%. SICER21 was used to identify enriched domains for H3K27me3 and resampling of the most variant features. An additional level of iteration adjusts the H3K9me3. For data visualization, BEDTools22 in combination with in-house scripts threshold of feature variability. The consensus is given by a final hierarchical were used to generate normalized tag density profile at a window size of 200 bp and clustering using the complete linkage and the number of co-classification as sample step size of 20 bp. We also visualized the average profile around TSS for H3K4me1, distance. For Methylation chips, the SD was used as a measure of variability and 10 H3K4me3, H3K9me3, H3K27ac, and H3K27me3 across all samples. The tags were thresholds between 1% and 10% were used for each iteration of Consensu- normalized to tags-per-million with a flanking region of 10 kb around the TSS. The sClusterPlus. To consider the specificity of the mean-variance relationship in count data was plotted with a Y-axis as the normalized log2 fold change of IP over control data, a combination of mean and SD of the log-counts (minimum rank of both) and the X axis as the bins across the given region of interest. Unsupervised clus- was used to select between 1% and 50% of the features with the highest counts and tering was performed by selecting the top 20,000 merged peak regions with highest variance in RNA-seq and ChIP-seq data. Chromatin state-based clustering was variance to generate the clusters for individual histone modifications. performed on the 5412 regions that were associated to the second MCA compo- nents using a binary distance and Ward linkage metrics. Clustering of chromatin states regions was performed using hierarchical clustering and the number of DNA methylation profiling and analysis. Whole-genome DNA methylation was fi clusters was determined using the cutreeDynamic function (dynamicTreeCut R analyzed using the Illumina In nium MethylationEPIC Beadchip. Integragen SA package35) with a minimum size module of 500 features. (Evry, France) carried out microarray experiments and hybridized to the BeadChip arrays following the manufacturer’s instructions. Illumina GenomeStudio software was used to extract the probe DNA methylation intensity signal values for each ChromHMM. ChromHMM8 was used to perform hidden Markov modeling on the locus. Data were then preprocessed following recommendations from the five histone marks and, by default, chromatin states were analyzed at 200 bp Dedeurwaerder et al23. Data were removed from probes that were not detected or intervals and a fold threshold of 10. The tool was used to learn consensus models saturated and that contained SNPs or overlapped with a repetitive element that was from virtually concatenated aligned bam files from all the samples. Control data not uniquely aligned to the or regions of insertions and deletions in were used in the model to help reduce copy number variation and repeat associate the human genome. Data were then adjusted for color balance bias and normalized artifacts in the ChIP-seq samples. Multiple state models were visualized to capture between samples using the SSN (shift and scaling normalization) method using the all the key chromatin states. A 15-state model was selected and applied on all lumi package functions. The CpG Island Methylator Phenotype (CIMP) index was samples to obtain overlap and enrichment for each state. Enrichment of each state estimated by adapting the approach by Toyota et al.24. In brief, CpG islands found was calculated across genomic regions of interest and visualized as heatmaps for to be unmethylated ( < 20% β-value) in all 25 normal pancreatic samples from the genomic regulatory regions of interest. The states were then given functional ICGC consortium were selected. The CIMP index was calculated independently for annotation based on these enrichment patterns.

8 NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6 ARTICLE

MCA analysis. ChromHMM output files were concatenated using the unionbed References function from BEDTools22. Chromatin states (4,810,649 regions) were filtered so 1. Rahib, L. et al. Projecting cancer incidence and deaths to 2030: the unexpected that the 2 main representative states per region were present in at least 30% of burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. samples, and that the quiescent states (E14) was not the main representative state 74, 2913 (2014). per region. Sex chromosomes were removed for the rest of the analysis. An MCA 2. Hruban, R. H., Goggins, M., Parsons, J. & Kern, S. E. Progression model for analysis10 was performed on the filtered chromatin states matrix (665,328 regions) pancreatic cancer. Clin. Cancer Res. 6, 2969 (2000). 36 using FactoMineR . As in PCA, the MCA decomposes the variance into a com- 3. Lomberk, G. A. & Urrutia, R. The triple code model for pancreatic cancer: ponent of a set of new orthogonal variables (dimensions) ordered by the amount of crosstalk among genetics, epigenetics, and nuclear structure. Surg. Clin. North. variance that each component explains. For a given dimension, the most con- Am. 95, 935–952 (2015). fi tributing regions were selected as those having a signi cant association (Anova test, 4. Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic ɑ fi 5%) and having a determination coef cient above 0.6 (dimdesc function from cancer. Nature 531,47–52 (2016). FactoMineR). 5. Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). Functional analysis. In order to retrieve the pathways affected by a particular 6. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone chromatin state, a gene-set enrichment analysis (GSEA) approach was per- modifications. Cell. Res. 21, 381–395 (2011). formed for each chromatin state, using the fgsea R package37, which implements 7. Tentler, J. J. et al. Patient-derived tumour xenografts as models for oncology GSEA on a pre-ranked genelist and MsigDB signaling database38. Gene score used drug development. Nat. Rev. Clin. Oncol. 9, 338–350 (2012). in fgsea was the frequency among all samples of the presence of a particular 8. Ernst, J. & Kellis, M. ChromHMM: automating chromatin state discovery and chromatin state within − 20 kb from TSS + 1 kb from TSE of the gene. Leading characterization. Nat. Methods 9, 215–216 (2012). edges correspond to genes that drive the enrichment as given by fgsea. 9. Hoffman, M. M. et al. Integrative annotation of chromatin elements from For the regions contributing the most to the second MCA dimension (5412 ENCODE data. Nucleic Acids Res. 41, 827–841 (2013). regions selected as described in the previous paragraph), we retrieved nearby CpGs 10. Greenacre, M. & Blasius, J. Multiple Correspondence Analysis and Related and genes. Nearby CpGs were determined using the findOverlaps function from Methods. (Chapman & Hall/CRC, 2006). 39 the IRanges packages and with a location < 1 kb upstream from regions. Nearby 11. Ashoor, H. et al. HMCan: a method for detecting chromatin modifications in 40 genes were retrieved using rGreat (basal plus extension; proximal 5 kb upstream cancer samples using ChIP-seq data. Bioinformatics 29, 2979–2986 (2013). 1 kb downstream, plus distal: up to 20 kb for Active TSS regions and 100 kb for 12. Collisson, E. A. et al. Subtypes of pancreatic ductal adenocarcinoma and their ’ ɑ active enhancer regions). Pearson s correlation ( 5%) with the MCA component differing responses to therapy. Nat. Med. 17, 500–503 (2011). coordinates was used to select associated genes and CpGs. We performed 13. Moffitt, R. A. et al. Virtual microdissection identifies distinct tumor- and functional enrichment analysis on the genes corresponding to each region clusters fi 41 stroma-speci c subtypes of pancreatic ductal adenocarcinoma. Nat. Genet. 47, using the Enrichr software . 1168–1178 (2015). 14. Nicolle, R. et al. Pancreatic adenocarcinoma therapeutic targets revealed by TF and super-enhancer analysis. ROSE42, 43 was employed to identify super- tumor-stroma cross-talk analyses in patient-derived xenografts. Cell Rep. 21, enhancers based on their density and length of the stitched regions. A peak 2458–2470 (2017). stitching distance of 12,500 bp was used and the regions around TSS 2500 bp were 15. Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and excluded, while identifying super-enhancers in each sample. The signal and ranks function of cell type-specific enhancers. Nat. Rev. Mol. Cell. Biol. 16, 144–154 are normalized from 0 to 1 and sorted by their score and plotted. All stitched (2015). regions above the inflection point were considered as super-enhancers for further 16. Eder, J. P., Vande Woude, G. F., Boerner, S. A. & LoRusso, P. M. Novel 44 analysis. BAM tracks were visualized using the gviz R package . Significant TFs therapeutic inhibitors of the c-Met signaling pathway in cancer. Clin. Cancer fi were identi ed by gene-target enrichment analysis using the ChEA database Res. 15, 2207 (2009). 45 (Fisher exact test on the basal-classical component associated genes) or by motif 17. Lomberk, G. A., Iovanna, J. & Urrutia, R. The promise of epigenomic enrichment on the basal-classical component associated regions using PWMEnrich therapeutics in pancreatic cancer. Epigenomics 8, 831–842 (2016). R package. TFs, for which expression was negatively or positively correlated to the 18. Yan, H. et al. HiChIP: a high-throughput pipeline for integrative analysis of basal-classical component, were regarded as classical or basal TFs, respectively. 46 47 ChIP-Seq data. BMC Bioinformatics 15, 280 (2014). Cytoscape was used for network visualization and the ClueGO app was applied 19. Li, H. & Durbin, R. Fast and accurate short read alignment with to target genes for clustered enriched pathway network representation. Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009). 20. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, Met siRNA transfection, RNA-seq, and qPCR validations. Cells (3e5) were R137 (2008). plated in six-well plates and 24 h later transfected with a pool of 4 MET siRNAs 21. Zang, C. et al. A clustering approach for identification of enriched domains (ON-TARGETplus siRNA Reagents, Dharmacon), using INTERFERin reagent from histone modification ChIP-Seq data. Bioinformatics 25, 1952–1958 (Polyplus-transfection) according to the manufacturer’s protocol. A scrambled (2009). siRNA pool was used as the negative control. After 72 h, cells were lysed, and RNA 22. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for extracted with RNeasy Mini Kit (Qiagen). The sequences of Met-specific siRNAs comparing genomic features. Bioinformatics 26, 841–842 (2010). were as follows: Met1: 5′AACUGGUGUCCCGGAUAU-3′; Met2: 5′-GAA- 23. Dedeurwaerder, S. et al. A comprehensive overview of Infinium ′ ′ ′ CAGCGAGCUAAAUAUA-3 ; Met3: 5 -GAGCCAGCCUGAAUGAUGA-3 ; and HumanMethylation450 data processing. Brief. Bioinform. 15, 929–941 (2014). ′ ′ Met4: 5 -GUAAGUGCCCGAAGUGUAA-3 . RNA libraries were prepared (Illu- 24. Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. Proc. mina TruSeq RNA v2) and run on the Illumina High Seq-2500 for 125 bp paired Natl. Acad. Sci. USA 96, 8681–8686 (1999). end reads in the Genomic Sciences and Precision Medicine Center, Medical College 25. Kalari, K. R. et al. MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. ’ of Wisconsin. Gene counts were normalized using RPKM. Student s t-tests were BMC Bioinformatics 15, 224 (2014). performed to test for differentially expressed genes (list is given in Supplementary 26. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence Table 3) between the siMET basal and scramble basal samples. Fgsea enrichment of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013). tests were performed based on Pvalues of the differential analysis of basal siMET 27. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, vs. basal control samples, and by using MsigDB database and the described open 15–21 (2013). resource48. For qPCR, total RNA (1 μg) was used as a template for cDNA synthesis, 28. Conway, T. et al. Xenome—a tool for classifying reads from xenograft using the GoScript™ reverse transcription kit (Promega). GoTaq® qPCR 2 × Master samples. Bioinformatics 28, i172–i178 (2012). Mix (Promega) was used with the following reaction conditions: denaturation at 29. Marisa, L. et al. Gene expression classification of colon cancer into molecular 95 °C for 2 min; 40 cycles of 15 s at 95 °C, 45 s at 60 °C. Reactions were carried out using the AriaMx real‐time PCR system and analyzed using the AriaMx software subtypes: characterization, validation, and prognostic value. PLoS Med. 10, v1.1 (Agilent Technologies, Santa Clara, CA, USA). Primer lists for each transcript e1001453 (2013). are provided in Supplementary Table 4. 30. Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015). 31. Staaf, J. et al. Normalization of Illumina Infinium whole-genome SNP data Data availability. ChIP-seq, DNA methylation, RNA-seq, and SNP datasets that improves copy number estimates and allelic intensity ratios. BMC fi support the ndings of this study have been deposited at ArrayExpress (http:// Bioinformatics 9, 409–409 (2008). www.ebi.ac.uk/arrayexpress) under accession codes E-MTAB-5632, E-MTAB- 32. Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation 5571, E-MTAB-5639, and E-MTAB-5570, respectively. algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007). Received: 13 July 2017 Accepted: 25 April 2018 33. Popova, T. et al. Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays. Genome Biol. 10, R128–R128 (2009).

NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04383-6

34. Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: a class discovery tool Author contributions with confidence assessments and item tracking. Bioinformatics 26, 1572–1573 G.L., A.M., N.E., and J.H.L. made contributions to the acquisition of data. V.M., P.D., M. (2010). G., S.G., O.T., J.R.D., M.G., P.G., M.G., and V.S. contributed to sample acquisition. O.G., 35. Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical B.B., and M.B. established and maintained the PDTXs. G.L., Y.B., R.N., A.N., K.S.G., L. cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 M., Z.S., H.Y., N.E., L.A., M.A., G.O., E.K., J.I., and R.U. contributed to analysis and (2008). interpretation of data. G.L., J.I., and R.U. generated the main idea of the work and 36. Lê, S., Josse, J. & Husson, F. FactoMineR: an R package for multivariate developed the study design, both conceptually and methodologically. G.L., A.N., A.D.R., analysis. J. Stat. Software 25, https://doi.org/10.18637/jss.v025.i01 (2008). J.I., and R.U. supervised the data analysis. G.L., N.D., J.I., and R.U. supervised the project. 37. Sergushichev, A. An algorithm for fast preranked gene set enrichment analysis G.L., Y.B., R.N., L.M., A.N., J.I., and R.U. wrote the manuscript from first draft to using cumulative statistic calculation. Preprint at bioRxiv https://doi.org/ completion. K.S.G., A.M., N.E., L.A., M.A., T.O., J.H.L., G.O., A.D.R., and N.D. made 10.1101/060012 (2016). comments, suggested appropriate modifications, and corrections. All authors read and 38. Liberzon, A. in Stem Cell Transcriptional Networks: Methods and Protocols approved the final manuscript. (ed. Benjamin L. Kidder) 153–160 (Springer New York, 2014). 39. Lawrence, M. et al. Software for computing and annotating genomic ranges. Additional information PLoS Comput. Biol. 9, e1003118 (2013). 40. rGREAT: Client for GREAT analysis. R package version 1.7.1, http://great. Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- stanford.edu/public/html (2018). 018-04383-6. 41. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016). Competing interests: The authors declare no competing interests. 42. Lovén, J. et al. Seletive inhibition of tumor oncogenes by disruption of super- enhancers. Cell 153, 320–334 (2013). Reprints and permission information is available online at http://npg.nature.com/ 43. Whyte, W. A. et al. Master transcription factors and mediator establish super- reprintsandpermissions/ enhancers at key cell identity genes. Cell 153, 307–319 (2013). Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in 44. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and fi Bioconductor. In Statistical Genomics. Methods in Molecular Biology, vol 1418 published maps and institutional af liations. (eds Mathé, E. & Davis, S.) 335–351 (Humana Press, New York, 2016). 45. Lachmann, A. et al. ChEA: regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444 (2010). Open Access This article is licensed under a Creative Commons 46. Shannon, P. et al. Cytoscape: a software environment for integrated models of Attribution 4.0 International License, which permits use, sharing, biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). adaptation, distribution and reproduction in any medium or format, as long as you give 47. Bindea, G. et al. ClueGO: a Cytoscape plug-in to decipher functionally appropriate credit to the original author(s) and the source, provide a link to the Creative grouped and pathway annotation networks. Bioinformatics 25, Commons license, and indicate if changes were made. The images or other third party ’ 1091–1093 (2009). material in this article are included in the article s Creative Commons license, unless 48. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular indicated otherwise in a credit line to the material. If material is not included in the perturbations across complex diseases. Nat. Methods 13, 366–370 (2016). article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/. Acknowledgements This work was supported by La Ligue Contre le Cancer (Programme Cartes d’Identité des Tumeurs® (CIT)), INCa, Canceropole PACA, DGOS (labellization SIRIC), INSERM to J.I., © The Author(s) 2018 the National Institutes of Health (grants R01 CA178627 to G.L., and R01 DK52913 to R.U.).

10 NATURE COMMUNICATIONS | (2018) 9:1978 | DOI: 10.1038/s41467-018-04383-6 | www.nature.com/naturecommunications