bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Chromatin Patterns Distinguish Breast Tumor Subtypes and Disease Progression in Association 2 with ANP32E levels. 3 4 Garrett L. Ruff1, Kristin E. Murphy1, Paula M. Vertino1,2 *, Patrick J. Murphy1,2* 5 6 1Department of Biomedical Genetics and 2Wilmot Cancer Institute, University of Rochester 7 Medical Center, Rochester NY, 14642 8 9 * correspondence: 10 11 [email protected] & [email protected] 12 13 601 Elmwood Avenue, Rochester, NY 14624 14 15 ABSTRACT: 16 Despite highly advanced diagnosis and treatment strategies, breast cancer patient outcomes 17 vary extensively, even among individuals with the same diagnosis. Thus, a better understanding 18 of the unique molecular characteristics that underlie tumor trajectories and responses to therapy 19 remains a central goal. We report that chromatin patterns represent an important characteristic, 20 capable of stratifying tumor identity and progression. We find that patterns of chromatin 21 accessibility can be classified into 3 major groups, representing Basal-like tumors, hormone 22 receptor (HR)-expressing tumors, and invasive lobular Luminal-A tumors. Major chromatin 23 differences occur throughout the genome at motifs for the transcription factor FOXA1 in HR- 24 positive tumors, and motifs for SOX9 in Basal-like tumors. A large portion of lobular Luminal-A 25 tumors display a chromatin signature defined by accessibility at FOXA1 binding motifs, 26 distinguishing them from others within this subtype. Expression of the histone chaperone 27 ANP32E is inversely correlated with tumor progression and chromatin accessibility at FOXA1 28 binding sites. Tumors with high levels of ANP32E exhibit an immune response and proliferative 29 expression signature, whereas tumors with low ANP32E levels appear programmed for 30 differentiation. Our results indicate that ANP32E may function through chromatin state 31 regulation to control breast cancer differentiation and tumor plasticity. 32 33 INTRODUCTION: 34 Cellular programming is controlled by epigenetic modifications, transcription factor binding 35 patterns, and DNA packaging within the nucleus. These mechanisms control how gene 36 transcription machinery gains access to DNA at transcription start sites (TSS) and cis-regulatory 37 enhancers, ultimately controlling cellular programming through regulation of . 38 Regions with more accessible chromatin tend to be more highly transcribed, and inaccessible 39 regions are typically silent (1). Overall, chromatin accessibility is generally stable in terminally 40 differentiated cells, along with steady gene expression profiles, and the majority of chromatin 41 state dynamics occur either during embryonic development or as a consequence of disease 42 progression, including during carcinogenesis (1–3). Breast cancer is among the most frequent 43 and well-studied forms of cancer worldwide, but chromatin state specific differences among 44 breast cancers have not been established.

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

45 Breast cancer represents the most diagnosed cancer in women (4) with an estimated 2.1 million 46 newly diagnosed cases globally in 2018 (5). Measurements of gene expression or 47 abundance have enabled breast tumors to be classified into discrete subtypes, allowing for 48 diagnosis-specific treatment strategies based on underlying cellular programming. 49 Measurements of chromatin states have the potential to provide additional insights into breast 50 cancer mechanism and may ultimately lead to novel therapy strategies. For example, a recent 51 study of 410 tumors from The Cancer Genome Atlas (TCGA) used chromatin accessibility 52 measurements to identify more than 500,000 putative gene regulatory elements, including 53 thousands of genomic locations where accessibility differences occurred in a disease-specific 54 and tissue-specific manner (2). Separate studies of myeloma have also found that accessibility 55 levels at gene-distal enhancer regions enable accurate prediction of nearby oncogene 56 expression levels, as well as cancer subtype classification (6). Similar breast cancer focused 57 studies are lacking and have the potential to identify parallel associations. 58 Presently, breast cancer diagnostic methods include histologic classification, which is based on 59 the expression of estrogen receptor (ER) and progesterone receptor (PR), as well as expression 60 or amplification of the ERBB2/HER2 (gene/protein). These measurements are critical in 61 selecting patients for hormonal or HER2-directed therapy. Triple-negative breast cancers 62 (TNBC) lack expression of these biomarkers (7), and treatment options are thus limited to 63 chemotherapy. PAM50 (Prediction Analysis of Microarray 50) gene expression profiling is an 64 effective method to identify “intrinsic subtypes”, classified as Luminal A (Lum-A), Luminal B 65 (Lum-B), HER2-enriched, and Basal-like (Basal-L), and these subtypes provide initial insight 66 into molecular mechanisms, likelihood of progression, and patient outcome (8). While much is 67 known about the biological underpinnings of these subtypes, understanding how they 68 correspond with chromatin accessibility differences may uncover additional mechanisms, 69 including novel roles for chromatin and epigenetic factors. 70 The vast majority of transcription factors can only bind DNA at accessible chromatin locations, 71 rendering them non-functional at inaccessible binding sites (1). When chromatin state changes 72 occur, increased transcription factor binding generally leads to increased expression of 73 neighboring (1–3). Many transcription factors which are normally active during 74 development become reactivated in breast cancer, and depending on chromatin state, these 75 factors may influence tumorigenic behavior. For example, SOX9 and FOXC1 are important for 76 developmental regulation of transcription in multipotent neural crest stem cells (9, 10), and they 77 become reactivated in breast cancer to co-regulate Basal-L cancer initiation and proliferation 78 (11). In contrast, FOXA1, which is normally active in hematopoietic progenitor cells, acts 79 coordinately with ER to suppress Basal-L programming and reinforce the luminal phenotype 80 (12, 13). Furthermore, hyperactivity of FOXA1 promotes pro-metastatic transcriptional programs 81 in endocrine-resistant tumors (14, 15). Thus, assessing chromatin accessibility in breast cancer 82 tumors, at specific transcription factor binding sites, could be highly informative for studying the 83 molecular function of numerous factors already known to control cancer outcomes. 84 We recently defined the histone chaperone protein ANP32E as a genome-wide regulator of 85 chromatin accessibility in mouse fibroblasts (16). ANP32E functions to modulate the 86 installation/removal of H2A.Z from chromatin, regulating chromatin remodeler activity and 87 limiting chromatin accessibility. We found that loss of ANP32E caused thousands of gene 88 promoters and enhancers to become more “open”, leading to activation of neighboring genes. 89 These changes were accompanied by cellular reprogramming events where loss of ANP32E

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

90 caused cells to take on a more differentiated transcriptome phenotype. Interestingly, a recent 91 study suggests that ANP32E may be an independent prognostic marker for human breast 92 cancers, where higher ANP32E protein levels are associated with the TNBC subtype and 93 correlated with a shorter overall and disease-free survival. Moreover, forced downregulation of 94 ANP32E suppressed TNBC tumor growth in xenograft models (17). However, the precise 95 mechanisms by which ANP32E functions to support breast cancer growth and its role in defining 96 breast cancer phenotypes has not been fully established. 97 To gain insight into chromatin state function and heterogeneity in human breast cancer, we 98 used an unsupervised computational approach to segregate tumors into defined groups based 99 solely on genome-wide chromatin accessibility patterns. Basal-L tumors segregated as a 100 homogeneous class within group 1, where as a mixture of tumor types was found within group 101 2, including nearly all Lum-B and HER2-enriched tumors, and group 3 consisted primarily of 102 lobular Lum-A tumors. By defining the chromatin accessibility ‘signature’ associated with each 103 group, we identified DNA sequence motifs for specific transcription factors. SOX9 motifs were 104 most accessible in group 1 tumors, and FOXA1 motifs were most accessible in HR+ tumors 105 within groups 2 and 3. Finally, we found that expression for the chromatin factor ANP32E was 106 anti-correlated with tumor progression and with accessibility at FOXA1 binding sites among 107 group 2 and 3 tumors, suggestive of a novel mechanism by which FOXA1 activity may be 108 regulated in breast cancer tumors. Our results highlight the potential for future disease focused 109 studies of chromatin accessibility, as well as epigenetic therapies directed at disrupting 110 chromatin regulatory factors. 111 112 Materials and Methods: 113 Measurements of Chromatin Accessibility, Gene Expression and Classification of Tumors 114 ATAC-Seq datasets were downloaded from TCGA-BRCA project in the National Cancer 115 Institute’s (NCI) Genomic Data Commons (GDC) (18). Datasets were downloaded as bam files, 116 sorted, and read count normalized with DeepTools (v3.1.3) (bamCoverage -bs 10) (19). MACS2 117 (v2.1.4) was used for peak calls (bdgpeakcall -c 35 -g 100 -l 100) (20). A union peak set was 118 generated containing all peaks across datasets (n=245133), and accessibility in these regions 119 was scored for all tumors. Gene expression datasets were also downloaded from the TCGA- 120 BRCA project. The expression files were downloaded as tables and matched to ATAC-Seq with 121 Case ID. All 1222 expression files available in the TCGA-BRCA project were also combined into 122 a union expression table. Tumor stage and IHC subtype were extracted from the TCGA-BRCA 123 project in the NCI’s GDC. PAM50 subtype (21), histological subtype (22), and general patient 124 demographics (22) data were obtained in cBioPortal (23, 24). 125 Unsupervised Dimensional Reduction and Clustering 126 The union peak table (described above) was uploaded into R and scores were normalized by 127 ranking regions from minimum to maximum accessibility for each tumor. This table was then 128 input into UMAP package (n_neighbors=10) (25). UMAP output three tumor groups by 129 agnostically grouping tumors based on similarities in chromatin accessibility patterns. To identify 130 regions where accessibility differences occurred, log2FC values were calculated from a region’s 131 average accessibility within a tumor group compared with its accessibility in all other tumors. 132 Signatures 1, 2 and 3 consisted of regions with a log2FC greater than 2.5 for groups 1, 2 and 3,

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

133 respectively. Tumors were considered individually rather than as replicates, and therefore 134 significance measurements were not assessed in defining divergent accessibility or gene 135 expression groups. 136 Data Visualization 137 The pheatmap package in R was used to create heatmaps of chromatin accessibility and gene 138 expression, annotated by tumor characteristics. The ggplot2 package was used to create 139 scatterplots and superimpose characteristics, such as cancer type, on UMAP plots. Integrative 140 Genomics Viewer (IGV) (26) was used to visualize chromatin accessibility in tumor groups and 141 stages. DeepTools was used to create heatmaps of accessibility and ChIP-Seq binding across 142 regions. The Hg38 genome assembly was used. 143 Annotation of Chromatin Signatures and Analyses 144 HOMER (v4.10) was used to annotate and find motifs enriched in each chromatin signature 145 (see above) (27), and group accessibility trends at those motifs were subsequently determined. 146 Gene ontologies for chromatin regions were determined with GREAT, which associates regions 147 to any gene whose TSS is within 1000 kb (28). Gene ontologies for genes from divergent gene 148 expression analyses were determined with Enrichr (29, 30). 149 Analysis of MCF-7 ChIP-Seq 150 Encode was used to download ChIP-Seq data from the MCF-7 cell line for FOXA1 151 (ENCSR126YEB), H3K27ac (ENCSR752UOD), H2A.Z (ENCSR057MWG) and ER 152 (ENCSR463GOT) (31, 32). BigWig files of log2FC over control were downloaded from the 153 ENCODE portal with the following identifiers: ENCFF795BHZ (FOXA1), ENCFF063VLJ 154 (H3K27ac), ENCFF589PLM (H2A.Z), and ENCFF237WTX (ER). 155 GSEA 156 Using the union expression dataset, tables of tumors in the top and bottom decile of ANP32E 157 expression were generated. In order to associate gene ontologies with ANP32E expression, the 158 average gene expressions of the top and bottom deciles were input into GSEA, which then 159 converted normalized counts data to ranked lists for enrichment scoring (33, 34). To isolate this 160 effect from ANP32E’s association with Basal-L tumors, we sought to eliminate the Basal-L 161 subtype. Using expression of FOXA1 and GATA3, two PAM50 markers, we removed the tumors 162 that were in the bottom quartile of expression for both genes. Testing this method on the 74 163 known tumors, this results in 14 tumors being eliminated. 10 of the 12 known Basal-L tumors 164 were removed, and 12 of the 14 tumors removed were in group 1. Since this method was shown 165 to be effective in removing the majority of Basal-L tumors from the sample, we applied it to all 166 tumors in the TCGA-BRCA project. This resulted in removing 112 of the 1222 expression files 167 available. We then repeated the GSEA analysis with this subset of tumors. 168 Statistical analyses were done with R statistical software (v3.6.3), and p-values obtained are 169 from parametric t-tests. Log2 fold-change values were calculated with a pseudo-count of 1. 170 171 172

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

173 174 Results: 175 Patterns of Chromatin Accessibility Segregate Breast Tumors into Distinct Subtypes 176 Chromatin accessibility has been used for defining cell identities, for establishing tissues of 177 origin, and for measuring developmental cell-state transitions (35–38). We therefore sought to 178 identify similarities and differences in chromatin state comparing between breast tumors. 179 Chromatin accessibility maps were previously generated for 74 primary invasive breast 180 carcinomas using ATAC-Seq, as part of TCGA-BRCA project (2, 39). Sequence data 181 (downloaded from the Genome Data Commons - gdc.cancer.gov) were normalized based on 182 total mapped reads, and enriched ‘peaks’ of high accessibility were identified using MACS2 (20) 183 (245133 union peaks). Uniform Manifold Approximation and Projection (UMAP) (25) was then 184 used to group individual tumors based on chromatin accessibility patterns, wherein tumors 185 segregated into three distinct groups (Fig. 1A), with no obvious differences in demographics 186 between groups (Fig. S1A & S1B). Most chromatin differences occurred along UMAP dimension 187 2, where tumors within group 1 bore the greatest distinction from groups 2 and 3 (Fig. 1A). 188 We next compared the chromatin-based UMAP classification with existing IHC and PAM50 189 classifications. We found that group 1 included nearly all TNBC and all Basal-L tumors, whereas 190 all HER2+ tumors (either by IHC or PAM50 classification) were within group 2. HR+/HER2- 191 tumors, however, were distributed across all three groups (Fig. 1A – middle). Perhaps not 192 surprisingly, the chromatin-based UMAP classification better reflected the intrinsic tumor 193 subtypes that are based on gene expression (PAM50 classification) rather than the pathologic 194 classification. For example, Basal-L tumors were found within group 1, and nearly all HER2- 195 enriched and Lum-B tumors were within group 2 (Fig. 1A-right). Interestingly however, a subset 196 of Lum-A tumors (and 1 Lum-B) were classified as a distinct set of tumors within group 3 (Fig. 197 1A-right) (analyzed subsequently). 198 We further examined the relationship between the chromatin-based classification and other 199 known features, including the frequency of common mutations and the expression of key genes. 200 As expected (given the enrichment of TNBC/Basal-L tumors in group 1) mutations in TP53 were 201 over-represented in group 1, whereas mutations in PIK3CA, GATA3 and CDH1 were 202 underrepresented (Fig. S1C). Likewise, (given the relationship to PAM50 subtype) expression of 203 FZD7, SOX9, and MYC was higher for tumors within group 1, whereas tumors in group 2 and 3 204 had higher expression of FOXA1 and GATA3 (Fig. S2A & S2B). 205 The above data indicated that features of chromatin accessibility may promote discrete tumor 206 phenotypes, but chromatin patterns are nevertheless distinct from gene expression or 207 histopathology-based phenotypes, suggesting that the chromatin differences may represent 208 unique tumor behavior or underlying biology differences. In this regard, two classes stood out. 209 HR+/HER2- tumors, which were distributed across all three groups, and Lum-A tumors, which 210 were split between groups 2 and 3. To further investigate the nature of these differences, we 211 compared the transcriptomes of those HR+/HER2- tumors that clustered in group 1 with 212 HR+/HER2- tumors outside of group 1. Rather than grouping tumor samples as ‘replicates’, we 213 compared average gene expression levels between UMAP groups and assessed statistical 214 significance in downstream steps. Using this approach, we identified more than 4000 genes with 215 divergent expression (Log2FC >1) (Fig. 1D). As expected, (based on results in Fig 1A) gene 216 ontology (GO) analysis indicated that genes involved in hormone signaling tended to be under- 217 expressed in the HR+/HER2- tumors in group 1 relative to those in the other two groups (Fig. 218 1E). Among these, expression of ESR1, PR, and ERBB2, as well as androgen receptor (AR) 219 (Fig. S2C) showed reduced expression in group 1 tumors relative to those in groups 2 and 3,

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

220 and these HR+/HER2- tumors were more similar in the expression of these genes to group 1 221 tumors classified as TNBC by IHC. These data suggest that chromatin-based classification may 222 be a more accurate reflection of tumor phenotype, and that differences in classification may 223 reflect heterogeneity of HR protein expression, variation in how (low vs. no) HR protein 224 expression is stratified by different sites/pathologists, particularly for ER, and/or differences in 225 mRNA vs. protein-based determination. Consistent with this idea, ESR1 expression in group 1 226 HR+/HER2- tumors was not only lower than that in HR+ tumors from other groups, but also 227 exhibited greater variation than tumors determined to be TNBC (i.e. and thus ER-) (Fig. S2C). 228 As noted above, our chromatin-based classification distinguished a subset of Lum-A tumors (8 229 of 24) as a distinct group (group 3). There were no apparent differences in the expression of the 230 classic biomarker genes (ESR1, PR, ERBB2) or AR (Fig. S2D) between Lum-A tumors in 231 groups 2 vs. those in group 3. We identified 523 genes with divergent expression between 232 group 2 vs. group 3 Lum-A tumors (Fig. 1F). Interestingly, these differences largely reflected 233 dysregulation of genes involves in humoral immune response and inflammatory pathways 234 (which were enriched and depleted) respectively in Lum-A tumors within group 3 (Fig. 1G). 235 Taken together, these data suggest that the chromatin state differences in breast cancer occur 236 largely in Basal-L tumors (as compared with non-Basal-L tumors) and within a distinct subset of 237 Lum-A tumors, potentially resulting from immune evasion (40). 238 239 Accessibility Differences at a Subset of “Signature” Regions Underlie Tumor Groups 240 To gain further insight into the factors driving group classification, we identified the genomic 241 regions where high levels of accessibility were present for tumors within each respective group, 242 as compared with all other tumors (Log2FC>2.5). This enabled us to define a set of accessible 243 loci (signature regions) which independently partitioned tumors in a manner nearly identical to 244 UMAP grouping (Fig. 2A & 2B). Interestingly, the signature sites for all 3 groups tended to be 245 further away from the nearest annotated TSS (Fig. S3A) and less CpG rich (Fig. S3B), as 246 compared with randomly-selected accessible peak regions, suggesting that they might 247 represent distal regulatory elements. GREAT analysis (28) was then used to annotate the 248 signature regions to all genes whose TSS were within 1000 kb in either direction (Fig. S3). 249 Analysis of the genes near signature 1 sites indicated they were involved in exocrine gland 250 development, which was not surprising given that all tumors within group 1 were Basal-L, and 251 are therefore thought to arise from precursor cells within the basal layer of mammary exocrine 252 glands. By contrast, signature 2 sites were located nearest to hormone responsive genes, 253 consistent with the abundance of HR+ and Lum-A/B tumors in this group. Interestingly, genes 254 associated with signature 3 sites were enriched in functions involved in cell metabolism (Fig. 255 S3E), suggesting that a unique metabolic program may distinguish tumors in this group from 256 those that otherwise bear a Lum-A gene expression signature. 257 As noted, the chromatin profiles segregated Lum-A tumors into two groups (Lum-A, group 2 vs 258 group 3) (Fig. 1C). To gain further insight into the potential factors segregating these tumors, we 259 applied a similar approach as above to identify regions of chromatin accessibility that differed 260 between Lum-A tumors in group 2 and group 3, including sites outside our defined signature 261 regions (log2FC>2, n=3360) (Fig. 2C). Interestingly, GREAT gene ontology analysis (genes 262 within 1000 kb) revealed that regions of greater accessibility in group 2 Lum-A tumors (lower in 263 group 3) were annotated to genes involved in development and morphogenesis, whereas 264 regions with greater accessibility in group 3 were annotated to genes involved in carbohydrate 265 metabolism (Fig. 2D). These results provide further indication that a subset of Lum-A tumors 266 may be programmed in a metabolically distinct manner based on differences in chromatin state.

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

267 Both the subset of Lum-A tumors in group 3, and group 3 tumors in general, were distinguished 268 by features associated with immune (Fig. 1G) and metabolic regulation (Fig. 2D), similar to gene 269 expression characteristics previously identified as distinguishing subsets of invasive lobular 270 carcinomas (ILC) which exhibit a Lum-A intrinsic gene expression pattern upon PAM50 271 subtyping (41–43). To investigate this further, we overlayed the tumor histology information 272 extracted from the TCGA metadata for each tumor in the dataset along with the UMAP 273 classification (Fig. 2E). We found that indeed, ILC was over-represented in group 3, relative to 274 groups 1 and 2 (Fig. 2F). ILC is also characterized by high rates of CDH1 mutation, which were 275 also found to be somewhat overrepresented in group 3 tumors (Fig. S1C). However, many (8 of 276 14) lobular carcinomas also distributed to group 2, indicating that chromatin state differences 277 occurred in a subset of lobular carcinomas, many of which were classified previously as Lum-A 278 (Figs 1A & 1G). 279 280 Accessibility at FOX and SOX binding sites defines tumor groups 281 To better understand how chromatin changes might contribute to biologically distinct tumor 282 properties, we next investigated the genomic context of the established signature regions. The 283 gene-distal nature of these signature regions (Fig. S3A) suggests that they might function as 284 intergenic regulatory sites. To investigate this possibility, we used HOMER (27) to identify DNA 285 sequence motifs enriched in the signature regions, representing potential transcription factor 286 binding sites. Several motifs were found to be enriched (Supplemental Table 1), as compared 287 with background regions (consisting of 5000 randomly selected, similarly sized genome-wide 288 accessible sites). SOX factor binding motifs were most enriched in signature 1 regions, FOX 289 factor motifs were the most enriched in signature 2 regions, and CEBP motifs were the most 290 enriched in signature 3 regions (Fig. 3A). To confirm that accessibility differences occurred 291 directly over these candidate motifs, we next mapped motif locations within signatures 1, 2, and 292 3, and assessed accessibility levels at these sites (Fig. 3B – top). Indeed, tumors in group 1 293 had, on average, greater accessibility at SOX motifs, group 2 tumors had the highest 294 accessibility at FOX motifs, and group 3 tumors had the highest accessibility at CEBP motifs. 295 Having identified DNA motifs enriched within each signature region, we next asked whether 296 these motif locations were more broadly accessible throughout the genome, even when motifs 297 were located outside our signature regions (Fig. 3B – bottom). Indeed, we found that genome- 298 wide, accessibility at SOX motifs was highest in group 1 tumor samples, and at FOX motifs, 299 accessibility was highest in group 2 samples (hormone positive tumors). Interestingly, the 300 accessibility at SOX, FOX, and CEBP sites was lower for group 3 tumors, when compared with 301 groups 1 and 2, suggesting that additional factors may underlie accessibility of the group 3 302 tumors. We next compared the levels of gene expression to determine which among the FOX 303 and SOX family transcription factors might be involved. Here we found that group 1 tumors 304 tended to express high levels of SOX9, FOXC1, and FOXM1 relative to tumors in groups 2 and 305 3, whereas group 2 tumors expressed high levels of FOXA1 (Fig. 3C & S4A). These results are 306 well aligned with published studies indicating that SOX9 and FOXC1 play central roles in TNBC 307 (within group 1), whereas FOXA1 status and levels are mediators of outcome and programming 308 among ER+ breast tumors (mostly in group 2) (11, 12, 14). 309 HR+ breast cancers in groups 2 and 3 expressed higher levels of FOXA1 (Fig. S4A) and 310 exhibited greater chromatin accessibility over FOX factor sequence motifs (Fig. 3B). Prior 311 studies indicate that FOXA1 function in conjunction with ER to influence enhancer activity and 312 promote pro-metastatic transcriptional programming in breast cancer cell lines (14, 15). We 313 therefore determined the relationship between chromatin grouping and tumor progression. 314 Indeed, we found that average chromatin accessibility levels at the signature regions defining

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

315 groups 1-3 were associated with tumor stage (Fig. 4A-C). Notably, at signature 2 regions, which 316 are enriched for FOX motifs (Fig. 3A), there was a strong relationship between accessibility and 317 tumor stage, with progressively greater accessibility associated with increasing severity of 318 disease (Fig. 4C & S4B). There was a nonsignificant trend towards decreased accessibility at 319 signature 3 sites with increasing stage of disease. Signature 1 sites showed the greatest 320 accessibility in early stage (stage I and II), suggesting that among Basal-L/TNBC breast cancers 321 such sites may underlie the early stages of disease. 322 The observation that signature 2 regions were positively associated with increasing tumor stage 323 among group 2 tumors (Fig. 4A) was recapitulated in our analysis of accessibility at FOX motifs 324 across all accessible sites for the 74 tumors in the dataset (n=96,280) (Fig. 4D & S4C). Here we 325 found again that accessibility of FOX motifs among all accessible sites is positively correlated 326 with tumor stage, despite no apparent differences in FOXA1 gene expression between tumors 327 of different stages (Fig. 4E). Similar to what was observed for differences in accessibility at 328 FOX motifs (Fig. 3D), we noted a trend towards greater accessibility at FOX motifs among Lum- 329 A or lobular carcinomas in group 2 vs. group 3. Regardless of whether we focused on Lum-A or 330 lobular tumors, those in group 2 had higher accessibility levels than tumors in group 3 (Fig. 331 S4D) – despite no apparent differences in FOXA1 expression levels (Fig. S4E). 332 Cognizant of the previously described relationship between ER and FOXA1 binding at distal 333 enhancer elements in breast cancer cells (as discussed above (14, 15)), we next investigated 334 histone modifications at the accessible signature regions. Using publicly available chromatin 335 immunoprecipitation data from MCF-7 cells (31, 32), we found that FOXA1, ER, and H3K27ac 336 (a marker of active enhancers) are significantly enriched at signature 2 regions, as compared 337 with accessible regions that define groups 1 or 3 (Fig. 5A). This high level of co-occurrence 338 suggests that the co-binding of FOXA1 and ER may underlie the chromatin-based segregation 339 in HR+/Lum-A/lobular carcinomas between groups 2 vs. 3. When considered together, these 340 results suggest that chromatin accessibility at FOX factor binding motifs increases with tumor 341 progression, and this mechanism underlie distinguishing chromatin patterns in group 2 verses 342 group 3 tumors. 343 344 ANP32E Expression Levels are Associated with Accessibility at FOX Motifs 345 The above data indicate that accessibility of FOX motifs is generally associated with tumor 346 stage, but we find little evidence for differences in FOXA1 (or ESR1) expression levels between 347 tumors of different stages, suggesting that additional factors may contribute to accessibility at 348 FOX binding motifs. Prior studies of HR+ breast cancer cells have demonstrated that the 349 function of FOXA1 is impacted by the local enrichment of the histone variant H2A.Z (44, 45). 350 H2A.Z accumulates at estrogen response elements that are bound by FOXA1 and loss of 351 H2A.Z impairs both FOXA1 binding and polymerase recruitment. ANP32E is a chromatin 352 chaperone that regulates the genomic localization of H2A.Z to control locus-specific chromatin 353 state dynamics (16). In recent work, we showed that ANP32E antagonizes H2A.Z installation, 354 such that ANP32E loss causes a global increased H2A.Z enrichment, heightened chromatin 355 accessibility and amplified TF-binding at open sites, in cultured mouse fibroblasts (16). ANP32E 356 may function similarly in breast tumors, influencing the binding of key oncogenic transcription 357 factors, like FOXA1. Therefore, we investigated the relationship between ANP32E expression, 358 chromatin accessibility, and tumor characteristics across the chromatin-defined tumor groups 359 (Fig. 5B). We found that ANP32E expression was generally higher in group 1 tumors than 360 groups 2 and 3 (Fig. 5B & S5A), and consistent with a prior report looking at protein levels (17), 361 ANP32E was significantly higher in Basal-L tumors (within group 1) than the other PAM50 362 subtypes (Fig. 5B & S5B). Moreover, the levels of ANP32E expression tended to stratify tumors

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

363 by stage, wherein early-stage (I, II) tumors had the highest levels of ANP32E expression and 364 late-stage (III, IV) tumors had the lowest levels (Fig. 5B, 5C & S5C). This association was 365 maintained even when group 1 tumors were excluded (Fig. 5D & S5D), indicating that ANP32E 366 levels may be functionally involved in cancer progression independent of tumor subtype. 367 We next evaluated the relationship between ANP32E expression and accessibility. We found 368 that accessibility at signature 2 regions (Fig. 5E) or all accessible FOXA1 motifs (Fig. 5F) were 369 significantly anticorrelated with levels of ANP32E expression across all tumors (signature 2: R= 370 -0.409, p=0.0003; FOX motifs: R= -0.276, p=0.017), suggesting that ANP32E may indeed 371 function as a negative regulator of chromatin accessibility at these sites. Notably, signature 2 372 regions had the highest levels of H2A.Z in MCF-7 cells (Fig. S5E), and the negative relationship 373 between ANP32E expression and accessibility was specific to signature 2 (Fig. S5F & S5G). 374 375 ANP32E Expression Levels are Associated with DNA Replication and Immune Response 376 Based on our findings that reduced ANP32E expression levels associated with tumor stage 377 progression, we next sought to determine the relationship between ANP32E expression and 378 tumor phenotype, using the tumor transcriptome as a read-out. For this analysis, we first 379 identified tumors in the top and bottom deciles of ANP32E expression (n=123) among all breast 380 tumors from the TCGA-BRCA project for which RNA-Seq data were available (n=1222). Then, 381 we used GSEA to investigate the ontologies of these transcriptomic shifts associated with 382 ANP32E expression levels. We found that high ANP32E expression was associated with 383 increased expression of genes involved in the immune response (Fig. 6A) and to a lesser extent 384 DNA replication (Fig. S6A). Consistent with this idea, KI67 expression, a marker of cellular 385 proliferation (46), was highest in group 1 tumors (representing all Basal-L and most TNBC 386 tumors) (Fig. 6B), and ANP32E and KI67 levels were positively correlated across all samples 387 analyzed, but not after removing group 1 (Basal-L) tumors (Fig. 6C). Conversely, low ANP32E 388 expression was associated with increased expression of genes involved in separate 389 developmental processes (eg. ‘Keratinocyte Differentiation’ and ‘Cilium Movement’). To test 390 whether the observed gene expression associations were driven by differences between Basal- 391 L and non-Basal-L tumors, we repeated these analyses exclusively assaying tumors classified 392 as non-Basal-L (see methods). Here again we found that high ANP32E expression was 393 associated with genes involved in DNA replication and immune response (Fig. 6D & S6B), 394 indicating that the observed GSEA results were not simply the outcome of tumor subtype gene 395 expression differences. 396 Taken together, these results suggest that ANP32E may generally function to restrict chromatin 397 changes at the beginning stages of tumor development, and loss of ANP32E promotes tumor 398 progression by enabling more aggressive cancer. In this regard ANP32E may act to ‘lock in’ a 399 defined chromatin state, and when tumor cells transition to later stages of cancer progression, 400 ANP32E becomes downregulated, leading to increased chromatin accessibility at a defined set 401 of gene regulatory regions, including sites where H3K27ac and H2A.Z are enriched, enhancer 402 elements, and FOXA1 binding sites. 403 404 Discussion: 405 We set out to investigate how differences in chromatin state across separate breast tumors 406 coincided with unique characteristics of cancer biology, and to investigate whether differences in 407 chromatin patterns could provide insight into new cancer mechanisms. To test whether 408 chromatin accessibility patterns differed in a biologically meaningful manner, we took an

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

409 unsupervised approach, using a dimensional reduction method (UMAP) to group tumors based 410 only on chromatin differences. With this approach, 74 breast cancer tumors were grouped into 411 three distinct UMAP categories. Supporting the validity of our UMAP approach, we found that 412 differences in chromatin patterns associated with several known breast cancer features, 413 including IHC marker status (Fig. 1B), PAM50 subtype classification (Fig. 1C), and histological 414 classification (Fig. 2F). We also uncovered several novel chromatin associations. For example, 415 our UMAP analysis indicated that 6 HR+/HER2- tumors were more similar to TNBC tumors (Fig. 416 1A), and these tumors were distinct from other HR+/HER2- tumors. Further characterization 417 revealed that these 6 samples, along with TNBC samples, were classified as Basal-L, 418 suggesting that the chromatin state of Basal-L tumors drove the UMAP segregation patterns, 419 rather than chromatin associations with IHC marker status. Differences in tumor heterogeneity 420 may contribute to these differences in grouping and IHC status. For example, HR+/HER2- 421 tumors with non-uniform IHC staining may be more similar to TNBC tumors when considered in 422 aggregate than homogeneously stained HR+/HER2- tumors. Another interesting possibility is 423 that a subset of HR+/HER2- tumors may be mechanistically more similar to TNBC-like tumors, 424 perhaps explaining why a subset of HR+/HER2- tumors are resistant to hormone therapies (47). 425 More comprehensive and longitudinal studies of breast cancer, measuring chromatin state 426 changes along with IHC status and gene expression profiling, will help in establishing which of 427 these possibilities account for the observed UMAP grouping. In the future, additional diagnostic 428 tests of HR+/HER2- tumors may be necessary to assess intrinsic cell-type of origin, potentially 429 strengthening predictions of therapy response. 430 We also found chromatin differences occurred in a subset of Lum-A tumors, which appeared to 431 have chromatin patterns more similar to non-Lum-A tumors within UMAP group 2 (including 432 Lum-B and HER2-enriched tumors). This subset had reduced expression for genes involved in 433 immune response (Fig. 1G) and reduced accessibility at regions proximal to metabolism genes 434 (Fig. 2D), despite no measurable difference in expression for typical breast cancer markers, 435 such as PR, ESR1, and ERBB2 (Fig. S2E). We observed similar patterns for lobular tumors, 436 which also segregated into two classes (Fig. 2E & 2F), with a subset of lobular tumors grouping 437 with ductal tumors (within UMAP group 2). Previous studies examining differences in Lum-A 438 carcinomas found that pathways similar to those active in group 3 tumors were also active in 439 ILC (as compared with ductal carcinoma), including immune-related and metabolic pathways 440 (43). In this context, our results suggest that group 3 may represent invasive carcinomas, similar 441 to those described previously (41–43). Indeed, prior studies found that ILC had decreased 442 FOXA1 activity (based on measurements of gene expression and mutation frequency) (41), and 443 in our study, we found lower chromatin accessibility levels at FOXA1 binding sites in group 3 444 tumors, which we presume to be of this same ILC subtype (Fig. 3B). In sum, our results support 445 a model where loss of FOXA1 activity (and subsequent loss of DNA binding) in luminal tumors 446 distinguishes ILC from other HR+ tumors (presumably occurring within UMAP group 2). 447 We found the tumors within UMAP group 2 to be particularly interesting, as several distinct 448 cancer subtypes grouped together, indicating that they had quite similar chromatin accessibility 449 patterns despite differences in clinical classifications. Interestingly, FOX motifs were enriched 450 within the genomic loci where accessibility differences distinguished group 2 tumors (Fig. 3A & 451 3B) and these loci were located distal from gene promoters (Fig. S3A). In MCF-7 breast cancer 452 cells, these regions are bound by FOXA1 and ER, and enriched for H2A.Z and H3K27ac (Fig. 453 S5E & 5A), suggesting that they may function as enhancer elements in HR+ breast cancer 454 tumors. Prior studies have demonstrated that H2A.Z levels at ER binding sites facilitates 455 enhancer activation and FOXA1 binding in this type of (HR+) breast cancer cell (44, 45). We 456 and others previously demonstrated that H2A.Z is a negative regulator of DNA methylation (48– 457 50), and accordingly, lower DNA methylation levels are known to occur at enhancers bound by

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

458 FOXA1 and ER in luminal tumors (compared with basal tumors) (51). Additionally, increased 459 FOXA1 activity has been shown to function in the activation of pro-metastatic cellular 460 programming (14). Taken together, these results suggest that increased H2A.Z levels at 461 enhancers in luminal tumors may promoter increased accessibility, improved FOXA1 binding, 462 and amplified enhancer activity, potentially driving tumors toward a more metastatic cellular 463 program without changing FOXA1 expression levels (Fig. 6E). 464 The histone chaperone ANP32E has previously been shown to control H2A.Z levels at 465 thousands of vertebrate gene regulatory regions, including enhancers (16, 48, 52, 53). We 466 previously found that ANP32E functions in mouse cells to control genome-wide chromatin 467 accessibility through regulation of H2A.Z patterns (16). Based on this mechanism, differences in 468 ANP32E levels among breast tumors may lead to differences in H2A.Z enrichment, causing 469 chromatin accessibility differences, and ultimately impacting transcription factor binding events. 470 In the context of this study, we do indeed find that ANP32E expression levels differ among 471 tumors, and these differences are anticorrelated with chromatin accessibility at FOX factor 472 binding sites. Interestingly, accessibility at these same sites tends to increase in later-stage 473 tumors (stage III, IV), compared with earlier stages (stage I, II), suggesting that selective 474 opening of signature 2 regions, and FOX binding in particular may function to promote tumor 475 progression. In this regard, ANP32E levels in HR+ tumors may specifically restrict chromatin 476 accessibility at FOX factor motifs (Fig. 6E). Additional mechanistic studies of ANP32E, H2A.Z 477 and its role in FOX factor binding in the context of HR+ breast cancer will be necessary to 478 further investigate this possibility. 479 It is important to note that our study investigated accessibility data from primary tumor samples 480 only. In this context, our ability to identify significant correlations between stage at resection and 481 chromatin accessibility suggests that changes in the chromatin state of the primary tumor may 482 precede, and/or be predictive of, the propensity for tumor progression and/or metastatic spread. 483 We therefore propose a model in which ANP32E has two separate functions in breast cancer, 484 depending on tumor subtype or the differentiation state of the cell of origin. In Basal-L/TNBC 485 tumors, largely believed to arise from a more stem-like multipotent progenitor, high levels of 486 ANP32E ‘lock-in’ a pattern of accessible chromatin that favors proliferation and self-renewal, 487 while in HR+ breast tumors, arising in a more differentiated luminal progenitor, ANP32E 488 supports the maintenance of luminal identity and hormone responsiveness by restricting FOXA1 489 binding at estrogen response elements (Fig. 6E). In this latter setting, the loss of ANP32E 490 expression may lead to increased FOXA1 binding, relaxation of cellular programming, and 491 progression to a hormone-resistant state. Indeed, factors affecting the balance of ER and 492 FOXA1 binding to estrogen response elements, such as forced overexpression of FOXA1, may 493 promote expression of genes involved in metastasis and endocrine-resistant breast cancers 494 (14). Future studies addressing the role of ANP32E, H2A.Z, and their role in FOX factor binding 495 in the context of HR+ breast cancer are necessary to further investigate this possibility. 496 497 Acknowledgements: 498 The results published here are in whole or part based upon data generated by the TCGA 499 Research Network: https://www.cancer.gov/tcga. TCGA-BRCA dbGaP Study Accession: 500 phs000178.v11.p8. 501 Conflicts of Interest: 502 Authors declare there are no conflicts of interest.

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

503 Funding: 504 Funding for this research was from NIGMS R35-GM137833 to PJM and R01-CA250531 to 505 PMV. 506 507 508 Literature Cited: 509 1. Klemm,S.L., Shipony,Z. and Greenleaf,W.J. (2019) Chromatin accessibility and the regulatory 510 epigenome. Nature Reviews Genetics, 20, 207–220.

511 2. Corces,M.R., Granja,J.M., Shams,S., Louie,B.H., Seoane,J.A., Zhou,W., Silva,T.C., Groeneveld,C., 512 Wong,C.K., Cho,S.W., et al. (2018) The chromatin accessibility landscape of primary human 513 cancers. Science, 362.

514 3. Domcke,S., Hill,A.J., Daza,R.M., Cao,J., O’Day,D.R., Pliner,H.A., Aldinger,K.A., Pokholok,D., Zhang,F., 515 Milbank,J.H., et al. (2020) A human cell atlas of fetal chromatin accessibility. Science, 370, 516 eaba7612.

517 4. Ferlay,J., Colombet,M., Soerjomataram,I., Mathers,C., Parkin,D.M., Piñeros,M., Znaor,A. and Bray,F. 518 (2019) Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and 519 methods. International Journal of Cancer, 144, 1941–1953.

520 5. Harbeck,N., Penault-Llorca,F., Cortes,J., Gnant,M., Houssami,N., Poortmans,P., Ruddy,K., Tsang,J. and 521 Cardoso,F. (2019) Breast cancer. Nature Reviews Disease Primers, 5, 1–31.

522 6. Barwick,B.G., Gupta,V.A., Matulis,S.M., Patton,J.C., Powell,D.R., Gu,Y., Jaye,D.L., Conneely,K.N., 523 Lin,Y.C., Hofmeister,C.C., et al. (2021) Chromatin Accessibility Identifies Regulatory Elements 524 Predictive of Gene Expression and Disease Outcome in Multiple Myeloma. Clin Cancer Res, 27, 525 3178–3189.

526 7. Foulkes,W.D., Smith,I.E. and Reis-Filho,J.S. (2010) Triple-Negative Breast Cancer. New England Journal 527 of Medicine, 363, 1938–1948.

528 8. Parker,J.S., Mullins,M., Cheang,M.C.U., Leung,S., Voduc,D., Vickery,T., Davies,S., Fauron,C., He,X., 529 Hu,Z., et al. (2009) Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. J Clin 530 Oncol, 27, 1160–1167.

531 9. Xu,P., Yu,H.V., Tseng,K.-C., Flath,M., Fabian,P., Segil,N. and Crump,J.G. (2021) Foxc1 establishes 532 enhancer accessibility for craniofacial cartilage differentiation. eLife, 10, e63595.

533 10. Schock,E.N. and LaBonne,C. (2020) Sorting Sox: Diverse Roles for Sox Transcription Factors During 534 Neural Crest and Craniofacial Development. Front Physiol, 11, 606889.

535 11. Tang,L., Jin,J., Xu,K., Wang,X., Tang,J. and Guan,X. (2020) SOX9 interacts with FOXC1 to activate MYC 536 and regulate CDK7 inhibitor sensitivity in triple-negative breast cancer. Oncogenesis, 9, 1–12.

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

537 12. Bernardo,G.M., Bebek,G., Ginther,C.L., Sizemore,S.T., Lozada,K.L., Miedler,J.D., Anderson,L.A., 538 Godwin,A.K., Abdul-Karim,F.W., Slamon,D.J., et al. (2013) FOXA1 represses the molecular 539 phenotype of basal breast cancer cells. Oncogene, 32, 554–563.

540 13. Pellacani,D., Tan,S., Lefort,S. and Eaves,C.J. (2019) Transcriptional regulation of normal human 541 mammary cell heterogeneity and its perturbation in breast cancer. EMBO J, 38, e100330.

542 14. Fu,X., Pereira,R., Angelis,C.D., Veeraraghavan,J., Nanda,S., Qin,L., Cataldo,M.L., Sethunath,V., 543 Mehravaran,S., Gutierrez,C., et al. (2019) FOXA1 upregulation promotes enhancer and 544 transcriptional reprogramming in endocrine-resistant breast cancer. PNAS, 116, 26823–26834.

545 15. Ross-Innes,C.S., Stark,R., Teschendorff,A.E., Holmes,K.A., Ali,H.R., Dunning,M.J., Brown,G.D., Gojis,O., 546 Ellis,I.O., Green,A.R., et al. (2012) Differential oestrogen receptor binding is associated with 547 clinical outcome in breast cancer. Nature, 481, 389–393.

548 16. Murphy,K.E., Meng,F.W., Makowski,C.E. and Murphy,P.J. (2020) Genome-wide chromatin 549 accessibility is restricted by ANP32E. Nature Communications, 11, 5063.

550 17. Xiong,Z., Ye,L., Zhenyu,H., Li,F., Xiong,Y., Lin,C., Wu,X., Deng,G., Shi,W., Song,L., et al. (2018) ANP32E 551 induces tumorigenesis of triple-negative breast cancer cells by upregulating E2F1. Mol Oncol, 12, 552 896–912.

553 18. Grossman,R.L., Heath,A.P., Ferretti,V., Varmus,H.E., Lowy,D.R., Kibbe,W.A. and Staudt,L.M. (2016) 554 Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine, 375, 1109– 555 1112.

556 19. Ramírez,F., Dündar,F., Diehl,S., Grüning,B.A. and Manke,T. (2014) deepTools: a flexible platform for 557 exploring deep-sequencing data. Nucleic Acids Research, 42, W187–W191.

558 20. Zhang,Y., Liu,T., Meyer,C.A., Eeckhoute,J., Johnson,D.S., Bernstein,B.E., Nusbaum,C., Myers,R.M., 559 Brown,M., Li,W., et al. (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 9, 560 R137.

561 21. Koboldt,D.C., Fulton,R.S., McLellan,M.D., Schmidt,H., Kalicki-Veizer,J., McMichael,J.F., Fulton,L.L., 562 Dooling,D.J., Ding,L., Mardis,E.R., et al. (2012) Comprehensive molecular portraits of human 563 breast tumours. Nature, 490, 61–70.

564 22. Broad Institute TCGA Genome Data Analysis Center (2016): Firehose stddata_2016_01_28 run. Broad 565 Institute of MIT and Harvard. doi:10.7908/C11G0KM9.

566 23. Cerami,E., Gao,J., Dogrusoz,U., Gross,B.E., Sumer,S.O., Aksoy,B.A., Jacobsen,A., Byrne,C.J., 567 Heuer,M.L., Larsson,E., et al. (2012) The cBio cancer genomics portal: an open platform for 568 exploring multidimensional cancer genomics data. Cancer Discov, 2, 401–404.

569 24. Gao,J., Aksoy,B.A., Dogrusoz,U., Dresdner,G., Gross,B., Sumer,S.O., Sun,Y., Jacobsen,A., Sinha,R., 570 Larsson,E., et al. (2013) Integrative analysis of complex cancer genomics and clinical profiles 571 using the cBioPortal. Sci Signal, 6, pl1.

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

572 25. McInnes,L., Healy,J. and Melville,J. (2020) UMAP: Uniform Manifold Approximation and Projection 573 for Dimension Reduction. arXiv:1802.03426 [cs, stat].

574 26. Robinson,J.T., Thorvaldsdóttir,H., Winckler,W., Guttman,M., Lander,E.S., Getz,G. and Mesirov,J.P. 575 (2011) Integrative genomics viewer. Nat Biotechnol, 29, 24–26.

576 27. Heinz,S., Benner,C., Spann,N., Bertolino,E., Lin,Y.C., Laslo,P., Cheng,J.X., Murre,C., Singh,H. and 577 Glass,C.K. (2010) Simple combinations of lineage-determining transcription factors prime cis- 578 regulatory elements required for macrophage and B cell identities. Mol Cell, 38, 576–589.

579 28. McLean,C.Y., Bristor,D., Hiller,M., Clarke,S.L., Schaar,B.T., Lowe,C.B., Wenger,A.M. and Bejerano,G. 580 (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 581 495–501.

582 29. Chen,E.Y., Tan,C.M., Kou,Y., Duan,Q., Wang,Z., Meirelles,G.V., Clark,N.R. and Ma’ayan,A. (2013) 583 Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC 584 Bioinformatics, 14, 128.

585 30. Kuleshov,M.V., Jones,M.R., Rouillard,A.D., Fernandez,N.F., Duan,Q., Wang,Z., Koplev,S., Jenkins,S.L., 586 Jagodnik,K.M., Lachmann,A., et al. (2016) Enrichr: a comprehensive gene set enrichment 587 analysis web server 2016 update. Nucleic Acids Res, 44, W90-97.

588 31. Dunham,I., Kundaje,A., Aldred,S.F., Collins,P.J., Davis,C.A., Doyle,F., Epstein,C.B., Frietze,S., Harrow,J., 589 Kaul,R., et al. (2012) An integrated encyclopedia of DNA elements in the . 590 Nature, 489, 57–74.

591 32. Sloan,C.A., Chan,E.T., Davidson,J.M., Malladi,V.S., Strattan,J.S., Hitz,B.C., Gabdank,I., Narayanan,A.K., 592 Ho,M., Lee,B.T., et al. (2016) ENCODE data at the ENCODE portal. Nucleic Acids Research, 44, 593 D726–D732.

594 33. Subramanian,A., Tamayo,P., Mootha,V.K., Mukherjee,S., Ebert,B.L., Gillette,M.A., Paulovich,A., 595 Pomeroy,S.L., Golub,T.R., Lander,E.S., et al. (2005) Gene set enrichment analysis: A knowledge- 596 based approach for interpreting genome-wide expression profiles. PNAS, 102, 15545–15550.

597 34. Mootha,V.K., Lindgren,C.M., Eriksson,K.-F., Subramanian,A., Sihag,S., Lehar,J., Puigserver,P., 598 Carlsson,E., Ridderstråle,M., Laurila,E., et al. (2003) PGC-1α-responsive genes involved in 599 oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, 600 34, 267–273.

601 35. Muto,Y., Wilson,P.C., Ledru,N., Wu,H., Dimke,H., Waikar,S.S. and Humphreys,B.D. (2021) Single cell 602 transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult 603 human kidney. Nature Communications, 12, 2190.

604 36. Carter,B. and Zhao,K. (2021) The epigenetic basis of cellular heterogeneity. Nature Reviews Genetics, 605 22, 235–250.

606 37. Meshorer,E. and Misteli,T. (2006) Chromatin in pluripotent embryonic stem cells and differentiation. 607 Nature Reviews Molecular Cell Biology, 7, 540–546.

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

608 38. Ho,Y.-T., Shimbo,T., Wijaya,E., Ouchi,Y., Takaki,E., Yamamoto,R., Kikuchi,Y., Kaneda,Y. and Tamai,K. 609 (2018) Chromatin accessibility identifies diversity in mesenchymal stem cells from different 610 tissue origins. Scientific Reports, 8, 17765.

611 39. Buenrostro,J.D., Wu,B., Chang,H.Y. and Greenleaf,W.J. (2015) ATAC-seq: A Method for Assaying 612 Chromatin Accessibility Genome-Wide. Current Protocols in Molecular Biology, 109, 21.29.1- 613 21.29.9.

614 40. Bates,J.P., Derakhshandeh,R., Jones,L. and Webb,T.J. (2018) Mechanisms of immune evasion in 615 breast cancer. BMC Cancer, 18, 556.

616 41. Ciriello,G., Gatza,M.L., Beck,A.H., Wilkerson,M.D., Rhie,S.K., Pastore,A., Zhang,H., McLellan,M., 617 Yau,C., Kandoth,C., et al. (2015) Comprehensive Molecular Portraits of Invasive Lobular Breast 618 Cancer. Cell, 163, 506–519.

619 42. Netanely,D., Avraham,A., Ben-Baruch,A., Evron,E. and Shamir,R. (2016) Expression and methylation 620 patterns partition luminal-A breast tumors into distinct prognostic subgroups. Breast Cancer 621 Research, 18, 74.

622 43. Du,T., Zhu,L., Levine,K.M., Tasdemir,N., Lee,A.V., Vignali,D.A.A., Houten,B.V., Tseng,G.C. and 623 Oesterreich,S. (2018) Invasive lobular and ductal breast carcinoma differ in immune response, 624 protein translation efficiency and metabolism. Sci Rep, 8, 7205.

625 44. Gévry,N., Hardy,S., Jacques,P.-É., Laflamme,L., Svotelis,A., Robert,F. and Gaudreau,L. (2009) Histone 626 H2A.Z is essential for estrogen receptor signaling. Genes Dev., 23, 1522–1533.

627 45. Brunelle,M., Nordell Markovits,A., Rodrigue,S., Lupien,M., Jacques,P.-É. and Gévry,N. (2015) The 628 histone variant H2A.Z is an important regulator of enhancer activity. Nucleic Acids Research, 43, 629 9742–9756.

630 46. Pathmanathan,N. and Balleine,R.L. (2013) Ki67 and proliferation in breast cancer. Journal of Clinical 631 Pathology, 66, 512–516.

632 47. Garcia-Martinez,L., Zhang,Y., Nakata,Y., Chan,H.L. and Morey,L. (2021) Epigenetic mechanisms in 633 breast cancer therapy and resistance. Nat Commun, 12, 1786.

634 48. Murphy,P.J., Wu,S.F., James,C.R., Wike,C.L. and Cairns,B.R. (2018) Placeholder Nucleosomes Underlie 635 Germline-to-Embryo DNA Methylation Reprogramming. Cell, 172, 993-1006.e13.

636 49. Zilberman,D., Coleman-Derr,D., Ballinger,T. and Henikoff,S. (2008) Histone H2A.Z and DNA 637 methylation are mutually antagonistic chromatin marks. Nature, 456, 125–129.

638 50. Conerly,M.L., Teves,S.S., Diolaiti,D., Ulrich,M., Eisenman,R.N. and Henikoff,S. (2010) Changes in 639 H2A.Z occupancy and DNA methylation during B-cell lymphomagenesis. Genome Res., 640 10.1101/gr.106542.110.

641 51. Fleischer,T., Tekpli,X., Mathelier,A., Wang,S., Nebdal,D., Dhakal,H.P., Sahlberg,K.K., Schlichting,E., 642 Børresen-Dale,A.-L., Borgen,E., et al. (2017) DNA methylation at enhancers identifies distinct 643 breast cancer lineages. Nat Commun, 8, 1379.

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

644 52. Mao,Z., Pan,L., Wang,W., Sun,J., Shan,S., Dong,Q., Liang,X., Dai,L., Ding,X., Chen,S., et al. (2014) 645 Anp32e, a higher eukaryotic histone chaperone directs preferential recognition for H2A.Z. Cell 646 Research, 24, 389–399.

647 53. Obri,A., Ouararhni,K., Papin,C., Diebold,M.-L., Padmanabhan,K., Marek,M., Stoll,I., Roy,L., Reilly,P.T., 648 Mak,T.W., et al. (2014) ANP32E is a histone chaperone that removes H2A.Z from chromatin. 649 Nature, 505, 648–653.

650

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 1 A Tumor Groups IHC Subtype PAM50 Subtype Group 1 HR+/HER2- Lum-A Group 2 HR+/HER2+ Lum-B Group 3 HR+ HER2-En HR-/HER2+ Basal-L

TNBC UMAP D2

UMAP D1 B IHC Subtypes C PAM50 Subtypes

Group 1 Group 2 Group 3 Group 1 Group 2 Group 3

E Genes High in Group 1 HR+/HER2- Tumors D Gene Expression of 06-log10(adjusted p value) HR+/HER2- Tumors IRF6 TF ARCHS4 Co-expression 6.60 TP63 TF ARCHS4 Co-expression 5.40 3284

1e+03 Jensen Tissues - Stratifed Epithelium 2.07

Epidermis Development 1.56

1e+01 Genes Low in Group 1 HR+/HER2- Tumors

lg2FC >1 01-log10(adjusted p value) 3 Group 1 (FPKM)

1e−01 951 ChEA 2016 - ESR2 ChIP-Seq MCF7 13.86 Serine-type peptidase activity 10.53

1e−03 Reg. of Humoral Immune Response 10.09 1e−02 1e+02 Not Group 1 (FPKM) ChEA 2016 - ESR1 ChIP-Seq MCF7 8.75

G F Genes High in Group 3 Luminal-A Tumors Gene Expression of Luminal-A Tumors 07-log10(adjusted p value) 5 Complement activation - classical pathway 75.44 235 Immunoglobulin receptor binding 49.01 Reg. of B cell activation 43.81 1e+2 Serine-type endopeptidase activity 37.17

Genes High in Group 3 Luminal-A Tumors lg2FC > 1

Group 3 (FPKM) 288 07-log10(adjusted p value) 1e−2 BioPlanet 2019 - 7.80 Endopeptidase inhibitor Activity 5.09

1e−2 1e+2 -Reg.ofEndopeptidase 3.71 Group 2 (FPKM) Serine-type endopeptidase inhibitor 2.91 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 1:

Chroman Accessibility Disnguishes Breast Cancer Subtypes. A) UMAP dimension reducon plots depicng three disnct groups of tumors, colored by group (n=74), IHC subtype (n=69) and PAM50 subtype (n=65). B-C) Individual pie charts depict groups of tumors based on IHC subtypes (B) and PAM50 subtypes (C), indicang tumor groups disnguish breast cancer subtypes. D-E) Scaerplots depicng genes found to have higher or lower expression in HR+/HER2- tumors in group 1 (n=6) compared to rest (n=40) (D) and in Luminal-A tumors in group 3 (n=8) compared to group 2 (n=17) (E). F-G) Bar charts depicng signifcance of gene ontology results from Enrichr, invesgang genes found to have higher and lower expression in HR+/HER2- paents in group 1 compared to rest (F) and in Luminal-A tumors in group 3 compared to group 2 (G). Adjusted p-values obtained within Enrichr. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2 A Chromatin Accessibility C Chromatin Accessibility of Tumor Group Luminal-A Tumors PAM50 Subtype Rank Score IHC Subtype 74 lg2FC > 2

Tumor Groups 1e+03 2133 Group 1 Group 2 Group 3 0 Group 3 (RPKM) 1e+01 1227 PAM50 Subtype Signature 1 Lum-A 7026 regions Lum-B HER2-En 1e−01 1e+01 1e+03 Basal-L Group 2 (RPKM) D IHC Subtype Regions High in Group 3 Lum-A Tumors HR+/HER2- 0 -log10(FDR q value) 20 HR+/HER2+ +reg.oflocomotion 20.36 HR+ response to monosaccharide 15.85 HR-/HER2+ Signature 2 TNBC 1314 regions reg. of sequestering triglyceride 15.53 Signature 3 response to carbohydrate 13.86 527 regions B Regions Low in Group 3 Lum-A Tumors Sig 3 Sig 1 20kb Sig 2 0-17 0 -log10(FDR q value) 15 Group 1 otic vesicle development 15.59 +reg.ofkidneydevelopment 7.64 0-17 +reg.ofmesenchymalcellprolif. 7.18 Group 2 skeletal system morphogenesis 6.90

0-17 E F Group 3 Histologic Subtype Histologic Subtype

Ductal 100 Lobular NR_120402.1 Mixed Metaplastic 80 Sig 1 10kb Sig 1 Sig 3 0-71 Pagets Disease Group 1 60 0-71 Group 2 inGroup 40

0-71

Group 3 20

FAM49A 0

Group 1 Group 2 Group 3 Figure 2: All Tumors

Disnct Chroman Accessibility Signatures Associate with each Tumor Group. A) Heatmap showing 3 groups of chroman regions, each showing greater accessibility in their respecve tumor group compared to the rest (Lg2FC > 2.5). B) Screenshots from IGV depicng average accessibility of tumor groups in regions within each chroman signature. C) Scaerplot displaying regions found to have higher or lower accessibility in Luminal-A tumors in group 3 (n=8) compared to group 2 (n=17). D) Bar charts depicng signifcance of gene ontology results from GREAT, invesgang genes nearby (<1000 b) regions found to have higher and lower accessibility in group 3 Luminal-A tumors compared to group 2. FDR -values obtained within GREAT. E-F) UMAP plot (E) and staced barplot (F) showing the distribuon of cancer types between each tumor group. P-value in F obtained from Chi-suared test within cBioPortal. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3

A in in transcription Motif p-value cluster background factor CT CC TT 1e-71 29.4 20.4 Sig 1 GAA C C SO TT GGG T A CCCC G AG AGGCTAGTGTTAAAA Sig 2 G TTA 1e-98 67.9 39.0 FO A CC G T G C G

TGCCT ACCAAACCTAGGAC G T TATTGG A Sig 3 CG C 1e-47 37.4 12.5 CEBP A T C ATATCT G

GCC TGT AAC GGCG T GGGC C A ATTT AAA Group 1 Group 2 Group 3 B Accessibility in Signatures

SO FO CEBP 30 30 Sig. 1 Sig. 2 15 Sig. 3

10 10 5

-1kb +1kb Accessibility in All Peaks

SO FO CEBP 20 20 15

10 10 5

C Expression of SO and FO Factors 1 FOA1 2 FOP4 3 FOC1 4 FOO3 5 FOI1 6 FOM1 7 FOJ1 8 FOO1 9 FOJ3 10 FOP1 1 SO9 2 SO4 3 SO8 4 SO2 5 SO12 6 SO13 7 SO11 8 SO18 9 SO15 10 SO17 Tumor Group

Figure 3:

Accessibility at F and S Binding Sites Defne Tumor Groups. A) Table displaying top mof result from HMER for each chroman signature. Due to similarity across S and F binding mofs, we refer to S6 simply as S, and FM1 simply as F. P-values obtained within HMER. B) Profle plots depicng average accessibility of tumor groups in mof regions, both within the mofs respecve signature (top) and in all accessible pea regions in tumors (boom), indicang that group 1 and 2 tumors show increased accessibility at S and F mofs, respecvely. e again use the S6 mof to represent S mofs, and the FM1 mof to represent F mofs. C) Heatmap showing expression of S and F factors across tumor groups. Factors are ordered from 1 to 10 by standard deviaon across tumors. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4 A B Stage 1 Stage 2 Stage 3 Stage 4 Signature 1 Signature 2 Signature 3 Tumor Stage 1 8 20 15 2 3 10 4 5 10 4 -1Kb Center +1Kb -1Kb Center +1Kb -1Kb Center +1Kb C Signature 1 Signature 1 Accessibility Signature 2 Accessibility Signature 3 Accessibility 7026 regions p0.065 n.s. 6 40 p0.038 p0.124

14 2 30

10 30 20 20 Signature 2 20 1314 regions 10 6 10

18 Signature 3 527 regions 2 2 -1Kb +1Kb 0 0 Center I II III I I II III I I II III I D FO Motif Accessibility F 0-60 5kb 0-82 p0.033 Sig 2 Sig 2 20kb Sig 1 p0.084 Stage 1

0-60 0-60 p0.039 Stage 2 25 0-60 0-60 Stage 3

0-60 0-60 15 Stage 4

0-49 0-32 FOA1 (ChIP-Seq) 5 I II III I AGTR1 FOXP1 E FOA1 Expression 0-22 0-54 Sig 2 40kb Sig 3 5kb Sig 2 n.s. Stage 1

0-22 0-54 200 Stage 2

0-22 0-54 Stage 3

0-22 0-54 100 Stage 4

0-27 0-36 FOA1 (ChIP-Seq)

0 SYN2 GTF2E1 I II III I Figure 4:

Chroman Accessibility in F mofs and Signature 2 Regions Associate with Tumor Progression Stages. A-B) Heatmaps (A) and profle plots (B) showing accessibility in signatures 1, 2 and 3 across tumor stages. Heatmaps have regions ordered from greatest to least average accessibility across tumor stages. C) Boxplots of accessibility in signatures 1, 2 and 3 across tumor stages, indicang that only signature 2 shows an accessibility trend across stages. D-E) Boxplots comparing accessibility of F mofs in accessible pea regions (n=96280) (D) and FA1 expression levels (E) across tumor stages. F) Screenshots from IGV depicng average accessibility of tumor stages and FA1 binding in MCF-7 cells from ChIP-Se in regions within each chroman signature. ChIP-Se data is Log2FC over control. P-values in C-E obtained from one-tailed parametric t-tests. is p<0.01, is p<0.001, is p<0.0001. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 5 A FOA1 ER H3K27ac C D ANP32E Expression ANP32E Expression no Group 1 p0.090 p0.119 p0.034 Signature 1 50 30 7026 regions

p0.032 20

FPKM 30

10

10 Signature 2 1314 regions I II III I I II III I

Signature 3 527 regions E ANP32E Expression with F ANP32E Expression with -1Kb +1Kb -1Kb +1Kb -1Kb +1Kb Center Center Center Signature 2 Accessibility FO Motif Accessibility 30 R-0.276 0 3 0 2 0 5 R-0.409 30 p0.0003 p0.017

B 0 ANP32E Expression 120

10

Tumor Group Stage PAM50 Subtype 1 10 Tumor Stage 3 2 3 Portion Basal-L 1/15 0/15 1/15 1/15 9/14 4

Portion Stage 3/4 8/15 10/15 6/15 2/15 3/14 FO Motif Accessibility (RPKM) Signature 1 Accessibility (RPKM) 1 Tumor Groups PAM50 Subtype Tumor Stage 5 Group 1 Lum-A 1 Group 2 Lum-B 2 3 10 30 100 3 10 30 100 ANP32E Expression (FPKM) ANP32E Expression (FPKM) Group 3 HER2-En 3 4 Basal-L Figure 5:

ANP32E Expression Levels Associate with F Mof Accessibility and Tumor Stage. A) Heatmaps showing binding of FA1, ER, and H327ac in MCF-7 cells within regions from signatures 1, 2 and 3. Data from ChIP-Se of MCF-7 cells regions sorted from greatest to least FA1 enrichment. B) Heatmap with tumors ordered by ANP32E expression and annotated by tumor group, PAM50 subtype and tumor stage, indicang that late-stage tumors group with lower ANP32E expression. C-D) Boxplots comparing ANP32E expression by tumor stage, both in all tumors with available stage data (n=73) (C) and in only tumors from groups 2 and 3 (n=59) (D), indicang that late-stage tumors have signifcantly lower expression of ANP32E.P-valuesobtainedfromone-tailedparametrict-tests.isp<0.01,isp<0.001,isp<0.0001.E-F)Scaerplots showing correlaon of ANP32E expression with tumors accessibility in signature 2 regions (E) and with tumors average accessibility in F mofs across all accessible pea regions (n=96280) (F), with tumors colored by stage. R denotes Pearson correlaon coecient p-values from Pearsons product moment correlaon coecient. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.28.454151; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 6 A All Tumors D No Basal-L Tumors Natural Killer Cell DNA Dependent DNA Activation (GO) Odorant Binding (GO) Replication (GO) 0.6 0.7 0.6 FDR0.011 FDR0.001 FDR0.001 NES1.9538 NES2.4090 NES2.920

Score

Score

Score

Enrichment

Enrichment

Enrichment 0.0 0.0 0.0

high ANP32E high ANP32E high ANP32E positive correlation positive correlation positive correlation cross ero 30880 cross ero 30880 cross ero 22842 low ANP32E low ANP32E low ANP32E positive correlation positive correlation positive correlation CCNE1 E2F7 E2F8 TICRR MCM10 AICDA CCNE2 BLM POL NUGGC ATAD EME1 WDD1 BRCA2 GINS1 CDC7 DNA2 FBXO DSCC1 ORC1 CDC6 RFC3 GINS4 SAMD1 ORC6 MMS22L GEN1 NUCKS1 PCNA GRF1 IFNA4 IFNA10 IFNA7 IFNA17 IFNA16 IFNA21 IFNA13 ULBP1 UNC13D ULBP3 IFNA2 KLRF2 IFNB1 IFNA6 NCR1 IFNA14 SLAMF7 IL18R1 SLAMF6 IL12B SLAMF1 PIK3CD IL21R CD2 CD244 KLRK1 KLRC4 ULBP2 IL2RB IFNW1 high ANP32E OR14C36 ORAC1 OR9G4 OR8K1 ORK3 ORD14 ORAS1 OR81 ORD16 OR141 ORB17 ORF1 ORAU1 OR92 ORM1 ORK1 ORD18 OR8U1 ORAP2 ORAN1 ORAK2 ORAC2 OR14I1 ORL2 OR82 ORM9 ORI1 OR9K2 OR10W1 ORAK3P low ANP32E Reg. of Natural Killer Cell Keratinocyte Dierentiation (GO) Cilium Movement (GO) Mediated Immunity (GO) 0.0 0.7 0.0 FDR0.0082 FDR0.001 NES-2.232 NES2.500 FDR0.0192 NES-2.102

Score

Score

Score

Enrichment

Enrichment Enrichment -0.6 -0.5 0.0

high ANP32E high ANP32E high ANP32E positive correlation positive correlation positive correlation cross ero 30880 cross ero 30880 cross ero 22842 low ANP32E low ANP32E low ANP32E positive correlation positive correlation positive correlation IL21 IL12B S2D1A KIR2DL4 CD96 S2D1B SLAMF6 CRTAM NCR1 KLRC4KLRK1 IL18RAP CLNK CD226 NCR3 KLRK1 IL12A LAG3 SERPINB9 RASGRP1 LILRB1 LGALS9 CLEC12B RAET1E A1 ACR2 LAE PR LAF STATB PIK3R6 CABYR NPP3 OFD1 CCDC103 DNA11 TTLL1 KIF27 DNA1 ARMC4 RSP9 CCDC39 DNAI2 YDIN STK36 DPCD SPA17 SPAG16 ADCY10 CCDC114 RSP4A DNA7 LRRC6 CCDC40 NME CCDC11 DNAI1 DNA ROPN1L DNAAF1 SPAG17 TP63 POU3F1 SPRR1B ACER1 DNASE1L2 ABCA12 SPRR2G FLG FOXN1 LCE2A KRT2 SPRR4 TGM3 LCE3E KRT10 LCE3D CRCT1 LCE1D LCEA LCE1F LCE1A LCE1E LCE3C LCE1B LCE1C LCE6A LCE2D LCE2C LOR LCE2B high ANP32E low ANP32E

B C ANP32E and KI67 E KI67 Expression Expression Bsllie Horoe ecetor ositie FOXA1 p0.012 Erly Stge FOXA1 te Stge SOX9 AN 40 AN FOXA1 10

A32E oss

20 Expression all tumors 1 R0.31

K67 p0.007

no group 1 R0.02 SOX Motif Accessibility pn.s. 0.1 0 G1 G2 G3 3 10 30 100 FOX Motif Accessibility ANP32E Expression Figure 6:

ANP32E Expression Levels Associate with Disnct Expression Profles. A) GSEA plots depicng gene ontology associaons with high and low ANP32E expression levels for all tumors with RA-se data in the TCGA-BRCA project (n=1222). FDR values and normalied enrichment scores (ES) obtained within GSEA. B) Boxplot of KI expression across tumor groups. P-value from one-tailed parametric t-test. C) Scaerplot showing the associaon between ANP32E and KI expression levels, with tumors colored by tumor group. R denotes Pearson correlaon coecient p-values from Pearsons product moment correlaon coecient (done for all tumors, and all tumors excluding group 1). D) GSEA plots for all tumors excluding Basal-L (n=1110). E) Model displaying the associaon of ANP32E expression levels with mulple characteriscs of breast cancer.